Science.gov

Sample records for affymetrix 250k snp

  1. Array-based karyotyping for prognostic assessment in chronic lymphocytic leukemia: performance comparison of Affymetrix 10K2.0, 250K Nsp, and SNP6.0 arrays.

    PubMed

    Hagenkord, Jill M; Monzon, Federico A; Kash, Shera F; Lilleberg, Stan; Xie, Qingmei; Kant, Jeffrey A

    2010-03-01

    Specific chromosomal alterations are recognized as important prognostic factors in chronic lymphocytic leukemia (CLL). Array-based karyotyping is gaining acceptance as an alternative to the standard fluorescence in situ hybridization (FISH) panel for detecting these aberrations. This study explores the optimum single nucleotide polymorphism (SNP) array probe density for routine clinical use, presents clinical validation results for the 250K Nsp Affymetrix SNP array, and highlights clinically actionable genetic lesions missed by FISH and conventional cytogenetics. CLL samples were processed on low (10K2.0), medium (250K Nsp), and high (SNP6.0) probe density Affymetrix SNP arrays. Break point definition and detection rates for clinically relevant genetic lesions were compared. The 250K Nsp array was subsequently validated for routine clinical use and demonstrated 98.5% concordance with the standard CLL FISH panel. SNP array karyotyping detected genomic complexity and/or acquired uniparental disomy not detected by the FISH panel. In particular, a region of acquired uniparental disomy on 17p was shown to harbor two mutated copies of TP53 that would have gone undetected by FISH, conventional cytogenetics, or array comparative genomic hybridization. SNP array karyotyping allows genome-wide, high resolution detection of copy number and uniparental disomy at genomic regions with established prognostic significance in CLL, detects lesions missed by FISH, and provides insight into gene dosage at these loci.

  2. Software comparison for evaluating genomic copy number variation for Affymetrix 6.0 SNP array platform.

    PubMed

    Eckel-Passow, Jeanette E; Atkinson, Elizabeth J; Maharjan, Sooraj; Kardia, Sharon L R; de Andrade, Mariza

    2011-05-31

    Copy number data are routinely being extracted from genome-wide association study chips using a variety of software. We empirically evaluated and compared four freely-available software packages designed for Affymetrix SNP chips to estimate copy number: Affymetrix Power Tools (APT), Aroma.Affymetrix, PennCNV and CRLMM. Our evaluation used 1,418 GENOA samples that were genotyped on the Affymetrix Genome-Wide Human SNP Array 6.0. We compared bias and variance in the locus-level copy number data, the concordance amongst regions of copy number gains/deletions and the false-positive rate amongst deleted segments. APT had median locus-level copy numbers closest to a value of two, whereas PennCNV and Aroma.Affymetrix had the smallest variability associated with the median copy number. Of those evaluated, only PennCNV provides copy number specific quality-control metrics and identified 136 poor CNV samples. Regions of copy number variation (CNV) were detected using the hidden Markov models provided within PennCNV and CRLMM/VanillaIce. PennCNV detected more CNVs than CRLMM/VanillaIce; the median number of CNVs detected per sample was 39 and 30, respectively. PennCNV detected most of the regions that CRLMM/VanillaIce did as well as additional CNV regions. The median concordance between PennCNV and CRLMM/VanillaIce was 47.9% for duplications and 51.5% for deletions. The estimated false-positive rate associated with deletions was similar for PennCNV and CRLMM/VanillaIce. If the objective is to perform statistical tests on the locus-level copy number data, our empirical results suggest that PennCNV or Aroma.Affymetrix is optimal. If the objective is to perform statistical tests on the summarized segmented data then PennCNV would be preferred over CRLMM/VanillaIce. Specifically, PennCNV allows the analyst to estimate locus-level copy number, perform segmentation and evaluate CNV-specific quality-control metrics within a single software package. PennCNV has relatively small bias
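
The concordance figures quoted in this abstract compare CNV regions called by two tools. The abstract does not give the exact definition, so the following is only a minimal sketch under an assumed one: the fraction of regions from one caller that overlap at least one region from the other by one base or more. The region tuples and the function names `overlaps` and `concordance` are hypothetical and are not part of PennCNV or CRLMM/VanillaIce.

```python
from typing import List, Tuple

# A region is (chromosome, start, end) with an inclusive end coordinate.
Region = Tuple[str, int, int]


def overlaps(a: Region, b: Region) -> bool:
    """Return True when two regions share a chromosome and at least one base."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def concordance(calls_a: List[Region], calls_b: List[Region]) -> float:
    """Fraction of regions in calls_a overlapped by any region in calls_b.

    Assumed definition for illustration; the study may have used another.
    """
    if not calls_a:
        return 0.0
    hits = sum(any(overlaps(a, b) for b in calls_b) for a in calls_a)
    return hits / len(calls_a)


# Made-up deletion calls for one sample from two hypothetical callers.
caller_1 = [("chr1", 100, 500), ("chr2", 1000, 2000), ("chr5", 50, 80), ("chr7", 10, 20)]
caller_2 = [("chr1", 400, 900), ("chr2", 2500, 3000), ("chr7", 15, 30)]

print(concordance(caller_1, caller_2))  # share of caller_1 regions also seen by caller_2
print(concordance(caller_2, caller_1))  # share of caller_2 regions also seen by caller_1
```

The measure is asymmetric: a caller that reports more regions will usually have a lower share of them confirmed by the other, which is one reason a per-sample median is a convenient summary.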

  3. Genome wide linkage study, using a 250K SNP map, of Plasmodium falciparum infection and mild malaria attack in a Senegalese population.

    PubMed

    Milet, Jacqueline; Nuel, Gregory; Watier, Laurence; Courtin, David; Slaoui, Yousri; Senghor, Paul; Migot-Nabias, Florence; Gaye, Oumar; Garcia, André

    2010-07-15

    Multiple factors are involved in the variability of host's response to P. falciparum infection, like the intensity and seasonality of malaria transmission, the virulence of parasite and host characteristics like age or genetic make-up. Although admitted nowadays, the involvement of host genetic factors remains unclear. Discordant results exist, even concerning the best-known malaria resistance genes that determine the structure or function of red blood cells. Here we report on a genome-wide linkage and association study for P. falciparum infection intensity and mild malaria attack among a Senegalese population of children and young adults from 2 to 18 years old. A high density single nucleotide polymorphisms (SNP) genome scan (Affymetrix GeneChip Human Mapping 250K-nsp) was performed for 626 individuals: i.e. 249 parents and 377 children out of the 504 ones included in the follow-up. The population belongs to a unique ethnic group and was closely followed-up during 3 years. Genome-wide linkage analyses were performed on four clinical and parasitological phenotypes and association analyses using the family based association tests (FBAT) method were carried out in regions previously linked to malaria phenotypes in literature and in the regions for which we identified a linkage peak. Analyses revealed three strongly suggestive evidences for linkage: between mild malaria attack and both the 6p25.1 and the 12q22 regions (empirical p-value = 5×10^-5 and 9×10^-5 respectively), and between the 20p11q11 region and the prevalence of parasite density in asymptomatic children (empirical p-value = 1.5×10^-4). Family based association analysis pointed out one significant association between the intensity of plasmodial infection and a polymorphism located in ARHGAP26 gene in the 5q31-q33 region (p-value = 3.7×10^-5). This study identified three candidate regions, two of them containing genes that could point out new pathways implicated in the response to malaria infection

  4. Genome Wide Linkage Study, Using a 250K SNP Map, of Plasmodium falciparum Infection and Mild Malaria Attack in a Senegalese Population

    PubMed Central

    Milet, Jacqueline; Nuel, Gregory; Watier, Laurence; Courtin, David; Slaoui, Yousri; Senghor, Paul; Migot-Nabias, Florence; Gaye, Oumar; Garcia, André

    2010-01-01

    Multiple factors are involved in the variability of host's response to P. falciparum infection, like the intensity and seasonality of malaria transmission, the virulence of parasite and host characteristics like age or genetic make-up. Although admitted nowadays, the involvement of host genetic factors remains unclear. Discordant results exist, even concerning the best-known malaria resistance genes that determine the structure or function of red blood cells. Here we report on a genome-wide linkage and association study for P. falciparum infection intensity and mild malaria attack among a Senegalese population of children and young adults from 2 to 18 years old. A high density single nucleotide polymorphisms (SNP) genome scan (Affymetrix GeneChip Human Mapping 250K-nsp) was performed for 626 individuals: i.e. 249 parents and 377 children out of the 504 ones included in the follow-up. The population belongs to a unique ethnic group and was closely followed-up during 3 years. Genome-wide linkage analyses were performed on four clinical and parasitological phenotypes and association analyses using the family based association tests (FBAT) method were carried out in regions previously linked to malaria phenotypes in literature and in the regions for which we identified a linkage peak. Analyses revealed three strongly suggestive evidences for linkage: between mild malaria attack and both the 6p25.1 and the 12q22 regions (empirical p-value = 5×10^-5 and 9×10^-5 respectively), and between the 20p11q11 region and the prevalence of parasite density in asymptomatic children (empirical p-value = 1.5×10^-4). Family based association analysis pointed out one significant association between the intensity of plasmodial infection and a polymorphism located in ARHGAP26 gene in the 5q31–q33 region (p-value = 3.7×10^-5). This study identified three candidate regions, two of them containing genes that could point out new pathways implicated in the response to

  5. Non-targeted whole genome 250K SNP array analysis as replacement for karyotyping in fetuses with structural ultrasound anomalies: evaluation of a one-year experience.

    PubMed

    Faas, Brigitte H W; Feenstra, Ilse; Eggink, Alex J; Kooper, Angelique J A; Pfundt, Rolph; van Vugt, John M G; de Leeuw, Nicole

    2012-04-01

    We evaluated both clinical and laboratory aspects of our new strategy offering quantitative fluorescence (QF)-PCR followed by non-targeted whole genome 250K single-nucleotide polymorphism array analysis instead of routine karyotyping for prenatal diagnosis of fetuses with structural anomalies. Upon the detection of structural fetal anomalies, parents were offered a choice between QF-PCR and 250K single-nucleotide polymorphism array analysis (QF/array) or QF-PCR and routine karyotyping (QF/karyo). Two hundred twenty fetal samples were included. In 153/220 cases (70%), QF/array analysis was requested. In 35/153 (23%), an abnormal QF-PCR result was found. The remaining samples were analyzed by array, which revealed clinically relevant aberrations, including two known microdeletions, in 5/118 cases. Inherited copy number variants were detected in 11/118 fetuses, copy number variants with uncertain clinical relevance in 3/118 and homozygous stretches in 2/118. In 67/220 (30%) fetuses, QF/karyo was requested: 23/67 (34%) were abnormal with QF-PCR, and in 3/67, an abnormal karyotype was found. Even though QF/array does not reveal a high percentage of submicroscopic aberrations in fetuses with unselected structural anomalies, it is preferred over QF/karyo, as it provides a whole genome scan at high resolution, without additional tests needed and with a low chance on findings not related to the ultrasound anomalies. © 2012 John Wiley & Sons, Ltd.
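
The case counts in this abstract are easier to follow when laid out as arithmetic. The sketch below only re-derives the percentages from the counts stated above; the variable names and the helper `pct` are illustrative and not taken from the paper.

```python
# Counts as reported in the abstract.
total = 220                 # fetal samples included
qf_array = 153              # parents chose QF-PCR followed by array
qf_karyo = 67               # parents chose QF-PCR followed by karyotyping
abnormal_qf_in_array = 35   # abnormal QF-PCR result within the QF/array group
abnormal_qf_in_karyo = 23   # abnormal QF-PCR result within the QF/karyo group


def pct(numerator: int, denominator: int) -> int:
    """Percentage rounded to the nearest whole number."""
    return round(100 * numerator / denominator)


# Samples that went on to array analysis after a normal QF-PCR result.
array_analyzed = qf_array - abnormal_qf_in_array

print(qf_array + qf_karyo == total)         # the two arms account for all samples
print(pct(qf_array, total))                 # share choosing QF/array
print(pct(abnormal_qf_in_array, qf_array))  # abnormal QF-PCR in the QF/array arm
print(array_analyzed)                       # denominator used for the array findings
print(pct(qf_karyo, total))                 # share choosing QF/karyo
print(pct(abnormal_qf_in_karyo, qf_karyo))  # abnormal QF-PCR in the QF/karyo arm
```

The 118 samples left after removing abnormal QF-PCR results are the denominator for every array finding reported in the abstract (5/118, 11/118, 3/118 and 2/118).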

  6. Evaluating the performance of Affymetrix SNP Array 6.0 platform with 400 Japanese individuals

    PubMed Central

    Nishida, Nao; Koike, Asako; Tajima, Atsushi; Ogasawara, Yuko; Ishibashi, Yoshimi; Uehara, Yasuka; Inoue, Ituro; Tokunaga, Katsushi

    2008-01-01

    Background: With improvements in genotyping technologies, genome-wide association studies with hundreds of thousands of SNPs allow the identification of candidate genetic loci for multifactorial diseases in different populations. However, genotyping errors caused by genotyping platforms or genotype calling algorithms may lead to inflation of false associations between markers and phenotypes. In addition, the number of SNPs available for genome-wide association studies in the Japanese population has been investigated using only 45 samples in the HapMap project, which could lead to an inaccurate estimation of the number of SNPs with low minor allele frequencies. We genotyped 400 Japanese samples in order to estimate the number of SNPs available for genome-wide association studies in the Japanese population and to examine the performance of the current SNP Array 6.0 platform and the genotype calling algorithm "Birdseed". Results: About 20% of the 909,622 SNP markers on the array were revealed to be monomorphic in the Japanese population. Consequently, 661,599 SNPs were available for genome-wide association studies in the Japanese population, after excluding the poorly behaving SNPs. The Birdseed algorithm accurately determined the genotype calls of each sample with a high overall call rate of over 99.5% and a high concordance rate of over 99.8% using more than 48 samples after removing low-quality samples by adjusting QC criteria. Conclusion: Our results confirmed that the SNP Array 6.0 platform reached the level reported by the manufacturer, and thus genome-wide association studies using the SNP Array 6.0 platform have considerable potential to identify candidate susceptibility or resistance genetic factors for multifactorial diseases in the Japanese population, as well as in other populations. PMID:18803882
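
Call rate and concordance rate, the two quality measures quoted in this abstract, can be illustrated with a short sketch. It assumes the usual definitions (call rate as the share of SNPs that received a genotype, concordance as the share of identical calls among SNPs called in both of two runs); the abstract does not spell them out. The genotype strings, the `NN` no-call code, and the function names are made up for the example and are not Birdseed output.

```python
from typing import Sequence

NO_CALL = "NN"  # hypothetical marker for a SNP that could not be genotyped


def call_rate(genotypes: Sequence[str]) -> float:
    """Share of SNPs that received a genotype call."""
    called = sum(g != NO_CALL for g in genotypes)
    return called / len(genotypes)


def concordance_rate(run_a: Sequence[str], run_b: Sequence[str]) -> float:
    """Share of identical calls among SNPs called in both runs."""
    pairs = [(a, b) for a, b in zip(run_a, run_b) if a != NO_CALL and b != NO_CALL]
    if not pairs:
        return 0.0
    return sum(a == b for a, b in pairs) / len(pairs)


# Two made-up genotyping runs of the same sample over eight SNPs.
run_a = ["AA", "AB", "BB", "NN", "AB", "AA", "BB", "AB"]
run_b = ["AA", "AB", "BB", "AB", "AA", "AA", "NN", "AB"]

print(call_rate(run_a))                # 7 of 8 SNPs called
print(concordance_rate(run_a, run_b))  # 5 of the 6 jointly called SNPs agree
```

Excluding no-calls from the concordance denominator keeps the two measures independent, so a low call rate cannot masquerade as poor concordance.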

  7. A Single-Array-Based Method for Detecting Copy Number Variants Using Affymetrix High Density SNP Arrays and its Application to Breast Cancer.

    PubMed

    Li, Ming; Wen, Yalu; Fu, Wenjiang

    2014-01-01

    Cumulative evidence has shown that structural variations, due to insertions, deletions, and inversions of DNA, may contribute considerably to the development of complex human diseases, such as breast cancer. High-throughput genotyping technologies, such as Affymetrix high density single-nucleotide polymorphism (SNP) arrays, have produced large amounts of genetic data for genome-wide SNP genotype calling and copy number estimation. Meanwhile, there is a great need for accurate and efficient statistical methods to detect copy number variants. In this article, we introduce a hidden-Markov-model (HMM)-based method, referred to as the PICR-CNV, for copy number inference. The proposed method first estimates copy number abundance for each single SNP on a single array based on the raw fluorescence values, and then standardizes the estimated copy number abundance to achieve equal footing among multiple arrays. This method requires no between-array normalization, and thus, maintains data integrity and independence of samples among individual subjects. In addition to our efforts to apply new statistical technology to raw fluorescence values, the HMM has been applied to the standardized copy number abundance in order to reduce experimental noise. Through simulations, we show our refined method is able to infer copy number variants accurately. Application of the proposed method to a breast cancer dataset helps to identify genomic regions significantly associated with the disease.

  8. A Single-Array-Based Method for Detecting Copy Number Variants Using Affymetrix High Density SNP Arrays and its Application to Breast Cancer

    PubMed Central

    Li, Ming; Wen, Yalu; Fu, Wenjiang

    2014-01-01

    Cumulative evidence has shown that structural variations, due to insertions, deletions, and inversions of DNA, may contribute considerably to the development of complex human diseases, such as breast cancer. High-throughput genotyping technologies, such as Affymetrix high density single-nucleotide polymorphism (SNP) arrays, have produced large amounts of genetic data for genome-wide SNP genotype calling and copy number estimation. Meanwhile, there is a great need for accurate and efficient statistical methods to detect copy number variants. In this article, we introduce a hidden-Markov-model (HMM)-based method, referred to as the PICR-CNV, for copy number inference. The proposed method first estimates copy number abundance for each single SNP on a single array based on the raw fluorescence values, and then standardizes the estimated copy number abundance to achieve equal footing among multiple arrays. This method requires no between-array normalization, and thus, maintains data integrity and independence of samples among individual subjects. In addition to our efforts to apply new statistical technology to raw fluorescence values, the HMM has been applied to the standardized copy number abundance in order to reduce experimental noise. Through simulations, we show our refined method is able to infer copy number variants accurately. Application of the proposed method to a breast cancer dataset helps to identify genomic regions significantly associated with the disease. PMID:26279618

  9. Comparison of genotyping using pooled DNA samples (allelotyping) and individual genotyping using the affymetrix genome-wide human SNP array 6.0.

    PubMed

    Teumer, Alexander; Ernst, Florian D; Wiechert, Anja; Uhr, Katharina; Nauck, Matthias; Petersmann, Astrid; Völzke, Henry; Völker, Uwe; Homuth, Georg

    2013-07-26

    Genome-wide association studies (GWAS) using array-based genotyping technology are widely used to identify genetic loci associated with complex diseases or other phenotypes. The costs of GWAS projects based on individual genotyping are still comparatively high and increase with the size of study populations. Genotyping using pooled DNA samples, also referred to as the allelotyping approach, offers an alternative at affordable costs. In the present study, data from 100 DNA samples individually genotyped with the Affymetrix Genome-Wide Human SNP Array 6.0 were used to estimate the error of the pooling approach by comparing the results with those obtained using the same array type but DNA pools each composed of 50 of the same samples. Newly developed and established methods for signal intensity correction were applied. Furthermore, the relative allele intensity signals (RAS) obtained by allelotyping were compared to the corresponding values derived from individual genotyping. Similarly, differences in RAS values between pools were determined and compared. Regardless of the intensity correction method applied, the pooling-specific error of the pool intensity values was larger for single pools than for the comparison of the intensity values of two pools, which reflects the scenario of a case-control study. Using 50 pooled samples and analyzing 10,000 SNPs with a minor allele frequency of >1% and applying the best correction method for the corresponding type of comparison, the 90% quantile (median) of the pooling-specific absolute error of the RAS values for single sub-pools and the SNP-specific difference in allele frequency comparing two pools was 0.064 (0.026) and 0.056 (0.021), respectively. Correction of the RAS values reduced the error of the RAS values when analyzing single pool intensities. We developed a new correction method with high accuracy but low computational costs. Correction of RAS, however, only marginally reduced the error of true differences
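
The pooling-specific error discussed in this abstract compares a pool's relative allele signal with the allele frequency obtained from individual genotypes. A minimal sketch follows, assuming the common uncorrected definition RAS = A / (A + B) for the two allele intensities; the abstract does not state the formula, and none of the correction methods it mentions are reproduced here. All intensities, genotypes, and function names are invented for illustration.

```python
from typing import Sequence


def ras(intensity_a: float, intensity_b: float) -> float:
    """Relative allele signal of allele A (assumed uncorrected definition)."""
    return intensity_a / (intensity_a + intensity_b)


def allele_a_frequency(genotypes: Sequence[str]) -> float:
    """Frequency of allele A computed from individual diploid genotypes."""
    count_a = sum(g.count("A") for g in genotypes)
    return count_a / (2 * len(genotypes))


# Made-up individual genotypes for one SNP and made-up pool intensities.
genotypes = ["AA", "AB", "AB", "BB", "AA"]
pool_a, pool_b = 1300.0, 700.0

true_freq = allele_a_frequency(genotypes)  # 6 A alleles out of 10
pool_estimate = ras(pool_a, pool_b)        # 1300 / 2000
print(true_freq, pool_estimate, abs(pool_estimate - true_freq))
```

Repeating this per SNP and summarizing the absolute differences by median and 90% quantile gives the kind of error figures the abstract reports.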

  10. Whole genome-wide association study using affymetrix SNP chip: a two-stage sequential selection method to identify genes that increase the risk of developing complex diseases.

    PubMed

    Yang, Howard H; Hu, Nan; Taylor, Philip R; Lee, Maxwell P

    2008-01-01

    Whole-genome association studies of complex diseases hold great promise to identify systematically genetic loci that influence one's risk of developing these diseases. However, the polygenic nature of the complex diseases and genetic interactions among the genes pose a significant challenge in both experimental design and data analysis. High-density genotype data make it possible to identify most of the genetic loci that may be involved in the etiology. On the other hand, utilizing a large number of statistical tests could lead to false positives if the tests are not adequately adjusted. In this paper, we discuss a two-stage method that sequentially applies a generalized linear model (GLM) and principal components analysis (PCA) to identify genetic loci that jointly determine the likelihood of developing disease. The method was applied to a pilot case-control study of esophageal squamous cell carcinoma (ESCC) that included 50 ESCC patients and 50 neighborhood-matched controls. Genotype data were determined by using the Affymetrix 10K SNP chip. We will discuss some of the special considerations that are important to the proper interpretation of whole genome-wide association studies, which include multiple comparisons, epistatic interaction among multiple genetic loci, and generalization of predictive models.

  11. 250-kW CW klystron amplifier for planetary radar

    NASA Technical Reports Server (NTRS)

    Cormier, Reginald A.; Mizuhara, Albert

    1992-01-01

    The design, construction, and performance testing are described for two Varian klystrons, model VKX-7864A, which replaced the aging and less efficient VA-949J klystrons in the X band planetary radar transmitter on the Goldstone, CA, 70 meter antenna. The project was carried out jointly by the JPL and Varian Assoc. Output power was increased from 200 to 250 kW continuous wave per klystron, and full dc beam power is dissipated in the collector (it was not possible to operate the VA-949J klystrons without RF drive because of limited collector dissipation capability). Replacements were made with a minimum of transmitter modifications. The planetary radar transmitter is now operating successfully with these two klystrons.

  12. Fine-scaled human genetic structure revealed by SNP microarrays.

    PubMed

    Xing, Jinchuan; Watkins, W Scott; Witherspoon, David J; Zhang, Yuhua; Guthery, Stephen L; Thara, Rangaswamy; Mowry, Bryan J; Bulayeva, Kazima; Weiss, Robert B; Jorde, Lynn B

    2009-05-01

    We report an analysis of more than 240,000 loci genotyped using the Affymetrix SNP microarray in 554 individuals from 27 worldwide populations in Africa, Asia, and Europe. To provide a more extensive and complete sampling of human genetic variation, we have included caste and tribal samples from two states in South India, Daghestanis from eastern Europe, and the Iban from Malaysia. Consistent with observations made by Charles Darwin, our results highlight shared variation among human populations and demonstrate that much genetic variation is geographically continuous. At the same time, principal components analyses reveal discernible genetic differentiation among almost all identified populations in our sample, and in most cases, individuals can be clearly assigned to defined populations on the basis of SNP genotypes. All individuals are accurately classified into continental groups using a model-based clustering algorithm, but between closely related populations, genetic and self-classifications conflict for some individuals. The 250K data permitted high-level resolution of genetic variation among Indian caste and tribal populations and between highland and lowland Daghestani populations. In particular, upper-caste individuals from Tamil Nadu and Andhra Pradesh form one defined group, lower-caste individuals from these two states form another, and the tribal Irula samples form a third. Our results emphasize the correlation of genetic and geographic distances and highlight other elements, including social factors that have contributed to population structure.

  13. DMET-Analyzer: automatic analysis of Affymetrix DMET Data

    PubMed Central

    2012-01-01

    Background: Clinical Bioinformatics is currently growing and is based on the integration of clinical and omics data aiming at the development of personalized medicine. Thus the introduction of novel technologies able to investigate the relationship among clinical states and biological machineries may help the development of this field. For instance the Affymetrix DMET platform (drug metabolism enzymes and transporters) is able to study the relationship among the variation of the genome of patients and drug metabolism, detecting SNPs (Single Nucleotide Polymorphism) on genes related to drug metabolism. This may allow for instance to find genetic variants in patients which present different drug responses, in pharmacogenomics and clinical studies. Despite this, there is currently a lack in the development of open-source algorithms and tools for the analysis of DMET data. Existing software tools for DMET data generally allow only the preprocessing of binary data (e.g. the DMET-Console provided by Affymetrix) and simple data analysis operations, but do not allow to test the association of the presence of SNPs with the response to drugs. Results: We developed DMET-Analyzer, a tool for the automatic association analysis among the variation of the patient genomes and the clinical conditions of patients, i.e. the different response to drugs. The proposed system allows: (i) to automatize the workflow of analysis of DMET-SNP data avoiding the use of multiple tools; (ii) the automatic annotation of DMET-SNP data and the search in existing databases of SNPs (e.g. dbSNP), (iii) the association of SNP with pathway through the search in PharmGKB, a major knowledge base for pharmacogenomic studies. DMET-Analyzer has a simple graphical user interface that allows users (doctors/biologists) to upload and analyse DMET files produced by Affymetrix DMET-Console in an interactive way. The effectiveness and easy use of DMET Analyzer is demonstrated through different case studies regarding

  14. Rawcopy: Improved copy number analysis with Affymetrix arrays

    PubMed Central

    Mayrhofer, Markus; Viklund, Björn; Isaksson, Anders

    2016-01-01

    Microarray data is subject to noise and systematic variation that negatively affects the resolution of copy number analysis. We describe Rawcopy, an R package for processing of Affymetrix CytoScan HD, CytoScan 750k and SNP 6.0 microarray raw intensities (CEL files). Noise characteristics of a large number of reference samples are used to estimate log ratio and B-allele frequency for total and allele-specific copy number analysis. Rawcopy achieves better signal-to-noise ratio and higher proportion of validated alterations than commonly used free and proprietary alternatives. In addition, Rawcopy visualizes each microarray sample for assessment of technical quality, patient identity and genome-wide absolute copy number states. Software and instructions are available at http://rawcopy.org. PMID:27796336
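
Log ratio and B-allele frequency, the two quantities named in this abstract, can be illustrated in their simplest textbook form. The sketch below is not how Rawcopy computes them (the abstract says it draws on noise characteristics of many reference samples); it assumes a single reference total intensity per probe and a naive B / (A + B) allele fraction. The intensities and function names are invented.

```python
import math


def log_ratio(intensity_a: float, intensity_b: float, reference_total: float) -> float:
    """log2 of the sample's total intensity over a reference total intensity."""
    return math.log2((intensity_a + intensity_b) / reference_total)


def naive_baf(intensity_a: float, intensity_b: float) -> float:
    """Naive B-allele fraction; real pipelines calibrate against reference clusters."""
    return intensity_b / (intensity_a + intensity_b)


# Made-up probe: the reference (two copies) has a total intensity of 2000.
reference_total = 2000.0

# Balanced two-copy heterozygote (AB): no change in total, half the signal from B.
print(log_ratio(1000.0, 1000.0, reference_total), naive_baf(1000.0, 1000.0))

# Idealized three-copy gain (AAB): total rises by half, a third of the signal from B.
print(log_ratio(2000.0, 1000.0, reference_total), naive_baf(2000.0, 1000.0))
```

Taken together the two values separate events that either one alone cannot: a copy-neutral loss of heterozygosity leaves the log ratio near zero but moves the B-allele fraction away from one half.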

  15. A survey and new measurements of ice vapor pressure at temperatures between 170 and 250K

    NASA Technical Reports Server (NTRS)

    Marti, James; Mauersberger, Konrad

    1993-01-01

    New measurements of ice vapor pressures at temperatures between 170 and 250 K are presented and published vapor pressure data are summarized. An empirical vapor pressure equation was derived and allows prediction of vapor pressures between 170 K and the triple point of water with an accuracy of approximately 2 percent. Predictions obtained agree, within experimental uncertainty, with the most reliable equation derived from thermodynamic principles.

  16. The infrared optical constants of sulfuric acid at 250 K. [spectral reflectance measurement of aqueous solutions]

    NASA Technical Reports Server (NTRS)

    Pinkley, L. W.; Williams, D.

    1976-01-01

    Results are presented for measurements of the IR spectral reflectance at near-normal incidence of aqueous solutions of sulfuric acid with acid concentrations of 75% and 95.6% by weight. Kramers-Kronig analyses of the reflectance data are employed to obtain values of the optical constants n(nu) and k(nu) in the spectral range from 400 to 6000 cm^-1. The optical constants of these solutions at 250 K and 300 K are compared. It is found that in spectral regions remote from strong absorption bands, the values of the n(nu) indices obtained at 250 K agree with the values given by Lorentz-Lorenz correction of the same indices at 300 K. All absorption bands observed at 300 K are found to be present at 250 K with slight shifts in frequency and with significant differences in the k(nu) indices at the band maxima. Based on these results, it is concluded that the clouds of Venus probably consist of droplets of aqueous solutions of sulfuric acid with acid concentrations of about 75% by weight.

  17. Micro-Analyzer: automatic preprocessing of Affymetrix microarray data.

    PubMed

    Guzzi, Pietro Hiram; Cannataro, Mario

    2013-08-01

    A current trend in genomics is the investigation of the cell mechanism using different technologies, in order to explain the relationship among genes, molecular processes and diseases. For instance, the combined use of gene-expression arrays and genomic arrays has been demonstrated as an effective instrument in clinical practice. Consequently, in a single experiment different kinds of microarrays may be used, resulting in the production of different types of binary data (images and textual raw data). The analysis of microarray data requires an initial preprocessing phase that makes raw data suitable for use on existing analysis platforms, such as the TIGR M4 (TM4) Suite. An additional challenge to be faced by emerging data analysis platforms is the ability to treat in a combined way those different microarray formats coupled with clinical data. In fact, resulting integrated data may include both numerical and symbolic data (e.g. gene expression and SNPs regarding molecular data), as well as temporal data (e.g. the response to a drug, time to progression and survival rate), regarding clinical data. Raw data preprocessing is a crucial step in analysis but is often performed in a manual and error-prone way using different software tools. Thus novel, platform independent, and possibly open source tools enabling the semi-automatic preprocessing and annotation of different microarray data are needed. The paper presents Micro-Analyzer (Microarray Analyzer), a cross-platform tool for the automatic normalization, summarization and annotation of Affymetrix gene expression and SNP binary data. It represents the evolution of the μ-CS tool, extending the preprocessing to SNP arrays that were not allowed in μ-CS. The Micro-Analyzer is provided as a Java standalone tool and enables users to read, preprocess and analyse binary microarray data (gene expression and SNPs) by invoking TM4 platform. It avoids: (i) the manual invocation of external tools (e.g. the Affymetrix Power

  18. Three-dimensional magnetic cloak working from d.c. to 250 kHz

    NASA Astrophysics Data System (ADS)

    Zhu, Jianfei; Jiang, Wei; Liu, Yichao; Yin, Ge; Yuan, Jun; He, Sailing; Ma, Yungui

    2015-11-01

    Invisible cloaking is one of the major outcomes of the metamaterial research, but the practical potential, in particular for high frequencies (for example, microwave to visible light), is fatally challenged by the complex material properties they usually demand. On the other hand, it will be advantageous and also technologically instrumental to design cloaking devices for applications at low frequencies where electromagnetic components are favourably uncoupled. In this work, we vastly develop the bilayer approach to create a three-dimensional magnetic cloak able to work in both static and dynamic fields. Under the quasi-static approximation, we demonstrate a perfect magnetic cloaking device with a large frequency band from 0 to 250 kHz. The practical potential of our device is experimentally verified by using a commercial metal detector, which may lead us to having a real cloaking application where the dynamic magnetic field can be manipulated in desired ways.

  19. 250 kA compact linear transformer driver for wire array z-pinch loads

    NASA Astrophysics Data System (ADS)

    Bott, S. C.; Haas, D. M.; Madden, R. E.; Ueda, U.; Eshaq, Y.; Collins, G., IV; Gunasekera, K.; Mariscal, D.; Peebles, J.; Beg, F. N.; Mazarakis, M.; Struve, K.; Sharpe, R.

    2011-05-01

    We present the application of a short rise (~150 ns) 250 kA linear transformer driver (LTD) to wire array z-pinch loads for the first time. The generator is a modification of a previous driver in which a new conical power feed provides a low inductance coupling to wire loads. Performance of the new design using both short circuit and plasma loads is presented and discussed. The final design delivers ~200 kA to a wire array load which is in good agreement with SCREAMER calculations using a simplified representative circuit. Example results demonstrate successful experiments using cylindrical, conical, and inverse wire arrays as well as previously published work on x-pinch loads.

  20. Design and development of collector for C-band 250 kW CW Klystron

    NASA Astrophysics Data System (ADS)

    Baloda, Suman; Lamba, O. S.; Kaushik, Meenu; Richa; Bansal, Prachi; Kumud; Pradeep; Kant, D.; Joshi, L. M.

    2012-11-01

    The paper presents the design and development of a collector for a C-band 250 kW high power klystron. The design criteria for the collector assembly are selection of material, vacuum and high temperature compatibility, proper electron beam dispersion, minimum back scattering of electrons and thermal design for proper cooling at high power dissipation. All these aspects are discussed in detail for the collector development. The collector has been designed in TRAK and then beam propagation has been analyzed in MAGIC 2D software. The thermal simulation has been done using ANSYS 11.0 (multi-physics). The outer surface of the collector has been grooved to facilitate its proper cooling. Design results are presented for water cooling with different flow rates and channel dimensions. OFHC copper is chosen for the collector because it is suitable for vacuum and hydrogen brazing operations and has good thermal properties for efficient cooling.

  1. The cause and effect of power fluctuations near 250 kW

    SciTech Connect

    Church, L.B.

    1980-07-01

    In a 250 kW Mark I TRIGA, power fluctuations to an extent of 8% (±4%) over a one-minute interval have been observed in three independent channels. These random and sudden changes are removed by the 'automatic' mode and are present only when the primary water system is on. Simultaneous electrical interference with the three channels from an external source (i.e., the primary pump) has been ruled out as a possible cause; so has the possible movement of the control rods, neutron chambers and lazy susan shafts by the primary water flow. A monitoring of water temperature above the core revealed changes by as much as 13 °C in about 10 seconds. It is thought (although not completely understood) that these temperature variations are the cause of the observed power fluctuations. (author)

  2. The 250-kW CW klystron amplifier for planetary radar

    NASA Technical Reports Server (NTRS)

    Cormier, R.; Mizuhara, A.

    1992-01-01

    This report describes the design, construction, and performance testing of two Varian klystrons, model VKX-7864A, which replaced the aging and less efficient VA-949J klystrons in the X band planetary radar transmitter on the Goldstone, CA, 70 meter antenna. The project was carried out jointly by the JPL and Varian Assoc. Output power was increased from 200 to 250 kW continuous wave per klystron, and full dc beam power is dissipated in the collector (it was not possible to operate the VA-949J klystrons without RF drive because of limited collector dissipation capability). Replacements were made with a minimum of transmitter modifications. The planetary radar transmitter is now operating successfully with these two klystrons.

  3. 250 kW flywheel with HTS magnetic bearing for industrial use

    NASA Astrophysics Data System (ADS)

    Werfel, F. N.; Floegel-Delor, U.; Riedel, T.; Rothfeld, R.; Wippich, D.; Goebel, B.; Reiner, G.; Wehlau, N.

    2008-02-01

    A 250 kW / 5 kWh engineering prototype Flywheel Energy Storage System (FESS) was designed, fabricated and component tested by Adelwitz Technologiezentrum GmbH (ATZ) and L-3 Communications Magnet-Motor GmbH (MM). A heavy-load vertical 0.6 ton rotor is suspended totally magnetically by an HTS radial-passive bearing on the top together with a PM bearing at the bottom. Further features are the flywheel rotor body, which is manufactured from carbon fibre reinforced plastics (CFRP) in a multi-rim version and combined with an integrated high-power motor/generator. A 35 W/77 K single-stage Gifford McMahon cryo-cooler is cooling the HTS bearing to a temperature of 45-60 K. Functionality and efficiency of the magnetic bearing configurations, rotor control concepts and motor/generator power electric system are considered and established. Bearing stiffness parameters, damping performance, and rotational friction are measured. Testing of further components under vacuum conditions confirmed that low bearing drag and wear-free operation can be attained. The motor-generator operates with a power in excess of 250 kW and an efficiency of > 92%, including the losses of the inverters. A redundant mechanical touchdown bearing system can be activated to restore the rotor position. The separately tested flywheel components are now being assembled, with first machine tests expected in November 2007. After studying and measuring all FESS parameters in-house, the dynamical storage device will be tested in a German E.ON power station under industrial conditions.
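
    As a rough reading of the 250 kW / 5 kWh rating, the sketch below converts the stored energy to megajoules and estimates how long the rated power could be sustained. It assumes the full 5 kWh is usable and ignores conversion losses, neither of which the abstract states; the function names are illustrative only.

```python
def kwh_to_megajoules(energy_kwh: float) -> float:
    """Convert kilowatt-hours to megajoules (1 kWh = 3.6 MJ)."""
    return energy_kwh * 3.6


def full_power_discharge_seconds(energy_kwh: float, power_kw: float) -> float:
    """Seconds of operation at constant power, assuming all stored energy
    is usable and there are no losses (an idealisation)."""
    if power_kw <= 0:
        raise ValueError("power_kw must be positive")
    return energy_kwh * 3600.0 / power_kw


if __name__ == "__main__":
    # Ratings quoted in the abstract: 5 kWh of storage, 250 kW of power.
    print(kwh_to_megajoules(5.0))
    print(full_power_discharge_seconds(5.0, 250.0))
```

    Under these assumptions the rating corresponds to roughly 18 MJ of stored energy and a little over a minute of output at full power, which is consistent with a short-duration, high-power storage device.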

  4. SNP array-based whole genome homozygosity mapping as the first step to a molecular diagnosis in patients with Charcot-Marie-Tooth disease.

    PubMed

    Fischer, Carina; Trajanoski, Slave; Papić, Lea; Windpassinger, Christian; Bernert, Günther; Freilinger, Michael; Schabhüttl, Maria; Arslan-Kirchner, Mine; Javaher-Haghighi, Poupak; Plecko, Barbara; Senderek, Jan; Rauscher, Christian; Löscher, Wolfgang N; Pieber, Thomas R; Janecke, Andreas R; Auer-Grumbach, Michaela

    2012-03-01

    Considerable non-allelic heterogeneity for autosomal recessively inherited Charcot-Marie-Tooth (ARCMT) disease has challenged molecular testing and often requires a large amount of work in terms of DNA sequencing and data interpretation or remains unpractical. This study tested the value of SNP array-based whole-genome homozygosity mapping as a first step in the molecular genetic diagnosis of sporadic or ARCMT in patients from inbred families or outbred populations with the ancestors originating from the same geographic area. Using 10 K 2.0 and 250 K Nsp Affymetrix SNP arrays, 15 (63%) of 24 CMT patients received an accurate genetic diagnosis. We used our Java-based script eHoPASA CMT-easy Homozygosity Profiling of SNP arrays for CMT patients to display the location of homozygous regions and their extent of marker count and base-pairs throughout the whole genome. CMT4C was the most common genetic subtype with mutations detected in SH3TC2, one (p.E632Kfs13X) appearing to be a novel founder mutation. A sporadic patient with severe CMT was homozygous for the c.250G > C (p.G84R) HSPB1 mutation which has previously been reported to cause autosomal dominant dHMN. Two distantly related CMT1 patients with early disease onset were found to carry a novel homozygous mutation in MFN2 (p.N131S). We conclude that SNP array-based homozygosity mapping is a fast, powerful, and economic tool to guide molecular genetic testing in ARCMT and in selected sporadic CMT patients.
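
    The homozygosity profiling described above can be illustrated with a minimal sketch that scans ordered SNP calls on one chromosome and reports each homozygous stretch with its marker count and base-pair extent. This is not the authors' eHoPASA script; the genotype coding ("AA", "AB", "BB"), the function name and the `min_markers` threshold are assumptions made for illustration, and no-calls are simply treated as breaking a run.

```python
from typing import List, Sequence, Tuple

Run = Tuple[int, int, int, int]  # (start_bp, end_bp, marker_count, length_bp)


def homozygous_runs(positions: Sequence[int],
                    genotypes: Sequence[str],
                    min_markers: int = 3) -> List[Run]:
    """Find runs of consecutive homozygous calls on one chromosome.

    positions  -- base-pair coordinates, sorted ascending
    genotypes  -- one call per position: "AA", "BB" (homozygous) or anything
                  else (heterozygous or no-call, which ends a run)
    """
    if len(positions) != len(genotypes):
        raise ValueError("positions and genotypes must have the same length")

    runs: List[Run] = []
    start = None  # index of the first marker in the current run
    # A sentinel heterozygous call closes a run that reaches the last marker.
    for i, call in enumerate(list(genotypes) + ["AB"]):
        if call in ("AA", "BB"):
            if start is None:
                start = i
        elif start is not None:
            count = i - start
            if count >= min_markers:
                first, last = positions[start], positions[i - 1]
                runs.append((first, last, count, last - first + 1))
            start = None
    return runs


if __name__ == "__main__":
    pos = [100, 200, 300, 400, 500, 600, 700]
    calls = ["AA", "BB", "AA", "AB", "BB", "BB", "AA"]
    for run in homozygous_runs(pos, calls):
        print(run)
```

    In practice a much larger marker threshold and a tolerance for occasional genotyping errors would be needed; the small threshold here only keeps the example short.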

  5. A comparison of Affymetrix gene expression arrays.

    PubMed

    Robinson, Mark D; Speed, Terence P

    2007-11-15

    Affymetrix GeneChips are an important tool in many facets of biological research. Recently, notable design changes to the chips have been made. In this study, we use publicly available data from Affymetrix to gauge the performance of three human gene expression arrays: Human Genome U133 Plus 2.0 (U133), Human Exon 1.0 ST (HuEx) and Human Gene 1.0 ST (HuGene). We studied probe-, exon- and gene-level reproducibility of technical and biological replicates from each of the 3 platforms. The U133 array has larger feature sizes so it is no surprise that probe-level variances are smaller, however the larger number of probes per gene on the HuGene array seems to produce gene-level summaries that have similar variances. The gene-level summaries of the HuEx array are less reproducible than the other two, despite having the largest average number of probes per gene. Greater than 80% of the content on the HuEx arrays is expressed at or near background. Biological variation seems to have a smaller effect on U133 data. Comparing the overlap of differentially expressed genes, we see a high overall concordance among all 3 platforms, with HuEx and HuGene having greater overlap, as expected given their design. We performed an analysis of detection rates and area under ROC curves using an experiment made up of several mixtures of 2 human tissues. Though it appears that the HuEx array has worse performance in terms of detection rates, all arrays have similar ability to separate differentially expressed and non-differentially expressed genes. Despite noticeable differences in the probe-level reproducibility, gene-level reproducibility and differential expression detection are quite similar across the three platforms. The HuEx array, an all-encompassing array, has the flexibility of measuring all known or predicted exonic content. However, the HuEx array induces poorer reproducibility for genes with fewer exons. The HuGene measures just the well-annotated genome content and appears to

  6. SNP Arrays

    PubMed Central

    Louhelainen, Jari

    2016-01-01

    The papers published in this Special Issue “SNP arrays” (Single Nucleotide Polymorphism Arrays) focus on several perspectives associated with arrays of this type. The range of papers vary from a case report to reviews, thereby targeting wider audiences working in this field. The research focus of SNP arrays is often human cancers but this Issue expands that focus to include areas such as rare conditions, animal breeding and bioinformatics tools. Given the limited scope, the spectrum of papers is nothing short of remarkable and even from a technical point of view these papers will contribute to the field at a general level. Three of the papers published in this Special Issue focus on the use of various SNP array approaches in the analysis of three different cancer types. Two of the papers concentrate on two very different rare conditions, applying the SNP arrays slightly differently. Finally, two other papers evaluate the use of the SNP arrays in the context of genetic analysis of livestock. The findings reported in these papers help to close gaps in the current literature and also to give guidelines for future applications of SNP arrays. PMID:27792140

  7. Celsius: a community resource for Affymetrix microarray data.

    PubMed

    Day, Allen; Carlson, Marc R J; Dong, Jun; O'Connor, Brian D; Nelson, Stanley F

    2007-01-01

    Celsius is a data warehousing system to aggregate Affymetrix CEL files and associated metadata. It provides mechanisms for importing, storing, querying, and exporting large volumes of primary and pre-processed microarray data. Celsius contains ten billion assay measurements and affiliated metadata. It is the largest publicly available source of Affymetrix microarray data, and through sheer volume it allows a sophisticated, broad view of transcription that has not previously been possible.

  8. 250 kV 6 mA compact Cockcroft-Walton high-voltage power supply

    SciTech Connect

    Ma, Zhan-Wen; Su, Xiao-Dong; Wei, Zhen; Huang, Zhi-Wu; Miao, Tian-You; Su, Tong-Ling; Lu, Xiao-Long; Wang, Jun-Run; Yao, Ze-En

    2016-08-15

    A compact power supply system for a compact neutron generator has been developed. A 4-stage symmetrical Cockcroft-Walton circuit is adopted to produce 250 kV direct current high-voltage. A 2-stage 280 kV isolation transformer system is used to drive the ion source power supply. For a compact structure, safety, and reliability during the operation, the Cockcroft-Walton circuit and the isolation transformer system are enclosed in an epoxy vessel containing the transformer oil whose size is about ∅350 mm × 766 mm. Test results indicate that the maximum output voltage of the power supply is 282 kV, and the stability of the output voltage is better than 0.63% when the high voltage power supply is operated at 250 kV, 6.9 mA with the input voltage varying ±10%.

  9. 250 kV 6 mA compact Cockcroft-Walton high-voltage power supply

    NASA Astrophysics Data System (ADS)

    Ma, Zhan-Wen; Su, Xiao-Dong; Lu, Xiao-Long; Wei, Zhen; Wang, Jun-Run; Huang, Zhi-Wu; Miao, Tian-You; Su, Tong-Ling; Yao, Ze-En

    2016-08-01

    A compact power supply system for a compact neutron generator has been developed. A 4-stage symmetrical Cockcroft-Walton circuit is adopted to produce 250 kV direct current high-voltage. A 2-stage 280 kV isolation transformer system is used to drive the ion source power supply. For a compact structure, safety, and reliability during the operation, the Cockcroft-Walton circuit and the isolation transformer system are enclosed in an epoxy vessel containing the transformer oil whose size is about ∅350 mm × 766 mm. Test results indicate that the maximum output voltage of the power supply is 282 kV, and the stability of the output voltage is better than 0.63% when the high voltage power supply is operated at 250 kV, 6.9 mA with the input voltage varying ±10%.

  10. 250 kV 6 mA compact Cockcroft-Walton high-voltage power supply.

    PubMed

    Ma, Zhan-Wen; Su, Xiao-Dong; Lu, Xiao-Long; Wei, Zhen; Wang, Jun-Run; Huang, Zhi-Wu; Miao, Tian-You; Su, Tong-Ling; Yao, Ze-En

    2016-08-01

    A compact power supply system for a compact neutron generator has been developed. A 4-stage symmetrical Cockcroft-Walton circuit is adopted to produce 250 kV direct current high-voltage. A 2-stage 280 kV isolation transformer system is used to drive the ion source power supply. For a compact structure, safety, and reliability during the operation, the Cockcroft-Walton circuit and the isolation transformer system are enclosed in an epoxy vessel containing the transformer oil whose size is about ∅350 mm × 766 mm. Test results indicate that the maximum output voltage of the power supply is 282 kV, and the stability of the output voltage is better than 0.63% when the high voltage power supply is operated at 250 kV, 6.9 mA with the input voltage varying ±10%.

  11. Performance and Design Analysis of a 250-kW, Grid-Connected Battery Energy Storage System

    SciTech Connect

    Ball, Greg J.; Norris, Benjamin L.

    1999-06-01

    This report documents the assessment of performance and design of a 250-kW prototype battery energy storage system developed by Omnion Power Engineering Company and tested by Pacific Gas and Electric Company, both in collaboration with Sandia National Laboratories. The assessment included system performance, operator interface, and reliability. The report also discusses how to detect failed battery strings with strategically located voltage measurements.

  12. Inference of kinship coefficients from Korean SNP genotyping data.

    PubMed

    Park, Seong-Jin; Yang, Jin Ok; Kim, Sang Cheol; Kwon, Jekeun; Lee, Sanghyuk; Lee, Byungwook

    2013-06-01

    The determination of relatedness between individuals in a family is crucial in analysis of common complex diseases. We present a method to infer close inter-familial relationships based on SNP genotyping data and provide the relationship coefficient of kinship in Korean families. We obtained blood samples from 43 Korean individuals in two families. SNP data was obtained using the Affymetrix Genome-wide Human SNP array 6.0 and the Illumina Human 1M-Duo chip. To measure the kinship coefficient with the SNP genotyping data, we considered all possible pairs of individuals in each family. The genetic distance between two individuals in a pair was determined using the allele sharing distance method. The results show that genetic distance is proportional to the kinship coefficient and that a close degree of kinship can be confirmed with SNP genotyping data. This study represents the first attempt to identify the genetic distance between very closely related individuals.
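
    The abstract does not give the exact formula used, so the sketch below shows one common form of the allele sharing distance: genotypes are coded as 0, 1 or 2 copies of a reference allele, and the distance between two individuals is the mean absolute difference over SNPs genotyped in both. The coding, the handling of missing calls (`None`) and the function names are assumptions for illustration.

```python
from itertools import combinations
from typing import Dict, Optional, Sequence, Tuple

Genotype = Optional[int]  # 0, 1 or 2 copies of the reference allele; None = missing


def allele_sharing_distance(g1: Sequence[Genotype], g2: Sequence[Genotype]) -> float:
    """Mean per-SNP distance (0 = identical genotype, 2 = opposite homozygotes)."""
    total = 0
    used = 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue  # skip SNPs missing in either individual
        total += abs(a - b)
        used += 1
    if used == 0:
        raise ValueError("no SNPs genotyped in both individuals")
    return total / used


def pairwise_distances(samples: Dict[str, Sequence[Genotype]]) -> Dict[Tuple[str, str], float]:
    """Distance for every unordered pair of individuals, as in an all-pairs family analysis."""
    return {
        (x, y): allele_sharing_distance(samples[x], samples[y])
        for x, y in combinations(sorted(samples), 2)
    }


if __name__ == "__main__":
    family = {
        "parent": [0, 1, 2, 2, 1],
        "child": [0, 1, 1, 2, 0],
        "unrelated": [2, 0, 0, None, 2],
    }
    for pair, dist in pairwise_distances(family).items():
        print(pair, round(dist, 3))
```

    With real data, the resulting distances for each pair would then be compared against the expected kinship coefficient for that relationship.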

  13. Automated SNP genotype clustering algorithm to improve data completeness in high-throughput SNP genotyping datasets from custom arrays.

    PubMed

    Smith, Edward M; Littrell, Jack; Olivier, Michael

    2007-12-01

    High-throughput SNP genotyping platforms use automated genotype calling algorithms to assign genotypes. While these algorithms work efficiently for individual platforms, they are not compatible with other platforms, and have individual biases that result in missed genotype calls. Here we present data on the use of a second complementary SNP genotype clustering algorithm. The algorithm was originally designed for individual fluorescent SNP genotyping assays, and has been optimized to permit the clustering of large datasets generated from custom-designed Affymetrix SNP panels. In an analysis of data from a 3K array genotyped on 1,560 samples, the additional analysis increased the overall number of genotypes by over 45,000, significantly improving the completeness of the experimental data. This analysis suggests that the use of multiple genotype calling algorithms may be advisable in high-throughput SNP genotyping experiments. The software is written in Perl and is available from the corresponding author.

  14. Construction and start-up of a 250 kW natural gas fueled MCFC demonstration power plant

    SciTech Connect

    Figueroa, R.A.; Carter, J.; Rivera, R.; Otahal, J.

    1996-12-31

    San Diego Gas & Electric (SDG&E) is participating with M-C Power in the development and commercialization program of their internally manifolded heat exchanger (IMHEX®) carbonate fuel cell technology. Development of the IMHEX technology base on the UNOCAL test facility resulted in the demonstration of a 250 kW thermally integrated power plant located at the Naval Air Station at Miramar, California. The members of the commercialization team led by M-C Power (MCP) include Bechtel Corporation, Stewart & Stevenson Services, Inc., and Ishikawajima-Harima Heavy Industries (IHI). MCP produced the fuel cell stack, Bechtel was responsible for the process engineering including the control system, Stewart & Stevenson was responsible for packaging the process equipment in a skid (pumps, desulfurizer, gas heater, turbo, heat exchanger and steam generator), IHI produced a compact flat plate catalytic reformer operating on natural gas, and SDG&E assumed responsibility for plant construction, start-up and operation of the plant.

  15. Dilatometer setup for low coefficient of thermal expansion materials measurements in the 140 K-250 K temperature range

    NASA Astrophysics Data System (ADS)

    Spannagel, Ruven; Hamann, Ines; Sanjuan, Josep; Schuldt, Thilo; Gohlke, Martin; Johann, Ulrich; Weise, Dennis; Braxmaier, Claus

    2016-10-01

    Space applications demand light weight materials with excellent dimensional stability for telescopes, optical benches, optical resonators, etc. Glass-ceramics and composite materials can be tuned to reach very low coefficient of thermal expansion (CTE) at different temperatures. In order to determine such CTEs, very accurate setups are needed. Here we present a dilatometer that is able to measure the CTE of a large variety of materials in the temperature range of 140 K to 250 K. The dilatometer is based on a heterodyne interferometer with nanometer noise levels to measure the expansion of a sample when applying small amplitude controlled temperature signals. In this article, the CTE of a carbon fiber reinforced polymer sample has been determined with an accuracy in the 10⁻⁸ K⁻¹ range.

  16. Dilatometer setup for low coefficient of thermal expansion materials measurements in the 140 K-250 K temperature range.

    PubMed

    Spannagel, Ruven; Hamann, Ines; Sanjuan, Josep; Schuldt, Thilo; Gohlke, Martin; Johann, Ulrich; Weise, Dennis; Braxmaier, Claus

    2016-10-01

    Space applications demand light weight materials with excellent dimensional stability for telescopes, optical benches, optical resonators, etc. Glass-ceramics and composite materials can be tuned to reach very low coefficient of thermal expansion (CTE) at different temperatures. In order to determine such CTEs, very accurate setups are needed. Here we present a dilatometer that is able to measure the CTE of a large variety of materials in the temperature range of 140 K to 250 K. The dilatometer is based on a heterodyne interferometer with nanometer noise levels to measure the expansion of a sample when applying small amplitude controlled temperature signals. In this article, the CTE of a carbon fiber reinforced polymer sample has been determined with an accuracy in the 10⁻⁸ K⁻¹ range.

  17. VIZARD: analysis of Affymetrix Arabidopsis GeneChip data

    NASA Technical Reports Server (NTRS)

    Moseyko, Nick; Feldman, Lewis J.

    2002-01-01

    SUMMARY: The Affymetrix GeneChip Arabidopsis genome array has proved to be a very powerful tool for the analysis of gene expression in Arabidopsis thaliana, the most commonly studied plant model organism. VIZARD is a Java program created at the University of California, Berkeley, to facilitate analysis of Arabidopsis GeneChip data. It includes several integrated tools for filtering, sorting, clustering and visualization of gene expression data as well as tools for the discovery of regulatory motifs in upstream sequences. VIZARD also includes annotation and upstream sequence databases for the majority of genes represented on the Affymetrix Arabidopsis GeneChip array. AVAILABILITY: VIZARD is available free of charge for educational, research, and not-for-profit purposes, and can be downloaded at http://www.anm.f2s.com/research/vizard/ CONTACT: moseyko@uclink4.berkeley.edu.

  18. VIZARD: analysis of Affymetrix Arabidopsis GeneChip data

    NASA Technical Reports Server (NTRS)

    Moseyko, Nick; Feldman, Lewis J.

    2002-01-01

    SUMMARY: The Affymetrix GeneChip Arabidopsis genome array has proved to be a very powerful tool for the analysis of gene expression in Arabidopsis thaliana, the most commonly studied plant model organism. VIZARD is a Java program created at the University of California, Berkeley, to facilitate analysis of Arabidopsis GeneChip data. It includes several integrated tools for filtering, sorting, clustering and visualization of gene expression data as well as tools for the discovery of regulatory motifs in upstream sequences. VIZARD also includes annotation and upstream sequence databases for the majority of genes represented on the Affymetrix Arabidopsis GeneChip array. AVAILABILITY: VIZARD is available free of charge for educational, research, and not-for-profit purposes, and can be downloaded at http://www.anm.f2s.com/research/vizard/ CONTACT: moseyko@uclink4.berkeley.edu.

  19. Effect of Bubbles on Liquid Nitrogen Breakdown in Plane-Plane Electrode Geometry From 100-250 kPa

    SciTech Connect

    Sauers, Isidor; James, David Randy; Tuncer, Enis; Polyzos, Georgios; Pace, Marshall O

    2011-01-01

    Liquid nitrogen (LN₂) is used as the cryogen and dielectric for many high temperature superconducting, high voltage applications. When a quench in the superconductor occurs, bubbles are generated which can affect the dielectric breakdown properties of the LN₂. Experiments were performed using plane-plane electrode geometry where bubbles were introduced into the gap through a pinhole in the ground electrode. Bubbles were generated using one or more Kapton heaters producing heater powers up to 30 W. Pressure was varied from 100-250 kPa. Breakdown strength was found to be relatively constant up to a given heater power and pressure at which the breakdown strength drops to a low value depending on the pressure. After the drop the breakdown strength continues to drop gradually at higher heater power. This is particularly illustrated at 100 kPa. After the drop in breakdown strength the breakdown is believed to be due to the formation of a vapor bridge. Also the heater power at which the breakdown strength changes from that of LN₂ to that of gaseous nitrogen increases with increasing pressure. The data can provide design constraints for high temperature superconducting fault current limiters (FCLs) so that the formation of a vapor bridge can be suppressed or avoided.

  20. SKM-SNP: SNP markers detection method.

    PubMed

    Liu, Yang; Li, Mark; Cheung, Yiu M; Sham, Pak C; Ng, Michael K

    2010-04-01

    SKM-SNP, SNP markers detection program, is proposed to identify a set of relevant SNPs for the association between a disease and multiple marker genotypes. We employ a subspace categorical clustering algorithm to compute a weight for each SNP in the group of patient samples and the group of normal samples, and use the weights to identify the subsets of relevant SNPs that categorize these two groups. The experiments on both Schizophrenia and Parkinson Disease data sets containing genome-wide SNPs are reported to demonstrate the program. Results indicate that our method can find some relevant SNPs that categorize the disease samples. The online SKM-SNP program is available at http://www.math.hkbu.edu.hk/~mng/SKM-SNP/SKM-SNP.html.

  1. Qualitative assessment of gene expression in affymetrix genechip arrays

    NASA Astrophysics Data System (ADS)

    Nagarajan, Radhakrishnan; Upreti, Meenakshi

    2007-01-01

    Affymetrix Genechip microarrays are used widely to determine the simultaneous expression of genes in a given biological paradigm. Probes on the Genechip array are atomic entities which by definition are randomly distributed across the array and in turn govern the gene expression. In the present study, we make several interesting observations. We show that there is considerable correlation between the probe intensities across the array which defy the independence assumption. While the mechanism behind such correlations is unclear, we show that scaling behavior and the profiles of perfect match (PM) as well as mismatch (MM) probes are similar and immune-to-background subtraction. We believe that the observed correlations are possibly an outcome of inherent non-stationarities or patchiness in the array devoid of biological significance. This is demonstrated by inspecting their scaling behavior and profiles of the PM and MM probe intensities obtained from publicly available Genechip arrays from three eukaryotic genomes, namely: Drosophila melanogaster (fruit fly), Homo sapiens (humans) and Mus musculus (house mouse) across distinct biological paradigms and across laboratories, with and without background subtraction. The fluctuation functions were estimated using detrended fluctuation analysis (DFA) with fourth-order polynomial detrending. The results presented in this study provide new insights into correlation signatures of PM and MM probe intensities and suggests the choice of DFA as a tool for qualitative assessment of Affymetrix Genechip microarrays prior to their analysis. A more detailed investigation is necessary in order to understand the source of these correlations.

  2. CEL_INTERROGATOR: A FREE AND OPEN SOURCE PACKAGE FOR AFFYMETRIX CEL FILE PARSING

    USDA-ARS's Scientific Manuscript database

    CEL_Interrogator Package is a suite of programs designed to extract the average probe intensity and other information for each probe sequence from an Affymetrix GeneChip CEL file and unite them with their human-readable Affymetrix consensus sequence names. The resulting text file is suitable for di...

  3. High correspondence between Affymetrix exon and standard expression arrays.

    PubMed

    Okoniewski, Michał J; Hey, Yvonne; Pepper, Stuart D; Miller, Crispin J

    2007-02-01

    Exon arrays aim to provide comprehensive gene expression data at the level of individual exons, similar to that provided on a per-gene basis by existing expression arrays. This report describes the performance of Affymetrix GeneChip Human Exon 1.0 ST array by using replicated RNA samples from two human cell lines, MCF7 and MCF10A, hybridized both to Exon 1.0 ST and to HG-U133 Plus2 arrays. Cross-comparison between array types requires an appropriate mapping to be found between individual probe sets. Three possible mappings were considered, reflecting different strategies for dealing with probe sets that target different parts of the same transcript. Irrespective of the mapping used, Exon 1.0 ST and HG-U133 Plus2 arrays show a high degree of correspondence. More than 80% of HG-U133 Plus2 probe sets may be mapped to the Exon chip, and fold changes are found well preserved for over 96% of those probe sets detected present. Since HG-U133 Plus2 arrays have already been extensively validated, these results lend a significant degree of confidence to exon arrays.

  4. Construction of a versatile SNP array for pyramiding useful genes of rice.

    PubMed

    Kurokawa, Yusuke; Noda, Tomonori; Yamagata, Yoshiyuki; Angeles-Shim, Rosalyn; Sunohara, Hidehiko; Uehara, Kanako; Furuta, Tomoyuki; Nagai, Keisuke; Jena, Kshirod Kumar; Yasui, Hideshi; Yoshimura, Atsushi; Ashikari, Motoyuki; Doi, Kazuyuki

    2016-01-01

    DNA marker-assisted selection (MAS) has become an indispensable component of breeding. Single nucleotide polymorphisms (SNP) are the most frequent polymorphism in the rice genome. However, SNP markers are not readily employed in MAS because of limitations in genotyping platforms. Here the authors report a Golden Gate SNP array that targets specific genes controlling yield-related traits and biotic stress resistance in rice. As a first step, the SNP genotypes were surveyed in 31 parental varieties using the Affymetrix Rice 44K SNP microarray. The haplotype information for 16 target genes was then converted to the Golden Gate platform with 143-plex markers. Haplotypes for the 14 useful alleles are unique and can discriminate among all other varieties. The genotyping consistency between the Affymetrix microarray and the Golden Gate array was 92.8%, and the accuracy of the Golden Gate array was confirmed in 3 F2 segregating populations. The concept of haplotype-based selection using the constructed SNP array was proven. Copyright © 2015 The Authors. Published by Elsevier Ireland Ltd. All rights reserved.

  5. SNP-VISTA

    SciTech Connect

    Shah, Nameeta; Teplitsky, Michael; Minovitsky, Simon; Dubchak, Inna

    2005-11-07

    SNP-VISTA aids in analyses of the following types of data: A. Large-scale re-sequence data of disease-related genes for discovery of associated and/or causative alleles (GeneSNP-VISTA). B. Massive amounts of ecogenomics data for studying homologous recombination in microbial populations (EcoSNP-VISTA). The main features and capabilities of SNP-VISTA are: 1) Mapping of SNPs to gene structure; 2) classification of SNPs, based on their location in the gene, frequency of occurrence in samples and allele composition; 3) clustering, based on user-defined subsets of SNPs, highlighting haplotypes as well as recombinant sequences; 4) integration of protein conservation visualization; and 5) display of automatically calculated recombination points that are user-editable. The main strength of SNP-VISTA is its graphical interface and use of visual representations, which support interactive exploration and hence better understanding of large-scale SNPs data.

  6. Test of a 250 kVA Battery-Inverter System Micro-Grid: Cooperative Research and Development Final Report, CRADA Number CRD-11-460

    SciTech Connect

    Kramer, William; Martin, Greg; Lundstrom, Blake

    2013-12-01

    Portland General Electric (PGE) is installing a 5-megawatt (MW) lithium-ion-based battery-inverter system (BIS) in Salem, Oregon, as part of the Pacific Northwest Smart Grid Demonstration Project. NREL will assist PGE in testing a 250-kilovolt-ampere (kVA) portion of the BIS in order to verify correct operation and minimize risk to subsequent demonstrations. In this project NREL will provide technical support for the 250-kVA test and will work with PGE to write a test plan and evaluate the system in the lab before deployment in the field.

  7. A comparison of the reactivating and therapeutic efficacy of newly developed bispyridinium oximes (K250, K251) with commonly used oximes against tabun in rats and mice.

    PubMed

    Kassa, Jiri; Karasova, Jana; Bajgar, Jiri; Kuca, Kamil; Musilek, Kamil; Kopelikova, Irena

    2009-08-01

    The potency of newly developed bispyridinium compounds (K250, K251) in reactivating tabun-inhibited acetylcholinesterase and reducing tabun-induced lethal toxic effects was compared with currently available oximes (obidoxime, trimedoxime, the oxime HI-6) using in vivo methods. Studies determined percentage of reactivation of tabun-inhibited blood and tissue AChE in poisoned rats and showed that the reactivating efficacy of both newly developed oximes is comparable with the oxime HI-6 but it is significantly lower than the reactivating effects of obidoxime and trimedoxime, especially in diaphragm and brain. Both newly developed oximes were also found to be able to slightly reduce lethal toxic effects in tabun-poisoned mice. Their therapeutic efficacy is higher than the potency of the oxime HI-6 but it is lower than the therapeutic effects of trimedoxime and obidoxime. Thus, the reactivating and therapeutic potency of both newly developed oximes (K250, K251) does not prevail over the effectiveness of currently available oximes and, therefore, they are not suitable for their replacement for the treatment of acute tabun poisoning.

  8. Magnetic properties of NiO nano particles: Contributions of the antiferromagnetic and ferromagnetic subsystems in different magnetic field ranges up to 250 kOe

    NASA Astrophysics Data System (ADS)

    Balaev, D. A.; Dubrovskiy, A. A.; Krasikov, A. A.; Popkov, S. I.; Balaev, A. D.; Shaikhutdinov, K. A.; Kirillov, V. L.; Mart'yanov, O. N.

    2017-08-01

    The magnetic properties of antiferromagnetic NiO nanoparticles prepared by thermal decomposition of nickel hydroxocarbonate are investigated. According to the data of magnetization measurements in fields of up to 250 kOe, the magnetic moment linearly grows in strong fields, which is caused by the contribution of the antiferromagnetically ordered nanoparticle core, and the antiferromagnetic susceptibility corresponds to that of bulk polycrystalline NiO. This allowed the antiferromagnetic and ferromagnetic contributions to the total magnetic response of a sample to be quantitatively determined. The latter occurs due to the incomplete spin compensation in an antiferromagnetic nanoparticle caused by defects on its surface. It is demonstrated that to correctly determine the superparamagnetic blocking temperature, it is necessary to take into account the antiferromagnetic susceptibility of the particle core.
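
    The separation described above can be illustrated with a straight-line fit to the high-field part of a magnetization curve: the slope estimates the antiferromagnetic susceptibility and the intercept estimates the saturated ferromagnetic contribution. This is a minimal sketch on synthetic data, not the authors' procedure; the function name, the 150 kOe fitting window, and the tanh-shaped ferromagnetic term are all assumptions made for illustration.

```python
from math import tanh

def fit_high_field_line(fields_koe, moments, min_field_koe):
    """Least-squares line M = m_fm + chi_af * H over points with H >= min_field_koe."""
    points = [(h, m) for h, m in zip(fields_koe, moments) if h >= min_field_koe]
    n = len(points)
    mean_h = sum(h for h, _ in points) / n
    mean_m = sum(m for _, m in points) / n
    slope = sum((h - mean_h) * (m - mean_m) for h, m in points) / sum(
        (h - mean_h) ** 2 for h, _ in points
    )
    intercept = mean_m - slope * mean_h
    return slope, intercept

# Synthetic curve (arbitrary units): a saturating term plus a linear term.
fields = [10.0 * i for i in range(26)]  # 0 to 250 kOe
moments = [2.0 * tanh(h / 20.0) + 0.01 * h for h in fields]

# Hypothetical fitting window: only points at or above 150 kOe.
chi_af, m_fm = fit_high_field_line(fields, moments, min_field_koe=150.0)
```

    On this synthetic curve the fit recovers a slope close to 0.01 and an intercept close to 2.0, the values used to build the data.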

  9. Comparison of a 250 kV single-stage accelerator mass spectrometer with a 5 MV tandem accelerator mass spectrometer--fitness for purpose in bioanalysis.

    PubMed

    Young, G C; Corless, S; Felgate, C C; Colthup, P V

    2008-12-01

    The introduction of 'compact' accelerator mass spectrometers into biomedical science, including use in drug metabolism and bioanalytical applications, is an exciting recent development. Comparisons are presented here between a more established and relatively large tandem accelerator which operates at up to 5 MV and a conventional laboratory-sized 250 kV single-stage accelerator mass spectrometer. Biological samples were enriched with low levels of radiocarbon, then converted into graphite prior to analysis on each of the two instruments. The data obtained showed the single-stage instrument to be capable of delivering comparable results, and thus able to provide similar study support, with that provided by the 5 MV instrument, without the significant overheads and complexities which are inherent to the operation of the larger instrument. We believe that the advent of these laboratory-sized accelerator mass spectrometry (AMS) instruments represents a real turning point in the potential for application of AMS by a wider user group.

  10. Identification of SNP-SNP interaction for chronic dialysis patients.

    PubMed

    Yang, Cheng-Hong; Weng, Zi-Jie; Chuang, Li-Yeh; Yang, Cheng-San

    2017-04-01

    Analyses of interactions between single nucleotide polymorphisms (SNPs) have reported significant associations between mitochondrial displacement loops (D-loops) and chronic dialysis diseases. However, the method used to detect potential SNP-SNP interaction still requires improvement. This study proposes an effective algorithm named dynamic center particle swarm optimization k-nearest neighbors (DCPSO-KNN) to detect the SNP-SNP interaction. DCPSO-KNN uses dynamic center particle swarm optimization (DCPSO) to generate SNP combinations with a fitness function designed using the KNN method and statistical verification. A total of 77 SNPs in the mitochondrial D-loop were used to detect the SNP-SNP interactions and the search ability was compared against that of other methods. The detected SNP-SNP interactions were statistically evaluated. Experimental results showed that DCPSO-KNN successfully detects SNP-SNP interactions in two-to-seven-order combinations (positive predictive value (PPV)+negative predictive value (NPV)=1.154 to 1.310; odds ratio (OR)=1.859 to 4.015; 95% confidence interval (95% CI)=1.151 to 4.265; p-value <0.001). DCPSO-KNN can improve the detection ability of SNP-SNP associations between mitochondrial D-loops and chronic dialysis diseases, thus facilitating the development of biomedical applications. Copyright © 2017 Elsevier Ltd. All rights reserved.
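
    The evaluation statistics quoted above (PPV+NPV, odds ratio, 95% CI) can all be derived from a 2x2 table of cases and controls split by whether they carry a given SNP combination. The sketch below uses hypothetical counts and the standard log-based (Woolf) interval for the odds ratio; the abstract does not say how the study computed its interval, so that choice is an assumption.

```python
from math import exp, log, sqrt

def two_by_two_stats(tp, fp, fn, tn):
    """Return PPV, NPV, odds ratio and a Woolf 95% CI from 2x2 counts.

    tp: cases with the SNP combination, fp: controls with it,
    fn: cases without it, tn: controls without it.
    """
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    odds_ratio = (tp * tn) / (fp * fn)
    se_log_or = sqrt(1 / tp + 1 / fp + 1 / fn + 1 / tn)
    ci_low = exp(log(odds_ratio) - 1.96 * se_log_or)
    ci_high = exp(log(odds_ratio) + 1.96 * se_log_or)
    return ppv, npv, odds_ratio, ci_low, ci_high

# Hypothetical counts, chosen only to illustrate the arithmetic.
ppv, npv, odds_ratio, ci_low, ci_high = two_by_two_stats(tp=60, fp=40, fn=35, tn=65)
```

    A PPV+NPV sum above 1 and a confidence interval that excludes 1 are the kind of evidence the abstract reports for the detected interactions.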

  11. Loss of Heterozygosity and Copy Number Abnormality in Clear Cell Renal Cell Carcinoma Discovered by High-Density Affymetrix 10K Single Nucleotide Polymorphism Mapping Array

    PubMed Central

    Toma, Marieta I; Grosser, Marianne; Herr, Alexander; Aust, Daniela E; Meye, Axel; Hoefling, Christian; Fuessel, Susanne; Wuttig, Daniela; Wirth, Manfred P; Baretton, Gustavo B

    2008-01-01

    Genetic aberrations are crucial in renal tumor progression. In this study, we describe loss of heterozygosity (LOH) and DNA-copy number abnormalities in clear cell renal cell carcinoma (cc-RCC) discovered by genome-wide single nucleotide polymorphism (SNP) arrays. Genomic DNA from tumor and normal tissue of 22 human cc-RCCs was analyzed on the Affymetrix GeneChip Human Mapping 10K Array. The array data were validated by quantitative polymerase chain reaction and immunohistochemistry. Reduced DNA copy numbers were detected on chromosomal arm 3p in 91%, on chromosome 9 in 32%, and on chromosomal arm 14q in 36% of the tumors. Gains were detected on chromosomal arm 5q in 45% and on chromosome 7 in 32% of the tumors. Copy number abnormalities were found not only in FHIT and VHL loci, known to be involved in renal carcinogenesis, but also in regions containing putative new tumor suppressor genes or oncogenes. In addition, microdeletions were detected on chromosomes 1 and 6 in genes with unknown impact on renal carcinogenesis. In validation experiments, abnormal protein expression of FOXP1 (on 3p) was found in 90% of tumors (concordance with SNP array data in 85%). As assessed by quantitative polymerase chain reaction, PARK2 and PACRG were down-regulated in 57% and 100%, respectively, and CSF1R was up-regulated in 69% of the cc-RCC cases (concordance with SNP array data in 57%, 33%, and 38%). Genome-wide SNP array analysis not only confirmed previously described large chromosomal aberrations but also detected novel microdeletions in genes potentially involved in tumor genesis of cc-RCC. PMID:18592004

  12. SNP panels/Imputation

    USDA-ARS's Scientific Manuscript database

    Participants from thirteen countries discussed services that Interbull can perform or recommendations that Interbull can make to promote harmonization and assist member countries in improving their genomic evaluations in regard to SNP panels and imputation. The panel recommended: A mechanism to shar...

  13. Pulsed Yb:fiber system capable of >250kW peak power with tunable pulses in the 50ps to 1.5ns range

    NASA Astrophysics Data System (ADS)

    McComb, Timothy S.; Lowder, Tyson L.; Leadbetter, Vickie; Reynolds, Mitch; Saracco, Matthieu J.; Hutchinson, Joel; Green, Jared; McCal, Dennis; Burkholder, Gary; Kutscha, Tim; Dittli, Adam; Hamilton, Chuck; Kliner, Dahv A. V.; Randall, Matthew; Fanning, Geoff; Bell, Jake

    2013-03-01

    We have demonstrated a pulsed 1064 nm PM Yb:fiber laser system incorporating a seed source with a tunable pulse repetition rate and pulse duration and a multistage fiber amplifier, ending in a large core (>650 μm² mode field area), tapered fiber amplifier. The amplifier chain is all-fiber, with the exception of the final amplifier's pump combiner, allowing robust, compact packaging. The air-cooled laser system is rated for >60 W of average power and beam quality of M² < 1.3 at repetition rates from below 100 kHz to tens of MHz, with pulses discretely tunable over a range spanning 50 ps to greater than 1.5 ns. Maximum pulse energies, limited by the onset of self phase modulation and stimulated Raman scattering, are greater than 12.5 μJ at 50 ps and 375 μJ at 1.5 ns, corresponding to >250 kW peak power across the pulse tuning range. We present frequency conversion to 532 nm with efficiency greater than 70% and conversion to UV via frequency tripling, with initial feasibility experiments showing >30% UV conversion efficiency. Application results of the laser in scribing, thin film removal and micro-machining will be discussed.
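
    The peak-power figure can be checked from the quoted pulse energies and durations. The sketch below assumes a rectangular pulse, so that peak power is simply energy divided by duration; a real pulse shape would change the result by a shape-dependent factor. The function name is illustrative.

```python
def peak_power_watts(pulse_energy_joules, pulse_duration_seconds):
    """Rectangular-pulse estimate: peak power = pulse energy / pulse duration."""
    return pulse_energy_joules / pulse_duration_seconds

# Values quoted in the abstract: 12.5 uJ at 50 ps and 375 uJ at 1.5 ns.
short_pulse_kw = peak_power_watts(12.5e-6, 50e-12) / 1e3
long_pulse_kw = peak_power_watts(375e-6, 1.5e-9) / 1e3
```

    Both ends of the tuning range come out at about 250 kW, consistent with the stated peak power.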

  14. SFP Genotyping from Affymetrix Arrays is Robust but Largely Detects Cis-acting Expression Regulators

    USDA-ARS's Scientific Manuscript database

    The recent development of Affymetrix chips designed from assembled EST sequences has spawned considerable interest in identifying single-feature polymorphisms (SFPs) from transcriptome data. SFPs are valuable genetic markers that potentially offer a physical link to the structural genes themselves....

  15. Discovery and mapping of single feature polymorphisms in wheat using affymetrix arrays

    USDA-ARS's Scientific Manuscript database

    Single feature polymorphisms (SFPs) can be a rich source of markers for gene mapping and function studies. To explore the feasibility of using the Affymetrix GeneChip to discover and map SFPs in the large hexaploid wheat genome, six wheat varieties of diverse origins were analyzed for significant pr...

  16. Concordance of copy number alterations using a common analytic pipeline for genome-wide analysis of Illumina and Affymetrix genotyping data: a report from the Children's Oncology Group.

    PubMed

    Vujkovic, Marijana; Attiyeh, Edward F; Ries, Rhonda E; Horn, Michelle; Goodman, Elizabeth K; Ding, Yang; Kavcic, Marko; Alonzo, Todd A; Gerbing, Robert B; Hirsch, Betsy; Raimondi, Susana; Gamis, Alan S; Meshinchi, Soheil; Aplenc, Richard

    2015-01-01

    Copy number alterations (CNAs) are a hallmark of pediatric cancer genomes. An increasing number of research groups use multiple platforms and software packages to detect and analyze CNAs. However, different platforms have experimental and analysis-specific biases that may yield different results. We sought to estimate the concordance of CNAs in children with de novo acute myeloid leukemia between two experimental platforms: Affymetrix SNP 6.0 array and Illumina OmniQuad 2.5 BeadChip. Forty-five paired tumor-remission samples were genotyped on both platforms, and CNAs were estimated from total signal intensity and allelic contrast values using the allele-specific copy number analysis of tumors (ASCAT) algorithm. The two platforms were comparable in detection of CNAs, each missing only two segments from a total of 42 CNAs (4.6%). Overall, there was an interplatform agreement of 96% for allele-specific tumor profiles. However, poor quality samples with low signal/noise ratios showed a high rate of false-positive segments independent of the genotyping platform. These results demonstrate that a common analytic pipeline can be utilized for SNP array data from these two platforms. The customized programming template for the preprocessing, data integration, and analysis is publicly available at https://github.com/AplenCHOP/affyLumCNA. Published by Elsevier Inc.

  17. CSRMT measurements in the frequency range of 1-250 kHz to map a normal fault in the Volvi basin, Greece

    NASA Astrophysics Data System (ADS)

    Bastani, M.; Savvaidis, A.; Pedersen, L. B.; Kalscheuer, T.

    2011-10-01

    In order to gain a better understanding of the geometry of surface faults, five Controlled Source/Radio Magnetotelluric (CSRMT) profiles were measured across the Volvi basin, 45 km northeast of the city of Thessaloniki in Greece. The data were collected in two frequency ranges: a) 1-12.5 kHz using a remotely controlled double horizontal magnetic dipole transmitter (CSAMT measurements), and b) 15-250 kHz using the signal from distant radio transmitters (RMT measurements). The transition from the RMT band to the CSAMT band was smooth and continuous allowing us to combine both datasets for plane-wave modeling. The surface geology shows a predominantly 2D structure, and therefore we planned the survey into profiles perpendicular to the geological strike. We have used a 2D interpretation tool to model the data in TE, TM, TE + TM and determinant modes. Using a 4% error floor on the impedance, 2D resistivity models from inversion of the determinant data provide lower RMS data fits (4.2 and 1.2 for resistivity and phase, respectively) compared to the combined TE + TM data (4.4, 2.8, overall resistivity and phase, respectively). 2D inversion of the measured tensor data shows a sharp change in the depth to the top of resistive gneiss-schist basement that is overlain by a less resistive overburden at the southern basin flanks. The change in depth to the bedrock is clearly seen in all 2D models along the measured profiles suggesting the existence of normal faults with strike directions of NE-SW to E-W. The 2D electrical resistivity models suggest that the bedrock deepens towards the south-west. The resistivity models are also compared with the existing borehole information in the area and show a reasonable correlation. For example, the sharp change of depth to the bedrock towards the center of the basin as seen in the resistivity models is also confirmed by the borehole data.
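
    The RMS data fits quoted above depend on how the error floor enters the misfit. A common convention, assumed here rather than taken from the paper, is to raise any data error smaller than the floor (4% of the datum's magnitude) up to the floor before normalizing the residuals. The sketch uses real-valued data for simplicity, whereas impedances are complex; all names and numbers are illustrative.

```python
from math import sqrt

def rms_misfit(observed, predicted, errors, error_floor=0.04):
    """Normalized RMS misfit with a relative error floor applied to each datum."""
    total = 0.0
    for obs, pred, err in zip(observed, predicted, errors):
        # Never trust an error estimate smaller than error_floor * |datum|.
        sigma = max(err, error_floor * abs(obs))
        total += ((obs - pred) / sigma) ** 2
    return sqrt(total / len(observed))

# Hypothetical data: stated errors of 1.0 are below the 4% floor (4.0 and 8.0).
rms_with_floor = rms_misfit([100.0, 200.0], [104.0, 192.0], [1.0, 1.0])
# Hypothetical data: stated errors already exceed the floor and are used as-is.
rms_large_errors = rms_misfit([100.0, 200.0], [104.0, 192.0], [10.0, 20.0])
```

    An RMS near 1 means the model fits the data to within the assumed errors; values well above 1, such as those reported for the combined TE + TM data, indicate residual structure the 2D model does not explain.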

  18. Monte Carlo calculations of the ionization chamber wall correction factors for 192Ir and 60Co gamma rays and 250 kV x-rays for use in calibration of 192Ir HDR brachytherapy sources.

    PubMed

    Ferreira, I H; de Almeida, C E; Marre, D; Marechal, M H; Bridier, A; Chavaudra, J

    1999-08-01

    As in the method for the calibration of 192Ir high-dose-rate (HDR) brachytherapy sources, the ionization chamber wall correction factor A(w), is needed for 192Ir and 60Co gamma rays and 250 kV x-rays. This factor takes into account the variation in chamber response due to the attenuation of the photon beam in the chamber wall and build-up cap and the contribution of scattered photons. Monte Carlo calculations were performed using the EGS4 code system with the PRESTA algorithm, to calculate the A(w) factor for 51 commercial ionization chambers and build-up caps exposed to the typical energy spectrum of 192Ir and 60Co gamma rays and 250 kV x-rays. The calculated A(w) correction factors for 192Ir and 60Co sources and 250 kV x-rays agree very well to within 0.1% with published experimental data (the statistical uncertainty is less than 0.1% of the calculated correction factor value). For the 192Ir sources, A(w) varies from 0.973 to 0.993 and for the 250 kV x-rays the minimum value of A(w) for all chambers studied is 0.983. The calculated A(w) correction factors can be used to calculate the air kerma calibration factor of HDR brachytherapy sources, when interpolative methods are considered, contributing to the reduction in the overall uncertainties in the calibration procedure.

  19. Establishing a major cause of discrepancy in the calibration of Affymetrix GeneChips

    PubMed Central

    Harrison, Andrew P; Johnston, Caroline E; Orengo, Christine A

    2007-01-01

    Background: Affymetrix GeneChips are a popular platform for performing whole-genome experiments on the transcriptome. There are a range of different calibration steps, and users are presented with choices of different background subtractions, normalisations and expression measures. We wished to establish which of the calibration steps resulted in the biggest uncertainty in the sets of genes reported to be differentially expressed. Results: Our results indicate that the sets of genes identified as being most significantly differentially expressed, as estimated by the z-score of fold change, are relatively insensitive to the choice of background subtraction and normalisation. However, the contents of the gene list are most sensitive to the choice of expression measure. This is irrespective of whether the experiment uses a rat, mouse or human chip and whether the chip definition is made using probe mappings from Unigene, RefSeq, Entrez Gene or the original Affymetrix definitions. It is also irrespective of whether both Present and Absent, or just Present, Calls from the MAS5 algorithm are used to filter genelists, and this conclusion holds for genes of differing intensities. We also reach the same conclusion after assigning genes to be differentially expressed using t-statistics, although this approach results in a large number of false positives in the sets of genes identified due to the small numbers of replicates typically used in microarray experiments. Conclusion: The major calibration uncertainty that biologists need to consider when analysing Affymetrix data is how their multiple probe values are condensed into one expression measure. PMID:17562008

  20. An annotation infrastructure for the analysis and interpretation of Affymetrix exon array data.

    PubMed

    Okoniewski, Michał J; Yates, Tim; Dibben, Siân; Miller, Crispin J

    2007-01-01

    Affymetrix exon arrays contain probesets intended to target every known and predicted exon in the entire genome, posing significant challenges for high-throughput genome-wide data analysis. X:MAP http://xmap.picr.man.ac.uk, an annotation database, and exonmap http://www.bioconductor.org/packages/2.0/bioc/html/exonmap.html, a BioConductor/R package, are designed to support fine-grained analysis of exon array data. The system supports the application of standard statistical techniques, prior to the use of genome scale annotation to provide gene-, transcript- and exon-level summaries and visualization tools.

  1. An annotation infrastructure for the analysis and interpretation of Affymetrix exon array data

    PubMed Central

    Okoniewski, Michał J; Yates, Tim; Dibben, Siân; Miller, Crispin J

    2007-01-01

    Affymetrix exon arrays contain probesets intended to target every known and predicted exon in the entire genome, posing significant challenges for high-throughput genome-wide data analysis. X:MAP, an annotation database, and exonmap, a BioConductor/R package, are designed to support fine-grained analysis of exon array data. The system supports the application of standard statistical techniques, prior to the use of genome scale annotation to provide gene-, transcript- and exon-level summaries and visualization tools. PMID:17498294

  2. Etiological yield of SNP microarrays in idiopathic intellectual disability.

    PubMed

    Utine, G Eda; Haliloğlu, Göknur; Volkan-Salancı, Bilge; Çetinkaya, Arda; Kiper, Pelin Ö; Alanay, Yasemin; Aktaş, Dilek; Anlar, Banu; Topçu, Meral; Boduroğlu, Koray; Alikaşifoğlu, Mehmet

    2014-05-01

    Intellectual disability (ID) has a prevalence of 3% and is classified according to its severity. An underlying etiology cannot be determined in 75-80% of mild ID and in 20-50% of severe ID. Since it was shown that copy number variations involving short DNA segments may cause ID, genome-wide SNP microarrays have been used as a tool for detecting submicroscopic copy number changes and uniparental disomy. This study was performed to investigate the presence of copy number changes in patients with ID of unidentified etiology. The Affymetrix® 6.0 SNP microarray platform was used for analysis of 100 patients and their healthy parents, and data were evaluated using various databases and literature. Etiological diagnoses were made in 12 patients (12%). Homozygous deletion in the NRXN1 gene and duplication in the IL1RAPL1 gene were detected for the first time. Two separate patients had deletions in the FOXP2 and UBE2A genes, respectively, for which only a few patients have recently been reported. Interstitial and subtelomeric copy number changes were described in 6 patients, in whom routine cytogenetic tools revealed normal results. In one patient uniparental disomy type of Angelman syndrome was diagnosed. SNP microarrays constitute a screening test able to detect very small genomic changes, with a high etiological yield even in patients already evaluated using traditional cytogenetic tools, offer analysis for uniparental disomy and homozygosity, and thereby are helpful in finding novel disease-causing genes: for these reasons they should be considered as a first-tier genetic screening test in the evaluation of patients with ID and autism.

  3. Genome-wide SNP typing reveals signatures of population history.

    PubMed

    Hughes, Austin L; Welch, Robert; Puri, Vinita; Matthews, Casey; Haque, Kashif; Chanock, Stephen J; Yeager, Meredith

    2008-07-01

    Single-nucleotide polymorphism (SNP) arrays have become a popular technology for disease-association studies, but they also have potential for studying the genetic differentiation of human populations. Application of the Affymetrix GeneChip Human Mapping 500K Array Set to a population of 102 individuals representing the major ethnic groups in the United States (African, Asian, European, and Hispanic) revealed patterns of gene diversity and genetic distance that reflected population history. We analyzed allelic frequencies at 388,654 autosomal SNP sites that showed some variation in our study population and 10% or fewer missing values. Despite the small size (23-31 individuals) of each subpopulation, there were no fixed differences at any site between any two subpopulations. As expected from the African origin of modern humans, greater gene diversity was seen in Africans than in either Asians or Europeans, and the genetic distance between the Asian and the European populations was significantly lower than that between either of these two populations and Africans. Principal components analysis applied to a correlation matrix among individuals was able to separate completely the major continental groups of humans (Africans, Asians, and Europeans), while Hispanics overlapped all three of these groups. Genes containing two or more markers with extraordinarily high genetic distance between subpopulations were identified as candidate genes for health differences between subpopulations. The results show that, even with modest sample sizes, genome-wide SNP genotyping technologies have great promise for capturing signatures of gene frequency difference between human subpopulations, with applications in areas as diverse as forensics and the study of ethnic health disparities.
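
    Gene diversity at a biallelic SNP is commonly estimated as the expected heterozygosity. The sketch below uses Nei's small-sample correction, which is a reasonable choice for subpopulations of 23-31 individuals, but the abstract does not state which estimator the authors used, so treat this as one plausible definition. Genotypes are coded as the number of copies (0, 1 or 2) of an arbitrarily chosen reference allele.

```python
def gene_diversity(genotypes):
    """Nei's unbiased expected heterozygosity at one biallelic SNP.

    genotypes: list of 0/1/2 reference-allele counts, one per individual.
    """
    chromosomes = 2 * len(genotypes)
    p = sum(genotypes) / chromosomes  # reference-allele frequency
    q = 1 - p
    return chromosomes / (chromosomes - 1) * (1 - p * p - q * q)

# Hypothetical genotype vectors for a variable site and a fixed site.
variable_site = gene_diversity([0, 1, 2, 1])
fixed_site = gene_diversity([2, 2, 2])
```

    Averaging this quantity over all SNP sites within a subpopulation gives a single diversity value that can be compared between groups.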

  4. Chromosomal lesions and uniparental disomy detected by SNP arrays in MDS, MDS/MPD, and MDS-derived AML

    PubMed Central

    Gondek, Lukasz P.; Tiu, Ramon; O'Keefe, Christine L.; Sekeres, Mikkael A.; Theil, Karl S.

    2008-01-01

    Using metaphase cytogenetics (MC), chromosomal abnormalities are found in only a proportion of patients with myelodysplastic syndrome (MDS). We hypothesized that with new precise methods more cryptic karyotypic lesions can be uncovered that may show important clinical implications. We have applied 250K single nucleotide polymorphisms (SNP) arrays (SNP-A) to study chromosomal lesions in samples from 174 patients (94 MDS, 33 secondary acute myeloid leukemia [sAML], and 47 myelodysplastic/myeloproliferative disease [MDS/MPD]) and 76 controls. Using SNP-A, aberrations were found in around three-fourths of MDS, MDS/MPD, and sAML (vs 59%, 37%, 53% by MC; in 8% of patients MC was unsuccessful). Previously unrecognized lesions were detected in patients with normal MC and in those with known lesions. Moreover, segmental uniparental disomy (UPD) was found in 20% of MDS, 23% of sAML, and 35% of MDS/MPD patients, a lesion resulting in copy-neutral loss of heterozygosity undetectable by MC. The potential clinical significance of abnormalities detected by SNP-A, but not seen on MC, was demonstrated by their impact on overall survival. UPD involving chromosomes frequently affected by deletions may have prognostic implications similar to the deletions visible by MC. SNP-A–based karyotyping shows superior resolution for chromosomal defects, including UPD. This technique further complements MC to improve clinical prognosis and targeted therapies. PMID:17954704
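
    Segmental UPD appears on a SNP array as a stretch of homozygous calls in a region whose copy number is still two. The sketch below is a deliberately naive illustration of that idea: it scans ordered SNP calls for runs of homozygous genotypes with copy number 2. It is not the method used in the study; production callers typically rely on statistical models and reference samples, and the `min_run` threshold here is hypothetical.

```python
def copy_neutral_loh_runs(genotypes, copy_numbers, min_run=5):
    """Return (start, end) index pairs of homozygous runs with copy number 2.

    genotypes: ordered calls such as "AA", "AB", "BB" along one chromosome.
    copy_numbers: integer copy-number estimates, same length and order.
    """
    runs = []
    start = None
    for i, (gt, cn) in enumerate(zip(genotypes, copy_numbers)):
        candidate = gt in ("AA", "BB") and cn == 2
        if candidate and start is None:
            start = i
        elif not candidate and start is not None:
            if i - start >= min_run:
                runs.append((start, i - 1))
            start = None
    # Close a run that extends to the end of the chromosome.
    if start is not None and len(genotypes) - start >= min_run:
        runs.append((start, len(genotypes) - 1))
    return runs

calls = ["AB", "AA", "BB", "AA", "AA", "BB", "AA", "AB", "AA", "AA"]
neutral = copy_neutral_loh_runs(calls, [2] * 10)
# A copy-number loss inside the stretch breaks it: that part is a deletion, not UPD.
with_deletion = copy_neutral_loh_runs(calls, [2, 2, 2, 1, 2, 2, 2, 2, 2, 2])
```

    The second call shows why copy number matters: the same homozygous stretch is no longer reported once part of it has copy number 1.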

  5. An expression index for Affymetrix GeneChips based on the generalized logarithm.

    PubMed

    Zhou, Lei; Rocke, David M

    2005-11-01

    Affymetrix GeneChip high-density oligonucleotide arrays interrogate a single transcript using multiple short 25mer probes. Usually, a necessary step in the analysis of experiments using these GeneChips is to summarize each of these probe sets into a single expression index that can then be used for determining differential expression, for classification, for clustering, and for other analyses. In this paper, we propose a new expression index that is competitive with the best existing methods, and superior in many cases. We call this expression index method GLA, for GLog Average, since after normalization at the probe level, we take the mean generalized logarithm of perfect match probes. In this paper, we use Affycomp as the primary tool to assess the weaknesses and strengths of GLA. Comparisons are made between GLA and most widely used summary methods (RMA, MAS5.0 and MBEI) in great detail. The substantial reduction in variability and increased ability to detect differential expression, together with the simplicity of implementation, make GLA a plausible candidate for analysis of Affymetrix GeneChip data.
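
    The core of the summary step described above, the mean generalized logarithm of the perfect match probes, can be sketched as follows. The abstract does not give the exact form of the transformation or its tuning constant, so the widely used form glog(x) = ln(x + sqrt(x^2 + c^2)) and the default c are assumptions; the probe-level normalization that precedes this step is omitted.

```python
from math import log, sqrt

def glog(x, c=1.0):
    """One common generalized logarithm; defined for zero and negative x."""
    return log(x + sqrt(x * x + c * c))

def glog_average(pm_intensities, c=1.0):
    """Summarize one probe set as the mean glog of its perfect match intensities."""
    return sum(glog(x, c) for x in pm_intensities) / len(pm_intensities)

# Hypothetical, already-normalized perfect match intensities for one probe set.
probe_set = [100.0, 200.0, 400.0]
expression_index = glog_average(probe_set)
```

    Unlike the ordinary logarithm, this transformation stays finite for background-corrected intensities at or below zero, while behaving like ln(2x) at high intensity.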

  6. Improvements to previous algorithms to predict gene structure and isoform concentrations using Affymetrix Exon arrays

    PubMed Central

    2010-01-01

    Background: Exon arrays provide a way to measure the expression of different isoforms of genes in an organism. Most of the procedures to deal with these arrays are focused on gene expression or on exon expression. Although the only biological analytes that can be properly assigned a concentration are transcripts, there are very few algorithms that focus on them. The reason is that previously developed summarization methods do not work well if applied to transcripts. In addition, gene structure prediction, i.e., the correspondence between probes and novel isoforms, is a field which is still unexplored. Results: We have modified and adapted a previous algorithm to take advantage of the special characteristics of the Affymetrix exon arrays. The structure and concentration of transcripts (some of them possibly unknown) in microarray experiments were predicted using this algorithm. Simulations showed that the suggested modifications improved both specificity (SP) and sensitivity (ST) of the predictions. The algorithm was also applied to different real datasets showing its effectiveness and the concordance with PCR validated results. Conclusions: The proposed algorithm shows a substantial improvement in the performance over the previous version. This improvement is mainly due to the exploitation of the redundancy of the Affymetrix exon arrays. An R package of SPACE with the updated algorithms has been developed and is freely available. PMID:21110835

  7. Inter- and intra-reproducibility of genotypes from sheep technical replicates on Illumina and Affymetrix platforms.

    PubMed

    Berry, Donagh P; O'Brien, Aine; Wall, Eamonn; McDermott, Kevin; Randles, Shane; Flynn, Paul; Park, Stephen; Grose, Jenny; Weld, Rebecca; McHugh, Noirin

    2016-11-10

    Accurate genomic analyses are predicated upon access to accurate genotype input data. The objective of this study was to quantify the reproducibility of genotype data that are generated from the same genotype platform and from different genotyping platforms. Genotypes based on 51,121 single nucleotide polymorphisms (SNPs) for 84 animals that were each genotyped on Illumina and Affymetrix platforms and for another 25 animals that were each genotyped twice on the same Illumina platform were compared. Genotypes based on 11,323 SNPs for an additional 21 animals that were genotyped on two different Illumina platforms by two different service providers were also compared. Reproducibility of the results was measured as the correlation between allele counts and as genotype and allele concordance rates. A mean within-animal correlation of 0.9996 was found between allele counts in the 25 duplicate samples that were genotyped on the same Illumina platform and varied from 0.9963 to 1.0000 per animal. The mean (minimum, maximum) genotype and allele concordance rates per animal between the 25 duplicate samples were equal to 0.9996 (0.9968, 1.0000) and 0.9993 (0.9937, 1.0000), respectively. The concordance rate between the two different Illumina platforms was also near 1. A mean within-animal correlation of 0.9738 was found between genotypes that were generated on the Illumina and Affymetrix platforms and varied from 0.9505 to 0.9812 per animal. The mean (minimum, maximum) within-animal genotype and allele concordance rates between the Illumina and Affymetrix platforms were equal to 0.9711 (0.9418, 0.9798) and 0.9845 (0.9695, 0.9889), respectively. The genotype concordance rate across all genotypes increased from 0.9711 to 0.9949 when the SNPs used were restricted to those with three high-resolution genotype clusters which represented 75.2% of the called genotypes. 
    Our results suggest that, regardless of the genotype platform or service provider, high genotype concordance rates...
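
    The three reproducibility measures named above can be sketched for one animal genotyped twice. Genotypes are coded as allele counts (0, 1 or 2) and uncalled SNPs as None; dropping SNPs that are uncalled on either platform is an assumption. The allele concordance formula, which counts the alleles shared between the two calls, is one plausible definition and may differ from the one the authors used.

```python
from math import sqrt

def reproducibility(calls_a, calls_b):
    """Return genotype concordance, allele concordance and Pearson correlation."""
    pairs = [(a, b) for a, b in zip(calls_a, calls_b) if a is not None and b is not None]
    n = len(pairs)
    genotype_rate = sum(a == b for a, b in pairs) / n
    # Each SNP carries two alleles; |a - b| of them disagree between the calls.
    allele_rate = 1 - sum(abs(a - b) for a, b in pairs) / (2 * n)
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    return genotype_rate, allele_rate, cov / sqrt(var_a * var_b)

# Hypothetical calls for one animal on two platforms (None = no call).
platform_1 = [0, 1, 2, 2, 1, 0, None, 2]
platform_2 = [0, 1, 2, 1, 1, 0, 2, 2]
genotype_rate, allele_rate, correlation = reproducibility(platform_1, platform_2)
```

    With a single discordant call that differs by one allele, allele concordance is higher than genotype concordance, the same ordering as the Illumina versus Affymetrix figures reported above.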

  8. X:Map: annotation and visualization of genome structure for Affymetrix exon array analysis

    PubMed Central

    Yates, Tim; Okoniewski, Michał J.; Miller, Crispin J.

    2008-01-01

    Affymetrix exon arrays aim to target every known and predicted exon in the human, mouse or rat genomes, and have reporters that extend beyond protein coding regions to other areas of the transcribed genome. This combination of increased coverage and precision is important because a substantial proportion of protein coding genes are predicted to be alternatively spliced, and because many non-coding genes are known also to be of biological significance. In order to fully exploit these arrays, it is necessary to associate each reporter on the array with the features of the genome it is targeting, and to relate these to gene and genome structure. X:Map is a genome annotation database that provides this information. Data can be browsed using a novel Google-maps based interface, and analysed and further visualized through an associated BioConductor package. The database can be found at http://xmap.picr.man.ac.uk. PMID:17932061

  9. Exon array data analysis using Affymetrix power tools and R statistical software

    PubMed Central

    2011-01-01

    The use of microarray technology to measure gene expression on a genome-wide scale has been well established for more than a decade. Methods to process and analyse the vast quantity of expression data generated by a typical microarray experiment are similarly well-established. The Affymetrix Exon 1.0 ST array is a relatively new type of array, which has the capability to assess expression at the individual exon level. This allows a more comprehensive analysis of the transcriptome, and in particular enables the study of alternative splicing, a gene regulation mechanism important in both normal conditions and in diseases. Some aspects of exon array data analysis are shared with those for standard gene expression data but others present new challenges that have required development of novel tools. Here, I will introduce the exon array and present a detailed example tutorial for analysis of data generated using this platform. PMID:21498550

  10. Understanding the physics of oligonucleotide microarrays: the Affymetrix spike-in data reanalysed.

    PubMed

    Burden, Conrad J

    2008-03-27

    The Affymetrix U95 and U133 Latin-Square spike-in datasets are reanalysed, together with a dataset from a version of the U95 spike-in experiment without a complex non-specific background. The approach uses a physico-chemical model which includes the effects of the specific and non-specific hybridization and probe folding at the microarray surface, target folding and hybridization in the bulk RNA target solution and duplex dissociation during the post-hybridization washing phase. The model predicts a three-parameter hyperbolic response function that fits well with fluorescence intensity data from all the three datasets. The importance of the various hybridization and washing effects in determining each of the three parameters is examined, and some guidance is given as to how a practical algorithm for determining specific target concentrations might be developed.
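
    The abstract mentions a three-parameter hyperbolic response function but does not state its form. As a rough illustration only, the sketch below assumes a Langmuir-type hyperbola, F(c) = a + b*c/(c + k), and inverts it to recover a concentration from an intensity. The parameter names (a for background, b for saturation amplitude, k for half-saturation concentration) and all numeric values are hypothetical and are not taken from the paper.

```python
def hyperbolic_response(c, a, b, k):
    """Assumed form: background a plus a saturating term b*c/(c + k)."""
    return a + b * c / (c + k)


def invert_response(f, a, b, k):
    """Recover the concentration from an intensity f; valid for a < f < a + b."""
    if not (a < f < a + b):
        raise ValueError("intensity outside the invertible range")
    return k * (f - a) / (a + b - f)


# Hypothetical parameters and concentration, chosen only for illustration.
a, b, k = 100.0, 20000.0, 50.0
conc = 8.0
intensity = hyperbolic_response(conc, a, b, k)
recovered = invert_response(intensity, a, b, k)
```

    Under this assumed form the response equals the background a at zero concentration and reaches half of its saturating amplitude at c = k, which is why an inversion of this kind loses precision as intensities approach a + b.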

  11. Understanding the physics of oligonucleotide microarrays: the Affymetrix spike-in data reanalysed

    NASA Astrophysics Data System (ADS)

    Burden, Conrad J.

    2008-03-01

    The Affymetrix U95 and U133 Latin-Square spike-in datasets are reanalysed, together with a dataset from a version of the U95 spike-in experiment without a complex non-specific background. The approach uses a physico-chemical model which includes the effects of the specific and non-specific hybridization and probe folding at the microarray surface, target folding and hybridization in the bulk RNA target solution and duplex dissociation during the post-hybridization washing phase. The model predicts a three-parameter hyperbolic response function that fits well with fluorescence intensity data from all the three datasets. The importance of the various hybridization and washing effects in determining each of the three parameters is examined, and some guidance is given as to how a practical algorithm for determining specific target concentrations might be developed.

  12. Exon array data analysis using Affymetrix power tools and R statistical software.

    PubMed

    Lockstone, Helen E

    2011-11-01

    The use of microarray technology to measure gene expression on a genome-wide scale has been well established for more than a decade. Methods to process and analyse the vast quantity of expression data generated by a typical microarray experiment are similarly well-established. The Affymetrix Exon 1.0 ST array is a relatively new type of array, which has the capability to assess expression at the individual exon level. This allows a more comprehensive analysis of the transcriptome, and in particular enables the study of alternative splicing, a gene regulation mechanism important in both normal conditions and in diseases. Some aspects of exon array data analysis are shared with those for standard gene expression data but others present new challenges that have required development of novel tools. Here, I will introduce the exon array and present a detailed example tutorial for analysis of data generated using this platform.

  13. ArrayInitiative - a tool that simplifies creating custom Affymetrix CDFs

    PubMed Central

    2011-01-01

    Background Probes on a microarray represent a frozen view of a genome and are quickly outdated when new sequencing studies extend our knowledge, resulting in significant measurement error when analyzing any microarray experiment. There are several bioinformatics approaches to improve probe assignments, but without in-house programming expertise, standardizing these custom array specifications as a usable file (e.g. as Affymetrix CDFs) is difficult, owing mostly to the complexity of the specification file format. However, without correctly standardized files there is a significant barrier for testing competing analysis approaches since this file is one of the required inputs for many commonly used algorithms. The need to test combinations of probe assignments and analysis algorithms led us to develop ArrayInitiative, a tool for creating and managing custom array specifications. Results ArrayInitiative is a standalone, cross-platform, rich client desktop application for creating correctly formatted, custom versions of manufacturer-provided (default) array specifications, requiring only minimal knowledge of the array specification rules and file formats. Users can import default array specifications, import probe sequences for a default array specification, design and import a custom array specification, export any array specification to multiple output formats, export the probe sequences for any array specification and browse high-level information about the microarray, such as version and number of probes. The initial release of ArrayInitiative supports the Affymetrix 3' IVT expression arrays we currently analyze, but as an open source application, we hope that others will contribute modules for other platforms. Conclusions ArrayInitiative allows researchers to create new array specifications, in a standard format, based upon their own requirements. This makes it easier to test competing design and analysis strategies that depend on probe definitions. Since the

  14. A tractable probabilistic model for Affymetrix probe-level analysis across multiple chips.

    PubMed

    Liu, Xuejun; Milo, Marta; Lawrence, Neil D; Rattray, Magnus

    2005-09-15

    Affymetrix GeneChip arrays are currently the most widely used microarray technology. Many summarization methods have been developed to provide gene expression levels from Affymetrix probe-level data. Most of the currently popular methods do not provide a measure of uncertainty for the expression level of each gene. The use of probabilistic models can overcome this limitation. A full hierarchical Bayesian approach requires the use of computationally intensive MCMC methods that are impractical for large datasets. An alternative computationally efficient probabilistic model, mgMOS, uses Gamma distributions to model specific and non-specific binding with a latent variable to capture variations in probe affinity. Although promising, the main limitations of this model are that it does not use information from multiple chips and does not account for specific binding to the mismatch (MM) probes. We extend mgMOS to model the binding affinity of probe-pairs across multiple chips and to capture the effect of specific binding to MM probes. The new model, multi-mgMOS, provides improved accuracy, as demonstrated on some benchmark datasets and a real time-course dataset, and is much more computationally efficient than a competing hierarchical Bayesian approach that requires MCMC sampling. We demonstrate how the probabilistic model can be used to estimate credibility intervals for expression levels and their log-ratios between conditions. Both mgMOS and the new model multi-mgMOS have been implemented in an R package, which is available at http://www.bioinf.man.ac.uk/resources/puma.

  15. MAAMD: a workflow to standardize meta-analyses and comparison of affymetrix microarray data

    PubMed Central

    2014-01-01

    Background Mandatory deposit of raw microarray data files for public access, prior to study publication, provides significant opportunities to conduct new bioinformatics analyses within and across multiple datasets. Analysis of raw microarray data files (e.g. Affymetrix CEL files) can be time consuming, complex, and requires fundamental computational and bioinformatics skills. The development of analytical workflows to automate these tasks simplifies the processing of, improves the efficiency of, and serves to standardize multiple and sequential analyses. Once installed, workflows facilitate the tedious steps required to run rapid intra- and inter-dataset comparisons. Results We developed a workflow to facilitate and standardize Meta-Analysis of Affymetrix Microarray Data analysis (MAAMD) in Kepler. Two freely available stand-alone software tools, R and AltAnalyze, were embedded in MAAMD. The inputs of MAAMD are user-editable csv files, which contain sample information and parameters describing the locations of input files and required tools. MAAMD was tested by analyzing 4 different GEO datasets from mice and Drosophila. MAAMD automates data downloading, data organization, data quality control assessment, differential gene expression analysis, clustering analysis, pathway visualization, gene-set enrichment analysis, and cross-species orthologous-gene comparisons. MAAMD was utilized to identify gene orthologues responding to hypoxia or hyperoxia in both mice and Drosophila. The entire set of analyses for 4 datasets (34 total microarrays) finished in ~ one hour. Conclusions MAAMD saves time, minimizes the required computer skills, and offers a standardized procedure for users to analyze microarray datasets and make new intra- and inter-dataset comparisons. PMID:24621103

  16. SNP genotyping by heteroduplex analysis.

    PubMed

    Paniego, Norma; Fusari, Corina; Lia, Verónica; Puebla, Andrea

    2015-01-01

    Heteroduplex-based genotyping methods have proven to be technologically effective and economically efficient for low- to medium-range throughput single-nucleotide polymorphism (SNP) determination. In this chapter we describe two protocols that were successfully applied for SNP detection and haplotype analysis of candidate genes in association studies. The protocols involve (1) enzymatic mismatch cleavage with endonuclease CEL1 from celery, associated with fragment separation using capillary electrophoresis (CEL1 cleavage), and (2) differential retention of the homo/heteroduplex DNA molecules under partial denaturing conditions on ion pair reversed-phase liquid chromatography (dHPLC). Both methods are complementary since dHPLC is more versatile than CEL1 cleavage for identifying multiple SNP per target region, and the latter is easily optimized for sequences with fewer SNPs or small insertion/deletion polymorphisms. Besides, CEL1 cleavage is a powerful method to localize the position of the mutation when fragment resolution is done using capillary electrophoresis.

  17. Spin Transport and Relaxation up to 250 K in Heavily Doped n-Type Ge Detected Using Co2FeAl0.5Si0.5 Electrodes

    NASA Astrophysics Data System (ADS)

    Fujita, Y.; Yamada, M.; Tsukahara, M.; Oka, T.; Yamada, S.; Kanashima, T.; Sawano, K.; Hamaya, K.

    2017-07-01

    To achieve spin transport in heavily doped n-type Ge (n+-type Ge) in the high-temperature range (T ≥ 130 K), we examine the growth of highly spin-polarized Co2FeAl0.5Si0.5 (CFAS) films on Ge(111). Using lateral spin valves with the CFAS/Ge Schottky-tunnel contacts, we can observe giant enhancement in the nonlocal spin signals at low temperatures and get a spin signal of approximately 1 mΩ at room temperature. Since nonlocal Hanle-effect curves can be seen up to about 250 K, we experimentally clarify that the spin-relaxation mechanism in n+-type Ge in 8 K ≤ T ≤ 250 K is attributed to impurity- and phonon-induced spin-flip scatterings. This study experimentally shows the availability of the CFAS spin injector and detector for getting important factors of the spin relaxation up to near room temperature, even in Ge.

  18. Quality assessment of the Affymetrix U133A&B probesets by target sequence mapping and expression data analysis.

    PubMed

    Orlov, Yuriy L; Zhou, Jiangtao; Lipovich, Leonard; Shahab, Atif; Kuznetsov, Vladimir A

    2007-01-01

    Careful analysis of microarray probe design should be an obligatory component of the MicroArray Quality Control (MAQC) project [Patterson et al., 2006; Shi et al., 2006] initiated by the FDA (USA) in order to provide quality control tools to researchers of gene expression profiles and to translate the microarray technology from bench to bedside. The identification and filtering of unreliable probesets are important preprocessing steps before analysis of microarray data. These steps may result in an essential improvement in the selection of differentially expressed genes, gene clustering and construction of co-regulatory expression networks. We revised genome localization of the Affymetrix U133A&B GeneChip initial (target) probe sequences, and evaluated the impact of erroneous and poorly annotated target sequences on the quality of gene expression data. We found about 25% of Affymetrix target sequences overlapping with interspersed repeats that could cause cross-hybridization effects. In total, discrepancies in target sequence annotation account for up to approximately 30% of 44692 Affymetrix probesets. We introduce a novel quality control algorithm based on target sequence mapping onto genome and GeneChip expression data analysis. To validate the quality of probesets we used expression data from large, clinically and genetically distinct groups of breast cancers (249 samples). For the first time, we quantitatively evaluated the effect of repeats and other sources of inadequate probe design on the specificity, reliability and discrimination ability of Affymetrix probesets. We propose that only functionally reliable Affymetrix probesets that passed our quality control algorithm (approximately 86%) for gene expression analysis should be utilized. The target sequence annotation and filtering is available upon request.

  19. Elucidation of the ‘Honeycrisp’ pedigree through haplotype analysis with a multi-family integrated SNP linkage map and a large apple (Malus×domestica) pedigree-connected SNP data set

    PubMed Central

    Howard, Nicholas P; van de Weg, Eric; Bedford, David S; Peace, Cameron P; Vanderzande, Stijn; Clark, Matthew D; Teh, Soon Li; Cai, Lichun; Luby, James J

    2017-01-01

    The apple (Malus×domestica) cultivar Honeycrisp has become important economically and as a breeding parent. An earlier study with SSR markers indicated the original recorded pedigree of ‘Honeycrisp’ was incorrect and ‘Keepsake’ was identified as one putative parent, the other being unknown. The objective of this study was to verify ‘Keepsake’ as a parent and identify and genetically describe the unknown parent and its grandparents. A multi-family based dense and high-quality integrated SNP map was created using the apple 8 K Illumina Infinium SNP array. This map was used alongside a large pedigree-connected data set from the RosBREED project to build extended SNP haplotypes and to identify pedigree relationships. ‘Keepsake’ was verified as one parent of ‘Honeycrisp’ and ‘Duchess of Oldenburg’ and ‘Golden Delicious’ were identified as grandparents through the unknown parent. Following this finding, siblings of ‘Honeycrisp’ were identified using the SNP data. Breeding records from several of these siblings suggested that the previously unreported parent is a University of Minnesota selection, MN1627. This selection is no longer available, but now is genetically described through imputed SNP haplotypes. We also present the mosaic grandparental composition of ‘Honeycrisp’ for each of its 17 chromosome pairs. This new pedigree and genetic information will be useful in future pedigree-based genetic studies to connect ‘Honeycrisp’ with other cultivars used widely in apple breeding programs. The created SNP linkage map will benefit future research using the data from the Illumina apple 8 and 20 K and Affymetrix 480 K SNP arrays. PMID:28243452

  20. Identifying the impact of G-quadruplexes on Affymetrix 3' arrays using cloud computing.

    PubMed

    Memon, Farhat N; Owen, Anne M; Sanchez-Graillet, Olivia; Upton, Graham J G; Harrison, Andrew P

    2010-01-15

    A tetramer quadruplex structure is formed by four parallel strands of DNA/RNA containing runs of guanine. These quadruplexes are able to form because guanine can Hoogsteen hydrogen bond to other guanines, and a tetrad of guanines can form a stable arrangement. Recently we have discovered that probes on Affymetrix GeneChips that contain runs of guanine do not measure gene expression reliably. We associate this finding with the likelihood that quadruplexes are forming on the surface of GeneChips. In order to cope with the rapidly expanding size of GeneChip array datasets in the public domain, we are exploring the use of cloud computing to replicate our experiments on 3' arrays to look at the effect of the location of G-spots (runs of guanines). Cloud computing is a recently introduced high-performance solution that takes advantage of the computational infrastructure of large organisations such as Amazon and Google. We expect that cloud computing will become widely adopted because it enables bioinformaticians to avoid capital expenditure on expensive computing resources and to only pay a cloud computing provider for what is used. Moreover, as well as financial efficiency, cloud computing is an ecologically-friendly technology; it enables efficient data-sharing and we expect it to be faster for development purposes. Here we propose the advantageous use of cloud computing to perform a large data-mining analysis of public domain 3' arrays.

  1. The Affymetrix DMET Plus Platform Reveals Unique Distribution of ADME-Related Variants in Ethnic Arabs

    PubMed Central

    Wakil, Salma M.; Nguyen, Cao; Muiya, Nzioka P.; Andres, Editha; Lykowska-Tarnowska, Agnieszka; Baz, Batoul; Meyer, Brian F.; Morahan, Grant

    2015-01-01

    Background. The Affymetrix Drug Metabolizing Enzymes and Transporters (DMET) Plus Premier Pack has been designed to genotype 1936 gene variants thought to be essential for screening patients in personalized drug therapy. These variants include the cytochrome P450s (CYP450s), the key metabolizing enzymes, many other enzymes involved in phase I and phase II pharmacokinetic reactions, and signaling mediators associated with variability in clinical response to numerous drugs not only among individuals, but also between ethnic populations. Materials and Methods. We genotyped 600 Saudi individuals for 1936 variants on the DMET platform to evaluate their clinical potential in personalized medicine in ethnic Arabs. Results. Approximately 49% each of the 437 CYP450 variants, 56% of the 581 transporters, 56% of 419 transferases, 48% of the 104 dehydrogenases, and 58% of the remaining 390 variants were detected. Several variants, such as rs3740071, rs6193, rs258751, rs6199, rs11568421, and rs8187797, exhibited significantly either higher or lower minor allele frequencies (MAFs) than those in other ethnic groups. Discussion. The present study revealed some unique distribution trends for several variants in Arabs, which displayed partly inverse allelic prevalence compared to other ethnic populations. The results point therefore to the need to verify and ascertain the prevalence of a variant as a prerequisite for engaging it in clinical routine screening in personalized medicine in any given population. PMID:25802476
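
    The comparison of minor allele frequencies (MAFs) above rests on simple allele counting. The sketch below shows one common way to compute a MAF from genotype counts at a biallelic site. The function names and the example counts are hypothetical, and treating a variant as "detected" when both alleles are observed is an assumption made here, not a definition given in the abstract.

```python
def minor_allele_frequency(n_aa, n_ab, n_bb):
    """MAF from counts of AA, AB and BB genotypes at a biallelic site."""
    total_alleles = 2 * (n_aa + n_ab + n_bb)
    if total_alleles == 0:
        raise ValueError("no genotypes called")
    freq_a = (2 * n_aa + n_ab) / total_alleles
    return min(freq_a, 1.0 - freq_a)


def is_detected(n_aa, n_ab, n_bb):
    """Assumption: a variant counts as detected if both alleles are observed."""
    return minor_allele_frequency(n_aa, n_ab, n_bb) > 0.0


# Made-up counts for a cohort of 600 individuals.
maf = minor_allele_frequency(486, 108, 6)   # B allele: (108 + 2*6) / 1200
monomorphic = is_detected(600, 0, 0)        # only allele A observed
```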

  2. LiF:Mg,Ti TLD response as a function of photon energy for moderately filtered x-ray spectra in the range of 20-250 kVp relative to ⁶⁰Co

    SciTech Connect

    Nunn, A. A.; Davis, S. D.; Micka, J. A.; DeWerd, L. A.

    2008-05-15

    The response of LiF:Mg,Ti thermoluminescent dosimeters (TLDs) as a function of photon energy was determined using irradiations with moderately filtered x-ray beams in the energy range of 20-250 kVp relative to the response to irradiations with ⁶⁰Co photons. To determine if the relative light output from LiF:Mg,Ti TLDs per unit air kerma as a function of photon energy can be predicted using calculations such as Monte Carlo (MC) simulations, measurements from the x-ray beam irradiations were compared with MC calculated results, similar to the methodology used by Davis et al. [Radiat. Prot. Dosim. 106, 33-43 (2003)]. TLDs were irradiated in photon beams with well-known air kerma rates using the National Institute of Standards and Technology traceable M-series x-ray beams in the range of 20-250 kVp. For each x-ray beam, several sets of TLDs were irradiated for times corresponding to different air kerma levels to take into account any dose nonlinearity. TLD light output was then compared to that from several sets of TLDs irradiated at similar corresponding air kerma levels using a ⁶⁰Co irradiator. The MC code MCNP5 was used to account for photon scatter and attenuation in the holder and TLDs and was used to calculate the predicted relative TLD light output per unit air kerma for irradiations with each of the experimentally used photon beams. The measured relative TLD response as a function of photon energy differed by up to 13% from the MC calculations. We conclude that MC calculations do not accurately predict the relative response of TLDs as a function of photon energy, consistent with the conclusions of Davis et al. [Radiat. Prot. Dosim. 106, 33-43 (2003)]. This is likely due to complications in the solid state physics of the thermoluminescence process that are not incorporated into the simulation.
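
    The quantity being compared is a ratio of ratios: light output per unit air kerma in an x-ray beam, divided by the same quantity for the cobalt-60 reference. The sketch below writes that arithmetic out, together with a percent difference between a measured and a calculated value. The function names and every number are made up for illustration; the percent-difference convention (relative to the calculated value) is also an assumption.

```python
def relative_response(light_x, kerma_x, light_co, kerma_co):
    """Light output per unit air kerma, relative to the cobalt-60 reference."""
    return (light_x / kerma_x) / (light_co / kerma_co)


def percent_difference(measured, calculated):
    """Signed difference of measured from calculated, in percent of calculated."""
    return 100.0 * (measured - calculated) / calculated


# Made-up readings: arbitrary light units, air kerma in arbitrary units.
measured = relative_response(light_x=1300.0, kerma_x=1.0,
                             light_co=1000.0, kerma_co=1.0)
difference = percent_difference(measured, calculated=1.25)
```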

  3. Linear reduction methods for tag SNP selection.

    PubMed

    He, Jingwu; Zelikovsky, Alex

    2004-01-01

    It is widely hoped that constructing a complete human haplotype map will help to associate complex diseases with certain SNPs. Unfortunately, the number of SNPs is huge and it is very costly to sequence many individuals. Therefore, it is desirable to reduce the number of SNPs that must be sequenced to a considerably small number of informative representatives, so-called tag SNPs. In this paper, we propose a new linear-algebra-based method for selecting and using tag SNPs. Our method is purely combinatorial and can be combined with linkage disequilibrium (LD) and block-based methods. We measure the quality of our tag SNP selection algorithm by comparing actual SNPs with SNPs linearly predicted from linearly chosen tag SNPs. We obtain extremely good compression and prediction rates. For example, for long haplotypes (>25000 SNPs), knowing only 0.4% of all SNPs we predict the entire unknown haplotype with 2% accuracy while the prediction method is based on a 10% sample of the population.
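
    The idea of choosing tag SNPs by linear algebra and predicting the rest linearly can be illustrated with a toy example. The sketch below is a simplified stand-in, not the authors' algorithm: it takes a maximal set of linearly independent SNP columns (found by Gauss-Jordan elimination over exact fractions) as tags, and records how every other column is a linear combination of them. The function names and the tiny haplotype matrix are hypothetical.

```python
from fractions import Fraction


def select_tags(haplotypes):
    """Return (tags, coeffs). tags are the pivot columns of the reduced matrix;
    coeffs[j] expresses SNP column j as a combination of the tag columns."""
    rows = [[Fraction(x) for x in row] for row in haplotypes]
    n_rows, n_cols = len(rows), len(rows[0])
    tags = []
    r = 0
    for c in range(n_cols):
        pivot = next((i for i in range(r, n_rows) if rows[i][c] != 0), None)
        if pivot is None:
            continue  # column c depends on earlier columns
        rows[r], rows[pivot] = rows[pivot], rows[r]
        lead = rows[r][c]
        rows[r] = [x / lead for x in rows[r]]
        for i in range(n_rows):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c]
                rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        tags.append(c)
        r += 1
    coeffs = {j: [rows[i][j] for i in range(len(tags))] for j in range(n_cols)}
    return tags, coeffs


def predict(tag_values, coeffs, n_cols):
    """Predict all SNPs of a haplotype from its values at the tag SNPs."""
    return [sum(k * v for k, v in zip(coeffs[j], tag_values))
            for j in range(n_cols)]


# Toy sample: rows are haplotypes, columns are SNPs (0/1 alleles).
sample = [[1, 0, 1, 0],
          [0, 1, 0, 1],
          [1, 1, 1, 1]]
tags, coeffs = select_tags(sample)
predicted = predict([1, 0], coeffs, n_cols=4)
```

    In this toy matrix SNPs 2 and 3 duplicate SNPs 0 and 1, so two tags reproduce all four columns exactly. Real data are rarely exactly dependent, so a practical method would need an approximate criterion and a rounding step, which this sketch leaves out.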

  4. Ancestry informative marker panels for African Americans based on subsets of commercially available SNP arrays.

    PubMed

    Tandon, Arti; Patterson, Nick; Reich, David

    2011-01-01

    Admixture mapping is a widely used method for localizing disease genes in African Americans. Most current methods for inferring ancestry at each locus in the genome use a few thousand single nucleotide polymorphisms (SNPs) that are very different in frequency between West Africans and European Americans, and that are required to not be in linkage disequilibrium in the ancestral populations. Modern SNP arrays provide data on hundreds of thousands of SNPs per sample, and to use these to infer ancestry, using many of the standard methods, it is necessary to choose subsets of the SNPs for analysis. Here we present panels of about 4,300 ancestry informative markers (AIMs) that are subsets respectively of SNPs on the Illumina 1 M, Illumina 650, Illumina 610, Affymetrix 6.0 and Affymetrix 5.0 arrays. To validate the usefulness of these panels, we applied them to samples that are different from the ones used to select the SNPs. The panels provide about 80% of the maximum information about African or European ancestry, even with up to 10% missing data. © 2010 Wiley-Liss, Inc.
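
    The selection criteria described above (large frequency difference between ancestral populations, and markers not in linkage disequilibrium with each other) can be sketched as a greedy filter. This is an illustration under stated assumptions, not the authors' procedure: the function name, the thresholds, the SNP records, and the use of physical spacing as a crude proxy for the absence of LD are all hypothetical.

```python
def select_aims(snps, min_delta=0.5, min_spacing=500_000):
    """snps: list of (name, chrom, pos, freq_pop1, freq_pop2).
    Greedily keep the most differentiated SNPs that are far enough apart."""
    ranked = sorted(snps, key=lambda s: abs(s[3] - s[4]), reverse=True)
    chosen = []
    for name, chrom, pos, f1, f2 in ranked:
        if abs(f1 - f2) < min_delta:
            break  # remaining SNPs are even less informative
        far_enough = all(c != chrom or abs(p - pos) >= min_spacing
                         for _, c, p in chosen)
        if far_enough:
            chosen.append((name, chrom, pos))
    return [name for name, _, _ in chosen]


# Made-up allele frequencies in two ancestral populations.
candidates = [
    ("snpA", 1, 1_000_000, 0.90, 0.10),
    ("snpB", 1, 1_200_000, 0.85, 0.10),  # close to snpA, so skipped
    ("snpC", 1, 3_000_000, 0.20, 0.80),
    ("snpD", 2, 500_000, 0.55, 0.45),    # weakly differentiated
    ("snpE", 2, 900_000, 0.05, 0.70),
]
panel = select_aims(candidates)
```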

  5. Development and evaluation of an Arabidopsis whole genome Affymetrix probe array.

    PubMed

    Redman, Julia C; Haas, Brian J; Tanimoto, Gene; Town, Christopher D

    2004-05-01

    We describe the development of a high-density Arabidopsis 'whole genome' oligonucleotide probe array for expression analysis (the Affymetrix ATH1 GeneChip probe array) that contains approximately 22 750 probe sets. Precedence on the array was given to genes for which either expression evidence or a credible database match existed. The remaining space was filled with 'hypothetical' genes. The new ATH1 array represents approximately 23 750 genes of which 60% were detected in RNA from cultured seedlings. Sensitivity of the array, determined using spiking controls, was approximately one transcript per cell. The array demonstrated high technical reproducibility and concordance with real-time PCR results. Indole-3 acetic acid (IAA)-induced changes in gene expression were used for biological validation of the array. A total of 222 genes were significantly upregulated and 103 significantly downregulated by exposure to IAA. Of the genes whose products could be functionally classified, the largest specific classes of upregulated genes were transcriptional regulators and protein kinases, many fewer of which were represented among the downregulated genes. Over one-third of the auxin-regulated genes have no known function, although many belong to gene families with members that have previously been shown to be auxin regulated. For the 6714 genes represented both on this and the earlier Arabidopsis Genome (AG) array, both signal intensities and gene expression ratios were very similar. Mapping of the oligonucleotides on the ATH1 array to the latest (version 4.0) annotation showed that over 95% of the probe sets (based on version 2.0 annotation) still fully represented their original target genes.

  6. Preferred analysis methods for Affymetrix GeneChips. II. An expanded, balanced, wholly-defined spike-in dataset

    PubMed Central

    2010-01-01

    Background Concomitant with the rise in the popularity of DNA microarrays has been a surge of proposed methods for the analysis of microarray data. Fully controlled "spike-in" datasets are an invaluable but rare tool for assessing the performance of various methods. Results We generated a new wholly defined Affymetrix spike-in dataset consisting of 18 microarrays. Over 5700 RNAs are spiked in at relative concentrations ranging from 1- to 4-fold, and the arrays from each condition are balanced with respect to both total RNA amount and degree of positive versus negative fold change. We use this new "Platinum Spike" dataset to evaluate microarray analysis routes and contrast the results to those achieved using our earlier Golden Spike dataset. Conclusions We present updated best-route methods for Affymetrix GeneChip analysis and demonstrate that the degree of "imbalance" in gene expression has a significant effect on the performance of these methods. PMID:20507584

  7. Genome-wide SNP genotyping highlights the role of natural selection in Plasmodium falciparum population divergence

    PubMed Central

    Neafsey, Daniel E; Schaffner, Stephen F; Volkman, Sarah K; Park, Daniel; Montgomery, Philip; Milner, Danny A; Lukens, Amanda; Rosen, David; Daniels, Rachel; Houde, Nathan; Cortese, Joseph F; Tyndall, Erin; Gates, Casey; Stange-Thomann, Nicole; Sarr, Ousmane; Ndiaye, Daouda; Ndir, Omar; Mboup, Soulyemane; Ferreira, Marcelo U; Moraes, Sandra do Lago; Dash, Aditya P; Chitnis, Chetan E; Wiegand, Roger C; Hartl, Daniel L; Birren, Bruce W; Lander, Eric S; Sabeti, Pardis C; Wirth, Dyann F

    2008-01-01

    Background The malaria parasite Plasmodium falciparum exhibits abundant genetic diversity, and this diversity is key to its success as a pathogen. Previous efforts to study genetic diversity in P. falciparum have begun to elucidate the demographic history of the species, as well as patterns of population structure and patterns of linkage disequilibrium within its genome. Such studies will be greatly enhanced by new genomic tools and recent large-scale efforts to map genomic variation. To that end, we have developed a high throughput single nucleotide polymorphism (SNP) genotyping platform for P. falciparum. Results Using an Affymetrix 3,000 SNP assay array, we found roughly half the assays (1,638) yielded high quality, 100% accurate genotyping calls for both major and minor SNP alleles. Genotype data from 76 global isolates confirm significant genetic differentiation among continental populations and varying levels of SNP diversity and linkage disequilibrium according to geographic location and local epidemiological factors. We further discovered that nonsynonymous and silent (synonymous or noncoding) SNPs differ with respect to within-population diversity, inter-population differentiation, and the degree to which allele frequencies are correlated between populations. Conclusions The distinct population profile of nonsynonymous variants indicates that natural selection has a significant influence on genomic diversity in P. falciparum, and that many of these changes may reflect functional variants deserving of follow-up study. Our analysis demonstrates the potential for new high-throughput genotyping technologies to enhance studies of population structure, natural selection, and ultimately enable genome-wide association studies in P. falciparum to find genes underlying key phenotypic traits. PMID:19077304

  8. Genome-wide SNP analysis of the Systemic Capillary Leak Syndrome (Clarkson disease)

    PubMed Central

    Xie, Zhihui; Nagarajan, Vijayaraj; Sturdevant, Daniel E; Iwaki, Shoko; Chan, Eunice; Wisch, Laura; Young, Michael; Nelson, Celeste M; Porcella, Stephen F; Druey, Kirk M

    2013-01-01

    The Systemic Capillary Leak Syndrome (SCLS) is an extremely rare, orphan disease that resembles, and is frequently erroneously diagnosed as, systemic anaphylaxis. The disorder is characterized by repeated, transient, and seemingly unprovoked episodes of hypotensive shock and peripheral edema due to transient endothelial hyperpermeability. SCLS is often accompanied by a monoclonal gammopathy of unknown significance (MGUS). Using Affymetrix Single Nucleotide Polymorphism (SNP) microarrays, we performed the first genome-wide SNP analysis of SCLS in a cohort of 12 disease subjects and 18 controls. Exome capture sequencing was performed on genomic DNA from nine of these patients as validation for the SNP-chip discoveries and de novo data generation. We identified candidate susceptibility loci for SCLS, which included a region flanking CAV3 (3p25.3) as well as SNP clusters in PON1 (7q21.3), PSORS1C1 (6p21.3), and CHCHD3 (7q33). Among the most highly ranked discoveries were gene-associated SNPs in the uncharacterized LOC100130480 gene (rs6417039, rs2004296). Top case-associated SNPs were observed in BTRC (rs12355803, rs4436485), ARHGEF18 (rs11668246), CDH13 (rs4782779), and EDG2 (rs12552348), which encode proteins with known or suspected roles in B cell function and/or vascular integrity. 61 SNPs that were significantly associated with SCLS by microarray analysis were also detected and validated by exome deep sequencing. Functional annotation of highly ranked SNPs revealed enrichment of cell projections, cell junctions and adhesion, and molecules containing pleckstrin homology, Ras/Rho regulatory, and immunoglobulin Ig-like C2/fibronectin type III domains, all of which involve mechanistic functions that correlate with the SCLS phenotype. These results highlight SNPs with potential relevance to SCLS. PMID:24808988

  9. Development of a Medium Density Combined-Species SNP Array for Pacific and European Oysters (Crassostrea gigas and Ostrea edulis)

    PubMed Central

    Gutierrez, Alejandro P.; Turner, Frances; Gharbi, Karim; Talbot, Richard; Lowe, Natalie R.; Peñaloza, Carolina; McCullough, Mark; Prodöhl, Paulo A.; Bean, Tim P.; Houston, Ross D.

    2017-01-01

    SNP arrays are enabling tools for high-resolution studies of the genetic basis of complex traits in farmed and wild animals. Oysters are of critical importance in many regions from both an ecological and economic perspective, and oyster aquaculture forms a key component of global food security. The aim of our study was to design a combined-species, medium density SNP array for Pacific oyster (Crassostrea gigas) and European flat oyster (Ostrea edulis), and to test the performance of this array on farmed and wild populations from multiple locations, with a focus on European populations. SNP discovery was carried out by whole-genome sequencing (WGS) of pooled genomic DNA samples from eight C. gigas populations, and restriction site-associated DNA sequencing (RAD-Seq) of 11 geographically diverse O. edulis populations. Nearly 12 million candidate SNPs were discovered and filtered based on several criteria, including preference for SNPs segregating in multiple populations and SNPs with monomorphic flanking regions. An Affymetrix Axiom Custom Array was created and tested on a diverse set of samples (n = 219) showing ∼27 K high quality SNPs for C. gigas and ∼11 K high quality SNPs for O. edulis segregating in these populations. A high proportion of SNPs were segregating in each of the populations, and the array was used to detect population structure and levels of linkage disequilibrium (LD). Further testing of the array on three C. gigas nuclear families (n = 165) revealed that the array can be used to clearly distinguish between both families based on identity-by-state (IBS) clustering parental assignment software. This medium density, combined-species array will be publicly available through Affymetrix, and will be applied for genome-wide association and evolutionary genetic studies, and for genomic selection in oyster breeding programs. PMID:28533337
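
    Identity-by-state (IBS) clustering starts from a pairwise similarity between genotype vectors. The sketch below shows one standard way to compute it, assuming genotypes are coded as 0, 1 or 2 copies of a reference allele with None for missing calls. The function name and the example genotypes are hypothetical, and the abstract does not say which software or coding the authors used.

```python
def ibs_similarity(g1, g2):
    """Mean proportion of alleles shared identical-by-state (0.0 to 1.0).
    Genotypes are 0/1/2 allele counts; loci with a missing call are skipped."""
    shared = 0
    compared = 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue
        shared += 2 - abs(a - b)  # 2, 1 or 0 alleles shared at this locus
        compared += 1
    if compared == 0:
        raise ValueError("no loci with calls in both samples")
    return shared / (2 * compared)


# Made-up genotypes at five SNPs.
parent = [0, 1, 2, 2, 0]
offspring = [0, 1, 2, 1, None]
unrelated = [2, 1, 0, 0, 2]

within_family = ibs_similarity(parent, offspring)
between_family = ibs_similarity(parent, unrelated)
```

    A matrix of such pairwise values can then be passed to any clustering routine; members of the same family are expected to score higher with each other than with members of other families.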

  10. Development of a Medium Density Combined-Species SNP Array for Pacific and European Oysters (Crassostrea gigas and Ostrea edulis).

    PubMed

    Gutierrez, Alejandro P; Turner, Frances; Gharbi, Karim; Talbot, Richard; Lowe, Natalie R; Peñaloza, Carolina; McCullough, Mark; Prodöhl, Paulo A; Bean, Tim P; Houston, Ross D

    2017-07-05

    SNP arrays are enabling tools for high-resolution studies of the genetic basis of complex traits in farmed and wild animals. Oysters are of critical importance in many regions from both an ecological and economic perspective, and oyster aquaculture forms a key component of global food security. The aim of our study was to design a combined-species, medium density SNP array for Pacific oyster (Crassostrea gigas) and European flat oyster (Ostrea edulis), and to test the performance of this array on farmed and wild populations from multiple locations, with a focus on European populations. SNP discovery was carried out by whole-genome sequencing (WGS) of pooled genomic DNA samples from eight C. gigas populations, and restriction site-associated DNA sequencing (RAD-Seq) of 11 geographically diverse O. edulis populations. Nearly 12 million candidate SNPs were discovered and filtered based on several criteria, including preference for SNPs segregating in multiple populations and SNPs with monomorphic flanking regions. An Affymetrix Axiom Custom Array was created and tested on a diverse set of samples (n = 219) showing ∼27 K high quality SNPs for C. gigas and ∼11 K high quality SNPs for O. edulis segregating in these populations. A high proportion of SNPs were segregating in each of the populations, and the array was used to detect population structure and levels of linkage disequilibrium (LD). Further testing of the array on three C. gigas nuclear families (n = 165) revealed that the array can be used to clearly distinguish between the families based on identity-by-state (IBS) clustering and parental assignment software. This medium density, combined-species array will be publicly available through Affymetrix, and will be applied for genome-wide association and evolutionary genetic studies, and for genomic selection in oyster breeding programs. Copyright © 2017 Gutierrez et al.
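
    The identity-by-state (IBS) clustering mentioned in this abstract rests on a pairwise similarity between genotype vectors. The following is a minimal sketch, assuming genotypes are coded as 0, 1 or 2 copies of one allele with None for a missing call. The function name and the toy genotypes are hypothetical and are not taken from the study or from any parental assignment software.

```python
from itertools import combinations


def ibs_similarity(g1, g2):
    """Mean proportion of alleles shared identical-by-state.

    Only loci with a call in both samples are compared.
    """
    shared = 0
    compared = 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue  # skip loci with a missing call in either sample
        shared += 2 - abs(a - b)  # 2, 1 or 0 alleles shared at this locus
        compared += 2
    if compared == 0:
        raise ValueError("no loci genotyped in both samples")
    return shared / compared


# Hypothetical toy genotypes (allele counts per SNP), for illustration only.
samples = {
    "fam1_a": [0, 1, 2, 2, 0, 1],
    "fam1_b": [0, 1, 2, 1, 0, 1],
    "fam2_a": [2, 0, 0, 1, 2, None],
}

pairwise = {
    (x, y): ibs_similarity(samples[x], samples[y])
    for x, y in combinations(samples, 2)
}
```

    In this toy data the two "fam1" samples share far more alleles with each other than either does with "fam2_a", which is the kind of signal a clustering step would group on.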

  11. Determination of the Sensibility Factors for TLD-100 Powder on the Energy of X-Ray of 50, 250 kVp; 192Ir, 137Cs and 60Co

    SciTech Connect

    Loaiza, Sandra P.; Alvarez, Jose T.

    2006-09-08

    TLD-100 powder is calibrated in terms of absorbed dose to water, Dw, using the protocols AAPM TG61, AAPM TG43 and IAEA-TRS 398 for X-ray energies of 50 and 250 kVp, 137Cs and 60Co, respectively. The calibration curves, TLD response R versus Dw, are fitted by weighted least squares with quadratic polynomials, which are validated with the lack-of-fit test and the Anderson-Darling normality test. The slope of these curves corresponds to the sensibility factor: Fs = R/Dw, [Fs] = nC Gy⁻¹. The expanded uncertainties U for these factors are obtained from the ANOVA tables. The Fs values are then interpolated for 192Ir using its effective energy, hν_eff. The SSDL sent a set of capsules with TLD-100 powder to two hospitals, which irradiated them to a nominal dose of Dw = 2 Gy. The results determined at the SSDL are: Hospital A overestimates Dw by about 4.8%, and Hospital B underestimates it by between 1.4% and 17.5%.

  12. Crystallization of 21.25Gd2O3-63.75MoO3-15B2O3 glass induced by femtosecond laser at the repetition rate of 250 kHz

    NASA Astrophysics Data System (ADS)

    Zhong, M. J.; Han, Y. M.; Liu, L. P.; Zhou, P.; Du, Y. Y.; Guo, Q. T.; Ma, H. L.; Dai, Y.

    2010-12-01

    We report the formation of β'-Gd2(MoO4)3 (GMO) crystal on the surface of the 21.25Gd2O3-63.75MoO3-15B2O3 glass, induced by 250 kHz, 800 nm femtosecond laser irradiation. The morphology of the modified region in the glass was clearly examined by scanning electron microscopy (SEM). By micro-Raman spectra, the laser-induced crystals were confirmed to be GMO phases and it is found that these crystals have a strong dependence on the number and power of the femtosecond laser pulses. When the irradiation laser power was 900 mW, not only the Raman peaks of GMO crystals but also some new peaks at 214 cm⁻¹, 240 cm⁻¹, 466 cm⁻¹, 664 cm⁻¹ and 994 cm⁻¹, which belong to the MoO3 crystals, were observed. The possible mechanisms are proposed to explain these phenomena.

  13. Detection of selective sweeps in cattle using genome-wide SNP data

    PubMed Central

    2013-01-01

    Background: The domestication and subsequent selection by humans to create breeds and biological types of cattle undoubtedly altered the patterning of variation within their genomes. Strong selection to fix advantageous large-effect mutations underlying domesticability, breed characteristics or productivity created selective sweeps in which variation was lost in the chromosomal region flanking the selected allele. Selective sweeps have now been identified in the genomes of many animal species including humans, dogs, horses, and chickens. Here, we attempt to identify and characterise regions of the bovine genome that have been subjected to selective sweeps. Results: Two datasets were used for the discovery and validation of selective sweeps via the fixation of alleles at a series of contiguous SNP loci. BovineSNP50 data were used to identify 28 putative sweep regions among 14 diverse cattle breeds. Affymetrix BOS 1 prescreening assay data for five breeds were used to identify 85 regions and validate 5 regions identified using the BovineSNP50 data. Many genes are located within these regions and the lack of sequence data for the analysed breeds precludes the nomination of selected genes or variants and limits the prediction of the selected phenotypes. However, phenotypes that we predict to have historically been under strong selection include horned-polled, coat colour, stature, ear morphology, and behaviour. Conclusions: The bias towards common SNPs in the design of the BovineSNP50 assay led to the identification of recent selective sweeps associated with breed formation and common to only a small number of breeds rather than ancient events associated with domestication which could potentially be common to all European taurines. The limited SNP density, or marker resolution, of the BovineSNP50 assay significantly impacted the rate of false discovery of selective sweeps, however, we found sweeps in common between breeds which were confirmed using an ultra
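
    The abstract describes finding sweeps through the fixation of alleles at a series of contiguous SNP loci. One way to picture that idea is a scan for runs of consecutive SNPs with no minor allele in a breed. The sketch below is only an illustration under stated assumptions: the function name, the minimum run length and the allele-frequency cutoff are hypothetical, and the study's actual criteria are not given in the abstract.

```python
def fixed_runs(positions, maf, min_snps=5, max_maf=0.0):
    """Find runs of consecutive SNPs whose minor allele frequency is <= max_maf.

    positions: SNP coordinates along one chromosome, in map order.
    maf:       minor allele frequency of each SNP within one breed.
    Returns a list of (start_position, end_position, n_snps) tuples.
    Both thresholds are illustrative assumptions, not published values.
    """
    runs = []
    start = None
    for i, freq in enumerate(maf):
        if freq <= max_maf:
            if start is None:
                start = i  # a new run of fixed SNPs begins here
        else:
            if start is not None and i - start >= min_snps:
                runs.append((positions[start], positions[i - 1], i - start))
            start = None
    # close a run that extends to the last SNP
    if start is not None and len(maf) - start >= min_snps:
        runs.append((positions[start], positions[-1], len(maf) - start))
    return runs


# Hypothetical toy data: ten SNPs spaced 100 bp apart.
toy_positions = [100 * (i + 1) for i in range(10)]
toy_maf = [0.2, 0.0, 0.0, 0.0, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0]
toy_runs = fixed_runs(toy_positions, toy_maf, min_snps=4)
```

    With a minimum of four SNPs, the short three-SNP run is ignored and only the final five-SNP run is reported. A real analysis would also need to account for SNP density and ascertainment bias, which the abstract itself flags as a source of false discoveries.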

  14. Analysis of a claimed distant relationship in a deficient pedigree using high density SNP data.

    PubMed

    Lareu, M V; García-Magariños, M; Phillips, C; Quintela, I; Carracedo, A; Salas, A

    2012-05-01

    DNA markers are routinely used to reveal both simple and complex family relationships. Likelihood based approaches have been traditionally used to estimate relationships using relatively few unlinked markers. However it is widely recognized that when using such limited numbers of loci distant relationships between two individuals cannot be distinguished from the average level of allele sharing found in random pairwise comparisons in the same population. As a real example, we demonstrate the usefulness of genome-wide SNP genotyping to analyze a claimed second cousin relationship that could not be resolved using standard forensic markers, confirming theoretical expectations for very distant relationships. Genome profiles derived from Affymetrix 6.0 SNP arrays obtained from the claimed second cousins were compared to profiles obtained from unrelated individuals and simulated data. Significance of the high estimated probabilities in favor of the second cousin relationship hypothesis was proved from the results obtained with both real and simulated unrelated pairs. As a final cautionary note, it is important to consider that successful identification of the claimed distant relationship reported here is largely due to a well-founded hypothesis being compared to the alternative hypothesis of the claimants being unrelated, but where there are several possible alternative hypotheses, the approach we outline here can yield false indications of unfounded alternative relationships. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.

  15. SNP-RFLPing 2: an updated and integrated PCR-RFLP tool for SNP genotyping

    PubMed Central

    2010-01-01

    Background: PCR-restriction fragment length polymorphism (RFLP) assay is a cost-effective method for SNP genotyping and mutation detection, but the manual mining for restriction enzyme sites is challenging and cumbersome. Three years after we constructed SNP-RFLPing, a freely accessible database and analysis tool for restriction enzyme mining of SNPs, significant improvements over the 2006 version have been made and incorporated into the latest version, SNP-RFLPing 2. Results: The primary aim of SNP-RFLPing 2 is to provide comprehensive PCR-RFLP information with multiple functionalities for SNPs, such as SNP retrieval for multiple species, different polymorphism types (bi-allelic, tri-allelic, tetra-allelic or indels), gene-centric searching, HapMap tagSNPs, gene ontology-based searching, miRNAs, and SNP500Cancer. The RFLP restriction enzymes and the corresponding PCR primers for the natural and mutagenic types of each SNP are simultaneously analyzed. All the RFLP restriction enzyme prices are also provided to aid selection. Furthermore, the previously encountered updating problems for most SNP related databases are resolved by an on-line retrieval system. Conclusions: The user interfaces for functional SNP analyses have been substantially improved and integrated. SNP-RFLPing 2 offers a new and user-friendly interface for RFLP genotyping that can be used in association studies and is freely available at http://bio.kuas.edu.tw/snp-rflping2. PMID:20377871

  16. Development of a high density 600K SNP genotyping array for chicken

    PubMed Central

    2013-01-01

    Background: High density (HD) SNP genotyping arrays are an important tool for genetic analyses of animals and plants. Although the chicken is one of the most important farm animals, no HD array is yet available for high resolution genetic analysis of this species. Results: We report here the development of a 600 K Affymetrix® Axiom® HD genotyping array designed using SNPs segregating in a wide variety of chicken populations. In order to generate a large catalogue of segregating SNPs, we re-sequenced 243 chickens from 24 chicken lines derived from diverse sources (experimental, commercial broiler and layer lines) by pooling 10–15 samples within each line. About 139 million (M) putative SNPs were detected by mapping sequence reads to the new reference genome (Gallus_gallus_4.0) of which ~78 M appeared to be segregating in different lines. Using criteria such as high SNP-quality score, acceptable design scores predicting high conversion performance in the final array and uniformity of distribution across the genome, we selected ~1.8 M SNPs for validation through genotyping on an independent set of samples (n = 282). About 64% of the SNPs were polymorphic with high call rates (>98%), good cluster separation and stable Mendelian inheritance. Polymorphic SNPs were further analysed for their population characteristics and genomic effects. SNPs with extreme breach of Hardy-Weinberg equilibrium (P < 0.00001) were excluded from the panel. The final array, designed on the basis of these analyses, consists of 580,954 SNPs and includes 21,534 coding variants. SNPs were selected to achieve an essentially uniform distribution based on genetic map distance for both broiler and layer lines. Due to a lower extent of LD in broilers compared to layers, as reported in previous studies, the ratio of broiler and layer SNPs in the array was kept as 3:2. The final panel was shown to genotype a wide range of samples including broilers and layers with over 100 K to 450 K

  17. Development of a high density 600K SNP genotyping array for chicken.

    PubMed

    Kranis, Andreas; Gheyas, Almas A; Boschiero, Clarissa; Turner, Frances; Yu, Le; Smith, Sarah; Talbot, Richard; Pirani, Ali; Brew, Fiona; Kaiser, Pete; Hocking, Paul M; Fife, Mark; Salmon, Nigel; Fulton, Janet; Strom, Tim M; Haberer, Georg; Weigend, Steffen; Preisinger, Rudolf; Gholami, Mahmood; Qanbari, Saber; Simianer, Henner; Watson, Kellie A; Woolliams, John A; Burt, David W

    2013-01-28

    High density (HD) SNP genotyping arrays are an important tool for genetic analyses of animals and plants. Although the chicken is one of the most important farm animals, no HD array is yet available for high resolution genetic analysis of this species. We report here the development of a 600 K Affymetrix® Axiom® HD genotyping array designed using SNPs segregating in a wide variety of chicken populations. In order to generate a large catalogue of segregating SNPs, we re-sequenced 243 chickens from 24 chicken lines derived from diverse sources (experimental, commercial broiler and layer lines) by pooling 10-15 samples within each line. About 139 million (M) putative SNPs were detected by mapping sequence reads to the new reference genome (Gallus_gallus_4.0) of which ~78 M appeared to be segregating in different lines. Using criteria such as high SNP-quality score, acceptable design scores predicting high conversion performance in the final array and uniformity of distribution across the genome, we selected ~1.8 M SNPs for validation through genotyping on an independent set of samples (n = 282). About 64% of the SNPs were polymorphic with high call rates (>98%), good cluster separation and stable Mendelian inheritance. Polymorphic SNPs were further analysed for their population characteristics and genomic effects. SNPs with extreme breach of Hardy-Weinberg equilibrium (P < 0.00001) were excluded from the panel. The final array, designed on the basis of these analyses, consists of 580,954 SNPs and includes 21,534 coding variants. SNPs were selected to achieve an essentially uniform distribution based on genetic map distance for both broiler and layer lines. Due to a lower extent of LD in broilers compared to layers, as reported in previous studies, the ratio of broiler and layer SNPs in the array was kept as 3:2. The final panel was shown to genotype a wide range of samples including broilers and layers with over 100 K to 450 K informative SNPs per line. A principal
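
    The Hardy-Weinberg filter described in this abstract (SNPs with P < 0.00001 excluded) can be illustrated with a one-degree-of-freedom chi-square goodness-of-fit test on genotype counts. This is a sketch under assumptions: the abstract does not say which test the authors used (an exact test is a common alternative), and the function names and example counts are hypothetical.

```python
import math


def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """P-value of a 1-df chi-square test for Hardy-Weinberg equilibrium.

    n_aa, n_ab, n_bb are observed counts of the three genotypes.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes observed")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0  # monomorphic SNP: no deviation can be tested
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_hwe(n_aa, n_ab, n_bb, threshold=1e-5):
    """Keep a SNP unless its HWE p-value falls below the threshold."""
    return hwe_chi2_pvalue(n_aa, n_ab, n_bb) >= threshold
```

    For example, counts of 25/50/25 match the expected proportions exactly and pass the filter, whereas 50/0/50 (no heterozygotes at all) give a chi-square of 100 and fail it.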

  18. Affymetrix Whole-Transcript Human Gene 1.0 ST array is highly concordant with standard 3' expression arrays.

    PubMed

    Pradervand, Sylvain; Paillusson, Alexandra; Thomas, Jérôme; Weber, Johann; Wirapati, Pratyaksha; Hagenbüchle, Otto; Harshman, Keith

    2008-05-01

    The recently released Affymetrix Human Gene 1.0 ST array has two major differences compared with standard 3' based arrays: (i) it interrogates the entire mRNA transcript, and (ii) it uses DNA targets. To assess the impact of these differences on array performance, we performed a series of comparative hybridizations between the Human Gene 1.0 ST and the Affymetrix HG-U133 Plus 2.0 and the Illumina HumanRef-8 BeadChip arrays. Additionally, both RNA and DNA targets were hybridized on HG-U133 Plus 2.0 arrays. The results show that the overall reproducibility of the Gene 1.0 ST array is best. When looking only at the high intensity probes, the reproducibility of the Gene 1.0 ST array and the Illumina BeadChip array is equally good. Concordance of array results was assessed using different inter-platform mappings. Agreements are best between the two labeling protocols using HG-U133 Plus 2.0 array. The Gene 1.0 ST array is most concordant with the HG-U133 array hybridized with cDNA targets. This may reflect the impact of the target type. Overall, the high degree of correspondence provides strong evidence for the reliability of the Gene 1.0 ST array.

  19. Mining Affymetrix microarray data for long non-coding RNAs: altered expression in the nucleus accumbens of heroin abusers.

    PubMed

    Michelhaugh, Sharon K; Lipovich, Leonard; Blythe, Jason; Jia, Hui; Kapatos, Gregory; Bannon, Michael J

    2011-02-01

    Although recent data suggest that some long non-coding RNAs (lncRNAs) exert widespread effects on gene expression and organelle formation, lncRNAs as a group constitute a sizable but poorly characterized fraction of the human transcriptome. We investigated whether some human lncRNA sequences were fortuitously represented on commonly used microarrays, then used this annotation to assess lncRNA expression in human brain. A computational and annotation pipeline was developed to identify lncRNA transcripts represented on Affymetrix U133 arrays. A previously published dataset derived from human nucleus accumbens was then examined for potential lncRNA expression. Twenty-three lncRNAs were determined to be represented on U133 arrays. Of these, dataset analysis revealed that five lncRNAs were consistently detected in samples of human nucleus accumbens. Strikingly, the abundance of these lncRNAs was up-regulated in human heroin abusers compared to matched drug-free control subjects, a finding confirmed by quantitative PCR. This study presents a paradigm for examining existing Affymetrix datasets for the detection and potential regulation of lncRNA expression, including changes associated with human disease. The finding that all detected lncRNAs were up-regulated in heroin abusers is consonant with the proposed role of lncRNAs as mediators of widespread changes in gene expression as occur in drug abuse.

  20. Detecting Susceptibility to Breast Cancer with SNP-SNP Interaction Using BPSOHS and Emotional Neural Networks

    PubMed Central

    Wang, Xiao; Fan, Yue

    2016-01-01

    Studies for the association between diseases and informative single nucleotide polymorphisms (SNPs) have received great attention. However, most of them just use the whole set of useful SNPs and fail to consider the SNP-SNP interactions, while these interactions have already been proven in biology experiments. In this paper, we use a binary particle swarm optimization with hierarchical structure (BPSOHS) algorithm to improve the effectiveness of PSO for the identification of the SNP-SNP interactions. Furthermore, in order to use these SNP interactions in the susceptibility analysis, we propose an emotional neural network (ENN) to treat SNP interactions as emotional tendency. Different from the normal architecture, just as the emotional brain, this architecture provides a specific path to treat the emotional value, by which the SNP interactions can be considered more quickly and directly. The ENN helps us use the prior knowledge about the SNP interactions and other influence factors together. Finally, the experimental results prove that the proposed BPSOHS_ENN algorithm can detect the informative SNP-SNP interaction and predict the breast cancer risk with a much higher accuracy than existing methods. PMID:27294121

  1. Genome-wide copy number profiling using high-density SNP array in chickens.

    PubMed

    Yi, G; Qu, L; Chen, S; Xu, G; Yang, N

    2015-04-01

    Phenotypic diversity is a direct consequence resulting mainly from the impact of underlying genetic variation, and recent studies have shown that copy number variation (CNV) is emerging as an important contributor to both phenotypic variability and disease susceptibility. Herein, we performed a genome-wide CNV scan in 96 chickens from 12 diversified breeds, benefiting from the high-density Affymetrix 600 K SNP arrays. We identified a total of 231 autosomal CNV regions (CNVRs) encompassing 5.41 Mb of the chicken genome and corresponding to 0.59% of the autosomal sequence. The length of these CNVRs ranged from 2.6 to 586.2 kb with an average of 23.4 kb, including 130 gain, 93 loss and eight both gain and loss events. These CNVRs, especially deletions, had lower GC content and were located particularly in gene deserts. In particular, 102 CNVRs harbored 128 chicken genes, most of which were enriched in immune responses. We obtained 221 autosomal CNVRs after converting probe coordinates to Galgal3, and comparative analysis with previous studies illustrated that 153 of these CNVRs were regarded as novel events. Furthermore, qPCR assays were designed for 11 novel CNVRs, and eight (72.73%) were validated successfully. In this study, we demonstrated that the high-density 600 K SNP array can capture CNVs with higher efficiency and accuracy and highlighted the necessity of integrating multiple technologies and algorithms. Our findings provide a pioneering exploration of chicken CNVs based on a high-density SNP array, which contributes to a more comprehensive understanding of genetic variation in the chicken genome and is beneficial to unearthing potential CNVs underlying important traits of chickens. © 2015 Stichting International Foundation for Animal Genetics.

  2. Construction and evaluation of a high-density SNP array for the Pacific oyster (Crassostrea gigas)

    PubMed Central

    Li, Chunyan; Wang, Wei; Li, Busu; Li, Li

    2017-01-01

    Single nucleotide polymorphisms (SNPs) are widely used in genetics and genomics research. The Pacific oyster (Crassostrea gigas) is an economically and ecologically important marine bivalve, and it possesses one of the highest levels of genomic DNA variation among animal species. Pacific oyster SNPs have been extensively investigated; however, the mechanisms by which these SNPs may be used in a high-throughput, transferable, and economical manner remain to be elucidated. Here, we constructed an oyster 190K SNP array using Affymetrix Axiom genotyping technology. We designed 190,420 SNPs on the chip; these SNPs were selected from 54 million SNPs identified through re-sequencing of 472 Pacific oysters collected in China, Japan, Korea, and Canada. Our genotyping results indicated that 133,984 (70.4%) SNPs were polymorphic and successfully converted on the chip. The SNPs were distributed evenly throughout the oyster genome, located in 3,595 scaffolds with a length of ~509.4 million; the average interval spacing was 4,210 bp. In addition, 111,158 SNPs were distributed in 21,050 coding genes, with an average of 5.3 SNPs per gene. In comparison with genotypes obtained through re-sequencing, ~69% of the converted SNPs had a concordance rate of >0.971; the mean concordance rate was 0.966. Evaluation based on genotypes of full-sib family individuals revealed that the average genotyping accuracy rate was 0.975. Carrying 133 K polymorphic SNPs, our oyster 190K SNP array is the first commercially available high-density SNP chip for mollusks, with the highest throughput. It represents a valuable tool for oyster genome-wide association studies, fine linkage mapping, and population genetics. PMID:28328985

  3. SNP Cutter: a comprehensive tool for SNP PCR–RFLP assay design

    PubMed Central

    Zhang, Ruifang; Zhu, Zanhua; Zhu, Hongming; Nguyen, Tu; Yao, Fengxia; Xia, Kun; Liang, Desheng; Liu, Chunyu

    2005-01-01

    The Polymerase chain reaction–restriction fragment length polymorphism (PCR–RFLP) is a relatively simple and inexpensive method for genotyping single nucleotide polymorphisms (SNPs). It requires minimal investment in instrumentation. Here, we describe a web application, ‘SNP Cutter,’ which designs PCR–RFLP assays on a batch of SNPs from the human genome. NCBI dbSNP rs IDs or formatted SNPs are submitted into the SNP Cutter which then uses restriction enzymes from a pre-selected list to perform enzyme selection. The program is capable of designing primers for either natural PCR–RFLP or mismatch PCR–RFLP, depending on the SNP sequence data. SNP Cutter generates the information needed to evaluate and perform genotyping experiments, including a PCR primers list, sizes of original amplicons and different allelic fragments after enzyme digestion. Some output data is tab-delimited, therefore suitable for database archiving. The SNP Cutter is available at . PMID:15980518
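
    The enzyme-selection step for natural PCR–RFLP can be pictured as a search for restriction sites that are present with one SNP allele and absent with the other. The sketch below is not SNP Cutter's algorithm. It assumes a tiny illustrative enzyme list (four common enzymes with palindromic recognition sites, so scanning one strand is enough), hypothetical function names and a made-up flanking sequence. Mismatch primer design and fragment sizes are not covered.

```python
# Recognition sequences of four common restriction enzymes (all palindromic).
ENZYMES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
    "TaqI": "TCGA",
}


def count_sites(seq, site):
    """Count occurrences of a recognition site, including overlapping ones."""
    width = len(site)
    return sum(1 for i in range(len(seq) - width + 1) if seq[i:i + width] == site)


def discriminating_enzymes(left, alleles, right, enzymes=ENZYMES):
    """Return enzymes whose site count differs between the two SNP alleles.

    left/right: flanking sequence on either side of the SNP.
    alleles:    pair of single-base alleles, e.g. ("A", "G").
    Result maps enzyme name -> (sites with allele 1, sites with allele 2).
    """
    allele1, allele2 = alleles
    hits = {}
    for name, site in enzymes.items():
        n1 = count_sites(left + allele1 + right, site)
        n2 = count_sites(left + allele2 + right, site)
        if n1 != n2:
            hits[name] = (n1, n2)
    return hits


# Hypothetical SNP: the A allele completes an EcoRI site (GAATTC), the G allele does not.
example_hits = discriminating_enzymes("ACGTGA", ("A", "G"), "TTCGGCA")
```

    Here only EcoRI distinguishes the alleles, so digesting the amplicon with it would cut the A-allele product and leave the G-allele product intact.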

  4. Development and validation of a high density SNP genotyping array for Atlantic salmon (Salmo salar)

    PubMed Central

    2014-01-01

    Background: Dense single nucleotide polymorphism (SNP) genotyping arrays provide extensive information on polymorphic variation across the genome of species of interest. Such information can be used in studies of the genetic architecture of quantitative traits and to improve the accuracy of selection in breeding programs. In Atlantic salmon (Salmo salar), these goals are currently hampered by the lack of a high-density SNP genotyping platform. Therefore, the aim of the study was to develop and test a dense Atlantic salmon SNP array. Results: SNP discovery was performed using extensive deep sequencing of Reduced Representation (RR-Seq), Restriction site-Associated DNA (RAD-Seq) and mRNA (RNA-Seq) libraries derived from farmed and wild Atlantic salmon samples (n = 283) resulting in the discovery of > 400 K putative SNPs. An Affymetrix Axiom® myDesign Custom Array was created and tested on samples of animals of wild and farmed origin (n = 96) revealing a total of 132,033 polymorphic SNPs with high call rate, good cluster separation on the array and stable Mendelian inheritance in our sample. At least 38% of these SNPs are from transcribed genomic regions and therefore more likely to include functional variants. Linkage analysis utilising the lack of male recombination in salmonids allowed the mapping of 40,214 SNPs distributed across all 29 pairs of chromosomes, highlighting the extensive genome-wide coverage of the SNPs. An identity-by-state clustering analysis revealed that the array can clearly distinguish between fish of different origins, within and between farmed and wild populations. Finally, Y-chromosome-specific probes included on the array provide an accurate molecular genetic test for sex. Conclusions: This manuscript describes the first high-density SNP genotyping array for Atlantic salmon. This array will be publicly available and is likely to be used as a platform for high-resolution genetics research into traits of evolutionary and economic importance in

  5. SNP genotyping by DNA photoligation: application to SNP detection of genes from food crops

    NASA Astrophysics Data System (ADS)

    Yoshimura, Yoshinaga; Ohtake, Tomoko; Okada, Hajime; Ami, Takehiro; Tsukaguchi, Tadashi; Fujimoto, Kenzo

    2009-06-01

    We describe a simple and inexpensive single-nucleotide polymorphism (SNP) typing method, using DNA photoligation with 5-carboxyvinyl-2'-deoxyuridine and two fluorophores. This SNP-typing method facilitates qualitative determination of genes from indica and japonica rice, and showed a high degree of single nucleotide specificity up to 10 000. This method can be used in the SNP typing of actual genomic DNA samples from food crops.

  6. Disease-driven detection of differential inherited SNP modules from SNP network.

    PubMed

    Li, Chuanxing; Li, Yongsheng; Xu, Juan; Lv, Junying; Ma, Ye; Shao, Tingting; Gong, Binsheng; Tan, Renjie; Xiao, Yun; Li, Xia

    2011-12-10

    Detection of the synergetic effects between variants, such as single-nucleotide polymorphisms (SNPs), is crucial for understanding the genetic characters of complex diseases. Here, we proposed a two-step approach to detect differentially inherited SNP modules (synergetic SNP units) from a SNP network. First, SNP-SNP interactions are identified based on prior biological knowledge, such as their adjacency on the chromosome or degree of relatedness between the functional relationships of their genes. These interactions form SNP networks. Second, disease-risk SNP modules (or sub-networks) are prioritised by their differentially inherited properties in IBD (Identity by Descent) profiles of affected and unaffected sibpairs. The search process is driven by the disease information and follows the structure of a SNP network. Simulation studies have indicated that this approach achieves high accuracy and a low false-positive rate in the identification of known disease-susceptible SNPs. Applying this method to an alcoholism dataset, we found that flexible patterns of susceptible SNP combinations do play a role in complex diseases, and some known genes were detected through these risk SNP modules. One example is GRM7, a known alcoholism gene successfully detected by a SNP module comprised of two SNPs, but neither of the two SNPs was significantly associated with the disease in single-locus analysis. These identified genes are also enriched in some pathways associated with alcoholism, including the calcium signalling pathway, axon guidance and neuroactive ligand-receptor interaction. The integration of network biology and genetic analysis provides putative functional bridges between genetic variants and candidate genes or pathways, thereby providing new insight into the aetiology of complex diseases. Copyright © 2011 Elsevier B.V. All rights reserved.

  7. EzArray: A web-based highly automated Affymetrix expression array data management and analysis system

    PubMed Central

    Zhu, Yuerong; Zhu, Yuelin; Xu, Wei

    2008-01-01

    Background: Though microarray experiments are very popular in life science research, managing and analyzing microarray data are still challenging tasks for many biologists. Most microarray programs require users to have sophisticated knowledge of mathematics, statistics and computer skills for usage. With accumulating microarray data deposited in public databases, easy-to-use programs to re-analyze previously published microarray data are in high demand. Results: EzArray is a web-based Affymetrix expression array data management and analysis system for researchers who need to organize microarray data efficiently and get data analyzed instantly. EzArray organizes microarray data into projects that can be analyzed online with predefined or custom procedures. EzArray performs data preprocessing and detection of differentially expressed genes with statistical methods. All analysis procedures are optimized and highly automated so that even novice users with limited pre-knowledge of microarray data analysis can complete initial analysis quickly. Since all input files, analysis parameters, and executed scripts can be downloaded, EzArray provides maximum reproducibility for each analysis. In addition, EzArray integrates with Gene Expression Omnibus (GEO) and allows instantaneous re-analysis of published array data. Conclusion: EzArray is a novel Affymetrix expression array data analysis and sharing system. EzArray provides easy-to-use tools for re-analyzing published microarray data and will help both novice and experienced users perform initial analysis of their microarray data from the location of data storage. We believe EzArray will be a useful system for facilities with microarray services and laboratories with multiple members involved in microarray data analysis. EzArray is freely available from . PMID:18218103

  8. Genome-wide SNP detection, validation, and development of an 8K SNP array for apple

    USDA-ARS's Scientific Manuscript database

    As high-throughput genetic marker screening systems are essential for a range of genetics studies and plant breeding applications, the International RosBREED SNP Consortium (IRSC) has utilized the Illumina Infinium® II system to develop a medium- to high-throughput SNP screening tool for genome-wide...

  9. SNPMeta: SNP annotation and SNP metadata collection without a reference genome

    USDA-ARS's Scientific Manuscript database

    The increase in availability of resequencing data is greatly accelerating SNP discovery and has facilitated the development of SNP genotyping assays. This, in turn, is increasing interest in annotation of individual SNPs. Currently, these data are only available through curation, or comparison to a ...

  10. Array-Based Karyotyping for Prognostic Assessment in Chronic Lymphocytic Leukemia

    PubMed Central

    Hagenkord, Jill M.; Monzon, Federico A.; Kash, Shera F.; Lilleberg, Stan; Xie, Qingmei; Kant, Jeffrey A.

    2010-01-01

    Specific chromosomal alterations are recognized as important prognostic factors in chronic lymphocytic leukemia (CLL). Array-based karyotyping is gaining acceptance as an alternative to the standard fluorescence in situ hybridization (FISH) panel for detecting these aberrations. This study explores the optimum single nucleotide polymorphism (SNP) array probe density for routine clinical use, presents clinical validation results for the 250K Nsp Affymetrix SNP array, and highlights clinically actionable genetic lesions missed by FISH and conventional cytogenetics. CLL samples were processed on low (10K2.0), medium (250K Nsp), and high (SNP6.0) probe density Affymetrix SNP arrays. Break point definition and detection rates for clinically relevant genetic lesions were compared. The 250K Nsp array was subsequently validated for routine clinical use and demonstrated 98.5% concordance with the standard CLL FISH panel. SNP array karyotyping detected genomic complexity and/or acquired uniparental disomy not detected by the FISH panel. In particular, a region of acquired uniparental disomy on 17p was shown to harbor two mutated copies of TP53 that would have gone undetected by FISH, conventional cytogenetics, or array comparative genomic hybridization. SNP array karyotyping allows genome-wide, high resolution detection of copy number and uniparental disomy at genomic regions with established prognostic significance in CLL, detects lesions missed by FISH, and provides insight into gene dosage at these loci. PMID:20075210

  11. Characterization of the Streptomyces sp. Strain C5 snp Locus and Development of snp-Derived Expression Vectors

    PubMed Central

    DeSanti, Charles L.; Strohl, William R.

    2003-01-01

    The Streptomyces sp. strain C5 snp locus is comprised of two divergently oriented genes: snpA, a metalloproteinase gene, and snpR, which encodes a LysR-like activator of snpA transcription. The transcriptional start point of snpR is immediately downstream of a strong T-N11-A inverted repeat motif likely to be the SnpR binding site, while the snpA transcriptional start site overlaps the ATG start codon, generating a leaderless snpA transcript. By using the aphII reporter gene of pIJ486 as a reporter, the plasmid-borne snpR-activated snpA promoter was ca. 60-fold more active than either the nonactivated snpA promoter or the melC1 promoter of pIJ702. The snpR-activated snpA promoter produced reporter protein levels comparable to those of the up-mutated ermE∗ promoter. The SnpR-activated snpA promoter was built into a set of transcriptional and translational fusion expression vectors which have been used for the intracellular expression of numerous daunomycin biosynthesis pathway genes from Streptomyces sp. strain C5 as well as the expression and secretion of soluble recombinant human endostatin. PMID:12620855

  12. Chaotic particle swarm optimization for detecting SNP-SNP interactions for CXCL12-related genes in breast cancer prevention.

    PubMed

    Chuang, Li-Yeh; Chang, Hsueh-Wei; Lin, Ming-Cheng; Yang, Cheng-Hong

    2012-07-01

    Genome-wide association studies have revealed that many single nucleotide polymorphisms (SNPs) are associated with breast cancer, and yet the potential SNP-SNP interactions have not been well addressed to date. This study aims to develop a methodology for the selection of SNP-genotype combinations with a maximum difference between case and control groups. We propose a new chaotic particle swarm optimization (CPSO) algorithm that identifies the best SNP combinations for breast cancer association studies containing seven SNPs. Five scoring functions, that is, the percentage correct, sensitivity/specificity, positive predictive value/negative predictive value, risk ratio, and odds ratio, are provided for evaluating SNP interactions in different SNP combinations. The CPSO algorithm identified the best SNP combinations associated with breast cancer protection. Some SNP interactions in specific SNPs and their corresponding genotypes were revealed. These SNP combinations showed a significant association with breast cancer protection (P<0.05). The sensitivity and specificity of the respective best SNP combinations were all higher than 90%. In contrast to the corresponding non-SNP-SNP interaction combinations, the estimated odds ratio and risk ratio of the SNP-SNP interaction in SNP combinations for breast cancer were less than 100%. This suggests that CPSO can successfully identify the best SNP combinations for breast cancer protection. In conclusion, we focus on developing a methodology for the selection of SNP-genotype combinations with a maximum difference between case and control groups. The CPSO method can effectively identify SNP-SNP interactions in complex biological relationships underlying the progression of breast cancer.
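    The five scoring functions named in this abstract are all standard quantities derived from a 2x2 case/control table. The following is a minimal sketch of how they can be computed, not the authors' implementation; the class name `TwoByTwo`, the cell layout, and the example counts are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts for one SNP-genotype combination (hypothetical layout).

    a: cases carrying the combination     b: controls carrying it
    c: cases without the combination      d: controls without it
    """
    a: int
    b: int
    c: int
    d: int

    def percentage_correct(self) -> float:
        # Fraction of subjects on the diagonal of the table.
        return (self.a + self.d) / (self.a + self.b + self.c + self.d)

    def sensitivity(self) -> float:
        return self.a / (self.a + self.c)

    def specificity(self) -> float:
        return self.d / (self.b + self.d)

    def ppv(self) -> float:
        # Positive predictive value.
        return self.a / (self.a + self.b)

    def npv(self) -> float:
        # Negative predictive value.
        return self.d / (self.c + self.d)

    def risk_ratio(self) -> float:
        # Risk among carriers divided by risk among non-carriers.
        return (self.a / (self.a + self.b)) / (self.c / (self.c + self.d))

    def odds_ratio(self) -> float:
        return (self.a * self.d) / (self.b * self.c)


# Made-up counts, for illustration only.
table = TwoByTwo(a=30, b=10, c=20, d=40)
print(table.sensitivity(), table.specificity(), table.odds_ratio())
```

    A ratio below 1 for a given combination would indicate fewer cases than expected among carriers, which is the direction described as protective.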

  13. SNP-SNP Interaction Analysis on Soybean Oil Content under Multi-Environments

    PubMed Central

    Yin, Zhengong; Leng, Yue; Yu, Hongxiao; Jia, Huiying; Jiang, Shanshan; Ni, Zhongqiu; Jiang, Hongwei; Han, Xue; Liu, Chunyan; Hu, Zhenbang; Wu, Xiaoxia; Hu, Guohua; Xin, Dawei; Qi, Zhaoming

    2016-01-01

    Soybean oil content is one of the main quality traits. In this study, we used the multifactor dimensionality reduction (MDR) method and a soybean high-density genetic map including 5,308 markers to identify stable single nucleotide polymorphism (SNP)-SNP interactions controlling oil content in soybean across 23 environments. In total, 36,442,756 SNP-SNP interaction pairs were detected, of which 1,865 pairs associated with soybean oil content were identified under multiple environments after Bonferroni correction (p < 3.55×10⁻¹¹). Two and 1,863 SNP-SNP interaction pairs were stably detected across 12 and 11 environments, respectively, which is around 50% of the total environments. Epistasis values and contribution rates of the stable interaction pairs (pairs detected in more than 2 environments) were estimated by two-way ANOVA and ranged from 0.01 to 0.89 and from 0.01 to 0.85, respectively. For some interaction pairs, one side of the pair had been identified in previous research as a major QTL without epistasis effects. The results of this study provide insights into the genetic architecture of soybean oil content and can serve as a basis for marker-assisted selection breeding. PMID:27668866

  14. A summarization approach for Affymetrix GeneChip data using a reference training set from a large, biologically diverse database.

    PubMed

    Katz, Simon; Irizarry, Rafael A; Lin, Xue; Tripputi, Mark; Porter, Mark W

    2006-10-23

    Many of the most popular pre-processing methods for Affymetrix expression arrays, such as RMA, gcRMA, and PLIER, simultaneously analyze data across a set of predetermined arrays to improve precision of the final measures of expression. One problem associated with these algorithms is that expression measurements for a particular sample are highly dependent on the set of samples used for normalization and results obtained by normalization with a different set may not be comparable. A related problem is that an organization producing and/or storing large amounts of data in a sequential fashion will need to either re-run the pre-processing algorithm every time an array is added or store them in batches that are pre-processed together. Furthermore, pre-processing of large numbers of arrays requires loading all the feature-level data into memory which is a difficult task even with modern computers. We utilize a scheme that produces all the information necessary for pre-processing using a very large training set that can be used for summarization of samples outside of the training set. All subsequent pre-processing tasks can be done on an individual array basis. We demonstrate the utility of this approach by defining a new version of the Robust Multi-chip Averaging (RMA) algorithm which we refer to as refRMA. We assess performance based on multiple sets of samples processed over HG U133A Affymetrix GeneChip arrays. We show that the refRMA workflow, when used in conjunction with a large, biologically diverse training set, results in the same general characteristics as that of RMA in its classic form when comparing overall data structure, sample-to-sample correlation, and variation. Further, we demonstrate that the refRMA workflow and reference set can be robustly applied to naïve organ types and to benchmark data where its performance indicates respectable results. Our results indicate that a biologically diverse reference database can be used to train a model for
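    The core idea described here, freezing information learned from a training set so that each later array can be processed on its own, can be illustrated with the quantile normalization step that RMA uses. The sketch below is not the refRMA implementation, whose details are not given in this abstract. The function names are hypothetical, the toy intensities are made up, and all arrays are assumed to have the same number of features.

```python
def reference_quantiles(training_arrays):
    """Build a frozen reference: the mean intensity at each rank
    across all training arrays."""
    sorted_arrays = [sorted(a) for a in training_arrays]
    n = len(sorted_arrays)
    return [sum(values) / n for values in zip(*sorted_arrays)]


def normalize_to_reference(array, reference):
    """Replace each intensity with the reference value of the same rank.

    Only the new array and the stored reference are needed, so arrays
    can be processed one at a time. Ties are broken by original order.
    """
    if len(array) != len(reference):
        raise ValueError("array and reference must have the same length")
    order = sorted(range(len(array)), key=lambda i: array[i])
    normalized = [0.0] * len(array)
    for rank, index in enumerate(order):
        normalized[index] = reference[rank]
    return normalized


# Toy training set of two arrays with three features each.
reference = reference_quantiles([[5, 2, 3], [4, 1, 6]])
# A new array is normalized without touching the training data again.
print(normalize_to_reference([10, 30, 20], reference))
```

    Because the reference is computed once and stored, adding an array does not change the values already reported for earlier arrays, which is the comparability problem the abstract raises for batch-dependent methods.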

  15. A verification protocol for the probe sequences of Affymetrix genome arrays reveals high probe accuracy for studies in mouse, human and rat

    PubMed Central

    Alberts, Rudi; Terpstra, Peter; Hardonk, Menno; Bystrykh, Leonid V; de Haan, Gerald; Breitling, Rainer; Nap, Jan-Peter; Jansen, Ritsert C

    2007-01-01

    Background: The Affymetrix GeneChip technology uses multiple probes per gene to measure its expression level. Individual probe signals can vary widely, which hampers proper interpretation. This variation can be caused by probes that do not properly match their target gene or that match multiple genes. To determine the accuracy of Affymetrix arrays, we developed an extensive verification protocol for mouse arrays, incorporating the NCBI RefSeq, NCBI UniGene Unique, NIA Mouse Gene Index, and UCSC mouse genome databases. Results: Applying this protocol to Affymetrix Mouse Genome arrays (the earlier U74Av2 and the newer 430 2.0 array), the number of sequence-verified probes with perfect matches was no less than 85% and 95%, respectively; and for 74% and 85% of the probe sets all probes were sequence verified. The latter percentages increased to 80% and 94% after discarding one or two unverifiable probes per probe set, and even further to 84% and 97% when, in addition, allowing for one or two mismatches between probe and target gene. Similar results were obtained for other mouse arrays, as well as for human and rat arrays. Based on these data, refined chip definition files for all arrays are provided online. Researchers can choose the version appropriate for their study to (re)analyze expression data. Conclusion: The accuracy of Affymetrix probe sequences is higher than previously reported, particularly on newer arrays. Yet, refined probe set definitions have clear effects on the detection of differentially expressed genes. We demonstrate that the interpretation of the results of Affymetrix arrays is improved when the new chip definition files are used. PMID:17448222

  16. Clinical application of SNP array analysis in fetuses with ventricular septal defects and normal karyotypes.

    PubMed

    Fu, Fang; Deng, Qiong; Lei, Ting-Ying; Li, Ru; Jing, Xiang-Yi; Yang, Xin; Liao, Can

    2017-09-13

    The present study aims to evaluate the utility of high-resolution single-nucleotide polymorphism (SNP) arrays in fetuses with ventricular septal defects (VSDs) with or without other structural anomalies but with normal karyotypes and to investigate the outcomes of cases of prenatal VSDs via clinical follow-up. We analyzed 144 fetuses with VSDs and normal karyotypes using Affymetrix CytoScan HD arrays and the analyses were carried out a year after birth. Clinically significant CNVs were detected in 12 fetuses (8.3%). The most common pathogenic CNV was a 22q11.2 deletion with a detection rate of 2.8% (4/144). Well-known microdeletion or microduplication syndromes, including Smith-Magenis, Miller-Dieker, 9q subtelomeric deletion, 1p36 microdeletion, 1q21.1 microduplication, and terminal 4q deletion syndrome, were identified in six cases. Three regions of chromosomal imbalance were also identified: microduplication at 12q24.32q24.33, microdeletion at 16p13.13p13.12 and microdeletion at Xp21.1. The genes TBX1, SKI, GJA5, EHMT1, NOTCH1 were identified as established genes and LZTR1, PRDM26, YWHAE, FAT1, AKAP10, ERCC4, and ULK1 were identified as potential candidate genes of fetal VSDs. There was no significant difference in pathogenic CNVs between isolated VSDs and VSDs with additional structural abnormalities. Ninety-five (74.8%) pregnant women with fetuses with benign CNVs chose to continue the pregnancy and had a favorable prognosis, while nine (75%) pregnant women with fetuses with pathogenic CNVs chose to terminate the pregnancy. High-resolution SNP arrays are valuable tools for identifying submicroscopic chromosomal abnormalities in the prenatal diagnosis of VSDs. An excellent outcome can be expected for VSD fetuses that are negative for chromosomal anomalies and other severe anatomic abnormalities.

  17. Technical Reproducibility of Genotyping SNP Arrays Used in Genome-Wide Association Studies

    PubMed Central

    Hong, Huixiao; Xu, Lei; Liu, Jie; Jones, Wendell D.; Su, Zhenqiang; Ning, Baitang; Perkins, Roger; Ge, Weigong; Miclaus, Kelci; Zhang, Li; Park, Kyunghee; Green, Bridgett; Han, Tao; Fang, Hong; Lambert, Christophe G.; Vega, Silvia C.; Lin, Simon M.; Jafari, Nadereh; Czika, Wendy; Wolfinger, Russell D.; Goodsaid, Federico; Tong, Weida; Shi, Leming

    2012-01-01

    During the last several years, high-density genotyping SNP arrays have facilitated genome-wide association studies (GWAS) that successfully identified common genetic variants associated with a variety of phenotypes. However, each of the identified genetic variants only explains a very small fraction of the underlying genetic contribution to the studied phenotypic trait. Moreover, discordance observed in results between independent GWAS indicates the potential for Type I and II errors. High reliability of genotyping technology is needed to have confidence in using SNP data and interpreting GWAS results. Therefore, reproducibility of two widely used genotyping technology platforms from Affymetrix and Illumina was assessed by analyzing four technical replicates from each of the six individuals in five laboratories. Genotype concordance of 99.40% to 99.87% within a laboratory for the same platform, 98.59% to 99.86% across laboratories for the same platform, and 98.80% across genotyping platforms was observed. Moreover, arrays with low quality data were detected when comparing genotyping data from technical replicates, but they could not be detected according to vendors’ quality control (QC) suggestions. Our results demonstrated the technical reliability of currently available genotyping platforms but also indicated the importance of incorporating some technical replicates for genotyping QC in order to improve the reliability of GWAS results. The impact of discordant genotypes on association analysis results was simulated and could explain, at least in part, the irreproducibility of some GWAS findings when the effect size (i.e. the odds ratio) and the minor allele frequencies are low. PMID:22970228
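    Genotype concordance between two technical replicates is the fraction of SNPs that receive the same call in both. The sketch below shows one common way to compute it. The function name, the `"NC"` no-call label, and the choice to exclude no-calls from the denominator are assumptions, since the abstract does not say how missing calls were handled.

```python
def genotype_concordance(calls_a, calls_b, missing="NC"):
    """Fraction of SNPs with identical calls in two replicates.

    SNPs with a no-call in either replicate are skipped (an assumption;
    other conventions count them as discordant).
    """
    if len(calls_a) != len(calls_b):
        raise ValueError("replicates must cover the same SNPs in the same order")
    compared = 0
    matched = 0
    for call_a, call_b in zip(calls_a, calls_b):
        if call_a == missing or call_b == missing:
            continue
        compared += 1
        if call_a == call_b:
            matched += 1
    if compared == 0:
        raise ValueError("no SNPs with calls in both replicates")
    return matched / compared


# Toy replicates for six SNPs (made-up calls).
replicate_1 = ["AA", "AB", "BB", "AB", "NC", "AA"]
replicate_2 = ["AA", "AB", "BB", "BB", "AB", "AA"]
print(f"{genotype_concordance(replicate_1, replicate_2):.2%}")
```

    Comparing replicates this way also surfaces arrays whose concordance is unusually low, which is the kind of replicate-based QC check the abstract recommends.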

  18. SNP Array in Hematopoietic Neoplasms: A Review

    PubMed Central

    Song, Jinming; Shao, Haipeng

    2015-01-01

    Cytogenetic analysis is essential for the diagnosis and prognosis of hematopoietic neoplasms in current clinical practice. Many hematopoietic malignancies are characterized by structural chromosomal abnormalities such as specific translocations, inversions, deletions and/or numerical abnormalities that can be identified by karyotype analysis or fluorescence in situ hybridization (FISH) studies. Single nucleotide polymorphism (SNP) arrays offer high-resolution identification of copy number variants (CNVs) and acquired copy-neutral loss of heterozygosity (LOH)/uniparental disomy (UPD) that are usually not identifiable by conventional cytogenetic analysis and FISH studies. As a result, SNP arrays have been increasingly applied to hematopoietic neoplasms to search for clinically-significant genetic abnormalities. A large number of CNVs and UPDs have been identified in a variety of hematopoietic neoplasms. CNVs detected by SNP array in some hematopoietic neoplasms are of prognostic significance. A few specific genes in the affected regions have been implicated in the pathogenesis and may be the targets for specific therapeutic agents in the future. In this review, we summarize the current findings of application of SNP arrays in a variety of hematopoietic malignancies with an emphasis on the clinically significant genetic variants. PMID:27600067

  19. Acquisition of biologically relevant gene expression data by Affymetrix microarray analysis of archival formalin-fixed paraffin-embedded tumours

    PubMed Central

    Linton, K M; Hey, Y; Saunders, E; Jeziorska, M; Denton, J; Wilson, C L; Swindell, R; Dibben, S; Miller, C J; Pepper, S D; Radford, J A; Freemont, A J

    2008-01-01

    Robust protocols for microarray gene expression profiling of archival formalin-fixed paraffin-embedded tissue (FFPET) are needed to facilitate research when availability of fresh-frozen tissue is limited. Recent reports attest to the feasibility of this approach, but the clinical value of these data is poorly understood. We employed state-of-the-art RNA extraction and Affymetrix microarray technology to examine 34 archival FFPET primary extremity soft tissue sarcomas. Nineteen arrays met stringent QC criteria and were used to model prognostic signatures for metastatic recurrence. Arrays from two paired frozen and FFPET samples were compared: although FFPET sensitivity was low (∼50%), high specificity (95%) and positive predictive value (92%) suggest that transcript detection is reliable. Good agreement between arrays and real time (RT)–PCR was confirmed, especially for abundant transcripts, and RT–PCR validated the regulation pattern for 19 of 24 candidate genes (overall R²=0.4662). RT–PCR and immunohistochemistry on independent cases validated prognostic significance for several genes including RECQL4, FRRS1, CFH and MET – whose combined expression carried greater prognostic value than tumour grade – and cmet and TRKB proteins. These molecules warrant further evaluation in larger series. Reliable clinically relevant data can be obtained from archival FFPET, but protocol amendments are needed to improve the sensitivity and broad application of this approach. PMID:18382428

  20. Acquisition of biologically relevant gene expression data by Affymetrix microarray analysis of archival formalin-fixed paraffin-embedded tumours.

    PubMed

    Linton, K M; Hey, Y; Saunders, E; Jeziorska, M; Denton, J; Wilson, C L; Swindell, R; Dibben, S; Miller, C J; Pepper, S D; Radford, J A; Freemont, A J

    2008-04-22

    Robust protocols for microarray gene expression profiling of archival formalin-fixed paraffin-embedded tissue (FFPET) are needed to facilitate research when availability of fresh-frozen tissue is limited. Recent reports attest to the feasibility of this approach, but the clinical value of these data is poorly understood. We employed state-of-the-art RNA extraction and Affymetrix microarray technology to examine 34 archival FFPET primary extremity soft tissue sarcomas. Nineteen arrays met stringent QC criteria and were used to model prognostic signatures for metastatic recurrence. Arrays from two paired frozen and FFPET samples were compared: although FFPET sensitivity was low (approximately 50%), high specificity (95%) and positive predictive value (92%) suggest that transcript detection is reliable. Good agreement between arrays and real time (RT)-PCR was confirmed, especially for abundant transcripts, and RT-PCR validated the regulation pattern for 19 of 24 candidate genes (overall R²=0.4662). RT-PCR and immunohistochemistry on independent cases validated prognostic significance for several genes including RECQL4, FRRS1, CFH and MET - whose combined expression carried greater prognostic value than tumour grade - and cmet and TRKB proteins. These molecules warrant further evaluation in larger series. Reliable clinically relevant data can be obtained from archival FFPET, but protocol amendments are needed to improve the sensitivity and broad application of this approach.

  1. Analysis of SNP-SNP interactions and bone quantitative ultrasound parameter in early adulthood.

    PubMed

    Correa-Rodríguez, María; Viatte, Sebastien; Massey, Jonathan; Schmidt-RioValle, Jacqueline; Rueda-Medina, Blanca; Orozco, Gisela

    2017-10-03

    Individual susceptibility to osteoporosis is determined by the interaction of multiple genetic variants and environmental factors. The aim of this study was to conduct SNP-SNP interaction analyses in candidate genes influencing a heel quantitative ultrasound (QUS) parameter in early adulthood to gain novel insights into the mechanism of disease. The study population included 575 healthy subjects (mean age 20.41; SD 2.36). To assess bone mass, QUS was performed to determine broadband ultrasound attenuation (BUA, dB/MHz). A total of 32 SNPs mapping to loci that have been characterized as genetic markers for QUS and/or BMD parameters were selected as genetic markers in this study. The association of all possible SNP pairs with QUS was assessed by linear regression, and a SNP-SNP interaction was defined as a significant departure from additive effects. The pairwise SNP-SNP analysis showed multiple interactions. The interaction comprising SNPs rs9340799 and rs3736228, which map to the ESR1 and LRP5 genes respectively, revealed the lowest p value after adjusting for confounding factors (p = 0.001, β (95% CI) = 14.289 (5.548, 23.029)). In addition, our model reported others such as TMEM135-WNT16 (p = 0.007, β (95% CI) = 9.101 (2.498, 15.704)), ESR1-DKK1 (p = 0.012, β (95% CI) = 13.641 (2.959, 24.322)) and OPG-LRP5 (p = 0.012, β (95% CI) = 8.724 (1.936, 15.512)). However, none of the detected interactions remained significant at the Bonferroni significance threshold for multiple testing (p<0.0001). Our analysis of SNP-SNP interactions in candidate genes of QUS in Caucasian young adults reveals several interactions, especially between the ESR1 and LRP5 genes, that did not reach statistical significance after correction. Although our results do not support a relevant genetic contribution of SNP-SNP epistatic interactions to QUS in young adults, further studies in larger independent populations would be necessary to support these preliminary findings.
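    The reported Bonferroni threshold is consistent with testing every pair among the 32 selected SNPs. The short calculation below works through that arithmetic. The family-wise error rate of 0.05 and the assumption that all pairs were counted are not stated in the abstract, so this is a plausibility check rather than the authors' procedure.

```python
from math import comb

n_snps = 32
alpha = 0.05  # assumed family-wise error rate

# Number of unordered SNP pairs that can be tested for interaction.
n_pairs = comb(n_snps, 2)

# Bonferroni: divide the family-wise alpha by the number of tests.
threshold = alpha / n_pairs

# Interaction p values as reported in the abstract.
reported = {
    "ESR1-LRP5": 0.001,
    "TMEM135-WNT16": 0.007,
    "ESR1-DKK1": 0.012,
    "OPG-LRP5": 0.012,
}
survives = {pair: p < threshold for pair, p in reported.items()}

print(n_pairs, threshold, survives)
```

    Under these assumptions there are 496 pairs and a threshold of about 0.0001, so even the strongest reported interaction (p = 0.001) falls roughly tenfold short of it.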

  2. An Improved Opposition-Based Learning Particle Swarm Optimization for the Detection of SNP-SNP Interactions.

    PubMed

    Shang, Junliang; Sun, Yan; Li, Shengjun; Liu, Jin-Xing; Zheng, Chun-Hou; Zhang, Junying

    2015-01-01

    SNP-SNP interactions have been receiving increasing attention in understanding the mechanism underlying susceptibility to complex diseases. Although much work has been done on the detection of SNP-SNP interactions, algorithmic development is still ongoing. In this study, an improved opposition-based learning particle swarm optimization (IOBLPSO) is proposed for the detection of SNP-SNP interactions. Highlights of IOBLPSO are the introduction of three strategies, namely, opposition-based learning, dynamic inertia weight, and a postprocedure. Opposition-based learning not only enhances the global explorative ability, but also avoids premature convergence. Dynamic inertia weight allows particles to cover a wider search space when the considered SNP is likely to be a random one and to converge on promising regions of the search space while capturing a highly suspected SNP. The postprocedure is used to carry out a deep search in highly suspected SNP sets. Experiments with IOBLPSO are performed on both simulation data sets and a real data set of age-related macular degeneration, the results of which demonstrate that IOBLPSO is promising in detecting SNP-SNP interactions. IOBLPSO might be an alternative to existing methods for detecting SNP-SNP interactions.
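    Opposition-based learning, the first of the three strategies named here, evaluates a candidate together with its mirror image inside the search bounds and keeps whichever scores better. The sketch below shows only that generic step on continuous positions. It is not IOBLPSO itself: the function names, the bounds, and the toy fitness function are hypothetical, and the dynamic inertia weight and postprocedure are not shown.

```python
def opposite(position, lower, upper):
    """Mirror each coordinate inside its bounds: x' = lower + upper - x."""
    return [lo + hi - x for x, lo, hi in zip(position, lower, upper)]


def keep_better(position, lower, upper, fitness):
    """Return the candidate or its opposite, whichever has higher fitness."""
    mirrored = opposite(position, lower, upper)
    return mirrored if fitness(mirrored) > fitness(position) else position


def toy_fitness(position):
    # Hypothetical objective: highest when every coordinate equals 7.
    return -sum((x - 7.0) ** 2 for x in position)


lower, upper = [0.0, 0.0], [10.0, 10.0]
particle = [1.0, 8.0]
print(opposite(particle, lower, upper))
print(keep_better(particle, lower, upper, toy_fitness))
```

    Checking the opposite point gives the swarm a cheap second look at a distant part of the search space, which is how the strategy can help against premature convergence.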

  3. [Research progress on the phenotype informative SNP in forensic science].

    PubMed

    Liu, Yu-Xuan; Hu, Qing-Qing; Ma, Hong-Du; Huang, Dai-Xin

    2014-10-01

    Single nucleotide polymorphism (SNP) refers to a single-base sequence variation at a specific location in the human genome. Phenotype informative SNPs have gradually become one of the research hot spots in forensic science. In this paper, the state of forensic research and the application prospects of phenotype informative SNPs for characteristics such as hair, eye and skin color, height, and facial features are reviewed.

  4. The SNP-set based association study identifies ITGA1 as a susceptibility gene of attention-deficit/hyperactivity disorder in Han Chinese

    PubMed Central

    Liu, L; Zhang, L; Li, H M; Wang, Z R; Xie, X F; Mei, J P; Jin, J L; Shi, J; Sun, L; Li, S C; Tan, Y L; Yang, L; Wang, J; Yang, H M; Qian, Q J; Wang, Y F

    2017-01-01

    Genome-wide association studies, which detect the association between single-nucleotide polymorphisms (SNPs) and disease susceptibility, have been extensively applied to study attention-deficit/hyperactivity disorder (ADHD), but genome-wide significant associations have not been found yet. Genetic heterogeneity and insufficient genomic coverage may account for the missing heritability. We performed a two-stage association study for ADHD in the Han Chinese population. In the discovery stage, 1033 ADHD patients and 950 healthy controls were genotyped using both the Affymetrix Genome-Wide Human SNP Array 6.0 and the Illumina Infinium HumanExome BeadChip. The genotyped SNPs were combined to generate a powerful SNP set with better genomic coverage especially for the nonsynonymous variants. In addition to the association of single SNPs, we collected adjacent SNPs as SNP sets, which were determined by either genes or successive sliding windows, to evaluate their synergetic effect. The candidate susceptibility SNPs were further replicated in an independent cohort of 1441 ADHD patients and 1447 healthy controls. No genome-wide significant SNPs or gene-based SNP sets were found to be associated with ADHD. However, two continuous sliding windows located in ITGA1 (P-value=8.33E−7 and P-value=8.43E−7) were genome-wide significant. The quantitative trait analyses also demonstrated their association with ADHD core symptoms and executive functions. The association was further validated by follow-up replications for four selected SNPs: rs1979398 (P-value=2.64E−6), rs16880453 (P-value=3.58E−4), rs1531545 (P-value=7.62E−4) and rs4074793 (P-value=2.03E−4). Our results suggest that genetic variants in ITGA1 may be involved in the etiology of ADHD and the SNP-set based analysis is a promising strategy for the detection of underlying genetic risk factors. PMID:28809852

  5. The SNP-set based association study identifies ITGA1 as a susceptibility gene of attention-deficit/hyperactivity disorder in Han Chinese.

    PubMed

    Liu, L; Zhang, L; Li, H M; Wang, Z R; Xie, X F; Mei, J P; Jin, J L; Shi, J; Sun, L; Li, S C; Tan, Y L; Yang, L; Wang, J; Yang, H M; Qian, Q J; Wang, Y F

    2017-08-15

    Genome-wide association studies, which detect the association between single-nucleotide polymorphisms (SNPs) and disease susceptibility, have been extensively applied to study attention-deficit/hyperactivity disorder (ADHD), but genome-wide significant associations have not been found yet. Genetic heterogeneity and insufficient genomic coverage may account for the missing heritability. We performed a two-stage association study for ADHD in the Han Chinese population. In the discovery stage, 1033 ADHD patients and 950 healthy controls were genotyped using both the Affymetrix Genome-Wide Human SNP Array 6.0 and the Illumina Infinium HumanExome BeadChip. The genotyped SNPs were combined to generate a powerful SNP set with better genomic coverage especially for the nonsynonymous variants. In addition to the association of single SNPs, we collected adjacent SNPs as SNP sets, which were determined by either genes or successive sliding windows, to evaluate their synergetic effect. The candidate susceptibility SNPs were further replicated in an independent cohort of 1441 ADHD patients and 1447 healthy controls. No genome-wide significant SNPs or gene-based SNP sets were found to be associated with ADHD. However, two continuous sliding windows located in ITGA1 (P-value=8.33E-7 and P-value=8.43E-7) were genome-wide significant. The quantitative trait analyses also demonstrated their association with ADHD core symptoms and executive functions. The association was further validated by follow-up replications for four selected SNPs: rs1979398 (P-value=2.64E-6), rs16880453 (P-value=3.58E-4), rs1531545 (P-value=7.62E-4) and rs4074793 (P-value=2.03E-4). Our results suggest that genetic variants in ITGA1 may be involved in the etiology of ADHD and the SNP-set based analysis is a promising strategy for the detection of underlying genetic risk factors.

  6. A novel TCF7L2 type 2 diabetes SNP identified from fine mapping in African American women

    PubMed Central

    Haddad, Stephen A.; Palmer, Julie R.; Lunetta, Kathryn L.; Ng, Maggie C. Y.; Ruiz-Narváez, Edward A.

    2017-01-01

    SNP rs7903146 in the Wnt pathway’s TCF7L2 gene is the variant most significantly associated with type 2 diabetes to date, with associations observed across diverse populations. We sought to determine whether variants in other Wnt pathway genes are also associated with this disease. We evaluated 69 genes involved in the Wnt pathway, including TCF7L2, for associations with type 2 diabetes in 2632 African American cases and 2596 controls from the Black Women’s Health Study. Tag SNPs for each gene region were genotyped on a custom Affymetrix Axiom Array, and imputation was performed to 1000 Genomes Phase 3 data. Gene-based analyses were conducted using the adaptive rank truncated product (ARTP) statistic. The PSMD2 gene was significantly associated with type 2 diabetes after correction for multiple testing (corrected p = 0.016), based on the nine most significant single variants in the +/- 20 kb region surrounding the gene, which includes nearby genes EIF4G1, ECE2, and EIF2B5. Association data on four of the nine variants were available from an independent sample of 8284 African American cases and 15,543 controls; associations were in the same direction, but weak and not statistically significant. TCF7L2 was the only other gene associated with type 2 diabetes at nominal p <0.01 in our data. One of the three variants in the best gene-based model for TCF7L2, rs114770437, was not correlated with the GWAS index SNP rs7903146 and may represent an independent association signal seen only in African ancestry populations. Data on this SNP were not available in the replication sample. PMID:28253288

  7. Development of a maize 55 K SNP array with improved genome coverage for molecular breeding.

    PubMed

    Xu, Cheng; Ren, Yonghong; Jian, Yinqiao; Guo, Zifeng; Zhang, Yan; Xie, Chuanxiao; Fu, Junjie; Wang, Hongwu; Wang, Guoying; Xu, Yunbi; Li, Ping; Zou, Cheng

    2017-01-01

    With the decreasing cost of genotyping, single nucleotide polymorphisms (SNPs) have gained wide acceptance because of their abundance, even distribution throughout the maize (Zea mays L.) genome, and suitability for high-throughput analysis. In this study, a maize 55 K SNP array with improved genome coverage for molecular breeding was developed on an Affymetrix® Axiom® platform with 55,229 SNPs evenly distributed across the genome, including 22,278 exonic and 19,425 intronic SNPs. This array contains 451 markers that are associated with 368 known genes and two traits of agronomic importance (drought tolerance and kernel oil biosynthesis), 4067 markers that are not covered by the current reference genome, 734 markers that are differentiated significantly between heterotic groups, and 132 markers that are tags for important transgenic events. To evaluate the performance of the 55 K array, we genotyped 593 inbred lines with diverse genetic backgrounds. Compared with the widely-used Illumina® MaizeSNP50 BeadChip, our 55 K array has lower missing and heterozygous rates and more SNPs with lower minor allele frequency (MAF) in tropical maize, facilitating in-depth dissection of rare but possibly valuable variation in tropical germplasm resources. Population structure and genetic diversity analysis revealed that this 55 K array is also quite efficient in resolving heterotic groups and performing fine fingerprinting of germplasm. Therefore, this maize 55 K SNP array is a potentially powerful tool for germplasm evaluation (including germplasm fingerprinting, genetic diversity analysis, and heterotic grouping), marker-assisted breeding, and primary quantitative trait loci (QTL) mapping and genome-wide association study (GWAS) for both tropical and temperate maize.

  8. Genome-Wide SNP Detection, Validation, and Development of an 8K SNP Array for Apple

    PubMed Central

    Chagné, David; Crowhurst, Ross N.; Troggio, Michela; Davey, Mark W.; Gilmore, Barbara; Lawley, Cindy; Vanderzande, Stijn; Hellens, Roger P.; Kumar, Satish; Cestaro, Alessandro; Velasco, Riccardo; Main, Dorrie; Rees, Jasper D.; Iezzoni, Amy; Mockler, Todd; Wilhelm, Larry; Van de Weg, Eric; Gardiner, Susan E.; Bassil, Nahla; Peace, Cameron

    2012-01-01

    As high-throughput genetic marker screening systems are essential for a range of genetics studies and plant breeding applications, the International RosBREED SNP Consortium (IRSC) has utilized the Illumina Infinium® II system to develop a medium- to high-throughput SNP screening tool for genome-wide evaluation of allelic variation in apple (Malus×domestica) breeding germplasm. For genome-wide SNP discovery, 27 apple cultivars were chosen to represent worldwide breeding germplasm and re-sequenced at low coverage with the Illumina Genome Analyzer II. Following alignment of these sequences to the whole genome sequence of ‘Golden Delicious’, SNPs were identified using SoapSNP. A total of 2,113,120 SNPs were detected, corresponding to one SNP to every 288 bp of the genome. The Illumina GoldenGate® assay was then used to validate a subset of 144 SNPs with a range of characteristics, using a set of 160 apple accessions. This validation assay enabled fine-tuning of the final subset of SNPs for the Illumina Infinium® II system. The set of stringent filtering criteria developed allowed choice of a set of SNPs that not only exhibited an even distribution across the apple genome and a range of minor allele frequencies to ensure utility across germplasm, but also were located in putative exonic regions to maximize genotyping success rate. A total of 7867 apple SNPs was established for the IRSC apple 8K SNP array v1, of which 5554 were polymorphic after evaluation in segregating families and a germplasm collection. This publicly available genomics resource will provide an unprecedented resolution of SNP haplotypes, which will enable marker-locus-trait association discovery, description of the genetic architecture of quantitative traits, investigation of genetic variation (neutral and functional), and genomic selection in apple. PMID:22363718

  9. is-rSNP: a novel technique for in silico regulatory SNP detection

    PubMed Central

    Macintyre, Geoff; Bailey, James; Haviv, Izhak; Kowalczyk, Adam

    2010-01-01

    Motivation: Determining the functional impact of non-coding disease-associated single nucleotide polymorphisms (SNPs) identified by genome-wide association studies (GWAS) is challenging. Many of these SNPs are likely to be regulatory SNPs (rSNPs): variations which affect the ability of a transcription factor (TF) to bind to DNA. However, experimental procedures for identifying rSNPs are expensive and labour intensive. Therefore, in silico methods are required for rSNP prediction. By scoring two alleles with a TF position weight matrix (PWM), it can be determined which SNPs are likely rSNPs. However, predictions in this manner are noisy and no method exists that determines the statistical significance of a nucleotide variation on a PWM score. Results: We have designed an algorithm for in silico rSNP detection called is-rSNP. We employ novel convolution methods to determine the complete distributions of PWM scores and ratios between allele scores, facilitating assignment of statistical significance to rSNP effects. We have tested our method on 41 experimentally verified rSNPs, correctly predicting the disrupted TF in 28 cases. We also analysed 146 disease-associated SNPs with no known functional impact in an attempt to identify candidate rSNPs. Of the 11 significantly predicted disrupted TFs, 9 had previous evidence of being associated with the disease in the literature. These results demonstrate that is-rSNP is suitable for high-throughput screening of SNPs for potential regulatory function. This is a useful and important tool in the interpretation of GWAS. Availability: is-rSNP software is available for use at: www.genomics.csse.unimelb.edu.au/is-rSNP Contact: gmaci@csse.unimelb.edu.au; adam.kowalczyk@nicta.com.au Supplementary information: Supplementary data are available at Bioinformatics online. PMID:20823317
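
    The allele-scoring step described above (scoring both alleles of a SNP with a TF position weight matrix) can be sketched as follows. This is a minimal illustration only: the count matrix, pseudocount, uniform background, and function names are all hypothetical, and the sketch does not reproduce the convolution-based significance calculation that is-rSNP adds on top of the raw scores.

```python
import math

# Hypothetical 4-position count matrix for an illustrative TF (not real data).
COUNTS = [
    {"A": 8, "C": 1, "G": 1, "T": 0},
    {"A": 0, "C": 9, "G": 0, "T": 1},
    {"A": 1, "C": 0, "G": 9, "T": 0},
    {"A": 0, "C": 1, "G": 0, "T": 9},
]
BACKGROUND = 0.25   # assumed uniform base composition
PSEUDOCOUNT = 0.5   # assumed smoothing to avoid log(0)


def log_odds_pwm(counts):
    """Convert per-position base counts into log2-odds scores."""
    pwm = []
    for col in counts:
        total = sum(col.values()) + 4 * PSEUDOCOUNT
        pwm.append({b: math.log2(((col[b] + PSEUDOCOUNT) / total) / BACKGROUND)
                    for b in "ACGT"})
    return pwm


def best_overlapping_score(pwm, flank_left, allele, flank_right):
    """Best PWM score over all windows that contain the SNP position."""
    w = len(pwm)
    left = flank_left[-(w - 1):] if w > 1 else ""
    right = flank_right[:w - 1]
    seq = left + allele + right
    if len(seq) < w:
        raise ValueError("flanking sequence too short for this PWM")
    return max(
        sum(col[base] for col, base in zip(pwm, seq[i:i + w]))
        for i in range(len(seq) - w + 1)
    )


pwm = log_odds_pwm(COUNTS)
ref_score = best_overlapping_score(pwm, "GGAC", "G", "TGG")
alt_score = best_overlapping_score(pwm, "GGAC", "A", "TGG")
delta = ref_score - alt_score  # positive: the alternate allele weakens the site
```

    A large score difference alone is the noisy prediction the abstract warns about; assigning it a significance level requires the score distributions that is-rSNP computes.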

  10. A Bayesian Framework for SNP Identification

    SciTech Connect

    Webb-Robertson, Bobbie-Jo M.; Havre, Susan L.; Payne, Deborah A.

    2005-07-01

    Current proteomics techniques, such as mass spectrometry, focus on protein identification, usually ignoring most types of modifications beyond post-translational modifications, with the assumption that only a small number of peptides have to be matched to a protein for a positive identification. However, not all proteins are being identified with current techniques and improved methods to locate points of mutation are becoming a necessity. In the case when single-nucleotide polymorphisms (SNPs) are observed, brute force is the most common method to locate them, quickly becoming computationally unattractive as the size of the database associated with the model organism grows. We have developed a Bayesian model for SNPs, BSNP, incorporating evolutionary information at both the nucleotide and amino acid levels. Formulating SNPs as a Bayesian inference problem allows probabilities of interest to be easily obtained, for example the probability of a specific SNP or specific type of mutation over a gene or entire genome. Three SNP databases were observed in the evaluation of the BSNP model; the first SNP database is a disease specific gene in human, hemoglobin, the second is also a disease specific gene in human, p53, and the third is a more general SNP database for multiple genes in mouse. We validate that the BSNP model assigns higher posterior probabilities to the SNPs defined in all three separate databases than can be attributed to chance under specific evolutionary information, for example the amino acid model described by Majewski and Ott in conjunction with either the four-parameter nucleotide model by Bulmer or seven-parameter nucleotide model by Majewski and Ott.

  11. Analyzing cancer samples with SNP arrays.

    PubMed

    Van Loo, Peter; Nilsen, Gro; Nordgard, Silje H; Vollan, Hans Kristian Moen; Børresen-Dale, Anne-Lise; Kristensen, Vessela N; Lingjærde, Ole Christian

    2012-01-01

    Single nucleotide polymorphism (SNP) arrays are powerful tools to delineate genomic aberrations in cancer genomes. However, the analysis of these SNP array data of cancer samples is complicated by three phenomena: (a) aneuploidy: due to massive aberrations, the total DNA content of a cancer cell can differ significantly from its normal two copies; (b) nonaberrant cell admixture: samples from solid tumors do not exclusively contain aberrant tumor cells, but always contain some portion of nonaberrant cells; (c) intratumor heterogeneity: different cells in the tumor sample may have different aberrations. We describe here how these phenomena impact the SNP array profile, and how these can be accounted for in the analysis. In an extended practical example, we apply our recently developed and further improved ASCAT (allele-specific copy number analysis of tumors) suite of tools to analyze SNP array data using data from a series of breast carcinomas as an example. We first describe the structure of the data, how it can be plotted and interpreted, and how it can be segmented. The core ASCAT algorithm next determines the fraction of nonaberrant cells and the tumor ploidy (the average number of DNA copies), and calculates an ASCAT profile. We describe how these ASCAT profiles visualize both copy number aberrations as well as copy-number-neutral events. Finally, we touch upon regions showing intratumor heterogeneity, and how they can be detected in ASCAT profiles. All source code and data described here can be found at our ASCAT Web site (http://www.ifi.uio.no/forskning/grupper/bioinf/Projects/ASCAT/).
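
    To make the effect of nonaberrant cell admixture and ploidy concrete, the sketch below computes the expected log R ratio and B allele frequency for one locus under a simple two-component mixture model (tumor cells plus normal diploid cells). The abstract gives no formulas, so the model form, the function name, and the parameter names (`rho` for aberrant cell fraction, `psi_t` for tumor ploidy, `gamma` for platform compaction) are assumptions made for illustration, not the ASCAT source code.

```python
import math


def expected_logr_baf(n_a, n_b, rho, psi_t, gamma=1.0):
    """Expected (LogR, BAF) at a locus with n_a and n_b tumor allele copies.

    rho   : assumed fraction of aberrant (tumor) cells, 0 < rho <= 1
    psi_t : assumed average tumor ploidy
    gamma : assumed array-specific compaction factor
    """
    normal_part = 2 * (1 - rho)        # normal cells carry 2 copies
    tumor_part = rho * (n_a + n_b)     # tumor cells carry n_a + n_b copies
    total = normal_part + tumor_part
    if total <= 0:
        raise ValueError("no DNA expected at this locus (pure homozygous deletion)")
    sample_ploidy = 2 * (1 - rho) + rho * psi_t
    logr = gamma * math.log2(total / sample_ploidy)
    baf = ((1 - rho) + rho * n_b) / total
    return logr, baf


# Copy-number-neutral LOH (2 copies of A, 0 of B) in a sample with 80% tumor cells:
loh_logr, loh_baf = expected_logr_baf(2, 0, rho=0.8, psi_t=2.0)
# Heterozygous, unaltered locus in the same sample:
het_logr, het_baf = expected_logr_baf(1, 1, rho=0.8, psi_t=2.0)
```

    Under these assumptions both loci have a LogR of zero, but the BAF moves from 0.5 to 0.1 at the LOH locus, which is why copy-number-neutral events remain visible in allele-specific profiles.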

  12. Statistical evaluation of transcriptomic data generated using the Affymetrix one-cycle, two-cycle and IVT-Express RNA labelling protocols with the Arabidopsis ATH1 microarray

    PubMed Central

    2010-01-01

    Background: Microarrays are a powerful tool used for the determination of global RNA expression. There is an increasing requirement to focus on profiling gene expression in tissues where it is difficult to obtain large quantities of material, for example individual tissues within organs such as the root, or individual isolated cells. From such samples, it is difficult to produce the amount of RNA required for labelling and hybridisation in microarray experiments, thus a process of amplification is usually adopted. Despite the increasing use of two-cycle amplification for transcriptomic analyses on the Affymetrix ATH1 array, there has been no report investigating any potential bias in gene representation that may occur as a result. Results: Here we compare transcriptomic data generated using Affymetrix one-cycle (standard labelling protocol), two-cycle (small-sample protocol) and IVT-Express protocols with the Affymetrix ATH1 array using Arabidopsis root samples. Results obtained with each protocol are broadly similar. However, we show that there are 35 probe sets (of a total of 22810) that are misrepresented in the two-cycle data sets. Of these, 33 probe sets were classed as mis-amplified when comparisons of two independent publicly available data sets were undertaken. Conclusions: Given the unreliable nature of the highlighted probes, we caution against using data associated with the corresponding genes in analyses involving transcriptomic data generated with two-cycle amplification protocols. We have shown that the Affymetrix IVT-E labelling protocol produces data with less associated bias than the two-cycle protocol, and as such, would recommend this kit for new experiments that involve small samples. PMID:20230623

  13. Comparison of the Predictive Accuracy of DNA Array-Based Multigene Classifiers across cDNA Arrays and Affymetrix GeneChips

    PubMed Central

    Stec, James; Wang, Jing; Coombes, Kevin; Ayers, Mark; Hoersch, Sebastian; Gold, David L.; Ross, Jeffrey S; Hess, Kenneth R.; Tirrell, Stephen; Linette, Gerald; Hortobagyi, Gabriel N.; Symmans, W. Fraser; Pusztai, Lajos

    2005-01-01

    We examined how well differentially expressed genes and multigene outcome classifiers retain their class-discriminating values when tested on data generated by different transcriptional profiling platforms. RNA from 33 stage I-III breast cancers was hybridized to both Affymetrix GeneChip and Millennium Pharmaceuticals cDNA arrays. Only 30% of all corresponding gene expression measurements on the two platforms had Pearson correlation coefficient r ≥ 0.7 when UniGene was used to match probes. There was substantial variation in correlation between different Affymetrix probe sets matched to the same cDNA probe. When cDNA and Affymetrix probes were matched by basic local alignment tool (BLAST) sequence identity, the correlation increased substantially. We identified 182 genes in the Affymetrix and 45 in the cDNA data (including 17 common genes) that accurately separated 91% of cases in supervised hierarchical clustering in each data set. Cross-platform testing of these informative genes resulted in lower clustering accuracy of 45 and 79%, respectively. Several sets of accurate five-gene classifiers were developed on each platform using linear discriminant analysis. The best 100 classifiers showed average misclassification error rate of 2% on the original data that rose to 19.5% when tested on data from the other platform. Random five-gene classifiers showed misclassification error rate of 33%. We conclude that multigene predictors optimized for one platform lose accuracy when applied to data from another platform due to missing genes and sequence differences in probes that result in differing measurements for the same gene. PMID:16049308

  14. Variable Selection in Logistic Regression for Detecting SNP-SNP Interactions: the Rheumatoid Arthritis Example

    PubMed Central

    Lin, H. Y.; Desmond, R.; Liu, Y. H.; Bridges, S. L.; Soong, S. J.

    2013-01-01

    Summary: Many complex disease traits are observed to be associated with single nucleotide polymorphism (SNP) interactions. In testing small-scale SNP-SNP interactions, variable selection procedures in logistic regressions are commonly used. The empirical evidence of variable selection for testing interactions in logistic regressions is limited. This simulation study was designed to compare nine variable selection procedures in logistic regressions for testing SNP-SNP interactions. Data on 10 SNPs were simulated for 400 and 1000 subjects (case/control ratio=1). The simulated model included one main effect and two 2-way interactions. The variable selection procedures included automatic selection (stepwise, forward and backward), common 2-step selection, AIC- and BIC-based selection. The hierarchical rule effect, in which all main effects and lower order terms of the highest-order interaction term are included in the model regardless of their statistical significance, was also examined. We found that stepwise variable selection without the hierarchical rule, which had a reasonably high authentic (true positive) proportion and a low noise (false positive) proportion, performed better than the other variable selection procedures. The procedure without the hierarchical rule requires fewer terms in testing interactions, so it can accommodate more SNPs than the procedure with the hierarchical rule. For testing interactions, the procedures without the hierarchical rule had a higher authentic proportion and a lower noise proportion than those with the hierarchical rule. These variable selection procedures were also applied and compared in a rheumatoid arthritis study. PMID:18231122

  15. Exploring SNP-SNP interactions and colon cancer risk using polymorphism interaction analysis

    PubMed Central

    Goodman, Julie E.; Mechanic, Leah E.; Luke, Brian T.; Ambs, Stefan; Chanock, Stephen; Harris, Curtis C.

    2006-01-01

    Several single nucleotide polymorphisms (SNPs) in genes derived from distinct pathways are associated with colon cancer risk; however, few studies have examined SNP-SNP interactions concurrently. We explored the association between colon cancer and 94 SNPs, using a novel approach, polymorphism interaction analysis (PIA). We developed PIA to examine all possible SNP combinations, based on the 94 SNPs studied in 216 male colon cancer cases and 255 male controls, employing 2 separate functions that cross-validate and minimize false-positive results in the evaluation of SNP combinations to predict colon cancer risk. PIA identified previously described null polymorphisms in glutathione-S-transferase T1 (GSTT1) as the best predictor of colon cancer among the studied SNPs, and also identified novel polymorphisms in the inflammation and hormone metabolism pathways that singly or jointly predict cancer risk. PIA identified SNPs that may interact with the GSTT1 polymorphism, including coding polymorphisms in TP53 (Arg72Pro in p53) and CASP8 (Asp302His in caspase 8), which may modify the association between this polymorphism and colon cancer. This was confirmed by logistic regression, as the GSTT1 null polymorphism in combination with either the TP53 or the CASP8 polymorphism significantly alters colon cancer risk (p for interaction < 0.02 for both). GSTT1 prevents DNA damage by detoxifying mutagenic compounds, while the p53 protein facilitates repair of DNA damage and induces apoptosis, and caspase 8 is activated in p53-mediated apoptosis. Our results suggest that PIA is a valid method for suggesting SNP-SNP interactions that may be validated in future studies, using more traditional statistical methods on different datasets (Supplementary material can be found on the International Journal of Cancer website at http://www.interscience.wiley.com/jpages/0020-7136/suppmat). PMID:16217767
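
    Examining all possible SNP combinations grows quickly with the interaction order. The short sketch below counts the combinations for the 94 SNPs mentioned above; the function name and the cutoff at three-way combinations are illustrative choices, not details of the PIA software.

```python
from itertools import combinations
from math import comb


def count_combinations(n_snps, max_order):
    """Number of distinct SNP sets of each size from 1 to max_order."""
    return {k: comb(n_snps, k) for k in range(1, max_order + 1)}


counts = count_combinations(94, 3)
# counts == {1: 94, 2: 4371, 3: 134044}

# Combinations can be generated lazily rather than stored, e.g. all pairs:
snp_ids = [f"snp{i}" for i in range(94)]   # hypothetical identifiers
n_pairs = sum(1 for _ in combinations(snp_ids, 2))
```

    With thousands of pairs and over a hundred thousand triples evaluated on a few hundred subjects, some form of cross-validation, as the abstract describes, is needed to limit false positives.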

  16. SNP-RFLPing: restriction enzyme mining for SNPs in genomes.

    PubMed

    Chang, Hsueh-Wei; Yang, Cheng-Hong; Chang, Phei-Lang; Cheng, Yu-Huei; Chuang, Li-Yeh

    2006-02-17

    The restriction fragment length polymorphism (RFLP) is a common laboratory method for the genotyping of single nucleotide polymorphisms (SNPs). Here, we describe a web-based software tool, named SNP-RFLPing, which provides the restriction enzyme for RFLP assays on a batch of SNPs and genes from the human, rat, and mouse genomes. Three user-friendly input types are accepted for the SNP-RFLPing assay: 1) NCBI dbSNP "rs" or "ss" IDs; 2) NCBI Entrez gene IDs and HUGO gene names; 3) SNP-in-sequence data in any format. These inputs are automatically converted to SNP-containing sequences and their complementary sequences for the selection of restriction enzymes. For each input gene, all SNPs with available RFLP restriction enzymes are reported, even when many SNPs exist. The SNP-RFLPing analysis provides the SNP contig position, heterozygosity, function, protein residue, and amino acid position for cSNPs, as well as commercial and non-commercial restriction enzymes. This web-based software solves the input format problems found in similar software and greatly simplifies the procedure for providing the RFLP enzyme. Accepting mixed, free-form input data makes the tool convenient for users who perform the SNP-RFLPing assay. SNP-RFLPing offers a time-saving application for association studies in personalized medicine and is freely available at http://bio.kuas.edu.tw/snp-rflp/.
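
    The core idea of restriction enzyme mining for a SNP is to find enzymes whose recognition site is present with one allele and absent (or present a different number of times) with the other. The sketch below illustrates this for a handful of well-known enzymes with palindromic sites; the tiny enzyme table, the function names, and the example sequence are illustrative assumptions and not part of SNP-RFLPing.

```python
# Small illustrative enzyme table; all sites are palindromic, so only one
# strand needs to be scanned.
ENZYMES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
    "TaqI": "TCGA",
    "MspI": "CCGG",
}


def count_sites(seq, site):
    """Count (possibly overlapping) occurrences of a recognition site."""
    n = len(site)
    return sum(1 for i in range(len(seq) - n + 1) if seq[i:i + n] == site)


def discriminating_enzymes(left, allele_1, allele_2, right, enzymes=ENZYMES):
    """Return {enzyme: (sites with allele_1, sites with allele_2)} where they differ."""
    seq_1 = (left + allele_1 + right).upper()
    seq_2 = (left + allele_2 + right).upper()
    result = {}
    for name, site in enzymes.items():
        c1, c2 = count_sites(seq_1, site), count_sites(seq_2, site)
        if c1 != c2:
            result[name] = (c1, c2)
    return result


# Hypothetical T/C SNP: the T allele completes an EcoRI site (GAATTC).
hits = discriminating_enzymes("ACGTGAAT", "T", "C", "CAGGT")
```

    For enzymes with non-palindromic sites, the reverse complement of the sequence would also need to be scanned, which is consistent with the abstract's mention of complementary sequences.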

  17. SNP calling by sequencing pooled samples

    PubMed Central

    2012-01-01

    Background: Performing high throughput sequencing on samples pooled from different individuals is a strategy to characterize genetic variability at a small fraction of the cost required for individual sequencing. In certain circumstances some variability estimators have even lower variance than those obtained with individual sequencing. SNP calling and estimating the frequency of the minor allele from pooled samples, though, is a subtle exercise for at least three reasons. First, sequencing errors may have a much larger relevance than in individual SNP calling: while their impact in individual sequencing can be reduced by setting a restriction on a minimum number of reads per allele, this would have a strong and undesired effect in pools because it is unlikely that alleles at low frequency in the pool will be read many times. Second, the prior allele frequency for heterozygous sites in individuals is usually 0.5 (assuming one is not analyzing sequences coming from, e.g. cancer tissues), but this is not true in pools: in fact, under the standard neutral model, singletons (i.e. alleles of minimum frequency) are the most common class of variants because P(f) ∝ 1/f and they occur more often as the sample size increases. Third, an allele appearing only once in the reads from a pool does not necessarily correspond to a singleton in the set of individuals making up the pool, and vice versa, there can be more than one read – or, more likely, none – from a true singleton. Results: To improve upon existing theory and software packages, we have developed a Bayesian approach for minor allele frequency (MAF) computation and SNP calling in pools (and implemented it in a program called snape): the approach takes into account sequencing errors and allows users to choose different priors. We also set up a pipeline which can simulate the coalescence process giving rise to the SNPs, the pooling procedure and the sequencing. We used it to compare the performance of snape to that
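
    The three difficulties listed above can be illustrated with a small Bayesian calculation: a prior proportional to 1/k on the number k of minor-allele copies in the pool, combined with a binomial read model that includes sequencing error. This is a simplified sketch under stated assumptions (two alleles, a symmetric error rate, a hypothetical diversity parameter `theta`); it is not the model implemented in snape.

```python
import math


def posterior_minor_allele_count(n_chrom, alt_reads, total_reads,
                                 error=0.01, theta=0.001):
    """Posterior over k = 0..n_chrom-1 minor-allele copies in the pool.

    Prior (assumed): P(k) = theta / k for k >= 1, remainder on k = 0.
    Likelihood (assumed): each read shows the minor allele with probability
    f*(1-error) + (1-f)*error, where f = k / n_chrom.
    """
    harmonic = sum(1.0 / k for k in range(1, n_chrom))
    prior = [1.0 - theta * harmonic] + [theta / k for k in range(1, n_chrom)]
    unnormalised = []
    for k, p_k in enumerate(prior):
        f = k / n_chrom
        p_alt = f * (1 - error) + (1 - f) * error
        likelihood = (math.comb(total_reads, alt_reads)
                      * p_alt ** alt_reads
                      * (1 - p_alt) ** (total_reads - alt_reads))
        unnormalised.append(p_k * likelihood)
    z = sum(unnormalised)
    return [u / z for u in unnormalised]


# Pool of 20 chromosomes, 50 reads at the site.
post_none = posterior_minor_allele_count(20, 0, 50)
post_one = posterior_minor_allele_count(20, 1, 50)
post_ten = posterior_minor_allele_count(20, 10, 50)
prob_snp_one = 1 - post_one[0]   # probability the site is polymorphic
prob_snp_ten = 1 - post_ten[0]
```

    With these assumed settings a single minor-allele read is far more likely to be a sequencing error than a true singleton, whereas ten such reads make a SNP call nearly certain, which matches the qualitative argument in the abstract.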

  18. dbSNP: the NCBI database of genetic variation.

    PubMed

    Sherry, S T; Ward, M H; Kholodov, M; Baker, J; Phan, L; Smigielski, E M; Sirotkin, K

    2001-01-01

    In response to a need for a general catalog of genome variation to address the large-scale sampling designs required by association studies, gene mapping and evolutionary biology, the National Center for Biotechnology Information (NCBI) has established the dbSNP database [S.T.Sherry, M.Ward and K. Sirotkin (1999) Genome Res., 9, 677-679]. Submissions to dbSNP will be integrated with other sources of information at NCBI such as GenBank, PubMed, LocusLink and the Human Genome Project data. The complete contents of dbSNP are available to the public at website: http://www.ncbi.nlm.nih.gov/SNP. The complete contents of dbSNP can also be downloaded in multiple formats via anonymous FTP at ftp://ncbi.nlm.nih.gov/snp/.

  19. SNP2CAPS: a SNP and INDEL analysis tool for CAPS marker development.

    PubMed

    Thiel, Thomas; Kota, Raja; Grosse, Ivo; Stein, Nils; Graner, Andreas

    2004-01-02

    With the influx of various SNP genotyping assays in recent years, there has been a need for an assay that is robust, yet cost effective, and could be performed using standard gel-based procedures. In this context, CAPS markers have been shown to meet these criteria. However, converting SNPs to CAPS markers can be a difficult process if done manually. In order to address this problem, we describe a computer program, SNP2CAPS, that facilitates the computational conversion of SNP markers into CAPS markers. 413 multiple aligned sequences derived from barley ESTs were analysed for the presence of polymorphisms in 235 distinct restriction sites. 282 (90%) of 314 alignments that contain sequence variation due to SNPs and InDels revealed at least one polymorphic restriction site. After reducing the number of restriction enzymes from 235 to 10, 31% of the polymorphic sites could still be detected. In order to demonstrate the usefulness of this tool for marker development, we experimentally validated some of the results predicted by SNP2CAPS.

  20. SNP marker detection and genotyping in tilapia.

    PubMed

    Van Bers, N E M; Crooijmans, R P M A; Groenen, M A M; Dibbits, B W; Komen, J

    2012-09-01

    We have generated a unique resource consisting of nearly 175 000 short contig sequences and 3569 SNP markers from the widely cultured GIFT (Genetically Improved Farmed Tilapia) strain of Nile tilapia (Oreochromis niloticus). In total, 384 SNPs were selected to monitor the wider applicability of the SNPs by genotyping tilapia individuals from different strains and different geographical locations. In all strains and species tested (O. niloticus, O. aureus and O. mossambicus), the genotyping assay was working for a similar number of SNPs (288-305 SNPs). The actual number of polymorphic SNPs was, as expected, highest for individuals from the GIFT population (255 SNPs). In the individuals from an Egyptian strain and in individuals caught in the wild in the basin of the river Volta, 197 and 163 SNPs were polymorphic, respectively. A pairwise calculation of Nei's genetic distance allowed the discrimination of the individual strains and species based on the genotypes determined with the SNP set. We expect that this set will be widely applicable for use in tilapia aquaculture, e.g. for pedigree reconstruction. In addition, this set is currently used for assaying the genetic diversity of native Nile tilapia in areas where tilapia is, or will be, introduced in aquaculture projects. This allows the tracing of escapees from aquaculture and the monitoring of effects of introgression and hybridization.
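
    A pairwise genetic distance of the kind mentioned above can be computed from per-SNP allele frequencies. The sketch below uses Nei's standard genetic distance, D = -ln(J_XY / sqrt(J_X * J_Y)), for biallelic markers; the abstract does not say which of Nei's distance measures was used, so this choice, the function name, and the example frequencies are assumptions.

```python
import math


def nei_standard_distance(freqs_x, freqs_y):
    """Nei's standard genetic distance from biallelic allele frequencies.

    freqs_x, freqs_y: frequency of the same reference allele at each SNP
    in populations X and Y (same SNP order in both lists).
    """
    if len(freqs_x) != len(freqs_y) or not freqs_x:
        raise ValueError("need two non-empty frequency lists of equal length")
    j_x = sum(p * p + (1 - p) ** 2 for p in freqs_x)
    j_y = sum(q * q + (1 - q) ** 2 for q in freqs_y)
    j_xy = sum(p * q + (1 - p) * (1 - q) for p, q in zip(freqs_x, freqs_y))
    if j_xy == 0:
        raise ValueError("populations share no alleles; distance is infinite")
    return -math.log(j_xy / math.sqrt(j_x * j_y))


# Hypothetical allele frequencies at five SNPs in three strains.
strain_a = [0.10, 0.50, 0.80, 0.30, 0.95]
strain_b = [0.15, 0.45, 0.75, 0.35, 0.90]
strain_c = [0.90, 0.10, 0.20, 0.80, 0.05]

d_ab = nei_standard_distance(strain_a, strain_b)
d_ac = nei_standard_distance(strain_a, strain_c)
d_aa = nei_standard_distance(strain_a, strain_a)
```

    Strains with similar allele frequencies give a distance near zero, and a matrix of such pairwise values is what allows strains and species to be separated.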

  1. SNP-RFLPing: restriction enzyme mining for SNPs in genomes

    PubMed Central

    Chang, Hsueh-Wei; Yang, Cheng-Hong; Chang, Phei-Lang; Cheng, Yu-Huei; Chuang, Li-Yeh

    2006-01-01

    Background: The restriction fragment length polymorphism (RFLP) is a common laboratory method for the genotyping of single nucleotide polymorphisms (SNPs). Here, we describe a web-based software tool, named SNP-RFLPing, which provides the restriction enzyme for RFLP assays on a batch of SNPs and genes from the human, rat, and mouse genomes. Results: Three user-friendly input types are accepted for the SNP-RFLPing assay: 1) NCBI dbSNP "rs" or "ss" IDs; 2) NCBI Entrez gene IDs and HUGO gene names; 3) SNP-in-sequence data in any format. These inputs are automatically converted to SNP-containing sequences and their complementary sequences for the selection of restriction enzymes. For each input gene, all SNPs with available RFLP restriction enzymes are reported, even when many SNPs exist. The SNP-RFLPing analysis provides the SNP contig position, heterozygosity, function, protein residue, and amino acid position for cSNPs, as well as commercial and non-commercial restriction enzymes. Conclusion: This web-based software solves the input format problems found in similar software and greatly simplifies the procedure for providing the RFLP enzyme. Accepting mixed, free-form input data makes the tool convenient for users who perform the SNP-RFLPing assay. SNP-RFLPing offers a time-saving application for association studies in personalized medicine and is freely available at . PMID:16503968

  2. The importance of integrating SNP and cheminformatics resources to pharmacogenomics.

    PubMed

    Chang, Hsueh-Wei; Chuang, Li-Yeh; Tsai, Ming-Tz; Yang, Cheng-Hong

    2012-09-01

    Single nucleotide polymorphisms (SNPs) are the most frequent variants in many genes and are promising markers in relation to drug responses in pharmacogenomics studies. In this review, we emphasize the importance of cheminformatics-related and SNP-related resources and tools and how they can improve pharmacogenomics studies. Currently, many cheminformatics resources are well developed and provide much information on drug metabolism and targeting. In parallel, there are also many well-established SNP-related resources that provide information related to SNP genotyping, tag SNPs and functional classification. However, cheminformatics and SNP resources have not, as yet, been well integrated to provide a user-friendly platform for pharmacogenomics studies. This paper presents a brief overview of the many available public resources for cheminformatics (DrugBank, PharmGKB and other drug-related databases) and SNPs (dbSNP, HapMap, SNP500Cancer, SNP-RFLPing 2 and other SNP tools) and points out the importance of integrating cheminformatics and SNP resources for the future of pharmacogenomics.

  3. Analysis of genetic variation in Ashkenazi Jews by high density SNP genotyping

    PubMed Central

    Olshen, Adam B; Gold, Bert; Lohmueller, Kirk E; Struewing, Jeffery P; Satagopan, Jaya; Stefanov, Stefan A; Eskin, Eleazar; Kirchhoff, Tomas; Lautenberger, James A; Klein, Robert J; Friedman, Eitan; Norton, Larry; Ellis, Nathan A; Viale, Agnes; Lee, Catherine S; Borgen, Patrick I; Clark, Andrew G; Offit, Kenneth; Boyd, Jeff

    2008-01-01

    Background: Genetic isolates such as the Ashkenazi Jews (AJ) potentially offer advantages in mapping novel loci in whole genome disease association studies. To analyze patterns of genetic variation in AJ, genotypes of 101 healthy individuals were determined using the Affymetrix EAv3 500 K SNP array and compared to 60 CEPH-derived HapMap (CEU) individuals. 435,632 SNPs overlapped and met annotation criteria in the two groups. Results: A small but significant global difference in allele frequencies between AJ and CEU was demonstrated by a mean FST of 0.009 (P < 0.001); large regions that differed were found on chromosomes 2 and 6. Haplotype blocks inferred from pairwise linkage disequilibrium (LD) statistics (Haploview) as well as by expectation-maximization haplotype phase inference (HAP) showed a greater number of haplotype blocks in AJ compared to CEU by Haploview (50,397 vs. 44,169) or by HAP (59,269 vs. 54,457). Average haplotype blocks were smaller in AJ compared to CEU (e.g., 36.8 kb vs. 40.5 kb HAP). Analysis of global patterns of local LD decay for closely-spaced SNPs in CEU demonstrated more LD, while for SNPs further apart, LD was slightly greater in the AJ. A likelihood ratio approach showed that runs of homozygous SNPs were approximately 20% longer in AJ. A principal components analysis was sufficient to completely resolve the CEU from the AJ. Conclusion: LD in the AJ versus CEU was lower than expected by some measures and higher by others. Any putative advantage in whole genome association mapping using the AJ population will be highly dependent on regional LD structure. PMID:18251999
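
    A mean FST across SNPs, as reported above, summarises how much allele frequencies differ between two groups. The sketch below uses the basic heterozygosity-based definition, FST = (H_T - H_S) / H_T, for a biallelic SNP with two equally weighted populations, and averages the per-SNP values. The abstract does not specify the estimator, so this definition, the equal weighting, and the function names are assumptions.

```python
def fst_biallelic(p1, p2):
    """FST for one biallelic SNP from allele frequencies in two populations.

    Returns None when the SNP is monomorphic in the combined sample.
    """
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)
    if h_total == 0:
        return None
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return (h_total - h_within) / h_total


def mean_fst(freqs_1, freqs_2):
    """Average of per-SNP FST values, skipping monomorphic SNPs."""
    values = [v for v in (fst_biallelic(a, b) for a, b in zip(freqs_1, freqs_2))
              if v is not None]
    if not values:
        raise ValueError("no polymorphic SNPs to average over")
    return sum(values) / len(values)


same = fst_biallelic(0.5, 0.5)       # identical frequencies
fixed = fst_biallelic(1.0, 0.0)      # fixed for opposite alleles
overall = mean_fst([0.40, 0.25, 0.70], [0.50, 0.30, 0.65])
```

    Identical frequencies give zero and fixed opposite alleles give one; modest frequency differences of a few percentage points per SNP give values close to zero.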

  4. SNIT: SNP identification for strain typing

    PubMed Central

    2011-01-01

    With ever-increasing numbers of microbial genomes being sequenced, efficient tools are needed to perform strain-level identification of any newly sequenced genome. Here, we present the SNP identification for strain typing (SNIT) pipeline, a fast and accurate software system that compares a newly sequenced bacterial genome with other genomes of the same species to identify single nucleotide polymorphisms (SNPs) and small insertions/deletions (indels). Based on this information, the pipeline analyzes the polymorphic loci present in all input genomes to identify the genome that has the fewest differences with the newly sequenced genome. Similarly, for each of the other genomes, SNIT identifies the input genome with the fewest differences. Results from five bacterial species show that the SNIT pipeline identifies the correct closest neighbor with 75% to 100% accuracy. The SNIT pipeline is available for download at http://www.bhsai.org/snit.html PMID:21902825
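
    The closest-neighbor step described above can be sketched as a search for the reference genome with the fewest allele differences at shared polymorphic loci. The data layout (one allele string per genome over the same ordered loci), the function names, and the strain labels are hypothetical; the SNIT pipeline itself also handles alignment and indel detection, which are not shown.

```python
def count_differences(alleles_a, alleles_b):
    """Number of polymorphic loci at which two genomes carry different alleles."""
    if len(alleles_a) != len(alleles_b):
        raise ValueError("genomes must be genotyped at the same loci")
    return sum(1 for a, b in zip(alleles_a, alleles_b) if a != b)


def closest_genome(query, references):
    """Return (name, differences) of the reference closest to the query.

    references: dict mapping genome name -> allele string.
    Ties are resolved in favour of the first reference in dict order.
    """
    if not references:
        raise ValueError("no reference genomes supplied")
    name = min(references, key=lambda n: count_differences(query, references[n]))
    return name, count_differences(query, references[name])


# Hypothetical alleles at five polymorphic loci.
references = {
    "strain_1": "ACGTT",
    "strain_2": "TTTTT",
}
best = closest_genome("ACGTA", references)
```

    Running the same search with each reference in turn as the query gives the per-genome closest neighbors that the abstract also describes.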

  5. Atomic Force Microscopy for DNA SNP Identification

    NASA Astrophysics Data System (ADS)

    Valbusa, Ugo; Ierardi, Vincenzo

    The knowledge of the effects of single-nucleotide polymorphisms (SNPs) in the human genome greatly contributes to better comprehension of the relation between genetic factors and diseases. Sequence analysis of genomic DNA in different individuals reveals positions where variations that involve individual base substitutions can occur. Single-nucleotide polymorphisms are highly abundant and can have different consequences at phenotypic level. Several attempts were made to apply atomic force microscopy (AFM) to detect and map SNP sites in DNA strands. The most promising approach is the study of DNA mutations producing heteroduplex DNA strands and identifying the mismatches by means of a protein that labels the mismatches. MutS is a protein that is part of a well-known complex of mismatch repair, which initiates the process of repairing when the MutS binds to the mismatched DNA filament. The position of MutS on the DNA filament can be easily recorded by means of AFM imaging.

  6. SNP-SNP interactions as risk factors for aggressive prostate cancer.

    PubMed

    Vaidyanathan, Venkatesh; Naidu, Vijay; Karunasinghe, Nishi; Jabed, Anower; Pallati, Radha; Marlow, Gareth; R Ferguson, Lynnette

    2017-01-01

    Prostate cancer (PCa) is one of the most significant male health concerns worldwide. Single nucleotide polymorphisms (SNPs) are becoming increasingly strong candidate biomarkers for identifying susceptibility to PCa. We identified a number of SNPs reported in genome-wide association analyses (GWAS) as risk factors for aggressive PCa in various European populations, and then defined SNP-SNP interactions, using PLINK software, with nucleic acid samples from a New Zealand cohort. We used this approach to find a gene x environment marker for aggressive PCa: although gene x environment interactions can be adjusted for statistically, doing so is nearly impossible in practice, so they must be incorporated in the search for a reliable biomarker for PCa. We found two intronic SNPs that interact statistically significantly with each other as a risk factor for aggressive prostate cancer when compared with healthy controls in a New Zealand population.

  7. SNP-SNP interactions as risk factors for aggressive prostate cancer

    PubMed Central

    Vaidyanathan, Venkatesh; Naidu, Vijay; Karunasinghe, Nishi; Jabed, Anower; Pallati, Radha; Marlow, Gareth; R. Ferguson, Lynnette

    2017-01-01

    Prostate cancer (PCa) is one of the most significant male health concerns worldwide. Single nucleotide polymorphisms (SNPs) are becoming increasingly strong candidate biomarkers for identifying susceptibility to PCa. We identified a number of SNPs reported in genome-wide association analyses (GWAS) as risk factors for aggressive PCa in various European populations, and then defined SNP-SNP interactions, using PLINK software, with nucleic acid samples from a New Zealand cohort. We used this approach to find a gene x environment marker for aggressive PCa because, although gene x environment interactions can be adjusted for statistically, doing so is nearly impossible in practice, so these interactions must be incorporated in the search for a reliable biomarker for PCa. We found two intronic SNPs that interacted with statistical significance as a risk factor for aggressive prostate cancer when cases were compared with healthy controls in a New Zealand population. PMID:28580135

  8. Methods comparison for high-resolution transcriptional analysis of archival material on Affymetrix Plus 2.0 and Exon 1.0 microarrays.

    PubMed

    Linton, Kim; Hey, Yvonne; Dibben, Sian; Miller, Crispin; Freemont, Anthony; Radford, John; Pepper, Stuart

    2009-07-01

    Microarray gene expression profiling of formalin-fixed paraffin-embedded (FFPE) tissues is a new and evolving technique. This report compares transcript detection rates on Affymetrix U133 Plus 2.0 and Human Exon 1.0 ST GeneChips across several RNA extraction and target labeling protocols, using routinely collected archival FFPE samples. All RNA extraction protocols tested (Ambion-Optimum, Ambion-RecoverAll, and Qiagen-RNeasy FFPE) provided extracts suitable for microarray hybridization. Compared with Affymetrix One-Cycle labeled extracts, NuGEN system protocols utilizing oligo(dT) and random hexamer primers, and cDNA target preparations instead of cRNA, achieved percent present rates up to 55% on Plus 2.0 arrays. Based on two paired-sample analyses, at 90% specificity this equalled an average 30 percentage-point increase (from 50% to 80%) in FFPE transcript sensitivity relative to fresh frozen tissues, which we have assumed to have 100% sensitivity and specificity. The high content of Exon arrays, with multiple probe sets per exon, improved FFPE sensitivity to 92% at 96% specificity, corresponding to an absolute increase of ~600 genes over Plus 2.0 arrays. While larger series are needed to confirm high correspondence between fresh-frozen and FFPE expression patterns, these data suggest that both Plus 2.0 and Exon arrays are suitable platforms for FFPE microarray expression analyses.

  9. Exercise improves adiponectin concentrations irrespective of the adiponectin gene polymorphisms SNP45 and the SNP276 in obese Korean women.

    PubMed

    Lee, Kyoung-Young; Kang, Hyun-Sik; Shin, Yun-A

    2013-03-10

    The effects of exercise on adiponectin levels have been reported to be variable and may be attributable to an interaction between environmental and genetic factors. The single nucleotide polymorphisms (SNP) 45 (T>G) and SNP276 (G>T) of the adiponectin gene are associated with metabolic risk factors including adiponectin levels. We examined whether SNP45 and SNP276 would differentially influence the effect of exercise training in middle-aged women with uncomplicated obesity. We conducted a prospective study in the general community that included 90 Korean women (age 47.0±5.1 years) with uncomplicated obesity. The intervention was aerobic exercise training for 3 months. Body composition, adiponectin levels, and other metabolic risk factors were measured. Prior to exercise training, only body weight differed among the SNP276 genotypes. Exercise training improved body composition, systolic blood pressure, maximal oxygen consumption, high-density lipoprotein cholesterol, and leptin levels. In addition, exercise improved adiponectin levels irrespective of weight gain or loss. However, after adjustments for age, BMI, body fat (%), and waist circumference, no differences were found in obesity-related characteristics (e.g., adiponectin) following exercise training among the SNP45 and the 276 genotypes. Our findings suggest that aerobic exercise affects adiponectin levels regardless of weight loss and this effect would not be influenced by SNP45 and SNP276 in the adiponectin gene.

  10. Hybrid Propulsion Demonstration Program 250K Hybrid Motor

    NASA Technical Reports Server (NTRS)

    Story, George; Zoladz, Tom; Arves, Joe; Kearney, Darren; Abel, Terry; Park, O.

    2003-01-01

    The Hybrid Propulsion Demonstration Program (HPDP) was formed to mature hybrid propulsion technology to a readiness level sufficient to enable commercialization for various space launch applications. The goal of the HPDP was to develop and test a 250,000-pound vacuum-thrust hybrid booster in order to demonstrate hybrid propulsion technology and enable manufacturing of large hybrid boosters for current and future space launch vehicles. The HPDP has successfully conducted four tests of the 250,000-pound-thrust hybrid rocket motor at NASA's Stennis Space Center. This paper documents the test series.

  11. A model of binding on DNA microarrays: understanding the combined effect of probe synthesis failure, cross-hybridization, DNA fragmentation and other experimental details of Affymetrix arrays

    PubMed Central

    2012-01-01

    Background: DNA microarrays are used both for research and for diagnostics. In research, Affymetrix arrays are commonly used for genome wide association studies, resequencing, and for gene expression analysis. These arrays provide large amounts of data. This data is analyzed using statistical methods that quite often discard a large portion of the information. Most of the information that is lost comes from probes that systematically fail across chips and from batch effects. The aim of this study was to develop a comprehensive model for hybridization that predicts probe intensities for Affymetrix arrays and that could provide a basis for improved microarray analysis and probe development. The first part of the model calculates probe binding affinities to all the possible targets in the hybridization solution using the Langmuir isotherm. In the second part of the model we integrate details that are specific to each experiment and contribute to the differences between hybridization in solution and on the microarray. These details include fragmentation, wash stringency, temperature, salt concentration, and scanner settings. Furthermore, the model fits probe synthesis efficiency and target concentration parameters directly to the data. All the parameters used in the model have a well-established physical origin. Results: For the 302 chips that were analyzed the mean correlation between expected and observed probe intensities was 0.701 with a range of 0.88 to 0.55. All available chips were included in the analysis regardless of the data quality. Our results show that batch effects arise from differences in probe synthesis, scanner settings, wash strength, and target fragmentation. We also show that probe synthesis efficiencies for different nucleotides are not uniform. Conclusions: To date this is the most complete model for binding on microarrays. This is the first model that includes both probe synthesis efficiency and hybridization kinetics/cross-hybridization. These
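
    The Langmuir isotherm mentioned in this record can be illustrated with a minimal sketch. It shows only the textbook single-target form, occupancy = K*c / (1 + K*c); the function name, the parameter names, and the numeric values are assumptions for illustration and are not taken from the authors' model, which also accounts for cross-hybridization and experiment-specific effects.

```python
def langmuir_occupancy(concentration: float, affinity: float) -> float:
    """Equilibrium fraction of probe sites bound: K*c / (1 + K*c).

    `affinity` (K) and `concentration` (c) must use reciprocal units so that
    K*c is dimensionless. Both names are illustrative, not from the paper.
    """
    if concentration < 0 or affinity < 0:
        raise ValueError("concentration and affinity must be non-negative")
    kc = affinity * concentration
    return kc / (1.0 + kc)


# Hypothetical concentrations, chosen only to show the saturation behaviour.
for c in (0.1, 1.0, 10.0, 100.0):
    print(c, round(langmuir_occupancy(c, affinity=1.0), 3))
```

    Occupancy rises roughly linearly at low concentration and saturates towards 1 at high concentration, which is why probe intensity is not proportional to target abundance over the whole range.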

  12. Haplotype assembly from aligned weighted SNP fragments.

    PubMed

    Zhao, Yu-Ying; Wu, Ling-Yun; Zhang, Ji-Hong; Wang, Rui-Sheng; Zhang, Xiang-Sun

    2005-08-01

    Given an assembled genome of a diploid organism, the haplotype assembly problem can be formulated as retrieval of a pair of haplotypes from a set of aligned weighted SNP fragments. Known computational formulations (models) of this problem are minimum letter flips (MLF) and the weighted minimum letter flips (WMLF; Greenberg et al. (INFORMS J. Comput. 2004, 14, 211-213)). In this paper we show that the general WMLF model is NP-hard even for the gapless case. However, algorithmic solutions for selected variants of WMLF can exist, and we propose a heuristic algorithm based on a dynamic clustering technique. We also introduce a new formulation of the haplotype assembly problem that we call COMPLETE WMLF (CWMLF). This model and algorithms for its implementation take into account a simultaneous presence of multiple kinds of data errors. Extensive computational experiments indicate that the algorithmic implementations of the CWMLF model achieve higher accuracy of haplotype reconstruction than the WMLF-based algorithms, which in turn appear to be more accurate than those based on MLF.
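
    To make the weighted minimum letter flips objective concrete, the sketch below solves a toy instance by exhaustive search. It is not the heuristic or the CWMLF model from this record: it assumes biallelic sites coded 0/1, a complementary haplotype pair, '-' for gaps, and invented fragments and weights. Exhaustive search is exponential in the number of sites, so it is usable only on tiny examples.

```python
from itertools import product


def flip_cost(fragment, weights, hap):
    """Weighted cost of flipping `fragment` so it agrees with `hap` ('-' = gap)."""
    return sum(w for a, w, h in zip(fragment, weights, hap) if a != "-" and a != h)


def best_haplotype(fragments, weights):
    """Return (cost, haplotype, complement) minimising total weighted flips.

    Each fragment is charged the cheaper of matching the haplotype or its
    complement. Brute force over all 2**n haplotypes, for illustration only.
    """
    n = len(fragments[0])
    best = None
    for bits in product("01", repeat=n):
        hap = "".join(bits)
        comp = "".join("1" if b == "0" else "0" for b in hap)
        cost = sum(
            min(flip_cost(f, w, hap), flip_cost(f, w, comp))
            for f, w in zip(fragments, weights)
        )
        if best is None or cost < best[0]:
            best = (cost, hap, comp)
    return best


# Hypothetical aligned fragments over four SNP sites, all with unit weights.
fragments = ["0101", "010-", "1010", "-011"]
weights = [[1, 1, 1, 1] for _ in fragments]
print(best_haplotype(fragments, weights))
```

    With unit weights this reduces to plain MLF; replacing the weights with per-call confidence values gives the weighted variant.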

  13. SNP-SNP interaction analysis of NF-κB signaling pathway on breast cancer survival

    PubMed Central

    Jamshidi, Maral; Fagerholm, Rainer; Khan, Sofia; Aittomäki, Kristiina; Czene, Kamila; Darabi, Hatef; Li, Jingmei; Andrulis, Irene L.; Chang-Claude, Jenny; Devilee, Peter; Fasching, Peter A.; Michailidou, Kyriaki; Bolla, Manjeet K.; Dennis, Joe; Wang, Qin; Guo, Qi; Rhenius, Valerie; Cornelissen, Sten; Rudolph, Anja; Knight, Julia A.; Loehberg, Christian R.; Burwinkel, Barbara; Marme, Frederik; Hopper, John L.; Southey, Melissa C.; Bojesen, Stig E.; Flyger, Henrik; Brenner, Hermann; Holleczek, Bernd; Margolin, Sara; Mannermaa, Arto; Kosma, Veli-Matti; Dyck, Laurien Van; Nevelsteen, Ines; Couch, Fergus J.; Olson, Janet E.; Giles, Graham G.; McLean, Catriona; Haiman, Christopher A.; Henderson, Brian E.; Winqvist, Robert; Pylkäs, Katri; Tollenaar, Rob A.E.M.; García-Closas, Montserrat; Figueroa, Jonine; Hooning, Maartje J.; Martens, John W.M.; Cox, Angela; Cross, Simon S.; Simard, Jacques; Dunning, Alison M.; Easton, Douglas F.; Pharoah, Paul D.P.; Hall, Per; Blomqvist, Carl; Schmidt, Marjanka K.; Nevanlinna, Heli

    2015-01-01

    In breast cancer, constitutive activation of NF-κB has been reported, however, the impact of genetic variation of the pathway on patient prognosis has been little studied. Furthermore, a combination of genetic variants, rather than single polymorphisms, may affect disease prognosis. Here, in an extensive dataset (n = 30,431) from the Breast Cancer Association Consortium, we investigated the association of 917 SNPs in 75 genes in the NF-κB pathway with breast cancer prognosis. We explored SNP-SNP interactions on survival using the likelihood-ratio test comparing multivariate Cox’ regression models of SNP pairs without and with an interaction term. We found two interacting pairs associating with prognosis: patients simultaneously homozygous for the rare alleles of rs5996080 and rs7973914 had worse survival (HRinteraction 6.98, 95% CI=3.3-14.4, P = 1.42E-07), and patients carrying at least one rare allele for rs17243893 and rs57890595 had better survival (HRinteraction 0.51, 95% CI=0.3-0.6, P = 2.19E-05). Based on in silico functional analyses and literature, we speculate that the rs5996080 and rs7973914 loci may affect the BAFFR and TNFR1/TNFR3 receptors and breast cancer survival, possibly by disturbing both the canonical and non-canonical NF-κB pathways or their dynamics, whereas, rs17243893-rs57890595 interaction on survival may be mediated through TRAF2-TRAIL-R4 interplay. These results warrant further validation and functional analyses. PMID:26317411
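
    The likelihood-ratio comparison described in this record can be sketched as follows, assuming the two nested Cox models differ only by a single interaction term (1 degree of freedom) and that their maximised log partial likelihoods are already available from a survival-analysis package. The function name and the numeric inputs are hypothetical; no model is fitted here.

```python
import math


def likelihood_ratio_test(loglik_null, loglik_alt):
    """Likelihood-ratio test for nested models differing by one parameter.

    Returns (statistic, p_value). For a chi-square variable with 1 d.f.,
    P(X > x) = erfc(sqrt(x / 2)), so no external library is needed.
    """
    stat = 2.0 * (loglik_alt - loglik_null)
    if stat < 0:
        raise ValueError("the larger model cannot have a lower log-likelihood")
    return stat, math.erfc(math.sqrt(stat / 2.0))


# Hypothetical log partial likelihoods: without and with the interaction term.
stat, p = likelihood_ratio_test(loglik_null=-1502.4, loglik_alt=-1496.1)
print(f"LRT statistic = {stat:.2f}, p = {p:.2e}")
```

    When many SNP pairs are screened this way, the resulting p-values still need a multiple-testing correction before any pair is reported.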

  14. Gene Expression Analysis of Cultured Rat-Endothelial Cells after Nd:YAG Laser Irradiation by Affymetrix GeneChip Array

    PubMed Central

    MASUDA, YOSHIKO; YOKOSE, SATOSHI; SAKAGAMI, HIROSHI

    2017-01-01

    Endothelial cells and dental pulp cells enhance osteo-/odontogenic and angiogenic differentiation. In our previous study, rat pulp cells migrated to Nd:YAG laser-irradiated endothelial cells in an insert cell culture system. The purpose of this study was to examine the possible changes in the gene expression of cultured rat aortic endothelial cells after Nd:YAG laser irradiation using the Affymetrix GeneChip Array. Total RNA was extracted from the cells at 5 h after laser irradiation. Gene expression was evaluated by DNA array chip. Up-regulated genes were related to cell migration and cell structure (membrane stretch, actin regulation and junctional complexes), neurotransmission and inflammation. Heat-shock 70 kDa protein (Hsp70) was related to the development of tooth germ. This study offers candidate genes for understanding the relationship between the laser-stimulated endothelial cells and dental pulp cells. PMID:28064220

  15. A scan statistic for identifying chromosomal patterns of SNP association.

    PubMed

    Sun, Yan V; Levin, Albert M; Boerwinkle, Eric; Robertson, Henry; Kardia, Sharon L R

    2006-11-01

    We have developed a single nucleotide polymorphism (SNP) association scan statistic that takes into account the complex distribution of the human genome variation in the identification of chromosomal regions with significant SNP associations. This scan statistic has wide applicability for genetic analysis, whether to identify important chromosomal regions associated with common diseases based on whole-genome SNP association studies or to identify disease susceptibility genes based on dense SNP positional candidate studies. To illustrate this method, we analyzed patterns of SNP associations on chromosome 19 in a large cohort study. Among 2,944 SNPs, we found seven regions that contained clusters of significantly associated SNPs. The average width of these regions was 35 kb with a range of 10-72 kb. We compared the scan statistic results to Fisher's product method using a sliding window approach, and detected 22 regions with significant clusters of SNP associations. The average width of these regions was 131 kb with a range of 10.1-615 kb. Given that the distances between SNPs are not taken into consideration in the sliding window approach, it is likely that a large fraction of these regions represents false positives. However, all seven regions detected by the scan statistic were also detected by the sliding window approach. The linkage disequilibrium (LD) patterns within the seven regions were highly variable indicating that the clusters of SNP associations were not due to LD alone. The scan statistic developed here can be used to make gene-based or region-based SNP inferences about disease association.
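
    The sliding-window comparison method in this record, Fisher's product method, can be sketched as below. This is not the scan statistic itself. It assumes independent p-values within a window (which linkage disequilibrium violates) and windows defined by a fixed number of consecutive SNPs, ignoring physical distance as the abstract notes. The window size and p-values are invented.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's product method.

    X = -2 * sum(ln p_i) follows a chi-square distribution with 2k d.f.;
    for even d.f. the upper tail is exp(-x/2) * sum_{i<k} (x/2)**i / i!.
    """
    k = len(p_values)
    half = -sum(math.log(p) for p in p_values)  # equals X / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total


def sliding_windows(p_values, size):
    """Combined p-value for every window of `size` consecutive SNPs."""
    return [
        fisher_combined_p(p_values[i:i + size])
        for i in range(len(p_values) - size + 1)
    ]


# Hypothetical per-SNP association p-values ordered along a chromosome.
snp_p = [0.40, 0.03, 0.01, 0.02, 0.65, 0.80]
print([round(p, 4) for p in sliding_windows(snp_p, size=3)])
```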

  16. A Novel Test for Detecting SNP-SNP Interactions in Case-Only Trio Studies.

    PubMed

    Balliu, Brunilda; Zaitlen, Noah

    2016-04-01

    Epistasis plays a significant role in the genetic architecture of many complex phenotypes in model organisms. To date, there have been very few interactions replicated in human studies due in part to the multiple-hypothesis burden implicit in genome-wide tests of epistasis. Therefore, it is of paramount importance to develop the most powerful tests possible for detecting interactions. In this work we develop a new SNP-SNP interaction test for use in case-only trio studies called the trio correlation (TC) test. The TC test computes the expected joint distribution of marker pairs in offspring conditional on parental genotypes. This distribution is then incorporated into a standard 1 d.f. correlation test of interaction. We show via extensive simulations under a variety of disease models that our test substantially outperforms existing tests of interaction in case-only trio studies. We also demonstrate a bias in a previous case-only trio interaction test and identify its origin. Finally, we show that a previously proposed permutation scheme in trio studies mitigates the known biases of case-only tests in the presence of population stratification. We conclude that the TC test shows improved power to identify interactions in existing, as well as emerging, trio association studies. The method is publicly available at www.github.com/BrunildaBalliu/TrioEpi.

  17. Cardiovascular pharmacogenetics in the SNP era.

    PubMed

    Mooser, V; Waterworth, D M; Isenhour, T; Middleton, L

    2003-07-01

    In the past, pharmacological agents have contributed to a significant reduction in age-adjusted incidence of cardiovascular events. However, not all patients treated with these agents respond favorably, and some individuals may develop side-effects. With aging of the population and the growing prevalence of cardiovascular risk factors worldwide, it is expected that the demand for cardiovascular drugs will increase in the future. Accordingly, there is a growing need to identify the 'good' responders as well as the persons at risk for developing adverse events. Evidence is accumulating to indicate that responses to drugs are at least partly under genetic control. As such, pharmacogenetics - the study of variability in drug responses attributed to hereditary factors in different populations - may significantly assist in providing answers toward meeting this challenge. Pharmacogenetics mostly relies on associations between specific genetic markers such as single nucleotide polymorphisms (SNPs), either alone or arranged in a specific linear order on a certain chromosomal region (haplotypes), and a particular response to drugs. Numerous associations have been reported between selected genotypes and specific responses to cardiovascular drugs. Recently, for instance, associations have been reported between specific alleles of the apoE gene and the lipid-lowering response to statins, or the lipid-elevating effect of isotretinoin. Thus far, these types of studies have been mostly limited to a priori selected candidate genes due to restricted genotyping and analytical capacities. Thanks to the large number of SNPs now available in the public domain through the SNP Consortium and the newly developed technologies (high throughput genotyping, bioinformatics software), it is now possible to interrogate more than 200,000 SNPs distributed over the entire human genome. One pharmacogenetic study using this approach has been launched by GlaxoSmithKline to identify the approximately 4% of

  18. A SNP resource for Douglas-fir: de novo transcriptome assembly and SNP detection and validation

    PubMed Central

    2013-01-01

    Background: Douglas-fir (Pseudotsuga menziesii), one of the most economically and ecologically important tree species in the world, also has one of the largest tree breeding programs. Although the coastal and interior varieties of Douglas-fir (vars. menziesii and glauca) are native to North America, the coastal variety is also widely planted for timber production in Europe, New Zealand, Australia, and Chile. Our main goal was to develop a SNP resource large enough to facilitate genomic selection in Douglas-fir breeding programs. To accomplish this, we developed a 454-based reference transcriptome for coastal Douglas-fir, annotated and evaluated the quality of the reference, identified putative SNPs, and then validated a sample of those SNPs using the Illumina Infinium genotyping platform. Results: We assembled a reference transcriptome consisting of 25,002 isogroups (unique gene models) and 102,623 singletons from 2.76 million 454 and Sanger cDNA sequences from coastal Douglas-fir. We identified 278,979 unique SNPs by mapping the 454 and Sanger sequences to the reference, and by mapping four datasets of Illumina cDNA sequences from multiple seed sources, genotypes, and tissues. The Illumina datasets represented coastal Douglas-fir (64.00 and 13.41 million reads), interior Douglas-fir (80.45 million reads), and a Yakima population similar to interior Douglas-fir (8.99 million reads). We assayed 8067 SNPs on 260 trees using an Illumina Infinium SNP genotyping array. Of these SNPs, 5847 (72.5%) were called successfully and were polymorphic. Conclusions: Based on our validation efficiency, our SNP database may contain as many as ~200,000 true SNPs, and as many as ~69,000 SNPs that could be genotyped at ~20,000 gene loci using an Infinium II array, more SNPs than are needed to use genomic selection in tree breeding programs. Ultimately, these genomic resources will enhance Douglas-fir breeding and allow us to better understand landscape-scale patterns of genetic variation
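
    The "~200,000 true SNPs" figure in this record is consistent with a simple extrapolation of the validation rate to the full set of putative SNPs. The sketch below reproduces that arithmetic from the numbers in the abstract; it is one plausible reading, since the abstract does not state the authors' exact calculation, and the variable names are illustrative.

```python
# Counts taken from the abstract above.
assayed_snps = 8067       # SNPs placed on the Infinium array
validated_snps = 5847     # called successfully and polymorphic
putative_snps = 278_979   # unique putative SNPs in the database

# Assumption: the assayed SNPs are representative of all putative SNPs.
validation_rate = validated_snps / assayed_snps
estimated_true_snps = putative_snps * validation_rate

print(f"validation rate: {validation_rate:.1%}")
print(f"estimated true SNPs: {estimated_true_snps:,.0f}")
```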

  19. Heterogeneous computing architecture for fast detection of SNP-SNP interactions.

    PubMed

    Sluga, Davor; Curk, Tomaz; Zupan, Blaz; Lotric, Uros

    2014-06-25

    The extent of data in a typical genome-wide association study (GWAS) poses considerable computational challenges to software tools for gene-gene interaction discovery. Exhaustive evaluation of all interactions among hundreds of thousands to millions of single nucleotide polymorphisms (SNPs) may require weeks or even months of computation. Massively parallel hardware within a modern Graphic Processing Unit (GPU) and Many Integrated Core (MIC) coprocessors can shorten the run time considerably. While the utility of GPU-based implementations in bioinformatics has been well studied, MIC architecture has been introduced only recently and may provide a number of comparative advantages that have yet to be explored and tested. We have developed a heterogeneous, GPU and Intel MIC-accelerated software module for SNP-SNP interaction discovery to replace the previously single-threaded computational core in the interactive web-based data exploration program SNPsyn. We report on differences between these two modern massively parallel architectures and their software environments. Their utility resulted in an order of magnitude shorter execution times when compared to the single-threaded CPU implementation. GPU implementation on a single Nvidia Tesla K20 runs twice as fast as that for the MIC architecture-based Xeon Phi P5110 coprocessor, but also requires considerably more programming effort. General purpose GPUs are a mature platform with large amounts of computing power capable of tackling inherently parallel problems, but can prove demanding for the programmer. On the other hand the new MIC architecture, albeit lacking in performance reduces the programming effort and makes it up with a more general architecture suitable for a wider range of problems.

  20. snp-search: simple processing, manipulation and searching of SNPs from high-throughput sequencing

    PubMed Central

    2013-01-01

    Background: A typical bacterial pathogen genome mapping project can identify thousands of single nucleotide polymorphisms (SNPs). Interpreting SNP data is complex and it is difficult to conceptualise the data contained within the large flat files that are the typical output from most SNP calling algorithms. One solution to this problem is to construct a database that can be queried using simple commands so that SNP interrogation and output is both easy and comprehensible. Results: Here we present snp-search, a tool that manages SNP data and allows for manipulation and searching of SNP data. After creation of a SNP database from a VCF file, snp-search can be used to convert the selected SNP data into FASTA sequences, construct phylogenies, look for unique SNPs, and output contextual information about each SNP. The FASTA output from snp-search is particularly useful for the generation of robust phylogenetic trees that are based on SNP differences across the conserved positions in whole genomes. Queries can be designed to answer critical genomic questions such as the association of SNPs with particular phenotypes. Conclusions: snp-search is a tool that manages SNP data and outputs useful information which can be used to test important biological hypotheses. PMID:24246037
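
    The VCF-to-FASTA conversion described in this record can be illustrated with a minimal sketch. This is not snp-search's implementation or interface: it assumes haploid genotype calls in the first FORMAT subfield, keeps only biallelic single-base substitutions, and writes N for missing calls. The function name, sample names, and VCF records are all invented.

```python
def vcf_to_fasta(vcf_text: str) -> str:
    """Concatenate each sample's SNP alleles into one FASTA record."""
    samples, seqs = [], {}
    for line in vcf_text.splitlines():
        if line.startswith("##") or not line.strip():
            continue  # skip meta-information and blank lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample columns follow FORMAT
            seqs = {s: [] for s in samples}
            continue
        ref, alt = fields[3], fields[4]
        if len(ref) != 1 or len(alt) != 1:
            continue  # drop indels and multi-allelic sites
        for sample, call in zip(samples, fields[9:]):
            genotype = call.split(":")[0]
            seqs[sample].append({"0": ref, "1": alt}.get(genotype, "N"))
    return "".join(f">{s}\n{''.join(seqs[s])}\n" for s in samples)


# Hypothetical haploid VCF with two isolates and four variant records.
EXAMPLE_VCF = (
    "##fileformat=VCFv4.1\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tiso1\tiso2\n"
    "chr\t10\t.\tA\tG\t50\tPASS\t.\tGT\t0\t1\n"
    "chr\t25\t.\tC\tT\t50\tPASS\t.\tGT\t1\t1\n"
    "chr\t40\t.\tG\tGA\t50\tPASS\t.\tGT\t0\t1\n"
    "chr\t77\t.\tT\tC\t50\tPASS\t.\tGT\t.\t0\n"
)
print(vcf_to_fasta(EXAMPLE_VCF))
```

    Because every sample contributes one character per retained site, the output sequences are already aligned and can be passed directly to a tree-building program.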

  1. DoGSD: the dog and wolf genome SNP database.

    PubMed

    Bai, Bing; Zhao, Wen-Ming; Tang, Bi-Xia; Wang, Yan-Qing; Wang, Lu; Zhang, Zhang; Yang, He-Chuan; Liu, Yan-Hu; Zhu, Jun-Wei; Irwin, David M; Wang, Guo-Dong; Zhang, Ya-Ping

    2015-01-01

    The rapid advancement of next-generation sequencing technology has generated a deluge of genomic data from domesticated dogs and their wild ancestor, grey wolves, which has simultaneously broadened our understanding of domestication and of diseases that are shared by humans and dogs. To address the scarcity of single nucleotide polymorphism (SNP) data provided by authorized databases and to make SNP data easier to use and more readily available, we propose DoGSD (http://dogsd.big.ac.cn), the first Canidae-specific database, which focuses on whole-genome SNP data from domesticated dogs and grey wolves. DoGSD is a web-based, open-access resource comprising ∼19 million high-quality whole-genome SNPs. In addition to the dbSNP data set (build 139), DoGSD incorporates a comprehensive collection of SNPs from two newly sequenced samples (1 wolf and 1 dog) and SNPs collected from three recent dog/wolf genetic studies (7 wolves and 68 dogs), which were analyzed together using the population genetic statistic Fst. In addition, DoGSD integrates closely related information including SNP annotation, summary lists of SNPs located in genes, synonymous and non-synonymous SNPs, sampling location and breed information. All these features make DoGSD a useful resource for in-depth analysis in dog-/wolf-related studies.

  2. RASSF1A and the rs2073498 Cancer Associated SNP

    PubMed Central

    Donninger, Howard; Barnoud, Thibaut; Nelson, Nick; Kassler, Suzanna; Clark, Jennifer; Cummins, Timothy D.; Powell, David W.; Nyante, Sarah; Millikan, Robert C.; Clark, Geoffrey J.

    2011-01-01

    RASSF1A is one of the most frequently inactivated tumor suppressors yet identified in human cancer. It is pro-apoptotic and appears to function as a scaffolding protein that interacts with a variety of other tumor suppressors to modulate their function. It can also complex with the Ras oncoprotein and may serve to integrate pro-growth and pro-death signaling pathways. A SNP has been identified that is present in approximately 29% of European populations [rs2073498, A(133)S]. Several studies have now presented evidence that this SNP is associated with an enhanced risk of developing breast cancer. We have used a proteomics based approach to identify multiple differences in the pattern of protein/protein interactions mediated by the wild type compared to the SNP variant protein. We have also identified a significant difference in biological activity between wild type and SNP variant protein. However, we have found only a very modest association of the SNP with breast cancer predisposition. PMID:22649770

  3. Assessing batch effects of genotype calling algorithm BRLMM for the Affymetrix GeneChip Human Mapping 500 K array set using 270 HapMap samples

    PubMed Central

    Hong, Huixiao; Su, Zhenqiang; Ge, Weigong; Shi, Leming; Perkins, Roger; Fang, Hong; Xu, Joshua; Chen, James J; Han, Tao; Kaput, Jim; Fuscoe, James C; Tong, Weida

    2008-01-01

    Background: Genome-wide association studies (GWAS) aim to identify genetic variants (usually single nucleotide polymorphisms [SNPs]) across the entire human genome that are associated with phenotypic traits such as disease status and drug response. Highly accurate and reproducible genotype calling is paramount since errors introduced by calling algorithms can lead to inflation of false associations between genotype and phenotype. Most genotype calling algorithms currently used for GWAS are based on multiple arrays. Because hundreds of gigabytes (GB) of raw data are generated from a GWAS, the samples are typically partitioned into batches containing subsets of the entire dataset for genotype calling. High call rates and accuracies have been achieved. However, the effects of batch size (i.e., number of chips analyzed together) and of batch composition (i.e., the choice of chips in a batch) on call rate and accuracy as well as the propagation of the effects into significantly associated SNPs identified have not been investigated. In this paper, we analyzed both the batch size and batch composition for effects on the genotype calling algorithm BRLMM using raw data of 270 HapMap samples analyzed with the Affymetrix Human Mapping 500 K array set. Results: Using data from 270 HapMap samples interrogated with the Affymetrix Human Mapping 500 K array set, three different batch sizes and three different batch compositions were used for genotyping using the BRLMM algorithm. Comparative analysis of the calling results and the corresponding lists of significant SNPs identified through association analysis revealed that both batch size and composition affected genotype calling results and significantly associated SNPs. Batch size and batch composition effects were more severe on samples and SNPs with lower call rates than ones with higher call rates, and on heterozygous genotype calls compared to homozygous genotype calls. Conclusion: Batch size and composition affect the genotype

  4. The Association of CYP1A1 Gene With Cervical Cancer and Additional SNP-SNP Interaction in Chinese Women.

    PubMed

    Li, Shuhong; Li, Guiqin; Kong, Fanqiang; Liu, Zhifen; Li, Ning; Li, Yan; Guo, Xiaojing

    2016-11-01

    The aim of this study was to investigate the association between CYP1A1 gene polymorphism and cervical cancer risk, and the impact of SNP-SNP interaction on cervical cancer risk in Chinese women. A total of 728 women with a mean age of 60.1 ± 14.5 years were selected, including 360 cervical cancer patients and 368 normal controls. Logistic regression was performed to investigate the association between single-nucleotide polymorphisms (SNPs) and cervical cancer risk. Generalized multifactor dimensionality reduction (GMDR) was used to analyze the SNP-SNP interaction. Logistic analysis showed a significant association between rs4646903 and increased cervical cancer risk. Carriers of the homozygous mutant genotype of rs4646903 had a higher cervical cancer risk than wild-type homozygotes (OR 1.45, 95% CI 1.20-1.95). There was a significant two-locus model (P = 0.0107) involving rs4646903 and rs1048943, indicating a potential SNP-SNP interaction between them. Overall, the two-locus model had a cross-validation consistency of 10 of 10 and a testing accuracy of 60.72%. Subjects with the TC or CC genotype of rs4646903 and the AG or GG genotype of rs1048943 had the highest cervical cancer risk compared to subjects with the TT genotype of rs4646903 and the AA genotype of rs1048943 (OR 2.03, 95% CI 1.42-2.89). The rs4646903 minor allele and the interaction between rs4646903 and rs1048943 were associated with increased cervical cancer risk. © 2016 Wiley Periodicals, Inc.
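
    Odds ratios with 95% confidence intervals, as reported in this record, can be illustrated with an unadjusted 2x2 calculation. The study itself used logistic regression, so this Wald-interval sketch is a simplification; the function name and the genotype counts are hypothetical and are not the study's data.

```python
import math


def odds_ratio_ci(exposed_cases, unexposed_cases, exposed_controls,
                  unexposed_controls, z=1.96):
    """Unadjusted odds ratio and Wald confidence interval from a 2x2 table.

    OR = (a*d) / (b*c); the standard error of ln(OR) is
    sqrt(1/a + 1/b + 1/c + 1/d). z = 1.96 gives a 95% interval.
    """
    a, b = exposed_cases, unexposed_cases
    c, d = exposed_controls, unexposed_controls
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Hypothetical counts: risk-genotype carriers vs non-carriers.
or_, lo, hi = odds_ratio_ci(120, 240, 90, 278)
print(f"OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

    An interval that excludes 1, as both intervals quoted in the abstract do, corresponds to an association that is significant at the 5% level.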

  5. MDM2 SNP309 and SNP285 Act as Negative Prognostic Markers for Non-small Cell Lung Cancer Adenocarcinoma Patients

    PubMed Central

    Deben, Christophe; Op de Beeck, Ken; Van den Bossche, Jolien; Jacobs, Julie; Lardon, Filip; Wouters, An; Peeters, Marc; Van Camp, Guy; Rolfo, Christian; Deschoolmeester, Vanessa; Pauwels, Patrick

    2017-01-01

    Objectives: Two functional polymorphisms in the MDM2 promoter region, SNP309T>G and SNP285G>C, have been shown to impact MDM2 expression and cancer risk. Currently available data on the prognostic value of MDM2 SNP309 in non-small cell lung cancer (NSCLC) is contradictory and unavailable for SNP285. The goal of this study was to clarify the role of these MDM2 SNPs in the outcome of NSCLC patients. Materials and Methods: In this study we genotyped SNP309 and SNP285 in 98 NSCLC adenocarcinoma patients and determined MDM2 mRNA and protein levels. In addition, we assessed the prognostic value of these common SNPs on overall and progression free survival, taking into account the TP53 status of the tumor. Results and Conclusion: We found that the SNP285C allele, but not the SNP309G allele, was significantly associated with increased MDM2 mRNA expression levels (p = 0.025). However, we did not observe an association with MDM2 protein levels for SNP285. The SNP309G allele was significantly associated with the presence of wild type TP53 (p = 0.047) and showed a strong trend towards increased MDM2 protein levels (p = 0.068). In addition, patients harboring the SNP309G allele showed a worse overall survival, but only in the presence of wild type TP53. The SNP285C allele was significantly associated with an early age of diagnosis and metastasis. Additionally, the SNP285C allele acted as an independent predictor for worse progression free survival (HR = 3.97; 95% CI = 1.51 - 10.42; p = 0.005). Our data showed that both SNP309 (in the presence of wild type TP53) and SNP285 act as negative prognostic markers for NSCLC patients, implicating a prominent role for these variants in the outcome of these patients. PMID:28819417

  6. Forensic SNP Genotyping using Nanopore MinION Sequencing

    PubMed Central

    Cornelis, Senne; Gansemans, Yannick; Deleye, Lieselot; Deforce, Dieter; Van Nieuwerburgh, Filip

    2017-01-01

    One of the latest developments in next generation sequencing is the Oxford Nanopore Technologies’ (ONT) MinION nanopore sequencer. We studied the applicability of this system to perform forensic genotyping of the forensic female DNA standard 9947 A using the 52 SNP-plex assay developed by the SNPforID consortium. All but one of the loci were correctly genotyped. Several SNP loci were identified as problematic for correct and robust genotyping using nanopore sequencing. All these loci contained homopolymers in the sequence flanking the forensic SNP and most of them were already reported as problematic in studies using other sequencing technologies. When these problematic loci are avoided, correct forensic genotyping using nanopore sequencing is technically feasible. PMID:28155888

  7. PanSNPdb: the Pan-Asian SNP genotyping database.

    PubMed

    Ngamphiw, Chumpol; Assawamakin, Anunchai; Xu, Shuhua; Shaw, Philip J; Yang, Jin Ok; Ghang, Ho; Bhak, Jong; Liu, Edison; Tongsima, Sissades

    2011-01-01

    The HUGO Pan-Asian SNP consortium conducted the largest survey to date of human genetic diversity among Asians by sampling 1,719 unrelated individuals among 71 populations from China, India, Indonesia, Japan, Malaysia, the Philippines, Singapore, South Korea, Taiwan, and Thailand. We have constructed a database (PanSNPdb), which contains these data and various new analyses of them. PanSNPdb is a research resource in the analysis of the population structure of Asian peoples, including linkage disequilibrium patterns, haplotype distributions, and copy number variations. Furthermore, PanSNPdb provides an interactive comparison with other SNP and CNV databases, including HapMap3, JSNP, dbSNP and DGV and thus provides a comprehensive resource of human genetic diversity. The information is accessible via a widely accepted graphical interface used in many genetic variation databases. Unrestricted access to PanSNPdb and any associated files is available at: http://www4a.biotec.or.th/PASNP.

  8. Forensic SNP Genotyping using Nanopore MinION Sequencing.

    PubMed

    Cornelis, Senne; Gansemans, Yannick; Deleye, Lieselot; Deforce, Dieter; Van Nieuwerburgh, Filip

    2017-02-03

    One of the latest developments in next generation sequencing is the Oxford Nanopore Technologies' (ONT) MinION nanopore sequencer. We studied the applicability of this system to perform forensic genotyping of the forensic female DNA standard 9947 A using the 52 SNP-plex assay developed by the SNPforID consortium. All but one of the loci were correctly genotyped. Several SNP loci were identified as problematic for correct and robust genotyping using nanopore sequencing. All these loci contained homopolymers in the sequence flanking the forensic SNP and most of them were already reported as problematic in studies using other sequencing technologies. When these problematic loci are avoided, correct forensic genotyping using nanopore sequencing is technically feasible.

  9. A 48 SNP set for grapevine cultivar identification

    PubMed Central

    2011-01-01

    Background Rapid and consistent genotyping is an important requirement for cultivar identification in many crop species. Among them grapevine cultivars have been the subject of multiple studies given the large number of synonyms and homonyms generated during many centuries of vegetative multiplication and exchange. Simple sequence repeat (SSR) markers have been preferred until now because of their high level of polymorphism, their codominant nature and their high profile repeatability. However, the rapid application of partial or complete genome sequencing approaches is identifying thousands of single nucleotide polymorphisms (SNP) that can be very useful for such purposes. Although SNP markers are bi-allelic, and therefore not as polymorphic as microsatellites, the high number of loci that can be multiplexed and the possibilities of automation as well as their highly repeatable results under any analytical procedure make them the future markers of choice for any type of genetic identification. Results We analyzed over 300 SNP in the genome of grapevine using a re-sequencing strategy in a selection of 11 genotypes. Among the identified polymorphisms, we selected 48 SNP spread across all grapevine chromosomes with allele frequencies balanced enough as to provide sufficient information content for genetic identification in grapevine allowing for good genotyping success rate. Marker stability was tested in repeated analyses of a selected group of cultivars obtained worldwide to demonstrate their usefulness in genetic identification. Conclusions We have selected a set of 48 stable SNP markers with a high discrimination power and a uniform genome distribution (2-3 markers/chromosome), which is proposed as a standard set for grapevine (Vitis vinifera L.) genotyping. Any previous problems derived from microsatellite allele confusion between labs or the need to run reference cultivars to identify allele sizes disappear using this type of marker. Furthermore, because SNP

  10. Global Expression Patterns of Three Festuca Species Exposed to Different Doses of Glyphosate Using the Affymetrix GeneChip Wheat Genome Array

    PubMed Central

    Cebeci, Ozge; Budak, Hikmet

    2009-01-01

    Glyphosate has been shown to act as an inhibitor of an aromatic amino acid biosynthetic pathway, while other pathways that may be affected by glyphosate are not known. Cross species hybridizations can provide a tool for elucidating biological pathways conserved among organisms. Comparative genome analyses have indicated a high level of colinearity among grass species and Festuca, on which we focus here, and showed rearrangements common to the Pooideae family. Based on sequence conservation among grass species, we selected the Affymetrix GeneChip Wheat Genome Array as a tool for the analysis of expression profiles of three Festuca (fescue) species with distinctly different tolerances to varying levels of glyphosate. Differences in transcript expression were recorded upon foliar glyphosate application at 1.58 mM and 6.32 mM, representing 5% and 20%, respectively, of the recommended rate. Differences highlighted categories of general metabolic processes, such as photosynthesis, protein synthesis, stress responses, and a larger number of transcripts responded to 20% glyphosate application. Differential expression of genes encoding proteins involved in the shikimic acid pathway could not be identified by cross hybridization. Microarray data were confirmed by RT-PCR and qRT-PCR analyses. This is the first report to analyze the potential of cross species hybridization in Fescue species and the data and analyses will help extend our knowledge on the cellular processes affected by glyphosate. PMID:20182642

  11. A sequence-based identification of the genes detected by probesets on the Affymetrix U133 plus 2.0 array.

    PubMed

    Harbig, Jeremy; Sprinkle, Robert; Enkemann, Steven A

    2005-02-18

    One of the biggest problems facing microarray experiments is the difficulty of translating results into other microarray formats or comparing microarray results to other biochemical methods. We believe that this is largely the result of poor gene identification. We re-identified the probesets on the Affymetrix U133 plus 2.0 GeneChip array. This identification was based on the sequence of the probes and the sequence of the human genome. Using the BLAST program, we matched probes with documented and postulated human transcripts. This resulted in the redefinition of approximately 37% of the probes on the U133 plus 2.0 array. This updated identification specifically points out where the identification is complicated by cross-hybridization from splice variants or closely related genes. More than 5000 probesets detect multiple transcripts and therefore the exact protein affected cannot be readily concluded from the performance of one probeset alone. This makes naming difficult and impacts any downstream analysis such as associating gene ontologies, mapping affected pathways or simply validating expression changes. We have now automated the sequence-based identification and can more appropriately annotate any array where the sequence on each spot is known.

  12. Population distribution and ancestry of the cancer protective MDM2 SNP285 (rs117039649).

    PubMed

    Knappskog, Stian; Gansmo, Liv B; Dibirova, Khadizha; Metspalu, Andres; Cybulski, Cezary; Peterlongo, Paolo; Aaltonen, Lauri; Vatten, Lars; Romundstad, Pål; Hveem, Kristian; Devilee, Peter; Evans, Gareth D; Lin, Dongxin; Van Camp, Guy; Manolopoulos, Vangelis G; Osorio, Ana; Milani, Lili; Ozcelik, Tayfun; Zalloua, Pierre; Mouzaya, Francis; Bliznetz, Elena; Balanovska, Elena; Pocheshkova, Elvira; Kučinskas, Vaidutis; Atramentova, Lubov; Nymadawa, Pagbajabyn; Titov, Konstantin; Lavryashina, Maria; Yusupov, Yuldash; Bogdanova, Natalia; Koshel, Sergey; Zamora, Jorge; Wedge, David C; Charlesworth, Deborah; Dörk, Thilo; Balanovsky, Oleg; Lønning, Per E

    2014-09-30

    The MDM2 promoter SNP285C is located on the SNP309G allele. While SNP309G enhances Sp1 transcription factor binding and MDM2 transcription, SNP285C antagonizes Sp1 binding and reduces the risk of breast, ovarian and endometrial cancer. Assessing SNP285 and 309 genotypes across 25 different ethnic populations (>10,000 individuals), the incidence of SNP285C was 6-8% across European populations except for Finns (1.2%) and Saami (0.3%). The incidence decreased towards the Middle-East and Eastern Russia, and SNP285C was absent among Han Chinese, Mongolians and African Americans. Interhaplotype variation analyses estimated SNP285C to have originated about 14,700 years ago (95% CI: 8,300 - 33,300). Both this estimate and the geographical distribution suggest SNP285C to have arisen after the separation between Caucasians and modern day East Asians (17,000 - 40,000 years ago). We observed a strong inverse correlation (r = -0.805; p < 0.001) between the percentage of SNP309G alleles harboring SNP285C and the MAF for SNP309G itself across different populations, suggesting selection and environmental adaptation with respect to MDM2 expression in recent human evolution. In conclusion, we found SNP285C to be a pan-Caucasian variant. Ethnic variation regarding distribution of SNP285C needs to be taken into account when assessing the impact of MDM2 SNPs on cancer risk.

  13. Sniper: improved SNP discovery by multiply mapping deep sequenced reads.

    PubMed

    Simola, Daniel F; Kim, Junhyong

    2011-06-20

    SNP (single nucleotide polymorphism) discovery using next-generation sequencing data remains difficult primarily because of redundant genomic regions, such as interspersed repetitive elements and paralogous genes, present in all eukaryotic genomes. To address this problem, we developed Sniper, a novel multi-locus Bayesian probabilistic model and a computationally efficient algorithm that explicitly incorporates sequence reads that map to multiple genomic loci. Our model fully accounts for sequencing error, template bias, and multi-locus SNP combinations, maintaining high sensitivity and specificity under a broad range of conditions. An implementation of Sniper is freely available at http://kim.bio.upenn.edu/software/sniper.shtml.

  14. Genetic algorithm-generated SNP barcodes of the mitochondrial D-loop for chronic dialysis susceptibility.

    PubMed

    Chen, Jin-Bor; Chuang, Li-Yeh; Lin, Yu-Da; Liou, Chia-Wei; Lin, Tsu-Kung; Lee, Wen-Chin; Cheng, Ben-Chung; Chang, Hsueh-Wei; Yang, Cheng-Hong

    2014-06-01

    Single nucleotide polymorphism (SNP) interaction analysis can simultaneously evaluate the complex SNP interactions present in complex diseases. However, it is less commonly applied to evaluate the predisposition to chronic dialysis, and its computational analysis remains challenging. In this study, we aimed to improve the analysis of SNP-SNP interactions within the mitochondrial D-loop in chronic dialysis. The SNP-SNP interactions between 77 reported SNPs within the mitochondrial D-loop in a chronic dialysis study were evaluated in terms of SNP barcodes (different SNP combinations with their corresponding genotypes). We propose a genetic algorithm (GA) to generate SNP barcodes. The χ(2) values were then calculated from the occurrences of the specific SNP barcodes and their non-specific combinations between cases and controls. Each SNP barcode (2- to 7-SNP) with the highest value in the χ(2) test was regarded as the best SNP barcode (11.304 to 23.310; p < 0.001). The best GA-generated SNP barcodes (2- to 7-SNP) were significantly associated with chronic dialysis (odds ratio [OR] = 1.998 to 3.139; p < 0.001). The order of influence for SNPs was the same as the order of their OR values for chronic dialysis in terms of 2- to 7-SNP barcodes. Taken together, we propose an effective algorithm to address the SNP-SNP interactions and demonstrate that many non-significant SNPs within the mitochondrial D-loop may play a role in joint effects on chronic dialysis susceptibility.
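    The scoring step in this abstract (a χ² value and an odds ratio for one SNP barcode, compared between cases and controls) can be illustrated with a standard 2x2 contingency table. The sketch below is not the authors' code: the function names and the example counts are hypothetical, and it uses the plain Pearson χ² without continuity correction, which the abstract does not specify.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for a 2x2 table (no continuity correction).

    a: cases carrying the barcode      b: cases without it
    c: controls carrying the barcode   d: controls without it
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a zero margin makes the statistic undefined; treat as no signal
    return n * (a * d - b * c) ** 2 / denom


def odds_ratio(a, b, c, d):
    """Odds of carrying the barcode in cases relative to controls."""
    return (a * d) / (b * c)


# Hypothetical counts, chosen only to exercise the functions.
chi2 = chi_square_2x2(30, 20, 10, 40)
ratio = odds_ratio(30, 20, 10, 40)
```

    In a genetic algorithm of the kind described, a statistic like this could serve as the fitness of a candidate barcode, so that barcodes with larger χ² values are favored during selection.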

  15. snpTree--a web-server to identify and construct SNP trees from whole genome sequence data.

    PubMed

    Leekitcharoenphon, Pimlapas; Kaas, Rolf S; Thomsen, Martin Christen Frølund; Friis, Carsten; Rasmussen, Simon; Aarestrup, Frank M

    2012-01-01

    The advances and decreasing economical cost of whole genome sequencing (WGS) will soon make this technology available for routine infectious disease epidemiology. In epidemiological studies, outbreak isolates have very little diversity and require extensive genomic analysis to differentiate and classify isolates. One of the most successful and broadly used methods is analysis of single nucleotide polymorphisms (SNPs). Currently, there are different tools and methods to identify SNPs, including various options and cut-off values. Furthermore, all current methods require bioinformatic skills. Thus, we lack a standard and simple automatic tool to determine SNPs and construct phylogenetic trees from WGS data. Here we introduce snpTree, a server for online automatic SNP analysis. This tool is composed of different SNP analysis suites and Perl and Python scripts. snpTree can identify SNPs and construct phylogenetic trees from WGS as well as from assembled genomes or contigs. WGS data in fastq format are aligned to reference genomes by BWA while contigs in fasta format are processed by Nucmer. SNPs are concatenated based on position on the reference genome and a tree is constructed from concatenated SNPs using FastTree and a Perl script. The online server was implemented with HTML, Java and Python scripts. The server was evaluated using four published bacterial WGS data sets (V. cholerae, S. aureus CC398, S. Typhimurium and M. tuberculosis). The evaluation results for the first three cases were consistent and concordant for both raw reads and assembled genomes. In the latter case the original publication involved extensive filtering of SNPs, which could not be repeated using snpTree. The snpTree server is an easy to use option for rapid, standardised and automatic SNP analysis in epidemiological studies, also for users with limited bioinformatic experience. The web server is freely accessible at http://www.cbs.dtu.dk/services/snpTree-1.0/.
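    The concatenation step mentioned in this abstract (SNPs joined by their position on the reference genome before tree building) can be sketched as below. This is only an illustration under stated assumptions, not the snpTree code: the function name `concatenate_snps`, the input layout, and the choice to fall back to the reference base where an isolate has no call are all hypothetical, and no SNP filtering is modelled.

```python
def concatenate_snps(reference, calls):
    """Build one concatenated SNP string per isolate.

    reference: reference sequence as a string.
    calls: {isolate_name: {position: base}} with 0-based reference positions.
    Returns the sorted variable positions and {isolate_name: concatenated bases}.
    """
    # Every position where at least one isolate differs from the reference.
    positions = sorted({p for snps in calls.values() for p in snps})
    alignment = {}
    for isolate, snps in calls.items():
        # Assumption: an isolate without a call at a position carries the reference base.
        alignment[isolate] = "".join(snps.get(p, reference[p]) for p in positions)
    return positions, alignment


# Hypothetical toy input: three isolates against an 8-base reference.
positions, alignment = concatenate_snps(
    "ACGTACGT",
    {"iso_a": {1: "T", 5: "G"}, "iso_b": {5: "G"}, "iso_c": {}},
)
```

    The resulting equal-length strings form a small alignment that a tree builder such as FastTree could take as input after being written out in fasta format.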

  16. Evidence for SNP-SNP interaction identified through targeted sequencing of cleft case-parent trios.

    PubMed

    Xiao, Yanzi; Taub, Margaret A; Ruczinski, Ingo; Begum, Ferdouse; Hetmanski, Jacqueline B; Schwender, Holger; Leslie, Elizabeth J; Koboldt, Daniel C; Murray, Jeffrey C; Marazita, Mary L; Beaty, Terri H

    2017-04-01

    Nonsyndromic cleft lip with or without cleft palate (NSCL/P) is the most common craniofacial birth defect in humans, affecting 1 in 700 live births. This malformation has a complex etiology where multiple genes and several environmental factors influence risk. At least a dozen different genes have been confirmed to be associated with risk of NSCL/P in previous studies. However, all the known genetic risk factors cannot fully explain the observed heritability of NSCL/P, and several authors have suggested gene-gene (G × G) interaction may be important in the etiology of this complex and heterogeneous malformation. We tested for G × G interactions using common single nucleotide polymorphic (SNP) markers from targeted sequencing in 13 regions identified by previous studies spanning 6.3 Mb of the genome in a study of 1,498 NSCL/P case-parent trios. We used the R-package trio to assess interactions between polymorphic markers in different genes, using a 1 degree of freedom (1df) test for screening, and a 4 degree of freedom (4df) test to assess statistical significance of epistatic interactions. To adjust for multiple comparisons, we performed permutation tests. The most significant interaction was observed between rs6029315 in MAFB and rs6681355 in IRF6 (4df P = 3.8 × 10(-8) ) in case-parent trios of European ancestry, which remained significant after correcting for multiple comparisons. However, no significant interaction was detected in trios of Asian ancestry.

  17. Tag SNP selection in genotype data for maximizing SNP prediction accuracy.

    PubMed

    Halperin, Eran; Kimmel, Gad; Shamir, Ron

    2005-06-01

    The search for genetic regions associated with complex diseases, such as cancer or Alzheimer's disease, is an important challenge that may lead to better diagnosis and treatment. The existence of millions of DNA variations, primarily single nucleotide polymorphisms (SNPs), may allow the fine dissection of such associations. However, studies seeking disease association are limited by the cost of genotyping SNPs. Therefore, it is essential to find a small subset of informative SNPs (tag SNPs) that may be used as good representatives of the rest of the SNPs. We define a new natural measure for evaluating the prediction accuracy of a set of tag SNPs, and use it to develop a new method for tag SNPs selection. Our method is based on a novel algorithm that predicts the values of the rest of the SNPs given the tag SNPs. In contrast to most previous methods, our prediction algorithm uses the genotype information and not the haplotype information of the tag SNPs. Our method is very efficient, and it does not rely on having a block partition of the genomic region. We compared our method with two state-of-the-art tag SNP selection algorithms on 58 different genotype datasets from four different sources. Our method consistently found tag SNPs with considerably better prediction ability than the other methods. The software is available from the authors on request.

  18. Do you really know where this SNP goes?

    USDA-ARS's Scientific Manuscript database

    The release of build 10.2 of the swine genome was a marked improvement over previous builds and has proven extremely useful. However, as most know, there are regions of the genome that this particular build does not accurately represent. For instance, nearly 25% of the 62,162 SNP on the Illumina Por...

  19. Target SNP selection in complex disease association studies

    PubMed Central

    Wjst, Matthias

    2004-01-01

    Background The massive amount of SNP data stored at public internet sites provides unprecedented access to human genetic variation. Selecting target SNP for disease-gene association studies is currently done more or less randomly as decision rules for the selection of functional relevant SNPs are not available. Results We implemented a computational pipeline that retrieves the genomic sequence of target genes, collects information about sequence variation and selects functional motifs containing SNPs. Motifs being considered are gene promoter, exon-intron structure, AU-rich mRNA elements, transcription factor binding motifs, cryptic and enhancer splice sites together with expression in target tissue. As a case study, 396 genes on chromosome 6p21 in the extended HLA region were selected that contributed nearly 20,000 SNPs. By computer annotation ~2,500 SNPs in functional motifs could be identified. Most of these SNPs are disrupting transcription factor binding sites but only those introducing new sites had a significant depressing effect on SNP allele frequency. Other decision rules concern position within motifs, the validity of SNP database entries, the unique occurrence in the genome and conserved sequence context in other mammalian genomes. Conclusion Only 10% of all gene-based SNPs have sequence-predicted functional relevance making them a primary target for genotyping in association studies. PMID:15248903

  20. SNP Discovery and Linkage Map Construction in Cultivated Tomato

    PubMed Central

    Shirasawa, Kenta; Isobe, Sachiko; Hirakawa, Hideki; Asamizu, Erika; Fukuoka, Hiroyuki; Just, Daniel; Rothan, Christophe; Sasamoto, Shigemi; Fujishiro, Tsunakazu; Kishida, Yoshie; Kohara, Mitsuyo; Tsuruoka, Hisano; Wada, Tsuyuko; Nakamura, Yasukazu; Sato, Shusei; Tabata, Satoshi

    2010-01-01

    Few intraspecific genetic linkage maps have been reported for cultivated tomato, mainly because genetic diversity within Solanum lycopersicum is much less than that between tomato species. Single nucleotide polymorphisms (SNPs), the most abundant source of genomic variation, are the most promising source of polymorphisms for the construction of linkage maps for closely related intraspecific lines. In this study, we developed SNP markers based on expressed sequence tags for the construction of intraspecific linkage maps in tomato. Out of the 5607 SNP positions detected through in silico analysis, 1536 were selected for high-throughput genotyping of two mapping populations derived from crosses between ‘Micro-Tom’ and either ‘Ailsa Craig’ or ‘M82’. A total of 1137 markers, including 793 out of the 1338 successfully genotyped SNPs, along with 344 simple sequence repeat and intronic polymorphism markers, were mapped onto two linkage maps, which covered 1467.8 and 1422.7 cM, respectively. The SNP markers developed were then screened against cultivated tomato lines in order to estimate the transferability of these SNPs to other breeding materials. The molecular markers and linkage maps represent a milestone in the genomics and genetics, and are the first step toward molecular breeding of cultivated tomato. Information on the DNA markers, linkage maps, and SNP genotypes for these tomato lines is available at http://www.kazusa.or.jp/tomato/. PMID:21044984

  1. Weighted SNP set analysis in genome-wide association study.

    PubMed

    Dai, Hui; Zhao, Yang; Qian, Cheng; Cai, Min; Zhang, Ruyang; Chu, Minjie; Dai, Juncheng; Hu, Zhibin; Shen, Hongbing; Chen, Feng

    2013-01-01

    Genome-wide association studies (GWAS) are popular for identifying genetic variants that are associated with disease risk. Many approaches have been proposed to test multiple single nucleotide polymorphisms (SNPs) in a region simultaneously, addressing the disadvantages of single locus association analysis. Kernel machine based SNP set analysis, which borrows information from SNPs correlated with causal or tag SNPs, is more powerful than single locus analysis. Four types of kernel machine functions and a principal component based approach (PCA) were also compared. However, given the loss of power caused by low minor allele frequencies (MAF), we extended PCA and used a new method called weighted PCA (wPCA). Comparative analysis was performed for weighted principal component analysis (wPCA), logistic kernel machine based test (LKM) and principal component analysis (PCA) based on SNP sets under different minor allele frequencies (MAF) and linkage disequilibrium (LD) structures. We also applied the three methods to analyze two SNP sets extracted from a real GWAS dataset of non-small cell lung cancer in a Han Chinese population. Simulation results show that when the MAF of the causal SNP is low, weighted principal component and weighted IBS are more powerful than PCA and other kernel machine functions at different LD structures and different numbers of causal SNPs. Application of the three methods to a real GWAS dataset indicates that wPCA and wIBS have better performance than the linear kernel, IBS kernel and PCA.

  2. High throughput SNP detection system based on magnetic nanoparticles separation.

    PubMed

    Liu, Bin; Jia, Yingying; Ma, Man; Li, Zhiyang; Liu, Hongna; Li, Song; Deng, Yan; Zhang, Liming; Lu, Zhuoxuan; Wang, Wei; He, Nongyue

    2013-02-01

    Single-nucleotide polymorphisms (SNPs) are one-base variations in DNA sequence that can often be helpful for finding gene associations for hereditary disease, communicable disease and so on. We developed a high throughput SNP detection system based on magnetic nanoparticles (MNPs) separation and dual-color hybridization or single base extension. This system includes a magnetic separation unit for sample separation, three high precision robot arms for pipetting and microtiter plate transferring respectively, an accurate temperature control unit for PCR and DNA hybridization and a highly accurate and sensitive optical signal detection unit for fluorescence detection. A SNP genotyping experiment on the cyclooxygenase-2 gene promoter region--65G > C polymorphism locus was performed for 48 samples from the northern Jiangsu area to verify whether this system can simplify manual operation by researchers, save time and improve efficiency in SNP genotyping experiments. It can carry out sample preparation, target sequence amplification, signal detection and data analysis automatically and can be used in clinical molecular diagnosis, high throughput fluorescence immunological detection and so on.

  3. Software solutions for the livestock genomics SNP array revolution.

    PubMed

    Nicolazzi, E L; Biffani, S; Biscarini, F; Orozco Ter Wengel, P; Caprera, A; Nazzicari, N; Stella, A

    2015-08-01

    Since the beginning of the genomic era, the number of available single nucleotide polymorphism (SNP) arrays has grown considerably. In the bovine species alone, 11 SNP chips not completely covered by intellectual property are currently available, and the number is growing. Genomic/genotype data are not standardized, and this hampers their exchange and integration. In addition, software used for the analyses of these data usually requires non-standard (i.e. case specific) input files which, considering the large amount of data to be handled, require at least some programming skills in their production. In this work, we describe a software toolkit for SNP array data management, imputation, genome-wide association studies, population genetics and genomic selection. However, this toolkit does not solve the critical need for standardization of the genotypic data and software input files. It only highlights the chaotic situation each researcher has to face on a daily basis and gives some helpful advice on the currently available tools in order to navigate the SNP array data complexity.

  4. Genetic mapping in grapevine using a SNP microarray: intensity values

    USDA-ARS's Scientific Manuscript database

    Genotyping microarrays are widely used for genome wide association studies, but in high-diversity organisms, the quality of SNP calls can be diminished by genetic variation near the assayed nucleotide. To address this limitation in grapevine, we developed a simple heuristic that uses hybridization i...

  5. Amerindians show association to obesity with adiponectin gene SNP45 and SNP276: population genetics of a food intake control and "thrifty" gene.

    PubMed

    Arnaiz-Villena, Antonio; Fernández-Honrado, Mercedes; Rey, Diego; Enríquez-de-Salamanca, Mercedes; Abd-El-Fatah-Khalil, Sedeka; Arribas, Ignacio; Coca, Carmen; Algora, Manuel; Areces, Cristina

    2013-02-01

    Adiponectin gene polymorphisms SNP45 and SNP276 have been related to metabolic syndrome (MS) and related pathologies, including obesity. However, results of associations are contradictory depending on which population is studied. In the present study, these adiponectin SNPs are for the first time studied in Amerindians. Allele frequencies are obtained and comparisons with obesity and other MS related parameters are performed. Amerindians were also defined by characteristic HLA genes. Our main results are: (1) SNP276 T is associated with low diastolic blood pressure in Amerindians, (2) SNP45 G allele is correlated with obesity in female but not in male Amerindians, (3) SNP45/SNP276 T/G haplotype in total obese/non-obese subjects tends to show a linkage with non-obese Amerindians, (4) SNP45/SNP276 T/T haplotype is linked to obese Amerindian males. Also, a world population study is carried out, finding that SNP45 T and SNP276 T alleles are the most frequent in African Blacks and are found at significantly lower frequencies in Europeans and Asians. This, together with the fact that there is a linkage of this haplotype to obese Amerindian males, suggests that evolutionary forces related to famine (or population density in relation with available food) may have shaped world population adiponectin polymorphism frequencies.

  6. The Usage of an SNP-SNP Relationship Matrix for Best Linear Unbiased Prediction (BLUP) Analysis Using a Community-Based Cohort Study

    PubMed Central

    Lee, Young-Sup; Kim, Hyeon-Jeong; Cho, Seoae

    2014-01-01

    Best linear unbiased prediction (BLUP) has been used to estimate the fixed effects and random effects of complex traits. Traditionally, genomic relationship matrix-based (GRM) and random marker-based BLUP analyses are prevalent to estimate the genetic values of complex traits. We used three methods: GRM-based prediction (G-BLUP), random marker-based prediction using an identity matrix (so-called single-nucleotide polymorphism [SNP]-BLUP), and SNP-SNP variance-covariance matrix (so-called SNP-GBLUP). We used 35,675 SNPs and R package "rrBLUP" for the BLUP analysis. The SNP-SNP relationship matrix was calculated using the GRM and Sherman-Morrison-Woodbury lemma. The SNP-GBLUP result was very similar to G-BLUP in the prediction of genetic values. However, there were many discrepancies between SNP-BLUP and the other two BLUPs. SNP-GBLUP has the merit to be able to predict genetic values through SNP effects. PMID:25705167

  7. Comparative transcriptomic profiling of Vitis vinifera under high light using a custom-made array and the Affymetrix GeneChip.

    PubMed

    Carvalho, Luísa C; Vilela, Belmiro J; Mullineaux, Phil M; Amâncio, Sara

    2011-11-01

    Understanding abiotic stress responses is one of the most important issues in plant research nowadays. Abiotic stress, including excess light, can promote the onset of oxidative stress through the accumulation of reactive oxygen species. Oxidative stress also arises when in vitro propagated plants are exposed to high light upon transfer to ex vitro. To determine whether the underlying pathways activated at the transfer of in vitro grapevine to ex vitro conditions reflect the processes occurring upon light stress, we used Vitis vinifera Affymetrix GeneChip (VvGA) and a custom array of genes responsive to light stress (LSCA) detected by real-time reverse transcriptase PCR (qRT-PCR). When gene-expression profiles were compared, 'protein metabolism and modification', 'signaling', and 'anti-oxidative' genes were more represented in LSCA, while, in VvGA, 'cell wall metabolism' and 'secondary metabolism' were the categories in which gene expression varied more significantly. The above functional categories confirm previous studies involving other types of abiotic stresses, enhancing the common attributes of abiotic stress defense pathways. The LSCA analysis of our experimental system detected strong response of heat shock genes, particularly the protein rescuing mechanism involving the cooperation of two ATP-dependent chaperone systems, Hsp100 and Hsp70, which showed an unusually late response during the recovery period, of extreme relevance to remove non-functional, potentially harmful polypeptides arising from misfolding, denaturation, or aggregation brought about by stress. The success of LSCA also proves the feasibility of a custom-made qRT-PCR approach, particularly for species for which no GeneChip is available and for researchers dealing with a specific and focused problem.

  8. SNP selection and classification of genome-wide SNP data using stratified sampling random forests.

    PubMed

    Wu, Qingyao; Ye, Yunming; Liu, Yang; Ng, Michael K

    2012-09-01

    For high-dimensional genome-wide association (GWA) case-control data of complex disease, a large portion of single-nucleotide polymorphisms (SNPs) are usually irrelevant to the disease. A simple random sampling method in random forest, using the default mtry parameter to choose the feature subspace, will select too many subspaces without informative SNPs. An exhaustive search for an optimal mtry is often required in order to include useful and relevant SNPs and get rid of the vast number of non-informative SNPs. However, it is too time-consuming and not favorable in GWA for high-dimensional data. The main aim of this paper is to propose a stratified sampling method for feature subspace selection to generate decision trees in a random forest for GWA high-dimensional data. Our idea is to design an equal-width discretization scheme for informativeness to divide SNPs into multiple groups. In feature subspace selection, we randomly select the same number of SNPs from each group and combine them to form a subspace to generate a decision tree. The advantage of this stratified sampling procedure is that it makes sure each subspace contains enough useful SNPs, while avoiding the very high computational cost of an exhaustive search for an optimal mtry and maintaining the randomness of a random forest. We employ two genome-wide SNP data sets (Parkinson case-control data comprised of 408 803 SNPs and Alzheimer case-control data comprised of 380 157 SNPs) to demonstrate that the proposed stratified sampling method is effective, and that it can generate better random forests with higher accuracy and lower error bound than those generated by Breiman's random forest generation method. For Parkinson data, we also show some interesting genes identified by the method, which may be associated with neurological disorders for further biological investigations.
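
    The subspace selection step described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name, its parameters, and the assumption that each SNP already has a numeric informativeness score are hypothetical. SNPs are split into equal-width score bins and the same number is drawn from each non-empty bin.

```python
import random


def stratified_subspace(scores, n_groups, per_group, rng=None):
    """Pick a feature subspace by sampling equally from equal-width score bins.

    scores    -- per-SNP informativeness values (hypothetical input)
    n_groups  -- number of equal-width bins
    per_group -- SNPs drawn from each non-empty bin
    """
    rng = rng or random.Random()
    lo, hi = min(scores), max(scores)
    # Fall back to width 1.0 when every score is identical.
    width = (hi - lo) / n_groups or 1.0
    groups = [[] for _ in range(n_groups)]
    for idx, score in enumerate(scores):
        # Clamp so the maximum score lands in the last bin.
        bin_index = min(int((score - lo) / width), n_groups - 1)
        groups[bin_index].append(idx)
    subspace = []
    for members in groups:
        # Draw without replacement; small bins contribute all their members.
        subspace.extend(rng.sample(members, min(per_group, len(members))))
    return sorted(subspace)
```

    Each call yields the SNP indices for one decision tree; repeating the call per tree keeps the forest random while every subspace still includes SNPs from the high-score bins.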

  9. Large-Scale SNP Discovery through RNA Sequencing and SNP Genotyping by Targeted Enrichment Sequencing in Cassava (Manihot esculenta Crantz)

    PubMed Central

    Pootakham, Wirulda; Shearman, Jeremy R.; Ruang-areerate, Panthita; Sonthirod, Chutima; Sangsrakru, Duangjai; Jomchai, Nukoon; Yoocha, Thippawan; Triwitayakorn, Kanokporn; Tragoonrung, Somvong; Tangphatsornruang, Sithichoke

    2014-01-01

    Cassava (Manihot esculenta Crantz) is one of the most important crop species being the main source of dietary energy in several countries. Marker-assisted selection has become an essential tool in plant breeding. Single nucleotide polymorphism (SNP) discovery via transcriptome sequencing is an attractive strategy for genome complexity reduction in organisms with large genomes. We sequenced the transcriptome of 16 cassava accessions using the Illumina HiSeq platform and identified 675,559 EST-derived SNP markers. A subset of those markers was subsequently genotyped by capture-based targeted enrichment sequencing in 100 F1 progeny segregating for starch viscosity phenotypes. A total of 2,110 non-redundant SNP markers were used to construct a genetic map. This map encompasses 1,785 cM and consists of 19 linkage groups. A major quantitative trait locus (QTL) controlling starch pasting properties was identified and shown to coincide with the QTL previously reported for this trait. With a high-density SNP-based linkage map presented here, we also uncovered a novel QTL associated with starch pasting time on LG 10. PMID:25551642

  10. High throughput SNP discovery and validation in the pig: towards the development of a high density swine SNP chip

    USDA-ARS's Scientific Manuscript database

    Recent developments in sequencing technology have allowed the generation of millions of short read sequences in a fast and inexpensive way. This enables the cost effective large scale identification of hundreds of thousands of SNPs needed for the development of high density SNP arrays. Currently, a ...

  11. Large-scale SNP discovery through RNA sequencing and SNP genotyping by targeted enrichment sequencing in cassava (Manihot esculenta Crantz).

    PubMed

    Pootakham, Wirulda; Shearman, Jeremy R; Ruang-Areerate, Panthita; Sonthirod, Chutima; Sangsrakru, Duangjai; Jomchai, Nukoon; Yoocha, Thippawan; Triwitayakorn, Kanokporn; Tragoonrung, Somvong; Tangphatsornruang, Sithichoke

    2014-01-01

    Cassava (Manihot esculenta Crantz) is one of the most important crop species being the main source of dietary energy in several countries. Marker-assisted selection has become an essential tool in plant breeding. Single nucleotide polymorphism (SNP) discovery via transcriptome sequencing is an attractive strategy for genome complexity reduction in organisms with large genomes. We sequenced the transcriptome of 16 cassava accessions using the Illumina HiSeq platform and identified 675,559 EST-derived SNP markers. A subset of those markers was subsequently genotyped by capture-based targeted enrichment sequencing in 100 F1 progeny segregating for starch viscosity phenotypes. A total of 2,110 non-redundant SNP markers were used to construct a genetic map. This map encompasses 1,785 cM and consists of 19 linkage groups. A major quantitative trait locus (QTL) controlling starch pasting properties was identified and shown to coincide with the QTL previously reported for this trait. With a high-density SNP-based linkage map presented here, we also uncovered a novel QTL associated with starch pasting time on LG 10.

  12. High-throughput SNP genotyping for breeding applications in rice using the BeadXpress platform

    USDA-ARS's Scientific Manuscript database

    Multiplexed single nucleotide polymorphism (SNP) markers have the potential to increase the speed and cost-effectiveness of genotyping, provided that an optimal SNP density is used for each application. To test the efficiency of multiplexed SNP genotyping for diversity, mapping and breeding applicat...

  13. Development of Single Nucleotide Polymorphism (SNP) Markers for Use in Commercial Maize (Zea Mays L.) Germplasm

    USDA-ARS's Scientific Manuscript database

    The development of single nucleotide polymorphism (SNP) markers in maize offer the opportunity to utilize DNA markers in many new areas of population genetics, gene discovery, plant breeding, and germplasm identification. However, the steps from sequencing and SNP discovery to SNP marker design and ...

  14. SNP genotyping using single-tube fluorescent bidirectional PCR.

    PubMed

    Waterfall, Christy M; Cobb, Benjamin D

    2002-07-01

    SNP genotyping is a well-populated field with a large number of assay formats offering accurate allelic discrimination. However, there remains a discord between the ultimate goal of rapid, inexpensive assays that do not require complex design considerations and involved optimization strategies. We describe the first integration of bidirectional allele-specific amplification, SYBR Green I, and rapid-cycle PCR to provide a homogeneous SNP-typing assay. Wild-type, mutant, and heterozygous alleles were easily discriminated in a single tube using melt curve profiling of PCR products alone. We demonstrate the effectiveness and reliability of this assay with a blinded trial using clinical samples from individuals with sickle cell anemia, sickle cell trait, or unaffected individuals. The tests were completed in less than 30 min without expensive fluorogenic probes, prohibitive design rules, or lengthy downstream processing for product analysis.

  15. SNP typing on the NanoChip electronic microarray.

    PubMed

    Børsting, Claus; Sanchez, Juan J; Morling, Niels

    2005-01-01

    We describe a single nucleotide polymorphism (SNP) typing protocol developed for the NanoChip electronic microarray. The NanoChip array consists of 100 electrodes covered by a thin hydrogel layer containing streptavidin. An electric current can be applied to one, several, or all electrodes at the same time according to a loading protocol generated by the user. Biotinylated deoxyribonucleic acid (DNA) is directed to the pad(s) via the electronic field(s) and bound to streptavidin in the hydrogel layer. Subsequently, fluorescently labeled reporter oligos and a stabilizer oligo are hybridized to the bound DNA. Base stacking between the short reporter and the longer stabilizer oligo stabilizes the binding of a matching reporter, whereas the binding of a reporter carrying a mismatch in the SNP position will be relatively weak. Thermal stringency is applied to the NanoChip array according to a reader protocol generated by the user and the fluorescent label on the matching reporter is detected.

  16. Pyrobayes: an improved base caller for SNP discovery in pyrosequences.

    PubMed

    Quinlan, Aaron R; Stewart, Donald A; Strömberg, Michael P; Marth, Gábor T

    2008-02-01

    Previously reported applications of the 454 Life Sciences pyrosequencing technology have relied on deep sequence coverage for accurate polymorphism discovery because of frequent insertion and deletion sequence errors. Here we report a new base calling program, Pyrobayes, for pyrosequencing reads. Pyrobayes permits accurate single-nucleotide polymorphism (SNP) calling in resequencing applications, even in shallow read coverage, primarily because it produces more confident base calls than the native base calling program.

  17. Introgression browser: high-throughput whole-genome SNP visualization.

    PubMed

    Aflitos, Saulo Alves; Sanchez-Perez, Gabino; de Ridder, Dick; Fransz, Paul; Schranz, Michael E; de Jong, Hans; Peters, Sander A

    2015-04-01

    Breeding by introgressive hybridization is a pivotal strategy to broaden the genetic basis of crops. Usually, the desired traits are monitored in consecutive crossing generations by marker-assisted selection, but their analyses fail in chromosome regions where crossover recombinants are rare or not viable. Here, we present the Introgression Browser (iBrowser), a bioinformatics tool aimed at visualizing introgressions at nucleotide or SNP (Single Nucleotide Polymorphisms) accuracy. The software selects homozygous SNPs from Variant Call Format (VCF) information and filters out heterozygous SNPs, multi-nucleotide polymorphisms (MNPs) and insertion-deletions (InDels). For data analysis iBrowser makes use of sliding windows, but if needed it can generate any desired fragmentation pattern through General Feature Format (GFF) information. In an example of tomato (Solanum lycopersicum) accessions we visualize SNP patterns and elucidate both position and boundaries of the introgressions. We also show that our tool is capable of identifying alien DNA in a panel of the closely related S. pimpinellifolium by examining phylogenetic relationships of the introgressed segments in tomato. In a third example, we demonstrate the power of the iBrowser in a panel of 597 Arabidopsis accessions, detecting the boundaries of a SNP-free region around a polymorphic 1.17 Mbp inverted segment on the short arm of chromosome 4. The architecture and functionality of iBrowser makes the software appropriate for a broad set of analyses including SNP mining, genome structure analysis, and pedigree analysis. Its functionality, together with the capability to process large data sets and efficient visualization of sequence variation, makes iBrowser a valuable breeding tool. © 2015 The Authors The Plant Journal © 2015 John Wiley & Sons Ltd.
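
    The filtering step described in this abstract (keep homozygous SNPs, drop heterozygous SNPs, MNPs and InDels) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not iBrowser's code: the function name is hypothetical, only the first sample column is read, and "homozygous" is taken to mean a 1/1 or 1|1 genotype at a biallelic, single-base site.

```python
def keep_homozygous_snps(vcf_lines):
    """Yield (chrom, pos, ref, alt) for homozygous single-base variants.

    Reads tab-separated VCF text lines and inspects the first sample only.
    """
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # no FORMAT/sample columns to inspect
        chrom, pos, _id, ref, alt = fields[:5]
        # MNPs, InDels and multi-allelic sites have a REF or ALT
        # that is not exactly one nucleotide.
        if len(ref) != 1 or len(alt) != 1:
            continue
        if ref not in "ACGT" or alt not in "ACGT":
            continue
        fmt = fields[8].split(":")
        if "GT" not in fmt:
            continue
        genotype = fields[9].split(":")[fmt.index("GT")].replace("|", "/")
        alleles = genotype.split("/")
        # Keep homozygous alternate calls; 0/1 and 0/0 are dropped.
        if len(alleles) == 2 and alleles[0] == alleles[1] == "1":
            yield chrom, int(pos), ref, alt
```

    The function is a generator, so a large VCF file can be streamed line by line, for example by passing an open file handle.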

  18. Gene-based SNP discovery and genetic mapping in pea.

    PubMed

    Sindhu, Anoop; Ramsay, Larissa; Sanderson, Lacey-Anne; Stonehouse, Robert; Li, Rong; Condie, Janet; Shunmugam, Arun S K; Liu, Yong; Jha, Ambuj B; Diapari, Marwan; Burstin, Judith; Aubert, Gregoire; Tar'an, Bunyamin; Bett, Kirstin E; Warkentin, Thomas D; Sharpe, Andrew G

    2014-10-01

    Gene-based SNPs were identified and mapped in pea using five recombinant inbred line populations segregating for traits of agronomic importance. Pea (Pisum sativum L.) is one of the world's oldest domesticated crops and has been a model system in plant biology and genetics since the work of Gregor Mendel. Pea is the second most widely grown pulse crop in the world following common bean. The importance of pea as a food crop is growing due to its combination of moderate protein concentration, slowly digestible starch, high dietary fiber concentration, and its richness in micronutrients; however, pea has lagged behind other major crops in harnessing recent advances in molecular biology, genomics and bioinformatics, partly due to its large genome size with a large proportion of repetitive sequence, and to the relatively limited investment in research in this crop globally. The objective of this research was the development of a genome-wide transcriptome-based pea single-nucleotide polymorphism (SNP) marker platform using next-generation sequencing technology. A total of 1,536 polymorphic SNP loci selected from over 20,000 non-redundant SNPs identified using deep transcriptome sequencing of eight diverse Pisum accessions were used for genotyping in five RIL populations using an Illumina GoldenGate assay. The first high-density pea SNP map defining all seven linkage groups was generated by integrating with previously published anchor markers. Syntenic relationships of this map with the model legume Medicago truncatula and lentil (Lens culinaris Medik.) maps were established. The genic SNP map establishes a foundation for future molecular breeding efforts by enabling both the identification and tracking of introgression of genomic regions harbouring QTLs related to agronomic and seed quality traits.

  19. Multi-SNP Haplotype Analysis Methods for Association Analysis.

    PubMed

    Stram, Daniel O

    2017-01-01

    Haplotype analysis forms the basis of much of genetic association analysis using both related and unrelated individuals (we concentrate on unrelated). For example, haplotype analysis indirectly underlies the SNP imputation methods that are used for testing trait associations with known but unmeasured variants and for performing collaborative post-GWAS meta-analysis. This chapter is focused on the direct use of haplotypes in association testing. It reviews the rationale for haplotype-based association testing, discusses statistical issues related to haplotype uncertainty that affect the analysis, then gives practical guidance for testing haplotype-based associations with phenotype or outcome trait, first of candidate gene regions and then for the genome as a whole. Haplotypes are interesting for two reasons, first they may be in closer LD with a causal variant than any single measured SNP, and therefore may enhance the coverage value of the genotypes over single SNP analysis. Second, haplotypes may themselves be the causal variants of interest and some solid examples of this have appeared in the literature. This chapter discusses three possible approaches to incorporation of SNP haplotype analysis into generalized linear regression models: (1) a simple substitution method involving imputed haplotypes, (2) simultaneous maximum likelihood (ML) estimation of all parameters, including haplotype frequencies and regression parameters, and (3) a simplified approximation to full ML for case-control data. Examples of the various approaches for a haplotype analysis of a candidate gene are provided. We compare the behavior of the approximation-based methods and argue that in most instances the simpler methods hold up well in practice. We also describe the practical implementation of haplotype risk estimation genome-wide and discuss several shortcuts that can be used to speed up otherwise potentially very intensive computational requirements.

  20. Universal SNP genotyping assay with fluorescence polarization detection.

    PubMed

    Hsu, T M; Chen, X; Duan, S; Miller, R D; Kwok, P Y

    2001-09-01

    The degree of fluorescence polarization (FP) of a fluorescent molecule is a reflection of its molecular weight (Mr). FP is therefore a useful detection method for homogeneous assays in which the starting reagents and products differ significantly in Mr. We have previously shown that FP is a good detection method for the single-base extension and the 5'-nuclease assays. In this report, we describe a universal, optimized single-base extension assay for genotyping single nucleotide polymorphisms (SNPs). This assay, which we named the template-directed dye-terminator incorporation assay with fluorescence polarization detection (FP-TDI), uses four spectrally distinct dye terminators to achieve universal assay conditions. Even without optimization, approximately 70% of all SNP markers tested yielded robust assays. The addition of an E. coli ssDNA-binding protein just before the FP reading significantly increased FP values of the products and brought the success rate of FP-TDI assays up to 90%. Increasing the amount of dye terminators and reducing the number of thermal cycles in the single-base extension step of the assay increased the separation of the FP values between the products corresponding to different genotypes and improved the success rate of the assay to 100%. In this study the genomic DNA samples of 90 individuals were typed for a total of 38 FP-TDI assays (using both the sense and antisense TDI primers for 19 SNP markers). With the previously described modifications, the FP-TDI assay gave unambiguous genotyping data for all the samples tested in the 38 FP-TDI assays. When the genotypes determined by the FP-TDI and 5'-nuclease assays were compared, they were in 100% concordance for all experiments (a total of 3420 genotypes). The four-dye-terminator master mixture described here can be used for assaying any SNP marker and greatly simplifies the SNP genotyping assay design.

  1. BEAT: Bioinformatics Exon Array Tool to store, analyze and visualize Affymetrix GeneChip Human Exon Array data from disease experiments

    PubMed Central

    2012-01-01

    Background It is known from recent studies that more than 90% of human multi-exon genes are subject to Alternative Splicing (AS), a key molecular mechanism in which multiple transcripts may be generated from a single gene. It is widely recognized that a breakdown in AS mechanisms plays an important role in cellular differentiation and pathologies. Polymerase Chain Reactions, microarrays and sequencing technologies have been applied to the study of transcript diversity arising from alternative expression. Last generation Affymetrix GeneChip Human Exon 1.0 ST Arrays offer a more detailed view of the gene expression profile providing information on the AS patterns. The exon array technology, with more than five million data points, can detect approximately one million exons, and it allows performing analyses at both gene and exon level. In this paper we describe BEAT, an integrated user-friendly bioinformatics framework to store, analyze and visualize exon arrays datasets. It combines a data warehouse approach with some rigorous statistical methods for assessing the AS of genes involved in diseases. Meta statistics are proposed as a novel approach to explore the analysis results. BEAT is available at http://beat.ba.itb.cnr.it. Results BEAT is a web tool which allows uploading and analyzing exon array datasets using standard statistical methods and an easy-to-use graphical web front-end. BEAT has been tested on a dataset with 173 samples and tuned using new datasets of exon array experiments from 28 colorectal cancer and 26 renal cell cancer samples produced at the Medical Genetics Unit of IRCCS Casa Sollievo della Sofferenza. To highlight all possible AS events, alternative names, accession Ids, Gene Ontology terms and biochemical pathways annotations are integrated with exon and gene level expression plots. The user can customize the results choosing custom thresholds for the statistical parameters and exploiting the available clinical data of the samples for a

  2. Development and application of a 6.5 million feature Affymetrix Genechip® for massively parallel discovery of single position polymorphisms in lettuce (Lactuca spp.)

    PubMed Central

    2012-01-01

    Background High-resolution genetic maps are needed in many crops to help characterize the genetic diversity that determines agriculturally important traits. Hybridization to microarrays to detect single feature polymorphisms is a powerful technique for marker discovery and genotyping because of its highly parallel nature. However, microarrays designed for gene expression analysis rarely provide sufficient gene coverage for optimal detection of nucleotide polymorphisms, which limits utility in species with low rates of polymorphism such as lettuce (Lactuca sativa). Results We developed a 6.5 million feature Affymetrix GeneChip® for efficient polymorphism discovery and genotyping, as well as for analysis of gene expression in lettuce. Probes on the microarray were designed from 26,809 unigenes from cultivated lettuce and an additional 8,819 unigenes from four related species (L. serriola, L. saligna, L. virosa and L. perennis). Where possible, probes were tiled with a 2 bp stagger, alternating on each DNA strand; providing an average of 187 probes covering approximately 600 bp for each of over 35,000 unigenes; resulting in up to 13 fold redundancy in coverage per nucleotide. We developed protocols for hybridization of genomic DNA to the GeneChip® and refined custom algorithms that utilized coverage from multiple, high quality probes to detect single position polymorphisms in 2 bp sliding windows across each unigene. This allowed us to detect greater than 18,000 polymorphisms between the parental lines of our core mapping population, as well as numerous polymorphisms between cultivated lettuce and wild species in the lettuce genepool. Using marker data from our diversity panel comprised of 52 accessions from the five species listed above, we were able to separate accessions by species using both phylogenetic and principal component analyses. Additionally, we estimated the diversity between different types of cultivated lettuce and distinguished morphological types

  3. Assessment of the relationship between pre-chip and post-chip quality measures for Affymetrix GeneChip expression data

    PubMed Central

    Jones, Lesley; Goldstein, Darlene R; Hughes, Gareth; Strand, Andrew D; Collin, Francois; Dunnett, Stephen B; Kooperberg, Charles; Aragaki, Aaron; Olson, James M; Augood, Sarah J; Faull, Richard LM; Luthi-Carter, Ruth; Moskvina, Valentina; Hodges, Angela K

    2006-01-01

    Background Gene expression microarray experiments are expensive to conduct and guidelines for acceptable quality control at intermediate steps before and after the samples are hybridised to chips are vague. We conducted an experiment hybridising RNA from human brain to 117 U133A Affymetrix GeneChips and used these data to explore the relationship between 4 pre-chip variables and 22 post-chip outcomes and quality control measures. Results We found that the pre-chip variables were significantly correlated with each other but that this correlation was strongest between measures of RNA quality and cRNA yield. Post-mortem interval was negatively correlated with these variables. Four principal components, reflecting array outliers, array adjustment, hybridisation noise and RNA integrity, explain about 75% of the total post-chip measure variability. Two significant canonical correlations existed between the pre-chip and post-chip variables, derived from MAS 5.0, dChip and the Bioconductor packages affy and affyPLM. The strongest (CANCOR 0.838, p < 0.0001) correlated RNA integrity and yield with post chip quality control (QC) measures indexing 3'/5' RNA ratios, bias or scaling of the chip and scaling of the variability of the signal across the chip. Post-mortem interval was relatively unimportant. We also found that the RNA integrity number (RIN) could be moderately well predicted by post-chip measures B_ACTIN35, GAPDH35 and SF. Conclusion We have found that the post-chip variables having the strongest association with quantities measurable before hybridisation are those reflecting RNA integrity. Other aspects of quality, such as noise measures (reflecting the execution of the assay) or measures reflecting data quality (outlier status and array adjustment variables) are not well predicted by the variables we were able to determine ahead of time. There could be other variables measurable pre-hybridisation which may be better associated with expression data quality measures

  4. Development of a forensic identity SNP panel for Indonesia.

    PubMed

    Augustinus, Daniel; Gahan, Michelle E; McNevin, Dennis

    2015-07-01

    Genetic markers included in forensic identity panels must exhibit Hardy-Weinberg and linkage equilibrium (HWE and LE). "Universal" panels designed for global use can fail these tests in regional jurisdictions exhibiting high levels of genetic differentiation such as the Indonesian archipelago. This is especially the case where a single DNA database is required for allele frequency estimates to calculate random match probabilities (RMPs) and associated likelihood ratios (LRs). A panel of 65 single nucleotide polymorphisms (SNPs) and a reduced set of 52 SNPs have been selected from 15 Indonesian subpopulations in the HUGO Pan Asian SNP database using a SNP selection strategy that could be applied to any panel of forensic identity markers. The strategy consists of four screening steps: (1) application of a G test for HWE; (2) ranking for high heterozygosity; (3) selection for LE; and (4) selection for low inbreeding depression. SNPs in our Indonesian panel perform well in comparison to some other universal SNP and short tandem repeat (STR) panels as measured by Fisher's exact test for HWE and LE and Wright's F statistics.
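
    The first screening step, a G test for HWE, can be sketched for a single biallelic SNP as follows. This is a minimal illustration rather than the authors' procedure: the function name and the genotype-count interface are hypothetical, and the p-value assumes a chi-square reference distribution with one degree of freedom.

```python
import math


def hwe_g_test(n_aa, n_ab, n_bb):
    """Return (G, p_value) for a G test of Hardy-Weinberg equilibrium.

    Arguments are observed genotype counts for a biallelic SNP;
    at least one count must be positive.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # G = 2 * sum(O * ln(O / E)); empty classes contribute zero.
    g = 2.0 * sum(o * math.log(o / e)
                  for o, e in zip(observed, expected) if o > 0)
    g = max(g, 0.0)  # guard against tiny negative rounding error
    # Chi-square survival function for one degree of freedom.
    p_value = math.erfc(math.sqrt(g / 2.0))
    return g, p_value
```

    A SNP would pass this screen when the p-value exceeds the chosen significance threshold; the threshold itself is not given in the abstract.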

  5. Development of SNP-genotyping arrays in two shellfish species.

    PubMed

    Lapègue, S; Harrang, E; Heurtebise, S; Flahauw, E; Donnadieu, C; Gayral, P; Ballenghien, M; Genestout, L; Barbotte, L; Mahla, R; Haffray, P; Klopp, C

    2014-07-01

    Use of SNPs has been favoured due to their abundance in plant and animal genomes, accompanied by the falling cost and rising throughput capacity for detection and genotyping. Here, we present in vitro (obtained from targeted sequencing) and in silico discovery of SNPs, and the design of medium-throughput genotyping arrays for two oyster species, the Pacific oyster, Crassostrea gigas, and European flat oyster, Ostrea edulis. Two sets of 384 SNP markers were designed for two Illumina GoldenGate arrays and genotyped on more than 1000 samples for each species. In each case, oyster samples were obtained from wild and selected populations and from three-generation families segregating for traits of interest in aquaculture. The rate of successfully genotyped polymorphic SNPs was about 60% for each species. Effects of SNP origin and quality on genotyping success (Illumina functionality Score) were analysed and compared with other model and nonmodel species. Furthermore, a simulation was made based on a subset of the C. gigas SNP array with a minor allele frequency of 0.3 and typical crosses used in shellfish hatcheries. This simulation indicated that at least 150 markers were needed to perform an accurate parental assignment. Such panels might provide valuable tools to improve our understanding of the connectivity between wild (and selected) populations and could contribute to future selective breeding programmes.

  6. Rapid SNP Discovery and Genetic Mapping Using Sequenced RAD Markers

    PubMed Central

    Atwood, Tressa S.; Currey, Mark C.; Shiver, Anthony L.; Lewis, Zachary A.; Selker, Eric U.; Cresko, William A.; Johnson, Eric A.

    2008-01-01

    Single nucleotide polymorphism (SNP) discovery and genotyping are essential to genetic mapping. There remains a need for a simple, inexpensive platform that allows high-density SNP discovery and genotyping in large populations. Here we describe the sequencing of restriction-site associated DNA (RAD) tags, which identified more than 13,000 SNPs, and mapped three traits in two model organisms, using less than half the capacity of one Illumina sequencing run. We demonstrated that different marker densities can be attained by choice of restriction enzyme. Furthermore, we developed a barcoding system for sample multiplexing and fine mapped the genetic basis of lateral plate armor loss in threespine stickleback by identifying recombinant breakpoints in F2 individuals. Barcoding also facilitated mapping of a second trait, a reduction of pelvic structure, by in silico re-sorting of individuals. To further demonstrate the ease of the RAD sequencing approach we identified polymorphic markers and mapped an induced mutation in Neurospora crassa. Sequencing of RAD markers is an integrated platform for SNP discovery and genotyping. This approach should be widely applicable to genetic mapping in a variety of organisms. PMID:18852878
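
    The barcoding idea for sample multiplexing can be illustrated with a short Python sketch that sorts pooled reads back to individuals in silico. It is a simplified assumption-laden example, not the published pipeline: the function name, the barcode table, and the exact-prefix matching rule are hypothetical.

```python
from collections import defaultdict


def demultiplex(reads, barcodes):
    """Group reads by their leading barcode.

    reads    -- iterable of read sequences (strings)
    barcodes -- dict mapping barcode sequence to sample name
    Reads without an exact barcode prefix are collected under None.
    """
    by_sample = defaultdict(list)
    for read in reads:
        for code, sample in barcodes.items():
            if read.startswith(code):
                # Store the read with its barcode trimmed off.
                by_sample[sample].append(read[len(code):])
                break
        else:
            by_sample[None].append(read)
    return dict(by_sample)
```

    Because the grouping happens after sequencing, the same pooled reads can be re-sorted by a different phenotype simply by changing the sample names in the barcode table.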

  7. Multiple SNP-sets Analysis for Genome-wide Association Studies through Bayesian Latent Variable Selection

    PubMed Central

    Lu, Zhaohua; Zhu, Hongtu; Knickmeyer, Rebecca C; Sullivan, Patrick F.; Stephanie, Williams N.; Zou, Fei

    2015-01-01

    The power of genome-wide association studies (GWAS) for mapping complex traits with single SNP analysis may be undermined by modest SNP effect sizes, unobserved causal SNPs, correlation among adjacent SNPs, and SNP-SNP interactions. Alternative approaches for testing the association between a single SNP-set and individual phenotypes have been shown to be promising for improving the power of GWAS. We propose a Bayesian latent variable selection (BLVS) method to simultaneously model the joint association mapping between a large number of SNP-sets and complex traits. Compared to single SNP-set analysis, such joint association mapping not only accounts for the correlation among SNP-sets, but also is capable of detecting causal SNP-sets that are marginally uncorrelated with traits. The spike-slab prior assigned to the effects of SNP-sets can greatly reduce the dimension of effective SNP-sets, while speeding up computation. An efficient MCMC algorithm is developed. Simulations demonstrate that BLVS outperforms several competing variable selection methods in some important scenarios. PMID:26515609

  8. CACNA1C SNP rs1006737 associates with bipolar I disorder independent of the Bcl-2 SNP rs956572 variant and its associated effect on intracellular calcium homeostasis.

    PubMed

    Uemura, Takuji; Green, Marty; Warsh, Jerry J

    2016-10-01

    Intracellular calcium (Ca(2+)) dyshomeostasis (ICDH) has been implicated in bipolar disorder (BD) pathophysiology. We previously showed that SNP rs956572 in the B-cell CLL/lymphoma 2 (Bcl-2) gene associates with elevated B lymphoblast (BLCL) intracellular Ca(2+) concentrations ([Ca(2+)]B) differentially in BD-I. Genome-wide association studies strongly support the association between BD and the SNP rs1006737, located within the L-type voltage-dependent Ca(2+) channel α1C subunit gene (CACNA1C). Here we investigated whether this CACNA1C variant also associates with ICDH and interacts with SNP rs956572 on [Ca(2+)]B in BD-I. CACNA1C SNP rs1006737 was genotyped in 150 BD-I, 65 BD-II, 30 major depressive disorder patients, and 70 healthy subjects with available BLCL [Ca(2+)]B and Bcl-2 SNP rs956572 genotype measures. SNP rs1006737 was significantly associated with BD-I. The [Ca(2+)]B was significantly higher in BD-I rs1006737 A compared with healthy A allele carriers and also in healthy GG compared with A allele carriers. There was no significant interaction between SNP rs1006737 and SNP rs956572 on [Ca(2+)]B. Our study further supports the association of SNP rs1006737 with BD-I and suggests that CACNA1C SNP rs1006737 and Bcl-2 SNP rs956572, or specific causal variants in LD with these proxies, act independently to increase risk and ICDH in BD-I.

  9. Population distribution and ancestry of the cancer protective MDM2 SNP285 (rs117039649)

    PubMed Central

    Knappskog, Stian; Gansmo, Liv B.; Dibirova, Khadizha; Metspalu, Andres; Cybulski, Cezary; Peterlongo, Paolo; Aaltonen, Lauri; Vatten, Lars; Romundstad, Pål; Hveem, Kristian; Devilee, Peter; Evans, Gareth D.; Lin, Dongxin; Camp, Guy Van; Manolopoulos, Vangelis G.; Osorio, Ana; Milani, Lili; Ozcelik, Tayfun; Zalloua, Pierre; Mouzaya, Francis; Bliznetz, Elena; Balanovska, Elena; Pocheshkova, Elvira; Kučinskas, Vaidutis; Atramentova, Lubov; Nymadawa, Pagbajabyn; Titov, Konstantin; Lavryashina, Maria; Yusupov, Yuldash; Bogdanova, Natalia; Koshel, Sergey; Zamora, Jorge; Wedge, David C.; Charlesworth, Deborah; Dörk, Thilo; Balanovsky, Oleg; Lønning, Per E.

    2014-01-01

    The MDM2 promoter SNP285C is located on the SNP309G allele. While SNP309G enhances Sp1 transcription factor binding and MDM2 transcription, SNP285C antagonizes Sp1 binding and reduces the risk of breast, ovarian and endometrial cancer. Assessing SNP285 and 309 genotypes across 25 different ethnic populations (>10,000 individuals), the incidence of SNP285C was 6-8% across European populations except for Finns (1.2%) and Saami (0.3%). The incidence decreased towards the Middle-East and Eastern Russia, and SNP285C was absent among Han Chinese, Mongolians and African Americans. Interhaplotype variation analyses estimated SNP285C to have originated about 14,700 years ago (95% CI: 8,300 – 33,300). Both this estimate and the geographical distribution suggest SNP285C to have arisen after the separation between Caucasians and modern day East Asians (17,000 - 40,000 years ago). We observed a strong inverse correlation (r = -0.805; p < 0.001) between the percentage of SNP309G alleles harboring SNP285C and the MAF for SNP309G itself across different populations suggesting selection and environmental adaptation with respect to MDM2 expression in recent human evolution. In conclusion, we found SNP285C to be a pan-Caucasian variant. Ethnic variation regarding distribution of SNP285C needs to be taken into account when assessing the impact of MDM2 SNPs on cancer risk. PMID:25327560

  10. Forensic SNP genotyping with SNaPshot: Technical considerations for the development and optimization of multiplexed SNP assays.

    PubMed

    Fondevila, M; Børsting, C; Phillips, C; de la Puente, M; Consortium, Euroforen-NoE; Carracedo, A; Morling, N; Lareu, M V

    2017-01-01

    This review explores the key factors that influence the optimization, routine use, and profile interpretation of the SNaPshot single-base extension (SBE) system applied to forensic single-nucleotide polymorphism (SNP) genotyping. Despite being a mainly complementary DNA genotyping technique to routine STR profiling, use of SNaPshot is an important part of the development of SNP sets for a wide range of forensic applications with these markers, from genotyping highly degraded DNA with very short amplicons to the introduction of SNPs to ascertain the ancestry and physical characteristics of an unidentified contact trace donor. However, this technology, as resourceful as it is, displays several features that depart from the usual STR genotyping far enough to demand a certain degree of expertise from the forensic analyst before tackling the complex casework on which SNaPshot application provides an advantage. In order to provide the basis for developing such expertise, we cover in this paper the most challenging aspects of the SNaPshot technology, focusing on the steps taken to design primer sets, optimize the PCR and single-base extension chemistries, and the important features of the peak patterns observed in typical forensic SNP profiles using SNaPshot. With that purpose in mind, we provide guidelines and troubleshooting for multiplex-SNaPshot-oriented primer design and the resulting capillary electrophoresis (CE) profile interpretation (covering the most commonly observed artifacts and expected departures from the ideal conditions).

  11. Design And Performance Of 44,100 SNP Genotyping Array For Rice

    USDA-ARS's Scientific Manuscript database

    To document genome-wide allelic variation within and between the different subpopulations of both O. sativa and O. rufipogon, we developed an Affymetrix custom genotyping array containing 44,100 SNPs well distributed across the 400Mb rice genome. The SNPs on this array were selected from the MBML-in...

  12. Exploration of SNP variants affecting hair colour prediction in Europeans.

    PubMed

    Söchtig, Jens; Phillips, Chris; Maroñas, Olalla; Gómez-Tato, Antonio; Cruz, Raquel; Alvarez-Dios, Jose; de Cal, María-Ángeles Casares; Ruiz, Yarimar; Reich, Kristian; Fondevila, Manuel; Carracedo, Ángel; Lareu, María V

    2015-09-01

    DNA profiling is a key tool for forensic analysis; however, current methods identify a suspect either by direct comparison or from DNA database searches. In cases with unidentified suspects, prediction of visible physical traits e.g. pigmentation or hair distribution of the DNA donors can provide important probative information. This study aimed to explore single nucleotide polymorphism (SNP) variants for their effect on hair colour prediction. A discovery panel of 63 SNPs consisting of already established hair colour markers from the HIrisPlex hair colour phenotyping assay as well as additional markers for which associations to human pigmentation traits were previously identified was used to develop multiplex assays based on SNaPshot single-base extension technology. A genotyping study was performed on a range of European populations (n = 605). Hair colour phenotyping was accomplished by matching each donor's hair to a graded colour category system of reference shades and photography. Since multiple SNPs in combination contribute in varying degrees to hair colour predictability in Europeans, we aimed to compile a compact marker set that could provide a reliable hair colour inference from the fewest SNPs. The predictive approach developed uses a naïve Bayes classifier to provide hair colour assignment probabilities for the SNP profiles of the key SNPs and was embedded into the Snipper online SNP classifier ( http://mathgene.usc.es/snipper/ ). Results indicate that red, blond, brown and black hair colours are predictable with informative probabilities in a high proportion of cases. Our study identified the 12 SNPs, in six genes, that are most strongly associated with hair pigmentation variation.
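
    The naïve Bayes step described in this abstract can be illustrated with a minimal sketch. It is not the Snipper implementation: the genotype coding (AA/AB/BB), the add-one smoothing, the function names and the four training profiles are all assumptions made up for illustration.

```python
from collections import Counter, defaultdict
import math

def train(profiles, labels, genotypes=("AA", "AB", "BB")):
    """Count class frequencies and per-SNP genotype frequencies per class."""
    class_counts = Counter(labels)
    # counts[label][snp_index][genotype] -> number of training profiles
    counts = defaultdict(lambda: defaultdict(Counter))
    for profile, label in zip(profiles, labels):
        for i, g in enumerate(profile):
            counts[label][i][g] += 1
    return class_counts, counts, genotypes

def predict_proba(model, profile):
    """Return posterior class probabilities for one SNP profile."""
    class_counts, counts, genotypes = model
    total = sum(class_counts.values())
    log_post = {}
    for label, n in class_counts.items():
        lp = math.log(n / total)  # class prior
        for i, g in enumerate(profile):
            # Add-one (Laplace) smoothing avoids zero probabilities (assumption)
            lp += math.log((counts[label][i][g] + 1) / (n + len(genotypes)))
        log_post[label] = lp
    # Normalise in log space for numerical stability
    m = max(log_post.values())
    weights = {k: math.exp(v - m) for k, v in log_post.items()}
    z = sum(weights.values())
    return {k: w / z for k, w in weights.items()}

# Made-up toy data (two SNPs, two hair colour classes), for illustration only
profiles = [("AA", "AB"), ("AA", "BB"), ("BB", "AB"), ("BB", "AA")]
labels = ["red", "red", "black", "black"]
model = train(profiles, labels)
probs = predict_proba(model, ("AA", "BB"))
print(probs)
```

    Each SNP contributes an independent likelihood term, which is the "naïve" assumption; a real marker set would need far more training profiles per class than this toy example.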

  13. Computational tradeoffs in multiplex PCR assay design for SNP genotyping

    PubMed Central

    Rachlin, John; Ding, Chunming; Cantor, Charles; Kasif, Simon

    2005-01-01

    Background Multiplex PCR is a key technology for detecting infectious microorganisms, whole-genome sequencing, forensic analysis, and for enabling flexible yet low-cost genotyping. However, the design of multiplex PCR assays requires the consideration of multiple competing objectives and physical constraints, and extensive computational analysis must be performed in order to identify the possible formation of primer-dimers that can negatively impact product yield. Results This paper examines the computational design limits of multiplex PCR in the context of SNP genotyping and examines tradeoffs associated with several key design factors including multiplexing level (the number of primer pairs per tube), coverage (the % of SNPs whose associated primers are actually assigned to one of several available tubes), and tube-size uniformity. We also examine how design performance depends on the total number of available SNPs from which to choose, and primer stringency criteria. We show that finding high-multiplexing/high-coverage designs is subject to a computational phase transition, becoming dramatically more difficult when the probability of primer pair interaction exceeds a critical threshold. The precise location of this critical transition point depends on the number of available SNPs and the level of multiplexing required. We also demonstrate how coverage performance is impacted by the number of available SNPs, primer selection criteria, and target multiplexing levels. Conclusion The presence of a phase transition suggests limits to scaling Multiplex PCR performance for high-throughput genomics applications. Achieving broad SNP coverage rapidly transitions from being very easy to very hard as the target multiplexing level (# of primer pairs per tube) increases. The onset of a phase transition can be "delayed" by having a larger pool of SNPs, or loosening primer selection constraints so as to increase the number of candidate primer pairs per SNP, though the latter
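
    To make the terms multiplexing level and coverage concrete, here is a minimal sketch of a greedy tube assignment. It is a simple heuristic chosen for illustration, not the authors' design algorithm; the function name, the SNP identifiers and the conflict pairs are hypothetical.

```python
def assign_to_tubes(snps, conflicts, n_tubes, level):
    """Greedily place each SNP in the first tube that has room and no conflict.

    conflicts: set of frozenset({a, b}) pairs whose primers are assumed to interact
    level: multiplexing level, i.e. the maximum number of primer pairs per tube
    Returns (tubes, coverage), where coverage is the fraction of SNPs placed.
    """
    tubes = [[] for _ in range(n_tubes)]
    placed = 0
    for snp in snps:
        for tube in tubes:
            has_room = len(tube) < level
            compatible = all(frozenset((snp, other)) not in conflicts for other in tube)
            if has_room and compatible:
                tube.append(snp)
                placed += 1
                break  # SNP placed; move on to the next one
    coverage = placed / len(snps) if snps else 0.0
    return tubes, coverage

# Hypothetical SNPs and primer-interaction pairs, for illustration only
snps = ["s1", "s2", "s3", "s4"]
conflicts = {frozenset(("s1", "s2")), frozenset(("s3", "s4"))}

tubes, coverage = assign_to_tubes(snps, conflicts, n_tubes=2, level=2)
print(tubes, coverage)

# With a lower multiplexing level the same tubes cannot hold every SNP
tubes_low, coverage_low = assign_to_tubes(snps, conflicts, n_tubes=2, level=1)
print(tubes_low, coverage_low)
```

    As the number of conflicting pairs grows, a greedy pass like this leaves more SNPs unplaced, which gives an informal feel for why coverage becomes hard to achieve at high multiplexing levels.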

  14. Genome-wide SNP discovery in walnut with an AGSNP pipeline updated for SNP discovery in allogamous organisms

    PubMed Central

    2012-01-01

    Background A genome-wide set of single nucleotide polymorphisms (SNPs) is a valuable resource in genetic research and breeding and is usually developed by re-sequencing a genome. If a genome sequence is not available, an alternative strategy must be used. We previously reported the development of a pipeline (AGSNP) for genome-wide SNP discovery in coding sequences and other single-copy DNA without a complete genome sequence in self-pollinating (autogamous) plants. Here we updated this pipeline for SNP discovery in outcrossing (allogamous) species and demonstrated its efficacy in SNP discovery in walnut (Juglans regia L.). Results The first step in the original implementation of the AGSNP pipeline was the construction of a reference sequence and the identification of single-copy sequences in it. To identify single-copy sequences, multiple genome equivalents of short SOLiD reads of another individual were mapped to shallow genome coverage of long Sanger or Roche 454 reads making up the reference sequence. The relative depth of SOLiD reads was used to filter out repeated sequences from single-copy sequences in the reference sequence. The second step was a search for SNPs between SOLiD reads and the reference sequence. Polymorphism within the mapped SOLiD reads would have precluded SNP discovery; hence both individuals had to be homozygous. The AGSNP pipeline was updated here for using SOLiD or other type of short reads of a heterozygous individual for these two principal steps. A total of 32.6X walnut genome equivalents of SOLiD reads of vegetatively propagated walnut scion cultivar ‘Chandler’ were mapped to 48,661 ‘Chandler’ bacterial artificial chromosome (BAC) end sequences (BESs) produced by Sanger sequencing during the construction of a walnut physical map. A total of 22,799 putative SNPs were initially identified. A total of 6,000 Infinium II type SNPs evenly distributed along the walnut physical map were selected for the construction of an Infinium Bead

  15. Genome-wide SNP discovery in walnut with an AGSNP pipeline updated for SNP discovery in allogamous organisms.

    PubMed

    You, Frank M; Deal, Karin R; Wang, Jirui; Britton, Monica T; Fass, Joseph N; Lin, Dawei; Dandekar, Abhaya M; Leslie, Charles A; Aradhya, Mallikarjuna; Luo, Ming-Cheng; Dvorak, Jan

    2012-07-31

    A genome-wide set of single nucleotide polymorphisms (SNPs) is a valuable resource in genetic research and breeding and is usually developed by re-sequencing a genome. If a genome sequence is not available, an alternative strategy must be used. We previously reported the development of a pipeline (AGSNP) for genome-wide SNP discovery in coding sequences and other single-copy DNA without a complete genome sequence in self-pollinating (autogamous) plants. Here we updated this pipeline for SNP discovery in outcrossing (allogamous) species and demonstrated its efficacy in SNP discovery in walnut (Juglans regia L.). The first step in the original implementation of the AGSNP pipeline was the construction of a reference sequence and the identification of single-copy sequences in it. To identify single-copy sequences, multiple genome equivalents of short SOLiD reads of another individual were mapped to shallow genome coverage of long Sanger or Roche 454 reads making up the reference sequence. The relative depth of SOLiD reads was used to filter out repeated sequences from single-copy sequences in the reference sequence. The second step was a search for SNPs between SOLiD reads and the reference sequence. Polymorphism within the mapped SOLiD reads would have precluded SNP discovery; hence both individuals had to be homozygous. The AGSNP pipeline was updated here for using SOLiD or other type of short reads of a heterozygous individual for these two principal steps. A total of 32.6X walnut genome equivalents of SOLiD reads of vegetatively propagated walnut scion cultivar 'Chandler' were mapped to 48,661 'Chandler' bacterial artificial chromosome (BAC) end sequences (BESs) produced by Sanger sequencing during the construction of a walnut physical map. A total of 22,799 putative SNPs were initially identified. A total of 6,000 Infinium II type SNPs evenly distributed along the walnut physical map were selected for the construction of an Infinium BeadChip, which was used to

  16. A genome-wide search for common SNP x SNP interactions on the risk of venous thrombosis

    PubMed Central

    2013-01-01

    Background Venous Thrombosis (VT) is a common multifactorial disease with an estimated heritability between 35% and 60%. Known genetic polymorphisms identified so far only explain ~5% of the genetic variance of the disease. This study aimed to investigate whether pair-wise interactions between common single nucleotide polymorphisms (SNPs) could exist and modulate the risk of VT. Methods A genome-wide SNP x SNP interaction analysis on VT risk was conducted in a French case–control study and the most significant findings were tested for replication in a second independent French case–control sample. The results obtained in the two studies totaling 1,953 cases and 2,338 healthy subjects were combined into a meta-analysis. Results The smallest observed p-value for interaction was p = 6.00 × 10^-11 but it did not pass the Bonferroni significance threshold of 1.69 × 10^-12 correcting for the number of investigated interactions that was 2.96 × 10^10. Among the 37 suggestive pair-wise interactions with p-value less than 10^-8, one was further shown to involve two SNPs, rs9804128 (IGFS21 locus) and rs4784379 (IRX3 locus) that demonstrated significant interactive effects (p = 4.83 × 10^-5) on the variability of plasma Factor VIII levels, a quantitative biomarker of VT risk, in a sample of 1,091 VT patients. Conclusion This study, the first genome-wide SNP interaction analysis conducted so far on VT risk, suggests that common SNPs are unlikely to exert strong interactive effects on the risk of disease. PMID:23509962
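
    The Bonferroni threshold quoted in this abstract can be checked with a short calculation. The family-wise error rate of 0.05 is an assumption (the abstract does not state it), but it reproduces the reported threshold.

```python
# Bonferroni threshold = family-wise alpha / number of tests
ALPHA = 0.05        # assumed family-wise error rate (not stated in the abstract)
N_TESTS = 2.96e10   # number of SNP x SNP interactions reported in the abstract

threshold = ALPHA / N_TESTS
best_p = 6.00e-11   # smallest interaction p-value reported in the abstract

print(f"Bonferroni threshold: {threshold:.2e}")
print("smallest p-value passes threshold:", best_p < threshold)
```

    The smallest observed p-value is larger than the threshold, which matches the abstract's statement that no interaction reached genome-wide significance.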

  17. The Impact of a Common MDM2 SNP on the Sensitivity of Breast Cancer to Treatment

    DTIC Science & Technology

    2012-06-01

    could decrease the effectiveness of treatment. These outcomes are likely due to the increased expression of mdm2 protein in SNP309 individuals, which...expression at the protein level occur in the mdm2 SNP309 cell line. There was no association between the mdm2 SNP309 and clinical outcome of breast cancer...with chemotherapy, hormonal therapy and radiation therapy. Subject terms: mdm2, breast cancer, polymorphisms

  18. A SNP transferability survey within the genus Vitis

    PubMed Central

    Vezzulli, Silvia; Micheletti, Diego; Riaz, Summaira; Pindo, Massimo; Viola, Roberto; This, Patrice; Walker, M Andrew; Troggio, Michela; Velasco, Riccardo

    2008-01-01

    Background Efforts to sequence the genomes of different organisms continue to increase. The DNA sequence is usually decoded for one individual and its application is for the whole species. The recent sequencing of the highly heterozygous Vitis vinifera L. cultivar Pinot Noir (clone ENTAV 115) genome gave rise to several thousand polymorphisms and offers a good model to study the transferability of its degree of polymorphism to other individuals of the same species and within the genus. Results This study was performed by genotyping 137 SNPs through the SNPlex™ Genotyping System (Applied Biosystems Inc.) and by comparing the SNPlex sequencing results across 35 (of the 137) regions from 69 grape accessions. A heterozygous state transferability of 31.5% across the unrelated cultivars of V. vinifera, of 18.8% across the wild forms of V. vinifera, of 2.3% among non-vinifera Vitis species, and of 0% with Muscadinia rotundifolia was found. In addition, mean allele frequencies were used to evaluate SNP informativeness and develop useful subsets of markers. Conclusion Using SNPlex application and corroboration from the sequencing analysis, the informativeness of SNP markers from the heterozygous grape cultivar Pinot Noir was validated in V. vinifera (including cultivars and wild forms), but had a limited application for non-vinifera Vitis species where a resequencing strategy may be preferred, knowing that homology at priming sites is sufficient. This work will allow future applications such as mapping and diversity studies, accession identification and genomic-research assisted breeding within V. vinifera. PMID:19087337

  19. Structural Architecture of SNP Effects on Complex Traits

    PubMed Central

    Gamazon, Eric R.; Cox, Nancy J.; Davis, Lea K.

    2014-01-01

    Despite the discovery of copy-number variation (CNV) across the genome nearly 10 years ago, current SNP-based analysis methodologies continue to collapse the homozygous (i.e., A/A), hemizygous (i.e., A/0), and duplicative (i.e., A/A/A) genotype states, treating the genotype variable as irreducible or unaltered by other colocalizing forms of genetic (e.g., structural) variation. Our understanding of common, genome-wide CNVs suggests that the canonical genotype construct might belie the enormous complexity of the genome. Here we present multiple analyses of several phenotypes and provide methods supporting a conceptual shift that embraces the structural dimension of genotype. We comprehensively investigate the impact of the structural dimension of genotype on (1) GWAS methods, (2) interpretation of rare LOF variants, (3) characterization of genomic architecture, and (4) implications for mapping loci involved in complex disease. Taken together, these results argue for the inclusion of a structural dimension and suggest that some portion of the “missing” heritability might be recovered through integration of the structural dimension of SNP effects on complex traits. PMID:25307299

  20. Eigenanalysis of SNP data with an identity by descent interpretation.

    PubMed

    Zheng, Xiuwen; Weir, Bruce S

    2016-02-01

    Principal component analysis (PCA) is widely used in genome-wide association studies (GWAS), and the principal component axes often represent perpendicular gradients in geographic space. The explanation of PCA results is of major interest for geneticists to understand fundamental demographic parameters. Here, we provide an interpretation of PCA based on relatedness measures, which are described by the probability that sets of genes are identical-by-descent (IBD). An approximately linear transformation between ancestral proportions (AP) of individuals with multiple ancestries and their projections onto the principal components is found. In addition, a new method of eigenanalysis "EIGMIX" is proposed to estimate individual ancestries. EIGMIX is a method of moments with computational efficiency suitable for millions of SNP data, and it is not subject to the assumption of linkage equilibrium. With the assumptions of multiple ancestries and their surrogate ancestral samples, EIGMIX is able to infer ancestral proportions (APs) of individuals. The methods were applied to the SNP data from the HapMap Phase 3 project and the Human Genome Diversity Panel. The APs of individuals inferred by EIGMIX are consistent with the findings of the program ADMIXTURE. In conclusion, EIGMIX can be used to detect population structure and estimate genome-wide ancestral proportions with a relatively high accuracy.

  1. SNP Markers and Their Impact on Plant Breeding

    PubMed Central

    Mammadov, Jafar; Aggarwal, Rajat; Buyyarapu, Ramesh; Kumpatla, Siva

    2012-01-01

    The use of molecular markers has revolutionized the pace and precision of plant genetic analysis which in turn facilitated the implementation of molecular breeding of crops. The last three decades have seen tremendous advances in the evolution of marker systems and the respective detection platforms. Markers based on single nucleotide polymorphisms (SNPs) have rapidly gained the center stage of molecular genetics during the recent years due to their abundance in the genomes and their amenability for high-throughput detection formats and platforms. Computational approaches dominate SNP discovery methods due to the ever-increasing sequence information in public databases; however, complex genomes pose special challenges in the identification of informative SNPs warranting alternative strategies in those crops. Many genotyping platforms and chemistries have become available making the use of SNPs even more attractive and efficient. This paper provides a review of historical and current efforts in the development, validation, and application of SNP markers in QTL/gene discovery and plant breeding by discussing key experimental strategies and cases exemplifying their impact. PMID:23316221

  2. Data mining and genetic algorithm based gene/SNP selection.

    PubMed

    Shah, Shital C; Kusiak, Andrew

    2004-07-01

    Genomic studies provide large volumes of data with the number of single nucleotide polymorphisms (SNPs) ranging into thousands. The analysis of SNPs permits determining relationships between genotypic and phenotypic information as well as the identification of SNPs related to a disease. The growing wealth of information and advances in biology call for the development of approaches for discovery of new knowledge. One such area is the identification of gene/SNP patterns impacting cure/drug development for various diseases. A new approach for predicting drug effectiveness is presented. The approach is based on data mining and genetic algorithms. A global search mechanism, weighted decision tree, decision-tree-based wrapper, a correlation-based heuristic, and the identification of intersecting feature sets are employed for selecting significant genes. The feature selection approach has resulted in 85% reduction of number of features. The relative increase in cross-validation accuracy and specificity for the significant gene/SNP set was 10% and 3.2%, respectively. The feature selection approach was successfully applied to data sets for drug and placebo subjects. The number of features has been significantly reduced while the quality of knowledge was enhanced. The feature set intersection approach provided the most significant genes/SNPs. The results reported in the paper discuss associations among SNPs resulting in patient-specific treatment protocols.

  3. New multilocus linkage disequilibrium measure for tag SNP selection.

    PubMed

    Liao, Bo; Wang, Xiangjun; Zhu, Wen; Li, Xiong; Cai, Lijun; Chen, Haowen

    2017-02-01

    Numerous approaches have been proposed for selecting an optimal tag single-nucleotide polymorphism (SNP) set. Most of these approaches are based on linkage disequilibrium (LD). Classical LD measures, such as D' and r(2), are frequently used to quantify the pairwise linkage disequilibrium between two markers. Despite their successful use in many applications, these measures cannot be used to measure the LD among multiple markers. These LD measures need information about the frequencies of alleles collected from a haplotype dataset. In this study, a cluster algorithm is proposed to cluster SNPs according to a multilocus LD measure which is based on information theory. After that, tag SNPs are selected in each cluster, optimized by the number of tag SNPs, prediction accuracy and so on. The experimental results show that this new LD measure can be directly applied to genotype datasets collected from the HapMap project, so that it saves the cost of haplotyping. More importantly, the proposed method significantly improves the efficiency and prediction accuracy of tag SNP selection.
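
    For reference, the classical pairwise measures mentioned in this abstract (D' and r(2), i.e. r squared) can be computed from haplotype and allele frequencies as sketched below. This shows only the standard two-locus definitions, not the multilocus measure the authors propose; the function name and the example frequencies are illustrative assumptions.

```python
def pairwise_ld(p_ab, p_a, p_b):
    """Return (D, |D'|, r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and allele B at locus 2
    p_a, p_b: frequencies of allele A and of allele B
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    # D is scaled by its maximum attainable magnitude given the allele frequencies
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2

# Illustrative frequencies: A and B always occur together (complete LD)
print(pairwise_ld(0.5, 0.5, 0.5))
# Illustrative frequencies: haplotype frequency equals the product (no LD)
print(pairwise_ld(0.25, 0.5, 0.5))
```

    Both measures need the haplotype frequency p_ab, which is why the abstract notes that they depend on haplotype data rather than unphased genotypes.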

  4. New generation pharmacogenomic tools: a SNP linkage disequilibrium Map, validated SNP assay resource, and high-throughput instrumentation system for large-scale genetic studies.

    PubMed

    De La Vega, Francisco M; Dailey, David; Ziegle, Janet; Williams, Julie; Madden, Dawn; Gilbert, Dennis A

    2002-06-01

    Since public and private efforts announced the first draft of the human genome last year, researchers have reported great numbers of single nucleotide polymorphisms (SNPs). We believe that the availability of well-mapped, quality SNP markers constitutes the gateway to a revolution in genetics and personalized medicine that will lead to better diagnosis and treatment of common complex disorders. A new generation of tools and public SNP resources for pharmacogenomic and genetic studies--specifically for candidate-gene, candidate-region, and whole-genome association studies--will form part of the new scientific landscape. This will only be possible through the greater accessibility of SNP resources and superior high-throughput instrumentation-assay systems that enable affordable, highly productive large-scale genetic studies. We are contributing to this effort by developing a high-quality linkage disequilibrium SNP marker map and an accompanying set of ready-to-use, validated SNP assays across every gene in the human genome. This effort incorporates both the public sequence and SNP data sources, and Celera Genomics' human genome assembly and enormous resource of physically mapped SNPs (approximately 4,000,000 unique records). This article discusses our approach and methodology for designing the map, choosing quality SNPs, designing and validating these assays, and obtaining population frequency of the polymorphisms. We also discuss an advanced, high-performance SNP assay chemistry--a new generation of the TaqMan probe-based, 5' nuclease assay--and high-throughput instrumentation-software system for large-scale genotyping. We provide the new SNP map and validation information, validated SNP assays and reagents, and instrumentation systems as a novel resource for genetic discoveries.

  5. Prognostic impact of SNP array karyotyping in myelodysplastic syndromes and related myeloid malignancies

    PubMed Central

    Tiu, Ramon V.; Gondek, Lukasz P.; O'Keefe, Christine L.; Elson, Paul; Huh, Jungwon; Mohamedali, Azim; Kulasekararaj, Austin; Advani, Anjali S.; Paquette, Ronald; List, Alan F.; Sekeres, Mikkael A.; McDevitt, Michael A.

    2011-01-01

    Single nucleotide polymorphism arrays (SNP-As) have emerged as an important tool in the identification of chromosomal defects undetected by metaphase cytogenetics (MC) in hematologic cancers, offering superior resolution of unbalanced chromosomal defects and acquired copy-neutral loss of heterozygosity. Myelodysplastic syndromes (MDSs) and related cancers share recurrent chromosomal defects and molecular lesions that predict outcomes. We hypothesized that combining SNP-A and MC could improve diagnosis/prognosis and further the molecular characterization of myeloid malignancies. We analyzed MC/SNP-A results from 430 patients (MDS = 250, MDS/myeloproliferative overlap neoplasm = 95, acute myeloid leukemia from MDS = 85). The frequency and clinical significance of genomic aberrations were compared between MC and MC plus SNP-A. Combined MC/SNP-A karyotyping led to a higher diagnostic yield of chromosomal defects (74% vs 44%, P < .0001), compared with MC alone, often through detection of novel lesions in patients with normal/noninformative (54%) and abnormal (62%) MC results. Newly detected SNP-A defects contributed to poorer prognosis for patients stratified by current morphologic and clinical risk schemes. The presence and number of new SNP-A detected lesions are independent predictors of overall and event-free survival. The significant diagnostic and prognostic contributions of SNP-A–detected defects in MDS and related diseases underscore the utility of SNP-A when combined with MC in hematologic malignancies. PMID:21285439

  6. Analysis of high-order SNP barcodes in mitochondrial D-loop for chronic dialysis susceptibility.

    PubMed

    Yang, Cheng-Hong; Lin, Yu-Da; Chuang, Li-Yeh; Chang, Hsueh-Wei

    2016-10-01

    Positively identifying disease-associated single nucleotide polymorphism (SNP) markers in genome-wide studies entails the complex association analysis of a huge number of SNPs. Such large numbers of SNP barcodes (SNP/genotype combinations) continue to pose serious computational challenges, especially for high-dimensional data. We propose a novel method for exploring SNP barcodes based on differential evolution, termed IDE (improved differential evolution). IDE uses a "top combination strategy" to improve the ability of differential evolution to explore high-order SNP barcodes in high-dimensional data. We simulate disease data and use real chronic dialysis data to test four global optimization algorithms. In 48 simulated disease models, we show that IDE outperforms existing global optimization algorithms in terms of exploring ability and power to detect the specific SNP/genotype combinations with a maximum difference between cases and controls. In real data, we show that IDE can be used to evaluate the relative effects of each individual SNP on disease susceptibility. IDE generated significant SNP barcodes with less computational complexity than the other algorithms, making IDE ideally suited for analysis of high-order SNP barcodes. Copyright © 2016 Elsevier Inc. All rights reserved.

  7. PCR amplification of SNP loci from crude DNA for large-scale genotyping of oomycetes.

    PubMed

    Hu, Jian; Lyon, Rebecca; Zhou, Yuxin; Lamour, Kurt

    2014-01-01

    Similar to other eukaryotes, single nucleotide polymorphism (SNP) markers are abundant in many oomycete plant pathogen genomes. High resolution DNA melting analysis (HR-DMA) is a cost-effective method for SNP genotyping, but like many SNP marker technologies, is limited by the amount and quality of template DNA. We describe PCR preamplification of Phytophthora and Peronospora SNP loci from crude DNA extracted from a small amount of mycelium and/or infected plant tissue to produce sufficient template to genotype at least 10 000 SNPs. The approach is fast, inexpensive, requires minimal biological material and should be useful for many organisms in a variety of contexts.

  8. Accuracy of direct genomic values in Holstein bulls and cows using subsets of SNP markers

    PubMed Central

    2010-01-01

    Background At the current price, the use of high-density single nucleotide polymorphism (SNP) genotyping assays in genomic selection of dairy cattle is limited to applications involving elite sires and dams. The objective of this study was to evaluate the use of low-density assays to predict direct genomic value (DGV) on five milk production traits, an overall conformation trait, a survival index, and two profit index traits (APR, ASI). Methods Dense SNP genotypes were available for 42,576 SNP for 2,114 Holstein bulls and 510 cows. A subset of 1,847 bulls born between 1955 and 2004 was used as a training set to fit models with various sets of pre-selected SNP. A group of 297 bulls born between 2001 and 2004 and all cows born between 1992 and 2004 were used to evaluate the accuracy of DGV prediction. Ridge regression (RR) and partial least squares regression (PLSR) were used to derive prediction equations and to rank SNP based on the absolute value of the regression coefficients. Four alternative strategies were applied to select subsets of SNP, namely: subsets of the highest ranked SNP for each individual trait, or a single subset of evenly spaced SNP, where SNP were selected based on their rank for ASI, APR or minor allele frequency within intervals of approximately equal length. Results RR and PLSR performed very similarly to predict DGV, with PLSR performing better for low-density assays and RR for higher-density SNP sets. When using all SNP, DGV predictions for production traits, which have a higher heritability, were more accurate (0.52-0.64) than for survival (0.19-0.20), which has a low heritability. The gain in accuracy using subsets that included the highest ranked SNP for each trait was marginal (5-6%) over a common set of evenly spaced SNP when at least 3,000 SNP were used. Subsets containing 3,000 SNP provided more than 90% of the accuracy that could be achieved with a high-density assay for cows, and 80% of the high-density assay for young bulls

  9. A Genome-Wide Association Study for Agronomic Traits in Soybean Using SNP Markers and SNP-Based Haplotype Analysis

    PubMed Central

    de Oliveira, Marco Antônio Rott; Higashi, Wilson; Scapim, Carlos Alberto; Schuster, Ivan

    2017-01-01

    Mapping quantitative trait loci through the use of linkage disequilibrium (LD) in populations of unrelated individuals provides a valuable approach for dissecting the genetic basis of complex traits in soybean (Glycine max). The haplotype-based genome-wide association study (GWAS) has now been proposed as a complementary approach to intensify benefits from LD, enabling assessment of the genetic determinants of agronomic traits. In this study a GWAS was undertaken to identify genomic regions that control 100-seed weight (SW), plant height (PH) and seed yield (SY) in a soybean association mapping panel using single nucleotide polymorphism (SNP) markers and haplotype information. The soybean cultivars (N = 169) were field-evaluated across four locations of southern Brazil. The genome-wide haplotype association analysis (941 haplotypes) identified eleven, seventeen and fifty-nine SNP-based haplotypes significantly associated with SY, SW and PH, respectively. Although most marker-trait associations were environment and trait specific, stable haplotype associations were identified for SY and SW across environments (i.e., haplotypes Gm12_Hap12). The haplotype block 42 on Chr19 (Gm19_Hap42) was confirmed to be associated with PH in two environments. These findings enable us to refine the breeding strategy for tropical soybean and confirm that haplotype-based GWAS can provide new insights into the genetic determinants that are not captured by the single-marker approach. PMID:28152092

  10. A Genome-Wide Association Study for Agronomic Traits in Soybean Using SNP Markers and SNP-Based Haplotype Analysis.

    PubMed

    Contreras-Soto, Rodrigo Iván; Mora, Freddy; de Oliveira, Marco Antônio Rott; Higashi, Wilson; Scapim, Carlos Alberto; Schuster, Ivan

    2017-01-01

    Mapping quantitative trait loci through the use of linkage disequilibrium (LD) in populations of unrelated individuals provides a valuable approach for dissecting the genetic basis of complex traits in soybean (Glycine max). The haplotype-based genome-wide association study (GWAS) has now been proposed as a complementary approach to intensify benefits from LD, enabling assessment of the genetic determinants of agronomic traits. In this study a GWAS was undertaken to identify genomic regions that control 100-seed weight (SW), plant height (PH) and seed yield (SY) in a soybean association mapping panel using single nucleotide polymorphism (SNP) markers and haplotype information. The soybean cultivars (N = 169) were field-evaluated across four locations of southern Brazil. The genome-wide haplotype association analysis (941 haplotypes) identified eleven, seventeen and fifty-nine SNP-based haplotypes significantly associated with SY, SW and PH, respectively. Although most marker-trait associations were environment and trait specific, stable haplotype associations were identified for SY and SW across environments (i.e., haplotypes Gm12_Hap12). The haplotype block 42 on Chr19 (Gm19_Hap42) was confirmed to be associated with PH in two environments. These findings enable us to refine the breeding strategy for tropical soybean and confirm that haplotype-based GWAS can provide new insights into the genetic determinants that are not captured by the single-marker approach.

  11. Comparing the efficacy of SNP filtering methods for identifying a single causal SNP in a known association region.

    PubMed

    Spencer, Amy Victoria; Cox, Angela; Walters, Kevin

    2014-01-01

    Genome-wide association studies have successfully identified associations between common diseases and a large number of single nucleotide polymorphisms (SNPs) across the genome. We investigate the effectiveness of several statistics, including p-values, likelihoods, genetic map distance and linkage disequilibrium between SNPs, in filtering SNPs in several disease-associated regions. We use simulated data to compare the efficacy of filters with different sample sizes and for causal SNPs with different minor allele frequencies (MAFs) and effect sizes, focusing on the small effect sizes and MAFs likely to represent the majority of unidentified causal SNPs. In our analyses, of all the methods investigated, filtering on the ranked likelihoods consistently retains the true causal SNP with the highest probability for a given false positive rate. This was the case for all the local linkage disequilibrium patterns investigated. Our results indicate that when using this method to retain only the top 5% of SNPs, even a causal SNP with an odds ratio of 1.1 and MAF of 0.08 can be retained with a probability exceeding 0.9 using an overall sample size of 50,000. © 2013 John Wiley & Sons Ltd/University College London.

  12. SNP Discovery for mapping alien introgressions in wheat

    PubMed Central

    2014-01-01

    Background Monitoring alien introgressions in crop plants is difficult due to the lack of genetic and molecular mapping information on the wild crop relatives. The tertiary gene pool of wheat is a very important source of genetic variability for wheat improvement against biotic and abiotic stresses. By exploring the 5Mg short arm (5MgS) of Aegilops geniculata, we can apply chromosome genomics for the discovery of SNP markers and their use for monitoring alien introgressions in wheat (Triticum aestivum L). Results The short arm of chromosome 5Mg of Ae. geniculata Roth (syn. Ae. ovata L.; 2n = 4x = 28, UgUgMgMg) was flow-sorted from a wheat line in which it is maintained as a telocentric chromosome. DNA of the sorted arm was amplified and sequenced using an Illumina Hiseq 2000 with ~45x coverage. The sequence data was used for SNP discovery against wheat homoeologous group-5 assemblies. A total of 2,178 unique, 5MgS-specific SNPs were discovered. Randomly selected samples of 59 5MgS-specific SNPs were tested (44 by KASPar assay and 15 by Sanger sequencing) and 84% were validated. Of the selected SNPs, 97% mapped to a chromosome 5Mg addition to wheat (the source of t5MgS), and 94% to 5Mg introgressed from a different accession of Ae. geniculata substituting for chromosome 5D of wheat. The validated SNPs also identified chromosome segments of 5MgS origin in a set of T5D-5Mg translocation lines; eight SNPs (25%) mapped to TA5601 [T5DL · 5DS-5MgS(0.75)] and three (8%) to TA5602 [T5DL · 5DS-5MgS (0.95)]. SNPs (gsnp_5ms83 and gsnp_5ms94), tagging chromosome T5DL · 5DS-5MgS(0.95) with the smallest introgression carrying resistance to leaf rust (Lr57) and stripe rust (Yr40), were validated in two released germplasm lines with Lr57 and Yr40 genes. Conclusion This approach should be widely applicable for the identification of species/genome-specific SNPs. The development of a large number of SNP markers will facilitate the precise introgression and

  13. Molecular cloning and SNP association analysis of chicken PMCH gene.

    PubMed

    Sun, Guirong; Li, Ming; Li, Hong; Tian, Yadong; Chen, Qixin; Bai, Yichun; Kang, Xiangtao

    2013-08-01

    The pre-melanin-concentrating hormone (PMCH) gene plays an important functional role in the regulation of body fat content, feeding behavior and energy balance. In this study, the full-length cDNA of the chicken PMCH gene was amplified by the SMART RACE method. Single nucleotide polymorphisms (SNPs) in the PMCH gene were screened by comparative sequence analysis. The non-synonymous coding SNP (ncSNP) obtained was selected for genotyping first. Its effects on growth, carcass characteristics and meat quality traits were investigated by AluI CRS-PCR-RFLP in the F2 resource population of Gushi chicken crossed with Anak broiler. Our results indicated that the cDNA of chicken PMCH shared 67.25 and 66.47% homology with that of human and bovine PMCH, respectively. The deduced amino acid sequence of chicken PMCH (163 amino acids) was 52.07 and 50.89% identical to those of human and bovine PMCH, respectively. The PMCH protein sequence is predicted to have several functional domains, including pro-MCH, CSP, IL7, XPGI and some low complexity sequence. It has 8 phosphorylation sites and no signal peptide sequence. Targeting sites for the microRNAs gga-miR-18a, gga-miR-18b and gga-miR-499 were predicted in the 3' untranslated region of chicken PMCH mRNA. In addition, a total of seven SNPs, including an ncSNP and a synonymous coding SNP, were identified in the PMCH gene. The ncSNP c.81 A>T was found to be in a moderately polymorphic state (polymorphic index=0.365), and the frequencies for genotype AA, AB and BB were 0.3648, 0.4682 and 0.1670, respectively. Significant associations between the locus and shear force of breast and leg were observed. This polymorphic site may serve as a useful target for the marker assisted selection of the growth and meat quality traits in chicken.
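
    The reported polymorphic index of 0.365 can be reproduced from the genotype frequencies above if one assumes it is the polymorphism information content (PIC) of a biallelic locus, PIC = 1 - (p^2 + q^2) - 2 p^2 q^2. The abstract does not name the statistic, so this is an assumption; the function names below are illustrative.

```python
def allele_frequencies(f_aa, f_ab, f_bb):
    """Allele frequencies from genotype frequencies at a biallelic locus."""
    p = f_aa + f_ab / 2  # frequency of allele A
    q = f_bb + f_ab / 2  # frequency of allele B
    return p, q


def pic_biallelic(p, q):
    """Polymorphism information content for two alleles."""
    return 1 - (p ** 2 + q ** 2) - 2 * p ** 2 * q ** 2


# Genotype frequencies for AA, AB and BB quoted in the abstract.
p, q = allele_frequencies(0.3648, 0.4682, 0.1670)
pic = pic_biallelic(p, q)
print(round(p, 4), round(q, 4), round(pic, 3))  # 0.5989 0.4011 0.365
```

    A PIC between 0.25 and 0.5 is conventionally described as moderately polymorphic, which matches the wording of the abstract.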

  14. Development of maizeSNP3072, a high-throughput compatible SNP array, for DNA fingerprinting identification of Chinese maize varieties.

    PubMed

    Tian, Hong-Li; Wang, Feng-Ge; Zhao, Jiu-Ran; Yi, Hong-Mei; Wang, Lu; Wang, Rui; Yang, Yang; Song, Wei

    2015-01-01

    Single nucleotide polymorphisms (SNPs) are abundant and evenly distributed throughout the maize (Zea mays L.) genome. SNPs have several advantages over simple sequence repeats, such as ease of data comparison and integration, high-throughput processing of loci, and identification of associated phenotypes. SNPs are thus ideal for DNA fingerprinting, genetic diversity analysis, and marker-assisted breeding. Here, we developed a high-throughput and compatible SNP array, maizeSNP3072, containing 3072 SNPs developed from the maizeSNP50 array. To improve genotyping efficiency, a high-quality cluster file, maizeSNP3072_GT.egt, was constructed. All 3072 SNP loci were localized within different genes, where they were distributed in exons (43 %), promoters (21 %), 3' untranslated regions (UTRs; 22 %), 5' UTRs (9 %), and introns (5 %). The average genotyping failure rate using these SNPs was only 6 %, or 3 % using the cluster file to call genotypes. The genotype consistency of repeat sample analysis on Illumina GoldenGate versus Infinium platforms exceeded 96.4 %. The minor allele frequency (MAF) of the SNPs averaged 0.37 based on data from 309 inbred lines. The 3072 SNPs were highly effective for distinguishing among 276 examined hybrids. Comparative analysis using Chinese varieties revealed that the 3072SNP array showed a better marker success rate and higher average MAF values, evaluation scores, and variety-distinguishing efficiency than the maizeSNP50K array. The maizeSNP3072 array thus can be successfully used in DNA fingerprinting identification of Chinese maize varieties and shows potential as a useful tool for germplasm resource evaluation and molecular marker-assisted breeding.

  15. The association between MEFV gene polymorphisms and Henoch-Schönlein purpura, and additional SNP-SNP interactions in Chinese Han children.

    PubMed

    Xiong, Shunjun; Xiong, Ying; Huang, Qian; Wang, Jierong; Zhang, Xiaofang

    2017-03-01

    The aim of this study was to investigate the association between single-nucleotide polymorphisms (SNP) within the MEFV gene and Henoch-Schönlein purpura (HSP) risk, and the impact of SNP-SNP interaction on HSP risk in Chinese children. A total of 662 subjects with a mean age of 7.9 ± 2.4 years were selected, including 320 HSP patients and 342 normal controls. Logistic regression was performed to investigate the association between SNP and HSP risk, and generalized multifactor dimensionality reduction (GMDR) was used to analyze the SNP-SNP interaction. Logistic analysis showed a significant association between genotypes of variants in rs3743930 and increased HSP risk. Carriers of the homozygous mutant genotype of rs3743930 had a higher HSP risk than wild-type homozygotes; OR (95% CI) was 1.55 (1.23-1.85). GMDR analysis suggested a significant two-locus model (p = 0.0107) involving rs3743930 and rs28940580, indicating a potential SNP-SNP interaction between rs3743930 and rs28940580. Overall, the two-locus model had a cross-validation consistency of 10 of 10 and a testing accuracy of 60.72%. Subjects with the rs3743930-GC or CC and rs28940580-GA or AA genotype had the highest HSP risk compared to subjects with the rs3743930-GG and rs28940580-GG genotype; OR (95% CI) was 2.13 (1.52-2.89). The variants in rs3743930 and the interaction between rs3743930 and rs28940580 were associated with increased HSP risk in Chinese children.

  16. SNPWave™: a flexible multiplexed SNP genotyping technology

    PubMed Central

    van Eijk, Michiel J. T.; Broekhof, José L. N.; van der Poel, Hein J. A.; Hogers, René C. J.; Schneiders, Harrie; Kamerbeek, Judith; Verstege, Esther; van Aart, Joris W.; Geerlings, Henk; Buntjer, Jaap B.; van Oeveren, A. Jan; Vos, Pieter

    2004-01-01

    Scalable multiplexed amplification technologies are needed for cost-effective large-scale genotyping of genetic markers such as single nucleotide polymorphisms (SNPs). We present SNPWave™, a novel SNP genotyping technology to detect various subsets of sequences in a flexible fashion in a fixed detection format. SNPWave is based on highly multiplexed ligation, followed by amplification of up to 20 ligated probes in a single PCR. Depending on the multiplexing level of the ligation reaction, the latter employs selective amplification using the amplified fragment length polymorphism (AFLP®) technology. Detection of SNPWave reaction products is based on size separation on a sequencing instrument with multiple fluorescence labels and short run times. The SNPWave technique is illustrated by a 100-plex genotyping assay for Arabidopsis, a 40-plex assay for tomato and a 10-plex assay for Caenorhabditis elegans, detected on the MegaBACE 1000 capillary sequencer. PMID:15004220

  17. SNP-VISTA: An Interactive SNPs Visualization Tool

    SciTech Connect

    Shah, Nameeta; Teplitsky, Michael V.; Pennacchio, Len A.; Hugenholtz, Philip; Hamann, Bernd; Dubchak, Inna L.

    2005-07-05

    Recent advances in sequencing technologies promise better diagnostics for many diseases as well as better understanding of evolution of microbial populations. Single Nucleotide Polymorphisms (SNPs) are established genetic markers that aid in the identification of loci affecting quantitative traits and/or disease in a wide variety of eukaryotic species. With today's technological capabilities, it is possible to re-sequence a large set of appropriate candidate genes in individuals with a given disease and then screen for causative mutations. In addition, SNPs have been used extensively in efforts to study the evolution of microbial populations, and the recent application of random shotgun sequencing to environmental samples makes possible more extensive SNP analysis of co-occurring and co-evolving microbial populations. The program is available at http://genome.lbl.gov/vista/snpvista.

  18. Linear reduction method for predictive and informative tag SNP selection.

    PubMed

    He, Jingwu; Westbrooks, Kelly; Zelikovsky, Alexander

    2005-01-01

    Constructing a complete human haplotype map is helpful when associating complex diseases with their related SNPs. Unfortunately, the number of SNPs is very large and it is costly to sequence many individuals. Therefore, it is desirable to reduce the number of SNPs that should be sequenced to a small number of informative representatives called tag SNPs. In this paper, we propose a new linear algebra-based method for selecting and using tag SNPs. We measure the quality of our tag SNP selection algorithm by comparing actual SNPs with SNPs predicted from selected linearly independent tag SNPs. Our experiments show that for sufficiently long haplotypes, knowing only 0.4% of all SNPs the proposed linear reduction method predicts an unknown haplotype with the error rate below 2% based on 10% of the population.

  19. Grouping preprocess for haplotype inference from SNP and CNV data

    NASA Astrophysics Data System (ADS)

    Shindo, Hiroyuki; Chigira, Hiroshi; Nagaoka, Tomoyo; Kamatani, Naoyuki; Inoue, Masato

    2009-12-01

    The method of statistical haplotype inference is an indispensable technique in the field of medical science. The authors previously reported Hardy-Weinberg equilibrium-based haplotype inference that could manage single nucleotide polymorphism (SNP) data. We recently extended the method to cover copy number variation (CNV) data. Haplotype inference from mixed data is important because SNPs and CNVs are occasionally in linkage disequilibrium. The idea underlying the proposed method is simple, but the algorithm for it needs to be quite elaborate to reduce the calculation cost. Consequently, we have focused on the details of the algorithm in this study. Although the main advantage of the method is accuracy, in that it does not use any approximation, its main disadvantage is still the calculation cost, which is sometimes intractable for large data sets with missing values.

  20. Authentication of medicinal plants by SNP-based multiplex PCR.

    PubMed

    Lee, Ok Ran; Kim, Min-Kyeoung; Yang, Deok-Chun

    2012-01-01

    Highly variable intergenic spacer and intron regions from nuclear and cytoplasmic DNA have been used for species identification. Noncoding internal transcribed spacers (ITSs) located in 18S-5.8S-26S, and 5S ribosomal RNA genes (rDNAs) represent suitable regions for medicinal plant authentication. Noncoding regions from two cytoplasmic DNAs, chloroplast DNA (trnT-F intergenic spacer region) and mitochondrial DNA (fourth intron region of nad7 gene), are also successfully applied for the proper identification of medicinal plants. Single-nucleotide polymorphism (SNP) sites obtained from the amplification of intergenic spacer and intron regions are properly utilized for the verification of medicinal plants at the species level using multiplex PCR. Multiplex PCR is a variant of the PCR technique used to amplify more than two loci simultaneously.

  1. TNF-alpha SNP haplotype frequencies in equidae.

    PubMed

    Brown, J J; Ollier, W E R; Thomson, W; Matthews, J B; Carter, S D; Binns, M; Pinchbeck, G; Clegg, P D

    2006-05-01

    Tumour necrosis factor alpha (TNF-alpha) is a pro-inflammatory cytokine that plays a crucial role in the regulation of inflammatory and immune responses. In all vertebrate species the genes encoding TNF-alpha are located within the major histocompatibility complex. In the horse TNF-alpha has been ascribed a role in a variety of important disease processes. Previously two single nucleotide polymorphisms (SNPs) have been reported within the 5' un-translated region of the equine TNF-alpha gene. We have examined the equine TNF-alpha promoter region further for additional SNPs by analysing DNA from 131 horses (Equus caballus), 19 donkeys (E. asinus), 2 Grant's zebras (E. burchellii boehmi) and one onager (E. hemionus). Two further SNPs were identified at nucleotide positions 24 (T/G) and 452 (T/C) relative to the first nucleotide of the 522 bp polymerase chain reaction product. A sequence variant at position 51 was observed between equidae. SNaPSHOT genotyping assays for these and the two previously reported SNPs were performed on 457 horses comprising seven different breeds and 23 donkeys to determine the gene frequencies. SNP frequencies varied considerably between different horse breeds and also between the equine species. In total, nine different TNF-alpha promoter SNP haplotypes and their frequencies were established amongst the various equidae examined, with some haplotypes being found only in horses and others only in donkeys or zebras. The haplotype frequencies observed varied greatly between different horse breeds. Such haplotypes may relate to levels of TNF-alpha production and disease susceptibility and further investigation is required to identify associations between particular haplotypes and altered risk of disease.

  2. Impact of population diversity on the prediction of 7-SNP NAT2 phenotypes using the tagSNP rs1495741 or paired SNPs.

    PubMed

    Suarez-Kurtz, Guilherme; Sortica, Vinicius A; Vargens, Daniela D; Bruxel, Estela M; Petzl-Erler, Maria-Luiza; Tsuneto, Luisa T; Hutz, Mara H

    2012-04-01

    A novel NAT2 tagSNP (rs1495741) and a 2-SNP genotype (rs1041983 and rs1801280) have been recently shown to accurately predict the NAT2 acetylator phenotypes in populations of exclusive or predominant European/White ancestry. We confirmed the accuracy of the tagSNP approach in White Brazilians, but not in Brown or Black Brazilians, sub-Saharan Mozambicans, and Guarani Amerindians. The combined rs1041983 and rs1801280 genotypes provided considerably better prediction of the NAT2 phenotype in Guarani, but no consistent improvement in Brown or Black Brazilians and Mozambicans. Best predictions of the NAT2 phenotype in Mozambicans using NAT2 SNP pairs were obtained with rs1801280 and rs1799930, but the accuracy of the estimates remained inadequate for clinical use or for investigations in this sub-Saharan group or in Brazilians with considerable African ancestry. In conclusion, the rs1495741 tagSNP cannot be applied to predict the NAT2 acetylation phenotype in Guarani and African-derived populations, whereas 2-SNP genotypes may accurately predict NAT2 phenotypes in Guarani, but not in Africans.

  3. A new SNP panel for evaluating genetic diversity in a composite cattle breed

    USDA-ARS's Scientific Manuscript database

    A custom 60K SNP panel, extracted from Bovine HD SNP chip was used to evaluate genotypic frequency changes in Braford (BF, a composite breed) when compared to progenitor breeds: Hereford (HF), Brahman (BR), and Nelore (NE). Samples from both the U. S. and Brazil were used. The new panel differentiat...

  4. Development and Applications of a Bovine 50,000 SNP Chip

    USDA-ARS's Scientific Manuscript database

    To develop an Illumina iSelect high density single nucleotide polymorphism (SNP) assay for cattle, the collaborative iBMC (Illumina, USDA ARS Beltsville, University of Missouri, USDA ARS Clay Center) Consortium first performed a de novo SNP discovery project in which genomic reduced representation l...

  5. A genome-wide SNP panel for genetic diversity, mapping and breeding studies in rice

    USDA-ARS's Scientific Manuscript database

    A genome-wide SNP resource was developed for rice using the GoldenGate assay and used to genotype 400 landrace accessions of O. sativa. SNPs were originally discovered using Perlegen re-sequencing technology in 20 diverse landraces of O. sativa as part of OryzaSNP project (http://irfgc.irri.org). An...

  6. Genome-wide copy number variations using SNP genotyping in a mixed breed swine population

    USDA-ARS's Scientific Manuscript database

    Copy number variations (CNVs) are increasingly understood to affect phenotypic variation. This study uses SNP genotyping of trios of mixed breed swine to add to the catalog of known genotypic variation in an important agricultural animal. Porcine SNP60 BeadChip genotypes were collected from 1802 pi...

  7. [Accurate detection of a case with Angelman syndrome (type 1) using SNP array].

    PubMed

    Shi, Shanshan; Lin, Shaobin; Liao, Yanfen; Li, Weijing

    2016-12-10

    To analyze a case with Angelman syndrome (AS) using single nucleotide polymorphism array (SNP array) and explore its genotype-phenotype correlation. G-banded karyotyping and SNP array were performed on a child featuring congenital malformations, intellectual disability and developmental delay. Mendelian error checking based on the SNP information was used to delineate the parental origin of the detected abnormality. The result of the SNP array was validated with fluorescence in situ hybridization (FISH). The SNP array detected a 6.053 Mb deletion at 15q11.2q13.1 (22,770,421-28,823,722) which overlapped with the critical region of AS (type 1). The parents of the child showed no abnormal results for G-banded karyotyping, SNP array and FISH analysis, indicating a de novo origin of the deletion. Mendelian error checking based on the SNP information suggested that the 15q11.2q13.1 deletion was of maternal origin. SNP array can accurately define the size, location and parental origin of chromosomal microdeletions, which may facilitate the diagnosis of AS due to 15q11q13 deletion and better understanding of its genotype-phenotype correlation.
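
    The deletion size follows directly from the reported coordinates, and the idea behind Mendelian error checking can be sketched in a few lines. The function name and the toy trio genotypes below are hypothetical; the rule assumes that SNP inside a hemizygous deletion are called as homozygous in the child, so an allele that a parent cannot have transmitted points to that parent's chromosome as the deleted one.

```python
# Size of the reported deletion, from the coordinates in the abstract.
start, end = 22_770_421, 28_823_722
size_bp = end - start            # 6,053,301 bp
size_mb = size_bp / 1_000_000    # about 6.053 Mb


def deletion_parent_of_origin(trios):
    """Guess which parental chromosome carries a deletion.

    trios: (child, mother, father) genotype strings such as 'AA', 'AB', 'BB'
    for SNP inside the deleted region.
    """
    maternal_errors = paternal_errors = 0
    for child, mother, father in trios:
        if child[0] != child[1]:
            continue  # heterozygous calls are uninformative here
        allele = child[0]
        if allele not in mother:
            maternal_errors += 1  # the child's only allele cannot be maternal
        if allele not in father:
            paternal_errors += 1  # the child's only allele cannot be paternal
    if maternal_errors > paternal_errors:
        return "maternal"
    if paternal_errors > maternal_errors:
        return "paternal"
    return "undetermined"


# Hypothetical genotypes at three SNP within the deleted interval.
toy_trios = [("AA", "BB", "AB"), ("BB", "AA", "BB"), ("AA", "AA", "AA")]
origin = deletion_parent_of_origin(toy_trios)  # "maternal"
```

    Real pipelines would also weigh genotype-call quality and use many more markers; this sketch only shows the counting logic.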

  8. Methods for the design, implementation, and analysis of Illumina Infinium™ SNP assays in plants.

    PubMed

    Chagné, David; Bianco, Luca; Lawley, Cindy; Micheletti, Diego; Jacobs, Jeanne M E

    2015-01-01

    The advent of Next-Generation sequencing-by-synthesis technologies has fuelled SNP discovery, genotyping, and screening of populations in myriad ways for many species, including various plant species. One technique widely applied to screening a large number of SNP markers over a large number of samples is the Illumina Infinium™ assay.

  9. Kernel machine SNP-set analysis for censored survival outcomes in genome-wide association studies.

    PubMed

    Lin, Xinyi; Cai, Tianxi; Wu, Michael C; Zhou, Qian; Liu, Geoffrey; Christiani, David C; Lin, Xihong

    2011-11-01

    In this article, we develop a powerful test for identifying single nucleotide polymorphism (SNP)-sets that are predictive of survival with data from genome-wide association studies. We first group typed SNPs into SNP-sets based on genomic features and then apply a score test to assess the overall effect of each SNP-set on the survival outcome through a kernel machine Cox regression framework. This approach uses genetic information from all SNPs in the SNP-set simultaneously and accounts for linkage disequilibrium (LD), leading to a powerful test with reduced degrees of freedom when the typed SNPs are in LD with each other. This type of test also has the advantage of capturing the potentially nonlinear effects of the SNPs, SNP-SNP interactions (epistasis), and the joint effects of multiple causal variants. By simulating SNP data based on the LD structure of real genes from the HapMap project, we demonstrate that our proposed test is more powerful than the standard single SNP minimum P-value-based test for association studies with censored survival outcomes. We illustrate the proposed test with a real data application. © 2011 Wiley Periodicals, Inc.

  10. A Coordinated Approach to Peach SNP Discovery in RosBREED

    USDA-ARS's Scientific Manuscript database

    In the USDA-funded multi-institutional and trans-disciplinary project, “RosBREED”, crop-specific SNP genome scan platforms are being developed for peach, apple, strawberry, and cherry at a resolution of at least one polymorphic SNP marker every 5 cM in any random cross, for use in Pedigree-Based Ana...

  11. Model, properties and imputation method of missing SNP genotype data utilizing mutual information

    NASA Astrophysics Data System (ADS)

    Wang, Ying; Wan, Weiming; Wang, Rui-Sheng; Feng, Enmin

    2009-07-01

    Mutual information can be used as a measure for the association of a genetic marker or a combination of markers with the phenotype. In this paper, we study the imputation of missing genotype data. We first utilize joint mutual information to compute the dependence between SNP sites, then construct a mathematical model in order to find the two SNP sites having maximal dependence with missing SNP sites, and further study the properties of this model. Finally, an extension method to haplotype-based imputation is proposed to impute the missing values in genotype data. To verify our method, extensive experiments have been performed, and numerical results show that our method is superior to haplotype-based imputation methods. At the same time, numerical results also prove joint mutual information can better measure the dependence between SNP sites. According to experimental results, we also conclude that the dependence between the adjacent SNP sites is not necessarily strongest.
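
    As a rough illustration of the dependence measure discussed in this abstract, the sketch below estimates the mutual information between two SNP sites from observed genotype vectors and picks the two sites most dependent on a target site. It uses pairwise rather than joint mutual information, which is a simplification, and the function names and toy genotypes are assumptions, not the authors' model.

```python
from collections import Counter
from math import log2


def mutual_information(x, y):
    """Empirical mutual information (in bits) between two genotype vectors."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * log2((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )


def two_most_dependent_sites(target, candidates):
    """Indices of the two candidate sites with the highest MI to the target."""
    order = sorted(
        range(len(candidates)),
        key=lambda i: mutual_information(target, candidates[i]),
        reverse=True,
    )
    return order[:2]


# Toy genotypes coded 0/1/2 for six individuals (hypothetical data).
target = [0, 1, 2, 0, 1, 2]
candidates = [
    [0, 0, 0, 1, 1, 1],  # unrelated to the target
    [0, 1, 2, 0, 1, 2],  # identical to the target
    [0, 1, 1, 0, 1, 1],  # partially informative
]
best = two_most_dependent_sites(target, candidates)  # [1, 2]
```

    Note that the selected sites need not be adjacent to the target, which is consistent with the abstract's observation that dependence between adjacent SNP sites is not necessarily the strongest.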

  12. SNP uniqueness problem: a proof-of-principle in HapMap SNPs.

    PubMed

    Doron, Shany; Shweiki, Dorit

    2011-04-01

    SNP-based research strongly affects our biomedical and clinically associated knowledge. Nonunique and false-positive SNP existence in commonly used datasets may thus lead to biased, inaccurate clinically associated conclusions. We designed a computational study to reveal the degree of nonunique/false-positive SNPs in the HapMap dataset. Two sets of SNP flanking sequences were used as queries for BLAT analysis against the human genome. 4.2% and 11.9% of HapMap SNPs align to the genome nonuniquely (long and short, respectively). Furthermore, an average of 7.9% nonunique SNPs are included in common commercial genotyping arrays (according to our designed probes). Nonunique SNPs identified in this study are represented to various degrees in clinically associated databases, stressing the consequence of inaccurate SNP annotation and hence SNP utilization. Unfortunately, our results question some disease-related genotyping analyses, raising a worrisome concern on their validity.

  13. Design and characterization of a 52K SNP chip for goats.

    PubMed

    Tosser-Klopp, Gwenola; Bardou, Philippe; Bouchez, Olivier; Cabau, Cédric; Crooijmans, Richard; Dong, Yang; Donnadieu-Tonon, Cécile; Eggen, André; Heuven, Henri C M; Jamli, Saadiah; Jiken, Abdullah Johari; Klopp, Christophe; Lawley, Cynthia T; McEwan, John; Martin, Patrice; Moreno, Carole R; Mulsant, Philippe; Nabihoudine, Ibouniyamine; Pailhoux, Eric; Palhière, Isabelle; Rupp, Rachel; Sarry, Julien; Sayre, Brian L; Tircazes, Aurélie; Wang, Jun; Wang, Wen; Zhang, Wenguang

    2014-01-01

    The success of Genome Wide Association Studies in the discovery of sequence variation linked to complex traits in humans has increased interest in high throughput SNP genotyping assays in livestock species. Primary goals are QTL detection and genomic selection. The purpose here was design of a 50-60,000 SNP chip for goats. The success of a moderate density SNP assay depends on reliable bioinformatic SNP detection procedures, the technological success rate of the SNP design, even spacing of SNPs on the genome and selection of Minor Allele Frequencies (MAF) suitable to use in diverse breeds. Through the federation of three SNP discovery projects consolidated as the International Goat Genome Consortium, we have identified approximately twelve million high quality SNP variants in the goat genome stored in a database together with their biological and technical characteristics. These SNPs were identified within and between six breeds (meat, milk and mixed): Alpine, Boer, Creole, Katjang, Saanen and Savanna, comprising a total of 97 animals. Whole genome and Reduced Representation Library sequences were aligned on >10 kb scaffolds of the de novo goat genome assembly. The 60,000 selected SNPs, evenly spaced on the goat genome, were submitted for oligo manufacturing (Illumina, Inc) and published in dbSNP along with flanking sequences and map position on goat assemblies (i.e. scaffolds and pseudo-chromosomes), sheep genome V2 and cattle UMD3.1 assembly. Ten breeds were then used to validate the SNP content and 52,295 loci could be successfully genotyped and used to generate a final cluster file. The combined strategy of using mainly whole genome Next Generation Sequencing and mapping on a contig genome assembly, complemented with Illumina design tools proved to be efficient in producing this GoatSNP50 chip. Advances in use of molecular markers are expected to accelerate goat genomic studies in coming years.

  14. A customized pigmentation SNP array identifies a novel SNP associated with melanoma predisposition in the SLC45A2 gene.

    PubMed

    Ibarrola-Villava, Maider; Fernandez, Lara P; Alonso, Santos; Boyano, M Dolores; Peña-Chilet, Maria; Pita, Guillermo; Aviles, Jose A; Mayor, Matias; Gomez-Fernandez, Cristina; Casado, Beatriz; Martin-Gonzalez, Manuel; Izagirre, Neskuts; De la Rua, Concepcion; Asumendi, Aintzane; Perez-Yarza, Gorka; Arroyo-Berdugo, Yoana; Boldo, Enrique; Lozoya, Rafael; Torrijos-Aguilar, Arantxa; Pitarch, Ana; Pitarch, Gerard; Sanchez-Motilla, Jose M; Valcuende-Cavero, Francisca; Tomas-Cabedo, Gloria; Perez-Pastor, Gemma; Diaz-Perez, Jose L; Gardeazabal, Jesus; Martinez de Lizarduy, Iñigo; Sanchez-Diez, Ana; Valdes, Carlos; Pizarro, Angel; Casado, Mariano; Carretero, Gregorio; Botella-Estrada, Rafael; Nagore, Eduardo; Lazaro, Pablo; Lluch, Ana; Benitez, Javier; Martinez-Cadenas, Conrado; Ribas, Gloria

    2011-04-29

    As the incidence of Malignant Melanoma (MM) reflects an interaction between skin colour and UV exposure, variations in genes implicated in pigmentation and tanning response to UV may be associated with susceptibility to MM. In this study, 363 SNPs in 65 gene regions belonging to the pigmentation pathway have been successfully genotyped using a SNP array. Five hundred and ninety MM cases and 507 controls were analyzed in a discovery phase I. Ten candidate SNPs based on a p-value threshold of 0.01 were identified. Two of them, rs35414 (SLC45A2) and rs2069398 (SILV/CDK2), were statistically significant after conservative Bonferroni correction. The best six SNPs were further tested in an independent Spanish series (624 MM cases and 789 controls). A novel SNP located on the SLC45A2 gene (rs35414) was found to be significantly associated with melanoma in both phase I and phase II (P<0.0001). None of the other five SNPs were replicated in this second phase of the study. However, three SNPs in TYR, SILV/CDK2 and ADAMTS20 genes (rs17793678, rs2069398 and rs1510521 respectively) had an overall p-value<0.05 when considering the whole DNA collection (1214 MM cases and 1296 controls). Both the SLC45A2 and the SILV/CDK2 variants behave as protective alleles, while the TYR and ADAMTS20 variants seem to function as risk alleles. Cumulative effects were detected when these four variants were considered together. Furthermore, individuals carrying two or more mutations in MC1R, a well-known low penetrance melanoma-predisposing gene, had a decreased MM risk if concurrently bearing the SLC45A2 protective variant. To our knowledge, this is the largest study on Spanish sporadic MM cases to date.
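    The Bonferroni step mentioned in this abstract can be illustrated with a short sketch. It assumes, purely for illustration, a family-wise alpha of 0.05 and that all 363 genotyped SNPs count as tests; the abstract does not state either value, and the function names are hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def passes_bonferroni(p_value: float, alpha: float, n_tests: int) -> bool:
    """True if a single p-value survives the corrected threshold."""
    return p_value < bonferroni_threshold(alpha, n_tests)


# Assumed values: alpha = 0.05 and 363 tests (one per genotyped SNP).
threshold = bonferroni_threshold(0.05, 363)
print(f"per-SNP threshold: {threshold:.2e}")
```

    Under these assumptions a SNP needs a p-value below roughly 1.4e-4, far stricter than the 0.01 screening threshold used to shortlist candidates.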

  15. Evaluation of the SNP tagging approach in an independent population sample--array-based SNP discovery in Sami.

    PubMed

    Johansson, Asa; Vavruch-Nilsson, Veronika; Cox, David R; Frazer, Kelly A; Gyllensten, Ulf

    2007-09-01

    Significant efforts have been made to determine the correlation structure of common SNPs in the human genome. One method has been to identify the sets of tagSNPs that capture most of the genetic variation. Here, we evaluate the transferability of tagSNPs between populations using a population sample of Sami, the indigenous people of Scandinavia. Array-based SNP discovery in a 4.4 Mb region of 28 phased copies of chromosome 21 uncovered 5,132 segregating sites, 3,188 of which had a minimum minor allele frequency (mMAF) of 0.1. Due to the population structure and consequently high LD, the number of tagSNPs needed to capture all SNP variation in Sami is much lower than that for the HapMap populations. TagSNPs identified from the HapMap data perform only slightly better in the Sami than choosing tagSNPs at random from the same set of common SNPs. Surprisingly, tagSNPs defined from the HapMap data did not perform better than selecting the same number of SNPs at random from all SNPs discovered in Sami. Nearly half (46%) of the Sami SNPs with a mMAF of 0.1 are not present in the HapMap dataset. Among sites overlapping between Sami and HapMap populations, 18% are not tagged by the European American (CEU) HapMap tagSNPs, while 43% of the SNPs that are unique to Sami are not tagged by the CEU tagSNPs. These results point to serious limitations in the transferability of common tagSNPs to capture random sequence variation, even between closely related populations, such as CEU and Sami.

  16. A Customized Pigmentation SNP Array Identifies a Novel SNP Associated with Melanoma Predisposition in the SLC45A2 Gene

    PubMed Central

    Alonso, Santos; Boyano, M. Dolores; Peña-Chilet, Maria; Pita, Guillermo; Aviles, Jose A.; Mayor, Matias; Gomez-Fernandez, Cristina; Casado, Beatriz; Martin-Gonzalez, Manuel; Izagirre, Neskuts; De la Rua, Concepcion; Asumendi, Aintzane; Perez-Yarza, Gorka; Arroyo-Berdugo, Yoana; Boldo, Enrique; Lozoya, Rafael; Torrijos-Aguilar, Arantxa; Pitarch, Ana; Pitarch, Gerard; Sanchez-Motilla, Jose M.; Valcuende-Cavero, Francisca; Tomas-Cabedo, Gloria; Perez-Pastor, Gemma; Diaz-Perez, Jose L.; Gardeazabal, Jesus; de Lizarduy, Iñigo Martinez; Sanchez-Diez, Ana; Valdes, Carlos; Pizarro, Angel; Casado, Mariano; Carretero, Gregorio; Botella-Estrada, Rafael; Nagore, Eduardo; Lazaro, Pablo; Lluch, Ana; Benitez, Javier; Martinez-Cadenas, Conrado; Ribas, Gloria

    2011-01-01

    As the incidence of Malignant Melanoma (MM) reflects an interaction between skin colour and UV exposure, variations in genes implicated in pigmentation and tanning response to UV may be associated with susceptibility to MM. In this study, 363 SNPs in 65 gene regions belonging to the pigmentation pathway have been successfully genotyped using a SNP array. Five hundred and ninety MM cases and 507 controls were analyzed in a discovery phase I. Ten candidate SNPs based on a p-value threshold of 0.01 were identified. Two of them, rs35414 (SLC45A2) and rs2069398 (SILV/CDK2), were statistically significant after conservative Bonferroni correction. The best six SNPs were further tested in an independent Spanish series (624 MM cases and 789 controls). A novel SNP located on the SLC45A2 gene (rs35414) was found to be significantly associated with melanoma in both phase I and phase II (P<0.0001). None of the other five SNPs were replicated in this second phase of the study. However, three SNPs in TYR, SILV/CDK2 and ADAMTS20 genes (rs17793678, rs2069398 and rs1510521 respectively) had an overall p-value<0.05 when considering the whole DNA collection (1214 MM cases and 1296 controls). Both the SLC45A2 and the SILV/CDK2 variants behave as protective alleles, while the TYR and ADAMTS20 variants seem to function as risk alleles. Cumulative effects were detected when these four variants were considered together. Furthermore, individuals carrying two or more mutations in MC1R, a well-known low penetrance melanoma-predisposing gene, had a decreased MM risk if concurrently bearing the SLC45A2 protective variant. To our knowledge, this is the largest study on Spanish sporadic MM cases to date. PMID:21559390

  17. Case-control study on association of peroxisome proliferator-activated receptor-δ and SNP-SNP interactions with essential hypertension in Chinese Han population.

    PubMed

    Li, Yubo; Sun, Guoqiang

    2016-01-01

    The aim of this study was to investigate the association of peroxisome proliferator-activated receptor-δ (PPAR-δ) and additional SNP-SNP interaction with essential hypertension (EH) in the Chinese Han population. A total of 1248 subjects (625 males, 623 females), including 620 EH patients and 628 normotensive subjects, were included in the study. The mean age was 51.2 ± 15.1 years old. A logistic regression model was used to examine the association between four SNPs and EH; odds ratios (OR) and 95% confidence intervals (95%CI) were calculated. Generalized multifactor dimensionality reduction (GMDR) was employed to analyze SNP-SNP interaction. EH risk was significantly lower in carriers of the C allele of the rs2016520 polymorphism than in those with TT (TC + CC versus TT, adjusted OR (95%CI) = 0.61 (0.49-0.78)). In addition, we also found a significant association between rs9794 and EH; EH risk was also significantly lower in carriers of the G allele of the rs9794 polymorphism than in those with CC (CG + GG versus CC, adjusted OR (95%CI) = 0.65 (0.53-0.83)). We also found a potential SNP-SNP interaction between rs2016520 and rs9794; subjects with TC or CC of rs2016520 and CG or GG of rs9794 genotype have the lowest EH risk, compared to subjects with TT of rs2016520 and CC of rs9794 genotype; OR (95%CI) was 0.32 (0.23-0.62) after covariate adjustment. Our results support an important association between the rs2016520 and rs9794 minor alleles of PPAR-δ and decreased risk of EH, and an additional interaction between rs2016520 and rs9794.
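    For readers unfamiliar with the OR (95%CI) notation used here, the sketch below computes an unadjusted odds ratio and its Wald (Woolf) confidence interval from a 2×2 table. The counts are hypothetical toy values, not data from the study, and the study itself reports covariate-adjusted ORs from logistic regression, which this sketch does not reproduce.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    a = carrier cases,     b = carrier controls,
    c = non-carrier cases, d = non-carrier controls.
    z = 1.96 gives an approximate 95% interval.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) by Woolf's method.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical toy counts, chosen only to illustrate the arithmetic.
or_, lo, hi = odds_ratio_ci(a=20, b=40, c=80, d=60)
print(f"OR = {or_:.3f} (95%CI {lo:.2f}-{hi:.2f})")
```

    An interval that lies entirely below 1, as in the abstract's reported results, indicates a lower risk among carriers at the chosen confidence level.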

  18. Dual Effects of a RETN Single Nucleotide Polymorphism (SNP) at -420 on Plasma Resistin: Genotype and DNA Methylation.

    PubMed

    Onuma, Hiroshi; Tabara, Yasuharu; Kawamura, Ryoichi; Ohashi, Jun; Nishida, Wataru; Takata, Yasunori; Ochi, Masaaki; Nishimiya, Tatsuya; Ohyagi, Yasumasa; Kawamoto, Ryuichi; Kohara, Katsuhiko; Miki, Tetsuro; Osawa, Haruhiko

    2017-03-01

    We previously reported that single nucleotide polymorphism (SNP)-420 C>G (rs1862513) in the promoter region of RETN was associated with type 2 diabetes. Plasma resistin was tightly correlated with SNP-420 genotypes. SNP-420 is a CpG-SNP affecting the sequence of cytosine-phosphate-guanine dinucleotides. To examine whether methylation at SNP-420 affects plasma resistin, we analyzed plasma resistin and methylation at RETN SNP-420. Genomic DNA was extracted from peripheral white blood cells in 2078 Japanese subjects. Quantification of the methylation was performed by pyrosequencing after DNA bisulfite conversion. Methylation at SNP-420 was highest in the C/C genotype (36.9 ± 5.7%), followed by C/G (21.4 ± 3.5%) and G/G (2.9 ± 1.4%; P < 0.001). When assessed in each genotype, methylation at SNP-420 was inversely associated with plasma resistin in the C/C (β = -0.134, P < 0.001) or C/G (β = -0.227, P < 0.001) genotype. In THP-1 human monocytes intrinsically having the C/C genotype, a demethylating reagent, 5-aza-dC, decreased the methylation at SNP-420 and increased RETN messenger RNA. SNP+1263 (rs3745369), located in the 3' untranslated region of RETN, was also associated with methylation at SNP-420. In addition, highly sensitive C-reactive protein was inversely associated with methylation at SNP-420 in the C/C genotype, whereas body mass index was positively associated. Plasma resistin was inversely associated with the extent of methylation at SNP-420 mainly dependent on the SNP-420 genotype. The association can also be explained partially independent of SNP-420 genotypes. SNP-420 could have dual, genetic and epigenetic effects on plasma resistin.

  19. Analysis of population structure and genetic history of cattle breeds based on high-density SNP data

    USDA-ARS's Scientific Manuscript database

    Advances in single nucleotide polymorphism (SNP) genotyping microarrays have facilitated a new understanding of population structure and evolutionary history for several species. Most existing studies in livestock were based on low density SNP arrays. The first wave of low density SNP studies on cat...

  20. Exploring of new Y-chromosome SNP loci using Pyrosequencing and the SNaPshot methods.

    PubMed

    Wei, Wei; Luo, Hai-Bo; Yan, Jing; Hou, Yi-Ping

    2012-11-01

    The single nucleotide polymorphisms on the Y chromosome (Y-SNP) have been considered to be important in forensic casework. However, Y-SNP loci were mostly population specific and lacked biallelic polymorphisms in the Asian population. In this study, we developed a strategy for seeking and genotyping new Y-SNP markers based on both Pyrosequencing and the SNaPshot methods. As a result, 34 new biallelic markers were observed to be polymorphic in the Chinese Han population by estimating allele frequencies of 103 candidate Y-SNP loci in DNA pools using Pyrosequencing technology. Then, a multiplex system with 20 Y-SNP loci was genotyped using the SNaPshot™ multiplex kit. The twenty Y-SNP loci defined 56 different haplotypes, and the haplotype diversity was estimated to be 0.9539. Our results demonstrated that the strategy could be used as an efficient tool to search for and genotype biallelic markers from a large number of candidate loci. In addition, the 20 Y-SNP loci formed a multiplex system, which could provide supplementary information for forensic identification.
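    Haplotype diversity of the kind reported above is commonly computed with Nei's unbiased estimator, h = n/(n-1) × (1 - Σ p_i²). The abstract does not say which estimator was used, so the following is a sketch under that assumption, with made-up haplotype counts.

```python
def haplotype_diversity(counts):
    """Nei's unbiased haplotype diversity from per-haplotype counts.

    counts: number of sampled individuals carrying each distinct haplotype.
    """
    n = sum(counts)
    if n < 2:
        raise ValueError("need at least two sampled haplotypes")
    # Sum of squared haplotype frequencies (expected homozygosity).
    sum_sq = sum((c / n) ** 2 for c in counts)
    return n / (n - 1) * (1 - sum_sq)


# Hypothetical sample: four haplotypes seen 5, 3, 1 and 1 times.
example_h = haplotype_diversity([5, 3, 1, 1])
print(round(example_h, 4))
```

    The value approaches 1 when nearly every sampled individual carries a different haplotype and is 0 when all share one.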

  1. Rice SNP-seek database update: new SNPs, indels, and queries.

    PubMed

    Mansueto, Locedie; Fuentes, Roven Rommel; Borja, Frances Nikki; Detras, Jeffery; Abriol-Santos, Juan Miguel; Chebotarov, Dmytro; Sanciangco, Millicent; Palis, Kevin; Copetti, Dario; Poliakov, Alexandre; Dubchak, Inna; Solovyev, Victor; Wing, Rod A; Hamilton, Ruaraidh Sackville; Mauleon, Ramil; McNally, Kenneth L; Alexandrov, Nickolai

    2017-01-04

    We describe updates to the Rice SNP-Seek Database since its first release. We ran a new SNP-calling pipeline followed by filtering that resulted in complete, base, filtered and core SNP datasets. Besides the Nipponbare reference genome, the pipeline was run on genome assemblies of IR 64, 93-11, DJ 123 and Kasalath. New genotype query and display features are added for reference assemblies, SNP datasets and indels. JBrowse now displays BAM, VCF and other annotation tracks, the additional genome assemblies and an embedded VISTA genome comparison viewer. Middleware is redesigned for improved performance by using a hybrid of HDF5 and RDMS for genotype storage. Query modules for genotypes, varieties and genes are improved to handle various constraints. An integrated list manager allows the user to pass query parameters for further analysis. The SNP Annotator adds traits, ontology terms, effects and interactions to markers in a list. Web-service calls were implemented to access most data. These features enable seamless querying of SNP-Seek across various biological entities, a step toward semi-automated gene-trait association discovery. URL: http://snp-seek.irri.org. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  2. Identification, validation and survey of a single nucleotide polymorphism (SNP) associated with pungency in Capsicum spp.

    PubMed

    Garcés-Claver, Ana; Fellman, Shanna Moore; Gil-Ortega, Ramiro; Jahn, Molly; Arnedo-Andrés, María S

    2007-11-01

    A single nucleotide polymorphism (SNP) associated with pungency was detected within an expressed sequence tag (EST) of 307 bp. This fragment was identified after expression analysis of the EST clone SB2-66 in placenta tissue of Capsicum fruits. Sequence alignments corresponding to this new fragment allowed us to identify an SNP between pungent and non-pungent accessions. Two methods were chosen for the development of the SNP marker linked to pungency: tetra-primer amplification refractory mutation system-PCR (tetra-primer ARMS-PCR) and cleaved amplified polymorphic sequence. Results showed that both methods were successful in distinguishing genotypes. Nevertheless, tetra-primer ARMS-PCR was chosen for SNP genotyping because it was more rapid, more reliable and less costly. The utility of this SNP marker for pungency was demonstrated by the ability to distinguish between 29 pungent and non-pungent cultivars of Capsicum annuum. In addition, the SNP was also associated with phenotypic pungent character in the tested genotypes of C. chinense, C. baccatum, C. frutescens, C. galapagoense, C. eximium, C. tovarii and C. cardenasii. This SNP marker is a faster, cheaper and more reproducible method for identifying pungent peppers than other techniques such as panel tasting, and allows rapid screening of the trait in early growth stages.

  3. Rice SNP-seek database update: new SNPs, indels, and queries

    PubMed Central

    Mansueto, Locedie; Fuentes, Roven Rommel; Borja, Frances Nikki; Detras, Jeffery; Abriol-Santos, Juan Miguel; Chebotarov, Dmytro; Sanciangco, Millicent; Palis, Kevin; Copetti, Dario; Poliakov, Alexandre; Dubchak, Inna; Solovyev, Victor; Wing, Rod A.; Hamilton, Ruaraidh Sackville; Mauleon, Ramil; McNally, Kenneth L.; Alexandrov, Nickolai

    2017-01-01

    We describe updates to the Rice SNP-Seek Database since its first release. We ran a new SNP-calling pipeline followed by filtering that resulted in complete, base, filtered and core SNP datasets. Besides the Nipponbare reference genome, the pipeline was run on genome assemblies of IR 64, 93-11, DJ 123 and Kasalath. New genotype query and display features are added for reference assemblies, SNP datasets and indels. JBrowse now displays BAM, VCF and other annotation tracks, the additional genome assemblies and an embedded VISTA genome comparison viewer. Middleware is redesigned for improved performance by using a hybrid of HDF5 and RDMS for genotype storage. Query modules for genotypes, varieties and genes are improved to handle various constraints. An integrated list manager allows the user to pass query parameters for further analysis. The SNP Annotator adds traits, ontology terms, effects and interactions to markers in a list. Web-service calls were implemented to access most data. These features enable seamless querying of SNP-Seek across various biological entities, a step toward semi-automated gene-trait association discovery. URL: http://snp-seek.irri.org. PMID:27899667

  4. Mutagenic primer design for mismatch PCR-RFLP SNP genotyping using a genetic algorithm.

    PubMed

    Yang, Cheng-Hong; Cheng, Yu-Huei; Yang, Cheng-Huei; Chuang, Li-Yeh

    2012-01-01

    Polymerase chain reaction-restriction fragment length polymorphism (PCR-RFLP) is useful in small-scale basic research studies of complex genetic diseases that are associated with single nucleotide polymorphisms (SNPs). Designing a feasible primer pair is an important step before performing PCR-RFLP for SNP genotyping. However, in many cases no restriction enzyme is available to discriminate the target SNP, so a conventional primer design is not applicable. A mutagenic primer is introduced to solve this problem. GA-based Mismatch PCR-RFLP Primers Design (GAMPD) provides a method that uses a genetic algorithm to search for optimal mutagenic primers and available restriction enzymes from REBASE. In order to improve the efficiency of the proposed method, a mutagenic matrix is employed to judge whether a hypothetical mutagenic primer can discriminate the target SNP by digestion with available restriction enzymes. The available restriction enzymes for the target SNP are mined by the updated core of SNP-RFLPing. GAMPD has been used to simulate the SNPs in the human SLC6A4 gene under different parameter settings and compared with SNP Cutter for mismatch PCR-RFLP primer design. The in silico simulation of the proposed GAMPD program showed that it designs mismatch PCR-RFLP primers. The GAMPD program is implemented in JAVA and is freely available at http://bio.kuas.edu.tw/gampd/.

  5. A system for exact and approximate genetic linkage analysis of SNP data in large pedigrees

    PubMed Central

    Silberstein, Mark; Weissbrod, Omer; Otten, Lars; Tzemach, Anna; Anisenia, Andrei; Shtark, Oren; Tuberg, Dvir; Galfrin, Eddie; Gannon, Irena; Shalata, Adel; Borochowitz, Zvi U.; Dechter, Rina; Thompson, Elizabeth; Geiger, Dan

    2013-01-01

    Motivation: The use of dense single nucleotide polymorphism (SNP) data in genetic linkage analysis of large pedigrees is impeded by significant technical, methodological and computational challenges. Here we describe Superlink-Online SNP, a new powerful online system that streamlines the linkage analysis of SNP data. It features a fully integrated flexible processing workflow comprising both well-known and novel data analysis tools, including SNP clustering, erroneous data filtering, exact and approximate LOD calculations and maximum-likelihood haplotyping. The system draws its power from thousands of CPUs, performing data analysis tasks orders of magnitude faster than a single computer. By providing an intuitive interface to sophisticated state-of-the-art analysis tools coupled with high computing capacity, Superlink-Online SNP helps geneticists unleash the potential of SNP data for detecting disease genes. Results: Computations performed by Superlink-Online SNP are automatically parallelized using novel paradigms, and executed on an unlimited number of private or public CPUs. One novel service is large-scale approximate Markov Chain–Monte Carlo (MCMC) analysis. The accuracy of the results is reliably estimated by running the same computation on multiple CPUs and evaluating the Gelman–Rubin Score to set aside unreliable results. Another service within the workflow is a novel parallelized exact algorithm for inferring maximum-likelihood haplotyping. The reported system enables genetic analyses that were previously infeasible. We demonstrate the system capabilities through a study of a large complex pedigree affected with metabolic syndrome. Availability: Superlink-Online SNP is freely available for researchers at http://cbl-hap.cs.technion.ac.il/superlink-snp. The system source code can also be downloaded from the system website. Contact: omerw@cs.technion.ac.il Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23162081
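    The Gelman–Rubin score referred to above compares between-chain and within-chain variance across independent MCMC runs. The sketch below implements the basic textbook form of the statistic for a single scalar quantity; it is not taken from the Superlink-Online SNP source code, and the chains shown are toy values.

```python
from statistics import mean, variance


def gelman_rubin(chains):
    """Basic potential scale reduction factor (R-hat) for one scalar.

    chains: list of equal-length lists, one per independent MCMC run.
    """
    m = len(chains)
    n = len(chains[0])
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need >= 2 chains of equal length >= 2")
    within = mean([variance(c) for c in chains])         # W
    between = n * variance([mean(c) for c in chains])    # B
    if within == 0:
        raise ValueError("within-chain variance is zero")
    pooled = (n - 1) / n * within + between / n
    return (pooled / within) ** 0.5


# Toy chains: two runs that agree, and two that clearly do not.
agree = gelman_rubin([[1, 2, 3, 4], [2, 1, 4, 3]])
disagree = gelman_rubin([[1, 2, 3, 4], [11, 12, 13, 14]])
print(agree, disagree)
```

    Values near 1 suggest the runs have reached the same distribution; values well above 1 (a cutoff such as 1.1 is a common convention, assumed here) flag results to set aside.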

  6. MDM2 Promoter SNP344T>A (rs1196333) Status Does Not Affect Cancer Risk

    PubMed Central

    Knappskog, Stian; Gansmo, Liv B.; Romundstad, Pål; Bjørnslett, Merete; Trovik, Jone; Sommerfelt-Pettersen, Jan; Løkkevik, Erik; Tollenaar, Rob A. E. M.; Seynaeve, Caroline; Devilee, Peter; Salvesen, Helga B.; Dørum, Anne; Hveem, Kristian; Vatten, Lars; Lønning, Per E.

    2012-01-01

    The MDM2 proto-oncogene plays a key role in central cellular processes like growth control and apoptosis, and the gene locus is frequently amplified in sarcomas. Two polymorphisms located in the MDM2 promoter P2 have been shown to affect cancer risk. One of these polymorphisms (SNP309T>G; rs2279744) facilitates Sp1 transcription factor binding to the promoter and is associated with increased cancer risk. In contrast, SNP285G>C (rs117039649), located 24 bp upstream of rs2279744, and in complete linkage disequilibrium with the SNP309G allele, reduces Sp1 recruitment and lowers cancer risk. Thus, fine tuning of MDM2 expression has proven to be of significant importance with respect to tumorigenesis. We assessed the potential functional effects of a third MDM2 promoter P2 polymorphism (SNP344T>A; rs1196333) located on the SNP309T allele. While in silico analyses indicated SNP344A to modulate TFAP2A, SPIB and AP1 transcription factor binding, we found no effect of SNP344 status on MDM2 expression levels. Assessing the frequency of SNP344A in healthy Caucasians (n = 2,954) and patients suffering from ovarian (n = 1,927), breast (n = 1,271), endometrial (n = 895) or prostatic cancer (n = 641), we detected no significant difference in the distribution of this polymorphism between any of these cancer forms and healthy controls (6.1% in healthy controls, and 4.9%, 5.0%, 5.4% and 7.2% in the cancer groups, respectively). In conclusion, our findings provide no evidence indicating that SNP344A may affect MDM2 transcription or cancer risk. PMID:22558411

  7. SNP-SNP interactions between WNT4 and WNT5A were associated with obesity related traits in Han Chinese Population

    PubMed Central

    Dong, Shan-Shan; Hu, Wei-Xin; Yang, Tie-Lin; Chen, Xiao-Feng; Yan, Han; Chen, Xiang-Ding; Tan, Li-Jun; Tian, Qing; Deng, Hong-Wen; Guo, Yan

    2017-01-01

    Considering the biological roles of WNT4 and WNT5A involved in adipogenesis, we aimed to investigate whether SNPs in WNT4 and WNT5A contribute to obesity related traits in Han Chinese population. Targeted genomic sequence for WNT4 and WNT5A was determined in 100 Han Chinese subjects and tag SNPs were selected. Both single SNP and SNP × SNP interaction association analyses with body mass index (BMI) were evaluated in the 100 subjects and another independent sample of 1,627 Han Chinese subjects. Meta-analyses were performed and multiple testing corrections were carried out using the Bonferroni method. Consistent with the Genetic Investigation of ANthropometric Traits (GIANT) dataset results, we didn’t detect significant association signals in single SNP association analyses. However, the interaction between rs2072920 and rs11918967 was associated with BMI after multiple testing corrections (combined P = 2.20 × 10^-4). The signal was also significant in each contributing data set. SNP rs2072920 is located in the 3′-UTR of WNT4 and SNP rs11918967 is located in the intron of WNT5A. Functional annotation results revealed that both SNPs might be involved in transcriptional regulation of gene expression. Our results suggest that a combined effect of SNPs via WNT4-WNT5A interaction may affect the variation of BMI in Han Chinese population. PMID:28272483

  8. Sequential sentinel SNP Regional Association Plots (SSS-RAP): an approach for testing independence of SNP association signals using meta-analysis data.

    PubMed

    Zheng, Jie; Gaunt, Tom R; Day, Ian N M

    2013-01-01

    Genome-Wide Association Studies (GWAS) frequently incorporate meta-analysis within their framework. However, conditional analysis of individual-level data, which is an established approach for fine mapping of causal sites, is often precluded where only group-level summary data are available for analysis. Here, we present a numerical and graphical approach, "sequential sentinel SNP regional association plot" (SSS-RAP), which estimates regression coefficients (beta) with their standard errors using the meta-analysis summary results directly. Under an additive model, typical for genes with small effect, the effect for a sentinel SNP can be transformed to the predicted effect for a possibly dependent SNP through a 2×2 2-SNP haplotypes table. The approach assumes Hardy-Weinberg equilibrium for test SNPs. SSS-RAP is available as a Web-tool (http://apps.biocompute.org.uk/sssrap/sssrap.cgi). To develop and illustrate SSS-RAP we analyzed lipid and ECG traits data from the British Women's Heart and Health Study (BWHHS), evaluated a meta-analysis for ECG trait and presented several simulations. We compared results with existing approaches such as model selection methods and conditional analysis. Generally findings were consistent. SSS-RAP represents a tool for testing independence of SNP association signals using meta-analysis data, and is also a convenient approach based on biological principles for fine mapping in group level summary data. © 2012 Blackwell Publishing Ltd/University College London.

  9. Performance comparison of SNP detection tools with illumina exome sequencing data—an assessment using both family pedigree information and sample-matched SNP array data

    PubMed Central

    Yi, Ming; Zhao, Yongmei; Jia, Li; He, Mei; Kebebew, Electron; Stephens, Robert M.

    2014-01-01

    To apply exome-seq-derived variants in the clinical setting, there is an urgent need to identify the best variant caller(s) from a large collection of available options. We have used an Illumina exome-seq dataset as a benchmark, with two validation scenarios—family pedigree information and SNP array data for the same samples, permitting global high-throughput cross-validation, to evaluate the quality of SNP calls derived from several popular variant discovery tools from both the open-source and commercial communities using a set of designated quality metrics. To the best of our knowledge, this is the first large-scale performance comparison of exome-seq variant discovery tools using high-throughput validation with both Mendelian inheritance checking and SNP array data, which allows us to gain insights into the accuracy of SNP calling through such high-throughput validation in an unprecedented way, whereas the previously reported comparison studies have only assessed concordance of these tools without directly assessing the quality of the derived SNPs. More importantly, the main purpose of our study was to establish a reusable procedure that applies high-throughput validation to compare the quality of SNP discovery tools with a focus on exome-seq, which can be used to compare any forthcoming tool(s) of interest. PMID:24831545

  10. Imputation of KIR Types from SNP Variation Data

    PubMed Central

    Vukcevic, Damjan; Traherne, James A.; Næss, Sigrid; Ellinghaus, Eva; Kamatani, Yoichiro; Dilthey, Alexander; Lathrop, Mark; Karlsen, Tom H.; Franke, Andre; Moffatt, Miriam; Cookson, William; Trowsdale, John; McVean, Gil; Sawcer, Stephen; Leslie, Stephen

    2015-01-01

    Large population studies of immune system genes are essential for characterizing their role in diseases, including autoimmune conditions. Of key interest are a group of genes encoding the killer cell immunoglobulin-like receptors (KIRs), which have known and hypothesized roles in autoimmune diseases, resistance to viruses, reproductive conditions, and cancer. These genes are highly polymorphic, which makes typing expensive and time consuming. Consequently, despite their importance, KIRs have been little studied in large cohorts. Statistical imputation methods developed for other complex loci (e.g., human leukocyte antigen [HLA]) on the basis of SNP data provide an inexpensive high-throughput alternative to direct laboratory typing of these loci and have enabled important findings and insights for many diseases. We present KIR∗IMP, a method for imputation of KIR copy number. We show that KIR∗IMP is highly accurate and thus allows the study of KIRs in large cohorts and enables detailed investigation of the role of KIRs in human disease. PMID:26430804

  11. Rapid SNP Detection and Genotyping of Bacterial Pathogens by Pyrosequencing.

    PubMed

    Amoako, Kingsley K; Thomas, Matthew C; Janzen, Timothy W; Goji, Noriko

    2017-01-01

    Bacterial identification and typing are fixtures of microbiology laboratories and are vital aspects of our response mechanisms in the event of foodborne outbreaks and bioterrorist events. Whole genome sequencing (WGS) is leading the way in terms of expanding our ability to identify and characterize bacteria through the identification of subtle differences between genomes (e.g. single nucleotide polymorphisms (SNPs) and insertions/deletions). Modern high-throughput technologies such as pyrosequencing can facilitate the typing of bacteria by generating short-read sequence data of informative regions identified by WGS analyses, at a fraction of the cost of WGS. Thus, pyrosequencing systems remain a valuable asset in the laboratory today. Presented in this chapter are two methods developed in the Amoako laboratory that detail the identification and genotyping of bacterial pathogens. The first targets canonical single nucleotide polymorphisms (canSNPs) of evolutionary importance in Bacillus anthracis, the causative agent of Anthrax. The second assay detects Shiga-toxin (stx) genes, which are associated with virulence in Escherichia coli and Shigella spp., and differentiates the subtypes of stx-1 and stx-2 based on SNP loci. These rapid methods provide end users with important information regarding virulence traits as well as the evolutionary and biogeographic origin of isolates.

  12. Using Mendelian inheritance to improve high-throughput SNP discovery.

    PubMed

    Chen, Nancy; Van Hout, Cristopher V; Gottipati, Srikanth; Clark, Andrew G

    2014-11-01

    Restriction site-associated DNA sequencing or genotyping-by-sequencing (GBS) approaches allow for rapid and cost-effective discovery and genotyping of thousands of single-nucleotide polymorphisms (SNPs) in multiple individuals. However, rigorous quality control practices are needed to avoid high levels of error and bias with these reduced representation methods. We developed a formal statistical framework for filtering spurious loci, using Mendelian inheritance patterns in nuclear families, that accommodates variable-quality genotype calls and missing data--both rampant issues with GBS data--and for identifying sex-linked SNPs. Simulations predict excellent performance of both the Mendelian filter and the sex-linkage assignment under a variety of conditions. We further evaluate our method by applying it to real GBS data and validating a subset of high-quality SNPs. These results demonstrate that our metric of Mendelian inheritance is a powerful quality filter for GBS loci that is complementary to standard coverage and Hardy-Weinberg filters. The described method, implemented in the software MendelChecker, will improve quality control during SNP discovery in nonmodel as well as model organisms.
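    The idea of using Mendelian inheritance as a quality filter can be shown with a deliberately simplified check on hard genotype calls in a parent-offspring trio. This is not the MendelChecker method, which according to the abstract is a statistical framework that handles variable-quality calls and missing data; the function name and genotype encoding below are hypothetical, and only autosomal biallelic loci are considered.

```python
from itertools import product


def is_mendelian_consistent(father: str, mother: str, child: str) -> bool:
    """True if the child's genotype can arise from one allele per parent.

    Genotypes are two-character strings of alleles, e.g. "AG".
    Assumes an autosomal locus and error-free, non-missing calls.
    """
    child_alleles = sorted(child)
    # Try every combination of one paternal and one maternal allele.
    return any(
        sorted((paternal, maternal)) == child_alleles
        for paternal, maternal in product(father, mother)
    )


# A locus where the child carries an allele neither parent has
# would be flagged as a likely genotyping error.
print(is_mendelian_consistent("AA", "GG", "AG"))
print(is_mendelian_consistent("AA", "AA", "AG"))
```

    Loci that fail such a check in many families are candidates for removal before downstream analysis.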

  13. Porcine colonization of the Americas: a 60k SNP story

    PubMed Central

    Burgos-Paz, W; Souza, C A; Megens, H J; Ramayo-Caldas, Y; Melo, M; Lemús-Flores, C; Caal, E; Soto, H W; Martínez, R; Álvarez, L A; Aguirre, L; Iñiguez, V; Revidatti, M A; Martínez-López, O R; Llambi, S; Esteve-Codina, A; Rodríguez, M C; Crooijmans, R P M A; Paiva, S R; Schook, L B; Groenen, M A M; Pérez-Enciso, M

    2013-01-01

    The pig, Sus scrofa, is a foreign species to the American continent. Although pigs originally introduced in the Americas should be related to those from the Iberian Peninsula and Canary islands, the phylogeny of current creole pigs that now populate the continent is likely to be very complex. Because of the extreme climates that America harbors, these populations also provide a unique example of a fast evolutionary phenomenon of adaptation. Here, we provide a genome wide study of these issues by genotyping, with a 60k SNP chip, 206 village pigs sampled across 14 countries and 183 pigs from outgroup breeds that are potential founders of the American populations, including wild boar, Iberian, international and Chinese breeds. Results show that American village pigs are primarily of European ancestry, although the observed genetic landscape is that of a complex conglomerate. There was no correlation between genetic and geographical distances, neither continent wide nor when analyzing specific areas. Most populations showed a clear admixed structure where the Iberian pig was not necessarily the main component, illustrating how international breeds, but also Chinese pigs, have contributed to extant genetic composition of American village pigs. We also observe that many genes related to the cardiovascular system show an increased differentiation between altiplano and genetically related pigs living near sea level. PMID:23250008

  14. Porcine colonization of the Americas: a 60k SNP story.

    PubMed

    Burgos-Paz, W; Souza, C A; Megens, H J; Ramayo-Caldas, Y; Melo, M; Lemús-Flores, C; Caal, E; Soto, H W; Martínez, R; Alvarez, L A; Aguirre, L; Iñiguez, V; Revidatti, M A; Martínez-López, O R; Llambi, S; Esteve-Codina, A; Rodríguez, M C; Crooijmans, R P M A; Paiva, S R; Schook, L B; Groenen, M A M; Pérez-Enciso, M

    2013-04-01

    The pig, Sus scrofa, is a foreign species to the American continent. Although pigs originally introduced in the Americas should be related to those from the Iberian Peninsula and Canary islands, the phylogeny of current creole pigs that now populate the continent is likely to be very complex. Because of the extreme climates that America harbors, these populations also provide a unique example of a fast evolutionary phenomenon of adaptation. Here, we provide a genome wide study of these issues by genotyping, with a 60k SNP chip, 206 village pigs sampled across 14 countries and 183 pigs from outgroup breeds that are potential founders of the American populations, including wild boar, Iberian, international and Chinese breeds. Results show that American village pigs are primarily of European ancestry, although the observed genetic landscape is that of a complex conglomerate. There was no correlation between genetic and geographical distances, neither continent wide nor when analyzing specific areas. Most populations showed a clear admixed structure where the Iberian pig was not necessarily the main component, illustrating how international breeds, but also Chinese pigs, have contributed to extant genetic composition of American village pigs. We also observe that many genes related to the cardiovascular system show an increased differentiation between altiplano and genetically related pigs living near sea level.

  15. Single Nucleotide Polymorphism (SNP)-Strings: An Alternative Method for Assessing Genetic Associations

    PubMed Central

    Goodin, Douglas S.; Khankhanian, Pouya

    2014-01-01

    Background Genome-wide association studies (GWAS) identify disease-associations for single-nucleotide-polymorphisms (SNPs) from scattered genomic-locations. However, SNPs frequently reside on several different SNP-haplotypes, only some of which may be disease-associated. This circumstance lowers the observed odds-ratio for disease-association. Methodology/Principal Findings Here we develop a method to identify the two SNP-haplotypes, which combine to produce each person’s SNP-genotype over specified chromosomal segments. Two multiple sclerosis (MS)-associated genetic regions were modeled; DRB1 (a Class II molecule of the major histocompatibility complex) and MMEL1 (an endopeptidase that degrades both neuropeptides and β-amyloid). For each locus, we considered sets of eleven adjacent SNPs, surrounding the putative disease-associated gene and spanning ∼200 kb of DNA. The SNP-information was converted into an ordered-set of eleven-numbers (subject-vectors) based on whether a person had zero, one, or two copies of particular SNP-variant at each sequential SNP-location. SNP-strings were defined as those ordered-combinations of eleven-numbers (0 or 1), representing a haplotype, two of which combined to form the observed subject-vector. Subject-vectors were resolved using probabilistic methods. In both regions, only a small number of SNP-strings were present. We compared our method to the SHAPEIT-2 phasing-algorithm. When the SNP-information spanning 200 kb was used, SHAPEIT-2 was inaccurate. When the SHAPEIT-2 window was increased to 2,000 kb, the concordance between the two methods, in both of these eleven-SNP regions, was over 99%, suggesting that, in these regions, both methods were quite accurate. Nevertheless, correspondence was not uniformly high over the entire DNA-span but, rather, was characterized by alternating peaks and valleys of concordance. Moreover, in the valleys of poor-correspondence, SHAPEIT-2 was also inconsistent with itself, suggesting that
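
    The relationship between a subject-vector and its two SNP-strings can be made concrete with a small enumeration. This is only a sketch of the combinatorial step: it lists every unordered pair of 0/1 strings that sums to a given 0/1/2 vector, whereas the abstract states that the actual resolution used probabilistic methods. The function name is hypothetical.

```python
from itertools import product
from typing import List, Sequence, Tuple

SnpString = Tuple[int, ...]


def compatible_string_pairs(
    subject_vector: Sequence[int],
) -> List[Tuple[SnpString, SnpString]]:
    """All unordered pairs of 0/1 strings whose element-wise sum is the vector."""
    if any(g not in (0, 1, 2) for g in subject_vector):
        raise ValueError("entries must be 0, 1 or 2 copies of the variant")
    het_sites = [i for i, g in enumerate(subject_vector) if g == 1]
    # Homozygous sites are fixed: 0 -> (0, 0) and 2 -> (1, 1).
    base = [g // 2 for g in subject_vector]
    pairs = set()
    for choice in product((0, 1), repeat=len(het_sites)):
        h1, h2 = list(base), list(base)
        for site, allele in zip(het_sites, choice):
            h1[site] = allele
            h2[site] = 1 - allele
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)
```

    A vector with k heterozygous positions (k >= 1) has 2^(k-1) candidate pairs, so the ambiguity grows quickly with heterozygosity; this is the ambiguity a phasing method has to resolve.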

  16. Impact of the PDE4D gene polymorphism and additional SNP-SNP and gene-smoking interaction on ischemic stroke risk in Chinese Han population.

    PubMed

    Wang, Xianxiang; Sun, Zhongwu; Zhang, Yiquan; Tian, Xuefeng; Li, Qingxin; Luo, Jing

    2017-04-01

    To investigate the association between phosphodiesterase 4D (PDE4D) gene single nucleotide polymorphisms (SNPs) and ischemic stroke (IS) risk, and the impact of additional SNP-SNP and gene-smoking interactions on IS risk in a Chinese population. A total of 1228 subjects (666 males, 562 females) were selected, including 610 IS patients and 618 control subjects. A logistic regression model was used to examine the association between SNPs in the PDE4D gene and IS risk. Generalized multifactor dimensionality reduction (GMDR) was employed to analyze the SNP-SNP and gene-smoking interactions. IS risks were significantly higher in carriers of the A allele of the rs12188950 polymorphism than in those with the GG genotype (GA + AA vs. GG), adjusted OR (95%CI) = 1.61 (1.26-2.19), and also significantly higher in carriers of the T allele of the rs966221 polymorphism than in those with CC (CT + TT vs. CC), adjusted OR (95%CI) = 1.82 (1.39-2.23). We found a significant SNP-SNP interaction between rs966221 and rs12188950. Subjects with CT or TT of rs966221 and GA or AA of rs12188950 had the highest IS risk compared to subjects with CC of rs966221 and GG of rs12188950; OR (95%CI) was 3.52 (2.68-4.69). We also found a significant gene-environment interaction between rs966221 and smoking. Smokers with the CT or TT genotype of rs966221 had the highest IS risk compared to never smokers with the CC genotype; OR (95%CI) was 3.97 (2.25-5.71). Our results support an important association of the rs966221 and rs12188950 minor alleles, and of their interaction, with increased IS risk, as well as an additional interaction between rs966221 and smoking.

  17. Development and Evaluation of a 9K SNP Array for Peach by Internationally Coordinated SNP Detection and Validation in Breeding Germplasm

    PubMed Central

    Scalabrin, Simone; Gilmore, Barbara; Lawley, Cynthia T.; Gasic, Ksenija; Micheletti, Diego; Rosyara, Umesh R.; Cattonaro, Federica; Vendramin, Elisa; Main, Dorrie; Aramini, Valeria; Blas, Andrea L.; Mockler, Todd C.; Bryant, Douglas W.; Wilhelm, Larry; Troggio, Michela; Sosinski, Bryon; Aranzana, Maria José; Arús, Pere; Iezzoni, Amy; Morgante, Michele; Peace, Cameron

    2012-01-01

    Although a large number of single nucleotide polymorphism (SNP) markers covering the entire genome are needed to enable molecular breeding efforts such as genome wide association studies, fine mapping, genomic selection and marker-assisted selection in peach [Prunus persica (L.) Batsch] and related Prunus species, only a limited number of genetic markers, including simple sequence repeats (SSRs), have been available to date. To address this need, an international consortium (The International Peach SNP Consortium; IPSC) has pursued a coordinated effort to perform genome-scale SNP discovery in peach using next generation sequencing platforms to develop and characterize a high-throughput Illumina Infinium® SNP genotyping array platform. We performed whole genome re-sequencing of 56 peach breeding accessions using the Illumina and Roche/454 sequencing technologies. Polymorphism detection algorithms identified a total of 1,022,354 SNPs. Validation with the Illumina GoldenGate® assay was performed on a subset of the predicted SNPs, verifying ∼75% of genic (exonic and intronic) SNPs, whereas only about a third of intergenic SNPs were verified. Conservative filtering was applied to arrive at a set of 8,144 SNPs that were included on the IPSC peach SNP array v1, distributed over all eight peach chromosomes with an average spacing of 26.7 kb between SNPs. Use of this platform to screen a total of 709 accessions of peach in two separate evaluation panels identified a total of 6,869 (84.3%) polymorphic SNPs. The almost 7,000 SNPs verified as polymorphic through extensive empirical evaluation represent an excellent source of markers for future studies in genetic relatedness, genetic mapping, and dissecting the genetic architecture of complex agricultural traits. The IPSC peach SNP array v1 is commercially available and we expect that it will be used worldwide for genetic studies in peach and related stone fruit and nut species. PMID:22536421

  18. Interim report on updated microarray probes for the LLNL Burkholderia pseudomallei SNP array

    SciTech Connect

    Gardner, S; Jaing, C

    2012-03-27

    The overall goal of this project is to forensically characterize 100 unknown Burkholderia isolates in the US-Australia collaboration. We will identify genome-wide single nucleotide polymorphisms (SNPs) from B. pseudomallei and near neighbor species including B. mallei, B. thailandensis and B. oklahomensis. We will design microarray probes to detect these SNP markers and analyze 100 Burkholderia genomic DNAs extracted from environmental, clinical and near neighbor isolates from Australian collaborators on the Burkholderia SNP microarray. We will analyze the microarray genotyping results to characterize the genetic diversity of these new isolates and triage the samples for whole genome sequencing. In this interim report, we describe the SNP analysis and the microarray probe design for the Burkholderia SNP microarray.

  19. SNP ascertainment bias in population genetic analyses: Why it is important, and how to correct it

    PubMed Central

    Lachance, Joseph; Tishkoff, Sarah A.

    2013-01-01

    Summary Whole genome sequencing and SNP genotyping arrays can paint strikingly different pictures of demographic history and natural selection. This is because genotyping arrays contain biased sets of pre-ascertained SNPs. In this short review, we use comparisons between high-coverage whole genome sequences of African hunter-gatherers and data from genotyping arrays to highlight how SNP ascertainment bias distorts population genetic inferences. Sample sizes and the populations in which SNPs are discovered affect the characteristics of observed variants. We find that SNPs on genotyping arrays tend to be older and present in multiple populations. In addition, genotyping arrays cause allele frequency distributions to be shifted towards intermediate frequency alleles, and estimates of linkage disequilibrium are modified. Since population genetic analyses depend on allele frequencies it is imperative that researchers are aware of the effects of SNP ascertainment bias. With this in mind we describe multiple ways to correct for SNP ascertainment bias. PMID:23836388

  20. An overview of SNP interactions in genome-wide association studies.

    PubMed

    Li, Pei; Guo, Maozu; Wang, Chunyu; Liu, Xiaoyan; Zou, Quan

    2015-03-01

    With the recent explosion in high-throughput genotyping technology, the amount and quality of single-nucleotide polymorphism (SNP) data has increased exponentially. Therefore, the identification of SNP interactions that are associated with common diseases is playing an increasingly important role in interpreting the genetic basis of disease susceptibility and in devising new diagnostic tests and treatments. However, because these data sets are large yet typically have small sample sizes and low signal-to-noise ratios, there has been no major breakthrough despite many efforts, and the problem remains a major focus in the field of bioinformatics. In this article, we review the two main aspects of SNP interaction studies in recent years, namely the simulation and the identification of SNP interactions, and then discuss the principles, efficiency and differences between these methods. © The Author 2014. Published by Oxford University Press. All rights reserved. For permissions, please email: journals.permissions@oup.com.

  1. SNP discovery and genotyping using Genotyping-by-Sequencing in Pekin ducks

    PubMed Central

    Zhu, Feng; Cui, Qian-Qian; Hou, Zhuo-Cheng

    2016-01-01

    Genomic selection and genome-wide association studies need thousands to millions of SNPs. However, many non-model species do not have reference chips for detecting variation. Our goal was to develop and validate an inexpensive but effective method for detecting SNP variation. Genotyping by sequencing (GBS) can be a highly efficient strategy for genome-wide SNP detection, as an alternative to microarray chips. Here, we developed a GBS protocol for ducks and tested it to genotype 49 Pekin ducks. A total of 169,209 SNPs were identified from all animals, with a mean of 55,920 SNPs per individual. The average SNP density reached 1156 SNPs/Mb. In this study, the first application of GBS to ducks, we demonstrate the power and simplicity of this method. GBS can be used in genetic studies as an effective method for genome-wide SNP discovery. PMID:27845353

  2. A user guide to the Brassica 60K Illumina Infinium™ SNP genotyping array.

    PubMed

    Mason, Annaliese S; Higgins, Erin E; Snowdon, Rod J; Batley, Jacqueline; Stein, Anna; Werner, Christian; Parkin, Isobel A P

    2017-04-01

    The Brassica napus 60K Illumina Infinium™ SNP array has had huge international uptake in the rapeseed community due to the revolutionary speed of acquisition and ease of analysis of this high-throughput genotyping data, particularly when coupled with the newly available reference genome sequence. However, further utilization of this valuable resource can be optimized by better understanding the promises and pitfalls of SNP arrays. We outline how best to analyze Brassica SNP marker array data for diverse applications, including linkage and association mapping, genetic diversity and genomic introgression studies. We present data on which SNPs are locus-specific in winter, semi-winter and spring B. napus germplasm pools, rather than amplifying both an A-genome and a C-genome locus or multiple loci. Common issues that arise when analyzing array data will be discussed, particularly those unique to SNP markers, along with how to deal with them in practical Brassica breeding applications.

  3. Set up of cutoff thresholds for kinship determination using SNP loci.

    PubMed

    Cho, Sohee; Shin, Eun Soon; Yu, Hyung Jin; Lee, Ji Hyun; Seo, Hee Jin; Kim, Moon Young; Lee, Soong Deok

    2017-03-08

    The usefulness of single nucleotide polymorphism (SNP) loci for kinship testing has been demonstrated in many casework examples, and SNPs have been suggested as promising markers for relationship identification. For interpreting results based on the calculation of the likelihood ratio (LR) in kinship testing, it is important to prepare cutoffs for the respective relatives, which depend on genetic relatedness. For this, analysis using true pedigree data is significant and reliable as it reflects the actual frequencies of markers in the population. In this study, the kinship index was explored through 1209 parent-child pairs, 1373 full sibling pairs, and 247 uncle-nephew pairs using 136 SNP loci. The cutoffs for LR were set up using different numbers of SNP loci with accuracy, sensitivity, and specificity. It is expected that this study can support the application of SNP loci-based kinship testing for various relationships.
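
    To make the likelihood ratio concrete, here is a minimal sketch of a parent-child versus unrelated LR for biallelic SNPs. The abstract does not state the formula the authors used; the version below is the standard textbook form and assumes Hardy-Weinberg equilibrium, independent loci, and no mutation or genotyping error. Genotypes are counts of allele A (0, 1 or 2), p is the population frequency of A, and all function names are hypothetical. No cutoff values are shown because the abstract reports none.

```python
from math import prod
from typing import Iterable, Tuple


def hwe_prob(genotype: int, p: float) -> float:
    """Genotype probability for an unrelated individual under Hardy-Weinberg."""
    q = 1.0 - p
    return (q * q, 2.0 * p * q, p * p)[genotype]


def parent_child_prob(parent: int, child: int, p: float) -> float:
    """P(child genotype | parent genotype) for a true parent-child pair."""
    t = parent / 2.0  # probability that the parent transmits allele A
    q = 1.0 - p       # the other allele is drawn from the population
    return ((1.0 - t) * q, t * q + (1.0 - t) * p, t * p)[child]


def parent_child_lr(parent: int, child: int, p: float) -> float:
    """Per-locus LR: parent-child hypothesis versus unrelated hypothesis."""
    if not 0.0 < p < 1.0:
        raise ValueError("allele frequency must lie strictly between 0 and 1")
    return parent_child_prob(parent, child, p) / hwe_prob(child, p)


def combined_lr(loci: Iterable[Tuple[int, int, float]]) -> float:
    """Product of per-locus LRs over (parent, child, p) triples."""
    return prod(parent_child_lr(g_p, g_c, p) for g_p, g_c, p in loci)
```

    An opposite-homozygote pair gives a per-locus LR of zero under these assumptions, which is why practical implementations usually add an error or mutation term before applying a cutoff.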

  4. Identification of SNP Haplotypes and Prospects of Association Mapping in Watermelon

    USDA-ARS's Scientific Manuscript database

    Watermelon is the fifth most economically important vegetable crop cultivated world-wide. Implementing Single Nucleotide Polymorphism (SNP) marker technology in watermelon breeding and germplasm evaluation programs holds a key to improve horticulturally important traits. Next-generation sequencing...

  5. Gene-Environment Interaction in the Etiology of Mathematical Ability Using SNP Sets

    PubMed Central

    Kovas, Yulia; Plomin, Robert

    2010-01-01

    Mathematics ability and disability are as heritable as other cognitive abilities and disabilities; however, their genetic etiology has received relatively little attention. In our recent genome-wide association study of mathematical ability in 10-year-old children, 10 SNP associations were nominated from scans of pooled DNA and validated in an individually genotyped sample. In this paper, we use a ‘SNP set’ composite of these 10 SNPs to investigate gene-environment (GE) interaction, examining whether the association between the 10-SNP set and mathematical ability differs as a function of ten environmental measures in the home and school in a sample of 1888 children with complete data. We found two significant GE interactions for environmental measures in the home and the school, both in the direction of the diathesis-stress type of GE interaction: the 10-SNP set was more strongly associated with mathematical ability in chaotic homes and when parents are negative. PMID:20978832

  6. Use of molecular variation in the NCBI dbSNP database.

    PubMed

    Sherry, S T; Ward, M; Sirotkin, K

    2000-01-01

    While high quality information regarding variation in genes is currently available in locus-specific or specialized mutation databases, the need remains for a general catalog of genome variation to address the large-scale sampling designs required by association studies, gene mapping, and evolutionary biology. In response to this need, the National Center for Biotechnology Information (NCBI) has established the dbSNP database (http://ncbi.nlm.nih.gov/SNP/) to serve as a generalized, central variation database. Submissions to dbSNP will be integrated with other sources of information at NCBI such as GenBank, PubMed, LocusLink, and the Human Genome Project data, and the complete contents of dbSNP are available to the public via anonymous FTP. Hum Mutat 15:68-75, 2000. Published 2000 Wiley-Liss, Inc.

  7. Methods of tagSNP selection and other variables affecting imputation accuracy in swine

    PubMed Central

    2013-01-01

    Background Genotype imputation is a cost efficient alternative to use of high density genotypes for implementing genomic selection. The objective of this study was to investigate variables affecting imputation accuracy from low density tagSNP (average distance between tagSNP from 100kb to 1Mb) sets in swine, selected using LD information, physical location, or accuracy for genotype imputation. We compared results of imputation accuracy based on several sets of low density tagSNP of varying densities and selected using three different methods. In addition, we assessed the effect of varying size and composition of the reference panel of haplotypes used for imputation. Results TagSNP density of at least 1 tagSNP per 340kb (∼7000 tagSNP) selected using pairwise LD information was necessary to achieve average imputation accuracy higher than 0.95. A commercial low density (9K) tagSNP set for swine was developed concurrent to this study and an average accuracy of imputation of 0.951 based on these tagSNP was estimated. Construction of a haplotype reference panel was most efficient when these haplotypes were obtained from randomly sampled individuals. Increasing the size of the original reference haplotype panel (128 haplotypes sampled from 32 sire/dam/offspring trios phased in a previous study) led to an overall increase in imputation accuracy (IA = 0.97 with 512 haplotypes), but was especially useful in increasing imputation accuracy of SNP with MAF below 0.1 and for SNP located in the chromosomal extremes (within 5% of chromosome end). Conclusion The new commercially available 9K tagSNP set can be used to obtain imputed genotypes with high accuracy, even when imputation is based on a comparably small panel of reference haplotypes (128 haplotypes). Average imputation accuracy can be further increased by adding haplotypes to the reference panel. In addition, our results show that randomly sampling individuals to genotype for the construction of a reference haplotype

  8. Longitudinal SNP-set association analysis of quantitative phenotypes.

    PubMed

    Wang, Zhong; Xu, Ke; Zhang, Xinyu; Wu, Xiaowei; Wang, Zuoheng

    2017-01-01

    Many genetic epidemiological studies collect repeated measurements over time. This design not only provides a more accurate assessment of disease condition, but allows us to explore the genetic influence on disease development and progression. Thus, it is of great interest to study the longitudinal contribution of genes to disease susceptibility. Most association testing methods for longitudinal phenotypes are developed for single variant, and may have limited power to detect association, especially for variants with low minor allele frequency. We propose Longitudinal SNP-set/sequence kernel association test (LSKAT), a robust, mixed-effects method for association testing of rare and common variants with longitudinal quantitative phenotypes. LSKAT uses several random effects to account for the within-subject correlation in longitudinal data, and allows for adjustment for both static and time-varying covariates. We also present a longitudinal trait burden test (LBT), where we test association between the trait and the burden score in linear mixed models. In simulation studies, we demonstrate that LBT achieves high power when variants are almost all deleterious or all protective, while LSKAT performs well in a wide range of genetic models. By making full use of trait values from repeated measures, LSKAT is more powerful than several tests applied to a single measurement or average over all time points. Moreover, LSKAT is robust to misspecification of the covariance structure. We apply the LSKAT and LBT methods to detect association with longitudinally measured body mass index in the Framingham Heart Study, where we are able to replicate association with a circadian gene NR1D2. © 2016 WILEY PERIODICALS, INC.

  9. Prim-SNPing: a primer designer for cost-effective SNP genotyping.

    PubMed

    Chang, Hsueh-Wei; Chuang, Li-Yeh; Cheng, Yu-Huei; Hung, Yu-Chen; Wen, Cheng-Hao; Gu, De-Leung; Yang, Cheng-Hong

    2009-05-01

    Many kinds of primer design (PD) software tools have been developed, but most of them lack a single nucleotide polymorphism (SNP) genotyping service. Here, we introduce the web-based freeware "Prim-SNPing," which, in addition to general PD, provides three kinds of primer design functions for cost-effective SNP genotyping: natural PD, mutagenic PD, and confronting two-pair primers (CTPP) PD. The natural PD and mutagenic PD provide primers and restriction enzyme mining for polymerase chain reaction-restriction fragment length polymorphism (PCR-RFLP), while CTPP PD provides primers for restriction enzyme-free SNP genotyping. The PCR specificity and efficiency of the designed primers are improved by BLAST searching and evaluating secondary structure (such as GC clamps, dimers, and hairpins), respectively. The length pattern of PCR-RFLP using natural PD is user-adjustable, and the restriction sites of the RFLP enzymes provided by Prim-SNPing are confirmed to be absent within the generated PCR product. In CTPP PD, the need for a separate digestion step in RFLP is eliminated, thus making it faster and cheaper. The output of Prim-SNPing includes the primer list, melting temperature (Tm) value, GC percentage, and amplicon size with enzyme digestion information. The reference SNP (refSNP, or rs) clusters from the Single Nucleotide Polymorphism database (dbSNP) at the National Center for Biotechnology Information (NCBI), and multiple other formats of human, mouse, and rat SNP sequences are acceptable input. In summary, Prim-SNPing provides interactive, user-friendly and cost-effective primer design for SNP genotyping. It is freely available at http://bio.kuas.edu.tw/prim-snping.
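
    Two of the primer properties listed in the output, GC percentage and Tm, are simple to compute by hand. The sketch below is illustrative only: the abstract does not say which Tm formula Prim-SNPing uses, so the Wallace rule (2 °C per A/T, 4 °C per G/C, a rough estimate for short oligonucleotides) is an assumption, and the function names are hypothetical.

```python
def _validated(primer: str) -> str:
    """Upper-case the primer and reject anything other than A, C, G, T."""
    seq = primer.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("primer must be a non-empty string of A, C, G, T")
    return seq


def gc_percent(primer: str) -> float:
    """Percentage of G and C bases in the primer."""
    seq = _validated(primer)
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(primer: str) -> int:
    """Approximate melting temperature in degrees Celsius (Wallace rule)."""
    seq = _validated(primer)
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc
```

    For a 16-mer with four of each base this gives 50% GC and an estimated Tm of 48 °C. Nearest-neighbor thermodynamic models are generally preferred for longer primers.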

  10. Evaluation of approaches for identifying population informative markers from high density SNP Chips

    PubMed Central

    2011-01-01

    Background Genetic markers can be used to identify and verify the origin of individuals. Motivation for the inference of ancestry ranges from conservation genetics to forensic analysis. High density assays featuring Single Nucleotide Polymorphism (SNP) markers can be exploited to create a reduced panel containing the most informative markers for these purposes. The objectives of this study were to evaluate methods of marker selection and determine the minimum number of markers from the BovineSNP50 BeadChip required to verify the origin of individuals in European cattle breeds. Delta, Wright's FST, Weir & Cockerham's FST and PCA methods for population differentiation were compared. The level of informativeness of each SNP was estimated from the breed specific allele frequencies. Individual assignment analysis was performed using the ranked informative markers. Stringency levels were applied by log-likelihood ratio to assess the confidence of the assignment test. Results A 95% assignment success rate for the 384 individually genotyped animals was achieved with < 80, < 100, < 140 and < 200 SNP markers (with increasing stringency threshold levels) across all the examined methods for marker selection. No further gain in power of assignment was achieved by sampling in excess of 200 SNP markers. The marker selection method that required the lowest number of SNP markers to verify the animal's breed origin was Wright's FST (60 to 140 SNPs depending on the chosen degree of confidence). Certain breeds required fewer markers (< 100) to achieve 100% assignment success. In contrast, closely related breeds require more markers (~200) to achieve > 95% assignment success. The power of assignment success, and therefore the number of SNP markers required, is dependent on the levels of genetic heterogeneity and pool of samples considered. Conclusions While all SNP selection methods produced marker panels capable of breed identification, the power of assignment varied markedly among
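
    Two of the informativeness measures compared in this study, delta and Wright's FST, can be written down directly from breed-specific allele frequencies. The sketch below covers the two-breed, equal-weight case only and uses made-up SNP names and frequencies in its usage note. It does not implement Weir & Cockerham's sample-size-corrected estimator or the PCA approach, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple


def delta(p1: float, p2: float) -> float:
    """Absolute allele-frequency difference between two breeds."""
    return abs(p1 - p2)


def wright_fst(p1: float, p2: float) -> float:
    """Wright's FST = (HT - HS) / HT for one biallelic SNP and two breeds."""
    p_bar = (p1 + p2) / 2.0
    h_total = 2.0 * p_bar * (1.0 - p_bar)  # expected heterozygosity, pooled
    if h_total == 0.0:
        return 0.0  # monomorphic in both breeds: not informative
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    return (h_total - h_within) / h_total


def rank_snps(freqs: Dict[str, Tuple[float, float]]) -> List[str]:
    """SNP names ordered from most to least informative by FST."""
    return sorted(freqs, key=lambda snp: wright_fst(*freqs[snp]), reverse=True)
```

    With hypothetical frequencies such as {"snpA": (0.9, 0.1), "snpB": (0.5, 0.5), "snpC": (0.7, 0.2)}, rank_snps places snpA first and snpB last; a reduced panel would then be built by taking markers from the top of this ranking.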

  11. Design and validation of a 90K SNP genotyping assay for the water buffalo (Bubalus bubalis).

    PubMed

    Iamartino, Daniela; Nicolazzi, Ezequiel L; Van Tassell, Curtis P; Reecy, James M; Fritz-Waters, Eric R; Koltes, James E; Biffani, Stefano; Sonstegard, Tad S; Schroeder, Steven G; Ajmone-Marsan, Paolo; Negrini, Riccardo; Pasquariello, Rolando; Ramelli, Paola; Coletta, Angelo; Garcia, José F; Ali, Ahmad; Ramunno, Luigi; Cosenza, Gianfranco; de Oliveira, Denise A A; Drummond, Marcela G; Bastianetto, Eduardo; Davassi, Alessandro; Pirani, Ali; Brew, Fiona; Williams, John L

    2017-01-01

    The availability of the bovine genome sequence and SNP panels has improved various genomic analyses, from exploring genetic diversity to aiding genetic selection. However, few of the SNP on the bovine chips are polymorphic in buffalo; therefore, a panel of single nucleotide DNA markers exclusive to buffalo was necessary for molecular genetic analyses and to develop genomic selection approaches for water buffalo. The creation of a 90K SNP panel for river buffalo and its testing in a genome wide association study for milk production are described here. The genomes of 73 buffaloes of 4 different breeds were sequenced and aligned against the bovine genome, which facilitated the identification of 22 million sequence variants among the buffalo genomes. Based on frequencies of variants within and among buffalo breeds, and their distribution across the genome, inferred from the bovine genome sequence, 90,000 putative single nucleotide polymorphisms were selected to create an Axiom® Buffalo Genotyping Array 90K. This 90K "SNP-Chip" was tested in several river buffalo populations and found to have ∼70% high quality and polymorphic SNPs. Of the 90K SNPs about 24K were also found to be polymorphic in swamp buffalo. The SNP chip was used to investigate the structure of buffalo populations, and could distinguish buffalo from different farms. A Genome Wide Association Study identified genomic regions on 5 chromosomes putatively involved in milk production. The 90K buffalo SNP chip described here is suitable for the analysis of the genomes of river buffalo breeds, and could be used for genetic diversity studies and potentially as a starting point for genome-assisted selection programmes. This SNP Chip could also be used to analyse swamp buffalo, but many loci are not informative and creation of a revised SNP set specific for swamp buffalo would be advised.

  12. Evaluation of breast cancer susceptibility using improved genetic algorithms to generate genotype SNP barcodes.

    PubMed

    Yang, Cheng-Hong; Lin, Yu-Da; Chuang, Li-Yeh; Chang, Hsueh-Wei

    2013-01-01

    Genetic association is a challenging task for the identification and characterization of genes that increase the susceptibility to common complex multifactorial diseases. To fully execute genetic studies of complex diseases, modern geneticists face the challenge of detecting interactions between loci. A genetic algorithm (GA) is developed to detect the association of genotype frequencies of cancer cases and noncancer cases based on statistical analysis. An improved genetic algorithm (IGA) is proposed to improve the reliability of the GA method for high-dimensional SNP-SNP interactions. The strategy offers the top five results to the random population process, in which they guide the GA toward a significant search course. The IGA increases the likelihood of quickly detecting the maximum ratio difference between cancer cases and noncancer cases. The joint effect of 23 SNP combinations of six steroid hormone metabolism and signaling-related genes involved in breast carcinogenesis pathways was systematically evaluated, with IGA successfully detecting significant ratio differences between breast cancer cases and noncancer cases. The possible breast cancer risks were subsequently analyzed by odds-ratio (OR) and risk-ratio analysis. The estimated OR of the best SNP barcode is significantly higher than 1 (between 1.15 and 7.01) for specific combinations of two to 13 SNPs. Analysis results support that the IGA provides higher ratio difference values than the GA between breast cancer cases and noncancer cases over 3-SNP to 13-SNP interactions. A more specific SNP-SNP interaction profile for the risk of breast cancer is also provided.

  13. SNP2TFBS – a database of regulatory SNPs affecting predicted transcription factor binding site affinity

    PubMed Central

    Kumar, Sunil; Ambrosini, Giovanna; Bucher, Philipp

    2017-01-01

    SNP2TFBS is a computational resource intended to support researchers investigating the molecular mechanisms underlying regulatory variation in the human genome. The database essentially consists of a collection of text files providing specific annotations for human single nucleotide polymorphisms (SNPs), namely whether they are predicted to abolish, create or change the affinity of one or several transcription factor (TF) binding sites. A SNP's effect on TF binding is estimated based on a position weight matrix (PWM) model for the binding specificity of the corresponding factor. These data files are regenerated at regular intervals by an automatic procedure that takes as input a reference genome, a comprehensive SNP catalogue and a collection of PWMs. SNP2TFBS is also accessible over a web interface, enabling users to view the information provided for an individual SNP, to extract SNPs based on various search criteria, to annotate uploaded sets of SNPs or to display statistics about the frequencies of binding sites affected by selected SNPs. Homepage: http://ccg.vital-it.ch/snp2tfbs/. PMID:27899579
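    The PWM-based estimate described in this abstract can be illustrated with a minimal sketch. It is not the SNP2TFBS pipeline: the toy matrix, the score threshold, and the function names are assumptions made for illustration only. The sketch scores every PWM-width window that overlaps the SNP for both alleles and labels the effect as abolish, create, or change.

```python
# Hypothetical 4-column log-odds PWM with consensus "ACGT" (illustrative values only).
TOY_PWM = [
    {"A": 1.0, "C": -1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": 1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": 1.0, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": -1.0, "T": 1.0},
]


def pwm_score(window, pwm):
    """Sum the per-position scores of one window of len(pwm) bases."""
    return sum(column[base] for column, base in zip(pwm, window))


def best_overlapping_score(seq, pwm, snp_index):
    """Best PWM score among all windows that cover position snp_index."""
    width = len(pwm)
    first = max(0, snp_index - width + 1)
    last = min(snp_index, len(seq) - width)
    best = float("-inf")
    for start in range(first, last + 1):
        best = max(best, pwm_score(seq[start:start + width], pwm))
    return best


def snp_effect(ref_seq, snp_index, alt_base, pwm, threshold):
    """Classify a SNP's predicted effect on a binding site (assumed scheme)."""
    alt_seq = ref_seq[:snp_index] + alt_base + ref_seq[snp_index + 1:]
    ref = best_overlapping_score(ref_seq, pwm, snp_index)
    alt = best_overlapping_score(alt_seq, pwm, snp_index)
    ref_hit, alt_hit = ref >= threshold, alt >= threshold
    if ref_hit and not alt_hit:
        label = "abolish"
    elif alt_hit and not ref_hit:
        label = "create"
    elif ref_hit and alt_hit and ref != alt:
        label = "change"
    else:
        label = "none"
    return label, ref, alt
```

    For example, `snp_effect("TTACGTTT", 3, "A", TOY_PWM, 3.0)` compares the reference site with the sequence carrying the alternate base at index 3. A real analysis would also scan the reverse strand and use curated matrices.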

  14. SNP rs1511412 in FOXL2 gene as a risk factor for keloid by meta analysis.

    PubMed

    Lu, Wensheng; Zheng, Xiaodong; Liu, Shengli; Ding, Maoqian; Xie, Jian; Yao, Xiuhua; Zhang, Lanfang; Hu, Bai

    2015-01-01

    Determine whether SNP rs1511412 is associated with keloid. One large-scale GWAS identified association between SNP rs1511412 in the FOXL2 gene and keloid disease in the Japanese population. However, researchers didn't observe significant association for keloid in Chinese Han population (PBonferroni>0.05). It's probable that the frequency of this variant in Chinese Han population was relatively low and the sample size was not very large in this study (power =45.5). We performed an independent case control association study in the Chinese Han population and a follow-up large scale meta-analysis for SNP rs1511412. Our study included 309 keloid patients and 1080 controls of the Chinese Han population. A significant association was found between SNP and keloid (P=0.02, OR=2.23). Meta-analysis included 1847 keloid patients and 7229 controls combined from five Asian populations. The association between SNP rs1511412 and keloid became highly significant (P<1×10^-8, OR=1.89). We conclude that SNP rs1511412 in FOXL2 is indeed a genetic risk factor for keloid across different ethnic populations.
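    The odds ratios quoted in this abstract come from comparing allele counts in cases and controls. The abstract does not give those counts, so the sketch below uses hypothetical numbers only, and the Woolf (logit) confidence interval is an assumed choice rather than the authors' stated method.

```python
import math


def odds_ratio(case_risk, case_other, ctrl_risk, ctrl_other, z=1.96):
    """Odds ratio for a 2x2 allele-count table with a Woolf (logit) interval.

    Arguments are counts of the risk allele and the other allele in cases
    and controls; all four must be positive.
    """
    estimate = (case_risk * ctrl_other) / (case_other * ctrl_risk)
    # Standard error of log(OR) under the Woolf approximation.
    se = math.sqrt(1 / case_risk + 1 / case_other + 1 / ctrl_risk + 1 / ctrl_other)
    lower = math.exp(math.log(estimate) - z * se)
    upper = math.exp(math.log(estimate) + z * se)
    return estimate, lower, upper


# Hypothetical counts, NOT taken from the study.
example = odds_ratio(20, 80, 10, 90)
```

    With these made-up counts the point estimate is (20 × 90) / (80 × 10) = 2.25. An interval that excludes 1 would indicate association at roughly the 5% level.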

  15. SNP rs1511412 in FOXL2 gene as a risk factor for keloid by meta analysis

    PubMed Central

    Lu, Wensheng; Zheng, Xiaodong; Liu, Shengli; Ding, Maoqian; Xie, Jian; Yao, Xiuhua; Zhang, Lanfang; Hu, Bai

    2015-01-01

    Objective: Determine whether SNP rs1511412 is associated with keloid. Design and methods: One large-scale GWAS identified association between SNP rs1511412 in the FOXL2 gene and keloid disease in the Japanese population. However, researchers didn’t observe significant association for keloid in Chinese Han population (PBonferroni>0.05). It’s probable that the frequency of this variant in Chinese Han population was relatively low and the sample size was not very large in this study (power =45.5). We performed an independent case control association study in the Chinese Han population and a follow-up large scale meta-analysis for SNP rs1511412. Results: Our study included 309 keloid patients and 1080 controls of the Chinese Han population. A significant association was found between SNP and keloid (P=0.02, OR=2.23). Meta-analysis included 1847 keloid patients and 7229 controls combined from five Asian populations. The association between SNP rs1511412 and keloid became highly significant (P<1×10^-8, OR=1.89). Conclusion: We conclude that SNP rs1511412 in FOXL2 is indeed a genetic risk factor for keloid across different ethnic populations. PMID:25932232

  16. SNP and mutation data on the web - hidden treasures for uncovering.

    PubMed

    Barnes, Michael R

    2002-01-01

    SNP data has grown exponentially over the last two years, and SNP database evolution has matched this growth, as initial development of several independent SNP databases has given way to one central SNP database, dbSNP. Other SNP databases have instead evolved to complement this central database by providing gene-specific focus and an increased level of curation and analysis on subsets of data derived from the central data set. By contrast, human mutation data, which has been collected over many years, is still stored in disparate sources, although moves are afoot to move to a similar central database. These developments are timely: human mutation and polymorphism data hold complementary keys to a better understanding of how genes function and malfunction in disease. The impending availability of a complete human genome presents us with an ideal framework to integrate both these forms of data; as our understanding of the mechanisms of disease increases, the full genomic context of variation may become increasingly significant.

  17. SNP-based prediction of the human germ cell methylation landscape.

    PubMed

    Xie, Hehuang; Wang, Min; Bischof, Jared; Bonaldo, Maria de Fatima; Soares, Marcelo Bento

    2009-05-01

    Base substitution occurs at a high rate at CpG dinucleotides due to the frequent methylation of CpG and the deamination of methylated cytosine to thymine. If these substitutions occur in germ cells, they constitute a heritable mutation that may eventually rise to polymorphic frequencies, hence resulting in a SNP that is methylation associated. In this study, we sought to identify clusters of methylation associated SNPs as a basis for prediction of methylation landscapes of germ cell genomes. Genomic regions enriched with methylation associated SNPs, namely "methylation associated SNP clusters", were identified with an agglomerative hierarchical clustering algorithm. Repetitive elements, segmental duplications, and syntenic tandem DNA repeats were enriched in methylation associated SNP clusters. The frequency of methylation associated SNPs in Alu Y/S elements exhibited a gradient pattern suggestive of linear spreading, being higher in proximity to methylation associated SNP clusters and lower closer to CpG islands. Interestingly, methylation associated SNP clusters were over-represented near the transcriptional initiation sites of immune response genes. We propose a de novo DNA methylation model during germ cell development whereby a pattern is established by long-range chromatin interactions through syntenic repeats combined with regional methylation spreading from methylation associated SNP clusters.

  18. SNP Microarray in FISH Negative Clinically Suspected 22q11.2 Microdeletion Syndrome

    PubMed Central

    Jain, Manish; Kalsi, Amanpreet Kaur

    2016-01-01

    The present study evaluated the role of SNP microarray in 101 cases of clinically suspected FISH negative (noninformative/normal) 22q11.2 microdeletion syndrome. SNP microarray was carried out using 300 K HumanCytoSNP-12 BeadChip array or CytoScan 750 K array. SNP microarray identified 8 cases of 22q11.2 microdeletions and/or microduplications in addition to cases of chromosomal abnormalities and other pathogenic/likely pathogenic CNVs. Clinically suspected specific deletions (22q11.2) were detectable in approximately 8% of cases by SNP microarray, mostly from FISH noninformative cases. This study also identified several LOH/AOH loci with known and well-defined UPD (uniparental disomy) disorders. In conclusion, this study suggests stricter clinical criteria for FISH analysis. However, if clinical criteria are few or doubtful, in particular for a newborn/neonate in intensive care, SNP microarray should be the first screening test to be ordered. FISH is an ideal test for detecting mosaicism, screening family members, and prenatal diagnosis in proven families. PMID:27051557

  19. SNP Microarray in FISH Negative Clinically Suspected 22q11.2 Microdeletion Syndrome.

    PubMed

    Halder, Ashutosh; Jain, Manish; Kalsi, Amanpreet Kaur

    2016-01-01

    The present study evaluated the role of SNP microarray in 101 cases of clinically suspected FISH negative (noninformative/normal) 22q11.2 microdeletion syndrome. SNP microarray was carried out using 300 K HumanCytoSNP-12 BeadChip array or CytoScan 750 K array. SNP microarray identified 8 cases of 22q11.2 microdeletions and/or microduplications in addition to cases of chromosomal abnormalities and other pathogenic/likely pathogenic CNVs. Clinically suspected specific deletions (22q11.2) were detectable in approximately 8% of cases by SNP microarray, mostly from FISH noninformative cases. This study also identified several LOH/AOH loci with known and well-defined UPD (uniparental disomy) disorders. In conclusion, this study suggests stricter clinical criteria for FISH analysis. However, if clinical criteria are few or doubtful, in particular for a newborn/neonate in intensive care, SNP microarray should be the first screening test to be ordered. FISH is an ideal test for detecting mosaicism, screening family members, and prenatal diagnosis in proven families.

  20. Rapid Identification of Ginseng Cultivars (Panax ginseng Meyer) Using Novel SNP-Based Probes

    PubMed Central

    Jo, Ick-Hyun; Bang, Kyong Hwan; Kim, Young-Chang; Lee, Jei-Wan; Seo, A-Yeon; Seong, Bong-Jae; Kim, Hyun-Ho; Kim, Dong-Hwi; Cha, Seon-Woo; Cho, Yong-Gu; Kim, Hong-Sig

    2011-01-01

    In order to develop a novel system for the discrimination of five ginseng cultivars (Panax ginseng Meyer), single nucleotide polymorphism (SNP) genotyping assays with real-time polymerase chain reaction were conducted. Nucleotide substitution in gDNA library clones of P. ginseng cv. Yunpoong was targeted for the SNP genotyping assay. From these SNP sites, a set of modified SNP specific fluorescence probes (PGP74, PGP110, and PGP130) and novel primer sets have been developed to distinguish among five ginseng cultivars. The combination of the SNP type of the five cultivars, Chungpoong, Yunpoong, Gopoong, Kumpoong, and Sunpoong, was identified as ‘ATA’, ‘GCC’, ‘GTA’, ‘GCA’, and ‘ACC’, respectively. This study represents the first report of the identification of ginseng cultivars by fluorescence probes. An SNP genotyping assay using fluorescence probes could prove useful for the identification of ginseng cultivars and ginseng seed management systems and guarantee the purity of ginseng seed. PMID:23717098
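    The three-SNP codes reported in this abstract act as a lookup key for cultivar identity, which can be sketched as follows. The codes and cultivar names are taken from the abstract; the assumption that each letter corresponds to probes PGP74, PGP110, and PGP130 in that order is not stated there, and the function name is hypothetical.

```python
# Probe order is an assumption; the abstract lists the probes but not the letter order.
PROBES = ("PGP74", "PGP110", "PGP130")

# SNP type combinations as reported in the abstract.
BARCODE_TO_CULTIVAR = {
    "ATA": "Chungpoong",
    "GCC": "Yunpoong",
    "GTA": "Gopoong",
    "GCA": "Kumpoong",
    "ACC": "Sunpoong",
}


def identify_cultivar(calls):
    """Map per-probe base calls to a cultivar name, or None if unrecognized.

    `calls` is a dict from probe name to the called base ("A", "C", "G", "T").
    """
    try:
        barcode = "".join(calls[probe].upper() for probe in PROBES)
    except KeyError:
        return None  # a probe call is missing, so no assignment is made
    return BARCODE_TO_CULTIVAR.get(barcode)
```

    Because all five codes are distinct, three calls are enough to separate the cultivars listed; any other combination returns `None` rather than a guess.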

  1. Electrochemical Li Topotactic Reaction in Layered SnP3 for Superior Li-Ion Batteries

    NASA Astrophysics Data System (ADS)

    Park, Jae-Wan; Park, Cheol-Min

    2016-10-01

    The development of new anode materials having high electrochemical performances and interesting reaction mechanisms is highly required to satisfy the need for long-lasting mobile electronic devices and electric vehicles. Here, we report a layer crystalline structured SnP3 and its unique electrochemical behaviors with Li. The SnP3 was simply synthesized through modification of Sn crystallography by combination with P and its potential as an anode material for LIBs was investigated. During Li insertion reaction, the SnP3 anode showed an interesting two-step electrochemical reaction mechanism comprised of a topotactic transition (0.7–2.0 V) and a conversion (0.0–2.0 V) reaction. When the SnP3-based composite electrode was tested within the topotactic reaction region (0.7–2.0 V) between SnP3 and LixSnP3 (x ≤ 4), it showed excellent electrochemical properties, such as a high volumetric capacity (1st discharge/charge capacity was 840/663 mA h cm‑3) with a high initial coulombic efficiency, stable cycle behavior (636 mA h cm‑3 over 100 cycles), and fast rate capability (550 mA h cm‑3 at 3C). This layered SnP3 anode will be applicable to a new anode material for rechargeable LIBs.

  2. Electrochemical Li Topotactic Reaction in Layered SnP3 for Superior Li-Ion Batteries

    PubMed Central

    Park, Jae-Wan; Park, Cheol-Min

    2016-01-01

    The development of new anode materials having high electrochemical performances and interesting reaction mechanisms is highly required to satisfy the need for long-lasting mobile electronic devices and electric vehicles. Here, we report a layer crystalline structured SnP3 and its unique electrochemical behaviors with Li. The SnP3 was simply synthesized through modification of Sn crystallography by combination with P and its potential as an anode material for LIBs was investigated. During Li insertion reaction, the SnP3 anode showed an interesting two-step electrochemical reaction mechanism comprised of a topotactic transition (0.7–2.0 V) and a conversion (0.0–2.0 V) reaction. When the SnP3-based composite electrode was tested within the topotactic reaction region (0.7–2.0 V) between SnP3 and LixSnP3 (x ≤ 4), it showed excellent electrochemical properties, such as a high volumetric capacity (1st discharge/charge capacity was 840/663 mA h cm−3) with a high initial coulombic efficiency, stable cycle behavior (636 mA h cm−3 over 100 cycles), and fast rate capability (550 mA h cm−3 at 3C). This layered SnP3 anode will be applicable to a new anode material for rechargeable LIBs. PMID:27775090

  3. Electrochemical Li Topotactic Reaction in Layered SnP3 for Superior Li-Ion Batteries.

    PubMed

    Park, Jae-Wan; Park, Cheol-Min

    2016-10-24

    The development of new anode materials having high electrochemical performances and interesting reaction mechanisms is highly required to satisfy the need for long-lasting mobile electronic devices and electric vehicles. Here, we report a layer crystalline structured SnP3 and its unique electrochemical behaviors with Li. The SnP3 was simply synthesized through modification of Sn crystallography by combination with P and its potential as an anode material for LIBs was investigated. During Li insertion reaction, the SnP3 anode showed an interesting two-step electrochemical reaction mechanism comprised of a topotactic transition (0.7-2.0 V) and a conversion (0.0-2.0 V) reaction. When the SnP3-based composite electrode was tested within the topotactic reaction region (0.7-2.0 V) between SnP3 and LixSnP3 (x ≤ 4), it showed excellent electrochemical properties, such as a high volumetric capacity (1st discharge/charge capacity was 840/663 mA h cm(-3)) with a high initial coulombic efficiency, stable cycle behavior (636 mA h cm(-3) over 100 cycles), and fast rate capability (550 mA h cm(-3) at 3C). This layered SnP3 anode will be applicable to a new anode material for rechargeable LIBs.

  4. SNP2TFBS - a database of regulatory SNPs affecting predicted transcription factor binding site affinity.

    PubMed

    Kumar, Sunil; Ambrosini, Giovanna; Bucher, Philipp

    2017-01-04

    SNP2TFBS is a computational resource intended to support researchers investigating the molecular mechanisms underlying regulatory variation in the human genome. The database essentially consists of a collection of text files providing specific annotations for human single nucleotide polymorphisms (SNPs), namely whether they are predicted to abolish, create or change the affinity of one or several transcription factor (TF) binding sites. A SNP's effect on TF binding is estimated based on a position weight matrix (PWM) model for the binding specificity of the corresponding factor. These data files are regenerated at regular intervals by an automatic procedure that takes as input a reference genome, a comprehensive SNP catalogue and a collection of PWMs. SNP2TFBS is also accessible over a web interface, enabling users to view the information provided for an individual SNP, to extract SNPs based on various search criteria, to annotate uploaded sets of SNPs or to display statistics about the frequencies of binding sites affected by selected SNPs. Homepage: http://ccg.vital-it.ch/snp2tfbs/.

  5. Generation of SNP datasets for orangutan population genomics using improved reduced-representation sequencing and direct comparisons of SNP calling algorithms

    PubMed Central

    2014-01-01

    Background: High-throughput sequencing has opened up exciting possibilities in population and conservation genetics by enabling the assessment of genetic variation at genome-wide scales. One approach to reduce genome complexity, i.e. investigating only parts of the genome, is reduced-representation library (RRL) sequencing. Like similar approaches, RRL sequencing reduces ascertainment bias due to simultaneous discovery and genotyping of single-nucleotide polymorphisms (SNPs) and does not require reference genomes. Yet, generating such datasets remains challenging due to laboratory and bioinformatical issues. In the laboratory, current protocols require improvements with regards to sequencing homologous fragments to reduce the number of missing genotypes. From the bioinformatical perspective, the reliance of most studies on a single SNP caller disregards the possibility that different algorithms may produce disparate SNP datasets. Results: We present an improved RRL (iRRL) protocol that maximizes the generation of homologous DNA sequences, thus achieving improved genotyping-by-sequencing efficiency. Our modifications facilitate generation of single-sample libraries, enabling individual genotype assignments instead of pooled-sample analysis. We sequenced ~1% of the orangutan genome with 41-fold median coverage in 31 wild-born individuals from two populations. SNPs and genotypes were called using three different algorithms. We obtained substantially different SNP datasets depending on the SNP caller. Genotype validations revealed that the Unified Genotyper of the Genome Analysis Toolkit and SAMtools performed significantly better than a caller from CLC Genomics Workbench (CLC). Of all conflicting genotype calls, CLC was only correct in 17% of the cases. Furthermore, conflicting genotypes between two algorithms showed a systematic bias in that one caller almost exclusively assigned heterozygotes, while the other one almost exclusively assigned homozygotes. Conclusions

  6. Optimal Design of Low-Density SNP Arrays for Genomic Prediction: Algorithm and Applications.

    PubMed

    Wu, Xiao-Lin; Xu, Jiaqi; Feng, Guofei; Wiggans, George R; Taylor, Jeremy F; He, Jun; Qian, Changsong; Qiu, Jiansheng; Simpson, Barry; Walker, Jeremy; Bauck, Stewart

    2016-01-01

    Low-density (LD) single nucleotide polymorphism (SNP) arrays provide a cost-effective solution for genomic prediction and selection, but algorithms and computational tools are needed for the optimal design of LD SNP chips. A multiple-objective, local optimization (MOLO) algorithm was developed for design of optimal LD SNP chips that can be imputed accurately to medium-density (MD) or high-density (HD) SNP genotypes for genomic prediction. The objective function facilitates maximization of non-gap map length and system information for the SNP chip, and the latter is computed either as locus-averaged (LASE) or haplotype-averaged Shannon entropy (HASE) and adjusted for uniformity of the SNP distribution. HASE performed better than LASE with ≤1,000 SNPs, but required considerably more computing time. Nevertheless, the differences diminished when >5,000 SNPs were selected. Optimization was accomplished conditionally on the presence of SNPs that were obligated to each chromosome. The frame location of SNPs on a chip can be either uniform (evenly spaced) or non-uniform. For the latter design, a tunable empirical Beta distribution was used to guide location distribution of frame SNPs such that both ends of each chromosome were enriched with SNPs. The SNP distribution on each chromosome was finalized through the objective function that was locally and empirically maximized. This MOLO algorithm was capable of selecting a set of approximately evenly-spaced and highly-informative SNPs, which in turn led to increased imputation accuracy compared with selection solely of evenly-spaced SNPs. Imputation accuracy increased with LD chip size, and imputation error rate was extremely low for chips with ≥3,000 SNPs. Assuming that genotyping or imputation error occurs at random, imputation error rate can be viewed as the upper limit for genomic prediction error. Our results show that about 25% of imputation error rate was propagated to genomic prediction in an Angus population. The

  7. Identification of Laying-Related SNP Markers in Geese Using RAD Sequencing.

    PubMed

    Yu, ShiGang; Chu, WeiWei; Zhang, LiFan; Han, HouMing; Zhao, RongXue; Wu, Wei; Zhu, JiangNing; Dodson, Michael V; Wei, Wei; Liu, HongLin; Chen, Jie

    2015-01-01

    Laying performance is an important economic trait of goose production. As laying performance is of low heritability, it is of significance to develop a marker-assisted selection (MAS) strategy for this trait. Definition of sequence variation related to the target trait is a prerequisite of quantitating MAS, but little is presently known about the goose genome, which greatly hinders the identification of genetic markers for the laying traits of geese. Recently developed restriction site-associated DNA (RAD) sequencing is a possible approach for discerning large-scale single nucleotide polymorphism (SNP) and reducing the complexity of a genome without having reference genomic information available. In the present study, we developed a pooled RAD sequencing strategy for detecting geese laying-related SNP. Two DNA pools were constructed, each consisting of equal amounts of genomic DNA from 10 individuals with either high estimated breeding value (HEBV) or low estimated breeding value (LEBV). A total of 139,013 SNP were obtained from 42,291,356 sequences, of which 18,771,943 were for LEBV and 23,519,413 were for HEBV cohorts. Fifty-five SNP which had different allelic frequencies in the two DNA pools were further validated by individual-based AS-PCR genotyping in the LEBV and HEBV cohorts. Ten out of 55 SNP exhibited distinct allele distributions in these two cohorts. These 10 SNP were further genotyped in a goose population of 492 geese to verify the association with egg numbers. The result showed that 8 of 10 SNP were associated with egg numbers. Additionally, linear regression analysis revealed that SNP Record-111407, 106975 and 112359 were involved in a multiple-gene network affecting laying performance. We used IPCR to extend the unknown regions flanking the candidate RAD tags. The obtained sequences were subjected to BLAST to retrieve the orthologous genes in either ducks or chickens. Five novel genes were cloned for geese which harbored the candidate laying

  8. Identification of Laying-Related SNP Markers in Geese Using RAD Sequencing

    PubMed Central

    Yu, ShiGang; Chu, WeiWei; Zhang, LiFan; Han, HouMing; Zhao, RongXue; Wu, Wei; Zhu, JiangNing; Dodson, Michael V.; Wei, Wei; Liu, HongLin; Chen, Jie

    2015-01-01

    Laying performance is an important economic trait of goose production. As laying performance is of low heritability, it is of significance to develop a marker-assisted selection (MAS) strategy for this trait. Definition of sequence variation related to the target trait is a prerequisite of quantitating MAS, but little is presently known about the goose genome, which greatly hinders the identification of genetic markers for the laying traits of geese. Recently developed restriction site-associated DNA (RAD) sequencing is a possible approach for discerning large-scale single nucleotide polymorphism (SNP) and reducing the complexity of a genome without having reference genomic information available. In the present study, we developed a pooled RAD sequencing strategy for detecting geese laying-related SNP. Two DNA pools were constructed, each consisting of equal amounts of genomic DNA from 10 individuals with either high estimated breeding value (HEBV) or low estimated breeding value (LEBV). A total of 139,013 SNP were obtained from 42,291,356 sequences, of which 18,771,943 were for LEBV and 23,519,413 were for HEBV cohorts. Fifty-five SNP which had different allelic frequencies in the two DNA pools were further validated by individual-based AS-PCR genotyping in the LEBV and HEBV cohorts. Ten out of 55 SNP exhibited distinct allele distributions in these two cohorts. These 10 SNP were further genotyped in a goose population of 492 geese to verify the association with egg numbers. The result showed that 8 of 10 SNP were associated with egg numbers. Additionally, linear regression analysis revealed that SNP Record-111407, 106975 and 112359 were involved in a multiple-gene network affecting laying performance. We used IPCR to extend the unknown regions flanking the candidate RAD tags. The obtained sequences were subjected to BLAST to retrieve the orthologous genes in either ducks or chickens. Five novel genes were cloned for geese which harbored the candidate laying

  9. Optimal Design of Low-Density SNP Arrays for Genomic Prediction: Algorithm and Applications

    PubMed Central

    Wu, Xiao-Lin; Xu, Jiaqi; Feng, Guofei; Wiggans, George R.; Taylor, Jeremy F.; He, Jun; Qian, Changsong; Qiu, Jiansheng; Simpson, Barry; Walker, Jeremy; Bauck, Stewart

    2016-01-01

    Low-density (LD) single nucleotide polymorphism (SNP) arrays provide a cost-effective solution for genomic prediction and selection, but algorithms and computational tools are needed for the optimal design of LD SNP chips. A multiple-objective, local optimization (MOLO) algorithm was developed for design of optimal LD SNP chips that can be imputed accurately to medium-density (MD) or high-density (HD) SNP genotypes for genomic prediction. The objective function facilitates maximization of non-gap map length and system information for the SNP chip, and the latter is computed either as locus-averaged (LASE) or haplotype-averaged Shannon entropy (HASE) and adjusted for uniformity of the SNP distribution. HASE performed better than LASE with ≤1,000 SNPs, but required considerably more computing time. Nevertheless, the differences diminished when >5,000 SNPs were selected. Optimization was accomplished conditionally on the presence of SNPs that were obligated to each chromosome. The frame location of SNPs on a chip can be either uniform (evenly spaced) or non-uniform. For the latter design, a tunable empirical Beta distribution was used to guide location distribution of frame SNPs such that both ends of each chromosome were enriched with SNPs. The SNP distribution on each chromosome was finalized through the objective function that was locally and empirically maximized. This MOLO algorithm was capable of selecting a set of approximately evenly-spaced and highly-informative SNPs, which in turn led to increased imputation accuracy compared with selection solely of evenly-spaced SNPs. Imputation accuracy increased with LD chip size, and imputation error rate was extremely low for chips with ≥3,000 SNPs. Assuming that genotyping or imputation error occurs at random, imputation error rate can be viewed as the upper limit for genomic prediction error. Our results show that about 25% of imputation error rate was propagated to genomic prediction in an Angus population. The

  10. HapRice, an SNP haplotype database and a web tool for rice.

    PubMed

    Yonemaru, Jun-ichi; Ebana, Kaworu; Yano, Masahiro

    2014-01-01

    Genome-wide single nucleotide polymorphism (SNP) analysis is a promising tool to examine the genetic diversity of rice populations and genetic traits of scientific and economic importance. Next-generation sequencing technology has accelerated the re-sequencing of diverse rice varieties and the discovery of genome-wide SNPs. Notably, validation of these SNPs by a high-throughput genotyping system, such as an SNP array, could provide a manageable and highly accurate SNP set. To enhance the potential utility of genome-wide SNPs for geneticists and breeders, analysis tools need to be developed. Here, we constructed an SNP haplotype database, which allows visualization of the allele frequency of all SNPs in the genome browser. We calculated the allele frequencies of 3,334 SNPs in 76 accessions from the world rice collection and 3,252 SNPs in 177 Japanese rice accessions; all these SNPs have been validated in our previous studies. The SNP haplotypes were defined by the allele frequency in each cultivar group (aus, indica, tropical japonica and temperate japonica) for the world rice accessions, and in non-irrigated and three irrigated groups (three variety registration periods) for Japanese rice accessions. We also developed web tools for finding polymorphic SNPs between any two rice accessions and for the primer design to develop cleaved amplified polymorphic sequence markers at any SNP. The 'HapRice' database and the web tools can be accessed at http://qtaro.abr.affrc.go.jp/index.html. In addition, we established a core SNP set consisting of 768 SNPs uniformly distributed in the rice genome; this set is of a practically appropriate size for use in rice genetic analysis.
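    Two operations mentioned in this abstract, calculating allele frequencies across accessions and finding SNPs that are polymorphic between two accessions, can be sketched in a few lines. This is an illustration, not the HapRice implementation: the data layout (one allele per accession per SNP, with `None` for a missing call) and the function names are assumptions.

```python
from collections import Counter


def allele_frequencies(alleles):
    """Frequency of each allele at one SNP across accessions, ignoring missing calls."""
    counts = Counter(a for a in alleles if a is not None)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {allele: n / total for allele, n in counts.items()}


def polymorphic_snps(accession_a, accession_b):
    """SNP identifiers at which two accessions carry different, non-missing alleles.

    Each accession is a dict mapping SNP identifier to its allele.
    """
    shared = accession_a.keys() & accession_b.keys()
    return sorted(
        snp
        for snp in shared
        if accession_a[snp] is not None
        and accession_b[snp] is not None
        and accession_a[snp] != accession_b[snp]
    )
```

    SNPs with a missing call in either accession are skipped rather than reported as polymorphic, since a missing call carries no evidence of a difference.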

  11. AMY-tree: an algorithm to use whole genome SNP calling for Y chromosomal phylogenetic applications

    PubMed Central

    2013-01-01

    Background: Due to the rapid progress of next-generation sequencing (NGS) facilities, an explosion of human whole genome data will become available in the coming years. These data can be used to optimize and to increase the resolution of the phylogenetic Y chromosomal tree. Moreover, the exponential growth of known Y chromosomal lineages will require an automatic determination of the phylogenetic position of an individual based on whole genome SNP calling data and an up to date Y chromosomal tree. Results: We present an automated approach, ‘AMY-tree’, which is able to determine the phylogenetic position of a Y chromosome using a whole genome SNP profile, independently from the NGS platform and SNP calling program, whereby mistakes in the SNP calling or phylogenetic Y chromosomal tree are taken into account. Moreover, AMY-tree indicates ambiguities within the present phylogenetic tree and points out new Y-SNPs which may be phylogenetically relevant. The AMY-tree software package was validated successfully on 118 whole genome SNP profiles of 109 males with different origins. Moreover, support was found for an unknown recurrent mutation, wrong reported mutation conversions and a large amount of new interesting Y-SNPs. Conclusions: Therefore, AMY-tree is a useful tool to determine the Y lineage of a sample based on SNP calling, to identify Y-SNPs with yet unknown phylogenetic position and to optimize the Y chromosomal phylogenetic tree in the future. AMY-tree will not add lineages to the existing phylogenetic tree of the Y-chromosome but it is the first step to analyse whole genome SNP profiles in a phylogenetic framework. PMID:23405914

  12. AMY-tree: an algorithm to use whole genome SNP calling for Y chromosomal phylogenetic applications.

    PubMed

    Van Geystelen, Anneleen; Decorte, Ronny; Larmuseau, Maarten H D

    2013-02-13

    Due to the rapid progress of next-generation sequencing (NGS) facilities, an explosion of human whole genome data will become available in the coming years. These data can be used to optimize and to increase the resolution of the phylogenetic Y chromosomal tree. Moreover, the exponential growth of known Y chromosomal lineages will require an automatic determination of the phylogenetic position of an individual based on whole genome SNP calling data and an up to date Y chromosomal tree. We present an automated approach, 'AMY-tree', which is able to determine the phylogenetic position of a Y chromosome using a whole genome SNP profile, independently from the NGS platform and SNP calling program, whereby mistakes in the SNP calling or phylogenetic Y chromosomal tree are taken into account. Moreover, AMY-tree indicates ambiguities within the present phylogenetic tree and points out new Y-SNPs which may be phylogenetically relevant. The AMY-tree software package was validated successfully on 118 whole genome SNP profiles of 109 males with different origins. Moreover, support was found for an unknown recurrent mutation, wrong reported mutation conversions and a large amount of new interesting Y-SNPs. Therefore, AMY-tree is a useful tool to determine the Y lineage of a sample based on SNP calling, to identify Y-SNPs with yet unknown phylogenetic position and to optimize the Y chromosomal phylogenetic tree in the future. AMY-tree will not add lineages to the existing phylogenetic tree of the Y-chromosome but it is the first step to analyse whole genome SNP profiles in a phylogenetic framework.

  13. High-throughput SNP genotyping in Cucurbita pepo for map construction and quantitative trait loci mapping

    PubMed Central

    2012-01-01

    Background Cucurbita pepo is a member of the Cucurbitaceae family, the second-most important horticultural family in terms of economic importance after Solanaceae. The "summer squash" types, including Zucchini and Scallop, rank among the highest-valued vegetables worldwide. There are few genomic tools available for this species. The first Cucurbita transcriptome, along with a large collection of Single Nucleotide Polymorphisms (SNP), was recently generated using massive sequencing. A set of 384 SNP was selected to generate an Illumina GoldenGate assay in order to construct the first SNP-based genetic map of Cucurbita and map quantitative trait loci (QTL). Results We herein present the construction of the first SNP-based genetic map of Cucurbita pepo using a population derived from the cross of two varieties with contrasting phenotypes, representing the main cultivar groups of the species' two subspecies: Zucchini (subsp. pepo) × Scallop (subsp. ovifera). The mapping population was genotyped with 384 SNP, a set of selected EST-SNP identified in silico after massive sequencing of the transcriptomes of both parents, using the Illumina GoldenGate platform. The global success rate of the assay was higher than 85%. In total, 304 SNP were mapped, along with 11 SSR from a previous map, giving a map density of 5.56 cM/marker. This map was used to infer syntenic relationships between C. pepo and cucumber and to successfully map QTL that control plant, flowering and fruit traits that are of benefit to squash breeding. The QTL effects were validated in backcross populations. Conclusion Our results show that massive sequencing in different genotypes is an excellent tool for SNP discovery, and that the Illumina GoldenGate platform can be successfully applied to constructing genetic maps and performing QTL analysis in Cucurbita. This is the first SNP-based genetic map in the Cucurbita genus and is an invaluable new tool for biological research, especially considering that most

  14. Highly specific SNP detection using 2D graphene electronics and DNA strand displacement.

    PubMed

    Hwang, Michael T; Landon, Preston B; Lee, Joon; Choi, Duyoung; Mo, Alexander H; Glinsky, Gennadi; Lal, Ratnesh

    2016-06-28

    Single-nucleotide polymorphisms (SNPs) in a gene sequence are markers for a variety of human diseases. Detection of SNPs with high specificity and sensitivity is essential for effective practical implementation of personalized medicine. Current DNA sequencing, including SNP detection, primarily uses enzyme-based methods or fluorophore-labeled assays that are time-consuming, need laboratory-scale settings, and are expensive. Previously reported electrical charge-based SNP detectors have insufficient specificity and accuracy, limiting their effectiveness. Here, we demonstrate the use of a DNA strand displacement-based probe on a graphene field effect transistor (FET) for high-specificity, single-nucleotide mismatch detection. The single mismatch was detected by measuring strand displacement-induced resistance (and hence current) change and Dirac point shift in a graphene FET. SNP detection in large double-helix DNA strands (e.g., 47 nt) minimizes false-positive results. Our electrical sensor-based SNP detection technology, without labeling and without apparent cross-hybridization artifacts, would allow fast, sensitive, and portable SNP detection with single-nucleotide resolution. The technology will have a wide range of applications in digital and implantable biosensors and high-throughput DNA genotyping, with transformative implications for personalized medicine.

  15. Objective evaluation measures of genetic marker selection in large-scale SNP genotyping.

    PubMed

    Kaminuma, Eli; Masuya, Hiroshi; Miura, Ikuo; Motegi, Hiromi; Takahasi, Kenzi R; Nakazawa, Miki; Matsui, Minami; Gondo, Yoichi; Noda, Tetsuo; Shiroishi, Toshihiko; Wakana, Shigeharu; Toyoda, Tetsuro

    2008-10-01

    High-throughput single nucleotide polymorphism (SNP) genotyping systems provide two kinds of fluorescent signals detected from different alleles. In current technologies, the process of genotype discrimination requires subjective judgments by expert operators, even when using clustering algorithms. Here, we propose two evaluation measures to manage fluorescent scatter data with nonclear plot aggregation. The first is the marker ranking measure, which provides a ranking system for the SNP markers based on the distance between the scatter plot distribution and a user-defined ideal distribution. The second measure, called individual genotype membership, uses the membership probability of each genotype related to an individual plot in the scatter data. In verification experiments, the marker ranking measure determined the ranking of SNP markers correlated with the subjective order of SNP markers judged by an expert operator. The experiment using the individual genotype membership measure clarified that the total number of unclassified individuals was remarkably reduced compared to that of manually unclassified ones. These two evaluation measures were implemented as the GTAssist software. GTAssist provides objective standards and avoids subjective biases in SNP genotyping workflows.

  16. Leveraging Ethnic Group Incidence Variation to Investigate Genetic Susceptibility to Glioma: A Novel Candidate SNP Approach

    PubMed Central

    Jacobs, Daniel I.; Walsh, Kyle M.; Wrensch, Margaret; Wiencke, John; Jenkins, Robert; Houlston, Richard S.; Bondy, Melissa; Simon, Matthias; Sanson, Marc; Gousias, Konstantinos; Schramm, Johannes; Labussière, Marianne; Di Stefano, Anna Luisa; Wichmann, H.-Erich; Müller-Nurasyid, Martina; Schreiber, Stefan; Franke, Andre; Moebus, Susanne; Eisele, Lewin; Dewan, Andrew T.; Dubrow, Robert

    2012-01-01

    Objectives: Using a novel candidate SNP approach, we aimed to identify a possible genetic basis for the higher glioma incidence in Whites relative to East Asians and African-Americans. Methods:  We hypothesized that genetic regions containing SNPs with extreme differences in allele frequencies across ethnicities are most likely to harbor susceptibility variants. We used International HapMap Project data to identify 3,961 candidate SNPs with the largest allele frequency differences in Whites compared to East Asians and Africans and tested these SNPs for association with glioma risk in a set of White cases and controls. Top SNPs identified in the discovery dataset were tested for association with glioma in five independent replication datasets. Results: No SNP achieved statistical significance in either the discovery or replication datasets after accounting for multiple testing or conducting meta-analysis. However, the most strongly associated SNP, rs879471, was found to be in linkage disequilibrium with a previously identified risk SNP, rs6010620, in RTEL1. We estimate rs6010620 to account for a glioma incidence rate ratio of 1.34 for Whites relative to East Asians. Conclusion: We explored genetic susceptibility to glioma using a novel candidate SNP method which may be applicable to other diseases with appropriate epidemiologic patterns. PMID:23091480
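    The candidate-selection step described above (keeping the SNPs whose allele frequencies differ most between one population and the others) can be sketched as follows. This is only an illustration: the abstract does not state the exact scoring rule, so the "smallest gap to any comparison population" score, the function name, the population labels, and the toy frequencies are all assumptions rather than the authors' procedure or HapMap data.

```python
def rank_by_frequency_difference(freqs, target, others, top_n):
    """Return the top_n SNP ids whose allele frequency in `target`
    differs most from every population in `others`.

    freqs: dict mapping snp_id -> dict mapping population -> allele frequency.
    Scoring rule (an assumption): the smallest absolute gap between the target
    population and any comparison population, so a SNP ranks high only if it
    differs from all of them.
    """
    scored = []
    for snp, by_pop in freqs.items():
        gap = min(abs(by_pop[target] - by_pop[o]) for o in others)
        scored.append((gap, snp))
    # Largest gap first; ties broken by SNP id for a deterministic order.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [snp for _, snp in scored[:top_n]]


# Hypothetical frequencies for three made-up SNPs in three populations.
toy = {
    "snp_a": {"EUR": 0.9, "EAS": 0.1, "AFR": 0.2},
    "snp_b": {"EUR": 0.5, "EAS": 0.5, "AFR": 0.1},
    "snp_c": {"EUR": 0.3, "EAS": 0.7, "AFR": 0.8},
}
candidates = rank_by_frequency_difference(toy, "EUR", ["EAS", "AFR"], top_n=2)
```

    In a study of this kind the selected candidates would then be tested for association in cases and controls; the ranking itself says nothing about disease risk.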

  17. Supervised learning-based tagSNP selection for genome-wide disease classifications

    PubMed Central

    Liu, Qingzhong; Yang, Jack; Chen, Zhongxue; Yang, Mary Qu; Sung, Andrew H; Huang, Xudong

    2008-01-01

    Background Comprehensive evaluation of common genetic variations through association of single nucleotide polymorphisms (SNPs) with complex human diseases on the genome-wide scale is an active area in human genome research. One of the fundamental questions in a SNP-disease association study is to find an optimal subset of SNPs with predicting power for disease status. To find that subset while reducing study burden in terms of time and costs, one can potentially reconcile information redundancy from associations between SNP markers. Results We have developed a feature selection method named Supervised Recursive Feature Addition (SRFA). This method combines supervised learning and statistical measures for the chosen candidate features/SNPs to reconcile the redundancy information and, in doing so, improve the classification performance in association studies. Additionally, we have proposed a Support Vector based Recursive Feature Addition (SVRFA) scheme in SNP-disease association analysis. Conclusions We have proposed using SRFA with different statistical learning classifiers and SVRFA for both SNP selection and disease classification and then applying them to two complex disease data sets. In general, our approaches outperform the well-known feature selection method of Support Vector Machine Recursive Feature Elimination and logic regression-based SNP selection for disease classification in genetic association studies. Our study further indicates that both genetic and environmental variables should be taken into account when doing disease predictions and classifications for the most complex human diseases that have gene-environment interactions. PMID:18366619
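    The general idea of recursive feature addition, growing a SNP subset one marker at a time according to classification performance, can be sketched as below. This is a generic greedy forward selection and not the authors' SRFA or SVRFA: the nearest-centroid classifier, the leave-one-out scoring, the function names, and the toy genotype matrix are assumptions chosen to keep the example self-contained, and the statistical redundancy measures mentioned in the abstract are not modelled.

```python
def loo_accuracy(X, y, feats):
    """Leave-one-out accuracy of a nearest-centroid classifier
    restricted to the feature (SNP) columns listed in `feats`."""
    correct = 0
    for i in range(len(X)):
        sums, counts = {}, {}
        for j, (row, label) in enumerate(zip(X, y)):
            if j == i:
                continue  # hold out sample i
            acc = sums.setdefault(label, [0.0] * len(feats))
            for k, f in enumerate(feats):
                acc[k] += row[f]
            counts[label] = counts.get(label, 0) + 1

        def sq_dist(label):
            # Squared distance from held-out sample to the class centroid.
            return sum((X[i][f] - sums[label][k] / counts[label]) ** 2
                       for k, f in enumerate(feats))

        predicted = min(sorted(sums), key=sq_dist)
        correct += predicted == y[i]
    return correct / len(X)


def recursive_feature_addition(X, y, n_select):
    """Greedily add the feature that most improves leave-one-out accuracy."""
    selected = []
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < n_select:
        # Ties are broken in favour of the lower column index.
        best = max(remaining,
                   key=lambda f: (loo_accuracy(X, y, selected + [f]), -f))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy genotypes coded 0/1/2; only column 1 separates the two classes.
X = [[0, 0, 1], [1, 0, 1], [2, 0, 1], [0, 2, 1], [1, 2, 1], [2, 2, 1]]
y = [0, 0, 0, 1, 1, 1]
first_pick = recursive_feature_addition(X, y, n_select=1)
```

    Feature addition builds the subset upward from nothing, whereas recursive feature elimination, the baseline named in the abstract, starts from all features and removes them.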

  18. snpGeneSets: An R Package for Genome-Wide Study Annotation.

    PubMed

    Mei, Hao; Li, Lianna; Jiang, Fan; Simino, Jeannette; Griswold, Michael; Mosley, Thomas; Liu, Shijian

    2016-12-07

    Genome-wide studies (GWS) of SNP associations and differential gene expressions have generated abundant results; next-generation sequencing technology has further boosted the number of variants and genes identified. Effective interpretation requires massive annotation and downstream analysis of these genome-wide results, a computationally challenging task. We developed the snpGeneSets package to simplify annotation and analysis of GWS results. Our package integrates local copies of knowledge bases for SNPs, genes, and gene sets, and implements wrapper functions in the R language to enable transparent access to low-level databases for efficient annotation of large genomic data. The package contains functions that execute three types of annotations: (1) genomic mapping annotation for SNPs and genes and functional annotation for gene sets; (2) bidirectional mapping between SNPs and genes, and genes and gene sets; and (3) calculation of gene effect measures from SNP associations and performance of gene set enrichment analyses to identify functional pathways. We applied snpGeneSets to type 2 diabetes (T2D) results from the NHGRI genome-wide association study (GWAS) catalog, a Finnish GWAS, and a genome-wide expression study (GWES). These studies demonstrate the usefulness of snpGeneSets for annotating and performing enrichment analysis of GWS results. The package is open-source, free, and can be downloaded at: https://www.umc.edu/biostats_software/.

  19. Mining and Analysis of SNP in Response to Salinity Stress in Upland Cotton (Gossypium hirsutum L.).

    PubMed

    Wang, Xiaoge; Lu, Xuke; Wang, Junjuan; Wang, Delong; Yin, Zujun; Fan, Weili; Wang, Shuai; Ye, Wuwei

    2016-01-01

    Salinity stress is a major abiotic factor that affects crop output, and because cotton is a pioneer crop on saline and alkaline land, the study of its salt tolerance is particularly important. In our experiment, four salt-tolerant varieties with different salt tolerance indexes including CRI35 (65.04%), Kanghuanwei164 (56.19%), Zhong9807 (55.20%) and CRI44 (50.50%), as well as four salt-sensitive cotton varieties including Hengmian3 (48.21%), GK50 (40.20%), Xinyan96-48 (34.90%), ZhongS9612 (24.80%) were used as the materials. These materials were divided into salt-tolerant group (ST) and salt-sensitive group (SS). Illumina Cotton SNP 70K Chip was used to detect SNP in different cotton varieties. SNPv (SNP variation of the same seedling before and after salt stress) in different varieties were screened; polymorphic SNP and SNPr (SNP related to salt tolerance) were obtained. Annotation and analysis of these SNPs showed that (1) the induction efficiency of salinity stress on SNPv of cotton materials with different salt tolerance index was different, in which the induction efficiency on salt-sensitive materials was significantly higher than that on salt-tolerant materials. The induction of salt stress on SNPv was obviously biased. (2) SNPv induced by salt stress may be related to the methylation changes under salt stress. (3) SNPr may influence salt tolerance of plants by affecting the expression of salt-tolerance related genes.

  20. Inferring Loss-of-Heterozygosity from Unpaired Tumors Using High-Density Oligonucleotide SNP Arrays

    PubMed Central

    Park, Yuhyun; Hao, Ke; Zhao, Xiaojun; Garraway, Levi A; Fox, Edward A; Hochberg, Ephraim P; Mellinghoff, Ingo K; Hofer, Matthias D; Descazeaud, Aurelien; Rubin, Mark A; Meyerson, Matthew; Wong, Wing Hung; Sellers, William R; Li, Cheng

    2006-01-01

    Loss of heterozygosity (LOH) of chromosomal regions bearing tumor suppressors is a key event in the evolution of epithelial and mesenchymal tumors. Identification of these regions usually relies on genotyping tumor and counterpart normal DNA and noting regions where heterozygous alleles in the normal DNA become homozygous in the tumor. However, paired normal samples for tumors and cell lines are often not available. With the advent of oligonucleotide arrays that simultaneously assay thousands of single-nucleotide polymorphism (SNP) markers, genotyping can now be done at high enough resolution to allow identification of LOH events by the absence of heterozygous loci, without comparison to normal controls. Here we describe a hidden Markov model-based method to identify LOH from unpaired tumor samples, taking into account SNP intermarker distances, SNP-specific heterozygosity rates, and the haplotype structure of the human genome. When we applied the method to data genotyped on 100 K arrays, we correctly identified 99% of SNP markers as either retention or loss. We also correctly identified 81% of the regions of LOH, including 98% of regions greater than 3 megabases. By integrating copy number analysis into the method, we were able to distinguish LOH from allelic imbalance. Application of this method to data from a set of prostate samples without paired normals identified known regions of prevalent LOH. We have developed a method for analyzing high-density oligonucleotide SNP array data to accurately identify regions of LOH and retention in tumors without the need for paired normal samples. PMID:16699594
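    A two-state hidden Markov model of the kind described above can be sketched with a small Viterbi decoder. This is a simplified illustration, not the published method: the emission and transition formulas, the parameter names and defaults (`err`, `scale_bp`, `prior_loh`), and the toy data are assumptions, and haplotype structure and copy number are ignored. It assumes each heterozygosity rate lies strictly between 0 and 1.

```python
import math

STATES = ("RET", "LOH")  # retention vs. loss of heterozygosity


def viterbi_loh(calls, het_rates, positions,
                err=0.01, scale_bp=1e6, prior_loh=0.1):
    """Most likely RET/LOH state per SNP for one unpaired tumor sample.

    calls      : list of "het" / "hom" genotype calls along a chromosome
    het_rates  : SNP-specific population heterozygosity rates, each in (0, 1)
    positions  : base-pair positions, in increasing order
    err        : assumed chance of a spurious "het" call inside an LOH region
    scale_bp   : assumed distance scale controlling how fast states can switch
    """
    def emit(state, call, h):
        p_het = h if state == "RET" else err
        return math.log(p_het if call == "het" else 1.0 - p_het)

    prior = {"RET": math.log(1.0 - prior_loh), "LOH": math.log(prior_loh)}
    score = {s: prior[s] + emit(s, calls[0], het_rates[0]) for s in STATES}
    back = []
    for i in range(1, len(calls)):
        d = positions[i] - positions[i - 1]
        # Switching becomes more likely as the gap between markers grows.
        p_switch = max(0.5 * (1.0 - math.exp(-d / scale_bp)), 1e-12)
        new_score, pointers = {}, {}
        for s in STATES:
            best_prev, best_val = None, -math.inf
            for r in STATES:
                trans = math.log(p_switch if r != s else 1.0 - p_switch)
                if score[r] + trans > best_val:
                    best_prev, best_val = r, score[r] + trans
            new_score[s] = best_val + emit(s, calls[i], het_rates[i])
            pointers[s] = best_prev
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Toy data: three heterozygous calls followed by a long homozygous run.
calls = ["het"] * 3 + ["hom"] * 15
path = viterbi_loh(calls, [0.3] * 18, [i * 100_000 for i in range(18)])
```

    The key point the sketch shares with the abstract is that a long run of homozygous calls at markers that are often heterozygous in the population is evidence for LOH even without a matched normal sample.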

  1. Explaining the disease phenotype of intergenic SNP through predicted long range regulation.

    PubMed

    Chen, Jingqi; Tian, Weidong

    2016-10-14

    Thousands of disease-associated SNPs (daSNPs) are located in intergenic regions (IGR), making it difficult to understand their association with disease phenotypes. Recent analysis found that non-coding daSNPs were frequently located in or proximate to regulatory elements, inspiring us to try to explain the disease phenotypes of IGR daSNPs through nearby regulatory sequences. Hence, after locating the nearest distal regulatory element (DRE) to a given IGR daSNP, we applied a computational method named INTREPID to predict the target genes regulated by the DRE, and then investigated their functional relevance to the IGR daSNP's disease phenotypes. 36.8% of all IGR daSNP-disease phenotype associations investigated were possibly explainable through the predicted target genes, which were enriched with, were functionally relevant to, or consisted of the corresponding disease genes. This proportion could be further increased to 60.5% if the LD SNPs of daSNPs were also considered. Furthermore, the predicted SNP-target gene pairs were enriched with known eQTL/mQTL SNP-gene relationships. Overall, it's likely that IGR daSNPs may contribute to disease phenotypes by interfering with the regulatory function of their nearby DREs and causing abnormal expression of disease genes. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  2. Assessment of high resolution melting analysis as a potential SNP genotyping technique in forensic casework.

    PubMed

    Venables, Samantha J; Mehta, Bhavik; Daniel, Runa; Walsh, Simon J; van Oorschot, Roland A H; McNevin, Dennis

    2014-11-01

    High resolution melting (HRM) analysis is a simple, cost effective, closed tube SNP genotyping technique with high throughput potential. The effectiveness of HRM for forensic SNP genotyping was assessed with five commercially available HRM kits evaluated on the ViiA™ 7 Real Time PCR instrument. Four kits performed satisfactorily against forensically relevant criteria. One was further assessed to determine the sensitivity, reproducibility, and accuracy of HRM SNP genotyping. The manufacturer's protocol using 0.5 ng input DNA and 45 PCR cycles produced accurate and reproducible results for 17 of the 19 SNPs examined. Problematic SNPs had GC rich flanking regions which introduced additional melting domains into the melting curve (rs1800407) or included homozygotes that were difficult to distinguish reliably (rs16891982; a G to C SNP). A proof of concept multiplexing experiment revealed that multiplexing a small number of SNPs may be possible after further investigation. HRM enables genotyping of a number of SNPs in a large number of samples without extensive optimization. However, it requires more genomic DNA as template in comparison to SNaPshot®. Furthermore, suitably modifying pre-existing forensic intelligence SNP panels for HRM analysis may pose difficulties due to the properties of some SNPs.

  3. Integrated Analysis of SNP, CNV and Gene Expression Data in Genetic Association Studies.

    PubMed

    Momtaz, Rana; Ghanem, Nagia M; El-Makky, Nagwa M; Ismail, Mohamed A

    2017-07-07

    Integrative approaches that combine multiple forms of data can more accurately capture pathway associations and so provide a comprehensive understanding of the molecular mechanisms that cause complex diseases. Association analyses based on SNP genotypes, CNV genotypes, and gene expression profiles are the three most common paradigms used for gene set/pathway enrichment analyses. Much work has been done to leverage information from two of these three data types. However, to the best of our knowledge, no previous work has integrated all three paradigms. In this paper, we present an integrated analysis that combines SNP, CNV, and gene expression data to generate a single gene list. We present different methods to compare this gene list with the other three possible lists that result from the combinations of the following pairs of data: SNP genotype with gene expression, CNV genotype with gene expression, and SNP genotype with CNV genotype. The comparison is done using three different cancer datasets and two different methods of comparison. Our results show that integrating SNP, CNV, and gene expression data gives better association results than integrating any pair of the three data types.

  4. SNP-based association analysis for seedling traits in durum wheat (Triticum turgidum L. durum (Desf.)).

    PubMed

    Sabiel, Salih A I; Huang, Sisi; Hu, Xin; Ren, Xifeng; Fu, Chunjie; Peng, Junhua; Sun, Dongfa

    2017-03-01

    In the present study, 150 accessions of worldwide originated durum wheat germplasm (Triticum turgidum spp. durum) were observed for major seedling traits and their growth. The accessions were evaluated for major seedling traits under controlled conditions of hydroponics at the 13th, 20th, 27th and 34th day after germination. Biomass traits were measured at the 34th day after germination. Correlation analysis was conducted among the seedling traits and three field traits at maturity (plant height, grain weight and 1000-grain weight) observed in four consecutive years. Associations of the measured seedling traits and SNP markers were analyzed based on the mixed linear model (MLM). The results indicated that highly significant genetic variation and robust heritability were found for the seedling and field mature traits. In total, 259 significant associations were detected for all the traits and four growth stages. The phenotypic variation explained (R²) by a single SNP marker is higher than 10% for most (84%) of the significant SNP markers. Forty-six SNP markers were associated with multiple traits, indicating non-negligible pleiotropy at the seedling stage. The associated SNP markers could be helpful for genetic analysis of seedling traits, and marker-assisted breeding of new wheat varieties with strong seedling vigor.

  5. SNP-based association analysis for seedling traits in durum wheat (Triticum turgidum L. durum (Desf.))

    PubMed Central

    Sabiel, Salih A. I.; Huang, Sisi; Hu, Xin; Ren, Xifeng; Fu, Chunjie; Peng, Junhua; Sun, Dongfa

    2017-01-01

    In the present study, 150 accessions of worldwide originated durum wheat germplasm (Triticum turgidum spp. durum) were observed for major seedling traits and their growth. The accessions were evaluated for major seedling traits under controlled conditions of hydroponics at the 13th, 20th, 27th and 34th day after germination. Biomass traits were measured at the 34th day after germination. Correlation analysis was conducted among the seedling traits and three field traits at maturity (plant height, grain weight and 1000-grain weight) observed in four consecutive years. Associations of the measured seedling traits and SNP markers were analyzed based on the mixed linear model (MLM). The results indicated that highly significant genetic variation and robust heritability were found for the seedling and field mature traits. In total, 259 significant associations were detected for all the traits and four growth stages. The phenotypic variation explained (R²) by a single SNP marker is higher than 10% for most (84%) of the significant SNP markers. Forty-six SNP markers were associated with multiple traits, indicating non-negligible pleiotropy at the seedling stage. The associated SNP markers could be helpful for genetic analysis of seedling traits, and marker-assisted breeding of new wheat varieties with strong seedling vigor. PMID:28588384

  6. The Generalized Higher Criticism for Testing SNP-Set Effects in Genetic Association Studies.

    PubMed

    Barnett, Ian; Mukherjee, Rajarshi; Lin, Xihong

    2017-01-01

    It is of substantial interest to study the effects of genes, genetic pathways, and networks on the risk of complex diseases. These genetic constructs each contain multiple SNPs, which are often correlated and function jointly, and might be large in number. However, only a sparse subset of SNPs in a genetic construct is generally associated with the disease of interest. In this article, we propose the generalized higher criticism (GHC) to test for the association between an SNP set and a disease outcome. The higher criticism is a test traditionally used in high-dimensional signal detection settings when marginal test statistics are independent and the number of parameters is very large. However, these assumptions do not always hold in genetic association studies, due to linkage disequilibrium among SNPs and the finite number of SNPs in an SNP set in each genetic construct. The proposed GHC overcomes the limitations of the higher criticism by allowing for arbitrary correlation structures among the SNPs in an SNP-set, while performing accurate analytic p-value calculations for any finite number of SNPs in the SNP-set. We obtain the detection boundary of the GHC test. We compared empirically using simulations the power of the GHC method with existing SNP-set tests over a range of genetic regions with varied correlation structures and signal sparsity. We apply the proposed methods to analyze the CGEM breast cancer genome-wide association study. Supplementary materials for this article are available online.
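    For orientation, the classical higher criticism statistic that the GHC generalizes can be computed from a set of p-values as below. This sketch implements only the classical form, which assumes independent marginal tests; it does not implement the correlation-aware variance or the analytic p-value calculation of the proposed GHC. The function name and the `alpha0` default are assumptions.

```python
import math


def higher_criticism(pvalues, alpha0=0.5):
    """Classical higher criticism statistic for a list of p-values.

    HC = max over 1 <= i <= alpha0 * n of
         sqrt(n) * (i/n - p_(i)) / sqrt(p_(i) * (1 - p_(i))),
    where p_(1) <= ... <= p_(n) are the sorted p-values.
    Assumes independent tests (the setting the GHC is designed to relax).
    """
    p = sorted(pvalues)
    n = len(p)
    k = max(1, int(alpha0 * n))
    best = -math.inf
    for i in range(1, k + 1):
        pi = p[i - 1]
        if 0.0 < pi < 1.0:  # skip degenerate values to avoid division by zero
            stat = math.sqrt(n) * (i / n - pi) / math.sqrt(pi * (1.0 - pi))
            best = max(best, stat)
    return best


# A single very small p-value among otherwise unremarkable ones (sparse
# signal) gives a much larger statistic than evenly spread p-values.
hc_sparse = higher_criticism([0.001, 0.4, 0.6, 0.8])
hc_null_like = higher_criticism([0.2, 0.4, 0.6, 0.8])
```

    The statistic compares the observed fraction of small p-values with the fraction expected under the null, which is why it is suited to the sparse-signal setting the abstract describes.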

  7. Association of NR3C1/Glucocorticoid Receptor gene SNP with azoospermia in Japanese men.

    PubMed

    Chihara, Makoto; Yoshihara, Kosuke; Ishiguro, Tatsuya; Adachi, Sosuke; Okada, Hiroyuki; Kashima, Katsunori; Sato, Takaaki; Tanaka, Atsushi; Tanaka, Kenichi; Enomoto, Takayuki

    2016-01-01

    The molecular pathogenesis of non-obstructive azoospermia (NOA) is unclear. Our aim was to identify the genetic susceptibility for NOA in Japanese men by using a combination of transcriptome network analysis and SNP genotyping. We searched for candidate genes using RNA transcriptome network analysis of 2611 NOA-related genes that we had previously reported. We analyzed candidate genes for disease linkage with single nucleotide polymorphisms (SNP) in the genomes of 335 Japanese men with NOA and 410 healthy controls using SNP-specific real-time polymerase chain reaction TaqMan assays. Three candidate genes (NR3C1, YBX2, and BCL2) were identified by the transcriptome network analysis, each with three SNP. Allele frequency analysis of the nine SNP indicated a significantly higher frequency of the NR3C1 rs852977 G allele in NOA cases compared with controls (corrected P = 5.7e-15; odds ratio = 3.20; 95% confidence interval, 2.40-4.26). The other eight candidate polymorphisms showed no significant association. The NR3C1 rs852977 polymorphism is a potential marker for genetic susceptibility to NOA in Japanese men. Further studies are necessary to clarify the association between the NR3C1 polymorphism and alterations of glucocorticoid signaling pathway leading to male infertility. © 2015 Japan Society of Obstetrics and Gynecology.
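    An allelic odds ratio with a 95% confidence interval, the kind of summary reported above, can be computed from a 2×2 table of allele counts as sketched below. The abstract does not give the underlying counts or state which interval method was used, so the Woolf (log-based) interval, the function name, and the example counts are assumptions; the numbers are hypothetical and are not the study's data.

```python
import math


def allelic_odds_ratio(case_risk, case_other, ctrl_risk, ctrl_other, z=1.96):
    """Odds ratio and Woolf confidence interval from allele counts.

    case_risk / case_other : counts of the risk and other allele in cases
    ctrl_risk / ctrl_other : the same counts in controls
    z                      : normal quantile (1.96 for a 95% interval)
    All four counts must be positive.
    """
    odds_ratio = (case_risk * ctrl_other) / (case_other * ctrl_risk)
    # Standard error of log(OR) under the usual large-sample approximation.
    se = math.sqrt(1 / case_risk + 1 / case_other
                   + 1 / ctrl_risk + 1 / ctrl_other)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Hypothetical allele counts, for illustration only.
or_, lo, hi = allelic_odds_ratio(20, 80, 10, 90)
```

    An interval that excludes 1, as the reported 2.40-4.26 does, indicates that the allele frequency differs between cases and controls at the chosen confidence level.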

  8. Supervised learning-based tagSNP selection for genome-wide disease classifications.

    PubMed

    Liu, Qingzhong; Yang, Jack; Chen, Zhongxue; Yang, Mary Qu; Sung, Andrew H; Huang, Xudong

    2008-01-01

    Comprehensive evaluation of common genetic variations through association of single nucleotide polymorphisms (SNPs) with complex human diseases on the genome-wide scale is an active area in human genome research. One of the fundamental questions in a SNP-disease association study is to find an optimal subset of SNPs with predicting power for disease status. To find that subset while reducing study burden in terms of time and costs, one can potentially reconcile information redundancy from associations between SNP markers. We have developed a feature selection method named Supervised Recursive Feature Addition (SRFA). This method combines supervised learning and statistical measures for the chosen candidate features/SNPs to reconcile the redundancy information and, in doing so, improve the classification performance in association studies. Additionally, we have proposed a Support Vector based Recursive Feature Addition (SVRFA) scheme in SNP-disease association analysis. We have proposed using SRFA with different statistical learning classifiers and SVRFA for both SNP selection and disease classification and then applying them to two complex disease data sets. In general, our approaches outperform the well-known feature selection method of Support Vector Machine Recursive Feature Elimination and logic regression-based SNP selection for disease classification in genetic association studies. Our study further indicates that both genetic and environmental variables should be taken into account when doing disease predictions and classifications for the most complex human diseases that have gene-environment interactions.

  9. Explaining the disease phenotype of intergenic SNP through predicted long range regulation

    PubMed Central

    Chen, Jingqi; Tian, Weidong

    2016-01-01

    Thousands of disease-associated SNPs (daSNPs) are located in intergenic regions (IGR), making it difficult to understand their association with disease phenotypes. Recent analysis found that non-coding daSNPs were frequently located in or proximate to regulatory elements, inspiring us to try to explain the disease phenotypes of IGR daSNPs through nearby regulatory sequences. Hence, after locating the nearest distal regulatory element (DRE) to a given IGR daSNP, we applied a computational method named INTREPID to predict the target genes regulated by the DRE, and then investigated their functional relevance to the IGR daSNP's disease phenotypes. 36.8% of all IGR daSNP-disease phenotype associations investigated were possibly explainable through the predicted target genes, which were enriched with, were functionally relevant to, or consisted of the corresponding disease genes. This proportion could be further increased to 60.5% if the LD SNPs of daSNPs were also considered. Furthermore, the predicted SNP-target gene pairs were enriched with known eQTL/mQTL SNP-gene relationships. Overall, it's likely that IGR daSNPs may contribute to disease phenotypes by interfering with the regulatory function of their nearby DREs and causing abnormal expression of disease genes. PMID:27280978

  10. Different SNP combinations in the GCH1 gene and use of labor analgesia

    PubMed Central

    2010-01-01

    Background The aim of this study was to investigate whether there is an association between different SNP combinations in the guanosine triphosphate cyclohydrolase (GCH1) gene and a number of pain behavior related outcomes during labor. A population-based sample of pregnant women (n = 814) was recruited at gestational week 18. A plasma sample was collected from each subject. Genotyping was performed and three single nucleotide polymorphisms (SNP) previously defined as a pain-protective SNP combination of GCH1 were used. Results Homozygous carriers of the pain-protective SNP combination of GCH1 arrived at the delivery ward with a more advanced stage of cervical dilation compared to heterozygous carriers and non-carriers. However, homozygous carriers more often used second line labor analgesia compared to the others. Conclusion The pain-protective SNP combination of GCH1 may be of importance in the limited number of homozygous carriers during the initial dilation of cervix but upon arrival at the delivery unit these women are more inclined to use second line labor analgesia. PMID:20633294

  11. snpGeneSets: An R Package for Genome-Wide Study Annotation

    PubMed Central

    Mei, Hao; Li, Lianna; Jiang, Fan; Simino, Jeannette; Griswold, Michael; Mosley, Thomas; Liu, Shijian

    2016-01-01

    Genome-wide studies (GWS) of SNP associations and differential gene expressions have generated abundant results; next-generation sequencing technology has further boosted the number of variants and genes identified. Effective interpretation requires massive annotation and downstream analysis of these genome-wide results, a computationally challenging task. We developed the snpGeneSets package to simplify annotation and analysis of GWS results. Our package integrates local copies of knowledge bases for SNPs, genes, and gene sets, and implements wrapper functions in the R language to enable transparent access to low-level databases for efficient annotation of large genomic data. The package contains functions that execute three types of annotations: (1) genomic mapping annotation for SNPs and genes and functional annotation for gene sets; (2) bidirectional mapping between SNPs and genes, and genes and gene sets; and (3) calculation of gene effect measures from SNP associations and performance of gene set enrichment analyses to identify functional pathways. We applied snpGeneSets to type 2 diabetes (T2D) results from the NHGRI genome-wide association study (GWAS) catalog, a Finnish GWAS, and a genome-wide expression study (GWES). These studies demonstrate the usefulness of snpGeneSets for annotating and performing enrichment analysis of GWS results. The package is open-source, free, and can be downloaded at: https://www.umc.edu/biostats_software/. PMID:27807048

  12. Highly specific SNP detection using 2D graphene electronics and DNA strand displacement

    PubMed Central

    Hwang, Michael T.; Landon, Preston B.; Lee, Joon; Choi, Duyoung; Mo, Alexander H.; Glinsky, Gennadi; Lal, Ratnesh

    2016-01-01

    Single-nucleotide polymorphisms (SNPs) in a gene sequence are markers for a variety of human diseases. Detection of SNPs with high specificity and sensitivity is essential for effective practical implementation of personalized medicine. Current DNA sequencing, including SNP detection, primarily uses enzyme-based methods or fluorophore-labeled assays that are time-consuming, need laboratory-scale settings, and are expensive. Previously reported electrical charge-based SNP detectors have insufficient specificity and accuracy, limiting their effectiveness. Here, we demonstrate the use of a DNA strand displacement-based probe on a graphene field effect transistor (FET) for high-specificity, single-nucleotide mismatch detection. The single mismatch was detected by measuring strand displacement-induced resistance (and hence current) change and Dirac point shift in a graphene FET. SNP detection in large double-helix DNA strands (e.g., 47 nt) minimizes false-positive results. Our electrical sensor-based SNP detection technology, without labeling and without apparent cross-hybridization artifacts, would allow fast, sensitive, and portable SNP detection with single-nucleotide resolution. The technology will have a wide range of applications in digital and implantable biosensors and high-throughput DNA genotyping, with transformative implications for personalized medicine. PMID:27298347

  13. MDM2 SNP309 polymorphism is associated with colorectal cancer risk

    PubMed Central

    Wang, Weizhi; Du, Mulong; Gu, Dongying; Zhu, Lingjun; Chu, Haiyan; Tong, Na; Zhang, Zhengdong; Xu, Zekuan; Wang, Meilin

    2014-01-01

    The human murine double minute 2 (MDM2) is known as an oncoprotein through inhibiting P53 transcriptional activity and mediating P53 ubiquitination. Therefore, the amplification of MDM2 may attenuate the P53 pathway and promote tumorigenesis. The SNP309 T>G polymorphism (rs2279744), which is located in the intronic promoter of MDM2 gene, was reported to contribute to the increased level of MDM2 protein. In this hospital-based case-control study, which consisted of 573 cases and 588 controls, we evaluated the association between MDM2 SNP309 and the risk of colorectal cancer (CRC) in a Chinese population by using the TaqMan method to genotype the polymorphism. We found that the MDM2 SNP309 polymorphism was significantly associated with CRC risk. In addition, in our meta-analysis, we found a significant association between MDM2 SNP309 and CRC risk among Asians, which was consistent with our results. In conclusion, we demonstrated that the MDM2 SNP309 polymorphism increased the susceptibility of CRC in Asian populations. PMID:24797837

  14. Developing a new nonbinary SNP fluorescent multiplex detection system for forensic application in China.

    PubMed

    Liu, Yanfang; Liao, Huidan; Liu, Ying; Guo, Juanjuan; Sun, Yi; Fu, Xiaoliang; Xiao, Ding; Cai, Jifeng; Lan, Lingmei; Xie, Pingli; Zha, Lagabaiyila

    2017-02-06

    Nonbinary single-nucleotide polymorphisms (SNPs) are potential forensic genetic markers because their discrimination power is greater than that of normal binary SNPs and because they can be used to analyze highly degraded samples. We previously developed a nonbinary SNP multiplex typing assay. In this study, we selected an additional 20 nonbinary SNPs from the NCBI SNP database and verified them through pyrosequencing. These 20 nonbinary SNPs were analyzed using the fluorescent-labeled SNaPshot multiplex SNP typing method. The allele frequencies and genetic parameters of these 20 nonbinary SNPs were determined among 314 unrelated individuals from Han populations in China. The total power of discrimination was 0.9999999999994, and the cumulative probability of exclusion was 0.9986. Moreover, combining this 20 nonbinary SNP assay with the 20 nonbinary SNP assay we previously developed showed that the cumulative probability of exclusion of the 40 nonbinary SNPs was 0.999991 and that no significant linkage disequilibrium was observed among the 40 nonbinary SNPs. Thus, we concluded that this new system consisting of 20 new nonbinary SNPs could provide highly informative polymorphic data that could be further used in forensic applications and would serve as a potentially valuable supplement to forensic DNA analysis.
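
    The cumulative figures quoted above are usually obtained with the product rule for independent loci, which is reasonable here given that no significant linkage disequilibrium was reported. The sketch below is only an illustration of that rule: the per-locus values are hypothetical, and the value for the earlier 20-SNP panel is back-solved from the two reported totals, so it is an inference rather than a reported result.

```python
from math import prod


def cumulative(per_locus):
    """Combine per-locus probabilities under independence: 1 - prod(1 - p_i)."""
    return 1.0 - prod(1.0 - p for p in per_locus)


# Hypothetical per-locus exclusion probabilities (not taken from the study).
example_pe = [0.30, 0.25, 0.40]
print(cumulative(example_pe))

# Two panels combine the same way. 0.9986 (new panel) and 0.999991 (all 40
# SNPs) are the reported values; the earlier panel's value is back-solved
# from them and is an assumption of this sketch.
cpe_new = 0.9986
cpe_total = 0.999991
cpe_old_implied = 1.0 - (1.0 - cpe_total) / (1.0 - cpe_new)
print(cpe_old_implied)
```

    The same function applies to the power of discrimination, since both statistics accumulate as one minus the product of the per-locus complements.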

  15. SnpFilt: A pipeline for reference-free assembly-based identification of SNPs in bacterial genomes.

    PubMed

    Chan, Carmen H S; Octavia, Sophie; Sintchenko, Vitali; Lan, Ruiting

    2016-12-01

    De novo assembly of bacterial genomes from next-generation sequencing (NGS) data allows a reference-free discovery of single nucleotide polymorphisms (SNP). However, substantial rates of errors in genomes assembled by this approach remain a major barrier for the reference-free analysis of genome variations in medically important bacteria. The aim of this report was to improve the quality of SNP identification in bacterial genomes without closely related references. We developed a bioinformatics pipeline (SnpFilt) that constructs an assembly using SPAdes and then removes unreliable regions based on the quality and coverage of re-aligned reads at neighbouring regions. The performance of the pipeline was compared against reference-based SNP calling for Illumina HiSeq, MiSeq and NextSeq reads from a range of bacterial pathogens including Salmonella, which is one of the most common causes of food-borne disease. The SnpFilt pipeline removed all false SNP in all test NGS datasets consisting of paired-end Illumina reads. We also showed that for reliable and complete SNP calls, at least 40-fold coverage is required. Analysis of bacterial isolates associated with epidemiologically confirmed outbreaks using the SnpFilt pipeline produced results consistent with previously published findings. The SnpFilt pipeline improves the quality of de-novo assembly and precision of SNP calling in bacterial genomes by removal of regions of the assembly that may potentially contain assembly errors. SnpFilt is available from https://github.com/LanLab/SnpFilt.
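
    The 40-fold coverage requirement can be turned into a quick pre-flight check before running an assembly-based pipeline. This is a minimal sketch using the usual definition of mean coverage (sequenced bases divided by genome size); the function names, the 150-base read length, and the 4.8 Mbp genome size are illustrative assumptions, not values from the report.

```python
import math

MIN_COVERAGE = 40  # fold coverage stated in the abstract as the minimum


def mean_coverage(n_reads, read_length, genome_size):
    """Mean fold coverage = total sequenced bases / genome size."""
    return n_reads * read_length / genome_size


def reads_needed(read_length, genome_size, target=MIN_COVERAGE):
    """Smallest number of reads that reaches the target mean coverage."""
    return math.ceil(target * genome_size / read_length)


# Hypothetical run: 150-base reads against a 4.8 Mbp genome.
n = reads_needed(150, 4_800_000)
print(n, mean_coverage(n, 150, 4_800_000))
```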

  16. Transcriptome sequencing for SNP discovery across Cucumis melo

    PubMed Central

    2012-01-01

    from India and Africa as compared to commercial cultivars, cultigens and landraces from Eastern Europe, Western Asia and the Mediterranean basin is consistent with the evolutionary history proposed for the species. Group-specific SNVs that will be useful in introgression programs were also detected. In a sample of 143 selected putative SNPs, we verified 93% of the polymorphisms in a panel of 78 genotypes. Conclusions This study provides the first comprehensive resequencing data for wild, exotic, and cultivated (landraces and commercial) melon transcriptomes, yielding the largest melon SNP collection available to date and representing a notable sample of the species diversity. This data provides a valuable resource for creating a catalog of allelic variants of melon genes and it will aid in future in-depth studies of population genetics, marker-assisted breeding, and gene identification aimed at developing improved varieties. PMID:22726804

  17. SNP Discovery by Illumina-Based Transcriptome Sequencing of the Olive and the Genetic Characterization of Turkish Olive Genotypes Revealed by AFLP, SSR and SNP Markers

    PubMed Central

    Kaya, Hilal Betul; Cetin, Oznur; Kaya, Hulya; Sahin, Mustafa; Sefer, Filiz; Kahraman, Abdullah; Tanyolac, Bahattin

    2013-01-01

    Background The olive tree (Olea europaea L.) is a diploid (2n = 2x = 46) outcrossing species mainly grown in the Mediterranean area, where it is the most important oil-producing crop. Because of its economic, cultural and ecological importance, various DNA markers have been used in the olive to characterize and elucidate homonyms, synonyms and unknown accessions. However, a comprehensive characterization and a full sequence of its transcriptome are unavailable, leading to the importance of an efficient large-scale single nucleotide polymorphism (SNP) discovery in olive. The objectives of this study were (1) to discover olive SNPs using next-generation sequencing and to identify SNP primers for cultivar identification and (2) to characterize 96 olive genotypes originating from different regions of Turkey. Methodology/Principal Findings Next-generation sequencing technology was used with five distinct olive genotypes and generated cDNA, producing 126,542,413 reads using an Illumina Genome Analyzer IIx. Following quality and size trimming, the high-quality reads were assembled into 22,052 contigs with an average length of 1,321 bases and 45 singletons. The SNPs were filtered and 2,987 high-quality putative SNP primers were identified. The assembled sequences and singletons were subjected to BLAST similarity searches and annotated with a Gene Ontology identifier. To identify the 96 olive genotypes, these SNP primers were applied to the genotypes in combination with amplified fragment length polymorphism (AFLP) and simple sequence repeats (SSR) markers. Conclusions/Significance This study marks the highest number of SNP markers discovered to date from olive genotypes using transcriptome sequencing. The developed SNP markers will provide a useful source for molecular genetic studies, such as genetic diversity and characterization, high density quantitative trait locus (QTL) analysis, association mapping and map-based gene cloning in the olive. High levels of

  18. Breast cancer-associated high-order SNP-SNP interaction of CXCL12/CXCR4-related genes by an improved multifactor dimensionality reduction (MDR-ER).

    PubMed

    Fu, Ou-Yang; Chang, Hsueh-Wei; Lin, Yu-Da; Chuang, Li-Yeh; Hou, Ming-Feng; Yang, Cheng-Hong

    2016-09-01

    In association studies, the combined effects of single nucleotide polymorphism (SNP)-SNP interactions and the problem of imbalanced data between cases and controls are frequently ignored. In the present study, we used an improved multifactor dimensionality reduction (MDR) approach namely MDR-ER to detect the high order SNP‑SNP interaction in an imbalanced breast cancer data set containing seven SNPs of chemokine CXCL12/CXCR4 pathway genes. Most individual SNPs were not significantly associated with breast cancer. After MDR‑ER analysis, six significant SNP‑SNP interaction models with seven genes (highest cross‑validation consistency, 10; classification error rates, 41.3‑21.0; and prediction error rates, 47.4‑55.3) were identified. CD4 and VEGFA genes were associated in a 2‑loci interaction model (classification error rate, 41.3; prediction error rate, 47.5; odds ratio (OR), 2.069; 95% bootstrap CI, 1.40‑2.90; P=1.71E‑04) and it also appeared in all the best 2‑7‑loci models. When the loci number increased, the classification error rates and P‑values decreased. The powers in 2‑7‑loci in all models were >0.9. The minimum classification error rate of the MDR‑ER‑generated model was shown with the 7‑loci interaction model (classification error rate, 21.0; OR=15.282; 95% bootstrap CI, 9.54‑23.87; P=4.03E‑31). In the epistasis network analysis, the overall effect with breast cancer susceptibility was identified and the SNP order of impact on breast cancer was identified as follows: CD4 = VEGFA > KITLG > CXCL12 > CCR7 = MMP2 > CXCR4. In conclusion, the MDR‑ER can effectively and correctly identify the best SNP‑SNP interaction models in an imbalanced data set for breast cancer cases.

  19. An Affymetrix Microarray Design for Microbial Genotyping

    DTIC Science & Technology

    2009-10-01

    Pagotto, F. 2004. Selective discrimination of Listeria monocytogenes epidemic strains by a mixed-genome DNA microarray compared to discrimination by...Legionella pneumophila Paris 399 Legionella pneumophila pneumophila 5 Listeria innocua Clip 11262 105 Listeria ivanoviiI ATCC 19119 5 Listeria ...monocytogenes monocytogenes 10 Listeria monocytogenes APRT EGD-e 5 Listeria monocytogenes HPT 4b 2365 10 Listeria monocytogenes HPT EGD-e 5 Listeria

  20. SNP discrimination through proofreading and OFF-switch of exo+ polymerase.

    PubMed

    Zhang, Jia; Li, Kai; Pardinas, Jose R; Liao, Duan F; Li, Hong J; Zhang, Xu

    2004-05-01

    Single nucleotide polymorphisms (SNPs) are useful physical markers for genetic studies as well as the cause of some genetic diseases. To develop more reliable SNP assays, we examined the underlying molecular mechanisms by which deoxyribonucleic acid (DNA) polymerases with 3' exonuclease activity maintain the high fidelity of DNA replication. In addition to mismatch removal by proofreading, we have discovered a premature termination of polymerization mediated by a novel OFF-switch mechanism. Two SNP assays were developed, one based on proofreading using 3' end-labeled primer extension and the other based on the newly identified OFF-switch, respectively. These two new assays are well suited for conventional techniques, such as electrophoresis and microplates detection systems as well as the sophisticated microchips. Application of these reliable SNP assays will greatly facilitate genetic and biomedical studies in the postgenome era.

  1. The NHGRI GWAS Catalog, a curated resource of SNP-trait associations

    PubMed Central

    Welter, Danielle; MacArthur, Jacqueline; Morales, Joannella; Burdett, Tony; Hall, Peggy; Junkins, Heather; Klemm, Alan; Flicek, Paul; Manolio, Teri; Hindorff, Lucia; Parkinson, Helen

    2014-01-01

    The National Human Genome Research Institute (NHGRI) Catalog of Published Genome-Wide Association Studies (GWAS Catalog) provides a publicly available manually curated collection of published GWAS assaying at least 100 000 single-nucleotide polymorphisms (SNPs) and all SNP-trait associations with P < 1 × 10^-5. The Catalog includes 1751 curated publications of 11 912 SNPs. In addition to the SNP-trait association data, the Catalog also publishes a quarterly diagram of all SNP-trait associations mapped to the SNPs’ chromosomal locations. The Catalog can be accessed via a tabular web interface, via a dynamic visualization on the human karyotype, as a downloadable tab-delimited file and as an OWL knowledge base. This article presents a number of recent improvements to the Catalog, including novel ways for users to interact with the Catalog and changes to the curation infrastructure. PMID:24316577
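
    Because the Catalog is offered as a tab-delimited file, the stated inclusion threshold is easy to re-apply when working with a local copy. The sketch below assumes a simplified layout: the column names SNP, TRAIT, and PVALUE and the three sample rows are placeholders invented for illustration, not the Catalog's actual header or data.

```python
import csv
import io

P_THRESHOLD = 1e-5  # inclusion threshold stated in the abstract

# Hypothetical rows; the column names are placeholders, not the real header.
SAMPLE_TSV = (
    "SNP\tTRAIT\tPVALUE\n"
    "rs0000001\ttrait A\t3e-8\n"
    "rs0000002\ttrait B\t2e-4\n"
    "rs0000003\ttrait A\t9e-6\n"
)


def significant_associations(tsv_text, threshold=P_THRESHOLD):
    """Return the rows whose p-value is strictly below the threshold."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row for row in reader if float(row["PVALUE"]) < threshold]


hits = significant_associations(SAMPLE_TSV)
print([row["SNP"] for row in hits])
```

    For a real download, the column names would have to be taken from the file's own header line.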

  2. The NHGRI GWAS Catalog, a curated resource of SNP-trait associations.

    PubMed

    Welter, Danielle; MacArthur, Jacqueline; Morales, Joannella; Burdett, Tony; Hall, Peggy; Junkins, Heather; Klemm, Alan; Flicek, Paul; Manolio, Teri; Hindorff, Lucia; Parkinson, Helen

    2014-01-01

    The National Human Genome Research Institute (NHGRI) Catalog of Published Genome-Wide Association Studies (GWAS Catalog) provides a publicly available manually curated collection of published GWAS assaying at least 100,000 single-nucleotide polymorphisms (SNPs) and all SNP-trait associations with P < 1 × 10^-5. The Catalog includes 1751 curated publications of 11 912 SNPs. In addition to the SNP-trait association data, the Catalog also publishes a quarterly diagram of all SNP-trait associations mapped to the SNPs' chromosomal locations. The Catalog can be accessed via a tabular web interface, via a dynamic visualization on the human karyotype, as a downloadable tab-delimited file and as an OWL knowledge base. This article presents a number of recent improvements to the Catalog, including novel ways for users to interact with the Catalog and changes to the curation infrastructure.

  3. Developing single nucleotide polymorphism (SNP) markers from transcriptome sequences for identification of longan (Dimocarpus longan) germplasm

    PubMed Central

    Wang, Boyi; Tan, Hua-Wei; Fang, Wanping; Meinhardt, Lyndel W; Mischke, Sue; Matsumoto, Tracie; Zhang, Dapeng

    2015-01-01

    Longan (Dimocarpus longan Lour.) is an important tropical fruit tree crop. Accurate varietal identification is essential for germplasm management and breeding. Using longan transcriptome sequences from public databases, we developed single nucleotide polymorphism (SNP) markers; validated 60 SNPs in 50 longan germplasm accessions, including cultivated varieties and wild germplasm; and designated 25 SNP markers that unambiguously identified all tested longan varieties with high statistical rigor (P<0.0001). Multiple trees from the same clone were verified and off-type trees were identified. Diversity analysis revealed genetic relationships among analyzed accessions. Cultivated varieties differed significantly from wild populations (Fst=0.300; P<0.001), demonstrating untapped genetic diversity for germplasm conservation and utilization. Within cultivated varieties, apparent differences between varieties from China and those from Thailand and Hawaii indicated geographic patterns of genetic differentiation. These SNP markers provide a powerful tool to manage longan genetic resources and breeding, with accurate and efficient genotype identification. PMID:26504559

  4. A SNP-Based Molecular Barcode for Characterization of Common Wheat

    PubMed Central

    Gao, LiFeng; Jia, JiZeng; Kong, XiuYing

    2016-01-01

    Wheat is grown as a staple crop worldwide. It is important to develop an effective genotyping tool for this cereal grain both to identify germplasm diversity and to protect the rights of breeders. Single-nucleotide polymorphism (SNP) genotyping provides a means for developing a practical, rapid, inexpensive and high-throughput assay. Here, we investigated SNPs as robust markers of genetic variation for typing wheat cultivars. We identified SNPs from an array of 9000 across a collection of 429 well-known wheat cultivars grown in China, of which 43 SNP markers with high minor allele frequency and variations discriminated the selected wheat varieties and their wild ancestors. This SNP-based barcode will allow for the rapid and precise identification of wheat germplasm resources and newly released varieties and will further assist in the wheat breeding program. PMID:26985664

  5. Analyzing copy number variation using SNP array data: protocols for calling CNV and association tests.

    PubMed

    Lin, Chiao-Feng; Naj, Adam C; Wang, Li-San

    2013-10-18

    High-density SNP genotyping technology provides a low-cost, effective tool for conducting Genome Wide Association (GWA) studies. The wide adoption of GWA studies has indeed led to discoveries of disease- or trait-associated SNPs, some of which were subsequently shown to be causal. However, the nearly universal shortcoming of many GWA studies--missing heritability--has prompted great interest in searching for other types of genetic variation, such as copy number variation (CNV). Certain CNVs have been reported to alter disease susceptibility. Algorithms and tools have been developed to identify CNVs using SNP array hybridization intensity data. Such an approach provides an additional source of data with almost no extra cost. In this unit, we demonstrate the steps for calling CNVs from Illumina SNP array data using PennCNV and performing association analysis using R and PLINK. Copyright © 2013 John Wiley & Sons, Inc.

  6. Bayesian model comparison in genetic association analysis: linear mixed modeling and SNP set testing.

    PubMed

    Wen, Xiaoquan

    2015-10-01

    We consider the problems of hypothesis testing and model comparison under a flexible Bayesian linear regression model whose formulation is closely connected with the linear mixed effect model and the parametric models for Single Nucleotide Polymorphism (SNP) set analysis in genetic association studies. We derive a class of analytic approximate Bayes factors and illustrate their connections with a variety of frequentist test statistics, including the Wald statistic and the variance component score statistic. Taking advantage of Bayesian model averaging and hierarchical modeling, we demonstrate some distinct advantages and flexibilities in the approaches utilizing the derived Bayes factors in the context of genetic association studies. We demonstrate our proposed methods using real or simulated numerical examples in applications of single SNP association testing, multi-locus fine-mapping and SNP set association testing. © The Author 2015. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  7. Multi-marker-LD based genetic algorithm for tag SNP selection.

    PubMed

    Mouawad, Amer E; Mansour, Nashat

    2014-12-01

    Despite the advances in genotyping technologies, which have led to a large reduction in genotyping cost, the Tag SNP Selection problem remains an important problem for computational biologists and geneticists. Selecting the smallest subset of tag SNPs that can predict the other SNPs would considerably reduce the complexity of genome-wide or block-based SNP-disease association studies. These studies would lead to better diagnosis and treatment of diseases. In this work, we propose three variations of a genetic algorithm based on two-marker linkage disequilibrium, multi-marker linkage disequilibrium, and a third measure that we denote by prediction power. The performance of the three algorithms is compared with that of a recognized tag SNP selection algorithm using three different real data sets from the HapMap project. The results indicate that the multi-marker linkage disequilibrium based genetic algorithm yields better prediction accuracy.
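
    To make the two-marker linkage disequilibrium measure concrete, the sketch below computes pairwise r^2 from phased 0/1 haplotypes and then picks tags with a simple greedy cover. The greedy step is a stand-in baseline chosen for brevity; it is not the genetic algorithm proposed in the paper, and the 0.8 threshold and the toy haplotypes are assumptions.

```python
def r_squared(a, b):
    """Pairwise LD r^2 between two biallelic SNPs coded 0/1 across haplotypes."""
    n = len(a)
    pa = sum(a) / n
    pb = sum(b) / n
    pab = sum(x & y for x, y in zip(a, b)) / n
    denom = pa * (1 - pa) * pb * (1 - pb)
    if denom == 0:
        return 0.0  # monomorphic SNP: r^2 is undefined, treated as 0 here
    d = pab - pa * pb
    return d * d / denom


def greedy_tags(snps, threshold=0.8):
    """Greedy set cover: repeatedly pick the SNP tagging most uncovered SNPs."""
    m = len(snps)
    covers = [
        {j for j in range(m)
         if i == j or r_squared(snps[i], snps[j]) >= threshold}
        for i in range(m)
    ]
    uncovered = set(range(m))
    tags = []
    while uncovered:
        # Ties are broken in favour of the lowest index.
        best = max(range(m), key=lambda i: (len(covers[i] & uncovered), -i))
        tags.append(best)
        uncovered -= covers[best]
    return tags


# Toy data: each inner list is one SNP observed on six haplotypes.
snps = [
    [0, 0, 1, 1, 0, 1],
    [0, 0, 1, 1, 0, 1],  # identical to SNP 0
    [1, 1, 0, 0, 1, 0],  # complement of SNP 0, so still in perfect LD
    [0, 1, 0, 1, 0, 1],  # weakly correlated with the others
]
print(greedy_tags(snps))
```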

  8. Observation of perturbed 3snp double photoexcited Rydberg series of beryllium atoms

    SciTech Connect

    Yoshida, Fumiko; Matsuoka, Leo; Osaki, Hiroyuki; Kikkawa, Satoshi; Fukushima, Yu; Hasegawa, Shuichi; Nagata, Tetsuo; Azuma, Yoshiro; Obara, Satoshi

    2006-04-15

    We observed the 3snp autoionizing Rydberg series of the Be atom in order to investigate the double-photoexcitation processes in two-s-electron systems. We employed synchrotron radiation to photoexcite the Be atoms and measured the generated Be^+ photoions by the time-of-flight method. The 3snp (n=3-9) photoexcitation resonance peaks with interloper state of 3p4s that converges to Be^+(3p) threshold were observed. We derived the resonance parameters of 3snp series from a fitting procedure and obtained the Fano parameter q, energy position E_0, and resonance width Γ. These parameters are in good agreement with theoretical values. In the vicinity of the 3s5p state these experimental results clearly revealed the influence of the interloper 3p4s state, and the comparison with the numerical calculations indicates that more detailed calculations might be required to fully explain this phenomenon.
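
    For reference, the three fitted quantities named above are the parameters of the standard Fano line shape for an isolated autoionizing resonance. The abstract does not state the exact fitting function used, so the form below is the textbook profile; the symbols \sigma_a and \sigma_b (the resonant and non-resonant background cross sections) are introduced here and do not appear in the abstract.

```latex
\sigma(E) = \sigma_a \,\frac{(q + \varepsilon)^2}{1 + \varepsilon^2} + \sigma_b,
\qquad
\varepsilon = \frac{E - E_0}{\Gamma / 2}
```

    Here q controls the asymmetry of the peak, E_0 is the resonance position, and \Gamma is the full width, so \varepsilon measures the detuning in units of the half-width.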

  9. Sequential Support Vector Regression with Embedded Entropy for SNP Selection and Disease Classification.

    PubMed

    Liang, Yulan; Kelemen, Arpad

    2011-06-01

    Comprehensive evaluation of common genetic variations through association of SNP structure with common diseases on the genome-wide scale is currently a hot area in human genome research. For less costly and faster diagnostics, advanced computational approaches are needed to select the fewest SNPs with the highest prediction accuracy for common complex diseases. In this paper, we present a sequential support vector regression model with an embedded entropy algorithm to deal with redundancy when selecting the SNPs that have the best prediction performance for diseases. We implemented our proposed method for both SNP selection and disease classification, and applied it to simulation data sets and two real disease data sets. Results show that, on average, our proposed method outperforms the well-known methods of Support Vector Machine Recursive Feature Elimination, logistic regression, CART, and logic regression based SNP selections for disease classification.

  10. SNP-Seek database of SNPs derived from 3000 rice genomes

    PubMed Central

    Alexandrov, Nickolai; Tai, Shuaishuai; Wang, Wensheng; Mansueto, Locedie; Palis, Kevin; Fuentes, Roven Rommel; Ulat, Victor Jun; Chebotarov, Dmytro; Zhang, Gengyun; Li, Zhikang; Mauleon, Ramil; Hamilton, Ruaraidh Sackville; McNally, Kenneth L.

    2015-01-01

    We have identified about 20 million rice SNPs by aligning reads from the 3000 rice genomes project with the Nipponbare genome. The SNPs and allele information are organized into a SNP-Seek system (http://www.oryzasnp.org/iric-portal/), which consists of an Oracle database holding close to 60 billion rows of SNP genotypes (20 M SNPs × 3 K rice lines) and a web interface for convenient querying. The database allows quick retrieval of SNP alleles for all varieties in a given genome region, finding different alleles from predefined varieties and querying basic passport and morphological phenotypic information about sequenced rice lines. SNPs can be visualized together with the gene structures in the JBrowse genome browser. Evolutionary relationships between rice varieties can be explored using phylogenetic trees or multidimensional scaling plots. PMID:25429973

  11. k-merSNP discovery: Software for alignment- and reference-free scalable SNP discovery, phylogenetics, and annotation for hundreds of microbial genomes

    SciTech Connect

    2014-11-18

    With the flood of whole genome finished and draft microbial sequences, we need faster, more scalable bioinformatics tools for sequence comparison. An algorithm is described to find single nucleotide polymorphisms (SNPs) in whole genome data. It scales to hundreds of bacterial or viral genomes, and can be used for finished and/or draft genomes available as unassembled contigs or raw, unassembled reads. The method is fast to compute, finding SNPs and building a SNP phylogeny in minutes to hours, depending on the size and diversity of the input sequences. The SNP-based trees that result are consistent with known taxonomy and trees determined in other studies. The approach we describe can handle many gigabases of sequence in a single run. The algorithm is based on k-mer analysis.
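
    The abstract says only that the algorithm is based on k-mer analysis, so the sketch below shows one common way such alignment-free SNP discovery can work, offered as an assumption rather than the software's actual method: odd-length k-mers are grouped by their flanking sequence, and a locus is reported when genomes share the flanks but differ at the central base. Reverse complements and sequencing errors are deliberately ignored, and the function name and toy sequences are hypothetical.

```python
from collections import defaultdict


def kmer_snps(genomes, k=7):
    """Map (left_flank, right_flank) -> {genome: central base} for variable loci.

    genomes: dict of genome name -> sequence string.
    """
    if k % 2 == 0:
        raise ValueError("k must be odd so that each k-mer has a central base")
    half = k // 2
    contexts = defaultdict(dict)
    ambiguous = set()
    for name, seq in genomes.items():
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            key = (kmer[:half], kmer[half + 1:])
            base = kmer[half]
            previous = contexts[key].get(name)
            if previous is not None and previous != base:
                # Same flanks with two centres inside one genome: skip locus.
                ambiguous.add(key)
            contexts[key][name] = base
    return {
        key: calls
        for key, calls in contexts.items()
        if key not in ambiguous and len(set(calls.values())) > 1
    }


# Toy genomes differing by a single C/T substitution.
genomes = {
    "a": "ACGTACGGTCA",
    "b": "ACGTATGGTCA",
}
snps = kmer_snps(genomes, k=7)
print(snps)
```

    Because only exact flank matches are compared, no reference genome or multiple alignment is needed, which is what lets this style of method scale to many genomes.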

  12. Toward a consensus on SNP and STR mutation rates on the human Y-chromosome.

    PubMed

    Balanovsky, O

    2017-05-01

    The mutation rate on the Y-chromosome matters for estimating the time-to-the-most-recent-common-ancestor (TMRCA, i.e. haplogroup age) in population genetics, as well as for forensic, medical, and genealogical studies. Large-scale sequencing efforts have produced several independent estimates of Y-SNP mutation rates. Genealogical, or pedigree, rates tend to be slightly faster than evolutionary rates obtained from ancient DNA or calibrations using dated (pre)historical events. It is, therefore, suggested to report TMRCAs using an envelope defined by the average aDNA-based rate and the average pedigree-based rate. The current estimate of the "envelope rate" is 0.75-0.89 substitutions per billion base pairs per year. The available Y-SNP mutation rates can be applied to high-coverage data from the entire X-degenerate region, but other datasets may demand recalibrated rates. While a consensus on Y-SNP rates is approaching, the debate on Y-STR rates has continued for two decades, because multiple genealogical rates were consistent with each other but three times faster than the single evolutionary estimate. Applying Y-SNP and Y-STR rates to the same haplogroups recently helped to clarify the issue. Genealogical and evolutionary STR rates typically provide lower and upper bounds of the "true" (SNP-based) age. The genealogical rate often, but not always, works well for haplogroups less than 7000 years old. The evolutionary rate, although calibrated using recent events, inflates ages of young haplogroups and deflates the age of the entire Y-chromosomal tree, but often provides reasonable estimates for intermediate ages (old haplogroups). Future rate estimates and accumulating case studies should further clarify the Y-SNP rates.
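
    Reporting a TMRCA as an envelope amounts to dividing the observed mutation count by each bound of the rate. The sketch below uses the 0.75-0.89 range quoted above with the simplest estimator, mean mutations per lineage divided by (rate × callable base pairs); the estimator choice, the function name, and the example inputs of 30 mutations over 10 Mbp are assumptions made for illustration.

```python
RATE_LOW = 0.75e-9   # substitutions per base pair per year (envelope lower bound)
RATE_HIGH = 0.89e-9  # substitutions per base pair per year (envelope upper bound)


def tmrca_envelope(mean_mutations, callable_bp):
    """Return (younger, older) TMRCA bounds in years.

    mean_mutations: average number of SNP differences from the ancestral
    node to the sampled lineages; callable_bp: sequenced region size.
    """
    younger = mean_mutations / (RATE_HIGH * callable_bp)  # faster rate, younger age
    older = mean_mutations / (RATE_LOW * callable_bp)     # slower rate, older age
    return younger, older


# Hypothetical haplogroup: 30 mutations on average over 10 Mbp.
younger, older = tmrca_envelope(30, 10_000_000)
print(round(younger), round(older))
```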

  13. Mycobacterium leprae in Colombia described by SNP7614 in gyrA, two minisatellites and geography

    PubMed Central

    Cardona-Castro, Nora; Beltrán-Alzate, Juan Camilo; Romero-Montoya, Irma Marcela; Li, Wei; Brennan, Patrick J; Vissa, Varalakshmi

    2013-01-01

    New cases of leprosy are still being detected in Colombia after the country declared achievement of the WHO defined ‘elimination’ status. To study the ecology of leprosy in endemic regions, a combination of geographic and molecular tools was applied to a group of 201 multibacillary patients including six multi-case families from eleven departments. The locations (latitude and longitude) of patient residences were mapped. Slit skin smears and/or skin biopsies were collected and DNA was extracted. Standard agarose gel electrophoresis following a multiplex PCR was developed for rapid and inexpensive strain typing of M. leprae based on copy numbers of two VNTR minisatellite loci 27-5 and 12-5. A SNP (C/T) in gyrA (SNP7614) was mapped by introducing a novel PCR-RFLP into an ongoing drug resistance surveillance effort. Multiple genotypes were detected combining the three molecular markers. The two frequent genotypes in Colombia were SNP7614(C)/27-5(5)/12-5(4) [C54] predominantly distributed in the Atlantic departments and SNP7614 (T)/27-5(4)/12-5(5) [T45] associated with the Andean departments. A novel genotype SNP7614 (C)/27-5(6)/12-5(4) [C64] was detected in cities along the Magdalena river which separates the Andean from Atlantic departments; a subset was further characterized showing association with a rare allele of minisatellite 23-3 and the SNP type 1 of M. leprae. The genotypes within intra-family cases were conserved. Overall, this is the first large scale study that utilized simple and rapid assay formats for identification of major strain types and their distribution in Colombia. It provides the framework for further strain type discrimination and geographic information systems as tools for tracing transmission of leprosy. PMID:23291420

  14. An Integrated SNP Mining and Utilization (ISMU) Pipeline for Next Generation Sequencing Data

    PubMed Central

    Azam, Sarwar; Rathore, Abhishek; Shah, Trushar M.; Telluri, Mohan; Amindala, BhanuPrakash; Ruperao, Pradeep; Katta, Mohan A. V. S. K.; Varshney, Rajeev K.

    2014-01-01

    Open source single nucleotide polymorphism (SNP) discovery pipelines for next generation sequencing data commonly require working knowledge of the command line interface, massive computational resources and expertise, which makes them daunting for biologists. Further, the SNP information generated may not be readily used for downstream processes such as genotyping. Hence, a comprehensive pipeline has been developed by integrating several open source next generation sequencing (NGS) tools along with a graphical user interface called Integrated SNP Mining and Utilization (ISMU) for SNP discovery and their utilization by developing genotyping assays. The pipeline features functionalities such as pre-processing of raw data, integration of open source alignment tools (Bowtie2, BWA, Maq, NovoAlign and SOAP2), SNP prediction (SAMtools/SOAPsnp/CNS2snp and CbCC) methods and interfaces for developing genotyping assays. The pipeline outputs a list of high quality SNPs between all pairwise combinations of genotypes analyzed, in addition to the reference genome/sequence. Visualization tools (Tablet and Flapjack) integrated into the pipeline enable inspection of the alignment and errors, if any. The pipeline also provides a confidence score or polymorphism information content value with flanking sequences for identified SNPs in the standard format required for developing marker genotyping (KASP and Golden Gate) assays. The pipeline enables users to process a range of NGS datasets such as whole genome re-sequencing, restriction site associated DNA sequencing and transcriptome sequencing data at a fast speed. The pipeline is very useful for members of the plant genetics and breeding community with no computational expertise who want to discover SNPs and utilize them in genomics, genetics and breeding studies. The pipeline has been parallelized to process huge datasets of next generation sequencing. It has been developed in Java language and is available at http://hpc.icrisat.cgiar.org/ISMU as a standalone

  15. SNP markers-based map construction and genome-wide linkage analysis in Brassica napus.

    PubMed

    Raman, Harsh; Dalton-Morgan, Jessica; Diffey, Simon; Raman, Rosy; Alamery, Salman; Edwards, David; Batley, Jacqueline

    2014-09-01

    An Illumina Infinium array comprising 5306 single nucleotide polymorphism (SNP) markers was used to genotype 175 individuals of a doubled haploid population derived from a cross between Skipton and Ag-Spectrum, two Australian cultivars of rapeseed (Brassica napus L.). A genetic linkage map based on 613 SNP and 228 non-SNP (DArT, SSR, SRAP and candidate gene markers) covering 2514.8 cM was constructed and further utilized to identify loci associated with flowering time and resistance to blackleg, a disease caused by the fungus Leptosphaeria maculans. Comparison between genetic map positions of SNP markers and the sequenced Brassica rapa (A) and Brassica oleracea (C) genome scaffolds showed several genomic rearrangements in the B. napus genome. A major locus controlling resistance to L. maculans was identified at both seedling and adult plant stages on chromosome A07. QTL analyses revealed that up to 40.2% of genetic variation for flowering time was accounted for by loci having quantitative effects. Comparative mapping showed Arabidopsis and Brassica flowering genes such as Phytochrome A/D, Flowering Locus C and agamous-Like MADS box gene AGL1 map within marker intervals associated with flowering time in a DH population from Skipton/Ag-Spectrum. Genomic regions associated with flowering time and resistance to L. maculans had several SNP markers mapped within 10 cM. Our results suggest that SNP markers will be suitable for various applications such as trait introgression, comparative mapping and high-resolution mapping of loci in B. napus. © 2014 Society for Experimental Biology, Association of Applied Biologists and John Wiley & Sons Ltd.

  16. Viability of in-house datamarting approaches for population genetics analysis of SNP genotypes

    PubMed Central

    Amigo, Jorge; Phillips, Christopher; Salas, Antonio; Carracedo, Ángel

    2009-01-01

    Background: Databases containing very large amounts of SNP (Single Nucleotide Polymorphism) data are now freely available for researchers interested in medical and/or population genetics applications. While many of these SNP repositories have implemented data retrieval tools for general-purpose mining, these alone cannot cover the broad spectrum of needs of most medical and population genetics studies. Results: To address this limitation, we have built in-house customized data marts from the raw data provided by the largest public databases. In particular, for population genetics analysis based on genotypes we have built a set of data processing scripts that deal with raw data coming from the major SNP variation databases (e.g. HapMap, Perlegen), stripping them into single genotypes, grouping them into populations, and then merging them with complementary descriptive information extracted from dbSNP. This allows not only in-house standardization and normalization of the genotyping data retrieved from different repositories, but also the calculation of statistical indices from simple allele frequency estimates to more elaborate genetic differentiation tests within populations, together with the ability to combine population samples from different databases. Conclusion: The present study demonstrates the viability of implementing scripts for handling extensive datasets of SNP genotypes with low computational costs, dealing with certain complex issues that arise from the divergent nature and configuration of the most popular SNP repositories. The information contained in these databases can also be enriched with additional information obtained from other complementary databases, in order to build a dedicated data mart. Updating the data structure is straightforward, as is the implementation of new external data and the computation of supplementary statistical indices of interest. PMID:19344481
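
    The "simple allele frequency estimates" mentioned above can be illustrated with a short sketch. It is not the authors' script: the record layout (population label plus a two-letter genotype string for one SNP) and the function name are assumptions made for illustration only.

```python
from collections import Counter, defaultdict


def allele_frequencies(records):
    """Per-population allele frequencies for a single SNP.

    `records` is an iterable of (population, genotype) pairs, where the
    genotype is a two-letter string such as "AG" (hypothetical layout).
    """
    counts = defaultdict(Counter)
    for population, genotype in records:
        # Counter.update on a string counts each allele character once.
        counts[population].update(genotype)
    return {
        population: {
            allele: n / sum(alleles.values()) for allele, n in alleles.items()
        }
        for population, alleles in counts.items()
    }


# Hypothetical genotypes for one SNP in two population samples.
example = [("pop1", "AG"), ("pop1", "AA"), ("pop2", "GG")]
print(allele_frequencies(example))
```

    Grouping by population first keeps the same function usable when samples from different repositories are merged under a shared population label.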

  17. Using Hamming Distance as Information for SNP-Sets Clustering and Testing in Disease Association Studies.

    PubMed

    Wang, Charlotte; Kao, Wen-Hsin; Hsiao, Chuhsing Kate

    2015-01-01

    The availability of high-throughput genomic data has led to several challenges in recent genetic association studies, including the large number of genetic variants that must be considered and the computational complexity of statistical analyses. Tackling these problems with a marker-set study such as SNP-set analysis can be an efficient solution. To construct SNP-sets, we first propose a clustering algorithm, which employs Hamming distance to measure the similarity between strings of SNP genotypes and evaluates whether the given SNPs or SNP-sets should be clustered. A dendrogram can then be constructed based on this distance measure, and the number of clusters can be determined. With the resulting SNP-sets, we next develop an association test, HDAT, to examine susceptibility to the disease of interest. This proposed test assesses, based on Hamming distance, whether the similarity between a diseased and a normal individual differs from the similarity between two individuals of the same disease status. In our proposed methodology, only genotype information is needed. No inference of haplotypes is required, and the SNPs under consideration need not be located in nearby regions. The proposed clustering algorithm and association test are illustrated with applications and simulation studies. Compared with other existing methods, the clustering algorithm is faster and better at identifying sets containing SNPs exerting a similar effect. In addition, the simulation studies demonstrated that the proposed test works well for SNP-sets containing a large proportion of neutral SNPs. Furthermore, applying the clustering algorithm before testing a large set of data helps narrow down the genomic regions that harbor susceptibility markers.
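
    As a minimal illustration of the distance measure named above (not the authors' clustering algorithm or the HDAT test), the sketch below computes the Hamming distance between two genotype strings. It assumes genotypes are coded as 0, 1 or 2 copies of the minor allele; the function name and example vectors are hypothetical.

```python
def hamming_distance(genotypes_a, genotypes_b):
    """Count the positions at which two equal-length genotype sequences differ."""
    if len(genotypes_a) != len(genotypes_b):
        raise ValueError("genotype sequences must have the same length")
    return sum(a != b for a, b in zip(genotypes_a, genotypes_b))


# Hypothetical genotype strings observed in the same five individuals.
snp_1 = [0, 1, 2, 1, 0]
snp_2 = [0, 1, 1, 1, 2]
print(hamming_distance(snp_1, snp_2))  # 2
```

    The same function applies whether the strings run across individuals (comparing two SNPs) or across SNPs (comparing two individuals).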

  18. Association between CYP19 gene SNP rs2414096 polymorphism and polycystic ovary syndrome in Chinese women.

    PubMed

    Jin, Jia-Li; Sun, Jing; Ge, Hui-Juan; Cao, Yun-Xia; Wu, Xiao-Ke; Liang, Feng-Jing; Sun, Hai-Xiang; Ke, Lu; Yi, Long; Wu, Zhi-Wei; Wang, Yong

    2009-12-16

    Several studies have reported the association of the SNP rs2414096 in the CYP19 gene with hyperandrogenism, which is one of the clinical manifestations of polycystic ovary syndrome (PCOS). These studies suggest that SNP rs2414096 may be involved in the etiopathogenesis of PCOS. To investigate whether the CYP19 gene SNP rs2414096 polymorphism is associated with susceptibility to PCOS, we performed a case-controlled association study including 684 individuals (386 PCOS patients and 298 controls). Genotyping of SNP rs2414096 was conducted by the polymerase chain reaction-restriction fragment length polymorphism (PCR-RFLP) method, performed on genomic DNA isolated from blood leucocytes. Results were analyzed with respect to clinical test results. The genotypic distributions of rs2414096 in the CYP19 gene (GG, AG, AA) in women with PCOS (0.363, 0.474, 0.163, respectively) were significantly different from those in controls (0.242, 0.500, 0.258, respectively) (P = 0.001). E2/T was different between the AA and GG genotypes. Age at menarche (AAM) and FSH were also significantly different among the GG, AG, and AA genotypes in women with PCOS (P = 0.0391 and 0.0118, respectively). No differences were observed in body mass index (BMI) and other serum hormone concentrations among the three genotypes, either in the PCOS patients or controls. Our data suggest that SNP rs2414096 in the CYP19 gene is associated with susceptibility to PCOS.

  19. An integrated SNP mining and utilization (ISMU) pipeline for next generation sequencing data.

    PubMed

    Azam, Sarwar; Rathore, Abhishek; Shah, Trushar M; Telluri, Mohan; Amindala, BhanuPrakash; Ruperao, Pradeep; Katta, Mohan A V S K; Varshney, Rajeev K

    2014-01-01

    Open source single nucleotide polymorphism (SNP) discovery pipelines for next generation sequencing data commonly require working knowledge of the command line interface, massive computational resources and expertise, which makes them daunting for biologists. Further, the SNP information generated may not be readily usable for downstream processes such as genotyping. Hence, a comprehensive pipeline called Integrated SNP Mining and Utilization (ISMU) has been developed by integrating several open source next generation sequencing (NGS) tools with a graphical user interface, for SNP discovery and utilization in developing genotyping assays. The pipeline features functionalities such as pre-processing of raw data, integration of open source alignment tools (Bowtie2, BWA, Maq, NovoAlign and SOAP2), SNP prediction (SAMtools/SOAPsnp/CNS2snp and CbCC) methods and interfaces for developing genotyping assays. The pipeline outputs a list of high quality SNPs between all pairwise combinations of genotypes analyzed, in addition to the reference genome/sequence. Visualization tools (Tablet and Flapjack) integrated into the pipeline enable inspection of the alignment and errors, if any. The pipeline also provides a confidence score or polymorphism information content value with flanking sequences for identified SNPs in the standard format required for developing marker genotyping (KASP and Golden Gate) assays. The pipeline enables users to process a range of NGS datasets, such as whole genome re-sequencing, restriction site associated DNA sequencing and transcriptome sequencing data, at a fast speed. The pipeline is useful for members of the plant genetics and breeding community with no computational expertise who want to discover SNPs and utilize them in genomics, genetics and breeding studies. The pipeline has been parallelized to process huge next generation sequencing datasets. It has been developed in the Java language and is available at http://hpc.icrisat.cgiar.org/ISMU as a standalone

  20. Vitis Phylogenomics: Hybridization Intensities from a SNP Array Outperform Genotype Calls

    PubMed Central

    Miller, Allison J.; Matasci, Naim; Schwaninger, Heidi; Aradhya, Mallikarjuna K.; Prins, Bernard; Zhong, Gan-Yuan; Simon, Charles; Buckler, Edward S.; Myles, Sean

    2013-01-01

    Understanding relationships among species is a fundamental goal of evolutionary biology. Single nucleotide polymorphisms (SNPs) identified through next generation sequencing and related technologies enable phylogeny reconstruction by providing unprecedented numbers of characters for analysis. One approach to SNP-based phylogeny reconstruction is to identify SNPs in a subset of individuals, and then to compile SNPs on an array that can be used to genotype additional samples at hundreds or thousands of sites simultaneously. Although powerful and efficient, this method is subject to ascertainment bias because applying variation discovered in a representative subset to a larger sample favors identification of SNPs with high minor allele frequencies and introduces bias against rare alleles. Here, we demonstrate that the use of hybridization intensity data, rather than genotype calls, reduces the effects of ascertainment bias. Whereas traditional SNP calls assess known variants based on diversity housed in the discovery panel, hybridization intensity data survey variation in the broader sample pool, regardless of whether those variants are present in the initial SNP discovery process. We apply SNP genotype and hybridization intensity data derived from the Vitis9kSNP array developed for grape to show the effects of ascertainment bias and to reconstruct evolutionary relationships among Vitis species. We demonstrate that phylogenies constructed using hybridization intensities suffer less from the distorting effects of ascertainment bias, and are thus more accurate than phylogenies based on genotype calls. Moreover, we reconstruct the phylogeny of the genus Vitis using hybridization data, show that North American subgenus Vitis species are monophyletic, and resolve several previously poorly known relationships among North American species. This study builds on earlier work that applied the Vitis9kSNP array to evolutionary questions within Vitis vinifera and has general

  1. BlueSNP: R package for highly scalable genome-wide association studies using Hadoop clusters.

    PubMed

    Huang, Hailiang; Tata, Sandeep; Prill, Robert J

    2013-01-01

    Computational workloads for genome-wide association studies (GWAS) are growing in scale and complexity, outpacing the capabilities of single-threaded software designed for personal computers. The BlueSNP R package implements GWAS statistical tests in the R programming language and executes the calculations across computer clusters configured with Apache Hadoop, a de facto standard framework for distributed data processing using the MapReduce formalism. BlueSNP makes computationally intensive analyses, such as estimating empirical p-values via data permutation and searching for expression quantitative trait loci over thousands of genes, feasible for large genotype-phenotype datasets. http://github.com/ibm-bioinformatics/bluesnp

  2. Antares prototype 300-kJ, 250-kA Marx generator. Final report

    SciTech Connect

    Riepe, K.B.; Barrone, L.L.; Bickford, K.J.; Livermore, G.H.

    1981-01-01

    A high-energy, low-inductance, low prefire rate, low trigger jitter, high-voltage, pulsed-power supply was needed to drive the gas discharge in the Antares laser power amplifier. This report describes the design and testing of a Marx generator that meets these requirements, the development and testing of a high-capacity spark gap, and the selection of suitable capacitors and resistors.

  3. Operating experience with a 250 kW el molten carbonate fuel cell (MCFC) power plant

    NASA Astrophysics Data System (ADS)

    Bischoff, Manfred; Huppmann, Gerhard

    The MTU MCFC program is carried out by a European consortium comprising the German companies MTU Friedrichshafen GmbH, Ruhrgas AG and RWE Energie AG as well as the Danish company Energi E2 S/A. MTU acts as consortium leader. The company shares a license and technology exchange agreement with Fuel Cell Energy Inc., Danbury, CT, USA (formerly Energy Research Corp., ERC). The program was started in 1990 and covers a period of about 10 years. The highlights of this program to date are: Considerable improvements regarding component stability have been demonstrated on laboratory scale. Manufacturing technology has been developed to a point which enables the consortium to fabricate the porous components on a 250 cm² scale. Several large area stacks with 5000-7660 cm² cell area and a power range of 3-10 kW have been tested at the facilities in Munich (Germany) and Kyndby (Denmark). These stacks have been supplied by FCE. As far as the system design is concerned, it was soon realized that conventional systems do not hold the promise for competitive power plants. A system analysis led to the conclusion that a new innovative design approach is required. As a result the "Hot Module" system was developed by the consortium. A Hot Module combines all the components of a MCFC system operating at similar temperatures and pressures into a common thermally insulated vessel. In August 1997 the consortium started its first full size Hot Module MCFC test plant at the facilities of Ruhrgas AG in Dorsten, Germany. The stack was assembled in Munich using 292 cell packages purchased from FCE. The plant is based on the consortium's unique and proprietary "Hot Module" concept. It operates on pipeline natural gas and was grid connected on 16 August 1997. After a total of 1500 h of operation, the plant was intentionally shut down in a controlled manner in April 1998 for post-test analysis. The Hot Module system concept has demonstrated its functionality. The safety concept has been convincingly proven, though in part unintentionally. The electrical power level of 155 kW (ca. 60% of maximum power) achieved allows validation of the concept with a reasonable degree of confidence. Horizontal stack operation, an essential innovation of the Hot Module concept, is feasible. The fuel processing subsystem worked reliably as expected. After initial problems in the inverter control software, the electrical and control subsystem operated to full satisfaction. Stable automatic operation not only under various load conditions, but also in idle mode, hot parking mode, and grid-independent mode has been demonstrated. Together with progress achieved by FCE in the qualification of large direct fuel cell (DFC) stacks, the basis was laid for the next test unit of similar design, which will be operated in Bielefeld, Germany. The pre-tests of the stack already took place in July 1999 with good results. Additionally, projects for the test of the DFC Hot Module operating on biogas and other opportunity fuels are under preparation.

  4. DISCOVERY OF A ∼250 K BROWN DWARF AT 2 pc FROM THE SUN

    SciTech Connect

    Luhman, K. L.

    2014-05-10

    Through a previous analysis of multi-epoch astrometry from the Wide-field Infrared Survey Explorer (WISE), I identified WISE J085510.83–071442.5 as a new high proper motion object. By combining astrometry from WISE and the Spitzer Space Telescope, I have measured a proper motion of 8.1 ± 0.1″ yr⁻¹ and a parallax of 0.454 ± 0.045″ (2.20 +0.24/−0.20 pc) for WISE J085510.83–071442.5, giving it the third highest proper motion and the fourth largest parallax of any known star or brown dwarf. It is also the coldest known brown dwarf based on its absolute magnitude at 4.5 μm and its color in [3.6]-[4.5]. By comparing M_4.5 with the values predicted by theoretical evolutionary models, I estimate an effective temperature of 225-260 K and a mass of 3-10 M_Jup for the age range of 1-10 Gyr that encompasses most nearby stars.
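
    The distance quoted with the parallax follows from the standard relation d [pc] = 1 / p [arcsec]. The short check below reproduces a distance of about 2.20 pc with asymmetric uncertainties by inverting the upper and lower parallax bounds. The function name is illustrative, and the bound inversion is an assumption about how the asymmetric errors arise, not a description of the author's procedure.

```python
def parallax_to_distance_pc(parallax_arcsec):
    """Distance in parsecs from a parallax in arcseconds (d = 1 / p)."""
    if parallax_arcsec <= 0:
        raise ValueError("parallax must be positive")
    return 1.0 / parallax_arcsec


p, sigma = 0.454, 0.045  # parallax and its uncertainty, as given in the abstract
d = parallax_to_distance_pc(p)
d_far = parallax_to_distance_pc(p - sigma)   # smaller parallax -> larger distance
d_near = parallax_to_distance_pc(p + sigma)  # larger parallax -> smaller distance
print(f"{d:.2f} pc (+{d_far - d:.2f}, -{d - d_near:.2f})")  # 2.20 pc (+0.24, -0.20)
```

    Because distance is the reciprocal of parallax, a symmetric parallax error maps to a larger upper bound than lower bound on the distance.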

  5. SNP500Cancer: a public resource for sequence validation, assay development, and frequency analysis for genetic variation in candidate genes.

    PubMed

    Packer, Bernice R; Yeager, Meredith; Burdett, Laura; Welch, Robert; Beerman, Michael; Qi, Liqun; Sicotte, Hugues; Staats, Brian; Acharya, Mekhala; Crenshaw, Andrew; Eckert, Andrew; Puri, Vinita; Gerhard, Daniela S; Chanock, Stephen J

    2006-01-01

    The SNP500Cancer database provides sequence and genotype assay information for candidate SNPs useful in mapping complex diseases, such as cancer. The database is an integral component of the NCI Cancer Genome Anatomy Project (http://cgap.nci.nih.gov). SNP500Cancer reports sequence analysis of anonymized control DNA samples (n = 102 Coriell samples representing four self-described ethnic groups: African/African-American, Caucasian, Hispanic and Pacific Rim). The website is searchable by gene, chromosome, gene ontology pathway, dbSNP ID and SNP500Cancer SNP ID. As of October 2005, the database contains >13 400 SNPs, 9124 of which have been sequenced in the SNP500Cancer population. For each analysed SNP, gene location and >200 bp of surrounding annotated sequence (including nearby SNPs) are provided, with frequency information in total and per subpopulation as well as calculation of Hardy-Weinberg equilibrium for each subpopulation. The website provides the conditions for validated sequencing and genotyping assays, as well as genotype results for the 102 samples, in both viewable and downloadable formats. A subset of sequence validated SNPs with minor allele frequency >5% are entered into a high-throughput pipeline for genotyping analysis to determine concordance for the same 102 samples. In addition, the results of genotype analysis for select validated SNP assays (defined as 100% concordance between sequence analysis and genotype results) are posted for an additional 280 samples drawn from the Human Diversity Panel (HDP). SNP500Cancer provides an invaluable resource for investigators to select SNPs for analysis, design genotyping assays using validated sequence data, choose selected assays already validated on one or more genotyping platforms, and select reference standards for genotyping assays. The SNP500Cancer database is freely accessible via the web page at http://snp500cancer.nci.nih.gov.
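
    The abstract mentions a Hardy-Weinberg equilibrium calculation per subpopulation but does not state the method. A minimal sketch of one common approach is given below, assuming a biallelic SNP and a chi-square goodness-of-fit test with one degree of freedom; the function name and the genotype counts are hypothetical.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic and p-value (1 df) for Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical genotype counts (AA, AB, BB) for one subpopulation.
chi2, p_value = hwe_chi_square(30, 40, 30)
print(f"chi2 = {chi2:.2f}, p = {p_value:.3f}")
```

    With small subpopulation samples, an exact test is often preferred over the chi-square approximation; the sketch uses the approximation only because it needs nothing beyond the standard library.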

  6. Functional SNP associated with birth weight in independent populations identified with a permutation step added to GBLUP-GWAS

    USDA-ARS's Scientific Manuscript database

    This study was conducted as an initial assessment of a newly available genotyping assay containing about 34,000 common SNP included on previous SNP chips, and 199,000 sequence variants predicted to affect gene function. Objectives were to identify functional variants associated with birth weight in...

  7. Tagging SNP-set selection with maximum information based on linkage disequilibrium structure in genome-wide association studies.

    PubMed

    Wang, Shudong; He, Sicheng; Yuan, Fayou; Zhu, Xinjie

    2017-07-15

    Effective tagging single-nucleotide polymorphism (SNP)-set selection is crucial to SNP-set analysis in genome-wide association studies (GWAS). Most existing tagging SNP-set selection methods cannot make full use of the information hidden in common or rare variants associated with diseases. Some SNPs carry overlapping genetic information owing to the linkage disequilibrium (LD) structure between SNPs. Therefore, when testing the association between SNPs and disease susceptibility, it is sufficient to select the representative SNPs (called the tag SNP-set or tagSNP-set) with maximum information. We propose a new tagSNP-set selection method based on LD information between SNPs, namely TagSNP-Set with Maximum Information. Compared with the classical SNP-set analytical method, our method not only has higher power, but also minimizes the number of selected tagSNPs and maximizes the information they provide, with lower genotyping cost and lower time complexity. Contact: hesicheng12@163.com. Supplementary data are available at Bioinformatics online.

  8. Genome-wide Target Enrichment-aided Chip Design: a 66 K SNP Chip for Cashmere Goat.

    PubMed

    Qiao, Xian; Su, Rui; Wang, Yang; Wang, Ruijun; Yang, Ting; Li, Xiaokai; Chen, Wei; He, Shiyang; Jiang, Yu; Xu, Qiwu; Wan, Wenting; Zhang, Yaolei; Zhang, Wenguang; Chen, Jiang; Liu, Bin; Liu, Xin; Fan, Yixing; Chen, Duoyuan; Jiang, Huaizhi; Fang, Dongming; Liu, Zhihong; Wang, Xiaowen; Zhang, Yanjun; Mao, Danqing; Wang, Zhiying; Di, Ran; Zhao, Qianjun; Zhong, Tao; Yang, Huanming; Wang, Jian; Wang, Wen; Dong, Yang; Chen, Xiaoli; Xu, Xun; Li, Jinquan

    2017-08-17

    Compared with the commercially available single nucleotide polymorphism (SNP) chip based on the Bead Chip technology, the solution hybrid selection (SHS)-based target enrichment SNP chip is not only design-flexible, but also cost-effective for genotype sequencing. In this study, we propose to design an animal SNP chip using the SHS-based target enrichment strategy for the first time. As an update to the international collaboration on goat research, a 66 K SNP chip for cashmere goat was created from the whole-genome sequencing data of 73 individuals. Verification of this 66 K SNP chip with the whole-genome sequencing data of 436 cashmere goats showed that the SNP call rates were between 95.3% and 99.8%. The average sequencing depth for target SNPs was 40X. The capture regions were shown to be 200 bp that flank target SNPs. This chip was further tested in a genome-wide association analysis of cashmere fineness (fiber diameter). Several top hit loci were found to be marginally associated with signaling pathways involved in hair growth. These results demonstrate that the 66 K SNP chip is a useful tool in the genomic analyses of cashmere goats. The successful chip design shows that the SHS-based target enrichment strategy could be applied to SNP chip design in other species.

  9. MDM2 promoter SNP55 (rs2870820) affects risk of colon cancer but not breast-, lung-, or prostate cancer

    PubMed Central

    Helwa, Reham; Gansmo, Liv B.; Romundstad, Pål; Hveem, Kristian; Vatten, Lars; Ryan, Bríd M.; Harris, Curtis C.; Lønning, Per E.; Knappskog, Stian

    2016-01-01

    Two functional SNPs (SNP285G > C; rs117039649 and SNP309T > G; rs2279744) have previously been reported to modulate Sp1 transcription factor binding to the promoter of the proto-oncogene MDM2, and to influence cancer risk. Recently, a third SNP (SNP55C > T; rs2870820) was also reported to affect Sp1 binding and MDM2 transcription. In this large population-based case-control study, we genotyped MDM2 SNP55 in 10,779 Caucasian individuals, previously genotyped for SNP309 and SNP285, including cases of colon (n = 1,524), lung (n = 1,323), breast (n = 1,709) and prostate cancer (n = 2,488) and 3,735 non-cancer controls, as well as 299 healthy African-Americans. Applying the dominant model, we found an elevated risk of colon cancer among individuals harbouring SNP55TT/CT genotypes compared to the SNP55CC genotype (OR = 1.15; 95% CI = 1.01–1.30). The risk was found to be highest for left-sided colon cancer (OR = 1.21; 95% CI = 1.00–1.45) and among females (OR = 1.32; 95% CI = 1.01–1.74). Assessing combined genotypes, we found the highest risk of colon cancer among individuals harbouring the SNP55TT or CT together with the SNP309TG genotype (OR = 1.21; 95% CI = 1.00–1.46). Supporting the conclusions from the risk estimates, we found colon cancer cases carrying the SNP55TT/CT genotypes to be diagnosed at a younger age than those carrying SNP55CC (p = 0.053), in particular among patients carrying the SNP309TG/TT genotypes (p = 0.009). PMID:27624283

  10. A comparison of two informative SNP-based strategies for typing Pseudomonas aeruginosa isolates from patients with cystic fibrosis

    PubMed Central

    2014-01-01

    Background: Molecular typing is integral for identifying Pseudomonas aeruginosa strains that may be shared between patients with cystic fibrosis (CF). We conducted a side-by-side comparison of two P. aeruginosa genotyping methods utilising informative-single nucleotide polymorphism (SNP) methods; one targeting 10 P. aeruginosa SNPs and using real-time polymerase chain reaction technology (HRM10SNP) and the other targeting 20 SNPs and based on the Sequenom MassARRAY platform (iPLEX20SNP). Methods: An in-silico analysis of the 20 SNPs used for the iPLEX20SNP method was initially conducted using sequence type (ST) data on the P. aeruginosa PubMLST website. A total of 506 clinical isolates collected from patients attending 11 CF centres throughout Australia were then tested by both the HRM10SNP and iPLEX20SNP assays. Type-ability and discriminatory power of the methods, as well as their ability to identify commonly shared P. aeruginosa strains, were compared. Results: The in-silico analyses showed that the 1401 STs available on the PubMLST website could be divided into 927 different 20-SNP profiles (D-value = 0.999), and that most STs of national or international importance in CF could be distinguished either individually or as belonging to closely related single- or double-locus variant groups. When applied to the 506 clinical isolates, the iPLEX20SNP provided better discrimination over the HRM10SNP method with 147 different 20-SNP and 92 different 10-SNP profiles observed, respectively. For detecting the three most commonly shared Australian P. aeruginosa strains AUST-01, AUST-02 and AUST-06, the two methods were in agreement for 80/81 (98.8%), 48/49 (97.8%) and 11/12 (91.7%) isolates, respectively. Conclusions: The iPLEX20SNP is a superior new method for broader SNP-based MLST-style investigations of P. aeruginosa. However, because of convenience and availability, the HRM10SNP method remains better suited for clinical microbiology laboratories that only utilise real

  11. SNP array–based karyotyping: differences and similarities between aplastic anemia and hypocellular myelodysplastic syndromes

    PubMed Central

    Afable, Manuel G.; Wlodarski, Marcin; Makishima, Hideki; Shaik, Mohammed; Sekeres, Mikkael A.; Tiu, Ramon V.; Kalaycio, Matt; O'Keefe, Christine L.

    2011-01-01

    In aplastic anemia (AA), contraction of the stem cell pool may result in oligoclonality, while in myelodysplastic syndromes (MDS) a single hematopoietic clone often characterized by chromosomal aberrations expands and outcompetes normal stem cells. We analyzed patients with AA (N = 93) and hypocellular MDS (hMDS, N = 24) using single nucleotide polymorphism arrays (SNP-A) complementing routine cytogenetics. We hypothesized that clinically important cryptic clonal aberrations may exist in some patients with BM failure. Combined metaphase and SNP-A karyotyping improved detection of chromosomal lesions: 19% and 54% of AA and hMDS cases harbored clonal abnormalities including copy-neutral loss of heterozygosity (UPD, 7%). Remarkably, lesions involving the HLA locus suggestive of clonal immune escape were found in 3 of 93 patients with AA. In hMDS, additional clonal lesions were detected in 5 (36%) of 14 patients with normal/noninformative routine cytogenetics. In a subset of AA patients studied at presentation, persistent chromosomal genomic lesions were found in 10 of 33, suggesting that the initial diagnosis may have been hMDS. Similarly, using SNP-A, earlier clonal evolution was found in 4 of 7 AA patients followed serially. In sum, our results indicate that SNP-A identify cryptic clonal genomic aberrations in AA and hMDS leading to improved distinction of these disease entities. PMID:21527527

  12. selectSNP – An R package for selecting SNPs optimal for genetic evaluation

    USDA-ARS's Scientific Manuscript database

    There has been a huge increase in the number of SNPs in the public repositories. This has made it a challenge to design low and medium density SNP panels, which requires careful selection of available SNPs considering many criteria, such as map position, allelic frequency, possible biological functi...

  13. The identification of SNPs with indeterminate positions using the Equine SNP50 BeadChip.

    PubMed

    Corbin, L J; Blott, S C; Swinburne, J E; Vaudin, M; Bishop, S C; Woolliams, J A

    2012-06-01

    We have used linkage disequilibrium (LD) to identify single nucleotide polymorphisms (SNPs) on the Illumina Equine SNP50 BeadChip, which may be incorrectly positioned on the genome map. A total of 1201 Thoroughbred horses were genotyped using the Illumina Equine SNP50 BeadChip. LD was evaluated in a pairwise fashion between all autosomal SNPs, both within and across chromosomes. Filters were then applied to the data, firstly to identify SNPs that may have been mapped to the wrong chromosome and secondly to identify SNPs that may have been incorrectly positioned within chromosomes. We identified a single SNP on ECA28, which showed low LD with neighbouring SNPs but considerable LD with a group of SNPs on ECA10. Furthermore, a cluster of SNPs on ECA5 showed unusually low LD with surrounding SNPs. A total of 39 SNPs met the criteria for unusual within-chromosome LD. The results of this study indicate that some SNPs may be misplaced. This finding is significant, as misplaced SNPs may lead to difficulties in the application of genomic methods, such as homozygosity mapping, for which SNP order is important.
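
    Pairwise LD of the kind evaluated here is commonly summarised as r². The abstract does not say which estimator was used; the sketch below assumes phased haplotypes coded 0/1 and computes r² = D² / (pA(1 − pA) pB(1 − pB)), where D = pAB − pA·pB. The function and variable names are hypothetical.

```python
def ld_r_squared(hap_a, hap_b):
    """r^2 between two biallelic loci from phased haplotypes coded 0/1."""
    if len(hap_a) != len(hap_b) or not hap_a:
        raise ValueError("need two non-empty haplotype lists of equal length")
    n = len(hap_a)
    p_a = sum(hap_a) / n  # frequency of allele 1 at locus A
    p_b = sum(hap_b) / n  # frequency of allele 1 at locus B
    p_ab = sum(a * b for a, b in zip(hap_a, hap_b)) / n  # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a locus is monomorphic")
    return d * d / denom


# Hypothetical haplotypes: perfectly correlated loci, then independent loci.
print(ld_r_squared([0, 1, 0, 1], [0, 1, 0, 1]))  # 1.0
print(ld_r_squared([0, 0, 1, 1], [0, 1, 0, 1]))  # 0.0
```

    A SNP that shows low r² with its mapped neighbours but high r² with SNPs on another chromosome is the pattern the filters above are designed to flag.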

  14. An abbreviated SNP panel for ancestry assignment of honeybees (Apis mellifera)

    USDA-ARS's Scientific Manuscript database

    This paper examines whether an abbreviated panel of 37 single nucleotide polymorphisms (SNPs) has the same power as a larger and more expensive panel of 95 SNPs to assign ancestry of honeybees (Apis mellifera) to three ancestral lineages. We selected 37 SNPs from the original 95 SNP panel using alle...

  15. Making a chocolate chip: development and evaluation of a 6K SNP array for Theobroma cacao.

    USDA-ARS's Scientific Manuscript database

    Theobroma cacao, the key ingredient in chocolate production, is one of the world's most important tree fruit crops, with ~4,000,000 metric tons produced across 50 countries. To move towards gene discovery and marker-assisted breeding in cacao, a single-nucleotide polymorphism (SNP) identification pr...

  16. Longevity and Plasticity of CFTR Provide an Argument for Noncanonical SNP Organization in Hominid DNA

    PubMed Central

    Hill, Aubrey E.; Plyler, Zackery E.; Tiwari, Hemant; Patki, Amit; Tully, Joel P.; McAtee, Christopher W.; Moseley, Leah A.; Sorscher, Eric J.

    2014-01-01

    Like many other ancient genes, the cystic fibrosis transmembrane conductance regulator (CFTR) has survived for hundreds of millions of years. In this report, we consider whether such prodigious longevity of an individual gene – as opposed to an entire genome or species – should be considered surprising in the face of eons of relentless DNA replication errors, mutagenesis, and other causes of sequence polymorphism. The conventions that modern human SNP patterns result either from purifying selection or random (neutral) drift were not well supported, since extant models account rather poorly for the known plasticity and function (or the established SNP distributions) found in a multitude of genes such as CFTR. Instead, our analysis can be taken as a polemic indicating that SNPs in CFTR and many other mammalian genes may have been generated—and continue to accrue—in a fundamentally more organized manner than would otherwise have been expected. The resulting viewpoint contradicts earlier claims of ‘directional’ or ‘intelligent design-type’ SNP formation, and has important implications regarding the pace of DNA adaptation, the genesis of conserved non-coding DNA, and the extent to which eukaryotic SNP formation should be viewed as adaptive. PMID:25350658

  17. EvoSNP-DB: A database of genetic diversity in East Asian populations

    PubMed Central

    Kim, Young Uk; Kim, Young Jin; Lee, Jong-Young; Park, Kiejung

    2013-01-01

    Genome-wide association studies (GWAS) have become popular as an approach for the identification of large numbers of phenotype-associated variants. However, differences in genetic architecture and environmental factors mean that the effect of variants can vary across populations. Understanding population genetic diversity is valuable for the investigation of possible population specific and independent effects of variants. EvoSNP-DB aims to provide information regarding genetic diversity among East Asian populations, including Chinese, Japanese, and Korean. Non-redundant SNPs (1.6 million) were genotyped in 54 Korean trios (162 samples) and were compared with 4 million SNPs from HapMap phase II populations. EvoSNP-DB provides two user interfaces for data query and visualization, and integrates scores of genetic diversity (Fst and VarLD) at the level of SNPs, genes, and chromosome regions. EvoSNP-DB is a web-based application that allows users to navigate and visualize measurements of population genetic differences in an interactive manner, and is available online at [http://biomi.cdc.go.kr/EvoSNP/]. [BMB Reports 2013; 46(8): 416-421] PMID:23977990
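
    As an illustration of the kind of per-SNP diversity score mentioned above, Fst for one biallelic SNP in two populations can be sketched as below. The abstract does not say which Fst estimator EvoSNP-DB uses, so this is the textbook heterozygosity-based form, (H_T - H_S) / H_T, under the assumption of equal sample sizes; the function name is hypothetical.

```python
def fst_two_populations(p1, p2):
    """Heterozygosity-based Fst for one biallelic SNP in two populations.

    p1, p2: frequency of the same allele in each population.
    Assumes equal sample sizes; this is not a sample-size-corrected estimator.
    """
    for p in (p1, p2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("allele frequencies must lie in [0, 1]")
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)  # expected heterozygosity, pooled
    if h_total == 0:
        return 0.0  # SNP is fixed for the same allele in both populations
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population
    return (h_total - h_within) / h_total
```

    Identical frequencies give 0, and populations fixed for opposite alleles give 1.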

  18. Measuring diversity in Gossypium hirsutum using the CottonSNP63K Array

    USDA-ARS's Scientific Manuscript database

    A CottonSNP63K array and accompanying cluster file has been developed and includes 45,104 intra-specific SNPs and 17,954 inter-specific SNPs for automated genotyping of cotton (Gossypium spp.) samples. Development of the cluster file included genotyping of 1,156 samples, a subset of which were iden...

  19. Linkage disequilibrium among commonly genotyped SNP and variants detected from bull sequence

    USDA-ARS's Scientific Manuscript database

    Genomic prediction utilizing causal variants could increase selection accuracy above that achieved with SNP genotyped by commercial assays. A number of variants detected from sequencing influential sires are likely to be causal, but noticeable improvements in prediction accuracy using imputed sequen...

  20. SNP-microarrays can accurately identify the presence of an individual in complex forensic DNA mixtures.

    PubMed

    Voskoboinik, Lev; Ayers, Sheri B; LeFebvre, Aaron K; Darvasi, Ariel

    2015-05-01

    Common forensic and mass disaster scenarios present DNA evidence that comprises a mixture of several contributors. Identifying the presence of an individual in such mixtures has proven difficult. In the current study, we evaluate the practical usefulness of currently available "off-the-shelf" SNP microarrays for such purposes. We found that a set of 3000 SNPs specifically selected for this purpose can accurately identify the presence of an individual in complex DNA mixtures of various compositions. For example, individuals contributing as little as 5% to a complex DNA mixture can be robustly identified even if the starting DNA amount was as little as 5.0 ng and had undergone whole-genome amplification (WGA) prior to SNP analysis. The work presented in this study represents proof-of-principle that our previously proposed approach can work with real "forensic-type" samples. Furthermore, in the absence of a low-density focused forensic SNP microarray, standard, currently available high-density SNP microarrays can be used similarly and may even increase statistical power due to the larger amount of available information.

  1. Fine mapping of copy number variations on two cattle genome assemblies using high density SNP array

    USDA-ARS's Scientific Manuscript database

    Btau_4.0 and UMD3.1 are two distinct cattle reference genome assemblies. In our previous study using the low density BovineSNP50 array, we reported a copy number variation (CNV) analysis on Btau_4.0 with 521 animals of 21 cattle breeds, yielding 682 CNV regions with a total length of 139.8 megabases...

  2. An improved consensus linkage map of barley based on flow-sorted chromosomes and SNP markers

    USDA-ARS's Scientific Manuscript database

    Recent advances in high-throughput genotyping have made it easier to combine information from different mapping populations into consensus genetic maps, which provide increased marker density and genome coverage compared to individual maps. Previously, a SNP-based genotyping platform was developed a...

  3. Microsatellite Imputation for parental verification from SNP across multiple Bos taurus and indicus breeds

    USDA-ARS's Scientific Manuscript database

    Microsatellite markers (MS) have traditionally been used for parental verification and are still the international standard in spite of their higher cost, error rate, and turnaround time compared with Single Nucleotide Polymorphisms (SNP)-based assays. Despite domestic and international demands fro...

  4. A web-based genome browser for 'SNP-aware' assay design

    USDA-ARS's Scientific Manuscript database

    Human and animal genomes contain an abundance of single nucleotide polymorphisms (SNPs) that are useful for genetic testing. However, the relatively large number of SNPs present in diverse populations can pose serious problems when designing assays. It is important to “mask” some SNP positions so ...

  5. Use of microsatellite and SNP markers to characterize biotypes in Hessian fly

    USDA-ARS's Scientific Manuscript database

    Exploration of the biotype structure of Hessian fly, Mayetiola destructor (Say), would improve our knowledge regarding variation in virulence phenotypes and difference in genetic background. The objective of this study was to develop and test a panel of 18 microsatellite and 22 SNP markers to reveal...

  6. High-throughput RAD-SNP genotyping for characterization of sugar beet genotypes

    USDA-ARS's Scientific Manuscript database

    High-throughput SNP genotyping provides a rapid way of developing resourceful set of markers for delineating the genetic architecture and for effective species discrimination. In the presented research, we demonstrate a set of 192 SNPs for effective genotyping in sugar beet using high-throughput mar...

  7. A novel approach to analyzing fMRI and SNP data via parallel independent component analysis

    NASA Astrophysics Data System (ADS)

    Liu, Jingyu; Pearlson, Godfrey; Calhoun, Vince; Windemuth, Andreas

    2007-03-01

    There is current interest in understanding genetic influences on brain function in both the healthy and the disordered brain. Parallel independent component analysis, a new method for analyzing multimodal data, is proposed in this paper and applied to functional magnetic resonance imaging (fMRI) and a single nucleotide polymorphism (SNP) array. The method aims to identify the independent components of each modality and the relationship between the two modalities. We analyzed 92 participants, including 29 schizophrenia (SZ) patients, 13 unaffected SZ relatives, and 50 healthy controls. We found a correlation of 0.79 between one fMRI component and one SNP component. The fMRI component consists of activations in cingulate gyrus, multiple frontal gyri, and superior temporal gyrus. The related SNP component is contributed to significantly by 9 SNPs located in sets of genes, including those coding for apolipoprotein A-I and C-III, malate dehydrogenase 1 and the gamma-aminobutyric acid alpha-2 receptor. A significant difference in the presence of this SNP component is found between the SZ group (SZ patients and their relatives) and the control group. In summary, we constructed a framework to identify the interactions between brain functional and genetic information; our findings provide new insight into understanding genetic influences on brain function in a common mental disorder.

  8. Development and validation of a low-density SNP panel related to prolificacy in sheep

    USDA-ARS's Scientific Manuscript database

    High-density SNP panels (e.g., 50,000 and 600,000 markers) have been used in exploratory population genetic studies with commercial and minor breeds of sheep. However, routine genetic diversity evaluations of large numbers of samples with large panels are in general cost-prohibitive for gene banks. ...

  9. The use of SNP data for the monitoring of genetic diversity in cattle breeds

    USDA-ARS's Scientific Manuscript database

    LD between SNPs contains information about effective population size. In this study, we investigate the use of genome-wide SNP data for marker based estimation of effective population size for two taurine cattle breeds of Africa and two local cattle breeds of Switzerland. Estimated recombination rat...

  10. Mining for SNPs and SSRs using SNPServer, dbSNP and SSR taxonomy tree.

    PubMed

    Batley, Jacqueline; Edwards, David

    2009-01-01

    Molecular genetic markers represent one of the most powerful tools for the analysis of genomes and the association of heritable traits with underlying genetic variation. The development of high-throughput methods for the detection of single nucleotide polymorphisms (SNPs) and simple sequence repeats (SSRs) has led to a revolution in their use as molecular markers. The availability of large sequence data sets permits mining for these molecular markers, which may then be used for applications such as genetic trait mapping, diversity analysis and marker assisted selection in agriculture. Here we describe web-based automated methods for the discovery of SSRs using SSR taxonomy tree, the discovery of SNPs from sequence data using SNPServer and the identification of validated SNPs from within the dbSNP database. SSR taxonomy tree identifies pre-determined SSR amplification primers for virtually all species represented within the GenBank database. SNPServer uses a redundancy based approach to identify SNPs within DNA sequences. Following submission of a sequence of interest, SNPServer uses BLAST to identify similar sequences, CAP3 to cluster and assemble these sequences and then the SNP discovery software autoSNP to detect SNPs and insertion/deletion (indel) polymorphisms. The NCBI dbSNP database is a catalogue of molecular variation, hosting validated SNPs for several species within a public-domain archive.

  11. The impact of SNP fingerprinting and parentage analysis on the effectiveness of variety recommendations in cacao

    USDA-ARS's Scientific Manuscript database

    Evidence for the impact of mislabeling and/or pollen contamination on consistency of field performance has been lacking to reinforce the need for strict adherence to quality control protocols in cacao seed garden and germplasm plot management. The present study used SNP fingerprinting at 64 loci to ...

  12. Performance of the SNPforID 52 SNP-plex assay in paternity testing.

    PubMed

    Børsting, Claus; Sanchez, Juan J; Hansen, Hanna E; Hansen, Anders J; Bruun, Hanne Q; Morling, Niels

    2008-09-01

    The performance of a multiplex assay with 52 autosomal single nucleotide polymorphisms (SNPs) developed for human identification was tested on 124 mother-child-father trios. The typical paternity indices (PIs) were 10^5-10^6 for the trios and 10^3-10^4 for the child-father duos. Using the SNP profiles from the randomly selected trios and 700 previously typed individuals, a total of 83,096 comparisons between mother, child and an unrelated man were performed. On average, 9-10 mismatches per comparison were detected. Four mismatches were genetic inconsistencies and 5-6 mismatches were opposite homozygosities. In only two of the 83,096 comparisons did an unrelated man match perfectly to a mother-child duo, and in both cases the PI of the true father was much higher than the PI of the unrelated man. The trios were also typed for 15 short tandem repeats (STRs) and seven variable number of tandem repeats (VNTRs). The typical PIs based on 15 STRs or seven VNTRs were 5-50 times higher than the typical PIs based on 52 SNPs. Six mutations in tandem repeats were detected among the randomly selected trios. In contrast, no mutations were found in the SNP loci. The results showed that the 52 SNP-plex assay is a very useful alternative to currently used methods in relationship testing. The usefulness of SNP markers with low mutation rates in paternity and immigration casework is discussed.
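
    The two mismatch types counted above can be illustrated with a short sketch. The definitions used here are assumptions, since the abstract does not spell them out: an "opposite homozygosity" is taken to mean that the tested man and the child are homozygous for different alleles, and a "genetic inconsistency" is any other locus where the child's genotype cannot be formed from one maternal and one paternal allele. Genotypes are written as two-letter strings, and the function names are hypothetical.

```python
from collections import Counter


def classify_locus(mother, child, man):
    """Classify one biallelic locus in a mother-child-man comparison.

    Genotypes are two-character strings such as "AA", "AB" or "BB".
    The mother is assumed to be the true mother.
    """
    child_alleles, man_alleles = set(child), set(man)
    # Man and child are both homozygous, but for different alleles.
    if len(child_alleles) == 1 and len(man_alleles) == 1 and child_alleles != man_alleles:
        return "opposite_homozygosity"
    # Consistent if some maternal allele plus some paternal allele gives the child.
    for maternal in set(mother):
        for paternal in man_alleles:
            if sorted((maternal, paternal)) == sorted(child):
                return "consistent"
    return "genetic_inconsistency"


def count_mismatches(trio_profiles):
    """Tally locus classifications over (mother, child, man) genotype triples."""
    return Counter(classify_locus(m, c, f) for m, c, f in trio_profiles)
```

    For a true father, both mismatch counts are expected to be zero apart from mutations and typing errors, which is why a handful of mismatches across 52 loci is informative for exclusion.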

  13. SNP discovery in complex allotetraploid genomes (Gossypium spp., Malvaceae) using genotyping by sequencing

    USDA-ARS's Scientific Manuscript database

    Dramatic decreases in the cost of DNA sequencing have enabled the development of very large numbers of markers based on single nucleotide polymorphism (SNP) for phylogenetic studies, population genetics, linkage mapping, marker-assisted breeding and other applications. Using Illumina next-generatio...

  14. Verification of genetic identity of introduced cacao germplasm in Ghana using single nucleotide polymorphism (SNP) markers

    USDA-ARS's Scientific Manuscript database

    Accurate identification of individual genotypes is important for cacao (Theobroma cacao L.) breeding, germplasm conservation and seed propagation. The development of single nucleotide polymorphism (SNP) markers in cacao offers an effective way to use a high-throughput genotyping system for cacao gen...

  15. Applying SNP marker technology in the cacao breeding program at the Cocoa Research Institute of Ghana

    USDA-ARS's Scientific Manuscript database

    In this investigation 45 parental cacao plants and five progeny derived from the parental stock studied were genotyped using six SNP markers to determine off-types or mislabeled clones and to authenticate crosses made in the Cocoa Research Institute of Ghana (CRIG) breeding program. Investigation wa...

  16. Association of Agronomic Traits with SNP Markers in Durum Wheat (Triticum turgidum L. durum (Desf.)).

    PubMed

    Hu, Xin; Ren, Jing; Ren, Xifeng; Huang, Sisi; Sabiel, Salih A I; Luo, Mingcheng; Nevo, Eviatar; Fu, Chunjie; Peng, Junhua; Sun, Dongfa

    2015-01-01

    Association mapping is a powerful approach to detect associations between traits of interest and genetic markers based on linkage disequilibrium (LD) in molecular plant breeding. In this study, 150 accessions of durum wheat germplasm of worldwide origin (Triticum turgidum spp. durum) were genotyped using 1,366 SNP markers. The extent of LD on each chromosome was evaluated. Association of single nucleotide polymorphism (SNP) markers with ten agronomic traits measured in four consecutive years was analyzed under a mixed linear model (MLM). Two hundred and one significant association pairs were detected in the four years. Several markers were associated with one trait, and some markers were associated with multiple traits. Some of the associated markers were in agreement with previous quantitative trait loci (QTL) analyses. The function and homology analyses of the corresponding ESTs of some SNP markers could explain many of the associations for plant height, length of main spike, number of spikelets on main spike, grain number per plant, and 1000-grain weight, etc. The SNP associations for the observed traits are generally clustered in specific chromosome regions of the wheat genome, mainly in 2A, 5A, 6A, 7A, 1B, and 6B chromosomes. This study demonstrates that association mapping can complement and enhance previous QTL analyses and provide additional information for marker-assisted selection.

  17. Identification of a SNP marker associated with WB242 nematode resistance in sugar beet

    USDA-ARS's Scientific Manuscript database

    The beet-cyst nematode (Heterodera schachtii Schmidt) is one of the major diseases of sugar beet. The identification of molecular markers associated to the nematode resistance would be helpful for developing resistant varieties. The aim of this study was the identification of SNP (Single Nucleotide ...

  18. Utilization of a whole genome SNP panel for efficient genetic mapping in the mouse

    PubMed Central

    Moran, Jennifer L.; Bolton, Andrew D.; Tran, Pamela V.; Brown, Alison; Dwyer, Noelle D.; Manning, Danielle K.; Bjork, Bryan C.; Li, Cheng; Montgomery, Kate; Siepka, Sandra M.; Vitaterna, Martha Hotz; Takahashi, Joseph S.; Wiltshire, Tim; Kwiatkowski, David J.; Kucherlapati, Raju; Beier, David R.

    2006-01-01

    Phenotype-driven genetics can be used to create mouse models of human disease and birth defects. However, the utility of these mutant models is limited without identification of the causal gene. To facilitate genetic mapping, we developed a fixed single nucleotide polymorphism (SNP) panel of 394 SNPs as an alternative to analyses using simple sequence length polymorphism (SSLP) marker mapping. With the SNP panel, chromosomal locations for 22 monogenic mutants were identified. The average number of affected progeny genotyped for mapped monogenic mutations is nine. Map locations for several mutants have been obtained with as few as four affected progeny. The average size of genetic intervals obtained for these mutants is 43 Mb, with a range of 17–83 Mb. Thus, our SNP panel allows for identification of moderate resolution map position with small numbers of mice in a high-throughput manner. Importantly, the panel is suitable for mapping crosses from many inbred and wild-derived inbred strain combinations. The chromosomal localizations obtained with the SNP panel allow one to quickly distinguish between potentially novel loci or remutations in known genes, and facilitates fine mapping and positional cloning. By using this approach, we identified DNA sequence changes in two ethylnitrosourea-induced mutants. PMID:16461637

  19. Changes in variance explained by top SNP windows over generations for three traits in broiler chicken

    PubMed Central

    Fragomeni, Breno de Oliveira; Misztal, Ignacy; Lourenco, Daniela Lino; Aguilar, Ignacio; Okimoto, Ronald; Muir, William M.

    2014-01-01

    The purpose of this study was to determine if the set of genomic regions inferred as accounting for the majority of genetic variation in quantitative traits remains stable over multiple generations of selection. The data set contained phenotypes for five generations of broiler chicken for body weight, breast meat, and leg score. The population consisted of 294,632 animals over five generations and also included genotypes of 41,036 single nucleotide polymorphisms (SNP) for 4,866 animals, after quality control. The SNP effects were calculated by a GWAS type analysis using single step genomic BLUP approach for generations 1–3, 2–4, 3–5, and 1–5. Variances were calculated for windows of 20 SNP. The top ten windows for each trait that explained the largest fraction of the genetic variance across generations were examined. Across generations, the top 10 windows explained more than 0.5% but less than 1% of the total variance. Also, the pattern of the windows was not consistent across generations. The windows that explained the greatest variance changed greatly among the combinations of generations, with a few exceptions. In many cases, a window identified as top for one combination explained less than 0.1% for the other combinations. We conclude that identification of top SNP windows for a population may have little predictive power for genetic selection in the following generations for the traits evaluated here. PMID:25324857
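
    The window-variance step can be sketched as follows, assuming SNP effects have already been estimated (the authors obtained theirs from single-step genomic BLUP, which is not reproduced here). The sketch assumes non-overlapping windows and measures each window's share as the variance, across animals, of the breeding-value contribution from that window divided by the variance of the total genomic value. The function name and both of those choices are assumptions for illustration.

```python
from statistics import pvariance


def window_variance_share(genotypes, effects, window=20):
    """Share of genomic variance attributed to each consecutive SNP window.

    genotypes: one row per animal, each a list of allele dosages (0/1/2).
    effects:   one estimated effect per SNP, in the same order.
    Windows are non-overlapping blocks of `window` SNPs (an assumption).
    """
    n_snp = len(effects)
    total = [sum(g * e for g, e in zip(row, effects)) for row in genotypes]
    total_var = pvariance(total)
    if total_var == 0:
        raise ValueError("total genomic variance is zero")
    shares = []
    for start in range(0, n_snp, window):
        stop = min(start + window, n_snp)
        # Contribution of this window to each animal's genomic value.
        part = [sum(row[j] * effects[j] for j in range(start, stop)) for row in genotypes]
        shares.append(pvariance(part) / total_var)
    return shares
```

    Because windows can be correlated through LD, the shares need not sum to one. Running the function on effects estimated from different generation subsets, then comparing the ranking of windows, mirrors the stability question the study asks.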

  20. Optimal design of low-density SNP arrays for genomic prediction: algorithm and applications

    USDA-ARS's Scientific Manuscript database

    Low-density (LD) single nucleotide polymorphism (SNP) arrays provide a cost-effective solution for genomic prediction and selection, but algorithms and computational tools are needed for their optimal design. A multiple-objective, local optimization (MOLO) algorithm was developed for design of optim...

  1. Association mapping of resistance to leaf rust in emmer wheat using high throughput SNP markers

    USDA-ARS's Scientific Manuscript database

    Emmer wheat (Triticum turgidum L. subsp. dicoccum) is known to be a useful source of genes for many desirable characters for improvement of modern cultivated wheat. Recently, a panel of 181 emmer wheat accessions has been genotyped with wheat 9K SNP (single nucleotide polymorphism) markers and exte...

  2. An innovative SNP genotyping method adapting to multiple platforms and throughputs

    USDA-ARS's Scientific Manuscript database

    Single nucleotide polymorphisms (SNPs) are highly abundant, distributed throughout the genome in various species, and therefore they are widely used as genetic markers. However, the usefulness of this genetic tool relies heavily on the availability of user-friendly SNP genotyping methods. We have d...

  3. Correlation between SNP genotypes and periodontitis in Japanese type II diabetic patients: a preliminary study.

    PubMed

    Damrongrungruang, Teerasak; Ogawa, Hiroshi; Hori-Matsumoto, Sayaka; Minagawa, Kumiko; Hanyu, Osamu; Sone, Hirohito; Miyazaki, Hideo

    2015-05-01

    The present study aims to investigate the correlation between SNP genotype patterns and periodontitis severity in Japanese type II diabetic patients. A cross-sectional study in 43 Japanese diabetic patients with periodontitis was performed. Blood samples were drawn for single nucleotide polymorphism (SNP) analyses and periodontal index (probing pocket depth and clinical attachment level) was subsequently recorded. Twelve functional genes with SNPs that had been shown to be associated with diabetes and/or inflammation were genotyped using a nuclease-mediated SNP-specific ligation method. Subjects with two or more sites with clinical attachment level ≥6 mm and who additionally had one or more sites with pocket depth ≥5 mm were classified as having severe periodontitis. Proportions of risk genotypes/non-risk genotypes between severe and non-severe periodontitis were subsequently compared. A high frequency (21/43 participants, 49%) of adiponectin gene polymorphism (ADIPOQ 45T > G) homozygous risk genotype (TT genotype) was observed in the participants. The frequency of TGF-β1 SNP (29C > T) risk genotype (TT genotype) in severe periodontitis (34%, n = 11) was significantly higher than in non-severe periodontitis (0%, n = 0) (p = 0.04). Our study suggests that TGF-β1 SNPs (29C > T) may be used as one of the risk indicators for severe periodontitis in Japanese diabetic patients.

  4. Multiplexed SNP genotyping using the Qbead™ system: a quantum dot-encoded microsphere-based assay

    PubMed Central

    Xu, Hongxia; Sha, Michael Y.; Wong, Edith Y.; Uphoff, Janet; Xu, Yanzhang; Treadway, Joseph A.; Truong, Anh; O’Brien, Eamonn; Asquith, Steven; Stubbins, Michael; Spurr, Nigel K.; Lai, Eric H.; Mahoney, Walt

    2003-01-01

    We have developed a new method using the Qbead™ system for high-throughput genotyping of single nucleotide polymorphisms (SNPs). The Qbead system employs fluorescent Qdot™ semiconductor nanocrystals, also known as quantum dots, to encode microspheres that subsequently can be used as a platform for multiplexed assays. By combining mixtures of quantum dots with distinct emission wavelengths and intensities, unique spectral ‘barcodes’ are created that enable the high levels of multiplexing required for complex genetic analyses. Here, we applied the Qbead system to SNP genotyping by encoding microspheres conjugated to allele-specific oligonucleotides. After hybridization of oligonucleotides to amplicons produced by multiplexed PCR of genomic DNA, individual microspheres are analyzed by flow cytometry and each SNP is distinguished by its unique spectral barcode. Using 10 model SNPs, we validated the Qbead system as an accurate and reliable technique for multiplexed SNP genotyping. By modifying the types of probes conjugated to microspheres, the Qbead system can easily be adapted to other assay chemistries for SNP genotyping as well as to other applications such as analysis of gene expression and protein–protein interactions. With its capability for high-throughput automation, the Qbead system has the potential to be a robust and cost-effective platform for a number of applications. PMID:12682378

  5. Longevity and plasticity of CFTR provide an argument for noncanonical SNP organization in hominid DNA.

    PubMed

    Hill, Aubrey E; Plyler, Zackery E; Tiwari, Hemant; Patki, Amit; Tully, Joel P; McAtee, Christopher W; Moseley, Leah A; Sorscher, Eric J

    2014-01-01

    Like many other ancient genes, the cystic fibrosis transmembrane conductance regulator (CFTR) has survived for hundreds of millions of years. In this report, we consider whether such prodigious longevity of an individual gene--as opposed to an entire genome or species--should be considered surprising in the face of eons of relentless DNA replication errors, mutagenesis, and other causes of sequence polymorphism. The conventions that modern human SNP patterns result either from purifying selection or random (neutral) drift were not well supported, since extant models account rather poorly for the known plasticity and function (or the established SNP distributions) found in a multitude of genes such as CFTR. Instead, our analysis can be taken as a polemic indicating that SNPs in CFTR and many other mammalian genes may have been generated--and continue to accrue--in a fundamentally more organized manner than would otherwise have been expected. The resulting viewpoint contradicts earlier claims of 'directional' or 'intelligent design-type' SNP formation, and has important implications regarding the pace of DNA adaptation, the genesis of conserved non-coding DNA, and the extent to which eukaryotic SNP formation should be viewed as adaptive.

  6. MAFsnp: A Multi-Sample Accurate and Flexible SNP Caller Using Next-Generation Sequencing Data.

    PubMed

    Hu, Jiyuan; Li, Tengfei; Xiu, Zidi; Zhang, Hong

    2015-01-01

    Most existing statistical methods developed for calling single nucleotide polymorphisms (SNPs) using next-generation sequencing (NGS) data are based on Bayesian frameworks, and no existing SNP caller produces p-values for calling SNPs in a frequentist framework. To fill this gap, we develop a new method, MAFsnp, a Multiple-sample based Accurate and Flexible algorithm for calling SNPs with NGS data. MAFsnp is based on an estimated likelihood ratio test (eLRT) statistic. In practical situations, the involved parameter is very close to the boundary of the parametric space, so the standard large-sample property is not suitable for evaluating the finite-sample distribution of the eLRT statistic. Observing that the distribution of the test statistic is a mixture of zero and a continuous part, we propose to model the test statistic with a novel two-parameter mixture distribution. Once the parameters in the mixture distribution are estimated, p-values can be easily calculated for detecting SNPs, and the multiple-testing corrected p-values can be used to control false discovery rate (FDR) at any pre-specified level. With simulated data, MAFsnp is shown to have much better control of FDR than the existing SNP callers. Through the application to two real datasets, MAFsnp is also shown to outperform the existing SNP callers in terms of calling accuracy. An R package "MAFsnp" implementing the new SNP caller is freely available at http://homepage.fudan.edu.cn/zhangh/softwares/.
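
    The final step, turning per-site p-values into calls at a chosen FDR level, can be illustrated with the Benjamini-Hochberg adjustment. The abstract does not state which multiple-testing correction MAFsnp applies, so Benjamini-Hochberg is an assumption here, and the sketch below is not the MAFsnp package's API; the function name is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

    Sites whose adjusted value is at or below the target level (for example 0.05) would be called as SNPs; everything upstream of the p-values is specific to the eLRT model described in the abstract.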

  7. Assessing the Clinical Utility of SNP Microarray for Prader-Willi Syndrome due to Uniparental Disomy.

    PubMed

    Santoro, Stephanie L; Hashimoto, Sayaka; McKinney, Aimee; Mihalic Mosher, Theresa; Pyatt, Robert; Reshmi, Shalini C; Astbury, Caroline; Hickey, Scott E

    2017-01-01

    Maternal uniparental disomy (UPD) 15 is one of the molecular causes of Prader-Willi syndrome (PWS), a multisystem disorder which presents with neonatal hypotonia and feeding difficulty. Current diagnostic algorithms differ regarding the use of SNP microarray to detect PWS. We retrospectively examined the frequency with which SNP microarray could identify regions of homozygosity (ROH) in patients with PWS. We determined that 7/12 (58%) patients with previously confirmed PWS by methylation analysis and microsatellite-positive UPD studies had ROH (>10 Mb) by SNP microarray. Additional assessment of 5,000 clinical microarrays, performed from 2013 to present, determined that only a single case of ROH for chromosome 15 was not caused by an imprinting disorder or identity by descent. We observed that ROH for chromosome 15 is rarely incidental and strongly associated with hypotonic infants having features of PWS. Although UPD microsatellite studies remain essential to definitively establish the presence of UPD, SNP microarray has important utility in the timely diagnostic algorithm for PWS. © 2017 S. Karger AG, Basel.
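
    A region of homozygosity can be thought of as an uninterrupted run of homozygous SNP calls spanning more than a set physical length. The sketch below is a deliberately simplified illustration of that idea, using the 10 Mb cutoff quoted above. Clinical software typically tolerates occasional heterozygous calls and no-calls, which this version does not, and the function name and input layout are assumptions.

```python
def find_roh(positions, is_het, min_length_bp=10_000_000):
    """Find runs of consecutive homozygous calls longer than min_length_bp.

    positions: base-pair coordinates of SNPs on one chromosome, ascending.
    is_het:    parallel list of booleans, True where the call is heterozygous.
    Returns a list of (start_bp, end_bp) tuples.
    """
    runs = []
    start = last = None
    for pos, het in zip(positions, is_het):
        if het:
            # A heterozygous call ends the current run, if any.
            if start is not None and last - start > min_length_bp:
                runs.append((start, last))
            start = None
        else:
            if start is None:
                start = pos
            last = pos
    if start is not None and last - start > min_length_bp:
        runs.append((start, last))
    return runs
```

    Applied to chromosome 15 calls, a non-empty result would correspond to the kind of ROH finding that the study reports as rarely incidental.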

  8. Analysis of gene-derived SNP marker polymorphism in wheat (Triticum aestivum L.)

    USDA-ARS's Scientific Manuscript database

    In this study, we analyzed 359 single nucleotide polymorphisms (SNPs) previously discovered in intron sequences of wheat genes to evaluate SNP marker polymorphism in common wheat (Triticum aestivum L.). These SNPs showed an average polymorphism information content (PIC) of 0.181 among 20 US wheat c...

  9. DHOEM: a statistical simulation software for simulating new markers in real SNP marker data.

    PubMed

    Jacquin, Laval; Cao, Tuong-Vi; Grenier, Cécile; Ahmadi, Nourollah

    2015-12-03

    Numerous simulation tools based on specific assumptions have been proposed to simulate populations. Here we present a simulation tool named DHOEM (densification of haplotypes by loess regression and maximum likelihood) which is free from population assumptions and simulates new markers in real SNP marker data. The main objective of DHOEM is to generate a new population, which incorporates real and simulated SNP by statistical learning from an initial population, which match the realized features of the latter. To demonstrate DHOEM's abilities, we used a sample of 704 haplotypes for 12 chromosomes with 8336 SNP from a synthetic population, used for breeding upland rice in Latin America. The distributions of allele frequencies, pairwise SNP LD coefficients and data structures, before and after marker densification of the associated marker data set, were shown to be in relatively good agreement at moderate degrees of marker densification. DHOEM is a user-friendly tool that allows the user to specify the level of marker density desired, with a user defined minor allele frequency (MAF) limit, which is produced in a reasonable computation time. DHOEM is a user-friendly and useful tool for simulation and methodological studies in quantitative genetics and breeding.

  10. SNP-based high density genetic map and mapping of btwd1 dwarfing gene in barley

    PubMed Central

    Ren, Xifeng; Wang, Jibin; Liu, Lipan; Sun, Genlou; Li, Chengdao; Luo, Hong; Sun, Dongfa

    2016-01-01

    A high-density linkage map is a valuable tool for functional genomics and breeding. A newly developed sequence-based marker technology, restriction site associated DNA (RAD) sequencing, has been proven to be powerful for the rapid discovery and genotyping of genome-wide single nucleotide polymorphism (SNP) markers and for high-density genetic map construction. The objective of this research was to construct a high-density genetic map of barley using RAD sequencing. 1894 high-quality SNP markers were developed and mapped onto all seven chromosomes together with 68 SSR markers. These 1962 markers constituted a total genetic length of 1375.8 cM and an average of 0.7 cM between adjacent loci. The number of markers within each linkage group ranged from 209 to 396. The new recessive dwarfing gene btwd1 in Huaai 11 was mapped onto the high-density linkage maps. The results showed that btwd1 is positioned between SNP markers 7HL_6335336 and 7_249275418 on chromosome 7H, at genetic distances of 0.9 cM and 0.7 cM, respectively. The SNP-based high-density genetic map developed and the dwarfing gene btwd1 mapped in this study provide critical information for positional cloning of the btwd1 gene and molecular breeding of barley. PMID:27530597
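    The map summary figures quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration and is not part of the study: the variable names are hypothetical, and it assumes the average spacing is the total map length divided by the marker count, a formula the abstract does not state.

```python
# Figures taken from the abstract; variable names are illustrative only.
snp_markers = 1894
ssr_markers = 68
total_length_cm = 1375.8

# Total number of mapped markers (SNP + SSR).
total_markers = snp_markers + ssr_markers

# Assumption: average spacing = total map length / number of markers.
avg_spacing_cm = total_length_cm / total_markers

print(total_markers)
print(round(avg_spacing_cm, 1))
```

    Under this assumption the marker total and the rounded spacing agree with the 1962 markers and 0.7 cM reported in the abstract.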

  11. Changes in variance explained by top SNP windows over generations for three traits in broiler chicken.

    PubMed

    Fragomeni, Breno de Oliveira; Misztal, Ignacy; Lourenco, Daniela Lino; Aguilar, Ignacio; Okimoto, Ronald; Muir, William M

    2014-01-01

    The purpose of this study was to determine if the set of genomic regions inferred as accounting for the majority of genetic variation in quantitative traits remain stable over multiple generations of selection. The data set contained phenotypes for five generations of broiler chicken for body weight, breast meat, and leg score. The population consisted of 294,632 animals over five generations and also included genotypes of 41,036 single nucleotide polymorphism (SNP) for 4,866 animals, after quality control. The SNP effects were calculated by a GWAS type analysis using single step genomic BLUP approach for generations 1-3, 2-4, 3-5, and 1-5. Variances were calculated for windows of 20 SNP. The top ten windows for each trait that explained the largest fraction of the genetic variance across generations were examined. Across generations, the top 10 windows explained more than 0.5% but less than 1% of the total variance. Also, the pattern of the windows was not consistent across generations. The windows that explained the greatest variance changed greatly among the combinations of generations, with a few exceptions. In many cases, a window identified as top for one combination, explained less than 0.1% for the other combinations. We conclude that identification of top SNP windows for a population may have little predictive power for genetic selection in the following generations for the traits here evaluated.

  12. SNP-based genotyping in lentil: linking sequence information with phenotypes

    USDA-ARS's Scientific Manuscript database

    Lentil (Lens culinaris) has been late to enter the world of high throughput molecular analysis due to a general lack of genomic resources. Using a 454 sequencing-based approach, SNPs have been identified in genes across the lentil genome. Several hundred have been turned into single SNP KASP assay...

  13. SNP discovery in candidate adaptive genes using exon capture in a free-ranging alpine ungulate

    Treesearch

    Gretchen H. Roffler; Stephen J. Amish; Seth Smith; Ted Cosart; Marty Kardos; Michael K. Schwartz; Gordon Luikart

    2016-01-01

    Identification of genes underlying genomic signatures of natural selection is key to understanding adaptation to local conditions. We used targeted resequencing to identify SNP markers in 5321 candidate adaptive genes associated with known immunological, metabolic and growth functions in ovids and other ungulates. We selectively targeted 8161 exons in protein-coding...

  14. Association of Agronomic Traits with SNP Markers in Durum Wheat (Triticum turgidum L. durum (Desf.))

    PubMed Central

    Hu, Xin; Ren, Jing; Ren, Xifeng; Huang, Sisi; Sabiel, Salih A. I.; Luo, Mingcheng; Nevo, Eviatar; Fu, Chunjie; Peng, Junhua; Sun, Dongfa

    2015-01-01

    Association mapping is a powerful approach to detect associations between traits of interest and genetic markers based on linkage disequilibrium (LD) in molecular plant breeding. In this study, 150 accessions of durum wheat germplasm of worldwide origin (Triticum turgidum spp. durum) were genotyped using 1,366 SNP markers. The extent of LD on each chromosome was evaluated. Association of single nucleotide polymorphism (SNP) markers with ten agronomic traits measured in four consecutive years was analyzed under a mixed linear model (MLM). Two hundred and one significant association pairs were detected in the four years. Several markers were associated with one trait, and some markers were associated with multiple traits. Some of the associated markers were in agreement with previous quantitative trait loci (QTL) analyses. The function and homology analyses of the corresponding ESTs of some SNP markers could explain many of the associations for plant height, length of main spike, number of spikelets on main spike, grain number per plant, and 1000-grain weight, etc. The SNP associations for the observed traits are generally clustered in specific chromosome regions of the wheat genome, mainly in 2A, 5A, 6A, 7A, 1B, and 6B chromosomes. This study demonstrates that association mapping can complement and enhance previous QTL analyses and provide additional information for marker-assisted selection. PMID:26110423

  15. Priming of seeds with nitric oxide donor sodium nitroprusside (SNP) alleviates the inhibition on wheat seed germination by salt stress.

    PubMed

    Duan, Pei; Ding, Feng; Wang, Fang; Wang, Bao-Shan

    2007-06-01

    The effect of SNP, an NO donor, on seed germination of wheat (Triticum aestivum L. cv. 'DK961') under salt stress was studied. The results showed that priming of seeds with 0.06 mmol/L SNP for 24 h markedly alleviated the decrease of the germination percentage, germination index, vigor index and imbibition rate of wheat seeds under salt stress. SNP significantly alleviated the decrease of the beta-amylase activity but had almost no effect on the alpha-amylase activity of wheat seeds under salt stress. SNP slightly increased the alpha-amylase isoenzymes (especially isoenzyme 3) and significantly increased the beta-amylase isoenzymes (especially isoenzymes d, e, f and g). SNP pretreatment decreased Na(+) content but increased K(+) content, resulting in a marked increase of the K(+)/Na(+) ratio of wheat seedlings under salt stress. These results suggested that NO is involved in promoting wheat seed germination under salt stress by increasing the beta-amylase activity.

  16. Estimating the effect of SNP genotype on quantitative traits from pooled DNA samples

    PubMed Central

    2012-01-01

    Background Studies to detect associations between DNA markers and traits of interest in humans and livestock benefit from increasing the number of individuals genotyped. Performing association studies on pooled DNA samples can provide greater power for a given cost. For quantitative traits, the effect of an SNP is measured in the units of the trait and here we propose and demonstrate a method to estimate SNP effects on quantitative traits from pooled DNA data. Methods To obtain estimates of SNP effects from pooled DNA samples, we used logistic regression of estimated allele frequencies in pools on phenotype. The method was tested on a simulated dataset, and a beef cattle dataset using a model that included principal components from a genomic correlation matrix derived from the allele frequencies estimated from the pooled samples. The performance of the obtained estimates was evaluated by comparison with estimates obtained using regression of phenotype on genotype from individual samples of DNA. Results For the simulated data, the estimates of SNP effects from pooled DNA are similar but asymptotically different to those from individual DNA data. Error in estimating allele frequencies had a large effect on the accuracy of estimated SNP effects. For the beef cattle dataset, the principal components of the genomic correlation matrix from pooled DNA were consistent with known breed groups, and could be used to account for population stratification. Correctly modeling the contemporary group structure was essential to achieve estimates similar to those from individual DNA data, and pooling DNA from individuals within groups was superior to pooling DNA across groups. For a fixed number of assays, pooled DNA samples produced results that were more correlated with results from individual genotyping data than were results from one random individual assayed from each pool. 
    Conclusions Use of logistic regression of allele frequency on phenotype makes it possible to estimate SNP

  17. Single Nucleotide Polymorphism (SNP) in the Adiponectin Gene and Cardiovascular Disease.

    PubMed

    Chirumbolo, Salvatore

    2016-07-01

    Dear Editor, The recent article by Mohammadzadeh et al.[1] in the latest issue of this Journal showed that the T allele of the +276G/T SNP of the ADIPOQ gene is more strongly associated with an increased risk of coronary artery disease (CAD) in subjects with type 2 diabetes. Adipocytes were described in myocardial tissue of CAD patients and their role was recently discussed[2,3]. Susceptibility to CAD by polymorphism in the Q gene of adiponectin has been reported for the 3'-UTR, which harbours some genetic loci associated with metabolic risks and atherosclerosis[4]. Actually, previous studies have shown that the haplotype SNP +276G>T was associated with a decreased risk of CAD, after adjustment for potential confounding factors, so some controversy still exists[5]. This evidence should be associated with the role exerted by adipocytes and adiponectin in heart physiology. In particular, in hypertensive disorder complicating pregnancy (HDCP), by investigating the population frequency of alleles, genotypes, and haplotypes of two single nucleotide polymorphisms (SNPs), namely +45T>G (rs2241766) and +276G>T (rs1501299), some authors found that the SNP +276 TT genotype was significantly associated with protection against HDCP, when compared to the pooled G genotypes[6]. Moreover, the same +276G/T SNP haplotype was strongly associated with biliary atresia, an intractable neonatal inflammatory and obliterative cholangiopathy, leading to progressive fibrosis and cirrhosis[7]. CAD is closely related to adiponectin biology. The same isoforms of adiponectin seem not to be associated with CAD severity but with glucose metabolism and its impairment[8]. In the paper by Mohammadzadeh et al.[1], the T allele in the +276G/T SNP haplotype is highly associated with CAD in subjects with type 2 diabetes, but this link should be reappraised in case it relates much more to diabetes than to CAD.
    Association of T allele in the indicated SNP with CAD may be an indirect consequence of type 2 diabetes, as reported

  18. Development and Validation of a High-Density SNP Genotyping Array for African Oil Palm.

    PubMed

    Kwong, Qi Bin; Teh, Chee Keng; Ong, Ai Ling; Heng, Huey Ying; Lee, Heng Leng; Mohamed, Mohaimi; Low, Joel Zi-Bin; Apparow, Sukganah; Chew, Fook Tim; Mayes, Sean; Kulaveerasingam, Harikrishna; Tammi, Martti; Appleton, David Ross

    2016-08-01

    High-density single nucleotide polymorphism (SNP) genotyping arrays are powerful tools that can measure the level of genetic polymorphism within a population. To develop a whole-genome SNP array for oil palms, SNP discovery was performed using deep resequencing of eight libraries derived from 132 Elaeis guineensis and Elaeis oleifera palms belonging to 59 origins, resulting in the discovery of >3 million putative SNPs. After SNP filtering, the Illumina OP200K custom array was built with 170 860 successful probes. Phenetic clustering analysis revealed that the array could distinguish between palms of different origins in a way consistent with pedigree records. Genome-wide linkage disequilibrium declined more slowly for the commercial populations (ranging from 120 kb at r(2) = 0.43 to 146 kb at r(2) = 0.50) when compared with the semi-wild populations (19.5 kb at r(2) = 0.22). Genetic fixation mapping comparing the semi-wild and commercial population identified 321 selective sweeps. A genome-wide association study (GWAS) detected a significant peak on chromosome 2 associated with the polygenic component of the shell thickness trait (based on the trait shell-to-fruit; S/F %) in tenera palms. Testing of a genomic selection model on the same trait resulted in good prediction accuracy (r = 0.65) with 42% of the S/F % variation explained. The first high-density SNP genotyping array for oil palm has been developed and shown to be robust for use in genetic studies and with potential for developing early trait prediction to shorten the oil palm breeding cycle. Copyright © 2016 The Author. Published by Elsevier Inc. All rights reserved.

  19. Detecting SNP combinations discriminating human populations from HapMap data.

    PubMed

    Ding, XiaoJun; Li, Min; Gu, HaiHua; Peng, XiaoQing; Zhang, Zhen; Wu, FangXiang

    2015-03-01

    The genomes of different human beings are similar, with only a relatively small number of genetic differences between people, and these differences are well worth studying. Researchers have proposed the fixation index FST to find the single nucleotide polymorphisms (SNPs) that reflect human population differences. However, most SNPs interact and work together, which leads to the differences among human populations. The number of all possible m-locus combinations chosen from n SNPs grows exponentially. Most methods focus on 2-locus interactions. In this paper, we propose a novel method to find a new coordinate system under which the energy distributions of different populations are quite different. We select candidate SNPs from the n SNPs by using the information of the axes in the coordinate system. The number of candidate SNPs is small, so SNP-SNP interactions can be searched efficiently. The method can also find interactions of more than two loci. These interactions should reflect the evolution of human populations from another perspective. The numbers of SNP-SNP interactions are regarded as the differences between pairwise populations, and a hierarchical clustering algorithm is used to construct the evolutionary tree. In the experiments, we apply the method to SNP data of four chromosomes separately, and the trees constructed on these four chromosomes are highly consistent. Furthermore, the trees are also consistent with previous studies, which indicates that evolutionary information is well mined. The method provides a new way to analyze human population differences.

  20. Comparative SNP diversity among four Eucalyptus species for genes from secondary metabolite biosynthetic pathways

    PubMed Central

    Külheim, Carsten; Hui Yeoh, Suat; Maintz, Jens; Foley, William J; Moran, Gavin F

    2009-01-01

    Background There is little information about the DNA sequence variation within and between closely related plant species. The combination of re-sequencing technologies, large-scale DNA pools and availability of reference gene sequences allowed the extensive characterisation of single nucleotide polymorphisms (SNPs) in genes of four biosynthetic pathways leading to the formation of ecologically relevant secondary metabolites in Eucalyptus. With this approach the occurrence and patterns of SNP variation for a set of genes can be compared across different species from the same genus. Results In a single GS-FLX run, we sequenced over 103 Mbp and assembled them to approximately 50 kbp of reference sequences. An average sequencing depth of 315 reads per nucleotide site was achieved for all four eucalypt species, Eucalyptus globulus, E. nitens, E. camaldulensis and E. loxophleba. We sequenced 23 genes from 1,764 individuals and discovered 8,631 SNPs across the species, with about 1.5 times as many SNPs per kbp in the introns compared to exons. The exons of the two closely related species (E. globulus and E. nitens) had similar numbers of SNPs at synonymous and non-synonymous sites. These species also had similar levels of SNP diversity, whereas E. camaldulensis and E. loxophleba had much higher SNP diversity. Neither the pathway nor the position in the pathway influenced gene diversity. The four species share between 20 and 43% of the SNPs in these genes. Conclusion By using conservative statistical detection methods, we were confident about the validity of each SNP. With numerous individuals sampled over the geographical range of each species, we discovered one SNP in every 33 bp for E. nitens and one in every 31 bp in E. globulus. In contrast, the more distantly related species contained more SNPs: one in every 16 bp for E. camaldulensis and one in 17 bp for E. loxophleba, which is, to the best of our knowledge, the highest frequency of SNPs described in woody plant

  1. Multiplex single nucleotide polymorphism (SNP) assay for detection of soybean mosaic virus resistance genes in soybean.

    PubMed

    Shi, Ainong; Chen, Pengyin; Vierling, Richard; Zheng, Cuming; Li, Dexiao; Dong, Dekun; Shakiba, Ehsan; Cervantez, Innan

    2011-02-01

    Soybean mosaic virus (SMV) is one of the most destructive viral diseases in soybean (Glycine max). Three independent loci for SMV resistance have been identified in soybean germplasm. The use of genetic resistance is the most effective method of controlling this disease. Marker assisted selection (MAS) has become very important and useful in the effort of selecting genes for SMV resistance. Single nucleotide polymorphism (SNP), because of its abundance and high-throughput potential, is a powerful tool in genome mapping, association studies, diversity analysis, and tagging of important genes in plant genomics. In this study, a 10 SNPs plus one insert/deletion (InDel) multiplex assay was developed for SMV resistance: two SNPs were developed from the candidate gene 3gG2 at Rsv1 locus, two SNPs selected from the clone N11PF linked to Rsv1, one 'BARC' SNP screened from soybean chromosome 13 [linkage group (LG) F] near Rsv1, two 'BARC' SNPs from probe A519 linked to Rsv3, one 'BARC' SNP from chromosome 14 (LG B2) near Rsv3, and two 'BARC' SNPs from chromosome 2 (LG D1b) near Rsv4, plus one InDel marker from expressed sequence tag (EST) AW307114 linked to Rsv4. This 11 SNP/InDel multiplex assay showed polymorphism among 47 diverse soybean germplasm, indicating this assay can be used to investigate the mode of inheritance in a SMV resistant soybean line carrying Rsv1, Rsv3, and/or Rsv4 through a segregating population with phenotypic data, and to select a specific gene or pyramid two or three genes for SMV resistance through MAS in soybean breeding program. The presence of two SMV resistance genes (Rsv1 and Rsv3) in J05 soybean was confirmed by the SNP assay.

  2. SNP-set analysis replicates acute lung injury genetic risk factors

    PubMed Central

    2012-01-01

    Background We used a gene-based replication strategy to test the reproducibility of prior acute lung injury (ALI) candidate gene associations. Methods We phenotyped 474 patients from a prospective severe trauma cohort study for ALI. Genomic DNA from subjects’ blood was genotyped using the IBC chip, a multiplex single nucleotide polymorphism (SNP) array. Results were filtered for 25 candidate genes selected using prespecified literature search criteria and present on the IBC platform. For each gene, we grouped SNPs according to haplotype blocks and tested the joint effect of all SNPs on susceptibility to ALI using the SNP-set kernel association test. Results were compared to single SNP analysis of the candidate SNPs. Analyses were separate for genetically determined ancestry (African or European). Results We identified 4 genes in African ancestry and 2 in European ancestry trauma subjects which replicated their associations with ALI. Ours is the first replication of IL6, IL10, IRAK3, and VEGFA associations in non-European populations with ALI. Only one gene, VEGFA, demonstrated association with ALI in both ancestries, with distinct haplotype blocks in each ancestry driving the association. We also report the association between trauma-associated ALI and NFKBIA in European ancestry subjects. Conclusions Prior ALI genetic associations are reproducible and replicate in a trauma cohort. Kernel-based SNP-set analysis is a more powerful method to detect ALI association than single SNP analysis, and thus may be more useful for replication testing. Further, gene-based replication can extend candidate gene associations to diverse ethnicities. PMID:22742663

  3. Hypothesis driven single nucleotide polymorphism search (HyDn-SNP-S).

    PubMed

    Swett, Rebecca J; Elias, Angela; Miller, Jeffrey A; Dyson, Gregory E; Andrés Cisneros, G

    2013-09-01

    The advent of complete-genome genotyping across phenotype cohorts has provided a rich source of information for bioinformaticians. However, the search for SNPs from this data is generally performed on a study-by-study basis without any specific hypothesis of the location for SNPs that are predictive for the phenotype. We have designed a method whereby very large SNP lists (several gigabytes in size), combining several genotyping studies at once, can be sorted and traced back to their ultimate consequence in protein structure. Given a working hypothesis, researchers are able to easily search whole genome genotyping data for SNPs that link genetic locations to phenotypes. This allows a targeted search for correlations between phenotypes and potentially relevant systems, rather than utilizing statistical methods only. HyDn-SNP-S returns results that are less data dense, allowing more thorough analysis, including haplotype analysis. We have applied our method to correlate DNA polymerases to cancer phenotypes using four of the available cancer databases in dbGaP. Logistic regression and derived haplotype analysis indicate that ~80 SNPs, previously overlooked, are statistically significant. Derived haplotypes from this work link POLL to breast cancer and POLG to prostate cancer with an increase in incidence of 3.01- and 9.6-fold, respectively. Molecular dynamics simulations on wild-type and one of the SNP mutants from the haplotype of POLL provide insights at the atomic level on the functional impact of this cancer related SNP. Furthermore, HyDn-SNP-S has been designed to allow application to any system. The program is available upon request from the authors. Copyright © 2013 Elsevier B.V. All rights reserved.

  4. Impact of pre-imputation SNP-filtering on genotype imputation results

    PubMed Central

    2014-01-01

    Background Imputation of partially missing or unobserved genotypes is an indispensable tool for SNP data analyses. However, research and understanding of the impact of initial SNP-data quality control on imputation results is still limited. In this paper, we aim to evaluate the effect of different strategies of pre-imputation quality filtering on the performance of the widely used imputation algorithms MaCH and IMPUTE. Results We considered three scenarios: imputation of partially missing genotypes with usage of an external reference panel, without usage of an external reference panel, as well as imputation of completely un-typed SNPs using an external reference panel. We first created various datasets applying different SNP quality filters and masking certain percentages of randomly selected high-quality SNPs. We imputed these SNPs and compared the results between the different filtering scenarios by using established and newly proposed measures of imputation quality. While the established measures assess certainty of imputation results, our newly proposed measures focus on the agreement with true genotypes. These measures showed that pre-imputation SNP-filtering might be detrimental regarding imputation quality. Moreover, the strongest drivers of imputation quality were in general the burden of missingness and the number of SNPs used for imputation. We also found that using a reference panel always improves imputation quality of partially missing genotypes. MaCH performed slightly better than IMPUTE2 in most of our scenarios. Again, these results were more pronounced when using our newly defined measures of imputation quality. Conclusion Even a moderate filtering has a detrimental effect on the imputation quality. Therefore little or no SNP filtering prior to imputation appears to be the best strategy for imputing small to moderately sized datasets. Our results also showed that for these datasets, MaCH performs slightly better than IMPUTE2 in most scenarios at

  5. Gradient Boosting as a SNP Filter: an Evaluation Using Simulated and Hair Morphology Data

    PubMed Central

    Lubke, GH; Laurin, C; Walters, R; Eriksson, N; Hysi, P; Spector, TD; Montgomery, GW; Martin, NG; Medland, SE; Boomsma, DI

    2013-01-01

    Typically, genome-wide association studies consist of regressing the phenotype on each SNP separately using an additive genetic model. Although statistical models for recessive, dominant, SNP-SNP, or SNP-environment interactions exist, the testing burden makes an evaluation of all possible effects impractical for genome-wide data. We advocate a two-step approach where the first step consists of a filter that is sensitive to different types of SNP main and interaction effects. The aim is to substantially reduce the number of SNPs such that more specific modeling becomes feasible in a second step. We provide an evaluation of a statistical learning method called “gradient boosting machine” (GBM) that can be used as a filter. GBM does not require an a priori specification of a genetic model, and permits inclusion of large numbers of covariates. GBM can therefore be used to explore multiple GxE interactions, which would not be feasible within the parametric framework used in GWAS. We show in a simulation that GBM performs well even under conditions favorable to the standard additive regression model commonly used in GWAS, and is sensitive to the detection of interaction effects even if one of the interacting variables has a zero main effect. The latter would not be detected in GWAS. Our evaluation is accompanied by an analysis of empirical data concerning hair morphology. We estimate the phenotypic variance explained by increasing numbers of highest ranked SNPs, and show that it is sufficient to select 10K-20K SNPs in the first step of a two-step approach. PMID:24404405

  6. SNP discovery in the transcriptome of white Pacific shrimp Litopenaeus vannamei by next generation sequencing.

    PubMed

    Yu, Yang; Wei, Jiankai; Zhang, Xiaojun; Liu, Jingwen; Liu, Chengzhang; Li, Fuhua; Xiang, Jianhai

    2014-01-01

    The application of next generation sequencing technology has greatly facilitated high throughput single nucleotide polymorphism (SNP) discovery and genotyping in genetic research. In the present study, SNPs were discovered based on two transcriptomes of Litopenaeus vannamei (L. vannamei) generated from Illumina sequencing platform HiSeq 2000. One transcriptome of L. vannamei was obtained through sequencing on the RNA from larvae at mysis stage and its reference sequence was de novo assembled. The data from another transcriptome were downloaded from NCBI and the reads of the two transcriptomes were mapped separately to the assembled reference by BWA. SNP calling was performed using SAMtools. A total of 58,717 and 36,277 SNPs with high quality were predicted from the two transcriptomes, respectively. SNP calling was also performed using the reads of two transcriptomes together, and a total of 96,040 SNPs with high quality were predicted. Among these 96,040 SNPs, 5,242 and 29,129 were predicted as non-synonymous and synonymous SNPs respectively. Characterization analysis of the predicted SNPs in L. vannamei showed that the estimated SNP frequency was 0.21% (one SNP per 476 bp) and the estimated ratio for transition to transversion was 2.0. Fifty SNPs were randomly selected for validation by Sanger sequencing after PCR amplification and 76% of SNPs were confirmed, which indicated that the SNPs predicted in this study were reliable. These SNPs will be very useful for genetic study in L. vannamei, especially for the high density linkage map construction and genome-wide association studies.
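    Two of the summary statistics in this abstract, SNP frequency and the transition-to-transversion (Ti/Tv) ratio, follow from simple counting. The sketch below is a hedged illustration rather than the authors' pipeline: the function names and the toy list of substitutions are hypothetical, and only the figure of one SNP per 476 bp comes from the abstract.

```python
# Purine<->purine and pyrimidine<->pyrimidine changes are transitions.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def snp_frequency_percent(bp_per_snp):
    """Convert 'one SNP per N bp' into a percentage of sites."""
    return 100.0 / bp_per_snp


def ti_tv_ratio(substitutions):
    """Ti/Tv ratio for (ref, alt) pairs; assumes ref != alt in every pair."""
    ti = sum(1 for pair in substitutions if pair in TRANSITIONS)
    tv = len(substitutions) - ti
    return ti / tv if tv else float("inf")


# One SNP per 476 bp, as reported in the abstract.
freq = snp_frequency_percent(476)

# Hypothetical toy calls: four transitions and two transversions.
toy_calls = [("A", "G"), ("C", "T"), ("G", "A"), ("T", "C"),
             ("A", "C"), ("G", "T")]
ratio = ti_tv_ratio(toy_calls)

print(round(freq, 2), ratio)
```

    One SNP per 476 bp rounds to the 0.21% quoted in the abstract; the toy calls were chosen only so that the ratio comes out at 2.0, matching the reported estimate.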

  7. SNP discovery in candidate adaptive genes using exon capture in a free-ranging alpine ungulate

    USGS Publications Warehouse

    Roffler, Gretchen H.; Amish, Stephen J.; Smith, Seth; Cosart, Ted F.; Kardos, Marty; Schwartz, Michael K.; Luikart, Gordon

    2016-01-01

    Identification of genes underlying genomic signatures of natural selection is key to understanding adaptation to local conditions. We used targeted resequencing to identify SNP markers in 5321 candidate adaptive genes associated with known immunological, metabolic and growth functions in ovids and other ungulates. We selectively targeted 8161 exons in protein-coding and nearby 5′ and 3′ untranslated regions of chosen candidate genes. Targeted sequences were taken from bighorn sheep (Ovis canadensis) exon capture data and directly from the domestic sheep genome (Ovis aries v. 3; oviAri3). The bighorn sheep sequences used in the Dall's sheep (Ovis dalli dalli) exon capture aligned to 2350 genes on the oviAri3 genome with an average of 2 exons each. We developed a microfluidic qPCR-based SNP chip to genotype 476 Dall's sheep from locations across their range and test for patterns of selection. Using multiple corroborating approaches (lositan and bayescan), we detected 28 SNP loci potentially under selection. We additionally identified candidate loci significantly associated with latitude, longitude, precipitation and temperature, suggesting local environmental adaptation. The three methods demonstrated consistent support for natural selection on nine genes with immune and disease-regulating functions (e.g. Ovar-DRA, APC, BATF2, MAGEB18), cell regulation signalling pathways (e.g. KRIT1, PI3K, ORRC3), and respiratory health (CYSLTR1). Characterizing adaptive allele distributions from novel genetic techniques will facilitate investigation of the influence of environmental variation on local adaptation of a northern alpine ungulate throughout its range. This research demonstrated the utility of exon capture for gene-targeted SNP discovery and subsequent SNP chip genotyping using low-quality samples in a nonmodel species.

  8. Whole genome sequencing of peach (Prunus persica L.) for SNP identification and selection.

    PubMed

    Ahmad, Riaz; Parfitt, Dan E; Fass, Joseph; Ogundiwin, Ebenezer; Dhingra, Amit; Gradziel, Thomas M; Lin, Dawei; Joshi, Nikhil A; Martinez-Garcia, Pedro J; Crisosto, Carlos H

    2011-11-22

    The application of next generation sequencing technologies and bioinformatic scripts to identify high frequency SNPs distributed throughout the peach genome is described. Three peach genomes were sequenced using Roche 454 and Illumina/Solexa technologies to obtain long contigs for alignment to the draft 'Lovell' peach sequence as well as sufficient depth of coverage for 'in silico' SNP discovery. The sequences were aligned to the 'Lovell' peach genome released April 01, 2010 by the International Peach Genome Initiative (IPGI). 'Dr. Davis', 'F8, 1-42' and 'Georgia Belle' were sequenced to add SNPs segregating in two breeding populations, Pop DF ('Dr. Davis' × 'F8, 1-42') and Pop DG ('Dr. Davis' × 'Georgia Belle'). Roche 454 sequencing produced 980,000 total reads with 236 Mb sequence for 'Dr. Davis' and 735,000 total reads with 172 Mb sequence for 'F8, 1-42'. 84 bp × 84 bp paired end Illumina/Solexa sequences yielded 25.5, 21.4, 25.5 million sequences for 'Dr. Davis', 'F8, 1-42' and 'Georgia Belle', respectively. BWA/SAMtools were used for alignment of raw reads and SNP detection, with custom PERL scripts for SNP filtering. Velvet's Columbus module was used for sequence assembly. Comparison of aligned and overlapping sequences from both Roche 454 and Illumina/Solexa resulted in the selection of 6654 high quality SNPs for 'Dr. Davis' vs. 'F8, 1-42' and 'Georgia Belle', distributed on eight major peach genome scaffolds as defined from the 'Lovell' assembly. The eight scaffolds contained about 215-225 Mb of peach genomic sequences with one SNP/~ 40,000 bases. All sequences from Roche 454 and Illumina/Solexa have been submitted to NCBI for public use in the Short Read Archive database. SNPs have been deposited in the NCBI SNP database.

  9. MDM2 SNP309 polymorphism contributes to endometrial cancer susceptibility: evidence from a meta-analysis

    PubMed Central

    2013-01-01

    Objective The SNP309 polymorphism (T-G) in the promoter of MDM2 gene has been reported to be associated with enhanced MDM2 expression and tumor development. Studies investigating the association between MDM2 SNP309 polymorphism and endometrial cancer risk reported conflicting results. We performed a meta-analysis of all available studies to explore this association. Methods All studies published up to August 2013 on the association between MDM2 SNP309 polymorphism and endometrial cancer risk were identified by searching electronic databases PubMed, Web of Science, EMBASE, and Chinese Biomedical Literature database (CBM). The association between the MDM2 SNP309 polymorphism and endometrial cancer risk was assessed by odds ratios (ORs) together with their 95% confidence intervals (CIs). Results Eight case–control studies with 2069 endometrial cancer cases and 4546 controls were identified. Overall, a significant increase in endometrial cancer risk was found when all studies were pooled in the meta-analysis (GG vs. TT: OR = 1.464, 95% CI 1.246–1.721, P < 0.001; GG vs. TG + TT: OR = 1.726, 95% CI 1.251–2.380, P = 0.001; GG + TG vs. TT: OR = 1.169, 95% CI 1.048–1.304, P = 0.005). In subgroup analysis by ethnicity and HWE in controls, a significantly increased endometrial cancer risk was observed in Caucasians and in studies consistent with HWE. In subgroup analysis according to study quality, significant associations were observed in both high quality studies and low quality studies. Conclusions This meta-analysis suggests that MDM2 SNP309 polymorphism contributes to endometrial cancer susceptibility, especially in Caucasian populations. Further large and well-designed studies are needed to confirm this association. PMID:24423195
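
    The odds ratios and 95% confidence intervals reported above can be illustrated with a minimal sketch. It assumes the standard log-odds (Woolf) interval for a single 2×2 table and a fixed-effect inverse-variance pool; the abstract does not say which pooling model the authors used, and the function names and counts below are hypothetical.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf confidence interval for one 2x2 table.

    a = cases with the risk genotype,    b = cases without it,
    c = controls with the risk genotype, d = controls without it.
    Returns (OR, lower, upper, standard error of log OR).
    """
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return (math.exp(log_or),
            math.exp(log_or - z * se),
            math.exp(log_or + z * se),
            se)

def pooled_odds_ratio(tables):
    """Fixed-effect (inverse-variance) pooled OR over several 2x2 tables.

    This is an assumed pooling model, chosen only for illustration.
    """
    num = den = 0.0
    for a, b, c, d in tables:
        or_, _, _, se = odds_ratio_ci(a, b, c, d)
        weight = 1.0 / se ** 2          # more precise studies count more
        num += weight * math.log(or_)
        den += weight
    return math.exp(num / den)

# Hypothetical counts, not taken from the meta-analysis
or_, lo, hi, _ = odds_ratio_ci(20, 80, 10, 90)
print(f"OR = {or_:.3f}, 95% CI {lo:.3f} to {hi:.3f}")
```

    An interval that excludes 1 corresponds to the "significant" associations quoted in the abstract.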

  10. Identification of differently expressed genes with specific SNP Loci for breast cancer by the integration of SNP and gene expression profiling analyses.

    PubMed

    Yuan, Pengfei; Liu, Dechun; Deng, Miao; Liu, Jiangbo; Wang, Jianguang; Zhang, Like; Liu, Qipeng; Zhang, Ting; Chen, Yanbin; Jin, Gaoyuan

    2015-04-01

    This study aims to explore the relationship between gene polymorphism and breast cancer, and to screen DEGs (differentially expressed genes) with SNPs (single nucleotide polymorphisms) related to breast cancer. The SNPs of 17 patients and the preprocessed SNP profiling GSE 32258 (38 cases of normal breast cells) were combined to identify their correlation with breast cancer using chi-square test. The gene expression profiling batch8_9 (38 cases of patients and 8 cases of normal tissue) was preprocessed with limma package, and the DEGs were filtered out. Then Fisher's method was applied to integrate DEGs and SNPs associated with breast cancer. With NetBox software, TRED (Transcriptional Regulatory Element Database) and UCSC (University of California Santa Cruz) database, genes-associated network and transcriptional regulatory network were constructed using Cytoscape software. Further, GO (Gene Ontology) and KEGG analyses were performed for genes in the networks by using siggenes. In total, 332 DEGs were identified. There were 160 breast cancer-related SNPs related to 106 genes of gene expression profiling (19 were significant DEGs). Finally, 11 co-correlated DEGs were selected. In genes-associated network, 9 significant DEGs were correlated to 23 LINKER genes while, in transcriptional regulatory network, E2F1 had regulatory relationships with 7 DEGs including MTUS1, CD44, CCNB1 and CCND2. KRAS with SNP locus of rs1137282 was involved in 35 KEGG pathways. The genes of MTUS1, CD44, CCNB1, CCND2 and KRAS with specific SNP loci may be used as biomarkers for diagnosis of breast cancer. Besides, E2F1 was recognized as the transcription factor of 7 DEGs including MTUS1, CD44, CCNB1 and CCND2.

  11. When Whole-Genome Alignments Just Won't Work: kSNP v2 Software for Alignment-Free SNP Discovery and Phylogenetics of Hundreds of Microbial Genomes

    PubMed Central

    Gardner, Shea N.; Hall, Barry G.

    2013-01-01

    Effective use of rapid and inexpensive whole genome sequencing for microbes requires fast, memory efficient bioinformatics tools for sequence comparison. The kSNP v2 software finds single nucleotide polymorphisms (SNPs) in whole genome data. kSNP v2 has numerous improvements over kSNP v1 including SNP gene annotation; better scaling for draft genomes available as assembled contigs or raw, unassembled reads; a tool to identify the optimal value of k; distribution of packages of executables for Linux and Mac OS X for ease of installation and user-friendly use; and a detailed User Guide. SNP discovery is based on k-mer analysis, and requires no multiple sequence alignment or the selection of a single reference genome. Most target sets with hundreds of genomes complete in minutes to hours. SNP phylogenies are built by maximum likelihood, parsimony, and distance, based on all SNPs, only core SNPs, or SNPs present in some intermediate user-specified fraction of targets. The SNP-based trees that result are consistent with known taxonomy. kSNP v2 can handle many gigabases of sequence in a single run, and if one or more annotated genomes are included in the target set, SNPs are annotated with protein coding and other information (UTRs, etc.) from Genbank file(s). We demonstrate application of kSNP v2 on sets of viral and bacterial genomes, and discuss in detail analysis of a set of 68 finished E. coli and Shigella genomes and a set of the same genomes to which have been added 47 assemblies and four “raw read” genomes of O104:H4 strains from the recent European E. coli outbreak that resulted in both bloody diarrhea and hemolytic uremic syndrome (HUS), and caused at least 50 deaths. PMID:24349125
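
    The idea of alignment-free, k-mer based SNP discovery can be sketched as follows. This is an illustrative toy, not the kSNP implementation: it assumes a SNP locus is an odd-length k-mer whose flanking bases are identical across genomes while the central base differs, it ignores reverse complements, and the function name and sequences are hypothetical.

```python
from collections import defaultdict

def kmer_snps(genomes, k):
    """Find candidate SNP loci from k-mers without any alignment.

    genomes: dict mapping genome name -> sequence string.
    k: odd k-mer length; the central base is the candidate SNP.
    Returns {(left_flank, right_flank): {genome: central_base}} for
    flanks whose central base differs between genomes.
    """
    if k % 2 == 0:
        raise ValueError("k must be odd so that a central base exists")
    mid = k // 2
    by_flank = defaultdict(dict)  # flank -> {genome: set of central bases}
    for name, seq in genomes.items():
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            flank = (kmer[:mid], kmer[mid + 1:])
            by_flank[flank].setdefault(name, set()).add(kmer[mid])

    snps = {}
    for flank, per_genome in by_flank.items():
        # Skip flanks that are ambiguous within a single genome
        if any(len(bases) > 1 for bases in per_genome.values()):
            continue
        alleles = {name: next(iter(bases)) for name, bases in per_genome.items()}
        if len(set(alleles.values())) > 1:
            snps[flank] = alleles
    return snps

# Two toy genomes differing at one position
print(kmer_snps({"a": "ACGTACG", "b": "ACGAACG"}, k=3))
```

    Because each genome is only scanned for k-mers, no reference genome or multiple alignment is needed, which is the property the abstract emphasizes.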

  12. When whole-genome alignments just won't work: kSNP v2 software for alignment-free SNP discovery and phylogenetics of hundreds of microbial genomes.

    PubMed

    Gardner, Shea N; Hall, Barry G

    2013-01-01

    Effective use of rapid and inexpensive whole genome sequencing for microbes requires fast, memory efficient bioinformatics tools for sequence comparison. The kSNP v2 software finds single nucleotide polymorphisms (SNPs) in whole genome data. kSNP v2 has numerous improvements over kSNP v1 including SNP gene annotation; better scaling for draft genomes available as assembled contigs or raw, unassembled reads; a tool to identify the optimal value of k; distribution of packages of executables for Linux and Mac OS X for ease of installation and user-friendly use; and a detailed User Guide. SNP discovery is based on k-mer analysis, and requires no multiple sequence alignment or the selection of a single reference genome. Most target sets with hundreds of genomes complete in minutes to hours. SNP phylogenies are built by maximum likelihood, parsimony, and distance, based on all SNPs, only core SNPs, or SNPs present in some intermediate user-specified fraction of targets. The SNP-based trees that result are consistent with known taxonomy. kSNP v2 can handle many gigabases of sequence in a single run, and if one or more annotated genomes are included in the target set, SNPs are annotated with protein coding and other information (UTRs, etc.) from Genbank file(s). We demonstrate application of kSNP v2 on sets of viral and bacterial genomes, and discuss in detail analysis of a set of 68 finished E. coli and Shigella genomes and a set of the same genomes to which have been added 47 assemblies and four "raw read" genomes of O104:H4 strains from the recent European E. coli outbreak that resulted in both bloody diarrhea and hemolytic uremic syndrome (HUS), and caused at least 50 deaths.

  13. Identification of Mendelian inconsistencies between SNP and pedigree information of sibs

    PubMed Central

    2011-01-01

    Background Using SNP genotypes to apply genomic selection in breeding programs is becoming common practice. Tools to edit and check the quality of genotype data are required. Checking for Mendelian inconsistencies makes it possible to identify animals for which pedigree information and genotype information are not in agreement. Methods Straightforward tests to detect Mendelian inconsistencies exist that count the number of opposing homozygous marker (e.g. SNP) genotypes between parent and offspring (PAR-OFF). Here, we develop two tests to identify Mendelian inconsistencies between sibs. The first test counts SNP with opposing homozygous genotypes between sib pairs (SIBCOUNT). The second test compares pedigree and SNP-based relationships (SIBREL). All tests iteratively remove animals based on decreasing numbers of inconsistent parents and offspring or sibs. The PAR-OFF test, followed by either SIB test, was applied to a dataset comprising 2,078 genotyped cows and 211 genotyped sires. Theoretical expectations for distributions of test statistics of all three tests were calculated and compared to empirically derived values. Type I and II error rates were calculated after applying the tests to the edited data, while Mendelian inconsistencies were introduced by permuting pedigree against genotype data for various proportions of animals. Results Both SIB tests identified animal pairs for which pedigree and genomic relationships could be considered as inconsistent by visual inspection of a scatter plot of pairwise pedigree and SNP-based relationships. After removal of 235 animals with the PAR-OFF test, SIBCOUNT (SIBREL) identified 18 (22) additional inconsistent animals. Seventeen animals were identified by both methods. The numbers of incorrectly deleted animals (Type I error), were equally low for both methods, while the numbers of incorrectly non-deleted animals (Type II error), were considerably higher for SIBREL compared to SIBCOUNT. Conclusions Tests to remove

  14. Identification of Mendelian inconsistencies between SNP and pedigree information of sibs.

    PubMed

    Calus, Mario P L; Mulder, Han A; Bastiaansen, John W M

    2011-10-11

    Using SNP genotypes to apply genomic selection in breeding programs is becoming common practice. Tools to edit and check the quality of genotype data are required. Checking for Mendelian inconsistencies makes it possible to identify animals for which pedigree information and genotype information are not in agreement. Straightforward tests to detect Mendelian inconsistencies exist that count the number of opposing homozygous marker (e.g. SNP) genotypes between parent and offspring (PAR-OFF). Here, we develop two tests to identify Mendelian inconsistencies between sibs. The first test counts SNP with opposing homozygous genotypes between sib pairs (SIBCOUNT). The second test compares pedigree and SNP-based relationships (SIBREL). All tests iteratively remove animals based on decreasing numbers of inconsistent parents and offspring or sibs. The PAR-OFF test, followed by either SIB test, was applied to a dataset comprising 2,078 genotyped cows and 211 genotyped sires. Theoretical expectations for distributions of test statistics of all three tests were calculated and compared to empirically derived values. Type I and II error rates were calculated after applying the tests to the edited data, while Mendelian inconsistencies were introduced by permuting pedigree against genotype data for various proportions of animals. Both SIB tests identified animal pairs for which pedigree and genomic relationships could be considered as inconsistent by visual inspection of a scatter plot of pairwise pedigree and SNP-based relationships. After removal of 235 animals with the PAR-OFF test, SIBCOUNT (SIBREL) identified 18 (22) additional inconsistent animals. Seventeen animals were identified by both methods. The numbers of incorrectly deleted animals (Type I error), were equally low for both methods, while the numbers of incorrectly non-deleted animals (Type II error), were considerably higher for SIBREL compared to SIBCOUNT. Tests to remove Mendelian inconsistencies between sibs should
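
    The counting step shared by the PAR-OFF and SIBCOUNT tests can be sketched in a few lines. This assumes genotypes coded as the number of copies of one allele (0, 1, 2) with None for missing calls; the coding, the function name, and the data are hypothetical, and the thresholds and iterative removal described in the abstract are not reproduced.

```python
def count_opposing_homozygotes(genotypes_a, genotypes_b):
    """Count SNPs where two animals are homozygous for different alleles.

    Genotypes are allele counts: 0 and 2 are the two homozygotes,
    1 is the heterozygote, None is a missing call (skipped).
    """
    return sum(
        1
        for a, b in zip(genotypes_a, genotypes_b)
        if a is not None and b is not None and {a, b} == {0, 2}
    )

# Hypothetical genotypes for a putative parent and offspring
parent = [0, 2, 1, 2, None]
offspring = [2, 2, 0, 0, 0]
print(count_opposing_homozygotes(parent, offspring))
```

    For a true parent-offspring pair the count should be close to zero, apart from genotyping errors. True full sibs can legitimately show opposing homozygotes when both parents are heterozygous, which is why a sib-based count has to be judged against its expected distribution and not against zero.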

  15. Model SNP development for complex genomes based on hexaploid oat using high-throughput 454 sequencing technology

    PubMed Central

    2011-01-01

    Background Genetic markers are pivotal to modern genomics research; however, discovery and genotyping of molecular markers in oat has been hindered by the size and complexity of the genome, and by a scarcity of sequence data. The purpose of this study was to generate oat expressed sequence tag (EST) information, develop a bioinformatics pipeline for SNP discovery, and establish a method for rapid, cost-effective, and straightforward genotyping of SNP markers in complex polyploid genomes such as oat. Results Based on cDNA libraries of four cultivated oat genotypes, approximately 127,000 contigs were assembled from approximately one million Roche 454 sequence reads. Contigs were filtered through a novel bioinformatics pipeline to eliminate ambiguous polymorphism caused by subgenome homology, and 96 in silico SNPs were selected from 9,448 candidate loci for validation using high-resolution melting (HRM) analysis. Of these, 52 (54%) were polymorphic between parents of the Ogle1040 × TAM O-301 (OT) mapping population, with 48 segregating as single Mendelian loci, and 44 being placed on the existing OT linkage map. Ogle and TAM amplicons from 12 primers were sequenced for SNP validation, revealing complex polymorphism in seven amplicons but general sequence conservation within SNP loci. Whole-amplicon interrogation with HRM revealed insertions, deletions, and heterozygotes in secondary oat germplasm pools, generating multiple alleles at some primer targets. To validate marker utility, 36 SNP assays were used to evaluate the genetic diversity of 34 diverse oat genotypes. Dendrogram clusters corresponded generally to known genome composition and genetic ancestry. Conclusions The high-throughput SNP discovery pipeline presented here is a rapid and effective method for identification of polymorphic SNP alleles in the oat genome. The current-generation HRM system is a simple and highly-informative platform for SNP genotyping. These techniques provide a model for SNP

  16. MA-SNP--A new genotype calling method for oligonucleotide SNP arrays modeling the batch effect with a normal mixture model.

    PubMed

    Wen, Yalu; Li, Ming; Fu, Wenjiang J

    2011-08-30

    Genome-wide association studies hold great promise in identifying disease-susceptibility variants and understanding the genetic etiology of complex diseases. Microarray technology enables the genotyping of millions of single nucleotide polymorphisms. Many factors in microarray studies, such as probe selection, sample quality, and experimental process and batch, have substantial effect on the genotype calling accuracy, which is crucial for downstream analyses. Failure to account for the variability of these sources may lead to inaccurate genotype calls and false positive and false negative findings. In this study, we develop a SNP-specific genotype calling algorithm based on the probe intensity composite representation (PICR) model, while using a normal mixture model to account for the variability of batch effect on the genotype calls. We demonstrate our method with SNP array data in a few studies, including the HapMap project, the coronary heart disease and the UK Blood Service Control studies by the Wellcome Trust Case-Control Consortium, and a methylation profiling study. Our single array based approach outperforms PICR and is comparable to the best multi-array genotype calling methods.

  17. Ecotoxicological assessment of PAHs and their dead-end metabolites after degradation by Mycobacterium sp. strain SNP11.

    PubMed

    Pagnout, Christophe; Rast, Claudine; Veber, Anne-Marie; Poupin, Pascal; Férard, Jean-François

    2006-10-01

    Mycobacterium sp. SNP11 has a high PAH biodegradation potential. In this paper, the toxicity of pyrene, fluoranthene, phenanthrene, and their dead-end metabolites, accumulated in the media after biodegradation by Mycobacterium sp. SNP11, were evaluated by a screening battery of acute, chronic, and genotoxic tests. According to the bioassays, performed on bacteria (Vibrio fischeri, Salmonella typhimurium strains TA1535/pSK1002, TA97a, TA98, TA100), algae (Pseudokirchneriella subcapitata), and crustaceans (Daphnia magna, Ceriodaphnia dubia), total disappearance or a very significant reduction of the (geno)toxic potential was observed after PAH degradation by Mycobacterium sp. SNP11.

  18. Comparative performance of SNP typing and 'Bruce-ladder' in the discrimination of Brucella suis and Brucella canis.

    PubMed

    Koylass, Mark S; King, Amanda C; Edwards-Smallbone, James; Gopaul, Krishna K; Perrett, Lorraine L; Whatmore, Adrian M

    2010-05-19

    Two novel molecular assays, 'Bruce-ladder' and SNP typing, have recently been described for differentiating isolates of the genus Brucella, causative organisms of the significant zoonotic disease brucellosis, at the species level. Differentiation of Brucella canis from Brucella suis by molecular approaches can be difficult, and here we compare the performance of 'Bruce-ladder' and SNP typing in correctly identifying B. canis isolates. Both assays proved easy to perform, but while 'Bruce-ladder' misidentified a substantial proportion of B. canis isolates as B. suis, all B. canis isolates were correctly identified by SNP typing.

  19. Genome-wide SNP association-based localization of a dwarfism gene in Friesian dwarf horses.

    PubMed

    Orr, N; Back, W; Gu, J; Leegwater, P; Govindarajan, P; Conroy, J; Ducro, B; Van Arendonk, J A M; MacHugh, D E; Ennis, S; Hill, E W; Brama, P A J

    2010-12-01

    The recent completion of the horse genome and commercial availability of an equine SNP genotyping array has facilitated the mapping of disease genes. We report putative localization of the gene responsible for dwarfism, a trait in Friesian horses that is thought to have a recessive mode of inheritance, to a 2-MB region of chromosome 14 using just 10 affected animals and 10 controls. We successfully genotyped 34,429 SNPs that were tested for association with dwarfism using chi-square tests. The most significant SNP in our study, BIEC2-239376 (P(2df) = 4.54 × 10^-5, P(rec) = 7.74 × 10^-6), is located close to a gene implicated in human dwarfism. Fine-mapping and resequencing analyses did not aid in further localization of the causative variant, and replication of our findings in independent sample sets will be necessary to confirm these results.
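
    A per-SNP chi-square test with 2 degrees of freedom, as in the P(2df) value above, can be sketched from a 2×3 table of genotype counts. The sketch assumes a plain Pearson chi-square on cases versus controls with all three genotype classes observed; the function name and counts are hypothetical and are not the study's data. For 2 degrees of freedom the upper-tail probability has the closed form exp(-x/2), so no statistics library is needed.

```python
import math

def genotype_chi_square(cases, controls):
    """Pearson chi-square (2 df) on a 2x3 genotype count table.

    cases, controls: counts of the three genotypes, e.g. (AA, AB, BB).
    Returns (statistic, p_value). Assumes every genotype class is
    observed in at least one group, otherwise 2 df is not appropriate.
    """
    rows = [list(cases), list(controls)]
    col_totals = [cases[j] + controls[j] for j in range(3)]
    total = sum(col_totals)
    stat = 0.0
    for row in rows:
        row_total = sum(row)
        for j in range(3):
            expected = row_total * col_totals[j] / total
            if expected > 0:
                stat += (row[j] - expected) ** 2 / expected
    # Survival function of chi-square with 2 df is exp(-x / 2)
    return stat, math.exp(-stat / 2)

# Hypothetical counts for 10 affected and 10 control animals
stat, p = genotype_chi_square((1, 2, 7), (6, 3, 1))
print(f"chi2 = {stat:.3f}, p = {p:.4f}")
```

    With tens of thousands of SNPs tested, a per-SNP p-value like this still has to be interpreted against a multiple-testing threshold.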

  20. Microfluidic linear hydrogel array for multiplexed single nucleotide polymorphism (SNP) detection.

    PubMed

    Jung, Yun Kyung; Kim, Jungkyu; Mathies, Richard A

    2015-03-17

    A PDMS-based microfluidic linear hydrogel array is developed for multiplexed single nucleotide polymorphism (SNP) detection. A sequence of three-dimensional (3D) hydrogel plugs containing the desired DNA probes is prepared by UV polymerization within a PDMS microchannel system. The fluorescently labeled target DNA is then electrophoresed through the sequence of hydrogel plugs for hybridization. Continued electrophoresis provides an electrophoretic wash that removes nonspecific binders. The capture gel array is imaged after washing at various temperatures (temperature gradient electrophoresis) to further distinguish perfect matches from mismatches. The ability of this microdevice to perform multiplex SNP genotyping is demonstrated by analyzing a mixture of model E. coli bacterial targets. This microfluidic hydrogel array is ∼1000 times more sensitive than planar microarrays due to the 3D gel capture, the hybridization time is much shorter due to electrophoretic control of the transport properties, and the stringent wash with temperature gradient electrophoresis enables analysis of single nucleotide mismatches with high specificity.

  1. [Artificial selection for cattle based on high-density SNP markers].

    PubMed

    Liu, Xi-Dong; Wang, Zhi-Peng; Fan, Hui-Zhong; Li, Jun-Ya; Gao, Hui-Jiang

    2012-10-01

    With the implementation of genetic improvement in recent years, artificial selection has greatly improved beef cattle production performance and its genetic basis has been dramatically changed. In this study, based on the Illumina BovineSNP50 (54K) and BovineHD (770K) BeadChip and the FST value, we analyzed the genetic differentiation of cattle and screened the imprints of selection in bovine genome. Finally, we found 47104 outlier SNP loci and 3064 candidate genes, for example CLIC5, TG, CACNA2D1, and FSHR. The biological processes and molecular functions of genes were analyzed through gene annotation. The results of this study established a genome-wide map of selection footprints in beef cattle genome and a clue for in-depth study of artificial selection and understanding of biological evolution. Our results indicate that artificial selection has played an important role in cattle breed genetic improvement.
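
    The FST value used to screen for outlier loci can be illustrated with a minimal per-SNP calculation. The abstract does not state which estimator was used; this sketch assumes the basic heterozygosity form FST = (HT - HS) / HT for one biallelic SNP and two equally weighted populations, and the function name is hypothetical.

```python
def fst_two_populations(p1, p2):
    """Per-SNP FST from allele frequencies in two populations.

    p1, p2: frequency of the same allele in each population (0..1).
    Uses FST = (HT - HS) / HT with equal population weights.
    Returns 0.0 when the SNP is monomorphic overall (HT == 0).
    """
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)                      # expected heterozygosity, pooled
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population value
    if h_total == 0:
        return 0.0
    return (h_total - h_within) / h_total

print(fst_two_populations(0.2, 0.8))  # strongly differentiated SNP
print(fst_two_populations(0.3, 0.3))  # no differentiation
```

    Loci whose FST falls in the extreme upper tail of the genome-wide distribution are the kind of outliers the abstract treats as candidate selection footprints.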

  2. Using false discovery rates to benchmark SNP-callers in next-generation sequencing projects.

    PubMed

    Farrer, Rhys A; Henk, Daniel A; MacLean, Dan; Studholme, David J; Fisher, Matthew C

    2013-01-01

    Sequence alignments form the basis for many comparative and population genomic studies. Alignment tools provide a range of accuracies dependent on the divergence between the sequences and the alignment methods. Despite widespread use, there is no standard method for assessing the accuracy of a dataset and alignment strategy after resequencing. We present a framework and tool for determining the overall accuracies of an input read dataset, alignment and SNP-calling method providing an isolate in that dataset has a corresponding, or closely related reference sequence available. In addition to this tool for comparing False Discovery Rates (FDR), we include a method for determining homozygous and heterozygous positions from an alignment using binomial probabilities for an expected error rate. We benchmark this method against other SNP callers using our FDR method with three fungal genomes, finding that it was able to achieve a high level of accuracy. These tools are available at http://cfdr.sourceforge.net/.
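
    Calling homozygous and heterozygous positions from binomial probabilities can be sketched as a comparison of three likelihoods. This is an assumed formulation for illustration, not the authors' published method: the function name, the default error rate, and the maximum-likelihood decision rule are hypothetical. It requires Python 3.8 or later for math.comb.

```python
import math

def call_genotype(depth, alt_reads, error_rate=0.01):
    """Pick the most likely diploid genotype at one aligned position.

    depth: number of reads covering the position.
    alt_reads: number of those reads showing the non-reference base.
    error_rate: assumed per-base sequencing error rate.
    Returns 'hom_ref', 'het', or 'hom_alt'.
    """
    def binom(k, n, p):
        # Probability of k alternate reads out of n at alt-read rate p
        return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

    likelihoods = {
        "hom_ref": binom(alt_reads, depth, error_rate),      # alt reads are errors
        "het": binom(alt_reads, depth, 0.5),                 # half the reads are alt
        "hom_alt": binom(alt_reads, depth, 1 - error_rate),  # ref reads are errors
    }
    return max(likelihoods, key=likelihoods.get)

print(call_genotype(20, 0), call_genotype(20, 10), call_genotype(20, 20))
```

    Comparing calls like these against a known or closely related reference is what allows a false discovery rate to be estimated for a caller.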

  3. Using False Discovery Rates to Benchmark SNP-callers in next-generation sequencing projects

    PubMed Central

    Farrer, Rhys A.; Henk, Daniel A.; MacLean, Dan; Studholme, David J.; Fisher, Matthew C.

    2013-01-01

    Sequence alignments form the basis for many comparative and population genomic studies. Alignment tools provide a range of accuracies dependent on the divergence between the sequences and the alignment methods. Despite widespread use, there is no standard method for assessing the accuracy of a dataset and alignment strategy after resequencing. We present a framework and tool for determining the overall accuracies of an input read dataset, alignment and SNP-calling method providing an isolate in that dataset has a corresponding, or closely related reference sequence available. In addition to this tool for comparing False Discovery Rates (FDR), we include a method for determining homozygous and heterozygous positions from an alignment using binomial probabilities for an expected error rate. We benchmark this method against other SNP callers using our FDR method with three fungal genomes, finding that it was able to achieve a high level of accuracy. These tools are available at http://cfdr.sourceforge.net/. PMID:23518929

  4. Human population genetic diversity as a function of SNP type from HapMap data.

    PubMed

    Garte, Seymour

    2010-01-01

    Data from the international HapMap project were mined to determine if the degree of genetic differentiation (Fst) is dependent on single nucleotide polymorphism (SNP) category. The Fst statistic was evaluated across all SNPs for each of 30 genes and for each of five chromosomes. A consistent decrease in diversity between Europeans and Africans was seen for nonsynonymous coding region SNPs compared to the three other SNP categories: synonymous SNPs, UTR, and intronic SNPs. This suggests an effect of balancing selection in reducing interpopulation genetic diversity at sites that would be expected to influence phenotype and therefore be subject to selection. This result is inconsistent with the concept of large population specific genetic differences that could have applications in "racialized medicine."

  5. Slider--maximum use of probability information for alignment of short sequence reads and SNP detection.

    PubMed

    Malhis, Nawar; Butterfield, Yaron S N; Ester, Martin; Jones, Steven J M

    2009-01-01

    A plethora of alignment tools have been created that are designed to best fit different types of alignment conditions. While some of these are made for aligning Illumina Sequence Analyzer reads, none of these are fully utilizing its probability (prb) output. In this article, we will introduce a new alignment approach (Slider) that reduces the alignment problem space by utilizing each read base's probabilities given in the prb files. Compared with other aligners, Slider has higher alignment accuracy and efficiency. In addition, given that Slider matches bases with probabilities other than the most probable, it significantly reduces the percentage of base mismatches. The result is that its SNP predictions are more accurate than other SNP prediction approaches used today that start from the most probable sequence, including those using base quality.

  6. HapMap tagSNP transferability in multiple populations: general guidelines

    PubMed Central

    Xing, Jinchuan; Witherspoon, David J.; Watkins, W. Scott; Zhang, Yuhua; Tolpinrud, Whitney; Jorde, Lynn B.

    2008-01-01

    Linkage disequilibrium (LD) has received much recent attention because of its value in localizing disease-causing genes. Due to the extensive LD between neighboring loci in the human genome, it is believed that a subset of the single nucleotide polymorphisms in a region (tagSNPs) can be selected to capture most of the remaining SNP variants. In this study, we examined LD patterns and HapMap tagSNP transferability in more than 300 individuals. A South Indian and an African Mbuti Pygmy population sample were included to evaluate the performance of HapMap tagSNPs in geographically distinct and genetically isolated populations. Our results show that HapMap tagSNPs selected with r2 >= 0.8 can capture more than 85% of the SNPs in populations that are from the same continental group. Combined tagSNPs from HapMap CEU and CHB+JPT serve as the best reference for the Indian sample. The HapMap YRI are a sufficient reference for tagSNP selection in the Pygmy sample. In addition to our findings, we reviewed over 25 recent studies of tagSNP transferability and propose a general guideline for selecting tagSNPs from HapMap populations. PMID:18482828
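
    The r2 threshold used for tagSNP selection refers to the squared correlation between alleles at two loci. A minimal sketch, assuming phased haplotypes coded 0/1 at each of two biallelic SNPs (the function name and data are hypothetical):

```python
def ld_r_squared(hap_a, hap_b):
    """Pairwise LD (r^2) between two biallelic SNPs from phased haplotypes.

    hap_a, hap_b: equal-length lists of 0/1 alleles, one entry per
    haplotype, where 1 marks the same chosen allele at each SNP.
    """
    if not hap_a or len(hap_a) != len(hap_b):
        raise ValueError("haplotype lists must be non-empty and equal length")
    n = len(hap_a)
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a SNP is monomorphic")
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    return d * d / denom

print(ld_r_squared([0, 1, 0, 1], [0, 1, 0, 1]))  # perfectly correlated
print(ld_r_squared([0, 0, 1, 1], [0, 1, 0, 1]))  # independent
```

    A tagSNP "captures" another SNP when this value meets the chosen threshold, 0.8 in the study above.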

  7. Sensitive Quantification of Mosaicism Using High Density SNP Arrays and the Cumulative Distribution Function

    PubMed Central

    Markello, Thomas C.; Carlson-Donohoe, Hannah; Sincan, Murat; Adams, David; Bodine, David M.; Farrar, Jason E.; Vlachos, Adrianna; Lipton, Jeffrey M.; Auerbach, Arleen D.; Ostrander, Elaine A.; Chandrasekharappa, Settara C.; Boerkoel, Cornelius F.; Gahl, William A.

    2012-01-01

    Medicine is rapidly applying exome and genome sequencing to the diagnosis and management of human disease. Somatic mosaicism, however, is not readily detectable by these means, and yet it accounts for a significant portion of undiagnosed disease. We present a rapid and sensitive method, the Continuous Distribution Function as applied to single nucleotide polymorphism (SNP) array data, to quantify somatic mosaicism throughout the genome. We also demonstrate application of the method to novel diseases and mechanisms. PMID:22277120

  8. Applying SNP-Derived Molecular Coancestry Estimates to Captive Breeding Programs.

    PubMed

    Ivy, Jamie A; Putnam, Andrea S; Navarro, Asako Y; Gurr, Jessica; Ryder, Oliver A

    2016-09-01

    Captive breeding programs for wildlife species typically rely on pedigrees to inform genetic management. Although pedigree-based breeding strategies are quite effective at retaining long-term genetic variation, management of zoo-based breeding programs continues to be hampered when pedigrees are poorly known. The objective of this study was to evaluate 2 options for generating single nucleotide polymorphism (SNP) data to resolve unknown relationships within captive breeding programs. We generated SNP data for a zoo-based population of addax (Addax nasomaculatus) using both the Illumina BovineHD BeadChip and double digest restriction site-associated DNA (ddRAD) sequencing. Our results demonstrated that estimates of allele sharing (AS) between pairs of individuals exhibited low variances. Average AS variances were highest when using 50 loci (SNPchipall = 0.00159; ddRADall = 0.0249), but fell below 0.0003 for the SNP chip dataset when sampling ≥250 loci and below 0.0025 for the ddRAD dataset when sampling ≥500 loci. Furthermore, the correlation between the SNPchipall and ddRADall AS datasets was 0.88 (95%CI = 0.84-0.91) when subsampling 500 loci. Collectively, our results indicated that both SNP genotyping methods produced sufficient data for accurately estimating relationships, even within an extremely bottlenecked population. Our results also suggested that analytic assumptions historically integrated into the addax pedigree are not adversely impacting long-term pedigree-based management; kinships calculated from the analytic pedigree were significantly correlated (P < 0.001) with AS estimates. Overall, our conclusions are intended to serve as both a proof of concept and a model for applying molecular data to the genetic management of captive breeding programs.
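
    An allele-sharing estimate between two individuals can be sketched as the mean proportion of alleles identical by state across loci. The abstract does not give the exact AS formula, so this is an assumed definition for illustration, with genotypes coded as allele counts (0, 1, 2) and hypothetical names and data.

```python
def allele_sharing(genotypes_a, genotypes_b):
    """Mean proportion of alleles shared (identical by state) per locus.

    Genotypes are allele counts 0/1/2; None marks a missing call.
    At each locus the pair shares 2 - |a - b| alleles out of 2.
    """
    shared = 0
    loci = 0
    for a, b in zip(genotypes_a, genotypes_b):
        if a is None or b is None:
            continue  # skip loci where either call is missing
        shared += 2 - abs(a - b)
        loci += 1
    if loci == 0:
        raise ValueError("no loci with calls for both individuals")
    return shared / (2 * loci)

print(allele_sharing([0, 1, 2, 1], [0, 1, 2, 1]))  # identical genotypes
print(allele_sharing([0, 1], [1, 1]))              # partial sharing
```

    Repeating the calculation on random subsets of loci is one way to see how the variance of the estimate shrinks as more loci are sampled, which is the pattern the abstract reports.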

  9. A comparison of SNP and STR loci for delineating population structure and performing individual genetic assignment

    PubMed Central

    2010-01-01

    Background Technological advances have led to the rapid increase in availability of single nucleotide polymorphisms (SNPs) in a range of organisms, and there is a general optimism that SNPs will become the marker of choice for a range of evolutionary applications. Here, comparisons between 300 polymorphic SNPs and 14 short tandem repeats (STRs) were conducted on a data set consisting of approximately 500 Atlantic salmon arranged in 10 samples/populations. Results Global FST ranged from 0.033-0.115 and -0.002-0.316 for the 14 STR and 300 SNP loci respectively. Global FST was similar among 28 linkage groups when averaging data from mapped SNPs. With the exception of selecting a panel of SNPs taking the locus displaying the highest global FST for each of the 28 linkage groups, which inflated estimation of genetic differentiation among the samples, inferred genetic relationships were highly similar between SNP and STR data sets and variants thereof. The best 15 SNPs (30 alleles) gave a similar level of self-assignment to the best 4 STR loci (83 alleles), however, addition of further STR loci did not lead to a notable increase in assignment whereas addition of up to 100 SNP loci increased assignment. Conclusion Whilst the optimal combinations of SNPs identified in this study are linked to the samples from which they were selected, this study demonstrates that identification of highly informative SNP loci from larger panels will provide researchers with a powerful approach to delineate genetic relationships at the individual and population levels. PMID:20051144

  10. Mapping mutations in plant genomes with the user-friendly web application CandiSNP.

    PubMed

    Etherington, Graham J; Monaghan, Jacqueline; Zipfel, Cyril; MacLean, Dan

    2014-01-01

    Analysis of mutants isolated from forward-genetic screens has revealed key components of several plant signalling pathways. Mapping mutations by position, either using classical methods or whole genome high-throughput sequencing (HTS), largely relies on the analysis of genome-wide polymorphisms in F2 recombinant populations. Combining bulk segregant analysis with HTS has accelerated the identification of causative mutations and has been widely adopted in many research programmes. A major advantage of HTS is the ability to perform bulk segregant analysis after back-crossing to the parental line rather than out-crossing to a polymorphic ecotype, which reduces genetic complexity and avoids issues with phenotype penetrance in different ecotypes. Plotting the positions of homozygous polymorphisms in a mutant genome identifies areas of low recombination and is an effective way to detect molecular linkage to a phenotype of interest. We describe the use of single nucleotide polymorphism (SNP) density plots as a mapping strategy to identify and refine chromosomal positions of causative mutations from screened plant populations. We developed a web application called CandiSNP that generates density plots from user-provided SNP data obtained from HTS. Candidate causative mutations, defined as SNPs causing non-synonymous changes in annotated coding regions are highlighted on the plots and listed in a table. We use data generated from a recent mutant screen in the model plant Arabidopsis thaliana as proof-of-concept for the validity of our tool. CandiSNP is a user-friendly application that will aid in novel discoveries from forward-genetic mutant screens. It is particularly useful for analysing HTS data from bulked back-crossed mutants, which contain fewer polymorphisms than data generated from out-crosses. The web-application is freely available online at http://candisnp.tsl.ac.uk.
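
    The density-plot strategy described above amounts to counting homozygous SNPs along a chromosome and looking for the region where they pile up. The sketch below uses a plain fixed-width histogram as a stand-in; it is not CandiSNP's implementation, and the function names, window size and positions are hypothetical.

```python
from collections import Counter


def snp_density(positions, bin_size):
    """Count SNPs per fixed-width window.

    Returns a dict mapping each window's start coordinate to its SNP count,
    ordered by coordinate. Windows with no SNPs are omitted.
    """
    counts = Counter((pos // bin_size) * bin_size for pos in positions)
    return dict(sorted(counts.items()))


def densest_window(positions, bin_size):
    """Return (start, end, count) for the window holding the most SNPs."""
    density = snp_density(positions, bin_size)
    start = max(density, key=density.get)
    return start, start + bin_size, density[start]


if __name__ == "__main__":
    # Hypothetical homozygous SNP positions (bp) on one chromosome.
    homozygous_snps = [100, 250, 900, 1100, 1150, 1200, 1900]
    print(snp_density(homozygous_snps, 1000))
    print(densest_window(homozygous_snps, 1000))
```

    In a real analysis the densest window would only be a candidate interval; the abstract's second step, filtering for non-synonymous changes in annotated coding regions, is not covered here.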

  11. SNP Discovery Using Next Generation Transcriptomic Sequencing in Atlantic Herring (Clupea harengus)

    PubMed Central

    Bekkevold, Dorte; Babbucci, Massimiliano; van Houdt, Jeroen; Maes, Gregory E.; Bargelloni, Luca; Nielsen, Rasmus O.; Taylor, Martin I.; Ogden, Rob; Cariani, Alessia; Carvalho, Gary R.; Consortium, FishPopTrace; Panitz, Frank

    2012-01-01

    The introduction of Next Generation Sequencing (NGS) has revolutionised population genetics, providing studies of non-model species with unprecedented genomic coverage, allowing evolutionary biologists to address questions previously far beyond the reach of available resources. Furthermore, the simple mutation model of Single Nucleotide Polymorphisms (SNPs) permits cost-effective high-throughput genotyping in thousands of individuals simultaneously. Genomic resources are scarce for the Atlantic herring (Clupea harengus), a small pelagic species that sustains high revenue fisheries. This paper details the development of 578 SNPs using a combined NGS and high-throughput genotyping approach. Eight individuals covering the species distribution in the eastern Atlantic were bar-coded and multiplexed into a single cDNA library and sequenced using the 454 GS FLX platform. SNP discovery was performed by de novo sequence clustering and contig assembly, followed by the mapping of reads against consensus contig sequences. Selection of candidate SNPs for genotyping was conducted using an in silico approach. SNP validation and genotyping were performed simultaneously using an Illumina 1,536 GoldenGate assay. Although the conversion rate of candidate SNPs in the genotyping assay cannot be predicted in advance, this approach has the potential to maximise cost and time efficiencies by avoiding expensive and time-consuming laboratory stages of SNP validation. Additionally, the in silico approach leads to lower ascertainment bias in the resulting SNP panel as marker selection is based only on the ability to design primers and the predicted presence of intron-exon boundaries. Consequently SNPs with a wider spectrum of minor allele frequencies (MAFs) will be genotyped in the final panel. The genomic resources presented here represent a valuable multi-purpose resource for developing informative marker panels for population discrimination, microarray development and for population

  12. SNP discovery using Next Generation Transcriptomic Sequencing in Atlantic herring (Clupea harengus).

    PubMed

    Helyar, Sarah J; Limborg, Morten T; Bekkevold, Dorte; Babbucci, Massimiliano; van Houdt, Jeroen; Maes, Gregory E; Bargelloni, Luca; Nielsen, Rasmus O; Taylor, Martin I; Ogden, Rob; Cariani, Alessia; Carvalho, Gary R; Panitz, Frank

    2012-01-01

    The introduction of Next Generation Sequencing (NGS) has revolutionised population genetics, providing studies of non-model species with unprecedented genomic coverage, allowing evolutionary biologists to address questions previously far beyond the reach of available resources. Furthermore, the simple mutation model of Single Nucleotide Polymorphisms (SNPs) permits cost-effective high-throughput genotyping in thousands of individuals simultaneously. Genomic resources are scarce for the Atlantic herring (Clupea harengus), a small pelagic species that sustains high revenue fisheries. This paper details the development of 578 SNPs using a combined NGS and high-throughput genotyping approach. Eight individuals covering the species distribution in the eastern Atlantic were bar-coded and multiplexed into a single cDNA library and sequenced using the 454 GS FLX platform. SNP discovery was performed by de novo sequence clustering and contig assembly, followed by the mapping of reads against consensus contig sequences. Selection of candidate SNPs for genotyping was conducted using an in silico approach. SNP validation and genotyping were performed simultaneously using an Illumina 1,536 GoldenGate assay. Although the conversion rate of candidate SNPs in the genotyping assay cannot be predicted in advance, this approach has the potential to maximise cost and time efficiencies by avoiding expensive and time-consuming laboratory stages of SNP validation. Additionally, the in silico approach leads to lower ascertainment bias in the resulting SNP panel as marker selection is based only on the ability to design primers and the predicted presence of intron-exon boundaries. Consequently SNPs with a wider spectrum of minor allele frequencies (MAFs) will be genotyped in the final panel. The genomic resources presented here represent a valuable multi-purpose resource for developing informative marker panels for population discrimination, microarray development and for population

  13. Genome rearrangements detected by SNP microarrays in individuals with intellectual disability referred with possible Williams syndrome.

    PubMed

    Pani, Ariel M; Hobart, Holly H; Morris, Colleen A; Mervis, Carolyn B; Bray-Ward, Patricia; Kimberley, Kendra W; Rios, Cecilia M; Clark, Robin C; Gulbronson, Maricela D; Gowans, Gordon C; Gregg, Ronald G

    2010-08-31

    Intellectual disability (ID) affects 2-3% of the population and may occur with or without multiple congenital anomalies (MCA) or other medical conditions. Established genetic syndromes and visible chromosome abnormalities account for a substantial percentage of ID diagnoses, although for approximately 50% the molecular etiology is unknown. Individuals with features suggestive of various syndromes but lacking their associated genetic anomalies pose a formidable clinical challenge. With the advent of microarray techniques, submicroscopic genome alterations not associated with known syndromes are emerging as a significant cause of ID and MCA. High-density SNP microarrays were used to determine genome wide copy number in 42 individuals: 7 with confirmed alterations in the WS region but atypical clinical phenotypes, 31 with ID and/or MCA, and 4 controls. One individual from the first group had the most telomeric gene in the WS critical region deleted along with 2 Mb of flanking sequence. A second person had the classic WS deletion and a rearrangement on chromosome 5p within the Cri du Chat syndrome (OMIM:123450) region. Six individuals from the ID/MCA group had large rearrangements (3 deletions, 3 duplications), one of whom had a large inversion associated with a deletion that was not detected by the SNP arrays. Combining SNP microarray analyses and qPCR allowed us to clone and sequence 21 deletion breakpoints in individuals with atypical deletions in the WS region and/or ID or MCA. Comparison of these breakpoints to databases of genomic variation revealed that 52% occurred in regions harboring structural variants in the general population. For two probands the genomic alterations were flanked by segmental duplications, which frequently mediate recurrent genome rearrangements; these may represent new genomic disorders. While SNP arrays and related technologies can identify potentially pathogenic deletions and duplications, obtaining sequence information from the

  14. Light whole genome sequence for SNP discovery across domestic cat breeds

    PubMed Central

    2010-01-01

    Background: The domestic cat has offered enormous genomic potential in the veterinary description of over 250 hereditary disease models as well as the occurrence of several deadly feline viruses (feline leukemia virus, FeLV; feline coronavirus, FECV; feline immunodeficiency virus, FIV) that are homologues of human scourges (cancer, SARS, and AIDS respectively). However, to realize this bio-medical potential, a high density single nucleotide polymorphism (SNP) map is required in order to accomplish disease and phenotype association discovery. Description: To remedy this, we generated 3,178,297 paired fosmid-end Sanger sequence reads from seven cats, and combined these data with the publicly available 2X cat whole genome sequence. All sequence reads were assembled together to form a 3X whole genome assembly allowing the discovery of over three million SNPs. To reduce potential false positive SNPs due to the low coverage assembly, a low upper-limit was placed on sequence coverage and a high lower-limit on the quality of the discrepant bases at a potential variant site. In all, domestic cats of different breeds (female Abyssinian, female American shorthair, male Cornish Rex, female European Burmese, female Persian, female Siamese and male Ragdoll) and a female African wildcat were sequenced lightly. We report a total of 964 k common SNPs suitable for a domestic cat SNP genotyping array and an additional 900 k SNPs detected between African wildcat and domestic cat breeds. An empirical sampling of 94 discovered SNPs was tested in the sequenced cats, resulting in a SNP validation rate of 99%. Conclusions: These data provide a large collection of mapped feline SNPs across the cat genome that will allow for the development of SNP genotyping platforms for mapping feline diseases. PMID:20576142

  15. Light whole genome sequence for SNP discovery across domestic cat breeds.

    PubMed

    Mullikin, James C; Hansen, Nancy F; Shen, Lei; Ebling, Heather; Donahue, William F; Tao, Wei; Saranga, David J; Brand, Adrianne; Rubenfield, Marc J; Young, Alice C; Cruz, Pedro; Driscoll, Carlos; David, Victor; Al-Murrani, Samer W K; Locniskar, Mary F; Abrahamsen, Mitchell S; O'Brien, Stephen J; Smith, Douglas R; Brockman, Jeffrey A

    2010-06-24

    The domestic cat has offered enormous genomic potential in the veterinary description of over 250 hereditary disease models as well as the occurrence of several deadly feline viruses (feline leukemia virus, FeLV; feline coronavirus, FECV; feline immunodeficiency virus, FIV) that are homologues of human scourges (cancer, SARS, and AIDS respectively). However, to realize this bio-medical potential, a high density single nucleotide polymorphism (SNP) map is required in order to accomplish disease and phenotype association discovery. To remedy this, we generated 3,178,297 paired fosmid-end Sanger sequence reads from seven cats, and combined these data with the publicly available 2X cat whole genome sequence. All sequence reads were assembled together to form a 3X whole genome assembly allowing the discovery of over three million SNPs. To reduce potential false positive SNPs due to the low coverage assembly, a low upper-limit was placed on sequence coverage and a high lower-limit on the quality of the discrepant bases at a potential variant site. In all, domestic cats of different breeds (female Abyssinian, female American shorthair, male Cornish Rex, female European Burmese, female Persian, female Siamese and male Ragdoll) and a female African wildcat were sequenced lightly. We report a total of 964 k common SNPs suitable for a domestic cat SNP genotyping array and an additional 900 k SNPs detected between African wildcat and domestic cat breeds. An empirical sampling of 94 discovered SNPs was tested in the sequenced cats, resulting in a SNP validation rate of 99%. These data provide a large collection of mapped feline SNPs across the cat genome that will allow for the development of SNP genotyping platforms for mapping feline diseases.

  16. Report on the development of putative functional SSR and SNP markers in passion fruits.

    PubMed

    da Costa, Zirlane Portugal; Munhoz, Carla de Freitas; Vieira, Maria Lucia Carneiro

    2017-09-06

    Passionflowers Passiflora edulis and Passiflora alata are diploid, outcrossing and understudied fruit bearing species. In Brazil, passion fruit cultivation began relatively recently and has earned the country an outstanding position as the world's top producer of passion fruit. The fruit's main economic value lies in the production of juice, an essential exotic ingredient in juice blends. Currently, crop improvement strategies, including those for underexploited tropical species, tend to incorporate molecular genetic approaches. In this study, we examined a set of P. edulis transcripts expressed in response to infection by Xanthomonas axonopodis (the passion fruit's main bacterial pathogen that attacks the vines), aiming at the development of putative functional markers, i.e. SSRs (simple sequence repeats) and SNPs (single nucleotide polymorphisms). A total of 210 microsatellites were found in 998 sequences, and trinucleotide repeats were found to be the most frequent (31.4%). Of the sequences selected for designing primers, 80.9% could be used to develop SSR markers, and 60.6% SNP markers for P. alata. SNPs were all biallelic and found within 15 gene fragments of P. alata. Overall, gene fragments generated 10,003 bp. SNP frequency was estimated as one SNP every 294 bp. Polymorphism rates revealed by SSR and SNP loci were 29.4 and 53.6%, respectively. Passiflora edulis transcripts were useful for the development of putative functional markers for P. alata, suggesting a certain level of sequence conservation between these cultivated species. The markers developed herein could be used for genetic mapping purposes and also in diversity studies.

  17. Haplotype inference from unphased SNP data in heterozygous polyploids based on SAT

    PubMed Central

    Neigenfind, Jost; Gyetvai, Gabor; Basekow, Rico; Diehl, Svenja; Achenbach, Ute; Gebhardt, Christiane; Selbig, Joachim; Kersten, Birgit

    2008-01-01

    Background: Haplotype inference based on unphased SNP markers is an important task in population genetics. Although there are different approaches to the inference of haplotypes in diploid species, the existing software is not suitable for inferring haplotypes from unphased SNP data in polyploid species, such as the cultivated potato (Solanum tuberosum). Potato species are tetraploid and highly heterozygous. Results: Here we present the software SATlotyper which is able to handle polyploid and polyallelic data. SATlotyper uses the Boolean satisfiability problem to formulate Haplotype Inference by Pure Parsimony. The software excludes existing haplotype inferences, thus allowing for calculation of alternative inferences. As it is not known which of the multiple haplotype inferences are best supported by the given unphased data set, we use a bootstrapping procedure that allows for scoring of alternative inferences. Finally, by means of the bootstrapping scores, it is possible to optimise the phased genotypes belonging to a given haplotype inference. The program is evaluated with simulated and experimental SNP data generated for heterozygous tetraploid populations of potato. We show that, instead of taking the first haplotype inference reported by the program, we can significantly improve the quality of the final result by applying additional methods that include scoring of the alternative haplotype inferences and genotype optimisation. For a sub-population of nineteen individuals, the predicted results computed by SATlotyper were directly compared with results obtained by experimental haplotype inference via sequencing of cloned amplicons. Prediction and experiment gave similar results regarding the inferred haplotypes and phased genotypes. Conclusion: Our results suggest that Haplotype Inference by Pure Parsimony can be solved efficiently by the SAT approach, even for data sets of unphased SNP from heterozygous polyploids. SATlotyper is freeware and is distributed as

  18. High-throughput SNP-genotyping analysis of the relationships among Ponto-Caspian sturgeon species

    PubMed Central

    Rastorguev, Sergey M; Nedoluzhko, Artem V; Mazur, Alexander M; Gruzdeva, Natalia M; Volkov, Alexander A; Barmintseva, Anna E; Mugue, Nikolai S; Prokhortchouk, Egor B

    2013-01-01

    Legally certified sturgeon fisheries require population protection and conservation methods, including DNA tests to identify the source of valuable sturgeon roe. However, the available genetic data are insufficient to distinguish between different sturgeon populations, and are even unable to distinguish between some species. We performed high-throughput single-nucleotide polymorphism (SNP)-genotyping analysis on different populations of Russian (Acipenser gueldenstaedtii), Persian (A. persicus), and Siberian (A. baerii) sturgeon species from the Caspian Sea region (Volga and Ural Rivers), the Azov Sea, and two Siberian rivers. We found that Russian sturgeons from the Volga and Ural Rivers were essentially indistinguishable, but they differed from Russian sturgeons in the Azov Sea, and from Persian and Siberian sturgeons. We identified eight SNPs that were sufficient to distinguish these sturgeon populations with 80% confidence, and allowed the development of markers to distinguish sturgeon species. Finally, on the basis of our SNP data, we propose that the A. baerii-like mitochondrial DNA found in some Russian sturgeons from the Caspian Sea arose via an introgression event during the Pleistocene glaciation. In the present study, the high-throughput genotyping analysis of several sturgeon populations was performed. SNP markers for species identification were defined. The possible explanation of the baerii-like mitotype presence in some Russian sturgeons in the Caspian Sea was suggested. PMID:24567827

  19. Quadruplex-single nucleotide polymorphisms (Quad-SNP) influence gene expression difference among individuals.

    PubMed

    Baral, Aradhita; Kumar, Pankaj; Halder, Rashi; Mani, Prithvi; Yadav, Vinod Kumar; Singh, Ankita; Das, Swapan K; Chowdhury, Shantanu

    2012-05-01

    Non-canonical guanine quadruplex structures are not only predominant but also conserved among bacterial and mammalian promoters. Moreover, recent findings directly implicate quadruplex structures in transcription. These argue for an intrinsic role of the structural motif and thereby posit that single nucleotide polymorphisms (SNP) that compromise the quadruplex architecture could influence function. To test this, we analysed SNPs within quadruplex motifs (Quad-SNP) and gene expression in 270 individuals across four populations (HapMap) representing more than 14,500 genotypes. Findings reveal significant association between quadruplex-SNPs and expression of the corresponding gene in individuals (P < 0.0001). Furthermore, analysis of Quad-SNPs obtained from population-scale sequencing of 1000 human genomes showed relative selection bias against alteration of the structural motif. To directly test the quadruplex-SNP-transcription connection, we constructed a reporter system using the RPS3 promoter: a remarkable difference in promoter activity in the 'quadruplex-destabilized' versus 'quadruplex-intact' promoter was noticed. As a further test, we incorporated a quadruplex motif or its disrupted counterpart within a synthetic promoter reporter construct. The quadruplex motif, and not the disrupted-motif, enhanced transcription in human cell lines of different origin. Together, these findings build direct support for quadruplex-mediated transcription and suggest quadruplex-SNPs may play a significant role in mechanistically understanding variations in gene expression among individuals.

  20. Demographic Trends in Korean Native Cattle Explained Using Bovine SNP50 Beadchip.

    PubMed

    Sharma, Aditi; Lim, Dajeong; Chai, Han-Ha; Choi, Bong-Hwan; Cho, Yongmin

    2016-12-01

    Linkage disequilibrium (LD) is the non-random association between loci, and it can give a preliminary insight into the genetic history of a population. In the present study, LD patterns and effective population size (Ne) of three Korean cattle breeds along with Chinese, Japanese and Mongolian cattle were compared using the bovine Illumina SNP50 panel. The effective population size (Ne) is the number of breeding individuals in a population and is particularly important as it determines the rate at which genetic variation is lost. The genotype data in our study comprised a total of 129 samples, varying from 4 to 39 samples per breed. After quality control there were ~29,000 single nucleotide polymorphisms (SNPs) for which the r² value was calculated. Average distance between SNP pairs was 1.14 Mb across all breeds. Average r² between adjacent SNP pairs ranged from 0.1 for Yanbian to 0.3 for Qinchuan. Effective population size of the breeds based on r² varied from 16 in Hainan to 226 in Yanbian. Amongst the Korean native breeds, effective population size was lowest for Brindle Hanwoo (Ne = 59) and highest for Brown Hanwoo (Ne = 83). The effective population size of the Korean cattle breeds has been decreasing alarmingly over the past generations. We suggest appropriate measures to be taken to prevent the loss of these local breeds in their native tracts.
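
    As a rough illustration of how r² between SNP pairs relates to effective population size, the sketch below computes r² as the squared Pearson correlation of genotype dosages (0/1/2) and converts it to Ne with Sved's approximation E[r²] ≈ 1/(1 + 4·Ne·c), where c is the recombination distance in Morgans. The abstract does not state which estimator the authors used, so both the formula choice and the example inputs are assumptions.

```python
from statistics import mean


def genotype_r2(x, y):
    """Squared Pearson correlation between two SNPs' genotype dosages."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("r2 is undefined for a monomorphic SNP")
    return cov * cov / (var_x * var_y)


def ne_from_r2(r2, c_morgans):
    """Solve E[r2] = 1 / (1 + 4 * Ne * c) for Ne (Sved's approximation)."""
    return (1 / r2 - 1) / (4 * c_morgans)


if __name__ == "__main__":
    # Hypothetical dosages for two SNPs genotyped in six animals.
    snp_a = [0, 1, 2, 1, 0, 2]
    snp_b = [0, 1, 2, 2, 0, 1]
    r2 = genotype_r2(snp_a, snp_b)
    # Hypothetical recombination distance of 0.01 Morgans between the SNPs.
    print(r2, ne_from_r2(r2, 0.01))
```

    Estimates of this kind depend strongly on sample size and on the assumed genetic map, so small breed samples such as those in this study would normally need a sample-size correction that this sketch omits.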

  1. OPRM1 SNP (A118G): Involvement in disease development, treatment response, and animal models

    PubMed Central

    Mague, Stephen D.; Blendy, Julie A.

    2010-01-01

    Endogenous opioids acting at μ-opioid receptors mediate many biological functions. Pharmacological intervention at these receptors has greatly aided in the treatment of acute and chronic pain, in addition to other uses. However, the development of tolerance and dependence has made it difficult to adequately prescribe these therapeutics. A common single nucleotide polymorphism (SNP), A118G, in the μ-opioid receptor gene can affect opioid function and, consequently, has been suggested to contribute to individual variability in pain management and drug addiction. Investigation into the role of A118G in human disease and treatment response has generated a large number of association studies across various disease states as well as physiological responses. However, characterizing the functional consequences of this SNP and establishing if it causes or contributes to disease phenotypes have been significant challenges. In this manuscript, we will review a number of association studies as well as investigations of the functional impact of this gene variant. In addition, we will describe a novel mouse model that was generated to recapitulate this SNP in mice. Evaluation of models that incorporate known human genetic variants into a tractable system, like the mouse, will facilitate the understanding of discrete contributions of SNPs to human disease. PMID:20074870

  2. Comparison of SNP-based detection assays for food analysis: Coffee authentication.

    PubMed

    Spaniolas, Stelios; Bazakos, Christos; Tucker, Gregory A; Bennett, Malcolm J

    2014-01-01

    Recently, DNA-based authentication methods were developed to serve as complementary approaches to analytical chemistry techniques. The single nucleotide polymorphism (SNP)-based reaction chemistries, when combined with the existing detection methods, could result in numerous analytical approaches, all with particular advantages and disadvantages. The dual aim of this study was (a) to develop SNP-based analytical assays such as the single-base primer extension (SNaPshot) and pyrosequencing in order to differentiate Arabica and Robusta varieties for the authentication of coffee beans and (b) to compare the performances of SNaPshot, pyrosequencing and the previously developed polymerase chain reaction-restriction fragment length polymorphism (PCR-RFLP) using an Agilent 2100 Bioanalyzer on the basis of linearity (R²) and LOD, expressed as percentage of the adulterant species, using green coffee beans (Arabica and Robusta) as a food model. The results showed that SNaPshot analysis exhibited the best LOD, whereas pyrosequencing revealed the best linearity (R² = 0.997). The PCR-RFLP assay using the Agilent 2100 Bioanalyzer could prove to be a very useful method for a laboratory that lacks sequencing facilities, but it can be used only if a SNP creates or deletes a restriction site.

  3. Making a chocolate chip: development and evaluation of a 6K SNP array for Theobroma cacao

    PubMed Central

    Livingstone, Donald; Royaert, Stefan; Stack, Conrad; Mockaitis, Keithanne; May, Greg; Farmer, Andrew; Saski, Christopher; Schnell, Ray; Kuhn, David; Motamayor, Juan Carlos

    2015-01-01

    Theobroma cacao, the key ingredient in chocolate production, is one of the world's most important tree fruit crops, with ∼4,000,000 metric tons produced across 50 countries. To move towards gene discovery and marker-assisted breeding in cacao, a single-nucleotide polymorphism (SNP) identification project was undertaken using RNAseq data from 16 diverse cacao cultivars. RNA sequences were aligned to the assembled transcriptome of the cultivar Matina 1-6, and 330,000 SNPs within coding regions were identified. From these SNPs, a subset of 6,000 high-quality SNPs were selected for inclusion on an Illumina Infinium SNP array: the Cacao6kSNP array. Using Cacao6KSNP array data from over 1,000 cacao samples, we demonstrate that our custom array produces a saturated genetic map and can be used to distinguish among even closely related genotypes. Our study enhances and expands the genetic resources available to the cacao research community, and provides the genome-scale set of tools that are critical for advancing breeding with molecular markers in an agricultural species with high genetic diversity. PMID:26070980

  4. Making a chocolate chip: development and evaluation of a 6K SNP array for Theobroma cacao.

    PubMed

    Livingstone, Donald; Royaert, Stefan; Stack, Conrad; Mockaitis, Keithanne; May, Greg; Farmer, Andrew; Saski, Christopher; Schnell, Ray; Kuhn, David; Motamayor, Juan Carlos

    2015-08-01

    Theobroma cacao, the key ingredient in chocolate production, is one of the world's most important tree fruit crops, with ∼4,000,000 metric tons produced across 50 countries. To move towards gene discovery and marker-assisted breeding in cacao, a single-nucleotide polymorphism (SNP) identification project was undertaken using RNAseq data from 16 diverse cacao cultivars. RNA sequences were aligned to the assembled transcriptome of the cultivar Matina 1-6, and 330,000 SNPs within coding regions were identified. From these SNPs, a subset of 6,000 high-quality SNPs were selected for inclusion on an Illumina Infinium SNP array: the Cacao6kSNP array. Using Cacao6KSNP array data from over 1,000 cacao samples, we demonstrate that our custom array produces a saturated genetic map and can be used to distinguish among even closely related genotypes. Our study enhances and expands the genetic resources available to the cacao research community, and provides the genome-scale set of tools that are critical for advancing breeding with molecular markers in an agricultural species with high genetic diversity. © The Author 2015. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

  5. Demographic Trends in Korean Native Cattle Explained Using Bovine SNP50 Beadchip

    PubMed Central

    Sharma, Aditi; Lim, Dajeong; Chai, Han-Ha; Choi, Bong-Hwan; Cho, Yongmin

    2016-01-01

    Linkage disequilibrium (LD) is the non-random association between loci, and it can give a preliminary insight into the genetic history of a population. In the present study, LD patterns and effective population size (Ne) of three Korean cattle breeds along with Chinese, Japanese and Mongolian cattle were compared using the bovine Illumina SNP50 panel. The effective population size (Ne) is the number of breeding individuals in a population and is particularly important as it determines the rate at which genetic variation is lost. The genotype data in our study comprised a total of 129 samples, varying from 4 to 39 samples per breed. After quality control there were ~29,000 single nucleotide polymorphisms (SNPs) for which the r² value was calculated. Average distance between SNP pairs was 1.14 Mb across all breeds. Average r² between adjacent SNP pairs ranged from 0.1 for Yanbian to 0.3 for Qinchuan. Effective population size of the breeds based on r² varied from 16 in Hainan to 226 in Yanbian. Amongst the Korean native breeds, effective population size was lowest for Brindle Hanwoo (Ne = 59) and highest for Brown Hanwoo (Ne = 83). The effective population size of the Korean cattle breeds has been decreasing alarmingly over the past generations. We suggest appropriate measures to be taken to prevent the loss of these local breeds in their native tracts. PMID:28154516

  6. Proper joint analysis of summary association statistics requires the adjustment of heterogeneity in SNP coverage pattern.

    PubMed

    Zhang, Han; Wheeler, William; Song, Lei; Yu, Kai

    2017-07-07

    As meta-analysis results published by consortia of genome-wide association studies (GWASs) become increasingly available, many association summary statistics-based multi-locus tests have been developed to jointly evaluate multiple single-nucleotide polymorphisms (SNPs) to reveal novel genetic architectures of various complex traits. The validity of these approaches relies on the accurate estimate of z-score correlations at considered SNPs, which in turn requires knowledge on the set of SNPs assessed by each study participating in the meta-analysis. However, this exact SNP coverage information is usually unavailable from the meta-analysis results published by GWAS consortia. In the absence of the coverage information, researchers typically estimate the z-score correlations by making oversimplified coverage assumptions. We show through real studies that such a practice can generate highly inflated type I errors, and we demonstrate the proper way to incorporate correct coverage information into multi-locus analyses. We advocate that consortia should make SNP coverage information available when posting their meta-analysis results, and that investigators who develop analytic tools for joint analyses based on summary data should pay attention to the variation in SNP coverage and adjust for it appropriately. Published by Oxford University Press 2017. This work is written by US Government employees and is in the public domain in the US.

  7. Identification and SNP association analysis of a novel gene in chicken.

    PubMed

    Mei, Xingxing; Kang, Xiangtao; Liu, Xiaojun; Jia, Lijuan; Li, Hong; Li, Zhuanjian; Jiang, Ruirui

    2016-02-01

    A novel gene that was predicted to encode a long noncoding RNA (lncRNA) transcript was identified in a previous study that aimed to detect candidate genes related to growth rate differences between Chinese local breed Gushi chickens and Anka broilers. To characterise the biological function of the lncRNA, we cloned and sequenced the complete open reading frame of the gene. We performed quantitative real-time polymerase chain reaction (qPCR) to analyse the expression patterns of the lncRNA in different tissues of chicken at different development stages. The qPCR data showed that the novel lncRNA gene was expressed extensively, with the highest abundance in spleen and lung and the lowest abundance in pectoralis and leg muscle. Additionally, we identified a single nucleotide polymorphism (SNP) at the 5'-end of the gene and studied the association between the SNP and chicken growth traits using data from an F2 resource population of Gushi chickens and Anka broilers. The association analysis showed that the SNP was significantly (P < 0.05) associated with leg muscle weight, chest breadth, sternal length and body weight in chickens at 1 day, 4 weeks and 6 weeks of age. We concluded that the novel lncRNA gene, which we designated pouBW1, may play an important role in regulating chicken growth. © 2015 Stichting International Foundation for Animal Genetics.

  8. SNP Discovery and Development of a High-Density Genotyping Array for Sunflower

    PubMed Central

    Bachlava, Eleni; Taylor, Christopher A.; Tang, Shunxue; Bowers, John E.; Mandel, Jennifer R.; Burke, John M.; Knapp, Steven J.

    2012-01-01

    Recent advances in next-generation DNA sequencing technologies have made possible the development of high-throughput SNP genotyping platforms that allow for the simultaneous interrogation of thousands of single-nucleotide polymorphisms (SNPs). Such resources have the potential to facilitate the rapid development of high-density genetic maps, and to enable genome-wide association studies as well as molecular breeding approaches in a variety of taxa. Herein, we describe the development of a SNP genotyping resource for use in sunflower (Helianthus annuus L.). This work involved the development of a reference transcriptome assembly for sunflower, the discovery of thousands of high quality SNPs based on the generation and analysis of ca. 6 Gb of transcriptome re-sequencing data derived from multiple genotypes, the selection of 10,640 SNPs for inclusion in the genotyping array, and the use of the resulting array to screen a diverse panel of sunflower accessions as well as related wild species. The results of this work revealed a high frequency of polymorphic SNPs and relatively high level of cross-species transferability. Indeed, greater than 95% of successful SNP assays revealed polymorphism, and more than 90% of these assays could be successfully transferred to related wild species. Analysis of the polymorphism data revealed patterns of genetic differentiation that were largely congruent with the evolutionary history of sunflower, though the large number of markers allowed for finer resolution than has previously been possible. PMID:22238659

  9. SNP typing reveals similarity in Mycobacterium tuberculosis genetic diversity between Portugal and Northeast Brazil.

    PubMed

    Lopes, Joao S; Marques, Isabel; Soares, Patricia; Nebenzahl-Guimaraes, Hanna; Costa, Joao; Miranda, Anabela; Duarte, Raquel; Alves, Adriana; Macedo, Rita; Duarte, Tonya A; Barbosa, Theolis; Oliveira, Martha; Nery, Joilda S; Boechat, Neio; Pereira, Susan M; Barreto, Mauricio L; Pereira-Leal, Jose; Gomes, Maria Gabriela Miranda; Penha-Goncalves, Carlos

    2013-08-01

    Human tuberculosis is an infectious disease caused by bacteria from the Mycobacterium tuberculosis complex (MTBC). Although spoligotyping and MIRU-VNTR are standard methodologies in MTBC genetic epidemiology, recent studies suggest that single nucleotide polymorphisms (SNPs) are advantageous in phylogenetics and in the identification of strain groups/lineages. In this work we use a set of 79 SNPs to characterize 1987 MTBC isolates from Portugal and 141 from Northeast Brazil. All Brazilian samples were further characterized using spoligotyping. Phylogenetic analysis against a reference set revealed that about 95% of the isolates in both populations are singly attributed to bacterial lineage 4. Within this lineage, the most frequent strain groups in both Portugal and Brazil are LAM, followed by Haarlem and X. Contrary to these groups, strain group T showed a very different prevalence between Portugal (10%) and Brazil (1.5%). Spoligotype identification shows about 10% mismatches compared to the use of SNPs, and a little more than 1% of strains could not be identified. The mismatches are observed in the most represented groups of our sample set (i.e., LAM and Haarlem) in almost the same proportion. Besides being more accurate in identifying strain groups/lineages, SNP-typing can also provide phylogenetic relationships between strain groups/lineages and, thus, indicate cases showing phylogenetic incongruence. Overall, the use of SNP-typing revealed striking similarities between MTBC populations from Portugal and Brazil.

  10. SNP genotyping in melons: genetic variation, population structure, and linkage disequilibrium.

    PubMed

    Esteras, Cristina; Formisano, Gelsomina; Roig, Cristina; Díaz, Aurora; Blanca, José; Garcia-Mas, Jordi; Gómez-Guillamón, María Luisa; López-Sesé, Ana Isabel; Lázaro, Almudena; Monforte, Antonio J; Picó, Belén

    2013-05-01

    Novel sequencing technologies were recently used to generate sequences from multiple melon (Cucumis melo L.) genotypes, enabling the in silico identification of large single nucleotide polymorphism (SNP) collections. In order to optimize the use of these markers, SNP validation an