Science.gov

Sample records for acat-1 rs1044925 snp

  1. Polymorphism of rs1044925 in the acyl-CoA:cholesterol acyltransferase-1 gene and serum lipid levels in the Guangxi Bai Ku Yao and Han populations

    PubMed Central

    2010-01-01

    Background: The association of rs1044925 polymorphism in the acyl-CoA:cholesterol acyltransferase-1 (ACAT-1) gene and serum lipid profiles is not well known in different ethnic groups. Bai Ku Yao is a special subgroup of the Yao minority in China. The present study was carried out to clarify the association of rs1044925 polymorphism in the ACAT-1 gene and several environmental factors with serum lipid levels in the Guangxi Bai Ku Yao and Han populations. Methods: A total of 626 subjects of Bai Ku Yao and 624 participants of Han Chinese were randomly selected from our previous stratified randomized cluster samples. Genotyping of rs1044925 polymorphism in the ACAT-1 gene was performed by polymerase chain reaction and restriction fragment length polymorphism combined with gel electrophoresis, and then confirmed by direct sequencing. Results: The levels of serum total cholesterol (TC), high-density lipoprotein cholesterol (HDL-C), apolipoprotein (Apo) AI and ApoB were lower in Bai Ku Yao than in Han (P < 0.01 for all). The frequencies of the A and C alleles were 79.0% and 21.0% in Bai Ku Yao, and 87.3% and 12.7% in Han, respectively (P < 0.001). The frequencies of the AA, AC and CC genotypes were 63.2%, 31.4% and 5.2% in Bai Ku Yao, and 75.6%, 23.2% and 1.1% in Han, respectively (P < 0.001). The levels of TC, LDL-C and ApoB in Bai Ku Yao but not in Han were different between the AA and AC/CC genotypes in females but not in males (P < 0.05 for all). The C allele carriers had lower serum TC, LDL-C and ApoB levels as compared with the C allele noncarriers. The levels of TC, LDL-C and ApoB in Bai Ku Yao but not in Han were correlated with genotypes in females but not in males (P < 0.05 for all). Serum lipid parameters were also correlated with sex, age, body mass index, alcohol consumption, and blood pressure in both ethnic groups (P < 0.05-0.001). Conclusions: These results suggest that the polymorphism of rs1044925 in the ACAT-1 gene is mainly associated with female serum TC, LDL-C and
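
    The allele frequencies quoted in this record can be cross-checked against its genotype frequencies, since each AC heterozygote contributes one A and one C allele. The following is a minimal sketch of that arithmetic, not the authors' analysis; the function and variable names are illustrative assumptions, and the division by the genotype total only compensates for the reported percentages not summing to exactly 100.

```python
# Genotype frequencies (percent) as reported in the abstract above.
GENOTYPES = {
    "Bai Ku Yao": {"AA": 63.2, "AC": 31.4, "CC": 5.2},
    "Han": {"AA": 75.6, "AC": 23.2, "CC": 1.1},
}


def allele_frequencies(genotypes):
    """Return (A, C) allele frequencies in percent from genotype percentages.

    Homozygotes carry two copies of one allele; heterozygotes carry one of each,
    so half of the AC frequency is credited to each allele.
    """
    total = sum(genotypes.values())  # may differ slightly from 100 (rounding)
    a = (genotypes["AA"] + genotypes["AC"] / 2) / total * 100
    c = (genotypes["CC"] + genotypes["AC"] / 2) / total * 100
    return a, c


for population, genotypes in GENOTYPES.items():
    a, c = allele_frequencies(genotypes)
    print(f"{population}: A = {a:.1f}%, C = {c:.1f}%")
```

    Under these assumptions the derived values agree with the reported allele frequencies to within rounding.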

  2. Knockdown of ACAT-1 reduces amyloidogenic processing of APP.

    PubMed

    Huttunen, Henri J; Greco, Christopher; Kovacs, Dora M

    2007-04-17

    Previous studies have shown that acyl-coenzyme A:cholesterol acyl transferase (ACAT), an enzyme that controls cellular equilibrium between free cholesterol and cholesteryl esters, modulates proteolytic processing of APP in cell-based and animal models of Alzheimer's disease. Here we report that ACAT-1 RNAi reduced cellular ACAT-1 protein by approximately 50% and cholesteryl ester levels by 22% while causing a slight increase in the free cholesterol content of ER membranes. This correlated with reduced proteolytic processing of APP and 40% decrease in Abeta secretion. These data show that even a modest decrease in ACAT activity can have robust suppressive effects on Abeta generation.

  3. Expression of ACAT-1 protein in human atherosclerotic lesions and cultured human monocytes-macrophages.

    PubMed

    Miyazaki, A; Sakashita, N; Lee, O; Takahashi, K; Horiuchi, S; Hakamata, H; Morganelli, P M; Chang, C C; Chang, T Y

    1998-10-01

    The acyl coenzyme A:cholesterol acyltransferase (ACAT) gene was first cloned in 1993 (Chang et al, J Biol Chem. 1993;268:20747-20755; designated ACAT-1). Using affinity-purified antibodies raised against the N-terminal portion of human ACAT-1 protein, we performed immunohistochemical localization studies and showed that the ACAT-1 protein was highly expressed in atherosclerotic lesions of the human aorta. We also performed cell-specific localization studies using double immunostaining and showed that ACAT-1 was predominantly expressed in macrophages but not in smooth muscle cells. We then used a cell culture system in vitro to monitor the ACAT-1 expression in differentiating monocytes-macrophages. The ACAT-1 protein content increased by up to 10-fold when monocytes spontaneously differentiated into macrophages. This increase occurred within the first 2 days of culturing the monocytes and reached a plateau level within 4 days of culturing, indicating that the increase in ACAT-1 protein content is an early event during the monocyte differentiation process. The ACAT-1 protein expressed in the differentiating monocytes-macrophages was shown to be active by enzyme assay in vitro. The high levels of ACAT-1 present in macrophages maintained in culture can explain the high ACAT-1 contents found in atherosclerotic lesions. Our results thus support the idea that ACAT-1 plays an important role in differentiating monocytes and in forming macrophage foam cells during the development of human atherosclerosis.

  4. Immunological quantitation and localization of ACAT-1 and ACAT-2 in human liver and small intestine.

    PubMed

    Chang, C C; Sakashita, N; Ornvold, K; Lee, O; Chang, E T; Dong, R; Lin, S; Lee, C Y; Strom, S C; Kashyap, R; Fung, J J; Farese, R V; Patoiseau, J F; Delhon, A; Chang, T Y

    2000-09-08

    By using specific anti-ACAT-1 antibodies in immunodepletion studies, we previously found that ACAT-1, a 50-kDa protein, plays a major catalytic role in the adult human liver, adrenal glands, macrophages, and kidneys but not in the intestine. Acyl-coenzyme A:cholesterol acyltransferase (ACAT) activity in the intestine may be largely derived from a different ACAT protein. To test this hypothesis, we produced specific polyclonal anti-ACAT-2 antibodies that quantitatively immunodepleted human ACAT-2, a 46-kDa protein expressed in Chinese hamster ovary cells. In hepatocyte-like HepG2 cells, ACAT-1 comprises 85-90% of the total ACAT activity, with the remainder attributed to ACAT-2. In adult intestines, most of the ACAT activity can be immunodepleted by anti-ACAT-2. ACAT-1 and ACAT-2 do not form hetero-oligomeric complexes. In differentiating intestinal enterocyte-like Caco-2 cells, ACAT-2 protein content increases by 5-10-fold in 6 days, whereas ACAT-1 protein content remains relatively constant. In the small intestine, ACAT-2 is concentrated at the apices of the villi, whereas ACAT-1 is uniformly distributed along the villus-crypt axis. In the human liver, ACAT-1 is present in both fetal and adult hepatocytes. In contrast, ACAT-2 is evident in fetal but not adult hepatocytes. Our results collectively suggest that in humans, ACAT-2 performs significant catalytic roles in the fetal liver and in intestinal enterocytes.

  5. Acat1 knockdown gene therapy decreases amyloid-β in a mouse model of Alzheimer's disease.

    PubMed

    Murphy, Stephanie R; Chang, Catherine Cy; Dogbevia, Godwin; Bryleva, Elena Y; Bowen, Zachary; Hasan, Mazahir T; Chang, Ta-Yuan

    2013-08-01

    Both genetic inactivation and pharmacological inhibition of the cholesteryl ester synthetic enzyme acyl-CoA:cholesterol acyltransferase 1 (ACAT1) have shown benefit in mouse models of Alzheimer's disease (AD). In this study, we aimed to test the potential therapeutic applications of adeno-associated virus (AAV)-mediated Acat1 gene knockdown in AD mice. We constructed recombinant AAVs expressing artificial microRNA (miRNA) sequences, which targeted Acat1 for knockdown. We demonstrated that our AAVs could infect cultured mouse neurons and glia and effectively knock down ACAT activity in vitro. We next delivered the AAVs to mouse brains neurosurgically, and demonstrated that Acat1-targeting AAVs could express viral proteins and effectively diminish ACAT activity in vivo, without inducing appreciable inflammation. We delivered the AAVs to the brains of 10-month-old AD mice and analyzed the effects on the AD phenotype at 12 months of age. Acat1-targeting AAV delivered to the brains of AD mice decreased the levels of brain amyloid-β and full-length human amyloid precursor protein (hAPP), to levels similar to complete genetic ablation of Acat1. This study provides support for the potential therapeutic use of Acat1 knockdown gene therapy in AD.

  6. Absence of ACAT-1 attenuates atherosclerosis but causes dry eye and cutaneous xanthomatosis in mice with congenital hyperlipidemia.

    PubMed

    Yagyu, H; Kitamine, T; Osuga, J; Tozawa, R; Chen, Z; Kaji, Y; Oka, T; Perrey, S; Tamura, Y; Ohashi, K; Okazaki, H; Yahagi, N; Shionoiri, F; Iizuka, Y; Harada, K; Shimano, H; Yamashita, H; Gotoda, T; Yamada, N; Ishibashi, S

    2000-07-14

    Acyl-CoA:cholesterol acyltransferase (ACAT) catalyzes esterification of cellular cholesterol. To investigate the role of ACAT-1 in atherosclerosis, we have generated ACAT-1 null (ACAT-1-/-) mice. ACAT activities were present in the liver and intestine but were completely absent in adrenal, testes, ovaries, and peritoneal macrophages in our ACAT-1-/- mice. The ACAT-1-/- mice had decreased openings of the eyes because of atrophy of the meibomian glands, a modified form of sebaceous glands normally expressing high ACAT activities. This phenotype is similar to dry eye syndrome in humans. To determine the role of ACAT-1 in atherogenesis, we crossed the ACAT-1-/- mice with mice lacking apolipoprotein (apo) E or the low density lipoprotein receptor (LDLR), hyperlipidemic models susceptible to atherosclerosis. High fat feeding resulted in extensive cutaneous xanthomatosis with loss of hair in both ACAT-1-/-:apo E-/- and ACAT-1-/-:LDLR-/- mice. Free cholesterol content was significantly increased in their skin. Aortic fatty streak lesion size as well as cholesteryl ester content were moderately reduced in both double mutant mice compared with their respective controls. These results indicate that the local inhibition of ACAT activity in tissue macrophages is protective against cholesteryl ester accumulation but causes cutaneous xanthomatosis in mice that lack apo E or LDLR.

  7. Localization of human acyl-coenzyme A: cholesterol acyltransferase-1 (ACAT-1) in macrophages and in various tissues.

    PubMed

    Sakashita, N; Miyazaki, A; Takeya, M; Horiuchi, S; Chang, C C; Chang, T Y; Takahashi, K

    2000-01-01

    To investigate the distribution of acyl-coenzyme A:cholesterol acyltransferase-1 (ACAT-1) in various human tissues, we examined tissues of autopsy cases immunohistochemically. ACAT-1 was demonstrated in macrophages, antigen-presenting cells, steroid hormone-producing cells, neurons, cardiomyocytes, smooth muscle cells, mesothelial cells, epithelial cells of the urinary tracts, thyroid follicles, renal tubules, pituitary, prostatic, and bronchial glands, alveolar and intestinal epithelial cells, pancreatic acinar cells, and hepatocytes. These findings showed that ACAT-1 is present in a variety of human tissues examined. The immunoreactivities are particularly prominent in the macrophages, steroid hormone-producing cells, followed by hepatocytes, and intestinal epithelia. In cultured human macrophages, immunoelectron microscopy revealed that ACAT-1 was located mainly in the tubular rough endoplasmic reticulum; immunoblot analysis showed that the ACAT-1 protein content did not change with or without cholesterol loading; however, on cholesterol loading, about 30 to 40% of the total immunoreactivity appeared in small-sized vesicles. These vesicles were also enriched in 78-kd glucose-regulated protein (GRP 78), a specific marker for the endoplasmic reticulum. Immunofluorescent microscopy demonstrated extensive colocalization of ACAT-1 and GRP 78 signals in both the tubular and vesicular endoplasmic reticulum before and after cholesterol loading. These results raise the possibility that foam cell formation may activate an endoplasmic reticulum vesiculation process, producing vesicles enriched in the ACAT-1 protein.

  8. Identification of ACAT1- and ACAT2-specific inhibitors using a novel, cell-based fluorescence assay: individual ACAT uniqueness.

    PubMed

    Lada, Aaron T; Davis, Matthew; Kent, Carol; Chapman, James; Tomoda, Hiroshi; Omura, Satoshi; Rudel, Lawrence L

    2004-02-01

    Acyl CoA:cholesterol acyltransferase 1 (ACAT1) and ACAT2 are enzymes responsible for the formation of cholesteryl esters in tissues. While both ACAT1 and ACAT2 are present in the liver and intestine, the cells containing either enzyme within these tissues are distinct, suggesting that ACAT1 and ACAT2 have separate functions. In this study, NBD-cholesterol was used to screen for specific inhibitors of ACAT1 and ACAT2. Incubation of AC29 cells, which do not contain ACAT activity, with NBD-cholesterol showed weak fluorescence when the compound was localized in the membrane. When AC29 cells stably transfected with either ACAT1 or ACAT2 were incubated with NBD-cholesterol, the fluorescent signal localized to the nonpolar core of cytoplasmic lipid droplets was strongly fluorescent and was correlated with two independent measures of ACAT activity. Several compounds were found to have greater inhibitory activity toward ACAT1 than ACAT2, and one compound was identified that specifically inhibits ACAT2. The demonstration of selective inhibition of ACAT1 and ACAT2 provides evidence for uniqueness in structure and function of these two enzymes. To the extent that ACAT2 is confined to hepatocytes and enterocytes, the only two cell types that secrete lipoproteins, selective inhibition of ACAT2 may prove to be most beneficial in the reduction of plasma lipoprotein cholesterol concentrations.

  9. ACAT1 deletion in murine macrophages associated with cytotoxicity and decreased expression of collagen type 3A1

    SciTech Connect

    Rodriguez, Annabelle; Ashen, M. Dominique; Chen, Edward S.

    2005-05-27

    In contrast to some published studies of murine macrophages, we previously showed that ACAT inhibitors appeared to be anti-atherogenic in primary human macrophages in that they decreased foam cell formation without inducing cytotoxicity. Herein, we examined foam cell formation and cytotoxicity in murine ACAT1 knockout (KO) macrophages in an attempt to resolve the discrepancies. Elicited peritoneal macrophages from normal C57BL6 and ACAT1 KO mice were incubated with DMEM containing acetylated LDL (acLDL, 100 μg protein/ml) for 48 h. Cells became cholesterol enriched and there were no differences in the total cholesterol mass. Esterified cholesterol mass was lower in ACAT1 KO foam cells compared to normal macrophages (p < 0.04). Cytotoxicity, as measured by the cellular release of [¹⁴C]adenine from macrophages, was approximately 2-fold greater in ACAT1 KO macrophages as compared to normal macrophages (p < 0.0001), and this was independent of cholesterol enrichment. cDNA microarray analysis showed that ACAT1 KO macrophages expressed substantially less collagen type 3A1 (26-fold), which was confirmed by RT-PCR. Total collagen content was also significantly reduced (57%) in lung homogenates isolated from ACAT1 KO mice (p < 0.02). Thus, ACAT1 KO macrophages show biochemical changes consistent with increased cytotoxicity and also a novel association with decreased expression of collagen type 3A1.

  10. A novel technical approach for the measurement of individual ACAT-1 and ACAT-2 enzymatic activity in the testis.

    PubMed

    Chen, Li; Lafond, Julie; Pelletier, R-Marc

    2009-01-01

    Acyl-coenzyme A:cholesterol acyltransferase (ACAT) is implicated in the esterification of cholesterol when the latter is present at concentrations exceeding metabolic demands. Thus, ACAT contributes to the maintenance of cholesterol homeostasis, which in the testis is essential for the production of fertile gametes. However, the role of the individual isoforms of the enzyme in the maintenance of cholesterol homeostasis in the gonads has not been addressed yet because approaches to measure the enzymatic activity of each isoform were lacking. Here, we used the selective ACAT-1 inhibitor, K-604, to measure the individual enzymatic activity of ACAT-1 and ACAT-2 in enriched fractions of mouse seminiferous tubules. K-604 inhibited adult mouse ACAT-1 much more than ACAT-2 with IC50 values of 100 and 1,000 μM, respectively, in the tubules. Next, the inhibitor concentration (100 μM) that inhibits the activity of ACAT-1 but not the activity of ACAT-2 was determined and applied to measure ACAT-1 and ACAT-2 enzymatic activities in mouse seminiferous tubule-enriched fractions. ACAT-2 activity reached 2173 CPMB/200 μg protein, while ACAT-1 enzymatic activity was 713 CPMB/200 μg protein in the tubules. We also compared the effect of another inhibitor, Manassantin B, with K-604. Increasing the concentration (0-1,000 μM) of Manassantin B resulted in the inhibition of the activity of both ACAT-1 and ACAT-2. The results show that only K-604 is a useful tool to determine the individual ACAT-1 and ACAT-2 enzymatic activities in the seminiferous tubules.

  11. Compared with Acyl-CoA:cholesterol O-acyltransferase (ACAT) 1 and lecithin:cholesterol acyltransferase, ACAT2 displays the greatest capacity to differentiate cholesterol from sitosterol.

    PubMed

    Temel, Ryan E; Gebre, Abraham K; Parks, John S; Rudel, Lawrence L

    2003-11-28

    The capacity of acyl-CoA:cholesterol O-acyltransferase (ACAT) 2 to differentiate cholesterol from the plant sterol, sitosterol, was compared with that of the sterol esterifying enzymes, ACAT1 and lecithin:cholesterol acyltransferase (LCAT). Cholesterol-loaded microsomes from transfected cells containing either ACAT1 or ACAT2 exhibited significantly more ACAT activity than their sitosterol-loaded counterparts. In sitosterol-loaded microsomes, both ACAT1 and ACAT2 were able to esterify sitosterol albeit with lower efficiencies than cholesterol. The mass ratios of cholesterol ester to sitosterol ester formed by ACAT1 and ACAT2 were 1.6 and 7.2, respectively. Compared with ACAT1, ACAT2 selectively esterified cholesterol even when sitosterol was loaded into the microsomes. To further characterize the difference in sterol specificity, ACAT1 and ACAT2 were compared in intact cells loaded with either cholesterol or sitosterol. Despite a lower level of ACAT activity, the ACAT1-expressing cells esterified 4-fold more sitosterol than the ACAT2 cells. The data showed that compared with ACAT1, ACAT2 displayed significantly greater selectivity for cholesterol compared with sitosterol. The plasma cholesterol esterification enzyme lecithin:cholesterol acyltransferase was also compared. With recombinant high density lipoprotein particles, the esterification rate of cholesterol by LCAT was only 15% greater than for sitosterol. Thus, LCAT was able to efficiently esterify both cholesterol and sitosterol. In contrast, ACAT2 demonstrated a strong preference for cholesterol rather than sitosterol. This sterol selectivity by ACAT2 may reflect a role in the sorting of dietary sterols during their absorption by the intestine in vivo.

  12. Effect of valsartan on ACAT-1 and PPAR-γ expression in intima with carotid artery endothelial balloon injury in rabbit

    PubMed Central

    Ma, Tao; Ma, Zhi-Qiang; Du, Xiao-Hui; Yu, Qiu-Shi; Wang, Rong; Liu, Li

    2015-01-01

    Objective: To study the effect of valsartan on ACAT-1 and PPAR-γ expression during intimal hyperplasia after vascular endothelial balloon injury. Methods: Twenty-four male New Zealand white rabbits were randomly divided into three groups of 8. Control group: rabbits were fed a normal diet; balloon injury group: rabbits were fed rabbit feed containing 0.5% cholesterol and 5% lard; balloon injury + valsartan group: rabbits were fed the same 0.5% cholesterol, 5% lard feed plus valsartan 10 mg/(kg·d) by gavage. RT-PCR and Western blotting were used to detect carotid ACAT-1 and PPAR-γ mRNA and protein expression after 8 weeks of feeding. Results: In the balloon injury group, vascular smooth muscle cell (VSMC) proliferation and intimal hyperplasia were significantly increased 14 d after endothelial injury. In the valsartan group at 14 days, VSMC proliferation and intimal hyperplasia were milder than in the balloon injury group. Compared with the control group, ACAT-1 and PPAR-γ mRNA and protein were significantly increased in the balloon injury and valsartan groups (P < 0.01 or P < 0.05); ACAT-1 mRNA and protein expression was significantly lower in the valsartan group than in the balloon injury group (P < 0.01 or P < 0.05). PPAR-γ mRNA and protein expression was significantly higher in the valsartan group than in the balloon injury group (P < 0.05). ACAT-1 and PPAR-γ mRNA expression levels were negatively correlated in the balloon injury and valsartan groups (P < 0.05). Conclusion: ACAT-1 and PPAR-γ mRNA and protein expression increased significantly during intimal hyperplasia after vascular endothelial balloon injury. The suppression of intimal hyperplasia by valsartan correlated with down-regulation of ACAT-1 and up-regulation of PPAR-γ. PMID:26131133

  13. Human ACAT-1 and ACAT-2 inhibitory activities of pentacyclic triterpenes from the leaves of Lycopus lucidus TURCZ.

    PubMed

    Lee, Woo Song; Im, Kyung-Ran; Park, Yong-Dae; Sung, Nack-Do; Jeong, Tae-Sook

    2006-02-01

    Acyl-CoA:cholesterol acyltransferase (ACAT) catalyzes the acylation of cholesterol to cholesteryl ester with long chain fatty acids, and ACAT inhibition is a useful strategy for treating hypercholesterolemia or atherosclerosis. Pentacyclic triterpenes, ursolic acid (1), oleanolic acid (2), and betulinic acid (3), were isolated from the methanol extracts of the leaves of Lycopus lucidus TURCZ. by bioassay-guided fractionation. The structures of compounds 1-3 were elucidated by their spectroscopic data analysis. Among them, betulinic acid (3) exhibited the most potent human ACAT-1 and ACAT-2 inhibitory activities, with IC50 values of 16.2 ± 0.6 and 28.8 ± 1.3 μM, respectively.

  14. Human ACAT-1 and -2 inhibitory activities of saucerneol B, manassantin A and B isolated from Saururus chinensis.

    PubMed

    Lee, Woo Song; Lee, Dae-Woo; Baek, Young-Il; An, Sojin; An, So-Jin; Cho, Kyung-Hyun; Choi, Yang-Kyu; Kim, Hyoung-Chin; Park, Ho-Yong; Bae, Ki-Hwan; Jeong, Tae-Sook

    2004-06-21

    The sesquineolignan, saucerneol B (1), and dineolignans, manassantin A (2), and manassantin B (3), were isolated from the methanol extracts of Saururus chinensis root and elucidated by their spectroscopic data analysis. Compounds 1-3 inhibited hACAT-1 and hACAT-2 with IC50 values of 43.0 and 124.0 μM for 1, of 39.0 and 8.0 μM for 2, of 82.0 μM and only 32% inhibition at 1 mM for 3, respectively. The EtOAc-soluble fraction, which contained compounds 1-3, of methanol extracts of S. chinensis exhibited strong cholesterol-lowering effect in high cholesterol-fed mice.

  15. A selective ACAT-1 inhibitor, K-604, stimulates collagen production in cultured smooth muscle cells and alters plaque phenotype in apolipoprotein E-knockout mice.

    PubMed

    Yoshinaka, Yasunobu; Shibata, Haruki; Kobayashi, Hideyuki; Kuriyama, Hiroki; Shibuya, Kimiyuki; Tanabe, Sohei; Watanabe, Takuya; Miyazaki, Akira

    2010-11-01

    Acyl-coenzyme A:cholesterol O-acyltransferase-1 (ACAT-1) plays an essential role in macrophage foam cell formation and progression of atherosclerosis. We developed a potent and selective ACAT-1 inhibitor, K-604, and tested its effects in apoE-knockout mice. Administration of K-604 to 8-week-old apoE-knockout mice for 12 weeks at a dose of 60 mg/kg/day significantly reduced macrophage-positive area and increased collagen-positive area in atherosclerotic plaques in the aorta without affecting plasma cholesterol levels or lesion areas, indicating direct plaque-modulating effects of K-604 on vascular walls independent of plasma cholesterol levels. Pactimibe, a nonselective inhibitor of ACAT-1 and ACAT-2, reduced plasma cholesterol levels but did not affect macrophage- or collagen-positive areas. The size of macrophages and cholesteryl ester contents in the aorta were reduced by K-604. Exposure of cultured human aortic smooth muscle cells to K-604 resulted in increased procollagen type 1 contents in the culture supernatant and increased procollagen type 1 mRNA levels. Procollagen production was unaffected by pactimibe even at a concentration that inhibited cholesterol esterification to the basal level. Thus, the plaque-modulating effects of K-604 can be explained by stimulation of procollagen production independent of ACAT inhibition in addition to potent inhibition of macrophage ACAT-1.

  16. Recombinant acyl-CoA:cholesterol acyltransferase-1 (ACAT-1) purified to essential homogeneity utilizes cholesterol in mixed micelles or in vesicles in a highly cooperative manner.

    PubMed

    Chang, C C; Lee, C Y; Chang, E T; Cruz, J C; Levesque, M C; Chang, T Y

    1998-12-25

    Acyl-coenzyme A:cholesterol acyltransferase (ACAT) is an integral membrane protein located in the endoplasmic reticulum. It catalyzes the formation of cholesteryl esters from cholesterol and long-chain fatty acyl coenzyme A. The first gene encoding the enzyme, designated as ACAT-1, was identified in 1993 through an expression cloning approach. We isolated a Chinese hamster ovary cell line that stably expresses the recombinant human ACAT-1 protein bearing an N-terminal hexahistidine tag. We purified this enzyme approximately 7000-fold from crude cell extracts by first solubilizing the cell membranes with the zwitterionic detergent 3-[(3-cholamidopropyl)dimethylammonio]-1-propanesulfonate, then proceeding with an ACAT-1 monoclonal antibody affinity column and an immobilized metal affinity column. The final preparation is enzymologically active and migrates as a single band at 54 kDa on SDS-polyacrylamide gel electrophoresis. Pure ACAT-1 dispersed in mixed micelles containing sodium taurocholate, phosphatidylcholine, and cholesterol remains catalytically active. The cholesterol substrate saturation curves of the enzyme assayed either in mixed micelles or in reconstituted vesicles are both highly sigmoidal. The oleoyl-coenzyme A substrate saturation curves of the enzyme assayed under the same conditions are both hyperbolic. These results support the hypothesis that ACAT is an allosteric enzyme regulated by cholesterol.
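
    The contrast drawn in this record between sigmoidal and hyperbolic saturation curves can be illustrated with the Hill equation, v = Vmax·S^n / (K^n + S^n), which reduces to the hyperbolic Michaelis-Menten form when n = 1 and becomes sigmoidal when n > 1. The sketch below is illustrative only: the abstract reports no kinetic constants, so Vmax, the half-saturation constant and the Hill coefficient used here are hypothetical placeholder values, and the function name is an assumption.

```python
def hill_rate(s, vmax, k_half, n):
    """Hill equation: rate at substrate concentration s.

    n = 1 gives a hyperbolic (Michaelis-Menten) curve;
    n > 1 gives a sigmoidal (cooperative) curve.
    """
    return vmax * s**n / (k_half**n + s**n)


# Hypothetical parameters, chosen only to show the shape of the two curves.
VMAX = 1.0
K_HALF = 1.0
N_COOPERATIVE = 3  # placeholder Hill coefficient, not a measured value

for s in (0.25, 0.5, 1.0, 2.0, 4.0):
    hyperbolic = hill_rate(s, VMAX, K_HALF, 1)
    sigmoidal = hill_rate(s, VMAX, K_HALF, N_COOPERATIVE)
    print(f"S = {s:4.2f}  hyperbolic = {hyperbolic:.3f}  sigmoidal = {sigmoidal:.3f}")
```

    Both curves pass through half of Vmax at S = K, but the sigmoidal curve stays near zero at low substrate and rises steeply around K, which is the behavior typically read as cooperativity with respect to that substrate.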

  17. Mass-production of human ACAT-1 and ACAT-2 to screen isoform-specific inhibitor: a different substrate specificity and inhibitory regulation.

    PubMed

    Cho, Kyung-Hyun; An, Sojin; Lee, Woo-Song; Paik, Young-Ki; Kim, Young-Kook; Jeong, Tae-Sook

    2003-10-03

    Recently, acyl-CoA:cholesterol acyltransferase was found to be present as two isoforms, ACAT-1 and ACAT-2, in mammalian tissues with different metabolic functions and tissue-specific locations. In this study, the isoforms were mass-produced individually from insect cells to establish a more sensitive and reliable screening method for specific inhibitors against each isoform. The expressed hACAT-1 and hACAT-2 appeared as a 50 kDa- and a 46 kDa-band on SDS-PAGE, respectively, from Hi5 cells and they preferred to exist in oligomeric form, from dimer to tetramer, during the purification process. They also exhibited an approximate 3.4 to 3.7-fold increase in activities when compared to rat liver microsomal fractions at the same protein concentration. Known ACAT inhibitors, pyripyropene A, oleic acid anilide, and diethyl pyrocarbonate, were tested to evaluate the inhibitory specificity and sensitivity of the expressed enzymes. Interestingly, pyripyropene A inhibited only the hACAT-2 fraction with IC50 = 0.64 μM but not the hACAT-1 fraction; whereas the fatty acid anilide did not show a significant difference in inhibitory activity with either hACAT-1 or hACAT-2. Furthermore, cholesterol was more rapidly utilized by hACAT-1, but hACAT-2 esterified other cholic acid derivatives more efficiently. These results suggest that the specificity of each substrate and inhibitor was highly different, depending on each isoform from the viewpoint of the regulatory site and the substrate binding site location.

  18. Immunodepletion experiments suggest that acyl-coenzyme A:cholesterol acyltransferase-1 (ACAT-1) protein plays a major catalytic role in adult human liver, adrenal gland, macrophages, and kidney, but not in intestines.

    PubMed

    Lee, O; Chang, C C; Lee, W; Chang, T Y

    1998-08-01

    The first acyl-coenzyme A:cholesterol acyltransferase (ACAT) cDNA cloned and expressed in 1993 is designated as ACAT-1. In various human tissue homogenates, ACAT-1 protein is effectively solubilized with retention of enzymatic activity by the detergent CHAPS along with high salt. After using anti-ACAT-1 antibodies to quantitatively remove ACAT-1 protein from the solubilized enzyme, measuring the residual ACAT activity remaining in the immunodepleted supernatants allows us to assess the functional significance of ACAT-1 protein in various human tissues. The results showed that ACAT activity was immunodepleted 90% in liver (83% in hepatocytes), 98% in adrenal gland, 91% in macrophages, 80% in kidney, and 19% in intestines, suggesting that ACAT-1 protein plays a major catalytic role in all of the human tissue/cell homogenates examined except intestines. Intestinal ACAT activity is largely resistant to immunodepletion and is much more sensitive to inhibition by the ACAT inhibitor Dup 128 than liver ACAT activity.

  19. Exon 10 skipping in ACAT1 caused by a novel c.949G>A mutation located at an exonic splice enhancer site.

    PubMed

    Otsuka, Hiroki; Sasai, Hideo; Nakama, Mina; Aoyama, Yuka; Abdelkreem, Elsayed; Ohnishi, Hidenori; Konstantopoulou, Vassiliki; Sass, Jörn Oliver; Fukao, Toshiyuki

    2016-11-01

    Beta-ketothiolase deficiency, also known as mitochondrial acetoacetyl-CoA thiolase (T2) deficiency, is an autosomal recessive disease caused by mutations in the acetyl‑CoA acetyltransferase 1 (ACAT1) gene. A German T2‑deficient patient that developed a severe ketoacidotic episode at the age of 11 months, was revealed to be a compound heterozygote of a previously reported null mutation, c.472A>G (p.N158D) and a novel mutation, c.949G>A (p.D317N), in ACAT1. The c.949G>A mutation was suspected to cause aberrant splicing as it is located within an exonic splicing enhancer sequence (c. 947CTGACGC) that is a potential binding site for serine/arginine‑rich splicing factor 1. A mutation in this sequence, c.951C>T, results in exon 10 skipping. A minigene construct was synthesized that included exon 9‑truncated intron 9‑exon 10‑truncated intron 10‑exon 11, and the splicing of this minigene revealed that the c.949G>A mutant construct caused exon 10 skipping in a proportion of the transcripts. Furthermore, additional substitution of G for C at the first nucleotide of exon 10 (c.941G>C) abolished the effect of the c.949G>A mutation. Transient expression analysis of the c.949G>A mutant cDNA revealed no residual T2 activity in the mutated D317N enzyme. Therefore, c.949G>A (D317N) is a pathogenic missense mutation, and diminishes the effect of an exonic splicing enhancer and causes exon 10 skipping. The present study demonstrates that a missense mutation, or even a synonymous substitution, may disrupt enzyme function by interference with splicing.
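
    The variant names in this record can be checked against the quoted enhancer sequence: coding-DNA position c.949 is the first base of codon 317, and the sequence starting at c.947 (CTGACGC) places the codon GAC there. The sketch below, with hypothetical helper names, applies the two substitutions mentioned in the abstract and looks them up in a three-entry subset of the standard genetic code. It assumes c. numbering starts at the A of the ATG start codon and only works for codons that lie inside the quoted seven-base window.

```python
# Coding-DNA context quoted in the abstract: the exonic splicing enhancer
# sequence starts at c.947 and reads CTGACGC.
ESE_START = 947
ESE_SEQ = "CTGACGC"

# Subset of the standard genetic code covering the codons used here.
CODON_TABLE = {"GAC": "D", "GAT": "D", "AAC": "N"}


def codon_position(cdna_pos):
    """Return (codon number, 1-based offset in codon) for a 1-based c. position."""
    return (cdna_pos - 1) // 3 + 1, (cdna_pos - 1) % 3 + 1


def codon_bases(codon_number, substitution=None):
    """Read a codon out of ESE_SEQ, optionally applying (c. position, new base)."""
    start = (codon_number - 1) * 3 + 1  # c. position of the codon's first base
    bases = [ESE_SEQ[pos - ESE_START] for pos in range(start, start + 3)]
    if substitution is not None:
        pos, new_base = substitution
        bases[pos - start] = new_base
    return "".join(bases)


codon_number, offset = codon_position(949)
wild_type = codon_bases(codon_number)
missense = codon_bases(codon_number, (949, "A"))    # c.949G>A
synonymous = codon_bases(codon_number, (951, "T"))  # c.951C>T

print(f"c.949 is base {offset} of codon {codon_number}")
print(f"wild type  {wild_type} -> {CODON_TABLE[wild_type]}")
print(f"c.949G>A   {missense} -> {CODON_TABLE[missense]}")
print(f"c.951C>T   {synonymous} -> {CODON_TABLE[synonymous]}")
```

    Under these assumptions, c.949G>A changes the codon from Asp to Asn (p.D317N), while c.951C>T leaves Asp in place, consistent with the abstract's point that a synonymous change in the same enhancer can still affect splicing.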

  20. SNP Arrays

    PubMed Central

    Louhelainen, Jari

    2016-01-01

    The papers published in this Special Issue “SNP arrays” (Single Nucleotide Polymorphism Arrays) focus on several perspectives associated with arrays of this type. The papers range from a case report to reviews, thereby targeting wider audiences working in this field. The research focus of SNP arrays is often human cancers, but this Issue expands that focus to include areas such as rare conditions, animal breeding and bioinformatics tools. Given the limited scope, the spectrum of papers is nothing short of remarkable and even from a technical point of view these papers will contribute to the field at a general level. Three of the papers published in this Special Issue focus on the use of various SNP array approaches in the analysis of three different cancer types. Two of the papers concentrate on two very different rare conditions, applying the SNP arrays slightly differently. Finally, two other papers evaluate the use of the SNP arrays in the context of genetic analysis of livestock. The findings reported in these papers help to close gaps in the current literature and also to give guidelines for future applications of SNP arrays. PMID:27792140

  1. SKM-SNP: SNP markers detection method.

    PubMed

    Liu, Yang; Li, Mark; Cheung, Yiu M; Sham, Pak C; Ng, Michael K

    2010-04-01

    SKM-SNP, a SNP marker detection program, is proposed to identify a set of relevant SNPs for the association between a disease and multiple marker genotypes. We employ a subspace categorical clustering algorithm to compute a weight for each SNP in the group of patient samples and the group of normal samples, and use the weights to identify the subsets of relevant SNPs that categorize these two groups. Experiments on both Schizophrenia and Parkinson Disease data sets containing genome-wide SNPs are reported to demonstrate the program. Results indicate that our method can find some relevant SNPs that categorize the disease samples. The online SKM-SNP program is available at http://www.math.hkbu.edu.hk/~mng/SKM-SNP/SKM-SNP.html.

  2. SNP-VISTA

    SciTech Connect

    Shah, Nameeta; Teplitsky, Michael; Minovitsky, Simon; Dubchak, Inna

    2005-11-07

    SNP-VISTA aids in analyses of the following types of data: A. Large-scale re-sequence data of disease-related genes for discovery of associated and/or causative alleles (GeneSNP-VISTA). B. Massive amounts of ecogenomics data for studying homologous recombination in microbial populations (EcoSNP-VISTA). The main features and capabilities of SNP-VISTA are: 1) Mapping of SNPs to gene structure; 2) classification of SNPs, based on their location in the gene, frequency of occurrence in samples and allele composition; 3) clustering, based on user-defined subsets of SNPs, highlighting haplotypes as well as recombinant sequences; 4) integration of protein conservation visualization; and 5) display of automatically calculated recombination points that are user-editable. The main strength of SNP-VISTA is its graphical interface and use of visual representations, which support interactive exploration and hence better understanding of large-scale SNPs data.

  3. SNP panels/Imputation

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Participants from thirteen countries discussed services that Interbull can perform or recommendations that Interbull can make to promote harmonization and assist member countries in improving their genomic evaluations in regard to SNP panels and imputation. The panel recommended: A mechanism to shar...

  4. SNP genotyping by heteroduplex analysis.

    PubMed

    Paniego, Norma; Fusari, Corina; Lia, Verónica; Puebla, Andrea

    2015-01-01

    Heteroduplex-based genotyping methods have proven to be technologically effective and economically efficient for low- to medium-range throughput single-nucleotide polymorphism (SNP) determination. In this chapter we describe two protocols that were successfully applied for SNP detection and haplotype analysis of candidate genes in association studies. The protocols involve (1) enzymatic mismatch cleavage with endonuclease CEL1 from celery, associated with fragment separation using capillary electrophoresis (CEL1 cleavage), and (2) differential retention of the homo/heteroduplex DNA molecules under partial denaturing conditions on ion pair reversed-phase liquid chromatography (dHPLC). Both methods are complementary since dHPLC is more versatile than CEL1 cleavage for identifying multiple SNP per target region, and the latter is easily optimized for sequences with fewer SNPs or small insertion/deletion polymorphisms. Besides, CEL1 cleavage is a powerful method to localize the position of the mutation when fragment resolution is done using capillary electrophoresis.

  5. UASIS: Universal Automatic SNP Identification System

    PubMed Central

    2011-01-01

    Background SNPs (Single Nucleotide Polymorphisms), the most common genetic variations between human beings, are believed to be a promising way towards personalized medicine. As more and more research on SNPs is being conducted, non-standard nomenclatures may generate potential problems. The most serious issue is that researchers cannot perform cross referencing among different SNP databases. This will result in more resources and time required to track SNPs. It could be detrimental to the entire academic community. Results UASIS (Universal Automated SNP Identification System) is a web-based server for SNP nomenclature standardization and translation at DNA level. Three utilities are available. They are UASIS Aligner, Universal SNP Name Generator and SNP Name Mapper. UASIS maps SNPs from different databases, including dbSNP, GWAS, HapMap and JSNP etc., into a uniform view efficiently using a proposed universal nomenclature and state-of-the-art alignment algorithms. UASIS is freely available at http://www.uasis.tk with no requirement of log-in. Conclusions UASIS is a helpful platform for SNP cross referencing and tracking. By providing an informative, unique and unambiguous nomenclature, which utilizes the unique position of a SNP, we aim to resolve the ambiguity of SNP nomenclatures currently practised. Our universal nomenclature is a good complement to mainstream SNP notations such as rs# and HGVS guidelines. UASIS acts as a bridge to connect heterogeneous representations of SNPs. PMID:22369494

  6. Linear reduction methods for tag SNP selection.

    PubMed

    He, Jingwu; Zelikovsky, Alex

    2004-01-01

    It is widely hoped that constructing a complete human haplotype map will help to associate complex diseases with certain SNPs. Unfortunately, the number of SNPs is huge and it is very costly to sequence many individuals. Therefore, it is desirable to reduce the number of SNPs that should be sequenced to a considerably smaller number of informative representatives, so-called tag SNPs. In this paper, we propose a new linear algebra based method for selecting and using tag SNPs. Our method is purely combinatorial and can be combined with linkage disequilibrium (LD) and block based methods. We measure the quality of our tag SNP selection algorithm by comparing actual SNPs with SNPs linearly predicted from linearly chosen tag SNPs. We obtain extremely good compression and prediction rates. For example, for long haplotypes (>25000 SNPs), knowing only 0.4% of all SNPs we predict the entire unknown haplotype with 2% accuracy while the prediction method is based on a 10% sample of the population.

  7. Detecting Susceptibility to Breast Cancer with SNP-SNP Interaction Using BPSOHS and Emotional Neural Networks

    PubMed Central

    Wang, Xiao; Fan, Yue

    2016-01-01

    Studies of the association between diseases and informative single nucleotide polymorphisms (SNPs) have received great attention. However, most of them just use the whole set of useful SNPs and fail to consider the SNP-SNP interactions, while these interactions have already been proven in biology experiments. In this paper, we use a binary particle swarm optimization with hierarchical structure (BPSOHS) algorithm to improve the effectiveness of PSO for the identification of the SNP-SNP interactions. Furthermore, in order to use these SNP interactions in the susceptibility analysis, we propose an emotional neural network (ENN) to treat SNP interactions as emotional tendency. Different from the normal architecture, just as the emotional brain, this architecture provides a specific path to treat the emotional value, by which the SNP interactions can be considered more quickly and directly. The ENN helps us use the prior knowledge about the SNP interactions and other influence factors together. Finally, the experimental results prove that the proposed BPSOHS_ENN algorithm can detect the informative SNP-SNP interaction and predict the breast cancer risk with a much higher accuracy than existing methods. PMID:27294121

  8. SNP Cutter: a comprehensive tool for SNP PCR–RFLP assay design

    PubMed Central

    Zhang, Ruifang; Zhu, Zanhua; Zhu, Hongming; Nguyen, Tu; Yao, Fengxia; Xia, Kun; Liang, Desheng; Liu, Chunyu

    2005-01-01

    Polymerase chain reaction–restriction fragment length polymorphism (PCR–RFLP) is a relatively simple and inexpensive method for genotyping single nucleotide polymorphisms (SNPs). It requires minimal investment in instrumentation. Here, we describe a web application, ‘SNP Cutter,’ which designs PCR–RFLP assays on a batch of SNPs from the human genome. NCBI dbSNP rs IDs or formatted SNPs are submitted into the SNP Cutter which then uses restriction enzymes from a pre-selected list to perform enzyme selection. The program is capable of designing primers for either natural PCR–RFLP or mismatch PCR–RFLP, depending on the SNP sequence data. SNP Cutter generates the information needed to evaluate and perform genotyping experiments, including a PCR primers list, sizes of original amplicons and different allelic fragment after enzyme digestion. Some output data is tab-delimited, therefore suitable for database archiving. The SNP Cutter is available at . PMID:15980518
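    The enzyme-selection step for a natural PCR–RFLP assay can be illustrated with a minimal sketch. This is not the SNP Cutter implementation: the function names, the four-enzyme list and the flanking sequences are assumptions chosen for illustration, and primer design and mismatch PCR–RFLP are not covered. The sketch reports enzymes whose recognition site occurs a different number of times in the two allelic sequences.

```python
# Illustrative enzyme list (well-known palindromic recognition sites);
# the pre-selected list used by SNP Cutter itself is not given in the abstract.
ENZYMES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
    "TaqI": "TCGA",
}


def site_positions(sequence, motif):
    """Return start indices of every (possibly overlapping) motif occurrence."""
    return [
        i
        for i in range(len(sequence) - len(motif) + 1)
        if sequence.startswith(motif, i)
    ]


def discriminating_enzymes(left_flank, alleles, right_flank):
    """Return names of enzymes whose site count differs between the two alleles.

    Because all motifs above are palindromic, scanning one strand is enough.
    """
    allele_1, allele_2 = alleles
    seq_1 = (left_flank + allele_1 + right_flank).upper()
    seq_2 = (left_flank + allele_2 + right_flank).upper()
    return sorted(
        name
        for name, motif in ENZYMES.items()
        if len(site_positions(seq_1, motif)) != len(site_positions(seq_2, motif))
    )


if __name__ == "__main__":
    # Hypothetical T/C SNP: the T allele completes an EcoRI site (GAATTC).
    print(discriminating_enzymes("ACGTGAA", ("T", "C"), "TCGGA"))
```

    An enzyme returned by this check cuts only one allele, so the two alleles would give different fragment sizes after digestion.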

  9. SNP genotyping by DNA photoligation: application to SNP detection of genes from food crops

    NASA Astrophysics Data System (ADS)

    Yoshimura, Yoshinaga; Ohtake, Tomoko; Okada, Hajime; Ami, Takehiro; Tsukaguchi, Tadashi; Fujimoto, Kenzo

    2009-06-01

    We describe a simple and inexpensive single-nucleotide polymorphism (SNP) typing method, using DNA photoligation with 5-carboxyvinyl-2'-deoxyuridine and two fluorophores. This SNP-typing method facilitates qualitative determination of genes from indica and japonica rice, and showed a high degree of single nucleotide specificity up to 10 000. This method can be used in the SNP typing of actual genomic DNA samples from food crops.

  10. SNPMeta: SNP annotation and SNP metadata collection without a reference genome

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The increase in availability of resequencing data is greatly accelerating SNP discovery and has facilitated the development of SNP genotyping assays. This, in turn, is increasing interest in annotation of individual SNPs. Currently, these data are only available through curation, or comparison to a ...

  11. Genome-wide SNP detection, validation, and development of an 8K SNP array for apple

    Technology Transfer Automated Retrieval System (TEKTRAN)

    As high-throughput genetic marker screening systems are essential for a range of genetics studies and plant breeding applications, the International RosBREED SNP Consortium (IRSC) has utilized the Illumina Infinium® II system to develop a medium- to high-throughput SNP screening tool for genome-wide...

  12. Characterization of the Streptomyces sp. Strain C5 snp Locus and Development of snp-Derived Expression Vectors

    PubMed Central

    DeSanti, Charles L.; Strohl, William R.

    2003-01-01

    The Streptomyces sp. strain C5 snp locus is comprised of two divergently oriented genes: snpA, a metalloproteinase gene, and snpR, which encodes a LysR-like activator of snpA transcription. The transcriptional start point of snpR is immediately downstream of a strong T-N11-A inverted repeat motif likely to be the SnpR binding site, while the snpA transcriptional start site overlaps the ATG start codon, generating a leaderless snpA transcript. By using the aphII reporter gene of pIJ486 as a reporter, the plasmid-borne snpR-activated snpA promoter was ca. 60-fold more active than either the nonactivated snpA promoter or the melC1 promoter of pIJ702. The snpR-activated snpA promoter produced reporter protein levels comparable to those of the up-mutated ermE∗ promoter. The SnpR-activated snpA promoter was built into a set of transcriptional and translational fusion expression vectors which have been used for the intracellular expression of numerous daunomycin biosynthesis pathway genes from Streptomyces sp. strain C5 as well as the expression and secretion of soluble recombinant human endostatin. PMID:12620855

  13. SNP Array in Hematopoietic Neoplasms: A Review

    PubMed Central

    Song, Jinming; Shao, Haipeng

    2015-01-01

    Cytogenetic analysis is essential for the diagnosis and prognosis of hematopoietic neoplasms in current clinical practice. Many hematopoietic malignancies are characterized by structural chromosomal abnormalities such as specific translocations, inversions, deletions and/or numerical abnormalities that can be identified by karyotype analysis or fluorescence in situ hybridization (FISH) studies. Single nucleotide polymorphism (SNP) arrays offer high-resolution identification of copy number variants (CNVs) and acquired copy-neutral loss of heterozygosity (LOH)/uniparental disomy (UPD) that are usually not identifiable by conventional cytogenetic analysis and FISH studies. As a result, SNP arrays have been increasingly applied to hematopoietic neoplasms to search for clinically significant genetic abnormalities. A large number of CNVs and UPDs have been identified in a variety of hematopoietic neoplasms. CNVs detected by SNP array in some hematopoietic neoplasms are of prognostic significance. A few specific genes in the affected regions have been implicated in the pathogenesis and may be the targets for specific therapeutic agents in the future. In this review, we summarize the current findings of application of SNP arrays in a variety of hematopoietic malignancies with an emphasis on the clinically significant genetic variants. PMID:27600067

  14. [Research progress on the phenotype informative SNP in forensic science].

    PubMed

    Liu, Yu-Xuan; Hu, Qing-Qing; Ma, Hong-Du; Huang, Dai-Xin

    2014-10-01

    Single nucleotide polymorphism (SNP) refers to a single-base sequence variation at a specific location of the human genome. Phenotype informative SNPs have gradually become one of the research hot spots in forensic science. In this paper, the state of forensic research on phenotype informative SNPs and their application prospects for hair, eye and skin color, height, and facial features are reviewed.

  15. is-rSNP: a novel technique for in silico regulatory SNP detection

    PubMed Central

    Macintyre, Geoff; Bailey, James; Haviv, Izhak; Kowalczyk, Adam

    2010-01-01

    Motivation: Determining the functional impact of non-coding disease-associated single nucleotide polymorphisms (SNPs) identified by genome-wide association studies (GWAS) is challenging. Many of these SNPs are likely to be regulatory SNPs (rSNPs): variations which affect the ability of a transcription factor (TF) to bind to DNA. However, experimental procedures for identifying rSNPs are expensive and labour intensive. Therefore, in silico methods are required for rSNP prediction. By scoring two alleles with a TF position weight matrix (PWM), it can be determined which SNPs are likely rSNPs. However, predictions in this manner are noisy and no method exists that determines the statistical significance of a nucleotide variation on a PWM score. Results: We have designed an algorithm for in silico rSNP detection called is-rSNP. We employ novel convolution methods to determine the complete distributions of PWM scores and ratios between allele scores, facilitating assignment of statistical significance to rSNP effects. We have tested our method on 41 experimentally verified rSNPs, correctly predicting the disrupted TF in 28 cases. We also analysed 146 disease-associated SNPs with no known functional impact in an attempt to identify candidate rSNPs. Of the 11 significantly predicted disrupted TFs, 9 had previous evidence of being associated with the disease in the literature. These results demonstrate that is-rSNP is suitable for high-throughput screening of SNPs for potential regulatory function. This is a useful and important tool in the interpretation of GWAS. Availability: is-rSNP software is available for use at: www.genomics.csse.unimelb.edu.au/is-rSNP Contact: gmaci@csse.unimelb.edu.au; adam.kowalczyk@nicta.com.au Supplementary information: Supplementary data are available at Bioinformatics online. PMID:20823317
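    The basic step the abstract builds on, scoring both alleles of a SNP against a TF position weight matrix, can be sketched as follows. The matrix values, the uniform background and the function names are hypothetical, and the sketch deliberately stops at the raw score difference: it does not implement the convolution-based significance calculation that is-rSNP contributes.

```python
import math

# Hypothetical 4-column PWM: per-position base probabilities (not from the paper).
PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
BACKGROUND = 0.25  # assumed uniform background base frequency


def pwm_score(site):
    """Log-odds score (base 2) of a site with the same length as the PWM."""
    if len(site) != len(PWM):
        raise ValueError("site length must equal PWM length")
    return sum(math.log2(column[base] / BACKGROUND) for column, base in zip(PWM, site))


def allele_score_difference(ref_site, alt_site):
    """Score of the reference-allele site minus score of the alternate-allele site."""
    return pwm_score(ref_site) - pwm_score(alt_site)


if __name__ == "__main__":
    # Hypothetical G>T SNP at the second position of the binding site.
    print(round(allele_score_difference("AGCT", "ATCT"), 3))
```

    A large positive difference suggests the alternate allele weakens the predicted binding site, but as the abstract notes, such raw differences are noisy without a significance estimate.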

  16. Analyzing cancer samples with SNP arrays.

    PubMed

    Van Loo, Peter; Nilsen, Gro; Nordgard, Silje H; Vollan, Hans Kristian Moen; Børresen-Dale, Anne-Lise; Kristensen, Vessela N; Lingjærde, Ole Christian

    2012-01-01

    Single nucleotide polymorphism (SNP) arrays are powerful tools to delineate genomic aberrations in cancer genomes. However, the analysis of these SNP array data of cancer samples is complicated by three phenomena: (a) aneuploidy: due to massive aberrations, the total DNA content of a cancer cell can differ significantly from its normal two copies; (b) nonaberrant cell admixture: samples from solid tumors do not exclusively contain aberrant tumor cells, but always contain some portion of nonaberrant cells; (c) intratumor heterogeneity: different cells in the tumor sample may have different aberrations. We describe here how these phenomena impact the SNP array profile, and how these can be accounted for in the analysis. In an extended practical example, we apply our recently developed and further improved ASCAT (allele-specific copy number analysis of tumors) suite of tools to analyze SNP array data using data from a series of breast carcinomas as an example. We first describe the structure of the data, how it can be plotted and interpreted, and how it can be segmented. The core ASCAT algorithm next determines the fraction of nonaberrant cells and the tumor ploidy (the average number of DNA copies), and calculates an ASCAT profile. We describe how these ASCAT profiles visualize both copy number aberrations as well as copy-number-neutral events. Finally, we touch upon regions showing intratumor heterogeneity, and how they can be detected in ASCAT profiles. All source code and data described here can be found at our ASCAT Web site ( http://www.ifi.uio.no/forskning/grupper/bioinf/Projects/ASCAT/).
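    How aneuploidy and nonaberrant cell admixture shape the SNP array profile can be illustrated with a small forward model. The formulas below follow the commonly cited ASCAT-style relationship between allele-specific copy numbers, tumor purity and ploidy; they are an assumption here, since the abstract does not state them, and the function and parameter names are hypothetical. The sketch computes the expected log ratio (LogR) and B allele frequency (BAF) of a heterozygous germline SNP.

```python
import math


def expected_logr_baf(n_a, n_b, purity, tumour_ploidy, gamma=1.0):
    """Expected (LogR, BAF) for a germline-heterozygous SNP.

    n_a, n_b      : tumor copy numbers of the A and B alleles
    purity        : fraction of aberrant (tumor) cells, between 0 and 1
    tumour_ploidy : average tumor copy number over the genome
    gamma         : platform-dependent compaction factor (assumed 1 here)
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must lie in [0, 1]")
    normal_part = 2.0 * (1.0 - purity)            # two copies from normal cells
    locus_dna = normal_part + purity * (n_a + n_b)
    average_dna = normal_part + purity * tumour_ploidy
    if locus_dna <= 0.0 or average_dna <= 0.0:
        raise ValueError("no DNA expected at this locus; LogR is undefined")
    log_r = gamma * math.log2(locus_dna / average_dna)
    baf = (1.0 - purity + purity * n_b) / locus_dna
    return log_r, baf


if __name__ == "__main__":
    # Copy-neutral LOH (2 copies of A, 0 of B) in a sample with 50% tumor cells.
    print(expected_logr_baf(2, 0, purity=0.5, tumour_ploidy=2.0))
```

    In this example LogR stays at 0 while BAF shifts from 0.5 to 0.25, which shows why copy-number-neutral events are visible only in the allelic signal and why the admixture of normal cells keeps BAF from reaching 0.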

  17. A Bayesian Framework for SNP Identification

    SciTech Connect

    Webb-Robertson, Bobbie-Jo M.; Havre, Susan L.; Payne, Deborah A.

    2005-07-01

    Current proteomics techniques, such as mass spectrometry, focus on protein identification, usually ignoring most types of modifications beyond post-translational modifications, with the assumption that only a small number of peptides have to be matched to a protein for a positive identification. However, not all proteins are being identified with current techniques and improved methods to locate points of mutation are becoming a necessity. In the case when single-nucleotide polymorphisms (SNPs) are observed, brute force is the most common method to locate them, quickly becoming computationally unattractive as the size of the database associated with the model organism grows. We have developed a Bayesian model for SNPs, BSNP, incorporating evolutionary information at both the nucleotide and amino acid levels. Formulating SNPs as a Bayesian inference problem allows probabilities of interest to be easily obtained, for example the probability of a specific SNP or specific type of mutation over a gene or entire genome. Three SNP databases were observed in the evaluation of the BSNP model; the first SNP database is a disease specific gene in human, hemoglobin, the second is also a disease specific gene in human, p53, and the third is a more general SNP database for multiple genes in mouse. We validate that the BSNP model assigns higher posterior probabilities to the SNPs defined in all three separate databases than can be attributed to chance under specific evolutionary information, for example the amino acid model described by Majewski and Ott in conjunction with either the four-parameter nucleotide model by Bulmer or seven-parameter nucleotide model by Majewski and Ott.

  18. Variable Selection in Logistic Regression for Detecting SNP-SNP Interactions: the Rheumatoid Arthritis Example

    PubMed Central

    Lin, H. Y.; Desmond, R.; Liu, Y. H.; Bridges, S. L.; Soong, S. J.

    2013-01-01

    Summary Many complex disease traits are observed to be associated with single nucleotide polymorphism (SNP) interactions. In testing small-scale SNP-SNP interactions, variable selection procedures in logistic regressions are commonly used. The empirical evidence of variable selection for testing interactions in logistic regressions is limited. This simulation study was designed to compare nine variable selection procedures in logistic regressions for testing SNP-SNP interactions. Data on 10 SNPs were simulated for 400 and 1000 subjects (case/control ratio=1). The simulated model included one main effect and two 2-way interactions. The variable selection procedures included automatic selection (stepwise, forward and backward), common 2-step selection, AIC- and BIC-based selection. The hierarchical rule effect, in which all main effects and lower order terms of the highest-order interaction term are included in the model regardless of their statistical significance, was also examined. We found that the stepwise variable selection without the hierarchical rule which had reasonably high authentic (true positive) proportion and low noise (false positive) proportion, is a better method compared to other variable selection procedures. The procedure without the hierarchical rule requires fewer terms in testing interactions, so it can accommodate more SNPs than the procedure with the hierarchical rule. For testing interactions, the procedures without the hierarchical rule had higher authentic proportion and lower noise proportion compared with ones with the hierarchical rule. These variable selection procedures were also applied and compared in a rheumatoid arthritis study. PMID:18231122

  19. dbSNP: the NCBI database of genetic variation.

    PubMed

    Sherry, S T; Ward, M H; Kholodov, M; Baker, J; Phan, L; Smigielski, E M; Sirotkin, K

    2001-01-01

    In response to a need for a general catalog of genome variation to address the large-scale sampling designs required by association studies, gene mapping and evolutionary biology, the National Center for Biotechnology Information (NCBI) has established the dbSNP database [S.T.Sherry, M.Ward and K. Sirotkin (1999) Genome Res., 9, 677-679]. Submissions to dbSNP will be integrated with other sources of information at NCBI such as GenBank, PubMed, LocusLink and the Human Genome Project data. The complete contents of dbSNP are available to the public at website: http://www.ncbi.nlm.nih.gov/SNP. The complete contents of dbSNP can also be downloaded in multiple formats via anonymous FTP at ftp://ncbi.nlm.nih.gov/snp/.

  20. TRM: a powerful two-stage machine learning approach for identifying SNP-SNP interactions.

    PubMed

    Lin, Hui-Yi; Chen, Y Ann; Tsai, Ya-Yu; Qu, Xiaotao; Tseng, Tung-Sung; Park, Jong Y

    2012-01-01

    Studies have shown that interactions of single nucleotide polymorphisms (SNPs) may play an important role in understanding the causes of complex disease. We have proposed an integrated machine learning method that combines two machine-learning methods, Random Forests (RF) and Multivariate Adaptive Regression Splines (MARS), to identify a subset of important SNPs and detect interaction patterns more effectively and efficiently. In this two-stage RF-MARS (TRM) approach, RF is first applied to detect a predictive subset of SNPs, and then MARS is used to identify the interaction patterns. We evaluated the TRM performances in four models. RF variable selection was based on out-of-bag classification error rate (OOB) and variable important spectrum (IS). Our results support that RF(OOB) had better performance than MARS and RF(IS) in detecting important variables. This study demonstrates that TRM(OOB), which is RF(OOB) plus MARS, has combined the strengths of RF and MARS in identifying SNP-SNP interactions in a scenario of 100 candidate SNPs. TRM(OOB) had greater true positive rate and lower false positive rate compared with MARS, particularly for searching interactions with a strong association with the outcome. Therefore, the use of TRM(OOB) is favored for exploring SNP-SNP interactions in a large-scale genetic variation study.

  1. SNP2CAPS: a SNP and INDEL analysis tool for CAPS marker development.

    PubMed

    Thiel, Thomas; Kota, Raja; Grosse, Ivo; Stein, Nils; Graner, Andreas

    2004-01-02

    With the influx of various SNP genotyping assays in recent years, there has been a need for an assay that is robust, yet cost effective, and could be performed using standard gel-based procedures. In this context, CAPS markers have been shown to meet these criteria. However, converting SNPs to CAPS markers can be a difficult process if done manually. In order to address this problem, we describe a computer program, SNP2CAPS, that facilitates the computational conversion of SNP markers into CAPS markers. 413 multiple aligned sequences derived from barley ESTs were analysed for the presence of polymorphisms in 235 distinct restriction sites. 282 (90%) of 314 alignments that contain sequence variation due to SNPs and InDels revealed at least one polymorphic restriction site. After reducing the number of restriction enzymes from 235 to 10, 31% of the polymorphic sites could still be detected. In order to demonstrate the usefulness of this tool for marker development, we experimentally validated some of the results predicted by SNP2CAPS.

  2. SNP marker detection and genotyping in tilapia.

    PubMed

    Van Bers, N E M; Crooijmans, R P M A; Groenen, M A M; Dibbits, B W; Komen, J

    2012-09-01

    We have generated a unique resource consisting of nearly 175 000 short contig sequences and 3569 SNP markers from the widely cultured GIFT (Genetically Improved Farmed Tilapia) strain of Nile tilapia (Oreochromis niloticus). In total, 384 SNPs were selected to monitor the wider applicability of the SNPs by genotyping tilapia individuals from different strains and different geographical locations. In all strains and species tested (O. niloticus, O. aureus and O. mossambicus), the genotyping assay was working for a similar number of SNPs (288-305 SNPs). The actual number of polymorphic SNPs was, as expected, highest for individuals from the GIFT population (255 SNPs). In the individuals from an Egyptian strain and in individuals caught in the wild in the basin of the river Volta, 197 and 163 SNPs were polymorphic, respectively. A pairwise calculation of Nei's genetic distance allowed the discrimination of the individual strains and species based on the genotypes determined with the SNP set. We expect that this set will be widely applicable for use in tilapia aquaculture, e.g. for pedigree reconstruction. In addition, this set is currently used for assaying the genetic diversity of native Nile tilapia in areas where tilapia is, or will be, introduced in aquaculture projects. This allows the tracing of escapees from aquaculture and the monitoring of effects of introgression and hybridization.
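    A pairwise genetic distance of the kind mentioned above can be sketched for biallelic SNPs. The abstract does not say which of Nei's distances was used; the sketch assumes Nei's standard genetic distance, D = -ln(J_xy / sqrt(J_x J_y)), computed from per-SNP allele frequencies. The function name and the example frequencies are hypothetical.

```python
import math


def nei_standard_distance(freqs_x, freqs_y):
    """Nei's standard genetic distance between two populations.

    freqs_x, freqs_y: frequency of one allele at each biallelic SNP
    (the other allele has frequency 1 - p), in the same SNP order.
    """
    if len(freqs_x) != len(freqs_y) or not freqs_x:
        raise ValueError("need two non-empty frequency lists of equal length")
    j_x = sum(p * p + (1 - p) ** 2 for p in freqs_x)
    j_y = sum(q * q + (1 - q) ** 2 for q in freqs_y)
    j_xy = sum(p * q + (1 - p) * (1 - q) for p, q in zip(freqs_x, freqs_y))
    if j_xy == 0:
        raise ValueError("populations share no alleles; distance is infinite")
    return -math.log(j_xy / math.sqrt(j_x * j_y))


if __name__ == "__main__":
    # Hypothetical allele frequencies at three SNPs in two strains.
    print(nei_standard_distance([0.9, 0.5, 0.2], [0.6, 0.5, 0.4]))
```

    The distance is 0 for populations with identical allele frequencies and grows as the frequencies diverge.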

  3. SNIT: SNP identification for strain typing

    PubMed Central

    2011-01-01

    With ever-increasing numbers of microbial genomes being sequenced, efficient tools are needed to perform strain-level identification of any newly sequenced genome. Here, we present the SNP identification for strain typing (SNIT) pipeline, a fast and accurate software system that compares a newly sequenced bacterial genome with other genomes of the same species to identify single nucleotide polymorphisms (SNPs) and small insertions/deletions (indels). Based on this information, the pipeline analyzes the polymorphic loci present in all input genomes to identify the genome that has the fewest differences with the newly sequenced genome. Similarly, for each of the other genomes, SNIT identifies the input genome with the fewest differences. Results from five bacterial species show that the SNIT pipeline identifies the correct closest neighbor with 75% to 100% accuracy. The SNIT pipeline is available for download at http://www.bhsai.org/snit.html PMID:21902825
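    The closest-neighbor step described above can be sketched in a few lines, assuming the SNP calling has already been done and each genome is represented by its alleles at the same ordered set of polymorphic loci. This is only an illustration of "fewest differences"; the function names and strain labels are hypothetical and the SNIT pipeline itself is not reproduced.

```python
def count_differences(alleles_a, alleles_b):
    """Number of polymorphic loci at which two genomes carry different alleles."""
    if len(alleles_a) != len(alleles_b):
        raise ValueError("genomes must be compared over the same loci")
    return sum(1 for a, b in zip(alleles_a, alleles_b) if a != b)


def closest_genome(query, references):
    """Return (name, differences) of the reference closest to the query.

    references: dict mapping a strain name to its allele string.
    Ties are broken alphabetically by strain name.
    """
    scored = [
        (count_differences(query, alleles), name)
        for name, alleles in references.items()
    ]
    differences, name = min(scored)
    return name, differences


if __name__ == "__main__":
    refs = {"strain_A": "ACGTT", "strain_B": "ACCTT", "strain_C": "TCGTA"}
    print(closest_genome("ACGTA", refs))
```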

  4. Atomic Force Microscopy for DNA SNP Identification

    NASA Astrophysics Data System (ADS)

    Valbusa, Ugo; Ierardi, Vincenzo

    The knowledge of the effects of single-nucleotide polymorphisms (SNPs) in the human genome greatly contributes to better comprehension of the relation between genetic factors and diseases. Sequence analysis of genomic DNA in different individuals reveals positions where variations that involve individual base substitutions can occur. Single-nucleotide polymorphisms are highly abundant and can have different consequences at phenotypic level. Several attempts were made to apply atomic force microscopy (AFM) to detect and map SNP sites in DNA strands. The most promising approach is the study of DNA mutations producing heteroduplex DNA strands and identifying the mismatches by means of a protein that labels the mismatches. MutS is a protein that is part of a well-known complex of mismatch repair, which initiates the process of repairing when the MutS binds to the mismatched DNA filament. The position of MutS on the DNA filament can be easily recorded by means of AFM imaging.

  5. Inference of kinship coefficients from Korean SNP genotyping data.

    PubMed

    Park, Seong-Jin; Yang, Jin Ok; Kim, Sang Cheol; Kwon, Jekeun; Lee, Sanghyuk; Lee, Byungwook

    2013-06-01

    The determination of relatedness between individuals in a family is crucial in analysis of common complex diseases. We present a method to infer close inter-familial relationships based on SNP genotyping data and provide the relationship coefficient of kinship in Korean families. We obtained blood samples from 43 Korean individuals in two families. SNP data was obtained using the Affymetrix Genome-wide Human SNP array 6.0 and the Illumina Human 1M-Duo chip. To measure the kinship coefficient with the SNP genotyping data, we considered all possible pairs of individuals in each family. The genetic distance between two individuals in a pair was determined using the allele sharing distance method. The results show that genetic distance is proportional to the kinship coefficient and that a close degree of kinship can be confirmed with SNP genotyping data. This study represents the first attempt to identify the genetic distance between very closely related individuals.
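    The allele sharing distance between two individuals can be sketched as below. The abstract does not give the exact formula used, so this assumes one common definition: genotypes are coded as the count (0, 1 or 2) of one allele, a SNP contributes 0, 0.5 or 1 depending on whether two, one or no alleles are shared, and the contributions are averaged over SNPs called in both individuals. The function name and the example genotypes are hypothetical.

```python
def allele_sharing_distance(genotypes_1, genotypes_2):
    """Mean per-SNP allele sharing distance between two individuals.

    genotypes_*: lists of 0/1/2 allele counts, with None for a missing call.
    Returns a value between 0 (identical genotypes) and 1 (no shared alleles).
    """
    if len(genotypes_1) != len(genotypes_2):
        raise ValueError("genotype lists must cover the same SNPs")
    called = [
        (a, b)
        for a, b in zip(genotypes_1, genotypes_2)
        if a is not None and b is not None
    ]
    if not called:
        raise ValueError("no SNP is called in both individuals")
    # |a - b| is the number of alleles NOT shared at a SNP (0, 1 or 2).
    return sum(abs(a - b) for a, b in called) / (2 * len(called))


if __name__ == "__main__":
    parent = [0, 1, 2, 2, None]
    child = [0, 1, 0, 1, 2]
    print(allele_sharing_distance(parent, child))
```

    Computing this value for every pair of individuals in a family gives the distance matrix that can then be compared with the expected kinship coefficients.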

  6. Exercise improves adiponectin concentrations irrespective of the adiponectin gene polymorphisms SNP45 and the SNP276 in obese Korean women.

    PubMed

    Lee, Kyoung-Young; Kang, Hyun-Sik; Shin, Yun-A

    2013-03-10

    The effects of exercise on adiponectin levels have been reported to be variable and may be attributable to an interaction between environmental and genetic factors. The single nucleotide polymorphisms (SNP) 45 (T>G) and SNP276 (G>T) of the adiponectin gene are associated with metabolic risk factors including adiponectin levels. We examined whether SNP45 and SNP276 would differentially influence the effect of exercise training in middle-aged women with uncomplicated obesity. We conducted a prospective study in the general community that included 90 Korean women (age 47.0±5.1 years) with uncomplicated obesity. The intervention was aerobic exercise training for 3 months. Body composition, adiponectin levels, and other metabolic risk factors were measured. Prior to exercise training, only body weight differed among the SNP276 genotypes. Exercise training improved body composition, systolic blood pressure, maximal oxygen consumption, high-density lipoprotein cholesterol, and leptin levels. In addition, exercise improved adiponectin levels irrespective of weight gain or loss. However, after adjustments for age, BMI, body fat (%), and waist circumference, no differences were found in obesity-related characteristics (e.g., adiponectin) following exercise training among the SNP45 and the 276 genotypes. Our findings suggest that aerobic exercise affects adiponectin levels regardless of weight loss and this effect would not be influenced by SNP45 and SNP276 in the adiponectin gene.

  7. Automated SNP genotype clustering algorithm to improve data completeness in high-throughput SNP genotyping datasets from custom arrays.

    PubMed

    Smith, Edward M; Littrell, Jack; Olivier, Michael

    2007-12-01

    High-throughput SNP genotyping platforms use automated genotype calling algorithms to assign genotypes. While these algorithms work efficiently for individual platforms, they are not compatible with other platforms, and have individual biases that result in missed genotype calls. Here we present data on the use of a second complementary SNP genotype clustering algorithm. The algorithm was originally designed for individual fluorescent SNP genotyping assays, and has been optimized to permit the clustering of large datasets generated from custom-designed Affymetrix SNP panels. In an analysis of data from a 3K array genotyped on 1,560 samples, the additional analysis increased the overall number of genotypes by over 45,000, significantly improving the completeness of the experimental data. This analysis suggests that the use of multiple genotype calling algorithms may be advisable in high-throughput SNP genotyping experiments. The software is written in Perl and is available from the corresponding author.

  8. Haplotype assembly from aligned weighted SNP fragments.

    PubMed

    Zhao, Yu-Ying; Wu, Ling-Yun; Zhang, Ji-Hong; Wang, Rui-Sheng; Zhang, Xiang-Sun

    2005-08-01

    Given an assembled genome of a diploid organism, the haplotype assembly problem can be formulated as retrieval of a pair of haplotypes from a set of aligned weighted SNP fragments. Known computational formulations (models) of this problem are minimum letter flips (MLF) and the weighted minimum letter flips (WMLF; Greenberg et al. (INFORMS J. Comput. 2004, 14, 211-213)). In this paper we show that the general WMLF model is NP-hard even for the gapless case. However, algorithmic solutions for selected variants of WMLF can exist, and we propose a heuristic algorithm based on a dynamic clustering technique. We also introduce a new formulation of the haplotype assembly problem that we call COMPLETE WMLF (CWMLF). This model and algorithms for its implementation take into account a simultaneous presence of multiple kinds of data errors. Extensive computational experiments indicate that the algorithmic implementations of the CWMLF model achieve higher accuracy of haplotype reconstruction than the WMLF-based algorithms, which in turn appear to be more accurate than those based on MLF.
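
    To make the MLF and WMLF objectives concrete, the sketch below scores a candidate haplotype pair against a set of aligned fragments. It is not the authors' heuristic or their CWMLF model. It assumes biallelic calls written as '0'/'1', '-' for an uncovered SNP, and that each weight is the cost of flipping that call; the function name and the example fragments are hypothetical.

```python
def wmlf_cost(fragments, weights, hap1, hap2):
    """Total weighted letter flips needed so every fragment matches one haplotype.

    fragments: list of strings over '0', '1', '-' ('-' means SNP not covered)
    weights:   list of lists of per-call flip costs, same shape as fragments
    hap1/hap2: candidate haplotypes as strings over '0', '1'
    Each fragment is charged against whichever haplotype it fits more cheaply.
    """
    total = 0.0
    for frag, w in zip(fragments, weights):
        cost1 = sum(wi for c, h, wi in zip(frag, hap1, w) if c != "-" and c != h)
        cost2 = sum(wi for c, h, wi in zip(frag, hap2, w) if c != "-" and c != h)
        total += min(cost1, cost2)
    return total


# Hypothetical fragments; the third one carries a single erroneous call.
fragments = ["01-1", "100-", "0101", "-000"]
hap1, hap2 = "0111", "1000"

# Unit weights reduce the weighted objective to plain MLF (a flip count).
unit = [[1.0] * len(f) for f in fragments]
mlf = wmlf_cost(fragments, unit, hap1, hap2)

# Down-weighting the doubtful call makes flipping it cheaper under WMLF.
weighted = [row[:] for row in unit]
weighted[2][2] = 0.2
wmlf = wmlf_cost(fragments, weighted, hap1, hap2)
```

    Evaluating a given pair is cheap; the hardness result in the abstract concerns finding the pair that minimizes this cost.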

  9. SNP-SNP interaction analysis of NF-κB signaling pathway on breast cancer survival

    PubMed Central

    Jamshidi, Maral; Fagerholm, Rainer; Khan, Sofia; Aittomäki, Kristiina; Czene, Kamila; Darabi, Hatef; Li, Jingmei; Andrulis, Irene L.; Chang-Claude, Jenny; Devilee, Peter; Fasching, Peter A.; Michailidou, Kyriaki; Bolla, Manjeet K.; Dennis, Joe; Wang, Qin; Guo, Qi; Rhenius, Valerie; Cornelissen, Sten; Rudolph, Anja; Knight, Julia A.; Loehberg, Christian R.; Burwinkel, Barbara; Marme, Frederik; Hopper, John L.; Southey, Melissa C.; Bojesen, Stig E.; Flyger, Henrik; Brenner, Hermann; Holleczek, Bernd; Margolin, Sara; Mannermaa, Arto; Kosma, Veli-Matti; Dyck, Laurien Van; Nevelsteen, Ines; Couch, Fergus J.; Olson, Janet E.; Giles, Graham G.; McLean, Catriona; Haiman, Christopher A.; Henderson, Brian E.; Winqvist, Robert; Pylkäs, Katri; Tollenaar, Rob A.E.M.; García-Closas, Montserrat; Figueroa, Jonine; Hooning, Maartje J.; Martens, John W.M.; Cox, Angela; Cross, Simon S.; Simard, Jacques; Dunning, Alison M.; Easton, Douglas F.; Pharoah, Paul D.P.; Hall, Per; Blomqvist, Carl; Schmidt, Marjanka K.; Nevanlinna, Heli

    2015-01-01

    In breast cancer, constitutive activation of NF-κB has been reported; however, the impact of genetic variation of the pathway on patient prognosis has been little studied. Furthermore, a combination of genetic variants, rather than single polymorphisms, may affect disease prognosis. Here, in an extensive dataset (n = 30,431) from the Breast Cancer Association Consortium, we investigated the association of 917 SNPs in 75 genes in the NF-κB pathway with breast cancer prognosis. We explored SNP-SNP interactions on survival using the likelihood-ratio test comparing multivariate Cox regression models of SNP pairs without and with an interaction term. We found two interacting pairs associating with prognosis: patients simultaneously homozygous for the rare alleles of rs5996080 and rs7973914 had worse survival (HRinteraction 6.98, 95% CI=3.3-14.4, P = 1.42E-07), and patients carrying at least one rare allele for rs17243893 and rs57890595 had better survival (HRinteraction 0.51, 95% CI=0.3-0.6, P = 2.19E-05). Based on in silico functional analyses and literature, we speculate that the rs5996080 and rs7973914 loci may affect the BAFFR and TNFR1/TNFR3 receptors and breast cancer survival, possibly by disturbing both the canonical and non-canonical NF-κB pathways or their dynamics, whereas the rs17243893-rs57890595 interaction on survival may be mediated through TRAF2-TRAIL-R4 interplay. These results warrant further validation and functional analyses. PMID:26317411
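
    The likelihood-ratio comparison of nested models can be sketched as follows. This is only an illustration of the test's arithmetic, assuming the interaction adds a single parameter (1 degree of freedom) and that the two maximized log partial likelihoods are already available from a Cox fit. The function name and the log-likelihood values are hypothetical, not taken from the study.

```python
import math


def lrt_pvalue_1df(loglik_without, loglik_with):
    """Likelihood-ratio test for one extra parameter in a nested model.

    Returns (statistic, p_value). For 1 degree of freedom the chi-square
    survival function equals erfc(sqrt(x / 2)).
    """
    stat = 2.0 * (loglik_with - loglik_without)
    stat = max(stat, 0.0)  # guard against tiny negative values from rounding
    return stat, math.erfc(math.sqrt(stat / 2.0))


# Hypothetical log partial likelihoods for one SNP pair.
stat, p = lrt_pvalue_1df(loglik_without=-1050.0, loglik_with=-1040.0)
```

    With many SNP pairs tested, a p-value obtained this way would still need a multiple-testing threshold before being called significant.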

  10. A scan statistic for identifying chromosomal patterns of SNP association.

    PubMed

    Sun, Yan V; Levin, Albert M; Boerwinkle, Eric; Robertson, Henry; Kardia, Sharon L R

    2006-11-01

    We have developed a single nucleotide polymorphism (SNP) association scan statistic that takes into account the complex distribution of the human genome variation in the identification of chromosomal regions with significant SNP associations. This scan statistic has wide applicability for genetic analysis, whether to identify important chromosomal regions associated with common diseases based on whole-genome SNP association studies or to identify disease susceptibility genes based on dense SNP positional candidate studies. To illustrate this method, we analyzed patterns of SNP associations on chromosome 19 in a large cohort study. Among 2,944 SNPs, we found seven regions that contained clusters of significantly associated SNPs. The average width of these regions was 35 kb with a range of 10-72 kb. We compared the scan statistic results to Fisher's product method using a sliding window approach, and detected 22 regions with significant clusters of SNP associations. The average width of these regions was 131 kb with a range of 10.1-615 kb. Given that the distances between SNPs are not taken into consideration in the sliding window approach, it is likely that a large fraction of these regions represents false positives. However, all seven regions detected by the scan statistic were also detected by the sliding window approach. The linkage disequilibrium (LD) patterns within the seven regions were highly variable indicating that the clusters of SNP associations were not due to LD alone. The scan statistic developed here can be used to make gene-based or region-based SNP inferences about disease association.
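
    The sliding-window comparison method can be illustrated with a short sketch of Fisher's product method, which combines k p-values as X = -2 Σ ln p and refers X to a chi-square distribution with 2k degrees of freedom. This is not the scan statistic itself. The sketch assumes independent tests (which LD between neighbouring SNPs violates) and windows defined by SNP count rather than physical distance, the limitation the abstract points out; the function names and p-values are hypothetical.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Combine p-values with Fisher's product method.

    X = -2 * sum(ln p) is chi-square with 2k degrees of freedom under the
    null. For even degrees of freedom the survival function has the closed
    form exp(-x/2) * sum_{i<k} (x/2)^i / i!, so no external library is needed.
    """
    k = len(pvalues)
    half_x = -sum(math.log(p) for p in pvalues)  # this is X / 2
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total


def sliding_window_scan(pvalues, window):
    """Combined p-value for every run of `window` consecutive SNPs."""
    return [
        fisher_combined_pvalue(pvalues[i:i + window])
        for i in range(len(pvalues) - window + 1)
    ]


# Hypothetical per-SNP association p-values along a chromosome.
snp_p = [0.60, 0.04, 0.03, 0.01, 0.45, 0.80]
window_p = sliding_window_scan(snp_p, window=3)
```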

  11. A Novel Test for Detecting SNP-SNP Interactions in Case-Only Trio Studies.

    PubMed

    Balliu, Brunilda; Zaitlen, Noah

    2016-04-01

    Epistasis plays a significant role in the genetic architecture of many complex phenotypes in model organisms. To date, there have been very few interactions replicated in human studies due in part to the multiple-hypothesis burden implicit in genome-wide tests of epistasis. Therefore, it is of paramount importance to develop the most powerful tests possible for detecting interactions. In this work we develop a new SNP-SNP interaction test for use in case-only trio studies called the trio correlation (TC) test. The TC test computes the expected joint distribution of marker pairs in offspring conditional on parental genotypes. This distribution is then incorporated into a standard 1 d.f. correlation test of interaction. We show via extensive simulations under a variety of disease models that our test substantially outperforms existing tests of interaction in case-only trio studies. We also demonstrate a bias in a previous case-only trio interaction test and identify its origin. Finally, we show that a previously proposed permutation scheme in trio studies mitigates the known biases of case-only tests in the presence of population stratification. We conclude that the TC test shows improved power to identify interactions in existing, as well as emerging, trio association studies. The method is publicly available at www.github.com/BrunildaBalliu/TrioEpi.

  12. A SNP resource for Douglas-fir: de novo transcriptome assembly and SNP detection and validation

    PubMed Central

    2013-01-01

    Background Douglas-fir (Pseudotsuga menziesii), one of the most economically and ecologically important tree species in the world, also has one of the largest tree breeding programs. Although the coastal and interior varieties of Douglas-fir (vars. menziesii and glauca) are native to North America, the coastal variety is also widely planted for timber production in Europe, New Zealand, Australia, and Chile. Our main goal was to develop a SNP resource large enough to facilitate genomic selection in Douglas-fir breeding programs. To accomplish this, we developed a 454-based reference transcriptome for coastal Douglas-fir, annotated and evaluated the quality of the reference, identified putative SNPs, and then validated a sample of those SNPs using the Illumina Infinium genotyping platform. Results We assembled a reference transcriptome consisting of 25,002 isogroups (unique gene models) and 102,623 singletons from 2.76 million 454 and Sanger cDNA sequences from coastal Douglas-fir. We identified 278,979 unique SNPs by mapping the 454 and Sanger sequences to the reference, and by mapping four datasets of Illumina cDNA sequences from multiple seed sources, genotypes, and tissues. The Illumina datasets represented coastal Douglas-fir (64.00 and 13.41 million reads), interior Douglas-fir (80.45 million reads), and a Yakima population similar to interior Douglas-fir (8.99 million reads). We assayed 8067 SNPs on 260 trees using an Illumina Infinium SNP genotyping array. Of these SNPs, 5847 (72.5%) were called successfully and were polymorphic. Conclusions Based on our validation efficiency, our SNP database may contain as many as ~200,000 true SNPs, and as many as ~69,000 SNPs that could be genotyped at ~20,000 gene loci using an Infinium II array—more SNPs than are needed to use genomic selection in tree breeding programs. 
    Ultimately, these genomic resources will enhance Douglas-fir breeding and allow us to better understand landscape-scale patterns of genetic variation
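
    The validation figures in this abstract can be checked with a few lines of arithmetic. The sketch assumes that the "~200,000 true SNPs" estimate comes from applying the array validation rate to all putative SNPs; the abstract does not state the calculation explicitly, and the ~69,000 figure is not derived here.

```python
# Counts reported in the abstract.
assayed_snps = 8067        # SNPs placed on the Infinium array
validated_snps = 5847      # called successfully and polymorphic
putative_snps = 278_979    # unique SNPs identified in the transcriptome

# Validation efficiency, reported as 72.5%.
validation_rate = validated_snps / assayed_snps

# Assumed extrapolation: the same rate applied to every putative SNP.
estimated_true_snps = putative_snps * validation_rate

print(f"validation rate: {validation_rate:.1%}")
print(f"estimated true SNPs: {estimated_true_snps:,.0f}")
```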

  13. Cardiovascular pharmacogenetics in the SNP era.

    PubMed

    Mooser, V; Waterworth, D M; Isenhour, T; Middleton, L

    2003-07-01

    In the past pharmacological agents have contributed to a significant reduction in age-adjusted incidence of cardiovascular events. However, not all patients treated with these agents respond favorably, and some individuals may develop side-effects. With aging of the population and the growing prevalence of cardiovascular risk factors worldwide, it is expected that the demand for cardiovascular drugs will increase in the future. Accordingly, there is a growing need to identify the 'good' responders as well as the persons at risk for developing adverse events. Evidence is accumulating to indicate that responses to drugs are at least partly under genetic control. As such, pharmacogenetics - the study of variability in drug responses attributed to hereditary factors in different populations - may significantly assist in providing answers toward meeting this challenge. Pharmacogenetics mostly relies on associations between a specific genetic marker like single nucleotide polymorphisms (SNPs), either alone or arranged in a specific linear order on a certain chromosomal region (haplotypes), and a particular response to drugs. Numerous associations have been reported between selected genotypes and specific responses to cardiovascular drugs. Recently, for instance, associations have been reported between specific alleles of the apoE gene and the lipid-lowering response to statins, or the lipid-elevating effect of isotretinoin. Thus far, these types of studies have been mostly limited to a priori selected candidate genes due to restricted genotyping and analytical capacities. Thanks to the large number of SNPs now available in the public domain through the SNP Consortium and the newly developed technologies (high throughput genotyping, bioinformatics software), it is now possible to interrogate more than 200,000 SNPs distributed over the entire human genome. 
    One pharmacogenetic study using this approach has been launched by GlaxoSmithKline to identify the approximately 4% of

  14. RASSF1A and the rs2073498 Cancer Associated SNP

    PubMed Central

    Donninger, Howard; Barnoud, Thibaut; Nelson, Nick; Kassler, Suzanna; Clark, Jennifer; Cummins, Timothy D.; Powell, David W.; Nyante, Sarah; Millikan, Robert C.; Clark, Geoffrey J.

    2011-01-01

    RASSF1A is one of the most frequently inactivated tumor suppressors yet identified in human cancer. It is pro-apoptotic and appears to function as a scaffolding protein that interacts with a variety of other tumor suppressors to modulate their function. It can also complex with the Ras oncoprotein and may serve to integrate pro-growth and pro-death signaling pathways. A SNP has been identified that is present in approximately 29% of European populations [rs2073498, A(133)S]. Several studies have now presented evidence that this SNP is associated with an enhanced risk of developing breast cancer. We have used a proteomics based approach to identify multiple differences in the pattern of protein/protein interactions mediated by the wild type compared to the SNP variant protein. We have also identified a significant difference in biological activity between wild type and SNP variant protein. However, we have found only a very modest association of the SNP with breast cancer predisposition. PMID:22649770

  15. DoGSD: the dog and wolf genome SNP database.

    PubMed

    Bai, Bing; Zhao, Wen-Ming; Tang, Bi-Xia; Wang, Yan-Qing; Wang, Lu; Zhang, Zhang; Yang, He-Chuan; Liu, Yan-Hu; Zhu, Jun-Wei; Irwin, David M; Wang, Guo-Dong; Zhang, Ya-Ping

    2015-01-01

    The rapid advancement of next-generation sequencing technology has generated a deluge of genomic data from domesticated dogs and their wild ancestor, grey wolves, which have simultaneously broadened our understanding of domestication and diseases that are shared by humans and dogs. To address the scarcity of single nucleotide polymorphism (SNP) data provided by authorized databases and to make SNP data more accessible and user-friendly, we propose DoGSD (http://dogsd.big.ac.cn), the first Canidae-specific database which focuses on whole genome SNP data from domesticated dogs and grey wolves. The DoGSD is a web-based, open-access resource comprising ∼ 19 million high-quality whole-genome SNPs. In addition to the dbSNP data set (build 139), DoGSD incorporates a comprehensive collection of SNPs from two newly sequenced samples (1 wolf and 1 dog) and collected SNPs from the three latest dog/wolf genetic studies (7 wolves and 68 dogs), which were taken together for analysis with the population genetic statistic Fst. In addition, DoGSD integrates some closely related information including SNP annotation, summary lists of SNPs located in genes, synonymous and non-synonymous SNPs, sampling location and breed information. All these features make DoGSD a useful resource for in-depth analysis in dog-/wolf-related studies.

  16. PanSNPdb: the Pan-Asian SNP genotyping database.

    PubMed

    Ngamphiw, Chumpol; Assawamakin, Anunchai; Xu, Shuhua; Shaw, Philip J; Yang, Jin Ok; Ghang, Ho; Bhak, Jong; Liu, Edison; Tongsima, Sissades

    2011-01-01

    The HUGO Pan-Asian SNP consortium conducted the largest survey to date of human genetic diversity among Asians by sampling 1,719 unrelated individuals among 71 populations from China, India, Indonesia, Japan, Malaysia, the Philippines, Singapore, South Korea, Taiwan, and Thailand. We have constructed a database (PanSNPdb), which contains these data and various new analyses of them. PanSNPdb is a research resource in the analysis of the population structure of Asian peoples, including linkage disequilibrium patterns, haplotype distributions, and copy number variations. Furthermore, PanSNPdb provides an interactive comparison with other SNP and CNV databases, including HapMap3, JSNP, dbSNP and DGV and thus provides a comprehensive resource of human genetic diversity. The information is accessible via a widely accepted graphical interface used in many genetic variation databases. Unrestricted access to PanSNPdb and any associated files is available at: http://www4a.biotec.or.th/PASNP.

  17. Forensic SNP Genotyping using Nanopore MinION Sequencing

    PubMed Central

    Cornelis, Senne; Gansemans, Yannick; Deleye, Lieselot; Deforce, Dieter; Van Nieuwerburgh, Filip

    2017-01-01

    One of the latest developments in next generation sequencing is the Oxford Nanopore Technologies’ (ONT) MinION nanopore sequencer. We studied the applicability of this system to perform forensic genotyping of the forensic female DNA standard 9947 A using the 52 SNP-plex assay developed by the SNPforID consortium. All but one of the loci were correctly genotyped. Several SNP loci were identified as problematic for correct and robust genotyping using nanopore sequencing. All these loci contained homopolymers in the sequence flanking the forensic SNP and most of them were already reported as problematic in studies using other sequencing technologies. When these problematic loci are avoided, correct forensic genotyping using nanopore sequencing is technically feasible. PMID:28155888

  18. Forensic SNP Genotyping using Nanopore MinION Sequencing.

    PubMed

    Cornelis, Senne; Gansemans, Yannick; Deleye, Lieselot; Deforce, Dieter; Van Nieuwerburgh, Filip

    2017-02-03

    One of the latest developments in next generation sequencing is the Oxford Nanopore Technologies' (ONT) MinION nanopore sequencer. We studied the applicability of this system to perform forensic genotyping of the forensic female DNA standard 9947 A using the 52 SNP-plex assay developed by the SNPforID consortium. All but one of the loci were correctly genotyped. Several SNP loci were identified as problematic for correct and robust genotyping using nanopore sequencing. All these loci contained homopolymers in the sequence flanking the forensic SNP and most of them were already reported as problematic in studies using other sequencing technologies. When these problematic loci are avoided, correct forensic genotyping using nanopore sequencing is technically feasible.

  19. Population distribution and ancestry of the cancer protective MDM2 SNP285 (rs117039649).

    PubMed

    Knappskog, Stian; Gansmo, Liv B; Dibirova, Khadizha; Metspalu, Andres; Cybulski, Cezary; Peterlongo, Paolo; Aaltonen, Lauri; Vatten, Lars; Romundstad, Pål; Hveem, Kristian; Devilee, Peter; Evans, Gareth D; Lin, Dongxin; Van Camp, Guy; Manolopoulos, Vangelis G; Osorio, Ana; Milani, Lili; Ozcelik, Tayfun; Zalloua, Pierre; Mouzaya, Francis; Bliznetz, Elena; Balanovska, Elena; Pocheshkova, Elvira; Kučinskas, Vaidutis; Atramentova, Lubov; Nymadawa, Pagbajabyn; Titov, Konstantin; Lavryashina, Maria; Yusupov, Yuldash; Bogdanova, Natalia; Koshel, Sergey; Zamora, Jorge; Wedge, David C; Charlesworth, Deborah; Dörk, Thilo; Balanovsky, Oleg; Lønning, Per E

    2014-09-30

    The MDM2 promoter SNP285C is located on the SNP309G allele. While SNP309G enhances Sp1 transcription factor binding and MDM2 transcription, SNP285C antagonizes Sp1 binding and reduces the risk of breast, ovarian and endometrial cancer. Assessing SNP285 and 309 genotypes across 25 different ethnic populations (>10,000 individuals), the incidence of SNP285C was 6-8% across European populations except for Finns (1.2%) and Saami (0.3%). The incidence decreased towards the Middle-East and Eastern Russia, and SNP285C was absent among Han Chinese, Mongolians and African Americans. Interhaplotype variation analyses estimated SNP285C to have originated about 14,700 years ago (95% CI: 8,300 - 33,300). Both this estimate and the geographical distribution suggest SNP285C to have arisen after the separation between Caucasians and modern day East Asians (17,000 - 40,000 years ago). We observed a strong inverse correlation (r = -0.805; p < 0.001) between the percentage of SNP309G alleles harboring SNP285C and the MAF for SNP309G itself across different populations, suggesting selection and environmental adaptation with respect to MDM2 expression in recent human evolution. In conclusion, we found SNP285C to be a pan-Caucasian variant. Ethnic variation regarding distribution of SNP285C needs to be taken into account when assessing the impact of MDM2 SNPs on cancer risk.

  20. A 48 SNP set for grapevine cultivar identification

    PubMed Central

    2011-01-01

    Background Rapid and consistent genotyping is an important requirement for cultivar identification in many crop species. Among them grapevine cultivars have been the subject of multiple studies given the large number of synonyms and homonyms generated during many centuries of vegetative multiplication and exchange. Simple sequence repeat (SSR) markers have been preferred until now because of their high level of polymorphism, their codominant nature and their high profile repeatability. However, the rapid application of partial or complete genome sequencing approaches is identifying thousands of single nucleotide polymorphisms (SNP) that can be very useful for such purposes. Although SNP markers are bi-allelic, and therefore not as polymorphic as microsatellites, the high number of loci that can be multiplexed and the possibilities of automation as well as their highly repeatable results under any analytical procedure make them the future markers of choice for any type of genetic identification. Results We analyzed over 300 SNP in the genome of grapevine using a re-sequencing strategy in a selection of 11 genotypes. Among the identified polymorphisms, we selected 48 SNP spread across all grapevine chromosomes with allele frequencies balanced enough as to provide sufficient information content for genetic identification in grapevine allowing for good genotyping success rate. Marker stability was tested in repeated analyses of a selected group of cultivars obtained worldwide to demonstrate their usefulness in genetic identification. Conclusions We have selected a set of 48 stable SNP markers with a high discrimination power and a uniform genome distribution (2-3 markers/chromosome), which is proposed as a standard set for grapevine (Vitis vinifera L.) genotyping. Any previous problems derived from microsatellite allele confusion between labs or the need to run reference cultivars to identify allele sizes disappear using this type of marker. Furthermore, because SNP

  1. Sniper: improved SNP discovery by multiply mapping deep sequenced reads.

    PubMed

    Simola, Daniel F; Kim, Junhyong

    2011-06-20

    SNP (single nucleotide polymorphism) discovery using next-generation sequencing data remains difficult primarily because of redundant genomic regions, such as interspersed repetitive elements and paralogous genes, present in all eukaryotic genomes. To address this problem, we developed Sniper, a novel multi-locus Bayesian probabilistic model and a computationally efficient algorithm that explicitly incorporates sequence reads that map to multiple genomic loci. Our model fully accounts for sequencing error, template bias, and multi-locus SNP combinations, maintaining high sensitivity and specificity under a broad range of conditions. An implementation of Sniper is freely available at http://kim.bio.upenn.edu/software/sniper.shtml.

  2. Evidence for SNP-SNP interaction identified through targeted sequencing of cleft case-parent trios.

    PubMed

    Xiao, Yanzi; Taub, Margaret A; Ruczinski, Ingo; Begum, Ferdouse; Hetmanski, Jacqueline B; Schwender, Holger; Leslie, Elizabeth J; Koboldt, Daniel C; Murray, Jeffrey C; Marazita, Mary L; Beaty, Terri H

    2017-04-01

    Nonsyndromic cleft lip with or without cleft palate (NSCL/P) is the most common craniofacial birth defect in humans, affecting 1 in 700 live births. This malformation has a complex etiology where multiple genes and several environmental factors influence risk. At least a dozen different genes have been confirmed to be associated with risk of NSCL/P in previous studies. However, all the known genetic risk factors cannot fully explain the observed heritability of NSCL/P, and several authors have suggested gene-gene (G × G) interaction may be important in the etiology of this complex and heterogeneous malformation. We tested for G × G interactions using common single nucleotide polymorphic (SNP) markers from targeted sequencing in 13 regions identified by previous studies spanning 6.3 Mb of the genome in a study of 1,498 NSCL/P case-parent trios. We used the R-package trio to assess interactions between polymorphic markers in different genes, using a 1 degree of freedom (1df) test for screening, and a 4 degree of freedom (4df) test to assess statistical significance of epistatic interactions. To adjust for multiple comparisons, we performed permutation tests. The most significant interaction was observed between rs6029315 in MAFB and rs6681355 in IRF6 (4df P = 3.8 × 10^-8) in case-parent trios of European ancestry, which remained significant after correcting for multiple comparisons. However, no significant interaction was detected in trios of Asian ancestry.

  3. Software solutions for the livestock genomics SNP array revolution.

    PubMed

    Nicolazzi, E L; Biffani, S; Biscarini, F; Orozco Ter Wengel, P; Caprera, A; Nazzicari, N; Stella, A

    2015-08-01

    Since the beginning of the genomic era, the number of available single nucleotide polymorphism (SNP) arrays has grown considerably. In the bovine species alone, 11 SNP chips not completely covered by intellectual property are currently available, and the number is growing. Genomic/genotype data are not standardized, and this hampers their exchange and integration. In addition, software used for the analyses of these data usually requires non-standard (i.e. case-specific) input files which, considering the large amount of data to be handled, require at least some programming skills to produce. In this work, we describe a software toolkit for SNP array data management, imputation, genome-wide association studies, population genetics and genomic selection. However, this toolkit does not solve the critical need for standardization of the genotypic data and software input files. It only highlights the chaotic situation each researcher has to face on a daily basis and gives some helpful advice on the currently available tools in order to navigate the SNP array data complexity.

  4. Target SNP selection in complex disease association studies

    PubMed Central

    Wjst, Matthias

    2004-01-01

    Background The massive amount of SNP data stored at public internet sites provides unprecedented access to human genetic variation. Selecting target SNPs for disease-gene association studies is currently done more or less randomly, as decision rules for the selection of functionally relevant SNPs are not available. Results We implemented a computational pipeline that retrieves the genomic sequence of target genes, collects information about sequence variation and selects functional motifs containing SNPs. The motifs considered are gene promoter, exon-intron structure, AU-rich mRNA elements, transcription factor binding motifs, cryptic and enhancer splice sites together with expression in target tissue. As a case study, 396 genes on chromosome 6p21 in the extended HLA region were selected that contributed nearly 20,000 SNPs. By computer annotation ~2,500 SNPs in functional motifs could be identified. Most of these SNPs disrupt transcription factor binding sites, but only those introducing new sites had a significant depressing effect on SNP allele frequency. Other decision rules concern position within motifs, the validity of SNP database entries, the unique occurrence in the genome and conserved sequence context in other mammalian genomes. Conclusion Only 10% of all gene-based SNPs have sequence-predicted functional relevance, making them a primary target for genotyping in association studies. PMID:15248903

  5. Do you really know where this SNP goes?

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The release of build 10.2 of the swine genome was a marked improvement over previous builds and has proven extremely useful. However, as most know, there are regions of the genome that this particular build does not accurately represent. For instance, nearly 25% of the 62,162 SNP on the Illumina Por...

  6. Genetic mapping in grapevine using a SNP microarray: intensity values

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Genotyping microarrays are widely used for genome wide association studies, but in high-diversity organisms, the quality of SNP calls can be diminished by genetic variation near the assayed nucleotide. To address this limitation in grapevine, we developed a simple heuristic that uses hybridization i...

  7. High throughput SNP detection system based on magnetic nanoparticles separation.

    PubMed

    Liu, Bin; Jia, Yingying; Ma, Man; Li, Zhiyang; Liu, Hongna; Li, Song; Deng, Yan; Zhang, Liming; Lu, Zhuoxuan; Wang, Wei; He, Nongyue

    2013-02-01

    Single-nucleotide polymorphisms (SNPs) are one-base variations in DNA sequence that can often help to find gene associations for hereditary disease, communicable disease and so on. We developed a high throughput SNP detection system based on magnetic nanoparticles (MNPs) separation and dual-color hybridization or single base extension. This system includes a magnetic separation unit for sample separation, three high precision robot arms for pipetting and microtiter plate transferring respectively, an accurate temperature control unit for PCR and DNA hybridization and a highly accurate and sensitive optical signal detection unit for fluorescence detection. A SNP genotyping experiment on the cyclooxygenase-2 gene promoter region--65G > C polymorphism locus was carried out for 48 samples from the northern Jiangsu area to verify whether this system can simplify manual operation by researchers, save time and improve efficiency in SNP genotyping experiments. It can perform sample preparation, target sequence amplification, signal detection and data analysis automatically and can be used in clinical molecular diagnosis, high throughput fluorescence immunological detection and so on.

  8. Weighted SNP set analysis in genome-wide association study.

    PubMed

    Dai, Hui; Zhao, Yang; Qian, Cheng; Cai, Min; Zhang, Ruyang; Chu, Minjie; Dai, Juncheng; Hu, Zhibin; Shen, Hongbing; Chen, Feng

    2013-01-01

    Genome-wide association studies (GWAS) are popular for identifying genetic variants which are associated with disease risk. Many approaches have been proposed to test multiple single nucleotide polymorphisms (SNPs) in a region simultaneously, in view of the disadvantages of single-locus association analysis. Kernel machine based SNP set analysis, which borrows information from SNPs correlated with causal or tag SNPs, is more powerful than single locus analysis. Four types of kernel machine functions and a principal component based approach (PCA) were also compared. However, given the loss of power caused by low minor allele frequencies (MAF), we extended PCA into a new method called weighted PCA (wPCA). Comparative analysis was performed for weighted principal component analysis (wPCA), logistic kernel machine based test (LKM) and principal component analysis (PCA) based on SNP set in the case of different minor allele frequencies (MAF) and linkage disequilibrium (LD) structures. We also applied the three methods to analyze two SNP sets extracted from a real GWAS dataset of non-small cell lung cancer in a Han Chinese population. Simulation results show that when the MAF of the causal SNP is low, weighted principal component and weighted IBS are more powerful than PCA and other kernel machine functions at different LD structures and different numbers of causal SNPs. Application of the three methods to a real GWAS dataset indicates that wPCA and wIBS have better performance than the linear kernel, IBS kernel and PCA.

  9. High throughput SNP discovery and validation in the pig: towards the development of a high density swine SNP chip

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Recent developments in sequencing technology have allowed the generation of millions of short read sequences in a fast and inexpensive way. This enables the cost effective large scale identification of hundreds of thousands of SNPs needed for the development of high density SNP arrays. Currently, a ...

  10. Large-Scale SNP Discovery through RNA Sequencing and SNP Genotyping by Targeted Enrichment Sequencing in Cassava (Manihot esculenta Crantz)

    PubMed Central

    Pootakham, Wirulda; Shearman, Jeremy R.; Ruang-areerate, Panthita; Sonthirod, Chutima; Sangsrakru, Duangjai; Jomchai, Nukoon; Yoocha, Thippawan; Triwitayakorn, Kanokporn; Tragoonrung, Somvong; Tangphatsornruang, Sithichoke

    2014-01-01

    Cassava (Manihot esculenta Crantz) is one of the most important crop species being the main source of dietary energy in several countries. Marker-assisted selection has become an essential tool in plant breeding. Single nucleotide polymorphism (SNP) discovery via transcriptome sequencing is an attractive strategy for genome complexity reduction in organisms with large genomes. We sequenced the transcriptome of 16 cassava accessions using the Illumina HiSeq platform and identified 675,559 EST-derived SNP markers. A subset of those markers was subsequently genotyped by capture-based targeted enrichment sequencing in 100 F1 progeny segregating for starch viscosity phenotypes. A total of 2,110 non-redundant SNP markers were used to construct a genetic map. This map encompasses 1,785 cM and consists of 19 linkage groups. A major quantitative trait locus (QTL) controlling starch pasting properties was identified and shown to coincide with the QTL previously reported for this trait. With a high-density SNP-based linkage map presented here, we also uncovered a novel QTL associated with starch pasting time on LG 10. PMID:25551642

  11. Large-scale SNP discovery through RNA sequencing and SNP genotyping by targeted enrichment sequencing in cassava (Manihot esculenta Crantz).

    PubMed

    Pootakham, Wirulda; Shearman, Jeremy R; Ruang-Areerate, Panthita; Sonthirod, Chutima; Sangsrakru, Duangjai; Jomchai, Nukoon; Yoocha, Thippawan; Triwitayakorn, Kanokporn; Tragoonrung, Somvong; Tangphatsornruang, Sithichoke

    2014-01-01

    Cassava (Manihot esculenta Crantz) is one of the most important crop species being the main source of dietary energy in several countries. Marker-assisted selection has become an essential tool in plant breeding. Single nucleotide polymorphism (SNP) discovery via transcriptome sequencing is an attractive strategy for genome complexity reduction in organisms with large genomes. We sequenced the transcriptome of 16 cassava accessions using the Illumina HiSeq platform and identified 675,559 EST-derived SNP markers. A subset of those markers was subsequently genotyped by capture-based targeted enrichment sequencing in 100 F1 progeny segregating for starch viscosity phenotypes. A total of 2,110 non-redundant SNP markers were used to construct a genetic map. This map encompasses 1,785 cM and consists of 19 linkage groups. A major quantitative trait locus (QTL) controlling starch pasting properties was identified and shown to coincide with the QTL previously reported for this trait. With a high-density SNP-based linkage map presented here, we also uncovered a novel QTL associated with starch pasting time on LG 10.

  12. High-throughput SNP genotyping for breeding applications in rice using the BeadXpress platform

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Multiplexed single nucleotide polymorphism (SNP) markers have the potential to increase the speed and cost-effectiveness of genotyping, provided that an optimal SNP density is used for each application. To test the efficiency of multiplexed SNP genotyping for diversity, mapping and breeding applicat...

  13. Development of Single Nucleotide Polymorphism (SNP) Markers for Use in Commercial Maize (Zea Mays L.) Germplasm

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The development of single nucleotide polymorphism (SNP) markers in maize offer the opportunity to utilize DNA markers in many new areas of population genetics, gene discovery, plant breeding, and germplasm identification. However, the steps from sequencing and SNP discovery to SNP marker design and ...

  14. SNP genotyping using single-tube fluorescent bidirectional PCR.

    PubMed

    Waterfall, Christy M; Cobb, Benjamin D

    2002-07-01

    SNP genotyping is a well-populated field with a large number of assay formats offering accurate allelic discrimination. However, there remains a discord between the ultimate goal of rapid, inexpensive assays that do not require complex design considerations and involved optimization strategies. We describe the first integration of bidirectional allele-specific amplification, SYBR Green I, and rapid-cycle PCR to provide a homogeneous SNP-typing assay. Wild-type, mutant, and heterozygous alleles were easily discriminated in a single tube using melt curve profiling of PCR products alone. We demonstrate the effectiveness and reliability of this assay with a blinded trial using clinical samples from individuals with sickle cell anemia, sickle cell trait, or unaffected individuals. The tests were completed in less than 30 min without expensive fluorogenic probes, prohibitive design rules, or lengthy downstream processing for product analysis.

  15. Pyrobayes: an improved base caller for SNP discovery in pyrosequences.

    PubMed

    Quinlan, Aaron R; Stewart, Donald A; Strömberg, Michael P; Marth, Gábor T

    2008-02-01

    Previously reported applications of the 454 Life Sciences pyrosequencing technology have relied on deep sequence coverage for accurate polymorphism discovery because of frequent insertion and deletion sequence errors. Here we report a new base calling program, Pyrobayes, for pyrosequencing reads. Pyrobayes permits accurate single-nucleotide polymorphism (SNP) calling in resequencing applications, even in shallow read coverage, primarily because it produces more confident base calls than the native base calling program.

  16. Development of SNP-genotyping arrays in two shellfish species.

    PubMed

    Lapègue, S; Harrang, E; Heurtebise, S; Flahauw, E; Donnadieu, C; Gayral, P; Ballenghien, M; Genestout, L; Barbotte, L; Mahla, R; Haffray, P; Klopp, C

    2014-07-01

    Use of SNPs has been favoured due to their abundance in plant and animal genomes, accompanied by the falling cost and rising throughput capacity for detection and genotyping. Here, we present in vitro (obtained from targeted sequencing) and in silico discovery of SNPs, and the design of medium-throughput genotyping arrays for two oyster species, the Pacific oyster, Crassostrea gigas, and European flat oyster, Ostrea edulis. Two sets of 384 SNP markers were designed for two Illumina GoldenGate arrays and genotyped on more than 1000 samples for each species. In each case, oyster samples were obtained from wild and selected populations and from three-generation families segregating for traits of interest in aquaculture. The rate of successfully genotyped polymorphic SNPs was about 60% for each species. Effects of SNP origin and quality on genotyping success (Illumina functionality Score) were analysed and compared with other model and nonmodel species. Furthermore, a simulation was made based on a subset of the C. gigas SNP array with a minor allele frequency of 0.3 and typical crosses used in shellfish hatcheries. This simulation indicated that at least 150 markers were needed to perform an accurate parental assignment. Such panels might provide valuable tools to improve our understanding of the connectivity between wild (and selected) populations and could contribute to future selective breeding programmes.

  17. Development of a forensic identity SNP panel for Indonesia.

    PubMed

    Augustinus, Daniel; Gahan, Michelle E; McNevin, Dennis

    2015-07-01

    Genetic markers included in forensic identity panels must exhibit Hardy-Weinberg and linkage equilibrium (HWE and LE). "Universal" panels designed for global use can fail these tests in regional jurisdictions exhibiting high levels of genetic differentiation such as the Indonesian archipelago. This is especially the case where a single DNA database is required for allele frequency estimates to calculate random match probabilities (RMPs) and associated likelihood ratios (LRs). A panel of 65 single nucleotide polymorphisms (SNPs) and a reduced set of 52 SNPs have been selected from 15 Indonesian subpopulations in the HUGO Pan Asian SNP database using a SNP selection strategy that could be applied to any panel of forensic identity markers. The strategy consists of four screening steps: (1) application of a G test for HWE; (2) ranking for high heterozygosity; (3) selection for LE; and (4) selection for low inbreeding depression. SNPs in our Indonesian panel perform well in comparison to some other universal SNP and short tandem repeat (STR) panels as measured by Fisher's exact test for HWE and LE and Wright's F statistics.
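
    The first screening step named above, a G test for HWE, can be sketched for a single biallelic SNP as follows. This is a minimal illustration, not the authors' implementation: the function name, the genotype-count inputs and the use of a chi-square reference distribution with one degree of freedom are assumptions made for the example.

```python
import math


def hwe_g_test(n_aa, n_ab, n_bb):
    """G test of Hardy-Weinberg proportions for one biallelic SNP.

    Takes observed genotype counts (AA, AB, BB) and returns (G, p_value),
    where the p-value assumes a chi-square reference with 1 degree of freedom.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes observed")
    p = (2 * n_aa + n_ab) / (2 * n)  # estimated frequency of allele A
    q = 1.0 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    # Cells with zero observations contribute nothing to G.
    g = 2.0 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)
    g = max(g, 0.0)  # guard against tiny negative rounding error
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(g / 2.0))
    return g, p_value


# Hypothetical genotype counts, for illustration only.
g_stat, p_val = hwe_g_test(30, 50, 20)
print(f"G = {g_stat:.3f}, p = {p_val:.3f}")
```

    A marker would be dropped at this step when the p-value falls below whatever significance threshold the panel designers choose; the abstract does not state that threshold.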

  18. Population distribution and ancestry of the cancer protective MDM2 SNP285 (rs117039649)

    PubMed Central

    Knappskog, Stian; Gansmo, Liv B.; Dibirova, Khadizha; Metspalu, Andres; Cybulski, Cezary; Peterlongo, Paolo; Aaltonen, Lauri; Vatten, Lars; Romundstad, Pål; Hveem, Kristian; Devilee, Peter; Evans, Gareth D.; Lin, Dongxin; Camp, Guy Van; Manolopoulos, Vangelis G.; Osorio, Ana; Milani, Lili; Ozcelik, Tayfun; Zalloua, Pierre; Mouzaya, Francis; Bliznetz, Elena; Balanovska, Elena; Pocheshkova, Elvira; Kučinskas, Vaidutis; Atramentova, Lubov; Nymadawa, Pagbajabyn; Titov, Konstantin; Lavryashina, Maria; Yusupov, Yuldash; Bogdanova, Natalia; Koshel, Sergey; Zamora, Jorge; Wedge, David C.; Charlesworth, Deborah; Dörk, Thilo; Balanovsky, Oleg; Lønning, Per E.

    2014-01-01

    The MDM2 promoter SNP285C is located on the SNP309G allele. While SNP309G enhances Sp1 transcription factor binding and MDM2 transcription, SNP285C antagonizes Sp1 binding and reduces the risk of breast-, ovary- and endometrial cancer. Assessing SNP285 and 309 genotypes across 25 different ethnic populations (>10,000 individuals), the incidence of SNP285C was 6-8% across European populations except for Finns (1.2%) and Saami (0.3%). The incidence decreased towards the Middle-East and Eastern Russia, and SNP285C was absent among Han Chinese, Mongolians and African Americans. Interhaplotype variation analyses estimated SNP285C to have originated about 14,700 years ago (95% CI: 8,300 – 33,300). Both this estimate and the geographical distribution suggest SNP285C to have arisen after the separation between Caucasians and modern day East Asians (17,000 - 40,000 years ago). We observed a strong inverse correlation (r = -0.805; p < 0.001) between the percentage of SNP309G alleles harboring SNP285C and the MAF for SNP309G itself across different populations suggesting selection and environmental adaptation with respect to MDM2 expression in recent human evolution. In conclusion, we found SNP285C to be a pan-Caucasian variant. Ethnic variation regarding distribution of SNP285C needs to be taken into account when assessing the impact of MDM2 SNPs on cancer risk. PMID:25327560

  19. Forensic SNP genotyping with SNaPshot: Technical considerations for the development and optimization of multiplexed SNP assays.

    PubMed

    Fondevila, M; Børsting, C; Phillips, C; de la Puente, M; Consortium, Euroforen-NoE; Carracedo, A; Morling, N; Lareu, M V

    2017-01-01

    This review explores the key factors that influence the optimization, routine use, and profile interpretation of the SNaPshot single-base extension (SBE) system applied to forensic single-nucleotide polymorphism (SNP) genotyping. Despite being a mainly complementary DNA genotyping technique to routine STR profiling, use of SNaPshot is an important part of the development of SNP sets for a wide range of forensic applications with these markers, from genotyping highly degraded DNA with very short amplicons to the introduction of SNPs to ascertain the ancestry and physical characteristics of an unidentified contact trace donor. However, this technology, as resourceful as it is, displays several features that depart from the usual STR genotyping far enough to demand a certain degree of expertise from the forensic analyst before tackling the complex casework on which SNaPshot application provides an advantage. In order to provide the basis for developing such expertise, we cover in this paper the most challenging aspects of the SNaPshot technology, focusing on the steps taken to design primer sets, optimize the PCR and single-base extension chemistries, and the important features of the peak patterns observed in typical forensic SNP profiles using SNaPshot. With that purpose in mind, we provide guidelines and troubleshooting for multiplex-SNaPshot-oriented primer design and the resulting capillary electrophoresis (CE) profile interpretation (covering the most commonly observed artifacts and expected departures from the ideal conditions).

  20. Exploration of SNP variants affecting hair colour prediction in Europeans.

    PubMed

    Söchtig, Jens; Phillips, Chris; Maroñas, Olalla; Gómez-Tato, Antonio; Cruz, Raquel; Alvarez-Dios, Jose; de Cal, María-Ángeles Casares; Ruiz, Yarimar; Reich, Kristian; Fondevila, Manuel; Carracedo, Ángel; Lareu, María V

    2015-09-01

    DNA profiling is a key tool for forensic analysis; however, current methods identify a suspect either by direct comparison or from DNA database searches. In cases with unidentified suspects, prediction of visible physical traits e.g. pigmentation or hair distribution of the DNA donors can provide important probative information. This study aimed to explore single nucleotide polymorphism (SNP) variants for their effect on hair colour prediction. A discovery panel of 63 SNPs consisting of already established hair colour markers from the HIrisPlex hair colour phenotyping assay as well as additional markers for which associations to human pigmentation traits were previously identified was used to develop multiplex assays based on SNaPshot single-base extension technology. A genotyping study was performed on a range of European populations (n = 605). Hair colour phenotyping was accomplished by matching donor's hair to a graded colour category system of reference shades and photography. Since multiple SNPs in combination contribute in varying degrees to hair colour predictability in Europeans, we aimed to compile a compact marker set that could provide a reliable hair colour inference from the fewest SNPs. The predictive approach developed uses a naïve Bayes classifier to provide hair colour assignment probabilities for the SNP profiles of the key SNPs and was embedded into the Snipper online SNP classifier ( http://mathgene.usc.es/snipper/ ). Results indicate that red, blond, brown and black hair colours are predictable with informative probabilities in a high proportion of cases. Our study resulted in the identification of 12 most strongly associated SNPs to hair pigmentation variation in six genes.
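
    The naïve Bayes approach mentioned above can be illustrated with a small categorical classifier over SNP genotype codes. This is only a sketch under stated assumptions and not the Snipper implementation: the genotype coding (0, 1 or 2 copies of one allele), the add-one pseudocount, the function names and the toy training profiles are all hypothetical.

```python
import math
from collections import Counter


def train_naive_bayes(profiles, labels, n_genotypes=3, pseudocount=1.0):
    """Count genotype occurrences per class and per SNP.

    profiles: list of tuples, one genotype code (0, 1 or 2) per SNP.
    labels:   list of class labels, e.g. hair colour categories.
    """
    class_counts = Counter(labels)
    n_snps = len(profiles[0])
    # counts[label][snp_index][genotype] -> number of training profiles
    counts = {c: [Counter() for _ in range(n_snps)] for c in class_counts}
    for profile, label in zip(profiles, labels):
        for j, genotype in enumerate(profile):
            counts[label][j][genotype] += 1
    return class_counts, counts, n_genotypes, pseudocount


def predict_proba(model, profile):
    """Return posterior class probabilities for one SNP profile."""
    class_counts, counts, n_genotypes, pseudocount = model
    total = sum(class_counts.values())
    log_post = {}
    for c, n_c in class_counts.items():
        lp = math.log(n_c / total)  # class prior from training frequencies
        for j, genotype in enumerate(profile):
            # Smoothed genotype likelihood, SNPs treated as independent.
            lp += math.log((counts[c][j][genotype] + pseudocount)
                           / (n_c + pseudocount * n_genotypes))
        log_post[c] = lp
    # Normalise in log space to avoid underflow with many SNPs.
    m = max(log_post.values())
    z = sum(math.exp(v - m) for v in log_post.values())
    return {c: math.exp(v - m) / z for c, v in log_post.items()}


# Hypothetical two-SNP training data, for illustration only.
train_profiles = [(2, 0), (2, 1), (1, 0), (0, 2), (0, 1), (1, 2)]
train_labels = ["red", "red", "red", "black", "black", "black"]
model = train_naive_bayes(train_profiles, train_labels)
print(predict_proba(model, (2, 0)))
```

    The independence assumption between SNPs is what makes the classifier "naïve"; markers in strong linkage disequilibrium would violate it, which is one reason a compact marker set is attractive.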

  1. Etiological yield of SNP microarrays in idiopathic intellectual disability.

    PubMed

    Utine, G Eda; Haliloğlu, Göknur; Volkan-Salancı, Bilge; Çetinkaya, Arda; Kiper, Pelin Ö; Alanay, Yasemin; Aktaş, Dilek; Anlar, Banu; Topçu, Meral; Boduroğlu, Koray; Alikaşifoğlu, Mehmet

    2014-05-01

    Intellectual disability (ID) has a prevalence of 3% and is classified according to its severity. An underlying etiology cannot be determined in 75-80% of mild ID, and in 20-50% of severe ID. Since it has been shown that copy number variations involving short DNA segments may cause ID, genome-wide SNP microarrays are being used as a tool for detecting submicroscopic copy number changes and uniparental disomy. This study was performed to investigate the presence of copy number changes in patients with ID of unidentified etiology. The Affymetrix® 6.0 SNP microarray platform was used for analysis of 100 patients and their healthy parents, and data were evaluated using various databases and literature. Etiological diagnoses were made in 12 patients (12%). Homozygous deletion in the NRXN1 gene and duplication in the IL1RAPL1 gene were detected for the first time. Two separate patients had deletions in FOXP2 and UBE2A genes, respectively, for which only few patients have recently been reported. Interstitial and subtelomeric copy number changes were described in 6 patients, in whom routine cytogenetic tools revealed normal results. In one patient uniparental disomy type of Angelman syndrome was diagnosed. SNP microarrays constitute a screening test able to detect very small genomic changes, with a high etiological yield even in patients already evaluated using traditional cytogenetic tools, offer analysis for uniparental disomy and homozygosity, and thereby are helpful in finding novel disease-causing genes: for these reasons they should be considered as a first-tier genetic screening test in the evaluation of patients with ID and autism.

  2. Genome-wide SNP typing reveals signatures of population history.

    PubMed

    Hughes, Austin L; Welch, Robert; Puri, Vinita; Matthews, Casey; Haque, Kashif; Chanock, Stephen J; Yeager, Meredith

    2008-07-01

    Single-nucleotide polymorphism (SNP) arrays have become a popular technology for disease-association studies, but they also have potential for studying the genetic differentiation of human populations. Application of the Affymetrix GeneChip Human Mapping 500K Array Set to a population of 102 individuals representing the major ethnic groups in the United States (African, Asian, European, and Hispanic) revealed patterns of gene diversity and genetic distance that reflected population history. We analyzed allelic frequencies at 388,654 autosomal SNP sites that showed some variation in our study population and 10% or fewer missing values. Despite the small size (23-31 individuals) of each subpopulation, there were no fixed differences at any site between any two subpopulations. As expected from the African origin of modern humans, greater gene diversity was seen in Africans than in either Asians or Europeans, and the genetic distance between the Asian and the European populations was significantly lower than that between either of these two populations and Africans. Principal components analysis applied to a correlation matrix among individuals was able to separate completely the major continental groups of humans (Africans, Asians, and Europeans), while Hispanics overlapped all three of these groups. Genes containing two or more markers with extraordinarily high genetic distance between subpopulations were identified as candidate genes for health differences between subpopulations. The results show that, even with modest sample sizes, genome-wide SNP genotyping technologies have great promise for capturing signatures of gene frequency difference between human subpopulations, with applications in areas as diverse as forensics and the study of ethnic health disparities.

  3. Computational tradeoffs in multiplex PCR assay design for SNP genotyping

    PubMed Central

    Rachlin, John; Ding, Chunming; Cantor, Charles; Kasif, Simon

    2005-01-01

    Background Multiplex PCR is a key technology for detecting infectious microorganisms, whole-genome sequencing, forensic analysis, and for enabling flexible yet low-cost genotyping. However, the design of a multiplex PCR assay requires the consideration of multiple competing objectives and physical constraints, and extensive computational analysis must be performed in order to identify the possible formation of primer-dimers that can negatively impact product yield. Results This paper examines the computational design limits of multiplex PCR in the context of SNP genotyping and examines tradeoffs associated with several key design factors including multiplexing level (the number of primer pairs per tube), coverage (the % of SNPs whose associated primers are actually assigned to one of several available tubes), and tube-size uniformity. We also examine how design performance depends on the total number of available SNPs from which to choose, and primer stringency criteria. We show that finding high-multiplexing/high-coverage designs is subject to a computational phase transition, becoming dramatically more difficult when the probability of primer pair interaction exceeds a critical threshold. The precise location of this critical transition point depends on the number of available SNPs and the level of multiplexing required. We also demonstrate how coverage performance is impacted by the number of available SNPs, primer selection criteria, and target multiplexing levels. Conclusion The presence of a phase transition suggests limits to scaling multiplex PCR performance for high-throughput genomics applications. Achieving broad SNP coverage rapidly transitions from being very easy to very hard as the target multiplexing level (# of primer pairs per tube) increases. The onset of a phase transition can be "delayed" by having a larger pool of SNPs, or loosening primer selection constraints so as to increase the number of candidate primer pairs per SNP, though the latter
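
    The design quantities defined above (multiplexing level, coverage, and primer-pair interactions that forbid two assays from sharing a tube) can be made concrete with a simple first-fit assignment. This is a hedged sketch and not the algorithm analysed in the paper: the greedy strategy, the function names, the SNP identifiers and the conflict set are assumptions chosen for illustration.

```python
def assign_to_tubes(snps, conflicts, n_tubes, tube_capacity):
    """Greedy first-fit assignment of SNP assays to tubes.

    snps:          list of SNP identifiers, one primer pair each.
    conflicts:     set of frozenset({a, b}) pairs whose primers are
                   predicted to interact (e.g. form primer-dimers).
    tube_capacity: multiplexing level, i.e. primer pairs allowed per tube.
    Returns (tubes, unassigned).
    """
    tubes = [[] for _ in range(n_tubes)]
    unassigned = []
    for snp in snps:
        for tube in tubes:
            has_room = len(tube) < tube_capacity
            compatible = all(frozenset((snp, other)) not in conflicts
                             for other in tube)
            if has_room and compatible:
                tube.append(snp)
                break
        else:
            # No tube could take this assay without a predicted interaction.
            unassigned.append(snp)
    return tubes, unassigned


def coverage(tubes, n_snps):
    """Fraction of SNPs whose primers were assigned to some tube."""
    return sum(len(tube) for tube in tubes) / n_snps


# Hypothetical assays and predicted primer interactions.
snp_ids = ["s1", "s2", "s3", "s4"]
predicted_conflicts = {frozenset(("s1", "s2")), frozenset(("s1", "s3"))}
tubes, left_over = assign_to_tubes(snp_ids, predicted_conflicts,
                                   n_tubes=2, tube_capacity=2)
print(tubes, left_over, coverage(tubes, len(snp_ids)))
```

    As the conflict set grows denser or the tube capacity rises, a greedy pass like this leaves more assays unassigned, which gives an informal feel for why coverage degrades sharply beyond a critical interaction probability.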

  4. Genome-wide SNP discovery in walnut with an AGSNP pipeline updated for SNP discovery in allogamous organisms

    PubMed Central

    2012-01-01

    Background A genome-wide set of single nucleotide polymorphisms (SNPs) is a valuable resource in genetic research and breeding and is usually developed by re-sequencing a genome. If a genome sequence is not available, an alternative strategy must be used. We previously reported the development of a pipeline (AGSNP) for genome-wide SNP discovery in coding sequences and other single-copy DNA without a complete genome sequence in self-pollinating (autogamous) plants. Here we updated this pipeline for SNP discovery in outcrossing (allogamous) species and demonstrated its efficacy in SNP discovery in walnut (Juglans regia L.). Results The first step in the original implementation of the AGSNP pipeline was the construction of a reference sequence and the identification of single-copy sequences in it. To identify single-copy sequences, multiple genome equivalents of short SOLiD reads of another individual were mapped to shallow genome coverage of long Sanger or Roche 454 reads making up the reference sequence. The relative depth of SOLiD reads was used to filter out repeated sequences from single-copy sequences in the reference sequence. The second step was a search for SNPs between SOLiD reads and the reference sequence. Polymorphism within the mapped SOLiD reads would have precluded SNP discovery; hence both individuals had to be homozygous. The AGSNP pipeline was updated here for using SOLiD or other type of short reads of a heterozygous individual for these two principal steps. A total of 32.6X walnut genome equivalents of SOLiD reads of vegetatively propagated walnut scion cultivar ‘Chandler’ were mapped to 48,661 ‘Chandler’ bacterial artificial chromosome (BAC) end sequences (BESs) produced by Sanger sequencing during the construction of a walnut physical map. A total of 22,799 putative SNPs were initially identified. A total of 6,000 Infinium II type SNPs evenly distributed along the walnut physical map were selected for the construction of an Infinium Bead

  5. The Impact of a Common MDM2 SNP on the Sensitivity of Breast Cancer to Treatment

    DTIC Science & Technology

    2012-06-01

    could decrease the effectiveness of treatment. These outcomes are likely due to the increased expression of mdm2 protein in SNP309 individuals, which...expression at the protein level occur in the mdm2 SNP309 cell line. There was no association between the mdm2 SNP309 and clinical outcome of breast cancer...with chemotherapy, hormonal therapy and radiation therapy. Subject terms: mdm2, breast cancer, polymorphisms

  6. A genome-wide search for common SNP x SNP interactions on the risk of venous thrombosis

    PubMed Central

    2013-01-01

    Background Venous Thrombosis (VT) is a common multifactorial disease with an estimated heritability between 35% and 60%. Known genetic polymorphisms identified so far only explain ~5% of the genetic variance of the disease. This study aimed to investigate whether pair-wise interactions between common single nucleotide polymorphisms (SNPs) could exist and modulate the risk of VT. Methods A genome-wide SNP x SNP interaction analysis on VT risk was conducted in a French case–control study and the most significant findings were tested for replication in a second independent French case–control sample. The results obtained in the two studies totaling 1,953 cases and 2,338 healthy subjects were combined into a meta-analysis. Results The smallest observed p-value for interaction was p = 6.00 × 10^-11 but it did not pass the Bonferroni significance threshold of 1.69 × 10^-12 correcting for the number of investigated interactions, which was 2.96 × 10^10. Among the 37 suggestive pair-wise interactions with p-value less than 10^-8, one was further shown to involve two SNPs, rs9804128 (IGFS21 locus) and rs4784379 (IRX3 locus), that demonstrated significant interactive effects (p = 4.83 × 10^-5) on the variability of plasma Factor VIII levels, a quantitative biomarker of VT risk, in a sample of 1,091 VT patients. Conclusion This study, the first genome-wide SNP interaction analysis conducted so far on VT risk, suggests that common SNPs are unlikely to exert strong interactive effects on the risk of disease. PMID:23509962
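
    The Bonferroni threshold quoted above can be checked with a few lines of arithmetic. The family-wise error rate of 0.05 is an assumption (the abstract does not state it), though it reproduces the reported threshold; the estimate of the number of SNPs further assumes that every unordered SNP pair was tested, which is also not stated.

```python
import math

ALPHA = 0.05       # assumed family-wise error rate (not given in the abstract)
N_TESTS = 2.96e10  # number of investigated interactions, from the abstract
BEST_P = 6.00e-11  # smallest observed interaction p-value, from the abstract


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


def snps_for_pair_count(n_pairs):
    """Real-valued m solving m * (m - 1) / 2 = n_pairs."""
    return (1.0 + math.sqrt(1.0 + 8.0 * n_pairs)) / 2.0


threshold = bonferroni_threshold(ALPHA, N_TESTS)
print(f"{threshold:.2e}")   # prints 1.69e-12
print(BEST_P < threshold)   # prints False: the best pair is not significant
print(round(snps_for_pair_count(N_TESTS)))  # rough SNP count if all pairs were tested
```

    The gap between 6.00 × 10^-11 and 1.69 × 10^-12 shows how heavy the multiple-testing burden is for pair-wise scans: the number of tests grows roughly with the square of the number of SNPs.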

  7. Accuracy of direct genomic values in Holstein bulls and cows using subsets of SNP markers

    PubMed Central

    2010-01-01

    Background At the current price, the use of high-density single nucleotide polymorphisms (SNP) genotyping assays in genomic selection of dairy cattle is limited to applications involving elite sires and dams. The objective of this study was to evaluate the use of low-density assays to predict direct genomic value (DGV) on five milk production traits, an overall conformation trait, a survival index, and two profit index traits (APR, ASI). Methods Dense SNP genotypes were available for 42,576 SNP for 2,114 Holstein bulls and 510 cows. A subset of 1,847 bulls born between 1955 and 2004 was used as a training set to fit models with various sets of pre-selected SNP. A group of 297 bulls born between 2001 and 2004 and all cows born between 1992 and 2004 were used to evaluate the accuracy of DGV prediction. Ridge regression (RR) and partial least squares regression (PLSR) were used to derive prediction equations and to rank SNP based on the absolute value of the regression coefficients. Four alternative strategies were applied to select subset of SNP, namely: subsets of the highest ranked SNP for each individual trait, or a single subset of evenly spaced SNP, where SNP were selected based on their rank for ASI, APR or minor allele frequency within intervals of approximately equal length. Results RR and PLSR performed very similarly to predict DGV, with PLSR performing better for low-density assays and RR for higher-density SNP sets. When using all SNP, DGV predictions for production traits, which have a higher heritability, were more accurate (0.52-0.64) than for survival (0.19-0.20), which has a low heritability. The gain in accuracy using subsets that included the highest ranked SNP for each trait was marginal (5-6%) over a common set of evenly spaced SNP when at least 3,000 SNP were used. Subsets containing 3,000 SNP provided more than 90% of the accuracy that could be achieved with a high-density assay for cows, and 80% of the high-density assay for young bulls
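
    The ranking step described above, in which SNPs are ordered by the absolute value of their ridge regression coefficients, can be sketched in plain Python. This is a toy illustration and not the study's pipeline: it omits an intercept and any centring or scaling of genotypes, the penalty value is arbitrary, and the function names and data are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ridge_coefficients(genotypes, phenotypes, lam):
    """Return beta = (X'X + lam * I)^-1 X'y; rows of genotypes are animals."""
    n_snps = len(genotypes[0])
    xtx = [[sum(row[i] * row[j] for row in genotypes) + (lam if i == j else 0.0)
            for j in range(n_snps)]
           for i in range(n_snps)]
    xty = [sum(row[i] * y for row, y in zip(genotypes, phenotypes))
           for i in range(n_snps)]
    return solve_linear_system(xtx, xty)


def rank_snps(beta):
    """SNP indices sorted by decreasing absolute coefficient."""
    return sorted(range(len(beta)), key=lambda j: abs(beta[j]), reverse=True)


# Hypothetical genotype codes (3 animals x 3 SNPs) and phenotypes.
x = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
y = [0.5, -3.0, 1.0]
beta = ridge_coefficients(x, y, lam=1.0)
print(beta, rank_snps(beta))
```

    A low-density subset would then be taken from the top of this ranking, or by picking the best-ranked SNP within evenly spaced intervals, as the abstract describes.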

  8. PCR amplification of SNP loci from crude DNA for large-scale genotyping of oomycetes.

    PubMed

    Hu, Jian; Lyon, Rebecca; Zhou, Yuxin; Lamour, Kurt

    2014-01-01

    Similar to other eukaryotes, single nucleotide polymorphism (SNP) markers are abundant in many oomycete plant pathogen genomes. High resolution DNA melting analysis (HR-DMA) is a cost-effective method for SNP genotyping, but like many SNP marker technologies, is limited by the amount and quality of template DNA. We describe PCR preamplification of Phytophthora and Peronospora SNP loci from crude DNA extracted from a small amount of mycelium and/or infected plant tissue to produce sufficient template to genotype at least 10 000 SNPs. The approach is fast, inexpensive, requires minimal biological material and should be useful for many organisms in a variety of contexts.

  9. SNP Markers and Their Impact on Plant Breeding

    PubMed Central

    Mammadov, Jafar; Aggarwal, Rajat; Buyyarapu, Ramesh; Kumpatla, Siva

    2012-01-01

    The use of molecular markers has revolutionized the pace and precision of plant genetic analysis which in turn facilitated the implementation of molecular breeding of crops. The last three decades have seen tremendous advances in the evolution of marker systems and the respective detection platforms. Markers based on single nucleotide polymorphisms (SNPs) have rapidly gained the center stage of molecular genetics during the recent years due to their abundance in the genomes and their amenability for high-throughput detection formats and platforms. Computational approaches dominate SNP discovery methods due to the ever-increasing sequence information in public databases; however, complex genomes pose special challenges in the identification of informative SNPs warranting alternative strategies in those crops. Many genotyping platforms and chemistries have become available making the use of SNPs even more attractive and efficient. This paper provides a review of historical and current efforts in the development, validation, and application of SNP markers in QTL/gene discovery and plant breeding by discussing key experimental strategies and cases exemplifying their impact. PMID:23316221

  10. Eigenanalysis of SNP data with an identity by descent interpretation.

    PubMed

    Zheng, Xiuwen; Weir, Bruce S

    2016-02-01

    Principal component analysis (PCA) is widely used in genome-wide association studies (GWAS), and the principal component axes often represent perpendicular gradients in geographic space. The explanation of PCA results is of major interest for geneticists to understand fundamental demographic parameters. Here, we provide an interpretation of PCA based on relatedness measures, which are described by the probability that sets of genes are identical-by-descent (IBD). An approximately linear transformation between ancestral proportions (AP) of individuals with multiple ancestries and their projections onto the principal components is found. In addition, a new method of eigenanalysis "EIGMIX" is proposed to estimate individual ancestries. EIGMIX is a method of moments with computational efficiency suitable for millions of SNP data, and it is not subject to the assumption of linkage equilibrium. With the assumptions of multiple ancestries and their surrogate ancestral samples, EIGMIX is able to infer ancestral proportions (APs) of individuals. The methods were applied to the SNP data from the HapMap Phase 3 project and the Human Genome Diversity Panel. The APs of individuals inferred by EIGMIX are consistent with the findings of the program ADMIXTURE. In conclusion, EIGMIX can be used to detect population structure and estimate genome-wide ancestral proportions with a relatively high accuracy.

  11. Fine-scaled human genetic structure revealed by SNP microarrays.

    PubMed

    Xing, Jinchuan; Watkins, W Scott; Witherspoon, David J; Zhang, Yuhua; Guthery, Stephen L; Thara, Rangaswamy; Mowry, Bryan J; Bulayeva, Kazima; Weiss, Robert B; Jorde, Lynn B

    2009-05-01

    We report an analysis of more than 240,000 loci genotyped using the Affymetrix SNP microarray in 554 individuals from 27 worldwide populations in Africa, Asia, and Europe. To provide a more extensive and complete sampling of human genetic variation, we have included caste and tribal samples from two states in South India, Daghestanis from eastern Europe, and the Iban from Malaysia. Consistent with observations made by Charles Darwin, our results highlight shared variation among human populations and demonstrate that much genetic variation is geographically continuous. At the same time, principal components analyses reveal discernible genetic differentiation among almost all identified populations in our sample, and in most cases, individuals can be clearly assigned to defined populations on the basis of SNP genotypes. All individuals are accurately classified into continental groups using a model-based clustering algorithm, but between closely related populations, genetic and self-classifications conflict for some individuals. The 250K data permitted high-level resolution of genetic variation among Indian caste and tribal populations and between highland and lowland Daghestani populations. In particular, upper-caste individuals from Tamil Nadu and Andhra Pradesh form one defined group, lower-caste individuals from these two states form another, and the tribal Irula samples form a third. Our results emphasize the correlation of genetic and geographic distances and highlight other elements, including social factors that have contributed to population structure.

  12. Structural Architecture of SNP Effects on Complex Traits

    PubMed Central

    Gamazon, Eric R.; Cox, Nancy J.; Davis, Lea K.

    2014-01-01

    Despite the discovery of copy-number variation (CNV) across the genome nearly 10 years ago, current SNP-based analysis methodologies continue to collapse the homozygous (i.e., A/A), hemizygous (i.e., A/0), and duplicative (i.e., A/A/A) genotype states, treating the genotype variable as irreducible or unaltered by other colocalizing forms of genetic (e.g., structural) variation. Our understanding of common, genome-wide CNVs suggests that the canonical genotype construct might belie the enormous complexity of the genome. Here we present multiple analyses of several phenotypes and provide methods supporting a conceptual shift that embraces the structural dimension of genotype. We comprehensively investigate the impact of the structural dimension of genotype on (1) GWAS methods, (2) interpretation of rare LOF variants, (3) characterization of genomic architecture, and (4) implications for mapping loci involved in complex disease. Taken together, these results argue for the inclusion of a structural dimension and suggest that some portion of the “missing” heritability might be recovered through integration of the structural dimension of SNP effects on complex traits. PMID:25307299

  13. A Genome-Wide Association Study for Agronomic Traits in Soybean Using SNP Markers and SNP-Based Haplotype Analysis

    PubMed Central

    de Oliveira, Marco Antônio Rott; Higashi, Wilson; Scapim, Carlos Alberto; Schuster, Ivan

    2017-01-01

    Mapping quantitative trait loci through the use of linkage disequilibrium (LD) in populations of unrelated individuals provides a valuable approach for dissecting the genetic basis of complex traits in soybean (Glycine max). The haplotype-based genome-wide association study (GWAS) has now been proposed as a complementary approach to intensify the benefits of LD, enabling assessment of the genetic determinants of agronomic traits. In this study a GWAS was undertaken to identify genomic regions that control 100-seed weight (SW), plant height (PH) and seed yield (SY) in a soybean association mapping panel using single nucleotide polymorphism (SNP) markers and haplotype information. The soybean cultivars (N = 169) were field-evaluated across four locations of southern Brazil. The genome-wide haplotype association analysis (941 haplotypes) identified eleven, seventeen and fifty-nine SNP-based haplotypes significantly associated with SY, SW and PH, respectively. Although most marker-trait associations were environment and trait specific, stable haplotype associations were identified for SY and SW across environments (i.e., haplotypes Gm12_Hap12). The haplotype block 42 on Chr19 (Gm19_Hap42) was confirmed to be associated with PH in two environments. These findings enable us to refine the breeding strategy for tropical soybean and confirm that haplotype-based GWAS can provide new insights into the genetic determinants that are not captured by the single-marker approach. PMID:28152092

  14. SNP Discovery for mapping alien introgressions in wheat

    PubMed Central

    2014-01-01

    Background Monitoring alien introgressions in crop plants is difficult due to the lack of genetic and molecular mapping information on the wild crop relatives. The tertiary gene pool of wheat is a very important source of genetic variability for wheat improvement against biotic and abiotic stresses. By exploring the 5Mg short arm (5MgS) of Aegilops geniculata, we can apply chromosome genomics for the discovery of SNP markers and their use for monitoring alien introgressions in wheat (Triticum aestivum L). Results The short arm of chromosome 5Mg of Ae. geniculata Roth (syn. Ae. ovata L.; 2n = 4x = 28, UgUgMgMg) was flow-sorted from a wheat line in which it is maintained as a telocentric chromosome. DNA of the sorted arm was amplified and sequenced using an Illumina Hiseq 2000 with ~45x coverage. The sequence data was used for SNP discovery against wheat homoeologous group-5 assemblies. A total of 2,178 unique, 5MgS-specific SNPs were discovered. Randomly selected samples of 59 5MgS-specific SNPs were tested (44 by KASPar assay and 15 by Sanger sequencing) and 84% were validated. Of the selected SNPs, 97% mapped to a chromosome 5Mg addition to wheat (the source of t5MgS), and 94% to 5Mg introgressed from a different accession of Ae. geniculata substituting for chromosome 5D of wheat. The validated SNPs also identified chromosome segments of 5MgS origin in a set of T5D-5Mg translocation lines; eight SNPs (25%) mapped to TA5601 [T5DL · 5DS-5MgS(0.75)] and three (8%) to TA5602 [T5DL · 5DS-5MgS (0.95)]. SNPs (gsnp_5ms83 and gsnp_5ms94), tagging chromosome T5DL · 5DS-5MgS(0.95) with the smallest introgression carrying resistance to leaf rust (Lr57) and stripe rust (Yr40), were validated in two released germplasm lines with Lr57 and Yr40 genes. Conclusion This approach should be widely applicable for the identification of species/genome-specific SNPs. The development of a large number of SNP markers will facilitate the precise introgression and

  15. Development of maizeSNP3072, a high-throughput compatible SNP array, for DNA fingerprinting identification of Chinese maize varieties.

    PubMed

    Tian, Hong-Li; Wang, Feng-Ge; Zhao, Jiu-Ran; Yi, Hong-Mei; Wang, Lu; Wang, Rui; Yang, Yang; Song, Wei

    Single nucleotide polymorphisms (SNPs) are abundant and evenly distributed throughout the maize (Zea mays L.) genome. SNPs have several advantages over simple sequence repeats, such as ease of data comparison and integration, high-throughput processing of loci, and identification of associated phenotypes. SNPs are thus ideal for DNA fingerprinting, genetic diversity analysis, and marker-assisted breeding. Here, we developed a high-throughput and compatible SNP array, maizeSNP3072, containing 3072 SNPs developed from the maizeSNP50 array. To improve genotyping efficiency, a high-quality cluster file, maizeSNP3072_GT.egt, was constructed. All 3072 SNP loci were localized within different genes, where they were distributed in exons (43 %), promoters (21 %), 3' untranslated regions (UTRs; 22 %), 5' UTRs (9 %), and introns (5 %). The average genotyping failure rate using these SNPs was only 6 %, or 3 % using the cluster file to call genotypes. The genotype consistency of repeat sample analysis on Illumina GoldenGate versus Infinium platforms exceeded 96.4 %. The minor allele frequency (MAF) of the SNPs averaged 0.37 based on data from 309 inbred lines. The 3072 SNPs were highly effective for distinguishing among 276 examined hybrids. Comparative analysis using Chinese varieties revealed that the 3072SNP array showed a better marker success rate and higher average MAF values, evaluation scores, and variety-distinguishing efficiency than the maizeSNP50K array. The maizeSNP3072 array thus can be successfully used in DNA fingerprinting identification of Chinese maize varieties and shows potential as a useful tool for germplasm resource evaluation and molecular marker-assisted breeding.
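    The minor allele frequency (MAF) reported above is a per-SNP summary that is simple to compute from genotype calls. The following is a minimal sketch, not the authors' pipeline: the function name, the two-character genotype encoding and the "--" missing-call marker are all assumptions made for illustration.

```python
from collections import Counter


def minor_allele_frequency(genotypes):
    """Return the MAF of one biallelic SNP, or None if no calls are usable.

    `genotypes` is a list of two-character calls such as "AA", "AG", "GG".
    Missing calls (None, "" or anything containing "-") are skipped.
    This encoding is hypothetical and chosen only for the example.
    """
    allele_counts = Counter()
    for call in genotypes:
        if not call or "-" in call:
            continue  # skip failed or missing genotype calls
        allele_counts.update(call)  # count both alleles of the call

    total = sum(allele_counts.values())
    if total == 0:
        return None
    if len(allele_counts) > 2:
        raise ValueError("more than two alleles observed at a biallelic SNP")
    if len(allele_counts) == 1:
        return 0.0  # monomorphic in this sample
    return min(allele_counts.values()) / total


if __name__ == "__main__":
    calls = ["AA", "AG", "GG", "GG", "--"]
    print(minor_allele_frequency(calls))
```

    Averaging this value over all loci on an array gives the kind of panel-wide figure quoted in the abstract; loci with a MAF near 0.5 are the most informative for distinguishing varieties.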

  16. Linear reduction method for predictive and informative tag SNP selection.

    PubMed

    He, Jingwu; Westbrooks, Kelly; Zelikovsky, Alexander

    2005-01-01

    Constructing a complete human haplotype map is helpful when associating complex diseases with their related SNPs. Unfortunately, the number of SNPs is very large and it is costly to sequence many individuals. Therefore, it is desirable to reduce the number of SNPs that should be sequenced to a small number of informative representatives called tag SNPs. In this paper, we propose a new linear algebra-based method for selecting and using tag SNPs. We measure the quality of our tag SNP selection algorithm by comparing actual SNPs with SNPs predicted from selected linearly independent tag SNPs. Our experiments show that for sufficiently long haplotypes, knowing only 0.4% of all SNPs the proposed linear reduction method predicts an unknown haplotype with the error rate below 2% based on 10% of the population.
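    The idea of choosing linearly independent SNPs as tags can be illustrated with ordinary row reduction. The sketch below is an assumption-laden illustration rather than the authors' algorithm: it treats haplotypes as rows of a 0/1 matrix, works over the rationals, and greedily keeps the leftmost independent SNP columns. The function name and the toy matrix are hypothetical.

```python
from fractions import Fraction


def independent_snp_columns(haplotypes):
    """Return indices of a maximal set of linearly independent SNP columns.

    `haplotypes` is a list of rows, one per haplotype, with 0/1 entries
    (one column per SNP). Gauss-Jordan elimination with exact fractions
    is used; the pivot columns are the leftmost independent columns.
    """
    rows = [[Fraction(x) for x in row] for row in haplotypes]
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0

    pivots = []
    r = 0  # index of the next pivot row
    for c in range(n_cols):
        if r == n_rows:
            break
        # Find a row at or below r with a non-zero entry in column c.
        p = next((i for i in range(r, n_rows) if rows[i][c] != 0), None)
        if p is None:
            continue  # column c depends on the columns already selected
        rows[r], rows[p] = rows[p], rows[r]
        pivot = rows[r][c]
        rows[r] = [x / pivot for x in rows[r]]
        # Clear column c from every other row.
        for i in range(n_rows):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
    return pivots


if __name__ == "__main__":
    # Toy data: SNP 2 is an exact copy of SNP 0, so it adds no information.
    toy = [
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [1, 1, 1, 0],
        [0, 0, 0, 1],
    ]
    print(independent_snp_columns(toy))
```

    Every column not selected is an exact linear combination of the selected ones in the training haplotypes, which is what makes prediction of untyped SNPs from the tags possible in principle.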

  17. Grouping preprocess for haplotype inference from SNP and CNV data

    NASA Astrophysics Data System (ADS)

    Shindo, Hiroyuki; Chigira, Hiroshi; Nagaoka, Tomoyo; Kamatani, Naoyuki; Inoue, Masato

    2009-12-01

    The method of statistical haplotype inference is an indispensable technique in the field of medical science. The authors previously reported Hardy-Weinberg equilibrium-based haplotype inference that could manage single nucleotide polymorphism (SNP) data. We recently extended the method to cover copy number variation (CNV) data. Haplotype inference from mixed data is important because SNPs and CNVs are occasionally in linkage disequilibrium. The idea underlying the proposed method is simple, but the algorithm for it needs to be quite elaborate to reduce the calculation cost. Consequently, we have focused on the details of the algorithm in this study. Although the main advantage of the method is accuracy, in that it does not use any approximation, its main disadvantage is still the calculation cost, which is sometimes intractable for large data sets with missing values.

  18. Authentication of medicinal plants by SNP-based multiplex PCR.

    PubMed

    Lee, Ok Ran; Kim, Min-Kyeoung; Yang, Deok-Chun

    2012-01-01

    Highly variable intergenic spacer and intron regions from nuclear and cytoplasmic DNA have been used for species identification. Noncoding internal transcribed spacers (ITSs) located in the 18S-5.8S-26S and 5S ribosomal RNA genes (rDNAs) represent suitable regions for medicinal plant authentication. Noncoding regions from two cytoplasmic DNAs, chloroplast DNA (trnT-F intergenic spacer region) and mitochondrial DNA (fourth intron region of the nad7 gene), have also been successfully applied to the identification of medicinal plants. Single-nucleotide polymorphism (SNP) sites obtained from the amplification of intergenic spacer and intron regions are utilized for the verification of medicinal plants at the species level using multiplex PCR. Multiplex PCR is a variant of the PCR technique used to amplify more than two loci simultaneously.

  19. Methods for the design, implementation, and analysis of illumina infinium™ SNP assays in plants.

    PubMed

    Chagné, David; Bianco, Luca; Lawley, Cindy; Micheletti, Diego; Jacobs, Jeanne M E

    2015-01-01

    The advent of Next-Generation sequencing-by-synthesis technologies has fuelled SNP discovery, genotyping, and screening of populations in myriad ways for many species, including various plant species. One technique widely applied to screening a large number of SNP markers over a large number of samples is the Illumina Infinium™ assay.

  20. A genome-wide SNP panel for genetic diversity, mapping and breeding studies in rice

    Technology Transfer Automated Retrieval System (TEKTRAN)

    A genome-wide SNP resource was developed for rice using the GoldenGate assay and used to genotype 400 landrace accessions of O. sativa. SNPs were originally discovered using Perlegen re-sequencing technology in 20 diverse landraces of O. sativa as part of OryzaSNP project (http://irfgc.irri.org). An...

  1. A Coordinated Approach to Peach SNP Discovery in RosBREED

    Technology Transfer Automated Retrieval System (TEKTRAN)

    In the USDA-funded multi-institutional and trans-disciplinary project, “RosBREED”, crop-specific SNP genome scan platforms are being developed for peach, apple, strawberry, and cherry at a resolution of at least one polymorphic SNP marker every 5 cM in any random cross, for use in Pedigree-Based Ana...

  2. TNF-alpha SNP haplotype frequencies in equidae.

    PubMed

    Brown, J J; Ollier, W E R; Thomson, W; Matthews, J B; Carter, S D; Binns, M; Pinchbeck, G; Clegg, P D

    2006-05-01

    Tumour necrosis factor alpha (TNF-alpha) is a pro-inflammatory cytokine that plays a crucial role in the regulation of inflammatory and immune responses. In all vertebrate species the genes encoding TNF-alpha are located within the major histocompatibility complex. In the horse TNF-alpha has been ascribed a role in a variety of important disease processes. Previously two single nucleotide polymorphisms (SNPs) have been reported within the 5' un-translated region of the equine TNF-alpha gene. We have examined the equine TNF-alpha promoter region further for additional SNPs by analysing DNA from 131 horses (Equus caballus), 19 donkeys (E. asinus), 2 Grant's zebras (E. burchellii boehmi) and one onager (E. hemionus). Two further SNPs were identified at nucleotide positions 24 (T/G) and 452 (T/C) relative to the first nucleotide of the 522 bp polymerase chain reaction product. A sequence variant at position 51 was observed between equidae. SNaPSHOT genotyping assays for these and the two previously reported SNPs were performed on 457 horses comprising seven different breeds and 23 donkeys to determine the gene frequencies. SNP frequencies varied considerably between different horse breeds and also between the equine species. In total, nine different TNF-alpha promoter SNP haplotypes and their frequencies were established amongst the various equidae examined, with some haplotypes being found only in horses and others only in donkeys or zebras. The haplotype frequencies observed varied greatly between different horse breeds. Such haplotypes may relate to levels of TNF-alpha production and disease susceptibility and further investigation is required to identify associations between particular haplotypes and altered risk of disease.

  3. SNP uniqueness problem: a proof-of-principle in HapMap SNPs.

    PubMed

    Doron, Shany; Shweiki, Dorit

    2011-04-01

    SNP-based research strongly affects our biomedical and clinically associated knowledge. Nonunique and false-positive SNP existence in commonly used datasets may thus lead to biased, inaccurate clinically associated conclusions. We designed a computational study to reveal the degree of nonunique/false-positive SNPs in the HapMap dataset. Two sets of SNP flanking sequences were used as queries for BLAT analysis against the human genome. 4.2% and 11.9% of HapMap SNPs align to the genome nonuniquely (long and short, respectively). Furthermore, an average of 7.9% nonunique SNPs are included in common commercial genotyping arrays (according to our designed probes). Nonunique SNPs identified in this study are represented to various degrees in clinically associated databases, stressing the consequence of inaccurate SNP annotation and hence SNP utilization. Unfortunately, our results question some disease-related genotyping analyses, raising a worrisome concern on their validity.

  4. Design and characterization of a 52K SNP chip for goats.

    PubMed

    Tosser-Klopp, Gwenola; Bardou, Philippe; Bouchez, Olivier; Cabau, Cédric; Crooijmans, Richard; Dong, Yang; Donnadieu-Tonon, Cécile; Eggen, André; Heuven, Henri C M; Jamli, Saadiah; Jiken, Abdullah Johari; Klopp, Christophe; Lawley, Cynthia T; McEwan, John; Martin, Patrice; Moreno, Carole R; Mulsant, Philippe; Nabihoudine, Ibouniyamine; Pailhoux, Eric; Palhière, Isabelle; Rupp, Rachel; Sarry, Julien; Sayre, Brian L; Tircazes, Aurélie; Jun Wang; Wang, Wen; Zhang, Wenguang

    2014-01-01

    The success of Genome Wide Association Studies in the discovery of sequence variation linked to complex traits in humans has increased interest in high throughput SNP genotyping assays in livestock species. Primary goals are QTL detection and genomic selection. The purpose here was design of a 50-60,000 SNP chip for goats. The success of a moderate density SNP assay depends on reliable bioinformatic SNP detection procedures, the technological success rate of the SNP design, even spacing of SNPs on the genome and selection of Minor Allele Frequencies (MAF) suitable to use in diverse breeds. Through the federation of three SNP discovery projects consolidated as the International Goat Genome Consortium, we have identified approximately twelve million high quality SNP variants in the goat genome stored in a database together with their biological and technical characteristics. These SNPs were identified within and between six breeds (meat, milk and mixed): Alpine, Boer, Creole, Katjang, Saanen and Savanna, comprising a total of 97 animals. Whole genome and Reduced Representation Library sequences were aligned on >10 kb scaffolds of the de novo goat genome assembly. The 60,000 selected SNPs, evenly spaced on the goat genome, were submitted for oligo manufacturing (Illumina, Inc) and published in dbSNP along with flanking sequences and map position on goat assemblies (i.e. scaffolds and pseudo-chromosomes), sheep genome V2 and cattle UMD3.1 assembly. Ten breeds were then used to validate the SNP content and 52,295 loci could be successfully genotyped and used to generate a final cluster file. The combined strategy of using mainly whole genome Next Generation Sequencing and mapping on a contig genome assembly, complemented with Illumina design tools proved to be efficient in producing this GoatSNP50 chip. Advances in use of molecular markers are expected to accelerate goat genomic studies in coming years.

  5. A Customized Pigmentation SNP Array Identifies a Novel SNP Associated with Melanoma Predisposition in the SLC45A2 Gene

    PubMed Central

    Alonso, Santos; Boyano, M. Dolores; Peña-Chilet, Maria; Pita, Guillermo; Aviles, Jose A.; Mayor, Matias; Gomez-Fernandez, Cristina; Casado, Beatriz; Martin-Gonzalez, Manuel; Izagirre, Neskuts; De la Rua, Concepcion; Asumendi, Aintzane; Perez-Yarza, Gorka; Arroyo-Berdugo, Yoana; Boldo, Enrique; Lozoya, Rafael; Torrijos-Aguilar, Arantxa; Pitarch, Ana; Pitarch, Gerard; Sanchez-Motilla, Jose M.; Valcuende-Cavero, Francisca; Tomas-Cabedo, Gloria; Perez-Pastor, Gemma; Diaz-Perez, Jose L.; Gardeazabal, Jesus; de Lizarduy, Iñigo Martinez; Sanchez-Diez, Ana; Valdes, Carlos; Pizarro, Angel; Casado, Mariano; Carretero, Gregorio; Botella-Estrada, Rafael; Nagore, Eduardo; Lazaro, Pablo; Lluch, Ana; Benitez, Javier; Martinez-Cadenas, Conrado; Ribas, Gloria

    2011-01-01

    As the incidence of Malignant Melanoma (MM) reflects an interaction between skin colour and UV exposure, variations in genes implicated in pigmentation and tanning response to UV may be associated with susceptibility to MM. In this study, 363 SNPs in 65 gene regions belonging to the pigmentation pathway have been successfully genotyped using a SNP array. Five hundred and ninety MM cases and 507 controls were analyzed in a discovery phase I. Ten candidate SNPs based on a p-value threshold of 0.01 were identified. Two of them, rs35414 (SLC45A2) and rs2069398 (SILV/CDK2), were statistically significant after conservative Bonferroni correction. The best six SNPs were further tested in an independent Spanish series (624 MM cases and 789 controls). A novel SNP located on the SLC45A2 gene (rs35414) was found to be significantly associated with melanoma in both phase I and phase II (P<0.0001). None of the other five SNPs were replicated in this second phase of the study. However, three SNPs in TYR, SILV/CDK2 and ADAMTS20 genes (rs17793678, rs2069398 and rs1510521 respectively) had an overall p-value<0.05 when considering the whole DNA collection (1214 MM cases and 1296 controls). Both the SLC45A2 and the SILV/CDK2 variants behave as protective alleles, while the TYR and ADAMTS20 variants seem to function as risk alleles. Cumulative effects were detected when these four variants were considered together. Furthermore, individuals carrying two or more mutations in MC1R, a well-known low penetrance melanoma-predisposing gene, had a decreased MM risk if concurrently bearing the SLC45A2 protective variant. To our knowledge, this is the largest study on Spanish sporadic MM cases to date. PMID:21559390

  6. Analysis of population structure and genetic history of cattle breeds based on high-density SNP data

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Advances in single nucleotide polymorphism (SNP) genotyping microarrays have facilitated a new understanding of population structure and evolutionary history for several species. Most existing studies in livestock were based on low density SNP arrays. The first wave of low density SNP studies on cat...

  7. Rice SNP-seek database update: new SNPs, indels, and queries

    PubMed Central

    Mansueto, Locedie; Fuentes, Roven Rommel; Borja, Frances Nikki; Detras, Jeffery; Abriol-Santos, Juan Miguel; Chebotarov, Dmytro; Sanciangco, Millicent; Palis, Kevin; Copetti, Dario; Poliakov, Alexandre; Dubchak, Inna; Solovyev, Victor; Wing, Rod A.; Hamilton, Ruaraidh Sackville; Mauleon, Ramil; McNally, Kenneth L.; Alexandrov, Nickolai

    2017-01-01

    We describe updates to the Rice SNP-Seek Database since its first release. We ran a new SNP-calling pipeline followed by filtering that resulted in complete, base, filtered and core SNP datasets. Besides the Nipponbare reference genome, the pipeline was run on genome assemblies of IR 64, 93-11, DJ 123 and Kasalath. New genotype query and display features are added for reference assemblies, SNP datasets and indels. JBrowse now displays BAM, VCF and other annotation tracks, the additional genome assemblies and an embedded VISTA genome comparison viewer. Middleware is redesigned for improved performance by using a hybrid of HDF5 and RDMS for genotype storage. Query modules for genotypes, varieties and genes are improved to handle various constraints. An integrated list manager allows the user to pass query parameters for further analysis. The SNP Annotator adds traits, ontology terms, effects and interactions to markers in a list. Web-service calls were implemented to access most data. These features enable seamless querying of SNP-Seek across various biological entities, a step toward semi-automated gene-trait association discovery. URL: http://snp-seek.irri.org. PMID:27899667

  8. Rice SNP-seek database update: new SNPs, indels, and queries.

    PubMed

    Mansueto, Locedie; Fuentes, Roven Rommel; Borja, Frances Nikki; Detras, Jeffery; Abriol-Santos, Juan Miguel; Chebotarov, Dmytro; Sanciangco, Millicent; Palis, Kevin; Copetti, Dario; Poliakov, Alexandre; Dubchak, Inna; Solovyev, Victor; Wing, Rod A; Hamilton, Ruaraidh Sackville; Mauleon, Ramil; McNally, Kenneth L; Alexandrov, Nickolai

    2017-01-04

    We describe updates to the Rice SNP-Seek Database since its first release. We ran a new SNP-calling pipeline followed by filtering that resulted in complete, base, filtered and core SNP datasets. Besides the Nipponbare reference genome, the pipeline was run on genome assemblies of IR 64, 93-11, DJ 123 and Kasalath. New genotype query and display features are added for reference assemblies, SNP datasets and indels. JBrowse now displays BAM, VCF and other annotation tracks, the additional genome assemblies and an embedded VISTA genome comparison viewer. Middleware is redesigned for improved performance by using a hybrid of HDF5 and RDMS for genotype storage. Query modules for genotypes, varieties and genes are improved to handle various constraints. An integrated list manager allows the user to pass query parameters for further analysis. The SNP Annotator adds traits, ontology terms, effects and interactions to markers in a list. Web-service calls were implemented to access most data. These features enable seamless querying of SNP-Seek across various biological entities, a step toward semi-automated gene-trait association discovery. URL: http://snp-seek.irri.org.

  9. Identification, validation and survey of a single nucleotide polymorphism (SNP) associated with pungency in Capsicum spp.

    PubMed

    Garcés-Claver, Ana; Fellman, Shanna Moore; Gil-Ortega, Ramiro; Jahn, Molly; Arnedo-Andrés, María S

    2007-11-01

    A single nucleotide polymorphism (SNP) associated with pungency was detected within an expressed sequence tag (EST) of 307 bp. This fragment was identified after expression analysis of the EST clone SB2-66 in placenta tissue of Capsicum fruits. Sequence alignments corresponding to this new fragment allowed us to identify an SNP between pungent and non-pungent accessions. Two methods were chosen for the development of the SNP marker linked to pungency: tetra-primer amplification refractory mutation system-PCR (tetra-primer ARMS-PCR) and cleaved amplified polymorphic sequence. Results showed that both methods were successful in distinguishing genotypes. Nevertheless, tetra-primer ARMS-PCR was chosen for SNP genotyping because it was more rapid, more reliable and less costly. The utility of this SNP marker for pungency was demonstrated by the ability to distinguish between 29 pungent and non-pungent cultivars of Capsicum annuum. In addition, the SNP was also associated with phenotypic pungent character in the tested genotypes of C. chinense, C. baccatum, C. frutescens, C. galapagoense, C. eximium, C. tovarii and C. cardenasi. This SNP marker is a faster, cheaper and more reproducible method for identifying pungent peppers than other techniques such as panel tasting, and allows rapid screening of the trait in early growth stages.

  10. QuickSNP: an automated web server for selection of tagSNPs

    PubMed Central

    Grover, Deepak; Woodfield, Alonzo S.; Verma, Ranjana; Zandi, Peter P.; Levinson, Douglas F.; Potash, James B.

    2007-01-01

    Although large-scale genetic association studies involving hundreds to thousands of SNPs have become feasible, the associated cost is substantial. Even with the increased efficiency introduced by the use of tagSNPs, researchers are often seeking ways to maximize resource utilization given a set of SNP-based gene-mapping goals. We have developed a web server named QuickSNP in order to provide cost-effective selection of SNPs, and to fill in some of the gaps in existing SNP selection tools. One useful feature of QuickSNP is the option to select only gene-centric SNPs from a chromosomal region in an automated fashion. Other useful features include automated selection of coding non-synonymous SNPs, SNP filtering based on inter-SNP distances and information regarding the availability of genotyping assays for SNPs and whether they are present on whole genome chips. The program produces user-friendly summary tables and results, and a link to a UCSC Genome Browser track illustrating the position of the selected tagSNPs in relation to genes and other genomic features. We hope the unique combination of features of this server will be useful for researchers aiming to select markers for their genotyping studies. The server is freely available and can be accessed at the URL http://bioinformoodics.jhmi.edu/quickSNP.pl. PMID:17517769

  11. Exploring of new Y-chromosome SNP loci using Pyrosequencing and the SNaPshot methods.

    PubMed

    Wei, Wei; Luo, Hai-Bo; Yan, Jing; Hou, Yi-Ping

    2012-11-01

    The single nucleotide polymorphisms on the Y chromosome (Y-SNP) have been considered to be important in forensic casework. However, Y-SNP loci were mostly population specific and lacked biallelic polymorphisms in the Asian population. In this study, we developed a strategy for seeking and genotyping new Y-SNP markers based on both Pyrosequencing and the SNaPshot methods. As a result, 34 new biallelic markers were observed to be polymorphic in the Chinese Han population by estimation of allele frequencies of 103 candidate Y-SNP loci in DNA pools using Pyrosequencing technology. Then, a multiplex system with 20 Y-SNP loci was genotyped using the SNaPshot™ multiplex kit. The twenty Y-SNP loci defined 56 different haplotypes, and the haplotype diversity was estimated to be 0.9539. Our results demonstrated that the strategy could be used as an efficient tool to search and genotype biallelic markers from a large number of candidate loci. In addition, the 20 Y-SNP loci constituted a multiplex system, which could provide supplementary information for forensic identification.
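    Haplotype diversity of the kind quoted above is commonly estimated with Nei's formula, h = n/(n-1) × (1 - Σ p_i²), where n is the sample size and p_i the frequency of the i-th haplotype. The abstract does not state which estimator was used, so the sketch below simply assumes Nei's; the function name and the sample labels are hypothetical.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity for a sample of haplotype labels.

    `haplotypes` holds one label per individual, e.g. a string that
    concatenates the alleles observed at each Y-SNP locus.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two individuals are required")
    counts = Counter(haplotypes)
    # Sum of squared haplotype frequencies (probability of drawing a match).
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


if __name__ == "__main__":
    sample = ["H1", "H1", "H2", "H3"]  # hypothetical labels
    print(round(haplotype_diversity(sample), 4))
```

    The value approaches 1 when nearly every individual carries a distinct haplotype, which is the property that makes a multiplex panel useful for identification.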

  12. A system for exact and approximate genetic linkage analysis of SNP data in large pedigrees

    PubMed Central

    Silberstein, Mark; Weissbrod, Omer; Otten, Lars; Tzemach, Anna; Anisenia, Andrei; Shtark, Oren; Tuberg, Dvir; Galfrin, Eddie; Gannon, Irena; Shalata, Adel; Borochowitz, Zvi U.; Dechter, Rina; Thompson, Elizabeth; Geiger, Dan

    2013-01-01

    Motivation: The use of dense single nucleotide polymorphism (SNP) data in genetic linkage analysis of large pedigrees is impeded by significant technical, methodological and computational challenges. Here we describe Superlink-Online SNP, a new powerful online system that streamlines the linkage analysis of SNP data. It features a fully integrated flexible processing workflow comprising both well-known and novel data analysis tools, including SNP clustering, erroneous data filtering, exact and approximate LOD calculations and maximum-likelihood haplotyping. The system draws its power from thousands of CPUs, performing data analysis tasks orders of magnitude faster than a single computer. By providing an intuitive interface to sophisticated state-of-the-art analysis tools coupled with high computing capacity, Superlink-Online SNP helps geneticists unleash the potential of SNP data for detecting disease genes. Results: Computations performed by Superlink-Online SNP are automatically parallelized using novel paradigms, and executed on an unlimited number of private or public CPUs. One novel service is large-scale approximate Markov Chain–Monte Carlo (MCMC) analysis. The accuracy of the results is reliably estimated by running the same computation on multiple CPUs and evaluating the Gelman–Rubin Score to set aside unreliable results. Another service within the workflow is a novel parallelized exact algorithm for inferring maximum-likelihood haplotyping. The reported system enables genetic analyses that were previously infeasible. We demonstrate the system capabilities through a study of a large complex pedigree affected with metabolic syndrome. Availability: Superlink-Online SNP is freely available for researchers at http://cbl-hap.cs.technion.ac.il/superlink-snp. The system source code can also be downloaded from the system website. Contact: omerw@cs.technion.ac.il Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23162081
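    The Gelman–Rubin score mentioned above compares the spread between independent MCMC chains with the spread within each chain. The sketch below shows the textbook potential scale reduction factor for one scalar quantity. It is not taken from the Superlink-Online SNP source code, and the function name and the example chains are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def gelman_rubin(chains):
    """Potential scale reduction factor (R-hat) for equal-length chains.

    `chains` is a list of m >= 2 sequences, each with n >= 2 samples of
    the same scalar quantity (for example a LOD score estimate).
    Values close to 1 suggest the chains agree; larger values suggest
    the result should be treated as unreliable.
    """
    m = len(chains)
    if m < 2:
        raise ValueError("need at least two chains")
    n = len(chains[0])
    if n < 2 or any(len(c) != n for c in chains):
        raise ValueError("chains must have equal length of at least 2")

    chain_means = [mean(c) for c in chains]
    within = mean([variance(c) for c in chains])   # W: mean within-chain variance
    between = n * variance(chain_means)            # B: between-chain variance
    if within == 0:
        raise ValueError("within-chain variance is zero")

    pooled = (n - 1) / n * within + between / n    # pooled variance estimate
    return sqrt(pooled / within)


if __name__ == "__main__":
    agreeing = [[1, 2, 3, 4], [1, 2, 3, 4]]
    disagreeing = [[0, 1, 0, 1], [10, 11, 10, 11]]
    print(gelman_rubin(agreeing), gelman_rubin(disagreeing))
```

    A cutoff such as 1.1 is a common rule of thumb for flagging non-converged runs; the abstract does not say which threshold the system applies.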

  13. Sequential sentinel SNP Regional Association Plots (SSS-RAP): an approach for testing independence of SNP association signals using meta-analysis data.

    PubMed

    Zheng, Jie; Gaunt, Tom R; Day, Ian N M

    2013-01-01

    Genome-Wide Association Studies (GWAS) frequently incorporate meta-analysis within their framework. However, conditional analysis of individual-level data, which is an established approach for fine mapping of causal sites, is often precluded where only group-level summary data are available for analysis. Here, we present a numerical and graphical approach, "sequential sentinel SNP regional association plot" (SSS-RAP), which estimates regression coefficients (beta) with their standard errors using the meta-analysis summary results directly. Under an additive model, typical for genes with small effect, the effect for a sentinel SNP can be transformed to the predicted effect for a possibly dependent SNP through a 2×2 2-SNP haplotypes table. The approach assumes Hardy-Weinberg equilibrium for test SNPs. SSS-RAP is available as a Web-tool (http://apps.biocompute.org.uk/sssrap/sssrap.cgi). To develop and illustrate SSS-RAP we analyzed lipid and ECG traits data from the British Women's Heart and Health Study (BWHHS), evaluated a meta-analysis for ECG trait and presented several simulations. We compared results with existing approaches such as model selection methods and conditional analysis. Generally findings were consistent. SSS-RAP represents a tool for testing independence of SNP association signals using meta-analysis data, and is also a convenient approach based on biological principles for fine mapping in group level summary data.
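
    The role of the 2×2 haplotype table can be sketched as follows. The code uses the standard relationship between linkage disequilibrium and marginal effects under an additive model with Hardy-Weinberg equilibrium; it is not claimed to reproduce the exact SSS-RAP computation, and the allele labels (A/a for the sentinel SNP, B/b for the test SNP) are hypothetical.

```python
from math import sqrt


def ld_and_predicted_effect(n_AB, n_Ab, n_aB, n_ab, beta_sentinel):
    """Return (D, r, predicted beta at the test SNP) from haplotype counts.

    beta_sentinel is the per-allele effect of A; the returned effect is per
    copy of B, assuming the sentinel SNP alone drives the association.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    if total <= 0:
        raise ValueError("haplotype table is empty")

    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total
    p_B = (n_AB + n_aB) / total
    var_A = p_A * (1 - p_A)
    var_B = p_B * (1 - p_B)
    if var_A == 0 or var_B == 0:
        raise ValueError("both SNPs must be polymorphic")

    d = p_AB - p_A * p_B                     # linkage disequilibrium coefficient
    r = d / sqrt(var_A * var_B)              # correlation between the two SNPs
    predicted_beta = beta_sentinel * d / var_B
    return d, r, predicted_beta
```

    If the effect observed at the test SNP differs clearly from the predicted value, the test SNP is a candidate for an independent signal.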

  14. SNP-SNP interactions between WNT4 and WNT5A were associated with obesity related traits in Han Chinese Population

    PubMed Central

    Dong, Shan-Shan; Hu, Wei-Xin; Yang, Tie-Lin; Chen, Xiao-Feng; Yan, Han; Chen, Xiang-Ding; Tan, Li-Jun; Tian, Qing; Deng, Hong-Wen; Guo, Yan

    2017-01-01

    Considering the biological roles of WNT4 and WNT5A involved in adipogenesis, we aimed to investigate whether SNPs in WNT4 and WNT5A contribute to obesity related traits in Han Chinese population. Targeted genomic sequence for WNT4 and WNT5A was determined in 100 Han Chinese subjects and tag SNPs were selected. Both single SNP and SNP × SNP interaction association analyses with body mass index (BMI) were evaluated in the 100 subjects and another independent sample of 1,627 Han Chinese subjects. Meta-analyses were performed and multiple testing corrections were carried out using the Bonferroni method. Consistent with the Genetic Investigation of ANthropometric Traits (GIANT) dataset results, we didn’t detect significant association signals in single SNP association analyses. However, the interaction between rs2072920 and rs11918967 was associated with BMI after multiple testing corrections (combined P = 2.20 × 10^-4). The signal was also significant in each contributing data set. SNP rs2072920 is located in the 3′-UTR of WNT4 and SNP rs11918967 is located in the intron of WNT5A. Functional annotation results revealed that both SNPs might be involved in transcriptional regulation of gene expression. Our results suggest that a combined effect of SNPs via WNT4-WNT5A interaction may affect the variation of BMI in Han Chinese population. PMID:28272483

  15. Porcine colonization of the Americas: a 60k SNP story.

    PubMed

    Burgos-Paz, W; Souza, C A; Megens, H J; Ramayo-Caldas, Y; Melo, M; Lemús-Flores, C; Caal, E; Soto, H W; Martínez, R; Alvarez, L A; Aguirre, L; Iñiguez, V; Revidatti, M A; Martínez-López, O R; Llambi, S; Esteve-Codina, A; Rodríguez, M C; Crooijmans, R P M A; Paiva, S R; Schook, L B; Groenen, M A M; Pérez-Enciso, M

    2013-04-01

    The pig, Sus scrofa, is a foreign species to the American continent. Although pigs originally introduced in the Americas should be related to those from the Iberian Peninsula and Canary islands, the phylogeny of current creole pigs that now populate the continent is likely to be very complex. Because of the extreme climates that America harbors, these populations also provide a unique example of a fast evolutionary phenomenon of adaptation. Here, we provide a genome wide study of these issues by genotyping, with a 60k SNP chip, 206 village pigs sampled across 14 countries and 183 pigs from outgroup breeds that are potential founders of the American populations, including wild boar, Iberian, international and Chinese breeds. Results show that American village pigs are primarily of European ancestry, although the observed genetic landscape is that of a complex conglomerate. There was no correlation between genetic and geographical distances, neither continent wide nor when analyzing specific areas. Most populations showed a clear admixed structure where the Iberian pig was not necessarily the main component, illustrating how international breeds, but also Chinese pigs, have contributed to extant genetic composition of American village pigs. We also observe that many genes related to the cardiovascular system show an increased differentiation between altiplano and genetically related pigs living near sea level.

  16. Porcine colonization of the Americas: a 60k SNP story

    PubMed Central

    Burgos-Paz, W; Souza, C A; Megens, H J; Ramayo-Caldas, Y; Melo, M; Lemús-Flores, C; Caal, E; Soto, H W; Martínez, R; Álvarez, L A; Aguirre, L; Iñiguez, V; Revidatti, M A; Martínez-López, O R; Llambi, S; Esteve-Codina, A; Rodríguez, M C; Crooijmans, R P M A; Paiva, S R; Schook, L B; Groenen, M A M; Pérez-Enciso, M

    2013-01-01

    The pig, Sus scrofa, is a foreign species to the American continent. Although pigs originally introduced in the Americas should be related to those from the Iberian Peninsula and Canary islands, the phylogeny of current creole pigs that now populate the continent is likely to be very complex. Because of the extreme climates that America harbors, these populations also provide a unique example of a fast evolutionary phenomenon of adaptation. Here, we provide a genome wide study of these issues by genotyping, with a 60k SNP chip, 206 village pigs sampled across 14 countries and 183 pigs from outgroup breeds that are potential founders of the American populations, including wild boar, Iberian, international and Chinese breeds. Results show that American village pigs are primarily of European ancestry, although the observed genetic landscape is that of a complex conglomerate. There was no correlation between genetic and geographical distances, neither continent wide nor when analyzing specific areas. Most populations showed a clear admixed structure where the Iberian pig was not necessarily the main component, illustrating how international breeds, but also Chinese pigs, have contributed to extant genetic composition of American village pigs. We also observe that many genes related to the cardiovascular system show an increased differentiation between altiplano and genetically related pigs living near sea level. PMID:23250008

  17. Rapid SNP Detection and Genotyping of Bacterial Pathogens by Pyrosequencing.

    PubMed

    Amoako, Kingsley K; Thomas, Matthew C; Janzen, Timothy W; Goji, Noriko

    2017-01-01

    Bacterial identification and typing are fixtures of microbiology laboratories and are vital aspects of our response mechanisms in the event of foodborne outbreaks and bioterrorist events. Whole genome sequencing (WGS) is leading the way in terms of expanding our ability to identify and characterize bacteria through the identification of subtle differences between genomes (e.g. single nucleotide polymorphisms (SNPs) and insertions/deletions). Modern high-throughput technologies such as pyrosequencing can facilitate the typing of bacteria by generating short-read sequence data of informative regions identified by WGS analyses, at a fraction of the cost of WGS. Thus, pyrosequencing systems remain a valuable asset in the laboratory today. Presented in this chapter are two methods developed in the Amoako laboratory that detail the identification and genotyping of bacterial pathogens. The first targets canonical single nucleotide polymorphisms (canSNPs) of evolutionary importance in Bacillus anthracis, the causative agent of Anthrax. The second assay detects Shiga-toxin (stx) genes, which are associated with virulence in Escherichia coli and Shigella spp., and differentiates the subtypes of stx-1 and stx-2 based on SNP loci. These rapid methods provide end users with important information regarding virulence traits as well as the evolutionary and biogeographic origin of isolates.

  18. SNP Markers as Additional Information to Resolve Complex Kinship Cases

    PubMed Central

    Pontes, M. Lurdes; Fondevila, Manuel; Laréu, Maria Victoria; Medeiros, Rui

    2015-01-01

    Summary Background DNA profiling with sets of highly polymorphic autosomal short tandem repeat (STR) markers has been applied in various aspects of human identification in forensic casework for nearly 20 years. However, in some cases of complex kinship investigation, the information provided by the conventionally used STR markers is not enough, often resulting in low likelihood ratio (LR) calculations. In these cases, it becomes necessary to increment the number of loci under analysis to reach adequate LRs. Recently, it has been proposed that single nucleotide polymorphisms (SNPs) could be used as a supportive tool to STR typing, eventually even replacing the methods/markers now employed. Methods In this work, we describe the results obtained in 7 revised complex paternity cases when applying a battery of STRs, as well as 52 human identification SNPs (SNPforID 52plex identification panel) using a SNaPshot methodology followed by capillary electrophoresis. Results Our results show that the analysis of SNPs, as complement to STR typing in forensic casework applications, would at least increase by a factor of 4 total PI values and correspondent Essen-Möller's W value. Conclusions We demonstrated that SNP genotyping could be a key complement to STR information in challenging casework of disputed paternity, such as close relative individualization or complex pedigrees subject to endogamous relations. PMID:26733770
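
    The relationship between per-locus paternity indices (PI) and Essen-Möller's W can be sketched as below. The sketch assumes independent loci, so the combined PI is the product of the per-locus values, and a prior probability of 0.5 by default. The numeric values in the usage note are illustrative and are not taken from the study.

```python
from math import prod


def combined_paternity_index(locus_pis):
    """Multiply per-locus paternity indices, assuming independent loci."""
    if not locus_pis or any(pi < 0 for pi in locus_pis):
        raise ValueError("need a non-empty list of non-negative indices")
    return prod(locus_pis)


def essen_moller_w(pi, prior=0.5):
    """Posterior probability of paternity given a combined PI and a prior."""
    if not 0 < prior < 1:
        raise ValueError("prior must lie strictly between 0 and 1")
    return pi * prior / (pi * prior + (1 - prior))
```

    With a prior of 0.5 the expression reduces to W = PI / (PI + 1). For example, a combined PI of 10 gives W ≈ 0.909, and a fourfold increase to 40 gives W ≈ 0.976.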

  19. Using Mendelian inheritance to improve high-throughput SNP discovery.

    PubMed

    Chen, Nancy; Van Hout, Cristopher V; Gottipati, Srikanth; Clark, Andrew G

    2014-11-01

    Restriction site-associated DNA sequencing or genotyping-by-sequencing (GBS) approaches allow for rapid and cost-effective discovery and genotyping of thousands of single-nucleotide polymorphisms (SNPs) in multiple individuals. However, rigorous quality control practices are needed to avoid high levels of error and bias with these reduced representation methods. We developed a formal statistical framework for filtering spurious loci, using Mendelian inheritance patterns in nuclear families, that accommodates variable-quality genotype calls and missing data--both rampant issues with GBS data--and for identifying sex-linked SNPs. Simulations predict excellent performance of both the Mendelian filter and the sex-linkage assignment under a variety of conditions. We further evaluate our method by applying it to real GBS data and validating a subset of high-quality SNPs. These results demonstrate that our metric of Mendelian inheritance is a powerful quality filter for GBS loci that is complementary to standard coverage and Hardy-Weinberg filters. The described method, implemented in the software MendelChecker, will improve quality control during SNP discovery in nonmodel as well as model organisms.
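
    The idea of a Mendelian inheritance filter can be illustrated with a deliberately simple hard-call check on parent-offspring trios. This is only a sketch under stated assumptions: MendelChecker itself is described as a statistical framework that accommodates variable-quality calls and missing data, whereas the code below treats genotypes as certain, covers autosomal diploid loci only, and skips trios with any missing genotype. All names are hypothetical.

```python
def is_mendelian_consistent(father, mother, child):
    """Check one autosomal diploid locus in a trio.

    Genotypes are 2-tuples of alleles, e.g. ("A", "G"), or None when missing.
    Returns True/False, or None if any genotype is missing.
    """
    if father is None or mother is None or child is None:
        return None
    c1, c2 = child
    # One child allele must come from each parent, in either order.
    return (c1 in father and c2 in mother) or (c2 in father and c1 in mother)


def mendelian_error_rate(trios):
    """Fraction of informative trios that violate Mendelian inheritance."""
    results = [is_mendelian_consistent(f, m, c) for f, m, c in trios]
    informative = [r for r in results if r is not None]
    if not informative:
        return None
    return informative.count(False) / len(informative)
```

    A locus with a high error rate across families is a candidate for removal as a spurious SNP.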

  20. Imputation of KIR Types from SNP Variation Data

    PubMed Central

    Vukcevic, Damjan; Traherne, James A.; Næss, Sigrid; Ellinghaus, Eva; Kamatani, Yoichiro; Dilthey, Alexander; Lathrop, Mark; Karlsen, Tom H.; Franke, Andre; Moffatt, Miriam; Cookson, William; Trowsdale, John; McVean, Gil; Sawcer, Stephen; Leslie, Stephen

    2015-01-01

    Large population studies of immune system genes are essential for characterizing their role in diseases, including autoimmune conditions. Of key interest are a group of genes encoding the killer cell immunoglobulin-like receptors (KIRs), which have known and hypothesized roles in autoimmune diseases, resistance to viruses, reproductive conditions, and cancer. These genes are highly polymorphic, which makes typing expensive and time consuming. Consequently, despite their importance, KIRs have been little studied in large cohorts. Statistical imputation methods developed for other complex loci (e.g., human leukocyte antigen [HLA]) on the basis of SNP data provide an inexpensive high-throughput alternative to direct laboratory typing of these loci and have enabled important findings and insights for many diseases. We present KIR∗IMP, a method for imputation of KIR copy number. We show that KIR∗IMP is highly accurate and thus allows the study of KIRs in large cohorts and enables detailed investigation of the role of KIRs in human disease. PMID:26430804

  1. Development and Evaluation of a 9K SNP Array for Peach by Internationally Coordinated SNP Detection and Validation in Breeding Germplasm

    PubMed Central

    Scalabrin, Simone; Gilmore, Barbara; Lawley, Cynthia T.; Gasic, Ksenija; Micheletti, Diego; Rosyara, Umesh R.; Cattonaro, Federica; Vendramin, Elisa; Main, Dorrie; Aramini, Valeria; Blas, Andrea L.; Mockler, Todd C.; Bryant, Douglas W.; Wilhelm, Larry; Troggio, Michela; Sosinski, Bryon; Aranzana, Maria José; Arús, Pere; Iezzoni, Amy; Morgante, Michele; Peace, Cameron

    2012-01-01

    Although a large number of single nucleotide polymorphism (SNP) markers covering the entire genome are needed to enable molecular breeding efforts such as genome wide association studies, fine mapping, genomic selection and marker-assisted selection in peach [Prunus persica (L.) Batsch] and related Prunus species, only a limited number of genetic markers, including simple sequence repeats (SSRs), have been available to date. To address this need, an international consortium (The International Peach SNP Consortium; IPSC) has pursued a coordinated effort to perform genome-scale SNP discovery in peach using next generation sequencing platforms to develop and characterize a high-throughput Illumina Infinium® SNP genotyping array platform. We performed whole genome re-sequencing of 56 peach breeding accessions using the Illumina and Roche/454 sequencing technologies. Polymorphism detection algorithms identified a total of 1,022,354 SNPs. Validation with the Illumina GoldenGate® assay was performed on a subset of the predicted SNPs, verifying ∼75% of genic (exonic and intronic) SNPs, whereas only about a third of intergenic SNPs were verified. Conservative filtering was applied to arrive at a set of 8,144 SNPs that were included on the IPSC peach SNP array v1, distributed over all eight peach chromosomes with an average spacing of 26.7 kb between SNPs. Use of this platform to screen a total of 709 accessions of peach in two separate evaluation panels identified a total of 6,869 (84.3%) polymorphic SNPs. The almost 7,000 SNPs verified as polymorphic through extensive empirical evaluation represent an excellent source of markers for future studies in genetic relatedness, genetic mapping, and dissecting the genetic architecture of complex agricultural traits. The IPSC peach SNP array v1 is commercially available and we expect that it will be used worldwide for genetic studies in peach and related stone fruit and nut species. PMID:22536421

  2. Interim report on updated microarray probes for the LLNL Burkholderia pseudomallei SNP array

    SciTech Connect

    Gardner, S; Jaing, C

    2012-03-27

    The overall goal of this project is to forensically characterize 100 unknown Burkholderia isolates in the US-Australia collaboration. We will identify genome-wide single nucleotide polymorphisms (SNPs) from B. pseudomallei and near neighbor species including B. mallei, B. thailandensis and B. oklahomensis. We will design microarray probes to detect these SNP markers and analyze 100 Burkholderia genomic DNAs extracted from environmental, clinical and near neighbor isolates from Australian collaborators on the Burkholderia SNP microarray. We will analyze the microarray genotyping results to characterize the genetic diversity of these new isolates and triage the samples for whole genome sequencing. In this interim report, we described the SNP analysis and the microarray probe design for the Burkholderia SNP microarray.

  3. A user guide to the Brassica 60K Illumina Infinium™ SNP genotyping array.

    PubMed

    Mason, Annaliese S; Higgins, Erin E; Snowdon, Rod J; Batley, Jacqueline; Stein, Anna; Werner, Christian; Parkin, Isobel A P

    2017-02-20

    The Brassica napus 60K Illumina Infinium™ SNP array has had huge international uptake in the rapeseed community due to the revolutionary speed of acquisition and ease of analysis of this high-throughput genotyping data, particularly when coupled with the newly available reference genome sequence. However, further utilization of this valuable resource can be optimized by better understanding the promises and pitfalls of SNP arrays. We outline how best to analyze Brassica SNP marker array data for diverse applications, including linkage and association mapping, genetic diversity and genomic introgression studies. We present data on which SNPs are locus-specific in winter, semi-winter and spring B. napus germplasm pools, rather than amplifying both an A-genome and a C-genome locus or multiple loci. Common issues that arise when analyzing array data will be discussed, particularly those unique to SNP markers and how to deal with these for practical applications in Brassica breeding applications.

  4. Use of molecular variation in the NCBI dbSNP database.

    PubMed

    Sherry, S T; Ward, M; Sirotkin, K

    2000-01-01

    While high quality information regarding variation in genes is currently available in locus-specific or specialized mutation databases, the need remains for a general catalog of genome variation to address the large-scale sampling designs required by association studies, gene mapping, and evolutionary biology. In response to this need, the National Center for Biotechnology Information (NCBI) has established the dbSNP database http://ncbi.nlm.nih.gov/SNP/ to serve as a generalized, central variation database. Submissions to dbSNP will be integrated with other sources of information at NCBI such as GenBank, PubMed, LocusLink, and the Human Genome Project data, and the complete contents of dbSNP are available to the public via anonymous FTP. Hum Mutat 15:68-75, 2000. Published 2000 Wiley-Liss, Inc.

  5. Set up of cutoff thresholds for kinship determination using SNP loci.

    PubMed

    Cho, Sohee; Shin, Eun Soon; Yu, Hyung Jin; Lee, Ji Hyun; Seo, Hee Jin; Kim, Moon Young; Lee, Soong Deok

    2017-03-08

    The usefulness of single nucleotide polymorphism (SNP) loci for kinship testing has been demonstrated in many case works, and suggested as a promising marker for relationship identification. For interpreting results based on the calculation of the likelihood ratio (LR) in kinship testing, it is important to prepare cutoffs for respective relatives which are dependent on genetic relatedness. For this, analysis using true pedigree data is significant and reliable as it reflects the actual frequencies of markers in the population. In this study, the kinship index was explored through 1209 parent-child pairs, 1373 full sibling pairs, and 247 uncle-nephew pairs using 136 SNP loci. The cutoffs for LR were set up using different numbers of SNP loci with accuracy, sensitivity, and specificity. It is expected that this study can support the application of SNP loci-based kinship testing for various relationships.
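
    Evaluating a likelihood ratio (LR) cutoff in terms of accuracy, sensitivity and specificity can be sketched as follows. The sketch assumes two lists of LRs, one from pairs known to be related and one from pairs known to be unrelated, and classifies a pair as related when its LR meets the cutoff. The function name and the data layout are assumptions for illustration, not the authors' procedure.

```python
def evaluate_cutoff(related_lrs, unrelated_lrs, cutoff):
    """Return sensitivity, specificity and accuracy for one LR cutoff."""
    if not related_lrs or not unrelated_lrs:
        raise ValueError("both groups must contain at least one pair")

    true_pos = sum(lr >= cutoff for lr in related_lrs)
    false_neg = len(related_lrs) - true_pos
    true_neg = sum(lr < cutoff for lr in unrelated_lrs)
    false_pos = len(unrelated_lrs) - true_neg

    total = len(related_lrs) + len(unrelated_lrs)
    return {
        "sensitivity": true_pos / (true_pos + false_neg),
        "specificity": true_neg / (true_neg + false_pos),
        "accuracy": (true_pos + true_neg) / total,
    }
```

    Scanning a range of cutoffs with this function shows the usual trade-off: a higher cutoff raises specificity at the cost of sensitivity.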

  6. SNP discovery and genotyping using Genotyping-by-Sequencing in Pekin ducks

    PubMed Central

    Zhu, Feng; Cui, Qian-Qian; Hou, Zhuo-Cheng

    2016-01-01

    Genomic selection and genome-wide association studies need thousands to millions of SNPs. However, many non-model species do not have reference chips for detecting variation. Our goal was to develop and validate an inexpensive but effective method for detecting SNP variation. Genotyping by sequencing (GBS) can be a highly efficient strategy for genome-wide SNP detection, as an alternative to microarray chips. Here, we developed a GBS protocol for ducks and tested it to genotype 49 Pekin ducks. A total of 169,209 SNPs were identified from all animals, with a mean of 55,920 SNPs per individual. The average SNP density reached 1156 SNPs/Mb. In this study, the first application of GBS to ducks, we demonstrate the power and simplicity of this method. GBS can be used in genetic studies as an effective method for genome-wide SNP discovery. PMID:27845353

  7. Gene-Environment Interaction in the Etiology of Mathematical Ability Using SNP Sets

    PubMed Central

    Kovas, Yulia; Plomin, Robert

    2010-01-01

    Mathematics ability and disability is as heritable as other cognitive abilities and disabilities, however its genetic etiology has received relatively little attention. In our recent genome-wide association study of mathematical ability in 10-year-old children, 10 SNP associations were nominated from scans of pooled DNA and validated in an individually genotyped sample. In this paper, we use a ‘SNP set’ composite of these 10 SNPs to investigate gene-environment (GE) interaction, examining whether the association between the 10-SNP set and mathematical ability differs as a function of ten environmental measures in the home and school in a sample of 1888 children with complete data. We found two significant GE interactions for environmental measures in the home and the school both in the direction of the diathesis-stress type of GE interaction: The 10-SNP set was more strongly associated with mathematical ability in chaotic homes and when parents are negative. PMID:20978832

  8. Evaluation of breast cancer susceptibility using improved genetic algorithms to generate genotype SNP barcodes.

    PubMed

    Yang, Cheng-Hong; Lin, Yu-Da; Chuang, Li-Yeh; Chang, Hsueh-Wei

    2013-01-01

    Genetic association is a challenging task for the identification and characterization of genes that increase the susceptibility to common complex multifactorial diseases. To fully execute genetic studies of complex diseases, modern geneticists face the challenge of detecting interactions between loci. A genetic algorithm (GA) is developed to detect the association of genotype frequencies of cancer cases and noncancer cases based on statistical analysis. An improved genetic algorithm (IGA) is proposed to improve the reliability of the GA method for high-dimensional SNP-SNP interactions. The strategy offers the top five results to the random population process, in which they guide the GA toward a significant search course. The IGA increases the likelihood of quickly detecting the maximum ratio difference between cancer cases and noncancer cases. The joint effect of 23 SNP combinations of six steroid hormone metabolism and signaling-related genes involved in breast carcinogenesis pathways was systematically evaluated, with the IGA successfully detecting significant ratio differences between breast cancer cases and noncancer cases. The possible breast cancer risks were subsequently analyzed by odds-ratio (OR) and risk-ratio analysis. The estimated OR of the best SNP barcode is significantly higher than 1 (between 1.15 and 7.01) for specific combinations of two to 13 SNPs. Analysis results support that the IGA provides higher ratio difference values than the GA between breast cancer cases and noncancer cases over 3-SNP to 13-SNP interactions. A more specific SNP-SNP interaction profile for the risk of breast cancer is also provided.
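
    The odds-ratio step can be illustrated with a minimal sketch for a single SNP barcode. It assumes a 2×2 table of cases and controls with and without the barcode, and uses the standard Woolf (log) method for an approximate 95% confidence interval. The argument names are hypothetical and the counts in the usage note are not from the study.

```python
from math import exp, log, sqrt


def odds_ratio(case_with, case_without, control_with, control_without, z=1.96):
    """Return (OR, lower, upper) for a 2x2 case-control table.

    The interval is the Woolf approximation; z=1.96 gives about 95% coverage.
    """
    cells = (case_with, case_without, control_with, control_without)
    if any(c <= 0 for c in cells):
        raise ValueError("all four cells must be positive for this estimator")

    estimate = (case_with * control_without) / (case_without * control_with)
    std_err = sqrt(sum(1 / c for c in cells))    # standard error of log(OR)
    lower = exp(log(estimate) - z * std_err)
    upper = exp(log(estimate) + z * std_err)
    return estimate, lower, upper
```

    For instance, 20 of 30 cases and 10 of 30 controls carrying a barcode give an OR of 4. The association would be called significant at this level when the lower bound stays above 1.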

  9. Prim-SNPing: a primer designer for cost-effective SNP genotyping.

    PubMed

    Chang, Hsueh-Wei; Chuang, Li-Yeh; Cheng, Yu-Huei; Hung, Yu-Chen; Wen, Cheng-Hao; Gu, De-Leung; Yang, Cheng-Hong

    2009-05-01

    Many kinds of primer design (PD) software tools have been developed, but most of them lack a single nucleotide polymorphism (SNP) genotyping service. Here, we introduce the web-based freeware "Prim-SNPing," which, in addition to general PD, provides three kinds of primer design functions for cost-effective SNP genotyping: natural PD, mutagenic PD, and confronting two-pair primers (CTPP) PD. The natural PD and mutagenic PD provide primers and restriction enzyme mining for polymerase chain reaction-restriction fragment length polymorphism (PCR-RFLP), while CTPP PD provides primers for restriction enzyme-free SNP genotyping. The PCR specificity and efficiency of the designed primers are improved by BLAST searching and evaluating secondary structure (such as GC clamps, dimers, and hairpins), respectively. The length pattern of PCR-RFLP using natural PD is user-adjustable, and the restriction sites of the RFLP enzymes provided by Prim-SNPing are confirmed to be absent within the generated PCR product. In CTPP PD, the need for a separate digestion step in RFLP is eliminated, thus making it faster and cheaper. The output of Prim-SNPing includes the primer list, melting temperature (Tm) value, GC percentage, and amplicon size with enzyme digestion information. The reference SNP (refSNP, or rs) clusters from the Single Nucleotide Polymorphism database (dbSNP) at the National Center for Biotechnology Information (NCBI), and multiple other formats of human, mouse, and rat SNP sequences are acceptable input. In summary, Prim-SNPing provides interactive, user-friendly and cost-effective primer design for SNP genotyping. It is freely available at http://bio.kuas.edu.tw/prim-snping.
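
    Two of the reported primer properties, GC percentage and melting temperature, can be approximated with a short sketch. The abstract does not say which Tm formula Prim-SNPing uses; the code below assumes the simple Wallace rule, Tm = 2(A+T) + 4(G+C) in °C, which is only a rough estimate for short oligonucleotides. The function names are hypothetical.

```python
def _validated(primer):
    """Upper-case the primer and reject anything other than A, C, G, T."""
    seq = primer.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("primer must be a non-empty string of A, C, G, T")
    return seq


def gc_percent(primer):
    """Percentage of G and C bases in the primer."""
    seq = _validated(primer)
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(primer):
    """Approximate melting temperature in degrees Celsius (Wallace rule)."""
    seq = _validated(primer)
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc
```

    A 16-mer with eight A/T and eight G/C bases has a GC percentage of 50 and a Wallace-rule Tm of 48 °C.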

  10. Electrochemical Li Topotactic Reaction in Layered SnP3 for Superior Li-Ion Batteries

    PubMed Central

    Park, Jae-Wan; Park, Cheol-Min

    2016-01-01

    The development of new anode materials having high electrochemical performances and interesting reaction mechanisms is highly required to satisfy the need for long-lasting mobile electronic devices and electric vehicles. Here, we report a layer crystalline structured SnP3 and its unique electrochemical behaviors with Li. The SnP3 was simply synthesized through modification of Sn crystallography by combination with P and its potential as an anode material for LIBs was investigated. During Li insertion reaction, the SnP3 anode showed an interesting two-step electrochemical reaction mechanism comprised of a topotactic transition (0.7–2.0 V) and a conversion (0.0–2.0 V) reaction. When the SnP3-based composite electrode was tested within the topotactic reaction region (0.7–2.0 V) between SnP3 and LixSnP3 (x ≤ 4), it showed excellent electrochemical properties, such as a high volumetric capacity (1st discharge/charge capacity was 840/663 mA h cm−3) with a high initial coulombic efficiency, stable cycle behavior (636 mA h cm−3 over 100 cycles), and fast rate capability (550 mA h cm−3 at 3C). This layered SnP3 anode will be applicable to a new anode material for rechargeable LIBs. PMID:27775090

  11. Electrochemical Li Topotactic Reaction in Layered SnP3 for Superior Li-Ion Batteries

    NASA Astrophysics Data System (ADS)

    Park, Jae-Wan; Park, Cheol-Min

    2016-10-01

    The development of new anode materials having high electrochemical performances and interesting reaction mechanisms is highly required to satisfy the need for long-lasting mobile electronic devices and electric vehicles. Here, we report a layer crystalline structured SnP3 and its unique electrochemical behaviors with Li. The SnP3 was simply synthesized through modification of Sn crystallography by combination with P and its potential as an anode material for LIBs was investigated. During Li insertion reaction, the SnP3 anode showed an interesting two-step electrochemical reaction mechanism comprised of a topotactic transition (0.7–2.0 V) and a conversion (0.0–2.0 V) reaction. When the SnP3-based composite electrode was tested within the topotactic reaction region (0.7–2.0 V) between SnP3 and LixSnP3 (x ≤ 4), it showed excellent electrochemical properties, such as a high volumetric capacity (1st discharge/charge capacity was 840/663 mA h cm‑3) with a high initial coulombic efficiency, stable cycle behavior (636 mA h cm‑3 over 100 cycles), and fast rate capability (550 mA h cm‑3 at 3C). This layered SnP3 anode will be applicable to a new anode material for rechargeable LIBs.

  12. Construction of a versatile SNP array for pyramiding useful genes of rice.

    PubMed

    Kurokawa, Yusuke; Noda, Tomonori; Yamagata, Yoshiyuki; Angeles-Shim, Rosalyn; Sunohara, Hidehiko; Uehara, Kanako; Furuta, Tomoyuki; Nagai, Keisuke; Jena, Kshirod Kumar; Yasui, Hideshi; Yoshimura, Atsushi; Ashikari, Motoyuki; Doi, Kazuyuki

    2016-01-01

    DNA marker-assisted selection (MAS) has become an indispensable component of breeding. Single nucleotide polymorphisms (SNPs) are the most frequent polymorphism in the rice genome. However, SNP markers are not readily employed in MAS because of limitations in genotyping platforms. Here the authors report a Golden Gate SNP array that targets specific genes controlling yield-related traits and biotic stress resistance in rice. As a first step, the SNP genotypes were surveyed in 31 parental varieties using the Affymetrix Rice 44K SNP microarray. The haplotype information for 16 target genes was then converted to the Golden Gate platform with 143-plex markers. Haplotypes for the 14 useful alleles are unique and can discriminate among all other varieties. The genotyping consistency between the Affymetrix microarray and the Golden Gate array was 92.8%, and the accuracy of the Golden Gate array was confirmed in 3 F2 segregating populations. The concept of the haplotype-based selection by using the constructed SNP array was proven.

  13. SNP-based prediction of the human germ cell methylation landscape.

    PubMed

    Xie, Hehuang; Wang, Min; Bischof, Jared; Bonaldo, Maria de Fatima; Soares, Marcelo Bento

    2009-05-01

    Base substitution occurs at a high rate at CpG dinucleotides due to the frequent methylation of CpG and the deamination of methylated cytosine to thymine. If these substitutions occur in germ cells, they constitute a heritable mutation that may eventually rise to polymorphic frequencies, hence resulting in a SNP that is methylation associated. In this study, we sought to identify clusters of methylation associated SNPs as a basis for prediction of methylation landscapes of germ cell genomes. Genomic regions enriched with methylation associated SNPs, namely "methylation associated SNP clusters", were identified with an agglomerative hierarchical clustering algorithm. Repetitive elements, segmental duplications, and syntenic tandem DNA repeats were enriched in methylation associated SNP clusters. The frequency of methylation associated SNPs in Alu Y/S elements exhibited a gradient pattern suggestive of linear spreading, being higher in proximity to methylation associated SNP clusters and lower closer to CpG islands. Interestingly, methylation associated SNP clusters were over-represented near the transcriptional initiation sites of immune response genes. We propose a de novo DNA methylation model during germ cell development whereby a pattern is established by long-range chromatin interactions through syntenic repeats combined with regional methylation spreading from methylation associated SNP clusters.

  14. SNP2TFBS – a database of regulatory SNPs affecting predicted transcription factor binding site affinity

    PubMed Central

    Kumar, Sunil; Ambrosini, Giovanna; Bucher, Philipp

    2017-01-01

    SNP2TFBS is a computational resource intended to support researchers investigating the molecular mechanisms underlying regulatory variation in the human genome. The database essentially consists of a collection of text files providing specific annotations for human single nucleotide polymorphisms (SNPs), namely whether they are predicted to abolish, create or change the affinity of one or several transcription factor (TF) binding sites. A SNP's effect on TF binding is estimated based on a position weight matrix (PWM) model for the binding specificity of the corresponding factor. These data files are regenerated at regular intervals by an automatic procedure that takes as input a reference genome, a comprehensive SNP catalogue and a collection of PWMs. SNP2TFBS is also accessible over a web interface, enabling users to view the information provided for an individual SNP, to extract SNPs based on various search criteria, to annotate uploaded sets of SNPs or to display statistics about the frequencies of binding sites affected by selected SNPs. Homepage: http://ccg.vital-it.ch/snp2tfbs/. PMID:27899579
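
    Estimating a SNP's effect on TF binding from a position weight matrix can be sketched as follows. The sketch assumes a log-odds PWM stored as one dictionary per motif position and scans only the forward strand; the toy matrix in the usage note is invented for illustration. The scoring scheme, thresholds and function names are assumptions, not the SNP2TFBS pipeline.

```python
def score_window(pwm, window):
    """Sum the PWM column scores for a window as long as the motif."""
    return sum(column[base] for column, base in zip(pwm, window))


def best_score(pwm, seq, snp_pos):
    """Best score over all forward-strand windows that overlap snp_pos."""
    width = len(pwm)
    first = max(0, snp_pos - width + 1)
    last = min(snp_pos, len(seq) - width)
    if last < first:
        raise ValueError("sequence too short for this motif at snp_pos")
    return max(score_window(pwm, seq[s:s + width]) for s in range(first, last + 1))


def snp_effect(pwm, ref_seq, snp_pos, alt_base):
    """Change in best PWM score (alt minus ref) caused by the substitution."""
    alt_seq = ref_seq[:snp_pos] + alt_base + ref_seq[snp_pos + 1:]
    return best_score(pwm, alt_seq, snp_pos) - best_score(pwm, ref_seq, snp_pos)
```

    With a toy three-column matrix that favours TGA, changing the G in CCTGACC to C lowers the best score, which corresponds to a predicted weakening of the site. A positive difference would instead suggest a created or strengthened site.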

  15. SNP and mutation data on the web - hidden treasures for uncovering.

    PubMed

    Barnes, Michael R

    2002-01-01

    SNP data has grown exponentially over the last two years, and SNP database evolution has matched this growth: the initial development of several independent SNP databases has given way to one central SNP database, dbSNP. Other SNP databases have instead evolved to complement this central database by providing gene-specific focus and an increased level of curation and analysis on subsets of data derived from the central data set. By contrast, human mutation data, which has been collected over many years, is still stored in disparate sources, although moves are afoot to establish a similar central database. These developments are timely, because human mutation and polymorphism data hold complementary keys to a better understanding of how genes function and malfunction in disease. The impending availability of a complete human genome presents us with an ideal framework to integrate both these forms of data; as our understanding of the mechanisms of disease increases, the full genomic context of variation may become increasingly significant.

  16. SNP2TFBS - a database of regulatory SNPs affecting predicted transcription factor binding site affinity.

    PubMed

    Kumar, Sunil; Ambrosini, Giovanna; Bucher, Philipp

    2017-01-04

    SNP2TFBS is a computational resource intended to support researchers investigating the molecular mechanisms underlying regulatory variation in the human genome. The database essentially consists of a collection of text files providing specific annotations for human single nucleotide polymorphisms (SNPs), namely whether they are predicted to abolish, create or change the affinity of one or several transcription factor (TF) binding sites. A SNP's effect on TF binding is estimated based on a position weight matrix (PWM) model for the binding specificity of the corresponding factor. These data files are regenerated at regular intervals by an automatic procedure that takes as input a reference genome, a comprehensive SNP catalogue and a collection of PWMs. SNP2TFBS is also accessible over a web interface, enabling users to view the information provided for an individual SNP, to extract SNPs based on various search criteria, to annotate uploaded sets of SNPs or to display statistics about the frequencies of binding sites affected by selected SNPs. Homepage: http://ccg.vital-it.ch/snp2tfbs/.

  17. Identification of Laying-Related SNP Markers in Geese Using RAD Sequencing.

    PubMed

    Yu, ShiGang; Chu, WeiWei; Zhang, LiFan; Han, HouMing; Zhao, RongXue; Wu, Wei; Zhu, JiangNing; Dodson, Michael V; Wei, Wei; Liu, HongLin; Chen, Jie

    2015-01-01

    Laying performance is an important economic trait of goose production. As laying performance is of low heritability, it is of significance to develop a marker-assisted selection (MAS) strategy for this trait. Definition of sequence variation related to the target trait is a prerequisite of quantitating MAS, but little is presently known about the goose genome, which greatly hinders the identification of genetic markers for the laying traits of geese. Recently developed restriction site-associated DNA (RAD) sequencing is a possible approach for discerning large-scale single nucleotide polymorphism (SNP) and reducing the complexity of a genome without having reference genomic information available. In the present study, we developed a pooled RAD sequencing strategy for detecting geese laying-related SNP. Two DNA pools were constructed, each consisting of equal amounts of genomic DNA from 10 individuals with either high estimated breeding value (HEBV) or low estimated breeding value (LEBV). A total of 139,013 SNP were obtained from 42,291,356 sequences, of which 18,771,943 were for LEBV and 23,519,413 were for HEBV cohorts. Fifty-five SNP which had different allelic frequencies in the two DNA pools were further validated by individual-based AS-PCR genotyping in the LEBV and HEBV cohorts. Ten out of 55 SNP exhibited distinct allele distributions in these two cohorts. These 10 SNP were further genotyped in a goose population of 492 geese to verify the association with egg numbers. The result showed that 8 of 10 SNP were associated with egg numbers. Additionally, linear regression analysis revealed that SNP Record-111407, 106975 and 112359 were involved in a multiple-gene network affecting laying performance. We used IPCR to extend the unknown regions flanking the candidate RAD tags. The obtained sequences were subjected to BLAST to retrieve the orthologous genes in either ducks or chickens. Five novel genes were cloned for geese which harbored the candidate laying

  18. Identification of Laying-Related SNP Markers in Geese Using RAD Sequencing

    PubMed Central

    Yu, ShiGang; Chu, WeiWei; Zhang, LiFan; Han, HouMing; Zhao, RongXue; Wu, Wei; Zhu, JiangNing; Dodson, Michael V.; Wei, Wei; Liu, HongLin; Chen, Jie

    2015-01-01

    Laying performance is an important economic trait of goose production. As laying performance is of low heritability, it is of significance to develop a marker-assisted selection (MAS) strategy for this trait. Definition of sequence variation related to the target trait is a prerequisite of quantitating MAS, but little is presently known about the goose genome, which greatly hinders the identification of genetic markers for the laying traits of geese. Recently developed restriction site-associated DNA (RAD) sequencing is a possible approach for discerning large-scale single nucleotide polymorphism (SNP) and reducing the complexity of a genome without having reference genomic information available. In the present study, we developed a pooled RAD sequencing strategy for detecting geese laying-related SNP. Two DNA pools were constructed, each consisting of equal amounts of genomic DNA from 10 individuals with either high estimated breeding value (HEBV) or low estimated breeding value (LEBV). A total of 139,013 SNP were obtained from 42,291,356 sequences, of which 18,771,943 were for LEBV and 23,519,413 were for HEBV cohorts. Fifty-five SNP which had different allelic frequencies in the two DNA pools were further validated by individual-based AS-PCR genotyping in the LEBV and HEBV cohorts. Ten out of 55 SNP exhibited distinct allele distributions in these two cohorts. These 10 SNP were further genotyped in a goose population of 492 geese to verify the association with egg numbers. The result showed that 8 of 10 SNP were associated with egg numbers. Additionally, linear regression analysis revealed that SNP Record-111407, 106975 and 112359 were involved in a multiple-gene network affecting laying performance. We used IPCR to extend the unknown regions flanking the candidate RAD tags. The obtained sequences were subjected to BLAST to retrieve the orthologous genes in either ducks or chickens. Five novel genes were cloned for geese which harbored the candidate laying

  19. Optimal Design of Low-Density SNP Arrays for Genomic Prediction: Algorithm and Applications.

    PubMed

    Wu, Xiao-Lin; Xu, Jiaqi; Feng, Guofei; Wiggans, George R; Taylor, Jeremy F; He, Jun; Qian, Changsong; Qiu, Jiansheng; Simpson, Barry; Walker, Jeremy; Bauck, Stewart

    2016-01-01

    Low-density (LD) single nucleotide polymorphism (SNP) arrays provide a cost-effective solution for genomic prediction and selection, but algorithms and computational tools are needed for the optimal design of LD SNP chips. A multiple-objective, local optimization (MOLO) algorithm was developed for design of optimal LD SNP chips that can be imputed accurately to medium-density (MD) or high-density (HD) SNP genotypes for genomic prediction. The objective function facilitates maximization of non-gap map length and system information for the SNP chip, and the latter is computed either as locus-averaged (LASE) or haplotype-averaged Shannon entropy (HASE) and adjusted for uniformity of the SNP distribution. HASE performed better than LASE with ≤1,000 SNPs, but required considerably more computing time. Nevertheless, the differences diminished when >5,000 SNPs were selected. Optimization was accomplished conditionally on the presence of SNPs that were obligated to each chromosome. The frame location of SNPs on a chip can be either uniform (evenly spaced) or non-uniform. For the latter design, a tunable empirical Beta distribution was used to guide location distribution of frame SNPs such that both ends of each chromosome were enriched with SNPs. The SNP distribution on each chromosome was finalized through the objective function that was locally and empirically maximized. This MOLO algorithm was capable of selecting a set of approximately evenly-spaced and highly-informative SNPs, which in turn led to increased imputation accuracy compared with selection solely of evenly-spaced SNPs. Imputation accuracy increased with LD chip size, and imputation error rate was extremely low for chips with ≥3,000 SNPs. Assuming that genotyping or imputation error occurs at random, imputation error rate can be viewed as the upper limit for genomic prediction error. Our results show that about 25% of imputation error rate was propagated to genomic prediction in an Angus population. The
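
    As a rough illustration of the locus-averaged Shannon entropy (LASE) mentioned above, the sketch below computes the entropy of the allele frequencies at each biallelic locus and averages over the selected loci. This is an assumed textbook form. The uniformity adjustment and the haplotype-averaged variant described in the abstract are not reproduced, and the 0/1/2 dosage coding and function names are hypothetical.

```python
import math


def locus_entropy(genotypes):
    """Shannon entropy (bits) of allele frequencies at one biallelic locus.

    genotypes: list of allele dosages (0, 1 or 2 copies of allele B),
    one per animal. This coding is an assumption for illustration.
    """
    n_alleles = 2 * len(genotypes)
    p = sum(genotypes) / n_alleles          # frequency of allele B
    return -sum(q * math.log2(q) for q in (p, 1.0 - p) if q > 0)


def locus_averaged_entropy(loci):
    """Mean per-locus entropy over a candidate SNP set.

    loci: list of genotype lists, one list per SNP.
    """
    return sum(locus_entropy(g) for g in loci) / len(loci)


if __name__ == "__main__":
    informative = [0, 1, 2, 1]   # allele frequency 0.5, maximal entropy
    monomorphic = [0, 0, 0, 0]   # no variation, zero entropy
    print(locus_averaged_entropy([informative, monomorphic]))
```

    Under this definition a SNP with a minor allele frequency near 0.5 contributes the most information, which is consistent with the abstract's emphasis on selecting highly informative SNPs.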

  20. Optimal Design of Low-Density SNP Arrays for Genomic Prediction: Algorithm and Applications

    PubMed Central

    Wu, Xiao-Lin; Xu, Jiaqi; Feng, Guofei; Wiggans, George R.; Taylor, Jeremy F.; He, Jun; Qian, Changsong; Qiu, Jiansheng; Simpson, Barry; Walker, Jeremy; Bauck, Stewart

    2016-01-01

    Low-density (LD) single nucleotide polymorphism (SNP) arrays provide a cost-effective solution for genomic prediction and selection, but algorithms and computational tools are needed for the optimal design of LD SNP chips. A multiple-objective, local optimization (MOLO) algorithm was developed for design of optimal LD SNP chips that can be imputed accurately to medium-density (MD) or high-density (HD) SNP genotypes for genomic prediction. The objective function facilitates maximization of non-gap map length and system information for the SNP chip, and the latter is computed either as locus-averaged (LASE) or haplotype-averaged Shannon entropy (HASE) and adjusted for uniformity of the SNP distribution. HASE performed better than LASE with ≤1,000 SNPs, but required considerably more computing time. Nevertheless, the differences diminished when >5,000 SNPs were selected. Optimization was accomplished conditionally on the presence of SNPs that were obligated to each chromosome. The frame location of SNPs on a chip can be either uniform (evenly spaced) or non-uniform. For the latter design, a tunable empirical Beta distribution was used to guide location distribution of frame SNPs such that both ends of each chromosome were enriched with SNPs. The SNP distribution on each chromosome was finalized through the objective function that was locally and empirically maximized. This MOLO algorithm was capable of selecting a set of approximately evenly-spaced and highly-informative SNPs, which in turn led to increased imputation accuracy compared with selection solely of evenly-spaced SNPs. Imputation accuracy increased with LD chip size, and imputation error rate was extremely low for chips with ≥3,000 SNPs. Assuming that genotyping or imputation error occurs at random, imputation error rate can be viewed as the upper limit for genomic prediction error. Our results show that about 25% of imputation error rate was propagated to genomic prediction in an Angus population. The

  1. Generation of SNP datasets for orangutan population genomics using improved reduced-representation sequencing and direct comparisons of SNP calling algorithms

    PubMed Central

    2014-01-01

    Background High-throughput sequencing has opened up exciting possibilities in population and conservation genetics by enabling the assessment of genetic variation at genome-wide scales. One approach to reduce genome complexity, i.e. investigating only parts of the genome, is reduced-representation library (RRL) sequencing. Like similar approaches, RRL sequencing reduces ascertainment bias due to simultaneous discovery and genotyping of single-nucleotide polymorphisms (SNPs) and does not require reference genomes. Yet, generating such datasets remains challenging due to laboratory and bioinformatical issues. In the laboratory, current protocols require improvements with regards to sequencing homologous fragments to reduce the number of missing genotypes. From the bioinformatical perspective, the reliance of most studies on a single SNP caller disregards the possibility that different algorithms may produce disparate SNP datasets. Results We present an improved RRL (iRRL) protocol that maximizes the generation of homologous DNA sequences, thus achieving improved genotyping-by-sequencing efficiency. Our modifications facilitate generation of single-sample libraries, enabling individual genotype assignments instead of pooled-sample analysis. We sequenced ~1% of the orangutan genome with 41-fold median coverage in 31 wild-born individuals from two populations. SNPs and genotypes were called using three different algorithms. We obtained substantially different SNP datasets depending on the SNP caller. Genotype validations revealed that the Unified Genotyper of the Genome Analysis Toolkit and SAMtools performed significantly better than a caller from CLC Genomics Workbench (CLC). Of all conflicting genotype calls, CLC was only correct in 17% of the cases. Furthermore, conflicting genotypes between two algorithms showed a systematic bias in that one caller almost exclusively assigned heterozygotes, while the other one almost exclusively assigned homozygotes. Conclusions

  2. HapRice, an SNP haplotype database and a web tool for rice.

    PubMed

    Yonemaru, Jun-ichi; Ebana, Kaworu; Yano, Masahiro

    2014-01-01

    Genome-wide single nucleotide polymorphism (SNP) analysis is a promising tool to examine the genetic diversity of rice populations and genetic traits of scientific and economic importance. Next-generation sequencing technology has accelerated the re-sequencing of diverse rice varieties and the discovery of genome-wide SNPs. Notably, validation of these SNPs by a high-throughput genotyping system, such as an SNP array, could provide a manageable and highly accurate SNP set. To enhance the potential utility of genome-wide SNPs for geneticists and breeders, analysis tools need to be developed. Here, we constructed an SNP haplotype database, which allows visualization of the allele frequency of all SNPs in the genome browser. We calculated the allele frequencies of 3,334 SNPs in 76 accessions from the world rice collection and 3,252 SNPs in 177 Japanese rice accessions; all these SNPs have been validated in our previous studies. The SNP haplotypes were defined by the allele frequency in each cultivar group (aus, indica, tropical japonica and temperate japonica) for the world rice accessions, and in non-irrigated and three irrigated groups (three variety registration periods) for Japanese rice accessions. We also developed web tools for finding polymorphic SNPs between any two rice accessions and for the primer design to develop cleaved amplified polymorphic sequence markers at any SNP. The 'HapRice' database and the web tools can be accessed at http://qtaro.abr.affrc.go.jp/index.html. In addition, we established a core SNP set consisting of 768 SNPs uniformly distributed in the rice genome; this set is of a practically appropriate size for use in rice genetic analysis.

  3. High-throughput SNP genotyping in Cucurbita pepo for map construction and quantitative trait loci mapping

    PubMed Central

    2012-01-01

    Background Cucurbita pepo is a member of the Cucurbitaceae family, the second-most important horticultural family in terms of economic importance after Solanaceae. The "summer squash" types, including Zucchini and Scallop, rank among the highest-valued vegetables worldwide. There are few genomic tools available for this species. The first Cucurbita transcriptome, along with a large collection of Single Nucleotide Polymorphisms (SNP), was recently generated using massive sequencing. A set of 384 SNP was selected to generate an Illumina GoldenGate assay in order to construct the first SNP-based genetic map of Cucurbita and map quantitative trait loci (QTL). Results We herein present the construction of the first SNP-based genetic map of Cucurbita pepo using a population derived from the cross of two varieties with contrasting phenotypes, representing the main cultivar groups of the species' two subspecies: Zucchini (subsp. pepo) × Scallop (subsp. ovifera). The mapping population was genotyped with 384 SNP, a set of selected EST-SNP identified in silico after massive sequencing of the transcriptomes of both parents, using the Illumina GoldenGate platform. The global success rate of the assay was higher than 85%. In total, 304 SNP were mapped, along with 11 SSR from a previous map, giving a map density of 5.56 cM/marker. This map was used to infer syntenic relationships between C. pepo and cucumber and to successfully map QTL that control plant, flowering and fruit traits that are of benefit to squash breeding. The QTL effects were validated in backcross populations. Conclusion Our results show that massive sequencing in different genotypes is an excellent tool for SNP discovery, and that the Illumina GoldenGate platform can be successfully applied to constructing genetic maps and performing QTL analysis in Cucurbita. This is the first SNP-based genetic map in the Cucurbita genus and is an invaluable new tool for biological research, especially considering that most

  4. SnpFilt: A pipeline for reference-free assembly-based identification of SNPs in bacterial genomes.

    PubMed

    Chan, Carmen H S; Octavia, Sophie; Sintchenko, Vitali; Lan, Ruiting

    2016-12-01

    De novo assembly of bacterial genomes from next-generation sequencing (NGS) data allows a reference-free discovery of single nucleotide polymorphisms (SNP). However, substantial rates of errors in genomes assembled by this approach remain a major barrier for the reference-free analysis of genome variations in medically important bacteria. The aim of this report was to improve the quality of SNP identification in bacterial genomes without closely related references. We developed a bioinformatics pipeline (SnpFilt) that constructs an assembly using SPAdes and then removes unreliable regions based on the quality and coverage of re-aligned reads at neighbouring regions. The performance of the pipeline was compared against reference-based SNP calling for Illumina HiSeq, MiSeq and NextSeq reads from a range of bacterial pathogens including Salmonella, which is one of the most common causes of food-borne disease. The SnpFilt pipeline removed all false SNP in all test NGS datasets consisting of paired-end Illumina reads. We also showed that for reliable and complete SNP calls, at least 40-fold coverage is required. Analysis of bacterial isolates associated with epidemiologically confirmed outbreaks using the SnpFilt pipeline produced results consistent with previously published findings. The SnpFilt pipeline improves the quality of de-novo assembly and precision of SNP calling in bacterial genomes by removal of regions of the assembly that may potentially contain assembly errors. SnpFilt is available from https://github.com/LanLab/SnpFilt.

  5. Explaining the disease phenotype of intergenic SNP through predicted long range regulation

    PubMed Central

    Chen, Jingqi; Tian, Weidong

    2016-01-01

    Thousands of disease-associated SNPs (daSNPs) are located in intergenic regions (IGR), making it difficult to understand their association with disease phenotypes. Recent analysis found that non-coding daSNPs were frequently located in or close to regulatory elements, inspiring us to try to explain the disease phenotypes of IGR daSNPs through nearby regulatory sequences. Hence, after locating the nearest distal regulatory element (DRE) to a given IGR daSNP, we applied a computational method named INTREPID to predict the target genes regulated by the DRE, and then investigated their functional relevance to the IGR daSNP's disease phenotypes. 36.8% of all IGR daSNP-disease phenotype associations investigated were possibly explainable through the predicted target genes, which were enriched with, were functionally relevant to, or consisted of the corresponding disease genes. This proportion could be further increased to 60.5% if the LD SNPs of daSNPs were also considered. Furthermore, the predicted SNP-target gene pairs were enriched with known eQTL/mQTL SNP-gene relationships. Overall, it is likely that IGR daSNPs contribute to disease phenotypes by interfering with the regulatory function of their nearby DREs and causing abnormal expression of disease genes. PMID:27280978

  6. Objective evaluation measures of genetic marker selection in large-scale SNP genotyping.

    PubMed

    Kaminuma, Eli; Masuya, Hiroshi; Miura, Ikuo; Motegi, Hiromi; Takahasi, Kenzi R; Nakazawa, Miki; Matsui, Minami; Gondo, Yoichi; Noda, Tetsuo; Shiroishi, Toshihiko; Wakana, Shigeharu; Toyoda, Tetsuro

    2008-10-01

    High-throughput single nucleotide polymorphism (SNP) genotyping systems provide two kinds of fluorescent signals detected from different alleles. In current technologies, the process of genotype discrimination requires subjective judgments by expert operators, even when using clustering algorithms. Here, we propose two evaluation measures to manage fluorescent scatter data with nonclear plot aggregation. The first is the marker ranking measure, which provides a ranking system for the SNP markers based on the distance between the scatter plot distribution and a user-defined ideal distribution. The second measure, called individual genotype membership, uses the membership probability of each genotype related to an individual plot in the scatter data. In verification experiments, the marker ranking measure determined the ranking of SNP markers correlated with the subjective order of SNP markers judged by an expert operator. The experiment using the individual genotype membership measure clarified that the total number of unclassified individuals was remarkably reduced compared to that of manually unclassified ones. These two evaluation measures were implemented as the GTAssist software. GTAssist provides objective standards and avoids subjective biases in SNP genotyping workflows.

  7. Assessment of high resolution melting analysis as a potential SNP genotyping technique in forensic casework.

    PubMed

    Venables, Samantha J; Mehta, Bhavik; Daniel, Runa; Walsh, Simon J; van Oorschot, Roland A H; McNevin, Dennis

    2014-11-01

    High resolution melting (HRM) analysis is a simple, cost effective, closed tube SNP genotyping technique with high throughput potential. The effectiveness of HRM for forensic SNP genotyping was assessed with five commercially available HRM kits evaluated on the ViiA™ 7 Real Time PCR instrument. Four kits performed satisfactorily against forensically relevant criteria. One was further assessed to determine the sensitivity, reproducibility, and accuracy of HRM SNP genotyping. The manufacturer's protocol using 0.5 ng input DNA and 45 PCR cycles produced accurate and reproducible results for 17 of the 19 SNPs examined. Problematic SNPs had GC rich flanking regions which introduced additional melting domains into the melting curve (rs1800407) or included homozygotes that were difficult to distinguish reliably (rs16891982; a G to C SNP). A proof of concept multiplexing experiment revealed that multiplexing a small number of SNPs may be possible after further investigation. HRM enables genotyping of a number of SNPs in a large number of samples without extensive optimization. However, it requires more genomic DNA as template in comparison to SNaPshot®. Furthermore, suitably modifying pre-existing forensic intelligence SNP panels for HRM analysis may pose difficulties due to the properties of some SNPs.

  8. Highly specific SNP detection using 2D graphene electronics and DNA strand displacement.

    PubMed

    Hwang, Michael T; Landon, Preston B; Lee, Joon; Choi, Duyoung; Mo, Alexander H; Glinsky, Gennadi; Lal, Ratnesh

    2016-06-28

    Single-nucleotide polymorphisms (SNPs) in a gene sequence are markers for a variety of human diseases. Detection of SNPs with high specificity and sensitivity is essential for effective practical implementation of personalized medicine. Current DNA sequencing, including SNP detection, primarily uses enzyme-based methods or fluorophore-labeled assays that are time-consuming, need laboratory-scale settings, and are expensive. Previously reported electrical charge-based SNP detectors have insufficient specificity and accuracy, limiting their effectiveness. Here, we demonstrate the use of a DNA strand displacement-based probe on a graphene field effect transistor (FET) for high-specificity, single-nucleotide mismatch detection. The single mismatch was detected by measuring strand displacement-induced resistance (and hence current) change and Dirac point shift in a graphene FET. SNP detection in large double-helix DNA strands (e.g., 47 nt) minimizes false-positive results. Our electrical sensor-based SNP detection technology, without labeling and without apparent cross-hybridization artifacts, would allow fast, sensitive, and portable SNP detection with single-nucleotide resolution. The technology will have a wide range of applications in digital and implantable biosensors and high-throughput DNA genotyping, with transformative implications for personalized medicine.

  9. snpGeneSets: An R Package for Genome-Wide Study Annotation

    PubMed Central

    Mei, Hao; Li, Lianna; Jiang, Fan; Simino, Jeannette; Griswold, Michael; Mosley, Thomas; Liu, Shijian

    2016-01-01

    Genome-wide studies (GWS) of SNP associations and differential gene expressions have generated abundant results; next-generation sequencing technology has further boosted the number of variants and genes identified. Effective interpretation requires massive annotation and downstream analysis of these genome-wide results, a computationally challenging task. We developed the snpGeneSets package to simplify annotation and analysis of GWS results. Our package integrates local copies of knowledge bases for SNPs, genes, and gene sets, and implements wrapper functions in the R language to enable transparent access to low-level databases for efficient annotation of large genomic data. The package contains functions that execute three types of annotations: (1) genomic mapping annotation for SNPs and genes and functional annotation for gene sets; (2) bidirectional mapping between SNPs and genes, and genes and gene sets; and (3) calculation of gene effect measures from SNP associations and performance of gene set enrichment analyses to identify functional pathways. We applied snpGeneSets to type 2 diabetes (T2D) results from the NHGRI genome-wide association study (GWAS) catalog, a Finnish GWAS, and a genome-wide expression study (GWES). These studies demonstrate the usefulness of snpGeneSets for annotating and performing enrichment analysis of GWS results. The package is open-source, free, and can be downloaded at: https://www.umc.edu/biostats_software/. PMID:27807048

  10. snpGeneSets: An R Package for Genome-Wide Study Annotation.

    PubMed

    Mei, Hao; Li, Lianna; Jiang, Fan; Simino, Jeannette; Griswold, Michael; Mosley, Thomas; Liu, Shijian

    2016-12-07

    Genome-wide studies (GWS) of SNP associations and differential gene expressions have generated abundant results; next-generation sequencing technology has further boosted the number of variants and genes identified. Effective interpretation requires massive annotation and downstream analysis of these genome-wide results, a computationally challenging task. We developed the snpGeneSets package to simplify annotation and analysis of GWS results. Our package integrates local copies of knowledge bases for SNPs, genes, and gene sets, and implements wrapper functions in the R language to enable transparent access to low-level databases for efficient annotation of large genomic data. The package contains functions that execute three types of annotations: (1) genomic mapping annotation for SNPs and genes and functional annotation for gene sets; (2) bidirectional mapping between SNPs and genes, and genes and gene sets; and (3) calculation of gene effect measures from SNP associations and performance of gene set enrichment analyses to identify functional pathways. We applied snpGeneSets to type 2 diabetes (T2D) results from the NHGRI genome-wide association study (GWAS) catalog, a Finnish GWAS, and a genome-wide expression study (GWES). These studies demonstrate the usefulness of snpGeneSets for annotating and performing enrichment analysis of GWS results. The package is open-source, free, and can be downloaded at: https://www.umc.edu/biostats_software/.

  11. Supervised learning-based tagSNP selection for genome-wide disease classifications

    PubMed Central

    Liu, Qingzhong; Yang, Jack; Chen, Zhongxue; Yang, Mary Qu; Sung, Andrew H; Huang, Xudong

    2008-01-01

    Background Comprehensive evaluation of common genetic variations through association of single nucleotide polymorphisms (SNPs) with complex human diseases on the genome-wide scale is an active area in human genome research. One of the fundamental questions in a SNP-disease association study is to find an optimal subset of SNPs with predictive power for disease status. To find that subset while reducing study burden in terms of time and costs, one can potentially reconcile information redundancy from associations between SNP markers. Results We have developed a feature selection method named Supervised Recursive Feature Addition (SRFA). This method combines supervised learning and statistical measures for the chosen candidate features/SNPs to reconcile the redundancy information and, in doing so, improve the classification performance in association studies. Additionally, we have proposed a Support Vector based Recursive Feature Addition (SVRFA) scheme in SNP-disease association analysis. Conclusions We have proposed using SRFA with different statistical learning classifiers and SVRFA for both SNP selection and disease classification and then applying them to two complex disease data sets. In general, our approaches outperform the well-known feature selection method of Support Vector Machine Recursive Feature Elimination and logic regression-based SNP selection for disease classification in genetic association studies. Our study further indicates that both genetic and environmental variables should be taken into account when doing disease predictions and classifications for the most complex human diseases that have gene-environment interactions. PMID:18366619

  12. Highly specific SNP detection using 2D graphene electronics and DNA strand displacement

    PubMed Central

    Hwang, Michael T.; Landon, Preston B.; Lee, Joon; Choi, Duyoung; Mo, Alexander H.; Glinsky, Gennadi; Lal, Ratnesh

    2016-01-01

    Single-nucleotide polymorphisms (SNPs) in a gene sequence are markers for a variety of human diseases. Detection of SNPs with high specificity and sensitivity is essential for effective practical implementation of personalized medicine. Current DNA sequencing, including SNP detection, primarily uses enzyme-based methods or fluorophore-labeled assays that are time-consuming, need laboratory-scale settings, and are expensive. Previously reported electrical charge-based SNP detectors have insufficient specificity and accuracy, limiting their effectiveness. Here, we demonstrate the use of a DNA strand displacement-based probe on a graphene field effect transistor (FET) for high-specificity, single-nucleotide mismatch detection. The single mismatch was detected by measuring strand displacement-induced resistance (and hence current) change and Dirac point shift in a graphene FET. SNP detection in large double-helix DNA strands (e.g., 47 nt) minimizes false-positive results. Our electrical sensor-based SNP detection technology, without labeling and without apparent cross-hybridization artifacts, would allow fast, sensitive, and portable SNP detection with single-nucleotide resolution. The technology will have a wide range of applications in digital and implantable biosensors and high-throughput DNA genotyping, with transformative implications for personalized medicine. PMID:27298347

  13. Developing a new nonbinary SNP fluorescent multiplex detection system for forensic application in China.

    PubMed

    Liu, Yanfang; Liao, Huidan; Liu, Ying; Guo, Juanjuan; Sun, Yi; Fu, Xiaoliang; Xiao, Ding; Cai, Jifeng; Lan, Lingmei; Xie, Pingli; Zha, Lagabaiyila

    2017-02-06

    Nonbinary single-nucleotide polymorphisms (SNPs) are potential forensic genetic markers because their discrimination power is greater than that of normal binary SNPs and because they can be typed in highly degraded samples. We previously developed a nonbinary SNP multiplex typing assay. In this study, we selected an additional 20 nonbinary SNPs from the NCBI SNP database and verified them through pyrosequencing. These 20 nonbinary SNPs were analyzed using the fluorescent-labeled SNaPshot multiplex SNP typing method. The allele frequencies and genetic parameters of these 20 nonbinary SNPs were determined among 314 unrelated individuals from Han populations in China. The total power of discrimination was 0.9999999999994, and the cumulative probability of exclusion was 0.9986. Moreover, combining this 20 nonbinary SNP assay with the 20 nonbinary SNP assay we previously developed showed that the cumulative probability of exclusion of the 40 nonbinary SNPs was 0.999991 and that no significant linkage disequilibrium was observed among the 40 nonbinary SNPs. Thus, we concluded that this new system consisting of 20 new nonbinary SNPs could provide highly informative polymorphic data that could be used in forensic applications and would serve as a potentially valuable supplement to forensic DNA analysis.
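
    The cumulative figures quoted above are conventionally obtained by multiplying per-locus "failure" probabilities across loci. The following is a minimal sketch of that arithmetic, assuming the loci are independent (the abstract reports no significant linkage disequilibrium). The function name and the per-locus values are hypothetical and are not taken from the study.

```python
from math import prod

def combine_independent(per_locus):
    """Cumulative probability across independent loci: 1 - prod(1 - p_i).

    Works for both power of discrimination and probability of exclusion,
    provided the per-locus values are independent of each other.
    """
    return 1.0 - prod(1.0 - p for p in per_locus)

# Hypothetical per-locus exclusion probabilities (illustrative only).
example_pe = [0.30, 0.25, 0.35]
cumulative_pe = combine_independent(example_pe)
print(round(cumulative_pe, 5))
```

    Because the formula only multiplies the complements, two panels can be merged the same way: pass the cumulative value of each panel as if it were a single locus.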

  14. MDM2 SNP309 polymorphism is associated with colorectal cancer risk

    PubMed Central

    Wang, Weizhi; Du, Mulong; Gu, Dongying; Zhu, Lingjun; Chu, Haiyan; Tong, Na; Zhang, Zhengdong; Xu, Zekuan; Wang, Meilin

    2014-01-01

    The human murine double minute 2 (MDM2) is known as an oncoprotein through inhibiting P53 transcriptional activity and mediating P53 ubiquitination. Therefore, the amplification of MDM2 may attenuate the P53 pathway and promote tumorigenesis. The SNP309 T>G polymorphism (rs2279744), which is located in the intronic promoter of the MDM2 gene, was reported to contribute to the increased level of MDM2 protein. In this hospital-based case-control study, which consisted of 573 cases and 588 controls, we evaluated the association between MDM2 SNP309 and the risk of colorectal cancer (CRC) in a Chinese population by using the TaqMan method to genotype the polymorphism. We found that the MDM2 SNP309 polymorphism was significantly associated with CRC risk. In addition, in our meta-analysis, we found a significant association between MDM2 SNP309 and CRC risk among Asians, which was consistent with our results. In conclusion, we demonstrated that the MDM2 SNP309 polymorphism increased susceptibility to CRC in Asian populations. PMID:24797837

  15. SNP Discovery by Illumina-Based Transcriptome Sequencing of the Olive and the Genetic Characterization of Turkish Olive Genotypes Revealed by AFLP, SSR and SNP Markers

    PubMed Central

    Kaya, Hilal Betul; Cetin, Oznur; Kaya, Hulya; Sahin, Mustafa; Sefer, Filiz; Kahraman, Abdullah; Tanyolac, Bahattin

    2013-01-01

    Background The olive tree (Olea europaea L.) is a diploid (2n = 2x = 46) outcrossing species mainly grown in the Mediterranean area, where it is the most important oil-producing crop. Because of its economic, cultural and ecological importance, various DNA markers have been used in the olive to characterize and elucidate homonyms, synonyms and unknown accessions. However, a comprehensive characterization and a full sequence of its transcriptome are unavailable, leading to the importance of an efficient large-scale single nucleotide polymorphism (SNP) discovery in olive. The objectives of this study were (1) to discover olive SNPs using next-generation sequencing and to identify SNP primers for cultivar identification and (2) to characterize 96 olive genotypes originating from different regions of Turkey. Methodology/Principal Findings Next-generation sequencing technology was used with five distinct olive genotypes and generated cDNA, producing 126,542,413 reads using an Illumina Genome Analyzer IIx. Following quality and size trimming, the high-quality reads were assembled into 22,052 contigs with an average length of 1,321 bases and 45 singletons. The SNPs were filtered and 2,987 high-quality putative SNP primers were identified. The assembled sequences and singletons were subjected to BLAST similarity searches and annotated with a Gene Ontology identifier. To identify the 96 olive genotypes, these SNP primers were applied to the genotypes in combination with amplified fragment length polymorphism (AFLP) and simple sequence repeats (SSR) markers. Conclusions/Significance This study marks the highest number of SNP markers discovered to date from olive genotypes using transcriptome sequencing. The developed SNP markers will provide a useful source for molecular genetic studies, such as genetic diversity and characterization, high density quantitative trait locus (QTL) analysis, association mapping and map-based gene cloning in the olive. High levels of

  16. Breast cancer-associated high-order SNP-SNP interaction of CXCL12/CXCR4-related genes by an improved multifactor dimensionality reduction (MDR-ER).

    PubMed

    Fu, Ou-Yang; Chang, Hsueh-Wei; Lin, Yu-Da; Chuang, Li-Yeh; Hou, Ming-Feng; Yang, Cheng-Hong

    2016-09-01

    In association studies, the combined effects of single nucleotide polymorphism (SNP)-SNP interactions and the problem of imbalanced data between cases and controls are frequently ignored. In the present study, we used an improved multifactor dimensionality reduction (MDR) approach namely MDR-ER to detect the high order SNP‑SNP interaction in an imbalanced breast cancer data set containing seven SNPs of chemokine CXCL12/CXCR4 pathway genes. Most individual SNPs were not significantly associated with breast cancer. After MDR‑ER analysis, six significant SNP‑SNP interaction models with seven genes (highest cross‑validation consistency, 10; classification error rates, 41.3‑21.0; and prediction error rates, 47.4‑55.3) were identified. CD4 and VEGFA genes were associated in a 2‑loci interaction model (classification error rate, 41.3; prediction error rate, 47.5; odds ratio (OR), 2.069; 95% bootstrap CI, 1.40‑2.90; P=1.71E‑04) and it also appeared in all the best 2‑7‑loci models. When the loci number increased, the classification error rates and P‑values decreased. The powers in 2‑7‑loci in all models were >0.9. The minimum classification error rate of the MDR‑ER‑generated model was shown with the 7‑loci interaction model (classification error rate, 21.0; OR=15.282; 95% bootstrap CI, 9.54‑23.87; P=4.03E‑31). In the epistasis network analysis, the overall effect with breast cancer susceptibility was identified and the SNP order of impact on breast cancer was identified as follows: CD4 = VEGFA > KITLG > CXCL12 > CCR7 = MMP2 > CXCR4. In conclusion, the MDR‑ER can effectively and correctly identify the best SNP‑SNP interaction models in an imbalanced data set for breast cancer cases.

  17. Transcriptome sequencing for SNP discovery across Cucumis melo

    PubMed Central

    2012-01-01

    from India and Africa as compared to commercial cultivars, cultigens and landraces from Eastern Europe, Western Asia and the Mediterranean basin is consistent with the evolutionary history proposed for the species. Group-specific SNVs that will be useful in introgression programs were also detected. In a sample of 143 selected putative SNPs, we verified 93% of the polymorphisms in a panel of 78 genotypes. Conclusions This study provides the first comprehensive resequencing data for wild, exotic, and cultivated (landraces and commercial) melon transcriptomes, yielding the largest melon SNP collection available to date and representing a notable sample of the species diversity. This data provides a valuable resource for creating a catalog of allelic variants of melon genes and it will aid in future in-depth studies of population genetics, marker-assisted breeding, and gene identification aimed at developing improved varieties. PMID:22726804

  18. Observation of perturbed 3snp double photoexcited Rydberg series of beryllium atoms

    SciTech Connect

    Yoshida, Fumiko; Matsuoka, Leo; Osaki, Hiroyuki; Kikkawa, Satoshi; Fukushima, Yu; Hasegawa, Shuichi; Nagata, Tetsuo; Azuma, Yoshiro; Obara, Satoshi

    2006-04-15

    We observed the 3snp autoionizing Rydberg series of the Be atom in order to investigate the double-photoexcitation processes in two-s-electron systems. We employed synchrotron radiation to photoexcite the Be atoms and measured the generated Be⁺ photoions by the time-of-flight method. The 3snp (n=3-9) photoexcitation resonance peaks with interloper state of 3p4s that converges to Be⁺(3p) threshold were observed. We derived the resonance parameters of 3snp series from a fitting procedure and obtained the Fano parameter q, energy position E₀, and resonance width γ. These parameters are in good agreement with theoretical values. In the vicinity of the 3s5p state these experimental results clearly revealed the influence of the interloper 3p4s state, and the comparison with the numerical calculations indicates that more detailed calculations might be required to fully explain this phenomenon.

  19. Multi-marker-LD based genetic algorithm for tag SNP selection.

    PubMed

    Mouawad, Amer E; Mansour, Nashat

    2014-12-01

    Despite the advances in genotyping technologies, which have led to a large reduction in genotyping cost, the Tag SNP Selection problem remains an important problem for computational biologists and geneticists. Selecting the smallest subset of tag SNPs that can predict the other SNPs would considerably reduce the complexity of genome-wide or block-based SNP-disease association studies. These studies would lead to better diagnosis and treatment of diseases. In this work, we propose three variations of a genetic algorithm based on two-marker linkage disequilibrium, multi-marker linkage disequilibrium, and a third measure that we denote by prediction power. The performance of the three algorithms is compared with that of a recognized tag SNP selection algorithm using three different real data sets from the HapMap project. The results indicate that the multi-marker linkage disequilibrium based genetic algorithm yields better prediction accuracy.
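
    To make the two-marker linkage disequilibrium measure concrete, the sketch below computes the standard pairwise r² statistic from phased 0/1 haplotypes and uses it in a simple greedy tag selection. This is not the genetic algorithm proposed in the abstract; the greedy rule, the 0.8 threshold, and the function names are assumptions chosen only for illustration.

```python
def r_squared(a, b):
    """Pairwise LD r^2 between two biallelic SNPs given phased 0/1 haplotypes."""
    n = len(a)
    p_a = sum(a) / n
    p_b = sum(b) / n
    p_ab = sum(x & y for x, y in zip(a, b)) / n
    d = p_ab - p_a * p_b                      # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return d * d / denom if denom > 0 else 0.0

def greedy_tags(haps, threshold=0.8):
    """Pick tag SNPs until every SNP has r^2 >= threshold with some tag.

    haps maps a SNP id to its list of 0/1 alleles across haplotypes.
    """
    untagged = set(haps)
    tags = []
    while untagged:
        # Choose the SNP that covers the most still-untagged SNPs
        # (ties broken alphabetically because candidates are sorted).
        best = max(
            sorted(untagged),
            key=lambda s: sum(
                r_squared(haps[s], haps[t]) >= threshold for t in untagged
            ),
        )
        tags.append(best)
        untagged -= {
            t for t in untagged
            if t == best or r_squared(haps[best], haps[t]) >= threshold
        }
    return tags

# Toy haplotypes (hypothetical): s1 and s2 are in perfect LD, s3 is independent.
toy = {"s1": [0, 0, 1, 1], "s2": [0, 0, 1, 1], "s3": [0, 1, 0, 1]}
print(greedy_tags(toy))
```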

  20. Developing single nucleotide polymorphism (SNP) markers from transcriptome sequences for identification of longan (Dimocarpus longan) germplasm

    PubMed Central

    Wang, Boyi; Tan, Hua-Wei; Fang, Wanping; Meinhardt, Lyndel W; Mischke, Sue; Matsumoto, Tracie; Zhang, Dapeng

    2015-01-01

    Longan (Dimocarpus longan Lour.) is an important tropical fruit tree crop. Accurate varietal identification is essential for germplasm management and breeding. Using longan transcriptome sequences from public databases, we developed single nucleotide polymorphism (SNP) markers; validated 60 SNPs in 50 longan germplasm accessions, including cultivated varieties and wild germplasm; and designated 25 SNP markers that unambiguously identified all tested longan varieties with high statistical rigor (P<0.0001). Multiple trees from the same clone were verified and off-type trees were identified. Diversity analysis revealed genetic relationships among analyzed accessions. Cultivated varieties differed significantly from wild populations (Fst=0.300; P<0.001), demonstrating untapped genetic diversity for germplasm conservation and utilization. Within cultivated varieties, apparent differences between varieties from China and those from Thailand and Hawaii indicated geographic patterns of genetic differentiation. These SNP markers provide a powerful tool to manage longan genetic resources and breeding, with accurate and efficient genotype identification. PMID:26504559

  1. A SNP-Based Molecular Barcode for Characterization of Common Wheat

    PubMed Central

    Gao, LiFeng; Jia, JiZeng; Kong, XiuYing

    2016-01-01

    Wheat is grown as a staple crop worldwide. It is important to develop an effective genotyping tool for this cereal grain both to identify germplasm diversity and to protect the rights of breeders. Single-nucleotide polymorphism (SNP) genotyping provides a means for developing a practical, rapid, inexpensive and high-throughput assay. Here, we investigated SNPs as robust markers of genetic variation for typing wheat cultivars. We identified SNPs from an array of 9000 across a collection of 429 well-known wheat cultivars grown in China, of which 43 SNP markers with high minor allele frequency and variations discriminated the selected wheat varieties and their wild ancestors. This SNP-based barcode will allow for the rapid and precise identification of wheat germplasm resources and newly released varieties and will further assist in the wheat breeding program. PMID:26985664

  2. SNP discrimination through proofreading and OFF-switch of exo+ polymerase.

    PubMed

    Zhang, Jia; Li, Kai; Pardinas, Jose R; Liao, Duan F; Li, Hong J; Zhang, Xu

    2004-05-01

    Single nucleotide polymorphisms (SNPs) are useful physical markers for genetic studies as well as the cause of some genetic diseases. To develop more reliable SNP assays, we examined the underlying molecular mechanisms by which deoxyribonucleic acid (DNA) polymerases with 3' exonuclease activity maintain the high fidelity of DNA replication. In addition to mismatch removal by proofreading, we have discovered a premature termination of polymerization mediated by a novel OFF-switch mechanism. Two SNP assays were developed, one based on proofreading using 3' end-labeled primer extension and the other based on the newly identified OFF-switch. These two new assays are well suited for conventional techniques, such as electrophoresis and microplate detection systems, as well as sophisticated microchips. Application of these reliable SNP assays will greatly facilitate genetic and biomedical studies in the postgenome era.

  3. k-merSNP discovery: Software for alignment- and reference-free scalable SNP discovery, phylogenetics, and annotation for hundreds of microbial genomes

    SciTech Connect

    2014-11-18

    With the flood of whole genome finished and draft microbial sequences, we need faster, more scalable bioinformatics tools for sequence comparison. An algorithm is described to find single nucleotide polymorphisms (SNPs) in whole genome data. It scales to hundreds of bacterial or viral genomes, and can be used for finished and/or draft genomes available as unassembled contigs or raw, unassembled reads. The method is fast to compute, finding SNPs and building a SNP phylogeny in minutes to hours, depending on the size and diversity of the input sequences. The SNP-based trees that result are consistent with known taxonomy and trees determined in other studies. The approach we describe can handle many gigabases of sequence in a single run. The algorithm is based on k-mer analysis.
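
    The abstract says only that the algorithm is based on k-mer analysis. One common way to find SNPs without alignment, shown here purely as an illustrative assumption and not as the software's actual method, is to group odd-length k-mers by their flanking bases and report loci where the central base differs between genomes. The function name is hypothetical, and reverse complements are ignored for brevity.

```python
from collections import defaultdict

def kmer_snps(genomes, k=7):
    """Return {(left_flank, right_flank): {genome: central_base}} for
    flank pairs whose central base varies between genomes.

    genomes maps a genome name to its sequence string; k must be odd.
    """
    if k % 2 != 1:
        raise ValueError("k must be odd so that a central base exists")
    half = k // 2
    loci = defaultdict(lambda: defaultdict(set))
    for name, seq in genomes.items():
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            key = (kmer[:half], kmer[half + 1:])
            loci[key][name].add(kmer[half])
    snps = {}
    for key, by_genome in loci.items():
        # Skip flanks seen with more than one central base inside a single
        # genome; such loci are ambiguous (e.g. repeats).
        if any(len(bases) > 1 for bases in by_genome.values()):
            continue
        alleles = {name: next(iter(bases)) for name, bases in by_genome.items()}
        if len(set(alleles.values())) > 1:
            snps[key] = alleles
    return snps

# Two toy sequences (hypothetical) differing at a single position.
toy_genomes = {"g1": "ACGTACGGTCA", "g2": "ACGTATGGTCA"}
print(kmer_snps(toy_genomes))
```

    Only k-mers centred on the variant share both flanks across genomes, so the off-centre k-mers that also overlap the variant are not reported.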

  4. An integrated SNP mining and utilization (ISMU) pipeline for next generation sequencing data.

    PubMed

    Azam, Sarwar; Rathore, Abhishek; Shah, Trushar M; Telluri, Mohan; Amindala, BhanuPrakash; Ruperao, Pradeep; Katta, Mohan A V S K; Varshney, Rajeev K

    2014-01-01

    Open source single nucleotide polymorphism (SNP) discovery pipelines for next generation sequencing data commonly require working knowledge of a command line interface, massive computational resources and expertise, which is daunting for biologists. Further, the SNP information generated may not be readily used for downstream processes such as genotyping. Hence, a comprehensive pipeline has been developed by integrating several open source next generation sequencing (NGS) tools along with a graphical user interface called Integrated SNP Mining and Utilization (ISMU) for SNP discovery and their utilization by developing genotyping assays. The pipeline features functionalities such as pre-processing of raw data, integration of open source alignment tools (Bowtie2, BWA, Maq, NovoAlign and SOAP2), SNP prediction (SAMtools/SOAPsnp/CNS2snp and CbCC) methods and interfaces for developing genotyping assays. The pipeline outputs a list of high quality SNPs between all pairwise combinations of genotypes analyzed, in addition to the reference genome/sequence. Visualization tools (Tablet and Flapjack) integrated into the pipeline enable inspection of the alignment and errors, if any. The pipeline also provides a confidence score or polymorphism information content value with flanking sequences for identified SNPs in the standard format required for developing marker genotyping (KASP and Golden Gate) assays. The pipeline enables users to process a range of NGS datasets such as whole genome re-sequencing, restriction site associated DNA sequencing and transcriptome sequencing data at a fast speed. The pipeline is very useful for members of the plant genetics and breeding community with no computational expertise who want to discover SNPs and utilize them in genomics, genetics and breeding studies. The pipeline has been parallelized to process huge datasets of next generation sequencing. It has been developed in the Java language and is available at http://hpc.icrisat.cgiar.org/ISMU as a standalone

  5. Using Hamming Distance as Information for SNP-Sets Clustering and Testing in Disease Association Studies.

    PubMed

    Wang, Charlotte; Kao, Wen-Hsin; Hsiao, Chuhsing Kate

    2015-01-01

    The availability of high-throughput genomic data has led to several challenges in recent genetic association studies, including the large number of genetic variants that must be considered and the computational complexity in statistical analyses. Tackling these problems with a marker-set study such as SNP-set analysis can be an efficient solution. To construct SNP-sets, we first propose a clustering algorithm, which employs Hamming distance to measure the similarity between strings of SNP genotypes and evaluates whether the given SNPs or SNP-sets should be clustered. A dendrogram can then be constructed based on this distance measure, and the number of clusters can be determined. With the resulting SNP-sets, we next develop an association test, HDAT, to examine susceptibility to the disease of interest. This proposed test assesses, based on Hamming distance, whether the similarity between a diseased and a normal individual differs from the similarity between two individuals of the same disease status. In our proposed methodology, only genotype information is needed. No inference of haplotypes is required, and SNPs under consideration do not need to be located in nearby regions. The proposed clustering algorithm and association test are illustrated with applications and simulation studies. As compared with other existing methods, the clustering algorithm is faster and better at identifying sets containing SNPs exerting a similar effect. In addition, the simulation studies demonstrated that the proposed test works well for SNP-sets containing a large proportion of neutral SNPs. Furthermore, employing the clustering algorithm before testing a large set of data improves the knowledge in confining the genetic regions for susceptible genetic markers.
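
    The distance at the core of this approach can be sketched in a few lines. The example assumes each SNP is stored as a string of genotype codes (0, 1 or 2 copies of the minor allele), one character per individual. That encoding, the SNP ids, and the function names are assumptions for illustration; the clustering and the HDAT test themselves are not reproduced here.

```python
def hamming(g1, g2):
    """Number of positions (individuals) at which two genotype strings differ."""
    if len(g1) != len(g2):
        raise ValueError("genotype strings must have equal length")
    return sum(a != b for a, b in zip(g1, g2))

def pairwise_hamming(snps):
    """Distances for every unordered pair of SNPs; snps maps id -> genotype string."""
    ids = sorted(snps)
    return {
        (i, j): hamming(snps[i], snps[j])
        for i in ids for j in ids if i < j
    }

# Hypothetical genotypes of three SNPs across four individuals.
toy_snps = {"rsA": "0120", "rsB": "0121", "rsC": "2201"}
print(pairwise_hamming(toy_snps))
```

    A dictionary of pairwise distances like this one is the usual input to a hierarchical clustering routine, from which a dendrogram of SNPs can be drawn.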

  6. Viability of in-house datamarting approaches for population genetics analysis of SNP genotypes

    PubMed Central

    Amigo, Jorge; Phillips, Christopher; Salas, Antonio; Carracedo, Ángel

    2009-01-01

    Background Databases containing very large amounts of SNP (Single Nucleotide Polymorphism) data are now freely available for researchers interested in medical and/or population genetics applications. While many of these SNP repositories have implemented data retrieval tools for general-purpose mining, these alone cannot cover the broad spectrum of needs of most medical and population genetics studies. Results To address this limitation, we have built in-house customized data marts from the raw data provided by the largest public databases. In particular, for population genetics analysis based on genotypes we have built a set of data processing scripts that deal with raw data coming from the major SNP variation databases (e.g. HapMap, Perlegen), stripping them into single genotypes, grouping them into populations, and then merging them with additional complementary descriptive information extracted from dbSNP. This allows not only in-house standardization and normalization of the genotyping data retrieved from different repositories, but also the calculation of statistical indices from simple allele frequency estimates to more elaborate genetic differentiation tests within populations, together with the ability to combine population samples from different databases. Conclusion The present study demonstrates the viability of implementing scripts for handling extensive datasets of SNP genotypes with low computational costs, dealing with certain complex issues that arise from the divergent nature and configuration of the most popular SNP repositories. The information contained in these databases can also be enriched with additional information obtained from other complementary databases, in order to build a dedicated data mart. Updating the data structure is straightforward, as is the implementation of new external data and the computation of supplementary statistical indices of interest. PMID:19344481

  7. SNP markers-based map construction and genome-wide linkage analysis in Brassica napus.

    PubMed

    Raman, Harsh; Dalton-Morgan, Jessica; Diffey, Simon; Raman, Rosy; Alamery, Salman; Edwards, David; Batley, Jacqueline

    2014-09-01

    An Illumina Infinium array comprising 5306 single nucleotide polymorphism (SNP) markers was used to genotype 175 individuals of a doubled haploid population derived from a cross between Skipton and Ag-Spectrum, two Australian cultivars of rapeseed (Brassica napus L.). A genetic linkage map based on 613 SNP and 228 non-SNP (DArT, SSR, SRAP and candidate gene markers) covering 2514.8 cM was constructed and further utilized to identify loci associated with flowering time and resistance to blackleg, a disease caused by the fungus Leptosphaeria maculans. Comparison between genetic map positions of SNP markers and the sequenced Brassica rapa (A) and Brassica oleracea (C) genome scaffolds showed several genomic rearrangements in the B. napus genome. A major locus controlling resistance to L. maculans was identified at both seedling and adult plant stages on chromosome A07. QTL analyses revealed that up to 40.2% of genetic variation for flowering time was accounted for by loci having quantitative effects. Comparative mapping showed Arabidopsis and Brassica flowering genes such as Phytochrome A/D, Flowering Locus C and agamous-Like MADS box gene AGL1 map within marker intervals associated with flowering time in a DH population from Skipton/Ag-Spectrum. Genomic regions associated with flowering time and resistance to L. maculans had several SNP markers mapped within 10 cM. Our results suggest that SNP markers will be suitable for various applications such as trait introgression, comparative mapping and high-resolution mapping of loci in B. napus.

  8. An Integrated SNP Mining and Utilization (ISMU) Pipeline for Next Generation Sequencing Data

    PubMed Central

    Azam, Sarwar; Rathore, Abhishek; Shah, Trushar M.; Telluri, Mohan; Amindala, BhanuPrakash; Ruperao, Pradeep; Katta, Mohan A. V. S. K.; Varshney, Rajeev K.

    2014-01-01

    Open source single nucleotide polymorphism (SNP) discovery pipelines for next generation sequencing data commonly require working knowledge of a command line interface, massive computational resources and expertise, which is daunting for biologists. Further, the SNP information generated may not be readily used for downstream processes such as genotyping. Hence, a comprehensive pipeline has been developed by integrating several open source next generation sequencing (NGS) tools along with a graphical user interface called Integrated SNP Mining and Utilization (ISMU) for SNP discovery and their utilization by developing genotyping assays. The pipeline features functionalities such as pre-processing of raw data, integration of open source alignment tools (Bowtie2, BWA, Maq, NovoAlign and SOAP2), SNP prediction (SAMtools/SOAPsnp/CNS2snp and CbCC) methods and interfaces for developing genotyping assays. The pipeline outputs a list of high quality SNPs between all pairwise combinations of genotypes analyzed, in addition to the reference genome/sequence. Visualization tools (Tablet and Flapjack) integrated into the pipeline enable inspection of the alignment and errors, if any. The pipeline also provides a confidence score or polymorphism information content value with flanking sequences for identified SNPs in the standard format required for developing marker genotyping (KASP and Golden Gate) assays. The pipeline enables users to process a range of NGS datasets such as whole genome re-sequencing, restriction site associated DNA sequencing and transcriptome sequencing data at a fast speed. The pipeline is very useful for members of the plant genetics and breeding community with no computational expertise who want to discover SNPs and utilize them in genomics, genetics and breeding studies. The pipeline has been parallelized to process huge datasets of next generation sequencing. It has been developed in the Java language and is available at http://hpc.icrisat.cgiar.org/ISMU as a standalone

  9. Mycobacterium leprae in Colombia described by SNP7614 in gyrA, two minisatellites and geography

    PubMed Central

    Cardona-Castro, Nora; Beltrán-Alzate, Juan Camilo; Romero-Montoya, Irma Marcela; Li, Wei; Brennan, Patrick J; Vissa, Varalakshmi

    2013-01-01

    New cases of leprosy are still being detected in Colombia after the country declared achievement of the WHO defined ‘elimination’ status. To study the ecology of leprosy in endemic regions, a combination of geographic and molecular tools was applied to a group of 201 multibacillary patients including six multi-case families from eleven departments. The locations (latitude and longitude) of patient residences were mapped. Slit skin smears and/or skin biopsies were collected and DNA was extracted. Standard agarose gel electrophoresis following a multiplex PCR was developed for rapid and inexpensive strain typing of M. leprae based on copy numbers of two VNTR minisatellite loci 27-5 and 12-5. A SNP (C/T) in gyrA (SNP7614) was mapped by introducing a novel PCR-RFLP into an ongoing drug resistance surveillance effort. Multiple genotypes were detected combining the three molecular markers. The two frequent genotypes in Colombia were SNP7614(C)/27-5(5)/12-5(4) [C54] predominantly distributed in the Atlantic departments and SNP7614(T)/27-5(4)/12-5(5) [T45] associated with the Andean departments. A novel genotype SNP7614(C)/27-5(6)/12-5(4) [C64] was detected in cities along the Magdalena river, which separates the Andean from the Atlantic departments; a subset was further characterized showing association with a rare allele of minisatellite 23-3 and the SNP type 1 of M. leprae. The genotypes within intra-family cases were conserved. Overall, this is the first large scale study that utilized simple and rapid assay formats for identification of major strain types and their distribution in Colombia. It provides the framework for further strain type discrimination and geographic information systems as tools for tracing transmission of leprosy. PMID:23291420

  10. MDM2 promoter SNP55 (rs2870820) affects risk of colon cancer but not breast-, lung-, or prostate cancer

    PubMed Central

    Helwa, Reham; Gansmo, Liv B.; Romundstad, Pål; Hveem, Kristian; Vatten, Lars; Ryan, Bríd M.; Harris, Curtis C.; Lønning, Per E.; Knappskog, Stian

    2016-01-01

    Two functional SNPs (SNP285G > C; rs117039649 and SNP309T > G; rs2279744) have previously been reported to modulate Sp1 transcription factor binding to the promoter of the proto-oncogene MDM2, and to influence cancer risk. Recently, a third SNP (SNP55C > T; rs2870820) was also reported to affect Sp1 binding and MDM2 transcription. In this large population based case-control study, we genotyped MDM2 SNP55 in 10,779 Caucasian individuals, previously genotyped for SNP309 and SNP285, including cases of colon (n = 1,524), lung (n = 1,323), breast (n = 1,709) and prostate cancer (n = 2,488) and 3,735 non-cancer controls, as well as 299 healthy African-Americans. Applying the dominant model, we found an elevated risk of colon cancer among individuals harbouring SNP55TT/CT genotypes compared to the SNP55CC genotype (OR = 1.15; 95% CI = 1.01–1.30). The risk was found to be highest for left-sided colon cancer (OR = 1.21; 95% CI = 1.00–1.45) and among females (OR = 1.32; 95% CI = 1.01–1.74). Assessing combined genotypes, we found the highest risk of colon cancer among individuals harbouring the SNP55TT or CT together with the SNP309TG genotype (OR = 1.21; 95% CI = 1.00–1.46). Supporting the conclusions from the risk estimates, we found colon cancer cases carrying the SNP55TT/CT genotypes to be diagnosed at younger age as compared to SNP55CC (p = 0.053), in particular among patients carrying the SNP309TG/TT genotypes (p = 0.009). PMID:27624283
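
    The odds ratios and 95% confidence intervals reported above follow the usual case-control pattern. As a hedged illustration, the sketch below computes an unadjusted odds ratio with a Woolf (log-normal) interval from a 2x2 table. The study's own estimates may come from adjusted regression models, and the counts used here are hypothetical, not data from the paper.

```python
from math import exp, log, sqrt

def odds_ratio_ci(exp_cases, unexp_cases, exp_controls, unexp_controls, z=1.96):
    """Unadjusted odds ratio and Woolf confidence interval from a 2x2 table.

    exp_* are carriers of the risk genotype, unexp_* are non-carriers.
    Returns (odds_ratio, lower, upper); z=1.96 gives a 95% interval.
    """
    a, b, c, d = exp_cases, unexp_cases, exp_controls, unexp_controls
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    or_ = (a * d) / (b * c)
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # standard error of log(OR)
    return or_, exp(log(or_) - z * se), exp(log(or_) + z * se)

# Hypothetical counts: 20 carrier cases, 10 non-carrier cases,
# 10 carrier controls, 20 non-carrier controls.
or_, lower, upper = odds_ratio_ci(20, 10, 10, 20)
print(round(or_, 2), round(lower, 2), round(upper, 2))
```

    An interval whose lower bound sits at or just above 1, as in several of the estimates quoted in the abstract, indicates borderline statistical significance at the 5% level.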

  11. Priming of seeds with nitric oxide donor sodium nitroprusside (SNP) alleviates the inhibition on wheat seed germination by salt stress.

    PubMed

    Duan, Pei; Ding, Feng; Wang, Fang; Wang, Bao-Shan

    2007-06-01

    The effect of SNP, an NO donor, on seed germination of wheat (Triticum aestivum L. cv. 'DK961') under salt stress was studied. The results showed that priming of seeds with 0.06 mmol/L SNP for 24 h markedly alleviated the decrease of the germination percentage, germination index, vigor index and imbibition rate of wheat seeds under salt stress. SNP significantly alleviated the decrease of the beta-amylase activity but had almost no effect on the alpha-amylase activity of wheat seeds under salt stress. SNP slightly increased the alpha-amylase isoenzymes (especially isoenzyme 3) and significantly increased the beta-amylase isoenzymes (especially isoenzymes d, e, f and g). SNP pretreatment decreased the Na(+) content but increased the K(+) content, resulting in a marked increase in the K(+)/Na(+) ratio of wheat seedlings under salt stress. These results suggested that NO is involved in promoting wheat seed germination under salt stress by increasing the beta-amylase activity.

  12. Changes in variance explained by top SNP windows over generations for three traits in broiler chicken

    PubMed Central

    Fragomeni, Breno de Oliveira; Misztal, Ignacy; Lourenco, Daniela Lino; Aguilar, Ignacio; Okimoto, Ronald; Muir, William M.

    2014-01-01

    The purpose of this study was to determine if the set of genomic regions inferred as accounting for the majority of genetic variation in quantitative traits remains stable over multiple generations of selection. The data set contained phenotypes for five generations of broiler chicken for body weight, breast meat, and leg score. The population consisted of 294,632 animals over five generations and also included genotypes of 41,036 single nucleotide polymorphisms (SNPs) for 4,866 animals, after quality control. The SNP effects were calculated by a GWAS-type analysis using a single-step genomic BLUP approach for generations 1–3, 2–4, 3–5, and 1–5. Variances were calculated for windows of 20 SNPs. The top ten windows for each trait that explained the largest fraction of the genetic variance across generations were examined. Across generations, the top 10 windows explained more than 0.5% but less than 1% of the total variance. Also, the pattern of the windows was not consistent across generations. The windows that explained the greatest variance changed greatly among the combinations of generations, with a few exceptions. In many cases, a window identified as top for one combination explained less than 0.1% for the other combinations. We conclude that identification of top SNP windows for a population may have little predictive power for genetic selection in the following generations for the traits evaluated here. PMID:25324857
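
    One way to express "variance explained by a window" is the variance of the partial breeding values built from that window's SNPs, divided by the variance of the full genomic breeding values. The sketch below assumes that definition, non-overlapping windows, and already-estimated SNP effects. The abstract does not spell out these details, so all three are assumptions, and the function name and toy data are hypothetical.

```python
from statistics import pvariance

def window_variance_share(genotypes, effects, window=20):
    """Share of breeding-value variance attributed to each SNP window.

    genotypes: list of rows (one per animal) of 0/1/2 allele counts.
    effects:   estimated effect of each SNP, same order as the columns.
    Shares need not sum to 1 because windows can be correlated.
    """
    n_snp = len(effects)
    total = [sum(g * e for g, e in zip(row, effects)) for row in genotypes]
    total_var = pvariance(total)
    shares = []
    for start in range(0, n_snp, window):
        stop = min(start + window, n_snp)
        part = [
            sum(row[j] * effects[j] for j in range(start, stop))
            for row in genotypes
        ]
        shares.append(pvariance(part) / total_var if total_var > 0 else 0.0)
    return shares

# Toy example (hypothetical): two SNPs, windows of one SNP each,
# and only the first SNP has a non-zero effect.
toy_genotypes = [[0, 0], [2, 0], [0, 2], [2, 2]]
toy_shares = window_variance_share(toy_genotypes, [1.0, 0.0], window=1)
print(toy_shares)
```

    Ranking windows by this share and repeating the calculation for each combination of generations would give the kind of comparison described in the abstract.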

  13. SNP discovery in complex allotetraploid genomes (Gossypium spp., Malvaceae) using genotyping by sequencing

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Dramatic decreases in the cost of DNA sequencing have enabled the development of very large numbers of markers based on single nucleotide polymorphism (SNP) for phylogenetic studies, population genetics, linkage mapping, marker-assisted breeding and other applications. Using Illumina next-generatio...

  14. Multiplexed SNP genotyping using the Qbead™ system: a quantum dot-encoded microsphere-based assay

    PubMed Central

    Xu, Hongxia; Sha, Michael Y.; Wong, Edith Y.; Uphoff, Janet; Xu, Yanzhang; Treadway, Joseph A.; Truong, Anh; O’Brien, Eamonn; Asquith, Steven; Stubbins, Michael; Spurr, Nigel K.; Lai, Eric H.; Mahoney, Walt

    2003-01-01

    We have developed a new method using the Qbead™ system for high-throughput genotyping of single nucleotide polymorphisms (SNPs). The Qbead system employs fluorescent Qdot™ semiconductor nanocrystals, also known as quantum dots, to encode microspheres that subsequently can be used as a platform for multiplexed assays. By combining mixtures of quantum dots with distinct emission wavelengths and intensities, unique spectral ‘barcodes’ are created that enable the high levels of multiplexing required for complex genetic analyses. Here, we applied the Qbead system to SNP genotyping by encoding microspheres conjugated to allele-specific oligonucleotides. After hybridization of oligonucleotides to amplicons produced by multiplexed PCR of genomic DNA, individual microspheres are analyzed by flow cytometry and each SNP is distinguished by its unique spectral barcode. Using 10 model SNPs, we validated the Qbead system as an accurate and reliable technique for multiplexed SNP genotyping. By modifying the types of probes conjugated to microspheres, the Qbead system can easily be adapted to other assay chemistries for SNP genotyping as well as to other applications such as analysis of gene expression and protein–protein interactions. With its capability for high-throughput automation, the Qbead system has the potential to be a robust and cost-effective platform for a number of applications. PMID:12682378

  15. Identification of a SNP marker associated with WB242 nematode resistance in sugar beet

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The beet-cyst nematode (Heterodera schachtii Schmidt) is one of the major diseases of sugar beet. The identification of molecular markers associated to the nematode resistance would be helpful for developing resistant varieties. The aim of this study was the identification of SNP (Single Nucleotide ...

  16. Utilization of a whole genome SNP panel for efficient genetic mapping in the mouse

    PubMed Central

    Moran, Jennifer L.; Bolton, Andrew D.; Tran, Pamela V.; Brown, Alison; Dwyer, Noelle D.; Manning, Danielle K.; Bjork, Bryan C.; Li, Cheng; Montgomery, Kate; Siepka, Sandra M.; Vitaterna, Martha Hotz; Takahashi, Joseph S.; Wiltshire, Tim; Kwiatkowski, David J.; Kucherlapati, Raju; Beier, David R.

    2006-01-01

    Phenotype-driven genetics can be used to create mouse models of human disease and birth defects. However, the utility of these mutant models is limited without identification of the causal gene. To facilitate genetic mapping, we developed a fixed single nucleotide polymorphism (SNP) panel of 394 SNPs as an alternative to analyses using simple sequence length polymorphism (SSLP) marker mapping. With the SNP panel, chromosomal locations for 22 monogenic mutants were identified. The average number of affected progeny genotyped for mapped monogenic mutations is nine. Map locations for several mutants have been obtained with as few as four affected progeny. The average size of genetic intervals obtained for these mutants is 43 Mb, with a range of 17–83 Mb. Thus, our SNP panel allows for identification of moderate resolution map position with small numbers of mice in a high-throughput manner. Importantly, the panel is suitable for mapping crosses from many inbred and wild-derived inbred strain combinations. The chromosomal localizations obtained with the SNP panel allow one to quickly distinguish between potentially novel loci or remutations in known genes, and facilitate fine mapping and positional cloning. By using this approach, we identified DNA sequence changes in two ethylnitrosourea-induced mutants. PMID:16461637

  17. Verification of genetic identity of introduced cacao germplasm in Ghana using single nucleotide polymorphism (SNP) markers

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Accurate identification of individual genotypes is important for cacao (Theobroma cacao L.) breeding, germplasm conservation and seed propagation. The development of single nucleotide polymorphism (SNP) markers in cacao offers an effective way to use a high-throughput genotyping system for cacao gen...

  18. Applying SNP marker technology in the cacao breeding program at the Cocoa Research Institute of Ghana

    Technology Transfer Automated Retrieval System (TEKTRAN)

    In this investigation 45 parental cacao plants and five progeny derived from the parental stock studied were genotyped using six SNP markers to determine off-types or mislabeled clones and to authenticate crosses made in the Cocoa Research Institute of Ghana (CRIG) breeding program. Investigation wa...

  19. Making a chocolate chip: development and evaluation of a 6K SNP array for Theobroma cacao.

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Theobroma cacao, the key ingredient in chocolate production, is one of the world's most important tree fruit crops, with ~4,000,000 metric tons produced across 50 countries. To move towards gene discovery and marker-assisted breeding in cacao, a single-nucleotide polymorphism (SNP) identification pr...

  20. A web-based genome browser for 'SNP-aware' assay design

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Human and animal genomes contain an abundance of single nucleotide polymorphisms (SNPs) that are useful for genetic testing. However, the relatively large number of SNPs present in diverse populations can pose serious problems when designing assays. It is important to “mask” some SNP positions so ...

  1. SNP-based genotyping in lentil: linking sequence information with phenotypes

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Lentil (Lens culinaris) has been late to enter the world of high throughput molecular analysis due to a general lack of genomic resources. Using a 454 sequencing-based approach, SNPs have been identified in genes across the lentil genome. Several hundred have been turned into single SNP KASP assay...

  2. High-throughput RAD-SNP genotyping for characterization of sugar beet genotypes

    Technology Transfer Automated Retrieval System (TEKTRAN)

    High-throughput SNP genotyping provides a rapid way of developing resourceful set of markers for delineating the genetic architecture and for effective species discrimination. In the presented research, we demonstrate a set of 192 SNPs for effective genotyping in sugar beet using high-throughput mar...

  3. A novel approach to analyzing fMRI and SNP data via parallel independent component analysis

    NASA Astrophysics Data System (ADS)

    Liu, Jingyu; Pearlson, Godfrey; Calhoun, Vince; Windemuth, Andreas

    2007-03-01

    There is current interest in understanding genetic influences on brain function in both the healthy and the disordered brain. Parallel independent component analysis, a new method for analyzing multimodal data, is proposed in this paper and applied to functional magnetic resonance imaging (fMRI) and a single nucleotide polymorphism (SNP) array. The method aims to identify the independent components of each modality and the relationship between the two modalities. We analyzed 92 participants, including 29 schizophrenia (SZ) patients, 13 unaffected SZ relatives, and 50 healthy controls. We found a correlation of 0.79 between one fMRI component and one SNP component. The fMRI component consists of activations in cingulate gyrus, multiple frontal gyri, and superior temporal gyrus. The related SNP component is contributed to significantly by 9 SNPs located in sets of genes, including those coding for apolipoproteins A-I and C-III, malate dehydrogenase 1 and the gamma-aminobutyric acid alpha-2 receptor. A significant difference in the presence of this SNP component is found between the SZ group (SZ patients and their relatives) and the control group. In summary, we constructed a framework to identify the interactions between brain functional and genetic information; our findings provide new insight into understanding genetic influences on brain function in a common mental disorder.

  4. The use of SNP data for the monitoring of genetic diversity in cattle breeds

    Technology Transfer Automated Retrieval System (TEKTRAN)

    LD between SNPs contains information about effective population size. In this study, we investigate the use of genome-wide SNP data for marker based estimation of effective population size for two taurine cattle breeds of Africa and two local cattle breeds of Switzerland. Estimated recombination rat...

  5. Microsatellite Imputation for parental verification from SNP across multiple Bos taurus and indicus breeds

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Microsatellite markers (MS) have traditionally been used for parental verification and are still the international standard in spite of their higher cost, error rate, and turnaround time compared with Single Nucleotide Polymorphisms (SNP)-based assays. Despite domestic and international demands fro...

  6. Optimal design of low-density SNP arrays for genomic prediction: algorithm and applications

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Low-density (LD) single nucleotide polymorphism (SNP) arrays provide a cost-effective solution for genomic prediction and selection, but algorithms and computational tools are needed for their optimal design. A multiple-objective, local optimization (MOLO) algorithm was developed for design of optim...

  7. An improved consensus linkage map of barley based on flow-sorted chromosomes and SNP markers

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Recent advances in high-throughput genotyping have made it easier to combine information from different mapping populations into consensus genetic maps, which provide increased marker density and genome coverage compared to individual maps. Previously, a SNP-based genotyping platform was developed a...

  8. Mining for SNPs and SSRs using SNPServer, dbSNP and SSR taxonomy tree.

    PubMed

    Batley, Jacqueline; Edwards, David

    2009-01-01

    Molecular genetic markers represent one of the most powerful tools for the analysis of genomes and the association of heritable traits with underlying genetic variation. The development of high-throughput methods for the detection of single nucleotide polymorphisms (SNPs) and simple sequence repeats (SSRs) has led to a revolution in their use as molecular markers. The availability of large sequence data sets permits mining for these molecular markers, which may then be used for applications such as genetic trait mapping, diversity analysis and marker assisted selection in agriculture. Here we describe web-based automated methods for the discovery of SSRs using SSR taxonomy tree, the discovery of SNPs from sequence data using SNPServer and the identification of validated SNPs from within the dbSNP database. SSR taxonomy tree identifies pre-determined SSR amplification primers for virtually all species represented within the GenBank database. SNPServer uses a redundancy based approach to identify SNPs within DNA sequences. Following submission of a sequence of interest, SNPServer uses BLAST to identify similar sequences, CAP3 to cluster and assemble these sequences and then the SNP discovery software autoSNP to detect SNPs and insertion/deletion (indel) polymorphisms. The NCBI dbSNP database is a catalogue of molecular variation, hosting validated SNPs for several species within a public-domain archive.

  9. The impact of SNP fingerprinting and parentage analysis on the effectiveness of variety recommendations in cacao

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Evidence for the impact of mislabeling and/or pollen contamination on consistency of field performance has been lacking to reinforce the need for strict adherence to quality control protocols in cacao seed garden and germplasm plot management. The present study used SNP fingerprinting at 64 loci to ...

  10. Association mapping of resistance to leaf rust in emmer wheat using high throughput SNP markers

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Emmer wheat (Triticum turgidum L. subsp. dicoccum) is known to be a useful source of genes for many desirable characters for improvement of modern cultivated wheat. Recently, a panel of 181 emmer wheat accessions has been genotyped with wheat 9K SNP (single nucleotide polymorphism) markers and exte...

  11. EvoSNP-DB: A database of genetic diversity in East Asian populations

    PubMed Central

    Kim, Young Uk; Kim, Young Jin; Lee, Jong-Young; Park, Kiejung

    2013-01-01

    Genome-wide association studies (GWAS) have become popular as an approach for the identification of large numbers of phenotype-associated variants. However, differences in genetic architecture and environmental factors mean that the effect of variants can vary across populations. Understanding population genetic diversity is valuable for the investigation of possible population specific and independent effects of variants. EvoSNP-DB aims to provide information regarding genetic diversity among East Asian populations, including Chinese, Japanese, and Korean. Non-redundant SNPs (1.6 million) were genotyped in 54 Korean trios (162 samples) and were compared with 4 million SNPs from HapMap phase II populations. EvoSNP-DB provides two user interfaces for data query and visualization, and integrates scores of genetic diversity (Fst and VarLD) at the level of SNPs, genes, and chromosome regions. EvoSNP-DB is a web-based application that allows users to navigate and visualize measurements of population genetic differences in an interactive manner, and is available online at [http://biomi.cdc.go.kr/EvoSNP/]. [BMB Reports 2013; 46(8): 416-421] PMID:23977990
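
    As a rough illustration of the kind of per-SNP diversity score named in this abstract, the sketch below computes Fst for one biallelic SNP in two populations from allele frequencies alone. It is not EvoSNP-DB's implementation: it assumes equal population weights and the simple heterozygosity-based definition Fst = (H_T - H_S) / H_T, and the function name and example frequencies are hypothetical.

```python
def fst_two_pops(p1, p2):
    """Heterozygosity-based Fst for one biallelic SNP in two populations.

    p1, p2: frequency of the same allele in population 1 and population 2.
    Assumes equal weighting of the two populations (a simplification).
    """
    p_bar = (p1 + p2) / 2.0
    h_t = 2.0 * p_bar * (1.0 - p_bar)  # expected heterozygosity in the pooled population
    if h_t == 0.0:
        # Site is monomorphic in both populations; Fst is undefined,
        # and this sketch reports 0.0 by convention.
        return 0.0
    h_s = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0  # mean within-population value
    return (h_t - h_s) / h_t


# Hypothetical frequencies: identical populations give 0, fixed differences give 1.
same = fst_two_pops(0.3, 0.3)
fixed = fst_two_pops(0.0, 1.0)
```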

  12. Measuring diversity in Gossypium hirsutum using the CottonSNP63K Array

    Technology Transfer Automated Retrieval System (TEKTRAN)

    A CottonSNP63K array and accompanying cluster file has been developed and includes 45,104 intra-specific SNPs and 17,954 inter-specific SNPs for automated genotyping of cotton (Gossypium spp.) samples. Development of the cluster file included genotyping of 1,156 samples, a subset of which were iden...

  13. Longevity and Plasticity of CFTR Provide an Argument for Noncanonical SNP Organization in Hominid DNA

    PubMed Central

    Hill, Aubrey E.; Plyler, Zackery E.; Tiwari, Hemant; Patki, Amit; Tully, Joel P.; McAtee, Christopher W.; Moseley, Leah A.; Sorscher, Eric J.

    2014-01-01

    Like many other ancient genes, the cystic fibrosis transmembrane conductance regulator (CFTR) has survived for hundreds of millions of years. In this report, we consider whether such prodigious longevity of an individual gene – as opposed to an entire genome or species – should be considered surprising in the face of eons of relentless DNA replication errors, mutagenesis, and other causes of sequence polymorphism. The conventions that modern human SNP patterns result either from purifying selection or random (neutral) drift were not well supported, since extant models account rather poorly for the known plasticity and function (or the established SNP distributions) found in a multitude of genes such as CFTR. Instead, our analysis can be taken as a polemic indicating that SNPs in CFTR and many other mammalian genes may have been generated—and continue to accrue—in a fundamentally more organized manner than would otherwise have been expected. The resulting viewpoint contradicts earlier claims of ‘directional’ or ‘intelligent design-type’ SNP formation, and has important implications regarding the pace of DNA adaptation, the genesis of conserved non-coding DNA, and the extent to which eukaryotic SNP formation should be viewed as adaptive. PMID:25350658

  14. SNP-microarrays can accurately identify the presence of an individual in complex forensic DNA mixtures.

    PubMed

    Voskoboinik, Lev; Ayers, Sheri B; LeFebvre, Aaron K; Darvasi, Ariel

    2015-05-01

    Common forensic and mass disaster scenarios present DNA evidence that comprises a mixture of several contributors. Identifying the presence of an individual in such mixtures has proven difficult. In the current study, we evaluate the practical usefulness of currently available "off-the-shelf" SNP microarrays for such purposes. We found that a set of 3000 SNPs specifically selected for this purpose can accurately identify the presence of an individual in complex DNA mixtures of various compositions. For example, individuals contributing as little as 5% to a complex DNA mixture can be robustly identified even if the starting DNA amount was as little as 5.0 ng and had undergone whole-genome amplification (WGA) prior to SNP analysis. The work presented in this study represents proof-of-principle that our previously proposed approach can work with real "forensic-type" samples. Furthermore, in the absence of a low-density focused forensic SNP microarray, standard, currently available high-density SNP microarrays can be used similarly and can even increase statistical power due to the larger amount of available information.

  15. The identification of SNPs with indeterminate positions using the Equine SNP50 BeadChip.

    PubMed

    Corbin, L J; Blott, S C; Swinburne, J E; Vaudin, M; Bishop, S C; Woolliams, J A

    2012-06-01

    We have used linkage disequilibrium (LD) to identify single nucleotide polymorphisms (SNPs) on the Illumina Equine SNP50 BeadChip, which may be incorrectly positioned on the genome map. A total of 1201 Thoroughbred horses were genotyped using the Illumina Equine SNP50 BeadChip. LD was evaluated in a pairwise fashion between all autosomal SNPs, both within and across chromosomes. Filters were then applied to the data, firstly to identify SNPs that may have been mapped to the wrong chromosome and secondly to identify SNPs that may have been incorrectly positioned within chromosomes. We identified a single SNP on ECA28, which showed low LD with neighbouring SNPs but considerable LD with a group of SNPs on ECA10. Furthermore, a cluster of SNPs on ECA5 showed unusually low LD with surrounding SNPs. A total of 39 SNPs met the criteria for unusual within-chromosome LD. The results of this study indicate that some SNPs may be misplaced. This finding is significant, as misplaced SNPs may lead to difficulties in the application of genomic methods, such as homozygosity mapping, for which SNP order is important.
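
    The pairwise LD screen described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: LD is approximated as the squared Pearson correlation between genotype dosages (0, 1 or 2 copies of an allele), which is a common proxy for r^2 when phase is unknown, and the function name and example genotypes are hypothetical.

```python
def ld_r2(geno_a, geno_b):
    """Squared correlation between two SNPs coded as allele dosages (0/1/2).

    geno_a, geno_b: genotypes of the same individuals at two SNPs.
    Raises ValueError if either SNP is monomorphic, where r^2 is undefined.
    """
    n = len(geno_a)
    if n == 0 or n != len(geno_b):
        raise ValueError("genotype vectors must be non-empty and of equal length")
    mean_a = sum(geno_a) / n
    mean_b = sum(geno_b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(geno_a, geno_b))
    var_a = sum((x - mean_a) ** 2 for x in geno_a)
    var_b = sum((y - mean_b) ** 2 for y in geno_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("r^2 is undefined for a monomorphic SNP")
    return (cov * cov) / (var_a * var_b)


# Hypothetical genotypes for four animals at two SNPs.
perfect = ld_r2([0, 1, 2, 1], [0, 1, 2, 1])
unlinked = ld_r2([0, 0, 2, 2], [0, 2, 0, 2])
```

    A SNP whose r^2 is low with its mapped neighbours but high with SNPs on another chromosome would be a candidate for the kind of misplacement the study reports.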

  16. Analysis of gene-derived SNP marker polymorphism in wheat (Triticum aestivum L.)

    Technology Transfer Automated Retrieval System (TEKTRAN)

    In this study, we analyzed 359 single nucleotide polymorphisms (SNPs) previously discovered in intron sequences of wheat genes to evaluate SNP marker polymorphism in common wheat (Triticum aestivum L.). These SNPs showed an average polymorphism information content (PIC) of 0.181 among 20 US wheat c...

  17. Fine mapping of copy number variations on two cattle genome assemblies using high density SNP array

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Btau_4.0 and UMD3.1 are two distinct cattle reference genome assemblies. In our previous study using the low density BovineSNP50 array, we reported a copy number variation (CNV) analysis on Btau_4.0 with 521 animals of 21 cattle breeds, yielding 682 CNV regions with a total length of 139.8 megabases...

  18. SNP-based high density genetic map and mapping of btwd1 dwarfing gene in barley

    PubMed Central

    Ren, Xifeng; Wang, Jibin; Liu, Lipan; Sun, Genlou; Li, Chengdao; Luo, Hong; Sun, Dongfa

    2016-01-01

    A high-density linkage map is a valuable tool for functional genomics and breeding. A newly developed sequence-based marker technology, restriction site associated DNA (RAD) sequencing, has been proven to be powerful for the rapid discovery and genotyping of genome-wide single nucleotide polymorphism (SNP) markers and for high-density genetic map construction. The objective of this research was to construct a high-density genetic map of barley using RAD sequencing. A total of 1894 high-quality SNP markers were developed and mapped onto all seven chromosomes together with 68 SSR markers. These 1962 markers constituted a total genetic length of 1375.8 cM, with an average of 0.7 cM between adjacent loci. The number of markers within each linkage group ranged from 209 to 396. The new recessive dwarfing gene btwd1 in Huaai 11 was mapped onto the high-density linkage map. The result showed that btwd1 is positioned between SNP markers 7HL_6335336 and 7_249275418, at genetic distances of 0.9 cM and 0.7 cM, respectively, on chromosome 7H. The SNP-based high-density genetic map developed and the dwarfing gene btwd1 mapped in this study provide critical information for positional cloning of the btwd1 gene and molecular breeding of barley. PMID:27530597

  19. Changes in variance explained by top SNP windows over generations for three traits in broiler chicken.

    PubMed

    Fragomeni, Breno de Oliveira; Misztal, Ignacy; Lourenco, Daniela Lino; Aguilar, Ignacio; Okimoto, Ronald; Muir, William M

    2014-01-01

    The purpose of this study was to determine if the set of genomic regions inferred as accounting for the majority of genetic variation in quantitative traits remains stable over multiple generations of selection. The data set contained phenotypes for five generations of broiler chicken for body weight, breast meat, and leg score. The population consisted of 294,632 animals over five generations and also included genotypes of 41,036 single nucleotide polymorphisms (SNP) for 4,866 animals, after quality control. The SNP effects were calculated by a GWAS-type analysis using a single-step genomic BLUP approach for generations 1-3, 2-4, 3-5, and 1-5. Variances were calculated for windows of 20 SNP. The top ten windows for each trait that explained the largest fraction of the genetic variance across generations were examined. Across generations, the top 10 windows explained more than 0.5% but less than 1% of the total variance. Also, the pattern of the windows was not consistent across generations. The windows that explained the greatest variance changed greatly among the combinations of generations, with a few exceptions. In many cases, a window identified as top for one combination explained less than 0.1% for the other combinations. We conclude that identification of top SNP windows for a population may have little predictive power for genetic selection in the following generations for the traits evaluated here.

  20. Longevity and plasticity of CFTR provide an argument for noncanonical SNP organization in hominid DNA.

    PubMed

    Hill, Aubrey E; Plyler, Zackery E; Tiwari, Hemant; Patki, Amit; Tully, Joel P; McAtee, Christopher W; Moseley, Leah A; Sorscher, Eric J

    2014-01-01

    Like many other ancient genes, the cystic fibrosis transmembrane conductance regulator (CFTR) has survived for hundreds of millions of years. In this report, we consider whether such prodigious longevity of an individual gene--as opposed to an entire genome or species--should be considered surprising in the face of eons of relentless DNA replication errors, mutagenesis, and other causes of sequence polymorphism. The conventions that modern human SNP patterns result either from purifying selection or random (neutral) drift were not well supported, since extant models account rather poorly for the known plasticity and function (or the established SNP distributions) found in a multitude of genes such as CFTR. Instead, our analysis can be taken as a polemic indicating that SNPs in CFTR and many other mammalian genes may have been generated--and continue to accrue--in a fundamentally more organized manner than would otherwise have been expected. The resulting viewpoint contradicts earlier claims of 'directional' or 'intelligent design-type' SNP formation, and has important implications regarding the pace of DNA adaptation, the genesis of conserved non-coding DNA, and the extent to which eukaryotic SNP formation should be viewed as adaptive.

  1. MAFsnp: A Multi-Sample Accurate and Flexible SNP Caller Using Next-Generation Sequencing Data.

    PubMed

    Hu, Jiyuan; Li, Tengfei; Xiu, Zidi; Zhang, Hong

    2015-01-01

    Most existing statistical methods developed for calling single nucleotide polymorphisms (SNPs) using next-generation sequencing (NGS) data are based on Bayesian frameworks, and no existing SNP caller produces p-values for calling SNPs in a frequentist framework. To fill this gap, we develop a new method, MAFsnp, a Multiple-sample based Accurate and Flexible algorithm for calling SNPs with NGS data. MAFsnp is based on an estimated likelihood ratio test (eLRT) statistic. In practical situations, the involved parameter is very close to the boundary of the parametric space, so the standard large sample property is not suitable to evaluate the finite-sample distribution of the eLRT statistic. Observing that the distribution of the test statistic is a mixture of zero and a continuous part, we propose to model the test statistic with a novel two-parameter mixture distribution. Once the parameters in the mixture distribution are estimated, p-values can be easily calculated for detecting SNPs, and the multiple-testing corrected p-values can be used to control the false discovery rate (FDR) at any pre-specified level. With simulated data, MAFsnp is shown to have much better control of FDR than the existing SNP callers. Through the application to two real datasets, MAFsnp is also shown to outperform the existing SNP callers in terms of calling accuracy. An R package "MAFsnp" implementing the new SNP caller is freely available at http://homepage.fudan.edu.cn/zhangh/softwares/.
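
    The step from per-site p-values to FDR control can be illustrated with a short sketch. The abstract does not say which multiple-testing correction MAFsnp applies, so the Benjamini-Hochberg procedure used here is an assumption, and the function below is a hypothetical stand-alone helper rather than part of the MAFsnp R package.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Calling a site a SNP when its adjusted p-value is <= alpha controls the
    false discovery rate at level alpha (under the usual BH assumptions).
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices from smallest to largest p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for four candidate sites.
adj = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
called = [q <= 0.05 for q in adj]
```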

  2. Association of Agronomic Traits with SNP Markers in Durum Wheat (Triticum turgidum L. durum (Desf.))

    PubMed Central

    Hu, Xin; Ren, Jing; Ren, Xifeng; Huang, Sisi; Sabiel, Salih A. I.; Luo, Mingcheng; Nevo, Eviatar; Fu, Chunjie; Peng, Junhua; Sun, Dongfa

    2015-01-01

    Association mapping is a powerful approach to detect associations between traits of interest and genetic markers based on linkage disequilibrium (LD) in molecular plant breeding. In this study, 150 accessions of durum wheat germplasm of worldwide origin (Triticum turgidum spp. durum) were genotyped using 1,366 SNP markers. The extent of LD on each chromosome was evaluated. Association of single nucleotide polymorphism (SNP) markers with ten agronomic traits measured in four consecutive years was analyzed under a mixed linear model (MLM). Two hundred and one significant association pairs were detected in the four years. Several markers were associated with one trait, and some markers were associated with multiple traits. Some of the associated markers were in agreement with previous quantitative trait loci (QTL) analyses. The function and homology analyses of the corresponding ESTs of some SNP markers could explain many of the associations for plant height, length of main spike, number of spikelets on main spike, grain number per plant, and 1000-grain weight, etc. The SNP associations for the observed traits are generally clustered in specific chromosome regions of the wheat genome, mainly in 2A, 5A, 6A, 7A, 1B, and 6B chromosomes. This study demonstrates that association mapping can complement and enhance previous QTL analyses and provide additional information for marker-assisted selection. PMID:26110423

  3. Detecting SNP combinations discriminating human populations from HapMap data.

    PubMed

    Ding, XiaoJun; Li, Min; Gu, HaiHua; Peng, XiaoQing; Zhang, Zhen; Wu, FangXiang

    2015-03-01

    The genomes of different human beings are similar, with only a relatively small number of genetic differences between people. These differences are well worth studying. Researchers have proposed the fixation index (FST) to find the single nucleotide polymorphisms (SNPs) that reflect human population differences. However, most SNPs interact and work together, which leads to the differences among human populations. The number of all possible m-locus combinations chosen from n SNPs grows exponentially. Most methods focus on 2-locus interactions. In this paper, we propose a novel method to find a new coordinate system under which the energy distributions of different populations are quite different. We select candidate SNPs from n SNPs by using the information of the axes in the coordinate system. The number of candidate SNPs is small, thus SNP-SNP interactions can be searched efficiently. The method can also find interactions of more than two loci. These interactions should be able to reflect the evolution of human populations from another perspective. The numbers of SNP-SNP interactions are regarded as the differences between pairwise populations, and a hierarchical clustering algorithm is used to construct the evolutionary tree. In the experiments, we apply the method to SNP data of four chromosomes separately, and the trees constructed on these four chromosomes are highly consistent. Furthermore, the trees are also consistent with previous studies, which indicates that evolutionary information is well mined. The method offers a new way to analyze human population differences.

  4. MDM2 SNP309 polymorphism contributes to endometrial cancer susceptibility: evidence from a meta-analysis

    PubMed Central

    2013-01-01

    Objective The SNP309 polymorphism (T-G) in the promoter of the MDM2 gene has been reported to be associated with enhanced MDM2 expression and tumor development. Studies investigating the association between MDM2 SNP309 polymorphism and endometrial cancer risk reported conflicting results. We performed a meta-analysis of all available studies to explore this association. Methods All studies published up to August 2013 on the association between MDM2 SNP309 polymorphism and endometrial cancer risk were identified by searching the electronic databases PubMed, Web of Science, EMBASE, and Chinese Biomedical Literature database (CBM). The association between the MDM2 SNP309 polymorphism and endometrial cancer risk was assessed by odds ratios (ORs) together with their 95% confidence intervals (CIs). Results Eight case–control studies with 2069 endometrial cancer cases and 4546 controls were identified. Overall, a significant increase in endometrial cancer risk was found when all studies were pooled in the meta-analysis (GG vs. TT: OR = 1.464, 95% CI 1.246–1.721, P < 0.001; GG vs. TG + TT: OR = 1.726, 95% CI 1.251–2.380, P = 0.001; GG + TG vs. TT: OR = 1.169, 95% CI 1.048–1.304, P = 0.005). In subgroup analysis by ethnicity and HWE in controls, significant increases in endometrial cancer risk were observed in Caucasians and in studies consistent with HWE. In subgroup analysis according to study quality, significant associations were observed in both high quality studies and low quality studies. Conclusions This meta-analysis suggests that MDM2 SNP309 polymorphism contributes to endometrial cancer susceptibility, especially in Caucasian populations. Further large and well-designed studies are needed to confirm this association. PMID:24423195
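
    For readers unfamiliar with the effect measure used in this abstract, the sketch below shows how an odds ratio and its 95% confidence interval are obtained from a single 2x2 genotype table. It uses the standard log-odds (Woolf) interval and does not reproduce the pooling across studies performed in the meta-analysis; the function name and the counts are hypothetical.

```python
import math


def odds_ratio_ci(case_exposed, case_unexposed, ctrl_exposed, ctrl_unexposed, z=1.96):
    """Odds ratio and Wald (Woolf) confidence interval for one 2x2 table.

    "Exposed" could mean, for example, carrying the GG genotype and
    "unexposed" the TT genotype. All four counts must be positive.
    z = 1.96 gives an approximate 95% interval.
    """
    a, b, c, d = case_exposed, case_unexposed, ctrl_exposed, ctrl_unexposed
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of ln(OR)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: 20 GG and 10 TT among cases, 10 GG and 20 TT among controls.
or_value, ci_low, ci_high = odds_ratio_ci(20, 10, 10, 20)
```

    An interval that excludes 1, as in the pooled results quoted above, is what the abstract reports as a significant association.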

  5. SNP discovery in candidate adaptive genes using exon capture in a free-ranging alpine ungulate

    USGS Publications Warehouse

    Roffler, Gretchen H.; Amish, Stephen J.; Smith, Seth; Cosart, Ted F.; Kardos, Marty; Schwartz, Michael K.; Luikart, Gordon

    2016-01-01

    Identification of genes underlying genomic signatures of natural selection is key to understanding adaptation to local conditions. We used targeted resequencing to identify SNP markers in 5321 candidate adaptive genes associated with known immunological, metabolic and growth functions in ovids and other ungulates. We selectively targeted 8161 exons in protein-coding and nearby 5′ and 3′ untranslated regions of chosen candidate genes. Targeted sequences were taken from bighorn sheep (Ovis canadensis) exon capture data and directly from the domestic sheep genome (Ovis aries v. 3; oviAri3). The bighorn sheep sequences used in the Dall's sheep (Ovis dalli dalli) exon capture aligned to 2350 genes on the oviAri3 genome with an average of 2 exons each. We developed a microfluidic qPCR-based SNP chip to genotype 476 Dall's sheep from locations across their range and test for patterns of selection. Using multiple corroborating approaches (lositan and bayescan), we detected 28 SNP loci potentially under selection. We additionally identified candidate loci significantly associated with latitude, longitude, precipitation and temperature, suggesting local environmental adaptation. The three methods demonstrated consistent support for natural selection on nine genes with immune and disease-regulating functions (e.g. Ovar-DRA, APC, BATF2, MAGEB18), cell regulation signalling pathways (e.g. KRIT1, PI3K, ORRC3), and respiratory health (CYSLTR1). Characterizing adaptive allele distributions from novel genetic techniques will facilitate investigation of the influence of environmental variation on local adaptation of a northern alpine ungulate throughout its range. This research demonstrated the utility of exon capture for gene-targeted SNP discovery and subsequent SNP chip genotyping using low-quality samples in a nonmodel species.

  6. SNP discovery in the transcriptome of white Pacific shrimp Litopenaeus vannamei by next generation sequencing.

    PubMed

    Yu, Yang; Wei, Jiankai; Zhang, Xiaojun; Liu, Jingwen; Liu, Chengzhang; Li, Fuhua; Xiang, Jianhai

    2014-01-01

    The application of next generation sequencing technology has greatly facilitated high throughput single nucleotide polymorphism (SNP) discovery and genotyping in genetic research. In the present study, SNPs were discovered based on two transcriptomes of Litopenaeus vannamei (L. vannamei) generated from Illumina sequencing platform HiSeq 2000. One transcriptome of L. vannamei was obtained through sequencing on the RNA from larvae at mysis stage and its reference sequence was de novo assembled. The data from another transcriptome were downloaded from NCBI and the reads of the two transcriptomes were mapped separately to the assembled reference by BWA. SNP calling was performed using SAMtools. A total of 58,717 and 36,277 SNPs with high quality were predicted from the two transcriptomes, respectively. SNP calling was also performed using the reads of two transcriptomes together, and a total of 96,040 SNPs with high quality were predicted. Among these 96,040 SNPs, 5,242 and 29,129 were predicted as non-synonymous and synonymous SNPs respectively. Characterization analysis of the predicted SNPs in L. vannamei showed that the estimated SNP frequency was 0.21% (one SNP per 476 bp) and the estimated ratio for transition to transversion was 2.0. Fifty SNPs were randomly selected for validation by Sanger sequencing after PCR amplification and 76% of SNPs were confirmed, which indicated that the SNPs predicted in this study were reliable. These SNPs will be very useful for genetic study in L. vannamei, especially for the high density linkage map construction and genome-wide association studies.
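
    Two of the summary statistics in this abstract are simple arithmetic: a SNP frequency of 0.21% corresponds to about one SNP per 476 bp, and the transition/transversion ratio counts A<->G and C<->T changes against all others. A minimal sketch, with hypothetical helper names and made-up SNP calls:

```python
# Transitions are purine<->purine (A/G) or pyrimidine<->pyrimidine (C/T) changes.
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def ts_tv_ratio(snps):
    """Ratio of transitions to transversions for (ref, alt) base pairs."""
    ts = sum(1 for ref, alt in snps if frozenset((ref, alt)) in TRANSITIONS)
    tv = len(snps) - ts
    if tv == 0:
        raise ValueError("no transversions; ratio is undefined")
    return ts / tv


def bp_per_snp(snp_frequency):
    """Average spacing between SNPs for a per-base SNP frequency."""
    return 1 / snp_frequency


# Made-up calls: four transitions and two transversions.
calls = [("A", "G"), ("C", "T"), ("G", "A"), ("T", "C"), ("A", "C"), ("G", "T")]
print(ts_tv_ratio(calls))
print(round(bp_per_snp(0.0021)))
```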

  7. Gradient Boosting as a SNP Filter: an Evaluation Using Simulated and Hair Morphology Data

    PubMed Central

    Lubke, GH; Laurin, C; Walters, R; Eriksson, N; Hysi, P; Spector, TD; Montgomery, GW; Martin, NG; Medland, SE; Boomsma, DI

    2013-01-01

    Typically, genome-wide association studies consist of regressing the phenotype on each SNP separately using an additive genetic model. Although statistical models for recessive, dominant, SNP-SNP, or SNP-environment interactions exist, the testing burden makes an evaluation of all possible effects impractical for genome-wide data. We advocate a two-step approach where the first step consists of a filter that is sensitive to different types of SNP main and interactions effects. The aim is to substantially reduce the number of SNPs such that more specific modeling becomes feasible in a second step. We provide an evaluation of a statistical learning method called “gradient boosting machine” (GBM) that can be used as a filter. GBM does not require an a priori specification of a genetic model, and permits inclusion of large numbers of covariates. GBM can therefore be used to explore multiple GxE interactions, which would not be feasible within the parametric framework used in GWAS. We show in a simulation that GBM performs well even under conditions favorable to the standard additive regression model commonly used in GWAS, and is sensitive to the detection of interaction effects even if one of the interacting variables has a zero main effect. The latter would not be detected in GWAS. Our evaluation is accompanied by an analysis of empirical data concerning hair morphology. We estimate the phenotypic variance explained by increasing numbers of highest ranked SNPs, and show that it is sufficient to select 10K-20K SNPs in the first step of a two-step approach. PMID:24404405

  8. Identification of differently expressed genes with specific SNP Loci for breast cancer by the integration of SNP and gene expression profiling analyses.

    PubMed

    Yuan, Pengfei; Liu, Dechun; Deng, Miao; Liu, Jiangbo; Wang, Jianguang; Zhang, Like; Liu, Qipeng; Zhang, Ting; Chen, Yanbin; Jin, Gaoyuan

    2015-04-01

    This study aims to explore the relationship between gene polymorphism and breast cancer, and to screen DEGs (differentially expressed genes) with SNPs (single nucleotide polymorphisms) related to breast cancer. The SNPs of 17 patients and the preprocessed SNP profiling GSE 32258 (38 cases of normal breast cells) were combined to identify their correlation with breast cancer using a chi-square test. The gene expression profiling batch8_9 (38 cases of patients and 8 cases of normal tissue) was preprocessed with the limma package, and the DEGs were filtered out. Then Fisher's method was applied to integrate DEGs and SNPs associated with breast cancer. With NetBox software, TRED (Transcriptional Regulatory Element Database) and UCSC (University of California Santa Cruz) database, genes-associated network and transcriptional regulatory network were constructed using Cytoscape software. Further, GO (Gene Ontology) and KEGG analyses were performed for genes in the networks by using siggenes. In total, 332 DEGs were identified. There were 160 breast cancer-related SNPs related to 106 genes of gene expression profiling (19 were significant DEGs). Finally, 11 co-correlated DEGs were selected. In genes-associated network, 9 significant DEGs were correlated to 23 LINKER genes while, in transcriptional regulatory network, E2F1 had regulatory relationships with 7 DEGs including MTUS1, CD44, CCNB1 and CCND2. KRAS with SNP locus of rs1137282 was involved in 35 KEGG pathways. The genes of MTUS1, CD44, CCNB1, CCND2 and KRAS with specific SNP loci may be used as biomarkers for diagnosis of breast cancer. Besides, E2F1 was recognized as the transcription factor of 7 DEGs including MTUS1, CD44, CCNB1 and CCND2.

  9. When whole-genome alignments just won't work: kSNP v2 software for alignment-free SNP discovery and phylogenetics of hundreds of microbial genomes.

    PubMed

    Gardner, Shea N; Hall, Barry G

    2013-01-01

    Effective use of rapid and inexpensive whole genome sequencing for microbes requires fast, memory efficient bioinformatics tools for sequence comparison. The kSNP v2 software finds single nucleotide polymorphisms (SNPs) in whole genome data. kSNP v2 has numerous improvements over kSNP v1 including SNP gene annotation; better scaling for draft genomes available as assembled contigs or raw, unassembled reads; a tool to identify the optimal value of k; distribution of packages of executables for Linux and Mac OS X for ease of installation and user-friendly use; and a detailed User Guide. SNP discovery is based on k-mer analysis, and requires no multiple sequence alignment or the selection of a single reference genome. Most target sets with hundreds of genomes complete in minutes to hours. SNP phylogenies are built by maximum likelihood, parsimony, and distance, based on all SNPs, only core SNPs, or SNPs present in some intermediate user-specified fraction of targets. The SNP-based trees that result are consistent with known taxonomy. kSNP v2 can handle many gigabases of sequence in a single run, and if one or more annotated genomes are included in the target set, SNPs are annotated with protein coding and other information (UTRs, etc.) from Genbank file(s). We demonstrate application of kSNP v2 on sets of viral and bacterial genomes, and discuss in detail analysis of a set of 68 finished E. coli and Shigella genomes and a set of the same genomes to which have been added 47 assemblies and four "raw read" genomes of O104:H4 strains from the recent European E. coli outbreak that resulted in both bloody diarrhea and hemolytic uremic syndrome (HUS), and caused at least 50 deaths.

  10. Detection of selective sweeps in cattle using genome-wide SNP data

    PubMed Central

    2013-01-01

    Background The domestication and subsequent selection by humans to create breeds and biological types of cattle undoubtedly altered the patterning of variation within their genomes. Strong selection to fix advantageous large-effect mutations underlying domesticability, breed characteristics or productivity created selective sweeps in which variation was lost in the chromosomal region flanking the selected allele. Selective sweeps have now been identified in the genomes of many animal species including humans, dogs, horses, and chickens. Here, we attempt to identify and characterise regions of the bovine genome that have been subjected to selective sweeps. Results Two datasets were used for the discovery and validation of selective sweeps via the fixation of alleles at a series of contiguous SNP loci. BovineSNP50 data were used to identify 28 putative sweep regions among 14 diverse cattle breeds. Affymetrix BOS 1 prescreening assay data for five breeds were used to identify 85 regions and validate 5 regions identified using the BovineSNP50 data. Many genes are located within these regions and the lack of sequence data for the analysed breeds precludes the nomination of selected genes or variants and limits the prediction of the selected phenotypes. However, phenotypes that we predict to have historically been under strong selection include horned-polled, coat colour, stature, ear morphology, and behaviour. Conclusions The bias towards common SNPs in the design of the BovineSNP50 assay led to the identification of recent selective sweeps associated with breed formation and common to only a small number of breeds rather than ancient events associated with domestication which could potentially be common to all European taurines. The limited SNP density, or marker resolution, of the BovineSNP50 assay significantly impacted the rate of false discovery of selective sweeps, however, we found sweeps in common between breeds which were confirmed using an ultra

  11. Identification of Mendelian inconsistencies between SNP and pedigree information of sibs

    PubMed Central

    2011-01-01

    Background Using SNP genotypes to apply genomic selection in breeding programs is becoming common practice. Tools to edit and check the quality of genotype data are required. Checking for Mendelian inconsistencies makes it possible to identify animals for which pedigree information and genotype information are not in agreement. Methods Straightforward tests to detect Mendelian inconsistencies exist that count the number of opposing homozygous marker (e.g. SNP) genotypes between parent and offspring (PAR-OFF). Here, we develop two tests to identify Mendelian inconsistencies between sibs. The first test counts SNP with opposing homozygous genotypes between sib pairs (SIBCOUNT). The second test compares pedigree and SNP-based relationships (SIBREL). All tests iteratively remove animals based on decreasing numbers of inconsistent parents and offspring or sibs. The PAR-OFF test, followed by either SIB test, was applied to a dataset comprising 2,078 genotyped cows and 211 genotyped sires. Theoretical expectations for distributions of test statistics of all three tests were calculated and compared to empirically derived values. Type I and II error rates were calculated after applying the tests to the edited data, while Mendelian inconsistencies were introduced by permuting pedigree against genotype data for various proportions of animals. Results Both SIB tests identified animal pairs for which pedigree and genomic relationships could be considered as inconsistent by visual inspection of a scatter plot of pairwise pedigree and SNP-based relationships. After removal of 235 animals with the PAR-OFF test, SIBCOUNT (SIBREL) identified 18 (22) additional inconsistent animals. Seventeen animals were identified by both methods. The numbers of incorrectly deleted animals (Type I error), were equally low for both methods, while the numbers of incorrectly non-deleted animals (Type II error), were considerably higher for SIBREL compared to SIBCOUNT. Conclusions Tests to remove
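
    The counting step behind the PAR-OFF and SIBCOUNT tests described in this abstract (opposing homozygous genotypes between two animals) can be sketched as follows. The 0/1/2 genotype coding, the use of None for missing calls, and the function name are assumptions made for illustration; the iterative removal of animals and any thresholds used by the authors are not reproduced.

```python
def count_opposing_homozygotes(genotypes_a, genotypes_b):
    """Count SNPs where two animals carry opposite homozygous genotypes.

    Genotypes are assumed to be coded as copies of one allele:
    0 and 2 are the two homozygotes, 1 is heterozygous, None is a missing call.
    """
    if len(genotypes_a) != len(genotypes_b):
        raise ValueError("both animals must be genotyped at the same SNPs")
    return sum(
        1
        for a, b in zip(genotypes_a, genotypes_b)
        # Only the pairs (0, 2) and (2, 0) differ by exactly two allele copies.
        if a is not None and b is not None and abs(a - b) == 2
    )


# Hypothetical genotypes for a putative parent and offspring.
parent = [0, 2, 1, 2, 0, None]
offspring = [2, 2, 1, 0, 1, 0]
print(count_opposing_homozygotes(parent, offspring))
```

    A true parent-offspring pair should show no opposing homozygotes apart from genotyping errors, so a high count points to a pedigree or sample mix-up.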

  12. Ecotoxicological assessment of PAHs and their dead-end metabolites after degradation by Mycobacterium sp. strain SNP11.

    PubMed

    Pagnout, Christophe; Rast, Claudine; Veber, Anne-Marie; Poupin, Pascal; Férard, Jean-François

    2006-10-01

    Mycobacterium sp. SNP11 has a high PAH biodegradation potential. In this paper, the toxicity of pyrene, fluoranthene, phenanthrene, and their dead-end metabolites, accumulated in the media after biodegradation by Mycobacterium sp. SNP11, were evaluated by a screening battery of acute, chronic, and genotoxic tests. According to the bioassays, performed on bacteria (Vibrio fischeri, Salmonella typhimurium strains TA1535/pSK1002, TA97a, TA98, TA100), algae (Pseudokirchneriella subcapitata), and crustaceans (Daphnia magna, Ceriodaphnia dubia), total disappearance or a very significant reduction of the (geno)toxic potential was observed after PAH degradation by Mycobacterium sp. SNP11.

  13. MA-SNP--A new genotype calling method for oligonucleotide SNP arrays modeling the batch effect with a normal mixture model.

    PubMed

    Wen, Yalu; Li, Ming; Fu, Wenjiang J

    2011-08-30

    Genome-wide association studies hold great promise in identifying disease-susceptibility variants and understanding the genetic etiology of complex diseases. Microarray technology enables the genotyping of millions of single nucleotide polymorphisms. Many factors in microarray studies, such as probe selection, sample quality, and experimental process and batch, have substantial effect on the genotype calling accuracy, which is crucial for downstream analyses. Failure to account for the variability of these sources may lead to inaccurate genotype calls and false positive and false negative findings. In this study, we develop a SNP-specific genotype calling algorithm based on the probe intensity composite representation (PICR) model, while using a normal mixture model to account for the variability of batch effect on the genotype calls. We demonstrate our method with SNP array data in a few studies, including the HapMap project, the coronary heart disease and the UK Blood Service Control studies by the Wellcome Trust Case-Control Consortium, and a methylation profiling study. Our single array based approach outperforms PICR and is comparable to the best multi-array genotype calling methods.

  14. Microfluidic linear hydrogel array for multiplexed single nucleotide polymorphism (SNP) detection.

    PubMed

    Jung, Yun Kyung; Kim, Jungkyu; Mathies, Richard A

    2015-03-17

    A PDMS-based microfluidic linear hydrogel array is developed for multiplexed single nucleotide polymorphism (SNP) detection. A sequence of three-dimensional (3D) hydrogel plugs containing the desired DNA probes is prepared by UV polymerization within a PDMS microchannel system. The fluorescently labeled target DNA is then electrophoresed through the sequence of hydrogel plugs for hybridization. Continued electrophoresis provides an electrophoretic wash that removes nonspecific binders. The capture gel array is imaged after washing at various temperatures (temperature gradient electrophoresis) to further distinguish perfect matches from mismatches. The ability of this microdevice to perform multiplex SNP genotyping is demonstrated by analyzing a mixture of model E. coli bacterial targets. This microfluidic hydrogel array is ∼1000 times more sensitive than planar microarrays due to the 3D gel capture, the hybridization time is much shorter due to electrophoretic control of the transport properties, and the stringent wash with temperature gradient electrophoresis enables analysis of single nucleotide mismatches with high specificity.

  15. Using false discovery rates to benchmark SNP-callers in next-generation sequencing projects.

    PubMed

    Farrer, Rhys A; Henk, Daniel A; MacLean, Dan; Studholme, David J; Fisher, Matthew C

    2013-01-01

    Sequence alignments form the basis for many comparative and population genomic studies. Alignment tools provide a range of accuracies dependent on the divergence between the sequences and the alignment methods. Despite widespread use, there is no standard method for assessing the accuracy of a dataset and alignment strategy after resequencing. We present a framework and tool for determining the overall accuracies of an input read dataset, alignment and SNP-calling method providing an isolate in that dataset has a corresponding, or closely related reference sequence available. In addition to this tool for comparing False Discovery Rates (FDR), we include a method for determining homozygous and heterozygous positions from an alignment using binomial probabilities for an expected error rate. We benchmark this method against other SNP callers using our FDR method with three fungal genomes, finding that it was able to achieve a high level of accuracy. These tools are available at http://cfdr.sourceforge.net/.

  16. Using False Discovery Rates to Benchmark SNP-callers in next-generation sequencing projects

    PubMed Central

    Farrer, Rhys A.; Henk, Daniel A.; MacLean, Dan; Studholme, David J.; Fisher, Matthew C.

    2013-01-01

    Sequence alignments form the basis for many comparative and population genomic studies. Alignment tools provide a range of accuracies dependent on the divergence between the sequences and the alignment methods. Despite widespread use, there is no standard method for assessing the accuracy of a dataset and alignment strategy after resequencing. We present a framework and tool for determining the overall accuracies of an input read dataset, alignment and SNP-calling method providing an isolate in that dataset has a corresponding, or closely related reference sequence available. In addition to this tool for comparing False Discovery Rates (FDR), we include a method for determining homozygous and heterozygous positions from an alignment using binomial probabilities for an expected error rate. We benchmark this method against other SNP callers using our FDR method with three fungal genomes, finding that it was able to achieve a high level of accuracy. These tools are available at http://cfdr.sourceforge.net/. PMID:23518929

  17. Genome-wide SNP association-based localization of a dwarfism gene in Friesian dwarf horses.

    PubMed

    Orr, N; Back, W; Gu, J; Leegwater, P; Govindarajan, P; Conroy, J; Ducro, B; Van Arendonk, J A M; MacHugh, D E; Ennis, S; Hill, E W; Brama, P A J

    2010-12-01

    The recent completion of the horse genome and commercial availability of an equine SNP genotyping array have facilitated the mapping of disease genes. We report putative localization of the gene responsible for dwarfism, a trait in Friesian horses that is thought to have a recessive mode of inheritance, to a 2-Mb region of chromosome 14 using just 10 affected animals and 10 controls. We successfully genotyped 34,429 SNPs that were tested for association with dwarfism using chi-square tests. The most significant SNP in our study, BIEC2-239376 (P(2df) = 4.54 × 10^-5, P(rec) = 7.74 × 10^-6), is located close to a gene implicated in human dwarfism. Fine-mapping and resequencing analyses did not aid in further localization of the causative variant, and replication of our findings in independent sample sets will be necessary to confirm these results.
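
    The 2-df chi-square association test named in this abstract compares genotype counts in cases and controls at one SNP. The sketch below computes the Pearson statistic for a 2x3 table and uses the fact that the chi-square survival function with 2 degrees of freedom is exactly exp(-x/2). The function names and counts are hypothetical, and with only 10 animals per group the expected counts are small, so the chi-square approximation is rough.

```python
import math


def chi_square_2x3(cases, controls):
    """Pearson chi-square statistic for genotype counts (AA, AB, BB)."""
    rows = [cases, controls]
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(rows):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:  # an empty genotype class contributes nothing
                stat += (observed - expected) ** 2 / expected
    return stat


def p_value_2df(stat):
    """Upper-tail p-value for a chi-square statistic with 2 degrees of freedom."""
    return math.exp(-stat / 2)


# Hypothetical genotype counts for 10 affected animals and 10 controls.
stat = chi_square_2x3((0, 1, 9), (5, 4, 1))
print(f"chi-square = {stat:.2f}, p = {p_value_2df(stat):.2e}")
```

    With tens of thousands of SNPs tested, a per-SNP p-value like this would still need a multiple-testing correction before being called genome-wide significant.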

  18. Light whole genome sequence for SNP discovery across domestic cat breeds

    PubMed Central

    2010-01-01

    Background The domestic cat has offered enormous genomic potential in the veterinary description of over 250 hereditary disease models as well as the occurrence of several deadly feline viruses (feline leukemia virus, FeLV; feline coronavirus, FECV; feline immunodeficiency virus, FIV) that are homologues to human scourges (cancer, SARS, and AIDS respectively). However, to realize this bio-medical potential, a high density single nucleotide polymorphism (SNP) map is required in order to accomplish disease and phenotype association discovery. Description To remedy this, we generated 3,178,297 paired fosmid-end Sanger sequence reads from seven cats, and combined these data with the publicly available 2X cat whole genome sequence. All sequence reads were assembled together to form a 3X whole genome assembly allowing the discovery of over three million SNPs. To reduce potential false positive SNPs due to the low coverage assembly, a low upper-limit was placed on sequence coverage and a high lower-limit on the quality of the discrepant bases at a potential variant site. In all, domestic cats of different breeds (female Abyssinian, female American shorthair, male Cornish Rex, female European Burmese, female Persian, female Siamese and male Ragdoll) and a female African wildcat were sequenced lightly. We report a total of 964 k common SNPs suitable for a domestic cat SNP genotyping array and an additional 900 k SNPs detected between African wildcat and domestic cat breeds. An empirical sampling of 94 discovered SNPs was tested in the sequenced cats resulting in a SNP validation rate of 99%. Conclusions These data provide a large collection of mapped feline SNPs across the cat genome that will allow for the development of SNP genotyping platforms for mapping feline diseases. PMID:20576142

  19. Sensitive Quantification of Mosaicism Using High Density SNP Arrays and the Cumulative Distribution Function

    PubMed Central

    Markello, Thomas C.; Carlson-Donohoe, Hannah; Sincan, Murat; Adams, David; Bodine, David M.; Farrar, Jason E.; Vlachos, Adrianna; Lipton, Jeffrey M.; Auerbach, Arleen D.; Ostrander, Elaine A.; Chandrasekharappa, Settara C.; Boerkoel, Cornelius F.; Gahl, William A.

    2012-01-01

    Medicine is rapidly applying exome and genome sequencing to the diagnosis and management of human disease. Somatic mosaicism, however, is not readily detectable by these means, and yet it accounts for a significant portion of undiagnosed disease. We present a rapid and sensitive method, the Continuous Distribution Function as applied to single nucleotide polymorphism (SNP) array data, to quantify somatic mosaicism throughout the genome. We also demonstrate application of the method to novel diseases and mechanisms. PMID:22277120

  20. Genome-wide SNP analysis of the Systemic Capillary Leak Syndrome (Clarkson disease)

    PubMed Central

    Xie, Zhihui; Nagarajan, Vijayaraj; Sturdevant, Daniel E; Iwaki, Shoko; Chan, Eunice; Wisch, Laura; Young, Michael; Nelson, Celeste M; Porcella, Stephen F; Druey, Kirk M

    2013-01-01

    The Systemic Capillary Leak Syndrome (SCLS) is an extremely rare, orphan disease that resembles, and is frequently erroneously diagnosed as, systemic anaphylaxis. The disorder is characterized by repeated, transient, and seemingly unprovoked episodes of hypotensive shock and peripheral edema due to transient endothelial hyperpermeability. SCLS is often accompanied by a monoclonal gammopathy of unknown significance (MGUS). Using Affymetrix Single Nucleotide Polymorphism (SNP) microarrays, we performed the first genome-wide SNP analysis of SCLS in a cohort of 12 disease subjects and 18 controls. Exome capture sequencing was performed on genomic DNA from nine of these patients as validation for the SNP-chip discoveries and de novo data generation. We identified candidate susceptibility loci for SCLS, which included a region flanking CAV3 (3p25.3) as well as SNP clusters in PON1 (7q21.3), PSORS1C1 (6p21.3), and CHCHD3 (7q33). Among the most highly ranked discoveries were gene-associated SNPs in the uncharacterized LOC100130480 gene (rs6417039, rs2004296). Top case-associated SNPs were observed in BTRC (rs12355803, rs4436485), ARHGEF18 (rs11668246), CDH13 (rs4782779), and EDG2 (rs12552348), which encode proteins with known or suspected roles in B cell function and/or vascular integrity. A total of 61 SNPs that were significantly associated with SCLS by microarray analysis were also detected and validated by exome deep sequencing. Functional annotation of highly ranked SNPs revealed enrichment of cell projections, cell junctions and adhesion, and molecules containing pleckstrin homology, Ras/Rho regulatory, and immunoglobulin Ig-like C2/fibronectin type III domains, all of which involve mechanistic functions that correlate with the SCLS phenotype. These results highlight SNPs with potential relevance to SCLS. PMID:24808988

  1. Haplotype inference from unphased SNP data in heterozygous polyploids based on SAT

    PubMed Central

    Neigenfind, Jost; Gyetvai, Gabor; Basekow, Rico; Diehl, Svenja; Achenbach, Ute; Gebhardt, Christiane; Selbig, Joachim; Kersten, Birgit

    2008-01-01

    Background Haplotype inference based on unphased SNP markers is an important task in population genetics. Although there are different approaches to the inference of haplotypes in diploid species, the existing software is not suitable for inferring haplotypes from unphased SNP data in polyploid species, such as the cultivated potato (Solanum tuberosum). Potato species are tetraploid and highly heterozygous. Results Here we present the software SATlotyper which is able to handle polyploid and polyallelic data. SATlotyper uses the Boolean satisfiability problem to formulate Haplotype Inference by Pure Parsimony. The software excludes existing haplotype inferences, thus allowing for calculation of alternative inferences. As it is not known which of the multiple haplotype inferences are best supported by the given unphased data set, we use a bootstrapping procedure that allows for scoring of alternative inferences. Finally, by means of the bootstrapping scores, it is possible to optimise the phased genotypes belonging to a given haplotype inference. The program is evaluated with simulated and experimental SNP data generated for heterozygous tetraploid populations of potato. We show that, instead of taking the first haplotype inference reported by the program, we can significantly improve the quality of the final result by applying additional methods that include scoring of the alternative haplotype inferences and genotype optimisation. For a sub-population of nineteen individuals, the predicted results computed by SATlotyper were directly compared with results obtained by experimental haplotype inference via sequencing of cloned amplicons. Prediction and experiment gave similar results regarding the inferred haplotypes and phased genotypes. Conclusion Our results suggest that Haplotype Inference by Pure Parsimony can be solved efficiently by the SAT approach, even for data sets of unphased SNP from heterozygous polyploids. SATlotyper is freeware and is distributed as

  2. Evaluating the performance of Affymetrix SNP Array 6.0 platform with 400 Japanese individuals

    PubMed Central

    Nishida, Nao; Koike, Asako; Tajima, Atsushi; Ogasawara, Yuko; Ishibashi, Yoshimi; Uehara, Yasuka; Inoue, Ituro; Tokunaga, Katsushi

    2008-01-01

    Background With improvements in genotyping technologies, genome-wide association studies with hundreds of thousands of SNPs allow the identification of candidate genetic loci for multifactorial diseases in different populations. However, genotyping errors caused by genotyping platforms or genotype calling algorithms may lead to inflation of false associations between markers and phenotypes. In addition, the number of SNPs available for genome-wide association studies in the Japanese population has been investigated using only 45 samples in the HapMap project, which could lead to an inaccurate estimation of the number of SNPs with low minor allele frequencies. We genotyped 400 Japanese samples in order to estimate the number of SNPs available for genome-wide association studies in the Japanese population and to examine the performance of the current SNP Array 6.0 platform and the genotype calling algorithm "Birdseed". Results About 20% of the 909,622 SNP markers on the array were revealed to be monomorphic in the Japanese population. Consequently, 661,599 SNPs were available for genome-wide association studies in the Japanese population, after excluding the poorly behaving SNPs. The Birdseed algorithm accurately determined the genotype calls of each sample with a high overall call rate of over 99.5% and a high concordance rate of over 99.8% using more than 48 samples after removing low-quality samples by adjusting QC criteria. Conclusion Our results confirmed that the SNP Array 6.0 platform reached the level reported by the manufacturer, and thus genome-wide association studies using the SNP Array 6.0 platform have considerable potential to identify candidate susceptibility or resistance genetic factors for multifactorial diseases in the Japanese population, as well as in other populations. PMID:18803882

  3. Evaluation of TP53 Pro72Arg and MDM2 SNP285-SNP309 polymorphisms in an Italian cohort of LFS suggestive patients lacking identifiable TP53 germline mutations.

    PubMed

    Ponti, Francesca; Corsini, Serena; Gnoli, Maria; Pedrini, Elena; Mordenti, Marina; Sangiorgi, Luca

    2016-10-01

    Li-Fraumeni syndrome (LFS) is a rare genetic cancer predisposition disease, partly determined by the presence of a TP53 germline mutation; the lack of such a mutation, in the presence of a typical LFS phenotype, defines a wide group of 'LFS Suggestive' patients. Alternative LFS susceptibility genes have been investigated without promising results, suggesting the involvement of other genetic determinants in cancer predisposition. Hence, this study explores the single and combined effects on cancer risk, age of onset and cancer type of three single nucleotide polymorphisms (SNPs), TP53 Pro72Arg, MDM2 SNP285 and SNP309, already described as modifiers in TP53 mutation carriers but not properly investigated in LFS Suggestive patients. This case-control study examines 34 Italian LFS Suggestive patients lacking germline TP53 mutations and 95 tumour-free subjects. A significant prevalence of homozygous MDM2 SNP309 G in the LFS Suggestive group (p < 0.0005) confirms its contribution to cancer susceptibility, also highlighted in LFS TP53 positive families. Conversely, its anticipating role on tumour onset has not been confirmed, as in our results it was associated with the SNP309 T allele. A strong combined outcome with a 'dosage' effect has also been reported for the TP53 P72 and MDM2 SNP309 G alleles on cancer susceptibility (p < 0.0005), whereas the neutralizing effect of the MDM2 SNP285 C allele on the MDM2 SNP309 G variant is not evident in our population. Although they need further evaluation, the results obtained strengthen the role of MDM2 SNP309 as a genetic factor in hereditary predisposition to cancer, thereby improving the management of LFS Suggestive patients.

  4. High-throughput SNP-genotyping analysis of the relationships among Ponto-Caspian sturgeon species

    PubMed Central

    Rastorguev, Sergey M; Nedoluzhko, Artem V; Mazur, Alexander M; Gruzdeva, Natalia M; Volkov, Alexander A; Barmintseva, Anna E; Mugue, Nikolai S; Prokhortchouk, Egor B

    2013-01-01

    Abstract Legally certified sturgeon fisheries require population protection and conservation methods, including DNA tests to identify the source of valuable sturgeon roe. However, the available genetic data are insufficient to distinguish between different sturgeon populations, and are even unable to distinguish between some species. We performed high-throughput single-nucleotide polymorphism (SNP)-genotyping analysis on different populations of Russian (Acipenser gueldenstaedtii), Persian (A. persicus), and Siberian (A. baerii) sturgeon species from the Caspian Sea region (Volga and Ural Rivers), the Azov Sea, and two Siberian rivers. We found that Russian sturgeons from the Volga and Ural Rivers were essentially indistinguishable, but they differed from Russian sturgeons in the Azov Sea, and from Persian and Siberian sturgeons. We identified eight SNPs that were sufficient to distinguish these sturgeon populations with 80% confidence, and allowed the development of markers to distinguish sturgeon species. Finally, on the basis of our SNP data, we propose that the A. baerii-like mitochondrial DNA found in some Russian sturgeons from the Caspian Sea arose via an introgression event during the Pleistocene glaciation. In the present study, the high-throughput genotyping analysis of several sturgeon populations was performed. SNP markers for species identification were defined. The possible explanation of the baerii-like mitotype presence in some Russian sturgeons in the Caspian Sea was suggested. PMID:24567827

  5. Demographic Trends in Korean Native Cattle Explained Using Bovine SNP50 Beadchip

    PubMed Central

    Sharma, Aditi; Lim, Dajeong; Chai, Han-Ha; Choi, Bong-Hwan; Cho, Yongmin

    2016-01-01

    Linkage disequilibrium (LD) is the non-random association between the loci and it could give us a preliminary insight into the genetic history of the population. In the present study LD patterns and effective population size (Ne) of three Korean cattle breeds along with Chinese, Japanese and Mongolian cattle were compared using the bovine Illumina SNP50 panel. The effective population size (Ne) is the number of breeding individuals in a population and is particularly important as it determines the rate at which genetic variation is lost. The genotype data in our study comprised a total of 129 samples, varying from 4 to 39 samples. After quality control there were ~29,000 single nucleotide polymorphisms (SNPs) for which the r² value was calculated. Average distance between SNP pairs was 1.14 Mb across all breeds. Average r² between adjacent SNP pairs ranged from 0.1 for Yanbian to 0.3 for Qinchuan. Effective population size of the breeds based on r² varied from 16 in Hainan to 226 in Yanbian. Amongst the Korean native breeds effective population size of Brindle Hanwoo was the least with Ne = 59 and Brown Hanwoo was the highest with Ne = 83. The effective population size of the Korean cattle breeds has been decreasing alarmingly over the past generations. We suggest appropriate measures to be taken to prevent these local breeds in their native tracts. PMID:28154516
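
    To make the link between r² and effective population size concrete, here is a minimal Python sketch. It assumes phased two-locus haplotypes coded 0/1, and it uses the approximation E[r²] ≈ 1 / (1 + 4·Ne·c), with c the recombination distance in Morgans, rearranged for Ne. The abstract does not state which estimator the authors applied, so both the formula choice and the function names are assumptions made for illustration.

```python
from typing import Sequence, Tuple


def r_squared(haplotypes: Sequence[Tuple[int, int]]) -> float:
    """Squared correlation between two biallelic loci from phased haplotypes.

    Each haplotype is (allele at locus A, allele at locus B), coded 0 or 1.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("no haplotypes supplied")
    p_a = sum(a for a, _ in haplotypes) / n   # frequency of allele 1 at locus A
    p_b = sum(b for _, b in haplotypes) / n   # frequency of allele 1 at locus B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        raise ValueError("r^2 is undefined when a locus is monomorphic")
    d = p_ab - p_a * p_b                      # disequilibrium coefficient D
    return d * d / denominator


def effective_population_size(r2: float, c_morgans: float) -> float:
    """Solve E[r^2] = 1 / (1 + 4 * Ne * c) for Ne (assumed estimator)."""
    if not 0 < r2 <= 1 or c_morgans <= 0:
        raise ValueError("need 0 < r2 <= 1 and c > 0")
    return (1 / r2 - 1) / (4 * c_morgans)


# Toy sample of eight haplotypes; not data from the study.
sample = [(1, 1)] * 3 + [(1, 0)] + [(0, 1)] + [(0, 0)] * 3
r2 = r_squared(sample)
ne = effective_population_size(r2, c_morgans=0.01)
print(r2, ne)
```

    Array genotypes are usually unphased, so in practice r² is estimated from genotype correlations or after phasing, and estimates are commonly corrected for sample size; neither step is shown here.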

  6. SNP-based markers for discriminating olive (Olea europaea L.) cultivars.

    PubMed

    Reale, S; Doveri, S; Díaz, A; Angiolillo, A; Lucentini, L; Pilla, F; Martín, A; Donini, P; Lee, D

    2006-09-01

    A set of 11 polymorphic markers (1 cleaved amplified polymorphic sequence (CAPS), 2 sequence-characterized amplified regions (SCARs), and 8 single-nucleotide polymorphism (SNP)-derived markers) was obtained for olive cultivar identification by comparing DNA sequences from different accessions. Marker development was more efficient, using sequences from the database rather than cloning arbitrary DNA fragments. Analyses of the sequences of 3 genes from 11 diverse cultivars revealed an SNP frequency of 1 per 190 base pairs in exons and 1 per 149 base pairs in introns. Most mutations were silent or had little perceptible effect on the polypeptide encoded. The higher incidence of transversions (55%) suggests that methylation is not the major driving force for DNA base changes. Evidence of linkage disequilibrium in 2 pairs of markers has been detected. The set of predominantly SNP-based markers was used to genotype 65 olive samples obtained from Europe and Australia, and was able clearly to discriminate 77% of the cultivars. Samples, putatively of the same cultivar but derived from different sources, were revealed as identical, demonstrating the utility of these markers as tools for resolving nomenclature issues. Genotyping data were used for constructing a dendrogram by UPGMA cluster analysis using the simple matching similarity coefficient. Relationships between cultivars are discussed in relation to the route of olive's spread.
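
    The transversion percentage reported above depends on classifying each SNP as a transition (purine to purine or pyrimidine to pyrimidine) or a transversion (purine to pyrimidine or the reverse). The Python sketch below shows that classification on made-up allele pairs; the function names and the toy data are illustrative assumptions, not the authors' code or results.

```python
from typing import Iterable, Tuple

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Return 'transition' or 'transversion' for a single-base change."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        raise ValueError("alleles are identical")
    if not {ref, alt} <= (PURINES | PYRIMIDINES):
        raise ValueError("alleles must be one of A, C, G, T")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def transversion_fraction(snps: Iterable[Tuple[str, str]]) -> float:
    """Share of SNPs in the collection that are transversions."""
    kinds = [classify_substitution(ref, alt) for ref, alt in snps]
    if not kinds:
        raise ValueError("no SNPs supplied")
    return kinds.count("transversion") / len(kinds)


# Toy allele pairs: two transitions (A/G, C/T) and three transversions.
toy_snps = [("A", "G"), ("C", "T"), ("A", "C"), ("G", "T"), ("A", "T")]
print(transversion_fraction(toy_snps))
```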

  7. Making a chocolate chip: development and evaluation of a 6K SNP array for Theobroma cacao

    PubMed Central

    Livingstone, Donald; Royaert, Stefan; Stack, Conrad; Mockaitis, Keithanne; May, Greg; Farmer, Andrew; Saski, Christopher; Schnell, Ray; Kuhn, David; Motamayor, Juan Carlos

    2015-01-01

    Theobroma cacao, the key ingredient in chocolate production, is one of the world's most important tree fruit crops, with ∼4,000,000 metric tons produced across 50 countries. To move towards gene discovery and marker-assisted breeding in cacao, a single-nucleotide polymorphism (SNP) identification project was undertaken using RNAseq data from 16 diverse cacao cultivars. RNA sequences were aligned to the assembled transcriptome of the cultivar Matina 1-6, and 330,000 SNPs within coding regions were identified. From these SNPs, a subset of 6,000 high-quality SNPs were selected for inclusion on an Illumina Infinium SNP array: the Cacao6kSNP array. Using Cacao6KSNP array data from over 1,000 cacao samples, we demonstrate that our custom array produces a saturated genetic map and can be used to distinguish among even closely related genotypes. Our study enhances and expands the genetic resources available to the cacao research community, and provides the genome-scale set of tools that are critical for advancing breeding with molecular markers in an agricultural species with high genetic diversity. PMID:26070980

  8. High-throughput SNP-genotyping analysis of the relationships among Ponto-Caspian sturgeon species.

    PubMed

    Rastorguev, Sergey M; Nedoluzhko, Artem V; Mazur, Alexander M; Gruzdeva, Natalia M; Volkov, Alexander A; Barmintseva, Anna E; Mugue, Nikolai S; Prokhortchouk, Egor B

    2013-08-01

    Legally certified sturgeon fisheries require population protection and conservation methods, including DNA tests to identify the source of valuable sturgeon roe. However, the available genetic data are insufficient to distinguish between different sturgeon populations, and are even unable to distinguish between some species. We performed high-throughput single-nucleotide polymorphism (SNP)-genotyping analysis on different populations of Russian (Acipenser gueldenstaedtii), Persian (A. persicus), and Siberian (A. baerii) sturgeon species from the Caspian Sea region (Volga and Ural Rivers), the Azov Sea, and two Siberian rivers. We found that Russian sturgeons from the Volga and Ural Rivers were essentially indistinguishable, but they differed from Russian sturgeons in the Azov Sea, and from Persian and Siberian sturgeons. We identified eight SNPs that were sufficient to distinguish these sturgeon populations with 80% confidence, and allowed the development of markers to distinguish sturgeon species. Finally, on the basis of our SNP data, we propose that the A. baerii-like mitochondrial DNA found in some Russian sturgeons from the Caspian Sea arose via an introgression event during the Pleistocene glaciation. In the present study, the high-throughput genotyping analysis of several sturgeon populations was performed. SNP markers for species identification were defined. The possible explanation of the baerii-like mitotype presence in some Russian sturgeons in the Caspian Sea was suggested.

  9. SNP genotyping in melons: genetic variation, population structure, and linkage disequilibrium.

    PubMed

    Esteras, Cristina; Formisano, Gelsomina; Roig, Cristina; Díaz, Aurora; Blanca, José; Garcia-Mas, Jordi; Gómez-Guillamón, María Luisa; López-Sesé, Ana Isabel; Lázaro, Almudena; Monforte, Antonio J; Picó, Belén

    2013-05-01

    Novel sequencing technologies were recently used to generate sequences from multiple melon (Cucumis melo L.) genotypes, enabling the in silico identification of large single nucleotide polymorphism (SNP) collections. In order to optimize the use of these markers, SNP validation and large-scale genotyping are necessary. In this paper, we present the first validated design for a genotyping array with 768 SNPs that are evenly distributed throughout the melon genome. This customized Illumina GoldenGate assay was used to genotype a collection of 74 accessions, representing most of the botanical groups of the species. Of the assayed loci, 91 % were successfully genotyped. The array provided a large number of polymorphic SNPs within and across accessions. This set of SNPs detected high levels of variation in accessions from this crop's center of origin as well as from several other areas of melon diversification. Allele distribution throughout the genome revealed regions that distinguished between the two main groups of cultivated accessions (inodorus and cantalupensis). Population structure analysis showed a subdivision into five subpopulations, reflecting the history of the crop. A considerably low level of LD was detected, which decayed rapidly within a few kilobases. Our results show that the GoldenGate assay can be used successfully for high-throughput SNP genotyping in melon. Since many of the genotyped accessions are currently being used as the parents of breeding populations in various programs, this set of mapped markers could be used for future mapping and breeding efforts.

  10. Making a chocolate chip: development and evaluation of a 6K SNP array for Theobroma cacao.

    PubMed

    Livingstone, Donald; Royaert, Stefan; Stack, Conrad; Mockaitis, Keithanne; May, Greg; Farmer, Andrew; Saski, Christopher; Schnell, Ray; Kuhn, David; Motamayor, Juan Carlos

    2015-08-01

    Theobroma cacao, the key ingredient in chocolate production, is one of the world's most important tree fruit crops, with ∼4,000,000 metric tons produced across 50 countries. To move towards gene discovery and marker-assisted breeding in cacao, a single-nucleotide polymorphism (SNP) identification project was undertaken using RNAseq data from 16 diverse cacao cultivars. RNA sequences were aligned to the assembled transcriptome of the cultivar Matina 1-6, and 330,000 SNPs within coding regions were identified. From these SNPs, a subset of 6,000 high-quality SNPs were selected for inclusion on an Illumina Infinium SNP array: the Cacao6kSNP array. Using Cacao6KSNP array data from over 1,000 cacao samples, we demonstrate that our custom array produces a saturated genetic map and can be used to distinguish among even closely related genotypes. Our study enhances and expands the genetic resources available to the cacao research community, and provides the genome-scale set of tools that are critical for advancing breeding with molecular markers in an agricultural species with high genetic diversity.

  11. SNP Marker Discovery in Pima Cotton (Gossypium barbadense L.) Leaf Transcriptomes

    PubMed Central

    Kottapalli, Pratibha; Ulloa, Mauricio; Kottapalli, Kameswara Rao; Payton, Paxton; Burke, John

    2016-01-01

    The objective of this study was to explore the known narrow genetic diversity and discover single-nucleotide polymorphic (SNP) markers for marker-assisted breeding within Pima cotton (Gossypium barbadense L.) leaf transcriptomes. cDNA from 25-day plants of three diverse cotton genotypes [Pima S6 (PS6), Pima S7 (PS7), and Pima 3-79 (P3-79)] was sequenced on Illumina sequencing platform. A total of 28.9 million reads (average read length of 138 bp) were generated by sequencing cDNA libraries of these three genotypes. The de novo assembly of reads generated transcriptome sets of 26,369 contigs for PS6, 25,870 contigs for PS7, and 24,796 contigs for P3-79. A Pima leaf reference transcriptome was generated consisting of 42,695 contigs. More than 10,000 single-nucleotide polymorphisms (SNPs) were identified between the genotypes, with 100% SNP frequency and a minimum of eight sequencing reads. The most prevalent SNP substitutions were C—T and A—G in these cotton genotypes. The putative SNPs identified can be utilized for characterizing genetic diversity, genotyping, and eventually in Pima cotton breeding through marker-assisted selection. PMID:27721653

  12. Triallelic SNP-mediated genotyping of regenerated protoplasts of the heterokaryotic fungus Rhizoctonia solani.

    PubMed

    Thomas, Elizabeth; Pakala, Suman; Fedorova, Natalie D; Nierman, William C; Cubeta, Marc A

    2012-04-15

    The aneuploid and heterokaryotic nuclear condition of the soil fungus Rhizoctonia solani have provided challenges in obtaining a complete genome sequence. To better aid in the assembly and annotation process, a protoplast and single nucleotide polymorphism (SNP)-based method was developed to identify regenerated protoplasts with a reduced nuclear genome. Protocol optimization experiments showed that enzymatic digestion of mycelium from a 24 h culture of R. solani increased the proportion of protoplasts with a diameter of ≤7.5 μm and 1-4 nuclei. To determine whether strains regenerated from protoplasts with a reduced number of nuclei were genetically different from the parental strain, triallelic SNPs identified from variance records of the genomic DNA sequence reads of R. solani were used in PCR-based genotyping assays. Results from 16 of the 24 SNP-based PCR assays provided evidence that one of the three alleles was missing in the 11 regenerated protoplast strains, suggesting that these strains represent a reduced genomic complement of the parental strain. The protoplast and triallelic SNP-based method used in this study may be useful in strain development and analysis of other basidiomycete fungi with complex nuclear genomes.

  13. SNP typing reveals similarity in Mycobacterium tuberculosis genetic diversity between Portugal and Northeast Brazil.

    PubMed

    Lopes, Joao S; Marques, Isabel; Soares, Patricia; Nebenzahl-Guimaraes, Hanna; Costa, Joao; Miranda, Anabela; Duarte, Raquel; Alves, Adriana; Macedo, Rita; Duarte, Tonya A; Barbosa, Theolis; Oliveira, Martha; Nery, Joilda S; Boechat, Neio; Pereira, Susan M; Barreto, Mauricio L; Pereira-Leal, Jose; Gomes, Maria Gabriela Miranda; Penha-Goncalves, Carlos

    2013-08-01

    Human tuberculosis is an infectious disease caused by bacteria from the Mycobacterium tuberculosis complex (MTBC). Although spoligotyping and MIRU-VNTR are standard methodologies in MTBC genetic epidemiology, recent studies suggest that Single Nucleotide Polymorphisms (SNP) are advantageous in phylogenetics and strain group/lineages identification. In this work we use a set of 79 SNPs to characterize 1987 MTBC isolates from Portugal and 141 from Northeast Brazil. All Brazilian samples were further characterized using spoligotyping. Phylogenetic analysis against a reference set revealed that about 95% of the isolates in both populations are singly attributed to bacterial lineage 4. Within this lineage, the most frequent strain groups in both Portugal and Brazil are LAM, followed by Haarlem and X. Contrary to these groups, strain group T showed a very different prevalence between Portugal (10%) and Brazil (1.5%). Spoligotype identification shows about 10% mismatches compared with the use of SNPs, and a little more than 1% of strains could not be identified. The mismatches are observed in the most represented groups of our sample set (i.e., LAM and Haarlem) in almost the same proportion. Besides being more accurate in identifying strain groups/lineages, SNP-typing can also provide phylogenetic relationships between strain groups/lineages and, thus, indicate cases showing phylogenetic incongruence. Overall, the use of SNP-typing revealed striking similarities between MTBC populations from Portugal and Brazil.

  14. SNP Discovery and Development of a High-Density Genotyping Array for Sunflower

    PubMed Central

    Bachlava, Eleni; Taylor, Christopher A.; Tang, Shunxue; Bowers, John E.; Mandel, Jennifer R.; Burke, John M.; Knapp, Steven J.

    2012-01-01

    Recent advances in next-generation DNA sequencing technologies have made possible the development of high-throughput SNP genotyping platforms that allow for the simultaneous interrogation of thousands of single-nucleotide polymorphisms (SNPs). Such resources have the potential to facilitate the rapid development of high-density genetic maps, and to enable genome-wide association studies as well as molecular breeding approaches in a variety of taxa. Herein, we describe the development of a SNP genotyping resource for use in sunflower (Helianthus annuus L.). This work involved the development of a reference transcriptome assembly for sunflower, the discovery of thousands of high quality SNPs based on the generation and analysis of ca. 6 Gb of transcriptome re-sequencing data derived from multiple genotypes, the selection of 10,640 SNPs for inclusion in the genotyping array, and the use of the resulting array to screen a diverse panel of sunflower accessions as well as related wild species. The results of this work revealed a high frequency of polymorphic SNPs and relatively high level of cross-species transferability. Indeed, greater than 95% of successful SNP assays revealed polymorphism, and more than 90% of these assays could be successfully transferred to related wild species. Analysis of the polymorphism data revealed patterns of genetic differentiation that were largely congruent with the evolutionary history of sunflower, though the large number of markers allowed for finer resolution than has previously been possible. PMID:22238659

  15. Quadruplex-single nucleotide polymorphisms (Quad-SNP) influence gene expression difference among individuals.

    PubMed

    Baral, Aradhita; Kumar, Pankaj; Halder, Rashi; Mani, Prithvi; Yadav, Vinod Kumar; Singh, Ankita; Das, Swapan K; Chowdhury, Shantanu

    2012-05-01

    Non-canonical guanine quadruplex structures are not only predominant but also conserved among bacterial and mammalian promoters. Moreover recent findings directly implicate quadruplex structures in transcription. These argue for an intrinsic role of the structural motif and thereby posit that single nucleotide polymorphisms (SNP) that compromise the quadruplex architecture could influence function. To test this, we analysed SNPs within quadruplex motifs (Quad-SNP) and gene expression in 270 individuals across four populations (HapMap) representing more than 14,500 genotypes. Findings reveal significant association between quadruplex-SNPs and expression of the corresponding gene in individuals (P < 0.0001). Furthermore, analysis of Quad-SNPs obtained from population-scale sequencing of 1000 human genomes showed relative selection bias against alteration of the structural motif. To directly test the quadruplex-SNP-transcription connection, we constructed a reporter system using the RPS3 promoter-remarkable difference in promoter activity in the 'quadruplex-destabilized' versus 'quadruplex-intact' promoter was noticed. As a further test, we incorporated a quadruplex motif or its disrupted counterpart within a synthetic promoter reporter construct. The quadruplex motif, and not the disrupted-motif, enhanced transcription in human cell lines of different origin. Together, these findings build direct support for quadruplex-mediated transcription and suggest quadruplex-SNPs may play significant role in mechanistically understanding variations in gene expression among individuals.

  16. Demographic Trends in Korean Native Cattle Explained Using Bovine SNP50 Beadchip.

    PubMed

    Sharma, Aditi; Lim, Dajeong; Chai, Han-Ha; Choi, Bong-Hwan; Cho, Yongmin

    2016-12-01

    Linkage disequilibrium (LD) is the non-random association between the loci and it could give us a preliminary insight into the genetic history of the population. In the present study LD patterns and effective population size (Ne) of three Korean cattle breeds along with Chinese, Japanese and Mongolian cattle were compared using the bovine Illumina SNP50 panel. The effective population size (Ne) is the number of breeding individuals in a population and is particularly important as it determines the rate at which genetic variation is lost. The genotype data in our study comprised a total of 129 samples, varying from 4 to 39 samples. After quality control there were ~29,000 single nucleotide polymorphisms (SNPs) for which the r² value was calculated. Average distance between SNP pairs was 1.14 Mb across all breeds. Average r² between adjacent SNP pairs ranged from 0.1 for Yanbian to 0.3 for Qinchuan. Effective population size of the breeds based on r² varied from 16 in Hainan to 226 in Yanbian. Amongst the Korean native breeds effective population size of Brindle Hanwoo was the least with Ne = 59 and Brown Hanwoo was the highest with Ne = 83. The effective population size of the Korean cattle breeds has been decreasing alarmingly over the past generations. We suggest appropriate measures to be taken to prevent these local breeds in their native tracts.

  17. FASTSNP: an always up-to-date and extendable service for SNP function analysis and prioritization

    PubMed Central

    Yuan, Hsiang-Yu; Chiou, Jen-Jie; Tseng, Wen-Hsien; Liu, Chia-Hung; Liu, Chuan-Kun; Lin, Yi-Jung; Wang, Hui-Hung; Yao, Adam; Chen, Yuan-Tsong; Hsu, Chun-Nan

    2006-01-01

    Single nucleotide polymorphism (SNP) prioritization based on the phenotypic risk is essential for association studies. Assessment of the risk requires access to a variety of heterogeneous biological databases and analytical tools. FASTSNP (function analysis and selection tool for single nucleotide polymorphisms) is a web server that allows users to efficiently identify and prioritize high-risk SNPs according to their phenotypic risks and putative functional effects. A unique feature of FASTSNP is that the functional effect information used for SNP prioritization is always up-to-date, because FASTSNP extracts the information from 11 external web servers at query time using a team of web wrapper agents. Moreover, FASTSNP is extendable by simply deploying more Web wrapper agents. To validate the results of our prioritization, we analyzed 1569 SNPs from the SNP500Cancer database. The results show that SNPs with a high predicted risk exhibit low allele frequencies for the minor alleles, consistent with a well-known finding that a strong selective pressure exists for functional polymorphisms. We have been using FASTSNP for 2 years and FASTSNP enables us to discover a novel promoter polymorphism. FASTSNP is available at . PMID:16845089

  18. Rapid Detection of Rare Deleterious Variants by Next Generation Sequencing with Optional Microarray SNP Genotype Data.

    PubMed

    Watson, Christopher M; Crinnion, Laura A; Gurgel-Gianetti, Juliana; Harrison, Sally M; Daly, Catherine; Antanavicuite, Agne; Lascelles, Carolina; Markham, Alexander F; Pena, Sergio D J; Bonthron, David T; Carr, Ian M

    2015-09-01

    Autozygosity mapping is a powerful technique for the identification of rare, autosomal recessive, disease-causing genes. The ease with which this category of disease gene can be identified has greatly increased through the availability of genome-wide SNP genotyping microarrays and subsequently of exome sequencing. Although these methods have simplified the generation of experimental data, its analysis, particularly when disparate data types must be integrated, remains time consuming. Moreover, the huge volume of sequence variant data generated from next generation sequencing experiments opens up the possibility of using these data instead of microarray genotype data to identify disease loci. To allow these two types of data to be used in an integrated fashion, we have developed AgileVCFMapper, a program that performs both the mapping of disease loci by SNP genotyping and the analysis of potentially deleterious variants using exome sequence variant data, in a single step. This method does not require microarray SNP genotype data, although analysis with a combination of microarray and exome genotype data enables more precise delineation of disease loci, due to superior marker density and distribution.
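
    Autozygosity mapping looks for long chromosomal stretches in which an affected individual is homozygous at consecutive markers. The Python sketch below shows only that core idea on an ordered list of genotype strings. It is not AgileVCFMapper's algorithm, which the abstract does not describe; the function name, the genotype encoding, and the minimum-length threshold are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def homozygous_runs(genotypes: Sequence[str], min_length: int) -> List[Tuple[int, int]]:
    """Find runs of consecutive homozygous calls along one chromosome.

    genotypes: calls ordered by position, e.g. "AA", "AB", "BB" (assumed encoding).
    Returns (start, end) index pairs with end exclusive, for runs of at
    least min_length markers.
    """
    runs: List[Tuple[int, int]] = []
    start = None  # index where the current homozygous run began, if any
    for i, call in enumerate(genotypes):
        is_homozygous = len(call) == 2 and call[0] == call[1]
        if is_homozygous:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_length:
                runs.append((start, i))
            start = None
    # Close a run that extends to the last marker.
    if start is not None and len(genotypes) - start >= min_length:
        runs.append((start, len(genotypes)))
    return runs


# Toy chromosome with nine markers; only the four-marker run passes the threshold.
calls = ["AB", "AA", "BB", "AA", "AA", "AB", "BB", "BB", "AB"]
print(homozygous_runs(calls, min_length=3))
```

    A practical tool would also work in physical or genetic distance rather than marker counts and would tolerate occasional genotyping errors inside a run; both are left out to keep the sketch short.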

  19. SNP discovery and development of a high-density genotyping array for sunflower.

    PubMed

    Bachlava, Eleni; Taylor, Christopher A; Tang, Shunxue; Bowers, John E; Mandel, Jennifer R; Burke, John M; Knapp, Steven J

    2012-01-01

    Recent advances in next-generation DNA sequencing technologies have made possible the development of high-throughput SNP genotyping platforms that allow for the simultaneous interrogation of thousands of single-nucleotide polymorphisms (SNPs). Such resources have the potential to facilitate the rapid development of high-density genetic maps, and to enable genome-wide association studies as well as molecular breeding approaches in a variety of taxa. Herein, we describe the development of a SNP genotyping resource for use in sunflower (Helianthus annuus L.). This work involved the development of a reference transcriptome assembly for sunflower, the discovery of thousands of high quality SNPs based on the generation and analysis of ca. 6 Gb of transcriptome re-sequencing data derived from multiple genotypes, the selection of 10,640 SNPs for inclusion in the genotyping array, and the use of the resulting array to screen a diverse panel of sunflower accessions as well as related wild species. The results of this work revealed a high frequency of polymorphic SNPs and relatively high level of cross-species transferability. Indeed, greater than 95% of successful SNP assays revealed polymorphism, and more than 90% of these assays could be successfully transferred to related wild species. Analysis of the polymorphism data revealed patterns of genetic differentiation that were largely congruent with the evolutionary history of sunflower, though the large number of markers allowed for finer resolution than has previously been possible.

  20. Efficient SNP Discovery by Combining Microarray and Lab-on-a-Chip Data for Animal Breeding and Selection

    PubMed Central

    Huang, Chao-Wei; Lin, Yu-Tsung; Ding, Shih-Torng; Lo, Ling-Ling; Wang, Pei-Hwa; Lin, En-Chung; Liu, Fang-Wei; Lu, Yen-Wen

    2015-01-01

    The genetic markers associated with economic traits have been widely explored for animal breeding. Among these markers, single-nucleotide polymorphisms (SNPs) are gradually becoming a prevalent and effective evaluation tool. Since SNP genotyping focuses only on the genetic sequences of interest, it reduces the evaluation time and cost. Compared to traditional approaches, SNP genotyping techniques incorporate informative genetic background, improve the breeding prediction accuracy and acquiesce breeding quality on the farm. This article therefore reviews the typical procedures of animal breeding using SNPs and the current status of related techniques. The associated SNP information and genotyping techniques, including microarray and Lab-on-a-Chip based platforms, along with their potential are highlighted. Examples in pig and poultry with different SNP loci linked to high economic trait values are given. The recommendations for utilizing SNP genotyping in animal breeding are summarized. PMID:27600241

  1. Citrus (Rutaceae) SNP markers based on Competitive Allele-Specific PCR; transferability across the Aurantioideae subfamily

    PubMed Central

    Garcia-Lor, Andres; Ancillo, Gema; Navarro, Luis; Ollitrault, Patrick

    2013-01-01

    • Premise of the study: Single nucleotide polymorphism (SNP) markers based on Competitive Allele-Specific PCR (KASPar) were developed from sequences of three Citrus species. Their transferability was tested in 63 Citrus genotypes and 19 relative genera of the subfamily Aurantioideae to estimate the potential of SNP markers, selected from a limited intrageneric discovery panel, for ongoing broader diversity analysis at the intra- and intergeneric levels and systematic germplasm bank characterization. • Methods and Results: Forty-two SNP markers were developed using KASPar technology. Forty-one were successfully genotyped in all of the Citrus germplasm, where intra- and interspecific polymorphisms were observed. The transferability and diversity decreased with increasing taxonomic distance. • Conclusions: SNP markers based on the KASPar method developed from sequence data of a limited intrageneric discovery panel provide a valuable molecular resource for genetic diversity analysis of germplasm within a genus and should be useful for germplasm fingerprinting at a much broader diversity level. PMID:25202535

  2. Developing Single Nucleotide Polymorphism (SNP) markers from transcriptome sequences for the identification of longan (Dimocarpus longan) germplasm

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Longan (Dimocarpus longan Lour.) is an important tropical fruit tree crop. Accurate varietal identification is essential for germplasm management and breeding. Using longan transcriptome sequences from public databases, we developed single nucleotide polymorphism (SNP) markers; validated 60 SNPs in...

  3. [Novel mechanism of 3' exonuclease of polymerase in maintenance of DNA replication fidelity and its application in SNP assay].

    PubMed

    Chen, Lin-Ling; Zhang, Jia; Peng, Cui-Ying; Liao, Duan-Fang; Li, Hong-Jian; Gao, Han-Lin; Li, Kai

    2005-03-01

    Polymerases with 3' to 5' exonuclease activity play an important role in the maintenance of in vivo DNA replication fidelity. In order to develop more reliable SNP assays, we revisit the underlying molecular mechanisms by which DNA polymerases with 3' exonucleases maintain high fidelity of DNA replication. In addition to mismatch removal by proofreading, we recently discovered a premature termination of polymerization by a new OFF-switch mechanism. This novel ON/OFF switch turns off DNA polymerization from mismatched primers and turns on DNA polymerization from matched primers. Two SNP assays were developed based on proofreading and the newly identified OFF-switch, respectively: terminal labeled primer extension and the ON/OFF switch operated SNP assay. These two new methods are well adapted to conventional techniques such as electrophoresis, real time PCR, microplates, and microarray. Application of these reliable SNP assays will greatly facilitate genetic and biomedical studies in the post-genome era.

  4. A Multiple-SNP Approach for Genome-Wide Association Study of Milk Production Traits in Chinese Holstein Cattle

    PubMed Central

    Fang, Ming; Fu, Weixuan; Jiang, Dan; Zhang, Qin; Sun, Dongxiao; Ding, Xiangdong; Liu, Jianfeng

    2014-01-01

    Multiple-SNP analysis, in which the effects of multiple SNPs are simultaneously estimated and tested in a multiple linear regression, has been studied by many researchers. Multiple-SNP association analysis usually has higher power and a lower false-positive rate for detecting causative SNP(s) than single marker analysis (SMA). Several methods have been proposed to simultaneously estimate and test multiple SNP effects. In this research, a fast method called MEML (Mixed model based Expectation-Maximization Lasso algorithm) was developed for simultaneous estimation of multiple SNP effects. An improved Lasso prior was assigned to SNP effects, which were estimated by searching for the maximum joint posterior mode. The residual polygenic effect was included in the model to absorb many tiny SNP effects and is treated as missing data in our EM algorithm. A series of simulation experiments were conducted to validate the proposed method, and the results showed that, compared with SMMA, the new method can dramatically decrease the false-positive rate. The new method was also applied to the 50k SNP-panel dataset for a genome-wide association study of milk production traits in Chinese Holstein cattle. In total, 39 significant SNPs and their 25 nearby genes were found. The number of significant SNPs is markedly smaller than the 105 significant SNPs found by SMMA. Among the 39 significant SNPs, 8 were also found by SMMA, and several well-known QTLs or genes were confirmed again; furthermore, we also identified some positional candidate genes with a potential function affecting milk production traits. These novel findings in our research should be valuable for further investigation. PMID:25148050

  5. TheSNPpit—A High Performance Database System for Managing Large Scale SNP Data

    PubMed Central

    Groeneveld, Eildert; Lichtenberg, Helmut

    2016-01-01

    The fast development of high throughput genotyping has opened up new possibilities in genetics while at the same time producing considerable data handling issues. TheSNPpit is a database system for managing large amounts of multi panel SNP genotype data from any genotyping platform. With an increasing rate of genotyping in areas like animal and plant breeding as well as human genetics, already now hundreds of thousands of individuals need to be managed. While the common database design with one row per SNP can manage hundreds of samples, this approach becomes progressively slower as the size of the data sets increases until it finally fails completely once tens or even hundreds of thousands of individuals need to be managed. TheSNPpit has implemented three ideas to also accommodate such large scale experiments: highly compressed vector storage in a relational database, set based data manipulation, and a very fast export written in C with Perl as the base for the framework and PostgreSQL as the database backend. Its novel subset system allows the creation of named subsets based on the filtering of SNP (based on major allele frequency, no-calls, and chromosomes) and manually applied sample and SNP lists at negligible storage costs, thus avoiding the issue of proliferating file copies. The named subsets are exported for down stream analysis. PLINK ped and map files are processed as in- and outputs. TheSNPpit allows management of different panel sizes in the same population of individuals when higher density panels replace previous lower density versions as it occurs in animal and plant breeding programs. A completely generalized procedure allows storage of phenotypes. TheSNPpit only occupies 2 bits for storing a single SNP implying a capacity of 4 mio SNPs per 1 MB of disk storage. To investigate performance scaling, a database with more than 18.5 mio samples has been created with 3.4 trillion SNPs from 12 panels ranging from 1000 through 20 mio SNPs resulting in a
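
    The storage figure quoted above (2 bits per SNP, hence 4 million SNPs per megabyte) can be illustrated with a minimal sketch of 2-bit genotype packing. The genotype codes and the function names `pack` and `unpack` are assumptions made for illustration; the abstract does not describe the encoding TheSNPpit actually uses.

```python
# Hypothetical 2-bit genotype codes; the real TheSNPpit encoding is not
# given in the abstract. NC stands for a no-call.
CODES = {"AA": 0, "AB": 1, "BB": 2, "NC": 3}
NAMES = {code: name for name, code in CODES.items()}


def pack(genotypes):
    """Pack genotype calls into bytes, four 2-bit calls per byte."""
    out = bytearray((len(genotypes) + 3) // 4)
    for i, call in enumerate(genotypes):
        out[i // 4] |= CODES[call] << (2 * (i % 4))
    return bytes(out)


def unpack(data, n):
    """Recover the first n genotype calls from a packed byte string."""
    return [NAMES[(data[i // 4] >> (2 * (i % 4))) & 0b11] for i in range(n)]


calls = ["AA", "AB", "BB", "NC", "AB"]
packed = pack(calls)
restored = unpack(packed, len(calls))

# One megabyte taken as 1,000,000 bytes: 8 bits per byte / 2 bits per SNP.
snps_per_megabyte = 1_000_000 * 8 // 2
```

    Five calls fit in two bytes here, and the round trip through `unpack` returns the original list. The number of calls has to be stored alongside the bytes, because padding bits in the last byte are indistinguishable from real "AA" codes.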

  6. TheSNPpit-A High Performance Database System for Managing Large Scale SNP Data.

    PubMed

    Groeneveld, Eildert; Lichtenberg, Helmut

    2016-01-01

    The fast development of high throughput genotyping has opened up new possibilities in genetics while at the same time producing considerable data handling issues. TheSNPpit is a database system for managing large amounts of multi panel SNP genotype data from any genotyping platform. With an increasing rate of genotyping in areas like animal and plant breeding as well as human genetics, already now hundreds of thousands of individuals need to be managed. While the common database design with one row per SNP can manage hundreds of samples, this approach becomes progressively slower as the size of the data sets increases until it finally fails completely once tens or even hundreds of thousands of individuals need to be managed. TheSNPpit has implemented three ideas to also accommodate such large scale experiments: highly compressed vector storage in a relational database, set based data manipulation, and a very fast export written in C with Perl as the base for the framework and PostgreSQL as the database backend. Its novel subset system allows the creation of named subsets based on the filtering of SNP (based on major allele frequency, no-calls, and chromosomes) and manually applied sample and SNP lists at negligible storage costs, thus avoiding the issue of proliferating file copies. The named subsets are exported for down stream analysis. PLINK ped and map files are processed as in- and outputs. TheSNPpit allows management of different panel sizes in the same population of individuals when higher density panels replace previous lower density versions as it occurs in animal and plant breeding programs. A completely generalized procedure allows storage of phenotypes. TheSNPpit only occupies 2 bits for storing a single SNP implying a capacity of 4 mio SNPs per 1 MB of disk storage. To investigate performance scaling, a database with more than 18.5 mio samples has been created with 3.4 trillion SNPs from 12 panels ranging from 1000 through 20 mio SNPs resulting in a

  7. SNP genotypes of olfactory receptor genes associated with olfactory ability in German Shepherd dogs.

    PubMed

    Yang, M; Geng, G-J; Zhang, W; Cui, L; Zhang, H-X; Zheng, J-L

    2016-04-01

    To find out the relationship between SNP genotypes of canine olfactory receptor genes and olfactory ability, 28 males and 20 females from German Shepherd dogs in police service were scored by odor detection tests and analyzed using the Beckman GenomeLab SNPstream. The representative 22 SNP loci from the exonic regions of 12 olfactory receptor genes were investigated, and three kinds of odor (human, ice drug and trinitrotoluene) were detected. The results showed that the SNP genotypes at the OR10H1-like:c.632C>T, OR10H1-like:c.770A>T, OR2K2-like:c.518G>A, OR4C11-like:c.511T>G and OR4C11-like:c.692G>A loci had a statistically significant effect on the scenting abilities (P < 0.001). The kind of odor influenced the performances of the dogs (P < 0.001). In addition, there were interactions between genotype and the kind of odor at the following loci: OR10H1-like:c.632C>T, OR10H1-like:c.770A>T, OR4C11-like:c.511T>G and OR4C11-like:c.692G>A (P < 0.001). The dogs with genotype CC at the OR10H1-like:c.632C>T, genotype AA at the OR10H1-like:c.770A>T, genotype TT at the OR4C11-like:c.511T>G and genotype GG at the OR4C11-like:c.692G>A loci did better at detecting the ice drug. We concluded that there was linkage between certain SNP genotypes and the olfactory ability of dogs and that SNP genotypes might be useful in determining dogs' scenting potential.

  8. Identification of SNP and SSR Markers in Finger Millet Using Next Generation Sequencing Technologies

    PubMed Central

    Gimode, Davis; Odeny, Damaris A.; de Villiers, Etienne P.; Wanyonyi, Solomon; Dida, Mathews M.; Mneney, Emmarold E.; Muchugi, Alice; Machuka, Jesse; de Villiers, Santie M.

    2016-01-01

    Finger millet is an important cereal crop in eastern Africa and southern India with excellent grain storage quality and unique ability to thrive in extreme environmental conditions. Since negligible attention has been paid to improving this crop to date, the current study used Next Generation Sequencing (NGS) technologies to develop both Simple Sequence Repeat (SSR) and Single Nucleotide Polymorphism (SNP) markers. Genomic DNA from cultivated finger millet genotypes KNE755 and KNE796 was sequenced using both Roche 454 and Illumina technologies. Non-organelle sequencing reads were assembled into 207 Mbp representing approximately 13% of the finger millet genome. We identified 10,327 SSRs and 23,285 non-homeologous SNPs and tested 101 of each for polymorphism across a diverse set of wild and cultivated finger millet germplasm. For the 49 polymorphic SSRs, the mean polymorphism information content (PIC) was 0.42, ranging from 0.16 to 0.77. We also validated 92 SNP markers, 80 of which were polymorphic with a mean PIC of 0.29 across 30 wild and 59 cultivated accessions. Seventy-six of the 80 SNPs were polymorphic across 30 wild germplasm with a mean PIC of 0.30 while only 22 of the SNP markers showed polymorphism among the 59 cultivated accessions with an average PIC value of 0.15. Genetic diversity analysis using the polymorphic SNP markers revealed two major clusters; one of wild and another of cultivated accessions. Detailed STRUCTURE analysis confirmed this grouping pattern and further revealed 2 sub-populations within wild E. coracana subsp. africana. Both STRUCTURE and genetic diversity analysis assisted with the correct identification of the new germplasm collections. These polymorphic SSR and SNP markers are a significant addition to the existing 82 published SSRs, especially with regard to the previously reported low polymorphism levels in finger millet. 
    Our results also reveal an unexploited finger millet genetic resource that can be included in the regional
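
    The polymorphism information content (PIC) values reported in this record can be reproduced from allele frequencies. The sketch below uses the commonly cited Botstein et al. formula, PIC = 1 - sum(p_i^2) - sum over i<j of 2 p_i^2 p_j^2; the abstract does not state which formula the authors applied, so this choice is an assumption, as are the function name `pic` and the example frequencies.

```python
def pic(freqs):
    """Polymorphism information content for one marker.

    freqs: allele frequencies at the marker, expected to sum to 1.
    """
    homozygosity = sum(p * p for p in freqs)
    cross_terms = sum(
        2 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    return 1.0 - homozygosity - cross_terms


# Illustrative frequencies only, not values from the study.
balanced_snp = pic([0.5, 0.5])          # biallelic, equal frequencies
skewed_snp = pic([0.9, 0.1])            # biallelic, rare minor allele
four_allele_ssr = pic([0.25, 0.25, 0.25, 0.25])
```

    Under this formula a biallelic marker cannot exceed a PIC of 0.375, which it reaches at equal allele frequencies. That ceiling is consistent with the SNP means in the abstract lying below the upper end of the SSR range (0.77), since SSRs can carry more than two alleles.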

  9. Supplementing High-Density SNP Microarrays for Additional Coverage of Disease-Related Genes: Addiction as a Paradigm

    PubMed Central

    Saccone, Scott F.; Bierut, Laura J.; Chesler, Elissa J.; Kalivas, Peter W.; Lerman, Caryn; Saccone, Nancy L.; Uhl, George R.; Li, Chuan-Yun; Philip, Vivek M.; Edenberg, Howard J.; Sherry, Stephen T.; Feolo, Michael; Moyzis, Robert K.; Rutter, Joni L.

    2009-01-01

    Commercial SNP microarrays now provide comprehensive and affordable coverage of the human genome. However, some diseases have biologically relevant genomic regions that may require additional coverage. Addiction, for example, is thought to be influenced by complex interactions among many relevant genes and pathways. We have assembled a list of 486 biologically relevant genes nominated by a panel of experts on addiction. We then added 424 genes that showed evidence of association with addiction phenotypes through mouse QTL mappings and gene co-expression analysis. We demonstrate that there are a substantial number of SNPs in these genes that are not well represented by commercial SNP platforms. We address this problem by introducing a publicly available SNP database for addiction. The database is annotated using numeric prioritization scores indicating the extent of biological relevance. The scores incorporate a number of factors such as SNP/gene functional properties (including synonymy and promoter regions), data from mouse systems genetics and measures of human/mouse evolutionary conservation. We then used HapMap genotyping data to determine if a SNP is tagged by a commercial microarray through linkage disequilibrium. This combination of biological prioritization scores and LD tagging annotation will enable addiction researchers to supplement commercial SNP microarrays to ensure comprehensive coverage of biologically relevant regions. PMID:19381300

  10. Supplementing High-Density SNP Microarrays for Additional Coverage of Disease-Related Genes: Addiction as a Paradigm

    SciTech Connect

    Saccone, Scott F; Chesler, Elissa J; Bierut, Laura J; Kalivas, Peter J; Lerman, Caryn; Saccone, Nancy L; Uhl, George R; Li, Chuan-Yun; Philip, Vivek M; Edenberg, Howard; Sherry, Steven; Feolo, Michael; Moyzis, Robert K; Rutter, Joni L

    2009-01-01

    Commercial SNP microarrays now provide comprehensive and affordable coverage of the human genome. However, some diseases have biologically relevant genomic regions that may require additional coverage. Addiction, for example, is thought to be influenced by complex interactions among many relevant genes and pathways. We have assembled a list of 486 biologically relevant genes nominated by a panel of experts on addiction. We then added 424 genes that showed evidence of association with addiction phenotypes through mouse QTL mappings and gene co-expression analysis. We demonstrate that there are a substantial number of SNPs in these genes that are not well represented by commercial SNP platforms. We address this problem by introducing a publicly available SNP database for addiction. The database is annotated using numeric prioritization scores indicating the extent of biological relevance. The scores incorporate a number of factors such as SNP/gene functional properties (including synonymy and promoter regions), data from mouse systems genetics and measures of human/mouse evolutionary conservation. We then used HapMap genotyping data to determine if a SNP is tagged by a commercial microarray through linkage disequilibrium. This combination of biological prioritization scores and LD tagging annotation will enable addiction researchers to supplement commercial SNP microarrays to ensure comprehensive coverage of biologically relevant regions.

  11. Allele frequencies for 40 autosomal SNP loci typed for US population samples using electrospray ionization mass spectrometry

    PubMed Central

    Kiesler, Kevin M.; Vallone, Peter M.

    2013-01-01

    Aim To type a set of 194 US African American, Caucasian, and Hispanic samples (self-declared ancestry) for 40 autosomal single nucleotide polymorphism (SNP) markers intended for human identification purposes. Methods Genotyping was performed on an automated commercial electrospray ionization time-of-flight mass spectrometer, the PLEX-ID. The 40 SNP markers were amplified in eight unique 5plex PCRs, desalted, and resolved based on amplicon mass. For each of the three US sample groups statistical analyses were performed on the resulting genotypes. Results The assay was found to be robust and capable of genotyping the 40 SNP markers consuming approximately 4 nanograms of template per sample. The combined random match probabilities for the 40 SNP assay ranged from 10^−16 to 10^−21. Conclusion The multiplex PLEX-ID SNP-40 assay is the first fully automated genotyping method capable of typing a panel of 40 forensically relevant autosomal SNP markers on a mass spectrometry platform. The data produced provided the first allele frequency estimates for these 40 SNPs in a National Institute of Standards and Technology US population sample set. No population bias was detected although one locus deviated from its expected level of heterozygosity. PMID:23771752
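
    A combined random match probability of the kind reported above is usually obtained by multiplying per-locus genotype match probabilities. The sketch below assumes biallelic loci in Hardy-Weinberg equilibrium and independence between loci; the allele frequency of 0.5 at every locus is a made-up illustration rather than data from the study, and the function names are hypothetical.

```python
import math


def locus_match_probability(p):
    """Chance that two unrelated people share a genotype at one biallelic
    locus, assuming Hardy-Weinberg proportions (an assumption here)."""
    q = 1.0 - p
    return p ** 4 + (2 * p * q) ** 2 + q ** 4


def combined_rmp(allele_freqs):
    """Product of per-locus match probabilities, assuming independent loci."""
    return math.prod(locus_match_probability(p) for p in allele_freqs)


# Hypothetical panel: 40 loci, each with allele frequency 0.5.
rmp = combined_rmp([0.5] * 40)
exponent = math.log10(rmp)
```

    With equal allele frequencies each locus contributes 0.375, the lowest per-locus match probability a biallelic marker can give, so 40 such loci yield a combined value on the order of 10^-17.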

  12. Elucidation of the ‘Honeycrisp’ pedigree through haplotype analysis with a multi-family integrated SNP linkage map and a large apple (Malus×domestica) pedigree-connected SNP data set

    PubMed Central

    Howard, Nicholas P; van de Weg, Eric; Bedford, David S; Peace, Cameron P; Vanderzande, Stijn; Clark, Matthew D; Teh, Soon Li; Cai, Lichun; Luby, James J

    2017-01-01

    The apple (Malus×domestica) cultivar Honeycrisp has become important economically and as a breeding parent. An earlier study with SSR markers indicated the original recorded pedigree of ‘Honeycrisp’ was incorrect and ‘Keepsake’ was identified as one putative parent, the other being unknown. The objective of this study was to verify ‘Keepsake’ as a parent and identify and genetically describe the unknown parent and its grandparents. A multi-family based dense and high-quality integrated SNP map was created using the apple 8 K Illumina Infinium SNP array. This map was used alongside a large pedigree-connected data set from the RosBREED project to build extended SNP haplotypes and to identify pedigree relationships. ‘Keepsake’ was verified as one parent of ‘Honeycrisp’ and ‘Duchess of Oldenburg’ and ‘Golden Delicious’ were identified as grandparents through the unknown parent. Following this finding, siblings of ‘Honeycrisp’ were identified using the SNP data. Breeding records from several of these siblings suggested that the previously unreported parent is a University of Minnesota selection, MN1627. This selection is no longer available, but now is genetically described through imputed SNP haplotypes. We also present the mosaic grandparental composition of ‘Honeycrisp’ for each of its 17 chromosome pairs. This new pedigree and genetic information will be useful in future pedigree-based genetic studies to connect ‘Honeycrisp’ with other cultivars used widely in apple breeding programs. The created SNP linkage map will benefit future research using the data from the Illumina apple 8 and 20 K and Affymetrix 480 K SNP arrays. PMID:28243452

  13. Imputation of microsatellite alleles from dense SNP genotypes for parentage verification across multiple Bos taurus and Bos indicus breeds

    PubMed Central

    McClure, Matthew C.; Sonstegard, Tad S.; Wiggans, George R.; Van Eenennaam, Alison L.; Weber, Kristina L.; Penedo, Cecilia T.; Berry, Donagh P.; Flynn, John; Garcia, Jose F.; Carmo, Adriana S.; Regitano, Luciana C. A.; Albuquerque, Milla; Silva, Marcos V. G. B.; Machado, Marco A.; Coffey, Mike; Moore, Kirsty; Boscher, Marie-Yvonne; Genestout, Lucie; Mazza, Raffaele; Taylor, Jeremy F.; Schnabel, Robert D.; Simpson, Barry; Marques, Elisa; McEwan, John C.; Cromie, Andrew; Coutinho, Luiz L.; Kuehn, Larry A.; Keele, John W.; Piper, Emily K.; Cook, Jim; Williams, Robert; Van Tassell, Curtis P.

    2013-01-01

    To assist cattle producers transition from microsatellite (MS) to single nucleotide polymorphism (SNP) genotyping for parental verification we previously devised an effective and inexpensive method to impute MS alleles from SNP haplotypes. While the reported method was verified with only a limited data set (N = 479) from Brown Swiss, Guernsey, Holstein, and Jersey cattle, some of the MS-SNP haplotype associations were concordant across these phylogenetically diverse breeds. This implied that some haplotypes predate modern breed formation and remain in strong linkage disequilibrium. To expand the utility of MS allele imputation across breeds, MS and SNP data from more than 8000 animals representing 39 breeds (Bos taurus and B. indicus) were used to predict 9410 SNP haplotypes, incorporating an average of 73 SNPs per haplotype, for which alleles from 12 MS markers could be accurately imputed. Approximately 25% of the MS-SNP haplotypes were present in multiple breeds (N = 2 to 36 breeds). These shared haplotypes allowed for MS imputation in breeds that were not represented in the reference population with only a small increase in Mendelian inheritance inconsistencies. Our reported reference haplotypes can be used for any cattle breed and the reported methods can be applied to any species to aid the transition from MS to SNP genetic markers. While ~91% of the animals with imputed alleles for 12 MS markers had ≤1 Mendelian inheritance conflicts with their parents' reported MS genotypes, this figure was 96% for our reference animals, indicating potential errors in the reported MS genotypes. The workflow we suggest autocorrects for genotyping errors and rare haplotypes, by MS genotyping animals whose imputed MS alleles fail parentage verification, and then incorporating those animals into the reference dataset. PMID:24065982

  14. Somatic Mutation of the SNP rs11614913 and Its Association with Increased MIR 196A2 Expression in Breast Cancer.

    PubMed

    Zhao, Huanhuan; Xu, Jingman; Zhao, Dan; Geng, Meijuan; Ge, Haize; Fu, Li; Zhu, Zhengmao

    2016-02-01

    Common genetic variants (single-nucleotide polymorphisms [SNPs]) in microRNA genes may alter their maturation or expression, resulting in varied functional consequences. Several studies have evaluated the association between the SNP rs11614913 and cancer risk in diverse populations and in a range of cancers, with contradictory outcomes. In this study, we examined 114 paired samples (tumor and normal tissues) from breast cancer patients to study the genotype distribution and somatic mutation of the SNP in MIR 196A2 (rs11614913 C-T). In addition, we evaluated their influence on the mature MIR 196A2 expression. We found that 14% (16/114) of tumors underwent somatic mutation of the SNP rs11614913. Moreover, the CT heterozygous and the CC homozygous states of SNP rs11614913 were more prone to mutation, while the TT homozygous state appeared to be resistant. We further detected a significant increase (p = 0.002) in mature MIR 196A2 expression in breast cancer. In particular, we found a significant association between the occurrence of SNP rs11614913 mutation and high expression (p = 0.0002). In addition, the mature MIR 196A2 expression level was significantly associated with the higher tumor grade (p = 0.004). Taken together, our results seem to demonstrate that somatic mutation of SNP rs11614913 in MIR 196A2 can have an influence on its expression. In addition, it indicated that an unknown mechanism is responsible for both the mutation of SNP rs11614913 and the dysregulation of mature MIR 196A2 expression.

  15. Deriving Gene Networks from SNP Associated with Triacylglycerol and Phospholipid Fatty Acid Fractions from Ribeyes of Angus Cattle

    PubMed Central

    Buchanan, Justin W.; Reecy, James M.; Garrick, Dorian J.; Duan, Qing; Beitz, Don C.; Koltes, James E.; Saatchi, Mahdi; Koesterke, Lars; Mateescu, Raluca G.

    2016-01-01

    The fatty acid profile of beef is a complex trait that can benefit from gene-interaction network analysis to understand relationships among loci that contribute to phenotypic variation. Phenotypic measures of fatty acid profile from triacylglycerol and phospholipid fractions of longissimus muscle, pedigree information, and Illumina 54 k bovine SNP genotypes were utilized to derive an annotated gene network associated with fatty acid composition in 1,833 Angus beef cattle. The Bayes-B statistical model was utilized to perform a genome wide association study to estimate associations between 54 k SNP genotypes and 39 individual fatty acid phenotypes within each fraction. Posterior means of the effects were estimated for each of the 54 k SNP and for the collective effects of all the SNP in every 1-Mb genomic window in terms of the proportion of genetic variance explained by the window. Windows that explained the largest proportions of genetic variance for individual lipids were found in the triacylglycerol fraction. There was almost no overlap in the genomic regions explaining variance between the triacylglycerol and phospholipid fractions. Partial correlations were used to identify correlated regions of the genome for the set of largest 1 Mb windows that explained up to 35% genetic variation in either fatty acid fraction. SNP were allocated to windows based on the bovine UMD3.1 assembly. Gene network clusters were generated utilizing a partial correlation and information theory algorithm. Results were used in conjunction with network scoring and visualization software to analyze correlated SNP across 39 fatty acid phenotypes to identify SNP of significance. Significant pathways implicated in fatty acid metabolism through GO term enrichment analysis included homeostasis of number of cells, homeostatic process, coenzyme/cofactor activity, and immunoglobulin. 
    These results suggest different metabolic pathways regulate the development of different types of lipids found in

  16. Plastid DNA sequencing and nuclear SNP genotyping help resolve the puzzle of central American Platanus

    PubMed Central

    De Castro, Olga; Di Maio, Antonietta; Lozada García, José Armando; Piacenti, Danilo; Vázquez-Torres, Mario; De Luca, Paolo

    2013-01-01

    Background and Aims Recent research on the history of Platanus reveals that hybridization phenomena occurred in the central American species. This study has two goals: to help resolve the evolutive puzzle of central American Platanus, and to test the potential of real-time polymerase chain reaction (PCR) for detecting ancient hybridization. Methods Sequencing of a uniparental plastid DNA marker [psbA-trnH(GUG) intergenic spacer] and qualitative and quantitative single nucleotide polymorphism (SNP) genotyping of biparental nuclear ribosomal DNA (nrDNA) markers [LEAFY intron 2 (LFY-i2) and internal transcribed spacer 2 (ITS2)] were used. Key Results Based on the SNP genotyping results, several Platanus accessions show the presence of hybridization/introgression, including some accessions of P. rzedowskii and of P. mexicana var. interior and one of P. mexicana var. mexicana from Oaxaca (= P. oaxacana). Based on haplotype analyses of the psbA-trnH spacer, five haplotypes were detected. The most common of these is present in taxa belonging to P. orientalis, P. racemosa sensu lato, some accessions of P. occidentalis sensu stricto (s.s.) from Texas, P. occidentalis var. palmeri, P. mexicana s.s. and P. rzedowskii. This is highly relevant to genetic relationships with the haplotypes present in P. occidentalis s.s. and P. mexicana var. interior. Conclusions Hybridization and introgression events between lineages ancestral to modern central and eastern North American Platanus species occurred. Plastid haplotypes and qualitative and quantitative SNP genotyping provide information critical for understanding the complex history of Mexican Platanus. Compared with the usual molecular techniques of sub-cloning, sequencing and genotyping, real-time PCR assay is a quick and sensitive technique for analysing complex evolutionary patterns. PMID:23798602

  17. SNP identification and SNAP marker development for a GmNARK gene controlling supernodulation in soybean.

    PubMed

    Kim, M Y; Van, K; Lestari, P; Moon, J-K; Lee, S-H

    2005-04-01

    Supernodulation in soybean (Glycine max L. Merr.) is an important source of nitrogen supply to subterranean ecological systems. Single nucleotide-amplified polymorphism (SNAP) markers for supernodulation should allow rapid screening of the trait in early growth stages, without the need for inoculation and phenotyping. The gene GmNARK (Glycine max nodule autoregulation receptor kinase), controlling autoregulation of nodulation, was found to have a single nucleotide polymorphism (SNP) between the wild-type cultivar Sinpaldalkong 2 and its supernodulating mutant, SS2-2. Transversion of A to T at the 959-bp position of the GmNARK sequence results in a change of lysine (AAG) to a stop codon (TAG), thus terminating its translation in SS2-2. Based on the identified SNP in GmNARK, five primer pairs specific to each allele were designed using the WebSnaper program to develop a SNAP marker for supernodulation. One A-specific primer pair produced a band present in only Sinpaldalkong 2, while two T-specific pairs showed a band in only SS2-2. Both complementary PCRs, using each allele-specific primer pair, were performed to genotype supernodulation against F2 progeny of Sinpaldalkong 2 x SS2-2. Among 28 individuals with the normal phenotype, eight individuals having only the A-allele-specific band were homozygous and normal, while 20 individuals were found to be heterozygous at the SNP, having both A and T bands. Twelve supernodulating individuals showed only the band specific to the T allele. This SNAP marker for supernodulation could easily be analyzed through simple PCR and agarose gel electrophoresis. Therefore, use of this SNAP marker might be faster, cheaper, and more reproducible than using other genotyping methods, such as a cleaved amplified polymorphic sequence marker, which require restriction enzymes.
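
    The F2 counts in this abstract (8 A-only, 20 A-plus-T, 12 T-only) can be checked against the 1:2:1 ratio expected for a single codominant marker. The sketch below is an illustration added here, not an analysis reported by the authors; the band-to-genotype mapping follows the abstract, while the function names and the use of a chi-square goodness-of-fit test are assumptions.

```python
def genotype_from_bands(a_band, t_band):
    """Map presence of the A- and T-specific PCR bands to a genotype."""
    if a_band and t_band:
        return "AT"   # heterozygous, normal nodulation
    if a_band:
        return "AA"   # homozygous wild type
    if t_band:
        return "TT"   # homozygous mutant, supernodulating
    return None       # no band: failed reaction


def chi_square(observed, ratio):
    """Goodness-of-fit statistic for observed counts against a ratio."""
    total = sum(observed)
    expected = [total * r / sum(ratio) for r in ratio]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [8, 20, 12]            # AA, AT, TT counts from the abstract
statistic = chi_square(observed, [1, 2, 1])
critical_5pct_2df = 5.991         # chi-square critical value, 2 d.f., alpha 0.05
fits_1_2_1 = statistic < critical_5pct_2df
```

    With 40 progeny the expected counts are 10, 20 and 10, so the statistic is 0.8, well below the critical value; the counts are compatible with single-gene codominant segregation.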

  18. Transcriptome-based SNP discovery by GBS and the construction of a genetic map for olive.

    PubMed

    İpek, Ahmet; İpek, Meryem; Ercişli, Sezai; Tangu, Nesrin Aktepe

    2017-02-18

    Molecular markers located in the genic regions of plants are valuable tools for the identification of candidate genes of economically important traits and consequent use in marker-assisted selection (MAS). In the past, simple sequence repeat markers (SSRs) and single-nucleotide polymorphisms (SNPs) located in expressed sequence tags (ESTs) were developed by sequencing RNA derived from different plant tissues, which involves laborious RNA extraction, mRNA isolation, and cDNA synthesis. In order to develop SNP markers located in olive transcriptomes, we used the recently developed genotyping-by-sequencing (GBS) technique. An analysis was done for 125 olive DNA samples (123 DNA samples from a cross-pollinated F1 mapping population, and two samples from parents). From 45 to 66% of Illumina reads from GBS analysis were aligned to the olive transcriptome. A total of 22,033 transcriptome-based SNP markers were identified, and 3384 of these were mapped in the olive genome. The genetic linkage map constructed in this study consists of 1 cleaved amplified polymorphic sequence (CAPS), 19 SSR, and 3384 transcriptome-based SNP markers. The map covers 3340.8 cM of the olive genome in 23 linkage groups, with the length of the linkage groups ranging from 55.6 to 248.7 cM. Average map distance between flanking markers was 0.98 cM. This genetic linkage map is a saturated genetic map and will be a useful tool for the localization of quantitative trait loci (QTLs) and gene(s) of interest and for the identification of candidate genes for economically important traits.

  19. SNP variation in ADRB3 gene reflects the breed difference of sheep populations.

    PubMed

    Wu, Jianliang; Qiao, Liying; Liu, Jianhua; Yuan, Yanan; Liu, Wenzhong

    2012-08-01

    The β3-adrenergic receptor (ADRB3), a G-protein coupled receptor, plays a major role in energy metabolism and regulation of lipolysis and homeostasis. We detected single nucleotide polymorphism (SNP) variation in the full-length sequence of the ovine ADRB3 gene in 12 domestic sheep populations of four types by polymerase chain reaction-single strand conformation polymorphism and sequencing to reveal breed differences. Twenty-two SNPs, 12 in exon 1 and ten in the intron, were detected, and 12 new exonic and four new intronic SNPs were found. The most SNPs were found in Shanxi Dam Line and the fewest in Dorset. The average SNP number in both meat and dual purpose for meat and wool breeds was significantly higher than in general and dual purpose breeds for wool and meat. The frequency of each SNP differed among the studied breeds or types. The 18C Del and 1617T Ins mainly occurred in dual purpose breeds for wool and meat. The 25A Del, 119C>G and 130C>T were mostly found in the meat and dual purpose for meat and wool breeds. The 1764C>A was more frequent in meat breeds than in other types. The majority of variation came from within populations, as suggested by analysis of molecular variance. Close relationships were found among the Chinese breeds and among the western breeds, respectively. In conclusion, SNPs of the ovine ADRB3 gene can reflect breed differences, within- and between-population variation and, to a great extent, breed relationships.

  20. Validation of a Cost-Efficient Multi-Purpose SNP Panel for Disease Based Research

    PubMed Central

    Hou, Liping; Phillips, Christopher; Azaro, Marco; Brzustowicz, Linda M.; Bartlett, Christopher W.

    2011-01-01

    Background Here we present convergent methodologies using theoretical calculations, empirical assessment on in-house and publicly available datasets as well as in silico simulations, that validate a panel of SNPs for a variety of necessary tasks in human genetics disease research before resources are committed to larger-scale genotyping studies on those samples. While large-scale well-funded human genetic studies routinely have up to a million SNP genotypes, samples in a human genetics laboratory that are not yet part of such studies may be productively utilized in pilot projects or as part of targeted follow-up work though such smaller scale applications require at least some genome-wide genotype data for quality control purposes such as DNA “barcoding” to detect swaps or contamination issues, determining familial relationships between samples and correcting biases due to population effects such as population stratification in pilot studies. Principal Findings Empirical performance in classification of relative types for any two given DNA samples (e.g., full siblings, parental, etc) indicated that for outbred populations the panel performs sufficiently to classify relationship in extended families and therefore also for smaller structures such as trios and for twin zygosity testing. Additionally, familial relationships do not significantly diminish the (mean match) probability of sharing SNP genotypes in pedigrees, further indicating the uniqueness of the “barcode.” Simulation using these SNPs for an African American case-control disease association study demonstrated that population stratification, even in complex admixed samples, can be adequately corrected under a range of disease models using the SNP panel. Conclusion The panel has been validated for use in a variety of human disease genetics research tasks including sample barcoding, relationship verification, population substructure detection and statistical correction. Given the ease of genotyping

  1. SNP genotyping of animal and human derived isolates of Mycobacterium avium subsp. paratuberculosis.

    PubMed

    Wynne, James W; Beller, Christie; Boyd, Victoria; Francis, Barry; Gwoźdź, Jacek; Carajias, Marios; Heine, Hans G; Wagner, Josef; Kirkwood, Carl D; Michalski, Wojtek P

    2014-08-27

    Mycobacterium avium subsp. paratuberculosis (MAP) is the aetiological agent of Johne's disease (JD), a chronic granulomatous enteritis that affects ruminants worldwide. While the ability of MAP to cause disease in animals is clear, the role of this bacterium in human inflammatory bowel diseases remains unresolved. Previous whole genome sequencing of MAP isolates derived from human and three animal hosts showed that human isolates were genetically similar and showed a close phylogenetic relationship to one bovine isolate. In contrast, other animal derived isolates were more genetically diverse. The present study aimed to investigate the frequency of this human strain across 52 wild-type MAP isolates, collected predominantly from Australia. A Luminex based SNP genotyping approach was utilised to genotype SNPs that had previously been shown to be specific to the human, bovine or ovine isolate types. Fourteen SNPs were initially evaluated across a reference panel of isolates with known genotypes. A subset of seven SNPs was chosen for analysis within the wild-type collection. Of the seven SNPs, three were found to be unique to paediatric human isolates. No wild-type isolates contained these SNP alleles. Interestingly, and in contrast to the paediatric isolates, three additional adult human isolates (derived from adult Crohn's disease patients) also did not contain these SNP alleles. Furthermore, we identified two SNPs that demonstrate extensive polymorphism within the animal-derived MAP isolates, one of which appears unique to ovine isolates and a single camel isolate. From this study we suggest the existence of genetic heterogeneity between human derived MAP isolates, some of which are highly similar to those derived from bovine hosts, but others of which are more divergent.

  2. Construction and evaluation of a high-density SNP array for the Pacific oyster (Crassostrea gigas)

    PubMed Central

    Li, Chunyan; Wang, Wei; Li, Busu; Li, Li

    2017-01-01

    Single nucleotide polymorphisms (SNPs) are widely used in genetics and genomics research. The Pacific oyster (Crassostrea gigas) is an economically and ecologically important marine bivalve, and it possesses one of the highest levels of genomic DNA variation among animal species. Pacific oyster SNPs have been extensively investigated; however, the mechanisms by which these SNPs may be used in a high-throughput, transferable, and economical manner remain to be elucidated. Here, we constructed an oyster 190K SNP array using Affymetrix Axiom genotyping technology. We designed 190,420 SNPs on the chip; these SNPs were selected from 54 million SNPs identified through re-sequencing of 472 Pacific oysters collected in China, Japan, Korea, and Canada. Our genotyping results indicated that 133,984 (70.4%) SNPs were polymorphic and successfully converted on the chip. The SNPs were distributed evenly throughout the oyster genome, located in 3,595 scaffolds with a length of ~509.4 million; the average interval spacing was 4,210 bp. In addition, 111,158 SNPs were distributed in 21,050 coding genes, with an average of 5.3 SNPs per gene. In comparison with genotypes obtained through re-sequencing, ~69% of the converted SNPs had a concordance rate of >0.971; the mean concordance rate was 0.966. Evaluation based on genotypes of full-sib family individuals revealed that the average genotyping accuracy rate was 0.975. Carrying 133 K polymorphic SNPs, our oyster 190K SNP array is the first commercially available high-density SNP chip for mollusks, with the highest throughput. It represents a valuable tool for oyster genome-wide association studies, fine linkage mapping, and population genetics. PMID:28328985

  3. A Whole Methylome CpG-SNP Association Study of Psychosis in Blood and Brain Tissue.

    PubMed

    van den Oord, Edwin J C G; Clark, Shaunna L; Xie, Lin Ying; Shabalin, Andrey A; Dozmorov, Mikhail G; Kumar, Gaurav; Vladimirov, Vladimir I; Magnusson, Patrik K E; Aberg, Karolina A

    2016-07-01

    Mutated CpG sites (CpG-SNPs) are potential hotspots for human diseases because in addition to the sequence variation they may show individual differences in DNA methylation. We performed methylome-wide association studies (MWAS) to test whether methylation differences at those sites were associated with schizophrenia. We assayed all common CpG-SNPs with methyl-CpG binding domain protein-enriched genome sequencing (MBD-seq) using DNA extracted from 1408 blood samples and 66 postmortem brain samples (BA10) of schizophrenia cases and controls. Seven CpG-SNPs passed our FDR threshold of 0.1 in the blood MWAS. Of the CpG-SNPs methylated in brain, 94% were also methylated in blood. This significantly exceeded the 46.2% overlap expected by chance (P-value < 1.0×10⁻⁸) and justified replicating findings from blood in brain tissue. CpG-SNP rs3796293 in IL1RAP replicated (P-value = .003) with the same direction of effects. This site was further validated through targeted bisulfite pyrosequencing in 736 independent case-control blood samples (P-value < 9.5×10⁻⁴). Our top result in the brain MWAS (P-value = 8.8×10⁻⁷) was CpG-SNP rs16872141 located in the potential promoter of ENC1. Overall, our results suggested that CpG-SNP methylation may reflect effects of environmental insults and can provide biomarkers in blood that could potentially improve disease management.

  4. Solar Radiation-Associated Adaptive SNP Genetic Differentiation in Wild Emmer Wheat, Triticum dicoccoides

    PubMed Central

    Ren, Jing; Chen, Liang; Jin, Xiaoli; Zhang, Miaomiao; You, Frank M.; Wang, Jirui; Frenkel, Vladimir; Yin, Xuegui; Nevo, Eviatar; Sun, Dongfa; Luo, Ming-Cheng; Peng, Junhua

    2017-01-01

    Whole-genome scans with large number of genetic markers provide the opportunity to investigate local adaptation in natural populations and identify candidate genes under positive selection. In the present study, adaptation genetic differentiation associated with solar radiation was investigated using 695 polymorphic SNP markers in wild emmer wheat originated in a micro-site at Yehudiyya, Israel. The test involved two solar radiation niches: (1) sun, in-between trees; and (2) shade, under tree canopy, separated apart by a distance of 2–4 m. Analysis of molecular variance showed a small (0.53%) but significant portion of overall variation between the sun and shade micro-niches, indicating a non-ignorable genetic differentiation between sun and shade habitats. Fifty SNP markers showed a medium (0.05 ≤ FST ≤ 0.15) or high genetic differentiation (FST > 0.15). A total of 21 outlier loci under positive selection were identified by using four different FST-outlier testing algorithms. The markers and genome locations under positive selection are consistent with the known patterns of selection. These results suggested that genetic differentiation between sun and shade habitats is substantial, radiation-associated, and therefore ecologically determined. Hence, the results of this study reflected effects of natural selection through solar radiation on EST-related SNP genetic diversity, resulting presumably in different adaptive complexes at a micro-scale divergence. The present work highlights the evolutionary theory and application significance of solar radiation-driven natural selection in wheat improvement. PMID:28352272
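
    The FST thresholds quoted above (medium 0.05 ≤ FST ≤ 0.15, high FST > 0.15) can be illustrated with a minimal sketch. It assumes a single biallelic SNP, two equally weighted populations (sun and shade), and Wright's heterozygosity-based definition FST = (HT − HS) / HT. The abstract does not say which estimator the authors used, so the function names and the "low" label are hypothetical.

```python
def fst_two_populations(p1: float, p2: float) -> float:
    """Wright's FST for one biallelic locus in two equally weighted populations.

    p1, p2 are the frequencies of the same allele in each population.
    This is an illustrative estimator, not necessarily the one used in the study.
    """
    p_bar = (p1 + p2) / 2.0
    h_total = 2.0 * p_bar * (1.0 - p_bar)          # expected heterozygosity, pooled
    if h_total == 0.0:
        return 0.0                                  # locus is monomorphic overall
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    return (h_total - h_within) / h_total


def classify_differentiation(fst: float) -> str:
    """Apply the thresholds quoted in the abstract; 'low' is an assumed label."""
    if fst > 0.15:
        return "high"
    if fst >= 0.05:
        return "medium"
    return "low"
```

    For example, allele frequencies of 0.3 in one niche and 0.6 in the other give an FST of about 0.09, which falls in the medium band.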

  5. High-density SNP assay development for genetic analysis in maritime pine (Pinus pinaster).

    PubMed

    Plomion, C; Bartholomé, J; Lesur, I; Boury, C; Rodríguez-Quilón, I; Lagraulet, H; Ehrenmann, F; Bouffier, L; Gion, J M; Grivet, D; de Miguel, M; de María, N; Cervera, M T; Bagnoli, F; Isik, F; Vendramin, G G; González-Martínez, S C

    2016-03-01

    Maritime pine provides essential ecosystem services in the south-western Mediterranean basin, where it covers around 4 million ha. Its scattered distribution over a range of environmental conditions makes it an ideal forest tree species for studies of local adaptation and evolutionary responses to climatic change. Highly multiplexed single nucleotide polymorphism (SNP) genotyping arrays are increasingly used to study genetic variation in living organisms and for practical applications in plant and animal breeding and genetic resource conservation. We developed a 9k Illumina Infinium SNP array and genotyped maritime pine trees from (i) a three-generation inbred (F2) pedigree, (ii) the French breeding population and (iii) natural populations from Portugal and the French Atlantic coast. A large proportion of the exploitable SNPs (2052/8410, i.e. 24.4%) segregated in the mapping population and could be mapped, providing the densest ever gene-based linkage map for this species. Based on 5016 SNPs, natural and breeding populations from the French gene pool exhibited similar level of genetic diversity. Population genetics and structure analyses based on 3981 SNP markers common to the Portuguese and French gene pools revealed high levels of differentiation, leading to the identification of a set of highly differentiated SNPs that could be used for seed provenance certification. Finally, we discuss how the validated SNPs could facilitate the identification of ecologically and economically relevant genes in this species, improving our understanding of the demography and selective forces shaping its natural genetic diversity, and providing support for new breeding strategies.

  6. An EST-derived SNP and SSR genetic linkage map of cassava (Manihot esculenta Crantz).

    PubMed

    Rabbi, Ismail Yusuf; Kulembeka, Heneriko Philbert; Masumba, Esther; Marri, Pradeep Reddy; Ferguson, Morag

    2012-07-01

    Cassava (Manihot esculenta Crantz) is one of the most important food security crops in the tropics and is increasingly being adopted for agro-industrial processing. Genetic improvement of cassava can be enhanced through marker-assisted breeding. For this, appropriate genomic tools are required to dissect the genetic architecture of economically important traits. Here, a genome-wide SNP-based genetic map of cassava anchored in SSRs is presented. An outbreeder full-sib (F1) family was genotyped on two independent SNP assay platforms: an array of 1,536 SNPs on Illumina's GoldenGate platform was used to genotype a first batch of 60 F1. Of the 1,358 successfully converted SNPs, 600 were polymorphic in at least one of the parents and were subsequently converted to KBiosciences' KASPar assay platform for genotyping 70 additional F1. High-precision genotyping of 163 informative SSRs using capillary electrophoresis was also carried out. Linkage analysis resulted in a final linkage map of 1,837 centi-Morgans (cM) containing 568 markers (434 SNPs and 134 SSRs) distributed across 19 linkage groups. The average distance between adjacent markers was 3.4 cM. About 94.2% of the mapped SNPs and SSRs have also been localized on scaffolds of version 4.1 assembly of the cassava draft genome sequence. This more saturated genetic linkage map of cassava that combines SSR and SNP markers should find several applications in the improvement of cassava including aligning scaffolds of the cassava genome sequence, genetic analyses of important agro-morphological traits, studying the linkage disequilibrium landscape and comparative genomics.

  7. Estimation of effective population size using single-nucleotide polymorphism (SNP) data in Jeju horse.

    PubMed

    Do, Kyoung-Tag; Lee, Joon-Ho; Lee, Hak-Kyo; Kim, Jun; Park, Kyung-Do

    2014-01-01

    This study was conducted to estimate the effective population size using SNPs data of 240 Jeju horses that had raced at the Jeju racing park. Of the total 61,746 genotyped autosomal SNPs, 17,320 (28.1%) SNPs (missing genotype rate of >10%, minor allele frequency of <0.05 and Hardy-Weinberg equilibrium test P-value of <10⁻⁶) were excluded after quality control processes. SNPs on the X and Y chromosomes and genotyped individuals with missing genotype rate over 10% were also excluded, and finally, 44,426 (71.9%) SNPs were selected and used for the analysis. The measures of the LD, square of correlation coefficient (r²) between SNP pairs, were calculated for each allele and the effective population size was determined based on r² measures. The polymorphism information contents (PIC) and expected heterozygosity (HE) were 0.27 and 0.34, respectively. In LD, the most rapid decline was observed over the first 1 Mb. But r² decreased more slowly with increasing distance and was constant after 2 Mb of distance and the decline was almost linear with log-transformed distance. The average r² between adjacent SNP pairs ranged from 0.20 to 0.31 in each chromosome and whole average was 0.26, while the whole average r² between all SNP pairs was 0.02. We observed an initial pattern of decreasing Ne and estimated values were closer to 41 at 1 ~ 5 generations ago. The effective population size (41 heads) estimated in this study seems to be large considering Jeju horse's population size (about 2,000 heads), but it should be interpreted with caution because of the technical limitations of the methods and sample size.
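
    The abstract does not state which formula links LD to effective population size. One widely used relation is Sved's approximation, E[r²] ≈ 1 / (1 + 4·Ne·c), where c is the recombination distance in Morgans; LD at distance c is commonly taken to reflect Ne about 1 / (2c) generations ago. The sketch below assumes that relation and omits any sample-size correction; the function names are hypothetical and this is not the authors' pipeline.

```python
def ne_from_r2(mean_r2: float, distance_morgans: float) -> float:
    """Effective population size from mean LD under Sved's approximation.

    Solves E[r^2] = 1 / (1 + 4 * Ne * c) for Ne.
    Assumes no sample-size correction and no mutation term (illustrative only).
    """
    if not 0.0 < mean_r2 <= 1.0:
        raise ValueError("mean_r2 must be in (0, 1]")
    if distance_morgans <= 0.0:
        raise ValueError("distance_morgans must be positive")
    return (1.0 / mean_r2 - 1.0) / (4.0 * distance_morgans)


def generations_ago(distance_morgans: float) -> float:
    """Approximate number of generations in the past that this distance reflects."""
    if distance_morgans <= 0.0:
        raise ValueError("distance_morgans must be positive")
    return 1.0 / (2.0 * distance_morgans)
```

    Under these assumptions, a mean r² of 0.2 between markers 0.01 Morgans (1 cM) apart corresponds to Ne ≈ 100 roughly 50 generations ago.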

  8. High-Throughput SNP Allele-Frequency Determination in Pooled DNA Samples by Kinetic PCR

    PubMed Central

    Germer, Søren; Holland, Michael J.; Higuchi, Russell

    2000-01-01

    We have developed an accurate, yet inexpensive and high-throughput, method for determining the allele frequency of biallelic polymorphisms in pools of DNA samples. The assay combines kinetic (real-time quantitative) PCR with allele-specific amplification and requires no post-PCR processing. The relative amounts of each allele in a sample are quantified. This is performed by dividing equal aliquots of the pooled DNA between two separate PCR reactions, each of which contains a primer pair specific to one or the other allelic SNP variant. For pools with equal amounts of the two alleles, the two amplifications should reach a detectable level of fluorescence at the same cycle number. For pools that contain unequal ratios of the two alleles, the difference in cycle number between the two amplification reactions can be used to calculate the relative allele amounts. We demonstrate the accuracy and reliability of the assay on samples with known predetermined SNP allele frequencies from 5% to 95%, including pools of both human and mouse DNAs using eight different SNPs altogether. The accuracy of measuring known allele frequencies is very high, with the strength of correlation between measured and known frequencies having an r² = 0.997. The loss of sensitivity as a result of measurement error is typically minimal, compared with that due to sampling error alone, for population samples up to 1000. We believe that by providing a means for SNP genotyping up to thousands of samples simultaneously, inexpensively, and reproducibly, this method is a powerful strategy for detecting meaningful polymorphic differences in candidate gene association studies and genome-wide linkage disequilibrium scans. PMID:10673283
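
    The cycle-difference calculation described above can be sketched in a few lines. The sketch assumes that product exactly doubles each cycle and that both allele-specific reactions amplify with equal efficiency, so a difference of ΔCt cycles corresponds to a 2^ΔCt-fold difference in starting template. The function name is hypothetical, and real assays may need an efficiency correction that the abstract does not describe.

```python
def allele_frequency_from_ct(ct_allele1: float, ct_allele2: float) -> float:
    """Estimate the frequency of allele 1 in a DNA pool from two threshold cycles.

    ct_allele1 / ct_allele2: cycle at which each allele-specific reaction
    reaches the detection threshold. A later cycle means less starting template.
    Assumes perfect doubling per cycle (illustrative simplification).
    """
    delta_ct = ct_allele1 - ct_allele2
    # amount1 / amount2 = 2 ** (-delta_ct), so
    # freq1 = amount1 / (amount1 + amount2) = 1 / (2 ** delta_ct + 1)
    return 1.0 / (2.0 ** delta_ct + 1.0)
```

    With equal threshold cycles the estimate is 0.5; if the allele 1 reaction crosses the threshold one cycle later than the allele 2 reaction, allele 1 is estimated at one third of the pool.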

  9. CLUSTAG & WCLUSTAG: Hierarchical Clustering Algorithms for Efficient Tag-SNP Selection

    NASA Astrophysics Data System (ADS)

    Ao, Sio-Iong

    More than 6 million single nucleotide polymorphisms (SNPs) in the human genome have been genotyped by the HapMap project. Although only a proportion of these SNPs are functional, all can be considered as candidate markers for indirect association studies to detect disease-related genetic variants. The complete screening of a gene or a chromosomal region is nevertheless an expensive undertaking for association studies. A key strategy for improving the efficiency of association studies is to select a subset of informative SNPs, called tag SNPs, for analysis. In the chapter, hierarchical clustering algorithms have been proposed for efficient tag SNP selection.
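
    As a rough illustration of tag-SNP selection by hierarchical clustering, the sketch below groups SNPs by complete-linkage agglomerative clustering on the distance 1 − r² and then picks one representative per cluster. It is not the CLUSTAG or WCLUSTAG implementation: the pairwise r² matrix input, the 0.8 threshold, the rule for choosing the representative, and the function name are all assumptions made for the example.

```python
from itertools import combinations


def select_tag_snps(r2, threshold=0.8):
    """Cluster SNPs so every pair in a cluster has r^2 >= threshold; pick one tag each.

    r2: symmetric matrix (list of lists) of pairwise r^2, with 1.0 on the diagonal.
    Returns (sorted tag indices, list of clusters as index lists).
    """
    clusters = [[i] for i in range(len(r2))]
    max_dist = 1.0 - threshold

    def linkage(a, b):
        # Complete linkage: distance between clusters is the largest pairwise distance.
        return max(1.0 - r2[i][j] for i in a for j in b)

    while len(clusters) > 1:
        # Find the closest pair of clusters.
        (x, y), dist = min(
            (((x, y), linkage(clusters[x], clusters[y]))
             for x, y in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if dist > max_dist:
            break  # merging further would put weakly linked SNPs together
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]

    # Tag = member whose worst-case distance to the rest of its cluster is smallest.
    tags = [
        min(members, key=lambda i: max(1.0 - r2[i][j] for j in members))
        for members in clusters
    ]
    return sorted(tags), clusters
```

    Complete linkage is used here because it guarantees that every SNP in a cluster is in LD above the threshold with every other member, so any member, including the chosen tag, is a reasonable proxy for the rest.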

  10. mrsFAST-Ultra: a compact, SNP-aware mapper for high performance sequencing applications.

    PubMed

    Hach, Faraz; Sarrafi, Iman; Hormozdiari, Farhad; Alkan, Can; Eichler, Evan E; Sahinalp, S Cenk

    2014-07-01

    High throughput sequencing (HTS) platforms generate unprecedented amounts of data that introduce challenges for processing and downstream analysis. While tools that report the 'best' mapping location of each read provide a fast way to process HTS data, they are not suitable for many types of downstream analysis such as structural variation detection, where it is important to report multiple mapping loci for each read. For this purpose we introduce mrsFAST-Ultra, a fast, cache oblivious, SNP-aware aligner that can handle the multi-mapping of HTS reads very efficiently. mrsFAST-Ultra improves mrsFAST, our first cache oblivious read aligner capable of handling multi-mapping reads, through new and compact index structures that reduce not only the overall memory usage but also the number of CPU operations per alignment. In fact the size of the index generated by mrsFAST-Ultra is 10 times smaller than that of mrsFAST. As importantly, mrsFAST-Ultra introduces new features such as being able to (i) obtain the best mapping loci for each read, and (ii) return all reads that have at most n mapping loci (within an error threshold), together with these loci, for any user specified n. Furthermore, mrsFAST-Ultra is SNP-aware, i.e. it can map reads to reference genome while discounting the mismatches that occur at common SNP locations provided by db-SNP; this significantly increases the number of reads that can be mapped to the reference genome. Notice that all of the above features are implemented within the index structure and are not simple post-processing steps and thus are performed highly efficiently. Finally, mrsFAST-Ultra utilizes multiple available cores and processors and can be tuned for various memory settings. Our results show that mrsFAST-Ultra is roughly five times faster than its predecessor mrsFAST. In comparison to newly enhanced popular tools such as Bowtie2, it is more sensitive (it can report 10 times or more mappings per read) and much faster (six times or

  11. Single-cell SNP analyses and interpretations based on RNA-Seq data for colon cancer research.

    PubMed

    Chen, Jiahuan; Zhou, Qian; Wang, Yangfan; Ning, Kang

    2016-09-28

    Single-cell sequencing is useful for illustrating the cellular heterogeneities inherent in many intricate biological systems, particularly in human cancer. However, owing to the difficulties in acquiring, amplifying and analyzing single-cell genetic material, obstacles remain for single-cell diversity assessments such as single nucleotide polymorphism (SNP) analyses, rendering biological interpretations of single-cell omics data elusive. We used RNA-Seq data from single-cell and bulk colon cancer samples to analyze the SNP profiles for both structural and functional comparisons. Colon cancer-related pathways with single-cell level SNP enrichment, including the TGF-β and p53 signaling pathways, were also investigated based on both their SNP enrichment patterns and gene expression. We also detected a certain number of fusion transcripts, which may promote tumorigenesis, at the single-cell level. Based on these results, single-cell analyses not only recapitulated the SNP analysis results from the bulk samples but also detected cell-to-cell and cell-to-bulk variations, thereby aiding in early diagnosis and in identifying the precise mechanisms underlying cancers at the single-cell level.

  12. SNP-based discovery of salinity-tolerant QTLs in a bi-parental population of rice (Oryza sativa).

    PubMed

    Gimhani, D R; Gregorio, Glenn B; Kottearachchi, N S; Samarasinghe, W L G

    2016-12-01

    Breeding for salt tolerance is the most promising approach to enhance the productivity of saline prone areas. However, polygenic inheritance of salt tolerance in rice acts as a bottleneck in conventional breeding for salt tolerance. Hence, we set our goals to construct a single nucleotide polymorphism (SNP)-based molecular map employing high-throughput SNP marker technology and to investigate salinity tolerant QTLs with closest flanking markers using an elite rice background. Seedling stage salinity responses were assessed in a population of 281 recombinant inbred lines (RILs) derived from the cross between At354 (salt tolerant) and Bg352 (salt susceptible), by 11 morpho-physiological indices under a hydroponic system. A selected set of 94 extreme RILs was genotyped using the Illumina Infinium rice 6K SNP array, and a densely saturated molecular map spanning 1460.81 cM of the rice genome, with an average interval of 1.29 cM between marker loci, was constructed using 1135 polymorphic SNP markers. The results revealed 83 significant QTLs for 11 salt responsive traits explaining 12.5-46.7 % of phenotypic variation in respective traits. Of them, 72 QTLs responsible for 10 traits were co-localized, forming 14 QTL hotspots at 14 different genomic regions. All QTL hotspots were flanked by intervals of less than 1 Mb, and therefore the SNP loci associated with these QTL hotspots would be important in candidate gene discovery for salt tolerance.

  13. A chromatin-associated and transcriptionally inactive p53-Mdm2 complex occurs in mdm2 SNP309 homozygous cells.

    PubMed

    Arva, Nicoleta C; Gopen, Tamara R; Talbott, Kathryn E; Campbell, Latoya E; Chicas, Agustin; White, David E; Bond, Gareth L; Levine, Arnold J; Bargonetti, Jill

    2005-07-22

    In cancer cells, the function of the tumor suppressor protein p53 is usually blocked. Impairment of the p53 pathway results in tumor cells with endogenous overexpression of Mdm2 via a naturally occurring single nucleotide polymorphism (SNP) in the mdm2 gene at position 309. Here we report that in mdm2 SNP309 cells, inactivation of p53 results in a chromatin-associated Mdm2-p53 complex without clearance of p53 by protein degradation. Nuclear accumulation of p53 protein in mdm2 SNP309 cells results after 6 h of camptothecin, etoposide, or mitomycin C treatment, with the p53 protein phosphorylated at Ser15. Chromatin immunoprecipitation demonstrated p53 and Mdm2 bound to p53 responsive elements. Interestingly, although the p53 protein was able to bind to DNA, quantitative PCR showed compromised transcription of endogenous target genes. Additionally, exogenously introduced p53 was incapable of activating transcription from p53 responsive elements in SNP309 cells, confirming the trans-acting nature of the inhibitor. Inhibition of Mdm2 by siRNA resulted in transcriptional activation of these p53 targets. Our data suggest that overproduction of Mdm2, resulting from a naturally occurring SNP, inhibits chromatin-bound p53 from activating the transcription of its target genes.

  14. IL-32 promoter SNP rs4786370 predisposes to modified lipoprotein profiles in patients with rheumatoid arthritis

    PubMed Central

    Damen, Michelle S. M. A.; Agca, Rabia; Holewijn, Suzanne; de Graaf, Jacqueline; Dos Santos, Jéssica C.; van Riel, Piet L.; Fransen, Jaap; Coenen, Marieke J. H.; Nurmohamed, Mike T.; Netea, Mihai G.; Dinarello, Charles A.; Joosten, Leo A. B.; Heinhuis, Bas; Popa, Calin D.

    2017-01-01

    Patients with rheumatoid arthritis (RA) are at higher risk of developing cardiovascular diseases (CVD). Interleukin (IL)-32 has previously been shown to be involved in the pathogenesis of RA and might be linked to the development of atherosclerosis. However, the exact mechanism linking IL-32 to CVD still needs to be elucidated. The influence of a functional genetic variant of IL-32 on lipid profiles and CVD risk was therefore studied in whole blood from individuals from the NBS cohort and RA patients from 2 independent cohorts. Lipid profiles were matched to the specific IL-32 genotypes. Allelic distribution was similar in all three groups. Interestingly, significantly higher levels of high density lipoprotein cholesterol (HDLc) were observed in individuals from the NBS cohort and RA patients from the Nijmegen cohort homozygous for the C allele (p = 0.0141 and p = 0.0314 respectively). In contrast, the CC-genotype was associated with elevated low density lipoprotein cholesterol (LDLc) and total cholesterol (TC) in individuals at higher risk for CVD (plaque positive) (p = 0.0396; p = 0.0363 respectively). Our study shows a functional effect of a promoter single-nucleotide polymorphism (SNP) in IL32 on lipid profiles in RA patients and individuals, suggesting a possible protective role of this SNP against CVD. PMID:28134327

  15. PrimerMapper: high throughput primer design and graphical assembly for PCR and SNP detection.

    PubMed

    O'Halloran, Damien M

    2016-02-08

    Primer design represents a widely employed gambit in diverse molecular applications including PCR, sequencing, and probe hybridization. Variations of PCR, including primer walking, allele-specific PCR, and nested PCR provide specialized validation and detection protocols for molecular analyses that often require screening large numbers of DNA fragments. In these cases, automated sequence retrieval and processing become important features, and furthermore, a graphic that provides the user with a visual guide to the distribution of designed primers across targets is most helpful in quickly ascertaining primer coverage. To this end, I describe here, PrimerMapper, which provides a comprehensive graphical user interface that designs robust primers from any number of inputted sequences while providing the user with both, graphical maps of primer distribution for each inputted sequence, and also a global assembled map of all inputted sequences with designed primers. PrimerMapper also enables the visualization of graphical maps within a browser and allows the user to draw new primers directly onto the webpage. Other features of PrimerMapper include allele-specific design features for SNP genotyping, a remote BLAST window to NCBI databases, and remote sequence retrieval from GenBank and dbSNP. PrimerMapper is hosted at GitHub and freely available without restriction.

  16. Informatics Enhanced SNP Microarray Analysis of 30 Miscarriage Samples Compared to Routine Cytogenetics

    PubMed Central

    Lathi, Ruth B.; Loring, Megan; Massie, Jamie A. M.; Demko, Zachary P.; Johnson, David; Sigurjonsson, Styrmir; Gemelos, George; Rabinowitz, Matthew

    2012-01-01

    Purpose The metaphase karyotype is often used as a diagnostic tool in the setting of early miscarriage; however this technique has several limitations. We evaluate a new technique for karyotyping that uses single nucleotide polymorphism (SNP) microarrays. This technique was compared in a blinded, prospective fashion, to the traditional metaphase karyotype. Methods Patients undergoing dilation and curettage for first trimester miscarriage between February and August 2010 were enrolled. Samples of chorionic villi were equally divided and sent for microarray testing in parallel with routine cytogenetic testing. Results Thirty samples were analyzed, with only four discordant results. Discordant results occurred when the entire genome was duplicated or when a balanced rearrangement was present. Cytogenetic karyotyping took an average of 29 days while microarray-based karyotyping took an average of 12 days. Conclusions Molecular karyotyping of POC after missed abortion using SNP microarray analysis allows for the ability to detect maternal cell contamination and provides rapid results with good concordance to standard cytogenetic analysis. PMID:22403611

  17. Use of Sequenom sample ID Plus® SNP genotyping in identification of FFPE tumor samples.

    PubMed

    Miller, Jessica K; Buchner, Nicholas; Timms, Lee; Tam, Shirley; Luo, Xuemei; Brown, Andrew M K; Pasternack, Danielle; Bristow, Robert G; Fraser, Michael; Boutros, Paul C; McPherson, John D

    2014-01-01

    Short tandem repeat (STR) analysis, such as the AmpFlSTR® Identifiler® Plus kit, is a standard, PCR-based human genotyping method used in the field of forensics. Misidentification of cell line and tissue DNA can be costly if not detected early; therefore it is necessary to have quality control measures such as STR profiling in place. A major issue in large-scale research studies involving archival formalin-fixed paraffin embedded (FFPE) tissues is that varying levels of DNA degradation can result in failure to correctly identify samples using STR genotyping. PCR amplification of STRs of several hundred base pairs is not always possible when DNA is degraded. The Sample ID Plus® panel from Sequenom allows for human DNA identification and authentication using SNP genotyping. In comparison to lengthy STR amplicons, this multiplexing PCR assay requires amplification of only 76-139 base pairs, and utilizes 47 SNPs to discriminate between individual samples. In this study, we evaluated both STR and SNP genotyping methods of sample identification, with a focus on paired FFPE tumor/normal DNA samples intended for next-generation sequencing (NGS). The ability to successfully validate the identity of FFPE samples can enable cost savings by reducing rework.

  18. The use of SNP markers for linkage mapping in diploid and tetraploid peanuts.

    PubMed

    Bertioli, David J; Ozias-Akins, Peggy; Chu, Ye; Dantas, Karinne M; Santos, Silvio P; Gouvea, Ediene; Guimarães, Patricia M; Leal-Bertioli, Soraya C M; Knapp, Steven J; Moretzsohn, Marcio C

    2014-01-10

    Single nucleotide polymorphic markers (SNPs) are attractive for use in genetic mapping and marker-assisted breeding because they can be scored in parallel assays at favorable costs. However, scoring SNP markers in polyploid plants like the peanut is problematic because of interfering signal generated from the DNA bases that are homeologous to those being assayed. The present study used a previously constructed 1536 GoldenGate SNP assay developed using SNPs identified between two A. duranensis accessions. In this study, the performance of this assay was tested on two RIL mapping populations, one diploid (A. duranensis × A. stenosperma) and one tetraploid [A. hypogaea cv. Runner IAC 886 × synthetic tetraploid (A. ipaënsis × A. duranensis)(4×)]. The scoring was performed using the software GenomeStudio version 2011.1. For the diploid, polymorphic markers provided excellent genotyping scores with default software parameters. In the tetraploid, as expected, most of the polymorphic markers provided signal intensity plots that were distorted compared to diploid patterns and that were incorrectly scored using default parameters. However, these scorings were easily corrected using the GenomeStudio software. The degree of distortion was highly variable. Of the polymorphic markers, approximately 10% showed no distortion at all behaving as expected for single-dose markers, and another 30% showed low distortion and could be considered high-quality. The genotyped markers were incorporated into diploid and tetraploid genetic maps of Arachis and, in the latter case, were located almost entirely on A genome linkage groups.

  19. A high-density SNP genome-wide linkage scan in a large autism extended pedigree.

    PubMed

    Allen-Brady, K; Miller, J; Matsunami, N; Stevens, J; Block, H; Farley, M; Krasny, L; Pingree, C; Lainhart, J; Leppert, M; McMahon, W M; Coon, H

    2009-06-01

    We performed a high-density, single nucleotide polymorphism (SNP), genome-wide scan on a six-generation pedigree from Utah with seven affected males, diagnosed with autism spectrum disorder. Using a two-stage linkage design, we first performed a nonparametric analysis on the entire genome using a 10K SNP chip to identify potential regions of interest. To confirm potentially interesting regions, we eliminated SNPs in high linkage disequilibrium (LD) using a principal components analysis (PCA) method and repeated the linkage results. Three regions met genome-wide significance criteria after controlling for LD: 3q13.2-q13.31 (nonparametric linkage (NPL), 5.58), 3q26.31-q27.3 (NPL, 4.85) and 20q11.21-q13.12 (NPL, 5.56). Two regions met suggestive criteria for significance 7p14.1-p11.22 (NPL, 3.18) and 9p24.3 (NPL, 3.44). All five chromosomal regions are consistent with other published findings. Haplotype sharing results showed that five of the affected subjects shared more than a single chromosomal region of interest with other affected subjects. Although no common autism susceptibility genes were found for all seven autism cases, these results suggest that multiple genetic loci within these regions may contribute to the autism phenotype in this family, and further follow-up of these chromosomal regions is warranted.

  20. Measuring inbreeding and inbreeding depression on pig growth from pedigree or SNP-derived metrics.

    PubMed

    Silió, L; Rodríguez, M C; Fernández, A; Barragán, C; Benítez, R; Óvilo, C; Fernández, A I

    2013-10-01

    Multilocus homozygosity, measured as the proportion of the autosomal genome in homozygous genotypes or in runs of homozygosity, was compared with the respective pedigree inbreeding coefficients in 64 Iberian pigs genotyped using the Porcine SNP60 Beadchip. Pigs were sampled from a set of experimental animals with a large inbreeding variation born in a closed strain with a completely recorded multi-generation genealogy. Individual inbreeding coefficients calculated from pedigree were strongly correlated with the different SNP-derived metrics of homozygosity (r = 0.814-0.919). However, unequal correlations between molecular and pedigree inbreeding were observed at the chromosomal level, depending mainly on the number of SNPs and on the correlation between heterozygosities measured across different loci. A panel of 192 SNPs of intermediate frequencies was selected for genotyping 322 piglets to test inbreeding depression on postweaning growth performance (daily gain and weight at 90 days). The negative effects on these traits of homozygosities calculated from the genotypes of 168 quality-checked SNPs were similar to those of inbreeding coefficients. The results support that a few hundred SNPs may be useful for measuring inbreeding and inbreeding depression when the population structure or the mating system causes a large variance of inbreeding.
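
    The simplest of the SNP-derived metrics mentioned above, the proportion of genotyped SNPs that are homozygous, can be sketched as follows. This is a minimal illustration only: the 0/1/2 genotype coding, the use of None for missing calls, and the function name are assumptions, not details taken from the study, and runs of homozygosity are not covered.

```python
from typing import Optional, Sequence


def homozygosity(genotypes: Sequence[Optional[int]]) -> float:
    """Proportion of called SNPs that are homozygous.

    Assumed coding (hypothetical): 0 and 2 are the two homozygous
    genotypes, 1 is heterozygous, None is a missing call.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes")
    homozygous = sum(1 for g in called if g in (0, 2))
    return homozygous / len(called)


# Toy genotype vector for one animal (invented values, not study data).
pig = [0, 2, 1, 2, None, 0, 1, 2]
print(round(homozygosity(pig), 3))
```

    Missing calls are dropped from the denominator, so animals with different call rates remain comparable.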

  1. Predicting Alzheimer's Disease Using Combined Imaging-Whole Genome SNP Data.

    PubMed

    Kong, Dehan; Giovanello, Kelly S; Wang, Yalin; Lin, Weili; Lee, Eunjee; Fan, Yong; Murali Doraiswamy, P; Zhu, Hongtu

    2015-01-01

    The growing public threat of Alzheimer's disease (AD) has raised the urgency to discover and validate prognostic biomarkers in order to predict time to onset of AD. It is anticipated that both whole genome single nucleotide polymorphism (SNP) data and high dimensional whole brain imaging data offer predictive values to identify subjects at risk for progressing to AD. The aim of this paper is to test whether both whole genome SNP data and whole brain imaging data offer predictive values to identify subjects at risk for progressing to AD. In 343 subjects with mild cognitive impairment (MCI) enrolled in the Alzheimer's Disease Neuroimaging Initiative (ADNI-1), we extracted high dimensional MR imaging (volumetric data on 93 brain regions plus a surface fluid registration based hippocampal subregion and surface data), and whole genome data (504,095 SNPs from GWAS), as well as routine neurocognitive and clinical data at baseline. MCI patients were then followed over 48 months, with 150 participants progressing to AD. Combining information from whole brain MR imaging and whole genome data was substantially superior to the standard model for predicting time to onset of AD in a 48-month national study of subjects at risk. Our findings demonstrate the promise of combined imaging-whole genome prognostic markers in people with mild memory impairment.

  2. A Pipeline for Classifying Relationships Using Dense SNP/SNV Data and Putative Pedigree Information.

    PubMed

    Zeng, Zhen; Weeks, Daniel E; Chen, Wei; Mukhopadhyay, Nandita; Feingold, Eleanor

    2016-02-01

    When genome-wide association studies (GWAS) or sequencing studies are performed on family-based datasets, the genotype data can be used to check the structure of putative pedigrees. Even in datasets of putatively unrelated people, close relationships can often be detected using dense single-nucleotide polymorphism/variant (SNP/SNV) data. A number of methods for finding relationships using dense genetic data exist, but they all have certain limitations, including that they typically use average genetic sharing, which is only a subset of the available information. Here, we present a set of approaches for classifying relationships in GWAS datasets or large-scale sequencing datasets. We first propose an empirical method for detecting identity by descent segments in close relative pairs using un-phased dense SNP data and demonstrate how that information can assist in building a relationship classifier. We then develop a strategy to take advantage of putative pedigree information to enhance classification accuracy. Our methods are tested and illustrated with two datasets from two distinct populations. Finally, we propose classification pipelines for checking and identifying relationships in datasets containing a large number of small pedigrees.

  3. Performance of different SNP panels for parentage testing in two East Asian cattle breeds.

    PubMed

    Strucken, E M; Gudex, B; Ferdosi, M H; Lee, H K; Song, K D; Gibson, J P; Kelly, M; Piper, E K; Porto-Neto, L R; Lee, S H; Gondro, C

    2014-08-01

    The International Society for Animal Genetics (ISAG) proposed a panel of single nucleotide polymorphisms (SNPs) for parentage testing in cattle (a core panel of 100 SNPs and an additional list of 100 SNPs). However, markers specific to East Asian taurine cattle breeds were not included, and no information is available as to whether the ISAG panel performs adequately for these breeds. We tested ISAG's core (100 SNP) and full (200 SNP) panels on two East Asian taurine breeds: the Korean Hanwoo and the Japanese Wagyu, the latter from the Australian herd. Even though the power of exclusion was high at 0.99 for both ISAG panels, the core panel performed poorly with 3.01% false-positive assignments in the Hanwoo population and 3.57% in the Wagyu. The full ISAG panel identified all sire-offspring relations correctly in both populations with 0.02% of relations wrongly excluded in the Hanwoo population. Based on these results, we created and tested two population-specific marker panels: one for the Wagyu population, which showed no false-positive assignments with either 100 or 200 SNPs, and a second panel for the Hanwoo, which still had some false-positive assignments with 100 SNPs but no false positives using 200 SNPs. In conclusion, for parentage assignment in East Asian cattle breeds, only the full ISAG panel is adequate for parentage testing. If fewer markers should be used, it is advisable to use population-specific markers rather than the ISAG panel.
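
    The abstract does not say how parent-offspring assignments were scored. One common way to test a putative parent with SNP data is to count opposing homozygotes, which is what the sketch below does. The 0/1/2 coding, the function names and the mismatch tolerance are all assumptions made for illustration and are not the authors' procedure.

```python
from typing import Optional, Sequence

Genotype = Optional[int]  # assumed coding: 0, 1, 2 allele counts; None = missing


def opposing_homozygotes(parent: Sequence[Genotype],
                         offspring: Sequence[Genotype]) -> int:
    """Count SNPs where parent and offspring are homozygous for different alleles."""
    count = 0
    for p, o in zip(parent, offspring):
        if p is None or o is None:
            continue  # skip SNPs with a missing call in either animal
        if {p, o} == {0, 2}:
            count += 1  # a true parent cannot be 0 when its offspring is 2, or vice versa
    return count


def is_excluded(parent: Sequence[Genotype],
                offspring: Sequence[Genotype],
                max_mismatches: int = 1) -> bool:
    """Exclude the putative parent if mismatches exceed a tolerance for genotyping error.

    The default tolerance of 1 is a hypothetical choice.
    """
    return opposing_homozygotes(parent, offspring) > max_mismatches


# Toy five-SNP example (invented genotypes).
calf = [1, 1, 2, 1, 0]
true_sire = [0, 1, 2, 2, 0]
other_bull = [2, 1, 0, 0, 2]
print(is_excluded(true_sire, calf), is_excluded(other_bull, calf))
```

    With few SNPs an unrelated animal can show no opposing homozygotes by chance, which is one way false-positive assignments arise in small panels.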

  4. Transcriptome sequencing to produce SNP-based genetic maps of onion.

    PubMed

    Duangjit, J; Bohanec, B; Chan, A P; Town, C D; Havey, M J

    2013-08-01

    We used the Roche-454 platform to sequence from normalized cDNA libraries from each of two inbred lines of onion (OH1 and 5225). From approximately 1.6 million reads from each inbred, 27,065 and 33,254 cDNA contigs were assembled from OH1 and 5225, respectively. In total, 3,364 well supported single nucleotide polymorphisms (SNPs) on 1,716 cDNA contigs were identified between these two inbreds. One SNP on each of 1,256 contigs was randomly selected for genotyping. OH1 and 5225 were crossed and 182 gynogenic haploids extracted from hybrid plants were used for SNP mapping. A total of 597 SNPs segregated in the OH1 × 5225 haploid family and a genetic map of ten linkage groups (LOD ≥8) was constructed. Three hundred and thirty-nine of the newly identified SNPs were also mapped using a previously developed segregating family from BYG15-23 × AC43, and 223 common SNPs were used to join the two maps. Because these new SNPs are in expressed regions of the genome and commonly occur among onion germplasms, they will be useful for genetic mapping, gene tagging, marker-aided selection, quality control of seed lots, and fingerprinting of cultivars.

  5. Whole-Genome Analysis of Diversity and SNP-Major Gene Association in Peach Germplasm

    PubMed Central

    Micheletti, Diego; Dettori, Maria Teresa; Micali, Sabrina; Aramini, Valeria; Pacheco, Igor; Da Silva Linge, Cassia; Foschi, Stefano; Banchi, Elisa; Barreneche, Teresa; Quilot-Turion, Bénédicte; Lambert, Patrick; Pascal, Thierry; Iglesias, Ignasi; Carbó, Joaquim; Wang, Li-rong; Ma, Rui-juan; Li, Xiong-wei; Gao, Zhong-shan; Nazzicari, Nelson; Troggio, Michela; Bassi, Daniele; Rossini, Laura; Verde, Ignazio; Laurens, François; Arús, Pere; Aranzana, Maria José

    2015-01-01

    Peach was domesticated in China more than four millennia ago and from there it spread world-wide. Since the middle of the last century, peach breeding programs have been very dynamic, generating hundreds of new commercial varieties; however, in most cases such varieties derive from a limited collection of parental lines (founders). This is one reason for the observed low levels of variability of the commercial gene pool, implying that knowledge of the extent and distribution of genetic variability in peach is critical to allow the choice of adequate parents to confer enhanced productivity, adaptation and quality to improved varieties. With this aim we genotyped 1,580 peach accessions (including a few closely related Prunus species) maintained and phenotyped in five germplasm collections (four European and one Chinese) with the International Peach SNP Consortium 9K SNP peach array. The study of population structure revealed the subdivision of the panel into three main populations, one mainly made up of Occidental varieties from breeding programs (POP1OCB), one of Occidental landraces (POP2OCT) and the third of Oriental accessions (POP3OR). Analysis of linkage disequilibrium (LD) identified differential patterns of genome-wide LD blocks in each of the populations. Phenotypic data for seven monogenic traits were integrated in a genome-wide association study (GWAS). The significantly associated SNPs were always in the regions predicted by linkage analysis, forming haplotypes of markers. These diagnostic haplotypes could be used for marker-assisted selection (MAS) in modern breeding programs. PMID:26352671

  6. Human Y-chromosome SNP characterization by multiplex amplified product-length polymorphism analysis.

    PubMed

    Medina, Laura Smeldy Jurado; Muzzio, Marina; Schwab, Marisol; Costantino, María Leticia Bravi; Barreto, Guillermo; Bailliet, Graciela

    2014-09-01

    We designed an allele-specific amplification protocol to optimize Y-chromosome SNP typing, which is an unavoidable step for defining the phylogenetic status of paternal lineages. It allows the simultaneous highly specific definition of up to six mutations in a single reaction by amplification fragment length polymorphism (AFLP) without the need of specialized equipment, at a considerably lower cost than that based on single-base primer extension (SNaPshot™) technology or PCR-RFLP systems, requiring as little as 0.5 ng DNA and compatible with the small fragments characteristic of low-quality DNA. By designation of two primers recognizing the derived and ancestral state for each SNP, which can be differentiated by size by the addition of a noncomplementary nucleotide tail, we could define major Y clades E, F, K, R, Q, and subhaplogroups R1, R1a, R1b, R1b1b, R1b1c, J1, J2, G1, G2, I1, Q1a3, and Q1a3a1 through amplification fragments that ranged between 60 and 158bp.

  7. SNP diversity within and among Brassica rapa accessions reveals no geographic differentiation.

    PubMed

    Tanhuanpää, P; Erkkilä, M; Tenhola-Roininen, T; Tanskanen, J; Manninen, O

    2016-01-01

    Genetic diversity was studied in a collection of 61 accessions of Brassica rapa, which were mostly oil-type turnip rapes but also included two oil-type subsp. dichotoma and five subsp. trilocularis accessions, as well as three leaf-type subspecies (subsp. japonica, pekinensis, and chinensis) and five turnip cultivars (subsp. rapa). Two-hundred and nine SNP markers, which had been discovered by amplicon resequencing, were used to genotype 893 plants from the B. rapa collection using Illumina BeadXpress. There was great variation in the diversity indices between accessions. With STRUCTURE analysis, the plant collection could be divided into three groups that seemed to correspond to morphotype and flowering habit but not to geography. According to AMOVA analysis, 65% of the variation was due to variation within accessions, 25% among accessions, and 10% among groups. A smaller subset of the plant collection, 12 accessions, was also studied with 5727 GBS-SNPs. Diversity indices obtained with GBS-SNPs correlated well with those obtained with Illumina BeadXpress SNPs. The developed SNP markers have already been used and will be used in future plant breeding programs as well as in mapping and diversity studies.

  8. Screening of human SNP database identifies recoding sites of A-to-I RNA editing

    PubMed Central

    Gommans, Willemijn M.; Tatalias, Nicholas E.; Sie, Christina P.; Dupuis, Dylan; Vendetti, Nicholas; Smith, Lauren; Kaushal, Rikhi; Maas, Stefan

    2008-01-01

    Single nucleotide polymorphisms (SNPs) are DNA sequence variations that can affect the expression or function of genes. As a result, they may lead to phenotypic differences between individuals, such as susceptibility to disease, response to medications, and disease progression. Millions of SNPs have been mapped within the human genome providing a rich resource for genetic variation studies. Adenosine-to-inosine RNA editing also leads to the production of RNA and protein sequence variants, but it acts on the level of primary gene transcripts. Sequence variations due to RNA editing may be misannotated as SNPs when relying solely on expressed sequence data instead of genomic material. In this study, we screened the human SNP database for potential cases of A-to-I RNA editing that cause amino acid changes in the encoded protein. Our search strategy applies five molecular features to score candidate sites. It identifies all previously known cases of editing present in the SNP database and successfully uncovers novel, bona fide targets of adenosine deamination editing. Our approach sets the stage for effective and comprehensive genome-wide screens for A-to-I editing targets. PMID:18772245

  9. Genome-wide prediction of cancer driver genes based on SNP and cancer SNV data.

    PubMed

    He, Quanze; He, Quanyuan; Liu, Xiaohui; Wei, Youheng; Shen, Suqin; Hu, Xiaohui; Li, Qiao; Peng, Xiangwen; Wang, Lin; Yu, Long

    2014-01-01

    Identifying cancer driver genes and exploring their functions are essential and the most urgent need in basic cancer research. Developing efficient methods to differentiate between driver and passenger somatic mutations revealed from large-scale cancer genome sequencing data is critical to cancer driver gene discovery. Here, we compared distinct features of SNP with SNV data in detail and found that the weighted ratio of SNV to SNP (termed as WVPR) is an excellent indicator for cancer driver genes. The power of WVPR was validated by accurate predictions of known drivers. We ranked most human genes by WVPR and did functional analyses on the list. The results demonstrate that driver genes are usually highly enriched in chromatin organization related genes/pathways. In addition, some protein complexes, such as histone acetyltransferase, histone methyltransferase, telomerase, centrosome, sin3 and U12-type spliceosomal complexes, are hot spots of driver mutations. Furthermore, this study identified many new potential driver genes (e.g. NTRK3 and ZIC4) and pathways, including the oxidative phosphorylation pathway, which were not identified by previous methods. Taken together, our study not only developed a method to identify cancer driver genes/pathways but also provided new insights into molecular mechanisms of cancer development.

  10. A genome wide survey of SNP variation reveals the genetic structure of sheep breeds.

    PubMed

    Kijas, James W; Townley, David; Dalrymple, Brian P; Heaton, Michael P; Maddox, Jillian F; McGrath, Annette; Wilson, Peter; Ingersoll, Roxann G; McCulloch, Russell; McWilliam, Sean; Tang, Dave; McEwan, John; Cockett, Noelle; Oddy, V Hutton; Nicholas, Frank W; Raadsma, Herman

    2009-01-01

    The genetic structure of sheep reflects their domestication and subsequent formation into discrete breeds. Understanding genetic structure is essential for achieving genetic improvement through genome-wide association studies, genomic selection and the dissection of quantitative traits. After identifying the first genome-wide set of SNP for sheep, we report on levels of genetic variability both within and between a diverse sample of ovine populations. Then, using cluster analysis and the partitioning of genetic variation, we demonstrate sheep are characterised by weak phylogeographic structure, overlapping genetic similarity and generally low differentiation which is consistent with their short evolutionary history. The degree of population substructure was, however, sufficient to cluster individuals based on geographic origin and known breed history. Specifically, African and Asian populations clustered separately from breeds of European origin sampled from Australia, New Zealand, Europe and North America. Furthermore, we demonstrate the presence of stratification within some, but not all, ovine breeds. The results emphasize that careful documentation of genetic structure will be an essential prerequisite when mapping the genetic basis of complex traits. Furthermore, the identification of a subset of SNP able to assign individuals into broad groupings demonstrates even a small panel of markers may be suitable for applications such as traceability.

  11. MultiBLUP: improved SNP-based prediction for complex traits

    PubMed Central

    Balding, David J.

    2014-01-01

    BLUP (best linear unbiased prediction) is widely used to predict complex traits in plant and animal breeding, and increasingly in human genetics. The BLUP mathematical model, which consists of a single random effect term, was adequate when kinships were measured from pedigrees. However, when genome-wide SNPs are used to measure kinships, the BLUP model implicitly assumes that all SNPs have the same effect-size distribution, which is a severe and unnecessary limitation. We propose MultiBLUP, which extends the BLUP model to include multiple random effects, allowing greatly improved prediction when the random effects correspond to classes of SNPs with distinct effect-size variances. The SNP classes can be specified in advance, for example, based on SNP functional annotations, and we also provide an adaptive procedure for determining a suitable partition of SNPs. We apply MultiBLUP to genome-wide association data from the Wellcome Trust Case Control Consortium (seven diseases), and from much larger studies of celiac disease and inflammatory bowel disease, finding that it consistently provides better prediction than alternative methods. Moreover, MultiBLUP is computationally very efficient; for the largest data set, which includes 12,678 individuals and 1.5 M SNPs, the total analysis can be run on a single desktop PC in less than a day and can be parallelized to run even faster. Tools to perform MultiBLUP are freely available in our software LDAK. PMID:24963154
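
    The contrast between the two models can be written out as below. The abstract gives no notation, so every symbol here is an assumption: y is the phenotype vector, X\beta a fixed-effects term (not mentioned in the abstract), K a SNP-based kinship matrix, and K_m the kinship matrix computed from the SNPs in class m.

```latex
% Standard BLUP: a single random effect g with one kinship matrix K
y = X\beta + g + e, \qquad g \sim N(0, \sigma_g^2 K), \quad e \sim N(0, \sigma_e^2 I)

% MultiBLUP: one random effect per SNP class m = 1, ..., M, each with its own variance
y = X\beta + \sum_{m=1}^{M} g_m + e, \qquad g_m \sim N(0, \sigma_m^2 K_m), \quad e \sim N(0, \sigma_e^2 I)
```

    Setting M = 1 recovers the standard model, which is why a single variance shared by all SNPs is the special case the abstract describes as a limitation.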

  12. Identification of close relatives in the HUGO Pan-Asian SNP database.

    PubMed

    Yang, Xiong; Xu, Shuhua

    2011-01-01

    The HUGO Pan-Asian SNP Consortium has recently released a genome-wide dataset, which consists of 1,719 DNA samples collected from 71 Asian populations. For studies of human population genetics such as genetic structure and migration history, this provided the most comprehensive large-scale survey of genetic variation to date in East and Southeast Asia. However, although considered in the analysis, close relatives were not clearly reported in the original paper. Here we performed a systematic analysis of genetic relationships among individuals from the Pan-Asian SNP (PASNP) database and identified 3 pairs of monozygotic twins or duplicate samples, 100 pairs of first-degree relationships and 161 pairs of second-degree relationships. Three standardized subsets with different levels of unrelated individuals were suggested here for future applications of the samples in most types of population-genetics studies (denoted by PASNP1716, PASNP1640 and PASNP1583 respectively) based on the relationships inferred in this study. In addition, we provided gender information for PASNP samples, which was not included in the original dataset, based on analysis of X chromosome data.

  13. PrimerMapper: high throughput primer design and graphical assembly for PCR and SNP detection

    PubMed Central

    O’Halloran, Damien M.

    2016-01-01

    Primer design represents a widely employed gambit in diverse molecular applications including PCR, sequencing, and probe hybridization. Variations of PCR, including primer walking, allele-specific PCR, and nested PCR provide specialized validation and detection protocols for molecular analyses that often require screening large numbers of DNA fragments. In these cases, automated sequence retrieval and processing become important features, and furthermore, a graphic that provides the user with a visual guide to the distribution of designed primers across targets is most helpful in quickly ascertaining primer coverage. To this end, I describe PrimerMapper, which provides a comprehensive graphical user interface that designs robust primers from any number of input sequences while providing the user with both graphical maps of primer distribution for each input sequence and a global assembled map of all input sequences with designed primers. PrimerMapper also enables the visualization of graphical maps within a browser and allows the user to draw new primers directly onto the webpage. Other features of PrimerMapper include allele-specific design features for SNP genotyping, a remote BLAST window to NCBI databases, and remote sequence retrieval from GenBank and dbSNP. PrimerMapper is hosted at GitHub and freely available without restriction. PMID:26853558

  14. Forensically relevant SNaPshot(®) assays for human DNA SNP analysis: a review.

    PubMed

    Mehta, Bhavik; Daniel, Runa; Phillips, Chris; McNevin, Dennis

    2017-01-01

    Short tandem repeats are the gold standard for human identification but are not informative for forensic DNA phenotyping (FDP). Single-nucleotide polymorphisms (SNPs) as genetic markers can be applied to both identification and FDP. The concept of DNA intelligence emerged with the potential for SNPs to infer biogeographical ancestry (BGA) and externally visible characteristics (EVCs), which together enable the FDP process. For more than a decade, the SNaPshot(®) technique has been utilised to analyse identity and FDP-associated SNPs in forensic DNA analysis. SNaPshot is a single-base extension (SBE) assay with capillary electrophoresis as its detection system. This multiplexing technique offers the advantage of easy integration into operational forensic laboratories without the requirement for any additional equipment. Further, the SNP panels from SNaPshot(®) assays can be incorporated into customised panels for massively parallel sequencing (MPS). Many SNaPshot(®) assays are available for identity, BGA and EVC profiling with examples including the well-known SNPforID 52-plex identity assay, the SNPforID 34-plex BGA assay and the HIrisPlex EVC assay. This review lists the major forensically relevant SNaPshot(®) assays for human DNA SNP analysis and can be used as a guide for selecting the appropriate assay for specific identity and FDP applications.

  15. Differentiation of drug and non-drug Cannabis using a single nucleotide polymorphism (SNP) assay.

    PubMed

    Rotherham, D; Harbison, S A

    2011-04-15

    Cannabis sativa is both an illegal drug and a legitimate crop. The differentiation of illegal drug Cannabis from non-drug forms of Cannabis is relevant in the context of the growth of fibre and seed oil varieties of Cannabis for commercial purposes. This differentiation is currently determined based on the levels of tetrahydrocannabinol (THC) in adult plants. DNA-based methods have the potential to assay Cannabis material unsuitable for analysis using conventional means including seeds, pollen and severely degraded material. The purpose of this research was to develop a single nucleotide polymorphism (SNP) assay for the differentiation of "drug" and "non-drug" Cannabis plants. An assay was developed based on four polymorphisms within a 399 bp fragment of the tetrahydrocannabinolic acid (THCA) synthase gene, utilising the SNaPshot multiplex kit. This SNP assay was tested on 94 Cannabis plants, which included 10 blind samples, and was able to differentiate between "drug" and "non-drug" Cannabis in all cases, while also differentiating between Cannabis and other species. Non-drug plants were found to be homozygous at the four sites assayed while drug Cannabis plants were either homozygous or heterozygous.

  16. Heritability of Recurrent Exertional Rhabdomyolysis in Standardbred and Thoroughbred Racehorses Derived From SNP Genotyping Data.

    PubMed

    Norton, Elaine M; Mickelson, James R; Binns, Matthew M; Blott, Sarah C; Caputo, Paul; Isgren, Cajsa M; McCoy, Annette M; Moore, Alison; Piercy, Richard J; Swinburne, June E; Vaudin, Mark; McCue, Molly E

    2016-11-01

    Recurrent exertional rhabdomyolysis (RER) in Thoroughbred and Standardbred racehorses is characterized by episodes of muscle rigidity and cell damage that often recur upon strenuous exercise. The objective was to evaluate the importance of genetic factors in RER by obtaining an unbiased estimate of heritability in cohorts of unrelated Thoroughbred and Standardbred racehorses. Four hundred ninety-one Thoroughbred and 196 Standardbred racehorses were genotyped with the 54K or 74K SNP genotyping arrays. Heritability was calculated from genome-wide SNP data with a mixed linear and Bayesian model, utilizing the standard genetic relationship matrix (GRM). Both the mixed linear and Bayesian models estimated heritability of RER in Thoroughbreds to be approximately 0.34 and in Standardbred racehorses to be approximately 0.45 after adjusting for disease prevalence and sex. To account for potential differences in the genetic architecture of the underlying causal variants, heritability estimates were adjusted based on linkage disequilibrium weighted kinship matrix, minor allele frequency and variant effect size, yielding heritability estimates that ranged between 0.41-0.46 (Thoroughbreds) and 0.39-0.49 (Standardbreds). In conclusion, between 34-46% and 39-49% of the variance in RER susceptibility in Thoroughbred and Standardbred racehorses, respectively, can be explained by the SNPs present on these 2 genotyping arrays, indicating that RER is moderately heritable. These data provide further rationale for the investigation of genetic mutations associated with RER susceptibility.

  17. Y-SNP analysis versus Y-haplogroup predictor in the Slovak population.

    PubMed

    Petrejcíková, Eva; Carnogurská, Jana; Hronská, Danica; Bernasovská, Jarmila; Boronová, Iveta; Gabriková, Dana; Bôziková, Alexandra; Maceková, Sona

    2014-01-01

    Human Y-chromosome haplogroups are important markers used mainly in population genetic studies. The haplogroups are defined by several SNPs according to the phylogeny and international nomenclature. The alternative method to estimate the Y-chromosome haplogroups is to predict Y-chromosome haplotypes from a set of Y-STR markers using software for Y-haplogroup prediction. The purpose of this study was to compare the accuracy of three types of Y-haplogroup prediction software and to determine the structure of the Slovak population revealed by the Y-chromosome haplogroups. We used a sample of 166 Slovak males in which 12 Y-STR markers were genotyped in our previous study. These results were analyzed by three different software products that predict Y-haplogroups. To estimate the accuracy of these prediction programs, Y-haplogroups were determined in the same sample by genotyping Y-chromosome SNPs. Haplogroups were correctly predicted in 98.80% (Whit Athey's Haplogroup Predictor), 97.59% (Jim Cullen's Haplogroup Predictor) and 98.19% (YPredictor by Vadim Urasin 1.5.0) of individuals. The occurrence of errors in Y-chromosome haplogroup prediction suggests that the validation using SNP analysis is appropriate when high accuracy is required. The results of SNP based haplotype determination indicate that 39.15% of the Slovak population belongs to R1a-M198 lineage, which is one of the main European lineages.

  18. Analysis of a claimed distant relationship in a deficient pedigree using high density SNP data.

    PubMed

    Lareu, M V; García-Magariños, M; Phillips, C; Quintela, I; Carracedo, A; Salas, A

    2012-05-01

    DNA markers are routinely used to reveal both simple and complex family relationships. Likelihood based approaches have been traditionally used to estimate relationships using relatively few unlinked markers. However it is widely recognized that when using such limited numbers of loci distant relationships between two individuals cannot be distinguished from the average level of allele sharing found in random pairwise comparisons in the same population. As a real example, we demonstrate the usefulness of genome-wide SNP genotyping to analyze a claimed second cousin relationship that could not be resolved using standard forensic markers, confirming theoretical expectations for very distant relationships. Genome profiles derived from Affymetrix 6.0 SNP arrays obtained from the claimed second cousins were compared to profiles obtained from unrelated individuals and simulated data. Significance of the high estimated probabilities in favor of the second cousin relationship hypothesis was proved from the results obtained with both real and simulated unrelated pairs. As a final cautionary note, it is important to consider that successful identification of the claimed distant relationship reported here is largely due to a well-founded hypothesis being compared to the alternative hypothesis of the claimants being unrelated, but where there are several possible alternative hypotheses, the approach we outline here can yield false indications of unfounded alternative relationships.

  19. Fast and Rigorous Computation of Gene and Pathway Scores from SNP-Based Summary Statistics.

    PubMed

    Lamparter, David; Marbach, Daniel; Rueedi, Rico; Kutalik, Zoltán; Bergmann, Sven

    2016-01-01

    Integrating single nucleotide polymorphism (SNP) p-values from genome-wide association studies (GWAS) across genes and pathways is a strategy to improve statistical power and gain biological insight. Here, we present Pascal (Pathway scoring algorithm), a powerful tool for computing gene and pathway scores from SNP-phenotype association summary statistics. For gene score computation, we implemented analytic and efficient numerical solutions to calculate test statistics. We examined in particular the sum and the maximum of chi-squared statistics, which measure the average and the strongest association signals per gene, respectively. For pathway scoring, we use a modified Fisher method, which offers not only significant power improvement over more traditional enrichment strategies, but also eliminates the problem of arbitrary threshold selection inherent in any binary membership based pathway enrichment approach. We demonstrate the marked increase in power by analyzing summary statistics from dozens of large meta-studies for various traits. Our extensive testing indicates that our method not only excels in rigorous type I error control, but also results in more biologically meaningful discoveries.
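
    The raw quantities named above can be sketched in a few lines. This is not the Pascal implementation: it only converts two-sided SNP p-values to 1-df chi-squared statistics and takes their sum and maximum, and it shows the classical Fisher combination rather than the modified version the authors use. It treats the p-values as independent, whereas a real gene score has to account for correlation between neighbouring SNPs. All function names are hypothetical.

```python
import math
from statistics import NormalDist
from typing import Sequence, Tuple


def chi2_from_p(p: float) -> float:
    """1-df chi-squared statistic matching a two-sided p-value (0 < p <= 1)."""
    z = NormalDist().inv_cdf(p / 2.0)  # lower-tail quantile; squaring removes the sign
    return z * z


def gene_statistics(snp_pvalues: Sequence[float]) -> Tuple[float, float]:
    """Return (sum, max) of per-SNP chi-squared statistics for one gene.

    The sum aggregates signal over all SNPs; the max is the single strongest SNP.
    """
    stats = [chi2_from_p(p) for p in snp_pvalues]
    return sum(stats), max(stats)


def fisher_combined_p(pvalues: Sequence[float]) -> float:
    """Classical Fisher's method for k independent p-values.

    X = -2 * sum(ln p) follows chi-squared with 2k df under the null; for
    even df the survival function is exp(-X/2) * sum_{i<k} (X/2)^i / i!.
    """
    k = len(pvalues)
    half = -sum(math.log(p) for p in pvalues)  # equals X / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Toy p-values (invented) for the SNPs of one gene and the genes of one pathway.
print(gene_statistics([0.05, 0.5, 0.8]))
print(fisher_combined_p([0.01, 0.2, 0.03]))
```

    Combining continuous gene scores in this way avoids choosing a significance cut-off to decide which genes count as "hits", which is the threshold problem the abstract refers to.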

  20. Diversity in 113 cowpea [Vigna unguiculata (L) Walp] accessions assessed with 458 SNP markers.

    PubMed

    Egbadzor, Kenneth F; Ofori, Kwadwo; Yeboah, Martin; Aboagye, Lawrence M; Opoku-Agyeman, Michael O; Danquah, Eric Y; Offei, Samuel K

    2014-01-01

    Single nucleotide polymorphism (SNP) markers were used to characterize 113 cowpea accessions comprising 108 from Ghana and 5 from abroad. Leaf tissues from plants cultivated at the University of Ghana were genotyped at KBioscience in the United Kingdom. Data were generated for 477 SNPs, of which 458 revealed polymorphism. The results were used to analyze genetic dissimilarity among the accessions using DARwin 5 software. The markers discriminated among all of the cowpea accessions, and the dissimilarity values, which ranged from 0.006 to 0.63, were used for a factorial plot. Unexpectedly high levels of heterozygosity were observed in some of the accessions. Accessions known to be closely related clustered together in a dendrogram drawn with the WPGMA method. A maximum length sub-tree comprising 48 core accessions was constructed. The software package STRUCTURE was used to separate accessions into three groups, and the programme correctly identified varieties that were known hybrids. The hybrids were those accessions with numerous heterozygous loci. The STRUCTURE plot showed closely related accessions with similar genome patterns. The SNP markers were more efficient in discriminating among the cowpea germplasm than the morphological, seed protein polymorphism and simple sequence repeat studies reported earlier on the same collection.

  1. SNP-sites: rapid efficient extraction of SNPs from multi-FASTA alignments

    PubMed Central

    Taylor, Ben; Delaney, Aidan J.; Soares, Jorge; Seemann, Torsten; Keane, Jacqueline A.; Harris, Simon R.

    2016-01-01

    Rapidly decreasing genome sequencing costs have led to a proportionate increase in the number of samples used in prokaryotic population studies. Extracting single nucleotide polymorphisms (SNPs) from a large whole genome alignment is now a routine task, but existing tools have failed to scale efficiently with the increased size of studies. These tools are slow, memory inefficient and are installed through non-standard procedures. We present SNP-sites, which can rapidly extract SNPs from a multi-FASTA alignment using modest resources and can output results in multiple formats for downstream analysis. SNPs can be extracted from an 8.3 GB alignment file (1842 taxa, 22 618 sites) in 267 seconds using 59 MB of RAM and 1 CPU core, making it feasible to run on modest computers. It is easy to install through the Debian and Homebrew package managers, and has been successfully tested on more than 20 operating systems. SNP-sites is implemented in C and is available under the open source license GNU GPL version 3. PMID:28348851
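
    The core task described above, pulling variable columns out of a multi-FASTA alignment, can be illustrated in a few lines. This is a minimal in-memory sketch and not the SNP-sites implementation (which is written in C); the function names, and the choice to ignore gaps and ambiguity codes when deciding whether a column is variable, are assumptions made for illustration.

```python
def parse_fasta(text):
    """Parse multi-FASTA text into a dict of name -> sequence (input order kept)."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {n: "".join(parts) for n, parts in records.items()}

def snp_columns(alignment):
    """Return 0-based columns where at least two different A/C/G/T bases occur.

    Assumption: gaps and ambiguity codes never make a column variable.
    """
    seqs = list(alignment.values())
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("sequences in an alignment must have equal length")
    sites = []
    for col in range(len(seqs[0]) if seqs else 0):
        bases = {s[col] for s in seqs} & set("ACGT")
        if len(bases) > 1:
            sites.append(col)
    return sites

def extract_snps(alignment):
    """Reduce every sequence to its SNP columns only."""
    cols = snp_columns(alignment)
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}
```

    This sketch holds the whole alignment in memory, which is precisely what does not scale to the multi-gigabyte inputs the abstract mentions; it is meant only to show the column-wise logic.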

  2. Linkage Disequilibrium Patterns and tagSNP Transferability among European Populations

    PubMed Central

    Mueller, Jakob C.; Lõhmussaar, Elin; Mägi, Reedik; Remm, Maido; Bettecken, Thomas; Lichtner, Peter; Biskup, Saskia; Illig, Thomas; Pfeufer, Arne; Luedemann, Jan; Schreiber, Stefan; Pramstaller, Peter; Pichler, Irene; Romeo, Giovanni; Gaddi, Anthony; Testa, Alessandra; Wichmann, Heinz-Erich; Metspalu, Andres; Meitinger, Thomas

    2005-01-01

    The pattern of linkage disequilibrium (LD) is critical for association studies, in which disease-causing variants are identified by allelic association with adjacent markers. The aim of this study is to compare the LD patterns in several distinct European populations. We analyzed four genomic regions (in total, 749 kb) containing candidate genes for complex traits. Individuals were genotyped for markers that are evenly distributed at an average spacing of ∼2–4 kb in eight population-based samples from ongoing epidemiological studies across Europe. The Centre d'Etude du Polymorphisme Humain (CEPH) trios of the HapMap project were included and were used as a reference population. In general, we observed a conservation of the LD patterns across European samples. Nevertheless, shifts in the positions of the boundaries of high-LD regions can be demonstrated between populations, when assessed by a novel procedure based on bootstrapping. Transferability of LD information among populations was also tested. In two of the analyzed gene regions, sets of tagging single-nucleotide polymorphisms (tagSNPs) selected from the HapMap CEPH trios performed surprisingly well in all local European samples. However, significant variation in the other two gene regions predicts a restricted applicability of CEPH-derived tagging markers. Simulations based on our data set show the extent to which further gain in tagSNP efficiency and transferability can be achieved by increased SNP density. PMID:15637659
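
    For readers unfamiliar with how pairwise LD is quantified, the sketch below computes the standard coefficients D and r² for two biallelic SNPs from phased haplotypes. It is a textbook calculation, not the bootstrapping procedure of the study; the function name and the 0/1 allele coding are assumptions.

```python
def ld_r2(haplotypes):
    """Compute (D, r^2) for two biallelic SNPs.

    haplotypes: list of (allele_at_snp1, allele_at_snp2) pairs, alleles coded 0/1.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("need at least one haplotype")
    p_a = sum(a for a, _ in haplotypes) / n          # frequency of allele 1 at SNP 1
    p_b = sum(b for _, b in haplotypes) / n          # frequency of allele 1 at SNP 2
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                             # deviation from independence
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    return d, d * d / denom
```

    An r² near 1 means one SNP is a good proxy (tag) for the other, which is the property tagSNP selection relies on.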

  3. Population structure of Atlantic mackerel inferred from RAD-seq-derived SNP markers: effects of sequence clustering parameters and hierarchical SNP selection.

    PubMed

    Rodríguez-Ezpeleta, Naiara; Bradbury, Ian R; Mendibil, Iñaki; Álvarez, Paula; Cotano, Unai; Irigoien, Xabier

    2016-07-01

    Restriction-site-associated DNA sequencing (RAD-seq) and related methods are revolutionizing the field of population genomics in nonmodel organisms as they allow generating an unprecedented number of single nucleotide polymorphisms (SNPs) even when no genomic information is available. Yet, RAD-seq data analyses rely on assumptions on nature and number of nucleotide variants present in a single locus, the choice of which may lead to an under- or overestimated number of SNPs and/or to incorrectly called genotypes. Using the Atlantic mackerel (Scomber scombrus L.) and a close relative, the Atlantic chub mackerel (Scomber colias), as case study, here we explore the sensitivity of population structure inferences to two crucial aspects in RAD-seq data analysis: the maximum number of mismatches allowed to merge reads into a locus and the relatedness of the individuals used for genotype calling and SNP selection. Our study resolves the population structure of the Atlantic mackerel, but, most importantly, provides insights into the effects of alternative RAD-seq data analysis strategies on population structure inferences that are directly applicable to other species.

  4. SNP@Domain: a web resource of single nucleotide polymorphisms (SNPs) within protein domain structures and sequences

    PubMed Central

    Han, Areum; Kang, Hyo Jin; Cho, Yoobok; Lee, Sunghoon; Kim, Young Joo; Gong, Sungsam

    2006-01-01

    The single nucleotide polymorphisms (SNPs) in conserved protein regions have been thought to be strong candidates that alter protein functions. Thus, we have developed SNP@Domain, a web resource, to identify SNPs within human protein domains. We annotated SNPs from dbSNP with protein structure-based as well as sequence-based domains: (i) structure-based using SCOP and (ii) sequence-based using Pfam to avoid conflicts from two domain assignment methodologies. Users can investigate SNPs within protein domains with 2D and 3D maps. We expect this visual annotation of SNPs within protein domains will help scientists select and interpret SNPs associated with diseases. A web interface for the SNP@Domain is freely available at and from . PMID:16845090

  5. Publishing SNP genotypes of human embryonic stem cell lines: policy statement of the International Stem Cell Forum Ethics Working Party.

    PubMed

    Knoppers, Bartha M; Isasi, Rosario; Benvenisty, Nissim; Kim, Ock-Joo; Lomax, Geoffrey; Morris, Clive; Murray, Thomas H; Lee, Eng Hin; Perry, Margery; Richardson, Genevra; Sipp, Douglas; Tanner, Klaus; Wahlström, Jan; de Wert, Guido; Zeng, Fanyi

    2011-09-01

    Novel methods and associated tools permitting individual identification in publicly accessible SNP databases have become a debatable issue. There is growing concern that current technical and ethical safeguards to protect the identities of donors could be insufficient. In the context of human embryonic stem cell research, there are no studies focusing on the probability that an hESC line donor could be identified by analyzing published SNP profiles and associated genotypic and phenotypic information. We present the International Stem Cell Forum (ISCF) Ethics Working Party's Policy Statement on "Publishing SNP Genotypes of Human Embryonic Stem Cell Lines (hESC)". The Statement prospectively addresses issues surrounding the publication of genotypic data and associated annotations of hESC lines in open access databases. It proposes a balanced approach between the goals of open science and data sharing with the respect for fundamental bioethical principles (autonomy, privacy, beneficence, justice and research merit and integrity).

  6. SNP@lincTFBS: an integrated database of polymorphisms in human LincRNA transcription factor binding sites.

    PubMed

    Ning, Shangwei; Zhao, Zuxianglan; Ye, Jingrun; Wang, Peng; Zhi, Hui; Li, Ronghong; Wang, Tingting; Wang, Jianjian; Wang, Lihua; Li, Xia

    2014-01-01

    Large intergenic non-coding RNAs (lincRNAs) are a new class of functional transcripts, and aberrant expression of lincRNAs has been associated with several human diseases. Genetic variants in lincRNA transcription factor binding sites (TFBSs) can change lincRNA expression, thereby affecting susceptibility to human diseases. To identify and annotate these functional candidates, we have developed the database SNP@lincTFBS, which is devoted to the exploration and annotation of single nucleotide polymorphisms (SNPs) in potential TFBSs of human lincRNAs. We identified 6,665 SNPs in 6,614 conserved TFBSs of 2,423 human lincRNAs. In addition, using ChIP-Seq datasets, we identified 139,576 SNPs in 304,517 transcription factor peaks of 4,813 lincRNAs. We also performed comprehensive annotation of these SNPs using 1000 Genomes Project datasets across 11 populations. Moreover, one of the distinctive features of SNP@lincTFBS is the collection of disease-associated SNPs in the lincRNA TFBSs and SNPs in the TFBSs of disease-associated lincRNAs. The web interface enables both flexible data searches and downloads. Quick searches can be made by lincRNA name, SNP identifier, or transcription factor name. SNP@lincTFBS provides significant advances in the identification of disease-associated lincRNA variants and makes it more convenient to interpret discrepant expression of lincRNAs. The SNP@lincTFBS database is available at http://bioinfo.hrbmu.edu.cn/SNP_lincTFBS.

  7. Assignment of SNP allelic configuration in polyploids using competitive allele-specific PCR: application to citrus triploid progeny

    PubMed Central

    Cuenca, José; Aleza, Pablo; Navarro, Luis; Ollitrault, Patrick

    2013-01-01

    Background Polyploidy is a major component of eukaryote evolution. Estimation of allele copy numbers for molecular markers has long been considered a challenge for polyploid species, while this process is essential for most genetic research. With the increasing availability and whole-genome coverage of single nucleotide polymorphism (SNP) markers, it is essential to implement a versatile SNP genotyping method to assign allelic configuration efficiently in polyploids. Scope This work evaluates the usefulness of the KASPar method, based on competitive allele-specific PCR, for the assignment of SNP allelic configuration. Citrus was chosen as a model because of its economic importance, the ongoing worldwide polyploidy manipulation projects for cultivar and rootstock breeding, and the increasing availability of SNP markers. Conclusions Fifteen SNP markers were successfully designed that produced clear allele signals that were in agreement with previous genotyping results at the diploid level. The analysis of DNA mixes between two haploid lines (Clementine and pummelo) at 13 different ratios revealed a very high correlation (average = 0·9796; s.d. = 0·0094) between the allele ratio and two parameters [θ angle = tan−1 (y/x) and y′ = y/(x + y)] derived from the two normalized allele signals (x and y) provided by KASPar. Separated cluster analysis and analysis of variance (ANOVA) from mixed DNA simulating triploid and tetraploid hybrids provided 99·71 % correct allelic configuration. Moreover, triploid populations arising from 2n gametes and interploid crosses were easily genotyped and provided useful genetic information. This work demonstrates that the KASPar SNP genotyping technique is an efficient way to assign heterozygous allelic configurations within polyploid populations. This method is accurate, simple and cost-effective. Moreover, it may be useful for quantitative studies, such as relative allele-specific expression analysis and bulk segregant analysis
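
    The two parameters named in the abstract, θ = tan−1 (y/x) and y′ = y/(x + y), follow directly from the normalized allele signals. The sketch below only evaluates those two formulas; the function name is hypothetical, and the cluster analysis and ANOVA steps of the study are not reproduced.

```python
import math

def kaspar_parameters(x, y):
    """Return (theta, y_prime) for normalized allele signals x and y.

    theta   = arctan(y / x), in radians
    y_prime = y / (x + y)
    """
    if x < 0 or y < 0 or x + y == 0:
        raise ValueError("signals must be non-negative and not both zero")
    # atan2 equals atan(y / x) for x > 0 and returns pi/2 when x == 0,
    # which avoids a division by zero for a pure y-allele signal.
    theta = math.atan2(y, x)
    y_prime = y / (x + y)
    return theta, y_prime
```

    Equal signals give θ = π/4 and y′ = 0.5, while a sample carrying only one allele sits at θ = 0 or θ = π/2; intermediate allele doses fall between these values.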

  8. Combining fMRI and SNP Data to Investigate Connections Between Brain Function and Genetics Using Parallel ICA

    PubMed Central

    Liu, Jingyu; Pearlson, Godfrey; Windemuth, Andreas; Ruano, Gualberto; Perrone-Bizzozero, Nora I.; Calhoun, Vince

    2009-01-01

    There is current interest in understanding genetic influences on both healthy and disordered brain function. We assessed brain function with functional magnetic resonance imaging (fMRI) data collected during an auditory oddball task (detecting an infrequent sound within a series of frequent sounds). Task-related imaging findings were then used as potential intermediate phenotypes (endophenotypes) to investigate genomic factors derived from a single nucleotide polymorphism (SNP) array. Our target is the linkage of these genomic factors to normal/abnormal brain functionality. We explored parallel independent component analysis (paraICA) as a new method for analyzing multimodal data. The method aims to identify simultaneously the independent components of each modality and the relationships between them. When 43 healthy controls and 20 schizophrenia patients, all Caucasian, were studied, we found a correlation of 0.38 between one fMRI component and one SNP component. This fMRI component consisted mainly of parietal lobe activations. The relevant SNP component received significant contributions from 10 SNPs located in genes, including those coding for the nicotinic α-7 cholinergic receptor, aromatic amino acid decarboxylase, and disrupted in schizophrenia 1, among others. Both fMRI and SNP components showed significant differences in loading parameters between the schizophrenia and control groups (P = 0.0006 for the fMRI component; P = 0.001 for the SNP component). In summary, we constructed a framework to identify interactions between brain functional and genetic information; our findings provide a proof-of-concept that genomic SNP factors can be investigated by using endophenotypic imaging findings in a multivariate format. PMID:18072279

  9. A non-synonymous SNP with the allele frequency correlated with the altitude may contribute to the hypoxia adaptation of Tibetan chicken

    PubMed Central

    Wang, Yan; Yin, Huadong; Zhou, Lanyun; Zhong, Chengling

    2017-01-01

    Hypoxia adaptation to high altitudes is of considerable interest in the biological sciences. As a breed adapted to highland environments, the Tibetan chicken (Gallus gallus domesticus) provides a biological model to search for genetic differences between highland and lowland chickens. To address mechanisms of hypoxia adaptability at high altitudes in the Tibetan chicken, we focused on the endothelial PAS domain protein 1 (EPAS1), a key regulatory factor in hypoxia responses. Polymorphisms of EPAS1 exons were examined in 157 Tibetan chickens from 8 populations and 139 lowland chickens from 7 breeds. We designed 15 pairs of primers to amplify exon sequences for Sanger sequencing. Six single nucleotide polymorphisms (SNPs) were detected, including 2 missense mutations (SNP3 rs316126786 and SNP5 rs740389732) and 4 synonymous mutations (SNP1 rs315040213, SNP4 rs739281102, SNP6 rs739010166, and SNP2 rs14330062). There were negative correlations between altitude and mutant allele frequencies for both SNP6 (rs739010166, r = 0.758, P<0.001) and SNP3 (rs316126786, r = 0.844, P<0.001). We also aligned the EPAS1 protein with orthologous proteins from diverse vertebrates and found that SNP3 (Y333C) was a conserved site among species. SNP3 (Y333C) also occurred in a well-defined protein domain, the Per-AhR-Arnt-Sim (PAS) domain. These results imply that SNP3 (Y333C) is the most likely causal mutation for high-altitude adaptation in the Tibetan chicken. These variations of EPAS1 provide new insights into the gene’s function. PMID:28222154

  10. The MDM2 promoter polymorphism SNP309T→G and the risk of uterine leiomyosarcoma, colorectal cancer, and squamous cell carcinoma of the head and neck

    PubMed Central

    Alhopuro, P; Ylisaukko-oja, S; Koskinen, W; Bono, P; Arola, J; Jarvinen, H; Mecklin, J; Atula, T; Kontio, R; Makitie, A; Suominen, S; Leivo, I; Vahteristo, P; Aaltonen, L; Aaltonen, L

    2005-01-01

    Background: MDM2 acts as a principal regulator of the tumour suppressor p53 by targeting its destruction through the ubiquitin pathway. A polymorphism in the MDM2 promoter (SNP309) was recently identified. SNP309 was shown to result, via Sp1, in higher levels of MDM2 RNA and protein, and subsequent attenuation of the p53 pathway. Furthermore, SNP309 was proposed to be associated with accelerated soft tissue sarcoma formation in both hereditary (Li-Fraumeni) and sporadic cases in humans. Methods: We evaluated the possible contribution of SNP309 to three tumour types known to be linked with the MDM2/p53 pathway, using genomic sequencing or restriction fragment length polymorphism as screening methods. Three separate Finnish tumour materials (population based sets of 68 patients with early onset uterine leiomyosarcomas and 1042 patients with colorectal cancer, and a series of 162 patients with squamous cell carcinoma of the head and neck) and a set of 185 healthy Finnish controls were analysed for SNP309. Results: Frequencies of SNP309 were similar in all four cohorts. In the colorectal cancer series, SNP309 was somewhat more frequent in women and in patients with microsatellite stable tumours. Female SNP309 carriers were diagnosed with colorectal cancer approximately 2.7 years earlier than those carrying the wild type gene. However, no statistically significant association of SNP309 with patients' age at disease onset or to any other clinicopathological parameter was found in these three tumour materials. Conclusion: SNP309 had no significant contribution to tumour formation in our materials. Possible associations of SNP309 with microsatellite stable colorectal cancer and with earlier disease onset in female carriers need to be examined in subsequent studies. PMID:16141004

  11. Pacifiplex: an ancestry-informative SNP panel centred on Australia and the Pacific region.

    PubMed

    Santos, Carla; Phillips, Christopher; Fondevila, Manuel; Daniel, Runa; van Oorschot, Roland A H; Burchard, Esteban G; Schanfield, Moses S; Souto, Luis; Uacyisrael, Jolame; Via, Marc; Carracedo, Ángel; Lareu, Maria V

    2016-01-01

    The analysis of human population variation is an area of considerable interest in the forensic, medical genetics and anthropological fields. Several forensic single nucleotide polymorphism (SNP) assays provide ancestry-informative genotypes in sensitive tests designed to work with limited DNA samples, including a 34-SNP multiplex differentiating African, European and East Asian ancestries. Although assays capable of differentiating Oceanian ancestry at a global scale have become available, this study describes markers compiled specifically for differentiation of Oceanian populations. A sensitive multiplex assay, termed Pacifiplex, was developed and optimized in a small-scale test applicable to forensic analyses. The Pacifiplex assay comprises 29 ancestry-informative marker SNPs (AIM-SNPs) selected to complement the 34-plex test, which in a combined set distinguish Africans, Europeans, East Asians and Oceanians. Nine Pacific region study populations were genotyped with both SNP assays, then compared to four reference population groups from the HGDP-CEPH human diversity panel. STRUCTURE analyses estimated population cluster membership proportions that aligned with the patterns of variation suggested for each study population's currently inferred demographic histories. Aboriginal Taiwanese and Philippine samples indicated high East Asian ancestry components, Papua New Guinean and Aboriginal Australian samples were predominantly Oceanian, while other populations displayed cluster patterns explained by the distribution of divergence amongst Melanesians, Polynesians and Micronesians. Genotype data from Pacifiplex and 34-plex tests is particularly well suited to analysis of Australian Aboriginal populations and when combined with Y and mitochondrial DNA variation will provide a powerful set of markers for ancestry inference applied to modern Australian demographic profiles. On a broader geographic scale, Pacifiplex adds highly informative data for inferring the ancestry

  12. SNP Regulation of microRNA Expression and Subsequent Colon Cancer Risk

    PubMed Central

    Mullany, Lila E.; Wolff, Roger K.; Herrick, Jennifer S.; Buas, Matthew F.; Slattery, Martha L.

    2015-01-01

    Introduction MicroRNAs (miRNAs) regulate messenger RNAs (mRNAs) and as such have been implicated in a variety of diseases, including cancer. MiRNAs regulate mRNAs through binding of the miRNA 5’ seed sequence (~7–8 nucleotides) to the mRNA 3’ UTRs; polymorphisms in these regions have the potential to alter miRNA-mRNA target associations. SNPs in miRNA genes as well as miRNA-target genes have been proposed to influence cancer risk through altered miRNA expression levels. Methods MiRNA-SNPs and miRNA-target gene-SNPs were identified through the literature. We used SNPs from Genome-Wide Association Study (GWAS) data that were matched to individuals with miRNA expression data generated from an Agilent platform for colon tumor and non-tumor paired tissues. These samples were used to evaluate 327 miRNA-SNP pairs for associations between SNPs and miRNA expression levels as well as for SNP associations with colon cancer. Results Twenty-two miRNAs expressed in non-tumor tissue were significantly different by genotype and 21 SNPs were associated with altered tumor/non-tumor differential miRNA expression across genotypes. Two miRNAs were associated with SNP genotype for both non-tumor and tumor/non-tumor differential expression. Of the 41 miRNAs significantly associated with SNPs, all but seven were significantly differentially expressed in colon tumor tissue. Two of the 41 SNPs significantly associated with miRNA expression levels were associated with colon cancer risk: rs8176318 (BRCA1), OR_AA 1.31 (95% CI 1.01, 1.78), and rs8905 (PRKAR1A), OR_GG 2.31 (95% CI 1.11, 4.77). Conclusion Of the 327 SNPs identified in the literature as being important because of their potential regulation of miRNA expression levels, 12.5% had statistically significant associations with miRNA expression. However, only two of these SNPs were significantly associated with colon cancer. PMID:26630397

  13. Efficient fast heuristic algorithms for minimum error correction haplotyping from SNP fragments.

    PubMed

    Anaraki, Maryam Pourkamali; Sadeghi, Mehdi

    2014-01-01

    The availability of the complete human genome is a crucial factor for genetic studies exploring possible associations between the genome and complex diseases. A haplotype, as a set of single nucleotide polymorphisms (SNPs) on a single chromosome, is believed to contain promising data for disease association studies and for detecting natural positive selection and recombination hotspots. Various computational methods for haplotype reconstruction from aligned SNP fragments have already been proposed. This study presents a novel approach to obtain paternal and maternal haplotypes from the SNP fragments under the minimum error correction (MEC) model. Reconstructing haplotypes under the MEC model is an NP-hard problem. Therefore, our proposed methods employ two fast and accurate clustering techniques as the core of their procedure to solve this problem efficiently. The assessment of our approaches, compared to conventional methods, on two real benchmark datasets, i.e., ACE and DALY, demonstrates their efficiency and accuracy.
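
    To make the MEC objective concrete, the sketch below scores a candidate haplotype against a set of SNP fragments. It evaluates the objective only and does not implement the clustering heuristics of the paper; the function names, the "0"/"1"/"-" encoding, and the assumption that every site is heterozygous (so the second haplotype is the complement of the first) are illustrative choices.

```python
def mismatches(fragment, haplotype):
    """Count positions where a fragment disagrees with a haplotype.

    Positions the fragment does not cover are marked "-" and are skipped.
    """
    return sum(1 for f, h in zip(fragment, haplotype) if f != "-" and f != h)

def complement(haplotype):
    """Flip every allele; valid when all SNP sites are heterozygous (assumption)."""
    return "".join("1" if a == "0" else "0" for a in haplotype)

def mec_score(fragments, haplotype):
    """Minimum error correction score of a haplotype pair.

    Each fragment is assigned to whichever of the two haplotypes it matches
    best; the score is the total number of allele corrections still needed.
    """
    other = complement(haplotype)
    return sum(
        min(mismatches(f, haplotype), mismatches(f, other)) for f in fragments
    )
```

    Evaluating one candidate is cheap; the hardness lies in searching over all 2^n candidate haplotypes, which is why heuristic methods are used.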

  14. Cloning, chromosomal localization, SNP detection and association analysis of the porcine IRS-1 gene.

    PubMed

    Niu, P-X; Huang, Z; Li, C-C; Fan, B; Li, K; Liu, B; Yu, M; Zhao, S-H

    2009-11-01

    The insulin receptor substrate-1 (IRS-1) gene is a member of the insulin receptor substrate (IRS) gene family, which plays an important role in mediating the growth of skeletal muscle and the molecular metabolism of type 2 diabetes. Here, we cloned a 3,573 bp fragment of the partial CDS sequence of the porcine IRS-1 gene using an in silico cloning strategy and RT-PCR. The porcine IRS-1 gene was assigned to SSC15q25 using IMpRH. Sequencing of PCR products from Duroc and Tibetan pig breeds identified one SNP in exon 1 of the porcine IRS-1 gene (a C3257A polymorphism). Association analysis of genotypes with growth traits, anatomy traits, meat quality traits and physiological and biochemical indexes showed that different genotypes at locus 3,257 of IRS-1 differ significantly in carcass straight length in pigs (P = 0.0102 < 0.05).

  15. Syndromic ciliopathies: From single gene to multi gene analysis by SNP arrays and next generation sequencing.

    PubMed

    Knopp, C; Rudnik-Schöneborn, S; Eggermann, T; Bergmann, C; Begemann, M; Schoner, K; Zerres, K; Ortiz Brüchle, N

    2015-10-01

    Joubert syndrome (JS) and related disorders (JSRD), Meckel syndrome (MKS) and Bardet-Biedl syndrome (BBS) are autosomal recessive ciliopathies with a broad clinical and genetic overlap. In our multiethnic cohort of 88 MKS, 61 JS/JSRD and 66 BBS families we performed genetic analyses and were able to determine mutation frequencies and detection rates for the most frequently mutated MKS genes. On the basis of determined mutation frequencies, a next generation gene panel for JS/JSRD and MKS was established. Furthermore 35 patients from 26 unrelated consanguineous families were investigated by SNP array-based homozygosity mapping and subsequent DNA sequencing of known candidate genes according to runs of homozygosity size in descending order. This led to the identification of the causative homozygous mutation in 62% of unrelated index cases. Based on our data we discuss various strategies for diagnostic mutation detection in the syndromic ciliopathies JS/JSRD, MKS and BBS.

  16. A high-performance computing toolset for relatedness and principal component analysis of SNP data.

    PubMed

    Zheng, Xiuwen; Levine, David; Shen, Jess; Gogarten, Stephanie M; Laurie, Cathy; Weir, Bruce S

    2012-12-15

    Genome-wide association studies are widely used to investigate the genetic basis of diseases and traits, but they pose many computational challenges. We developed gdsfmt and SNPRelate (R packages for multi-core symmetric multiprocessing computer architectures) to accelerate two key computations on SNP data: principal component analysis (PCA) and relatedness analysis using identity-by-descent measures. The kernels of our algorithms are written in C/C++ and highly optimized. Benchmarks show the uniprocessor implementations of PCA and identity-by-descent are ∼8-50 times faster than the implementations provided in the popular EIGENSTRAT (v3.0) and PLINK (v1.07) programs, respectively, and can be sped up to 30-300-fold by using eight cores. SNPRelate can analyse tens of thousands of samples with millions of SNPs. For example, our package was used to perform PCA on 55 324 subjects from the 'Gene-Environment Association Studies' consortium studies.

  17. Development of SNP markers identifying European wildcats, domestic cats, and their admixed progeny.

    PubMed

    Nussberger, B; Greminger, M P; Grossen, C; Keller, L F; Wandeler, P

    2013-05-01

    Introgression can be an important evolutionary force but it can also lead to species extinction and as such is a crucial issue for species conservation. However, introgression is difficult to detect, morphologically as well as genetically. Hybridization with domestic cats (Felis silvestris catus) is a major concern for the conservation of European wildcats (Felis s. silvestris). The available morphologic and genetic markers for the two Felis subspecies are not sufficient to reliably detect hybrids beyond first generation. Here we present a single nucleotide polymorphism (SNP) based approach that allows the identification of introgressed individuals. Using high-throughput sequencing of reduced representation libraries we developed a diagnostic marker set containing 48 SNPs (Fst > 0.8) which allows the identification of wildcats, domestic cats, their hybrids and backcrosses. This allows assessing introgression rate in natural wildcat populations and is key for a better understanding of hybridization processes.
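
    The Fst > 0.8 threshold above selects SNPs whose allele frequencies differ strongly between the two groups. As a rough illustration, the sketch below computes Wright's Fst = (HT − HS) / HT for one biallelic SNP from two population allele frequencies. The estimator actually used in the study is not stated in the abstract, so this formula, the equal weighting of the two populations, and the function name are all assumptions.

```python
def fst_two_pops(p1, p2):
    """Wright's Fst for one biallelic SNP in two equally weighted populations.

    p1, p2: frequency of the same allele in population 1 and population 2.
    """
    for p in (p1, p2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("allele frequencies must lie in [0, 1]")
    p_bar = (p1 + p2) / 2.0
    h_t = 2.0 * p_bar * (1.0 - p_bar)                              # total expected heterozygosity
    h_s = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0    # mean within-population value
    if h_t == 0.0:
        return 0.0   # monomorphic overall: no differentiation to measure
    return (h_t - h_s) / h_t
```

    Under this definition, frequencies of 0.95 in one group and 0.05 in the other give Fst = 0.81, so a marker has to be close to fixed for alternative alleles to pass a 0.8 cut-off.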

  18. Specificity of SNP detection with molecular beacons is improved by stem and loop separation with spacers.

    PubMed

    Farzan, Valentina M; Markelov, Mikhail L; Skoblov, Alexander Yu; Shipulin, German A; Zatsepin, Timofei S

    2017-03-13

    Molecular beacons (MBs) are valuable tools in molecular biology, clinical diagnostics and analytical chemistry. Here we describe a novel approach for the design of MBs with nucleotide or non-nucleotide linkers between the stem and loop regions. Such modified MBs have significantly improved specificity and performance for single nucleotide polymorphism (SNP) detection. These advantages are especially distinct, when compared to the classic MBs, in the case of possible interactions between the stem and loop regions. We demonstrated the applicability of such modified MBs for the discrimination of common Factor V, NOS3 and ADRB2 SNPs in model plasmids and in clinical samples. The developed approach could be applicable not only to fluorescently labeled MBs, but also to other biosensors based on nucleic acids with stem-loop structures.

  19. [Mechanism of genuineness of Glycyrrhiza uralensis based on SNP of β-Amyrin synthase gene].

    PubMed

    Zang, Yi-mei; Li, Yan-peng; Qiao, Jing; Chen, Hong-hao; Liu, Chun-sheng

    2015-07-01

    β-Amyrin synthase (β-AS) genes of Glycyrrhiza uralensis from 6 different regions were analyzed by PCR-SSCP and sequenced, and the correlation between β-AS SNPs and the regions of origin of Glycyrrhiza uralensis was then determined. Based on one coding single nucleotide polymorphism at the 94 bp site in the first exon of the β-AS gene, Glycyrrhiza uralensis could be divided into 3 genotypes. Among these genotypes, the percentage of the 94A type was much higher in genuine regions and differed significantly from the percentage in non-genuine regions (P < 0.001). The results indicate that different β-AS genotypes at the 94 bp site across regions may be one of the important factors underlying the genuineness of Glycyrrhiza uralensis.

  20. Design and synthesis of the superionic conductor Na10SnP2S12

    NASA Astrophysics Data System (ADS)

    Richards, William D.; Tsujimura, Tomoyuki; Miara, Lincoln J.; Wang, Yan; Kim, Jae Chul; Ong, Shyue Ping; Uechi, Ichiro; Suzuki, Naoki; Ceder, Gerbrand

    2016-03-01

    Sodium-ion batteries are emerging as candidates for large-scale energy storage due to their low cost and the wide variety of cathode materials available. As battery size and adoption in critical applications increases, safety concerns are resurfacing due to the inherent flammability of organic electrolytes currently in use in both lithium and sodium battery chemistries. Development of solid-state batteries with ionic electrolytes eliminates this concern, while also allowing novel device architectures and potentially improving cycle life. Here we report the computation-assisted discovery and synthesis of a high-performance solid-state electrolyte material: Na10SnP2S12, with room temperature ionic conductivity of 0.4 mS cm-1 rivalling the conductivity of the best sodium sulfide solid electrolytes to date. We also computationally investigate the variants of this compound where tin is substituted by germanium or silicon and find that the latter may achieve even higher conductivity.

  1. Prediction of a time-to-event trait using genome wide SNP data

    PubMed Central

    2013-01-01

    Background A popular objective of many high-throughput genome projects is to discover various genomic markers associated with traits and develop statistical models to predict traits of future patients based on marker values. Results In this paper, we present a prediction method for time-to-event traits using genome-wide single-nucleotide polymorphisms (SNPs). We also propose a MaxTest for the association between a time-to-event trait and a SNP that accounts for the SNP's possible genetic models. The proposed MaxTest can help screen out nonprognostic SNPs and identify genetic models of prognostic SNPs. The performance of the proposed method is evaluated through simulations. Conclusions In conjunction with the MaxTest, the proposed method provides more parsimonious prediction models but includes more prognostic SNPs than some naive prediction methods. The proposed method is demonstrated with real GWAS data. PMID:23418752

  2. Olive oil DNA fingerprinting by multiplex SNP genotyping on fluorescent microspheres.

    PubMed

    Kalogianni, Despina P; Bazakos, Christos; Boutsika, Lemonia M; Targem, Mehdi Ben; Christopoulos, Theodore K; Kalaitzis, Panagiotis; Ioannou, Penelope C

    2015-04-01

    Olive oil cultivar verification is of primary importance for the competitiveness of the product and the protection of consumers and producers from fraudulence. Single-nucleotide polymorphisms (SNPs) have emerged as excellent DNA markers for authenticity testing. This paper reports the first multiplex SNP genotyping assay for olive oil cultivar identification that is performed on a suspension of fluorescence-encoded microspheres. Up to 100 sets of microspheres, with unique "fluorescence signatures", are available. Allele discrimination was accomplished by primer extension reaction. The reaction products were captured via hybridization on the microspheres and analyzed, within seconds, by a flow cytometer. The "fluorescence signature" of each microsphere is assigned to a specific allele, whereas the signal from a reporter fluorophore denotes the presence of the allele. As a model, a panel of three SNPs was chosen that enabled identification of five common Greek olive cultivars (Adramytini, Chondrolia Chalkidikis, Kalamon, Koroneiki, and Valanolia).

  3. A Method for Checking Genomic Integrity in Cultured Cell Lines from SNP Genotyping Data.

    PubMed

    Danecek, Petr; McCarthy, Shane A; Durbin, Richard

    2016-01-01

    Genomic screening for chromosomal abnormalities is an important part of quality control when establishing and maintaining stem cell lines. We present a new method for sensitive detection of copy number alterations, aneuploidy, and contamination in cell lines using genome-wide SNP genotyping data. In contrast to other methods designed for identifying copy number variations in a single sample or in a sample composed of a mixture of normal and tumor cells, this new method is tailored for determining differences between cell lines and the starting material from which they were derived, which allows us to distinguish between normal and novel copy number variation. We implemented the method in the freely available BCFtools package and present results based on induced pluripotent stem cell lines obtained in the HipSci project.

  4. A Method for Checking Genomic Integrity in Cultured Cell Lines from SNP Genotyping Data

    PubMed Central

    Danecek, Petr; McCarthy, Shane A.; Durbin, Richard

    2016-01-01

    Genomic screening for chromosomal abnormalities is an important part of quality control when establishing and maintaining stem cell lines. We present a new method for sensitive detection of copy number alterations, aneuploidy, and contamination in cell lines using genome-wide SNP genotyping data. In contrast to other methods designed for identifying copy number variations in a single sample or in a sample composed of a mixture of normal and tumor cells, this new method is tailored for determining differences between cell lines and the starting material from which they were derived, which allows us to distinguish between normal and novel copy number variation. We implemented the method in the freely available BCFtools package and present results based on induced pluripotent stem cell lines obtained in the HipSci project. PMID:27176002

  5. To Cheat or Not To Cheat: Tryptophan Hydroxylase 2 SNP Variants Contribute to Dishonest Behavior.

    PubMed

    Shen, Qiang; Teo, Meijun; Winter, Eyal; Hart, Einav; Chew, Soo H; Ebstein, Richard P

    2016-01-01

    Although lying (bearing false witness) is explicitly prohibited in the Decalogue and is a focus of interest in philosophy and theology, the behavioral and neural mechanisms of deception have more recently been gaining increasing attention from diverse fields, especially economics, psychology, and neuroscience. Despite the considerable role of heredity in explaining individual differences in deceptive behavior, few studies have investigated which specific genes contribute to the heterogeneity of lying behavior across individuals. Also, little is known concerning which specific neurotransmitter pathways underlie deception. Toward addressing these two key questions, we implemented a neurogenetic strategy and modeled deception by an incentivized die-under-cup task in a laboratory setting. The results of this exploratory study provide provisional evidence that SNP variants across the tryptophan hydroxylase 2 (TPH2) gene, which encodes the rate-limiting enzyme in the biosynthesis of brain serotonin, contribute to individual differences in deceptive behavior.

  6. To Cheat or Not To Cheat: Tryptophan Hydroxylase 2 SNP Variants Contribute to Dishonest Behavior

    PubMed Central

    Shen, Qiang; Teo, Meijun; Winter, Eyal; Hart, Einav; Chew, Soo H.; Ebstein, Richard P.

    2016-01-01

    Although lying (bearing false witness) is explicitly prohibited in the Decalogue and is a focus of interest in philosophy and theology, the behavioral and neural mechanisms of deception have more recently been gaining increasing attention from diverse fields, especially economics, psychology, and neuroscience. Despite the considerable role of heredity in explaining individual differences in deceptive behavior, few studies have investigated which specific genes contribute to the heterogeneity of lying behavior across individuals. Also, little is known concerning which specific neurotransmitter pathways underlie deception. Toward addressing these two key questions, we implemented a neurogenetic strategy and modeled deception by an incentivized die-under-cup task in a laboratory setting. The results of this exploratory study provide provisional evidence that SNP variants across the tryptophan hydroxylase 2 (TPH2) gene, which encodes the rate-limiting enzyme in the biosynthesis of brain serotonin, contribute to individual differences in deceptive behavior. PMID:27199691

  7. Highly effective SNP-based association mapping and management of recessive defects in livestock.

    PubMed

    Charlier, Carole; Coppieters, Wouter; Rollin, Frédéric; Desmecht, Daniel; Agerholm, Jorgen S; Cambisano, Nadine; Carta, Eloisa; Dardano, Sabrina; Dive, Marc; Fasquelle, Corinne; Frennet, Jean-Claude; Hanset, Roger; Hubin, Xavier; Jorgensen, Claus; Karim, Latifa; Kent, Matthew; Harvey, Kirsten; Pearce, Brian R; Simon, Patricia; Tama, Nico; Nie, Haisheng; Vandeputte, Sébastien; Lien, Sigbjorn; Longeri, Maria; Fredholm, Merete; Harvey, Robert J; Georges, Michel

    2008-04-01

    The widespread use of elite sires by means of artificial insemination in livestock breeding leads to the frequent emergence of recessive genetic defects, which cause significant economic and animal welfare concerns. Here we show that the availability of genome-wide, high-density SNP panels, combined with the typical structure of livestock populations, markedly accelerates the positional identification of genes and mutations that cause inherited defects. We report the fine-scale mapping of five recessive disorders in cattle and the molecular basis for three of these: congenital muscular dystony (CMD) types 1 and 2 in Belgian Blue cattle and ichthyosis fetalis in Italian Chianina cattle. Identification of these causative mutations has an immediate translation into breeding practice, allowing marker assisted selection against the defects through avoidance of at-risk matings.

  8. SNP in starch biosynthesis genes associated with nutritional and functional properties of rice

    PubMed Central

    Kharabian-Masouleh, Ardashir; Waters, Daniel L. E.; Reinke, Russell F.; Ward, Rachelle; Henry, Robert J.

    2012-01-01

    Starch is a major component of human diets. The relative contribution of variation in the genes of starch biosynthesis to the nutritional and functional properties of the rice was evaluated in a rice breeding population. Sequencing 18 genes involved in starch synthesis in a population of 233 rice breeding lines discovered 66 functional SNPs in exonic regions. Five genes, AGPS2b, Isoamylase1, SPHOL, SSIIb and SSIVb showed no polymorphism. Association analysis found 31 of the SNP were associated with differences in pasting and cooking quality properties of the rice lines. Two genes appear to be the major loci controlling traits under human selection in rice, GBSSI (waxy gene) and SSIIa. GBSSI influenced amylose content and retrogradation. Other genes contributing to retrogradation were GPT1, SSI, BEI and SSIIIa. SSIIa explained much of the variation in cooking characteristics. Other genes had relatively small effects. PMID:22870386

  9. [Genetic diversity analysis of Andrographis paniculata in China based on SRAP and SNP].

    PubMed

    Chen, Rong; Wang, Xiao-Yun; Song, Yu-Ning; Zhu, Yun-feng; Wang, Peng-liang; Li, Min; Zhong, Guo-Yue

    2014-12-01

    In order to reveal genetic diversity of domestic Andrographis paniculata and its impact on quality, genetic backgrounds of 103 samples from 7 provinces in China were analyzed using SRAP marker and SNP marker. Genetic structures of the A. paniculata populations were estimated with Powermarker V 3.25 and Mega 6.0 software, and polymorphic SNPs were identified with CodonCode Aligner software. The results showed that the genetic distances of domestic A. paniculata germplasm ranged from 0.01 to 0.09, and no polymorphic SNPs were discovered in coding sequence fragments of ent-copalyl diphosphate synthase. A. paniculata germplasm from various regions in China had poor genetic diversity. This phenomenon was closely related to strict self-fertilization and earlier introduction from the same origin. Therefore, genetic background had little impact on variable qualities of A. paniculata in domestic market. Mutation breeding, polyploid breeding and molecular breeding were proposed as promising strategies in germplasm innovation.

  10. Analysis of Y-chromosomal SNP haplogroups and STR haplotypes in an Algerian population sample.

    PubMed

    Robino, C; Crobu, F; Di Gaetano, C; Bekada, A; Benhamamouch, S; Cerutti, N; Piazza, A; Inturri, S; Torre, C

    2008-05-01

    The distribution of Y-chromosomal single nucleotide polymorphism (SNP) haplogroups and short tandem repeat (STR) haplotypes was determined in a sample of 102 unrelated men of Arab origin from northwestern Algeria (Oran area). A total of nine different haplogroups were identified by a panel of 22 binary markers. The most common haplogroups observed in the Algerian population were E3b2 (45.1%) and J1 (22.5%). Y-STR typing by a 17-loci multiplex system allowed 93 haplotypes to be defined (88 were unique). Striking differences in the allele distribution and gene diversity of Y-STR markers between haplogroups could be found. In particular, intermediate alleles at locus DYS458 specifically characterized the haplotypes of individuals carrying haplogroup J1. All the intermediate alleles shared a common repeat sequence structure, supporting the hypothesis that the variant originated from a single mutational event.

  11. A high resolution genetic linkage map of soybean based on 357 recombinant inbred lines genotyped with BARCSoySNP6K

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The objective of this study was to construct a high density genetic map of soybean (Glycine max L. Merr) using a high throughput single nucleotide polymorphism (SNP) genotyping on 357 F7 recombinant inbred lines (RILs) from a cross of ‘Wyandot’ × PI 567301B. Of 5,403 SNP loci scored from the Infiniu...

  12. The human lactase persistence-associated SNP -13910*T enables in vivo functional persistence of lactase promoter-reporter transgene expression.

    PubMed

    Fang, Lin; Ahn, Jong Kun; Wodziak, Dariusz; Sibley, Eric

    2012-07-01

    Lactase is the intestinal enzyme responsible for digestion of the milk sugar lactose. Lactase gene expression declines dramatically upon weaning in mammals and during early childhood in humans (lactase nonpersistence). In various ethnic groups, however, lactase persists in high levels throughout adulthood (lactase persistence). Genetic association studies have identified that lactase persistence in northern Europeans is strongly associated with a single nucleotide polymorphism (SNP) located 14 kb upstream of the lactase gene: -13910*C/T. To determine whether the -13910*T SNP can function in vivo to mediate lactase persistence, we generated transgenic mice harboring human DNA fragments with the -13910*T SNP or the ancestral -13910*C SNP cloned upstream of a 2-kb rat lactase gene promoter in a luciferase reporter construct. We previously reported that the 2-kb rat lactase promoter directs a post-weaning decline of luciferase transgene expression similar to that of the endogenous lactase gene. In the present study, the post-weaning decline directed by the rat lactase promoter is impeded by addition of the -13910*T SNP human DNA fragment, but not by addition of the -13910*C ancestral SNP fragment. Persistence of transgene expression associated with the -13910*T SNP represents the first in vivo data in support of a functional role for the -13910*T SNP in mediating the human lactase persistence phenotype.

  13. Design of the Illumina Porcine 50K+ SNP Iselect(TM) Beadchip and Characterization of the Porcine HapMap Population

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Using next generation sequencing technology the International Swine SNP Consortium has identified 500,000 SNPs and used these to design an Illumina Infinium iSelect™ SNP BeadChip with a selection of 60,218 SNPs. The selected SNPs include previously validated SNPs and SNPs identified de novo using se...

  14. Electrochemical detection of type 2 diabetes mellitus-related SNP via DNA-mediated growth of silver nanoparticles on single walled carbon nanotubes.

    PubMed

    Tao, Jia; Zhao, Peng; Zheng, Jing; Wu, Cuichen; Shi, Muling; Li, Jishan; Li, Yinhui; Yang, Ronghua

    2015-11-07

    Herein, we proposed a new electrochemical sensing strategy for T2DM-related SNP detection via DNA-mediated growth of AgNPs on a SWCNT-modified electrode. Coupled with RNase HII enzyme assisted amplification, this approach could realize T2DM-related SNP assay and be applied in crude extracts of carcinoma pancreatic β-cell lines.

  15. A large maize (Zea Mays L.) SNP genotyping array: development and germplasm genotyping, and genetic mapping to compare with the B73 reference genome

    Technology Transfer Automated Retrieval System (TEKTRAN)

    SNP genotyping arrays have been useful for many applications that require a large number of molecular markers such as high-density genetic mapping, genome-wide association studies (GWAS), and genomic selection for accelerated breeding. We report the establishment of a large SNP array for maize and i...

  16. Switchgrass Genomic Diversity, Ploidy, and Evolution: Novel Insights from a Network-Based SNP Discovery Protocol

    PubMed Central

    Lu, Fei; Lipka, Alexander E.; Glaubitz, Jeff; Elshire, Rob; Cherney, Jerome H.; Casler, Michael D.; Buckler, Edward S.; Costich, Denise E.

    2013-01-01

    Switchgrass (Panicum virgatum L.) is a perennial grass that has been designated as an herbaceous model biofuel crop for the United States of America. To facilitate accelerated breeding programs of switchgrass, we developed both an association panel and linkage populations for genome-wide association study (GWAS) and genomic selection (GS). All of the 840 individuals were then genotyped using genotyping by sequencing (GBS), generating 350 GB of sequence in total. As a highly heterozygous polyploid (tetraploid and octoploid) species lacking a reference genome, switchgrass is highly intractable with earlier methodologies of single nucleotide polymorphism (SNP) discovery. To access the genetic diversity of species like switchgrass, we developed a SNP discovery pipeline based on a network approach called the Universal Network-Enabled Analysis Kit (UNEAK). Complexities that hinder single nucleotide polymorphism discovery, such as repeats, paralogs, and sequencing errors, are easily resolved with UNEAK. Here, 1.2 million putative SNPs were discovered in a diverse collection of primarily upland, northern-adapted switchgrass populations. Further analysis of this data set revealed the fundamentally diploid nature of tetraploid switchgrass. Taking advantage of the high conservation of genome structure between switchgrass and foxtail millet (Setaria italica (L.) P. Beauv.), two parent-specific, synteny-based, ultra high-density linkage maps containing a total of 88,217 SNPs were constructed. Also, our results showed clear patterns of isolation-by-distance and isolation-by-ploidy in natural populations of switchgrass. Phylogenetic analysis supported a general south-to-north migration path of switchgrass. In addition, this analysis suggested that upland tetraploid arose from upland octoploid. All together, this study provides unparalleled insights into the diversity, genomic complexity, population structure, phylogeny, phylogeography, ploidy, and evolutionary dynamics of

  17. JAM: A Scalable Bayesian Framework for Joint Analysis of Marginal SNP Effects

    PubMed Central

    Newcombe, Paul J.; Conti, David V.; Richardson, Sylvia

    2016-01-01

    Recently, large scale genome‐wide association study (GWAS) meta‐analyses have boosted the number of known signals for some traits into the tens and hundreds. Typically, however, variants are only analysed one‐at‐a‐time. This complicates the ability of fine‐mapping to identify a small set of SNPs for further functional follow‐up. We describe a new and scalable algorithm, joint analysis of marginal summary statistics (JAM), for the re‐analysis of published marginal summary statistics under joint multi‐SNP models. The correlation is accounted for according to estimates from a reference dataset, and models and SNPs that best explain the complete joint pattern of marginal effects are highlighted via an integrated Bayesian penalized regression framework. We provide both enumerated and Reversible Jump MCMC implementations of JAM and present some comparisons of performance. In a series of realistic simulation studies, JAM demonstrated identical performance to various alternatives designed for single region settings. In multi‐region settings, where the only multivariate alternative involves stepwise selection, JAM offered greater power and specificity. We also present an application to real published results from MAGIC (meta‐analysis of glucose and insulin related traits consortium) – a GWAS meta‐analysis of more than 15,000 people. We re‐analysed several genomic regions that produced multiple significant signals with glucose levels 2 hr after oral stimulation. Through joint multivariate modelling, JAM was able to formally rule out many SNPs, and for one gene, ADCY5, suggests that an additional SNP, which transpired to be more biologically plausible, should be followed up with equal priority to the reported index. PMID:27027514

  18. JAM: A Scalable Bayesian Framework for Joint Analysis of Marginal SNP Effects.

    PubMed

    Newcombe, Paul J; Conti, David V; Richardson, Sylvia

    2016-04-01

    Recently, large scale genome-wide association study (GWAS) meta-analyses have boosted the number of known signals for some traits into the tens and hundreds. Typically, however, variants are only analysed one-at-a-time. This complicates the ability of fine-mapping to identify a small set of SNPs for further functional follow-up. We describe a new and scalable algorithm, joint analysis of marginal summary statistics (JAM), for the re-analysis of published marginal summary statistics under joint multi-SNP models. The correlation is accounted for according to estimates from a reference dataset, and models and SNPs that best explain the complete joint pattern of marginal effects are highlighted via an integrated Bayesian penalized regression framework. We provide both enumerated and Reversible Jump MCMC implementations of JAM and present some comparisons of performance. In a series of realistic simulation studies, JAM demonstrated identical performance to various alternatives designed for single region settings. In multi-region settings, where the only multivariate alternative involves stepwise selection, JAM offered greater power and specificity. We also present an application to real published results from MAGIC (meta-analysis of glucose and insulin related traits consortium) - a GWAS meta-analysis of more than 15,000 people. We re-analysed several genomic regions that produced multiple significant signals with glucose levels 2 hr after oral stimulation. Through joint multivariate modelling, JAM was able to formally rule out many SNPs, and for one gene, ADCY5, suggests that an additional SNP, which transpired to be more biologically plausible, should be followed up with equal priority to the reported index.
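
    The gap between one-at-a-time (marginal) and joint multi-SNP effects described in this record can be illustrated with a small sketch. It is not the JAM algorithm, which uses a Bayesian penalized regression; it only shows the standard least-squares identity that, for standardized genotypes and trait, marginal effects equal the SNP correlation matrix times the joint effects. The function name, the two-SNP restriction, and the numbers are illustrative assumptions.

```python
def joint_from_marginal(b1: float, b2: float, r: float) -> tuple[float, float]:
    """Recover joint effects of two SNPs from their marginal effects.

    Assumes standardized genotypes and trait, so that
    marginal = R @ joint with R = [[1, r], [r, 1]],
    where r is the SNP-SNP correlation taken from a reference panel.
    """
    det = 1.0 - r * r
    if det <= 0.0:
        raise ValueError("SNPs are perfectly correlated; joint effects are not identifiable")
    # Closed-form inverse of the 2x2 correlation matrix applied to (b1, b2).
    j1 = (b1 - r * b2) / det
    j2 = (b2 - r * b1) / det
    return j1, j2


# Hypothetical example: SNP 2 has no effect of its own but is correlated
# (r = 0.8) with causal SNP 1, so its marginal effect is 0.8 * 0.5 = 0.4.
joint = joint_from_marginal(0.5, 0.4, 0.8)
print(joint)  # approximately (0.5, 0.0)
```

    In this toy case the second SNP looks associated when analysed alone, yet its joint effect is close to zero once the correlation is accounted for, which is the kind of signal a joint model can rule out.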

  19. A 'golden' SNP in CmOr governs the fruit flesh color of melon (Cucumis melo).

    PubMed

    Tzuri, Galil; Zhou, Xiangjun; Chayut, Noam; Yuan, Hui; Portnoy, Vitaly; Meir, Ayala; Sa'ar, Uzi; Baumkoler, Fabian; Mazourek, Michael; Lewinsohn, Efraim; Fei, Zhangjun; Schaffer, Arthur A; Li, Li; Burger, Joseph; Katzir, Nurit; Tadmor, Yaakov

    2015-04-01

    The flesh color of Cucumis melo (melon) is genetically determined, and can be white, light green or orange, with β-carotene being the predominant pigment. We associated carotenoid accumulation in melon fruit flesh with polymorphism within CmOr, a homolog of the cauliflower BoOr gene, and identified CmOr as the previously described gf locus in melon. CmOr was found to co-segregate with fruit flesh color, and presented two haplotypes (alleles) in a broad germplasm collection, one being associated with orange flesh and the second being associated with either white or green flesh. Allelic variation of CmOr does not affect its transcription or protein level. The variation also does not affect its plastid subcellular localization. Among the identified single nucleotide polymorphisms (SNPs) between CmOr alleles in orange versus green/white-flesh fruit, a single SNP causes a change of an evolutionarily highly conserved arginine to histidine in the CmOr protein. Functional analysis of CmOr haplotypes in an Arabidopsis callus system confirmed the ability of the CmOr orange haplotype to induce β-carotene accumulation. Site-directed mutagenesis of the CmOr green/white haplotype to change the CmOR arginine to histidine triggered β-carotene accumulation. The identification of the 'golden' SNP in CmOr, which is responsible for the non-orange and orange melon fruit phenotypes, provides new tools for studying the Or mechanism of action, and suggests genome editing of the Or gene for nutritional biofortification of crops.

  20. Two-stage designs to identify the effects of SNP combinations on complex diseases.

    PubMed

    Kang, Guolian; Yue, Weihua; Zhang, Jifeng; Huebner, Marianne; Zhang, Handi; Ruan, Yan; Lu, Tianlan; Ling, Yansu; Zuo, Yijun; Zhang, Dai

    2008-01-01

    The genetic basis of complex diseases is expected to be highly heterogeneous, with many disease genes, where each gene by itself has only a small effect. Based on the nonlinear contributions of disease genes across the genome to complex diseases, we introduce the concept of single nucleotide polymorphism (SNP) synergistic blocks. A two-stage approach is applied to detect the genetic association of synergistic blocks with a disease. In the first stage, synergistic blocks associated with a complex disease are identified by clustering SNP patterns and choosing blocks within a cluster that minimize a diversity criterion. In the second stage, a logistic regression model is given for a synergistic block. Using simulated case-control data, we demonstrate that our method has reasonable power to identify gene-gene interactions. To further evaluate the performance of our method, we apply our method to 17 loci of four candidate genes for paranoid schizophrenia in a Chinese population. Five synergistic blocks are found to be associated with schizophrenia, three of which are negatively associated (odds ratio, OR < 0.3, P < 0.05), while the others are positively associated (OR > 2.0, P < 0.05). The mathematical models of these five synergistic blocks are presented. The results suggest that there may be interactive effects for schizophrenia among variants of the genes neuregulin 1 (NRG1, 8p22-p11), G72 (13q34), the regulator of G-protein signaling-4 (RGS4, 1q21-q22) and frizzled 3 (FZD3, 8p21). Using synergistic blocks, we can reduce the dimensionality in a multi-locus association analysis, and evaluate the sizes of interactive effects among multiple disease genes on complex phenotypes.

  1. Exploring single-sample SNP and INDEL calling with whole-genome de novo assembly

    PubMed Central

    Li, Heng

    2012-01-01

    Motivation: Eugene Myers in his string graph paper suggested that in a string graph or equivalently a unitig graph, any path spells a valid assembly. As a string/unitig graph also encodes every valid assembly of reads, such a graph, provided that it can be constructed correctly, is in fact a lossless representation of reads. In principle, every analysis based on whole-genome shotgun sequencing (WGS) data, such as SNP and insertion/deletion (INDEL) calling, can also be achieved with unitigs. Results: To explore the feasibility of using de novo assembly in the context of resequencing, we developed a de novo assembler, fermi, that assembles Illumina short reads into unitigs while preserving most of the information of the input reads. SNPs and INDELs can be called by mapping the unitigs against a reference genome. By applying the method on 35-fold human resequencing data, we showed that in comparison to the standard pipeline, our approach yields similar accuracy for SNP calling and better results for INDEL calling. It has higher sensitivity than other de novo assembly based methods for variant calling. Our work suggests that variant calling with de novo assembly can be a beneficial complement to the standard variant calling pipeline for whole-genome resequencing. In the methodological aspects, we propose FMD-index for forward–backward extension of DNA sequences, a fast algorithm for finding all super-maximal exact matches and one-pass construction of unitigs from an FMD-index. Availability: http://github.com/lh3/fermi Contact: hengli@broadinstitute.org PMID:22569178

  2. Eurasiaplex: a forensic SNP assay for differentiating European and South Asian ancestries.

    PubMed

    Phillips, C; Freire Aradas, A; Kriegel, A K; Fondevila, M; Bulbul, O; Santos, C; Serrulla Rech, F; Perez Carceles, M D; Carracedo, Á; Schneider, P M; Lareu, M V

    2013-05-01

    We have selected a set of single nucleotide polymorphisms (SNPs) with the specific aim of differentiating European and South Asian ancestries. The SNPs were combined into a 23-plex SNaPshot primer extension assay: Eurasiaplex, designed to complement an existing 34-plex forensic ancestry test with both marker sets occupying well-spaced genomic positions, enabling their combination as single profile submissions to the Bayesian Snipper forensic ancestry inference system. We analyzed the ability of Eurasiaplex plus 34plex SNPs to assign ancestry to a total 1648 profiles from 16 European, 7 Middle East, 13 Central-South Asian and 21 East Asian populations. Ancestry assignment likelihoods were estimated from Snipper using training sets of five-group data (three Eurasian groups, East Asian and African genotypes) and four-group data (Middle East genotypes removed). Five-group differentiations gave assignment success of 91% for NW European populations, 72% for Middle East populations and 39% for Central-South Asian populations, indicating Middle East individuals are not reliably differentiated from either Europeans or Central-South Asians. Four-group differentiations provided markedly improved assignment success rates of 97% for most continental Europeans tested (excluding Turkish and Adygei at the far eastern edge of Europe) and 95% for Central-South Asians, despite applying a probability threshold for the highest likelihood ratio above '100 times more likely'. As part of the assessment of the sensitivity of Eurasiaplex to analyze challenging forensic material we detail Eurasiaplex and 34-plex SNP typing to infer ancestry of a cranium recovered from the sea, achieving 82% SNP genotype completeness. Therefore, Eurasiaplex provides an informative and forensically robust approach to the differentiation of European and South Asian ancestries amongst Eurasian populations.
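
    The likelihood-ratio threshold mentioned in this record ("100 times more likely") can be sketched as a naive Bayes style calculation. This is a minimal illustration, not the Snipper implementation: it assumes Hardy-Weinberg proportions, independent SNPs, and entirely hypothetical allele frequencies and population labels.

```python
from math import prod


def genotype_prob(alt_count: int, alt_freq: float) -> float:
    """Hardy-Weinberg probability of carrying 0, 1 or 2 copies of the alternate allele."""
    p, q = alt_freq, 1.0 - alt_freq
    return (q * q, 2.0 * p * q, p * p)[alt_count]


def profile_likelihood(genotypes: list[int], alt_freqs: list[float]) -> float:
    """Likelihood of a multi-SNP profile, treating SNPs as independent."""
    return prod(genotype_prob(g, f) for g, f in zip(genotypes, alt_freqs))


def assign_ancestry(genotypes, pop_freqs, threshold=100.0):
    """Return (population or None, likelihood ratio of best vs second-best group)."""
    liks = {pop: profile_likelihood(genotypes, f) for pop, f in pop_freqs.items()}
    ranked = sorted(liks.items(), key=lambda kv: kv[1], reverse=True)
    (best, l1), (_, l2) = ranked[0], ranked[1]
    ratio = l1 / l2 if l2 > 0 else float("inf")
    # Only assign when the best group is more than `threshold` times as likely.
    return (best if ratio > threshold else None), ratio


# Hypothetical alternate-allele frequencies at three SNPs for two groups.
FREQS = {"EUR": [0.9, 0.8, 0.1], "SAS": [0.2, 0.3, 0.7]}

print(assign_ancestry([2, 2, 0], FREQS))  # ratio well above 100, assigned to "EUR"
print(assign_ancestry([1, 1, 1], FREQS))  # ratio below 100, left unassigned (None)
```

    Profiles whose best-to-second-best ratio falls under the threshold are left unassigned rather than forced into a group, which mirrors how a probability threshold trades assignment rate for reliability.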

  3. EST-derived SNP discovery and selective pressure analysis in Pacific white shrimp ( Litopenaeus vannamei)

    NASA Astrophysics Data System (ADS)

    Liu, Chengzhang; Wang, Xia; Xiang, Jianhai; Li, Fuhua

    2012-09-01

    Pacific white shrimp has become a major aquaculture and fishery species worldwide. Although a large scale EST resource has been publicly available since 2008, the data have not yet been widely used for SNP discovery or transcriptome-wide assessment of selective pressure. In this study, a set of 155 411 expressed sequence tags (ESTs) from the NCBI database were computationally analyzed and 17 225 single nucleotide polymorphisms (SNPs) were predicted, including 9 546 transitions, 5 124 transversions and 2 481 indels. Among the 7 298 SNP substitutions located in functionally annotated contigs, 58.4% (4 262) are non-synonymous SNPs capable of introducing amino acid mutations. Two hundred and fifty nonsynonymous SNPs in genes associated with economic traits have been identified as candidates for markers in selective breeding. Diversity estimates among the synonymous nucleotides were on average 3.49 times greater than those in non-synonymous, suggesting negative selection. Distribution of non-synonymous to synonymous substitutions (Ka/Ks) ratio ranges from 0 to 4.01, (average 0.42, median 0.26), suggesting that the majority of the affected genes are under purifying selection. Enrichment analysis identified multiple gene ontology categories under positive or negative selection. Categories involved in innate immune response and male gamete generation are rich in positively selected genes, which is similar to reports in Drosophila and primates. This work is the first transcriptome-wide assessment of selective pressure in a Penaeid shrimp species. The functionally annotated SNPs provide a valuable resource of potential molecular markers for selective breeding.
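
    The transition and transversion counts reported in this record follow from a simple rule: a substitution within the purines (A, G) or within the pyrimidines (C, T) is a transition, and any purine-pyrimidine swap is a transversion. The sketch below shows that rule and the resulting ratio from the reported counts; the function names are illustrative and not taken from the study's pipeline.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Classify a single-base substitution as 'transition' or 'transversion'."""
    ref, alt = ref.upper(), alt.upper()
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    # Same chemical class on both sides means a transition.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


def ts_tv_ratio(transitions: int, transversions: int) -> float:
    """Ratio of transition to transversion counts."""
    return transitions / transversions


print(classify_substitution("A", "G"))  # transition
print(classify_substitution("A", "C"))  # transversion
# Counts reported in the abstract: 9 546 transitions, 5 124 transversions.
print(round(ts_tv_ratio(9546, 5124), 3))  # 1.863
```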

  4. Tomato breeding in the genomics era: insights from a SNP array

    PubMed Central

    2013-01-01

    Background The major bottleneck in genetic and linkage studies in tomato has been the lack of a sufficient number of molecular markers. This has radically changed with the application of next generation sequencing and high throughput genotyping. A set of 6000 SNPs was identified and 5528 of them were used to evaluate tomato germplasm at the level of species, varieties and segregating populations. Results From the 5528 SNPs, 1980 originated from 454-sequencing, 3495 from Illumina Solexa sequencing and 53 were additional known markers. Genotyping different tomato samples allowed the evaluation of the level of heterozygosity and introgressions among commercial varieties. Cherry tomatoes were especially different from round/beefs in chromosomes 4, 5 and 12. We were able to identify a set of 750 unique markers distinguishing S. lycopersicum ‘Moneymaker’ from all its distantly related wild relatives. Clustering and neighbour joining analysis among varieties and species showed expected grouping patterns, with S. pimpinellifolium as the most closely related to commercial tomatoes, in line with earlier results. Conclusions Our results show that a SNP search in only a few breeding lines already provides generally applicable markers in tomato and its wild relatives. It also shows that the Illumina bead array generated data are highly reproducible. Our SNPs can roughly be divided in two categories: SNPs of which both forms are present in the wild relatives and in domesticated tomatoes (originating from common ancestors) and SNPs unique for the domesticated tomato (originating from after the domestication event). The SNPs can be used for genotyping, identification of varieties, comparison of genetic and physical linkage maps and to confirm (phylogenetic) relations. In the SNPs used for the array there is hardly any overlap with the SolCAP array and it is strongly recommended to combine both SNP sets and to select a core collection of robust SNPs completely covering the entire tomato

  5. Multilocus analysis of SNP and metabolic data within a given pathway

    PubMed Central

    Kristensen, Vessela N; Tsalenko, Anya; Geisler, Jurgen; Faldaas, Anne; Grenaker, Grethe Irene; Lingjærde, Ole Christian; Fjeldstad, Ståle; Yakhini, Zohar; Lønning, Per Eystein; Børresen-Dale, Anne-Lise

    2006-01-01

    Background Complex traits, which are under the influence of multiple and possibly interacting genes, have become a subject of new statistical methodological research. One of the greatest challenges facing human geneticists is the identification and characterization of susceptibility genes for common multifactorial diseases and their association to different quantitative phenotypic traits. Results Two types of data from the same metabolic pathway were used in the analysis: categorical measurements of 18 SNPs; and quantitative measurements of plasma levels of several steroids and their precursors. Using the combinatorial partitioning method we tested various thresholds for each metabolic trait and each individual SNP locus. One SNP in the CYP19 3′UTR, two SNPs in CYP1B1 (R48G and A119S) and one in CYP1A1 (T461N) were significantly differently distributed between the high and low level metabolic groups. The leave-one-out cross-validation method showed that 6 SNPs in concert give 65% correct prediction of phenotype. Further, we used pattern recognition, computing the p-value by Monte Carlo simulation, to identify sets of SNPs and physiological characteristics such as age and weight that contribute to a given metabolic level. Since the SNPs detected by both methods reside either in the same gene (CYP1B1) or in 3 different genes in immediate vicinity on chromosome 15 (CYP19, CYP11 and CYP1A1) we investigated the possibility that they form intragenic and intergenic haplotypes, which may jointly account for a higher activity in the pathway. We identified such haplotypes associated with metabolic levels. Conclusion The methods reported here may enable the study of multiple low-penetrance genetic factors that together determine various quantitative phenotypic traits. Our preliminary data suggest that several genes coding for proteins involved in a common pathway, that happen to be located on common chromosomal areas and may form intragenic haplotypes, together account for a higher

  6. SNP-Based Quantification of Allele-Specific DNA Methylation Patterns by Pyrosequencing®.

    PubMed

    Busato, Florence; Tost, Jörg

    2015-01-01

    The analysis of allele-specific DNA methylation patterns has recently attracted much interest, as loci of allele-specific DNA methylation overlap with known risk loci for complex diseases, and the analysis might contribute to the fine-mapping and interpretation of non-coding genetic variants associated with complex diseases and improve the understanding of the relationship between genotype and phenotype. In this protocol, we present a method for the analysis of DNA methylation patterns on both alleles separately, using heterozygous Single Nucleotide Polymorphisms (SNPs) as anchors for allele-specific PCR amplification followed by analysis of the allele-specific DNA methylation patterns by Pyrosequencing®. Pyrosequencing is an easy-to-handle, quantitative real-time sequencing method that is frequently used for genotyping as well as for the analysis of DNA methylation patterns. The protocol consists of three major steps: (1) identification of individuals heterozygous for a SNP in a region of interest using Pyrosequencing; (2) analysis of the DNA methylation patterns surrounding the SNP on bisulfite-treated DNA to identify regions of potential allele-specific DNA methylation; and (3) the analysis of the DNA methylation patterns associated with each of the two alleles, which are individually amplified using allele-specific PCR. The enrichment of the targeted allele is reinforced by modification of the allele-specific primers at the allele-discriminating base with Locked Nucleic Acids (LNA). As proof of principle of the developed approach, we provide assay details for three imprinted genes (IGF2, IGF2R, and PEG3) within this chapter. The mean of the DNA methylation patterns derived from the individual alleles corresponds well to the overall DNA methylation patterns, and the developed approach proved more reliable than other protocols for allele-specific DNA methylation analysis.

  7. High-density SNP genotyping to define beta-globin locus haplotypes.

    PubMed

    Liu, Li; Muralidhar, Shalini; Singh, Manisha; Sylvan, Caprice; Kalra, Inderdeep S; Quinn, Charles T; Onyekwere, Onyinye C; Pace, Betty S

    2009-01-01

    Five major beta-globin locus haplotypes have been established in individuals with sickle cell disease (SCD) from the Benin, Bantu, Senegal, Cameroon, and Arab-Indian populations. Historically, beta-haplotypes were established using restriction fragment length polymorphism (RFLP) analysis across the beta-locus, which consists of five functional beta-like globin genes located on chromosome 11. Previous attempts to correlate these haplotypes as robust predictors of clinical phenotypes observed in SCD have not been successful. We speculate that the coverage and distribution of the RFLP sites located proximal to or within the globin genes are not sufficiently dense to accurately reflect the complexity of this region. To test our hypothesis, we performed RFLP analysis and high-density single nucleotide polymorphism (SNP) genotyping across the beta-locus using DNA samples from healthy African Americans with either normal hemoglobin A (HbAA) or individuals with homozygous SS (HbSS) disease. Using the genotyping data from 88 SNPs and Haploview analysis, we generated a greater number of haplotypes than that observed with RFLP analysis alone. Furthermore, a unique pattern of long-range linkage disequilibrium between the locus control region and the beta-like globin genes was observed in the HbSS group. Interestingly, we observed multiple SNPs within the HindIII restriction site located in the Ggamma-globin intervening sequence II which produced the same RFLP pattern. These findings illustrated the inability of RFLP analysis to decipher the complexity of sequence variations that impacts genomic structure in this region. Our data suggest that high-density SNP mapping may be required to accurately define beta-haplotypes that correlate with the different clinical phenotypes observed in SCD.

  8. Predicting HLA alleles from high-resolution SNP data in three Southeast Asian populations.

    PubMed

    Pillai, Nisha Esakimuthu; Okada, Yukinori; Saw, Woei-Yuh; Ong, Rick Twee-Hee; Wang, Xu; Tantoso, Erwin; Xu, Wenting; Peterson, Trevor A; Bielawny, Thomas; Ali, Mohammad; Tay, Koon-Yong; Poh, Wan-Ting; Tan, Linda Wei-Lin; Koo, Seok-Hwee; Lim, Wei-Yen; Soong, Richie; Wenk, Markus; Raychaudhuri, Soumya; Little, Peter; Plummer, Francis A; Lee, Edmund J D; Chia, Kee-Seng; Luo, Ma; De Bakker, Paul I W; Teo, Yik-Ying

    2014-08-15

    The major histocompatibility complex (MHC) containing the classical human leukocyte antigen (HLA) Class I and Class II genes is among the most polymorphic and diverse regions in the human genome. Despite the clinical importance of identifying the HLA types, very few databases jointly characterize densely genotyped single nucleotide polymorphisms (SNPs) and HLA alleles in the same samples. To date, the HapMap presents the only public resource that provides a SNP reference panel for predicting HLA alleles, constructed with four collections of individuals of north-western European, northern Han Chinese, cosmopolitan Japanese and Yoruba Nigerian ancestry. Owing to complex patterns of linkage disequilibrium in this region, it is unclear whether the HapMap reference panels can be appropriately utilized for other populations. Here, we describe a public resource for the Singapore Genome Variation Project with: (i) dense genotyping across ∼ 9000 SNPs in the MHC; (ii) four-digit HLA typing for eight Class I and Class II loci, in 96 southern Han Chinese, 89 Southeast Asian Malays and 83 Tamil Indians. This resource provides population estimates of the frequencies of HLA alleles at these eight loci in the three population groups, particularly for HLA-DPA1 and HLA-DPB1 that were not assayed in HapMap. Comparing between population-specific reference panels and a cosmopolitan panel created from all four HapMap populations, we demonstrate that more accurate imputation is obtained with population-specific panels than with the cosmopolitan panel, especially for the Malays and Indians but even when imputing between northern and southern Han Chinese. As with SNP imputation, common HLA alleles were imputed with greater accuracy than low-frequency variants.

  9. Mapping of Genetic Abnormalities of Primary Tumours from Metastatic CRC by High-Resolution SNP Arrays

    PubMed Central

    Sayagués, José María; Fontanillo, Celia; Abad, María del Mar; González-González, María; Sarasquete, María Eugenia; del Carmen Chillon, Maria; Garcia, Eva; Bengoechea, Oscar; Fonseca, Emilio; Gonzalez-Diaz, Marcos; De Las Rivas, Javier

    2010-01-01

    Background For years, the genetics of metastatic colorectal cancer (CRC) have been studied using a variety of techniques. However, most of the approaches employed so far have a relatively limited resolution which hampers detailed characterization of the common recurrent chromosomal breakpoints as well as the identification of small regions carrying genetic changes and the genes involved in them. Methodology/Principal Findings Here we applied 500K SNP arrays to map the most common chromosomal lesions present at diagnosis in a series of 23 primary tumours from sporadic CRC patients who had developed liver metastasis. Overall our results confirm that the genetic profile of metastatic CRC is defined by imbalanced gains of chromosomes 7, 8q, 11q, 13q, 20q and X together with losses of the 1p, 8p, 17p and 18q chromosome regions. In addition, SNP-array studies allowed the identification of small (<1.3 Mb) and extensive/large (>1.5 Mb) altered DNA sequences, many of which contain cancer genes known to be involved in CRC and the metastatic process. Detailed characterization of the breakpoint regions for the altered chromosomes showed four recurrent breakpoints at chromosomes 1p12, 8p12, 17p11.2 and 20p12.1; interestingly, the most frequently observed recurrent chromosomal breakpoint was localized at 17p11.2 and systematically targeted the FAM27L gene, whose role in CRC deserves further investigations. Conclusions/Significance In summary, in the present study we provide a detailed map of the genetic abnormalities of primary tumours from metastatic CRC patients, which confirm and extend on previous observations as regards the identification of genes potentially involved in development of CRC and the metastatic process. PMID:21060790

  10. High-density SNP-based genetic maps for the parents of an outcrossed and a selfed tetraploid garden rose cross, inferred from admixed progeny using the 68k rose SNP array

    PubMed Central

    Vukosavljev, Mirjana; Arens, Paul; Voorrips, Roeland E; van ‘t Westende, Wendy PC; Esselink, GD; Bourke, Peter M; Cox, Peter; van de Weg, W Eric; Visser, Richard GF; Maliepaard, Chris; Smulders, Marinus JM

    2016-01-01

    Dense genetic maps create a base for QTL analysis of important traits and future implementation of marker-assisted breeding. In tetraploid rose, the existing linkage maps include <300 markers to cover 28 linkage groups (4 homologous sets of 7 chromosomes). Here we used the 68k WagRhSNP Axiom single-nucleotide polymorphism (SNP) array for rose, in combination with SNP dosage calling at the tetraploid level, to genotype offspring from the garden rose cultivar ‘Red New Dawn’. The offspring proved to be not from a single bi-parental cross. In rose breeding, crosses with unintended parents occur regularly. We developed a strategy to separate progeny into putative populations, even while one of the parents was unknown, using principal component analysis on pairwise genetic distances based on sets of selected SNP markers that were homozygous, and therefore uninformative for one parent. One of the inferred populations was consistent with self-fertilization of ‘Red New Dawn’. Subsequently, linkage maps were generated for a bi-parental and a self-pollinated population with ‘Red New Dawn’ as the common maternal parent. The densest map, for the selfed parent, had 1929 SNP markers on 25 linkage groups, covering 1765.5 cM at an average marker distance of 0.9 cM. Synteny with the strawberry (Fragaria vesca) genome was extensive. Rose ICM1 corresponded to F. vesca pseudochromosome 7 (Fv7), ICM4 to Fv4, ICM5 to Fv3, ICM6 to Fv2 and ICM7 to Fv5. Rose ICM2 corresponded to parts of F. vesca pseudochromosomes 1 and 6, whereas ICM3 is syntenic to the remainder of Fv6. PMID:27818777
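
    As a quick arithmetic check on the map density quoted in this record, the sketch below divides the reported map length by the reported marker count. The function name is illustrative and not taken from the paper, and treating the average spacing as simply length divided by marker count is an assumption about how the figure was derived.

```python
def average_marker_spacing(map_length_cm: float, n_markers: int) -> float:
    """Average spacing in cM, assumed here to be map length / number of markers."""
    if n_markers <= 0:
        raise ValueError("n_markers must be positive")
    return map_length_cm / n_markers


# Values quoted in the abstract: 1929 SNP markers covering 1765.5 cM.
spacing = average_marker_spacing(1765.5, 1929)
print(f"{spacing:.1f} cM")
```

    Under that assumption the result rounds to the 0.9 cM stated in the abstract.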

  11. Development of EST-based SNP and InDel markers and their utilization in tetraploid cotton genetic mapping

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Expressed sequence tags (ESTs) were analyzed in silico in order to identify single nucleotide polymorphisms (SNPs) and insertion-deletion polymorphisms (InDels) in cotton. A total of 1349 EST-based SNP and InDel markers were developed by comparing ESTs between Gossypium hirsutum and G. barbadense, m...

  12. SEL1L SNP rs12435998, a predictor of glioblastoma survival and response to radio-chemotherapy

    PubMed Central

    Storaci, Alessandra Maria; Annovazzi, Laura; Cassoni, Paola; Melcarne, Antonio; De Blasio, Pasquale; Schiffer, Davide; Biunno, Ida

    2015-01-01

    The suppressor of Lin-12-like (C. elegans) (SEL1L) is involved in the endoplasmic reticulum (ER)-associated degradation pathway, malignant transformation and stem cells. In 412 formalin-fixed and paraffin-embedded brain tumors and 39 Glioblastoma multiforme (GBM) cell lines, we determined the frequency of five SEL1L single nucleotide genetic variants with regulatory and coding functions by a SNaPShot™ assay. We tested their possible association with brain tumor risk, prognosis and therapy. We studied the in vitro cytotoxicity of valproic acid (VPA), temozolomide (TMZ), doxorubicin (DOX) and paclitaxel (PTX), alone or in combination, on 11 GBM cell lines, with respect to the SNP rs12435998 genotype. The SNP rs12435998 was prevalent in anaplastic and malignant gliomas, and in meningiomas of all histologic grades, but unrelated to brain tumor risks. In GBM patients, the SNP rs12435998 was associated with prolonged overall survival (OS) and better response to TMZ-based radio-chemotherapy. GBM stem cells with this SNP showed lower levels of SEL1L expression and enhanced sensitivity to VPA. PMID:25948789

  13. Transcriptome analysis and SNP validation in sorghum using RNA seq data from germplasm with differential response to cold tolerance.

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Currently there is a critical need for breeder-friendly, easy-access and high-throughput single nucleotide polymorphic (SNP) markers in implementation of molecular breeding for sorghum improvement. To address this need we performed transcriptome profiling between cold sensitive and tolerant sorghum l...

  14. DNA sequences of Pima (Gossypium barbadense L.) cotton leaf for examining transcriptome diversity and SNP biomarker discovery

    Technology Transfer Automated Retrieval System (TEKTRAN)

    As an initial step to explore the transcriptome genetic diversity and to discover single nucleotide polymorphic (SNP) biomarkers for marker-assisted breeding within Pima (Gossypium barbadense L.) cotton, leaves from 25-day-old plants of three diverse genotypes were used to develop cDNA libraries. Using ...

  15. CLOCK 3111 T/C SNP interacts with emotional eating behavior for weight-loss in a Mediterranean population

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The goals of this research were (1) to analyze the role of emotional eating behavior on weight-loss progression during a 30-week weight-loss program in 1,272 individuals from a large Mediterranean population and (2) to test for interaction between CLOCK 3111 T/C SNP and emotional eating behavior on t...

  16. Identification of novel single nucleotide polymorphisms (SNPs) in deer (Odocoileus spp.) using the BovineSNP50 BeadChip.

    PubMed

    Haynes, Gwilym D; Latch, Emily K

    2012-01-01

    Single nucleotide polymorphisms (SNPs) are growing in popularity as a genetic marker for investigating evolutionary processes. A panel of SNPs is often developed by comparing large quantities of DNA sequence data across multiple individuals to identify polymorphic sites. For non-model species, this is particularly difficult, as performing the necessary large-scale genomic sequencing often exceeds the resources available for the project. In this study, we trial the Bovine SNP50 BeadChip developed in cattle (Bos taurus) for identifying polymorphic SNPs in cervids Odocoileus hemionus (mule deer and black-tailed deer) and O. virginianus (white-tailed deer) in the Pacific Northwest. We found that 38.7% of loci could be genotyped, of which 5% (n = 1068) were polymorphic. Of these 1068 polymorphic SNPs, a mixture of putatively neutral loci (n = 878) and loci under selection (n = 190) were identified with the F(ST)-outlier method. A range of population genetic analyses were implemented using these SNPs and a panel of 10 microsatellite loci. The three types of deer could readily be distinguished with both the SNP and microsatellite datasets. This study demonstrates that commercially developed SNP chips are a viable means of SNP discovery for non-model organisms, even when used between very distantly related species (the Bovidae and Cervidae families diverged some 25.1-30.1 million years before present).

  17. Re-evaluating data quality of dog mitochondrial, Y chromosomal, and autosomal SNPs genotyped by SNP array.

    PubMed

    O Otecko, Newton; Peng, Min-Sheng; Yang, He-Chuan; Zhang, Ya-Ping; Wang, Guo-Dong

    2016-11-18

    Quality deficiencies in single nucleotide polymorphism (SNP) analyses have important implications. We used missingness rates to investigate the quality of a recently published dataset containing 424 mitochondrial, 211 Y chromosomal, and 160 432 autosomal SNPs generated by a semicustom Illumina SNP array from 5 392 dogs and 14 grey wolves. Overall, the individual missingness rate for mitochondrial SNPs was ~43.8%, with 980 (18.1%) individuals completely missing mitochondrial SNP genotyping (missingness rate=1). In males, the genotype missingness rate was ~28.8% for Y chromosomal SNPs, with 374 males recording rates above 0.96. These 374 males also exhibited completely failed mitochondrial SNPs genotyping, indicative of a batch effect. Individual missingness rates for autosomal markers were greater than zero, but less than 0.5. Neither mitochondrial nor Y chromosomal SNPs achieved complete genotyping (locus missingness rate=0), whereas 5.9% of autosomal SNPs had a locus missingness rate=1. The high missingness rates and possible batch effect show that caution and rigorous measures are vital when genotyping and analyzing SNP array data for domestic animals. Further improvements of these arrays will be helpful to future studies.
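
    The two missingness rates used in this record can be illustrated with a minimal sketch. It assumes a genotype matrix with one row per individual and one column per SNP locus, with `None` marking a failed call. The function names and the toy data are hypothetical and are not taken from the study.

```python
from typing import List, Optional

Genotype = Optional[str]  # None marks a missing genotype call (assumed encoding)


def individual_missingness(matrix: List[List[Genotype]]) -> List[float]:
    """Fraction of loci with no call, computed per individual (row)."""
    return [sum(g is None for g in row) / len(row) for row in matrix]


def locus_missingness(matrix: List[List[Genotype]]) -> List[float]:
    """Fraction of individuals with no call, computed per locus (column)."""
    n_individuals = len(matrix)
    n_loci = len(matrix[0])
    return [
        sum(row[j] is None for row in matrix) / n_individuals
        for j in range(n_loci)
    ]


# Toy data: 3 individuals x 4 loci.
toy = [
    ["AA", "AG", None, "GG"],
    [None, None, None, None],   # completely failed individual (rate = 1)
    ["AG", None, None, "AG"],
]
ind_rates = individual_missingness(toy)
loc_rates = locus_missingness(toy)
print(ind_rates)
print(loc_rates)
```

    In the toy matrix the second individual has a rate of 1, the case the abstract calls "completely missing", and the third locus has a locus missingness rate of 1.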

  18. Development of a high-throughput SNP resource to advance genomic, genetic and breeding research in carrot (Daucus carota L.)

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The rapid advancement in high-throughput SNP genotyping technologies along with next generation sequencing (NGS) platforms has decreased the cost, improved the quality of large-scale genome surveys, and allowed specialty crops with limited genomic resources such as carrot (Daucus carota) to access t...

  19. Re-evaluating data quality of dog mitochondrial, Y chromosomal, and autosomal SNPs genotyped by SNP array

    PubMed Central

    OTECKO, Newton O.; PENG, Min-Sheng; YANG, He-Chuan; ZHANG, Ya-Ping; WANG, Guo-Dong

    2016-01-01

    Quality deficiencies in single nucleotide polymorphism (SNP) analyses have important implications. We used missingness rates to investigate the quality of a recently published dataset containing 424 mitochondrial, 211 Y chromosomal, and 160 432 autosomal SNPs generated by a semicustom Illumina SNP array from 5 392 dogs and 14 grey wolves. Overall, the individual missingness rate for mitochondrial SNPs was ~43.8%, with 980 (18.1%) individuals completely missing mitochondrial SNP genotyping (missingness rate=1). In males, the genotype missingness rate was ~28.8% for Y chromosomal SNPs, with 374 males recording rates above 0.96. These 374 males also exhibited completely failed mitochondrial SNPs genotyping, indicative of a batch effect. Individual missingness rates for autosomal markers were greater than zero, but less than 0.5. Neither mitochondrial nor Y chromosomal SNPs achieved complete genotyping (locus missingness rate=0), whereas 5.9% of autosomal SNPs had a locus missingness rate=1. The high missingness rates and possible batch effect show that caution and rigorous measures are vital when genotyping and analyzing SNP array data for domestic animals. Further improvements of these arrays will be helpful to future studies. PMID:28105800

  20. SNP marker development for linkage map construction, anchoring of the common bean whole genome sequence and genetic research

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Our objectives were to identify SNP DNA markers based on a diverse set of common bean cultivars via next generation sequencing technologies; to develop Illumina Infinium BeadChip assays containing SNPs with high polymorphism within and between common bean market classes, to create high density genet...

  1. An ultra-dense SNP linkage map for the octoploid, cultivated strawberry and its application in genetic research

    Technology Transfer Automated Retrieval System (TEKTRAN)

    We will present an ultra-dense genetic linkage map for the octoploid, cultivated strawberry (Fragaria x ananassa) consisting of over 13K Axiom® based SNP markers and 150 previously mapped reference SSR loci. The high quality of the map is demonstrated by the short sizes of each of the 28 linkage gro...

  2. Imputation of microsatellite alleles from dense SNP genotypes for parentage verification across multiple Bos taurus and Bos indicus breeds

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Microsatellite markers (MS) have traditionally been used for parental verification and are still the international standard in spite of their higher cost, error rate, and turnaround time compared with Single Nucleotide Polymorphism (SNP)-based assays. Despite domestic and international demands fr...

  3. Comparative analysis of CNV calling algorithms: literature survey and a case study using bovine high-density SNP data

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Copy number variations (CNVs) are gains and losses of genomic sequence between two individuals of a species. The data from single nucleotide polymorphism (SNP) microarrays are now routinely used for genotyping, but they also can be utilized for copy number detection. Substantial progress has been ...

  4. Translational genomics for abiotic stress in sorghum: transcriptional profiling and validation of SNP markers between germplasm with differential cold tolerance

    Technology Transfer Automated Retrieval System (TEKTRAN)

    One focus of the Sorghum Translational Genomics Lab (part of sorghum CRIS, PSGD, CSRL, USDA-ARS, Lubbock TX) is to utilize nucleotide variation between sorghum germplasm such as those derived from RNA seq for translation and validation of Single Nucleotide Polymorphism (SNP) into easy access DNA m...

  5. Development and evaluation of a genome-wide 6K SNP array for diploid sweet cherry and tetraploid sour cherry

    Technology Transfer Automated Retrieval System (TEKTRAN)

    High-throughput genome scans are important tools for genetic studies and breeding applications. Here, a 6K SNP array for use with the Illumina Infinium® system was developed for diploid sweet cherry (Prunus avium) and allotetraploid sour cherry (P. cerasus). This effort was led by RosBREED, a commun...

  6. Characterizing Associations and SNP-Environment Interactions for GWAS-Identified Prostate Cancer Risk Markers—Results from BPC3

    PubMed Central

    Lindstrom, Sara; Schumacher, Fredrick; Siddiq, Afshan; Travis, Ruth C.; Campa, Daniele; Berndt, Sonja I.; Diver, W. Ryan; Severi, Gianluca; Allen, Naomi; Andriole, Gerald; Bueno-de-Mesquita, Bas; Chanock, Stephen J.; Crawford, David; Gaziano, J. Michael; Giles, Graham G.; Giovannucci, Edward; Guo, Carolyn; Haiman, Christopher A.; Hayes, Richard B.; Halkjaer, Jytte; Hunter, David J.; Johansson, Mattias; Kaaks, Rudolf; Kolonel, Laurence N.; Navarro, Carmen; Riboli, Elio; Sacerdote, Carlotta; Stampfer, Meir; Stram, Daniel O.; Thun, Michael J.; Trichopoulos, Dimitrios; Virtamo, Jarmo; Weinstein, Stephanie J.; Yeager, Meredith; Henderson, Brian; Ma, Jing; Le Marchand, Loic; Albanes, Demetrius; Kraft, Peter

    2011-01-01

    Genome-wide association studies (GWAS) have identified multiple single nucleotide polymorphisms (SNPs) associated with prostate cancer risk. However, whether these associations can be consistently replicated, vary with disease aggressiveness (tumor stage and grade) and/or interact with non-genetic potential risk factors or other SNPs is unknown. We therefore genotyped 39 SNPs from regions identified by several prostate cancer GWAS in 10,501 prostate cancer cases and 10,831 controls from the NCI Breast and Prostate Cancer Cohort Consortium (BPC3). We replicated 36 out of 39 SNPs (P-values ranging from 0.01 to 10^-28). Two SNPs located near KLK3 associated with PSA levels showed differential association with Gleason grade (rs2735839, P = 0.0001 and rs266849, P = 0.0004; case-only test), where the alleles associated with decreasing PSA levels were inversely associated with low-grade (as defined by Gleason grade <8) tumors but positively associated with high-grade tumors. No other SNP showed differential associations according to disease stage or grade. We observed no effect modification by SNP for association with age at diagnosis, family history of prostate cancer, diabetes, BMI, height, smoking or alcohol intake. Moreover, we found no evidence of pair-wise SNP-SNP interactions. While these SNPs represent new independent risk factors for prostate cancer, we saw little evidence for effect modification by other SNPs or by the environmental factors examined. PMID:21390317

  7. SNP markers identify widely distributed clonal lineages of Phytophthora colocasiae in Vietnam, Hawaii and Hainan Island, China.

    PubMed

    Shrestha, Sandesh; Hu, Jian; Fryxell, Rebecca Trout; Mudge, Joann; Lamour, Kurt

    2014-01-01

    Taro (Colocasia esculenta) is an important food crop, and taro leaf blight caused by Phytophthora colocasiae can significantly affect production. Our objectives were to develop single nucleotide polymorphism (SNP) markers for P. colocasiae and characterize populations in Hawaii (HI), Vietnam (VN) and Hainan Island, China (HIC). In total, 379 isolates were analyzed for mating type and multilocus SNP profiles including 214 from HI, 97 from VN and 68 from HIC. A total of 1152 single nucleotide variant (SNV) sites were identified via restriction site-associated DNA (RAD) sequencing of two field isolates. Genotyping with 27 SNPs revealed 41 multilocus SNP genotypes grouped into seven clonal lineages containing 2-232 members. Three clonal lineages were shared among countries. In addition, five SNP markers had a low incidence of loss of heterozygosity (LOH) during asexual laboratory growth. For HI and VN, >95% of isolates were the A2 mating type. On HIC, isolates within single clonal lineages had A1, A2 and A0 (neuter) isolates. The implications for the wide dispersal of clonal lineages are discussed.
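
    The idea of a multilocus SNP genotype can be sketched as follows: isolates that share an identical call at every genotyped SNP are counted as one genotype. The abstract does not say how genotypes were grouped into clonal lineages, so this sketch stops at counting identical profiles. The isolate names, the three-SNP profiles and the function name are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def multilocus_genotypes(profiles: Dict[str, List[str]]) -> Counter:
    """Count isolates per distinct multilocus SNP profile (exact matches only)."""
    return Counter(tuple(calls) for calls in profiles.values())


# Toy data: three isolates typed at three SNPs (the study used 27 SNPs).
isolates = {
    "iso1": ["AA", "CT", "GG"],
    "iso2": ["AA", "CT", "GG"],
    "iso3": ["AG", "CC", "GG"],
}
mlg = multilocus_genotypes(isolates)
print(len(mlg), "distinct multilocus genotypes")
for profile, count in mlg.most_common():
    print(profile, count)
```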

  8. Association of STAT2 SNP genotypes and growth phenotypes in heifers from an Angus, Brahman and Romosinuano diallel population

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Components of the growth endocrine axis regulate growth and reproduction traits in cattle. A SNP in the promoter of the signal transducer and activator of transcription 2 (STAT2) has been previously reported to be associated with postpartum rebreeding in a diallel beef population composed of 650 hei...

  9. Population-standardized genetic risk score: the SNP-based method of choice for inherited risk assessment of prostate cancer

    PubMed Central

    Conran, Carly A; Na, Rong; Chen, Haitao; Jiang, Deke; Lin, Xiaoling; Zheng, S Lilly; Brendler, Charles B; Xu, Jianfeng

    2016-01-01

    Several different approaches are available to clinicians for determining prostate cancer (PCa) risk. The clinical validity of various PCa risk assessment methods utilizing single nucleotide polymorphisms (SNPs) has been established; however, these SNP-based methods have not been compared. The objective of this study was to compare the three most commonly used SNP-based methods for PCa risk assessment. Participants were men (n = 1654) enrolled in a prospective study of PCa development. Genotypes of 59 PCa risk-associated SNPs were available in this cohort. Three methods of calculating SNP-based genetic risk scores (GRSs) were used for the evaluation of individual disease risk: risk allele count (GRS-RAC), weighted risk allele count (GRS-wRAC), and population-standardized genetic risk score (GRS-PS). Mean GRSs were calculated, and performances were compared using area under the receiver operating characteristic curve (AUC) and positive predictive value (PPV). All SNP-based methods were found to be independently associated with PCa (all P < 0.05), supporting their clinical validity. The mean GRSs in men with or without PCa using GRS-RAC were 55.15 and 53.46, respectively, using GRS-wRAC were 7.42 and 6.97, respectively, and using GRS-PS were 1.12 and 0.84, respectively (all P < 0.05 for differences between patients with or without PCa). All three SNP-based methods performed similarly in discriminating PCa from non-PCa based on AUC and in predicting PCa risk based on PPV (all P > 0.05 for comparisons between the three methods), and all three SNP-based methods had a significantly higher AUC than family history (all P < 0.05). Results from this study suggest that while the three most commonly used SNP-based methods performed similarly in discriminating PCa from non-PCa at the population level, GRS-PS is the method of choice for risk assessment at the individual level because its value (where 1.0 represents average population risk) can be easily interpreted regardless
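
    The abstract names the three scores but does not give their formulas. The sketch below uses commonly cited definitions and should be read as an assumption rather than the authors' exact method: GRS-RAC sums risk-allele counts, GRS-wRAC weights each count by the natural log of the SNP's odds ratio, and GRS-PS multiplies per-SNP relative risks that are each divided by the population-average risk under a multiplicative per-allele model with Hardy-Weinberg genotype proportions. All function names and numeric inputs are hypothetical.

```python
import math
from typing import Sequence


def grs_rac(genotypes: Sequence[int]) -> int:
    """Risk allele count: sum of risk alleles (0, 1 or 2 per SNP)."""
    return sum(genotypes)


def grs_wrac(genotypes: Sequence[int], odds_ratios: Sequence[float]) -> float:
    """Weighted count: each risk allele weighted by ln(OR) (assumed weighting)."""
    return sum(g * math.log(o) for g, o in zip(genotypes, odds_ratios))


def grs_ps(
    genotypes: Sequence[int],
    odds_ratios: Sequence[float],
    risk_allele_freqs: Sequence[float],
) -> float:
    """Population-standardized score; 1.0 corresponds to average population risk."""
    score = 1.0
    for g, o, f in zip(genotypes, odds_ratios, risk_allele_freqs):
        # Population-average risk at this SNP under Hardy-Weinberg proportions.
        pop_mean = (1 - f) ** 2 + 2 * f * (1 - f) * o + f ** 2 * o ** 2
        score *= o ** g / pop_mean
    return score


# Hypothetical three-SNP example (not data from the study).
genotypes = [0, 1, 2]
odds_ratios = [1.2, 1.1, 1.3]
freqs = [0.3, 0.5, 0.2]
print(grs_rac(genotypes))
print(grs_wrac(genotypes, odds_ratios))
print(grs_ps(genotypes, odds_ratios, freqs))
```

    With this standardization the genotype-frequency-weighted mean of each per-SNP factor is 1, which is what lets a value of 1.0 be read as average population risk.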

  10. SNP Discovery and Chromosome Anchoring Provide the First Physically-Anchored Hexaploid Oat Map and Reveal Synteny with Model Species

    PubMed Central

    Chao, Shiaoman; Jellen, Eric N.; Carson, Martin L.; Rines, Howard W.; Obert, Donald E.; Lutz, Joseph D.; Shackelford, Irene; Korol, Abraham B.; Wight, Charlene P.; Gardner, Kyle M.; Hattori, Jiro; Beattie, Aaron D.; Bjørnstad, Åsmund; Bonman, J. Michael; Jannink, Jean-Luc; Sorrells, Mark E.; Brown-Guedira, Gina L.; Mitchell Fetch, Jennifer W.; Harrison, Stephen A.; Howarth, Catherine J.; Ibrahim, Amir; Kolb, Frederic L.; McMullen, Michael S.; Murphy, J. Paul; Ohm, Herbert W.; Rossnagel, Brian G.; Yan, Weikai; Miclaus, Kelci J.; Hiller, Jordan; Maughan, Peter J.; Redman Hulse, Rachel R.; Anderson, Joseph M.; Islamovic, Emir

    2013-01-01

    A physically anchored consensus map is foundational to modern genomics research; however, construction of such a map in oat (Avena sativa L., 2n = 6x = 42) has been hindered by the size and complexity of the genome, the scarcity of robust molecular markers, and the lack of aneuploid stocks. Resources developed in this study include a modified SNP discovery method for complex genomes, a diverse set of oat SNP markers, and a novel chromosome-deficient SNP anchoring strategy. These resources were applied to build the first complete, physically-anchored consensus map of hexaploid oat. Approximately 11,000 high-confidence in silico SNPs were discovered based on nine million inter-varietal sequence reads of genomic and cDNA origin. GoldenGate genotyping of 3,072 SNP assays yielded 1,311 robust markers, of which 985 were mapped in 390 recombinant-inbred lines from six bi-parental mapping populations ranging in size from 49 to 97 progeny. The consensus map included 985 SNPs and 68 previously-published markers, resolving 21 linkage groups with a total map distance of 1,838.8 cM. Consensus linkage groups were assigned to 21 chromosomes using SNP deletion analysis of chromosome-deficient monosomic hybrid stocks. Alignments with sequenced genomes of rice and Brachypodium provide evidence for extensive conservation of genomic regions, and renewed encouragement for orthology-based genomic discovery in this important hexaploid species. These results also provide a framework for high-resolution genetic analysis in oat, and a model for marker development and map construction in other species with complex genomes and limited resources. PMID:23533580

  11. Modification of heparanase gene expression in response to conditioning and LPS treatment: strong correlation to rs4693608 SNP.

    PubMed

    Ostrovsky, Olga; Shimoni, Avichai; Baryakh, Polina; Morgulis, Yan; Mayorov, Margarita; Beider, Katia; Shteingauz, Anna; Ilan, Neta; Vlodavsky, Israel; Nagler, Arnon

    2014-04-01

    Heparanase is an endo-β-glucuronidase that specifically cleaves the saccharide chains of HSPGs, important structural and functional components of the ECM. Cleavage of HS leads to loss of the structural integrity of the ECM and release of HS-bound cytokines, chemokines, and bioactive angiogenic- and growth-promoting factors. Our previous study revealed a highly significant correlation of HPSE gene SNPs rs4693608 and rs4364254 and their combination with the risk of developing GVHD. We now demonstrate that HPSE is up-regulated in response to pretransplantation conditioning, followed by a gradual decrease thereafter. Expression of heparanase correlated with the rs4693608 HPSE SNP before and after conditioning. Moreover, a positive correlation was found between recipient and donor rs4693608 SNP discrepancy and the time of neutrophil and platelet recovery. Similarly, the discrepancy in rs4693608 HPSE SNP between recipients and donors was found to be a more significant factor for the risk of aGVHD than patient genotype. The rs4693608 SNP also affected HPSE gene expression in LPS-treated MNCs from PB and CB. Possessors of the AA genotype exhibited up-regulation of heparanase with a high ratio in the LPS-treated MNCs, whereas individuals with genotype GG showed down-regulation or no effect on HPSE gene expression. HPSE up-regulation was mediated by TLR4. The study emphasizes the importance of rs4693608 SNP for HPSE gene expression in activated MNCs, indicating a role in allogeneic stem cell transplantation, including postconditioning, engraftment, and GVHD.

  12. Development of a SNP-based panel for human identification for Indian populations.

    PubMed

    Sarkar, Anujit; Nandineni, Madhusudan R

    2017-03-01

    The widely employed short tandem repeat (STR)-based panels for forensic human identification (HID) have limitations when dealing with challenging forensic samples involving DNA degradation, which results in drop-out of higher molecular weight alleles/loci. To address this issue, biallelic markers like single nucleotide polymorphisms (SNPs) and insertion-deletions (indels), which can be scored even when the template DNA is heavily degraded (<100 bp), have been suggested as alternative markers for HID testing. Recent studies have highlighted their utility in forensic HID and several panels based on biallelic markers have been described for worldwide populations. However, there has been very little information about the behavior of such DNA markers in Indian populations, which are known to possess great genetic diversity. This study describes a two-step approach for designing a SNP-based panel consisting of 70 SNPs for HID testing in Indian populations. In the first step, candidate SNPs were shortlisted from public databases by screening them for several criteria including allelic distribution, genomic location, potential phenotypic expression or functionality and species specificity. The second step involved genotyping the shortlisted SNPs in various Indian populations followed by shortlisting of the best performers for identity-testing. Starting with 592,652 SNPs listed in Human660W-Quad Beadchip (Illumina Inc.), we shortlisted 275 candidate SNPs for identity-testing and genotyped them in 462 unrelated individuals from different population groups in India. Post genotyping and statistical analyses based on biogeographic regions, 206 SNPs demonstrated desired allelic distribution (Heterozygosity ≥ 0.4 and FST ≤ 0.02), from which 2-4 widely separated (>20 Mb apart) SNPs from each chromosome were finally selected to construct a panel of 70 SNPs. This panel on average possessed match probability 10e-29 and probability of paternity of 0.99999997, which was orders of

  13. Association of the ARL15 rs6450176 SNP and serum lipid levels in the Jing and Han populations

    PubMed Central

    Sun, Jia-Qi; Yin, Rui-Xing; Shi, Guang-Yuan; Shen, Shao-Wen; Chen, Xia; Bin, Yuan; Huang, Feng; Wang, Wei; Lin, Wei-Xiong; Pan, Shang-Ling

    2015-01-01

    The association of ADP-ribosylation factor-like 15 (ARL15) rs6450176 single nucleotide polymorphism (SNP) and serum lipid profiles has never been studied in the Chinese population. The present study was undertaken to detect the association of ARL15 rs6450176 SNP and several environmental factors with serum lipid levels in the Jing and Han populations. Genotypes of the SNP were determined in 726 unrelated subjects of Jing nationality and 726 participants of Han nationality. The genotypic and allelic frequencies of the SNP in Jing but not in Han were different between males and females (P < 0.001 and P < 0.05; respectively). The G allele carriers in Han had lower serum total cholesterol (TC), low-density lipoprotein cholesterol (LDL-C) and apolipoprotein (Apo) B levels, and higher ApoA1/ApoB ratio than the G allele non-carriers (P < 0.05-0.01). The G allele carriers in Jing had lower serum TC, high-density lipoprotein cholesterol (HDL-C), ApoA1, ApoB levels and higher ApoA1/ApoB ratio than the G allele non-carriers (P < 0.05 for all). Subgroup analyses showed that the G allele carriers had lower TC and LDL-C levels in Han males; lower LDL-C and ApoB levels in Han females; lower ApoB levels and ApoA1/ApoB ratio in Jing males; and lower LDL-C levels in Jing females than the G allele non-carriers (P < 0.05-0.01). Multiple linear regression analysis showed that serum TC, LDL-C, ApoB levels and the ApoA1/ApoB ratio in Han; and TC, HDL-C and ApoA1 levels in Jing were correlated with the genotypes of the ARL15 rs6450176 SNP (P < 0.05-0.001). Serum lipid parameters were also associated with several environmental factors in both ethnic groups. These findings indicated that there may be a racial/ethnic- and/or sex-specific association of the ARL15 rs6450176 SNP and serum lipid levels. PMID:26722494

  14. Whole-genome single-nucleotide polymorphism (SNP) marker discovery and association analysis with the eicosapentaenoic acid (EPA) and docosahexaenoic acid (DHA) content in Larimichthys crocea

    PubMed Central

    Xiao, Shijun; Wang, Panpan; Dong, Linsong; Zhang, Yaguang; Han, Zhaofang; Wang, Qiurong

    2016-01-01

    Whole-genome single-nucleotide polymorphism (SNP) markers are valuable genetic resources for association and conservation studies. Genome-wide SNP development in many teleost species is still challenging because of genome complexity and the cost of re-sequencing. Genotyping-By-Sequencing (GBS) provides an efficient reduced-representation method that lowers the cost of SNP detection; however, most recent GBS applications have been reported in plants. In this work, we applied an EcoRI-NlaIII based GBS protocol to the teleost large yellow croaker, an important commercial fish in China and East Asia, and report the first whole-genome SNP development for the species. A total of 69,845 high-quality SNP markers, evenly distributed along the genome, were detected in at least 80% of 500 individuals. Nearly 95% of randomly selected genotypes were successfully validated by Sequenom MassARRAY assay. The association studies with muscle eicosapentaenoic acid (EPA) and docosahexaenoic acid (DHA) content discovered 39 significant SNP markers, contributing up to ∼63% of the genetic variance explained by all markers. Functional genes involved in the fat digestion and absorption pathway were identified, such as APOB, CRAT and OSBPL10. Notably, the PPT2 gene, previously identified in an association study of plasma n-3 and n-6 polyunsaturated fatty acid levels in humans, was re-discovered in large yellow croaker. Our study verified that EcoRI-NlaIII based GBS can produce quality SNP markers in a cost-efficient manner in a teleost genome. The developed SNP markers and the EPA- and DHA-associated SNP loci provide invaluable resources for studies of population structure, conservation genetics and genomic selection in large yellow croaker and other fish. PMID:28028455

  15. Review of alignment and SNP calling algorithms for next-generation sequencing data.

    PubMed

    Mielczarek, M; Szyda, J

    2016-02-01

    Application of massively parallel sequencing technology has become one of the most important issues in life sciences. Therefore, it was crucial to develop bioinformatics tools for next-generation sequencing (NGS) data processing. Currently, two of the most significant tasks include alignment to a reference genome and detection of single nucleotide polymorphisms (SNPs). In many types of genomic analyses, great numbers of reads need to be mapped to the reference genome; therefore, selection of the aligner is an essential step in NGS pipelines. Two main algorithms, suffix tries and hash tables, have been introduced for this purpose. Suffix array-based aligners are memory-efficient and work faster than hash-based aligners, but they are less accurate. In contrast, hash table algorithms tend to be slower, but more sensitive. SNP and genotype callers may also be divided into two main approaches: heuristic and probabilistic methods. A variety of software has been developed over the past several years. In this paper, we briefly review the current development of NGS data processing algorithms and present the available software.
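
    The two index families named in this abstract can be illustrated with a toy exact-match lookup. The sketch below is only an illustration under simplifying assumptions: the function names and the reference string are hypothetical, a plain sorted suffix array stands in for the suffix-based family, and no mismatches or gaps are handled. Production aligners use far more compact indexes and tolerate sequencing errors.

```python
from collections import defaultdict


def build_kmer_index(ref, k):
    """Hash-table index: map every k-mer in the reference to its start positions."""
    index = defaultdict(list)
    for i in range(len(ref) - k + 1):
        index[ref[i:i + k]].append(i)
    return index


def hash_lookup(index, ref, read, k):
    """Seed with the read's first k-mer, then verify the full read at each hit."""
    return [p for p in index.get(read[:k], []) if ref[p:p + len(read)] == read]


def build_suffix_array(ref):
    """Suffix array: start positions of all suffixes in lexicographic order."""
    return sorted(range(len(ref)), key=lambda i: ref[i:])


def suffix_array_lookup(sa, ref, read):
    """Binary-search the sorted suffixes for those that start with the read."""
    m = len(read)
    lo, hi = 0, len(sa)
    while lo < hi:  # find the first suffix whose m-length prefix is >= read
        mid = (lo + hi) // 2
        if ref[sa[mid]:sa[mid] + m] < read:
            lo = mid + 1
        else:
            hi = mid
    hits = []
    while lo < len(sa) and ref[sa[lo]:sa[lo] + m] == read:
        hits.append(sa[lo])  # matching suffixes are contiguous in the array
        lo += 1
    return sorted(hits)


reference = "ACGTACGTTACG"  # made-up reference sequence
kmer_index = build_kmer_index(reference, 3)
suffix_array = build_suffix_array(reference)
```

    Both lookups return the same start positions for an exact match; they differ in how the index is stored and searched, which is where the memory and speed trade-offs discussed in the review come from.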

  16. Genome-wide SNP analysis explains coral diversity and recovery in the Ryukyu Archipelago

    PubMed Central

    Shinzato, Chuya; Mungpakdee, Sutada; Arakaki, Nana; Satoh, Noriyuki

    2015-01-01

    Following a global coral bleaching event in 1998, Acropora corals surrounding most of Okinawa island (OI) were devastated, although they are now gradually recovering. In contrast, the Kerama Islands (KIs), only 30 km west of OI, have continuously hosted a great variety of healthy corals. Taking advantage of the decoded Acropora digitifera genome and using genome-wide SNP analyses, we clarified Acropora population structure in the southern Ryukyu Archipelago (sRA). Despite small genetic distances, we identified distinct clusters corresponding to specific island groups, suggesting infrequent long-distance dispersal within the sRA. Although the KIs were believed to supply coral larvae to OI, admixture analyses showed that such dispersal is much more limited than previously realized, indicating independent recovery of OI coral populations and the necessity of local conservation efforts for each region. We detected strong historical migration from the Yaeyama Islands (YIs) to OI, and suggest that the YIs are the original source of OI corals. In addition, migration edges to the KIs suggest that they are a historical sink population in the sRA, resulting in high diversity. This population genomics study provides the highest resolution data to date regarding coral population structure and history. PMID:26656261

  17. ChroMoS: an integrated web tool for SNP classification, prioritization and functional interpretation

    PubMed Central

    Barenboim, Maxim; Manke, Thomas

    2013-01-01

    Summary: Genome-wide association studies and re-sequencing projects are revealing an increasing number of disease-associated SNPs, a large fraction of which are non-coding. Although they could have relevance for disease susceptibility and progression, the lack of information about regulatory regions impedes the assessment of their functionality. Here we present a web server, ChroMoS (Chromatin Modified SNPs), which combines genetic and epigenetic data with the goal of facilitating SNPs' classification, prioritization and prediction of their functional consequences. ChroMoS uses a large database of SNPs and chromatin states, but allows a user to provide his/her own genetic information. Based on the SNP classification and interactive prioritization, a user can compute the functional impact of multiple SNPs using two prediction tools, one for differential analysis of transcription factor binding (sTRAP) and another for SNPs with potential impact on binding of miRNAs (MicroSNiPer). Availability: Web server, ChroMoS, is freely available at http://epicenter.immunbio.mpg.de/services/chromos. Contact: barenboim@ie-freiburg.mpg.de or manke@ie-freiburg.mpg.de PMID:23782616

  18. AncestrySNPminer: A bioinformatics tool to retrieve and develop ancestry informative SNP panels

    PubMed Central

    Amirisetty, Sushil; Khurana Hershey, Gurjit K.; Baye, Tesfaye M.

    2012-01-01

    A wealth of genomic information is available in public and private databases. However, this information is underutilized for uncovering population specific and functionally relevant markers underlying complex human traits. Given the huge amount of SNP data available from the annotation of human genetic variation, data mining is a faster and more cost-effective approach for investigating the number of SNPs that are informative for ancestry. In this study, we present AncestrySNPminer, the first web-based bioinformatics tool specifically designed to retrieve Ancestry Informative Markers (AIMs) from genomic data sets and link these informative markers to genes and ontological annotation classes. The tool includes an automated and simple “scripting at the click of a button” functionality that enables researchers to perform various population genomics statistical analysis methods with user-friendly querying and filtering of data sets across various populations through a single web interface. AncestrySNPminer can be freely accessed at https://research.cchmc.org/mershalab/AncestrySNPminer/login.php. PMID:22584067

  19. SNP/RD typing of Mycobacterium tuberculosis Beijing strains reveals local and worldwide disseminated clonal complexes.

    PubMed

    Schürch, Anita C; Kremer, Kristin; Hendriks, Amber C A; Freyee, Benthe; McEvoy, Christopher R E; van Crevel, Reinout; Boeree, Martin J; van Helden, Paul; Warren, Robin M; Siezen, Roland J; van Soolingen, Dick

    2011-01-01

    The Beijing strain is one of the most successful genotypes of Mycobacterium tuberculosis worldwide and appears to be highly homogenous according to existing genotyping methods. To type Beijing strains reliably we developed a robust typing scheme using single nucleotide polymorphisms (SNPs) and regions of difference (RDs) derived from whole-genome sequencing data of eight Beijing strains. SNP/RD typing of 259 M. tuberculosis isolates originating from 45 countries worldwide discriminated 27 clonal complexes within the Beijing genotype family. A total of 16 Beijing clonal complexes contained more than one isolate of known origin, of which two clonal complexes were strongly associated with South African origin. The remaining 14 clonal complexes encompassed isolates from different countries. Even highly resolved clonal complexes comprised isolates from distinct geographical sites. Our results suggest that Beijing strains spread globally on multiple occasions and that the tuberculosis epidemic caused by the Beijing genotype is at least partially driven by modern migration patterns. The SNPs and RDs presented in this study will facilitate future molecular epidemiological and phylogenetic studies on Beijing strains.

  20. Genome-wide SNP analysis explains coral diversity and recovery in the Ryukyu Archipelago.

    PubMed

    Shinzato, Chuya; Mungpakdee, Sutada; Arakaki, Nana; Satoh, Noriyuki

    2015-12-10

    Following a global coral bleaching event in 1998, Acropora corals surrounding most of Okinawa island (OI) were devastated, although they are now gradually recovering. In contrast, the Kerama Islands (KIs), only 30 km west of OI, have continuously hosted a great variety of healthy corals. Taking advantage of the decoded Acropora digitifera genome and using genome-wide SNP analyses, we clarified Acropora population structure in the southern Ryukyu Archipelago (sRA). Despite small genetic distances, we identified distinct clusters corresponding to specific island groups, suggesting infrequent long-distance dispersal within the sRA. Although the KIs were believed to supply coral larvae to OI, admixture analyses showed that such dispersal is much more limited than previously realized, indicating independent recovery of OI coral populations and the necessity of local conservation efforts for each region. We detected strong historical migration from the Yaeyama Islands (YIs) to OI, and suggest that the YIs are the original source of OI corals. In addition, migration edges to the KIs suggest that they are a historical sink population in the sRA, resulting in high diversity. This population genomics study provides the highest resolution data to date regarding coral population structure and history.

  1. TNF promoter SNP variation in Amerindians and white-admixed women from Misiones, Argentina.

    PubMed

    Badano, I; Schurr, T G; Stietz, S M; Dulik, M C; Mampaey, M; Quintero, I M; Zinovich, J B; Campos, R H; Liotta, D J

    2013-06-01

    The aim of this study is to describe genetic variation in the TNF promoter in the ethnically diverse population of Misiones, north-eastern Argentina. We analysed 210 women including 66 Amerindians of the Mbya-Guarani ethnic group and 144 white-admixed individuals from urban and rural areas of Misiones. Their DNA samples were surveyed for TNF polymorphisms -376 A/G, -308 A/G, -244 A/G and -238 A/G by PCR amplification and direct sequencing and for the Amerindian marker -857 C/T by real-time PCR. Our main findings are as follows: (i) a distinctive pattern of Single Nucleotide Polymorphism (SNP) distribution among these groups, (ii) genetic differentiation between the Mbya-Guarani and the white-admixed populations (P < 0.05), (iii) lower gene diversity (~0.05) in Mbya-Guarani compared with the white-admixed group (~0.21), and (iv) linkage disequilibrium between the -376A and -238A SNPs in white-admixed populations. These data highlight the principal role of population history in establishing present-day genetic variation at the TNF locus and provide a framework for undertaking ethnographic and disease association studies in Misiones.

  2. Prospective diagnostic analysis of copy number variants using SNP microarrays in individuals with autism spectrum disorders

    PubMed Central

    Nava, Caroline; Keren, Boris; Mignot, Cyril; Rastetter, Agnès; Chantot-Bastaraud, Sandra; Faudet, Anne; Fonteneau, Eric; Amiet, Claire; Laurent, Claudine; Jacquette, Aurélia; Whalen, Sandra; Afenjar, Alexandra; Périsse, Didier; Doummar, Diane; Dorison, Nathalie; Leboyer, Marion; Siffroi, Jean-Pierre; Cohen, David; Brice, Alexis; Héron, Delphine; Depienne, Christel

    2014-01-01

    Copy number variants (CNVs) have repeatedly been found to cause or predispose to autism spectrum disorders (ASDs). For diagnostic purposes, we screened 194 individuals with ASDs for CNVs using Illumina SNP arrays. In several probands, we also analyzed candidate genes located in inherited deletions to unmask autosomal recessive variants. Three CNVs, a de novo triplication of chromosome 15q11–q12 of paternal origin, a deletion on chromosome 9p24 and a de novo 3q29 deletion, were identified as the cause of the disorder in one individual each. An autosomal recessive cause was considered possible in two patients: a homozygous 1p31.1 deletion encompassing PTGER3 and a deletion of the entire DOCK10 gene associated with a rare hemizygous missense variant. We also identified multiple private or recurrent CNVs, the majority of which were inherited from asymptomatic parents. Although highly penetrant CNVs or variants inherited in an autosomal recessive manner were detected in rare cases, our results mainly support the hypothesis that most CNVs contribute to ASDs in association with other CNVs or point variants located elsewhere in the genome. Identification of these genetic interactions in individuals with ASDs constitutes a formidable challenge. PMID:23632794

  3. Prospective diagnostic analysis of copy number variants using SNP microarrays in individuals with autism spectrum disorders.

    PubMed

    Nava, Caroline; Keren, Boris; Mignot, Cyril; Rastetter, Agnès; Chantot-Bastaraud, Sandra; Faudet, Anne; Fonteneau, Eric; Amiet, Claire; Laurent, Claudine; Jacquette, Aurélia; Whalen, Sandra; Afenjar, Alexandra; Périsse, Didier; Doummar, Diane; Dorison, Nathalie; Leboyer, Marion; Siffroi, Jean-Pierre; Cohen, David; Brice, Alexis; Héron, Delphine; Depienne, Christel

    2014-01-01

    Copy number variants (CNVs) have repeatedly been found to cause or predispose to autism spectrum disorders (ASDs). For diagnostic purposes, we screened 194 individuals with ASDs for CNVs using Illumina SNP arrays. In several probands, we also analyzed candidate genes located in inherited deletions to unmask autosomal recessive variants. Three CNVs, a de novo triplication of chromosome 15q11-q12 of paternal origin, a deletion on chromosome 9p24 and a de novo 3q29 deletion, were identified as the cause of the disorder in one individual each. An autosomal recessive cause was considered possible in two patients: a homozygous 1p31.1 deletion encompassing PTGER3 and a deletion of the entire DOCK10 gene associated with a rare hemizygous missense variant. We also identified multiple private or recurrent CNVs, the majority of which were inherited from asymptomatic parents. Although highly penetrant CNVs or variants inherited in an autosomal recessive manner were detected in rare cases, our results mainly support the hypothesis that most CNVs contribute to ASDs in association with other CNVs or point variants located elsewhere in the genome. Identification of these genetic interactions in individuals with ASDs constitutes a formidable challenge.

  4. SNPpy - Database Management for SNP Data from Genome Wide Association Studies

    PubMed Central

    Mitha, Faheem; Herodotou, Herodotos; Borisov, Nedyalko; Jiang, Chen; Yoder, Josh; Owzar, Kouros

    2011-01-01

    Background: We describe SNPpy, a hybrid script database system using the Python SQLAlchemy library coupled with the PostgreSQL database to manage genotype data from Genome-Wide Association Studies (GWAS). This system makes it possible to merge study data with HapMap data and merge across studies for meta-analyses, including data filtering based on the values of phenotype and Single-Nucleotide Polymorphism (SNP) data. SNPpy and its dependencies are open source software. Results: The current version of SNPpy offers utility functions to import genotype and annotation data from two commercial platforms. We use these to import data from two GWAS studies and the HapMap Project. We then export these individual datasets to standard data format files that can be imported into statistical software for downstream analyses. Conclusions: By leveraging the power of relational databases, SNPpy offers integrated management and manipulation of genotype and phenotype data from GWAS studies. The analysis of these studies requires merging across GWAS datasets as well as patient and marker selection. To this end, SNPpy enables the user to filter the data and output the results as standardized GWAS file formats. It does low level and flexible data validation, including validation of patient data. SNPpy is a practical and extensible solution for investigators who seek to deploy central management of their GWAS data. PMID:22039405
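
    The relational idea in this abstract, merging genotypes across studies and filtering on phenotype, can be sketched in a few lines. The example below is an illustration only: it uses Python's built-in sqlite3 module in place of the SQLAlchemy and PostgreSQL stack the abstract names, and the table layout, column names, study labels and marker IDs are hypothetical rather than SNPpy's actual schema.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical two-table layout."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE subject (id INTEGER PRIMARY KEY, study TEXT, phenotype REAL);
        CREATE TABLE genotype (
            subject_id INTEGER REFERENCES subject(id),
            marker TEXT,
            alleles TEXT,
            PRIMARY KEY (subject_id, marker)
        );
    """)
    # Made-up subjects from two studies, with a made-up quantitative phenotype.
    conn.executemany(
        "INSERT INTO subject VALUES (?, ?, ?)",
        [(1, "study_A", 1.2), (2, "study_A", 0.4), (3, "study_B", 2.0)],
    )
    # snp_a is typed in both studies, snp_b only in study_A.
    conn.executemany(
        "INSERT INTO genotype VALUES (?, ?, ?)",
        [(1, "snp_a", "AG"), (2, "snp_a", "AA"), (3, "snp_a", "GG"),
         (1, "snp_b", "CT"), (2, "snp_b", "CC")],
    )
    return conn


def merged_calls(conn, min_phenotype):
    """Genotypes for markers typed in every study, restricted by phenotype."""
    query = """
        SELECT g.marker, s.study, s.id, g.alleles
        FROM genotype AS g
        JOIN subject AS s ON s.id = g.subject_id
        WHERE s.phenotype >= ?
          AND g.marker IN (
              SELECT g2.marker
              FROM genotype AS g2
              JOIN subject AS s2 ON s2.id = g2.subject_id
              GROUP BY g2.marker
              HAVING COUNT(DISTINCT s2.study) =
                     (SELECT COUNT(DISTINCT study) FROM subject)
          )
        ORDER BY g.marker, s.id
    """
    return conn.execute(query, (min_phenotype,)).fetchall()
```

    Calling merged_calls(build_demo_db(), 1.0) keeps only the marker shared by both toy studies and the subjects whose phenotype passes the threshold, which is the kind of cross-study merge and filter a meta-analysis needs.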

  5. A high-performance computing toolset for relatedness and principal component analysis of SNP data

    PubMed Central

    Zheng, Xiuwen; Levine, David; Shen, Jess; Gogarten, Stephanie M.; Laurie, Cathy; Weir, Bruce S.

    2012-01-01

    Summary: Genome-wide association studies are widely used to investigate the genetic basis of diseases and traits, but they pose many computational challenges. We developed gdsfmt and SNPRelate (R packages for multi-core symmetric multiprocessing computer architectures) to accelerate two key computations on SNP data: principal component analysis (PCA) and relatedness analysis using identity-by-descent measures. The kernels of our algorithms are written in C/C++ and highly optimized. Benchmarks show the uniprocessor implementations of PCA and identity-by-descent are ∼8–50 times faster than the implementations provided in the popular EIGENSTRAT (v3.0) and PLINK (v1.07) programs, respectively, and can be sped up to 30–300-fold by using eight cores. SNPRelate can analyse tens of thousands of samples with millions of SNPs. For example, our package was used to perform PCA on 55 324 subjects from the ‘Gene-Environment Association Studies’ consortium studies. Availability and implementation: gdsfmt and SNPRelate are available from R CRAN (http://cran.r-project.org), including a vignette. A tutorial can be found at https://www.genevastudy.org/Accomplishments/software. Contact: zhengx@u.washington.edu PMID:23060615

  6. [Advances in development of gene-gene interaction analysis methods based on SNP data: a review].

    PubMed

    Luan, Yi-Zhao; Zuo, Xiao-Yu; Liu, Ke; Li, Gu; Rao, Shao-Qi

    2013-12-01

    SNP-based association analysis has become one of the most important approaches for interpreting the molecular mechanisms underlying complex human diseases. Nevertheless, the widely used single-locus analysis is only capable of capturing a small portion of susceptibility SNPs with prominent marginal effects, leaving an important genetic component, epistasis or joint effects, undetected. Identifying the complex interplay among multiple genes in the genome-wide context is an essential task for systematically unraveling the molecular mechanisms of complex diseases. Many approaches have been used to detect genome-wide gene-gene interactions and have provided new insights into the genetic basis of complex diseases. This paper reviews recent advances in methods for detecting gene-gene interactions, categorized into three types (model-based statistical methods, model-free statistical methods, and data mining methods) based on their theoretical characteristics and numerical algorithms. In particular, the basic principle, numerical implementation and cautions for application of each method are elucidated. In addition, this paper briefly discusses the limitations and challenges associated with detecting genome-wide epistasis, in order to provide some methodological guidance for scientists in related fields.

  7. Building a forensic ancestry panel from the ground up: The EUROFORGEN Global AIM-SNP set.

    PubMed

    Phillips, C; Parson, W; Lundsberg, B; Santos, C; Freire-Aradas, A; Torres, M; Eduardoff, M; Børsting, C; Johansen, P; Fondevila, M; Morling, N; Schneider, P; Carracedo, A; Lareu, M V

    2014-07-01

    Emerging next-generation sequencing technologies will enable DNA analyses to add pigmentation predictive and ancestry informative (AIM) SNPs to the range of markers detectable from a single PCR test. This prompted us to re-appraise current forensic and genomics AIM-SNPs and from the best sets, to identify the most divergent markers for a five population group differentiation of Africans, Europeans, East Asians, Native Americans and Oceanians by using our own online genome variation browsers. We prioritized careful balancing of population differentiation across the five group comparisons in order to minimize bias when estimating co-ancestry proportions in individuals with admixed ancestries. The differentiation of European from Middle East or South Asian ancestries was not chosen as a characteristic in order to concentrate on introducing Oceanian differentiation for the first time in a forensic AIM set. We describe a complete set of 128 AIM-SNPs that have near identical population-specific divergence across five continentally defined population groups. The full set can be systematically reduced in size, while preserving the most informative markers and the balance of population-specific divergence in at least four groups. We describe subsets of 88, 55, 28, 20 and 12 AIMs, enabling both new and existing SNP genotyping technologies to exploit the best markers identified for forensic ancestry analysis.

  8. High-throughput SNP-based authentication of human cell lines

    PubMed Central

    Castro, Felipe; Dirks, Wilhelm G.; Fähnrich, Silke; Hotz-Wagenblatt, Agnes; Pawlita, Michael; Schmitt, Markus

    2012-01-01

    Use of false cell lines remains a major problem in biological research. Short tandem repeat (STR) profiling represents the gold standard technique for cell line authentication. However, mismatch repair (MMR) deficient cell lines are characterized by microsatellite instability, which could force allelic drifts in combination with a selective outgrowth of otherwise persisting side lines, and thus, are likely to be misclassified by STR-profiling. Based on the high-throughput Luminex platform, we developed a 24-plex SNP-profiling assay, called Multiplex Cell Authentication (MCA), for determining authentication of human cell lines. MCA was evaluated by analysing a collection of 436 human cell lines from the DSMZ, previously characterised by eight loci STR profiling. Both assays showed a very high degree of concordance and similar average matching probabilities (~1 × 10⁻⁸ for STR-profiling and ~1 × 10⁻⁹ for MCA). MCA enabled the detection of less than 3% contaminating human cells. Analysing MMR deficient cell lines, evidence was obtained for a higher robustness of the MCA compared to STR profiling. In conclusion, MCA could complement routine cell line authentication and replace the standard authentication STR technique in case of MSI cell lines. PMID:22700458
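
    A matching probability for a SNP panel is commonly obtained by multiplying per-locus genotype match probabilities. The sketch below shows that arithmetic under explicit assumptions that are not taken from the abstract: biallelic SNPs, Hardy-Weinberg genotype proportions, independence between loci, and made-up allele frequencies. The function names are hypothetical and this is not presented as the authors' calculation.

```python
import math


def genotype_match_probability(p):
    """Chance that two unrelated samples share a genotype at one biallelic SNP.

    Assumes Hardy-Weinberg proportions: P(AA) = p^2, P(AB) = 2pq, P(BB) = q^2.
    The match probability is the sum of the squared genotype frequencies.
    """
    q = 1.0 - p
    return (p * p) ** 2 + (2.0 * p * q) ** 2 + (q * q) ** 2


def panel_match_probability(allele_freqs):
    """Combined match probability, assuming the SNPs are independent (unlinked)."""
    return math.prod(genotype_match_probability(p) for p in allele_freqs)


# Hypothetical 24-SNP panel in which every SNP has allele frequency 0.5,
# the most informative case for a biallelic marker.
ideal_panel = [0.5] * 24
combined = panel_match_probability(ideal_panel)
```

    Because the per-locus value is smallest at an allele frequency of 0.5, real panels with less balanced frequencies give larger (less discriminating) combined probabilities than this idealized case.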

  9. Design and synthesis of the superionic conductor Na10SnP2S12

    PubMed Central

    Richards, William D.; Tsujimura, Tomoyuki; Miara, Lincoln J.; Wang, Yan; Kim, Jae Chul; Ong, Shyue Ping; Uechi, Ichiro; Suzuki, Naoki; Ceder, Gerbrand

    2016-01-01

    Sodium-ion batteries are emerging as candidates for large-scale energy storage due to their low cost and the wide variety of cathode materials available. As battery size and adoption in critical applications increase, safety concerns are resurfacing due to the inherent flammability of organic electrolytes currently in use in both lithium and sodium battery chemistries. Development of solid-state batteries with ionic electrolytes eliminates this concern, while also allowing novel device architectures and potentially improving cycle life. Here we report the computation-assisted discovery and synthesis of a high-performance solid-state electrolyte material: Na10SnP2S12, with room temperature ionic conductivity of 0.4 mS cm⁻¹ rivalling the conductivity of the best sodium sulfide solid electrolytes to date. We also computationally investigate the variants of this compound where tin is substituted by germanium or silicon and find that the latter may achieve even higher conductivity. PMID:26984102

  10. Sensitive DNA detection and SNP discrimination using ultrabright SERS nanorattles and magnetic beads for malaria diagnostics.

    PubMed

    Ngo, Hoan T; Gandra, Naveen; Fales, Andrew M; Taylor, Steve M; Vo-Dinh, Tuan

    2016-07-15

    One of the major obstacles to implementing nucleic acid-based molecular diagnostics at the point-of-care (POC) and in resource-limited settings is the lack of sensitive and practical DNA detection methods that can be seamlessly integrated into portable platforms. Herein we present a sensitive yet simple DNA detection method using a surface-enhanced Raman scattering (SERS) nanoplatform: the ultrabright SERS nanorattle. The method, referred to as the nanorattle-based method, involves sandwich hybridization of magnetic beads that are loaded with capture probes, target sequences, and ultrabright SERS nanorattles that are loaded with reporter probes. Upon hybridization, a magnet was applied to concentrate the hybridization sandwiches at a detection spot for SERS measurements. The ultrabright SERS nanorattles, composed of a core and a shell with resonance Raman reporters loaded in the gap space between the core and the shell, serve as SERS tags for signal detection. Using this method, a specific DNA sequence of the malaria parasite Plasmodium falciparum could be detected with a detection limit of approximately 100 attomoles. Single nucleotide polymorphism (SNP) discrimination of wild type malaria DNA and mutant malaria DNA, which confers resistance to artemisinin drugs, was also demonstrated. These test models demonstrate the molecular diagnostic potential of the nanorattle-based method to both detect and genotype infectious pathogens. Furthermore, the method's simplicity makes it a suitable candidate for integration into portable platforms for POC applications and resource-limited settings.

  11. Use of direct and iterative solvers for estimation of SNP effects in genome-wide selection.

    PubMed

    Pimentel, Eduardo da Cruz Gouveia; Sargolzaei, Mehdi; Simianer, Henner; Schenkel, Flávio Schramm; Liu, Zengting; Fries, Luiz Alberto; de Queiroz, Sandra Aidar

    2010-01-01

    The aim of this study was to compare iterative and direct solvers for estimation of marker effects in genomic selection. One iterative and two direct methods were used: Gauss-Seidel with Residual Update, Cholesky Decomposition and Gentleman-Givens rotations. For resembling different scenarios with respect to number of markers and of genotyped animals, a simulated data set divided into 25 subsets was used. Number of markers ranged from 1,200 to 5,925 and number of animals ranged from 1,200 to 5,865. Methods were also applied to real data comprising 3081 individuals genotyped for 45181 SNPs. Results from simulated data showed that the iterative solver was substantially faster than direct methods for larger numbers of markers. Use of a direct solver may allow for computing (co)variances of SNP effects. When applied to real data, performance of the iterative method varied substantially, depending on the level of ill-conditioning of the coefficient matrix. From results with real data, Gentleman-Givens rotations would be the method of choice in this particular application as it provided an exact solution within a fairly reasonable time frame (less than two hours). It would indeed be the preferred method whenever computer resources allow its use.
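
    As a rough illustration of the iterative solver named in this abstract, the following Python sketch applies Gauss-Seidel with residual update to a model y = mu + Xg + e, where X holds genotype codes and g the marker effects. The function name `gsru`, the ridge penalty `lam`, and the convergence settings are assumptions made for this example and are not taken from the study.

```python
def gsru(X, y, lam, n_iter=1000, tol=1e-20):
    """Gauss-Seidel with residual update for y = mu + X g + e.

    X   : list of rows, one per animal, with genotype codes per marker
    y   : list of phenotypes
    lam : ridge penalty added to each marker's diagonal (assumed shrinkage)
    """
    n, m = len(X), len(X[0])
    mu = 0.0
    g = [0.0] * m
    e = list(y)  # residuals for the starting values mu = 0, g = 0
    # Diagonal elements x_j'x_j are constant, so compute them once.
    xtx = [sum(X[i][j] ** 2 for i in range(n)) for j in range(m)]

    for _ in range(n_iter):
        change = 0.0

        # Update the overall mean, then correct the residuals in place.
        mu_new = mu + sum(e) / n
        for i in range(n):
            e[i] -= mu_new - mu
        change += (mu_new - mu) ** 2
        mu = mu_new

        # Update one marker effect at a time using the current residuals.
        for j in range(m):
            rhs = sum(X[i][j] * e[i] for i in range(n)) + xtx[j] * g[j]
            g_new = rhs / (xtx[j] + lam)
            delta = g_new - g[j]
            for i in range(n):
                e[i] -= X[i][j] * delta
            change += delta ** 2
            g[j] = g_new

        if change < tol:
            break
    return mu, g
```

    The residual vector is kept up to date after every single-effect update, so the coefficient matrix never has to be built or stored. That is the property that makes this kind of solver attractive when the number of markers is large.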

  12. Development and application of a novel genome-wide SNP array reveals domestication history in soybean.

    PubMed

    Wang, Jiao; Chu, Shanshan; Zhang, Huairen; Zhu, Ying; Cheng, Hao; Yu, Deyue

    2016-02-09

    Domestication of soybeans occurred under the intense human-directed selections aimed at developing high-yielding lines. Tracing the domestication history and identifying the genes underlying soybean domestication require further exploration. Here, we developed a high-throughput NJAU 355 K SoySNP array and used this array to study the genetic variation patterns in 367 soybean accessions, including 105 wild soybeans and 262 cultivated soybeans. The population genetic analysis suggests that cultivated soybeans have tended to originate from northern and central China, from where they spread to other regions, accompanied with a gradual increase in seed weight. Genome-wide scanning for evidence of artificial selection revealed signs of selective sweeps involving genes controlling domestication-related agronomic traits including seed weight. To further identify genomic regions related to seed weight, a genome-wide association study (GWAS) was conducted across multiple environments in wild and cultivated soybeans. As a result, a strong linkage disequilibrium region on chromosome 20 was found to be significantly correlated with seed weight in cultivated soybeans. Collectively, these findings should provide an important basis for genomic-enabled breeding and advance the study of functional genomics in soybean.

  13. SNP Discovery Using a Pangenome: Has the Single Reference Approach Become Obsolete?

    PubMed Central

    Hurgobin, Bhavna; Edwards, David

    2017-01-01

    Increasing evidence suggests that a single individual is insufficient to capture the genetic diversity within a species due to gene presence/absence variation. In order to understand the extent to which genomic variation occurs in a species, the construction of its pangenome is necessary. The pangenome represents the complete set of genes of a species; it is composed of core genes, which are present in all individuals, and variable genes, which are present only in some individuals. Aside from variations at the gene level, single nucleotide polymorphisms (SNPs) are also an important form of genetic variation. The advent of next-generation sequencing (NGS) coupled with the heritability of SNPs make them ideal markers for genetic analysis of human, animal, and microbial data. SNPs have also been extensively used in crop genetics for association mapping, quantitative trait loci (QTL) analysis, analysis of genetic diversity, and phylogenetic analysis. This review focuses on the use of pangenomes for SNP discovery. It highlights the advantages of using a pangenome rather than a single reference for this purpose. This review also demonstrates how extra information not captured in a single reference alone can be used to provide additional support for linking genotypic data to phenotypic data. PMID:28287462

  14. Interleukin 6 SNP rs1800797 associates with the risk of adult-onset asthma.

    PubMed

    Lajunen, T K; Jaakkola, J J K; Jaakkola, M S

    2016-04-01

    Interleukin 6 (IL6) is an inflammatory cytokine that has been suggested to have an important role in the pathogenesis of asthma. IL6 single-nucleotide polymorphisms (SNPs) have been associated with levels of IL6, and with childhood and prevalent adult asthma. A recent study also suggested that IL6 SNPs associate especially with atopic asthma. However, association of IL6 SNPs with adult-onset asthma has not been studied. In a population-based study of 467 incident adult-onset asthma cases and 613 disease-free controls from South Finland, we analyzed association of 6 tagging SNPs of the IL6 locus with the risk of adult-onset asthma and with atopy. Asthma was clinically diagnosed, and atopy was defined based on Phadiatop test. IL6 SNP rs1800797 associated with the risk of adult-onset asthma in a log additive model, with adjusted odds ratio (aOR) 1.31 (95% confidence interval 1.09-1.57), and especially with the risk of atopic adult-onset asthma when compared with non-atopic controls, aOR 1.46 (95% CI 1.12-1.90). This is the first study to show an association of IL6 with adult-onset asthma, and especially with atopic adult-onset asthma.

  15. Connecting myelin-related and synaptic dysfunction in schizophrenia with SNP-rich gene expression hubs

    PubMed Central

    Hegyi, Hedi

    2017-01-01

    Combining genome-wide mapping of SNP-rich regions in schizophrenics and gene expression data in all brain compartments across the human life span revealed that genes with promoters most frequently mutated in schizophrenia are expression hubs interacting with far more genes than the rest of the genome. We summed up the differentially methylated “expression neighbors” of genes that fall into one of 108 distinct schizophrenia-associated loci with high number of SNPs. Surprisingly, the number of expression neighbors of the genes in these loci were 35 times higher for the positively correlating genes (32 times higher for the negatively correlating ones) than for the rest of the ~16000 genes. While the genes in the 108 loci have little known impact in schizophrenia, we identified many more known schizophrenia-related important genes with a high degree of connectedness (e.g. MOBP, SYNGR1 and DGCR6), validating our approach. Both the most connected positive and negative hubs affected synapse-related genes the most, supporting the synaptic origin of schizophrenia. At least half of the top genes in both the correlating and anti-correlating categories are cancer-related, including oncogenes (RRAS and ALDOA), providing further insight into the observed inverse relationship between the two diseases. PMID:28382934

  16. SNP Selection in Genome-Wide Association Studies via Penalized Support Vector Machine with MAX Test

    PubMed Central

    Kim, Jinseog; Kim, Dennis (Dong Hwan); Jung, Sin-Ho

    2013-01-01

    One of the main objectives of a genome-wide association study (GWAS) is to develop a prediction model for a binary clinical outcome using single-nucleotide polymorphisms (SNPs) which can be used for diagnostic and prognostic purposes and for better understanding of the relationship between the disease and SNPs. Penalized support vector machine (SVM) methods have been widely used toward this end. However, since investigators often ignore the genetic models of SNPs, a final model results in a loss of efficiency in prediction of the clinical outcome. In order to overcome this problem, we propose a two-stage method such that the genetic models of each SNP are identified using the MAX test and then a prediction model is fitted using a penalized SVM method. We apply the proposed method to various penalized SVMs and compare the performance of SVMs using various penalty functions. The results from simulations and real GWAS data analysis show that the proposed method performs better than the prediction methods ignoring the genetic models in terms of prediction power and selectivity. PMID:24174989
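
    The first stage described in this abstract can be sketched in Python as follows. The sketch assumes the common form of the MAX test, in which Cochran-Armitage trend statistics are computed under recessive, additive, and dominant genotype scores and the model with the largest absolute statistic is selected. The names `trend_z`, `GENETIC_MODELS`, and `max_test` are hypothetical. The sketch does not compute a p-value for the maximum, which would need the joint distribution of the three statistics, and it does not cover the penalized SVM stage.

```python
import math

# Genotype scores for 0, 1, and 2 copies of the risk allele (assumed coding).
GENETIC_MODELS = {
    "recessive": (0.0, 0.0, 1.0),
    "additive": (0.0, 0.5, 1.0),
    "dominant": (0.0, 1.0, 1.0),
}


def trend_z(cases, controls, scores):
    """Cochran-Armitage trend statistic for a 2 x 3 genotype table."""
    r_tot, s_tot = sum(cases), sum(controls)
    n_tot = r_tot + s_tot
    col = [r + s for r, s in zip(cases, controls)]  # genotype column totals
    u = sum(x * (s_tot * r - r_tot * s)
            for x, r, s in zip(scores, cases, controls))
    var = (r_tot * s_tot / n_tot) * (
        n_tot * sum(x * x * k for x, k in zip(scores, col))
        - sum(x * k for x, k in zip(scores, col)) ** 2
    )
    return u / math.sqrt(var)


def max_test(cases, controls):
    """Return the best-fitting genetic model, its |Z|, and all three Z values."""
    z = {name: trend_z(cases, controls, scores)
         for name, scores in GENETIC_MODELS.items()}
    best = max(z, key=lambda name: abs(z[name]))
    return best, abs(z[best]), z
```

    In a two-stage pipeline of the kind the abstract describes, the scores of the selected model would then be used to encode that SNP before the prediction model is fitted.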

  17. SNP genotyping using TaqMan technology: the CYP2D6*17 assay conundrum.

    PubMed

    Gaedigk, Andrea; Freeman, Natalie; Hartshorne, Toinette; Riffel, Amanda K; Irwin, David; Bishop, Jeffrey R; Stein, Mark A; Newcorn, Jeffrey H; Jaime, Lazara Karelia Montané; Cherner, Mariana; Leeder, J Steven

    2015-03-19

    CYP2D6 contributes to the metabolism of many clinically used drugs and is increasingly tested to individualize drug therapy. The CYP2D6 gene is challenging to genotype due to the highly complex nature of its gene locus. TaqMan technology is widely used in the clinical and research settings for genotype analysis due to assay reliability, low cost, and the availability of commercially available assays. The assay identifying 1023C>T (rs28371706) defining a reduced function (CYP2D6*17) and several nonfunctional alleles, produced a small number of unexpected diplotype calls in three independent sets of samples, i.e. calls suggested the presence of a CYP2D6*4 subvariant containing 1023C>T. Gene resequencing did not reveal any unknown SNPs in the primer or probe binding sites in any of the samples, but all affected samples featured a trio of SNPs on their CYP2D6*4 allele between one of the PCR primer and probe binding sites. While the phenomenon was ultimately overcome by an alternate assay utilizing a PCR primer excluding the SNP trio, the mechanism causing this phenomenon remains elusive. This rare and unexpected event underscores the importance of assay validation in samples representing a variety of genotypes, but also vigilance of assay performance in highly polymorphic genes such as CYP2D6.

  18. SNP discovery and haplotype analysis in the segmentally duplicated DRD5 coding region

    PubMed Central

    HOUSLEY, D. J. E.; NIKOLAS, M.; VENTA, P. J.; JERNIGAN, K. A.; WALDMAN, I. D.; NIGG, J. T.; FRIDERICI, K. H.

    2009-01-01

    SUMMARY The dopamine receptor 5 gene (DRD5) holds much promise as a candidate locus for contributing to neuropsychiatric disorders and other diseases influenced by the dopaminergic system, as well as having potential to affect normal behavioral variation. However, detailed analyses of this gene have been complicated by its location within a segmentally duplicated chromosomal region. Microsatellites and SNPs upstream from the coding region have been used for association studies, but we find, using bioinformatics resources, that these markers all lie within a previously unrecognized second segmental duplication (SD). In order to accurately analyze the DRD5 locus for polymorphisms in the absence of contaminating pseudogene sequences, we developed a fast and reliable method for sequence analysis and genotyping within the DRD5 coding region. We employed restriction enzyme digestion of genomic DNA to eliminate the pseudogenes prior to PCR amplification of the functional gene. This approach allowed us to determine the DRD5 haplotype structure using 31 trios and to reveal additional rare variants in 171 unrelated individuals. We clarify the inconsistencies and errors of the recorded SNPs in dbSNP and HapMap and illustrate the importance of using caution when choosing SNPs in regions of suspected duplications. The simple and relatively inexpensive method presented herein allows for convenient analysis of sequence variation in DRD5 and can be easily adapted to other duplicated genomic regions in order to obtain good quality sequence data. PMID:19397556

  19. A 21-locus autosomal SNP multiplex and its application in forensic science.

    PubMed

    Hou, Guangwei; Jiang, Xianhua; Yang, Yanyan; Jia, Fei; Li, Qiang; Zhao, Jinling; Guo, Fei; Liu, Limin

    2014-01-01

    To develop a cost-effective technique for single-nucleotide polymorphism (SNP) genotyping and improve the efficiency to analyze degraded DNA, we have established a novel multiplex system including 21 autosomal SNP loci and the amelogenin locus, which was based on allele-specific amplification (ASA) and universal reporter primers (URP). The target amplicons for each of the 21 SNPs ranged from 63 base pairs (bp) to 192 bp. The system was tested in 539 samples from three ethnic groups (Han, Mongolian, and Zhuang population) in China, and the total power of discrimination (TPD) and cumulative probability of exclusion (CPE) were more than 0.99999999 and 0.98, respectively. The system was further validated with forensic samples and full profiles could be achieved from degraded DNA and 63 case-type samples. In summary, the multiplex system offers an effective technique for individual identification of forensic samples and is much more efficient in the analysis of degraded DNA compared with standard STR typing.

  20. Family-Based Multi-SNP X Chromosome Analysis Using Parent Information.

    PubMed

    Wise, Alison S; Shi, Min; Weinberg, Clarice R

    2016-01-01

    We propose a method for association analysis of haplotypes on the X chromosome that offers both improved power and robustness to population stratification in studies of affected offspring and their parents if all three have been genotyped. The method makes use of assumed parental haplotype exchangeability (PHE), a weaker assumption than Hardy-Weinberg equilibrium (HWE). PHE requires that in the source population, of the three X chromosome haplotypes carried by the two parents, each is equally likely to be carried by the father. We propose a pseudo-sibling approach that exploits that exchangeability assumption. Our method extends the single-SNP PIX-LRT method to multiple SNPs in a high linkage block. We describe methods for testing the PHE assumption and also for determining how apparent violations can be distinguished from true fetal effects or maternally-mediated effects. We show results of simulations that demonstrate nominal type I error rate and good power. The methods are then applied to dbGaP data on the birth defect oral cleft, using both Asian and Caucasian families with cleft.

  1. Family-Based Multi-SNP X Chromosome Analysis Using Parent Information

    PubMed Central

    Wise, Alison S.; Shi, Min; Weinberg, Clarice R.

    2016-01-01

    We propose a method for association analysis of haplotypes on the X chromosome that offers both improved power and robustness to population stratification in studies of affected offspring and their parents if all three have been genotyped. The method makes use of assumed parental haplotype exchangeability (PHE), a weaker assumption than Hardy-Weinberg equilibrium (HWE). PHE requires that in the source population, of the three X chromosome haplotypes carried by the two parents, each is equally likely to be carried by the father. We propose a pseudo-sibling approach that exploits that exchangeability assumption. Our method extends the single-SNP PIX-LRT method to multiple SNPs in a high linkage block. We describe methods for testing the PHE assumption and also for determining how apparent violations can be distinguished from true fetal effects or maternally-mediated effects. We show results of simulations that demonstrate nominal type I error rate and good power. The methods are then applied to dbGaP data on the birth defect oral cleft, using both Asian and Caucasian families with cleft. PMID:26941777

  2. A Functional SNP in BNC2 Is Associated with Adolescent Idiopathic Scoliosis.

    PubMed

    Ogura, Yoji; Kou, Ikuyo; Miura, Shigenori; Takahashi, Atsushi; Xu, Leilei; Takeda, Kazuki; Takahashi, Yohei; Kono, Katsuki; Kawakami, Noriaki; Uno, Koki; Ito, Manabu; Minami, Shohei; Yonezawa, Ikuho; Yanagida, Haruhisa; Taneichi, Hiroshi; Zhu, Zezhang; Tsuji, Taichi; Suzuki, Teppei; Sudo, Hideki; Kotani, Toshiaki; Watanabe, Kota; Hosogane, Naobumi; Okada, Eijiro; Iida, Aritoshi; Nakajima, Masahiro; Sudo, Akihiro; Chiba, Kazuhiro; Hiraki, Yuji; Toyama, Yoshiaki; Qiu, Yong; Shukunami, Chisa; Kamatani, Yoichiro; Kubo, Michiaki; Matsumoto, Morio; Ikegawa, Shiro

    2015-08-06

    Adolescent idiopathic scoliosis (AIS) is the most common spinal deformity. We previously conducted a genome-wide association study (GWAS) and detected two loci associated with AIS. To identify additional loci, we extended our GWAS by increasing the number of cohorts (2,109 affected subjects and 11,140 control subjects in total) and conducting a whole-genome imputation. Through the extended GWAS and replication studies using independent Japanese and Chinese populations, we identified a susceptibility locus on chromosome 9p22.2 (p = 2.46 × 10⁻¹³; odds ratio = 1.21). The most significantly associated SNPs were in intron 3 of BNC2, which encodes a zinc finger transcription factor, basonuclin-2. Expression quantitative trait loci data suggested that the associated SNPs have the potential to regulate the BNC2 transcriptional activity and that the susceptibility alleles increase BNC2 expression. We identified a functional SNP, rs10738445 in BNC2, whose susceptibility allele showed both higher binding to a transcription factor, YY1 (yin and yang 1), and higher BNC2 enhancer activity than the non-susceptibility allele. BNC2 overexpression produced body curvature in developing zebrafish in a gene-dosage-dependent manner. Our results suggest that increased BNC2 expression is implicated in the etiology of AIS.

  3. A functional SNP rs1892901 in FOSL1 is associated with gastric cancer in Chinese population

    PubMed Central

    Liu, Wenjie; Tian, Tian; Liu, Li; Du, Jiangbo; Gu, Yayun; Qin, Na; Yan, Caiwang; Wang, Zhaoming; Dai, Juncheng; Fan, Zhining

    2017-01-01

    FOSL1 (FOS like antigen 1) is a proto-oncogene and may play a vital role in carcinogenesis of multiple cancers. However, studies about the relationship between SNPs in FOSL1 and gastric cancer are still lacking. Thus, we investigated the association of seven SNPs in FOSL1 with gastric cancer using a case-control design in a two-stage strategy (Screening stage: 1,140 gastric cancer cases and 1,547 controls; Replication stage: 1,006 cases and 2,273 controls). We found that rs1892901 was significantly associated with increased risk of gastric cancer in an additive model (adjusted OR = 1.25, 95%CI: 1.06–1.47, P = 0.008) in the first stage. Subsequent replication results showed that the relationship between rs1892901 and gastric cancer risk was consistent with our primary results. In silico analysis showed that rs1892901 might alter multiple regulatory motifs, disturb protein binding, and affect the expression of FOSL1 and other important gastric cancer-related genes such as EGR1, CHD, EP300, FOS, JUN and FOSL2. Our findings indicated that functional SNP rs1892901 in FOSL1 might affect the expression of FOSL1, and ultimately increase the risk of gastric cancer. Further functional studies and large-scale population studies are warranted to confirm our findings. PMID:28169308

  4. A comparison of Y-chromosomal lineage dating using either resequencing or Y-SNP plus Y-STR genotyping

    PubMed Central

    Wei, Wei; Ayub, Qasim; Xue, Yali; Tyler-Smith, Chris

    2013-01-01

    We have compared phylogenies and time estimates for Y-chromosomal lineages based on resequencing ∼9 Mb of DNA and applying the program GENETREE to similar analyses based on the more standard approach of genotyping 26 Y-SNPs plus 21 Y-STRs and applying the programs NETWORK and BATWING. We find that deep phylogenetic structure is not adequately reconstructed after Y-SNP plus Y-STR genotyping, and that times estimated using observed Y-STR mutation rates are several-fold too recent. In contrast, an evolutionary mutation rate gives times that are more similar to the resequencing data. In principle, systematic comparisons of this kind can in future studies be used to identify the combinations of Y-SNP and Y-STR markers, and time estimation methodologies, that correspond best to resequencing data. PMID:23768990

  5. [New SNP markers of the honeybee vitellogenin gene (Vg) used for identification of subspecies Apis mellifera mellifera L].

    PubMed

    Ilyasov, R A; Poskryakov, A V; Nikolenko, A G

    2015-02-01

    Preservation of the gene pool of honeybee subspecies Apis mellifera mellifera is of vital importance for successful beekeeping development in the northern regions of Eurasia. An effective method of genotyping honeybee colonies used in modern science is the mapping of sites of single nucleotide polymorphism (SNP). The honeybee vitellogenin gene (Vg) encodes a protein that affects reproductive function, behavior, immunity, longevity, and social organization in the honeybee Apis mellifera and is therefore a topical research subject. The results of comparative analysis of honeybee Vg sequences show that there are 26 SNP sites that differentiate M and C evolutionary branches and can be used as markers in selective breeding, DNA-barcoding, and the creation of genetic passports for A. m. mellifera colonies.

  6. A high-density, multi-parental SNP genetic map on apple validates a new mapping approach for outcrossing species

    PubMed Central

    Di Pierro, Erica A; Gianfranceschi, Luca; Di Guardo, Mario; Koehorst-van Putten, Herma JJ; Kruisselbrink, Johannes W; Longhi, Sara; Troggio, Michela; Bianco, Luca; Muranty, Hélène; Pagliarani, Giulia; Tartarini, Stefano; Letschka, Thomas; Lozano Luis, Lidia; Garkava-Gustavsson, Larisa; Micheletti, Diego; Bink, Marco CAM; Voorrips, Roeland E; Aziz, Ebrahimi; Velasco, Riccardo; Laurens, François; van de Weg, W Eric

    2016-01-01

    Quantitative trait loci (QTL) mapping approaches rely on the correct ordering of molecular markers along the chromosomes, which can be obtained from genetic linkage maps or a reference genome sequence. For apple (Malus domestica Borkh), the genome sequence v1 and v2 could not meet this need; therefore, a novel approach was devised to develop a dense genetic linkage map, providing the most reliable marker-loci order for the highest possible number of markers. The approach was based on four strategies: (i) the use of multiple full-sib families, (ii) the reduction of missing information through the use of HaploBlocks and alternative calling procedures for single-nucleotide polymorphism (SNP) markers, (iii) the construction of a single backcross-type data set including all families, and (iv) a two-step map generation procedure based on the sequential inclusion of markers. The map comprises 15 417 SNP markers, clustered in 3 K HaploBlock markers spanning 1 267 cM, with an average distance between adjacent markers of 0.37 cM and a maximum distance of 3.29 cM. Moreover, chromosome 5 was oriented according to its homoeologous chromosome 10. This map was useful to improve the apple genome sequence, design the Axiom Apple 480 K SNP array and perform multifamily-based QTL studies. Its collinearity with the genome sequences v1 and v3 is reported. To our knowledge, this is the shortest published SNP map in apple, while including the largest number of markers, families and individuals. This result validates our methodology, proving its value for the construction of integrated linkage maps for any outbreeding species. PMID:27917289

  7. Direct analysis of unphased SNP genotype data in population-based association studies via Bayesian partition modelling of haplotypes.

    PubMed

    Morris, Andrew P

    2005-09-01

    We describe a novel method for assessing the strength of disease association with single nucleotide polymorphisms (SNPs) in a candidate gene or small candidate region, and for estimating the corresponding haplotype relative risks of disease, using unphased genotype data directly. We begin by estimating the relative frequencies of haplotypes consistent with observed SNP genotypes. Under the Bayesian partition model, we specify cluster centres from this set of consistent SNP haplotypes. The remaining haplotypes are then assigned to the cluster with the "nearest" centre, where distance is defined in terms of SNP allele matches. Within a logistic regression modelling framework, each haplotype within a cluster is assigned the same disease risk, reducing the number of parameters required. Uncertainty in phase assignment is addressed by considering all possible haplotype configurations consistent with each unphased genotype, weighted in the logistic regression likelihood by their probabilities, calculated according to the estimated relative haplotype frequencies. We develop a Markov chain Monte Carlo algorithm to sample over the space of haplotype clusters and corresponding disease risks, allowing for covariates that might include environmental risk factors or polygenic effects. Application of the algorithm to SNP genotype data in an 890-kb region flanking the CYP2D6 gene illustrates that we can identify clusters of haplotypes with similar risk of poor drug metaboliser (PDM) phenotype, and can distinguish PDM cases carrying different high-risk variants. Further, the results of a detailed simulation study suggest that we can identify positive evidence of association for moderate relative disease risks with a sample of 1,000 cases and 1,000 controls.
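
    The cluster assignment step in this abstract can be illustrated with a short Python sketch. It assumes that distance is a plain count of matching alleles across all SNPs and that ties go to the first centre listed. The abstract does not specify either detail, and the function names are hypothetical.

```python
def allele_matches(hap_a, hap_b):
    """Count the SNP positions at which two haplotypes carry the same allele."""
    return sum(a == b for a, b in zip(hap_a, hap_b))


def assign_to_centres(haplotypes, centres):
    """Assign each haplotype to the centre with which it shares most alleles.

    Ties are resolved in favour of the centre listed first (an assumption).
    """
    clusters = {centre: [] for centre in centres}
    for hap in haplotypes:
        nearest = max(centres, key=lambda centre: allele_matches(hap, centre))
        clusters[nearest].append(hap)
    return clusters
```

    Because every haplotype in a cluster shares one disease risk parameter, the number of risk parameters equals the number of centres rather than the number of distinct haplotypes.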

  8. Pedigree- and SNP-Associated Genetics and Recent Environment are the Major Contributors to Anthropometric and Cardiometabolic Trait Variation.

    PubMed

    Xia, Charley; Amador, Carmen; Huffman, Jennifer; Trochet, Holly; Campbell, Archie; Porteous, David; Hastie, Nicholas D; Hayward, Caroline; Vitart, Veronique; Navarro, Pau; Haley, Chris S

    2016-02-01

    Genome-wide association studies have successfully identified thousands of loci for a range of human complex traits and diseases. The proportion of phenotypic variance explained by significant associations is, however, limited. Given the same dense SNP panels, mixed model analyses capture a greater proportion of phenotypic variance than single SNP analyses but the total is generally still less than the genetic variance estimated from pedigree studies. Combining information from pedigree relationships and SNPs, we examined 16 complex anthropometric and cardiometabolic traits in a Scottish family-based cohort comprising up to 20,000 individuals genotyped for ~520,000 common autosomal SNPs. The inclusion of related individuals provides the opportunity to also estimate the genetic variance associated with pedigree as well as the effects of common family environment. Trait variation was partitioned into SNP-associated and pedigree-associated genetic variation, shared nuclear family environment, shared couple (partner) environment and shared full-sibling environment. Results demonstrate that trait heritabilities vary widely but, on average across traits, SNP-associated and pedigree-associated genetic effects each explain around half the genetic variance. For most traits the recently-shared environment of couples is also significant, accounting for ~11% of the phenotypic variance on average. On the other hand, the environment shared largely in the past by members of a nuclear family or by full-siblings, has a more limited impact. Our findings point to appropriate models to use in future studies as pedigree-associated genetic effects and couple environmental effects have seldom been taken into account in genotype-based analyses. Appropriate description of the trait variation could help understand causes of intra-individual variation and in the detection of contributing loci and environmental factors.

  9. Development of high-density SNP genotyping arrays for white spruce (Picea glauca) and transferability to subtropical and nordic congeners.

    PubMed

    Pavy, Nathalie; Gagnon, France; Rigault, Philippe; Blais, Sylvie; Deschênes, Astrid; Boyle, Brian; Pelgas, Betty; Deslauriers, Marie; Clément, Sébastien; Lavigne, Patricia; Lamothe, Manuel; Cooke, Janice E K; Jaramillo-Correa, Juan P; Beaulieu, Jean; Isabel, Nathalie; Mackay, John; Bousquet, Jean

    2013-03-01

    High-density SNP genotyping arrays can be designed for any species given sufficient sequence information of high quality. Two high-density SNP arrays relying on the Infinium iSelect technology (Illumina) were designed for use in the conifer white spruce (Picea glauca). One array contained 7338 segregating SNPs representative of 2814 genes of various molecular functional classes for main uses in genetic association and population genetics studies. The other one contained 9559 segregating SNPs representative of 9543 genes for main uses in population genetics, linkage mapping of the genome and genomic prediction. The SNPs assayed were discovered from various sources of gene resequencing data. SNPs predicted from high-quality sequences derived from genomic DNA reached a genotyping success rate of 64.7%. Nonsingleton in silico SNPs (i.e. a sequence polymorphism present in at least two reads) predicted from expressed sequence tags obtained with the Roche 454 technology and Illumina GAII analyser resulted in a similar genotyping success rate of 71.6% when the deepest alignment was used and the most favourable SNP probe per gene was selected. A variable proportion of these SNPs was shared by other nordic and subtropical spruce species from North America and Europe. The number of shared SNPs was inversely proportional to phylogenetic divergence and standing genetic variation in the recipient species, but positively related to allele frequency in P. glauca natural populations. These validated SNP resources should open up new avenues for population genetics and comparative genetic mapping at a genomic scale in spruce species.

  10. Identification of a sex-linked SNP marker in the salmon louse (Lepeophtheirus salmonis) using RAD sequencing.

    PubMed

    Carmichael, Stephen N; Bekaert, Michaël; Taggart, John B; Christie, Hayden R L; Bassett, David I; Bron, James E; Skuce, Philip J; Gharbi, Karim; Skern-Mauritzen, Rasmus; Sturm, Armin

    2013-01-01

    The salmon louse (Lepeophtheirus salmonis (Krøyer, 1837)) is a parasitic copepod that can, if untreated, cause considerable damage to Atlantic salmon (Salmo salar Linnaeus, 1758) and incurs significant costs to the Atlantic salmon mariculture industry. Salmon lice are gonochoristic and normally show sex ratios close to 1:1. While this observation suggests that sex determination in salmon lice is genetic, with only minor environmental influences, the mechanism of sex determination in the salmon louse is unknown. This paper describes the identification of a sex-linked Single Nucleotide Polymorphism (SNP) marker, providing the first evidence for a genetic mechanism of sex determination in the salmon louse. Restriction site-associated DNA sequencing (RAD-seq) was used to isolate SNP markers in a laboratory-maintained salmon louse strain. A total of 85 million raw Illumina 100 base paired-end reads produced 281,838 unique RAD-tags across 24 unrelated individuals. RAD marker Lsa101901 showed complete association with phenotypic sex for all individuals analysed, being heterozygous in females and homozygous in males. Using an allele-specific PCR assay for genotyping, this SNP association pattern was further confirmed for three unrelated salmon louse strains, displaying complete association with phenotypic sex in a total of 96 genotyped individuals. The marker Lsa101901 was located in the coding region of the prohibitin-2 gene, which showed a sex-dependent differential expression, with mRNA levels determined by RT-qPCR about 1.8-fold higher in adult female than adult male salmon lice. This study's observations of a novel sex-linked SNP marker are consistent with sex determination in the salmon louse being genetic and following a female heterozygous system. Marker Lsa101901 provides a tool to determine the genetic sex of salmon lice, and could be useful in the development of control strategies.

  11. A fast and accurate method for detection of IBD shared haplotypes in genome-wide SNP data.

    PubMed

    Bjelland, Douglas W; Lingala, Uday; Patel, Piyush S; Jones, Matt; Keller, Matthew C

    2017-02-08

    Identical by descent (IBD) segments are used to understand a number of fundamental issues in genetics. IBD segments are typically detected using long stretches of identical alleles between haplotypes in phased, whole-genome SNP data. Phase or SNP call errors in genomic data can degrade accuracy of IBD detection and lead to false-positive/negative calls and to under/overextension of true IBD segments. Furthermore, the number of comparisons increases quadratically with sample size, requiring high computational efficiency. We developed a new IBD segment detection program, FISHR (Find IBD Shared Haplotypes Rapidly), in an attempt to accurately detect IBD segments and to better estimate their endpoints using an algorithm that is fast enough to be deployed on very large whole-genome SNP data sets. We compared the performance of FISHR to three leading IBD segment detection programs: GERMLINE, refined IBD, and HaploScore. Using simulated and real genomic sequence data, we show that FISHR is slightly more accurate than all programs at detecting long (>3 cM) IBD segments but slightly less accurate than refined IBD at detecting short (~1 cM) IBD segments. More centrally, FISHR outperforms all programs in determining the true endpoints of IBD segments, which is crucial for several applications of IBD information. FISHR takes two to three times longer than GERMLINE to run, whereas both GERMLINE and FISHR were orders of magnitude faster than refined IBD and HaploScore. Overall, FISHR provides accurate IBD detection in unrelated individuals and is computationally efficient enough to be utilized on large SNP data sets of >60 000 individuals. European Journal of Human Genetics advance online publication, 8 February 2017; doi:10.1038/ejhg.2017.6.
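
    The idea of detecting IBD from long stretches of identical alleles can be sketched as follows. This is a naive illustration only and not the FISHR algorithm: it assumes two already-phased haplotypes coded as allele lists, a hypothetical `positions_cm` list of genetic map positions in centimorgans, and no phase or call errors at all.

```python
from typing import List, Sequence, Tuple


def identical_runs(
    hap_a: Sequence[int],
    hap_b: Sequence[int],
    positions_cm: Sequence[float],
    min_length_cm: float = 3.0,
) -> List[Tuple[int, int]]:
    """Return (start_index, end_index) of maximal runs of identical alleles
    whose genetic length is at least min_length_cm.

    Hypothetical helper for illustration; a single mismatch ends a run.
    """
    if not (len(hap_a) == len(hap_b) == len(positions_cm)):
        raise ValueError("haplotypes and positions must have equal length")

    runs = []
    start = None
    for i, (a, b) in enumerate(zip(hap_a, hap_b)):
        if a == b:
            if start is None:
                start = i  # a new run of matching alleles begins here
        elif start is not None:
            runs.append((start, i - 1))  # mismatch closes the current run
            start = None
    if start is not None:
        runs.append((start, len(hap_a) - 1))  # run extends to the last marker

    # Keep only runs that span enough genetic distance.
    return [(s, e) for s, e in runs if positions_cm[e] - positions_cm[s] >= min_length_cm]


if __name__ == "__main__":
    hap_a = [0, 1, 1, 0, 1, 0, 0, 1]
    hap_b = [0, 1, 1, 0, 0, 0, 0, 1]
    positions = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
    print(identical_runs(hap_a, hap_b, positions))
```

    Because every mismatch ends a run here, a single genotyping or phasing error would split a true segment in two, which is the kind of endpoint and under-extension problem the abstract describes.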

  12. Fast-SNP: a fast matrix pre-processing algorithm for efficient loopless flux optimization of metabolic models

    PubMed Central

    Saa, Pedro A.; Nielsen, Lars K.

    2016-01-01

    Motivation: Computation of steady-state flux solutions in large metabolic models is routinely performed using flux balance analysis based on a simple LP (Linear Programming) formulation. A minimal requirement for thermodynamic feasibility of the flux solution is the absence of internal loops, which is enforced using ‘loopless constraints’. The resulting loopless flux problem is a substantially harder MILP (Mixed Integer Linear Programming) problem, which is computationally expensive for large metabolic models. Results: We developed a pre-processing algorithm that significantly reduces the size of the original loopless problem into an easier and equivalent MILP problem. The pre-processing step employs a fast matrix sparsification algorithm, Fast-sparse null-space pursuit (SNP), inspired by recent results on SNP. By finding a reduced feasible ‘loop-law’ matrix subject to known directionalities, Fast-SNP considerably improves the computational efficiency in several metabolic models running different loopless optimization problems. Furthermore, analysis of the topology encoded in the reduced loop matrix enabled identification of key directional constraints for the potential permanent elimination of infeasible loops in the underlying model. Overall, Fast-SNP is an effective and simple algorithm for efficient formulation of loop-law constraints, making loopless flux optimization feasible and numerically tractable at large scale. Availability and Implementation: Source code for MATLAB including examples is freely available for download at http://www.aibn.uq.edu.au/cssb-resources under Software. Optimization uses Gurobi, CPLEX or GLPK (the latter is included with the algorithm). Contact: lars.nielsen@uq.edu.au Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27559155

  13. A Whole Genome DArTseq and SNP Analysis for Genetic Diversity Assessment in Durum Wheat from Central Fertile Crescent

    PubMed Central

    Shahid, Muhammad Qasim; Çiftçi, Vahdettin; E. Sáenz de Miera, Luis; Aasim, Muhammad; Nadeem, Muhammad Azhar; Aktaş, Husnu; Özkan, Hakan; Hatipoğlu, Rüştü

    2017-01-01

    Until now, little attention has been paid to the geographic distribution and evaluation of genetic diversity of durum wheat from the Central Fertile Crescent (modern-day Turkey and Syria). Turkey and Syria are considered primary centers of wheat diversity, and thousands of locally adapted wheat landraces are still present in farmers’ small fields. We planned this study to evaluate the genetic diversity of durum wheat landraces from the Central Fertile Crescent by genotyping based on DArTseq and SNP analysis. A total of 39,568 DArTseq and 20,661 SNP markers were used to characterize the genetic characteristics of 91 durum wheat landraces. Clustering based on neighbor-joining analysis, principal coordinate analysis, and the Bayesian model implemented in STRUCTURE clearly showed that the grouping pattern is not associated with the geographical distribution of the durum wheat, owing to the mixing of the Turkish and Syrian landraces. A significant correlation between DArTseq and SNP markers was observed in the Mantel test. However, we detected a non-significant relationship between geographical coordinates and DArTseq (r = -0.085) and SNP (r = -0.039) loci. These results showed that unconscious farmer selection and the lack of commercial varieties might have resulted in the exchange of genetic material, and this was apparent in the genetic structure of durum wheat in Turkey and Syria. The genomic characterization presented here is an essential step towards future exploitation of the available durum wheat genetic resources in genomic and breeding programs. The results of this study also provide clear insight into the genetic diversity of wheat accessions from the Central Fertile Crescent. PMID:28099442

  14. A single-tube 27-plex SNP assay for estimating individual ancestry and admixture from three continents.

    PubMed

    Wei, Yi-Liang; Wei, Li; Zhao, Lei; Sun, Qi-Fan; Jiang, Li; Zhang, Tao; Liu, Hai-Bo; Chen, Jian-Gang; Ye, Jian; Hu, Lan; Li, Cai-Xia

    2016-01-01

    A single-tube multiplex assay of a small set of ancestry-informative markers (AIMs) for effectively estimating individual ancestry and admixture is an ideal forensic tool to trace the population origin of an unknown DNA sample. We present a newly developed 27-plex single nucleotide polymorphism (SNP) panel with highly robust and balanced differential power to perfectly assign individuals to African, European, and East Asian ancestries. Evaluating 968 previously described intercontinental AIMs from three HapMap population genotyping datasets (Yoruban in Ibadan, Nigeria (YRI); Utah residents with Northern and Western European ancestry from the Centre d'Etude du Polymorphisme Humain (CEPH) collection (CEU); and Han Chinese in Beijing, China (CHB)), the best set of markers was selected on the basis of Hardy-Weinberg equilibrium (p > 0.00001), population-specific allele frequency (two of three δ values >0.5), linkage disequilibrium (r² < 0.2), and the ability to be multiplexed in one tube and detected by capillary electrophoresis. The 27-SNP panel was first validated by assigning the ancestry of the 11 populations in the HapMap project. Then, we tested the 27-plex SNP assay with 1164 individuals from 17 additional populations. The results demonstrated that the SNP panel was successful for ancestry inference of individuals with African, European, and East Asian ancestry. Furthermore, the system performed well when inferring the admixture of Eurasians (EUR/EAS) after analyzing admixed populations from Xinjiang (Central Asian) as follows: Tajik (68:27), Uyghur (49:46), Kirgiz (40:57), and Kazak (36:60). For individual analyses, we interpreted each sample with a three-ancestry component percentage and a population match probability sequence. This multiplex assay is a convenient and cost-effective tool to assist in criminal investigations, as well as to correct for the effects of population stratification in case-control studies.
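
    The frequency-based part of the marker selection can be sketched as a simple filter. This is an illustration under stated assumptions, not the authors' pipeline: δ is taken here to be the absolute allele-frequency difference between two populations (the abstract does not define it), the function names and example frequencies are hypothetical, and the linkage disequilibrium and multiplexing criteria are left out.

```python
from itertools import combinations
from typing import Dict, Tuple


def pairwise_deltas(freqs: Dict[str, float]) -> Dict[Tuple[str, str], float]:
    """Absolute allele-frequency difference for every pair of populations.

    Assumption: delta = |p_i - p_j| for the same allele in populations i and j.
    """
    return {
        (a, b): abs(freqs[a] - freqs[b])
        for a, b in combinations(sorted(freqs), 2)
    }


def passes_frequency_filters(
    freqs: Dict[str, float],
    hwe_p: float,
    min_delta: float = 0.5,      # threshold quoted in the abstract
    min_pairs: int = 2,          # "two of three delta values"
    hwe_threshold: float = 1e-5,  # p > 0.00001
) -> bool:
    """Keep a marker if it is not in strong HWE violation and is
    differentiated in at least min_pairs population pairs."""
    deltas = pairwise_deltas(freqs)
    informative_pairs = sum(d > min_delta for d in deltas.values())
    return hwe_p > hwe_threshold and informative_pairs >= min_pairs


if __name__ == "__main__":
    # Hypothetical allele frequencies for one candidate marker.
    marker = {"YRI": 0.95, "CEU": 0.10, "CHB": 0.15}
    print(pairwise_deltas(marker))
    print(passes_frequency_filters(marker, hwe_p=0.4))
```

    A marker like the hypothetical one above separates YRI from both CEU and CHB but not CEU from CHB, so a panel built this way needs other markers to cover the remaining pair.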

  15. Incremental impact of breast cancer SNP panel on risk classification in a screening population of white and African American women.

    PubMed

    McCarthy, Anne Marie; Armstrong, Katrina; Handorf, Elizabeth; Boghossian, Leigh; Jones, Marisa; Chen, Jinbo; Demeter, Mirar Bristol; McGuire, Erin; Conant, Emily F; Domchek, Susan M

    2013-04-01

    Breast cancer risk prediction remains imperfect, particularly among non-white populations. This study examines the impact of including single-nucleotide polymorphism (SNP) alleles in risk prediction for white and African American women undergoing screening mammogram. Using a prospective cohort study, standard risk information and buccal swabs were collected at the time of screening mammography. A 12 SNP panel was performed by deCODE genetics. Five-year and lifetime risks incorporating SNPs were calculated by multiplying estimated Breast Cancer Risk Assessment Tool (BCRAT) risk by the total genetic risk ratio. Concordance between the BCRAT and the combined model (BCRAT + SNPs) in identifying high-risk women was measured using the kappa statistic. SNP data were available for 810 women (39 % African American, 55 % white). The mean BCRAT 5-year risk was 1.71 % for whites and 1.18 % for African Americans. Mean genetic risk ratios were 1.09 in whites and 1.29 in African Americans. Among whites, three SNPs had higher frequencies, and among African Americans, seven SNPs had higher and four had lower high-risk allele frequencies than previously reported. Agreement between the BCRAT and the combined model was relatively low for identifying high-risk women (5-year κ = 0.54, lifetime κ = 0.36). Addition of SNPs had the greatest effect among African Americans, with 12.4 % identified as having high-5-year risk by BCRAT, but 33 % by the combined model. A greater proportion of African Americans were reclassified as having high-5-year risk than whites using the combined model (21 vs. 10 %). The addition of SNPs to the BCRAT reclassifies the high-risk status of some women undergoing screening mammography, particularly African Americans. Further research is needed to determine the clinical validity and utility of the SNP panel for use in breast cancer risk prediction, particularly among African Americans for whom these risk alleles have generally not been validated.
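
    The combined model and the agreement statistic described above can be sketched in a few lines. This is only an illustration: the function names are hypothetical, the 2x2 counts in the example are invented, and applying the product to the reported group means (1.71 % and 1.09) is for demonstration only, since the study combined the values per woman.

```python
def combined_risk(bcrat_risk: float, genetic_risk_ratio: float) -> float:
    """Combined risk = BCRAT absolute risk multiplied by the total genetic risk ratio."""
    return bcrat_risk * genetic_risk_ratio


def cohen_kappa(both_high: int, only_first: int, only_second: int, both_low: int) -> float:
    """Cohen's kappa for two binary classifications (high risk vs. not).

    both_high / both_low: women classified the same way by both models.
    only_first / only_second: women flagged as high risk by one model only.
    """
    n = both_high + only_first + only_second + both_low
    observed = (both_high + both_low) / n
    first_high = (both_high + only_first) / n
    second_high = (both_high + only_second) / n
    # Agreement expected by chance from the two marginal high-risk rates.
    expected = first_high * second_high + (1 - first_high) * (1 - second_high)
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Reported group means for white women, used purely as example inputs.
    print(round(combined_risk(0.0171, 1.09), 5))
    # Invented counts, not taken from the study.
    print(round(cohen_kappa(20, 10, 10, 60), 3))
```

    Kappa corrects raw agreement for the agreement expected by chance, which is why two models can agree on most women and still yield the moderate values reported.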

  16. Ruminal expression of the NQO1, RGS5, and ACAT1 genes may be indicators of feed efficiency in beef steers

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Ruminal genes differentially expressed in crossbred beef steers with variation in gain and feed intake were identified in a previous study. Genes identified with expression patterns differing between animals with high gain-low feed intake and low gain-high feed intake were evaluated in a separate po...

  17. Weighted pseudolikelihood for SNP set analysis with multiple secondary outcomes in case-control genetic association studies.

    PubMed

    Sofer, Tamar; Schifano, Elizabeth D; Christiani, David C; Lin, Xihong

    2017-03-27

    We propose a weighted pseudolikelihood method for analyzing the association of a SNP set (for example, SNPs in a gene or a genetic pathway or network) with multiple secondary phenotypes in case-control genetic association studies. To boost analysis power, we assume that the SNP-specific effects are shared across all secondary phenotypes using a scaled mean model. We estimate regression parameters using Inverse Probability Weighted (IPW) estimating equations obtained from the weighted pseudolikelihood, which accounts for case-control sampling to prevent potential ascertainment bias. To test the effect of a SNP set, we propose a weighted variance component pseudo-score test. We also propose a penalized IPW pseudolikelihood method for selecting a subset of SNPs that are associated with the multiple secondary phenotypes. We show that the proposed variable selection procedure has the oracle properties and is robust to misspecification of the correlation structure among secondary phenotypes. We select the tuning parameter using a weighted Bayesian Information-like Criterion (wBIC). We evaluate the finite sample performance of the proposed methods via simulations, and illustrate the methods by the analysis of the multiple secondary smoking behavior outcomes in a lung cancer case-control genetic association study.

  18. Integrating Milk Metabolite Profile Information for the Prediction of Traditional Milk Traits Based on SNP Information for Holstein Cows

    PubMed Central

    Melzer, Nina; Wittenburg, Dörte; Repsilber, Dirk

    2013-01-01

    In this study the benefit of metabolome level analysis for the prediction of genetic value of three traditional milk traits was investigated. Our proposed approach consists of three steps: First, milk metabolite profiles are used to predict three traditional milk traits of 1,305 Holstein cows. Two regression methods, both enabling variable selection, are applied to identify important milk metabolites in this step. Second, the prediction of these important milk metabolites from single nucleotide polymorphisms (SNPs) enables the detection of SNPs with significant genetic effects. Finally, these SNPs are used to predict milk traits. The observed precision of predicted genetic values was compared to the results observed for the classical genotype-phenotype prediction using all SNPs or a reduced SNP subset (reduced classical approach). To enable a comparison between SNP subsets, a special invariable evaluation design was implemented. SNPs close to or within known quantitative trait loci (QTL) were determined. This enabled us to determine if detected important SNP subsets were enriched in these regions. The results show that our approach can lead to genetic value prediction, but requires less than 1% of the total amount of (40,317) SNPs. Significantly more important SNPs in known QTL regions were detected using our approach compared to the reduced classical approach. In conclusion, our approach allows a deeper insight into the associations between the different levels of the genotype-phenotype map (genotype-metabolome, metabolome-phenotype, genotype-phenotype). PMID:23990900

  19. Surface invasive cleavage assay on a maskless light-directed diamond DNA microarray for genome-wide human SNP mapping.

    PubMed

    Nie, Bei; Yang, Min; Fu, Weiling; Liang, Zhiqing

    2015-07-07

    The surface invasive cleavage assay, because of its innate accuracy and ability for self-signal amplification, provides a potential route for the mapping of hundreds of thousands of human SNP sites. However, its performance on a high density DNA array has not yet been established, due to the unusual "hairpin" probe design on the microarray and the lack of chemical stability of commercially available substrates. Here we present an applicable method to implement a nanocrystalline diamond thin film as an alternative substrate for fabricating an addressable DNA array using maskless light-directed photochemistry, producing the most chemically stable and biocompatible system for genetic analysis and enzymatic reactions. The surface invasive cleavage reaction, followed by degenerated primer ligation and post-rolling circle amplification is consecutively performed on the addressable diamond DNA array, accurately mapping SNP sites from PCR-amplified human genomic target DNA. Furthermore, a specially-designed DNA array containing dual probes in the same pixel is fabricated by following a reverse light-directed DNA synthesis protocol. This essentially enables us to decipher thousands of SNP alleles in a single-pot reaction by the simple addition of enzyme, target and reaction buffers.

  20. A second generation SNP and SSR integrated linkage map and QTL mapping for the Chinese mitten crab Eriocheir sinensis

    PubMed Central

    Qiu, Gao-Feng; Xiong, Liang-Wei; Han, Zhi-Ke; Liu, Zhi-Qiang; Feng, Jian-Bin; Wu, Xu-Gan; Yan, Yin-Long; Shen, Hong; Huang, Long; Chen, Li

    2017-01-01

    The Chinese mitten crab Eriocheir sinensis is the most economically important cultivated crab species in China, and its genome has a high number of chromosomes (2n = 146). To obtain sufficient markers for construction of a dense genetic map for this species, we employed the recently developed specific-locus amplified fragment sequencing (SLAF-seq) method for large-scale SNP screening and genotyping in an F1 full-sib family of 149 individuals. SLAF-seq generated 127,677 polymorphic SNP markers, of which 20,803 valid markers were assigned into five segregation types and were used together with previous SSR markers for linkage map construction. The final integrated genetic map included 17,680 SNP and 629 SSR markers on the 73 linkage groups (LG), and spanned 14,894.9 cM with an average marker interval of 0.81 cM. QTL mapping localized three significant growth-related QTL to a 1.2 cM region in LG53 as well as 146 sex-linked markers in LG48. Genome-wide QTL-association analysis further identified four growth-related QTL genes named LNX2, PAK2, FMRFamide and octopamine receptors. These genes are involved in a variety of different signaling pathways including cell proliferation and growth. The map and SNP markers described here will be a valuable resource for the E. sinensis genome project and selective breeding programs. PMID:28045132

  1. SYBR green dye-based probe-free SNP genotyping: introduction of T-Plex real-time PCR assay.

    PubMed

    Baris, Ibrahim; Etlik, Ozdal; Koksal, Vedat; Ocak, Zeynep; Baris, Saniye Tugba

    2013-10-15

    Single-nucleotide polymorphism (SNP) genotyping is widely used in genetic association studies to characterize genetic factors underlying inherited traits. Despite many recent advances in high-throughput SNP genotyping, inexpensive and flexible methods with reasonable throughput levels are still needed. Real-time PCR methods for discovering and genotyping SNPs are becoming increasingly important in various fields of biology. In this study, we introduce a new, single-tube strategy that combines the tetra-primer ARMS PCR assay, SYBR Green I-based real-time PCR, and melting-point analysis with primer design strategies to detect the SNP of interest. This assay, T-Plex real-time PCR, is based on the Tm discrimination of the amplified allele-specific amplicons in a single tube. The specificity, sensitivity, and robustness of the assay were evaluated for common mutations in the FV, PII, MTHFR, and FGFR3 genes. We believe that T-Plex real-time PCR would be a useful alternative for either individual genotyping requests or large epidemiological studies.

  2. Genome-scale DNA variant analysis and functional validation of a SNP underlying yellow fruit color in wild strawberry

    PubMed Central

    Hawkins, Charles; Caruana, Julie; Schiksnis, Erin; Liu, Zhongchi

    2016-01-01

    Fragaria vesca is a species of diploid strawberry being developed as a model for the octoploid garden strawberry. This work sequenced and compared the genomes of three F. vesca accessions: ‘Hawaii 4’, ‘Rügen’, and ‘Yellow Wonder’. Genome-scale analyses of shared and distinct SNPs among these three accessions have revealed that ‘Rügen’ and ‘Yellow Wonder’ are more similar to each other than they are to ‘Hawaii 4’. Though all three accessions are inbred seven generations, each accession still possesses extensive heterozygosity, highlighting the inherent differences between individual plants even of the same accession. The identification of the impact of each SNP as well as the large number of Indel markers provides a foundation for locating candidate mutations underlying phenotypic variations among these F. vesca accessions and for mapping new mutations generated through forward genetics screens. Through systematic analysis of SNP variants affecting genes in anthocyanin biosynthesis and regulation, a candidate SNP in FveMYB10 was identified and then functionally confirmed to be responsible for the yellow color fruits made by many F. vesca accessions. As a whole, this study provides further resources for F. vesca and establishes a foundation for linking traits of economic importance to specific genes and variants. PMID:27377763

  3. RAD sequencing yields a high success rate for westslope cutthroat and rainbow trout species-diagnostic SNP assays

    USGS Publications Warehouse

    Amish, Stephen J.; Hohenlohe, Paul A.; Painter, Sally; Leary, Robb F.; Muhlfeld, Clint C.; Allendorf, Fred W.; Luikart, Gordon

    2012-01-01

    Hybridization with introduced rainbow trout threatens most native westslope cutthroat trout populations. Understanding the genetic effects of hybridization and introgression requires a large set of high-throughput, diagnostic genetic markers to inform conservation and management. Recently, we identified several thousand candidate single-nucleotide polymorphism (SNP) markers based on RAD sequencing of 11 westslope cutthroat trout and 13 rainbow trout individuals. Here, we used flanking sequence for 56 of these candidate SNP markers to design high-throughput genotyping assays. We validated the assays on a total of 92 individuals from 22 populations and seven hatchery strains. Forty-six assays (82%) amplified consistently and allowed easy identification of westslope cutthroat and rainbow trout alleles as well as heterozygote controls. The 46 SNPs will provide high power for early detection of population admixture and improved identification of hybrid and nonhybridized individuals. This technique shows promise as a very low-cost, reliable and relatively rapid method for developing and testing SNP markers for nonmodel organisms with limited genomic resources.

  4. Effects of sodium nitroprusside (SNP) pretreatment on UV-B stress tolerance in lettuce (Lactuca sativa L.) seedlings.

    PubMed

    Esringu, Aslıhan; Aksakal, Ozkan; Tabay, Dilruba; Kara, Ayse Aydan

    2016-01-01

    Ultraviolet-B (UV-B) radiation is one of the most important abiotic stress factors that could influence plant growth, development, and productivity. Nitric oxide (NO) is an important plant growth regulator involved in a wide variety of physiological processes. In the present study, the possibility of enhancing UV-B stress tolerance of lettuce seedlings by the exogenous application of sodium nitroprusside (SNP) was investigated. UV-B radiation increased the activities of superoxide dismutase (SOD), catalase (CAT), ascorbate peroxidase (APX), peroxidase (POD) and total phenolic concentrations, antioxidant capacity, and expression of phenylalanine ammonia lyase (PAL) gene in seedlings, but the combination of SNP pretreatment and UV-B enhanced antioxidant enzyme activities, total phenolic concentrations, antioxidant capacity, and PAL gene expression even more. Moreover, UV-B radiation significantly inhibited chlorophylls, carotenoid, gibberellic acid (GA), and indole-3-acetic acid (IAA) contents and increased the contents of abscisic acid (ABA), salicylic acid (SA), malondialdehyde (MDA), hydrogen peroxide (H2O2), and superoxide radical (O2•−) in lettuce seedlings. When SNP pretreatment was combined with the UV-B radiation, we observed alleviated chlorophylls, carotenoid, GA, and IAA inhibition and decreased content of ABA, SA, MDA, H2O2, and O2•− in comparison to non-pretreated stressed seedlings.

  5. A major SNP resource for dissection of phenotypic and genetic variation in Pacific white shrimp (Litopenaeus vannamei).

    PubMed

    Ciobanu, D C; Bastiaansen, J W M; Magrin, J; Rocha, J L; Jiang, D-H; Yu, N; Geiger, B; Deeb, N; Rocha, D; Gong, H; Kinghorn, B P; Plastow, G S; van der Steen, H A M; Mileham, A J

    2010-02-01

    Bioinformatics and re-sequencing approaches were used for the discovery of sequence polymorphisms in Litopenaeus vannamei. A total of 1221 putative single nucleotide polymorphisms (SNPs) were identified in a pool of individuals from various commercial populations. A set of 211 SNPs were selected for further molecular validation and 88% showed variation in 637 samples representing three commercial breeding lines. An association analysis was performed between these markers and several traits of economic importance for shrimp producers including resistance to three major viral diseases. A small number of SNPs showed associations with test weekly gain, grow-out survival and resistance to Taura Syndrome Virus. Very low levels of linkage disequilibrium were revealed between most SNP pairs, with only 11% of SNPs showing an r² value above 0.10 with at least one other SNP. Comparison of allele frequencies showed small changes over three generations of the breeding programme in one of the commercial breeding populations. This unique SNP resource has the potential to catalyse future studies of genetic dissection of complex traits, tracing relationships in breeding programmes, and monitoring genetic diversity in commercial and wild populations of L. vannamei.

  6. Genetic Variation and Breeding Signature in Mass Selection Lines of the Pacific Oyster (Crassostrea gigas) Assessed by SNP Markers

    PubMed Central

    Zhong, Xiaoxiao; Feng, Dandan; Yu, Hong; Kong, Lingfeng; Li, Qi

    2016-01-01

    In breeding industries, a challenging problem is how to maintain genetic diversity over generations. To investigate genetic variation and identify breeding signatures in mass selected lines of Pacific oyster (Crassostrea gigas), three sixth-generation selected lines and four wild populations were assessed using 103 single nucleotide polymorphism (SNP) markers. The genetic diversity data indicated that the selected lines exhibited a significant reduction in the observed heterozygosity and observed number of alleles per locus compared with the wild populations (P≤0.05), indicating that the selected lines tended to lose genetic diversity relative to the wild populations. The unweighted pair-group method with arithmetic mean (UPGMA) analysis showed that the wild populations and selected lines were not separated into two groups. Using four outlier tests, a total of 17 loci were found under selection at two levels. The global outlier detection suggested that 4 common outlier loci were subject to selection using both the hierarchical island model and Bayesian likelihood approaches. At the regional level, 3 SNPs were detected as outliers using at least two outlier tests, and one outlier SNP (CgSNP309) overlapped in the two wild-selected population comparisons. The candidate outlier SNPs provide valuable resources for future association studies in C. gigas. PMID:26954577

  7. RAD sequencing yields a high success rate for westslope cutthroat and rainbow trout species-diagnostic SNP assays.

    PubMed

    Amish, Stephen J; Hohenlohe, Paul A; Painter, Sally; Leary, Robb F; Muhlfeld, Clint; Allendorf, Fred W; Luikart, Gordon

    2012-07-01

    Hybridization with introduced rainbow trout threatens most native westslope cutthroat trout populations. Understanding the genetic effects of hybridization and introgression requires a large set of high-throughput, diagnostic genetic markers to inform conservation and management. Recently, we identified several thousand candidate single-nucleotide polymorphism (SNP) markers based on RAD sequencing of 11 westslope cutthroat trout and 13 rainbow trout individuals. Here, we used flanking sequence for 56 of these candidate SNP markers to design high-throughput genotyping assays. We validated the assays on a total of 92 individuals from 22 populations and seven hatchery strains. Forty-six assays (82%) amplified consistently and allowed easy identification of westslope cutthroat and rainbow trout alleles as well as heterozygote controls. The 46 SNPs will provide high power for early detection of population admixture and improved identification of hybrid and nonhybridized individuals. This technique shows promise as a very low-cost, reliable and relatively rapid method for developing and testing SNP markers for nonmodel organisms with limited genomic resources.

  8. Development of an Alfalfa SNP Array and Its Use to Evaluate Patterns of Population Structure and Linkage Disequilibrium

    PubMed Central

    Li, Xuehui; Han, Yuanhong; Wei, Yanling; Acharya, Ananta; Farmer, Andrew D.; Ho, Julie; Monteros, Maria J.; Brummer, E. Charles

    2014-01-01

    A large set of genome-wide markers and a high-throughput genotyping platform can facilitate the genetic dissection of complex traits and accelerate molecular breeding applications. Previously, we identified about 0.9 million SNP markers by sequencing transcriptomes of 27 diverse alfalfa genotypes. From this SNP set, we developed an Illumina Infinium array containing 9,277 SNPs. Using this array, we genotyped 280 diverse alfalfa genotypes and several genotypes from related species. About 81% (7,476) of the SNPs met the criteria for quality control and showed polymorphisms. The alfalfa SNP array also showed a high level of transferability for several closely related Medicago species. Principal component analysis and model-based clustering showed clear population structure corresponding to subspecies and ploidy levels. Within cultivated tetraploid alfalfa, genotypes from dormant and nondormant cultivars were largely assigned to different clusters; genotypes from semidormant cultivars were split between the groups. The extent of linkage disequilibrium (LD) across all genotypes rapidly decayed to 26 Kbp at r² = 0.2, but the rate varied across ploidy levels and subspecies. A high level of consistency in LD was found between and within the two subpopulations of cultivated dormant and nondormant alfalfa, suggesting that genome-wide association studies (GWAS) and genomic selection (GS) could be conducted using alfalfa genotypes from throughout the fall dormancy spectrum. However, the relatively low LD levels would require a large number of markers to fully saturate the genome. PMID:24416217

  9. Impact of Repetitive Transcranial Magnetic Stimulation on Post-Stroke Dysmnesia and the Role of BDNF Val66Met SNP

    PubMed Central

    Lu, Haitao; Zhang, Tong; Wen, Mei; Sun, Li

    2015-01-01

    Background Little is known about the effects of low-frequency repetitive transcranial magnetic stimulation (rTMS) on dysmnesia and the impact of the brain-derived neurotrophic factor (BDNF) Val66Met single-nucleotide polymorphism (SNP). This study investigated the impact of low-frequency rTMS on post-stroke dysmnesia and the impact of BDNF Val66Met SNP. Material/Methods Forty patients with post-stroke dysmnesia were prospectively randomized into the rTMS and sham groups. BDNF Val66Met SNP was determined using restriction fragment length polymorphism. Montreal Cognitive Assessment (MoCA), Loewenstein Occupational Therapy Cognitive Assessment (LOTCA), and Rivermead Behavioural Memory Test (RBMT) scores, as well as plasma BDNF concentrations, were measured at baseline and at 3 days and 2 months post-treatment. Results MoCA, LOTCA, and RBMT scores were higher after rTMS. Three days after treatment, BDNF decreased in the rTMS group but it increased in the sham group (P<0.05). Two months after treatment, RBMT scores in the rTMS group were higher than in the sham group, but MoCA and LOTCA scores were not. Conclusions Low-frequency rTMS may improve post-stroke memory through various pathways, which may involve polymorphisms and several neural genes, but not through an increase in BDNF levels. PMID:25770310

  10. Correlating observed odds ratios from lung cancer case-control studies to SNP functional scores predicted by bioinformatic tools

    PubMed Central

    Zhu, Yong; Hoffman, Aaron; Wu, Xifeng; Zhang, Heping; Zhang, Yawei; Leaderer, Derek; Zheng, Tongzhang

    2008-01-01

    Bioinformatic tools are widely utilized to predict functional single nucleotide polymorphisms (SNPs) for genotyping in molecular epidemiological studies. However, the extent to which these approaches are mirrored by epidemiological findings has not been fully explored. In this study, we first surveyed SNPs examined in case-control studies of lung cancer, the most extensively-studied cancer type. We then computed SNP functional scores using four popular bioinformatics tools: SIFT, PolyPhen, SNPs3D, and PMut, and determined their predictive potential using the odds ratios (ORs) reported. Spearman’s correlation coefficient (r) for the association with SNP score from SIFT, PolyPhen, SNPs3D, and PMut, and the summary ORs were r = −0.36 (p = 0.007), r = 0.25 (p = 0.068), r = −0.20 (p = 0.205), and r = −0.12 (p = 0.370) respectively. By creating a combined score using information from all four tools we were able to achieve a correlation coefficient of r = 0.51 (p < 0.001). These results indicate that scores of predicted functionality could explain a certain fraction of the lung cancer risk detected in genetic association studies and more accurate predictions may be obtained by combining information from a variety of tools. Our findings suggest that bioinformatic tools are useful in predicting SNP functionality and may facilitate future genetic epidemiological studies. PMID:18191955
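
    The correlation step described in this abstract can be sketched with a small rank-correlation routine. This is a minimal illustration only: the function names are hypothetical, and the scores and odds ratios in the usage lines are made-up values, not data from the study.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's r: the Pearson correlation computed on the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical per-SNP functional scores and summary odds ratios.
scores = [0.00, 0.02, 0.10, 0.45, 0.80]
odds_ratios = [1.9, 1.2, 1.4, 1.1, 1.0]
r = spearman(scores, odds_ratios)
```

    A negative r for a score on which lower values mean "more damaging" would indicate that SNPs predicted to be more damaging tend to show larger odds ratios.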

  11. Meta-analysis diagnostic accuracy of SNP-based pathogenicity detection tools: a case of UGT1A1 gene mutations

    PubMed Central

    Galehdari, Hamid; Saki, Najmaldin; Mohammadi-asl, Javad; Rahim, Fakher

    2013-01-01

    Crigler-Najjar syndrome (CNS) type I and type II are usually inherited as autosomal recessive conditions that result from mutations in the UGT1A1 gene. The main objective of the present review is to summarize all available evidence on the accuracy of SNP-based pathogenicity detection tools, compared with published clinical results, for predicting disease-causing nsSNPs, using prediction performance measures. A comprehensive search was performed to find all mutations related to CNS. Database searches included dbSNP, SNPdbe, HGMD, Swissvar, Ensembl, and OMIM. All mutations related to CNS were extracted. Pathogenicity prediction was done using SNP-based pathogenicity detection tools including SIFT, PHD-SNP, PolyPhen2, fathmm, Provean, and Mutpred. Overall, 59 different SNPs related to missense mutations in the UGT1A1 gene were reviewed. By diagnostic odds ratio (OR), PolyPhen2 and Mutpred performed best (4.983 for both; 95% CI: 1.24–20.02), followed by SIFT (diagnostic OR: 3.25; 95% CI: 1.07–9.83). The highest Matthews correlation coefficient (MCC) among the tools belonged to SIFT (34.19%), followed by Provean, PolyPhen2, and Mutpred (29.99%, 29.89%, and 29.89%, respectively). The highest accuracy (ACC) also belonged to SIFT (62.71%), followed by PolyPhen2 and Mutpred (61.02% for both). Our results suggest that some of the well-established SNP-based pathogenicity detection tools can appropriately reflect the role of a disease-associated SNP in both local and global structures. PMID:23875061
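
    The three performance measures named in this abstract (diagnostic OR, MCC, and ACC) can all be derived from a 2x2 table of tool predictions against clinical labels. The sketch below uses the standard textbook definitions; the function name and the counts in the usage line are hypothetical and are not taken from the review.

```python
import math


def diagnostic_metrics(tp, fp, fn, tn):
    """Compute ACC, MCC and diagnostic OR from confusion-matrix counts.

    tp/fn: pathogenic variants predicted pathogenic / benign.
    fp/tn: benign variants predicted pathogenic / benign.
    """
    total = tp + fp + fn + tn
    acc = (tp + tn) / total

    # The MCC denominator is zero when any row or column of the table is
    # empty; report 0.0 in that case, a common convention.
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0

    # The diagnostic OR is undefined when fp or fn is zero.
    if fp == 0 or fn == 0:
        raise ValueError("diagnostic OR needs non-zero fp and fn counts")
    dor = (tp * tn) / (fp * fn)

    return {"ACC": acc, "MCC": mcc, "DOR": dor}


# Hypothetical counts for one tool (illustration only).
metrics = diagnostic_metrics(tp=40, fp=10, fn=10, tn=40)
```

    Multiplying ACC and MCC by 100 gives the percentage form used in the abstract.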

  12. The extent of linkage disequilibrium in beef cattle breeds using high-density SNP genotypes

    PubMed Central

    2014-01-01

    Background The extent of linkage disequilibrium (LD) between molecular markers impacts genome-wide association studies and implementation of genomic selection. The availability of high-density single nucleotide polymorphism (SNP) genotyping platforms makes it possible to investigate LD at an unprecedented resolution. In this work, we characterised LD decay in breeds of beef cattle of taurine, indicine and composite origins and explored its variation across autosomes and the X chromosome. Findings In each breed, LD decayed rapidly and r2 was less than 0.2 for marker pairs separated by 50 kb. The LD decay curves clustered into three groups of similar LD decay that distinguished the three main cattle types. At short distances between markers (< 10 kb), taurine breeds showed higher LD (r2 = 0.45) than their indicine (r2 = 0.25) and composite (r2 = 0.32) counterparts. This higher LD in taurine breeds was attributed to a smaller effective population size and a stronger bottleneck during breed formation. Using all SNPs on only the X chromosome, the three cattle types could still be distinguished. However, for taurine breeds, the LD decay on the X chromosome was much faster and the background level much lower than for indicine breeds and composite populations. When using only SNPs that were polymorphic in all breeds, the analysis of the X chromosome mimicked that of the autosomes. Conclusions The pattern of LD mirrored some aspects of the history of breed populations and showed a sharp decay with increasing physical distance between markers. We conclude that the HD chip can be used to detect association signals that remained hidden when using lower density genotyping platforms, since LD dropped below 0.2 at distances of 50 kb. PMID:24661366
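
    The r2 statistic reported in this abstract measures LD between a pair of biallelic markers. A minimal sketch of the standard definition follows, assuming phased haplotypes with alleles coded 0/1 (studies working from unphased genotypes must first estimate haplotype frequencies, which is not shown). The function name and the example haplotypes are hypothetical.

```python
def ld_r2(haplotypes):
    """Return r2 between two biallelic loci.

    haplotypes: list of (allele_at_locus_1, allele_at_locus_2) tuples,
    with each allele coded as 0 or 1.
    """
    n = len(haplotypes)
    p_a = sum(h[0] for h in haplotypes) / n          # freq of allele 1 at locus 1
    p_b = sum(h[1] for h in haplotypes) / n          # freq of allele 1 at locus 2
    p_ab = sum(1 for h in haplotypes if h == (1, 1)) / n  # freq of the 1-1 haplotype

    d = p_ab - p_a * p_b                             # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d * d / denom


# Hypothetical sample: 80 haplotypes carry matching alleles, 20 do not.
sample = [(1, 1)] * 40 + [(1, 0)] * 10 + [(0, 1)] * 10 + [(0, 0)] * 40
r2 = ld_r2(sample)
```

    An LD decay curve like those described above is obtained by computing r2 for many marker pairs and averaging within bins of physical distance.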

  13. Regions of homozygosity identified by oligonucleotide SNP arrays: evaluating the incidence and clinical utility.

    PubMed

    Wang, Jia-Chi; Ross, Leslie; Mahon, Loretta W; Owen, Renius; Hemmat, Morteza; Wang, Boris T; El Naggar, Mohammed; Kopita, Kimberly A; Randolph, Linda M; Chase, John M; Matas Aguilera, Maria J; Siles, Juan López; Church, Joseph A; Hauser, Natalie; Shen, Joseph J; Jones, Marilyn C; Wierenga, Klaas J; Jiang, Zhijie; Haddadin, Mary; Boyar, Fatih Z; Anguiano, Arturo; Strom, Charles M; Sahoo, Trilochan

    2015-05-01

    Copy neutral segments with allelic homozygosity, also known as regions of homozygosity (ROHs), are frequently identified in cases interrogated by oligonucleotide single-nucleotide polymorphism (oligo-SNP) microarrays. Presence of ROHs may be because of parental relatedness, chromosomal recombination or rearrangements and provides important clues regarding ancestral homozygosity, consanguinity or uniparental disomy. In this study of 14 574 consecutive cases, 832 (6%) were found to harbor one or more ROHs over 10 Mb, of which 651 cases (78%) had multiple ROHs, likely because of identity by descent (IBD), and 181 cases (22%) with ROHs involving a single chromosome. Parental relatedness was predicted to be first degree or closer in 5%, second in 9% and third in 19%. Of the 181 cases, 19 had ROHs for a whole chromosome revealing uniparental isodisomy (isoUPD). In all, 25 cases had significant ROHs involving a single chromosome; 5 cases were molecularly confirmed to have a mixed iso- and heteroUPD15 and 1 case each with segmental UPD9pat and segmental UPD22mat; 17 cases were suspected to have a mixed iso- and heteroUPD including 2 cases with small supernumerary marker and 2 cases with mosaic trisomy. For chromosome 15, 12 (92%) of 13 molecularly studied cases had either Prader-Willi or Angelman syndrome. Autosomal recessive disorders were confirmed in seven of nine cases from eight families because of the finding of suspected gene within a ROH. This study demonstrates that ROHs are much more frequent than previously recognized and often reflect parental relatedness, ascertain autosomal recessive diseases or unravel UPD in many cases.

  14. "Does replication groups scoring reduce false positive rate in SNP interaction discovery?: Response"

    PubMed Central

    2010-01-01

    A response to Toplak et al: Does replication groups scoring reduce false positive rate in SNP interaction discovery? BMC Genomics 2010, 11:58. Background The genomewide evaluation of genetic epistasis is a computationally demanding task, and a current challenge in Genetics. HFCC (Hypothesis-Free Clinical Cloning) is one of the methods that have been suggested for genomewide epistasis analysis. In order to perform an exhaustive search of epistasis, HFCC has implemented several tools and data filters, such as the use of multiple replication groups, and direction of effect and control filters. A recent article has claimed that the use of multiple replication groups (as implemented in HFCC) does not reduce the false positive rate, and we hereby try to clarify these issues. Results/Discussion HFCC uses, as an analysis strategy, the possibility of replicating findings in multiple replication groups, in order to select a liberal subset of preliminary results that are above a statistical criterion and consistent in direction of effect. We show that the use of replication groups and the direction filter reduces the false positive rate of a study, although at the expense of lowering the overall power of the study. A post-hoc analysis of these selected signals in the combined sample could then be performed to select the most promising results. Conclusion Replication of results in independent samples is generally used in scientific studies to establish credibility in a finding. Nonetheless, the combined analysis of several datasets is known to be a preferable and more powerful strategy for the selection of top signals. HFCC is a flexible and complete analysis tool, and one of its analysis options combines these two strategies: A preliminary multiple replication group analysis to eliminate inconsistent false positive results, and a post-hoc combined-group analysis to select the top signals. PMID:20576100

  15. Short Tree, Long Tree, Right Tree, Wrong Tree: New Acquisition Bias Corrections for Inferring SNP Phylogenies.

    PubMed

    Leaché, Adam D; Banbury, Barbara L; Felsenstein, Joseph; de Oca, Adrián Nieto-Montes; Stamatakis, Alexandros

    2015-11-01

    Single nucleotide polymorphisms (SNPs) are useful markers for phylogenetic studies owing in part to their ubiquity throughout the genome and ease of collection. Restriction site associated DNA sequencing (RADseq) methods are becoming increasingly popular for SNP data collection, but an assessment of the best practises for using these data in phylogenetics is lacking. We use computer simulations, and new double digest RADseq (ddRADseq) data for the lizard family Phrynosomatidae, to investigate the accuracy of RAD loci for phylogenetic inference. We compare the two primary ways RAD loci are used during phylogenetic analysis, including the analysis of full sequences (i.e., SNPs together with invariant sites), or the analysis of SNPs on their own after excluding invariant sites. We find that using full sequences rather than just SNPs is preferable from the perspectives of branch length and topological accuracy, but not of computational time. We introduce two new acquisition bias corrections for dealing with alignments composed exclusively of SNPs, a conditional likelihood method and a reconstituted DNA approach. The conditional likelihood method conditions on the presence of variable characters only (the number of invariant sites that are unsampled but known to exist is not considered), while the reconstituted DNA approach requires the user to specify the exact number of unsampled invariant sites prior to the analysis. Under simulation, branch length biases increase with the amount of missing data for both acquisition bias correction methods, but branch length accuracy is much improved in the reconstituted DNA approach compared to the conditional likelihood approach. 
Phylogenetic analyses of the empirical data using concatenation or a coalescent-based species tree approach provide strong support for many of the accepted relationships among phrynosomatid lizards, suggesting that RAD loci contain useful phylogenetic signal across a range of divergence times despite the

  16. Is MDM2 SNP309 Variation a Risk Factor for Head and Neck Carcinoma?

    PubMed Central

    Zhuo, Xianlu; Ye, Huiping; Li, Qi; Xiang, Zhaolan; Zhang, Xueyuan

    2016-01-01

    Murine double minute-2 (MDM2) is a negative regulator of P53, and its T309G polymorphism has been suggested as a risk factor for a variety of cancers. Increasing evidence has shown the association of MDM2 T309G polymorphism with head and neck carcinoma (HNC) risk. However, the results are inconsistent. Thus, we performed a meta-analysis to elucidate the association. The meta-analysis retrieved studies published up to August 2015, and essential information was extracted for analysis. Separate analyses on ethnicity, source of controls, sample size, detection method, and cancer types were also conducted. Odds ratios (ORs) and their 95% confidence intervals (CIs) were used to estimate the association. Pooled data from 16 case–control studies including 4625 cases and 6927 controls failed to indicate a significant association. However, in the subgroup analysis of sample sizes, an increased risk was observed in the largest sample size group (>1000) under a recessive model (OR = 1.52; 95% CI = 1.08–2.13). Increased risks were also found for nasopharyngeal cancer in the subgroup analysis of cancer types (GG vs TT: OR = 2.07; 95% CI = 1.38–3.12; dominant model: OR = 1.48; 95% CI = 1.13–1.93; recessive model: OR = 1.76; 95% CI = 1.17–2.65). The results suggest that the homozygous GG genotype of MDM2 SNP309 may be a low-penetrant risk factor for HNC, and the G allele may confer nasopharyngeal cancer susceptibility. PMID:26945408
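
    The genetic-model contrasts quoted in this abstract (GG vs TT, dominant, recessive) each reduce to a 2x2 odds ratio. The sketch below shows the calculation for a single case-control study with a Woolf (log-scale) 95% confidence interval; it does not perform the cross-study pooling that a meta-analysis requires. Function names and genotype counts are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """OR and Woolf CI for a 2x2 table with all cells non-zero.

    a: exposed cases, b: unexposed cases,
    c: exposed controls, d: unexposed controls.
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


def model_odds_ratios(cases, controls):
    """cases/controls: dicts of genotype counts keyed 'TT', 'TG', 'GG'."""
    return {
        # Homozygote contrast: GG against TT, heterozygotes ignored.
        "GG vs TT": odds_ratio_ci(cases["GG"], cases["TT"],
                                  controls["GG"], controls["TT"]),
        # Dominant model: any G carrier (TG + GG) against TT.
        "dominant": odds_ratio_ci(cases["TG"] + cases["GG"], cases["TT"],
                                  controls["TG"] + controls["GG"], controls["TT"]),
        # Recessive model: GG against everyone else (TT + TG).
        "recessive": odds_ratio_ci(cases["GG"], cases["TT"] + cases["TG"],
                                   controls["GG"], controls["TT"] + controls["TG"]),
    }


# Hypothetical genotype counts (illustration only).
results = model_odds_ratios(
    cases={"TT": 100, "TG": 100, "GG": 100},
    controls={"TT": 200, "TG": 100, "GG": 50},
)
```

    An interval whose lower bound exceeds 1 corresponds to the "increased risk" wording used in the abstract.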

  17. HMGCR rs17671591 SNP Determines Lower Plasma LDL-C after Atorvastatin Therapy in Chilean Individuals.

    PubMed

    Cuevas, Alejandro; Fernández, César; Ferrada, Luis; Zambrano, Tomás; Rosales, Alexy; Saavedra, Nicolás; Salazar, Luis A

    2016-04-01

    Lipid-lowering response to statin therapy shows large interindividual variability. At a genome-wide significance level, single nucleotide polymorphisms (SNPs) in PCSK9 and HMGCR have been implicated in this differential response. However, the influence of these variants is uncertain in the Chilean population. Hence, we aimed to evaluate the contribution of PCSK9 rs7552841 and HMGCR rs17671591 SNPs as genetic determinants of atorvastatin response in Chilean hypercholesterolaemic individuals. One hundred and one hypercholesterolaemic patients received atorvastatin 10 mg/day for 4 weeks. Plasma lipid profile (TC, HDL-C, LDL-C and TG) was determined before and after statin treatment, and SNPs were identified by allelic discrimination using TaqMan® SNP Genotyping Assays. Adjusted univariate and multivariate analysis models were used for statistical analyses, and a p-value <0.05 was considered significant. From baseline (week 0) to the study end-point (week 4), significant reductions were observed in plasma TC, LDL-C and TG (p < 0.001), while HDL-C levels were increased (p < 0.001). Multivariate analysis showed no association between lipid levels and atorvastatin therapy for the PCSK9 variant. However, the HMGCR rs17671591 T allele contributed to basal HDL-C concentration variability along with a higher increase in this lipid fraction after statin medication. In addition, this allele determined greater plasma LDL-C reductions after therapy with atorvastatin. Our data suggest that the HMGCR rs17671591 polymorphism can constitute a genetic marker of lower plasma LDL-C and enhanced HDL-C concentration after atorvastatin therapy in the Chilean population.

  18. A SNP panel for identity and kinship testing using massive parallel sequencing.

    PubMed

    Grandell, Ida; Samara, Raed; Tillmar, Andreas O

    2016-07-01

    Within forensic genetics, there is still a need for supplementary DNA marker typing in order to increase the power to solve cases for both identity testing and complex kinship issues. One major disadvantage with current capillary electrophoresis (CE) methods is the limitation in DNA marker multiplex capability. By utilizing massive parallel sequencing (MPS) technology, this capability can, however, be increased. We have designed a customized GeneRead DNASeq SNP panel (Qiagen) of 140 previously published autosomal forensically relevant identity SNPs for analysis using MPS. One single amplification step was followed by library preparation using the GeneRead Library Prep workflow (Qiagen). The sequencing was performed on a MiSeq System (Illumina), and the bioinformatic analyses were done using the software Biomedical Genomics Workbench (CLC Bio, Qiagen). Forty-nine individuals from a Swedish population were genotyped in order to establish genotype frequencies and to evaluate the performance of the assay. The analyses showed balanced coverage among the included loci, and fewer than 0.5% of heterozygote balance values were outliers. Analyses of dilution series of the 2800M Control DNA gave reproducible results down to 0.2 ng DNA input. In addition, typing of FTA samples and bone samples was performed with promising results. Further studies and optimizations are, however, required for a more detailed evaluation of the performance of degraded and PCR-inhibited forensic samples. In summary, the assay offers a straightforward sample-to-genotype workflow and could be useful to gain information in forensic casework, for both identity testing and in order to solve complex kinship issues.

  19. Novel SNP Discovery in African Buffalo, Syncerus caffer, using high-throughput Sequencing.

    PubMed

    le Roex, Nikki; Noyes, Harry; Brass, Andrew; Bradley, Daniel G; Kemp, Steven J; Kay, Suzanne; van Helden, Paul D; Hoal, Eileen G

    2012-01-01

    The African buffalo, Syncerus caffer, is one of the most abundant and ecologically important species of megafauna in the savannah ecosystem. It is an important prey species, as well as a host for a vast array of nematodes, pathogens and infectious diseases, such as bovine tuberculosis and corridor disease. Large-scale SNP discovery in this species would greatly facilitate further research into the area of host genetics and disease susceptibility, as well as provide a wealth of sequence information for other conservation and genomics studies. We sequenced pools of Cape buffalo DNA from a total of 9 animals, on an ABI SOLiD4 sequencer. The resulting short reads were mapped to the UMD3.1 Bos taurus genome assembly using both BWA and Bowtie software packages. A mean depth of 2.7× coverage over the mapped regions was obtained. Btau4 gene annotation was added to all SNPs identified within gene regions. Bowtie and BWA identified a maximum of 2,222,665 and 276,847 SNPs within the buffalo respectively, depending on analysis method. A panel of 173 SNPs was validated by fluorescent genotyping in 87 individuals. 27 SNPs failed to amplify, and of the remaining 146 SNPs, 43-54% of the Bowtie SNPs and 57-58% of the BWA SNPs were confirmed as polymorphic. dN/dS ratios found no evidence of positive selection, and although there were genes that appeared to be under negative selection, these were more likely to be slowly evolving house-keeping genes.

  20. Use of the gamma method for self-contained gene-set analysis of SNP data

    PubMed Central

    Biernacka, Joanna M; Jenkins, Gregory D; Wang, Liewei; Moyer, Ann M; Fridley, Brooke L

    2012-01-01

    Gene-set analysis (GSA) evaluates the overall evidence of association between a phenotype and all genotyped single nucleotide polymorphisms (SNPs) in a set of genes, as opposed to testing for association between a phenotype and each SNP individually. We propose using the Gamma Method (GM) to combine gene-level P-values for assessing the significance of GS association. We performed simulations to compare the GM with several other self-contained GSA strategies, including both one-step and two-step GSA approaches, in a variety of scenarios. We denote a ‘one-step’ GSA approach to be one in which all SNPs in a GS are used to derive a test of GS association without consideration of gene-level effects, and a ‘two-step’ approach to be one in which all genotyped SNPs in a gene are first used to evaluate association of the phenotype with all measured variation in the gene and then the gene-level tests of association are aggregated to assess the GS association with the phenotype. The simulations suggest that, overall, two-step methods provide higher power than one-step approaches and that combining gene-level P-values using the GM with a soft truncation threshold between 0.05 and 0.20 is a powerful approach for conducting GSA, relative to the competing approaches assessed. We also applied all of the considered GSA methods to data from a pharmacogenomic study of cisplatin, and obtained evidence suggesting that the glutathione metabolism GS is associated with cisplatin drug response. PMID:22166939
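
    The second step of the two-step approach, combining gene-level P-values with a gamma transformation, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: each P-value is mapped through the inverse CDF of a Gamma(shape, 1) distribution, the transformed values are summed, and the sum is referred to a Gamma(K x shape, 1) distribution, which assumes the K gene-level P-values are independent. The shape parameter is what a soft truncation threshold controls; the mapping from threshold to shape is not reproduced here. All function names are hypothetical, and the incomplete gamma function is evaluated with a plain power series to keep the block dependency-free.

```python
import math


def gamma_cdf(x, shape):
    """Regularized lower incomplete gamma function P(shape, x), by power series."""
    if x <= 0:
        return 0.0
    term = 1.0 / shape
    total = term
    n = 1
    while term > total * 1e-16:
        term *= x / (shape + n)
        total += term
        n += 1
    return total * math.exp(-x + shape * math.log(x) - math.lgamma(shape))


def gamma_quantile(q, shape):
    """Invert gamma_cdf by bisection (q in [0, 1))."""
    lo, hi = 0.0, 1.0
    while gamma_cdf(hi, shape) < q:   # grow the bracket until it contains q
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if gamma_cdf(mid, shape) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def gamma_method(p_values, shape):
    """Combine independent P-values; shape = 1 reproduces Fisher's method."""
    statistic = sum(gamma_quantile(1.0 - p, shape) for p in p_values)
    return 1.0 - gamma_cdf(statistic, shape * len(p_values))


# Hypothetical gene-level P-values for one gene set (illustration only).
gene_p_values = [0.004, 0.20, 0.65, 0.03]
combined_p = gamma_method(gene_p_values, shape=0.5)
```

    Smaller shape values give more weight to the smallest P-values in the set, which is the sense in which the truncation is "soft". For very small combined P-values, a library routine for the incomplete gamma function would be numerically safer than this series.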

  1. Short Tree, Long Tree, Right Tree, Wrong Tree: New Acquisition Bias Corrections for Inferring SNP Phylogenies

    PubMed Central

    Leaché, Adam D.; Banbury, Barbara L.; Felsenstein, Joseph; de Oca, Adrián Nieto-Montes; Stamatakis, Alexandros

    2015-01-01

    Single nucleotide polymorphisms (SNPs) are useful markers for phylogenetic studies owing in part to their ubiquity throughout the genome and ease of collection. Restriction site associated DNA sequencing (RADseq) methods are becoming increasingly popular for SNP data collection, but an assessment of the best practises for using these data in phylogenetics is lacking. We use computer simulations, and new double digest RADseq (ddRADseq) data for the lizard family Phrynosomatidae, to investigate the accuracy of RAD loci for phylogenetic inference. We compare the two primary ways RAD loci are used during phylogenetic analysis, including the analysis of full sequences (i.e., SNPs together with invariant sites), or the analysis of SNPs on their own after excluding invariant sites. We find that using full sequences rather than just SNPs is preferable from the perspectives of branch length and topological accuracy, but not of computational time. We introduce two new acquisition bias corrections for dealing with alignments composed exclusively of SNPs, a conditional likelihood method and a reconstituted DNA approach. The conditional likelihood method conditions on the presence of variable characters only (the number of invariant sites that are unsampled but known to exist is not considered), while the reconstituted DNA approach requires the user to specify the exact number of unsampled invariant sites prior to the analysis. Under simulation, branch length biases increase with the amount of missing data for both acquisition bias correction methods, but branch length accuracy is much improved in the reconstituted DNA approach compared to the conditional likelihood approach. 
Phylogenetic analyses of the empirical data using concatenation or a coalescent-based species tree approach provide strong support for many of the accepted relationships among phrynosomatid lizards, suggesting that RAD loci contain useful phylogenetic signal across a range of divergence times despite the

  2. The impact of natural selection on an ABCC11 SNP determining earwax type.

    PubMed

    Ohashi, Jun; Naka, Izumi; Tsuchiya, Naoyuki

    2011-01-01

    A nonsynonymous single nucleotide polymorphism (SNP), rs17822931-G/A (538G>A; Gly180Arg), in the ABCC11 gene determines human earwax type (i.e., wet or dry) and is one of the most differentiated nonsynonymous SNPs between East Asian and African populations. A recent genome-wide scan for positive selection revealed that a genomic region spanning the ABCC11, LONP2, and SIAH1 genes has been subjected to a selective sweep in East Asians. Considering the potential functional significance as well as the population differentiation of SNPs located in that region, rs17822931 is the most plausible candidate polymorphism to have undergone geographically restricted positive selection. In this study, we estimated the selection intensity or selection coefficient of rs17822931-A in East Asians by analyzing two microsatellite loci flanking rs17822931 in the African (HapMap-YRI) and East Asian (HapMap-JPT and HapMap-CHB) populations. Assuming a recessive selection model, a coalescent-based simulation approach suggested that the selection coefficient of rs17822931-A had been approximately 0.01 in the East Asian population, and a simulation experiment using a pseudo-sampling variable revealed that the mutation of rs17822931-A occurred 2,006 generations (95% credible interval, 1,023-3,901 generations) ago. In addition, we show that absolute latitude is significantly associated with the allele frequency of rs17822931-A in Asian, Native American, and European populations, implying that the selective advantage of rs17822931-A is related to an adaptation to a cold climate. Our results provide a striking example of how local adaptation has played a significant role in the diversification of human traits.

  3. SNP-based association mapping of the polled gene in divergent cattle breeds.

    PubMed

    Seichter, D; Russ, I; Rothammer, S; Eder, J; Förster, M; Medugorac, I

    2012-10-01

    Naturally hornless cattle are called polled. Although the POLL locus could be assigned to a c. 1.36-Mb interval in the centromeric region of BTA1, the underlying genetic basis for the polled trait is still unknown. Here, an association mapping design was set up to refine the candidate region of the polled trait for subsequent high-throughput sequencing. The case group comprised 101 homozygous polled animals from nine divergent cattle breeds, the majority represented by Galloway, Angus, Fleckvieh and Holstein Friesian. Additionally, this group included some polled individuals of Blonde d'Aquitaine, Charolais, Hereford, Jersey and Limousin breeds. The control group comprised horned Belgian Blue, Fleckvieh, Holstein Friesian and Illyrian Buša cattle. A genome-wide scan using 49,163 SNPs was performed, which revealed one shared homozygous haplotype block consisting of nine neighbouring SNPs in all polled animals. This segment defines a 381-kb interval on BTA1 that we consider to be the most likely location of the POLL mutation. Our results further demonstrate that the polled-associated haplotype is also frequent in horned animals included in this study, and thus the haplotype as such cannot be used for population-wide genetic testing. The actual trait-associated haplotype may be revealed by using higher-density SNP arrays. For the final identification of the causal mutation, we suggest high-throughput sequencing of the entire candidate region, because the identification of functional candidate genes is difficult owing to the lack of a comparable model.

  4. SNP detection using RNA-sequences of candidate genes associated with puberty in cattle.

    PubMed

    Dias, M M; Cánovas, A; Mantilla-Rojas, C; Riley, D G; Luna-Nevarez, P; Coleman, S J; Speidel, S E; Enns, R M; Islas-Trejo, A; Medrano, J F; Moore, S S; Fortes, M R S; Nguyen, L T; Venus, B; Diaz, I S D P; Souza, F R P; Fonseca, L F S; Baldi, F; Albuquerque, L G; Thomas, M G; Oliveira, H N

    2017-03-22

    Fertility traits, such as heifer pregnancy, are economically important in cattle production systems and are therefore used in genetic selection programs. The aim of this study was to identify single nucleotide polymorphisms (SNPs) using RNA-sequencing (RNA-Seq) data from ovary, uterus, endometrium, pituitary gland, hypothalamus, liver, longissimus dorsi muscle, and adipose tissue in 62 candidate genes associated with heifer puberty in cattle. RNA-Seq reads were assembled to the bovine reference genome (UMD 3.1.1) and analyzed in five cattle breeds: Brangus, Brahman, Nellore, Angus, and Holstein. Two approaches used the Brangus data for SNP discovery: 1) pooling all samples, and 2) within each individual sample. These approaches revealed 1157 SNPs. These were compared with those identified in the pooled samples of the other breeds. Overall, 172 SNPs within 13 genes (CPNE5, FAM19A4, FOXN4, KLF1, LOC777593, MGC157266, NEBL, NRXN3, PEPT-1, PPP3CA, SCG5, TSG101, and TSHR) were concordant in the five breeds. Using Ensembl's Variant Effect Predictor, we determined that 12% of SNPs were in exons (71% synonymous, 29% nonsynonymous), 1% were in untranslated regions (UTRs), 86% were in introns, and 1% were in intergenic regions. Since these SNPs were discovered in RNA, the variants were predicted to be within exons or UTRs. Overall, 160 novel transcripts in 42 candidate genes and five novel genes overlapping five candidate genes were observed. In conclusion, 1157 SNPs were identified in 62 candidate genes associated with puberty in Brangus cattle, of which 172 were concordant in the five cattle breeds. Novel transcripts and genes were also identified.

  5. ngs_backbone: a pipeline for read cleaning, mapping and SNP calling using Next Generation Sequence

    PubMed Central

    2011-01-01

    Background The possibilities offered by next generation sequencing (NGS) platforms are revolutionizing biotechnological laboratories. Moreover, the combination of NGS sequencing and affordable high-throughput genotyping technologies is facilitating the rapid discovery and use of SNPs in non-model species. However, this abundance of sequences and polymorphisms creates new software needs. To fulfill these needs, we have developed a powerful, yet easy-to-use application. Results The ngs_backbone software is a parallel pipeline capable of analyzing Sanger, 454, Illumina and SOLiD (Sequencing by Oligonucleotide Ligation and Detection) sequence reads. Its main supported analyses are: read cleaning, transcriptome assembly and annotation, read mapping and single nucleotide polymorphism (SNP) calling and selection. In order to build a truly useful tool, the software development was paired with a laboratory experiment. All public tomato Sanger EST reads plus 14.2 million Illumina reads were employed to test the tool and predict polymorphism in tomato. The cleaned reads were mapped to the SGN tomato transcriptome obtaining a coverage of 4.2 for Sanger and 8.5 for Illumina. 23,360 single nucleotide variations (SNVs) were predicted. A total of 76 SNVs were experimentally validated, and 85% were found to be real. Conclusions ngs_backbone is a new software package capable of analyzing sequences produced by NGS technologies and predicting SNVs with great accuracy. In our tomato example, we created a highly polymorphic collection of SNVs that will be a useful resource for tomato researchers and breeders. The software developed along with its documentation is freely available under the AGPL license and can be downloaded from http://bioinf.comav.upv.es/ngs_backbone/ or http://github.com/JoseBlanca/franklin. PMID:21635747

  6. Regression Modeling and Meta-Analysis of Diagnostic Accuracy of SNP-Based Pathogenicity Detection Tools for UGT1A1 Gene Mutation

    PubMed Central

    Rahim, Fakher; Galehdari, Hamid; Mohammadi-asl, Javad; Saki, Najmaldin

    2013-01-01

    Aims. This review summarized all available evidence on the accuracy of SNP-based pathogenicity detection tools and introduced a regression model based on functional scores, mutation score, and genomic variation degree. Materials and Methods. A comprehensive search was performed to find all mutations related to Crigler-Najjar syndrome. The pathogenicity prediction was done using SNP-based pathogenicity detection tools including SIFT, PHD-SNP, PolyPhen2, fathmm, Provean, and Mutpred. Overall, 59 different SNPs related to missense mutations in the UGT1A1 gene were reviewed. Results. Comparing the diagnostic OR, our model showed high detection potential (diagnostic OR: 16.71, 95% CI: 3.38–82.69). The highest MCC and ACC belonged to our suggested model (46.8% and 73.3%), followed by SIFT (34.19% and 62.71%). The AUC analysis showed a significant overall performance of our suggested model compared to the selected SNP-based pathogenicity detection tools (P = 0.046). Conclusion. Our suggested model is comparable to the well-established SNP-based pathogenicity detection tools that can appropriately reflect the role of a disease-associated SNP in both local and global structures. Although the accuracy of our suggested model is not relatively high, the functional impact of the pathogenic mutations is highlighted at the protein level, which improves the understanding of the molecular basis of mutation pathogenesis. PMID:23997956

  7. LincSNP 2.0: an updated database for linking disease-associated SNPs to human long non-coding RNAs and their TFBSs

    PubMed Central

    Ning, Shangwei; Yue, Ming; Wang, Peng; Liu, Yue; Zhi, Hui; Zhang, Yan; Zhang, Jizhou; Gao, Yue; Guo, Maoni; Zhou, Dianshuang; Li, Xin; Li, Xia

    2017-01-01

    We describe LincSNP 2.0 (http://bioinfo.hrbmu.edu.cn/LincSNP), an updated database that is used specifically to store and annotate disease-associated single nucleotide polymorphisms (SNPs) in human long non-coding RNAs (lncRNAs) and their transcription factor binding sites (TFBSs). In LincSNP 2.0, we have updated the database with more data and several new features, including (i) expanding disease-associated SNPs in human lncRNAs; (ii) identifying disease-associated SNPs in lncRNA TFBSs; (iii) updating LD-SNPs from the 1000 Genomes Project; and (iv) collecting more experimentally supported SNP-lncRNA-disease associations. Furthermore, we developed three flexible online tools to retrieve and analyze the data. Linc-Mart is a convenient way for users to customize their own data. Linc-Browse is a tool for all data visualization. Linc-Score predicts the associations between lncRNA and disease. In addition, we provided users a newly designed, user-friendly interface to search and download all the data in LincSNP 2.0 and we also provided an interface to submit novel data into the database. LincSNP 2.0 is a continually updated database and will serve as an important resource for investigating the functions and mechanisms of lncRNAs in human diseases. PMID:27924020

  8. Mutations of C-reactive protein (CRP) -286 SNP, APC and p53 in colorectal cancer: implication for a CRP-Wnt crosstalk.

    PubMed

    Su, Hai-Xiang; Zhou, Hai-Hong; Wang, Ming-Yu; Cheng, Jin; Zhang, Shi-Chao; Hui, Feng; Chen, Xue-Zhong; Liu, Shan-Hui; Liu, Qin-Jiang; Zhu, Zi-Jiang; Hu, Qing-Rong; Wu, Yi; Ji, Shang-Rong

    2014-01-01

    C-reactive protein (CRP) is an established marker of inflammation with pattern-recognition receptor-like activities. Despite the close association of the serum level of CRP with the risk and prognosis of several types of cancer, it remains elusive whether CRP contributes directly to tumorigenesis or just represents a bystander marker. We have recently identified recurrent mutations at the SNP position -286 (rs3091244) in the promoter of CRP gene in several tumor types, instead suggesting that locally produced CRP is a potential driver of tumorigenesis. However, it is unknown whether the -286 site is the sole SNP position of CRP gene targeted for mutation and whether there is any association between CRP SNP mutations and other frequently mutated genes in tumors. Herein, we have examined the genotypes of three common CRP non-coding SNPs (rs7553007, rs1205, rs3093077) in tumor/normal sample pairs of 5 cancer types (n = 141). No recurrent somatic mutations are found at these SNP positions, indicating that the -286 SNP mutations are preferentially selected during the development of cancer. Further analysis reveals that the -286 SNP mutations of CRP tend to co-occur with mutated APC particularly in rectal cancer (p = 0.04; n = 67). By contrast, mutations of CRP and p53 or K-ras appear to be unrelated. These results thus underscore the functional importance of the -286 mutation of CRP in tumorigenesis and imply an interaction between CRP and Wnt signaling pathway.

  9. Single nucleotide polymorphism (SNP) variation of wolves (Canis lupus) in Southeast Alaska and comparison with wolves, dogs, and coyotes in North America.

    PubMed

    Cronin, Matthew A; Cánovas, Angela; Bannasch, Danika L; Oberbauer, Anita M; Medrano, Juan F

    2015-01-01

    There is considerable interest in the genetics of wolves (Canis lupus) because of their close relationship to domestic dogs (C. familiaris) and the need for informed conservation and management. This includes wolf populations in Southeast Alaska for which we determined genotypes of 305 wolves at 173662 single nucleotide polymorphism (SNP) loci. After removal of invariant and linked SNP, 123801 SNP were used to quantify genetic differentiation of wolves in Southeast Alaska and wolves, coyotes (C. latrans), and dogs from other areas in North America. There is differentiation of SNP allele frequencies between the species (wolves, coyotes, and dogs), although differentiation is relatively low between some wolf and coyote populations. There are varying levels of differentiation among populations of wolves, including low differentiation of wolves in interior Alaska, British Columbia, and the northern US Rocky Mountains. There is considerable differentiation of SNP allele frequencies of wolves in Southeast Alaska from wolves in other areas. However, wolves in Southeast Alaska are not a genetically homogeneous group and there are comparable levels of genetic differentiation among areas within Southeast Alaska and between Southeast Alaska and other geographic areas. SNP variation and other genetic data are discussed regarding taxonomy and management.

  10. SNP genotypes of Mycobacterium leprae isolates in Thailand and their combination with rpoT and TTC genotyping for analysis of leprosy distribution and transmission.

    PubMed

    Phetsuksiri, Benjawan; Srisungngam, Sopa; Rudeeaneksin, Janisara; Bunchoo, Supranee; Lukebua, Atchariya; Wongtrungkapun, Ruch; Paitoon, Soontara; Sakamuri, Rama Murthy; Brennan, Patrick J; Vissa, Varalakshmi

    2012-01-01

    Based on the discovery of three single nucleotide polymorphisms (SNPs) in Mycobacterium leprae, it has been previously reported that there are four major SNP types associated with different geographic regions around the world. Another typing system for global differentiation of M. leprae is the analysis of the variable number of short tandem repeats within the rpoT gene. To expand the analysis of geographic distribution of M. leprae, classified by SNP and rpoT gene polymorphisms, we studied 85 clinical isolates from Thai patients and compared the findings with those reported from Asian isolates. SNP genotyping by PCR amplification and sequencing revealed that all strains like those in Myanmar were SNP type 1 and 3, with the former being predominant, while in Japan, Korea, and Indonesia, the SNP type 3 was found to be more frequent. The pattern of M. leprae distribution in Thailand and Myanmar is quite similar, except that SNP type 2 was not found in Thailand. In addition, the 3-copy hexamer genotype in the rpoT gene is shared among the isolates from these two neighboring countries. On the basis of these two markers, we postulate that M. leprae in leprosy patients from Myanmar and Thailand has a common historical origin. Further differentiation among Thai isolates was possible by assessing copy numbers of the TTC sequence, a more polymorphic microsatellite locus.

  11. Incorporation of Personal Single Nucleotide Polymorphism (SNP) Data into a National Level Electronic Health Record for Disease Risk Assessment, Part 3: An Evaluation of SNP Incorporated National Health Information System of Turkey for Prostate Cancer

    PubMed Central

    Beyan, Timur

    2014-01-01

    Background A personalized medicine approach provides opportunities for predictive and preventive medicine. Using genomic, clinical, environmental, and behavioral data, the tracking and management of individual wellness is possible. A prolific way to carry this personalized approach into routine practices can be accomplished by integrating clinical interpretations of genomic variations into electronic medical records (EMRs)/electronic health records (EHRs). Today, various central EHR infrastructures have been constituted in many countries of the world, including Turkey. Objective As an initial attempt to develop a sophisticated infrastructure, we have concentrated on incorporating the personal single nucleotide polymorphism (SNP) data into the National Health Information System of Turkey (NHIS-T) for disease risk assessment, and evaluated the performance of various predictive models for prostate cancer cases. We present our work as a three part miniseries: (1) an overview of requirements, (2) the incorporation of SNP data into the NHIS-T, and (3) an evaluation of SNP data incorporated into the NHIS-T for prostate cancer. Methods In the third article of this miniseries, we have evaluated the proposed complementary capabilities (ie, knowledge base and end-user application) with real data. Before the evaluation phase, clinicogenomic associations about increased prostate cancer risk were extracted from knowledge sources, and published predictive genomic models assessing individual prostate cancer risk were collected. To evaluate complementary capabilities, we also gathered personal SNP data of four prostate cancer cases and fifteen controls. Using these data files, we compared various independent and model-based, prostate cancer risk assessment approaches. Results Through the extraction and selection processes of SNP-prostate cancer risk associations, we collected 209 independent associations for increased risk of prostate cancer from the studied knowledge sources. 
Also

  12. Long form leptin receptor and SNP effect on reproductive traits during embryo attachment in Suzhong sows.

    PubMed

    Fu, Yanfeng; Li, Lan; Li, Bixia; Fang, Xiaomin; Ren, Shouwen

    2016-05-01

    To ascertain whether the long form leptin receptor (LEPR) affects the regulation of embryo attachment and whether there are LEPR Single Nucleotide Polymorphisms (SNPs) associated with reproductive traits in pigs, real-time qPCR was used to detect relative abundance of LEPR mRNA pattern in different tissues of Suzhong sows during the embryo attachment period (pregnancy day 13, 18 and 24) to the uterus, and PCR-RFLP as well as PCR-sequencing were used to investigate the coding sequence for SNPs of LEPR in a population of 512 Suzhong sows. Real-time qPCR results indicated that LEPR mRNA was present in all 22 tissues of pigs with differences in relative abundance of the LEPR mRNA (P<0.05). Among these tissues, the greatest relative abundance occurred at the endometrial attachment site (P<0.01), followed by the hypothalamus and most reproductive tissues (P<0.05), and there was a lesser relative abundance of the LEPR mRNA in the pituitary. During different embryo attachment periods, LEPR mRNA was greatest on Day 18 (attachment; P<0.05), followed by Day 24 (post-attachment), and relative abundance was least on Day 13 (pre-attachment). The prevalence of the LEPR mRNA in pregnant sows was greater than in non-pregnant sows (P<0.05). At the c.2856C>T locus of LEPR, Chi-square test results demonstrated that allele and genotype frequencies were in Hardy-Weinberg disequilibrium. PCR-RFLP results revealed that Genotype TT was greater than Genotype CC (P<0.05) for reproductive traits of TNB (Total Number Born) and NBA (Number Born Alive), which suggested that the T allele at the c.2856C>T locus has advantageous effects on litter size and litter weight in Suzhong pigs. In conclusion, the expression of the LEPR gene might be involved in the regulation of embryo attachment mechanisms in pigs, and the LEPR SNP c.2856C>T could be a molecular marker for improving litter size and litter weight in pig breeding.
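
    The Hardy-Weinberg chi-square test mentioned in this abstract compares observed genotype counts with the counts expected from the estimated allele frequencies. The following is a minimal sketch of the textbook calculation for a biallelic locus; the function name and genotype counts are hypothetical, since the abstract does not report the actual counts.

```python
def hwe_chi_square(n_cc, n_ct, n_tt):
    """Chi-square statistic for Hardy-Weinberg equilibrium at a biallelic
    locus, given observed counts of the CC, CT, and TT genotypes."""
    n = n_cc + n_ct + n_tt
    p = (2 * n_cc + n_ct) / (2 * n)   # estimated frequency of the C allele
    q = 1 - p                         # estimated frequency of the T allele
    observed = (n_cc, n_ct, n_tt)
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    # Sum (O - E)^2 / E over genotypes with a non-zero expected count.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return chi2, p

# Illustrative counts only (an assumption, not data from the study).
chi2, p_c = hwe_chi_square(n_cc=50, n_ct=0, n_tt=50)
print(chi2, p_c)
```

    The statistic has one degree of freedom, so a value above roughly 3.84 indicates departure from equilibrium at the 0.05 level.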

  13. SNP-Based Linkage Mapping for Validation of QTLs for Resistance to Ascochyta Blight in Lentil

    PubMed Central

    Sudheesh, Shimna; Rodda, Matthew S.; Davidson, Jenny; Javid, Muhammad; Stephens, Amber; Slater, Anthony T.; Cogan, Noel O. I.; Forster, John W.; Kaur, Sukhjiwan

    2016-01-01

    Lentil (Lens culinaris Medik.) is a self-pollinating, diploid, annual, cool-season, food legume crop that is cultivated throughout the world. Ascochyta blight (AB), caused by Ascochyta lentis Vassilievsky, is an economically important and widespread disease of lentil. Development of cultivars with high levels of durable resistance provides an environmentally acceptable and economically feasible method for AB control. A detailed understanding of the genetic basis of AB resistance is hence highly desirable, in order to obtain insight into the number and influence of resistance genes. Genetic linkage maps based on single nucleotide polymorphisms (SNP) and simple sequence repeat (SSR) markers have been developed from three recombinant inbred line (RIL) populations. The IH × NF map contained 460 loci across 1461.6 cM, while the IH × DIG map contained 329 loci across 1302.5 cM and the third map, NF × DIG contained 330 loci across 1914.1 cM. Data from these maps were combined with a map from a previously published study through use of bridging markers to generate a consensus linkage map containing 689 loci distributed across seven linkage groups (LGs), with a cumulative length of 2429.61 cM at an average density of one marker per 3.5 cM. Trait dissection of AB resistance was performed for the RIL populations, identifying totals of two and three quantitative trait loci (QTLs) explaining 52 and 69% of phenotypic variation for resistance to infection in the IH × DIG and IH × NF populations, respectively. Presence of common markers in the vicinity of the AB_IH1- and AB_IH2.1/AB_IH2.2-containing regions on both maps supports the inference that a common genomic region is responsible for conferring resistance and is associated with the resistant parent, Indianhead. The third QTL was derived from Northfield. Evaluation of markers associated with AB resistance across a diverse lentil germplasm panel revealed that the identity of alleles associated with AB_IH1 predicted the

  14. 1 + 1 = 3: Development and validation of a SNP-based algorithm to identify genetic contributions from three distinct inbred mouse strains.

    PubMed

    Gorham, James D; Ranson, Matthew S; Smith, Janebeth C; Gorham, Beverly J; Muirhead, Kristen-Ashley

    2012-12-01

    State-of-the-art, genome-wide assessment of mouse genetic background uses single nucleotide polymorphism (SNP) PCR. As SNP analysis can use multiplex testing, it is amenable to high-throughput analysis and is the preferred method for shared resource facilities that offer genetic background assessment of mouse genomes. However, a typical individual SNP query yields only two alleles (A vs. B), limiting the application of this methodology to distinguishing contributions from no more than two inbred mouse strains. By contrast, simple sequence length polymorphism (SSLP) analysis yields multiple alleles but is not amenable to high-throughput testing. We sought to devise a SNP-based technique to identify donor strain origins when three distinct mouse strains potentially contribute to the genetic makeup of an individual mouse. A computational approach was used to devise a three-strain analysis (3SA) algorithm that would permit identification of three genetic backgrounds while still using a binary-output SNP platform. A panel of 15 mosaic mice with contributions from BALB/c, C57Bl/6, and DBA/2 genetic backgrounds was bred and analyzed using a genome-wide SNP panel using 1449 markers. The 3SA algorithm was applied and then validated using SSLP. The 3SA algorithm assigned 85% of 1449 SNPs as informative for the C57Bl/6, BALB/c, or DBA/2 backgrounds, respectively. Testing the panel of 15 F2 mice, the 3SA algorithm predicted donor strain origins genome-wide. Donor strain origins predicted by the 3SA algorithm correlated perfectly with results from individual SSLP markers located on five different chromosomes (n=70 tests). We have established and validated an analysis algorithm based on binary SNP data that can successfully identify the donor strain origins of chromosomal regions in mice that are bred from three distinct inbred mouse strains.
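
    The abstract does not spell out how the 3SA algorithm decides that a SNP is informative. One plausible reading, shown below purely as an illustrative assumption and not as the authors' method, is that a binary SNP is informative for a strain when that strain carries one allele and the other two strains share the opposite allele. The function name and the allele calls are hypothetical.

```python
def informative_strain(alleles):
    """Given a mapping of exactly three strains to binary allele calls
    ('A' or 'B'), return the strain whose allele differs from the other
    two, or None if all three strains share the same allele."""
    strains = list(alleles)
    for strain in strains:
        others = [alleles[s] for s in strains if s != strain]
        # Informative when the other two agree and this strain differs.
        if others[0] == others[1] and others[0] != alleles[strain]:
            return strain
    return None

# Hypothetical allele calls at two SNPs (not data from the study).
snp_1 = {"BALB/c": "A", "C57Bl/6": "B", "DBA/2": "A"}
snp_2 = {"BALB/c": "A", "C57Bl/6": "A", "DBA/2": "A"}
print(informative_strain(snp_1))
print(informative_strain(snp_2))
```

    Under this reading, a chromosomal region's donor strain would be inferred by combining calls from nearby SNPs that are informative for different strains.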

  15. SNP-Discovery by RAD-Sequencing in a Germplasm Collection of Wild and Cultivated Grapevines (V. vinifera L.).

    PubMed

    Marrano, Annarita; Birolo, Giovanni; Prazzoli, Maria Lucia; Lorenzi, Silvia; Valle, Giorgio; Grando, Maria Stella

    2017-01-01

    Whole-genome comparisons of Vitis vinifera subsp. sativa and V. vinifera subsp. sylvestris are expected to provide a better estimate of the valuable genetic diversity still present in grapevine, and help to reconstruct the evolutionary history of a major crop worldwide. To this aim, the increase of molecular marker density across the grapevine genome is fundamental. Here we describe the SNP discovery in a grapevine germplasm collection of 51 cultivars and 44 wild accessions through a novel protocol of restriction-site associated DNA (RAD) sequencing. By resequencing 1.1% of the grapevine genome at a high coverage, we recovered 34K BamHI unique restriction sites, of which 6.8% were absent in the 'PN40024' reference genome. Moreover, we identified 37,748 single nucleotide polymorphisms (SNPs), 93% of which belonged to the 19 assembled chromosomes with an average of 1.8K SNPs per chromosome. Nearly half of the SNPs fell in genic regions mostly assigned to the functional categories of metabolism and regulation, whereas some nonsynonymous variants were identified in genes related with the detection and response to environmental stimuli. SNP validation was carried out, showing the ability of RAD-seq to accurately determine genotypes in a highly heterozygous species. To test the usefulness of our SNP panel, the main diversity statistics were evaluated, highlighting how the wild grapevine retained less genetic variability than the cultivated form. Furthermore, the analysis of Linkage Disequilibrium (LD) in the two subspecies separately revealed how the LD decays faster within the domesticated grapevine compared to its wild relative. Being the first application of RAD-seq in a diverse grapevine germplasm collection, our approach holds great promise for exploiting the genetic resources available in one of the most economically important fruit crops.

  16. Nuclear species-diagnostic SNP markers mined from 454 amplicon sequencing reveal admixture genomic structure of modern citrus varieties.

    PubMed

    Curk, Franck; Ancillo, Gema; Ollitrault, Frédérique; Perrier, Xavier; Jacquemoud-Collet, Jean-Pierre; Garcia-Lor, Andres; Navarro, Luis; Ollitrault, Patrick

    2015-01-01

    Most cultivated Citrus species originated from interspecific hybridisation between four ancestral taxa (C. reticulata, C. maxima, C. medica, and C. micrantha) with limited further interspecific recombination due to vegetative propagation. This evolution resulted in admixture genomes with frequent interspecific heterozygosity. Moreover, a major part of the phenotypic diversity of edible citrus results from the initial differentiation between these taxa. Deciphering the phylogenomic structure of citrus germplasm is therefore essential for an efficient utilization of citrus biodiversity in breeding schemes. The objective of this work was to develop a set of species-diagnostic single nucleotide polymorphism (SNP) markers for the four Citrus ancestral taxa covering the nine chromosomes, and to use these markers to infer the phylogenomic structure of secondary species and modern cultivars. Species-diagnostic SNPs were mined from 454 amplicon sequencing of 57 gene fragments from 26 genotypes of the four basic taxa. Of the 1,053 SNPs mined from 28,507 kb sequence, 273 were found to be highly diagnostic for a single basic taxon. Species-diagnostic SNP markers (105) were used to analyse the admixture structure of varieties and rootstocks. This revealed C. maxima introgressions in most of the old and in all recent selections of mandarins, and suggested that C. reticulata × C. maxima reticulation and introgression processes were important in edible mandarin domestication. The large range of phylogenomic constitutions between C. reticulata and C. maxima revealed in mandarins, tangelos, tangors, sweet oranges, sour oranges, grapefruits, and orangelos is favourable for genetic association studies based on phylogenomic structures of the germplasm. Inferred admixture structures were in agreement with previous hypotheses regarding the origin of several secondary species and also revealed the probable origin of several acid citrus varieties. The developed species-diagnostic SNP

  17. Development of SNP markers for genes of the phenylpropanoid pathway and their association to kernel and malting traits in barley

    PubMed Central

    2013-01-01

    Background Flavonoids are an important class of secondary compounds in angiosperms. Next to certain biological functions in plants, they play a role in the brewing process and have an effect on taste, color and aroma of beer. The aim of this study was to reveal the haplotype diversity of candidate genes involved in the phenylpropanoid biosynthesis pathway in cultivated barley varieties (Hordeum vulgare L.) and to determine associations to kernel and malting quality parameters. Results Five genes encoding phenylalanine ammonia-lyase (PAL), cinnamate 4-hydroxylase (C4H), chalcone synthase (CHS), flavanone 3-hydroxylase (F3H) and dihydroflavonol reductase (DFR) of the phenylpropanoid biosynthesis pathway were partially resequenced in 16 diverse barley reference genotypes. Their localization in the barley genome, their genetic structure, and their genetic variation e.g. single nucleotide polymorphism (SNP) and Insertion/Deletion (InDel) patterns were revealed. In total, 130 SNPs and seven InDels were detected. Of these, 21 polymorphisms were converted into high-throughput pyrosequencing markers. The resulting SNP and haplotype patterns were used to calculate associations with kernel and malting quality parameters. Conclusions SNP patterns were found to be highly variable for the investigated genes. The developed high-throughput markers are applicable for assessing the genetic variability and for the determination of haplotype patterns in a set of barley accessions. The candidate genes PAL, C4H and F3H were shown to be associated to several malting properties like glassiness (PAL), viscosity (C4H) or to final attenuation (F3H). PMID:24088365

  18. Nuclear Species-Diagnostic SNP Markers Mined from 454 Amplicon Sequencing Reveal Admixture Genomic Structure of Modern Citrus Varieties

    PubMed Central

    Curk, Franck; Ancillo, Gema; Ollitrault, Frédérique; Perrier, Xavier; Jacquemoud-Collet, Jean-Pierre; Garcia-Lor, Andres; Navarro, Luis; Ollitrault, Patrick

    2015-01-01

    Most cultivated Citrus species originated from interspecific hybridisation between four ancestral taxa (C. reticulata, C. maxima, C. medica, and C. micrantha) with limited further interspecific recombination due to vegetative propagation. This evolution resulted in admixture genomes with frequent interspecific heterozygosity. Moreover, a major part of the phenotypic diversity of edible citrus results from the initial differentiation between these taxa. Deciphering the phylogenomic structure of citrus germplasm is therefore essential for an efficient utilization of citrus biodiversity in breeding schemes. The objective of this work was to develop a set of species-diagnostic single nucleotide polymorphism (SNP) markers for the four Citrus ancestral taxa covering the nine chromosomes, and to use these markers to infer the phylogenomic structure of secondary species and modern cultivars. Species-diagnostic SNPs were mined from 454 amplicon sequencing of 57 gene fragments from 26 genotypes of the four basic taxa. Of the 1,053 SNPs mined from 28,507 kb sequence, 273 were found to be highly diagnostic for a single basic taxon. Species-diagnostic SNP markers (105) were used to analyse the admixture structure of varieties and rootstocks. This revealed C. maxima introgressions in most of the old and in all recent selections of mandarins, and suggested that C. reticulata × C. maxima reticulation and introgression processes were important in edible mandarin domestication. The large range of phylogenomic constitutions between C. reticulata and C. maxima revealed in mandarins, tangelos, tangors, sweet oranges, sour oranges, grapefruits, and orangelos is favourable for genetic association studies based on phylogenomic structures of the germplasm. Inferred admixture structures were in agreement with previous hypotheses regarding the origin of several secondary species and also revealed the probable origin of several acid citrus varieties. The developed species-diagnostic SNP

  19. A HapMap leads to a Capsicum annuum SNP infinium array: a new tool for pepper breeding.

    PubMed

    Hulse-Kemp, Amanda M; Ashrafi, Hamid; Plieske, Joerg; Lemm, Jana; Stoffel, Kevin; Hill, Theresa; Luerssen, Hartmut; Pethiyagoda, Charit L; Lawley, Cindy T; Ganal, Martin W; Van Deynze, Allen

    2016-01-01

    The Capsicum genus (Pepper) is a part of the Solanaceae family. It has been important in many cultures worldwide for its key nutritional components and uses as spices, medicines, ornamentals and vegetables. Worldwide population growth is associated with demand for more nutritionally valuable vegetables while contending with decreasing resources and available land. These conditions require increased efficiency in pepper breeding to deal with these imminent challenges. Through resequencing of inbred lines we have completed a valuable haplotype map (HapMap) for the pepper genome based on single-nucleotide polymorphisms (SNP). The identified SNPs were annotated and classified based on their gene annotation in the pepper draft genome sequence and phenotype of the sequenced inbred lines. A selection of one marker per gene model was utilized to create the PepperSNP16K array, which simultaneously genotyped 16 405 SNPs, of which 90.7% were found to be informative. A set of 84 inbred and hybrid lines and a mapping population of 90 interspecific F2 individuals were utilized to validate the array. Diversity analysis of the inbred lines shows a distinct separation of bell versus chile/hot pepper types and separates them into five distinct germplasm groups. The interspecific population created between Tabasco (C. frutescens chile type) and P4 (C. annuum blocky type) produced a linkage map with 5546 markers separated into 1361 bins on 12 linkage groups representing 1392.3 cM. This publicly available genotyping platform can be used to rapidly assess a large number of markers in a reproducible high-throughput manner for pepper. As a standardized tool for genetic analyses, the PepperSNP16K can be used worldwide to share findings and analyze QTLs for important traits leading to continued improvement of pepper for consumers. Data and information on the array are available through the Solanaceae Genomics Network.

  20. A HapMap leads to a Capsicum annuum SNP infinium array: a new tool for pepper breeding

    PubMed Central

    Hulse-Kemp, Amanda M; Ashrafi, Hamid; Plieske, Joerg; Lemm, Jana; Stoffel, Kevin; Hill, Theresa; Luerssen, Hartmut; Pethiyagoda, Charit L; Lawley, Cindy T; Ganal, Martin W; Van Deynze, Allen

    2016-01-01

    The Capsicum genus (Pepper) is a part of the Solanaceae family. It has been important in many cultures worldwide for its key nutritional components and uses as spices, medicines, ornamentals and vegetables. Worldwide population growth is associated with demand for more nutritionally valuable vegetables while contending with decreasing resources and available land. These conditions require increased efficiency in pepper breeding to deal with these imminent challenges. Through resequencing of inbred lines we have completed a valuable haplotype map (HapMap) for the pepper genome based on single-nucleotide polymorphisms (SNP). The identified SNPs were annotated and classified based on their gene annotation in the pepper draft genome sequence and phenotype of the sequenced inbred lines. A selection of one marker per gene model was utilized to create the PepperSNP16K array, which simultaneously genotyped 16 405 SNPs, of which 90.7% were found to be informative. A set of 84 inbred and hybrid lines and a mapping population of 90 interspecific F2 individuals were utilized to validate the array. Diversity analysis of the inbred lines shows a distinct separation of bell versus chile/hot pepper types and separates them into five distinct germplasm groups. The interspecific population created between Tabasco (C. frutescens chile type) and P4 (C. annuum blocky type) produced a linkage map with 5546 markers separated into 1361 bins on 12 linkage groups representing 1392.3 cM. This publicly available genotyping platform can be used to rapidly assess a large number of markers in a reproducible high-throughput manner for pepper. As a standardized tool for genetic analyses, the PepperSNP16K can be used worldwide to share findings and analyze QTLs for important traits leading to continued improvement of pepper for consumers. Data and information on the array are available through the Solanaceae Genomics Network. PMID:27602231

  1. Japonica array: improved genotype imputation by designing a population-specific SNP array with 1070 Japanese individuals

    PubMed Central

    Kawai, Yosuke; Mimori, Takahiro; Kojima, Kaname; Nariai, Naoki; Danjoh, Inaho; Saito, Rumiko; Yasuda, Jun; Yamamoto, Masayuki; Nagasaki, Masao

    2015-01-01

    The Tohoku Medical Megabank Organization constructed the reference panel (referred to as the 1KJPN panel), which contains >20 million single nucleotide polymorphisms (SNPs), from whole-genome sequence data from 1070 Japanese individuals. The 1KJPN panel contains the largest number of haplotypes of Japanese ancestry to date. Here, from the 1KJPN panel, we designed a novel custom-made SNP array, named the Japonica array, which is suitable for whole-genome imputation of Japanese individuals. The array contains 659 253 SNPs, including tag SNPs for imputation, SNPs of Y chromosome and mitochondria, and SNPs related to previously reported genome-wide association studies and pharmacogenomics. The Japonica array provides better imputation performance for Japanese individuals than the existing commercially available SNP arrays with both the 1KJPN panel and the International 1000 genomes project panel. For common SNPs (minor allele frequency (MAF)>5%), the genomic coverage of the Japonica array (r2>0.8) was 96.9%, that is, almost all common SNPs were covered by this array. Nonetheless, the coverage of low-frequency SNPs (0.5%SNP arrays based on a population-specific reference panel is a practical way to facilitate further association studies through genome-wide genotype imputations. PMID:26108142

  2. A novel TCF7L2 type 2 diabetes SNP identified from fine mapping in African American women

    PubMed Central

    Haddad, Stephen A.; Palmer, Julie R.; Lunetta, Kathryn L.; Ng, Maggie C. Y.; Ruiz-Narváez, Edward A.

    2017-01-01

    SNP rs7903146 in the Wnt pathway’s TCF7L2 gene is the variant most significantly associated with type 2 diabetes to date, with associations observed across diverse populations. We sought to determine whether variants in other Wnt pathway genes are also associated with this disease. We evaluated 69 genes involved in the Wnt pathway, including TCF7L2, for associations with type 2 diabetes in 2632 African American cases and 2596 controls from the Black Women’s Health Study. Tag SNPs for each gene region were genotyped on a custom Affymetrix Axiom Array, and imputation was performed to 1000 Genomes Phase 3 data. Gene-based analyses were conducted using the adaptive rank truncated product (ARTP) statistic. The PSMD2 gene was significantly associated with type 2 diabetes after correction for multiple testing (corrected p = 0.016), based on the nine most significant single variants in the +/- 20 kb region surrounding the gene, which includes nearby genes EIF4G1, ECE2, and EIF2B5. Association data on four of the nine variants were available from an independent sample of 8284 African American cases and 15,543 controls; associations were in the same direction, but weak and not statistically significant. TCF7L2 was the only other gene associated with type 2 diabetes at nominal p <0.01 in our data. One of the three variants in the best gene-based model for TCF7L2, rs114770437, was not correlated with the GWAS index SNP rs7903146 and may represent an independent association signal seen only in African ancestry populations. Data on this SNP were not available in the replication sample. PMID:28253288

  3. SNP-Discovery by RAD-Sequencing in a Germplasm Collection of Wild and Cultivated Grapevines (V. vinifera L.)

    PubMed Central

    Birolo, Giovanni; Prazzoli, Maria Lucia; Lorenzi, Silvia; Valle, Giorgio; Grando, Maria Stella

    2017-01-01

    Whole-genome comparisons of Vitis vinifera subsp. sativa and V. vinifera subsp. sylvestris are expected to provide a better estimate of the valuable genetic diversity still present in grapevine, and help to reconstruct the evolutionary history of a major crop worldwide. To this end, the increase of molecular marker density across the grapevine genome is fundamental. Here we describe the SNP discovery in a grapevine germplasm collection of 51 cultivars and 44 wild accessions through a novel protocol of restriction-site associated DNA (RAD) sequencing. By resequencing 1.1% of the grapevine genome at a high coverage, we recovered 34K BamHI unique restriction sites, of which 6.8% were absent in the ‘PN40024’ reference genome. Moreover, we identified 37,748 single nucleotide polymorphisms (SNPs), 93% of which belonged to the 19 assembled chromosomes with an average of 1.8K SNPs per chromosome. Nearly half of the SNPs fell in genic regions mostly assigned to the functional categories of metabolism and regulation, whereas some nonsynonymous variants were identified in genes related to the detection and response to environmental stimuli. SNP validation was carried out, showing the ability of RAD-seq to accurately determine genotypes in a highly heterozygous species. To test the usefulness of our SNP panel, the main diversity statistics were evaluated, highlighting how the wild grapevine retained less genetic variability than the cultivated form. Furthermore, the analysis of Linkage Disequilibrium (LD) in the two subspecies separately revealed how the LD decays faster within the domesticated grapevine compared to its wild relative. Being the first application of RAD-seq in a diverse grapevine germplasm collection, our approach holds great promise for exploiting the genetic resources available in one of the most economically important fruit crops. PMID:28125640
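
    LD decay is usually summarised by computing a pairwise LD statistic for SNP pairs and tracking how it falls with physical distance. The abstract does not say which statistic was used; the sketch below assumes the common r2 measure and phased haplotypes coded 0/1, and the function name is hypothetical.

```python
def ld_r2(haplotypes):
    """Squared correlation (r2) between two biallelic SNPs.

    `haplotypes` is a list of (allele_at_snp_a, allele_at_snp_b) pairs,
    each allele coded 0 or 1 (assumed phased input).
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n          # frequency of allele 1 at SNP A
    p_b = sum(b for _, b in haplotypes) / n          # frequency of allele 1 at SNP B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                             # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    return d * d / denom


# Made-up haplotypes: the first pair of SNPs is perfectly linked, the second is independent.
linked = ld_r2([(0, 0), (1, 1), (0, 0), (1, 1)])
unlinked = ld_r2([(0, 0), (0, 1), (1, 0), (1, 1)])
```

    Averaging such r2 values within bins of inter-SNP distance, separately for each subspecies, gives the kind of decay curves the comparison refers to.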

  4. A Systematic Evaluation of Short Tandem Repeats in Lipid Candidate Genes: Riding on the SNP-Wave

    PubMed Central

    Lamina, Claudia; Haun, Margot; Coassin, Stefan; Kloss-Brandstätter, Anita; Gieger, Christian; Peters, Annette; Grallert, Harald; Strauch, Konstantin; Meitinger, Thomas; Kedenko, Lyudmyla; Paulweber, Bernhard; Kronenberg, Florian

    2014-01-01

    Structural genetic variants such as short tandem repeats (STRs) are not targeted in SNP-based association studies and thus their possible association signals are missed. We systematically searched for STRs in gene regions known to contribute to total cholesterol, HDL cholesterol, LDL cholesterol and triglyceride levels in two independent studies (KORA F4, n = 2553 and SAPHIR, n = 1648), resulting in 16 STRs that were finally evaluated. In a combined dataset of both studies, the sum of STR alleles was regressed on each phenotype, adjusted for age and sex. The association analyses were repeated for SNPs in a 200 kb region surrounding the respective STRs in the KORA F4 Study. Three STRs were significantly associated with total cholesterol (within LDLR, the APOA1/C3/A4/A5/BUD13 gene region and ABCG5/8), five with HDL cholesterol (3 within CETP, one in LPL and one in APOA1/C3/A4/A5/BUD13), three with LDL cholesterol (LDLR, ABCG5/8 and CETP) and two with triglycerides (APOA1/C3/A4/A5/BUD13 and LPL). None of the investigated STRs, however, showed a significant association after adjusting for the lead or adjacent SNPs within that gene region. The evaluated STRs were found to be well tagged by the lead SNP within the respective gene regions. Therefore, the STRs reflect the association signals based on surrounding SNPs. In conclusion, none of the STRs contributed additionally to the SNP-based association signals identified in GWAS on lipid traits. PMID:25050552
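
    The age- and sex-adjusted analysis can be pictured as a linear model with the lipid phenotype as outcome and the STR allele sum, age, and sex as predictors. That reading, ordinary least squares as the fitting method, and all names and values below are assumptions for illustration; the abstract does not specify the software or exact model.

```python
def ols(rows, y):
    """Least-squares coefficients for y ~ rows (each row includes a leading 1.0)."""
    k = len(rows[0])
    # Normal equations (X^T X) beta = X^T y, stored as an augmented matrix.
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(k):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [a - factor * b for a, b in zip(aug[row], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def str_effect(allele_sums, ages, sexes, phenotype):
    """Coefficient of the STR allele sum, adjusted for age and sex (coded 0/1)."""
    rows = [[1.0, s, a, x] for s, a, x in zip(allele_sums, ages, sexes)]
    return ols(rows, phenotype)[1]


# Made-up data generated from phenotype = 1 + 2*sum + 0.5*age + 3*sex, not study data.
allele_sums = [20, 22, 24, 21, 23, 25]
ages = [40, 55, 35, 60, 45, 50]
sexes = [0, 1, 0, 1, 1, 0]
phenotype = [1 + 2 * s + 0.5 * a + 3 * x for s, a, x in zip(allele_sums, ages, sexes)]
effect = str_effect(allele_sums, ages, sexes, phenotype)
```

    Adding the lead SNP genotype as a further column of the design matrix would correspond to the conditional analysis described in the abstract, in which the STR signals no longer remained significant.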

  5. A Nonsynonymous SNP Catalog of Mycobacterium tuberculosis Virulence Genes and Its Use for Detecting New Potentially Virulent Sublineages

    PubMed Central

    Mikheecheva, Natalya E.; Zaychikova, Marina V.; Melerzanov, Alexander V.

    2017-01-01

    Mycobacterium tuberculosis is divided into several distinct lineages, and various genetic markers such as IS-elements, VNTR, and SNPs are used for lineage identification. We propose an M. tuberculosis classification approach based on functional polymorphisms in virulence genes. An M. tuberculosis virulence genes catalog has been established, including 319 genes from various protein groups, such as proteases, cell wall proteins, fatty acid and lipid metabolism proteins, sigma factors, and toxin–antitoxin systems. Another catalog of 1,573 M. tuberculosis isolates of different lineages has been developed. The developed SNP-calling program has identified 3,563 nonsynonymous SNPs. The constructed SNP-based phylogeny reflected the evolutionary relationship between lineages and detected new sublineages. SNP analysis of sublineage F15/LAM4/KZN revealed four lineage-specific mutations in cyp125, mce3B, vapC25, and vapB34. The Ural lineage has been divided into two geographical clusters based on different SNPs in virulence genes. A new sublineage, B0/N-90, was detected inside the Beijing-B0/W-148 by SNPs in irtB, mce3F and vapC46. We have found 27 members of B0/N-90 among the 227 available genomes of the Beijing-B0/W-148 sublineage. Strain B9741, isolated from an HIV-positive patient, was shown by whole-genome sequencing to belong to the new B0/N-90 group. A primer set for PCR detection of B0/N-90 lineage-specific mutations has been developed. The prospective use of mce3 mutant genes as a genetically engineered vaccine is discussed. PMID:28338924

  6. EST-SNP discovery and dense genetic mapping in lentil (Lens culinaris Medik.) enable candidate gene selection for boron tolerance.

    PubMed

    Kaur, Sukhjiwan; Cogan, Noel O I; Stephens, Amber; Noy, Dianne; Butsch, Mirella; Forster, John W; Materne, Michael

    2014-03-01

    Large-scale SNP discovery and dense genetic mapping in a lentil intraspecific cross permitted identification of a single chromosomal region controlling tolerance to boron toxicity, an important breeding objective. Lentil (Lens culinaris Medik.) is a highly nutritious food legume crop that is cultivated world-wide. Until recently, lentil has been considered a genomic 'orphan' crop, limiting the feasibility of marker-assisted selection strategies in breeding programs. The present study reports on the identification of single-nucleotide polymorphisms (SNPs) from transcriptome sequencing data, utilisation of expressed sequence tag (EST)-derived simple sequence repeat (SSR) and SNP marke