coding sequence variants: Topics by Science.gov

Sample records for coding sequence variants

Sequence data and association statistics from 12,940 type 2 diabetes cases and controls.

PubMed

Flannick, Jason; Fuchsberger, Christian; Mahajan, Anubha; Teslovich, Tanya M; Agarwala, Vineeta; Gaulton, Kyle J; Caulkins, Lizz; Koesterer, Ryan; Ma, Clement; Moutsianas, Loukas; McCarthy, Davis J; Rivas, Manuel A; Perry, John R B; Sim, Xueling; Blackwell, Thomas W; Robertson, Neil R; Rayner, N William; Cingolani, Pablo; Locke, Adam E; Tajes, Juan Fernandez; Highland, Heather M; Dupuis, Josee; Chines, Peter S; Lindgren, Cecilia M; Hartl, Christopher; Jackson, Anne U; Chen, Han; Huyghe, Jeroen R; van de Bunt, Martijn; Pearson, Richard D; Kumar, Ashish; Müller-Nurasyid, Martina; Grarup, Niels; Stringham, Heather M; Gamazon, Eric R; Lee, Jaehoon; Chen, Yuhui; Scott, Robert A; Below, Jennifer E; Chen, Peng; Huang, Jinyan; Go, Min Jin; Stitzel, Michael L; Pasko, Dorota; Parker, Stephen C J; Varga, Tibor V; Green, Todd; Beer, Nicola L; Day-Williams, Aaron G; Ferreira, Teresa; Fingerlin, Tasha; Horikoshi, Momoko; Hu, Cheng; Huh, Iksoo; Ikram, Mohammad Kamran; Kim, Bong-Jo; Kim, Yongkang; Kim, Young Jin; Kwon, Min-Seok; Lee, Juyoung; Lee, Selyeong; Lin, Keng-Han; Maxwell, Taylor J; Nagai, Yoshihiko; Wang, Xu; Welch, Ryan P; Yoon, Joon; Zhang, Weihua; Barzilai, Nir; Voight, Benjamin F; Han, Bok-Ghee; Jenkinson, Christopher P; Kuulasmaa, Teemu; Kuusisto, Johanna; Manning, Alisa; Ng, Maggie C Y; Palmer, Nicholette D; Balkau, Beverley; Stančáková, Alena; Abboud, Hanna E; Boeing, Heiner; Giedraitis, Vilmantas; Prabhakaran, Dorairaj; Gottesman, Omri; Scott, James; Carey, Jason; Kwan, Phoenix; Grant, George; Smith, Joshua D; Neale, Benjamin M; Purcell, Shaun; Butterworth, Adam S; Howson, Joanna M M; Lee, Heung Man; Lu, Yingchang; Kwak, Soo-Heon; Zhao, Wei; Danesh, John; Lam, Vincent K L; Park, Kyong Soo; Saleheen, Danish; So, Wing Yee; Tam, Claudia H T; Afzal, Uzma; Aguilar, David; Arya, Rector; Aung, Tin; Chan, Edmund; Navarro, Carmen; Cheng, Ching-Yu; Palli, Domenico; Correa, Adolfo; Curran, Joanne E; Rybin, Dennis; Farook, Vidya S; Fowler, Sharon P; Freedman, Barry I; 
Griswold, Michael; Hale, Daniel Esten; Hicks, Pamela J; Khor, Chiea-Chuen; Kumar, Satish; Lehne, Benjamin; Thuillier, Dorothée; Lim, Wei Yen; Liu, Jianjun; Loh, Marie; Musani, Solomon K; Puppala, Sobha; Scott, William R; Yengo, Loïc; Tan, Sian-Tsung; Taylor, Herman A; Thameem, Farook; Wilson, Gregory; Wong, Tien Yin; Njølstad, Pål Rasmus; Levy, Jonathan C; Mangino, Massimo; Bonnycastle, Lori L; Schwarzmayr, Thomas; Fadista, João; Surdulescu, Gabriela L; Herder, Christian; Groves, Christopher J; Wieland, Thomas; Bork-Jensen, Jette; Brandslund, Ivan; Christensen, Cramer; Koistinen, Heikki A; Doney, Alex S F; Kinnunen, Leena; Esko, Tõnu; Farmer, Andrew J; Hakaste, Liisa; Hodgkiss, Dylan; Kravic, Jasmina; Lyssenko, Valeri; Hollensted, Mette; Jørgensen, Marit E; Jørgensen, Torben; Ladenvall, Claes; Justesen, Johanne Marie; Käräjämäki, Annemari; Kriebel, Jennifer; Rathmann, Wolfgang; Lannfelt, Lars; Lauritzen, Torsten; Narisu, Narisu; Linneberg, Allan; Melander, Olle; Milani, Lili; Neville, Matt; Orho-Melander, Marju; Qi, Lu; Qi, Qibin; Roden, Michael; Rolandsson, Olov; Swift, Amy; Rosengren, Anders H; Stirrups, Kathleen; Wood, Andrew R; Mihailov, Evelin; Blancher, Christine; Carneiro, Mauricio O; Maguire, Jared; Poplin, Ryan; Shakir, Khalid; Fennell, Timothy; DePristo, Mark; de Angelis, Martin Hrabé; Deloukas, Panos; Gjesing, Anette P; Jun, Goo; Nilsson, Peter; Murphy, Jacquelyn; Onofrio, Robert; Thorand, Barbara; Hansen, Torben; Meisinger, Christa; Hu, Frank B; Isomaa, Bo; Karpe, Fredrik; Liang, Liming; Peters, Annette; Huth, Cornelia; O'Rahilly, Stephen P; Palmer, Colin N A; Pedersen, Oluf; Rauramaa, Rainer; Tuomilehto, Jaakko; Salomaa, Veikko; Watanabe, Richard M; Syvänen, Ann-Christine; Bergman, Richard N; Bharadwaj, Dwaipayan; Bottinger, Erwin P; Cho, Yoon Shin; Chandak, Giriraj R; Chan, Juliana Cn; Chia, Kee Seng; Daly, Mark J; Ebrahim, Shah B; Langenberg, Claudia; Elliott, Paul; Jablonski, Kathleen A; Lehman, Donna M; Jia, Weiping; Ma, Ronald C W; Pollin, Toni I; 
Sandhu, Manjinder; Tandon, Nikhil; Froguel, Philippe; Barroso, Inês; Teo, Yik Ying; Zeggini, Eleftheria; Loos, Ruth J F; Small, Kerrin S; Ried, Janina S; DeFronzo, Ralph A; Grallert, Harald; Glaser, Benjamin; Metspalu, Andres; Wareham, Nicholas J; Walker, Mark; Banks, Eric; Gieger, Christian; Ingelsson, Erik; Im, Hae Kyung; Illig, Thomas; Franks, Paul W; Buck, Gemma; Trakalo, Joseph; Buck, David; Prokopenko, Inga; Mägi, Reedik; Lind, Lars; Farjoun, Yossi; Owen, Katharine R; Gloyn, Anna L; Strauch, Konstantin; Tuomi, Tiinamaija; Kooner, Jaspal Singh; Lee, Jong-Young; Park, Taesung; Donnelly, Peter; Morris, Andrew D; Hattersley, Andrew T; Bowden, Donald W; Collins, Francis S; Atzmon, Gil; Chambers, John C; Spector, Timothy D; Laakso, Markku; Strom, Tim M; Bell, Graeme I; Blangero, John; Duggirala, Ravindranath; Tai, E Shyong; McVean, Gilean; Hanis, Craig L; Wilson, James G; Seielstad, Mark; Frayling, Timothy M; Meigs, James B; Cox, Nancy J; Sladek, Rob; Lander, Eric S; Gabriel, Stacey; Mohlke, Karen L; Meitinger, Thomas; Groop, Leif; Abecasis, Goncalo; Scott, Laura J; Morris, Andrew P; Kang, Hyun Min; Altshuler, David; Burtt, Noël P; Florez, Jose C; Boehnke, Michael; McCarthy, Mark I

2017-12-19

To investigate the genetic basis of type 2 diabetes (T2D) to high resolution, the GoT2D and T2D-GENES consortia catalogued variation from whole-genome sequencing of 2,657 European individuals and exome sequencing of 12,940 individuals of multiple ancestries. Over 27M SNPs, indels, and structural variants were identified, including 99% of low-frequency (minor allele frequency [MAF] 0.1–5%) non-coding variants in the whole-genome sequenced individuals and 99.7% of low-frequency coding variants in the whole-exome sequenced individuals. Each variant was tested for association with T2D in the sequenced individuals, and, to increase power, most were tested in larger numbers of individuals (>80% of low-frequency coding variants in ~82 K Europeans via the exome chip, and ~90% of low-frequency non-coding variants in ~44 K Europeans via genotype imputation). The variants, genotypes, and association statistics from these analyses provide the largest reference to date of human genetic information relevant to T2D, for use in activities such as T2D-focused genotype imputation, functional characterization of variants or genes, and other novel analyses to detect associations between sequence variation and T2D. PMID:29257133
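The frequency bins used in the abstract above can be illustrated with a minimal sketch. It assumes diploid genotype counts as input; the function names and the labels "rare" (below 0.1%) and "common" (above 5%) are illustrative assumptions, since the abstract itself only defines the low-frequency bin (MAF 0.1–5%).

```python
def minor_allele_frequency(n_ref_hom, n_het, n_alt_hom):
    """Return the minor allele frequency from diploid genotype counts."""
    total_alleles = 2 * (n_ref_hom + n_het + n_alt_hom)
    if total_alleles == 0:
        raise ValueError("no genotypes supplied")
    alt_freq = (n_het + 2 * n_alt_hom) / total_alleles
    # The minor allele is whichever of the two alleles is less frequent.
    return min(alt_freq, 1.0 - alt_freq)


def frequency_class(maf, low=0.001, high=0.05):
    """Bin a variant by MAF; only the 0.1-5% bin comes from the abstract."""
    if maf < low:
        return "rare"            # assumed label for MAF < 0.1%
    if maf <= high:
        return "low-frequency"   # MAF 0.1-5%, as defined in the abstract
    return "common"              # assumed label for MAF > 5%


# Hypothetical counts: 950 reference homozygotes, 48 heterozygotes,
# 2 alternate homozygotes.
maf = minor_allele_frequency(950, 48, 2)
label = frequency_class(maf)
```

Other studies draw the boundaries differently, so the thresholds are left as parameters rather than fixed constants.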
Efficient analysis of mouse genome sequences reveal many nonsense variants

PubMed Central

Steeland, Sophie; Timmermans, Steven; Van Ryckeghem, Sara; Hulpiau, Paco; Saeys, Yvan; Van Montagu, Marc; Vandenbroucke, Roosmarijn E.; Libert, Claude

2016-01-01

Genetic polymorphisms in coding genes play an important role when using mouse inbred strains as research models. They have been shown to influence research results, explain phenotypical differences between inbred strains, and increase the amount of interesting gene variants present in the many available inbred lines. SPRET/Ei is an inbred strain derived from Mus spretus that has ∼1% sequence difference with the C57BL/6J reference genome. We obtained a listing of all SNPs and insertions/deletions (indels) present in SPRET/Ei from the Mouse Genomes Project (Wellcome Trust Sanger Institute) and processed these data to obtain an overview of all transcripts having nonsynonymous coding sequence variants. We identified 8,883 unique variants affecting 10,096 different transcripts from 6,328 protein-coding genes, which is about 28% of all coding genes. Because only a subset of these variants results in drastic changes in proteins, we focused on variations that are nonsense mutations that ultimately resulted in a gain of a stop codon. These genes were identified by in silico changing the C57BL/6J coding sequences to the SPRET/Ei sequences, converting them to amino acid (AA) sequences, and comparing the AA sequences. All variants and transcripts affected were also stored in a database, which can be browsed using a SPRET/Ei M. spretus variants web tool (www.spretus.org), including a manual. We validated the tool by demonstrating the loss of function of three proteins predicted to be severely truncated, namely Fas, IRAK2, and IFNγR1. PMID:27147605
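The in silico procedure described above (substitute the variant into the reference coding sequence, translate both, compare the amino acid sequences) can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' pipeline: the function names, the 0-based coordinate convention, and the example sequence are all hypothetical, and only single-nucleotide substitutions are handled.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate a coding sequence up to (not including) the first stop."""
    protein = []
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    for i in range(0, usable, 3):
        aa = CODON_TABLE[cds[i:i + 3].upper()]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def apply_snp(cds, pos, alt):
    """Return the sequence with the base at 0-based position pos replaced."""
    return cds[:pos] + alt + cds[pos + 1:]


def gains_stop(ref_cds, variant_cds):
    """True if the variant protein terminates earlier than the reference."""
    return len(translate(variant_cds)) < len(translate(ref_cds))


# Hypothetical reference: ATG CAA GGT TAA -> M Q G (stop)
reference = "ATGCAAGGTTAA"
nonsense = apply_snp(reference, 3, "T")    # CAA -> TAA, premature stop
synonymous = apply_snp(reference, 5, "G")  # CAA -> CAG, still glutamine
```

Comparing translated lengths rather than codons keeps the check independent of where in the transcript the substitution falls.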
Analysis of CHRNA7 rare variants in autism spectrum disorder susceptibility.

PubMed

Bacchelli, Elena; Battaglia, Agatino; Cameli, Cinzia; Lomartire, Silvia; Tancredi, Raffaella; Thomson, Susanne; Sutcliffe, James S; Maestrini, Elena

2015-04-01

Chromosome 15q13.3 recurrent microdeletions are causally associated with a wide range of phenotypes, including autism spectrum disorder (ASD), seizures, intellectual disability, and other psychiatric conditions. Whether the reciprocal microduplication is pathogenic is less certain. CHRNA7, encoding the alpha7 subunit of the neuronal nicotinic acetylcholine receptor, is considered the likely culprit gene in mediating neurological phenotypes in 15q13.3 deletion cases. To assess whether CHRNA7 rare variants confer risk of ASD, we performed copy number variant analysis and Sanger sequencing of the CHRNA7 coding sequence in a sample of 135 ASD cases. Sequence variation in this gene remains largely unexplored, given the existence of a fusion gene, CHRFAM7A, which includes a nearly identical partial duplication of CHRNA7. Hence, attempts to sequence coding exons must distinguish between CHRNA7 and CHRFAM7A, making next-generation sequencing approaches unreliable for this purpose. A CHRNA7 microduplication was detected in a patient with autism and moderate cognitive impairment; while no rare damaging variants were identified in the coding region, we detected rare variants in the promoter region that were previously described as functionally reducing transcription. This study represents the first sequence variant analysis of CHRNA7 in a sample of idiopathic autism. © 2015 Wiley Periodicals, Inc.
VaDiR: an integrated approach to Variant Detection in RNA.

PubMed

Neums, Lisa; Suenaga, Seiji; Beyerlein, Peter; Anders, Sara; Koestler, Devin; Mariani, Andrea; Chien, Jeremy

2018-02-01

Advances in next-generation DNA sequencing technologies are now enabling detailed characterization of sequence variations in cancer genomes. With whole-genome sequencing, variations in coding and non-coding sequences can be discovered, but its cost currently limits its general use in research. Whole-exome sequencing is used to characterize sequence variations in coding regions, but the cost of capture reagents and biases in capture rate limit its full use in research. Additional limitations include uncertainty in assigning functional significance to mutations observed in non-coding regions or in genes that are not expressed in cancer tissue. We investigated the feasibility of uncovering mutations in expressed genes from RNA sequencing datasets with a method called Variant Detection in RNA (VaDiR), which integrates three variant callers: SNPiR, RVBoost, and MuTect2. Variants called by all three methods, which we termed Tier 1 variants, produced the highest precision, with true positive mutations from RNA-seq that could be validated at the DNA level. We also found that the integration of Tier 1 variants with those called by MuTect2 and SNPiR produced the highest recall with acceptable precision. Finally, we observed a higher rate of mutation discovery in genes that are expressed at higher levels. Our method, VaDiR, makes it possible to uncover mutations from RNA sequencing datasets that could be useful in further functional analysis. In addition, our approach allows orthogonal validation of DNA-based mutation discovery by providing complementary sequence variation analysis from paired RNA/DNA sequencing datasets.
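The consensus idea behind the Tier 1 set can be sketched with plain set operations. This is a minimal illustration, not the VaDiR implementation: variants are represented as hypothetical (chromosome, position, ref, alt) tuples, the call sets are invented toy data, and the relaxed "at least two callers" rule is an assumption shown only to illustrate the precision/recall trade-off, not the paper's second tier definition.

```python
from collections import Counter


def called_by_at_least(k, *call_sets):
    """Return variants reported by at least k of the supplied callers."""
    counts = Counter(variant for calls in call_sets for variant in calls)
    return {variant for variant, n in counts.items() if n >= k}


def precision_recall(called, truth):
    """Precision and recall of a call set against a validated truth set."""
    true_positives = len(called & truth)
    precision = true_positives / len(called) if called else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical toy variants.
A = ("chr1", 100, "C", "T")
B = ("chr2", 200, "G", "A")
C = ("chr3", 300, "A", "G")
D = ("chr4", 400, "T", "C")

snpir = {A, B, C}
rvboost = {A, B, D}
mutect2 = {A, C, D}
dna_validated = {A, B, C}  # stand-in for DNA-level validation

tier1 = called_by_at_least(3, snpir, rvboost, mutect2)    # all three agree
relaxed = called_by_at_least(2, snpir, rvboost, mutect2)  # assumed 2-of-3 rule

tier1_pr = precision_recall(tier1, dna_validated)
relaxed_pr = precision_recall(relaxed, dna_validated)
```

In this toy data the strict intersection keeps precision high but misses validated variants, while the relaxed rule recovers them at the price of one false positive.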
Rare and Coding Region Genetic Variants Associated With Risk of Ischemic Stroke: The NHLBI Exome Sequence Project.

PubMed

Auer, Paul L; Nalls, Mike; Meschia, James F; Worrall, Bradford B; Longstreth, W T; Seshadri, Sudha; Kooperberg, Charles; Burger, Kathleen M; Carlson, Christopher S; Carty, Cara L; Chen, Wei-Min; Cupples, L Adrienne; DeStefano, Anita L; Fornage, Myriam; Hardy, John; Hsu, Li; Jackson, Rebecca D; Jarvik, Gail P; Kim, Daniel S; Lakshminarayan, Kamakshi; Lange, Leslie A; Manichaikul, Ani; Quinlan, Aaron R; Singleton, Andrew B; Thornton, Timothy A; Nickerson, Deborah A; Peters, Ulrike; Rich, Stephen S

2015-07-01

Stroke is the second leading cause of death and the third leading cause of years of life lost. Genetic factors contribute to stroke prevalence, and candidate gene and genome-wide association studies (GWAS) have identified variants associated with ischemic stroke risk. These variants often have small effects without obvious biological significance. Exome sequencing may discover predicted protein-altering variants with a potentially large effect on ischemic stroke risk. The objective was to investigate the contribution of rare and common genetic variants to ischemic stroke risk by targeting the protein-coding regions of the human genome. The National Heart, Lung, and Blood Institute (NHLBI) Exome Sequencing Project (ESP) analyzed approximately 6000 participants from numerous cohorts of European and African ancestry. For discovery, 365 cases of ischemic stroke (small-vessel and large-vessel subtypes) and 809 European ancestry controls were sequenced; for replication, 47 affected sibpairs concordant for stroke subtype and an African American case-control series were sequenced, with 1672 cases and 4509 European ancestry controls genotyped. The ESP's exome sequencing and genotyping started on January 1, 2010, and continued through June 30, 2012. Analyses were conducted on the full data set between July 12, 2012, and July 13, 2013. The main outcomes were the discovery of new variants or genes contributing to ischemic stroke risk and subtype (primary analysis) and the determination of support for protein-coding variants contributing to risk in previously published candidate genes (secondary analysis). We identified 2 novel genes associated with an increased risk of ischemic stroke: a protein-coding variant in PDE4DIP (rs1778155; odds ratio, 2.15; P = 2.63 × 10^-8) with an intracellular signal transduction mechanism and one in ACOT4 (rs35724886; odds ratio, 2.04; P = 1.24 × 10^-7) with a fatty acid metabolism mechanism; confirmation of PDE4DIP was observed in affected sibpair families with large-vessel stroke subtype and in African Americans. Replication of protein-coding variants in candidate genes was observed for 2 previously reported GWAS associations: ZFHX3 (cardioembolic stroke) and ABCA1 (large-vessel stroke). Exome sequencing discovered 2 novel genes and mechanisms, PDE4DIP and ACOT4, associated with increased risk for ischemic stroke. In addition, ZFHX3 and ABCA1 were discovered to have protein-coding variants associated with ischemic stroke. These results suggest that genetic variation in novel pathways contributes to ischemic stroke risk and serves as a target for prediction, prevention, and therapy.
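For readers unfamiliar with the odds ratios quoted above, the following sketch shows how an unadjusted odds ratio and an approximate 95% confidence interval (Woolf's log method) are computed from a 2×2 table of carriers and non-carriers. The counts are hypothetical and chosen only for illustration; the abstract does not report carrier counts, and the published estimates may well come from adjusted models rather than this simple calculation.

```python
import math


def odds_ratio(case_carriers, case_noncarriers,
               control_carriers, control_noncarriers, z=1.96):
    """Unadjusted odds ratio with a Woolf (log-scale) confidence interval."""
    a, b = case_carriers, case_noncarriers
    c, d = control_carriers, control_noncarriers
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    estimate = (a * d) / (b * c)
    # Standard error of log(OR) under Woolf's approximation.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper


# Hypothetical table: 20 of 100 cases and 10 of 100 controls carry the allele.
estimate, lower, upper = odds_ratio(20, 80, 10, 90)
```

An odds ratio above 1 with an interval that excludes 1 indicates that carriers are over-represented among cases in the sample.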
Negligible impact of rare autoimmune-locus coding-region variants on missing heritability.

PubMed

Hunt, Karen A; Mistry, Vanisha; Bockett, Nicholas A; Ahmad, Tariq; Ban, Maria; Barker, Jonathan N; Barrett, Jeffrey C; Blackburn, Hannah; Brand, Oliver; Burren, Oliver; Capon, Francesca; Compston, Alastair; Gough, Stephen C L; Jostins, Luke; Kong, Yong; Lee, James C; Lek, Monkol; MacArthur, Daniel G; Mansfield, John C; Mathew, Christopher G; Mein, Charles A; Mirza, Muddassar; Nutland, Sarah; Onengut-Gumuscu, Suna; Papouli, Efterpi; Parkes, Miles; Rich, Stephen S; Sawcer, Steven; Satsangi, Jack; Simmonds, Matthew J; Trembath, Richard C; Walker, Neil M; Wozniak, Eva; Todd, John A; Simpson, Michael A; Plagnol, Vincent; van Heel, David A

2013-06-13

Genome-wide association studies (GWAS) have identified common variants of modest-effect size at hundreds of loci for common autoimmune diseases; however, a substantial fraction of heritability remains unexplained, to which rare variants may contribute. To discover rare variants and test them for association with a phenotype, most studies re-sequence a small initial sample size and then genotype the discovered variants in a larger sample set. This approach fails to analyse a large fraction of the rare variants present in the entire sample set. Here we perform simultaneous amplicon-sequencing-based variant discovery and genotyping for coding exons of 25 GWAS risk genes in 41,911 UK residents of white European origin, comprising 24,892 subjects with six autoimmune disease phenotypes and 17,019 controls, and show that rare coding-region variants at known loci have a negligible role in common autoimmune disease susceptibility. These results do not support the rare-variant synthetic genome-wide-association hypothesis (in which unobserved rare causal variants lead to association detected at common tag variants). Many known autoimmune disease risk loci contain multiple, independently associated, common and low-frequency variants, and so genes at these loci are a priori stronger candidates for harbouring rare coding-region variants than other genes. Our data indicate that the missing heritability for common autoimmune diseases may not be attributable to the rare coding-region variant portion of the allelic spectrum, but perhaps, as others have proposed, may be a result of many common-variant loci of weak effect.
Whole-genome sequencing identifies EN1 as a determinant of bone density and fracture

PubMed Central

Zheng, Hou-Feng; Forgetta, Vincenzo; Hsu, Yi-Hsiang; Estrada, Karol; Rosello-Diez, Alberto; Leo, Paul J; Dahia, Chitra L; Park-Min, Kyung Hyun; Tobias, Jonathan H; Kooperberg, Charles; Kleinman, Aaron; Styrkarsdottir, Unnur; Liu, Ching-Ti; Uggla, Charlotta; Evans, Daniel S; Nielson, Carrie M; Walter, Klaudia; Pettersson-Kymmer, Ulrika; McCarthy, Shane; Eriksson, Joel; Kwan, Tony; Jhamai, Mila; Trajanoska, Katerina; Memari, Yasin; Min, Josine; Huang, Jie; Danecek, Petr; Wilmot, Beth; Li, Rui; Chou, Wen-Chi; Mokry, Lauren E; Moayyeri, Alireza; Claussnitzer, Melina; Cheng, Chia-Ho; Cheung, Warren; Medina-Gómez, Carolina; Ge, Bing; Chen, Shu-Huang; Choi, Kwangbom; Oei, Ling; Fraser, James; Kraaij, Robert; Hibbs, Matthew A; Gregson, Celia L; Paquette, Denis; Hofman, Albert; Wibom, Carl; Tranah, Gregory J; Marshall, Mhairi; Gardiner, Brooke B; Cremin, Katie; Auer, Paul; Hsu, Li; Ring, Sue; Tung, Joyce Y; Thorleifsson, Gudmar; Enneman, Anke W; van Schoor, Natasja M; de Groot, Lisette C.P.G.M.; van der Velde, Nathalie; Melin, Beatrice; Kemp, John P; Christiansen, Claus; Sayers, Adrian; Zhou, Yanhua; Calderari, Sophie; van Rooij, Jeroen; Carlson, Chris; Peters, Ulrike; Berlivet, Soizik; Dostie, Josée; Uitterlinden, Andre G; Williams, Stephen R.; Farber, Charles; Grinberg, Daniel; LaCroix, Andrea Z; Haessler, Jeff; Chasman, Daniel I; Giulianini, Franco; Rose, Lynda M; Ridker, Paul M; Eisman, John A; Nguyen, Tuan V; Center, Jacqueline R; Nogues, Xavier; Garcia-Giralt, Natalia; Launer, Lenore L; Gudnason, Vilmunder; Mellström, Dan; Vandenput, Liesbeth; Karlsson, Magnus K; Ljunggren, Östen; Svensson, Olle; Hallmans, Göran; Rousseau, François; Giroux, Sylvie; Bussière, Johanne; Arp, Pascal P; Koromani, Fjorda; Prince, Richard L; Lewis, Joshua R; Langdahl, Bente L; Hermann, A Pernille; Jensen, Jens-Erik B; Kaptoge, Stephen; Khaw, Kay-Tee; Reeve, Jonathan; Formosa, Melissa M; Xuereb-Anastasi, Angela; Åkesson, Kristina; McGuigan, Fiona E; Garg, Gaurav; Olmos, Jose M; Zarrabeitia, 
Maria T; Riancho, Jose A; Ralston, Stuart H; Alonso, Nerea; Jiang, Xi; Goltzman, David; Pastinen, Tomi; Grundberg, Elin; Gauguier, Dominique; Orwoll, Eric S; Karasik, David; Davey-Smith, George; Smith, Albert V; Siggeirsdottir, Kristin; Harris, Tamara B; Zillikens, M Carola; van Meurs, Joyce BJ; Thorsteinsdottir, Unnur; Maurano, Matthew T; Timpson, Nicholas J; Soranzo, Nicole; Durbin, Richard; Wilson, Scott G; Ntzani, Evangelia E; Brown, Matthew A; Stefansson, Kari; Hinds, David A; Spector, Tim; Cupples, L Adrienne; Ohlsson, Claes; Greenwood, Celia MT; Jackson, Rebecca D; Rowe, David W; Loomis, Cynthia A; Evans, David M; Ackert-Bicknell, Cheryl L; Joyner, Alexandra L; Duncan, Emma L; Kiel, Douglas P; Rivadeneira, Fernando; Richards, J Brent

2016-01-01

SUMMARY The extent to which low-frequency (minor allele frequency [MAF] 1–5%) and rare (MAF ≤ 1%) variants contribute to complex traits and disease in the general population is largely unknown. Bone mineral density (BMD) is highly heritable, is a major predictor of osteoporotic fractures, and has been previously associated with common genetic variants [1–8] and rare, population-specific, coding variants [9]. Here we identify novel non-coding genetic variants with large effects on BMD (n_total = 53,236) and fracture (n_total = 508,253) in individuals of European ancestry from the general population. Associations for BMD were derived from whole-genome sequencing (n = 2,882 from UK10K), whole-exome sequencing (n = 3,549), deep imputation of genotyped samples using a combined UK10K/1000Genomes reference panel (n = 26,534), and de novo replication genotyping (n = 20,271). We identified a low-frequency non-coding variant near a novel locus, EN1, with an effect size 4-fold larger than the mean of previously reported common variants for lumbar spine BMD [8] (rs11692564[T], MAF = 1.7%, replication effect size = +0.20 standard deviations [SD], P_meta = 2×10^-14), which was also associated with a decreased risk of fracture (OR = 0.85; P = 2×10^-11; n_cases = 98,742 and n_controls = 409,511). Using an En1(Cre/flox) mouse model, we observed that conditional loss of En1 results in low bone mass, likely as a consequence of high bone turnover. We also identified a novel low-frequency non-coding variant with large effects on BMD near WNT16 (rs148771817[T], MAF = 1.1%, replication effect size = +0.39 SD, P_meta = 1×10^-11). In general, there was an excess of association signals arising from deleterious coding and conserved non-coding variants. These findings provide evidence that low-frequency non-coding variants have large effects on BMD and fracture, thereby providing rationale for whole-genome sequencing and improved imputation reference panels to study the genetic architecture of complex traits and disease in the general population. PMID:26367794
Analysis of protein-coding genetic variation in 60,706 humans.

PubMed

Lek, Monkol; Karczewski, Konrad J; Minikel, Eric V; Samocha, Kaitlin E; Banks, Eric; Fennell, Timothy; O'Donnell-Luria, Anne H; Ware, James S; Hill, Andrew J; Cummings, Beryl B; Tukiainen, Taru; Birnbaum, Daniel P; Kosmicki, Jack A; Duncan, Laramie E; Estrada, Karol; Zhao, Fengmei; Zou, James; Pierce-Hoffman, Emma; Berghout, Joanne; Cooper, David N; Deflaux, Nicole; DePristo, Mark; Do, Ron; Flannick, Jason; Fromer, Menachem; Gauthier, Laura; Goldstein, Jackie; Gupta, Namrata; Howrigan, Daniel; Kiezun, Adam; Kurki, Mitja I; Moonshine, Ami Levy; Natarajan, Pradeep; Orozco, Lorena; Peloso, Gina M; Poplin, Ryan; Rivas, Manuel A; Ruano-Rubio, Valentin; Rose, Samuel A; Ruderfer, Douglas M; Shakir, Khalid; Stenson, Peter D; Stevens, Christine; Thomas, Brett P; Tiao, Grace; Tusie-Luna, Maria T; Weisburd, Ben; Won, Hong-Hee; Yu, Dongmei; Altshuler, David M; Ardissino, Diego; Boehnke, Michael; Danesh, John; Donnelly, Stacey; Elosua, Roberto; Florez, Jose C; Gabriel, Stacey B; Getz, Gad; Glatt, Stephen J; Hultman, Christina M; Kathiresan, Sekar; Laakso, Markku; McCarroll, Steven; McCarthy, Mark I; McGovern, Dermot; McPherson, Ruth; Neale, Benjamin M; Palotie, Aarno; Purcell, Shaun M; Saleheen, Danish; Scharf, Jeremiah M; Sklar, Pamela; Sullivan, Patrick F; Tuomilehto, Jaakko; Tsuang, Ming T; Watkins, Hugh C; Wilson, James G; Daly, Mark J; MacArthur, Daniel G

2016-08-18

Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. Here we describe the aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC). This catalogue of human genetic diversity contains an average of one variant every eight bases of the exome, and provides direct evidence for the presence of widespread mutational recurrence. We have used this catalogue to calculate objective metrics of pathogenicity for sequence variants, and to identify genes subject to strong selection against various classes of mutation; identifying 3,230 genes with near-complete depletion of predicted protein-truncating variants, with 72% of these genes having no currently established human disease phenotype. Finally, we demonstrate that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human 'knockout' variants in protein-coding genes.
Association of low-frequency and rare coding-sequence variants with blood lipids and Coronary Heart Disease in 56,000 whites and blacks

USDA-ARS's Scientific Manuscript database

Low-frequency coding DNA sequence variants in the proprotein convertase subtilisin/kexin type 9 gene (PCSK9) lower plasma low-density lipoprotein cholesterol (LDL-C), protect against risk of coronary heart disease (CHD), and have prompted the development of a new class of therapeutics. It is uncerta...
Refining the accuracy of validated target identification through coding variant fine-mapping in type 2 diabetes.

PubMed

Mahajan, Anubha; Wessel, Jennifer; Willems, Sara M; Zhao, Wei; Robertson, Neil R; Chu, Audrey Y; Gan, Wei; Kitajima, Hidetoshi; Taliun, Daniel; Rayner, N William; Guo, Xiuqing; Lu, Yingchang; Li, Man; Jensen, Richard A; Hu, Yao; Huo, Shaofeng; Lohman, Kurt K; Zhang, Weihua; Cook, James P; Prins, Bram Peter; Flannick, Jason; Grarup, Niels; Trubetskoy, Vassily Vladimirovich; Kravic, Jasmina; Kim, Young Jin; Rybin, Denis V; Yaghootkar, Hanieh; Müller-Nurasyid, Martina; Meidtner, Karina; Li-Gao, Ruifang; Varga, Tibor V; Marten, Jonathan; Li, Jin; Smith, Albert Vernon; An, Ping; Ligthart, Symen; Gustafsson, Stefan; Malerba, Giovanni; Demirkan, Ayse; Tajes, Juan Fernandez; Steinthorsdottir, Valgerdur; Wuttke, Matthias; Lecoeur, Cécile; Preuss, Michael; Bielak, Lawrence F; Graff, Marielisa; Highland, Heather M; Justice, Anne E; Liu, Dajiang J; Marouli, Eirini; Peloso, Gina Marie; Warren, Helen R; Afaq, Saima; Afzal, Shoaib; Ahlqvist, Emma; Almgren, Peter; Amin, Najaf; Bang, Lia B; Bertoni, Alain G; Bombieri, Cristina; Bork-Jensen, Jette; Brandslund, Ivan; Brody, Jennifer A; Burtt, Noël P; Canouil, Mickaël; Chen, Yii-Der Ida; Cho, Yoon Shin; Christensen, Cramer; Eastwood, Sophie V; Eckardt, Kai-Uwe; Fischer, Krista; Gambaro, Giovanni; Giedraitis, Vilmantas; Grove, Megan L; de Haan, Hugoline G; Hackinger, Sophie; Hai, Yang; Han, Sohee; Tybjærg-Hansen, Anne; Hivert, Marie-France; Isomaa, Bo; Jäger, Susanne; Jørgensen, Marit E; Jørgensen, Torben; Käräjämäki, Annemari; Kim, Bong-Jo; Kim, Sung Soo; Koistinen, Heikki A; Kovacs, Peter; Kriebel, Jennifer; Kronenberg, Florian; Läll, Kristi; Lange, Leslie A; Lee, Jung-Jin; Lehne, Benjamin; Li, Huaixing; Lin, Keng-Hung; Linneberg, Allan; Liu, Ching-Ti; Liu, Jun; Loh, Marie; Mägi, Reedik; Mamakou, Vasiliki; McKean-Cowdin, Roberta; Nadkarni, Girish; Neville, Matt; Nielsen, Sune F; Ntalla, Ioanna; Peyser, Patricia A; Rathmann, Wolfgang; Rice, Kenneth; Rich, Stephen S; Rode, Line; Rolandsson, Olov; Schönherr, Sebastian; Selvin, Elizabeth; Small, Kerrin S; Stančáková, Alena; Surendran, Praveen; Taylor, Kent D; Teslovich, Tanya M; Thorand, Barbara; Thorleifsson, Gudmar; Tin, Adrienne; Tönjes, Anke; Varbo, Anette; Witte, Daniel R; Wood, Andrew R; Yajnik, Pranav; Yao, Jie; Yengo, Loïc; Young, Robin; Amouyel, Philippe; Boeing, Heiner; Boerwinkle, Eric; Bottinger, Erwin P; Chowdhury, Rajiv; Collins, Francis S; Dedoussis, George; Dehghan, Abbas; Deloukas, Panos; Ferrario, Marco M; Ferrières, Jean; Florez, Jose C; Frossard, Philippe; Gudnason, Vilmundur; Harris, Tamara B; Heckbert, Susan R; Howson, Joanna M M; Ingelsson, Martin; Kathiresan, Sekar; Kee, Frank; Kuusisto, Johanna; Langenberg, Claudia; Launer, Lenore J; Lindgren, Cecilia M; Männistö, Satu; Meitinger, Thomas; Melander, Olle; Mohlke, Karen L; Moitry, Marie; Morris, Andrew D; Murray, Alison D; de Mutsert, Renée; Orho-Melander, Marju; Owen, Katharine R; Perola, Markus; Peters, Annette; Province, Michael A; Rasheed, Asif; Ridker, Paul M; Rivadineira, Fernando; Rosendaal, Frits R; Rosengren, Anders H; Salomaa, Veikko; Sheu, Wayne H-H; Sladek, Rob; Smith, Blair H; Strauch, Konstantin; Uitterlinden, André G; Varma, Rohit; Willer, Cristen J; Blüher, Matthias; Butterworth, Adam S; Chambers, John Campbell; Chasman, Daniel I; Danesh, John; van Duijn, Cornelia; Dupuis, Josée; Franco, Oscar H; Franks, Paul W; Froguel, Philippe; Grallert, Harald; Groop, Leif; Han, Bok-Ghee; Hansen, Torben; Hattersley, Andrew T; Hayward, Caroline; Ingelsson, Erik; Kardia, Sharon L R; Karpe, Fredrik; Kooner, Jaspal Singh; Köttgen, Anna; Kuulasmaa, Kari; Laakso, Markku; Lin, Xu; Lind, Lars; Liu, Yongmei; Loos, Ruth J F; Marchini, Jonathan; Metspalu, Andres; Mook-Kanamori, Dennis; Nordestgaard, Børge G; Palmer, Colin N A; Pankow, James S; Pedersen, Oluf; Psaty, Bruce M; Rauramaa, Rainer; Sattar, Naveed; Schulze, Matthias B; Soranzo, Nicole; Spector, Timothy D; Stefansson, Kari; Stumvoll, Michael; Thorsteinsdottir, Unnur; Tuomi, Tiinamaija; Tuomilehto, Jaakko; Wareham, Nicholas J; Wilson, James G; Zeggini, Eleftheria; Scott, Robert A; Barroso, Inês; Frayling, Timothy M; Goodarzi, Mark O; Meigs, James B; Boehnke, Michael; Saleheen, Danish; Morris, Andrew P; Rotter, Jerome I; McCarthy, Mark I

2018-04-01

We aggregated coding variant data for 81,412 type 2 diabetes cases and 370,832 controls of diverse ancestry, identifying 40 coding variant association signals (P < 2.2 × 10^-7); of these, 16 map outside known risk-associated loci. We make two important observations. First, only five of these signals are driven by low-frequency variants: even for these, effect sizes are modest (odds ratio ≤1.29). Second, when we used large-scale genome-wide association data to fine-map the associated variants in their regional context, accounting for the global enrichment of complex trait associations in coding sequence, compelling evidence for coding variant causality was obtained for only 16 signals. At 13 others, the associated coding variants clearly represent 'false leads' with potential to generate erroneous mechanistic inference. Coding variant associations offer a direct route to biological insight for complex diseases and identification of validated therapeutic targets; however, appropriate mechanistic inference requires careful specification of their causal contribution to disease predisposition.
Evaluation of non-coding variation in GLUT1 deficiency.

PubMed

Liu, Yu-Chi; Lee, Jia Wei Audrey; Bellows, Susannah T; Damiano, John A; Mullen, Saul A; Berkovic, Samuel F; Bahlo, Melanie; Scheffer, Ingrid E; Hildebrand, Michael S

2016-12-01

Loss-of-function mutations in SLC2A1, encoding glucose transporter-1 (GLUT-1), lead to dysfunction of glucose transport across the blood-brain barrier. Ten percent of cases with hypoglycorrhachia (fasting cerebrospinal fluid [CSF] glucose <2.2mmol/L) do not have mutations. We hypothesized that GLUT1 deficiency could be due to non-coding SLC2A1 variants. We performed whole exome sequencing of one proband with a GLUT1 phenotype and hypoglycorrhachia negative for SLC2A1 sequencing and copy number variants. We studied a further 55 patients with different epilepsies and low CSF glucose who did not have exonic mutations or copy number variants. We sequenced non-coding promoter and intronic regions. We performed mRNA studies for the recurrent intronic variant. The proband had a de novo splice site mutation five base pairs from the intron-exon boundary. Three of 55 patients had deep intronic SLC2A1 variants, including a recurrent variant in two. The recurrent variant produced less SLC2A1 mRNA transcript. Fasting CSF glucose levels show an age-dependent correlation, which makes the definition of hypoglycorrhachia challenging. Low CSF glucose levels may be associated with pathogenic SLC2A1 mutations including deep intronic SLC2A1 variants. Extending genetic screening to non-coding regions will enable diagnosis of more patients with GLUT1 deficiency, allowing implementation of the ketogenic diet to improve outcomes. © 2016 Mac Keith Press.
Targeted Deep Resequencing Identifies Coding Variants in the PEAR1 Gene That Play a Role in Platelet Aggregation

PubMed Central

Kim, Yoonhee; Suktitipat, Bhoom; Yanek, Lisa R.; Faraday, Nauder; Wilson, Alexander F.; Becker, Diane M.; Becker, Lewis C.; Mathias, Rasika A.

2013-01-01

Platelet aggregation is heritable, and genome-wide association studies have detected strong associations with a common intronic variant of the platelet endothelial aggregation receptor 1 (PEAR1) gene both in African American and European American individuals. In this study, we used a sequencing approach to identify additional exonic variants in PEAR1 that may also determine variability in platelet aggregation in the GeneSTAR Study. A 0.3 Mb targeted region on chromosome 1q23.1 including the entire PEAR1 gene was Sanger sequenced in 104 subjects (45% male, 49% African American, age = 52±13) selected on the basis of hyper- and hypo-aggregation across three different agonists (collagen, epinephrine, and adenosine diphosphate). Single-variant and multi-variant burden tests for association were performed. Of the 235 variants identified through sequencing, 61 were novel, and three of these were missense variants. More rare variants (MAF<5%) were noted in African Americans compared to European Americans (108 vs. 45). The common intronic GWAS-identified variant (rs12041331) demonstrated the most significant association signal in African Americans (p = 4.020×10^−4); no association was seen for additional exonic variants in this group. In contrast, multi-variant burden tests indicated that exonic variants play a more significant role in European Americans (p = 0.0099 for the collective coding variants compared to p = 0.0565 for intronic variant rs12041331). Imputation of the individual exonic variants in the rest of the GeneSTAR European American cohort (N = 1,965) supports the results noted in the sequenced discovery sample: p = 3.56×10^−4, 2.27×10^−7, 5.20×10^−5 for coding synonymous variant rs56260937 and collagen, epinephrine and adenosine diphosphate induced platelet aggregation, respectively.
Sequencing approaches confirm that a common intronic variant has the strongest association with platelet aggregation in African Americans, and show that exonic variants play an additional role in platelet aggregation in European Americans. PMID:23704978
Whole-genome sequencing reveals a coding non-pathogenic variant tagging a non-coding pathogenic hexanucleotide repeat expansion in C9orf72 as cause of amyotrophic lateral sclerosis.

PubMed

Herdewyn, Sarah; Zhao, Hui; Moisse, Matthieu; Race, Valérie; Matthijs, Gert; Reumers, Joke; Kusters, Benno; Schelhaas, Helenius J; van den Berg, Leonard H; Goris, An; Robberecht, Wim; Lambrechts, Diether; Van Damme, Philip

2012-06-01

Motor neuron degeneration in amyotrophic lateral sclerosis (ALS) has a familial cause in 10% of patients. Despite significant advances in the genetics of the disease, many families remain unexplained. We performed whole-genome sequencing in five family members from a pedigree with autosomal-dominant classical ALS. A family-based elimination approach was used to identify novel coding variants segregating with the disease. This list of variants was effectively shortened by genotyping these variants in 2 additional unaffected family members and 1500 unrelated population-specific controls. A novel rare coding variant in SPAG8 on chromosome 9p13.3 segregated with the disease and was not observed in controls. Mutations in SPAG8 were not encountered in 34 other unexplained ALS pedigrees, including 1 with linkage to chromosome 9p13.2-23.3. The shared haplotype containing the SPAG8 variant in this small pedigree was 22.7 Mb and overlapped with the core 9p21 linkage locus for ALS and frontotemporal dementia. Based on differences in coverage depth of known variable tandem repeat regions between affected and non-affected family members, the shared haplotype was found to contain an expanded hexanucleotide (GGGGCC)(n) repeat in C9orf72 in the affected members. Our results demonstrate that rare coding variants identified by whole-genome sequencing can tag a shared haplotype containing a non-coding pathogenic mutation and that changes in coverage depth can be used to reveal tandem repeat expansions. It also confirms (GGGGCC)n repeat expansions in C9orf72 as a cause of familial ALS.
Exome sequencing in an admixed isolated population indicates NFXL1 variants confer a risk for specific language impairment.

PubMed

Villanueva, Pía; Nudel, Ron; Hoischen, Alexander; Fernández, María Angélica; Simpson, Nuala H; Gilissen, Christian; Reader, Rose H; Jara, Lillian; Echeverry, María Magdalena; Francks, Clyde; Baird, Gillian; Conti-Ramsden, Gina; O'Hare, Anne; Bolton, Patrick F; Hennessy, Elizabeth R; Palomino, Hernán; Carvajal-Carmona, Luis; Veltman, Joris A; Cazier, Jean-Baptiste; De Barbieri, Zulema; Fisher, Simon E; Newbury, Dianne F

2015-03-01

Children affected by Specific Language Impairment (SLI) fail to acquire age appropriate language skills despite adequate intelligence and opportunity. SLI is highly heritable, but the understanding of underlying genetic mechanisms has proved challenging. In this study, we use molecular genetic techniques to investigate an admixed isolated founder population from the Robinson Crusoe Island (Chile), who are affected by a high incidence of SLI, increasing the power to discover contributory genetic factors. We utilize exome sequencing in selected individuals from this population to identify eight coding variants that are of putative significance. We then apply association analyses across the wider population to highlight a single rare coding variant (rs144169475, Minor Allele Frequency of 4.1% in admixed South American populations) in the NFXL1 gene that confers a nonsynonymous change (N150K) and is significantly associated with language impairment in the Robinson Crusoe population (p = 2.04 × 10^-4, 8 variants tested). Subsequent sequencing of NFXL1 in 117 UK SLI cases identified four individuals with heterozygous variants predicted to be of functional consequence. We conclude that coding variants within NFXL1 confer an increased risk of SLI within a complex genetic model.
Non-coding variants contribute to the clinical heterogeneity of TTR amyloidosis.

PubMed

Iorio, Andrea; De Lillo, Antonella; De Angelis, Flavio; Di Girolamo, Marco; Luigetti, Marco; Sabatelli, Mario; Pradotto, Luca; Mauro, Alessandro; Mazzeo, Anna; Stancanelli, Claudia; Perfetto, Federico; Frusconi, Sabrina; My, Filomena; Manfellotto, Dario; Fuciarelli, Maria; Polimanti, Renato

2017-09-01

Coding mutations in TTR gene cause a rare hereditary form of systemic amyloidosis, which has a complex genotype-phenotype correlation. We investigated the role of non-coding variants in regulating TTR gene expression and consequently amyloidosis symptoms. We evaluated the genotype-phenotype correlation considering the clinical information of 129 Italian patients with TTR amyloidosis. Then, we conducted a re-sequencing of TTR gene to investigate how non-coding variants affect TTR expression and, consequently, phenotypic presentation in carriers of amyloidogenic mutations. Polygenic scores for genetically determined TTR expression were constructed using data from our re-sequencing analysis and the GTEx (Genotype-Tissue Expression) project. We confirmed a strong phenotypic heterogeneity across coding mutations causing TTR amyloidosis. Considering the effects of non-coding variants on TTR expression, we identified three patient clusters with specific expression patterns associated with certain phenotypic presentations, including late onset, autonomic neurological involvement, and gastrointestinal symptoms. This study provides novel data regarding the role of non-coding variation and the gene expression profiles in patients affected by TTR amyloidosis, also putting forth an approach that could be used to investigate the mechanisms at the basis of the genotype-phenotype correlation of the disease.
A survey of single nucleotide polymorphisms identified from whole-genome sequencing and their functional effect in the porcine genome

USDA-ARS's Scientific Manuscript database

Genetic variants detected from sequence have been used to successfully identify causal variants and map complex traits in several organisms. High and moderate impact variants, those expected to alter or disrupt the protein coded by a gene and those that regulate protein production, likely have a mor...
Double Hits in Schizophrenia.

PubMed

Vorstman, Jacob A S; Olde Loohuis, Loes M; Kahn, René S; Ophoff, Roel A

2018-05-14

The co-occurrence of a Copy Number Variant (CNV) and a functional variant on the other allele may be a relevant genetic mechanism in schizophrenia. We hypothesized that the cumulative burden of such double hits, in particular those composed of a deletion and a coding single nucleotide variation (SNV), is increased in patients with schizophrenia. We combined CNV data with coding variant data in 795 patients with schizophrenia and 474 controls. To limit false CNV detection, only CNVs called by two algorithms were included. CNV-affected genes were subsequently examined for coding SNVs, which we termed "CNV-SNVs". Correcting for total queried sequence, we assessed the CNV-SNV burden and the combined predicted deleterious effect. We estimated p-values by permutation of the phenotype. We detected 105 CNV-SNVs: 67 in duplicated and 38 in deleted genic sequence. While the difference in CNV-SNV rates was not significant, the combined deleteriousness inferred by CNV-SNVs in deleted sequence was almost fourfold higher in cases compared to controls (nominal p = 0.009). This effect may be driven by a higher number of CNV-SNVs and/or by a higher degree of predicted deleteriousness of CNV-SNVs. No such effect was observed for duplications. We provide early evidence that deletions co-occurring with a functional variant may be relevant, albeit of modest impact, for the genetic etiology of schizophrenia. Large-scale consortium studies are required to validate our findings. Sequence-based analyses would provide the best resolution for detection of CNVs as well as coding variants genome-wide.
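Estimating a p-value by permuting the phenotype, as the abstract above mentions, can be sketched roughly as follows. This is a minimal illustration and not the authors' pipeline: the function name `permutation_p_value`, the per-subject `burdens` scores, and the choice of a one-sided difference in means as the test statistic are all assumptions made for the example.

```python
import random


def permutation_p_value(burdens, is_case, n_perm=10_000, seed=0):
    """One-sided permutation p-value for a higher mean burden in cases.

    burdens : per-subject burden scores (hypothetical input)
    is_case : booleans of the same length, True for cases
    """
    rng = random.Random(seed)
    n_cases = sum(is_case)
    n_controls = len(is_case) - n_cases
    if n_cases == 0 or n_controls == 0:
        raise ValueError("need at least one case and one control")
    total = sum(burdens)

    def mean_diff(labels):
        # Mean burden in cases minus mean burden in controls.
        case_sum = sum(b for b, c in zip(burdens, labels) if c)
        return case_sum / n_cases - (total - case_sum) / n_controls

    observed = mean_diff(is_case)
    labels = list(is_case)
    hits = 0
    for _ in range(n_perm):
        # Shuffling the labels breaks any link between phenotype and burden.
        rng.shuffle(labels)
        if mean_diff(labels) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)
```

Shuffling the case/control labels keeps the burden distribution fixed while simulating the null hypothesis of no association, which is why the approach needs no distributional assumptions about the burden score.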
Association of genetic variants of GRIN2B with autism.

PubMed

Pan, Yongcheng; Chen, Jingjing; Guo, Hui; Ou, Jianjun; Peng, Yu; Liu, Qiong; Shen, Yidong; Shi, Lijuan; Liu, Yalan; Xiong, Zhimin; Zhu, Tengfei; Luo, Sanchuan; Hu, Zhengmao; Zhao, Jingping; Xia, Kun

2015-02-06

Autism (MIM 209850) is a complex neurodevelopmental disorder characterized by social communication impairments and restricted repetitive behaviors. It has a high heritability, although much remains unclear. To evaluate genetic variants of GRIN2B in autism etiology, we performed a system association study of common and rare variants of GRIN2B and autism in cohorts from a Chinese population, involving a total sample of 1,945 subjects. Meta-analysis of a triad family cohort and a case-control cohort identified significant associations of multiple common variants and autism risk (Pmin = 1.73 × 10^-4). Significantly, the haplotype involved with the top common variants also showed significant association (P = 1.78 × 10^-6). Sanger sequencing of 275 probands from a triad cohort identified several variants in coding regions, including four common variants and seven rare variants. Two of the common coding variants were located in the autism-related linkage disequilibrium (LD) block, and both were significantly associated with autism (P < 9 × 10^-3) using an independent control cohort. Burden analysis and case-only analysis of rare coding variants identified by Sanger sequencing did not find this association. Our study for the first time reveals that common variants and related haplotypes of GRIN2B are associated with autism risk.
Detecting very low allele fraction variants using targeted DNA sequencing and a novel molecular barcode-aware variant caller.

PubMed

Xu, Chang; Nezami Ranjbar, Mohammad R; Wu, Zhong; DiCarlo, John; Wang, Yexun

2017-01-03

Detection of DNA mutations at very low allele fractions with high accuracy will significantly improve the effectiveness of precision medicine for cancer patients. To achieve this goal through next generation sequencing, researchers need a detection method that 1) captures rare mutation-containing DNA fragments efficiently in the mix of abundant wild-type DNA; 2) sequences the DNA library extensively to deep coverage; and 3) distinguishes low level true variants from amplification and sequencing errors with high accuracy. Targeted enrichment using PCR primers provides researchers with a convenient way to achieve deep sequencing for a small, yet most relevant region using benchtop sequencers. Molecular barcoding (or indexing) provides a unique solution for reducing sequencing artifacts analytically. Although different molecular barcoding schemes have been reported in recent literature, most variant calling has been done on limited targets, using simple custom scripts. The analytical performance of barcode-aware variant calling can be significantly improved by incorporating advanced statistical models. We present here a highly efficient, simple and scalable enrichment protocol that integrates molecular barcodes in multiplex PCR amplification. In addition, we developed smCounter, an open source, generic, barcode-aware variant caller based on a Bayesian probabilistic model. smCounter was optimized and benchmarked on two independent read sets with SNVs and indels at 5 and 1% allele fractions. Variants were called with very good sensitivity and specificity within coding regions. We demonstrated that we can accurately detect somatic mutations with allele fractions as low as 1% in coding regions using our enrichment protocol and variant caller.
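The idea behind molecular barcoding described above, that reads sharing a barcode come from one original DNA fragment so disagreements among them are likely amplification or sequencing errors, can be illustrated with a simple majority-vote consensus. This sketch is not smCounter, which the abstract describes as a Bayesian probabilistic model; the majority vote is a deliberately simpler stand-in, and the function names, the `(barcode, base)` input format, and the thresholds are hypothetical.

```python
from collections import Counter, defaultdict


def barcode_consensus(reads, min_reads=2, min_agreement=0.8):
    """Collapse reads at one site into a single consensus base per barcode.

    reads : iterable of (barcode, base) pairs observed at the site
    Barcodes with too few reads or too little agreement are dropped.
    """
    by_barcode = defaultdict(list)
    for barcode, base in reads:
        by_barcode[barcode].append(base)

    consensus = {}
    for barcode, bases in by_barcode.items():
        if len(bases) < min_reads:
            continue  # a single read cannot be checked against siblings
        base, count = Counter(bases).most_common(1)[0]
        if count / len(bases) >= min_agreement:
            consensus[barcode] = base
    return consensus


def allele_fraction(consensus, alt):
    """Fraction of consensus molecules carrying the alternate base."""
    if not consensus:
        return 0.0
    return sum(1 for b in consensus.values() if b == alt) / len(consensus)
```

Counting molecules rather than raw reads means a single error copied many times during PCR contributes at most one observation, which is what makes very low allele fractions distinguishable from noise.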

Amyotrophic lateral sclerosis onset is influenced by the burden of rare variants in known amyotrophic lateral sclerosis genes.

PubMed

Cady, Janet; Allred, Peggy; Bali, Taha; Pestronk, Alan; Goate, Alison; Miller, Timothy M; Mitra, Robi D; Ravits, John; Harms, Matthew B; Baloh, Robert H

2015-01-01

To define the genetic landscape of amyotrophic lateral sclerosis (ALS) and assess the contribution of possible oligogenic inheritance, we aimed to comprehensively sequence 17 known ALS genes in 391 ALS patients from the United States. Targeted pooled-sample sequencing was used to identify variants in 17 ALS genes. Fragment size analysis was used to define ATXN2 and C9ORF72 expansion sizes. Genotype-phenotype correlations were made with individual variants and total burden of variants. Rare variant associations for risk of ALS were investigated at both the single variant and gene level. A total of 64.3% of familial and 27.8% of sporadic subjects carried potentially pathogenic novel or rare coding variants identified by sequencing or an expanded repeat in C9ORF72 or ATXN2; 3.8% of subjects had variants in >1 ALS gene, and these individuals had disease onset 10 years earlier (p = 0.0046) than subjects with variants in a single gene. The number of potentially pathogenic coding variants did not influence disease duration or site of onset. Rare and potentially pathogenic variants in known ALS genes are present in >25% of apparently sporadic and 64% of familial patients, significantly higher than previous reports using less comprehensive sequencing approaches. A significant number of subjects carried variants in >1 gene, which influenced the age of symptom onset and supports oligogenic inheritance as relevant to disease pathogenesis. © 2014 American Neurological Association.
Exome Sequencing in an Admixed Isolated Population Indicates NFXL1 Variants Confer a Risk for Specific Language Impairment

PubMed Central

Villanueva, Pía; Nudel, Ron; Hoischen, Alexander; Fernández, María Angélica; Simpson, Nuala H.; Gilissen, Christian; Reader, Rose H.; Jara, Lillian; Echeverry, Maria Magdalena; Francks, Clyde; Baird, Gillian; Conti-Ramsden, Gina; O’Hare, Anne; Bolton, Patrick F.; Hennessy, Elizabeth R.; Palomino, Hernán; Carvajal-Carmona, Luis; Veltman, Joris A.; Cazier, Jean-Baptiste; De Barbieri, Zulema

2015-01-01

Children affected by Specific Language Impairment (SLI) fail to acquire age appropriate language skills despite adequate intelligence and opportunity. SLI is highly heritable, but the understanding of underlying genetic mechanisms has proved challenging. In this study, we use molecular genetic techniques to investigate an admixed isolated founder population from the Robinson Crusoe Island (Chile), who are affected by a high incidence of SLI, increasing the power to discover contributory genetic factors. We utilize exome sequencing in selected individuals from this population to identify eight coding variants that are of putative significance. We then apply association analyses across the wider population to highlight a single rare coding variant (rs144169475, Minor Allele Frequency of 4.1% in admixed South American populations) in the NFXL1 gene that confers a nonsynonymous change (N150K) and is significantly associated with language impairment in the Robinson Crusoe population (p = 2.04 × 10^-4, 8 variants tested). Subsequent sequencing of NFXL1 in 117 UK SLI cases identified four individuals with heterozygous variants predicted to be of functional consequence. We conclude that coding variants within NFXL1 confer an increased risk of SLI within a complex genetic model. PMID:25781923
Semiconductor Whole Exome Sequencing for the Identification of Genetic Variants in Colombian Patients Clinically Diagnosed with Long QT Syndrome.

PubMed

Burgos, Mariana; Arenas, Alvaro; Cabrera, Rodrigo

2016-08-01

Inherited long QT syndrome (LQTS) is a cardiac channelopathy characterized by a prolongation of QT interval and the risk of syncope, cardiac arrest, and sudden cardiac death. Genetic diagnosis of LQTS is critical in medical practice as results can guide adequate management of patients and distinguish phenocopies such as catecholaminergic polymorphic ventricular tachycardia (CPVT). However, extensive screening of large genomic regions is required in order to reliably identify genetic causes. Semiconductor whole exome sequencing (WES) is a promising approach for the identification of variants in the coding regions of most human genes. DNA samples from 21 Colombian patients clinically diagnosed with LQTS were enriched for coding regions using multiplex polymerase chain reaction (PCR) and subjected to WES using a semiconductor sequencer. Semiconductor WES showed mean coverage of 93.6 % for all coding regions relevant to LQTS at >10× depth with high intra- and inter-assay depth heterogeneity. Fifteen variants were detected in 12 patients in genes associated with LQTS. Three variants were identified in three patients in genes associated with CPVT. Co-segregation analysis was performed when possible. All variants were analyzed with two pathogenicity prediction algorithms. The overall prevalence of LQTS and CPVT variants in our cohort was 71.4 %. All LQTS variants previously identified through commercial genetic testing were identified. Standardized WES assays can be easily implemented, often at a lower cost than sequencing panels. Our results show that WES can identify LQTS-causing mutations and permits differential diagnosis of related conditions in a real-world clinical setting. However, high heterogeneity in sequencing depth and low coverage in the most relevant genes is expected to be associated with reduced analytical sensitivity.
Next-generation sequencing of the monogenic obesity genes LEP, LEPR, MC4R, PCSK1 and POMC in a Norwegian cohort of patients with morbid obesity and normal weight controls.

PubMed

Nordang, Gry B N; Busk, Øyvind L; Tveten, Kristian; Hanevik, Hans Ivar; Fell, Anne Kristin M; Hjelmesæth, Jøran; Holla, Øystein L; Hertel, Jens K

2017-05-01

Rare sequence variants in at least five genes are known to cause monogenic obesity. In this study we aimed to investigate the prevalence of, and characterize, rare coding and splice site variants in LEP, LEPR, MC4R, PCSK1 and POMC in patients with morbid obesity and normal weight controls. Targeted next-generation sequencing of all exons in LEP, LEPR, MC4R, PCSK1 and POMC was performed in 485 patients with morbid obesity and 327 normal weight population-based controls from Norway. In total 151 variants were detected. Twenty-eight (18.5%) of these were rare, coding or splice variants and five (3.3%) were novel. All individuals, except one control, were heterozygous for the 28 variants, and the distribution of the rare variants showed a significantly higher carrier frequency among cases than controls (9.9% vs. 4.9%, p=0.011). Four variants in MC4R were classified as pathogenic or likely pathogenic. Four cases (0.8%) of monogenic obesity were detected, all due to MC4R variants previously linked to monogenic obesity. Significant differences in carrier frequencies among patients with morbid obesity and normal weight controls suggest an association between heterozygous rare coding variants in these five genes and morbid obesity. However, additional studies in larger cohorts and functional testing of the novel variants identified are required to confirm the findings. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.
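A comparison of carrier frequencies between cases and controls, like the 9.9% vs. 4.9% reported above, is commonly tested on a 2×2 table. The abstract does not say which test produced p=0.011, so Fisher's exact test here is an assumption, as is the helper name `fisher_exact_two_sided`. The carrier counts in the usage lines (48 of 485 and 16 of 327) are back-calculated from the reported percentages and rounded; they are illustrative, not figures taken from the paper.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]].

    a, b : carriers and non-carriers among cases
    c, d : carriers and non-carriers among controls
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x carriers among the cases.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables at least as unlikely as the observed one.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# Approximate counts reconstructed from the reported percentages (assumption).
cases_carriers, cases_total = 48, 485
controls_carriers, controls_total = 16, 327
p_value = fisher_exact_two_sided(
    cases_carriers, cases_total - cases_carriers,
    controls_carriers, controls_total - controls_carriers,
)
print(f"two-sided p = {p_value:.4f}")
```

Because the counts are rounded reconstructions and the original test is unknown, the printed value should not be expected to match the published p-value exactly.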
Whole-Exome Sequencing Identifies Rare and Low-Frequency Coding Variants Associated with LDL Cholesterol

PubMed Central

Lange, Leslie A.; Hu, Youna; Zhang, He; Xue, Chenyi; Schmidt, Ellen M.; Tang, Zheng-Zheng; Bizon, Chris; Lange, Ethan M.; Smith, Joshua D.; Turner, Emily H.; Jun, Goo; Kang, Hyun Min; Peloso, Gina; Auer, Paul; Li, Kuo-ping; Flannick, Jason; Zhang, Ji; Fuchsberger, Christian; Gaulton, Kyle; Lindgren, Cecilia; Locke, Adam; Manning, Alisa; Sim, Xueling; Rivas, Manuel A.; Holmen, Oddgeir L.; Gottesman, Omri; Lu, Yingchang; Ruderfer, Douglas; Stahl, Eli A.; Duan, Qing; Li, Yun; Durda, Peter; Jiao, Shuo; Isaacs, Aaron; Hofman, Albert; Bis, Joshua C.; Correa, Adolfo; Griswold, Michael E.; Jakobsdottir, Johanna; Smith, Albert V.; Schreiner, Pamela J.; Feitosa, Mary F.; Zhang, Qunyuan; Huffman, Jennifer E.; Crosby, Jacy; Wassel, Christina L.; Do, Ron; Franceschini, Nora; Martin, Lisa W.; Robinson, Jennifer G.; Assimes, Themistocles L.; Crosslin, David R.; Rosenthal, Elisabeth A.; Tsai, Michael; Rieder, Mark J.; Farlow, Deborah N.; Folsom, Aaron R.; Lumley, Thomas; Fox, Ervin R.; Carlson, Christopher S.; Peters, Ulrike; Jackson, Rebecca D.; van Duijn, Cornelia M.; Uitterlinden, André G.; Levy, Daniel; Rotter, Jerome I.; Taylor, Herman A.; Gudnason, Vilmundur; Siscovick, David S.; Fornage, Myriam; Borecki, Ingrid B.; Hayward, Caroline; Rudan, Igor; Chen, Y. Eugene; Bottinger, Erwin P.; Loos, Ruth J.F.; Sætrom, Pål; Hveem, Kristian; Boehnke, Michael; Groop, Leif; McCarthy, Mark; Meitinger, Thomas; Ballantyne, Christie M.; Gabriel, Stacey B.; O’Donnell, Christopher J.; Post, Wendy S.; North, Kari E.; Reiner, Alexander P.; Boerwinkle, Eric; Psaty, Bruce M.; Altshuler, David; Kathiresan, Sekar; Lin, Dan-Yu; Jarvik, Gail P.; Cupples, L. 
Adrienne; Kooperberg, Charles; Wilson, James G.; Nickerson, Deborah A.; Abecasis, Goncalo R.; Rich, Stephen S.; Tracy, Russell P.; Willer, Cristen J.; Gabriel, Stacey B.; Altshuler, David M.; Abecasis, Gonçalo R.; Allayee, Hooman; Cresci, Sharon; Daly, Mark J.; de Bakker, Paul I.W.; DePristo, Mark A.; Do, Ron; Donnelly, Peter; Farlow, Deborah N.; Fennell, Tim; Garimella, Kiran; Hazen, Stanley L.; Hu, Youna; Jordan, Daniel M.; Jun, Goo; Kathiresan, Sekar; Kang, Hyun Min; Kiezun, Adam; Lettre, Guillaume; Li, Bingshan; Li, Mingyao; Newton-Cheh, Christopher H.; Padmanabhan, Sandosh; Peloso, Gina; Pulit, Sara; Rader, Daniel J.; Reich, David; Reilly, Muredach P.; Rivas, Manuel A.; Schwartz, Steve; Scott, Laura; Siscovick, David S.; Spertus, John A.; Stitziel, Nathaniel O.; Stoletzki, Nina; Sunyaev, Shamil R.; Voight, Benjamin F.; Willer, Cristen J.; Rich, Stephen S.; Akylbekova, Ermeg; Atwood, Larry D.; Ballantyne, Christie M.; Barbalic, Maja; Barr, R. Graham; Benjamin, Emelia J.; Bis, Joshua; Boerwinkle, Eric; Bowden, Donald W.; Brody, Jennifer; Budoff, Matthew; Burke, Greg; Buxbaum, Sarah; Carr, Jeff; Chen, Donna T.; Chen, Ida Y.; Chen, Wei-Min; Concannon, Pat; Crosby, Jacy; Cupples, L. Adrienne; D’Agostino, Ralph; DeStefano, Anita L.; Dreisbach, Albert; Dupuis, Josée; Durda, J. 
Peter; Ellis, Jaclyn; Folsom, Aaron R.; Fornage, Myriam; Fox, Caroline S.; Fox, Ervin; Funari, Vincent; Ganesh, Santhi K.; Gardin, Julius; Goff, David; Gordon, Ora; Grody, Wayne; Gross, Myron; Guo, Xiuqing; Hall, Ira M.; Heard-Costa, Nancy L.; Heckbert, Susan R.; Heintz, Nicholas; Herrington, David M.; Hickson, DeMarc; Huang, Jie; Hwang, Shih-Jen; Jacobs, David R.; Jenny, Nancy S.; Johnson, Andrew D.; Johnson, Craig W.; Kawut, Steven; Kronmal, Richard; Kurz, Raluca; Lange, Ethan M.; Lange, Leslie A.; Larson, Martin G.; Lawson, Mark; Lewis, Cora E.; Levy, Daniel; Li, Dalin; Lin, Honghuang; Liu, Chunyu; Liu, Jiankang; Liu, Kiang; Liu, Xiaoming; Liu, Yongmei; Longstreth, William T.; Loria, Cay; Lumley, Thomas; Lunetta, Kathryn; Mackey, Aaron J.; Mackey, Rachel; Manichaikul, Ani; Maxwell, Taylor; McKnight, Barbara; Meigs, James B.; Morrison, Alanna C.; Musani, Solomon K.; Mychaleckyj, Josyf C.; Nettleton, Jennifer A.; North, Kari; O’Donnell, Christopher J.; O’Leary, Daniel; Ong, Frank; Palmas, Walter; Pankow, James S.; Pankratz, Nathan D.; Paul, Shom; Perez, Marco; Person, Sharina D.; Polak, Joseph; Post, Wendy S.; Psaty, Bruce M.; Quinlan, Aaron R.; Raffel, Leslie J.; Ramachandran, Vasan S.; Reiner, Alexander P.; Rice, Kenneth; Rotter, Jerome I.; Sanders, Jill P.; Schreiner, Pamela; Seshadri, Sudha; Shea, Steve; Sidney, Stephen; Silverstein, Kevin; Smith, Nicholas L.; Sotoodehnia, Nona; Srinivasan, Asoke; Taylor, Herman A.; Taylor, Kent; Thomas, Fridtjof; Tracy, Russell P.; Tsai, Michael Y.; Volcik, Kelly A.; Wassel, Chrstina L.; Watson, Karol; Wei, Gina; White, Wendy; Wiggins, Kerri L.; Wilk, Jemma B.; Williams, O. 
Dale; Wilson, Gregory; Wilson, James G.; Wolf, Phillip; Zakai, Neil A.; Hardy, John; Meschia, James F.; Nalls, Michael; Singleton, Andrew; Worrall, Brad; Bamshad, Michael J.; Barnes, Kathleen C.; Abdulhamid, Ibrahim; Accurso, Frank; Anbar, Ran; Beaty, Terri; Bigham, Abigail; Black, Phillip; Bleecker, Eugene; Buckingham, Kati; Cairns, Anne Marie; Caplan, Daniel; Chatfield, Barbara; Chidekel, Aaron; Cho, Michael; Christiani, David C.; Crapo, James D.; Crouch, Julia; Daley, Denise; Dang, Anthony; Dang, Hong; De Paula, Alicia; DeCelie-Germana, Joan; Dozor, Allen; Drumm, Mitch; Dyson, Maynard; Emerson, Julia; Emond, Mary J.; Ferkol, Thomas; Fink, Robert; Foster, Cassandra; Froh, Deborah; Gao, Li; Gershan, William; Gibson, Ronald L.; Godwin, Elizabeth; Gondor, Magdalen; Gutierrez, Hector; Hansel, Nadia N.; Hassoun, Paul M.; Hiatt, Peter; Hokanson, John E.; Howenstine, Michelle; Hummer, Laura K.; Kanga, Jamshed; Kim, Yoonhee; Knowles, Michael R.; Konstan, Michael; Lahiri, Thomas; Laird, Nan; Lange, Christoph; Lin, Lin; Lin, Xihong; Louie, Tin L.; Lynch, David; Make, Barry; Martin, Thomas R.; Mathai, Steve C.; Mathias, Rasika A.; McNamara, John; McNamara, Sharon; Meyers, Deborah; Millard, Susan; Mogayzel, Peter; Moss, Richard; Murray, Tanda; Nielson, Dennis; Noyes, Blakeslee; O’Neal, Wanda; Orenstein, David; O’Sullivan, Brian; Pace, Rhonda; Pare, Peter; Parker, H. 
Worth; Passero, Mary Ann; Perkett, Elizabeth; Prestridge, Adrienne; Rafaels, Nicholas M.; Ramsey, Bonnie; Regan, Elizabeth; Ren, Clement; Retsch-Bogart, George; Rock, Michael; Rosen, Antony; Rosenfeld, Margaret; Ruczinski, Ingo; Sanford, Andrew; Schaeffer, David; Sell, Cindy; Sheehan, Daniel; Silverman, Edwin K.; Sin, Don; Spencer, Terry; Stonebraker, Jackie; Tabor, Holly K.; Varlotta, Laurie; Vergara, Candelaria I.; Weiss, Robert; Wigley, Fred; Wise, Robert A.; Wright, Fred A.; Wurfel, Mark M.; Zanni, Robert; Zou, Fei; Nickerson, Deborah A.; Rieder, Mark J.; Green, Phil; Shendure, Jay; Akey, Joshua M.; Bustamante, Carlos D.; Crosslin, David R.; Eichler, Evan E.; Fox, P. Keolu; Fu, Wenqing; Gordon, Adam; Gravel, Simon; Jarvik, Gail P.; Johnsen, Jill M.; Kan, Mengyuan; Kenny, Eimear E.; Kidd, Jeffrey M.; Lara-Garduno, Fremiet; Leal, Suzanne M.; Liu, Dajiang J.; McGee, Sean; O’Connor, Timothy D.; Paeper, Bryan; Robertson, Peggy D.; Smith, Joshua D.; Staples, Jeffrey C.; Tennessen, Jacob A.; Turner, Emily H.; Wang, Gao; Yi, Qian; Jackson, Rebecca; Peters, Ulrike; Carlson, Christopher S.; Anderson, Garnet; Anton-Culver, Hoda; Assimes, Themistocles L.; Auer, Paul L.; Beresford, Shirley; Bizon, Chris; Black, Henry; Brunner, Robert; Brzyski, Robert; Burwen, Dale; Caan, Bette; Carty, Cara L.; Chlebowski, Rowan; Cummings, Steven; Curb, J. 
David; Eaton, Charles B.; Ford, Leslie; Franceschini, Nora; Fullerton, Stephanie M.; Gass, Margery; Geller, Nancy; Heiss, Gerardo; Howard, Barbara V.; Hsu, Li; Hutter, Carolyn M.; Ioannidis, John; Jiao, Shuo; Johnson, Karen C.; Kooperberg, Charles; Kuller, Lewis; LaCroix, Andrea; Lakshminarayan, Kamakshi; Lane, Dorothy; Lasser, Norman; LeBlanc, Erin; Li, Kuo-Ping; Limacher, Marian; Lin, Dan-Yu; Logsdon, Benjamin A.; Ludlam, Shari; Manson, JoAnn E.; Margolis, Karen; Martin, Lisa; McGowan, Joan; Monda, Keri L.; Kotchen, Jane Morley; Nathan, Lauren; Ockene, Judith; O’Sullivan, Mary Jo; Phillips, Lawrence S.; Prentice, Ross L.; Robbins, John; Robinson, Jennifer G.; Rossouw, Jacques E.; Sangi-Haghpeykar, Haleh; Sarto, Gloria E.; Shumaker, Sally; Simon, Michael S.; Stefanick, Marcia L.; Stein, Evan; Tang, Hua; Taylor, Kira C.; Thomson, Cynthia A.; Thornton, Timothy A.; Van Horn, Linda; Vitolins, Mara; Wactawski-Wende, Jean; Wallace, Robert; Wassertheil-Smoller, Sylvia; Zeng, Donglin; Applebaum-Bowden, Deborah; Feolo, Michael; Gan, Weiniu; Paltoo, Dina N.; Sholinsky, Phyliss; Sturcke, Anne

2014-01-01

Elevated low-density lipoprotein cholesterol (LDL-C) is a treatable, heritable risk factor for cardiovascular disease. Genome-wide association studies (GWASs) have identified 157 variants associated with lipid levels but are not well suited to assess the impact of rare and low-frequency variants. To determine whether rare or low-frequency coding variants are associated with LDL-C, we exome sequenced 2,005 individuals, including 554 individuals selected for extreme LDL-C (>98th or <2nd percentile). Follow-up analyses included sequencing of 1,302 additional individuals and genotype-based analysis of 52,221 individuals. We observed significant evidence of association between LDL-C and the burden of rare or low-frequency variants in PNPLA5, encoding a phospholipase-domain-containing protein, and both known and previously unidentified variants in PCSK9, LDLR and APOB, three known lipid-related genes. The effect sizes for the burden of rare variants for each associated gene were substantially higher than those observed for individual SNPs identified from GWASs. We replicated the PNPLA5 signal in an independent large-scale sequencing study of 2,084 individuals. In conclusion, this large whole-exome-sequencing study for LDL-C identified a gene not known to be implicated in LDL-C and provides unique insight into the design and analysis of similar experiments. PMID:24507775
Whole-exome sequencing identifies rare and low-frequency coding variants associated with LDL cholesterol.

PubMed

Lange, Leslie A; Hu, Youna; Zhang, He; Xue, Chenyi; Schmidt, Ellen M; Tang, Zheng-Zheng; Bizon, Chris; Lange, Ethan M; Smith, Joshua D; Turner, Emily H; Jun, Goo; Kang, Hyun Min; Peloso, Gina; Auer, Paul; Li, Kuo-Ping; Flannick, Jason; Zhang, Ji; Fuchsberger, Christian; Gaulton, Kyle; Lindgren, Cecilia; Locke, Adam; Manning, Alisa; Sim, Xueling; Rivas, Manuel A; Holmen, Oddgeir L; Gottesman, Omri; Lu, Yingchang; Ruderfer, Douglas; Stahl, Eli A; Duan, Qing; Li, Yun; Durda, Peter; Jiao, Shuo; Isaacs, Aaron; Hofman, Albert; Bis, Joshua C; Correa, Adolfo; Griswold, Michael E; Jakobsdottir, Johanna; Smith, Albert V; Schreiner, Pamela J; Feitosa, Mary F; Zhang, Qunyuan; Huffman, Jennifer E; Crosby, Jacy; Wassel, Christina L; Do, Ron; Franceschini, Nora; Martin, Lisa W; Robinson, Jennifer G; Assimes, Themistocles L; Crosslin, David R; Rosenthal, Elisabeth A; Tsai, Michael; Rieder, Mark J; Farlow, Deborah N; Folsom, Aaron R; Lumley, Thomas; Fox, Ervin R; Carlson, Christopher S; Peters, Ulrike; Jackson, Rebecca D; van Duijn, Cornelia M; Uitterlinden, André G; Levy, Daniel; Rotter, Jerome I; Taylor, Herman A; Gudnason, Vilmundur; Siscovick, David S; Fornage, Myriam; Borecki, Ingrid B; Hayward, Caroline; Rudan, Igor; Chen, Y Eugene; Bottinger, Erwin P; Loos, Ruth J F; Sætrom, Pål; Hveem, Kristian; Boehnke, Michael; Groop, Leif; McCarthy, Mark; Meitinger, Thomas; Ballantyne, Christie M; Gabriel, Stacey B; O'Donnell, Christopher J; Post, Wendy S; North, Kari E; Reiner, Alexander P; Boerwinkle, Eric; Psaty, Bruce M; Altshuler, David; Kathiresan, Sekar; Lin, Dan-Yu; Jarvik, Gail P; Cupples, L Adrienne; Kooperberg, Charles; Wilson, James G; Nickerson, Deborah A; Abecasis, Goncalo R; Rich, Stephen S; Tracy, Russell P; Willer, Cristen J

2014-02-06

Elevated low-density lipoprotein cholesterol (LDL-C) is a treatable, heritable risk factor for cardiovascular disease. Genome-wide association studies (GWASs) have identified 157 variants associated with lipid levels but are not well suited to assess the impact of rare and low-frequency variants. To determine whether rare or low-frequency coding variants are associated with LDL-C, we exome sequenced 2,005 individuals, including 554 individuals selected for extreme LDL-C (>98th or <2nd percentile). Follow-up analyses included sequencing of 1,302 additional individuals and genotype-based analysis of 52,221 individuals. We observed significant evidence of association between LDL-C and the burden of rare or low-frequency variants in PNPLA5, encoding a phospholipase-domain-containing protein, and both known and previously unidentified variants in PCSK9, LDLR and APOB, three known lipid-related genes. The effect sizes for the burden of rare variants for each associated gene were substantially higher than those observed for individual SNPs identified from GWASs. We replicated the PNPLA5 signal in an independent large-scale sequencing study of 2,084 individuals. In conclusion, this large whole-exome-sequencing study for LDL-C identified a gene not known to be implicated in LDL-C and provides unique insight into the design and analysis of similar experiments. Copyright © 2014 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
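As a rough illustration of the gene-level "burden" idea mentioned in the abstract above, the sketch below collapses rare and low-frequency alleles in one gene into a per-individual count and compares mean LDL-C between carriers and non-carriers. It is not the study's analysis: the function names, the 5% frequency cut-off, and the toy genotypes and LDL-C values are all assumptions made for this example, and a real analysis would use a regression-based test with covariates.

```python
from statistics import mean

def burden_scores(genotypes, allele_freqs, max_af=0.05):
    """Count rare/low-frequency alternate alleles carried by each individual.

    genotypes: one list per individual of alt-allele counts (0, 1 or 2),
               with one entry per variant in the gene.
    allele_freqs: alternate-allele frequency of each variant.
    max_af: hypothetical frequency cut-off for "rare or low-frequency".
    """
    rare = [i for i, af in enumerate(allele_freqs) if af <= max_af]
    return [sum(g[i] for i in rare) for g in genotypes]

def carrier_mean_difference(scores, phenotype):
    """Mean phenotype of carriers (score > 0) minus that of non-carriers."""
    carriers = [p for s, p in zip(scores, phenotype) if s > 0]
    non_carriers = [p for s, p in zip(scores, phenotype) if s == 0]
    return mean(carriers) - mean(non_carriers)

# Toy data (invented): 4 individuals, 3 variants in one gene.
genotypes = [[0, 0, 1], [1, 0, 2], [0, 0, 0], [0, 1, 1]]
allele_freqs = [0.01, 0.03, 0.40]   # the third variant is common and is ignored
ldl_c = [95.0, 150.0, 100.0, 140.0]

scores = burden_scores(genotypes, allele_freqs)
difference = carrier_mean_difference(scores, ldl_c)
print(scores, difference)
```

Collapsing variants this way trades per-variant resolution for power, which is the usual motivation for burden-style tests when individual variants are too rare to test one at a time.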
An automatic and efficient pipeline for disease gene identification through utilizing family-based sequencing data.

PubMed

Song, Dandan; Li, Ning; Liao, Lejian

2015-01-01

Because whole-exome sequencing technologies generate enormous amounts of data at lower cost and in less time, they provide dramatic opportunities for identifying disease genes implicated in Mendelian disorders. Since thousands of genomic variants can be detected in each exome, it is challenging to filter for pathogenic variants in protein-coding regions while reducing the number of true variants that are missed. Therefore, an automatic and efficient pipeline for finding disease variants in Mendelian disorders is designed, combining several variant-filtering steps to analyze family-based exome sequencing data. Recent studies on the Freeman-Sheldon disease are revisited and show that the proposed method outperforms other existing candidate gene identification methods.
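The abstract above does not list its filtering steps, so the following is only a generic sketch of what a family-based variant filter can look like, not the authors' pipeline. It assumes a dominant model, a hypothetical population-frequency cut-off of 1%, and invented record fields (`id`, `consequence`, `pop_af`, `carriers`).

```python
# Consequence classes treated as protein-altering in this sketch (an assumption).
CODING_CONSEQUENCES = {"missense", "nonsense", "frameshift", "splice_site"}

def filter_candidates(variants, affected, unaffected, max_pop_af=0.01):
    """Return IDs of variants that pass three illustrative filters:
    1. protein-altering consequence,
    2. rare in the general population,
    3. carried by every affected and no unaffected family member.
    """
    kept = []
    for v in variants:
        if v["consequence"] not in CODING_CONSEQUENCES:
            continue
        if v["pop_af"] > max_pop_af:
            continue
        carriers = v["carriers"]
        if not all(member in carriers for member in affected):
            continue
        if any(member in carriers for member in unaffected):
            continue
        kept.append(v["id"])
    return kept

# Toy family (invented): two affected members and one unaffected member.
variants = [
    {"id": "var1", "consequence": "missense", "pop_af": 0.0001,
     "carriers": {"child", "father"}},
    {"id": "var2", "consequence": "synonymous", "pop_af": 0.0001,
     "carriers": {"child", "father"}},
    {"id": "var3", "consequence": "nonsense", "pop_af": 0.20,
     "carriers": {"child", "father"}},
    {"id": "var4", "consequence": "frameshift", "pop_af": 0.0005,
     "carriers": {"child", "father", "mother"}},
]
candidates = filter_candidates(variants, affected=["child", "father"],
                               unaffected=["mother"])
print(candidates)
```

Each filter removes one class of false candidates (silent changes, common polymorphisms, variants that do not segregate with the disease), which is how such pipelines reduce thousands of variants per exome to a short list.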
Novel variants of the 5S rRNA genes in Eruca sativa.

PubMed

Singh, K; Bhatia, S; Lakshmikumaran, M

1994-02-01

The 5S ribosomal RNA (rRNA) genes of Eruca sativa were cloned and characterized. They are organized into clusters of tandemly repeated units. Each repeat unit consists of a 119-bp coding region followed by a noncoding spacer region that separates it from the coding region of the next repeat unit. Our study reports novel gene variants of the 5S rRNA genes in plants. Two families of the 5S rDNA, the 0.5-kb size family and the 1-kb size family, coexist in the E. sativa genome. The 0.5-kb size family consists of the 5S rRNA genes (S4) that have coding regions similar to those of other reported plant 5S rDNA sequences, whereas the 1-kb size family consists of the 5S rRNA gene variants (S1) that exist as 1-kb BamHI tandem repeats. S1 is made up of two variant units (V1 and V2) of 5S rDNA where the BamHI site between the two units is mutated. Sequence heterogeneity among S4, V1, and V2 units exists throughout the sequence and is not limited to the noncoding spacer region only. The coding regions of V1 and V2 show approximately 20% dissimilarity to the coding regions of S4 and other reported plant 5S rDNA sequences. Such a large variation in the coding regions of the 5S rDNA units within the same plant species has been observed for the first time. Restriction site variation is observed between the two size classes of 5S rDNA in E. sativa. (ABSTRACT TRUNCATED AT 250 WORDS)
Comprehensive Rare Variant Analysis via Whole-Genome Sequencing to Determine the Molecular Pathology of Inherited Retinal Disease.

PubMed

Carss, Keren J; Arno, Gavin; Erwood, Marie; Stephens, Jonathan; Sanchis-Juan, Alba; Hull, Sarah; Megy, Karyn; Grozeva, Detelina; Dewhurst, Eleanor; Malka, Samantha; Plagnol, Vincent; Penkett, Christopher; Stirrups, Kathleen; Rizzo, Roberta; Wright, Genevieve; Josifova, Dragana; Bitner-Glindzicz, Maria; Scott, Richard H; Clement, Emma; Allen, Louise; Armstrong, Ruth; Brady, Angela F; Carmichael, Jenny; Chitre, Manali; Henderson, Robert H H; Hurst, Jane; MacLaren, Robert E; Murphy, Elaine; Paterson, Joan; Rosser, Elisabeth; Thompson, Dorothy A; Wakeling, Emma; Ouwehand, Willem H; Michaelides, Michel; Moore, Anthony T; Webster, Andrew R; Raymond, F Lucy

2017-01-05

Inherited retinal disease is a common cause of visual impairment and represents a highly heterogeneous group of conditions. Here, we present findings from a cohort of 722 individuals with inherited retinal disease, who have had whole-genome sequencing (n = 605), whole-exome sequencing (n = 72), or both (n = 45) performed, as part of the NIHR-BioResource Rare Diseases research study. We identified pathogenic variants (single-nucleotide variants, indels, or structural variants) for 404/722 (56%) individuals. Whole-genome sequencing gives unprecedented power to detect three categories of pathogenic variants in particular: structural variants, variants in GC-rich regions, which have significantly improved coverage compared to whole-exome sequencing, and variants in non-coding regulatory regions. In addition to previously reported pathogenic regulatory variants, we have identified a previously unreported pathogenic intronic variant in CHM in two males with choroideremia. We have also identified 19 genes not previously known to be associated with inherited retinal disease, which harbor biallelic predicted protein-truncating variants in unsolved cases. Whole-genome sequencing is an increasingly important comprehensive method with which to investigate the genetic causes of inherited retinal disease. Copyright © 2017. Published by Elsevier Inc.
Allelic Expression of Deleterious Protein-Coding Variants across Human Tissues

PubMed Central

Kukurba, Kimberly R.; Zhang, Rui; Li, Xin; Smith, Kevin S.; Knowles, David A.; How Tan, Meng; Piskol, Robert; Lek, Monkol; Snyder, Michael; MacArthur, Daniel G.; Li, Jin Billy; Montgomery, Stephen B.

2014-01-01

Personal exome and genome sequencing provides access to loss-of-function and rare deleterious alleles whose interpretation is expected to provide insight into individual disease burden. However, for each allele, accurate interpretation of its effect will depend on both its penetrance and the trait's expressivity. In this regard, an important factor that can modify the effect of a pathogenic coding allele is its level of expression; a factor which itself characteristically changes across tissues. To better inform the degree to which pathogenic alleles can be modified by expression level across multiple tissues, we have conducted exome, RNA and deep, targeted allele-specific expression (ASE) sequencing in ten tissues obtained from a single individual. By combining such data, we report the impact of rare and common loss-of-function variants on allelic expression exposing stronger allelic bias for rare stop-gain variants and informing the extent to which rare deleterious coding alleles are consistently expressed across tissues. This study demonstrates the potential importance of transcriptome data to the interpretation of pathogenic protein-coding variants. PMID:24786518
Polypeptide having or assisting in carbohydrate material degrading activity and uses thereof

DOEpatents

Schooneveld-Bergmans, Margot Elisabeth Francoise; Heijne, Wilbert Herman Marie; Los, Alrik Pieter

2016-02-16

The invention relates to a polypeptide which comprises the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 76% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 76% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well as the amino acid sequence of the full-length functional polypeptide and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.
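The "at least 76% sequence identity" threshold in the abstract above can be made concrete with a small calculation. The sketch below is an assumption-laden illustration: it takes two sequences that have already been aligned (gaps written as "-"), counts identical residues over the columns where neither sequence has a gap, and compares the result with a threshold. Patents normally specify their own alignment program and identity definition, which may differ from this one; the sequences and function names here are invented.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over aligned, ungapped columns of two sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Ignore columns where either sequence has a gap character.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if a != "-" and b != "-"]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)

def meets_threshold(aligned_a, aligned_b, threshold=76.0):
    """True if identity is at least the given percentage."""
    return percent_identity(aligned_a, aligned_b) >= threshold

# Invented 8-residue peptides differing at two positions: 6/8 identical.
identity = percent_identity("MKTAYIAK", "MKTGYLAK")
passes = meets_threshold("MKTAYIAK", "MKTGYLAK")
print(identity, passes)
```

The choice of denominator (aligned columns, shorter sequence, or full alignment length) changes the number, so any stated threshold only has meaning together with the definition used.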
Polypeptide having beta-glucosidase activity and uses thereof

DOE Office of Scientific and Technical Information (OSTI.GOV)

Schoonneveld-Bergmans, Margot Elisabeth Francoise; Heijne, Wilbert Herman Marie; De Jong, Rene Marcel

The invention relates to a polypeptide comprising the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 96% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 96% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well as the amino acid sequence of the full-length functional polypeptide and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.
Polypeptide having swollenin activity and uses thereof

DOEpatents

Schoonneveld-Bergmans, Margot Elizabeth Francoise; Heijne, Wilbert Herman Marie; Vlasie, Monica D; Damveld, Robbertus Antonius

2015-11-04

The invention relates to a polypeptide comprising the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 73% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 73% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well as the amino acid sequence of the full-length functional polypeptide and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.
Polypeptide having beta-glucosidase activity and uses thereof

DOEpatents

Schooneveld-Bergmans, Margot Elisabeth Francoise; Heijne, Wilbert Herman Marie; De Jong, Rene Marcel; Damveld, Robbertus Antonius

2015-09-01

The invention relates to a polypeptide comprising the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 70% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 70% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well as the amino acid sequence of the full-length functional polypeptide and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.
Polypeptide having cellobiohydrolase activity and uses thereof

DOEpatents

Sagt, Cornelis Maria Jacobus; Schooneveld-Bergmans, Margot Elisabeth Francoise; Roubos, Johannes Andries; Los, Alrik Pieter

2015-09-15

The invention relates to a polypeptide comprising the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 93% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 93% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well as the amino acid sequence of the full-length functional polypeptide and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.
Polypeptide having acetyl xylan esterase activity and uses thereof

DOEpatents

Schoonneveld-Bergmans, Margot Elisabeth Francoise; Heijne, Wilbert Herman Marie; Los, Alrik Pieter

2015-10-20

The invention relates to a polypeptide comprising the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 82% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 82% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well as the amino acid sequence of the full-length functional polypeptide and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.
Polypeptide having carbohydrate degrading activity and uses thereof

DOEpatents

Schooneveld-Bergmans, Margot Elisabeth Francoise; Heijne, Wilbert Herman Marie; Vlasie, Monica Diana; Damveld, Robbertus Antonius

2015-08-18

The invention relates to a polypeptide comprising the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 73% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 73% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well as the amino acid sequence of the full-length functional polypeptide and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.
Recurrent Coding Sequence Variation Explains Only A Small Fraction of the Genetic Architecture of Colorectal Cancer

PubMed Central

Timofeeva, Maria N.; Kinnersley, Ben; Farrington, Susan M.; Whiffin, Nicola; Palles, Claire; Svinti, Victoria; Lloyd, Amy; Gorman, Maggie; Ooi, Li-Yin; Hosking, Fay; Barclay, Ella; Zgaga, Lina; Dobbins, Sara; Martin, Lynn; Theodoratou, Evropi; Broderick, Peter; Tenesa, Albert; Smillie, Claire; Grimes, Graeme; Hayward, Caroline; Campbell, Archie; Porteous, David; Deary, Ian J.; Harris, Sarah E.; Northwood, Emma L.; Barrett, Jennifer H.; Smith, Gillian; Wolf, Roland; Forman, David; Morreau, Hans; Ruano, Dina; Tops, Carli; Wijnen, Juul; Schrumpf, Melanie; Boot, Arnoud; Vasen, Hans F A; Hes, Frederik J.; van Wezel, Tom; Franke, Andre; Lieb, Wolgang; Schafmayer, Clemens; Hampe, Jochen; Buch, Stephan; Propping, Peter; Hemminki, Kari; Försti, Asta; Westers, Helga; Hofstra, Robert; Pinheiro, Manuela; Pinto, Carla; Teixeira, Manuel; Ruiz-Ponte, Clara; Fernández-Rozadilla, Ceres; Carracedo, Angel; Castells, Antoni; Castellví-Bel, Sergi; Campbell, Harry; Bishop, D. Timothy; Tomlinson, Ian P M; Dunlop, Malcolm G.; Houlston, Richard S.

2015-01-01

Whilst common genetic variation in many non-coding genomic regulatory regions is known to impart risk of colorectal cancer (CRC), much of the heritability of CRC remains unexplained. To examine the role of recurrent coding sequence variation in CRC aetiology, we genotyped 12,638 CRC cases and 29,045 controls from six European populations. Single-variant analysis identified a coding variant (rs3184504) in SH2B3 (12q24) associated with CRC risk (OR = 1.08, P = 3.9 × 10^-7), and novel damaging coding variants in 3 genes previously tagged by GWAS efforts: rs16888728 (8q24) in UTP23 (OR = 1.15, P = 1.4 × 10^-7); rs6580742 and rs12303082 (12q13) in FAM186A (OR = 1.11, P = 1.2 × 10^-7 and OR = 1.09, P = 7.4 × 10^-8); rs1129406 (12q13) in ATF1 (OR = 1.11, P = 8.3 × 10^-9), all reaching exome-wide significance levels. Gene-based tests identified associations between CRC and PCDHGA genes (P < 2.90 × 10^-6). We found an excess of rare, damaging variants in base-excision (P = 2.4 × 10^-4) and DNA mismatch repair genes (P = 6.1 × 10^-4) consistent with a recessive mode of inheritance. This study comprehensively explores the contribution of coding sequence variation to CRC risk, identifying associations with coding variation in 4 genes and the PCDHG gene cluster and several candidate recessive alleles. However, these findings suggest that recurrent, low-frequency coding variants account for a minority of the unexplained heritability of CRC. PMID:26553438
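The odds ratios (OR) quoted in the abstract above summarize how much more common a variant is among cases than among controls. As a minimal sketch, the code below computes an OR and a Wald 95% confidence interval from a 2x2 table of carrier counts. The counts are invented for illustration and are not taken from the study, which would have used regression models with adjustment rather than this raw calculation.

```python
import math

def odds_ratio_with_ci(case_carriers, case_noncarriers,
                       control_carriers, control_noncarriers, z=1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    z=1.96 gives an approximate 95% interval. All four counts must be > 0.
    """
    a, b = case_carriers, case_noncarriers
    c, d = control_carriers, control_noncarriers
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical counts: 30 of 100 cases and 20 of 100 controls carry the variant.
or_value, ci_lower, ci_upper = odds_ratio_with_ci(30, 70, 20, 80)
print(round(or_value, 3), round(ci_lower, 3), round(ci_upper, 3))
```

An OR close to 1, such as the 1.08 to 1.15 values reported above, indicates a small per-variant effect, which is consistent with the abstract's conclusion that such variants explain only a minority of heritability.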
New genetic variants of LATS1 detected in urinary bladder and colon cancer.

PubMed

Saadeldin, Mona K; Shawer, Heba; Mostafa, Ahmed; Kassem, Neemat M; Amleh, Asma; Siam, Rania

2014-01-01

LATS1, the large tumor suppressor 1 gene, encodes a serine/threonine kinase protein and is implicated in cell cycle progression. LATS1 is down-regulated in various human cancers, such as breast cancer and astrocytoma. Point mutations in LATS1 have been reported in human sarcomas. Additionally, loss of heterozygosity of the LATS1 chromosomal region predisposes to breast, ovarian, and cervical tumors. In the current study, we investigated LATS1 genetic variations, including single nucleotide polymorphisms (SNPs), in 28 Egyptian patients with either urinary bladder or colon cancer. The LATS1 gene was amplified and sequenced, and the expression of LATS1 at the RNA level was assessed in 12 urinary bladder cancer samples. We report the identification of a total of 29 variants, including previously identified SNPs, within LATS1 coding and non-coding sequences. A total of 18 variants were novel. The majority of the novel variants (13) mapped to intronic sequences and untranslated regions of the gene. Four of the five novel variants located in the coding region of the gene represented missense mutations within the serine/threonine kinase catalytic domain. Interestingly, LATS1 RNA steady-state expression was lost in urinary bladder cancerous tissue harboring four specific SNPs (16045 + 41736 + 34614 + 56177) positioned in the 5'UTR, intron 6, and two silent mutations within exon 4 and exon 8, respectively. This study identifies novel single-base-sequence alterations in the LATS1 gene. These newly identified variants could potentially be used as novel diagnostic or prognostic tools in cancer.
Network perturbation by recurrent regulatory variants in cancer

PubMed Central

Cho, Ara; Lee, Insuk; Choi, Jung Kyoon

2017-01-01

Cancer driving genes have been identified as recurrently affected by variants that alter protein-coding sequences. However, a majority of cancer variants arise in noncoding regions, and some of them are thought to play a critical role through transcriptional perturbation. Here we identified putative transcriptional driver genes based on combinatorial variant recurrence in cis-regulatory regions. The identified genes showed high connectivity in the cancer type-specific transcription regulatory network, with high outdegree and many downstream genes, highlighting their causative role during tumorigenesis. In the protein interactome, the identified transcriptional drivers were not as highly connected as coding driver genes but appeared to form a network module centered on the coding drivers. The coding and regulatory variants associated via these interactions between the coding and transcriptional drivers showed exclusive and complementary occurrence patterns across tumor samples. Transcriptional cancer drivers may act through an extensive perturbation of the regulatory network and by altering protein network modules through interactions with coding driver genes. PMID:28333928

Pooled Sequencing of 531 Genes in Inflammatory Bowel Disease Identifies an Associated Rare Variant in BTNL2 and Implicates Other Immune Related Genes

PubMed Central

Prescott, Natalie J.; Lehne, Benjamin; Stone, Kristina; Lee, James C.; Taylor, Kirstin; Knight, Jo; Papouli, Efterpi; Mirza, Muddassar M.; Simpson, Michael A.; Spain, Sarah L.; Lu, Grace; Fraternali, Franca; Bumpstead, Suzannah J.; Gray, Emma; Amar, Ariella; Bye, Hannah; Green, Peter; Chung-Faye, Guy; Hayee, Bu’Hussain; Pollok, Richard; Satsangi, Jack; Parkes, Miles; Barrett, Jeffrey C.; Mansfield, John C.; Sanderson, Jeremy; Lewis, Cathryn M.; Weale, Michael E.; Schlitt, Thomas; Mathew, Christopher G.

2015-01-01

The contribution of rare coding sequence variants to genetic susceptibility in complex disorders is an important but unresolved question. Most studies thus far have investigated a limited number of genes from regions which contain common disease associated variants. Here we investigate this in inflammatory bowel disease by sequencing the exons and proximal promoters of 531 genes selected from both genome-wide association studies and pathway analysis in pooled DNA panels from 474 cases of Crohn’s disease and 480 controls. 80 variants with evidence of association in the sequencing experiment or with potential functional significance were selected for follow up genotyping in 6,507 IBD cases and 3,064 population controls. The top 5 disease associated variants were genotyped in an extension panel of 3,662 IBD cases and 3,639 controls, and tested for association in a combined analysis of 10,147 IBD cases and 7,008 controls. A rare coding variant p.G454C in the BTNL2 gene within the major histocompatibility complex was significantly associated with increased risk for IBD (p = 9.65 × 10^-10, OR = 2.3 [95% CI = 1.75–3.04]), but was independent of the known common associated CD and UC variants at this locus. Rare (<1%) and low frequency (1–5%) variants in 3 additional genes showed suggestive association (p < 0.005) with either an increased risk (ARIH2 c.338-6C>T) or decreased risk (IL12B p.V298F, and NICN p.H191R) of IBD. These results provide additional insights into the involvement of the inhibition of T cell activation in the development of both sub-phenotypes of inflammatory bowel disease. We suggest that although rare coding variants may make a modest overall contribution to complex disease susceptibility, they can inform our understanding of the molecular pathways that contribute to pathogenesis. PMID:25671699
Rare coding variation in paraoxonase-1 is associated with ischemic stroke in the NHLBI Exome Sequencing Project.

PubMed

Kim, Daniel Seung; Crosslin, David R; Auer, Paul L; Suzuki, Stephanie M; Marsillach, Judit; Burt, Amber A; Gordon, Adam S; Meschia, James F; Nalls, Mike A; Worrall, Bradford B; Longstreth, W T; Gottesman, Rebecca F; Furlong, Clement E; Peters, Ulrike; Rich, Stephen S; Nickerson, Deborah A; Jarvik, Gail P

2014-06-01

HDL-associated paraoxonase-1 (PON1) is an enzyme whose activity is associated with cerebrovascular disease. Common PON1 genetic variants have not been consistently associated with cerebrovascular disease. Rare coding variation that likely alters PON1 enzyme function may be more strongly associated with stroke. The National Heart, Lung, and Blood Institute Exome Sequencing Project sequenced the coding regions (exomes) of the genome for heart, lung, and blood-related phenotypes (including ischemic stroke). In this sample of 4,204 unrelated participants, 496 had verified, noncardioembolic ischemic stroke. After filtering, 28 nonsynonymous PON1 variants were identified. Analysis with the sequence kernel association test, adjusted for covariates, identified significant associations between PON1 variants and ischemic stroke (P = 3.01 × 10^-3). Stratified analyses demonstrated a stronger association of PON1 variants with ischemic stroke in African ancestry (AA) participants (P = 5.03 × 10^-3). Ethnic differences in the association between PON1 variants with stroke could be due to the effects of PON1Val109Ile (overall P = 7.88 × 10^-3; AA P = 6.52 × 10^-4), found at higher frequency in AA participants (1.16% vs. 0.02%) and whose protein is less stable than the common allele. In summary, rare genetic variation in PON1 was associated with ischemic stroke, with stronger associations identified in those of AA. Increased focus on PON1 enzyme function and its role in cerebrovascular disease is warranted.
The impact of rare variation on gene expression across tissues.

PubMed

Li, Xin; Kim, Yungil; Tsang, Emily K; Davis, Joe R; Damani, Farhan N; Chiang, Colby; Hess, Gaelen T; Zappala, Zachary; Strober, Benjamin J; Scott, Alexandra J; Li, Amy; Ganna, Andrea; Bassik, Michael C; Merker, Jason D; Hall, Ira M; Battle, Alexis; Montgomery, Stephen B

2017-10-11

Rare genetic variants are abundant in humans and are expected to contribute to individual disease risk. While genetic association studies have successfully identified common genetic variants associated with susceptibility, these studies are not practical for identifying rare variants. Efforts to distinguish pathogenic variants from benign rare variants have leveraged the genetic code to identify deleterious protein-coding alleles, but no analogous code exists for non-coding variants. Therefore, ascertaining which rare variants have phenotypic effects remains a major challenge. Rare non-coding variants have been associated with extreme gene expression in studies using single tissues, but their effects across tissues are unknown. Here we identify gene expression outliers, or individuals showing extreme expression levels for a particular gene, across 44 human tissues by using combined analyses of whole genomes and multi-tissue RNA-sequencing data from the Genotype-Tissue Expression (GTEx) project v6p release. We find that 58% of underexpression and 28% of overexpression outliers have nearby conserved rare variants compared to 8% of non-outliers. Additionally, we developed RIVER (RNA-informed variant effect on regulation), a Bayesian statistical model that incorporates expression data to predict a regulatory effect for rare variants with higher accuracy than models using genomic annotations alone. Overall, we demonstrate that rare variants contribute to large gene expression changes across tissues and provide an integrative method for interpretation of rare variants in individual genomes.
Clinical polyomavirus BK variants with agnogene deletion are non-functional but rescued by trans-complementation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Myhre, Marit Renee; Olsen, Gunn-Hege; Gosert, Rainer

High-level replication of polyomavirus BK (BKV) in kidney transplant recipients is associated with the emergence of BKV variants with rearranged (rr) non-coding control region (NCCR) increasing viral early gene expression and cytopathology. Cloning and sequencing revealed the presence of a BKV quasispecies which included non-functional variants when assayed in a recombinant virus assay. Here we report that the rr-NCCR of BKV variants RH-3 and RH-12, both bearing a NCCR deletion including the 5' end of the agnoprotein coding sequence, mediated early and late viral reporter gene expression in kidney cells. However, in a recombinant virus they failed to produce infectious progeny despite large T-antigen and VP1 expression and the formation of nuclear virus-like particles. Infectious progeny was generated when the agnogene was reconstructed in cis or agnoprotein provided in trans from a co-existing BKV rr-NCCR variant. We conclude that complementation can rescue non-functional BKV variants in vitro and possibly in vivo.
Novel GREM1 Variations in Sub-Saharan African Patients With Cleft Lip and/or Cleft Palate.

PubMed

Gowans, Lord Jephthah Joojo; Oseni, Ganiyu; Mossey, Peter A; Adeyemo, Wasiu Lanre; Eshete, Mekonen A; Busch, Tamara D; Donkor, Peter; Obiri-Yeboah, Solomon; Plange-Rhule, Gyikua; Oti, Alexander A; Owais, Arwa; Olaitan, Peter B; Aregbesola, Babatunde S; Oginni, Fadekemi O; Bello, Seidu A; Audu, Rosemary; Onwuamah, Chika; Agbenorku, Pius; Ogunlewe, Mobolanle O; Abdur-Rahman, Lukman O; Marazita, Mary L; Adeyemo, A A; Murray, Jeffrey C; Butali, Azeez

2018-05-01

Cleft lip and/or cleft palate (CL/P) are congenital anomalies of the face and have multifactorial etiology, with both environmental and genetic risk factors playing crucial roles. Though at least 40 loci have attained genomewide significant association with nonsyndromic CL/P, these loci largely reside in noncoding regions of the human genome, and subsequent resequencing studies of neighboring candidate genes have revealed only a limited number of etiologic coding variants. The present study was conducted to identify etiologic coding variants in GREM1, a locus that has been shown to be largely associated with cleft of both lip and soft palate. We resequenced DNA from 397 sub-Saharan Africans with CL/P and 192 controls using Sanger sequencing. Following analyses of the sequence data, we observed 2 novel coding variants in GREM1. These variants were not found in the 192 African controls and have never been previously reported in any public genetic variant database that includes more than 5000 combined African and African American controls or from the CL/P literature. The novel variants include p.Pro164Ser in an individual with soft palate cleft only and p.Gly61Asp in an individual with bilateral cleft lip and palate. The proband with the p.Gly61Asp GREM1 variant is a van der Woude (VWS) case who also has an etiologic variant in IRF6 gene. Our study demonstrated that there is low number of etiologic coding variants in GREM1, confirming earlier suggestions that variants in regulatory elements may largely account for the association between this locus and CL/P.
A Bioinformatics Workflow for Variant Peptide Detection in Shotgun Proteomics*

PubMed Central

Li, Jing; Su, Zengliu; Ma, Ze-Qiang; Slebos, Robbert J. C.; Halvey, Patrick; Tabb, David L.; Liebler, Daniel C.; Pao, William; Zhang, Bing

2011-01-01

Shotgun proteomics data analysis usually relies on database search. However, commonly used protein sequence databases do not contain information on protein variants and thus prevent variant peptides and proteins from being identified. Including known coding variations into protein sequence databases could help alleviate this problem. Based on our recently published human Cancer Proteome Variation Database, we have created a protein sequence database that comprehensively annotates thousands of cancer-related coding variants collected in the Cancer Proteome Variation Database as well as noncancer-specific ones from the Single Nucleotide Polymorphism Database (dbSNP). Using this database, we then developed a data analysis workflow for variant peptide identification in shotgun proteomics. The high risk of false positive variant identifications was addressed by a modified false discovery rate estimation method. Analysis of colorectal cancer cell lines SW480, RKO, and HCT-116 revealed a total of 81 peptides that contain either noncancer-specific or cancer-related variations. Twenty-three out of 26 variants randomly selected from the 81 were confirmed by genomic sequencing. We further applied the workflow to data sets from three individual colorectal tumor specimens. A total of 204 distinct variant peptides were detected, and five carried known cancer-related mutations. Each individual showed a specific pattern of cancer-related mutations, suggesting potential use of this type of information for personalized medicine. Compatibility of the workflow has been tested with four popular database search engines including Sequest, Mascot, X!Tandem, and MyriMatch. In summary, we have developed a workflow that effectively uses existing genomic data to enable variant peptide detection in proteomics. PMID:21389108
Rare, low frequency, and common coding variants in CHRNA5 and their contribution to nicotine dependence in European and African Americans

PubMed Central

Olfson, Emily; Saccone, Nancy L.; Johnson, Eric O.; Chen, Li-Shiun; Culverhouse, Robert; Doheny, Kimberly; Foltz, Steven M.; Fox, Louis; Gogarten, Stephanie M.; Hartz, Sarah; Hetrick, Kurt; Laurie, Cathy C.; Marosy, Beth; Amin, Najaf; Arnett, Donna; Barr, R. Graham; Bartz, Traci M.; Bertelsen, Sarah; Borecki, Ingrid B.; Brown, Michael R.; Chasman, Daniel I.; van Duijn, Cornelia M.; Feitosa, Mary F.; Fox, Ervin R.; Franceschini, Nora; Franco, Oscar H.; Grove, Megan L.; Guo, Xiuqing; Hofman, Albert; Kardia, Sharon L.R.; Morrison, Alanna C.; Musani, Solomon K.; Psaty, Bruce M.; Rao, D.C.; Reiner, Alex P.; Rice, Kenneth; Ridker, Paul M.; Rose, Lynda M.; Schick, Ursula M.; Schwander, Karen; Uitterlinden, Andre G.; Vojinovic, Dina; Wang, Jen-Chyong; Ware, Erin B.; Wilson, Gregory; Yao, Jie; Zhao, Wei; Breslau, Naomi; Hatsukami, Dorothy; Stitzel, Jerry A.; Rice, John; Goate, Alison; Bierut, Laura J.

2015-01-01

The common nonsynonymous variant rs16969968 in the α5 nicotinic receptor subunit gene (CHRNA5) is the strongest genetic risk factor for nicotine dependence in European Americans and contributes to risk in African Americans. To comprehensively examine whether other CHRNA5 coding variation influences nicotine dependence risk, we performed targeted sequencing on 1582 nicotine dependent cases (Fagerström Test for Nicotine Dependence score≥4) and 1238 non-dependent controls, with independent replication of common and low frequency variants using 12 studies with exome chip data. Nicotine dependence was examined using logistic regression with individual common variants (MAF≥0.05), aggregate low frequency variants (0.05>MAF≥0.005), and aggregate rare variants (MAF<0.005). Meta-analysis of primary results was performed with replication studies containing 12 174 heavy and 11 290 light smokers. Next-generation sequencing with 180X coverage identified 24 nonsynonymous variants and 2 frameshift deletions in CHRNA5, including 9 novel variants in the 2820 subjects. Meta-analysis confirmed the risk effect of the only common variant (rs16969968, European ancestry: OR=1.3, p=3.5×10−11; African ancestry: OR=1.3, p=0.01) and demonstrated that 3 low frequency variants contributed an independent risk (aggregate term, European ancestry: OR=1.3, p=0.005; African ancestry: OR=1.4, p=0.0006). The remaining 22 rare coding variants were associated with increased risk of nicotine dependence in the European American primary sample (OR=12.9, p=0.01) and in the same risk direction in African Americans (OR=1.5, p=0.37). Our results indicate that common, low frequency and rare CHRNA5 coding variants are independently associated with nicotine dependence risk. These newly identified variants likely influence risk for smoking-related diseases such as lung cancer. PMID:26239294
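The three frequency classes used in this abstract (common, MAF≥0.05; low frequency, 0.05>MAF≥0.005; rare, MAF<0.005) can be illustrated with a minimal Python sketch. The function names and the example variant identifiers and frequencies below are hypothetical and are not taken from the study; only the cut-offs come from the abstract.

```python
def frequency_class(maf: float) -> str:
    """Classify a variant by minor allele frequency (MAF).

    Cut-offs follow the abstract: common (MAF >= 0.05),
    low frequency (0.05 > MAF >= 0.005), rare (MAF < 0.005).
    """
    if not 0.0 <= maf <= 0.5:
        raise ValueError("a minor allele frequency must lie in [0, 0.5]")
    if maf >= 0.05:
        return "common"
    if maf >= 0.005:
        return "low_frequency"
    return "rare"


def group_variants(mafs: dict[str, float]) -> dict[str, list[str]]:
    """Group variant IDs by frequency class, e.g. for aggregate tests."""
    groups: dict[str, list[str]] = {"common": [], "low_frequency": [], "rare": []}
    for variant_id, maf in mafs.items():
        groups[frequency_class(maf)].append(variant_id)
    return groups


# Hypothetical variants with made-up frequencies, for illustration only.
example = {"variant_a": 0.35, "variant_b": 0.012, "variant_c": 0.0004}
print(group_variants(example))
```

In a design like the one described, common variants would be tested individually while the low frequency and rare groups would each be collapsed into an aggregate term; the sketch covers only the grouping step.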
Rare, low frequency and common coding variants in CHRNA5 and their contribution to nicotine dependence in European and African Americans.

PubMed

Olfson, E; Saccone, N L; Johnson, E O; Chen, L-S; Culverhouse, R; Doheny, K; Foltz, S M; Fox, L; Gogarten, S M; Hartz, S; Hetrick, K; Laurie, C C; Marosy, B; Amin, N; Arnett, D; Barr, R G; Bartz, T M; Bertelsen, S; Borecki, I B; Brown, M R; Chasman, D I; van Duijn, C M; Feitosa, M F; Fox, E R; Franceschini, N; Franco, O H; Grove, M L; Guo, X; Hofman, A; Kardia, S L R; Morrison, A C; Musani, S K; Psaty, B M; Rao, D C; Reiner, A P; Rice, K; Ridker, P M; Rose, L M; Schick, U M; Schwander, K; Uitterlinden, A G; Vojinovic, D; Wang, J-C; Ware, E B; Wilson, G; Yao, J; Zhao, W; Breslau, N; Hatsukami, D; Stitzel, J A; Rice, J; Goate, A; Bierut, L J

2016-05-01

The common nonsynonymous variant rs16969968 in the α5 nicotinic receptor subunit gene (CHRNA5) is the strongest genetic risk factor for nicotine dependence in European Americans and contributes to risk in African Americans. To comprehensively examine whether other CHRNA5 coding variation influences nicotine dependence risk, we performed targeted sequencing on 1582 nicotine-dependent cases (Fagerström Test for Nicotine Dependence score⩾4) and 1238 non-dependent controls, with independent replication of common and low frequency variants using 12 studies with exome chip data. Nicotine dependence was examined using logistic regression with individual common variants (minor allele frequency (MAF)⩾0.05), aggregate low frequency variants (0.05>MAF⩾0.005) and aggregate rare variants (MAF<0.005). Meta-analysis of primary results was performed with replication studies containing 12 174 heavy and 11 290 light smokers. Next-generation sequencing with 180× coverage identified 24 nonsynonymous variants and 2 frameshift deletions in CHRNA5, including 9 novel variants in the 2820 subjects. Meta-analysis confirmed the risk effect of the only common variant (rs16969968, European ancestry: odds ratio (OR)=1.3, P=3.5 × 10−11; African ancestry: OR=1.3, P=0.01) and demonstrated that three low frequency variants contributed an independent risk (aggregate term, European ancestry: OR=1.3, P=0.005; African ancestry: OR=1.4, P=0.0006). The remaining 22 rare coding variants were associated with increased risk of nicotine dependence in the European American primary sample (OR=12.9, P=0.01) and in the same risk direction in African Americans (OR=1.5, P=0.37). Our results indicate that common, low frequency and rare CHRNA5 coding variants are independently associated with nicotine dependence risk. These newly identified variants likely influence the risk for smoking-related diseases such as lung cancer.
Carbohydrate degrading polypeptide and uses thereof

DOEpatents

Sagt, Cornelis Maria Jacobus; Schooneveld-Bergmans, Margot Elisabeth Francoise; Roubos, Johannes Andries; Los, Alrik Pieter

2015-10-20

The invention relates to a polypeptide having carbohydrate material degrading activity which comprises the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1 or SEQ ID NO: 4, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 96% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 96% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well as the amino acid sequence of the full-length functional protein and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.
REVEL: An Ensemble Method for Predicting the Pathogenicity of Rare Missense Variants.

PubMed

Ioannidis, Nilah M; Rothstein, Joseph H; Pejaver, Vikas; Middha, Sumit; McDonnell, Shannon K; Baheti, Saurabh; Musolf, Anthony; Li, Qing; Holzinger, Emily; Karyadi, Danielle; Cannon-Albright, Lisa A; Teerlink, Craig C; Stanford, Janet L; Isaacs, William B; Xu, Jianfeng; Cooney, Kathleen A; Lange, Ethan M; Schleutker, Johanna; Carpten, John D; Powell, Isaac J; Cussenot, Olivier; Cancel-Tassin, Geraldine; Giles, Graham G; MacInnis, Robert J; Maier, Christiane; Hsieh, Chih-Lin; Wiklund, Fredrik; Catalona, William J; Foulkes, William D; Mandal, Diptasri; Eeles, Rosalind A; Kote-Jarai, Zsofia; Bustamante, Carlos D; Schaid, Daniel J; Hastie, Trevor; Ostrander, Elaine A; Bailey-Wilson, Joan E; Radivojac, Predrag; Thibodeau, Stephen N; Whittemore, Alice S; Sieh, Weiva

2016-10-06

The vast majority of coding variants are rare, and assessment of the contribution of rare variants to complex traits is hampered by low statistical power and limited functional data. Improved methods for predicting the pathogenicity of rare coding variants are needed to facilitate the discovery of disease variants from exome sequencing studies. We developed REVEL (rare exome variant ensemble learner), an ensemble method for predicting the pathogenicity of missense variants on the basis of individual tools: MutPred, FATHMM, VEST, PolyPhen, SIFT, PROVEAN, MutationAssessor, MutationTaster, LRT, GERP, SiPhy, phyloP, and phastCons. REVEL was trained with recently discovered pathogenic and rare neutral missense variants, excluding those previously used to train its constituent tools. When applied to two independent test sets, REVEL had the best overall performance (p < 10−12) as compared to any individual tool and seven ensemble methods: MetaSVM, MetaLR, KGGSeq, Condel, CADD, DANN, and Eigen. Importantly, REVEL also had the best performance for distinguishing pathogenic from rare neutral variants with allele frequencies <0.5%. The area under the receiver operating characteristic curve (AUC) for REVEL was 0.046-0.182 higher in an independent test set of 935 recent SwissVar disease variants and 123,935 putatively neutral exome sequencing variants and 0.027-0.143 higher in an independent test set of 1,953 pathogenic and 2,406 benign variants recently reported in ClinVar than the AUCs for other ensemble methods. We provide pre-computed REVEL scores for all possible human missense variants to facilitate the identification of pathogenic variants in the sea of rare variants discovered as sequencing studies expand in scale. Copyright © 2016 American Society of Human Genetics. All rights reserved.
Rare coding variation in paraoxonase-1 is associated with ischemic stroke in the NHLBI Exome Sequencing Project

PubMed Central

Kim, Daniel Seung; Crosslin, David R.; Auer, Paul L.; Suzuki, Stephanie M.; Marsillach, Judit; Burt, Amber A.; Gordon, Adam S.; Meschia, James F.; Nalls, Mike A.; Worrall, Bradford B.; Longstreth, W. T.; Gottesman, Rebecca F.; Furlong, Clement E.; Peters, Ulrike; Rich, Stephen S.; Nickerson, Deborah A.; Jarvik, Gail P.

2014-01-01

HDL-associated paraoxonase-1 (PON1) is an enzyme whose activity is associated with cerebrovascular disease. Common PON1 genetic variants have not been consistently associated with cerebrovascular disease. Rare coding variation that likely alters PON1 enzyme function may be more strongly associated with stroke. The National Heart, Lung, and Blood Institute Exome Sequencing Project sequenced the coding regions (exomes) of the genome for heart, lung, and blood-related phenotypes (including ischemic stroke). In this sample of 4,204 unrelated participants, 496 had verified, noncardioembolic ischemic stroke. After filtering, 28 nonsynonymous PON1 variants were identified. Analysis with the sequence kernel association test, adjusted for covariates, identified significant associations between PON1 variants and ischemic stroke (P = 3.01 × 10−3). Stratified analyses demonstrated a stronger association of PON1 variants with ischemic stroke in African ancestry (AA) participants (P = 5.03 × 10−3). Ethnic differences in the association between PON1 variants with stroke could be due to the effects of PON1Val109Ile (overall P = 7.88 × 10−3; AA P = 6.52 × 10−4), found at higher frequency in AA participants (1.16% vs. 0.02%) and whose protein is less stable than the common allele. In summary, rare genetic variation in PON1 was associated with ischemic stroke, with stronger associations identified in those of AA. Increased focus on PON1 enzyme function and its role in cerebrovascular disease is warranted. PMID:24711634
Chromatin accessibility prediction via a hybrid deep convolutional neural network.

PubMed

Liu, Qiao; Xia, Fei; Yin, Qijin; Jiang, Rui

2018-03-01

A majority of known genetic variants associated with human-inherited diseases lie in non-coding regions that lack adequate interpretation, making it indispensable to systematically discover functional sites at the whole genome level and precisely decipher their implications in a comprehensive manner. Although computational approaches have been complementing high-throughput biological experiments towards the annotation of the human genome, it still remains a big challenge to accurately annotate regulatory elements in the context of a specific cell type via automatic learning of the DNA sequence code from large-scale sequencing data. Indeed, the development of an accurate and interpretable model to learn the DNA sequence signature and further enable the identification of causative genetic variants has become essential in both genomic and genetic studies. We proposed Deopen, a hybrid framework mainly based on a deep convolutional neural network, to automatically learn the regulatory code of DNA sequences and predict chromatin accessibility. In a series of comparisons with existing methods, we show the superior performance of our model in not only the classification of accessible regions against background sequences sampled at random, but also the regression of DNase-seq signals. Besides, we further visualize the convolutional kernels and show the match of identified sequence signatures and known motifs. We finally demonstrate the sensitivity of our model in finding causative noncoding variants in the analysis of a breast cancer dataset. We expect to see wide applications of Deopen with either public or in-house chromatin accessibility data in the annotation of the human genome and the identification of non-coding variants associated with diseases. Deopen is freely available at https://github.com/kimmo1019/Deopen. Contact: ruijiang@tsinghua.edu.cn. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved.
Functional Testing of SLC26A4 Variants—Clinical and Molecular Analysis of a Cohort with Enlarged Vestibular Aqueduct from Austria

PubMed Central

Bernardinelli, Emanuele; Nofziger, Charity; Patsch, Wolfgang; Rasp, Gerd; Paulmichl, Markus; Dossena, Silvia

2018-01-01

The prevalence and spectrum of sequence alterations in the SLC26A4 gene, which codes for the anion exchanger pendrin, are population-specific and account for at least 50% of cases of non-syndromic hearing loss associated with an enlarged vestibular aqueduct. A cohort of nineteen patients from Austria with hearing loss and a radiological alteration of the vestibular aqueduct underwent Sanger sequencing of SLC26A4 and GJB2, coding for connexin 26. The pathogenicity of sequence alterations detected was assessed by determining ion transport and molecular features of the corresponding SLC26A4 protein variants. In this group, four uncharacterized sequence alterations within the SLC26A4 coding region were found. Three of these lead to protein variants with abnormal functional and molecular features, while one should be considered with no pathogenic potential. Pathogenic SLC26A4 sequence alterations were only found in 12% of patients. SLC26A4 sequence alterations commonly found in other Caucasian populations were not detected. This survey represents the first study on the prevalence and spectrum of SLC26A4 sequence alterations in an Austrian cohort and further suggests that genetic testing should always be integrated with functional characterization and determination of the molecular features of protein variants in order to unequivocally identify or exclude a causal link between genotype and phenotype. PMID:29320412
A survey of single nucleotide polymorphisms identified from whole-genome sequencing and their functional effect in the porcine genome.

PubMed

Keel, B N; Nonneman, D J; Rohrer, G A

2017-08-01

Genetic variants detected from sequence have been used to successfully identify causal variants and map complex traits in several organisms. High and moderate impact variants, those expected to alter or disrupt the protein coded by a gene and those that regulate protein production, likely have a more significant effect on phenotypic variation than do other types of genetic variants. Hence, a comprehensive list of these functional variants would be of considerable interest in swine genomic studies, particularly those targeting fertility and production traits. Whole-genome sequence was obtained from 72 of the founders of an intensely phenotyped experimental swine herd at the U.S. Meat Animal Research Center (USMARC). These animals included all 24 of the founding boars (12 Duroc and 12 Landrace) and 48 Yorkshire-Landrace composite sows. Sequence reads were mapped to the Sscrofa10.2 genome build, resulting in a mean of 6.1 fold (×) coverage per genome. A total of 22 342 915 high confidence SNPs were identified from the sequenced genomes. These included 21 million previously reported SNPs and 79% of the 62 163 SNPs on the PorcineSNP60 BeadChip assay. Variation was detected in the coding sequence or untranslated regions (UTRs) of 87.8% of the genes in the porcine genome: loss-of-function variants were predicted in 504 genes, 10 202 genes contained nonsynonymous variants, 10 773 had variation in UTRs and 13 010 genes contained synonymous variants. Approximately 139 000 SNPs were classified as loss-of-function, nonsynonymous or regulatory, which suggests that over 99% of the variation detected in our pigs could potentially be ignored, allowing us to focus on a much smaller number of functional SNPs during future analyses. Published 2017. This article is a U.S. Government work and is in the public domain in the USA.
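The "over 99%" statement follows directly from the two counts reported in this abstract, as the short Python check below illustrates. The variable names are illustrative, and 139 000 is the abstract's approximate figure, so the result is approximate as well.

```python
# Counts quoted in the abstract above.
total_snps = 22_342_915      # high confidence SNPs identified
functional_snps = 139_000    # approx. loss-of-function, nonsynonymous or regulatory

# Share of SNPs in the functional classes, and the remainder.
functional_fraction = functional_snps / total_snps
ignorable_fraction = 1 - functional_fraction

print(f"functional: {functional_fraction:.2%}")
print(f"remaining:  {ignorable_fraction:.2%}")
```

The functional classes come to roughly 0.6% of all detected SNPs, which is consistent with the claim that more than 99% of the detected variation falls outside them.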
Rare coding variants in the phospholipase D3 gene confer risk for Alzheimer's disease

NASA Astrophysics Data System (ADS)

2014-01-01

Genome-wide association studies (GWAS) have identified several risk variants for late-onset Alzheimer's disease (LOAD). These common variants have replicable but small effects on LOAD risk and generally do not have obvious functional effects. Low-frequency coding variants, not detected by GWAS, are predicted to include functional variants with larger effects on risk. To identify low-frequency coding variants with large effects on LOAD risk, we carried out whole-exome sequencing (WES) in 14 large LOAD families and follow-up analyses of the candidate variants in several large LOAD case-control data sets. A rare variant in PLD3 (phospholipase D3; Val232Met) segregated with disease status in two independent families and doubled risk for Alzheimer's disease in seven independent case-control series with a total of more than 11,000 cases and controls of European descent. Gene-based burden analyses in 4,387 cases and controls of European descent and 302 African American cases and controls, with complete sequence data for PLD3, reveal that several variants in this gene increase risk for Alzheimer's disease in both populations. PLD3 is highly expressed in brain regions that are vulnerable to Alzheimer's disease pathology, including hippocampus and cortex, and is expressed at significantly lower levels in neurons from Alzheimer's disease brains compared to control brains. Overexpression of PLD3 leads to a significant decrease in intracellular amyloid-β precursor protein (APP) and extracellular Aβ42 and Aβ40 (the 42- and 40-residue isoforms of the amyloid-β peptide), and knockdown of PLD3 leads to a significant increase in extracellular Aβ42 and Aβ40. Together, our genetic and functional data indicate that carriers of PLD3 coding variants have a twofold increased risk for LOAD and that PLD3 influences APP processing. This study provides an example of how densely affected families may help to identify rare variants with large effects on risk for disease or other complex traits.
Rare coding variants in the phospholipase D3 gene confer risk for Alzheimer's disease.

PubMed

Cruchaga, Carlos; Karch, Celeste M; Jin, Sheng Chih; Benitez, Bruno A; Cai, Yefei; Guerreiro, Rita; Harari, Oscar; Norton, Joanne; Budde, John; Bertelsen, Sarah; Jeng, Amanda T; Cooper, Breanna; Skorupa, Tara; Carrell, David; Levitch, Denise; Hsu, Simon; Choi, Jiyoon; Ryten, Mina; Sassi, Celeste; Bras, Jose; Gibbs, Raphael J; Hernandez, Dena G; Lupton, Michelle K; Powell, John; Forabosco, Paola; Ridge, Perry G; Corcoran, Christopher D; Tschanz, JoAnn T; Norton, Maria C; Munger, Ronald G; Schmutz, Cameron; Leary, Maegan; Demirci, F Yesim; Bamne, Mikhil N; Wang, Xingbin; Lopez, Oscar L; Ganguli, Mary; Medway, Christopher; Turton, James; Lord, Jenny; Braae, Anne; Barber, Imelda; Brown, Kristelle; Pastor, Pau; Lorenzo-Betancor, Oswaldo; Brkanac, Zoran; Scott, Erick; Topol, Eric; Morgan, Kevin; Rogaeva, Ekaterina; Singleton, Andy; Hardy, John; Kamboh, M Ilyas; George-Hyslop, Peter St; Cairns, Nigel; Morris, John C; Kauwe, John S K; Goate, Alison M

2014-01-23

Genome-wide association studies (GWAS) have identified several risk variants for late-onset Alzheimer's disease (LOAD). These common variants have replicable but small effects on LOAD risk and generally do not have obvious functional effects. Low-frequency coding variants, not detected by GWAS, are predicted to include functional variants with larger effects on risk. To identify low-frequency coding variants with large effects on LOAD risk, we carried out whole-exome sequencing (WES) in 14 large LOAD families and follow-up analyses of the candidate variants in several large LOAD case-control data sets. A rare variant in PLD3 (phospholipase D3; Val232Met) segregated with disease status in two independent families and doubled risk for Alzheimer's disease in seven independent case-control series with a total of more than 11,000 cases and controls of European descent. Gene-based burden analyses in 4,387 cases and controls of European descent and 302 African American cases and controls, with complete sequence data for PLD3, reveal that several variants in this gene increase risk for Alzheimer's disease in both populations. PLD3 is highly expressed in brain regions that are vulnerable to Alzheimer's disease pathology, including hippocampus and cortex, and is expressed at significantly lower levels in neurons from Alzheimer's disease brains compared to control brains. Overexpression of PLD3 leads to a significant decrease in intracellular amyloid-β precursor protein (APP) and extracellular Aβ42 and Aβ40 (the 42- and 40-residue isoforms of the amyloid-β peptide), and knockdown of PLD3 leads to a significant increase in extracellular Aβ42 and Aβ40. Together, our genetic and functional data indicate that carriers of PLD3 coding variants have a twofold increased risk for LOAD and that PLD3 influences APP processing. This study provides an example of how densely affected families may help to identify rare variants with large effects on risk for disease or other complex traits.
MACARON: A python framework to identify and re-annotate multi-base affected codons in whole genome/exome sequence data.

PubMed

Khan, Waqasuddin; Saripella, Ganapathi Varma-; Ludwig, Thomas; Cuppens, Tania; Thibord, Florian; Génin, Emmanuelle; Deleuze, Jean-Francois; Trégouët, David-Alexandre

2018-05-03

Predicted deleteriousness of coding variants is a frequently used criterion to filter out variants detected in next-generation sequencing projects and to select candidates impacting on the risk of human diseases. Most available dedicated tools implement a base-to-base annotation approach that could be biased in the presence of several variants in the same genetic codon. Here we propose the MACARON program that, from a standard VCF file, identifies, re-annotates and predicts the amino acid change resulting from multiple single nucleotide variants (SNVs) within the same genetic codon. Applied to the whole exome dataset of 573 individuals, MACARON identifies 114 situations where multiple SNVs within a genetic codon induce an amino acid change that is different from those predicted by standard single-SNV annotation tools. Such events are not uncommon and deserve to be studied in sequencing projects with inconclusive findings. MACARON is written in Python, with code available on the GENMED website (www.genmed.fr). Contact: david-alexandre.tregouet@inserm.fr. Supplementary data are available at Bioinformatics online.
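To illustrate why per-SNV annotation can mislead when two SNVs fall in one codon, here is a minimal Python sketch. It is not the MACARON implementation. The function name `annotate_codon` and its input format are hypothetical, and the sketch assumes that the SNVs lie on the same haplotype, that the codon is given on the coding strand, and that the standard genetic code applies.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def annotate_codon(ref_codon: str, snvs: dict[int, str]):
    """Compare one-at-a-time and combined annotation of SNVs in a codon.

    ref_codon: reference codon on the coding strand, e.g. "CAG".
    snvs: maps codon position (0, 1 or 2) to the alternate base.
    Returns (reference amino acid, per-SNV amino acids, combined amino acid).
    """
    # Base-to-base view: apply each SNV alone to the reference codon.
    per_snv = []
    for pos, alt in sorted(snvs.items()):
        single = ref_codon[:pos] + alt + ref_codon[pos + 1:]
        per_snv.append(CODON_TABLE[single])

    # Codon-level view: apply all SNVs together (assumes they are in cis).
    combined = list(ref_codon)
    for pos, alt in snvs.items():
        combined[pos] = alt

    return CODON_TABLE[ref_codon], per_snv, CODON_TABLE["".join(combined)]


# CAG (Gln): C>T at position 0 alone gives TAG (stop), G>C at position 2
# alone gives CAC (His), but both together give TAC (Tyr).
print(annotate_codon("CAG", {0: "T", 2: "C"}))
```

In this toy case a single-SNV annotator would report a stop gain and a Gln-to-His change, whereas the codon actually encodes tyrosine when both alternate alleles are carried on the same haplotype.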
Exon 11 skipping of SCN10A coding for voltage-gated sodium channels in dorsal root ganglia

PubMed Central

Schirmeyer, Jana; Szafranski, Karol; Leipold, Enrico; Mawrin, Christian; Platzer, Matthias; Heinemann, Stefan H

2014-01-01

The voltage-gated sodium channel NaV1.8 (encoded by SCN10A) is predominantly expressed in dorsal root ganglia (DRG) and plays a critical role in pain perception. We analyzed SCN10A transcripts isolated from human DRGs using deep sequencing and found a novel splice variant lacking exon 11, which codes for 98 amino acids of the domain I/II linker. Quantitative PCR analysis revealed an abundance of this variant of up to 5–10% in human, while no such variants were detected in mouse or rat. Since no obvious functional differences between channels with and without the exon-11 sequence were detected, it is suggested that SCN10A exon 11 skipping in humans is a tolerated event. PMID:24763188
Whole exome sequencing for familial bicuspid aortic valve identifies putative variants.

PubMed

Martin, Lisa J; Pilipenko, Valentina; Kaufman, Kenneth M; Cripe, Linda; Kottyan, Leah C; Keddache, Mehdi; Dexheimer, Phillip; Weirauch, Matthew T; Benson, D Woodrow

2014-10-01

Bicuspid aortic valve (BAV) is the most common congenital cardiovascular malformation. Although highly heritable, few causal variants have been identified. The purpose of this study was to identify genetic variants underlying BAV by whole exome sequencing a multiplex BAV kindred. Whole exome sequencing was performed on 17 individuals from a single family (BAV=3; other cardiovascular malformation, 3). Postvariant calling error control metrics were established after examining the relationship between Mendelian inheritance error rate and coverage, quality score, and call rate. To determine the most effective approach to identifying susceptibility variants from among 54 674 variants passing error control metrics, we evaluated 3 variant selection strategies frequently used in whole exome sequencing studies plus extended family linkage. No putative rare, high-effect variants were identified in all affected but no unaffected individuals. Eight high-effect variants were identified by ≥2 of the commonly used selection strategies; however, these were either common in the general population (>10%) or present in the majority of the unaffected family members. However, using extended family linkage, 3 synonymous variants were identified; all 3 variants were identified by at least one other strategy. These results suggest that traditional whole exome sequencing approaches, which assume causal variants alter coding sense, may be insufficient for BAV and other complex traits. Identification of disease-associated variants is facilitated by the use of segregation within families. © 2014 American Heart Association, Inc.
Platelet function is modified by common sequence variation in megakaryocyte super enhancers

PubMed Central

Petersen, Romina; Lambourne, John J.; Javierre, Biola M.; Grassi, Luigi; Kreuzhuber, Roman; Ruklisa, Dace; Rosa, Isabel M.; Tomé, Ana R.; Elding, Heather; van Geffen, Johanna P.; Jiang, Tao; Farrow, Samantha; Cairns, Jonathan; Al-Subaie, Abeer M.; Ashford, Sofie; Attwood, Antony; Batista, Joana; Bouman, Heleen; Burden, Frances; Choudry, Fizzah A.; Clarke, Laura; Flicek, Paul; Garner, Stephen F.; Haimel, Matthias; Kempster, Carly; Ladopoulos, Vasileios; Lenaerts, An-Sofie; Materek, Paulina M.; McKinney, Harriet; Meacham, Stuart; Mead, Daniel; Nagy, Magdolna; Penkett, Christopher J.; Rendon, Augusto; Seyres, Denis; Sun, Benjamin; Tuna, Salih; van der Weide, Marie-Elise; Wingett, Steven W.; Martens, Joost H.; Stegle, Oliver; Richardson, Sylvia; Vallier, Ludovic; Roberts, David J.; Freson, Kathleen; Wernisch, Lorenz; Stunnenberg, Hendrik G.; Danesh, John; Fraser, Peter; Soranzo, Nicole; Butterworth, Adam S.; Heemskerk, Johan W.; Turro, Ernest; Spivakov, Mikhail; Ouwehand, Willem H.; Astle, William J.; Downes, Kate; Kostadima, Myrto; Frontini, Mattia

2017-01-01

Linking non-coding genetic variants associated with the risk of diseases or disease-relevant traits to target genes is a crucial step to realize GWAS potential in the introduction of precision medicine. Here we set out to determine the mechanisms underpinning variant association with platelet quantitative traits using cell type-matched epigenomic data and promoter long-range interactions. We identify potential regulatory functions for 423 of 565 (75%) non-coding variants associated with platelet traits and we demonstrate, through ex vivo and proof of principle genome editing validation, that variants in super enhancers play an important role in controlling archetypical platelet functions. PMID:28703137

Nucleotide sequence determination of guinea-pig casein B mRNA reveals homology with bovine and rat alpha s1 caseins and conservation of the non-coding regions of the mRNA.

PubMed Central

Hall, L; Laird, J E; Craig, R K

1984-01-01

Nucleotide sequence analysis of cloned guinea-pig casein B cDNA sequences has identified two casein B variants related to the bovine and rat alpha s1 caseins. Amino acid homology was largely confined to the known bovine or predicted rat phosphorylation sites and within the 'signal' precursor sequence. Comparison of the deduced nucleotide sequence of the guinea-pig and rat alpha s1 casein mRNA species showed greater sequence conservation in the non-coding than in the coding regions, suggesting a functional and possibly regulatory role for the non-coding regions of casein mRNA. The results provide insight into the evolution of the casein genes, and raise questions as to the role of conserved nucleotide sequences within the non-coding regions of mRNA species. PMID:6548375
A benchmark study of scoring methods for non-coding mutations.

PubMed

Drubay, Damien; Gautheret, Daniel; Michiels, Stefan

2018-05-15

Detailed knowledge of coding sequences has led to different candidate models for pathogenic variant prioritization. Several deleteriousness scores have been proposed for the non-coding part of the genome, but no large-scale comparison has been realized to date to assess their performance. We compared the leading scoring tools (CADD, FATHMM-MKL, Funseq2 and GWAVA) and some recent competitors (DANN, SNP and SOM scores) for their ability to discriminate assumed pathogenic variants from assumed benign variants (using the ClinVar, COSMIC and 1000 genomes project databases). Using the ClinVar benchmark, CADD was the best tool for detecting the pathogenic variants that are mainly located in protein coding gene regions. Using the COSMIC benchmark, FATHMM-MKL, GWAVA and SOMliver outperformed the other tools for pathogenic variants that are typically located in lincRNAs, pseudogenes and other parts of the non-coding genome. However, all tools had low precision, which could potentially be improved by future non-coding genome feature discoveries. These results may have been influenced by the presence of potential benign variants in the COSMIC database. The development of a gold standard as consistent as ClinVar for these regions will be necessary to confirm our tool ranking. The Snakemake, C++ and R codes are freely available from https://github.com/Oncostat/BenchmarkNCVTools and supported on Linux. damien.drubay@gustaveroussy.fr or stefan.michiels@gustaveroussy.fr. Supplementary data are available at Bioinformatics online.
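To make the distinction between discrimination and precision concrete, here is a minimal Python sketch with invented scores and labels; it is not taken from the BenchmarkNCVTools repository. The function names and the threshold of 0.5 are assumptions chosen for illustration.

```python
def roc_auc(scores, labels):
    """Probability that a random pathogenic variant (label 1) outscores a
    random benign one (label 0); ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def precision(scores, labels, threshold):
    """Fraction of variants called pathogenic (score >= threshold) that truly are."""
    called = [y for s, y in zip(scores, labels) if s >= threshold]
    return sum(called) / len(called) if called else float("nan")


# Invented toy data: 2 pathogenic and 6 benign variants.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 0, 1, 0, 0, 0, 0, 0]

auc = roc_auc(scores, labels)
prec = precision(scores, labels, threshold=0.5)
print(f"AUC = {auc:.3f}, precision at 0.5 = {prec:.2f}")
```

With these toy values the ranking is good (AUC of 11/12) while only half of the calls above the threshold are truly pathogenic, which shows how a tool can discriminate well and still have low precision when pathogenic variants are scarce.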
Haplotype block structure study of the CFTR gene. Most variants are associated with the M470 allele in several European populations.

PubMed

Pompei, Fiorenza; Ciminelli, Bianca Maria; Bombieri, Cristina; Ciccacci, Cinzia; Koudova, Monika; Giorgi, Silvia; Belpinati, Francesca; Begnini, Angela; Cerny, Milos; Des Georges, Marie; Claustres, Mireille; Ferec, Claude; Macek, Milan; Modiano, Guido; Pignatti, Pier Franco

2006-01-01

An average of about 1700 CFTR (cystic fibrosis transmembrane conductance regulator) alleles from normal individuals from different European populations were extensively screened for DNA sequence variation. A total of 80 variants were observed: 61 coding SNSs (results already published), 13 noncoding SNSs, three STRs, two short deletions, and one nucleotide insertion. Eight DNA variants were classified as non-CF causing due to their high frequency of occurrence. Through this survey the CFTR has become the most exhaustively studied gene for its coding sequence variability and, though to a lesser extent, for its noncoding sequence variability as well. Interestingly, most variation was associated with the M470 allele, while the V470 allele showed an 'extended haplotype homozygosity' (EHH). These findings lead us to suggest a role for selection acting either on M470V itself or through a hitchhiking mechanism involving a second site. The possible ancient origin of the V allele in an 'out of Africa' time frame is discussed.
Common and rare von Willebrand factor (VWF) coding variants, VWF levels, and factor VIII levels in African Americans: the NHLBI Exome Sequencing Project.

PubMed

Johnsen, Jill M; Auer, Paul L; Morrison, Alanna C; Jiao, Shuo; Wei, Peng; Haessler, Jeffrey; Fox, Keolu; McGee, Sean R; Smith, Joshua D; Carlson, Christopher S; Smith, Nicholas; Boerwinkle, Eric; Kooperberg, Charles; Nickerson, Deborah A; Rich, Stephen S; Green, David; Peters, Ulrike; Cushman, Mary; Reiner, Alex P

2013-07-25

Several rare European von Willebrand disease missense variants of VWF (including p.Arg2185Gln and p.His817Gln) were recently reported to be common in apparently healthy African Americans (AAs). Using data from the NHLBI Exome Sequencing Project, we assessed the association of these and other VWF coding variants with von Willebrand factor (VWF) and factor VIII (FVIII) levels in 4468 AAs. Of 30 nonsynonymous VWF variants, 6 were significantly and independently associated (P < .001) with levels of VWF and/or FVIII. Each additional copy of the common VWF variants encoding p.Thr789Ala or p.Asp1472His was associated with 6 to 8 IU/dL higher VWF levels. The VWF variant encoding p.Arg2185Gln was associated with 7 to 13 IU/dL lower VWF and FVIII levels. The type 2N-related VWF variant encoding p.His817Gln was associated with 17 IU/dL lower FVIII level but normal VWF level. A novel, rare missense VWF variant that predicts disruption of an O-glycosylation site (p.Ser1486Leu) and a rare variant encoding p.Arg2287Trp were each associated with 30 to 40 IU/dL lower VWF level (P < .001). In summary, several common and rare VWF missense variants contribute to phenotypic differences in VWF and FVIII among AAs.
Rare variants and autoimmune disease.

PubMed

Massey, Jonathan; Eyre, Steve

2014-09-01

The study of rare variants in monogenic forms of autoimmune disease has offered insight into the aetiology of more complex pathologies. Research in complex autoimmune disease initially focused on sequencing candidate genes, with some early successes, notably in uncovering low-frequency variation associated with Type 1 diabetes mellitus. However, other early examples have proved difficult to replicate, and a recent study across six autoimmune diseases, re-sequencing 25 autoimmune disease-associated genes in large sample sizes, failed to find any associated rare variants. The study of rare and low-frequency variation in autoimmune diseases has been made accessible by the inclusion of such variants on custom genotyping arrays (e.g. Immunochip and Exome arrays). Whole-exome sequencing approaches are now also being utilised to uncover the contribution of rare coding variants to disease susceptibility, severity and treatment response. Other sequencing strategies are starting to uncover the role of regulatory rare variation. © The Author 2014. Published by Oxford University Press. All rights reserved. For permissions, please email: journals.permissions@oup.com.
Large-scale whole-genome sequencing of the Icelandic population.

PubMed

Gudbjartsson, Daniel F; Helgason, Hannes; Gudjonsson, Sigurjon A; Zink, Florian; Oddson, Asmundur; Gylfason, Arnaldur; Besenbacher, Soren; Magnusson, Gisli; Halldorsson, Bjarni V; Hjartarson, Eirikur; Sigurdsson, Gunnar Th; Stacey, Simon N; Frigge, Michael L; Holm, Hilma; Saemundsdottir, Jona; Helgadottir, Hafdis Th; Johannsdottir, Hrefna; Sigfusson, Gunnlaugur; Thorgeirsson, Gudmundur; Sverrisson, Jon Th; Gretarsdottir, Solveig; Walters, G Bragi; Rafnar, Thorunn; Thjodleifsson, Bjarni; Bjornsson, Einar S; Olafsson, Sigurdur; Thorarinsdottir, Hildur; Steingrimsdottir, Thora; Gudmundsdottir, Thora S; Theodors, Asgeir; Jonasson, Jon G; Sigurdsson, Asgeir; Bjornsdottir, Gyda; Jonsson, Jon J; Thorarensen, Olafur; Ludvigsson, Petur; Gudbjartsson, Hakon; Eyjolfsson, Gudmundur I; Sigurdardottir, Olof; Olafsson, Isleifur; Arnar, David O; Magnusson, Olafur Th; Kong, Augustine; Masson, Gisli; Thorsteinsdottir, Unnur; Helgason, Agnar; Sulem, Patrick; Stefansson, Kari

2015-05-01

Here we describe the insights gained from sequencing the whole genomes of 2,636 Icelanders to a median depth of 20×. We found 20 million SNPs and 1.5 million insertions-deletions (indels). We describe the density and frequency spectra of sequence variants in relation to their functional annotation, gene position, pathway and conservation score. We demonstrate an excess of homozygosity and rare protein-coding variants in Iceland. We imputed these variants into 104,220 individuals down to a minor allele frequency of 0.1% and found a recessive frameshift mutation in MYL4 that causes early-onset atrial fibrillation, several mutations in ABCB4 that increase risk of liver diseases and an intronic variant in GNAS associating with increased thyroid-stimulating hormone levels when maternally inherited. These data provide a study design that can be used to determine how variation in the sequence of the human genome gives rise to human diversity.
Implication of common and disease specific variants in CLU, CR1, and PICALM.

PubMed

Ferrari, Raffaele; Moreno, Jorge H; Minhajuddin, Abu T; O'Bryant, Sid E; Reisch, Joan S; Barber, Robert C; Momeni, Parastoo

2012-08-01

Two recent genome-wide association studies (GWAS) for late onset Alzheimer's disease (LOAD) revealed 3 new genes: clusterin (CLU), phosphatidylinositol binding clathrin assembly protein (PICALM), and complement receptor 1 (CR1). In order to evaluate association with these genome-wide association study-identified genes and to isolate the variants contributing to the pathogenesis of LOAD, we genotyped the top single nucleotide polymorphisms (SNPs), rs11136000 (CLU), rs3818361 (CR1), and rs3851179 (PICALM), and sequenced the entire coding regions of these genes in our cohort of 342 LOAD patients and 277 control subjects. We confirmed the association of rs3851179 (PICALM) (p = 7.4 × 10⁻³) with the disease status. Through sequencing we identified 18 variants in CLU, 3 of which were found exclusively in patients; 8 variants (out of 65) in CR1 gene were only found in patients and the 16 variants identified in PICALM gene were present in both patients and controls. In silico analysis of the variants in PICALM did not predict any damaging effect on the protein. The haplotype analysis of the variants in each gene predicted a common haplotype when the 3 single nucleotide polymorphisms rs11136000 (CLU), rs3818361 (CR1), and rs3851179 (PICALM), respectively, were included. For each gene the haplotype structure and size differed between patients and controls. In conclusion, we confirmed association of CLU, CR1, and PICALM genes with the disease status in our cohort through identification of a number of disease-specific variants among patients through the sequencing of the coding region of these genes. Published by Elsevier Inc.
Characterization of the two intra-individual sequence variants in the 18S rRNA gene in the plant parasitic nematode, Rotylenchulus reniformis.

PubMed

Nyaku, Seloame T; Sripathi, Venkateswara R; Kantety, Ramesh V; Gu, Yong Q; Lawrence, Kathy; Sharma, Govind C

2013-01-01

The 18S rRNA gene is fundamental to cellular and organismal protein synthesis and, because of its stable persistence through generations, it is also used in phylogenetic analysis among taxa. Sequence variation in this gene within a single species is rare, but it has been observed in a few metazoan organisms. More frequently, variation has been reported in the non-transcribed spacer region. Here, we have identified two sequence variants within the near full coding region of the 18S rRNA gene from a single reniform nematode (RN), Rotylenchulus reniformis, labeled as reniform nematode variant 1 (RN_VAR1) and variant 2 (RN_VAR2). All sequences from three of the four isolates had both RN variants in their sequences; however, isolate 13B had only the RN variant 2 sequence. Specific variable base sites (96 or 5.5%) were found within the 18S rRNA gene that can clearly distinguish the two 18S rDNA variants of RN, in 11 (25.0%) and 33 (75.0%) of the 44 RN clones, for RN_VAR1 and RN_VAR2, respectively. Neighbor-joining trees show that RN_VAR1 is very similar to the previously existing R. reniformis sequence in GenBank, while the RN_VAR2 sequence is more divergent. This is the first report of the identification of two major variants of the 18S rRNA gene in the same single RN; it documents the specific base variation between the two variants and hypothesizes on the simultaneous co-existence of these two variants for this gene.
Characterization of the Two Intra-Individual Sequence Variants in the 18S rRNA Gene in the Plant Parasitic Nematode, Rotylenchulus reniformis

PubMed Central

Nyaku, Seloame T.; Sripathi, Venkateswara R.; Kantety, Ramesh V.; Gu, Yong Q.; Lawrence, Kathy; Sharma, Govind C.

2013-01-01

The 18S rRNA gene is fundamental to cellular and organismal protein synthesis and, because of its stable persistence through generations, it is also used in phylogenetic analysis among taxa. Sequence variation in this gene within a single species is rare, but it has been observed in a few metazoan organisms. More frequently, variation has been reported in the non-transcribed spacer region. Here, we have identified two sequence variants within the near full coding region of the 18S rRNA gene from a single reniform nematode (RN), Rotylenchulus reniformis, labeled as reniform nematode variant 1 (RN_VAR1) and variant 2 (RN_VAR2). All sequences from three of the four isolates had both RN variants in their sequences; however, isolate 13B had only the RN variant 2 sequence. Specific variable base sites (96 or 5.5%) were found within the 18S rRNA gene that can clearly distinguish the two 18S rDNA variants of RN, in 11 (25.0%) and 33 (75.0%) of the 44 RN clones, for RN_VAR1 and RN_VAR2, respectively. Neighbor-joining trees show that RN_VAR1 is very similar to the previously existing R. reniformis sequence in GenBank, while the RN_VAR2 sequence is more divergent. This is the first report of the identification of two major variants of the 18S rRNA gene in the same single RN; it documents the specific base variation between the two variants and hypothesizes on the simultaneous co-existence of these two variants for this gene. PMID:23593343
RareVariantVis: new tool for visualization of causative variants in rare monogenic disorders using whole genome sequencing data.

PubMed

Stokowy, Tomasz; Garbulowski, Mateusz; Fiskerstrand, Torunn; Holdhus, Rita; Labun, Kornel; Sztromwasser, Pawel; Gilissen, Christian; Hoischen, Alexander; Houge, Gunnar; Petersen, Kjell; Jonassen, Inge; Steen, Vidar M

2016-10-01

The search for causative genetic variants in rare diseases of presumed monogenic inheritance has been boosted by the implementation of whole exome (WES) and whole genome (WGS) sequencing. In many cases, WGS seems to be superior to WES, but the analysis and visualization of the vast amounts of data is demanding. To address this challenge, we have developed a new tool, RareVariantVis, for analysis of genome sequence data (including non-coding regions) for both germ line and somatic variants. It visualizes variants along their respective chromosomes, providing information about exact chromosomal position, zygosity and frequency, with point-and-click information regarding dbSNP IDs, gene association and variant inheritance. Rare variants as well as de novo variants can be flagged in different colors. We show the performance of the RareVariantVis tool in the Genome in a Bottle WGS data set. Available at https://www.bioconductor.org/packages/3.3/bioc/html/RareVariantVis.html (contact: tomasz.stokowy@k2.uib.no). Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Simple and efficient identification of rare recessive pathologically important sequence variants from next generation exome sequence data.

PubMed

Carr, Ian M; Morgan, Joanne; Watson, Christopher; Melnik, Svitlana; Diggle, Christine P; Logan, Clare V; Harrison, Sally M; Taylor, Graham R; Pena, Sergio D J; Markham, Alexander F; Alkuraya, Fowzan S; Black, Graeme C M; Ali, Manir; Bonthron, David T

2013-07-01

Massively parallel ("next generation") DNA sequencing (NGS) has quickly become the method of choice for seeking pathogenic mutations in rare uncharacterized monogenic diseases. Typically, before DNA sequencing, protein-coding regions are enriched from patient genomic DNA, representing either the entire genome ("exome sequencing") or selected mapped candidate loci. Sequence variants, identified as differences between the patient's and the human genome reference sequences, are then filtered according to various quality parameters. Changes are screened against datasets of known polymorphisms, such as dbSNP and the 1000 Genomes Project, in the effort to narrow the list of candidate causative variants. An increasing number of commercial services now offer to both generate and align NGS data to a reference genome. This potentially allows small groups with limited computing infrastructure and informatics skills to utilize this technology. However, the capability to effectively filter and assess sequence variants is still an important bottleneck in the identification of deleterious sequence variants in both research and diagnostic settings. We have developed an approach to this problem comprising a user-friendly suite of programs that can interactively analyze, filter and screen data from enrichment-capture NGS data. These programs ("Agile Suite") are particularly suitable for small-scale gene discovery or for diagnostic analysis. © 2013 WILEY PERIODICALS, INC.
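The filtering steps described above (quality thresholds, then screening against known polymorphisms) can be sketched in a few lines of Python. This is an illustrative sketch, not the Agile Suite implementation: the `Variant` record, the `min_qual` cut-off of 30, and the restriction to homozygous calls (a common choice when looking for recessive variants) are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str
    qual: float
    genotype: str  # "hom" or "het" (hypothetical encoding)


def filter_candidates(variants, known_sites, min_qual=30.0):
    """Keep high-quality homozygous variants absent from a known-polymorphism set."""
    keep = []
    for v in variants:
        if v.qual < min_qual:
            continue  # fails the quality filter
        if (v.chrom, v.pos, v.ref, v.alt) in known_sites:
            continue  # already catalogued as a known polymorphism
        if v.genotype != "hom":
            continue  # assumption: recessive model, so require homozygosity
        keep.append(v)
    return keep


# Invented example data; known_sites stands in for a dbSNP-style lookup.
known_sites = {("1", 1000, "A", "G")}
variants = [
    Variant("1", 1000, "A", "G", 99.0, "hom"),  # known polymorphism
    Variant("2", 2000, "C", "T", 12.0, "hom"),  # low quality
    Variant("3", 3000, "G", "A", 80.0, "het"),  # heterozygous
    Variant("4", 4000, "T", "C", 75.0, "hom"),  # passes all filters
]
candidates = filter_candidates(variants, known_sites)
print(candidates)
```

Ordering the cheap checks first keeps the loop simple; in practice the known-site lookup would be backed by an indexed database rather than an in-memory set.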
Common and rare variants associated with kidney stones and biochemical traits

PubMed Central

Oddsson, Asmundur; Sulem, Patrick; Helgason, Hannes; Edvardsson, Vidar O.; Thorleifsson, Gudmar; Sveinbjörnsson, Gardar; Haraldsdottir, Eik; Eyjolfsson, Gudmundur I.; Sigurdardottir, Olof; Olafsson, Isleifur; Masson, Gisli; Holm, Hilma; Gudbjartsson, Daniel F.; Thorsteinsdottir, Unnur; Indridason, Olafur S.; Palsson, Runolfur; Stefansson, Kari

2015-01-01

Kidney stone disease is a complex disorder with a strong genetic component. We conducted a genome-wide association study of 28.3 million sequence variants detected through whole-genome sequencing of 2,636 Icelanders that were imputed into 5,419 kidney stone cases, including 2,172 cases with a history of recurrent kidney stones, and 279,870 controls. We identify sequence variants associating with kidney stones at ALPL (rs1256328[T], odds ratio (OR)=1.21, P=5.8 × 10⁻¹⁰) and a suggestive association at CASR (rs7627468[A], OR=1.16, P=2.0 × 10⁻⁸). Focusing our analysis on coding sequence variants in 63 genes with preferential kidney expression we identify two rare missense variants SLC34A1 p.Tyr489Cys (OR=2.38, P=2.8 × 10⁻⁵) and TRPV5 p.Leu530Arg (OR=3.62, P=4.1 × 10⁻⁵) associating with recurrent kidney stones. We also observe associations of the identified kidney stone variants with biochemical traits in a large population set, indicating potential biological mechanism. PMID:26272126
Common and rare variants associated with kidney stones and biochemical traits.

PubMed

Oddsson, Asmundur; Sulem, Patrick; Helgason, Hannes; Edvardsson, Vidar O; Thorleifsson, Gudmar; Sveinbjörnsson, Gardar; Haraldsdottir, Eik; Eyjolfsson, Gudmundur I; Sigurdardottir, Olof; Olafsson, Isleifur; Masson, Gisli; Holm, Hilma; Gudbjartsson, Daniel F; Thorsteinsdottir, Unnur; Indridason, Olafur S; Palsson, Runolfur; Stefansson, Kari

2015-08-14

Kidney stone disease is a complex disorder with a strong genetic component. We conducted a genome-wide association study of 28.3 million sequence variants detected through whole-genome sequencing of 2,636 Icelanders that were imputed into 5,419 kidney stone cases, including 2,172 cases with a history of recurrent kidney stones, and 279,870 controls. We identify sequence variants associating with kidney stones at ALPL (rs1256328[T], odds ratio (OR)=1.21, P=5.8 × 10⁻¹⁰) and a suggestive association at CASR (rs7627468[A], OR=1.16, P=2.0 × 10⁻⁸). Focusing our analysis on coding sequence variants in 63 genes with preferential kidney expression we identify two rare missense variants SLC34A1 p.Tyr489Cys (OR=2.38, P=2.8 × 10⁻⁵) and TRPV5 p.Leu530Arg (OR=3.62, P=4.1 × 10⁻⁵) associating with recurrent kidney stones. We also observe associations of the identified kidney stone variants with biochemical traits in a large population set, indicating potential biological mechanism.
Whole genome sequences of a male and female supercentenarian, ages greater than 114 years.

PubMed

Sebastiani, Paola; Riva, Alberto; Montano, Monty; Pham, Phillip; Torkamani, Ali; Scherba, Eugene; Benson, Gary; Milton, Jacqueline N; Baldwin, Clinton T; Andersen, Stacy; Schork, Nicholas J; Steinberg, Martin H; Perls, Thomas T

2011-01-01

Supercentenarians (age 110+ years old) generally delay or escape age-related diseases and disability well beyond the age of 100 and this exceptional survival is likely to be influenced by a genetic predisposition that includes both common and rare genetic variants. In this report, we describe the complete genomic sequences of male and female supercentenarians, both age >114 years old. We show that: (1) the sequence variant spectrum of these two individuals' DNA sequences is largely comparable to existing non-supercentenarian genomes; (2) the two individuals do not appear to carry most of the well-established human longevity enabling variants already reported in the literature; (3) they have a comparable number of known disease-associated variants relative to most human genomes sequenced to-date; (4) approximately 1% of the variants these individuals possess are novel and may point to new genes involved in exceptional longevity; and (5) both individuals are enriched for coding variants near longevity-associated variants that we discovered through a large genome-wide association study. These analyses suggest that there are both common and rare longevity-associated variants that may counter the effects of disease-predisposing variants and extend lifespan. The continued analysis of the genomes of these and other rare individuals who have survived to extremely old ages should provide insight into the processes that contribute to the maintenance of health during extreme aging.
Whole Genome Sequences of a Male and Female Supercentenarian, Ages Greater than 114 Years

PubMed Central

Sebastiani, Paola; Riva, Alberto; Montano, Monty; Pham, Phillip; Torkamani, Ali; Scherba, Eugene; Benson, Gary; Milton, Jacqueline N.; Baldwin, Clinton T.; Andersen, Stacy; Schork, Nicholas J.; Steinberg, Martin H.; Perls, Thomas T.

2012-01-01

Supercentenarians (age 110+ years old) generally delay or escape age-related diseases and disability well beyond the age of 100 and this exceptional survival is likely to be influenced by a genetic predisposition that includes both common and rare genetic variants. In this report, we describe the complete genomic sequences of male and female supercentenarians, both age >114 years old. We show that: (1) the sequence variant spectrum of these two individuals’ DNA sequences is largely comparable to existing non-supercentenarian genomes; (2) the two individuals do not appear to carry most of the well-established human longevity enabling variants already reported in the literature; (3) they have a comparable number of known disease-associated variants relative to most human genomes sequenced to-date; (4) approximately 1% of the variants these individuals possess are novel and may point to new genes involved in exceptional longevity; and (5) both individuals are enriched for coding variants near longevity-associated variants that we discovered through a large genome-wide association study. These analyses suggest that there are both common and rare longevity-associated variants that may counter the effects of disease-predisposing variants and extend lifespan. The continued analysis of the genomes of these and other rare individuals who have survived to extremely old ages should provide insight into the processes that contribute to the maintenance of health during extreme aging. PMID:22303384
Evaluation of a functional variant assay for selecting beef cattle

USDA-ARS's Scientific Manuscript database

A commercially available genotyping assay for functional variants was chosen to obtain genotypes needed for a selection experiment in populations of pedigreed cattle that have not been extensively genotyped. The assay design included probes for coding sequence variation in 88% of annotated protein c...
Exome Sequencing Analysis Reveals Variants in Primary Immunodeficiency Genes in Patients With Very Early Onset Inflammatory Bowel Disease

PubMed Central

Kelsen, Judith R.; Dawany, Noor; Moran, Christopher J.; Petersen, Britt-Sabina; Sarmady, Mahdi; Sasson, Ariella; Pauly-Hubbard, Helen; Martinez, Alejandro; Maurer, Kelly; Soong, Joanne; Rappaport, Eric; Franke, Andre; Keller, Andreas; Winter, Harland S.; Mamula, Petar; Piccoli, David; Artis, David; Sonnenberg, Gregory F.; Daly, Mark; Sullivan, Kathleen E.; Baldassano, Robert N.; Devoto, Marcella

2016-01-01

Background & Aims: Very early onset inflammatory bowel disease (VEO-IBD), IBD diagnosed ≤5 y of age, frequently presents with a different and more severe phenotype than older-onset IBD. We investigated whether patients with VEO-IBD carry rare or novel variants in genes associated with immunodeficiencies that might contribute to disease development. Methods: Patients with VEO-IBD and parents (when available) were recruited from the Children's Hospital of Philadelphia from March 2013 through July 2014. We analyzed DNA from 125 patients with VEO-IBD (ages 3 weeks to 4 y) and 19 parents, 4 of whom also had IBD. Exome capture was performed by Agilent SureSelect V4, and sequencing was performed using the Illumina HiSeq platform. Alignment to human genome GRCh37 was achieved followed by post-processing and variant calling. Following functional annotation, candidate variants were analyzed for change in protein function, minor allele frequency <0.1%, and scaled combined annotation dependent depletion scores ≤10. We focused on genes associated with primary immunodeficiencies and related pathways. An additional 210 exome samples from patients with pediatric IBD (n=45) or adult-onset Crohn's disease (n=20) and healthy individuals (controls, n=145) were obtained from the University of Kiel, Germany and used as control groups. Results: Four-hundred genes and regions associated with primary immunodeficiency, covering approximately 6500 coding exons totaling > 1 Mbp of coding sequence, were selected from the whole exome data. Our analysis revealed novel and rare variants within these genes that could contribute to the development of VEO-IBD, including rare heterozygous missense variants in IL10RA and previously unidentified variants in MSH5 and CD19. Conclusions: In an exome sequence analysis of patients with VEO-IBD and their parents, we identified variants in genes that regulate B- and T-cell functions and could contribute to pathogenesis. Our analysis could lead to the identification of previously unidentified IBD-associated variants. PMID:26193622
Rare coding variants in Phospholipase D3 (PLD3) confer risk for Alzheimer's disease

PubMed Central

Cruchaga, Carlos; Benitez, Bruno A.; Cai, Yefei; Guerreiro, Rita; Harari, Oscar; Norton, Joanne; Budde, John; Bertelsen, Sarah; Jeng, Amanda T.; Cooper, Breanna; Skorupa, Tara; Carrell, David; Levitch, Denise; Hsu, Simon; Choi, Jiyoon; Ryten, Mina; Sassi, Celeste; Bras, Jose; Gibbs, Raphael J.; Hernandez, Dena G.; Lupton, Michelle K.; Powell, John; Forabosco, Paola; Ridge, Perry G.; Corcoran, Christopher D.; Tschanz, JoAnn T.; Norton, Maria C.; Munger, Ronald G.; Schmutz, Cameron; Leary, Maegan; Demirci, F. Yesim; Bamne, Mikhil N.; Wang, Xingbin; Lopez, Oscar L.; Ganguli, Mary; Medway, Christopher; Turton, James; Lord, Jenny; Braae, Anne; Barber, Imelda; Brown, Kristelle; Pastor, Pau; Lorenzo-Betancor, Oswaldo; Brkanac, Zoran; Scott, Erick; Topol, Eric; Morgan, Kevin; Rogaeva, Ekaterina; Singleton, Andy; Hardy, John; Kamboh, M. Ilyas; George-Hyslop, Peter St; Cairns, Nigel; Morris, John C.; Kauwe, John S.K.; Goate, Alison M.

2014-01-01

Genome-wide association studies (GWAS) have identified several risk variants for late-onset Alzheimer's disease (LOAD). These common variants have replicable but small effects on LOAD risk and generally do not have obvious functional effects. Low-frequency coding variants, not detected by GWAS, are predicted to include functional variants with larger effects on risk. To identify low-frequency coding variants with large effects on LOAD risk, we performed whole-exome sequencing (WES) in 14 large LOAD families and follow-up analyses of the candidate variants in several large case-control datasets. A rare variant in PLD3 (phospholipase-D family, member 3, rs145999145; V232M) segregated with disease status in two independent families and doubled risk for AD in seven independent case-control series (V232M meta-analysis; OR = 2.10, CI = 1.47-2.99; p = 2.93×10⁻⁵, 11,354 cases and controls of European descent). Gene-based burden analyses in 4,387 cases and controls of European descent and 302 African American cases and controls, with complete sequence data for PLD3, indicate that several variants in this gene increase risk for AD in both populations (EA: OR = 2.75, CI = 2.05-3.68; p = 1.44×10⁻¹¹, AA: OR = 5.48, CI = 1.77-16.92; p = 1.40×10⁻³). PLD3 is highly expressed in brain regions vulnerable to AD pathology, including hippocampus and cortex, and is expressed at lower levels in neurons from AD brains compared to control brains (p = 8.10×10⁻¹⁰). Over-expression of PLD3 leads to a significant decrease in intracellular APP and extracellular Aβ42 and Aβ40, while knock-down of PLD3 leads to a significant increase in extracellular Aβ42 and Aβ40. Together, our genetic and functional data indicate that carriers of PLD3 coding variants have a two-fold increased risk for LOAD and that PLD3 influences APP processing. This study provides an example of how densely affected families may be used to identify rare variants with large effects on risk for disease or other complex traits. PMID:24336208
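As a rough consistency check on the V232M meta-analysis figures quoted above, the odds ratio and its confidence interval can be converted back into a standard error, a z statistic, and an approximate p-value. This is only a sketch: it assumes the reported interval is a symmetric 95% Wald interval on the log-odds scale, and the function name `wald_from_ci` is hypothetical. Because the published bounds are rounded, the result should be expected to agree with the reported p-value in order of magnitude rather than digit for digit.

```python
import math

def wald_from_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Recover log-OR, SE, z and a two-sided p-value from an OR and its CI.

    Assumes a symmetric Wald interval on the log scale (an assumption,
    not something the abstract states).
    """
    log_or = math.log(odds_ratio)
    # Width of the interval on the log scale equals 2 * z_crit * SE.
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = log_or / se
    # Two-sided tail probability of a standard normal.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return log_or, se, z, p

# Values quoted in the abstract: OR = 2.10, CI = 1.47-2.99.
log_or, se, z, p = wald_from_ci(2.10, 1.47, 2.99)
print(f"SE(log OR) = {se:.3f}, z = {z:.2f}, two-sided p = {p:.1e}")
```

The recovered p-value lands in the 10⁻⁵ range, the same order of magnitude as the reported 2.93×10⁻⁵.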
A rare coding variant in TREM2 increases risk for Alzheimer's disease in Han Chinese.

PubMed

Jiang, Teng; Tan, Lan; Chen, Qi; Tan, Meng-Shan; Zhou, Jun-Shan; Zhu, Xi-Chen; Lu, Huan; Wang, Hui-Fu; Zhang, Ying-Dong; Yu, Jin-Tai

2016-06-01

Two recent studies have identified that a rare coding variant (p.R47H) in exon 2 of triggering receptor expressed on myeloid cells 2 (TREM2) gene is associated with Alzheimer's disease (AD) susceptibility in Caucasians. This association was not successfully replicated in Han Chinese, where this variant was rare or even absent. Previously, we resequenced TREM2 exon 2 to investigate whether additional rare variants conferred risk to AD in our cohort. Although several new variants had been identified, none of them was significantly associated with disease susceptibility. Here, to test whether TREM2 is truly a susceptibility gene of AD in Han Chinese, we extend our previous study by sequencing the other four exons of TREM2 in 988 AD patients and 1,354 healthy controls. We provided the first evidence that a rare coding variant (p.H157Y) in TREM2 exon 3 conferred a considerable risk of AD in our cohort (Pcorrected = 0.02, odds ratio = 11.01, 95% confidence interval: 1.38-88.05). This finding indicates that rare coding variants of TREM2 may play an important role in AD in Han Chinese. Copyright © 2016 Elsevier Inc. All rights reserved.
The GENCODE exome: sequencing the complete human exome

PubMed Central

Coffey, Alison J; Kokocinski, Felix; Calafato, Maria S; Scott, Carol E; Palta, Priit; Drury, Eleanor; Joyce, Christopher J; LeProust, Emily M; Harrow, Jen; Hunt, Sarah; Lehesjoki, Anna-Elina; Turner, Daniel J; Hubbard, Tim J; Palotie, Aarno

2011-01-01

Sequencing the coding regions, the exome, of the human genome is one of the major current strategies to identify low frequency and rare variants associated with human disease traits. So far, the most widely used commercial exome capture reagents have mainly targeted the consensus coding sequence (CCDS) database. We report the design of an extended set of targets for capturing the complete human exome, based on annotation from the GENCODE consortium. The extended set covers an additional 5594 genes and 10.3 Mb compared with the current CCDS-based sets. The additional regions include potential disease genes previously inaccessible to exome resequencing studies, such as 43 genes linked to ion channel activity and 70 genes linked to protein kinase activity. In total, the new GENCODE exome set developed here covers 47.9 Mb and performed well in sequence capture experiments. In the sample set used in this study, we identified over 5000 SNP variants more in the GENCODE exome target (24%) than in the CCDS-based exome sequencing. PMID:21364695

Regularized rare variant enrichment analysis for case-control exome sequencing data.

PubMed

Larson, Nicholas B; Schaid, Daniel J

2014-02-01

Rare variants have recently garnered an immense amount of attention in genetic association analysis. However, unlike methods traditionally used for single marker analysis in GWAS, rare variant analysis often requires some method of aggregation, since single marker approaches are poorly powered for typical sequencing study sample sizes. Advancements in sequencing technologies have rendered next-generation sequencing platforms a realistic alternative to traditional genotyping arrays. Exome sequencing in particular not only provides base-level resolution of genetic coding regions, but also a natural paradigm for aggregation via genes and exons. Here, we propose the use of penalized regression in combination with variant aggregation measures to identify rare variant enrichment in exome sequencing data. In contrast to marginal gene-level testing, we simultaneously evaluate the effects of rare variants in multiple genes, focusing on gene-based least absolute shrinkage and selection operator (LASSO) and exon-based sparse group LASSO models. By using gene membership as a grouping variable, the sparse group LASSO can be used as a gene-centric analysis of rare variants while also providing a penalized approach toward identifying specific regions of interest. We apply extensive simulations to evaluate the performance of these approaches with respect to specificity and sensitivity, comparing these results to multiple competing marginal testing methods. Finally, we discuss our findings and outline future research. © 2013 WILEY PERIODICALS, INC.
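The aggregation step that this abstract describes as a prerequisite for rare variant analysis can be illustrated with a small sketch. It collapses per-variant minor allele counts into one burden score per gene and sample, which is the kind of gene-level predictor a penalized regression such as the LASSO could then take as input. The data layout, the function name `gene_burden`, the toy identifiers, and the 1% frequency cutoff are all illustrative assumptions, not details from the paper, and the penalized regression itself is not shown.

```python
from collections import defaultdict

def gene_burden(genotypes, variant_gene, maf, maf_max=0.01):
    """Sum minor allele counts of rare variants per gene for each sample.

    genotypes:    {sample: {variant: minor allele count (0, 1 or 2)}}
    variant_gene: {variant: gene}
    maf:          {variant: minor allele frequency}
    maf_max:      rarity cutoff (1% here is an assumed, illustrative value)
    """
    burden = {}
    for sample, calls in genotypes.items():
        per_gene = defaultdict(int)
        for variant, count in calls.items():
            if maf[variant] < maf_max:  # keep rare variants only
                per_gene[variant_gene[variant]] += count
        burden[sample] = dict(per_gene)
    return burden

# Toy data (hypothetical): v2 is common and is therefore ignored.
variant_gene = {"v1": "GENE_A", "v2": "GENE_A", "v3": "GENE_B"}
maf = {"v1": 0.002, "v2": 0.20, "v3": 0.005}
genotypes = {
    "s1": {"v1": 1, "v2": 2, "v3": 0},
    "s2": {"v1": 0, "v2": 1, "v3": 2},
}
scores = gene_burden(genotypes, variant_gene, maf)
print(scores)
```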
Investigation of the role of TCF4 rare sequence variants in schizophrenia.

PubMed

Basmanav, F Buket; Forstner, Andreas J; Fier, Heide; Herms, Stefan; Meier, Sandra; Degenhardt, Franziska; Hoffmann, Per; Barth, Sandra; Fricker, Nadine; Strohmaier, Jana; Witt, Stephanie H; Ludwig, Michael; Schmael, Christine; Moebus, Susanne; Maier, Wolfgang; Mössner, Rainald; Rujescu, Dan; Rietschel, Marcella; Lange, Christoph; Nöthen, Markus M; Cichon, Sven

2015-07-01

Transcription factor 4 (TCF4) is one of the most robust of all reported schizophrenia risk loci and is supported by several genetic and functional lines of evidence. While numerous studies have implicated common genetic variation at TCF4 in schizophrenia risk, the role of rare, small-sized variants at this locus, such as single nucleotide variants and short indels, which are below the resolution of chip-based arrays, requires further exploration. The aim of the present study was to investigate the association between rare TCF4 sequence variants and schizophrenia. Exon-targeted resequencing was performed in 190 German schizophrenia patients. Six rare variants at the coding exons and flanking sequences of the TCF4 gene were identified, including two missense variants and one splice site variant. These six variants were then pooled with nine additional rare variants identified in 379 European participants of the 1000 Genomes Project, and all 15 variants were genotyped in an independent German sample (n = 1,808 patients; n = 2,261 controls). These data were then analyzed using six statistical methods developed for the association analysis of rare variants. No significant association (P < 0.05) was found. However, the results from our association and power analyses suggest that further research into the possible involvement of rare TCF4 sequence variants in schizophrenia risk is warranted by the assessment of larger cohorts with higher statistical power to identify rare variant associations. © 2015 Wiley Periodicals, Inc.
Whole-Genome Sequencing Suggests Schizophrenia Risk Mechanisms in Humans with 22q11.2 Deletion Syndrome.

PubMed

Merico, Daniele; Zarrei, Mehdi; Costain, Gregory; Ogura, Lucas; Alipanahi, Babak; Gazzellone, Matthew J; Butcher, Nancy J; Thiruvahindrapuram, Bhooma; Nalpathamkalam, Thomas; Chow, Eva W C; Andrade, Danielle M; Frey, Brendan J; Marshall, Christian R; Scherer, Stephen W; Bassett, Anne S

2015-09-16

Chromosome 22q11.2 microdeletions impart a high but incomplete risk for schizophrenia. Possible mechanisms include genome-wide effects of DGCR8 haploinsufficiency. In a proof-of-principle study to assess the power of this model, we used high-quality, whole-genome sequencing of nine individuals with 22q11.2 deletions and extreme phenotypes (schizophrenia, or no psychotic disorder at age >50 years). The schizophrenia group had a greater burden of rare, damaging variants impacting protein-coding neurofunctional genes, including genes involved in neuron projection (nominal P = 0.02, joint burden of three variant types). Variants in the intact 22q11.2 region were not major contributors. Restricting to genes affected by a DGCR8 mechanism tended to amplify between-group differences. Damaging variants in highly conserved long intergenic noncoding RNA genes also were enriched in the schizophrenia group (nominal P = 0.04). The findings support the 22q11.2 deletion model as a threshold-lowering first hit for schizophrenia risk. If applied to a larger and thus better-powered cohort, this appears to be a promising approach to identify genome-wide rare variants in coding and noncoding sequence that perturb gene networks relevant to idiopathic schizophrenia. Similarly designed studies exploiting genetic models may prove useful to help delineate the genetic architecture of other complex phenotypes. Copyright © 2015 Merico et al.
Whole-Genome Sequencing Suggests Schizophrenia Risk Mechanisms in Humans with 22q11.2 Deletion Syndrome

PubMed Central

Merico, Daniele; Zarrei, Mehdi; Costain, Gregory; Ogura, Lucas; Alipanahi, Babak; Gazzellone, Matthew J.; Butcher, Nancy J.; Thiruvahindrapuram, Bhooma; Nalpathamkalam, Thomas; Chow, Eva W. C.; Andrade, Danielle M.; Frey, Brendan J.; Marshall, Christian R.; Scherer, Stephen W.; Bassett, Anne S.

2015-01-01

Chromosome 22q11.2 microdeletions impart a high but incomplete risk for schizophrenia. Possible mechanisms include genome-wide effects of DGCR8 haploinsufficiency. In a proof-of-principle study to assess the power of this model, we used high-quality, whole-genome sequencing of nine individuals with 22q11.2 deletions and extreme phenotypes (schizophrenia, or no psychotic disorder at age >50 years). The schizophrenia group had a greater burden of rare, damaging variants impacting protein-coding neurofunctional genes, including genes involved in neuron projection (nominal P = 0.02, joint burden of three variant types). Variants in the intact 22q11.2 region were not major contributors. Restricting to genes affected by a DGCR8 mechanism tended to amplify between-group differences. Damaging variants in highly conserved long intergenic noncoding RNA genes also were enriched in the schizophrenia group (nominal P = 0.04). The findings support the 22q11.2 deletion model as a threshold-lowering first hit for schizophrenia risk. If applied to a larger and thus better-powered cohort, this appears to be a promising approach to identify genome-wide rare variants in coding and noncoding sequence that perturb gene networks relevant to idiopathic schizophrenia. Similarly designed studies exploiting genetic models may prove useful to help delineate the genetic architecture of other complex phenotypes. PMID:26384369
A structural variant in the 5’-flanking region of the TWIST2 gene affects melanocyte development in belted cattle

PubMed Central

Drögemüller, Cord; Jagannathan, Vidhya; Keller, Irene; Wüthrich, Daniel; Bruggmann, Rémy; Schütz, Ekkehard; Demmel, Steffi; Moser, Simon; Signer-Hasler, Heidi; Pieńkowska-Schelling, Aldona; Schelling, Claude; Sande, Marcos; Rongen, Ronald

2017-01-01

Belted cattle have a circular belt of unpigmented hair and skin around their midsection. The belt is inherited as a monogenic autosomal dominant trait. We mapped the causative variant to a 37 kb segment on bovine chromosome 3. Whole genome sequence data of 2 belted and 130 control cattle yielded only one private genetic variant in the critical interval in the two belted animals. The belt-associated variant was a copy number variant (CNV) involving the quadruplication of a 6 kb non-coding sequence located approximately 16 kb upstream of the TWIST2 gene. Increased copy numbers at this CNV were strongly associated with the belt phenotype in a cohort of 333 cases and 1322 controls. We hypothesized that the CNV causes aberrant expression of TWIST2 during neural crest development, which might negatively affect melanoblasts. Functional studies showed that ectopic expression of bovine TWIST2 in neural crest in transgenic zebrafish led to a decrease in melanocyte numbers. Our results thus implicate an unsuspected involvement of TWIST2 in regulating pigmentation and reveal a non-coding CNV underlying a captivating Mendelian character. PMID:28658273
Integrating evolutionary and regulatory information with a multispecies approach implicates genes and pathways in obsessive-compulsive disorder.

PubMed

Noh, Hyun Ji; Tang, Ruqi; Flannick, Jason; O'Dushlaine, Colm; Swofford, Ross; Howrigan, Daniel; Genereux, Diane P; Johnson, Jeremy; van Grootheest, Gerard; Grünblatt, Edna; Andersson, Erik; Djurfeldt, Diana R; Patel, Paresh D; Koltookian, Michele; M Hultman, Christina; Pato, Michele T; Pato, Carlos N; Rasmussen, Steven A; Jenike, Michael A; Hanna, Gregory L; Stewart, S Evelyn; Knowles, James A; Ruhrmann, Stephan; Grabe, Hans-Jörgen; Wagner, Michael; Rück, Christian; Mathews, Carol A; Walitza, Susanne; Cath, Daniëlle C; Feng, Guoping; Karlsson, Elinor K; Lindblad-Toh, Kerstin

2017-10-17

Obsessive-compulsive disorder is a severe psychiatric disorder linked to abnormalities in glutamate signaling and the cortico-striatal circuit. We sequenced coding and regulatory elements for 608 genes potentially involved in obsessive-compulsive disorder in human, dog, and mouse. Using a new method that prioritizes likely functional variants, we compared 592 cases to 560 controls and found four strongly associated genes, validated in a larger cohort. NRXN1 and HTR2A are enriched for coding variants altering postsynaptic protein-binding domains. CTTNBP2 (synapse maintenance) and REEP3 (vesicle trafficking) are enriched for regulatory variants, of which at least six (35%) alter transcription factor-DNA binding in neuroblastoma cells. NRXN1 achieves genome-wide significance (p = 6.37 × 10⁻¹¹) when we include 33,370 population-matched controls. Our findings suggest synaptic adhesion as a key component in compulsive behaviors, and show that targeted sequencing plus functional annotation can identify potentially causative variants, even when genomic data are limited. Obsessive-compulsive disorder (OCD) is a neuropsychiatric disorder with symptoms including intrusive thoughts and time-consuming repetitive behaviors. Here Noh and colleagues identify genes enriched for functional variants associated with increased risk of OCD.
A low-frequency inactivating AKT2 variant enriched in the Finnish population is associated with fasting insulin levels and type 2 diabetes risk

PubMed Central

Grarup, Niels; Rivas, Manuel A; Mahajan, Anubha; Locke, Adam E; Cingolani, Pablo; Pers, Tune H; Viñuela, Ana; Brown, Andrew A; Wu, Ying; Flannick, Jason; Fuchsberger, Christian; Gamazon, Eric R; Gaulton, Kyle J; Im, Hae Kyung; Teslovich, Tanya M; Blackwell, Thomas W; Bork-Jensen, Jette; Burtt, Noël P; Chen, Yuhui; Green, Todd; Hartl, Christopher; Kang, Hyun Min; Kumar, Ashish; Ladenvall, Claes; Ma, Clement; Moutsianas, Loukas; Pearson, Richard D; Perry, John R B; Rayner, N William; Robertson, Neil R; Scott, Laura J; van de Bunt, Martijn; Eriksson, Johan G; Jula, Antti; Koskinen, Seppo; Lehtimäki, Terho; Palotie, Aarno; Raitakari, Olli T; Jacobs, Suzanne BR; Wessel, Jennifer; Chu, Audrey Y; Scott, Robert A; Goodarzi, Mark O; Blancher, Christine; Buck, Gemma; Buck, David; Chines, Peter S; Gabriel, Stacey; Gjesing, Anette P; Groves, Christopher J; Hollensted, Mette; Huyghe, Jeroen R; Jackson, Anne U; Jun, Goo; Justesen, Johanne Marie; Mangino, Massimo; Murphy, Jacquelyn; Neville, Matt; Onofrio, Robert; Small, Kerrin S; Stringham, Heather M; Trakalo, Joseph; Banks, Eric; Carey, Jason; Carneiro, Mauricio O; DePristo, Mark; Farjoun, Yossi; Fennell, Timothy; Goldstein, Jacqueline I; Grant, George; de Angelis, Martin Hrabé; Maguire, Jared; Neale, Benjamin M; Poplin, Ryan; Purcell, Shaun; Schwarzmayr, Thomas; Shakir, Khalid; Smith, Joshua D; Strom, Tim M; Wieland, Thomas; Lindstrom, Jaana; Brandslund, Ivan; Christensen, Cramer; Surdulescu, Gabriela L; Lakka, Timo A; Doney, Alex S F; Nilsson, Peter; Wareham, Nicholas J; Langenberg, Claudia; Varga, Tibor V; Franks, Paul W; Rolandsson, Olov; Rosengren, Anders H; Farook, Vidya S; Thameem, Farook; Puppala, Sobha; Kumar, Satish; Lehman, Donna M; Jenkinson, Christopher P; Curran, Joanne E; Hale, Daniel Esten; Fowler, Sharon P; Arya, Rector; DeFronzo, Ralph A; Abboud, Hanna E; Syvänen, Ann-Christine; Hicks, Pamela J; Palmer, Nicholette D; Ng, Maggie C Y; Bowden, Donald W; Freedman, Barry I; Esko, Tõnu; Mägi, Reedik; Milani, Lili; 
Mihailov, Evelin; Metspalu, Andres; Narisu, Narisu; Kinnunen, Leena; Bonnycastle, Lori L; Swift, Amy; Pasko, Dorota; Wood, Andrew R; Fadista, João; Pollin, Toni I; Barzilai, Nir; Atzmon, Gil; Glaser, Benjamin; Thorand, Barbara; Strauch, Konstantin; Peters, Annette; Roden, Michael; Müller-Nurasyid, Martina; Liang, Liming; Kriebel, Jennifer; Illig, Thomas; Grallert, Harald; Gieger, Christian; Meisinger, Christa; Lannfelt, Lars; Musani, Solomon K; Griswold, Michael; Taylor, Herman A; Wilson, Gregory; Correa, Adolfo; Oksa, Heikki; Scott, William R; Afzal, Uzma; Tan, Sian-Tsung; Loh, Marie; Chambers, John C; Sehmi, Jobanpreet; Kooner, Jaspal Singh; Lehne, Benjamin; Cho, Yoon Shin; Lee, Jong-Young; Han, Bok-Ghee; Käräjämäki, Annemari; Qi, Qibin; Qi, Lu; Huang, Jinyan; Hu, Frank B; Melander, Olle; Orho-Melander, Marju; Below, Jennifer E; Aguilar, David; Wong, Tien Yin; Liu, Jianjun; Khor, Chiea-Chuen; Chia, Kee Seng; Lim, Wei Yen; Cheng, Ching-Yu; Chan, Edmund; Tai, E Shyong; Aung, Tin; Linneberg, Allan; Isomaa, Bo; Meitinger, Thomas; Tuomi, Tiinamaija; Hakaste, Liisa; Kravic, Jasmina; Jørgensen, Marit E; Lauritzen, Torsten; Deloukas, Panos; Stirrups, Kathleen E; Owen, Katharine R; Farmer, Andrew J; Frayling, Timothy M; O'Rahilly, Stephen P; Walker, Mark; Levy, Jonathan C; Hodgkiss, Dylan; Hattersley, Andrew T; Kuulasmaa, Teemu; Stančáková, Alena; Barroso, Inês; Bharadwaj, Dwaipayan; Chan, Juliana; Chandak, Giriraj R; Daly, Mark J; Donnelly, Peter J; Ebrahim, Shah B; Elliott, Paul; Fingerlin, Tasha; Froguel, Philippe; Hu, Cheng; Jia, Weiping; Ma, Ronald C W; McVean, Gilean; Park, Taesung; Prabhakaran, Dorairaj; Sandhu, Manjinder; Scott, James; Sladek, Rob; Tandon, Nikhil; Teo, Yik Ying; Zeggini, Eleftheria; Watanabe, Richard M; Koistinen, Heikki A; Kesaniemi, Y Antero; Uusitupa, Matti; Spector, Timothy D; Salomaa, Veikko; Rauramaa, Rainer; Palmer, Colin N A; Prokopenko, Inga; Morris, Andrew D; Bergman, Richard N; Collins, Francis S; Lind, Lars; Ingelsson, Erik; 
Tuomilehto, Jaakko; Karpe, Fredrik; Groop, Leif; Jørgensen, Torben; Hansen, Torben; Pedersen, Oluf; Kuusisto, Johanna; Abecasis, Gonçalo; Bell, Graeme I; Blangero, John; Cox, Nancy J; Duggirala, Ravindranath; Seielstad, Mark; Wilson, James G; Dupuis, Josee; Ripatti, Samuli; Hanis, Craig L; Florez, Jose C; Mohlke, Karen L; Meigs, James B; Laakso, Markku; Morris, Andrew P; Boehnke, Michael; Altshuler, David; McCarthy, Mark I; Gloyn, Anna L; Lindgren, Cecilia M

2017-01-01

To identify novel coding association signals and facilitate characterization of mechanisms influencing glycemic traits and type 2 diabetes risk, we analyzed 109,215 variants derived from exome array genotyping together with an additional 390,225 variants from exome sequence in up to 39,339 normoglycemic individuals from five ancestry groups. We identified a novel association between the coding variant (p.Pro50Thr) in AKT2 and fasting insulin, a gene in which rare fully penetrant mutations are causal for monogenic glycemic disorders. The low-frequency allele is associated with a 12% increase in fasting plasma insulin (FI) levels. This variant is present at 1.1% frequency in Finns but virtually absent in individuals from other ancestries. Carriers of the FI-increasing allele had increased 2-hour insulin values, decreased insulin sensitivity, and increased risk of type 2 diabetes (odds ratio=1.05). In cellular studies, the AKT2-Thr50 protein exhibited a partial loss of function. We extend the allelic spectrum for coding variants in AKT2 associated with disorders of glucose homeostasis and demonstrate bidirectional effects of variants within the pleckstrin homology domain of AKT2. PMID:28341696
Genetic Architecture of Vitamin B12 and Folate Levels Uncovered Applying Deeply Sequenced Large Datasets

PubMed Central

Thorleifsson, Gudmar; Ahluwalia, Tarunveer S.; Steinthorsdottir, Valgerdur; Bjarnason, Helgi; Gudbjartsson, Daniel F.; Magnusson, Olafur T.; Sparsø, Thomas; Albrechtsen, Anders; Kong, Augustine; Masson, Gisli; Tian, Geng; Cao, Hongzhi; Nie, Chao; Kristiansen, Karsten; Husemoen, Lise Lotte; Thuesen, Betina; Li, Yingrui; Nielsen, Rasmus; Linneberg, Allan; Olafsson, Isleifur; Eyjolfsson, Gudmundur I.; Jørgensen, Torben; Wang, Jun; Hansen, Torben; Thorsteinsdottir, Unnur; Stefánsson, Kari; Pedersen, Oluf

2013-01-01

Genome-wide association studies have mainly relied on common HapMap sequence variations. Recently, sequencing approaches have allowed analysis of low frequency and rare variants in conjunction with common variants, thereby improving the search for functional variants and thus the understanding of the underlying biology of human traits and diseases. Here, we used a large Icelandic whole genome sequence dataset combined with Danish exome sequence data to gain insight into the genetic architecture of serum levels of vitamin B12 (B12) and folate. Up to 22.9 million sequence variants were analyzed in combined samples of 45,576 and 37,341 individuals with serum B12 and folate measurements, respectively. We found six novel loci associating with serum B12 (CD320, TCN2, ABCD4, MMAA, MMACHC) or folate levels (FOLR3) and confirmed seven loci for these traits (TCN1, FUT6, FUT2, CUBN, CLYBL, MUT, MTHFR). Conditional analyses established that four loci contain additional independent signals. Interestingly, 13 of the 18 identified variants were coding and 11 of the 13 target genes have known functions related to B12 and folate pathways. Contrary to epidemiological studies we did not find consistent association of the variants with cardiovascular diseases, cancers or Alzheimer's disease although some variants demonstrated pleiotropic effects. Although to some degree impeded by low statistical power for some of these conditions, these data suggest that sequence variants that contribute to the population diversity in serum B12 or folate levels do not modify the risk of developing these conditions. Yet, the study demonstrates the value of combining whole genome and exome sequencing approaches to ascertain the genetic and molecular architectures underlying quantitative trait associations. PMID:23754956
Imputation of Exome Sequence Variants into Population-Based Samples and Blood-Cell-Trait-Associated Loci in African Americans: NHLBI GO Exome Sequencing Project

PubMed Central

Auer, Paul L.; Johnsen, Jill M.; Johnson, Andrew D.; Logsdon, Benjamin A.; Lange, Leslie A.; Nalls, Michael A.; Zhang, Guosheng; Franceschini, Nora; Fox, Keolu; Lange, Ethan M.; Rich, Stephen S.; O’Donnell, Christopher J.; Jackson, Rebecca D.; Wallace, Robert B.; Chen, Zhao; Graubert, Timothy A.; Wilson, James G.; Tang, Hua; Lettre, Guillaume; Reiner, Alex P.; Ganesh, Santhi K.; Li, Yun

2012-01-01

Researchers have successfully applied exome sequencing to discover causal variants in selected individuals with familial, highly penetrant disorders. We demonstrate the utility of exome sequencing followed by imputation for discovering low-frequency variants associated with complex quantitative traits. We performed exome sequencing in a reference panel of 761 African Americans and then imputed newly discovered variants into a larger sample of more than 13,000 African Americans for association testing with the blood cell traits hemoglobin, hematocrit, white blood count, and platelet count. First, we illustrate the feasibility of our approach by demonstrating genome-wide-significant associations for variants that are not covered by conventional genotyping arrays; for example, one such association is that between higher platelet count and an MPL c.117G>T (p.Lys39Asn) variant of the thrombopoietin receptor gene (p = 1.5 × 10⁻¹¹). Second, we identified an association between missense variants of LCT and higher white blood count (p = 4 × 10⁻¹³). Third, we identified low-frequency coding variants that might account for allelic heterogeneity at several known blood cell-associated loci: MPL c.754T>C (p.Tyr252His) was associated with higher platelet count; CD36 c.975T>G (p.Tyr325*) was associated with lower platelet count; and several missense variants at the α-globin gene locus were associated with lower hemoglobin. By identifying low-frequency missense variants associated with blood cell traits not previously reported by genome-wide association studies, we establish that exome sequencing followed by imputation is a powerful approach to dissecting complex, genetically heterogeneous traits in large population-based studies. PMID:23103231
The functional spectrum of low-frequency coding variation.

PubMed

Marth, Gabor T; Yu, Fuli; Indap, Amit R; Garimella, Kiran; Gravel, Simon; Leong, Wen Fung; Tyler-Smith, Chris; Bainbridge, Matthew; Blackwell, Tom; Zheng-Bradley, Xiangqun; Chen, Yuan; Challis, Danny; Clarke, Laura; Ball, Edward V; Cibulskis, Kristian; Cooper, David N; Fulton, Bob; Hartl, Chris; Koboldt, Dan; Muzny, Donna; Smith, Richard; Sougnez, Carrie; Stewart, Chip; Ward, Alistair; Yu, Jin; Xue, Yali; Altshuler, David; Bustamante, Carlos D; Clark, Andrew G; Daly, Mark; DePristo, Mark; Flicek, Paul; Gabriel, Stacey; Mardis, Elaine; Palotie, Aarno; Gibbs, Richard

2011-09-14

Rare coding variants constitute an important class of human genetic variation, but are underrepresented in current databases that are based on small population samples. Recent studies show that variants altering amino acid sequence and protein function are enriched at low variant allele frequency, 2 to 5%, but because of insufficient sample size it is not clear if the same trend holds for rare variants below 1% allele frequency. The 1000 Genomes Exon Pilot Project has collected deep-coverage exon-capture data in roughly 1,000 human genes, for nearly 700 samples. Although medical whole-exome projects are currently afoot, this is still the deepest reported sampling of a large number of human genes with next-generation technologies. According to the goals of the 1000 Genomes Project, we created effective informatics pipelines to process and analyze the data, and discovered 12,758 exonic SNPs, 70% of them novel, and 74% below 1% allele frequency in the seven population samples we examined. Our analysis confirms that coding variants below 1% allele frequency show increased population-specificity and are enriched for functional variants. This study represents a large step toward detecting and interpreting low frequency coding variation, clearly lays out technical steps for effective analysis of DNA capture data, and articulates functional and population properties of this important class of genetic variation.
Association of Germline CHEK2 Gene Variants with Risk and Prognosis of Non-Hodgkin Lymphoma

PubMed Central

Havranek, Ondrej; Kleiblova, Petra; Hojny, Jan; Lhota, Filip; Soucek, Pavel; Trneny, Marek; Kleibl, Zdenek

2015-01-01

The checkpoint kinase 2 gene (CHEK2) codes for the CHK2 protein, an important mediator of the DNA damage response pathway. The CHEK2 gene has been recognized as a multi-cancer susceptibility gene; however, its role in non-Hodgkin lymphoma (NHL) remains unclear. We performed mutation analysis of the entire CHEK2 coding sequence in 340 NHL patients using denaturing high-performance liquid chromatography (DHPLC) and multiplex ligation-dependent probe amplification (MLPA). Identified hereditary variants were genotyped in 445 non-cancer controls. The influence of CHEK2 variants on disease risk was statistically evaluated. Identified CHEK2 germline variants included four truncating mutations (found in five patients and no control; P = 0.02) and nine missense variants (found in 21 patients and 12 controls; P = 0.02). Carriers of non-synonymous variants had an increased risk of NHL development [odds ratio (OR) 2.86; 95% confidence interval (CI) 1.42–5.79] and an unfavorable prognosis [hazard ratio (HR) of progression-free survival (PFS) 2.1; 95% CI 1.12–4.05]. In contrast, the most frequent intronic variant c.319+43dupA (identified in 22% of patients and 31% of controls) was associated with a decreased NHL risk (OR = 0.62; 95% CI 0.45–0.86), but its positive prognostic effect was limited to NHL patients with diffuse large B-cell lymphoma (DLBCL) treated by conventional chemotherapy without rituximab (HR-PFS 0.4; 94% CI 0.17–0.74). Our results show that germ-line CHEK2 mutations affecting protein coding sequence confer a moderately-increased risk of NHL, they are associated with an unfavorable NHL prognosis, and they may represent a valuable predictive biomarker for patients with DLBCL. PMID:26506619
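The carrier counts quoted for truncating mutations (five carriers among patients, none among controls) lend themselves to a small worked example of a two-sided Fisher exact test. This is a sketch under stated assumptions: the abstract does not say which test the authors used, and the table below assumes that all 340 patients and all 445 controls were successfully genotyped. The function name `fisher_exact_two_sided` is hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x carriers in row 1, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is at least as unlikely as the observed one
    # (small relative tolerance guards against floating-point ties).
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))

# Assumed table: 5 carriers / 335 non-carriers among patients,
#                0 carriers / 445 non-carriers among controls.
p = fisher_exact_two_sided(5, 335, 0, 445)
print(f"two-sided p = {p:.3f}")
```

Under these assumptions the result rounds to the P = 0.02 reported for truncating mutations.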
Association of Germline CHEK2 Gene Variants with Risk and Prognosis of Non-Hodgkin Lymphoma.

PubMed

Havranek, Ondrej; Kleiblova, Petra; Hojny, Jan; Lhota, Filip; Soucek, Pavel; Trneny, Marek; Kleibl, Zdenek

2015-01-01

The checkpoint kinase 2 gene (CHEK2) codes for the CHK2 protein, an important mediator of the DNA damage response pathway. The CHEK2 gene has been recognized as a multi-cancer susceptibility gene; however, its role in non-Hodgkin lymphoma (NHL) remains unclear. We performed mutation analysis of the entire CHEK2 coding sequence in 340 NHL patients using denaturing high-performance liquid chromatography (DHPLC) and multiplex ligation-dependent probe amplification (MLPA). Identified hereditary variants were genotyped in 445 non-cancer controls. The influence of CHEK2 variants on disease risk was statistically evaluated. Identified CHEK2 germline variants included four truncating mutations (found in five patients and no control; P = 0.02) and nine missense variants (found in 21 patients and 12 controls; P = 0.02). Carriers of non-synonymous variants had an increased risk of NHL development [odds ratio (OR) 2.86; 95% confidence interval (CI) 1.42-5.79] and an unfavorable prognosis [hazard ratio (HR) of progression-free survival (PFS) 2.1; 95% CI 1.12-4.05]. In contrast, the most frequent intronic variant c.319+43dupA (identified in 22% of patients and 31% of controls) was associated with a decreased NHL risk (OR = 0.62; 95% CI 0.45-0.86), but its positive prognostic effect was limited to NHL patients with diffuse large B-cell lymphoma (DLBCL) treated by conventional chemotherapy without rituximab (HR-PFS 0.4; 94% CI 0.17-0.74). Our results show that germ-line CHEK2 mutations affecting protein coding sequence confer a moderately-increased risk of NHL, they are associated with an unfavorable NHL prognosis, and they may represent a valuable predictive biomarker for patients with DLBCL.
Preconception Carrier Screening by Genome Sequencing: Results from the Clinical Laboratory.

PubMed

Punj, Sumit; Akkari, Yassmine; Huang, Jennifer; Yang, Fei; Creason, Allison; Pak, Christine; Potter, Amiee; Dorschner, Michael O; Nickerson, Deborah A; Robertson, Peggy D; Jarvik, Gail P; Amendola, Laura M; Schleit, Jennifer; Simpson, Dana Kostiner; Rope, Alan F; Reiss, Jacob; Kauffman, Tia; Gilmore, Marian J; Himes, Patricia; Wilfond, Benjamin; Goddard, Katrina A B; Richards, C Sue

2018-06-07

Advances in sequencing technologies permit the analysis of a larger selection of genes for preconception carrier screening. The study was designed as a sequential carrier screen using genome sequencing to analyze 728 gene-disorder pairs for carrier and medically actionable conditions in 131 women and their partners (n = 71) who were planning a pregnancy. We report here on the clinical laboratory results from this expanded carrier screening program. Variants were filtered and classified using the latest American College of Medical Genetics and Genomics (ACMG) guideline; only pathogenic and likely pathogenic variants were confirmed by orthogonal methods before being reported. Novel missense variants were classified as variants of uncertain significance. We reported 304 variants in 202 participants. Twelve carrier couples (12/71 couples tested) were identified for common conditions; eight were carriers for hereditary hemochromatosis. Although both known and novel variants were reported, 48% of all reported variants were missense. For novel splice-site variants, RNA-splicing assays were performed to aid in classification. We reported ten copy-number variants and five variants in non-coding regions. One novel variant was reported in F8, associated with hemophilia A; prenatal testing showed that the male fetus harbored this variant, and the neonate suffered a life-threatening hemorrhage, which was anticipated and appropriately managed. Moreover, 3% of participants had variants that were medically actionable. Compared with targeted mutation screening, genome sequencing improves the sensitivity of detecting clinically significant variants. While the interpretation of certain novel variants remains challenging, the ACMG guidelines are useful for classifying variants in a healthy population. Copyright © 2018 American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
Whole exome sequencing in an Italian family with isolated maxillary canine agenesis and canine eruption anomalies.

PubMed

Barbato, Ersilia; Traversa, Alice; Guarnieri, Rosanna; Giovannetti, Agnese; Genovesi, Maria Luce; Magliozzi, Maria Rosa; Paolacci, Stefano; Ciolfi, Andrea; Pizzi, Simone; Di Giorgio, Roberto; Tartaglia, Marco; Pizzuti, Antonio; Caputo, Viviana

2018-07-01

The aim of this study was the clinical and molecular characterization of a family segregating a trait consisting of a phenotype specifically involving the maxillary canines, including agenesis, impaction and ectopic eruption, characterized by incomplete penetrance and variable expressivity. Standardized clinical assessment of 14 family members and whole-exome sequencing (WES) of three affected subjects were performed. WES data analyses (sequence alignment, variant calling, annotation and prioritization) were carried out using an in-house implemented pipeline. Variant filtering retained high-quality private and rare coding and splice-site variants. Variant prioritization was performed taking into account both the disruptive impact and the biological relevance of individual variants and genes. Sanger sequencing was performed to validate the variants of interest and to carry out segregation analysis. Prioritization of variants "by function" allowed the identification of multiple variants contributing to the trait, including two concomitant heterozygous variants in EDARADD (c.308C>T, p.Ser103Phe) and COL5A1 (c.1588G>A, p.Gly530Ser), specifically associated with a more severe phenotype (i.e. canine agenesis). In contrast, heterozygous variants in genes encoding proteins with a role in the WNT pathway were shared by subjects showing a phenotype of impacted/ectopic erupted canines. This study characterized the genetic contribution underlying a complex trait consisting of isolated canine anomalies in a medium-sized family, highlighting the role of WNT and EDA cell signaling pathways in tooth development. Copyright © 2018 Elsevier Ltd. All rights reserved.
Identification of rare X-linked neuroligin variants by massively parallel sequencing in males with autism spectrum disorder.

PubMed

Steinberg, Karyn Meltz; Ramachandran, Dhanya; Patel, Viren C; Shetty, Amol C; Cutler, David J; Zwick, Michael E

2012-09-28

Autism spectrum disorder (ASD) is highly heritable, but the genetic risk factors for it remain largely unknown. Although structural variants with large effect sizes may explain up to 15% of ASD, genome-wide association studies have failed to uncover common single nucleotide variants with large effects on phenotype. The focus within ASD genetics is now shifting to the examination of rare sequence variants of modest effect, which is most often achieved via exome selection and sequencing. This strategy has indeed identified some rare candidate variants; however, the approach does not capture the full spectrum of genetic variation that might contribute to the phenotype. We surveyed two loci with known rare variants that contribute to ASD, the X-linked neuroligin genes, by performing massively parallel Illumina sequencing of the coding and noncoding regions from these genes in males from families with multiplex autism. We annotated all variant sites and functionally tested a subset to identify other rare mutations contributing to ASD susceptibility. We found seven rare variants at evolutionarily conserved sites in our study population. Functional analyses of the three 3' UTR variants did not show statistically significant effects on the expression of NLGN3 and NLGN4X. In addition, we identified two NLGN3 intronic variants located within conserved transcription factor binding sites that could potentially affect gene regulation. These data demonstrate the power of massively parallel, targeted sequencing studies of affected individuals for identifying rare, potentially disease-contributing variation. However, they also point out the challenges and limitations of current methods of direct functional testing of rare variants and the difficulties of identifying alleles with modest effects.
Identification of rare X-linked neuroligin variants by massively parallel sequencing in males with autism spectrum disorder

PubMed Central

2012-01-01

Background: Autism spectrum disorder (ASD) is highly heritable, but the genetic risk factors for it remain largely unknown. Although structural variants with large effect sizes may explain up to 15% of ASD, genome-wide association studies have failed to uncover common single nucleotide variants with large effects on phenotype. The focus within ASD genetics is now shifting to the examination of rare sequence variants of modest effect, which is most often achieved via exome selection and sequencing. This strategy has indeed identified some rare candidate variants; however, the approach does not capture the full spectrum of genetic variation that might contribute to the phenotype. Methods: We surveyed two loci with known rare variants that contribute to ASD, the X-linked neuroligin genes, by performing massively parallel Illumina sequencing of the coding and noncoding regions from these genes in males from families with multiplex autism. We annotated all variant sites and functionally tested a subset to identify other rare mutations contributing to ASD susceptibility. Results: We found seven rare variants at evolutionarily conserved sites in our study population. Functional analyses of the three 3’ UTR variants did not show statistically significant effects on the expression of NLGN3 and NLGN4X. In addition, we identified two NLGN3 intronic variants located within conserved transcription factor binding sites that could potentially affect gene regulation. Conclusions: These data demonstrate the power of massively parallel, targeted sequencing studies of affected individuals for identifying rare, potentially disease-contributing variation. However, they also point out the challenges and limitations of current methods of direct functional testing of rare variants and the difficulties of identifying alleles with modest effects. PMID:23020841
Exome sequencing analysis reveals variants in primary immunodeficiency genes in patients with very early onset inflammatory bowel disease.

PubMed

Kelsen, Judith R; Dawany, Noor; Moran, Christopher J; Petersen, Britt-Sabina; Sarmady, Mahdi; Sasson, Ariella; Pauly-Hubbard, Helen; Martinez, Alejandro; Maurer, Kelly; Soong, Joanne; Rappaport, Eric; Franke, Andre; Keller, Andreas; Winter, Harland S; Mamula, Petar; Piccoli, David; Artis, David; Sonnenberg, Gregory F; Daly, Mark; Sullivan, Kathleen E; Baldassano, Robert N; Devoto, Marcella

2015-11-01

Very early onset inflammatory bowel disease (VEO-IBD), IBD diagnosed at 5 years of age or younger, frequently presents with a different and more severe phenotype than older-onset IBD. We investigated whether patients with VEO-IBD carry rare or novel variants in genes associated with immunodeficiencies that might contribute to disease development. Patients with VEO-IBD and parents (when available) were recruited from the Children's Hospital of Philadelphia from March 2013 through July 2014. We analyzed DNA from 125 patients with VEO-IBD (age, 3 wk to 4 y) and 19 parents, 4 of whom also had IBD. Exome capture was performed by Agilent SureSelect V4, and sequencing was performed using the Illumina HiSeq platform. Alignment to human genome GRCh37 was achieved followed by postprocessing and variant calling. After functional annotation, candidate variants were analyzed for change in protein function, minor allele frequency less than 0.1%, and scaled combined annotation-dependent depletion scores of 10 or less. We focused on genes associated with primary immunodeficiencies and related pathways. An additional 210 exome samples from patients with pediatric IBD (n = 45) or adult-onset Crohn's disease (n = 20) and healthy individuals (controls, n = 145) were obtained from the University of Kiel, Germany, and used as control groups. Four hundred genes and regions associated with primary immunodeficiency, covering approximately 6500 coding exons totaling more than 1 Mbp of coding sequence, were selected from the whole-exome data. Our analysis showed novel and rare variants within these genes that could contribute to the development of VEO-IBD, including rare heterozygous missense variants in IL10RA and previously unidentified variants in MSH5 and CD19. In an exome sequence analysis of patients with VEO-IBD and their parents, we identified variants in genes that regulate B- and T-cell functions and could contribute to pathogenesis. 
Our analysis could lead to the identification of previously unidentified IBD-associated variants. Copyright © 2015 AGA Institute. Published by Elsevier Inc. All rights reserved.
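The candidate-variant filter described in this abstract (protein-altering, minor allele frequency below 0.1%, scaled CADD score of 10 or less, restricted to primary immunodeficiency genes) can be sketched as a simple predicate. This is a minimal illustration under stated assumptions, not the study's pipeline: the `Variant` record, its field names, the three-gene list and all example values are hypothetical, and the CADD threshold direction simply follows the wording of the abstract.

```python
from dataclasses import dataclass

@dataclass
class Variant:
    gene: str
    alters_protein: bool   # hypothetical flag taken from functional annotation
    maf: float             # minor allele frequency as a proportion
    cadd_scaled: float     # scaled CADD score

# Illustrative subset only; the study selected about 400 genes and regions
PID_GENES = {"IL10RA", "MSH5", "CD19"}

def is_candidate(v: Variant) -> bool:
    """Apply the filters in the order the abstract lists them."""
    return (
        v.gene in PID_GENES
        and v.alters_protein
        and v.maf < 0.001          # minor allele frequency less than 0.1%
        and v.cadd_scaled <= 10    # threshold direction as worded in the abstract
    )

# Made-up example records, not data from the study
variants = [
    Variant("IL10RA", True, 0.0002, 8.5),   # passes every filter
    Variant("IL10RA", True, 0.02, 8.5),     # too common
    Variant("TTN", True, 0.0001, 5.0),      # gene not in the list
]
candidates = [v for v in variants if is_candidate(v)]
print(len(candidates), "candidate variant(s) retained")
```

Keeping the thresholds in one predicate makes it easy to see which criterion removes a given variant when the cut-offs are changed.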
Whole genome sequencing and integrative genomic analysis approach on two 22q11.2 deletion syndrome family trios for genotype to phenotype correlations

PubMed Central

Chung, Jonathan H.; Cai, Jinlu; Suskin, Barrie G.; Zhang, Zhengdong; Coleman, Karlene

2015-01-01

The 22q11.2 deletion syndrome (22q11DS) affects 1:4000 live births and presents with highly variable phenotype expressivity. In this study, we developed an analytical approach utilizing whole genome sequencing and integrative analysis to discover genetic modifiers. Our pipeline combined available tools in order to prioritize rare, predicted deleterious, coding and non-coding single nucleotide variants (SNVs) and insertion/deletions (INDELs) from whole genome sequencing (WGS). We sequenced two unrelated probands with 22q11DS, with contrasting clinical findings, and their unaffected parents. Proband P1 had cognitive impairment, psychotic episodes, anxiety, and tetralogy of Fallot (TOF); while proband P2 had juvenile rheumatoid arthritis but no other major clinical findings. In P1, we identified common variants in COMT and PRODH on 22q11.2 as well as rare potentially deleterious DNA variants in other behavioral/neurocognitive genes. We also identified a de novo SNV in ADNP2 (NM_014913.3:c.2243G>C), encoding a neuroprotective protein that may be involved in behavioral disorders. In P2, we identified a novel non-synonymous SNV in ZFPM2 (NM_012082.3:c.1576C>T), a known causative gene for TOF, which may act as a protective variant downstream of TBX1, haploinsufficiency of which is responsible for congenital heart disease in individuals with 22q11DS. PMID:25981510
CHEK2 contribution to hereditary breast cancer in non-BRCA families.

PubMed

Desrichard, Alexis; Bidet, Yannick; Uhrhammer, Nancy; Bignon, Yves-Jean

2011-01-01

Mutations in the BRCA1 and BRCA2 genes are responsible for only a part of hereditary breast cancer (HBC). The origins of "non-BRCA" HBC in families may be attributed in part to rare mutations in genes conferring moderate risk, such as CHEK2, which encodes an upstream regulator of BRCA1. Previous studies have demonstrated an association between CHEK2 founder mutations and non-BRCA HBC. However, very few data on the entire coding sequence of this gene are available. We investigated the contribution of CHEK2 mutations to non-BRCA HBC by direct sequencing of its whole coding sequence in 507 non-BRCA HBC cases and 513 controls. We observed 16 mutations in cases and 4 in controls, including 9 missense variants of uncertain consequence. Using both in silico tools and an in vitro kinase activity test, the majority of the variants were found likely to be deleterious for protein function. One variant present in both cases and controls was proposed to be neutral. Removing this variant from the pool of potentially deleterious variants gave a mutation frequency of 1.48% for cases and 0.29% for controls (P = 0.0040). The odds ratio of breast cancer in the presence of a deleterious CHEK2 mutation was 5.18. Our work indicates that a variety of deleterious CHEK2 alleles make an appreciable contribution to breast cancer susceptibility, and their identification could help in the clinical management of patients carrying a CHEK2 mutation.
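The frequencies and odds ratio quoted above can be reproduced with simple arithmetic, which helps show how the figures relate to the raw counts. The sketch below rests on assumptions that the abstract does not state explicitly: that the neutral variant was observed once in each group, that each remaining mutation sits in a different heterozygous carrier, and that the percentages are per chromosome (two alleles per person). Variable names are illustrative.

```python
n_cases, n_controls = 507, 513

# Assumption: the variant judged neutral was seen once in cases and once in controls
mut_cases = 16 - 1
mut_controls = 4 - 1

# Mutation frequency per chromosome (two CHEK2 alleles per individual)
freq_cases = mut_cases / (2 * n_cases)
freq_controls = mut_controls / (2 * n_controls)

# Carrier-based odds ratio: odds of carrying a mutation in cases vs controls
odds_cases = mut_cases / (n_cases - mut_cases)
odds_controls = mut_controls / (n_controls - mut_controls)
odds_ratio = odds_cases / odds_controls

print(f"cases: {freq_cases:.2%}, controls: {freq_controls:.2%}, OR: {odds_ratio:.2f}")
```

Under these assumptions the computed values agree with the reported 1.48%, 0.29% and 5.18, which suggests the frequencies were calculated per allele and the odds ratio per carrier.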
Global variation in CYP2C8–CYP2C9 functional haplotypes

PubMed Central

Speed, William C; Kang, Soonmo Peter; Tuck, David P; Harris, Lyndsay N; Kidd, Kenneth K

2009-01-01

We have studied the global frequency distributions of 10 single nucleotide polymorphisms (SNPs) across 132 kb of CYP2C8 and CYP2C9 in ∼2500 individuals representing 45 populations. Five of the SNPs were in noncoding sequences; the other five involved the more common missense variants (four in CYP2C8, one in CYP2C9) that change amino acids in the gene products. One haplotype containing two CYP2C8 coding variants and one CYP2C9 coding variant reaches an average frequency of 10% in Europe; a set of haplotypes with a different CYP2C8 coding variant reaches 17% in Africa. In both cases these haplotypes are found in other regions of the world at <1%. This considerable geographic variation in haplotype frequencies impacts the interpretation of CYP2C8/CYP2C9 association studies, and has pharmacogenomic implications for drug interactions. PMID:19381162

Functional Assessment of Disease-Associated Regulatory Variants In Vivo Using a Versatile Dual Colour Transgenesis Strategy in Zebrafish

PubMed Central

Bhatia, Shipra; Gordon, Christopher T.; Foster, Robert G.; Melin, Lucie; Abadie, Véronique; Baujat, Geneviève; Vazquez, Marie-Paule; Amiel, Jeanne; Lyonnet, Stanislas; van Heyningen, Veronica; Kleinjan, Dirk A.

2015-01-01

Disruption of gene regulation by sequence variation in non-coding regions of the genome is now recognised as a significant cause of human disease and disease susceptibility. Sequence variants in cis-regulatory elements (CREs), the primary determinants of spatio-temporal gene regulation, can alter transcription factor binding sites. While technological advances have led to easy identification of disease-associated CRE variants, robust methods for discerning functional CRE variants from background variation are lacking. Here we describe an efficient dual-colour reporter transgenesis approach in zebrafish, simultaneously allowing detailed in vivo comparison of spatio-temporal differences in regulatory activity between putative CRE variants and assessment of altered transcription factor binding potential of the variant. We validate the method on known disease-associated elements regulating SHH, PAX6 and IRF6 and subsequently characterise novel, ultra-long-range SOX9 enhancers implicated in the craniofacial abnormality Pierre Robin Sequence. The method provides a highly cost-effective, fast and robust approach for simultaneously unravelling in a single assay whether, where and when in embryonic development a disease-associated CRE-variant is affecting its regulatory function. PMID:26030420
Exome-wide association analysis reveals novel coding sequence variants associated with lipid traits in Chinese.

PubMed

Tang, Clara S; Zhang, He; Cheung, Chloe Y Y; Xu, Ming; Ho, Jenny C Y; Zhou, Wei; Cherny, Stacey S; Zhang, Yan; Holmen, Oddgeir; Au, Ka-Wing; Yu, Haiyi; Xu, Lin; Jia, Jia; Porsch, Robert M; Sun, Lijie; Xu, Weixian; Zheng, Huiping; Wong, Lai-Yung; Mu, Yiming; Dou, Jingtao; Fong, Carol H Y; Wang, Shuyu; Hong, Xueyu; Dong, Liguang; Liao, Yanhua; Wang, Jiansong; Lam, Levina S M; Su, Xi; Yan, Hua; Yang, Min-Lee; Chen, Jin; Siu, Chung-Wah; Xie, Gaoqiang; Woo, Yu-Cho; Wu, Yangfeng; Tan, Kathryn C B; Hveem, Kristian; Cheung, Bernard M Y; Zöllner, Sebastian; Xu, Aimin; Eugene Chen, Y; Jiang, Chao Qiang; Zhang, Youyi; Lam, Tai-Hing; Ganesh, Santhi K; Huo, Yong; Sham, Pak C; Lam, Karen S L; Willer, Cristen J; Tse, Hung-Fat; Gao, Wei

2015-12-22

Blood lipids are important risk factors for coronary artery disease (CAD). Here we perform an exome-wide association study by genotyping 12,685 Chinese individuals, using a custom Illumina HumanExome BeadChip, to identify additional loci influencing lipid levels. Single-variant association analysis on 65,671 single nucleotide polymorphisms reveals 19 loci associated with lipids at exome-wide significance (P < 2.69 × 10⁻⁷), including three Asian-specific coding variants in known genes (CETP p.Asp459Gly, PCSK9 p.Arg93Cys and LDLR p.Arg257Trp). Furthermore, missense variants at two novel loci (PNPLA3 p.Ile148Met and PKD1L3 p.Thr429Ser) also influence levels of triglycerides and low-density lipoprotein cholesterol, respectively. Another novel gene, TEAD2, is found to be associated with high-density lipoprotein cholesterol through gene-based association analysis. Most of these newly identified coding variants show suggestive association (P < 0.05) with CAD. These findings demonstrate that exome-wide genotyping on samples of non-European ancestry can identify additional population-specific possible causal variants, shedding light on novel lipid biology and CAD.
Extremely hypomorphic and severe deep intronic variants in the ABCA4 locus result in varying Stargardt disease phenotypes.

PubMed

Zernant, Jana; Lee, Winston; Nagasaki, Takayuki; Collison, Frederick T; Fishman, Gerald A; Bertelsen, Mette; Rosenberg, Thomas; Gouras, Peter; Tsang, Stephen H; Allikmets, Rando

2018-05-30

Autosomal recessive Stargardt disease (STGD1, MIM 248200) is caused by mutations in the ABCA4 gene. Complete sequencing of the ABCA4 locus in STGD1 patients identifies two expected disease-causing alleles in ~75% of patients and only one mutation in ~15% of patients. Recently, many possibly pathogenic variants in deep intronic sequences of ABCA4 have been identified in the latter group. We extended our analyses of deep intronic ABCA4 variants and determined that one of these, c.4253+43G>A (rs61754045), is present in 29/1155 (2.6%) of STGD1 patients. The variant is found at statistically significantly higher frequency in patients with only one pathogenic ABCA4 allele, 23/160 (14.38%), MAF = 0.072, compared to MAF = 0.013 in all STGD1 cases and MAF = 0.006 in the matching general population (P < 1 × 10⁻⁷). The variant, which is not predicted to have any effect on splicing, is the first reported intronic "extremely hypomorphic allele" in the ABCA4 locus; i.e., it is pathogenic only when in trans with a loss-of-function ABCA4 allele. It results in a distinct clinical phenotype characterized by late onset of symptoms and foveal sparing. In ~70% of cases the variant was allelic with the c.6006-609T>A (rs575968112) variant, which was deemed non-pathogenic. Another rare deep intronic variant, c.5196+1056A>G (rs886044749), found in 5/834 (0.6%) of STGD1 cases is, conversely, a severe allele. This study determines pathogenicity for three non-coding variants in STGD1 patients of European descent accounting for ~3% of the disease. Defining disease-associated alleles in the non-coding sequences of the ABCA4 locus can be accomplished by integrated clinical and genetic analyses. Cold Spring Harbor Laboratory Press.
Rare, protein-truncating variants in ATM, CHEK2 and PALB2, but not XRCC2, are associated with increased breast cancer risks

PubMed Central

Decker, Brennan; Allen, Jamie; Luccarini, Craig; Pooley, Karen A; Shah, Mitul; Bolla, Manjeet K; Wang, Qin; Ahmed, Shahana; Baynes, Caroline; Conroy, Don M; Brown, Judith; Luben, Robert; Ostrander, Elaine A; Pharoah, Paul DP; Dunning, Alison M; Easton, Douglas F

2017-01-01

Background: Breast cancer (BC) is the most common malignancy in women and has a major heritable component. The risks associated with most rare susceptibility variants are not well estimated. To better characterise the contribution of variants in ATM, CHEK2, PALB2 and XRCC2, we sequenced their coding regions in 13 087 BC cases and 5488 controls from East Anglia, UK. Methods: Gene coding regions were enriched via PCR, sequenced, variant called and filtered for quality. ORs for BC risk were estimated separately for carriers of truncating variants and of rare missense variants, which were further subdivided by functional domain and pathogenicity as predicted by four in silico algorithms. Results: Truncating variants in PALB2 (OR=4.69; 95% CI 2.27 to 9.68), ATM (OR=3.26; 95% CI 1.82 to 6.46) and CHEK2 (OR=3.11; 95% CI 2.15 to 4.69), but not XRCC2 (OR=0.94; 95% CI 0.26 to 4.19), were associated with increased BC risk. Truncating variants in ATM and CHEK2 were more strongly associated with risk of oestrogen receptor (ER)-positive than ER-negative disease, while those in PALB2 were associated with similar risks for both subtypes. There was also some evidence that missense variants in ATM, CHEK2 and PALB2 may contribute to BC risk, but larger studies are necessary to quantify the magnitude of this effect. Conclusions: Truncating variants in PALB2 are associated with a higher risk of BC than those in ATM or CHEK2. A substantial risk of BC due to truncating XRCC2 variants can be excluded. PMID:28779002
Evaluation of 10 genes encoding cardiac proteins in Doberman Pinschers with dilated cardiomyopathy.

PubMed

O'Sullivan, M Lynne; O'Grady, Michael R; Pyle, W Glen; Dawson, John F

2011-07-01

The objective was to identify a causative mutation for dilated cardiomyopathy (DCM) in Doberman Pinschers by sequencing the coding regions of 10 cardiac genes known to be associated with familial DCM in humans. The study animals were 5 Doberman Pinschers with DCM and congestive heart failure and 5 control mixed-breed dogs that were euthanized or died. RNA was extracted from frozen ventricular myocardial samples from each dog, and first-strand cDNA was synthesized via reverse transcription, followed by PCR amplification with gene-specific primers. Ten cardiac genes were analyzed: cardiac actin, α-actinin, α-tropomyosin, β-myosin heavy chain, metavinculin, muscle LIM protein, myosin-binding protein C, tafazzin, titin-cap (telethonin), and troponin T. Sequences for DCM-affected and control dogs and the published canine genome were compared. None of the coding sequences yielded a common causative mutation among all Doberman Pinscher samples. However, 3 variants were identified in the α-actinin gene in the DCM-affected Doberman Pinschers. One of these variants, identified in 2 of the 5 Doberman Pinschers, resulted in an amino acid change in the rod-forming triple coiled-coil domain. Mutations in the coding regions of several genes associated with DCM in humans did not appear to consistently account for DCM in Doberman Pinschers. However, an α-actinin variant was detected in some Doberman Pinschers that may contribute to the development of DCM given its potential effect on the structure of this protein. Investigation of additional candidate gene coding and noncoding regions and further evaluation of the role of α-actinin in development of DCM in Doberman Pinschers are warranted.
A Low-Frequency Inactivating AKT2 Variant Enriched in the Finnish Population Is Associated With Fasting Insulin Levels and Type 2 Diabetes Risk.

PubMed

Manning, Alisa; Highland, Heather M; Gasser, Jessica; Sim, Xueling; Tukiainen, Taru; Fontanillas, Pierre; Grarup, Niels; Rivas, Manuel A; Mahajan, Anubha; Locke, Adam E; Cingolani, Pablo; Pers, Tune H; Viñuela, Ana; Brown, Andrew A; Wu, Ying; Flannick, Jason; Fuchsberger, Christian; Gamazon, Eric R; Gaulton, Kyle J; Im, Hae Kyung; Teslovich, Tanya M; Blackwell, Thomas W; Bork-Jensen, Jette; Burtt, Noël P; Chen, Yuhui; Green, Todd; Hartl, Christopher; Kang, Hyun Min; Kumar, Ashish; Ladenvall, Claes; Ma, Clement; Moutsianas, Loukas; Pearson, Richard D; Perry, John R B; Rayner, N William; Robertson, Neil R; Scott, Laura J; van de Bunt, Martijn; Eriksson, Johan G; Jula, Antti; Koskinen, Seppo; Lehtimäki, Terho; Palotie, Aarno; Raitakari, Olli T; Jacobs, Suzanne B R; Wessel, Jennifer; Chu, Audrey Y; Scott, Robert A; Goodarzi, Mark O; Blancher, Christine; Buck, Gemma; Buck, David; Chines, Peter S; Gabriel, Stacey; Gjesing, Anette P; Groves, Christopher J; Hollensted, Mette; Huyghe, Jeroen R; Jackson, Anne U; Jun, Goo; Justesen, Johanne Marie; Mangino, Massimo; Murphy, Jacquelyn; Neville, Matt; Onofrio, Robert; Small, Kerrin S; Stringham, Heather M; Trakalo, Joseph; Banks, Eric; Carey, Jason; Carneiro, Mauricio O; DePristo, Mark; Farjoun, Yossi; Fennell, Timothy; Goldstein, Jacqueline I; Grant, George; Hrabé de Angelis, Martin; Maguire, Jared; Neale, Benjamin M; Poplin, Ryan; Purcell, Shaun; Schwarzmayr, Thomas; Shakir, Khalid; Smith, Joshua D; Strom, Tim M; Wieland, Thomas; Lindstrom, Jaana; Brandslund, Ivan; Christensen, Cramer; Surdulescu, Gabriela L; Lakka, Timo A; Doney, Alex S F; Nilsson, Peter; Wareham, Nicholas J; Langenberg, Claudia; Varga, Tibor V; Franks, Paul W; Rolandsson, Olov; Rosengren, Anders H; Farook, Vidya S; Thameem, Farook; Puppala, Sobha; Kumar, Satish; Lehman, Donna M; Jenkinson, Christopher P; Curran, Joanne E; Hale, Daniel Esten; Fowler, Sharon P; Arya, Rector; DeFronzo, Ralph A; Abboud, Hanna E; Syvänen, Ann-Christine; Hicks, Pamela J; Palmer, 
Nicholette D; Ng, Maggie C Y; Bowden, Donald W; Freedman, Barry I; Esko, Tõnu; Mägi, Reedik; Milani, Lili; Mihailov, Evelin; Metspalu, Andres; Narisu, Narisu; Kinnunen, Leena; Bonnycastle, Lori L; Swift, Amy; Pasko, Dorota; Wood, Andrew R; Fadista, João; Pollin, Toni I; Barzilai, Nir; Atzmon, Gil; Glaser, Benjamin; Thorand, Barbara; Strauch, Konstantin; Peters, Annette; Roden, Michael; Müller-Nurasyid, Martina; Liang, Liming; Kriebel, Jennifer; Illig, Thomas; Grallert, Harald; Gieger, Christian; Meisinger, Christa; Lannfelt, Lars; Musani, Solomon K; Griswold, Michael; Taylor, Herman A; Wilson, Gregory; Correa, Adolfo; Oksa, Heikki; Scott, William R; Afzal, Uzma; Tan, Sian-Tsung; Loh, Marie; Chambers, John C; Sehmi, Jobanpreet; Kooner, Jaspal Singh; Lehne, Benjamin; Cho, Yoon Shin; Lee, Jong-Young; Han, Bok-Ghee; Käräjämäki, Annemari; Qi, Qibin; Qi, Lu; Huang, Jinyan; Hu, Frank B; Melander, Olle; Orho-Melander, Marju; Below, Jennifer E; Aguilar, David; Wong, Tien Yin; Liu, Jianjun; Khor, Chiea-Chuen; Chia, Kee Seng; Lim, Wei Yen; Cheng, Ching-Yu; Chan, Edmund; Tai, E Shyong; Aung, Tin; Linneberg, Allan; Isomaa, Bo; Meitinger, Thomas; Tuomi, Tiinamaija; Hakaste, Liisa; Kravic, Jasmina; Jørgensen, Marit E; Lauritzen, Torsten; Deloukas, Panos; Stirrups, Kathleen E; Owen, Katharine R; Farmer, Andrew J; Frayling, Timothy M; O'Rahilly, Stephen P; Walker, Mark; Levy, Jonathan C; Hodgkiss, Dylan; Hattersley, Andrew T; Kuulasmaa, Teemu; Stančáková, Alena; Barroso, Inês; Bharadwaj, Dwaipayan; Chan, Juliana; Chandak, Giriraj R; Daly, Mark J; Donnelly, Peter J; Ebrahim, Shah B; Elliott, Paul; Fingerlin, Tasha; Froguel, Philippe; Hu, Cheng; Jia, Weiping; Ma, Ronald C W; McVean, Gilean; Park, Taesung; Prabhakaran, Dorairaj; Sandhu, Manjinder; Scott, James; Sladek, Rob; Tandon, Nikhil; Teo, Yik Ying; Zeggini, Eleftheria; Watanabe, Richard M; Koistinen, Heikki A; Kesaniemi, Y Antero; Uusitupa, Matti; Spector, Timothy D; Salomaa, Veikko; Rauramaa, Rainer; Palmer, Colin N A; 
Prokopenko, Inga; Morris, Andrew D; Bergman, Richard N; Collins, Francis S; Lind, Lars; Ingelsson, Erik; Tuomilehto, Jaakko; Karpe, Fredrik; Groop, Leif; Jørgensen, Torben; Hansen, Torben; Pedersen, Oluf; Kuusisto, Johanna; Abecasis, Gonçalo; Bell, Graeme I; Blangero, John; Cox, Nancy J; Duggirala, Ravindranath; Seielstad, Mark; Wilson, James G; Dupuis, Josee; Ripatti, Samuli; Hanis, Craig L; Florez, Jose C; Mohlke, Karen L; Meigs, James B; Laakso, Markku; Morris, Andrew P; Boehnke, Michael; Altshuler, David; McCarthy, Mark I; Gloyn, Anna L; Lindgren, Cecilia M

2017-07-01

To identify novel coding association signals and facilitate characterization of mechanisms influencing glycemic traits and type 2 diabetes risk, we analyzed 109,215 variants derived from exome array genotyping together with an additional 390,225 variants from exome sequence in up to 39,339 normoglycemic individuals from five ancestry groups. We identified a novel association between fasting plasma insulin (FI) and a coding variant (p.Pro50Thr) in AKT2, a gene in which rare fully penetrant mutations are causal for monogenic glycemic disorders. The low-frequency allele is associated with a 12% increase in FI levels. This variant is present at 1.1% frequency in Finns but virtually absent in individuals from other ancestries. Carriers of the FI-increasing allele had increased 2-h insulin values, decreased insulin sensitivity, and increased risk of type 2 diabetes (odds ratio 1.05). In cellular studies, the AKT2-Thr50 protein exhibited a partial loss of function. We extend the allelic spectrum for coding variants in AKT2 associated with disorders of glucose homeostasis and demonstrate bidirectional effects of variants within the pleckstrin homology domain of AKT2. © 2017 by the American Diabetes Association.
Filovirus RefSeq Entries: Evaluation and Selection of Filovirus Type Variants, Type Sequences, and Names

PubMed Central

Kuhn, Jens H.; Andersen, Kristian G.; Bào, Yīmíng; Bavari, Sina; Becker, Stephan; Bennett, Richard S.; Bergman, Nicholas H.; Blinkova, Olga; Bradfute, Steven; Brister, J. Rodney; Bukreyev, Alexander; Chandran, Kartik; Chepurnov, Alexander A.; Davey, Robert A.; Dietzgen, Ralf G.; Doggett, Norman A.; Dolnik, Olga; Dye, John M.; Enterlein, Sven; Fenimore, Paul W.; Formenty, Pierre; Freiberg, Alexander N.; Garry, Robert F.; Garza, Nicole L.; Gire, Stephen K.; Gonzalez, Jean-Paul; Griffiths, Anthony; Happi, Christian T.; Hensley, Lisa E.; Herbert, Andrew S.; Hevey, Michael C.; Hoenen, Thomas; Honko, Anna N.; Ignatyev, Georgy M.; Jahrling, Peter B.; Johnson, Joshua C.; Johnson, Karl M.; Kindrachuk, Jason; Klenk, Hans-Dieter; Kobinger, Gary; Kochel, Tadeusz J.; Lackemeyer, Matthew G.; Lackner, Daniel F.; Leroy, Eric M.; Lever, Mark S.; Mühlberger, Elke; Netesov, Sergey V.; Olinger, Gene G.; Omilabu, Sunday A.; Palacios, Gustavo; Panchal, Rekha G.; Park, Daniel J.; Patterson, Jean L.; Paweska, Janusz T.; Peters, Clarence J.; Pettitt, James; Pitt, Louise; Radoshitzky, Sheli R.; Ryabchikova, Elena I.; Saphire, Erica Ollmann; Sabeti, Pardis C.; Sealfon, Rachel; Shestopalov, Aleksandr M.; Smither, Sophie J.; Sullivan, Nancy J.; Swanepoel, Robert; Takada, Ayato; Towner, Jonathan S.; van der Groen, Guido; Volchkov, Viktor E.; Volchkova, Valentina A.; Wahl-Jensen, Victoria; Warren, Travis K.; Warfield, Kelly L.; Weidmann, Manfred; Nichol, Stuart T.

2014-01-01

Sequence determination of complete or coding-complete genomes of viruses is becoming common practice for supporting the work of epidemiologists, ecologists, virologists, and taxonomists. Sequencing duration and costs are rapidly decreasing, sequencing hardware is under modification for use by non-experts, and software is constantly being improved to simplify sequence data management and analysis. Thus, analysis of virus disease outbreaks on the molecular level is now feasible, including characterization of the evolution of individual virus populations in single patients over time. The increasing accumulation of sequencing data creates a management problem for the curators of commonly used sequence databases and an entry retrieval problem for end users. Therefore, utilizing the data to their fullest potential will require setting nomenclature and annotation standards for virus isolates and associated genomic sequences. The National Center for Biotechnology Information’s (NCBI’s) RefSeq is a non-redundant, curated database for reference (or type) nucleotide sequence records that supplies source data to numerous other databases. Building on recently proposed templates for filovirus variant naming, we report consensus decisions from a majority of past and currently active filovirus experts on the eight filovirus type variants and isolates to be represented in RefSeq, their final designations, and their associated sequences. PMID:25256396
ANGPTL8/Betatrophin R59W variant is associated with higher glucose level in non-diabetic Arabs living in Kuwaits.

PubMed

Abu-Farha, Mohamed; Melhem, Motasem; Abubaker, Jehad; Behbehani, Kazem; Alsmadi, Osama; Elkum, Naser

2016-02-11

ANGPTL8 (betatrophin) has been recently identified as a regulator of lipid metabolism through its interaction with ANGPTL3. A sequence variant in ANGPTL8 has been shown to associate with lower level of Low Density Lipoprotein (LDL) and High Density Lipoprotein (HDL). The objective of this study is to identify sequence variants in ANGPTL8 gene in Arabs and investigate their association with ANGPTL8 plasma level and clinical parameters. A cross sectional study was designed to examine the level of ANGPTL8 in 283 non-diabetic Arabs, and to identify its sequence variants using Sanger sequencing and their association with various clinical parameters. Using Sanger sequencing, we sequenced the full ANGPTL8 gene in 283 Arabs identifying two single nucleotide polymorphisms (SNPs) Rs.892066 and Rs.2278426 in the coding region. Our data shows for the first time that Arabs with the heterozygote form of (c.194C > T Rs.2278426) had higher level of Fasting Blood Glucose (FBG) compared to the CC homozygotes. LDL and HDL level in these subjects did not show significant difference between the two subgroups. Circulation level of ANGPTL8 did not vary between the two forms. No significant changes were observed between the various forms of Rs.892066 variant and FBG, LDL or HDL. Our data shows for the first time that heterozygote form of ANGPTL8 Rs.2278426 variant was associated with higher FBG level in Arabs highlighting the importance of these variants in controlling the function of betatrophin.
Use of whole exome sequencing for the identification of Ito-based arrhythmia mechanism and therapy.

PubMed

Sturm, Amy C; Kline, Crystal F; Glynn, Patric; Johnson, Benjamin L; Curran, Jerry; Kilic, Ahmet; Higgins, Robert S D; Binkley, Philip F; Janssen, Paul M L; Weiss, Raul; Raman, Subha V; Fowler, Steven J; Priori, Silvia G; Hund, Thomas J; Carnes, Cynthia A; Mohler, Peter J

2015-05-26

Identified genetic variants are insufficient to explain all cases of inherited arrhythmia. We tested whether the integration of whole exome sequencing with well-established clinical, translational, and basic science platforms could provide rapid and novel insight into human arrhythmia pathophysiology and disease treatment. We report a proband with recurrent ventricular fibrillation, resistant to standard therapeutic interventions. Using whole-exome sequencing, we identified a variant in a previously unidentified exon of the dipeptidyl aminopeptidase-like protein-6 (DPP6) gene. This variant is the first identified coding mutation in DPP6 and augments cardiac repolarizing current (Ito) causing pathological changes in Ito and action potential morphology. We designed a therapeutic regimen incorporating dalfampridine to target Ito. Dalfampridine, approved for multiple sclerosis, normalized the ECG and reduced arrhythmia burden in the proband by >90-fold. This was combined with cilostazol to accelerate the heart rate to minimize the reverse-rate dependence of augmented Ito. We describe a novel arrhythmia mechanism and therapeutic approach to ameliorate the disease. Specifically, we identify the first coding variant of DPP6 in human ventricular fibrillation. These findings illustrate the power of genetic approaches for the elucidation and treatment of disease when carefully integrated with clinical and basic/translational research teams. © 2015 The Authors. Published on behalf of the American Heart Association, Inc., by Wiley Blackwell.
Frequency of EBV LMP-1 Promoter and Coding Variations in Burkitt Lymphoma Samples in Africa and South America and Peripheral Blood in Uganda.

PubMed

Liao, Hsiao-Mei; Liu, Hebing; Lei, Heiyan; Li, Bingjie; Chin, Pei-Ju; Tsai, Shien; Bhatia, Kishor; Gutierrez, Marina; Epelman, Sidnei; Biggar, Robert J; Nkrumah, Francis; Neequaye, Janet; Ogwang, Martin D; Reynolds, Steven J; Lo, Shyh-Ching; Mbulaiteye, Sam M

2018-06-02

Epstein-Barr virus (EBV) is linked to several cancers, including endemic Burkitt lymphoma (eBL), but causal variants are unknown. We recently reported novel sequence variants in the LMP-1 gene and promoter in EBV genomes sequenced from 13 of 14 BL biopsies. Alignments of the novel sequence variants for 114 published EBV genomes, including 27 from BL cases, revealed four LMP-1 variant patterns, designated A to D. Pattern A variant was found in 48% of BL EBV genomes. Here, we used PCR-Sanger sequencing to evaluate 50 additional BL biopsies from Ghana, Brazil, and Argentina, and peripheral blood samples from 113 eBL cases and 115 controls in Uganda. Pattern A was found in 60.9% of 64 BL biopsies evaluated. Compared to PCR-negative subjects in Uganda, detection of Pattern A in peripheral blood was associated with eBL case status (odds ratio [OR] 31.7, 95% confidence interval: 6.8–149), controlling for relevant confounders. Variant Pattern A and Pattern D were associated with eBL case status, but with lower ORs (9.7 and 13.6, respectively). Our results support the hypothesis that EBV LMP-1 Pattern A may be associated with eBL, but it is not the sole associated variant. Further research is needed to replicate and elucidate our findings.
Next-generation sequencing using a pre-designed gene panel for the molecular diagnosis of congenital disorders in pediatric patients.

PubMed

Lim, Eileen C P; Brett, Maggie; Lai, Angeline H M; Lee, Siew-Peng; Tan, Ee-Shien; Jamuar, Saumya S; Ng, Ivy S L; Tan, Ene-Choo

2015-12-14

Next-generation sequencing (NGS) has revolutionized genetic research and offers enormous potential for clinical application. Sequencing the exome has the advantage of casting the net wide for all known coding regions while targeted gene panel sequencing provides enhanced sequencing depths and can be designed to avoid incidental findings in adult-onset conditions. A HaloPlex panel consisting of 180 genes within commonly altered chromosomal regions is available for use on both the Ion Personal Genome Machine (PGM) and MiSeq platforms to screen for causative mutations in these genes. We used this HaloPlex ICCG panel for targeted sequencing of 15 patients with clinical presentations indicative of an abnormality in one of the 180 genes. Sequencing runs were done using the Ion 318 Chips on the Ion Torrent PGM. Variants were filtered for known polymorphisms and analysis was done to identify possible disease-causing variants before validation by Sanger sequencing. When possible, segregation of variants with phenotype in family members was performed to ascertain the pathogenicity of the variant. More than 97% of the target bases were covered at >20×. There was an average of 9.6 novel variants per patient. Pathogenic mutations were identified in five genes for six patients, with two novel variants. There were another five likely pathogenic variants, some of which were unreported novel variants. In a cohort of 15 patients, we were able to identify a likely genetic etiology in six patients (40%). Another five patients had candidate variants for which further evaluation and segregation analysis are ongoing. Our results indicate that the HaloPlex ICCG panel is useful as a rapid, high-throughput and cost-effective screening tool for 170 of the 180 genes. There is low coverage for some regions in several genes which might have to be supplemented by Sanger sequencing. However, comparing the cost, ease of analysis, and shorter turnaround time, it is a good alternative to exome sequencing for patients whose features are suggestive of a genetic etiology involving one of the genes in the panel.
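The coverage figure quoted in the abstract above (more than 97% of target bases covered at >20×) is a breadth-of-coverage statistic. A minimal Python sketch of how such a figure could be computed from per-base depths follows; the function name `fraction_covered` and the depth values are hypothetical and are not taken from the study.

```python
def fraction_covered(depths, min_depth=20):
    """Return the fraction of target bases whose depth is strictly above min_depth."""
    if not depths:
        raise ValueError("depths must not be empty")
    covered = sum(1 for depth in depths if depth > min_depth)
    return covered / len(depths)


# Hypothetical per-base read depths for a tiny 10-base target (illustration only).
example_depths = [35, 120, 18, 64, 21, 20, 250, 47, 90, 33]

breadth = fraction_covered(example_depths)
print(f"{breadth:.0%} of target bases covered at >20x")
```

In this toy input, two of the ten positions (depths 18 and 20) fall at or below the threshold, so the breadth is 80%. A real pipeline would read depths from a tool's per-base output rather than a hard-coded list.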
Inferring Short-Range Linkage Information from Sequencing Chromatograms

PubMed Central

Beggel, Bastian; Neumann-Fraune, Maria; Kaiser, Rolf; Verheyen, Jens; Lengauer, Thomas

2013-01-01

Direct Sanger sequencing of viral genome populations yields multiple ambiguous sequence positions. It is not straightforward to derive linkage information from sequencing chromatograms, which in turn hampers the correct interpretation of the sequence data. We present a method for determining the variants existing in a viral quasispecies in the case of two nearby ambiguous sequence positions by exploiting the effect of sequence context-dependent incorporation of dideoxynucleotides. The computational model was trained on data from sequencing chromatograms of clonal variants and was evaluated on two test sets of in vitro mixtures. The approach achieved high accuracies in identifying the mixture components of 97.4% on a test set in which the positions to be analyzed are only one base apart from each other, and of 84.5% on a test set in which the ambiguous positions are separated by three bases. In silico experiments suggest two major limitations of our approach in terms of accuracy. First, due to a basic limitation of Sanger sequencing, it is not possible to reliably detect minor variants with a relative frequency of no more than 10%. Second, the model cannot distinguish between mixtures of two or four clonal variants, if one of two sets of linear constraints is fulfilled. Furthermore, the approach requires repetitive sequencing of all variants that might be present in the mixture to be analyzed. Nevertheless, the effectiveness of our method on the two in vitro test sets shows that short-range linkage information of two ambiguous sequence positions can be inferred from Sanger sequencing chromatograms without any further assumptions on the mixture composition. Additionally, our model provides new insights into the established and widely used Sanger sequencing technology. The source code of our method is made available at http://bioinf.mpi-inf.mpg.de/publications/beggel/linkageinformation.zip. PMID:24376502
An analytical framework for whole-genome sequence association studies and its implications for autism spectrum disorder.

PubMed

Werling, Donna M; Brand, Harrison; An, Joon-Yong; Stone, Matthew R; Zhu, Lingxue; Glessner, Joseph T; Collins, Ryan L; Dong, Shan; Layer, Ryan M; Markenscoff-Papadimitriou, Eirene; Farrell, Andrew; Schwartz, Grace B; Wang, Harold Z; Currall, Benjamin B; Zhao, Xuefang; Dea, Jeanselle; Duhn, Clif; Erdman, Carolyn A; Gilson, Michael C; Yadav, Rachita; Handsaker, Robert E; Kashin, Seva; Klei, Lambertus; Mandell, Jeffrey D; Nowakowski, Tomasz J; Liu, Yuwen; Pochareddy, Sirisha; Smith, Louw; Walker, Michael F; Waterman, Matthew J; He, Xin; Kriegstein, Arnold R; Rubenstein, John L; Sestan, Nenad; McCarroll, Steven A; Neale, Benjamin M; Coon, Hilary; Willsey, A Jeremy; Buxbaum, Joseph D; Daly, Mark J; State, Matthew W; Quinlan, Aaron R; Marth, Gabor T; Roeder, Kathryn; Devlin, Bernie; Talkowski, Michael E; Sanders, Stephan J

2018-05-01

Genomic association studies of common or rare protein-coding variation have established robust statistical approaches to account for multiple testing. Here we present a comparable framework to evaluate rare and de novo noncoding single-nucleotide variants, insertion/deletions, and all classes of structural variation from whole-genome sequencing (WGS). Integrating genomic annotations at the level of nucleotides, genes, and regulatory regions, we define 51,801 annotation categories. Analyses of 519 autism spectrum disorder families did not identify association with any categories after correction for 4,123 effective tests. Without appropriate correction, biologically plausible associations are observed in both cases and controls. Despite excluding previously identified gene-disrupting mutations, coding regions still exhibited the strongest associations. Thus, in autism, the contribution of de novo noncoding variation is probably modest in comparison to that of de novo coding variants. Robust results from future WGS studies will require large cohorts and comprehensive analytical strategies that consider the substantial multiple-testing burden.
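To illustrate the multiple-testing burden mentioned in the abstract above, the following Python sketch computes a per-test significance threshold for 4,123 effective tests. It assumes a Bonferroni-style correction and a family-wise alpha of 0.05; neither is stated in the abstract, and the names `bonferroni_threshold` and `is_significant` are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value threshold that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


EFFECTIVE_TESTS = 4123  # number of effective tests reported in the abstract
ALPHA = 0.05            # assumed family-wise error rate (not given in the abstract)

threshold = bonferroni_threshold(ALPHA, EFFECTIVE_TESTS)


def is_significant(p_value):
    """True if a single category's p-value survives the assumed correction."""
    return p_value < threshold


print(f"per-test threshold: {threshold:.2e}")
```

Under these assumptions a category would need a p-value below roughly 1.2 × 10⁻⁵ to be called significant, which shows why nominally small p-values can fail to survive correction.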
Mutations in PIGY: expanding the phenotype of inherited glycosylphosphatidylinositol deficiencies

PubMed Central

Ilkovski, Biljana; Pagnamenta, Alistair T.; O'Grady, Gina L.; Kinoshita, Taroh; Howard, Malcolm F.; Lek, Monkol; Thomas, Brett; Turner, Anne; Christodoulou, John; Sillence, David; Knight, Samantha J.L.; Popitsch, Niko; Keays, David A.; Anzilotti, Consuelo; Goriely, Anne; Waddell, Leigh B.; Brilot, Fabienne; North, Kathryn N.; Kanzawa, Noriyuki; Macarthur, Daniel G.; Taylor, Jenny C.; Kini, Usha; Murakami, Yoshiko; Clarke, Nigel F.

2015-01-01

Glycosylphosphatidylinositol (GPI)-anchored proteins are ubiquitously expressed in the human body and are important for various functions at the cell surface. Mutations in many GPI biosynthesis genes have been described to date in patients with multi-system disease and together these constitute a subtype of congenital disorders of glycosylation. We used whole exome sequencing in two families to investigate the genetic basis of disease and used RNA and cellular studies to investigate the functional consequences of sequence variants in the PIGY gene. Two families with different phenotypes had homozygous recessive sequence variants in the GPI biosynthesis gene PIGY. Two sisters with c.137T>C (p.Leu46Pro) PIGY variants had multi-system disease including dysmorphism, seizures, severe developmental delay, cataracts and early death. There were significantly reduced levels of GPI-anchored proteins (CD55 and CD59) on the surface of patient-derived skin fibroblasts (∼20–50% compared with controls). In a second, consanguineous family, two siblings had moderate development delay and microcephaly. A homozygous PIGY promoter variant (c.-540G>A) was detected within a 7.7 Mb region of autozygosity. This variant was predicted to disrupt a SP1 consensus binding site and was shown to be associated with reduced gene expression. Mutations in PIGY can occur in coding and non-coding regions of the gene and cause variable phenotypes. This article contributes to understanding of the range of disease phenotypes and disease genes associated with deficiencies of the GPI-anchor biosynthesis pathway and also serves to highlight the potential importance of analysing variants detected in 5′-UTR regions despite their typically low coverage in exome data. PMID:26293662
Mutations in PIGY: expanding the phenotype of inherited glycosylphosphatidylinositol deficiencies.

PubMed

Ilkovski, Biljana; Pagnamenta, Alistair T; O'Grady, Gina L; Kinoshita, Taroh; Howard, Malcolm F; Lek, Monkol; Thomas, Brett; Turner, Anne; Christodoulou, John; Sillence, David; Knight, Samantha J L; Popitsch, Niko; Keays, David A; Anzilotti, Consuelo; Goriely, Anne; Waddell, Leigh B; Brilot, Fabienne; North, Kathryn N; Kanzawa, Noriyuki; Macarthur, Daniel G; Taylor, Jenny C; Kini, Usha; Murakami, Yoshiko; Clarke, Nigel F

2015-11-01

Glycosylphosphatidylinositol (GPI)-anchored proteins are ubiquitously expressed in the human body and are important for various functions at the cell surface. Mutations in many GPI biosynthesis genes have been described to date in patients with multi-system disease and together these constitute a subtype of congenital disorders of glycosylation. We used whole exome sequencing in two families to investigate the genetic basis of disease and used RNA and cellular studies to investigate the functional consequences of sequence variants in the PIGY gene. Two families with different phenotypes had homozygous recessive sequence variants in the GPI biosynthesis gene PIGY. Two sisters with c.137T>C (p.Leu46Pro) PIGY variants had multi-system disease including dysmorphism, seizures, severe developmental delay, cataracts and early death. There were significantly reduced levels of GPI-anchored proteins (CD55 and CD59) on the surface of patient-derived skin fibroblasts (∼20-50% compared with controls). In a second, consanguineous family, two siblings had moderate development delay and microcephaly. A homozygous PIGY promoter variant (c.-540G>A) was detected within a 7.7 Mb region of autozygosity. This variant was predicted to disrupt a SP1 consensus binding site and was shown to be associated with reduced gene expression. Mutations in PIGY can occur in coding and non-coding regions of the gene and cause variable phenotypes. This article contributes to understanding of the range of disease phenotypes and disease genes associated with deficiencies of the GPI-anchor biosynthesis pathway and also serves to highlight the potential importance of analysing variants detected in 5'-UTR regions despite their typically low coverage in exome data. © The Author 2015. Published by Oxford University Press.
WES homozygosity mapping in a recessive form of Charcot-Marie-Tooth neuropathy reveals intronic GDAP1 variant leading to a premature stop codon.

PubMed

Masingue, Marion; Perrot, Jimmy; Carlier, Robert-Yves; Piguet-Lacroix, Guenaelle; Latour, Philippe; Stojkovic, Tanya

2018-05-01

Charcot-Marie-Tooth disease (CMT) refers to a group of clinically and genetically heterogeneous inherited neuropathies. Ganglioside-induced differentiation-associated protein 1 (GDAP1)-related CMT has been reported in an autosomal dominant or recessive form in patients presenting either axonal or demyelinating neuropathy. We report two Sri Lankan sisters born to consanguineous parents and presenting with a severe axonal sensorimotor neuropathy. The early onset of the disease, the distal and proximal weakness and atrophy leading to major disability, along with areflexia, and, most notably, vocal cord and diaphragm paralysis were highly evocative of a GDAP1-related CMT. However, sequencing of the coding regions of the gene was normal. Whole-exome sequencing (WES) was performed and revealed that the largest region of homozygosity was around GDAP1 with several variants, mostly in non-coding regions. In view of the high clinical suspicion of GDAP1 gene involvement, we examined the variants in this gene and this, along with functional studies, allowed us to identify an alternative splicing site revealing a cryptic in-frame stop codon in intron 4 responsible for a severe loss of wild-type GDAP1. This work is the first to describe a deleterious mutation in GDAP1 gene outside of coding sequences or intronic junctions and emphasizes the importance of interpreting molecular analysis, and in particular WES results, in light of the clinical and electrophysiological phenotype.
CHEK2 contribution to hereditary breast cancer in non-BRCA families

PubMed Central

2011-01-01

Background: Mutations in the BRCA1 and BRCA2 genes are responsible for only a part of hereditary breast cancer (HBC). The origins of "non-BRCA" HBC in families may be attributed in part to rare mutations in genes conferring moderate risk, such as CHEK2, which encodes for an upstream regulator of BRCA1. Previous studies have demonstrated an association between CHEK2 founder mutations and non-BRCA HBC. However, very few data on the entire coding sequence of this gene are available. Methods: We investigated the contribution of CHEK2 mutations to non-BRCA HBC by direct sequencing of its whole coding sequence in 507 non-BRCA HBC cases and 513 controls. Results: We observed 16 mutations in cases and 4 in controls, including 9 missense variants of uncertain consequence. Using both in silico tools and an in vitro kinase activity test, the majority of the variants were found likely to be deleterious for protein function. One variant present in both cases and controls was proposed to be neutral. Removing this variant from the pool of potentially deleterious variants gave a mutation frequency of 1.48% for cases and 0.29% for controls (P = 0.0040). The odds ratio of breast cancer in the presence of a deleterious CHEK2 mutation was 5.18. Conclusions: Our work indicates that a variety of deleterious CHEK2 alleles make an appreciable contribution to breast cancer susceptibility, and their identification could help in the clinical management of patients carrying a CHEK2 mutation. PMID:22114986
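The frequencies and odds ratio reported in the abstract above can be checked with a short calculation. The Python sketch below assumes that removing the one neutral variant leaves 15 carriers among the 507 cases and 3 among the 513 controls, and that every carrier is heterozygous for a single mutation. These counts are inferred from the abstract, not stated in it, and the function names are hypothetical.

```python
def odds_ratio(carrier_cases, total_cases, carrier_controls, total_controls):
    """Odds ratio from a 2x2 table of carriers vs. non-carriers in cases and controls."""
    a = carrier_cases
    b = total_cases - carrier_cases
    c = carrier_controls
    d = total_controls - carrier_controls
    return (a * d) / (b * c)


def allele_frequency(carriers, individuals):
    """Mutant allele frequency, assuming each carrier has one mutant allele of two."""
    return carriers / (2 * individuals)


CASES, CONTROLS = 507, 513   # cohort sizes from the abstract
CARRIER_CASES = 16 - 1       # assumed: 16 mutations minus the one neutral variant
CARRIER_CONTROLS = 4 - 1     # assumed: 4 mutations minus the one neutral variant

print(f"case allele frequency:    {allele_frequency(CARRIER_CASES, CASES):.2%}")
print(f"control allele frequency: {allele_frequency(CARRIER_CONTROLS, CONTROLS):.2%}")
print(f"odds ratio: {odds_ratio(CARRIER_CASES, CASES, CARRIER_CONTROLS, CONTROLS):.2f}")
```

Under these assumptions the allele frequencies come out at 1.48% and 0.29% and the odds ratio at 5.18, matching the reported values. That agreement suggests the frequencies are per allele while the odds ratio is per individual.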
Validation of a next-generation sequencing assay for clinical molecular oncology.

PubMed

Cottrell, Catherine E; Al-Kateb, Hussam; Bredemeyer, Andrew J; Duncavage, Eric J; Spencer, David H; Abel, Haley J; Lockwood, Christina M; Hagemann, Ian S; O'Guin, Stephanie M; Burcea, Lauren C; Sawyer, Christopher S; Oschwald, Dayna M; Stratman, Jennifer L; Sher, Dorie A; Johnson, Mark R; Brown, Justin T; Cliften, Paul F; George, Bijoy; McIntosh, Leslie D; Shrivastava, Savita; Nguyen, Tudung T; Payton, Jacqueline E; Watson, Mark A; Crosby, Seth D; Head, Richard D; Mitra, Robi D; Nagarajan, Rakesh; Kulkarni, Shashikant; Seibert, Karen; Virgin, Herbert W; Milbrandt, Jeffrey; Pfeifer, John D

2014-01-01

Currently, oncology testing includes molecular studies and cytogenetic analysis to detect genetic aberrations of clinical significance. Next-generation sequencing (NGS) allows rapid analysis of multiple genes for clinically actionable somatic variants. The WUCaMP assay uses targeted capture for NGS analysis of 25 cancer-associated genes to detect mutations at actionable loci. We present clinical validation of the assay and a detailed framework for design and validation of similar clinical assays. Deep sequencing of 78 tumor specimens (≥ 1000× average unique coverage across the capture region) achieved high sensitivity for detecting somatic variants at low allele fraction (AF). Validation revealed sensitivities and specificities of 100% for detection of single-nucleotide variants (SNVs) within coding regions, compared with SNP array sequence data (95% CI = 83.4-100.0 for sensitivity and 94.2-100.0 for specificity) or whole-genome sequencing (95% CI = 89.1-100.0 for sensitivity and 99.9-100.0 for specificity) of HapMap samples. Sensitivity for detecting variants at an observed 10% AF was 100% (95% CI = 93.2-100.0) in HapMap mixes. Analysis of 15 masked specimens harboring clinically reported variants yielded concordant calls for 13/13 variants at AF of ≥ 15%. The WUCaMP assay is a robust and sensitive method to detect somatic variants of clinical significance in molecular oncology laboratories, with reduced time and cost of genetic analysis allowing for strategic patient management. Copyright © 2014 American Society for Investigative Pathology and the Association for Molecular Pathology. Published by Elsevier Inc. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Golbus, Jessica R.; Puckelwartz, Megan J.; Dellefave-Castillo, Lisa

Background: Cardiomyopathy is highly heritable but genetically diverse. At present, genetic testing for cardiomyopathy uses targeted sequencing to simultaneously assess the coding regions of more than 50 genes. New genes are routinely added to panels to improve the diagnostic yield. With the anticipated $1000 genome, it is expected that genetic testing will shift towards comprehensive genome sequencing accompanied by targeted gene analysis. Therefore, we assessed the reliability of whole genome sequencing and targeted analysis to identify cardiomyopathy variants in 11 subjects with cardiomyopathy. Methods and Results: Whole genome sequencing with an average of 37× coverage was combined with targeted analysis focused on 204 genes linked to cardiomyopathy. Genetic variants were scored using multiple prediction algorithms combined with frequency data from public databases. This pipeline yielded 1-14 potentially pathogenic variants per individual. Variants were further analyzed using clinical criteria and/or segregation analysis. Three of three previously identified primary mutations were detected by this analysis. In six subjects for whom the primary mutation was previously unknown, we identified mutations that segregated with disease, had clinical correlates, and/or had additional pathological correlation to provide evidence for causality. For two subjects with previously known primary mutations, we identified additional variants that may act as modifiers of disease severity. In total, we identified the likely pathological mutation in 9 of 11 (82%) subjects. We conclude that these pilot data demonstrate that ~30-40× coverage whole genome sequencing combined with targeted analysis is feasible and sensitive to identify rare variants in cardiomyopathy-associated genes.
Whole Exome Sequencing of Pediatric Gastric Adenocarcinoma Reveals an Atypical Presentation of Li-Fraumeni Syndrome

PubMed Central

Chang, Vivian Y.; Federman, Noah; Martinez-Agosto, Julian; Tatishchev, Sergei F.; Nelson, Stanley F.

2014-01-01

Background: Gastric adenocarcinoma is a rare diagnosis in childhood. A 14-year-old male patient presented with metastatic gastric adenocarcinoma and a strong family history of colon cancer. Clinical sequencing of CDH1 and APC was negative. Whole exome sequencing was therefore applied to capture the majority of protein-coding regions for the identification of single-nucleotide variants, small insertion/deletions, and copy number abnormalities in the patient’s germline as well as primary tumor. Materials and Methods: DNA was extracted from the patient’s blood, primary tumor, and the unaffected mother’s blood. DNA libraries were constructed and sequenced on Illumina HiSeq2000. Data were post-processed using Picard and Samtools, then analyzed with the Genome Analysis Toolkit. Variants were annotated using an in-house Ensembl-based program. Copy number was assessed using ExomeCNV. Results: Each sample was sequenced to a mean depth of coverage of greater than 120×. A rare non-synonymous coding SNV in TP53 was identified in the germline. There were 10 somatic cancer protein-damaging variants that were not observed in the unaffected mother genome. ExomeCNV, comparing tumor to the patient’s germline, identified abnormal copy number spanning 6,946 genes. Conclusion: We present an unusual case of Li-Fraumeni detected by whole exome sequencing. There were also likely driver somatic mutations in the gastric adenocarcinoma. These results highlight the need for more thorough and broad scale germline and cancer analyses to accurately inform patients of inherited risk to cancer and to identify somatic mutations. PMID:23015295

Targeted Analysis of Whole Genome Sequence Data to Diagnose Genetic Cardiomyopathy

DOE PAGES

Golbus, Jessica R.; Puckelwartz, Megan J.; Dellefave-Castillo, Lisa; ...

2014-09-01

Background: Cardiomyopathy is highly heritable but genetically diverse. At present, genetic testing for cardiomyopathy uses targeted sequencing to simultaneously assess the coding regions of more than 50 genes. New genes are routinely added to panels to improve the diagnostic yield. With the anticipated $1000 genome, it is expected that genetic testing will shift towards comprehensive genome sequencing accompanied by targeted gene analysis. Therefore, we assessed the reliability of whole genome sequencing and targeted analysis to identify cardiomyopathy variants in 11 subjects with cardiomyopathy. Methods and Results: Whole genome sequencing with an average of 37× coverage was combined with targeted analysis focused on 204 genes linked to cardiomyopathy. Genetic variants were scored using multiple prediction algorithms combined with frequency data from public databases. This pipeline yielded 1-14 potentially pathogenic variants per individual. Variants were further analyzed using clinical criteria and/or segregation analysis. Three of three previously identified primary mutations were detected by this analysis. In six subjects for whom the primary mutation was previously unknown, we identified mutations that segregated with disease, had clinical correlates, and/or had additional pathological correlation to provide evidence for causality. For two subjects with previously known primary mutations, we identified additional variants that may act as modifiers of disease severity. In total, we identified the likely pathological mutation in 9 of 11 (82%) subjects. We conclude that these pilot data demonstrate that ~30-40× coverage whole genome sequencing combined with targeted analysis is feasible and sensitive to identify rare variants in cardiomyopathy-associated genes.
Sex is a moderator of the association between NOS1AP sequence variants and QTc in two long QT syndrome founder populations: a pedigree-based measured genotype association analysis.

PubMed

Winbo, Annika; Stattin, Eva-Lena; Westin, Ida Maria; Norberg, Anna; Persson, Johan; Jensen, Steen M; Rydberg, Annika

2017-07-18

Sequence variants in the NOS1AP gene have repeatedly been reported to influence QTc, albeit with moderate effect sizes. In the long QT syndrome (LQTS), this may contribute to the substantial QTc variance seen among carriers of identical pathogenic sequence variants. Here we assess three non-coding NOS1AP sequence variants, chosen for their previously reported strong association with QTc in normal and LQTS populations, for association with QTc in two Swedish LQT1 founder populations. This study included 312 individuals (58% females) from two LQT1 founder populations, whereof 227 genotype positive segregating either Y111C (n = 148) or R518* (n = 79) pathogenic sequence variants in the KCNQ1 gene, and 85 genotype negatives. All were genotyped for NOS1AP sequence variants rs12143842, rs16847548 and rs4657139, and tested for association with QTc length (effect size presented as mean difference between derived and wildtype, in ms), using a pedigree-based measured genotype association analysis. Mean QTc was obtained by repeated manual measurement (preferably in lead II) by one observer using coded 50 mm/s standard 12-lead ECGs. A substantial variance in mean QTc was seen in genotype positives 476 ± 36 ms (Y111C 483 ± 34 ms; R518* 462 ± 34 ms) and genotype negatives 433 ± 24 ms. Female sex was significantly associated with QTc prolongation in all genotype groups (p < 0.001). In a multivariable analysis including the entire study population and adjusted for KCNQ1 genotype, sex and age, NOS1AP sequence variants rs12143842 and rs16847548 (but not rs4657139) were significantly associated with QT prolongation, +18 ms (p = 0.0007) and +17 ms (p = 0.006), respectively. Significant sex-interactions were detected for both sequence variants (interaction term r = 0.892, p < 0.001 and r = 0.944, p < 0.001, respectively). Notably, across the genotype groups, when stratified by sex neither rs12143842 nor rs16847548 was significantly associated with QTc in females (both p = 0.16) while in males, a prolongation of +19 ms and +8 ms (p = 0.002 and p = 0.02) was seen in multivariable analysis, explaining up to 23% of QTc variance in all males. Sex was identified as a moderator of the association between NOS1AP sequence variants and QTc in two LQT1 founder populations. This finding may contribute to QTc sex differences and affect the usefulness of NOS1AP as a marker for clinical risk stratification in LQTS.
A Python package for parsing, validating, mapping and formatting sequence variants using HGVS nomenclature.

PubMed

Hart, Reece K; Rico, Rudolph; Hare, Emily; Garcia, John; Westbrook, Jody; Fusaro, Vincent A

2015-01-15

Biological sequence variants are commonly represented in scientific literature, clinical reports and databases of variation using the mutation nomenclature guidelines endorsed by the Human Genome Variation Society (HGVS). Despite the widespread use of the standard, no freely available and comprehensive programming libraries are available. Here we report an open-source and easy-to-use Python library that facilitates the parsing, manipulation, formatting and validation of variants according to the HGVS specification. The current implementation focuses on the subset of the HGVS recommendations that precisely describe sequence-level variation relevant to the application of high-throughput sequencing to clinical diagnostics. The package is released under the Apache 2.0 open-source license. Source code, documentation and issue tracking are available at http://bitbucket.org/hgvs/hgvs/. Python packages are available at PyPI (https://pypi.python.org/pypi/hgvs). Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press.
A Python package for parsing, validating, mapping and formatting sequence variants using HGVS nomenclature

PubMed Central

Hart, Reece K.; Rico, Rudolph; Hare, Emily; Garcia, John; Westbrook, Jody; Fusaro, Vincent A.

2015-01-01

Summary: Biological sequence variants are commonly represented in scientific literature, clinical reports and databases of variation using the mutation nomenclature guidelines endorsed by the Human Genome Variation Society (HGVS). Despite the widespread use of the standard, no freely available and comprehensive programming libraries are available. Here we report an open-source and easy-to-use Python library that facilitates the parsing, manipulation, formatting and validation of variants according to the HGVS specification. The current implementation focuses on the subset of the HGVS recommendations that precisely describe sequence-level variation relevant to the application of high-throughput sequencing to clinical diagnostics. Availability and implementation: The package is released under the Apache 2.0 open-source license. Source code, documentation and issue tracking are available at http://bitbucket.org/hgvs/hgvs/. Python packages are available at PyPI (https://pypi.python.org/pypi/hgvs). Contact: reecehart@gmail.com Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25273102
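To give a rough idea of what parsing, validating and formatting an HGVS string involves, the sketch below handles only the simplest case, a single-nucleotide substitution such as c.2657+5G>A, using a regular expression. It is not the hgvs package's API: the names SimpleSubstitution, parse_simple_substitution and format_simple_substitution, the regular expression, and the placeholder accession NM_000000.0 are all assumptions made for illustration.

```python
import re
from typing import NamedTuple, Optional


class SimpleSubstitution(NamedTuple):
    """Hypothetical container for one HGVS-style substitution."""
    accession: str  # reference sequence identifier (placeholder values here)
    kind: str       # coordinate type: "c" coding, "g" genomic, etc.
    position: int   # base position in that coordinate system
    offset: int     # intronic offset; 0 when the position is exonic
    ref: str        # reference base
    alt: str        # alternate base


# Assumed, deliberately narrow grammar: accession ":" type "." position
# [offset] ref ">" alt. Real HGVS covers far more than this.
_PATTERN = re.compile(
    r"^(?P<acc>[A-Za-z_]+\d+(?:\.\d+)?)"
    r":(?P<kind>[cgmnr])\."
    r"(?P<pos>\d+)(?P<off>[+-]\d+)?"
    r"(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)


def parse_simple_substitution(text: str) -> Optional[SimpleSubstitution]:
    """Return the parsed variant, or None if the string is not a valid
    simple substitution under the assumed grammar."""
    m = _PATTERN.match(text.strip())
    if m is None:
        return None
    if m.group("ref") == m.group("alt"):
        # Minimal validation: a substitution must change the base.
        return None
    return SimpleSubstitution(
        accession=m.group("acc"),
        kind=m.group("kind"),
        position=int(m.group("pos")),
        offset=int(m.group("off") or 0),
        ref=m.group("ref"),
        alt=m.group("alt"),
    )


def format_simple_substitution(v: SimpleSubstitution) -> str:
    """Format the variant back into the same string form."""
    off = f"{v.offset:+d}" if v.offset else ""
    return f"{v.accession}:{v.kind}.{v.position}{off}{v.ref}>{v.alt}"
```

Parsing into a structured record first and formatting from that record keeps validation in one place; a full implementation would also need reference-sequence lookups to check that the stated reference base is correct.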
Association of Amine-Receptor DNA Sequence Variants with Associative Learning in the Honeybee.

PubMed

Lagisz, Malgorzata; Mercer, Alison R; de Mouzon, Charlotte; Santos, Luana L S; Nakagawa, Shinichi

2016-03-01

Octopamine- and dopamine-based neuromodulatory systems play a critical role in learning and learning-related behaviour in insects. To further our understanding of these systems and resulting phenotypes, we quantified DNA sequence variations at six loci encoding octopamine and dopamine receptors and their association with aversive and appetitive learning traits in a population of honeybees. We identified 79 polymorphic sequence markers (mostly SNPs and a few insertions/deletions) located within or close to six candidate genes. Intriguingly, we found that levels of sequence variation in the protein-coding regions studied were low, indicating that sequence variation in the coding regions of receptor genes critical to learning and memory is strongly selected against. Non-coding and upstream regions of the same genes, however, were less conserved, and sequence variations in these regions were weakly associated with between-individual differences in learning-related traits. While these associations do not directly imply a specific molecular mechanism, they suggest that the cross-talk between dopamine and octopamine signalling pathways may influence olfactory learning and memory in the honeybee.
Rare missense variants in CHRNB3 and CHRNA3 are associated with risk of alcohol and cocaine dependence.

PubMed

Haller, Gabe; Kapoor, Manav; Budde, John; Xuei, Xiaoling; Edenberg, Howard; Nurnberger, John; Kramer, John; Brooks, Andy; Tischfield, Jay; Almasy, Laura; Agrawal, Arpana; Bucholz, Kathleen; Rice, John; Saccone, Nancy; Bierut, Laura; Goate, Alison

2014-02-01

Previous findings have demonstrated that variants in nicotinic receptor genes are associated with nicotine, alcohol and cocaine dependence. Because of the substantial comorbidity, it has often been unclear whether a variant is associated with multiple substances or whether the association is actually with a single substance. To investigate the possible contribution of rare variants to the development of substance dependencies other than nicotine dependence, specifically alcohol and cocaine dependence, we undertook pooled sequencing of the coding regions and flanking sequence of CHRNA5, CHRNA3, CHRNB4, CHRNA6 and CHRNB3 in 287 African American and 1028 European American individuals from the Collaborative Study of the Genetics of Alcoholism (COGA). All members of families for whom any individual was sequenced (2504 African Americans and 7318 European Americans) were then genotyped for all variants identified by sequencing. For each gene, we then tested for association using FamSKAT. For European Americans, we find increased DSM-IV cocaine dependence symptoms (FamSKAT P = 2 × 10(-4)) and increased DSM-IV alcohol dependence symptoms (FamSKAT P = 5 × 10(-4)) among carriers of missense variants in CHRNB3. Additionally, one variant (rs149775276; H329Y) shows association with both cocaine dependence symptoms (P = 7.4 × 10(-5), β = 2.04) and alcohol dependence symptoms (P = 2.6 × 10(-4), β = 2.04). For African Americans, we find decreased cocaine dependence symptoms among carriers of missense variants in CHRNA3 (FamSKAT P = 0.005). Replication in an independent sample supports the role of rare variants in CHRNB3 and alcohol dependence (P = 0.006). These are the first results to implicate rare variants in CHRNB3 or CHRNA3 in risk for alcohol dependence or cocaine dependence.
Jannovar: a java library for exome annotation.

PubMed

Jäger, Marten; Wang, Kai; Bauer, Sebastian; Smedley, Damian; Krawitz, Peter; Robinson, Peter N

2014-05-01

Transcript-based annotation and pedigree analysis are two basic steps in the computational analysis of whole-exome sequencing experiments in genetic diagnostics and disease-gene discovery projects. Here, we present Jannovar, a stand-alone Java application as well as a Java library designed to be used in larger software frameworks for exome and genome analysis. Jannovar uses an interval tree to identify all transcripts affected by a given variant, and provides Human Genome Variation Society-compliant annotations both for variants affecting coding sequences and splice junctions as well as untranslated regions and noncoding RNA transcripts. Jannovar can also perform family-based pedigree analysis with Variant Call Format (VCF) files with data from members of a family segregating a Mendelian disorder. Using a desktop computer, Jannovar requires a few seconds to annotate a typical VCF file with exome data. Jannovar is freely available under the BSD2 license. Source code as well as the Java application and library file can be downloaded from http://compbio.charite.de (with tutorial) and https://github.com/charite/jannovar. © 2014 WILEY PERIODICALS, INC.
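The interval-tree lookup mentioned above can be sketched in a few lines. The following is a minimal centered interval tree in Python, not Jannovar's Java implementation; the class names Transcript and IntervalNode and the transcript coordinates used with it are hypothetical and chosen only to show how all transcripts overlapping a variant position can be retrieved without scanning every transcript.

```python
from typing import List, NamedTuple, Optional


class Transcript(NamedTuple):
    """Hypothetical transcript record with inclusive coordinates."""
    name: str
    start: int
    end: int


class IntervalNode:
    """One node of a centered interval tree over a non-empty list."""

    def __init__(self, intervals: List[Transcript]):
        # Use the median endpoint as the center so that at least one
        # interval always contains it and the recursion terminates.
        points = sorted(p for t in intervals for p in (t.start, t.end))
        self.center = points[len(points) // 2]
        here = [t for t in intervals if t.start <= self.center <= t.end]
        left = [t for t in intervals if t.end < self.center]
        right = [t for t in intervals if t.start > self.center]
        # Two sort orders allow early exit when scanning this node.
        self.by_start = sorted(here, key=lambda t: t.start)
        self.by_end = sorted(here, key=lambda t: t.end, reverse=True)
        self.left: Optional["IntervalNode"] = IntervalNode(left) if left else None
        self.right: Optional["IntervalNode"] = IntervalNode(right) if right else None

    def query(self, pos: int) -> List[Transcript]:
        """Return every transcript whose interval contains pos."""
        hits: List[Transcript] = []
        if pos < self.center:
            for t in self.by_start:
                if t.start > pos:
                    break
                hits.append(t)
            if self.left is not None:
                hits.extend(self.left.query(pos))
        elif pos > self.center:
            for t in self.by_end:
                if t.end < pos:
                    break
                hits.append(t)
            if self.right is not None:
                hits.extend(self.right.query(pos))
        else:
            hits.extend(self.by_start)
        return hits
```

For example, with hypothetical transcripts TX_A (100 to 500), TX_B (400 to 900) and TX_C (1000 to 1200), a variant at position 450 returns TX_A and TX_B, while a variant at position 950 returns nothing.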
TP53 Germline Variations Influence the Predisposition and Prognosis of B-Cell Acute Lymphoblastic Leukemia in Children

PubMed Central

Qian, Maoxiang; Cao, Xueyuan; Devidas, Meenakshi; Yang, Wenjian; Cheng, Cheng; Dai, Yunfeng; Carroll, Andrew; Heerema, Nyla A.; Zhang, Hui; Moriyama, Takaya; Gastier-Foster, Julie M.; Xu, Heng; Raetz, Elizabeth; Larsen, Eric; Winick, Naomi; Bowman, W. Paul; Martin, Paul L.; Mardis, Elaine R.; Fulton, Robert; Zambetti, Gerard; Borowitz, Michael; Wood, Brent; Nichols, Kim E.; Carroll, William L.; Pui, Ching-Hon; Mullighan, Charles G.; Evans, William E.; Hunger, Stephen P.; Relling, Mary V.; Loh, Mignon L.

2018-01-01

Purpose: Germline TP53 variation is the genetic basis of Li-Fraumeni syndrome, a highly penetrant cancer predisposition condition. Recent reports of germline TP53 variants in childhood hypodiploid acute lymphoblastic leukemia (ALL) suggest that this type of leukemia is another manifestation of Li-Fraumeni syndrome; however, the pattern, prevalence, and clinical relevance of TP53 variants in childhood ALL remain unknown. Patients and Methods: Targeted sequencing of TP53 coding regions was performed in 3,801 children from the Children’s Oncology Group frontline ALL clinical trials, AALL0232 and P9900. TP53 variant pathogenicity was evaluated according to experimentally determined transcriptional activity, in silico prediction of damaging effects, and prevalence in non-ALL control populations. TP53 variants were analyzed for their association with ALL presenting features and treatment outcomes. Results: We identified 49 unique nonsilent rare TP53 coding variants in 77 (2.0%) of 3,801 patients sequenced, of which 22 variants were classified as pathogenic. TP53 pathogenic variants were significantly over-represented in ALL compared with non-ALL controls (odds ratio, 5.2; P < .001). Children with TP53 pathogenic variants were significantly older at ALL diagnosis (median age, 15.5 years v 7.3 years; P < .001) and were more likely to have hypodiploid ALL (65.4% v 1.2%; P < .001). Carrying germline TP53 pathogenic variants was associated with inferior event-free survival and overall survival (hazard ratio, 4.2 and 3.9; P < .001 and .001, respectively). In particular, children with TP53 pathogenic variants were at a dramatically higher risk of second cancers than those without pathogenic variants, with 5-year cumulative incidence of 25.1% and 0.7% (P < .001), respectively. Conclusion: Loss-of-function germline TP53 variants predispose children to ALL and to adverse treatment outcomes with ALL therapy, particularly the risk of second malignant neoplasms. PMID:29300620
Rapid-Onset Obesity with Hypothalamic Dysfunction, Hypoventilation, and Autonomic Dysregulation (ROHHAD): exome sequencing of trios, monozygotic twins and tumours.

PubMed

Barclay, Sarah F; Rand, Casey M; Borch, Lauren A; Nguyen, Lisa; Gray, Paul A; Gibson, William T; Wilson, Richard J A; Gordon, Paul M K; Aung, Zaw; Berry-Kravis, Elizabeth M; Ize-Ludlow, Diego; Weese-Mayer, Debra E; Bech-Hansen, N Torben

2015-08-25

Rapid-onset Obesity with Hypothalamic Dysfunction, Hypoventilation, and Autonomic Dysregulation (ROHHAD) is thought to be a genetic disease caused by de novo mutations, though causative mutations have yet to be identified. We searched for de novo coding mutations among a carefully-diagnosed and clinically homogeneous cohort of 35 ROHHAD patients. We sequenced the exomes of seven ROHHAD trios, plus tumours from four of these patients and the unaffected monozygotic (MZ) twin of one (discovery cohort), to identify constitutional and somatic de novo sequence variants. We further analyzed this exome data to search for candidate genes under autosomal dominant and recessive models, and to identify structural variations. Candidate genes were tested by exome or Sanger sequencing in a replication cohort of 28 ROHHAD singletons. The analysis of the trio-based exomes found 13 de novo variants. However, no two patients had de novo variants in the same gene, and additional patient exomes and mutation analysis in the replication cohort did not provide strong genetic evidence to implicate any of these sequence variants in ROHHAD. Somatic comparisons revealed no coding differences between any blood and tumour samples, or between the two discordant MZ twins. Neither autosomal dominant nor recessive analysis yielded candidate genes for ROHHAD, and we did not identify any potentially causative structural variations. Clinical exome sequencing is highly unlikely to be a useful diagnostic test in patients with true ROHHAD. As ROHHAD has a high risk for fatality if not properly managed, it remains imperative to expand the search for non-exomic genetic risk factors, as well as to investigate other possible mechanisms of disease. In so doing, we will be able to confirm objectively the ROHHAD diagnosis and to contribute to our understanding of obesity, respiratory control, hypothalamic function, and autonomic regulation.
708 Common and 2010 rare DISC1 locus variants identified in 1542 subjects: analysis for association with psychiatric disorder and cognitive traits.

PubMed

Thomson, P A; Parla, J S; McRae, A F; Kramer, M; Ramakrishnan, K; Yao, J; Soares, D C; McCarthy, S; Morris, S W; Cardone, L; Cass, S; Ghiban, E; Hennah, W; Evans, K L; Rebolini, D; Millar, J K; Harris, S E; Starr, J M; MacIntyre, D J; McIntosh, A M; Watson, J D; Deary, I J; Visscher, P M; Blackwood, D H; McCombie, W R; Porteous, D J

2014-06-01

A balanced t(1;11) translocation that transects the Disrupted in schizophrenia 1 (DISC1) gene shows genome-wide significant linkage for schizophrenia and recurrent major depressive disorder (rMDD) in a single large Scottish family, but genome-wide and exome sequencing-based association studies have not supported a role for DISC1 in psychiatric illness. To explore DISC1 in more detail, we sequenced 528 kb of the DISC1 locus in 653 cases and 889 controls. We report 2718 validated single-nucleotide polymorphisms (SNPs) of which 2010 have a minor allele frequency of <1%. Only 38% of these variants are reported in the 1000 Genomes Project European subset. This suggests that many DISC1 SNPs remain undiscovered and are essentially private. Rare coding variants identified exclusively in patients were found in likely functional protein domains. Significant region-wide association was observed between rs16856199 and rMDD (P=0.026, unadjusted P=6.3 × 10(-5), OR=3.48). This was not replicated in additional recurrent major depression samples (replication P=0.11). Combined analysis of both the original and replication set supported the original association (P=0.0058, OR=1.46). Evidence for segregation of this variant with disease in families was limited to those of rMDD individuals referred from primary care. Burden analysis for coding and non-coding variants gave nominal associations with diagnosis and measures of mood and cognition. Together, these observations are likely to generalise to other candidate genes for major mental illness and may thus provide guidelines for the design of future studies.
G2S: a web-service for annotating genomic variants on 3D protein structures.

PubMed

Wang, Juexin; Sheridan, Robert; Sumer, S Onur; Schultz, Nikolaus; Xu, Dong; Gao, Jianjiong

2018-06-01

Accurately mapping and annotating genomic locations on 3D protein structures is a key step in structure-based analysis of genomic variants detected by recent large-scale sequencing efforts. There are several mapping resources currently available, but none of them provides a web API (Application Programming Interface) that supports programmatic access. We present G2S, a real-time web API that provides automated mapping of genomic variants on 3D protein structures. G2S can align genomic locations of variants, protein locations, or protein sequences to protein structures and retrieve the mapped residues from structures. The G2S API uses a REST-inspired design and can be used by various clients such as web browsers, command terminals, programming languages and other bioinformatics tools for bringing 3D structures into genomic variant analysis. The webserver and source code are freely available at https://g2s.genomenexus.org. Contact: g2s@genomenexus.org. Supplementary data are available at Bioinformatics online.
Rare, protein-truncating variants in ATM, CHEK2 and PALB2, but not XRCC2, are associated with increased breast cancer risks.

PubMed

Decker, Brennan; Allen, Jamie; Luccarini, Craig; Pooley, Karen A; Shah, Mitul; Bolla, Manjeet K; Wang, Qin; Ahmed, Shahana; Baynes, Caroline; Conroy, Don M; Brown, Judith; Luben, Robert; Ostrander, Elaine A; Pharoah, Paul Dp; Dunning, Alison M; Easton, Douglas F

2017-11-01

Breast cancer (BC) is the most common malignancy in women and has a major heritable component. The risks associated with most rare susceptibility variants are not well estimated. To better characterise the contribution of variants in ATM, CHEK2, PALB2 and XRCC2, we sequenced their coding regions in 13 087 BC cases and 5488 controls from East Anglia, UK. Gene coding regions were enriched via PCR, sequenced, variant called and filtered for quality. ORs for BC risk were estimated separately for carriers of truncating variants and of rare missense variants, which were further subdivided by functional domain and pathogenicity as predicted by four in silico algorithms. Truncating variants in PALB2 (OR=4.69, 95% CI 2.27 to 9.68), ATM (OR=3.26; 95% CI 1.82 to 6.46) and CHEK2 (OR=3.11; 95% CI 2.15 to 4.69), but not XRCC2 (OR=0.94; 95% CI 0.26 to 4.19), were associated with increased BC risk. Truncating variants in ATM and CHEK2 were more strongly associated with risk of oestrogen receptor (ER)-positive than ER-negative disease, while those in PALB2 were associated with similar risks for both subtypes. There was also some evidence that missense variants in ATM, CHEK2 and PALB2 may contribute to BC risk, but larger studies are necessary to quantify the magnitude of this effect. Truncating variants in PALB2 are associated with a higher risk of BC than those in ATM or CHEK2. A substantial risk of BC due to truncating XRCC2 variants can be excluded. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
A novel recurrent mutation in MITF predisposes to familial and sporadic melanoma

PubMed Central

Yokoyama, Satoru; Woods, Susan L.; Boyle, Glen M.; Aoude, Lauren G.; MacGregor, Stuart; Zismann, Victoria; Gartside, Michael; Cust, Anne E.; Haq, Rizwan; Harland, Mark; Taylor, John C.; Duffy, David L.; Holohan, Kelly; Dutton-Regester, Ken; Palmer, Jane M.; Bonazzi, Vanessa; Stark, Mitchell S.; Symmons, Judith; Law, Matthew H.; Schmidt, Christopher; Lanagan, Cathy; O’Connor, Linda; Holland, Elizabeth A.; Schmid, Helen; Maskiell, Judith A.; Jetann, Jodie; Ferguson, Megan; Jenkins, Mark A.; Kefford, Richard F.; Giles, Graham G.; Armstrong, Bruce K.; Aitken, Joanne F.; Hopper, John L.; Whiteman, David C.; Pharoah, Paul D.; Easton, Douglas F.; Dunning, Alison M.; Newton-Bishop, Julia A.; Montgomery, Grant W.; Martin, Nicholas G.; Mann, Graham J.; Bishop, D. Timothy; Tsao, Hensin; Trent, Jeffrey M.; Fisher, David E.; Hayward, Nicholas K.; Brown, Kevin M.

2012-01-01

So far, two familial melanoma genes have been identified, accounting for a minority of genetic risk in families. Mutations in CDKN2A account for approximately 40% of familial cases [1], and predisposing mutations in CDK4 have been reported in a very small number of melanoma kindreds [2]. To identify other familial melanoma genes, here we conducted whole-genome sequencing of probands from several melanoma families, identifying one individual carrying a novel germline variant (coding DNA sequence c.G1075A; protein sequence p.E318K; rs149617956) in the melanoma-lineage-specific oncogene microphthalmia-associated transcription factor (MITF). Although the variant co-segregated with melanoma in some but not all cases in the family, linkage analysis of 31 families subsequently identified to carry the variant generated a logarithm of odds (lod) score of 2.7 under a dominant model, indicating E318K as a possible intermediate risk variant. Consistent with this, the E318K variant was significantly associated with melanoma in a large Australian case–control sample. Likewise, it was similarly associated in an independent case–control sample from the United Kingdom. In the Australian sample, the variant allele was significantly over-represented in cases with a family history of melanoma, multiple primary melanomas, or both. The variant allele was also associated with increased naevus count and non-blue eye colour. Functional analysis of E318K showed that MITF encoded by the variant allele had impaired sumoylation and differentially regulated several MITF targets. These data indicate that MITF is a melanoma-predisposition gene and highlight the utility of whole-genome sequencing to identify novel rare variants associated with disease susceptibility. PMID:22080950
Pleiotropic Effects of Variants in Dementia Genes in Parkinson Disease.

PubMed

Ibanez, Laura; Dube, Umber; Davis, Albert A; Fernandez, Maria V; Budde, John; Cooper, Breanna; Diez-Fairen, Monica; Ortega-Cubero, Sara; Pastor, Pau; Perlmutter, Joel S; Cruchaga, Carlos; Benitez, Bruno A

2018-01-01

Background: The prevalence of dementia in Parkinson disease (PD) increases dramatically with advancing age, approaching 80% in patients who survive 20 years with the disease. Increasing evidence suggests clinical, pathological and genetic overlap between Alzheimer disease, dementia with Lewy bodies and frontotemporal dementia with PD. However, the contribution of the dementia-causing genes to PD risk, cognitive impairment and dementia in PD is not fully established. Objective: To assess the contribution of coding variants in Mendelian dementia-causing genes to the risk of developing PD and their effect on the cognitive performance of PD patients. Methods: We analyzed the coding regions of the amyloid-beta precursor protein (APP), Presenilin 1 and 2 (PSEN1, PSEN2), and Granulin (GRN) genes from 1,374 PD cases and 973 controls using pooled-DNA targeted sequencing, human exome-chip and whole-exome sequencing (WES) data by single-variant and gene-based (SKAT-O and burden tests) analyses. Global cognitive function was assessed using the Mini-Mental State Examination (MMSE) or the Montreal Cognitive Assessment (MoCA). The effect of coding variants in dementia-causing genes on cognitive performance was tested by multiple regression analysis adjusting for gender, disease duration, age at dementia assessment, study site and APOE carrier status. Results: Known AD pathogenic mutations in the PSEN1 (p.A79V) and PSEN2 (p.V148I) genes were found in 0.3% of all PD patients. There was a significant burden of rare, likely damaging variants in the GRN and PSEN1 genes in PD patients when compared with frequencies in the European population from the ExAC database. Multiple regression analysis revealed that PD patients carrying rare variants in the APP, PSEN1, PSEN2, and GRN genes exhibit lower cognitive test scores than non-carrier PD patients (p = 2.0 × 10(-4)), independent of age at PD diagnosis, age at evaluation, APOE status or recruitment site. 
Conclusions: Pathogenic mutations in the Alzheimer disease-causing genes (PSEN1 and PSEN2) are found in sporadic PD patients. PD patients with cognitive decline carry rare variants in dementia-causing genes. Variants in genes causing Mendelian neurodegenerative diseases exhibit pleiotropic effects.
Molecular Diagnosis of Cystic Fibrosis.

PubMed

Deignan, Joshua L; Grody, Wayne W

2016-01-01

This unit describes a recommended approach to identifying causal genetic variants in an individual suspected of having cystic fibrosis. An introduction to the genetics and clinical presentation of cystic fibrosis is initially presented, followed by a description of the two main strategies used in the molecular diagnosis of cystic fibrosis: (1) an initial targeted variant panel used to detect only the most common cystic fibrosis-causing variants in the CFTR gene, and (2) sequencing of the entire coding region of the CFTR gene to detect additional rare causal CFTR variants. Finally, the unit concludes with a discussion regarding the analytic and clinical validity of these approaches. Copyright © 2016 John Wiley & Sons, Inc.
Whole Exome Sequencing Identifies Rare Protein-Coding Variants in Behçet's Disease.

PubMed

Ognenovski, Mikhail; Renauer, Paul; Gensterblum, Elizabeth; Kötter, Ina; Xenitidis, Theodoros; Henes, Jörg C; Casali, Bruno; Salvarani, Carlo; Direskeneli, Haner; Kaufman, Kenneth M; Sawalha, Amr H

2016-05-01

Behçet's disease (BD) is a systemic inflammatory disease with an incompletely understood etiology. Despite the identification of multiple common genetic variants associated with BD, rare genetic variants have been less explored. We undertook this study to investigate the role of rare variants in BD by performing whole exome sequencing in BD patients of European descent. Whole exome sequencing was performed in a discovery set comprising 14 German BD patients of European descent. For replication and validation, Sanger sequencing and Sequenom genotyping were performed in the discovery set and in 2 additional independent sets of 49 German BD patients and 129 Italian BD patients of European descent. Genetic association analysis was then performed in BD patients and 503 controls of European descent. Functional effects of associated genetic variants were assessed using bioinformatic approaches. Using whole exome sequencing, we identified 77 rare variants (in 74 genes) with predicted protein-damaging effects in BD. These variants were genotyped in 2 additional patient sets and then analyzed to reveal significant associations with BD at 2 genetic variants detected in all 3 patient sets that remained significant after Bonferroni correction. We detected genetic association between BD and LIMK2 (rs149034313), involved in regulating cytoskeletal reorganization, and between BD and NEIL1 (rs5745908), involved in base excision DNA repair (P = 3.22 × 10(-4) and P = 5.16 × 10(-4), respectively). The LIMK2 association is a missense variant with predicted protein damage that may influence functional interactions with proteins involved in cytoskeletal regulation by Rho GTPase, inflammation mediated by chemokine and cytokine signaling pathways, T cell activation, and angiogenesis (Bonferroni-corrected P = 5.63 × 10(-14), P = 7.29 × 10(-6), P = 1.15 × 10(-5), and P = 6.40 × 10(-3), respectively). 
The genetic association in NEIL1 is a predicted splice donor variant that may introduce a deleterious intron retention and result in a noncoding transcript variant. We used whole exome sequencing in BD for the first time and identified 2 rare putative protein-damaging genetic variants associated with this disease. These genetic variants might influence cytoskeletal regulation and DNA repair mechanisms in BD and might provide further insight into increased leukocyte tissue infiltration and the role of oxidative stress in BD. © 2016, American College of Rheumatology.
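The Bonferroni step described in this abstract can be checked with a short calculation. The sketch below assumes a family-wise significance level of 0.05 and takes the number of tests to be the 77 rare variants carried forward for genotyping; neither value is stated as the correction basis in the abstract, so the resulting threshold is illustrative only. The two P values are the ones reported for LIMK2 and NEIL1.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


ALPHA = 0.05   # assumed family-wise error rate (not given in the abstract)
N_TESTS = 77   # assumed: one test per rare variant carried forward

# Association P values as reported in the abstract.
reported = {
    "LIMK2 rs149034313": 3.22e-4,
    "NEIL1 rs5745908": 5.16e-4,
}

threshold = bonferroni_threshold(ALPHA, N_TESTS)

# True where the reported P value falls below the corrected threshold.
significant = {name: p < threshold for name, p in reported.items()}

for name, p in reported.items():
    print(f"{name}: P = {p:.2e}, threshold = {threshold:.2e}, "
          f"passes = {significant[name]}")
```

Under these assumptions the threshold is 0.05 / 77, roughly 6.5 × 10(-4), and both reported P values fall below it, which is consistent with the statement that both associations survived correction.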
Rare Coding Variants in ANGPTL6 Are Associated with Familial Forms of Intracranial Aneurysm.

PubMed

Bourcier, Romain; Le Scouarnec, Solena; Bonnaud, Stéphanie; Karakachoff, Matilde; Bourcereau, Emmanuelle; Heurtebise-Chrétien, Sandrine; Menguy, Céline; Dina, Christian; Simonet, Floriane; Moles, Alexis; Lenoble, Cédric; Lindenbaum, Pierre; Chatel, Stéphanie; Isidor, Bertrand; Génin, Emmanuelle; Deleuze, Jean-François; Schott, Jean-Jacques; Le Marec, Hervé; Loirand, Gervaise; Desal, Hubert; Redon, Richard

2018-01-04

Intracranial aneurysms (IAs) are acquired cerebrovascular abnormalities characterized by localized dilation and wall thinning in intracranial arteries, possibly leading to subarachnoid hemorrhage and severe outcome in case of rupture. Here, we identified one rare nonsense variant (c.1378A>T) in the last exon of ANGPTL6 (Angiopoietin-Like 6), which encodes a circulating pro-angiogenic factor mainly secreted from the liver. The variant was shared by the four tested affected members of a large pedigree with multiple IA-affected case subjects. We showed a 50% reduction of ANGPTL6 serum concentration in individuals heterozygous for the c.1378A>T allele (p.Lys460Ter) compared to relatives homozygous for the normal allele, probably due to the non-secretion of the truncated protein produced by the c.1378A>T transcripts. Sequencing ANGPTL6 in a series of 94 additional index case subjects with familial IA identified three other rare coding variants in five case subjects. Overall, we detected a significant enrichment (p = 0.023) in rare coding variants within this gene among the 95 index case subjects with familial IA, compared to a reference population of 404 individuals with French ancestry. Among the 6 recruited families, 12 out of 13 (92%) individuals carrying IA also carry such variants in ANGPTL6, versus 15 out of 41 (37%) unaffected ones. We observed a higher rate of individuals with a history of high blood pressure among affected versus healthy individuals carrying ANGPTL6 variants, suggesting that ANGPTL6 could trigger cerebrovascular lesions when combined with other risk factors such as hypertension. Altogether, our results indicate that rare coding variants in ANGPTL6 are causally related to familial forms of IA. Copyright © 2017 American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
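The carrier counts quoted for the six families (12 of 13 affected and 15 of 41 unaffected individuals) can be turned into a 2 × 2 table. The sketch below recomputes the two percentages and adds an odds ratio and a one-sided Fisher exact test; these last two statistics are not reported in the abstract and are shown only as an illustration. The calculation treats individuals as independent, which relatives within families are not, so it should not be read as the authors' analysis.

```python
from math import comb


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio for the table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


def fisher_one_sided(a: int, b: int, c: int, d: int) -> float:
    """P(top-left cell >= a) under the hypergeometric null,
    with all row and column totals held fixed."""
    row1 = a + b            # affected individuals
    col1 = a + c            # carriers
    n = a + b + c + d       # everyone
    k_max = min(row1, col1)
    total = comb(n, row1)
    tail = sum(comb(col1, k) * comb(n - col1, row1 - k)
               for k in range(a, k_max + 1))
    return tail / total


# Counts taken from the abstract.
affected_carriers, affected_total = 12, 13
unaffected_carriers, unaffected_total = 15, 41

a = affected_carriers
b = affected_total - affected_carriers        # affected non-carriers
c = unaffected_carriers
d = unaffected_total - unaffected_carriers    # unaffected non-carriers

pct_affected = round(100 * a / affected_total)
pct_unaffected = round(100 * c / unaffected_total)
or_estimate = odds_ratio(a, b, c, d)
p_value = fisher_one_sided(a, b, c, d)

print(f"carriers among affected: {pct_affected}%")
print(f"carriers among unaffected: {pct_unaffected}%")
print(f"odds ratio: {or_estimate:.1f}, one-sided Fisher P: {p_value:.2g}")
```

The rounded percentages reproduce the 92% and 37% given in the abstract.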
Truncating variants in the majority of the cytoplasmic domain of PCDH15 are unlikely to cause Usher syndrome 1F.

PubMed

Perreault-Micale, Cynthia; Frieden, Alexander; Kennedy, Caleb J; Neitzel, Dana; Sullivan, Jessica; Faulkner, Nicole; Hallam, Stephanie; Greger, Valerie

2014-11-01

Loss of function variants in the PCDH15 gene can cause Usher syndrome type 1F, an autosomal recessive disease associated with profound congenital hearing loss, vestibular dysfunction, and retinitis pigmentosa. The Ashkenazi Jewish population has an increased incidence of Usher syndrome type 1F (founder variant p.Arg245X accounts for 75% of alleles), yet the variant spectrum in a panethnic population remains undetermined. We sequenced the coding region and intron-exon borders of PCDH15 using next-generation DNA sequencing technology in approximately 14,000 patients from fertility clinics. More than 600 unique PCDH15 variants (single nucleotide changes and small indels) were identified, including previously described pathogenic variants p.Arg3X, p.Arg245X (five patients), p.Arg643X, p.Arg929X, and p.Arg1106X. Novel truncating variants were also found, including one in the N-terminal extracellular domain (p.Leu877X), but all other novel truncating variants clustered in the exon 33 encoded C-terminal cytoplasmic domain (52 patients, 14 variants). One variant was observed predominantly in African Americans (carrier frequency of 2.3%). The high incidence of truncating exon 33 variants indicates that they are unlikely to cause Usher syndrome type 1F even though many remove a large portion of the gene. They may be tolerated because PCDH15 has several alternate cytoplasmic domain exons and differentially spliced isoforms may function redundantly. Effects of some PCDH15 truncating variants were addressed by deep sequencing of a panethnic population. Copyright © 2014 American Society for Investigative Pathology and the Association for Molecular Pathology. Published by Elsevier Inc. All rights reserved.
Targeted Deep Sequencing Identifies Rare ‘loss-of-function’ Variants in IFNGR1 for Risk of Atopic Dermatitis Complicated by Eczema Herpeticum

PubMed Central

Gao, Li; Rafaels, Nicholas M; Huang, Lili; Potee, Joseph; Ruczinski, Ingo; Beaty, Terri H.; Paller, Amy S.; Schneider, Lynda C.; Gallo, Rich; Hanifin, Jon M.; Beck, Lisa A.; Geha, Raif S.; Mathias, Rasika A.; Leung, Donald Y. M.

2015-01-01

Background: A subset of atopic dermatitis (AD) is associated with increased susceptibility to eczema herpeticum (ADEH+). We previously reported that common single nucleotide polymorphisms (SNPs) in interferon-gamma (IFNG) and its receptor 1 (IFNGR1) were associated with the ADEH+ phenotype. Objective: To interrogate the role of rare variants in IFN-pathway genes for risk of ADEH+. Methods: We performed targeted sequencing of interferon-pathway genes (IFNG, IFNGR1, IFNAR1 and IL12RB1) in 228 European American (EA) AD patients selected according to their EH status and severity measured by the Eczema Area and Severity Index (EASI). Replication genotyping was performed in independent samples of 219 EA and 333 African Americans (AA). Functional investigation of ‘loss-of-function’ variants was conducted using site-directed mutagenesis. Results: We identified 494 single nucleotide variants (SNVs) encompassing 105 kb of sequence, including 145 common, 349 (70.6%) rare (minor allele frequency (MAF) <5%) and 86 (17.4%) novel variants, of which 2.8% were coding-synonymous, 93.3% were non-coding (64.6% intronic), and 3.8% were missense. We identified six rare IFNGR1 missense variants, including three damaging variants (Val14Met (V14M), Val61Ile and Tyr397Cys (Y397C)), conferring a higher risk for ADEH+ (P=0.031). Variants V14M and Y397C were confirmed to be deleterious, leading to partial IFNGR1 deficiency. Seven common IFNGR1 SNPs, along with common protective haplotypes (2 to 7 SNPs), conferred a reduced risk of ADEH+ (P=0.015-0.002, P=0.0015-0.0004, respectively), and both SNP and haplotype associations were replicated in an independent AA sample (P=0.004-0.0001 and P=0.001-0.0001, respectively). Conclusion: Our results provide evidence that both rare and common genetic variants in the gene encoding IFNGR1 are implicated in susceptibility to the ADEH+ phenotype. 
Capsule Summary: We provided the first evidence that rare functional IFNGR1 mutations contribute to a defective systemic IFN-γ immune response that accounts for the propensity of AD patients to disseminated viral skin infections. PMID:26343451
[Structural organization of 5S ribosomal DNA of Rosa rugosa].

PubMed

Tynkevych, Iu O; Volkov, R A

2014-01-01

In order to clarify the molecular organization of the genomic region encoding 5S rRNA in the diploid species Rosa rugosa, several 5S rDNA repeated units were cloned and sequenced. Analysis of the obtained sequences revealed that only one length variant of 5S rDNA repeated units, which contains intact promoter elements in the intergenic spacer region (IGS) and appears to be transcriptionally active, is present in the genome. Additionally, a limited number of 5S rDNA pseudogenes lacking a portion of coding sequence and the complete IGS was detected. A high level of sequence similarity (from 93.7 to 97.5%) between the IGS of major 5S rDNA variants of East Asian R. rugosa and North American R. nitida was found, indicating comparatively recent divergence of these species.

Experimental Assessment of Splicing Variants Using Expression Minigenes and Comparison with In Silico Predictions

PubMed Central

Sharma, Neeraj; Sosnay, Patrick R.; Ramalho, Anabela S.; Douville, Christopher; Franca, Arianna; Gottschalk, Laura B.; Park, Jeenah; Lee, Melissa; Vecchio-Pagan, Briana; Raraigh, Karen S.; Amaral, Margarida D.; Karchin, Rachel; Cutting, Garry R.

2015-01-01

Assessment of the functional consequences of variants near splice sites is a major challenge in the diagnostic laboratory. To address this issue, we created expression minigenes (EMGs) to determine the RNA and protein products generated by splice site variants (n = 10) implicated in cystic fibrosis (CF). Experimental results were compared with the splicing predictions of eight in silico tools. EMGs containing the full-length Cystic Fibrosis Transmembrane Conductance Regulator (CFTR) coding sequence and flanking intron sequences generated wild-type transcript and fully processed protein in Human Embryonic Kidney (HEK293) and CF bronchial epithelial (CFBE41o-) cells. Quantification of variant induced aberrant mRNA isoforms was concordant using fragment analysis and pyrosequencing. The splicing patterns of c.1585−1G>A and c.2657+5G>A were comparable to those reported in primary cells from individuals bearing these variants. Bioinformatics predictions were consistent with experimental results for 9/10 variants (MES), 8/10 variants (NNSplice), and 7/10 variants (SSAT and Sroogle). Programs that estimate the consequences of mis-splicing predicted 11/16 (HSF and ASSEDA) and 10/16 (Fsplice and SplicePort) experimentally observed mRNA isoforms. EMGs provide a robust experimental approach for clinical interpretation of splice site variants and refinement of in silico tools. PMID:25066652
Two missense mutations in melanocortin 1 receptor (MC1R) are strongly associated with dark ventral coat color in reindeer (Rangifer tarandus).

PubMed

Våge, D I; Nieminen, M; Anderson, D G; Røed, K H

2014-10-01

The protein-coding region of melanocortin 1 receptor (MC1R) was sequenced to identify potential variation affecting coat color in reindeer (Rangifer tarandus). A T→C sequence variation at nucleotide position 218 (c.218T>C) causing an amino acid (aa) change from methionine to threonine at aa position 73 (p.Met73Thr) was identified. In addition, a T→G sequence variation was found at nucleotide position 839 (c.839T>G), causing phenylalanine to be exchanged by cysteine at aa position 280 (p.Phe280Cys). The two sequence variants (c.218C and c.839G) were found to be closely associated with a darker belly coat compared with animals not having any of these two variants. The aa change p.Met73Thr affects the same position as p.Met73Lys previously reported to give constitutive activation of MC1R in black sheep (Ovis aries), whereas p.Phe280Cys is identical to one of two variants previously reported to be associated with dark coat color in Arctic fox (Alopex lagopus), supporting that the two variants found in reindeer are functional. The complete absence of Thr73 and Cys280 among the 51 wild reindeer analyzed provides some evidence that these variants are more common in the domestic herds. © 2014 Stichting International Foundation for Animal Genetics.
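The link between the coding-sequence positions and the amino acid positions quoted in this abstract can be checked with a short sketch. It assumes standard 1-based CDS numbering and the standard genetic code; the function names are illustrative, and the reference codon for Phe280 (TTT or TTC) is not given in the abstract, so both are tried.

```python
# Subset of the standard genetic code, limited to the codons used below.
CODON_TABLE = {
    "ATG": "Met", "ACG": "Thr",
    "TTT": "Phe", "TTC": "Phe",
    "TGT": "Cys", "TGC": "Cys",
}


def codon_position(cds_pos):
    """Map a 1-based CDS position to (codon number, position within codon), both 1-based."""
    codon_number = (cds_pos - 1) // 3 + 1
    offset = (cds_pos - 1) % 3 + 1
    return codon_number, offset


def substitute(ref_codon, offset, ref_base, alt_base):
    """Apply a single-base substitution at a 1-based offset within a codon."""
    if ref_codon[offset - 1] != ref_base:
        raise ValueError("reference base does not match the codon")
    return ref_codon[: offset - 1] + alt_base + ref_codon[offset:]


# c.218T>C: Met has a single codon (ATG), so the reference codon is known.
codon_218, offset_218 = codon_position(218)
met_variant = substitute("ATG", offset_218, "T", "C")

# c.839T>G: the Phe codon is an assumption, so both possibilities are tried.
codon_839, offset_839 = codon_position(839)
phe_variants = [substitute(c, offset_839, "T", "G") for c in ("TTT", "TTC")]

print(codon_218, offset_218, CODON_TABLE[met_variant])
print(codon_839, offset_839, [CODON_TABLE[c] for c in phe_variants])
```

Both substitutions fall on the second base of their codons (codons 73 and 280), which is consistent with the reported p.Met73Thr and p.Phe280Cys changes.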
Evaluation of Two Highly-Multiplexed Custom Panels for Massively Parallel Semiconductor Sequencing on Paraffin DNA

PubMed Central

Kotoula, Vassiliki; Lyberopoulou, Aggeliki; Papadopoulou, Kyriaki; Charalambous, Elpida; Alexopoulou, Zoi; Gakou, Chryssa; Lakis, Sotiris; Tsolaki, Eleftheria; Lilakos, Konstantinos; Fountzilas, George

2015-01-01

Background—Aim Massively parallel sequencing (MPS) holds promise for expanding cancer translational research and diagnostics. As yet, it has been applied on paraffin DNA (FFPE) with commercially available highly multiplexed gene panels (100s of DNA targets), while custom panels of low multiplexing are used for re-sequencing. Here, we evaluated the performance of two highly multiplexed custom panels on FFPE DNA. Methods Two custom multiplex amplification panels (B, 373 amplicons; T, 286 amplicons) were coupled with semiconductor sequencing on DNA samples from FFPE breast tumors and matched peripheral blood samples (n samples: 316; n libraries: 332). The two panels shared 37% DNA targets (common or shifted amplicons). Panel performance was evaluated in paired sample groups and quartets of libraries, where possible. Results Amplicon read ratios yielded similar patterns per gene with the same panel in FFPE and blood samples; however, performance of common amplicons differed between panels (p<0.001). FFPE genotypes were compared for 1267 coding and non-coding variant replicates, 999 out of which (78.8%) were concordant in different paired sample combinations. Variant frequency was highly reproducible (Spearman’s rho 0.959). Repeatedly discordant variants were of high coverage / low frequency (p<0.001). Genotype concordance was (a) high, for intra-run duplicates with the same panel (mean±SD: 97.2±4.7, 95%CI: 94.8–99.7, p<0.001); (b) modest, when the same DNA was analyzed with different panels (mean±SD: 81.1±20.3, 95%CI: 66.1–95.1, p = 0.004); and (c) low, when different DNA samples from the same tumor were compared with the same panel (mean±SD: 59.9±24.0; 95%CI: 43.3–76.5; p = 0.282). Low coverage / low frequency variants were validated with Sanger sequencing even in samples with unfavourable DNA quality. Conclusions Custom MPS may yield novel information on genomic alterations, provided that data evaluation is adjusted to tumor tissue FFPE DNA. 
To this scope, eligibility of all amplicons along with variant coverage and frequency need to be assessed. PMID:26039550
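The genotype concordance reported above (999 of 1267 variant replicates, 78.8%) is a simple proportion. A minimal sketch of such a comparison follows, assuming each replicate is represented as a mapping from a variant key to a genotype call; the data structure and the toy calls are hypothetical and are not taken from the study.

```python
def genotype_concordance(calls_a, calls_b):
    """Return (matches, compared, percent) for variants called in both replicates."""
    shared = calls_a.keys() & calls_b.keys()
    if not shared:
        raise ValueError("no variants in common between the two replicates")
    matches = sum(1 for key in shared if calls_a[key] == calls_b[key])
    return matches, len(shared), 100.0 * matches / len(shared)


# Hypothetical toy calls keyed by (gene, HGVS-like label).
replicate_1 = {("TP53", "c.1A>G"): "het", ("PIK3CA", "c.2C>T"): "het",
               ("BRCA1", "c.3G>A"): "hom", ("ATM", "c.4T>C"): "ref"}
replicate_2 = {("TP53", "c.1A>G"): "het", ("PIK3CA", "c.2C>T"): "ref",
               ("BRCA1", "c.3G>A"): "hom", ("ATM", "c.4T>C"): "ref"}

print(genotype_concordance(replicate_1, replicate_2))

# Arithmetic check of the figure quoted in the abstract.
print(round(100 * 999 / 1267, 1))
```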
Multiplexed direct genomic selection (MDiGS): a pooled BAC capture approach for highly accurate CNV and SNP/INDEL detection.

PubMed

Alvarado, David M; Yang, Ping; Druley, Todd E; Lovett, Michael; Gurnett, Christina A

2014-06-01

Despite declining sequencing costs, few methods are available for cost-effective single-nucleotide polymorphism (SNP), insertion/deletion (INDEL) and copy number variation (CNV) discovery in a single assay. Commercially available methods require a high investment to a specific region and are only cost-effective for large samples. Here, we introduce a novel, flexible approach for multiplexed targeted sequencing and CNV analysis of large genomic regions called multiplexed direct genomic selection (MDiGS). MDiGS combines biotinylated bacterial artificial chromosome (BAC) capture and multiplexed pooled capture for SNP/INDEL and CNV detection of 96 multiplexed samples on a single MiSeq run. MDiGS is advantageous over other methods for CNV detection because pooled sample capture and hybridization to large contiguous BAC baits reduces sample and probe hybridization variability inherent in other methods. We performed MDiGS capture for three chromosomal regions consisting of ∼ 550 kb of coding and non-coding sequence with DNA from 253 patients with congenital lower limb disorders. PITX1 nonsense and HOXC11 S191F missense mutations were identified that segregate in clubfoot families. Using a novel pooled-capture reference strategy, we identified recurrent chromosome chr17q23.1q23.2 duplications and small HOXC 5' cluster deletions (51 kb and 12 kb). Given the current interest in coding and non-coding variants in human disease, MDiGS fulfills a niche for comprehensive and low-cost evaluation of CNVs, coding, and non-coding variants across candidate regions of interest. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Evolutional dynamics of 45S and 5S ribosomal DNA in ancient allohexaploid Atropa belladonna.

PubMed

Volkov, Roman A; Panchuk, Irina I; Borisjuk, Nikolai V; Hosiawa-Baranska, Marta; Maluszynska, Jolanta; Hemleben, Vera

2017-01-23

Polyploid hybrids represent a rich natural resource to study molecular evolution of plant genes and genomes. Here, we applied a combination of karyological and molecular methods to investigate chromosomal structure, molecular organization and evolution of ribosomal DNA (rDNA) in nightshade, Atropa belladonna (fam. Solanaceae), one of the oldest known allohexaploids among flowering plants. Because of their abundance and specific molecular organization (evolutionarily conserved coding regions linked to variable intergenic spacers, IGS), 45S and 5S rDNA are widely used in plant taxonomic and evolutionary studies. Molecular cloning and nucleotide sequencing of A. belladonna 45S rDNA repeats revealed a general structure characteristic of other Solanaceae species, and a very high sequence similarity of two length variants, with the only difference in number of short IGS subrepeats. These results combined with the detection of three pairs of 45S rDNA loci on separate chromosomes, presumably inherited from both tetraploid and diploid ancestor species, exemplify intensive sequence homogenization that led to substitution/elimination of rDNA repeats of one parent. Chromosome silver-staining revealed that only four out of six 45S rDNA sites are frequently transcriptionally active, demonstrating nucleolar dominance. For 5S rDNA, three size variants of repeats were detected, with the major class represented by repeats containing all functional IGS elements required for transcription, the intermediate size repeats containing partially deleted IGS sequences, and the short 5S repeats containing severe defects both in the IGS and coding sequences. While shorter variants demonstrate increased rate of base substitution, probably in their transition into pseudogenes, the functional 5S rDNA variants are nearly identical at the sequence level, pointing to their origin from a single parental species.
Localization of the 5S rDNA genes on two chromosome pairs further supports uniparental inheritance from the tetraploid progenitor. The obtained molecular, cytogenetic and phylogenetic data demonstrate complex evolutionary dynamics of rDNA loci in the allohexaploid species Atropa belladonna. The high level of sequence unification revealed in 45S and 5S rDNA loci of this ancient hybrid species has seemingly been achieved by different molecular mechanisms.
An integrative approach to predicting the functional effects of small indels in non-coding regions of the human genome

PubMed Central

Ferlaino, Michael; Rogers, Mark F.; Shihab, Hashem A.; Mort, Matthew; Cooper, David N.; Gaunt, Tom R.; Campbell, Colin

2018-01-01

Background Small insertions and deletions (indels) have a significant influence in human disease and, in terms of frequency, they are second only to single nucleotide variants as pathogenic mutations. As the majority of mutations associated with complex traits are located outside the exome, it is crucial to investigate the potential pathogenic impact of indels in non-coding regions of the human genome. Results We present FATHMM-indel, an integrative approach to predict the functional effect, pathogenic or neutral, of indels in non-coding regions of the human genome. Our method exploits various genomic annotations in addition to sequence data. When validated on benchmark data, FATHMM-indel significantly outperforms CADD and GAVIN, state of the art models in assessing the pathogenic impact of non-coding variants. FATHMM-indel is available via a web server at indels.biocompute.org.uk. Conclusions FATHMM-indel can accurately predict the functional impact and prioritise small indels throughout the whole non-coding genome. PMID:28985712
An integrative approach to predicting the functional effects of small indels in non-coding regions of the human genome.

PubMed

Ferlaino, Michael; Rogers, Mark F; Shihab, Hashem A; Mort, Matthew; Cooper, David N; Gaunt, Tom R; Campbell, Colin

2017-10-06

Small insertions and deletions (indels) have a significant influence in human disease and, in terms of frequency, they are second only to single nucleotide variants as pathogenic mutations. As the majority of mutations associated with complex traits are located outside the exome, it is crucial to investigate the potential pathogenic impact of indels in non-coding regions of the human genome. We present FATHMM-indel, an integrative approach to predict the functional effect, pathogenic or neutral, of indels in non-coding regions of the human genome. Our method exploits various genomic annotations in addition to sequence data. When validated on benchmark data, FATHMM-indel significantly outperforms CADD and GAVIN, state of the art models in assessing the pathogenic impact of non-coding variants. FATHMM-indel is available via a web server at indels.biocompute.org.uk. FATHMM-indel can accurately predict the functional impact and prioritise small indels throughout the whole non-coding genome.
A Protein Domain and Family Based Approach to Rare Variant Association Analysis.

PubMed

Richardson, Tom G; Shihab, Hashem A; Rivas, Manuel A; McCarthy, Mark I; Campbell, Colin; Timpson, Nicholas J; Gaunt, Tom R

2016-01-01

It has become common practice to analyse large scale sequencing data with statistical approaches based around the aggregation of rare variants within the same gene. We applied a novel approach to rare variant analysis by collapsing variants together using protein domain and family coordinates, regarded to be a more discrete definition of a biologically functional unit. Using Pfam definitions, we collapsed rare variants (Minor Allele Frequency ≤ 1%) together in three different ways 1) variants within single genomic regions which map to individual protein domains 2) variants within two individual protein domain regions which are predicted to be responsible for a protein-protein interaction 3) all variants within combined regions from multiple genes responsible for coding the same protein domain (i.e. protein families). A conventional collapsing analysis using gene coordinates was also undertaken for comparison. We used UK10K sequence data and investigated associations between regions of variants and lipid traits using the sequence kernel association test (SKAT). We observed no strong evidence of association between regions of variants based on Pfam domain definitions and lipid traits. Quantile-Quantile plots illustrated that the overall distributions of p-values from the protein domain analyses were comparable to that of a conventional gene-based approach. Deviations from this distribution suggested that collapsing by either protein domain or gene definitions may be favourable depending on the trait analysed. We have collapsed rare variants together using protein domain and family coordinates to present an alternative approach over collapsing across conventionally used gene-based regions. Although no strong evidence of association was detected in these analyses, future studies may still find value in adopting these approaches to detect previously unidentified association signals.
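The collapsing step described in this abstract can be illustrated with a small sketch. It only shows the grouping of rare variants (MAF ≤ 1%) by region coordinates, not the SKAT association test itself; the `Variant` and `Region` classes, the coordinates and the domain names are all hypothetical and are not taken from Pfam or UK10K.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int      # 1-based genomic position
    maf: float    # minor allele frequency


@dataclass(frozen=True)
class Region:
    name: str     # e.g. a protein domain label, or a gene symbol for the conventional analysis
    chrom: str
    start: int    # inclusive
    end: int      # inclusive


def collapse_rare_variants(variants, regions, maf_threshold=0.01):
    """Group variants with MAF <= threshold by every region that contains them."""
    groups = defaultdict(list)
    for variant in variants:
        if variant.maf > maf_threshold:
            continue  # common variants are left out of the collapsed sets
        for region in regions:
            if region.chrom == variant.chrom and region.start <= variant.pos <= region.end:
                groups[region.name].append(variant)
    return dict(groups)


# Hypothetical inputs: two domains within one gene and a gene-level region for comparison.
variants = [Variant("1", 105, 0.004), Variant("1", 180, 0.20),
            Variant("1", 260, 0.009), Variant("1", 400, 0.001)]
regions = [Region("domain_A", "1", 100, 200), Region("domain_B", "1", 250, 300),
           Region("gene_X", "1", 100, 450)]

collapsed = collapse_rare_variants(variants, regions)
for name, members in collapsed.items():
    print(name, [v.pos for v in members])
```

Each resulting set would then be passed to a region-based test; switching between domain-level and gene-level analysis only changes which regions are supplied.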
Impact of the HIV-1 genetic background and HIV-1 population size on the evolution of raltegravir resistance.

PubMed

Fun, Axel; Leitner, Thomas; Vandekerckhove, Linos; Däumer, Martin; Thielen, Alexander; Buchholz, Bernd; Hoepelman, Andy I M; Gisolf, Elizabeth H; Schipper, Pauline J; Wensing, Annemarie M J; Nijhuis, Monique

2018-01-05

Emergence of resistance against integrase inhibitor raltegravir in human immunodeficiency virus type 1 (HIV-1) patients is generally associated with selection of one of three signature mutations: Y143C/R, Q148K/H/R or N155H, representing three distinct resistance pathways. The mechanisms that drive selection of a specific pathway are still poorly understood. We investigated the impact of the HIV-1 genetic background and population dynamics on the emergence of raltegravir resistance. Using deep sequencing we analyzed the integrase coding sequence (CDS) in longitudinal samples from five patients who initiated raltegravir plus optimized background therapy at viral loads > 5000 copies/ml. To investigate the role of the HIV-1 genetic background we created recombinant viruses containing the viral integrase coding region from pre-raltegravir samples from two patients in whom raltegravir resistance developed through different pathways. The in vitro selections performed with these recombinant viruses were designed to mimic natural population bottlenecks. Deep sequencing analysis of the viral integrase CDS revealed that the virological response to raltegravir containing therapy inversely correlated with the relative amount of unique sequence variants that emerged suggesting diversifying selection during drug pressure. In 4/5 patients multiple signature mutations representing different resistance pathways were observed. Interestingly, the resistant population can consist of a single resistant variant that completely dominates the population but also of multiple variants from different resistance pathways that coexist in the viral population. We also found evidence for increased diversification after stronger bottlenecks. 
In vitro selections with low viral titers, mimicking population bottlenecks, revealed that both recombinant viruses and HXB2 reference virus were able to select mutations from different resistance pathways, although typically only one resistance pathway emerged in each individual culture. The generation of a specific raltegravir resistant variant is not predisposed in the genetic background of the viral integrase CDS. Typically, in the early phases of therapy failure the sequence space is explored and multiple resistance pathways emerge and then compete for dominance which frequently results in a switch of the dominant population over time towards the fittest variant or even multiple variants of similar fitness that can coexist in the viral population.
Screening for rare variants in the PNPLA3 gene in obese liver biopsy patients.

PubMed

Zegers, Doreen; Verrijken, An; Francque, Sven; de Freitas, Fenna; Beckers, Sigri; Aerts, Evi; Ruppert, Martin; Hubens, Guy; Michielsen, Peter; Van Hul, Wim; Van Gaal, Luc F

2016-12-01

Previous research has clearly implicated the PNPLA3 gene in the etiology of nonalcoholic fatty liver disease as a polymorphism in the gene was found to be robustly associated with the disease. However, data on the involvement of rare PNPLA3 variants in the development of nonalcoholic fatty liver disease (NAFLD) is currently limited. Therefore, we performed an extensive mutation analysis study on a cohort of obese liver biopsy patients to determine PNPLA3 variation and its correlation with fatty liver disease. We screened the entire coding region of the PNPLA3 gene in DNA samples of 393 obese liver biopsy patients with varying degrees of fatty liver disease. Mutation analysis was performed by high-resolution melting curve analysis in combination with direct sequencing. We identified several common polymorphisms as well as one rare synonymous variant (c.867G>A rs139896256), one rare intronic variant (c.979+13C>T) and 3 nonsynonymous coding variants (p.A76T, p.A104V and p.T200M) in the PNPLA3 gene. In silico analysis indicated that the p.A104V variant will probably have no functional effect, whereas for the p.A76T and p.T200M variant a possible pathogenic effect is suggested. Overall, we showed that novel variants in PNPLA3 are very rare in our liver biopsy cohort, thereby indicating that their impact on the etiology of NAFLD is probably limited. Nevertheless, for the three rare coding variants that were identified in patients with advanced liver disease, further functional characterization will be essential to verify their potential disease causality. Copyright © 2016 Elsevier Masson SAS. All rights reserved.
Association between sequence variants in panicle development genes and the number of spikelets per panicle in rice.

PubMed

Jang, Su; Lee, Yunjoo; Lee, Gileung; Seo, Jeonghwan; Lee, Dongryung; Yu, Yoye; Chin, Joong Hyoun; Koh, Hee-Jong

2018-01-15

Balancing panicle-related traits, such as panicle length and the numbers of primary and secondary branches per panicle, is key to improving the number of spikelets per panicle in rice. Identifying genetic information contributes to a broader understanding of the roles of genes and provides candidate alleles for use as DNA markers. Discovering relations between panicle-related traits and sequence variants offers an opportunity for molecular applications in rice breeding to improve the number of spikelets per panicle. In total, 142 polymorphic sites, which constructed 58 haplotypes, were detected in coding regions of ten panicle development genes, and 35 sequence variants in six genes were significantly associated with panicle-related traits. Rice cultivars were clustered according to their sequence variant profiles. One of the four resultant clusters, which contained only indica and tong-il varieties, exhibited the largest average number of favorable alleles and highest average number of spikelets per panicle, suggesting that the favorable allele combination found in this cluster was beneficial in increasing the number of spikelets per panicle. Favorable alleles identified in this study can be used to develop functional markers for rice breeding programs. Furthermore, stacking several favorable alleles has the potential to substantially improve the number of spikelets per panicle in rice.
Rare missense variants in CHRNB3 and CHRNA3 are associated with risk of alcohol and cocaine dependence

PubMed Central

Haller, Gabe; Kapoor, Manav; Budde, John; Xuei, Xiaoling; Edenberg, Howard; Nurnberger, John; Kramer, John; Brooks, Andy; Tischfield, Jay; Almasy, Laura; Agrawal, Arpana; Bucholz, Kathleen; Rice, John; Saccone, Nancy; Bierut, Laura; Goate, Alison

2014-01-01

Previous findings have demonstrated that variants in nicotinic receptor genes are associated with nicotine, alcohol and cocaine dependence. Because of the substantial comorbidity, it has often been unclear whether a variant is associated with multiple substances or whether the association is actually with a single substance. To investigate the possible contribution of rare variants to the development of substance dependencies other than nicotine dependence, specifically alcohol and cocaine dependence, we undertook pooled sequencing of the coding regions and flanking sequence of CHRNA5, CHRNA3, CHRNB4, CHRNA6 and CHRNB3 in 287 African American and 1028 European American individuals from the Collaborative Study of the Genetics of Alcoholism (COGA). All members of families for whom any individual was sequenced (2504 African Americans and 7318 European Americans) were then genotyped for all variants identified by sequencing. For each gene, we then tested for association using FamSKAT. For European Americans, we find increased DSM-IV cocaine dependence symptoms (FamSKAT P = 2 × 10⁻⁴) and increased DSM-IV alcohol dependence symptoms (FamSKAT P = 5 × 10⁻⁴) among carriers of missense variants in CHRNB3. Additionally, one variant (rs149775276; H329Y) shows association with both cocaine dependence symptoms (P = 7.4 × 10⁻⁵, β = 2.04) and alcohol dependence symptoms (P = 2.6 × 10⁻⁴, β = 2.04). For African Americans, we find decreased cocaine dependence symptoms among carriers of missense variants in CHRNA3 (FamSKAT P = 0.005). Replication in an independent sample supports the role of rare variants in CHRNB3 and alcohol dependence (P = 0.006). These are the first results to implicate rare variants in CHRNB3 or CHRNA3 in risk for alcohol dependence or cocaine dependence. PMID:24057674
SIBIS: a Bayesian model for inconsistent protein sequence estimation.

PubMed

Khenoussi, Walyd; Vanhoutrève, Renaud; Poch, Olivier; Thompson, Julie D

2014-09-01

The prediction of protein coding genes is a major challenge that depends on the quality of genome sequencing, the accuracy of the model used to elucidate the exonic structure of the genes and the complexity of the gene splicing process leading to different protein variants. As a consequence, today's protein databases contain a huge amount of inconsistency, due to both natural variants and sequence prediction errors. We have developed a new method, called SIBIS, to detect such inconsistencies based on the evolutionary information in multiple sequence alignments. A Bayesian framework, combined with Dirichlet mixture models, is used to estimate the probability of observing specific amino acids and to detect inconsistent or erroneous sequence segments. We evaluated the performance of SIBIS on a reference set of protein sequences with experimentally validated errors and showed that the sensitivity is significantly higher than previous methods, with only a small loss of specificity. We also assessed a large set of human sequences from the UniProt database and found evidence of inconsistency in 48% of the previously uncharacterized sequences. We conclude that the integration of quality control methods like SIBIS in automatic analysis pipelines will be critical for the robust inference of structural, functional and phylogenetic information from these sequences. Source code, implemented in C on a linux system, and the datasets of protein sequences are freely available for download at http://www.lbgi.fr/~julie/SIBIS. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Simultaneous mutation and copy number variation (CNV) detection by multiplex PCR-based GS-FLX sequencing.

PubMed

Goossens, Dirk; Moens, Lotte N; Nelis, Eva; Lenaerts, An-Sofie; Glassee, Wim; Kalbe, Andreas; Frey, Bruno; Kopal, Guido; De Jonghe, Peter; De Rijk, Peter; Del-Favero, Jurgen

2009-03-01

We evaluated multiplex PCR amplification as a front-end for high-throughput sequencing, to widen the applicability of massive parallel sequencers for the detailed analysis of complex genomes. Using multiplex PCR reactions, we sequenced the complete coding regions of seven genes implicated in peripheral neuropathies in 40 individuals on a GS-FLX genome sequencer (Roche). The resulting dataset showed highly specific and uniform amplification. Comparison of the GS-FLX sequencing data with the dataset generated by Sanger sequencing confirmed the detection of all variants present and proved the sensitivity of the method for mutation detection. In addition, we showed that we could exploit the multiplexed PCR amplicons to determine individual copy number variation (CNV), increasing the spectrum of detected variations to both genetic and genomic variants. We conclude that our straightforward procedure substantially expands the applicability of the massive parallel sequencers for sequencing projects of a moderate number of amplicons (50-500) with typical applications in resequencing exons in positional or functional candidate regions and molecular genetic diagnostics. 2008 Wiley-Liss, Inc.
Unique BK virus non-coding control region (NCCR) variants in hematopoietic stem cell transplant recipients with and without hemorrhagic cystitis.

PubMed

Carr, Michael J; McCormack, Grace P; Mutton, Ken J; Crowley, Brendan

2006-04-01

Hematopoietic stem cell transplant recipients frequently develop BK virus (BKV)-associated hemorrhagic cystitis, which coincides with BK viruria. However, the precise role of BKV in the etiology of hemorrhagic cystitis in hematopoietic stem cell transplant recipients remains unclear, since approximately 50% of all such adult transplant recipients excrete BKV, yet do not develop this clinical condition. In the present study, BKV were analyzed to determine if mutations in the non-coding control region (NCCR), and specific BKV sub-types defined by sequence analysis of major capsid protein VP1, were associated with development of hemorrhagic cystitis in hematopoietic stem cell transplant recipients. The regions encoding VP1 and NCCRs of BKV in urine samples collected from 15 hematopoietic stem cell transplant recipients with hemorrhagic cystitis and 20 without this illness were amplified and sequenced. Sequence variations in the NCCRs of BKV were identified in urine samples from those with and without hemorrhagic cystitis. Furthermore, five unique sequence variations within transcription factor binding sites in the canonical NCCR, O-P-Q-R-S, were identified, representing new BKV variants from a population of cloned quasi-species obtained from patients with and without hemorrhagic cystitis. Thirty-five BKV VP1 sequences were analyzed by phylogenetic analysis but no specific BKV sub-type was associated with hemorrhagic cystitis. Five previously unrecognized naturally occurring variants of the BKV are described which involve amplifications, deletions, and rearrangements of the archetypal BKV NCCRs in individuals with and without hemorrhagic cystitis. Architectural rearrangements in the NCCRs of BKV did not appear to be a prerequisite for development of hemorrhagic cystitis in hematopoietic stem cell transplant recipients. Copyright 2006 Wiley-Liss, Inc.
Identification of hemoglobin variants by top-down mass spectrometry using selected diagnostic product ions.

PubMed

Coelho Graça, Didia; Hartmer, Ralf; Jabs, Wolfgang; Beris, Photis; Clerici, Lorella; Stoermer, Carsten; Samii, Kaveh; Hochstrasser, Denis; Tsybin, Yury O; Scherl, Alexander; Lescuyer, Pierre

2015-04-01

Hemoglobin disorder diagnosis is a complex procedure combining several analytical steps. Due to the lack of specificity of the currently used protein analysis methods, the identification of uncommon hemoglobin variants (proteoforms) can become a hard task to accomplish. The aim of this work was to develop a mass spectrometry-based approach to quickly identify mutated protein sequences within globin chain variants. To reach this goal, a top-down electron transfer dissociation mass spectrometry method was developed for hemoglobin β chain analysis. A diagnostic product ion list was established with a color code strategy allowing to quickly and specifically localize a mutation in the hemoglobin β chain sequence. The method was applied to the analysis of rare hemoglobin β chain variants and an (A)γ-β fusion protein. The results showed that the developed data analysis process allows fast and reliable interpretation of top-down electron transfer dissociation mass spectra by nonexpert users in the clinical area.
Germline EMSY sequence alterations in hereditary breast cancer and ovarian cancer families.

PubMed

Määttä, Kirsi M; Nurminen, Riikka; Kankuri-Tammilehto, Minna; Kallioniemi, Anne; Laasanen, Satu-Leena; Schleutker, Johanna

2017-07-24

BRCA1 and BRCA2 mutations explain approximately one-fifth of the inherited susceptibility in high-risk Finnish hereditary breast and ovarian cancer (HBOC) families. EMSY is located in the breast cancer-associated chromosomal region 11q13. The EMSY gene encodes a BRCA2-interacting protein that has been implicated in DNA damage repair and genomic instability. We analysed the role of germline EMSY variation in breast/ovarian cancer predisposition. The present study describes the first EMSY screening in patients with high familial risk for this disease. Index individuals from 71 high-risk, BRCA1/2-negative HBOC families were screened for germline EMSY sequence alterations in protein coding regions and exon-intron boundaries using Sanger sequencing and TaqMan assays. The identified variants were further screened in 36 Finnish HBOC patients and 904 controls. Moreover, one novel intronic deletion was screened in a cohort of 404 breast cancer patients unselected for family history. Haplotype block structure and the association of haplotypes with breast/ovarian cancer were analysed using Haploview. The functionality of the identified variants was predicted using Haploreg, RegulomeDB, Human Splicing Finder, and Pathogenic-or-Not-Pipeline 2. Altogether, 12 germline EMSY variants were observed. Two alterations were located in the coding region, five alterations were intronic, and five alterations were located in the 3'untranslated region (UTR). Variant frequencies did not significantly differ between cases and controls. The novel variant, c.2709 + 122delT, was detected in 1 out of 107 (0.9%) breast cancer patients, and the carrier showed a bilateral form of the disease. The deletion was absent in 897 controls (OR = 25.28; P = 0.1) and in 404 breast cancer patients unselected for family history. No haplotype was identified to increase the risk of breast/ovarian cancer. 
Functional analyses suggested that variants, particularly in the 3'UTR, were located within regulatory elements. The novel deletion was predicted to affect splicing regulatory elements. These results suggest that the identified EMSY variants are likely neutral at the population level. However, these variants may contribute to breast/ovarian cancer risk in single families. Additional analyses are warranted for rare novel intronic deletions and the 3'UTR variants predicted to have functional roles.
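The abstract reports an odds ratio for a variant with zero carriers among controls, where the plain cross-product ratio is undefined. The sketch below shows one common workaround, adding 0.5 to every cell of the 2x2 table (the Haldane-Anscombe correction). The abstract does not say how its OR was computed, so using this correction is an assumption; with the counts given in the text (1 carrier in 107 patients, 0 in 897 controls) it happens to give 25.28.

```python
def odds_ratio(case_carriers, case_total, control_carriers, control_total,
               correction=0.5):
    """Odds ratio from a 2x2 carrier table.

    If any cell is zero, `correction` is added to all four cells
    (Haldane-Anscombe correction) so that the ratio stays finite.
    Whether the study used this correction is an assumption.
    """
    a = case_carriers                      # carriers among cases
    b = case_total - case_carriers         # non-carriers among cases
    c = control_carriers                   # carriers among controls
    d = control_total - control_carriers   # non-carriers among controls
    if 0 in (a, b, c, d):
        a, b, c, d = (x + correction for x in (a, b, c, d))
    return (a * d) / (b * c)


# Counts taken from the abstract: 1/107 patients, 0/897 controls.
or_deletion = odds_ratio(1, 107, 0, 897)
print(round(or_deletion, 2))
```

A single carrier yields a large point estimate but very little evidence, which is consistent with the non-significant P value reported alongside it.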
Pathogenic Anti-Müllerian Hormone Variants in Polycystic Ovary Syndrome.

PubMed

Gorsic, Lidija K; Kosova, Gulum; Werstein, Brian; Sisk, Ryan; Legro, Richard S; Hayes, M Geoffrey; Teixeira, Jose M; Dunaif, Andrea; Urbanek, Margrit

2017-08-01

Polycystic ovary syndrome (PCOS), a common endocrine condition, is the leading cause of anovulatory infertility. Given that common disease-susceptibility variants account for only a small percentage of the estimated PCOS heritability, we tested the hypothesis that rare variants contribute to this deficit in heritability. Unbiased whole-genome sequencing (WGS) of 80 patients with PCOS and 24 reproductively normal control subjects identified potentially deleterious variants in AMH, the gene encoding anti-Müllerian hormone (AMH). Targeted sequencing of AMH of 643 patients with PCOS and 153 control patients was used to replicate WGS findings. Dual luciferase reporter assays measured the impact of the variants on downstream AMH signaling. We found 24 rare (minor allele frequency < 0.01) AMH variants in patients with PCOS and control subjects; 18 variants were specific to women with PCOS. Seventeen of 18 (94%) PCOS-specific variants had significantly reduced AMH signaling, whereas none of 6 variants observed in control subjects showed significant defects in signaling. Thus, we identified rare AMH coding variants that reduced AMH-mediated signaling in a subset of patients with PCOS. To our knowledge, this study is the first to identify rare genetic variants associated with a common PCOS phenotype. Our findings suggest decreased AMH signaling as a mechanism for the pathogenesis of PCOS. AMH decreases androgen biosynthesis by inhibiting CYP17 activity; a potential mechanism of action for AMH variants in PCOS, therefore, is to increase androgen biosynthesis due to decreased AMH-mediated inhibition of CYP17 activity. Copyright © 2017 Endocrine Society
Amplicon-based semiconductor sequencing of human exomes: performance evaluation and optimization strategies.

PubMed

Damiati, E; Borsani, G; Giacopuzzi, Edoardo

2016-05-01

The Ion Proton platform enables whole exome sequencing (WES) at low cost, with rapid turnaround time and great flexibility. Products for WES on the Ion Proton system include the AmpliSeq Exome kit and the recently introduced HiQ sequencing chemistry. Here, we used gold standard variants from the GIAB consortium to assess performance in variant identification, characterize the erroneous calls and develop a filtering strategy to reduce false positives. The AmpliSeq Exome kit captures a large fraction of bases (>94%) in human CDS, ClinVar genes and ACMG genes, but with 2,041 (7%), 449 (13%) and 11 (19%) genes not fully represented, respectively. Overall, 515 protein-coding genes contain hard-to-sequence regions, including 90 genes from ClinVar. Variant detection performance was highest at mean coverage >120×, while at 90× and 70× we measured a loss of variants of 3.2% and 4.5%, respectively. WES using HiQ chemistry showed ~71/97.5% sensitivity, ~37/2% FDR and ~0.66/0.98 F1 score for indels and SNPs, respectively. The proposed low-, medium- or high-stringency filters reduced the number of false positives by 10.2%, 21.2% and 40.4% for indels and 21.2%, 41.9% and 68.2% for SNPs, respectively. Amplicon-based WES on the Ion Proton platform using HiQ chemistry emerged as a competitive approach, with improved accuracy in variant identification. False-positive variants remain an issue for the Ion Torrent technology, but our filtering strategy can be applied to reduce erroneous variants.
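The sensitivity, FDR and F1 figures quoted above are linked by a simple identity: precision is 1 − FDR, and F1 is the harmonic mean of precision and sensitivity. The following is a minimal sketch of that arithmetic using the rounded values from the abstract; the function name is illustrative and not taken from the study.

```python
def f1_from_sensitivity_fdr(sensitivity, fdr):
    """F1 score from sensitivity (recall) and false discovery rate.

    precision = 1 - FDR; F1 is the harmonic mean of precision and recall.
    """
    precision = 1.0 - fdr
    return 2 * precision * sensitivity / (precision + sensitivity)


# Rounded values quoted in the abstract (indels / SNPs).
f1_indel = f1_from_sensitivity_fdr(0.71, 0.37)
f1_snp = f1_from_sensitivity_fdr(0.975, 0.02)
print(f"indels: {f1_indel:.3f}, SNPs: {f1_snp:.3f}")
```

With these rounded inputs the indel value comes out near 0.67 rather than the quoted ~0.66; the small gap is plausibly due to rounding of the published sensitivity and FDR.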
High-throughput sequencing of the entire genomic regions of CCM1/KRIT1, CCM2 and CCM3/PDCD10 to search for pathogenic deep-intronic splice mutations in cerebral cavernous malformations.

PubMed

Rath, Matthias; Jenssen, Sönke E; Schwefel, Konrad; Spiegler, Stefanie; Kleimeier, Dana; Sperling, Christian; Kaderali, Lars; Felbor, Ute

2017-09-01

Cerebral cavernous malformations (CCM) are vascular lesions of the central nervous system that can cause headaches, seizures and hemorrhagic stroke. Disease-associated mutations have been identified in three genes: CCM1/KRIT1, CCM2 and CCM3/PDCD10. The precise proportion of deep-intronic variants in these genes and their clinical relevance is yet unknown. Here, a long-range PCR (LR-PCR) approach for target enrichment of the entire genomic regions of the three genes was combined with next generation sequencing (NGS) to screen for coding and non-coding variants. NGS detected all six CCM1/KRIT1, two CCM2 and four CCM3/PDCD10 mutations that had previously been identified by Sanger sequencing. Two of the pathogenic variants presented here are novel. Additionally, 20 stringently selected CCM index cases that had remained mutation-negative after conventional sequencing and exclusion of copy number variations were screened for deep-intronic mutations. The combination of bioinformatics filtering and transcript analyses did not reveal any deep-intronic splice mutations in these cases. Our results demonstrate that target enrichment by LR-PCR combined with NGS can be used for a comprehensive analysis of the entire genomic regions of the CCM genes in a research context. However, its clinical utility is limited as deep-intronic splice mutations in CCM1/KRIT1, CCM2 and CCM3/PDCD10 seem to be rather rare. Copyright © 2017 Elsevier Masson SAS. All rights reserved.

Rare HFE variants are the most frequent cause of hemochromatosis in non-C282Y homozygous patients with hemochromatosis.

PubMed

Hamdi-Rozé, Houda; Beaumont-Epinette, Marie-Pascale; Ben Ali, Zeineb; Le Lan, Caroline; Loustaud-Ratti, Véronique; Causse, Xavier; Loreal, Olivier; Deugnier, Yves; Brissot, Pierre; Jouanolle, Anne-Marie; Bardou-Jacquet, Edouard

2016-12-01

p.Cys282Tyr (C282Y) homozygosity explains most cases of HFE-related hemochromatosis, but a significant number of patients presenting with typical type I hemochromatosis phenotype remain unexplained. We sought to describe the clinical relevance of rare HFE variants in non-C282Y homozygotes. Patients referred for hemochromatosis to the National Reference Centre for Rare Iron Overload Diseases from 2004 to 2010 were studied. Sequencing was performed for coding region and intronic flanking sequences of HFE, HAMP, HFE2, TFR2, and SLC40A1. Nine private HFE variants were identified in 13 of 206 unrelated patients. Among those, five have not been previously described: p.Leu270Argfs*4, p.Ala271Valfs*25, p.Tyr52*, p.Lys166Asn, and p.Asp141Tyr. Our results show that rare HFE variants are identified more frequently than variants in the other genes associated with iron overload. Rare HFE variants are therefore the most frequent cause of hemochromatosis in non-C282Y homozygote HFE patients. Am. J. Hematol. 91:1202-1205, 2016. © 2016 Wiley Periodicals, Inc.
Novel human CRYGD rare variant in a Brazilian family with congenital cataract

PubMed Central

Giordano, Gabriel Gorgone; Tavares, Anderson; da Silva, Márcio José; de Vasconcellos, José Paulo Cabral; Arieta, Carlos Eduardo Leite; de Melo, Mônica Barbosa

2011-01-01

Purpose: To describe a novel polymorphism in the γD-crystallin (CRYGD) gene in a Brazilian family with congenital cataract. Methods: A Brazilian four-generation family was analyzed. The proband had bilateral lamellar cataract and the phenotypes were classified by slit lamp examination. Genomic DNA was extracted from peripheral blood, and the coding regions and intron/exon boundaries of the αA-crystallin (CRYAA), γC-crystallin (CRYGC), and CRYGD genes were amplified by polymerase chain reaction and directly sequenced. Results: Sequencing of the coding regions of CRYGD showed a heterozygous A→G transition at position c.401, which results in the substitution of tyrosine by cysteine (Y134C). The polymorphism was identified in three individuals, two affected and one unaffected. Conclusions: A novel rare variant in CRYGD (Y134C) was detected in a Brazilian family with congenital cataract. Because the substitution does not segregate with the phenotype in this family, other genetic alterations are likely to be present. PMID:21866214
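Two small pieces of bookkeeping sit behind a description such as "A→G at c.401 (Y134C)": the class of the substitution and the codon that a coding position falls in. An A→G change swaps one purine for another, which makes it a transition. Coding position 401 is the second base of codon 134, matching a tyrosine codon (TAT/TAC) changing to a cysteine codon (TGT/TGC). The sketch below illustrates both checks; the function names are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_class(ref, alt):
    """Classify a single-base substitution as a transition or transversion."""
    ref, alt = ref.upper(), alt.upper()
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases or ref == alt:
        raise ValueError("expected two different bases from A, C, G, T")
    # Transition: purine <-> purine or pyrimidine <-> pyrimidine.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def codon_of(cdna_pos):
    """Return (codon number, position within codon), both 1-based,
    for a 1-based coding-sequence position such as the 401 in c.401."""
    return (cdna_pos - 1) // 3 + 1, (cdna_pos - 1) % 3 + 1


print(substitution_class("A", "G"))  # class of the A>G change
print(codon_of(401))                 # codon and in-codon position of c.401
```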
Mapping of the serotonin 5-HT1Dβ autoreceptor gene on chromosome 6 and direct analysis for sequence variants

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lappalainen, J.; Dean, M.; Virkkunen, M.

1995-04-24

Abnormal brain serotonin function may be characteristic of several neuropsychiatric disorders. Thus, it is important to identify polymorphic genes and screen for functional variants at loci coding for genes that control normal serotonin functions. 5-HT1Dβ is a terminal serotonin autoreceptor which may play a role in regulating serotonin synthesis and release. Using an SSCP technique we screened for 5-HT1Dβ coding sequence variants in psychiatrically interviewed populations, which included controls, alcoholics, and alcoholic arsonists and alcoholic violent offenders with low CSF concentrations of the main serotonin metabolite 5-HIAA. A common polymorphism was identified in the 5-HT1Dβ gene with allele frequencies of 0.72 and 0.28. The SSCP variant was caused by a silent G to C substitution at nucleotide 861 of the coding region. This polymorphism could also be detected as a HincII RFLP of amplified DNA. DNAs from informative CEPH families were typed for the HincII RFLP and analyzed with respect to 20 linked markers on chromosome 6. Multipoint analysis placed the 5-HT1Dβ receptor gene between markers D6S286 and D6S275. A maximum two-point lod score of 10.90 was obtained to D6S26, which had been previously localized on 6q14-15. Chromosomal aberrations involving this region have been previously shown to cause retinal anomalies, developmental delay, and abnormal brain development. This region also contains the gene for North Carolina-type macular dystrophy. 34 refs., 3 figs., 1 tab.
Towards Clinical Molecular Diagnosis of Inherited Cardiac Conditions: A Comparison of Bench-Top Genome DNA Sequencers

PubMed Central

Wilkinson, Samuel L.; John, Shibu; Walsh, Roddy; Novotny, Tomas; Valaskova, Iveta; Gupta, Manu; Game, Laurence; Barton, Paul J R.; Cook, Stuart A.; Ware, James S.

2013-01-01

Background Molecular genetic testing is recommended for diagnosis of inherited cardiac disease, to guide prognosis and treatment, but access is often limited by cost and availability. Recently introduced high-throughput bench-top DNA sequencing platforms have the potential to overcome these limitations. Methodology/Principal Findings We evaluated two next-generation sequencing (NGS) platforms for molecular diagnostics. The protein-coding regions of six genes associated with inherited arrhythmia syndromes were amplified from 15 human samples using parallelised multiplex PCR (Access Array, Fluidigm), and sequenced on the MiSeq (Illumina) and Ion Torrent PGM (Life Technologies). Overall, 97.9% of the target was sequenced adequately for variant calling on the MiSeq, and 96.8% on the Ion Torrent PGM. Regions missed tended to be of high GC-content, and most were problematic for both platforms. Variant calling was assessed using 107 variants detected using Sanger sequencing: within adequately sequenced regions, variant calling on both platforms was highly accurate (Sensitivity: MiSeq 100%, PGM 99.1%. Positive predictive value: MiSeq 95.9%, PGM 95.5%). At the time of the study the Ion Torrent PGM had a lower capital cost and individual runs were cheaper and faster. The MiSeq had a higher capacity (requiring fewer runs), with reduced hands-on time and simpler laboratory workflows. Both provide significant cost and time savings over conventional methods, even allowing for adjunct Sanger sequencing to validate findings and sequence exons missed by NGS. Conclusions/Significance MiSeq and Ion Torrent PGM both provide accurate variant detection as part of a PCR-based molecular diagnostic workflow, and provide alternative platforms for molecular diagnosis of inherited cardiac conditions. Though there were performance differences at this throughput, platforms differed primarily in terms of cost, scalability, protocol stability and ease of use. 
Compared with current molecular genetic diagnostic tests for inherited cardiac arrhythmias, these NGS approaches are faster, less expensive, and yet more comprehensive. PMID:23861798
Variant discovery in the sheep milk transcriptome using RNA sequencing.

PubMed

Suárez-Vega, Aroa; Gutiérrez-Gil, Beatriz; Klopp, Christophe; Tosser-Klopp, Gwenola; Arranz, Juan José

2017-02-15

The identification of genetic variation underlying desired phenotypes is one of the main challenges of current livestock genetic research. High-throughput transcriptome sequencing (RNA-Seq) offers new opportunities for the detection of transcriptome variants (SNPs and short indels) in different tissues and species. In this study, we used RNA-Seq on Milk Sheep Somatic Cells (MSCs) with the goal of characterizing the genetic variation within the coding regions of the milk transcriptome in Churra and Assaf sheep, two common dairy sheep breeds farmed in Spain. A total of 216,637 variants were detected in the MSCs transcriptome of the eight ewes analyzed. Among them, a total of 57,795 variants were detected in the regions harboring Quantitative Trait Loci (QTL) for milk yield, protein percentage and fat percentage, of which 21.44% were novel variants. Among the total variants detected, 561 (2.52%) and 1,649 (7.42%) were predicted to produce high or moderate impact changes in the corresponding transcriptional unit, respectively. In the functional enrichment analysis of the genes positioned within selected QTL regions harboring novel relevant functional variants (high and moderate impact), the KEGG pathway with the highest enrichment was "protein processing in endoplasmic reticulum". Additionally, a total of 504 and 1,063 variants were identified in the genes encoding principal milk proteins and molecules involved in the lipid metabolism, respectively. Of these variants, 20 mutations were found to have putative relevant effects on the encoded proteins. We present herein the first transcriptomic approach aimed at identifying genetic variants of the genes expressed in the lactating mammary gland of sheep. Through the transcriptome analysis of variability within regions harboring QTL for milk yield, protein percentage and fat percentage, we have found several pathways and genes that harbor mutations that could affect dairy production traits. 
Moreover, remarkable variants were also found in candidate genes coding for major milk proteins and proteins related to milk fat metabolism. Several of the SNPs found in this study could be included as suitable markers in genotyping platforms or custom SNP arrays to perform association analyses in commercial populations and apply genomic selection protocols in the dairy production industry.
Efficient population-scale variant analysis and prioritization with VAPr.

PubMed

Birmingham, Amanda; Mark, Adam M; Mazzaferro, Carlo; Xu, Guorong; Fisch, Kathleen M

2018-04-06

With the growing availability of population-scale whole-exome and whole-genome sequencing, demand for reproducible, scalable variant analysis has spread within genomic research communities. To address this need, we introduce the Python package VAPr (Variant Analysis and Prioritization). VAPr leverages existing annotation tools ANNOVAR and MyVariant.info with MongoDB-based flexible storage and filtering functionality. It offers biologists and bioinformatics generalists easy-to-use and scalable analysis and prioritization of genomic variants from large cohort studies. VAPr is developed in Python and is available for free use and extension under the MIT License. An install package is available on PyPi at https://pypi.python.org/pypi/VAPr, while source code and extensive documentation are on GitHub at https://github.com/ucsd-ccbb/VAPr. Contact: kfisch@ucsd.edu.
Diversity and Divergence of Dinoflagellate Histone Proteins

PubMed Central

Marinov, Georgi K.; Lynch, Michael

2015-01-01

Histone proteins and the nucleosomal organization of chromatin are near-universal eukaryotic features, with the exception of dinoflagellates. Previous studies have suggested that histones do not play a major role in the packaging of dinoflagellate genomes, although several genomic and transcriptomic surveys have detected a full set of core histone genes. Here, transcriptomic and genomic sequence data from multiple dinoflagellate lineages are analyzed, and the diversity of histone proteins and their variants characterized, with particular focus on their potential post-translational modifications and the conservation of the histone code. In addition, the set of putative epigenetic mark readers and writers, chromatin remodelers and histone chaperones are examined. Dinoflagellates clearly express the most derived set of histones among all autonomous eukaryote nuclei, consistent with a combination of relaxation of sequence constraints imposed by the histone code and the presence of numerous specialized histone variants. The histone code itself appears to have diverged significantly in some of its components, yet others are conserved, implying conservation of the associated biochemical processes. Specifically, and with major implications for the function of histones in dinoflagellates, the results presented here strongly suggest that transcription through nucleosomal arrays happens in dinoflagellates. Finally, the plausible roles of histones in dinoflagellate nuclei are discussed. PMID:26646152
SIGMAR1 mutation associated with autosomal recessive Silver-like syndrome

PubMed Central

Horga, Alejandro; Tomaselli, Pedro J.; Gonzalez, Michael A.; Laurà, Matilde; Muntoni, Francesco; Manzur, Adnan Y.; Hanna, Michael G.; Blake, Julian C.; Houlden, Henry; Züchner, Stephan

2016-01-01

Objective: To describe the genetic and clinical features of a simplex patient with distal hereditary motor neuropathy (dHMN) and lower limb spasticity (Silver-like syndrome) due to a mutation in the sigma nonopioid intracellular receptor–1 gene (SIGMAR1) and review the phenotypic spectrum of mutations in this gene. Methods: We used whole-exome sequencing to investigate the proband. The variants of interest were investigated for segregation in the family using Sanger sequencing. Subsequently, a larger cohort of 16 unrelated dHMN patients was specifically screened for SIGMAR1 mutations. Results: In the proband, we identified a homozygous missense variant (c.194T>A, p.Leu65Gln) in exon 2 of SIGMAR1 as the probable causative mutation. Pathogenicity is supported by evolutionary conservation, in silico analyses, and the strong phenotypic similarities with previously reported cases carrying coding sequence mutations in SIGMAR1. No other mutations were identified in 16 additional patients with dHMN. Conclusions: We suggest that coding sequence mutations in SIGMAR1 present clinically with a combination of dHMN and pyramidal tract signs, with or without spasticity, in the lower limbs. Preferential involvement of extensor muscles of the upper limbs may be a distinctive feature of the disease. These observations should be confirmed in future studies. PMID:27629094
SIGMAR1 mutation associated with autosomal recessive Silver-like syndrome.

PubMed

Horga, Alejandro; Tomaselli, Pedro J; Gonzalez, Michael A; Laurà, Matilde; Muntoni, Francesco; Manzur, Adnan Y; Hanna, Michael G; Blake, Julian C; Houlden, Henry; Züchner, Stephan; Reilly, Mary M

2016-10-11

To describe the genetic and clinical features of a simplex patient with distal hereditary motor neuropathy (dHMN) and lower limb spasticity (Silver-like syndrome) due to a mutation in the sigma nonopioid intracellular receptor-1 gene (SIGMAR1) and review the phenotypic spectrum of mutations in this gene. We used whole-exome sequencing to investigate the proband. The variants of interest were investigated for segregation in the family using Sanger sequencing. Subsequently, a larger cohort of 16 unrelated dHMN patients was specifically screened for SIGMAR1 mutations. In the proband, we identified a homozygous missense variant (c.194T>A, p.Leu65Gln) in exon 2 of SIGMAR1 as the probable causative mutation. Pathogenicity is supported by evolutionary conservation, in silico analyses, and the strong phenotypic similarities with previously reported cases carrying coding sequence mutations in SIGMAR1. No other mutations were identified in 16 additional patients with dHMN. We suggest that coding sequence mutations in SIGMAR1 present clinically with a combination of dHMN and pyramidal tract signs, with or without spasticity, in the lower limbs. Preferential involvement of extensor muscles of the upper limbs may be a distinctive feature of the disease. These observations should be confirmed in future studies. © 2016 American Academy of Neurology.
Analysis of potential protein-modifying variants in 9000 endometriosis patients and 150000 controls of European ancestry.

PubMed

Sapkota, Yadav; Vivo, Immaculata De; Steinthorsdottir, Valgerdur; Fassbender, Amelie; Bowdler, Lisa; Buring, Julie E; Edwards, Todd L; Jones, Sarah; O, Dorien; Peterse, Daniëlle; Rexrode, Kathryn M; Ridker, Paul M; Schork, Andrew J; Thorleifsson, Gudmar; Wallace, Leanne M; Kraft, Peter; Morris, Andrew P; Nyholt, Dale R; Edwards, Digna R Velez; Nyegaard, Mette; D'Hooghe, Thomas; Chasman, Daniel I; Stefansson, Kari; Missmer, Stacey A; Montgomery, Grant W

2017-09-12

Genome-wide association (GWA) studies have identified 19 independent common risk loci for endometriosis. Most of the GWA variants are non-coding and the genes responsible for the association signals have not been identified. Herein, we aimed to assess the potential role of protein-modifying variants in endometriosis using exome-array genotyping in 7164 cases and 21005 controls, and a replication set of 1840 cases and 129016 controls of European ancestry. Results in the discovery sample identified significant evidence for association with coding variants in single-variant (rs1801232-CUBN) and gene-level (CIITA and PARP4) meta-analyses, but these did not survive replication. In the combined analysis, there was genome-wide significant evidence for rs13394619 (P = 2.3 × 10⁻⁹) in GREB1 at 2p25.1, a locus previously identified in a GWA meta-analysis of European and Japanese samples. Despite sufficient power, our results did not identify any protein-modifying variants (MAF > 0.01) with moderate or large effect sizes in endometriosis, although these variants may exist in non-European populations or in high-risk families. The results suggest continued discovery efforts should focus on genotyping large numbers of surgically-confirmed endometriosis cases and controls, and/or sequencing high-risk families to identify novel rare variants to provide greater insights into the molecular pathogenesis of the disease.
Strategies to Improve Efficiency and Specificity of Degenerate Primers in PCR.

PubMed

Campos, Maria Jorge; Quesada, Alberto

2017-01-01

PCR with degenerate primers can be used to identify the coding sequence of an unknown protein or to detect a genetic variant within a gene family. These primers, which are complex mixtures of slightly different oligonucleotide sequences, can be optimized to increase the efficiency and/or specificity of PCR amplification of a sequence of interest by introducing mismatches with the target sequence and balancing their position toward the 5'- or 3'-end of the primer. In this work, we explain in detail examples of the rational design of primers in two different applications, including the use of specific determinants at the 3'-end, to: (1) improve PCR efficiency with coding sequences for members of a protein family through full degeneracy at a core box of conserved genetic information, with reduced degeneracy at the 5'-end, and (2) optimize the specificity of allelic discrimination between closely related orthologs using 5'-end degenerate primers.
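A degenerate primer is usually written with IUPAC ambiguity codes, and its "degeneracy" is the number of distinct oligonucleotides in the mixture. The sketch below computes that number and enumerates the mixture for a short primer. The example sequence is hypothetical and not taken from the chapter; the code table is the standard IUPAC nucleotide notation.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Number of distinct sequences represented by a degenerate primer."""
    count = 1
    for code in primer.upper():
        count *= len(IUPAC[code])
    return count


def expand(primer):
    """List every concrete sequence in the degenerate primer mixture."""
    choices = (IUPAC[code] for code in primer.upper())
    return ["".join(seq) for seq in product(*choices)]


# Hypothetical primer: degenerate in the core, fixed at both ends.
primer = "ATGGARYTNAC"
print(degeneracy(primer))
print(expand(primer)[:4])
```

Because the degeneracy multiplies at every ambiguous position, it grows quickly. This is one reason to confine ambiguity to the positions that need it, since each individual oligonucleotide becomes a smaller fraction of the mixture as degeneracy rises.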
[Genetic variants in miRNAs and its association with breast cancer].

PubMed

Méndez-Gómez, Susana; Ruiz Esparza-Garrido, Ruth; Velázquez-Flores, Miguel; Dolores-Vergara, Maria; Salamanca-Gómez, Fabio; Arenas-Aranda, Diego Julio

2014-01-01

In Mexico, breast cancer is the leading cause of cancer death in females. At the molecular level, non-coding RNAs and especially microRNAs have played an important role in the origin and development of this neoplasm. In the Anglo-Saxon population, diverse genetic variants in microRNA genes and in their targets are associated with the development of this disease. In the Mexican population it is not known if these or other variants exist. Identification of these or new variants in our population is fundamental in order to have a better understanding of cancer development and to help establish a better diagnostic strategy. DNA was isolated from mammary tumors, adjacent tissue and peripheral blood of Mexican females with or without cancer. From DNA, five microRNA genes and three of their targets were amplified and sequenced. Genetic variants associated with breast cancer in an Anglo-Saxon population have been previously identified in these sequences. In the samples studied we identified seven single nucleotide polymorphisms (SNPs). Two had not been previously described and were identified only in women with cancer. The new variants may be genetic predisposition factors for the development of breast cancer in our population. Further experiments are needed to determine the involvement of these variants in the development, establishment and progression of breast cancer.
A Bayesian Framework for Generalized Linear Mixed Modeling Identifies New Candidate Loci for Late-Onset Alzheimer’s Disease

PubMed Central

Wang, Xulong; Philip, Vivek M.; Ananda, Guruprasad; White, Charles C.; Malhotra, Ankit; Michalski, Paul J.; Karuturi, Krishna R. Murthy; Chintalapudi, Sumana R.; Acklin, Casey; Sasner, Michael; Bennett, David A.; De Jager, Philip L.; Howell, Gareth R.; Carter, Gregory W.

2018-01-01

Recent technical and methodological advances have greatly enhanced genome-wide association studies (GWAS). The advent of low-cost, whole-genome sequencing facilitates high-resolution variant identification, and the development of linear mixed models (LMM) allows improved identification of putatively causal variants. While essential for correcting false positive associations due to sample relatedness and population stratification, LMMs have commonly been restricted to quantitative variables. However, phenotypic traits in association studies are often categorical, coded as binary case-control or ordered variables describing disease stages. To address these issues, we have devised a method for genomic association studies that implements a generalized LMM (GLMM) in a Bayesian framework, called Bayes-GLMM. Bayes-GLMM has four major features: (1) support of categorical, binary, and quantitative variables; (2) cohesive integration of previous GWAS results for related traits; (3) correction for sample relatedness by mixed modeling; and (4) model estimation by both Markov chain Monte Carlo sampling and maximum likelihood estimation. We applied Bayes-GLMM to the whole-genome sequencing cohort of the Alzheimer’s Disease Sequencing Project. This study contains 570 individuals from 111 families, each with Alzheimer’s disease diagnosed at one of four confidence levels. Using Bayes-GLMM we identified four variants in three loci significantly associated with Alzheimer’s disease. Two variants, rs140233081 and rs149372995, lie between PRKAR1B and PDGFA. The coded proteins are localized to the glial-vascular unit, and PDGFA transcript levels are associated with Alzheimer’s disease-related neuropathology. In summary, this work provides implementation of a flexible, generalized mixed-model approach in a Bayesian framework for association studies. PMID:29507048
Low incidence of DNA sequence variation in human induced pluripotent stem cells generated by non-integrating plasmid expression

PubMed Central

Cheng, Linzhao; Hansen, Nancy F.; Zhao, Ling; Du, Yutao; Zou, Chunlin; Donovan, Frank X.; Chou, Bin-Kuan; Zhou, Guangyu; Li, Shijie; Dowey, Sarah N.; Ye, Zhaohui; Chandrasekharappa, Settara C.; Yang, Huanming; Mullikin, James C.; Liu, P. Paul

2012-01-01

Summary The utility of induced pluripotent stem cells (iPSCs) as models to study diseases and as sources for cell therapy depends on the integrity of their genomes. Despite recent publications of DNA sequence variations in the iPSCs, the true scope of such changes for the entire genome is not clear. Here we report the whole-genome sequencing of three human iPSC lines derived from two cell types of an adult donor by episomal vectors. The vector sequence was undetectable in the deeply sequenced iPSC lines. We identified 1058–1808 heterozygous single nucleotide variants (SNVs), but no copy number variants, in each iPSC line. Six to twelve of these SNVs were within coding regions in each iPSC line, but ~50% of them are synonymous changes and the remaining are not selectively enriched for known genes associated with cancers. Our data thus suggest that episome-mediated reprogramming is not inherently mutagenic during integration-free iPSC induction. PMID:22385660
Evaluating whole genome sequence data from the Genetic Absence Epilepsy Rat from Strasbourg and its related non-epileptic strain

PubMed Central

Powell, Kim L.; Zhu, Mingfu; Campbell, C. Ryan; Maia, Jessica M.; Ren, Zhong; Jones, Nigel C.; O’Brien, Terence J.; Petrovski, Slavé

2017-01-01

Objective: The Genetic Absence Epilepsy Rats from Strasbourg (GAERS) are an inbred Wistar rat strain widely used as a model of genetic generalised epilepsy with absence seizures. As in humans, the genetic architecture that results in genetic generalized epilepsy in GAERS is poorly understood. Here we present the strain-specific variants found among the epileptic GAERS and their related Non-Epileptic Control (NEC) strain. The GAERS and NEC represent a powerful opportunity to identify neurobiological factors that are associated with the genetic generalised epilepsy phenotype. Methods: We performed whole genome sequencing on adult epileptic GAERS and adult NEC rats, a strain derived from the same original Wistar colony. We also generated whole genome sequencing on four double-crossed (GAERS with NEC) F2 selected for high-seizing (n = 2) and non-seizing (n = 2) phenotypes. Results: Specific to the GAERS genome, we identified 1.12 million single nucleotide variants, 296.5K short insertion-deletions, and 354 putative copy number variants that result in complete or partial loss/duplication of 41 genes. Of the GAERS-specific variants that met high quality criteria, 25 are annotated as stop codon gain/loss, 56 as putative essential splice sites, and 56 indels are predicted to result in a frameshift. Subsequent screening against the two F2 progeny sequenced for having the highest and two F2 progeny for having the lowest seizure burden identified only the selected Cacna1h GAERS-private protein-coding variant as exclusively co-segregating with the two high-seizing F2 rats. Significance: This study highlights an approach for using whole genome sequencing to narrow down to a manageable candidate list of genetic variants in a complex genetic epilepsy animal model, and suggests utility of this sequencing design to investigate other spontaneously occurring animal models of human disease. PMID:28708842
An integrated map of genetic variation from 1,092 human genomes

PubMed Central

2012-01-01

Summary Through characterising the geographic and functional spectrum of human genetic variation, the 1000 Genomes Project aims to build a resource to help understand the genetic contribution to disease. We describe the genomes of 1,092 individuals from 14 populations, constructed using a combination of low-coverage whole-genome and exome sequencing. By developing methodologies to integrate information across multiple algorithms and diverse data sources we provide a validated haplotype map of 38 million SNPs, 1.4 million indels and over 14 thousand larger deletions. We show that individuals from different populations carry different profiles of rare and common variants and that low-frequency variants show substantial geographic differentiation, which is further increased by the action of purifying selection. We show that evolutionary conservation and coding consequence are key determinants of the strength of purifying selection, that rare-variant load varies substantially across biological pathways and that each individual harbours hundreds of rare non-coding variants at conserved sites, such as transcription-factor-motif disrupting changes. This resource, which captures up to 98% of accessible SNPs at a frequency of 1% in populations of medical genetics focus, enables analysis of common and low-frequency variants in individuals from diverse, including admixed, populations. PMID:23128226
Identified OAS3 gene variants associated with coexistence of HBsAg and anti-HBs in chronic HBV infection.

PubMed

Wang, S; Wang, J; Fan, M-J; Li, T-Y; Pan, H; Wang, X; Liu, H-K; Lin, Q-F; Zhang, J-G; Guan, L-P; Zhernakova, D V; O'Brien, S J; Feng, Z-R; Chang, L; Dai, E-H; Lu, J-H; Xi, H-L; Zeng, Z; Yu, Y-Y; Wang, B-B

2018-03-27

The underlying mechanism of coexistence of hepatitis B surface antigen (HBsAg) and hepatitis B surface antigen antibody (anti-HBs) is still controversial. To identify the host genetic factors related to this unusual clinical phenomenon, a two-stage study was conducted in the Chinese Han population. In the first stage, we performed a case-control (1:1) age- and gender-matched study of 101 cases with concurrent HBsAg and anti-HBs and 102 controls with negative HBsAg and positive anti-HBs using whole exome sequencing. In the second validation stage, we directly sequenced the 16 exons of the OAS3 gene in two dependent cohorts of 48 cases and 200 controls. In the first stage, a genome-wide association study of 58,563 polymorphic variants in 101 cases and 102 controls found no significant loci (P-value ≤ 0.05/58563), and no locus reached the conservative genome-wide significance threshold (P-value ≤ 5e-08). However, gene-based burden analysis showed that rare variants in the OAS3 gene were associated with the coexistence of HBsAg and anti-HBs (P-value = 4.127e-06 ≤ 0.05/6994). A total of 16 rare variants were screened out from 21 cases and 3 controls. In the second validation stage, one case with a stop-gained rare variant was identified. Fisher's exact test of all 149 cases and 302 controls showed that the rare coding sequence mutations were more frequent in cases than in controls (P-value = 7.299e-09, OR = 17.27, 95% CI [5.01-58.72]). Protein-coding rare variations in the OAS3 gene are associated with the coexistence of HBsAg and anti-HBs in patients with chronic HBV infection in the Chinese Han population. © 2018 John Wiley & Sons Ltd.
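The multiple-testing thresholds and the pooled odds ratio quoted in this record can be reproduced with simple arithmetic. The sketch below is illustrative only. The 2×2 carrier counts (22 of 149 cases, 3 of 302 controls) are an assumption inferred from the abstract (21 first-stage case carriers plus 1 validation-stage case, and 3 control carriers with no further control carriers assumed), and the two-sided Fisher test is a plain textbook implementation, not the authors' software.

```python
from math import comb

# Thresholds as written in the abstract (0.05 divided by the number of tests).
snp_threshold = 0.05 / 58563   # single-variant association tests
gene_threshold = 0.05 / 6994   # gene-based burden tests
burden_p = 4.127e-06           # reported OAS3 burden P-value

# ASSUMED carrier counts, inferred from the abstract (see lead-in).
case_carriers, case_total = 22, 149
ctrl_carriers, ctrl_total = 3, 302


def odds_ratio(a, b, c, d):
    """Sample odds ratio for the 2x2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher exact P-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x carriers falling in row 1.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are no more probable than the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


table = (case_carriers, case_total - case_carriers,
         ctrl_carriers, ctrl_total - ctrl_carriers)
or_estimate = odds_ratio(*table)
p_value = fisher_two_sided(*table)

print(f"single-variant threshold: {snp_threshold:.3e}")
print(f"gene-based threshold:     {gene_threshold:.3e}")
print(f"burden P passes gene-based threshold: {burden_p <= gene_threshold}")
print(f"odds ratio: {or_estimate:.2f}")
print(f"Fisher exact P: {p_value:.3e}")
```

Under the assumed counts the odds ratio agrees with the reported 17.27 to two decimals; the exact P-value depends on the test variant and counts the authors actually used.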
Global characterization of copy number variants in epilepsy patients from whole genome sequencing

PubMed Central

Meloche, Caroline; Andrade, Danielle M.; Lafreniere, Ron G.; Gravel, Micheline; Spiegelman, Dan; Dionne-Laporte, Alexandre; Boelman, Cyrus; Hamdan, Fadi F.; Michaud, Jacques L.; Rouleau, Guy; Minassian, Berge A.; Bourque, Guillaume; Cossette, Patrick

2018-01-01

Epilepsy will affect nearly 3% of people at some point during their lifetime. Previous copy number variants (CNVs) studies of epilepsy have used array-based technology and were restricted to the detection of large or exonic events. In contrast, whole-genome sequencing (WGS) has the potential to more comprehensively profile CNVs but existing analytic methods suffer from limited accuracy. We show that this is in part due to the non-uniformity of read coverage, even after intra-sample normalization. To improve on this, we developed PopSV, an algorithm that uses multiple samples to control for technical variation and enables the robust detection of CNVs. Using WGS and PopSV, we performed a comprehensive characterization of CNVs in 198 individuals affected with epilepsy and 301 controls. For both large and small variants, we found an enrichment of rare exonic events in epilepsy patients, especially in genes with predicted loss-of-function intolerance. Notably, this genome-wide survey also revealed an enrichment of rare non-coding CNVs near previously known epilepsy genes. This enrichment was strongest for non-coding CNVs located within 100 Kbp of an epilepsy gene and in regions associated with changes in the gene expression, such as expression QTLs or DNase I hypersensitive sites. Finally, we report on 21 potentially damaging events that could be associated with known or new candidate epilepsy genes. Our results suggest that comprehensive sequence-based profiling of CNVs could help explain a larger fraction of epilepsy cases. PMID:29649218
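The idea of using multiple samples to control for technical variation can be illustrated by comparing each genomic bin's read count in a test sample with the same bin across a set of reference samples. This is a simplified sketch of that idea only, not PopSV's actual implementation: the function names, the toy counts, the assumption that counts are already normalized for sequencing depth, and the |z| > 3 cut-off are all hypothetical.

```python
from statistics import mean, stdev


def bin_z_scores(test_counts, reference_counts):
    """Z-score of each bin in the test sample against the reference samples.

    test_counts: read counts per bin for one sample.
    reference_counts: one list of per-bin counts for each reference sample.
    """
    scores = []
    for i, count in enumerate(test_counts):
        ref = [sample[i] for sample in reference_counts]
        mu, sd = mean(ref), stdev(ref)
        scores.append((count - mu) / sd if sd > 0 else 0.0)
    return scores


def call_cnv_bins(z_scores, threshold=3.0):
    """Flag bins whose coverage deviates strongly from the reference set."""
    return [(i, "deletion" if z < 0 else "duplication")
            for i, z in enumerate(z_scores) if abs(z) > threshold]


# Toy data: four reference samples, three bins each (assumed depth-normalized).
reference = [[100, 98, 52], [104, 101, 49], [97, 103, 50], [99, 99, 51]]
test_sample = [101, 48, 50]

z = bin_z_scores(test_sample, reference)
calls = call_cnv_bins(z)
print(calls)
```

Because each bin is judged against the same bin in other samples, a region with systematically low coverage (the third bin here) is not flagged merely for being low.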
TREM2 is associated with increased risk for Alzheimer's disease in African Americans.

PubMed

Jin, Sheng Chih; Carrasquillo, Minerva M; Benitez, Bruno A; Skorupa, Tara; Carrell, David; Patel, Dwani; Lincoln, Sarah; Krishnan, Siddharth; Kachadoorian, Michaela; Reitz, Christiane; Mayeux, Richard; Wingo, Thomas S; Lah, James J; Levey, Allan I; Murrell, Jill; Hendrie, Hugh; Foroud, Tatiana; Graff-Radford, Neill R; Goate, Alison M; Cruchaga, Carlos; Ertekin-Taner, Nilüfer

2015-04-10

TREM2 encodes for triggering receptor expressed on myeloid cells 2 and has rare, coding variants that associate with risk for late-onset Alzheimer's disease (LOAD) in Caucasians of European and North-American origin. This study evaluated the role of TREM2 in LOAD risk in African-American (AA) subjects. We performed exonic sequencing and validation in two independent cohorts of >800 subjects. We selected six coding variants (p.R47H, p.R62H, p.D87N, p.E151K, p.W191X, and p.L211P) for case-control analyses in a total of 906 LOAD cases vs. 2,487 controls. We identified significant LOAD risk association with p.L211P (p=0.01, OR=1.27, 95%CI=1.05-1.54) and suggestive association with p.W191X (p=0.08, OR=1.35, 95%CI=0.97-1.87). Conditional analysis suggests that p.L211P, which is in linkage disequilibrium with p.W191X, may be the stronger variant of the two, but does not rule out independent contribution of the latter. TREM2 p.L211P resides within the cytoplasmic domain and p.W191X is a stop-gain mutation within the shorter TREM-2V transcript. The coding variants within the extracellular domain of TREM2 previously shown to confer LOAD risk in Caucasians were extremely rare in our AA cohort and did not associate with LOAD risk. Our findings suggest that TREM2 coding variants also confer LOAD risk in AA, but implicate variants within different regions of the gene than those identified for Caucasian subjects. These results underscore the importance of investigating different ethnic populations for disease risk variant discovery, which may uncover allelic heterogeneity with potentially diverse mechanisms of action.
Identifying Mendelian disease genes with the Variant Effect Scoring Tool

PubMed Central

2013-01-01

Background Whole exome sequencing studies identify hundreds to thousands of rare protein coding variants of ambiguous significance for human health. Computational tools are needed to accelerate the identification of specific variants and genes that contribute to human disease. Results We have developed the Variant Effect Scoring Tool (VEST), a supervised machine learning-based classifier, to prioritize rare missense variants with likely involvement in human disease. The VEST classifier training set comprised ~ 45,000 disease mutations from the latest Human Gene Mutation Database release and another ~45,000 high frequency (allele frequency >1%) putatively neutral missense variants from the Exome Sequencing Project. VEST outperforms some of the most popular methods for prioritizing missense variants in carefully designed holdout benchmarking experiments (VEST ROC AUC = 0.91, PolyPhen2 ROC AUC = 0.86, SIFT4.0 ROC AUC = 0.84). VEST estimates variant score p-values against a null distribution of VEST scores for neutral variants not included in the VEST training set. These p-values can be aggregated at the gene level across multiple disease exomes to rank genes for probable disease involvement. We tested the ability of an aggregate VEST gene score to identify candidate Mendelian disease genes, based on whole-exome sequencing of a small number of disease cases. We used whole-exome data for two Mendelian disorders for which the causal gene is known. Considering only genes that contained variants in all cases, the VEST gene score ranked dihydroorotate dehydrogenase (DHODH) number 2 of 2253 genes in four cases of Miller syndrome, and myosin-3 (MYH3) number 2 of 2313 genes in three cases of Freeman Sheldon syndrome. Conclusions Our results demonstrate the potential power gain of aggregating bioinformatics variant scores into gene-level scores and the general utility of bioinformatics in assisting the search for disease genes in large-scale exome sequencing studies. 
VEST is available as a stand-alone software package at http://wiki.chasmsoftware.org and is hosted by the CRAVAT web server at http://www.cravat.us PMID:23819870
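The two steps described in this record, scoring a variant against a null distribution and aggregating variant p-values per gene, can be sketched as follows. The abstract does not state which aggregation method VEST uses, so Fisher's method is swapped in here purely as a stand-in. The gene names, scores, and function names are hypothetical.

```python
from math import exp, log


def empirical_p(score, null_scores):
    """Empirical p-value: share of null scores at least as high as `score`.

    The +1 terms keep the estimate away from exactly zero.
    """
    exceed = sum(1 for s in null_scores if s >= score)
    return (exceed + 1) / (len(null_scores) + 1)


def fisher_combine(p_values):
    """Combine independent p-values with Fisher's method.

    The statistic X = -2 * sum(ln p) follows a chi-square distribution with
    2k degrees of freedom, whose survival function has the closed form
    exp(-X/2) * sum_{i<k} (X/2)^i / i!.
    """
    k = len(p_values)
    half_stat = -sum(log(p) for p in p_values)
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return exp(-half_stat) * total


# Hypothetical per-case variant p-values for three genes.
gene_p_values = {
    "GENE_A": [0.001, 0.002, 0.004],
    "GENE_B": [0.2, 0.6, 0.4],
    "GENE_C": [0.03, 0.5, 0.01],
}

gene_scores = {gene: fisher_combine(ps) for gene, ps in gene_p_values.items()}
ranking = sorted(gene_scores, key=gene_scores.get)
print(ranking)
```

Fisher's method assumes independent p-values, which may not hold for variants from related cases; a real analysis would need to justify or replace that assumption.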

X-Linked Glomerulopathy Due to COL4A5 Founder Variant.

PubMed

Barua, Moumita; John, Rohan; Stella, Lorenzo; Li, Weili; Roslin, Nicole M; Sharif, Bedra; Hack, Saidah; Lajoie-Starkell, Ginette; Schwaderer, Andrew L; Becknell, Brian; Wuttke, Matthias; Köttgen, Anna; Cattran, Daniel; Paterson, Andrew D; Pei, York

2018-03-01

Alport syndrome is a rare hereditary disorder caused by rare variants in 1 of 3 genes encoding type IV collagen. Rare variants in COL4A5 on chromosome Xq22 cause X-linked Alport syndrome, which accounts for ∼80% of the cases. Alport syndrome has a variable clinical presentation, including progressive kidney failure, hearing loss, and ocular defects. Exome sequencing performed in 2 affected related males with an undefined X-linked glomerulopathy characterized by global and segmental glomerulosclerosis, mesangial hypercellularity, and vague basement membrane immune complex deposition revealed a COL4A5 sequence variant, a substitution of a thymine by a guanine at nucleotide 665 (c.T665G; rs281874761) of the coding DNA predicted to lead to a phenylalanine to cysteine substitution at amino acid 222, which was not seen in databases cataloguing natural human genetic variation, including dbSNP138, 1000 Genomes Project release version 01-11-2004, Exome Sequencing Project 21-06-2014, or ExAC 01-11-2014. Review of the literature identified 2 additional families with the same COL4A5 variant leading to similar atypical histopathologic features, suggesting a unique pathologic mechanism initiated by this specific rare variant. Homology modeling suggests that the substitution alters the structural and dynamic properties of the type IV collagen trimer. Genetic analysis comparing members of the 3 families indicated a distant relationship with a shared haplotype, implying a founder effect. Crown Copyright © 2017. Published by Elsevier Inc. All rights reserved.
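The nucleotide-to-residue arithmetic in this record can be checked directly: coding position 665 is the second base of codon 222, and under the standard genetic code a T-to-G change at a second codon position is compatible with a phenylalanine codon (TTT/TTC) becoming a cysteine codon (TGT/TGC). The sketch below only enumerates what the standard code allows; it does not use the actual COL4A5 reference sequence, and the function names are hypothetical.

```python
# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def locate(cds_position):
    """Map a 1-based coding-DNA position to (codon number, position in codon)."""
    index = cds_position - 1
    return index // 3 + 1, index % 3 + 1


def possible_changes(pos_in_codon, ref_base, alt_base):
    """All (reference, altered) amino-acid pairs the substitution could cause."""
    changes = set()
    for codon, ref_aa in CODON_TABLE.items():
        if codon[pos_in_codon - 1] == ref_base:
            mutated = codon[:pos_in_codon - 1] + alt_base + codon[pos_in_codon:]
            changes.add((ref_aa, CODON_TABLE[mutated]))
    return changes


codon_number, pos_in_codon = locate(665)
changes = possible_changes(pos_in_codon, "T", "G")
print(codon_number, pos_in_codon)
print(sorted(changes))
```

The enumeration includes a phenylalanine-to-cysteine change but no change that starts from cysteine, because both cysteine codons carry G, not T, at the second position.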
Next generation sequencing gives an insight into the characteristics of highly selected breeds versus non-breed horses in the course of domestication.

PubMed

Metzger, Julia; Tonda, Raul; Beltran, Sergi; Agueda, Lídia; Gut, Marta; Distl, Ottmar

2014-07-04

Domestication has shaped the horse and led to a group of many different types. Some have been under strong human selection while others developed in close relationship with nature. The aim of our study was to perform next generation sequencing of breed and non-breed horses to provide an insight into genetic influences on selective forces. Whole genome sequencing of five horses of four different populations revealed 10,193,421 single nucleotide polymorphisms (SNPs) and 1,361,948 insertion/deletion polymorphisms (indels). In comparison to horse variant databases and previous reports, we were able to identify 3,394,883 novel SNPs and 868,525 novel indels. We analyzed the distribution of individual variants and found significant enrichment of private mutations in coding regions of genes involved in primary metabolic processes, anatomical structures, morphogenesis and cellular components in non-breed horses. In contrast, breed horses showed private mutations in genes affecting cell communication, lipid metabolic process, neurological system process, muscle contraction, ion transport, developmental processes of the nervous system and ectoderm. Our next generation sequencing data constitute an important first step for the characterization of non-breed in comparison to breed horses and provide a large number of novel variants for future analyses. Functional annotations suggest specific variants that could play a role for the characterization of breed or non-breed horses.
ClinGen Pathogenicity Calculator: a configurable system for assessing pathogenicity of genetic variants.

PubMed

Patel, Ronak Y; Shah, Neethu; Jackson, Andrew R; Ghosh, Rajarshi; Pawliczek, Piotr; Paithankar, Sameer; Baker, Aaron; Riehle, Kevin; Chen, Hailin; Milosavljevic, Sofia; Bizon, Chris; Rynearson, Shawn; Nelson, Tristan; Jarvik, Gail P; Rehm, Heidi L; Harrison, Steven M; Azzariti, Danielle; Powell, Bradford; Babb, Larry; Plon, Sharon E; Milosavljevic, Aleksandar

2017-01-12

The success of the clinical use of sequencing based tests (from single gene to genomes) depends on the accuracy and consistency of variant interpretation. Aiming to improve the interpretation process through practice guidelines, the American College of Medical Genetics and Genomics (ACMG) and the Association for Molecular Pathology (AMP) have published standards and guidelines for the interpretation of sequence variants. However, manual application of the guidelines is tedious and prone to human error. Web-based tools and software systems may not only address this problem but also document reasoning and supporting evidence, thus enabling transparency of evidence-based reasoning and resolution of discordant interpretations. In this report, we describe the design, implementation, and initial testing of the Clinical Genome Resource (ClinGen) Pathogenicity Calculator, a configurable system and web service for the assessment of pathogenicity of Mendelian germline sequence variants. The system allows users to enter the applicable ACMG/AMP-style evidence tags for a specific allele with links to supporting data for each tag and generate guideline-based pathogenicity assessment for the allele. Through automation and comprehensive documentation of evidence codes, the system facilitates more accurate application of the ACMG/AMP guidelines, improves standardization in variant classification, and facilitates collaborative resolution of discordances. The rules of reasoning are configurable with gene-specific or disease-specific guideline variations (e.g. cardiomyopathy-specific frequency thresholds and functional assays). The software is modular, equipped with robust application program interfaces (APIs), and available under a free open source license and as a cloud-hosted web service, thus facilitating both stand-alone use and integration with existing variant curation and interpretation systems. 
The Pathogenicity Calculator is accessible at http://calculator.clinicalgenome.org. By enabling evidence-based reasoning about the pathogenicity of genetic variants and by documenting supporting evidence, the Calculator contributes toward the creation of a knowledge commons and more accurate interpretation of sequence variants in research and clinical care.
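The kind of rule evaluation such a system automates can be sketched as a function that maps a set of ACMG/AMP-style evidence tags to a classification, using the combining rules of the 2015 ACMG/AMP guidelines. This is an illustration only: the function name and tag parsing are hypothetical, it is not the Calculator's code or API, and it leaves out the configurable gene-specific and disease-specific variations the record describes.

```python
from collections import Counter


def classify(tags):
    """Classify a variant from evidence tags such as 'PVS1', 'PM2', 'BP4'."""
    # Strip the trailing number to get the strength category (PVS, PS, PM, ...).
    n = Counter(tag.rstrip("0123456789") for tag in tags)
    pvs, ps, pm, pp = n["PVS"], n["PS"], n["PM"], n["PP"]
    ba, bs, bp = n["BA"], n["BS"], n["BP"]

    pathogenic = (
        (pvs >= 1 and (ps >= 1 or pm >= 2 or (pm >= 1 and pp >= 1) or pp >= 2))
        or ps >= 2
        or (ps >= 1 and (pm >= 3 or (pm >= 2 and pp >= 2) or (pm >= 1 and pp >= 4)))
    )
    likely_pathogenic = (
        (pvs >= 1 and pm >= 1)
        or (ps >= 1 and pm >= 1)
        or (ps >= 1 and pp >= 2)
        or pm >= 3
        or (pm >= 2 and pp >= 2)
        or (pm >= 1 and pp >= 4)
    )
    benign = ba >= 1 or bs >= 2
    likely_benign = (bs >= 1 and bp >= 1) or bp >= 2

    # Contradictory pathogenic and benign evidence defaults to uncertain.
    if (pathogenic or likely_pathogenic) and (benign or likely_benign):
        return "Uncertain significance"
    if pathogenic:
        return "Pathogenic"
    if likely_pathogenic:
        return "Likely pathogenic"
    if benign:
        return "Benign"
    if likely_benign:
        return "Likely benign"
    return "Uncertain significance"


print(classify(["PVS1", "PS3"]))
print(classify(["PM1", "PM2", "PP3", "PP4"]))
print(classify(["PM2"]))
```

Keeping the rules in one place like this is what makes them swappable; a configurable system would load the thresholds from a rule set instead of hard-coding them.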
Hypervariability of ribosomal DNA at multiple chromosomal sites in lake trout (Salvelinus namaycush).

PubMed

Zhuo, L; Reed, K M; Phillips, R B

1995-06-01

Variation in the intergenic spacer (IGS) of the ribosomal DNA (rDNA) of lake trout (Salvelinus namaycush) was examined. Digestion of genomic DNA with restriction enzymes showed that almost every individual had a unique combination of length variants with most of this variation occurring within rather than between populations. Sequence analysis of a 2.3 kilobase (kb) EcoRI-DraI fragment spanning the 3' end of the 28S coding region and approximately 1.8 kb of the IGS revealed two blocks of repetitive DNA. Putative transcriptional termination sites were found approximately 220 bases (b) downstream from the end of the 28S coding region. Comparison of the 2.3-kb fragments with two longer (3.1 kb) fragments showed that the major difference in length resulted from variation in the number of short (89 b) repeats located 3' to the putative terminator. Repeat units within a single nucleolus organizer region (NOR) appeared relatively homogeneous and genetic analysis found variants to be stably inherited. A comparison of the number of spacer-length variants with the number of NORs found that the number of length variants per individual was always less than the number of NORs. Examination of spacer variants in five populations showed that populations with more NORs had more spacer variants, indicating that variants are present at different rDNA sites on nonhomologous chromosomes.
A Cytogenetic Abnormality and Rare Coding Variants Identify ABCA13 as a Candidate Gene in Schizophrenia, Bipolar Disorder, and Depression

PubMed Central

Knight, Helen M.; Pickard, Benjamin S.; Maclean, Alan; Malloy, Mary P.; Soares, Dinesh C.; McRae, Allan F.; Condie, Alison; White, Angela; Hawkins, William; McGhee, Kevin; van Beck, Margaret; MacIntyre, Donald J.; Starr, John M.; Deary, Ian J.; Visscher, Peter M.; Porteous, David J.; Cannon, Ronald E.; St Clair, David; Muir, Walter J.; Blackwood, Douglas H.R.

2009-01-01

Schizophrenia and bipolar disorder are leading causes of morbidity across all populations, with heritability estimates of ∼80% indicating a substantial genetic component. Population genetics and genome-wide association studies suggest an overlap of genetic risk factors between these illnesses but it is unclear how this genetic component is divided between common gene polymorphisms, rare genomic copy number variants, and rare gene sequence mutations. We report evidence that the lipid transporter gene ABCA13 is a susceptibility factor for both schizophrenia and bipolar disorder. After the initial discovery of its disruption by a chromosome abnormality in a person with schizophrenia, we resequenced ABCA13 exons in 100 cases with schizophrenia and 100 controls. Multiple rare coding variants were identified including one nonsense and nine missense mutations and compound heterozygosity/homozygosity in six cases. Variants were genotyped in additional schizophrenia, bipolar, depression (n > 1600), and control (n > 950) cohorts and the frequency of all rare variants combined was greater than controls in schizophrenia (OR = 1.93, p = 0.0057) and bipolar disorder (OR = 2.71, p = 0.00007). The population attributable risk of these mutations was 2.2% for schizophrenia and 4.0% for bipolar disorder. In a study of 21 families of mutation carriers, we genotyped affected and unaffected relatives and found significant linkage (LOD = 4.3) of rare variants with a phenotype including schizophrenia, bipolar disorder, and major depression. These data identify a candidate gene, highlight the genetic overlap between schizophrenia, bipolar disorder, and depression, and suggest that rare coding variants may contribute significantly to risk of these disorders. PMID:19944402
Gene panel sequencing improves the diagnostic work-up of patients with idiopathic erythrocytosis and identifies new mutations

PubMed Central

Camps, Carme; Petousi, Nayia; Bento, Celeste; Cario, Holger; Copley, Richard R.; McMullin, Mary Frances; van Wijk, Richard; Ratcliffe, Peter J.; Robbins, Peter A.; Taylor, Jenny C.

2016-01-01

Erythrocytosis is a rare disorder characterized by increased red cell mass and elevated hemoglobin concentration and hematocrit. Several genetic variants have been identified as causes for erythrocytosis in genes belonging to different pathways including oxygen sensing, erythropoiesis and oxygen transport. However, despite clinical investigation and screening for these mutations, the cause of disease cannot be found in a considerable number of patients, who are classified as having idiopathic erythrocytosis. In this study, we developed a targeted next-generation sequencing panel encompassing the exonic regions of 21 genes from relevant pathways (~79 Kb) and sequenced 125 patients with idiopathic erythrocytosis. The panel effectively screened 97% of coding regions of these genes, with an average coverage of 450×. It identified 51 different rare variants, all leading to alterations of protein sequence, with 57 out of 125 cases (45.6%) having at least one of these variants. Ten of these were known erythrocytosis-causing variants, which had been missed following existing diagnostic algorithms. Twenty-two were novel variants in erythrocytosis-associated genes (EGLN1, EPAS1, VHL, BPGM, JAK2, SH2B3) and in novel genes included in the panel (e.g. EPO, EGLN2, HIF3A, OS9), some with a high likelihood of functionality, for which future segregation, functional and replication studies will be useful to provide further evidence for causality. The rest were classified as polymorphisms. Overall, these results demonstrate the benefits of using a gene panel rather than existing methods in which focused genetic screening is performed depending on biochemical measurements: the gene panel improves diagnostic accuracy and provides the opportunity for discovery of novel variants. PMID:27651169
Gene panel sequencing improves the diagnostic work-up of patients with idiopathic erythrocytosis and identifies new mutations.

PubMed

Camps, Carme; Petousi, Nayia; Bento, Celeste; Cario, Holger; Copley, Richard R; McMullin, Mary Frances; van Wijk, Richard; Ratcliffe, Peter J; Robbins, Peter A; Taylor, Jenny C

2016-11-01

Erythrocytosis is a rare disorder characterized by increased red cell mass and elevated hemoglobin concentration and hematocrit. Several genetic variants have been identified as causes for erythrocytosis in genes belonging to different pathways including oxygen sensing, erythropoiesis and oxygen transport. However, despite clinical investigation and screening for these mutations, the cause of disease cannot be found in a considerable number of patients, who are classified as having idiopathic erythrocytosis. In this study, we developed a targeted next-generation sequencing panel encompassing the exonic regions of 21 genes from relevant pathways (~79 Kb) and sequenced 125 patients with idiopathic erythrocytosis. The panel effectively screened 97% of coding regions of these genes, with an average coverage of 450×. It identified 51 different rare variants, all leading to alterations of protein sequence, with 57 out of 125 cases (45.6%) having at least one of these variants. Ten of these were known erythrocytosis-causing variants, which had been missed following existing diagnostic algorithms. Twenty-two were novel variants in erythrocytosis-associated genes (EGLN1, EPAS1, VHL, BPGM, JAK2, SH2B3) and in novel genes included in the panel (e.g. EPO, EGLN2, HIF3A, OS9), some with a high likelihood of functionality, for which future segregation, functional and replication studies will be useful to provide further evidence for causality. The rest were classified as polymorphisms. Overall, these results demonstrate the benefits of using a gene panel rather than existing methods in which focused genetic screening is performed depending on biochemical measurements: the gene panel improves diagnostic accuracy and provides the opportunity for discovery of novel variants. Copyright© Ferrata Storti Foundation.
Intact Protein Analysis at 21 Tesla and X-Ray Crystallography Define Structural Differences in Single Amino Acid Variants of Human Mitochondrial Branched-Chain Amino Acid Aminotransferase 2 (BCAT2)

NASA Astrophysics Data System (ADS)

Anderson, Lissa C.; Håkansson, Maria; Walse, Björn; Nilsson, Carol L.

2017-09-01

Structural technologies are an essential component in the design of precision therapeutics. Precision medicine entails the development of therapeutics directed toward a designated target protein, with the goal to deliver the right drug to the right patient at the right time. In the field of oncology, protein structural variants are often associated with oncogenic potential. In a previous proteogenomic screen of patient-derived glioblastoma (GBM) tumor materials, we identified a sequence variant of human mitochondrial branched-chain amino acid aminotransferase 2 as a putative factor of resistance of GBM to standard-of-care-treatments. The enzyme generates glutamate, which is neurotoxic. To elucidate structural coordinates that may confer altered substrate binding or activity of the variant BCAT2 T186R, a 45 kDa protein, we applied combined ETD and CID top-down mass spectrometry in a LC-FT-ICR MS at 21 T, and X-Ray crystallography in the study of both the variant and non-variant intact proteins. The combined ETD/CID fragmentation pattern allowed for not only extensive sequence coverage but also confident localization of the amino acid variant to its position in the sequence. The crystallographic experiments confirmed the hypothesis generated by in silico structural homology modeling, that the Lys59 side-chain of BCAT2 may repulse the Arg186 in the variant protein (PDB code: 5MPR), leading to destabilization of the protein dimer and altered enzyme kinetics. Taken together, the MS and novel 3D structural data give us reason to further pursue BCAT2 T186R as a precision drug target in GBM.
Nonsyndromic cleft lip with or without cleft palate: Increased burden of rare variants within Gremlin-1, a component of the bone morphogenetic protein 4 pathway.

PubMed

Al Chawa, Taofik; Ludwig, Kerstin U; Fier, Heide; Pötzsch, Bernd; Reich, Rudolf H; Schmidt, Gül; Braumann, Bert; Daratsianos, Nikolaos; Böhmer, Anne C; Schuencke, Hannah; Alblas, Margrieta; Fricker, Nadine; Hoffmann, Per; Knapp, Michael; Lange, Christoph; Nöthen, Markus M; Mangold, Elisabeth

2014-06-01

The genes Gremlin-1 (GREM1) and Noggin (NOG) are components of the bone morphogenetic protein 4 pathway, which has been implicated in craniofacial development. Both genes map to recently identified susceptibility loci (chromosomal region 15q13, 17q22) for nonsyndromic cleft lip with or without cleft palate (nsCL/P). The aim of the present study was to determine whether rare variants in either gene are implicated in nsCL/P etiology. The complete coding regions, untranslated regions, and splice sites of GREM1 and NOG were sequenced in 96 nsCL/P patients and 96 controls of Central European ethnicity. Three burden and four nonburden tests were performed. Statistically significant results were followed up in a second case-control sample (n = 96, respectively). For rare variants observed in cases, segregation analyses were performed. In NOG, four rare sequence variants (minor allele frequency < 1%) were identified. Here, burden and nonburden analyses generated nonsignificant results. In GREM1, 33 variants were identified, 15 of which were rare. Of these, five were novel. Significant p-values were generated in three nonburden analyses. Segregation analyses revealed incomplete penetrance for all variants investigated. Our study did not provide support for NOG being the causal gene at 17q22. However, the observation of a significant excess of rare variants in GREM1 supports the hypothesis that this is the causal gene at chr. 15q13. Because no single causal variant was identified, future sequencing analyses of GREM1 should involve larger samples and the investigation of regulatory elements. © 2014 Wiley Periodicals, Inc.
CRAWview: for viewing splicing variation, gene families, and polymorphism in clusters of ESTs and full-length sequences.

PubMed

Chou, A; Burke, J

1999-05-01

DNA sequence clustering has become a valuable method in support of gene discovery and gene expression analysis. Our interest lies in leveraging the sequence diversity within clusters of expressed sequence tags (ESTs) to model gene structure for the study of gene variants that arise from, among other things, alternative mRNA splicing, polymorphism, and divergence after gene duplication, fusion, and translocation events. In previous work, CRAW was developed to discover gene variants from assembled clusters of ESTs. Most importantly, novel gene features (the differing units between gene variants, for example alternative exons, polymorphisms, transposable elements, etc.) that are specialized to tissue, disease, population, or developmental states can be identified when these tools collate DNA source information with gene variant discrimination. While the goal is complete automation of novel feature and gene variant detection, current methods are far from perfect, and hence the development of effective tools for visualization and exploratory data analysis is of paramount importance in the process of sifting through candidate genes and validating targets. We present CRAWview, a Java-based visualization extension to CRAW. Features that vary between gene forms are displayed using an automatically generated color-coded index. The reporting format of CRAWview gives a brief, high-level summary report to display overlap and divergence within clusters of sequences as well as the ability to 'drill down' and see detailed information concerning regions of interest. Additionally, the alignment viewing and editing capabilities of CRAWview make it possible to interactively correct frame-shifts and otherwise edit cluster assemblies. We have implemented CRAWview as a Java application across Windows NT/95 and UNIX platforms. A beta version of CRAWview will be freely available to academic users from Pangea Systems (http://www.pangeasystems.com).
Interactive web-based identification and visualization of transcript shared sequences.

PubMed

Azhir, Alaleh; Merino, Louis-Henri; Nauen, David W

2018-05-12

We have developed TraC (Transcript Consensus), a web-based tool for detecting and visualizing shared sequences among two or more mRNA transcripts such as splice variants. Results including exon-exon boundaries are returned in a highly intuitive, data-rich, interactive plot that permits users to explore the similarities and differences of multiple transcript sequences. The online tool (http://labs.pathology.jhu.edu/nauen/trac/) is free to use. The source code is freely available for download (https://github.com/nauenlab/TraC). Copyright © 2018 Elsevier Inc. All rights reserved.
Increasing the Yield in Targeted Next-Generation Sequencing by Implicating CNV Analysis, Non-Coding Exons and the Overall Variant Load: The Example of Retinal Dystrophies

PubMed Central

Eisenberger, Tobias; Neuhaus, Christine; Khan, Arif O.; Decker, Christian; Preising, Markus N.; Friedburg, Christoph; Bieg, Anika; Gliem, Martin; Issa, Peter Charbel; Holz, Frank G.; Baig, Shahid M.; Hellenbroich, Yorck; Galvez, Alberto; Platzer, Konrad; Wollnik, Bernd; Laddach, Nadja; Ghaffari, Saeed Reza; Rafati, Maryam; Botzenhart, Elke; Tinschert, Sigrid; Börger, Doris; Bohring, Axel; Schreml, Julia; Körtge-Jung, Stefani; Schell-Apacik, Chayim; Bakur, Khadijah; Al-Aama, Jumana Y.; Neuhann, Teresa; Herkenrath, Peter; Nürnberg, Gudrun; Nürnberg, Peter; Davis, John S.; Gal, Andreas; Bergmann, Carsten; Lorenz, Birgit; Bolz, Hanno J.

2013-01-01

Retinitis pigmentosa (RP) and Leber congenital amaurosis (LCA) are major causes of blindness. They result from mutations in many genes which has long hampered comprehensive genetic analysis. Recently, targeted next-generation sequencing (NGS) has proven useful to overcome this limitation. To uncover “hidden mutations” such as copy number variations (CNVs) and mutations in non-coding regions, we extended the use of NGS data by quantitative readout for the exons of 55 RP and LCA genes in 126 patients, and by including non-coding 5′ exons. We detected several causative CNVs which were key to the diagnosis in hitherto unsolved constellations, e.g. hemizygous point mutations in consanguineous families, and CNVs complemented apparently monoallelic recessive alleles. Mutations of non-coding exon 1 of EYS revealed its contribution to disease. In view of the high carrier frequency for retinal disease gene mutations in the general population, we considered the overall variant load in each patient to assess if a mutation was causative or reflected accidental carriership in patients with mutations in several genes or with single recessive alleles. For example, truncating mutations in RP1, a gene implicated in both recessive and dominant RP, were causative in biallelic constellations, unrelated to disease when heterozygous on a biallelic mutation background of another gene, or even non-pathogenic if close to the C-terminus. Patients with mutations in several loci were common, but without evidence for di- or oligogenic inheritance. Although the number of targeted genes was low compared to previous studies, the mutation detection rate was highest (70%) which likely results from completeness and depth of coverage, and quantitative data analysis. CNV analysis should routinely be applied in targeted NGS, and mutations in non-coding exons give reason to systematically include 5′-UTRs in disease gene or exome panels. 
Consideration of all variants is indispensable because even truncating mutations may be misleading. PMID:24265693
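The "quantitative readout" of exon coverage described in this abstract can be illustrated with a small read-depth comparison. The abstract does not specify the algorithm, so the following is only a minimal sketch under stated assumptions: the function names, the per-sample normalization, the control-median reference, and the 0.7 / 1.3 cut-offs are all hypothetical and not taken from the study.

```python
from statistics import median

def normalize(depths):
    """Scale per-exon read depths so they sum to 1 within one sample."""
    total = sum(depths.values())
    return {exon: depth / total for exon, depth in depths.items()}

def call_cnvs(sample, controls, loss=0.7, gain=1.3):
    """Flag exons whose normalized depth departs from the control median.

    The 0.7 / 1.3 cut-offs are illustrative assumptions, not study values.
    """
    sample_norm = normalize(sample)
    control_norms = [normalize(c) for c in controls]
    calls = {}
    for exon, value in sample_norm.items():
        reference = median(c[exon] for c in control_norms)
        ratio = value / reference
        if ratio < loss:
            calls[exon] = ("loss", round(ratio, 2))
        elif ratio > gain:
            calls[exon] = ("gain", round(ratio, 2))
    return calls

# Toy data: three controls with even coverage, one patient with half depth on exon3.
controls = [{f"exon{i}": 100 for i in range(1, 6)} for _ in range(3)]
patient = {"exon1": 100, "exon2": 100, "exon3": 50, "exon4": 100, "exon5": 100}
calls = call_cnvs(patient, controls)
print(calls)
```

In this toy case only exon3 is flagged, with a ratio near 0.5, which is what a heterozygous deletion would be expected to look like. Real panels would need controls for GC content, capture efficiency, and batch effects.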
Structure of allelic variants of subtype 5 of histone H1 in pea Pisum sativum L.

PubMed

Bogdanova, V S; Lester, D R; Berdnikov, V A; Andersson, I

2005-06-01

The pea genome contains seven histone H1 genes encoding different subtypes. Previously, the DNA sequence of only one gene, His1, coding for the subtype H1-1, had been identified. We isolated a histone H1 allele from a pea genomic DNA library. Data from the electrophoretic mobility of the pea H1 subtypes and their N-bromosuccinimide cleavage products indicated that the newly isolated gene corresponded to the H1-5 subtype encoded by His5. We confirmed this result by sequencing the gene from three pea lines with H1-5 allelic variants of altered electrophoretic mobility. The allele of the slow H1-5 variant differed from the standard allele by a nucleotide substitution that caused the replacement of the positively charged lysine with asparagine in the DNA-interacting domain of the histone molecule. A temperature-related occurrence had previously been demonstrated for this H1-5 variant in a study on a worldwide collection of pea germplasm. The variant tended to occur at higher frequencies in geographic regions with a cold climate. The fast allelic variant of H1-5 displayed a deletion resulting in the loss of a duplicated pentapeptide in the C-terminal domain.
Genetic analyses of bone morphogenetic protein 2, 4 and 7 in congenital combined pituitary hormone deficiency.

PubMed

Breitfeld, Jana; Martens, Susanne; Klammt, Jürgen; Schlicke, Marina; Pfäffle, Roland; Krause, Kerstin; Weidle, Kerstin; Schleinitz, Dorit; Stumvoll, Michael; Führer, Dagmar; Kovacs, Peter; Tönjes, Anke

2013-12-01

The complex process of development of the pituitary gland is regulated by a number of signalling molecules and transcription factors. Mutations in these factors have been identified in rare cases of congenital hypopituitarism but for most subjects with combined pituitary hormone deficiency (CPHD) genetic causes are unknown. Bone morphogenetic proteins (BMPs) affect induction and growth of the pituitary primordium and thus represent plausible candidates for mutational screening of patients with CPHD. We sequenced BMP2, 4 and 7 in 19 subjects with CPHD. For validation purposes, novel genetic variants were genotyped in 1046 healthy subjects. Additionally, potential functional relevance for most promising variants has been assessed by phylogenetic analyses and prediction of effects on protein structure. Sequencing revealed two novel variants and confirmed 30 previously known polymorphisms and mutations in BMP2, 4 and 7. Although phylogenetic analyses indicated that these variants map within strongly conserved gene regions, there was no direct support for their impact on protein structure when applying predictive bioinformatics tools. A mutation in the BMP4 coding region resulting in an amino acid exchange (p.Arg300Pro) appeared most interesting among the identified variants. Further functional analyses are required to ultimately map the relevance of these novel variants in CPHD.
Genetic analyses of bone morphogenetic protein 2, 4 and 7 in congenital combined pituitary hormone deficiency

PubMed Central

2013-01-01

Background The complex process of development of the pituitary gland is regulated by a number of signalling molecules and transcription factors. Mutations in these factors have been identified in rare cases of congenital hypopituitarism but for most subjects with combined pituitary hormone deficiency (CPHD) genetic causes are unknown. Bone morphogenetic proteins (BMPs) affect induction and growth of the pituitary primordium and thus represent plausible candidates for mutational screening of patients with CPHD. Methods We sequenced BMP2, 4 and 7 in 19 subjects with CPHD. For validation purposes, novel genetic variants were genotyped in 1046 healthy subjects. Additionally, potential functional relevance for most promising variants has been assessed by phylogenetic analyses and prediction of effects on protein structure. Results Sequencing revealed two novel variants and confirmed 30 previously known polymorphisms and mutations in BMP2, 4 and 7. Although phylogenetic analyses indicated that these variants map within strongly conserved gene regions, there was no direct support for their impact on protein structure when applying predictive bioinformatics tools. Conclusions A mutation in the BMP4 coding region resulting in an amino acid exchange (p.Arg300Pro) appeared most interesting among the identified variants. Further functional analyses are required to ultimately map the relevance of these novel variants in CPHD. PMID:24289245
Identification of low-frequency TRAF3IP2 coding variants in psoriatic arthritis patients and functional characterization

PubMed Central

2012-01-01

Introduction In recent genome-wide association studies for psoriatic arthritis (PsA) and psoriasis vulgaris, common coding variants in the TRAF3IP2 gene were identified to contribute to susceptibility to both disease entities. The risk allele of p.Asp10Asn (rs33980500) proved to be most significantly associated and to encode a mutant protein with an almost completely disrupted binding property to TRAF6, supporting its impact as a main disease-causing variant and modulator of IL-17 signaling. Methods To identify further variants, exons 2-4 encoding both known TNF-receptor-associated factor (TRAF) binding domains were sequenced in 871 PsA patients. Seven missense variants and one three-base-pair insertion were identified in 0.06% to 1.02% of alleles. Five of these variants were also present in 931 control individuals at comparable frequency. Constructs containing full-length wild-type or mutant TRAF3IP2 were generated and used to analyze functionally all variants for TRAF6-binding in a mammalian two-hybrid assay. Results None of the newly found alleles, though, encoded proteins with different binding properties to TRAF6, or to the cytoplasmic tail of the IL-17-receptor α-chain, suggesting that they do not contribute to susceptibility. Conclusions Thus, the TRAF3IP2-variant p.Asp10Asn is the only susceptibility allele with functional impact on TRAF6 binding, at least in the German population. PMID:22513239
Development of a genotyping microarray for Usher syndrome.

PubMed

Cremers, Frans P M; Kimberling, William J; Külm, Maigi; de Brouwer, Arjan P; van Wijk, Erwin; te Brinke, Heleen; Cremers, Cor W R J; Hoefsloot, Lies H; Banfi, Sandro; Simonelli, Francesca; Fleischhauer, Johannes C; Berger, Wolfgang; Kelley, Phil M; Haralambous, Elene; Bitner-Glindzicz, Maria; Webster, Andrew R; Saihan, Zubin; De Baere, Elfride; Leroy, Bart P; Silvestri, Giuliana; McKay, Gareth J; Koenekoop, Robert K; Millan, Jose M; Rosenberg, Thomas; Joensuu, Tarja; Sankila, Eeva-Marja; Weil, Dominique; Weston, Mike D; Wissinger, Bernd; Kremer, Hannie

2007-02-01

Usher syndrome, a combination of retinitis pigmentosa (RP) and sensorineural hearing loss with or without vestibular dysfunction, displays a high degree of clinical and genetic heterogeneity. Three clinical subtypes can be distinguished, based on the age of onset and severity of the hearing impairment, and the presence or absence of vestibular abnormalities. Thus far, eight genes have been implicated in the syndrome, together comprising 347 protein-coding exons. To improve DNA diagnostics for patients with Usher syndrome, we developed a genotyping microarray based on the arrayed primer extension (APEX) method. Allele-specific oligonucleotides corresponding to all 298 Usher syndrome-associated sequence variants known to date, 76 of which are novel, were arrayed. Approximately half of these variants were validated using original patient DNAs, which yielded an accuracy of >98%. The efficiency of the Usher genotyping microarray was tested using DNAs from 370 unrelated European and American patients with Usher syndrome. Sequence variants were identified in 64/140 (46%) patients with Usher syndrome type I, 45/189 (24%) patients with Usher syndrome type II, 6/21 (29%) patients with Usher syndrome type III and 6/20 (30%) patients with atypical Usher syndrome. The chip also identified two novel sequence variants, c.400C>T (p.R134X) in PCDH15 and c.1606T>C (p.C536S) in USH2A. The Usher genotyping microarray is a versatile and affordable screening tool for Usher syndrome. Its efficiency will improve with the addition of novel sequence variants with minimal extra costs, making it a very useful first-pass screening tool.
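The per-subtype detection rates quoted in this abstract follow directly from the reported counts. The short sketch below recomputes them and checks that the four groups add up to the 370 patients tested. The overall rate it prints is derived here from those counts and is not a figure stated in the abstract.

```python
# Counts as reported in the abstract: (patients with a variant, patients tested)
detection = {
    "USH type I": (64, 140),
    "USH type II": (45, 189),
    "USH type III": (6, 21),
    "atypical USH": (6, 20),
}

def detection_rates(counts):
    """Return the percentage of patients in each group with an identified variant."""
    return {group: 100.0 * hits / total for group, (hits, total) in counts.items()}

rates = detection_rates(detection)
total_patients = sum(total for _, total in detection.values())
total_hits = sum(hits for hits, _ in detection.values())
overall = 100.0 * total_hits / total_patients  # derived, not reported in the abstract

for group, rate in rates.items():
    print(f"{group}: {rate:.0f}%")
print(f"all groups: {total_hits}/{total_patients} = {overall:.1f}%")
```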
Development of a genotyping microarray for Usher syndrome

PubMed Central

Cremers, Frans P M; Kimberling, William J; Külm, Maigi; de Brouwer, Arjan P; van Wijk, Erwin; te Brinke, Heleen; Cremers, Cor W R J; Hoefsloot, Lies H; Banfi, Sandro; Simonelli, Francesca; Fleischhauer, Johannes C; Berger, Wolfgang; Kelley, Phil M; Haralambous, Elene; Bitner‐Glindzicz, Maria; Webster, Andrew R; Saihan, Zubin; De Baere, Elfride; Leroy, Bart P; Silvestri, Giuliana; McKay, Gareth J; Koenekoop, Robert K; Millan, Jose M; Rosenberg, Thomas; Joensuu, Tarja; Sankila, Eeva‐Marja; Weil, Dominique; Weston, Mike D; Wissinger, Bernd; Kremer, Hannie

2007-01-01

Background Usher syndrome, a combination of retinitis pigmentosa (RP) and sensorineural hearing loss with or without vestibular dysfunction, displays a high degree of clinical and genetic heterogeneity. Three clinical subtypes can be distinguished, based on the age of onset and severity of the hearing impairment, and the presence or absence of vestibular abnormalities. Thus far, eight genes have been implicated in the syndrome, together comprising 347 protein‐coding exons. Methods: To improve DNA diagnostics for patients with Usher syndrome, we developed a genotyping microarray based on the arrayed primer extension (APEX) method. Allele‐specific oligonucleotides corresponding to all 298 Usher syndrome‐associated sequence variants known to date, 76 of which are novel, were arrayed. Results Approximately half of these variants were validated using original patient DNAs, which yielded an accuracy of >98%. The efficiency of the Usher genotyping microarray was tested using DNAs from 370 unrelated European and American patients with Usher syndrome. Sequence variants were identified in 64/140 (46%) patients with Usher syndrome type I, 45/189 (24%) patients with Usher syndrome type II, 6/21 (29%) patients with Usher syndrome type III and 6/20 (30%) patients with atypical Usher syndrome. The chip also identified two novel sequence variants, c.400C>T (p.R134X) in PCDH15 and c.1606T>C (p.C536S) in USH2A. Conclusion The Usher genotyping microarray is a versatile and affordable screening tool for Usher syndrome. Its efficiency will improve with the addition of novel sequence variants with minimal extra costs, making it a very useful first‐pass screening tool. PMID:16963483
Sequence variants of the DFNB31 gene among Usher syndrome patients of diverse origin

PubMed Central

Aller, Elena; Jaijo, Teresa; van Wijk, Erwin; Ebermann, Inga; Kersten, Ferry; García-García, Gema; Voesenek, Krysta; Aparisi, María José; Hoefsloot, Lies; Cremers, Cor; Díaz-Llopis, Manuel; Pennings, Ronald; Bolz, Hanno J.; Kremer, Hannie; Millán, José M.

2010-01-01

Purpose It has been demonstrated that mutations in deafness, autosomal recessive 31 (DFNB31), the gene encoding whirlin, are responsible for nonsyndromic hearing loss (NSHL; DFNB31) and Usher syndrome type II (USH2D). We screened DFNB31 in a large cohort of patients with different clinical subtypes of Usher syndrome (USH) to determine the prevalence of DFNB31 mutations among USH patients. Methods DFNB31 was screened in 149 USH2, 29 USH1, six atypical USH, and 11 unclassified USH patients from diverse ethnic backgrounds. Mutation detection was performed by direct sequencing of all coding exons. Results We identified 38 different variants among 195 patients. Most variants were clearly polymorphic, but at least two out of the 15 nonsynonymous variants (p.R350W and p.R882S) are predicted to impair whirlin structure and function, suggesting eventual pathogenicity. No putatively pathogenic mutation was found in the second allele of patients with these mutations. Conclusions DFNB31 is not a major cause of USH. PMID:20352026

Functional annotation of HOT regions in the human genome: implications for human disease and cancer

PubMed Central

Li, Hao; Chen, Hebing; Liu, Feng; Ren, Chao; Wang, Shengqi; Bo, Xiaochen; Shu, Wenjie

2015-01-01

Advances in genome-wide association studies (GWAS) and large-scale sequencing studies have resulted in an impressive and growing list of disease- and trait-associated genetic variants. Most studies have emphasised the discovery of genetic variation in coding sequences; however, the noncoding regulatory effects responsible for human disease and cancer biology have been substantially understudied. To better characterise the cis-regulatory effects of noncoding variation, we performed a comprehensive analysis of the genetic variants in HOT (high-occupancy target) regions, which are considered to be one of the most intriguing findings of recent large-scale sequencing studies. We observed that GWAS variants that map to HOT regions undergo a substantial net decrease and illustrate development-specific localisation during haematopoiesis. Additionally, genetic risk variants are disproportionally enriched in HOT regions compared with LOT (low-occupancy target) regions in both disease-relevant and cancer cells. Importantly, this enrichment is biased toward disease- or cancer-specific cell types. Furthermore, we observed that cancer cells generally acquire cancer-specific HOT regions at oncogenes through diverse mechanisms of cancer pathogenesis. Collectively, our findings demonstrate the key roles of HOT regions in human disease and cancer and represent a critical step toward further understanding disease biology, diagnosis, and therapy. PMID:26113264
Functional annotation of HOT regions in the human genome: implications for human disease and cancer.

PubMed

Li, Hao; Chen, Hebing; Liu, Feng; Ren, Chao; Wang, Shengqi; Bo, Xiaochen; Shu, Wenjie

2015-06-26

Advances in genome-wide association studies (GWAS) and large-scale sequencing studies have resulted in an impressive and growing list of disease- and trait-associated genetic variants. Most studies have emphasised the discovery of genetic variation in coding sequences; however, the noncoding regulatory effects responsible for human disease and cancer biology have been substantially understudied. To better characterise the cis-regulatory effects of noncoding variation, we performed a comprehensive analysis of the genetic variants in HOT (high-occupancy target) regions, which are considered to be one of the most intriguing findings of recent large-scale sequencing studies. We observed that GWAS variants that map to HOT regions undergo a substantial net decrease and illustrate development-specific localisation during haematopoiesis. Additionally, genetic risk variants are disproportionally enriched in HOT regions compared with LOT (low-occupancy target) regions in both disease-relevant and cancer cells. Importantly, this enrichment is biased toward disease- or cancer-specific cell types. Furthermore, we observed that cancer cells generally acquire cancer-specific HOT regions at oncogenes through diverse mechanisms of cancer pathogenesis. Collectively, our findings demonstrate the key roles of HOT regions in human disease and cancer and represent a critical step toward further understanding disease biology, diagnosis, and therapy.
ToTem: a tool for variant calling pipeline optimization.

PubMed

Tom, Nikola; Tom, Ondrej; Malcikova, Jitka; Pavlova, Sarka; Kubesova, Blanka; Rausch, Tobias; Kolarik, Miroslav; Benes, Vladimir; Bystry, Vojtech; Pospisilova, Sarka

2018-06-26

High-throughput bioinformatics analyses of next generation sequencing (NGS) data often require challenging pipeline optimization. The key problem is choosing appropriate tools and selecting the best parameters for optimal precision and recall. Here we introduce ToTem, a tool for automated pipeline optimization. ToTem is a stand-alone web application with a comprehensive graphical user interface (GUI). ToTem is written in Java and PHP with an underlying connection to a MySQL database. Its primary role is to automatically generate, execute and benchmark different variant calling pipeline settings. Our tool allows an analysis to be started from any level of the process, with the possibility of plugging in almost any tool or code. To prevent over-fitting of pipeline parameters, ToTem ensures their reproducibility by using cross-validation techniques that penalize the final precision, recall and F-measure. The results are presented as interactive graphs and tables, allowing an optimal pipeline to be selected based on the user's priorities. Using ToTem, we were able to optimize somatic variant calling from ultra-deep targeted gene sequencing (TGS) data and germline variant detection in whole genome sequencing (WGS) data. ToTem is a tool for automated pipeline optimization which is freely available as a web application at https://totem.software.
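The precision, recall and F-measure used to benchmark pipeline settings can be sketched for a single call set against a truth set. This is a generic illustration, not ToTem's code: the function name, the (chromosome, position, ref, alt) tuple representation, and the toy variants are assumptions made for the example, and the cross-validation penalty described in the abstract is not modelled.

```python
def benchmark(called, truth):
    """Return (precision, recall, F-measure) for a set of variant calls.

    Variants are compared as exact (chrom, pos, ref, alt) tuples, which is a
    simplifying assumption; real benchmarks normalize representations first.
    """
    true_pos = len(called & truth)
    false_pos = len(called - truth)
    false_neg = len(truth - called)
    precision = true_pos / (true_pos + false_pos) if called else 0.0
    recall = true_pos / (true_pos + false_neg) if truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

# Hypothetical truth set and pipeline output.
truth = {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"),
         ("chr2", 40, "G", "A"), ("chr3", 900, "T", "C")}
called = {("chr1", 100, "A", "G"), ("chr2", 40, "G", "A"),
          ("chr5", 77, "C", "G")}

precision, recall, f_measure = benchmark(called, truth)
print(f"precision={precision:.3f} recall={recall:.3f} F={f_measure:.3f}")
```

Running the same function over several parameter settings and comparing the resulting triples is the simplest form of the comparison the abstract describes.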
A rare variant in COL11A1 is strongly associated with adult height in Chinese Han population.

PubMed

Shen, Changbing; Zheng, Xiaodong; Gao, Jing; Zhu, Caihong; Ko, Randy; Tang, Xianfa; Yang, Chao; Dou, Jinfa; Lin, Yan; Cheng, Yuyan; Liu, Lu; Xu, Shuangjun; Chen, Gang; Zuo, Xianbo; Yin, Xianyong; Sun, Liangdan; Cui, Yong; Yang, Sen; Zhang, Xuejun; Zhou, Fusheng

2016-09-20

Human height is a highly heritable trait in which multiple genes are involved. Recent genome-wide association studies (GWASs) have identified COL11A1 as an important susceptibility gene for human height. To determine whether the variants of COL11A1 are associated with adult and child height, we analyzed splicing and coding single-nucleotide variants across COL11A1 through exome-targeted sequencing and two validation stages with a total of 20,426 Chinese Han samples. A total of 105 variants were identified by exome-targeted sequencing, of which 30 SNPs were located in the coding region. The strongest association signal was Chr1_103380393, with a P value of 4.8 × 10⁻⁷. Chr1_103380393 also showed nominal significance in the validation stage (P = 1.21 × 10⁻⁶). Combined analysis of 16,738 samples strengthened the original association of chr1_103380393 with adult height (Pcombined = 3.1 × 10⁻⁸), with an increased height of 0.292 sd (standard deviation) per G allele (95% CI: 0.19-0.40). There was no evidence (P = 0.843) that chr1_103380393 altered child height in 3688 child samples. Only the 12-15 year age group showed slight significance, with a P value of 0.0258. This study is the first to show that genetic variants of COL11A1 contribute to adult height in the Chinese Han population but not to child height, which expands our knowledge of the genetic factors underlying height variation and the biological regulation of human height. Copyright © 2016 Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, and Genetics Society of China. All rights reserved.
Re-Evaluation of the PBAN Receptor Molecule: Characterization of PBANR Variants Expressed in the Pheromone Glands of Moths

PubMed Central

Lee, Jae Min; Hull, J. Joe; Kawai, Takeshi; Goto, Chie; Kurihara, Masaaki; Tanokura, Masaru; Nagata, Koji; Nagasawa, Hiromichi; Matsumoto, Shogo

2011-01-01

Sex pheromone production in most moths is initiated following pheromone biosynthesis activating neuropeptide receptor (PBANR) activation. PBANR was initially cloned from pheromone glands (PGs) of Helicoverpa zea and Bombyx mori. The B. mori PBANR is characterized by a relatively long C-terminus that is essential for ligand-induced internalization, whereas the H. zea PBANR has a shorter C-terminus that lacks features present in the B. mori PBANR critical for internalization. Multiple PBANRs have been reported to be concurrently expressed in the larval CNS of Heliothis virescens. In the current study, we sought to examine the prevalence of multiple PBANRs in the PGs of three moths and to ascertain their potential functional relevance. Multiple PBANR variants (As, A, B, and C) were cloned from the PGs of all species examined with PBANR-C the most highly expressed. Alternative splicing of the C-terminal coding sequence of the PBAN gene gives rise to the variants, which are distinguishable only by the length and composition of their respective C-terminal tails. Transient expression of fluorescent PBANR chimeras in insect cells revealed that PBANR-B and PBANR-C localized exclusively to the cell surface while PBANR-As and PBANR-A exhibited varying degrees of cytosolic localization. Similarly, only the PBANR-B and PBANR-C variants underwent ligand-induced internalization. Taken together, our results suggest that PBANR-C is the principal receptor molecule involved in PBAN signaling regardless of moth species. The high GC content of the C-terminal coding sequence in the B and C variants, which makes amplification using conventional polymerases difficult, likely accounts for previous “preferential” amplification of PBANR-A like receptors from other species. PMID:22654850
Screening and association testing of common coding variation in steroid hormone receptor co-activator and co-repressor genes in relation to breast cancer risk: the Multiethnic Cohort.

PubMed

Haiman, Christopher A; Garcia, Rachel R; Hsu, Chris; Xia, Lucy; Ha, Helen; Sheng, Xin; Le Marchand, Loic; Kolonel, Laurence N; Henderson, Brian E; Stallcup, Michael R; Greene, Geoffrey L; Press, Michael F

2009-01-30

Only a limited number of studies have performed comprehensive investigations of coding variation in relation to breast cancer risk. Given the established role of estrogens in breast cancer, we hypothesized that coding variation in steroid receptor coactivator and corepressor genes may alter inter-individual response to estrogen and serve as markers of breast cancer risk. We sequenced the coding exons of 17 genes (EP300, CCND1, NME1, NCOA1, NCOA2, NCOA3, SMARCA4, SMARCA2, CARM1, FOXA1, MPG, NCOR1, NCOR2, CALCOCO1, PRMT1, PPARBP and CREBBP) suggested to influence transcriptional activation by steroid hormone receptors in a multiethnic panel of women with advanced breast cancer (n = 95): African Americans, Latinos, Japanese, Native Hawaiians and European Americans. Association testing of validated coding variants was conducted in a breast cancer case-control study (1,612 invasive cases and 1,961 controls) nested in the Multiethnic Cohort. We used logistic regression to estimate odds ratios for allelic effects in ethnic-pooled analyses as well as in subgroups defined by disease stage and steroid hormone receptor status. We also investigated effect modification by established breast cancer risk factors that are associated with steroid hormone exposure. We identified 45 coding variants with frequencies ≥ 1% in any one ethnic group (43 non-synonymous variants). We observed nominally significant positive associations with two coding variants in ethnic-pooled analyses (NCOR2: His52Arg, OR = 1.79; 95% CI, 1.05-3.05; CALCOCO1: Arg12His, OR = 2.29; 95% CI, 1.00-5.26). A small number of variants were associated with risk in disease subgroup analyses and we observed no strong evidence of effect modification by breast cancer risk factors. Based on the large number of statistical tests conducted in this study, the nominally significant associations that we observed may be due to chance, and will need to be confirmed in other studies. 
Our findings suggest that common coding variation in these candidate genes does not make a substantial contribution to breast cancer risk in the general population. Cataloging and testing of coding variants in coactivator and corepressor genes should continue and may serve as a valuable resource for investigations of other hormone-related phenotypes, such as inter-individual response to hormonal therapies used for cancer treatment and prevention.
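The odds ratios and confidence intervals reported in this abstract can be related to each other with a few lines of arithmetic. The sketch below assumes a Wald-type interval that is symmetric on the log scale (usual for logistic regression, but not stated in the abstract) and recovers an approximate standard error, z statistic and two-sided p-value from the rounded published figures. The function name is hypothetical and the outputs are approximations, not values reported by the authors.

```python
import math

def wald_summary(odds_ratio, lower, upper, z_crit=1.96):
    """Back-calculate log-scale quantities from a reported OR and 95% CI.

    Assumes the interval is exp(log(OR) +/- z_crit * SE); this is an assumption.
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))  # normal approximation
    geometric_mid = math.sqrt(lower * upper)        # should be close to the OR
    return {"se": se, "z": z, "p": p_two_sided, "mid": geometric_mid}

ncor2 = wald_summary(1.79, 1.05, 3.05)     # NCOR2 His52Arg, as reported
calcoco1 = wald_summary(2.29, 1.00, 5.26)  # CALCOCO1 Arg12His, as reported

for name, result in (("NCOR2", ncor2), ("CALCOCO1", calcoco1)):
    print(f"{name}: SE={result['se']:.3f} z={result['z']:.2f} "
          f"p~{result['p']:.3f} midpoint={result['mid']:.2f}")
```

In both cases the geometric midpoint of the interval lands close to the reported odds ratio, which is consistent with the log-scale assumption. Both p-values sit near 0.05, which fits the authors' caution that these associations may be due to chance given the number of tests performed.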
SIN3A mutations are rare in men with azoospermia.

PubMed

Miyamoto, T; Koh, E; Tsujimura, A; Miyagawa, Y; Minase, G; Ueda, Y; Namiki, M; Sengoku, K

2015-11-01

A loss of function of the murine Sin3A gene resulted in male infertility with Sertoli cell-only syndrome (SCOS) phenotype in mice. Here, we investigated the relevance of this gene to human male infertility with azoospermia caused by SCOS. Mutation analysis of SIN3A in the coding region was performed on 80 Japanese patients. However, no variants could be detected. This study suggests a lack of association of SIN3A gene sequence variants with azoospermia caused by SCOS in humans. © 2014 Blackwell Verlag GmbH.
Population genetic implications from sequence variation in four Y chromosome genes.

PubMed

Shen, P; Wang, F; Underhill, P A; Franco, C; Yang, W H; Roxas, A; Sung, R; Lin, A A; Hyman, R W; Vollrath, D; Davis, R W; Cavalli-Sforza, L L; Oefner, P J

2000-06-20

Some insight into human evolution has been gained from the sequencing of four Y chromosome genes. Primary genomic sequencing determined gene SMCY to be composed of 27 exons that comprise 4,620 bp of coding sequence. The unfinished sequencing of the 5' portion of gene UTY1 was completed by primer walking, and a total of 20 exons were found. By using denaturing HPLC, these two genes, as well as DBY and DFFRY, were screened for polymorphic sites in 53-72 representatives of the five continents. A total of 98 variants were found, yielding nucleotide diversity estimates of 2.45 × 10⁻⁵, 5.07 × 10⁻⁵, and 8.54 × 10⁻⁵ for the coding regions of SMCY, DFFRY, and UTY1, respectively, with no variant having been observed in DBY. In agreement with most autosomal genes, diversity estimates for the noncoding regions were about 2- to 3-fold higher and ranged from 9.16 × 10⁻⁵ to 14.2 × 10⁻⁵ for the four genes. Analysis of the frequencies of derived alleles for all four genes showed that they more closely fit the expectation of a Luria-Delbrück distribution than a distribution expected under a constant population size model, providing evidence for exponential population growth. Pairwise nucleotide mismatch distributions date the occurrence of population expansion to approximately 28,000 years ago. This estimate is in accord with the spread of Aurignacian technology and the disappearance of the Neanderthals.
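A nucleotide diversity estimate of the kind quoted in this abstract can be illustrated as the average number of pairwise differences per site. The abstract does not say which estimator the authors used, so the following is a sketch of one common definition only; the function name and the toy sequences are assumptions and bear no relation to the Y chromosome data.

```python
from itertools import combinations

def nucleotide_diversity(sequences):
    """Average pairwise differences per site across aligned, equal-length sequences."""
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(sequences, 2))
    differences = sum(
        sum(a != b for a, b in zip(first, second)) for first, second in pairs
    )
    return differences / (len(pairs) * length)

# Toy alignment: four 10-bp haplotypes, one carrying a single substitution.
haplotypes = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAT"]
pi = nucleotide_diversity(haplotypes)
print(f"pi = {pi:.3f}")
```

Here three of the six pairs differ at one of ten sites, giving 3 / 60 = 0.05. Values on the order of 10⁻⁵, as reported for the Y chromosome genes, correspond to far fewer differences spread over much longer sequences.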
Novel variant in the TP63 gene associated to ankyloblepharon-ectodermal dysplasia-cleft lip/palate (AEC) syndrome.

PubMed

Gonzalez, Francisco; Loidi, Lourdes; Abalo-Lojo, Jose M

2017-01-01

Ankyloblepharon-ectodermal dysplasia-cleft lip/palate (AEC) syndrome is a disorder resulting from anomalous embryonic development of ectodermal tissues. There is evidence that AEC syndrome is caused by mutations in the TP63 gene, which encodes the p63 protein. This is an important regulatory protein involved in epidermal proliferation and differentiation. Genome sequencing was performed on DNA from peripheral blood leukocytes of a newborn with AEC syndrome and her parents. Variants were searched for in all coding exons and intron-exon boundaries of the TP63 gene. A heterozygous missense variant, NM_003722.4:c.1063G>C (p.Asp355His), was found in the newborn patient. No variants were found in either of the parents. We identified a previously unreported variant in the TP63 gene which seems to be involved in the somatic malformations found in AEC syndrome. The absence of this variant in both parents suggests that the variant appeared de novo.
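The relationship between the coding-DNA position and the amino acid number in a variant description such as c.1063G>C (p.Asp355His) is simple integer arithmetic. The sketch below handles only plain single-base substitutions in the coding sequence; the function name and the regular expression are assumptions made for illustration and do not implement the full HGVS grammar.

```python
import re

# Matches simple coding substitutions such as "c.1063G>C" at the end of a string.
CODING_SUBSTITUTION = re.compile(r"c\.(\d+)([ACGT])>([ACGT])$")

def codon_of(hgvs_c):
    """Return (codon number, position within codon, ref base, alt base)."""
    match = CODING_SUBSTITUTION.search(hgvs_c)
    if match is None:
        raise ValueError(f"not a simple coding substitution: {hgvs_c!r}")
    position = int(match.group(1))
    codon_number = (position - 1) // 3 + 1
    position_in_codon = (position - 1) % 3 + 1
    return codon_number, position_in_codon, match.group(2), match.group(3)

codon, offset, ref, alt = codon_of("NM_003722.4:c.1063G>C")
print(codon, offset, ref, alt)

# Standard genetic code: Asp is GAT/GAC and His is CAT/CAC, so a G>C change
# at the first base of an Asp codon yields His.
ASP_CODONS = {"GAT", "GAC"}
HIS_CODONS = {"CAT", "CAC"}
mutated = {alt + asp_codon[1:] for asp_codon in ASP_CODONS}
print(mutated == HIS_CODONS)
```

Position 1063 falls on the first base of codon 355, and substituting C for G there converts either aspartate codon into a histidine codon, which matches the protein-level description in the abstract.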
Deep Sequencing of 71 Candidate Genes to Characterize Variation Associated with Alcohol Dependence.

PubMed

Clark, Shaunna L; McClay, Joseph L; Adkins, Daniel E; Kumar, Gaurav; Aberg, Karolina A; Nerella, Srilaxmi; Xie, Linying; Collins, Ann L; Crowley, James J; Quackenbush, Corey R; Hilliard, Christopher E; Shabalin, Andrey A; Vrieze, Scott I; Peterson, Roseann E; Copeland, William E; Silberg, Judy L; McGue, Matt; Maes, Hermine; Iacono, William G; Sullivan, Patrick F; Costello, Elizabeth J; van den Oord, Edwin J

2017-04-01

Previous genomewide association studies (GWASs) have identified a number of putative risk loci for alcohol dependence (AD). However, only a few loci have replicated and these replicated variants only explain a small proportion of AD risk. Using an innovative approach, the goal of this study was to generate hypotheses about potentially causal variants for AD that can be explored further through functional studies. We employed targeted capture of 71 candidate loci and flanking regions followed by next-generation deep sequencing (mean coverage 78X) in 806 European Americans. Regions included in our targeted capture library were genes identified through published GWAS of alcohol, all human alcohol and aldehyde dehydrogenases, reward system genes including dopaminergic and opioid receptors, prioritized candidate genes based on previous associations, and genes involved in the absorption, distribution, metabolism, and excretion of drugs. We performed single-locus tests to determine if any single variant was associated with AD symptom count. Sets of variants that overlapped with biologically meaningful annotations were tested for association in aggregate. No single, common variant was significantly associated with AD in our study. We did, however, find evidence for association with several variant sets. Two variant sets were significant at the q-value <0.10 level: a genic enhancer for ADHFE1 (p = 1.47 × 10(-5); q = 0.019), an alcohol dehydrogenase, and ADORA1 (p = 5.29 × 10(-5); q = 0.035), an adenosine receptor that belongs to a G-protein-coupled receptor gene family. To our knowledge, this is the first sequencing study of AD to examine variants in entire genes, including flanking and regulatory regions. We found that in addition to protein coding variant sets, regulatory variant sets may play a role in AD. From these findings, we have generated initial functional hypotheses about how these sets may influence AD. Copyright © 2017 by the Research Society on Alcoholism.
Germ-line and somatic EPHA2 coding variants in lens aging and cataract.

PubMed

Bennett, Thomas M; M'Hamdi, Oussama; Hejtmancik, J Fielding; Shiels, Alan

2017-01-01

Rare germ-line mutations in the coding regions of the human EPHA2 gene (EPHA2) have been associated with inherited forms of pediatric cataract, whereas, frequent, non-coding, single nucleotide variants (SNVs) have been associated with age-related cataract. Here we sought to determine if germ-line EPHA2 coding SNVs were associated with age-related cataract in a case-control DNA panel (> 50 years) and if somatic EPHA2 coding SNVs were associated with lens aging and/or cataract in a post-mortem lens DNA panel (> 48 years). Micro-fluidic PCR amplification followed by targeted amplicon (exon) next-generation (deep) sequencing of EPHA2 (17-exons) afforded high read-depth coverage (1000x) for > 82% of reads in the cataract case-control panel (161 cases, 64 controls) and > 70% of reads in the post-mortem lens panel (35 clear lens pairs, 22 cataract lens pairs). Novel and reference (known) missense SNVs in EPHA2 that were predicted in silico to be functionally damaging were found in both cases and controls from the age-related cataract panel at variant allele frequencies (VAFs) consistent with germ-line transmission (VAF > 20%). Similarly, both novel and reference missense SNVs in EPHA2 were found in the post-mortem lens panel at VAFs consistent with a somatic origin (VAF > 3%). The majority of SNVs found in the cataract case-control panel and post-mortem lens panel were transitions and many occurred at di-pyrimidine sites that are susceptible to ultraviolet (UV) radiation induced mutation. These data suggest that novel germ-line (blood) and somatic (lens) coding SNVs in EPHA2 that are predicted to be functionally deleterious occur in adults over 50 years of age. However, both types of EPHA2 coding variants were present at comparable levels in individuals with or without age-related cataract making simple genotype-phenotype correlations inconclusive.
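The variant allele frequency (VAF) cut-offs quoted in this abstract (VAF > 20% in the blood-derived case-control panel, VAF > 3% in the post-mortem lens panel) can be expressed as a small Python sketch. The read counts, panel labels and function names are hypothetical; the abstract does not describe how the authors' pipeline computed or filtered VAFs.

```python
# Thresholds taken from the abstract; the panel labels are illustrative names.
VAF_THRESHOLDS = {"case_control": 0.20, "lens": 0.03}


def variant_allele_frequency(alt_reads, total_reads):
    """Fraction of reads at a site that carry the variant allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= alt_reads <= total_reads:
        raise ValueError("alt_reads must lie between 0 and total_reads")
    return alt_reads / total_reads


def exceeds_panel_threshold(vaf, panel):
    """True if the VAF is above the cut-off quoted for the given panel."""
    return vaf > VAF_THRESHOLDS[panel]


# Hypothetical sites at roughly 1000x read depth.
blood_vaf = variant_allele_frequency(480, 1000)  # near 50%, as for a heterozygous call
lens_vaf = variant_allele_frequency(35, 1000)    # low-level variant
```

With these made-up counts, the first site clears the 20% cut-off used for the case-control panel, while the second clears only the 3% cut-off used for the lens panel.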
Germ-line and somatic EPHA2 coding variants in lens aging and cataract

PubMed Central

Bennett, Thomas M.; M’Hamdi, Oussama; Hejtmancik, J. Fielding

2017-01-01

Rare germ-line mutations in the coding regions of the human EPHA2 gene (EPHA2) have been associated with inherited forms of pediatric cataract, whereas, frequent, non-coding, single nucleotide variants (SNVs) have been associated with age-related cataract. Here we sought to determine if germ-line EPHA2 coding SNVs were associated with age-related cataract in a case-control DNA panel (> 50 years) and if somatic EPHA2 coding SNVs were associated with lens aging and/or cataract in a post-mortem lens DNA panel (> 48 years). Micro-fluidic PCR amplification followed by targeted amplicon (exon) next-generation (deep) sequencing of EPHA2 (17-exons) afforded high read-depth coverage (1000x) for > 82% of reads in the cataract case-control panel (161 cases, 64 controls) and > 70% of reads in the post-mortem lens panel (35 clear lens pairs, 22 cataract lens pairs). Novel and reference (known) missense SNVs in EPHA2 that were predicted in silico to be functionally damaging were found in both cases and controls from the age-related cataract panel at variant allele frequencies (VAFs) consistent with germ-line transmission (VAF > 20%). Similarly, both novel and reference missense SNVs in EPHA2 were found in the post-mortem lens panel at VAFs consistent with a somatic origin (VAF > 3%). The majority of SNVs found in the cataract case-control panel and post-mortem lens panel were transitions and many occurred at di-pyrimidine sites that are susceptible to ultraviolet (UV) radiation induced mutation. These data suggest that novel germ-line (blood) and somatic (lens) coding SNVs in EPHA2 that are predicted to be functionally deleterious occur in adults over 50 years of age. However, both types of EPHA2 coding variants were present at comparable levels in individuals with or without age-related cataract making simple genotype-phenotype correlations inconclusive. PMID:29267365
Characterization of an Equine α-S2-Casein Variant Due to a 1.3 kb Deletion Spanning Two Coding Exons

PubMed Central

Brinkmann, Julia; Koudelka, Tomas; Keppler, Julia K.; Tholey, Andreas; Schwarz, Karin; Thaller, Georg; Tetens, Jens

2015-01-01

The production and consumption of mare’s milk in Europe has gained importance, mainly based on positive health effects and a lower allergenic potential as compared to cows’ milk. The allergenicity of milk is to a certain extent affected by different genetic variants. In classical dairy species, much research has been conducted into the genetic variability of milk proteins, but the knowledge in horses is scarce. Here, we characterize two major forms of equine αS2-casein arising from a genomic 1.3 kb in-frame deletion involving two coding exons, one of which represents an equid-specific duplication. Findings at the DNA level have been verified by cDNA sequencing from horse milk of mares with different genotypes. At the protein level, we were able to show by SDS-PAGE and in-gel digestion with subsequent LC-MS analysis that both proteins are actually expressed. The comparison with published sequences of other equids revealed that the deletion has probably occurred before the ancestor of present-day asses and zebras diverged from the horse lineage. PMID:26444874
A novel variant of aquaporin 3 is expressed in killifish (Fundulus heteroclitus) intestine

PubMed Central

Jung, Dawoon; Adamo, Meredith A.; Lehman, Rebecca M.; Barnaby, Roxanna; Jackson, Craig E.; Jackson, Brian P.; Shaw, Joseph R.; Stanton, Bruce A.

2015-01-01

Killifish (Fundulus heteroclitus) are euryhaline teleosts that are widely used in environmental and toxicological studies, and they are tolerant to arsenic, in part due to very low assimilation of arsenic from the environment. The mechanism of arsenic uptake by the intestine, a major route of arsenic uptake in humans, is unknown. Thus, the goal of this study was to determine if aquaglyceroporins (AQP), which transport water and other small molecules including arsenite across cell membranes, are expressed in the killifish intestine, and whether AQP expression is affected by osmotic stress. Through RT-PCR and sequence analysis of PCR amplicons, we demonstrated that the intestine expresses kfAQP3a and kfAQP3b, two previously identified variants, and also identified a novel variant of killifish AQP3 (kfAQP3c) in the intestine. The variants likely represent alternate splice forms. A BLAST search of the F. heteroclitus reference genome revealed that the AQP3 gene resides on a single locus, while an alignment of the AQP3 sequence among 384 individuals from eight populations ranging from Rhode Island to North Carolina revealed that its coding sequence was remarkably conserved with no fixed polymorphism residing in the region that distinguishes these variants. We further demonstrate that the novel variant transports arsenite into HEK293T cells. Whereas kfAQP3a, which does not transport arsenite, was expressed in both freshwater (FW) and saltwater (SW) acclimated fish, kfAQP3b, an arsenic transporter, was expressed only in FW acclimated fish, and kfAQP3c was expressed only in SW acclimated fish. Thus, we have identified a novel, putative splice variant of kfAQP3, kfAQP3c, which transports arsenic and is expressed only in SW acclimated fish. PMID:25766383
A double mutation in exon 6 of the β-hexosaminidase α subunit in a patient with the B1 variant of Tay-Sachs disease

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ainsworth, P.J.; Coulter-Mackie, M.B.

1992-10-01

The B1 variant form of Tay-Sachs disease is enzymologically unique in that the causative mutation(s) appear to affect the active site in the α subunit of β-hexosaminidase A without altering its ability to associate with the β subunit. Most previously reported B1 variant mutations were found in exon 5 within codon 178. The coding sequence of the α subunit gene of a patient with the B1 variant form was examined with a combination of reverse transcription of mRNA to cDNA, PCR, and dideoxy sequencing. A double mutation in exon 6 has been identified: a G574→C transversion causing a Val192→Leu change and a G598→A transition resulting in a Val200→Met alteration. The amplified cDNAs were otherwise normal throughout their sequence. The 574 and 598 alterations have been confirmed by amplification directly from genomic DNA from the patient and her mother. Transient-expression studies of the two exon 6 mutations (singly or together) in COS-1 cells show that the G574→C change is sufficient to cause the loss of enzyme activity. The biochemical phenotype of the 574 alteration in transfection studies is consistent with that expected for a B1 variant mutation. As such, this mutation differs from previously reported B1 variant mutations, all of which occur in exon 5. 31 refs., 2 figs., 2 tabs.
WHATIF: an open-source desktop application for extraction and management of the incidental findings from next-generation sequencing variant data

PubMed Central

Ye, Zhan; Kadolph, Christopher; Strenn, Robert; Wall, Daniel; McPherson, Elizabeth; Lin, Simon

2015-01-01

Background: Identification and evaluation of incidental findings in patients following whole exome (WES) or whole genome sequencing (WGS) is challenging for both practicing physicians and researchers. The American College of Medical Genetics and Genomics (ACMG) recently recommended a list of reportable incidental genetic findings. However, no informatics tools are currently available to support evaluation of incidental findings in next-generation sequencing data. Methods: The Wisconsin Hierarchical Analysis Tool for Incidental Findings (WHATIF) was developed as a stand-alone Windows-based desktop executable to support the interactive analysis of incidental findings in the context of the ACMG recommendations. WHATIF integrates the European Bioinformatics Institute Variant Effect Predictor (VEP) tool for biological interpretation and the National Center for Biotechnology Information ClinVar tool for clinical interpretation. Results: An open-source desktop program was created to annotate incidental findings and present the results with a user-friendly interface. Further, a meaningful index (WHATIF Index) was devised for each gene to facilitate ranking of the relative importance of the variants and estimate the potential workload associated with further evaluation of the variants. Our WHATIF application is available at: http://tinyurl.com/WHATIF-SOFTWARE Conclusions: The WHATIF application offers a user-friendly interface and allows users to investigate the extracted variant information efficiently and intuitively while always accessing up-to-date information on variants via application programming interface (API) connections. WHATIF’s highly flexible design and straightforward implementation aid users in customizing the source code to meet their own special needs. PMID:25890833
High-throughput sequencing of mGluR signaling pathway genes reveals enrichment of rare variants in autism.

PubMed

Kelleher, Raymond J; Geigenmüller, Ute; Hovhannisyan, Hayk; Trautman, Edwin; Pinard, Robert; Rathmell, Barbara; Carpenter, Randall; Margulies, David

2012-01-01

Identification of common molecular pathways affected by genetic variation in autism is important for understanding disease pathogenesis and devising effective therapies. Here, we test the hypothesis that rare genetic variation in the metabotropic glutamate-receptor (mGluR) signaling pathway contributes to autism susceptibility. Single-nucleotide variants in genes encoding components of the mGluR signaling pathway were identified by high-throughput multiplex sequencing of pooled samples from 290 non-syndromic autism cases and 300 ethnically matched controls on two independent next-generation platforms. This analysis revealed significant enrichment of rare functional variants in the mGluR pathway in autism cases. Higher burdens of rare, potentially deleterious variants were identified in autism cases for three pathway genes previously implicated in syndromic autism spectrum disorder, TSC1, TSC2, and SHANK3, suggesting that genetic variation in these genes also contributes to risk for non-syndromic autism. In addition, our analysis identified HOMER1, which encodes a postsynaptic density-localized scaffolding protein that interacts with Shank3 to regulate mGluR activity, as a novel autism-risk gene. Rare, potentially deleterious HOMER1 variants identified uniquely in the autism population affected functionally important protein regions or regulatory sequences and co-segregated closely with autism among children of affected families. We also identified rare ASD-associated coding variants predicted to have damaging effects on components of the Ras/MAPK cascade. Collectively, these findings suggest that altered signaling downstream of mGluRs contributes to the pathogenesis of non-syndromic autism.
High-Throughput Sequencing of mGluR Signaling Pathway Genes Reveals Enrichment of Rare Variants in Autism

PubMed Central

Hovhannisyan, Hayk; Trautman, Edwin; Pinard, Robert; Rathmell, Barbara; Carpenter, Randall; Margulies, David

2012-01-01

Identification of common molecular pathways affected by genetic variation in autism is important for understanding disease pathogenesis and devising effective therapies. Here, we test the hypothesis that rare genetic variation in the metabotropic glutamate-receptor (mGluR) signaling pathway contributes to autism susceptibility. Single-nucleotide variants in genes encoding components of the mGluR signaling pathway were identified by high-throughput multiplex sequencing of pooled samples from 290 non-syndromic autism cases and 300 ethnically matched controls on two independent next-generation platforms. This analysis revealed significant enrichment of rare functional variants in the mGluR pathway in autism cases. Higher burdens of rare, potentially deleterious variants were identified in autism cases for three pathway genes previously implicated in syndromic autism spectrum disorder, TSC1, TSC2, and SHANK3, suggesting that genetic variation in these genes also contributes to risk for non-syndromic autism. In addition, our analysis identified HOMER1, which encodes a postsynaptic density-localized scaffolding protein that interacts with Shank3 to regulate mGluR activity, as a novel autism-risk gene. Rare, potentially deleterious HOMER1 variants identified uniquely in the autism population affected functionally important protein regions or regulatory sequences and co-segregated closely with autism among children of affected families. We also identified rare ASD-associated coding variants predicted to have damaging effects on components of the Ras/MAPK cascade. Collectively, these findings suggest that altered signaling downstream of mGluRs contributes to the pathogenesis of non-syndromic autism. PMID:22558107
A powerful tool for genome analysis in maize: development and evaluation of the high density 600 k SNP genotyping array.

PubMed

Unterseer, Sandra; Bauer, Eva; Haberer, Georg; Seidel, Michael; Knaak, Carsten; Ouzunova, Milena; Meitinger, Thomas; Strom, Tim M; Fries, Ruedi; Pausch, Hubert; Bertani, Christofer; Davassi, Alessandro; Mayer, Klaus Fx; Schön, Chris-Carolin

2014-09-29

High density genotyping data are indispensable for genomic analyses of complex traits in animal and crop species. Maize is one of the most important crop plants worldwide; however, a high density SNP genotyping array for analysis of its large and highly dynamic genome has not been available so far. We developed a high density maize SNP array composed of 616,201 variants (SNPs and small indels). Initially, 57 M variants were discovered by sequencing 30 representative temperate maize lines and then stringently filtered for sequence quality scores and predicted conversion performance on the array, resulting in the selection of 1.2 M polymorphic variants assayed on two screening arrays. To identify high-confidence variants, 285 DNA samples from a broad genetic diversity panel of worldwide maize lines including the samples used for sequencing, important founder lines for European maize breeding, hybrids, and proprietary samples with European, US, semi-tropical, and tropical origin were used for experimental validation. We selected 616 k variants according to their performance during validation, support of genotype calls through sequencing data, and physical distribution for further analysis and for the design of the commercially available Affymetrix® Axiom® Maize Genotyping Array. This array is composed of 609,442 SNPs and 6,759 indels. Among these are 116,224 variants in coding regions and 45,655 SNPs of the Illumina® MaizeSNP50 BeadChip for study comparison. In a subset of 45,974 variants, apart from the target SNP, additional off-target variants are detected, which show only a minor bias towards intermediate allele frequencies. We performed principal coordinate and admixture analyses to determine the ability of the array to detect and resolve population structure and investigated the extent of LD within a worldwide validation panel.
The high density Affymetrix® Axiom® Maize Genotyping Array is optimized for European and American temperate maize and was developed based on a diverse sample panel by applying stringent quality filter criteria to ensure its suitability for a broad range of applications. With 600 k variants it is the largest currently publicly available genotyping array in crop species.
Cloning and characterization of human immunodeficiency virus type 1 variants diminished in the ability to induce syncytium-independent cytolysis.

PubMed Central

Stevenson, M; Haggerty, S; Lamonica, C; Mann, A M; Meier, C; Wasiak, A

1990-01-01

The phenomenon of interference was exploited to isolate low-abundance noncytopathic human immunodeficiency virus type 1 (HIV-1) variants from a primary HIV-1 isolate from an asymptomatic HIV-1-seropositive hemophiliac. Successive rounds of virus infection of a cytolysis-susceptible CD4+ cell line and isolation of surviving cells resulted in selective amplification of an HIV-1 variant reduced in the ability to induce cytolysis. The presence of a PvuII polymorphism facilitated subsequent amplification and cloning of cytopathic and noncytopathic HIV-1 variants from the primary isolate. Cloned virus stocks from cytopathic and noncytopathic variants exhibited similar replication kinetics, infectivity, and syncytium induction in susceptible host cells. The noncytopathic HIV-1 variant was unable, however, to induce single-cell killing in susceptible host cells. Construction of viral hybrids in which regions of cytopathic and noncytopathic variants were exchanged indicated that determinants for the noncytopathic phenotype map to the envelope glycoprotein. Sequence analysis of the envelope coding regions indicated the absence of two highly conserved N-linked glycosylation sites in the noncytopathic HIV-1 variant, which accompanied differences in processing of precursor gp160 envelope glycoprotein. These results demonstrate that determinants for syncytium-independent single-cell killing are located within the envelope glycoprotein and suggest that single-cell killing is profoundly influenced by alterations in envelope sequence which affect posttranslational processing of HIV-1 envelope glycoprotein within the infected cell. PMID:1695254

Postnatal Expression of V2 Vasopressin Receptor Splice Variants in the Rat Cerebellum

PubMed Central

Vargas, Karina J.; Sarmiento, José M.; Ehrenfeld, Pamela; Añazco, Carolina C.; Villanueva, Carolina I.; Carmona, Pamela L.; Brenet, Marianne; Navarro, Javier; Müller-Esterl, Werner; Figueroa, Carlos D.; González, Carlos B.

2010-01-01

The V2 vasopressin receptor gene contains an alternative splice site in exon 3, which leads to the generation of two splice variants (V2a and V2b) first identified in the kidney. The open reading frame of the alternatively spliced V2b transcript encodes a truncated receptor, showing the same amino acid sequence as the canonical V2a receptor up to the 6th transmembrane segment, but displaying a sequence distinct from the corresponding 7th transmembrane segment and C-terminal domain of the V2a receptor. Here, we demonstrate the postnatal expression of V2a and V2b variants in the rat cerebellum. Most importantly, we showed by in situ hybridization and immunocytochemistry that both V2 splice variants were preferentially expressed in Purkinje cells, from early to late postnatal development. In addition, both variants were transiently expressed in the neuroblastic external granule cells and Bergmann fibers. These results indicate that the cellular distributions of both splice variants are developmentally regulated, and suggest that the transient expression of the V2 receptor is involved in the mechanisms of cerebellar cytodifferentiation by AVP. Finally, transfected CHO-K1 cells expressing similar amounts of both V2 splice variants, as found in the cerebellum, showed a significant reduction in the surface expression of V2a receptors, suggesting that the differential expression of the V2 splice variants regulates vasopressin signaling in the cerebellum. PMID:19281786
Detection of hyper-conserved regions in hepatitis B virus X gene potentially useful for gene therapy.

PubMed

González, Carolina; Tabernero, David; Cortese, Maria Francesca; Gregori, Josep; Casillas, Rosario; Riveiro-Barciela, Mar; Godoy, Cristina; Sopena, Sara; Rando, Ariadna; Yll, Marçal; Lopez-Martinez, Rosa; Quer, Josep; Esteban, Rafael; Buti, Maria; Rodríguez-Frías, Francisco

2018-05-21

To detect hyper-conserved regions in the hepatitis B virus (HBV) X gene (HBX) 5' region that could be candidates for gene therapy. The study included 27 chronic hepatitis B treatment-naive patients in various clinical stages (from chronic infection to cirrhosis and hepatocellular carcinoma, both HBeAg-negative and HBeAg-positive), and infected with HBV genotypes A-F and H. In a serum sample from each patient with viremia > 3.5 log IU/mL, the HBX 5' end region [nucleotide (nt) 1255-1611] was PCR-amplified and submitted to next-generation sequencing (NGS). We assessed genotype variants by phylogenetic analysis, and evaluated conservation of this region by calculating the information content of each nucleotide position in a multiple alignment of all unique sequences (haplotypes) obtained by NGS. Conservation at the HBx protein amino acid (aa) level was also analyzed. NGS yielded 1333069 sequences from the 27 samples, with a median of 4578 sequences/sample (2487-9279, IQR 2817). In 14/27 patients (51.8%), phylogenetic analysis of viral nucleotide haplotypes showed a complex mixture of genotypic variants. Analysis of the information content in the haplotype multiple alignments detected 2 hyper-conserved nucleotide regions, one in the HBX upstream non-coding region (nt 1255-1286) and the other in the 5' end coding region (nt 1519-1603). This last region coded for a conserved amino acid region (aa 63-76) that partially overlaps a Kunitz-like domain. Two hyper-conserved regions detected in the HBX 5' end may be of value for targeted gene therapy, regardless of the patients' clinical stage or HBV genotype.
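Per-position information content in a multiple alignment, the conservation measure this abstract refers to, is commonly computed as the maximum possible entropy (2 bits for four nucleotides) minus the observed Shannon entropy of the column. The Python sketch below assumes that definition, an ungapped alignment, equal weighting of every haplotype, and no small-sample correction; the abstract does not specify which variant of the calculation the authors used, and the toy alignment is hypothetical.

```python
import math
from collections import Counter


def information_content(column):
    """Information content in bits of one alignment column (0 = uniform, 2 = fully conserved)."""
    total = len(column)
    counts = Counter(column)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return 2.0 - entropy


def per_position_information(alignment):
    """Information content for every column of equal-length aligned sequences."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to the same length")
    return [information_content(col) for col in zip(*alignment)]


# Hypothetical toy alignment of four haplotypes, not HBV data.
toy_alignment = ["ACGT", "ACGA", "ACCA", "ACTA"]
ic = per_position_information(toy_alignment)
```

In this toy example the first two columns are invariant and reach the 2-bit maximum, while the third column, with one base at frequency 0.5 and two at 0.25, scores 0.5 bits. A run of consecutive positions at or near 2 bits would correspond to what the abstract calls a hyper-conserved region.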
Large-scale identification and characterization of alternative splicing variants of human gene transcripts using 56 419 completely sequenced and manually annotated full-length cDNAs

PubMed Central

Takeda, Jun-ichi; Suzuki, Yutaka; Nakao, Mitsuteru; Barrero, Roberto A.; Koyanagi, Kanako O.; Jin, Lihua; Motono, Chie; Hata, Hiroko; Isogai, Takao; Nagai, Keiichi; Otsuki, Tetsuji; Kuryshev, Vladimir; Shionyu, Masafumi; Yura, Kei; Go, Mitiko; Thierry-Mieg, Jean; Thierry-Mieg, Danielle; Wiemann, Stefan; Nomura, Nobuo; Sugano, Sumio; Gojobori, Takashi; Imanishi, Tadashi

2006-01-01

We report the first genome-wide identification and characterization of alternative splicing in human gene transcripts based on analysis of the full-length cDNAs. Applying both manual and computational analyses for 56 419 completely sequenced and precisely annotated full-length cDNAs selected for the H-Invitational human transcriptome annotation meetings, we identified 6877 alternative splicing genes with 18 297 different alternative splicing variants. A total of 37 670 exons were involved in these alternative splicing events. The encoded protein sequences were affected in 6005 of the 6877 genes. Notably, alternative splicing affected protein motifs in 3015 genes, subcellular localizations in 2982 genes and transmembrane domains in 1348 genes. We also identified interesting patterns of alternative splicing, in which two distinct genes seemed to be bridged, nested or having overlapping protein coding sequences (CDSs) of different reading frames (multiple CDS). In these cases, completely unrelated proteins are encoded by a single locus. Genome-wide annotations of alternative splicing, relying on full-length cDNAs, should lay firm groundwork for exploring in detail the diversification of protein function, which is mediated by the fast expanding universe of alternative splicing variants. PMID:16914452
Information Topics of Greatest Interest for Return of Genome Sequencing Results among Women Diagnosed with Breast Cancer at a Young Age.

PubMed

Seo, Joann; Ivanovich, Jennifer; Goodman, Melody S; Biesecker, Barbara B; Kaphingst, Kimberly A

2017-06-01

We investigated what information women diagnosed with breast cancer at a young age would want to learn when genome sequencing results are returned. We conducted 60 semi-structured interviews with women diagnosed with breast cancer at age 40 or younger. We examined what specific information participants would want to learn across result types and for each type of result, as well as how much information they would want. Genome sequencing was not offered to participants as part of the study. Two coders independently coded interview transcripts; analysis was conducted using NVivo10. Across result types, participants wanted to learn about health implications, risk and prevalence in quantitative terms, causes of variants, and causes of diseases. Participants wanted to learn actionable information for variants affecting risk of preventable or treatable disease, medication response, and carrier status. The amount of desired information differed for variants affecting risk of unpreventable or untreatable disease, with uncertain significance, and not health-related. Women diagnosed with breast cancer at a young age recognize the value of genome sequencing results in identifying potential causes and effective treatments and expressed interest in using the information to help relatives and to further understand their other health risks. Our findings can inform the development of effective feedback strategies for genome sequencing that meet patients' information needs and preferences.
Assessing the 5S ribosomal RNA heterogeneity in Arabidopsis thaliana using short RNA next generation sequencing data.

PubMed

Szymanski, Maciej; Karlowski, Wojciech M

2016-01-01

In eukaryotes, ribosomal 5S rRNAs are products of multigene families organized within clusters of tandemly repeated units. Accumulation of genomic data obtained from a variety of organisms demonstrated that the potential 5S rRNA coding sequences show a large number of variants, often incompatible with folding into a correct secondary structure. Here, we present results of an analysis of a large set of short RNA sequences generated by the next generation sequencing techniques, to address the problem of heterogeneity of the 5S rRNA transcripts in Arabidopsis and identification of potentially functional rRNA-derived fragments.
Posttranscriptional regulation of the immediate-early gene EGR1 by light in the mouse retina.

PubMed

Simon, Perikles; Schott, Klaus; Williams, Robert W; Schaeffel, Frank

2004-12-01

Synaptic plasticity is modulated by differential regulation of transcription factors such as EGR1 which binds to DNA via a zinc finger binding domain. Inactivation of EGR1 has implicated this gene as a key regulator of memory formation and learning. However, it remains puzzling how synaptic input can lead to an up-regulation of the EGR-1 protein within only a few minutes. Here, we show by immunohistochemical staining that the EGR-1 protein is localized in synapses throughout the mouse retina. We demonstrate for the first time that two variants of Egr-1 mRNA are produced in the retina by alternative polyadenylation, with the longer version having an additional 293 base pairs at the end of the 3'UTR. Remarkably, the use of the alternative polyadenylation site is controlled by light. The additional 3'UTR sequence of the longer variant displays an even higher level of phylogenetic conservation than the coding region of this highly conserved gene. Additionally, it harbours a cytoplasmic polyadenylation element which is known to respond to NMDA receptor activation. The longer version of the Egr-1 mRNA could therefore rapidly respond to excitatory stimuli such as light or glutamate release whereas the short variant, which is predominantly expressed and contains the full coding sequence, lacks the regulatory elements for cytoplasmic polyadenylation in its 3'UTR.
A versatile palindromic amphipathic repeat coding sequence horizontally distributed among diverse bacterial and eucaryotic microbes

PubMed Central

2010-01-01

Background Intragenic tandem repeats occur throughout all domains of life and impart functional and structural variability to diverse translation products. Repeat proteins confer distinctive surface phenotypes to many unicellular organisms, including those with minimal genomes such as the wall-less bacterial monoderms, Mollicutes. One such repeat pattern in this clade is distributed in a manner suggesting its exchange by horizontal gene transfer (HGT). Expanding genome sequence databases reveal the pattern in a widening range of bacteria, and recently among eucaryotic microbes. We examined the genomic flux and consequences of the motif by determining its distribution, predicted structural features and association with membrane-targeted proteins. Results Using a refined hidden Markov model, we document a 25-residue protein sequence motif tandemly arrayed in variable-number repeats in ORFs lacking assigned functions. It appears sporadically in unicellular microbes from disparate bacterial and eucaryotic clades, representing diverse lifestyles and ecological niches that include host parasitic, marine and extreme environments. Tracts of the repeats predict a malleable configuration of recurring domains, with conserved hydrophobic residues forming an amphipathic secondary structure in which hydrophilic residues endow extensive sequence variation. Many ORFs with these domains also have membrane-targeting sequences that predict assorted topologies; others may comprise reservoirs of sequence variants. We demonstrate expressed variants among surface lipoproteins that distinguish closely related animal pathogens belonging to a subgroup of the Mollicutes. DNA sequences encoding the tandem domains display dyad symmetry. Moreover, in some taxa the domains occur in ORFs selectively associated with mobile elements. 
These features, a punctate phylogenetic distribution, and different patterns of dispersal in genomes of related taxa, suggest that the repeat may be disseminated by HGT and intra-genomic shuffling. Conclusions We describe novel features of PARCELs (Palindromic Amphipathic Repeat Coding ELements), a set of widely distributed repeat protein domains and coding sequences that were likely acquired through HGT by diverse unicellular microbes, further mobilized and diversified within genomes, and co-opted for expression in the membrane proteome of some taxa. Disseminated by multiple gene-centric vehicles, ORFs harboring these elements enhance accessory gene pools as part of the "mobilome" connecting genomes of various clades, in taxa sharing common niches. PMID:20626840
The evolving genetic risk for sporadic ALS.

PubMed

Gibson, Summer B; Downie, Jonathan M; Tsetsou, Spyridoula; Feusier, Julie E; Figueroa, Karla P; Bromberg, Mark B; Jorde, Lynn B; Pulst, Stefan M

2017-07-18

To estimate the genetic risk conferred by known amyotrophic lateral sclerosis (ALS)-associated genes to the pathogenesis of sporadic ALS (SALS) using variant allele frequencies combined with predicted variant pathogenicity. Whole exome sequencing and repeat expansion PCR of C9orf72 and ATXN2 were performed on 87 patients of European ancestry with SALS seen at the University of Utah. DNA variants that change the protein coding sequence of 31 ALS-associated genes were annotated to determine which were rare and deleterious as predicted by MetaSVM. The percentage of patients with SALS with a rare and deleterious variant or repeat expansion in an ALS-associated gene was calculated. An odds ratio analysis was performed comparing the burden of ALS-associated genes in patients with SALS vs 324 normal controls. Nineteen rare nonsynonymous variants in an ALS-associated gene, 2 of which were found in 2 different individuals, were identified in 21 patients with SALS. Further, 5 deleterious C9orf72 and 2 ATXN2 repeat expansions were identified. A total of 17.2% of patients with SALS had a rare and deleterious variant or repeat expansion in an ALS-associated gene. The genetic burden of ALS-associated genes in patients with SALS as predicted by MetaSVM was significantly higher than in normal controls. Previous analyses have identified SALS-predisposing variants only in terms of their rarity in normal control populations. By incorporating variant pathogenicity as well as variant frequency, we demonstrated that the genetic risk contributed by these genes for SALS is substantially lower than previous estimates. © 2017 American Academy of Neurology.
Enhancer Variants Synergistically Drive Dysfunction of a Gene Regulatory Network In Hirschsprung Disease

DOE PAGES

Chatterjee, Sumantra; Kapoor, Ashish; Akiyama, Jennifer A.; ...

2016-09-29

Common sequence variants in cis-regulatory elements (CREs) are suspected etiological causes of complex disorders. We previously identified an intronic enhancer variant in the RET gene disrupting SOX10 binding and increasing Hirschsprung disease (HSCR) risk 4-fold. We now show that two other functionally independent CRE variants, one binding Gata2 and the other binding Rarb, also reduce Ret expression and increase risk 2- and 1.7-fold. By studying human and mouse fetal gut tissues and cell lines, we demonstrate that reduced RET expression propagates throughout its gene regulatory network, exerting effects on both its positive and negative feedback components. We also provide evidence that the presence of a combination of CRE variants synergistically reduces RET expression and its effects throughout the GRN. These studies show how the effects of functionally independent non-coding variants in a coordinated gene regulatory network amplify their individually small effects, providing a model for complex disorders.
De Novo Coding Variants Are Strongly Associated with Tourette Disorder

PubMed Central

Willsey, A. Jeremy; Fernandez, Thomas V.; Yu, Dongmei; King, Robert A.; Dietrich, Andrea; Xing, Jinchuan; Sanders, Stephan J.; Mandell, Jeffrey D.; Huang, Alden Y.; Richer, Petra; Smith, Louw; Dong, Shan; Samocha, Kaitlin E.; Neale, Benjamin M.; Coppola, Giovanni; Mathews, Carol A.; Tischfield, Jay A.; Scharf, Jeremiah M.; State, Matthew W.; Heiman, Gary A.

2017-01-01

SUMMARY Whole-exome sequencing (WES) and de novo variant detection have proven a powerful approach to gene discovery in complex neurodevelopmental disorders. We have completed WES of 325 Tourette disorder trios from the Tourette International Collaborative Genetics cohort and a replication sample of 186 trios from the Tourette Syndrome Association International Consortium on Genetics (511 total). We observe strong and consistent evidence for the contribution of de novo likely gene-disrupting (LGD) variants (rate ratio [RR] 2.32, p = 0.002). Additionally, de novo damaging variants (LGD and probably damaging missense) are overrepresented in probands (RR 1.37, p = 0.003). We identify four likely risk genes with multiple de novo damaging variants in unrelated probands: WWC1 (WW and C2 domain containing 1), CELSR3 (Cadherin EGF LAG seven-pass G-type receptor 3), NIPBL (Nipped-B-like), and FN1 (fibronectin 1). Overall, we estimate that de novo damaging variants in approximately 400 genes contribute risk in 12% of clinical cases. PMID:28472652
Strelka: accurate somatic small-variant calling from sequenced tumor-normal sample pairs.

PubMed

Saunders, Christopher T; Wong, Wendy S W; Swamy, Sajani; Becq, Jennifer; Murray, Lisa J; Cheetham, R Keira

2012-07-15

Whole genome and exome sequencing of matched tumor-normal sample pairs is becoming routine in cancer research. The consequent increased demand for somatic variant analysis of paired samples requires methods specialized to model this problem so as to sensitively call variants at any practical level of tumor impurity. We describe Strelka, a method for somatic SNV and small indel detection from sequencing data of matched tumor-normal samples. The method uses a novel Bayesian approach which represents continuous allele frequencies for both tumor and normal samples, while leveraging the expected genotype structure of the normal. This is achieved by representing the normal sample as a mixture of germline variation with noise, and representing the tumor sample as a mixture of the normal sample with somatic variation. A natural consequence of the model structure is that sensitivity can be maintained at high tumor impurity without requiring purity estimates. We demonstrate that the method has superior accuracy and sensitivity on impure samples compared with approaches based on either diploid genotype likelihoods or general allele-frequency tests. The Strelka workflow source code is available at ftp://strelka@ftp.illumina.com/. Contact: csaunders@illumina.com
Spectrum of mutations in leiomyosarcomas identified by clinical targeted next-generation sequencing.

PubMed

Lee, Paul J; Yoo, Naomi S; Hagemann, Ian S; Pfeifer, John D; Cottrell, Catherine E; Abel, Haley J; Duncavage, Eric J

2017-02-01

Recurrent genomic mutations in uterine and non-uterine leiomyosarcomas have not been well established. Using a next generation sequencing (NGS) panel of common cancer-associated genes, 25 leiomyosarcomas arising from multiple sites were examined to explore genetic alterations, including single nucleotide variants (SNV), small insertions/deletions (indels), and copy number alterations (CNA). Sequencing showed 86 non-synonymous, coding region somatic variants within 151 gene targets in 21 cases, with a mean of 4.1 variants per case; 4 cases had no putative mutations in the panel of genes assayed. The most frequently altered genes were TP53 (36%), ATM and ATRX (16%), and EGFR and RB1 (12%). CNA were identified in 85% of cases, with the most frequent copy number losses observed in chromosomes 10 and 13 including PTEN and RB1; the most frequent gains were seen in chromosomes 7 and 17. Our data show that deletions in canonical cancer-related genes are common in leiomyosarcomas. Further, the spectrum of gene mutations observed shows that defects in DNA repair and chromosomal maintenance are central to the biology of leiomyosarcomas, and that activating mutations observed in other common cancer types are rare in leiomyosarcomas. Copyright © 2017 Elsevier Inc. All rights reserved.
Analysis of the neuroligin 4Y gene in patients with autism.

PubMed

Yan, Jin; Feng, Jinong; Schroer, Richard; Li, Wenyan; Skinner, Cindy; Schwartz, Charles E; Cook, Edwin H; Sommer, Steve S

2008-08-01

Frameshift and missense mutations in the X-linked neuroligin 4 (NLGN4, MIM# 300427) and neuroligin 3 (NLGN3, MIM# 300336) genes have been identified in patients with autism, Asperger syndrome and mental retardation. We hypothesize that sequence variants in NLGN4Y are associated with autism or mental retardation. The coding sequences and splice junctions of the NLGN4Y gene were analyzed in 335 male samples (290 with autism and 45 with mental retardation). A total of 1.1 Mb of genomic DNA was sequenced. One missense variant, p.I679V, was identified in a patient with autism, as well as his father with learning disabilities. The I679 residue is highly conserved in three members of the neuroligin family. The absence of p.I679V in 2986 control Y chromosomes and the high similarity of NLGN4 and NLGN4Y are consistent with the hypothesis that p.I679V contributes to the etiology of autism. The presence of only one structural variant in our population of 335 males with autism/mental retardation, the unavailability of significant family cosegregation and an absence of functional assays are, however, important limitations of this study.
Smoking Gun or Circumstantial Evidence? Comparison of Statistical Learning Methods using Functional Annotations for Prioritizing Risk Variants.

PubMed

Gagliano, Sarah A; Ravji, Reena; Barnes, Michael R; Weale, Michael E; Knight, Jo

2015-08-24

Although technology has triumphed in facilitating routine genome sequencing, new challenges have been created for the data analyst. Genome-scale surveys of human variation generate volumes of data that far exceed capabilities for laboratory characterization. By incorporating functional annotations as predictors, statistical learning has been widely investigated for prioritizing genetic variants likely to be associated with complex disease. We compared three published prioritization procedures, which use different statistical learning algorithms and different predictors with regard to the quantity, type and coding. We also explored different combinations of algorithm and annotation set. As an application, we tested which methodology performed best for prioritizing variants using data from a large schizophrenia meta-analysis by the Psychiatric Genomics Consortium. Results suggest that all methods have considerable (and similar) predictive accuracies (AUCs 0.64-0.71) in test set data, but there is more variability in the application to the schizophrenia GWAS. In conclusion, a variety of algorithms and annotations seem to have a similar potential to effectively enrich true risk variants in genome-scale datasets; however, none offers more than incremental improvement in prediction. We discuss how methods might be evolved for risk variant prediction to address the impending bottleneck of the new generation of genome re-sequencing studies.
Localized structural frustration for evaluating the impact of sequence variants

PubMed Central

Kumar, Sushant; Clarke, Declan; Gerstein, Mark

2016-01-01

Population-scale sequencing is increasingly uncovering large numbers of rare single-nucleotide variants (SNVs) in coding regions of the genome. The rarity of these variants makes it challenging to evaluate their deleteriousness with conventional phenotype–genotype associations. Protein structures provide a way of addressing this challenge. Previous efforts have focused on globally quantifying the impact of SNVs on protein stability. However, local perturbations may severely impact protein functionality without strongly disrupting global stability (e.g. in relation to catalysis or allostery). Here, we describe a workflow in which localized frustration, quantifying unfavorable local interactions, is employed as a metric to investigate such effects. Using this workflow on the Protein Data Bank, we find that frustration produces many immediately intuitive results: for instance, disease-related SNVs create stronger changes in localized frustration than non-disease related variants, and rare SNVs tend to disrupt local interactions to a larger extent than common variants. Less obviously, we observe that somatic SNVs associated with oncogenes and tumor suppressor genes (TSGs) induce very different changes in frustration. In particular, those associated with TSGs change the frustration more in the core than the surface (by introducing loss-of-function events), whereas those associated with oncogenes manifest the opposite pattern, creating gain-of-function events. PMID:27915290
Kangaroo – A pattern-matching program for biological sequences

PubMed Central

2002-01-01

Background Biologists are often interested in performing a simple database search to identify proteins or genes that contain a well-defined sequence pattern. Many databases do not provide straightforward or readily available query tools to perform simple searches, such as identifying transcription binding sites, protein motifs, or repetitive DNA sequences. However, in many cases simple pattern-matching searches can reveal a wealth of information. We present in this paper a regular expression pattern-matching tool that was used to identify short repetitive DNA sequences in human coding regions for the purpose of identifying potential mutation sites in mismatch repair deficient cells. Results Kangaroo is a web-based regular expression pattern-matching program that can search for patterns in DNA, protein, or coding region sequences in ten different organisms. The program is implemented to facilitate a wide range of queries with no restriction on the length or complexity of the query expression. The program is accessible on the web at http://bioinfo.mshri.on.ca/kangaroo/ and the source code is freely distributed at http://sourceforge.net/projects/slritools/. Conclusion A low-level simple pattern-matching application can prove to be a useful tool in many research settings. For example, Kangaroo was used to identify potential genetic targets in a human colorectal cancer variant that is characterized by a high frequency of mutations in coding regions containing mononucleotide repeats. PMID:12150718
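The kind of search described in this abstract can be sketched with an ordinary regular expression. The snippet below is a minimal illustration, not Kangaroo's own code: the function name, the toy sequence, and the minimum run length of 8 are all assumptions chosen for the example.

```python
import re

# Hypothetical pattern: a run of 8 or more identical nucleotides.
# The threshold of 8 is an illustrative choice, not a value from the paper.
MONONUCLEOTIDE_RUN = re.compile(r"A{8,}|C{8,}|G{8,}|T{8,}")


def find_mononucleotide_runs(sequence):
    """Return (start, end, base, length) per run; 0-based, end-exclusive."""
    hits = []
    for match in MONONUCLEOTIDE_RUN.finditer(sequence.upper()):
        run = match.group(0)
        hits.append((match.start(), match.end(), run[0], len(run)))
    return hits


# Toy coding sequence, made up for illustration only.
toy_cds = "ATGGCC" + "A" * 10 + "GGTCT" + "G" * 8 + "CATTTTAA"

for start, end, base, length in find_mononucleotide_runs(toy_cds):
    print(f"{base} x {length} at positions {start}-{end}")
```

On the toy sequence this reports the run of ten A residues and the run of eight G residues, and ignores the shorter TTTT stretch.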
An Exome Sequencing Study to Assess the Role of Rare Genetic Variation in Pulmonary Fibrosis.

PubMed

Petrovski, Slavé; Todd, Jamie L; Durheim, Michael T; Wang, Quanli; Chien, Jason W; Kelly, Fran L; Frankel, Courtney; Mebane, Caroline M; Ren, Zhong; Bridgers, Joshua; Urban, Thomas J; Malone, Colin D; Finlen Copeland, Ashley; Brinkley, Christie; Allen, Andrew S; O'Riordan, Thomas; McHutchison, John G; Palmer, Scott M; Goldstein, David B

2017-07-01

Idiopathic pulmonary fibrosis (IPF) is an increasingly recognized, often fatal lung disease of unknown etiology. The aim of this study was to use whole-exome sequencing to improve understanding of the genetic architecture of pulmonary fibrosis. We performed a case-control exome-wide collapsing analysis including 262 unrelated individuals with pulmonary fibrosis clinically classified as IPF according to American Thoracic Society/European Respiratory Society/Japanese Respiratory Society/Latin American Thoracic Association guidelines (81.3%), usual interstitial pneumonia secondary to autoimmune conditions (11.5%), or fibrosing nonspecific interstitial pneumonia (7.2%). The majority (87%) of case subjects reported no family history of pulmonary fibrosis. We searched 18,668 protein-coding genes for an excess of rare deleterious genetic variation using whole-exome sequence data from 262 case subjects with pulmonary fibrosis and 4,141 control subjects drawn from among a set of individuals of European ancestry. Comparing genetic variation across 18,668 protein-coding genes, we found a study-wide significant (P < 4.5 × 10⁻⁷) case enrichment of qualifying variants in TERT, RTEL1, and PARN. A model qualifying ultrarare, deleterious, nonsynonymous variants implicated TERT and RTEL1, and a model specifically qualifying loss-of-function variants implicated RTEL1 and PARN. A subanalysis of 186 case subjects with sporadic IPF confirmed TERT, RTEL1, and PARN as study-wide significant contributors to sporadic IPF. Collectively, 11.3% of case subjects with sporadic IPF carried a qualifying variant in one of these three genes compared with the 0.3% carrier rate observed among control subjects (odds ratio, 47.7; 95% confidence interval, 21.5-111.6; P = 5.5 × 10⁻²²). We identified TERT, RTEL1, and PARN, three telomere-related genes previously implicated in familial pulmonary fibrosis, as significant contributors to sporadic IPF. These results support the idea that telomere dysfunction is involved in IPF pathogenesis.
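For readers unfamiliar with how a carrier-rate odds ratio and its confidence interval are obtained, the following is a minimal sketch using the standard 2×2-table formula with a Woolf (log-based) interval. It is not the authors' analysis, which may have used a different test such as an exact method; the function name and the counts passed to it are hypothetical and are not taken from the study.

```python
import math


def odds_ratio_with_ci(case_carriers, case_total,
                       control_carriers, control_total, z=1.96):
    """Odds ratio and Woolf confidence interval for carriers vs non-carriers.

    z=1.96 gives an approximate 95% interval.
    """
    a = case_carriers                       # cases carrying a qualifying variant
    b = case_total - case_carriers          # cases without one
    c = control_carriers                    # controls carrying one
    d = control_total - control_carriers    # controls without one
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for this formula")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts for illustration only: 20 carriers among 200 cases,
# 10 carriers among 1,000 controls.
example_or, example_lower, example_upper = odds_ratio_with_ci(20, 200, 10, 1000)
print(f"OR = {example_or:.1f}, 95% CI {example_lower:.1f}-{example_upper:.1f}")
```

With these made-up counts the odds ratio is 11.0. When a cell count is very small, an exact method is usually preferred over the log-based interval shown here.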
Mutation Screening of 1,237 Cancer Genes across Six Model Cell Lines of Basal-Like Breast Cancer.

PubMed

Olsson, Eleonor; Winter, Christof; George, Anthony; Chen, Yilun; Törngren, Therese; Bendahl, Pär-Ola; Borg, Åke; Gruvberger-Saal, Sofia K; Saal, Lao H

2015-01-01

Basal-like breast cancer is an aggressive subtype generally characterized as poor prognosis and lacking the expression of the three most important clinical biomarkers, estrogen receptor, progesterone receptor, and HER2. Cell lines serve as useful model systems to study cancer biology in vitro and in vivo. We performed mutational profiling of six basal-like breast cancer cell lines (HCC38, HCC1143, HCC1187, HCC1395, HCC1954, and HCC1937) and their matched normal lymphocyte DNA using targeted capture and next-generation sequencing of 1,237 cancer-associated genes, including all exons, UTRs and upstream flanking regions. In total, 658 somatic variants were identified, of which 378 were non-silent (average 63 per cell line, range 37-146) and 315 were novel (not present in the Catalogue of Somatic Mutations in Cancer database; COSMIC). 125 novel mutations were confirmed by Sanger sequencing (59 exonic, 48 3'UTR and 10 5'UTR, 1 splicing), with a validation rate of 94% of high confidence variants. Of 36 mutations previously reported for these cell lines but not detected in our exome data, 36% could not be detected by Sanger sequencing. The base replacements C/G>A/T, C/G>G/C, C/G>T/A and A/T>G/C were significantly more frequent in the coding regions compared to the non-coding regions (OR 3.2, 95% CI 2.0-5.3, P<0.0001; OR 4.3, 95% CI 2.9-6.6, P<0.0001; OR 2.4, 95% CI 1.8-3.1, P<0.0001; OR 1.8, 95% CI 1.2-2.7, P = 0.024, respectively). The single nucleotide variants within the context of T[C]T/A[G]A and T[C]A/T[G]A were more frequent in the coding than in the non-coding regions (OR 3.7, 95% CI 2.2-6.1, P<0.0001; OR 3.8, 95% CI 2.0-7.2, P = 0.001, respectively). Copy number estimations were derived from the targeted regions and correlated well to Affymetrix SNP array copy number data (Pearson correlation 0.82 to 0.96 for all compared cell lines; P<0.0001). 
These mutation calls across 1,237 cancer-associated genes and identification of novel variants will aid in the design and interpretation of biological experiments using these six basal-like breast cancer cell lines.
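The strand-collapsed notation used in this abstract (for example C/G>A/T, or the context T[C]T/A[G]A) can be reproduced with a few lines of code. The sketch below is an illustration under assumptions: the function names are hypothetical, and the convention of reporting from the C or A strand first is inferred from the labels in the abstract rather than taken from the authors' pipeline.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def substitution_class(ref, alt):
    """Collapse an SNV onto its strand-symmetric class, e.g. G>T -> 'C/G>A/T'."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or ref not in COMPLEMENT or alt not in COMPLEMENT:
        raise ValueError("ref and alt must be two different bases from ACGT")
    # Report from the strand whose reference base is C or A,
    # matching labels such as C/G>T/A and A/T>G/C.
    if ref in "GT":
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}/{COMPLEMENT[ref]}>{alt}/{COMPLEMENT[alt]}"


def trinucleotide_context(sequence, pos):
    """Context of the base at 0-based pos, e.g. 'T[C]T/A[G]A'."""
    if pos < 1 or pos > len(sequence) - 2:
        raise ValueError("pos needs one flanking base on each side")
    tri = sequence[pos - 1:pos + 2].upper()
    # Reverse complement gives the same site read from the other strand.
    rev = "".join(COMPLEMENT[base] for base in reversed(tri))
    if tri[1] in "GT":
        tri, rev = rev, tri
    return f"{tri[0]}[{tri[1]}]{tri[2]}/{rev[0]}[{rev[1]}]{rev[2]}"


print(substitution_class("G", "T"))         # C/G>A/T
print(trinucleotide_context("GTCTA", 2))    # T[C]T/A[G]A
print(trinucleotide_context("TTGAC", 2))    # T[C]A/T[G]A
```

Counting these labels separately for coding and non-coding positions would give the 2×2 tables behind odds ratios of the kind reported in the abstract.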
Whole Transcriptome Sequencing Enables Discovery and Analysis of Viruses in Archived Primary Central Nervous System Lymphomas

PubMed Central

DeBoever, Christopher; Reid, Erin G.; Smith, Erin N.; Wang, Xiaoyun; Dumaop, Wilmar; Harismendy, Olivier; Carson, Dennis; Richman, Douglas; Masliah, Eliezer; Frazer, Kelly A.

2013-01-01

Primary central nervous system lymphomas (PCNSL) have a dramatically increased prevalence among persons living with AIDS and are known to be associated with human Epstein Barr virus (EBV) infection. Previous work suggests that in some cases, co-infection with other viruses may be important for PCNSL pathogenesis. Viral transcription in tumor samples can be measured using next generation transcriptome sequencing. We demonstrate the ability of transcriptome sequencing to identify viruses, characterize viral expression, and identify viral variants by sequencing four archived AIDS-related PCNSL tissue samples and analyzing raw sequencing reads. EBV was detected in all four PCNSL samples and cytomegalovirus (CMV), JC polyomavirus (JCV), and HIV were also discovered, consistent with clinical diagnoses. CMV was found to express three long non-coding RNAs recently reported as expressed during active infection. Single nucleotide variants were observed in each of the viruses observed and three indels were found in CMV. No viruses were found in several control tumor types including 32 diffuse large B-cell lymphoma samples. This study demonstrates the ability of next generation transcriptome sequencing to accurately identify viruses, including DNA viruses, in solid human cancer tissue samples. PMID:24023918

Next generation sequencing in women affected by nonsyndromic premature ovarian failure displays new potential causative genes and mutations.

PubMed

Fonseca, Dora Janeth; Patiño, Liliana Catherine; Suárez, Yohjana Carolina; de Jesús Rodríguez, Asid; Mateus, Heidi Eliana; Jiménez, Karen Marcela; Ortega-Recalde, Oscar; Díaz-Yamal, Ivonne; Laissue, Paul

2015-07-01

Objective To identify new molecular actors involved in nonsyndromic premature ovarian failure (POF) etiology. Design Retrospective case-control cohort study. Setting University research group and IVF medical center. Patients Twelve women affected by nonsyndromic POF. The control group included 176 women whose menopause had occurred after age 50 and who had no history of gynecological disease. A further 345 women of the same ethnic origin (general population group) were also recruited to assess allele frequency for potentially deleterious sequence variants. Interventions Next generation sequencing (NGS), Sanger sequencing, and bioinformatics analysis. Main Outcome Measures The complete coding regions of 70 candidate genes were massively sequenced, via NGS, in POF patients. Bioinformatics and genetics were used to confirm NGS results and to identify potential sequence variants related to the disease pathogenesis. Results We have identified mutations in two novel genes, ADAMTS19 and BMPR2, that are potentially related to POF origin. LHCGR mutations, which might have contributed to the phenotype, were also detected. Conclusions We thus recommend NGS as a powerful tool for identifying new molecular actors in POF and for future diagnostic/prognostic purposes. Copyright © 2015 American Society for Reproductive Medicine. Published by Elsevier Inc. All rights reserved.
Korean Variant Archive (KOVA): a reference database of genetic variations in the Korean population.

PubMed

Lee, Sangmoon; Seo, Jihae; Park, Jinman; Nam, Jae-Yong; Choi, Ahyoung; Ignatius, Jason S; Bjornson, Robert D; Chae, Jong-Hee; Jang, In-Jin; Lee, Sanghyuk; Park, Woong-Yang; Baek, Daehyun; Choi, Murim

2017-06-27

Despite efforts to interrogate human genome variation through large-scale databases, systematic preference toward populations of Caucasian descendants has resulted in unintended reduction of power in studying non-Caucasians. Here we report a compilation of coding variants from 1,055 healthy Korean individuals (KOVA; Korean Variant Archive). The samples were sequenced to a mean depth of 75x, yielding 101 singleton variants per individual. Population genetics analysis demonstrates that the Korean population is a distinct ethnic group comparable to other discrete ethnic groups in Africa and Europe, providing a rationale for such independent genomic datasets. Indeed, KOVA conferred 22.8% increased variant filtering power in addition to Exome Aggregation Consortium (ExAC) when used on Korean exomes. Functional assessment of nonsynonymous variants supported the presence of purifying selection in Koreans. Analysis of copy number variants detected 5.2 deletions and 10.3 amplifications per individual with an increased fraction of novel variants among smaller and rarer copy number variable segments. We also report a list of germline variants that are associated with increased tumor susceptibility. This catalog can function as a critical addition to the pre-existing variant databases in pursuing genetic studies of Korean individuals.
Characterization of two novel cold-inducible K3 dehydrin genes from alfalfa (Medicago sativa spp. sativa L.).

PubMed

Dubé, Marie-Pier; Castonguay, Yves; Cloutier, Jean; Michaud, Josée; Bertrand, Annick

2013-03-01

Dehydrin defines a complex family of intrinsically disordered proteins with potential adaptive value with regard to freeze-induced cell dehydration. A search within an expressed sequence tag library from cDNAs of cold-acclimated crowns of alfalfa (Medicago sativa spp. sativa L.) identified transcripts putatively encoding K(3)-type dehydrins. Analysis of full-length coding sequences unveiled two highly homologous sequence variants, K(3)-A and K(3)-B. An increase in the frequency of genotypes yielding positive genomic amplification of the K(3)-dehydrin variants in response to selection for superior tolerance to freezing and the induction of their expression at low temperature strongly support a link with cold adaptation. The presence of multiple allelic forms within single genotypes and independent segregation indicate that the two K(3) dehydrin variants are encoded by distinct genes located at unlinked loci. The co-inheritance of the K(3)-A dehydrin with a Y(2)K(4) dehydrin restriction fragment length polymorphism with a demonstrated impact on freezing tolerance suggests the presence of a genome domain where these functionally related genes are located. These results provide additional evidence that dehydrins play important roles with regard to tolerance to subfreezing temperatures. They also underscore the value of recurrent selection to help identify variants within a large multigene family in allopolyploid species like alfalfa.
Mutation Spectrum of the ABCA4 Gene in a Greek Cohort with Stargardt Disease: Identification of Novel Mutations and Evidence of Three Prevalent Mutated Alleles

PubMed Central

Vassiliki, Kokkinou; George, Koutsodontis; Polixeni, Stamatiou; Christoforos, Giatzakis; Minas, Aslanides Ioannis; Stavrenia, Koukoula; Ioannis, Datseris

2018-01-01

Aim To evaluate the frequency and pattern of disease-associated mutations of ABCA4 gene among Greek patients with presumed Stargardt disease (STGD1). Materials and Methods A total of 59 patients were analyzed for ABCA4 mutations using the ABCR400 microarray and PCR-based sequencing of all coding exons and flanking intronic regions. MLPA analysis as well as sequencing of two regions in introns 30 and 36 reported earlier to harbor deep intronic disease-associated variants was used in 4 selected cases. Results An overall detection rate of at least one mutant allele was achieved in 52 of the 59 patients (88.1%). Direct sequencing improved significantly the complete characterization rate, that is, identification of two mutations compared to the microarray analysis (93.1% versus 50%). In total, 40 distinct potentially disease-causing variants of the ABCA4 gene were detected, including six previously unreported potentially pathogenic variants. Among the disease-causing variants, in this cohort, the most frequent was c.5714+5G>A representing 16.1%, while p.Gly1961Glu and p.Leu541Pro represented 15.2% and 8.5%, respectively. Conclusions By using a combination of methods, we completely molecularly diagnosed 48 of the 59 patients studied. In addition, we identified six previously unreported, potentially pathogenic ABCA4 mutations. PMID:29854428
APADB: a database for alternative polyadenylation and microRNA regulation events

PubMed Central

Müller, Sören; Rycak, Lukas; Afonso-Grunz, Fabian; Winter, Peter; Zawada, Adam M.; Damrath, Ewa; Scheider, Jessica; Schmäh, Juliane; Koch, Ina; Kahl, Günter; Rotter, Björn

2014-01-01

Alternative polyadenylation (APA) is a widespread mechanism that contributes to the sophisticated dynamics of gene regulation. Approximately 50% of all protein-coding human genes harbor multiple polyadenylation (PA) sites; their selective and combinatorial use gives rise to transcript variants that differ in the length of their 3′ untranslated region (3′UTR). Shortened variants escape UTR-mediated regulation by microRNAs (miRNAs), especially in cancer, where global 3′UTR shortening accelerates disease progression, dedifferentiation and proliferation. Here we present APADB, a database of vertebrate PA sites determined by 3′ end sequencing, using massive analysis of complementary DNA ends. APADB provides (A)PA sites for coding and non-coding transcripts of human, mouse and chicken genes. For human and mouse, several tissue types, including different cancer specimens, are available. APADB records the loss of predicted miRNA binding sites and visualizes next-generation sequencing reads that support each PA site in a genome browser. The database tables can either be browsed according to organism and tissue or alternatively searched for a gene of interest. APADB is the largest database of APA in human, chicken and mouse. The stored information provides experimental evidence for thousands of PA sites and APA events. APADB combines 3′ end sequencing data with algorithms that predict miRNA binding sites, which allows those prediction algorithms to be further improved. Current databases lack correct information about 3′UTR lengths, especially for chicken, and APADB provides necessary information to close this gap. Database URL: http://tools.genxpro.net/apadb/ PMID:25052703
TMTC2 variant associated with sensorineural hearing loss and auditory neuropathy spectrum disorder in a family dyad.

PubMed

Guillen-Ahlers, Hector; Erbe, Christy B; Chevalier, Frédéric D; Montoya, Maria J; Zimmerman, Kip D; Langefeld, Carl D; Olivier, Michael; Runge, Christina L

2018-04-19

Sensorineural hearing loss (SNHL) is a common form of hearing loss that can be inherited or triggered by environmental insults; auditory neuropathy spectrum disorder (ANSD) is a SNHL subtype with unique diagnostic criteria. The genetic factors associated with these impairments are vast and diverse, but causal genetic factors are rarely characterized. A family dyad, both cochlear implant recipients, presented with a hearing history of bilateral, progressive SNHL, and ANSD. Whole-exome sequencing was performed to identify coding sequence variants shared by both family members, and screened against genes relevant to hearing loss and variants known to be associated with SNHL and ANSD. Both family members are successful cochlear implant users, demonstrating effective auditory nerve stimulation with their devices. Genetic analyses revealed a mutation (rs35725509) in the TMTC2 gene, which has been reported previously as a likely genetic cause of SNHL in another family of Northern European descent. This study represents the first confirmation of the rs35725509 variant in an independent family as a likely cause for the complex hearing loss phenotype (SNHL and ANSD) observed in this family dyad. © 2018 The Authors. Molecular Genetics & Genomic Medicine published by Wiley Periodicals, Inc.
Brittle cornea syndrome ZNF469 mutation carrier phenotype and segregation analysis of rare ZNF469 variants in familial keratoconus.

PubMed

Davidson, Alice E; Borasio, Edmondo; Liskova, Petra; Khan, Arif O; Hassan, Hala; Cheetham, Michael E; Plagnol, Vincent; Alkuraya, Fowzan S; Tuft, Stephen J; Hardcastle, Alison J

2015-01-06

Brittle cornea syndrome 1 (BCS1) is a rare recessive condition characterized by extreme thinning of the cornea and sclera, caused by mutations in ZNF469. Keratoconus is a relatively common disease characterized by progressive thinning and ectasia of the cornea. The etiology of keratoconus is complex and not yet understood, but rare ZNF469 variants have recently been associated with disease. We investigated the phenotype of BCS1 carriers with known pathogenic ZNF469 mutations, and recruited families in which aggregation of keratoconus was observed to establish if rare variants in ZNF469 segregated with disease. Patients and family members were recruited and underwent comprehensive anterior segment examination, including corneal topography. Blood samples were donated and genomic DNA was extracted. The coding sequence and splice sites of ZNF469 were PCR amplified and Sanger sequenced. Four carriers of three BCS1-associated ZNF469 loss-of-function mutations (p.[Glu1392Ter], p.[Gln1930Argfs*6], p.[Gln1930fs*133]) were examined and none had keratoconus. One carrier had partially penetrant features of BCS1, including joint hypermobility. ZNF469 sequencing in 11 keratoconus families identified 9 rare (minor allele frequency [MAF] ≤ 0.025) variants predicted to be potentially damaging. However, in each instance the rare variant(s) identified, including two previously reported as potentially keratoconus-associated, did not segregate with the disease. The presence of heterozygous loss-of-function alleles in the ZNF469 gene did not cause keratoconus in the individuals examined. None of the rare nonsynonymous ZNF469 variants identified in the familial cohort conferred a high risk of keratoconus; therefore, genetic variants contributing to disease pathogenesis in these 11 families remain to be identified. Copyright 2015 The Association for Research in Vision and Ophthalmology, Inc.
A comprehensive collection of annotations to interpret sequence variation in human mitochondrial transfer RNAs.

PubMed

Diroma, Maria Angela; Lubisco, Paolo; Attimonelli, Marcella

2016-11-08

The abundance of biological data characterizing the genomics era is contributing to a comprehensive understanding of human mitochondrial genetics. Nevertheless, many aspects are still unclear, specifically about the variability of the 22 human mitochondrial transfer RNA (tRNA) genes and their involvement in diseases. The complex enrichment and isolation of tRNAs in vitro leads to an incomplete knowledge of their post-transcriptional modifications and three-dimensional folding, essential for correct tRNA functioning. An accurate annotation of mitochondrial tRNA variants would definitely be useful and appreciated by mitochondrial researchers and clinicians, since most of the bioinformatics tools for variant annotation and prioritization available so far cannot shed light on the functional role of tRNA variations. To this aim, we updated our MToolBox pipeline for mitochondrial DNA analysis of high throughput and Sanger sequencing data by integrating tRNA variant annotations in order to identify and characterize relevant variants not only in protein coding regions, but also in tRNA genes. The annotation step in the pipeline now provides detailed information for variants mapping onto the 22 mitochondrial tRNAs. For each mt-tRNA position along the entire genome, the relative tRNA numbering, tRNA type, cloverleaf secondary domains (loops and stems), mature nucleotide and interactions in the three-dimensional folding were reported. Moreover, pathogenicity predictions for tRNA and rRNA variants were retrieved from the literature and integrated within the annotations provided by MToolBox, both in the stand-alone version and web-based tool at the Mitochondrial Disease Sequence Data Resource (MSeqDR) website. All the information available in the annotation step of MToolBox was exploited to generate custom tracks which can be displayed in the GBrowse instance at the MSeqDR website.
To the best of our knowledge, specific data regarding mitochondrial variants in tRNA genes were introduced for the first time in a tool for mitochondrial genome analysis, supporting the interpretation of genetic variants in specific genomic contexts.
Frequency of pathogenic germline mutation in CHEK2, PALB2, MRE11, and RAD50 in patients at high risk for hereditary breast cancer.

PubMed

Kim, Haeyoung; Cho, Dae-Yeon; Choi, Doo Ho; Oh, Mijin; Shin, Inkyung; Park, Won; Huh, Seung Jae; Nam, Seok Jin; Lee, Jeong Eon; Kim, Seok Won

2017-01-01

This study was performed to evaluate the frequency of mutations in CHEK2, PALB2, MRE11, and RAD50 among Korean patients at high risk for hereditary breast cancer. A total of 235 Korean patients with hereditary breast cancer who tested negative for BRCA1/2 mutation were enrolled in this study. Entire coding regions of CHEK2, PALB2, MRE11, and RAD50 were analyzed using massively parallel sequencing (MPS). Sequence variants detected by MPS were confirmed by Sanger sequencing. Six patients (2.5%) were found to have pathogenic variants in CHEK2 (n = 1), PALB2 (n = 2), MRE11 (n = 1), and RAD50 (n = 2). Among the pathogenic variants, PALB2 c.2257C>T was previously reported in other studies, while CHEK2 c.1245dupC, PALB2 c.1048C>T, MRE11 c.1773_1774delAA, RAD50 c.1276C>T, and RAD50 c.3811_3813delGAA were newly identified in this study. A total of 15 missense variants were found in the four genes among 26 patients; 7 patients had a variant in CHEK2, 11 in PALB2, 2 in MRE11, and 6 in RAD50. When in silico analyses were performed on the 15 missense variants, six variants (CHEK2 c.686A>G, PALB2 c.1492G>T, PALB2 c.3054G>C, MRE11 c.140C>T, RAD50 c.1456C>T, and RAD50 c.3790C>T) were predicted to be deleterious. Pathogenic variants in CHEK2, PALB2, MRE11, and RAD50 were detected in a small proportion of Korean patients with features of hereditary breast cancer.
No evidence that protein truncating variants in BRIP1 are associated with breast cancer risk: implications for gene panel testing

PubMed Central

Easton, Douglas F; Lesueur, Fabienne; Decker, Brennan; Michailidou, Kyriaki; Li, Jun; Allen, Jamie; Luccarini, Craig; Pooley, Karen A; Shah, Mitul; Bolla, Manjeet K; Wang, Qin; Dennis, Joe; Ahmad, Jamil; Thompson, Ella R; Damiola, Francesca; Pertesi, Maroulio; Voegele, Catherine; Mebirouk, Noura; Robinot, Nivonirina; Durand, Geoffroy; Forey, Nathalie; Luben, Robert N; Ahmed, Shahana; Aittomäki, Kristiina; Anton-Culver, Hoda; Arndt, Volker; Baynes, Caroline; Beckman, Matthias W; Benitez, Javier; Van Den Berg, David; Blot, William J; Bogdanova, Natalia V; Bojesen, Stig E; Brenner, Hermann; Chang-Claude, Jenny; Chia, Kee Seng; Choi, Ji-Yeob; Conroy, Don M; Cox, Angela; Cross, Simon S; Czene, Kamila; Darabi, Hatef; Devilee, Peter; Eriksson, Mikael; Fasching, Peter A; Figueroa, Jonine; Flyger, Henrik; Fostira, Florentia; García-Closas, Montserrat; Giles, Graham G; Glendon, Gord; González-Neira, Anna; Guénel, Pascal; Haiman, Christopher A; Hall, Per; Hart, Steven N; Hartman, Mikael; Hooning, Maartje J; Hsiung, Chia-Ni; Ito, Hidemi; Jakubowska, Anna; James, Paul A; John, Esther M; Johnson, Nichola; Jones, Michael; Kabisch, Maria; Kang, Daehee; Kosma, Veli-Matti; Kristensen, Vessela; Lambrechts, Diether; Li, Na; Lindblom, Annika; Long, Jirong; Lophatananon, Artitaya; Lubinski, Jan; Mannermaa, Arto; Manoukian, Siranoush; Margolin, Sara; Matsuo, Keitaro; Meindl, Alfons; Mitchell, Gillian; Muir, Kenneth; Nevelsteen, Ines; van den Ouweland, Ans; Peterlongo, Paolo; Phuah, Sze Yee; Pylkäs, Katri; Rowley, Simone M; Sangrajrang, Suleeporn; Schmutzler, Rita K; Shen, Chen-Yang; Shu, Xiao-Ou; Southey, Melissa C; Surowy, Harald; Swerdlow, Anthony; Teo, Soo H; Tollenaar, Rob A E M; Tomlinson, Ian; Torres, Diana; Truong, Thérèse; Vachon, Celine; Verhoef, Senno; Wong-Brown, Michelle; Zheng, Wei; Zheng, Ying; Nevanlinna, Heli; Scott, Rodney J; Andrulis, Irene L; Wu, Anna H; Hopper, John L; Couch, Fergus J; Winqvist, Robert; Burwinkel, Barbara; Sawyer, Elinor J; Schmidt, Marjanka K; 
Rudolph, Anja; Dörk, Thilo; Brauch, Hiltrud; Hamann, Ute; Neuhausen, Susan L; Milne, Roger L; Fletcher, Olivia; Pharoah, Paul D P; Campbell, Ian G; Dunning, Alison M; Le Calvez-Kelm, Florence; Goldgar, David E; Tavtigian, Sean V; Chenevix-Trench, Georgia

2016-01-01

Background BRCA1 interacting protein C-terminal helicase 1 (BRIP1) is one of the Fanconi Anaemia Complementation (FANC) group family of DNA repair proteins. Biallelic mutations in BRIP1 are responsible for FANC group J, and previous studies have also suggested that rare protein truncating variants in BRIP1 are associated with an increased risk of breast cancer. These studies have led to inclusion of BRIP1 on targeted sequencing panels for breast cancer risk prediction. Methods We evaluated a truncating variant, p.Arg798Ter (rs137852986), and 10 missense variants of BRIP1, in 48 144 cases and 43 607 controls of European origin, drawn from 41 studies participating in the Breast Cancer Association Consortium (BCAC). Additionally, we sequenced the coding regions of BRIP1 in 13 213 cases and 5242 controls from the UK, 1313 cases and 1123 controls from three population-based studies as part of the Breast Cancer Family Registry, and 1853 familial cases and 2001 controls from Australia. Results The rare truncating allele of rs137852986 was observed in 23 cases and 18 controls in Europeans in BCAC (OR 1.09, 95% CI 0.58 to 2.03, p=0.79). Truncating variants were found in the sequencing studies in 34 cases (0.21%) and 19 controls (0.23%) (combined OR 0.90, 95% CI 0.48 to 1.70, p=0.75). Conclusions These results suggest that truncating variants in BRIP1, and in particular p.Arg798Ter, are not associated with a substantial increase in breast cancer risk. Such observations have important implications for the reporting of results from breast cancer screening panels. PMID:26921362
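The carrier counts quoted in the abstract above can be turned into a quick sanity check. The following is a minimal sketch, not the authors' analysis: it computes a crude (unstratified) odds ratio with a Woolf log-scale 95% confidence interval, and the function name `crude_odds_ratio` is hypothetical. The abstract does not state how its ORs (1.09 and 0.90) were estimated, so the crude values are not expected to match them exactly; the point is only that both intervals span 1.

```python
import math


def crude_odds_ratio(case_carriers, case_total, control_carriers, control_total, z=1.96):
    """Unadjusted odds ratio with a Woolf (log-scale) confidence interval.

    Hypothetical helper for illustration; z=1.96 gives an approximate 95% CI.
    """
    a = case_carriers
    b = case_total - case_carriers          # non-carrier cases
    c = control_carriers
    d = control_total - control_carriers    # non-carrier controls
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# p.Arg798Ter in BCAC: 23 carriers among 48 144 cases, 18 among 43 607 controls.
bcac = crude_odds_ratio(23, 48144, 18, 43607)

# Sequencing studies: sample sizes summed across the three sets in the abstract.
seq_cases = 13213 + 1313 + 1853
seq_controls = 5242 + 1123 + 2001
sequencing = crude_odds_ratio(34, seq_cases, 19, seq_controls)

# Carrier percentages, for comparison with the 0.21% and 0.23% in the abstract.
pct_cases = 100 * 34 / seq_cases
pct_controls = 100 * 19 / seq_controls

print(f"BCAC crude OR {bcac[0]:.2f} (95% CI {bcac[1]:.2f} to {bcac[2]:.2f})")
print(f"Sequencing crude OR {sequencing[0]:.2f} (95% CI {sequencing[1]:.2f} to {sequencing[2]:.2f})")
print(f"Carriers: {pct_cases:.2f}% of cases, {pct_controls:.2f}% of controls")
```

Pooling the three sequencing sets into one table ignores between-study differences, which is one reason a pooled crude estimate can differ from a combined estimate.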
Somatic and Germline TP53 Alterations in Second Malignant Neoplasms from Pediatric Cancer Survivors.

PubMed

Sherborne, Amy L; Lavergne, Vincent; Yu, Katharine; Lee, Leah; Davidson, Philip R; Mazor, Tali; Smirnoff, Ivan V; Horvai, Andrew E; Loh, Mignon; DuBois, Steven G; Goldsby, Robert E; Neglia, Joseph P; Hammond, Sue; Robison, Leslie L; Wustrack, Rosanna; Costello, Joseph F; Nakamura, Alice O; Shannon, Kevin M; Bhatia, Smita; Nakamura, Jean L

2017-04-01

Purpose: Second malignant neoplasms (SMNs) are severe late complications that occur in pediatric cancer survivors exposed to radiotherapy and other genotoxic treatments. To characterize the mutational landscape of treatment-induced sarcomas and to identify candidate SMN-predisposing variants, we analyzed germline and SMN samples from pediatric cancer survivors. Experimental Design: We performed whole-exome sequencing (WES) and RNA sequencing on radiation-induced sarcomas arising from two pediatric cancer survivors. To assess the frequency of germline TP53 variants in SMNs, Sanger sequencing was performed to analyze germline TP53 in 37 pediatric cancer survivors from the Childhood Cancer Survivor Study (CCSS) without any history of a familial cancer predisposition syndrome but known to have developed SMNs. Results: WES revealed TP53 mutations involving p53's DNA-binding domain in both index cases, one of which was also present in the germline. The germline and somatic TP53-mutant variants were enriched in the transcriptomes for both sarcomas. Analysis of TP53-coding exons in germline specimens from the CCSS survivor cohort identified a G215C variant encoding an R72P amino acid substitution in 6 patients and a synonymous SNP A639G in 4 others, resulting in 10 of 37 evaluable patients (27%) harboring a germline TP53 variant. Conclusions: Currently, germline TP53 is not routinely assessed in patients with pediatric cancer. These data support the concept that identifying germline TP53 variants at the time a primary cancer is diagnosed may identify patients at high risk for SMN development, who could benefit from modified therapeutic strategies and/or intensive posttreatment monitoring. Clin Cancer Res; 23(7); 1852-61. ©2016 American Association for Cancer Research.
Somatic and germline TP53 alterations in second malignant neoplasms from pediatric cancer survivors

PubMed Central

Sherborne, Amy L.; Lavergne, Vincent; Yu, Katharine; Lee, Leah; Davidson, Philip R.; Mazor, Tali; Smirnoff, Ivan; Horvai, Andrew; Loh, Mignon; DuBois, Steven G.; Goldsby, Robert E.; Neglia, Joseph; Hammond, Sue; Robison, Leslie L.; Wustrack, Rosanna; Costello, Joseph; Nakamura, Alice O.; Shannon, Kevin; Bhatia, Smita; Nakamura, Jean L.

2016-01-01

Purpose Second malignant neoplasms (SMNs) are severe late complications that occur in pediatric cancer survivors exposed to radiotherapy and other genotoxic treatments. To characterize the mutational landscape of treatment-induced sarcomas and to identify candidate SMN-predisposing variants we analyzed germline and SMN samples from pediatric cancer survivors. Experimental Design We performed whole exome sequencing (WES) and RNA sequencing on radiation-induced sarcomas arising from two pediatric cancer survivors. To assess the frequency of germline TP53 variants in SMNs, Sanger sequencing was performed to analyze germline TP53 in thirty-seven pediatric cancer survivors from the Childhood Cancer Survivor Study (CCSS) without history of a familial cancer predisposition syndrome but known to have developed SMNs. Results WES revealed TP53 mutations involving p53’s DNA binding domain in both index cases, one of which was also present in the germline. The germline and somatic TP53 mutant variants were enriched in the transcriptomes for both sarcomas. Analysis of TP53 coding exons in germline specimens from the CCSS survivor cohort identified a G215C variant encoding an R72P amino acid substitution in six patients and a synonymous single nucleotide polymorphism A639G in four others, resulting in ten out of 37 evaluable patients (27%) harboring a germline TP53 variant. Conclusions Currently, germline TP53 is not routinely assessed in pediatric cancer patients. These data support the concept that identifying germline TP53 variants at the time a primary cancer is diagnosed may identify patients at high risk for SMN development, who could benefit from modified therapeutic strategies and/or intensive post-treatment monitoring. PMID:27683180
Deep Resequencing of GWAS Loci Identifies Rare Variants in CARD9, IL23R and RNF186 That Are Associated with Ulcerative Colitis

PubMed Central

Boucher, Gabrielle; Lo, Ken Sin; Rivas, Manuel A.; Stevens, Christine; Alikashani, Azadeh; Ladouceur, Martin; Ellinghaus, David; Törkvist, Leif; Goel, Gautam; Lagacé, Caroline; Annese, Vito; Bitton, Alain; Begun, Jakob; Brant, Steve R.; Bresso, Francesca; Cho, Judy H.; Duerr, Richard H.; Halfvarson, Jonas; McGovern, Dermot P. B.; Radford-Smith, Graham; Schreiber, Stefan; Schumm, Philip L.; Sharma, Yashoda; Silverberg, Mark S.; Weersma, Rinse K.; D'Amato, Mauro; Vermeire, Severine; Franke, Andre; Lettre, Guillaume; Xavier, Ramnik J.; Daly, Mark J.; Rioux, John D.

2013-01-01

Genome-wide association studies and follow-up meta-analyses in Crohn's disease (CD) and ulcerative colitis (UC) have recently identified 163 disease-associated loci that meet genome-wide significance for these two inflammatory bowel diseases (IBD). These discoveries have already had a tremendous impact on our understanding of the genetic architecture of these diseases and have directed functional studies that have revealed some of the biological functions that are important to IBD (e.g. autophagy). Nonetheless, these loci can only explain a small proportion of disease variance (∼14% in CD and 7.5% in UC), suggesting that not only are additional loci to be found but that the known loci may contain high effect rare risk variants that have gone undetected by GWAS. To test this, we have used a targeted sequencing approach in 200 UC cases and 150 healthy controls (HC), all of French Canadian descent, to study 55 genes in regions associated with UC. We performed follow-up genotyping of 42 rare non-synonymous variants in independent case-control cohorts (totaling 14,435 UC cases and 20,204 HC). Our results confirmed significant association to rare non-synonymous coding variants in both IL23R and CARD9, previously identified from sequencing of CD loci, as well as identified a novel association in RNF186. With the exception of CARD9 (OR = 0.39), the rare non-synonymous variants identified were of moderate effect (OR = 1.49 for RNF186 and OR = 0.79 for IL23R). RNF186 encodes a protein with a RING domain having predicted E3 ubiquitin-protein ligase activity and two transmembrane domains. Importantly, the disease-coding variant is located in the ubiquitin ligase domain. Finally, our results suggest that rare variants in genes identified by genome-wide association in UC are unlikely to contribute significantly to the overall variance for the disease. Rather, these are expected to help focus functional studies of the corresponding disease loci. PMID:24068945
Assessment of allelic diversity in intron-containing Mal d 1 genes and their association to apple allergenicity

PubMed Central

Gao, Zhongshan; Weg, Eric W van de; Matos, Catarina I; Arens, Paul; Bolhaar, Suzanne THP; Knulst, Andre C; Li, Yinghui; Hoffmann-Sommergruber, Karin; Gilissen, Luud JWJ

2008-01-01

Background Mal d 1 is a major apple allergen causing food allergic symptoms of the oral allergy syndrome (OAS) in birch-pollen sensitised patients. The Mal d 1 gene family is known to have at least 7 intron-containing and 11 intronless members that have been mapped in clusters on three linkage groups. In this study, the allelic diversity of the seven intron-containing Mal d 1 genes was assessed among a set of apple cultivars by sequencing or indirectly through pedigree genotyping. Protein variant constitutions were subsequently compared with Skin Prick Test (SPT) responses to study the association of deduced protein variants with allergenicity in a set of 14 cultivars. Results From the seven intron-containing Mal d 1 genes investigated, Mal d 1.01 and Mal d 1.02 were highly conserved, as nine out of ten cultivars coded for the same protein variant, while only one cultivar coded for a second variant. Mal d 1.04, Mal d 1.05 and Mal d 1.06 A, B and C were more variable, coding for three to six different protein variants. Comparison of Mal d 1 allelic composition between the high-allergenic cultivar Golden Delicious and the low-allergenic cultivars Santana and Priscilla, which are linked in pedigree, showed an association between the protein variants coded by the Mal d 1.04 and -1.06A genes (both located on linkage group 16) with allergenicity. This association was confirmed in 10 other cultivars. In addition, Mal d 1.06A allele dosage effects associated with the degree of allergenicity based on prick to prick testing. Conversely, no associations were observed for the protein variants coded by the Mal d 1.01 (on linkage group 13), -1.02, -1.06B, -1.06C genes (all on linkage group 16), nor by the Mal d 1.05 gene (on linkage group 6). Conclusion Protein variant compositions of Mal d 1.04 and -1.06A and, in case of Mal d 1.06A, allele doses are associated with the differences in allergenicity among fourteen apple cultivars. 
This information indicates the involvement of qualitative as well as quantitative factors in allergenicity and warrants further research in the relative importance of quantitative and qualitative aspects of Mal d 1 gene expression on allergenicity. Results from this study have implications for medical diagnostics, immunotherapy, clinical research and breeding schemes for new hypo-allergenic cultivars. PMID:19014530
Mutation spectrum of genes associated with steroid-resistant nephrotic syndrome in Chinese children.

PubMed

Wang, Ying; Dang, Xiqiang; He, Qingnan; Zhen, Yan; He, Xiaoxie; Yi, Zhuwen; Zhu, Kuichun

2017-08-20

Approximately 20% of children with idiopathic nephrotic syndrome do not respond to steroid therapy. More than 30 genes have been identified as disease-causing genes for steroid-resistant nephrotic syndrome (SRNS). Few reports were from the Chinese population. The coding regions of genes commonly associated with SRNS were analyzed to characterize the gene mutation spectrum in children with SRNS in central China. The first phase of the study analyzed five genes (NPHS1, NPHS2, PLCE1, WT1, and TRPC6) in 38 children by Sanger sequencing. The second phase analyzed 17 genes in 33 children by next-generation DNA sequencing (NGS): 22 new patients and 11 patients from the first phase without positive findings. Overall, deleterious or putatively deleterious gene variants were identified in 19 patients (31.7%), including four NPHS1 variants among five patients and three PLCE1 variants among four other patients. Variants in COL4A3, COL4A4, or COL4A5 were found in six patients. Eight novel variants were identified, including two in NPHS1, two in PLCE1, and one each in NPHS2, LAMB2, COL4A3, and COL4A4. Of the children with variants, 55.6% failed to respond to immunosuppressive agent therapy, while the resistance rate in children without variants was 44.4%. Our results show that screening for deleterious variants in some common genes in children clinically suspected of having SRNS might be helpful for disease diagnosis as well as prediction of treatment efficacy and prognosis. Copyright © 2017 Elsevier B.V. All rights reserved.
New mutations in non-syndromic primary ovarian insufficiency patients identified via whole-exome sequencing.

PubMed

Patiño, Liliana Catherine; Beau, Isabelle; Carlosama, Carolina; Buitrago, July Constanza; González, Ronald; Suárez, Carlos Fernando; Patarroyo, Manuel Alfonso; Delemer, Brigitte; Young, Jacques; Binart, Nadine; Laissue, Paul

2017-07-01

Is it possible to identify new mutations potentially associated with non-syndromic primary ovarian insufficiency (POI) via whole-exome sequencing (WES)? WES is an efficient tool to study genetic causes of POI as we have identified new mutations, some of which lead to protein destabilization potentially contributing to the disease etiology. POI is a frequently occurring complex pathology leading to infertility. Mutations in only a few candidate genes, mainly identified by Sanger sequencing, have been definitively related to the pathogenesis of the disease. This is a retrospective cohort study performed on 69 women affected by POI. WES and an innovative bioinformatics analysis were used on non-synonymous sequence variants in a subset of 420 selected POI candidate genes. Mutations in BMPR1B and GREM1 were modeled by using fragment molecular orbital analysis. Fifty-five coding variants in 49 genes potentially related to POI were identified in 33 out of 69 patients (48%). These genes participate in key biological processes in the ovary, such as meiosis, follicular development, granulosa cell differentiation/proliferation and ovulation. The presence of at least two mutations in distinct genes in 42% of the patients argued in favor of a polygenic nature of POI. It is possible that regulatory regions, not analyzed in the present study, carry further variants related to POI. WES and the in silico analyses presented here represent an efficient approach for mapping variants associated with POI etiology. Sequence variants presented here represent potential future genetic biomarkers. This study was supported by the Universidad del Rosario and Colciencias (Grants CS/CIGGUR-ABN062-2016 and 672-2014). Colciencias supported Liliana Catherine Patiño's work (Fellowship: 617, 2013). The authors declare no conflict of interest. © The Author 2017. Published by Oxford University Press on behalf of the European Society of Human Reproduction and Embryology. All rights reserved.
Whole-Exome Sequencing in Age-Related Macular Degeneration Identifies Rare Variants in COL8A1, a Component of Bruch's Membrane.

PubMed

Corominas, Jordi; Colijn, Johanna M; Geerlings, Maartje J; Pauper, Marc; Bakker, Bjorn; Amin, Najaf; Lores Motta, Laura; Kersten, Eveline; Garanto, Alejandro; Verlouw, Joost A M; van Rooij, Jeroen G J; Kraaij, Robert; de Jong, Paulus T V M; Hofman, Albert; Vingerling, Johannes R; Schick, Tina; Fauser, Sascha; de Jong, Eiko K; van Duijn, Cornelia M; Hoyng, Carel B; Klaver, Caroline C W; den Hollander, Anneke I

2018-04-26

Genome-wide association studies and targeted sequencing studies of candidate genes have identified common and rare variants that are associated with age-related macular degeneration (AMD). Whole-exome sequencing (WES) studies allow a more comprehensive analysis of rare coding variants across all genes of the genome and will contribute to a better understanding of the underlying disease mechanisms. To date, the number of WES studies in AMD case-control cohorts remains scarce and sample sizes are limited. To scrutinize the role of rare protein-altering variants in AMD cause, we performed the largest WES study in AMD to date in a large European cohort consisting of 1125 AMD patients and 1361 control participants. Genome-wide case-control association study of WES data. One thousand one hundred twenty-five AMD patients and 1361 control participants. A single variant association test of WES data was performed to detect variants that are associated individually with AMD. The cumulative effect of multiple rare variants within 1 gene was analyzed using a gene-based CMC burden test. Immunohistochemistry was performed to determine the localization of the Col8a1 protein in mouse eyes. Genetic variants associated with AMD. We detected significantly more rare protein-altering variants in the COL8A1 gene in patients (22/2250 alleles [1.0%]) than in control participants (11/2722 alleles [0.4%]; P = 7.07×10⁻⁵). The association of rare variants in the COL8A1 gene is independent of the common intergenic variant (rs140647181) near the COL8A1 gene previously associated with AMD. We demonstrated that the Col8a1 protein localizes at Bruch's membrane. This study supported a role for protein-altering variants in the COL8A1 gene in AMD pathogenesis. We demonstrated the presence of Col8a1 in Bruch's membrane, further supporting the role of COL8A1 variants in AMD pathogenesis.
Protein-altering variants in COL8A1 may alter the integrity of Bruch's membrane, contributing to the accumulation of drusen and the development of AMD. Copyright © 2018 American Academy of Ophthalmology. Published by Elsevier Inc. All rights reserved.
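The allele counts in the COL8A1 abstract above follow from two alleles per participant, and the arithmetic can be checked directly. The sketch below is an illustration under stated assumptions, not the study's method: the helper names are hypothetical, and the two-sided Fisher exact test on raw allele counts is a swapped-in simple test. The abstract's P value presumably comes from the gene-based burden analysis of the WES data, so the P value computed here should not be expected to reproduce it.

```python
from math import comb


def allele_frequency(variant_alleles, participants):
    """Return (frequency, total alleles), assuming a diploid autosomal gene."""
    total_alleles = 2 * participants
    return variant_alleles / total_alleles, total_alleles


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables (with the same
    margins) that are no more likely than the observed one.
    """
    row1 = a + b
    col1 = a + c
    n = a + b + c + d
    denominator = comb(n, col1)

    def probability(x):
        return comb(row1, x) * comb(n - row1, col1 - x) / denominator

    observed = probability(a)
    x_min = max(0, col1 - (n - row1))
    x_max = min(row1, col1)
    tolerance = 1 + 1e-9  # guards against floating-point ties
    return sum(
        p for p in (probability(x) for x in range(x_min, x_max + 1))
        if p <= observed * tolerance
    )


# Counts from the abstract: 22 variant alleles in 1125 patients,
# 11 variant alleles in 1361 control participants.
patient_freq, patient_alleles = allele_frequency(22, 1125)
control_freq, control_alleles = allele_frequency(11, 1361)

p_value = fisher_exact_two_sided(
    22, patient_alleles - 22,
    11, control_alleles - 11,
)

print(f"Patients: 22/{patient_alleles} alleles ({100 * patient_freq:.1f}%)")
print(f"Controls: 11/{control_alleles} alleles ({100 * control_freq:.1f}%)")
print(f"Two-sided Fisher exact P (allele counts only): {p_value:.3g}")
```

Counting alleles rather than carriers treats the two alleles of one person as independent, which is a simplification that a burden test on individual-level data does not need.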
A gene variation of 14-3-3 zeta isoform in rat hippocampus.

PubMed

Murakami, K; Situ, S Y; Eshete, F

1996-11-14

A variant form of 14-3-3 zeta was isolated from the rat hippocampal cDNA library. The cloned cDNA is 1687 bp in length and it contains an entire ORF (nt = 63-797) with 245 amino acids that is characteristic to 14-3-3 zeta subtype. By comparing with reported sequences of 14-3-3 zeta, we found three nucleotide substitutions within the coding sequence in our clone; C<-->T transition at nt = 325 and G<-->C transversions at nt = 387 and 388. Both are missense mutations, leading ACG (Thr) to ATG (Met) and CGT (Arg) to GCT (Ala) conversions at residue 88 and 109, respectively. Our results show that at least three different genetic variants of 14-3-3 zeta are present in rat species which results in protein variations. Such mutation in the amino acid sequence is an important indication of the diverse functions of this protein and may also contribute to the recent contradictory observations regarding the role of the 14-3-3 zeta subtype.
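The nucleotide positions and residue numbers in the 14-3-3 zeta abstract above are consistent with an ORF that begins at nt 63, and the mapping can be checked with a few lines. This is a minimal sketch: the function names are hypothetical, and the codon table lists only the four codons mentioned in the abstract.

```python
ORF_START = 63   # first nucleotide of the ORF (nt = 63-797 in the abstract)
ORF_END = 797

# Only the codons named in the abstract; not a full genetic code table.
CODON_TO_AA = {"ACG": "Thr", "ATG": "Met", "CGT": "Arg", "GCT": "Ala"}


def residue_number(nt_position, orf_start=ORF_START):
    """1-based residue number for a 1-based cDNA nucleotide position."""
    if nt_position < orf_start:
        raise ValueError("position lies upstream of the ORF")
    return (nt_position - orf_start) // 3 + 1


def codon_offset(nt_position, orf_start=ORF_START):
    """Position within the codon (1, 2 or 3)."""
    return (nt_position - orf_start) % 3 + 1


def substitute(codon, changes):
    """Apply (offset, new_base) substitutions to a codon string."""
    bases = list(codon)
    for offset, new_base in changes:
        bases[offset - 1] = new_base
    return "".join(bases)


# nt 325 is the second base of codon 88: ACG (Thr) -> ATG (Met).
met_codon = substitute("ACG", [(codon_offset(325), "T")])

# nt 387 and 388 are the first two bases of codon 109: CGT (Arg) -> GCT (Ala).
ala_codon = substitute("CGT", [(codon_offset(387), "G"), (codon_offset(388), "C")])

orf_codons = (ORF_END - ORF_START + 1) // 3

print(residue_number(325), CODON_TO_AA["ACG"], "->", CODON_TO_AA[met_codon])
print(residue_number(387), CODON_TO_AA["CGT"], "->", CODON_TO_AA[ala_codon])
print("ORF length in codons:", orf_codons)
```

The three substitutions therefore fall in two codons, which matches the two amino acid changes reported at residues 88 and 109.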
Rare ATAD5 missense variants in breast and ovarian cancer patients.

PubMed

Maleva Kostovska, Ivana; Wang, Jing; Bogdanova, Natalia; Schürmann, Peter; Bhuju, Sabin; Geffers, Robert; Dürst, Matthias; Liebrich, Clemens; Klapdor, Rüdiger; Christiansen, Hans; Park-Simon, Tjoung-Won; Hillemanns, Peter; Plaseska-Karanfilska, Dijana; Dörk, Thilo

2016-06-28

ATAD5/ELG1 is a protein crucially involved in replication and maintenance of genome stability. ATAD5 has recently been identified as a genomic risk locus for both breast and ovarian cancer through genome-wide association studies. We aimed to investigate the spectrum of coding ATAD5 germ-line mutations in hospital-based series of patients with triple-negative breast cancer or serous ovarian cancer compared with healthy controls. The ATAD5 coding and adjacent splice site regions were analyzed by targeted next-generation sequencing of DNA samples from 273 cancer patients, including 114 patients with triple-negative breast cancer and 159 patients with serous epithelial ovarian cancer, and from 276 healthy females. Among 42 different variants identified, twenty-two were rare missense substitutions, of which 14 were classified as pathogenic by at least one in silico prediction tool. Three of four novel missense substitutions (p.S354I, p.H974R and p.K1466N) were predicted to be pathogenic and were all identified in ovarian cancer patients. Overall, rare missense variants with predicted pathogenicity tended to be enriched in ovarian cancer patients (14/159) versus controls (11/276) (p = 0.05, 2df). While truncating germ-line variants in ATAD5 were not detected, it remains possible that several rare missense variants contribute to genetic susceptibility toward epithelial ovarian carcinomas. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Association between Rare Variants in AP4E1, a Component of Intracellular Trafficking, and Persistent Stuttering

PubMed Central

Raza, M. Hashim; Mattera, Rafael; Morell, Robert; Sainz, Eduardo; Rahn, Rachel; Gutierrez, Joanne; Paris, Emily; Root, Jessica; Solomon, Beth; Brewer, Carmen; Basra, M. Asim Raza; Khan, Shaheen; Riazuddin, Sheikh; Braun, Allen; Bonifacino, Juan S.; Drayna, Dennis

2015-01-01

Stuttering is a common, highly heritable neurodevelopmental disorder characterized by deficits in the volitional control of speech. Whole-exome sequencing identified two heterozygous AP4E1 coding variants, c.1549G>A (p.Val517Ile) and c.2401G>A (p.Glu801Lys), that co-segregate with persistent developmental stuttering in a large Cameroonian family, and we observed the same two variants in unrelated Cameroonians with persistent stuttering. We found 23 other rare variants, including predicted loss-of-function variants, in AP4E1 in unrelated stuttering individuals in Cameroon, Pakistan, and North America. The rate of rare variants in AP4E1 was significantly higher in unrelated Pakistani and Cameroonian stuttering individuals than in population-matched control individuals, and coding variants in this gene are exceptionally rare in the general sub-Saharan West African, South Asian, and North American populations. Clinical examination of the Cameroonian family members failed to identify any symptoms previously reported in rare individuals carrying homozygous loss-of-function mutations in this gene. AP4E1 encodes the ε subunit of the heterotetrameric (ε-β4-μ4-σ4) AP-4 complex, involved in protein sorting at the trans-Golgi network. We found that the μ4 subunit of AP-4 interacts with NAGPA, an enzyme involved in the synthesis of the mannose 6-phosphate signal that targets acid hydrolases to the lysosome and the product of a gene previously associated with stuttering. These findings implicate deficits in intracellular trafficking in persistent stuttering. PMID:26544806

Disease-associated variants in different categories of disease located in distinct regulatory elements.

PubMed

Ma, Meng; Ru, Ying; Chuang, Ling-Shiang; Hsu, Nai-Yun; Shi, Li-Song; Hakenberg, Jörg; Cheng, Wei-Yi; Uzilov, Andrew; Ding, Wei; Glicksberg, Benjamin S; Chen, Rong

2015-01-01

The invention of high-throughput sequencing technologies has led to the discovery of hundreds of thousands of genetic variants associated with thousands of human diseases. Many of these genetic variants are located outside the protein-coding regions, and as such, it is challenging to interpret their function by traditional genetic approaches. Recent genome-wide functional genomics studies, such as FANTOM5 and ENCODE, have uncovered a large number of regulatory elements across hundreds of different tissues or cell lines in the human genome. These findings provide an opportunity to study the interaction between regulatory elements and disease-associated genetic variants. Identifying these disease-related regulatory elements will shed light on how these variants regulate gene expression and ultimately result in disease formation and progression. In this study, we curated and categorized 27,558 Mendelian disease variants, 20,964 complex disease variants, 5,809 cancer-predisposing germline variants, and 43,364 recurrent cancer somatic mutations. Comparing these against nine different types of regulatory regions from the FANTOM5 and ENCODE projects, we found that different types of disease variants show distinctive propensities for particular regulatory elements. Mendelian disease variants and recurrent cancer somatic mutations are significantly enriched in promoter regions, 22-fold and 10-fold respectively (q<0.001), compared with an allele-frequency-matched genomic background. Separate from these two categories, cancer-predisposing germline variants are 27-fold enriched in histone modification regions (q<0.001), 10-fold enriched in chromatin physical interaction regions (q<0.001), and 6-fold enriched in transcription promoters (q<0.001). Furthermore, Mendelian disease variants and recurrent cancer somatic mutations share very similar distributions across types of functional effects.
We further found that regulatory regions are located within over 50% of coding exon regions. Transcription promoters, methylation regions, and transcription insulators have the highest density of disease variants, with 472, 239, and 72 disease variants per one million base pairs, respectively. Disease-associated variants in different disease categories are preferentially located in particular regulatory elements. These results will be useful for an overall understanding of the differences among the pathogenic mechanisms of various disease-associated variants.
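The two kinds of figures reported above, variants per million base pairs and fold enrichment over a matched background, are simple ratios. The sketch below shows one way such numbers could be computed; the function names and all counts in the usage lines are hypothetical and are not taken from the study, whose exact procedure is not described in the abstract.

```python
def density_per_mb(n_variants, region_bp):
    """Variants per one million base pairs of a regulatory element class."""
    return n_variants * 1_000_000 / region_bp

def fold_enrichment(hits, total, background_hits, background_total):
    """Ratio of the fraction of disease variants falling in a region type
    to the fraction of background variants falling in the same region type."""
    return (hits / total) / (background_hits / background_total)

# Hypothetical counts, for illustration only.
print(density_per_mb(n_variants=944, region_bp=2_000_000))   # variants per Mb
print(fold_enrichment(hits=220, total=1_000,
                      background_hits=10, background_total=1_000))
```

Matching the background set on allele frequency, as the abstract describes, matters because rare and common variants are distributed differently across the genome.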
Disease-associated variants in different categories of disease located in distinct regulatory elements

PubMed Central

2015-01-01

Background The invention of high-throughput sequencing technologies has led to the discovery of hundreds of thousands of genetic variants associated with thousands of human diseases. Many of these genetic variants are located outside the protein-coding regions, and as such, it is challenging to interpret their function by traditional genetic approaches. Recent genome-wide functional genomics studies, such as FANTOM5 and ENCODE, have uncovered a large number of regulatory elements across hundreds of different tissues or cell lines in the human genome. These findings provide an opportunity to study the interaction between regulatory elements and disease-associated genetic variants. Identifying these disease-related regulatory elements will shed light on how these variants regulate gene expression and ultimately result in disease formation and progression. Results In this study, we curated and categorized 27,558 Mendelian disease variants, 20,964 complex disease variants, 5,809 cancer-predisposing germline variants, and 43,364 recurrent cancer somatic mutations. Comparing these against nine different types of regulatory regions from the FANTOM5 and ENCODE projects, we found that different types of disease variants show distinctive propensities for particular regulatory elements. Mendelian disease variants and recurrent cancer somatic mutations are significantly enriched in promoter regions, 22-fold and 10-fold respectively (q<0.001), compared with an allele-frequency-matched genomic background. Separate from these two categories, cancer-predisposing germline variants are 27-fold enriched in histone modification regions (q<0.001), 10-fold enriched in chromatin physical interaction regions (q<0.001), and 6-fold enriched in transcription promoters (q<0.001). Furthermore, Mendelian disease variants and recurrent cancer somatic mutations share very similar distributions across types of functional effects.
We further found that regulatory regions are located within over 50% of coding exon regions. Transcription promoters, methylation regions, and transcription insulators have the highest density of disease variants, with 472, 239, and 72 disease variants per one million base pairs, respectively. Conclusions Disease-associated variants in different disease categories are preferentially located in particular regulatory elements. These results will be useful for an overall understanding of the differences among the pathogenic mechanisms of various disease-associated variants. PMID:26110593
Novel oxytocin receptor variants in laboring women requiring high doses of oxytocin.

PubMed

Reinl, Erin L; Goodwin, Zane A; Raghuraman, Nandini; Lee, Grace Y; Jo, Erin Y; Gezahegn, Beakal M; Pillai, Meghan K; Cahill, Alison G; de Guzman Strong, Cristina; England, Sarah K

2017-08-01

Although oxytocin is commonly used to augment or induce labor, it is difficult to predict its effectiveness because oxytocin dose requirements vary significantly among women. One possibility is that women requiring high or low doses of oxytocin have variations in the oxytocin receptor gene. We aimed to identify oxytocin receptor gene variants in laboring women with low and high oxytocin dosage requirements. Term, nulliparous women requiring oxytocin doses of ≤4 mU/min (low-dose-requiring, n = 83) or ≥20 mU/min (high-dose-requiring, n = 104) for labor augmentation or induction provided consent to a postpartum blood draw as a source of genomic DNA. Targeted-amplicon sequencing (coverage >30×) with MiSeq (Illumina) was performed to discover variants in the coding exons of the oxytocin receptor gene. Baseline relevant clinical history, outcomes, demographics, and oxytocin receptor gene sequence variants and their allele frequencies were compared between low-dose-requiring and high-dose-requiring women. The Sorting Intolerant From Tolerant (SIFT) algorithm was used to predict the effect of variants on oxytocin receptor function. The Fisher exact or χ² tests were used for categorical variables, and Student t tests or Wilcoxon rank sum tests were used for continuous variables. A P value < .05 was considered statistically significant. The high-dose-requiring women had greater rates of obesity and diabetes and were more likely to have undergone labor induction and required prostaglandins. High-dose-requiring women were more likely to undergo cesarean delivery for first-stage arrest and less likely to undergo cesarean delivery for nonreassuring fetal status. Targeted sequencing of the oxytocin receptor gene in the total cohort (n = 187) revealed 30 distinct coding variants: 17 nonsynonymous, 11 synonymous, and 2 small structural variants. One novel variant (A243T) was found in both the low- and high-dose-requiring groups.
Three novel variants (Y106H, A240_A249del, and P197delfs*206) resulting in an amino acid substitution, loss of 9 amino acids, and a frameshift stop mutation, respectively, were identified only in low-dose-requiring women. Nine nonsynonymous variants were unique to the high-dose-requiring group. These included 3 known variants (R151C, G221S, and W228C) and 6 novel variants (M133V, R150L, H173R, A248V, G253R, and I266V). Of these, R150L, R151C, and H173R were predicted by the Sorting Intolerant From Tolerant algorithm to damage oxytocin receptor function. There was no statistically significant association between the numbers of synonymous and nonsynonymous substitutions in the patient groups. Obesity, diabetes, and labor induction were associated with the requirement for high doses of oxytocin. We did not identify significant differences in the prevalence of oxytocin receptor variants between low-dose-requiring and high-dose-requiring women, but novel oxytocin receptor variants were enriched in the high-dose-requiring women. We also found 3 oxytocin receptor variants (2 novel, 1 known) that were predicted to damage oxytocin receptor function and would likely increase an individual's risk for requiring a high oxytocin dose. Further investigation of oxytocin receptor variants and their effects on protein function will inform precision medicine in pregnant women. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.
Possible role of rare variants in Trace amine associated receptor 1 in schizophrenia.

PubMed

John, Jibin; Kukshal, Prachi; Bhatia, Triptish; Chowdari, K V; Nimgaonkar, V L; Deshpande, S N; Thelma, B K

2017-11-01

Schizophrenia (SZ) is a chronic mental illness with behavioral abnormalities. Recent common-variant-based genome-wide association studies and rare variant detection using next-generation sequencing approaches have identified numerous variants that confer risk for SZ, but the etiology remains unclear, propelling continuing investigations. Using whole exome sequencing, we identified a rare heterozygous variant (c.545G>T; p.Cys182Phe) in the Trace amine associated receptor 1 gene (TAAR1; 6q23.2) in three affected members of a small SZ family. The variant, predicted to be damaging by 15 prediction tools, causes breakage of a conserved disulfide bond in this G-protein-coupled receptor. On screening this intronless gene for additional variant(s) in ~800 sporadic SZ patients, we identified six rare protein-altering variants (MAF<0.001), namely p.Ser47Cys, p.Phe51Leu, p.Tyr294Ter, and p.Leu295Ser in four unrelated north Indian cases (n=475), and p.Ala109Thr and p.Val250Ala in two independent Caucasian/African-American patients (n=310). Five of these variants were also predicted to be damaging. In addition, a rare synonymous variant was observed in SZ patients. These rare variants were absent in north Indian healthy controls (n=410) but significantly enriched in patients (p=0.036). Conversely, three common coding SNPs (rs8192621, rs8192620 and rs8192619) and a promoter SNP (rs60266355) tested for association with SZ in the north Indian cohort were not significant (P>0.05). TAAR1 is a modulator of monoaminergic pathways and interacts with AKT signaling pathways. Substantial animal-model-based pharmacological and functional data implying its relevance in SZ are also available. However, this is the first report suggestive of the likely contribution of rare variants in this gene to SZ. Copyright © 2017 Elsevier B.V. All rights reserved.
Autism Linked to Increased Oncogene Mutations but Decreased Cancer Rate

PubMed Central

Zimmerman, M. Bridget; Mahajan, Vinit B.; Bassuk, Alexander G.

2016-01-01

Autism spectrum disorder (ASD) is one phenotypic aspect of many monogenic, hereditary cancer syndromes. Pleiotropic effects of cancer genes on the autism phenotype could lead to repurposing of oncology medications to treat this increasingly prevalent neurodevelopmental condition for which there is currently no treatment. To explore this hypothesis we sought to discover whether autistic patients more often have rare coding, single-nucleotide variants within tumor suppressor and oncogenes and whether autistic patients are more often diagnosed with neoplasms. Exome-sequencing data from the ARRA Autism Sequencing Collaboration was compared to that of a control cohort from the Exome Variant Server database revealing that rare, coding variants within oncogenes were enriched in the ARRA ASD cohort (p < 1.0 × 10⁻⁸). In contrast, variants were not significantly enriched in tumor suppressor genes. Phenotypically, children and adults with ASD exhibited a protective effect against cancer, with a frequency of 1.3% vs. 3.9% (p<0.001), but the protective effect decreased with age. The odds ratio of neoplasm for those with ASD relative to controls was 0.06 (95% CI: 0.02, 0.19; p<0.0001) in the 0 to 14 age group; 0.35 (95% CI: 0.14, 0.87; p = 0.024) in the 15 to 29 age group; 0.41 (95% CI: 0.15, 1.17; p = 0.095) in the 30 to 54 age group; and 0.49 (95% CI: 0.14, 1.74; p = 0.267) in those 55 and older. Both males and females demonstrated the protective effect. These findings suggest that defects in cellular proliferation, and potentially senescence, might influence both autism and neoplasm, and already approved drugs targeting oncogenic pathways might also have therapeutic value for treating autism. PMID:26934580
Looking beyond the exome: a phenotype-first approach to molecular diagnostic resolution in rare and undiagnosed diseases

PubMed Central

Pena, Loren DM; Jiang, Yong-Hui; Schoch, Kelly; Spillmann, Rebecca C.; Walley, Nicole; Stong, Nicholas; Horn, Sarah Rapisardo; Sullivan, Jennifer A.; McConkie-Rosell, Allyn; Kansagra, Sujay; Smith, Edward C.; El-Dairi, Mays; Bellet, Jane; Ann Keels, Martha; Jasien, Joan; Kranz, Peter G.; Noel, Richard; Nagaraj, Shashi K.; Lark, Robert K.; Wechsler, Daniel SG; del Gaudio, Daniela; Leung, Marco L.; Hendon, Laura G.; Parker, Collette C.; Jones, Kelly L.; Goldstein, David B.; Shashi, Vandana

2017-01-01

Purpose To describe examples of missed pathogenic variants on whole exome sequencing (WES) and the importance of deep phenotyping for further diagnostic testing. Methods Guided by phenotypic information, three children with negative WES underwent targeted single gene testing. Results Individual 1 had a clinical diagnosis consistent with infantile systemic hyalinosis, although WES and an NGS-based ANTXR2 test were negative. Sanger sequencing of ANTXR2 revealed a homozygous single base pair insertion, previously missed by the WES variant caller software. Individual 2 had neurodevelopmental regression and cerebellar atrophy, with no diagnosis on WES. New clinical findings prompted Sanger sequencing and copy number testing of PLA2G6. A novel homozygous deletion of the non-coding exon 1 (not included in the WES capture kit) was detected, with extension into the promoter, confirming the clinical suspicion of infantile neuroaxonal dystrophy. Individual 3 had progressive ataxia, spasticity and MRI changes of vanishing white matter leukoencephalopathy. An NGS leukodystrophy gene panel and WES showed a heterozygous pathogenic variant in EIF2B5; no deletions/duplications were detected. Sanger sequencing of EIF2B5 showed a frameshift indel, likely missed due to failure of alignment. Conclusions These cases illustrate potential pitfalls of WES/NGS testing, and the importance of phenotype-guided molecular testing in yielding diagnoses. PMID:28914269
Novel genes and mutations in patients affected by recurrent pregnancy loss.

PubMed

Quintero-Ronderos, Paula; Mercier, Eric; Fukuda, Michiko; González, Ronald; Suárez, Carlos Fernando; Patarroyo, Manuel Alfonso; Vaiman, Daniel; Gris, Jean-Christophe; Laissue, Paul

2017-01-01

Recurrent pregnancy loss is a frequently occurring human infertility-related disease affecting ~1% of women. It has been estimated that the cause remains unexplained in >50% of cases, which strongly suggests that genetic factors may contribute towards the phenotype. Numerous studies of its molecular aetiology have had limited success in identifying the disease's genetic causes. This might be because hundreds of genes are involved in each physiological step necessary for guaranteeing reproductive success in mammals. In such a scenario, next generation sequencing provides a potentially interesting tool for research into recurrent pregnancy loss causative mutations. The present study involved, for the first time, whole-exome sequencing and an innovative bioinformatics analysis in 49 unrelated women affected by recurrent pregnancy loss. We identified 27 coding variants (22 genes) potentially related to the phenotype (41% of patients). The affected genes, which were enriched in potentially deleterious sequence variants, belonged to distinct molecular cascades playing key roles in implantation/pregnancy biology. Using a quantum chemical approach we established that mutations in MMP-10 and FGA proteins led to substantial energetic modifications, suggesting an impact on their functions and/or stability. The next generation sequencing and bioinformatics approaches presented here represent an efficient way to find mutations, having potentially moderate/strong functional effects, associated with recurrent pregnancy loss aetiology. We consider that some of these variants (and genes) represent probable future biomarkers for recurrent pregnancy loss.
RNA splicing. The human splicing code reveals new insights into the genetic determinants of disease.

PubMed

Xiong, Hui Y; Alipanahi, Babak; Lee, Leo J; Bretschneider, Hannes; Merico, Daniele; Yuen, Ryan K C; Hua, Yimin; Gueroussov, Serge; Najafabadi, Hamed S; Hughes, Timothy R; Morris, Quaid; Barash, Yoseph; Krainer, Adrian R; Jojic, Nebojsa; Scherer, Stephen W; Blencowe, Benjamin J; Frey, Brendan J

2015-01-09

To facilitate precision medicine and whole-genome annotation, we developed a machine-learning technique that scores how strongly genetic variants affect RNA splicing, whose alteration contributes to many diseases. Analysis of more than 650,000 intronic and exonic variants revealed widespread patterns of mutation-driven aberrant splicing. Intronic disease mutations that are more than 30 nucleotides from any splice site alter splicing nine times as often as common variants, and missense exonic disease mutations that have the least impact on protein function are five times as likely as others to alter splicing. We detected tens of thousands of disease-causing mutations, including those involved in cancers and spinal muscular atrophy. Examination of intronic and exonic variants found using whole-genome sequencing of individuals with autism revealed misspliced genes with neurodevelopmental phenotypes. Our approach provides evidence for causal variants and should enable new discoveries in precision medicine. Copyright © 2015, American Association for the Advancement of Science.
Non-Coding Keratin Variants Associate with Liver Fibrosis Progression in Patients with Hemochromatosis

PubMed Central

Lunova, Mariia; Guldiken, Nurdan; Lienau, Tim C.; Stickel, Felix; Omary, M. Bishr

2012-01-01

Background Keratins 8 and 18 (K8/K18) are intermediate filament proteins that protect the liver from various forms of injury. Exonic K8/K18 variants associate with adverse outcome in acute liver failure and with liver fibrosis progression in patients with chronic hepatitis C infection or primary biliary cirrhosis. Given the association of K8/K18 variants with end-stage liver disease and progression in several chronic liver disorders, we studied the importance of keratin variants in patients with hemochromatosis. Methods The entire K8/K18 exonic regions were analyzed in 162 hemochromatosis patients carrying homozygous C282Y HFE (hemochromatosis gene) mutations. 234 liver-healthy subjects were used as controls. Exonic regions were PCR-amplified and analyzed using denaturing high-performance liquid chromatography and DNA sequencing. Previously-generated transgenic mice overexpressing K8 G62C were studied for their susceptibility to iron overload. Susceptibility to iron toxicity of primary hepatocytes that express K8 wild-type and G62C was also assessed. Results We identified amino-acid-altering keratin heterozygous variants in 10 of 162 hemochromatosis patients (6.2%) and non-coding heterozygous variants in 6 additional patients (3.7%). Two novel K8 variants (Q169E/R275W) were found. K8 R341H was the most common amino-acid altering variant (4 patients), and exclusively associated with an intronic KRT8 IVS7+10delC deletion. Intronic, but not amino-acid-altering variants associated with the development of liver fibrosis. In mice, or ex vivo, the K8 G62C variant did not affect iron-accumulation in response to iron-rich diet or the extent of iron-induced hepatocellular injury. Conclusion In patients with hemochromatosis, intronic but not exonic K8/K18 variants associate with liver fibrosis development. PMID:22412904
Sequence and RT-PCR expression analysis of two peroxidases from Arabidopsis thaliana belonging to a novel evolutionary branch of plant peroxidases.

PubMed

Kjaersgård, I V; Jespersen, H M; Rasmussen, S K; Welinder, K G

1997-03-01

cDNA clones encoding two new Arabidopsis thaliana peroxidases, ATP 1a and ATP 2a, have been identified by searching the Arabidopsis database of expressed sequence tags (dbEST). They represent a novel branch of hitherto uncharacterized plant peroxidases which is only 35% identical in amino acid sequence to the well characterized group of basic plant peroxidases represented by the horseradish (Armoracia rusticana) isoperoxidases HRP C, HRP E5 and the similar Arabidopsis isoperoxidases ATP Ca, ATP Cb, and ATP Ea. However, ATP 1a is 87% identical in amino acid sequence to a peroxidase encoded by an mRNA isolated from cotton (Gossypium hirsutum). As cotton and Arabidopsis belong to rather diverse families (Malvaceae and Cruciferae, respectively), in contrast with Arabidopsis and horseradish (both Cruciferae), the high degree of sequence identity indicates that this novel type of peroxidase, albeit of unknown function, is likely to be widespread in plant species. The atp 1 and atp 2 types of cDNA sequences were the most redundant among the 28 different isoperoxidases identified among about 200 peroxidase encoding ESTs. Interestingly, 8 out of a total of 38 EST sequences coding for ATP 1 showed three identical nucleotide substitutions. This variant form is designated ATP 1b. Similarly, six out of a total of 16 EST sequences coding for ATP 2 showed a number of deletions and nucleotide changes. This variant form is designated ATP 2b. The selected EST clones are full-length and contain coding regions of 993 nucleotides for atp 1a, and 984 nucleotides for atp 2a. These regions show 61% DNA sequence identity. The predicted mature proteins ATP 1a and ATP 2a are 57% identical in sequence and contain the structurally and functionally important residues characteristic of the plant peroxidase superfamily.
However, they do show two differences of importance to peroxidase catalysis: (1) the asparagine residue linked with the active site distal histidine via hydrogen bonding is absent; (2) an N-glycosylation site is located right at the entrance to the heme channel. The reverse transcriptase polymerase chain reaction (RT-PCR) was used to identify mRNAs coding for ATP 1a/b and ATP 2a/b in germinating seeds, seedlings, roots, leaves, stems, flowers and cell suspension culture using elongation factor 1alpha (EF-1alpha) for the first time as a positive control. Both mRNAs were transcribed at levels comparable to EF-1alpha in all plant tissues investigated which were more than two days old, and in cell suspension culture. In addition, the mRNA coding for ATP 1a/b was found in two day old germinating seeds. The abundant transcription of ATP 1a/b and ATP 2a/b is in line with their many entries in dbEST, and indicates essential roles for these novel peroxidases.
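The identity percentages quoted in this abstract (35%, 57%, 61%, 87%) are the kind of figure obtained by counting matching positions in a pairwise alignment. The sketch below is a minimal illustration under one common convention, which is an assumption here: identity is computed over aligned columns where neither sequence has a gap. The sequences are toy strings, not the peroxidase sequences, and the abstract does not state which convention the authors used.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over gap-free columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # skip columns containing a gap in either sequence
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared

# Toy aligned sequences (hypothetical).
print(percent_identity("AC-GT", "ACTGA"))  # 75.0
```

Other conventions divide by the full alignment length or by the shorter sequence length, which can shift the reported percentage noticeably for gapped alignments.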
Barley whole exome capture: a tool for genomic research in the genus Hordeum and beyond

PubMed Central

Mascher, Martin; Richmond, Todd A; Gerhardt, Daniel J; Himmelbach, Axel; Clissold, Leah; Sampath, Dharanya; Ayling, Sarah; Steuernagel, Burkhard; Pfeifer, Matthias; D'Ascenzo, Mark; Akhunov, Eduard D; Hedley, Pete E; Gonzales, Ana M; Morrell, Peter L; Kilian, Benjamin; Blattner, Frank R; Scholz, Uwe; Mayer, Klaus FX; Flavell, Andrew J; Muehlbauer, Gary J; Waugh, Robbie; Jeddeloh, Jeffrey A; Stein, Nils

2013-01-01

Advanced resources for genome-assisted research in barley (Hordeum vulgare) including a whole-genome shotgun assembly and an integrated physical map have recently become available. These have made possible studies that aim to assess genetic diversity or to isolate single genes by whole-genome resequencing and in silico variant detection. However, such an approach remains expensive given the 5 Gb size of the barley genome. Targeted sequencing of the mRNA-coding exome reduces barley genomic complexity more than 50-fold, thus dramatically reducing this heavy sequencing and analysis load. We have developed and employed an in-solution hybridization-based sequence capture platform to selectively enrich for a 61.6 megabase coding sequence target that includes predicted genes from the genome assembly of the cultivar Morex as well as publicly available full-length cDNAs and de novo assembled RNA-Seq consensus sequence contigs. The platform provides a highly specific capture with substantial and reproducible enrichment of targeted exons, both for cultivated barley and related species. We show that this exome capture platform provides a clear path towards a broader and deeper understanding of the natural variation residing in the mRNA-coding part of the barley genome and will thus constitute a valuable resource for applications such as mapping-by-sequencing and genetic diversity analyses. PMID:23889683
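The "more than 50-fold" reduction in complexity follows directly from the two sizes given in the abstract. As a quick check, assuming the genome is taken as exactly 5 Gb (the abstract gives only this rounded figure) and the capture target as 61.6 Mb:

```python
GENOME_BP = 5_000_000_000   # approximate barley genome size from the abstract
TARGET_BP = 61_600_000      # exome capture target size from the abstract

fold_reduction = GENOME_BP / TARGET_BP
target_fraction = TARGET_BP / GENOME_BP

print(f"fold reduction: {fold_reduction:.1f}")          # about 81-fold
print(f"target share of genome: {target_fraction:.2%}")  # about 1.23%
```

The result, roughly 81-fold, is consistent with the abstract's conservative "more than 50-fold" statement.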
Reprogramming neurodegeneration in the big data era.

PubMed

Zhou, Lujia; Verstreken, Patrik

2018-02-01

Recent genome-wide association studies (GWAS) have identified numerous genetic risk variants for late-onset Alzheimer's disease (AD) and Parkinson's disease (PD). However, deciphering the functional consequences of GWAS data is challenging due to a lack of reliable model systems to study the genetic variants that are often of low penetrance and non-coding identities. Pluripotent stem cell (PSC) technologies offer unprecedented opportunities for molecular phenotyping of GWAS variants in human neurons and microglia. Moreover, rapid technological advances in whole-genome RNA-sequencing and epigenome mapping fuel comprehensive and unbiased investigations of molecular alterations in PSC-derived disease models. Here, we review and discuss how integrated studies that utilize PSC technologies and genome-wide approaches may bring new mechanistic insight into the pathogenesis of AD and PD. Copyright © 2018 Elsevier Ltd. All rights reserved.
A new genetic variant in the Sp1 binding cis-element of cholecystokinin gene promoter region and relationship to alcoholism.

PubMed

Harada, S; Okubo, T; Tsutsumi, M; Takase, S; Muramatsu, T

1998-05-01

The neuropeptide cholecystokinin (CCK) and the CCK receptors in the central nervous system mediate effects including increased neuronal firing, anxiety, and nociception. Furthermore, CCK modulates the release of dopamine and dopamine-related behaviors in the mesolimbic pathway. In our study, genetic variation in the promoter and coding regions of the prepro-CCK gene was analyzed among 66 Japanese, 66 American Whites, 54 Chinese, and 41 Colombian natives. Two nucleotide sequence variants were found: a frequent C to T mutation at nucleotide position -45, within the core sequence of the Sp1 binding cis-element of the promoter region, and a C to T substitution at position 1662 in intron 2. Segregation analysis in 10 families of twins confirmed codominant inheritance of the two alleles. The distribution of genotypes and gene frequencies in 66 controls and 108 alcoholics in Japan showed that the T allele was more frequent in alcoholics than in controls, and the genotype distribution differed significantly between the two groups.
Misregulation effect of a novel allelic variant in the Z promoter region found in cis with the CYP21A2 p.P482S mutation: implications for 21-hydroxylase deficiency.

PubMed

Fernández, Cecilia S; Bruque, Carlos D; Taboas, Melisa; Buzzalino, Noemí D; Espeche, Lucia D; Pasqualini, Titania; Charreau, Eduardo H; Alba, Liliana G; Ghiringhelli, Pablo D; Dain, Liliana

2015-09-01

The aim of the current study was to search for genetic variants in the CYP21A2 Z promoter regulatory region in patients with congenital adrenal hyperplasia due to 21-hydroxylase deficiency. Screening of the 10 most frequent pseudogene-derived mutations was followed by direct sequencing of the entire coding sequence, the proximal promoter, and a distal regulatory region in DNA samples from patients with at least one non-determined allele. We report three non-classical patients who presented a novel genetic variant, g.15626A>G, within the Z promoter regulatory region. In all the patients, the novel variant was found in cis with the mild, less frequent p.P482S mutation located in exon 10 of the CYP21A2 gene. The putative pathogenic implication of the novel variant was assessed by in silico analyses and in vitro assays. Topological analyses showed differences in the curvature and bendability of the DNA region bearing the novel variant. Functional studies showed significantly decreased activity of a reporter gene placed downstream of the regulatory region carrying the G variant. Our results suggest that the activity of an allele bearing the p.P482S mutation may be influenced by the misregulated CYP21A2 transcriptional activity exerted by the Z promoter A>G variation.
Localized structural frustration for evaluating the impact of sequence variants.

PubMed

Kumar, Sushant; Clarke, Declan; Gerstein, Mark

2016-12-01

Population-scale sequencing is increasingly uncovering large numbers of rare single-nucleotide variants (SNVs) in coding regions of the genome. The rarity of these variants makes it challenging to evaluate their deleteriousness with conventional phenotype-genotype associations. Protein structures provide a way of addressing this challenge. Previous efforts have focused on globally quantifying the impact of SNVs on protein stability. However, local perturbations may severely impact protein functionality without strongly disrupting global stability (e.g. in relation to catalysis or allostery). Here, we describe a workflow in which localized frustration, quantifying unfavorable local interactions, is employed as a metric to investigate such effects. Using this workflow on the Protein Databank, we find that frustration produces many immediately intuitive results: for instance, disease-related SNVs create stronger changes in localized frustration than non-disease related variants, and rare SNVs tend to disrupt local interactions to a larger extent than common variants. Less obviously, we observe that somatic SNVs associated with oncogenes and tumor suppressor genes (TSGs) induce very different changes in frustration. In particular, those associated with TSGs change the frustration more in the core than the surface (by introducing loss-of-function events), whereas those associated with oncogenes manifest the opposite pattern, creating gain-of-function events. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Variability of Creatine Metabolism Genes in Children with Autism Spectrum Disorder.

PubMed

Cameron, Jessie M; Levandovskiy, Valeriy; Roberts, Wendy; Anagnostou, Evdokia; Scherer, Stephen; Loh, Alvin; Schulze, Andreas

2017-07-31

Creatine deficiency syndrome (CDS) comprises three separate enzyme deficiencies with overlapping clinical presentations: arginine:glycine amidinotransferase (GATM gene, glycine amidinotransferase), guanidinoacetate methyltransferase (GAMT gene), and creatine transporter deficiency (SLC6A8 gene, solute carrier family 6 member 8). CDS presents with developmental delays/regression, intellectual disability, speech and language impairment, autistic behaviour, epileptic seizures, treatment-refractory epilepsy, and extrapyramidal movement disorders; symptoms that are also evident in children with autism. The objective of the study was to test the hypothesis that genetic variability in creatine metabolism genes is associated with autism. We sequenced GATM, GAMT and SLC6A8 genes in 166 patients with autism (coding sequence, introns and adjacent untranslated regions). A total of 29, 16 and 25 variants were identified in each gene, respectively. Four variants were novel in GATM, and 5 in SLC6A8 (not present in the 1000 Genomes, Exome Sequencing Project (ESP) or Exome Aggregation Consortium (ExAC) databases). A single variant in each gene was identified as non-synonymous, and computationally predicted to be potentially damaging. Nine variants in GATM were shown to have a lower minor allele frequency (MAF) in the autism population than in the 1000 Genomes database, specifically in the East Asian population (Fisher's exact test). Two variants also had lower MAFs in the European population. In summary, there were no apparent associations of variants in GAMT and SLC6A8 genes with autism. The data implying there could be a lower association of some specific GATM gene variants with autism is an observation that would need to be corroborated in a larger group of autism patients, and with sub-populations of Asian ethnicities.
Overall, our findings suggest that the genetic variability of creatine synthesis/transport is unlikely to play a part in the pathogenesis of autism spectrum disorder (ASD) in children.
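The allele-frequency comparison in this abstract relies on Fisher's exact test. A minimal sketch of a two-sided test on a 2x2 table of allele counts is given here for orientation only. The function name `fisher_exact_two_sided` and the counts in the usage line are hypothetical, since the abstract reports no per-variant counts, and this is not the authors' code.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))

# Hypothetical counts: minor/major alleles in 166 patients (332 alleles)
# versus a made-up reference panel of 1,000 alleles.
p_value = fisher_exact_two_sided(5, 327, 60, 940)
```

Each row of the table is one population and each column one allele, so a small p-value indicates that the minor allele frequency differs between the two groups.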
Proteogenomic Investigation of Strain Variation in Clinical Mycobacterium tuberculosis Isolates.

PubMed

Heunis, Tiaan; Dippenaar, Anzaan; Warren, Robin M; van Helden, Paul D; van der Merwe, Ruben G; Gey van Pittius, Nicolaas C; Pain, Arnab; Sampson, Samantha L; Tabb, David L

2017-10-06

Mycobacterium tuberculosis consists of a large number of different strains that display unique virulence characteristics. Whole-genome sequencing has revealed substantial genetic diversity among clinical M. tuberculosis isolates, and elucidating the phenotypic variation encoded by this genetic diversity will be of the utmost importance to fully understand M. tuberculosis biology and pathogenicity. In this study, we integrated whole-genome sequencing and mass spectrometry (GeLC-MS/MS) to reveal strain-specific characteristics in the proteomes of two clinical M. tuberculosis Latin American-Mediterranean isolates. Using this approach, we identified 59 peptides containing single amino acid variants, which covered ∼9% of all coding nonsynonymous single nucleotide variants detected by whole-genome sequencing. Furthermore, we identified 29 distinct peptides that mapped to a hypothetical protein not present in the M. tuberculosis H37Rv reference proteome. Here, we provide evidence for the expression of this protein in the clinical M. tuberculosis SAWC3651 isolate. The strain-specific databases enabled confirmation of genomic differences (i.e., large genomic regions of difference and nonsynonymous single nucleotide variants) in these two clinical M. tuberculosis isolates and allowed strain differentiation at the proteome level. Our results contribute to the growing field of clinical microbial proteogenomics and can improve our understanding of phenotypic variation in clinical M. tuberculosis isolates.
Genetic Diversity in Oxytocin Ligands and Receptors in New World Monkeys

PubMed Central

Ren, Dongren; Lu, Guoqing; Moriyama, Hideaki; Mustoe, Aaryn C.; Harrison, Emily B.; French, Jeffrey A.

2015-01-01

Oxytocin (OXT) is an important neurohypophyseal hormone that influences a wide spectrum of reproductive and social processes. Eutherian mammals possess a highly conserved sequence of OXT (Cys-Tyr-Ile-Gln-Asn-Cys-Pro-Leu-Gly). However, in this study, we sequenced the coding region for OXT in 22 species covering all New World monkeys (NWM) genera and clades, and characterized five OXT variants, including consensus mammalian Leu8-OXT, major variant Pro8-OXT, and three previously unreported variants: Ala8-OXT, Thr8-OXT, and Phe2-OXT. Pro8-OXT shows clear structural and physicochemical differences from Leu8-OXT. We report multiple predicted amino acid substitutions in the G protein-coupled OXT receptor (OXTR), especially in the critical N-terminus, which is crucial for OXT recognition and binding. Genera with the same Pro8-OXT tend to cluster together on a phylogenetic tree based on OXTR sequence, and we demonstrate significant coevolution between OXT and OXTR. NWM species are characterized by a high incidence of social monogamy, and we document an association between OXTR phylogeny and social monogamy. Our results demonstrate remarkable genetic diversity in the NWM OXT/OXTR system, which can provide a foundation for molecular, pharmacological, and behavioral studies of the role of OXT signaling in regulating complex social phenotypes. PMID:25938568
De Novo Coding Variants Are Strongly Associated with Tourette Disorder.

PubMed

Willsey, A Jeremy; Fernandez, Thomas V; Yu, Dongmei; King, Robert A; Dietrich, Andrea; Xing, Jinchuan; Sanders, Stephan J; Mandell, Jeffrey D; Huang, Alden Y; Richer, Petra; Smith, Louw; Dong, Shan; Samocha, Kaitlin E; Neale, Benjamin M; Coppola, Giovanni; Mathews, Carol A; Tischfield, Jay A; Scharf, Jeremiah M; State, Matthew W; Heiman, Gary A

2017-05-03

Whole-exome sequencing (WES) and de novo variant detection have proven a powerful approach to gene discovery in complex neurodevelopmental disorders. We have completed WES of 325 Tourette disorder trios from the Tourette International Collaborative Genetics cohort and a replication sample of 186 trios from the Tourette Syndrome Association International Consortium on Genetics (511 total). We observe strong and consistent evidence for the contribution of de novo likely gene-disrupting (LGD) variants (rate ratio [RR] 2.32, p = 0.002). Additionally, de novo damaging variants (LGD and probably damaging missense) are overrepresented in probands (RR 1.37, p = 0.003). We identify four likely risk genes with multiple de novo damaging variants in unrelated probands: WWC1 (WW and C2 domain containing 1), CELSR3 (Cadherin EGF LAG seven-pass G-type receptor 3), NIPBL (Nipped-B-like), and FN1 (fibronectin 1). Overall, we estimate that de novo damaging variants in approximately 400 genes contribute risk in 12% of clinical cases. Copyright © 2017 Elsevier Inc. All rights reserved.
Identification of novel mutations and sequence variants in the SOX2 and CHX10 genes in patients with anophthalmia/microphthalmia

PubMed Central

Zhou, Jie; Kherani, Femida; Bardakjian, Tanya M.; Katowitz, James; Hughes, Nkecha; Schimmenti, Lisa A.; Schneider, Adele

2008-01-01

Purpose: Mutations in the SOX2 and CHX10 genes have been reported in patients with anophthalmia and/or microphthalmia. In this study, we evaluated 34 anophthalmic/microphthalmic patient DNA samples (two sets of siblings included) for mutations and sequence variants in SOX2 and CHX10. Methods: Conformational sensitive gel electrophoresis (CSGE) was used for the initial SOX2 and CHX10 screening of 34 affected individuals (two sets of siblings), five unaffected family members, and 80 healthy controls. Patient samples containing heteroduplexes were selected for sequence analysis. Base pair changes in SOX2 and CHX10 were confirmed by sequencing bidirectionally in patient samples. Results: Two novel heterozygous mutations and two sequence variants (one known) in SOX2 were identified in this cohort. Mutation c.310G>T (p.Glu104X), found in one patient, was in the region encoding the high mobility group (HMG) DNA-binding domain and resulted in a change from glutamic acid to a stop codon. The second mutation, noted in two affected siblings, was a single nucleotide deletion c.549delC (p.Pro184ArgfsX19) in the region encoding the activation domain, resulting in a frameshift and premature termination of the coding sequence. The shortened protein products may result in the loss of function. In addition, a novel nucleotide substitution c.*557G>A was identified in the 3′-untranslated region in one patient. The relationship between the nucleotide change and the protein function is indeterminate. A known single nucleotide polymorphism (c.*469C>A, SNP rs11915160) was also detected in 2 of the 34 patients. Screening of CHX10 identified two synonymous sequence variants, c.471C>T (p.Ser157Ser, rs35435463) and c.579G>A (p.Gln193Gln, novel SNP), and one non-synonymous sequence variant, c.871G>A (p.Asp291Asn, novel SNP). The non-synonymous polymorphism was also present in healthy controls, suggesting non-causality. Conclusions: These results support the role of SOX2 in ocular development. Loss of SOX2 function results in severe eye malformation. CHX10 was not implicated with microphthalmia/anophthalmia in our patient cohort. PMID:18385794

BMP15 c.-9C>G promoter sequence variant may contribute to the cause of non-syndromic premature ovarian failure.

PubMed

Fonseca, Dora Janeth; Ortega-Recalde, Oscar; Esteban-Perez, Clara; Moreno-Ortiz, Harold; Patiño, Liliana Catherine; Bermúdez, Olga María; Ortiz, Angela María; Restrepo, Carlos M; Lucena, Elkin; Laissue, Paul

2014-11-01

BMP15 has drawn particular attention in the pathophysiology of reproduction, as its mutations in mammalian species have been related to different reproductive phenotypes. In humans, BMP15 coding regions have been sequenced in large panels of women with premature ovarian failure (POF), but only some mutations have been definitely validated as causing the phenotype. A functional association between the BMP15 c.-9C>G promoter polymorphism and the cause of POF has been reported. The aim of this study was to determine the potential functional effect of this sequence variant on specific BMP15 promoter transactivation disturbances. Bioinformatics was used to identify transcription factor binding sites located on the promoter region of BMP15. Reverse transcription polymerase chain reaction was used to study specific gene expression in ovarian tissue. Luciferase reporter assays were used to establish transactivation disturbances caused by the BMP15 c.-9C>G variant. The c.-9C>G variant was found to modify the PITX1 transcription factor binding site. PITX1 and BMP15 were co-expressed in human and mouse ovarian tissue, and PITX1 transactivated both BMP15 promoter versions (-9C and -9G). It was found that the BMP15 c.-9G allele was related to increased BMP15 transcription, supporting c.-9C>G as a causal agent of POF. Copyright © 2014 Reproductive Healthcare Ltd. Published by Elsevier Ltd. All rights reserved.
Discovery of stimulation-responsive immune enhancers with CRISPR activation

PubMed Central

Simeonov, Dimitre R.; Gowen, Benjamin G.; Boontanrart, Mandy; Roth, Theodore L.; Gagnon, John D.; Mumbach, Maxwell R.; Satpathy, Ansuman T.; Lee, Youjin; Bray, Nicolas L.; Chan, Alice Y.; Lituiev, Dmytro S.; Nguyen, Michelle L.; Gate, Rachel E.; Subramaniam, Meena; Li, Zhongmei; Woo, Jonathan M.; Mitros, Therese; Ray, Graham J.; Curie, Gemma L.; Naddaf, Nicki; Chu, Julia S.; Ma, Hong; Boyer, Eric; Van Gool, Frederic; Huang, Hailiang; Liu, Ruize; Tobin, Victoria R.; Schumann, Kathrin; Daly, Mark J.; Farh, Kyle K; Ansel, K. Mark; Ye, Chun J.; Greenleaf, William J.; Anderson, Mark S.; Bluestone, Jeffrey A.; Chang, Howard Y.; Corn, Jacob E.; Marson, Alexander

2017-01-01

The majority of genetic variants associated with common human diseases map to enhancers, non-coding elements that shape cell-type-specific transcriptional programs and responses to extracellular cues. Systematic mapping of functional enhancers and their biological contexts is required to understand the mechanisms by which variation in non-coding genetic sequences contributes to disease. Functional enhancers can be mapped by genomic sequence disruption, but this approach is limited to the subset of enhancers that are necessary in the particular cellular context being studied. We hypothesized that recruitment of a strong transcriptional activator to an enhancer would be sufficient to drive target gene expression, even if that enhancer was not currently active in the assayed cells. Here we describe a discovery platform that can identify stimulus-responsive enhancers for a target gene independent of stimulus exposure. We used tiled CRISPR activation (CRISPRa) to synthetically recruit a transcriptional activator to sites across large genomic regions (more than 100 kilobases) surrounding two key autoimmunity risk loci, CD69 and IL2RA. We identified several CRISPRa-responsive elements with chromatin features of stimulus-responsive enhancers, including an IL2RA enhancer that harbours an autoimmunity risk variant. Using engineered mouse models, we found that sequence perturbation of the disease-associated Il2ra enhancer did not entirely block Il2ra expression, but rather delayed the timing of gene activation in response to specific extracellular signals. Enhancer deletion skewed polarization of naive T cells towards a pro-inflammatory T helper (TH17) cell state and away from a regulatory T cell state. This integrated approach identifies functional enhancers and reveals how non-coding variation associated with human immune dysfunction alters context-specific gene programs. PMID:28854172
Discovery of stimulation-responsive immune enhancers with CRISPR activation.

PubMed

Simeonov, Dimitre R; Gowen, Benjamin G; Boontanrart, Mandy; Roth, Theodore L; Gagnon, John D; Mumbach, Maxwell R; Satpathy, Ansuman T; Lee, Youjin; Bray, Nicolas L; Chan, Alice Y; Lituiev, Dmytro S; Nguyen, Michelle L; Gate, Rachel E; Subramaniam, Meena; Li, Zhongmei; Woo, Jonathan M; Mitros, Therese; Ray, Graham J; Curie, Gemma L; Naddaf, Nicki; Chu, Julia S; Ma, Hong; Boyer, Eric; Van Gool, Frederic; Huang, Hailiang; Liu, Ruize; Tobin, Victoria R; Schumann, Kathrin; Daly, Mark J; Farh, Kyle K; Ansel, K Mark; Ye, Chun J; Greenleaf, William J; Anderson, Mark S; Bluestone, Jeffrey A; Chang, Howard Y; Corn, Jacob E; Marson, Alexander

2017-09-07

The majority of genetic variants associated with common human diseases map to enhancers, non-coding elements that shape cell-type-specific transcriptional programs and responses to extracellular cues. Systematic mapping of functional enhancers and their biological contexts is required to understand the mechanisms by which variation in non-coding genetic sequences contributes to disease. Functional enhancers can be mapped by genomic sequence disruption, but this approach is limited to the subset of enhancers that are necessary in the particular cellular context being studied. We hypothesized that recruitment of a strong transcriptional activator to an enhancer would be sufficient to drive target gene expression, even if that enhancer was not currently active in the assayed cells. Here we describe a discovery platform that can identify stimulus-responsive enhancers for a target gene independent of stimulus exposure. We used tiled CRISPR activation (CRISPRa) to synthetically recruit a transcriptional activator to sites across large genomic regions (more than 100 kilobases) surrounding two key autoimmunity risk loci, CD69 and IL2RA. We identified several CRISPRa-responsive elements with chromatin features of stimulus-responsive enhancers, including an IL2RA enhancer that harbours an autoimmunity risk variant. Using engineered mouse models, we found that sequence perturbation of the disease-associated Il2ra enhancer did not entirely block Il2ra expression, but rather delayed the timing of gene activation in response to specific extracellular signals. Enhancer deletion skewed polarization of naive T cells towards a pro-inflammatory T helper (TH17) cell state and away from a regulatory T cell state. This integrated approach identifies functional enhancers and reveals how non-coding variation associated with human immune dysfunction alters context-specific gene programs.
Discovery of stimulation-responsive immune enhancers with CRISPR activation

NASA Astrophysics Data System (ADS)

Simeonov, Dimitre R.; Gowen, Benjamin G.; Boontanrart, Mandy; Roth, Theodore L.; Gagnon, John D.; Mumbach, Maxwell R.; Satpathy, Ansuman T.; Lee, Youjin; Bray, Nicolas L.; Chan, Alice Y.; Lituiev, Dmytro S.; Nguyen, Michelle L.; Gate, Rachel E.; Subramaniam, Meena; Li, Zhongmei; Woo, Jonathan M.; Mitros, Therese; Ray, Graham J.; Curie, Gemma L.; Naddaf, Nicki; Chu, Julia S.; Ma, Hong; Boyer, Eric; van Gool, Frederic; Huang, Hailiang; Liu, Ruize; Tobin, Victoria R.; Schumann, Kathrin; Daly, Mark J.; Farh, Kyle K.; Ansel, K. Mark; Ye, Chun J.; Greenleaf, William J.; Anderson, Mark S.; Bluestone, Jeffrey A.; Chang, Howard Y.; Corn, Jacob E.; Marson, Alexander

2017-09-01

The majority of genetic variants associated with common human diseases map to enhancers, non-coding elements that shape cell-type-specific transcriptional programs and responses to extracellular cues. Systematic mapping of functional enhancers and their biological contexts is required to understand the mechanisms by which variation in non-coding genetic sequences contributes to disease. Functional enhancers can be mapped by genomic sequence disruption, but this approach is limited to the subset of enhancers that are necessary in the particular cellular context being studied. We hypothesized that recruitment of a strong transcriptional activator to an enhancer would be sufficient to drive target gene expression, even if that enhancer was not currently active in the assayed cells. Here we describe a discovery platform that can identify stimulus-responsive enhancers for a target gene independent of stimulus exposure. We used tiled CRISPR activation (CRISPRa) to synthetically recruit a transcriptional activator to sites across large genomic regions (more than 100 kilobases) surrounding two key autoimmunity risk loci, CD69 and IL2RA. We identified several CRISPRa-responsive elements with chromatin features of stimulus-responsive enhancers, including an IL2RA enhancer that harbours an autoimmunity risk variant. Using engineered mouse models, we found that sequence perturbation of the disease-associated Il2ra enhancer did not entirely block Il2ra expression, but rather delayed the timing of gene activation in response to specific extracellular signals. Enhancer deletion skewed polarization of naive T cells towards a pro-inflammatory T helper (TH17) cell state and away from a regulatory T cell state. This integrated approach identifies functional enhancers and reveals how non-coding variation associated with human immune dysfunction alters context-specific gene programs.
Widespread signatures of local mRNA folding structure selection in four Dengue virus serotypes

PubMed Central

2015-01-01

Background: It is known that mRNA folding can affect and regulate various gene expression steps both in living organisms and in viruses. Previous studies have recognized functional RNA structures in the genome of the Dengue virus. However, these studies usually focused either on the viral untranslated regions or on very specific and limited regions at the beginning of the coding sequences, in a limited number of strains, and without considering evolutionary selection. Results: Here we performed the first large scale comprehensive genomics analysis of selection for local mRNA folding strength in the Dengue virus coding sequences, based on a total of 1,670 genomes and 4 serotypes. Our analysis identified clusters of positions along the coding regions that may undergo a conserved evolutionary selection for strong or weak local folding maintained across different viral variants. Specifically, we recognized 53-66 clusters for strong folding and 49-73 clusters for weak folding (depending on serotype), each an aggregate of positions with significantly conserved folding energy signals (related to partially overlapping local genomic regions). In addition, up to 7% of these positions were found to be conserved in more than 90% of the viral genomes. Although some of the identified positions undergo frequent synonymous / non-synonymous substitutions, the selection for folding strength therein is preserved, and thus cannot be trivially explained based on sequence conservation alone. Conclusions: The fact that many of the positions with significant folding related signals are conserved among different Dengue variants suggests that a better understanding of the mRNA structures in the corresponding regions may promote the development of prospective anti-Dengue vaccination strategies. The comparative genomics approach described here can be employed in the future for detecting functional regions in other pathogens with very high mutation rates. PMID:26449467
Targeted deep sequencing identifies rare loss-of-function variants in IFNGR1 for risk of atopic dermatitis complicated by eczema herpeticum.

PubMed

Gao, Li; Bin, Lianghua; Rafaels, Nicholas M; Huang, Lili; Potee, Joseph; Ruczinski, Ingo; Beaty, Terri H; Paller, Amy S; Schneider, Lynda C; Gallo, Rich; Hanifin, Jon M; Beck, Lisa A; Geha, Raif S; Mathias, Rasika A; Barnes, Kathleen C; Leung, Donald Y M

2015-12-01

A subset of atopic dermatitis is associated with increased susceptibility to eczema herpeticum (ADEH+). We previously reported that common single nucleotide polymorphisms (SNPs) in the IFN-γ (IFNG) and IFN-γ receptor 1 (IFNGR1) genes were associated with the ADEH+ phenotype. We sought to interrogate the role of rare variants in interferon pathway genes for the risk of ADEH+. We performed targeted sequencing of interferon pathway genes (IFNG, IFNGR1, IFNAR1, and IL12RB1) in 228 European American patients with AD selected according to their eczema herpeticum status, and severity was measured by using the Eczema Area and Severity Index. Replication genotyping was performed in independent samples of 219 European American and 333 African American subjects. Functional investigation of loss-of-function variants was conducted by using site-directed mutagenesis. We identified 494 single nucleotide variants encompassing 105 kb of sequence, including 145 common, 349 (70.6%) rare (minor allele frequency <5%), and 86 (17.4%) novel variants, of which 2.8% were coding synonymous, 93.3% were noncoding (64.6% intronic), and 3.8% were missense. We identified 6 rare IFNGR1 missense variants, including 3 damaging variants (Val14Met [V14M], Val61Ile, and Tyr397Cys [Y397C]) conferring a higher risk for ADEH+ (P = .031). Variants V14M and Y397C were confirmed to be deleterious, leading to partial IFNGR1 deficiency. Seven common IFNGR1 SNPs, along with common protective haplotypes (2-7 SNPs), conferred a reduced risk of ADEH+ (P = .015-.002 and P = .0015-.0004, respectively), and both SNP and haplotype associations were replicated in an independent African American sample (P = .004-.0001 and P = .001-.0001, respectively). Our results provide evidence that both common and rare genetic variants in the gene encoding IFNGR1 are implicated in susceptibility to the ADEH+ phenotype. Copyright © 2015 American Academy of Allergy, Asthma & Immunology. Published by Elsevier Inc. All rights reserved.
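The variant tallies in this abstract can be checked with a few lines of arithmetic. The sketch below applies the stated rarity cut-off (minor allele frequency below 5%) and recomputes the reported percentages from the quoted counts. The helper names `classify_by_maf` and `percent` are illustrative assumptions, not part of the study's pipeline.

```python
MAF_RARE_THRESHOLD = 0.05  # "rare" is defined in the abstract as MAF < 5%

def classify_by_maf(maf):
    """Label a variant as 'rare' or 'common' from its minor allele frequency."""
    return "rare" if maf < MAF_RARE_THRESHOLD else "common"

def percent(part, total):
    """Percentage rounded to one decimal place, as reported in the abstract."""
    return round(100 * part / total, 1)

total_variants = 494
common, rare, novel = 145, 349, 86  # counts quoted in the abstract

# Common and rare variants together account for every variant identified.
assert common + rare == total_variants

rare_pct = percent(rare, total_variants)    # 70.6
novel_pct = percent(novel, total_variants)  # 17.4
```

Novel variants are counted separately from the common/rare split, which is why the three quoted counts do not sum to 494.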
Mutations Affecting Expression of the rosy Locus in Drosophila melanogaster

PubMed Central

Lee, Chong Sung; Curtis, Daniel; McCarron, Margaret; Love, Carol; Gray, Mark; Bender, Welcome; Chovnick, Arthur

1987-01-01

The rosy locus in Drosophila melanogaster codes for the enzyme xanthine dehydrogenase (XDH). Previous studies defined a "control element" near the 5' end of the gene, where variant sites affected the amount of rosy mRNA and protein produced. We have determined the DNA sequence of this region from both genomic and cDNA clones, and from the ry+10 underproducer strain. This variant strain had many sequence differences, so that the site of the regulatory change could not be fixed. A mutagenesis was also undertaken to isolate new regulatory mutations. We induced 376 new mutations with 1-ethyl-1-nitrosourea (ENU) and screened them to isolate those that reduced the amount of XDH protein produced, but did not change the properties of the enzyme. Genetic mapping was used to find mutations located near the 5' end of the gene. DNA from each of seven mutants was cloned and sequenced through the 5' region. Mutant base changes were identified in all seven; they appear to affect splicing and translation of the rosy mRNA. In a related study (T. P. Keith et al. 1987), the genomic and cDNA sequences are extended through the 3' end of the gene; the combined sequences define the processing pattern of the rosy transcript and predict the amino acid sequence of XDH. PMID:3036645
Medical Sequencing at the extremes of Human Body Mass

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ahituv, Nadav; Kavaslar, Nihan; Schackwitz, Wendy

2006-09-01

Body weight is a quantitative trait with significant heritability in humans. To identify potential genetic contributors to this phenotype, we resequenced the coding exons and splice junctions of 58 genes in 379 obese and 378 lean individuals. Our 96 Mb survey included 21 genes associated with monogenic forms of obesity in humans or mice, as well as 37 genes that function in body weight-related pathways. We found that the monogenic obesity-associated gene group was enriched for rare nonsynonymous variants unique to the obese (n=46) versus lean (n=26) populations. Computational analysis further predicted a significantly greater fraction of deleterious variants within the obese cohort. Consistent with the complex inheritance of body weight, we did not observe obvious familial segregation in the majority of the 28 available kindreds. Taken together, these data suggest that multiple rare alleles with variable penetrance contribute to obesity in the population and provide a deep medical sequencing based approach to detect them.
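One way to gauge the imbalance between 46 obese-only and 26 lean-only rare nonsynonymous variants is an exact binomial test. The sketch below is an illustration under a simple assumed null model, in which each cohort-unique variant lands in the obese group in proportion to that cohort's size. It is not the statistical procedure used in the study, and `binomial_two_sided` is a hypothetical helper.

```python
from math import comb

def binomial_two_sided(k, n, p):
    """Exact two-sided binomial p-value: sum of outcomes no more likely than k."""
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    return sum(q for q in pmf if q <= pmf[k] * (1 + 1e-9))

obese_only, lean_only = 46, 26  # variant counts from the abstract
n_obese, n_lean = 379, 378      # cohort sizes from the abstract

# Assumed null: a cohort-unique variant falls in the obese group in
# proportion to that cohort's share of the sequenced individuals.
p_null = n_obese / (n_obese + n_lean)
p_value = binomial_two_sided(obese_only, obese_only + lean_only, p_null)
```

Because the two cohorts are almost the same size, the null proportion is close to one half, so the test essentially asks whether a 46 to 26 split is compatible with a fair coin.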
Comprehensive analysis of the mutation spectrum in 301 German ALS families.

PubMed

Müller, Kathrin; Brenner, David; Weydt, Patrick; Meyer, Thomas; Grehl, Torsten; Petri, Susanne; Grosskreutz, Julian; Schuster, Joachim; Volk, Alexander E; Borck, Guntram; Kubisch, Christian; Klopstock, Thomas; Zeller, Daniel; Jablonka, Sibylle; Sendtner, Michael; Klebe, Stephan; Knehr, Antje; Günther, Kornelia; Weis, Joachim; Claeys, Kristl G; Schrank, Berthold; Sperfeld, Anne-Dorte; Hübers, Annemarie; Otto, Markus; Dorst, Johannes; Meitinger, Thomas; Strom, Tim M; Andersen, Peter M; Ludolph, Albert C; Weishaupt, Jochen H

2018-04-12

Recent advances in amyotrophic lateral sclerosis (ALS) genetics have revealed that mutations in any of more than 25 genes can cause ALS, mostly as an autosomal-dominant Mendelian trait. Detailed knowledge about the genetic architecture of ALS in a specific population will be important for genetic counselling but also for genotype-specific therapeutic interventions. Here we combined fragment length analysis, repeat-primed PCR, Southern blotting, Sanger sequencing and whole exome sequencing to obtain a comprehensive profile of genetic variants in ALS disease genes in 301 German pedigrees with familial ALS. We report C9orf72 mutations as well as variants in consensus splice sites and non-synonymous variants in protein-coding regions of ALS genes. We furthermore estimate their pathogenicity by taking into account type and frequency of the respective variant as well as segregation within the families. 49% of our German ALS families carried a likely pathogenic variant in at least one of the earlier identified ALS genes. In 45% of the ALS families, likely pathogenic variants were detected in C9orf72, SOD1, FUS, TARDBP or TBK1, whereas the relative contribution of the other ALS genes in this familial ALS cohort was 4%. We identified several previously unreported rare variants and demonstrated the absence of likely pathogenic variants in some of the recently described ALS disease genes. We here present a comprehensive genetic characterisation of German familial ALS. The present findings are of importance for genetic counselling in clinical practice, for molecular research and for the design of diagnostic gene panels or genotype-specific therapeutic interventions in Europe. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
mirVAFC: A Web Server for Prioritizations of Pathogenic Sequence Variants from Exome Sequencing Data via Classifications.

PubMed

Li, Zhongshan; Liu, Zhenwei; Jiang, Yi; Chen, Denghui; Ran, Xia; Sun, Zhong Sheng; Wu, Jinyu

2017-01-01

Exome sequencing has been widely used to identify the genetic variants underlying human genetic disorders for clinical diagnoses, but the identification of pathogenic sequence variants among the huge amounts of benign ones is complicated and challenging. Here, we describe a new Web server named mirVAFC for pathogenic sequence variant prioritization from clinical exome sequencing (CES) variant data of a single individual or family. mirVAFC is able to comprehensively annotate sequence variants, filter out most irrelevant variants using custom criteria, classify variants into different categories according to estimated pathogenicity, and lastly provide pathogenic variant prioritizations based on classifications and mutation effects. Case studies using different types of datasets for different diseases from publications and our in-house data have revealed that mirVAFC can efficiently identify the right pathogenic candidates as in the original work in each case. Overall, the Web server mirVAFC is specifically developed for pathogenic sequence variant identification from family-based CES variants using classification-based prioritizations. The mirVAFC Web server is freely accessible at https://www.wzgenomics.cn/mirVAFC/. © 2016 WILEY PERIODICALS, INC.
Multi-Platform Next-Generation Sequencing of the Domestic Turkey (Meleagris gallopavo): Genome Assembly and Analysis

PubMed Central

Aslam, Luqman; Beal, Kathryn; Ann Blomberg, Le; Bouffard, Pascal; Burt, David W.; Crasta, Oswald; Crooijmans, Richard P. M. A.; Cooper, Kristal; Coulombe, Roger A.; De, Supriyo; Delany, Mary E.; Dodgson, Jerry B.; Dong, Jennifer J.; Evans, Clive; Frederickson, Karin M.; Flicek, Paul; Florea, Liliana; Folkerts, Otto; Groenen, Martien A. M.; Harkins, Tim T.; Herrero, Javier; Hoffmann, Steve; Megens, Hendrik-Jan; Jiang, Andrew; de Jong, Pieter; Kaiser, Pete; Kim, Heebal; Kim, Kyu-Won; Kim, Sungwon; Langenberger, David; Lee, Mi-Kyung; Lee, Taeheon; Mane, Shrinivasrao; Marcais, Guillaume; Marz, Manja; McElroy, Audrey P.; Modise, Thero; Nefedov, Mikhail; Notredame, Cédric; Paton, Ian R.; Payne, William S.; Pertea, Geo; Prickett, Dennis; Puiu, Daniela; Qioa, Dan; Raineri, Emanuele; Ruffier, Magali; Salzberg, Steven L.; Schatz, Michael C.; Scheuring, Chantel; Schmidt, Carl J.; Schroeder, Steven; Searle, Stephen M. J.; Smith, Edward J.; Smith, Jacqueline; Sonstegard, Tad S.; Stadler, Peter F.; Tafer, Hakim; Tu, Zhijian (Jake); Van Tassell, Curtis P.; Vilella, Albert J.; Williams, Kelly P.; Yorke, James A.; Zhang, Liqing; Zhang, Hong-Bin; Zhang, Xiaojun; Zhang, Yang; Reed, Kent M.

2010-01-01

A synergistic combination of two next-generation sequencing platforms with a detailed comparative BAC physical contig map provided a cost-effective assembly of the genome sequence of the domestic turkey (Meleagris gallopavo). Heterozygosity of the sequenced source genome allowed discovery of more than 600,000 high quality single nucleotide variants. Despite this heterozygosity, the current genome assembly (∼1.1 Gb) includes 917 Mb of sequence assigned to specific turkey chromosomes. Annotation identified nearly 16,000 genes, with 15,093 recognized as protein coding and 611 as non-coding RNA genes. Comparative analysis of the turkey, chicken, and zebra finch genomes, and comparing avian to mammalian species, supports the characteristic stability of avian genomes and identifies genes unique to the avian lineage. Clear differences are seen in number and variety of genes of the avian immune system where expansions and novel genes are less frequent than examples of gene loss. The turkey genome sequence provides resources to further understand the evolution of vertebrate genomes and genetic variation underlying economically important quantitative traits in poultry. This integrated approach may be a model for providing both gene and chromosome level assemblies of other species with agricultural, ecological, and evolutionary interest. PMID:20838655
Aggregation of population‐based genetic variation over protein domain homologues and its potential use in genetic diagnostics

PubMed Central

Wiel, Laurens; Venselaar, Hanka; Veltman, Joris A.; Vriend, Gert

2017-01-01

Whole exomes of patients with a genetic disorder are nowadays routinely sequenced but interpretation of the identified genetic variants remains a major challenge. The increased availability of population‐based human genetic variation has given rise to measures of genetic tolerance that have been used, for example, to predict disease‐causing genes in neurodevelopmental disorders. Here, we investigated whether combining variant information from homologous protein domains can improve variant interpretation. For this purpose, we developed a framework that maps population variation and known pathogenic mutations onto 2,750 “meta‐domains.” These meta‐domains consist of 30,853 homologous Pfam protein domain instances that cover 36% of all human protein coding sequences. We find that genetic tolerance is consistent across protein domain homologues, and that patterns of genetic tolerance faithfully mimic patterns of evolutionary conservation. Furthermore, for a significant fraction (68%) of the meta‐domains high‐frequency population variation re‐occurs at the same positions across domain homologues more often than expected. In addition, we observe that the presence of pathogenic missense variants at an aligned homologous domain position is often paired with the absence of population variation and vice versa. The use of these meta‐domains can improve the interpretation of genetic variation. PMID:28815929
Aggregating and Predicting Sequence Labels from Crowd Annotations

PubMed Central

Nguyen, An T.; Wallace, Byron C.; Li, Junyi Jessy; Nenkova, Ani; Lease, Matthew

2017-01-01

Despite sequences being core to NLP, scant work has considered how to handle noisy sequence labels from multiple annotators for the same text. Given such annotations, we consider two complementary tasks: (1) aggregating sequential crowd labels to infer a best single set of consensus annotations; and (2) using crowd annotations as training data for a model that can predict sequences in unannotated text. For aggregation, we propose a novel Hidden Markov Model variant. To predict sequences in unannotated text, we propose a neural approach using Long Short Term Memory. We evaluate a suite of methods across two different applications and text genres: Named-Entity Recognition in news articles and Information Extraction from biomedical abstracts. Results show improvement over strong baselines. Our source code and data are available online. PMID:29093611
Association between Rare Variants in AP4E1, a Component of Intracellular Trafficking, and Persistent Stuttering.

PubMed

Raza, M Hashim; Mattera, Rafael; Morell, Robert; Sainz, Eduardo; Rahn, Rachel; Gutierrez, Joanne; Paris, Emily; Root, Jessica; Solomon, Beth; Brewer, Carmen; Basra, M Asim Raza; Khan, Shaheen; Riazuddin, Sheikh; Braun, Allen; Bonifacino, Juan S; Drayna, Dennis

2015-11-05

Stuttering is a common, highly heritable neurodevelopmental disorder characterized by deficits in the volitional control of speech. Whole-exome sequencing identified two heterozygous AP4E1 coding variants, c.1549G>A (p.Val517Ile) and c.2401G>A (p.Glu801Lys), that co-segregate with persistent developmental stuttering in a large Cameroonian family, and we observed the same two variants in unrelated Cameroonians with persistent stuttering. We found 23 other rare variants, including predicted loss-of-function variants, in AP4E1 in unrelated stuttering individuals in Cameroon, Pakistan, and North America. The rate of rare variants in AP4E1 was significantly higher in unrelated Pakistani and Cameroonian stuttering individuals than in population-matched control individuals, and coding variants in this gene are exceptionally rare in the general sub-Saharan West African, South Asian, and North American populations. Clinical examination of the Cameroonian family members failed to identify any symptoms previously reported in rare individuals carrying homozygous loss-of-function mutations in this gene. AP4E1 encodes the ε subunit of the heterotetrameric (ε-β4-μ4-σ4) AP-4 complex, involved in protein sorting at the trans-Golgi network. We found that the μ4 subunit of AP-4 interacts with NAGPA, an enzyme involved in the synthesis of the mannose 6-phosphate signal that targets acid hydrolases to the lysosome and the product of a gene previously associated with stuttering. These findings implicate deficits in intracellular trafficking in persistent stuttering. Copyright © 2015 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
Describing Phonological Paraphasias in Three Variants of Primary Progressive Aphasia.

PubMed

Dalton, Sarah Grace Hudspeth; Shultz, Christine; Henry, Maya L; Hillis, Argye E; Richardson, Jessica D

2018-03-01

The purpose of this study was to describe the linguistic environment of phonological paraphasias in 3 variants of primary progressive aphasia (semantic, logopenic, and nonfluent) and to describe the profiles of paraphasia production for each of these variants. Discourse samples of 26 individuals diagnosed with primary progressive aphasia were investigated for phonological paraphasias using the criteria established for the Philadelphia Naming Test (Moss Rehabilitation Research Institute, 2013). Phonological paraphasias were coded for paraphasia type, part of speech of the target word, target word frequency, type of segment in error, word position of consonant errors, type of error, and degree of change in consonant errors. Eighteen individuals across the 3 variants produced phonological paraphasias. Most paraphasias were nonword, followed by formal, and then mixed, with errors primarily occurring on nouns and verbs, with relatively few on function words. Most errors were substitutions, followed by addition and deletion errors, and few sequencing errors. Errors were evenly distributed across vowels, consonant singletons, and clusters, with more errors occurring in initial and medial positions of words than in the final position of words. Most consonant errors consisted of only a single-feature change, with few 2- or 3-feature changes. Importantly, paraphasia productions by variant differed from these aggregate results, with unique production patterns for each variant. These results suggest that a system where paraphasias are coded as present versus absent may be insufficient to adequately distinguish between the 3 subtypes of PPA. The 3 variants demonstrate patterns that may be used to improve phenotyping and diagnostic sensitivity. These results should be integrated with recent findings on phonological processing and speech rate. Future research should attempt to replicate these results in a larger sample of participants with longer speech samples and varied elicitation tasks. https://doi.org/10.23641/asha.5558107.
Novel candidate genes may be possible predisposing factors revealed by whole exome sequencing in familial esophageal squamous cell carcinoma.

PubMed

Forouzanfar, Narjes; Baranova, Ancha; Milanizadeh, Saman; Heravi-Moussavi, Alireza; Jebelli, Amir; Abbaszadegan, Mohammad Reza

2017-05-01

Esophageal squamous cell carcinoma is one of the deadliest of all the cancers. Its metastatic properties portend poor prognosis and high rate of recurrence. A more advanced method to identify new molecular biomarkers predicting disease prognosis can be whole exome sequencing. Here, we report the most effective genetic variants of the Notch signaling pathway in esophageal squamous cell carcinoma susceptibility by whole exome sequencing. We analyzed nine probands in unrelated familial esophageal squamous cell carcinoma pedigrees to identify candidate genes. Genomic DNA was extracted and whole exome sequencing performed to generate information about genetic variants in the coding regions. Bioinformatics software applications were utilized to exploit statistical algorithms to demonstrate protein structure and variant conservation. Polymorphic regions were excluded by false-positive investigations. Gene-gene interactions were analyzed for Notch signaling pathway candidates. We identified novel and damaging variants of the Notch signaling pathway through extensive pathway-oriented filtering and functional predictions, which led to the study of 27 candidate novel mutations in all nine patients. Detection of the trinucleotide repeat containing 6B gene mutation (a splice site alteration) in five of the nine probands, but not in any of the healthy samples, suggested that it may be a susceptibility factor for familial esophageal squamous cell carcinoma. Notably, 8 of 27 novel candidate gene mutations (e.g. epidermal growth factor, signal transducer and activator of transcription 3, MET) act in a cascade leading to cell survival and proliferation. Our results suggest that the trinucleotide repeat containing 6B mutation may be a candidate predisposing factor in esophageal squamous cell carcinoma. In addition, some of the Notch signaling pathway genetic mutations may act as key contributors to esophageal squamous cell carcinoma.
Sequencing of GJB2 in Cameroonians and Black South Africans and comparison to 1000 Genomes Project Data Support Need to Revise Strategy for Discovery of Nonsyndromic Deafness Genes in Africans.

PubMed

Bosch, Jason; Noubiap, Jean Jacques N; Dandara, Collet; Makubalo, Nomlindo; Wright, Galen; Entfellner, Jean-Baka Domelevo; Tiffin, Nicki; Wonkam, Ambroise

2014-11-01

Mutations in the GJB2 gene, encoding connexin 26, could account for 50% of congenital, nonsyndromic, recessive deafness cases in some Caucasian/Asian populations. There is a scarcity of published data in sub-Saharan Africans. We Sanger sequenced the coding region of the GJB2 gene in 205 Cameroonian and Xhosa South Africans with congenital, nonsyndromic deafness; and performed bioinformatic analysis of variations in the GJB2 gene, incorporating data from the 1000 Genomes Project. Amongst Cameroonian patients, 26.1% were familial. The majority of patients (70%) suffered from sensorineural hearing loss. Ten GJB2 genetic variants were detected by sequencing. A previously reported pathogenic mutation, g.3741_3743delTTC (p.F142del), and a putative pathogenic mutation, g.3816G>A (p.V167M), were identified in single heterozygous samples. Amongst the eight remaining variants, two novel variants, g.3318-41G>A and g.3332G>A, were reported. There were no statistically significant differences in allele frequencies between cases and controls. Principal Components Analyses differentiated between Africans, Asians, and Europeans, but only explained 40% of the variation. The present study is the first to compare African GJB2 sequences with the data from the 1000 Genomes Project and has revealed the low variation between population groups. This finding has emphasized the hypothesis that the prevalence of mutations in GJB2 in nonsyndromic deafness amongst European and Asian populations is due to founder effects arising after these individuals migrated out of Africa, and not to a putative "protective" variant in the genomic structure of GJB2 in Africans. Our results confirm that mutations in GJB2 are not associated with nonsyndromic deafness in Africans.
Widespread Site-Dependent Buffering of Human Regulatory Polymorphism

PubMed Central

Kutyavin, Tanya; Stamatoyannopoulos, John A.

2012-01-01

The average individual is expected to harbor thousands of variants within non-coding genomic regions involved in gene regulation. However, it is currently not possible to interpret reliably the functional consequences of genetic variation within any given transcription factor recognition sequence. To address this, we comprehensively analyzed heritable genome-wide binding patterns of a major sequence-specific regulator (CTCF) in relation to genetic variability in binding site sequences across a multi-generational pedigree. We localized and quantified CTCF occupancy by ChIP-seq in 12 related and unrelated individuals spanning three generations, followed by comprehensive targeted resequencing of the entire CTCF–binding landscape across all individuals. We identified hundreds of variants with reproducible quantitative effects on CTCF occupancy (both positive and negative). While these effects paralleled protein–DNA recognition energetics when averaged, they were extensively buffered by striking local context dependencies. In the significant majority of cases buffering was complete, resulting in silent variants spanning every position within the DNA recognition interface irrespective of level of binding energy or evolutionary constraint. The prevalence of complex partial or complete buffering effects severely constrained the ability to predict reliably the impact of variation within any given binding site instance. Surprisingly, 40% of variants that increased CTCF occupancy occurred at positions of human–chimp divergence, challenging the expectation that the vast majority of functional regulatory variants should be deleterious. Our results suggest that, even in the presence of “perfect” genetic information afforded by resequencing and parallel studies in multiple related individuals, genomic site-specific prediction of the consequences of individual variation in regulatory DNA will require systematic coupling with empirical functional genomic measurements. PMID:22457641
Unified Sequence-Based Association Tests Allowing for Multiple Functional Annotations and Meta-analysis of Noncoding Variation in Metabochip Data.

PubMed

He, Zihuai; Xu, Bin; Lee, Seunggeun; Ionita-Laza, Iuliana

2017-09-07

Substantial progress has been made in the functional annotation of genetic variation in the human genome. Integrative analysis that incorporates such functional annotations into sequencing studies can aid the discovery of disease-associated genetic variants, especially those with unknown function and located outside protein-coding regions. Direct incorporation of one functional annotation as weight in existing dispersion and burden tests can suffer substantial loss of power when the functional annotation is not predictive of the risk status of a variant. Here, we have developed unified tests that can utilize multiple functional annotations simultaneously for integrative association analysis with efficient computational techniques. We show that the proposed tests significantly improve power when variant risk status can be predicted by functional annotations. Importantly, when functional annotations are not predictive of risk status, the proposed tests incur only minimal loss of power in relation to existing dispersion and burden tests, and under certain circumstances they can even have improved power by learning a weight that better approximates the underlying disease model in a data-adaptive manner. The tests can be constructed with summary statistics of existing dispersion and burden tests for sequencing data, therefore allowing meta-analysis of multiple studies without sharing individual-level data. We applied the proposed tests to a meta-analysis of noncoding rare variants in Metabochip data on 12,281 individuals from eight studies for lipid traits. By incorporating the Eigen functional score, we detected significant associations between noncoding rare variants in SLC22A3 and low-density lipoprotein and total cholesterol, associations that are missed by standard dispersion and burden tests. Copyright © 2017 American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
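For readers unfamiliar with the baseline this abstract builds on, a burden test collapses the rare variants in a region into one score per individual, and a functional annotation can enter as a per-variant weight. The sketch below shows only that weighted collapsing step, with invented genotypes and weights; it is not the unified test proposed in the paper, and it omits the association statistic itself.

```python
def weighted_burden(genotypes, weights):
    """Return one burden score per individual.

    genotypes: list of per-individual lists of minor-allele counts (0, 1 or 2),
               one entry per variant in the region.
    weights:   one annotation-derived weight per variant (hypothetical values here).
    """
    for row in genotypes:
        if len(row) != len(weights):
            raise ValueError("each genotype row needs one entry per weight")
    return [sum(w * g for w, g in zip(weights, row)) for row in genotypes]


# Toy data: three individuals, three rare variants (all values invented).
toy_genotypes = [
    [0, 1, 0],
    [1, 0, 2],
    [0, 0, 0],
]
toy_weights = [0.5, 2.0, 1.0]

scores = weighted_burden(toy_genotypes, toy_weights)
```

If the weights are uninformative about which variants carry risk, these scores mix signal with noise, which is the loss of power the abstract refers to.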
Establishing the role of rare coding variants in known Parkinson's disease risk loci.

PubMed

Jansen, Iris E; Gibbs, J Raphael; Nalls, Mike A; Price, T Ryan; Lubbe, Steven; van Rooij, Jeroen; Uitterlinden, André G; Kraaij, Robert; Williams, Nigel M; Brice, Alexis; Hardy, John; Wood, Nicholas W; Morris, Huw R; Gasser, Thomas; Singleton, Andrew B; Heutink, Peter; Sharma, Manu

2017-11-01

Many common genetic factors have been identified to contribute to Parkinson's disease (PD) susceptibility, improving our understanding of the related underlying biological mechanisms. The involvement of rarer variants in these loci has been poorly studied. Using International Parkinson's Disease Genomics Consortium data sets, we performed a comprehensive study to determine the impact of rare variants in 23 previously published genome-wide association studies (GWAS) loci in PD. We applied Prix fixe to select the putative causal genes underneath the GWAS peaks, which was based on underlying functional similarities. The Sequence Kernel Association Test was used to analyze the joint effect of rare, common, or both types of variants on PD susceptibility. All genes were tested simultaneously as a gene set and each gene individually. We observed a moderate association of common variants, confirming the involvement of the known PD risk loci within our genetic data sets. Focusing on rare variants, we identified additional association signals for LRRK2, STBD1, and SPATA19. Our study suggests an involvement of rare variants within several putatively causal genes underneath previously identified PD GWAS peaks. Copyright © 2017 Elsevier Inc. All rights reserved.

VAT: a computational framework to functionally annotate variants in personal genomes within a cloud-computing environment.

PubMed

Habegger, Lukas; Balasubramanian, Suganthi; Chen, David Z; Khurana, Ekta; Sboner, Andrea; Harmanci, Arif; Rozowsky, Joel; Clarke, Declan; Snyder, Michael; Gerstein, Mark

2012-09-01

The functional annotation of variants obtained through sequencing projects is generally assumed to be a simple intersection of genomic coordinates with genomic features. However, complexities arise for several reasons, including the differential effects of a variant on alternatively spliced transcripts, as well as the difficulty in assessing the impact of small insertions/deletions and large structural variants. Taking these factors into consideration, we developed the Variant Annotation Tool (VAT) to functionally annotate variants from multiple personal genomes at the transcript level as well as obtain summary statistics across genes and individuals. VAT also allows visualization of the effects of different variants, integrates allele frequencies and genotype data from the underlying individuals and facilitates comparative analysis between different groups of individuals. VAT can either be run through a command-line interface or as a web application. Finally, in order to enable on-demand access and to minimize unnecessary transfers of large data files, VAT can be run as a virtual machine in a cloud-computing environment. VAT is implemented in C and PHP. The VAT web service, Amazon Machine Image, source code and detailed documentation are available at vat.gersteinlab.org.
Whole exome sequencing of rare variants in EIF4G1 and VPS35 in Parkinson disease

PubMed Central

Nuytemans, Karen; Bademci, Guney; Inchausti, Vanessa; Dressen, Amy; Kinnamon, Daniel D.; Mehta, Arpit; Wang, Liyong; Züchner, Stephan; Beecham, Gary W.; Martin, Eden R.; Scott, William K.

2013-01-01

Objective: Recently, vacuolar protein sorting 35 (VPS35) and eukaryotic translation initiation factor 4 gamma 1 (EIF4G1) have been identified as 2 causal Parkinson disease (PD) genes. We used whole exome sequencing for rapid, parallel analysis of variations in these 2 genes. Methods: We performed whole exome sequencing in 213 patients with PD and 272 control individuals. Those rare variants (RVs) with <5% frequency in the exome variant server database and our own control data were considered for analysis. We performed joint gene-based tests for association using RVASSOC and SKAT (Sequence Kernel Association Test) as well as single-variant test statistics. Results: We identified 3 novel VPS35 variations that changed the coded amino acid (nonsynonymous) in 3 cases. Two variations were in multiplex families and neither segregated with PD. In EIF4G1, we identified 11 (9 nonsynonymous and 2 small indels) RVs including the reported pathogenic mutation p.R1205H, which segregated in all affected members of a large family, but also in 1 unaffected 86-year-old family member. Two additional RVs were found in isolated patients only. Whereas initial association studies suggested an association (p = 0.04) with all RVs in EIF4G1, subsequent testing in a second dataset for the driving variant (p.F1461) suggested no association between RVs in the gene and PD. Conclusions: We confirm that the specific EIF4G1 variation p.R1205H seems to be a strong PD risk factor, but is nonpenetrant in at least one 86-year-old. A few other select RVs in both genes could not be ruled out as causal. However, there was no evidence for an overall contribution of genetic variability in VPS35 or EIF4G1 to PD development in our dataset. PMID:23408866
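The rare-variant selection rule stated in the Methods (frequency below 5% in both the reference database and the study's own controls) amounts to a simple filter. The following is a minimal sketch under assumed field names (`ref_db_freq`, `control_freq`) and invented variant identifiers; it does not reproduce the RVASSOC or SKAT analyses.

```python
FREQ_THRESHOLD = 0.05  # the <5% cut-off quoted in the abstract


def is_rare(variant, threshold=FREQ_THRESHOLD):
    """A variant qualifies only if it is below the threshold in BOTH sources."""
    return (variant["ref_db_freq"] < threshold
            and variant["control_freq"] < threshold)


# Hypothetical variants; identifiers and frequencies are made up.
candidates = [
    {"id": "var_1", "ref_db_freq": 0.001, "control_freq": 0.004},
    {"id": "var_2", "ref_db_freq": 0.120, "control_freq": 0.010},
    {"id": "var_3", "ref_db_freq": 0.030, "control_freq": 0.070},
    {"id": "var_4", "ref_db_freq": 0.000, "control_freq": 0.000},
]

rare_ids = [v["id"] for v in candidates if is_rare(v)]
```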
Exome sequencing-driven discovery of coding polymorphisms associated with common metabolic phenotypes.

PubMed

Albrechtsen, A; Grarup, N; Li, Y; Sparsø, T; Tian, G; Cao, H; Jiang, T; Kim, S Y; Korneliussen, T; Li, Q; Nie, C; Wu, R; Skotte, L; Morris, A P; Ladenvall, C; Cauchi, S; Stančáková, A; Andersen, G; Astrup, A; Banasik, K; Bennett, A J; Bolund, L; Charpentier, G; Chen, Y; Dekker, J M; Doney, A S F; Dorkhan, M; Forsen, T; Frayling, T M; Groves, C J; Gui, Y; Hallmans, G; Hattersley, A T; He, K; Hitman, G A; Holmkvist, J; Huang, S; Jiang, H; Jin, X; Justesen, J M; Kristiansen, K; Kuusisto, J; Lajer, M; Lantieri, O; Li, W; Liang, H; Liao, Q; Liu, X; Ma, T; Ma, X; Manijak, M P; Marre, M; Mokrosiński, J; Morris, A D; Mu, B; Nielsen, A A; Nijpels, G; Nilsson, P; Palmer, C N A; Rayner, N W; Renström, F; Ribel-Madsen, R; Robertson, N; Rolandsson, O; Rossing, P; Schwartz, T W; Slagboom, P E; Sterner, M; Tang, M; Tarnow, L; Tuomi, T; van't Riet, E; van Leeuwen, N; Varga, T V; Vestmar, M A; Walker, M; Wang, B; Wang, Y; Wu, H; Xi, F; Yengo, L; Yu, C; Zhang, X; Zhang, J; Zhang, Q; Zhang, W; Zheng, H; Zhou, Y; Altshuler, D; 't Hart, L M; Franks, P W; Balkau, B; Froguel, P; McCarthy, M I; Laakso, M; Groop, L; Christensen, C; Brandslund, I; Lauritzen, T; Witte, D R; Linneberg, A; Jørgensen, T; Hansen, T; Wang, J; Nielsen, R; Pedersen, O

2013-02-01

Human complex metabolic traits are in part regulated by genetic determinants. Here we applied exome sequencing to identify novel associations of coding polymorphisms at minor allele frequencies (MAFs) >1% with common metabolic phenotypes. The study comprised three stages. We performed medium-depth (8×) whole exome sequencing in 1,000 cases with type 2 diabetes, BMI >27.5 kg/m² and hypertension and in 1,000 controls (stage 1). We selected 16,192 polymorphisms nominally associated (p < 0.05) with case-control status, from four selected annotation categories or from loci reported to associate with metabolic traits. These variants were genotyped in 15,989 Danes to search for association with 12 metabolic phenotypes (stage 2). In stage 3, polymorphisms showing potential associations were genotyped in a further 63,896 Europeans. Exome sequencing identified 70,182 polymorphisms with MAF >1%. In stage 2 we identified 51 potential associations with one or more of eight metabolic phenotypes covered by 45 unique polymorphisms. In meta-analyses of stage 2 and stage 3 results, we demonstrated robust associations for coding polymorphisms in CD300LG (fasting HDL-cholesterol: MAF 3.5%, p = 8.5 × 10⁻¹⁴), COBLL1 (type 2 diabetes: MAF 12.5%, OR 0.88, p = 1.2 × 10⁻¹¹) and MACF1 (type 2 diabetes: MAF 23.4%, OR 1.10, p = 8.2 × 10⁻¹⁰). We applied exome sequencing as a basis for finding genetic determinants of metabolic traits and show the existence of low-frequency and common coding polymorphisms with impact on common metabolic traits. Based on our study, coding polymorphisms with MAF above 1% do not seem to have particularly high effect sizes on the measured metabolic traits.
CNTN6 mutations are risk factors for abnormal auditory sensory perception in autism spectrum disorders.

PubMed

Mercati, O; Huguet, G; Danckaert, A; André-Leroux, G; Maruani, A; Bellinzoni, M; Rolland, T; Gouder, L; Mathieu, A; Buratti, J; Amsellem, F; Benabou, M; Van-Gils, J; Beggiato, A; Konyukh, M; Bourgeois, J-P; Gazzellone, M J; Yuen, R K C; Walker, S; Delépine, M; Boland, A; Régnault, B; Francois, M; Van Den Abbeele, T; Mosca-Boidron, A L; Faivre, L; Shimoda, Y; Watanabe, K; Bonneau, D; Rastam, M; Leboyer, M; Scherer, S W; Gillberg, C; Delorme, R; Cloëz-Tayarani, I; Bourgeron, T

2017-04-01

Contactin genes CNTN5 and CNTN6 code for neuronal cell adhesion molecules that promote neurite outgrowth in sensory-motor neuronal pathways. Mutations of CNTN5 and CNTN6 have previously been reported in individuals with autism spectrum disorders (ASDs), but very little is known on their prevalence and clinical impact. In this study, we identified CNTN5 and CNTN6 deleterious variants in individuals with ASD. Among the carriers, a girl with ASD and attention-deficit/hyperactivity disorder was carrying five copies of CNTN5. For CNTN6, both deletions (6/1534 ASD vs 1/8936 controls; P=0.00006) and private coding sequence variants (18/501 ASD vs 535/33480 controls; P=0.0005) were enriched in individuals with ASD. Among the rare CNTN6 variants, two deletions were transmitted by fathers diagnosed with ASD, one stop mutation CNTN6 W923X was transmitted by a mother to her two sons with ASD and one variant CNTN6 P770L was found de novo in a boy with ASD. Clinical investigations of the patients carrying CNTN5 or CNTN6 variants showed that they were hypersensitive to sounds (a condition called hyperacusis) and displayed changes in wave latency within the auditory pathway. These results reinforce the hypothesis of abnormal neuronal connectivity in the pathophysiology of ASD and shed new light on the genes that increase risk for abnormal sensory perception in ASD.
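The carrier counts quoted above (6/1534 vs 1/8936 for deletions, 18/501 vs 535/33480 for private coding variants) are 2x2 case-control tables. The abstract does not say which test produced the reported P values, so the sketch below simply assumes a two-sided Fisher exact test, implemented with the standard library; its results need not match the published figures.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    a, b: carriers / non-carriers among cases
    c, d: carriers / non-carriers among controls
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x carriers falling among the cases.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))


def odds_ratio(a, b, c, d):
    """Sample odds ratio of the 2x2 table (undefined if b or c is zero)."""
    return (a * d) / (b * c)


# Deletions: 6 of 1534 cases, 1 of 8936 controls.
p_deletions = fisher_exact_two_sided(6, 1534 - 6, 1, 8936 - 1)
or_deletions = odds_ratio(6, 1534 - 6, 1, 8936 - 1)

# Private coding variants: 18 of 501 cases, 535 of 33480 controls.
p_coding = fisher_exact_two_sided(18, 501 - 18, 535, 33480 - 535)
```

Exact tests are a common choice when one cell is as small as the single control carrier here, but other tests would give somewhat different values.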
Early cancer diagnoses through BRCA1/2 screening of unselected adult biobank participants

PubMed Central

Buchanan, Adam H; Manickam, Kandamurugu; Meyer, Michelle N; Wagner, Jennifer K; Hallquist, Miranda L G; Williams, Janet L; Rahm, Alanna Kulchak; Williams, Marc S; Chen, Zong-Ming E; Shah, Chaitali K; Garg, Tullika K; Lazzeri, Amanda L; Schwartz, Marci L B; Lindbuchler, D'Andra M; Fan, Audrey L; Leeming, Rosemary; Servano, Pedro O; Smith, Ashlee L; Vogel, Victor G; Abul-Husn, Noura S; Dewey, Frederick E; Lebo, Matthew S; Mason-Suares, Heather M; Ritchie, Marylyn D; Davis, F Daniel; Carey, David J; Feinberg, David T; Faucett, W Andrew; Ledbetter, David H; Murray, Michael F

2018-01-01

Purpose: The clinical utility of screening unselected individuals for pathogenic BRCA1/2 variants has not been established. Data on cancer risk management behaviors and diagnoses of BRCA1/2-associated cancers can help inform assessments of clinical utility. Methods: Whole-exome sequences of participants in the MyCode Community Health Initiative were reviewed for pathogenic/likely pathogenic BRCA1/2 variants. Clinically confirmed variants were disclosed to patient–participants and their clinicians. We queried patient–participants’ electronic health records for BRCA1/2-associated cancer diagnoses and risk management that occurred within 12 months after results disclosure, and calculated the percentage of patient–participants of eligible age who had begun risk management. Results: Thirty-seven MyCode patient–participants were unaware of their pathogenic/likely pathogenic BRCA1/2 variant, had not had a BRCA1/2-associated cancer, and had 12 months of follow-up. Of the 33 who were of an age to begin BRCA1/2-associated risk management, 26 (79%) had performed at least one such procedure. Three were diagnosed with an early-stage, BRCA1/2-associated cancer—including a stage 1C fallopian tube cancer—via these procedures. Conclusion: Screening for pathogenic BRCA1/2 variants among unselected individuals can lead to occult cancer detection shortly after disclosure. Comprehensive outcomes data generated within our learning healthcare system will aid in determining whether population-wide BRCA1/2 genomic screening programs offer clinical utility. PMID:29261187
Genetic analysis of SIGMAR1 as a cause of familial ALS with dementia

PubMed Central

Belzil, Véronique V; Daoud, Hussein; Camu, William; Strong, Michael J; Dion, Patrick A; Rouleau, Guy A

2013-01-01

Amyotrophic lateral sclerosis (ALS) is the most common of the motor neuron diseases (MND), while frontotemporal lobar degeneration (FTLD) is the second most common cause of early-onset dementia. Many ALS families segregating FTLD have been reported, particularly over the last decade. Recently, mutations in TARDBP, FUS/TLS, and C9ORF72 have been identified in both ALS and FTLD patients, while mutations in VCP, an FTLD-associated gene, have been found in ALS families. Distinct variants located in the 3′-untranslated region (UTR) of the SIGMAR1 gene were previously reported in three unrelated FTLD or FTLD–MND families. We directly sequenced the coding and UTR regions of the SIGMAR1 gene in a targeted cohort of 25 individual familial ALS cases of Caucasian origin with a history of cognitive impairments. This screening identified one variant in the 3′-UTR of the SIGMAR1 gene in one ALS patient, but the same variant was also observed in 1 out of 380 control chromosomes. Subsequently, we screened the same samples for a C9ORF72 repeat expansion: 52% of this cohort was found expanded, including the sample with the SIGMAR1 3′-UTR variant. Consequently, coding and noncoding variants located in the 3′-UTR region of the SIGMAR1 gene are not the cause of FTLD–MND in our cohort, and more than half of this targeted cohort is genetically explained by C9ORF72 repeat expansions. PMID:22739338
VARiD: a variation detection framework for color-space and letter-space platforms.

PubMed

Dalca, Adrian V; Rumble, Stephen M; Levy, Samuel; Brudno, Michael

2010-06-15

High-throughput sequencing (HTS) technologies are transforming the study of genomic variation. The various HTS technologies have different sequencing biases and error rates, and while most HTS technologies sequence the residues of the genome directly, generating base calls for each position, the Applied Biosystems SOLiD platform generates dibase-coded (color space) sequences. While combining data from the various platforms should increase the accuracy of variation detection, to date there are only a few tools that can identify variants from color space data, and none that can analyze color space and regular (letter space) data together. We present VARiD, a probabilistic method for variation detection from both letter- and color-space reads simultaneously. VARiD is based on a hidden Markov model and uses the forward-backward algorithm to accurately identify heterozygous, homozygous and tri-allelic SNPs, as well as micro-indels. Our analysis shows that VARiD performs better than the AB SOLiD toolset at detecting variants from color-space data alone, and improves the calls dramatically when letter- and color-space reads are combined. The toolset is freely available at http://compbio.cs.utoronto.ca/varid.
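The forward-backward algorithm named in this abstract computes, for every position, the posterior probability of each hidden state given the whole observation sequence. The sketch below is a generic textbook version on a two-state toy model; the states, observations, and probabilities are invented for illustration and do not represent VARiD's actual model of letter- and color-space reads.

```python
def forward_backward(obs, states, start_p, trans_p, emit_p):
    """Return a list of per-position posterior distributions over states."""
    n = len(obs)

    # Forward pass: probability of the prefix ending in each state.
    fwd = [{} for _ in range(n)]
    for s in states:
        fwd[0][s] = start_p[s] * emit_p[s][obs[0]]
    for t in range(1, n):
        for s in states:
            fwd[t][s] = emit_p[s][obs[t]] * sum(
                fwd[t - 1][r] * trans_p[r][s] for r in states)

    # Backward pass: probability of the suffix given each state.
    bwd = [{} for _ in range(n)]
    for s in states:
        bwd[n - 1][s] = 1.0
    for t in range(n - 2, -1, -1):
        for s in states:
            bwd[t][s] = sum(
                trans_p[s][r] * emit_p[r][obs[t + 1]] * bwd[t + 1][r]
                for r in states)

    evidence = sum(fwd[n - 1][s] for s in states)
    return [{s: fwd[t][s] * bwd[t][s] / evidence for s in states}
            for t in range(n)]


# Toy model (all numbers hypothetical): each position is reference or a SNP,
# and the pooled reads at that position either match or mismatch the reference.
states = ("ref", "snp")
start_p = {"ref": 0.99, "snp": 0.01}
trans_p = {"ref": {"ref": 0.99, "snp": 0.01},
           "snp": {"ref": 0.95, "snp": 0.05}}
emit_p = {"ref": {"match": 0.99, "mismatch": 0.01},
          "snp": {"match": 0.50, "mismatch": 0.50}}

posteriors = forward_backward(
    ["match", "match", "mismatch", "match"], states, start_p, trans_p, emit_p)
```

For long sequences a practical implementation would rescale or work in log space to avoid underflow; that is omitted here for brevity.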
Adler hantavirus, a new genetic variant of Tula virus identified in Major's pine voles (Microtus majori) sampled in southern European Russia.

PubMed

Tkachenko, Evgeniy A; Witkowski, Peter T; Radosa, Lukas; Dzagurova, Tamara K; Okulova, Nataliya M; Yunicheva, Yulia V; Vasilenko, Ludmila; Morozov, Vyacheslav G; Malkin, Gennadiy A; Krüger, Detlev H; Klempa, Boris

2015-01-01

Although at least 30 novel hantaviruses have been recently discovered in novel hosts such as shrews, moles and even bats, hantaviruses (family Bunyaviridae, genus Hantavirus) are primarily known as rodent-borne human pathogens. Here we report on identification of a novel hantavirus variant associated with a rodent host, Major's pine vole (Microtus majori). Altogether 36 hantavirus PCR-positive Major's pine voles were identified in the Krasnodar region of southern European Russia within the years 2008-2011. Initial partial L-segment sequence analysis revealed novel hantavirus sequences. Moreover, we found a single common vole (Microtus arvalis) infected with Tula virus (TULV). Complete S- and M-segment coding sequences were determined from 11 Major's pine voles originating from 8 trapping sites and subjected to phylogenetic analyses. The data obtained show that Major's pine vole is a newly recognized hantavirus reservoir host. The newfound virus, provisionally called Adler hantavirus (ADLV), is closely related to TULV. Based on amino acid differences to TULV (5.6-8.2% for nucleocapsid protein, 9.4-9.5% for glycoprotein precursor) we propose to consider ADLV as a genotype of TULV. Occurrence of ADLV and TULV in the same region suggests that ADLV is not only a geographical variant of TULV but a host-specific genotype. High intra-cluster nucleotide sequence variability (up to 18%) and geographic clustering indicate long-term presence of the virus in this region. Copyright © 2014. Published by Elsevier B.V.
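Percent amino acid differences such as those quoted above are typically computed as the share of differing residues between two aligned sequences. The sketch below assumes a simple uncorrected pairwise difference that skips gapped columns; the abstract does not state how the authors computed their figures, and the sequences used here are invented.

```python
def percent_difference(seq_a, seq_b, gap="-"):
    """Uncorrected percent difference between two aligned sequences.

    Columns where either sequence has a gap are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    differences = sum(a != b for a, b in pairs)
    return 100.0 * differences / len(pairs)


# Hypothetical aligned protein fragments (not real hantavirus sequences).
toy_difference = percent_difference("MSTLKEVQDN", "MSNLKEIQDN")
```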
Molecular diagnosis of putative Stargardt disease probands by exome sequencing

PubMed Central

2012-01-01

Background The commonest genetic form of juvenile or early adult onset macular degeneration is Stargardt Disease (STGD) caused by recessive mutations in the gene ABCA4. However, high phenotypic and allelic heterogeneity and a small but non-trivial amount of locus heterogeneity currently impede conclusive molecular diagnosis in a significant proportion of cases. Methods We performed whole exome sequencing (WES) of nine putative Stargardt Disease probands and searched for potentially disease-causing genetic variants in previously identified retinal or macular dystrophy genes. Follow-up dideoxy sequencing was performed for confirmation and to screen for mutations in an additional set of affected individuals lacking a definitive molecular diagnosis. Results Whole exome sequencing revealed seven likely disease-causing variants across four genes, providing a confident genetic diagnosis in six previously uncharacterized participants. We identified four previously missed mutations in ABCA4 across three individuals. Likely disease-causing mutations in RDS/PRPH2, ELOVL, and CRB1 were also identified. Conclusions Our findings highlight the enormous potential of whole exome sequencing in Stargardt Disease molecular diagnosis and research. WES adequately assayed all coding sequences and canonical splice sites of ABCA4 in this study. Additionally, WES enables the identification of disease-related alleles in other genes. This work highlights the importance of collecting parental genetic material for WES testing as the current knowledge of human genome variation limits the determination of causality between identified variants and disease. While larger sample sizes are required to establish the precision and accuracy of this type of testing, this study supports WES for inherited early onset macular degeneration disorders as an alternative to standard mutation screening techniques. PMID:22863181
COOLAIR Antisense RNAs Form Evolutionarily Conserved Elaborate Secondary Structures

DOE PAGES

Hawkes, Emily J.; Hennelly, Scott P.; Novikova, Irina V.; ...

2016-09-20

There is considerable debate about the functionality of long non-coding RNAs (lncRNAs). Lack of sequence conservation has been used to argue against functional relevance. Here, we investigated antisense lncRNAs, called COOLAIR, at the A. thaliana FLC locus and experimentally determined their secondary structure. The major COOLAIR variants are highly structured, organized by exon. The distally polyadenylated transcript has a complex multi-domain structure, altered by a single non-coding SNP defining a functionally distinct A. thaliana FLC haplotype. The A. thaliana COOLAIR secondary structure was used to predict COOLAIR exons in evolutionarily divergent Brassicaceae species. These predictions were validated through chemical probing and cloning. Despite the relatively low nucleotide sequence identity, the structures, including multi-helix junctions, show remarkable evolutionary conservation. In a number of places, the structure is conserved through covariation of a non-contiguous DNA sequence. This structural conservation supports a functional role for COOLAIR transcripts rather than, or in addition to, antisense transcription.
COOLAIR Antisense RNAs Form Evolutionarily Conserved Elaborate Secondary Structures

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hawkes, Emily J.; Hennelly, Scott P.; Novikova, Irina V.

There is considerable debate about the functionality of long non-coding RNAs (lncRNAs). Lack of sequence conservation has been used to argue against functional relevance. Here, we investigated antisense lncRNAs, called COOLAIR, at the A. thaliana FLC locus and experimentally determined their secondary structure. The major COOLAIR variants are highly structured, organized by exon. The distally polyadenylated transcript has a complex multi-domain structure, altered by a single non-coding SNP defining a functionally distinct A. thaliana FLC haplotype. The A. thaliana COOLAIR secondary structure was used to predict COOLAIR exons in evolutionarily divergent Brassicaceae species. These predictions were validated through chemical probing and cloning. Despite the relatively low nucleotide sequence identity, the structures, including multi-helix junctions, show remarkable evolutionary conservation. In a number of places, the structure is conserved through covariation of a non-contiguous DNA sequence. This structural conservation supports a functional role for COOLAIR transcripts rather than, or in addition to, antisense transcription.
The genomic substrate for adaptive radiation in African cichlid fish.

PubMed

Brawand, David; Wagner, Catherine E; Li, Yang I; Malinsky, Milan; Keller, Irene; Fan, Shaohua; Simakov, Oleg; Ng, Alvin Y; Lim, Zhi Wei; Bezault, Etienne; Turner-Maier, Jason; Johnson, Jeremy; Alcazar, Rosa; Noh, Hyun Ji; Russell, Pamela; Aken, Bronwen; Alföldi, Jessica; Amemiya, Chris; Azzouzi, Naoual; Baroiller, Jean-François; Barloy-Hubler, Frederique; Berlin, Aaron; Bloomquist, Ryan; Carleton, Karen L; Conte, Matthew A; D'Cotta, Helena; Eshel, Orly; Gaffney, Leslie; Galibert, Francis; Gante, Hugo F; Gnerre, Sante; Greuter, Lucie; Guyon, Richard; Haddad, Natalie S; Haerty, Wilfried; Harris, Rayna M; Hofmann, Hans A; Hourlier, Thibaut; Hulata, Gideon; Jaffe, David B; Lara, Marcia; Lee, Alison P; MacCallum, Iain; Mwaiko, Salome; Nikaido, Masato; Nishihara, Hidenori; Ozouf-Costaz, Catherine; Penman, David J; Przybylski, Dariusz; Rakotomanga, Michaelle; Renn, Suzy C P; Ribeiro, Filipe J; Ron, Micha; Salzburger, Walter; Sanchez-Pulido, Luis; Santos, M Emilia; Searle, Steve; Sharpe, Ted; Swofford, Ross; Tan, Frederick J; Williams, Louise; Young, Sarah; Yin, Shuangye; Okada, Norihiro; Kocher, Thomas D; Miska, Eric A; Lander, Eric S; Venkatesh, Byrappa; Fernald, Russell D; Meyer, Axel; Ponting, Chris P; Streelman, J Todd; Lindblad-Toh, Kerstin; Seehausen, Ole; Di Palma, Federica

2014-09-18

Cichlid fishes are famous for large, diverse and replicated adaptive radiations in the Great Lakes of East Africa. To understand the molecular mechanisms underlying cichlid phenotypic diversity, we sequenced the genomes and transcriptomes of five lineages of African cichlids: the Nile tilapia (Oreochromis niloticus), an ancestral lineage with low diversity; and four members of the East African lineage: Neolamprologus brichardi/pulcher (older radiation, Lake Tanganyika), Metriaclima zebra (recent radiation, Lake Malawi), Pundamilia nyererei (very recent radiation, Lake Victoria), and Astatotilapia burtoni (riverine species around Lake Tanganyika). We found an excess of gene duplications in the East African lineage compared to tilapia and other teleosts, an abundance of non-coding element divergence, accelerated coding sequence evolution, expression divergence associated with transposable element insertions, and regulation by novel microRNAs. In addition, we analysed sequence data from sixty individuals representing six closely related species from Lake Victoria, and show genome-wide diversifying selection on coding and regulatory variants, some of which were recruited from ancient polymorphisms. We conclude that a number of molecular mechanisms shaped East African cichlid genomes, and that amassing of standing variation during periods of relaxed purifying selection may have been important in facilitating subsequent evolutionary diversification.
The genomic substrate for adaptive radiation in African cichlid fish

PubMed Central

Malinsky, Milan; Keller, Irene; Fan, Shaohua; Simakov, Oleg; Ng, Alvin Y.; Lim, Zhi Wei; Bezault, Etienne; Turner-Maier, Jason; Johnson, Jeremy; Alcazar, Rosa; Noh, Hyun Ji; Russell, Pamela; Aken, Bronwen; Alföldi, Jessica; Amemiya, Chris; Azzouzi, Naoual; Baroiller, Jean-François; Barloy-Hubler, Frederique; Berlin, Aaron; Bloomquist, Ryan; Carleton, Karen L.; Conte, Matthew A.; D'Cotta, Helena; Eshel, Orly; Gaffney, Leslie; Galibert, Francis; Gante, Hugo F.; Gnerre, Sante; Greuter, Lucie; Guyon, Richard; Haddad, Natalie S.; Haerty, Wilfried; Harris, Rayna M.; Hofmann, Hans A.; Hourlier, Thibaut; Hulata, Gideon; Jaffe, David B.; Lara, Marcia; Lee, Alison P.; MacCallum, Iain; Mwaiko, Salome; Nikaido, Masato; Nishihara, Hidenori; Ozouf-Costaz, Catherine; Penman, David J.; Przybylski, Dariusz; Rakotomanga, Michaelle; Renn, Suzy C. P.; Ribeiro, Filipe J.; Ron, Micha; Salzburger, Walter; Sanchez-Pulido, Luis; Santos, M. Emilia; Searle, Steve; Sharpe, Ted; Swofford, Ross; Tan, Frederick J.; Williams, Louise; Young, Sarah; Yin, Shuangye; Okada, Norihiro; Kocher, Thomas D.; Miska, Eric A.; Lander, Eric S.; Venkatesh, Byrappa; Fernald, Russell D.; Meyer, Axel; Ponting, Chris P.; Streelman, J. Todd; Lindblad-Toh, Kerstin; Seehausen, Ole; Di Palma, Federica

2015-01-01

Cichlid fishes are famous for large, diverse and replicated adaptive radiations in the Great Lakes of East Africa. To understand the molecular mechanisms underlying cichlid phenotypic diversity, we sequenced the genomes and transcriptomes of five lineages of African cichlids: the Nile tilapia (Oreochromis niloticus), an ancestral lineage with low diversity; and four members of the East African lineage: Neolamprologus brichardi/pulcher (older radiation, Lake Tanganyika), Metriaclima zebra (recent radiation, Lake Malawi), Pundamilia nyererei (very recent radiation, Lake Victoria), and Astatotilapia burtoni (riverine species around Lake Tanganyika). We found an excess of gene duplications in the East African lineage compared to tilapia and other teleosts, an abundance of non-coding element divergence, accelerated coding sequence evolution, expression divergence associated with transposable element insertions, and regulation by novel microRNAs. In addition, we analysed sequence data from sixty individuals representing six closely related species from Lake Victoria, and show genome-wide diversifying selection on coding and regulatory variants, some of which were recruited from ancient polymorphisms. We conclude that a number of molecular mechanisms shaped East African cichlid genomes, and that amassing of standing variation during periods of relaxed purifying selection may have been important in facilitating subsequent evolutionary diversification. PMID:25186727
Prevalence of the prion protein gene E211K variant in U.S. cattle

PubMed Central

Heaton, Michael P; Keele, John W; Harhay, Gregory P; Richt, Jürgen A; Koohmaraie, Mohammad; Wheeler, Tommy L; Shackelford, Steven D; Casas, Eduardo; King, D Andy; Sonstegard, Tad S; Van Tassell, Curtis P; Neibergs, Holly L; Chase, Chad C; Kalbfleisch, Theodore S; Smith, Timothy PL; Clawson, Michael L; Laegreid, William W

2008-01-01

Background In 2006, an atypical U.S. case of bovine spongiform encephalopathy (BSE) was discovered in Alabama and later reported to be polymorphic for glutamate (E) and lysine (K) codons at position 211 in the bovine prion protein gene (Prnp) coding sequence. A bovine E211K mutation is important because it is analogous to the most common pathogenic mutation in humans (E200K), which causes hereditary Creutzfeldt-Jakob disease, an autosomal dominant form of prion disease. The present report describes a high-throughput matrix-associated laser desorption/ionization-time-of-flight mass spectrometry assay for scoring the Prnp E211K variant and its use to determine an upper limit for the K211 allele frequency in U.S. cattle. Results The K211 allele was not detected in 6062 cattle, including those from five commercial beef processing plants (3892 carcasses) and 2170 registered cattle from 42 breeds. Multiple nearby polymorphisms in the Prnp coding sequence of 1456 diverse purebred cattle (42 breeds) did not interfere with scoring E211 or K211 alleles. Based on these results, the upper bound for the prevalence of the E211K variant was estimated to be extremely low, less than 1 in 2000 cattle (Bayesian analysis based on 95% quantile of the posterior distribution with a uniform prior). Conclusion No groups or breeds of U.S. cattle are presently known to harbor the Prnp K211 allele. Because a carrier was not detected, the number of additional atypical BSE cases with K211 will also be vanishingly low. PMID:18625065
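The "less than 1 in 2000" upper bound reported above can be reproduced approximately with a short worked calculation. This is a sketch under stated assumptions, not the authors' code: it assumes each of the 6062 animals is an independent binomial trial and that the uniform prior is Beta(1, 1), so that zero carriers in n animals gives a Beta(1, n + 1) posterior. The function name is hypothetical.

```python
def upper_bound_zero_observed(n_sampled, quantile=0.95):
    """Upper posterior quantile of a prevalence when 0 of n_sampled are carriers.

    Assumes a uniform Beta(1, 1) prior, so the posterior is Beta(1, n + 1)
    with CDF F(p) = 1 - (1 - p)**(n + 1). Solving F(p) = quantile for p
    gives the closed form returned below.
    """
    return 1.0 - (1.0 - quantile) ** (1.0 / (n_sampled + 1))


bound = upper_bound_zero_observed(6062)
print(f"95% upper bound on prevalence: {bound:.6f}")
print(f"Equivalent to about 1 in {1 / bound:.0f} cattle")
```

Under these assumptions the bound falls just below 1 in 2000, in line with the figure quoted in the abstract; the published analysis may differ in detail.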
Identification of new TSGA10 transcript variants in human testis with conserved regulatory RNA elements in 5'untranslated region and distinct expression in breast cancer.

PubMed

Salehipour, Pouya; Nematzadeh, Mahsa; Mobasheri, Maryam Beigom; Afsharpad, Mandana; Mansouri, Kamran; Modarressi, Mohammad Hossein

2017-09-01

Testis specific gene antigen 10 (TSGA10) is a cancer testis antigen involved in the process of spermatogenesis. TSGA10 could also play an important role in the inhibition of angiogenesis by preventing nuclear localization of HIF-1α. Although it has been shown that TSGA10 messenger RNA (mRNA) is mainly expressed in testis and some tumors, the transcription pattern and regulatory mechanisms of this gene remain largely unknown. Here, we report that human TSGA10 comprises at least 22 exons and generates four different transcript variants. The use of two distinct promoters and the splicing of exons 4 and 7 produce these transcript variants, which share the same coding sequence but differ in the sequence of their 5' untranslated region (5'UTR). This is significant because conserved regulatory RNA elements such as an upstream open reading frame (uORF) and a putative internal ribosome entry site (IRES) were found in this region; they occur in different combinations in each transcript variant and may influence the translational efficiency of the variants in normal or unusual environmental conditions like hypoxia. To indicate the transcription pattern of TSGA10 in breast cancer, expression of identified transcript variants was analyzed in 62 breast cancer samples. We found that TSGA10 tends to express variants with shorter 5'UTR and fewer uORF elements in breast cancer tissues. Our study demonstrates for the first time the expression of different TSGA10 transcript variants in testis and breast cancer tissues and provides a first clue to a role of TSGA10 5'UTR in regulation of translation in unusual environmental conditions like hypoxia. Copyright © 2017. Published by Elsevier B.V.
Correlation of rare coding variants in the gene encoding human glucokinase regulatory protein with phenotypic, cellular, and kinetic outcomes.

PubMed

Rees, Matthew G; Ng, David; Ruppert, Sarah; Turner, Clesson; Beer, Nicola L; Swift, Amy J; Morken, Mario A; Below, Jennifer E; Blech, Ilana; Mullikin, James C; McCarthy, Mark I; Biesecker, Leslie G; Gloyn, Anna L; Collins, Francis S

2012-01-01

Defining the genetic contribution of rare variants to common diseases is a major basic and clinical science challenge that could offer new insights into disease etiology and provide potential for directed gene- and pathway-based prevention and treatment. Common and rare nonsynonymous variants in the GCKR gene are associated with alterations in metabolic traits, most notably serum triglyceride levels. GCKR encodes glucokinase regulatory protein (GKRP), a predominantly nuclear protein that inhibits hepatic glucokinase (GCK) and plays a critical role in glucose homeostasis. The mode of action of rare GCKR variants remains unexplored. We identified 19 nonsynonymous GCKR variants among 800 individuals from the ClinSeq medical sequencing project. Excluding the previously described common missense variant p.Pro446Leu, all variants were rare in the cohort. Accordingly, we functionally characterized all variants to evaluate their potential phenotypic effects. Defects were observed for the majority of the rare variants after assessment of cellular localization, ability to interact with GCK, and kinetic activity of the encoded proteins. Comparing the individuals with functional rare variants to those without such variants showed associations with lipid phenotypes. Our findings suggest that, while nonsynonymous GCKR variants, excluding p.Pro446Leu, are rare in individuals of mixed European descent, the majority do affect protein function. In sum, this study utilizes computational, cell biological, and biochemical methods to present a model for interpreting the clinical significance of rare genetic variants in common disease.
DNA Sequence Variants in PPARGC1A, a Gene Encoding a Coactivator of the ω-3 LCPUFA Sensing PPAR-RXR Transcription Complex, Are Associated with NV AMD and AMD-Associated Loci in Genes of Complement and VEGF Signaling Pathways

PubMed Central

SanGiovanni, John Paul; Chen, Jing; Sapieha, Przemyslaw; Aderman, Christopher M.; Stahl, Andreas; Clemons, Traci E.; Chew, Emily Y.; Smith, Lois E. H.

2013-01-01

Background Increased intake of ω-3 long-chain polyunsaturated fatty acids (LCPUFAs) and use of peroxisome proliferator activator receptor (PPAR)-activating drugs are associated with attenuation of pathologic retinal angiogenesis. ω-3 LCPUFAs are endogenous agonists of PPARs. We postulated that DNA sequence variation in PPAR gamma (PPARG) co-activator 1 alpha (PPARGC1A), a gene encoding a co-activator of the LCPUFA-sensing PPARG-retinoid X receptor (RXR) transcription complex, may influence neovascularization (NV) in age-related macular degeneration (AMD). Methods We applied exact testing methods to examine distributions of DNA sequence variants in PPARGC1A for association with NV AMD and interaction of AMD-associated loci in genes of complement, lipid metabolism, and VEGF signaling systems. Our sample contained 1858 people from 3 elderly cohorts of western European ancestry. We concurrently investigated retinal gene expression profiles in 17-day-old neonatal mice on a 2% LCPUFA feeding paradigm to identify LCPUFA-regulated genes both associated with pathologic retinal angiogenesis and known to interact with PPARs or PPARGC1A. Results A DNA coding variant (rs3736265) and a 3'UTR-resident regulatory variant (rs3774923) in PPARGC1A were independently associated with NV AMD (exact P = 0.003, both SNPs). SNP-SNP interactions existed for NV AMD (P<0.005) with rs3736265 and an AMD-associated variant in complement factor B (CFB, rs512559). PPARGC1A influences activation of the AMD-associated complement component 3 (C3) promoter fragment and CFB influences activation and proteolysis of C3. We observed interaction (P≤0.003) of rs3736265 with a variant in vascular endothelial growth factor A (VEGFA, rs3025033), a key molecule in retinal angiogenesis. Another PPARGC1A coding variant (rs8192678) showed statistical interaction with a SNP in the VEGFA receptor fms-related tyrosine kinase 1 (FLT1, rs10507386; P≤0.003). C3 expression was down-regulated 2-fold in retinas of ω-3 LCPUFA-fed mice; these animals also showed 70% reduction in retinal NV (P≤0.001). Conclusion Ligands and co-activators of the ω-3 LCPUFA sensing PPAR-RXR axis may influence retinal angiogenesis in NV AMD via the complement and VEGF signaling systems. We have linked the co-activator of a lipid-sensing transcription factor (PPARG co-activator 1 alpha, PPARGC1A) to age-related macular degeneration (AMD) and AMD-associated genes. PMID:23335958
No evidence that protein truncating variants in BRIP1 are associated with breast cancer risk: implications for gene panel testing.

PubMed

Easton, Douglas F; Lesueur, Fabienne; Decker, Brennan; Michailidou, Kyriaki; Li, Jun; Allen, Jamie; Luccarini, Craig; Pooley, Karen A; Shah, Mitul; Bolla, Manjeet K; Wang, Qin; Dennis, Joe; Ahmad, Jamil; Thompson, Ella R; Damiola, Francesca; Pertesi, Maroulio; Voegele, Catherine; Mebirouk, Noura; Robinot, Nivonirina; Durand, Geoffroy; Forey, Nathalie; Luben, Robert N; Ahmed, Shahana; Aittomäki, Kristiina; Anton-Culver, Hoda; Arndt, Volker; Baynes, Caroline; Beckman, Matthias W; Benitez, Javier; Van Den Berg, David; Blot, William J; Bogdanova, Natalia V; Bojesen, Stig E; Brenner, Hermann; Chang-Claude, Jenny; Chia, Kee Seng; Choi, Ji-Yeob; Conroy, Don M; Cox, Angela; Cross, Simon S; Czene, Kamila; Darabi, Hatef; Devilee, Peter; Eriksson, Mikael; Fasching, Peter A; Figueroa, Jonine; Flyger, Henrik; Fostira, Florentia; García-Closas, Montserrat; Giles, Graham G; Glendon, Gord; González-Neira, Anna; Guénel, Pascal; Haiman, Christopher A; Hall, Per; Hart, Steven N; Hartman, Mikael; Hooning, Maartje J; Hsiung, Chia-Ni; Ito, Hidemi; Jakubowska, Anna; James, Paul A; John, Esther M; Johnson, Nichola; Jones, Michael; Kabisch, Maria; Kang, Daehee; Kosma, Veli-Matti; Kristensen, Vessela; Lambrechts, Diether; Li, Na; Lindblom, Annika; Long, Jirong; Lophatananon, Artitaya; Lubinski, Jan; Mannermaa, Arto; Manoukian, Siranoush; Margolin, Sara; Matsuo, Keitaro; Meindl, Alfons; Mitchell, Gillian; Muir, Kenneth; Nevelsteen, Ines; van den Ouweland, Ans; Peterlongo, Paolo; Phuah, Sze Yee; Pylkäs, Katri; Rowley, Simone M; Sangrajrang, Suleeporn; Schmutzler, Rita K; Shen, Chen-Yang; Shu, Xiao-Ou; Southey, Melissa C; Surowy, Harald; Swerdlow, Anthony; Teo, Soo H; Tollenaar, Rob A E M; Tomlinson, Ian; Torres, Diana; Truong, Thérèse; Vachon, Celine; Verhoef, Senno; Wong-Brown, Michelle; Zheng, Wei; Zheng, Ying; Nevanlinna, Heli; Scott, Rodney J; Andrulis, Irene L; Wu, Anna H; Hopper, John L; Couch, Fergus J; Winqvist, Robert; Burwinkel, Barbara; Sawyer, Elinor J; Schmidt, Marjanka K; 
Rudolph, Anja; Dörk, Thilo; Brauch, Hiltrud; Hamann, Ute; Neuhausen, Susan L; Milne, Roger L; Fletcher, Olivia; Pharoah, Paul D P; Campbell, Ian G; Dunning, Alison M; Le Calvez-Kelm, Florence; Goldgar, David E; Tavtigian, Sean V; Chenevix-Trench, Georgia

2016-05-01

BRCA1 interacting protein C-terminal helicase 1 (BRIP1) is one of the Fanconi Anaemia Complementation (FANC) group family of DNA repair proteins. Biallelic mutations in BRIP1 are responsible for FANC group J, and previous studies have also suggested that rare protein truncating variants in BRIP1 are associated with an increased risk of breast cancer. These studies have led to inclusion of BRIP1 on targeted sequencing panels for breast cancer risk prediction. We evaluated a truncating variant, p.Arg798Ter (rs137852986), and 10 missense variants of BRIP1, in 48 144 cases and 43 607 controls of European origin, drawn from 41 studies participating in the Breast Cancer Association Consortium (BCAC). Additionally, we sequenced the coding regions of BRIP1 in 13 213 cases and 5242 controls from the UK, 1313 cases and 1123 controls from three population-based studies as part of the Breast Cancer Family Registry, and 1853 familial cases and 2001 controls from Australia. The rare truncating allele of rs137852986 was observed in 23 cases and 18 controls in Europeans in BCAC (OR 1.09, 95% CI 0.58 to 2.03, p=0.79). Truncating variants were found in the sequencing studies in 34 cases (0.21%) and 19 controls (0.23%) (combined OR 0.90, 95% CI 0.48 to 1.70, p=0.75). These results suggest that truncating variants in BRIP1, and in particular p.Arg798Ter, are not associated with a substantial increase in breast cancer risk. Such observations have important implications for the reporting of results from breast cancer screening panels. Published by the BMJ Publishing Group Limited.
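The carrier percentages quoted for the sequencing studies can be checked from the counts in the abstract, and a crude pooled odds ratio gives a rough sense of the reported effect size. The sketch below is illustrative only: it pools the three sequencing studies into one 2x2 table and uses a Woolf (log-scale) confidence interval, whereas the published combined estimate presumably accounts for study, so the numbers are not expected to match exactly. The function name is hypothetical.

```python
import math


def crude_odds_ratio(case_carriers, case_total, control_carriers, control_total):
    """Unadjusted odds ratio with a Woolf 95% confidence interval."""
    a = case_carriers
    b = case_total - case_carriers
    c = control_carriers
    d = control_total - control_carriers
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = math.exp(math.log(odds_ratio) - 1.96 * se_log)
    high = math.exp(math.log(odds_ratio) + 1.96 * se_log)
    return odds_ratio, low, high


# Totals pooled across the UK, Breast Cancer Family Registry and Australian sets.
cases_total = 13213 + 1313 + 1853
controls_total = 5242 + 1123 + 2001

case_pct = 100 * 34 / cases_total
control_pct = 100 * 19 / controls_total
odds_ratio, ci_low, ci_high = crude_odds_ratio(34, cases_total, 19, controls_total)

print(f"Carriers among cases:    {case_pct:.2f}%")
print(f"Carriers among controls: {control_pct:.2f}%")
print(f"Crude OR {odds_ratio:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```

The percentages round to the 0.21% and 0.23% given in the abstract, and the crude interval spans 1, consistent with the conclusion of no substantial association.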
Characterization and Expression of the Lucina pectinata Oxygen and Sulfide Binding Hemoglobin Genes

PubMed Central

López-Garriga, Juan; Cadilla, Carmen L.

2016-01-01

The clam Lucina pectinata lives in sulfide-rich muds and houses intracellular symbiotic bacteria that need to be supplied with hydrogen sulfide and oxygen. This clam possesses three hemoglobins: hemoglobin I (HbI), a sulfide-reactive protein, and hemoglobin II (HbII) and III (HbIII), which are oxygen-reactive. We characterized the complete gene sequence and promoter regions for the oxygen reactive hemoglobins and the partial structure and promoters of the HbI gene from Lucina pectinata. We show that HbI has two mRNA variants, where the 5' end had either a sequence of 96 bp (long variant) or 37 bp (short variant). The gene structure of the oxygen reactive Hbs is defined by having 4-exons/3-introns with conservation of intron location at B12.2 and G7.0 and the presence of pre-coding introns, while the partial gene structure of HbI has the same intron conservation but appears to have a 5-exon/4-intron structure. A search for putative transcription factor binding sites (TFBSs) was done with the promoters for HbII, HbIII, HbI short and HbI long. The HbII, HbIII and HbI long promoters showed similar predicted TFBSs. We also characterized MITE-like elements in the HbI and HbII gene promoters and intronic regions that are similar to sequences found in other mollusk genomes. The gene expression levels of the clam Hbs from sulfide-rich and sulfide-poor environments showed a significant decrease of expression in the symbiont-containing tissue for those clams in a sulfide-poor environment, suggesting that the sulfide concentration may be involved in the regulation of these proteins. Gene expression evaluation of the two HbI mRNA variants indicated that the longer variant is expressed at higher levels than the shorter variant in both environments. PMID:26824233

Fast single-pass alignment and variant calling using sequencing data

USDA-ARS's Scientific Manuscript database

Sequencing research requires efficient computation. Few programs use already known information about DNA variants when aligning sequence data to the reference map. New program findmap.f90 reads the previous variant list before aligning sequence, calling variant alleles, and summing the allele counts...
Diverse point mutations in the human gene for polymorphic N-acetyltransferase

DOE Office of Scientific and Technical Information (OSTI.GOV)

Vatsis, K.P.; Martell, K.J.; Weber, W.W.

1991-07-15

Classification of humans as rapid or slow acetylators is based on hereditary differences in rates of N-acetylation of therapeutic and carcinogenic agents, but N-acetylation of certain arylamine drugs displays no genetic variation. Two highly homologous human genes for N-acetyltransferase, NAT1 and NAT2, presumably code for the genetically invariant and variant NAT proteins, respectively. In the present investigation, 1.9-kilobase human genomic EcoRI fragments encoding NAT2 were generated by the polymerase chain reaction with liver and leukocyte DNA from seven subjects phenotyped as homozygous and heterozygous acetylators. Direct sequencing revealed multiple point mutations in the coding region of two distinct NAT2 variants. One of these was derived from leukocytes of a slow acetylator and was distinguished by a silent mutation (codon 94) and a separate G → A transition (position 590) leading to replacement of Arg-197 by Gln; the mutated guanine was part of a CpG dinucleotide and a Taq I site. The second NAT2 variant originated from liver with low N-acetylation activity. It was characterized by three nucleotide transitions giving rise to a silent mutation (codon 161), accompanied by obliteration of the sole Kpn I site, and two amino acid substitutions. The results show conclusively that the genetically variant NAT is encoded by NAT2.
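The relationship between nucleotide position 590 and residue 197 in the abstract above can be verified with simple codon arithmetic. The sketch below assumes that positions are numbered from the first base of the start codon. The reference codon CGA is not stated in the abstract; it is an inference from the statement that the mutated guanine lies in a CpG dinucleotide within a Taq I site (recognition sequence TCGA). Function and variable names are hypothetical.

```python
def locate_in_codon(nt_position):
    """Map a 1-based coding-sequence position to (codon number, base in codon)."""
    codon_number = (nt_position - 1) // 3 + 1
    base_in_codon = (nt_position - 1) % 3 + 1
    return codon_number, base_in_codon


# Only the two codons needed for this example (standard genetic code).
CODON_TABLE = {"CGA": "Arg", "CAA": "Gln"}

codon_number, base_in_codon = locate_in_codon(590)

# Assumed reference codon, inferred from the Taq I site (TCGA) and CpG context.
reference_codon = "CGA"
mutant_codon = (reference_codon[:base_in_codon - 1]
                + "A"
                + reference_codon[base_in_codon:])

print(f"Position 590 is base {base_in_codon} of codon {codon_number}")
print(f"{reference_codon} ({CODON_TABLE[reference_codon]}) -> "
      f"{mutant_codon} ({CODON_TABLE[mutant_codon]})")
```

Under these assumptions the G → A change at the second base of codon 197 converts an arginine codon to a glutamine codon and destroys the Taq I site, matching the description in the abstract.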
The long non-coding RNA GAS5 differentially regulates cell cycle arrest and apoptosis through activation of BRCA1 and p53 in human neuroblastoma

PubMed Central

Mazar, Joseph; Rosado, Amy; Shelley, John; Marchica, John; Westmoreland, Tamarah J

2017-01-01

The long non-coding RNA GAS5 has been shown to modulate cancer proliferation in numerous human cancer systems and has been correlated with successful patient outcome. Our examination of GAS5 in neuroblastoma has revealed robust expression in both MYCN-amplified and non-amplified cell lines. Knockdown of GAS5 in vitro resulted in defects in cell proliferation, apoptosis, and induced cell cycle arrest. Further analysis of GAS5 clones revealed multiple novel splice variants, two of which inversely modulated with MYCN status. Complementation studies of the variants post-knockdown of GAS5 indicated alternate phenotypes, with one variant (FL) considerably enhancing cell proliferation by rescuing cell cycle arrest and the other (C2) driving apoptosis, suggesting a unique role for each in neuroblastoma cancer physiology. Global sequencing and ELISA arrays revealed that the loss of GAS5 induced p53, BRCA1, and GADD45A, which appeared to modulate cell cycle arrest in concert. Complementation with only the FL GAS5 clone could rescue cell cycle arrest, stabilizing HDM2, and leading to the loss of p53. Together, these data offer novel therapeutic targets in the form of lncRNA splice variants for separate challenges against cancer growth and cell death. PMID:28035057
Inheritance-mode specific pathogenicity prioritization (ISPP) for human protein coding genes.

PubMed

Hsu, Jacob Shujui; Kwan, Johnny S H; Pan, Zhicheng; Garcia-Barcelo, Maria-Mercè; Sham, Pak Chung; Li, Miaoxin

2016-10-15

Exome sequencing studies have facilitated the detection of causal genetic variants in yet-unsolved Mendelian diseases. However, the identification of disease causal genes among a list of candidates in an exome sequencing study is still not fully settled, and it is often difficult to prioritize candidate genes for follow-up studies. The inheritance mode provides crucial information for understanding Mendelian diseases, but none of the existing gene prioritization tools fully utilize this information. We examined the characteristics of Mendelian disease genes under different inheritance modes. The results suggest that Mendelian disease genes with autosomal dominant (AD) inheritance mode are more haploinsufficiency and de novo mutation sensitive, whereas those autosomal recessive (AR) genes have significantly more non-synonymous variants and regulatory transcript isoforms. In addition, the X-linked (XL) Mendelian disease genes have fewer non-synonymous and synonymous variants. As a result, we derived a new scoring system for prioritizing candidate genes for Mendelian diseases according to the inheritance mode. Our scoring system assigned to each annotated protein-coding gene (N = 18 859) three pathogenic scores according to the inheritance mode (AD, AR and XL). This inheritance mode-specific framework achieved higher accuracy (area under curve = 0.84) in XL mode. The inheritance-mode specific pathogenicity prioritization (ISPP) outperformed other well-known methods including Haploinsufficiency, Recessive, Network centrality, Genic Intolerance, Gene Damage Index and Gene Constraint scores. This systematic study suggests that genes manifesting disease inheritance modes tend to have unique characteristics. ISPP is included in KGGSeq v1.0 (http://grass.cgs.hku.hk/limx/kggseq/), and source code is available from (https://github.com/jacobhsu35/ISPP.git). Contact: mxli@hku.hk. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved.
Polymorphisms in adenosine receptor genes are associated with infarct size in patients with ischemic cardiomyopathy.

PubMed

Tang, Z; Diamond, M A; Chen, J-M; Holly, T A; Bonow, R O; Dasgupta, A; Hyslop, T; Purzycki, A; Wagner, J; McNamara, D M; Kukulski, T; Wos, S; Velazquez, E J; Ardlie, K; Feldman, A M

2007-10-01

The goal of this experiment was to identify the presence of genetic variants in the adenosine receptor genes and assess their relationship to infarct size in a population of patients with ischemic cardiomyopathy. Adenosine receptors play an important role in protecting the heart during ischemia and in mediating the effects of ischemic preconditioning. We sequenced DNA samples from 273 individuals with ischemic cardiomyopathy and from 203 normal controls to identify the presence of genetic variants in the adenosine receptor genes. Subsequently, we analyzed the relationship between the identified genetic variants and infarct size, left ventricular size, and left ventricular function. Three variants in the 3'-untranslated region of the A1-adenosine gene (nt 1689 C/A, nt 2206 Tdel, nt 2683del36) and an informative polymorphism in the coding region of the A3-adenosine gene (nt 1509 A/C I248L) were associated with changes in infarct size. These results suggest that genetic variants in the adenosine receptor genes may predict the heart's response to ischemia or injury and might also influence an individual's response to adenosine therapy.
Novel pedigree analysis implicates DNA repair and chromatin remodeling in multiple myeloma risk

PubMed Central

Curtin, Karen; Rajamanickam, Venkatesh; Jayabalan, David; Atanackovic, Djordje; Rajkumar, S. Vincent; Kumar, Shaji; Slager, Susan; Galia, Perrine; Demangel, Delphine; Salama, Mohamed; Joseph, Vijai; Lipkin, Steven M.; Dumontet, Charles; Vachon, Celine M.

2018-01-01

The high-risk pedigree (HRP) design is an established strategy to discover rare, highly-penetrant, Mendelian-like causal variants. Its success, however, in complex traits has been modest, largely due to challenges of genetic heterogeneity and complex inheritance models. We describe an HRP strategy that addresses intra-familial heterogeneity, and identifies inherited segments important for mapping regulatory risk. We apply this new Shared Genomic Segment (SGS) method in 11 extended, Utah, multiple myeloma (MM) HRPs, and subsequent exome sequencing in SGS regions of interest in 1063 MM / MGUS (monoclonal gammopathy of undetermined significance, a precursor to MM) cases and 964 controls from a jointly-called collaborative resource, including cases from the initial 11 HRPs. One genome-wide significant 1.8 Mb shared segment was found at 6q16. Exome sequencing in this region revealed predicted deleterious variants in USP45 (p.Gln691* and p.Gln621Glu), a gene known to influence DNA repair through endonuclease regulation. Additionally, a 1.2 Mb segment at 1p36.11 is inherited in two Utah HRPs, with coding variants identified in ARID1A (p.Ser90Gly and p.Met890Val), a key gene in the SWI/SNF chromatin remodeling complex. Our results provide compelling statistical and genetic evidence for segregating risk variants for MM. In addition, we demonstrate a novel strategy to use large HRPs for risk-variant discovery more generally in complex traits. PMID:29389935
Novel pedigree analysis implicates DNA repair and chromatin remodeling in multiple myeloma risk.

PubMed

Waller, Rosalie G; Darlington, Todd M; Wei, Xiaomu; Madsen, Michael J; Thomas, Alun; Curtin, Karen; Coon, Hilary; Rajamanickam, Venkatesh; Musinsky, Justin; Jayabalan, David; Atanackovic, Djordje; Rajkumar, S Vincent; Kumar, Shaji; Slager, Susan; Middha, Mridu; Galia, Perrine; Demangel, Delphine; Salama, Mohamed; Joseph, Vijai; McKay, James; Offit, Kenneth; Klein, Robert J; Lipkin, Steven M; Dumontet, Charles; Vachon, Celine M; Camp, Nicola J

2018-02-01

The high-risk pedigree (HRP) design is an established strategy to discover rare, highly-penetrant, Mendelian-like causal variants. Its success, however, in complex traits has been modest, largely due to challenges of genetic heterogeneity and complex inheritance models. We describe an HRP strategy that addresses intra-familial heterogeneity, and identifies inherited segments important for mapping regulatory risk. We apply this new Shared Genomic Segment (SGS) method in 11 extended, Utah, multiple myeloma (MM) HRPs, and subsequent exome sequencing in SGS regions of interest in 1063 MM / MGUS (monoclonal gammopathy of undetermined significance, a precursor to MM) cases and 964 controls from a jointly-called collaborative resource, including cases from the initial 11 HRPs. One genome-wide significant 1.8 Mb shared segment was found at 6q16. Exome sequencing in this region revealed predicted deleterious variants in USP45 (p.Gln691* and p.Gln621Glu), a gene known to influence DNA repair through endonuclease regulation. Additionally, a 1.2 Mb segment at 1p36.11 is inherited in two Utah HRPs, with coding variants identified in ARID1A (p.Ser90Gly and p.Met890Val), a key gene in the SWI/SNF chromatin remodeling complex. Our results provide compelling statistical and genetic evidence for segregating risk variants for MM. In addition, we demonstrate a novel strategy to use large HRPs for risk-variant discovery more generally in complex traits.
Construction of armored RNA containing long-size chimeric RNA by increasing the number and affinity of the pac site in exogenous RNA and sequence coding coat protein of the MS2 bacteriophage.

PubMed

Wei, Baojun; Wei, Yuxiang; Zhang, Kuo; Yang, Changmei; Wang, Jing; Xu, Ruihuan; Zhan, Sien; Lin, Guigao; Wang, Wei; Liu, Min; Wang, Lunan; Zhang, Rui; Li, Jinming

2008-01-01

The aim of this study was to construct a one-plasmid expression system for armored RNA containing long chimeric RNA by increasing the number and affinity of the pac sites. The plasmid pET-MS2-pac was constructed with one C-variant pac site, and then the plasmid pM-CR-2C containing 1,891-bp chimeric sequences and two C-variant pac sites was produced. Meanwhile, three plasmids (pM-CR-C, pM-CR-2W and pM-CR-W) were obtained as parallel controls with a different number and affinity of the pac site. Finally, the armored RNA was expressed and purified. The armored RNA with the 1,891-base target RNA was expressed successfully by the one-plasmid expression system with two C-variant pac sites, whereas with one pac site, whether or not the affinity was changed, only the 1,200-base target RNA was packaged. It was also found that the C-variant pac site could increase the expression efficiency of the armored RNA. The armored RNA with 1,891-bp exogenous RNA in our study showed ribonuclease resistance and stability at different time points and temperatures. The armored RNA with 1,891-base exogenous RNA was constructed, and the expression system can be used as a platform for preparation of armored RNA containing long RNA sequences. Copyright 2008 S. Karger AG, Basel.
Mistranslation: from adaptations to applications.

PubMed

Hoffman, Kyle S; O'Donoghue, Patrick; Brandl, Christopher J

2017-11-01

The conservation of the genetic code indicates that there was a single origin, but like all genetic material, the cell's interpretation of the code is subject to evolutionary pressure. Single nucleotide variations in tRNA sequences can modulate codon assignments by altering codon-anticodon pairing or tRNA charging. Either can increase translation errors and even change the code. The frozen accident hypothesis argued that changes to the code would destabilize the proteome and reduce fitness. In studies of model organisms, mistranslation often acts as an adaptive response. These studies reveal evolutionarily conserved mechanisms to maintain proteostasis even during high rates of mistranslation. This review discusses the evolutionary basis of altered genetic codes, how mistranslation is identified, and how deviations from the genetic code are exploited. We revisit early discoveries of genetic code deviations and provide examples of adaptive mistranslation events in nature. Lastly, we highlight innovations in synthetic biology to expand the genetic code. The genetic code is still evolving. Mistranslation increases proteomic diversity that enables cells to survive stress conditions or suppress a deleterious allele. Genetic code variants have been identified by genome and metagenome sequence analyses, suppressor genetics, and biochemical characterization. Understanding the mechanisms of translation and genetic code deviations enables the design of new codes to produce novel proteins. Engineering the translation machinery and expanding the genetic code to incorporate non-canonical amino acids are valuable tools in synthetic biology that are impacting biomedical research. This article is part of a Special Issue entitled "Biochemistry of Synthetic Biology - Recent Developments". Guest Editors: Dr. Ilka Heinemann and Dr. Patrick O'Donoghue. Copyright © 2017 Elsevier B.V. All rights reserved.
Exome sequencing and arrayCGH detection of gene sequence and copy number variation between ILS and ISS mouse strains.

PubMed

Dumas, Laura; Dickens, C Michael; Anderson, Nathan; Davis, Jonathan; Bennett, Beth; Radcliffe, Richard A; Sikela, James M

2014-06-01

It has been well documented that genetic factors can influence predisposition to develop alcoholism. While the underlying genomic changes may be of several types, two of the most common and disease associated are copy number variations (CNVs) and sequence alterations of protein coding regions. The goal of this study was to identify CNVs and single-nucleotide polymorphisms that occur in gene coding regions that may play a role in influencing the risk of an individual developing alcoholism. Toward this end, two mouse strains were used that have been selectively bred based on their differential sensitivity to alcohol: the Inbred long sleep (ILS) and Inbred short sleep (ISS) mouse strains. Differences in initial response to alcohol have been linked to risk for alcoholism, and the ILS/ISS strains are used to investigate the genetics of initial sensitivity to alcohol. Array comparative genomic hybridization (arrayCGH) and exome sequencing were conducted to identify CNVs and gene coding sequence differences, respectively, between ILS and ISS mice. Mouse arrayCGH was performed using catalog Agilent 1 × 244 k mouse arrays. Subsequently, exome sequencing was carried out using an Illumina HiSeq 2000 instrument. ArrayCGH detected 74 CNVs that were strain-specific (38 ILS/36 ISS), including several ISS-specific deletions that contained genes implicated in brain function and neurotransmitter release. Among several interesting coding variations detected by exome sequencing was the gain of a premature stop codon in the alpha-amylase 2B (AMY2B) gene specifically in the ILS strain. In total, exome sequencing detected 2,597 and 1,768 strain-specific exonic gene variants in the ILS and ISS mice, respectively. This study represents the most comprehensive and detailed genomic comparison of ILS and ISS mouse strains to date. The two complementary genome-wide approaches identified strain-specific CNVs and gene coding sequence variations that should provide strong candidates to contribute to the alcohol-related phenotypic differences associated with these strains.
GM2 Gangliosidosis in Shiba Inu Dogs with an In-Frame Deletion in HEXB.

PubMed

Kolicheski, A; Johnson, G S; Villani, N A; O'Brien, D P; Mhlanga-Mutangadura, T; Wenger, D A; Mikoloski, K; Eagleson, J S; Taylor, J F; Schnabel, R D; Katz, M L

2017-09-01

Consistent with a tentative diagnosis of neuronal ceroid lipofuscinosis (NCL), autofluorescent cytoplasmic storage bodies were found in neurons from the brains of 2 related Shiba Inu dogs with a young-adult onset, progressive neurodegenerative disease. Unexpectedly, no potentially causal NCL-related variants were identified in a whole-genome sequence generated with DNA from 1 of the affected dogs. Instead, the whole-genome sequence contained a homozygous 3 base pair (bp) deletion in a coding region of HEXB. The other affected dog also was homozygous for this 3-bp deletion. Mutations in the human HEXB ortholog cause Sandhoff disease, a type of GM2 gangliosidosis. Thin-layer chromatography confirmed that GM2 ganglioside had accumulated in an affected Shiba Inu brain. Enzymatic analysis confirmed that the GM2 gangliosidosis resulted from a deficiency in the HEXB encoded protein and not from a deficiency in products from HEXA or GM2A, which are known alternative causes of GM2 gangliosidosis. We conclude that the homozygous 3-bp deletion in HEXB is the likely cause of the Shiba Inu neurodegenerative disease and that whole-genome sequencing can lead to the early identification of potentially disease-causing DNA variants thereby refocusing subsequent diagnostic analyses toward confirming or refuting candidate variant causality. Copyright © 2017 The Authors. Journal of Veterinary Internal Medicine published by Wiley Periodicals, Inc. on behalf of the American College of Veterinary Internal Medicine.
Investigating intra-host and intra-herd sequence diversity of foot-and-mouth disease virus.

PubMed

King, David J; Freimanis, Graham L; Orton, Richard J; Waters, Ryan A; Haydon, Daniel T; King, Donald P

2016-10-01

Due to the poor fidelity of the enzymes involved in RNA genome replication, foot-and-mouth disease (FMD) virus samples comprise unique polymorphic populations. In this study, deep sequencing was utilised to characterise the diversity of FMD virus (FMDV) populations in 6 infected cattle present on a single farm during the series of outbreaks in the UK in 2007. A novel RT-PCR method was developed to amplify a 7.6 kb nucleotide fragment encompassing the polyprotein coding region of the FMDV genome. Illumina sequencing of each sample identified the fine polymorphic structures at each nucleotide position, from consensus-level changes to variants present at a 0.24% frequency. These data were used to investigate population dynamics of FMDV at both herd and host levels, to evaluate the impact of host on the viral swarm structure and to identify transmission links with viruses recovered from other farms in the same series of outbreaks. In 7 samples, from 6 different animals, a total of 5 consensus-level variants were identified, in addition to 104 sub-consensus variants of which 22 were shared between 2 or more animals. Further analysis revealed differences in swarm structures from samples derived from the same animal, suggesting the presence of distinct viral populations evolving independently at different lesion sites within the same infected animal. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.
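As a rough illustration of how sub-consensus variants can be separated from the consensus at one nucleotide position, the Python sketch below takes per-base read counts and reports minor bases at or above a frequency cut-off. The function name, the read counts, and the use of 0.24% as a threshold are assumptions made for the example; the abstract does not describe the authors' actual variant-calling procedure.

```python
def call_site(counts, min_freq=0.0024):
    """Return (consensus_base, [(minor_base, frequency), ...]) for one site.

    counts   -- mapping of base -> number of reads supporting it
    min_freq -- assumed reporting threshold (0.24% here, for illustration)
    """
    depth = sum(counts.values())
    if depth == 0:
        raise ValueError("no reads cover this position")
    # The consensus is the most frequently observed base.
    consensus = max(counts, key=counts.get)
    # Minor bases that reach the threshold are reported as sub-consensus.
    minors = [
        (base, n / depth)
        for base, n in sorted(counts.items())
        if base != consensus and n / depth >= min_freq
    ]
    return consensus, minors


# Hypothetical pileup of 10,000 reads at a single position.
example_counts = {"A": 9950, "C": 20, "G": 30, "T": 0}
consensus, minors = call_site(example_counts)
```

In this toy pileup, G (30 of 10,000 reads) passes the cut-off while C (20 of 10,000) does not. A real pipeline would also need to account for sequencing and amplification error before trusting variants this rare.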
Molecular analysis of abnormal hemoglobins in beta chain in Aegean region of Turkey and first reports of hemoglobin Andrew-Minneapolis and Hb Hinsdale from Turkey.

PubMed

Aykut, Ayça; Onay, Hüseyin; Durmaz, Asude; Karaca, Emin; Vergin, Canan; Aydınok, Yeşim; Özkınay, Ferda

2015-07-01

The Aegean is one of the regions in Turkey where thalassemias and abnormal hemoglobins (Hbs) are prevalent. Combined heterozygosity of thalassemia mutations with a variety of structural Hb variants leads to an extremely wide spectrum of clinical and hematological phenotypes, which is of importance for prenatal diagnosis. One hundred and seventeen patients and carriers, diagnosed by hemoglobin electrophoresis (HPLC) and at risk for hemoglobinopathies, were screened by mutational analysis of the beta-globin gene. The full coding, 5' UTR, and 3' UTR sequences of the beta-globin gene (GenBank accession no. U01317) were amplified and sequenced. In this study, a total of 118 (12.24%) structural Hb variant alleles were identified in 1341 mutated beta-chain alleles in the Medical Genetics Department of Ege University between January 2006 and November 2013. Here, we report the mutation spectrum of abnormal Hbs associated with the beta-globin gene in the Aegean region of Turkey. In the present study, the Hb Hinsdale and Hb Andrew-Minneapolis variants are demonstrated for the first time in the Turkish population.
De novo mutations in genes of mediator complex causing syndromic intellectual disability: mediatorpathy or transcriptomopathy?

PubMed

Caro-Llopis, Alfonso; Rosello, Monica; Orellana, Carmen; Oltra, Silvestre; Monfort, Sandra; Mayo, Sonia; Martinez, Francisco

2016-12-01

Mutations in the X-linked gene MED12 cause at least three different, but closely related, entities of syndromic intellectual disability. Recently, a new syndrome caused by MED13L deleterious variants has been described, which shows similar clinical manifestations including intellectual disability, hypotonia, and other congenital anomalies. Genotyping of 1,256 genes related to neurodevelopment was performed by next-generation sequencing in three unrelated patients and their healthy parents. Clinically relevant findings were confirmed by conventional sequencing. Each patient showed one de novo variant not previously reported in the literature or databases. Two different missense variants were found in the MED12 or MED13L genes and one nonsense mutation was found in the MED13L gene. The phenotypic consequences of these mutations are closely related and/or have been previously reported in one or the other gene. Additionally, MED12 and MED13L code for two closely related partners of the mediator kinase module. Consequently, we propose the concept of a common MED12/MED13L clinical spectrum, encompassing Opitz-Kaveggia syndrome, Lujan-Fryns syndrome, Ohdo syndrome, MED13L haploinsufficiency syndrome, and others.
Ghrelin gene: identification of missense variants and a frameshift mutation in extremely obese children and adolescents and healthy normal weight students.

PubMed

Hinney, Anke; Hoch, Anne; Geller, Frank; Schäfer, Helmut; Siegfried, Wolfgang; Goldschmidt, Hanspeter; Remschmidt, Helmut; Hebebrand, Johannes

2002-06-01

Ghrelin induces obesity via central and peripheral mechanisms. Administration of ghrelin leads to increased food intake and decreased fat utilisation in rodents. Ghrelin levels are decreased in obese individuals. Recently, a polymorphism (Arg-51-Gln) within the ghrelin gene (GHRL) was described to be associated with obesity. We screened the GHRL coding region in 215 extremely obese German children and adolescents (study group 1) and 93 normal weight students (study group 2) by single strand conformation polymorphism analysis (SSCP). We found the two previously described single nucleotide polymorphisms (SNP: Arg-51-Gln and Leu-72-Met) in similar frequencies in study groups 1 and 2 (allele frequencies were: 0.019 and 0.016 for the 51-Gln allele and 0.091 and 0.086 for the 72-Met allele, respectively). Hence, we could not confirm the previous finding. Additionally, two novel variants were identified within the coding region: (1) We detected one healthy normal weight individual with a frameshift mutation (2-bp deletion at codon 34). This frameshift mutation affects the coding region of the mature ghrelin. Hence, it is highly likely that the normal weight student is haplo-insufficient for ghrelin. (2) An A to T transversion leads to an amino acid exchange from Gln to Leu at amino acid position 90. The frequency of the 90-Leu allele was significantly higher in the extremely obese children and adolescents (0.063) than in the normal weight students (0.016; nominal p = 0.011). Additionally, we genotyped 134 underweight students and 44 normal weight adults for this SNP. Genotype frequencies were similar in extremely obese children and adolescents, underweight students and normal weight adults (p > 0.8). In conclusion, we identified four sequence variants in the coding region of the ghrelin gene in individuals belonging to different weight extremes. A frameshift mutation was detected in a normal weight individual. None of the variants seem to influence weight regulation.
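The allele-frequency comparison for the 90-Leu allele can be sketched in Python with a two-sided Fisher's exact test. The abstract does not state which test produced the nominal p-value, so the choice of test is an assumption. The allele counts (27 of 430 and 3 of 186) are reconstructed from the reported frequencies and group sizes, assuming two alleles per individual and complete genotyping.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(col1, row1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))


# Reconstructed allele counts (assumption: 2 alleles per person, no missing data).
obese_alleles = 2 * 215      # 430 alleles in study group 1
control_alleles = 2 * 93     # 186 alleles in study group 2
obese_leu = 27               # 27 / 430 rounds to the reported 0.063
control_leu = 3              # 3 / 186 rounds to the reported 0.016

p_value = fisher_exact_two_sided(
    obese_leu, obese_alleles - obese_leu,
    control_leu, control_alleles - control_leu,
)
```

Because the counts are back-calculated and the original test is unknown, the resulting p-value should not be expected to match the reported nominal value exactly.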
Genome-wide association study yields variants at 20p12.2 that associate with urinary bladder cancer.

PubMed

Rafnar, Thorunn; Sulem, Patrick; Thorleifsson, Gudmar; Vermeulen, Sita H; Helgason, Hannes; Saemundsdottir, Jona; Gudjonsson, Sigurjon A; Sigurdsson, Asgeir; Stacey, Simon N; Gudmundsson, Julius; Johannsdottir, Hrefna; Alexiusdottir, Kristin; Petursdottir, Vigdis; Nikulasson, Sigfus; Geirsson, Gudmundur; Jonsson, Thorvaldur; Aben, Katja K H; Grotenhuis, Anne J; Verhaegh, Gerald W; Dudek, Aleksandra M; Witjes, J Alfred; van der Heijden, Antoine G; Vrieling, Alina; Galesloot, Tessel E; De Juan, Ana; Panadero, Angeles; Rivera, Fernando; Hurst, Carolyn; Bishop, D Timothy; Sak, Sei C; Choudhury, Ananya; Teo, Mark T W; Arici, Cecilia; Carta, Angela; Toninelli, Elena; de Verdier, Petra; Rudnai, Peter; Gurzau, Eugene; Koppova, Kvetoslava; van der Keur, Kirstin A; Lurkin, Irene; Goossens, Mieke; Kellen, Eliane; Guarrera, Simonetta; Russo, Alessia; Critelli, Rossana; Sacerdote, Carlotta; Vineis, Paolo; Krucker, Clémentine; Zeegers, Maurice P; Gerullis, Holger; Ovsiannikov, Daniel; Volkert, Frank; Hengstler, Jan G; Selinski, Silvia; Magnusson, Olafur T; Masson, Gisli; Kong, Augustine; Gudbjartsson, Daniel; Lindblom, Annika; Zwarthoff, Ellen; Porru, Stefano; Golka, Klaus; Buntinx, Frank; Matullo, Giuseppe; Kumar, Rajiv; Mayordomo, José I; Steineck, D Gunnar; Kiltie, Anne E; Jonsson, Eirikur; Radvanyi, François; Knowles, Margaret A; Thorsteinsdottir, Unnur; Kiemeney, Lambertus A; Stefansson, Kari

2014-10-15

Genome-wide association studies (GWAS) of urinary bladder cancer (UBC) have yielded common variants at 12 loci that associate with risk of the disease. We report here the results of a GWAS of UBC including 1670 UBC cases and 90 180 controls, followed by replication analysis in an additional 5266 UBC cases and 10 456 controls. We tested a dataset containing 34.2 million variants, generated by imputation based on whole-genome sequencing of 2230 Icelanders. Several correlated variants at 20p12, represented by rs62185668, show genome-wide significant association with UBC after combining discovery and replication results (OR = 1.19, P = 1.5 × 10^-11 for rs62185668-A, minor allele frequency = 23.6%). The variants are located in a non-coding region approximately 300 kb upstream from the JAG1 gene, an important component of the Notch signaling pathways that may be oncogenic or tumor suppressive in several forms of cancer. Our results add to the growing number of UBC risk variants discovered through GWAS. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Sequence variants in oxytocin pathway genes and preterm birth: a candidate gene association study

PubMed Central

2013-01-01

Background: Preterm birth (PTB) is a complex disorder associated with significant neonatal mortality and morbidity and long-term adverse health consequences. Multiple lines of evidence suggest that genetic factors play an important role in its etiology. This study was designed to identify genetic variation associated with PTB in oxytocin pathway genes whose role in parturition is well known. Methods: To identify common genetic variants predisposing to PTB, we genotyped 16 single nucleotide polymorphisms (SNPs) in the oxytocin (OXT), oxytocin receptor (OXTR), and leucyl/cystinyl aminopeptidase (LNPEP) genes in 651 case infants from the U.S. and one or both of their parents. In addition, we examined the role of rare genetic variation in susceptibility to PTB by conducting direct sequence analysis of OXTR in 1394 cases and 1112 controls from the U.S., Argentina, Denmark, and Finland. This study was further extended to maternal triads (maternal grandparents-mother of a case infant, N=309). We also performed in vitro analysis of selected rare OXTR missense variants to evaluate their functional importance. Results: Maternal genetic effect analysis of the SNP genotype data revealed four SNPs in LNPEP that show significant association with prematurity. In our case–control sequence analysis, we detected fourteen coding variants in exon 3 of OXTR, all but four of which were found in cases only. Of the fourteen variants, three were previously unreported novel rare variants. When the sequence data from the maternal triads were analyzed using the transmission disequilibrium test, two common missense SNPs (rs4686302 and rs237902) in OXTR showed suggestive association for three gestational age subgroups. In vitro functional assays showed a significant difference in ligand binding between wild-type and two mutant receptors. Conclusions: Our study suggests an association between maternal common polymorphisms in LNPEP and susceptibility to PTB. Maternal OXTR missense SNPs rs4686302 and rs237902 may have gestational age-dependent effects on prematurity. Most of the OXTR rare variants identified do not appear to significantly contribute to the risk of PTB, but those shown to affect receptor function in our in vitro study warrant further investigation. Future studies with larger sample sizes are needed to confirm the findings of this study. PMID:23889750
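The transmission disequilibrium test named in this abstract compares how often heterozygous parents transmit an allele to affected offspring with how often they do not. A minimal Python sketch of the standard statistic, (b - c)^2 / (b + c), is shown below. The function name and the counts are hypothetical and are not data from this study.

```python
def tdt_statistic(transmitted, untransmitted):
    """McNemar-type TDT chi-square (1 degree of freedom).

    transmitted   -- times heterozygous parents passed on the test allele
    untransmitted -- times heterozygous parents passed on the other allele
    """
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no informative (heterozygous) parents")
    return (transmitted - untransmitted) ** 2 / total


# Hypothetical counts, for illustration only.
chi2 = tdt_statistic(60, 40)
```

Under the null hypothesis of no linkage or association the statistic follows a chi-square distribution with one degree of freedom, so values above about 3.84 correspond to p < 0.05.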
ACTG: novel peptide mapping onto gene models.

PubMed

Choi, Seunghyuk; Kim, Hyunwoo; Paek, Eunok

2017-04-15

In many proteogenomic applications, mapping peptide sequences onto genome sequences can be very useful, because it allows us to understand origins of the gene products. Existing software tools either take the genomic position of a peptide start site as an input or assume that the peptide sequence exactly matches the coding sequence of a given gene model. In the case of novel peptides resulting from genomic variations, especially structural variations such as alternative splicing, these existing tools cannot be directly applied unless users supply information about the variant, either its genomic position or its transcription model. Mapping potentially novel peptides to genome sequences, while allowing certain genomic variations, requires introducing novel gene models when aligning peptide sequences to gene structures. We have developed a new tool called ACTG (Amino aCids To Genome), which maps peptides to the genome, assuming all possible single exon skipping, junction variation allowing three edit distances from the original splice sites, exon extension and frame shift. In addition, it can also consider SNVs (single nucleotide variations) during the mapping phase if a user provides a VCF (variant call format) file as an input. Availability: http://prix.hanyang.ac.kr/ACTG/search.jsp. Contact: eunokpaek@hanyang.ac.kr. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
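To make the basic idea of peptide-to-genome mapping concrete, the Python sketch below translates a DNA string in the three forward reading frames and reports the nucleotide coordinates of exact peptide matches. This is a deliberately simplified assumption-laden illustration, not ACTG's algorithm: it ignores the reverse strand, splicing, exon skipping, junction variation, frame shifts and SNVs, and all function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, frame=0):
    """Translate one forward frame; unknown codons become 'X'."""
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def map_peptide(dna, peptide):
    """Return (frame, start, end) nucleotide spans of exact peptide matches.

    Coordinates are 0-based and half-open on the forward strand.
    """
    hits = []
    for frame in range(3):
        protein = translate(dna, frame)
        pos = protein.find(peptide)
        while pos != -1:
            start = frame + 3 * pos
            hits.append((frame, start, start + 3 * len(peptide)))
            pos = protein.find(peptide, pos + 1)
    return hits


# Toy sequence: one leading base, then ATG GCC AAG TAA (M A K stop).
toy_dna = "CATGGCCAAGTAA"
hits = map_peptide(toy_dna, "MAK")
```

Allowing the variations the abstract lists would mean searching alternative gene models rather than a single linear translation, which is the part this sketch leaves out.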
Gene network polymorphism is the raw material of natural selection: the selfish gene network hypothesis.

PubMed

Boldogköi, Zsolt

2004-09-01

Population genetics, the mathematical theory of modern evolutionary biology, defines evolution as the alteration of the frequency of distinct gene variants (alleles) differing in fitness over time. The major problem with this view is that in gene and protein sequences we can find little evidence concerning the molecular basis of phenotypic variance, especially those that would confer adaptive benefit to the bearers. Some novel data, however, suggest that a large amount of genetic variation exists in the regulatory region of genes within populations. In addition, comparison of homologous DNA sequences of various species shows that evolution appears to depend more strongly on gene expression than on the genes themselves. Furthermore, it has been demonstrated in several systems that genes form functional networks, whose products exhibit interrelated expression profiles. Finally, it has been found that regulatory circuits of development behave as evolutionary units. These data demonstrate that our view of evolution calls for a new synthesis. In this article I propose a novel concept, termed the selfish gene network hypothesis, which is based on an overall consideration of the above findings. The major statements of this hypothesis are as follows. (1) Instead of individual genes, gene networks (GNs) are responsible for the determination of traits and behaviors. (2) The primary source of microevolution is the intraspecific polymorphism in GNs and not the allelic variation in either the coding or the regulatory sequences of individual genes. (3) GN polymorphism is generated by the variation in the regulatory regions of the component genes and not by the variance in their coding sequences. (4) Evolution proceeds through continuous restructuring of the composition of GNs rather than fixing of specific alleles or GN variants.
Functional variants in the sucrase–isomaltase gene associate with increased risk of irritable bowel syndrome

PubMed Central

Henström, Maria; Diekmann, Lena; Bonfiglio, Ferdinando; Hadizadeh, Fatemeh; Kuech, Eva-Maria; von Köckritz-Blickwede, Maren; Thingholm, Louise B; Zheng, Tenghao; Assadi, Ghazaleh; Dierks, Claudia; Heine, Martin; Philipp, Ute; Distl, Ottmar; Money, Mary E; Belheouane, Meriem; Heinsen, Femke-Anouska; Rafter, Joseph; Nardone, Gerardo; Cuomo, Rosario; Usai-Satta, Paolo; Galeazzi, Francesca; Neri, Matteo; Walter, Susanna; Simrén, Magnus; Karling, Pontus; Ohlsson, Bodil; Schmidt, Peter T; Lindberg, Greger; Dlugosz, Aldona; Agreus, Lars; Andreasson, Anna; Mayer, Emeran; Baines, John F; Engstrand, Lars; Portincasa, Piero; Bellini, Massimo; Stanghellini, Vincenzo; Barbara, Giovanni; Chang, Lin; Camilleri, Michael; Franke, Andre; Naim, Hassan Y

2018-01-01

Objective: IBS is a common gut disorder of uncertain pathogenesis. Among other factors, genetics and certain foods are proposed to contribute. Congenital sucrase–isomaltase deficiency (CSID) is a rare genetic form of disaccharide malabsorption characterised by diarrhoea, abdominal pain and bloating, which are features common to IBS. We tested sucrase–isomaltase (SI) gene variants for their potential relevance in IBS. Design: We sequenced SI exons in seven familial cases, and screened four CSID mutations (p.Val557Gly, p.Gly1073Asp, p.Arg1124Ter and p.Phe1745Cys) and a common SI coding polymorphism (p.Val15Phe) in a multicentre cohort of 1887 cases and controls. We studied the effect of the 15Val to 15Phe substitution on SI function in vitro. We analysed p.Val15Phe genotype in relation to IBS status, stool frequency and faecal microbiota composition in 250 individuals from the general population. Results: CSID mutations were more common in patients than asymptomatic controls (p=0.074; OR=1.84) and Exome Aggregation Consortium reference sequenced individuals (p=0.020; OR=1.57). 15Phe was detected in 6/7 sequenced familial cases, and increased IBS risk in case–control and population-based cohorts, with best evidence for diarrhoea phenotypes (combined p=0.00012; OR=1.36). In the population-based sample, 15Phe allele dosage correlated with stool frequency (p=0.026) and Parabacteroides faecal microbiota abundance (p=0.0024). The SI protein with 15Phe exhibited 35% reduced enzymatic activity in vitro compared with 15Val (p<0.05). Conclusions: SI gene variants coding for disaccharidases with defective or reduced enzymatic activity predispose to IBS. This may help the identification of individuals at risk, and contribute to personalising treatment options in a subset of patients. PMID:27872184
Functional variants in the sucrase-isomaltase gene associate with increased risk of irritable bowel syndrome.

PubMed

Henström, Maria; Diekmann, Lena; Bonfiglio, Ferdinando; Hadizadeh, Fatemeh; Kuech, Eva-Maria; von Köckritz-Blickwede, Maren; Thingholm, Louise B; Zheng, Tenghao; Assadi, Ghazaleh; Dierks, Claudia; Heine, Martin; Philipp, Ute; Distl, Ottmar; Money, Mary E; Belheouane, Meriem; Heinsen, Femke-Anouska; Rafter, Joseph; Nardone, Gerardo; Cuomo, Rosario; Usai-Satta, Paolo; Galeazzi, Francesca; Neri, Matteo; Walter, Susanna; Simrén, Magnus; Karling, Pontus; Ohlsson, Bodil; Schmidt, Peter T; Lindberg, Greger; Dlugosz, Aldona; Agreus, Lars; Andreasson, Anna; Mayer, Emeran; Baines, John F; Engstrand, Lars; Portincasa, Piero; Bellini, Massimo; Stanghellini, Vincenzo; Barbara, Giovanni; Chang, Lin; Camilleri, Michael; Franke, Andre; Naim, Hassan Y; D'Amato, Mauro

2018-02-01

IBS is a common gut disorder of uncertain pathogenesis. Among other factors, genetics and certain foods are proposed to contribute. Congenital sucrase-isomaltase deficiency (CSID) is a rare genetic form of disaccharide malabsorption characterised by diarrhoea, abdominal pain and bloating, which are features common to IBS. We tested sucrase-isomaltase (SI) gene variants for their potential relevance in IBS. We sequenced SI exons in seven familial cases, and screened four CSID mutations (p.Val557Gly, p.Gly1073Asp, p.Arg1124Ter and p.Phe1745Cys) and a common SI coding polymorphism (p.Val15Phe) in a multicentre cohort of 1887 cases and controls. We studied the effect of the 15Val to 15Phe substitution on SI function in vitro. We analysed p.Val15Phe genotype in relation to IBS status, stool frequency and faecal microbiota composition in 250 individuals from the general population. CSID mutations were more common in patients than asymptomatic controls (p=0.074; OR=1.84) and Exome Aggregation Consortium reference sequenced individuals (p=0.020; OR=1.57). 15Phe was detected in 6/7 sequenced familial cases, and increased IBS risk in case-control and population-based cohorts, with best evidence for diarrhoea phenotypes (combined p=0.00012; OR=1.36). In the population-based sample, 15Phe allele dosage correlated with stool frequency (p=0.026) and Parabacteroides faecal microbiota abundance (p=0.0024). The SI protein with 15Phe exhibited 35% reduced enzymatic activity in vitro compared with 15Val (p<0.05). SI gene variants coding for disaccharidases with defective or reduced enzymatic activity predispose to IBS. This may help the identification of individuals at risk, and contribute to personalising treatment options in a subset of patients. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://www.bmj.com/company/products-services/rights-and-licensing/.
Identification and Functional Characterization of G6PC2 Coding Variants Influencing Glycemic Traits Define an Effector Transcript at the G6PC2-ABCB11 Locus

PubMed Central

Mahajan, Anubha; Sim, Xueling; Ng, Hui Jin; Manning, Alisa; Rivas, Manuel A.; Highland, Heather M.; Locke, Adam E.; Grarup, Niels; Im, Hae Kyung; Cingolani, Pablo; Flannick, Jason; Fontanillas, Pierre; Fuchsberger, Christian; Gaulton, Kyle J.; Teslovich, Tanya M.; Rayner, N. William; Robertson, Neil R.; Beer, Nicola L.; Rundle, Jana K.; Bork-Jensen, Jette; Ladenvall, Claes; Blancher, Christine; Buck, David; Buck, Gemma; Burtt, Noël P.; Gabriel, Stacey; Gjesing, Anette P.; Groves, Christopher J.; Hollensted, Mette; Huyghe, Jeroen R.; Jackson, Anne U.; Jun, Goo; Justesen, Johanne Marie; Mangino, Massimo; Murphy, Jacquelyn; Neville, Matt; Onofrio, Robert; Small, Kerrin S.; Stringham, Heather M.; Syvänen, Ann-Christine; Trakalo, Joseph; Abecasis, Goncalo; Bell, Graeme I.; Blangero, John; Cox, Nancy J.; Duggirala, Ravindranath; Hanis, Craig L.; Seielstad, Mark; Wilson, James G.; Christensen, Cramer; Brandslund, Ivan; Rauramaa, Rainer; Surdulescu, Gabriela L.; Doney, Alex S. F.; Lannfelt, Lars; Linneberg, Allan; Isomaa, Bo; Tuomi, Tiinamaija; Jørgensen, Marit E.; Jørgensen, Torben; Kuusisto, Johanna; Uusitupa, Matti; Salomaa, Veikko; Spector, Timothy D.; Morris, Andrew D.; Palmer, Colin N. A.; Collins, Francis S.; Mohlke, Karen L.; Bergman, Richard N.; Ingelsson, Erik; Lind, Lars; Tuomilehto, Jaakko; Hansen, Torben; Watanabe, Richard M.; Prokopenko, Inga; Dupuis, Josee; Karpe, Fredrik; Groop, Leif; Laakso, Markku; Pedersen, Oluf; Florez, Jose C.; Morris, Andrew P.; Altshuler, David; Meigs, James B.; Boehnke, Michael; McCarthy, Mark I.; Lindgren, Cecilia M.; Gloyn, Anna L.

2015-01-01

Genome-wide association studies (GWAS) for fasting glucose (FG) and insulin (FI) have identified common variant signals which explain 4.8% and 1.2% of trait variance, respectively. It is hypothesized that low-frequency and rare variants could contribute substantially to unexplained genetic variance. To test this, we analyzed exome-array data from up to 33,231 non-diabetic individuals of European ancestry. We found exome-wide significant (P<5×10⁻⁷) evidence for two loci not previously highlighted by common variant GWAS: GLP1R (p.Ala316Thr, minor allele frequency (MAF)=1.5%) influencing FG levels, and URB2 (p.Glu594Val, MAF=0.1%) influencing FI levels. Coding variant associations can highlight potential effector genes at (non-coding) GWAS signals. At the G6PC2/ABCB11 locus, we identified multiple coding variants in G6PC2 (p.Val219Leu, p.His177Tyr, and p.Tyr207Ser) influencing FG levels, conditionally independent of each other and the non-coding GWAS signal. In vitro assays demonstrate that these associated coding alleles result in reduced protein abundance via proteasomal degradation, establishing G6PC2 as an effector gene at this locus. Reconciliation of single-variant associations and functional effects was only possible when haplotype phase was considered. In contrast to earlier reports suggesting that, paradoxically, glucose-raising alleles at this locus are protective against type 2 diabetes (T2D), the p.Val219Leu G6PC2 variant displayed a modest but directionally consistent association with T2D risk. Coding variant associations for glycemic traits in GWAS signals highlight PCSK1, RREB1, and ZHX3 as likely effector transcripts. These coding variant association signals do not have a major impact on the trait variance explained, but they do provide valuable biological insights. PMID:25625282
Discovery and Functional Annotation of SIX6 Variants in Primary Open-Angle Glaucoma

PubMed Central

Allingham, R. Rand; Whigham, Benjamin T.; Havens, Shane; Garrett, Melanie E.; Qiao, Chunyan; Katsanis, Nicholas; Wiggs, Janey L.; Pasquale, Louis R.; Ashley-Koch, Allison; Oh, Edwin C.; Hauser, Michael A.

2014-01-01

Glaucoma is a leading cause of blindness worldwide. Primary open-angle glaucoma (POAG) is the most common subtype and is a complex trait with multigenic inheritance. Genome-wide association studies have previously identified a significant association between POAG and the SIX6 locus (rs10483727, odds ratio (OR) = 1.32, p = 3.87×10⁻¹¹). SIX6 plays a role in ocular development and has been associated with the morphology of the optic nerve. We sequenced the SIX6 coding and regulatory regions in 262 POAG cases and 256 controls and identified six nonsynonymous coding variants, including five rare and one common variant, Asn141His (rs33912345), which was associated significantly with POAG (OR = 1.27, p = 4.2×10⁻¹⁰) in the NEIGHBOR/GLAUGEN datasets. These variants were tested in an in vivo Danio rerio (zebrafish) complementation assay to evaluate ocular metrics such as eye size and optic nerve structure. Five variants, found primarily in POAG cases, were hypomorphic or null, while the sixth variant, found only in controls, was benign. One variant in the SIX6 enhancer increased expression of SIX6 and disrupted its regulation. Finally, to our knowledge for the first time, we have identified a clinical feature in POAG patients that appears to be dependent upon SIX6 genotype: patients who are homozygous for the SIX6 risk allele (His141) have a statistically thinner retinal nerve fiber layer than patients homozygous for the SIX6 non-risk allele (Asn141). Our results, in combination with previous SIX6 work, lead us to hypothesize that SIX6 risk variants disrupt the development of the neural retina, leading to a reduced number of retinal ganglion cells, thereby increasing the risk of glaucoma-associated vision loss. PMID:24875647
Evolutionary conservation analysis increases the colocalization of predicted exonic splicing enhancers in the BRCA1 gene with missense sequence changes and in-frame deletions, but not polymorphisms

PubMed Central

Pettigrew, Christopher; Wayte, Nicola; Lovelock, Paul K; Tavtigian, Sean V; Chenevix-Trench, Georgia; Spurdle, Amanda B; Brown, Melissa A

2005-01-01

Introduction: Aberrant pre-mRNA splicing can be more detrimental to the function of a gene than changes in the length or nature of the encoded amino acid sequence. Although predicting the effects of changes in consensus 5' and 3' splice sites near intron:exon boundaries is relatively straightforward, predicting the possible effects of changes in exonic splicing enhancers (ESEs) remains a challenge. Methods: As an initial step toward determining which ESEs predicted by the web-based tool ESEfinder in the breast cancer susceptibility gene BRCA1 are likely to be functional, we have determined their evolutionary conservation and compared their location with known BRCA1 sequence variants. Results: Using the default settings of ESEfinder, we initially detected 669 potential ESEs in the coding region of the BRCA1 gene. Increasing the threshold score reduced the total number to 464, while taking into consideration the proximity to splice donor and acceptor sites reduced the number to 211. Approximately 11% of these ESEs (23/211) either are identical at the nucleotide level in human, primates, mouse, cow, dog and opossum Brca1 (conserved) or are detectable by ESEfinder in the same position in the Brca1 sequence (shared). The frequency of conserved and shared predicted ESEs between human and mouse is higher in BRCA1 exons (2.8 per 100 nucleotides) than in introns (0.6 per 100 nucleotides). Of conserved or shared putative ESEs, 61% (14/23) were predicted to be affected by sequence variants reported in the Breast Cancer Information Core database. Applying the filters described above increased the colocalization of predicted ESEs with missense changes, in-frame deletions and unclassified variants predicted to be deleterious to protein function, whereas they decreased the colocalization with known polymorphisms or unclassified variants predicted to be neutral. Conclusion: In this report we show that evolutionary conservation analysis may be used to improve the specificity of an ESE prediction tool. This is the first report on the prediction of the frequency and distribution of ESEs in the BRCA1 gene, and it is the first reported attempt to predict which ESEs are most likely to be functional and therefore which sequence variants in ESEs are most likely to be pathogenic. PMID:16280041
Consensus generation and variant detection by Celera Assembler.

PubMed

Denisov, Gennady; Walenz, Brian; Halpern, Aaron L; Miller, Jason; Axelrod, Nelson; Levy, Samuel; Sutton, Granger

2008-04-15

We present an algorithm to identify allelic variation given a Whole Genome Shotgun (WGS) assembly of haploid sequences, and to produce a set of haploid consensus sequences rather than a single consensus sequence. Existing WGS assemblers take a column-by-column approach to consensus generation, and produce a single consensus sequence which can be inconsistent with the underlying haploid alleles, and inconsistent with any of the aligned sequence reads. Our new algorithm uses a dynamic windowing approach. It detects alleles by simultaneously processing the portions of aligned reads spanning a region of sequence variation, assigns reads to their respective alleles, phases adjacent variant alleles and generates a consensus sequence corresponding to each confirmed allele. This algorithm was used to produce the first diploid genome sequence of an individual human. It can also be applied to assemblies of multiple diploid individuals and hybrid assemblies of multiple haploid organisms. Applied to the individual human genome assembly, the new algorithm detects exactly two confirmed alleles and reports two consensus sequences in 98.98% of the 2,033,311 detected regions of sequence variation. In 33,269 out of 460,373 detected regions of size >1 bp, it fixes the constructed errors of a mosaic haploid representation of a diploid locus as produced by the original Celera Assembler consensus algorithm. Using an optimized procedure calibrated against 1,506,344 known SNPs, it detects 438,814 new heterozygous SNPs with a false positive rate of 12%. The open source code is available at: http://wgs-assembler.cvs.sourceforge.net/wgs-assembler/
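The difference between column-by-column consensus and a windowed, allele-aware consensus can be illustrated with a toy example. The following Python sketch is not the Celera Assembler code: the reads, the function names, the majority-vote rule and the `min_support` cutoff are all assumptions chosen only to show how independent per-column voting can yield a mosaic sequence that matches no read, while grouping reads over the whole variable window recovers two alleles.

```python
from collections import Counter

def column_consensus(reads):
    """Majority vote per column; every position is decided independently."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*reads))

def windowed_alleles(reads, min_support=2):
    """Treat the window as one unit: identical read segments are grouped and
    each segment seen at least `min_support` times is reported as an allele.
    (Hypothetical rule for illustration; real callers are more elaborate.)"""
    counts = Counter(reads)
    return sorted(seq for seq, n in counts.items() if n >= min_support)

# Toy pile-up of equal-length read segments spanning one variable window.
reads = [
    "ACGTA", "ACGTA",   # allele 1
    "ATGCA", "ATGCA",   # allele 2
    "ACGGA",            # allele 1 with a sequencing error at column 3
    "AAGCA",            # allele 2 with a sequencing error at column 1
]

mosaic = column_consensus(reads)
alleles = windowed_alleles(reads)
print("column-by-column:", mosaic)
print("windowed alleles:", alleles)
print("mosaic supported by a read:", mosaic in reads)
```

In this toy pile-up the per-column vote mixes the first variant base of one allele with the second variant base of the other, which is the kind of mosaic representation the abstract describes.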
Cancer-specific SNPs originate from low-level heteroplasmic variants in human mitochondrial genomes of a matched cell line pair.

PubMed

Hedberg, Annica; Knutsen, Erik; Løvhaugen, Anne Silje; Jørgensen, Tor Erik; Perander, Maria; Johansen, Steinar D

2018-04-19

Low-level mitochondrial heteroplasmy is a common phenomenon in both normal and cancer cells. Here, we investigate the link between low-level heteroplasmy and mitogenome mutations in a human breast cancer matched cell line by high-throughput sequencing. We identified 23 heteroplasmic sites, of which 15 were common between normal cells (Hs578Bst) and cancer cells (Hs578T). Most sites were clustered within the highly conserved Complex IV and ribosomal RNA genes. Two heteroplasmic variants in normal cells were found as fixed mutations in cancer cells. This indicates a positive selection of these variants in cancer cells. RNA-Seq analysis identified upregulated L-strand specific transcripts in cancer cells, which include three mitochondrial long non-coding RNA molecules. We hypothesize that this is due to two cancer cell-specific mutations in the control region.
TREM2 p.H157Y Variant and the Risk of Alzheimer's Disease: A Meta-Analysis Involving 14,510 Subjects.

PubMed

Jiang, Teng; Hou, Jian-Kang; Gao, Qing; Yu, Jin-Tai; Zhou, Jun-Shan; Zhao, Hong-Dong; Zhang, Ying-Dong

2016-01-01

We recently revealed that p.H157Y (rs2234255), a rare coding variant of triggering receptor expressed on myeloid cells 2 gene (TREM2), was associated with Alzheimer's disease (AD) susceptibility in Han Chinese. Contrastingly, although p.H157Y was previously identified in both AD cases and controls by several sequencing studies, no association of this variant with disease susceptibility was reported. To gain a credible conclusion on the association between p.H157Y and AD risk, a meta-analysis involving 7,102 cases and 7,408 controls was conducted. Our results indicated that p.H157Y was associated with an increased risk of AD (OR=3.65, 95% CI: 1.61-8.28; P=0.002), further establishing TREM2 as an important susceptibility gene for this disease.
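As a consistency check on the reported summary statistics, the standard error of the log odds ratio can be recovered from the 95% confidence interval and converted to a two-sided p-value. The Python sketch below assumes a Wald-type interval that is symmetric on the log scale and a normal approximation; the abstract does not state which method the authors used, and the function name is hypothetical.

```python
import math

def p_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Back-calculate SE(log OR), the z statistic and a two-sided p-value
    from an odds ratio and its 95% CI (assumes a Wald interval)."""
    log_or = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = log_or / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return se, z, p

# Values reported in the abstract: OR=3.65, 95% CI 1.61-8.28.
se, z, p = p_from_or_ci(3.65, 1.61, 8.28)
print(f"SE(log OR) = {se:.3f}, z = {z:.2f}, p = {p:.4f}")
```

Under these assumptions the back-calculated p-value agrees with the reported P=0.002 after rounding.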
Nomenclature- and Database-Compatible Names for the Two Ebola Virus Variants that Emerged in Guinea and the Democratic Republic of the Congo in 2014

PubMed Central

Kuhn, Jens H.; Andersen, Kristian G.; Baize, Sylvain; Bào, Yīmíng; Bavari, Sina; Berthet, Nicolas; Blinkova, Olga; Brister, J. Rodney; Clawson, Anna N.; Fair, Joseph; Gabriel, Martin; Garry, Robert F.; Gire, Stephen K.; Goba, Augustine; Gonzalez, Jean-Paul; Günther, Stephan; Happi, Christian T.; Jahrling, Peter B.; Kapetshi, Jimmy; Kobinger, Gary; Kugelman, Jeffrey R.; Leroy, Eric M.; Maganga, Gael Darren; Mbala, Placide K.; Moses, Lina M.; Muyembe-Tamfum, Jean-Jacques; N’Faly, Magassouba; Nichol, Stuart T.; Omilabu, Sunday A.; Palacios, Gustavo; Park, Daniel J.; Paweska, Janusz T.; Radoshitzky, Sheli R.; Rossi, Cynthia A.; Sabeti, Pardis C.; Schieffelin, John S.; Schoepp, Randal J.; Sealfon, Rachel; Swanepoel, Robert; Towner, Jonathan S.; Wada, Jiro; Wauquier, Nadia; Yozwiak, Nathan L.; Formenty, Pierre

2014-01-01

In 2014, Ebola virus (EBOV) was identified as the etiological agent of a large and still expanding outbreak of Ebola virus disease (EVD) in West Africa and a much more confined EVD outbreak in Middle Africa. Epidemiological and evolutionary analyses confirmed that all cases of both outbreaks are connected to a single introduction each of EBOV into human populations and that both outbreaks are not directly connected. Coding-complete genomic sequence analyses of isolates revealed that the two outbreaks were caused by two novel EBOV variants, and initial clinical observations suggest that neither of them should be considered strains. Here we present consensus decisions on naming for both variants (West Africa: “Makona”, Middle Africa: “Lomela”) and provide database-compatible full, shortened, and abbreviated names that are in line with recently established filovirus sub-species nomenclatures. PMID:25421896
Comparison and evaluation of two exome capture kits and sequencing platforms for variant calling.

PubMed

Zhang, Guoqiang; Wang, Jianfeng; Yang, Jin; Li, Wenjie; Deng, Yutian; Li, Jing; Huang, Jun; Hu, Songnian; Zhang, Bing

2015-08-05

To promote the clinical application of next-generation sequencing, it is important to obtain accurate and consistent variants of target genomic regions at low cost. Ion Proton, the latest updated semiconductor-based sequencing instrument from Life Technologies, is designed to provide investigators with an inexpensive platform for human whole exome sequencing that achieves a rapid turnaround time. However, few studies have comprehensively compared and evaluated the accuracy of variant calling between Ion Proton and Illumina sequencing platforms such as HiSeq 2000, which is the most popular sequencing platform for the human genome. The Ion Proton sequencer combined with the Ion TargetSeq Exome Enrichment Kit together make up TargetSeq-Proton, whereas SureSelect-HiSeq is based on the Agilent SureSelect Human All Exon v4 Kit and the HiSeq 2000 sequencer. Here, we sequenced exonic DNA from four human blood samples using both TargetSeq-Proton and SureSelect-HiSeq. We then called variants in the exonic regions that overlapped between the two exome capture kits (33.6 Mb). The rates of shared variant loci called by the two sequencing platforms ranged from 68.0 to 75.3% in the four samples, whereas the concordance of co-detected variant loci reached 99%. Sanger sequencing validation revealed that the validated rate of concordant single nucleotide polymorphisms (SNPs) (91.5%) was higher than that of the SNPs specific to TargetSeq-Proton (60.0%) or specific to SureSelect-HiSeq (88.3%). With regard to 1-bp small insertions and deletions (InDels), the Sanger sequencing validated rates of concordant variants (100.0%) and SureSelect-HiSeq-specific variants (89.6%) were higher than that of TargetSeq-Proton-specific variants (15.8%). In the sequencing of exonic regions, combining the two sequencing strategies (SureSelect-HiSeq and TargetSeq-Proton) increased the variant calling specificity for concordant variant loci and the sensitivity for variant loci called by any one platform. However, for the sequencing of platform-specific variants, the accuracy of variant calling by HiSeq 2000 was higher than that of Ion Proton, specifically for InDel detection. Moreover, the variant calling software also influences the detection of SNPs and, specifically, InDels in Ion Proton exome sequencing.
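The two comparison measures mentioned above (the rate of shared variant loci and the concordance at co-detected loci) can be sketched with plain set operations. This Python example is an illustration only: the abstract does not define the denominators, so the shared rate is assumed here to be shared loci divided by the union of loci from both platforms, and concordance is assumed to mean identical genotypes at shared loci. The call sets are invented toy data.

```python
def compare_calls(calls_a, calls_b):
    """Compare two call sets given as dicts of (chrom, pos) -> genotype.
    Denominators are assumptions; see the accompanying text."""
    loci_a, loci_b = set(calls_a), set(calls_b)
    shared = loci_a & loci_b
    union = loci_a | loci_b
    same_genotype = sum(1 for locus in shared if calls_a[locus] == calls_b[locus])
    return {
        "shared_rate": len(shared) / len(union) if union else None,
        "genotype_concordance": same_genotype / len(shared) if shared else None,
        "only_a": len(loci_a - loci_b),
        "only_b": len(loci_b - loci_a),
    }

# Invented toy calls standing in for the two platforms.
proton_calls = {("chr1", 100): "0/1", ("chr1", 200): "1/1",
                ("chr2", 50): "0/1", ("chr3", 10): "0/1"}
hiseq_calls = {("chr1", 100): "0/1", ("chr1", 200): "0/1",
               ("chr2", 50): "0/1", ("chr4", 5): "1/1"}

result = compare_calls(proton_calls, hiseq_calls)
print(result)
```

Platform-specific loci (`only_a`, `only_b`) are the ones that would then need orthogonal validation, such as the Sanger sequencing described in the abstract.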
Interactions between the two surface proteins of rotavirus may alter the receptor-binding specificity of the virus.

PubMed Central

Méndez, E; Arias, C F; López, S

1996-01-01

The infection of target cells by most animal rotavirus strains requires the presence of sialic acids (SAs) on the cell surface. We recently isolated variants from simian rotavirus RRV whose infectivity is no longer dependent on SAs and showed that the mutant phenotype segregates with the gene coding for VP4, one of the two surface proteins of rotaviruses (the other one being VP7). The nucleotide sequence of the VP4 gene of four independently isolated variants showed three amino acid changes, at positions 37 (Leu to Pro), 187 (Lys to Arg), and 267 (Tyr to Cys), in all mutant VP4 proteins compared with RRV VP4. The characterization of revertant viruses from two independent mutants showed that the arginine residue at position 187 changed back to lysine, indicating that this amino acid is involved in the determination of the mutant phenotype. Surprisingly, sequence analysis of reassortant virus DS1XRRV, which depends on SAs to infect the cell, showed that its VP4 gene is identical to the VP4 gene of the variants. Since the only difference between DS1XRRV and the RRV variants is the parental origin of the VP7 gene (human rotavirus DS1 in the reassortant), these findings suggest that the receptor-binding specificity of rotaviruses, via VP4, may be influenced by the associated VP7 protein. PMID:8551583
Clinical testing of BRCA1 and BRCA2: a worldwide snapshot of technological practices.

PubMed

Toland, Amanda Ewart; Forman, Andrea; Couch, Fergus J; Culver, Julie O; Eccles, Diana M; Foulkes, William D; Hogervorst, Frans B L; Houdayer, Claude; Levy-Lahad, Ephrat; Monteiro, Alvaro N; Neuhausen, Susan L; Plon, Sharon E; Sharan, Shyam K; Spurdle, Amanda B; Szabo, Csilla; Brody, Lawrence C

2018-01-01

Clinical testing of BRCA1 and BRCA2 began over 20 years ago. With the expiration and overturning of the BRCA patents, limitations on which laboratories could offer commercial testing were lifted. These legal changes occurred at approximately the same time as the widespread adoption of massively parallel sequencing (MPS) technologies. Little is known about how these changes impacted laboratory practices for detecting genetic alterations in hereditary breast and ovarian cancer genes. Therefore, we sought to examine current laboratory genetic testing practices for BRCA1/BRCA2. We employed an online survey of 65 questions covering four areas: laboratory characteristics, details on technological methods, variant classification, and client-support information. Eight United States (US) laboratories and 78 non-US laboratories completed the survey. Most laboratories (93%; 80/86) used MPS platforms to identify variants. Laboratories differed widely on: (1) technologies used for large rearrangement detection; (2) criteria for minimum read depths; (3) non-coding regions sequenced; (4) variant classification criteria and approaches; (5) testing volume ranging from 2 to 2.5 × 10⁵ tests annually; and (6) deposition of variants into public databases. These data may be useful for national and international agencies to set recommendations for quality standards for BRCA1/BRCA2 clinical testing. These standards could also be applied to testing of other disease genes.
Whole exome sequencing identifies novel genes for fetal hemoglobin response to hydroxyurea in children with sickle cell anemia.

PubMed

Sheehan, Vivien A; Crosby, Jacy R; Sabo, Aniko; Mortier, Nicole A; Howard, Thad A; Muzny, Donna M; Dugan-Perez, Shannon; Aygun, Banu; Nottage, Kerri A; Boerwinkle, Eric; Gibbs, Richard A; Ware, Russell E; Flanagan, Jonathan M

2014-01-01

Hydroxyurea has proven efficacy in children and adults with sickle cell anemia (SCA), but with considerable inter-individual variability in the amount of fetal hemoglobin (HbF) produced. Sibling and twin studies indicate that some of that drug response variation is heritable. To test the hypothesis that genetic modifiers influence pharmacological induction of HbF, we investigated phenotype-genotype associations using whole exome sequencing of children with SCA treated prospectively with hydroxyurea to maximum tolerated dose (MTD). We analyzed 171 unrelated patients enrolled in two prospective clinical trials, all treated with dose escalation to MTD. We examined two MTD drug response phenotypes: HbF (final %HbF minus baseline %HbF), and final %HbF. Analyzing individual genetic variants, we identified multiple low frequency and common variants associated with HbF induction by hydroxyurea. A validation cohort of 130 pediatric sickle cell patients treated to MTD with hydroxyurea was genotyped for 13 non-synonymous variants with the strongest association with HbF response to hydroxyurea in the discovery cohort. A coding variant in Spalt-like transcription factor, or SALL2, was associated with higher final HbF in this second independent replication sample and SALL2 represents an outstanding novel candidate gene for further investigation. These findings may help focus future functional studies and provide new insights into the pharmacological HbF upregulation by hydroxyurea in patients with SCA.
QuASAR-MPRA: accurate allele-specific analysis for massively parallel reporter assays.

PubMed

Kalita, Cynthia A; Moyerbrailean, Gregory A; Brown, Christopher; Wen, Xiaoquan; Luca, Francesca; Pique-Regi, Roger

2018-03-01

The majority of the human genome is composed of non-coding regions containing regulatory elements such as enhancers, which are crucial for controlling gene expression. Many variants associated with complex traits are in these regions, and may disrupt gene regulatory sequences. Consequently, it is important to not only identify true enhancers but also to test if a variant within an enhancer affects gene regulation. Recently, allele-specific analysis in high-throughput reporter assays, such as massively parallel reporter assays (MPRAs), has been used to functionally validate non-coding variants. However, we are still missing high-quality and robust data analysis tools for these datasets. We have further developed our method for allele-specific analysis QuASAR (quantitative allele-specific analysis of reads) to analyze allele-specific signals in barcoded read counts data from MPRA. Using this approach, we can take into account the uncertainty in the original plasmid proportions, over-dispersion, and sequencing errors. The provided allelic skew estimate and its standard error also simplify meta-analysis of replicate experiments. Additionally, we show that a beta-binomial distribution better models the variability present in the allelic imbalance of these synthetic reporters and results in a test that is statistically well calibrated under the null. Applying this approach to the MPRA data, we found 602 SNPs with significant (false discovery rate 10%) allele-specific regulatory function in LCLs. We also show that we can combine MPRA with QuASAR estimates to validate existing experimental and computational annotations of regulatory variants. Our study shows that with appropriate data analysis tools, we can improve the power to detect allelic effects in high-throughput reporter assays. Availability: http://github.com/piquelab/QuASAR/tree/master/mpra. Contact: fluca@wayne.edu or rpique@wayne.edu. Supplementary data are available online at Bioinformatics. © The Author (2017). Published by Oxford University Press. All rights reserved.
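To illustrate why a beta-binomial model is better calibrated than a plain binomial when read counts are over-dispersed, the following Python sketch compares upper-tail probabilities for the same allelic count under both models. It is not the QuASAR-MPRA implementation: the parameterisation (mean `p` and over-dispersion `rho`), the fixed expected proportion of 0.5, the value `rho = 0.05` and the toy counts are all assumptions made for illustration.

```python
import math

def log_choose(n, k):
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def binom_pmf(k, n, p):
    return math.exp(log_choose(n, k) + k * math.log(p) + (n - k) * math.log(1 - p))

def betabinom_pmf(k, n, p, rho):
    """Beta-binomial with mean p and over-dispersion rho in (0, 1);
    alpha + beta = (1 - rho) / rho, so rho -> 0 approaches the binomial."""
    a = p * (1 - rho) / rho
    b = (1 - p) * (1 - rho) / rho
    return math.exp(log_choose(n, k) + log_beta(k + a, n - k + b) - log_beta(a, b))

def upper_tail(pmf, k, n, *params):
    """P(K >= k) under the given probability mass function."""
    return sum(pmf(i, n, *params) for i in range(k, n + 1))

# Toy example: 65 of 100 reads carry the alternate allele, while the
# plasmid library is assumed to contain both alleles in equal proportion.
n_reads, alt_reads = 100, 65
expected_alt = 0.5
rho = 0.05  # hypothetical over-dispersion

p_binom = upper_tail(binom_pmf, alt_reads, n_reads, expected_alt)
p_betabinom = upper_tail(betabinom_pmf, alt_reads, n_reads, expected_alt, rho)
print(f"binomial tail:      {p_binom:.4g}")
print(f"beta-binomial tail: {p_betabinom:.4g}")
```

With over-dispersion, the same skew is far less surprising, so a binomial test would call it significant much more readily than the beta-binomial test does.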
VAT: a computational framework to functionally annotate variants in personal genomes within a cloud-computing environment

PubMed Central

Habegger, Lukas; Balasubramanian, Suganthi; Chen, David Z.; Khurana, Ekta; Sboner, Andrea; Harmanci, Arif; Rozowsky, Joel; Clarke, Declan; Snyder, Michael; Gerstein, Mark

2012-01-01

Summary: The functional annotation of variants obtained through sequencing projects is generally assumed to be a simple intersection of genomic coordinates with genomic features. However, complexities arise for several reasons, including the differential effects of a variant on alternatively spliced transcripts, as well as the difficulty in assessing the impact of small insertions/deletions and large structural variants. Taking these factors into consideration, we developed the Variant Annotation Tool (VAT) to functionally annotate variants from multiple personal genomes at the transcript level as well as obtain summary statistics across genes and individuals. VAT also allows visualization of the effects of different variants, integrates allele frequencies and genotype data from the underlying individuals and facilitates comparative analysis between different groups of individuals. VAT can either be run through a command-line interface or as a web application. Finally, in order to enable on-demand access and to minimize unnecessary transfers of large data files, VAT can be run as a virtual machine in a cloud-computing environment. Availability and Implementation: VAT is implemented in C and PHP. The VAT web service, Amazon Machine Image, source code and detailed documentation are available at vat.gersteinlab.org. Contact: lukas.habegger@yale.edu or mark.gerstein@yale.edu Supplementary Information: Supplementary data are available at Bioinformatics online. PMID:22743228
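The point that one variant can have different effects on alternatively spliced transcripts can be shown with a small interval-overlap sketch. This Python example is not VAT (which the abstract says is implemented in C and PHP); the `Transcript` class, the closed-interval convention and the coordinates are hypothetical and serve only to illustrate annotation at the transcript level rather than the gene level.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Transcript:
    name: str
    exons: tuple  # tuple of (start, end) pairs, 1-based and inclusive

def annotate(pos, transcripts):
    """Classify one variant position against each transcript separately."""
    result = {}
    for tx in transcripts:
        tx_start = min(start for start, _ in tx.exons)
        tx_end = max(end for _, end in tx.exons)
        if not tx_start <= pos <= tx_end:
            result[tx.name] = "outside"
        elif any(start <= pos <= end for start, end in tx.exons):
            result[tx.name] = "exonic"
        else:
            result[tx.name] = "intronic"
    return result

# Two hypothetical isoforms of one gene; the second skips the middle exon.
isoforms = [
    Transcript("tx_full", ((100, 200), (300, 400), (500, 600))),
    Transcript("tx_skip", ((100, 200), (500, 600))),
]

annotation = annotate(350, isoforms)
print(annotation)
```

A gene-level intersection would report a single label for position 350, whereas the per-transcript result shows it is exonic in one isoform and intronic in the other.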
Molecular genetic studies of DMT1 on 12q in French-Canadian restless legs syndrome patients and families.

PubMed

Xiong, Lan; Dion, Patrick; Montplaisir, Jacques; Levchenko, Anastasia; Thibodeau, Pascale; Karemera, Liliane; Rivière, Jean-Baptiste; St-Onge, Judith; Gaspar, Claudia; Dubé, Marie-Pierre; Desautels, Alex; Turecki, Gustavo; Rouleau, Guy A

2007-10-05

Converging evidence from clinical observations, brain imaging and pathological findings strongly indicates impaired brain iron regulation in restless legs syndrome (RLS). Animal models with a mutation in the divalent metal transporter 1 (DMT1) gene, an important brain iron transporter, demonstrate an iron deficiency profile similar to that found in RLS brain. The human DMT1 gene, mapped to chromosome 12q near the RLS1 locus, qualifies as an excellent functional and possible positional candidate for RLS. DMT1 protein levels were assessed in lymphoblastoid cell lines from RLS patients and controls. Linkage analyses were carried out with markers flanking and within the DMT1 gene. Selected patient samples from RLS families with compatible linkage to the RLS1 locus on 12q were fully sequenced in both the coding regions and the long stretches of UTR sequences. Finally, selected sequence variants were further studied in case/control and family-based association tests. A clinical association of anemia and RLS was further confirmed in this study. There was no detectable difference in DMT1 protein levels between RLS patient lymphoblastoid cell lines and normal controls. Non-parametric linkage analyses failed to identify any significant linkage signals within the DMT1 gene region. Sequencing of selected patients did not detect any sequence variant(s) compatible with DMT1 harboring RLS causative mutation(s). Further studies did not find any association between ten SNPs, spanning the whole DMT1 gene region, and RLS affection status. Finally, two DMT1 intronic SNPs showed positive association with RLS in patients with a history of anemia, when compared to RLS patients without anemia. © 2007 Wiley-Liss, Inc.
Whole-exome sequencing of primary plasma cell leukemia discloses heterogeneous mutational patterns.

PubMed

Cifola, Ingrid; Lionetti, Marta; Pinatel, Eva; Todoerti, Katia; Mangano, Eleonora; Pietrelli, Alessandro; Fabris, Sonia; Mosca, Laura; Simeon, Vittorio; Petrucci, Maria Teresa; Morabito, Fortunato; Offidani, Massimo; Di Raimondo, Francesco; Falcone, Antonietta; Caravita, Tommaso; Battaglia, Cristina; De Bellis, Gianluca; Palumbo, Antonio; Musto, Pellegrino; Neri, Antonino

2015-07-10

Primary plasma cell leukemia (pPCL) is a rare and aggressive form of plasma cell dyscrasia and may represent a valid model for high-risk multiple myeloma (MM). To provide novel information concerning the mutational profile of this disease, we performed whole-exome sequencing of a prospective series of 12 pPCL cases included in a Phase II multicenter clinical trial and previously characterized at clinical and molecular levels. We identified 1,928 coding somatic non-silent variants in 1,643 genes, with a mean of 166 variants per sample, and only a few variants and genes recurrent in two or more samples. An excess of C > T transitions and the presence of two main mutational signatures (related to APOBEC over-activity and aging) occurring in different translocation groups were observed. We identified 14 candidate cancer driver genes, mainly involved in cell-matrix adhesion, cell cycle, genome stability, RNA metabolism and protein folding. Furthermore, integration of mutation data with copy number alteration profiles revealed biallelically disrupted genes with potential tumor suppressor functions. Globally, cadherin/Wnt signaling, extracellular matrix and cell cycle checkpoint were the most affected functional pathways. Sequencing results were finally combined with gene expression data to better elucidate the biological relevance of mutated genes. This study represents the first whole-exome sequencing screen of pPCL and revealed a remarkable genetic heterogeneity of mutational patterns. This may contribute to the understanding of the pathogenetic mechanisms associated with this aggressive form of PC dyscrasia and potentially with high-risk MM.
Gratitude, protective buffering, and cognitive dissonance: How families respond to pediatric whole exome sequencing in the absence of actionable results.

PubMed

Werner-Lin, Allison; Zaspel, Lori; Carlson, Mae; Mueller, Rebecca; Walser, Sarah A; Desai, Ria; Bernhardt, Barbara A

2018-03-01

Clinical genome and exome sequencing (CGES) may identify variants leading to targeted management of existing conditions. Yet, CGES often fails to identify pathogenic diagnostic variants and introduces uncertainties by detecting variants of uncertain significance (VUS) and secondary findings. This study investigated how families understand findings and adjust their perspectives on CGES. As part of NIH's Clinical Sequencing Exploratory Research Consortium, children were recruited from clinics at the Children's Hospital of Philadelphia (CHOP) and offered exome sequencing. Primary pathogenic and possibly pathogenic findings, and some secondary findings, were returned. Investigators digitally recorded results disclosure sessions and conducted 3-month follow-up interviews with 10 adolescents and a parent. An interdisciplinary team coded all transcripts. Participants were initially disappointed with findings, yet reactions evolved within disclosure sessions and at 3-month interviews toward acceptance and satisfaction. Families erroneously expected, and prepared extensively, to learn about risk for common conditions. During disclosure sessions, parents and adolescents varied in how they monitored and responded to each other's reactions. Several misinterpreted, or overestimated, the utility of findings to attribute meaning and achieve closure for the CGES experience. Participants perceived testing as an opportunity to improve disease management despite results that did not introduce new treatments or diagnoses. Future research may examine whether families experience cognitive dissonance regarding discrepancies between expectations and findings, and how protective buffering minimizes the burden of disappointment on loved ones. As CGES is increasingly integrated into clinical care, providers must contend with tempering family expectations and interpretations of findings while managing complex medical care. © 2018 Wiley Periodicals, Inc.
PVRL1 Variants Contribute to Non-Syndromic Cleft Lip and Palate in Multiple Populations

PubMed Central

Avila, Joseph R.; Jezewski, Peter A.; Vieira, Alexandre R.; Orioli, Iêda M.; Castilla, Eduardo E.; Christensen, Kaare; Daack-Hirsch, Sandra; Romitti, Paul A.; Murray, Jeffrey C.

2007-01-01

Poliovirus Receptor Like-1 (PVRL1) is a member of the immunoglobulin super family that acts in the initiation and maintenance of epithelial adherens junctions and is mutated in the cleft lip and palate/ectodermal dysplasia 1 syndrome (CLPED1, OMIM #225000). In addition, a common non-sense mutation in PVRL1 was discovered more often among non-syndromic sporadic clefting cases in Northern Venezuela in a previous case-control study. The present work sought to ascertain the role of PVRL1 in the sporadic forms of orofacial clefting in multiple populations. Multiple rare and common variants from all three splice isoforms were initially ascertained by sequencing 92 Iowan and 86 Filipino cases and CEPH controls. Using a family-based analysis to examine these variants, the common glycine allele of the G361V coding variant was significantly overtransmitted among all orofacial clefting phenotypes (P = 0.005). This represented G361V genotyping from over 800 Iowan, Danish, and Filipino families. Among four rare amino acid changes found within the V1 and C1 domains, S112T and T131A were found adjacent to critical amino acid positions within the V1 variable domain, regions previously shown to mediate cell-to-cell and cell-to-virus adhesion. The T131A variant was not found in over 1,300 non-affected control samples although the alanine is found in other species. The serine of the S112T variant position is conserved across all known PVRL1 sequences. Together these data suggest that both rare and common mutations within PVRL1 make a minor contribution to disrupting the initiation and regulation of cell-to-cell adhesion and downstream morphogenesis of the embryonic face. PMID:17089422
Functional phosphodiesterase 11A mutations may modify the risk of familial and bilateral testicular germ cell tumors

PubMed Central

Horvath, Anelia; Korde, Larissa; Greene, Mark H.; Libe, Rosella; Osorio, Paulo; Faucz, Fabio Rueda; Raffin-Sanson, Marie Laure; Tsang, Kit Man; Drori-Herishanu, Limor; Patronas, Yianna; Remmers, Elaine F; Nikita, Maria-Elena; Moran, Jason; Greene, Joseph; Nesterova, Maria; Merino, Maria; Bertherat, Jerome; Stratakis, Constantine A.

2009-01-01

Inactivating germline mutations in phosphodiesterase 11A (PDE11A) have been implicated in adrenal tumor susceptibility. PDE11A is highly-expressed in endocrine steroidogenic tissues, especially the testis, and mice with inactivated Pde11a exhibit male infertility, a known testicular germ cell tumor (TGCT) risk factor. We sequenced the PDE11A gene-coding region in 95 patients with TGCT from 64 unrelated kindreds. We identified 8 non-synonymous substitutions in 20 patients from 15 families: four (R52T; F258Y; G291R; V820M) were newly-recognized, three (R804H; R867G; M878V) were functional variants previously implicated in adrenal tumor predisposition, and one (Y727C) was a known polymorphism. We compared the frequency of these variants in our patients to unrelated controls that had been screened and found negative for any endocrine diseases: only the two previously-reported variants, R804H and R867G, known to be frequent in general population, were detected in these controls. The frequency of all PDE11A-gene variants (combined) was significantly higher among patients with TGCT (P=0.0002), present in 19% of the families of our cohort. Most variants were detected in the general population, but functional studies showed that all these mutations reduced PDE activity, and that PDE11A protein expression was decreased (or absent) in TGCT samples from carriers. This is the first demonstration of a PDE gene’s involvement in TGCT, although the cAMP signaling pathway has been investigated extensively in other reproductive organs and their diseases. In conclusion, we report that PDE11A-inactivating sequence variants may modify the risk of familial and bilateral TGCT. PMID:19549888

VariantBam: filtering and profiling of next-generational sequencing data using region-specific rules.

PubMed

Wala, Jeremiah; Zhang, Cheng-Zhong; Meyerson, Matthew; Beroukhim, Rameen

2016-07-01

We developed VariantBam, a C++ read filtering and profiling tool for use with BAM, CRAM and SAM sequencing files. VariantBam provides a flexible framework for extracting sequencing reads or read-pairs that satisfy combinations of rules, defined by any number of genomic intervals or variant sites. We have implemented filters based on alignment data, sequence motifs, regional coverage and base quality. For example, VariantBam achieved a median size reduction ratio of 3.1:1 when applied to 10 lung cancer whole genome BAMs by removing large tags and selecting for only high-quality variant-supporting reads and reads matching a large dictionary of sequence motifs. Thus VariantBam enables efficient storage of sequencing data while preserving the most relevant information for downstream analysis. VariantBam and full documentation are available at github.com/jwalabroad/VariantBam. Contact: rameen@broadinstitute.org. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
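The idea of region-specific rules described in this abstract can be illustrated with a minimal Python sketch. It is not VariantBam's code or rule syntax: the `Read` and `Rule` classes, their fields, and the toy reads are all hypothetical, and a read is kept here if it satisfies at least one rule whose interval contains it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Read:
    """Hypothetical, simplified stand-in for an aligned sequencing read."""
    chrom: str
    pos: int    # 0-based leftmost aligned position
    mapq: int   # mapping quality
    seq: str    # read bases


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: a genomic interval plus per-region criteria."""
    chrom: str
    start: int          # inclusive
    end: int            # exclusive
    min_mapq: int = 0
    motif: str = ""     # empty string means no motif requirement

    def matches(self, read: Read) -> bool:
        in_region = read.chrom == self.chrom and self.start <= read.pos < self.end
        return in_region and read.mapq >= self.min_mapq and self.motif in read.seq


def filter_reads(reads, rules):
    """Keep a read if it satisfies at least one region-specific rule."""
    return [r for r in reads if any(rule.matches(r) for rule in rules)]


# Made-up reads and rules, for illustration only.
reads = [
    Read("chr1", 150, 60, "ACGTTAGGGTTAGGG"),
    Read("chr1", 150, 10, "ACGTACGTACGTACG"),
    Read("chr2", 900, 60, "TTTTTTTTTTTTTTT"),
]
rules = [
    Rule("chr1", 100, 200, min_mapq=30),     # quality threshold on one interval
    Rule("chr2", 0, 1000, motif="TTAGGG"),   # motif requirement on another
]
kept = filter_reads(reads, rules)
print(len(kept), "of", len(reads), "reads kept")
```

Only the first read passes: the second fails the quality threshold of its region, and the third lacks the motif required in its region.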
PVRL1 as a Candidate Gene for Nonsyndromic Cleft Lip With or Without Cleft Palate: No Evidence for the Involvement of Common or Rare Variants in Southern Han Chinese Patients

PubMed Central

Cheng, Hong-Qiu; Huang, En-Min; Xu, Ming-Yan; Shu, Shen-You

2012-01-01

The poliovirus receptor related-1 (PVRL1) gene encodes nectin-1, a cell–cell adhesion molecule (OMIM #600644), and is mutated in the cleft lip with or without cleft palate/ectodermal dysplasia-1 syndrome (CLPED1, OMIM #225000). In addition, PVRL1 mutations have been associated with nonsyndromic cleft lip with or without a cleft palate (NSCL/P) in studies of multiethnic samples. To investigate the possible involvement of this gene in southern Han Chinese NSCL/P patients, we performed (i) a case–control association study, and (ii) a resequencing study. A set of 470 patients with NSCL/P and 693 controls were recruited, and a total of 45 tagging single-nucleotide polymorphisms (SNPs) were genotyped by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry. In the resequencing study, the coding regions of the PVRL1 α isoform were directly sequenced in 45 trios from multiply affected families. One (rs7128327) of the 45 tested SNPs showed a trend toward statistical significance in the genotypic-level chi-square test (p=0.009567). However, this result did not withstand correction for multiple testing. Likewise, sliding window haplotype analyses consisting of two, three, or four SNPs failed to detect any positive association. Resequencing analysis also failed to identify any novel rare sequence variants. In conclusion, the present study provided no support for the hypothesis that common or rare variants in PVRL1 play a significant role in NSCL/P development in the southern Han Chinese population. This is the first study that has used tagging SNPs covering all the coding and noncoding regions to search for common NSCL/P-associated mutations of PVRL1. PMID:22455396
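Why a p-value of 0.009567 can fail multiple-testing correction is easy to check with a short calculation. The abstract does not say which correction was applied; the sketch below assumes a Bonferroni correction at a family-wise alpha of 0.05 across the 45 tested SNPs, purely for illustration.

```python
# Assumption: Bonferroni correction at alpha = 0.05; the abstract does not
# name the method actually used by the authors.
n_tests = 45            # number of tagging SNPs tested (from the abstract)
alpha = 0.05            # assumed family-wise error rate
p_observed = 0.009567   # nominal p-value for rs7128327 (from the abstract)

bonferroni_threshold = alpha / n_tests
survives = p_observed < bonferroni_threshold

print(f"per-test threshold: {bonferroni_threshold:.6f}")
print(f"survives correction: {survives}")
```

Under these assumptions the per-test threshold is about 0.0011, so the nominal result does not survive, consistent with the abstract's statement.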
Transposon Variants and Their Effects on Gene Expression in Arabidopsis

PubMed Central

Wang, Xi; Weigel, Detlef; Smith, Lisa M.

2013-01-01

Transposable elements (TEs) make up the majority of many plant genomes. Their transcription and transposition are controlled through siRNAs and epigenetic marks including DNA methylation. To dissect the interplay of siRNA–mediated regulation and TE evolution, and to examine how TE differences affect nearby gene expression, we investigated genome-wide differences in TEs, siRNAs, and gene expression among three Arabidopsis thaliana accessions. Both TE sequence polymorphisms and presence of linked TEs are positively correlated with intraspecific variation in gene expression. The expression of genes within 2 kb of conserved TEs is more stable than that of genes next to variant TEs harboring sequence polymorphisms. Polymorphism levels of TEs and closely linked adjacent genes are positively correlated as well. We also investigated the distribution of 24-nt-long siRNAs, which mediate TE repression. TEs targeted by uniquely mapping siRNAs are on average farther from coding genes, apparently because they more strongly suppress expression of adjacent genes. Furthermore, siRNAs, and especially uniquely mapping siRNAs, are enriched in TE regions missing in other accessions. Thus, targeting by uniquely mapping siRNAs appears to promote sequence deletions in TEs. Overall, our work indicates that siRNA–targeting of TEs may influence removal of sequences from the genome and hence evolution of gene expression in plants. PMID:23408902
Whole genome resequencing of a laboratory-adapted Drosophila melanogaster population sample

PubMed Central

Gilks, William P.; Pennell, Tanya M.; Flis, Ilona; Webster, Matthew T.; Morrow, Edward H.

2016-01-01

As part of a study into the molecular genetics of sexually dimorphic complex traits, we used high-throughput sequencing to obtain data on genomic variation in an outbred laboratory-adapted fruit fly (Drosophila melanogaster) population. We successfully resequenced the whole genome of 220 hemiclonal females that were heterozygous for the same Berkeley reference line genome (BDGP6/dm6), and a unique haplotype from the outbred base population (LH_M). The use of a static and known genetic background enabled us to obtain sequences from whole-genome phased haplotypes. We used a BWA-Picard-GATK pipeline for mapping sequence reads to the dm6 reference genome assembly, at a median depth of coverage of 31X, and have made the resulting data publicly available in the NCBI Short Read Archive (Accession number SRP058502). We used HaplotypeCaller to discover and genotype 1,726,931 small genomic variants (SNPs and indels, <200bp). Additionally, we detected and genotyped 167 large structural variants (1-100Kb in size) using GenomeStrip/2.0. Sequence and genotype data are publicly available at the corresponding NCBI databases: Short Read Archive, dbSNP and dbVar (BioProject PRJNA282591). We have also released the unfiltered genotype data, and the code and logs for data processing and summary statistics (https://zenodo.org/communities/sussex_drosophila_sequencing/). PMID:27928499
Phylogenetic and Genome-Wide Deep-Sequencing Analyses of Canine Parvovirus Reveal Co-Infection with Field Variants and Emergence of a Recent Recombinant Strain

PubMed Central

Pérez, Ruben; Calleros, Lucía; Marandino, Ana; Sarute, Nicolás; Iraola, Gregorio; Grecco, Sofia; Blanc, Hervé; Vignuzzi, Marco; Isakov, Ofer; Shomron, Noam; Carrau, Lucía; Hernández, Martín; Francia, Lourdes; Sosa, Katia; Tomás, Gonzalo; Panzera, Yanina

2014-01-01

Canine parvovirus (CPV), a fast-evolving single-stranded DNA virus, comprises three antigenic variants (2a, 2b, and 2c) with different frequencies and genetic variability among countries. The contribution of co-infection and recombination to the genetic variability of CPV is far from being fully elucidated. Here we took advantage of a natural CPV population, recently formed by the convergence of divergent CPV-2c and CPV-2a strains, to study co-infection and recombination. Complete sequences of the viral coding region of CPV-2a and CPV-2c strains from 40 samples were generated and analyzed using phylogenetic tools. Two samples showed co-infection and were further analyzed by deep sequencing. The sequence profile of one of the samples revealed the presence of CPV-2c and CPV-2a strains that differed at 29 nucleotides. The other sample included a minor CPV-2a strain (13.3% of the viral population) and a major recombinant strain (86.7%). The recombinant strain arose from inter-genotypic recombination between CPV-2c and CPV-2a strains within the VP1/VP2 gene boundary. Our findings highlight the importance of deep-sequencing analysis to provide a better understanding of CPV molecular diversity. PMID:25365348
A splice variant in the ACSL5 gene relates migraine with fatty acid activation in mitochondria

PubMed Central

Matesanz, Fuencisla; Fedetz, María; Barrionuevo, Cristina; Karaky, Mohamad; Catalá-Rabasa, Antonio; Potenciano, Victor; Bello-Morales, Raquel; López-Guerrero, Jose-Antonio; Alcina, Antonio

2016-01-01

Genome-wide association studies (GWAS) in migraine are providing the molecular basis of this heterogeneous disease, but the understanding of its aetiology is still incomplete. Although some biomarkers are currently accepted for migraine, many more studies are needed to identify new ones. The migraine-associated variant rs12355831:A>G (P=2 × 10⁻⁶), described in a GWAS of the International Headache Genetic Consortium, is localized in a non-coding sequence with unknown function. We sought to identify the causal variant and the genetic mechanism involved in the migraine risk. To this end, we integrated RNA sequence data from the Genetic European Variation in Health and Disease (GEUVADIS) and genotypes from 1000 Genomes for 344 lymphoblastoid cell lines (LCLs), to determine the expression quantitative trait loci (eQTLs) in the region. We found that the migraine-associated variant belongs to a linkage disequilibrium block associated with the expression of an acyl-coenzyme A synthetase 5 (ACSL5) transcript lacking exon 20 (ACSL5-Δ20). We showed by exon-skipping assay a direct causality of rs2256368-G in the exon 20 skipping of approximately 20 to 40% of ACSL5 RNA molecules. In conclusion, we identified the functional variant (rs2256368:A>G) affecting ACSL5 exon 20 skipping, as a causal factor linked to the migraine-associated rs12355831:A>G, suggesting that the activation of long-chain fatty acids by the spliced ACSL5-Δ20 molecules, a mitochondrially located enzyme, is involved in migraine pathology. PMID:27189022
Molecular pathological study on LRRC10 in sudden unexplained nocturnal death syndrome in the Chinese Han population

PubMed Central

Huang, Lei; Tang, Shuangbo; Chen, Yili; Zhang, Liyong; Yin, Kun; Wu, Yeda; Zheng, Jinxiang; Wu, Qiuping; Makielski, Jonathan C.

2017-01-01

Sudden unexplained nocturnal death syndrome (SUNDS) is a perplexing disorder for both forensic pathologists and clinical physicians. Clinical features of SUNDS survivors suggest that SUNDS is similar to Brugada syndrome (BrS). The leucine-rich repeat containing 10 (LRRC10) gene is a newly identified gene linked to dilated cardiomyopathy, a disease associated with sudden cardiac death. To investigate the prevalence and spectrum of genetic variants of the LRRC10 gene in SUNDS and BrS, the coding regions of LRRC10 were genetically screened in 113 sporadic SUNDS victims (from January 2005 to December 2015, 30.7 ± 7.5 years) and ten BrS patients (from January 2010 to December 2014, 38.7 ± 10.3 years) using direct Sanger sequencing. Afterwards, LRRC10 missense variant carriers were screened for a panel of 80 genes known to be associated with inherited cardiac arrhythmia/cardiomyopathy using target-captured next-generation sequencing. In this study, an in silico-predicted malignant LRRC10 mutation p.E129K was detected in one SUNDS victim without a pathogenic rare variant in a panel of 80 arrhythmia/cardiomyopathy-related genes. We also provided evidence to show that the rare variant p.P69L might contribute to the genetic cause for one SUNDS victim and two BrS family members. This is the first report of genetic screening of LRRC10 in Chinese SUNDS victims and BrS patients. LRRC10 may be a new susceptibility gene for SUNDS, and an LRRC10 variant was genetically linked to BrS-associated arrhythmia for the first time. PMID:28032242
A coding single-nucleotide polymorphism in lysine demethylase KDM4A associates with increased sensitivity to mTOR inhibitors.

PubMed

Van Rechem, Capucine; Black, Joshua C; Greninger, Patricia; Zhao, Yang; Donado, Carlos; Burrowes, Paul D; Ladd, Brendon; Christiani, David C; Benes, Cyril H; Whetstine, Johnathan R

2015-03-01

SNPs occur within chromatin-modulating factors; however, little is known about how these variants within the coding sequence affect cancer progression or treatment. Therefore, there is a need to establish their biochemical and/or molecular contribution, their use in subclassifying patients, and their impact on therapeutic response. In this report, we demonstrate that coding SNP-A482 within the lysine tridemethylase gene KDM4A/JMJD2A has different allelic frequencies across ethnic populations, associates with differential outcome in patients with non-small cell lung cancer (NSCLC), and promotes KDM4A protein turnover. Using an unbiased drug screen against 87 preclinical and clinical compounds, we demonstrate that homozygous SNP-A482 cells have increased mTOR inhibitor sensitivity. mTOR inhibitors significantly reduce SNP-A482 protein levels, which parallels the increased drug sensitivity observed with KDM4A depletion. Our data emphasize the importance of using variant status as candidate biomarkers and highlight the importance of studying SNPs in chromatin modifiers to achieve better targeted therapy. This report documents the first coding SNP within a lysine demethylase that associates with worse outcome in patients with NSCLC. We demonstrate that this coding SNP alters the protein turnover and associates with increased mTOR inhibitor sensitivity, which identifies a candidate biomarker for mTOR inhibitor therapy and a therapeutic target for combination therapy. ©2015 American Association for Cancer Research.
A case report and literature review of Fanconi Anemia (FA) diagnosed by genetic testing.

PubMed

Solomon, Ponnumony John; Margaret, Priya; Rajendran, Ramya; Ramalingam, Revathy; Menezes, Godfred A; Shirley, Alph S; Lee, Seung Jun; Seong, Moon-Woo; Park, Sung Sup; Seol, Dodam; Seo, Soo Hyun

2015-05-08

Fanconi anemia (FA) is a genetically heterogeneous rare autosomal recessive disorder characterized by congenital malformations, hematological problems and predisposition to malignancies. The genes that have been found to be mutated in FA patients are called FANC. To date, 16 distinct FANC genes have been reported. Among these, mutations in FANCA are the most frequent among FA patients worldwide, accounting for 60-65%. In this study, a nine-year-old male child was brought to our hospital one year ago for opinion and advice. He was the third child born to consanguineous parents. The mutation analyses were performed for the proband, parents, elder sibling and relatives [maternal aunt and maternal aunt's son (cousin)]. Molecular genetic testing [targeted next-generation sequencing (MiSeq, Illumina method)] was performed by mutation analysis in 15 genes involved. Entire coding exons and their flanking regions of the genes were analysed. Sanger sequencing (ABI 3730 analyzer, Applied Biosystems) was performed using primers specific for 43 coding exons of the FANCA gene. A novel splice site mutation, c.3066+1G>T (IVS31+1G>T), was detected in the homozygous state by sequencing in the patient. This sequence variant was identified in the heterozygous state in his parents. Further, it was not identified in other family members (elder sibling, maternal aunt and cousin). It is concluded that genetic study should be done if possible in all cases of suspected FA, including siblings, parents and close blood relatives. It will help to plan appropriate treatment, to select a suitable donor for hematopoietic stem cell transplantation, and to plan for genetic counseling. In addition to the case report, the main focus of this manuscript was to review the literature on the role of the FANCA gene in FA, since a large number of FANCA mutations and polymorphisms have been identified.
Deleterious ABCA7 mutations and transcript rescue mechanisms in early onset Alzheimer's disease.

PubMed

De Roeck, Arne; Van den Bossche, Tobi; van der Zee, Julie; Verheijen, Jan; De Coster, Wouter; Van Dongen, Jasper; Dillen, Lubina; Baradaran-Heravi, Yalda; Heeman, Bavo; Sanchez-Valle, Raquel; Lladó, Albert; Nacmias, Benedetta; Sorbi, Sandro; Gelpi, Ellen; Grau-Rivera, Oriol; Gómez-Tortosa, Estrella; Pastor, Pau; Ortega-Cubero, Sara; Pastor, Maria A; Graff, Caroline; Thonberg, Håkan; Benussi, Luisa; Ghidoni, Roberta; Binetti, Giuliano; de Mendonça, Alexandre; Martins, Madalena; Borroni, Barbara; Padovani, Alessandro; Almeida, Maria Rosário; Santana, Isabel; Diehl-Schmid, Janine; Alexopoulos, Panagiotis; Clarimon, Jordi; Lleó, Alberto; Fortea, Juan; Tsolaki, Magda; Koutroumani, Maria; Matěj, Radoslav; Rohan, Zdenek; De Deyn, Peter; Engelborghs, Sebastiaan; Cras, Patrick; Van Broeckhoven, Christine; Sleegers, Kristel

2017-09-01

Premature termination codon (PTC) mutations in the ATP-Binding Cassette, Sub-Family A, Member 7 gene (ABCA7) have recently been identified as an intermediate-to-high penetrant risk factor for late-onset Alzheimer's disease (LOAD). High variability, however, is observed in downstream ABCA7 mRNA and protein expression, disease penetrance, and onset age, indicative of unknown modifying factors. Here, we investigated the prevalence and disease penetrance of ABCA7 PTC mutations in a large early onset AD (EOAD)-control cohort, and examined the effect on transcript level with comprehensive third-generation long-read sequencing. We characterized the ABCA7 coding sequence with next-generation sequencing in 928 EOAD patients and 980 matched control individuals. With MetaSKAT rare variant association analysis, we observed a fivefold enrichment (p = 0.0004) of PTC mutations in EOAD patients (3%) versus controls (0.6%). Ten novel PTC mutations were only observed in patients, and PTC mutation carriers in general had an increased familial AD load. In addition, we observed nominal risk reducing trends for three common coding variants. Seven PTC mutations were further analyzed using targeted long-read cDNA sequencing on an Oxford Nanopore MinION platform. PTC-containing transcripts for each investigated PTC mutation were observed at varying proportion (5-41% of the total read count), implying incomplete nonsense-mediated mRNA decay (NMD). Furthermore, we distinguished and phased several previously unknown alternative splicing events (up to 30% of transcripts). In conjunction with PTC mutations, several of these novel ABCA7 isoforms have the potential to rescue deleterious PTC effects. In conclusion, ABCA7 PTC mutations play a substantial role in EOAD, warranting genetic screening of ABCA7 in genetically unexplained patients. Long-read cDNA sequencing revealed both varying degrees of NMD and transcript-modifying events, which may influence ABCA7 dosage and disease severity, and may create opportunities for therapeutic interventions in AD.
Joint linkage and association analysis with exome sequence data implicates SLC25A40 in hypertriglyceridemia.

PubMed

Rosenthal, Elisabeth A; Ranchalis, Jane; Crosslin, David R; Burt, Amber; Brunzell, John D; Motulsky, Arno G; Nickerson, Deborah A; Wijsman, Ellen M; Jarvik, Gail P

2013-12-05

Hypertriglyceridemia (HTG) is a heritable risk factor for cardiovascular disease. Investigating the genetics of HTG may identify new drug targets. There are ~35 known single-nucleotide variants (SNVs) that explain only ~10% of variation in triglyceride (TG) level. Because of the genetic heterogeneity of HTG, a family study design is optimal for identification of rare genetic variants with large effect size because the same mutation can be observed in many relatives and cosegregation with TG can be tested. We considered HTG in a five-generation family of European American descent (n = 121), ascertained for familial combined hyperlipidemia. By using Bayesian Markov chain Monte Carlo joint oligogenic linkage and association analysis, we detected linkage to chromosomes 7 and 17. Whole-exome sequence data revealed shared, highly conserved, private missense SNVs in both SLC25A40 on chr7 and PLD2 on chr17. Jointly, these SNVs explained 49% of the genetic variance in TG; however, only the SLC25A40 SNV was significantly associated with TG (p = 0.0001). This SNV, c.374A>G, causes a highly disruptive p.Tyr125Cys substitution just outside the second helical transmembrane region of the SLC25A40 inner mitochondrial membrane transport protein. Whole-gene testing in subjects from the Exome Sequencing Project confirmed the association between TG and SLC25A40 rare, highly conserved, coding variants (p = 0.03). These results suggest a previously undescribed pathway for HTG and illustrate the power of large pedigrees in the search for rare, causal variants. Copyright © 2013 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
Sequencing the GRHL3 Coding Region Reveals Rare Truncating Mutations and a Common Susceptibility Variant for Nonsyndromic Cleft Palate

PubMed Central

Mangold, Elisabeth; Böhmer, Anne C.; Ishorst, Nina; Hoebel, Ann-Kathrin; Gültepe, Pinar; Schuenke, Hannah; Klamt, Johanna; Hofmann, Andrea; Gölz, Lina; Raff, Ruth; Tessmann, Peter; Nowak, Stefanie; Reutter, Heiko; Hemprich, Alexander; Kreusch, Thomas; Kramer, Franz-Josef; Braumann, Bert; Reich, Rudolf; Schmidt, Gül; Jäger, Andreas; Reiter, Rudolf; Brosch, Sibylle; Stavusis, Janis; Ishida, Miho; Seselgyte, Rimante; Moore, Gudrun E.; Nöthen, Markus M.; Borck, Guntram; Aldhorae, Khalid A.; Lace, Baiba; Stanier, Philip; Knapp, Michael; Ludwig, Kerstin U.

2016-01-01

Nonsyndromic cleft lip with/without cleft palate (nsCL/P) and nonsyndromic cleft palate only (nsCPO) are the most frequent subphenotypes of orofacial clefts. A common syndromic form of orofacial clefting is Van der Woude syndrome (VWS) where individuals have CL/P or CPO, often but not always associated with lower lip pits. Recently, ∼5% of VWS-affected individuals were identified with mutations in the grainy head-like 3 gene (GRHL3). To investigate GRHL3 in nonsyndromic clefting, we sequenced its coding region in 576 Europeans with nsCL/P and 96 with nsCPO. Most strikingly, nsCPO-affected individuals had a higher minor allele frequency for rs41268753 (0.099) than control subjects (0.049; p = 1.24 × 10⁻²). This association was replicated in nsCPO/control cohorts from Latvia, Yemen, and the UK (p_combined = 2.63 × 10⁻⁵; OR_allelic = 2.46 [95% CI 1.6–3.7]) and reached genome-wide significance in combination with imputed data from a GWAS in nsCPO triads (p = 2.73 × 10⁻⁹). Notably, rs41268753 is not associated with nsCL/P (p = 0.45). rs41268753 encodes the highly conserved p.Thr454Met (c.1361C>T) (GERP = 5.3), which prediction programs denote as deleterious, has a CADD score of 29.6, and increases protein binding capacity in silico. Sequencing also revealed four novel truncating GRHL3 mutations including two that were de novo in four families, where all nine individuals harboring mutations had nsCPO. This is important for genetic counseling: given that VWS is rare compared to nsCPO, our data suggest that dominant GRHL3 mutations are more likely to cause nonsyndromic than syndromic CPO. Thus, with rare dominant mutations and a common risk variant in the coding region, we have identified an important contribution for GRHL3 in nsCPO. PMID:27018475
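As a rough illustration of how an allelic odds ratio relates to the minor allele frequencies quoted in this abstract, the sketch below computes one directly from the two frequencies in the sequenced cohort (0.099 in nsCPO cases, 0.049 in controls). The function name is hypothetical, and this is a back-of-the-envelope calculation, not the authors' analysis. The result is not expected to equal the reported OR of 2.46, which comes from the combined replication cohorts.

```python
def allelic_odds_ratio(maf_cases: float, maf_controls: float) -> float:
    """Odds of carrying the minor allele in cases divided by the odds in controls."""
    odds_cases = maf_cases / (1.0 - maf_cases)
    odds_controls = maf_controls / (1.0 - maf_controls)
    return odds_cases / odds_controls


# Minor allele frequencies for rs41268753 as quoted in the abstract.
discovery_or = allelic_odds_ratio(0.099, 0.049)
print(f"allelic OR from quoted frequencies: {discovery_or:.2f}")
```

The value obtained this way falls inside the 95% confidence interval reported for the replication cohorts (1.6 to 3.7).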
H3.3 demarcates GC-rich coding and subtelomeric regions and serves as potential memory mark for virulence gene expression in Plasmodium falciparum

PubMed Central

Fraschka, Sabine Anne-Kristin; Henderson, Rob Wilhelmus Maria; Bártfai, Richárd

2016-01-01

Histones, by packaging and organizing the DNA into chromatin, serve as essential building blocks for eukaryotic life. The basic structure of the chromatin is established by four canonical histones (H2A, H2B, H3 and H4), while histone variants are more commonly utilized to alter the properties of specific chromatin domains. H3.3, a variant of histone H3, was found to have diverse localization patterns and functions across species but has been rather poorly studied in protists. Here we present the first genome-wide analysis of H3.3 in the malaria-causing, apicomplexan parasite, P. falciparum, which revealed a complex occupancy profile consisting of conserved and parasite-specific features. In contrast to other histone variants, PfH3.3 primarily demarcates euchromatic coding and subtelomeric repetitive sequences. Stable occupancy of PfH3.3 in these regions is largely uncoupled from the transcriptional activity and appears to be primarily dependent on the GC-content of the underlying DNA. Importantly, PfH3.3 specifically marks the promoter region of an active and poised, but not inactive, antigenic variation (var) gene, thereby potentially contributing to immune evasion. Collectively, our data suggest that PfH3.3, together with other histone variants, indexes the P. falciparum genome into functionally distinct domains and contributes to a key survival strategy of this deadly pathogen. PMID:27555062
Brain Region-Specific Expression of Genes Mapped within Quantitative Trait Loci for Behavioral Responsiveness to Acute Stress in Fisher 344 and Wistar Kyoto Male Rats (Postprint)

DTIC Science & Technology

2018-03-12

Integrative Genomics Viewer (Broad Institute, Cambridge, Massachusetts), we identified the coding sequence variations between the F344 and WKY... abnormalities and disturbances in brain metabolism resembling those in depressive states [74]. Ifna2 is also known to induce memory, concentration, and...Variant and Chronic Interpersonal Stress Prospectively Predicts Social Anxiety and Depression Symptoms Over Six Years. Clinical psychological science
BlackOPs: increasing confidence in variant detection through mappability filtering.

PubMed

Cabanski, Christopher R; Wilkerson, Matthew D; Soloway, Matthew; Parker, Joel S; Liu, Jinze; Prins, Jan F; Marron, J S; Perou, Charles M; Hayes, D Neil

2013-10-01

Identifying variants using high-throughput sequencing data is currently a challenge because true biological variants can be indistinguishable from technical artifacts. One source of technical artifact results from incorrectly aligning experimentally observed sequences to their true genomic origin ('mismapping') and inferring differences in mismapped sequences to be true variants. We developed BlackOPs, an open-source tool that simulates experimental RNA-seq and DNA whole exome sequences derived from the reference genome, aligns these sequences by custom parameters, detects variants and outputs a blacklist of positions and alleles caused by mismapping. Blacklists contain thousands of artifact variants that are indistinguishable from true variants and, for a given sample, are expected to be almost completely false positives. We show that these blacklist positions are specific to the alignment algorithm and read length used, and BlackOPs allows users to generate a blacklist specific to their experimental setup. We queried the dbSNP and COSMIC variant databases and found numerous variants indistinguishable from mapping errors. We demonstrate how filtering against blacklist positions reduces the number of potential false variants using an RNA-seq glioblastoma cell line data set. In summary, accounting for mapping-caused variants tuned to experimental setups reduces false positives and, therefore, improves genome characterization by high-throughput sequencing.
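The filtering step described in this abstract can be illustrated with a minimal sketch. The record layout used here, a (chromosome, position, alternate allele) tuple, is an assumption made for illustration and is not the BlackOPs file format; the function name is hypothetical.

```python
# Hypothetical sketch: drop variant calls that match a mapping-artifact blacklist.
# The (chrom, pos, alt) tuple layout is an assumption, not the BlackOPs output format.

def filter_blacklisted(calls, blacklist):
    """Return only the calls whose (chrom, pos, alt) is absent from the blacklist."""
    blocked = set(blacklist)  # set membership keeps each lookup O(1)
    return [call for call in calls if (call[0], call[1], call[2]) not in blocked]


# Made-up example data: one call coincides with a blacklisted position and allele.
calls = [("chr1", 1000, "A"), ("chr1", 2000, "T"), ("chr2", 500, "G")]
blacklist = [("chr1", 2000, "T")]
kept = filter_blacklisted(calls, blacklist)
```

Matching on the allele as well as the position follows the abstract's statement that the blacklist holds "positions and alleles"; a different allele at a blacklisted position would be retained by this sketch.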
Identification of verotoxin type 2 variant B subunit genes in Escherichia coli by the polymerase chain reaction and restriction fragment length polymorphism analysis.

PubMed Central

Tyler, S D; Johnson, W M; Lior, H; Wang, G; Rozee, K R

1991-01-01

A set of synthetic oligonucleotide primers was designed for use in a polymerase chain reaction protocol to specifically detect the B subunit genes in vtx2ha and vtx2hb, which code for the production of the VT2 (Shiga-like toxin II) variant cytotoxins VT2v-a and VT2v-b, respectively. An additional set of primers amplified a fragment common to the B subunits of the VT2 and the VT2 variant genes. Subsequent restriction endonuclease digestion of this amplicon permitted prediction of specific VT2 and variant genotypes on the basis of predetermined restriction fragment length polymorphisms. Genotypes of 21 VT2-producing strains of Escherichia coli were determined using this polymerase chain reaction-restriction fragment length polymorphism procedure. Four strains contained B subunit target sequences only for VT2 genes, 9 strains contained sequences only for VT2v-a genes, and 3 strains contained sequences only for VT2v-b. For genes in combination, one strain contained B subunit genes for both VT2 and VT2v-a and two strains contained B subunit genes for VT2 and VT2v-b. Two strains of E. coli O91:H21 contained both VT2v-a and VT2v-b B subunit genes. The VT2 reference strain of E. coli, E32511, was found to contain the targeted sequences from both VT2 and VT2v-a genes, whereas the recombinant E. coli, pEB1, possessed only that of the VT2 gene. The specific activities of extracellular VT2 determined in HeLa cells ranged from 0.3 to 41.7 TCD50 per microgram of protein in strains carrying the VT2 gene target and from 0 to 50.0 TCD50 per microgram of protein in strains carrying only the VT2 variant target (TCD50 is the tissue culture dose by which 50% of the cells were affected), suggesting that phenotypic expression does not correlate with genotype. PMID:1679436
Structure of Lmaj006129AAA, a hypothetical protein from Leishmania major

DOE Office of Scientific and Technical Information (OSTI.GOV)

Arakaki, Tracy; Le Trong, Isolde; Structural Genomics of Pathogenic Protozoa

2006-03-01

The crystal structure of a conserved hypothetical protein from L. major, Pfam sequence family PF04543, structural genomics target ID Lmaj006129AAA, has been determined at a resolution of 1.6 Å. The gene product of structural genomics target Lmaj006129 from Leishmania major codes for a 164-residue protein of unknown function. When SeMet expression of the full-length gene product failed, several truncation variants were created with the aid of Ginzu, a domain-prediction method. 11 truncations were selected for expression, purification and crystallization based upon secondary-structure elements and disorder. The structure of one of these variants, Lmaj006129AAH, was solved by multiple-wavelength anomalous diffraction (MAD) using ELVES, an automatic protein crystal structure-determination system. This model was then successfully used as a molecular-replacement probe for the parent full-length target, Lmaj006129AAA. The final structure of Lmaj006129AAA was refined to an R value of 0.185 (R_free = 0.229) at 1.60 Å resolution. Structure and sequence comparisons based on Lmaj006129AAA suggest that proteins belonging to Pfam sequence families PF04543 and PF01878 may share a common ligand-binding motif.
Genetic and molecular characterization of the maize rp3 rust resistance locus.

PubMed Central

Webb, Craig A; Richter, Todd E; Collins, Nicholas C; Nicolas, Marie; Trick, Harold N; Pryor, Tony; Hulbert, Scot H

2002-01-01

In maize, the Rp3 gene confers resistance to common rust caused by Puccinia sorghi. Flanking marker analysis of rust-susceptible rp3 variants suggested that most of them arose via unequal crossing over, indicating that rp3 is a complex locus like rp1. The PIC13 probe identifies a nucleotide binding site-leucine-rich repeat (NBS-LRR) gene family that maps to the complex. Rp3 variants show losses of PIC13 family members relative to the resistant parents when probed with PIC13, indicating that the Rp3 gene is a member of this family. Gel blots and sequence analysis suggest that at least 9 family members are at the locus in most Rp3-carrying lines and that at least 5 of these are transcribed in the Rp3-A haplotype. The coding regions of 14 family members, isolated from three different Rp3-carrying haplotypes, had DNA sequence identities from 93 to 99%. Partial sequencing of clones of a BAC contig spanning the rp3 locus in the maize inbred line B73 identified five different PIC13 paralogues in a region of approximately 140 kb. PMID:12242248
Rare variants in APP, PSEN1 and PSEN2 increase risk for AD in late-onset Alzheimer's disease families.

PubMed

Cruchaga, Carlos; Haller, Gabe; Chakraverty, Sumitra; Mayo, Kevin; Vallania, Francesco L M; Mitra, Robi D; Faber, Kelley; Williamson, Jennifer; Bird, Tom; Diaz-Arrastia, Ramon; Foroud, Tatiana M; Boeve, Bradley F; Graff-Radford, Neill R; St Jean, Pamela; Lawson, Michael; Ehm, Margaret G; Mayeux, Richard; Goate, Alison M

2012-01-01

Pathogenic mutations in APP, PSEN1, PSEN2, MAPT and GRN have previously been linked to familial early onset forms of dementia. Mutation screening in these genes has been performed in either very small series or in single families with late onset AD (LOAD). Similarly, studies in single families have reported mutations in MAPT and GRN associated with clinical AD but no systematic screen of a large dataset has been performed to determine how frequently this occurs. We report sequence data for 439 probands from late-onset AD families with a history of four or more affected individuals. Sixty sequenced individuals (13.7%) carried a novel or pathogenic mutation. Eight pathogenic variants (one each in APP and MAPT, two in PSEN1 and four in GRN), three of which are novel, were found in 14 samples. Thirteen additional variants, present in 23 families, did not segregate with disease, but the frequency of these variants is higher in AD cases than controls, indicating that these variants may also modify risk for disease. The frequency of rare variants in these genes in this series is significantly higher than in the 1000 Genomes Project (p = 5.09 × 10⁻⁵; OR = 2.21; 95%CI = 1.49-3.28) or an unselected population of 12,481 samples (p = 6.82 × 10⁻⁵; OR = 2.19; 95%CI = 1.347-3.26). Rare coding variants in APP, PSEN1 and PSEN2 increase risk for or cause late onset AD. The presence of variants in these genes in LOAD and early-onset AD demonstrates that factors other than the mutation can impact the age at onset and penetrance of at least some variants associated with AD. MAPT and GRN mutations can be found in clinical series of AD most likely due to misdiagnosis. This study clearly demonstrates that rare variants in these genes could explain an important proportion of genetic heritability of AD, which is not detected by GWAS.
Detection of a single nucleotide polymorphism in the human alpha-lactalbumin gene: implications for human milk proteins.

PubMed

Chowanadisai, Winyoo; Kelleher, Shannon L; Nemeth, Jennifer F; Yachetti, Stephen; Kuhlman, Charles F; Jackson, Joan G; Davis, Anne M; Lien, Eric L; Lönnerdal, Bo

2005-05-01

Variability in the protein composition of breast milk has been observed in many women and is believed to be due to natural variation of the human population. Single nucleotide polymorphisms (SNPs) are present throughout the entire human genome, but the impact of this variation on human milk composition and biological activity and infant nutrition and health is unclear. The goals of this study were to characterize a variant of human alpha-lactalbumin observed in milk from a Filipino population by determining the location of the polymorphism in the amino acid and genomic sequences of alpha-lactalbumin. Milk and blood samples were collected from 20 Filipino women, and milk samples were collected from an additional 450 women from nine different countries. alpha-Lactalbumin concentration was measured by high-performance liquid chromatography (HPLC), and milk samples containing the variant form of the protein were identified with both HPLC and mass spectrometry (MS). The molecular weight of the variant form was measured by MS, and the location of the polymorphism was narrowed down by protein reduction, alkylation and trypsin digestion. Genomic DNA was isolated from whole blood, and the polymorphism location and subject genotype were determined by amplifying the entire coding sequence of human alpha-lactalbumin by PCR, followed by DNA sequencing. A variant form of alpha-lactalbumin was observed in HPLC chromatograms, and the difference in molecular weight was determined by MS (wild type=14,070 Da, variant=14,056 Da). Protein reduction and digestion narrowed the polymorphism between the 33rd and 77th amino acid of the protein. The genetic polymorphism was identified as adenine to guanine, which translates to a substitution from isoleucine to valine at amino acid 46. The frequency of variation was higher in milk from China, Japan and Philippines, which suggests that this polymorphism is most prevalent in Asia. 
There are SNPs in the genome for human milk proteins and their implications for protein bioactivity and infant nutrition need to be considered.
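As a consistency check on the masses reported in this abstract, the 14 Da difference between wild type and variant can be compared with the mass change expected for an isoleucine-to-valine substitution (loss of one CH2 group). The sketch below uses standard average residue masses; these constants are textbook values and are not taken from the study.

```python
# Average residue masses in daltons (standard textbook values, not from the abstract).
ILE_RESIDUE_MASS = 113.1594  # isoleucine residue, C6H11NO
VAL_RESIDUE_MASS = 99.1326   # valine residue, C5H9NO

# Masses reported in the abstract for intact alpha-lactalbumin.
wild_type_mass = 14070
variant_mass = 14056

observed_shift = wild_type_mass - variant_mass          # 14 Da
expected_shift = ILE_RESIDUE_MASS - VAL_RESIDUE_MASS    # about 14.03 Da (one CH2)
discrepancy = abs(observed_shift - expected_shift)
```

The observed shift agrees with the expected one to within a small fraction of a dalton, which is consistent with the Ile-to-Val substitution identified by sequencing.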

Selecting sequence variants to improve genomic predictions for dairy cattle

USDA-ARS's Scientific Manuscript database

Millions of genetic variants have been identified by population-scale sequencing projects, but subsets are needed for routine genomic predictions or to include on genotyping arrays. Methods of selecting sequence variants were compared using both simulated sequence genotypes and actual data from run ...
A PYY Q62P variant linked to human obesity

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ahituv, Nadav; Kavaslar, Nihan; Schackwitz, Wendy

2005-06-27

Members of the pancreatic polypeptide family and their receptors have been implicated in the control of food intake in rodents and humans. To investigate whether nucleotide changes in these candidate genes result in abnormal weight in humans, we sequenced the coding exons and splice sites of seven family members (NPY, PYY, PPY, NPY1R, NPY2R, NPY4R, and NPY5R) in a large cohort of extremely obese (n=379) and lean (n=378) individuals. In total we found eleven rare non-synonymous variants, four of which exhibited familial segregation, NPY1R L53P and PPY P63L with leanness and NPY2R D42G and PYY Q62P with obesity. Functional analysis of the obese variants revealed NPY2R D42G to have reduced cell surface expression, while previous cell culture based studies indicated variant PYY Q62P to have altered receptor binding selectivity and we show that it fails to reduce food intake through mouse peptide injection experiments. These results support that rare non-synonymous variants within these genes can alter susceptibility to human body mass index extremes.
The Intolerance of Regulatory Sequence to Genetic Variation Predicts Gene Dosage Sensitivity

PubMed Central

Wang, Quanli; Halvorsen, Matt; Han, Yujun; Weir, William H.; Allen, Andrew S.; Goldstein, David B.

2015-01-01

Noncoding sequence contains pathogenic mutations. Yet, compared with mutations in protein-coding sequence, pathogenic regulatory mutations are notoriously difficult to recognize. Most fundamentally, we are not yet adept at recognizing the sequence stretches in the human genome that are most important in regulating the expression of genes. For this reason, it is difficult to apply to the regulatory regions the same kinds of analytical paradigms that are being successfully applied to identify mutations among protein-coding regions that influence risk. To determine whether dosage sensitive genes have distinct patterns among their noncoding sequence, we present two primary approaches that focus solely on a gene’s proximal noncoding regulatory sequence. The first approach is a regulatory sequence analogue of the recently introduced residual variation intolerance score (RVIS), termed noncoding RVIS, or ncRVIS. The ncRVIS compares observed and predicted levels of standing variation in the regulatory sequence of human genes. The second approach, termed ncGERP, reflects the phylogenetic conservation of a gene’s regulatory sequence using GERP++. We assess how well these two approaches correlate with four gene lists that use different ways to identify genes known or likely to cause disease through changes in expression: 1) genes that are known to cause disease through haploinsufficiency, 2) genes curated as dosage sensitive in ClinGen’s Genome Dosage Map, 3) genes judged likely to be under purifying selection for mutations that change expression levels because they are statistically depleted of loss-of-function variants in the general population, and 4) genes judged unlikely to cause disease based on the presence of copy number variants in the general population. We find that both noncoding scores are highly predictive of dosage sensitivity using any of these criteria. 
In a similar way to ncGERP, we assess two ensemble-based predictors of regional noncoding importance, ncCADD and ncGWAVA, and find both scores are significantly predictive of human dosage sensitive genes and appear to carry information beyond conservation, as assessed by ncGERP. These results highlight that the intolerance of noncoding sequence stretches in the human genome can provide a critical complementary tool to other genome annotation approaches to help identify the parts of the human genome increasingly likely to harbor mutations that influence risk of disease. PMID:26332131
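The idea of comparing observed with predicted levels of variation can be sketched as a regression residual. The example below is only an illustration under stated assumptions: the choice of predictor and response, the toy counts, and the use of plain standardized residuals (rather than whatever residual definition the authors applied) are all hypothetical.

```python
# Hypothetical sketch of a residual-based intolerance score.
# Assumption: each gene has a predictor count and an observed count of variants;
# genes with fewer observed variants than predicted get negative scores.

def residual_scores(predictor, observed):
    """Fit observed ~ predictor by least squares and return standardized residuals."""
    n = len(predictor)
    mean_x = sum(predictor) / n
    mean_y = sum(observed) / n
    sxx = sum((x - mean_x) ** 2 for x in predictor)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(predictor, observed))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(predictor, observed)]
    # Residual standard error with n - 2 degrees of freedom (two fitted parameters).
    scale = (sum(r * r for r in residuals) / (n - 2)) ** 0.5
    return [r / scale for r in residuals]


# Made-up counts for five genes; the last gene carries far fewer variants than the trend predicts.
scores = residual_scores([10, 20, 30, 40, 50], [2, 4, 6, 8, 3])
```

In this toy data the last gene receives the most negative score, the pattern that such a score would flag as intolerant to variation.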
Parallel targeted next generation sequencing of childhood and adult acute myeloid leukemia patients reveals uniform genomic profile of the disease.

PubMed

Marjanovic, Irena; Kostic, Jelena; Stanic, Bojana; Pejanovic, Nadja; Lucic, Bojana; Karan-Djurasevic, Teodora; Janic, Dragana; Dokmanovic, Lidija; Jankovic, Srdja; Vukovic, Nada Suvajdzic; Tomin, Dragica; Perisic, Ognjen; Rakocevic, Goran; Popovic, Milos; Pavlovic, Sonja; Tosic, Natasa

2016-10-01

The age-specific differences in the genetic mechanisms of myeloid leukemogenesis have been observed and studied previously. However, NGS technology has provided a possibility to obtain a large amount of mutation data. We analyzed DNA samples from 20 childhood (cAML) and 20 adult AML (aAML) patients, using NGS targeted sequencing. The average coverage of high-quality sequences was 2981× per amplicon. A total of 412 (207 cAML, 205 aAML) variants in the coding regions were detected, of which only 122 (62 cAML and 60 aAML) were potentially protein-changing. Our results confirmed that AML contains a small number of genetic alterations (median 3 mutations/patient in both groups). The prevalence of the most frequent single gene AML associated mutations differed in cAML and aAML patient cohorts: IDH1 (0% cAML, 5% aAML), IDH2 (0% cAML, 10% aAML), NPM1 (10% cAML, 35% aAML). Additionally, potentially protein-changing variants were found in tyrosine kinase genes or genes encoding tyrosine kinase associated proteins (JAK3, ABL1, GNAQ, and EGFR) in cAML, while among aAML, the prevalence is directed towards variants in the methylation and histone modifying genes (IDH1, IDH2, and SMARCB1). Besides a uniform genomic profile of AML, specific genetic characteristics were detected exclusively in cAML and in aAML.
Molecular Epidemiology of Mutations in Antimicrobial Resistance Loci of Pseudomonas aeruginosa Isolates from Airways of Cystic Fibrosis Patients.

PubMed

Greipel, Leonie; Fischer, Sebastian; Klockgether, Jens; Dorda, Marie; Mielke, Samira; Wiehlmann, Lutz; Cramer, Nina; Tümmler, Burkhard

2016-11-01

The chronic airway infections with Pseudomonas aeruginosa in people with cystic fibrosis (CF) are treated with aerosolized antibiotics, oral fluoroquinolones, and/or intravenous combination therapy with aminoglycosides and β-lactam antibiotics. An international strain collection of 361 P. aeruginosa isolates from 258 CF patients seen at 30 CF clinics was examined for mutations in 17 antimicrobial susceptibility and resistance loci that had been identified as hot spots of mutation by genome sequencing of serial isolates from a single CF clinic. Combinatorial amplicon sequencing of pooled PCR products identified 1,112 sequence variants that were not present in the genomes of representative strains of the 20 most common clones of the global P. aeruginosa population. A high frequency of singular coding variants was seen in spuE, mexA, gyrA, rpoB, fusA1, mexZ, mexY, oprD, ampD, parR, parS, and envZ (amgS), reflecting the pressure upon P. aeruginosa in lungs of CF patients to generate novel protein variants. The proportion of nonneutral amino acid exchanges was high. Of the 17 loci, mexA, mexZ, and pagL were most frequently affected by independent stop mutations. Private and de novo mutations seem to play a pivotal role in the response of P. aeruginosa populations to the antimicrobial load and the individual CF host. Copyright © 2016, American Society for Microbiology. All Rights Reserved.
Parasail: SIMD C library for global, semi-global, and local pairwise sequence alignments.

PubMed

Daily, Jeff

2016-02-10

Sequence alignment algorithms are a key component of many bioinformatics applications. Though various fast Smith-Waterman local sequence alignment implementations have been developed for x86 CPUs, most are embedded into larger database search tools. In addition, fast implementations of Needleman-Wunsch global sequence alignment and its semi-global variants are not as widespread. This article presents the first software library for local, global, and semi-global pairwise intra-sequence alignments and improves the performance of previous intra-sequence implementations. A faster intra-sequence local pairwise alignment implementation is described and benchmarked, including new global and semi-global variants. Using a 375 residue query sequence a speed of 136 billion cell updates per second (GCUPS) was achieved on a dual Intel Xeon E5-2670 24-core processor system, the highest reported for an implementation based on Farrar's 'striped' approach. Rognes's SWIPE optimal database search application is still generally the fastest available at 1.2 to at best 2.4 times faster than Parasail for sequences shorter than 500 amino acids. However, Parasail was faster for longer sequences. For global alignments, Parasail's prefix scan implementation is generally the fastest, faster even than Farrar's 'striped' approach; however, the opal library is faster for single-threaded applications. The software library is designed for 64 bit Linux, OS X, or Windows on processors with SSE2, SSE4.1, or AVX2. Source code is available from https://github.com/jeffdaily/parasail under the Battelle BSD-style license. Applications that require optimal alignment scores could benefit from the improved performance. For the first time, SIMD global, semi-global, and local alignments are available in a stand-alone C library.
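For readers unfamiliar with the recurrence that such libraries vectorize, the following is a plain scalar sketch of Smith-Waterman local alignment scoring. It is not Parasail's API or its SIMD implementation; the function name, the linear gap penalty, and the scoring values are illustrative assumptions. The returned cell count is the quantity that a "cell updates per second" figure divides by elapsed time.

```python
# Scalar Smith-Waterman sketch (illustrative only; not the Parasail API).
# Assumptions: linear gap penalty and simple match/mismatch scores.

def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best local alignment score, number of DP cells updated)."""
    prev = [0] * (len(b) + 1)  # previous DP row; row 0 is all zeros
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best, len(a) * len(b)


score, cells = smith_waterman_score("GGACGTCC", "ACGT")
```

Keeping only two rows of the matrix is sufficient when only the score is needed; recovering the alignment itself would require storing traceback information.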
An inversion of 25 base pairs causes feline GM2 gangliosidosis variant.

PubMed

Martin, Douglas R; Krum, Barbara K; Varadarajan, G S; Hathcock, Terri L; Smith, Bruce F; Baker, Henry J

2004-05-01

In G(M2) gangliosidosis variant 0, a defect in the beta-subunit of lysosomal beta-N-acetylhexosaminidase (EC 3.2.1.52) causes abnormal accumulation of G(M2) ganglioside and severe neurodegeneration. Distinct feline models of G(M2) gangliosidosis variant 0 have been described in both domestic shorthair and Korat cats. In this study, we determined that the causative mutation of G(M2) gangliosidosis in the domestic shorthair cat is a 25-base-pair inversion at the extreme 3' end of the beta-subunit (HEXB) coding sequence, which introduces three amino acid substitutions at the carboxyl terminus of the protein and a translational stop that is eight amino acids premature. Cats homozygous for the 25-base-pair inversion express levels of beta-subunit mRNA approximately 190% of normal and protein levels only 10-20% of normal. Because the 25-base-pair inversion is similar to mutations in the terminal exon of human HEXB, the domestic shorthair cat should serve as an appropriate model to study the molecular pathogenesis of human G(M2) gangliosidosis variant 0 (Sandhoff disease).
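At the sequence level, an inversion replaces a segment with its reverse complement when read on the same strand. The sketch below shows that operation on a made-up sequence; it does not use the feline HEXB sequence, and the function name is hypothetical.

```python
# Illustrative sketch of a sequence inversion on a made-up DNA string.
# An inverted segment reads as its reverse complement on the original strand.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def invert_segment(seq, start, length):
    """Return seq with seq[start:start+length] replaced by its reverse complement."""
    segment = seq[start:start + length]
    inverted = segment.translate(COMPLEMENT)[::-1]
    return seq[:start] + inverted + seq[start + length:]


original = "AAACCGTTT"
mutant = invert_segment(original, 3, 3)   # "CCG" becomes "CGG"
restored = invert_segment(mutant, 3, 3)   # inverting again restores the original
```

Because the overall length is unchanged, an in-frame inversion like the one described alters codons within the segment rather than shifting the reading frame, which is how it can introduce substitutions and a new stop codon.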
Whole-genome sequencing and genetic variant analysis of a Quarter Horse mare.

PubMed

Doan, Ryan; Cohen, Noah D; Sawyer, Jason; Ghaffari, Noushin; Johnson, Charlie D; Dindot, Scott V

2012-02-17

The catalog of genetic variants in the horse genome originates from a few select animals, the majority originating from the Thoroughbred mare used for the equine genome sequencing project. The purpose of this study was to identify genetic variants, including single nucleotide polymorphisms (SNPs), insertion/deletion polymorphisms (INDELs), and copy number variants (CNVs) in the genome of an individual Quarter Horse mare sequenced by next-generation sequencing. Using massively parallel paired-end sequencing, we generated 59.6 Gb of DNA sequence from a Quarter Horse mare resulting in an average of 24.7X sequence coverage. Reads were mapped to approximately 97% of the reference Thoroughbred genome. Unmapped reads were de novo assembled resulting in 19.1 Mb of new genomic sequence in the horse. Using a stringent filtering method, we identified 3.1 million SNPs, 193 thousand INDELs, and 282 CNVs. Genetic variants were annotated to determine their impact on gene structure and function. Additionally, we genotyped this Quarter Horse for mutations of known diseases and for variants associated with particular traits. Functional clustering analysis of genetic variants revealed that most of the genetic variation in the horse's genome was enriched in sensory perception, signal transduction, and immunity and defense pathways. This is the first sequencing of a horse genome by next-generation sequencing and the first genomic sequence of an individual Quarter Horse mare. We have increased the catalog of genetic variants for use in equine genomics by the addition of novel SNPs, INDELs, and CNVs. The genetic variants described here will be a useful resource for future studies of genetic variation regulating performance traits and diseases in equids.
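The coverage figure in this abstract follows from simple arithmetic: mean coverage is total sequenced bases divided by the length of the sequence being covered. The sketch below back-calculates the implied denominator; whether the authors divided by the assembled reference length or by the mapped portion is not stated in the abstract, so the result is only an implied value.

```python
# Back-of-envelope check of the reported coverage (values from the abstract).
total_bases = 59.6e9     # 59.6 Gb of DNA sequence
mean_coverage = 24.7     # reported average coverage (X)

# Implied length of the covered sequence, in bases.
implied_genome_length = total_bases / mean_coverage
implied_genome_length_gb = implied_genome_length / 1e9
```

The implied length is roughly 2.4 Gb.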
Exome chip meta-analysis identifies novel loci and East Asian-specific coding variants that contribute to lipid levels and coronary artery disease.

PubMed

Lu, Xiangfeng; Peloso, Gina M; Liu, Dajiang J; Wu, Ying; Zhang, He; Zhou, Wei; Li, Jun; Tang, Clara Sze-Man; Dorajoo, Rajkumar; Li, Huaixing; Long, Jirong; Guo, Xiuqing; Xu, Ming; Spracklen, Cassandra N; Chen, Yang; Liu, Xuezhen; Zhang, Yan; Khor, Chiea Chuen; Liu, Jianjun; Sun, Liang; Wang, Laiyuan; Gao, Yu-Tang; Hu, Yao; Yu, Kuai; Wang, Yiqin; Cheung, Chloe Yu Yan; Wang, Feijie; Huang, Jianfeng; Fan, Qiao; Cai, Qiuyin; Chen, Shufeng; Shi, Jinxiu; Yang, Xueli; Zhao, Wanting; Sheu, Wayne H-H; Cherny, Stacey Shawn; He, Meian; Feranil, Alan B; Adair, Linda S; Gordon-Larsen, Penny; Du, Shufa; Varma, Rohit; Chen, Yii-Der Ida; Shu, Xiao-Ou; Lam, Karen Siu Ling; Wong, Tien Yin; Ganesh, Santhi K; Mo, Zengnan; Hveem, Kristian; Fritsche, Lars G; Nielsen, Jonas Bille; Tse, Hung-Fat; Huo, Yong; Cheng, Ching-Yu; Chen, Y Eugene; Zheng, Wei; Tai, E Shyong; Gao, Wei; Lin, Xu; Huang, Wei; Abecasis, Goncalo; Kathiresan, Sekar; Mohlke, Karen L; Wu, Tangchun; Sham, Pak Chung; Gu, Dongfeng; Willer, Cristen J

2017-12-01

Most genome-wide association studies have been of European individuals, even though most genetic variation in humans is seen only in non-European samples. To search for novel loci associated with blood lipid levels and clarify the mechanism of action at previously identified lipid loci, we used an exome array to examine protein-coding genetic variants in 47,532 East Asian individuals. We identified 255 variants at 41 loci that reached chip-wide significance, including 3 novel loci and 14 East Asian-specific coding variant associations. After a meta-analysis including >300,000 European samples, we identified an additional nine novel loci. Sixteen genes were identified by protein-altering variants in both East Asians and Europeans, and thus are likely to be functional genes. Our data demonstrate that most of the low-frequency or rare coding variants associated with lipids are population specific, and that examining genomic data across diverse ancestries may facilitate the identification of functional genes at associated loci.
Exome chip meta-analysis identifies novel loci and East Asian-specific coding variants contributing to lipid levels and coronary artery disease

PubMed Central

Lu, Xiangfeng; Peloso, Gina M; Liu, Dajiang J.; Wu, Ying; Zhang, He; Zhou, Wei; Li, Jun; Tang, Clara Sze-man; Dorajoo, Rajkumar; Li, Huaixing; Long, Jirong; Guo, Xiuqing; Xu, Ming; Spracklen, Cassandra N.; Chen, Yang; Liu, Xuezhen; Zhang, Yan; Khor, Chiea Chuen; Liu, Jianjun; Sun, Liang; Wang, Laiyuan; Gao, Yu-Tang; Hu, Yao; Yu, Kuai; Wang, Yiqin; Cheung, Chloe Yu Yan; Wang, Feijie; Huang, Jianfeng; Fan, Qiao; Cai, Qiuyin; Chen, Shufeng; Shi, Jinxiu; Yang, Xueli; Zhao, Wanting; Sheu, Wayne H.-H.; Cherny, Stacey Shawn; He, Meian; Feranil, Alan B.; Adair, Linda S.; Gordon-Larsen, Penny; Du, Shufa; Varma, Rohit; da Chen, Yii-Der I; Shu, XiaoOu; Lam, Karen Siu Ling; Wong, Tien Yin; Ganesh, Santhi K.; Mo, Zengnan; Hveem, Kristian; Fritsche, Lars; Nielsen, Jonas Bille; Tse, Hung-fat; Huo, Yong; Cheng, Ching-Yu; Chen, Y. Eugene; Zheng, Wei; Tai, E Shyong; Gao, Wei; Lin, Xu; Huang, Wei; Abecasis, Goncalo; Consortium, GLGC; Kathiresan, Sekar; Mohlke, Karen L.; Wu, Tangchun; Sham, Pak Chung; Gu, Dongfeng; Willer, Cristen J

2017-01-01

Most genome-wide association studies have been conducted in European individuals, even though most genetic variation in humans is seen only in non-European samples. To search for novel loci associated with blood lipid levels and clarify the mechanism of action at previously identified lipid loci, we examined protein-coding genetic variants in 47,532 East Asian individuals using an exome array. We identified 255 variants at 41 loci reaching chip-wide significance, including 3 novel loci and 14 East Asian-specific coding variant associations. After meta-analysis with > 300,000 European samples, we identified an additional 9 novel loci. The same 16 genes were identified by the protein-altering variants in both East Asians and Europeans, likely pointing to the functional genes. Our data demonstrate that most of the low-frequency or rare coding variants associated with lipids are population-specific, and that examining genomic data across diverse ancestries may facilitate the identification of functional genes at associated loci. PMID:29083407
Contactin 4 as an Autism Susceptibility Locus

PubMed Central

Cottrell, Catherine E.; Bir, Natalie; Varga, Elizabeth; Alvarez, Carlos E.; Bouyain, Samuel; Zernzach, Randall; LambThrush, Devon; Evans, Johnna; Trimarchi, Michael; Butter, Eric M.; Cunningham, David; Gastier-Foster, Julie M.; McBride, Kim; Herman, Gail E.

2011-01-01

Structural and sequence variation have been described in several members of the contactin (CNTN) and contactin associated protein (CNTNAP) gene families in association with neurodevelopmental disorders, including autism. Using array comparative genome hybridization (CGH), we identified a maternally inherited ~535 kb deletion at 3p26.3 encompassing the 5′ end of the contactin 4 gene (CNTN4) in a patient with autism. Based on this finding and previous reports implicating genomic rearrangements of CNTN4 in autism spectrum disorders (ASDs) and 3p− microdeletion syndrome, we undertook sequencing of the coding regions of the gene in a local ASD cohort in comparison with a set of controls. Unique missense variants were identified in 4/75 unrelated individuals with an ASD, as well as in 1/107 controls. All of the amino acid substitutions were nonsynonymous, occurred at evolutionarily conserved positions, and were, thus, felt likely to be deleterious. However, these data did not reach statistical significance, nor did the variants segregate with disease within all of the ASD families. Finally, there was no detectable difference in binding of two of the variants to the interacting protein PTPRG in vitro. Thus, additional, larger studies will be necessary to determine whether CNTN4 functions as an autism susceptibility locus in combination with other genetic and/or environmental factors. PMID:21308999
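The carrier counts in this abstract (4 of 75 cases, 1 of 107 controls) can be examined with a two-sided Fisher's exact test, sketched below using only the standard library. The abstract does not say which test the authors used, so this is an illustrative check and not a reproduction of their analysis.

```python
from math import comb

# Two-sided Fisher's exact test for a 2x2 table [[a, b], [c, d]].
# Assumption: this is an illustrative test choice; the abstract does not name the test used.

def fisher_two_sided(a, b, c, d):
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(k):
        # Hypergeometric probability of k carriers in row 1 with fixed margins.
        return comb(row1, k) * comb(row2, col1 - k) / comb(total, col1)

    p_observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as unlikely as the observed one.
    return sum(prob(k) for k in range(low, high + 1) if prob(k) <= p_observed * (1 + 1e-9))


# Carriers vs non-carriers: 4 and 71 among ASD cases, 1 and 106 among controls.
p_value = fisher_two_sided(4, 71, 1, 106)
```

Under these assumptions the p-value comes out near 0.16, in line with the abstract's statement that the difference did not reach statistical significance.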
Mutation and deletion analysis of GFRα-1, encoding the co-receptor for the GDNF/RET complex, in human brain tumours

PubMed Central

Gimm, O; Gössling, A; Marsh, D J; Dahia, P L M; Mulligan, L M; Deimling, A von; Eng, C

1999-01-01

Glial cell line-derived neurotrophic factor (GDNF) plays a key role in the control of vertebrate neuron survival and differentiation in both the central and peripheral nervous systems. GDNF preferentially binds to GFRα-1 which then interacts with the receptor tyrosine kinase RET. We investigated a panel of 36 independent cases of mainly advanced sporadic brain tumours for the presence of mutations in GDNF and GFRα-1. No mutations were found in the coding region of GDNF. We identified six previously described GFRα-1 polymorphisms, two of which lead to an amino acid change. In 15 of 36 brain tumours, all polymorphic variants appeared to be homozygous. Of these 15 tumours, one also had a rare, apparently homozygous, sequence variant at codon 361. Because of the rarity of the combination of homozygous sequence variants, analysis for hemizygous deletion was pursued in the 15 samples and loss of heterozygosity was found in 11 tumours. Our data suggest that intragenic point mutations of GDNF or GFRα-1 are not a common aetiologic event in brain tumours. However, either deletion of GFRα-1 and/or nearby genes may contribute to the pathogenesis of these tumours. © 1999 Cancer Research Campaign PMID:10408842
Toward rules relating zinc finger protein sequences and DNA binding site preferences.

PubMed

Desjarlais, J R; Berg, J M

1992-08-15

Zinc finger proteins of the Cys2-His2 type consist of tandem arrays of domains, where each domain appears to contact three adjacent base pairs of DNA through three key residues. We have designed and prepared a series of variants of the central zinc finger within the DNA binding domain of Sp1 by using information from an analysis of a large data base of zinc finger protein sequences. Through systematic variations at two of the three contact positions (underlined), relatively specific recognition of sequences of the form 5'-GGGGN(G or T)GGG-3' has been achieved. These results provide the basis for rules that may develop into a code that will allow the design of zinc finger proteins with preselected DNA site specificity.
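The recognition pattern quoted above, 5'-GGGGN(G or T)GGG-3', maps directly onto a regular expression. The sketch below scans the forward strand of a made-up test sequence; the sequence and the helper name are illustrative assumptions, not data from the study.

```python
import re

# GGGG, any base (N), then G or T, then GGG. The lookahead lets overlapping sites be reported.
SITE = re.compile(r"(?=(GGGG[ACGT][GT]GGG))")

def find_sites(seq):
    """Return (start, matched site) pairs on the forward strand, 0-based."""
    return [(m.start(), m.group(1)) for m in SITE.finditer(seq.upper())]

sites = find_sites("ttGGGGCGGGGaaGGGGATGGGcc")  # hypothetical sequence
print(sites)
```

A fuller scan would also search the reverse complement. A pattern match says nothing about binding affinity, which the abstract describes only as "relatively specific".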
Post-mortem testing; germline BRCA1/2 variant detection using archival FFPE non-tumor tissue. A new paradigm in genetic counseling.

PubMed

Petersen, Annabeth Høgh; Aagaard, Mads Malik; Nielsen, Henriette Roed; Steffensen, Karina Dahl; Waldstrøm, Marianne; Bojesen, Anders

2016-08-01

Accurate estimation of cancer risk in HBOC families often requires BRCA1/2 testing, but this may be impossible in deceased family members. Previously, testing archival formalin-fixed, paraffin-embedded (FFPE) tissue for germline BRCA1/2 variants was unsuccessful, except for the Jewish founder mutations. A high-throughput method to systematically test for variants in all coding regions of BRCA1/2 in archival FFPE samples of non-tumor tissue is described, using HaloPlex target enrichment and next-generation sequencing. In a validation study, correct identification of variants or wild-type was possible in 25 out of 30 (83%) FFPE samples (age range 1-14 years), with a known variant status in BRCA1/2. No false positive was found. Unsuccessful identification was due to highly degraded DNA or presence of large intragenic deletions. In clinical use, a total of 201 FFPE samples (aged 0-43 years) were processed. Thirty-six samples were rejected because of highly degraded DNA or failed library preparation. Fifteen samples were investigated to search for a known variant. In the remaining 150 samples (aged 0-38 years), three variants known to affect function and one variant likely to affect function in BRCA1, six variants known to affect function and one variant likely to affect function in BRCA2, as well as four variants of unknown significance (VUS) in BRCA1 and three VUS in BRCA2 were discovered. It is now possible to test for germline BRCA1/2 variants in deceased persons, using archival FFPE samples from non-tumor tissue. Accurate genetic counseling is achievable in families where variant testing would otherwise be impossible.
Post-mortem testing; germline BRCA1/2 variant detection using archival FFPE non-tumor tissue. A new paradigm in genetic counseling

PubMed Central

Petersen, Annabeth Høgh; Aagaard, Mads Malik; Nielsen, Henriette Roed; Steffensen, Karina Dahl; Waldstrøm, Marianne; Bojesen, Anders

2016-01-01

Accurate estimation of cancer risk in HBOC families often requires BRCA1/2 testing, but this may be impossible in deceased family members. Previously, testing archival formalin-fixed, paraffin-embedded (FFPE) tissue for germline BRCA1/2 variants was unsuccessful, except for the Jewish founder mutations. A high-throughput method to systematically test for variants in all coding regions of BRCA1/2 in archival FFPE samples of non-tumor tissue is described, using HaloPlex target enrichment and next-generation sequencing. In a validation study, correct identification of variants or wild-type was possible in 25 out of 30 (83%) FFPE samples (age range 1–14 years), with a known variant status in BRCA1/2. No false positive was found. Unsuccessful identification was due to highly degraded DNA or presence of large intragenic deletions. In clinical use, a total of 201 FFPE samples (aged 0–43 years) were processed. Thirty-six samples were rejected because of highly degraded DNA or failed library preparation. Fifteen samples were investigated to search for a known variant. In the remaining 150 samples (aged 0–38 years), three variants known to affect function and one variant likely to affect function in BRCA1, six variants known to affect function and one variant likely to affect function in BRCA2, as well as four variants of unknown significance (VUS) in BRCA1 and three VUS in BRCA2 were discovered. It is now possible to test for germline BRCA1/2 variants in deceased persons, using archival FFPE samples from non-tumor tissue. Accurate genetic counseling is achievable in families where variant testing would otherwise be impossible. PMID:26733283
Genetic tracking of the raccoon variant of rabies virus in eastern North America.

PubMed

Szanto, Annamaria G; Nadin-Davis, Susan A; Rosatte, Richard C; White, Bradley N

2011-06-01

To gain insight into the incursion of the raccoon variant of rabies into the raccoon population in three Canadian provinces, a collection of 192 isolates of the raccoon rabies virus (RRV) strain was acquired from across its North American range and was genetically characterized. A 516-nucleotide segment of the non-coding region between the G and L protein open reading frames, corresponding to the most variable region of the rabies virus genome, was sequenced. This analysis identified 119 different sequences, and phylogenetic analysis of the dataset supports the documented history of RRV spread. Three distinct geographically restricted RRV lineages were identified. Lineage 1 was found in Florida, Alabama and Georgia and appears to form the ancestral lineage of the raccoon variant of rabies. Lineage 2, represented by just two isolates, was found only in Florida, while the third lineage appears broadly distributed throughout the rest of the eastern United States and eastern Canada. In New York State, two distinct spatially segregated variants were identified; the one occupying the western and northern portions of the state was responsible for an incursion of raccoon rabies into the Canadian province of Ontario. Isolates from New Brunswick and Quebec form distinct, separate clusters, consistent with their independent origins from neighboring areas of the United States. The data are consistent with localized northward incursion into these three separate areas with no evidence of east-west viral movement between the three Canadian provinces. Copyright © 2011 Elsevier B.V. All rights reserved.
Whole Exome Sequencing Identifies Novel Genes for Fetal Hemoglobin Response to Hydroxyurea in Children with Sickle Cell Anemia

PubMed Central

Sheehan, Vivien A.; Crosby, Jacy R.; Sabo, Aniko; Mortier, Nicole A.; Howard, Thad A.; Muzny, Donna M.; Dugan-Perez, Shannon; Aygun, Banu; Nottage, Kerri A.; Boerwinkle, Eric; Gibbs, Richard A.; Ware, Russell E.; Flanagan, Jonathan M.

2014-01-01

Hydroxyurea has proven efficacy in children and adults with sickle cell anemia (SCA), but with considerable inter-individual variability in the amount of fetal hemoglobin (HbF) produced. Sibling and twin studies indicate that some of that drug response variation is heritable. To test the hypothesis that genetic modifiers influence pharmacological induction of HbF, we investigated phenotype-genotype associations using whole exome sequencing of children with SCA treated prospectively with hydroxyurea to maximum tolerated dose (MTD). We analyzed 171 unrelated patients enrolled in two prospective clinical trials, all treated with dose escalation to MTD. We examined two MTD drug response phenotypes: HbF (final %HbF minus baseline %HbF), and final %HbF. Analyzing individual genetic variants, we identified multiple low frequency and common variants associated with HbF induction by hydroxyurea. A validation cohort of 130 pediatric sickle cell patients treated to MTD with hydroxyurea was genotyped for 13 non-synonymous variants with the strongest association with HbF response to hydroxyurea in the discovery cohort. A coding variant in Spalt-like transcription factor, or SALL2, was associated with higher final HbF in this second independent replication sample and SALL2 represents an outstanding novel candidate gene for further investigation. These findings may help focus future functional studies and provide new insights into the pharmacological HbF upregulation by hydroxyurea in patients with SCA. PMID:25360671
Evaluation of exome variants using the Ion Proton Platform to sequence error-prone regions.

PubMed

Seo, Heewon; Park, Yoomi; Min, Byung Joo; Seo, Myung Eui; Kim, Ju Han

2017-01-01

The Ion Proton sequencer from Thermo Fisher accurately determines sequence variants from target regions with a rapid turnaround time at a low cost. However, misleading variant-calling errors can occur. We performed a systematic evaluation and manual curation of read-level alignments for the 675 ultrarare variants that were reported by the Ion Proton sequencer from 27 whole-exome sequencing data sets but are not present in either the 1000 Genomes Project or the Exome Aggregation Consortium. We classified positive variant calls into 393 highly likely false positives, 126 likely false positives, and 156 likely true positives, which comprised 58.2%, 18.7%, and 23.1% of the variants, respectively. We identified four distinct error patterns of variant calling that may be bioinformatically corrected when using different strategies: simplicity region, SNV cluster, peripheral sequence read, and base inversion. Local de novo assembly successfully corrected 201 (38.7%) of the 519 highly likely or likely false positives. We also demonstrate that the two sequencing kits from Thermo Fisher (the Ion PI Sequencing 200 kit V3 and the Ion PI Hi-Q kit) exhibit different error profiles across different error types. A refined calling algorithm with better polymerase may improve the performance of the Ion Proton sequencing platform.
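The percentages reported above follow directly from the stated counts. This short check recomputes them; the variable names are illustrative only.

```python
# Manual-curation classes and counts as stated in the abstract.
counts = {
    "highly likely false positive": 393,
    "likely false positive": 126,
    "likely true positive": 156,
}
total = sum(counts.values())  # should equal the 675 ultrarare calls
shares = {label: round(100 * n / total, 1) for label, n in counts.items()}

# Share of false positives (highly likely + likely) corrected by local de novo assembly.
false_positives = counts["highly likely false positive"] + counts["likely false positive"]
corrected_share = round(100 * 201 / false_positives, 1)

for label, pct in shares.items():
    print(f"{label}: {pct}%")
print(f"corrected by local assembly: {corrected_share}% of {false_positives}")
```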
Diversity of Sarcocystis spp shed by opossums in Brazil inferred with phylogenetic analysis of DNA coding ITS1, cytochrome B, and surface antigens.

PubMed

Valadas, Samantha Y O B; da Silva, Juliana I G; Lopes, Estela Gallucci; Keid, Lara B; Zwarg, Ticiana; de Oliveira, Alice S; Sanches, Thaís C; Joppert, Adriana M; Pena, Hilda F J; Oliveira, Tricia M F S; Ferreira, Helena L; Soares, Rodrigo M

2016-05-01

Although few species of Sarcocystis are known to use marsupials of the genus Didelphis as definitive host, an extensive diversity of alleles of surface antigen genes (sag2, sag3, and sag4) has been described in samples of didelphid opossums in Brazil. In this work, we studied 25 samples of Sarcocystis derived from gastrointestinal tract of opossums of the genus Didelphis by accessing the variability of sag2, sag3, sag4, gene encoding cytochrome b (cytB) and first internal transcribed spacer (ITS1). Reference samples of Sarcocystis neurona (SN138) and Sarcocystis falcatula (SF1) maintained in cell culture were also analyzed. We found four allele variants of cytB, seven allele variants of ITS1, 10 allele variants of sag2, 13 allele variants of sag3, and 6 allele variants of sag4. None of the sporocyst-derived sequences obtained from Brazilian opossums revealed 100% identity to SN138 at cytB gene, nor to SN138 or SF1 at ITS1 locus. In addition, none of the sag alleles were found identical to either SF1 or SN138 homologous sequences, and a high number of new sag allele types were found other than those previously described in Brazil. Out of ten sag2 alleles, four are novel, while eight out of 13 sag3 alleles are novel and one out of six sag4 alleles is novel. Further studies are needed to clarify if such a vast repertoire of allele variants of Sarcocystis is the consequence of re-assortments driven by sexual exchange, in order to form individuals with highly diverse characteristics, such as pathogenicity, host spectrum, among others or if it only represents allele variants of different species with different biological traits. Copyright © 2016 Elsevier Inc. All rights reserved.
Evaluation of Presumably Disease Causing SCN1A Variants in a Cohort of Common Epilepsy Syndromes.

PubMed

Lal, Dennis; Reinthaler, Eva M; Dejanovic, Borislav; May, Patrick; Thiele, Holger; Lehesjoki, Anna-Elina; Schwarz, Günter; Riesch, Erik; Ikram, M Arfan; van Duijn, Cornelia M; Uitterlinden, Andre G; Hofman, Albert; Steinböck, Hannelore; Gruber-Sedlmayr, Ursula; Neophytou, Birgit; Zara, Federico; Hahn, Andreas; Gormley, Padhraig; Becker, Felicitas; Weber, Yvonne G; Cilio, Maria Roberta; Kunz, Wolfram S; Krause, Roland; Zimprich, Fritz; Lemke, Johannes R; Nürnberg, Peter; Sander, Thomas; Lerche, Holger; Neubauer, Bernd A

2016-01-01

The SCN1A gene, coding for the voltage-gated Na+ channel alpha subunit NaV1.1, is the clinically most relevant epilepsy gene. With the advent of high-throughput next-generation sequencing, clinical laboratories are generating an ever-increasing catalogue of SCN1A variants. Variants are more likely to be classified as pathogenic if they have already been identified previously in a patient with epilepsy. Here, we critically re-evaluate the pathogenicity of this class of variants in a cohort of patients with common epilepsy syndromes and subsequently ask whether a significant fraction of benign variants have been misclassified as pathogenic. We screened a discovery cohort of 448 patients with a broad range of common genetic epilepsies and 734 controls for previously reported SCN1A mutations that were assumed to be disease causing. We re-evaluated the evidence for pathogenicity of the identified variants using in silico predictions, segregation, original reports, available functional data and assessment of allele frequencies in healthy individuals as well as in a follow up cohort of 777 patients. We identified 8 known missense mutations, previously reported as pathogenic, in a total of 17 unrelated epilepsy patients (17/448; 3.80%). Our re-evaluation indicates that 7 out of these 8 variants (p.R27T; p.R28C; p.R542Q; p.R604H; p.T1250M; p.E1308D; p.R1928G; NP_001159435.1) are not pathogenic. Only the p.T1174S mutation may be considered as a genetic risk factor for epilepsy of small effect size based on the enrichment in patients (P = 6.60 × 10⁻⁴; OR = 0.32, Fisher's exact test), previous functional studies but incomplete penetrance. Thus, incorporation of previous studies in genetic counseling of SCN1A sequencing results is challenging and may produce incorrect conclusions.

Evaluation of Presumably Disease Causing SCN1A Variants in a Cohort of Common Epilepsy Syndromes

PubMed Central

May, Patrick; Thiele, Holger; Lehesjoki, Anna-Elina; Schwarz, Günter; Riesch, Erik; Ikram, M. Arfan; van Duijn, Cornelia M.; Uitterlinden, Andre G.; Hofman, Albert; Steinböck, Hannelore; Gruber-Sedlmayr, Ursula; Neophytou, Birgit; Zara, Federico; Hahn, Andreas; Gormley, Padhraig; Becker, Felicitas; Weber, Yvonne G.; Cilio, Maria Roberta; Kunz, Wolfram S.; Krause, Roland; Zimprich, Fritz; Lemke, Johannes R.; Nürnberg, Peter; Sander, Thomas; Lerche, Holger; Neubauer, Bernd A.

2016-01-01

Objective The SCN1A gene, coding for the voltage-gated Na+ channel alpha subunit NaV1.1, is the clinically most relevant epilepsy gene. With the advent of high-throughput next-generation sequencing, clinical laboratories are generating an ever-increasing catalogue of SCN1A variants. Variants are more likely to be classified as pathogenic if they have already been identified previously in a patient with epilepsy. Here, we critically re-evaluate the pathogenicity of this class of variants in a cohort of patients with common epilepsy syndromes and subsequently ask whether a significant fraction of benign variants have been misclassified as pathogenic. Methods We screened a discovery cohort of 448 patients with a broad range of common genetic epilepsies and 734 controls for previously reported SCN1A mutations that were assumed to be disease causing. We re-evaluated the evidence for pathogenicity of the identified variants using in silico predictions, segregation, original reports, available functional data and assessment of allele frequencies in healthy individuals as well as in a follow up cohort of 777 patients. Results and Interpretation We identified 8 known missense mutations, previously reported as pathogenic, in a total of 17 unrelated epilepsy patients (17/448; 3.80%). Our re-evaluation indicates that 7 out of these 8 variants (p.R27T; p.R28C; p.R542Q; p.R604H; p.T1250M; p.E1308D; p.R1928G; NP_001159435.1) are not pathogenic. Only the p.T1174S mutation may be considered as a genetic risk factor for epilepsy of small effect size based on the enrichment in patients (P = 6.60 × 10⁻⁴; OR = 0.32, Fisher's exact test), previous functional studies but incomplete penetrance. Thus, incorporation of previous studies in genetic counseling of SCN1A sequencing results is challenging and may produce incorrect conclusions. PMID:26990884
Cloning and characterization of two novel DNases from Streptococcus pyogenes.

PubMed

Hasegawa, Tadao; Torii, Keizo; Hashikawa, Shinnosuke; Iinuma, Yoshitsugu; Ohta, Michio

2002-06-01

The proteins in the culture supernatant (exoproteins) from Streptococcus pyogenes serotype M1 were separated by two-dimensional gel electrophoresis, and their N-terminal amino acid sequences were determined. The amino acid sequences were compared to sequences in the S. pyogenes genome database. The coding sequence showed similarity to sequences of two genes, mf2-v (mf2 variant) and mf3, which had sequence similarity to genes encoding mitogenic factor (MF); MF has DNase activity. The recombinant genes were expressed in Escherichia coli and the proteins were synthesized. Mf2-v and Mf3 had DNase activity. The activity of Mf2-v was localized to the C-terminal half of the protein. The mf3 gene was shown to be present in most clinically isolated strains of S. pyogenes tested, and the mf2 gene was detected in 20% of the isolates. The products of the mf2 and mf3 genes in clinically isolated S. pyogenes strains were thus shown to be DNases.
Neofunctionalization of Duplicated P450 Genes Drives the Evolution of Insecticide Resistance in the Brown Planthopper.

PubMed

Zimmer, Christoph T; Garrood, William T; Singh, Kumar Saurabh; Randall, Emma; Lueke, Bettina; Gutbrod, Oliver; Matthiesen, Svend; Kohler, Maxie; Nauen, Ralf; Davies, T G Emyr; Bass, Chris

2018-01-22

Gene duplication is a major source of genetic variation that has been shown to underpin the evolution of a wide range of adaptive traits [1, 2]. For example, duplication or amplification of genes encoding detoxification enzymes has been shown to play an important role in the evolution of insecticide resistance [3-5]. In this context, gene duplication performs an adaptive function as a result of its effects on gene dosage and not as a source of functional novelty [3, 6-8]. Here, we show that duplication and neofunctionalization of a cytochrome P450, CYP6ER1, led to the evolution of insecticide resistance in the brown planthopper. Considerable genetic variation was observed in the coding sequence of CYP6ER1 in populations of brown planthopper collected from across Asia, but just two sequence variants are highly overexpressed in resistant strains and metabolize imidacloprid. Both variants are characterized by profound amino-acid alterations in substrate recognition sites, and the introduction of these mutations into a susceptible P450 sequence is sufficient to confer resistance. CYP6ER1 is duplicated in resistant strains with individuals carrying paralogs with and without the gain-of-function mutations. Despite numerical parity in the genome, the susceptible and mutant copies exhibit marked asymmetry in their expression with the resistant paralogs overexpressed. In the primary resistance-conferring CYP6ER1 variant, this results from an extended region of novel sequence upstream of the gene that provides enhanced expression. Our findings illustrate the versatility of gene duplication in providing opportunities for functional and regulatory innovation during the evolution of an adaptive trait. Copyright © 2017 The Authors. Published by Elsevier Ltd. All rights reserved.
Association of Arrhythmia-Related Genetic Variants With Phenotypes Documented in Electronic Medical Records.

PubMed

Van Driest, Sara L; Wells, Quinn S; Stallings, Sarah; Bush, William S; Gordon, Adam; Nickerson, Deborah A; Kim, Jerry H; Crosslin, David R; Jarvik, Gail P; Carrell, David S; Ralston, James D; Larson, Eric B; Bielinski, Suzette J; Olson, Janet E; Ye, Zi; Kullo, Iftikhar J; Abul-Husn, Noura S; Scott, Stuart A; Bottinger, Erwin; Almoguera, Berta; Connolly, John; Chiavacci, Rosetta; Hakonarson, Hakon; Rasmussen-Torvik, Laura J; Pan, Vivian; Persell, Stephen D; Smith, Maureen; Chisholm, Rex L; Kitchner, Terrie E; He, Max M; Brilliant, Murray H; Wallace, John R; Doheny, Kimberly F; Shoemaker, M Benjamin; Li, Rongling; Manolio, Teri A; Callis, Thomas E; Macaya, Daniela; Williams, Marc S; Carey, David; Kapplinger, Jamie D; Ackerman, Michael J; Ritchie, Marylyn D; Denny, Joshua C; Roden, Dan M

2016-01-05

Large-scale DNA sequencing identifies incidental rare variants in established Mendelian disease genes, but the frequency of related clinical phenotypes in unselected patient populations is not well established. Phenotype data from electronic medical records (EMRs) may provide a resource to assess the clinical relevance of rare variants. To determine the clinical phenotypes from EMRs for individuals with variants designated as pathogenic by expert review in arrhythmia susceptibility genes. This prospective cohort study included 2022 individuals recruited for nonantiarrhythmic drug exposure phenotypes from October 5, 2012, to September 30, 2013, for the Electronic Medical Records and Genomics Network Pharmacogenomics project from 7 US academic medical centers. Variants in SCN5A and KCNH2, disease genes for long QT and Brugada syndromes, were assessed for potential pathogenicity by 3 laboratories with ion channel expertise and by comparison with the ClinVar database. Relevant phenotypes were determined from EMRs, with data available from 2002 (or earlier for some sites) through September 10, 2014. One or more variants designated as pathogenic in SCN5A or KCNH2. Arrhythmia or electrocardiographic (ECG) phenotypes defined by International Classification of Diseases, Ninth Revision (ICD-9) codes, ECG data, and manual EMR review. Among 2022 study participants (median age, 61 years [interquartile range, 56-65 years]; 1118 [55%] female; 1491 [74%] white), a total of 122 rare (minor allele frequency <0.5%) nonsynonymous and splice-site variants in 2 arrhythmia susceptibility genes were identified in 223 individuals (11% of the study cohort). Forty-two variants in 63 participants were designated potentially pathogenic by at least 1 laboratory or ClinVar, with low concordance across laboratories (Cohen κ = 0.26). 
An ICD-9 code for arrhythmia was found in 11 of 63 (17%) variant carriers vs 264 of 1959 (13%) of those without variants (difference, +4%; 95% CI, -5% to +13%; P = .35). In the 1270 (63%) with ECGs, corrected QT intervals were not different in variant carriers vs those without (median, 429 vs 439 milliseconds; difference, -10 milliseconds; 95% CI, -16 to +3 milliseconds; P = .17). After manual review, 22 of 63 participants (35%) with designated variants had any ECG or arrhythmia phenotype, and only 2 had corrected QT interval longer than 500 milliseconds. Among laboratories experienced in genetic testing for cardiac arrhythmia disorders, there was low concordance in designating SCN5A and KCNH2 variants as pathogenic. In an unselected population, the putatively pathogenic genetic variants were not associated with an abnormal phenotype. These findings raise questions about the implications of notifying patients of incidental genetic findings.
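The reported difference in arrhythmia codes (11 of 63 carriers vs 264 of 1959 non-carriers) can be approximated with a simple interval for a difference of two proportions. The abstract does not name the interval method, so the Wald (normal-approximation) interval below is an assumption, and the function name is hypothetical.

```python
from math import sqrt

def diff_with_wald_ci(x1, n1, x2, n2, z=1.96):
    """Difference of two proportions with a Wald 95% confidence interval."""
    p1, p2 = x1 / n1, x2 / n2
    # Standard error of the difference, treating the two groups as independent.
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff, diff - z * se, diff + z * se

# Variant carriers with an arrhythmia code vs participants without designated variants.
diff, lower, upper = diff_with_wald_ci(11, 63, 264, 1959)
print(f"difference = {diff:+.1%}, 95% CI {lower:+.1%} to {upper:+.1%}")
```

The interval from this sketch spans zero and lands close to the published range. That matches the abstract's finding that carriers were not detectably more likely to carry an arrhythmia code.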
An efficient and scalable analysis framework for variant extraction and refinement from population-scale DNA sequence data.

PubMed

Jun, Goo; Wing, Mary Kate; Abecasis, Gonçalo R; Kang, Hyun Min

2015-06-01

The analysis of next-generation sequencing data is computationally and statistically challenging because of the massive volume of data and imperfect data quality. We present GotCloud, a pipeline for efficiently detecting and genotyping high-quality variants from large-scale sequencing data. GotCloud automates sequence alignment, sample-level quality control, variant calling, filtering of likely artifacts using machine-learning techniques, and genotype refinement using haplotype information. The pipeline can process thousands of samples in parallel and requires fewer computational resources than current alternatives. Experiments with whole-genome and exome-targeted sequence data generated by the 1000 Genomes Project show that the pipeline provides effective filtering against false positive variants and high power to detect true variants. Our pipeline has already contributed to variant detection and genotyping in several large-scale sequencing projects, including the 1000 Genomes Project and the NHLBI Exome Sequencing Project. We hope it will now prove useful to many medical sequencing studies. © 2015 Jun et al.; Published by Cold Spring Harbor Laboratory Press.
Principles and Recommendations for Standardizing the Use of the Next-Generation Sequencing Variant File in Clinical Settings.

PubMed

Lubin, Ira M; Aziz, Nazneen; Babb, Lawrence J; Ballinger, Dennis; Bisht, Himani; Church, Deanna M; Cordes, Shaun; Eilbeck, Karen; Hyland, Fiona; Kalman, Lisa; Landrum, Melissa; Lockhart, Edward R; Maglott, Donna; Marth, Gabor; Pfeifer, John D; Rehm, Heidi L; Roy, Somak; Tezak, Zivana; Truty, Rebecca; Ullman-Cullere, Mollie; Voelkerding, Karl V; Worthey, Elizabeth A; Zaranek, Alexander W; Zook, Justin M

2017-05-01

A national workgroup convened by the Centers for Disease Control and Prevention identified principles and made recommendations for standardizing the description of sequence data contained within the variant file generated during the course of clinical next-generation sequence analysis for diagnosing human heritable conditions. The specifications for variant files were initially developed to be flexible with regard to content representation to support a variety of research applications. This flexibility permits variation with regard to how sequence findings are described and this depends, in part, on the conventions used. For clinical laboratory testing, this poses a problem because these differences can compromise the capability to compare sequence findings among laboratories to confirm results and to query databases to identify clinically relevant variants. To provide for a more consistent representation of sequence findings described within variant files, the workgroup made several recommendations that considered alignment to a common reference sequence, variant caller settings, use of genomic coordinates, and gene and variant naming conventions. These recommendations were considered with regard to the existing variant file specifications presently used in the clinical setting. Adoption of these recommendations is anticipated to reduce the potential for ambiguity in describing sequence findings and facilitate the sharing of genomic data among clinical laboratories and other entities. Copyright © 2017 American Society for Investigative Pathology and the Association for Molecular Pathology. Published by Elsevier Inc. All rights reserved.
A statistical method for the detection of variants from next-generation resequencing of DNA pools.

PubMed

Bansal, Vikas

2010-06-15

Next-generation sequencing technologies have enabled the sequencing of several human genomes in their entirety. However, the routine resequencing of complete genomes remains infeasible. The massive capacity of next-generation sequencers can be harnessed for sequencing specific genomic regions in hundreds to thousands of individuals. Sequencing-based association studies are currently limited by the low level of multiplexing offered by sequencing platforms. Pooled sequencing represents a cost-effective approach for studying rare variants in large populations. To utilize the power of DNA pooling, it is important to accurately identify sequence variants from pooled sequencing data. Detection of rare variants from pooled sequencing represents a different challenge than detection of variants from individual sequencing. We describe a novel statistical approach, CRISP [Comprehensive Read analysis for Identification of Single Nucleotide Polymorphisms (SNPs) from Pooled sequencing] that is able to identify both rare and common variants by using two approaches: (i) comparing the distribution of allele counts across multiple pools using contingency tables and (ii) evaluating the probability of observing multiple non-reference base calls due to sequencing errors alone. Information about the distribution of reads between the forward and reverse strands and the size of the pools is also incorporated within this framework to filter out false variants. Validation of CRISP on two separate pooled sequencing datasets generated using the Illumina Genome Analyzer demonstrates that it can detect 80-85% of SNPs identified using individual sequencing while achieving a low false discovery rate (3-5%). Comparison with previous methods for pooled SNP detection demonstrates the significantly lower false positive and false negative rates for CRISP. Implementation of this method is available at http://polymorphism.scripps.edu/~vbansal/software/CRISP/.
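The second idea in the abstract, evaluating how likely several non-reference base calls are under sequencing error alone, can be illustrated with a binomial tail probability. This is a simplified sketch and not the CRISP implementation. It assumes one uniform per-base error rate, and the read counts and error rate are invented for illustration.

```python
from math import comb

def error_only_tail(n_reads, n_alt, error_rate):
    """P(X >= n_alt) for X ~ Binomial(n_reads, error_rate).

    A small value means n_alt non-reference calls are unlikely to be
    explained by sequencing errors alone.
    """
    return sum(
        comb(n_reads, k) * error_rate**k * (1 - error_rate) ** (n_reads - k)
        for k in range(n_alt, n_reads + 1)
    )

# Hypothetical pool: 1000 reads at one site, assumed error rate of 0.5% per base.
expected_noise = error_only_tail(1000, 5, 0.005)   # about the mean error count
candidate = error_only_tail(1000, 15, 0.005)       # three times the mean
print(f"P(>=5 alt reads | errors only)  = {expected_noise:.3f}")
print(f"P(>=15 alt reads | errors only) = {candidate:.2e}")
```

In a pooled design a rare allele may contribute only a few reads, so a tail probability like this is one way to separate candidate variants from background noise. The abstract notes that CRISP also uses contingency tables across pools, strand distribution and pool size.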
Genome-wide significant association between a sequence variant at 15q15.2 and lung cancer risk

PubMed Central

Rafnar, Thorunn; Sulem, Patrick; Besenbacher, Soren; Gudbjartsson, Daniel F.; Zanon, Carlo; Gudmundsson, Julius; Stacey, Simon N.; Kostic, Jelena P.; Thorgeirsson, Thorgeir E.; Thorleifsson, Gudmar; Bjarnason, Hjordis; Skuladottir, Halla; Gudbjartsson, Tomas; Isaksson, Helgi J.; Isla, Dolores; Murillo, Laura; García-Prats, Maria D.; Panadero, Angeles; Aben, Katja K.H.; Vermeulen, Sita H.; van der Heijden, Henricus F.M.; Feser, William; Miller, York E.; Bunn, Paul A.; Kong, Augustine; Wolf, Holly J.; Franklin, Wilbur A.; Mayordomo, Jose I; Kiemeney, Lambertus A.; Jonsson, Steinn; Thorsteinsdottir, Unnur; Stefansson, Kari

2010-01-01

Genome-wide association studies (GWAS) have identified three genomic regions, at 15q24-25.1, 5p15.33 and 6p21.33, which associate with risk of lung cancer. Large meta-analyses of GWA data have failed to find additional associations of genome-wide significance. In this study, we sought to confirm 7 variants with suggestive association to lung cancer (P<10⁻⁵) in a recently published meta-analysis. In a GWA dataset of 1,447 lung cancer cases and 36,256 controls in Iceland, three correlated variants on 15q15.2 (rs504417, rs11853991 and rs748404) showed a significant association with lung cancer whereas rs4254535 on 2p14, rs1530057 on 3p24.1, rs6438347 on 3q13.31 and rs1926203 on 10q23.31 did not. The most significant variant, rs748404, was genotyped in an additional 1,299 lung cancer cases and 4,102 controls from the Netherlands, Spain and the USA and the results combined with published GWAS data. In this analysis, the T allele of rs748404 reached genome-wide significance (OR=1.15, P=1.1×10⁻⁹). Another variant at the same locus, rs12050604, showed association with lung cancer (OR=1.09, P=3.6×10⁻⁶) and remained significant after adjustment for rs748404 and vice versa. rs748404 is located 140 kb centromeric of the TP53BP1 gene that has been implicated in lung cancer risk. Two fully correlated, non-synonymous coding variants in TP53BP1, rs2602141 (Q1136K) and rs560191 (E353D), showed association with lung cancer in our sample set; however, this association did not remain significant after adjustment for rs748404. Our data show that one or more lung cancer risk variants of genome-wide significance and distinct from the coding variants in TP53BP1 are located at 15q15.2. PMID:21303977
Identification of a Novel Transcript and Regulatory Mechanism for Microsomal Triglyceride Transfer Protein

PubMed Central

Suzuki, Takashi; Brown, Judy J.; Swift, Larry L.

2016-01-01

Microsomal triglyceride transfer protein (MTP) is essential for the assembly of triglyceride-rich apolipoprotein B-containing lipoproteins. Previous studies in our laboratory identified a novel splice variant of MTP in mice that we named MTP-B. MTP-B has a unique first exon (1B) located 2.7 kb upstream of the first exon (1A) for canonical MTP (MTP-A). The two mature isoforms, though nearly identical in sequence and function, have different tissue expression patterns. In this study we report the identification of a second MTP splice variant (MTP-C), which contains both exons 1B and 1A. MTP-C is expressed in all the tissues we tested. In cells transfected with MTP-C, protein expression was less than 15% of that found when the cells were transfected with MTP-A or MTP-B. In silico analysis of the 5’-UTR of MTP-C revealed seven ATGs upstream of the start site for MTP-A, which is the only viable start site in frame with the main coding sequence. One of those ATGs was located in the 5’-UTR for MTP-A. We generated reporter constructs in which the 5’-UTRs of MTP-A or MTP-C were inserted between an SV40 promoter and the coding sequence of the luciferase gene and transfected these constructs into HEK 293 cells. Luciferase activity was significantly reduced by the MTP-C 5’-UTR, but not by the MTP-A 5’-UTR. We conclude that alternative splicing plays a key role in regulating MTP expression by introducing unique 5’-UTRs, which contain elements that alter translation efficiency, enabling the cell to optimize MTP levels and activity. PMID:26771188
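The in silico scan for upstream ATGs mentioned in the abstract above can be illustrated with a minimal sketch. The function name `upstream_atgs` and the toy sequence are hypothetical and are not taken from the paper; the sketch assumes only that the main start codon begins immediately after the supplied 5'-UTR.

```python
def upstream_atgs(utr: str) -> list[tuple[int, bool]]:
    """Return (offset, in_frame) for every ATG found in a 5'-UTR.

    offset is the 0-based position of the A in the UTR. An upstream ATG is
    in frame with the main start codon (assumed to begin right after the
    UTR) when its distance to the end of the UTR is a multiple of three.
    """
    utr = utr.upper().replace("U", "T")  # accept RNA or DNA spelling
    hits = []
    start = utr.find("ATG")
    while start != -1:
        in_frame = (len(utr) - start) % 3 == 0
        hits.append((start, in_frame))
        start = utr.find("ATG", start + 1)
    return hits


# Toy 9-nt UTR (hypothetical): one in-frame and one out-of-frame upstream ATG.
toy_hits = upstream_atgs("ATGCCATGA")
```

Whether an upstream ATG actually reduces translation also depends on its sequence context and on downstream stop codons, which this sketch does not model.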
Pathogenic variants in TUBB4A are not found in primary dystonia

PubMed Central

Vemula, Satya R.; Xiao, Jianfeng; Bastian, Robert W.; Momčilović, Dragana; Blitzer, Andrew

2014-01-01

Objective: To determine the contribution of TUBB4A, recently associated with DYT4 dystonia in a pedigree with “whispering dysphonia” from Norfolk, United Kingdom, to the etiopathogenesis of primary dystonia. Methods: High-resolution melting and Sanger sequencing were used to inspect the entire coding region of TUBB4A in 575 subjects with primary laryngeal, segmental, or generalized dystonia. Results: No pathogenic variants, including the exon 1 variant (c.4C>G) identified in the DYT4 whispering dysphonia kindred, were found in this study. Conclusion: The c.4C>G DYT4 mutation appears to be private, and clinical testing for TUBB4A mutations is not justified in spasmodic dysphonia or other forms of primary dystonia. Moreover, given its allelic association with leukoencephalopathy hypomyelination with atrophy of basal ganglia and cerebellum and protean clinical manifestations (chorea, ataxia, dysarthria, intellectual disability, dysmorphic facial features, and psychiatric disorders), DYT4 should not be categorized as a primary dystonia. PMID:24598712
Guidelines for investigating causality of sequence variants in human disease

PubMed Central

MacArthur, D. G.; Manolio, T. A.; Dimmock, D. P.; Rehm, H. L.; Shendure, J.; Abecasis, G. R.; Adams, D. R.; Altman, R. B.; Antonarakis, S. E.; Ashley, E. A.; Barrett, J. C.; Biesecker, L. G.; Conrad, D. F.; Cooper, G. M.; Cox, N. J.; Daly, M. J.; Gerstein, M. B.; Goldstein, D. B.; Hirschhorn, J. N.; Leal, S. M.; Pennacchio, L. A.; Stamatoyannopoulos, J. A.; Sunyaev, S. R.; Valle, D.; Voight, B. F.; Winckler, W.; Gunter, C.

2014-01-01

The discovery of rare genetic variants is accelerating, and clear guidelines for distinguishing disease-causing sequence variants from the many potentially functional variants present in any human genome are urgently needed. Without rigorous standards we risk an acceleration of false-positive reports of causality, which would impede the translation of genomic research findings into the clinical diagnostic setting and hinder biological understanding of disease. Here we discuss the key challenges of assessing sequence variants in human disease, integrating both gene-level and variant-level support for causality. We propose guidelines for summarizing confidence in variant pathogenicity and highlight several areas that require further resource development. PMID:24759409
Guidelines for investigating causality of sequence variants in human disease.

PubMed

MacArthur, D G; Manolio, T A; Dimmock, D P; Rehm, H L; Shendure, J; Abecasis, G R; Adams, D R; Altman, R B; Antonarakis, S E; Ashley, E A; Barrett, J C; Biesecker, L G; Conrad, D F; Cooper, G M; Cox, N J; Daly, M J; Gerstein, M B; Goldstein, D B; Hirschhorn, J N; Leal, S M; Pennacchio, L A; Stamatoyannopoulos, J A; Sunyaev, S R; Valle, D; Voight, B F; Winckler, W; Gunter, C

2014-04-24

The discovery of rare genetic variants is accelerating, and clear guidelines for distinguishing disease-causing sequence variants from the many potentially functional variants present in any human genome are urgently needed. Without rigorous standards we risk an acceleration of false-positive reports of causality, which would impede the translation of genomic research findings into the clinical diagnostic setting and hinder biological understanding of disease. Here we discuss the key challenges of assessing sequence variants in human disease, integrating both gene-level and variant-level support for causality. We propose guidelines for summarizing confidence in variant pathogenicity and highlight several areas that require further resource development.
Two MC1R loss-of-function alleles in cream-coloured Australian Cattle Dogs and white Huskies.

PubMed

Dürig, N; Letko, A; Lepori, V; Hadji Rasouliha, S; Loechel, R; Kehl, A; Hytönen, M K; Lohi, H; Mauri, N; Dietrich, J; Wiedmer, M; Drögemüller, M; Jagannathan, V; Schmutz, S M; Leeb, T

2018-06-22

Loss-of-function variants in the MC1R gene cause recessive red or yellow coat-colour phenotypes in many species. The canine MC1R:c.916C>T (p.Arg306Ter) variant is widespread and found in a homozygous state in many uniformly yellow- or red-coloured dogs. We investigated cream-coloured Australian Cattle Dogs whose coat colour could not be explained by this variant. A genome-wide association study with 10 cream and 123 red Australian Cattle Dogs confirmed that the cream locus indeed maps to MC1R. Whole-genome sequencing of cream dogs revealed a single nucleotide variant within the MITF binding site of the canine MC1R promoter. We propose to designate the mutant alleles at MC1R:c.916C>T as e1 and at the new promoter variant as e2. Both alleles segregate in the Australian Cattle Dog breed. When we considered both alleles in combination, we observed perfect association between the MC1R genotypes and the cream coat colour phenotype in a cohort of 10 cases and 324 control dogs. Analysis of the MC1R transcript levels in an e1/e2 compound heterozygous dog confirmed that the transcript levels of the e2 allele were markedly reduced with respect to the e1 allele. We further report another MC1R loss-of-function allele in Alaskan and Siberian Huskies caused by a 2-bp deletion in the coding sequence, MC1R:c.816_817delCT. We propose to term this allele e3. Huskies that carry two copies of MC1R loss-of-function alleles have a white coat colour. © 2018 Stichting International Foundation for Animal Genetics.
A map of human microRNA variation uncovers unexpectedly high levels of variability

PubMed Central

2012-01-01

Background: MicroRNAs (miRNAs) are key components of the gene regulatory network in many species. During the past few years, these regulatory elements have been shown to be involved in an increasing number and range of diseases. Consequently, the compilation of a comprehensive map of natural variability in a healthy population seems an obvious requirement for future research on miRNA-related pathologies. Methods: Data on 14 populations from the 1000 Genomes Project were analyzed, along with new data extracted from 60 exomes of healthy individuals from a population from southern Spain, sequenced in the context of the Medical Genome Project, to derive an accurate map of miRNA variability. Results: Despite the common belief that miRNAs are highly conserved elements, analysis of the sequences of the 1,152 individuals indicated that the observed level of variability is double what was expected. A total of 527 variants were found. Among these, 45 variants affected the recognition region of the corresponding miRNA and were found in 43 different miRNAs, 26 of which are known to be involved in 57 diseases. Different parts of the mature structure of the miRNA were affected to different degrees by variants, which suggests the existence of a selective pressure related to the relative functional impact of the change. Moreover, 41 variants showed a significant deviation from the Hardy-Weinberg equilibrium, which supports the existence of a selective process against some alleles. The average number of variants per individual in miRNAs was 28. Conclusions: Despite an expectation that miRNAs would be highly conserved genomic elements, our study reports a level of variability comparable to that observed for coding genes. PMID:22906193
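The deviation from Hardy-Weinberg equilibrium reported above can be illustrated with a small worked example. The abstract does not say which test the authors used; the sketch below assumes a standard one-degree-of-freedom chi-square goodness-of-fit test for a bi-allelic variant, and the genotype counts are invented for illustration.

```python
import math


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> tuple[float, float]:
    """Chi-square test of Hardy-Weinberg proportions for one bi-allelic site.

    Returns (chi2, p_value). The p-value uses the identity that the survival
    function of a chi-square variable with 1 degree of freedom equals
    erfc(sqrt(chi2 / 2)).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Invented counts: the first set matches HWE exactly, the second lacks heterozygotes.
balanced = hwe_chi_square(25, 50, 25)
no_hets = hwe_chi_square(50, 0, 50)
```

For rare variants an exact test is usually preferred over the chi-square approximation, because expected counts become too small for the approximation to hold.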
High-resolution analysis of selection sweeps identified between fine-wool Merino and coarse-wool Churra sheep breeds.

PubMed

Gutiérrez-Gil, Beatriz; Esteban-Blanco, Cristina; Wiener, Pamela; Chitneedi, Praveen Krishna; Suarez-Vega, Aroa; Arranz, Juan-Jose

2017-11-07

With the aim of identifying selection signals in three Merino sheep lines that are highly specialized for fine wool production (Australian Industry Merino, Australian Merino and Australian Poll Merino) and considering that these lines have been subjected to selection not only for wool traits but also for growth and carcass traits and parasite resistance, we contrasted the OvineSNP50 BeadChip (50 K-chip) pooled genotypes of these Merino lines with the genotypes of a phylogenetically related coarse-wool breed, the Spanish Churra dairy sheep. Genome re-sequencing datasets of the two breeds were analyzed to further explore the genetic variation of the regions initially identified as putative selection signals. Based on the 50 K-chip genotypes, we used the overlapping selection signals (SS) identified by four selection sweep mapping analyses (that detect genetic differentiation, reduced heterozygosity and patterns of haplotype diversity) to define 18 convergence candidate regions (CCR), five associated with positive selection in Australian Merino and the remainder indicating positive selection in Churra. Subsequent analysis of whole-genome sequences from 15 Churra and 13 Merino samples identified 142,400 genetic variants (139,745 bi-allelic SNPs and 2655 indels) within the 18 defined CCR. Annotation of 1291 variants that were significantly associated with breed identity between Churra and Merino samples identified 257 intragenic variants that caused 296 functional annotation variants, 275 of which were located across 31 coding genes. Among these, four synonymous and four missense variants (NPR2_His847Arg, NCAPG_Ser585Phe, LCORL_Asp1214Glu and LCORL_Ile1441Leu) were included. Here, we report the mapping and genetic variation of 18 selection signatures that were identified between Australian Merino and Spanish Churra sheep breeds, which were validated by an additional contrast between Spanish Merino and Churra genotypes.
Analysis of whole-genome sequencing datasets allowed us to identify divergent variants that may be viewed as candidates involved in the phenotypic differences for wool, growth and meat production/quality traits between the breeds analyzed. The four missense variants located in the NPR2, NCAPG and LCORL genes may be related to selection sweep regions previously identified and various QTL reported in sheep in relation to growth traits and carcass composition.
Reliable Detection of Herpes Simplex Virus Sequence Variation by High-Throughput Resequencing.

PubMed

Morse, Alison M; Calabro, Kaitlyn R; Fear, Justin M; Bloom, David C; McIntyre, Lauren M

2017-08-16

High-throughput sequencing (HTS) has resulted in data for a number of herpes simplex virus (HSV) laboratory strains and clinical isolates. The knowledge of these sequences has been critical for investigating viral pathogenicity. However, the assembly of complete herpesviral genomes, including HSV, is complicated due to the existence of large repeat regions and arrays of smaller reiterated sequences that are commonly found in these genomes. In addition, the inherent genetic variation in populations of isolates for viruses and other microorganisms presents an additional challenge to many existing HTS sequence assembly pipelines. Here, we evaluate two approaches for the identification of genetic variants in HSV1 strains using Illumina short read sequencing data. The first, a reference-based approach, identifies variants from reads aligned to a reference sequence and the second, a de novo assembly approach, identifies variants from reads aligned to de novo assembled consensus sequences. Of critical importance for both approaches is the reduction in the number of low complexity regions through the construction of a non-redundant reference genome. We compared variants identified in the two methods. Our results indicate that approximately 85% of variants are identified regardless of the approach. The reference-based approach to variant discovery captures an additional 15% representing variants divergent from the HSV1 reference possibly due to viral passage. Reference-based approaches are significantly less labor-intensive and identify variants across the genome where de novo assembly-based approaches are limited to regions where contigs have been successfully assembled. In addition, regions of poor quality assembly can lead to false variant identification in de novo consensus sequences. For viruses with a well-assembled reference genome, a reference-based approach is recommended.
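The comparison of variants from the two approaches described above amounts to set arithmetic over variant calls. The following is a minimal sketch under that assumption; the `Variant` tuple, the function name `compare_call_sets` and the example calls are hypothetical and do not reproduce the authors' pipeline.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    position: int  # 1-based coordinate on the reference genome
    ref: str
    alt: str


def compare_call_sets(reference_based: set[Variant],
                      de_novo: set[Variant]) -> dict[str, float]:
    """Fractions of the combined call set that are shared or approach-specific."""
    union = reference_based | de_novo
    if not union:
        raise ValueError("both call sets are empty")
    total = len(union)
    return {
        "shared": len(reference_based & de_novo) / total,
        "reference_only": len(reference_based - de_novo) / total,
        "de_novo_only": len(de_novo - reference_based) / total,
    }


# Invented example: four reference-based calls, three of them also found de novo.
ref_calls = {Variant(10, "A", "G"), Variant(20, "C", "T"),
             Variant(30, "G", "A"), Variant(40, "T", "C")}
denovo_calls = {Variant(10, "A", "G"), Variant(20, "C", "T"),
                Variant(30, "G", "A")}
summary = compare_call_sets(ref_calls, denovo_calls)
```

Such a comparison is only meaningful when both call sets use the same coordinate system, so calls made against de novo contigs would first need to be mapped back to reference coordinates.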
TYK2 Protein-Coding Variants Protect against Rheumatoid Arthritis and Autoimmunity, with No Evidence of Major Pleiotropic Effects on Non-Autoimmune Complex Traits

PubMed Central

Diogo, Dorothée; Bastarache, Lisa; Liao, Katherine P.; Graham, Robert R.; Fulton, Robert S.; Greenberg, Jeffrey D.; Eyre, Steve; Bowes, John; Cui, Jing; Lee, Annette; Pappas, Dimitrios A.; Kremer, Joel M.; Barton, Anne; Coenen, Marieke J. H.; Franke, Barbara; Kiemeney, Lambertus A.; Mariette, Xavier; Richard-Miceli, Corrine; Canhão, Helena; Fonseca, João E.; de Vries, Niek; Tak, Paul P.; Crusius, J. Bart A.; Nurmohamed, Michael T.; Kurreeman, Fina; Mikuls, Ted R.; Okada, Yukinori; Stahl, Eli A.; Larson, David E.; Deluca, Tracie L.; O'Laughlin, Michelle; Fronick, Catrina C.; Fulton, Lucinda L.; Kosoy, Roman; Ransom, Michael; Bhangale, Tushar R.; Ortmann, Ward; Cagan, Andrew; Gainer, Vivian; Karlson, Elizabeth W.; Kohane, Isaac; Murphy, Shawn N.; Martin, Javier; Zhernakova, Alexandra; Klareskog, Lars; Padyukov, Leonid; Worthington, Jane; Mardis, Elaine R.; Seldin, Michael F.; Gregersen, Peter K.; Behrens, Timothy; Raychaudhuri, Soumya; Denny, Joshua C.; Plenge, Robert M.

2015-01-01

Despite the success of genome-wide association studies (GWAS) in detecting a large number of loci for complex phenotypes such as rheumatoid arthritis (RA) susceptibility, the lack of information on the causal genes leaves important challenges to interpret GWAS results in the context of the disease biology. Here, we genetically fine-map the RA risk locus at 19p13 to define causal variants, and explore the pleiotropic effects of these same variants in other complex traits. First, we combined Immunochip dense genotyping (n = 23,092 case/control samples), Exomechip genotyping (n = 18,409 case/control samples) and targeted exon-sequencing (n = 2,236 case/control samples) to demonstrate that three protein-coding variants in TYK2 (tyrosine kinase 2) independently protect against RA: P1104A (rs34536443, OR = 0.66, P = 2.3×10^-21), A928V (rs35018800, OR = 0.53, P = 1.2×10^-9), and I684S (rs12720356, OR = 0.86, P = 4.6×10^-7). Second, we show that the same three TYK2 variants protect against systemic lupus erythematosus (SLE, P(omnibus) = 6×10^-18), and provide suggestive evidence that two of the TYK2 variants (P1104A and A928V) may also protect against inflammatory bowel disease (IBD; P(omnibus) = 0.005). Finally, in a phenome-wide association study (PheWAS) assessing >500 phenotypes using electronic medical records (EMR) in >29,000 subjects, we found no convincing evidence for association of P1104A and A928V with complex phenotypes other than autoimmune diseases such as RA, SLE and IBD. Together, our results demonstrate the role of TYK2 in the pathogenesis of RA, SLE and IBD, and provide supporting evidence for TYK2 as a promising drug target for the treatment of autoimmune diseases. PMID:25849893
TYK2 protein-coding variants protect against rheumatoid arthritis and autoimmunity, with no evidence of major pleiotropic effects on non-autoimmune complex traits.

PubMed

Diogo, Dorothée; Bastarache, Lisa; Liao, Katherine P; Graham, Robert R; Fulton, Robert S; Greenberg, Jeffrey D; Eyre, Steve; Bowes, John; Cui, Jing; Lee, Annette; Pappas, Dimitrios A; Kremer, Joel M; Barton, Anne; Coenen, Marieke J H; Franke, Barbara; Kiemeney, Lambertus A; Mariette, Xavier; Richard-Miceli, Corrine; Canhão, Helena; Fonseca, João E; de Vries, Niek; Tak, Paul P; Crusius, J Bart A; Nurmohamed, Michael T; Kurreeman, Fina; Mikuls, Ted R; Okada, Yukinori; Stahl, Eli A; Larson, David E; Deluca, Tracie L; O'Laughlin, Michelle; Fronick, Catrina C; Fulton, Lucinda L; Kosoy, Roman; Ransom, Michael; Bhangale, Tushar R; Ortmann, Ward; Cagan, Andrew; Gainer, Vivian; Karlson, Elizabeth W; Kohane, Isaac; Murphy, Shawn N; Martin, Javier; Zhernakova, Alexandra; Klareskog, Lars; Padyukov, Leonid; Worthington, Jane; Mardis, Elaine R; Seldin, Michael F; Gregersen, Peter K; Behrens, Timothy; Raychaudhuri, Soumya; Denny, Joshua C; Plenge, Robert M

2015-01-01

Despite the success of genome-wide association studies (GWAS) in detecting a large number of loci for complex phenotypes such as rheumatoid arthritis (RA) susceptibility, the lack of information on the causal genes leaves important challenges to interpret GWAS results in the context of the disease biology. Here, we genetically fine-map the RA risk locus at 19p13 to define causal variants, and explore the pleiotropic effects of these same variants in other complex traits. First, we combined Immunochip dense genotyping (n = 23,092 case/control samples), Exomechip genotyping (n = 18,409 case/control samples) and targeted exon-sequencing (n = 2,236 case/control samples) to demonstrate that three protein-coding variants in TYK2 (tyrosine kinase 2) independently protect against RA: P1104A (rs34536443, OR = 0.66, P = 2.3 x 10(-21)), A928V (rs35018800, OR = 0.53, P = 1.2 x 10(-9)), and I684S (rs12720356, OR = 0.86, P = 4.6 x 10(-7)). Second, we show that the same three TYK2 variants protect against systemic lupus erythematosus (SLE, P(omnibus) = 6 x 10(-18)), and provide suggestive evidence that two of the TYK2 variants (P1104A and A928V) may also protect against inflammatory bowel disease (IBD; P(omnibus) = 0.005). Finally, in a phenome-wide association study (PheWAS) assessing >500 phenotypes using electronic medical records (EMR) in >29,000 subjects, we found no convincing evidence for association of P1104A and A928V with complex phenotypes other than autoimmune diseases such as RA, SLE and IBD. Together, our results demonstrate the role of TYK2 in the pathogenesis of RA, SLE and IBD, and provide supporting evidence for TYK2 as a promising drug target for the treatment of autoimmune diseases.
Variant calling in low-coverage whole genome sequencing of a Native American population sample.

PubMed

Bizon, Chris; Spiegel, Michael; Chasse, Scott A; Gizer, Ian R; Li, Yun; Malc, Ewa P; Mieczkowski, Piotr A; Sailsbery, Josh K; Wang, Xiaoshu; Ehlers, Cindy L; Wilhelmsen, Kirk C

2014-01-30

The reduction in the cost of sequencing a human genome has led to the use of genotype sampling strategies in order to impute and infer the presence of sequence variants that can then be tested for associations with traits of interest. Low-coverage Whole Genome Sequencing (WGS) is a sampling strategy that overcomes some of the deficiencies seen in fixed content SNP array studies. Linkage-disequilibrium (LD) aware variant callers, such as the program Thunder, may provide a calling rate and accuracy that make a low-coverage sequencing strategy viable. We examined the performance of an LD-aware variant calling strategy in a population of 708 low-coverage whole genome sequences from a community sample of Native Americans. We assessed variant calling through a comparison of the sequencing results to genotypes measured in 641 of the same subjects using a fixed content first generation exome array. The comparison was made using two variant calling routines, the GATK Unified Genotyper program and the LD-aware variant caller Thunder. Thunder was found to improve concordance in a coverage-dependent fashion, while correctly calling nearly all of the common variants as well as a high percentage of the rare variants present in the sample. Low-coverage WGS is a strategy that appears to collect genetic information intermediate in scope between fixed content genotyping arrays and deep-coverage WGS. Our data suggest that low-coverage WGS is a viable strategy with a greater chance of discovering novel variants and associations than fixed content arrays for large sample association analyses.
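Concordance between sequencing-based calls and array genotypes, as assessed in the abstract above, can be sketched as a simple match rate. The coding of genotypes as alternate-allele counts (0, 1, 2, with `None` for a missing call) and the function name `genotype_concordance` are assumptions made for illustration; the abstract does not describe the authors' exact metric.

```python
from typing import Optional, Sequence


def genotype_concordance(sequenced: Sequence[Optional[int]],
                         array: Sequence[Optional[int]]) -> float:
    """Fraction of sites with identical genotypes in both data sets.

    Genotypes are alternate-allele counts (0, 1 or 2). Sites where either
    source has a missing call (None) are excluded from the comparison.
    """
    if len(sequenced) != len(array):
        raise ValueError("genotype vectors must have the same length")
    compared = 0
    matches = 0
    for seq_call, array_call in zip(sequenced, array):
        if seq_call is None or array_call is None:
            continue
        compared += 1
        if seq_call == array_call:
            matches += 1
    if compared == 0:
        raise ValueError("no sites with calls in both data sets")
    return matches / compared


# Invented genotypes for one subject at five sites.
concordance = genotype_concordance([0, 1, 2, None, 1], [0, 1, 1, 2, 1])
```

Overall concordance is dominated by homozygous-reference sites, so comparisons of callers often report concordance restricted to non-reference genotypes as well.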
[Detection of pathogenic mutations in Marfan syndrome by targeted next-generation semiconductor sequencing].

PubMed

Lu, Chaoxia; Wu, Wei; Xiao, Jifang; Meng, Yan; Zhang, Shuyang; Zhang, Xue

2013-06-01

Objective: To detect pathogenic mutations in Marfan syndrome (MFS) using an Ion Torrent Personal Genome Machine (PGM) and to validate the results of targeted next-generation semiconductor sequencing for the diagnosis of genetic disorders. Methods: Peripheral blood samples were collected from three MFS patients and a normal control with informed consent. Genomic DNA was isolated by a standard method and then subjected to targeted sequencing using an Ion Ampliseq(TM) Inherited Disease Panel. Three multiplex PCR reactions were carried out to amplify the coding exons of 328 genes including FBN1, TGFBR1 and TGFBR2. DNA fragments from different samples were ligated with barcoded sequencing adaptors. Template preparation, emulsion PCR and Ion Sphere Particle enrichment were carried out using an Ion One Touch system. The Ion Sphere Particles were sequenced on a 318 chip using the PGM platform. Data from the PGM runs were processed using Ion Torrent Suite 3.2 software to generate sequence reads. After sequence alignment and extraction of SNPs and indels, all the variants were filtered against dbSNP137. DNA sequences were visualized with the Integrative Genomics Viewer. The most likely disease-causing variants were analyzed by Sanger sequencing. Results: The PGM sequencing yielded an output of 855.80 Mb, with a median sequencing depth of >100× and coverage of >98% of the targeted regions in all four samples. After data analysis and database filtering, one known missense mutation (p.E1811K) and two novel premature termination mutations (p.E2264X and p.L871FfsX23) in the FBN1 gene were identified in the three MFS patients. All mutations were verified by conventional Sanger sequencing. Conclusion: Pathogenic FBN1 mutations were identified in all three patients with MFS, indicating that targeted next-generation sequencing on the PGM sequencer can be applied for accurate and high-throughput testing of genetic disorders.

A machine learning model to determine the accuracy of variant calls in capture-based next generation sequencing.

PubMed

van den Akker, Jeroen; Mishne, Gilad; Zimmer, Anjali D; Zhou, Alicia Y

2018-04-17

Next generation sequencing (NGS) has become a common technology for clinical genetic tests. The quality of NGS calls varies widely and is influenced by features like reference sequence characteristics, read depth, and mapping accuracy. With recent advances in NGS technology and software tools, the majority of variants called using NGS alone are in fact accurate and reliable. However, a small subset of difficult-to-call variants that still require orthogonal confirmation exists. For this reason, many clinical laboratories confirm NGS results using orthogonal technologies such as Sanger sequencing. Here, we report the development of a deterministic machine-learning-based model to differentiate between these two types of variant calls: those that do not require confirmation using an orthogonal technology (high confidence), and those that require additional quality testing (low confidence). This approach allows reliable NGS-based calling in a clinical setting by identifying the few important variant calls that require orthogonal confirmation. We developed and tested the model using a set of 7179 variants identified by a targeted NGS panel and re-tested by Sanger sequencing. The model incorporated several signals of sequence characteristics and call quality to determine if a variant was identified at high or low confidence. The model was tuned to eliminate false positives, defined as variants that were called by NGS but not confirmed by Sanger sequencing. The model achieved very high accuracy: 99.4% (95% confidence interval: +/- 0.03%). It categorized 92.2% (6622/7179) of the variants as high confidence, and 100% of these were confirmed to be present by Sanger sequencing. Among the variants that were categorized as low confidence, defined as NGS calls of low quality that are likely to be artifacts, 92.1% (513/557) were found to be not present by Sanger sequencing.
This work shows that NGS data contains sufficient characteristics for a machine-learning-based model to differentiate low from high confidence variants. Additionally, it reveals the importance of incorporating site-specific features as well as variant call features in such a model.
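The percentages quoted in the abstract above can be checked directly from the reported counts. The sketch below is plain arithmetic; the final step rests on an assumption that is not stated in the abstract, namely that "accuracy" counts high-confidence calls confirmed by Sanger sequencing plus low-confidence calls found to be absent.

```python
# Counts reported in the abstract.
total_variants = 7179
high_confidence = 6622
low_confidence = 557          # 7179 - 6622
low_confidence_absent = 513   # low-confidence calls not found by Sanger


def percent(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place, as in the abstract."""
    return round(100 * part / whole, 1)


high_conf_share = percent(high_confidence, total_variants)
low_conf_absent_share = percent(low_confidence_absent, low_confidence)

# Assumption (not from the source): a call is "correct" if it is high
# confidence and confirmed, or low confidence and absent by Sanger.
assumed_correct = high_confidence + low_confidence_absent
assumed_accuracy = percent(assumed_correct, total_variants)
```

Under that reading, the 44 remaining low-confidence calls are true variants that would still be recovered by orthogonal confirmation.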
Quantitation of heteroplasmy of mtDNA sequence variants identified in a population of AD patients and controls by array-based resequencing.

PubMed

Coon, Keith D; Valla, Jon; Szelinger, Szabolics; Schneider, Lonnie E; Niedzielko, Tracy L; Brown, Kevin M; Pearson, John V; Halperin, Rebecca; Dunckley, Travis; Papassotiropoulos, Andreas; Caselli, Richard J; Reiman, Eric M; Stephan, Dietrich A

2006-08-01

The role of mitochondrial dysfunction in the pathogenesis of Alzheimer's disease (AD) has been well documented. Though evidence for the role of mitochondria in AD seems incontrovertible, the impact of mitochondrial DNA (mtDNA) mutations in AD etiology remains controversial. Though mutations in mitochondrially encoded genes have repeatedly been implicated in the pathogenesis of AD, many of these studies have been plagued by lack of replication as well as potential contamination of nuclear-encoded mitochondrial pseudogenes. To assess the role of mtDNA mutations in the pathogenesis of AD, while avoiding the pitfalls of nuclear-encoded mitochondrial pseudogenes encountered in previous investigations and showcasing the benefits of a novel resequencing technology, we sequenced the entire coding region (15,452 bp) of mtDNA from 19 extremely well-characterized AD patients and 18 age-matched, unaffected controls utilizing a new, reliable, high-throughput array-based resequencing technique, the Human MitoChip. High-throughput, array-based DNA resequencing of the entire mtDNA coding region from platelets of 37 subjects revealed the presence of 208 loci displaying a total of 917 sequence variants. There were no statistically significant differences in overall mutational burden between cases and controls, however, 265 independent sites of statistically significant change between cases and controls were identified. Changed sites were found in genes associated with complexes I (30.2%), III (3.0%), IV (33.2%), and V (9.1%) as well as tRNA (10.6%) and rRNA (14.0%). Despite their statistical significance, the subtle nature of the observed changes makes it difficult to determine whether they represent true functional variants involved in AD etiology or merely naturally occurring dissimilarity. 
Regardless, this study demonstrates the tremendous value of this novel mtDNA resequencing platform, which avoids the pitfalls of erroneously amplifying nuclear-encoded mtDNA pseudogenes, and our proposed analysis paradigm, which utilizes the availability of raw signal intensity values for each of the four potential alleles to facilitate quantitative estimates of mtDNA heteroplasmy. This information provides a potential new target for burgeoning diagnostics and therapeutics that could truly assist those suffering from this devastating disorder.
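The idea of estimating heteroplasmy from the raw signal intensities of the four possible alleles can be sketched as a simple proportion. This is an illustration only: the function names and intensity values are hypothetical, and the sketch ignores background subtraction and probe-specific effects that a real array analysis would need to handle.

```python
def allele_fractions(intensities: dict[str, float]) -> dict[str, float]:
    """Share of the total signal contributed by each base at one position."""
    total = sum(intensities.values())
    if total <= 0:
        raise ValueError("total signal must be positive")
    return {base: value / total for base, value in intensities.items()}


def heteroplasmy(intensities: dict[str, float], reference_base: str) -> float:
    """Estimated fraction of mtDNA copies carrying a non-reference base."""
    return 1.0 - allele_fractions(intensities)[reference_base]


# Invented intensities at one position whose reference base is A.
site_signal = {"A": 800.0, "C": 0.0, "G": 200.0, "T": 0.0}
estimate = heteroplasmy(site_signal, "A")
```

In this toy case roughly one fifth of the signal comes from the G allele, so the estimated heteroplasmy is about 20%.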
NGS Technologies as a Turning Point in Rare Disease Research, Diagnosis and Treatment

PubMed Central

Fernández-Marmiesse, Ana; Gouveia, Sofía; Couce, María L.

2018-01-01

Approximately 25-50 million Americans, 30 million Europeans, and 8% of the Australian population have a rare disease. Rare diseases are thus a common problem for clinicians and account for enormous healthcare costs worldwide due to the difficulty of establishing a specific diagnosis. In this article, we review the milestones achieved in our understanding of rare diseases since the emergence of next-generation sequencing (NGS) technologies and analyze how these advances have influenced research and diagnosis. The first half of this review describes how NGS has changed diagnostic workflows and provided an unprecedented, simple way of discovering novel disease-associated genes. We focus particularly on metabolic and neurodevelopmental disorders. NGS has enabled cheap and rapid genetic diagnosis, highlighted the relevance of mosaic and de novo mutations, brought to light the wide phenotypic spectrum of most genes, detected digenic inheritance or the presence of more than one rare disease in the same patient, and paved the way for promising new therapies. In the second part of the review, we look at the limitations and challenges of NGS, including determination of variant causality, the loss of variants in coding and non-coding regions, and the detection of somatic mosaicism variants and epigenetic mutations, and discuss how these can be overcome in the near future. PMID:28721829
NGS Technologies as a Turning Point in Rare Disease Research, Diagnosis and Treatment.

PubMed

Fernandez-Marmiesse, Ana; Gouveia, Sofia; Couce, Maria L

2018-01-30

Approximately 25-50 million Americans, 30 million Europeans, and 8% of the Australian population have a rare disease. Rare diseases are thus a common problem for clinicians and account for enormous healthcare costs worldwide due to the difficulty of establishing a specific diagnosis. In this article, we review the milestones achieved in our understanding of rare diseases since the emergence of next-generation sequencing (NGS) technologies and analyze how these advances have influenced research and diagnosis. The first half of this review describes how NGS has changed diagnostic workflows and provided an unprecedented, simple way of discovering novel disease-associated genes. We focus particularly on metabolic and neurodevelopmental disorders. NGS has enabled cheap and rapid genetic diagnosis, highlighted the relevance of mosaic and de novo mutations, brought to light the wide phenotypic spectrum of most genes, detected digenic inheritance or the presence of more than one rare disease in the same patient, and paved the way for promising new therapies. In the second part of the review, we look at the limitations and challenges of NGS, including determination of variant causality, the loss of variants in coding and non-coding regions, and the detection of somatic mosaicism variants and epigenetic mutations, and discuss how these can be overcome in the near future. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
Autosomal recessive congenital cataract in captive-bred vervet monkeys (Chlorocebus aethiops).

PubMed

Magwebu, Zandisiwe E; Abdul-Rasool, Sahar; Seier, Jürgen V; Chauke, Chesa G

2018-04-01

The aim of the study was to evaluate the genetic predisposition of congenital cataract in a colony of captive-bred vervet monkeys. Four congenital cataract genes: glucosaminyl (N-acetyl) transferase 2 (GCNT2), heat shock transcription factor 4 (HSF4), crystallin alpha A (CRYAA) and lens intrinsic membrane protein-2 (LIM2) were screened, sequenced and analysed for possible genetic variants in 36 monkeys. Gene expression was also evaluated in these genes. Fifteen sequence variants were identified in the coding regions of three genes (GCNT2, HSF4 and CRYAA). Of these variations, only three were missense mutations (M258V, V16I and S24N) and identified in the GCNT2 transcripts A, B and C, respectively, which resulted in a downregulated gene expression. Although the three missense mutations in GCNT2 have a benign effect, a possibility exists that the candidate genes (GCNT2, HSF4 and CRYAA) might harbour mutations that are responsible for total congenital cataract. © 2018 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Whole genome sequence analysis of Geitlerinema sp. FC II unveils competitive edge of the strain in marine cultivation system for biofuel production.

PubMed

Batchu, Navish Kumar; Khater, Shradha; Patil, Sonal; Nagle, Vinod; Das, Gautam; Bhadra, Bhaskar; Sapre, Ajit; Dasgupta, Santanu

2018-03-05

A filamentous cyanobacterium, Geitlerinema sp. FC II, was isolated from a marine algae culture pond at Reliance Industries Limited (RIL), India. The 6.7 Mb draft genome of FC II encodes 6697 protein-coding genes. Analysis of the whole genome sequence revealed the presence of a nif gene cluster, supporting its capability to fix atmospheric nitrogen. The FC II genome contains two variants of sulfide:quinone oxidoreductases (SQR), which is a crucial electron donor in cyanobacterial metabolic processes. FC II is characterized by the presence of multiple CRISPR-Cas (Clustered Regularly Interspaced Short Palindromic Repeats - CRISPR-associated proteins) clusters, multiple variants of genes encoding photosystem reaction centres, and biosynthetic gene clusters of alkanes, polyketides and non-ribosomal peptides. The presence of these pathways will help FC II gain an ecological advantage over other strains for biomass production in large-scale cultivation systems. Hence, FC II may be used for production of biofuel and other industrially important metabolites. Copyright © 2018 Elsevier Inc. All rights reserved.
Genetic polymorphisms in Na+-taurocholate co-transporting polypeptide (NTCP) and ileal apical sodium-dependent bile acid transporter (ASBT) and ethnic comparisons of functional variants of NTCP among Asian populations.

PubMed

Pan, Wei; Song, Im-Sook; Shin, Ho-Jung; Kim, Min-Hye; Choi, Yeong-Lim; Lim, Su-Jeong; Kim, Woo-Young; Lee, Sang-Seop; Shin, Jae-Gook

2011-06-01

Genetic variants of Na(+)-taurocholate co-transporting polypeptide (NTCP; SLC10A1) and ileal apical sodium-dependent bile acid transporter (ASBT; SLC10A2), which greatly contribute to bile acid homeostasis, were extensively explored in the Korean population and functional variants of NTCP were compared among Asian populations. From direct DNA sequencing, six SNPs were identified in the SLC10A1 gene and 14 SNPs in the SLC10A2 gene. Three of seven coding variants were non-synonymous SNPs: two variants from SLC10A1 (A64T, S267F) and one from SLC10A2 (A171S). No linkage was analysed in the SLC10A1 gene because of low frequencies of genetic variants, and the SLC10A2 gene was composed of two separated linkage disequilibrium blocks, contrary to the white population. The stably transfected NTCP-A64T variant showed significantly decreased uptake of taurocholate and rosuvastatin compared with wild-type NTCP. The NTCP-S267F variant showed decreased taurocholate uptake and increased rosuvastatin uptake. The allele frequencies of these functional variants were 1.0% and 3.1%, respectively, in a Korean population. However, NTCP-A64T was not found in Chinese and Vietnamese subjects. The frequency of NTCP-S267F in Koreans was significantly lower than in Chinese and Vietnamese populations. Our data suggest that the NTCP-A64T and -S267F variants cause substrate-dependent functional change in vitro and show ethnic differences in their allelic frequencies among Asian populations, although the clinical relevance of these variants remains to be evaluated.
Gene variants and binge eating as predictors of comorbidity and outcome of treatment in severe obesity.

PubMed

Potoczna, Natascha; Branson, Ruth; Kral, John G; Piec, Grazyna; Steffen, Rudolf; Ricklin, Thomas; Hoehe, Margret R; Lentes, Klaus-Ulrich; Horber, Fritz F

2004-12-01

Melanocortin-4 receptor gene (MC4R) variants are associated with obesity and binge eating disorder (BED), whereas the more prevalent proopiomelanocortin (POMC) and leptin receptor gene (LEPR) mutations are rarely associated with obesity or BED. The complete coding regions of MC4R, POMC, and leptin-binding domain of LEPR were comparatively sequenced in 300 patients (233 women and 67 men; mean ± SEM age, 42 ± 1 years; mean ± SEM body mass index, 43.5 ± 0.3 kg/m²) undergoing laparoscopic gastric banding. Eating behavior, esophagogastric pathology, metabolic syndrome prevalence, and postoperative weight loss and complications were retrospectively compared between carriers and noncarriers of gene variants with and without BED during 36 ± 3-month follow-up. Nineteen patients (6.3%) carried 8 MC4R variants, 144 (48.0%) carried 13 POMC variants, and 247 (82.3%) carried 11 LEPR variants. All MC4R variant carriers had BED, compared with 18.1% of noncarriers (P < 0.001). BED rates were similar among POMC and LEPR variant carriers and noncarriers. Gastroscopy revealed more erosive esophagitis in bingers than in nonbingers before and after banding (P < 0.04), regardless of genotype. MC4R variant carriers lost less weight (P = 0.003), showed less improvement in metabolic syndrome (P < 0.001), had dilated esophagi (P < 0.001) and more vomiting (P < 0.05), and had fivefold more gastric complications (P < 0.001) than noncarriers. Overall outcome was poorest in MC4R variant carriers, better in noncarriers with BED (P < 0.05), and best in noncarriers without BED (P < 0.001). MC4R variants influence comorbidities and treatment outcomes in severe obesity.
Novel de novo AVPR2 Variant in a Patient with Congenital Nephrogenic Diabetes Insipidus

PubMed Central

Joshi, Shivani; Brandstrom, Per; Gregersen, Niels; Rittig, Søren; Christensen, Jane Hvarregaard

2017-01-01

Early diagnosis and treatment of congenital nephrogenic diabetes insipidus (CNDI) are essential due to the risk of intellectual disability caused by repeated episodes of dehydration and rapid rehydration. Timely genetic testing for disease-causing variants in the arginine vasopressin receptor 2 (AVPR2) gene is possible in at-risk newborns with a known family history of X-linked CNDI. In this study, a Swedish male with no family history was diagnosed with CNDI at 6 months of age during an episode of gastroenteritis. We analyzed the coding regions of AVPR2 by PCR and direct DNA sequencing and identified an 80-bp duplication in exon 2 (GenBank NM_000054.4; c.800_879dup) in the proband. This variant leads to a frameshift and introduces a stop codon four codons downstream (p.Ala294Profs*4). The variant gene product either succumbs to nonsense-mediated decay or is translated to a truncated nonfunctional vasopressin V2 receptor. This variant was absent in four unaffected family members, including his parents, as well as in 100 alleles from healthy controls, and is thus considered a novel de novo disease-causing variant. Identification of the disease-causing variant facilitated precise diagnosis of CNDI in the proband. Furthermore, it allows future genetic counseling in the family. This case study highlights the importance of genetic testing in sporadic infant cases with CNDI that can occur due to de novo variants in AVPR2 or several generations of female transmission of the disease-causing variant. PMID:29177155
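The arithmetic behind the variant description above can be checked with a minimal sketch. It parses a simple duplication written as c.<start>_<end>dup, computes its length, and tests whether that length preserves the reading frame. The helper name `describe_dup` and the regular expression are illustrative assumptions; they cover only this one notation pattern, not the full HGVS nomenclature.

```python
import math
import re


def describe_dup(hgvs_c: str) -> dict:
    """Summarise a coding-DNA duplication written as c.<start>_<end>dup.

    Hypothetical helper for illustration; handles only this simple pattern.
    """
    match = re.fullmatch(r"c\.(\d+)_(\d+)dup", hgvs_c)
    if match is None:
        raise ValueError(f"unsupported notation: {hgvs_c}")
    start, end = int(match.group(1)), int(match.group(2))
    length = end - start + 1  # the range is inclusive at both ends
    return {
        "length_bp": length,
        # A length that is not a multiple of 3 shifts the reading frame.
        "frameshift": length % 3 != 0,
        # The duplicated copy sits directly after `end`, so the first codon
        # that can change is the one containing coding position end + 1.
        "first_affected_codon": math.ceil((end + 1) / 3),
    }


info = describe_dup("c.800_879dup")
print(info)
```

For c.800_879dup this gives an 80-bp duplication that shifts the frame, with codon 294 as the first codon that can change, which agrees with the p.Ala294Profs*4 description in the abstract.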
Exome sequencing reveals novel genetic loci influencing obesity-related traits in Hispanic children

USDA-ARS's Scientific Manuscript database

To perform whole exome sequencing in 928 Hispanic children and identify variants and genes associated with childhood obesity. Single-nucleotide variants (SNVs) were identified from Illumina whole exome sequencing data using integrated read mapping, variant calling, and an annotation pipeline (Mercury...
Novel sequence variants in the TMIE gene in families with autosomal recessive nonsyndromic hearing impairment

PubMed Central

Santos, Regie Lyn P.; El-Shanti, Hatem; Sikandar, Shaheen; Lee, Kwanghyuk; Bhatti, Attya; Yan, Kai; Chahrour, Maria H.; McArthur, Nathan; Pham, Thanh L.; Mahasneh, Amjad Abdullah; Ahmad, Wasim

2010-01-01

To date, 37 genes have been identified for nonsyndromic hearing impairment (NSHI). Identifying the functional sequence variants within these genes and knowing their population-specific frequencies is of public health value, in particular for genetic screening for NSHI. To determine putatively functional sequence variants in the transmembrane inner ear (TMIE) gene in Pakistani and Jordanian families with autosomal recessive (AR) NSHI, four Jordanian and 168 Pakistani families with ARNSHI that is not due to GJB2 (CX26) were submitted to a genome scan. Two-point and multipoint parametric linkage analyses were performed, and families with logarithmic odds (LOD) scores of 1.0 or greater within the TMIE region underwent further DNA sequencing. The evolutionary conservation and location in predicted protein domains of amino acid residues where sequence variants occurred were studied to elucidate the possible effects of these sequence variants on function. Of seven families that were screened for TMIE, putatively functional sequence variants were found to segregate with hearing impairment in four families but were not seen in at least 110 ethnically matched control chromosomes. The previously reported c.241C>T (p.R81C) variant was observed in two Pakistani families. Two novel variants, c.92A>G (p.E31G) and the splice site mutation c.212–2A>C, were identified in one Pakistani and one Jordanian family, respectively. The c.92A>G (p.E31G) variant occurred at a residue that is conserved in the mouse and is predicted to be extracellular. Conservation and potential functionality of previously published mutations were also examined. The prevalence of functional TMIE variants in Pakistani families is 1.7% [95% confidence interval (CI) 0.3–4.8]. Further studies on the spectrum, prevalence rates, and functional effect of sequence variants in the TMIE gene in other populations should demonstrate the true importance of this gene as a cause of hearing impairment. PMID:16389551
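To make the LOD threshold used above more concrete, here is a minimal sketch of the textbook two-point LOD score for the simplest case: phase-known, fully informative meioses, of which some number are recombinant. This is an illustration only, not the parametric multipoint analysis the authors performed; the function name and the example counts are hypothetical.

```python
import math


def two_point_lod(recombinants: int, meioses: int, theta: float) -> float:
    """LOD = log10(L(theta) / L(0.5)) for phase-known, fully informative meioses."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie in [0, meioses]")
    non_recombinants = meioses - recombinants
    # Likelihood under linkage at recombination fraction theta.
    # Python evaluates 0.0 ** 0 as 1.0, so theta = 0 with no recombinants works.
    linked = (theta ** recombinants) * ((1.0 - theta) ** non_recombinants)
    # Likelihood under free recombination (theta = 0.5).
    unlinked = 0.5 ** meioses
    if linked == 0.0:
        return float("-inf")  # the data are impossible at this theta
    return math.log10(linked / unlinked)


# Hypothetical family: 10 informative meioses, none recombinant.
lod = two_point_lod(recombinants=0, meioses=10, theta=0.0)
print(round(lod, 2))
```

In this idealised setting each non-recombinant meiosis adds log10(2), about 0.3, to the score at theta = 0, so a threshold of 1.0 corresponds to roughly four such meioses.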
The human pregnane X receptor: genomic structure and identification and functional characterization of natural allelic variants.

PubMed

Zhang, J; Kuehl, P; Green, E D; Touchman, J W; Watkins, P B; Daly, A; Hall, S D; Maurel, P; Relling, M; Brimer, C; Yasuda, K; Wrighton, S A; Hancock, M; Kim, R B; Strom, S; Thummel, K; Russell, C G; Hudson, J R; Schuetz, E G; Boguski, M S

2001-10-01

The pregnane X receptor (PXR)/steroid and xenobiotic receptor (SXR) transcriptionally activates cytochrome P4503A4 (CYP3A4) when ligand activated by endobiotics and xenobiotics. We cloned the human PXR gene and analysed the sequence in DNAs of individuals whose CYP3A phenotype was known. The PXR gene spans 35 kb, contains nine exons, and mapped to chromosome 13q11-13. Thirty-eight single nucleotide polymorphisms (SNPs) were identified including six SNPs in the coding region. Three of the coding SNPs are non-synonymous creating new PXR alleles [PXR*2, P27S (79C to T); PXR*3, G36R (106G to A); and PXR*4, R122Q (4321G to A)]. The frequency of PXR*2 was 0.20 in African Americans and was never found in Caucasians. Hepatic expression of CYP3A4 protein was not significantly different between African Americans homozygous for PXR*1 compared to those with one PXR*2 allele. PXR*4 was a rare variant found in only one Caucasian person. Homology modelling suggested that R122Q, (PXR*4) is a direct DNA contact site variation in the third alpha-helix in the DNA binding domain. Compared with PXR*1, and variants PXR*2 and PXR*3, only the variant PXR*4 protein had significantly decreased affinity for the PXR binding sequence in electromobility shift assays and attenuated ligand activation of the CYP3A4 reporter plasmids in transient transfection assays. However, the person heterozygous for PXR*4 is normal for CYP3A4 metabolism phenotype. The relevance of each of the 38 PXR SNPs identified in DNA of individuals whose CYP3A basal and rifampin-inducible CYP3A4 expression was determined in vivo and/or in vitro was demonstrated by univariate statistical analysis. Because ligand activation of PXR and upregulation of a system of drug detoxification genes are major determinants of drug interactions, it will now be useful to extend this work to determine the association of these common PXR SNPs to human variation in induction of other drug detoxification gene targets.
Systematic screening for mutations in the promoter and the coding region of the 5-HT1A gene

DOE Office of Scientific and Technical Information (OSTI.GOV)

Erdmann, J.; Shimron-Abarbanell, D.; Cichon, S.

1995-10-09

In the present study we sought to identify genetic variation in the 5-HT1A receptor gene which through alteration of protein function or level of expression might contribute to the genetic predisposition to neuropsychiatric diseases. Genomic DNA samples from 159 unrelated subjects (including 45 schizophrenic, 46 bipolar affective, and 43 patients with Tourette's syndrome, as well as 25 healthy controls) were investigated by single-strand conformation analysis. Overlapping PCR (polymerase chain reaction) fragments covered the whole coding sequence as well as the 5′ untranslated region of the 5-HT1A gene. The region upstream to the coding sequence we investigated contains a functional promoter. We found two rare nucleotide sequence variants. Both mutations are located in the coding region of the gene: a coding mutation (A→G) in nucleotide position 82 which leads to an amino acid exchange (Ile→Val) in position 28 of the receptor protein and a silent mutation (C→T) in nucleotide position 549. The occurrence of the Ile-28-Val substitution was studied in an extended sample of patients (n = 352) and controls (n = 210) but was found in similar frequencies in all groups. Thus, this mutation is unlikely to play a significant role in the genetic predisposition to the diseases investigated. In conclusion, our study does not provide evidence that the 5-HT1A gene plays either a major or a minor role in the genetic predisposition to schizophrenia, bipolar affective disorder, or Tourette's syndrome. 29 refs., 4 figs., 1 tab.
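The relationship between the nucleotide positions and the amino acid position reported above can be checked with a short sketch. It maps a 1-based coding-sequence position to its codon number and its position within that codon, and uses the standard genetic code to confirm that an A-to-G change at the first base of any isoleucine codon gives a valine codon. The function name is an illustrative assumption, and the sketch assumes positions are counted from the first base of the start codon.

```python
from typing import Tuple


def codon_location(cds_position: int) -> Tuple[int, int]:
    """Map a 1-based coding position to (codon number, position within codon)."""
    if cds_position < 1:
        raise ValueError("coding positions are 1-based")
    codon_number = (cds_position - 1) // 3 + 1
    position_in_codon = (cds_position - 1) % 3 + 1
    return codon_number, position_in_codon


# Standard genetic code (DNA alphabet) for the two amino acids involved.
ISOLEUCINE_CODONS = {"ATT", "ATC", "ATA"}
VALINE_CODONS = {"GTT", "GTC", "GTA", "GTG"}

print(codon_location(82))   # (28, 1): first base of codon 28
print(codon_location(549))  # (183, 3): third base of codon 183

# Replacing the first base A with G turns every Ile codon into a Val codon.
first_base_a_to_g = all("G" + codon[1:] in VALINE_CODONS for codon in ISOLEUCINE_CODONS)
print(first_base_a_to_g)    # True
```

Position 82 falls on the first base of codon 28, consistent with the reported Ile-to-Val exchange at residue 28. Position 549 falls on the third base of codon 183, where many substitutions are synonymous; whether a given change there is silent depends on the actual codon, which the abstract does not state.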
Demographic history and rare allele sharing among human populations.

PubMed

Gravel, Simon; Henn, Brenna M; Gutenkunst, Ryan N; Indap, Amit R; Marth, Gabor T; Clark, Andrew G; Yu, Fuli; Gibbs, Richard A; Bustamante, Carlos D

2011-07-19

High-throughput sequencing technology enables population-level surveys of human genomic variation. Here, we examine the joint allele frequency distributions across continental human populations and present an approach for combining complementary aspects of whole-genome, low-coverage data and targeted high-coverage data. We apply this approach to data generated by the pilot phase of the Thousand Genomes Project, including whole-genome 2-4× coverage data for 179 samples from HapMap European, Asian, and African panels as well as high-coverage target sequencing of the exons of 800 genes from 697 individuals in seven populations. We use the site frequency spectra obtained from these data to infer demographic parameters for an Out-of-Africa model for populations of African, European, and Asian descent and to predict, by a jackknife-based approach, the amount of genetic diversity that will be discovered as sample sizes are increased. We predict that the number of discovered nonsynonymous coding variants will reach 100,000 in each population after ∼1,000 sequenced chromosomes per population, whereas ∼2,500 chromosomes will be needed for the same number of synonymous variants. Beyond this point, the number of segregating sites in the European and Asian panel populations is expected to overcome that of the African panel because of faster recent population growth. Overall, we find that the majority of human genomic variable sites are rare and exhibit little sharing among diverged populations. Our results emphasize that replication of disease association for specific rare genetic variants across diverged populations must overcome both reduced statistical power because of rarity and higher population divergence.
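The site frequency spectrum referred to above is a simple tabulation, which a minimal sketch can illustrate: for a sample of n chromosomes, count how many segregating sites carry the derived allele on exactly k chromosomes, for k from 1 to n - 1. The function name and the toy counts are hypothetical; this is not the authors' pipeline, which also has to combine low-coverage and high-coverage data.

```python
from collections import Counter
from typing import List, Sequence


def site_frequency_spectrum(derived_counts: Sequence[int], n_chromosomes: int) -> List[int]:
    """Return [xi_1, ..., xi_(n-1)] for a sample of n chromosomes.

    xi_k is the number of sites where the derived allele is seen on exactly
    k chromosomes. Sites with count 0 or n are not segregating in the sample
    and are ignored.
    """
    tally = Counter(derived_counts)
    return [tally.get(k, 0) for k in range(1, n_chromosomes)]


# Hypothetical derived-allele counts at 11 sites in a sample of 6 chromosomes.
toy_counts = [1, 1, 1, 2, 1, 5, 3, 1, 2, 0, 6]
sfs = site_frequency_spectrum(toy_counts, n_chromosomes=6)
print(sfs)  # [5, 2, 1, 0, 1]
```

The first entry counts singletons. In this toy example singletons dominate, which is the qualitative pattern the abstract describes when it says that most variable sites are rare.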
Demographic history and rare allele sharing among human populations

PubMed Central

Gravel, Simon; Henn, Brenna M.; Gutenkunst, Ryan N.; Indap, Amit R.; Marth, Gabor T.; Clark, Andrew G.; Yu, Fuli; Gibbs, Richard A.; Bustamante, Carlos D.; Altshuler, David L.; Durbin, Richard M.; Abecasis, Gonçalo R.; Bentley, David R.; Chakravarti, Aravinda; Clark, Andrew G.; Collins, Francis S.; De La Vega, Francisco M.; Donnelly, Peter; Egholm, Michael; Flicek, Paul; Gabriel, Stacey B.; Gibbs, Richard A.; Knoppers, Bartha M.; Lander, Eric S.; Lehrach, Hans; Mardis, Elaine R.; McVean, Gil A.; Nickerson, Debbie A.; Peltonen, Leena; Schafer, Alan J.; Sherry, Stephen T.; Wang, Jun; Wilson, Richard K.; Gibbs, Richard A.; Deiros, David; Metzker, Mike; Muzny, Donna; Reid, Jeff; Wheeler, David; Wang, Jun; Li, Jingxiang; Jian, Min; Li, Guoqing; Li, Ruiqiang; Liang, Huiqing; Tian, Geng; Wang, Bo; Wang, Jian; Wang, Wei; Yang, Huanming; Zhang, Xiuqing; Zheng, Huisong; Lander, Eric S.; Altshuler, David L.; Ambrogio, Lauren; Bloom, Toby; Cibulskis, Kristian; Fennell, Tim J.; Gabriel, Stacey B.; Jaffe, David B.; Shefler, Erica; Sougnez, Carrie L.; Bentley, David R.; Gormley, Niall; Humphray, Sean; Kingsbury, Zoya; Koko-Gonzales, Paula; Stone, Jennifer; McKernan, Kevin J.; Costa, Gina L.; Ichikawa, Jeffry K.; Lee, Clarence C.; Sudbrak, Ralf; Lehrach, Hans; Borodina, Tatiana A.; Dahl, Andreas; Davydov, Alexey N.; Marquardt, Peter; Mertes, Florian; Nietfeld, Wilfiried; Rosenstiel, Philip; Schreiber, Stefan; Soldatov, Aleksey V.; Timmermann, Bernd; Tolzmann, Marius; Egholm, Michael; Affourtit, Jason; Ashworth, Dana; Attiya, Said; Bachorski, Melissa; Buglione, Eli; Burke, Adam; Caprio, Amanda; Celone, Christopher; Clark, Shauna; Conners, David; Desany, Brian; Gu, Lisa; Guccione, Lorri; Kao, Kalvin; Kebbel, Andrew; Knowlton, Jennifer; Labrecque, Matthew; McDade, Louise; Mealmaker, Craig; Minderman, Melissa; Nawrocki, Anne; Niazi, Faheem; Pareja, Kristen; Ramenani, Ravi; Riches, David; Song, Wanmin; Turcotte, Cynthia; Wang, Shally; Mardis, Elaine R.; Wilson, Richard K.; Dooling, 
David; Fulton, Lucinda; Fulton, Robert; Weinstock, George; Durbin, Richard M.; Burton, John; Carter, David M.; Churcher, Carol; Coffey, Alison; Cox, Anthony; Palotie, Aarno; Quail, Michael; Skelly, Tom; Stalker, James; Swerdlow, Harold P.; Turner, Daniel; De Witte, Anniek; Giles, Shane; Gibbs, Richard A.; Wheeler, David; Bainbridge, Matthew; Challis, Danny; Sabo, Aniko; Yu, Fuli; Yu, Jin; Wang, Jun; Fang, Xiaodong; Guo, Xiaosen; Li, Ruiqiang; Li, Yingrui; Luo, Ruibang; Tai, Shuaishuai; Wu, Honglong; Zheng, Hancheng; Zheng, Xiaole; Zhou, Yan; Li, Guoqing; Wang, Jian; Yang, Huanming; Marth, Gabor T.; Garrison, Erik P.; Huang, Weichun; Indap, Amit; Kural, Deniz; Lee, Wan-Ping; Leong, Wen Fung; Quinlan, Aaron R.; Stewart, Chip; Stromberg, Michael P.; Ward, Alistair N.; Wu, Jiantao; Lee, Charles; Mills, Ryan E.; Shi, Xinghua; Daly, Mark J.; DePristo, Mark A.; Altshuler, David L.; Ball, Aaron D.; Banks, Eric; Bloom, Toby; Browning, Brian L.; Cibulskis, Kristian; Fennell, Tim J.; Garimella, Kiran V.; Grossman, Sharon R.; Handsaker, Robert E.; Hanna, Matt; Hartl, Chris; Jaffe, David B.; Kernytsky, Andrew M.; Korn, Joshua M.; Li, Heng; Maguire, Jared R.; McCarroll, Steven A.; McKenna, Aaron; Nemesh, James C.; Philippakis, Anthony A.; Poplin, Ryan E.; Price, Alkes; Rivas, Manuel A.; Sabeti, Pardis C.; Schaffner, Stephen F.; Shefler, Erica; Shlyakhter, Ilya A.; Cooper, David N.; Ball, Edward V.; Mort, Matthew; Phillips, Andrew D.; Stenson, Peter D.; Sebat, Jonathan; Makarov, Vladimir; Ye, Kenny; Yoon, Seungtai C.; Bustamante, Carlos D.; Clark, Andrew G.; Boyko, Adam; Degenhardt, Jeremiah; Gravel, Simon; Gutenkunst, Ryan N.; Kaganovich, Mark; Keinan, Alon; Lacroute, Phil; Ma, Xin; Reynolds, Andy; Clarke, Laura; Flicek, Paul; Cunningham, Fiona; Herrero, Javier; Keenen, Stephen; Kulesha, Eugene; Leinonen, Rasko; McLaren, William M.; Radhakrishnan, Rajesh; Smith, Richard E.; Zalunin, Vadim; Zheng-Bradley, Xiangqun; Korbel, Jan O.; Stütz, Adrian M.; Humphray, Sean; Bauer, Markus; 
Cheetham, R. Keira; Cox, Tony; Eberle, Michael; James, Terena; Kahn, Scott; Murray, Lisa; Chakravarti, Aravinda; Ye, Kai; De La Vega, Francisco M.; Fu, Yutao; Hyland, Fiona C. L.; Manning, Jonathan M.; McLaughlin, Stephen F.; Peckham, Heather E.; Sakarya, Onur; Sun, Yongming A.; Tsung, Eric F.; Batzer, Mark A.; Konkel, Miriam K.; Walker, Jerilyn A.; Sudbrak, Ralf; Albrecht, Marcus W.; Amstislavskiy, Vyacheslav S.; Herwig, Ralf; Parkhomchuk, Dimitri V.; Sherry, Stephen T.; Agarwala, Richa; Khouri, Hoda M.; Morgulis, Aleksandr O.; Paschall, Justin E.; Phan, Lon D.; Rotmistrovsky, Kirill E.; Sanders, Robert D.; Shumway, Martin F.; Xiao, Chunlin; McVean, Gil A.; Auton, Adam; Iqbal, Zamin; Lunter, Gerton; Marchini, Jonathan L.; Moutsianas, Loukas; Myers, Simon; Tumian, Afidalina; Desany, Brian; Knight, James; Winer, Roger; Craig, David W.; Beckstrom-Sternberg, Steve M.; Christoforides, Alexis; Kurdoglu, Ahmet A.; Pearson, John V.; Sinari, Shripad A.; Tembe, Waibhav D.; Haussler, David; Hinrichs, Angie S.; Katzman, Sol J.; Kern, Andrew; Kuhn, Robert M.; Przeworski, Molly; Hernandez, Ryan D.; Howie, Bryan; Kelley, Joanna L.; Melton, S. Cord; Abecasis, Gonçalo R.; Li, Yun; Anderson, Paul; Blackwell, Tom; Chen, Wei; Cookson, William O.; Ding, Jun; Kang, Hyun Min; Lathrop, Mark; Liang, Liming; Moffatt, Miriam F.; Scheet, Paul; Sidore, Carlo; Snyder, Matthew; Zhan, Xiaowei; Zöllner, Sebastian; Awadalla, Philip; Casals, Ferran; Idaghdour, Youssef; Keebler, John; Stone, Eric A.; Zilversmit, Martine; Jorde, Lynn; Xing, Jinchuan; Eichler, Evan E.; Aksay, Gozde; Alkan, Can; Hajirasouliha, Iman; Hormozdiari, Fereydoun; Kidd, Jeffrey M.; Sahinalp, S. 
Cenk; Sudmant, Peter H.; Mardis, Elaine R.; Chen, Ken; Chinwalla, Asif; Ding, Li; Koboldt, Daniel C.; McLellan, Mike D.; Dooling, David; Weinstock, George; Wallis, John W.; Wendl, Michael C.; Zhang, Qunyuan; Durbin, Richard M.; Albers, Cornelis A.; Ayub, Qasim; Balasubramaniam, Senduran; Barrett, Jeffrey C.; Carter, David M.; Chen, Yuan; Conrad, Donald F.; Danecek, Petr; Dermitzakis, Emmanouil T.; Hu, Min; Huang, Ni; Hurles, Matt E.; Jin, Hanjun; Jostins, Luke; Keane, Thomas M.; Le, Si Quang; Lindsay, Sarah; Long, Quan; MacArthur, Daniel G.; Montgomery, Stephen B.; Parts, Leopold; Stalker, James; Tyler-Smith, Chris; Walter, Klaudia; Zhang, Yujun; Gerstein, Mark B.; Snyder, Michael; Abyzov, Alexej; Balasubramanian, Suganthi; Bjornson, Robert; Du, Jiang; Grubert, Fabian; Habegger, Lukas; Haraksingh, Rajini; Jee, Justin; Khurana, Ekta; Lam, Hugo Y. K.; Leng, Jing; Mu, Xinmeng Jasmine; Urban, Alexander E.; Zhang, Zhengdong; Li, Yingrui; Luo, Ruibang; Marth, Gabor T.; Garrison, Erik P.; Kural, Deniz; Quinlan, Aaron R.; Stewart, Chip; Stromberg, Michael P.; Ward, Alistair N.; Wu, Jiantao; Lee, Charles; Mills, Ryan E.; Shi, Xinghua; McCarroll, Steven A.; Banks, Eric; DePristo, Mark A.; Handsaker, Robert E.; Hartl, Chris; Korn, Joshua M.; Li, Heng; Nemesh, James C.; Sebat, Jonathan; Makarov, Vladimir; Ye, Kenny; Yoon, Seungtai C.; Degenhardt, Jeremiah; Kaganovich, Mark; Clarke, Laura; Smith, Richard E.; Zheng-Bradley, Xiangqun; Korbel, Jan O.; Humphray, Sean; Cheetham, R. 
Keira; Eberle, Michael; Kahn, Scott; Murray, Lisa; Ye, Kai; De La Vega, Francisco M.; Fu, Yutao; Peckham, Heather E.; Sun, Yongming A.; Batzer, Mark A.; Konkel, Miriam K.; Walker, Jerilyn A.; Xiao, Chunlin; Iqbal, Zamin; Desany, Brian; Blackwell, Tom; Snyder, Matthew; Xing, Jinchuan; Eichler, Evan E.; Aksay, Gozde; Alkan, Can; Hajirasouliha, Iman; Hormozdiari, Fereydoun; Kidd, Jeffrey M.; Chen, Ken; Chinwalla, Asif; Ding, Li; McLellan, Mike D.; Wallis, John W.; Hurles, Matt E.; Conrad, Donald F.; Walter, Klaudia; Zhang, Yujun; Gerstein, Mark B.; Snyder, Michael; Abyzov, Alexej; Du, Jiang; Grubert, Fabian; Haraksingh, Rajini; Jee, Justin; Khurana, Ekta; Lam, Hugo Y. K.; Leng, Jing; Mu, Xinmeng Jasmine; Urban, Alexander E.; Zhang, Zhengdong; Gibbs, Richard A.; Bainbridge, Matthew; Challis, Danny; Coafra, Cristian; Dinh, Huyen; Kovar, Christie; Lee, Sandy; Muzny, Donna; Nazareth, Lynne; Reid, Jeff; Sabo, Aniko; Yu, Fuli; Yu, Jin; Marth, Gabor T.; Garrison, Erik P.; Indap, Amit; Leong, Wen Fung; Quinlan, Aaron R.; Stewart, Chip; Ward, Alistair N.; Wu, Jiantao; Cibulskis, Kristian; Fennell, Tim J.; Gabriel, Stacey B.; Garimella, Kiran V.; Hartl, Chris; Shefler, Erica; Sougnez, Carrie L.; Wilkinson, Jane; Clark, Andrew G.; Gravel, Simon; Grubert, Fabian; Clarke, Laura; Flicek, Paul; Smith, Richard E.; Zheng-Bradley, Xiangqun; Sherry, Stephen T.; Khouri, Hoda M.; Paschall, Justin E.; Shumway, Martin F.; Xiao, Chunlin; McVean, Gil A.; Katzman, Sol J.; Abecasis, Gonçalo R.; Blackwell, Tom; Mardis, Elaine R.; Dooling, David; Fulton, Lucinda; Fulton, Robert; Koboldt, Daniel C.; Durbin, Richard M.; Balasubramaniam, Senduran; Coffey, Allison; Keane, Thomas M.; MacArthur, Daniel G.; Palotie, Aarno; Scott, Carol; Stalker, James; Tyler-Smith, Chris; Gerstein, Mark B.; Balasubramanian, Suganthi; Chakravarti, Aravinda; Knoppers, Bartha M.; Abecasis, Gonçalo R.; Bustamante, Carlos D.; Gharani, Neda; Gibbs, Richard A.; Jorde, Lynn; Kaye, Jane S.; Kent, Alastair; Li, Taosha; McGuire, 
Amy L.; McVean, Gil A.; Ossorio, Pilar N.; Rotimi, Charles N.; Su, Yeyang; Toji, Lorraine H.; Tyler-Smith, Chris; Brooks, Lisa D.; Felsenfeld, Adam L.; McEwen, Jean E.; Abdallah, Assya; Juenger, Christopher R.; Clemm, Nicholas C.; Collins, Francis S.; Duncanson, Audrey; Green, Eric D.; Guyer, Mark S.; Peterson, Jane L.; Schafer, Alan J.; Abecasis, Gonçalo R.; Altshuler, David L.; Auton, Adam; Brooks, Lisa D.; Durbin, Richard M.; Gibbs, Richard A.; Hurles, Matt E.; McVean, Gil A.

2011-01-01

High-throughput sequencing technology enables population-level surveys of human genomic variation. Here, we examine the joint allele frequency distributions across continental human populations and present an approach for combining complementary aspects of whole-genome, low-coverage data and targeted high-coverage data. We apply this approach to data generated by the pilot phase of the Thousand Genomes Project, including whole-genome 2–4× coverage data for 179 samples from HapMap European, Asian, and African panels as well as high-coverage target sequencing of the exons of 800 genes from 697 individuals in seven populations. We use the site frequency spectra obtained from these data to infer demographic parameters for an Out-of-Africa model for populations of African, European, and Asian descent and to predict, by a jackknife-based approach, the amount of genetic diversity that will be discovered as sample sizes are increased. We predict that the number of discovered nonsynonymous coding variants will reach 100,000 in each population after ∼1,000 sequenced chromosomes per population, whereas ∼2,500 chromosomes will be needed for the same number of synonymous variants. Beyond this point, the number of segregating sites in the European and Asian panel populations is expected to overcome that of the African panel because of faster recent population growth. Overall, we find that the majority of human genomic variable sites are rare and exhibit little sharing among diverged populations. Our results emphasize that replication of disease association for specific rare genetic variants across diverged populations must overcome both reduced statistical power because of rarity and higher population divergence. PMID:21730125
A rationally designed six-residue swap generates comparability in the aggregation behavior of α-synuclein and β-synuclein.

PubMed

Roodveldt, Cintia; Andersson, August; De Genst, Erwin J; Labrador-Garrido, Adahir; Buell, Alexander K; Dobson, Christopher M; Tartaglia, Gian Gaetano; Vendruscolo, Michele

2012-11-06

The aggregation process of α-synuclein, a protein closely associated with Parkinson's disease, is highly sensitive to sequence variations. It is therefore of great importance to understand the factors that define the aggregation propensity of specific mutational variants as well as their toxic behavior in the cellular environment. In this context, we investigated the extent to which the aggregation behavior of α-synuclein can be altered to resemble that of β-synuclein, an aggregation-resistant homologue of α-synuclein not associated with disease, by swapping residues between the two proteins. Because of the vast number of possible swaps, we have applied a rational design procedure to single out a mutational variant, called α2β, in which two short stretches of the sequence in the NAC region have been replaced in α-synuclein from β-synuclein. We find not only that the aggregation rate of α2β is close to that of β-synuclein, being much lower than that of α-synuclein, but also that α2β effectively changes the cellular toxicity of α-synuclein to a value similar to that of β-synuclein upon exposure of SH-SY5Y cells to preformed oligomers. Remarkably, control experiments on the corresponding mutational variant of β-synuclein, called β2α, confirmed that the mutations that we have identified also shift the aggregation behavior of this protein toward that of α-synuclein. These results demonstrate that it is becoming possible to control in quantitative detail the sequence code that defines the aggregation behavior and toxicity of α-synuclein.
Exome sequencing and genome-wide linkage analysis in 17 families illustrate the complex contribution of TTN truncating variants to dilated cardiomyopathy.

PubMed

Norton, Nadine; Li, Duanxiang; Rampersaud, Evadnie; Morales, Ana; Martin, Eden R; Zuchner, Stephan; Guo, Shengru; Gonzalez, Michael; Hedges, Dale J; Robertson, Peggy D; Krumm, Niklas; Nickerson, Deborah A; Hershberger, Ray E

2013-04-01

BACKGROUND: Familial dilated cardiomyopathy (DCM) is a genetically heterogeneous disease with >30 known genes. TTN truncating variants were recently implicated in a candidate gene study to cause 25% of familial and 18% of sporadic DCM cases. METHODS AND RESULTS: We used an unbiased genome-wide approach using both linkage analysis and variant filtering across the exome sequences of 48 individuals affected with DCM from 17 families to identify genetic cause. Linkage analysis ranked the TTN region as falling under the second highest genome-wide multipoint linkage peak, multipoint logarithm of odds, 1.59. We identified 6 TTN truncating variants carried by individuals affected with DCM in 7 of 17 DCM families (logarithm of odds, 2.99); 2 of these 7 families also had novel missense variants that segregated with disease. Two additional novel truncating TTN variants did not segregate with DCM. Nucleotide diversity at the TTN locus, including missense variants, was comparable with 5 other known DCM genes. The average number of missense variants in the exome sequences from the DCM cases or the ≈5400 cases from the Exome Sequencing Project was ≈23 per individual. The average number of TTN truncating variants in the Exome Sequencing Project was 0.014 per individual. We also identified a region (chr9q21.11-q22.31) with no known DCM genes with a maximum heterogeneity logarithm of odds score of 1.74. CONCLUSIONS: These data suggest that TTN truncating variants contribute to DCM cause. However, the lack of segregation of all identified TTN truncating variants illustrates the challenge of determining variant pathogenicity even with full exome sequencing.
Sensitivity of BRCA1/2 testing in high-risk breast/ovarian/male breast cancer families: little contribution of comprehensive RNA/NGS panel testing.

PubMed

Byers, Helen; Wallis, Yvonne; van Veen, Elke M; Lalloo, Fiona; Reay, Kim; Smith, Philip; Wallace, Andrew J; Bowers, Naomi; Newman, William G; Evans, D Gareth

2016-11-01

The sensitivity of testing BRCA1 and BRCA2 remains unresolved as the frequency of deep intronic splicing variants has not been defined in high-risk familial breast/ovarian cancer families. This variant category is reported at significant frequency in other tumour predisposition genes, including NF1 and MSH2. We carried out comprehensive whole gene RNA analysis on 45 high-risk breast/ovary and male breast cancer families with no identified pathogenic variant on exonic sequencing and copy number analysis of BRCA1/2. In addition, we undertook variant screening of a 10-gene high/moderate risk breast/ovarian cancer panel by next-generation sequencing. DNA testing identified the causative variant in 50/56 (89%) breast/ovarian/male breast cancer families with Manchester scores of ≥50 with two variants being confirmed to affect splicing on RNA analysis. RNA sequencing of BRCA1/BRCA2 on 45 individuals from high-risk families identified no deep intronic variants and did not suggest loss of RNA expression as a cause of lost sensitivity. Panel testing in 42 samples identified a known RAD51D variant, a high-risk ATM variant in another breast ovary family and a truncating CHEK2 mutation. Current exonic sequencing and copy number analysis variant detection methods of BRCA1/2 have high sensitivity in high-risk breast/ovarian cancer families. Sequence analysis of RNA does not identify any variants undetected by current analysis of BRCA1/2. However, RNA analysis clarified the pathogenicity of variants of unknown significance detected by current methods. The low diagnostic uplift achieved through sequence analysis of the other known breast/ovarian cancer susceptibility genes indicates that further high-risk genes remain to be identified.
Identification of protein-damaging mutations in 10 swine taste receptors and 191 appetite-reward genes.

PubMed

Clop, Alex; Sharaf, Abdoallah; Castelló, Anna; Ramos-Onsins, Sebastián; Cirera, Susanna; Mercadé, Anna; Derdak, Sophia; Beltran, Sergi; Huisman, Abe; Fredholm, Merete; van As, Pieter; Sánchez, Armand

2016-08-26

Taste receptors (TASRs) are essential for the body's recognition of chemical compounds. In the tongue, TASRs sense the sweet and umami and the toxin-related bitter taste thus promoting a particular eating behaviour. Moreover, their relevance in other organs is now becoming evident. In the intestine, they regulate nutrient absorption and gut motility. Upon ligand binding, TASRs activate the appetite-reward circuitry to signal the nervous system and keep body homeostasis. With the aim to identify genetic variation in the swine TASRs and in the genes from the appetite and the reward pathways, we have sequenced the exons of 201 TASRs and appetite-reward genes from 304 pigs belonging to ten breeds, wild boars and to two phenotypically extreme groups from an F2 resource with data on growth and fat deposition. We identified 2,766 coding variants, 395 of which were predicted to have a strong impact on protein sequence and function. 334 variants were present in only one breed and at predicted alternative allele frequency (pAAF) ≥ 0.1. The Asian pigs and the wild boars showed the largest proportion of breed specific variants. We also compared the pAAF of the two F2 groups and found that variants in TAS2R39 and CD36 display significant differences suggesting that these genes could influence growth and fat deposition. We developed a 128-variant genotyping assay and confirmed 57 of these variants. We have identified thousands of variants affecting TASRs as well as genes involved in the appetite and the reward mechanisms. Some of these genes have been already associated with taste preferences, appetite or behaviour in humans and mice. We have also detected indications of a potential relationship of some of these genes with growth and fat deposition, which could have been caused by changes in taste preferences, appetite or reward and ultimately impact on food intake. A genotyping array with 57 variants in 31 of these genes is now available for genotyping and to start elucidating the impact of genetic variation in these genes on pig biology and breeding.
Creating reference gene annotation for the mouse C57BL6/J genome assembly.

PubMed

Mudge, Jonathan M; Harrow, Jennifer

2015-10-01

Annotation on the reference genome of the C57BL6/J mouse has been an ongoing project ever since the draft genome was first published. Initially, the principal focus was on the identification of all protein-coding genes, although today the importance of describing long non-coding RNAs, small RNAs, and pseudogenes is recognized. Here, we describe the progress of the GENCODE mouse annotation project, which combines manual annotation from the HAVANA group with Ensembl computational annotation, alongside experimental and in silico validation pipelines from other members of the consortium. We discuss the more recent incorporation of next-generation sequencing datasets into this workflow, including the usage of mass-spectrometry data to potentially identify novel protein-coding genes. Finally, we will outline how the C57BL6/J genebuild can be used to gain insights into the variant sites that distinguish different mouse strains and species.

Multiplexed resequencing analysis to identify rare variants in pooled DNA with barcode indexing using next-generation sequencer.

PubMed

Mitsui, Jun; Fukuda, Yoko; Azuma, Kyo; Tozaki, Hirokazu; Ishiura, Hiroyuki; Takahashi, Yuji; Goto, Jun; Tsuji, Shoji

2010-07-01

We have recently found that multiple rare variants of the glucocerebrosidase gene (GBA) confer a robust risk for Parkinson disease, supporting the 'common disease-multiple rare variants' hypothesis. To develop an efficient method of identifying rare variants in a large number of samples, we applied multiplexed resequencing using a next-generation sequencer to identification of rare variants of GBA. Sixteen sets of pooled DNAs from six pooled DNA samples were prepared. Each set of pooled DNAs was subjected to polymerase chain reaction to amplify the target gene (GBA) covering 6.5 kb, pooled into one tube with barcode indexing, and then subjected to extensive sequence analysis using the SOLiD System. Individual samples were also subjected to direct nucleotide sequence analysis. With the optimization of data processing, we were able to extract all the variants from 96 samples with acceptable rates of false-positive single-nucleotide variants.
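The pooling arithmetic behind this design can be sketched as follows. This is a minimal illustration that assumes each pool mixes six diploid samples in equal amounts, so that 16 pools cover the 96 samples mentioned above; the function name and the equimolar assumption are not from the abstract.

```python
def expected_alt_fraction(het_carriers, hom_carriers, pool_size, ploidy=2):
    """Expected fraction of reads carrying the variant allele in one pool.

    Assumes equimolar pooling and unbiased amplification, which real
    experiments only approximate.
    """
    if pool_size <= 0:
        raise ValueError("pool_size must be positive")
    if het_carriers + hom_carriers > pool_size:
        raise ValueError("more carriers than samples in the pool")
    alt_alleles = het_carriers + ploidy * hom_carriers
    return alt_alleles / (ploidy * pool_size)

# One heterozygous carrier among six pooled diploid samples contributes
# 1 of 12 chromosomes, so the variant is expected in roughly 8% of reads.
single_carrier_fraction = expected_alt_fraction(1, 0, 6)
```

Under these assumptions, any sequencing error rate approaching one in twelve would make a single carrier indistinguishable from noise, which is why the abstract stresses optimized data processing and false-positive rates.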
SNPnexus: assessing the functional relevance of genetic variation to facilitate the promise of precision medicine.

PubMed

Dayem Ullah, Abu Z; Oscanoa, Jorge; Wang, Jun; Nagano, Ai; Lemoine, Nicholas R; Chelala, Claude

2018-05-11

Broader functional annotation of genetic variation is a valuable means for prioritising phenotypically-important variants in further disease studies and large-scale genotyping projects. We developed SNPnexus to meet this need by assessing the potential significance of known and novel SNPs on the major transcriptome, proteome, regulatory and structural variation models. Since its previous release in 2012, we have made significant improvements to the annotation categories and updated the query and data viewing systems. The most notable changes include broader functional annotation of noncoding variants and expanding annotations to the most recent human genome assembly GRCh38/hg38. SNPnexus has now integrated rich resources from ENCODE and Roadmap Epigenomics Consortium to map and annotate the noncoding variants onto different classes of regulatory regions and noncoding RNAs as well as providing their predicted functional impact from eight popular non-coding variant scoring algorithms and computational methods. A novel functionality offered now is the support for neo-epitope predictions from leading tools to facilitate its use in immunotherapeutic applications. These updates to SNPnexus are in preparation for its future expansion towards a fully comprehensive computational workflow for disease-associated variant prioritization from sequencing data, placing its users at the forefront of translational research. SNPnexus is freely available at http://www.snp-nexus.org.
A Perfect Match Genomic Landscape Provides a Unified Framework for the Precise Detection of Variation in Natural and Synthetic Haploid Genomes

PubMed Central

Palacios-Flores, Kim; García-Sotelo, Jair; Castillo, Alejandra; Uribe, Carina; Aguilar, Luis; Morales, Lucía; Gómez-Romero, Laura; Reyes, José; Garciarubio, Alejandro; Boege, Margareta; Dávila, Guillermo

2018-01-01

We present a conceptually simple, sensitive, precise, and essentially nonstatistical solution for the analysis of genome variation in haploid organisms. The generation of a Perfect Match Genomic Landscape (PMGL), which computes intergenome identity with single nucleotide resolution, reveals signatures of variation wherever a query genome differs from a reference genome. Such signatures encode the precise location of different types of variants, including single nucleotide variants, deletions, insertions, and amplifications, effectively introducing the concept of a general signature of variation. The precise nature of variants is then resolved through the generation of targeted alignments between specific sets of sequence reads and known regions of the reference genome. Thus, the perfect match logic decouples the identification of the location of variants from the characterization of their nature, providing a unified framework for the detection of genome variation. We assessed the performance of the PMGL strategy via simulation experiments. We determined the variation profiles of natural genomes and of a synthetic chromosome, both in the context of haploid yeast strains. Our approach uncovered variants that have previously escaped detection. Moreover, our strategy is ideally suited for further refining high-quality reference genomes. The source codes for the automated PMGL pipeline have been deposited in a public repository. PMID:29367403
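The abstract does not spell out how the landscape is computed. The sketch below assumes a k-mer formulation: for every position in the reference, count how many times the k-mer starting there occurs exactly in the query reads. All function names are hypothetical, and this toy version ignores reverse complements, sequencing errors, and repeats.

```python
from collections import Counter

def kmer_counts(reads, k):
    """Count every k-mer that occurs in the query reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def perfect_match_landscape(reference, reads, k):
    """For each reference position, the number of exact k-mer matches."""
    counts = kmer_counts(reads, k)
    return [counts[reference[i:i + k]]
            for i in range(len(reference) - k + 1)]

def zero_runs(landscape):
    """Return (start, end) index pairs of consecutive zero-count positions."""
    runs, start = [], None
    for i, value in enumerate(landscape):
        if value == 0 and start is None:
            start = i
        elif value != 0 and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(landscape) - 1))
    return runs

# Toy example: a single-nucleotide change at reference index 12.
reference = "ACGTTGCAAGGCTTACGGATCCATGA"
query = reference[:12] + "A" + reference[13:]
landscape = perfect_match_landscape(reference, [query], k=5)
signature = zero_runs(landscape)  # a run of k zeros ending at the variant
```

In this toy model a single-nucleotide variant leaves a gap of exactly k zero-count positions whose last index is the variant site. This matches the abstract's point that the location of a variant can be found before its nature is characterized.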
A novel variant in the SLC12A1 gene in two families with antenatal Bartter syndrome.

PubMed

Breinbjerg, Anders; Siggaard Rittig, Charlotte; Gregersen, Niels; Rittig, Søren; Hvarregaard Christensen, Jane

2017-01-01

Bartter syndrome is an autosomal-recessive inherited disease in which patients present with hypokalaemia and metabolic alkalosis. We present two apparently nonrelated cases with antenatal Bartter syndrome type I, due to a novel variant in the SLC12A1 gene encoding the bumetanide-sensitive sodium-(potassium)-chloride cotransporter 2 in the thick ascending limb of the loop of Henle. Blood samples were received from the two cases and 19 of their relatives, and deoxyribonucleic acid was extracted. The coding regions of the SLC12A1 gene were amplified using polymerase chain reaction, followed by bidirectional direct deoxyribonucleic acid sequencing. Each affected child in the two families was homozygous for a novel inherited variant in the SLC12A1 gene, c.1614T>A. The variant predicts a change from a tyrosine codon to a stop codon (p.Tyr538Ter). The two cases presented antenatally and at six months of age, respectively. The two cases were homozygous for the same variant in the SLC12A1 gene, but presented clinically at different ages. This could possibly be explained by the presence of other gene variants or environmental factors modifying the phenotypes. The phenotypes of the patients were similar to other patients with antenatal Bartter syndrome. ©2016 Foundation Acta Paediatrica. Published by John Wiley & Sons Ltd.
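The step from c.1614T>A to p.Tyr538Ter can be checked with a little arithmetic. The sketch below assumes that coding position 1 is the first base of the start codon. The reference codon TAT is inferred from the standard genetic code (tyrosine is encoded by TAT or TAC, and only TAT has T in the third position); the abstract does not state it. Function names are hypothetical.

```python
# Minimal codon table: only the codons needed for this example.
CODON_TABLE = {"TAT": "Tyr", "TAC": "Tyr",
               "TAA": "Ter", "TAG": "Ter", "TGA": "Ter"}

def codon_position(cdna_pos):
    """Map a 1-based coding position to (codon number, position in codon)."""
    if cdna_pos < 1:
        raise ValueError("coding positions are 1-based")
    return (cdna_pos - 1) // 3 + 1, (cdna_pos - 1) % 3 + 1

def substitute(codon, within, alt_base):
    """Return the codon with the base at 1-based position `within` replaced."""
    bases = list(codon)
    bases[within - 1] = alt_base
    return "".join(bases)

codon_number, within = codon_position(1614)     # (538, 3)
ref_codon = "TAT"                               # inferred, see text above
alt_codon = substitute(ref_codon, within, "A")  # "TAA"
print(f"p.{CODON_TABLE[ref_codon]}{codon_number}{CODON_TABLE[alt_codon]}")
# prints: p.Tyr538Ter
```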
Allele-Specific Methylation Occurs at Genetic Variants Associated with Complex Disease

PubMed Central

Hutchinson, John N.; Raj, Towfique; Fagerness, Jes; Stahl, Eli; Viloria, Fernando T.; Gimelbrant, Alexander; Seddon, Johanna; Daly, Mark; Chess, Andrew; Plenge, Robert

2014-01-01

We hypothesize that the phenomenon of allele-specific methylation (ASM) may underlie the phenotypic effects of multiple variants identified by Genome-Wide Association studies (GWAS). We evaluate ASM in a human population and document its genome-wide patterns in an initial screen at up to 380,678 sites within the genome, or up to 5% of the total genomic CpGs. We show that while substantial inter-individual variation exists, 5% of assessed sites show evidence of ASM in at least six samples; the majority of these events (81%) are under genetic influence. Many of these cis-regulated ASM variants are also eQTLs in peripheral blood mononuclear cells and monocytes and/or in high linkage-disequilibrium with variants linked to complex disease. Finally, focusing on autoimmune phenotypes, we extend this initial screen to confirm the association of cis-regulated ASM with multiple complex disease-associated variants in an independent population using next-generation bisulfite sequencing. These four variants are implicated in complex phenotypes such as ulcerative colitis and AIDS progression disease (rs10491434), Celiac disease (rs2762051), Crohn's disease, IgA nephropathy and early-onset inflammatory bowel disease (rs713875) and height (rs6569648). Our results suggest cis-regulated ASM may provide a mechanistic link between the non-coding genetic changes and phenotypic variation observed in these diseases and further suggests a route to integrating DNA methylation status with GWAS results. PMID:24911414
VarDict: a novel and versatile variant caller for next-generation sequencing in cancer research

PubMed Central

Lai, Zhongwu; Markovets, Aleksandra; Ahdesmaki, Miika; Chapman, Brad; Hofmann, Oliver; McEwen, Robert; Johnson, Justin; Dougherty, Brian; Barrett, J. Carl; Dry, Jonathan R.

2016-01-01

Accurate variant calling in next generation sequencing (NGS) is critical to understand cancer genomes better. Here we present VarDict, a novel and versatile variant caller for both DNA- and RNA-sequencing data. VarDict simultaneously calls SNV, MNV, InDels, complex and structural variants, expanding the detected genetic driver landscape of tumors. It performs local realignments on the fly for more accurate allele frequency estimation. VarDict performance scales linearly with sequencing depth, enabling ultra-deep sequencing used to explore tumor evolution or detect tumor DNA circulating in blood. In addition, VarDict performs amplicon aware variant calling for polymerase chain reaction (PCR)-based targeted sequencing often used in diagnostic settings, and is able to detect PCR artifacts. Finally, VarDict also detects differences in somatic and loss of heterozygosity variants between paired samples. VarDict reprocessing of The Cancer Genome Atlas (TCGA) Lung Adenocarcinoma dataset called known driver mutations in KRAS, EGFR, BRAF, PIK3CA and MET in 16% more patients than previously published variant calls. We believe VarDict will greatly facilitate application of NGS in clinical cancer research. PMID:27060149
Finding cancer driver mutations in the era of big data research.

PubMed

Poulos, Rebecca C; Wong, Jason W H

2018-04-02

In the last decade, the costs of genome sequencing have decreased considerably. The commencement of large-scale cancer sequencing projects has enabled cancer genomics to join the big data revolution. One of the challenges still facing cancer genomics research is determining which are the driver mutations in an individual cancer, as these contribute only a small subset of the overall mutation profile of a tumour. Focusing primarily on somatic single nucleotide mutations in this review, we consider both coding and non-coding driver mutations, and discuss how such mutations might be identified from cancer sequencing datasets. We describe some of the tools and databases that are available for the annotation of somatic variants and the identification of cancer driver genes. We also address the use of genome-wide variation in mutation load to establish background mutation rates from which to identify driver mutations under positive selection. Finally, we describe the ways in which mutational signatures can act as clues for the identification of cancer drivers, as these mutations may cause, or arise from, certain mutational processes. By defining the molecular changes responsible for driving cancer development, new cancer treatment strategies may be developed or novel preventative measures proposed.
FAVR (Filtering and Annotation of Variants that are Rare): methods to facilitate the analysis of rare germline genetic variants from massively parallel sequencing datasets

PubMed Central

2013-01-01

Background: Characterising genetic diversity through the analysis of massively parallel sequencing (MPS) data offers enormous potential to significantly improve our understanding of the genetic basis for observed phenotypes, including predisposition to and progression of complex human disease. Great challenges remain in resolving genetic variants that are genuine from the millions of artefactual signals. Results: FAVR is a suite of new methods designed to work with commonly used MPS analysis pipelines to assist in the resolution of some of the issues related to the analysis of the vast amount of resulting data, with a focus on relatively rare genetic variants. To the best of our knowledge, no equivalent method has previously been described. The most important and novel aspect of FAVR is the use of signatures in comparator sequence alignment files during variant filtering, and annotation of variants potentially shared between individuals. The FAVR methods use these signatures to facilitate filtering of (i) platform and/or mapping-specific artefacts, (ii) common genetic variants, and, where relevant, (iii) artefacts derived from imbalanced paired-end sequencing, as well as annotation of genetic variants based on evidence of co-occurrence in individuals. We applied conventional variant calling to whole-exome sequencing datasets, produced using both SOLiD and TruSeq chemistries, with or without downstream processing by FAVR methods. We demonstrate a 3-fold smaller rare single nucleotide variant shortlist with no detected reduction in sensitivity. This analysis included Sanger sequencing of rare variant signals not evident in dbSNP131, assessment of known variant signal preservation, and comparison of observed and expected rare variant numbers across a range of first cousin pairs.
The principles described herein were applied in our recent publication identifying XRCC2 as a new breast cancer risk gene and have been made publicly available as a suite of software tools. Conclusions: FAVR is a platform-agnostic suite of methods that significantly enhances the analysis of large volumes of sequencing data for the study of rare genetic variants and their influence on phenotypes. PMID:23441864
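The comparator-based filtering idea can be illustrated with a toy example. The sketch below is not the FAVR implementation: it assumes that the variant signatures seen in each comparator sample have already been extracted into sets of (chromosome, position, ref, alt) tuples, and it drops any candidate seen in more than a chosen number of comparators. The function name, data layout, threshold, and coordinates are all hypothetical.

```python
def filter_by_comparators(candidates, comparator_signatures, max_shared=0):
    """Keep candidates seen in at most `max_shared` comparator samples.

    candidates: iterable of (chrom, pos, ref, alt) tuples.
    comparator_signatures: dict mapping sample name -> set of such tuples.
    A signal recurring across unrelated comparators is more likely a
    platform or mapping artefact, or a common variant, than a rare one.
    """
    kept = []
    for variant in candidates:
        n_seen = sum(1 for signatures in comparator_signatures.values()
                     if variant in signatures)
        if n_seen <= max_shared:
            kept.append(variant)
    return kept

# Made-up coordinates, for illustration only.
candidates = [("chr1", 1000, "G", "A"), ("chr2", 2000, "C", "T")]
comparators = {
    "comparator_1": {("chr2", 2000, "C", "T")},
    "comparator_2": {("chr2", 2000, "C", "T")},
}
shortlist = filter_by_comparators(candidates, comparators)
```

Raising `max_shared` trades a longer shortlist for a lower risk of discarding a genuine variant that happens to occur in a comparator.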
Clinical Validation and Implementation of a Targeted Next-Generation Sequencing Assay to Detect Somatic Variants in Non-Small Cell Lung, Melanoma, and Gastrointestinal Malignancies

PubMed Central

Fisher, Kevin E.; Zhang, Linsheng; Wang, Jason; Smith, Geoffrey H.; Newman, Scott; Schneider, Thomas M.; Pillai, Rathi N.; Kudchadkar, Ragini R.; Owonikoko, Taofeek K.; Ramalingam, Suresh S.; Lawson, David H.; Delman, Keith A.; El-Rayes, Bassel F.; Wilson, Malania M.; Sullivan, H. Clifford; Morrison, Annie S.; Balci, Serdar; Adsay, N. Volkan; Gal, Anthony A.; Sica, Gabriel L.; Saxe, Debra F.; Mann, Karen P.; Hill, Charles E.; Khuri, Fadlo R.; Rossi, Michael R.

2017-01-01

We tested and clinically validated a targeted next-generation sequencing (NGS) mutation panel using 80 formalin-fixed, paraffin-embedded (FFPE) tumor samples. Forty non-small cell lung carcinoma (NSCLC), 30 melanoma, and 30 gastrointestinal (12 colonic, 10 gastric, and 8 pancreatic adenocarcinoma) FFPE samples were selected from laboratory archives. After appropriate specimen and nucleic acid quality control, 80 NGS libraries were prepared using the Illumina TruSight tumor (TST) kit and sequenced on the Illumina MiSeq. Sequence alignment, variant calling, and sequencing quality control were performed using vendor software and laboratory-developed analysis workflows. TST generated ≥500× coverage for 98.4% of the 13,952 targeted bases. Reproducible and accurate variant calling was achieved at ≥5% variant allele frequency with 8 to 12 multiplexed samples per MiSeq flow cell. TST detected 112 variants overall, and confirmed all known single-nucleotide variants (n = 27), deletions (n = 5), insertions (n = 3), and multinucleotide variants (n = 3). TST detected at least one variant in 85.0% (68/80), and two or more variants in 36.2% (29/80), of samples. TP53 was the most frequently mutated gene in NSCLC (13 variants; 13/32 samples), gastrointestinal malignancies (15 variants; 13/25 samples), and overall (30 variants; 28/80 samples). BRAF mutations were most common in melanoma (nine variants; 9/23 samples). Clinically relevant NGS data can be obtained from routine clinical FFPE solid tumor specimens using TST, benchtop instruments, and vendor-supplied bioinformatics pipelines. PMID:26801070
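The two thresholds quoted above, ≥5% variant allele frequency and ≥500× coverage, can be combined into a simple per-site check. This is a minimal sketch with hypothetical function names. Treating 500× as a hard per-site filter is an assumption made for illustration, since the abstract reports it only as a coverage statistic.

```python
def variant_allele_frequency(alt_reads, total_reads):
    """Fraction of reads at a site that support the variant allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= alt_reads <= total_reads:
        raise ValueError("alt_reads must lie in [0, total_reads]")
    return alt_reads / total_reads

def passes_thresholds(alt_reads, total_reads, min_vaf=0.05, min_depth=500):
    """True if the site meets both the depth and the VAF threshold."""
    if total_reads < min_depth:
        return False
    return variant_allele_frequency(alt_reads, total_reads) >= min_vaf

# At exactly 500x depth, a 5% threshold corresponds to 25 supporting reads.
min_supporting_reads = int(0.05 * 500)
```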
Exome sequencing identifies rare LDLR and APOA5 alleles conferring risk for myocardial infarction.

PubMed

Do, Ron; Stitziel, Nathan O; Won, Hong-Hee; Jørgensen, Anders Berg; Duga, Stefano; Angelica Merlini, Pier; Kiezun, Adam; Farrall, Martin; Goel, Anuj; Zuk, Or; Guella, Illaria; Asselta, Rosanna; Lange, Leslie A; Peloso, Gina M; Auer, Paul L; Girelli, Domenico; Martinelli, Nicola; Farlow, Deborah N; DePristo, Mark A; Roberts, Robert; Stewart, Alexander F R; Saleheen, Danish; Danesh, John; Epstein, Stephen E; Sivapalaratnam, Suthesh; Hovingh, G Kees; Kastelein, John J; Samani, Nilesh J; Schunkert, Heribert; Erdmann, Jeanette; Shah, Svati H; Kraus, William E; Davies, Robert; Nikpay, Majid; Johansen, Christopher T; Wang, Jian; Hegele, Robert A; Hechter, Eliana; Marz, Winfried; Kleber, Marcus E; Huang, Jie; Johnson, Andrew D; Li, Mingyao; Burke, Greg L; Gross, Myron; Liu, Yongmei; Assimes, Themistocles L; Heiss, Gerardo; Lange, Ethan M; Folsom, Aaron R; Taylor, Herman A; Olivieri, Oliviero; Hamsten, Anders; Clarke, Robert; Reilly, Dermot F; Yin, Wu; Rivas, Manuel A; Donnelly, Peter; Rossouw, Jacques E; Psaty, Bruce M; Herrington, David M; Wilson, James G; Rich, Stephen S; Bamshad, Michael J; Tracy, Russell P; Cupples, L Adrienne; Rader, Daniel J; Reilly, Muredach P; Spertus, John A; Cresci, Sharon; Hartiala, Jaana; Tang, W H Wilson; Hazen, Stanley L; Allayee, Hooman; Reiner, Alex P; Carlson, Christopher S; Kooperberg, Charles; Jackson, Rebecca D; Boerwinkle, Eric; Lander, Eric S; Schwartz, Stephen M; Siscovick, David S; McPherson, Ruth; Tybjaerg-Hansen, Anne; Abecasis, Goncalo R; Watkins, Hugh; Nickerson, Deborah A; Ardissino, Diego; Sunyaev, Shamil R; O'Donnell, Christopher J; Altshuler, David; Gabriel, Stacey; Kathiresan, Sekar

2015-02-05

Myocardial infarction (MI), a leading cause of death around the world, displays a complex pattern of inheritance. When MI occurs early in life, genetic inheritance is a major component to risk. Previously, rare mutations in low-density lipoprotein (LDL) genes have been shown to contribute to MI risk in individual families, whereas common variants at more than 45 loci have been associated with MI risk in the population. Here we evaluate how rare mutations contribute to early-onset MI risk in the population. We sequenced the protein-coding regions of 9,793 genomes from patients with MI at an early age (≤50 years in males and ≤60 years in females) along with MI-free controls. We identified two genes in which rare coding-sequence mutations were more frequent in MI cases versus controls at exome-wide significance. At low-density lipoprotein receptor (LDLR), carriers of rare non-synonymous mutations were at 4.2-fold increased risk for MI; carriers of null alleles at LDLR were at even higher risk (13-fold difference). Approximately 2% of early MI cases harbour a rare, damaging mutation in LDLR; this estimate is similar to one made more than 40 years ago using an analysis of total cholesterol. Among controls, about 1 in 217 carried an LDLR coding-sequence mutation and had plasma LDL cholesterol > 190 mg dl⁻¹. At apolipoprotein A-V (APOA5), carriers of rare non-synonymous mutations were at 2.2-fold increased risk for MI. When compared with non-carriers, LDLR mutation carriers had higher plasma LDL cholesterol, whereas APOA5 mutation carriers had higher plasma triglycerides. Recent evidence has connected MI risk with coding-sequence mutations at two genes functionally related to APOA5, namely lipoprotein lipase and apolipoprotein C-III (refs 18, 19). Combined, these observations suggest that, as well as LDL cholesterol, disordered metabolism of triglyceride-rich lipoproteins contributes to MI risk.
Uncovering the Rare Variants of DLC1 Isoform 1 and Their Functional Effects in a Chinese Sporadic Congenital Heart Disease Cohort

PubMed Central

Wang, Zhen; Tan, Huilian; Kong, Xianghua; Shu, Yang; Zhang, Yuchao; Huang, Yun; Zhu, Yufei; Xu, Heng; Wang, Zhiqiang; Wang, Ping; Ning, Guang; Kong, Xiangyin; Hu, Guohong; Hu, Landian

2014-01-01

Congenital heart disease (CHD) is the most common birth defect affecting the structure and function of fetal hearts. Despite decades of extensive studies, the genetic mechanism of sporadic CHD remains obscure. Deleted in liver cancer 1 (DLC1) gene, encoding a GTPase-activating protein, is highly expressed in heart and essential for heart development according to the knowledge of Dlc1-deficient mice. To determine whether DLC1 is a susceptibility gene for sporadic CHD, we sequenced the coding region of DLC1 isoform 1 in 151 sporadic CHD patients and identified 13 non-synonymous rare variants (including 6 private variants) in the case cohort. Importantly, these rare variants (8/13) were enriched in the N-terminal region of the DLC1 isoform 1 protein. Seven of eight amino acids at the N-terminal variant positions were conserved among the primates. Among the 9 rare variants that were predicted as “damaging”, five were located at the N-terminal region. Ensuing in vitro functional assays showed that three private variants (Met360Lys, Glu418Lys and Asp554Val) impaired the ability of DLC1 to inhibit cell migration or altered the subcellular location of the protein compared to wild-type DLC1 isoform 1. These data suggest that DLC1 might act as a CHD-associated gene in addition to its role as a tumor suppressor in cancer. PMID:24587289
Mutational Analysis of TAC3 and TACR3 Genes in Patients with Idiopathic Central Pubertal Disorders

PubMed Central

Tusset, Cintia; Noel, Sekoni D.; Trarbach, Ericka B.; Silveira, Letícia F. G.; Jorge, Alexander A. L.; Brito, Vinicius N.; Cukier, Priscila; Seminara, Stephanie B.; de Mendonça, Berenice B.; Kaiser, Ursula B.; Latronico, Ana Claudia

2013-01-01

Aim: To investigate the presence of variants in the TAC3 and TACR3 genes, which encode NKB and its receptor (NK3R), respectively, in a large cohort of patients with idiopathic central pubertal disorders. Patients and Methods: Two hundred and thirty seven patients were studied: 114 with central precocious puberty (CPP), 73 with normosmic isolated hypogonadotropic hypogonadism (IHH) and 50 with constitutional delay of growth and puberty (CDGP). The control group consisted of 150 Brazilian individuals with normal pubertal development. Genomic DNA was extracted from peripheral blood and the entire coding regions of both TAC3 and TACR3 genes were amplified and automatically sequenced. Results: We identified one variant (p.A63P) in NKB and four variants, p.G18D, p.L58L (c.172C>T), p.W275* and p.A449S in NK3R, which were absent in the control group. The p.A63P variant was identified in a girl with CPP, and p.A449S in a girl with CDGP. The known p.G18D, p.L58L and p.W275* variants were identified in three unrelated males with normosmic IHH. Conclusion: Rare variants in the TAC3 and TACR3 genes were identified in patients with central pubertal disorders. Loss-of-function variants of TACR3 were associated with the normosmic IHH phenotype. PMID:23329188
Role of LRRK2 and SNCA in autosomal dominant Parkinson's disease in Turkey.

PubMed

Kessler, Christoph; Atasu, Burcu; Hanagasi, Hasmet; Simón-Sánchez, Javier; Hauser, Ann-Kathrin; Pak, Meltem; Bilgic, Basar; Erginel-Unaltuna, Nihan; Gurvit, Hakan; Gasser, Thomas; Lohmann, Ebba

2018-03-01

Mutations in the LRRK2 and alpha-synuclein (SNCA) genes are well-established causes of autosomal dominant Parkinson's disease (PD). However, their frequency differs widely between ethnic groups. Only three studies have screened all coding regions of LRRK2 and SNCA in European samples so far. In Turkey, the role of LRRK2 in Parkinson's disease has been studied fragmentarily, and the incidence of SNCA copy number variations is unknown. The purpose of this study is to determine the frequency of LRRK2 and SNCA mutations in autosomal dominant PD in Turkey. We performed Sanger sequencing of all coding LRRK2 and SNCA exons in a sample of 91 patients with Parkinsonism. Copy number variations in SNCA, PRKN, PINK1, DJ1 and ATP13A2 were assessed using the MLPA method. All patients had a positive family history compatible with autosomal dominant inheritance. Known mutations in LRRK2 and SNCA were found in 3.3% of cases: one patient harbored the LRRK2 G2019S mutation, and two patients carried a SNCA gene duplication. Furthermore, we found a heterozygous deletion of PRKN exon 2 in one patient, and four rare coding variants of unknown significance (LRRK2: A211V, R1067Q, T2494I; SNCA: T72T). Genetic testing in one affected family identified the LRRK2 R1067Q variant as a possibly pathogenic substitution. Point mutations in LRRK2 and SNCA are a rare cause of autosomal dominant PD in Turkey. However, copy number variations should be considered. The unclassified variants, especially LRRK2 R1067Q, demand further investigation. Copyright © 2017. Published by Elsevier Ltd.
CVD-associated non-coding RNA, ANRIL, modulates expression of atherogenic pathways in VSMC

DOE Office of Scientific and Technical Information (OSTI.GOV)

Congrains, Ada; Kamide, Kei; Katsuya, Tomohiro

Highlights: ► ANRIL maps in the strongest susceptibility locus for cardiovascular disease. ► Silencing of ANRIL leads to altered expression of tissue remodeling-related genes. ► The effects of ANRIL on gene expression are splicing variant specific. ► ANRIL affects progression of cardiovascular disease by regulating proliferation and apoptosis pathways. -- Abstract: ANRIL is a newly discovered non-coding RNA lying on the strongest genetic susceptibility locus for cardiovascular disease (CVD) in the chromosome 9p21 region. Genome-wide association studies have been linking polymorphisms in this locus with CVD and several other major diseases such as diabetes and cancer. The role of this non-coding RNA in atherosclerosis progression is still poorly understood. In this study, we investigated the implication of ANRIL in the modulation of gene sets directly involved in atherosclerosis. We designed and tested siRNA sequences to selectively target two exons (exon 1 and exon 19) of the transcript and successfully knocked down expression of ANRIL in human aortic vascular smooth muscle cells (HuAoVSMC). We used a pathway-focused RT-PCR array to profile gene expression changes caused by ANRIL knock down. Notably, the genes affected by each of the siRNAs were different, suggesting that different splicing variants of ANRIL might have distinct roles in cell physiology. Our results suggest that ANRIL splicing variants play a role in coordinating tissue remodeling, by modulating the expression of genes involved in cell proliferation, apoptosis, extra-cellular matrix remodeling and inflammatory response to finally impact in the risk of cardiovascular disease and other pathologies.
Whole-genome sequence-based analysis of thyroid function.

PubMed

Taylor, Peter N; Porcu, Eleonora; Chew, Shelby; Campbell, Purdey J; Traglia, Michela; Brown, Suzanne J; Mullin, Benjamin H; Shihab, Hashem A; Min, Josine; Walter, Klaudia; Memari, Yasin; Huang, Jie; Barnes, Michael R; Beilby, John P; Charoen, Pimphen; Danecek, Petr; Dudbridge, Frank; Forgetta, Vincenzo; Greenwood, Celia; Grundberg, Elin; Johnson, Andrew D; Hui, Jennie; Lim, Ee M; McCarthy, Shane; Muddyman, Dawn; Panicker, Vijay; Perry, John R B; Bell, Jordana T; Yuan, Wei; Relton, Caroline; Gaunt, Tom; Schlessinger, David; Abecasis, Goncalo; Cucca, Francesco; Surdulescu, Gabriela L; Woltersdorf, Wolfram; Zeggini, Eleftheria; Zheng, Hou-Feng; Toniolo, Daniela; Dayan, Colin M; Naitza, Silvia; Walsh, John P; Spector, Tim; Davey Smith, George; Durbin, Richard; Richards, J Brent; Sanna, Serena; Soranzo, Nicole; Timpson, Nicholas J; Wilson, Scott G

2015-03-06

Normal thyroid function is essential for health, but its genetic architecture remains poorly understood. Here, for the heritable thyroid traits thyrotropin (TSH) and free thyroxine (FT4), we analyse whole-genome sequence data from the UK10K project (N=2,287). Using additional whole-genome sequence and deeply imputed data sets, we report meta-analysis results for common variants (MAF≥1%) associated with TSH and FT4 (N=16,335). For TSH, we identify a novel variant in SYN2 (MAF=23.5%, P=6.15 × 10⁻⁹) and a new independent variant in PDE8B (MAF=10.4%, P=5.94 × 10⁻¹⁴). For FT4, we report a low-frequency variant near B4GALT6/SLC25A52 (MAF=3.2%, P=1.27 × 10⁻⁹) tagging a rare TTR variant (MAF=0.4%, P=2.14 × 10⁻¹¹). All common variants explain ≥20% of the variance in TSH and FT4. Analysis of rare variants (MAF<1%) using sequence kernel association testing reveals a novel association with FT4 in NRG1. Our results demonstrate that increased coverage in whole-genome sequence association studies identifies novel variants associated with thyroid function.
Mitochondrial targeting sequence variants of the CHCHD2 gene are a risk for Lewy body disorders

PubMed Central

Ogaki, Kotaro; Koga, Shunsuke; Heckman, Michael G.; Fiesel, Fabienne C.; Ando, Maya; Labbé, Catherine; Lorenzo-Betancor, Oswaldo; Moussaud-Lamodière, Elisabeth L.; Soto-Ortolaza, Alexandra I.; Walton, Ronald L.; Strongosky, Audrey J.; Uitti, Ryan J.; McCarthy, Allan; Lynch, Timothy; Siuda, Joanna; Opala, Grzegorz; Rudzinska, Monika; Krygowska-Wajs, Anna; Barcikowska, Maria; Czyzewski, Krzysztof; Puschmann, Andreas; Nishioka, Kenya; Funayama, Manabu; Hattori, Nobutaka; Parisi, Joseph E.; Petersen, Ronald C.; Graff-Radford, Neill R.; Boeve, Bradley F.; Springer, Wolfdieter; Wszolek, Zbigniew K.; Dickson, Dennis W.

2015-01-01

Objective: To assess the role of CHCHD2 variants in patients with Parkinson disease (PD) and Lewy body disease (LBD) in Caucasian populations. Methods: All exons of the CHCHD2 gene were sequenced in a US Caucasian patient-control series (878 PD, 610 LBD, and 717 controls). Subsequently, exons 1 and 2 were sequenced in an Irish series (355 PD and 365 controls) and a Polish series (394 PD and 350 controls). Immunohistochemistry and immunofluorescence studies were performed on pathologic LBD cases with rare CHCHD2 variants. Results: We identified 9 rare exonic variants of unknown significance. These variants were more frequent in the combined group of PD and LBD patients compared to controls (0.6% vs 0.1%, p = 0.013). In addition, the presence of any rare variant was more common in patients with LBD (2.5% vs 1.0%, p = 0.050) compared to controls. Eight of these 9 variants were located within the gene's mitochondrial targeting sequence. Conclusions: Although the role of variants of the CHCHD2 gene in PD and LBD remains to be further elucidated, the rare variants in the mitochondrial targeting sequence may be a risk factor for Lewy body disorders, which may link CHCHD2 to other genetic forms of parkinsonism with mitochondrial dysfunction. PMID:26561290
Pooled-DNA Sequencing for Elucidating New Genomic Risk Factors, Rare Variants Underlying Alzheimer's Disease.

PubMed

Jin, Sheng Chih; Benitez, Bruno A; Deming, Yuetiva; Cruchaga, Carlos

2016-01-01

Analyses of genome-wide association studies (GWAS) for complex disorders usually identify common variants with a relatively small effect size that only explain a small proportion of phenotypic heritability. Several studies have suggested that a significant fraction of heritability may be explained by low-frequency (minor allele frequency (MAF) of 1-5%) and rare variants that are not contained in the commercial GWAS genotyping arrays (Schork et al., Curr Opin Genet Dev 19:212, 2009). Rare variants can also have relatively large effects on risk for developing human diseases or disease phenotype (Cruchaga et al., PLoS One 7:e31039, 2012). However, it is necessary to perform next-generation sequencing (NGS) studies in a large population (>4,000 samples) to detect a significant rare-variant association. Several NGS methods, such as custom capture sequencing and amplicon-based sequencing, are designed to screen a small proportion of the genome, but most of these methods are limited in the number of samples that can be multiplexed (i.e. most sequencing kits only provide 96 distinct indexes). Additionally, the sequencing library preparation for 4,000 samples remains expensive, and thus conducting NGS studies with the aforementioned methods is not feasible for most research laboratories. The need for low-cost large scale rare-variant detection makes pooled-DNA sequencing an ideally efficient and cost-effective technique to identify rare variants in target regions by sequencing hundreds to thousands of samples. Our recent work has demonstrated that pooled-DNA sequencing can accurately detect rare variants in targeted regions in multiple DNA samples with high sensitivity and specificity (Jin et al., Alzheimers Res Ther 4:34, 2012).
In these studies we used a well-established pooled-DNA sequencing approach and a computational package, SPLINTER (short indel prediction by large deviation inference and nonlinear true frequency estimation by recursion) (Vallania et al., Genome Res 20:1711, 2010), for accurate identification of rare variants in large DNA pools. Given an average sequencing coverage of 30× per haploid genome, SPLINTER can detect rare variants and short indels up to 4 base pairs (bp) with high sensitivity and specificity (up to 1 haploid allele in a pool as large as 500 individuals). Step-by-step instructions on how to conduct pooled-DNA sequencing experiments and data analyses are described in this chapter.
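The pool-size and coverage figures quoted in this abstract can be turned into a quick back-of-the-envelope calculation. The sketch below is only illustrative arithmetic: it assumes diploid samples, and the function name `pool_design` and its return fields are hypothetical, not part of SPLINTER or of the chapter's protocol.

```python
def pool_design(n_individuals, coverage_per_haploid, ploidy=2):
    """Illustrative arithmetic for a pooled-DNA sequencing design.

    Assumes every individual contributes `ploidy` haploid genomes in
    equal amounts to the pool (an idealization).
    """
    n_haploid = n_individuals * ploidy
    total_coverage = n_haploid * coverage_per_haploid
    return {
        # number of haploid genomes (chromosome copies) in the pool
        "haploid_genomes": n_haploid,
        # total read depth needed at each targeted position
        "total_coverage": total_coverage,
        # frequency of a variant carried by a single haploid genome
        "singleton_frequency": 1 / n_haploid,
        # expected number of reads supporting such a singleton variant
        "expected_singleton_reads": total_coverage / n_haploid,
    }


design = pool_design(n_individuals=500, coverage_per_haploid=30)
print(design)
```

Under these assumptions, a pool of 500 individuals holds 1,000 haploid genomes, so 30× per haploid genome corresponds to 30,000× total depth per position, and a variant present on a single chromosome sits at a frequency of 0.1% with about 30 supporting reads expected.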
Host genetic variation in mucosal immunity pathways influences the upper airway microbiome.

PubMed

Igartua, Catherine; Davenport, Emily R; Gilad, Yoav; Nicolae, Dan L; Pinto, Jayant; Ober, Carole

2017-02-01

The degree to which host genetic variation can modulate microbial communities in humans remains an open question. Here, we performed a genetic mapping study of the microbiome in two accessible upper airway sites, the nasopharynx and the nasal vestibule, during two seasons in 144 adult members of a founder population of European descent. We estimated the relative abundances (RAs) of genus level bacteria from 16S rRNA gene sequences and examined associations with 148,653 genetic variants (linkage disequilibrium [LD] r² < 0.5) selected from among all common variants discovered in genome sequences in this population. We identified 37 microbiome quantitative trait loci (mbQTLs) that showed evidence of association with the RAs of 22 genera (q < 0.05) and were enriched for genes in mucosal immunity pathways. The most significant association was between the RA of Dermacoccus (phylum Actinobacteria) and a variant 8 kb upstream of TINCR (rs117042385; p = 1.61 × 10⁻⁸; q = 0.002), a long non-coding RNA that binds to peptidoglycan recognition protein 3 (PGLYRP3) mRNA, a gene encoding a known antimicrobial protein. A second association was between a missense variant in PGLYRP4 (rs3006458) and the RA of an unclassified genus of family Micrococcaceae (phylum Actinobacteria) (p = 5.10 × 10⁻⁷; q = 0.032). Our findings provide evidence of host genetic influences on upper airway microbial composition in humans and implicate mucosal immunity genes in this relationship.
Exome sequencing identifies novel compound heterozygous IFNA4 and IFNA10 mutations as a cause of impaired function in Crohn’s disease patients

PubMed Central

Xiao, Chuan-Xing; Xiao, Jing-Jing; Xu, Hong-Zhi; Wang, Huan-Huan; Chen, Xu; Liu, Yuan-Sheng; Li, Ping; Shi, Ying; Nie, Yong-Zhan; Li, Shao; Wu, Kai-Chun; Liu, Zhan-Ju; Ren, Jian-Lin; Guleng, Bayasi

2015-01-01

Previous studies have highlighted the role of genetic predispositions in disease, and several genes have been identified as important in Crohn’s disease (CD). However, many of these genes are likely rare and not associated with susceptibility in Chinese CD patients. We found 294 shared identical variants in the CD patients, of which 26 were validated by Sanger sequencing. Two heterozygous IFN variants (IFNA10 c.60 T > A; IFNA4 c.60 A > T) were identified as significantly associated with CD susceptibility. The single-nucleotide changes alter a cysteine situated before the signal peptide cleavage site to a stop codon (TGA) in IFNA10, with the result that serum levels of IFNA10 were significantly decreased in the CD patients compared to the controls. Furthermore, the IFNA10 and IFNA4 mutants resulted in an impairment of the suppression of HCV RNA replication in HuH7 cells, and the administration of the recombinant IFN subtypes restored DSS-induced colonic inflammation through the upregulation of CD4+ Treg cells. In a multicenter study, we identified heterozygous IFNA10 and IFNA4 variants as a cause of impaired function and as CD susceptibility genes in Chinese patients. These findings might provide clues in the understanding of the genetic heterogeneity of CD and lead to better screening and improved treatment. PMID:26000985
Whole-Exome Sequencing of Congenital Glaucoma Patients Reveals Hypermorphic Variants in GPATCH3, a New Gene Involved in Ocular and Craniofacial Development

PubMed Central

Ferre-Fernández, Jesús-José; Aroca-Aguilar, José-Daniel; Medina-Trillo, Cristina; Bonet-Fernández, Juan-Manuel; Méndez-Hernández, Carmen-Dora; Morales-Fernández, Laura; Corton, Marta; Cabañero-Valera, María-José; Gut, Marta; Tonda, Raul; Ayuso, Carmen; Coca-Prados, Miguel; García-Feijoo, Julián; Escribano, Julio

2017-01-01

Congenital glaucoma (CG) is a heterogeneous, inherited and severe optical neuropathy that originates from maldevelopment of the anterior segment of the eye. To identify new disease genes, we performed whole-exome sequencing of 26 unrelated CG patients. In one patient we identified two rare, recessive and hypermorphic coding variants in GPATCH3, a gene of unidentified function, and 5% of a second group of 170 unrelated CG patients carried rare variants in this gene. The recombinant GPATCH3 protein activated in vitro the proximal promoter of CXCR4, a gene involved in embryo neural crest cell migration. The GPATCH3 protein was detected in human tissues relevant to glaucoma (e.g., ciliary body). This gene was expressed in the dermis, skeletal muscles, periocular mesenchymal-like cells and corneal endothelium of early zebrafish embryos. Morpholino-mediated knockdown and transient overexpression of gpatch3 led to varying degrees of goniodysgenesis and ocular and craniofacial abnormalities, recapitulating some of the features of zebrafish embryos deficient in the glaucoma-related genes pitx2 and foxc1. In conclusion, our data suggest the existence of high genetic heterogeneity in CG and provide evidence for the role of GPATCH3 in this disease. We also show that GPATCH3 is a new gene involved in ocular and craniofacial development. PMID:28397860

Inherited platelet disorders: toward DNA-based diagnosis

PubMed Central

Lentaigne, Claire; Freson, Kathleen; Laffan, Michael A.; Turro, Ernest

2016-01-01

Variations in platelet number, volume, and function are largely genetically controlled, and many loci associated with platelet traits have been identified by genome-wide association studies (GWASs). The genome also contains a large number of rare variants, of which a tiny fraction underlies the inherited diseases of humans. Research over the last 3 decades has led to the discovery of 51 genes harboring variants responsible for inherited platelet disorders (IPDs). However, the majority of patients with an IPD still do not receive a molecular diagnosis. Alongside the scientific interest, molecular or genetic diagnosis is important for patients. There is increasing recognition that a number of IPDs are associated with severe pathologies, including an increased risk of malignancy, and a definitive diagnosis can inform prognosis and care. In this review, we give an overview of these disorders grouped according to their effect on platelet biology and their clinical characteristics. We also discuss the challenge of identifying candidate genes and causal variants therein, how IPDs have been historically diagnosed, and how this is changing with the introduction of high-throughput sequencing. Finally, we describe how integration of large genomic, epigenomic, and phenotypic datasets, including whole genome sequencing data, GWASs, epigenomic profiling, protein–protein interaction networks, and standardized clinical phenotype coding, will drive the discovery of novel mechanisms of disease in the near future to improve patient diagnosis and management. PMID:27095789
De novo truncating variants in the AHDC1 gene encoding the AT-hook DNA-binding motif-containing protein 1 are associated with intellectual disability and developmental delay.

PubMed

Yang, Hui; Douglas, Ganka; Monaghan, Kristin G; Retterer, Kyle; Cho, Megan T; Escobar, Luis F; Tucker, Megan E; Stoler, Joan; Rodan, Lance H; Stein, Diane; Marks, Warren; Enns, Gregory M; Platt, Julia; Cox, Rachel; Wheeler, Patricia G; Crain, Carrie; Calhoun, Amy; Tryon, Rebecca; Richard, Gabriele; Vitazka, Patrik; Chung, Wendy K

2015-10-01

Whole-exome sequencing (WES) represents a significant breakthrough in clinical genetics, and identifies a genetic etiology in up to 30% of cases of intellectual disability (ID). Using WES, we identified seven unrelated patients with a similar clinical phenotype of severe intellectual disability or neurodevelopmental delay who were all heterozygous for de novo truncating variants in the AT-hook DNA-binding motif-containing protein 1 (AHDC1). The patients were all minimally verbal or nonverbal and had variable neurological problems including spastic quadriplegia, ataxia, nystagmus, seizures, autism, and self-injurious behaviors. Additional common clinical features include dysmorphic facial features and feeding difficulties associated with failure to thrive and short stature. The AHDC1 gene has only one coding exon, and the protein contains conserved regions including AT-hook motifs and a PDZ binding domain. We postulate that all seven variants detected in these patients result in a truncated protein missing critical functional domains, disrupting interactions with other proteins important for brain development. Our study demonstrates that truncating variants in AHDC1 are associated with ID and are primarily associated with a neurodevelopmental phenotype.
Engineered Cpf1 variants with altered PAM specificities.

PubMed

Gao, Linyi; Cox, David B T; Yan, Winston X; Manteiga, John C; Schneider, Martin W; Yamano, Takashi; Nishimasu, Hiroshi; Nureki, Osamu; Crosetto, Nicola; Zhang, Feng

2017-08-01

The RNA-guided endonuclease Cpf1 is a promising tool for genome editing in eukaryotic cells. However, the utility of the commonly used Acidaminococcus sp. BV3L6 Cpf1 (AsCpf1) and Lachnospiraceae bacterium ND2006 Cpf1 (LbCpf1) is limited by their requirement of a TTTV protospacer adjacent motif (PAM) in the DNA substrate. To address this limitation, we performed a structure-guided mutagenesis screen to increase the targeting range of Cpf1. We engineered two AsCpf1 variants carrying the mutations S542R/K607R and S542R/K548V/N552R, which recognize TYCV and TATV PAMs, respectively, with enhanced activities in vitro and in human cells. Genome-wide assessment of off-target activity using BLISS indicated that these variants retain high DNA-targeting specificity, which we further improved by introducing an additional non-PAM-interacting mutation. Introducing the identified PAM-interacting mutations at their corresponding positions in LbCpf1 similarly altered its PAM specificity. Together, these variants increase the targeting range of Cpf1 by approximately threefold in human coding sequences to one cleavage site per ∼11 bp.
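To make the PAM notation in this abstract concrete, the following sketch expands the IUPAC codes (V = A, C or G; Y = C or T) into regular expressions and counts candidate PAM occurrences on both strands of a sequence. It is a toy illustration, not the authors' analysis pipeline: the helper names and the example sequence are hypothetical, and a real targeting-range estimate would also have to consider protospacer length and position relative to the PAM.

```python
import re

# IUPAC nucleotide codes needed for the PAMs named in the abstract.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "V": "[ACG]", "Y": "[CT]"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def pam_positions(seq, pam):
    """0-based start positions of `pam` on the given strand (overlaps allowed)."""
    pattern = "".join(IUPAC[base] for base in pam)
    # A lookahead lets overlapping matches be reported.
    return [m.start() for m in re.finditer("(?=" + pattern + ")", seq.upper())]


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def count_sites(seq, pams):
    """Count distinct (strand, position) PAM occurrences over both strands."""
    sites = set()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for pam in pams:
            sites.update((strand, pos) for pos in pam_positions(s, pam))
    return len(sites)


toy = "TTTCATATGTCCAGG"  # hypothetical sequence, for illustration only
print(count_sites(toy, ["TTTV"]))                  # wild-type PAM only
print(count_sites(toy, ["TTTV", "TYCV", "TATV"]))  # with the engineered variants
```

On this toy sequence, adding the TYCV and TATV motifs raises the number of candidate sites from 1 to 5, which is the kind of increase in targeting range the abstract describes at genome scale.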
Engineered Cpf1 variants with altered PAM specificities increase genome targeting range

PubMed Central

Gao, Linyi; Cox, David B.T.; Yan, Winston X.; Manteiga, John C.; Schneider, Martin W.; Yamano, Takashi; Nishimasu, Hiroshi; Nureki, Osamu; Crosetto, Nicola; Zhang, Feng

2017-01-01

The RNA-guided endonuclease Cpf1 is a promising tool for genome editing in eukaryotic cells. However, the utility of the commonly used Acidaminococcus sp. BV3L6 Cpf1 (AsCpf1) and Lachnospiraceae bacterium ND2006 Cpf1 (LbCpf1) is limited by their requirement of a TTTV protospacer adjacent motif (PAM) in the DNA substrate. To address this limitation, we performed a structure-guided mutagenesis screen to increase the targeting range of Cpf1. We engineered two AsCpf1 variants carrying the mutations S542R/K607R and S542R/K548V/N552R, which recognize TYCV and TATV PAMs, respectively, with enhanced activities in vitro and in human cells. Genome-wide assessment of off-target activity using the BLISS assay indicated that these variants retain high DNA targeting specificity, which we further improved by introducing an additional non-PAM-interacting mutation. Introducing the identified mutations at their corresponding positions in LbCpf1 similarly altered its PAM specificity. Together, these variants increase the targeting range of Cpf1 by approximately three-fold in human coding sequences to one cleavage site per ~11 bp. PMID:28581492
Genetic variants of adiponectin receptor 2 are associated with increased adiponectin levels and decreased triglyceride/VLDL levels in patients with metabolic syndrome.

PubMed

Broedl, Uli C; Lehrke, Michael; Fleischer-Brielmaier, Elisabeth; Tietz, Anne B; Nagel, Jutta M; Göke, Burkhard; Lohse, Peter; Parhofer, Klaus G

2006-05-15

Adiponectin acts as an antidiabetic, antiinflammatory and antiatherogenic adipokine. These effects are assumed to be mediated by the recently discovered adiponectin receptors AdipoR1 and AdipoR2. The purpose of this study was to determine whether variations in the AdipoR1 and AdipoR2 genes may contribute to insulin resistance, dyslipidemia and inflammation. We sequenced all seven coding exons of both genes in 20 unrelated German subjects with metabolic syndrome and tested genetic variants for association with glucose, lipid and inflammatory parameters. We identified three AdipoR2 variants (+795G/A, +870C/A and +963C/T) in perfect linkage disequilibrium (r² = 1) with a minor allele frequency of 0.125. This haplotype was associated with higher plasma adiponectin levels and decreased fasting triglyceride, VLDL-triglyceride and VLDL-cholesterol levels. No association, however, was observed between the AdipoR2 SNP cluster and glucose metabolism. To our knowledge, this is the first study to identify an association between genetic variants of the adiponectin receptor genes and plasma adiponectin levels. Furthermore, our data suggest that AdipoR2 may play an important role in triglyceride/VLDL metabolism.
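The two summary statistics quoted in this abstract, the minor allele frequency and the pairwise LD measure r², can be illustrated with a small calculation. The sketch below assumes phased haplotypes coded 0 (major allele) and 1 (minor allele); the haplotype data are made up to match the reported numbers (20 subjects, hence 40 chromosomes, 5 of which carry the minor alleles) and are not the study's genotypes.

```python
def allele_frequency(haplotypes, site):
    """Frequency of the allele coded 1 at `site`."""
    return sum(h[site] for h in haplotypes) / len(haplotypes)


def r_squared(haplotypes, i, j):
    """Pairwise LD: r^2 = D^2 / (pA (1 - pA) pB (1 - pB)).

    D = pAB - pA * pB, where pAB is the frequency of haplotypes carrying
    allele 1 at both sites. Undefined (division by zero) if either site
    is monomorphic.
    """
    n = len(haplotypes)
    p_a = allele_frequency(haplotypes, i)
    p_b = allele_frequency(haplotypes, j)
    p_ab = sum(1 for h in haplotypes if h[i] == 1 and h[j] == 1) / n
    d = p_ab - p_a * p_b
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


# Hypothetical phased data: 5 of 40 chromosomes carry all three minor alleles.
haplotypes = [(1, 1, 1)] * 5 + [(0, 0, 0)] * 35

print(allele_frequency(haplotypes, 0))  # minor allele frequency at the first site
print(r_squared(haplotypes, 0, 1))      # LD between the first two sites
print(r_squared(haplotypes, 0, 2))      # LD between the first and third sites
```

With the minor alleles always travelling together, each site has a frequency of 5/40 = 0.125 and every pair gives r² = 1; moving a single minor allele onto a different chromosome would pull r² below 1.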
Haplotype Analysis in Multiple Crosses to Identify a QTL Gene

PubMed Central

Wang, Xiaosong; Korstanje, Ron; Higgins, David; Paigen, Beverly

2004-01-01

Identifying quantitative trait locus (QTL) genes is a challenging task. Herein, we report using a two-step process to identify Apoa2 as the gene underlying Hdlq5, a QTL for plasma high-density lipoprotein cholesterol (HDL) levels on mouse chromosome 1. First, we performed a sequence analysis of the Apoa2 coding region in 46 genetically diverse mouse strains and found five different APOA2 protein variants, which we named APOA2a to APOA2e. Second, we conducted a haplotype analysis of the strains in 21 crosses that have so far detected HDL QTLs; we found that Hdlq5 was detected only in the nine crosses where one parent had the APOA2b protein variant characterized by an Ala61-to-Val61 substitution. We then found that strains with the APOA2b variant had significantly higher (P ≤ 0.002) plasma HDL levels than those with either the APOA2a or the APOA2c variant. These findings support Apoa2 as the underlying Hdlq5 gene and suggest the Apoa2 polymorphisms responsible for the Hdlq5 phenotype. Therefore, haplotype analysis in multiple crosses can be used to support a candidate QTL gene. PMID:15310659
Haplotype analysis in multiple crosses to identify a QTL gene.

PubMed

Wang, Xiaosong; Korstanje, Ron; Higgins, David; Paigen, Beverly

2004-09-01

Identifying quantitative trait locus (QTL) genes is a challenging task. Herein, we report using a two-step process to identify Apoa2 as the gene underlying Hdlq5, a QTL for plasma high-density lipoprotein cholesterol (HDL) levels on mouse chromosome 1. First, we performed a sequence analysis of the Apoa2 coding region in 46 genetically diverse mouse strains and found five different APOA2 protein variants, which we named APOA2a to APOA2e. Second, we conducted a haplotype analysis of the strains in 21 crosses that have so far detected HDL QTLs; we found that Hdlq5 was detected only in the nine crosses where one parent had the APOA2b protein variant characterized by an Ala61-to-Val61 substitution. We then found that strains with the APOA2b variant had significantly higher (P ≤ 0.002) plasma HDL levels than those with either the APOA2a or the APOA2c variant. These findings support Apoa2 as the underlying Hdlq5 gene and suggest the Apoa2 polymorphisms responsible for the Hdlq5 phenotype. Therefore, haplotype analysis in multiple crosses can be used to support a candidate QTL gene.
High depth, whole-genome sequencing of cholera isolates from Haiti and the Dominican Republic.

PubMed

Sealfon, Rachel; Gire, Stephen; Ellis, Crystal; Calderwood, Stephen; Qadri, Firdausi; Hensley, Lisa; Kellis, Manolis; Ryan, Edward T; LaRocque, Regina C; Harris, Jason B; Sabeti, Pardis C

2012-09-11

Whole-genome sequencing is an important tool for understanding microbial evolution and identifying the emergence of functionally important variants over the course of epidemics. In October 2010, a severe cholera epidemic began in Haiti, with additional cases identified in the neighboring Dominican Republic. We used whole-genome approaches to sequence four Vibrio cholerae isolates from Haiti and the Dominican Republic and three additional V. cholerae isolates to a high depth of coverage (>2000x); four of the seven isolates were previously sequenced. Using these sequence data, we examined the effect of depth of coverage and sequencing platform on genome assembly and identification of sequence variants. We found that 50x coverage is sufficient to construct a whole-genome assembly and to accurately call most variants from 100 base pair paired-end sequencing reads. Phylogenetic analysis between the newly sequenced and thirty-three previously sequenced V. cholerae isolates indicates that the Haitian and Dominican Republic isolates are closest to strains from South Asia. The Haitian and Dominican Republic isolates form a tight cluster, with only four variants unique to individual isolates. These variants are located in the CTX region, the SXT region, and the core genome. Of the 126 mutations identified that separate the Haiti-Dominican Republic cluster from the V. cholerae reference strain (N16961), 73 are non-synonymous changes, and a number of these changes cluster in specific genes and pathways. Sequence variant analyses of V. cholerae isolates, including multiple isolates from the Haitian outbreak, identify coverage-specific and technology-specific effects on variant detection, and provide insight into genomic change and functional evolution during an epidemic.
Evaluation of Nine Somatic Variant Callers for Detection of Somatic Mutations in Exome and Targeted Deep Sequencing Data.

PubMed

Krøigård, Anne Bruun; Thomassen, Mads; Lænkholm, Anne-Vibeke; Kruse, Torben A; Larsen, Martin Jakob

2016-01-01

Next generation sequencing is extensively applied to catalogue somatic mutations in cancer, in research settings and increasingly in clinical settings for molecular diagnostics, guiding therapy decisions. Somatic variant callers perform paired comparisons of sequencing data from cancer tissue and matched normal tissue in order to detect somatic mutations. The advent of many new somatic variant callers creates a need for comparison and validation of the tools, as no de facto standard for detection of somatic mutations exists and only limited comparisons have been reported. We have performed a comprehensive evaluation using exome sequencing and targeted deep sequencing data of paired tumor-normal samples from five breast cancer patients to evaluate the performance of nine publicly available somatic variant callers: EBCall, Mutect, Seurat, Shimmer, Indelocator, Somatic Sniper, Strelka, VarScan 2 and Virmid for the detection of single nucleotide mutations and small deletions and insertions. We report a large variation in the number of calls from the nine somatic variant callers on the same sequencing data and highly variable agreement. Sequencing depth had markedly diverse impact on individual callers, as for some callers, increased sequencing depth highly improved sensitivity. For SNV calling, we report EBCall, Mutect, Virmid and Strelka to be the most reliable somatic variant callers for both exome sequencing and targeted deep sequencing. For indel calling, EBCall is superior due to high sensitivity and robustness to changes in sequencing depths.
Evaluation of Nine Somatic Variant Callers for Detection of Somatic Mutations in Exome and Targeted Deep Sequencing Data

PubMed Central

Krøigård, Anne Bruun; Thomassen, Mads; Lænkholm, Anne-Vibeke; Kruse, Torben A.; Larsen, Martin Jakob

2016-01-01

Next generation sequencing is extensively applied to catalogue somatic mutations in cancer, in research settings and increasingly in clinical settings for molecular diagnostics, guiding therapy decisions. Somatic variant callers perform paired comparisons of sequencing data from cancer tissue and matched normal tissue in order to detect somatic mutations. The advent of many new somatic variant callers creates a need for comparison and validation of the tools, as no de facto standard for detection of somatic mutations exists and only limited comparisons have been reported. We have performed a comprehensive evaluation using exome sequencing and targeted deep sequencing data of paired tumor-normal samples from five breast cancer patients to evaluate the performance of nine publicly available somatic variant callers: EBCall, Mutect, Seurat, Shimmer, Indelocator, Somatic Sniper, Strelka, VarScan 2 and Virmid for the detection of single nucleotide mutations and small deletions and insertions. We report a large variation in the number of calls from the nine somatic variant callers on the same sequencing data and highly variable agreement. Sequencing depth had markedly diverse impact on individual callers, as for some callers, increased sequencing depth highly improved sensitivity. For SNV calling, we report EBCall, Mutect, Virmid and Strelka to be the most reliable somatic variant callers for both exome sequencing and targeted deep sequencing. For indel calling, EBCall is superior due to high sensitivity and robustness to changes in sequencing depths. PMID:27002637
Errors from approximation of ODE systems with reduced order models

DOE Office of Scientific and Technical Information (OSTI.GOV)

Vassilevska, Tanya

2016-12-30

This is a code to calculate the error from approximation of systems of ordinary differential equations (ODEs) by using Proper Orthogonal Decomposition (POD) Reduced Order Models (ROM) methods and to compare and analyze the errors for two POD ROM variants. The first variant is the standard POD ROM, the second variant is a modification of the method using the values of the time derivatives (a.k.a. time-derivative snapshots). The code compares the errors from the two variants under different conditions.
International interlaboratory study comparing single organism 16S rRNA gene sequencing data: Beyond consensus sequence comparisons

PubMed Central

Olson, Nathan D.; Lund, Steven P.; Zook, Justin M.; Rojas-Cornejo, Fabiola; Beck, Brian; Foy, Carole; Huggett, Jim; Whale, Alexandra S.; Sui, Zhiwei; Baoutina, Anna; Dobeson, Michael; Partis, Lina; Morrow, Jayne B.

2015-01-01

This study presents the results from an interlaboratory sequencing study for which we developed a novel high-resolution method for comparing data from different sequencing platforms for a multi-copy, paralogous gene. The combination of PCR amplification and 16S ribosomal RNA gene (16S rRNA) sequencing has revolutionized bacteriology by enabling rapid identification, frequently without the need for culture. To assess variability between laboratories in sequencing 16S rRNA, six laboratories sequenced the gene encoding the 16S rRNA from Escherichia coli O157:H7 strain EDL933 and Listeria monocytogenes serovar 4b strain NCTC11994. Participants performed sequencing methods and protocols available in their laboratories: Sanger sequencing, Roche 454 pyrosequencing®, or Ion Torrent PGM®. The sequencing data were evaluated on three levels: (1) identity of biologically conserved position, (2) ratio of 16S rRNA gene copies featuring identified variants, and (3) the collection of variant combinations in a set of 16S rRNA gene copies. The same set of biologically conserved positions was identified for each sequencing method. Analytical methods using Bayesian and maximum likelihood statistics were developed to estimate variant copy ratios, which describe the ratio of nucleotides at each identified biologically variable position, as well as the likely set of variant combinations present in 16S rRNA gene copies. Our results indicate that estimated variant copy ratios at biologically variable positions were only reproducible for high throughput sequencing methods. Furthermore, the likely variant combination set was only reproducible with increased sequencing depth and longer read lengths. We also demonstrate novel methods for evaluating variable positions when comparing multi-copy gene sequence data from multiple laboratories generated using multiple sequencing technologies. PMID:27077030
Characterization of mussel H2A.Z.2: a new H2A.Z variant preferentially expressed in germinal tissues from Mytilus.

PubMed

Rivera-Casas, Ciro; González-Romero, Rodrigo; Vizoso-Vazquez, Ángel; Cheema, Manjinder S; Cerdán, M Esperanza; Méndez, Josefina; Ausió, Juan; Eirin-Lopez, Jose M

2016-10-01

Histones are the fundamental constituents of the eukaryotic chromatin, facilitating the physical organization of DNA in chromosomes and participating in the regulation of its metabolism. The H2A family displays the largest number of variants among core histones, including the renowned H2A.X, macroH2A, H2A.B (Bbd), and H2A.Z. This latter variant is especially interesting because of its regulatory role and its differentiation into 2 functionally divergent variants (H2A.Z.1 and H2A.Z.2), further specializing the structure and function of vertebrate chromatin. In the present work we describe, for the first time, the presence of a second H2A.Z variant (H2A.Z.2) in the genome of a non-vertebrate animal, the mussel Mytilus. The molecular and evolutionary characterization of mussel H2A.Z.1 and H2A.Z.2 histones is consistent with their functional specialization, supported on sequence divergence at promoter and coding regions as well as on varying gene expression patterns. More precisely, the expression of H2A.Z.2 transcripts in gonadal tissue and its potential upregulation in response to genotoxic stress might be mirroring the specialization of this variant in DNA repair. Overall, the findings presented in this work complement recent reports describing the widespread presence of other histone variants across eukaryotes, supporting an ancestral origin and conserved role for histone variants in chromatin.
Functional analysis of four naturally occurring variants of human constitutive androstane receptor.

PubMed

Ikeda, Shinobu; Kurose, Kouichi; Jinno, Hideto; Sai, Kimie; Ozawa, Shogo; Hasegawa, Ryuichi; Komamura, Kazuo; Kotake, Takeshi; Morishita, Hideki; Kamakura, Shiro; Kitakaze, Masafumi; Tomoike, Hitonobu; Tamura, Tomohide; Yamamoto, Noboru; Kunitoh, Hideo; Yamada, Yasuhide; Ohe, Yuichiro; Shimada, Yasuhiro; Shirao, Kuniaki; Kubota, Kaoru; Minami, Hironobu; Ohtsu, Atsushi; Yoshida, Teruhiko; Saijo, Nagahiro; Saito, Yoshiro; Sawada, Jun-ichi

2005-01-01

The human constitutive androstane receptor (CAR, NR1I3) is a member of the orphan nuclear receptor superfamily that plays an important role in the control of drug metabolism and disposition. In this study, we sequenced all the coding exons of the NR1I3 gene for 334 Japanese subjects. We identified three novel single nucleotide polymorphisms (SNPs) that induce non-synonymous alterations of amino acids (His246Arg, Leu308Pro, and Asn323Ser) residing in the ligand-binding domain of CAR, in addition to the Val133Gly variant, which was another CAR variant identified in our previous study. We performed functional analysis of these four naturally occurring CAR variants in COS-7 cells using a CYP3A4 promoter/enhancer reporter gene that includes the CAR responsive elements. The His246Arg variant caused marked reductions in both transactivation of the reporter gene and in the response to 6-(4-chlorophenyl)imidazo[2,1-b][1,3]thiazole-5-carbaldehyde O-(3,4-dichlorobenzyl)oxime (CITCO), which is a human CAR-specific agonist. The transactivation ability of the Leu308Pro variant was also significantly decreased, but its responsiveness to CITCO was not abrogated. The transactivation ability and CITCO response of the Val133Gly and Asn323Ser variants did not change as compared to the wild-type CAR. These data suggest that the His246Arg and Leu308Pro variants, especially His246Arg, may influence the expression of drug-metabolizing enzymes and transporters that are transactivated by CAR.
Protein Degradation in a TX-TL Cell-free Expression System Using ClpXP Protease

DTIC Science & Technology

2014-07-14

function in TX-TL, as well as bacteriophage assembly [2, 6]. Circuits can also be prototyped from basic parts within 8 hours, avoiding cloning and...mRFP, and Venus and variants eGFP-ssrA, mRFP-ssrA, and Venus-ssrA, coding sequences were cloned into a T7-lacO inducible vector containing a N...12672-12677. 6. Shin, J., P. Jardine, and V. Noireaux, Genome Replication, Synthesis, and Assembly of the Bacteriophage T7 in a Single Cell-Free
Exome-Wide Association Study Identifies New Low-Frequency and Rare UGT1A1 Coding Variants and UGT1A6 Coding Variants Influencing Serum Bilirubin in Elderly Subjects

PubMed Central

Oussalah, Abderrahim; Bosco, Paolo; Anello, Guido; Spada, Rosario; Guéant-Rodriguez, Rosa-Maria; Chery, Céline; Rouyer, Pierre; Josse, Thomas; Romano, Antonino; Elia, Maurizzio; Bronowicki, Jean-Pierre; Guéant, Jean-Louis

2015-01-01

Genome-wide association studies (GWASs) have identified loci contributing to total serum bilirubin level. However, no exome-wide approaches have been performed to address this question. Using an exome-wide approach, we assessed the influence of protein-coding variants on unconjugated, conjugated, and total serum bilirubin levels in a well-characterized cohort of 773 ambulatory elderly subjects from Italy. Coding variants were replicated in 227 elderly subjects from the same area. We identified 4 rare (minor allele frequency, MAF < 0.5%) and low-frequency (MAF, 0.5%–5%) missense coding variants located in the first exon of the UGT1A1 gene, which encodes the substrate-binding domain (rs4148323 [MAF = 0.06%; p.Gly71Arg], rs144398951 [MAF = 0.06%; p.Ile215Val], rs35003977 [MAF = 0.78%; p.Val225Gly], and rs57307513 [MAF = 0.06%; p.Ser250Pro]). These variants were in strong linkage disequilibrium with 3 intronic UGT1A1 variants (rs887829, rs4148325, rs6742078), which were significantly associated with total bilirubin level (P = 2.34 × 10^−34, P = 7.02 × 10^−34, and P = 8.27 × 10^−34), as well as unconjugated and conjugated bilirubin levels. We also identified UGT1A6 variants in association with total (rs6759892, p.Ser7Ala, P = 1.98 × 10^−26; rs2070959, p.Thr181Ala, P = 2.87 × 10^−27; and rs1105879, p.Arg184Ser, P = 3.27 × 10^−29), unconjugated, and conjugated bilirubin levels. All UGT1A1 intronic variants (rs887829, rs6742078, and rs4148325) and UGT1A6 coding variants (rs6759892, rs2070959, and rs1105879) were significantly associated with gallstone-related cholecystectomy risk. The UGT1A6 variant rs2070959 (p.Thr181Ala) was associated with the highest risk of gallstone-related cholecystectomy (OR, 4.58; 95% CI, 1.58–13.28; P = 3.21 × 10^−3). Using an exome-wide approach, we identified coding variants in the UGT1A1 and UGT1A6 genes in association with serum bilirubin level and hyperbilirubinemia risk in elderly subjects.
UGT1A1 intronic single-nucleotide polymorphisms (SNPs) (rs6742078, rs887829, rs4148324) serve as proxy markers for the low-frequency and rare UGT1A1 variants, thereby providing mechanistic explanation to the relationship between UGT1A1 intronic SNPs and the UGT1A1 enzyme activity. UGT1A1 and UGT1A6 variants might be potentially associated with gallstone-related cholecystectomy risk. PMID:26039129
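The frequency classes used in the abstract above (rare: MAF < 0.5%; low-frequency: MAF 0.5%–5%) can be illustrated with a short sketch. The thresholds and MAF values are taken from the abstract; the function name and the "common" label for anything above 5% are assumptions added for illustration.

```python
# Thresholds from the abstract, in percent.
RARE_MAX = 0.5
LOW_FREQ_MAX = 5.0

def classify_maf(maf_percent: float) -> str:
    """Classify a minor allele frequency (in percent) into a frequency class."""
    if maf_percent < RARE_MAX:
        return "rare"
    if maf_percent <= LOW_FREQ_MAX:
        return "low-frequency"
    return "common"  # label assumed here; not defined in the abstract

# MAF values (percent) reported for the four UGT1A1 missense variants.
ugt1a1_variants = {
    "rs4148323": 0.06,
    "rs144398951": 0.06,
    "rs35003977": 0.78,
    "rs57307513": 0.06,
}

classes = {rsid: classify_maf(maf) for rsid, maf in ugt1a1_variants.items()}
for rsid, label in classes.items():
    print(rsid, label)
```

Under these thresholds, three of the four reported variants fall in the rare class and rs35003977 in the low-frequency class.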
Genomic prediction using preselected DNA variants from a GWAS with whole-genome sequence data in Holstein-Friesian cattle.

PubMed

Veerkamp, Roel F; Bouwman, Aniek C; Schrooten, Chris; Calus, Mario P L

2016-12-01

Whole-genome sequence data is expected to capture genetic variation more completely than common genotyping panels. Our objective was to compare the proportion of variance explained and the accuracy of genomic prediction by using imputed sequence data or preselected SNPs from a genome-wide association study (GWAS) with imputed whole-genome sequence data. Phenotypes were available for 5503 Holstein-Friesian bulls. Genotypes were imputed up to whole-genome sequence (13,789,029 segregating DNA variants) by using run 4 of the 1000 bull genomes project. The program GCTA was used to perform GWAS for protein yield (PY), somatic cell score (SCS) and interval from first to last insemination (IFL). From the GWAS, subsets of variants were selected and genomic relationship matrices (GRM) were used to estimate the variance explained in 2087 validation animals and to evaluate the genomic prediction ability. Finally, two GRM were fitted together in several models to evaluate the effect of selected variants that were in competition with all the other variants. The GRM based on full sequence data explained only marginally more genetic variation than that based on common SNP panels: for PY, SCS and IFL, genomic heritability improved from 0.81 to 0.83, 0.83 to 0.87 and 0.69 to 0.72, respectively. Sequence data also helped to identify more variants linked to quantitative trait loci and resulted in clearer GWAS peaks across the genome. The proportion of total variance explained by the selected variants combined in a GRM was considerably smaller than that explained by all variants (less than 0.31 for all traits). When selected variants were used, accuracy of genomic predictions decreased and bias increased. Although 35 to 42 variants were detected that together explained 13 to 19% of the total variance (18 to 23% of the genetic variance) when fitted alone, there was no advantage in using dense sequence information for genomic prediction in the Holstein data used in our study. 
Detection and selection of variants within a single breed are difficult due to long-range linkage disequilibrium. Stringent selection of variants resulted in more biased genomic predictions, although this might be due to the training population being the same dataset from which the selected variants were identified.
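The abstract above relies on genomic relationship matrices (GRM) built from variant genotypes. As a rough sketch of what such a matrix is, the code below computes one widely used form, G = ZZ'/(2 Σ p(1 − p)), from allele counts coded 0/1/2. This is an assumption for illustration and not necessarily the computation performed in the study (which used the program GCTA); the toy genotypes are invented.

```python
def allele_frequencies(genotypes):
    """Frequency of the counted allele at each variant (rows = animals)."""
    n_animals = len(genotypes)
    n_variants = len(genotypes[0])
    return [
        sum(row[j] for row in genotypes) / (2 * n_animals)
        for j in range(n_variants)
    ]

def genomic_relationship_matrix(genotypes):
    """Return G = Z Z' / (2 * sum(p * (1 - p))) with Z = genotype - 2p."""
    p = allele_frequencies(genotypes)
    scale = 2 * sum(pj * (1 - pj) for pj in p)
    # Center each genotype by twice the allele frequency of its variant.
    z = [[x - 2 * pj for x, pj in zip(row, p)] for row in genotypes]
    n = len(z)
    return [
        [sum(a * b for a, b in zip(z[i], z[k])) / scale for k in range(n)]
        for i in range(n)
    ]

# Hypothetical allele counts for 3 animals at 4 variants.
toy_genotypes = [
    [0, 1, 2, 1],
    [1, 1, 0, 2],
    [2, 0, 1, 1],
]
G = genomic_relationship_matrix(toy_genotypes)
for row in G:
    print([round(value, 3) for value in row])
```

Building a GRM from a preselected subset of variants, as described in the abstract, would amount to passing only the corresponding genotype columns to the same function.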
Mutations in LPL, APOC2, APOA5, GPIHBP1 and LMF1 in patients with severe hypertriglyceridaemia

PubMed Central

Surendran, R Preethi; Visser, Maartje E; Heemelaar, Steffie; Wang, Jian; Peter, Jorge; Defesche, Joep C; Kuivenhoven, Jan A; Hosseini, Maryam; Péterfy, Miklós; Kastelein, John JP; Johansen, Chris T; Hegele, Robert A; Stroes, Erik SG; Dallinga-Thie, Geesje M

2014-01-01

Objective: The severe forms of hypertriglyceridaemia (HTG) are caused by mutations in genes that lead to loss of function of lipoprotein lipase (LPL). In most patients with severe HTG (TG >10 mmol/L) it is a challenge to define the underlying cause. We investigated the molecular basis of severe HTG in patients referred to the Lipid Clinic at the Academic Medical Center Amsterdam. Methods: The coding regions of LPL, APOC2, APOA5 and two novel genes, lipase maturation factor 1 (LMF1) and GPI-anchored HDL-binding protein 1 (GPIHBP1), were sequenced in 86 patients with type 1 and type 5 HTG and 327 controls. Results: In 46 patients (54%) rare DNA sequence variants were identified, comprising variants in LPL (n=19), APOC2 (n=1), APOA5 (n=2), GPIHBP1 (n=3) and LMF1 (n=8). In 22 patients (26%) only common variants in LPL (p.Asp36Asn, p.Asn318Ser and p.Ser474Ter) and APOA5 (p.Ser19Trp) could be identified, whereas no mutations were found in 18 patients (21%). In vitro validation revealed that the mutations in LMF1 were not associated with compromised LPL function. Consistent with this, five of the eight LMF1 variants were also found in controls and therefore cannot account for the observed phenotype. Conclusion: The prevalence of mutations in LPL was 34% and mostly restricted to patients with type 1 HTG. Mutations in GPIHBP1 (n=3), APOC2 (n=1) and APOA5 (n=2) were rare but the associated clinical phenotype was severe. Routine sequencing of candidate genes in severe HTG has improved our understanding of the molecular basis of this phenotype associated with acute pancreatitis, and may help to guide future individualized therapeutic strategies. PMID:22239554
Mutations in LPL, APOC2, APOA5, GPIHBP1 and LMF1 in patients with severe hypertriglyceridaemia.

PubMed

Surendran, R P; Visser, M E; Heemelaar, S; Wang, J; Peter, J; Defesche, J C; Kuivenhoven, J A; Hosseini, M; Péterfy, M; Kastelein, J J P; Johansen, C T; Hegele, R A; Stroes, E S G; Dallinga-Thie, G M

2012-08-01

The severe forms of hypertriglyceridaemia (HTG) are caused by mutations in genes that lead to the loss of function of lipoprotein lipase (LPL). In most patients with severe HTG (TG > 10 mmol L(-1) ), it is a challenge to define the underlying cause. We investigated the molecular basis of severe HTG in patients referred to the Lipid Clinic at the Academic Medical Center Amsterdam. The coding regions of LPL, APOC2, APOA5 and two novel genes, lipase maturation factor 1 (LMF1) and GPI-anchored high-density lipoprotein (HDL)-binding protein 1 (GPIHBP1), were sequenced in 86 patients with type 1 and type 5 HTG and 327 controls. In 46 patients (54%), rare DNA sequence variants were identified, comprising variants in LPL (n = 19), APOC2 (n = 1), APOA5 (n = 2), GPIHBP1 (n = 3) and LMF1 (n = 8). In 22 patients (26%), only common variants in LPL (p.Asp36Asn, p.Asn318Ser and p.Ser474Ter) and APOA5 (p.Ser19Trp) could be identified, whereas no mutations were found in 18 patients (21%). In vitro validation revealed that the mutations in LMF1 were not associated with compromised LPL function. Consistent with this, five of the eight LMF1 variants were also found in controls and therefore cannot account for the observed phenotype. The prevalence of mutations in LPL was 34% and mostly restricted to patients with type 1 HTG. Mutations in GPIHBP1 (n = 3), APOC2 (n = 1) and APOA5 (n = 2) were rare but the associated clinical phenotype was severe. Routine sequencing of candidate genes in severe HTG has improved our understanding of the molecular basis of this phenotype associated with acute pancreatitis and may help to guide future individualized therapeutic strategies. © 2012 The Association for the Publication of the Journal of Internal Medicine.
Effect of Next-Generation Exome Sequencing Depth for Discovery of Diagnostic Variants.

PubMed

Kim, Kyung; Seong, Moon-Woo; Chung, Won-Hyong; Park, Sung Sup; Leem, Sangseob; Park, Won; Kim, Jihyun; Lee, KiYoung; Park, Rae Woong; Kim, Namshin

2015-06-01

Sequencing depth, which is directly related to the cost and time required for the generation, processing, and maintenance of next-generation sequencing data, is an important factor in the practical utilization of such data in clinical fields. Unfortunately, identifying an exome sequencing depth adequate for clinical use is a challenge that has not been addressed extensively. Here, we investigate the effect of exome sequencing depth on the discovery of sequence variants for clinical use. Toward this, we sequenced ten germ-line blood samples from breast cancer patients on the Illumina platform GAII(x) at a high depth of ~200×. We observed that most function-related diverse variants in the human exonic regions could be detected at a sequencing depth of 120×. Furthermore, investigation using a diagnostic gene set showed that the number of clinical variants identified using exome sequencing reached a plateau at an average sequencing depth of about 120×. Moreover, the phenomena were consistent across the breast cancer samples.

Protein-altering variants associated with body mass index implicate pathways that control energy intake and expenditure underpinning obesity

PubMed Central

Turcot, Valérie; Lu, Yingchang; Highland, Heather M; Schurmann, Claudia; Justice, Anne E; Fine, Rebecca S; Bradfield, Jonathan P; Esko, Tõnu; Giri, Ayush; Graff, Mariaelisa; Guo, Xiuqing; Hendricks, Audrey E; Karaderi, Tugce; Lempradl, Adelheid; Locke, Adam E; Mahajan, Anubha; Marouli, Eirini; Sivapalaratnam, Suthesh; Young, Kristin L; Alfred, Tamuno; Feitosa, Mary F; Masca, Nicholas GD; Manning, Alisa K; Medina-Gomez, Carolina; Mudgal, Poorva; Ng, Maggie CY; Reiner, Alex P; Vedantam, Sailaja; Willems, Sara M; Winkler, Thomas W; Abecasis, Goncalo; Aben, Katja K; Alam, Dewan S; Alharthi, Sameer E; Allison, Matthew; Amouyel, Philippe; Asselbergs, Folkert W; Auer, Paul L; Balkau, Beverley; Bang, Lia E; Barroso, Inês; Bastarache, Lisa; Benn, Marianne; Bergmann, Sven; Bielak, Lawrence F; Blüher, Matthias; Boehnke, Michael; Boeing, Heiner; Boerwinkle, Eric; Böger, Carsten A; Bork-Jensen, Jette; Bots, Michiel L; Bottinger, Erwin P; Bowden, Donald W; Brandslund, Ivan; Breen, Gerome; Brilliant, Murray H; Broer, Linda; Brumat, Marco; Burt, Amber A; Butterworth, Adam S; Campbell, Peter T; Cappellani, Stefania; Carey, David J; Catamo, Eulalia; Caulfield, Mark J; Chambers, John C; Chasman, Daniel I; Chen, Yii-Der Ida; Chowdhury, Rajiv; Christensen, Cramer; Chu, Audrey Y; Cocca, Massimiliano; Collins, Francis S; Cook, James P; Corley, Janie; Galbany, Jordi Corominas; Cox, Amanda J; Crosslin, David S; Cuellar-Partida, Gabriel; D'Eustacchio, Angela; Danesh, John; Davies, Gail; de Bakker, Paul IW; de Groot, Mark CH; de Mutsert, Renée; Deary, Ian J; Dedoussis, George; Demerath, Ellen W; den Heijer, Martin; den Hollander, Anneke I; den Ruijter, Hester M; Dennis, Joe G; Denny, Josh C; Di Angelantonio, Emanuele; Drenos, Fotios; Du, Mengmeng; Dubé, Marie-Pierre; Dunning, Alison M; Easton, Douglas F; Edwards, Todd L; Ellinghaus, David; Ellinor, Patrick T; Elliott, Paul; Evangelou, Evangelos; Farmaki, Aliki-Eleni; Farooqi, I. 
Sadaf; Faul, Jessica D; Fauser, Sascha; Feng, Shuang; Ferrannini, Ele; Ferrieres, Jean; Florez, Jose C; Ford, Ian; Fornage, Myriam; Franco, Oscar H; Franke, Andre; Franks, Paul W; Friedrich, Nele; Frikke-Schmidt, Ruth; Galesloot, Tessel E.; Gan, Wei; Gandin, Ilaria; Gasparini, Paolo; Gibson, Jane; Giedraitis, Vilmantas; Gjesing, Anette P; Gordon-Larsen, Penny; Gorski, Mathias; Grabe, Hans-Jörgen; Grant, Struan FA; Grarup, Niels; Griffiths, Helen L; Grove, Megan L; Gudnason, Vilmundur; Gustafsson, Stefan; Haessler, Jeff; Hakonarson, Hakon; Hammerschlag, Anke R; Hansen, Torben; Harris, Kathleen Mullan; Harris, Tamara B; Hattersley, Andrew T; Have, Christian T; Hayward, Caroline; He, Liang; Heard-Costa, Nancy L; Heath, Andrew C; Heid, Iris M; Helgeland, Øyvind; Hernesniemi, Jussi; Hewitt, Alex W; Holmen, Oddgeir L; Hovingh, G Kees; Howson, Joanna MM; Hu, Yao; Huang, Paul L; Huffman, Jennifer E; Ikram, M Arfan; Ingelsson, Erik; Jackson, Anne U; Jansson, Jan-Håkan; Jarvik, Gail P; Jensen, Gorm B; Jia, Yucheng; Johansson, Stefan; Jørgensen, Marit E; Jørgensen, Torben; Jukema, J Wouter; Kahali, Bratati; Kahn, René S; Kähönen, Mika; Kamstrup, Pia R; Kanoni, Stavroula; Kaprio, Jaakko; Karaleftheri, Maria; Kardia, Sharon LR; Karpe, Fredrik; Kathiresan, Sekar; Kee, Frank; Kiemeney, Lambertus A; Kim, Eric; Kitajima, Hidetoshi; Komulainen, Pirjo; Kooner, Jaspal S; Kooperberg, Charles; Korhonen, Tellervo; Kovacs, Peter; Kuivaniemi, Helena; Kutalik, Zoltán; Kuulasmaa, Kari; Kuusisto, Johanna; Laakso, Markku; Lakka, Timo A; Lamparter, David; Lange, Ethan M; Lange, Leslie A; Langenberg, Claudia; Larson, Eric B; Lee, Nanette R; Lehtimäki, Terho; Lewis, Cora E; Li, Huaixing; Li, Jin; Li-Gao, Ruifang; Lin, Honghuang; Lin, Keng-Hung; Lin, Li-An; Lin, Xu; Lind, Lars; Lindström, Jaana; Linneberg, Allan; Liu, Ching-Ti; Liu, Dajiang J; Liu, Yongmei; Lo, Ken Sin; Lophatananon, Artitaya; Lotery, Andrew J; Loukola, Anu; Luan, Jian'an; Lubitz, Steven A; Lyytikäinen, Leo-Pekka; Männistö, Satu; 
Marenne, Gaëlle; Mazul, Angela L; McCarthy, Mark I; McKean-Cowdin, Roberta; Medland, Sarah E; Meidtner, Karina; Milani, Lili; Mistry, Vanisha; Mitchell, Paul; Mohlke, Karen L; Moilanen, Leena; Moitry, Marie; Montgomery, Grant W; Mook-Kanamori, Dennis O; Moore, Carmel; Mori, Trevor A; Morris, Andrew D; Morris, Andrew P; Müller-Nurasyid, Martina; Munroe, Patricia B; Nalls, Mike A; Narisu, Narisu; Nelson, Christopher P; Neville, Matt; Nielsen, Sune F; Nikus, Kjell; Njølstad, Pål R; Nordestgaard, Børge G; Nyholt, Dale R; O'Connel, Jeffrey R; O’Donoghue, Michelle L.; Olde Loohuis, Loes M; Ophoff, Roel A; Owen, Katharine R; Packard, Chris J; Padmanabhan, Sandosh; Palmer, Colin NA; Palmer, Nicholette D; Pasterkamp, Gerard; Patel, Aniruddh P; Pattie, Alison; Pedersen, Oluf; Peissig, Peggy L; Peloso, Gina M; Pennell, Craig E; Perola, Markus; Perry, James A; Perry, John RB; Pers, Tune H; Person, Thomas N; Peters, Annette; Petersen, Eva RB; Peyser, Patricia A; Pirie, Ailith; Polasek, Ozren; Polderman, Tinca J; Puolijoki, Hannu; Raitakari, Olli T; Rasheed, Asif; Rauramaa, Rainer; Reilly, Dermot F; Renström, Frida; Rheinberger, Myriam; Ridker, Paul M; Rioux, John D; Rivas, Manuel A; Roberts, David J; Robertson, Neil R; Robino, Antonietta; Rolandsson, Olov; Rudan, Igor; Ruth, Katherine S; Saleheen, Danish; Salomaa, Veikko; Samani, Nilesh J; Sapkota, Yadav; Sattar, Naveed; Schoen, Robert E; Schreiner, Pamela J; Schulze, Matthias B; Scott, Robert A; Segura-Lepe, Marcelo P; Shah, Svati H; Sheu, Wayne H-H; Sim, Xueling; Slater, Andrew J; Small, Kerrin S; Smith, Albert Vernon; Southam, Lorraine; Spector, Timothy D; Speliotes, Elizabeth K; Starr, John M; Stefansson, Kari; Steinthorsdottir, Valgerdur; Stirrups, Kathleen E; Strauch, Konstantin; Stringham, Heather M; Stumvoll, Michael; Sun, Liang; Surendran, Praveen; Swift, Amy J; Tada, Hayato; Tansey, Katherine E; Tardif, Jean-Claude; Taylor, Kent D; Teumer, Alexander; Thompson, Deborah J; Thorleifsson, Gudmar; Thorsteinsdottir, Unnur; 
Thuesen, Betina H; Tönjes, Anke; Tromp, Gerard; Trompet, Stella; Tsafantakis, Emmanouil; Tuomilehto, Jaakko; Tybjaerg-Hansen, Anne; Tyrer, Jonathan P; Uher, Rudolf; Uitterlinden, André G; Uusitupa, Matti; van der Laan, Sander W; van Duijn, Cornelia M; van Leeuwen, Nienke; van Setten, Jessica; Vanhala, Mauno; Varbo, Anette; Varga, Tibor V; Varma, Rohit; Velez Edwards, Digna R; Vermeulen, Sita H; Veronesi, Giovanni; Vestergaard, Henrik; Vitart, Veronique; Vogt, Thomas F; Völker, Uwe; Vuckovic, Dragana; Wagenknecht, Lynne E; Walker, Mark; Wallentin, Lars; Wang, Feijie; Wang, Carol A; Wang, Shuai; Wang, Yiqin; Ware, Erin B; Wareham, Nicholas J; Warren, Helen R; Waterworth, Dawn M; Wessel, Jennifer; White, Harvey D; Willer, Cristen J; Wilson, James G; Witte, Daniel R; Wood, Andrew R; Wu, Ying; Yaghootkar, Hanieh; Yao, Jie; Yao, Pang; Yerges-Armstrong, Laura M; Young, Robin; Zeggini, Eleftheria; Zhan, Xiaowei; Zhang, Weihua; Zhao, Jing Hua; Zhao, Wei; Zhao, Wei; Zhou, Wei; Zondervan, Krina T; Rotter, Jerome I; Pospisilik, John A; Rivadeneira, Fernando; Borecki, Ingrid B; Deloukas, Panos; Frayling, Timothy M; Lettre, Guillaume; North, Kari E; Lindgren, Cecilia M; Hirschhorn, Joel N; Loos, Ruth JF

2018-01-01

Genome-wide association studies (GWAS) have identified >250 loci for body mass index (BMI), implicating pathways related to neuronal biology. Most GWAS loci represent clusters of common, non-coding variants from which pinpointing causal genes remains challenging. Here, we combined data from 718,734 individuals to discover rare and low-frequency (MAF<5%) coding variants associated with BMI. We identified 14 coding variants in 13 genes, of which eight in genes (ZBTB7B, ACHE, RAPGEF3, RAB21, ZFHX3, ENTPD6, ZFR2, ZNF169) newly implicated in human obesity, two (MC4R, KSR2) previously observed in extreme obesity, and two variants in GIPR. Effect sizes of rare variants are ~10 times larger than of common variants, with the largest effect observed in carriers of an MC4R stop-codon (p.Tyr35Ter, MAF=0.01%), weighing ~7kg more than non-carriers. Pathway analyses confirmed enrichment of neuronal genes and provide new evidence for adipocyte and energy expenditure biology, widening the potential of genetically-supported therapeutic targets to treat obesity. PMID:29273807
Validation and optimization of the Ion Torrent S5 XL sequencer and Oncomine workflow for BRCA1 and BRCA2 genetic testing.

PubMed

Shin, Saeam; Kim, Yoonjung; Chul Oh, Seoung; Yu, Nae; Lee, Seung-Tae; Rak Choi, Jong; Lee, Kyung-A

2017-05-23

In this study, we validated the analytical performance of BRCA1/2 sequencing using Ion Torrent's new bench-top sequencer with amplicon panel with optimized bioinformatics pipelines. Using 43 samples that were previously validated by Illumina's MiSeq platform and/or by Sanger sequencing/multiplex ligation-dependent probe amplification, we amplified the target with the Oncomine™ BRCA Research Assay and sequenced on Ion Torrent S5 XL (Thermo Fisher Scientific, Waltham, MA, USA). We compared two bioinformatics pipelines for optimal processing of S5 XL sequence data: the Torrent Suite with a plug-in Torrent Variant Caller (Thermo Fisher Scientific), and commercial NextGENe software (Softgenetics, State College, PA, USA). All expected 681 single nucleotide variants, 15 small indels, and three copy number variants were correctly called, except one common variant adjacent to a rare variant on the primer-binding site. The sensitivity, specificity, false positive rate, and accuracy for detection of single nucleotide variant and small indels of S5 XL sequencing were 99.85%, 100%, 0%, and 99.99% for the Torrent Variant Caller and 99.85%, 99.99%, 0.14%, and 99.99% for NextGENe, respectively. The reproducibility of variant calling was 100%, and the precision of variant frequency also showed good performance with coefficients of variation between 0.32 and 5.29%. We obtained highly accurate data through uniform and sufficient coverage depth over all target regions and through optimization of the bioinformatics pipeline. We confirmed that our platform is accurate and practical for diagnostic BRCA1/2 testing in a clinical laboratory.
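The performance figures in the abstract above follow from standard confusion-matrix definitions. The sketch below shows those definitions, assuming the single miscalled variant falls among the 696 expected single nucleotide variants and small indels (681 + 15); the true-negative and false-positive counts are hypothetical placeholders, since the abstract does not report them.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true variants that were called."""
    return tp / (tp + fn)

def specificity(tn: int, fp: int) -> float:
    """Fraction of non-variant positions correctly left uncalled."""
    return tn / (tn + fp)

def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all evaluated positions that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

# Counts from the abstract: 681 SNVs + 15 small indels expected, one missed.
expected_variants = 681 + 15
fn = 1
tp = expected_variants - fn

# Hypothetical counts for non-variant positions (not given in the abstract).
tn, fp = 10_000, 0

print(f"sensitivity: {100 * sensitivity(tp, fn):.3f}%")
print(f"specificity: {100 * specificity(tn, fp):.2f}%")
print(f"accuracy:    {100 * accuracy(tp, tn, fp, fn):.2f}%")
```

With 695 of 696 variants called, the sensitivity comes out just under 99.86%, which is in line with the reported 99.85% up to rounding.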
A comparative analysis of exome capture.

PubMed

Parla, Jennifer S; Iossifov, Ivan; Grabill, Ian; Spector, Mona S; Kramer, Melissa; McCombie, W Richard

2011-09-29

Human exome resequencing using commercial target capture kits has been and is being used for sequencing large numbers of individuals to search for variants associated with various human diseases. We rigorously evaluated the capabilities of two solution exome capture kits. These analyses help clarify the strengths and limitations of those data as well as systematically identify variables that should be considered in the use of those data. Each exome kit performed well at capturing the targets they were designed to capture, which mainly corresponds to the consensus coding sequences (CCDS) annotations of the human genome. In addition, based on their respective targets, each capture kit coupled with high coverage Illumina sequencing produced highly accurate nucleotide calls. However, other databases, such as the Reference Sequence collection (RefSeq), define the exome more broadly, and so not surprisingly, the exome kits did not capture these additional regions. Commercial exome capture kits provide a very efficient way to sequence select areas of the genome at very high accuracy. Here we provide the data to help guide critical analyses of sequencing data derived from these products.
Domestic animals as models for biomedical research.

PubMed

Andersson, Leif

2016-01-01

Domestic animals are unique models for biomedical research due to their long history (thousands of years) of strong phenotypic selection. This process has enriched for novel mutations that have contributed to phenotype evolution in domestic animals. The characterization of such mutations provides insights into gene function and biological mechanisms. This review summarizes genetic dissection of about 50 genetic variants affecting pigmentation, behaviour, metabolic regulation, and the pattern of locomotion. The variants are controlled by mutations in about 30 different genes, and for 10 of these our group was the first to report an association between the gene and a phenotype. Almost half of the reported mutations occur in non-coding sequences, suggesting that this is the most common type of polymorphism underlying phenotypic variation, since this is a biased list where the proportion of coding mutations is inflated as they are easier to find. The review documents that structural changes (duplications, deletions, and inversions) have contributed significantly to the evolution of phenotypic diversity in domestic animals. Finally, we describe five examples of evolution of alleles, which means that alleles have evolved by the accumulation of several consecutive mutations affecting the function of the same gene.
Domestic animals as models for biomedical research

PubMed Central

Andersson, Leif

2016-01-01

Domestic animals are unique models for biomedical research due to their long history (thousands of years) of strong phenotypic selection. This process has enriched for novel mutations that have contributed to phenotype evolution in domestic animals. The characterization of such mutations provides insights into gene function and biological mechanisms. This review summarizes genetic dissection of about 50 genetic variants affecting pigmentation, behaviour, metabolic regulation, and the pattern of locomotion. The variants are controlled by mutations in about 30 different genes, and for 10 of these our group was the first to report an association between the gene and a phenotype. Almost half of the reported mutations occur in non-coding sequences, suggesting that this is the most common type of polymorphism underlying phenotypic variation, since this is a biased list where the proportion of coding mutations is inflated as they are easier to find. The review documents that structural changes (duplications, deletions, and inversions) have contributed significantly to the evolution of phenotypic diversity in domestic animals. Finally, we describe five examples of evolution of alleles, which means that alleles have evolved by the accumulation of several consecutive mutations affecting the function of the same gene. PMID:26479863
Electron holes appear to trigger cancer-implicated mutations

NASA Astrophysics Data System (ADS)

Miller, John; Villagran, Martha

Malignant tumors are caused by mutations, which also affect their subsequent growth and evolution. We use a novel approach, computational DNA hole spectroscopy [M.Y. Suarez-Villagran & J.H. Miller, Sci. Rep. 5, 13571 (2015)], to compute spectra of enhanced hole probability based on actual sequence data. A hole is a mobile site of positive charge created when an electron is removed, for example by radiation or contact with a mutagenic agent. Peaks in the hole spectrum depict sites where holes tend to localize and potentially trigger a base pair mismatch during replication. Our studies reveal a correlation between hole spectrum peaks and spikes in human mutation frequencies. Importantly, we also find that hole peak positions that do not coincide with large variant frequencies often coincide with cancer-implicated mutations and/or (for coding DNA) encoded conserved amino acids. This enables combining hole spectra with variant data to identify critical base pairs and potential cancer 'driver' mutations. Such integration of DNA hole and variance spectra could also prove invaluable for pinpointing critical regions, and sites of driver mutations, in the vast non-protein-coding genome. Supported by the State of Texas through the Texas Ctr. for Superconductivity.
Deep-targeted exon sequencing reveals renal polymorphisms associate with postexercise hypotension among African Americans.

PubMed

Pescatello, Linda S; Schifano, Elizabeth D; Ash, Garrett I; Panza, Gregory A; Lamberti, Lauren; Chen, Ming-Hui; Deshpande, Ved; Zaleski, Amanda; Farinatti, Paulo; Taylor, Beth A; Thompson, Paul D

2016-10-01

We found that variants from the Angiotensin-Converting Enzyme (ACE), Angiotensin Type 1 Receptor (AGTR1), Aldosterone Synthase (CYP11B2), and Adducin (ADD1) genes exhibited intensity-dependent associations with the ambulatory blood pressure (BP) response following acute exercise, or postexercise hypotension (PEH). In a validation cohort, we sequenced exons from these genes for their associations with PEH. Obese (30.9 ± 3.6 kg m^-2) adults (n = 23; 61% African Americans [AF], 39% Caucasian) aged 42.0 ± 9.8 years with hypertension (139.8 ± 10.4/84.6 ± 6.2 mmHg) completed three random experiments: bouts of vigorous and moderate intensity cycling and control. Subjects wore an ambulatory BP monitor for 19 h. We performed deep-targeted exon sequencing using the Illumina TruSeq Custom Amplicon kit. Variant genotypes were coded as number of minor alleles (#MA) and selected for further statistical analysis based upon Bonferroni or Benjamini-Yekutieli multiple testing corrected p-values under time adjusted linear models for 19 hourly BP measurements per subject. After vigorous intensity over 19 h among ACE, AGTR1, CYP11B2, and ADD1 variants passing multiple testing thresholds, as the #MA increased, systolic (SBP) and/or diastolic BP decreased 12 mmHg (P = 4.5E-05) to 30 mmHg (P = 6.4E-04) among AF only. In contrast, after moderate intensity over 19 h among ACE and CYP11B2 variants passing multiple testing thresholds, as the #MA increased, SBP increased 21 mmHg (P = 8.0E-04) to 22 mmHg (P = 8.2E-04) among AF only. In this replication study, ACE, AGTR1, CYP11B2, and ADD1 variants exhibited associations with PEH after vigorous, but not moderate intensity exercise among AF only. Renal variants should be explored further with a multi-level "omics" approach for associations with PEH among a large, ethnically diverse sample of adults with hypertension. © 2016 The Authors. Physiological Reports published by Wiley Periodicals, Inc. on behalf of the American Physiological Society and The Physiological Society.
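The abstract above codes genotypes as the number of minor alleles (#MA) and screens variants with Bonferroni or Benjamini-Yekutieli corrections. The sketch below illustrates those two generic steps under stated assumptions: the genotype strings and the list of p-values are hypothetical (the four smallest p-values echo those quoted in the abstract, the rest are invented, and the true number of tests in the study is not given), and the functions are plain textbook versions rather than the authors' pipeline.

```python
def count_minor_alleles(genotype: str, minor_allele: str) -> int:
    """Code a biallelic genotype such as 'AG' as its number of minor alleles."""
    return genotype.count(minor_allele)

def bonferroni_reject(pvalues, alpha=0.05):
    """Reject each hypothesis whose p-value is at most alpha / m."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]

def benjamini_yekutieli_reject(pvalues, alpha=0.05):
    """Step-up procedure with the Benjamini-Yekutieli harmonic correction."""
    m = len(pvalues)
    c_m = sum(1.0 / i for i in range(1, m + 1))
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k whose p-value passes k * alpha / (m * c_m).
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / (m * c_m):
            max_rank = rank
    reject = [False] * m
    for idx in order[:max_rank]:
        reject[idx] = True
    return reject

# Hypothetical genotypes at one variant whose minor allele is 'G'.
minor_allele_counts = [count_minor_alleles(g, "G") for g in ["AA", "AG", "GG"]]

# Hypothetical set of eight tests for illustration.
pvals = [4.5e-05, 6.4e-04, 8.0e-04, 8.2e-04, 0.03, 0.2, 0.5, 0.9]
bonf = bonferroni_reject(pvals)
by = benjamini_yekutieli_reject(pvals)
print(minor_allele_counts)
print(bonf)
print(by)
```

Bonferroni controls the family-wise error rate, while Benjamini-Yekutieli controls the false discovery rate under arbitrary dependence between tests, which is why the two can select different sets of variants on larger test families.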
Identification of RNF213 as a susceptibility gene for moyamoya disease and its possible role in vascular development.

PubMed

Liu, Wanyang; Morito, Daisuke; Takashima, Seiji; Mineharu, Yohei; Kobayashi, Hatasu; Hitomi, Toshiaki; Hashikata, Hirokuni; Matsuura, Norio; Yamazaki, Satoru; Toyoda, Atsushi; Kikuta, Ken-ichiro; Takagi, Yasushi; Harada, Kouji H; Fujiyama, Asao; Herzig, Roman; Krischek, Boris; Zou, Liping; Kim, Jeong Eun; Kitakaze, Masafumi; Miyamoto, Susumu; Nagata, Kazuhiro; Hashimoto, Nobuo; Koizumi, Akio

2011-01-01

Moyamoya disease is an idiopathic vascular disorder of intracranial arteries. Its susceptibility locus has been mapped to 17q25.3 in Japanese families, but the susceptibility gene is unknown. Genome-wide linkage analysis in eight three-generation families with moyamoya disease revealed linkage to 17q25.3 (P < 10^-4). Fine mapping demonstrated a 1.5-Mb disease locus bounded by D17S1806 and rs2280147. We conducted exome analysis of the eight index cases in these families, with results filtered through Ng criteria. There was a variant of p.N321S in PCMTD1 and p.R4810K in RNF213 in the 1.5-Mb locus of the eight index cases. The p.N321S variant in PCMTD1 could not be confirmed by the Sanger method. Sequencing RNF213 in 42 index cases confirmed p.R4810K and revealed it to be the only unregistered variant. Genotyping 39 SNPs around RNF213 revealed a founder haplotype transmitted in 42 families. Sequencing the 260-kb region covering the founder haplotype in one index case did not show any coding variants except p.R4810K. A case-control study demonstrated strong association of p.R4810K with moyamoya disease in East Asian populations (251 cases and 707 controls) with an odds ratio of 111.8 (P = 10^-119). Sequencing of RNF213 in East Asian cases revealed additional novel variants: p.D4863N, p.E4950D, p.A5021V, p.D5160E, and p.E5176G. Among Caucasian cases, variants p.N3962D, p.D4013N, p.R4062Q and p.P4608S were identified. RNF213 encodes a 591-kDa cytosolic protein that possesses two functional domains: a Walker motif and a RING finger domain. These exhibit ATPase and ubiquitin ligase activities. Although the mutant alleles (p.R4810K or p.D4013N in the RING domain) did not affect transcription levels or ubiquitination activity, knockdown of RNF213 in zebrafish caused irregular wall formation in trunk arteries and abnormal sprouting vessels. We provide evidence suggesting, for the first time, the involvement of RNF213 in genetic susceptibility to moyamoya disease.
Factors influencing success of clinical genome sequencing across a broad spectrum of disorders

PubMed Central

Lise, Stefano; Broxholme, John; Cazier, Jean-Baptiste; Rimmer, Andy; Kanapin, Alexander; Lunter, Gerton; Fiddy, Simon; Allan, Chris; Aricescu, A. Radu; Attar, Moustafa; Babbs, Christian; Becq, Jennifer; Beeson, David; Bento, Celeste; Bignell, Patricia; Blair, Edward; Buckle, Veronica J; Bull, Katherine; Cais, Ondrej; Cario, Holger; Chapel, Helen; Copley, Richard R; Cornall, Richard; Craft, Jude; Dahan, Karin; Davenport, Emma E; Dendrou, Calliope; Devuyst, Olivier; Fenwick, Aimée L; Flint, Jonathan; Fugger, Lars; Gilbert, Rodney D; Goriely, Anne; Green, Angie; Greger, Ingo H.; Grocock, Russell; Gruszczyk, Anja V; Hastings, Robert; Hatton, Edouard; Higgs, Doug; Hill, Adrian; Holmes, Chris; Howard, Malcolm; Hughes, Linda; Humburg, Peter; Johnson, David; Karpe, Fredrik; Kingsbury, Zoya; Kini, Usha; Knight, Julian C; Krohn, Jonathan; Lamble, Sarah; Langman, Craig; Lonie, Lorne; Luck, Joshua; McCarthy, Davis; McGowan, Simon J; McMullin, Mary Frances; Miller, Kerry A; Murray, Lisa; Németh, Andrea H; Nesbit, M Andrew; Nutt, David; Ormondroyd, Elizabeth; Oturai, Annette Bang; Pagnamenta, Alistair; Patel, Smita Y; Percy, Melanie; Petousi, Nayia; Piazza, Paolo; Piret, Sian E; Polanco-Echeverry, Guadalupe; Popitsch, Niko; Powrie, Fiona; Pugh, Chris; Quek, Lynn; Robbins, Peter A; Robson, Kathryn; Russo, Alexandra; Sahgal, Natasha; van Schouwenburg, Pauline A; Schuh, Anna; Silverman, Earl; Simmons, Alison; Sørensen, Per Soelberg; Sweeney, Elizabeth; Taylor, John; Thakker, Rajesh V; Tomlinson, Ian; Trebes, Amy; Twigg, Stephen RF; Uhlig, Holm H; Vyas, Paresh; Vyse, Tim; Wall, Steven A; Watkins, Hugh; Whyte, Michael P; Witty, Lorna; Wright, Ben; Yau, Chris; Buck, David; Humphray, Sean; Ratcliffe, Peter J; Bell, John I; Wilkie, Andrew OM; Bentley, David; Donnelly, Peter; McVean, Gilean

2015-01-01

To assess factors influencing the success of whole genome sequencing for mainstream clinical diagnosis, we sequenced 217 individuals from 156 independent cases across a broad spectrum of disorders in whom prior screening had identified no pathogenic variants. We quantified the number of candidate variants identified using different strategies for variant calling, filtering, annotation and prioritisation. We found that jointly calling variants across samples, filtering against both local and external databases, deploying multiple annotation tools and using familial transmission above biological plausibility contributed to accuracy. Overall, we identified disease causing variants in 21% of cases, rising to 34% (23/68) for Mendelian disorders and 57% (8/14) in trios. We also discovered 32 potentially clinically actionable variants in 18 genes unrelated to the referral disorder, though only four were ultimately considered reportable. Our results demonstrate the value of genome sequencing for routine clinical diagnosis, but also highlight many outstanding challenges. PMID:25985138
Exome Sequence Analysis of 14 Families With High Myopia.

PubMed

Kloss, Bethany A; Tompson, Stuart W; Whisenhunt, Kristina N; Quow, Krystina L; Huang, Samuel J; Pavelec, Derek M; Rosenberg, Thomas; Young, Terri L

2017-04-01

To identify causal gene mutations in 14 families with autosomal dominant (AD) high myopia using exome sequencing. Select individuals from 14 large Caucasian families with high myopia were exome sequenced. Gene variants were filtered to identify potential pathogenic changes. Sanger sequencing was used to confirm variants in original DNA, and to test for disease cosegregation in additional family members. Candidate genes and chromosomal loci previously associated with myopic refractive error and its endophenotypes were comprehensively screened. In 14 high myopia families, we identified 73 rare and 31 novel gene variants as candidates for pathogenicity. In seven of these families, two of the novel and eight of the rare variants were within known myopia loci. A total of 104 heterozygous nonsynonymous rare variants in 104 genes were identified in 10 out of 14 probands. Each variant cosegregated with affection status. No rare variants were identified in genes known to cause myopia or in genes closest to published genome-wide association study association signals for refractive error or its endophenotypes. Whole exome sequencing was performed to determine gene variants implicated in the pathogenesis of AD high myopia. This study provides new genes for consideration in the pathogenesis of high myopia, and may aid in the development of genetic profiling of those at greatest risk for attendant ocular morbidities of this disorder.
BETASEQ: a powerful novel method to control type-I error inflation in partially sequenced data for rare variant association testing.

PubMed

Yan, Song; Li, Yun

2014-02-15

Despite its great capability to detect rare variant associations, next-generation sequencing is still prohibitively expensive when applied to large samples. In case-control studies, it is thus appealing to sequence only a subset of cases to discover variants and genotype the identified variants in controls and the remaining cases, under the reasonable assumption that causal variants are usually enriched among cases. However, this approach leads to inflated type-I error if analyzed naively for rare variant association. Several methods have been proposed in recent literature to control type-I error at the cost of either excluding some sequenced cases or correcting the genotypes of discovered rare variants. All of these approaches therefore suffer from some information loss and are underpowered. We propose a novel method (BETASEQ), which corrects inflation of type-I error by supplementing pseudo-variants while keeping the original sequence and genotype data intact. Extensive simulations and real data analysis demonstrate that, in most practical situations, BETASEQ leads to higher testing power than existing approaches with guaranteed (controlled or conservative) type-I error. BETASEQ and associated R files, including documentation and examples, are available at http://www.unc.edu/~yunmli/betaseq
Using whole-exome sequencing to identify variants inherited from mosaic parents

PubMed Central

Rios, Jonathan J; Delgado, Mauricio R

2015-01-01

Whole-exome sequencing (WES) has allowed the discovery of genes and variants causing rare human disease. This is often achieved by comparing nonsynonymous variants between unrelated patients, and particularly for sporadic or recessive disease, often identifies a single or few candidate genes for further consideration. However, despite the potential for this approach to elucidate the genetic cause of rare human disease, a majority of patients fail to realize a genetic diagnosis using standard exome analysis methods. Although genetic heterogeneity contributes to the difficulty of exome sequence analysis between patients, it remains plausible that rare human disease is not caused by de novo or recessive variants. Multiple human disorders have been described for which the variant was inherited from a phenotypically normal mosaic parent. Here we highlight the potential for exome sequencing to identify a reasonable number of candidate genes when dominant disease variants are inherited from a mosaic parent. We show the power of WES to identify a limited number of candidate genes using this disease model and how sequence coverage affects identification of mosaic variants by WES. We propose this analysis as an alternative to discover genetic causes of rare human disorders for which typical WES approaches fail to identify likely pathogenic variants. PMID:24986828
Telomere extension by telomerase and ALT generates variant repeats by mechanistically distinct processes

PubMed Central

Lee, Michael; Hills, Mark; Conomos, Dimitri; Stutz, Michael D.; Dagg, Rebecca A.; Lau, Loretta M.S.; Reddel, Roger R.; Pickett, Hilda A.

2014-01-01

Telomeres are terminal repetitive DNA sequences on chromosomes, and are considered to comprise almost exclusively hexameric TTAGGG repeats. We have evaluated telomere sequence content in human cells using whole-genome sequencing followed by telomere read extraction in a panel of mortal cell strains and immortal cell lines. We identified a wide range of telomere variant repeats in human cells, and found evidence that variant repeats are generated by mechanistically distinct processes during telomerase- and ALT-mediated telomere lengthening. Telomerase-mediated telomere extension resulted in biased repeat synthesis of variant repeats that differed from the canonical sequence at positions 1 and 3, but not at positions 2, 4, 5 or 6. This indicates that telomerase is most likely an error-prone reverse transcriptase that misincorporates nucleotides at specific positions on the telomerase RNA template. In contrast, cell lines that use the ALT pathway contained a large range of variant repeats that varied greatly between lines. This is consistent with variant repeats spreading from proximal telomeric regions throughout telomeres in a stochastic manner by recombination-mediated templating of DNA synthesis. The presence of unexpectedly large numbers of variant repeats in cells utilizing either telomere maintenance mechanism suggests a conserved role for variant sequences at human telomeres. PMID:24225324
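The positional bias described above (variants differing from TTAGGG at positions 1 and 3) can be illustrated with a small tally of single-mismatch hexamers. This is only a sketch under a simplifying assumption that the read is already in frame with the canonical repeat; it is not the authors' read-extraction pipeline, and the function names are hypothetical.

```python
from collections import Counter

CANONICAL = "TTAGGG"


def mismatch_positions(hexamer, canonical=CANONICAL):
    """Return the 1-based positions at which a hexamer differs from the canonical repeat."""
    if len(hexamer) != len(canonical):
        raise ValueError("hexamer must have the same length as the canonical repeat")
    return [i + 1 for i, (a, b) in enumerate(zip(hexamer.upper(), canonical)) if a != b]


def tally_variant_positions(read, canonical=CANONICAL):
    """Count single-mismatch variant repeats by the position of the mismatch.

    Assumes the read starts in frame with the repeat; any trailing partial
    hexamer is ignored, as are hexamers with more than one mismatch.
    """
    size = len(canonical)
    counts = Counter()
    for start in range(0, len(read) - size + 1, size):
        positions = mismatch_positions(read[start:start + size], canonical)
        if len(positions) == 1:
            counts[positions[0]] += 1
    return counts
```

Applied to a toy read such as `"TTAGGG" + "GTAGGG" + "TTGGGG"`, the tally records one variant at position 1 and one at position 3. Real telomere reads would also need frame detection and handling of sequencing errors.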
Resequencing of the vesicular glutamate transporter 2 gene (VGLUT2) reveals some rare genetic variants that may increase the genetic burden in schizophrenia.

PubMed

Shen, Yu-Chih; Liao, Ding-Lieh; Lu, Chao-Lin; Chen, Jen-Yeu; Liou, Ying-Jay; Chen, Tzu-Ting; Chen, Chia-Hsiang

2010-08-01

Vesicular glutamate transporters (VGLUT1-3) package glutamate into vesicles in the presynaptic terminal and regulate the release of glutamate. In mesencephalic dopamine neuron culture, the majority of isolated dopamine neurons have been shown to express VGLUT2, but not VGLUT1 or 3. In relation to the glutamatergic dysregulation hypothesis of schizophrenia, the gene encoding VGLUT2 is the most plausible candidate involved in the pathogenesis of this illness. We searched for genetic variants in the promoter region and 12 exons (including UTR ends) of the VGLUT2 gene using direct sequencing in a sample of Han Chinese schizophrenic patients (n=375) and non-psychotic controls (n=366) from Taiwan, and conducted a case-control association study. We identified 8 common SNPs in the VGLUT2 gene. SNP and haplotype-based analyses showed no association with schizophrenia. In addition, we identified 9 rare variants in 13 out of 375 patients, including 3 variants located at the promoter region, 2 synonymous variants located at protein coding regions, and 4 variants located at UTR ends. No rare variants were found in the control subjects. Collectively, these rare variants were significantly overrepresented in the patient group (3.5% versus 0%; Fisher's exact test, p = 2.3 × 10^-5), suggesting they may contribute to the pathogenesis of schizophrenia. Although the functional significance of these rare variants remains to be characterized, our study may lend support to the multiple rare mutations hypothesis of schizophrenia, and may provide genetic clues to indicate the involvement of the glutamate transmission pathway in the pathogenesis of schizophrenia. Copyright 2010 Elsevier B.V. All rights reserved.
Human T-cell lymphotropic virus type 1 subtype C molecular variants among Indigenous Australians: new insights into the molecular epidemiology of HTLV-1 in Australo-Melanesia.

PubMed

Cassar, Olivier; Einsiedel, Lloyd; Afonso, Philippe V; Gessain, Antoine

2013-01-01

HTLV-1 infection is endemic among people of Melanesian descent in Papua New Guinea, the Solomon Islands and Vanuatu. Molecular studies reveal that these Melanesian strains belong to the highly divergent HTLV-1c subtype. In Australia, HTLV-1 is also endemic among the Indigenous people of central Australia; however, the molecular epidemiology of HTLV-1 infection in this population remains poorly documented. Studying a series of 23 HTLV-1 strains from Indigenous residents of central Australia, we analyzed coding (gag, pol, env, tax) and non-coding (LTR) genomic proviral regions. Four complete HTLV-1 proviral sequences were also characterized. Phylogenetic analyses implemented with both Neighbor-Joining and Maximum Likelihood methods revealed that all proviral strains belong to the HTLV-1c subtype with a high genetic diversity, which varied with the geographic origin of the infected individuals. Two distinct Australian clades were found, the first including strains derived from most patients whose origins are in the North, and the second comprising a majority of those from the South of central Australia. Time divergence estimation suggests that the speciation of these two Australian clades probably occurred 9,120 years ago (38,000-4,500). The HTLV-1c subtype is endemic to central Australia, where the Indigenous population is infected with diverse subtype c variants. At least two Australian clades exist, which cluster according to the geographic origin of the human hosts. These molecular variants are probably of very ancient origin. Further studies could provide new insights into the evolution and modes of dissemination of these retrovirus variants and the associated ancient migration events through which early human settlement of Australia and Melanesia was achieved.
QUES, a new Phaseolus vulgaris genotype resistant to common bean weevils, contains the Arcelin-8 allele coding for new lectin-related variants.

PubMed

Zaugg, Isabelle; Magni, Chiara; Panzeri, Dario; Daminati, Maria Gloria; Bollini, Roberto; Benrey, Betty; Bacher, Sven; Sparvoli, Francesca

2013-03-01

In common bean (Phaseolus vulgaris L.), the most abundant seed proteins are the storage protein phaseolin and the family of closely related APA proteins (arcelin, phytohemagglutinin and α-amylase inhibitor). High variation in APA protein composition has been described and the presence of arcelin (Arc) has been associated with bean resistance against two bruchid beetles, the bean weevil (Acanthoscelides obtectus Say) and the Mexican bean weevil (Zabrotes subfasciatus Boheman). So far, seven Arc variants have been identified, all in wild accessions; however, only those containing Arc-4 were reported to be resistant to both species. Although many efforts have been made, successful breeding of this genetic trait into cultivated genotypes has not yet been achieved. Here, we describe a newly collected wild accession (named QUES) and demonstrate its resistance to both A. obtectus and Z. subfasciatus. Immunological and proteomic analyses of QUES seed protein composition indicated the presence of new Arc and arcelin-like (ARL) polypeptides of about 30 and 27 kDa, respectively. Sequencing of cDNAs coding for QUES APA proteins confirmed that this accession contains new APA variants, here referred to as Arc-8 and ARL-8. Moreover, bioinformatic analysis showed the two proteins are closely related to APA components present in the G12949 wild bean accession, which contains the Arc-4 variant. The presence of these new APA components, combined with the observations that they are poorly digested and remain very abundant in A. obtectus feces, so-called frass, suggests that the QUES APA locus is involved in the bruchid resistance. Moreover, molecular analysis indicated a lower complexity of the locus compared to that of G12949, suggesting that QUES should be considered a valuable source of resistance for further breeding purposes.
Human papillomavirus type 16 variants in cervical intraepithelial neoplasia and invasive carcinoma in San Luis Potosí City, Mexico

PubMed Central

López-Revilla, Rubén; Pineda, Marco A; Ortiz-Valdez, Julio; Sánchez-Garza, Mireya; Riego, Lina

2009-01-01

Background: In San Luis Potosí City, cervical infection by human papillomavirus type 16 (HPV16) associated with dysplastic lesions is more prevalent in younger women. In this work, HPV16 subtypes and variants associated with low-grade intraepithelial lesions (LSIL), high-grade intraepithelial lesions (HSIL) and invasive cervical cancer (ICC) of 38 women residing in San Luis Potosí City were identified by comparing their E6 open reading frame sequences. Results: Three European (E) variants (E-P, n = 27; E-T350G, n = 7; E-C188G, n = 2) and one AA-a variant (n = 2) were identified among the 38 HPV16 sequences analyzed. E-P variant sequences contained 23 single nucleotide changes, two of which (A334G, A404T) had not been described before and allowed the phylogenetic separation from the other variants. E-P A334G sequences were the most prevalent (22 cases, 57.9%), followed by the E-P Ref prototype (8 cases, 21.1%) and E-P A404T (1 case, 2.6%) sequences. The HSIL + ICC fraction was 0.21 for the E-P A334G variants and 0.00 for the E-P Ref variants. Conclusion: We conclude that in the women included in this study the HPV16 E subtype is 19 times more frequent than the AA subtype; that the circulating E variants are E-P (71.1%) > E-T350G (18.4%) > E-C188G (5.3%); and that 71.0% of the E-P sequences carry the A334G single nucleotide change and appear to correspond to an HPV16 variant characteristic of San Luis Potosí City that is more oncogenic than the E-P Ref prototype. PMID:19216802
De novo assembly of a haplotype-resolved human genome.

PubMed

Cao, Hongzhi; Wu, Honglong; Luo, Ruibang; Huang, Shujia; Sun, Yuhui; Tong, Xin; Xie, Yinlong; Liu, Binghang; Yang, Hailong; Zheng, Hancheng; Li, Jian; Li, Bo; Wang, Yu; Yang, Fang; Sun, Peng; Liu, Siyang; Gao, Peng; Huang, Haodong; Sun, Jing; Chen, Dan; He, Guangzhu; Huang, Weihua; Huang, Zheng; Li, Yue; Tellier, Laurent C A M; Liu, Xiao; Feng, Qiang; Xu, Xun; Zhang, Xiuqing; Bolund, Lars; Krogh, Anders; Kristiansen, Karsten; Drmanac, Radoje; Drmanac, Snezana; Nielsen, Rasmus; Li, Songgang; Wang, Jian; Yang, Huanming; Li, Yingrui; Wong, Gane Ka-Shu; Wang, Jun

2015-06-01

The human genome is diploid, and knowledge of the variants on each chromosome is important for the interpretation of genomic information. Here we report the assembly of a haplotype-resolved diploid genome without using a reference genome. Our pipeline relies on fosmid pooling together with whole-genome shotgun strategies, based solely on next-generation sequencing and hierarchical assembly methods. We applied our sequencing method to the genome of an Asian individual and generated a 5.15-Gb assembled genome with a haplotype N50 of 484 kb. Our analysis identified previously undetected indels and 7.49 Mb of novel coding sequences that could not be aligned to the human reference genome, which include at least six predicted genes. This haplotype-resolved genome represents the most complete de novo human genome assembly to date. Application of our approach to identify individual haplotype differences should aid in translating genotypes to phenotypes for the development of personalized medicine.
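The "haplotype N50 of 484 kb" reported above uses the standard N50 summary statistic: the length L such that blocks of length L or longer together cover at least half of the total assembled length. A minimal sketch of that calculation follows; the function name is hypothetical and the lengths in the usage note are arbitrary toy values, not data from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of block (or contig) lengths.

    Blocks are visited from longest to shortest; the N50 is the length of
    the block at which the running total first reaches half of the sum.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
```

For the toy list `[2, 3, 4, 5, 6]` the total is 20; the two longest blocks (6 and 5) cover 11, which is more than half, so `n50` returns 5.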
Permanent Neonatal Diabetes Caused by Creation of an Ectopic Splice Site within the INS Gene

PubMed Central

Gastaldo, Elena; Harries, Lorna W.; Rubio-Cabezas, Oscar; Castaño, Luis

2012-01-01

Background: The aim of this study was to characterize the genetic etiology in a patient who presented with permanent neonatal diabetes at 2 months of age. Methodology/Principal Findings: Regulatory elements and coding exons 2 and 3 of the INS gene were amplified and sequenced from genomic and complementary DNA samples. A novel heterozygous INS mutation within the terminal intron of the gene was identified in the proband and her affected father. This mutation introduces an ectopic splice site leading to the insertion of 29 nucleotides from the intronic sequence into the mature mRNA, which results in a longer and abnormal transcript. Conclusions/Significance: This study highlights the importance of routinely sequencing the exon-intron boundaries and the need to carry out additional studies to confirm the pathogenicity of any identified intronic genetic variants. PMID:22235272
Low incidence of SNVs and indels in trio genomes of Cas9-mediated multiplex edited sheep.

PubMed

Wang, Xiaolong; Liu, Jing; Niu, Yiyuan; Li, Yan; Zhou, Shiwei; Li, Chao; Ma, Baohua; Kou, Qifang; Petersen, Bjoern; Sonstegard, Tad; Huang, Xingxu; Jiang, Yu; Chen, Yulin

2018-05-25

The simplicity of the CRISPR/Cas9 system has enabled its widespread application in generating animal models, functional genomic screening and in treating genetic and infectious diseases. However, unintended mutations produced by off-target CRISPR/Cas9 nuclease activity may lead to negative consequences. In particular, a very recent study reporting that gene editing can introduce hundreds of unintended mutations into the genome has attracted wide attention. To address these off-target concerns, characterization of CRISPR/Cas9-mediated off-target mutagenesis is urgently needed. Here we took advantage of our previously generated gene-edited sheep and performed family trio-based whole genome sequencing, which is capable of discriminating variants in the edited progenies that are inherited, naturally generated, or induced by genetic modification. Three family trios were re-sequenced at a high average depth of genomic coverage (~25.8×). After developing a pipeline to comprehensively analyze the sequence data for de novo single nucleotide variants, indels and structural variations, we found only a single unintended event: a 2.4 kb inversion induced by site-specific double-strand breaks between two sgRNA targeting sites at the MSTN locus, occurring at low incidence. We provide the first report on the fidelity of CRISPR-based modification for sheep genomes targeted simultaneously for gene breaks at three coding sequence locations. The trio-based sequencing approach revealed almost negligible off-target modifications, providing timely evidence of the safe application of genome editing in vivo with CRISPR/Cas9.
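The trio logic described above, separating inherited variants from those arising newly in the offspring, can be sketched with simple set operations on variant calls. This is an illustrative simplification and not the authors' pipeline: the `Variant` record, the function names, and the `window` parameter are all hypothetical, and real analyses also account for genotype quality, coverage and structural variants.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """Hypothetical minimal variant record: chromosome, position, ref and alt alleles."""
    chrom: str
    pos: int
    ref: str
    alt: str


def classify_offspring_variants(child, sire, dam):
    """Split the child's variant set into inherited and candidate de novo variants.

    A variant seen in either parent is treated as inherited; a variant seen
    in neither parent is a de novo candidate.
    """
    inherited = child & (sire | dam)
    candidate_de_novo = child - sire - dam
    return inherited, candidate_de_novo


def near_any_target(variant, target_sites, window=1000):
    """Flag a variant lying within `window` bp of any (chrom, pos) editing target.

    Proximity to a target site is used here only as a rough, assumed proxy
    for separating editing-related events from spontaneous mutations.
    """
    return any(
        variant.chrom == chrom and abs(variant.pos - pos) <= window
        for chrom, pos in target_sites
    )
```

In use, the three arguments to `classify_offspring_variants` would be sets of `Variant` records built from each animal's call set, and `near_any_target` could then be applied to each de novo candidate with a list of sgRNA target coordinates.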
