Sample records for acid sequence variants

  1. αIIbβ3 variants defined by next-generation sequencing: Predicting variants likely to cause Glanzmann thrombasthenia

    PubMed Central

    Buitrago, Lorena; Rendon, Augusto; Liang, Yupu; Simeoni, Ilenia; Negri, Ana; Filizola, Marta; Ouwehand, Willem H.; Coller, Barry S.; Alessi, Marie-Christine; Ballmaier, Matthias; Bariana, Tadbir; Bellissimo, Daniel; Bertoli, Marta; Bray, Paul; Bury, Loredana; Carrell, Robin; Cattaneo, Marco; Collins, Peter; French, Deborah; Favier, Remi; Freson, Kathleen; Furie, Bruce; Germeshausen, Manuela; Ghevaert, Cedric; Gomez, Keith; Goodeve, Anne; Gresele, Paolo; Guerrero, Jose; Hampshire, Dan J.; Hadinnapola, Charaka; Heemskerk, Johan; Henskens, Yvonne; Hill, Marian; Hogg, Nancy; Johnsen, Jill; Kahr, Walter; Kerr, Ron; Kunishima, Shinji; Laffan, Michael; Natwani, Amit; Neerman-Arbez, Marguerite; Nurden, Paquita; Nurden, Alan; Ormiston, Mark; Othman, Maha; Ouwehand, Willem; Perry, David; Vilk, Shoshana Ravel; Reitsma, Pieter; Rondina, Matthew; Simeoni, Ilenia; Smethurst, Peter; Stephens, Jonathan; Stevenson, William; Szkotak, Artur; Turro, Ernest; Van Geet, Christel; Vries, Minka; Ward, June; Waye, John; Westbury, Sarah; Whiteheart, Sidney; Wilcox, David; Zhang, Bi

    2015-01-01

    Next-generation sequencing is transforming our understanding of human genetic variation but assessing the functional impact of novel variants presents challenges. We analyzed missense variants in the integrin αIIbβ3 receptor subunit genes ITGA2B and ITGB3 identified by whole-exome or -genome sequencing in the ThromboGenomics project, comprising ∼32,000 alleles from 16,108 individuals. We analyzed the results in comparison with 111 missense variants in these genes previously reported as being associated with Glanzmann thrombasthenia (GT), 20 associated with alloimmune thrombocytopenia, and 5 associated with aniso/macrothrombocytopenia. We identified 114 novel missense variants in ITGA2B (affecting ∼11% of the amino acids) and 68 novel missense variants in ITGB3 (affecting ∼9% of the amino acids). Of the variants, 96% had minor allele frequencies (MAF) < 0.1%, indicating their rarity. Based on sequence conservation, MAF, and location on a complete model of αIIbβ3, we selected three novel variants that affect amino acids previously associated with GT for expression in HEK293 cells. αIIb P176H and β3 C547G severely reduced αIIbβ3 expression, whereas αIIb P943A partially reduced αIIbβ3 expression and had no effect on fibrinogen binding. We used receiver operating characteristic curves of combined annotation-dependent depletion, Polyphen 2-HDIV, and sorting intolerant from tolerant to estimate the percentage of novel variants likely to be deleterious. At optimal cut-off values, which had 69–98% sensitivity in detecting GT mutations, between 27% and 71% of the novel αIIb or β3 missense variants were predicted to be deleterious. Our data have implications for understanding the evolutionary pressure on αIIbβ3 and highlight the challenges in predicting the clinical significance of novel missense variants. PMID:25827233

  2. Efficient analysis of mouse genome sequences reveal many nonsense variants

    PubMed Central

    Steeland, Sophie; Timmermans, Steven; Van Ryckeghem, Sara; Hulpiau, Paco; Saeys, Yvan; Van Montagu, Marc; Vandenbroucke, Roosmarijn E.; Libert, Claude

    2016-01-01

    Genetic polymorphisms in coding genes play an important role when using mouse inbred strains as research models. They have been shown to influence research results, explain phenotypical differences between inbred strains, and increase the amount of interesting gene variants present in the many available inbred lines. SPRET/Ei is an inbred strain derived from Mus spretus that has ∼1% sequence difference with the C57BL/6J reference genome. We obtained a listing of all SNPs and insertions/deletions (indels) present in SPRET/Ei from the Mouse Genomes Project (Wellcome Trust Sanger Institute) and processed these data to obtain an overview of all transcripts having nonsynonymous coding sequence variants. We identified 8,883 unique variants affecting 10,096 different transcripts from 6,328 protein-coding genes, which is about 28% of all coding genes. Because only a subset of these variants results in drastic changes in proteins, we focused on variations that are nonsense mutations that ultimately resulted in a gain of a stop codon. These genes were identified by in silico changing the C57BL/6J coding sequences to the SPRET/Ei sequences, converting them to amino acid (AA) sequences, and comparing the AA sequences. All variants and transcripts affected were also stored in a database, which can be browsed using a SPRET/Ei M. spretus variants web tool (www.spretus.org), including a manual. We validated the tool by demonstrating the loss of function of three proteins predicted to be severely truncated, namely Fas, IRAK2, and IFNγR1. PMID:27147605
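
    The in silico step described in this abstract (substitute the SPRET/Ei allele into the C57BL/6J coding sequence, translate both, and compare the amino acid sequences) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names and the toy coding sequence are hypothetical, only single-nucleotide substitutions are handled, and the standard genetic code is assumed.

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds):
    """Translate a coding sequence codon by codon; '*' marks a stop codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


def apply_snv(cds, pos, alt):
    """Return the sequence with the base at 1-based position `pos` set to `alt`."""
    return cds[:pos - 1] + alt + cds[pos:]


def first_stop(protein):
    """Return the 1-based codon number of the first stop, or None if absent."""
    idx = protein.find("*")
    return None if idx == -1 else idx + 1


def gained_stop(ref_cds, alt_cds):
    """Return the codon number of a premature stop present only in alt_cds."""
    ref_stop = first_stop(translate(ref_cds))
    alt_stop = first_stop(translate(alt_cds))
    if alt_stop is None:
        return None
    if ref_stop is None or alt_stop < ref_stop:
        return alt_stop
    return None


# Hypothetical toy reference: ATG TAT CGA TGG TAA (M Y R W stop).
ref = "ATGTATCGATGGTAA"
nonsense = apply_snv(ref, 6, "G")    # TAT -> TAG, a stop gain at codon 2
synonymous = apply_snv(ref, 9, "G")  # CGA -> CGG, still Arg

print(translate(ref))
print(gained_stop(ref, nonsense))
print(gained_stop(ref, synonymous))
```

    Comparing the position of the first stop codon, rather than the full protein strings, keeps the check focused on truncating changes; a real analysis would also need to handle indels, multiple variants per transcript, and strand orientation.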

  3. Fast single-pass alignment and variant calling using sequencing data

    USDA-ARS's Scientific Manuscript database

    Sequencing research requires efficient computation. Few programs use already known information about DNA variants when aligning sequence data to the reference map. New program findmap.f90 reads the previous variant list before aligning sequence, calling variant alleles, and summing the allele counts...

  4. Novel sequence variants in the TMIE gene in families with autosomal recessive nonsyndromic hearing impairment

    PubMed Central

    Santos, Regie Lyn P.; El-Shanti, Hatem; Sikandar, Shaheen; Lee, Kwanghyuk; Bhatti, Attya; Yan, Kai; Chahrour, Maria H.; McArthur, Nathan; Pham, Thanh L.; Mahasneh, Amjad Abdullah; Ahmad, Wasim

    2010-01-01

    To date, 37 genes have been identified for nonsyndromic hearing impairment (NSHI). Identifying the functional sequence variants within these genes and knowing their population-specific frequencies is of public health value, in particular for genetic screening for NSHI. To determine putatively functional sequence variants in the transmembrane inner ear (TMIE) gene in Pakistani and Jordanian families with autosomal recessive (AR) NSHI, four Jordanian and 168 Pakistani families with ARNSHI that is not due to GJB2 (CX26) were submitted to a genome scan. Two-point and multipoint parametric linkage analyses were performed, and families with logarithmic odds (LOD) scores of 1.0 or greater within the TMIE region underwent further DNA sequencing. The evolutionary conservation and location in predicted protein domains of amino acid residues where sequence variants occurred were studied to elucidate the possible effects of these sequence variants on function. Of seven families that were screened for TMIE, putatively functional sequence variants were found to segregate with hearing impairment in four families but were not seen in at least 110 ethnically matched control chromosomes. The previously reported c.241C>T (p.R81C) variant was observed in two Pakistani families. Two novel variants, c.92A>G (p.E31G) and the splice site mutation c.212-2A>C, were identified in one Pakistani and one Jordanian family, respectively. The c.92A>G (p.E31G) variant occurred at a residue that is conserved in the mouse and is predicted to be extracellular. Conservation and potential functionality of previously published mutations were also examined. The prevalence of functional TMIE variants in Pakistani families is 1.7% [95% confidence interval (CI) 0.3–4.8]. Further studies on the spectrum, prevalence rates, and functional effect of sequence variants in the TMIE gene in other populations should demonstrate the true importance of this gene as a cause of hearing impairment. PMID:16389551

  5. mirVAFC: A Web Server for Prioritizations of Pathogenic Sequence Variants from Exome Sequencing Data via Classifications.

    PubMed

    Li, Zhongshan; Liu, Zhenwei; Jiang, Yi; Chen, Denghui; Ran, Xia; Sun, Zhong Sheng; Wu, Jinyu

    2017-01-01

    Exome sequencing has been widely used to identify the genetic variants underlying human genetic disorders for clinical diagnoses, but the identification of pathogenic sequence variants among the huge number of benign ones is complicated and challenging. Here, we describe a new Web server named mirVAFC for prioritizing pathogenic sequence variants from clinical exome sequencing (CES) variant data of a single individual or family. mirVAFC can comprehensively annotate sequence variants, filter out most irrelevant variants using custom criteria, classify variants into categories by estimated pathogenicity, and finally prioritize pathogenic variants based on classifications and mutation effects. Case studies using different types of datasets for different diseases, from publications and our in-house data, have shown that mirVAFC can efficiently identify the same pathogenic candidates as the original work in each case. Overall, the Web server mirVAFC is specifically developed for identifying pathogenic sequence variants from family-based CES variants using classification-based prioritization. The mirVAFC Web server is freely accessible at https://www.wzgenomics.cn/mirVAFC/. © 2016 WILEY PERIODICALS, INC.

  6. Variants of beta-glucosidase

    DOEpatents

    Fidantsef, Ana; Lamsa, Michael; Gorre-Clancy, Brian

    2015-07-14

    The present invention relates to variants of a parent beta-glucosidase, comprising a substitution at one or more positions corresponding to positions 142, 183, 266, and 703 of amino acids 1 to 842 of SEQ ID NO: 2 or corresponding to positions 142, 183, 266, and 705 of amino acids 1 to 844 of SEQ ID NO: 70, wherein the variant has beta-glucosidase activity. The present invention also relates to nucleotide sequences encoding the variant beta-glucosidases and to nucleic acid constructs, vectors, and host cells comprising the nucleotide sequences.

  7. Variants of beta-glucosidases

    DOEpatents

    Fidantsef, Ana; Lamsa, Michael; Gorre-Clancy, Brian

    2014-10-07

    The present invention relates to variants of a parent beta-glucosidase, comprising a substitution at one or more positions corresponding to positions 142, 183, 266, and 703 of amino acids 1 to 842 of SEQ ID NO: 2 or corresponding to positions 142, 183, 266, and 705 of amino acids 1 to 844 of SEQ ID NO: 70, wherein the variant has beta-glucosidase activity. The present invention also relates to nucleotide sequences encoding the variant beta-glucosidases and to nucleic acid constructs, vectors, and host cells comprising the nucleotide sequences.

  8. Variants of beta-glucosidase

    DOEpatents

    Fidantsef, Ana [Davis, CA]; Lamsa, Michael [Davis, CA]; Gorre-Clancy, Brian [Elk Grove, CA]

    2009-12-29

    The present invention relates to variants of a parent beta-glucosidase, comprising a substitution at one or more positions corresponding to positions 142, 183, 266, and 703 of amino acids 1 to 842 of SEQ ID NO: 2 or corresponding to positions 142, 183, 266, and 705 of amino acids 1 to 844 of SEQ ID NO: 70, wherein the variant has beta-glucosidase activity. The present invention also relates to nucleotide sequences encoding the variant beta-glucosidases and to nucleic acid constructs, vectors, and host cells comprising the nucleotide sequences.

  9. Selecting sequence variants to improve genomic predictions for dairy cattle

    USDA-ARS's Scientific Manuscript database

    Millions of genetic variants have been identified by population-scale sequencing projects, but subsets are needed for routine genomic predictions or to include on genotyping arrays. Methods of selecting sequence variants were compared using both simulated sequence genotypes and actual data from run ...

  10. Identification of rare paired box 3 variant in strabismus by whole exome sequencing.

    PubMed

    Gong, Hui-Min; Wang, Jing; Xu, Jing; Zhou, Zhan-Yu; Li, Jing-Wen; Chen, Shu-Fang

    2017-01-01

    To identify the potentially pathogenic gene variants that contribute to the etiology of strabismus. A Chinese pedigree with strabismus was collected and the exomes of two affected individuals were sequenced using the next-generation sequencing technology. The resulting variants from exome sequencing were filtered by subsequent bioinformatics methods and the candidate mutation was verified as heterozygous in the affected proposita and her mother by Sanger sequencing. Whole exome sequencing and filtering identified a nonsynonymous mutation, a c.434G>T transversion in paired box 3 (PAX3), in the two affected individuals, which was predicted to be deleterious by more than 4 bioinformatics programs. This altered amino acid residue was located in the conserved PAX domain of PAX3. This gene encodes a member of the PAX family of transcription factors, which play critical roles during fetal development. Mutations in PAX3 were associated with Waardenburg syndrome with strabismus. Our results report that the c.434G>T mutation (p.R145L) in PAX3 may contribute to strabismus, expanding our understanding of the causally relevant genes for this disorder.

  11. Imputation of Exome Sequence Variants into Population-Based Samples and Blood-Cell-Trait-Associated Loci in African Americans: NHLBI GO Exome Sequencing Project

    PubMed Central

    Auer, Paul L.; Johnsen, Jill M.; Johnson, Andrew D.; Logsdon, Benjamin A.; Lange, Leslie A.; Nalls, Michael A.; Zhang, Guosheng; Franceschini, Nora; Fox, Keolu; Lange, Ethan M.; Rich, Stephen S.; O’Donnell, Christopher J.; Jackson, Rebecca D.; Wallace, Robert B.; Chen, Zhao; Graubert, Timothy A.; Wilson, James G.; Tang, Hua; Lettre, Guillaume; Reiner, Alex P.; Ganesh, Santhi K.; Li, Yun

    2012-01-01

    Researchers have successfully applied exome sequencing to discover causal variants in selected individuals with familial, highly penetrant disorders. We demonstrate the utility of exome sequencing followed by imputation for discovering low-frequency variants associated with complex quantitative traits. We performed exome sequencing in a reference panel of 761 African Americans and then imputed newly discovered variants into a larger sample of more than 13,000 African Americans for association testing with the blood cell traits hemoglobin, hematocrit, white blood count, and platelet count. First, we illustrate the feasibility of our approach by demonstrating genome-wide-significant associations for variants that are not covered by conventional genotyping arrays; for example, one such association is that between higher platelet count and an MPL c.117G>T (p.Lys39Asn) variant encoding a p.Lys39Asn amino acid substitution of the thrombopoietin receptor gene (p = 1.5 × 10⁻¹¹). Second, we identified an association between missense variants of LCT and higher white blood count (p = 4 × 10⁻¹³). Third, we identified low-frequency coding variants that might account for allelic heterogeneity at several known blood cell-associated loci: MPL c.754T>C (p.Tyr252His) was associated with higher platelet count; CD36 c.975T>G (p.Tyr325*) was associated with lower platelet count; and several missense variants at the α-globin gene locus were associated with lower hemoglobin. By identifying low-frequency missense variants associated with blood cell traits not previously reported by genome-wide association studies, we establish that exome sequencing followed by imputation is a powerful approach to dissecting complex, genetically heterogeneous traits in large population-based studies. PMID:23103231
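
    The coding-DNA and protein descriptions quoted in this abstract are linked by simple arithmetic: with coding positions numbered from the A of the start codon, position n falls in codon (n - 1) // 3 + 1. The sketch below checks that relationship for the three variants named above. The function name and regular expression are illustrative assumptions, and only simple exonic substitutions of the form c.123A>G are handled (not intronic offsets, indels, or UTR positions).

```python
import re

# Matches simple coding substitutions such as "c.117G>T" only.
HGVS_SUBSTITUTION = re.compile(r"^c\.(\d+)([ACGT])>([ACGT])$")


def codon_of(c_dot):
    """Map a simple c. substitution to (codon number, base within codon, ref, alt)."""
    match = HGVS_SUBSTITUTION.match(c_dot)
    if match is None:
        raise ValueError(f"not a simple coding substitution: {c_dot!r}")
    pos = int(match.group(1))
    codon_number = (pos - 1) // 3 + 1   # 1-based amino acid position
    base_in_codon = (pos - 1) % 3 + 1   # 1, 2, or 3
    return codon_number, base_in_codon, match.group(2), match.group(3)


for variant in ("c.117G>T", "c.754T>C", "c.975T>G"):
    print(variant, codon_of(variant))
```

    The computed codon numbers (39, 252, and 325) agree with the residue numbers in p.Lys39Asn, p.Tyr252His, and p.Tyr325* as reported in the abstract.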

  12. Guidelines for investigating causality of sequence variants in human disease

    PubMed Central

    MacArthur, D. G.; Manolio, T. A.; Dimmock, D. P.; Rehm, H. L.; Shendure, J.; Abecasis, G. R.; Adams, D. R.; Altman, R. B.; Antonarakis, S. E.; Ashley, E. A.; Barrett, J. C.; Biesecker, L. G.; Conrad, D. F.; Cooper, G. M.; Cox, N. J.; Daly, M. J.; Gerstein, M. B.; Goldstein, D. B.; Hirschhorn, J. N.; Leal, S. M.; Pennacchio, L. A.; Stamatoyannopoulos, J. A.; Sunyaev, S. R.; Valle, D.; Voight, B. F.; Winckler, W.; Gunter, C.

    2014-01-01

    The discovery of rare genetic variants is accelerating, and clear guidelines for distinguishing disease-causing sequence variants from the many potentially functional variants present in any human genome are urgently needed. Without rigorous standards we risk an acceleration of false-positive reports of causality, which would impede the translation of genomic research findings into the clinical diagnostic setting and hinder biological understanding of disease. Here we discuss the key challenges of assessing sequence variants in human disease, integrating both gene-level and variant-level support for causality. We propose guidelines for summarizing confidence in variant pathogenicity and highlight several areas that require further resource development. PMID:24759409

  13. Guidelines for investigating causality of sequence variants in human disease.

    PubMed

    MacArthur, D G; Manolio, T A; Dimmock, D P; Rehm, H L; Shendure, J; Abecasis, G R; Adams, D R; Altman, R B; Antonarakis, S E; Ashley, E A; Barrett, J C; Biesecker, L G; Conrad, D F; Cooper, G M; Cox, N J; Daly, M J; Gerstein, M B; Goldstein, D B; Hirschhorn, J N; Leal, S M; Pennacchio, L A; Stamatoyannopoulos, J A; Sunyaev, S R; Valle, D; Voight, B F; Winckler, W; Gunter, C

    2014-04-24

    The discovery of rare genetic variants is accelerating, and clear guidelines for distinguishing disease-causing sequence variants from the many potentially functional variants present in any human genome are urgently needed. Without rigorous standards we risk an acceleration of false-positive reports of causality, which would impede the translation of genomic research findings into the clinical diagnostic setting and hinder biological understanding of disease. Here we discuss the key challenges of assessing sequence variants in human disease, integrating both gene-level and variant-level support for causality. We propose guidelines for summarizing confidence in variant pathogenicity and highlight several areas that require further resource development.

  14. Variants of cellobiohydrolases

    DOEpatents

    Bott, Richard R.; Foukaraki, Maria; Hommes, Ronaldus Wilhelmus; Kaper, Thijs; Kelemen, Bradley R.; Kralj, Slavko; Nikolaev, Igor; Sandgren, Mats; Van Lieshout, Johannes Franciscus Thomas; Van Stigt Thans, Sander

    2018-04-10

    Disclosed are a number of homologs and variants of Hypocrea jecorina Cel7A (formerly Trichoderma reesei cellobiohydrolase I or CBH1), nucleic acids encoding the same and methods for producing the same. The homologs and variant cellulases have the amino acid sequence of a glycosyl hydrolase of family 7A wherein one or more amino acid residues are substituted and/or deleted.

  15. Integrating mRNA and Protein Sequencing Enables the Detection and Quantitative Profiling of Natural Protein Sequence Variants of Populus trichocarpa.

    PubMed

    Abraham, Paul E; Wang, Xiaojing; Ranjan, Priya; Nookaew, Intawat; Zhang, Bing; Tuskan, Gerald A; Hettich, Robert L

    2015-12-04

    Next-generation sequencing has transformed the ability to link genotypes to phenotypes and facilitates the dissection of genetic contribution to complex traits. However, it is challenging to link genetic variants with the perturbed functional effects on proteins encoded by such genes. Here we show how RNA sequencing can be exploited to construct genotype-specific protein sequence databases to assess natural variation in proteins, providing information about the molecular toolbox driving cellular processes. For this study, we used two natural genotypes selected from a recent genome-wide association study of Populus trichocarpa, an obligate outcrosser with tremendous phenotypic variation across the natural population. This strategy allowed us to comprehensively catalogue proteins containing single amino acid polymorphisms (SAAPs), as well as insertions and deletions. We profiled the frequency of 128 types of naturally occurring amino acid substitutions, including both expected (neutral) and unexpected (non-neutral) SAAPs, with a subset occurring in regions of the genome having strong polymorphism patterns consistent with recent positive and/or divergent selection. By zeroing in on the molecular signatures of these important regions that might have previously been uncharacterized, we now provide a high-resolution molecular inventory that should improve accessibility and subsequent identification of natural protein variants in future genotype-to-phenotype studies.

  16. Identification of rare paired box 3 variant in strabismus by whole exome sequencing

    PubMed Central

    Gong, Hui-Min; Wang, Jing; Xu, Jing; Zhou, Zhan-Yu; Li, Jing-Wen; Chen, Shu-Fang

    2017-01-01

    AIM: To identify the potentially pathogenic gene variants that contribute to the etiology of strabismus. METHODS: A Chinese pedigree with strabismus was collected and the exomes of two affected individuals were sequenced using the next-generation sequencing technology. The resulting variants from exome sequencing were filtered by subsequent bioinformatics methods and the candidate mutation was verified as heterozygous in the affected proposita and her mother by Sanger sequencing. RESULTS: Whole exome sequencing and filtering identified a nonsynonymous mutation, a c.434G>T transversion in paired box 3 (PAX3), in the two affected individuals, which was predicted to be deleterious by more than 4 bioinformatics programs. This altered amino acid residue was located in the conserved PAX domain of PAX3. This gene encodes a member of the PAX family of transcription factors, which play critical roles during fetal development. Mutations in PAX3 were associated with Waardenburg syndrome with strabismus. CONCLUSION: Our results report that the c.434G>T mutation (p.R145L) in PAX3 may contribute to strabismus, expanding our understanding of the causally relevant genes for this disorder. PMID:28861346

  17. Whole-genome sequencing and genetic variant analysis of a Quarter Horse mare.

    PubMed

    Doan, Ryan; Cohen, Noah D; Sawyer, Jason; Ghaffari, Noushin; Johnson, Charlie D; Dindot, Scott V

    2012-02-17

    The catalog of genetic variants in the horse genome originates from a few select animals, the majority originating from the Thoroughbred mare used for the equine genome sequencing project. The purpose of this study was to identify genetic variants, including single nucleotide polymorphisms (SNPs), insertion/deletion polymorphisms (INDELs), and copy number variants (CNVs) in the genome of an individual Quarter Horse mare sequenced by next-generation sequencing. Using massively parallel paired-end sequencing, we generated 59.6 Gb of DNA sequence from a Quarter Horse mare resulting in an average of 24.7X sequence coverage. Reads were mapped to approximately 97% of the reference Thoroughbred genome. Unmapped reads were de novo assembled resulting in 19.1 Mb of new genomic sequence in the horse. Using a stringent filtering method, we identified 3.1 million SNPs, 193 thousand INDELs, and 282 CNVs. Genetic variants were annotated to determine their impact on gene structure and function. Additionally, we genotyped this Quarter Horse for mutations of known diseases and for variants associated with particular traits. Functional clustering analysis of genetic variants revealed that most of the genetic variation in the horse's genome was enriched in sensory perception, signal transduction, and immunity and defense pathways. This is the first sequencing of a horse genome by next-generation sequencing and the first genomic sequence of an individual Quarter Horse mare. We have increased the catalog of genetic variants for use in equine genomics by the addition of novel SNPs, INDELs, and CNVs. The genetic variants described here will be a useful resource for future studies of genetic variation regulating performance traits and diseases in equids.

  18. Whole exome sequencing for familial bicuspid aortic valve identifies putative variants.

    PubMed

    Martin, Lisa J; Pilipenko, Valentina; Kaufman, Kenneth M; Cripe, Linda; Kottyan, Leah C; Keddache, Mehdi; Dexheimer, Phillip; Weirauch, Matthew T; Benson, D Woodrow

    2014-10-01

    Bicuspid aortic valve (BAV) is the most common congenital cardiovascular malformation. Although highly heritable, few causal variants have been identified. The purpose of this study was to identify genetic variants underlying BAV by whole exome sequencing a multiplex BAV kindred. Whole exome sequencing was performed on 17 individuals from a single family (BAV=3; other cardiovascular malformation, 3). Postvariant calling error control metrics were established after examining the relationship between Mendelian inheritance error rate and coverage, quality score, and call rate. To determine the most effective approach to identifying susceptibility variants from among 54 674 variants passing error control metrics, we evaluated 3 variant selection strategies frequently used in whole exome sequencing studies plus extended family linkage. No putative rare, high-effect variants were identified in all affected but no unaffected individuals. Eight high-effect variants were identified by ≥2 of the commonly used selection strategies; however, these were either common in the general population (>10%) or present in the majority of the unaffected family members. However, using extended family linkage, 3 synonymous variants were identified; all 3 variants were identified by at least one other strategy. These results suggest that traditional whole exome sequencing approaches, which assume causal variants alter coding sense, may be insufficient for BAV and other complex traits. Identification of disease-associated variants is facilitated by the use of segregation within families. © 2014 American Heart Association, Inc.
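
    The strict segregation criterion mentioned in this abstract (a variant carried by all affected but no unaffected family members) amounts to two set comparisons per variant. The sketch below is a minimal illustration under stated assumptions: the function name, sample identifiers, and variant identifiers are hypothetical, and it does not reproduce the study's error-control metrics or linkage analysis.

```python
def segregating_variants(carriers_by_variant, affected, unaffected):
    """Return variants carried by every affected and no unaffected individual."""
    affected = set(affected)
    unaffected = set(unaffected)
    hits = []
    for variant, carriers in carriers_by_variant.items():
        carriers = set(carriers)
        # affected <= carriers: every affected individual carries the variant.
        if affected <= carriers and carriers.isdisjoint(unaffected):
            hits.append(variant)
    return sorted(hits)


# Hypothetical toy pedigree: three affected and two unaffected relatives.
affected = {"A1", "A2", "A3"}
unaffected = {"U1", "U2"}
carriers_by_variant = {
    "var1": {"A1", "A2", "A3"},        # segregates with disease
    "var2": {"A1", "A2", "A3", "U1"},  # also present in an unaffected relative
    "var3": {"A1", "A2"},              # absent from one affected relative
}

result = segregating_variants(carriers_by_variant, affected, unaffected)
print(result)
```

    A criterion this strict assumes full penetrance and no phenocopies, which may be one reason a filter of this kind can return nothing for a complex trait.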

  19. Variant Humicola grisea CBH1.1

    DOEpatents

    Goedegebuur, Frits; Gualfetti, Peter; Mitchinson, Colin; Larenas, Edmund

    2013-02-19

    Disclosed are variants of Humicola grisea Cel7A (CBH1.1), H. jecorina CBH1 variant or S. thermophilium CBH1, nucleic acids encoding the same and methods for producing the same. The variant cellulases have the amino acid sequence of a glycosyl hydrolase of family 7A wherein one or more amino acid residues are substituted.

  20. Variant Humicola grisea CBH1.1

    DOEpatents

    Goedegebuur, Frits [Vlaardingen, NL]; Gualfetti, Peter [San Francisco, CA]; Mitchinson, Colin [Half Moon Bay, CA]; Larenas, Edmund [Moss Beach, CA]

    2011-05-31

    Disclosed are variants of Humicola grisea Cel7A (CBH1.1), H. jecorina CBH1 variant or S. thermophilium CBH1, nucleic acids encoding the same and methods for producing the same. The variant cellulases have the amino acid sequence of a glycosyl hydrolase of family 7A wherein one or more amino acid residues are substituted.

  1. Variant Humicola grisea CBH1.1

    DOEpatents

    Goedegebuur, Frits; Gualfetti, Peter; Mitchinson, Colin; Larenas, Edmund

    2014-09-09

    Disclosed are variants of Humicola grisea Cel7A (CBH1.1), H. jecorina CBH1 variant or S. thermophilium CBH1, nucleic acids encoding the same and methods for producing the same. The variant cellulases have the amino acid sequence of a glycosyl hydrolase of family 7A wherein one or more amino acid residues are substituted.

  2. Variant Humicola grisea CBH1.1

    DOEpatents

    Goedegebuur, Frits; Gualfetti, Peter; Mitchinson, Colin; Larenas, Edmund

    2014-03-18

    Disclosed are variants of Humicola grisea Cel7A (CBH1.1), H. jecorina CBH1 variant or S. thermophilium CBH1, nucleic acids encoding the same and methods for producing the same. The variant cellulases have the amino acid sequence of a glycosyl hydrolase of family 7A wherein one or more amino acid residues are substituted.

  3. Variant Humicola grisea CBH1.1

    DOEpatents

    Goedegebuur, Frits; Gualfetti, Peter; Mitchinson, Colin; Larenas, Edmund

    2017-05-09

    Disclosed are variants of Humicola grisea Cel7A (CBH1.1), H. jecorina CBH1 variant or S. thermophilium CBH1, nucleic acids encoding the same and methods for producing the same. The variant cellulases have the amino acid sequence of a glycosyl hydrolase of family 7A wherein one or more amino acid residues are substituted.

  4. Variant Humicola grisea CBH1.1

    DOEpatents

    Goedegebuur, Frits [Vlaardingen, NL]; Gualfetti, Peter [San Francisco, CA]; Mitchinson, Colin [Half Moon Bay, CA]; Larenas, Edmund [Moss Beach, CA]

    2011-08-16

    Disclosed are variants of Humicola grisea Cel7A (CBH1.1), H. jecorina CBH1 variant or S. thermophilium CBH1, nucleic acids encoding the same and methods for producing the same. The variant cellulases have the amino acid sequence of a glycosyl hydrolase of family 7A wherein one or more amino acid residues are substituted.

  5. Variant Humicola grisea CBH1.1

    DOEpatents

    Goedegebuur, Frits [Vlaardingen, NL]; Gualfetti, Peter [San Francisco, CA]; Mitchinson, Colin [Half Moon Bay, CA]; Larenas, Edmund [Moss Beach, CA]

    2012-08-07

    Disclosed are variants of Humicola grisea Cel7A (CBH1.1), H. jecorina CBH1 variant or S. thermophilium CBH1, nucleic acids encoding the same and methods for producing the same. The variant cellulases have the amino acid sequence of a glycosyl hydrolase of family 7A wherein one or more amino acid residues are substituted.

  6. Variant Humicola grisea CBH1.1

    DOEpatents

    Goedegebuur, Frits [Vlaardingen, NL]; Gualfetti, Peter [San Francisco, CA]; Mitchinson, Colin [Half Moon Bay, CA]; Larenas, Edmund [Moss Beach, CA]

    2008-12-02

    Disclosed are variants of Humicola grisea Cel7A (CBH1.1), H. jecorina CBH1 variant or S. thermophilium CBH1, nucleic acids encoding the same and methods for producing the same. The variant cellulases have the amino acid sequence of a glycosyl hydrolase of family 7A wherein one or more amino acid residues are substituted.

  7. MutaBind estimates and interprets the effects of sequence variants on protein-protein interactions.

    PubMed

    Li, Minghui; Simonetti, Franco L; Goncearenco, Alexander; Panchenko, Anna R

    2016-07-08

    Proteins engage in highly selective interactions with their macromolecular partners. Sequence variants that alter protein binding affinity may cause significant perturbations or complete abolishment of function, potentially leading to diseases. There exists a persistent need to develop a mechanistic understanding of impacts of variants on proteins. To address this need we introduce a new computational method MutaBind to evaluate the effects of sequence variants and disease mutations on protein interactions and calculate the quantitative changes in binding affinity. The MutaBind method uses molecular mechanics force fields, statistical potentials and fast side-chain optimization algorithms. The MutaBind server maps mutations on a structural protein complex, calculates the associated changes in binding affinity, determines the deleterious effect of a mutation, estimates the confidence of this prediction and produces a mutant structural model for download. MutaBind can be applied to a large number of problems, including determination of potential driver mutations in cancer and other diseases, elucidation of the effects of sequence variants on protein fitness in evolution and protein design. MutaBind is available at http://www.ncbi.nlm.nih.gov/projects/mutabind/. Published by Oxford University Press on behalf of Nucleic Acids Research 2016. This work is written by (a) US Government employee(s) and is in the public domain in the US.

  8. Sequencing Structural Variants in Cancer for Precision Therapeutics.

    PubMed

    Macintyre, Geoff; Ylstra, Bauke; Brenton, James D

    2016-09-01

    The identification of mutations that guide therapy selection for patients with cancer is now routine in many clinical centres. The majority of assays used for solid tumour profiling use DNA sequencing to interrogate somatic point mutations because they are relatively easy to identify and interpret. Many cancers, however, including high-grade serous ovarian, oesophageal, and small-cell lung cancer, are driven by somatic structural variants that are not measured by these assays. Therefore, there is currently an unmet need for clinical assays that can cheaply and rapidly profile structural variants in solid tumours. In this review we survey the landscape of 'actionable' structural variants in cancer and identify promising detection strategies based on massively-parallel sequencing. Copyright © 2016 Elsevier Ltd. All rights reserved.

  9. Diff-seq: A high throughput sequencing-based mismatch detection assay for DNA variant enrichment and discovery

    PubMed Central

    Karas, Vlad O; Sinnott-Armstrong, Nicholas A; Varghese, Vici; Shafer, Robert W; Greenleaf, William J; Sherlock, Gavin

    2018-01-01

    Much of the within-species genetic variation is in the form of single nucleotide polymorphisms (SNPs), typically detected by whole genome sequencing (WGS) or microarray-based technologies. However, WGS produces mostly uninformative reads that perfectly match the reference, while microarrays require genome-specific reagents. We have developed Diff-seq, a sequencing-based mismatch detection assay for SNP discovery without the requirement for specialized nucleic-acid reagents. Diff-seq leverages the Surveyor endonuclease to cleave mismatched DNA molecules that are generated after cross-annealing of a complex pool of DNA fragments. Sequencing libraries enriched for Surveyor-cleaved molecules result in increased coverage at the variant sites. Diff-seq detected all mismatches present in an initial test substrate, with specific enrichment dependent on the identity and context of the variation. Application to viral sequences resulted in increased observation of variant alleles in a biologically relevant context. Diff-seq has the potential to increase the sensitivity and efficiency of high-throughput sequencing in the detection of variation. PMID:29361139

  10. Using whole-exome sequencing to identify variants inherited from mosaic parents

    PubMed Central

    Rios, Jonathan J; Delgado, Mauricio R

    2015-01-01

    Whole-exome sequencing (WES) has allowed the discovery of genes and variants causing rare human disease. This is often achieved by comparing nonsynonymous variants between unrelated patients, and particularly for sporadic or recessive disease, often identifies a single or few candidate genes for further consideration. However, despite the potential for this approach to elucidate the genetic cause of rare human disease, a majority of patients fail to realize a genetic diagnosis using standard exome analysis methods. Although genetic heterogeneity contributes to the difficulty of exome sequence analysis between patients, it remains plausible that rare human disease is not caused by de novo or recessive variants. Multiple human disorders have been described for which the variant was inherited from a phenotypically normal mosaic parent. Here we highlight the potential for exome sequencing to identify a reasonable number of candidate genes when dominant disease variants are inherited from a mosaic parent. We show the power of WES to identify a limited number of candidate genes using this disease model and how sequence coverage affects identification of mosaic variants by WES. We propose this analysis as an alternative to discover genetic causes of rare human disorders for which typical WES approaches fail to identify likely pathogenic variants. PMID:24986828

  11. Diagnostics based on nucleic acid sequence variant profiling: PCR, hybridization, and NGS approaches.

    PubMed

    Khodakov, Dmitriy; Wang, Chunyan; Zhang, David Yu

    2016-10-01

    Nucleic acid sequence variations have been implicated in many diseases, and reliable detection and quantitation of DNA/RNA biomarkers can inform effective therapeutic action, enabling precision medicine. Nucleic acid analysis technologies being translated into the clinic can broadly be classified into hybridization, PCR, and sequencing, as well as their combinations. Here we review the molecular mechanisms of popular commercial assays, and their progress in translation into in vitro diagnostics. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.

  12. Insecticidal components from field pea extracts: sequences of some variants of pea albumin 1b.

    PubMed

    Taylor, Wesley G; Sutherland, Daniel H; Olson, Douglas J H; Ross, Andrew R S; Fields, Paul G

    2004-12-15

    Methanol soluble insecticidal peptides with masses of 3752, 3757, and 3805 Da, isolated from crude extracts (C8 extracts) derived from the protein-enriched flour of commercial field peas [Pisum sativum (L.)], were purified by reversed phase chromatography and, after reduction and alkylation, were sequenced by matrix-assisted laser desorption/ionization (MALDI) time-of-flight mass spectrometry with the aid of various peptidases. These major peptides were variants of pea albumin 1b (PA1b) with methionine sulfoxide rather than methionine at position 12. Peptide 3752 showed additional variations at positions 29 (valine for isoleucine) and 34 (histidine for asparagine). A minor, 37 amino acid peptide with a molecular mass of 3788 Da was also sequenced and differed from a known PA1b variant at positions 1, 25, and 31. Sequence variants of PA1b with their molecular masses were compiled, and variants that matched the accurate masses of the experimental peptides were used to narrow the search. MALDI postsource decay experiments on pronase fragments helped to confirm the sequences. Whole and dehulled field peas gave insecticidal C8 extracts in the laboratory that were enriched in peptides with masses of 3736, 3741, and 3789 Da, as determined by high-performance liquid chromatography (HPLC) and electrospray ionization mass spectrometry. It was therefore concluded that oxidation of the methionine residues to methionine sulfoxide occurred primarily during the processing of dehulled peas in a mill.

  13. Comparison and evaluation of two exome capture kits and sequencing platforms for variant calling.

    PubMed

    Zhang, Guoqiang; Wang, Jianfeng; Yang, Jin; Li, Wenjie; Deng, Yutian; Li, Jing; Huang, Jun; Hu, Songnian; Zhang, Bing

    2015-08-05

    To promote the clinical application of next-generation sequencing, it is important to obtain accurate and consistent variants of target genomic regions at low cost. Ion Proton, the latest updated semiconductor-based sequencing instrument from Life Technologies, is designed to provide investigators with an inexpensive platform for human whole exome sequencing that achieves a rapid turnaround time. However, few studies have comprehensively compared and evaluated the accuracy of variant calling between Ion Proton and Illumina sequencing platforms such as HiSeq 2000, which is the most popular sequencing platform for the human genome. The Ion Proton sequencer combined with the Ion TargetSeq Exome Enrichment Kit makes up TargetSeq-Proton, whereas SureSelect-HiSeq is based on the Agilent SureSelect Human All Exon v4 Kit and the HiSeq 2000 sequencer. Here, we sequenced exonic DNA from four human blood samples using both TargetSeq-Proton and SureSelect-HiSeq. We then called variants in the exonic regions that overlapped between the two exome capture kits (33.6 Mb). The rates of shared variant loci called by the two sequencing platforms ranged from 68.0 to 75.3% in the four samples, whereas the concordance of co-detected variant loci reached 99%. Sanger sequencing validation revealed that the validated rate of concordant single nucleotide polymorphisms (SNPs) (91.5%) was higher than that of the SNPs specific to TargetSeq-Proton (60.0%) or specific to SureSelect-HiSeq (88.3%). With regard to 1-bp small insertions and deletions (InDels), the Sanger sequencing validated rates of concordant variants (100.0%) and SureSelect-HiSeq-specific variants (89.6%) were higher than that of TargetSeq-Proton-specific variants (15.8%). In the sequencing of exonic regions, a combination of the two sequencing strategies (SureSelect-HiSeq and TargetSeq-Proton) increased the variant calling specificity for concordant variant loci and the sensitivity for variant loci called by any one platform. However, for the
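
    The two summary figures quoted in this abstract (the rate of shared variant loci and the concordance of co-detected loci) can be illustrated with a small sketch. The abstract does not state the exact denominators, so this example assumes that the shared rate is the intersection divided by the union of called loci and that concordance is the fraction of shared loci with identical genotype calls. The function name, the locus keys, and the toy genotypes are all hypothetical.

```python
from typing import Dict, Tuple

Locus = Tuple[str, int]  # (chromosome, position); an assumed key format


def compare_calls(calls_a: Dict[Locus, str],
                  calls_b: Dict[Locus, str]) -> Tuple[float, float]:
    """Return (shared_rate, concordance) for two call sets.

    Assumed definitions (not taken from the paper):
      shared_rate = |A and B| / |A or B|
      concordance = identical genotypes among shared loci / |A and B|
    """
    loci_a, loci_b = set(calls_a), set(calls_b)
    shared = loci_a & loci_b
    union = loci_a | loci_b
    shared_rate = len(shared) / len(union) if union else 0.0
    identical = sum(1 for locus in shared if calls_a[locus] == calls_b[locus])
    concordance = identical / len(shared) if shared else 0.0
    return shared_rate, concordance


# Toy data: two loci are called by both platforms, one of them identically.
platform_a = {("chr1", 100): "A/G", ("chr1", 200): "C/C", ("chr2", 50): "T/T"}
platform_b = {("chr1", 100): "A/G", ("chr1", 200): "C/T", ("chr3", 10): "G/G"}
shared_rate, concordance = compare_calls(platform_a, platform_b)
```

    With other denominators (for example, shared loci as a fraction of one platform's calls) the shared rate would differ, so the definition should be fixed before comparing numbers across studies.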

  14. Regularized rare variant enrichment analysis for case-control exome sequencing data.

    PubMed

    Larson, Nicholas B; Schaid, Daniel J

    2014-02-01

    Rare variants have recently garnered an immense amount of attention in genetic association analysis. However, unlike methods traditionally used for single marker analysis in GWAS, rare variant analysis often requires some method of aggregation, since single marker approaches are poorly powered for typical sequencing study sample sizes. Advancements in sequencing technologies have rendered next-generation sequencing platforms a realistic alternative to traditional genotyping arrays. Exome sequencing in particular not only provides base-level resolution of genetic coding regions, but also a natural paradigm for aggregation via genes and exons. Here, we propose the use of penalized regression in combination with variant aggregation measures to identify rare variant enrichment in exome sequencing data. In contrast to marginal gene-level testing, we simultaneously evaluate the effects of rare variants in multiple genes, focusing on gene-based least absolute shrinkage and selection operator (LASSO) and exon-based sparse group LASSO models. By using gene membership as a grouping variable, the sparse group LASSO can be used as a gene-centric analysis of rare variants while also providing a penalized approach toward identifying specific regions of interest. We apply extensive simulations to evaluate the performance of these approaches with respect to specificity and sensitivity, comparing these results to multiple competing marginal testing methods. Finally, we discuss our findings and outline future research. © 2013 WILEY PERIODICALS, INC.

  15. Single-variant and multi-variant trend tests for genetic association with next-generation sequencing that are robust to sequencing error.

    PubMed

    Kim, Wonkuk; Londono, Douglas; Zhou, Lisheng; Xing, Jinchuan; Nato, Alejandro Q; Musolf, Anthony; Matise, Tara C; Finch, Stephen J; Gordon, Derek

    2012-01-01

    As with any new technology, next-generation sequencing (NGS) has potential advantages and potential challenges. One advantage is the identification of multiple causal variants for disease that might otherwise be missed by SNP-chip technology. One potential challenge is misclassification error (as with any emerging technology) and the issue of power loss due to multiple testing. Here, we develop an extension of the linear trend test for association that incorporates differential misclassification error and may be applied to any number of SNPs. We call the statistic the linear trend test allowing for error, applied to NGS, or LTTae,NGS. This statistic allows for differential misclassification. The observed data are phenotypes for unrelated cases and controls, coverage, and the number of putative causal variants for every individual at all SNPs. We simulate data considering multiple factors (disease mode of inheritance, genotype relative risk, causal variant frequency, sequence error rate in cases, sequence error rate in controls, number of loci, and others) and evaluate type I error rate and power for each vector of factor settings. We compare our results with two recently published NGS statistics. Also, we create a fictitious disease model based on downloaded 1000 Genomes data for 5 SNPs and 388 individuals, and apply our statistic to those data. We find that the LTTae,NGS maintains the correct type I error rate in all simulations (differential and non-differential error), while the other statistics show large inflation in type I error for lower coverage. Power for all three methods is approximately the same for all three statistics in the presence of non-differential error. Application of our statistic to the 1000 Genomes data suggests that, for the data downloaded, there is a 1.5% sequence misclassification rate over all SNPs. Finally, application of the multi-variant form of LTTae,NGS shows high power for a number of simulation settings, although it can have
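
    For orientation, the classical linear (Cochran-Armitage) trend test that this abstract extends can be sketched as follows. This is the standard test on a 2 x 3 table of genotype counts, not the authors' LTTae,NGS statistic, and it does not model misclassification error. The function name and the default additive weights (0, 1, 2) are choices made for this illustration.

```python
import math
from typing import Sequence


def trend_test_z(cases: Sequence[int],
                 controls: Sequence[int],
                 weights: Sequence[float] = (0, 1, 2)) -> float:
    """Z statistic of the standard Cochran-Armitage trend test.

    cases[i] and controls[i] are counts in genotype class i
    (for example 0, 1 or 2 copies of the variant allele).
    """
    if not (len(cases) == len(controls) == len(weights)):
        raise ValueError("cases, controls and weights must have equal length")
    k = len(weights)
    col_totals = [r + s for r, s in zip(cases, controls)]
    n_cases, n_controls = sum(cases), sum(controls)
    n_total = n_cases + n_controls
    if n_total == 0:
        raise ValueError("table is empty")

    # T = sum_i t_i * (r_i * S - s_i * R)
    t_stat = sum(t * (r * n_controls - s * n_cases)
                 for t, r, s in zip(weights, cases, controls))

    # Var(T) = (R * S / N) * (sum_i t_i^2 n_i (N - n_i)
    #                         - 2 * sum_{i<j} t_i t_j n_i n_j)
    squares = sum(t * t * n * (n_total - n)
                  for t, n in zip(weights, col_totals))
    cross = sum(weights[i] * weights[j] * col_totals[i] * col_totals[j]
                for i in range(k) for j in range(i + 1, k))
    variance = (n_cases * n_controls / n_total) * (squares - 2 * cross)
    if variance <= 0:
        raise ValueError("variance is not positive; table is degenerate")
    return t_stat / math.sqrt(variance)
```

    Under the null hypothesis the statistic is approximately standard normal, so its square can be compared with a chi-square distribution with one degree of freedom.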

  16. Single variant and multi-variant trend tests for genetic association with next generation sequencing that are robust to sequencing error

    PubMed Central

    Kim, Wonkuk; Londono, Douglas; Zhou, Lisheng; Xing, Jinchuan; Nato, Andrew; Musolf, Anthony; Matise, Tara C.; Finch, Stephen J.; Gordon, Derek

    2013-01-01

    As with any new technology, next generation sequencing (NGS) has potential advantages and potential challenges. One advantage is the identification of multiple causal variants for disease that might otherwise be missed by SNP-chip technology. One potential challenge is misclassification error (as with any emerging technology) and the issue of power loss due to multiple testing. Here, we develop an extension of the linear trend test for association that incorporates differential misclassification error and may be applied to any number of SNPs. We call the statistic the linear trend test allowing for error, applied to NGS, or LTTae,NGS. This statistic allows for differential misclassification. The observed data are phenotypes for unrelated cases and controls, coverage, and the number of putative causal variants for every individual at all SNPs. We simulate data considering multiple factors (disease mode of inheritance, genotype relative risk, causal variant frequency, sequence error rate in cases, sequence error rate in controls, number of loci, and others) and evaluate type I error rate and power for each vector of factor settings. We compare our results with two recently published NGS statistics. Also, we create a fictitious disease model, based on downloaded 1000 Genomes data for 5 SNPs and 388 individuals, and apply our statistic to that data. We find that the LTTae,NGS maintains the correct type I error rate in all simulations (differential and non-differential error), while the other statistics show large inflation in type I error for lower coverage. Power for all three methods is approximately the same for all three statistics in the presence of non-differential error. Application of our statistic to the 1000 Genomes data suggests that, for the data downloaded, there is a 1.5% sequence misclassification rate over all SNPs. Finally, application of the multi-variant form of LTTae,NGS shows high power for a number of simulation settings, although it can have

  17. Valine/isoleucine variants drive selective pressure in the VP1 sequence of EV-A71 enteroviruses.

    PubMed

    Duy, Nghia Ngu; Huong, Le Thi Thanh; Ravel, Patrice; Huong, Le Thi Song; Dwivedi, Ankit; Sessions, October Michael; Hou, Yan'An; Chua, Robert; Kister, Guilhem; Afelt, Aneta; Moulia, Catherine; Gubler, Duane J; Thiem, Vu Dinh; Thanh, Nguyen Thi Hien; Devaux, Christian; Duong, Tran Nhu; Hien, Nguyen Tran; Cornillot, Emmanuel; Gavotte, Laurent; Frutos, Roger

    2017-05-08

    In 2011-2012, Northern Vietnam experienced its first large scale hand foot and mouth disease (HFMD) epidemic. In 2011, a major HFMD epidemic was also reported in South Vietnam with fatal cases. This 2011-2012 outbreak was the first one to occur in North Vietnam providing grounds to study the etiology, origin and dynamic of the disease. We report here the analysis of the VP1 gene of strains isolated throughout North Vietnam during the 2011-2012 outbreak and before. The VP1 gene of 106 EV-A71 isolates from North Vietnam and 2 from Central Vietnam were sequenced. Sequence alignments were analyzed at the nucleic acid and protein level. Gene polymorphism was also analyzed. A Factorial Correspondence Analysis was performed to correlate amino acid mutations with clinical parameters. The sequences were distributed into four phylogenetic clusters. Three clusters corresponded to the subgenogroup C4 and the last one corresponded to the subgenogroup C5. Each cluster displayed different polymorphism characteristics. Proteins were highly conserved but three sites bearing only Isoleucine (I) or Valine (V) were characterized. The isoleucine/valine variability matched the clusters. Spatiotemporal analysis of the I/V variants showed that all variants which emerged in 2011 and then in 2012 were not the same but were all present in the region prior to the 2011-2012 outbreak. Some correlation was found between certain I/V variants and ethnicity and severity. The 2011-2012 outbreak was not caused by an exogenous strain coming from South Vietnam or elsewhere but by strains already present and circulating at low level in North Vietnam. However, what triggered the outbreak remains unclear. A selective pressure is applied on I/V variants which matches the genetic clusters. I/V variants were shown on other viruses to correlate with pathogenicity. This should be investigated in EV-A71. I/V variants are an easy and efficient way to survey and identify circulating EV-A71 strains.

  18. Detecting very low allele fraction variants using targeted DNA sequencing and a novel molecular barcode-aware variant caller.

    PubMed

    Xu, Chang; Nezami Ranjbar, Mohammad R; Wu, Zhong; DiCarlo, John; Wang, Yexun

    2017-01-03

    Detection of DNA mutations at very low allele fractions with high accuracy will significantly improve the effectiveness of precision medicine for cancer patients. To achieve this goal through next generation sequencing, researchers need a detection method that 1) captures rare mutation-containing DNA fragments efficiently in the mix of abundant wild-type DNA; 2) sequences the DNA library extensively to deep coverage; and 3) distinguishes low level true variants from amplification and sequencing errors with high accuracy. Targeted enrichment using PCR primers provides researchers with a convenient way to achieve deep sequencing for a small, yet most relevant region using benchtop sequencers. Molecular barcoding (or indexing) provides a unique solution for reducing sequencing artifacts analytically. Although different molecular barcoding schemes have been reported in recent literature, most variant calling has been done on limited targets, using simple custom scripts. The analytical performance of barcode-aware variant calling can be significantly improved by incorporating advanced statistical models. We present here a highly efficient, simple and scalable enrichment protocol that integrates molecular barcodes in multiplex PCR amplification. In addition, we developed smCounter, an open source, generic, barcode-aware variant caller based on a Bayesian probabilistic model. smCounter was optimized and benchmarked on two independent read sets with SNVs and indels at 5 and 1% allele fractions. Variants were called with very good sensitivity and specificity within coding regions. We demonstrated that we can accurately detect somatic mutations with allele fractions as low as 1% in coding regions using our enrichment protocol and variant caller.
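
    The idea behind molecular barcoding can be shown with a deliberately simple sketch: reads that carry the same barcode are collapsed to one consensus base, and the allele fraction is then computed over barcodes rather than reads. This is a plain majority-vote illustration, not smCounter's Bayesian model. The function names, the (barcode, base) input format, and the 0.8 agreement threshold are assumptions made for the example.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def consensus_by_barcode(reads: Iterable[Tuple[str, str]],
                         min_fraction: float = 0.8) -> Dict[str, str]:
    """Collapse (barcode, base) observations at one site to one base per barcode.

    A barcode is kept only if its most common base reaches min_fraction
    of its reads; other barcodes are treated as unreliable and dropped.
    """
    by_barcode = defaultdict(list)
    for barcode, base in reads:
        by_barcode[barcode].append(base)

    consensus = {}
    for barcode, bases in by_barcode.items():
        base, count = Counter(bases).most_common(1)[0]
        if count / len(bases) >= min_fraction:
            consensus[barcode] = base
    return consensus


def allele_fraction(consensus: Dict[str, str], alt: str) -> float:
    """Fraction of retained barcodes whose consensus base equals alt."""
    if not consensus:
        return 0.0
    return sum(1 for base in consensus.values() if base == alt) / len(consensus)


# Toy site: barcode AAA is ambiguous (2 of 3 reads agree) and is dropped.
site_reads = [("AAA", "C"), ("AAA", "C"), ("AAA", "T"),
              ("BBB", "T"), ("BBB", "T"),
              ("CCC", "C"), ("CCC", "C"), ("CCC", "C"), ("CCC", "C"), ("CCC", "C")]
site_consensus = consensus_by_barcode(site_reads)
```

    Counting barcodes instead of reads is what suppresses errors introduced during amplification and sequencing, because an error in one read copy is outvoted by the other copies of the same original molecule.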

  19. Localized structural frustration for evaluating the impact of sequence variants.

    PubMed

    Kumar, Sushant; Clarke, Declan; Gerstein, Mark

    2016-12-01

    Population-scale sequencing is increasingly uncovering large numbers of rare single-nucleotide variants (SNVs) in coding regions of the genome. The rarity of these variants makes it challenging to evaluate their deleteriousness with conventional phenotype-genotype associations. Protein structures provide a way of addressing this challenge. Previous efforts have focused on globally quantifying the impact of SNVs on protein stability. However, local perturbations may severely impact protein functionality without strongly disrupting global stability (e.g. in relation to catalysis or allostery). Here, we describe a workflow in which localized frustration, quantifying unfavorable local interactions, is employed as a metric to investigate such effects. Using this workflow on the Protein Databank, we find that frustration produces many immediately intuitive results: for instance, disease-related SNVs create stronger changes in localized frustration than non-disease related variants, and rare SNVs tend to disrupt local interactions to a larger extent than common variants. Less obviously, we observe that somatic SNVs associated with oncogenes and tumor suppressor genes (TSGs) induce very different changes in frustration. In particular, those associated with TSGs change the frustration more in the core than the surface (by introducing loss-of-function events), whereas those associated with oncogenes manifest the opposite pattern, creating gain-of-function events. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  20. Simple and efficient identification of rare recessive pathologically important sequence variants from next generation exome sequence data.

    PubMed

    Carr, Ian M; Morgan, Joanne; Watson, Christopher; Melnik, Svitlana; Diggle, Christine P; Logan, Clare V; Harrison, Sally M; Taylor, Graham R; Pena, Sergio D J; Markham, Alexander F; Alkuraya, Fowzan S; Black, Graeme C M; Ali, Manir; Bonthron, David T

    2013-07-01

    Massively parallel ("next generation") DNA sequencing (NGS) has quickly become the method of choice for seeking pathogenic mutations in rare uncharacterized monogenic diseases. Typically, before DNA sequencing, protein-coding regions are enriched from patient genomic DNA, representing either the entire genome ("exome sequencing") or selected mapped candidate loci. Sequence variants, identified as differences between the patient's and the human genome reference sequences, are then filtered according to various quality parameters. Changes are screened against datasets of known polymorphisms, such as dbSNP and the 1000 Genomes Project, in the effort to narrow the list of candidate causative variants. An increasing number of commercial services now offer to both generate and align NGS data to a reference genome. This potentially allows small groups with limited computing infrastructure and informatics skills to utilize this technology. However, the capability to effectively filter and assess sequence variants is still an important bottleneck in the identification of deleterious sequence variants in both research and diagnostic settings. We have developed an approach to this problem comprising a user-friendly suite of programs that can interactively analyze, filter and screen data from enrichment-capture NGS data. These programs ("Agile Suite") are particularly suitable for small-scale gene discovery or for diagnostic analysis. © 2013 WILEY PERIODICALS, INC.
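
    The filtering steps described in this abstract (quality filtering, then screening against known polymorphisms, with a focus on rare recessive variants) can be sketched generically. This is not the Agile Suite code or interface. The Variant fields, the quality threshold of 30, and the use of a homozygosity flag as a proxy for a recessive model are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple

VariantKey = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str
    quality: float     # caller-reported quality score (hypothetical field)
    homozygous: bool   # True if the patient is homozygous for alt

    @property
    def key(self) -> VariantKey:
        return (self.chrom, self.pos, self.ref, self.alt)


def filter_candidates(variants: Iterable[Variant],
                      known_polymorphisms: Set[VariantKey],
                      min_quality: float = 30.0) -> List[Variant]:
    """Keep high-quality, homozygous variants absent from the known set."""
    candidates = []
    for variant in variants:
        if variant.quality < min_quality:
            continue  # fails the quality filter
        if variant.key in known_polymorphisms:
            continue  # already catalogued as a known polymorphism
        if not variant.homozygous:
            continue  # simple recessive model: require homozygosity
        candidates.append(variant)
    return candidates


# Toy input: only the second variant passes all three filters.
known = {("chr1", 100, "A", "G")}
calls = [
    Variant("chr1", 100, "A", "G", 50.0, True),   # known polymorphism
    Variant("chr2", 200, "C", "T", 45.0, True),   # candidate
    Variant("chr3", 300, "G", "A", 10.0, True),   # low quality
    Variant("chr4", 400, "T", "C", 60.0, False),  # heterozygous
]
candidates = filter_candidates(calls, known)
```

    A real analysis would also need to handle compound heterozygosity and population allele frequencies, which this sketch leaves out.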

  1. HGVS Recommendations for the Description of Sequence Variants: 2016 Update.

    PubMed

    den Dunnen, Johan T; Dalgleish, Raymond; Maglott, Donna R; Hart, Reece K; Greenblatt, Marc S; McGowan-Jordan, Jean; Roux, Anne-Francoise; Smith, Timothy; Antonarakis, Stylianos E; Taschner, Peter E M

    2016-06-01

    The consistent and unambiguous description of sequence variants is essential to report and exchange information on the analysis of a genome. In particular, DNA diagnostics critically depends on accurate and standardized description and sharing of the variants detected. The sequence variant nomenclature system proposed in 2000 by the Human Genome Variation Society has been widely adopted and has developed into an internationally accepted standard. The recommendations are currently commissioned through a Sequence Variant Description Working Group (SVD-WG) operating under the auspices of three international organizations: the Human Genome Variation Society (HGVS), the Human Variome Project (HVP), and the Human Genome Organization (HUGO). Requests for modifications and extensions go through the SVD-WG following a standard procedure including a community consultation step. Version numbers are assigned to the nomenclature system to allow users to specify the version used in their variant descriptions. Here, we present the current recommendations, HGVS version 15.11, and briefly summarize the changes that were made since the 2000 publication. Most focus has been on removing inconsistencies and tightening definitions allowing automatic data processing. An extensive version of the recommendations is available online, at http://www.HGVS.org/varnomen. © 2016 WILEY PERIODICALS, INC.
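
    The abstract notes that tighter definitions allow automatic data processing. As a small illustration, the sketch below parses one simple class of description, a single-nucleotide substitution such as NM_004006.2:c.4375C>T, with a regular expression. It covers only that one pattern with plain numeric positions and is not a validator for the full nomenclature. The function and class names are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class Substitution(NamedTuple):
    reference: str   # versioned reference sequence, e.g. "NM_004006.2"
    coord_type: str  # "c" (coding), "g" (genomic), "m" or "n"
    position: int
    ref_base: str
    alt_base: str


# Only handles: <accession>.<version>:<type>.<position><ref>><alt>
_SUBSTITUTION = re.compile(
    r"^(?P<reference>[A-Za-z0-9_]+\.\d+):"
    r"(?P<coord_type>[cgmn])\."
    r"(?P<position>\d+)"
    r"(?P<ref_base>[ACGT])>(?P<alt_base>[ACGT])$"
)


def parse_substitution(description: str) -> Optional[Substitution]:
    """Parse a simple substitution description, or return None if it does not match."""
    match = _SUBSTITUTION.match(description.strip())
    if match is None:
        return None
    return Substitution(
        reference=match["reference"],
        coord_type=match["coord_type"],
        position=int(match["position"]),
        ref_base=match["ref_base"],
        alt_base=match["alt_base"],
    )
```

    Deletions, duplications, intronic offsets, and protein-level descriptions need their own patterns, so a dedicated nomenclature library is a better fit for anything beyond a quick check.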

  2. VariantBam: filtering and profiling of next-generational sequencing data using region-specific rules.

    PubMed

    Wala, Jeremiah; Zhang, Cheng-Zhong; Meyerson, Matthew; Beroukhim, Rameen

    2016-07-01

    We developed VariantBam, a C++ read filtering and profiling tool for use with BAM, CRAM and SAM sequencing files. VariantBam provides a flexible framework for extracting sequencing reads or read-pairs that satisfy combinations of rules, defined by any number of genomic intervals or variant sites. We have implemented filters based on alignment data, sequence motifs, regional coverage and base quality. For example, VariantBam achieved a median size reduction ratio of 3.1:1 when applied to 10 lung cancer whole genome BAMs by removing large tags and selecting for only high-quality variant-supporting reads and reads matching a large dictionary of sequence motifs. Thus VariantBam enables efficient storage of sequencing data while preserving the most relevant information for downstream analysis. VariantBam and full documentation are available at github.com/jwalabroad/VariantBam. Contact: rameen@broadinstitute.org. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved.

  3. Integrating mRNA and protein sequencing enables the detection and quantitative profiling of natural protein sequence variants of Populus trichocarpa

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Abraham, Paul E.; Wang, Xiaojing; Ranjan, Priya

    The availability of next-generation sequencing technologies has rapidly transformed our ability to link genotypes to phenotypes, and as such, promises to facilitate the dissection of genetic contribution to complex traits. Although discoveries of genetic associations will further our understanding of biology, once candidate variants have been identified, investigators are faced with the challenge of characterizing the functional effects on proteins encoded by such genes. Here we show how next-generation RNA sequencing data can be exploited to construct genotype-specific protein sequence databases, which provide a clearer picture of the molecular toolbox underlying cellular and organismal processes and their variation in a natural population. For this study, we used two individual genotypes (DENA-17-3 and VNDL-27-4) from a recent genome wide association (GWA) study of Populus trichocarpa, an obligate outcrosser that exhibits tremendous phenotypic variation across the natural population. This strategy allowed us to comprehensively catalogue proteins containing single amino acid polymorphisms (SAAPs) and insertions and deletions (INDELS). Based on large-scale identification of SAAPs, we profiled the frequency of 128 types of naturally occurring amino acid substitutions, with a subset of SAAPs occurring in regions of the genome having strong polymorphism patterns consistent with recent positive and/or divergent selection. In addition, we were able to explore the diploid landscape of Populus at the proteome-level, allowing the characterization of heterozygous variants.

  4. Integrating mRNA and protein sequencing enables the detection and quantitative profiling of natural protein sequence variants of Populus trichocarpa

    DOE PAGES

    Abraham, Paul E.; Wang, Xiaojing; Ranjan, Priya; ...

    2015-10-20

    The availability of next-generation sequencing technologies has rapidly transformed our ability to link genotypes to phenotypes, and as such, promises to facilitate the dissection of genetic contribution to complex traits. Although discoveries of genetic associations will further our understanding of biology, once candidate variants have been identified, investigators are faced with the challenge of characterizing the functional effects on proteins encoded by such genes. Here we show how next-generation RNA sequencing data can be exploited to construct genotype-specific protein sequence databases, which provide a clearer picture of the molecular toolbox underlying cellular and organismal processes and their variation in a natural population. For this study, we used two individual genotypes (DENA-17-3 and VNDL-27-4) from a recent genome wide association (GWA) study of Populus trichocarpa, an obligate outcrosser that exhibits tremendous phenotypic variation across the natural population. This strategy allowed us to comprehensively catalogue proteins containing single amino acid polymorphisms (SAAPs) and insertions and deletions (INDELS). Based on large-scale identification of SAAPs, we profiled the frequency of 128 types of naturally occurring amino acid substitutions, with a subset of SAAPs occurring in regions of the genome having strong polymorphism patterns consistent with recent positive and/or divergent selection. In addition, we were able to explore the diploid landscape of Populus at the proteome-level, allowing the characterization of heterozygous variants.

  5. Variants of glycoside hydrolases

    DOEpatents

    Teter, Sarah; Ward, Connie; Cherry, Joel; Jones, Aubrey; Harris, Paul; Yi, Jung

    2013-02-26

    The present invention relates to variants of a parent glycoside hydrolase, comprising a substitution at one or more positions corresponding to positions 21, 94, 157, 205, 206, 247, 337, 350, 373, 383, 438, 455, 467, and 486 of amino acids 1 to 513 of SEQ ID NO: 2, and optionally further comprising a substitution at one or more positions corresponding to positions 8, 22, 41, 49, 57, 113, 193, 196, 226, 227, 246, 251, 255, 259, 301, 356, 371, 411, and 462 of amino acids 1 to 513 of SEQ ID NO: 2, wherein the variants have glycoside hydrolase activity. The present invention also relates to nucleotide sequences encoding the variant glycoside hydrolases and to nucleic acid constructs, vectors, and host cells comprising the nucleotide sequences.

  6. Variants of glycoside hydrolases

    DOEpatents

    Teter, Sarah [Davis, CA]; Ward, Connie [Hamilton, MT]; Cherry, Joel [Davis, CA]; Jones, Aubrey [Davis, CA]; Harris, Paul [Carnation, WA]; Yi, Jung [Sacramento, CA]

    2011-04-26

    The present invention relates to variants of a parent glycoside hydrolase, comprising a substitution at one or more positions corresponding to positions 21, 94, 157, 205, 206, 247, 337, 350, 373, 383, 438, 455, 467, and 486 of amino acids 1 to 513 of SEQ ID NO: 2, and optionally further comprising a substitution at one or more positions corresponding to positions 8, 22, 41, 49, 57, 113, 193, 196, 226, 227, 246, 251, 255, 259, 301, 356, 371, 411, and 462 of amino acids 1 to 513 of SEQ ID NO: 2, wherein the variants have glycoside hydrolase activity. The present invention also relates to nucleotide sequences encoding the variant glycoside hydrolases and to nucleic acid constructs, vectors, and host cells comprising the nucleotide sequences.

  7. Variants of glycoside hydrolases

    DOEpatents

    Teter, Sarah; Ward, Connie; Cherry, Joel; Jones, Aubrey; Harris, Paul; Yi, Jung

    2017-07-11

    The present invention relates to variants of a parent glycoside hydrolase, comprising a substitution at one or more positions corresponding to positions 21, 94, 157, 205, 206, 247, 337, 350, 373, 383, 438, 455, 467, and 486 of amino acids 1 to 513 of SEQ ID NO: 2, and optionally further comprising a substitution at one or more positions corresponding to positions 8, 22, 41, 49, 57, 113, 193, 196, 226, 227, 246, 251, 255, 259, 301, 356, 371, 411, and 462 of amino acids 1 to 513 of SEQ ID NO: 2, wherein the variants have glycoside hydrolase activity. The present invention also relates to nucleotide sequences encoding the variant glycoside hydrolases and to nucleic acid constructs, vectors, and host cells comprising the nucleotide sequences.

  8. Evolution of simeprevir-resistant variants over time by ultra-deep sequencing in HCV genotype 1b.

    PubMed

    Akuta, Norio; Suzuki, Fumitaka; Sezaki, Hitomi; Suzuki, Yoshiyuki; Hosaka, Tetsuya; Kobayashi, Masahiro; Kobayashi, Mariko; Saitoh, Satoshi; Ikeda, Kenji; Kumada, Hiromitsu

    2014-08-01

    Using ultra-deep sequencing technology, the present study was designed to investigate the evolution of simeprevir-resistant variants (amino acid substitutions at positions aa80, aa155, aa156, and aa168 in the HCV NS3 region) over time. At Toranomon Hospital, 18 Japanese patients infected with HCV genotype 1b received triple therapy with simeprevir/PEG-IFN/ribavirin (DRAGON or CONCERT study). The sustained virological response rate was 67%, and it was significantly higher in patients with IL28B rs8099917 TT than in those with non-TT. Six patients who did not achieve sustained virological response were tested for resistant variants by ultra-deep sequencing at baseline, at the time of re-elevation of viral loads, and at 96 weeks after the completion of treatment. Twelve of 18 resistant variants detected at re-elevation of viral load were de novo resistant variants. Ten of 12 de novo resistant variants became undetectable over time, and five of seven resistant variants detected at baseline persisted over time. In one patient, Q80R variants present at baseline (0.3%) increased by 96 weeks after the cessation of treatment (10.2%), and de novo resistant D168E variants (0.3%) also increased by 96 weeks after the cessation of treatment (9.7%). In conclusion, the present study indicates that the emergence of simeprevir-resistant variants after the start of treatment could not be predicted at baseline, and that the majority of de novo resistant variants become undetectable over time. Further large-scale prospective studies should be performed to investigate the clinical utility of detecting simeprevir-resistant variants. © 2014 Wiley Periodicals, Inc.

  9. Effect of Next-Generation Exome Sequencing Depth for Discovery of Diagnostic Variants.

    PubMed

    Kim, Kyung; Seong, Moon-Woo; Chung, Won-Hyong; Park, Sung Sup; Leem, Sangseob; Park, Won; Kim, Jihyun; Lee, KiYoung; Park, Rae Woong; Kim, Namshin

    2015-06-01

    Sequencing depth, which is directly related to the cost and time required for the generation, processing, and maintenance of next-generation sequencing data, is an important factor in the practical utilization of such data in clinical fields. Unfortunately, identifying an exome sequencing depth adequate for clinical use is a challenge that has not been addressed extensively. Here, we investigate the effect of exome sequencing depth on the discovery of sequence variants for clinical use. Toward this, we sequenced ten germ-line blood samples from breast cancer patients on the Illumina platform GAII(x) at a high depth of ~200×. We observed that most function-related diverse variants in the human exonic regions could be detected at a sequencing depth of 120×. Furthermore, investigation using a diagnostic gene set showed that the number of clinical variants identified using exome sequencing reached a plateau at an average sequencing depth of about 120×. Moreover, the phenomena were consistent across the breast cancer samples.

  10. In vivo and in vitro binding of fatty acids to genetic variants of human serum albumin.

    PubMed

    Kragh-Hansen, U; Nielsen, H; Pedersen, A O

    1995-01-01

    The effect of genetic variation on the fatty-acid binding properties of human serum albumin was studied by two methods involving the use of sequenced albumin variants isolated from bisalbuminaemic persons. First, the amount of total fatty acid and of several individual fatty acids bound to eighteen different variants and to their normal counterpart (Alb A) was determined by a gas-chromatographic micromethod. Pronounced effects on total fatty acid binding were found for the glycosylated variants Alb Redhill (modified in domain II) and Alb Casebrook (domain III), for which a 1.7- and 8.6-fold increment, respectively, was found. By contrast, Alb Malmö (glycosylated in domain I) carried the same amount of fatty acid as Alb A. The fatty acid loads on three chain-termination variants were normal. Finally, eight albumins with single amino-acid substitutions bound normal amounts of fatty acid, whereas one bound increased (1.7-fold) and three albumins bound diminished amounts (0.5-0.6-fold). Information on nineteen individual fatty acids was also obtained. It was possible, based on the type of changes in their relative amounts, to group the fatty acids as follows: (a) = C6:0 - C14:0, (b) = C15:0 - C18:0, (c) = C16:1 - C18:1, and (d) a group composed of essential and conditionally essential fatty acids. For nine variants, in most cases modified in domain III, large changes in one or more of these groups were observed. The changes were not related to any changes in total fatty acid load. Second, the binding of laurate, as a representative of the group (a) fatty acids, to delipidated albumin preparations was studied at pH 7.4 by a kinetic dialysis technique. The first stoichiometric association constant for binding to Alb Redhill (0.7-fold) and Alb Casebrook (0.6-fold) was diminished as compared with binding to their corresponding Alb A, whereas binding to one chain-termination variant and three single amino-acid substitutions was unaffected by the mutation.

  11. Rare and Coding Region Genetic Variants Associated With Risk of Ischemic Stroke: The NHLBI Exome Sequence Project.

    PubMed

    Auer, Paul L; Nalls, Mike; Meschia, James F; Worrall, Bradford B; Longstreth, W T; Seshadri, Sudha; Kooperberg, Charles; Burger, Kathleen M; Carlson, Christopher S; Carty, Cara L; Chen, Wei-Min; Cupples, L Adrienne; DeStefano, Anita L; Fornage, Myriam; Hardy, John; Hsu, Li; Jackson, Rebecca D; Jarvik, Gail P; Kim, Daniel S; Lakshminarayan, Kamakshi; Lange, Leslie A; Manichaikul, Ani; Quinlan, Aaron R; Singleton, Andrew B; Thornton, Timothy A; Nickerson, Deborah A; Peters, Ulrike; Rich, Stephen S

    2015-07-01

    Importance: Stroke is the second leading cause of death and the third leading cause of years of life lost. Genetic factors contribute to stroke prevalence, and candidate gene and genome-wide association studies (GWAS) have identified variants associated with ischemic stroke risk. These variants often have small effects without obvious biological significance. Exome sequencing may discover predicted protein-altering variants with a potentially large effect on ischemic stroke risk. Objective: To investigate the contribution of rare and common genetic variants to ischemic stroke risk by targeting the protein-coding regions of the human genome. Design, setting, and participants: The National Heart, Lung, and Blood Institute (NHLBI) Exome Sequencing Project (ESP) analyzed approximately 6000 participants from numerous cohorts of European and African ancestry. For discovery, 365 cases of ischemic stroke (small-vessel and large-vessel subtypes) and 809 European ancestry controls were sequenced; for replication, 47 affected sibpairs concordant for stroke subtype and an African American case-control series were sequenced, with 1672 cases and 4509 European ancestry controls genotyped. The ESP's exome sequencing and genotyping started on January 1, 2010, and continued through June 30, 2012. Analyses were conducted on the full data set between July 12, 2012, and July 13, 2013. Main outcomes and measures: Discovery of new variants or genes contributing to ischemic stroke risk and subtype (primary analysis) and determination of support for protein-coding variants contributing to risk in previously published candidate genes (secondary analysis). Results: We identified 2 novel genes associated with an increased risk of ischemic stroke: a protein-coding variant in PDE4DIP (rs1778155; odds ratio, 2.15; P = 2.63 × 10^-8) with an intracellular signal transduction mechanism and in ACOT4 (rs35724886; odds ratio, 2.04; P = 1.24 × 10^-7) with a fatty acid metabolism; confirmation of PDE4DIP was observed in affected sibpair families with large-vessel stroke

  12. Investigation of the role of TCF4 rare sequence variants in schizophrenia.

    PubMed

    Basmanav, F Buket; Forstner, Andreas J; Fier, Heide; Herms, Stefan; Meier, Sandra; Degenhardt, Franziska; Hoffmann, Per; Barth, Sandra; Fricker, Nadine; Strohmaier, Jana; Witt, Stephanie H; Ludwig, Michael; Schmael, Christine; Moebus, Susanne; Maier, Wolfgang; Mössner, Rainald; Rujescu, Dan; Rietschel, Marcella; Lange, Christoph; Nöthen, Markus M; Cichon, Sven

    2015-07-01

    Transcription factor 4 (TCF4) is one of the most robust of all reported schizophrenia risk loci and is supported by several genetic and functional lines of evidence. While numerous studies have implicated common genetic variation at TCF4 in schizophrenia risk, the role of rare, small-sized variants at this locus, such as single nucleotide variants and short indels, which are below the resolution of chip-based arrays, requires further exploration. The aim of the present study was to investigate the association between rare TCF4 sequence variants and schizophrenia. Exon-targeted resequencing was performed in 190 German schizophrenia patients. Six rare variants in the coding exons and flanking sequences of the TCF4 gene were identified, including two missense variants and one splice site variant. These six variants were then pooled with nine additional rare variants identified in 379 European participants of the 1000 Genomes Project, and all 15 variants were genotyped in an independent German sample (n = 1,808 patients; n = 2,261 controls). These data were then analyzed using six statistical methods developed for the association analysis of rare variants. No significant association (P < 0.05) was found. However, the results from our association and power analyses suggest that further research into the possible involvement of rare TCF4 sequence variants in schizophrenia risk is warranted, through the assessment of larger cohorts with higher statistical power to identify rare variant associations. © 2015 Wiley Periodicals, Inc.

  13. Evaluation of exome variants using the Ion Proton Platform to sequence error-prone regions.

    PubMed

    Seo, Heewon; Park, Yoomi; Min, Byung Joo; Seo, Myung Eui; Kim, Ju Han

    2017-01-01

    The Ion Proton sequencer from Thermo Fisher accurately determines sequence variants from target regions with a rapid turnaround time at a low cost. However, misleading variant-calling errors can occur. We performed a systematic evaluation and manual curation of read-level alignments for the 675 ultrarare variants that were reported by the Ion Proton sequencer in 27 whole-exome sequencing data sets but are not present in either the 1000 Genomes Project or the Exome Aggregation Consortium. We classified positive variant calls into 393 highly likely false positives, 126 likely false positives, and 156 likely true positives, which comprised 58.2%, 18.7%, and 23.1% of the variants, respectively. We identified four distinct error patterns of variant calling that may be bioinformatically corrected when using different strategies: simplicity region, SNV cluster, peripheral sequence read, and base inversion. Local de novo assembly successfully corrected 201 (38.7%) of the 519 highly likely or likely false positives. We also demonstrate that the two sequencing kits from Thermo Fisher (the Ion PI Sequencing 200 kit V3 and the Ion PI Hi-Q kit) exhibit different error profiles across different error types. A refined calling algorithm with better polymerase may improve the performance of the Ion Proton sequencing platform.
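
    The proportions quoted in this abstract can be re-derived from the reported counts. The following is a minimal sketch of that arithmetic only; the variable names are illustrative and are not taken from the study.

```python
# Counts reported in the abstract for the 675 curated ultrarare variant calls.
counts = {
    "highly likely false positive": 393,
    "likely false positive": 126,
    "likely true positive": 156,
}

total_calls = sum(counts.values())

# Share of each class, as a percentage rounded to one decimal place.
shares = {label: round(100 * n / total_calls, 1) for label, n in counts.items()}

# Calls in either false-positive class, and the share of those that
# local de novo assembly corrected (201 calls, per the abstract).
false_positive_calls = (
    counts["highly likely false positive"] + counts["likely false positive"]
)
corrected_calls = 201
corrected_share = round(100 * corrected_calls / false_positive_calls, 1)

for label, pct in shares.items():
    print(f"{label}: {pct}%")
print(f"corrected by local assembly: {corrected_share}%")
```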

  14. Many amino acid substitution variants identified in DNA repair genes during human population screenings are predicted to impact protein function

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Xi, T; Jones, I M; Mohrenweiser, H W

    2003-11-03

    Over 520 different amino acid substitution variants have been previously identified in the systematic screening of 91 human DNA repair genes for sequence variation. Two algorithms were employed to predict the impact of these amino acid substitutions on protein activity. Sorting Intolerant From Tolerant (SIFT) classified 226 of 508 variants (44%) as "Intolerant". Polymorphism Phenotyping (PolyPhen) classed 165 of 489 amino acid substitutions (34%) as "Probably or Possibly Damaging". Another 9-15% of the variants were classed as "Potentially Intolerant or Damaging". The results from the two algorithms are highly associated, with concordance in predicted impact observed for ~62% of the variants. Twenty one to thirty one percent of the variant proteins are predicted to exhibit reduced activity by both algorithms. These variants occur at slightly lower individual allele frequency than do the variants classified as "Tolerant" or "Benign". Both algorithms correctly predicted the impact of 26 functionally characterized amino acid substitutions in the APE1 protein on biochemical activity, with one exception. It is concluded that a substantial fraction of the missense variants observed in the general human population are functionally relevant. These variants are expected to be the molecular genetic and biochemical basis for the associations of reduced DNA repair capacity phenotypes with elevated cancer risk.

  15. GenProBiS: web server for mapping of sequence variants to protein binding sites.

    PubMed

    Konc, Janez; Skrlj, Blaz; Erzen, Nika; Kunej, Tanja; Janezic, Dusanka

    2017-07-03

    Discovery of potentially deleterious sequence variants is important and has wide implications for research and generation of new hypotheses in human and veterinary medicine, and drug discovery. The GenProBiS web server maps sequence variants to protein structures from the Protein Data Bank (PDB), and further to protein-protein, protein-nucleic acid, protein-compound, and protein-metal ion binding sites. The concept of a protein-compound binding site is understood in the broadest sense, which includes glycosylation and other post-translational modification sites. Binding sites were defined by local structural comparisons of whole protein structures using the Protein Binding Sites (ProBiS) algorithm and transposition of ligands from the similar binding sites found to the query protein using the ProBiS-ligands approach with new improvements introduced in GenProBiS. Binding site surfaces were generated as three-dimensional grids encompassing the space occupied by predicted ligands. The server allows intuitive visual exploration of comprehensively mapped variants, such as human somatic missense mutations related to cancer and non-synonymous single nucleotide polymorphisms from 21 species, within the predicted binding site regions for about 80 000 PDB protein structures using fast WebGL graphics. The GenProBiS web server is open and free to all users at http://genprobis.insilab.org. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  16. Variant calling in low-coverage whole genome sequencing of a Native American population sample.

    PubMed

    Bizon, Chris; Spiegel, Michael; Chasse, Scott A; Gizer, Ian R; Li, Yun; Malc, Ewa P; Mieczkowski, Piotr A; Sailsbery, Josh K; Wang, Xiaoshu; Ehlers, Cindy L; Wilhelmsen, Kirk C

    2014-01-30

    The reduction in the cost of sequencing a human genome has led to the use of genotype sampling strategies in order to impute and infer the presence of sequence variants that can then be tested for associations with traits of interest. Low-coverage Whole Genome Sequencing (WGS) is a sampling strategy that overcomes some of the deficiencies seen in fixed content SNP array studies. Linkage-disequilibrium (LD) aware variant callers, such as the program Thunder, may provide a calling rate and accuracy that make a low-coverage sequencing strategy viable. We examined the performance of an LD-aware variant calling strategy in a population of 708 low-coverage whole genome sequences from a community sample of Native Americans. We assessed variant calling through a comparison of the sequencing results to genotypes measured in 641 of the same subjects using a fixed content first generation exome array. The comparison was made using the variant calling routines of the GATK Unified Genotyper program and the LD-aware variant caller Thunder. Thunder was found to improve concordance in a coverage-dependent fashion, while correctly calling nearly all of the common variants as well as a high percentage of the rare variants present in the sample. Low-coverage WGS is a strategy that appears to collect genetic information intermediate in scope between fixed content genotyping arrays and deep-coverage WGS. Our data suggest that low-coverage WGS is a viable strategy with a greater chance of discovering novel variants and associations than fixed content arrays for large sample association analyses.
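
    The comparison described here rests on genotype concordance between sequencing calls and array genotypes. The abstract does not state the exact metric used, so the following is only a minimal sketch under an assumed definition: the fraction of sites genotyped on both platforms at which the two genotypes agree. The site names and genotypes are made up for illustration.

```python
def genotype_concordance(sequencing_calls, array_genotypes):
    """Fraction of shared, non-missing sites where both genotypes agree.

    Genotypes are coded as alternate-allele counts (0, 1, or 2);
    None marks a missing call. This definition is an assumption,
    not the metric reported by the study.
    """
    shared_sites = [
        site
        for site in sequencing_calls
        if sequencing_calls[site] is not None
        and array_genotypes.get(site) is not None
    ]
    if not shared_sites:
        raise ValueError("no sites genotyped on both platforms")
    matches = sum(
        sequencing_calls[site] == array_genotypes[site] for site in shared_sites
    )
    return matches / len(shared_sites)


# Hypothetical data for one subject: four sites, one missing sequencing call.
example_calls = {"site1": 0, "site2": 1, "site3": 2, "site4": None}
example_array = {"site1": 0, "site2": 1, "site3": 1, "site4": 2}

example_concordance = genotype_concordance(example_calls, example_array)
print(f"concordance: {example_concordance:.3f}")
```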

  17. Characterization of alanine to valine sequence variants in the Fc region of nivolumab biosimilar produced in Chinese hamster ovary cells.

    PubMed

    Li, Yantao; Fu, Tuo; Liu, Tao; Guo, Huaizu; Guo, Qingcheng; Xu, Jin; Zhang, Dapeng; Qian, Weizhu; Dai, Jianxin; Li, Bohua; Guo, Yajun; Hou, Sheng; Wang, Hao

    2016-07-01

    Nivolumab is a therapeutic fully human IgG4 antibody to programmed death 1 (PD-1). In this study, a nivolumab biosimilar, which was produced in our laboratory, was analyzed and characterized. Sequence variants that contain undesired amino acid sequences may cause concern during biosimilar bioprocess development. We found that low levels of sequence variants were detected in the heavy chain of the nivolumab biosimilar by ultra performance liquid chromatography (UPLC) and tandem mass spectrometry. The variant was further identified with UPLC-MS/MS following IdeS or trypsin digestion. The sequence variant was confirmed through addition of synthetic mutant peptide. Subsequently, a mixed base signal of normal and mutant sequence was detected through DNA sequencing. The relative levels of mutant A424V in the Fc region of the heavy chain were determined to be 12.25% and 13.54% via base peak intensity (BPI) and UV chromatography of the tryptic peptide mapping, respectively. The A424V variant was also quantified by real-time PCR (RT-PCR) at the DNA and RNA level, at 19.2% and 16.8%, respectively. The relative content of the mutant was consistent at the DNA, RNA and protein level, indicating that the A424V mutation may have little influence at transcriptional or translational levels. These results demonstrate that orthogonal state-of-the-art techniques such as LC-UV-MS and RT-PCR should be implemented to characterize recombinant proteins and cell lines for development of biosimilars. Our study suggests that it is important to establish an integrated and effective analytical method to monitor and characterize sequence variants during antibody drug development, especially for antibody biosimilar products.

  18. RareVariantVis: new tool for visualization of causative variants in rare monogenic disorders using whole genome sequencing data.

    PubMed

    Stokowy, Tomasz; Garbulowski, Mateusz; Fiskerstrand, Torunn; Holdhus, Rita; Labun, Kornel; Sztromwasser, Pawel; Gilissen, Christian; Hoischen, Alexander; Houge, Gunnar; Petersen, Kjell; Jonassen, Inge; Steen, Vidar M

    2016-10-01

    The search for causative genetic variants in rare diseases of presumed monogenic inheritance has been boosted by the implementation of whole exome (WES) and whole genome (WGS) sequencing. In many cases, WGS seems to be superior to WES, but the analysis and visualization of the vast amounts of data is demanding. To address this challenge, we have developed a new tool, RareVariantVis, for analysis of genome sequence data (including non-coding regions) for both germ line and somatic variants. It visualizes variants along their respective chromosomes, providing information about exact chromosomal position, zygosity and frequency, with point-and-click information regarding dbSNP IDs, gene association and variant inheritance. Rare variants as well as de novo variants can be flagged in different colors. We show the performance of the RareVariantVis tool in the Genome in a Bottle WGS data set. Availability: https://www.bioconductor.org/packages/3.3/bioc/html/RareVariantVis.html. Contact: tomasz.stokowy@k2.uib.no. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved.

  19. DNA sequence of the lymphotropic variant of minute virus of mice, MVM(i), and comparison with the DNA sequence of the fibrotropic prototype strain.

    PubMed

    Astell, C R; Gardiner, E M; Tattersall, P

    1986-02-01

    The sequence of molecular clones of the genome of MVM(i), a lymphotropic variant of minute virus of mice, was determined and compared with that of MVM(p), the fibrotropic prototype strain. At the nucleotide level there are 163 base changes: 129 transitions and 34 transversions. Most nucleotide changes are silent, with only 27 amino acid changes predicted, of which 22 are conservative. Notable differences between the MVM(i) and MVM(p) genomes which may account for the cell specificities of these viruses occur within the 3' nontranslated regions. The differences discussed include the absence of a 65-base-pair direct repeat in MVM(i), the presence of only two polyadenylation sites in MVM(i) compared with four in MVM(p), and sequences that bear a resemblance to enhancer sequences. Also included in this paper is an important correction to the MVM(p) sequence (C.R. Astell, M. Thomson, M. Merchlinsky, and D. C. Ward, Nucleic Acids Res. 11:999-1018, 1983).
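
    The split of base changes into transitions and transversions follows a standard rule: a transition swaps a purine for a purine (A, G) or a pyrimidine for a pyrimidine (C, T), and any other substitution is a transversion. The sketch below illustrates that rule and the tally reported in the abstract; the function name is illustrative and does not come from the paper.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref_base, alt_base):
    """Return 'transition' or 'transversion' for a single DNA base change."""
    ref_base, alt_base = ref_base.upper(), alt_base.upper()
    valid_bases = PURINES | PYRIMIDINES
    if ref_base not in valid_bases or alt_base not in valid_bases:
        raise ValueError("bases must be one of A, C, G, T")
    if ref_base == alt_base:
        raise ValueError("reference and alternate bases are identical")
    # Same chemical class on both sides means a transition.
    same_class = (ref_base in PURINES) == (alt_base in PURINES)
    return "transition" if same_class else "transversion"


# Tally reported in the abstract for MVM(i) versus MVM(p).
transitions, transversions = 129, 34
total_changes = transitions + transversions
ti_tv_ratio = transitions / transversions

print(classify_substitution("A", "G"))
print(classify_substitution("A", "C"))
print(f"total changes: {total_changes}, Ti/Tv ratio: {ti_tv_ratio:.2f}")
```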

  20. Inner retinal dystrophy in a patient with biallelic sequence variants in BRAT1.

    PubMed

    Oatts, Julius T; Duncan, Jacque L; Hoyt, Creig S; Slavotinek, Anne M; Moore, Anthony T

    2017-12-01

    Mutations in the BRCA1-associated protein required for the ataxia telangiectasia mutated (ATM) activation-1 (BRAT1) gene cause lethal neonatal rigidity and multifocal seizure syndrome characterized by rigidity and intractable seizures and a milder phenotype with intellectual disability, seizures, nonprogressive cerebellar ataxia or dyspraxia, and cerebellar atrophy. To date, nystagmus, cortical visual impairment, impairment of central vision, optic nerve hypoplasia, and optic atrophy have been described in this condition. This article describes the retinal findings in a patient with biallelic deleterious sequence variants in BRAT1. Case report of a child with biallelic sequence variants in the BRAT1 gene. This patient had developmental delay, microcephaly, nystagmus, and esotropia, and full-field electroretinography (ERG) revealed an inner retinal dystrophy. She was found on exome sequencing to have compound heterozygous sequence variants in the BRAT1 gene: one maternally inherited frameshift variant (c.294dupA, predicting p.Leu99Thrfs*92), which has previously been reported, and one paternally inherited novel missense variant (c.803G>A, p.Arg268His), which is likely to affect protein function. Biallelic sequence variants in BRAT1 have been reported to cause a variety of ocular and systemic manifestations, but to our knowledge, this is the first report of inner retinal dysfunction manifest as selective loss of full-field ERG scotopic and photopic b-wave amplitudes.

  1. Clinical Validation and Implementation of a Targeted Next-Generation Sequencing Assay to Detect Somatic Variants in Non-Small Cell Lung, Melanoma, and Gastrointestinal Malignancies

    PubMed Central

    Fisher, Kevin E.; Zhang, Linsheng; Wang, Jason; Smith, Geoffrey H.; Newman, Scott; Schneider, Thomas M.; Pillai, Rathi N.; Kudchadkar, Ragini R.; Owonikoko, Taofeek K.; Ramalingam, Suresh S.; Lawson, David H.; Delman, Keith A.; El-Rayes, Bassel F.; Wilson, Malania M.; Sullivan, H. Clifford; Morrison, Annie S.; Balci, Serdar; Adsay, N. Volkan; Gal, Anthony A.; Sica, Gabriel L.; Saxe, Debra F.; Mann, Karen P.; Hill, Charles E.; Khuri, Fadlo R.; Rossi, Michael R.

    2017-01-01

    We tested and clinically validated a targeted next-generation sequencing (NGS) mutation panel using 80 formalin-fixed, paraffin-embedded (FFPE) tumor samples. Forty non-small cell lung carcinoma (NSCLC), 30 melanoma, and 30 gastrointestinal (12 colonic, 10 gastric, and 8 pancreatic adenocarcinoma) FFPE samples were selected from laboratory archives. After appropriate specimen and nucleic acid quality control, 80 NGS libraries were prepared using the Illumina TruSight tumor (TST) kit and sequenced on the Illumina MiSeq. Sequence alignment, variant calling, and sequencing quality control were performed using vendor software and laboratory-developed analysis workflows. TST generated ≥500× coverage for 98.4% of the 13,952 targeted bases. Reproducible and accurate variant calling was achieved at ≥5% variant allele frequency with 8 to 12 multiplexed samples per MiSeq flow cell. TST detected 112 variants overall, and confirmed all known single-nucleotide variants (n = 27), deletions (n = 5), insertions (n = 3), and multinucleotide variants (n = 3). TST detected at least one variant in 85.0% (68/80), and two or more variants in 36.2% (29/80), of samples. TP53 was the most frequently mutated gene in NSCLC (13 variants; 13/32 samples), gastrointestinal malignancies (15 variants; 13/25 samples), and overall (30 variants; 28/80 samples). BRAF mutations were most common in melanoma (nine variants; 9/23 samples). Clinically relevant NGS data can be obtained from routine clinical FFPE solid tumor specimens using TST, benchtop instruments, and vendor-supplied bioinformatics pipelines. PMID:26801070
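
    The ≥5% variant allele frequency (VAF) threshold can be illustrated with a small filter. VAF is the number of reads supporting the variant divided by the total reads at that position. This is a minimal sketch with hypothetical read counts; using 500× as a per-site depth requirement is an assumption made for illustration (the abstract reports it as a coverage statistic), and nothing here reflects the vendor software.

```python
VAF_THRESHOLD = 0.05  # reporting threshold stated in the abstract
MIN_DEPTH = 500       # assumed per-site depth requirement (illustrative)


def variant_allele_frequency(alt_reads, total_reads):
    """Fraction of reads at a position that support the variant allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


def passes_filter(alt_reads, total_reads):
    """True when the site is deep enough and the VAF meets the threshold."""
    if total_reads < MIN_DEPTH:
        return False
    return variant_allele_frequency(alt_reads, total_reads) >= VAF_THRESHOLD


# Hypothetical sites: (variant-supporting reads, total reads).
example_sites = [(25, 500), (24, 500), (30, 400), (300, 1000)]
for alt, total in example_sites:
    print(alt, total, passes_filter(alt, total))
```

    At exactly 500× depth, a 5% threshold corresponds to at least 25 variant-supporting reads.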

  2. Nucleotide sequence of wild-type hepatitis A virus GBM in comparison with two cell culture-adapted variants.

    PubMed Central

    Graff, J; Normann, A; Feinstone, S M; Flehmig, B

    1994-01-01

    In order to study cell tropism and attenuation of hepatitis A virus (HAV), the genome of HAV wild-type GBM and two cell culture-adapted variants, GBM/FRhK and GBM/HFS, were cloned and sequenced after amplification by reverse transcriptase-PCR. During virus cultivation, the HAV variant GBM/FRhK had a strict host range for FRhK-4 cells, in contrast to GBM/HFS, which can be grown in HFS and FRhK-4 cells. The HAV variant GBM/HFS was shown to be attenuated when inoculated into chimpanzees (B. Flehmig, R. F. Mauler, G. Noll, E. Weinmann, and J. P. Gregerson, p. 87-90, in A. Zuckerman, ed., Viral Hepatitis and Liver Disease, 1988). On the basis of this biological background, the comparison of the nucleotide sequences of these three HAV GBM variants should elucidate differences which may be of importance for cell tropism and attenuation. The comparison of the genome between the GBM wild type and HAV wild types HM175 (J. I. Cohen, J. R. Ticehurst, R. H. Purcell, A. Buckler-White, and B. M. Baroudy, J. Virol. 61:50-59, 1987) and HAV-LA (R. Najarian, O. Caput, W. Gee, S. J. Potter, A. Renard, J. Merryweather, G. Van Nest, and D. Dina, Proc. Natl. Acad. Sci. USA 82:2627-2631, 1985) showed a 92 to 96.3% identity, whereas the identity was 99.3 to 99.6% between the GBM variants. Nucleotide differences between the wild-type and the cell culture-adapted variants, which were identical in both cell culture-adapted GBM variants, were localized in the 5' noncoding region; in 2B, 3B, and 3D; and in the 3' noncoding region. Our result concerning the 2B/2C region confirms a mutation at position 3889 (C-->T, alanine to valine), which had been shown to be of importance for cell culture adaptation (S. U. Emerson, C. McRill, B. Rosenblum, S. M. Feinstone, and R. H. Purcell, J. Virol. 65:4882-4886, 1991; S. U. Emerson, Y. K. Huang, C. McRill, M. Lewis, and R. H. Purcell, J. Virol. 66:650-654, 1992), whereas other mutations differ from published HAV sequence data and may be cell specific

  3. Pooled-DNA Sequencing for Elucidating New Genomic Risk Factors, Rare Variants Underlying Alzheimer's Disease.

    PubMed

    Jin, Sheng Chih; Benitez, Bruno A; Deming, Yuetiva; Cruchaga, Carlos

    2016-01-01

    Analyses of genome-wide association studies (GWAS) for complex disorders usually identify common variants with a relatively small effect size that only explain a small proportion of phenotypic heritability. Several studies have suggested that a significant fraction of heritability may be explained by low-frequency (minor allele frequency (MAF) of 1-5%) and rare variants that are not contained in the commercial GWAS genotyping arrays (Schork et al., Curr Opin Genet Dev 19:212, 2009). Rare variants can also have relatively large effects on risk for developing human diseases or disease phenotype (Cruchaga et al., PLoS One 7:e31039, 2012). However, it is necessary to perform next-generation sequencing (NGS) studies in a large population (>4,000 samples) to detect a significant rare-variant association. Several NGS methods, such as custom capture sequencing and amplicon-based sequencing, are designed to screen a small proportion of the genome, but most of these methods are limited in the number of samples that can be multiplexed (i.e., most sequencing kits provide only 96 distinct indexes). Additionally, the sequencing library preparation for 4,000 samples remains expensive, and thus conducting NGS studies with the aforementioned methods is not feasible for most research laboratories. The need for low-cost large scale rare-variant detection makes pooled-DNA sequencing an ideally efficient and cost-effective technique to identify rare variants in target regions by sequencing hundreds to thousands of samples. Our recent work has demonstrated that pooled-DNA sequencing can accurately detect rare variants in targeted regions in multiple DNA samples with high sensitivity and specificity (Jin et al., Alzheimers Res Ther 4:34, 2012). In these studies we used a well-established pooled-DNA sequencing approach and a computational package, SPLINTER (short indel prediction by large deviation inference and nonlinear true frequency estimation by recursion) (Vallania et al., Genome Res

  4. From days to hours: reporting clinically actionable variants from whole genome sequencing.

    PubMed

    Middha, Sumit; Baheti, Saurabh; Hart, Steven N; Kocher, Jean-Pierre A

    2014-01-01

    As the cost of whole genome sequencing (WGS) decreases, clinical laboratories will be looking at broadly adopting this technology to screen for variants of clinical significance. To fully leverage this technology in a clinical setting, results need to be reported quickly, as the turnaround rate could potentially impact patient care. The latest sequencers can sequence a whole human genome in about 24 hours. However, depending on the computing infrastructure available, the processing of data can take several days, with the majority of computing time devoted to aligning reads to genomic regions that are to date not clinically interpretable. In an attempt to accelerate the reporting of clinically actionable variants, we have investigated the utility of a multi-step alignment algorithm focused on aligning reads and calling variants in genomic regions of clinical relevance prior to processing the remaining reads on the whole genome. This iterative workflow significantly accelerates the reporting of clinically actionable variants with no loss of accuracy when compared to genotypes obtained with the OMNI SNP platform or to variants detected with a standard workflow that combines Novoalign and GATK.

  5. Whole-Genome Sequencing and Variant Analysis of Human Papillomavirus 16 Infections.

    PubMed

    van der Weele, Pascal; Meijer, Chris J L M; King, Audrey J

    2017-10-01

    Human papillomavirus (HPV) is a strongly conserved DNA virus, high-risk types of which can cause cervical cancer in persistent infections. The most common type found in HPV-attributable cancer is HPV16, which can be subdivided into four lineages (A to D) with different carcinogenic properties. Studies have shown HPV16 sequence diversity in different geographical areas, but only limited information is available regarding HPV16 diversity within a population, especially at the whole-genome level. We analyzed HPV16 major variant diversity and conservation in persistent infections and performed a single nucleotide polymorphism (SNP) comparison between persistent and clearing infections. Materials were obtained in the Netherlands from a cohort study with longitudinal follow-up for up to 3 years. Our analysis shows a remarkably large variant diversity in the population. Whole-genome sequences were obtained for 57 persistent and 59 clearing HPV16 infections, resulting in 109 unique variants. Interestingly, persistent infections were completely conserved through time. One reinfection event was identified where the initial and follow-up samples clustered differently. Non-A1/A2 variants seemed to clear preferentially (P = 0.02). Our analysis shows that population-wide HPV16 sequence diversity is very large. In persistent infections, the HPV16 sequence was fully conserved. Sequencing can identify HPV16 reinfections, although occurrence is rare. SNP comparison identified no strongly acting effect of the viral genome affecting HPV16 infection clearance or persistence in up to 3 years of follow-up. These findings suggest the progression of an early HPV16 infection could be host related. IMPORTANCE Human papillomavirus 16 (HPV16) is the predominant type found in cervical cancer. Progression of initial infection to cervical cancer has been linked to sequence properties; however, knowledge of variants circulating in European populations, especially with longitudinal follow-up, is

  6. BEST1 sequence variants in Italian patients with vitelliform macular dystrophy

    PubMed Central

    Sodi, Andrea; Passerini, Ilaria; Caputo, Roberto; Bacci, Giacomo Maria; Bodoj, Mirela; Torricelli, Francesca; Menchini, Ugo

    2012-01-01

    Purpose To analyze the spectrum of sequence variants in the BEST1 gene in a group of Italian patients affected by Best vitelliform macular dystrophy (VMD). Methods Thirty Italian patients with a diagnosis of VMD and 20 clinically healthy relatives were recruited. They belonged to 19 Italian families predominantly originating from central Italy. They received a standard ophthalmologic examination, OCT scan, and electrophysiological tests (ERG and EOG). Fluorescein and ICG angiographies and fundus autofluorescence imaging were performed in selected cases. DNA samples were analyzed for sequence variants of the BEST1 gene by direct sequencing techniques. Results Nine missense variants and one deletion were found in the affected patients; each patient carried one mutation. Five variants [c.73C>T (p.Arg25Trp), c.652C>T (p.Arg218Cys), c.652C>G (p.Arg218Gly), c.728C>T (p.Ala243Val), c.893T>C (p.Phe298Ser)] have already been described in literature while another five variants [c.217A>C (p.Ile73Leu), c.239T>G (p.Phe80Cys), c.883_885del (p.Ile295del), c.907G>A (p.Asp303Asn), c.911A>G (p.Asp304Gly)] had not previously been reported. Affected patients, sometimes even from the same family, occasionally showed variable phenotypes. One heterozygous variant was also found in five clinically healthy relatives with normal fundus, visual acuity and ERG but with abnormal EOG. Conclusions Ten variants in the BEST1 gene were detected in a group of individuals with clinically apparent VMD, and in some clinically normal individuals with an abnormal EOG. The high prevalence of novel variants and the frequent report of a specific variant (p.Arg25Trp) that has rarely been described in other ethnic groups suggests a distribution of BEST1 variants peculiar to Italian VMD patients. PMID:23213274

  7. Intact Protein Analysis at 21 Tesla and X-Ray Crystallography Define Structural Differences in Single Amino Acid Variants of Human Mitochondrial Branched-Chain Amino Acid Aminotransferase 2 (BCAT2)

    NASA Astrophysics Data System (ADS)

    Anderson, Lissa C.; Håkansson, Maria; Walse, Björn; Nilsson, Carol L.

    2017-09-01

    Structural technologies are an essential component in the design of precision therapeutics. Precision medicine entails the development of therapeutics directed toward a designated target protein, with the goal to deliver the right drug to the right patient at the right time. In the field of oncology, protein structural variants are often associated with oncogenic potential. In a previous proteogenomic screen of patient-derived glioblastoma (GBM) tumor materials, we identified a sequence variant of human mitochondrial branched-chain amino acid aminotransferase 2 as a putative factor of resistance of GBM to standard-of-care-treatments. The enzyme generates glutamate, which is neurotoxic. To elucidate structural coordinates that may confer altered substrate binding or activity of the variant BCAT2 T186R, a 45 kDa protein, we applied combined ETD and CID top-down mass spectrometry in a LC-FT-ICR MS at 21 T, and X-Ray crystallography in the study of both the variant and non-variant intact proteins. The combined ETD/CID fragmentation pattern allowed for not only extensive sequence coverage but also confident localization of the amino acid variant to its position in the sequence. The crystallographic experiments confirmed the hypothesis generated by in silico structural homology modeling, that the Lys59 side-chain of BCAT2 may repulse the Arg186 in the variant protein (PDB code: 5MPR), leading to destabilization of the protein dimer and altered enzyme kinetics. Taken together, the MS and novel 3D structural data give us reason to further pursue BCAT2 T186R as a precision drug target in GBM.

  8. FAVR (Filtering and Annotation of Variants that are Rare): methods to facilitate the analysis of rare germline genetic variants from massively parallel sequencing datasets

    PubMed Central

    2013-01-01

    Background Characterising genetic diversity through the analysis of massively parallel sequencing (MPS) data offers enormous potential to significantly improve our understanding of the genetic basis for observed phenotypes, including predisposition to and progression of complex human disease. Great challenges remain in resolving genetic variants that are genuine from the millions of artefactual signals. Results FAVR is a suite of new methods designed to work with commonly used MPS analysis pipelines to assist in the resolution of some of the issues related to the analysis of the vast amount of resulting data, with a focus on relatively rare genetic variants. To the best of our knowledge, no equivalent method has previously been described. The most important and novel aspect of FAVR is the use of signatures in comparator sequence alignment files during variant filtering, and annotation of variants potentially shared between individuals. The FAVR methods use these signatures to facilitate filtering of (i) platform and/or mapping-specific artefacts, (ii) common genetic variants, and, where relevant, (iii) artefacts derived from imbalanced paired-end sequencing, as well as annotation of genetic variants based on evidence of co-occurrence in individuals. We applied conventional variant calling to whole-exome sequencing datasets, produced using both SOLiD and TruSeq chemistries, with or without downstream processing by FAVR methods. We demonstrate a 3-fold smaller rare single nucleotide variant shortlist with no detected reduction in sensitivity. This analysis included Sanger sequencing of rare variant signals not evident in dbSNP131, assessment of known variant signal preservation, and comparison of observed and expected rare variant numbers across a range of first cousin pairs. 
The principles described herein were applied in our recent publication identifying XRCC2 as a new breast cancer risk gene and have been made publicly available as a suite of software

  9. Routine HLA-B genotyping with PCR-sequence-specific oligonucleotides detects a B*52 variant (B*5206).

    PubMed

    Hoelsch, K; Lenggeler, I; Pfannes, W; Knabe, H; Klein, H-G; Woelpl, A

    2005-05-01

    A new human leukocyte antigen (HLA)-B allele was found during routine typing of samples for a German unrelated bone marrow donor registry, the "Aktion Knochenmarkspende Bayern". After first interpretation of data of two independent low-resolution sequence-specific oligonucleotide typing tests, a B*51 variant was suggested. Further analysis via sequence-based typing identified the sequence as a new B*52 allele. This new allele, officially assigned as B*5206, differs from HLA-B*520102 by one nucleotide exchange in exon 2. The mutation is located at nucleotide position 274, at which a cytosine is substituted by a thymine, leading to an amino acid change at protein position 67 from serine (TCC) to phenylalanine (TTC).
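
    The codon change reported above can be checked with a small lookup. The sketch below is only an illustration: the two-entry codon table and the helper name `substitute` are assumptions introduced here, and the position of the changed base within the codon (the second base) is inferred from comparing TCC with TTC, not from the nucleotide numbering in the abstract.

```python
# Two-entry excerpt of the standard genetic code; only the codons
# named in the abstract are included (an illustrative assumption).
CODON_TABLE = {"TCC": "Ser", "TTC": "Phe"}


def substitute(codon: str, offset: int, base: str) -> str:
    """Return `codon` with the base at 0-based `offset` replaced by `base`."""
    return codon[:offset] + base + codon[offset + 1:]


reference_codon = "TCC"
# TCC and TTC differ at the second base (offset 1): C is replaced by T.
variant_codon = substitute(reference_codon, 1, "T")

# Protein position 67 is taken directly from the abstract.
change = f"{CODON_TABLE[reference_codon]}67{CODON_TABLE[variant_codon]}"
print(variant_codon, change)
```

    A full translation would use the complete 64-codon table; the excerpt here is kept minimal so that nothing beyond the abstract's own statement is asserted.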

  10. Human papillomavirus type 18 variant lineages in United States populations characterized by sequence analysis of LCR-E6, E2, and L1 regions.

    PubMed

    Arias-Pulido, Hugo; Peyton, Cheri L; Torrez-Martínez, Norah; Anderson, D Nelson; Wheeler, Cosette M

    2005-07-20

    While HPV 16 variant lineages have been well characterized, the knowledge about HPV 18 variants is limited. In this study, HPV 18 nucleotide variations in the E2 hinge region were characterized by sequence analysis in 47 control and 51 tumor specimens. Fifty of these specimens were randomly selected for sequencing of an LCR-E6 segment and 20 samples representative of LCR-E6 and E2 sequence variants were examined across the L1 region. A total of 2770 nucleotides per HPV 18 variant genome were considered in this study. HPV 18 variant nucleotides were linked among all gene segments analyzed and grouped into three main branches: Asian-American (AA), European (E), and African (Af). These three branches were equally distributed among controls and cases and when stratified by Hispanic and non-Hispanic ethnicities. Among invasive cervical cancer cases, no significant differences in the three HPV variant branches were observed among ethnic groups or when stratified by histopathology (squamous vs. adenocarcinoma). The Af branch showed the greatest nucleotide variability when compared to the HPV 18 reference sequence and was more closely related to HPV 45 than either AA or E branches. Our data also characterize nucleotide and amino acid variations in the L1 capsid gene among HPV 18 variants, which may be relevant to vaccine strategies and subsequent studies of naturally occurring HPV 18 variants. Several novel HPV 18 nucleotide variations were identified in this study.

  11. Amino acid changes in disease-associated variants differ radically from variants observed in the 1000 genomes project dataset.

    PubMed

    de Beer, Tjaart A P; Laskowski, Roman A; Parks, Sarah L; Sipos, Botond; Goldman, Nick; Thornton, Janet M

    2013-01-01

    The 1000 Genomes Project data provides a natural background dataset for amino acid germline mutations in humans. Since the direction of mutation is known, the amino acid exchange matrix generated from the observed nucleotide variants is asymmetric and the mutabilities of the different amino acids are very different. These differences predominantly reflect preferences for nucleotide mutations in the DNA (especially the high mutation rate of the CpG dinucleotide, which makes arginine mutability very much higher than other amino acids) rather than selection imposed by protein structure constraints, although there is evidence for the latter as well. The variants occur predominantly on the surface of proteins (82%), with a slight preference for sites which are more exposed and less well conserved than random. Mutations to functional residues occur about half as often as expected by chance. The disease-associated amino acid variant distributions in OMIM are radically different from those expected on the basis of the 1000 Genomes dataset. The disease-associated variants preferentially occur in more conserved sites, compared to 1000 Genomes mutations. Many of the amino acid exchange profiles appear to exhibit an anti-correlation, with common exchanges in one dataset being rare in the other. Disease-associated variants exhibit more extreme differences in amino acid size and hydrophobicity. More modelling of the mutational processes at the nucleotide level is needed, but these observations should contribute to an improved prediction of the effects of specific variants in humans.

  12. Whole Exome Sequencing Identifies Rare Protein-Coding Variants in Behçet's Disease.

    PubMed

    Ognenovski, Mikhail; Renauer, Paul; Gensterblum, Elizabeth; Kötter, Ina; Xenitidis, Theodoros; Henes, Jörg C; Casali, Bruno; Salvarani, Carlo; Direskeneli, Haner; Kaufman, Kenneth M; Sawalha, Amr H

    2016-05-01

    Behçet's disease (BD) is a systemic inflammatory disease with an incompletely understood etiology. Despite the identification of multiple common genetic variants associated with BD, rare genetic variants have been less explored. We undertook this study to investigate the role of rare variants in BD by performing whole exome sequencing in BD patients of European descent. Whole exome sequencing was performed in a discovery set comprising 14 German BD patients of European descent. For replication and validation, Sanger sequencing and Sequenom genotyping were performed in the discovery set and in 2 additional independent sets of 49 German BD patients and 129 Italian BD patients of European descent. Genetic association analysis was then performed in BD patients and 503 controls of European descent. Functional effects of associated genetic variants were assessed using bioinformatic approaches. Using whole exome sequencing, we identified 77 rare variants (in 74 genes) with predicted protein-damaging effects in BD. These variants were genotyped in 2 additional patient sets and then analyzed to reveal significant associations with BD at 2 genetic variants detected in all 3 patient sets that remained significant after Bonferroni correction. We detected genetic association between BD and LIMK2 (rs149034313), involved in regulating cytoskeletal reorganization, and between BD and NEIL1 (rs5745908), involved in base excision DNA repair (P = 3.22 × 10^-4 and P = 5.16 × 10^-4, respectively). The LIMK2 association is a missense variant with predicted protein damage that may influence functional interactions with proteins involved in cytoskeletal regulation by Rho GTPase, inflammation mediated by chemokine and cytokine signaling pathways, T cell activation, and angiogenesis (Bonferroni-corrected P = 5.63 × 10^-14, P = 7.29 × 10^-6, P = 1.15 × 10^-5, and P = 6.40 × 10^-3, respectively). The genetic association in NEIL1 is a predicted splice

  13. Comprehensive Rare Variant Analysis via Whole-Genome Sequencing to Determine the Molecular Pathology of Inherited Retinal Disease.

    PubMed

    Carss, Keren J; Arno, Gavin; Erwood, Marie; Stephens, Jonathan; Sanchis-Juan, Alba; Hull, Sarah; Megy, Karyn; Grozeva, Detelina; Dewhurst, Eleanor; Malka, Samantha; Plagnol, Vincent; Penkett, Christopher; Stirrups, Kathleen; Rizzo, Roberta; Wright, Genevieve; Josifova, Dragana; Bitner-Glindzicz, Maria; Scott, Richard H; Clement, Emma; Allen, Louise; Armstrong, Ruth; Brady, Angela F; Carmichael, Jenny; Chitre, Manali; Henderson, Robert H H; Hurst, Jane; MacLaren, Robert E; Murphy, Elaine; Paterson, Joan; Rosser, Elisabeth; Thompson, Dorothy A; Wakeling, Emma; Ouwehand, Willem H; Michaelides, Michel; Moore, Anthony T; Webster, Andrew R; Raymond, F Lucy

    2017-01-05

    Inherited retinal disease is a common cause of visual impairment and represents a highly heterogeneous group of conditions. Here, we present findings from a cohort of 722 individuals with inherited retinal disease, who have had whole-genome sequencing (n = 605), whole-exome sequencing (n = 72), or both (n = 45) performed, as part of the NIHR-BioResource Rare Diseases research study. We identified pathogenic variants (single-nucleotide variants, indels, or structural variants) for 404/722 (56%) individuals. Whole-genome sequencing gives unprecedented power to detect three categories of pathogenic variants in particular: structural variants, variants in GC-rich regions, which have significantly improved coverage compared to whole-exome sequencing, and variants in non-coding regulatory regions. In addition to previously reported pathogenic regulatory variants, we have identified a previously unreported pathogenic intronic variant in CHM in two males with choroideremia. We have also identified 19 genes not previously known to be associated with inherited retinal disease, which harbor biallelic predicted protein-truncating variants in unsolved cases. Whole-genome sequencing is an increasingly important comprehensive method with which to investigate the genetic causes of inherited retinal disease. Copyright © 2017. Published by Elsevier Inc.

  14. CBH1 homologs and variant CBH1 cellulases

    DOEpatents

    Goedegebuur, Frits [Rozenlaan, NL]; Gualfetti, Peter [San Francisco, CA]; Mitchinson, Colin [Half Moon Bay, CA]; Neefe, Paulien [Zoetermeer, NL]

    2011-05-31

    Disclosed are a number of homologs and variants of Hypocrea jecorina Cel7A (formerly Trichoderma reesei cellobiohydrolase I or CBH1), nucleic acids encoding the same and methods for producing the same. The homologs and variant cellulases have the amino acid sequence of a glycosyl hydrolase of family 7A wherein one or more amino acid residues are substituted and/or deleted.

  15. Multiplexed enrichment of rare DNA variants via sequence-selective and temperature-robust amplification

    PubMed Central

    Wu, Lucia R.; Chen, Sherry X.; Wu, Yalei; Patel, Abhijit A.; Zhang, David Yu

    2018-01-01

    Rare DNA-sequence variants hold important clinical and biological information, but existing detection techniques are expensive, complex, allele-specific, or don’t allow for significant multiplexing. Here, we report a temperature-robust polymerase-chain-reaction method, which we term blocker displacement amplification (BDA), that selectively amplifies all sequence variants, including single-nucleotide variants (SNVs), within a roughly 20-nucleotide window by 1,000-fold over wild-type sequences. This allows for easy detection and quantitation of hundreds of potential variants originally at ≤0.1% in allele frequency. BDA is compatible with inexpensive thermocycler instrumentation and employs a rationally designed competitive hybridization reaction to achieve comparable enrichment performance across annealing temperatures ranging from 56 °C to 64 °C. To show the sequence generality of BDA, we demonstrate enrichment of 156 SNVs and the reliable detection of single-digit copies. We also show that the BDA detection of rare driver mutations in cell-free DNA samples extracted from the blood plasma of lung-cancer patients is highly consistent with deep sequencing using molecular lineage tags, with a receiver operator characteristic accuracy of 95%. PMID:29805844
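
    As a rough worked example of what a 1,000-fold enrichment means for a variant starting at 0.1% allele frequency, the sketch below assumes that "1,000-fold over wild-type" multiplies the variant-to-wild-type ratio by the enrichment factor. That interpretation and the function name `enriched_vaf` are assumptions made for illustration, not the authors' stated model.

```python
def enriched_vaf(initial_vaf: float, fold_enrichment: float) -> float:
    """Variant allele fraction after enrichment.

    Assumes the variant:wild-type ratio is multiplied by `fold_enrichment`
    (a simplifying, hypothetical model of selective amplification).
    """
    variant = initial_vaf * fold_enrichment
    wild_type = 1.0 - initial_vaf
    return variant / (variant + wild_type)


# A variant at 0.1% allele frequency, enriched 1,000-fold.
vaf_after = enriched_vaf(0.001, 1000)
print(f"{vaf_after:.4f}")
```

    Under this assumption, a variant that starts at 0.1% ends up at roughly half of the amplified molecules, which would put it within reach of ordinary detection methods.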

  16. VarDict: a novel and versatile variant caller for next-generation sequencing in cancer research

    PubMed Central

    Lai, Zhongwu; Markovets, Aleksandra; Ahdesmaki, Miika; Chapman, Brad; Hofmann, Oliver; McEwen, Robert; Johnson, Justin; Dougherty, Brian; Barrett, J. Carl; Dry, Jonathan R.

    2016-01-01

    Accurate variant calling in next generation sequencing (NGS) is critical to understand cancer genomes better. Here we present VarDict, a novel and versatile variant caller for both DNA- and RNA-sequencing data. VarDict simultaneously calls SNV, MNV, InDels, complex and structural variants, expanding the detected genetic driver landscape of tumors. It performs local realignments on the fly for more accurate allele frequency estimation. VarDict performance scales linearly to sequencing depth, enabling ultra-deep sequencing used to explore tumor evolution or detect tumor DNA circulating in blood. In addition, VarDict performs amplicon aware variant calling for polymerase chain reaction (PCR)-based targeted sequencing often used in diagnostic settings, and is able to detect PCR artifacts. Finally, VarDict also detects differences in somatic and loss of heterozygosity variants between paired samples. VarDict reprocessing of The Cancer Genome Atlas (TCGA) Lung Adenocarcinoma dataset called known driver mutations in KRAS, EGFR, BRAF, PIK3CA and MET in 16% more patients than previously published variant calls. We believe VarDict will greatly facilitate application of NGS in clinical cancer research. PMID:27060149

  17. Higher criticism approach to detect rare variants using whole genome sequencing data

    PubMed Central

    2014-01-01

    Because of low statistical power of single-variant tests for whole genome sequencing (WGS) data, the association test for variant groups is a key approach for genetic mapping. To address the features of sparse and weak genetic effects to be detected, the higher criticism (HC) approach has been proposed and theoretically has proven optimal for detecting sparse and weak genetic effects. Here we develop a strategy to apply the HC approach to WGS data that contains rare variants as the majority. By using Genetic Analysis Workshop 18 "dose" genetic data with simulated phenotypes, we assess the performance of HC under a variety of strategies for grouping variants and collapsing rare variants. The HC approach is compared with the minimal p-value method and the sequence kernel association test. The results show that the HC approach is preferred for detecting weak genetic effects. PMID:25519367
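
    For readers unfamiliar with the higher criticism statistic, the sketch below computes the commonly cited Donoho-Jin form from a set of p-values for one variant group. It is an assumption that this form matches what the authors used; the function name `higher_criticism`, the `alpha0` cutoff, and the toy p-values are all hypothetical.

```python
import math


def higher_criticism(p_values, alpha0=0.5):
    """Higher criticism statistic (Donoho-Jin form) for a list of p-values.

    For the sorted p-values p_(1) <= ... <= p_(n), takes the maximum over
    i <= alpha0 * n of sqrt(n) * (i/n - p_(i)) / sqrt(p_(i) * (1 - p_(i))).
    """
    ordered = sorted(p_values)
    n = len(ordered)
    best = float("-inf")
    for i, p_i in enumerate(ordered, start=1):
        if i > alpha0 * n:
            break
        if p_i <= 0.0 or p_i >= 1.0:
            continue  # the standardised term is undefined at 0 and 1
        term = math.sqrt(n) * (i / n - p_i) / math.sqrt(p_i * (1.0 - p_i))
        best = max(best, term)
    return best


# Toy data: evenly spread p-values (no signal) versus the same set with
# the two smallest p-values replaced by very small ones (sparse signal).
null_like = [(i - 0.5) / 10 for i in range(1, 11)]
with_signal = [1e-4, 2e-4] + null_like[2:]

print(higher_criticism(null_like), higher_criticism(with_signal))
```

    The statistic grows sharply when a few p-values are much smaller than expected under the null, which is the sparse-and-weak setting the abstract describes.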

  18. Screening of whole genome sequences identified high-impact variants for stallion fertility.

    PubMed

    Schrimpf, Rahel; Gottschalk, Maren; Metzger, Julia; Martinsson, Gunilla; Sieme, Harald; Distl, Ottmar

    2016-04-14

    Stallion fertility is an economically important trait due to the increase of artificial insemination in horses. The availability of whole genome sequence data facilitates identification of rare high-impact variants contributing to stallion fertility. The aim of our study was to genotype rare high-impact variants retrieved from next-generation sequencing (NGS)-data of 11 horses in order to unravel harmful genetic variants in large samples of stallions. Gene ontology (GO) terms and search results from public databases were used to obtain a comprehensive list of human and mouse genes predicted to participate in the regulation of male reproduction. The corresponding equine orthologous genes were searched in whole genome sequence data of seven stallions and four mares and filtered for high-impact genetic variants using SnpEFF, SIFT and Polyphen 2 software. All genetic variants with the missing homozygous mutant genotype were genotyped on 337 fertile stallions of 19 breeds using KASP genotyping assays or PCR-RFLP. Mixed linear model analysis was employed for an association analysis with de-regressed estimated breeding values of the paternal component of the pregnancy rate per estrus (EBV-PAT). We screened next generation sequenced data of whole genomes from 11 horses for equine genetic variants in 1194 human and mouse genes involved in male fertility and linked through common gene ontology (GO) with male reproductive processes. Variants were filtered for high-impact on protein structure and validated through SIFT and Polyphen 2. Only those genetic variants were followed up when the homozygote mutant genotype was missing in the detection sample comprising 11 horses. After this filtering process, 17 single nucleotide polymorphisms (SNPs) were left. These SNPs were genotyped in 337 fertile stallions of 19 breeds using KASP genotyping assays or PCR-RFLP. 
An association analysis in 216 Hanoverian stallions revealed a significant association of the splice-site disruption variant

  19. CYP3A4 allelic variants with amino acid substitutions in exons 7 and 12: evidence for an allelic variant with altered catalytic activity.

    PubMed

    Sata, F; Sapone, A; Elizondo, G; Stocker, P; Miller, V P; Zheng, W; Raunio, H; Crespi, C L; Gonzalez, F J

    2000-01-01

    To determine the existence of mutant and variant CYP3A4 alleles in three racial groups and to assess functions of the variant alleles by complementary deoxyribonucleic acid (cDNA) expression. A bacterial artificial chromosome that contains the complete CYP3A4 gene was isolated and the exons and surrounding introns were directly sequenced to develop primers to polymerase chain reaction (PCR) amplify and sequence the gene from lymphocyte DNA. DNA samples from Chinese, black, and white subjects were screened. Mutating the affected amino acid in the wild-type cDNA and expressing the variant enzyme with use of the baculovirus system was used to functionally evaluate the variant allele having a missense mutation. To investigate the existence of mutant and variant CYP3A4 alleles in humans, all 13 exons and the 5'-flanking region of the human CYP3A4 gene in three racial groups were sequenced and four alleles were identified. An A-->G point mutation in the 5'-flanking region of the human CYP3A4 gene, designated CYP3A4*1B, was found in the three different racial groups. The frequency of this allele in a white population was 4.2%, whereas it was 66.7% in black subjects. The CYP3A4*1B allele was not found in Chinese subjects. A second variant allele, designated CYP3A4*2, having a Ser222Pro change, was found at a frequency of 2.7% in the white population and was absent in the black subjects and Chinese subjects analyzed. Baculovirus-directed cDNA expression revealed that the CYP3A4*2 P450 had a lower intrinsic clearance for the CYP3A4 substrate nifedipine compared with the wild-type enzyme but was not significantly different from the wild-type enzyme for testosterone 6beta-hydroxylation. Another rare allele, designated CYP3A4*3, was found in a single Chinese subject who had a Met445Thr change in the conserved heme-binding region of the P450. 
These are the first examples of potential function polymorphisms resulting from missense mutations in

  20. A survey of tools for variant analysis of next-generation genome sequencing data

    PubMed Central

    Pabinger, Stephan; Dander, Andreas; Fischer, Maria; Snajder, Rene; Sperk, Michael; Efremova, Mirjana; Krabichler, Birgit; Speicher, Michael R.; Zschocke, Johannes

    2014-01-01

    Recent advances in genome sequencing technologies provide unprecedented opportunities to characterize individual genomic landscapes and identify mutations relevant for diagnosis and therapy. Specifically, whole-exome sequencing using next-generation sequencing (NGS) technologies is gaining popularity in the human genetics community due to the moderate costs, manageable data amounts and straightforward interpretation of analysis results. While whole-exome and, in the near future, whole-genome sequencing are becoming commodities, data analysis still poses significant challenges and led to the development of a plethora of tools supporting specific parts of the analysis workflow or providing a complete solution. Here, we surveyed 205 tools for whole-genome/whole-exome sequencing data analysis supporting five distinct analytical steps: quality assessment, alignment, variant identification, variant annotation and visualization. We report an overview of the functionality, features and specific requirements of the individual tools. We then selected 32 programs for variant identification, variant annotation and visualization, which were subjected to hands-on evaluation using four data sets: one set of exome data from two patients with a rare disease for testing identification of germline mutations, two cancer data sets for testing variant callers for somatic mutations, copy number variations and structural variations, and one semi-synthetic data set for testing identification of copy number variations. Our comprehensive survey and evaluation of NGS tools provides a valuable guideline for human geneticists working on Mendelian disorders, complex diseases and cancers. PMID:23341494

  1. Supplementation of Nucleosides During Selection can Reduce Sequence Variant Levels in CHO Cells Using GS/MSX Selection System.

    PubMed

    Tang, Danming; Lam, Cynthia; Louie, Salina; Hoi, Kam Hon; Shaw, David; Yim, Mandy; Snedecor, Brad; Misaghi, Shahram

    2018-01-01

    In the process of generating stable monoclonal antibody (mAb) producing cell lines, reagents such as methotrexate (MTX) or methionine sulfoximine (MSX) are often used. However, using such selection reagent(s) increases the possibility of having higher occurrence of sequence variants in the expressed antibody molecules due to the effects of MTX or MSX on de novo nucleotide synthesis. Since MSX inhibits glutamine synthase (GS) and results in both amino acid and nucleoside starvation, it is questioned whether supplementing nucleosides into the media could lower sequence variant levels without affecting titer. The results show that the supplementation of nucleosides to the media during MSX selection decreased genomic DNA mutagenesis rates in the selected cells, probably by reducing nucleotide mis-incorporation into the DNA. Furthermore, addition of nucleosides enhances clone recovery post selection and does not affect antibody expression. It is further observed that nucleoside supplements lowered DNA mutagenesis rates only at the initial stage of the clone selection and do not have any effect on DNA mutagenesis rates after stable cell lines are established. Therefore, the data suggest that addition of nucleosides during early stages of MSX selection can lower sequence variant levels without affecting titer or clone stability in antibody expression. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  2. Evaluating Variant Calling Tools for Non-Matched Next-Generation Sequencing Data

    NASA Astrophysics Data System (ADS)

    Sandmann, Sarah; de Graaf, Aniek O.; Karimi, Mohsen; van der Reijden, Bert A.; Hellström-Lindberg, Eva; Jansen, Joop H.; Dugas, Martin

    2017-02-01

    Valid variant calling results are crucial for the use of next-generation sequencing in clinical routine. However, there are numerous variant calling tools that usually differ in algorithms, filtering strategies, recommendations and thus, also in the output. We evaluated eight open-source tools regarding their ability to call single nucleotide variants and short indels with allelic frequencies as low as 1% in non-matched next-generation sequencing data: GATK HaplotypeCaller, Platypus, VarScan, LoFreq, FreeBayes, SNVer, SAMtools and VarDict. We analysed two real datasets from patients with myelodysplastic syndrome, covering 54 Illumina HiSeq samples and 111 Illumina NextSeq samples. Mutations were validated by re-sequencing on the same platform, on a different platform and expert based review. In addition we considered two simulated datasets with varying coverage and error profiles, covering 50 samples each. In all cases an identical target region consisting of 19 genes (42,322 bp) was analysed. Altogether, no tool succeeded in calling all mutations. High sensitivity was always accompanied by low precision. Influence of varying coverage and background noise on variant calling was generally low. Taking everything into account, VarDict performed best. However, our results indicate that there is a need to improve reproducibility of the results in the context of multithreading.

  3. Whole-exome sequencing for variant discovery in blepharospasm.

    PubMed

    Tian, Jun; Vemula, Satya R; Xiao, Jianfeng; Valente, Enza Maria; Defazio, Giovanni; Petrucci, Simona; Gigante, Angelo Fabio; Rudzińska-Bar, Monika; Wszolek, Zbigniew K; Kennelly, Kathleen D; Uitti, Ryan J; van Gerpen, Jay A; Hedera, Peter; Trimble, Elizabeth J; LeDoux, Mark S

    2018-05-16

    Blepharospasm (BSP) is a type of focal dystonia characterized by involuntary orbicularis oculi spasms that are usually bilateral, synchronous, and symmetrical. Despite strong evidence for genetic contributions to BSP, progress in the field has been constrained by small cohorts, incomplete penetrance, and late age of onset. Although several genetic etiologies for dystonia have been identified through whole-exome sequencing (WES), none of these are characteristically associated with BSP as a singular or predominant manifestation. We performed WES on 31 subjects from 21 independent pedigrees with BSP. The strongest candidate sequence variants derived from in silico analyses were confirmed with bidirectional Sanger sequencing and subjected to cosegregation analysis. Cosegregating deleterious variants (GRCh37/hg19) in CACNA1A (NM_001127222.1: c.7261_7262delinsGT, p.Pro2421Val), REEP4 (NM_025232.3: c.109C>T, p.Arg37Trp), TOR2A (NM_130459.3: c.568C>T, p.Arg190Cys), and ATP2A3 (NM_005173.3: c.1966C>T, p.Arg656Cys) were identified in four independent multigenerational pedigrees. Deleterious variants in HS1BP3 (NM_022460.3: c.94C>A, p.Gly32Cys) and GNA14 (NM_004297.3: c.989_990del, p.Thr330ArgfsTer67) were identified in a father and son with segmental cranio-cervical dystonia first manifest as BSP. Deleterious variants in DNAH17, TRPV4, CAPN11, VPS13C, UNC13B, SPTBN4, MYOD1, and MRPL15 were found in two or more independent pedigrees. To our knowledge, none of these genes have previously been associated with isolated BSP, although other CACNA1A mutations have been associated with both positive and negative motor disorders including ataxia, episodic ataxia, hemiplegic migraine, and dystonia. Our WES datasets provide a platform for future studies of BSP genetics which will demand careful consideration of incomplete penetrance, pleiotropy, population stratification, and oligogenic inheritance patterns. © 2018 The Authors. 
Molecular Genetics & Genomic Medicine published by Wiley

  4. Investigation of rare and low-frequency variants using high-throughput sequencing with pooled DNA samples

    PubMed Central

    Wang, Jingwen; Skoog, Tiina; Einarsdottir, Elisabet; Kaartokallio, Tea; Laivuori, Hannele; Grauers, Anna; Gerdhem, Paul; Hytönen, Marjo; Lohi, Hannes; Kere, Juha; Jiao, Hong

    2016-01-01

    High-throughput sequencing using pooled DNA samples can facilitate genome-wide studies on rare and low-frequency variants in a large population. Some major questions concerning the pooling sequencing strategy are whether rare and low-frequency variants can be detected reliably, and whether estimated minor allele frequencies (MAFs) can represent the actual values obtained from individually genotyped samples. In this study, we evaluated MAF estimates using three variant detection tools with two sets of pooled whole exome sequencing (WES) and one set of pooled whole genome sequencing (WGS) data. Both GATK and Freebayes displayed high sensitivity, specificity and accuracy when detecting rare or low-frequency variants. For the WGS study, 56% of the low-frequency variants in Illumina array have identical MAFs and 26% have one allele difference between sequencing and individual genotyping data. The MAF estimates from WGS correlated well (r = 0.94) with those from Illumina arrays. The MAFs from the pooled WES data also showed high concordance (r = 0.88) with those from the individual genotyping data. In conclusion, the MAFs estimated from pooled DNA sequencing data reflect the MAFs in individually genotyped samples well. The pooling strategy can thus be a rapid and cost-effective approach for the initial screening in large-scale association studies. PMID:27633116
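
    The comparison described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the study's pipeline: the pooled MAF is taken as the fraction of reads carrying the less common allele, the implied allele count assumes diploid individuals, and the function names (pooled_maf, allele_count, pearson_r) are hypothetical.

```python
from math import sqrt

def pooled_maf(alt_reads, total_reads):
    """Estimate the minor allele frequency in a pool from read counts.

    Assumption: each read samples one chromosome from the pool at random,
    so the alternate-read fraction approximates the allele frequency.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    freq = alt_reads / total_reads
    return min(freq, 1.0 - freq)  # report the frequency of the minor allele

def allele_count(maf, n_individuals):
    """Convert a frequency into a whole number of alleles in a diploid pool."""
    return round(maf * 2 * n_individuals)

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of MAF estimates."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)
```

    Comparing allele_count for the pooled and the individually genotyped estimate of the same site gives the "identical" or "one allele difference" categories mentioned in the abstract; pearson_r over all sites gives a concordance figure of the kind reported there.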

  5. Linkage disequilibrium among commonly genotyped SNP and variants detected from bull sequence

    USDA-ARS's Scientific Manuscript database

    Genomic prediction utilizing causal variants could increase selection accuracy above that achieved with SNP genotyped by commercial assays. A number of variants detected from sequencing influential sires are likely to be causal, but noticeable improvements in prediction accuracy using imputed sequen...

  6. A machine learning model to determine the accuracy of variant calls in capture-based next generation sequencing.

    PubMed

    van den Akker, Jeroen; Mishne, Gilad; Zimmer, Anjali D; Zhou, Alicia Y

    2018-04-17

    Next generation sequencing (NGS) has become a common technology for clinical genetic tests. The quality of NGS calls varies widely and is influenced by features like reference sequence characteristics, read depth, and mapping accuracy. With recent advances in NGS technology and software tools, the majority of variants called using NGS alone are in fact accurate and reliable. However, a small subset of difficult-to-call variants still requires orthogonal confirmation. For this reason, many clinical laboratories confirm NGS results using orthogonal technologies such as Sanger sequencing. Here, we report the development of a deterministic machine-learning-based model to differentiate between these two types of variant calls: those that do not require confirmation using an orthogonal technology (high confidence), and those that require additional quality testing (low confidence). This approach allows reliable NGS-based calling in a clinical setting by identifying the few important variant calls that require orthogonal confirmation. We developed and tested the model using a set of 7179 variants identified by a targeted NGS panel and re-tested by Sanger sequencing. The model incorporated several signals of sequence characteristics and call quality to determine if a variant was identified at high or low confidence. The model was tuned to eliminate false positives, defined as variants that were called by NGS but not confirmed by Sanger sequencing. The model achieved very high accuracy: 99.4% (95% confidence interval: +/- 0.03%). It categorized 92.2% (6622/7179) of the variants as high confidence, and 100% of these were confirmed to be present by Sanger sequencing. Among the variants that were categorized as low confidence, defined as NGS calls of low quality that are likely to be artifacts, 92.1% (513/557) were found to be not present by Sanger sequencing. This work shows that NGS data contains sufficient characteristics for a machine-learning-based model to
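
    The percentages quoted in this abstract follow directly from the counts it reports. The short check below recomputes them; the constant and function names are hypothetical, and the only derived figure (low-confidence calls that Sanger sequencing did confirm) is simple subtraction rather than a number stated in the source.

```python
# Counts taken from the abstract above.
TOTAL_VARIANTS = 7179
HIGH_CONFIDENCE = 6622      # all reported as confirmed by Sanger sequencing
LOW_CONFIDENCE = 557
LOW_CONF_ABSENT = 513       # low-confidence calls not present by Sanger

def pct(part, whole):
    """Percentage of `part` in `whole`, rounded to one decimal place."""
    return round(100 * part / whole, 1)

# Derived by subtraction, not stated in the abstract.
LOW_CONF_CONFIRMED = LOW_CONFIDENCE - LOW_CONF_ABSENT

share_high = pct(HIGH_CONFIDENCE, TOTAL_VARIANTS)
share_low_absent = pct(LOW_CONF_ABSENT, LOW_CONFIDENCE)
```

    With these counts, the high- and low-confidence groups add up to the full set of 7179 variants, and share_high and share_low_absent reproduce the 92.2% and 92.1% figures in the text.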

  7. Whole-Exome Sequencing Identifies Rare and Low-Frequency Coding Variants Associated with LDL Cholesterol

    PubMed Central

    Lange, Leslie A.; Hu, Youna; Zhang, He; Xue, Chenyi; Schmidt, Ellen M.; Tang, Zheng-Zheng; Bizon, Chris; Lange, Ethan M.; Smith, Joshua D.; Turner, Emily H.; Jun, Goo; Kang, Hyun Min; Peloso, Gina; Auer, Paul; Li, Kuo-ping; Flannick, Jason; Zhang, Ji; Fuchsberger, Christian; Gaulton, Kyle; Lindgren, Cecilia; Locke, Adam; Manning, Alisa; Sim, Xueling; Rivas, Manuel A.; Holmen, Oddgeir L.; Gottesman, Omri; Lu, Yingchang; Ruderfer, Douglas; Stahl, Eli A.; Duan, Qing; Li, Yun; Durda, Peter; Jiao, Shuo; Isaacs, Aaron; Hofman, Albert; Bis, Joshua C.; Correa, Adolfo; Griswold, Michael E.; Jakobsdottir, Johanna; Smith, Albert V.; Schreiner, Pamela J.; Feitosa, Mary F.; Zhang, Qunyuan; Huffman, Jennifer E.; Crosby, Jacy; Wassel, Christina L.; Do, Ron; Franceschini, Nora; Martin, Lisa W.; Robinson, Jennifer G.; Assimes, Themistocles L.; Crosslin, David R.; Rosenthal, Elisabeth A.; Tsai, Michael; Rieder, Mark J.; Farlow, Deborah N.; Folsom, Aaron R.; Lumley, Thomas; Fox, Ervin R.; Carlson, Christopher S.; Peters, Ulrike; Jackson, Rebecca D.; van Duijn, Cornelia M.; Uitterlinden, André G.; Levy, Daniel; Rotter, Jerome I.; Taylor, Herman A.; Gudnason, Vilmundur; Siscovick, David S.; Fornage, Myriam; Borecki, Ingrid B.; Hayward, Caroline; Rudan, Igor; Chen, Y. Eugene; Bottinger, Erwin P.; Loos, Ruth J.F.; Sætrom, Pål; Hveem, Kristian; Boehnke, Michael; Groop, Leif; McCarthy, Mark; Meitinger, Thomas; Ballantyne, Christie M.; Gabriel, Stacey B.; O’Donnell, Christopher J.; Post, Wendy S.; North, Kari E.; Reiner, Alexander P.; Boerwinkle, Eric; Psaty, Bruce M.; Altshuler, David; Kathiresan, Sekar; Lin, Dan-Yu; Jarvik, Gail P.; Cupples, L. 
Adrienne; Kooperberg, Charles; Wilson, James G.; Nickerson, Deborah A.; Abecasis, Goncalo R.; Rich, Stephen S.; Tracy, Russell P.; Willer, Cristen J.; Gabriel, Stacey B.; Altshuler, David M.; Abecasis, Gonçalo R.; Allayee, Hooman; Cresci, Sharon; Daly, Mark J.; de Bakker, Paul I.W.; DePristo, Mark A.; Do, Ron; Donnelly, Peter; Farlow, Deborah N.; Fennell, Tim; Garimella, Kiran; Hazen, Stanley L.; Hu, Youna; Jordan, Daniel M.; Jun, Goo; Kathiresan, Sekar; Kang, Hyun Min; Kiezun, Adam; Lettre, Guillaume; Li, Bingshan; Li, Mingyao; Newton-Cheh, Christopher H.; Padmanabhan, Sandosh; Peloso, Gina; Pulit, Sara; Rader, Daniel J.; Reich, David; Reilly, Muredach P.; Rivas, Manuel A.; Schwartz, Steve; Scott, Laura; Siscovick, David S.; Spertus, John A.; Stitziel, Nathaniel O.; Stoletzki, Nina; Sunyaev, Shamil R.; Voight, Benjamin F.; Willer, Cristen J.; Rich, Stephen S.; Akylbekova, Ermeg; Atwood, Larry D.; Ballantyne, Christie M.; Barbalic, Maja; Barr, R. Graham; Benjamin, Emelia J.; Bis, Joshua; Boerwinkle, Eric; Bowden, Donald W.; Brody, Jennifer; Budoff, Matthew; Burke, Greg; Buxbaum, Sarah; Carr, Jeff; Chen, Donna T.; Chen, Ida Y.; Chen, Wei-Min; Concannon, Pat; Crosby, Jacy; Cupples, L. Adrienne; D’Agostino, Ralph; DeStefano, Anita L.; Dreisbach, Albert; Dupuis, Josée; Durda, J. 
Peter; Ellis, Jaclyn; Folsom, Aaron R.; Fornage, Myriam; Fox, Caroline S.; Fox, Ervin; Funari, Vincent; Ganesh, Santhi K.; Gardin, Julius; Goff, David; Gordon, Ora; Grody, Wayne; Gross, Myron; Guo, Xiuqing; Hall, Ira M.; Heard-Costa, Nancy L.; Heckbert, Susan R.; Heintz, Nicholas; Herrington, David M.; Hickson, DeMarc; Huang, Jie; Hwang, Shih-Jen; Jacobs, David R.; Jenny, Nancy S.; Johnson, Andrew D.; Johnson, Craig W.; Kawut, Steven; Kronmal, Richard; Kurz, Raluca; Lange, Ethan M.; Lange, Leslie A.; Larson, Martin G.; Lawson, Mark; Lewis, Cora E.; Levy, Daniel; Li, Dalin; Lin, Honghuang; Liu, Chunyu; Liu, Jiankang; Liu, Kiang; Liu, Xiaoming; Liu, Yongmei; Longstreth, William T.; Loria, Cay; Lumley, Thomas; Lunetta, Kathryn; Mackey, Aaron J.; Mackey, Rachel; Manichaikul, Ani; Maxwell, Taylor; McKnight, Barbara; Meigs, James B.; Morrison, Alanna C.; Musani, Solomon K.; Mychaleckyj, Josyf C.; Nettleton, Jennifer A.; North, Kari; O’Donnell, Christopher J.; O’Leary, Daniel; Ong, Frank; Palmas, Walter; Pankow, James S.; Pankratz, Nathan D.; Paul, Shom; Perez, Marco; Person, Sharina D.; Polak, Joseph; Post, Wendy S.; Psaty, Bruce M.; Quinlan, Aaron R.; Raffel, Leslie J.; Ramachandran, Vasan S.; Reiner, Alexander P.; Rice, Kenneth; Rotter, Jerome I.; Sanders, Jill P.; Schreiner, Pamela; Seshadri, Sudha; Shea, Steve; Sidney, Stephen; Silverstein, Kevin; Smith, Nicholas L.; Sotoodehnia, Nona; Srinivasan, Asoke; Taylor, Herman A.; Taylor, Kent; Thomas, Fridtjof; Tracy, Russell P.; Tsai, Michael Y.; Volcik, Kelly A.; Wassel, Chrstina L.; Watson, Karol; Wei, Gina; White, Wendy; Wiggins, Kerri L.; Wilk, Jemma B.; Williams, O. 
Dale; Wilson, Gregory; Wilson, James G.; Wolf, Phillip; Zakai, Neil A.; Hardy, John; Meschia, James F.; Nalls, Michael; Singleton, Andrew; Worrall, Brad; Bamshad, Michael J.; Barnes, Kathleen C.; Abdulhamid, Ibrahim; Accurso, Frank; Anbar, Ran; Beaty, Terri; Bigham, Abigail; Black, Phillip; Bleecker, Eugene; Buckingham, Kati; Cairns, Anne Marie; Caplan, Daniel; Chatfield, Barbara; Chidekel, Aaron; Cho, Michael; Christiani, David C.; Crapo, James D.; Crouch, Julia; Daley, Denise; Dang, Anthony; Dang, Hong; De Paula, Alicia; DeCelie-Germana, Joan; Drumm, Allen DozorMitch; Dyson, Maynard; Emerson, Julia; Emond, Mary J.; Ferkol, Thomas; Fink, Robert; Foster, Cassandra; Froh, Deborah; Gao, Li; Gershan, William; Gibson, Ronald L.; Godwin, Elizabeth; Gondor, Magdalen; Gutierrez, Hector; Hansel, Nadia N.; Hassoun, Paul M.; Hiatt, Peter; Hokanson, John E.; Howenstine, Michelle; Hummer, Laura K.; Kanga, Jamshed; Kim, Yoonhee; Knowles, Michael R.; Konstan, Michael; Lahiri, Thomas; Laird, Nan; Lange, Christoph; Lin, Lin; Lin, Xihong; Louie, Tin L.; Lynch, David; Make, Barry; Martin, Thomas R.; Mathai, Steve C.; Mathias, Rasika A.; McNamara, John; McNamara, Sharon; Meyers, Deborah; Millard, Susan; Mogayzel, Peter; Moss, Richard; Murray, Tanda; Nielson, Dennis; Noyes, Blakeslee; O’Neal, Wanda; Orenstein, David; O’Sullivan, Brian; Pace, Rhonda; Pare, Peter; Parker, H. 
Worth; Passero, Mary Ann; Perkett, Elizabeth; Prestridge, Adrienne; Rafaels, Nicholas M.; Ramsey, Bonnie; Regan, Elizabeth; Ren, Clement; Retsch-Bogart, George; Rock, Michael; Rosen, Antony; Rosenfeld, Margaret; Ruczinski, Ingo; Sanford, Andrew; Schaeffer, David; Sell, Cindy; Sheehan, Daniel; Silverman, Edwin K.; Sin, Don; Spencer, Terry; Stonebraker, Jackie; Tabor, Holly K.; Varlotta, Laurie; Vergara, Candelaria I.; Weiss, Robert; Wigley, Fred; Wise, Robert A.; Wright, Fred A.; Wurfel, Mark M.; Zanni, Robert; Zou, Fei; Nickerson, Deborah A.; Rieder, Mark J.; Green, Phil; Shendure, Jay; Akey, Joshua M.; Bustamante, Carlos D.; Crosslin, David R.; Eichler, Evan E.; Fox, P. Keolu; Fu, Wenqing; Gordon, Adam; Gravel, Simon; Jarvik, Gail P.; Johnsen, Jill M.; Kan, Mengyuan; Kenny, Eimear E.; Kidd, Jeffrey M.; Lara-Garduno, Fremiet; Leal, Suzanne M.; Liu, Dajiang J.; McGee, Sean; O’Connor, Timothy D.; Paeper, Bryan; Robertson, Peggy D.; Smith, Joshua D.; Staples, Jeffrey C.; Tennessen, Jacob A.; Turner, Emily H.; Wang, Gao; Yi, Qian; Jackson, Rebecca; Peters, Ulrike; Carlson, Christopher S.; Anderson, Garnet; Anton-Culver, Hoda; Assimes, Themistocles L.; Auer, Paul L.; Beresford, Shirley; Bizon, Chris; Black, Henry; Brunner, Robert; Brzyski, Robert; Burwen, Dale; Caan, Bette; Carty, Cara L.; Chlebowski, Rowan; Cummings, Steven; Curb, J. 
David; Eaton, Charles B.; Ford, Leslie; Franceschini, Nora; Fullerton, Stephanie M.; Gass, Margery; Geller, Nancy; Heiss, Gerardo; Howard, Barbara V.; Hsu, Li; Hutter, Carolyn M.; Ioannidis, John; Jiao, Shuo; Johnson, Karen C.; Kooperberg, Charles; Kuller, Lewis; LaCroix, Andrea; Lakshminarayan, Kamakshi; Lane, Dorothy; Lasser, Norman; LeBlanc, Erin; Li, Kuo-Ping; Limacher, Marian; Lin, Dan-Yu; Logsdon, Benjamin A.; Ludlam, Shari; Manson, JoAnn E.; Margolis, Karen; Martin, Lisa; McGowan, Joan; Monda, Keri L.; Kotchen, Jane Morley; Nathan, Lauren; Ockene, Judith; O’Sullivan, Mary Jo; Phillips, Lawrence S.; Prentice, Ross L.; Robbins, John; Robinson, Jennifer G.; Rossouw, Jacques E.; Sangi-Haghpeykar, Haleh; Sarto, Gloria E.; Shumaker, Sally; Simon, Michael S.; Stefanick, Marcia L.; Stein, Evan; Tang, Hua; Taylor, Kira C.; Thomson, Cynthia A.; Thornton, Timothy A.; Van Horn, Linda; Vitolins, Mara; Wactawski-Wende, Jean; Wallace, Robert; Wassertheil-Smoller, Sylvia; Zeng, Donglin; Applebaum-Bowden, Deborah; Feolo, Michael; Gan, Weiniu; Paltoo, Dina N.; Sholinsky, Phyliss; Sturcke, Anne

    2014-01-01

    Elevated low-density lipoprotein cholesterol (LDL-C) is a treatable, heritable risk factor for cardiovascular disease. Genome-wide association studies (GWASs) have identified 157 variants associated with lipid levels but are not well suited to assess the impact of rare and low-frequency variants. To determine whether rare or low-frequency coding variants are associated with LDL-C, we exome sequenced 2,005 individuals, including 554 individuals selected for extreme LDL-C (>98th or <2nd percentile). Follow-up analyses included sequencing of 1,302 additional individuals and genotype-based analysis of 52,221 individuals. We observed significant evidence of association between LDL-C and the burden of rare or low-frequency variants in PNPLA5, encoding a phospholipase-domain-containing protein, and both known and previously unidentified variants in PCSK9, LDLR and APOB, three known lipid-related genes. The effect sizes for the burden of rare variants for each associated gene were substantially higher than those observed for individual SNPs identified from GWASs. We replicated the PNPLA5 signal in an independent large-scale sequencing study of 2,084 individuals. In conclusion, this large whole-exome-sequencing study for LDL-C identified a gene not known to be implicated in LDL-C and provides unique insight into the design and analysis of similar experiments. PMID:24507775
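
    A gene-level burden analysis of the kind mentioned above can be illustrated with a toy collapsing score. This is a sketch under assumptions, not the method used in the study: the abstract does not specify the statistical test, the 5% frequency threshold is a placeholder, and burden_scores and mean_difference are hypothetical helper names.

```python
def burden_scores(genotypes, mafs, maf_threshold=0.05):
    """Collapse rare variants in one gene into a per-individual burden score.

    genotypes: one list per individual of alternate-allele counts (0, 1 or 2),
               in the same variant order as `mafs`.
    mafs:      minor allele frequency of each variant.
    The score is the number of rare alleles an individual carries.
    """
    rare = [i for i, maf in enumerate(mafs) if maf < maf_threshold]
    return [sum(person[i] for i in rare) for person in genotypes]

def mean_difference(scores, phenotype):
    """Mean phenotype of rare-allele carriers minus that of non-carriers."""
    carriers = [p for s, p in zip(scores, phenotype) if s > 0]
    non_carriers = [p for s, p in zip(scores, phenotype) if s == 0]
    if not carriers or not non_carriers:
        raise ValueError("need at least one carrier and one non-carrier")
    return sum(carriers) / len(carriers) - sum(non_carriers) / len(non_carriers)
```

    A real analysis would replace mean_difference with a regression or a dedicated rare-variant test and adjust for covariates; the point here is only that many individually rare variants are pooled into one gene-level predictor.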

  8. Whole exome sequencing identifies genetic variants in inherited thrombocytopenia with secondary qualitative function defects

    PubMed Central

    Johnson, Ben; Lowe, Gillian C.; Futterer, Jane; Lordkipanidzé, Marie; MacDonald, David; Simpson, Michael A.; Sanchez-Guiú, Isabel; Drake, Sian; Bem, Danai; Leo, Vincenzo; Fletcher, Sarah J.; Dawood, Ban; Rivera, José; Allsup, David; Biss, Tina; Bolton-Maggs, Paula HB; Collins, Peter; Curry, Nicola; Grimley, Charlotte; James, Beki; Makris, Mike; Motwani, Jayashree; Pavord, Sue; Talks, Katherine; Thachil, Jecko; Wilde, Jonathan; Williams, Mike; Harrison, Paul; Gissen, Paul; Mundell, Stuart; Mumford, Andrew; Daly, Martina E.; Watson, Steve P.; Morgan, Neil V.

    2016-01-01

    Inherited thrombocytopenias are a heterogeneous group of disorders characterized by abnormally low platelet counts which can be associated with abnormal bleeding. Next-generation sequencing has previously been employed in these disorders for the confirmation of suspected genetic abnormalities, and more recently in the discovery of novel disease-causing genes. However, its full potential has not yet been exploited. Over the past 6 years we have sequenced the exomes from 55 patients, including 37 index cases and 18 additional family members, all of whom were recruited to the UK Genotyping and Phenotyping of Platelets study. All patients had inherited or sustained thrombocytopenia of unknown etiology with platelet counts varying from 11×10⁹/L to 186×10⁹/L. Of the 51 patients phenotypically tested, 37 (73%) had an additional secondary qualitative platelet defect. Using whole exome sequencing analysis we have identified “pathogenic” or “likely pathogenic” variants in 46% (17/37) of our index patients with thrombocytopenia. In addition, we report variants of uncertain significance in 12 index cases, including novel candidate genetic variants in previously unreported genes in four index cases. These results demonstrate that whole exome sequencing is an efficient method for elucidating potential pathogenic genetic variants in inherited thrombocytopenia. Whole exome sequencing also has the added benefit of discovering potentially pathogenic genetic variants for further study in novel genes not previously implicated in inherited thrombocytopenia. PMID:27479822

  9. Principles and Recommendations for Standardizing the Use of the Next-Generation Sequencing Variant File in Clinical Settings.

    PubMed

    Lubin, Ira M; Aziz, Nazneen; Babb, Lawrence J; Ballinger, Dennis; Bisht, Himani; Church, Deanna M; Cordes, Shaun; Eilbeck, Karen; Hyland, Fiona; Kalman, Lisa; Landrum, Melissa; Lockhart, Edward R; Maglott, Donna; Marth, Gabor; Pfeifer, John D; Rehm, Heidi L; Roy, Somak; Tezak, Zivana; Truty, Rebecca; Ullman-Cullere, Mollie; Voelkerding, Karl V; Worthey, Elizabeth A; Zaranek, Alexander W; Zook, Justin M

    2017-05-01

    A national workgroup convened by the Centers for Disease Control and Prevention identified principles and made recommendations for standardizing the description of sequence data contained within the variant file generated during the course of clinical next-generation sequence analysis for diagnosing human heritable conditions. The specifications for variant files were initially developed to be flexible with regard to content representation to support a variety of research applications. This flexibility permits variation with regard to how sequence findings are described and this depends, in part, on the conventions used. For clinical laboratory testing, this poses a problem because these differences can compromise the capability to compare sequence findings among laboratories to confirm results and to query databases to identify clinically relevant variants. To provide for a more consistent representation of sequence findings described within variant files, the workgroup made several recommendations that considered alignment to a common reference sequence, variant caller settings, use of genomic coordinates, and gene and variant naming conventions. These recommendations were considered with regard to the existing variant file specifications presently used in the clinical setting. Adoption of these recommendations is anticipated to reduce the potential for ambiguity in describing sequence findings and facilitate the sharing of genomic data among clinical laboratories and other entities. Copyright © 2017 American Society for Investigative Pathology and the Association for Molecular Pathology. Published by Elsevier Inc. All rights reserved.

  10. An efficient and scalable analysis framework for variant extraction and refinement from population-scale DNA sequence data.

    PubMed

    Jun, Goo; Wing, Mary Kate; Abecasis, Gonçalo R; Kang, Hyun Min

    2015-06-01

    The analysis of next-generation sequencing data is computationally and statistically challenging because of the massive volume of data and imperfect data quality. We present GotCloud, a pipeline for efficiently detecting and genotyping high-quality variants from large-scale sequencing data. GotCloud automates sequence alignment, sample-level quality control, variant calling, filtering of likely artifacts using machine-learning techniques, and genotype refinement using haplotype information. The pipeline can process thousands of samples in parallel and requires less computational resources than current alternatives. Experiments with whole-genome and exome-targeted sequence data generated by the 1000 Genomes Project show that the pipeline provides effective filtering against false positive variants and high power to detect true variants. Our pipeline has already contributed to variant detection and genotyping in several large-scale sequencing projects, including the 1000 Genomes Project and the NHLBI Exome Sequencing Project. We hope it will now prove useful to many medical sequencing studies. © 2015 Jun et al.; Published by Cold Spring Harbor Laboratory Press.

  11. Mitochondrial targeting sequence variants of the CHCHD2 gene are a risk for Lewy body disorders

    PubMed Central

    Ogaki, Kotaro; Koga, Shunsuke; Heckman, Michael G.; Fiesel, Fabienne C.; Ando, Maya; Labbé, Catherine; Lorenzo-Betancor, Oswaldo; Moussaud-Lamodière, Elisabeth L.; Soto-Ortolaza, Alexandra I.; Walton, Ronald L.; Strongosky, Audrey J.; Uitti, Ryan J.; McCarthy, Allan; Lynch, Timothy; Siuda, Joanna; Opala, Grzegorz; Rudzinska, Monika; Krygowska-Wajs, Anna; Barcikowska, Maria; Czyzewski, Krzysztof; Puschmann, Andreas; Nishioka, Kenya; Funayama, Manabu; Hattori, Nobutaka; Parisi, Joseph E.; Petersen, Ronald C.; Graff-Radford, Neill R.; Boeve, Bradley F.; Springer, Wolfdieter; Wszolek, Zbigniew K.; Dickson, Dennis W.

    2015-01-01

    Objective: To assess the role of CHCHD2 variants in patients with Parkinson disease (PD) and Lewy body disease (LBD) in Caucasian populations. Methods: All exons of the CHCHD2 gene were sequenced in a US Caucasian patient-control series (878 PD, 610 LBD, and 717 controls). Subsequently, exons 1 and 2 were sequenced in an Irish series (355 PD and 365 controls) and a Polish series (394 PD and 350 controls). Immunohistochemistry and immunofluorescence studies were performed on pathologic LBD cases with rare CHCHD2 variants. Results: We identified 9 rare exonic variants of unknown significance. These variants were more frequent in the combined group of PD and LBD patients compared to controls (0.6% vs 0.1%, p = 0.013). In addition, the presence of any rare variant was more common in patients with LBD (2.5% vs 1.0%, p = 0.050) compared to controls. Eight of these 9 variants were located within the gene's mitochondrial targeting sequence. Conclusions: Although the role of variants of the CHCHD2 gene in PD and LBD remains to be further elucidated, the rare variants in the mitochondrial targeting sequence may be a risk factor for Lewy body disorders, which may link CHCHD2 to other genetic forms of parkinsonism with mitochondrial dysfunction. PMID:26561290

  12. Polymorphisms and variants in the prion protein sequence of European moose (Alces alces), reindeer (Rangifer tarandus), roe deer (Capreolus capreolus) and fallow deer (Dama dama) in Scandinavia

    PubMed Central

    Wik, Lotta; Mikko, Sofia; Klingeborn, Mikael; Stéen, Margareta; Simonsson, Magnus; Linné, Tommy

    2012-01-01

    The prion protein (PrP) sequence of European moose, reindeer, roe deer and fallow deer in Scandinavia has high homology to the PrP sequence of North American cervids. Variants in the European moose PrP sequence were found at amino acid position 109 as K or Q. The 109Q variant is unique in the PrP sequence of vertebrates. During the 1980s a wasting syndrome in Swedish moose, Moose Wasting Syndrome (MWS), was described. SNP analysis demonstrated a difference in the observed genotype proportions of the heterozygous Q/K and homozygous Q/Q variants in the MWS animals compared with the healthy animals. In MWS moose the allele frequencies for 109K and 109Q were 0.73 and 0.27, respectively, and for healthy animals 0.69 and 0.31. Both alleles were seen as heterozygotes and homozygotes. In reindeer, PrP sequence variation was demonstrated at codon 176 as D or N and codon 225 as S or Y. The PrP sequences in roe deer and fallow deer were identical with published GenBank sequences. PMID:22441661
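
    Allele frequencies such as those quoted for 109K and 109Q are obtained from genotype counts by allele counting. The sketch below shows the calculation; the function name and the example counts in the usage note are hypothetical and are not data from the study.

```python
def allele_frequencies(n_kk, n_kq, n_qq):
    """Allele frequencies of K and Q from diploid genotype counts.

    Each homozygote contributes two copies of its allele and each
    heterozygote contributes one copy of each allele.
    """
    n_alleles = 2 * (n_kk + n_kq + n_qq)
    if n_alleles == 0:
        raise ValueError("no genotyped animals")
    freq_k = (2 * n_kk + n_kq) / n_alleles
    freq_q = (2 * n_qq + n_kq) / n_alleles
    return freq_k, freq_q

def expected_heterozygosity(freq_k, freq_q):
    """Heterozygote proportion expected under Hardy-Weinberg equilibrium (2pq)."""
    return 2 * freq_k * freq_q
```

    For example, hypothetical counts of 50 K/K, 40 K/Q and 10 Q/Q animals give allele_frequencies(50, 40, 10) close to (0.7, 0.3). Comparing the observed heterozygote proportion with expected_heterozygosity is one simple way to look at genotype proportions of the kind the abstract discusses.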

  13. Novel pathogenic variant (c.3178G>A) in the SMC1A gene in a family with Cornelia de Lange syndrome identified by exome sequencing.

    PubMed

    Jang, Mi Ae; Lee, Chang Woo; Kim, Jin Kyung; Ki, Chang Seok

    2015-11-01

    Cornelia de Lange syndrome (CdLS) is a clinically and genetically heterogeneous congenital anomaly. Mutations in the NIPBL gene account for half of the affected individuals. We describe a family with CdLS carrying a novel pathogenic variant of the SMC1A gene identified by exome sequencing. The proband was a 3-yr-old boy presenting with a developmental delay. He had distinctive facial features without major structural anomalies and tested negative for the NIPBL gene. His younger sister, mother, and maternal grandmother presented with mild mental retardation. By exome sequencing of the proband, a novel SMC1A variant, c.3178G>A, was identified, which was expected to cause an amino acid substitution (p.Glu1060Lys) in the highly conserved coiled-coil domain of the SMC1A protein. Sanger sequencing confirmed that the three female relatives with mental retardation also carry this variant. Our results reveal that SMC1A gene defects are associated with milder phenotypes of CdLS. Furthermore, we showed that exome sequencing could be a useful tool to identify pathogenic variants in patients with CdLS.

  14. Whole-exome sequencing identifies rare and low-frequency coding variants associated with LDL cholesterol.

    PubMed

    Lange, Leslie A; Hu, Youna; Zhang, He; Xue, Chenyi; Schmidt, Ellen M; Tang, Zheng-Zheng; Bizon, Chris; Lange, Ethan M; Smith, Joshua D; Turner, Emily H; Jun, Goo; Kang, Hyun Min; Peloso, Gina; Auer, Paul; Li, Kuo-Ping; Flannick, Jason; Zhang, Ji; Fuchsberger, Christian; Gaulton, Kyle; Lindgren, Cecilia; Locke, Adam; Manning, Alisa; Sim, Xueling; Rivas, Manuel A; Holmen, Oddgeir L; Gottesman, Omri; Lu, Yingchang; Ruderfer, Douglas; Stahl, Eli A; Duan, Qing; Li, Yun; Durda, Peter; Jiao, Shuo; Isaacs, Aaron; Hofman, Albert; Bis, Joshua C; Correa, Adolfo; Griswold, Michael E; Jakobsdottir, Johanna; Smith, Albert V; Schreiner, Pamela J; Feitosa, Mary F; Zhang, Qunyuan; Huffman, Jennifer E; Crosby, Jacy; Wassel, Christina L; Do, Ron; Franceschini, Nora; Martin, Lisa W; Robinson, Jennifer G; Assimes, Themistocles L; Crosslin, David R; Rosenthal, Elisabeth A; Tsai, Michael; Rieder, Mark J; Farlow, Deborah N; Folsom, Aaron R; Lumley, Thomas; Fox, Ervin R; Carlson, Christopher S; Peters, Ulrike; Jackson, Rebecca D; van Duijn, Cornelia M; Uitterlinden, André G; Levy, Daniel; Rotter, Jerome I; Taylor, Herman A; Gudnason, Vilmundur; Siscovick, David S; Fornage, Myriam; Borecki, Ingrid B; Hayward, Caroline; Rudan, Igor; Chen, Y Eugene; Bottinger, Erwin P; Loos, Ruth J F; Sætrom, Pål; Hveem, Kristian; Boehnke, Michael; Groop, Leif; McCarthy, Mark; Meitinger, Thomas; Ballantyne, Christie M; Gabriel, Stacey B; O'Donnell, Christopher J; Post, Wendy S; North, Kari E; Reiner, Alexander P; Boerwinkle, Eric; Psaty, Bruce M; Altshuler, David; Kathiresan, Sekar; Lin, Dan-Yu; Jarvik, Gail P; Cupples, L Adrienne; Kooperberg, Charles; Wilson, James G; Nickerson, Deborah A; Abecasis, Goncalo R; Rich, Stephen S; Tracy, Russell P; Willer, Cristen J

    2014-02-06

    Elevated low-density lipoprotein cholesterol (LDL-C) is a treatable, heritable risk factor for cardiovascular disease. Genome-wide association studies (GWASs) have identified 157 variants associated with lipid levels but are not well suited to assess the impact of rare and low-frequency variants. To determine whether rare or low-frequency coding variants are associated with LDL-C, we exome sequenced 2,005 individuals, including 554 individuals selected for extreme LDL-C (>98th or <2nd percentile). Follow-up analyses included sequencing of 1,302 additional individuals and genotype-based analysis of 52,221 individuals. We observed significant evidence of association between LDL-C and the burden of rare or low-frequency variants in PNPLA5, encoding a phospholipase-domain-containing protein, and both known and previously unidentified variants in PCSK9, LDLR and APOB, three known lipid-related genes. The effect sizes for the burden of rare variants for each associated gene were substantially higher than those observed for individual SNPs identified from GWASs. We replicated the PNPLA5 signal in an independent large-scale sequencing study of 2,084 individuals. In conclusion, this large whole-exome-sequencing study for LDL-C identified a gene not known to be implicated in LDL-C and provides unique insight into the design and analysis of similar experiments. Copyright © 2014 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.

  15. Whole exome sequencing identifies genetic variants in inherited thrombocytopenia with secondary qualitative function defects.

    PubMed

    Johnson, Ben; Lowe, Gillian C; Futterer, Jane; Lordkipanidzé, Marie; MacDonald, David; Simpson, Michael A; Sanchez-Guiú, Isabel; Drake, Sian; Bem, Danai; Leo, Vincenzo; Fletcher, Sarah J; Dawood, Ban; Rivera, José; Allsup, David; Biss, Tina; Bolton-Maggs, Paula Hb; Collins, Peter; Curry, Nicola; Grimley, Charlotte; James, Beki; Makris, Mike; Motwani, Jayashree; Pavord, Sue; Talks, Katherine; Thachil, Jecko; Wilde, Jonathan; Williams, Mike; Harrison, Paul; Gissen, Paul; Mundell, Stuart; Mumford, Andrew; Daly, Martina E; Watson, Steve P; Morgan, Neil V

    2016-10-01

    Inherited thrombocytopenias are a heterogeneous group of disorders characterized by abnormally low platelet counts which can be associated with abnormal bleeding. Next-generation sequencing has previously been employed in these disorders for the confirmation of suspected genetic abnormalities, and more recently in the discovery of novel disease-causing genes. However, its full potential has not yet been exploited. Over the past 6 years we have sequenced the exomes from 55 patients, including 37 index cases and 18 additional family members, all of whom were recruited to the UK Genotyping and Phenotyping of Platelets study. All patients had inherited or sustained thrombocytopenia of unknown etiology with platelet counts varying from 11×10⁹/L to 186×10⁹/L. Of the 51 patients phenotypically tested, 37 (73%) had an additional secondary qualitative platelet defect. Using whole exome sequencing analysis we have identified "pathogenic" or "likely pathogenic" variants in 46% (17/37) of our index patients with thrombocytopenia. In addition, we report variants of uncertain significance in 12 index cases, including novel candidate genetic variants in previously unreported genes in four index cases. These results demonstrate that whole exome sequencing is an efficient method for elucidating potential pathogenic genetic variants in inherited thrombocytopenia. Whole exome sequencing also has the added benefit of discovering potentially pathogenic genetic variants for further study in novel genes not previously implicated in inherited thrombocytopenia. Copyright© Ferrata Storti Foundation.

  16. Whole-Exome Sequencing Identifies Novel Variants for Tooth Agenesis.

    PubMed

    Dinckan, N; Du, R; Petty, L E; Coban-Akdemir, Z; Jhangiani, S N; Paine, I; Baugh, E H; Erdem, A P; Kayserili, H; Doddapaneni, H; Hu, J; Muzny, D M; Boerwinkle, E; Gibbs, R A; Lupski, J R; Uyguner, Z O; Below, J E; Letra, A

    2018-01-01

    Tooth agenesis is a common craniofacial abnormality in humans and represents failure to develop 1 or more permanent teeth. Tooth agenesis is complex, and variations in about a dozen genes have been reported as contributing to the etiology. Here, we combined whole-exome sequencing, array-based genotyping, and linkage analysis to identify putative pathogenic variants in candidate disease genes for tooth agenesis in 10 multiplex Turkish families. Novel homozygous and heterozygous variants in LRP6, DKK1, LAMA3, and COL17A1 genes, as well as known variants in WNT10A, were identified as likely pathogenic in isolated tooth agenesis. Novel variants in KREMEN1 were identified as likely pathogenic in 2 families with suspected syndromic tooth agenesis. Variants in more than 1 gene were identified segregating with tooth agenesis in 2 families, suggesting oligogenic inheritance. Structural modeling of missense variants suggests deleterious effects to the encoded proteins. Functional analysis of an indel variant (c.3607+3_6del) in LRP6 suggested that the predicted resulting mRNA is subject to nonsense-mediated decay. Our results support a major role for WNT pathways genes in the etiology of tooth agenesis while revealing new candidate genes. Moreover, oligogenic cosegregation was suggestive for complex inheritance and potentially complex gene product interactions during development, contributing to improved understanding of the genetic etiology of familial tooth agenesis.

  17. Re-Ranking Sequencing Variants in the Post-GWAS Era for Accurate Causal Variant Identification

    PubMed Central

    Faye, Laura L.; Machiela, Mitchell J.; Kraft, Peter; Bull, Shelley B.; Sun, Lei

    2013-01-01

    Next generation sequencing has dramatically increased our ability to localize disease-causing variants by providing base-pair level information at costs increasingly feasible for the large sample sizes required to detect complex-trait associations. Yet, identification of causal variants within an established region of association remains a challenge. Counter-intuitively, certain factors that increase power to detect an associated region can decrease power to localize the causal variant. First, combining GWAS with imputation or low coverage sequencing to achieve the large sample sizes required for high power can have the unintended effect of producing differential genotyping error among SNPs. This tends to bias the relative evidence for association toward better genotyped SNPs. Second, re-use of GWAS data for fine-mapping exploits previous findings to ensure genome-wide significance in GWAS-associated regions. However, using GWAS findings to inform fine-mapping analysis can bias evidence away from the causal SNP toward the tag SNP and SNPs in high LD with the tag. Together these factors can reduce power to localize the causal SNP by more than half. Other strategies commonly employed to increase power to detect association, namely increasing sample size and using higher density genotyping arrays, can, in certain common scenarios, actually exacerbate these effects and further decrease power to localize causal variants. We develop a re-ranking procedure that accounts for these adverse effects and substantially improves the accuracy of causal SNP identification, often doubling the probability that the causal SNP is top-ranked. Application to the NCI BPC3 aggressive prostate cancer GWAS with imputation meta-analysis identified a new top SNP at 2 of 3 associated loci and several additional possible causal SNPs at these loci that may have otherwise been overlooked. This method is simple to implement using R scripts provided on the author's website. PMID:23950724

  18. Exome Sequencing Identifies Potential Risk Variants for Mendelian Disorders at High Prevalence in Qatar

    PubMed Central

    Rodriguez-Flores, Juan L.; Fakhro, Khalid; Hackett, Neil R.; Salit, Jacqueline; Fuller, Jennifer; Agosto-Perez, Francisco; Gharbiah, Maey; Malek, Joel A.; Zirie, Mahmoud; Jayyousi, Amin; Badii, Ramin; Al-Marri, Ajayeb Al-Nabet; Chouchane, Lotfi; Stadler, Dora J.; Hunter-Zinck, Haley; Mezey, Jason G.; Crystal, Ronald G.

    2013-01-01

    Exome sequencing of families of related individuals has been highly successful in identifying genetic polymorphisms responsible for Mendelian disorders. Here, we demonstrate the value of the reverse approach, where we use exome sequencing of a sample of unrelated individuals to analyze allele frequencies of known causal mutations for Mendelian diseases. We sequenced the exomes of 100 individuals representing the three major genetic subgroups of the Qatari population (Q1 Bedouin, Q2 Persian-South Asian, Q3 African) and identified 37 variants in 33 genes with effects on 36 clinically significant Mendelian diseases. These include variants not present in 1000 Genomes and variants at high frequency when compared to 1000 Genomes populations. Several of these Mendelian variants were only segregating in one Qatari subpopulation, where the observed subpopulation specificity trends were confirmed in an independent population of 386 Qataris. Pre-marital genetic screening in Qatar tests for only 4 out of the 37, such that this study provides a set of Mendelian disease variants with potential impact on the epidemiological profile of the population that could be incorporated into the testing program if further experimental and clinical characterization confirms high penetrance. PMID:24123366

  19. General Framework for Meta-analysis of Rare Variants in Sequencing Association Studies

    PubMed Central

    Lee, Seunggeun; Teslovich, Tanya M.; Boehnke, Michael; Lin, Xihong

    2013-01-01

    We propose a general statistical framework for meta-analysis of gene- or region-based multimarker rare variant association tests in sequencing association studies. In genome-wide association studies, single-marker meta-analysis has been widely used to increase statistical power by combining results via regression coefficients and standard errors from different studies. In analysis of rare variants in sequencing studies, region-based multimarker tests are often used to increase power. We propose meta-analysis methods for commonly used gene- or region-based rare variants tests, such as burden tests and variance component tests. Because estimation of regression coefficients of individual rare variants is often unstable or not feasible, the proposed method avoids this difficulty by calculating score statistics instead that only require fitting the null model for each study and then aggregating these score statistics across studies. Our proposed meta-analysis rare variant association tests are conducted based on study-specific summary statistics, specifically score statistics for each variant and between-variant covariance-type (linkage disequilibrium) relationship statistics for each gene or region. The proposed methods are able to incorporate different levels of heterogeneity of genetic effects across studies and are applicable to meta-analysis of multiple ancestry groups. We show that the proposed methods are essentially as powerful as joint analysis by directly pooling individual level genotype data. We conduct extensive simulations to evaluate the performance of our methods by varying levels of heterogeneity across studies, and we apply the proposed methods to meta-analysis of rare variant effects in a multicohort study of the genetics of blood lipid levels. PMID:23768515
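
    The aggregation of study-specific summary statistics described above can be illustrated with a minimal sketch. It assumes homogeneous genetic effects across studies and a simple burden test with equal variant weights; the function name, the default weights, and the toy numbers are hypothetical and are not taken from the paper or its software.

```python
import math


def meta_burden_test(study_scores, study_covs, weights=None):
    """Combine per-study score statistics into one burden test for a gene.

    study_scores: one list of per-variant score statistics per study.
    study_covs:   one m x m between-variant covariance matrix per study.
    weights:      per-variant weights (assumption: all 1.0 if omitted).
    Returns (chi-square statistic with 1 df, p-value).
    """
    m = len(study_scores[0])
    w = weights if weights is not None else [1.0] * m

    # Sum the score vectors and covariance matrices across studies
    # (this step assumes the same effect in every study).
    score = [sum(s[j] for s in study_scores) for j in range(m)]
    cov = [[sum(c[i][j] for c in study_covs) for j in range(m)]
           for i in range(m)]

    # Burden statistic: (w' S)^2 / (w' Phi w).
    numerator = sum(w[j] * score[j] for j in range(m)) ** 2
    denominator = sum(w[i] * cov[i][j] * w[j]
                      for i in range(m) for j in range(m))
    q = numerator / denominator

    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(q / 2.0))
    return q, p_value


# Toy example with two studies and two variants (made-up numbers).
scores = [[1.0, 2.0], [0.5, 0.5]]
covs = [[[2.0, 0.5], [0.5, 1.0]], [[1.0, 0.0], [0.0, 1.0]]]
q_stat, p_val = meta_burden_test(scores, covs)
```

    Only the null model has to be fitted in each study to obtain these inputs, which is why no individual-level genotype data need to be pooled. Variance component tests and heterogeneous effects would need a different combination rule than the one sketched here.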

  20. Whole exome sequencing of rare variants in EIF4G1 and VPS35 in Parkinson disease

    PubMed Central

    Nuytemans, Karen; Bademci, Guney; Inchausti, Vanessa; Dressen, Amy; Kinnamon, Daniel D.; Mehta, Arpit; Wang, Liyong; Züchner, Stephan; Beecham, Gary W.; Martin, Eden R.; Scott, William K.

    2013-01-01

    Objective: Recently, vacuolar protein sorting 35 (VPS35) and eukaryotic translation initiation factor 4 gamma 1 (EIF4G1) have been identified as 2 causal Parkinson disease (PD) genes. We used whole exome sequencing for rapid, parallel analysis of variations in these 2 genes. Methods: We performed whole exome sequencing in 213 patients with PD and 272 control individuals. Those rare variants (RVs) with <5% frequency in the exome variant server database and our own control data were considered for analysis. We performed joint gene-based tests for association using RVASSOC and SKAT (Sequence Kernel Association Test) as well as single-variant test statistics. Results: We identified 3 novel VPS35 variations that changed the coded amino acid (nonsynonymous) in 3 cases. Two variations were in multiplex families and neither segregated with PD. In EIF4G1, we identified 11 (9 nonsynonymous and 2 small indels) RVs including the reported pathogenic mutation p.R1205H, which segregated in all affected members of a large family, but also in 1 unaffected 86-year-old family member. Two additional RVs were found in isolated patients only. Whereas initial association studies suggested an association (p = 0.04) with all RVs in EIF4G1, subsequent testing in a second dataset for the driving variant (p.F1461) suggested no association between RVs in the gene and PD. Conclusions: We confirm that the specific EIF4G1 variation p.R1205H seems to be a strong PD risk factor, but is nonpenetrant in at least one 86-year-old. A few other select RVs in both genes could not be ruled out as causal. However, there was no evidence for an overall contribution of genetic variability in VPS35 or EIF4G1 to PD development in our dataset. PMID:23408866

  1. Sequence analyses of fimbriae subunit FimA proteins on Actinomyces naeslundii genospecies 1 and 2 and Actinomyces odontolyticus with variant carbohydrate binding specificities

    PubMed Central

    Drobni, Mirva; Hallberg, Kristina; Öhman, Ulla; Birve, Anna; Persson, Karina; Johansson, Ingegerd; Strömberg, Nicklas

    2006-01-01

    Background: Actinomyces naeslundii genospecies 1 and 2 express type-2 fimbriae (FimA subunit polymers) with variant Galβ binding specificities and Actinomyces odontolyticus a sialic acid specificity to colonize different oral surfaces. However, the fimbrial nature of the sialic acid binding property and sequence information about FimA proteins from multiple strains are lacking. Results: Here we have sequenced fimA genes from strains of A. naeslundii genospecies 1 (n = 4) and genospecies 2 (n = 4), both of which harboured variant Galβ-dependent hemagglutination (HA) types, and from A. odontolyticus PK984 with a sialic acid-dependent HA pattern. Three unique subtypes of FimA proteins with 63.8–66.4% sequence identity were present in strains of A. naeslundii genospecies 1 and 2 and A. odontolyticus. The generally high FimA sequence identity (>97.2%) within a genospecies revealed species-specific sequences or segments that coincided with binding specificity. All three FimA protein variants contained a signal peptide, pilin motif, E box, proline-rich segment and an LPXTG sorting motif among other conserved segments for secretion, assembly and sorting of fimbrial proteins. The highly conserved pilin, E box and LPXTG motifs are present in fimbriae proteins from other Gram-positive bacteria. Moreover, only strains of genospecies 1 were agglutinated with type-2 fimbriae antisera derived from A. naeslundii genospecies 1 strain 12104, emphasizing that the overall folding of FimA may generate different functionalities. Western blot analyses with FimA antisera revealed monomers and oligomers of FimA in whole cell protein extracts and a purified recombinant FimA preparation, indicating a sortase-independent oligomerization of FimA. Conclusion: The genus Actinomyces involves a diversity of unique FimA proteins with conserved pilin, E box and LPXTG motifs, depending on subspecies and associated binding specificity. In addition, a sortase independent oligomerization of FimA subunit

  2. Whole-exome sequencing identified a variant in EFTUD2 gene in establishing a genetic diagnosis.

    PubMed

    Rengasamy Venugopalan, S; Farrow, E G; Lypka, M

    2017-06-01

    Craniofacial anomalies are complex and have overlapping phenotypes. Mandibulofacial Dysostosis and Oculo-Auriculo-Vertebral Spectrum share a common craniofacial phenotype, which makes arriving at a diagnosis challenging. In this report, we present the case of a female proband who was given an uncertain differential diagnosis of Treacher Collins syndrome or Hemifacial Microsomia. Prior genetic testing for 22q deletion and FGFR screening was negative. The objective of this study was to demonstrate the critical role of whole-exome sequencing in establishing a genetic diagnosis for the proband. The participants, an affected 14½-year-old female proband and her parents, were enrolled in the study as a proband/parent trio. A surgical tissue sample from the proband and parental blood samples were collected and prepared for whole-exome sequencing. An Illumina HiSeq 2500 instrument was used for sequencing (125 nucleotide reads/84X coverage). Variants were analyzed using custom-developed software, RUNES and VIKING. Variant analysis following whole-exome sequencing identified a heterozygous de novo pathogenic variant, c.259C>T (p.Gln87*), in the EFTUD2 gene (NM_004247.3) in the proband. Previous studies have reported that variants in EFTUD2 are associated with Mandibulofacial Dysostosis with Microcephaly. Patients with facial asymmetry, micrognathia, choanal atresia and microcephaly should be analyzed for variants in EFTUD2. Next-generation sequencing techniques, such as whole-exome sequencing, offer great promise to improve the understanding of etiologies of sporadic genetic diseases. © 2017 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  3. Composition for nucleic acid sequencing

    DOEpatents

    Korlach, Jonas [Ithaca, NY]; Webb, Watt W [Ithaca, NY]; Levene, Michael [Ithaca, NY]; Turner, Stephen [Ithaca, NY]; Craighead, Harold G [Ithaca, NY]; Foquet, Mathieu [Ithaca, NY]

    2008-08-26

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid, i.e. the activity of a nucleic acid polymerizing enzyme on the template nucleic acid molecule to be sequenced is followed in real time. The sequence is deduced by identifying which base is being incorporated into the growing complementary strand of the target nucleic acid by the catalytic activity of the nucleic acid polymerizing enzyme at each step in the sequence of base additions. A polymerase on the target nucleic acid molecule complex is provided in a position suitable to move along the target nucleic acid molecule and extend the oligonucleotide primer at an active site. A plurality of labelled types of nucleotide analogs are provided proximate to the active site, with each distinguishable type of nucleotide analog being complementary to a different nucleotide in the target nucleic acid sequence. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand at the active site, where the nucleotide analog being added is complementary to the nucleotide of the target nucleic acid at the active site. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. The steps of providing labelled nucleotide analogs, polymerizing the growing nucleic acid strand, and identifying the added nucleotide analog are repeated so that the nucleic acid strand is further extended and the sequence of the target nucleic acid is determined.
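
    The principle above, deducing the sequence from the temporal order of incorporated nucleotide analogs, can be sketched as a toy readout. This is an illustration only: the label names, the event format (timestamp, label), and the function name are hypothetical and do not come from the patent.

```python
# Hypothetical mapping from a detected label to the nucleotide analog
# that carries it (one distinguishable label per base type).
LABEL_TO_BASE = {"label_A": "A", "label_C": "C",
                 "label_G": "G", "label_T": "T"}

# Each incorporated analog is complementary to the template base
# at the active site.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def deduce_template(events):
    """Reconstruct sequences from timestamped incorporation events.

    events: iterable of (timestamp, label) pairs, in any order.
    Returns (growing_strand, template), both written in order of
    incorporation, i.e. the order in which the polymerase moved.
    """
    # The temporal order of base additions defines the read.
    ordered = sorted(events, key=lambda event: event[0])
    growing_strand = "".join(LABEL_TO_BASE[label] for _, label in ordered)
    # The template is the base-by-base complement of the new strand.
    template = "".join(COMPLEMENT[base] for base in growing_strand)
    return growing_strand, template


# Three detections recorded out of order (made-up timestamps).
strand, template = deduce_template(
    [(0.30, "label_G"), (0.10, "label_T"), (0.20, "label_A")]
)
```

    The sketch ignores everything that makes the real measurement hard (missed or spurious detections, strand orientation conventions); it only shows that one identification per polymerization step, kept in time order, is enough to spell out the target.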

  4. Multiplexed resequencing analysis to identify rare variants in pooled DNA with barcode indexing using next-generation sequencer.

    PubMed

    Mitsui, Jun; Fukuda, Yoko; Azuma, Kyo; Tozaki, Hirokazu; Ishiura, Hiroyuki; Takahashi, Yuji; Goto, Jun; Tsuji, Shoji

    2010-07-01

    We have recently found that multiple rare variants of the glucocerebrosidase gene (GBA) confer a robust risk for Parkinson disease, supporting the 'common disease-multiple rare variants' hypothesis. To develop an efficient method of identifying rare variants in a large number of samples, we applied multiplexed resequencing using a next-generation sequencer to identification of rare variants of GBA. Sixteen sets of pooled DNAs from six pooled DNA samples were prepared. Each set of pooled DNAs was subjected to polymerase chain reaction to amplify the target gene (GBA) covering 6.5 kb, pooled into one tube with barcode indexing, and then subjected to extensive sequence analysis using the SOLiD System. Individual samples were also subjected to direct nucleotide sequence analysis. With the optimization of data processing, we were able to extract all the variants from 96 samples with acceptable rates of false-positive single-nucleotide variants.
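
    The pooling design has a direct consequence for how rare a signal the sequencer must detect, which a short calculation makes concrete. The sketch below reads the design as 16 pools of six individuals each (consistent with the 96 samples reported) and assumes diploid samples, equimolar pooling, and unbiased amplification; none of these assumptions is stated in the abstract.

```python
from fractions import Fraction


def heterozygote_allele_fraction(samples_per_pool, ploidy=2):
    """Expected read fraction of a variant carried by exactly one
    heterozygous individual in a pool (assumes equimolar pooling)."""
    return Fraction(1, samples_per_pool * ploidy)


pools = 16
samples_per_pool = 6

total_samples = pools * samples_per_pool
expected_fraction = heterozygote_allele_fraction(samples_per_pool)

print(total_samples)             # 96
print(expected_fraction)         # 1/12
print(float(expected_fraction))  # roughly 0.083
```

    Under these assumptions a single heterozygous carrier contributes about 8% of the reads at the variant site, so the sequencing error rate has to sit well below that level for the variant to be called reliably.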

  5. Group-based variant calling leveraging next-generation supercomputing for large-scale whole-genome sequencing studies.

    PubMed

    Standish, Kristopher A; Carland, Tristan M; Lockwood, Glenn K; Pfeiffer, Wayne; Tatineni, Mahidhar; Huang, C Chris; Lamberth, Sarah; Cherkas, Yauheniya; Brodmerkel, Carrie; Jaeger, Ed; Smith, Lance; Rajagopal, Gunaretnam; Curran, Mark E; Schork, Nicholas J

    2015-09-22

    Next-generation sequencing (NGS) technologies have become much more efficient, allowing whole human genomes to be sequenced faster and cheaper than ever before. However, processing the raw sequence reads associated with NGS technologies requires care and sophistication in order to draw compelling inferences about phenotypic consequences of variation in human genomes. It has been shown that different approaches to variant calling from NGS data can lead to different conclusions. Ensuring appropriate accuracy and quality in variant calling can come at a computational cost. We describe our experience implementing and evaluating a group-based approach to calling variants on large numbers of whole human genomes. We explore the influence of many factors that may impact the accuracy and efficiency of group-based variant calling, including group size, the biogeographical backgrounds of the individuals who have been sequenced, and the computing environment used. We make efficient use of the Gordon supercomputer cluster at the San Diego Supercomputer Center by incorporating job-packing and parallelization considerations into our workflow while calling variants on 437 whole human genomes generated as part of large association study. We ultimately find that our workflow resulted in high-quality variant calls in a computationally efficient manner. We argue that studies like ours should motivate further investigations combining hardware-oriented advances in computing systems with algorithmic developments to tackle emerging 'big data' problems in biomedical research brought on by the expansion of NGS technologies.

  6. Evaluation of Nine Somatic Variant Callers for Detection of Somatic Mutations in Exome and Targeted Deep Sequencing Data.

    PubMed

    Krøigård, Anne Bruun; Thomassen, Mads; Lænkholm, Anne-Vibeke; Kruse, Torben A; Larsen, Martin Jakob

    2016-01-01

    Next generation sequencing is extensively applied to catalogue somatic mutations in cancer, in research settings and increasingly in clinical settings for molecular diagnostics, guiding therapy decisions. Somatic variant callers perform paired comparisons of sequencing data from cancer tissue and matched normal tissue in order to detect somatic mutations. The advent of many new somatic variant callers creates a need for comparison and validation of the tools, as no de facto standard for detection of somatic mutations exists and only limited comparisons have been reported. We have performed a comprehensive evaluation using exome sequencing and targeted deep sequencing data of paired tumor-normal samples from five breast cancer patients to evaluate the performance of nine publicly available somatic variant callers: EBCall, Mutect, Seurat, Shimmer, Indelocator, Somatic Sniper, Strelka, VarScan 2 and Virmid for the detection of single nucleotide mutations and small deletions and insertions. We report a large variation in the number of calls from the nine somatic variant callers on the same sequencing data and highly variable agreement. Sequencing depth had markedly diverse impact on individual callers, as for some callers, increased sequencing depth highly improved sensitivity. For SNV calling, we report EBCall, Mutect, Virmid and Strelka to be the most reliable somatic variant callers for both exome sequencing and targeted deep sequencing. For indel calling, EBCall is superior due to high sensitivity and robustness to changes in sequencing depths.
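
    Agreement between callers on the same data can be quantified in several ways; one common choice is the Jaccard index of the call sets. The sketch below is an assumption about how such a comparison could be set up, not the authors' method, and the caller names and variants are invented placeholders.

```python
from itertools import combinations


def jaccard(calls_a, calls_b):
    """Shared calls divided by all calls made by either caller."""
    union = calls_a | calls_b
    return len(calls_a & calls_b) / len(union) if union else 1.0


def pairwise_agreement(calls_by_caller):
    """Jaccard index for every pair of callers.

    calls_by_caller: dict mapping a caller name to a set of variants,
    each variant written as (chromosome, position, ref, alt).
    """
    return {
        (first, second): jaccard(calls_by_caller[first],
                                 calls_by_caller[second])
        for first, second in combinations(sorted(calls_by_caller), 2)
    }


# Made-up call sets for two hypothetical callers.
example_calls = {
    "caller_A": {("chr1", 100, "A", "T"), ("chr1", 200, "G", "C"),
                 ("chr2", 300, "C", "T")},
    "caller_B": {("chr1", 200, "G", "C"), ("chr2", 300, "C", "T"),
                 ("chr3", 400, "T", "G")},
}
agreement = pairwise_agreement(example_calls)
```

    Here the two callers share two of four distinct calls, giving an index of 0.5. Variants must be normalised to the same representation before comparison, which matters especially for indels.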

  7. A sequence variant associating with educational attainment also affects childhood cognition.

    PubMed

    Gunnarsson, Bjarni; Jónsdóttir, Guðrún A; Björnsdóttir, Gyða; Konte, Bettina; Sulem, Patrick; Kristmundsdóttir, Snædís; Kehr, Birte; Gústafsson, Ómar; Helgason, Hannes; Iordache, Paul D; Ólafsson, Sigurgeir; Frigge, Michael L; Þorleifsson, Guðmar; Arnarsdóttir, Sunna; Stefánsdóttir, Berglind; Giegling, Ina; Djurovic, Srdjan; Sundet, Kjetil S; Espeseth, Thomas; Melle, Ingrid; Hartmann, Annette M; Thorsteinsdottir, Unnur; Kong, Augustine; Guðbjartsson, Daníel F; Ettinger, Ulrich; Andreassen, Ole A; Rujescu, Dan; Halldórsson, Jónas G; Stefánsson, Hreinn; Halldórsson, Bjarni V; Stefánsson, Kári

    2016-11-04

    Only a few common variants in the sequence of the genome have been shown to impact cognitive traits. Here we demonstrate that polygenic scores of educational attainment predict specific aspects of childhood cognition, as measured with IQ. Recently, three sequence variants were shown to associate with educational attainment, a confluence phenotype of genetic and environmental factors contributing to academic success. We show that one of these variants associating with educational attainment, rs4851266-T, also associates with Verbal IQ in dyslexic children (P = 4.3 × 10⁻⁴, β = 0.16 s.d.). The effect of 0.16 s.d. corresponds to 1.4 IQ points for heterozygotes and 2.8 IQ points for homozygotes. We verified this association in independent samples consisting of adults (P = 8.3 × 10⁻⁵, β = 0.12 s.d., combined P = 2.2 × 10⁻⁷, β = 0.14 s.d.). Childhood cognition is unlikely to be affected by education attained later in life, and the variant explains a greater fraction of the variance in verbal IQ than in educational attainment (0.7% vs 0.12%, P = 1.0 × 10⁻⁵).

  8. Quick, sensitive and specific detection and evaluation of quantification of minor variants by high-throughput sequencing.

    PubMed

    Leung, Ross Ka-Kit; Dong, Zhi Qiang; Sa, Fei; Chong, Cheong Meng; Lei, Si Wan; Tsui, Stephen Kwok-Wing; Lee, Simon Ming-Yuen

    2014-02-01

    Minor variants have significant implications in quasispecies evolution, early cancer detection and non-invasive fetal genotyping but their accurate detection by next-generation sequencing (NGS) is hampered by sequencing errors. We generated sequencing data from mixtures at predetermined ratios in order to provide insight into sequencing errors and variations that can arise for which simulation cannot be performed. The information also enables better parameterization in depth of coverage, read quality and heterogeneity, library preparation techniques, technical repeatability for mathematical modeling, theory development and simulation experimental design. We devised minor variant authentication rules that achieved 100% accuracy in both testing and validation experiments. The rules are free from tedious inspection of alignment accuracy, sequencing read quality or errors introduced by homopolymers. The authentication processes only require minor variants to: (1) have minimum depth of coverage larger than 30; (2) be reported by (a) four or more variant callers, or (b) DiBayes or LoFreq, plus SNVer (or BWA when no results are returned by SNVer), and with the interassay coefficient of variation (CV) no larger than 0.1. Quantification accuracy undermined by sequencing errors could neither be overcome by ultra-deep sequencing, nor recruiting more variant callers to reach a consensus, such that consistent underestimation and overestimation (i.e. low CV) were observed. To accommodate stochastic error and adjust the observed ratio within a specified accuracy, we presented a proof of concept for the use of a double calibration curve for quantification, which provides an important reference towards potential industrial-scale fabrication of calibrants for NGS.
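
    The authentication rules listed above translate naturally into a small predicate. The sketch below is one reading of the abstract: it assumes the coefficient-of-variation limit applies to every candidate (the sentence could also be read as applying only to option (b)), and the record layout, field names, and function names are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class MinorVariantCall:
    depth: int                     # minimum depth of coverage at the site
    callers: frozenset             # names of tools that reported the variant
    snver_returned_results: bool   # False triggers the BWA fallback
    replicate_fractions: tuple     # variant fraction in each repeat assay


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (needs >= 2 values)."""
    return stdev(values) / mean(values)


def is_authentic(call, min_depth=30, min_callers=4, max_cv=0.1):
    # Rule 1: depth of coverage must be larger than 30.
    if call.depth <= min_depth:
        return False

    # Rule 2a: reported by four or more variant callers.
    many_callers = len(call.callers) >= min_callers

    # Rule 2b: DiBayes or LoFreq, plus SNVer
    # (or BWA when SNVer returned no results).
    anchor = bool(call.callers & {"DiBayes", "LoFreq"})
    partner = "SNVer" if call.snver_returned_results else "BWA"
    trusted_pair = anchor and partner in call.callers

    if not (many_callers or trusted_pair):
        return False

    # Assumption: the interassay CV limit applies to all candidates.
    return coefficient_of_variation(call.replicate_fractions) <= max_cv


example = MinorVariantCall(
    depth=100,
    callers=frozenset({"LoFreq", "SNVer"}),
    snver_returned_results=True,
    replicate_fractions=(0.0100, 0.0110, 0.0105),
)
accepted = is_authentic(example)
```

    The thresholds are exposed as parameters because the abstract reports them for one set of experiments; whether they carry over to other platforms or library preparations is not something this sketch can establish.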

  9. Chip-based sequencing nucleic acids

    DOEpatents

    Beer, Neil Reginald

    2014-08-26

    A system for fast DNA sequencing by amplification of genetic material within microreactors, denaturing, demulsifying, and then sequencing the material, while retaining it in a PCR/sequencing zone by a magnetic field. One embodiment includes sequencing nucleic acids on a microchip that includes a microchannel flow channel in the microchip. The nucleic acids are isolated and hybridized to magnetic nanoparticles or to magnetic polystyrene-coated beads. Microreactor droplets are formed in the microchannel flow channel. The microreactor droplets containing the nucleic acids and the magnetic nanoparticles are retained in a magnetic trap in the microchannel flow channel and sequenced.

  10. Identification of a Latin American-specific BabA adhesin variant through whole genome sequencing of Helicobacter pylori patient isolates from Nicaragua

    DOE PAGES

    Thorell, Kaisa; Hosseini, Shaghayegh; Palacios Gonzales, Reyna Victoria; ...

    2016-02-29

    Helicobacter pylori (H. pylori) is one of the most common bacterial infections in humans and this infection can lead to gastric ulcers and gastric cancer. H. pylori is one of the most genetically variable human pathogens and the ability of the bacterium to bind to the host epithelium as well as the presence of different virulence factors and genetic variants within these genes have been associated with disease severity. Nicaragua has particularly high gastric cancer incidence and we therefore studied Nicaraguan clinical H. pylori isolates for factors that could contribute to cancer risk. The complete genomes of fifty-two Nicaraguan H. pylori isolates were sequenced and assembled de novo, and phylogenetic and virulence factor analyses were performed. The Nicaraguan isolates showed phylogenetic relationship with West African isolates in whole-genome sequence comparisons and with Western and urban South- and Central American isolates using MLSA (Multi-locus sequence analysis). A majority, 77% of the isolates, carried the cancer-associated virulence gene cagA and also the s1/i1/m1 vacuolating cytotoxin, vacA allele combination, which is linked to increased severity of disease. Specifically, we also found that Nicaraguan isolates have a blood group-binding adhesin (BabA) variant highly similar to previously reported BabA sequences from Latin America, including from isolates belonging to other phylogenetic groups. These BabA sequences were found to be under positive selection at several amino acid positions that differed from the global collection of isolates. In conclusion, the discovery of a Latin American BabA variant, independent of overall phylogenetic background, suggests hitherto unknown host or environmental factors within the Latin American population giving H. pylori isolates carrying this adhesin variant a selective advantage, which could affect pathogenesis and risk for sequelae through specific adherence properties.

  11. Evaluation of Nine Somatic Variant Callers for Detection of Somatic Mutations in Exome and Targeted Deep Sequencing Data

    PubMed Central

    Krøigård, Anne Bruun; Thomassen, Mads; Lænkholm, Anne-Vibeke; Kruse, Torben A.; Larsen, Martin Jakob

    2016-01-01

    Next generation sequencing is extensively applied to catalogue somatic mutations in cancer, in research settings and increasingly in clinical settings for molecular diagnostics, guiding therapy decisions. Somatic variant callers perform paired comparisons of sequencing data from cancer tissue and matched normal tissue in order to detect somatic mutations. The advent of many new somatic variant callers creates a need for comparison and validation of the tools, as no de facto standard for detection of somatic mutations exists and only limited comparisons have been reported. We have performed a comprehensive evaluation using exome sequencing and targeted deep sequencing data of paired tumor-normal samples from five breast cancer patients to evaluate the performance of nine publicly available somatic variant callers: EBCall, Mutect, Seurat, Shimmer, Indelocator, Somatic Sniper, Strelka, VarScan 2 and Virmid for the detection of single nucleotide mutations and small deletions and insertions. We report a large variation in the number of calls from the nine somatic variant callers on the same sequencing data and highly variable agreement. Sequencing depth had markedly diverse impact on individual callers, as for some callers, increased sequencing depth highly improved sensitivity. For SNV calling, we report EBCall, Mutect, Virmid and Strelka to be the most reliable somatic variant callers for both exome sequencing and targeted deep sequencing. For indel calling, EBCall is superior due to high sensitivity and robustness to changes in sequencing depths. PMID:27002637

  12. Identification of rare X-linked neuroligin variants by massively parallel sequencing in males with autism spectrum disorder.

    PubMed

    Steinberg, Karyn Meltz; Ramachandran, Dhanya; Patel, Viren C; Shetty, Amol C; Cutler, David J; Zwick, Michael E

    2012-09-28

    Autism spectrum disorder (ASD) is highly heritable, but the genetic risk factors for it remain largely unknown. Although structural variants with large effect sizes may explain up to 15% ASD, genome-wide association studies have failed to uncover common single nucleotide variants with large effects on phenotype. The focus within ASD genetics is now shifting to the examination of rare sequence variants of modest effect, which is most often achieved via exome selection and sequencing. This strategy has indeed identified some rare candidate variants; however, the approach does not capture the full spectrum of genetic variation that might contribute to the phenotype. We surveyed two loci with known rare variants that contribute to ASD, the X-linked neuroligin genes by performing massively parallel Illumina sequencing of the coding and noncoding regions from these genes in males from families with multiplex autism. We annotated all variant sites and functionally tested a subset to identify other rare mutations contributing to ASD susceptibility. We found seven rare variants at evolutionary conserved sites in our study population. Functional analyses of the three 3' UTR variants did not show statistically significant effects on the expression of NLGN3 and NLGN4X. In addition, we identified two NLGN3 intronic variants located within conserved transcription factor binding sites that could potentially affect gene regulation. These data demonstrate the power of massively parallel, targeted sequencing studies of affected individuals for identifying rare, potentially disease-contributing variation. However, they also point out the challenges and limitations of current methods of direct functional testing of rare variants and the difficulties of identifying alleles with modest effects.

  13. Identification of rare X-linked neuroligin variants by massively parallel sequencing in males with autism spectrum disorder

    PubMed Central

    2012-01-01

    Background: Autism spectrum disorder (ASD) is highly heritable, but the genetic risk factors for it remain largely unknown. Although structural variants with large effect sizes may explain up to 15% ASD, genome-wide association studies have failed to uncover common single nucleotide variants with large effects on phenotype. The focus within ASD genetics is now shifting to the examination of rare sequence variants of modest effect, which is most often achieved via exome selection and sequencing. This strategy has indeed identified some rare candidate variants; however, the approach does not capture the full spectrum of genetic variation that might contribute to the phenotype. Methods: We surveyed two loci with known rare variants that contribute to ASD, the X-linked neuroligin genes by performing massively parallel Illumina sequencing of the coding and noncoding regions from these genes in males from families with multiplex autism. We annotated all variant sites and functionally tested a subset to identify other rare mutations contributing to ASD susceptibility. Results: We found seven rare variants at evolutionary conserved sites in our study population. Functional analyses of the three 3’ UTR variants did not show statistically significant effects on the expression of NLGN3 and NLGN4X. In addition, we identified two NLGN3 intronic variants located within conserved transcription factor binding sites that could potentially affect gene regulation. Conclusions: These data demonstrate the power of massively parallel, targeted sequencing studies of affected individuals for identifying rare, potentially disease-contributing variation. However, they also point out the challenges and limitations of current methods of direct functional testing of rare variants and the difficulties of identifying alleles with modest effects. PMID:23020841

  14. Variants of uncertain significance in newborn screening disorders: implications for large-scale genomic sequencing.

    PubMed

    Narravula, Alekhya; Garber, Kathryn B; Askree, S Hussain; Hegde, Madhuri; Hall, Patricia L

    2017-01-01

    As exome and genome sequencing using high-throughput sequencing technologies move rapidly into the diagnostic process, laboratories and clinicians need to develop a strategy for dealing with uncertain findings. A commitment must be made to minimize these findings, and all parties may need to make adjustments to their processes. The information required to reclassify these variants is often available but not communicated to all relevant parties. To illustrate these issues, we focused on three well-characterized monogenic, metabolic disorders included in newborn screens: classic galactosemia, caused by GALT variants; phenylketonuria, caused by PAH variants; and medium-chain acyl-CoA dehydrogenase (MCAD) deficiency, caused by ACADM variants. In 10 years of clinical molecular testing, we have observed 134 unique GALT variants, 46 of which were variants of uncertain significance (VUS). In PAH, we observed 132 variants, including 17 VUS, and for ACADM, we observed 64 unique variants, of which 33 were uncertain. After this review, 17 VUS (37%; 7 in ACADM, 9 in GALT, and 1 in PAH) were reclassified from uncertain (6 to benign or likely benign and 11 to pathogenic or likely pathogenic). We identified common types of missing information that would have helped make a definitive classification and categorized this information by ease and cost to obtain. Genet Med 19 1, 77-82.

  15. Genomic prediction using preselected DNA variants from a GWAS with whole-genome sequence data in Holstein-Friesian cattle.

    PubMed

    Veerkamp, Roel F; Bouwman, Aniek C; Schrooten, Chris; Calus, Mario P L

    2016-12-01

    Whole-genome sequence data is expected to capture genetic variation more completely than common genotyping panels. Our objective was to compare the proportion of variance explained and the accuracy of genomic prediction by using imputed sequence data or preselected SNPs from a genome-wide association study (GWAS) with imputed whole-genome sequence data. Phenotypes were available for 5503 Holstein-Friesian bulls. Genotypes were imputed up to whole-genome sequence (13,789,029 segregating DNA variants) by using run 4 of the 1000 bull genomes project. The program GCTA was used to perform GWAS for protein yield (PY), somatic cell score (SCS) and interval from first to last insemination (IFL). From the GWAS, subsets of variants were selected and genomic relationship matrices (GRM) were used to estimate the variance explained in 2087 validation animals and to evaluate the genomic prediction ability. Finally, two GRM were fitted together in several models to evaluate the effect of selected variants that were in competition with all the other variants. The GRM based on full sequence data explained only marginally more genetic variation than that based on common SNP panels: for PY, SCS and IFL, genomic heritability improved from 0.81 to 0.83, 0.83 to 0.87 and 0.69 to 0.72, respectively. Sequence data also helped to identify more variants linked to quantitative trait loci and resulted in clearer GWAS peaks across the genome. The proportion of total variance explained by the selected variants combined in a GRM was considerably smaller than that explained by all variants (less than 0.31 for all traits). When selected variants were used, accuracy of genomic predictions decreased and bias increased. Although 35 to 42 variants were detected that together explained 13 to 19% of the total variance (18 to 23% of the genetic variance) when fitted alone, there was no advantage in using dense sequence information for genomic prediction in the Holstein data used in our study

  16. Sequence Variation in the Small-Subunit rRNA Gene of Plasmodium malariae and Prevalence of Isolates with the Variant Sequence in Sichuan, China

    PubMed Central

    Liu, Qing; Zhu, Shenghua; Mizuno, Sahoko; Kimura, Masatsugu; Liu, Peina; Isomura, Shin; Wang, Xingzhen; Kawamoto, Fumihiko

    1998-01-01

    By two PCR-based diagnostic methods, Plasmodium malariae infections have been rediscovered at two foci in the Sichuan province of China, a region where no cases of P. malariae have been officially reported for the last 2 decades. In addition, a variant form of P. malariae which has a deletion of 19 bp and seven substitutions of base pairs in the target sequence of the small-subunit (SSU) rRNA gene was detected with high frequency. Alignment analysis of Plasmodium sp. SSU rRNA gene sequences revealed that the 5′ region of the variant sequence is identical to that of P. vivax or P. knowlesi and its 3′ region is identical to that of P. malariae. The same sequence variations were also found in P. malariae isolates collected along the Thai-Myanmar border, suggesting a wide distribution of this variant form from southern China to Southeast Asia. PMID:9774600

  17. Filovirus RefSeq Entries: Evaluation and Selection of Filovirus Type Variants, Type Sequences, and Names

    PubMed Central

    Kuhn, Jens H.; Andersen, Kristian G.; Bào, Yīmíng; Bavari, Sina; Becker, Stephan; Bennett, Richard S.; Bergman, Nicholas H.; Blinkova, Olga; Bradfute, Steven; Brister, J. Rodney; Bukreyev, Alexander; Chandran, Kartik; Chepurnov, Alexander A.; Davey, Robert A.; Dietzgen, Ralf G.; Doggett, Norman A.; Dolnik, Olga; Dye, John M.; Enterlein, Sven; Fenimore, Paul W.; Formenty, Pierre; Freiberg, Alexander N.; Garry, Robert F.; Garza, Nicole L.; Gire, Stephen K.; Gonzalez, Jean-Paul; Griffiths, Anthony; Happi, Christian T.; Hensley, Lisa E.; Herbert, Andrew S.; Hevey, Michael C.; Hoenen, Thomas; Honko, Anna N.; Ignatyev, Georgy M.; Jahrling, Peter B.; Johnson, Joshua C.; Johnson, Karl M.; Kindrachuk, Jason; Klenk, Hans-Dieter; Kobinger, Gary; Kochel, Tadeusz J.; Lackemeyer, Matthew G.; Lackner, Daniel F.; Leroy, Eric M.; Lever, Mark S.; Mühlberger, Elke; Netesov, Sergey V.; Olinger, Gene G.; Omilabu, Sunday A.; Palacios, Gustavo; Panchal, Rekha G.; Park, Daniel J.; Patterson, Jean L.; Paweska, Janusz T.; Peters, Clarence J.; Pettitt, James; Pitt, Louise; Radoshitzky, Sheli R.; Ryabchikova, Elena I.; Saphire, Erica Ollmann; Sabeti, Pardis C.; Sealfon, Rachel; Shestopalov, Aleksandr M.; Smither, Sophie J.; Sullivan, Nancy J.; Swanepoel, Robert; Takada, Ayato; Towner, Jonathan S.; van der Groen, Guido; Volchkov, Viktor E.; Volchkova, Valentina A.; Wahl-Jensen, Victoria; Warren, Travis K.; Warfield, Kelly L.; Weidmann, Manfred; Nichol, Stuart T.

    2014-01-01

    Sequence determination of complete or coding-complete genomes of viruses is becoming common practice for supporting the work of epidemiologists, ecologists, virologists, and taxonomists. Sequencing duration and costs are rapidly decreasing, sequencing hardware is under modification for use by non-experts, and software is constantly being improved to simplify sequence data management and analysis. Thus, analysis of virus disease outbreaks on the molecular level is now feasible, including characterization of the evolution of individual virus populations in single patients over time. The increasing accumulation of sequencing data creates a management problem for the curators of commonly used sequence databases and an entry retrieval problem for end users. Therefore, utilizing the data to their fullest potential will require setting nomenclature and annotation standards for virus isolates and associated genomic sequences. The National Center for Biotechnology Information’s (NCBI’s) RefSeq is a non-redundant, curated database for reference (or type) nucleotide sequence records that supplies source data to numerous other databases. Building on recently proposed templates for filovirus variant naming [ ()////variant designation>-], we report consensus decisions from a majority of past and currently active filovirus experts on the eight filovirus type variants and isolates to be represented in RefSeq, their final designations, and their associated sequences. PMID:25256396

  18. Identification of Candidate Gene Variants in Korean MODY Families by Whole-Exome Sequencing.

    PubMed

    Shim, Ye Jee; Kim, Jung Eun; Hwang, Su-Kyeong; Choi, Bong Seok; Choi, Byung Ho; Cho, Eun-Mi; Jang, Kyoung Mi; Ko, Cheol Woo

    2015-01-01

    To date, 13 genes causing maturity-onset diabetes of the young (MODY) have been identified. However, there is a big discrepancy in the genetic locus between Asian and Caucasian patients with MODY. Thus, we conducted whole-exome sequencing in Korean MODY families to identify causative gene variants. Six MODY probands and their family members were included. Variants in the dbSNP135 and TIARA databases for Koreans and the variants with minor allele frequencies >0.5% of the 1000 Genomes database were excluded. We selected only the functional variants (gain of stop codon, frameshifts and nonsynonymous single-nucleotide variants) and conducted a case-control comparison in the family members. The selected variants were scanned for the previously introduced gene set implicated in glucose metabolism. Three variants c.620C>T:p.Thr207Ile in PTPRD, c.559C>G:p.Gln187Glu in SYT9, and c.1526T>G:p.Val509Gly in WFS1 were respectively identified in 3 families. We could not find any disease-causative alleles of known MODY 1-13 genes. Based on the predictive program, Thr207Ile in PTPRD was considered pathogenic. Whole-exome sequencing is a valuable method for the genetic diagnosis of MODY. Further evaluation is necessary about the role of PTPRD, SYT9 and WFS1 in normal insulin release from pancreatic beta cells. © 2015 S. Karger AG, Basel.
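    The filtering steps described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the `Variant` record, the consequence labels, and the example entries are all hypothetical, and it assumes one reading of the abstract (any variant already listed in the population databases is dropped, as is any variant with a 1000 Genomes minor allele frequency above 0.5%). The family-based case-control comparison is not modeled.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical consequence labels; real annotation tools use their own vocabularies.
FUNCTIONAL = {"stop_gained", "frameshift", "nonsynonymous_snv"}
MAF_CUTOFF = 0.005  # 0.5%, the 1000 Genomes threshold quoted in the abstract


@dataclass(frozen=True)
class Variant:
    gene: str
    consequence: str
    maf_1000g: Optional[float]  # None means absent from 1000 Genomes
    in_population_db: bool      # assumed flag: listed in dbSNP135 or TIARA


def keep(v: Variant) -> bool:
    """Return True if the variant survives all three filters."""
    if v.in_population_db:
        return False
    if v.maf_1000g is not None and v.maf_1000g > MAF_CUTOFF:
        return False
    return v.consequence in FUNCTIONAL


# Entirely made-up records, used only to exercise the filter.
variants = [
    Variant("GENE_A", "nonsynonymous_snv", None, False),   # kept
    Variant("GENE_B", "nonsynonymous_snv", 0.02, False),   # too common
    Variant("GENE_C", "synonymous_snv", None, False),      # not functional
    Variant("GENE_D", "frameshift", None, True),           # already in database
]
candidates = [v for v in variants if keep(v)]
```

    Keeping the filters in one predicate makes the order of exclusion explicit and easy to adjust if the thresholds or databases differ.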

  19. Common 5S rRNA variants are likely to be accepted in many sequence contexts

    NASA Technical Reports Server (NTRS)

    Zhang, Zhengdong; D'Souza, Lisa M.; Lee, Youn-Hyung; Fox, George E.

    2003-01-01

    Over evolutionary time RNA sequences which are successfully fixed in a population are selected from among those that satisfy the structural and chemical requirements imposed by the function of the RNA. These sequences together comprise the structure space of the RNA. In principle, a comprehensive understanding of RNA structure and function would make it possible to enumerate which specific RNA sequences belong to a particular structure space and which do not. We are using bacterial 5S rRNA as a model system to attempt to identify principles that can be used to predict which sequences do or do not belong to the 5S rRNA structure space. One promising idea is the very intuitive notion that frequently seen sequence changes in an aligned data set of naturally occurring 5S rRNAs would be widely accepted in many other 5S rRNA sequence contexts. To test this hypothesis, we first developed well-defined operational definitions for a Vibrio region of the 5S rRNA structure space and what is meant by a highly variable position. Fourteen sequence variants (10 point changes and 4 base-pair changes) were identified in this way, which, by the hypothesis, would be expected to incorporate successfully in any of the known sequences in the Vibrio region. All 14 of these changes were constructed and separately introduced into the Vibrio proteolyticus 5S rRNA sequence where they are not normally found. Each variant was evaluated for its ability to function as a valid 5S rRNA in an E. coli cellular context. It was found that 93% (13/14) of the variants tested are likely valid 5S rRNAs in this context. In addition, seven variants were constructed that, although present in the Vibrio region, did not meet the stringent criteria for a highly variable position. In this case, 86% (6/7) are likely valid. As a control we also examined seven variants that are seldom or never seen in the Vibrio region of 5S rRNA sequence space. In this case only two of seven were found to be potentially valid. 

  20. Identification of novel mutations and sequence variants in the SOX2 and CHX10 genes in patients with anophthalmia/microphthalmia

    PubMed Central

    Zhou, Jie; Kherani, Femida; Bardakjian, Tanya M.; Katowitz, James; Hughes, Nkecha; Schimmenti, Lisa A.; Schneider, Adele

    2008-01-01

    Purpose Mutations in the SOX2 and CHX10 genes have been reported in patients with anophthalmia and/or microphthalmia. In this study, we evaluated 34 anophthalmic/microphthalmic patient DNA samples (two sets of siblings included) for mutations and sequence variants in SOX2 and CHX10. Methods Conformational sensitive gel electrophoresis (CSGE) was used for the initial SOX2 and CHX10 screening of 34 affected individuals (two sets of siblings), five unaffected family members, and 80 healthy controls. Patient samples containing heteroduplexes were selected for sequence analysis. Base pair changes in SOX2 and CHX10 were confirmed by sequencing bidirectionally in patient samples. Results Two novel heterozygous mutations and two sequence variants (one known) in SOX2 were identified in this cohort. Mutation c.310 G>T (p. Glu104X), found in one patient, was in the region encoding the high mobility group (HMG) DNA-binding domain and resulted in a change from glutamic acid to a stop codon. The second mutation, noted in two affected siblings, was a single nucleotide deletion c.549delC (p. Pro184ArgfsX19) in the region encoding the activation domain, resulting in a frameshift and premature termination of the coding sequence. The shortened protein products may result in the loss of function. In addition, a novel nucleotide substitution c.*557G>A was identified in the 3′-untranslated region in one patient. The relationship between the nucleotide change and the protein function is indeterminate. A known single nucleotide polymorphism (c. *469 C>A, SNP rs11915160) was also detected in 2 of the 34 patients. Screening of CHX10 identified two synonymous sequence variants, c.471 C>T (p.Ser157Ser, rs35435463) and c.579 G>A (p. Gln193Gln, novel SNP), and one non-synonymous sequence variant, c.871 G>A (p. Asp291Asn, novel SNP). The non-synonymous polymorphism was also present in healthy controls, suggesting non-causality. Conclusions These results support the role of SOX2 in ocular

  1. The quest for rare variants: pooled multiplexed next generation sequencing in plants.

    PubMed

    Marroni, Fabio; Pinosio, Sara; Morgante, Michele

    2012-01-01

    Next generation sequencing (NGS) instruments produce an unprecedented amount of sequence data at contained costs. This gives researchers the possibility of designing studies with adequate power to identify rare variants at a fraction of the economic and labor resources required by individual Sanger sequencing. As of today, few research groups working in plant sciences have exploited this potentiality, showing that pooled NGS provides results in excellent agreement with those obtained by individual Sanger sequencing. The aim of this review is to convey to the reader the general ideas underlying the use of pooled NGS for the identification of rare variants. To facilitate a thorough understanding of the possibilities of the method, we will explain in detail the possible experimental and analytical approaches and discuss their advantages and disadvantages. We will show that information on allele frequency obtained by pooled NGS can be used to accurately compute basic population genetics indexes such as allele frequency, nucleotide diversity, and Tajima's D. Finally, we will discuss applications and future perspectives of the multiplexed NGS approach.
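    As an illustration of the population genetics indexes mentioned above, the sketch below computes nucleotide diversity and Tajima's D from per-site allele frequencies using the classical formulas for a sample of n individually sequenced chromosomes. It is an assumption-laden simplification: the function names are hypothetical, and real pooled NGS data need additional corrections for pool size and sequencing coverage that are not shown here.

```python
import math


def nucleotide_diversity(freqs, n):
    """Sum over segregating sites of the unbiased heterozygosity n/(n-1) * 2p(1-p)."""
    return sum(n / (n - 1) * 2 * p * (1 - p) for p in freqs)


def tajimas_d(freqs, n):
    """Classical Tajima's D for n chromosomes and one allele frequency per segregating site."""
    s = len(freqs)
    if s == 0 or n < 4:
        raise ValueError("need at least one segregating site and n >= 4")
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    pi = nucleotide_diversity(freqs, n)
    theta_w = s / a1  # Watterson's estimator
    return (pi - theta_w) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Made-up frequencies: five singletons versus five intermediate-frequency sites, n = 10.
d_rare = tajimas_d([0.1] * 5, 10)
d_common = tajimas_d([0.5] * 5, 10)
```

    An excess of rare alleles pulls D below zero, while an excess of intermediate-frequency alleles pushes it above zero, which is why the two toy inputs give opposite signs.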

  2. In silico prediction of the pathogenic effect of a novel variant of BCKDHA leading to classical maple syrup urine disease identified using clinical exome sequencing.

    PubMed

    Fernández-Lainez, Cynthia; Aláez-Verson, Carmen; Ibarra-González, Isabel; Enríquez-Flores, Sergio; Carrillo-Sanchez, Karol; Flores-Lagunes, Leonardo; Guillén-López, Sara; Belmont-Martínez, Leticia; Vela-Amieva, Marcela

    2018-04-16

    Maple syrup urine disease (MSUD) is a metabolic disorder caused by mutations in three of the branched-chain α-keto acid dehydrogenase complex (BCKDC) genes. Classical MSUD symptoms can be observed immediately after birth and include ketoacidosis, irritability, lethargy, and coma, which can lead to death or irreversible neurodevelopmental delay in survivors. The molecular diagnosis of MSUD can be time-consuming and difficult to establish using conventional Sanger sequencing because it could be due to pathogenic variants of any of the BCKDC genes. Next-generation sequencing-based methodologies have revolutionized the molecular diagnosis of inborn errors in metabolism and offer a superior approach for genotyping these patients. Here, we report an MSUD case whose molecular diagnosis was performed by clinical exome sequencing (CES), and the possible structural pathogenic effect of a novel E1α subunit pathogenic variant was analyzed using in silico analysis of α and β subunit crystallographic structure. Molecular analysis revealed a new homozygous non-sense c.1267C>T or p.Gln423Ter variant of BCKDHA. The novel BCKDHA variant is considered pathogenic because it caused a premature stop codon that probably led to the loss of the last 22 amino acid residues of the E1α subunit C-terminal end. In silico analysis of this region showed that it is in contact with several residues of the E1β subunit mainly through polar contacts, hydrogen bonds, and hydrophobic interactions. CES strategy could benefit the patients and families by offering precise and prompt diagnosis and better genetic counseling. Copyright © 2018 Elsevier B.V. All rights reserved.
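    The correspondence between c.1267C>T and p.Gln423Ter can be checked with simple codon arithmetic. The sketch below is illustrative only: the helper names are hypothetical, and because the abstract does not state which glutamine codon the reference sequence uses, both standard Gln codons (CAA and CAG) are tried.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
GLN_CODONS = ("CAA", "CAG")  # the two standard glutamine codons


def codon_of(cds_pos):
    """Map a 1-based coding position to (1-based codon number, 0-based offset in codon)."""
    return (cds_pos - 1) // 3 + 1, (cds_pos - 1) % 3


def apply_substitution(codon, offset, ref, alt):
    """Replace the base at `offset`, checking that it matches the stated reference base."""
    if codon[offset] != ref:
        raise ValueError("reference base does not match codon")
    return codon[:offset] + alt + codon[offset + 1:]


codon_number, offset = codon_of(1267)
mutated = [apply_substitution(c, offset, "C", "T") for c in GLN_CODONS]
```

    Position 1267 falls on the first base of codon 423, and a C>T change at that base turns either glutamine codon into a stop codon, consistent with the nonsense annotation.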

  3. MYO7A and USH2A gene sequence variants in Italian patients with Usher syndrome.

    PubMed

    Sodi, Andrea; Mariottini, Alessandro; Passerini, Ilaria; Murro, Vittoria; Tachyla, Iryna; Bianchi, Benedetta; Menchini, Ugo; Torricelli, Francesca

    2014-01-01

    To analyze the spectrum of sequence variants in the MYO7A and USH2A genes in a group of Italian patients affected by Usher syndrome (USH). Thirty-six Italian patients with a diagnosis of USH were recruited. They received a standard ophthalmologic examination, visual field testing, optical coherence tomography (OCT) scan, and electrophysiological tests. Fluorescein angiography and fundus autofluorescence imaging were performed in selected cases. All the patients underwent an audiologic examination for the 0.25-8,000 Hz frequencies. Vestibular function was evaluated with specific tests. DNA samples were analyzed for sequence variants of the MYO7A gene (for USH1) and the USH2A gene (for USH2) with direct sequencing techniques. A few patients were analyzed for both genes. In the MYO7A gene, ten missense variants were found; three patients were compound heterozygous, and two were homozygous. Thirty-four USH2A gene variants were detected, including eight missense variants, nine nonsense variants, six splicing variants, and 11 duplications/deletions; 19 patients were compound heterozygous, and three were homozygous. Four MYO7A and 17 USH2A variants have already been described in the literature. Among the novel mutations there are four USH2A large deletions, detected with multiplex ligation dependent probe amplification (MLPA) technology. Two potentially pathogenic variants were found in 27 patients (75%). Affected patients showed variable clinical pictures without a clear genotype-phenotype correlation. Ten variants in the MYO7A gene and 34 variants in the USH2A gene were detected in Italian patients with USH at a high detection rate. A selective analysis of these genes may be valuable for molecular analysis, combining diagnostic efficiency with little time wastage and less resource consumption.

  4. Syndromic intellectual disability: a new phenotype caused by an aromatic amino acid decarboxylase gene (DDC) variant.

    PubMed

    Graziano, Claudio; Wischmeijer, Anita; Pippucci, Tommaso; Fusco, Carlo; Diquigiovanni, Chiara; Nõukas, Margit; Sauk, Martin; Kurg, Ants; Rivieri, Francesca; Blau, Nenad; Hoffmann, Georg F; Chaubey, Alka; Schwartz, Charles E; Romeo, Giovanni; Bonora, Elena; Garavelli, Livia; Seri, Marco

    2015-04-01

    The causative variant in a consanguineous family in which the three patients (two siblings and a cousin) presented with intellectual disability, Marfanoid habitus, craniofacial dysmorphisms, chronic diarrhea and progressive kyphoscoliosis has been identified through whole exome sequencing (WES) analysis. The WES study identified a homozygous DDC variant in the patients, c.1123C>T, resulting in a p.Arg375Cys missense substitution. Mutations in DDC cause a recessive metabolic disorder (aromatic amino acid decarboxylase, AADC, deficiency, OMIM #608643) characterized by hypotonia, oculogyric crises, excessive sweating, temperature instability, dystonia, severe neurologic dysfunction in infancy, and specific abnormalities of neurotransmitters and their metabolites in the cerebrospinal fluid (CSF). In our family, analysis of neurotransmitters and their metabolites in the patients' CSF shows a pattern compatible with AADC deficiency, although the clinical signs are different from the classic form. Our work expands the phenotypic spectrum associated with DDC variants, which therefore can cause an additional novel syndrome without typical movement abnormalities. Copyright © 2015 Elsevier B.V. All rights reserved.

  5. Genetic epidemiology of pharmacogenetic variants in South East Asian Malays using whole-genome sequences.

    PubMed

    Sivadas, A; Salleh, M Z; Teh, L K; Scaria, V

    2017-10-01

    Expanding the scope of pharmacogenomic research by including multiple global populations is integral to building robust evidence for its clinical translation. Deep whole-genome sequencing of diverse ethnic populations provides a unique opportunity to study rare and common pharmacogenomic markers that often vary in frequency across populations. In this study, we aim to build a diverse map of pharmacogenetic variants in the South East Asian (SEA) Malay population using deep whole-genome sequences of 100 healthy SEA Malay individuals. We investigated the allelic diversity of potentially deleterious pharmacogenomic variants in the SEA Malay population. Our analysis revealed 227 common and 466 rare potentially functional single nucleotide variants (SNVs) in 437 pharmacogenomic genes involved in drug metabolism, transport and target genes, including 74 novel variants. This study has created one of the most comprehensive maps of pharmacogenetic markers in any population from whole genomes and will hugely benefit pharmacogenomic investigations and drug dosage recommendations in SEA Malays.

  6. Short communication: Validation of 4 candidate causative trait variants in 2 cattle breeds using targeted sequence imputation.

    PubMed

    Pausch, Hubert; Wurmser, Christine; Reinhardt, Friedrich; Emmerling, Reiner; Fries, Ruedi

    2015-06-01

    Most association studies for pinpointing trait-associated variants are performed within breed. The availability of sequence data from key ancestors of several cattle breeds now enables immediate assessment of the frequency of trait-associated variants in populations different from the mapping population and their imputation into large validation populations. The objective of this study was to validate the effects of 4 putatively causative variants on milk production traits, male fertility, and stature in German Fleckvieh and Holstein-Friesian animals using targeted sequence imputation. We used whole-genome sequence data of 456 animals to impute 4 missense mutations in DGAT1, GHR, PRLR, and PROP1 into 10,363 Fleckvieh and 8,812 Holstein animals. The accuracy of the imputed genotypes exceeded 95% for all variants. Association testing with imputed variants revealed consistent antagonistic effects of the DGAT1 p.A232K and GHR p.F279Y variants on milk yield and protein and fat contents, respectively, in both breeds. The allele frequency of both polymorphisms has changed considerably in the past 20 yr, indicating that they were targets of recent selection for milk production traits. The PRLR p.S18N variant was associated with yield traits in Fleckvieh but not in Holstein, suggesting that it may be in linkage disequilibrium with a mutation affecting yield traits rather than being causal. The reported effects of the PROP1 p.H173R variant on milk production, male fertility, and stature could not be confirmed. Our results demonstrate that population-wide imputation of candidate causal variants from sequence data is feasible, enabling their rapid validation in large independent populations. Copyright © 2015 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.

  7. Method for sequencing nucleic acid molecules

    DOEpatents

    Korlach, Jonas; Webb, Watt W.; Levene, Michael; Turner, Stephen; Craighead, Harold G.; Foquet, Mathieu

    2006-06-06

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid, i.e. the activity of a nucleic acid polymerizing enzyme on the template nucleic acid molecule to be sequenced is followed in real time. The sequence is deduced by identifying which base is being incorporated into the growing complementary strand of the target nucleic acid by the catalytic activity of the nucleic acid polymerizing enzyme at each step in the sequence of base additions. A polymerase on the target nucleic acid molecule complex is provided in a position suitable to move along the target nucleic acid molecule and extend the oligonucleotide primer at an active site. A plurality of labelled types of nucleotide analogs are provided proximate to the active site, with each distinguishable type of nucleotide analog being complementary to a different nucleotide in the target nucleic acid sequence. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand at the active site, where the nucleotide analog being added is complementary to the nucleotide of the target nucleic acid at the active site. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. The steps of providing labelled nucleotide analogs, polymerizing the growing nucleic acid strand, and identifying the added nucleotide analog are repeated so that the nucleic acid strand is further extended and the sequence of the target nucleic acid is determined.

  8. Method for sequencing nucleic acid molecules

    DOEpatents

    Korlach, Jonas; Webb, Watt W.; Levene, Michael; Turner, Stephen; Craighead, Harold G.; Foquet, Mathieu

    2006-05-30

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid, i.e. the activity of a nucleic acid polymerizing enzyme on the template nucleic acid molecule to be sequenced is followed in real time. The sequence is deduced by identifying which base is being incorporated into the growing complementary strand of the target nucleic acid by the catalytic activity of the nucleic acid polymerizing enzyme at each step in the sequence of base additions. A polymerase on the target nucleic acid molecule complex is provided in a position suitable to move along the target nucleic acid molecule and extend the oligonucleotide primer at an active site. A plurality of labelled types of nucleotide analogs are provided proximate to the active site, with each distinguishable type of nucleotide analog being complementary to a different nucleotide in the target nucleic acid sequence. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand at the active site, where the nucleotide analog being added is complementary to the nucleotide of the target nucleic acid at the active site. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. The steps of providing labelled nucleotide analogs, polymerizing the growing nucleic acid strand, and identifying the added nucleotide analog are repeated so that the nucleic acid strand is further extended and the sequence of the target nucleic acid is determined.

  9. A Multiple-Sequence Variant of the Multiple-Baseline Design: A Strategy for Analysis of Sequence Effects and Treatment Comparison.

    ERIC Educational Resources Information Center

    Noell, George H.; Gresham, Frank M.

    2001-01-01

    Describes design logic and potential uses of a variant of the multiple-baseline design. The multiple-baseline multiple-sequence (MBL-MS) consists of multiple-baseline designs that are interlaced with one another and include all possible sequences of treatments. The MBL-MS design appears to be primarily useful for comparison of treatments taking…

  10. A phylogenetic framework facilitates Y-STR variant discovery and classification via massively parallel sequencing.

    PubMed

    Huszar, Tunde I; Jobling, Mark A; Wetton, Jon H

    2018-04-12

    Short tandem repeats on the male-specific region of the Y chromosome (Y-STRs) are permanently linked as haplotypes, and therefore Y-STR sequence diversity can be considered within the robust framework of a phylogeny of haplogroups defined by single nucleotide polymorphisms (SNPs). Here we use massively parallel sequencing (MPS) to analyse the 23 Y-STRs in Promega's prototype PowerSeq™ Auto/Mito/Y System kit (containing the markers of the PowerPlex® Y23 [PPY23] System) in a set of 100 diverse Y chromosomes whose phylogenetic relationships are known from previous megabase-scale resequencing. Including allele duplications and alleles resulting from likely somatic mutation, we characterised 2311 alleles, demonstrating 99.83% concordance with capillary electrophoresis (CE) data on the same sample set. The set contains 267 distinct sequence-based alleles (an increase of 58% compared to the 169 detectable by CE), including 60 novel Y-STR variants phased with their flanking sequences which have not been reported previously to our knowledge. Variation includes 46 distinct alleles containing non-reference variants of SNPs/indels in both repeat and flanking regions, and 145 distinct alleles containing repeat pattern variants (RPV). For DYS385a,b, DYS481 and DYS390 we observed repeat count variation in short flanking segments previously considered invariable, and suggest new MPS-based structural designations based on these. We considered the observed variation in the context of the Y phylogeny: several specific haplogroup associations were observed for SNPs and indels, reflecting the low mutation rates of such variant types; however, RPVs showed less phylogenetic coherence and more recurrence, reflecting their relatively high mutation rates. In conclusion, our study reveals considerable additional diversity at the Y-STRs of the PPY23 set via MPS analysis, demonstrates high concordance with CE data, facilitates nomenclature standardisation, and places Y-STR sequence variants

  11. Integrated rare variant-based risk gene prioritization in disease case-control sequencing studies.

    PubMed

    Lin, Jhih-Rong; Zhang, Quanwei; Cai, Ying; Morrow, Bernice E; Zhang, Zhengdong D

    2017-12-01

    Rare variants of major effect play an important role in human complex diseases and can be discovered by sequencing-based genome-wide association studies. Here, we introduce an integrated approach that combines the rare variant association test with gene network and phenotype information to identify risk genes implicated by rare variants for human complex diseases. Our data integration method follows a 'discovery-driven' strategy without relying on prior knowledge about the disease and thus maintains the unbiased character of genome-wide association studies. Simulations reveal that our method can outperform a widely-used rare variant association test method by 2 to 3 times. In a case study of a small disease cohort, we uncovered putative risk genes and the corresponding rare variants that may act as genetic modifiers of congenital heart disease in 22q11.2 deletion syndrome patients. These variants were missed by a conventional approach that relied on the rare variant association test alone.

  12. Associations between variants of FADS genes and omega-3 and omega-6 milk fatty acids of Canadian Holstein cows.

    PubMed

    Ibeagha-Awemu, Eveline M; Akwanji, Kingsley A; Beaudoin, Frédéric; Zhao, Xin

    2014-02-17

    Fatty acid desaturase 1 (FADS1) and 2 (FADS2) genes code respectively for the enzymes delta-5 and delta-6 desaturases, which are rate-limiting enzymes in the synthesis of polyunsaturated omega-3 and omega-6 fatty acids (FAs). Omega-3 and -6 FAs as well as conjugated linoleic acid (CLA) are present in bovine milk and have demonstrated positive health effects in humans. Studies in humans have shown significant relationships between genetic variants in the FADS1 and FADS2 genes and plasma and tissue concentrations of omega-3 and -6 FAs. The aim of this study was to evaluate the extent of sequence variations within these two genes in Canadian Holstein cows as well as the association between sequence variants and health-promoting FAs in milk. Thirty-three SNPs were detected within the studied regions of the genes, including a synonymous mutation (FADS1-07, rs42187261, 306Tyr > Tyr) in exon 8 of FADS1, a non-synonymous mutation (FADS2-14, rs211580559, 294Ala > Val) within FADS2 exon 7, a splice site SNP (FADS2-05, rs211263660), a 3'UTR SNP (FADS2-23, rs109772589), and another 3'UTR SNP with an effect on a microRNA binding site within the FADS2 gene (FADS2-19, rs210169303). Association analyses showed significant relations between three out of seven tested SNPs and several FAs. Significant associations (FDR P < 0.05) were recorded between FADS2-23 (rs109772589) and two omega-6 FAs (dihomo-gamma-linolenic acid [C20:3n6] and arachidonic acid [C20:4n6]), FADS1-07 (rs42187261) and one omega-3 FA (eicosapentaenoic acid, C20:5n3) and tricosanoic acid (C23:0), and one intronic SNP, FADS1-01 (rs136261927) and C20:3n6. Our study has demonstrated positive associations between three SNPs within FADS1 and FADS2 genes (a SNP within the 3'UTR, a synonymous SNP and an intronic SNP), with three milk PUFAs of Canadian Holstein cows thus suggesting possible involvement of synonymous and non-coding region variants in FA synthesis. These SNPs may serve as potential genetic markers in breeding programs to

  13. A splice variant in the ACSL5 gene relates migraine with fatty acid activation in mitochondria

    PubMed Central

    Matesanz, Fuencisla; Fedetz, María; Barrionuevo, Cristina; Karaky, Mohamad; Catalá-Rabasa, Antonio; Potenciano, Victor; Bello-Morales, Raquel; López-Guerrero, Jose-Antonio; Alcina, Antonio

    2016-01-01

    Genome-wide association studies (GWAS) in migraine are providing the molecular basis of this heterogeneous disease, but the understanding of its aetiology is still incomplete. Although some biomarkers have currently been accepted for migraine, a large number of studies are still needed to identify new ones. The migraine-associated variant rs12355831:A>G (P=2 × 10^-6), described in a GWAS of the International Headache Genetic Consortium, is localized in a non-coding sequence with unknown function. We sought to identify the causal variant and the genetic mechanism involved in the migraine risk. To this end, we integrated RNA sequence data from the Genetic European Variation in Health and Disease (GEUVADIS) and genotypes from 1000 GENOMES of 344 lymphoblastoid cell lines (LCLs), to determine the expression quantitative trait loci (eQTLs) in the region. We found that the migraine-associated variant belongs to a linkage disequilibrium block associated with the expression of an acyl-coenzyme A synthetase 5 (ACSL5) transcript lacking exon 20 (ACSL5-Δ20). We showed by exon-skipping assay a direct causality of rs2256368-G in the exon 20 skipping of approximately 20 to 40% of ACSL5 RNA molecules. In conclusion, we identified the functional variant (rs2256368:A>G) affecting ACSL5 exon 20 skipping, as a causal factor linked to the migraine-associated rs12355831:A>G, suggesting that the activation of long-chain fatty acids by the spliced ACSL5-Δ20 molecules, a mitochondrially located enzyme, is involved in migraine pathology. PMID:27189022

  14. Leveraging long read sequencing from a single individual to provide a comprehensive resource for benchmarking variant calling methods

    PubMed Central

    Mu, John C.; Tootoonchi Afshar, Pegah; Mohiyuddin, Marghoob; Chen, Xi; Li, Jian; Bani Asadi, Narges; Gerstein, Mark B.; Wong, Wing H.; Lam, Hugo Y. K.

    2015-01-01

    A high-confidence, comprehensive human variant set is critical in assessing accuracy of sequencing algorithms, which are crucial in precision medicine based on high-throughput sequencing. Although recent works have attempted to provide such a resource, they still do not encompass all major types of variants including structural variants (SVs). Thus, we leveraged the massive high-quality Sanger sequences from the HuRef genome to construct by far the most comprehensive gold set of a single individual, which was cross validated with deep Illumina sequencing, population datasets, and well-established algorithms. It was a necessary effort to completely reanalyze the HuRef genome as its previously published variants were mostly reported five years ago, suffering from compatibility, organization, and accuracy issues that prevent their direct use in benchmarking. Our extensive analysis and validation resulted in a gold set with high specificity and sensitivity. In contrast to the current gold sets of the NA12878 or HS1011 genomes, our gold set is the first that includes small variants, deletion SVs and insertion SVs up to a hundred thousand base-pairs. We demonstrate the utility of our HuRef gold set to benchmark several published SV detection tools. PMID:26412485

  15. Annotation of Sequence Variants in Cancer Samples: Processes and Pitfalls for Routine Assays in the Clinical Laboratory.

    PubMed

    Lee, Lobin A; Arvai, Kevin J; Jones, Dan

    2015-07-01

    As DNA sequencing of multigene panels becomes routine for cancer samples in the clinical laboratory, an efficient process for classifying variants has become more critical. Determining which germline variants are significant for cancer disposition and which somatic mutations are integral to cancer development or therapy response remains difficult, even for well-studied genes such as BRCA1 and TP53. We compare and contrast the general principles and lines of evidence commonly used to distinguish the significance of cancer-associated germline and somatic genetic variants. The factors important in each step of the analysis pipeline are reviewed, as are some of the publicly available annotation tools. Given the range of indications and uses of cancer sequencing assays, including diagnosis, staging, prognostication, theranostics, and residual disease detection, the need for flexible methods for scoring of variants is discussed. The usefulness of protein prediction tools and multimodal risk-based or Bayesian approaches are highlighted. Using TET2 variants encountered in hematologic neoplasms, several examples of this multifactorial approach to classifying sequence variants of unknown significance are presented. Although there are still significant gaps in the publicly available data for many cancer genes that limit the broad application of explicit algorithms for variant scoring, the elements of a more rigorous model are outlined. Copyright © 2015 American Society for Investigative Pathology and the Association for Molecular Pathology. Published by Elsevier Inc. All rights reserved.

  16. BETASEQ: a powerful novel method to control type-I error inflation in partially sequenced data for rare variant association testing.

    PubMed

    Yan, Song; Li, Yun

    2014-02-15

    Despite its great capability to detect rare variant associations, next-generation sequencing is still prohibitively expensive when applied to large samples. In case-control studies, it is thus appealing to sequence only a subset of cases to discover variants and genotype the identified variants in controls and the remaining cases under the reasonable assumption that causal variants are usually enriched among cases. However, this approach leads to inflated type-I error if analyzed naively for rare variant association. Several methods have been proposed in recent literature to control type-I error at the cost of either excluding some sequenced cases or correcting the genotypes of discovered rare variants. All of these approaches therefore suffer some information loss and are underpowered. We propose a novel method (BETASEQ), which corrects inflation of type-I error by supplementing pseudo-variants while keeping the original sequence and genotype data intact. Extensive simulations and real data analysis demonstrate that, in most practical situations, BETASEQ leads to higher testing power than existing approaches with guaranteed (controlled or conservative) type-I error. BETASEQ and associated R files, including documentation and examples, are available at http://www.unc.edu/~yunmli/betaseq

  17. Identification of Five Novel Variants in Chinese Oculocutaneous Albinism by Targeted Next-Generation Sequencing.

    PubMed

    Qiu, Biyuan; Ma, Tao; Peng, Chunyan; Zheng, Xiaoqin; Yang, Jiyun

    2018-04-01

    The diagnosis of oculocutaneous albinism (OCA) is established using clinical signs and symptoms. OCA is, however, a highly genetically heterogeneous disease with mutations identified in at least nineteen unique genes, many of which produce overlapping phenotypic traits. Thus, differentiating genetic OCA subtypes for diagnoses and genetic counseling is challenging based on clinical presentation alone, and would benefit from a comprehensive molecular diagnostic. The aim was to develop and validate a more comprehensive, targeted, next-generation-sequencing-based diagnostic for the identification of OCA-causing variants. The genomic DNA samples from 28 OCA probands were analyzed by targeted next-generation sequencing (NGS), and the candidate variants were confirmed through Sanger sequencing. We observed mutations in the TYR, OCA2, and SLC45A2 genes in 25/28 (89%) patients with OCA. We identified 38 pathogenic variants among these three genes, including 5 novel variants: c.1970G>T (p.Gly657Val), c.1669A>C (p.Thr557Pro), c.2339-2A>C, and c.1349C>G (p.Thr450Arg) in OCA2; c.459_470delTTTTGCTGCCGA (p.Ala155_Phe158del) in SLC45A2. Our findings expand the mutational spectrum of OCA in the Chinese population, and the assay we developed should be broadly useful as a molecular diagnostic, and as an aid for genetic counseling for OCA patients.
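
    The variant names in this abstract follow HGVS-style notation (for example c.1970G>T at the coding DNA level and p.Gly657Val at the protein level). As a minimal sketch, assuming only simple substitutions need to be handled, such strings can be split into their parts with two regular expressions. The function names are hypothetical, and the patterns cover a small subset of the notation: deletions such as c.459_470delTTTTGCTGCCGA are deliberately not parsed and return None.

```python
import re

# Coding-DNA substitution, with an optional intronic offset (e.g. c.2339-2A>C).
CDNA_SUBSTITUTION = re.compile(r"^c\.(\d+)([+-]\d+)?([ACGT])>([ACGT])$")
# Protein-level missense change in three-letter amino acid code.
PROTEIN_MISSENSE = re.compile(r"^p\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})$")


def parse_cdna_substitution(text):
    """Return position, intronic offset, ref and alt base, or None if unsupported."""
    match = CDNA_SUBSTITUTION.match(text)
    if match is None:
        return None
    position, offset, ref, alt = match.groups()
    return {
        "position": int(position),
        "offset": int(offset) if offset else 0,
        "ref": ref,
        "alt": alt,
    }


def parse_protein_missense(text):
    """Return (reference residue, position, new residue), or None if unsupported."""
    match = PROTEIN_MISSENSE.match(text)
    if match is None:
        return None
    ref, position, alt = match.groups()
    return ref, int(position), alt


for name in ["c.1970G>T", "c.2339-2A>C", "c.459_470delTTTTGCTGCCGA"]:
    print(name, "->", parse_cdna_substitution(name))
print("p.Gly657Val ->", parse_protein_missense("p.Gly657Val"))
```

    Returning None for anything outside the supported subset keeps the sketch honest about its limits; a production pipeline would more likely rely on a dedicated HGVS parsing library.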

  18. Integrating multiple genomic data to predict disease-causing nonsynonymous single nucleotide variants in exome sequencing studies.

    PubMed

    Wu, Jiaxin; Li, Yanda; Jiang, Rui

    2014-03-01

    Exome sequencing has been widely used in detecting pathogenic nonsynonymous single nucleotide variants (SNVs) for human inherited diseases. However, traditional statistical genetics methods are ineffective in analyzing exome sequencing data, due to such facts as the large number of sequenced variants, the presence of non-negligible fraction of pathogenic rare variants or de novo mutations, and the limited size of affected and normal populations. Indeed, prevalent applications of exome sequencing have been appealing for an effective computational method for identifying causative nonsynonymous SNVs from a large number of sequenced variants. Here, we propose a bioinformatics approach called SPRING (Snv PRioritization via the INtegration of Genomic data) for identifying pathogenic nonsynonymous SNVs for a given query disease. Based on six functional effect scores calculated by existing methods (SIFT, PolyPhen2, LRT, MutationTaster, GERP and PhyloP) and five association scores derived from a variety of genomic data sources (gene ontology, protein-protein interactions, protein sequences, protein domain annotations and gene pathway annotations), SPRING calculates the statistical significance that an SNV is causative for a query disease and hence provides a means of prioritizing candidate SNVs. With a series of comprehensive validation experiments, we demonstrate that SPRING is valid for diseases whose genetic bases are either partly known or completely unknown and effective for diseases with a variety of inheritance styles. In applications of our method to real exome sequencing data sets, we show the capability of SPRING in detecting causative de novo mutations for autism, epileptic encephalopathies and intellectual disability. We further provide an online service, the standalone software and genome-wide predictions of causative SNVs for 5,080 diseases at http://bioinfo.au.tsinghua.edu.cn/spring.

  19. Method for identifying and quantifying nucleic acid sequence aberrations

    DOEpatents

    Lucas, Joe N.; Straume, Tore; Bogen, Kenneth T.

    1998-01-01

    A method for detecting nucleic acid sequence aberrations by detecting nucleic acid sequences having both a first and a second nucleic acid sequence type, the presence of the first and second sequence type on the same nucleic acid sequence indicating the presence of a nucleic acid sequence aberration. The method uses a first hybridization probe which includes a nucleic acid sequence that is complementary to a first sequence type and a first complexing agent capable of attaching to a second complexing agent and a second hybridization probe which includes a nucleic acid sequence that selectively hybridizes to the second nucleic acid sequence type over the first sequence type and includes a detectable marker for detecting the second hybridization probe.

  20. Characterization of the two intra-individual sequence variants in the 18S rRNA gene in the plant parasitic nematode, Rotylenchulus reniformis.

    PubMed

    Nyaku, Seloame T; Sripathi, Venkateswara R; Kantety, Ramesh V; Gu, Yong Q; Lawrence, Kathy; Sharma, Govind C

    2013-01-01

    The 18S rRNA gene is fundamental to cellular and organismal protein synthesis and because of its stable persistence through generations it is also used in phylogenetic analysis among taxa. Sequence variation in this gene within a single species is rare, but it has been observed in a few metazoan organisms. More frequently, it has been reported in the non-transcribed spacer region. Here, we have identified two sequence variants within the near full coding region of 18S rRNA gene from a single reniform nematode (RN) Rotylenchulus reniformis labeled as reniform nematode variant 1 (RN_VAR1) and variant 2 (RN_VAR2). All sequences from three of the four isolates had both RN variants in their sequences; however, isolate 13B had only RN variant 2 sequence. Specific variable base sites (96 or 5.5%) were found within the 18S rRNA gene that can clearly distinguish the two 18S rDNA variants of RN, in 11 (25.0%) and 33 (75.0%) of the 44 RN clones, for RN_VAR1 and RN_VAR2, respectively. Neighbor-joining trees show that the RN_VAR1 is very similar to the previously existing R. reniformis sequence in GenBank, while the RN_VAR2 sequence is more divergent. This is the first report of the identification of two major variants of the 18S rRNA gene in the same single RN, and documents the specific base variation between the two variants, and hypothesizes on simultaneous co-existence of these two variants for this gene.

  1. Characterization of the Two Intra-Individual Sequence Variants in the 18S rRNA Gene in the Plant Parasitic Nematode, Rotylenchulus reniformis

    PubMed Central

    Nyaku, Seloame T.; Sripathi, Venkateswara R.; Kantety, Ramesh V.; Gu, Yong Q.; Lawrence, Kathy; Sharma, Govind C.

    2013-01-01

    The 18S rRNA gene is fundamental to cellular and organismal protein synthesis and because of its stable persistence through generations it is also used in phylogenetic analysis among taxa. Sequence variation in this gene within a single species is rare, but it has been observed in a few metazoan organisms. More frequently, it has been reported in the non-transcribed spacer region. Here, we have identified two sequence variants within the near full coding region of 18S rRNA gene from a single reniform nematode (RN) Rotylenchulus reniformis labeled as reniform nematode variant 1 (RN_VAR1) and variant 2 (RN_VAR2). All sequences from three of the four isolates had both RN variants in their sequences; however, isolate 13B had only RN variant 2 sequence. Specific variable base sites (96 or 5.5%) were found within the 18S rRNA gene that can clearly distinguish the two 18S rDNA variants of RN, in 11 (25.0%) and 33 (75.0%) of the 44 RN clones, for RN_VAR1 and RN_VAR2, respectively. Neighbor-joining trees show that the RN_VAR1 is very similar to the previously existing R. reniformis sequence in GenBank, while the RN_VAR2 sequence is more divergent. This is the first report of the identification of two major variants of the 18S rRNA gene in the same single RN, and documents the specific base variation between the two variants, and hypothesizes on simultaneous co-existence of these two variants for this gene. PMID:23593343

  2. De novo assembly and next-generation sequencing to analyse full-length gene variants from codon-barcoded libraries.

    PubMed

    Cho, Namjin; Hwang, Byungjin; Yoon, Jung-ki; Park, Sangun; Lee, Joongoo; Seo, Han Na; Lee, Jeewon; Huh, Sunghoon; Chung, Jinsoo; Bang, Duhee

    2015-09-21

    Interpreting epistatic interactions is crucial for understanding evolutionary dynamics of complex genetic systems and unveiling structure and function of genetic pathways. Although high resolution mapping of en masse variant libraries enables molecular biologists to address genotype-phenotype relationships, long-read sequencing technology remains indispensable to assess functional relationships between mutations that lie far apart. Here, we introduce JigsawSeq for multiplexed sequence identification of pooled gene variant libraries by combining a codon-based molecular barcoding strategy and de novo assembly of short-read data. We first validate JigsawSeq on small sub-pools and observe high precision and recall at various experimental settings. With extensive simulations, we then apply JigsawSeq to large-scale gene variant libraries to show that our method can be reliably scaled using next-generation sequencing. JigsawSeq may serve as a rapid screening tool for functional genomics and offer the opportunity to explore evolutionary trajectories of protein variants.

  3. Novel Genetic Variants of Sporadic Atrial Septal Defect (ASD) in a Chinese Population Identified by Whole-Exome Sequencing (WES).

    PubMed

    Liu, Yong; Cao, Yu; Li, Yaxiong; Lei, Dongyun; Li, Lin; Hou, Zong Liu; Han, Shen; Meng, Mingyao; Shi, Jianlin; Zhang, Yayong; Wang, Yi; Niu, Zhaoyi; Xie, Yanhua; Xiao, Benshan; Wang, Yuanfei; Li, Xiao; Yang, Lirong; Wang, Wenju; Jiang, Lihong

    2018-03-05

    BACKGROUND Recently, mutations in several genes have been described to be associated with sporadic ASD, but some genetic variants remain to be identified. The aim of this study was to use whole-exome sequencing (WES) combined with bioinformatics analysis to identify novel genetic variants in cases of sporadic congenital ASD, followed by validation by Sanger sequencing. MATERIAL AND METHODS Five Han patients with secundum ASD were recruited, and their tissue samples were analyzed by WES, followed by verification by Sanger sequencing of tissue and blood samples. Further evaluation using blood samples included 452 additional patients with sporadic secundum ASD (212 male and 240 female patients) and 519 healthy subjects (252 male and 267 female subjects) for further verification by a multiplexed MassARRAY system. Bioinformatic analyses were performed to identify novel genetic variants associated with sporadic ASD. RESULTS From five patients with sporadic ASD, a total of 181,762 genomic variants in 33 exon loci, validated by Sanger sequencing, were selected and underwent MassARRAY analysis in 452 patients with ASD and 519 healthy subjects. Three loci with high mutation frequencies, the 138665410 FOXL2 gene variant, the 23862952 MYH6 gene variant, and the 71098693 HYDIN gene variant were found to be significantly associated with sporadic ASD (P<0.05); variants in FOXL2 and MYH6 were found in patients with isolated, sporadic ASD (P<5×10^-4). CONCLUSIONS This was the first study that demonstrated variants in FOXL2 and HYDIN associated with sporadic ASD, and supported the use of WES and bioinformatics analysis to identify disease-associated mutations.

  4. The Saccharomyces Genome Database Variant Viewer.

    PubMed

    Sheppard, Travis K; Hitz, Benjamin C; Engel, Stacia R; Song, Giltae; Balakrishnan, Rama; Binkley, Gail; Costanzo, Maria C; Dalusag, Kyla S; Demeter, Janos; Hellerstedt, Sage T; Karra, Kalpana; Nash, Robert S; Paskov, Kelley M; Skrzypek, Marek S; Weng, Shuai; Wong, Edith D; Cherry, J Michael

    2016-01-04

    The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org) is the authoritative community resource for the Saccharomyces cerevisiae reference genome sequence and its annotation. In recent years, we have moved toward increased representation of sequence variation and allelic differences within S. cerevisiae. The publication of numerous additional genomes has motivated the creation of new tools for their annotation and analysis. Here we present the Variant Viewer: a dynamic open-source web application for the visualization of genomic and proteomic differences. Multiple sequence alignments have been constructed across high quality genome sequences from 11 different S. cerevisiae strains and stored in the SGD. The alignments and summaries are encoded in JSON and used to create a two-tiered dynamic view of the budding yeast pan-genome, available at http://www.yeastgenome.org/variant-viewer. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  5. Method for identifying and quantifying nucleic acid sequence aberrations

    DOEpatents

    Lucas, J.N.; Straume, T.; Bogen, K.T.

    1998-07-21

    A method is disclosed for detecting nucleic acid sequence aberrations by detecting nucleic acid sequences having both a first and a second nucleic acid sequence type, the presence of the first and second sequence type on the same nucleic acid sequence indicating the presence of a nucleic acid sequence aberration. The method uses a first hybridization probe which includes a nucleic acid sequence that is complementary to a first sequence type and a first complexing agent capable of attaching to a second complexing agent and a second hybridization probe which includes a nucleic acid sequence that selectively hybridizes to the second nucleic acid sequence type over the first sequence type and includes a detectable marker for detecting the second hybridization probe. 11 figs.

  6. Harnessing Omics Big Data in Nine Vertebrate Species by Genome-Wide Prioritization of Sequence Variants with the Highest Predicted Deleterious Effect on Protein Function.

    PubMed

    Rozman, Vita; Kunej, Tanja

    2018-05-10

    Harnessing the genomics big data requires innovation in how we extract and interpret biologically relevant variants. Currently, there is no established catalog of prioritized missense variants associated with deleterious protein function phenotypes. We report in this study, to the best of our knowledge, the first genome-wide prioritization of sequence variants with the most deleterious effect on protein function (potentially deleterious variants [pDelVars]) in nine vertebrate species: human, cattle, horse, sheep, pig, dog, rat, mouse, and zebrafish. The analysis was conducted using the Ensembl/BioMart tool. Genes comprising pDelVars in the highest number of examined species were identified using a Python script. Multiple genomic alignments of the selected genes were built to identify interspecies orthologous potentially deleterious variants, which we defined as the "ortho-pDelVars." Genome-wide prioritization revealed that in humans, 0.12% of the known variants are predicted to be deleterious. In seven out of nine examined vertebrate species, the genes encoding the multiple PDZ domain crumbs cell polarity complex component (MPDZ) and the transforming acidic coiled-coil containing protein 2 (TACC2) comprise pDelVars. Five interspecies ortho-pDelVars were identified in three genes. These findings offer new ways to harness genomics big data by facilitating the identification of functional polymorphisms in humans and animal models and thus provide a future basis for optimization of protocols for whole genome prioritization of pDelVars and screening of orthologous sequence variants. The approach presented here can inform various postgenomic applications such as personalized medicine and multiomics study of health interventions (iatromics).
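
    The abstract mentions that genes comprising pDelVars in the highest number of examined species were identified using a Python script. That script is not reproduced in the record, so the following is only a hedged sketch of how such a tally could be done. It assumes the per-species gene lists have already been exported (for example from Ensembl/BioMart) into a mapping from species name to gene symbols. The function name, species labels, and gene symbols are placeholders, not data from the study.

```python
from collections import Counter


def genes_in_most_species(pdelvar_genes_by_species):
    """Find the genes that carry pDelVars in the largest number of species.

    pdelvar_genes_by_species maps a species name to an iterable of gene
    symbols with at least one potentially deleterious variant.
    Returns (number_of_species, sorted list of genes reaching that number).
    """
    species_per_gene = Counter()
    for genes in pdelvar_genes_by_species.values():
        # set() ensures each gene is counted once per species.
        species_per_gene.update(set(genes))
    if not species_per_gene:
        return 0, []
    top = max(species_per_gene.values())
    return top, sorted(g for g, n in species_per_gene.items() if n == top)


# Placeholder input, invented for illustration only.
toy_input = {
    "species_1": ["GENE_A", "GENE_B"],
    "species_2": ["GENE_A"],
    "species_3": ["GENE_A", "GENE_B", "GENE_C"],
}
top_count, top_genes = genes_in_most_species(toy_input)
print(top_count, top_genes)
```

    Counting species per gene, rather than variants per gene, matches the criterion stated in the abstract; comparing gene symbols across species also presumes that orthologs share a symbol, which would need checking in real data.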

  7. Genetic Mapping and Exome Sequencing Identify Variants Associated with Five Novel Diseases

    PubMed Central

    Puffenberger, Erik G.; Jinks, Robert N.; Sougnez, Carrie; Cibulskis, Kristian; Willert, Rebecca A.; Achilly, Nathan P.; Cassidy, Ryan P.; Fiorentini, Christopher J.; Heiken, Kory F.; Lawrence, Johnny J.; Mahoney, Molly H.; Miller, Christopher J.; Nair, Devika T.; Politi, Kristin A.; Worcester, Kimberly N.; Setton, Roni A.; DiPiazza, Rosa; Sherman, Eric A.; Eastman, James T.; Francklyn, Christopher; Robey-Bond, Susan; Rider, Nicholas L.; Gabriel, Stacey; Morton, D. Holmes; Strauss, Kevin A.

    2012-01-01

    The Clinic for Special Children (CSC) has integrated biochemical and molecular methods into a rural pediatric practice serving Old Order Amish and Mennonite (Plain) children. Among the Plain people, we have used single nucleotide polymorphism (SNP) microarrays to genetically map recessive disorders to large autozygous haplotype blocks (mean = 4.4 Mb) that contain many genes (mean = 79). For some, uninformative mapping or large gene lists preclude disease-gene identification by Sanger sequencing. Seven such conditions were selected for exome sequencing at the Broad Institute; all had been previously mapped at the CSC using low density SNP microarrays coupled with autozygosity and linkage analyses. Using between 1 and 5 patient samples per disorder, we identified sequence variants in the known disease-causing genes SLC6A3 and FLVCR1, and present evidence to strongly support the pathogenicity of variants identified in TUBGCP6, BRAT1, SNIP1, CRADD, and HARS. Our results reveal the power of coupling new genotyping technologies to population-specific genetic knowledge and robust clinical data. PMID:22279524

  8. Exome sequencing and genome-wide linkage analysis in 17 families illustrate the complex contribution of TTN truncating variants to dilated cardiomyopathy.

    PubMed

    Norton, Nadine; Li, Duanxiang; Rampersaud, Evadnie; Morales, Ana; Martin, Eden R; Zuchner, Stephan; Guo, Shengru; Gonzalez, Michael; Hedges, Dale J; Robertson, Peggy D; Krumm, Niklas; Nickerson, Deborah A; Hershberger, Ray E

    2013-04-01

    BACKGROUND- Familial dilated cardiomyopathy (DCM) is a genetically heterogeneous disease with >30 known genes. TTN truncating variants were recently implicated in a candidate gene study to cause 25% of familial and 18% of sporadic DCM cases. METHODS AND RESULTS- We used an unbiased genome-wide approach using both linkage analysis and variant filtering across the exome sequences of 48 individuals affected with DCM from 17 families to identify genetic cause. Linkage analysis ranked the TTN region as falling under the second highest genome-wide multipoint linkage peak, multipoint logarithm of odds, 1.59. We identified 6 TTN truncating variants carried by individuals affected with DCM in 7 of 17 DCM families (logarithm of odds, 2.99); 2 of these 7 families also had novel missense variants that segregated with disease. Two additional novel truncating TTN variants did not segregate with DCM. Nucleotide diversity at the TTN locus, including missense variants, was comparable with 5 other known DCM genes. The average number of missense variants in the exome sequences from the DCM cases or the ≈5400 cases from the Exome Sequencing Project was ≈23 per individual. The average number of TTN truncating variants in the Exome Sequencing Project was 0.014 per individual. We also identified a region (chr9q21.11-q22.31) with no known DCM genes with a maximum heterogeneity logarithm of odds score of 1.74. CONCLUSIONS- These data suggest that TTN truncating variants contribute to DCM cause. However, the lack of segregation of all identified TTN truncating variants illustrates the challenge of determining variant pathogenicity even with full exome sequencing.

  9. Parental origin of sequence variants associated with complex diseases.

    PubMed

    Kong, Augustine; Steinthorsdottir, Valgerdur; Masson, Gisli; Thorleifsson, Gudmar; Sulem, Patrick; Besenbacher, Soren; Jonasdottir, Aslaug; Sigurdsson, Asgeir; Kristinsson, Kari Th; Jonasdottir, Adalbjorg; Frigge, Michael L; Gylfason, Arnaldur; Olason, Pall I; Gudjonsson, Sigurjon A; Sverrisson, Sverrir; Stacey, Simon N; Sigurgeirsson, Bardur; Benediktsdottir, Kristrun R; Sigurdsson, Helgi; Jonsson, Thorvaldur; Benediktsson, Rafn; Olafsson, Jon H; Johannsson, Oskar Th; Hreidarsson, Astradur B; Sigurdsson, Gunnar; Ferguson-Smith, Anne C; Gudbjartsson, Daniel F; Thorsteinsdottir, Unnur; Stefansson, Kari

    2009-12-17

    Effects of susceptibility variants may depend on from which parent they are inherited. Although many associations between sequence variants and human traits have been discovered through genome-wide associations, the impact of parental origin has largely been ignored. Here we show that for 38,167 Icelanders genotyped using single nucleotide polymorphism (SNP) chips, the parental origin of most alleles can be determined. For this we used a combination of genealogy and long-range phasing. We then focused on SNPs that associate with diseases and are within 500 kilobases of known imprinted genes. Seven independent SNP associations were examined. Five (one with breast cancer, one with basal-cell carcinoma and three with type 2 diabetes) have parental-origin-specific associations. These variants are located in two genomic regions, 11p15 and 7q32, each harbouring a cluster of imprinted genes. Furthermore, we observed a novel association between the SNP rs2334499 at 11p15 and type 2 diabetes. Here the allele that confers risk when paternally inherited is protective when maternally transmitted. We identified a differentially methylated CTCF-binding site at 11p15 and demonstrated correlation of rs2334499 with decreased methylation of that site.

  10. Variant discovery in the sheep milk transcriptome using RNA sequencing.

    PubMed

    Suárez-Vega, Aroa; Gutiérrez-Gil, Beatriz; Klopp, Christophe; Tosser-Klopp, Gwenola; Arranz, Juan José

    2017-02-15

    The identification of genetic variation underlying desired phenotypes is one of the main challenges of current livestock genetic research. High-throughput transcriptome sequencing (RNA-Seq) offers new opportunities for the detection of transcriptome variants (SNPs and short indels) in different tissues and species. In this study, we used RNA-Seq on Milk Sheep Somatic Cells (MSCs) with the goal of characterizing the genetic variation within the coding regions of the milk transcriptome in Churra and Assaf sheep, two common dairy sheep breeds farmed in Spain. A total of 216,637 variants were detected in the MSCs transcriptome of the eight ewes analyzed. Among them, a total of 57,795 variants were detected in the regions harboring Quantitative Trait Loci (QTL) for milk yield, protein percentage and fat percentage, of which 21.44% were novel variants. Among the total variants detected, 561 (2.52%) and 1,649 (7.42%) were predicted to produce high or moderate impact changes in the corresponding transcriptional unit, respectively. In the functional enrichment analysis of the genes positioned within selected QTL regions harboring novel relevant functional variants (high and moderate impact), the KEGG pathway with the highest enrichment was "protein processing in endoplasmic reticulum". Additionally, a total of 504 and 1,063 variants were identified in the genes encoding principal milk proteins and molecules involved in the lipid metabolism, respectively. Of these variants, 20 mutations were found to have putative relevant effects on the encoded proteins. We present herein the first transcriptomic approach aimed at identifying genetic variants of the genes expressed in the lactating mammary gland of sheep. Through the transcriptome analysis of variability within regions harboring QTL for milk yield, protein percentage and fat percentage, we have found several pathways and genes that harbor mutations that could affect dairy production traits. Moreover, remarkable variants

  11. Novel Genetic Variants of Sporadic Atrial Septal Defect (ASD) in a Chinese Population Identified by Whole-Exome Sequencing (WES)

    PubMed Central

    Liu, Yong; Cao, Yu; Li, Yaxiong; Lei, Dongyun; Li, Lin; Hou, Zong Liu; Han, Shen; Meng, Mingyao; Shi, Jianlin; Zhang, Yayong; Wang, Yi; Niu, Zhaoyi; Xie, Yanhua; Xiao, Benshan; Wang, Yuanfei; Li, Xiao; Yang, Lirong

    2018-01-01

    Background Recently, mutations in several genes have been described to be associated with sporadic ASD, but some genetic variants remain to be identified. The aim of this study was to use whole-exome sequencing (WES) combined with bioinformatics analysis to identify novel genetic variants in cases of sporadic congenital ASD, followed by validation by Sanger sequencing. Material/Methods Five Han patients with secundum ASD were recruited, and their tissue samples were analyzed by WES, followed by verification by Sanger sequencing of tissue and blood samples. Further evaluation using blood samples included 452 additional patients with sporadic secundum ASD (212 male and 240 female patients) and 519 healthy subjects (252 male and 267 female subjects) for further verification by a multiplexed MassARRAY system. Bioinformatic analyses were performed to identify novel genetic variants associated with sporadic ASD. Results From five patients with sporadic ASD, a total of 181,762 genomic variants in 33 exon loci, validated by Sanger sequencing, were selected and underwent MassARRAY analysis in 452 patients with ASD and 519 healthy subjects. Three loci with high mutation frequencies, the 138665410 FOXL2 gene variant, the 23862952 MYH6 gene variant, and the 71098693 HYDIN gene variant were found to be significantly associated with sporadic ASD (P<0.05); variants in FOXL2 and MYH6 were found in patients with isolated, sporadic ASD (P<5×10−4). Conclusions This was the first study that demonstrated variants in FOXL2 and HYDIN associated with sporadic ASD, and supported the use of WES and bioinformatics analysis to identify disease-associated mutations. PMID:29505555

  12. Whole-exome sequencing identifies common and rare variant metabolic QTLs in a Middle Eastern population.

    PubMed

    Yousri, Noha A; Fakhro, Khalid A; Robay, Amal; Rodriguez-Flores, Juan L; Mohney, Robert P; Zeriri, Hassina; Odeh, Tala; Kader, Sara Abdul; Aldous, Eman K; Thareja, Gaurav; Kumar, Manish; Al-Shakaki, Alya; Chidiac, Omar M; Mohamoud, Yasmin A; Mezey, Jason G; Malek, Joel A; Crystal, Ronald G; Suhre, Karsten

    2018-01-23

    Metabolomics-genome-wide association studies (mGWAS) have uncovered many metabolic quantitative trait loci (mQTLs) influencing human metabolic individuality, though predominantly in European cohorts. By combining whole-exome sequencing with a high-resolution metabolomics profiling for a highly consanguineous Middle Eastern population, we discover 21 common variant and 12 functional rare variant mQTLs, of which 45% are novel altogether. We fine-map 10 common variant mQTLs to new metabolite ratio associations, and 11 common variant mQTLs to putative protein-altering variants. This is the first work to report common and rare variant mQTLs linked to diseases and/or pharmacological targets in a consanguineous Arab cohort, with wide implications for precision medicine in the Middle East.

  13. Solid phase sequencing of double-stranded nucleic acids

    DOEpatents

    Fu, Dong-Jing; Cantor, Charles R.; Koster, Hubert; Smith, Cassandra L.

    2002-01-01

    This invention relates to methods for detecting and sequencing of target double-stranded nucleic acid sequences, to nucleic acid probes and arrays of probes useful in these methods, and to kits and systems which contain these probes. Useful methods involve hybridizing the nucleic acids or nucleic acids which represent complementary or homologous sequences of the target to an array of nucleic acid probes. These probes comprise a single-stranded portion, an optional double-stranded portion and a variable sequence within the single-stranded portion. The molecular weights of the hybridized nucleic acids of the set can be determined by mass spectroscopy, and the sequence of the target determined from the molecular weights of the fragments. Nucleic acids whose sequences can be determined include nucleic acids in biological samples such as patient biopsies and environmental samples. Probes may be fixed to a solid support such as a hybridization chip to facilitate automated determination of molecular weights and identification of the target sequence.

  14. When is it MODY? Challenges in the Interpretation of Sequence Variants in MODY Genes

    PubMed Central

    Althari, Sara; Gloyn, Anna L.

    2015-01-01

    The genomics revolution has raised more questions than it has provided answers. Big data from large population-scale resequencing studies are increasingly deconstructing classic notions of Mendelian disease genetics, which support a simplistic correlation between mutational severity and phenotypic outcome. The boundaries are being blurred as the body of evidence showing monogenic disease-causing alleles in healthy genomes, and in the genomes of individuals with increased common complex disease risk, continues to grow. In this review, we focus on the newly emerging challenges which pertain to the interpretation of sequence variants in genes implicated in the pathogenesis of maturity-onset diabetes of the young (MODY), a presumed monogenic form of diabetes characterized by Mendelian inheritance. These challenges highlight the complexities surrounding the assignments of pathogenicity, in particular to rare protein-altering variants, and bring to the forefront some profound clinical diagnostic implications. As MODY is both genetically and clinically heterogeneous, an accurate molecular diagnosis and cautious extrapolation of sequence data are critical to effective disease management and treatment. The biological and translational value of sequence information can only be attained by adopting a multitude of confirmatory analyses, which interrogate variant implication in disease from every possible angle. Indeed, studies which have effectively detected rare damaging variants in known MODY genes in normoglycemic individuals question the existence of a single gene mutation scenario: does monogenic diabetes exist when the genetic culprits of MODY have been systematically identified in individuals without MODY? PMID:27111119

  15. Identifying novel sequence variants of RNA 3D motifs

    PubMed Central

    Zirbel, Craig L.; Roll, James; Sweeney, Blake A.; Petrov, Anton I.; Pirrung, Meg; Leontis, Neocles B.

    2015-01-01

    Predicting RNA 3D structure from sequence is a major challenge in biophysics. An important sub-goal is accurately identifying recurrent 3D motifs from RNA internal and hairpin loop sequences extracted from secondary structure (2D) diagrams. We have developed and validated new probabilistic models for 3D motif sequences based on hybrid Stochastic Context-Free Grammars and Markov Random Fields (SCFG/MRF). The SCFG/MRF models are constructed using atomic-resolution RNA 3D structures. To parameterize each model, we use all instances of each motif found in the RNA 3D Motif Atlas and annotations of pairwise nucleotide interactions generated by the FR3D software. Isostericity relations between non-Watson–Crick basepairs are used in scoring sequence variants. SCFG techniques model nested pairs and insertions, while MRF ideas handle crossing interactions and base triples. We use test sets of randomly-generated sequences to set acceptance and rejection thresholds for each motif group and thus control the false positive rate. Validation was carried out by comparing results for four motif groups to RMDetect. The software developed for sequence scoring (JAR3D) is structured to automatically incorporate new motifs as they accumulate in the RNA 3D Motif Atlas when new structures are solved and is available free for download. PMID:26130723
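
    The thresholding step described in this abstract (scoring randomly generated sequences to fix a cutoff that bounds the false positive rate) can be illustrated with a minimal sketch. The scoring function `toy_score`, the consensus `GAAA`, the sequence length, and the 5% level are all assumptions chosen for illustration; they are not the SCFG/MRF models or the thresholds used by JAR3D.

```python
import math
import random


def toy_score(seq, consensus="GAAA"):
    """Hypothetical stand-in for a motif model: best window match count."""
    best = 0
    width = len(consensus)
    for i in range(len(seq) - width + 1):
        matches = sum(a == b for a, b in zip(seq[i:i + width], consensus))
        best = max(best, matches)
    return best


def acceptance_threshold(null_scores, alpha):
    """Return a cutoff that at most a fraction alpha of null scores exceed."""
    ordered = sorted(null_scores)
    k = math.ceil((1 - alpha) * len(ordered))
    return ordered[k - 1]


# Score random sequences (the null set) and derive the cutoff from them.
rng = random.Random(0)
null_seqs = ["".join(rng.choice("ACGU") for _ in range(12)) for _ in range(1000)]
null_scores = [toy_score(s) for s in null_seqs]
cutoff = acceptance_threshold(null_scores, alpha=0.05)

# A sequence is accepted only if it scores strictly above the cutoff.
false_positive_rate = sum(s > cutoff for s in null_scores) / len(null_scores)
```

    Because acceptance requires a score strictly above the cutoff, the empirical false positive rate on the null set cannot exceed the chosen level, even when scores are tied.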

  16. Localized structural frustration for evaluating the impact of sequence variants

    PubMed Central

    Kumar, Sushant; Clarke, Declan; Gerstein, Mark

    2016-01-01

    Population-scale sequencing is increasingly uncovering large numbers of rare single-nucleotide variants (SNVs) in coding regions of the genome. The rarity of these variants makes it challenging to evaluate their deleteriousness with conventional phenotype–genotype associations. Protein structures provide a way of addressing this challenge. Previous efforts have focused on globally quantifying the impact of SNVs on protein stability. However, local perturbations may severely impact protein functionality without strongly disrupting global stability (e.g. in relation to catalysis or allostery). Here, we describe a workflow in which localized frustration, quantifying unfavorable local interactions, is employed as a metric to investigate such effects. Using this workflow on the Protein Databank, we find that frustration produces many immediately intuitive results: for instance, disease-related SNVs create stronger changes in localized frustration than non-disease related variants, and rare SNVs tend to disrupt local interactions to a larger extent than common variants. Less obviously, we observe that somatic SNVs associated with oncogenes and tumor suppressor genes (TSGs) induce very different changes in frustration. In particular, those associated with TSGs change the frustration more in the core than the surface (by introducing loss-of-function events), whereas those associated with oncogenes manifest the opposite pattern, creating gain-of-function events. PMID:27915290

  17. Structural comparisons of two allelic variants of human placental alkaline phosphatase.

    PubMed

    Millán, J L; Stigbrand, T; Jörnvall, H

    1985-01-01

    A simple immunosorbent purification scheme based on monoclonal antibodies has been devised for human placental alkaline phosphatase. The two most common allelic variants, S and F, have similar amino acid compositions with identical N-terminal amino acid sequences through the first 13 residues. Both variants have identical lectin binding properties towards concanavalin A, lentil-lectin, wheat germ agglutinin, phytohemagglutinin and soybean agglutinin, and identical carbohydrate contents as revealed by methylation analysis. CNBr fragments of the variants demonstrate identical high performance liquid chromatography patterns. The carbohydrate containing fragment is different from the 32P-labeled active site fragment and the N-terminal fragment.

  18. Next-generation DNA sequencing identifies novel gene variants and pathways involved in specific language impairment.

    PubMed

    Chen, Xiaowei Sylvia; Reader, Rose H; Hoischen, Alexander; Veltman, Joris A; Simpson, Nuala H; Francks, Clyde; Newbury, Dianne F; Fisher, Simon E

    2017-04-25

    A significant proportion of children have unexplained problems acquiring proficient linguistic skills despite adequate intelligence and opportunity. Developmental language disorders are highly heritable with substantial societal impact. Molecular studies have begun to identify candidate loci, but much of the underlying genetic architecture remains undetermined. We performed whole-exome sequencing of 43 unrelated probands affected by severe specific language impairment, followed by independent validations with Sanger sequencing, and analyses of segregation patterns in parents and siblings, to shed new light on aetiology. By first focusing on a pre-defined set of known candidates from the literature, we identified potentially pathogenic variants in genes already implicated in diverse language-related syndromes, including ERC1, GRIN2A, and SRPX2. Complementary analyses suggested novel putative candidates carrying validated variants which were predicted to have functional effects, such as OXR1, SCN9A and KMT2D. We also searched for potential "multiple-hit" cases; one proband carried a rare AUTS2 variant in combination with a rare inherited haplotype affecting STARD9, while another carried a novel nonsynonymous variant in SEMA6D together with a rare stop-gain in SYNPR. On broadening scope to all rare and novel variants throughout the exomes, we identified biological themes that were enriched for such variants, including microtubule transport and cytoskeletal regulation.

  19. Next-generation DNA sequencing identifies novel gene variants and pathways involved in specific language impairment

    PubMed Central

    Chen, Xiaowei Sylvia; Reader, Rose H.; Hoischen, Alexander; Veltman, Joris A.; Simpson, Nuala H.; Francks, Clyde; Newbury, Dianne F.; Fisher, Simon E.

    2017-01-01

    A significant proportion of children have unexplained problems acquiring proficient linguistic skills despite adequate intelligence and opportunity. Developmental language disorders are highly heritable with substantial societal impact. Molecular studies have begun to identify candidate loci, but much of the underlying genetic architecture remains undetermined. We performed whole-exome sequencing of 43 unrelated probands affected by severe specific language impairment, followed by independent validations with Sanger sequencing, and analyses of segregation patterns in parents and siblings, to shed new light on aetiology. By first focusing on a pre-defined set of known candidates from the literature, we identified potentially pathogenic variants in genes already implicated in diverse language-related syndromes, including ERC1, GRIN2A, and SRPX2. Complementary analyses suggested novel putative candidates carrying validated variants which were predicted to have functional effects, such as OXR1, SCN9A and KMT2D. We also searched for potential “multiple-hit” cases; one proband carried a rare AUTS2 variant in combination with a rare inherited haplotype affecting STARD9, while another carried a novel nonsynonymous variant in SEMA6D together with a rare stop-gain in SYNPR. On broadening scope to all rare and novel variants throughout the exomes, we identified biological themes that were enriched for such variants, including microtubule transport and cytoskeletal regulation. PMID:28440294

  20. A Unified Mixed-Effects Model for Rare-Variant Association in Sequencing Studies

    PubMed Central

    Sun, Jianping; Zheng, Yingye; Hsu, Li

    2013-01-01

    For rare-variant association analysis, due to the extremely low frequencies of these variants, it is necessary to aggregate them by a prior set (e.g., genes and pathways) in order to achieve adequate power. In this paper, we consider hierarchical models to relate a set of rare variants to phenotype by modeling the effects of variants as a function of variant characteristics while allowing for variant-specific effect (heterogeneity). We derive a set of two score statistics, testing the group effect by variant characteristics and the heterogeneity effect. We make a novel modification to these score statistics so that they are independent under the null hypothesis and their asymptotic distributions can be derived. As a result, the computational burden is greatly reduced compared with permutation-based tests. Our approach provides a general testing framework for rare-variant association, which includes many commonly used tests, such as the burden test [Li and Leal, 2008] and the sequence kernel association test [Wu et al., 2011], as special cases. Furthermore, in contrast to these tests, our proposed test has an added capacity to identify which components of variant characteristics and heterogeneity contribute to the association. Simulations under a wide range of scenarios show that the proposed test is valid, robust and powerful. An application to the Dallas Heart Study illustrates that apart from identifying genes with significant associations, the new method also provides additional information regarding the source of the association. Such information may be useful for generating hypotheses in future studies. PMID:23483651
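
    The aggregation idea behind burden-type tests, which the abstract names as a special case, can be sketched as follows. This is only an illustration of collapsing rare variants within a set, not the mixed-effects score test proposed in the paper; the genotype matrix, the allele frequencies, and the 1% cutoff are hypothetical values.

```python
def burden_scores(genotypes, mafs, maf_cutoff=0.01):
    """Sum minor-allele counts over variants rarer than maf_cutoff, per individual."""
    rare = [j for j, freq in enumerate(mafs) if freq < maf_cutoff]
    return [sum(row[j] for j in rare) for row in genotypes]


def carrier_indicator(scores):
    """Collapse each burden score to 1 (carries any rare allele) or 0."""
    return [1 if s > 0 else 0 for s in scores]


# Hypothetical data: rows are individuals, columns are variants in one gene,
# entries are minor-allele counts (0, 1 or 2).
genotypes = [
    [0, 1, 0, 2],
    [0, 0, 0, 1],
    [1, 0, 0, 0],
    [0, 0, 0, 2],
]
mafs = [0.005, 0.002, 0.001, 0.30]  # the last variant is common and is ignored

scores = burden_scores(genotypes, mafs)
carriers = carrier_indicator(scores)
```

    The per-individual score or carrier indicator would then be related to the phenotype in a regression model; that step is omitted here.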

  1. Functional Analyses of a Novel Splice Variant in the CHD7 Gene, Found by Next Generation Sequencing, Confirm Its Pathogenicity in a Spanish Patient and Diagnose Him with CHARGE Syndrome.

    PubMed

    Villate, Olatz; Ibarluzea, Nekane; Fraile-Bethencourt, Eugenia; Valenzuela, Alberto; Velasco, Eladio A; Grozeva, Detelina; Raymond, F L; Botella, María P; Tejada, María-Isabel

    2018-01-01

    Mutations in CHD7 have been shown to be a major cause of CHARGE syndrome, which presents many symptoms and features common to other syndromes, making its diagnosis difficult. Next generation sequencing (NGS) of a panel of intellectual disability related genes was performed in an adult patient without molecular diagnosis. A splice donor variant in CHD7 (c.5665+1G>T) was identified. To study its potential pathogenicity, exons and flanking intronic sequences were amplified from patient DNA and cloned into the pSAD® splicing vector. HeLa cells were transfected with this construct and a wild-type minigene, and functional analyses were performed. The construct with the c.5665+1G>T variant produced an aberrant transcript with an insert of 63 nucleotides of intron 28 creating a premature termination codon (TAG) 25 nucleotides downstream. This would lead to the insertion of 8 new amino acids and therefore a truncated 1896 amino acid protein. As a result of this, the patient was diagnosed with CHARGE syndrome. These results underline the usefulness of functional analyses for studying the pathogenicity of variants found by NGS and therefore for accurately diagnosing patients.
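
    The protein length quoted in this abstract can be checked with a short calculation. The sketch assumes that coding positions are counted from the first base of the start codon (so c.5665 is the last exonic base before the retained intronic sequence) and that the 8 new amino acids begin with the codon spanning the exon-intron junction; both are assumptions made for illustration.

```python
last_exonic_coding_position = 5665  # from the variant name c.5665+1G>T
new_amino_acids = 8                 # residues encoded before the premature TAG

# Codons that lie entirely within the native coding sequence.
complete_native_codons = last_exonic_coding_position // 3   # 1888
# Exonic bases left over, which pair with intronic bases in the next codon.
leftover_exonic_bases = last_exonic_coding_position % 3     # 1

truncated_protein_length = complete_native_codons + new_amino_acids
print(truncated_protein_length)
```

    Under these assumptions the result agrees with the 1896 amino acid truncated protein reported above.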

  2. Associations between variants of FADS genes and omega-3 and omega-6 milk fatty acids of Canadian Holstein cows

    PubMed Central

    2014-01-01

    Background Fatty acid desaturase 1 (FADS1) and 2 (FADS2) genes code respectively for the enzymes delta-5 and delta-6 desaturases which are rate limiting enzymes in the synthesis of polyunsaturated omega-3 and omega-6 fatty acids (FAs). Omega-3 and -6 FAs as well as conjugated linoleic acid (CLA) are present in bovine milk and have demonstrated positive health effects in humans. Studies in humans have shown significant relationships between genetic variants in FADS1 and 2 genes with plasma and tissue concentrations of omega-3 and -6 FAs. The aim of this study was to evaluate the extent of sequence variations within these two genes in Canadian Holstein cows as well as the association between sequence variants and health promoting FAs in milk. Results Thirty three SNPs were detected within the studied regions of genes including a synonymous mutation (FADS1-07, rs42187261, 306Tyr > Tyr) in exon 8 of FADS1, a non-synonymous mutation (FADS2-14, rs211580559, 294Ala > Val) within FADS2 exon 7, a splice site SNP (FADS2-05, rs211263660), a 3′UTR SNP (FADS2-23, rs109772589), and another 3′UTR SNP with an effect on a microRNA binding site within FADS2 gene (FADS2-19, rs210169303). Association analyses showed significant relations between three out of seven tested SNPs and several FAs. Significant associations (FDR P < 0.05) were recorded between FADS2-23 (rs109772589) and two omega-6 FAs (dihomo-gamma-linolenic acid [C20:3n6] and arachidonic acid [C20:4n6]), FADS1-07 (rs42187261) and one omega-3 FA (eicosapentaenoic acid, C20:5n3) and tricosanoic acid (C23:0), and one intronic SNP, FADS1-01 (rs136261927) and C20:3n6. Conclusion Our study has demonstrated positive associations between three SNPs within FADS1 and FADS2 genes (a SNP within the 3′UTR, a synonymous SNP and an intronic SNP), with three milk PUFAs of Canadian Holstein cows thus suggesting possible involvement of synonymous and non-coding region variants in FA synthesis. These SNPs may serve as

  3. A rare variant of the mtDNA HVS1 sequence in the hairs of Napoléon's family.

    PubMed

    Lucotte, Gérard

    2010-10-04

    This paper describes the finding of a rare variant in the sequence of the hypervariable segment (HVS1) of mitochondrial DNA (mtDNA) extracted from two preserved hairs, authenticated as belonging to the French Emperor Napoléon I (Napoléon Bonaparte). This rare variant is a mutation that changes the base C to T at position 16,184 (16184C→T), and it constitutes the only mutation found in this HVS1 sequence. This mutation is rare, because it was not found in a reference database (P < 0.05). In a personal database (M. Pala) comprising 37,000 different sequences, the 16184C→T mutation was found in only three samples; thus in this database the mutation frequency was approximately 0.00008 (about 0.008%). This mutation 16184C→T was also the only variant found subsequently in the HVS1 sequences of mtDNAs extracted from Napoléon's mother (Letizia) and from his youngest sister (Caroline), confirming that this mutation is maternally inherited. This 16184C→T variant could be used for genetic verification to authenticate any doubtful material and determine whether it should indeed be attributed to Napoléon.

  4. A rare variant of the mtDNA HVS1 sequence in the hairs of Napoléon's family

    PubMed Central

    2010-01-01

    This paper describes the finding of a rare variant in the sequence of the hypervariable segment (HVS1) of mitochondrial DNA (mtDNA) extracted from two preserved hairs, authenticated as belonging to the French Emperor Napoléon I (Napoléon Bonaparte). This rare variant is a mutation that changes the base C to T at position 16,184 (16184C→T), and it constitutes the only mutation found in this HVS1 sequence. This mutation is rare, because it was not found in a reference database (P < 0.05). In a personal database (M. Pala) comprising 37,000 different sequences, the 16184C→T mutation was found in only three samples; thus in this database the mutation frequency was approximately 0.00008 (about 0.008%). This mutation 16184C→T was also the only variant found subsequently in the HVS1 sequences of mtDNAs extracted from Napoléon's mother (Letizia) and from his youngest sister (Caroline), confirming that this mutation is maternally inherited. This 16184C→T variant could be used for genetic verification to authenticate any doubtful material and determine whether it should indeed be attributed to Napoléon. PMID:21092341
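
    The frequency of the 16184C→T variant in the 37,000-sequence database follows directly from the two counts given in the abstract. A minimal check, using only those counts:

```python
carriers = 3              # samples carrying 16184C>T
database_size = 37_000    # sequences in the personal database

proportion = carriers / database_size   # about 0.00008
percent = 100 * proportion              # about 0.008 %

print(f"proportion = {proportion:.5f}, percent = {percent:.4f}%")
```

    Three carriers in 37,000 sequences is a proportion of roughly 0.00008, which is about 0.008% when expressed as a percentage.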

  5. Rapid Detection of Rare Deleterious Variants by Next Generation Sequencing with Optional Microarray SNP Genotype Data

    PubMed Central

    Watson, Christopher M.; Crinnion, Laura A.; Gurgel‐Gianetti, Juliana; Harrison, Sally M.; Daly, Catherine; Antanavicuite, Agne; Lascelles, Carolina; Markham, Alexander F.; Pena, Sergio D. J.; Bonthron, David T.

    2015-01-01

    ABSTRACT Autozygosity mapping is a powerful technique for the identification of rare, autosomal recessive, disease‐causing genes. The ease with which this category of disease gene can be identified has greatly increased through the availability of genome‐wide SNP genotyping microarrays and subsequently of exome sequencing. Although these methods have simplified the generation of experimental data, its analysis, particularly when disparate data types must be integrated, remains time consuming. Moreover, the huge volume of sequence variant data generated from next generation sequencing experiments opens up the possibility of using these data instead of microarray genotype data to identify disease loci. To allow these two types of data to be used in an integrated fashion, we have developed AgileVCFMapper, a program that performs both the mapping of disease loci by SNP genotyping and the analysis of potentially deleterious variants using exome sequence variant data, in a single step. This method does not require microarray SNP genotype data, although analysis with a combination of microarray and exome genotype data enables more precise delineation of disease loci, due to superior marker density and distribution. PMID:26037133
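
    The core signal used in autozygosity mapping, long stretches of consecutive homozygous genotypes, can be sketched in a few lines. This is an illustrative toy and not the algorithm implemented in AgileVCFMapper: the function name, the genotype representation, and the minimum run length are assumptions, and a practical tool would also account for physical positions, missing calls, genotyping errors, and sharing between affected individuals.

```python
def homozygous_runs(genotypes, min_length=3):
    """Return inclusive (start, end) index pairs of consecutive homozygous calls."""
    runs = []
    start = None
    for i, (allele_a, allele_b) in enumerate(genotypes):
        if allele_a == allele_b:
            if start is None:
                start = i  # a new homozygous stretch begins here
        else:
            if start is not None and i - start >= min_length:
                runs.append((start, i - 1))
            start = None  # a heterozygous call ends the stretch
    if start is not None and len(genotypes) - start >= min_length:
        runs.append((start, len(genotypes) - 1))
    return runs


# Hypothetical ordered genotype calls along one chromosome.
calls = [("A", "A"), ("G", "G"), ("T", "T"), ("C", "T"),
         ("A", "A"), ("C", "C"), ("G", "G"), ("G", "G")]
runs = homozygous_runs(calls)
```

    Here the single heterozygous call at index 3 splits the markers into two candidate regions, which would then be searched for potentially deleterious variants.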

  6. Sequence variants in ARHGAP15, COLQ and FAM155A associate with diverticular disease and diverticulitis

    PubMed Central

    Sigurdsson, Snaevar; Alexandersson, Kristjan F.; Sulem, Patrick; Feenstra, Bjarke; Gudmundsdottir, Steinunn; Halldorsson, Gisli H.; Olafsson, Sigurgeir; Sigurdsson, Asgeir; Rafnar, Thorunn; Thorgeirsson, Thorgeir; Sørensen, Erik; Nordholm-Carstensen, Andreas; Burcharth, Jakob; Andersen, Jens; Jørgensen, Henrik Stig; Possfelt-Møller, Emma; Ullum, Henrik; Thorleifsson, Gudmar; Masson, Gisli; Thorsteinsdottir, Unnur; Melbye, Mads; Gudbjartsson, Daniel F.; Stefansson, Tryggvi; Jonsdottir, Ingileif; Stefansson, Kari

    2017-01-01

    Diverticular disease is characterized by pouches (that is, diverticulae) due to weakness in the bowel wall, which can become infected and inflamed causing diverticulitis, with potentially severe complications. Here, we test 32.4 million sequence variants identified through whole-genome sequencing (WGS) of 15,220 Icelanders for association with diverticular disease (5,426 cases) and its more severe form diverticulitis (2,764 cases). Subsequently, 16 sequence variants are followed up in a diverticular disease sample from Denmark (5,970 cases, 3,020 controls). In the combined Icelandic and Danish data sets we observe significant association of intronic variants in ARHGAP15 (Rho GTPase-activating protein 15; rs4662344-T: P=1.9 × 10−18, odds ratio (OR)=1.23) and COLQ (collagen-like tail subunit of asymmetric acetylcholinesterase; rs7609897-T: P=1.5 × 10−10, OR=0.87) with diverticular disease and in FAM155A (family with sequence similarity 155A; rs67153654-A: P=3.0 × 10−11, OR=0.82) with diverticulitis. These are the first loci shown to associate with diverticular disease in a genome-wide study. PMID:28585551

  7. A power set-based statistical selection procedure to locate susceptible rare variants associated with complex traits with sequencing data.

    PubMed

    Sun, Hokeun; Wang, Shuang

    2014-08-15

    Existing association methods for rare variants from sequencing data have focused on aggregating variants in a gene or a genetic region because of the fact that analysing individual rare variants is underpowered. However, these existing rare variant detection methods are not able to identify which rare variants in a gene or a genetic region of all variants are associated with the complex diseases or traits. Once phenotypic associations of a gene or a genetic region are identified, the natural next step in the association study with sequencing data is to locate the susceptible rare variants within the gene or the genetic region. In this article, we propose a power set-based statistical selection procedure that is able to identify the locations of the potentially susceptible rare variants within a disease-related gene or a genetic region. The selection performance of the proposed selection procedure was evaluated through simulation studies, where we demonstrated the feasibility and superior power over several comparable existing methods. In particular, the proposed method is able to handle the mixed effects when both risk and protective variants are present in a gene or a genetic region. The proposed selection procedure was also applied to the sequence data on the ANGPTL gene family from the Dallas Heart Study to identify potentially susceptible rare variants within the trait-related genes. An R package 'rvsel' can be downloaded from http://www.columbia.edu/∼sw2206/ and http://statsun.pusan.ac.kr. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  8. Variant Amino Acid Residues Alter the Enzyme Activity of Peanut Type 2 Diacylglycerol Acyltransferases

    PubMed Central

    Zheng, Ling; Shockey, Jay; Bian, Fei; Chen, Gao; Shan, Lei; Li, Xinguo; Wan, Shubo; Peng, Zhenying

    2017-01-01

    Diacylglycerol acyltransferase (DGAT) catalyzes the final step in triacylglycerol (TAG) biosynthesis via the acyl-CoA-dependent acylation of diacylglycerol. This reaction is a major control point in the Kennedy pathway for biosynthesis of TAG, which is the most important form of stored metabolic energy in most oil-producing plants. In this study, Arachis hypogaea type 2 DGAT (AhDGAT2) genes were cloned from the peanut cultivar ‘Luhua 14.’ Sequence analysis of 11 different peanut cultivars revealed a gene family of 8 peanut DGAT2 genes (designated AhDGAT2a-h). Sequence alignments revealed 21 nucleotide differences between the eight ORFs, but only six differences result in changes to the predicted amino acid (AA) sequences. A representative full-length cDNA clone (AhDGAT2a) was characterized in detail. The biochemical effects of altering the AhDGAT2a sequence to include single variable AA residues were tested by mutagenesis and functional complementation assays in transgenic yeast systems. All six mutant variants retained enzyme activity and produced lipid droplets in vivo. The N6D and A26P mutants also displayed increased enzyme activity and/or total cellular fatty acid (FA) content. N6D mutant mainly increased the content of palmitoleic acid, and A26P mutant mainly increased the content of palmitic acid. The A26P mutant grew well both in the presence of oleic and C18:2, but the other mutants grew better in the presence of C18:2. AhDGAT2 is expressed in all peanut organs analyzed, with high transcript levels in leaves and flowers. These levels are comparable to that found in immature seeds, where DGAT2 expression is most abundant in other plants. Over-expression of AhDGAT2a in tobacco substantially increased the FA content of transformed tobacco seeds. Expression of AhDGAT2a also altered transcription levels of endogenous tobacco lipid metabolic genes in transgenic tobacco, apparently creating a larger carbon ‘sink’ that supports increased FA levels. PMID

  9. Sequence variants in ESR1 and OXTR are associated with Mayer-Rokitansky-Küster-Hauser syndrome.

    PubMed

    Brucker, Sara Yvonne; Frank, Liliane; Eisenbeis, Simone; Henes, Melanie; Wallwiener, Diethelm; Riess, Olaf; van Eijck, Barbara; Schöller, Dorit; Bonin, Michael; Rall, Kristin Katharina

    2017-11-01

    Mayer-Rokitansky-Küster-Hauser syndrome (MRKHS) is characterized by congenital absence of the uterus and the upper two-thirds of the vagina in otherwise phenotypically normal females. It is found isolated or associated with renal, skeletal and other malformations. Despite ongoing research, the etiology is mainly unknown. For a long time, the hypothesis of deficient hormone receptors as the cause for MRKHS has existed, supported by previous findings of our group. The aim of the present study was to identify unknown genetic causes for MRKHS and to compare them with data banks including a review of the literature. DNA sequence analysis of the oxytocin receptor (OXTR) and estrogen receptor-1 gene (ESR1) was performed in a group of 93 clinically well-defined patients with uterovaginal aplasia (68 with the isolated form and 25 with associated malformations). In total, we detected three OXTR variants in 18 MRKHS patients with one leading to a missense mutation, and six ESR1 variants in 21 MRKHS patients, two of these causing amino acid changes and therefore potentially disease-causing. The identified variants at the DNA level might impair receptor function through different molecular mechanisms. Mutations of ESR1 and OXTR are associated with MRKHS. Thus, we consider these genes potential candidates associated with the manifestation of MRKHS. © 2017 Nordic Federation of Societies of Obstetrics and Gynecology, Acta Obstetricia et Gynecologica Scandinavica.

  10. Functional analysis of variant lysosomal acid glycosidases of Anderson-Fabry and Pompe disease in a human embryonic kidney epithelial cell line (HEK 293 T).

    PubMed

    Ebrahim, Hatim Y; Baker, Robert J; Mehta, Atul B; Hughes, Derralynn A

    2012-03-01

    The functional significance of missense mutations in genes encoding acid glycosidases of lysosomal storage disorders (LSDs) is not always clear. Here we describe a method of investigating functional properties of variant enzymes in vitro using a human embryonic kidney epithelial cell line. Site-directed mutagenesis was performed on the parental plasmids containing cDNA encoding for alpha-galactosidase A (α-Gal A) and acid maltase (α-Glu) to prepare plasmids encoding relevant point mutations. Mutant plasmids were transfected into HEK 293 T cells, and transient over-expression of variant enzymes was measured after 3 days. We have illustrated the method by examining enzymatic activities of four unknown α-Gal A variants and one α-Glu variant identified in our patients with Anderson-Fabry disease and Pompe disease, respectively. Comparison with control variants known to be either pathogenic or non-pathogenic together with over-expression of wild-type enzyme allowed determination of the pathogenicity of the mutation. One leader sequence novel variant of α-Gal A (p.A15T) was shown not to significantly reduce enzyme activity, whereas three other novel α-Gal A variants (p.D93Y, p.L372P and p.T410I) were shown to be pathogenic as they resulted in significant reduction of enzyme activity. A novel α-Glu variant (p.L72R) was shown to be pathogenic as this significantly reduced enzyme activity. Certain acid glycosidase variants that have been described in association with late-onset LSDs and which are known to have variable residual plasma and leukocyte enzyme activity in patients appear to show intermediate to low enzyme activity (p.N215S and p.Q279E α-Gal A respectively) in the over-expression system.

  11. Cystinuria Associated with Different SLC7A9 Gene Variants in the Cat

    PubMed Central

    Raj, Karthik; Osborne, Carl; Giger, Urs

    2016-01-01

    Cystinuria is a classical inborn error of metabolism characterized by a selective proximal renal tubular defect affecting cystine, ornithine, lysine, and arginine (COLA) reabsorption, which can lead to uroliths and urinary obstruction. In humans, dogs and mice, cystinuria is caused by variants in one of two genes, SLC3A1 and SLC7A9, which encode the rBAT and bo,+AT subunits of the bo,+ basic amino acid transporter system, respectively. In this study, exons and flanking regions of the SLC3A1 and SLC7A9 genes were sequenced from genomic DNA of cats (Felis catus) with COLAuria and cystine calculi. Relative to the Felis catus-6.2 reference genome sequence, DNA sequences from these affected cats revealed 3 unique homozygous SLC7A9 missense variants: one in exon 5 (p.Asp236Asn) from a non-purpose-bred medium-haired cat, one in exon 7 (p.Val294Glu) in a Maine Coon and a Sphinx cat, and one in exon 10 (p.Thr392Met) from a non-purpose-bred long-haired cat. A genotyping assay subsequently identified another cystinuric domestic medium-haired cat that was homozygous for the variant originally identified in the purebred cats. These missense variants result in deleterious amino acid substitutions of highly conserved residues in the bo,+AT protein. A limited population survey supported that the variants found were likely causative. The remaining 2 sequenced domestic short-haired cats had a heterozygous variant at a splice donor site in intron 10 and a homozygous single nucleotide variant at a branchpoint in intron 11 of SLC7A9, respectively. This study identifies the first SLC7A9 variants causing feline cystinuria and reveals that, as in humans and dogs, this disease is genetically heterogeneous in cats. PMID:27404572

  12. The N-terminal sequence of albumin Redhill, a variant of human serum albumin.

    PubMed

    Hutchinson, D W; Matejtschuk, P

    1985-12-02

    Albumin Redhill, a variant human albumin, has been isolated by fast protein liquid chromatofocusing. The N-terminal sequence of this protein corresponded to that of albumin A except that one additional arginine residue was attached to the N-terminus.

  13. A Python package for parsing, validating, mapping and formatting sequence variants using HGVS nomenclature.

    PubMed

    Hart, Reece K; Rico, Rudolph; Hare, Emily; Garcia, John; Westbrook, Jody; Fusaro, Vincent A

    2015-01-15

    Biological sequence variants are commonly represented in scientific literature, clinical reports and databases of variation using the mutation nomenclature guidelines endorsed by the Human Genome Variation Society (HGVS). Despite the widespread use of the standard, no freely available and comprehensive programming libraries are available. Here we report an open-source and easy-to-use Python library that facilitates the parsing, manipulation, formatting and validation of variants according to the HGVS specification. The current implementation focuses on the subset of the HGVS recommendations that precisely describe sequence-level variation relevant to the application of high-throughput sequencing to clinical diagnostics. The package is released under the Apache 2.0 open-source license. Source code, documentation and issue tracking are available at http://bitbucket.org/hgvs/hgvs/. Python packages are available at PyPI (https://pypi.python.org/pypi/hgvs). Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press.
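
    As a rough illustration of what parsing and formatting an HGVS-style description involves, the sketch below handles only simple protein-level substitutions such as p.Asp236Asn or p.D93Y. It uses the Python standard library alone and is not the API of the hgvs package described above; the names parse_protein_substitution, format_three_letter and ProteinSubstitution are hypothetical and introduced here purely for illustration.

```python
import re
from typing import NamedTuple

# Standard three-letter to one-letter codes for the 20 common amino acids.
AA3_TO_1 = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
AA1_TO_3 = {one: three for three, one in AA3_TO_1.items()}


class ProteinSubstitution(NamedTuple):
    """Hypothetical container for one missense substitution."""
    ref: str       # one-letter reference residue
    position: int  # 1-based residue number
    alt: str       # one-letter variant residue


# Accepts either three-letter (Asp) or one-letter (D) residue codes.
_PATTERN = re.compile(r"^p\.([A-Z][a-z]{2}|[A-Z])(\d+)([A-Z][a-z]{2}|[A-Z])$")


def _to_one_letter(code: str) -> str:
    """Normalise a residue code to one letter, rejecting unknown residues."""
    if len(code) == 3:
        if code not in AA3_TO_1:
            raise ValueError(f"unknown amino acid: {code!r}")
        return AA3_TO_1[code]
    if code not in AA1_TO_3:
        raise ValueError(f"unknown amino acid: {code!r}")
    return code


def parse_protein_substitution(text: str) -> ProteinSubstitution:
    """Parse strings such as 'p.Asp236Asn' or 'p.D93Y'."""
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a simple protein substitution: {text!r}")
    ref, position, alt = match.groups()
    return ProteinSubstitution(_to_one_letter(ref), int(position), _to_one_letter(alt))


def format_three_letter(sub: ProteinSubstitution) -> str:
    """Format a substitution with three-letter residue codes."""
    return f"p.{AA1_TO_3[sub.ref]}{sub.position}{AA1_TO_3[sub.alt]}"
```

    With this sketch, parse_protein_substitution("p.D93Y") and parse_protein_substitution("p.Asp93Tyr") yield the same record, so the two notations used in different abstracts of this listing can be compared directly. Reference sequence accessions, coding and genomic (c. and g.) descriptions, deletions, insertions and frameshifts are all outside its scope.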

  14. A Python package for parsing, validating, mapping and formatting sequence variants using HGVS nomenclature

    PubMed Central

    Hart, Reece K.; Rico, Rudolph; Hare, Emily; Garcia, John; Westbrook, Jody; Fusaro, Vincent A.

    2015-01-01

    Summary: Biological sequence variants are commonly represented in scientific literature, clinical reports and databases of variation using the mutation nomenclature guidelines endorsed by the Human Genome Variation Society (HGVS). Despite the widespread use of the standard, no freely available and comprehensive programming libraries are available. Here we report an open-source and easy-to-use Python library that facilitates the parsing, manipulation, formatting and validation of variants according to the HGVS specification. The current implementation focuses on the subset of the HGVS recommendations that precisely describe sequence-level variation relevant to the application of high-throughput sequencing to clinical diagnostics. Availability and implementation: The package is released under the Apache 2.0 open-source license. Source code, documentation and issue tracking are available at http://bitbucket.org/hgvs/hgvs/. Python packages are available at PyPI (https://pypi.python.org/pypi/hgvs). Contact: reecehart@gmail.com Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25273102

  15. Dietary fatty acids modulate associations between genetic variants and circulating fatty acids in plasma and erythrocyte membranes: meta-analysis of nine studies in the CHARGE consortium

    USDA-ARS's Scientific Manuscript database

    Scope: Tissue concentrations of omega-3 fatty acids may reduce cardiovascular disease risk, and genetic variants are associated with circulating fatty acids concentrations. Whether dietary fatty acids interact with genetic variants to modify circulating omega-3 fatty acids is unclear. We evaluated i...

  16. Sequence variant classification and reporting: recommendations for improving the interpretation of cancer susceptibility genetic test results

    PubMed Central

    Plon, Sharon E.; Eccles, Diana M.; Easton, Douglas; Foulkes, William D.; Genuardi, Maurizio; Greenblatt, Marc S.; Hogervorst, Frans B.L.; Hoogerbrugge, Nicoline; Spurdle, Amanda B.; Tavtigian, Sean

    2011-01-01

    Genetic testing of cancer susceptibility genes is now widely applied in clinical practice to predict risk of developing cancer. In general, sequence-based testing of germline DNA is used to determine whether an individual carries a change that is clearly likely to disrupt normal gene function. Genetic testing may detect changes that are clearly pathogenic, clearly neutral or variants of unclear clinical significance. Such variants present a considerable challenge to the diagnostic laboratory and the receiving clinician in terms of interpretation and clear presentation of the implications of the result to the patient. There does not appear to be a consistent approach to interpreting and reporting the clinical significance of variants either among genes or among laboratories. The potential for confusion among clinicians and patients is considerable and misinterpretation may lead to inappropriate clinical consequences. In this article we review the current state of sequence-based genetic testing, describe other standardized reporting systems used in oncology and propose a standardized classification system for application to sequence based results for cancer predisposition genes. We suggest a system of five classes of variants based on the degree of likelihood of pathogenicity. Each class is associated with specific recommendations for clinical management of at-risk relatives that will depend on the syndrome. We propose that panels of experts on each cancer predisposition syndrome facilitate the classification scheme and designate appropriate surveillance and cancer management guidelines. The international adoption of a standardized reporting system should improve the clinical utility of sequence-based genetic tests to predict cancer risk. PMID:18951446

  17. Sequence variants in oxytocin pathway genes and preterm birth: a candidate gene association study

    PubMed Central

    2013-01-01

    Background: Preterm birth (PTB) is a complex disorder associated with significant neonatal mortality and morbidity and long-term adverse health consequences. Multiple lines of evidence suggest that genetic factors play an important role in its etiology. This study was designed to identify genetic variation associated with PTB in oxytocin pathway genes whose role in parturition is well known. Methods: To identify common genetic variants predisposing to PTB, we genotyped 16 single nucleotide polymorphisms (SNPs) in the oxytocin (OXT), oxytocin receptor (OXTR), and leucyl/cystinyl aminopeptidase (LNPEP) genes in 651 case infants from the U.S. and one or both of their parents. In addition, we examined the role of rare genetic variation in susceptibility to PTB by conducting direct sequence analysis of OXTR in 1394 cases and 1112 controls from the U.S., Argentina, Denmark, and Finland. This study was further extended to maternal triads (maternal grandparents-mother of a case infant, N=309). We also performed in vitro analysis of selected rare OXTR missense variants to evaluate their functional importance. Results: Maternal genetic effect analysis of the SNP genotype data revealed four SNPs in LNPEP that show significant association with prematurity. In our case–control sequence analysis, we detected fourteen coding variants in exon 3 of OXTR, all but four of which were found in cases only. Of the fourteen variants, three were previously unreported novel rare variants. When the sequence data from the maternal triads were analyzed using the transmission disequilibrium test, two common missense SNPs (rs4686302 and rs237902) in OXTR showed suggestive association for three gestational age subgroups. In vitro functional assays showed a significant difference in ligand binding between wild-type and two mutant receptors. Conclusions: Our study suggests an association between maternal common polymorphisms in LNPEP and susceptibility to PTB. Maternal OXTR missense SNPs rs4686302

  18. Detecting Genomic Clustering of Risk Variants from Sequence Data: Cases vs. Controls

    PubMed Central

    Schaid, Daniel J.; Sinnwell, Jason P.; McDonnell, Shannon K.; Thibodeau, Stephen N.

    2013-01-01

    As the ability to measure dense genetic markers approaches the limit of the DNA sequence itself, taking advantage of possible clustering of genetic variants in, and around, a gene would benefit genetic association analyses, and likely provide biological insights. The greatest benefit might be realized when multiple rare variants cluster in a functional region. Several statistical tests have been developed, one of which is based on the popular Kulldorff scan statistic for spatial clustering of disease. We extended another popular spatial clustering method – Tango’s statistic – to genomic sequence data. An advantage of Tango’s method is that it is rapid to compute, and when a single test statistic is computed, its distribution is well approximated by a scaled chi-square distribution, making computation of p-values very rapid. We compared the Type-I error rates and power of several clustering statistics, as well as the omnibus sequence kernel association test (SKAT). Although our version of Tango’s statistic, which we call the “Kernel Distance” statistic, took approximately half as long to compute as the Kulldorff scan statistic, it had slightly less power than the scan statistic. Our results showed that the Ionita-Laza version of Kulldorff’s scan statistic had the greatest power over a range of clustering scenarios. PMID:23842950

  19. An abundance of rare functional variants in 202 drug target genes sequenced in 14,002 people

    PubMed Central

    Nelson, Matthew R.; Wegmann, Daniel; Ehm, Margaret G.; Kessner, Darren; St. Jean, Pamela; Verzilli, Claudio; Shen, Judong; Tang, Zhengzheng; Bacanu, Silviu-Alin; Fraser, Dana; Warren, Liling; Aponte, Jennifer; Zawistowski, Matthew; Liu, Xiao; Zhang, Hao; Zhang, Yong; Li, Jun; Li, Yun; Li, Li; Woollard, Peter; Topp, Simon; Hall, Matthew D.; Nangle, Keith; Wang, Jun; Abecasis, Gonçalo; Cardon, Lon R.; Zöllner, Sebastian; Whittaker, John C.; Chissoe, Stephanie L.; Novembre, John; Mooser, Vincent

    2015-01-01

    Rare genetic variants contribute to complex disease risk; however, the abundance of rare variants in human populations remains unknown. We explored this spectrum of variation by sequencing 202 genes encoding drug targets in 14,002 individuals. We find rare variants are abundant (one every 17 bases) and geographically localized, such that even with large sample sizes, rare variant catalogs will be largely incomplete. We used the observed patterns of variation to estimate population growth parameters, the proportion of variants in a given frequency class that are putatively deleterious, and mutation rates for each gene. Overall we conclude that, due to rapid population growth and weak purifying selection, human populations harbor an abundance of rare variants, many of which are deleterious and have relevance to understanding disease risk. PMID:22604722

  20. Genetic variants of human serum cholinesterase influence metabolism of the muscle relaxant succinylcholine.

    PubMed

    Lockridge, O

    1990-01-01

    People with genetic variants of cholinesterase respond abnormally to succinylcholine, experiencing substantial prolongation of muscle paralysis with apnea rather than the usual 2-6 min. The structure of usual cholinesterase has been determined including the complete amino acid and nucleotide sequence. This has allowed identification of altered amino acids and nucleotides. The variant most frequently found in patients who respond abnormally to succinylcholine is atypical cholinesterase, which occurs in homozygous form in 1 out of 3500 Caucasians. Atypical cholinesterase has a single substitution at nucleotide 209 which changes aspartic acid 70 to glycine. This suggests that Asp 70 is part of the anionic site, and that the absence of this negatively charged amino acid explains the reduced affinity of atypical cholinesterase for positively charged substrates and inhibitors. The clinical consequence of reduced affinity for succinylcholine is that none of the succinylcholine is hydrolyzed in blood and a large overdose reaches the nerve-muscle junction where it causes prolonged muscle paralysis. Silent cholinesterase has a frame shift mutation at glycine 117 which prematurely terminates protein synthesis and yields no active enzyme. The K variant, named in honor of W. Kalow, has threonine in place of alanine 539. The K variant is associated with 33% lower activity. All variants arise from a single locus as there is only one gene for human cholinesterase (EC 3.1.1.8). Comparison of amino acid sequences of esterases and proteases shows that cholinesterase belongs to a new family of serine esterases which is different from the serine proteases.

  1. Multi-species sequence comparison reveals conservation of ghrelin gene-derived splice variants encoding a truncated ghrelin peptide.

    PubMed

    Seim, Inge; Jeffery, Penny L; Thomas, Patrick B; Walpole, Carina M; Maugham, Michelle; Fung, Jenny N T; Yap, Pei-Yi; O'Keeffe, Angela J; Lai, John; Whiteside, Eliza J; Herington, Adrian C; Chopin, Lisa K

    2016-06-01

    The peptide hormone ghrelin is a potent orexigen produced predominantly in the stomach. It has a number of other biological actions, including roles in appetite stimulation, energy balance, the stimulation of growth hormone release and the regulation of cell proliferation. Recently, several ghrelin gene splice variants have been described. Here, we attempted to identify conserved alternative splicing of the ghrelin gene by cross-species sequence comparisons. We identified a novel human exon 2-deleted variant and provide preliminary evidence that this splice variant and in1-ghrelin encode a C-terminally truncated form of the ghrelin peptide, termed minighrelin. These variants are expressed in humans and mice, demonstrating conservation of alternative splicing spanning 90 million years. Minighrelin appears to have similar actions to full-length ghrelin, as treatment with exogenous minighrelin peptide stimulates appetite and feeding in mice. Forced expression of the exon 2-deleted preproghrelin variant mirrors the effect of the canonical preproghrelin, stimulating cell proliferation and migration in the PC3 prostate cancer cell line. This is the first study to characterise an exon 2-deleted preproghrelin variant and to demonstrate sequence conservation of ghrelin gene-derived splice variants that encode a truncated ghrelin peptide. This adds further impetus for studies into the alternative splicing of the ghrelin gene and the function of novel ghrelin peptides in vertebrates.

  2. Low-Pass Genome-Wide Sequencing and Variant Inference Using Identity-by-Descent in an Isolated Human Population

    PubMed Central

    Gusev, A.; Shah, M. J.; Kenny, E. E.; Ramachandran, A.; Lowe, J. K.; Salit, J.; Lee, C. C.; Levandowsky, E. C.; Weaver, T. N.; Doan, Q. C.; Peckham, H. E.; McLaughlin, S. F.; Lyons, M. R.; Sheth, V. N.; Stoffel, M.; De La Vega, F. M.; Friedman, J. M.; Breslow, J. L.

    2012-01-01

    Whole-genome sequencing in an isolated population with few founders directly ascertains variants from the population bottleneck that may be rare elsewhere. In such populations, shared haplotypes allow imputation of variants in unsequenced samples without resorting to complex statistical methods as in studies of outbred cohorts. We focus on an isolated population cohort from the Pacific Island of Kosrae, Micronesia, where we previously collected SNP array and rich phenotype data for the majority of the population. We report identification of long regions with haplotypes co-inherited between pairs of individuals and methodology to leverage such shared genetic content for imputation. Our estimates show that sequencing as few as 40 personal genomes allows for inference in up to 60% of the 3000-person cohort at the average locus. We ascertained a pilot data set of whole-genome sequences from seven Kosraean individuals, with average 5× coverage. This assay identified 5,735,306 unique sites of which 1,212,831 were previously unknown. Additionally, these variants are unusually enriched for alleles that are rare in other populations when compared to geographic neighbors (published Korean genome SJK). We used the presence of shared haplotypes between the seven Kosraean individuals to estimate expected imputation accuracy of known and novel homozygous variants at 99.6% and 97.3%, respectively. This study presents whole-genome analysis of a homogenous isolate population with emphasis on optimal rare variant inference. PMID:22135348

  3. Identification of Alternative Splice Variants Using Unique Tryptic Peptide Sequences for Database Searches.

    PubMed

    Tran, Trung T; Bollineni, Ravi C; Strozynski, Margarita; Koehler, Christian J; Thiede, Bernd

    2017-07-07

    Alternative splicing is a mechanism in eukaryotes by which different forms of mRNAs are generated from the same gene. Identification of alternative splice variants requires the identification of peptides specific for alternative splice forms. For this purpose, we generated a human database that contains only unique tryptic peptides specific for alternative splice forms from Swiss-Prot entries. Using this database allows an easy access to splice variant-specific peptide sequences that match to MS data. Furthermore, we combined this database without alternative splice variant-1-specific peptides with human Swiss-Prot. This combined database can be used as a general database for searching of LC-MS data. LC-MS data derived from in-solution digests of two different cell lines (LNCaP, HeLa) and phosphoproteomics studies were analyzed using these two databases. Several nonalternative splice variant-1-specific peptides were found in both cell lines, and some of them seemed to be cell-line-specific. Control and apoptotic phosphoproteomes from Jurkat T cells revealed several nonalternative splice variant-1-specific peptides, and some of them showed clear quantitative differences between the two states.
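
    The idea of isoform-specific tryptic peptides can be sketched in a few lines. The example below performs a simplified in-silico trypsin digest (cleavage after K or R unless the next residue is P, no missed cleavages) and keeps the peptides found in only one isoform. The function names, the minimum peptide length and the toy sequences are assumptions made for illustration; this is not the authors' database construction procedure.

```python
import re


def tryptic_peptides(sequence: str, min_length: int = 6) -> set[str]:
    """Simplified in-silico trypsin digest.

    Cleaves C-terminal to K or R unless the next residue is P, allows no
    missed cleavages, and discards peptides shorter than min_length.
    """
    pieces = re.split(r"(?<=[KR])(?!P)", sequence)
    return {piece for piece in pieces if len(piece) >= min_length}


def isoform_specific_peptides(
    isoforms: dict[str, str], min_length: int = 6
) -> dict[str, set[str]]:
    """Return, per isoform, the peptides that occur in no other isoform."""
    digests = {
        name: tryptic_peptides(seq, min_length) for name, seq in isoforms.items()
    }
    specific = {}
    for name, peptides in digests.items():
        # Pool the peptides of every other isoform, then subtract them.
        others = set().union(
            *(other for key, other in digests.items() if key != name)
        )
        specific[name] = peptides - others
    return specific


# Toy sequences (hypothetical): isoform 2 lacks the internal segment DEFGHIK,
# which creates a new junction peptide spanning the skipped region.
TOY_ISOFORMS = {
    "isoform-1": "MAGTLLEDEFGHIKLMNPQRSTVWYK",
    "isoform-2": "MAGTLLELMNPQRSTVWYK",
}
```

    For the toy input, isoform_specific_peptides(TOY_ISOFORMS) reports the junction-spanning peptide MAGTLLELMNPQR as specific to isoform-2, while the shared C-terminal peptide STVWYK is excluded from both sets. A realistic search database would also need to consider missed cleavages, peptide mass limits and isobaric residues such as I and L.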

  4. Determination of disease phenotypes and pathogenic variants from exome sequence data in the CAGI 4 gene panel challenge.

    PubMed

    Kundu, Kunal; Pal, Lipika R; Yin, Yizhou; Moult, John

    2017-09-01

    The use of gene panel sequence for diagnostic and prognostic testing is now widespread, but there are so far few objective tests of methods to interpret these data. We describe the design and implementation of a gene panel sequencing data analysis pipeline (VarP) and its assessment in a CAGI4 community experiment. The method was applied to clinical gene panel sequencing data of 106 patients, with the goal of determining which of 14 disease classes each patient has and the corresponding causative variant(s). The disease class was correctly identified for 36 cases, including 10 where the original clinical pipeline did not find causative variants. For a further seven cases, we found strong evidence of an alternative disease to that tested. Many of the potentially causative variants are missense, with no previous association with disease, and these proved the hardest to correctly assign pathogenicity or otherwise. Post analysis showed that three-dimensional structure data could have helped for up to half of these cases. Over-reliance on HGMD annotation led to a number of incorrect disease assignments. We used a largely ad hoc method to assign probabilities of pathogenicity for each variant, and there is much work still to be done in this area. © 2017 The Authors. Human Mutation published by Wiley Periodicals, Inc.

  5. Genetic Variants Identified from Epilepsy of Unknown Etiology in Chinese Children by Targeted Exome Sequencing

    PubMed Central

    Wang, Yimin; Du, Xiaonan; Bin, Rao; Yu, Shanshan; Xia, Zhezhi; Zheng, Guo; Zhong, Jianmin; Zhang, Yunjian; Jiang, Yong-hui; Wang, Yi

    2017-01-01

    Genetic factors play a major role in the etiology of epilepsy disorders. Recent genomics studies using next-generation sequencing (NGS) have identified a large number of genetic variants, including copy number variants (CNVs) and single nucleotide variants (SNVs), in a small set of genes from individuals with epilepsy. These discoveries have contributed significantly to evaluating the etiology of epilepsy in the clinic and have laid the foundation for developing molecularly specific treatment. However, the molecular basis for a majority of epilepsy patients remains elusive, and furthermore, most of these studies have been conducted in Caucasian children. Here we conducted targeted exome sequencing of 63 trios of Chinese epilepsy families using a custom-designed NGS panel that covers 412 known and candidate genes for epilepsy. We identified pathogenic and likely pathogenic variants in 15 of 63 (23.8%) families in known epilepsy genes including SCN1A, CDKL5, STXBP1, CHD2, SCN3A, SCN9A, TSC2, MBD5, POLG and EFHC1. More importantly, we identified likely pathologic variants in several novel candidate genes such as GABRE, MYH1, and CLCN6. Our results provide evidence supporting the application of a custom-designed NGS panel in the clinic and indicate a conserved genetic susceptibility for epilepsy between Chinese and Caucasian children. PMID:28074849

  6. HapFABIA: Identification of very short segments of identity by descent characterized by rare variants in large sequencing data

    PubMed Central

    Hochreiter, Sepp

    2013-01-01

    Identity by descent (IBD) can be reliably detected for long shared DNA segments, which are found in related individuals. However, many studies contain cohorts of unrelated individuals that share only short IBD segments. New sequencing technologies facilitate identification of short IBD segments through rare variants, which convey more information on IBD than common variants. Current IBD detection methods, however, are not designed to use rare variants for the detection of short IBD segments. Short IBD segments reveal genetic structures at high resolution. Therefore, they can help to improve imputation and phasing, to increase genotyping accuracy for low-coverage sequencing and to increase the power of association studies. Since short IBD segments are further assumed to be old, they can shed light on the evolutionary history of humans. We propose HapFABIA, a computational method that applies biclustering to identify very short IBD segments characterized by rare variants. HapFABIA is designed to detect short IBD segments in genotype data that were obtained from next-generation sequencing, but can also be applied to DNA microarray data. Especially in next-generation sequencing data, HapFABIA exploits rare variants for IBD detection. HapFABIA significantly outperformed competing algorithms at detecting short IBD segments on artificial and simulated data with rare variants. HapFABIA identified 160 588 different short IBD segments characterized by rare variants with a median length of 23 kb (mean 24 kb) in data for chromosome 1 of the 1000 Genomes Project. These short IBD segments contain 752 000 single nucleotide variants (SNVs), which account for 39% of the rare variants and 23.5% of all variants. The vast majority—152 000 IBD segments—are shared by Africans, while only 19 000 and 11 000 are shared by Europeans and Asians, respectively. IBD segments that match the Denisova or the Neandertal genome are found significantly more often in Asians and Europeans but also

  7. Semiconductor Whole Exome Sequencing for the Identification of Genetic Variants in Colombian Patients Clinically Diagnosed with Long QT Syndrome.

    PubMed

    Burgos, Mariana; Arenas, Alvaro; Cabrera, Rodrigo

    2016-08-01

    Inherited long QT syndrome (LQTS) is a cardiac channelopathy characterized by a prolongation of QT interval and the risk of syncope, cardiac arrest, and sudden cardiac death. Genetic diagnosis of LQTS is critical in medical practice as results can guide adequate management of patients and distinguish phenocopies such as catecholaminergic polymorphic ventricular tachycardia (CPVT). However, extensive screening of large genomic regions is required in order to reliably identify genetic causes. Semiconductor whole exome sequencing (WES) is a promising approach for the identification of variants in the coding regions of most human genes. DNA samples from 21 Colombian patients clinically diagnosed with LQTS were enriched for coding regions using multiplex polymerase chain reaction (PCR) and subjected to WES using a semiconductor sequencer. Semiconductor WES showed mean coverage of 93.6 % for all coding regions relevant to LQTS at >10× depth with high intra- and inter-assay depth heterogeneity. Fifteen variants were detected in 12 patients in genes associated with LQTS. Three variants were identified in three patients in genes associated with CPVT. Co-segregation analysis was performed when possible. All variants were analyzed with two pathogenicity prediction algorithms. The overall prevalence of LQTS and CPVT variants in our cohort was 71.4 %. All LQTS variants previously identified through commercial genetic testing were identified. Standardized WES assays can be easily implemented, often at a lower cost than sequencing panels. Our results show that WES can identify LQTS-causing mutations and permits differential diagnosis of related conditions in a real-world clinical setting. However, high heterogeneity in sequencing depth and low coverage in the most relevant genes is expected to be associated with reduced analytical sensitivity.

  8. Detection of nucleic acid sequences by invader-directed cleavage

    DOEpatents

    Brow, Mary Ann D.; Hall, Jeff Steven Grotelueschen; Lyamichev, Victor; Olive, David Michael; Prudent, James Robert

    1999-01-01

    The present invention relates to means for the detection and characterization of nucleic acid sequences, as well as variations in nucleic acid sequences. The present invention also relates to methods for forming a nucleic acid cleavage structure on a target sequence and cleaving the nucleic acid cleavage structure in a site-specific manner. The 5' nuclease activity of a variety of enzymes is used to cleave the target-dependent cleavage structure, thereby indicating the presence of specific nucleic acid sequences or specific variations thereof. The present invention further relates to methods and devices for the separation of nucleic acid molecules based on charge.

  9. Common genetic variants of the human UMOD gene are functional on transcription and predict plasma uric acid in two distinct populations

    PubMed Central

    Han, Jia; Liu, Ying; Rao, Fangwen; Nievergelt, Caroline M.; O’Connor, Daniel T.; Wang, Xingyu; Liu, Lisheng; Bu, Dingfang; Liang, Yu; Wang, Fang; Zhang, Luxia; Zhang, Hong; Chen, Yuqing; Wang, Haiyan

    2013-01-01

    Uromodulin (UMOD) genetic variants cause familial juvenile hyperuricemic nephropathy, characterized by hyperuricemia, decreased renal excretion of UMOD and uric acid; such findings suggest a role for UMOD in the regulation of plasma uric acid. We screened common variants across the UMOD locus in two populations, one from a community-based Chinese population, the other from California twins and siblings. Transcriptional activity of promoter variants was estimated in luciferase reporter plasmids transfected into HEK293 cells and mlMCD3 cells. By variance components in twin pairs, uric acid concentration and excretion were heritable traits. In the primary population from Beijing, we identified that carriers of haplotype GCC displayed higher plasma uric acid, and 3 UMOD promoter variants associated with plasma uric acid. UMOD promoter variants displayed reciprocal effects on urine uric acid excretion and plasma uric acid concentration, suggesting a primary effect on renal tubular handling of urate. These UMOD genetic marker-on-trait associations for uric acid were replicated in an independent American population sample. Site-directed mutagenesis at trait-associated UMOD promoter variants altered promoter activity in transfected luciferase reporter plasmids. These results suggest that UMOD promoter variants seem to initiate a cascade of transcriptional and biochemical changes influencing UMOD secretion, eventuating in elevation of plasma uric acid. PMID:23344472

  10. MToolBox: a highly automated pipeline for heteroplasmy annotation and prioritization analysis of human mitochondrial variants in high-throughput sequencing

    PubMed Central

    Diroma, Maria Angela; Santorsola, Mariangela; Guttà, Cristiano; Gasparre, Giuseppe; Picardi, Ernesto; Pesole, Graziano; Attimonelli, Marcella

    2014-01-01

    Motivation: The increasing availability of mitochondria-targeted and off-target sequencing data in whole-exome and whole-genome sequencing studies (WXS and WGS) has raised the demand for effective pipelines to accurately measure heteroplasmy and to easily recognize the most functionally important mitochondrial variants among a huge number of candidates. To this end, we developed MToolBox, a highly automated pipeline to reconstruct and analyze human mitochondrial DNA from high-throughput sequencing data. Results: MToolBox implements an effective computational strategy for mitochondrial genome assembly and haplogroup assignment, also including a prioritization analysis of detected variants. MToolBox provides a Variant Call Format file featuring, for the first time, allele-specific heteroplasmy and annotation files with prioritized variants. MToolBox was tested on simulated samples and applied to 1000 Genomes WXS datasets. Availability and implementation: MToolBox package is available at https://sourceforge.net/projects/mtoolbox/. Contact: marcella.attimonelli@uniba.it Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25028726

  11. Novel Variant of Tickborne Encephalitis Virus, Russia

    PubMed Central

    Ternovoi, Vladimir A.; Protopopova, Elena V.; Chausov, Eugene V.; Novikov, Dmitry V.; Leonova, Galina N.; Netesov, Sergey V.

    2007-01-01

    We isolated a novel strain of tickborne encephalitis virus (TBEV), Glubinnoe/2004, from a patient with a fatal case in Russia. We sequenced the strain, whose landmark features included 57 amino acid substitutions and 5 modified cleavage sites. Phylogenetically, Glubinnoe/2004 is a novel variant that belongs to the Eastern type of TBEV. PMID:18258012

  12. Systematic comparison of variant calling pipelines using gold standard personal exome variants

    PubMed Central

    Hwang, Sohyun; Kim, Eiru; Lee, Insuk; Marcotte, Edward M.

    2015-01-01

    The success of clinical genomics using next generation sequencing (NGS) requires the accurate and consistent identification of personal genome variants. Assorted variant calling methods have been developed, which show low concordance between their calls. Hence, a systematic comparison of the variant callers could give important guidance to NGS-based clinical genomics. Recently, a set of high-confidence variant calls for one individual (NA12878) has been published by the Genome in a Bottle (GIAB) consortium, enabling performance benchmarking of different variant calling pipelines. Based on the gold standard reference variant calls from GIAB, we compared the performance of thirteen variant calling pipelines, testing combinations of three read aligners (BWA-MEM, Bowtie2, and Novoalign) and four variant callers (Genome Analysis Tool Kit HaplotypeCaller [GATK-HC], Samtools mpileup, Freebayes, and Ion Proton Variant Caller [TVC]) for twelve data sets for the NA12878 genome sequenced by different platforms including Illumina2000, Illumina2500, and Ion Proton, with various exome capture systems and exome coverage. We observed different biases toward specific types of SNP genotyping errors by the different variant callers. The results of our study provide useful guidelines for reliable variant identification from deep sequencing of personal genomes. PMID:26639839
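
    Benchmarking a call set against a gold standard, as described in this abstract, usually comes down to counting matches and mismatches. The sketch below assumes variants are compared as exact (chromosome, position, ref, alt) tuples; the function name and the toy variants are hypothetical, and the study's own comparison procedure is not specified in the abstract.

```python
def benchmark_calls(called, truth):
    """Compare a pipeline's variant calls with a truth set.

    Both arguments are iterables of (chrom, pos, ref, alt) tuples.
    Returns counts of true/false positives and false negatives,
    plus precision, recall (sensitivity) and F1.
    """
    called, truth = set(called), set(truth)
    tp = len(called & truth)   # called and present in the truth set
    fp = len(called - truth)   # called but absent from the truth set
    fn = len(truth - called)   # in the truth set but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}


# Toy data: four truth variants, one missed and one spurious call.
truth = [("1", 100, "A", "G"), ("1", 200, "C", "T"),
         ("2", 300, "G", "A"), ("2", 400, "T", "C")]
called = [("1", 100, "A", "G"), ("1", 200, "C", "T"),
          ("2", 300, "G", "A"), ("3", 500, "A", "T")]
print(benchmark_calls(called, truth))
```

    Exact tuple matching is a simplification: indels and complex variants can have several equivalent representations, so real comparisons normally normalize variants first.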

  13. Methods and compositions for efficient nucleic acid sequencing

    DOEpatents

    Drmanac, Radoje

    2006-07-04

    Disclosed are novel methods and compositions for rapid and highly efficient nucleic acid sequencing based upon hybridization with two sets of small oligonucleotide probes of known sequences. Extremely large nucleic acid molecules, including chromosomes and non-amplified RNA, may be sequenced without prior cloning or subcloning steps. The methods of the invention also solve various current problems associated with sequencing technology such as, for example, high noise to signal ratios and difficult discrimination, attaching many nucleic acid fragments to a surface, preparing many, longer or more complex probes and labelling more species.

  14. Methods and compositions for efficient nucleic acid sequencing

    DOEpatents

    Drmanac, Radoje

    2002-01-01

    Disclosed are novel methods and compositions for rapid and highly efficient nucleic acid sequencing based upon hybridization with two sets of small oligonucleotide probes of known sequences. Extremely large nucleic acid molecules, including chromosomes and non-amplified RNA, may be sequenced without prior cloning or subcloning steps. The methods of the invention also solve various current problems associated with sequencing technology such as, for example, high noise to signal ratios and difficult discrimination, attaching many nucleic acid fragments to a surface, preparing many, longer or more complex probes and labelling more species.

  15. Next generation sequencing to identify novel genetic variants causative of autosomal dominant familial hypercholesterolemia associated with increased risk of coronary heart disease.

    PubMed

    Al-Allaf, Faisal A; Athar, Mohammad; Abduljaleel, Zainularifeen; Taher, Mohiuddin M; Khan, Wajahatullah; Ba-Hammam, Faisal A; Abalkhail, Hala; Alashwal, Abdullah

    2015-07-01

    Familial hypercholesterolemia (FH) is an autosomal dominant inherited disease characterized by elevated plasma low-density lipoprotein cholesterol (LDL-C). It is caused by variants in Ldlr, ApoB or Pcsk9, which result in high levels of LDL-C leading to early coronary heart disease. Whole-genome sequencing is not suitable for screening FH variants because of its high cost. Hence, in this study we performed targeted customized sequencing of 12 FH genes (Ldlr, ApoB, Pcsk9, Abca1, Apoa2, Apoc3, Apon2, Arh, Ldlrap1, Apoc2, ApoE, and Lpl) that have been implicated in the homozygous phenotype of a proband pedigree to identify candidate variants by NGS on the Ion Torrent PGM. Only three genes (Ldlr, ApoB, and Pcsk9) were found to be highly associated with FH based on the variant rate. The results showed that seven deleterious variants in the Ldlr, ApoB, and Pcsk9 genes were pathological and clinically significant based on SIFT and PolyPhen predictions. Targeted customized sequencing is an efficient technique for screening variants among targeted FH genes. Final validation of the seven deleterious variants by capillary sequencing confirmed only one novel variant, located in exon 14 of the Ldlr gene (c.2026delG, p.Gly676fs). The variant found in the Ldlr gene was a novel heterozygous variant identified in a male from the proband pedigree. Copyright © 2015 Elsevier B.V. All rights reserved.

  16. VirVarSeq: a low-frequency virus variant detection pipeline for Illumina sequencing using adaptive base-calling accuracy filtering.

    PubMed

    Verbist, Bie M P; Thys, Kim; Reumers, Joke; Wetzels, Yves; Van der Borght, Koen; Talloen, Willem; Aerssens, Jeroen; Clement, Lieven; Thas, Olivier

    2015-01-01

    In virology, massively parallel sequencing (MPS) opens many opportunities for studying viral quasi-species, e.g. in HIV-1- and HCV-infected patients. This is essential for understanding pathways to resistance, which can substantially improve treatment. Although MPS platforms allow in-depth characterization of sequence variation, their measurements still involve substantial technical noise. For Illumina sequencing, single base substitutions are the main error source and impede powerful assessment of low-frequency mutations. Fortunately, base calls are complemented with quality scores (Qs) that are useful for differentiating errors from the real low-frequency mutations. A variant calling tool, Q-cpileup, is proposed, which exploits the Qs of nucleotides in a filtering strategy to increase specificity. The tool is embedded in an open-source pipeline, VirVarSeq, which allows variant calling starting from fastq files. Using both plasmid mixtures and clinical samples, we show that Q-cpileup is able to reduce the number of false-positive findings. The filtering strategy is adaptive and provides an optimized threshold for individual samples in each sequencing run. Additionally, linkage information is kept between single-nucleotide polymorphisms as variants are called at the codon level. This enables virologists to have an immediate biological interpretation of the reported variants with respect to their antiviral drug responses. A comparison with existing SNP caller tools reveals that calling variants at the codon level with Q-cpileup results in an outstanding sensitivity while maintaining a good specificity for variants with frequencies down to 0.5%. VirVarSeq is available, together with a user's guide and test data, at SourceForge: http://sourceforge.net/projects/virtools/?source=directory. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
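
    The general idea of quality-based filtering at the codon level can be sketched as follows. This is not the Q-cpileup algorithm: the abstract describes an adaptive, per-sample threshold, whereas this sketch assumes a fixed, user-supplied Phred cut-off, and the function names and toy reads are hypothetical.

```python
from collections import Counter


def phred_to_error_prob(q):
    """Convert a Phred quality score to the implied base-call error probability."""
    return 10 ** (-q / 10)


def codon_frequencies(reads, min_q):
    """Codon frequencies at one codon position after quality filtering.

    reads is an iterable of (codon, qualities) pairs, where qualities holds the
    three Phred scores of the codon's bases. A read is kept only if all three
    bases reach min_q (a fixed threshold, assumed here for illustration).
    """
    kept = Counter(codon for codon, quals in reads if min(quals) >= min_q)
    total = sum(kept.values())
    if total == 0:
        return {}
    return {codon: n / total for codon, n in kept.items()}


# Toy data: a dominant codon, a genuine minor codon, and a low-quality artefact.
reads = ([("AAA", (38, 37, 39))] * 198
         + [("AGA", (36, 38, 37))] * 2
         + [("ACA", (38, 12, 37))] * 5)
print(codon_frequencies(reads, min_q=30))
print(phred_to_error_prob(30))
```

    Counting whole codons rather than single bases keeps the linkage between neighbouring substitutions, which is the property the abstract highlights.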

  17. Genotype–phenotype correlations in individuals with pathogenic RERE variants

    PubMed Central

    Jordan, Valerie K.; Fregeau, Brieana; Ge, Xiaoyan; Giordano, Jessica; Wapner, Ronald J.; Balci, Tugce B.; Carter, Melissa T.; Bernat, John A.; Moccia, Amanda N.; Srivastava, Anshika; Martin, Donna M.; Bielas, Stephanie L.; Pappas, John; Svoboda, Melissa D.; Rio, Marlène; Boddaert, Nathalie; Cantagrel, Vincent; Lewis, Andrea M.; Scaglia, Fernando; Kohler, Jennefer N.; Bernstein, Jonathan A.; Dries, Annika M.; Rosenfeld, Jill A.; DeFilippo, Colette; Thorson, Willa; Yang, Yaping; Sherr, Elliott H.; Bi, Weimin; Scott, Daryl A.

    2018-01-01

    Heterozygous variants in the arginine-glutamic acid dipeptide repeats gene (RERE) have been shown to cause neurodevelopmental disorder with or without anomalies of the brain, eye, or heart (NEDBEH). Here, we report nine individuals with NEDBEH who carry partial deletions or deleterious sequence variants in RERE. These variants were found to be de novo in all cases in which parental samples were available. An analysis of data from individuals with NEDBEH suggests that point mutations affecting the Atrophin-1 domain of RERE are associated with an increased risk of structural eye defects, congenital heart defects, renal anomalies, and sensorineural hearing loss when compared with loss-of-function variants that are likely to lead to haploinsufficiency. A high percentage of RERE pathogenic variants affect a histidine-rich region in the Atrophin-1 domain. We have also identified a recurrent two-amino-acid duplication in this region that is associated with the development of a CHARGE syndrome-like phenotype. We conclude that mutations affecting RERE result in a spectrum of clinical phenotypes. Genotype–phenotype correlations exist and can be used to guide medical decision making. Consideration should also be given to screening for RERE variants in individuals who fulfill diagnostic criteria for CHARGE syndrome but do not carry pathogenic variants in CHD7. PMID:29330883

  18. Whole-exome Sequence Analysis Implicates Rare Il17REL Variants in Familial and Sporadic Inflammatory Bowel Disease.

    PubMed

    Sasaki, Mark M; Skol, Andrew D; Hungate, Eric A; Bao, Riyue; Huang, Lei; Kahn, Stacy A; Allan, James M; Brant, Steven R; McGovern, Dermot P B; Peter, Inga; Silverberg, Mark S; Cho, Judy H; Kirschner, Barbara S; Onel, Kenan

    2016-01-01

    Rare variants (<1%) likely contribute significantly to risk for common diseases such as inflammatory bowel disease (IBD) in specific patient subsets, such as those with high familiality. They are, however, extraordinarily challenging to identify. To discover candidate rare variants associated with IBD, we performed whole-exome sequencing on 6 members of a pediatric-onset IBD family with multiple affected individuals. To determine whether the variants discovered in this family are also associated with nonfamilial IBD, we investigated their influence on disease in 2 large case-control (CC) series. We identified 2 rare variants, rs142430606 and rs200958270, both in the established IBD-susceptibility gene IL17REL, carried by all 4 affected family members and their obligate carrier parents. We then demonstrated that both variants are associated with sporadic ulcerative colitis (UC) in 2 independent data sets. For UC in CC 1: rs142430606 (odds ratio [OR] = 2.99, Padj = 0.028; minor allele frequency [MAF]cases = 0.0063, MAFcontrols = 0.0021); rs200958270 (OR = 2.61, Padj = 0.082; MAFcases = 0.0045, MAFcontrols = 0.0017). For UC in CC 2: rs142430606 (OR = 1.94, P = 0.0056; MAFcases = 0.0071, MAFcontrols = 0.0045); rs200958270 (OR = 2.08, P = 0.0028; MAFcases = 0.0071, MAFcontrols = 0.0042). We discover in a family and replicate in 2 CC data sets 2 rare susceptibility variants for IBD, both in IL17REL. Our results illustrate that whole-exome sequencing performed on disease-enriched families to guide association testing can be an efficient strategy for the discovery of rare disease-associated variants. We speculate that rare variants identified in families and confirmed in the general population may be important modifiers of disease risk for patients with a family history, and that genetic testing of these variants may be warranted in this patient subset.
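
    As a rough illustration of how the minor allele frequencies and odds ratios quoted above relate, the sketch below computes a crude allelic odds ratio from two MAFs. The function name is hypothetical, and the reported ORs come from the authors' own analysis, which may include adjustments, so this unadjusted value is not expected to match them exactly.

```python
def allelic_odds_ratio(maf_cases, maf_controls):
    """Crude (unadjusted) allelic odds ratio from case and control allele frequencies."""
    odds_cases = maf_cases / (1 - maf_cases)
    odds_controls = maf_controls / (1 - maf_controls)
    return odds_cases / odds_controls


# MAFs quoted in the abstract for rs142430606 in case-control series 1.
print(round(allelic_odds_ratio(0.0063, 0.0021), 2))
```

    For rare alleles the odds are almost equal to the frequencies themselves, so the crude OR is close to the simple ratio of the two MAFs.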

  19. Longitudinal studies on maternal HIV-1 variants by biological phenotyping, sequence analysis and viral load.

    PubMed

    Renta, J Y; Cadilla, C L; Vega, M E; Hillyer, G V; Estrada, C; Jiménez, E; Abreu, E; Méndez, I; Gandía, J; Meléndez-Guerrero, L M

    1997-11-01

    In this study, the HIV-1 variant viruses from ten pregnant women and their infants were isolated and characterized longitudinally in order to determine the role that viral envelope (gp120-V3 loop) gene variation and viral tropism play in vertical transmission. Biological phenotyping of each HIV variant was accomplished by growth in MT-2 cells and in macrophages from healthy, non-HIV-infected donors. Genetic characterization of the variants was accomplished by DNA sequence analysis. All the women enrolled in this study received ZDV therapy. Virus was cultured from eight out of ten env V3-PCR positive mothers. HIV-1 isolates were all non-syncytium-inducing variants. None of the mothers were found to transmit HIV, as determined by DNA PCR and quantitative co-cultures on their infants, who were seronegative for HIV-1 through one year after birth. Viral cultures from infant blood samples were negative and infants were all healthy. However, nested env V3-PCR detected proviral DNA in five out of ten infants. In contrast, conventional gag-PCR was negative in the same five infants. Sequences of the five maternal-infant pairs were different, suggesting unique infant HIV-1 variants. The three highest maternal viral load values corresponded to infants that were env V3-PCR positive. These results suggest that HIV-1 particles are transmitted from ZDV-treated mothers to infants. Infant follow-up is recommended to determine if HIV-1 has been inhibited by the immune system of the infants.

  20. Exome sequencing analysis reveals variants in primary immunodeficiency genes in patients with very early onset inflammatory bowel disease.

    PubMed

    Kelsen, Judith R; Dawany, Noor; Moran, Christopher J; Petersen, Britt-Sabina; Sarmady, Mahdi; Sasson, Ariella; Pauly-Hubbard, Helen; Martinez, Alejandro; Maurer, Kelly; Soong, Joanne; Rappaport, Eric; Franke, Andre; Keller, Andreas; Winter, Harland S; Mamula, Petar; Piccoli, David; Artis, David; Sonnenberg, Gregory F; Daly, Mark; Sullivan, Kathleen E; Baldassano, Robert N; Devoto, Marcella

    2015-11-01

    Very early onset inflammatory bowel disease (VEO-IBD), IBD diagnosed at 5 years of age or younger, frequently presents with a different and more severe phenotype than older-onset IBD. We investigated whether patients with VEO-IBD carry rare or novel variants in genes associated with immunodeficiencies that might contribute to disease development. Patients with VEO-IBD and parents (when available) were recruited from the Children's Hospital of Philadelphia from March 2013 through July 2014. We analyzed DNA from 125 patients with VEO-IBD (age, 3 wk to 4 y) and 19 parents, 4 of whom also had IBD. Exome capture was performed by Agilent SureSelect V4, and sequencing was performed using the Illumina HiSeq platform. Alignment to human genome GRCh37 was achieved followed by postprocessing and variant calling. After functional annotation, candidate variants were analyzed for change in protein function, minor allele frequency less than 0.1%, and scaled combined annotation-dependent depletion scores of 10 or less. We focused on genes associated with primary immunodeficiencies and related pathways. An additional 210 exome samples from patients with pediatric IBD (n = 45) or adult-onset Crohn's disease (n = 20) and healthy individuals (controls, n = 145) were obtained from the University of Kiel, Germany, and used as control groups. Four hundred genes and regions associated with primary immunodeficiency, covering approximately 6500 coding exons totaling more than 1 Mbp of coding sequence, were selected from the whole-exome data. Our analysis showed novel and rare variants within these genes that could contribute to the development of VEO-IBD, including rare heterozygous missense variants in IL10RA and previously unidentified variants in MSH5 and CD19. In an exome sequence analysis of patients with VEO-IBD and their parents, we identified variants in genes that regulate B- and T-cell functions and could contribute to pathogenesis. Our analysis could lead to the

  1. Exome Sequencing Analysis Reveals Variants in Primary Immunodeficiency Genes in Patients With Very Early Onset Inflammatory Bowel Disease

    PubMed Central

    Kelsen, Judith R.; Dawany, Noor; Moran, Christopher J.; Petersen, Britt-Sabina; Sarmady, Mahdi; Sasson, Ariella; Pauly-Hubbard, Helen; Martinez, Alejandro; Maurer, Kelly; Soong, Joanne; Rappaport, Eric; Franke, Andre; Keller, Andreas; Winter, Harland S.; Mamula, Petar; Piccoli, David; Artis, David; Sonnenberg, Gregory F.; Daly, Mark; Sullivan, Kathleen E.; Baldassano, Robert N.; Devoto, Marcella

    2016-01-01

    Background & Aims: Very early onset inflammatory bowel disease (VEO-IBD), IBD diagnosed ≤5 y of age, frequently presents with a different and more severe phenotype than older-onset IBD. We investigated whether patients with VEO-IBD carry rare or novel variants in genes associated with immunodeficiencies that might contribute to disease development. Methods: Patients with VEO-IBD and parents (when available) were recruited from the Children's Hospital of Philadelphia from March 2013 through July 2014. We analyzed DNA from 125 patients with VEO-IBD (ages 3 weeks to 4 y) and 19 parents, 4 of whom also had IBD. Exome capture was performed by Agilent SureSelect V4, and sequencing was performed using the Illumina HiSeq platform. Alignment to human genome GRCh37 was achieved followed by post-processing and variant calling. Following functional annotation, candidate variants were analyzed for change in protein function, minor allele frequency <0.1%, and scaled combined annotation dependent depletion scores ≤10. We focused on genes associated with primary immunodeficiencies and related pathways. An additional 210 exome samples from patients with pediatric IBD (n=45) or adult-onset Crohn's disease (n=20) and healthy individuals (controls, n=145) were obtained from the University of Kiel, Germany and used as control groups. Results: Four hundred genes and regions associated with primary immunodeficiency, covering approximately 6500 coding exons totaling > 1 Mbp of coding sequence, were selected from the whole exome data. Our analysis revealed novel and rare variants within these genes that could contribute to the development of VEO-IBD, including rare heterozygous missense variants in IL10RA and previously unidentified variants in MSH5 and CD19. Conclusions: In an exome sequence analysis of patients with VEO-IBD and their parents, we identified variants in genes that regulate B- and T-cell functions and could contribute to pathogenesis. Our analysis could lead to the

  2. Strelka: accurate somatic small-variant calling from sequenced tumor-normal sample pairs.

    PubMed

    Saunders, Christopher T; Wong, Wendy S W; Swamy, Sajani; Becq, Jennifer; Murray, Lisa J; Cheetham, R Keira

    2012-07-15

    Whole genome and exome sequencing of matched tumor-normal sample pairs is becoming routine in cancer research. The consequent increased demand for somatic variant analysis of paired samples requires methods specialized to model this problem so as to sensitively call variants at any practical level of tumor impurity. We describe Strelka, a method for somatic SNV and small indel detection from sequencing data of matched tumor-normal samples. The method uses a novel Bayesian approach which represents continuous allele frequencies for both tumor and normal samples, while leveraging the expected genotype structure of the normal. This is achieved by representing the normal sample as a mixture of germline variation with noise, and representing the tumor sample as a mixture of the normal sample with somatic variation. A natural consequence of the model structure is that sensitivity can be maintained at high tumor impurity without requiring purity estimates. We demonstrate that the method has superior accuracy and sensitivity on impure samples compared with approaches based on either diploid genotype likelihoods or general allele-frequency tests. The Strelka workflow source code is available at ftp://strelka@ftp.illumina.com/. csaunders@illumina.com

  3. Disease Variant Landscape of a Large Multiethnic Population of Moyamoya Patients by Exome Sequencing

    PubMed Central

    Shoemaker, Lorelei D.; Clark, Michael J.; Patwardhan, Anil; Chandratillake, Gemma; Garcia, Sarah; Chen, Rong; Morgan, Alexander A.; Leng, Nan; Kirk, Scott; Chen, Richard; Cook, Douglas J.; Snyder, Michael; Steinberg, Gary K.

    2015-01-01

    Moyamoya disease (MMD) is a rare disorder characterized by cerebrovascular occlusion and development of hemorrhage-prone collateral vessels. Approximately 10–12% of cases are familial, with a presumed low penetrance autosomal dominant pattern of inheritance. Diagnosis commonly occurs only after clinical presentation. The recent identification of the RNF213 founder mutation (p.R4810K) in the Asian population has made a significant contribution, but the etiology of this disease remains unclear. To further develop the variant landscape of MMD, we performed high-depth whole exome sequencing of 125 unrelated, predominantly nonfamilial, ethnically diverse MMD patients in parallel with 125 internally sequenced, matched controls using the same exome and analysis platform. Three subpopulations were established: Asian, Caucasian, and non-RNF213 founder mutation cases. We provided additional support for the previously observed RNF213 founder mutation (p.R4810K) in Asian cases (P = 6.01×10^−5) that was enriched among East Asians compared to Southeast Asian and Pacific Islander cases (P = 9.52×10^−4) and was absent in all Caucasian cases. The most enriched variant in Caucasian (P = 7.93×10^−4) and non-RNF213 founder mutation (P = 1.51×10^−3) cases was ZXDC (p.P562L), a gene involved in MHC Class II activation. Collapsing variant methodology ranked OBSCN, a gene involved in myofibrillogenesis, as most enriched in Caucasian (P = 1.07×10^−4) and non-RNF213 founder mutation cases (P = 5.31×10^−5). These findings further support the East Asian origins of the RNF213 (p.R4810K) variant and more fully describe the genetic landscape of multiethnic MMD, revealing novel, alternative candidate variants and genes that may be important in MMD etiology and diagnosis. PMID:26530418

  4. Exome Sequencing in an Admixed Isolated Population Indicates NFXL1 Variants Confer a Risk for Specific Language Impairment

    PubMed Central

    Villanueva, Pía; Nudel, Ron; Hoischen, Alexander; Fernández, María Angélica; Simpson, Nuala H.; Gilissen, Christian; Reader, Rose H.; Jara, Lillian; Echeverry, Maria Magdalena; Francks, Clyde; Baird, Gillian; Conti-Ramsden, Gina; O’Hare, Anne; Bolton, Patrick F.; Hennessy, Elizabeth R.; Palomino, Hernán; Carvajal-Carmona, Luis; Veltman, Joris A.; Cazier, Jean-Baptiste; De Barbieri, Zulema

    2015-01-01

    Children affected by Specific Language Impairment (SLI) fail to acquire age appropriate language skills despite adequate intelligence and opportunity. SLI is highly heritable, but the understanding of underlying genetic mechanisms has proved challenging. In this study, we use molecular genetic techniques to investigate an admixed isolated founder population from the Robinson Crusoe Island (Chile), who are affected by a high incidence of SLI, increasing the power to discover contributory genetic factors. We utilize exome sequencing in selected individuals from this population to identify eight coding variants that are of putative significance. We then apply association analyses across the wider population to highlight a single rare coding variant (rs144169475, Minor Allele Frequency of 4.1% in admixed South American populations) in the NFXL1 gene that confers a nonsynonymous change (N150K) and is significantly associated with language impairment in the Robinson Crusoe population (p = 2.04 × 10^−4, 8 variants tested). Subsequent sequencing of NFXL1 in 117 UK SLI cases identified four individuals with heterozygous variants predicted to be of functional consequence. We conclude that coding variants within NFXL1 confer an increased risk of SLI within a complex genetic model. PMID:25781923

  5. Exome sequencing in an admixed isolated population indicates NFXL1 variants confer a risk for specific language impairment.

    PubMed

    Villanueva, Pía; Nudel, Ron; Hoischen, Alexander; Fernández, María Angélica; Simpson, Nuala H; Gilissen, Christian; Reader, Rose H; Jara, Lillian; Echeverry, María Magdalena; Francks, Clyde; Baird, Gillian; Conti-Ramsden, Gina; O'Hare, Anne; Bolton, Patrick F; Hennessy, Elizabeth R; Palomino, Hernán; Carvajal-Carmona, Luis; Veltman, Joris A; Cazier, Jean-Baptiste; De Barbieri, Zulema; Fisher, Simon E; Newbury, Dianne F

    2015-03-01

    Children affected by Specific Language Impairment (SLI) fail to acquire age appropriate language skills despite adequate intelligence and opportunity. SLI is highly heritable, but the understanding of underlying genetic mechanisms has proved challenging. In this study, we use molecular genetic techniques to investigate an admixed isolated founder population from the Robinson Crusoe Island (Chile), who are affected by a high incidence of SLI, increasing the power to discover contributory genetic factors. We utilize exome sequencing in selected individuals from this population to identify eight coding variants that are of putative significance. We then apply association analyses across the wider population to highlight a single rare coding variant (rs144169475, Minor Allele Frequency of 4.1% in admixed South American populations) in the NFXL1 gene that confers a nonsynonymous change (N150K) and is significantly associated with language impairment in the Robinson Crusoe population (p = 2.04 × 10^−4, 8 variants tested). Subsequent sequencing of NFXL1 in 117 UK SLI cases identified four individuals with heterozygous variants predicted to be of functional consequence. We conclude that coding variants within NFXL1 confer an increased risk of SLI within a complex genetic model.

  6. Whole-Genome Sequences of Variants of Bacillus anthracis Sterne and Their Toxin Gene Deletion Mutants

    PubMed Central

    Staab, A.; Plaut, R. D.; Pratt, C.; Lovett, S. P.; Wiley, M. R.; Biggs, T. D.; Bernhards, R. C.; Beck, L. C.; Palacios, G. F.; Stibitz, S.; Jones, K. L.; Goodwin, B. G.; Smith, M. A.

    2017-01-01

    Here, we report the draft genome sequences of three laboratory variants of Bacillus anthracis Sterne and their double (Δlef Δcya) and triple (Δpag Δlef Δcya) toxin gene deletion derivatives. PMID:29122874

  7. Dipeptide Sequence Determination: Analyzing Phenylthiohydantoin Amino Acids by HPLC

    NASA Astrophysics Data System (ADS)

    Barton, Janice S.; Tang, Chung-Fei; Reed, Steven S.

    2000-02-01

    Amino acid composition and sequence determination, important techniques for characterizing peptides and proteins, are essential for predicting conformation and studying sequence alignment. This experiment presents improved, fundamental methods of sequence analysis for an upper-division biochemistry laboratory. Working in pairs, students use the Edman reagent to prepare phenylthiohydantoin derivatives of amino acids for determination of the sequence of an unknown dipeptide. With a single HPLC technique, students identify both the N-terminal amino acid and the composition of the dipeptide. This method yields good precision of retention times and allows use of a broad range of amino acids as components of the dipeptide. Students learn fundamental principles and techniques of sequence analysis and HPLC.

  8. Dealing with the incidental finding of secondary variants by the example of SRNS patients undergoing targeted next-generation sequencing.

    PubMed

    Weber, Stefanie; Büscher, Anja K; Hagmann, Henning; Liebau, Max C; Heberle, Christian; Ludwig, Michael; Rath, Sabine; Alberer, Martin; Beissert, Antje; Zenker, Martin; Hoyer, Peter F; Konrad, Martin; Klein, Hanns-Georg; Hoefele, Julia

    2016-01-01

    Steroid-resistant nephrotic syndrome (SRNS) is a severe cause of progressive renal disease. Genetic forms of SRNS can present with autosomal recessive or autosomal dominant inheritance. Recent studies have identified mutations in multiple podocyte genes responsible for SRNS. Improved sequencing methods (next-generation sequencing, NGS) now promise rapid mutational testing of SRNS genes. In the present study, a simultaneous screening of ten SRNS genes in 37 SRNS patients was performed by NGS. In 38% of the patients, causative mutations in one SRNS gene were found. In 22% of the patients, in addition to these mutations, a secondary variant in a different gene was identified. This high incidence of accumulating sequence variants was unexpected and, although they might have modifier effects, the pathogenic potential of these additional sequence variants remains unclear. The example of molecular diagnostics by NGS in SRNS patients shows that these new sequencing technologies might provide further insight into molecular pathogenicity in genetic disorders but will also generate results that are difficult to interpret and that complicate genetic counseling. Although NGS promises more frequent identification of disease-causing mutations, the identification of causative mutations, the interpretation of incidental findings and possible pitfalls might pose problems, which hopefully will decrease with further experience and elucidation of molecular interactions.

  9. Whole-exome sequencing reveals genetic variants associated with chronic kidney disease characterized by tubulointerstitial damages in North Central Region, Sri Lanka.

    PubMed

    Nanayakkara, Shanika; Senevirathna, S T M L D; Parahitiyawa, Nipuna B; Abeysekera, Tilak; Chandrajith, Rohana; Ratnatunga, Neelakanthi; Hitomi, Toshiaki; Kobayashi, Hatasu; Harada, Kouji H; Koizumi, Akio

    2015-09-01

    The familial clustering observed in chronic kidney disease of uncertain etiology (CKDu) characterized by tubulointerstitial damage in the North Central Region of Sri Lanka strongly suggests the involvement of genetic factors in its pathogenesis. The objective of the present study is to use whole-exome sequencing to identify the genetic variants associated with CKDu. Whole-exome sequencing of eight CKDu cases and eight controls was performed, followed by direct sequencing of candidate loci in 301 CKDu cases and 276 controls. The association study revealed rs34970857 (c.658G>A/p.V220M), located in the KCNA10 gene encoding a voltage-gated K channel, as the most promising SNP, with the highest odds ratio of 1.74. Four rare variants were identified in the gene encoding laminin beta2 (LAMB2), which is known to cause congenital nephrotic syndrome. Three of the four variants in LAMB2 were novel variants found exclusively in cases. Genetic investigations provide strong evidence for the presence of genetic susceptibility to CKDu. The results also suggest that several rare variants associated with CKDu may be present in this population.

  10. Gene-Based Sequencing Identifies Lipid-Influencing Variants with Ethnicity-Specific Effects in African Americans

    PubMed Central

    Bentley, Amy R.; Chen, Guanjie; Shriner, Daniel; Doumatey, Ayo P.; Zhou, Jie; Huang, Hanxia; Mullikin, James C.; Blakesley, Robert W.; Hansen, Nancy F.; Bouffard, Gerard G.; Cherukuri, Praveen F.; Maskeri, Baishali; Young, Alice C.; Adeyemo, Adebowale; Rotimi, Charles N.

    2014-01-01

    Although a considerable proportion of serum lipids loci identified in European ancestry individuals (EA) replicate in African Americans (AA), interethnic differences in the distribution of serum lipids suggest that some genetic determinants differ by ethnicity. We conducted a comprehensive evaluation of five lipid candidate genes to identify variants with ethnicity-specific effects. We sequenced ABCA1, LCAT, LPL, PON1, and SERPINE1 in 48 AA individuals with extreme serum lipid concentrations (high HDLC/low TG or low HDLC/high TG). Identified variants were genotyped in the full population-based sample of AA (n = 1694) and tested for an association with serum lipids. rs328 (LPL) and correlated variants were associated with higher HDLC and lower TG. Interestingly, a stronger effect was observed on a “European” vs. “African” genetic background at this locus. To investigate this effect, we evaluated the region among West Africans (WA). For TG, the effect size among WA was the same in AA with only African local ancestry (2–3% lower TG), while the larger association among AA with local European ancestry matched previous reports in EA (10%). For HDLC, there was no association with rs328 in AA with only African local ancestry or in WA, while the association among AA with European local ancestry was much greater than what has been observed for EA (15 vs. ∼5 mg/dl), suggesting an interaction with an environmental or genetic factor that differs by ethnicity. Beyond this ancestry effect, the importance of African ancestry-focused, sequence-based work was also highlighted by serum lipid associations of variants that were in higher frequency (or present only) among those of African ancestry. By beginning our study with the sequence variation present in AA individuals, investigating local ancestry effects, and seeking replication in WA, we were able to comprehensively evaluate the role of a set of candidate genes in serum lipids in AA. PMID:24603370

  11. 77 FR 65537 - Requirements for Patent Applications Containing Nucleotide Sequence and/or Amino Acid Sequence...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-10-29

    ... DEPARTMENT OF COMMERCE Patent and Trademark Office Requirements for Patent Applications Containing Nucleotide Sequence and/or Amino Acid Sequence Disclosures ACTION: Proposed collection; comment request... Patent applications that contain nucleotide and/or amino acid sequence disclosures must include a copy of...

  12. High speed nucleic acid sequencing

    DOEpatents

    Korlach, Jonas [Ithaca, NY]; Webb, Watt W [Ithaca, NY]; Levene, Michael [Ithaca, NY]; Turner, Stephen [Ithaca, NY]; Craighead, Harold G [Ithaca, NY]; Foquet, Mathieu [Ithaca, NY]

    2011-05-17

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid. Each type of labeled nucleotide comprises an acceptor fluorophore attached to a phosphate portion of the nucleotide such that the fluorophore is removed upon incorporation into a growing strand. A fluorescent signal is emitted via fluorescence resonance energy transfer between a donor fluorophore and the acceptor fluorophore as each nucleotide is incorporated into the growing strand. The sequence is deduced by identifying which base is being incorporated into the growing strand.

  13. The IBO germination quantitative trait locus encodes a phosphatase 2C-related variant with a nonsynonymous amino acid change that interferes with abscisic acid signaling.

    PubMed

    Amiguet-Vercher, Amélia; Santuari, Luca; Gonzalez-Guzman, Miguel; Depuydt, Stephen; Rodriguez, Pedro L; Hardtke, Christian S

    2015-02-01

    Natural genetic variation is crucial for adaptability of plants to different environments. Seed dormancy prevents precocious germination in unsuitable conditions and is an adaptation to a major macro-environmental parameter, the seasonal variation in temperature and day length. Here we report the isolation of IBO, a quantitative trait locus (QTL) that governs c. 30% of germination rate variance in an Arabidopsis recombinant inbred line (RIL) population derived from the parental accessions Eilenburg-0 (Eil-0) and Loch Ness-0 (Lc-0). IBO encodes an uncharacterized phosphatase 2C-related protein, but neither the Eil-0 nor the Lc-0 variant, which differ in a single amino acid, has any appreciable phosphatase activity in in vitro assays. However, we found that the amino acid change in the Lc-0 variant of the IBO protein confers reduced germination rate. Moreover, unlike the Eil-0 variant of the protein, the Lc-0 variant can interfere with the activity of the phosphatase 2C ABSCISIC ACID INSENSITIVE 1 in vitro. This suggests that the Lc-0 variant possibly interferes with abscisic acid signaling, a notion that is supported by physiological assays. Thus, we isolated an example of a QTL allele with a nonsynonymous amino acid change that might mediate local adaptation of seed germination timing. © 2014 The Authors. New Phytologist © 2014 New Phytologist Trust.

  14. Molecular Cloning and Expression of Sequence Variants of Manganese Superoxide Dismutase Genes from Wheat

    USDA-ARS's Scientific Manuscript database

    Reactive oxygen species (ROS) are very harmful to living organisms due to the potential oxidation of membrane lipids, DNA, proteins, and carbohydrates. Transformed E.coli strain QC 871, superoxide dismutase (SOD) double-mutant, with three sequence variant MnSOD1, MnSOD2, and MnSOD3 manganese supero...

  15. RefCNV: Identification of Gene-Based Copy Number Variants Using Whole Exome Sequencing.

    PubMed

    Chang, Lun-Ching; Das, Biswajit; Lih, Chih-Jian; Si, Han; Camalier, Corinne E; McGregor, Paul M; Polley, Eric

    2016-01-01

    With rapid advances in DNA sequencing technologies, whole exome sequencing (WES) has become a popular approach for detecting somatic mutations in oncology studies. The initial intent of WES was to characterize single nucleotide variants, but it was observed that the number of sequencing reads that mapped to a genomic region correlated with the DNA copy number variants (CNVs). We propose a method RefCNV that uses a reference set to estimate the distribution of the coverage for each exon. The construction of the reference set includes an evaluation of the sources of variability in the coverage distribution. We observed that the processing steps had an impact on the coverage distribution. For each exon, we compared the observed coverage with the expected normal coverage. Thresholds for determining CNVs were selected to control the false-positive error rate. RefCNV prediction correlated significantly (r = 0.96-0.86) with CNV measured by digital polymerase chain reaction for MET (7q31), EGFR (7p12), or ERBB2 (17q12) in 13 tumor cell lines. The genome-wide CNV analysis showed a good overall correlation (Spearman's coefficient = 0.82) between RefCNV estimation and publicly available CNV data in Cancer Cell Line Encyclopedia. RefCNV also showed better performance than three other CNV estimation methods in genome-wide CNV analysis.
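
    The comparison of observed and expected per-exon coverage described in this abstract can be illustrated with a minimal sketch. It is not the published RefCNV method: the z-score statistic, the threshold of 3.0, the function name call_exon_cnv, and the exon labels are all assumptions made for illustration, and coverage values are assumed to be normalized already.

```python
from statistics import mean, stdev


def call_exon_cnv(reference_coverage, observed_coverage, z_threshold=3.0):
    """Label each exon 'gain', 'loss', 'normal', or 'no_call'.

    reference_coverage: dict mapping exon -> list of normalized coverage
        values from reference (assumed copy-neutral) samples.
    observed_coverage: dict mapping exon -> normalized coverage in the
        test sample.
    z_threshold: hypothetical cut-off; a stricter value lowers the
        false-positive rate at the cost of sensitivity.
    """
    calls = {}
    for exon, ref_values in reference_coverage.items():
        mu = mean(ref_values)
        sigma = stdev(ref_values)
        if sigma == 0:
            # No variability in the reference set, so no z-score exists.
            calls[exon] = "no_call"
            continue
        z = (observed_coverage[exon] - mu) / sigma
        if z > z_threshold:
            calls[exon] = "gain"
        elif z < -z_threshold:
            calls[exon] = "loss"
        else:
            calls[exon] = "normal"
    return calls


# Hypothetical exon labels and coverage values, for illustration only.
reference = {
    "exon_A": [1.0, 1.1, 0.9, 1.0, 1.05],
    "exon_B": [1.0, 1.1, 0.9, 1.0, 1.05],
    "exon_C": [1.0, 1.1, 0.9, 1.0, 1.05],
}
observed = {"exon_A": 2.0, "exon_B": 1.02, "exon_C": 0.5}
calls = call_exon_cnv(reference, observed)
```

    In this sketch the threshold plays the role that the abstract assigns to thresholds selected to control the false-positive error rate; how RefCNV actually derives them is not stated in the abstract.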

  16. Exome sequence analysis and follow up genotyping implicates rare ULK1 variants to be involved in susceptibility to schizophrenia

    PubMed Central

    Al Eissa, Mariam M.; Fiorentino, Alessia; Sharp, Sally I.; O'Brien, Niamh L.; Wolfe, Kate; Giaroli, Giovanni; Curtis, David; Bass, Nicholas J.

    2017-01-01

    Schizophrenia (SCZ) is a severe, highly heritable psychiatric disorder. Elucidation of the genetic architecture of the disorder will facilitate greater understanding of the altered underlying neurobiological mechanisms. The aim of this study was to identify likely aetiological variants in subjects affected with SCZ. Exome sequence data from a SCZ case–control sample from Sweden was analysed for likely aetiological variants using a weighted burden test. Suggestive evidence implicated the UNC‐51‐like kinase (ULK1) gene, and it was observed that four rare variants that were more common in the Swedish SCZ cases were also more common in UK10K SCZ cases, as compared to obesity cases. These three missense variants and one intronic variant were genotyped in the University College London cohort of 1304 SCZ cases and 1348 ethnically matched controls. All four variants were more common in the SCZ cases than controls and combining them produced a result significant at P = 0.02. The results presented here demonstrate the importance of following up exome sequencing studies using additional datasets. The roles of ULK1 in autophagy and mTOR signalling strengthen the case that these pathways may be important in the pathophysiology of SCZ. The findings reported here await independent replication. PMID:29148569

  17. Genotype-phenotype correlations in individuals with pathogenic RERE variants.

    PubMed

    Jordan, Valerie K; Fregeau, Brieana; Ge, Xiaoyan; Giordano, Jessica; Wapner, Ronald J; Balci, Tugce B; Carter, Melissa T; Bernat, John A; Moccia, Amanda N; Srivastava, Anshika; Martin, Donna M; Bielas, Stephanie L; Pappas, John; Svoboda, Melissa D; Rio, Marlène; Boddaert, Nathalie; Cantagrel, Vincent; Lewis, Andrea M; Scaglia, Fernando; Kohler, Jennefer N; Bernstein, Jonathan A; Dries, Annika M; Rosenfeld, Jill A; DeFilippo, Colette; Thorson, Willa; Yang, Yaping; Sherr, Elliott H; Bi, Weimin; Scott, Daryl A

    2018-05-01

    Heterozygous variants in the arginine-glutamic acid dipeptide repeats gene (RERE) have been shown to cause neurodevelopmental disorder with or without anomalies of the brain, eye, or heart (NEDBEH). Here, we report nine individuals with NEDBEH who carry partial deletions or deleterious sequence variants in RERE. These variants were found to be de novo in all cases in which parental samples were available. An analysis of data from individuals with NEDBEH suggests that point mutations affecting the Atrophin-1 domain of RERE are associated with an increased risk of structural eye defects, congenital heart defects, renal anomalies, and sensorineural hearing loss when compared with loss-of-function variants that are likely to lead to haploinsufficiency. A high percentage of RERE pathogenic variants affect a histidine-rich region in the Atrophin-1 domain. We have also identified a recurrent two-amino-acid duplication in this region that is associated with the development of a CHARGE syndrome-like phenotype. We conclude that mutations affecting RERE result in a spectrum of clinical phenotypes. Genotype-phenotype correlations exist and can be used to guide medical decision making. Consideration should also be given to screening for RERE variants in individuals who fulfill diagnostic criteria for CHARGE syndrome but do not carry pathogenic variants in CHD7. © 2018 Wiley Periodicals, Inc.

  18. Houston Methodist Variant Viewer: An Application to Support Clinical Laboratory Interpretation of Next-generation Sequencing Data for Cancer

    PubMed Central

    Christensen, Paul A.; Ni, Yunyun; Bao, Feifei; Hendrickson, Heather L.; Greenwood, Michael; Thomas, Jessica S.; Long, S. Wesley; Olsen, Randall J.

    2017-01-01

    Introduction: Next-generation sequencing (NGS) is increasingly used in clinical and research protocols for patients with cancer. NGS assays are routinely used in clinical laboratories to detect mutations bearing on cancer diagnosis, prognosis and personalized therapy. A typical assay may interrogate 50 or more gene targets that encompass many thousands of possible gene variants. Analysis of NGS data in cancer is a labor-intensive process that can become overwhelming to the molecular pathologist or research scientist. Although commercial tools for NGS data analysis and interpretation are available, they are often costly, lack key functionality or cannot be customized by the end user. Methods: To facilitate NGS data analysis in our clinical molecular diagnostics laboratory, we created a custom bioinformatics tool termed Houston Methodist Variant Viewer (HMVV). HMVV is a Java-based solution that integrates sequencing instrument output, bioinformatics analysis, storage resources and end user interface. Results: Compared to the predicate method used in our clinical laboratory, HMVV markedly simplifies the bioinformatics workflow for the molecular technologist and facilitates the variant review by the molecular pathologist. Importantly, HMVV reduces time spent researching the biological significance of the variants detected, standardizes the online resources used to perform the variant investigation and assists generation of the annotated report for the electronic medical record. HMVV also maintains a searchable variant database, including the variant annotations generated by the pathologist, which is useful for downstream quality improvement and research projects. Conclusions: HMVV is a clinical grade, low-cost, feature-rich, highly customizable platform that we have made available for continued development by the pathology informatics community. PMID:29226007

  19. Association between sequence variants in panicle development genes and the number of spikelets per panicle in rice.

    PubMed

    Jang, Su; Lee, Yunjoo; Lee, Gileung; Seo, Jeonghwan; Lee, Dongryung; Yu, Yoye; Chin, Joong Hyoun; Koh, Hee-Jong

    2018-01-15

    Balancing panicle-related traits such as panicle length and the numbers of primary and secondary branches per panicle is key to improving the number of spikelets per panicle in rice. Identifying genetic information contributes to a broader understanding of the roles of genes and provides candidate alleles for use as DNA markers. Discovering relations between panicle-related traits and sequence variants creates opportunities for molecular applications in rice breeding to improve the number of spikelets per panicle. In total, 142 polymorphic sites, which constructed 58 haplotypes, were detected in coding regions of ten panicle development genes, and 35 sequence variants in six genes were significantly associated with panicle-related traits. Rice cultivars were clustered according to their sequence variant profiles. One of the four resultant clusters, which contained only indica and tong-il varieties, exhibited the largest average number of favorable alleles and highest average number of spikelets per panicle, suggesting that the favorable allele combination found in this cluster was beneficial in increasing the number of spikelets per panicle. Favorable alleles identified in this study can be used to develop functional markers for rice breeding programs. Furthermore, stacking several favorable alleles has the potential to substantially improve the number of spikelets per panicle in rice.

  20. Vascular Ehlers–Danlos Syndrome in siblings with biallelic COL3A1 sequence variants and marked clinical variability in the extended family

    PubMed Central

    Jørgensen, Agnete; Fagerheim, Toril; Rand-Hendriksen, Svend; Lunde, Per I; Vorren, Torgrim O; Pepin, Melanie G; Leistritz, Dru F; Byers, Peter H

    2015-01-01

    Vascular Ehlers–Danlos Syndrome (vEDS), also known as EDS type IV, is considered to be an autosomal dominant disorder caused by sequence variants in COL3A1, which encodes the chains of type III procollagen. We identified a family in which there was marked clinical variation with the earliest death due to extensive aortic dissection at age 15 years and other family members in their eighties with no complications. The proband was born with right-sided clubfoot but was otherwise healthy until he died unexpectedly at 15 years. His sister, in addition to signs consistent with vascular EDS, had bilateral frontal and parietal polymicrogyria. The proband and his sister each had two COL3A1 sequence variants, c.1786C>T, p.(Arg596*) in exon 26 and c.3851G>A, p.(Gly1284Glu) in exon 50 on different alleles. Cells from the compound heterozygote produced a reduced amount of type III procollagen, all the chains of which had abnormal electrophoretic mobility. Biallelic sequence variants have a significantly worse outcome than heterozygous variants for either null mutations or missense mutations, and frontoparietal polymicrogyria may be an added phenotype feature. This genetic constellation provides a very rare explanation for marked intrafamilial clinical variation due to sequence variants in COL3A1. PMID:25205403

  1. CEP72-ROS1: A novel ROS1 oncogenic fusion variant in lung adenocarcinoma identified by next-generation sequencing.

    PubMed

    Zhu, You-Cai; Zhou, Yue-Fen; Wang, Wen-Xian; Xu, Chun-Wei; Zhuang, Wu; Du, Kai-Qi; Chen, Gang

    2018-05-01

    ROS1 rearrangement is a validated therapeutic driver gene in non-small cell lung cancer (NSCLC) and represents a small subset (1-2%) of NSCLC. A total of 17 different fusion partner genes of ROS1 in NSCLC have been reported. The multi-targeted MET/ALK/ROS1 tyrosine kinase inhibitor (TKI) crizotinib has demonstrated remarkable efficacy in ROS1-rearranged NSCLC. Consequently, ROS1 detection assays include fluorescence in situ hybridization, immunohistochemistry, and real-time PCR. Next-generation sequencing (NGS) assay covers a range of fusion genes and approaches to discover novel receptor-kinase rearrangements in lung cancer. A 63-year-old male smoker with stage IV NSCLC (TxNxM1) was detected with a novel ROS1 fusion. Histological examination of the tumor showed lung adenocarcinoma. NGS analysis of the hydrothorax cellblocks revealed a novel CEP72-ROS1 rearrangement. This novel CEP72-ROS1 fusion variant is generated by the fusion of exons 1-11 of CEP72 on chromosome 5p15 to exons 23-43 of ROS1 on chromosome 6q22. The predicted CEP72-ROS1 protein product contains 1202 amino acids comprising the N-terminal amino acids 594-647 of CEP72 and C-terminal amino acid 1-1148 of ROS1. CEP72-ROS1 is a novel ROS1 fusion variant in NSCLC discovered by NGS and could be included in ROS1 detection assay, such as reverse transcription PCR. Pleural effusion samples show good diagnostic performance in clinical practice. © 2018 The Authors. Thoracic Cancer published by China Lung Oncology Group and John Wiley & Sons Australia, Ltd.

  2. Next generation sequencing identifies abnormal Y chromosome and candidate causal variants in premature ovarian failure patients.

    PubMed

    Lee, Yujung; Kim, Changshin; Park, YoungJoon; Pyun, Jung-A; Kwack, KyuBum

    2016-12-01

    Premature ovarian failure (POF) is characterized by heterogeneous genetic causes such as chromosomal abnormalities and variants in causal genes. Recent technical developments have made it possible to use next-generation sequencing (NGS) to detect genome-wide variants, including chromosomal abnormalities. Among 37 Korean POF patients, an XY karyotype with deletions of the distal part of the Y chromosome, Yp11.32-31 and the Yp12 end part, was observed in two patients through NGS. Six deleterious variants in POF genes were also detected, which might explain the pathogenesis of POF with abnormalities in the sex chromosomes. Additionally, the two POF patients had no mutation in SRY, but three non-synonymous variants were detected in genes related to sex reversal. These findings suggest candidate causes of POF and sex reversal and show the suitability of NGS for approaching the heterogeneous pathogenesis of POF. Copyright © 2016 Elsevier Inc. All rights reserved.

  3. Single nucleotide variants and InDels identified from whole-genome re-sequencing of Guzerat, Gyr, Girolando and Holstein cattle breeds.

    PubMed

    Stafuzza, Nedenia Bonvino; Zerlotini, Adhemar; Lobo, Francisco Pereira; Yamagishi, Michel Eduardo Beleza; Chud, Tatiane Cristina Seleguim; Caetano, Alexandre Rodrigues; Munari, Danísio Prado; Garrick, Dorian J; Machado, Marco Antonio; Martins, Marta Fonseca; Carvalho, Maria Raquel; Cole, John Bruce; Barbosa da Silva, Marcos Vinicius Gualberto

    2017-01-01

    Whole-genome re-sequencing, alignment and annotation analyses were undertaken for 12 sires representing four important cattle breeds in Brazil: Guzerat (multi-purpose), Gyr, Girolando and Holstein (dairy production). A total of approximately 4.3 billion reads from an Illumina HiSeq 2000 sequencer generated 10.7- to 16.4-fold genome coverage for each animal. A total of 27,441,279 single nucleotide variations (SNVs) and 3,828,041 insertions/deletions (InDels) were detected in the samples, of which 2,557,670 SNVs and 883,219 InDels were novel. The submission of these genetic variants to the dbSNP database significantly increased the number of known variants, particularly for the indicine genome. The concordance rate between genotypes obtained using the Bovine HD BeadChip array and the same variants identified by sequencing was about 99.05%. The annotation of variants identified numerous non-synonymous SNVs and frameshift InDels which could affect phenotypic variation. Functional enrichment analysis revealed that variants in the olfactory transduction pathway were overrepresented in all four cattle breeds, while the ECM-receptor interaction pathway was overrepresented in the Girolando and Guzerat breeds, the ABC transporters pathway was overrepresented only in the Holstein breed, and metabolic pathways were overrepresented only in the Gyr breed. The genetic variants discovered here provide a rich resource to help identify potential genomic markers and their associated molecular mechanisms that impact economically important traits for Gyr, Girolando, Guzerat and Holstein breeding programs.

  4. Single nucleotide variants and InDels identified from whole-genome re-sequencing of Guzerat, Gyr, Girolando and Holstein cattle breeds

    PubMed Central

    Lobo, Francisco Pereira; Yamagishi, Michel Eduardo Beleza; Chud, Tatiane Cristina Seleguim; Caetano, Alexandre Rodrigues; Munari, Danísio Prado; Garrick, Dorian J.; Machado, Marco Antonio; Martins, Marta Fonseca; Carvalho, Maria Raquel; Cole, John Bruce; Barbosa da Silva, Marcos Vinicius Gualberto

    2017-01-01

    Whole-genome re-sequencing, alignment and annotation analyses were undertaken for 12 sires representing four important cattle breeds in Brazil: Guzerat (multi-purpose), Gyr, Girolando and Holstein (dairy production). A total of approximately 4.3 billion reads from an Illumina HiSeq 2000 sequencer generated 10.7- to 16.4-fold genome coverage for each animal. A total of 27,441,279 single nucleotide variations (SNVs) and 3,828,041 insertions/deletions (InDels) were detected in the samples, of which 2,557,670 SNVs and 883,219 InDels were novel. The submission of these genetic variants to the dbSNP database significantly increased the number of known variants, particularly for the indicine genome. The concordance rate between genotypes obtained using the Bovine HD BeadChip array and the same variants identified by sequencing was about 99.05%. The annotation of variants identified numerous non-synonymous SNVs and frameshift InDels which could affect phenotypic variation. Functional enrichment analysis revealed that variants in the olfactory transduction pathway were overrepresented in all four cattle breeds, while the ECM-receptor interaction pathway was overrepresented in the Girolando and Guzerat breeds, the ABC transporters pathway was overrepresented only in the Holstein breed, and metabolic pathways were overrepresented only in the Gyr breed. The genetic variants discovered here provide a rich resource to help identify potential genomic markers and their associated molecular mechanisms that impact economically important traits for Gyr, Girolando, Guzerat and Holstein breeding programs. PMID:28323836

  5. The Clinical Next-Generation Sequencing Database: A Tool for the Unified Management of Clinical Information and Genetic Variants to Accelerate Variant Pathogenicity Classification.

    PubMed

    Nishio, Shin-Ya; Usami, Shin-Ichi

    2017-03-01

    Recent advances in next-generation sequencing (NGS) have given rise to new challenges due to the difficulties in variant pathogenicity interpretation and large dataset management, including many kinds of public population databases as well as public or commercial disease-specific databases. Here, we report a new database development tool, named the "Clinical NGS Database," for improving clinical NGS workflow through the unified management of variant information and clinical information. This database software offers a two-feature approach to variant pathogenicity classification. The first of these approaches is a phenotype similarity-based approach. This database allows the easy comparison of the detailed phenotype of each patient with the average phenotype of the same gene mutation at the variant or gene level. It is also possible to browse patients with the same gene mutation quickly. The other approach is a statistical approach to variant pathogenicity classification based on the use of the odds ratio for comparisons between the case and the control for each inheritance mode (families with apparently autosomal dominant inheritance vs. control, and families with apparently autosomal recessive inheritance vs. control). A number of case studies are also presented to illustrate the utility of this database. © 2016 The Authors. Human Mutation published by Wiley Periodicals, Inc.
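
    The odds-ratio comparison between cases and controls mentioned in this abstract can be sketched as follows. This is a generic illustration, not the database's own implementation: the function name odds_ratio, the 2x2 layout of carrier counts, and the use of a 0.5 continuity correction when a cell is zero are assumptions, and the counts are invented.

```python
def odds_ratio(case_carriers, case_noncarriers,
               control_carriers, control_noncarriers):
    """Odds ratio for carrying a variant in cases versus controls.

    If any cell of the 2x2 table is zero, 0.5 is added to every cell
    (Haldane-Anscombe correction) so that the ratio stays finite. This
    correction is an assumption of the sketch, not something stated in
    the abstract.
    """
    cells = [case_carriers, case_noncarriers,
             control_carriers, control_noncarriers]
    if 0 in cells:
        cells = [count + 0.5 for count in cells]
    a, b, c, d = cells
    return (a * d) / (b * c)


# Invented counts: 10 of 100 case families carry the variant,
# compared with 2 of 200 controls.
or_common = odds_ratio(10, 90, 2, 198)

# A variant absent from controls triggers the continuity correction.
or_absent_in_controls = odds_ratio(5, 95, 0, 200)
```

    Following the abstract, such a ratio would be computed separately for each inheritance mode, for example once for families with apparently autosomal dominant inheritance against controls and once for apparently autosomal recessive families against controls.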

  6. Screening for single nucleotide variants, small indels and exon deletions with a next-generation sequencing based gene panel approach for Usher syndrome

    PubMed Central

    Krawitz, Peter M; Schiska, Daniela; Krüger, Ulrike; Appelt, Sandra; Heinrich, Verena; Parkhomchuk, Dmitri; Timmermann, Bernd; Millan, Jose M; Robinson, Peter N; Mundlos, Stefan; Hecht, Jochen; Gross, Manfred

    2014-01-01

    Usher syndrome is an autosomal recessive disorder characterized by both deafness and blindness. For the three clinical subtypes of Usher syndrome, causal mutations in altogether 12 genes and a modifier gene have been identified. Due to the genetic heterogeneity of Usher syndrome, the molecular analysis is well suited to a comprehensive and parallelized analysis of all known genes by next-generation sequencing (NGS) approaches. We describe here the targeted enrichment and deep sequencing for exons of Usher genes and compare the costs and workload of this approach with those of Sanger sequencing. We also present a bioinformatics analysis pipeline that allows us to detect single-nucleotide variants, short insertions and deletions, as well as copy number variations of one or more exons on the same sequence data. Additionally, we present a flexible in silico gene panel for the analysis of sequence variants, in which newly identified genes can easily be included. We applied this approach to a cohort of 44 Usher patients and detected biallelic pathogenic mutations in 35 individuals and monoallelic mutations in eight individuals of our cohort. Thirty-nine of the sequence variants, including two heterozygous deletions comprising several exons of USH2A, have not been reported so far. Our NGS-based approach allowed us to assess single-nucleotide variants, small indels, and whole exon deletions in a single test. The described diagnostic approach is fast and cost-effective with a high molecular diagnostic yield. PMID:25333064

  7. Screening for single nucleotide variants, small indels and exon deletions with a next-generation sequencing based gene panel approach for Usher syndrome.

    PubMed

    Krawitz, Peter M; Schiska, Daniela; Krüger, Ulrike; Appelt, Sandra; Heinrich, Verena; Parkhomchuk, Dmitri; Timmermann, Bernd; Millan, Jose M; Robinson, Peter N; Mundlos, Stefan; Hecht, Jochen; Gross, Manfred

    2014-09-01

    Usher syndrome is an autosomal recessive disorder characterized by both deafness and blindness. For the three clinical subtypes of Usher syndrome, causal mutations in altogether 12 genes and a modifier gene have been identified. Due to the genetic heterogeneity of Usher syndrome, the molecular analysis is well suited to a comprehensive and parallelized analysis of all known genes by next-generation sequencing (NGS) approaches. We describe here the targeted enrichment and deep sequencing for exons of Usher genes and compare the costs and workload of this approach with those of Sanger sequencing. We also present a bioinformatics analysis pipeline that allows us to detect single-nucleotide variants, short insertions and deletions, as well as copy number variations of one or more exons on the same sequence data. Additionally, we present a flexible in silico gene panel for the analysis of sequence variants, in which newly identified genes can easily be included. We applied this approach to a cohort of 44 Usher patients and detected biallelic pathogenic mutations in 35 individuals and monoallelic mutations in eight individuals of our cohort. Thirty-nine of the sequence variants, including two heterozygous deletions comprising several exons of USH2A, have not been reported so far. Our NGS-based approach allowed us to assess single-nucleotide variants, small indels, and whole exon deletions in a single test. The described diagnostic approach is fast and cost-effective with a high molecular diagnostic yield.

  8. Structural analysis of an HLA-B27 functional variant, B27d detected in American blacks

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rojo, S.; Aparicio, P.; Hansen, J.A.

    1987-11-15

    The structure of a new functional variant, B27d, has been established by comparative peptide mapping and radiochemical sequencing. This analysis completes the structural characterization of the six known histocompatibility leukocyte antigen (HLA)-B27 subtypes. The only detected amino acid change between the main HLA-B27.1 subtype and B27d is that of Tyr59 to His59. Position 59 has not been previously found to vary among class I HLA or H-2 antigens. Such substitution accounts for the reported isoelectric focusing pattern of this variant. HLA-B27d is the only B27 variant found to differ from other subtypes by a single amino acid replacement. The nature of the change is compatible with its origin by a point mutation from HLA-B27.1. Because B27d was found only in American blacks and in no other ethnic groups, it is suggested that this variant originated as a result of a mutation of the B27.1 gene that occurred within the black population. Structural analysis of B27d was done by comparative mapping. Radiochemical sequencing was carried out with ¹⁴C-labeled and ³H-labeled amino acids.

  9. Genetic polymorphisms in Na+-taurocholate co-transporting polypeptide (NTCP) and ileal apical sodium-dependent bile acid transporter (ASBT) and ethnic comparisons of functional variants of NTCP among Asian populations.

    PubMed

    Pan, Wei; Song, Im-Sook; Shin, Ho-Jung; Kim, Min-Hye; Choi, Yeong-Lim; Lim, Su-Jeong; Kim, Woo-Young; Lee, Sang-Seop; Shin, Jae-Gook

    2011-06-01

    Genetic variants of Na(+)-taurocholate co-transporting polypeptide (NTCP; SLC10A1) and ileal apical sodium-dependent bile acid transporter (ASBT; SLC10A2), which greatly contribute to bile acid homeostasis, were extensively explored in the Korean population and functional variants of NTCP were compared among Asian populations. From direct DNA sequencing, six SNPs were identified in the SLC10A1 gene and 14 SNPs in the SLC10A2 gene. Three of seven coding variants were non-synonymous SNPs: two variants from SLC10A1 (A64T, S267F) and one from SLC10A2 (A171S). No linkage was analysed in the SLC10A1 gene because of low frequencies of genetic variants, and the SLC10A2 gene was composed of two separated linkage disequilibrium blocks contrary to the white population. The stably transfected NTCP-A64T variant showed significantly decreased uptakes of taurocholate and rosuvastatin compared with wild-type NTCP. Decreased taurocholate uptake and increased rosuvastatin uptake were observed in the NTCP-S267F variant. The allele frequencies of these functional variants were 1.0% and 3.1%, respectively, in a Korean population. However, NTCP-A64T was not found in Chinese and Vietnamese subjects. The frequency distribution of NTCP-S267F in Koreans was significantly lower than those in Chinese and Vietnamese populations. Our data suggest that NTCP-A64T and -S267F variants cause substrate-dependent functional change in vitro, and show ethnic differences in their allelic frequencies among Asian populations, although the clinical relevance of these variants remains to be evaluated.

  10. Kit for detecting nucleic acid sequences using competitive hybridization probes

    DOEpatents

    Lucas, Joe N.; Straume, Tore; Bogen, Kenneth T.

    2001-01-01

    A kit is provided for detecting a target nucleic acid sequence in a sample, the kit comprising: a first hybridization probe which includes a nucleic acid sequence that is sufficiently complementary to selectively hybridize to a first portion of the target sequence, the first hybridization probe including a first complexing agent for forming a binding pair with a second complexing agent; and a second hybridization probe which includes a nucleic acid sequence that is sufficiently complementary to selectively hybridize to a second portion of the target sequence to which the first hybridization probe does not selectively hybridize, the second hybridization probe including a detectable marker; a third hybridization probe which includes a nucleic acid sequence that is sufficiently complementary to selectively hybridize to a first portion of the target sequence, the third hybridization probe including the same detectable marker as the second hybridization probe; and a fourth hybridization probe which includes a nucleic acid sequence that is sufficiently complementary to selectively hybridize to a second portion of the target sequence to which the third hybridization probe does not selectively hybridize, the fourth hybridization probe including the first complexing agent for forming a binding pair with the second complexing agent; wherein the first and second hybridization probes are capable of simultaneously hybridizing to the target sequence and the third and fourth hybridization probes are capable of simultaneously hybridizing to the target sequence, the detectable marker is not present on the first or fourth hybridization probes and the first, second, third, and fourth hybridization probes each include a competitive nucleic acid sequence which is sufficiently complementary to a third portion of the target sequence that the competitive sequences of the first, second, third, and fourth hybridization probes compete with each other to hybridize to the third portion of 
the

  11. Prevalence of pathogenic germline variants detected by multigene sequencing in unselected Japanese patients with ovarian cancer

    PubMed Central

    Hirasawa, Akira; Imoto, Issei; Naruto, Takuya; Akahane, Tomoko; Yamagami, Wataru; Nomura, Hiroyuki; Masuda, Kiyoshi; Susumu, Nobuyuki; Tsuda, Hitoshi; Aoki, Daisuke

    2017-01-01

    Pathogenic germline BRCA1, BRCA2 (BRCA1/2), and several other gene variants predispose women to primary ovarian, fallopian tube, and peritoneal carcinoma (OC), although variant frequency and relevance information is scarce in Japanese women with OC. Using targeted panel sequencing, we screened 230 unselected Japanese women with OC from our hospital-based cohort for pathogenic germline variants in 75 or 79 OC-associated genes. Pathogenic variants of 11 genes were identified in 41 (17.8%) women: 19 (8.3%; BRCA1), 8 (3.5%; BRCA2), 6 (2.6%; mismatch repair genes), 3 (1.3%; RAD51D), 2 (0.9%; ATM), 1 (0.4%; MRE11A), 1 (FANCC), and 1 (GABRA6). Carriers of BRCA1/2 or any other tested gene pathogenic variants were more likely to be diagnosed younger, have first or second-degree relatives with OC, and have OC classified as high-grade serous carcinoma (HGSC). After adjustment for these variables, all 3 features were independent predictive factors for pathogenic variants in any tested genes whereas only the latter two remained for variants in BRCA1/2. Our data indicate similar variant prevalence in Japanese patients with OC and other ethnic groups and suggest that HGSC and OC family history may facilitate genetic predisposition prediction in Japanese patients with OC and referring high-risk patients for genetic counseling and testing. PMID:29348823

  12. Prevalence of pathogenic germline variants detected by multigene sequencing in unselected Japanese patients with ovarian cancer.

    PubMed

    Hirasawa, Akira; Imoto, Issei; Naruto, Takuya; Akahane, Tomoko; Yamagami, Wataru; Nomura, Hiroyuki; Masuda, Kiyoshi; Susumu, Nobuyuki; Tsuda, Hitoshi; Aoki, Daisuke

    2017-12-22

    Pathogenic germline BRCA1, BRCA2 (BRCA1/2), and several other gene variants predispose women to primary ovarian, fallopian tube, and peritoneal carcinoma (OC), although variant frequency and relevance information is scarce in Japanese women with OC. Using targeted panel sequencing, we screened 230 unselected Japanese women with OC from our hospital-based cohort for pathogenic germline variants in 75 or 79 OC-associated genes. Pathogenic variants of 11 genes were identified in 41 (17.8%) women: 19 (8.3%; BRCA1), 8 (3.5%; BRCA2), 6 (2.6%; mismatch repair genes), 3 (1.3%; RAD51D), 2 (0.9%; ATM), 1 (0.4%; MRE11A), 1 (FANCC), and 1 (GABRA6). Carriers of BRCA1/2 or any other tested gene pathogenic variants were more likely to be diagnosed younger, have first or second-degree relatives with OC, and have OC classified as high-grade serous carcinoma (HGSC). After adjustment for these variables, all 3 features were independent predictive factors for pathogenic variants in any tested genes whereas only the latter two remained for variants in BRCA1/2. Our data indicate similar variant prevalence in Japanese patients with OC and other ethnic groups and suggest that HGSC and OC family history may facilitate genetic predisposition prediction in Japanese patients with OC and referring high-risk patients for genetic counseling and testing.

  13. Combined variants in factor VIII and prostaglandin synthase-1 amplify hemorrhage severity across three generations of descendants.

    PubMed

    Nance, D; Campbell, R A; Rowley, J W; Downie, J M; Jorde, L B; Kahr, W H; Mereby, S A; Tolley, N D; Zimmerman, G A; Weyrich, A S; Rondina, M T

    2016-11-01

    Essentials: Co-existent damaging variants are likely to cause more severe bleeding and may go undiagnosed. We determined pathogenic variants in a three-generational pedigree with excessive bleeding. Bleeding occurred with concurrent variants in prostaglandin synthase-1 (PTGS-1) and factor VIII. The PTGS-1 variant was associated with functional defects in the arachidonic acid pathway. Background: Inherited human variants that concurrently cause disorders of primary hemostasis and coagulation are uncommon. Nevertheless, rare cases of co-existent damaging variants are likely to cause more severe bleeding and may go undiagnosed. Objective: We prospectively sought to determine pathogenic variants in a three-generational pedigree with excessive bleeding. Patients/methods: Platelet number, size and light transmission aggregometry to multiple agonists were evaluated in pedigree members. Transmission electron microscopy determined platelet morphology and granule content. Thromboxane release studies and light transmission aggregometry in the presence or absence of prostaglandin G2 assessed specific functional defects in the arachidonic acid pathway. Whole exome sequencing (WES) and targeted nucleotide sequence analysis identified potentially deleterious variants. Results: Pedigree members with excessive bleeding had impaired platelet aggregation with arachidonic acid, epinephrine and low-dose ADP, as well as reduced platelet thromboxane B2 release. Impaired platelet aggregation in response to 2MesADP was rescued with prostaglandin G2, a prostaglandin intermediate downstream of prostaglandin synthase-1 (PTGS-1) that aids in the production of thromboxane. WES identified a non-synonymous variant in the signal peptide of PTGS-1 (rs3842787; c.50C>T; p.Pro17Leu) that completely co-segregated with disease phenotype. A variant in the F8 gene causing hemophilia A (rs28935203; c.5096A>T; p.Y1699F) was also identified. Individuals with both variants had more severe bleeding

  14. Sequence Variants and Haplotype Analysis of Cat ERBB2 Gene: A Survey on Spontaneous Cat Mammary Neoplastic and Non-Neoplastic Lesions

    PubMed Central

    Santos, Sara; Bastos, Estela; Baptista, Cláudia S.; Sá, Daniela; Caloustian, Christophe; Guedes-Pinto, Henrique; Gärtner, Fátima; Gut, Ivo G.; Chaves, Raquel

    2012-01-01

    The human ERBB2 proto-oncogene is widely considered a key gene involved in human breast cancer onset and progression. Among spontaneous tumors, mammary tumors are the most frequent cause of cancer death in cats and second most frequent in humans. In fact, naturally occurring tumors in domestic animals, more particularly cat mammary tumors, have been proposed as a good model for human breast cancer, but critical genetic and molecular information is still scarce. The aims of this study include the analysis of partial sequences of the cat ERBB2 gene (between exons 17 and 20) in order to characterize heterogeneous populations of normal and mammary lesion samples. Cat genomic DNA was extracted from normal frozen samples (n = 16) and from frozen and formalin-fixed paraffin-embedded mammary lesion samples (n = 41). We amplified and sequenced two cat ERBB2 DNA fragments comprising exons 17 to 20. It was possible to identify five sequence variants and six haplotypes in the total population. Two sequence variants and two haplotypes appear to be specific to cat mammary tumor samples. Bioinformatics analysis predicts that four of the sequence variants can produce alternative transcripts or activate cryptic splicing sites. Also, a possible association was identified between clinicopathological traits and the variant haplotypes. As far as we know, this is the first attempt to examine ERBB2 genetic variations in the cat mammary genome and their possible association with the onset and progression of cat mammary tumors. The demonstration of a possible association between primary tumor size (one of the two most important prognostic factors) and the number of masses with the cat ERBB2 variant haplotypes reveals the importance of the analysis of this gene in veterinary medicine. PMID:22489125

  15. VarMod: modelling the functional effects of non-synonymous variants.

    PubMed

    Pappalardo, Morena; Wass, Mark N

    2014-07-01

    Unravelling the genotype-phenotype relationship in humans remains a challenging task in genomics studies. Recent advances in sequencing technologies mean there are now thousands of sequenced human genomes, revealing millions of single nucleotide variants (SNVs). For non-synonymous SNVs present in proteins the difficulties of the problem lie in first identifying those nsSNVs that result in a functional change in the protein among the many non-functional variants and in turn linking this functional change to phenotype. Here we present VarMod (Variant Modeller) a method that utilises both protein sequence and structural features to predict nsSNVs that alter protein function. VarMod develops recent observations that functional nsSNVs are enriched at protein-protein interfaces and protein-ligand binding sites and uses these characteristics to make predictions. In benchmarking on a set of nearly 3000 nsSNVs VarMod performance is comparable to an existing state of the art method. The VarMod web server provides extensive resources to investigate the sequence and structural features associated with the predictions including visualisation of protein models and complexes via an interactive JSmol molecular viewer. VarMod is available for use at http://www.wasslab.org/varmod. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  16. Isolation and Molecular Characterization of Novel Infectious Bronchitis Virus Variants from Vaccinated Broiler Flocks in Egypt.

    PubMed

    Abdel-Sabour, Mohammed A; Al-Ebshahy, Emad M; Khaliel, Samy A; Abdel-Wanis, Nabil A; Yanai, Tokuma

    2017-09-01

    The present study aimed to determine the molecular characteristics of circulating infectious bronchitis virus (IBV) strains in vaccinated broiler flocks in the Giza and Fayoum governorates. Thirty-four isolates were collected, and egg propagation revealed their ability to induce typical IBV lesions after three to five successive passages. Three selected isolates were identified as IBV using a real-time reverse transcriptase-PCR assay targeting the nucleocapsid (N) gene and were further characterized by partial spike (S) gene sequence analysis. Phylogenetic analysis revealed their clustering into two variant groups. Group I consisted of one variant (VSVRI_F3), which had 99.1% nucleotide sequence identity to the Q1 reference strain. Group II consisted of variants VSVRI_G4 and VSVRI_G9, which showed 92.8%-94.3% nucleotide identity with the Egyptian variants Eg/12120S/2012, Eg/12197B/2012, and Eg/1265B/2012. Regarding the deduced amino acid sequence, the three variants had 77.1%-85.2% similarity with the vaccine strains currently used in Egypt. These findings highlight the importance of monitoring the prevalence of IBV variants in vaccinated broiler flocks as well as adopting an appropriate vaccination strategy.

  17. Whole genome sequencing of an African American family highlights toll like receptor 6 variants in Kawasaki disease susceptibility.

    PubMed

    Kim, Jihoon; Shimizu, Chisato; Kingsmore, Stephen F; Veeraraghavan, Narayanan; Levy, Eric; Ribeiro Dos Santos, Andre M; Yang, Hai; Flatley, Jay; Hoang, Long Truong; Hibberd, Martin L; Tremoulet, Adriana H; Harismendy, Olivier; Ohno-Machado, Lucila; Burns, Jane C

    2017-01-01

    Kawasaki disease (KD) is the most common acquired pediatric heart disease. We analyzed Whole Genome Sequences (WGS) from a 6-member African American family in which KD affected two of four children. We sought rare, potentially causative genotypes by sequentially applying the following WGS filters: sequence quality scores, inheritance model (recessive homozygous and compound heterozygous), predicted deleteriousness, allele frequency, genes in KD-associated pathways or with significant associations in published KD genome-wide association studies (GWAS), and with differential expression in KD blood transcriptomes. Biologically plausible genotypes were identified in twelve variants in six genes in the two affected children. The affected siblings were compound heterozygous for the rare variants p.Leu194Pro and p.Arg247Lys in Toll-like receptor 6 (TLR6), which affect TLR6 signaling. The affected children were also homozygous for three common, linked (r² = 1) intronic single nucleotide variants (SNVs) in TLR6 (rs56245262, rs56083757 and rs7669329) that have previously shown association with KD in cohorts of European descent. Using transcriptome data from pre-treatment whole blood of KD subjects (n = 146), expression quantitative trait loci (eQTL) analyses were performed. Subjects homozygous for the intronic risk allele (A allele of TLR6 rs56245262) had differential expression of Interleukin-6 (IL-6) as a function of genotype (p = 0.0007) and a higher erythrocyte sedimentation rate at diagnosis. TLR6 plays an important role in pathogen-associated molecular pattern recognition, and sequence variations may affect binding affinities that in turn influence KD susceptibility. This integrative genomic approach illustrates how the analysis of WGS in multiplex families with a complex genetic disease allows examination of both the common disease-common variant and common disease-rare variant hypotheses.
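
    The sequential filtering described in this abstract can be sketched as a chain of predicates applied in order. This is only an illustration under stated assumptions: the `Variant` fields, the thresholds (quality of at least 30, allele frequency below 0.01), the placeholder gene set, and the example records are all hypothetical and are not taken from the study, and the differential-expression filter is omitted.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    # Every field here is a hypothetical stand-in for an annotation column.
    gene: str
    quality: float        # sequence quality score
    model: str            # "hom_recessive", "compound_het", or "other"
    deleterious: bool     # verdict of some deleteriousness predictor
    allele_freq: float    # population allele frequency


# Placeholder for "genes in KD-associated pathways or GWAS hits".
CANDIDATE_GENES = {"TLR6"}

# Filters are applied in the order given in the abstract; thresholds are assumed.
FILTERS = [
    ("quality", lambda v: v.quality >= 30.0),
    ("inheritance model", lambda v: v.model in {"hom_recessive", "compound_het"}),
    ("deleteriousness", lambda v: v.deleterious),
    ("allele frequency", lambda v: v.allele_freq < 0.01),
    ("candidate gene", lambda v: v.gene in CANDIDATE_GENES),
]


def apply_filters(variants):
    """Apply each filter in turn and record how many variants survive each step."""
    remaining = list(variants)
    counts = []
    for name, keep in FILTERS:
        remaining = [v for v in remaining if keep(v)]
        counts.append((name, len(remaining)))
    return remaining, counts


# Made-up example records, one failing at each of several stages.
example = [
    Variant("TLR6", 50.0, "compound_het", True, 0.001),
    Variant("TLR6", 10.0, "compound_het", True, 0.001),     # fails quality
    Variant("GENE_X", 50.0, "hom_recessive", True, 0.001),  # fails candidate gene
    Variant("TLR6", 50.0, "other", True, 0.001),            # fails inheritance model
    Variant("TLR6", 50.0, "hom_recessive", False, 0.2),     # fails deleteriousness
]

survivors, counts = apply_filters(example)
for name, n in counts:
    print(f"after {name}: {n} variant(s) remain")
```

    Keeping the filters in an ordered list makes the per-step attrition visible, which is the point of applying them sequentially rather than as one combined test.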

  18. Introduction to Deep Sequencing and Its Application to Drug Addiction Research with a Focus on Rare Variants

    PubMed Central

    Wang, Shaolin; Yang, Zhongli; Ma, Jennie Z.; Payne, Thomas J.; Li, Ming D

    2013-01-01

    Through linkage analysis, candidate gene approaches, and genome-wide association studies (GWAS), many genetic susceptibility factors for substance dependence have been discovered, such as the aldehyde dehydrogenase 2 gene (ALDH2) for alcohol dependence (AD) and nicotinic acetylcholine receptor (nAChR) subunit variants on chromosomes 8 and 15 for nicotine dependence (ND). However, these confirmed genetic factors contribute only a small portion of the heritability responsible for each addiction. Among many potential factors, rare variants in those identified and unidentified susceptibility genes are thought to contribute greatly to the missing heritability. Several studies focusing on rare variants have been conducted by taking advantage of next-generation sequencing technologies, which revealed that some rare variants of nAChR subunits are associated with ND in both genetic and functional studies. However, these studies investigated variants for only a small number of genes and need to be expanded to broad regions/genes in a larger population. This review presents an update on recently developed methods for rare-variant identification and association analysis and on studies focused on rare-variant discovery and function related to addictions. PMID:23990377

  19. CEP72‐ROS1: A novel ROS1 oncogenic fusion variant in lung adenocarcinoma identified by next‐generation sequencing

    PubMed Central

    Zhu, You‐cai; Zhou, Yue‐fen; Zhuang, Wu; Du, Kai‐qi; Chen, Gang

    2018-01-01

    ROS1 rearrangement is a validated therapeutic driver gene in non‐small cell lung cancer (NSCLC) and represents a small subset (1–2%) of NSCLC. A total of 17 different fusion partner genes of ROS1 in NSCLC have been reported. The multi‐targeted MET/ALK/ROS1 tyrosine kinase inhibitor (TKI) crizotinib has demonstrated remarkable efficacy in ROS1‐rearranged NSCLC. Consequently, ROS1 detection assays include fluorescence in situ hybridization, immunohistochemistry, and real‐time PCR. Next‐generation sequencing (NGS) assay covers a range of fusion genes and approaches to discover novel receptor‐kinase rearrangements in lung cancer. A 63‐year‐old male smoker with stage IV NSCLC (TxNxM1) was detected with a novel ROS1 fusion. Histological examination of the tumor showed lung adenocarcinoma. NGS analysis of the hydrothorax cellblocks revealed a novel CEP72‐ROS1 rearrangement. This novel CEP72‐ROS1 fusion variant is generated by the fusion of exons 1–11 of CEP72 on chromosome 5p15 to exons 23–43 of ROS1 on chromosome 6q22. The predicted CEP72‐ROS1 protein product contains 1202 amino acids comprising the N‐terminal amino acids 594–647 of CEP72 and C‐terminal amino acid 1‐1148 of ROS1. CEP72‐ROS1 is a novel ROS1 fusion variant in NSCLC discovered by NGS and could be included in ROS1 detection assay, such as reverse transcription PCR. Pleural effusion samples show good diagnostic performance in clinical practice. PMID:29517860

  20. High-throughput sequencing of mGluR signaling pathway genes reveals enrichment of rare variants in autism.

    PubMed

    Kelleher, Raymond J; Geigenmüller, Ute; Hovhannisyan, Hayk; Trautman, Edwin; Pinard, Robert; Rathmell, Barbara; Carpenter, Randall; Margulies, David

    2012-01-01

    Identification of common molecular pathways affected by genetic variation in autism is important for understanding disease pathogenesis and devising effective therapies. Here, we test the hypothesis that rare genetic variation in the metabotropic glutamate-receptor (mGluR) signaling pathway contributes to autism susceptibility. Single-nucleotide variants in genes encoding components of the mGluR signaling pathway were identified by high-throughput multiplex sequencing of pooled samples from 290 non-syndromic autism cases and 300 ethnically matched controls on two independent next-generation platforms. This analysis revealed significant enrichment of rare functional variants in the mGluR pathway in autism cases. Higher burdens of rare, potentially deleterious variants were identified in autism cases for three pathway genes previously implicated in syndromic autism spectrum disorder, TSC1, TSC2, and SHANK3, suggesting that genetic variation in these genes also contributes to risk for non-syndromic autism. In addition, our analysis identified HOMER1, which encodes a postsynaptic density-localized scaffolding protein that interacts with Shank3 to regulate mGluR activity, as a novel autism-risk gene. Rare, potentially deleterious HOMER1 variants identified uniquely in the autism population affected functionally important protein regions or regulatory sequences and co-segregated closely with autism among children of affected families. We also identified rare ASD-associated coding variants predicted to have damaging effects on components of the Ras/MAPK cascade. Collectively, these findings suggest that altered signaling downstream of mGluRs contributes to the pathogenesis of non-syndromic autism.

  1. High-Throughput Sequencing of mGluR Signaling Pathway Genes Reveals Enrichment of Rare Variants in Autism

    PubMed Central

    Hovhannisyan, Hayk; Trautman, Edwin; Pinard, Robert; Rathmell, Barbara; Carpenter, Randall; Margulies, David

    2012-01-01

    Identification of common molecular pathways affected by genetic variation in autism is important for understanding disease pathogenesis and devising effective therapies. Here, we test the hypothesis that rare genetic variation in the metabotropic glutamate-receptor (mGluR) signaling pathway contributes to autism susceptibility. Single-nucleotide variants in genes encoding components of the mGluR signaling pathway were identified by high-throughput multiplex sequencing of pooled samples from 290 non-syndromic autism cases and 300 ethnically matched controls on two independent next-generation platforms. This analysis revealed significant enrichment of rare functional variants in the mGluR pathway in autism cases. Higher burdens of rare, potentially deleterious variants were identified in autism cases for three pathway genes previously implicated in syndromic autism spectrum disorder, TSC1, TSC2, and SHANK3, suggesting that genetic variation in these genes also contributes to risk for non-syndromic autism. In addition, our analysis identified HOMER1, which encodes a postsynaptic density-localized scaffolding protein that interacts with Shank3 to regulate mGluR activity, as a novel autism-risk gene. Rare, potentially deleterious HOMER1 variants identified uniquely in the autism population affected functionally important protein regions or regulatory sequences and co-segregated closely with autism among children of affected families. We also identified rare ASD-associated coding variants predicted to have damaging effects on components of the Ras/MAPK cascade. Collectively, these findings suggest that altered signaling downstream of mGluRs contributes to the pathogenesis of non-syndromic autism. PMID:22558107

  2. Mutation Update for GNE Gene Variants Associated with GNE Myopathy

    PubMed Central

    Celeste, Frank V.; Vilboux, Thierry; Ciccone, Carla; de Dios, John Karl; Malicdan, May Christine V.; Leoyklang, Petcharat; McKew, John C.; Gahl, William A.; Carrillo-Carrasco, Nuria; Huizing, Marjan

    2014-01-01

    The GNE gene encodes the rate-limiting, bifunctional enzyme of sialic acid biosynthesis, UDP-N-acetylglucosamine 2-epimerase/N-acetylmannosamine kinase (GNE). Biallelic GNE mutations underlie GNE myopathy, an adult-onset progressive myopathy. GNE myopathy-associated GNE mutations are predominantly missense, resulting in reduced, but not absent, GNE enzyme activities. The exact pathomechanism of GNE myopathy remains unknown, but likely involves aberrant (muscle) sialylation. Here we summarize 154 reported and novel GNE variants associated with GNE myopathy, including 122 missense, 11 nonsense, 14 insertion/deletions and 7 intronic variants. All variants were deposited in the online GNE variation database (http://www.dmd.nl/nmdb2/home.php?select_db=GNE). We report the predicted effects on protein function of all variants as well as the predicted effects on epimerase and/or kinase enzymatic activities of selected variants. By analyzing exome sequence databases, we identified three frequently occurring, unreported GNE missense variants/polymorphisms, important for future sequence interpretations. Based on allele frequencies, we estimate the world-wide prevalence of GNE myopathy to be ~ 4–21/1,000,000. This previously unrecognized high prevalence confirms suspicions that many patients may escape diagnosis. Awareness among physicians for GNE myopathy is essential for the identification of new patients, which is required for better understanding of the disorder’s pathomechanism and for the success of ongoing treatment trials. PMID:24796702
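
    The abstract does not say how the prevalence estimate was derived from allele frequencies. One common approach for a recessive disorder, assumed here purely for illustration, is Hardy-Weinberg: if q is the combined frequency of all disease alleles, the expected fraction of biallelic individuals is q². The sketch below applies that relation in both directions; the function names are hypothetical.

```python
import math


def prevalence_per_million(q):
    """Expected biallelic frequency q**2 under Hardy-Weinberg, per million people."""
    return q * q * 1_000_000


def implied_allele_frequency(per_million):
    """Invert the relation: combined disease-allele frequency for a given prevalence."""
    return math.sqrt(per_million / 1_000_000)


# The reported range of about 4 to 21 per million would correspond, under this
# assumption only, to combined allele frequencies of roughly 0.2% to 0.46%.
for per_million in (4, 21):
    q = implied_allele_frequency(per_million)
    print(f"{per_million}/1,000,000 -> q = {q:.4f}")
```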

  3. European external quality control study on the competence of laboratories to recognize rare sequence variants resulting in unusual genotyping results.

    PubMed

    Márki-Zay, János; Klein, Christoph L; Gancberg, David; Schimmel, Heinz G; Dux, László

    2009-04-01

    Depending on the method used, rare sequence variants adjacent to the single nucleotide polymorphism (SNP) of interest may cause unusual or erroneous genotyping results. Because such rare variants are known for many genes commonly tested in diagnostic laboratories, we organized a proficiency study to assess their influence on the accuracy of reported laboratory results. Four external quality control materials were processed and sent to 283 laboratories through 3 EQA organizers for analysis of the prothrombin 20210G>A mutation. Two of these quality control materials contained sequence variants introduced by site-directed mutagenesis. One hundred eighty-nine laboratories participated in the study. When samples gave a usual result with the method applied, the error rate was 5.1%. Detailed analysis showed that more than 70% of the failures were reported from only 9 laboratories. Allele-specific amplification-based PCR had a much higher error rate than other methods (18.3% vs 2.9%). The variants 20209C>T and [20175T>G; 20179_20180delAC] resulted in unusual genotyping results in 67 and 85 laboratories, respectively. Eighty-three (54.6%) of these unusual results were not recognized, 32 (21.1%) were attributed to technical issues, and only 37 (24.3%) were recognized as another sequence variant. Our findings revealed that some of the participating laboratories were not able to recognize and correctly interpret unusual genotyping results caused by rare SNPs. Our study indicates that the majority of the failures could be avoided by improved training and careful selection and validation of the methods applied.
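
    The percentages in this abstract share a denominator that is not stated outright: the 67 and 85 unusual results for the two introduced variants add up to 152, and the three outcome counts are fractions of that total. A short check, using only the counts quoted above (the variable and function names are illustrative):

```python
# Unusual genotyping results per introduced variant, as quoted in the abstract.
unusual_by_variant = {
    "20209C>T": 67,
    "[20175T>G; 20179_20180delAC]": 85,
}

# How laboratories handled those unusual results.
outcomes = {
    "not recognized": 83,
    "attributed to technical issues": 32,
    "recognized as another sequence variant": 37,
}


def outcome_shares():
    """Return each outcome as a percentage of all unusual results (one decimal)."""
    total = sum(unusual_by_variant.values())
    return {label: round(100 * n / total, 1) for label, n in outcomes.items()}


total_unusual = sum(unusual_by_variant.values())
print(f"total unusual results: {total_unusual}")
for label, share in outcome_shares().items():
    print(f"{label}: {share}%")
```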

  4. Antagonistic lactic acid bacteria isolated from goat milk and identification of a novel nisin variant Lactococcus lactis

    PubMed Central

    2014-01-01

    Background: The raw goat milk microbiota is considered a good source of novel bacteriocinogenic lactic acid bacteria (LAB) strains that can be exploited as an alternative for use as biopreservatives in foods. The constant demand for such alternative tools justifies studies that investigate the antimicrobial potential of such strains. Results: The obtained data identified a predominance of Lactococcus and Enterococcus strains in raw goat milk microbiota with antimicrobial activity against Listeria monocytogenes ATCC 7644. Enzymatic assays confirmed the bacteriocinogenic nature of the antimicrobial substances produced by the isolated strains, and PCR detected a variety of bacteriocin-related genes in their genomes. Rep-PCR identified broad genetic variability among the Enterococcus isolates, and close relations between the Lactococcus strains. The sequencing of PCR products from nis-positive Lactococcus allowed the identification of a predicted nisin variant not previously described and possessing a wide inhibitory spectrum. Conclusions: Raw goat milk was confirmed as a good source of novel bacteriocinogenic LAB strains. The Lactococcus isolates identified carry genomic variations that suggest production of a nisin variant not yet described, with potential for use as a biopreservative in food due to its broad spectrum of action. PMID:24521354

  5. Asparagine-linked oligosaccharides present on a non-consensus amino acid sequence in the CH1 domain of human antibodies.

    PubMed

    Valliere-Douglass, John F; Kodama, Paul; Mujacic, Mirna; Brady, Lowell J; Wang, Wes; Wallace, Alison; Yan, Boxu; Reddy, Pranhitha; Treuheit, Michael J; Balland, Alain

    2009-11-20

    We report that N-linked oligosaccharide structures can be present on an asparagine residue not adhering to the consensus site motif NX(S/T), where X is not proline, described in the literature. We have observed oligosaccharides on a non-consensus asparaginyl residue in the C(H)1 constant domain of IgG1 and IgG2 antibodies. The initial findings were obtained from characterization of charge variant populations evident in a recombinant human antibody of the IgG2 subclass. HPLC-MS results indicated that cation-exchange chromatography acidic variant populations were enriched in antibody with a second glycosylation site, in addition to the well documented canonical glycosylation site located in the C(H)2 domain. Subsequent tryptic and chymotryptic peptide map data indicated that the second glycosylation site was associated with the amino acid sequence TVSWN(162)SGAL in the C(H)1 domain of the antibody. This highly atypical modification is present at levels of 0.5-2.0% on most of the recombinant antibodies that have been tested and has also been observed in IgG1 antibodies derived from human donors. Site-directed mutagenesis of the C(H)1 domain sequence in a recombinant-human IgG1 antibody resulted in an increase in non-consensus glycosylation to 3.15%, a greater than 4-fold increase over the level observed in the wild type, by changing the -1 and +1 amino acids relative to the asparagine residue at position 162. We believe that further understanding of the phenomenon of non-consensus glycosylation can be used to gain fundamental insights into the fidelity of the cellular glycosylation machinery.
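
    The consensus motif NX(S/T), with X not proline, can be expressed as a regular expression, which makes it easy to see why the CH1 site reported here is non-consensus. The sketch below is illustrative only: the function name is hypothetical, and the second test sequence is made up to show a positive match.

```python
import re

# N, then any residue except proline, then S or T. The lookahead keeps the
# match one residue wide so that overlapping sequons are all reported.
CONSENSUS_SEQUON = re.compile(r"N(?=[^P][ST])")


def consensus_sites(sequence):
    """Return 0-based positions of asparagines that sit in an N-X-(S/T) sequon."""
    return [m.start() for m in CONSENSUS_SEQUON.finditer(sequence)]


# CH1 peptide from the abstract: N is followed by S then G, so the residue at
# position +2 is not S or T and the site does not match the consensus.
print(consensus_sites("TVSWNSGAL"))

# Made-up sequence containing a canonical N-G-T sequon.
print(consensus_sites("AANGTAA"))
```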

  6. Seshat: A Web service for accurate annotation, validation, and analysis of TP53 variants generated by conventional and next-generation sequencing.

    PubMed

    Tikkanen, Tuomas; Leroy, Bernard; Fournier, Jean Louis; Risques, Rosa Ana; Malcikova, Jitka; Soussi, Thierry

    2018-07-01

    Accurate annotation of genomic variants in human diseases is essential to allow personalized medicine. Assessment of somatic and germline TP53 alterations has now reached the clinic and is required in several circumstances such as the identification of the most effective cancer therapy for patients with chronic lymphocytic leukemia (CLL). Here, we present Seshat, a Web service for annotating TP53 information derived from sequencing data. A flexible framework allows the use of standard file formats such as Mutation Annotation Format (MAF) or Variant Call Format (VCF), as well as common TXT files. Seshat performs accurate variant annotations using the Human Genome Variation Society (HGVS) nomenclature and the stable TP53 genomic reference provided by the Locus Reference Genomic (LRG). In addition, using the 2017 release of the UMD_TP53 database, Seshat provides several statistics for each TP53 variant, including database frequency, functional activity, and pathogenicity. The information is delivered in standardized output tables that minimize errors and facilitate comparison of mutational data across studies. Seshat is a beneficial tool to interpret the ever-growing TP53 sequencing data generated by multiple sequencing platforms and it is freely available via the TP53 Website, http://p53.fr or directly at http://vps338341.ovh.net/. © 2018 Wiley Periodicals, Inc.

  7. Rare variant testing across methods and thresholds using the multi-kernel sequence kernel association test (MK-SKAT).

    PubMed

    Urrutia, Eugene; Lee, Seunggeun; Maity, Arnab; Zhao, Ni; Shen, Judong; Li, Yun; Wu, Michael C

    Analysis of rare genetic variants has focused on region-based analysis wherein a subset of the variants within a genomic region is tested for association with a complex trait. Two important practical challenges have emerged. First, it is difficult to choose which test to use. Second, it is unclear which group of variants within a region should be tested. Both depend on the unknown true state of nature. Therefore, we develop the Multi-Kernel SKAT (MK-SKAT) which tests across a range of rare variant tests and groupings. Specifically, we demonstrate that several popular rare variant tests are special cases of the sequence kernel association test which compares pair-wise similarity in trait value to similarity in the rare variant genotypes between subjects as measured through a kernel function. Choosing a particular test is equivalent to choosing a kernel. Similarly, choosing which group of variants to test also reduces to choosing a kernel. Thus, MK-SKAT uses perturbation to test across a range of kernels. Simulations and real data analyses show that our framework controls type I error while maintaining high power across settings: MK-SKAT loses power when compared to the kernel for a particular scenario but has much greater power than poor choices.
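
    The statement that choosing a test is equivalent to choosing a kernel can be made concrete with the quadratic form Q = r'Kr, where r holds trait residuals under the null model and K holds pairwise genotype similarities. The sketch below is a simplified illustration, not the MK-SKAT implementation: it assumes an intercept-only null model, uses made-up data and unit weights, and computes only the statistic, not the perturbation-based p-value across kernels.

```python
def residuals(y):
    """Trait residuals under an intercept-only null model (an assumption)."""
    mean = sum(y) / len(y)
    return [value - mean for value in y]


def linear_weighted_kernel(g_a, g_b, weights):
    """SKAT-style similarity: weighted inner product of two genotype vectors."""
    return sum(w * a * b for w, a, b in zip(weights, g_a, g_b))


def burden_kernel(g_a, g_b, weights):
    """Burden-style similarity: product of the two weighted allele counts."""
    score_a = sum(w * a for w, a in zip(weights, g_a))
    score_b = sum(w * b for w, b in zip(weights, g_b))
    return score_a * score_b


def q_statistic(y, genotypes, weights, kernel):
    """Quadratic form Q = r' K r for the chosen kernel."""
    r = residuals(y)
    n = len(y)
    return sum(
        r[i] * kernel(genotypes[i], genotypes[j], weights) * r[j]
        for i in range(n)
        for j in range(n)
    )


# Made-up data: 4 subjects, 2 rare variants, unit weights.
trait = [1.0, 2.0, 3.0, 6.0]
genotypes = [[0, 1], [1, 0], [0, 0], [1, 1]]
weights = [1.0, 1.0]

print("variance-component Q:", q_statistic(trait, genotypes, weights, linear_weighted_kernel))
print("burden Q:", q_statistic(trait, genotypes, weights, burden_kernel))
```

    The same function yields a variance-component statistic or a burden statistic depending only on the kernel passed in, which is the equivalence the abstract relies on.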

  8. Brute-Force Approach for Mass Spectrometry-Based Variant Peptide Identification in Proteogenomics without Personalized Genomic Data

    NASA Astrophysics Data System (ADS)

    Ivanov, Mark V.; Lobas, Anna A.; Levitsky, Lev I.; Moshkovskii, Sergei A.; Gorshkov, Mikhail V.

    2018-02-01

    In a proteogenomic approach based on tandem mass spectrometry analysis of proteolytic peptide mixtures, customized exome or RNA-seq databases are employed for identifying protein sequence variants. However, the problem of variant peptide identification without personalized genomic data is important for a variety of applications. Following the recent proposal by Chick et al. (Nat. Biotechnol. 33, 743-749, 2015) on the feasibility of such variant peptide search, we evaluated two available approaches based on the previously suggested "open" search and the "brute-force" strategy. To improve the efficiency of these approaches, we propose an algorithm for exclusion of false variant identifications from the search results involving analysis of modifications mimicking single amino acid substitutions. Also, we propose a de novo based scoring scheme for assessment of identified point mutations. In the scheme, the search engine analyzes y-type fragment ions in MS/MS spectra to confirm the location of the mutation in the variant peptide sequence.
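
    As a rough illustration of what a brute-force search space looks like, the sketch below enumerates every single amino acid substitution of a peptide. It is not the authors' implementation; the function name and the example peptide are hypothetical, and real pipelines would go on to match these candidates against MS/MS spectra and filter false identifications.

```python
# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def single_substitution_variants(peptide):
    """Yield every peptide that differs from the input at exactly one position."""
    for pos, original in enumerate(peptide):
        for residue in AMINO_ACIDS:
            if residue != original:
                yield peptide[:pos] + residue + peptide[pos + 1:]


# A peptide of length n has 19 * n single-substitution variants, which is why
# the search space grows quickly across a whole proteome.
variants = list(single_substitution_variants("PEPTIDE"))
print(len(variants), "candidate variant peptides")
```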

  9. Hybridization and sequencing of nucleic acids using base pair mismatches

    DOEpatents

    Fodor, Stephen P. A.; Lipshutz, Robert J.; Huang, Xiaohua

    2001-01-01

    Devices and techniques for hybridization of nucleic acids and for determining the sequence of nucleic acids. Arrays of nucleic acids are formed by techniques, preferably high resolution, light-directed techniques. Positions of hybridization of a target nucleic acid are determined by, e.g., epifluorescence microscopy. Devices and techniques are proposed to determine the sequence of a target nucleic acid more efficiently and more quickly through such synthesis and detection techniques.

  10. Structural and functional interaction of fatty acids with human liver fatty acid-binding protein (L-FABP) T94A variant.

    PubMed

    Huang, Huan; McIntosh, Avery L; Martin, Gregory G; Landrock, Kerstin K; Landrock, Danilo; Gupta, Shipra; Atshaves, Barbara P; Kier, Ann B; Schroeder, Friedhelm

    2014-05-01

    The human liver fatty acid-binding protein (L-FABP) T94A variant, the most common in the FABP family, has been associated with elevated liver triglyceride levels. How this amino acid substitution elicits these effects is not known. This issue was addressed using human recombinant wild-type (WT) and T94A variant L-FABP proteins as well as cultured primary human hepatocytes expressing the respective proteins (genotyped as TT, TC and CC). The T94A substitution did not alter or only slightly altered L-FABP binding affinities for saturated, monounsaturated or polyunsaturated long chain fatty acids, nor did it change the affinity for intermediates of triglyceride synthesis. Nevertheless, the T94A substitution markedly altered the secondary structural response of L-FABP induced by binding long chain fatty acids or intermediates of triglyceride synthesis. Finally, the T94A substitution markedly decreased the levels of induction of peroxisome proliferator-activated receptor α-regulated proteins such as L-FABP, fatty acid transport protein 5 and peroxisome proliferator-activated receptor α itself mediated by the polyunsaturated fatty acids eicosapentaenoic acid and docosahexaenoic acid in cultured primary human hepatocytes. Thus, although the T94A substitution did not alter the affinity of human L-FABP for long chain fatty acids, it significantly altered human L-FABP structure and stability, as well as the conformational and functional response to these ligands. © 2014 FEBS.

  11. Mouse Vk gene classification by nucleic acid sequence similarity.

    PubMed

    Strohal, R; Helmberg, A; Kroemer, G; Kofler, R

    1989-01-01

    Analyses of immunoglobulin (Ig) variable (V) region gene usage in the immune response, estimates of V gene germline complexity, and other nucleic acid hybridization-based studies depend on the extent to which such genes are related (i.e., sequence similarity) and their organization in gene families. While mouse Igh heavy chain V region (VH) gene families are relatively well-established, a corresponding systematic classification of Igk light chain V region (Vk) genes has not been reported. The present analysis, in the course of which we reviewed the known extent of the Vk germline gene repertoire and Vk gene usage in a variety of responses to foreign and self antigens, provides a classification of mouse Vk genes in gene families composed of members with greater than 80% overall nucleic acid sequence similarity. This classification differed in several aspects from that of VH genes: only some Vk gene families were as clearly separated (by greater than 25% sequence dissimilarity) as typical VH gene families; most Vk gene families were closely related and, in several instances, members from different families were very similar (greater than 80%) over large sequence portions; frequently, classification by nucleic acid sequence similarity diverged from existing classifications based on amino-terminal protein sequence similarity. Our data have implications for Vk gene analyses by nucleic acid hybridization and describe potentially important differences in sequence organization between VH and Vk genes.

  12. Sequence variants of the DFNB31 gene among Usher syndrome patients of diverse origin

    PubMed Central

    Aller, Elena; Jaijo, Teresa; van Wijk, Erwin; Ebermann, Inga; Kersten, Ferry; García-García, Gema; Voesenek, Krysta; Aparisi, María José; Hoefsloot, Lies; Cremers, Cor; Díaz-Llopis, Manuel; Pennings, Ronald; Bolz, Hanno J.; Kremer, Hannie; Millán, José M.

    2010-01-01

    Purpose It has been demonstrated that mutations in deafness, autosomal recessive 31 (DFNB31), the gene encoding whirlin, are responsible for nonsyndromic hearing loss (NSHL; DFNB31) and Usher syndrome type II (USH2D). We screened DFNB31 in a large cohort of patients with different clinical subtypes of Usher syndrome (USH) to determine the prevalence of DFNB31 mutations among USH patients. Methods DFNB31 was screened in 149 USH2, 29 USH1, six atypical USH, and 11 unclassified USH patients from diverse ethnic backgrounds. Mutation detection was performed by direct sequencing of all coding exons. Results We identified 38 different variants among 195 patients. Most variants were clearly polymorphic, but at least two out of the 15 nonsynonymous variants (p.R350W and p.R882S) are predicted to impair whirlin structure and function, suggesting eventual pathogenicity. No putatively pathogenic mutation was found in the second allele of patients with these mutations. Conclusions DFNB31 is not a major cause of USH. PMID:20352026

  13. STRUCTURAL AND FUNCTIONAL INTERACTION OF FATTY ACIDS WITH HUMAN LIVER FATTY ACID BINDING PROTEIN (L-FABP) T94A VARIANT

    PubMed Central

    Huang, Huan; McIntosh, Avery L.; Martin, Gregory G.; Landrock, Kerstin K.; Landrock, Danilo; Gupta, Shipra; Atshaves, Barbara P.; Kier, Ann B.; Schroeder, Friedhelm

    2014-01-01

    The human liver fatty acid binding protein (L-FABP) T94A variant, the most common in the FABP family, has been associated with elevated liver triglyceride (TG) levels. How this amino acid substitution elicits these effects is not known. This issue was addressed with human recombinant wild-type (WT, T94T) and T94A variant L-FABP proteins as well as cultured primary human hepatocytes expressing the respective proteins (genotyped as TT, TC, and CC). T94A substitution did not alter, or only slightly altered, L-FABP binding affinities for saturated, monounsaturated, or polyunsaturated long chain fatty acids (LCFA), nor did it change the affinity for intermediates in TG synthesis. Nevertheless, T94A substitution markedly altered the secondary structural response of L-FABP induced by binding LCFA or intermediates of TG synthesis. Finally, T94A substitution markedly diminished the induction, by the polyunsaturated fatty acids eicosapentaenoic acid (EPA) or docosahexaenoic acid (DHA), of peroxisome proliferator-activated receptor alpha (PPARα)-regulated proteins such as L-FABP, fatty acid transport protein 5 (FATP5), and PPARα itself in cultured primary human hepatocytes. Thus, while T94A substitution did not alter the affinity of human L-FABP for LCFAs, it significantly altered human L-FABP structure and stability as well as conformational and functional response to these ligands. PMID:24628888

  14. Stability of monomeric Cro variants: Isoenergetic transformation of a type I' to a type II' beta-hairpin by single amino acid replacements.

    PubMed

    Mollah, A K M M; Stennis, Rhonda L; Mossing, Michael C

    2003-05-01

    The thermodynamic stabilities of three monomeric variants of the bacteriophage lambda Cro repressor that differ only in the sequence of two amino acids at the apex of an engineered beta-hairpin have been determined. The sequences of the turns are EVK-XX-EVK, where the two central residues are DG, GG, and GT, respectively. Standard-state unfolding free energies, determined from circular dichroism measurements as a function of urea concentration, range from 2.4 to 2.7 kcal/mole, while those determined from guanidine hydrochloride range from 2.8 to 3.3 kcal/mole for the three proteins. Thermal denaturation yields van't Hoff unfolding enthalpies of 36 to 40 kcal/mole at midpoint temperatures in the range of 53 to 58 degrees C. Extrapolation of the thermal denaturation free energies with heat capacities of 400 to 600 cal/mole deg gives good agreement with the parameters determined in denaturant titrations. As predicted from statistical surveys of amino acid replacements in beta-hairpins, energetic barriers to transformation from a type I' turn (DG) to a type II' turn (GT) can be quite small.

  15. Deep sequencing shows low-level oncogenic hepatitis B virus variants persists post-liver transplant despite potent anti-HBV prophylaxis.

    PubMed

    Lau, K C K; Osiowy, C; Giles, E; Lusina, B; van Marle, G; Burak, K W; Coffin, C S

    2018-06-01

    Recent studies suggest that withdrawal of hepatitis B immune globulin (HBIG) and nucleos(t)ide analogues (NA) prophylaxis may be considered in HBV surface antigen (HBsAg)-negative liver transplant (LT) recipients with a low risk of disease recurrence. However, the frequency of occult HBV infection (OBI) and HBV variants after LT in the current era of potent NA therapy is unknown. Twelve LT recipients on prophylaxis were tested in matched plasma and peripheral blood mononuclear cells (PBMCs) for HBV quasispecies by in-house nested PCR and next-generation sequencing of amplicons. HBV covalently closed circular DNA (cccDNA) was detected in Hirt DNA isolated from PBMCs with cccDNA-specific primers and confirmed by nucleic acid hybridization and Sanger sequencing. HBV mRNA in PBMC was detected with reverse-transcriptase nested PCR. In LT recipients on immunosuppressive therapy (10/12 male; median age 57.5 [IQR: 39.8-66.5]; median follow-up post-LT 60 months; 6 pre-LT hepatocellular carcinoma [HCC]), 9 were HBsAg-. HBV DNA was detected in all plasma and PBMC tested; cccDNA and/or mRNA was detected in the PBMC of 10/12 patients. Significant HBV quasispecies diversity (ie 143-2212 nonredundant HBV species) was noted in both sites, and single nucleotide polymorphisms associated with cirrhosis and HCC were detected at varying frequencies. In conclusion, OBI and HBV variants associated with severe liver disease persist in LT recipients on prophylaxis. Although HBV control and cccDNA transcriptional silencing may occur despite immunosuppression, complete virological eradication does not occur in LT recipients with a history of HBV-related end-stage liver disease. © 2018 John Wiley & Sons Ltd.

  16. Analysis of Sequence Variation and Risk Association of Human Papillomavirus 52 Variants Circulating in Korea

    PubMed Central

    Choi, Youn Jin; Ki, Eun Young; Zhang, Chuqing; Ho, Wendy C. S.; Lee, Sung-Jong; Jeong, Min Jin

    2016-01-01

    Introduction Human papillomavirus (HPV) 52 is a carcinogenic, high-risk genotype frequently detected in cervical cancer cases from East Asia, including Korea. Materials and Methods Sequences of HPV52 detected in 91 cervical samples collected from women attending Seoul St. Mary’s Hospital were analyzed. HPV52 genomic sequences were obtained by polymerase chain reaction (PCR)-based sequencing and analyzed using Seq-Scape software, and phylogenetic trees were constructed using MEGA6 software. Results Of the 91 cervical samples, 40 were normal, 22 were low-grade lesions, 21 were high-grade lesions and 7 were squamous cell carcinomas. Four HPV52 variant lineages (A, B, C and D) were identified. Lineage B was the most frequently detected lineage, followed by lineage C. By analyzing the two most frequently detected lineages (B and C), we found that distinct variations existed in each lineage. We also found that a lineage B-specific mutation K93R (A379G) was associated with an increased risk of cervical neoplasia. Conclusions To our knowledge, we are the first to reveal the predominance of HPV52 lineages B and C in Korea. We also found that these lineages harbored distinct genetic alterations that may affect oncogenicity. Our findings increase our understanding of the heterogeneity of HPV52 variants, and may be useful for the development of new diagnostic assays and therapeutic vaccines. PMID:27977741

  17. Exome Sequencing Is an Efficient Tool for Variant Late-Infantile Neuronal Ceroid Lipofuscinosis Molecular Diagnosis

    PubMed Central

    Ortega-Recalde, Oscar; Nallathambi, Jeyabalan; Anandula, Venkata Ramana; Renukaradhya, Umashankar; Laissue, Paul

    2014-01-01

    The neuronal ceroid-lipofuscinoses (NCL) are a group of neurodegenerative disorders characterized by epilepsy, visual failure, progressive mental and motor deterioration, myoclonus, dementia and reduced life expectancy. Classically, NCL-affected individuals have been classified into six categories, which have been mainly defined regarding the clinical onset of symptoms. However, some patients cannot be easily included in a specific group because of significant variation in the age of onset and disease progression. Molecular genetics has emerged in recent years as a useful tool for enhancing NCL subtype classification. Fourteen NCL genetic forms (CLN1 to CLN14) have been described to date. The variant late-infantile form of the disease has been linked to CLN5, CLN6, CLN7 (MFSD8) and CLN8 mutations. Despite advances in the diagnosis of neurodegenerative disorders, mutations in these genes may cause similar phenotypes, which makes accurate candidate gene selection for direct sequencing difficult. Three siblings who were affected by variant late-infantile NCL are reported in the present study. We used whole-exome sequencing, direct sequencing and in silico approaches to identify the molecular basis of the disease. We identified the novel c.1219T>C (p.Trp407Arg) and c.1361T>C (p.Met454Thr) MFSD8 pathogenic mutations. Our results highlighted next generation sequencing as a novel and powerful methodological approach for the rapid determination of the molecular diagnosis of NCL. They also provide information regarding the phenotypic and molecular spectrum of CLN7 disease. PMID:25333361

  18. Exome sequencing is an efficient tool for variant late-infantile neuronal ceroid lipofuscinosis molecular diagnosis.

    PubMed

    Patiño, Liliana Catherine; Battu, Rajani; Ortega-Recalde, Oscar; Nallathambi, Jeyabalan; Anandula, Venkata Ramana; Renukaradhya, Umashankar; Laissue, Paul

    2014-01-01

    The neuronal ceroid-lipofuscinoses (NCL) are a group of neurodegenerative disorders characterized by epilepsy, visual failure, progressive mental and motor deterioration, myoclonus, dementia and reduced life expectancy. Classically, NCL-affected individuals have been classified into six categories, which have been mainly defined regarding the clinical onset of symptoms. However, some patients cannot be easily included in a specific group because of significant variation in the age of onset and disease progression. Molecular genetics has emerged in recent years as a useful tool for enhancing NCL subtype classification. Fourteen NCL genetic forms (CLN1 to CLN14) have been described to date. The variant late-infantile form of the disease has been linked to CLN5, CLN6, CLN7 (MFSD8) and CLN8 mutations. Despite advances in the diagnosis of neurodegenerative disorders, mutations in these genes may cause similar phenotypes, which makes accurate candidate gene selection for direct sequencing difficult. Three siblings who were affected by variant late-infantile NCL are reported in the present study. We used whole-exome sequencing, direct sequencing and in silico approaches to identify the molecular basis of the disease. We identified the novel c.1219T>C (p.Trp407Arg) and c.1361T>C (p.Met454Thr) MFSD8 pathogenic mutations. Our results highlighted next generation sequencing as a novel and powerful methodological approach for the rapid determination of the molecular diagnosis of NCL. They also provide information regarding the phenotypic and molecular spectrum of CLN7 disease.

  19. DHAD variants and methods of screening

    DOEpatents

    Kelly, Kristen J.; Ye, Rick W.

    2017-02-28

    Methods of screening for dihydroxy-acid dehydratase (DHAD) variants that display increased DHAD activity are disclosed, along with DHAD variants identified by these methods. Such enzymes can result in increased production of compounds from DHAD requiring biosynthetic pathways. Also disclosed are isolated nucleic acids encoding the DHAD variants, recombinant host cells comprising the isolated nucleic acid molecules, and methods of producing butanol.

  20. Ultradeep Sequencing for Detection of Quasispecies Variants in the Major Hydrophilic Region of Hepatitis B Virus in Indonesian Patients

    PubMed Central

    Yamani, Laura Navika; Utsumi, Takako; Juniastuti; Wandono, Hadi; Widjanarko, Doddy; Triantanoe, Ari; Wasityastuti, Widya; Liang, Yujiao; Okada, Rina; Tanahashi, Toshihito; Murakami, Yoshiki; Azuma, Takeshi; Soetjipto; Lusida, Maria Inge; Hayashi, Yoshitake

    2015-01-01

    Quasispecies of hepatitis B virus (HBV) with variations in the major hydrophilic region (MHR) of the HBV surface antigen (HBsAg) can evolve during infection, allowing HBV to evade neutralizing antibodies. These escape variants may contribute to chronic infections. In this study, we looked for MHR variants in HBV quasispecies using ultradeep sequencing and evaluated the relationship between these variants and clinical manifestations in infected patients. We enrolled 30 Indonesian patients with hepatitis B infection (11 with chronic hepatitis and 19 with advanced liver disease). The most common subgenotype/subtype of HBV was B3/adw (97%). The HBsAg titer was lower in patients with advanced liver disease than in patients with chronic hepatitis. The MHR variants were grouped based on the percentage of the viral population affected: major, ≥20% of the total population; intermediate, 5% to <20%; and minor, 1% to <5%. The rates of MHR variation in the major and intermediate viral populations were significantly greater in patients with advanced liver disease than in chronic patients. The most frequent MHR variants related to immune evasion in the major and intermediate populations were P120Q/T, T123A, P127T, Q129H/R, M133L/T, and G145R. The major population of MHR variants causing impaired HBsAg secretion (e.g., G119R, Q129R, T140I, and G145R) was detected only in advanced liver disease patients. This is the first study to use ultradeep sequencing for the detection of MHR variants of HBV quasispecies in Indonesian patients. We found that a greater number of MHR variations was related to disease severity and reduced likelihood of HBsAg titer. PMID:26202119
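
    The grouping of MHR variants by share of the viral population (major, ≥20%; intermediate, 5% to <20%; minor, 1% to <5%) can be written as a small lookup. The sketch below is illustrative only: the function name and the example frequencies are hypothetical, and the label for variants below 1% is an assumption, since the abstract does not say how those were handled.

```python
def classify_mhr_variant(percent_of_population):
    """Map a variant's share of the viral population (in %) to a group."""
    if percent_of_population >= 20:
        return "major"
    if percent_of_population >= 5:
        return "intermediate"
    if percent_of_population >= 1:
        return "minor"
    # Assumption: variants under 1% are not reported in any group
    return "below reporting threshold"


# Hypothetical frequencies for three variants named in the abstract
example = {"G145R": 42.0, "T123A": 7.5, "P127T": 1.2}
for variant, pct in example.items():
    print(variant, classify_mhr_variant(pct))
```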

  1. CDKL5 variants

    PubMed Central

    Kalscheuer, Vera M.; Hennig, Friederike; Leonard, Helen; Downs, Jenny; Clarke, Angus; Benke, Tim A.; Armstrong, Judith; Pineda, Mercedes; Bailey, Mark E.S.; Cobb, Stuart R.

    2017-01-01

    Objective: To provide new insights into the interpretation of genetic variants in a rare neurologic disorder, CDKL5 deficiency, in the contexts of population sequencing data and an updated characterization of the CDKL5 gene. Methods: We analyzed all known potentially pathogenic CDKL5 variants by combining data from large-scale population sequencing studies with CDKL5 variants from new and all available clinical cohorts and combined this with computational methods to predict pathogenicity. Results: The study has identified several variants that can be reclassified as benign or likely benign. With the addition of novel CDKL5 variants, we confirm that pathogenic missense variants cluster in the catalytic domain of CDKL5 and reclassify a purported missense variant as having a splicing consequence. We provide further evidence that missense variants in the final 3 exons are likely to be benign and not important to disease pathology. We also describe benign splicing and nonsense variants within these exons, suggesting that isoform hCDKL5_5 is likely to have little or no neurologic significance. We also use the available data to make a preliminary estimate of minimum incidence of CDKL5 deficiency. Conclusions: These findings have implications for genetic diagnosis, providing evidence for the reclassification of specific variants previously thought to result in CDKL5 deficiency. Together, these analyses support the view that the predominant brain isoform in humans (hCDKL5_1) is crucial for normal neurodevelopment and that the catalytic domain is the primary functional domain. PMID:29264392

  2. Characterization of Canine parvovirus 2 variants circulating in Greece.

    PubMed

    Ntafis, Vasileios; Xylouri, Eftychia; Kalli, Iris; Desario, Costantina; Mari, Viviana; Decaro, Nicola; Buonavoglia, Canio

    2010-09-01

    The aim of the present study was to characterize Canine parvovirus 2 (CPV-2) variants currently circulating in Greece. Between March 2008 and March 2009, 167 fecal samples were collected from diarrheic dogs from different regions of Greece. Canine parvovirus 2 was detected by standard polymerase chain reaction, whereas minor groove binder probe assays were used to distinguish genetic variants and discriminate between vaccine and field strains. Of 84 CPV-2-positive samples, 81 CPV-2a, 1 CPV-2b, and 2 CPV-2c were detected. Vaccine strains were not detected in any sample. Sequence analysis of the VP2 gene of the 2 CPV-2c viruses revealed up to 100% amino acid identity with the CPV-2c strains previously detected in Europe. The results indicated that, unlike other European countries, CPV-2a remains the most common variant in Greece, and that the CPV-2c variant found in Europe is also present in Greece.

  3. Molecular characterization of canine parvovirus strains in Argentina: Detection of the pathogenic variant CPV2c in vaccinated dogs.

    PubMed

    Calderon, Marina Gallo; Mattion, Nora; Bucafusco, Danilo; Fogel, Fernando; Remorini, Patricia; La Torre, Jose

    2009-08-01

    PCR amplification with sequence-specific primers was used to detect canine parvovirus (CPV) DNA in 38 rectal swabs from Argentine domestic dogs with symptoms compatible with parvovirus disease. Twenty-seven out of 38 samples analyzed were CPV positive. The classical CPV2 strain was not detected in any of the samples, but nine samples were identified as CPV2a variant and 18 samples as CPV2b variant. Further sequence analysis revealed a mutation at amino acid 426 of the VP2 gene (Asp426Glu), characteristic of the CPV2c variant, in 14 out of 18 of the samples identified initially by PCR as CPV2b. The appearance of CPV2c variant in Argentina might be dated at least to the year 2003. Three different pathogenic CPV variants circulating currently in the Argentine domestic dog population were identified, with CPV2c being the only variant affecting vaccinated and unvaccinated dogs during the year 2008.

  4. BlackOPs: increasing confidence in variant detection through mappability filtering.

    PubMed

    Cabanski, Christopher R; Wilkerson, Matthew D; Soloway, Matthew; Parker, Joel S; Liu, Jinze; Prins, Jan F; Marron, J S; Perou, Charles M; Hayes, D Neil

    2013-10-01

    Identifying variants using high-throughput sequencing data is currently a challenge because true biological variants can be indistinguishable from technical artifacts. One source of technical artifact results from incorrectly aligning experimentally observed sequences to their true genomic origin ('mismapping') and inferring differences in mismapped sequences to be true variants. We developed BlackOPs, an open-source tool that simulates experimental RNA-seq and DNA whole exome sequences derived from the reference genome, aligns these sequences by custom parameters, detects variants and outputs a blacklist of positions and alleles caused by mismapping. Blacklists contain thousands of artifact variants that are indistinguishable from true variants and, for a given sample, are expected to be almost completely false positives. We show that these blacklist positions are specific to the alignment algorithm and read length used, and BlackOPs allows users to generate a blacklist specific to their experimental setup. We queried the dbSNP and COSMIC variant databases and found numerous variants indistinguishable from mapping errors. We demonstrate how filtering against blacklist positions reduces the number of potential false variants using an RNA-seq glioblastoma cell line data set. In summary, accounting for mapping-caused variants tuned to experimental setups reduces false positives and, therefore, improves genome characterization by high-throughput sequencing.
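
    The filtering step described above, removing called variants whose position and allele appear on a mismapping blacklist, amounts to a set lookup. The following is a minimal sketch with made-up records. The tuple layout (chromosome, position, alternate allele) and the function name are assumptions for illustration and do not reflect the actual BlackOPs file format or interface.

```python
def filter_against_blacklist(variants, blacklist):
    """Keep only variants whose (chrom, pos, alt) is not blacklisted.

    variants:  iterable of dicts with keys "chrom", "pos", "alt"
    blacklist: set of (chrom, pos, alt) tuples attributed to mismapping
    """
    return [
        v for v in variants
        if (v["chrom"], v["pos"], v["alt"]) not in blacklist
    ]


# Hypothetical calls and blacklist entries
calls = [
    {"chrom": "chr1", "pos": 1000, "alt": "T"},
    {"chrom": "chr1", "pos": 2000, "alt": "G"},
    {"chrom": "chr2", "pos": 1000, "alt": "T"},
]
blacklist = {("chr1", 2000, "G"), ("chr2", 1000, "A")}

kept = filter_against_blacklist(calls, blacklist)
print(len(calls) - len(kept), "call(s) removed")
```

    Matching on the allele as well as the position means a different alternate allele at a blacklisted position is retained. Because the abstract reports that blacklists are specific to the aligner and read length, a blacklist built for one setup should not be assumed valid for another.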

  5. Detection of Emerging Vaccine-Related Polioviruses by Deep Sequencing.

    PubMed

    Sahoo, Malaya K; Holubar, Marisa; Huang, ChunHong; Mohamed-Hadley, Alisha; Liu, Yuanyuan; Waggoner, Jesse J; Troy, Stephanie B; Garcia-Garcia, Lourdes; Ferreyra-Reyes, Leticia; Maldonado, Yvonne; Pinsky, Benjamin A

    2017-07-01

    Oral poliovirus vaccine can mutate to regain neurovirulence. To date, evaluation of these mutations has been performed primarily on culture-enriched isolates by using conventional Sanger sequencing. We therefore developed a culture-independent, deep-sequencing method targeting the 5' untranslated region (UTR) and P1 genomic region to characterize vaccine-related poliovirus variants. Error analysis of the deep-sequencing method demonstrated reliable detection of poliovirus mutations at levels of <1%, depending on read depth. Sequencing of viral nucleic acids from the stool of vaccinated, asymptomatic children and their close contacts collected during a prospective cohort study in Veracruz, Mexico, revealed no vaccine-derived polioviruses. This was expected given that the longest duration between sequenced sample collection and the end of the most recent national immunization week was 66 days. However, we identified many low-level variants (<5%) distributed across the 5' UTR and P1 genomic region in all three Sabin serotypes, as well as vaccine-related viruses with multiple canonical mutations associated with phenotypic reversion present at high levels (>90%). These results suggest that monitoring emerging vaccine-related poliovirus variants by deep sequencing may aid in the poliovirus endgame and efforts to ensure global polio eradication. Copyright © 2017 Sahoo et al.

  6. Non-Coding Keratin Variants Associate with Liver Fibrosis Progression in Patients with Hemochromatosis

    PubMed Central

    Lunova, Mariia; Guldiken, Nurdan; Lienau, Tim C.; Stickel, Felix; Omary, M. Bishr

    2012-01-01

    Background Keratins 8 and 18 (K8/K18) are intermediate filament proteins that protect the liver from various forms of injury. Exonic K8/K18 variants associate with adverse outcome in acute liver failure and with liver fibrosis progression in patients with chronic hepatitis C infection or primary biliary cirrhosis. Given the association of K8/K18 variants with end-stage liver disease and progression in several chronic liver disorders, we studied the importance of keratin variants in patients with hemochromatosis. Methods The entire K8/K18 exonic regions were analyzed in 162 hemochromatosis patients carrying homozygous C282Y HFE (hemochromatosis gene) mutations. 234 liver-healthy subjects were used as controls. Exonic regions were PCR-amplified and analyzed using denaturing high-performance liquid chromatography and DNA sequencing. Previously-generated transgenic mice overexpressing K8 G62C were studied for their susceptibility to iron overload. Susceptibility to iron toxicity of primary hepatocytes that express K8 wild-type and G62C was also assessed. Results We identified amino-acid-altering keratin heterozygous variants in 10 of 162 hemochromatosis patients (6.2%) and non-coding heterozygous variants in 6 additional patients (3.7%). Two novel K8 variants (Q169E/R275W) were found. K8 R341H was the most common amino-acid altering variant (4 patients), and exclusively associated with an intronic KRT8 IVS7+10delC deletion. Intronic, but not amino-acid-altering variants associated with the development of liver fibrosis. In mice, or ex vivo, the K8 G62C variant did not affect iron-accumulation in response to iron-rich diet or the extent of iron-induced hepatocellular injury. Conclusion In patients with hemochromatosis, intronic but not exonic K8/K18 variants associate with liver fibrosis development. PMID:22412904

  7. Rare variants and autoimmune disease.

    PubMed

    Massey, Jonathan; Eyre, Steve

    2014-09-01

    The study of rare variants in monogenic forms of autoimmune disease has offered insight into the aetiology of more complex pathologies. Research in complex autoimmune disease initially focused on sequencing candidate genes, with some early successes, notably in uncovering low-frequency variation associated with Type 1 diabetes mellitus. However, other early examples have proved difficult to replicate, and a recent study across six autoimmune diseases, re-sequencing 25 autoimmune disease-associated genes in large sample sizes, failed to find any associated rare variants. The study of rare and low-frequency variation in autoimmune diseases has been made accessible by the inclusion of such variants on custom genotyping arrays (e.g. Immunochip and Exome arrays). Whole-exome sequencing approaches are now also being utilised to uncover the contribution of rare coding variants to disease susceptibility, severity and treatment response. Other sequencing strategies are starting to uncover the role of regulatory rare variation. © The Author 2014. Published by Oxford University Press. All rights reserved. For permissions, please email: journals.permissions@oup.com.

  8. Analysis of selected genes associated with cardiomyopathy by next-generation sequencing.

    PubMed

    Szabadosova, Viktoria; Boronova, Iveta; Ferenc, Peter; Tothova, Iveta; Bernasovska, Jarmila; Zigova, Michaela; Kmec, Jan; Bernasovsky, Ivan

    2018-02-01

    As the leading cause of congestive heart failure, cardiomyopathy represents a heterogeneous group of heart muscle disorders. Despite considerable progress being made in the genetic diagnosis of cardiomyopathy by detection of the mutations in the most prevalent cardiomyopathy genes, the cause remains unsolved in many patients. High-throughput mutation screening in the disease genes for cardiomyopathy is now possible through target enrichment followed by next-generation sequencing. The aim of the study was to analyze a panel of genes associated with dilated or hypertrophic cardiomyopathy based on previously published results in order to identify the subjects at risk. Next-generation sequencing on the Illumina HiSeq 2500 platform was used to detect sequence variants in 16 individuals diagnosed with dilated or hypertrophic cardiomyopathy. Detected variants were filtered and the functional impact of amino acid changes was predicted by computational programs. DNA samples of the 16 patients were analyzed by whole exome sequencing. We identified six nonsynonymous variants that were predicted to be pathogenic by all of the prediction programs used: rs3744998 (EPG5), rs11551768 (MGME1), rs148374985 (MURC), rs78461695 (PLEC), rs17158558 (RET) and rs2295190 (SYNE1). Two of the analyzed sequence variants had minor allele frequency (MAF) < 0.01: rs148374985 (MURC), rs34580776 (MYBPC3). Our data support the potential role of the detected variants in pathogenesis of dilated or hypertrophic cardiomyopathy; however, the possibility that these variants might not be true disease-causing variants but are susceptibility alleles that require additional mutations or injury to cause the clinical phenotype of disease must be considered. © 2017 Wiley Periodicals, Inc.

  9. A short review of variants calling for single-cell-sequencing data with applications.

    PubMed

    Wei, Zhuohui; Shu, Chang; Zhang, Changsheng; Huang, Jingying; Cai, Hongmin

    2017-11-01

    The field of single-cell sequencing is rapidly expanding, and many techniques have been developed in the past decade. With this technology, biologists can study not only the heterogeneity between two adjacent cells in the same tissue or organ, but also the evolutionary relationships and degenerative processes in a single cell. Calling variants is the main purpose in analyzing single cell sequencing (SCS) data. Currently, some popular methods used for bulk-cell-sequencing data analysis are tailored directly to be applied in dealing with SCS data. However, SCS requires an extra step of genome amplification to accumulate enough quantity for satisfying sequencing needs. The amplification yields large biases and thus poses a challenge for using the bulk-cell-sequencing methods. In order to provide guidance for the development of specialized analysis methods as well as for using currently developed tools for SCS, this paper aims to bridge the gap. In this paper, we first introduced two popular genome amplification methods and compared their capabilities. Then we introduced a few popular models for calling single-nucleotide polymorphisms and copy-number variations. Finally, breakthrough applications of SCS were summarized to demonstrate its potential in researching cell evolution. Copyright © 2017 Elsevier Ltd. All rights reserved.

  10. A study based on whole-genome sequencing yields a rare variant at 8q24 associated with prostate cancer

    PubMed Central

    Gudmundsson, Julius; Sulem, Patrick; Gudbjartsson, Daniel F.; Masson, Gisli; Agnarsson, Bjarni A.; Benediktsdottir, Kristrun R.; Sigurdsson, Asgeir; Magnusson, Olafur Th.; Gudjonsson, Sigurjon A.; Magnusdottir, Droplaug N.; Johannsdottir, Hrefna; Helgadottir, Hafdis Th.; Stacey, Simon N.; Jonasdottir, Adalbjorg; Olafsdottir, Stefania B.; Thorleifsson, Gudmar; Jonasson, Jon G.; Tryggvadottir, Laufey; Navarrete, Sebastian; Fuertes, Fernando; Helfand, Brian T.; Hu, Qiaoyan; Csiki, Irma E.; Mates, Ioan N.; Jinga, Viorel; Aben, Katja K. H.; van Oort, Inge M.; Vermeulen, Sita H.; Donovan, Jenny L.; Hamdy, Freddy C.; Ng, Chi-Fai; Chiu, Peter K.F.; Lau, Kin-Mang; Ng, Maggie C.Y.; Gulcher, Jeffrey R.; Kong, Augustine; Catalona, William J.; Mayordomo, Jose I.; Einarsson, Gudmundur V.; Barkardottir, Rosa B.; Jonsson, Eirikur; Mates, Dana; Neal, David E.; Kiemeney, Lambertus A.; Thorsteinsdottir, Unnur; Rafnar, Thorunn; Stefansson, Kari

    2013-01-01

    In Western countries, prostate cancer is the most prevalent cancer of men, and one of the leading causes of cancer-related death in men. Several genome-wide association studies have yielded numerous common variants conferring risk of prostate cancer. In the present study we analyzed 32.5 million variants discovered by whole-genome sequencing 1,795 Icelanders. One variant was found to be associated with prostate cancer in European populations: rs188140481[A] (OR = 2.90, Pcomb = 6.2×10⁻³⁴) located on 8q24, with an average risk allele control frequency of 0.54%. This variant is only very weakly correlated (r² ≤ 0.06) with previously reported risk variants on 8q24, and remains significant after adjustment for all of them. Carriers of rs188140481[A] were diagnosed with prostate cancer 1.26 years younger than non-carriers (P = 0.0059). We also report results for the previously described HOXB13 mutation (rs138213197[T]), confirming it as a prostate cancer risk variant in populations from all over Europe. PMID:23104005
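
    To give a sense of scale for the reported odds ratio, the sketch below converts the control frequency (0.54%) and the OR (2.90) quoted above into an implied risk-allele frequency among cases. It assumes the OR can be read as a simple allelic odds ratio, which the abstract does not state; the function name is illustrative only.

```python
def implied_case_frequency(control_freq: float, odds_ratio: float) -> float:
    """Allele frequency in cases implied by a control frequency and an allelic OR.

    Assumption: the OR is a plain allelic odds ratio (not confirmed by the abstract).
    """
    control_odds = control_freq / (1.0 - control_freq)
    case_odds = odds_ratio * control_odds
    return case_odds / (1.0 + case_odds)


# Values quoted in the abstract for rs188140481[A]
case_freq = implied_case_frequency(control_freq=0.0054, odds_ratio=2.90)
print(f"Implied risk-allele frequency in cases: {case_freq:.2%}")
```

    Under that assumption the variant remains rare in cases as well (on the order of 1.5%), which is consistent with the abstract's description of it as a rare variant.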

  11. Functional Implications of Novel Human Acid Sphingomyelinase Splice Variants

    PubMed Central

    Rhein, Cosima; Tripal, Philipp; Seebahn, Angela; Konrad, Alice; Kramer, Marcel; Nagel, Christine; Kemper, Jonas; Bode, Jens; Mühle, Christiane; Gulbins, Erich; Reichel, Martin; Becker, Cord-Michael; Kornhuber, Johannes

    2012-01-01

    Background: Acid sphingomyelinase (ASM) hydrolyses sphingomyelin and generates the lipid messenger ceramide, which mediates a variety of stress-related cellular processes. The pathological effects of dysregulated ASM activity are evident in several human diseases and indicate an important functional role for ASM regulation. We investigated alternative splicing as a possible mechanism for regulating cellular ASM activity. Methodology/Principal Findings: We identified three novel ASM splice variants in human cells, termed ASM-5, -6 and -7, which lack portions of the catalytic- and/or carboxy-terminal domains in comparison to full-length ASM-1. Differential expression patterns in primary blood cells indicated that ASM splicing might be subject to regulatory processes. The newly identified ASM splice variants were catalytically inactive in biochemical in vitro assays, but they decreased the relative cellular ceramide content in overexpression studies and exerted a dominant-negative effect on ASM activity in physiological cell models. Conclusions/Significance: These findings indicate that alternative splicing of ASM is of functional significance for the cellular stress response, possibly representing a mechanism for maintaining constant levels of cellular ASM enzyme activity. PMID:22558155

  12. Whole-Exome Sequencing in Age-Related Macular Degeneration Identifies Rare Variants in COL8A1, a Component of Bruch's Membrane.

    PubMed

    Corominas, Jordi; Colijn, Johanna M; Geerlings, Maartje J; Pauper, Marc; Bakker, Bjorn; Amin, Najaf; Lores Motta, Laura; Kersten, Eveline; Garanto, Alejandro; Verlouw, Joost A M; van Rooij, Jeroen G J; Kraaij, Robert; de Jong, Paulus T V M; Hofman, Albert; Vingerling, Johannes R; Schick, Tina; Fauser, Sascha; de Jong, Eiko K; van Duijn, Cornelia M; Hoyng, Carel B; Klaver, Caroline C W; den Hollander, Anneke I

    2018-04-26

    Genome-wide association studies and targeted sequencing studies of candidate genes have identified common and rare variants that are associated with age-related macular degeneration (AMD). Whole-exome sequencing (WES) studies allow a more comprehensive analysis of rare coding variants across all genes of the genome and will contribute to a better understanding of the underlying disease mechanisms. To date, the number of WES studies in AMD case-control cohorts remains scarce and sample sizes are limited. To scrutinize the role of rare protein-altering variants in AMD cause, we performed the largest WES study in AMD to date in a large European cohort consisting of 1125 AMD patients and 1361 control participants. Genome-wide case-control association study of WES data. One thousand one hundred twenty-five AMD patients and 1361 control participants. A single variant association test of WES data was performed to detect variants that are associated individually with AMD. The cumulative effect of multiple rare variants within 1 gene was analyzed using a gene-based CMC burden test. Immunohistochemistry was performed to determine the localization of the Col8a1 protein in mouse eyes. Genetic variants associated with AMD. We detected significantly more rare protein-altering variants in the COL8A1 gene in patients (22/2250 alleles [1.0%]) than in control participants (11/2722 alleles [0.4%]; P = 7.07×10⁻⁵). The association of rare variants in the COL8A1 gene is independent of the common intergenic variant (rs140647181) near the COL8A1 gene previously associated with AMD. We demonstrated that the Col8a1 protein localizes at Bruch's membrane. This study supported a role for protein-altering variants in the COL8A1 gene in AMD pathogenesis. We demonstrated the presence of Col8a1 in Bruch's membrane, further supporting the role of COL8A1 variants in AMD pathogenesis. Protein-altering variants in COL8A1 may alter the integrity of Bruch's membrane, contributing to the accumulation
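
    The allele counts quoted above can be checked with a few lines of arithmetic. The sketch below recomputes the allele totals (two alleles per participant), the carrier-allele frequencies, and a crude allelic odds ratio. The odds ratio is not reported in the abstract and is shown only as an illustration; the published P value comes from a gene-based burden test and is not reproduced here. Function and variable names are hypothetical.

```python
def allele_frequency(variant_alleles: int, total_alleles: int) -> float:
    """Fraction of alleles that carry a rare protein-altering variant."""
    return variant_alleles / total_alleles


def crude_odds_ratio(case_alt: int, case_total: int,
                     ctrl_alt: int, ctrl_total: int) -> float:
    """Unadjusted allelic odds ratio from a 2x2 table of allele counts."""
    case_odds = case_alt / (case_total - case_alt)
    ctrl_odds = ctrl_alt / (ctrl_total - ctrl_alt)
    return case_odds / ctrl_odds


# Cohort sizes and variant-allele counts as quoted in the abstract
n_cases, n_controls = 1125, 1361
case_alleles, control_alleles = 2 * n_cases, 2 * n_controls  # diploid genomes

case_freq = allele_frequency(22, case_alleles)
control_freq = allele_frequency(11, control_alleles)
odds_ratio = crude_odds_ratio(22, case_alleles, 11, control_alleles)

print(f"cases:    22/{case_alleles} = {case_freq:.1%}")
print(f"controls: 11/{control_alleles} = {control_freq:.1%}")
print(f"crude allelic odds ratio (illustrative only): {odds_ratio:.2f}")
```

    The totals of 2250 and 2722 alleles and the rounded frequencies of 1.0% and 0.4% match the figures given in the abstract.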

  13. Investigation of Outbreaks of Salmonella enterica Serovar Typhimurium and Its Monophasic Variants Using Whole-Genome Sequencing, Denmark

    PubMed Central

    Gymoese, Pernille; Sørensen, Gitte; Litrup, Eva; Olsen, John Elmerdal; Nielsen, Eva Møller

    2017-01-01

    Whole-genome sequencing is rapidly replacing current molecular typing methods for surveillance purposes. Our study evaluates core-genome single-nucleotide polymorphism analysis for outbreak detection and linking of sources of Salmonella enterica serovar Typhimurium and its monophasic variants during a 7-month surveillance period in Denmark. We reanalyzed and defined 8 previously characterized outbreaks from the phylogenetic relatedness of the isolates, epidemiologic data, and food traceback investigations. All outbreaks were identified, and we were able to exclude unrelated and include additional related human cases. We were furthermore able to link possible food and veterinary sources to the outbreaks. Isolates clustered according to sequence types (STs) 19, 34, and 36. Our study shows that core-genome single-nucleotide polymorphism analysis is suitable for surveillance and outbreak investigation for Salmonella Typhimurium (ST19 and ST36), but whole genome–wide analysis may be required for the tight genetic clone of monophasic variants (ST34). PMID:28930002

  14. Influenza virus sequence feature variant type analysis: evidence of a role for NS1 in influenza virus host range restriction.

    PubMed

    Noronha, Jyothi M; Liu, Mengya; Squires, R Burke; Pickett, Brett E; Hale, Benjamin G; Air, Gillian M; Galloway, Summer E; Takimoto, Toru; Schmolke, Mirco; Hunt, Victoria; Klem, Edward; García-Sastre, Adolfo; McGee, Monnie; Scheuermann, Richard H

    2012-05-01

    Genetic drift of influenza virus genomic sequences occurs through the combined effects of sequence alterations introduced by a low-fidelity polymerase and the varying selective pressures experienced as the virus migrates through different host environments. While traditional phylogenetic analysis is useful in tracking the evolutionary heritage of these viruses, the specific genetic determinants that dictate important phenotypic characteristics are often difficult to discern within the complex genetic background arising through evolution. Here we describe a novel influenza virus sequence feature variant type (Flu-SFVT) approach, made available through the public Influenza Research Database resource (www.fludb.org), in which variant types (VTs) identified in defined influenza virus protein sequence features (SFs) are used for genotype-phenotype association studies. Since SFs have been defined for all influenza virus proteins based on known structural, functional, and immune epitope recognition properties, the Flu-SFVT approach allows the rapid identification of the molecular genetic determinants of important influenza virus characteristics and their connection to underlying biological functions. We demonstrate the use of the SFVT approach to obtain statistical evidence for effects of NS1 protein sequence variations in dictating influenza virus host range restriction.

  15. The amino acid sequence of Staphylococcus aureus penicillinase.

    PubMed Central

    Ambler, R P

    1975-01-01

    The amino acid sequence of the penicillinase (penicillin amido-beta-lactamhydrolase, EC 3.5.2.6) from Staphylococcus aureus strain PC1 was determined. The protein consists of a single polypeptide chain of 257 residues, and the sequence was determined by characterization of tryptic, chymotryptic, peptic and CNBr peptides, with some additional evidence from thermolysin and S. aureus proteinase peptides. A mistake in the preliminary report of the sequence is corrected; residues 113-116 are now thought to be -Lys-Lys-Val-Lys- rather than -Lys-Val-Lys-Lys-. Detailed evidence for the amino acid sequence has been deposited as Supplementary Publication SUP 50056 (91 pages) at the British Library (Lending Division), Boston Spa, Wetherby, West Yorkshire LS23 7BQ, U.K., from whom copies may be obtained on the terms given in Biochem. J. (1975) 145, 5. PMID:1218078

  16. Distinctive Epstein-Barr virus variants associated with benign and malignant pediatric pathologies: LMP1 sequence characterization and linkage with other viral gene polymorphisms.

    PubMed

    Lorenzetti, Mario Alejandro; Gantuz, Magdalena; Altcheh, Jaime; De Matteo, Elena; Chabay, Paola Andrea; Preciado, María Victoria

    2012-03-01

    The ubiquitous Epstein-Barr virus (EBV) is related to the development of lymphoma and is also the etiological agent for infectious mononucleosis (IM). Sequence variations in the gene encoding LMP1 have been deeply studied in different pathologies and geographic regions. Controversial results propose the existence of tumor-related variants, while others argued in favor of a geographical distribution of these variants. Reports assessing EBV variants in IM were performed in adult patients who displayed multiple variant infections. In the present study, LMP1 variants in 15 pediatric patients with IM and 20 pediatric patients with EBV-associated lymphomas from Argentina were analyzed as representatives of benign and malignant infections in children, respectively. A 3-month follow-up study of LMP1 variants in peripheral blood cells and in oral secretions of patients with IM was performed. Moreover, an integrated linkage analysis was performed with variants of EBNA1 and the promoter region of BZLF1. Similar sequence polymorphisms were detected in both pathological conditions, IM and lymphoma, but these differ from those previously described in healthy donors from Argentina and Brazil. The results suggest that certain LMP1 polymorphisms, namely, the 30-bp deletion and high copy number of the 33-bp repeats, are associated with EBV-related pathologies, either benign or malignant, instead of just being tumor related. Additionally, this is the first study to describe the Alaskan variant in EBV-related lymphomas that previously was restricted to nasopharyngeal carcinomas from North America.

  17. Distinctive Epstein-Barr Virus Variants Associated with Benign and Malignant Pediatric Pathologies: LMP1 Sequence Characterization and Linkage with Other Viral Gene Polymorphisms

    PubMed Central

    Lorenzetti, Mario Alejandro; Gantuz, Magdalena; Altcheh, Jaime; De Matteo, Elena; Chabay, Paola Andrea; Preciado, María Victoria

    2012-01-01

    The ubiquitous Epstein-Barr virus (EBV) is related to the development of lymphoma and is also the etiological agent for infectious mononucleosis (IM). Sequence variations in the gene encoding LMP1 have been deeply studied in different pathologies and geographic regions. Controversial results propose the existence of tumor-related variants, while others argued in favor of a geographical distribution of these variants. Reports assessing EBV variants in IM were performed in adult patients who displayed multiple variant infections. In the present study, LMP1 variants in 15 pediatric patients with IM and 20 pediatric patients with EBV-associated lymphomas from Argentina were analyzed as representatives of benign and malignant infections in children, respectively. A 3-month follow-up study of LMP1 variants in peripheral blood cells and in oral secretions of patients with IM was performed. Moreover, an integrated linkage analysis was performed with variants of EBNA1 and the promoter region of BZLF1. Similar sequence polymorphisms were detected in both pathological conditions, IM and lymphoma, but these differ from those previously described in healthy donors from Argentina and Brazil. The results suggest that certain LMP1 polymorphisms, namely, the 30-bp deletion and high copy number of the 33-bp repeats, are associated with EBV-related pathologies, either benign or malignant, instead of just being tumor related. Additionally, this is the first study to describe the Alaskan variant in EBV-related lymphomas that previously was restricted to nasopharyngeal carcinomas from North America. PMID:22205789

  18. Global characterization of copy number variants in epilepsy patients from whole genome sequencing

    PubMed Central

    Meloche, Caroline; Andrade, Danielle M.; Lafreniere, Ron G.; Gravel, Micheline; Spiegelman, Dan; Dionne-Laporte, Alexandre; Boelman, Cyrus; Hamdan, Fadi F.; Michaud, Jacques L.; Rouleau, Guy; Minassian, Berge A.; Bourque, Guillaume; Cossette, Patrick

    2018-01-01

    Epilepsy will affect nearly 3% of people at some point during their lifetime. Previous copy number variant (CNV) studies of epilepsy have used array-based technology and were restricted to the detection of large or exonic events. In contrast, whole-genome sequencing (WGS) has the potential to more comprehensively profile CNVs but existing analytic methods suffer from limited accuracy. We show that this is in part due to the non-uniformity of read coverage, even after intra-sample normalization. To improve on this, we developed PopSV, an algorithm that uses multiple samples to control for technical variation and enables the robust detection of CNVs. Using WGS and PopSV, we performed a comprehensive characterization of CNVs in 198 individuals affected with epilepsy and 301 controls. For both large and small variants, we found an enrichment of rare exonic events in epilepsy patients, especially in genes with predicted loss-of-function intolerance. Notably, this genome-wide survey also revealed an enrichment of rare non-coding CNVs near previously known epilepsy genes. This enrichment was strongest for non-coding CNVs located within 100 Kbp of an epilepsy gene and in regions associated with changes in gene expression, such as expression QTLs or DNase I hypersensitive sites. Finally, we report on 21 potentially damaging events that could be associated with known or new candidate epilepsy genes. Our results suggest that comprehensive sequence-based profiling of CNVs could help explain a larger fraction of epilepsy cases. PMID:29649218

  19. Quantitative characterization of all single amino acid variants of a viral capsid-based drug delivery vehicle.

    PubMed

    Hartman, Emily C; Jakobson, Christopher M; Favor, Andrew H; Lobba, Marco J; Álvarez-Benedicto, Ester; Francis, Matthew B; Tullman-Ercek, Danielle

    2018-04-11

    Self-assembling proteins are critical to biological systems and industrial technologies, but predicting how mutations affect self-assembly remains a significant challenge. Here, we report a technique, termed SyMAPS (Systematic Mutation and Assembled Particle Selection), that can be used to characterize the assembly competency of all single amino acid variants of a self-assembling viral structural protein. SyMAPS studies on the MS2 bacteriophage coat protein revealed a high-resolution fitness landscape that challenges some conventional assumptions of protein engineering. An additional round of selection identified a previously unknown variant (CP[T71H]) that is stable at neutral pH but less tolerant to acidic conditions than the wild-type coat protein. The capsids formed by this variant could be more amenable to disassembly in late endosomes or early lysosomes, a feature that is advantageous for delivery applications. In addition to providing a mutability blueprint for virus-like particles, SyMAPS can be readily applied to other self-assembling proteins.

  20. GAVIN: Gene-Aware Variant INterpretation for medical sequencing.

    PubMed

    van der Velde, K Joeri; de Boer, Eddy N; van Diemen, Cleo C; Sikkema-Raddatz, Birgit; Abbott, Kristin M; Knopperts, Alain; Franke, Lude; Sijmons, Rolf H; de Koning, Tom J; Wijmenga, Cisca; Sinke, Richard J; Swertz, Morris A

    2017-01-16

    We present Gene-Aware Variant INterpretation (GAVIN), a new method that accurately classifies variants for clinical diagnostic purposes. Classifications are based on gene-specific calibrations of allele frequencies from the ExAC database, likely variant impact using SnpEff, and estimated deleteriousness based on CADD scores for >3000 genes. In a benchmark on 18 clinical gene sets, we achieve a sensitivity of 91.4% and a specificity of 76.9%. This accuracy is unmatched by 12 other tools. We provide GAVIN as an online MOLGENIS service to annotate VCF files and as an open source executable for use in bioinformatic pipelines. It can be found at http://molgenis.org/gavin.

  1. A novel LPL intronic variant: g.18704C>A identified by re-sequencing Kuwaiti Arab samples is associated with high-density lipoprotein, very low-density lipoprotein and triglyceride lipid levels.

    PubMed

    Al-Bustan, Suzanne A; Al-Serri, Ahmad; Annice, Babitha G; Alnaqeeb, Majed A; Al-Kandari, Wafa Y; Dashti, Mohammed

    2018-01-01

    The role interethnic genetic differences play in plasma lipid level variation across populations is a global health concern. Several genes involved in lipid metabolism and transport are strong candidates for the genetic association with lipid level variation, especially lipoprotein lipase (LPL). The objective of this study was to re-sequence the full LPL gene in Kuwaiti Arabs, analyse the sequence variation and identify variants that could contribute to variation in plasma lipid levels for further genetic association. Samples (n = 100) of an Arab ethnic group from Kuwait were analysed for sequence variation by Sanger sequencing across the 30 Kb LPL gene and its flanking sequences. A total of 293 variants including 252 single nucleotide polymorphisms (SNPs) and 39 insertions/deletions (InDels) were identified among which 47 variants (32 SNPs and 15 InDels) were novel to Kuwaiti Arabs. This study is the first to report sequence data and analysis of frequencies of variants at the LPL gene locus in an Arab ethnic group with a novel "rare" variant (LPL:g.18704C>A) significantly associated with HDL (B = -0.181; 95% CI (-0.357, -0.006); p = 0.043), TG (B = 0.134; 95% CI (0.004-0.263); p = 0.044) and VLDL (B = 0.131; 95% CI (-0.001-0.263); p = 0.043) levels. Sequence variation in Kuwaiti Arabs was compared to other populations and was found to be similar with regard to the number of SNPs, InDels and distribution of the number of variants across the LPL gene locus and minor allele frequency (MAF). Moreover, comparison of the identified variants and their MAF with other reports provided a list of 46 potential variants across the LPL gene to be considered for future genetic association studies. The findings warrant further investigation into the association of g.18704C>A with lipid levels in other ethnic groups and with clinical manifestations of dyslipidemia.

  2. A novel LPL intronic variant: g.18704C>A identified by re-sequencing Kuwaiti Arab samples is associated with high-density lipoprotein, very low-density lipoprotein and triglyceride lipid levels

    PubMed Central

    Al-Bustan, Suzanne A.; Al-Serri, Ahmad; Annice, Babitha G.; Alnaqeeb, Majed A.; Al-Kandari, Wafa Y.; Dashti, Mohammed

    2018-01-01

    The role interethnic genetic differences play in plasma lipid level variation across populations is a global health concern. Several genes involved in lipid metabolism and transport are strong candidates for the genetic association with lipid level variation, especially lipoprotein lipase (LPL). The objective of this study was to re-sequence the full LPL gene in Kuwaiti Arabs, analyse the sequence variation and identify variants that could contribute to variation in plasma lipid levels for further genetic association. Samples (n = 100) of an Arab ethnic group from Kuwait were analysed for sequence variation by Sanger sequencing across the 30 Kb LPL gene and its flanking sequences. A total of 293 variants including 252 single nucleotide polymorphisms (SNPs) and 39 insertions/deletions (InDels) were identified among which 47 variants (32 SNPs and 15 InDels) were novel to Kuwaiti Arabs. This study is the first to report sequence data and analysis of frequencies of variants at the LPL gene locus in an Arab ethnic group with a novel “rare” variant (LPL:g.18704C>A) significantly associated with HDL (B = -0.181; 95% CI (-0.357, -0.006); p = 0.043), TG (B = 0.134; 95% CI (0.004–0.263); p = 0.044) and VLDL (B = 0.131; 95% CI (-0.001–0.263); p = 0.043) levels. Sequence variation in Kuwaiti Arabs was compared to other populations and was found to be similar with regard to the number of SNPs, InDels and distribution of the number of variants across the LPL gene locus and minor allele frequency (MAF). Moreover, comparison of the identified variants and their MAF with other reports provided a list of 46 potential variants across the LPL gene to be considered for future genetic association studies. The findings warrant further investigation into the association of g.18704C>A with lipid levels in other ethnic groups and with clinical manifestations of dyslipidemia. PMID:29438437

  3. Functional consequences of a novel variant of PCSK1.

    PubMed

    Pickett, Lindsay A; Yourshaw, Michael; Albornoz, Valeria; Chen, Zijun; Solorzano-Vargas, R Sergio; Nelson, Stanley F; Martín, Martín G; Lindberg, Iris

    2013-01-01

    Common single nucleotide polymorphisms (SNPs) in proprotein convertase subtilisin/kexin type 1 with modest effects on PC1/3 in vitro have been associated with obesity in five genome-wide association studies and with diabetes in one genome-wide association study. We here present a novel SNP and compare its biosynthesis, secretion and catalytic activity to wild-type enzyme and to SNPs that have been linked to obesity. A novel PC1/3 variant introducing an Arg to Gln amino acid substitution at residue 80 (within the secondary cleavage site of the prodomain) (rs1799904) was studied. This novel variant was selected for analysis from the 1000 Genomes sequencing project based on its predicted deleterious effect on enzyme function and its comparatively high allele frequency. The actual existence of the R80Q (rs1799904) variant was verified by Sanger sequencing. The effects of this novel variant on the biosynthesis, secretion, and catalytic activity were determined; the previously described obesity risk SNPs N221D (rs6232), Q665E/S690T (rs6234/rs6235), and the Q665E and S690T SNPs (analyzed separately) were included for comparative purposes. The novel R80Q (rs1799904) variant described in this study resulted in significantly detrimental effects on both the maturation and in vitro catalytic activity of PC1/3. Our findings that this novel R80Q (rs1799904) variant both exhibits adverse effects on PC1/3 activity and is prevalent in the population suggest that further biochemical and genetic analysis to assess its contribution to the risk of metabolic disease within the general population is warranted.

  4. Pooled Sequencing of 531 Genes in Inflammatory Bowel Disease Identifies an Associated Rare Variant in BTNL2 and Implicates Other Immune Related Genes

    PubMed Central

    Prescott, Natalie J.; Lehne, Benjamin; Stone, Kristina; Lee, James C.; Taylor, Kirstin; Knight, Jo; Papouli, Efterpi; Mirza, Muddassar M.; Simpson, Michael A.; Spain, Sarah L.; Lu, Grace; Fraternali, Franca; Bumpstead, Suzannah J.; Gray, Emma; Amar, Ariella; Bye, Hannah; Green, Peter; Chung-Faye, Guy; Hayee, Bu’Hussain; Pollok, Richard; Satsangi, Jack; Parkes, Miles; Barrett, Jeffrey C.; Mansfield, John C.; Sanderson, Jeremy; Lewis, Cathryn M.; Weale, Michael E.; Schlitt, Thomas; Mathew, Christopher G.

    2015-01-01

    The contribution of rare coding sequence variants to genetic susceptibility in complex disorders is an important but unresolved question. Most studies thus far have investigated a limited number of genes from regions which contain common disease associated variants. Here we investigate this in inflammatory bowel disease by sequencing the exons and proximal promoters of 531 genes selected from both genome-wide association studies and pathway analysis in pooled DNA panels from 474 cases of Crohn’s disease and 480 controls. 80 variants with evidence of association in the sequencing experiment or with potential functional significance were selected for follow up genotyping in 6,507 IBD cases and 3,064 population controls. The top 5 disease associated variants were genotyped in an extension panel of 3,662 IBD cases and 3,639 controls, and tested for association in a combined analysis of 10,147 IBD cases and 7,008 controls. A rare coding variant p.G454C in the BTNL2 gene within the major histocompatibility complex was significantly associated with increased risk for IBD (p = 9.65×10⁻¹⁰, OR = 2.3 [95% CI = 1.75–3.04]), but was independent of the known common associated CD and UC variants at this locus. Rare (<1%) and low frequency (1–5%) variants in 3 additional genes showed suggestive association (p < 0.005) with either an increased risk (ARIH2 c.338-6C>T) or decreased risk (IL12B p.V298F, and NICN p.H191R) of IBD. These results provide additional insights into the involvement of the inhibition of T cell activation in the development of both sub-phenotypes of inflammatory bowel disease. We suggest that although rare coding variants may make a modest overall contribution to complex disease susceptibility, they can inform our understanding of the molecular pathways that contribute to pathogenesis. PMID:25671699

  5. Distribution and clinical impact of functional variants in 50,726 whole-exome sequences from the DiscovEHR study.

    PubMed

    Dewey, Frederick E; Murray, Michael F; Overton, John D; Habegger, Lukas; Leader, Joseph B; Fetterolf, Samantha N; O'Dushlaine, Colm; Van Hout, Cristopher V; Staples, Jeffrey; Gonzaga-Jauregui, Claudia; Metpally, Raghu; Pendergrass, Sarah A; Giovanni, Monica A; Kirchner, H Lester; Balasubramanian, Suganthi; Abul-Husn, Noura S; Hartzel, Dustin N; Lavage, Daniel R; Kost, Korey A; Packer, Jonathan S; Lopez, Alexander E; Penn, John; Mukherjee, Semanti; Gosalia, Nehal; Kanagaraj, Manoj; Li, Alexander H; Mitnaul, Lyndon J; Adams, Lance J; Person, Thomas N; Praveen, Kavita; Marcketta, Anthony; Lebo, Matthew S; Austin-Tse, Christina A; Mason-Suares, Heather M; Bruse, Shannon; Mellis, Scott; Phillips, Robert; Stahl, Neil; Murphy, Andrew; Economides, Aris; Skelding, Kimberly A; Still, Christopher D; Elmore, James R; Borecki, Ingrid B; Yancopoulos, George D; Davis, F Daniel; Faucett, William A; Gottesman, Omri; Ritchie, Marylyn D; Shuldiner, Alan R; Reid, Jeffrey G; Ledbetter, David H; Baras, Aris; Carey, David J

    2016-12-23

    The DiscovEHR collaboration between the Regeneron Genetics Center and Geisinger Health System couples high-throughput sequencing to an integrated health care system using longitudinal electronic health records (EHRs). We sequenced the exomes of 50,726 adult participants in the DiscovEHR study to identify ~4.2 million rare single-nucleotide variants and insertion/deletion events, of which ~176,000 are predicted to result in a loss of gene function. Linking these data to EHR-derived clinical phenotypes, we find clinical associations supporting therapeutic targets, including genes encoding drug targets for lipid lowering, and identify previously unidentified rare alleles associated with lipid levels and other blood level traits. About 3.5% of individuals harbor deleterious variants in 76 clinically actionable genes. The DiscovEHR data set provides a blueprint for large-scale precision medicine initiatives and genomics-guided therapeutic discovery. Copyright © 2016, American Association for the Advancement of Science.

  6. X-Linked Glomerulopathy Due to COL4A5 Founder Variant.

    PubMed

    Barua, Moumita; John, Rohan; Stella, Lorenzo; Li, Weili; Roslin, Nicole M; Sharif, Bedra; Hack, Saidah; Lajoie-Starkell, Ginette; Schwaderer, Andrew L; Becknell, Brian; Wuttke, Matthias; Köttgen, Anna; Cattran, Daniel; Paterson, Andrew D; Pei, York

    2018-03-01

    Alport syndrome is a rare hereditary disorder caused by rare variants in 1 of 3 genes encoding for type IV collagen. Rare variants in COL4A5 on chromosome Xq22 cause X-linked Alport syndrome, which accounts for ∼80% of the cases. Alport syndrome has a variable clinical presentation, including progressive kidney failure, hearing loss, and ocular defects. Exome sequencing performed in 2 affected related males with an undefined X-linked glomerulopathy characterized by global and segmental glomerulosclerosis, mesangial hypercellularity, and vague basement membrane immune complex deposition revealed a COL4A5 sequence variant, a substitution of a thymine by a guanine at nucleotide 665 (c.T665G; rs281874761) of the coding DNA predicted to lead to a phenylalanine to cysteine substitution at amino acid 222, which was not seen in databases cataloguing natural human genetic variation, including dbSNP138, 1000 Genomes Project release version 01-11-2004, Exome Sequencing Project 21-06-2014, or ExAC 01-11-2014. Review of the literature identified 2 additional families with the same COL4A5 variant leading to similar atypical histopathologic features, suggesting a unique pathologic mechanism initiated by this specific rare variant. Homology modeling suggests that the substitution alters the structural and dynamic properties of the type IV collagen trimer. Genetic analysis comparing members of the 3 families indicated a distant relationship with a shared haplotype, implying a founder effect. Crown Copyright © 2017. Published by Elsevier Inc. All rights reserved.
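
    The link between coding nucleotide 665 and amino acid 222 follows from simple codon arithmetic, sketched below. The sketch assumes the usual convention that coding-DNA numbering starts at 1 with the first base of the start codon; the helper name is hypothetical.

```python
def codon_position(cds_pos: int) -> tuple[int, int]:
    """Map a 1-based coding-DNA position to (codon number, base within codon 1-3).

    Assumption: position 1 is the first base of the start codon.
    """
    if cds_pos < 1:
        raise ValueError("coding positions are 1-based")
    codon_number = (cds_pos - 1) // 3 + 1
    base_in_codon = (cds_pos - 1) % 3 + 1
    return codon_number, base_in_codon


codon_number, base_in_codon = codon_position(665)
print(f"c.665 lies in codon {codon_number}, base {base_in_codon} of 3")
```

    Nucleotide 665 is therefore the middle base of codon 222, in agreement with the amino acid position given in the abstract.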

  7. Detection of de novo single nucleotide variants in offspring of atomic-bomb survivors close to the hypocenter by whole-genome sequencing.

    PubMed

    Horai, Makiko; Mishima, Hiroyuki; Hayashida, Chisa; Kinoshita, Akira; Nakane, Yoshibumi; Matsuo, Tatsuki; Tsuruda, Kazuto; Yanagihara, Katsunori; Sato, Shinya; Imanishi, Daisuke; Imaizumi, Yoshitaka; Hata, Tomoko; Miyazaki, Yasushi; Yoshiura, Koh-Ichiro

    2018-03-01

    Ionizing radiation released by the atomic bombs at Hiroshima and Nagasaki, Japan, in 1945 caused many long-term illnesses, including increased risks of malignancies such as leukemia and solid tumours. Radiation has demonstrated genetic effects in animal models, leading to concerns over the potential hereditary effects of atomic bomb-related radiation. However, no direct analyses of whole DNA have yet been reported. We therefore investigated de novo variants in offspring of atomic-bomb survivors by whole-genome sequencing (WGS). We collected peripheral blood from three trios, each comprising a father (atomic-bomb survivor with acute radiation symptoms), a non-exposed mother, and their child, none of whom had any past history of haematological disorders. One trio of non-exposed individuals was included as a control. DNA was extracted and the numbers of de novo single nucleotide variants in the children were counted by WGS with sequencing confirmation. Gross structural variants were also analysed. Written informed consent was obtained from all participants prior to the study. There were 62, 81, and 42 de novo single nucleotide variants in the children of atomic-bomb survivors, compared with 48 in the control trio. There were no gross structural variants in any trio. These findings are in accord with previously published results that also showed no significant genetic effects of atomic-bomb radiation on second-generation survivors.

  8. Construction of a combinatorial pipeline using two somatic variant calling methods for whole exome sequence data of gastric cancer.

    PubMed

    Kohmoto, Tomohiro; Masuda, Kiyoshi; Naruto, Takuya; Tange, Shoichiro; Shoda, Katsutoshi; Hamada, Junichi; Saito, Masako; Ichikawa, Daisuke; Tajima, Atsushi; Otsuji, Eigo; Imoto, Issei

    2017-01-01

    High-throughput next-generation sequencing is a powerful tool to identify the genotypic landscapes of somatic variants and therapeutic targets in various cancers including gastric cancer, forming the basis for personalized medicine in the clinical setting. Although the advent of many computational algorithms leads to higher accuracy in somatic variant calling, no standard method exists due to the limitations of each method. Here, we constructed a new pipeline. We combined two different somatic variant callers with different algorithms, Strelka and VarScan 2, and evaluated performance using whole exome sequencing data obtained from 19 Japanese cases with gastric cancer (GC); then, we characterized these tumors based on identified driver molecular alterations. More single nucleotide variants (SNVs) and small insertions/deletions were detected by Strelka and VarScan 2, respectively. SNVs detected by both tools showed higher accuracy for estimating somatic variants compared with those detected by only one of the two tools and accurately showed the mutation signature and mutations of driver genes reported for GC. Our combinatorial pipeline may have an advantage in detection of somatic mutations in GC and may be useful for further genomic characterization of Japanese patients with GC to improve the efficacy of GC treatments. J. Med. Invest. 64: 233-240, August, 2017.
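
    The idea of keeping calls supported by both tools can be sketched with plain set operations. This is a minimal illustration, not the authors' pipeline: it assumes each caller's output has already been parsed into (chromosome, position, ref, alt) tuples, and the call sets and the function name are invented for the example.

```python
# Hypothetical, already-parsed call sets keyed by (chrom, pos, ref, alt).
# The values are made up and do not come from the study.
strelka_calls = {
    ("chr1", 1000, "A", "T"),
    ("chr2", 500, "G", "C"),
    ("chr3", 42, "C", "A"),
}
varscan_calls = {
    ("chr1", 1000, "A", "T"),
    ("chr3", 42, "C", "A"),
    ("chr4", 7, "T", "TA"),
}


def combine_calls(first, second):
    """Split two call sets into shared calls and caller-specific calls."""
    return {
        "both": first & second,
        "only_first": first - second,
        "only_second": second - first,
    }


result = combine_calls(strelka_calls, varscan_calls)
for label, calls in result.items():
    print(label, sorted(calls))
```

    Matching on the full (chrom, pos, ref, alt) key is a simplifying assumption. Real call sets usually need normalization of indel representation before the two tools' outputs can be compared this way.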

  9. Whole-Exome Sequencing to Identify Rare Variants and Gene Networks that Increase Susceptibility to Scleroderma in African Americans.

    PubMed

    Gourh, Pravitt; Remmers, Elaine F; Boyden, Steven E; Alexander, Theresa; Morgan, Nadia D; Shah, Ami A; Mayes, Maureen D; Doumatey, Ayo; Bentley, Amy R; Shriner, Daniel; Domsic, Robyn T; Medsger, Thomas A; Steen, Virginia D; Ramos, Paula S; Silver, Richard M; Korman, Benjamin; Varga, John; Schiopu, Elena; Khanna, Dinesh; Hsu, Vivien; Gordon, Jessica K; Saketkoo, Lesley Ann; Gladue, Heather; Kron, Brynn; Criswell, Lindsey A; Derk, Chris T; Bridges, S Louis; Shanmugam, Victoria K; Kolstad, Kathleen D; Chung, Lorinda; Jan, Reem; Bernstein, Elana J; Goldberg, Avram; Trojanowski, Marcin; Kafaja, Suzanne; Maksimowicz-McKinnon, Kathleen M; Mullikin, James C; Adeyemo, Adebowale; Rotimi, Charles; Boin, Francesco; Kastner, Daniel L; Wigley, Fredrick M

    2018-05-06

    Whole-exome sequencing (WES) studies in systemic sclerosis (SSc) patients of European American (EA) ancestry have identified variants in the ATP8B4 gene and enrichment of variants in genes in the extracellular matrix (ECM)-related pathway increasing SSc susceptibility. Our goal was to evaluate the association of the ATP8B4 gene and the ECM-related pathway with SSc in a cohort of African Americans (AA). SSc patients of AA ancestry were enrolled from 23 academic centers across the United States under the Genome Research in African American Scleroderma Patients (GRASP) consortium. Unrelated AA individuals without serological evidence of autoimmunity enrolled in the Howard University Family Study were used as unaffected controls. Functional variants in genes reported in the two WES studies in EA SSc were selected for gene association testing using the optimized sequence kernel association test (SKAT-O) and pathway analysis by Ingenuity pathway analysis in 379 patients and 411 controls. Principal components analysis demonstrated that the patients and controls had similar ancestral backgrounds with about equal proportions of mean European admixture. Using SKAT-O, we examined the association of individual genes that were previously reported in EAs, and none remained significant including ATP8B4 (Puncorr = 0.98). However, we confirm the previously reported association of the ECM-related pathway with enrichment of variants within the COL13A1, COL18A1, COL22A1, COL4A3, COL4A4, COL5A2, PROK1, and SERPINE1 genes (Pcorr = 1.95 × 10^-4). This is the largest genetic study in AAs with SSc to date, corroborating the role of functional variants aggregating in a fibrotic pathway and increasing SSc susceptibility. This article is protected by copyright. All rights reserved.

  10. Complete Genome Sequence of a Novel Hantavirus Variant of Rio Mamoré Virus, Maripa Virus, from French Guiana

    PubMed Central

    Matheus, Séverine; Lavergne, Anne; de Thoisy, Benoît; Dussart, Philippe

    2012-01-01

    We report the first complete genome sequence of Maripa virus identified in 2009 from a patient with hantavirus pulmonary syndrome in French Guiana. Maripa virus corresponds to a new variant of the Rio Mamoré virus species in the Bunyaviridae family, genus Hantavirus. PMID:22492924

  11. Detection and isolation of nucleic acid sequences using competitive hybridization probes

    DOEpatents

    Lucas, Joe N.; Straume, Tore; Bogen, Kenneth T.

    1997-01-01

    A method for detecting a target nucleic acid sequence in a sample is provided using hybridization probes which competitively hybridize to a target nucleic acid. According to the method, a target nucleic acid sequence is hybridized to first and second hybridization probes which are complementary to overlapping portions of the target nucleic acid sequence, the first hybridization probe including a first complexing agent capable of forming a binding pair with a second complexing agent and the second hybridization probe including a detectable marker. The first complexing agent attached to the first hybridization probe is contacted with a second complexing agent, the second complexing agent being attached to a solid support such that when the first and second complexing agents are attached, target nucleic acid sequences hybridized to the first hybridization probe become immobilized on to the solid support. The immobilized target nucleic acids are then separated and detected by detecting the detectable marker attached to the second hybridization probe. A kit for performing the method is also provided.

  12. Detection and isolation of nucleic acid sequences using competitive hybridization probes

    DOEpatents

    Lucas, J.N.; Straume, T.; Bogen, K.T.

    1997-04-01

    A method for detecting a target nucleic acid sequence in a sample is provided using hybridization probes which competitively hybridize to a target nucleic acid. According to the method, a target nucleic acid sequence is hybridized to first and second hybridization probes which are complementary to overlapping portions of the target nucleic acid sequence, the first hybridization probe including a first complexing agent capable of forming a binding pair with a second complexing agent and the second hybridization probe including a detectable marker. The first complexing agent attached to the first hybridization probe is contacted with a second complexing agent, the second complexing agent being attached to a solid support such that when the first and second complexing agents are attached, target nucleic acid sequences hybridized to the first hybridization probe become immobilized on to the solid support. The immobilized target nucleic acids are then separated and detected by detecting the detectable marker attached to the second hybridization probe. A kit for performing the method is also provided. 7 figs.

  13. Rare Variant, Gene-Based Association Study of Hereditary Melanoma Using Whole-Exome Sequencing.

    PubMed

    Artomov, Mykyta; Stratigos, Alexander J; Kim, Ivana; Kumar, Raj; Lauss, Martin; Reddy, Bobby Y; Miao, Benchun; Daniela Robles-Espinoza, Carla; Sankar, Aravind; Njauw, Ching-Ni; Shannon, Kristen; Gragoudas, Evangelos S; Marie Lane, Anne; Iyer, Vivek; Newton-Bishop, Julia A; Timothy Bishop, D; Holland, Elizabeth A; Mann, Graham J; Singh, Tarjinder; Daly, Mark J; Tsao, Hensin

    2017-12-01

    Extraordinary progress has been made in our understanding of common variants in many diseases, including melanoma. Because the contribution of rare coding variants is not as well characterized, we performed an exome-wide, gene-based association study of familial cutaneous melanoma (CM) and ocular melanoma (OM). Using 11,990 jointly processed individual DNA samples, whole-exome sequencing was performed, followed by large-scale joint variant calling using GATK (Genome Analysis ToolKit). PLINK/SEQ was used for statistical analysis of genetic variation. Four models were used to estimate the association among different types of variants. In vitro functional validation was performed using three human melanoma cell lines in 2D and 3D proliferation assays. In vivo tumor growth was assessed using xenografts of human A375 melanoma cells in nude mice (eight mice per group). All statistical tests were two-sided. Strong signals were detected for CDKN2A (Pmin = 6.16 × 10^-8) in the CM cohort (n = 273) and BAP1 (Pmin = 3.83 × 10^-6) in the OM (n = 99) cohort. Eleven genes that exhibited borderline association (P < 10^-4) were independently validated using The Cancer Genome Atlas melanoma cohort (379 CM, 47 OM) and a matched set of 3563 European controls with CDKN2A (P = .009), BAP1 (P = .03), and EBF3 (P = 4.75 × 10^-4), a candidate risk locus, all showing evidence of replication. EBF3 was then evaluated using germline data from a set of 132 familial melanoma cases and 4769 controls of UK origin (joint P = 1.37 × 10^-5). Somatically, loss of EBF3 expression correlated with progression, poorer outcome, and high MITF tumors. Functionally, induction of EBF3 in melanoma cells reduced cell growth in vitro, retarded tumor formation in vivo, and reduced MITF levels. The results of this large rare variant germline association study further define the mutational landscape of hereditary melanoma and implicate EBF3 as a possible CM predisposition gene.

  14. Candidate genes for congenital diaphragmatic hernia from animal models: sequencing of fog2 and pdgfra reveals rare variants in diaphragmatic hernia patients

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bleyl, S.B.; Moshrefi, A.; Shaw, G.M.

    2007-05-11

    Congenital diaphragmatic hernia (CDH) is a common, life-threatening birth defect. Although there is strong evidence implicating genetic factors in its pathogenesis, few causative genes have been identified, and in isolated CDH, only one de novo, nonsense mutation has been reported in FOG2 in a female with posterior diaphragmatic eventration. We report here that the homozygous null mouse for the Pdgfra gene has posterolateral diaphragmatic defects and thus is a model for human CDH. We hypothesized that mutations in this gene could cause human CDH. We sequenced PDGFRa and FOG2 in 96 patients with CDH, of which 53 had isolated CDH (55.2 percent), 36 had CDH and additional anomalies (37.5 percent), and 7 had CDH and known chromosome aberrations (7.3 percent). For FOG2, we identified novel sequence alterations predicting p.M703L and p.T843A in two patients with isolated CDH that were absent in 526 and 564 control chromosomes respectively. These altered amino acids were highly conserved. However, due to the lack of available parental DNA samples we were not able to determine if the sequence alterations were de novo. For PDGFRa, we found a single variant predicting p.L967V in a patient with CDH and multiple anomalies that was absent in 768 control chromosomes. This patient also had one cell with trisomy 15 on skin fibroblast culture, a finding of uncertain significance. Although our study identified sequence variants in FOG2 and PDGFRa, we have not definitively established the variants as mutations and we found no evidence that CDH commonly results from mutations in these genes.

  15. Postnatal Expression of V2 Vasopressin Receptor Splice Variants in the Rat Cerebellum

    PubMed Central

    Vargas, Karina J.; Sarmiento, José M.; Ehrenfeld, Pamela; Añazco, Carolina C.; Villanueva, Carolina I.; Carmona, Pamela L.; Brenet, Marianne; Navarro, Javier; Müller-Esterl, Werner; Figueroa, Carlos D.; González, Carlos B.

    2010-01-01

    The V2 vasopressin receptor gene contains an alternative splice site in exon-3, which leads to the generation of two splice variants (V2a and V2b) first identified in the kidney. The open reading frame of the alternatively spliced V2b transcript encodes a truncated receptor, showing the same amino acid sequence as the canonical V2a receptor up to the 6th transmembrane segment, but displaying a distinct sequence to the corresponding 7th transmembrane segment and C-terminal domain relative to the V2a receptor. Here, we demonstrate the postnatal expression of V2a and V2b variants in the rat cerebellum. Most importantly, we showed by in situ hybridization and immunocytochemistry that both V2 splice variants were preferentially expressed in Purkinje cells, from early to late postnatal development. In addition, both variants were transiently expressed in the neuroblastic external granule cells and Bergmann fibers. These results indicate that the cellular distributions of both splice variants are developmentally regulated, and suggest that the transient expression of the V2 receptor is involved in the mechanisms of cerebellar cytodifferentiation by AVP. Finally, transfected CHO-K1 cells expressing similar amounts of both V2 splice variants, as that found in the cerebellum, showed a significant reduction in the surface expression of V2a receptors, suggesting that the differential expression of the V2 splice variants regulates the vasopressin signaling in the cerebellum. PMID:19281786

  16. Coval: Improving Alignment Quality and Variant Calling Accuracy for Next-Generation Sequencing Data

    PubMed Central

    Kosugi, Shunichi; Natsume, Satoshi; Yoshida, Kentaro; MacLean, Daniel; Cano, Liliana; Kamoun, Sophien; Terauchi, Ryohei

    2013-01-01

    Accurate identification of DNA polymorphisms using next-generation sequencing technology is challenging because of a high rate of sequencing error and incorrect mapping of reads to reference genomes. Currently available short read aligners and DNA variant callers suffer from these problems. We developed the Coval software to improve the quality of short read alignments. Coval is designed to minimize the incidence of spurious alignment of short reads, by filtering mismatched reads that remained in alignments after local realignment and error correction of mismatched reads. The error correction is executed based on the base quality and allele frequency at the non-reference positions for an individual or pooled sample. We demonstrated the utility of Coval by applying it to simulated genomes and experimentally obtained short-read data of rice, nematode, and mouse. Moreover, we found an unexpectedly large number of incorrectly mapped reads in ‘targeted’ alignments, where the whole genome sequencing reads had been aligned to a local genomic segment, and showed that Coval effectively eliminated such spurious alignments. We conclude that Coval significantly improves the quality of short-read sequence alignments, thereby increasing the calling accuracy of currently available tools for SNP and indel identification. Coval is available at http://sourceforge.net/projects/coval105/. PMID:24116042

  17. Generic and sequence-variant specific molecular assays for the detection of the highly variable Grapevine leafroll-associated virus 3.

    PubMed

    Chooi, Kar Mun; Cohen, Daniel; Pearson, Michael N

    2013-04-01

    Grapevine leafroll-associated virus 3 (GLRaV-3) is an economically important virus, which is found in all grapevine growing regions worldwide. Its accurate detection in nursery and field samples is of high importance for certification schemes and disease management programmes. To reduce false negatives that can be caused by sequence variability, a new universal primer pair was designed against a divergent sequence data set, targeting the open reading frame 4 (heat shock protein 70 homologue gene), and optimised for conventional one-step RT-PCR and one-step SYBR Green real-time RT-PCR assays. In addition, primer pairs for the simultaneous detection of specific GLRaV-3 variants from groups 1, 2, 6 (specifically NZ-1) and the outlier NZ2 variant, and the generic detection of variants from groups 1 to 5 were designed and optimised as a conventional one-step multiplex RT-PCR assay using the plant nad5 gene as an internal control (i.e. one-step hexaplex RT-PCR). Results showed that the generic and variant specific assays detected in vitro RNA transcripts from a range of 1×10^1 to 1×10^8 copies of amplicon per μl diluted in healthy total RNA from Vitis vinifera cv. Cabernet Sauvignon. Furthermore, the assays were employed effectively to screen 157 germplasm and 159 commercial field samples. Thus results demonstrate that the GLRaV-3 generic and variant-specific assays are prospective tools that will be beneficial for certification schemes and disease management programmes, as well as biological and epidemiological studies of the divergent GLRaV-3 populations. Copyright © 2013 Elsevier B.V. All rights reserved.

  18. Deep sequencing is an appropriate tool for the selection of unique Hepatitis C virus (HCV) variants after single genomic amplification.

    PubMed

    Guinoiseau, Thibault; Moreau, Alain; Hohnadel, Guillaume; Ngo-Giang-Huong, Nicole; Brulard, Celine; Vourc'h, Patrick; Goudeau, Alain; Gaudy-Graffin, Catherine

    2017-01-01

    Hepatitis C virus (HCV) evolves rapidly in a single host and circulates as a quasispecies, which is a complex mixture of genetically distinct but closely related viruses, termed variants. To identify intra-individual diversity and investigate the functional properties of these variants in vitro, it is necessary to define the quasispecies composition and isolate the HCV variants. This is possible using single genome amplification (SGA). This technique, based on serially diluted cDNA to amplify a single cDNA molecule (clonal amplicon), has already been used to determine individual HCV diversity. In these studies, positive PCR reactions from SGA were directly sequenced using Sanger technology. Non-clonal amplicons must be detected and excluded to facilitate further functional analysis. Here, we compared Next Generation Sequencing (NGS) with de novo assembly and Sanger sequencing for their ability to distinguish clonal and non-clonal amplicons after SGA on one plasma specimen. All amplicons (n = 42) classified as clonal by NGS were also classified as clonal by Sanger sequencing. No double peaks were seen on electropherograms for non-clonal amplicons with position-specific nucleotide variation below 15% by NGS. Altogether, NGS circumvented many of the difficulties encountered when using Sanger sequencing after SGA and is an appropriate tool to reliably select clonal amplicons for further functional studies.

  19. Amino acid sequence of the human fibronectin receptor

    PubMed Central

    1987-01-01

    The amino acid sequence deduced from cDNA of the human placental fibronectin receptor is reported. The receptor is composed of two subunits: an alpha subunit of 1,008 amino acids which is processed into two polypeptides disulfide bonded to one another, and a beta subunit of 778 amino acids. Each subunit has near its COOH terminus a hydrophobic segment. This and other sequence features suggest a structure for the receptor in which the hydrophobic segments serve as transmembrane domains anchoring each subunit to the membrane and dividing each into a large ectodomain and a short cytoplasmic domain. The alpha subunit ectodomain has five sequence elements homologous to consensus Ca2+-binding sites of several calcium-binding proteins, and the beta subunit contains a fourfold repeat strikingly rich in cysteine. The alpha subunit sequence is 46% homologous to the alpha subunit of the vitronectin receptor. The beta subunit is 44% homologous to the human platelet adhesion receptor subunit IIIa and 47% homologous to a leukocyte adhesion receptor beta subunit. The high degree of homology (85%) of the beta subunit with one of the polypeptides of a chicken adhesion receptor complex referred to as integrin complex strongly suggests that the latter polypeptide is the chicken homologue of the fibronectin receptor beta subunit. These receptor subunit homologies define a superfamily of adhesion receptors. The availability of the entire protein sequence for the fibronectin receptor will facilitate studies on the functions of these receptors. PMID:2958481

  20. Phenolic acid esterases, coding sequences and methods

    DOEpatents

    Blum, David L.; Kataeva, Irina; Li, Xin-Liang; Ljungdahl, Lars G.

    2002-01-01

    Described herein are four phenolic acid esterases, three of which correspond to domains of previously unknown function within bacterial xylanases, from XynY and XynZ of Clostridium thermocellum and from a xylanase of Ruminococcus. The fourth specifically exemplified xylanase is a protein encoded within the genome of Orpinomyces PC-2. The amino acids of these polypeptides and nucleotide sequences encoding them are provided. Recombinant host cells, expression vectors and methods for the recombinant production of phenolic acid esterases are also provided.

  1. Identification of random nucleic acid sequence aberrations using dual capture probes which hybridize to different chromosome regions

    DOEpatents

    Lucas, Joe N.; Straume, Tore; Bogen, Kenneth T.

    1998-01-01

    A method is provided for detecting nucleic acid sequence aberrations using two immobilization steps. According to the method, a nucleic acid sequence aberration is detected by detecting nucleic acid sequences having both a first nucleic acid sequence type (e.g., from a first chromosome) and a second nucleic acid sequence type (e.g., from a second chromosome), the presence of the first and the second nucleic acid sequence type on the same nucleic acid sequence indicating the presence of a nucleic acid sequence aberration. In the method, immobilization of a first hybridization probe is used to isolate a first set of nucleic acids in the sample which contain the first nucleic acid sequence type. Immobilization of a second hybridization probe is then used to isolate a second set of nucleic acids from within the first set of nucleic acids which contain the second nucleic acid sequence type. The second set of nucleic acids are then detected, their presence indicating the presence of a nucleic acid sequence aberration.

  2. Exploring the feasibility of using copy number variants as genetic markers through large-scale whole genome sequencing experiments

    USDA-ARS's Scientific Manuscript database

    Copy number variants (CNV) are large scale duplications or deletions of genomic sequence that are caused by a diverse set of molecular phenomena that are distinct from single nucleotide polymorphism (SNP) formation. Due to their different mechanisms of formation, CNVs are often difficult to track us...

  3. Diet1, bile acid diarrhea, and FGF15/19: mouse model and human genetic variants.

    PubMed

    Lee, Jessica M; Ong, Jessica R; Vergnes, Laurent; de Aguiar Vallim, Thomas Q; Nolan, Jonathan; Cantor, Rita M; Walters, Julian R F; Reue, Karen

    2018-03-01

    Diet1 modulates intestinal production of the hormone fibroblast growth factor (FGF)15, which signals in liver to regulate bile acid synthesis. C57BL/6ByJ mice with a spontaneous Diet1-null mutation are resistant to hypercholesterolemia compared with wild-type C57BL/6J mice through enhanced cholesterol conversion to bile acids. To further characterize the role of Diet1 in metabolism, we generated Diet1-/- mice on the C57BL/6J genetic background. C57BL/6J Diet1-/- mice had elevated bile acid levels, reduced Fgf15 expression, and increased gastrointestinal motility and intestinal luminal water content, which are symptoms of bile acid diarrhea (BAD) in humans. Natural genetic variation in Diet1 mRNA expression levels across 76 inbred mouse strains correlated positively with Fgf15 mRNA and negatively with serum bile acid levels. This led us to investigate the role of DIET1 genetic variation in primary BAD patients. We identified a DIET1 coding variant (rs12256835) that had skewed prevalence between BAD cases and controls. This variant causes an H1721Q amino acid substitution that increases the levels of FGF19 protein secreted from cultured cells. We propose that genetic variation in DIET1 may be a determinant of FGF19 secretion levels, and may affect bile acid metabolism in both physiological and pathological conditions. Copyright © 2018 by the American Society for Biochemistry and Molecular Biology, Inc.

  4. WHATIF: an open-source desktop application for extraction and management of the incidental findings from next-generation sequencing variant data

    PubMed Central

    Ye, Zhan; Kadolph, Christopher; Strenn, Robert; Wall, Daniel; McPherson, Elizabeth; Lin, Simon

    2015-01-01

    Background: Identification and evaluation of incidental findings in patients following whole exome (WES) or whole genome sequencing (WGS) is challenging for both practicing physicians and researchers. The American College of Medical Genetics and Genomics (ACMG) recently recommended a list of reportable incidental genetic findings. However, no informatics tools are currently available to support evaluation of incidental findings in next-generation sequencing data. Methods: The Wisconsin Hierarchical Analysis Tool for Incidental Findings (WHATIF) was developed as a stand-alone Windows-based desktop executable to support the interactive analysis of incidental findings in the context of the ACMG recommendations. WHATIF integrates the European Bioinformatics Institute Variant Effect Predictor (VEP) tool for biological interpretation and the National Center for Biotechnology Information ClinVar tool for clinical interpretation. Results: An open-source desktop program was created to annotate incidental findings and present the results with a user-friendly interface. Further, a meaningful index (WHATIF Index) was devised for each gene to facilitate ranking of the relative importance of the variants and estimate the potential workload associated with further evaluation of the variants. Our WHATIF application is available at: http://tinyurl.com/WHATIF-SOFTWARE Conclusions: The WHATIF application offers a user-friendly interface and allows users to investigate the extracted variant information efficiently and intuitively while always accessing the up-to-date information on variants via application programming interface (API) connections. WHATIF’s highly flexible design and straightforward implementation aid users in customizing the source code to meet their own special needs. PMID:25890833

  5. A novel pathogenic variant in an Iranian Ataxia telangiectasia family revealed by next-generation sequencing followed by in silico analysis.

    PubMed

    Tabatabaiefar, Mohammad Amin; Alipour, Paria; Pourahmadiyan, Azam; Fattahi, Najmeh; Shariati, Laleh; Golchin, Neda; Mohammadi-Asl, Javad

    2017-08-15

    Ataxia telangiectasia (A-T) is a neurodegenerative autosomal recessive disorder with the main characteristics of progressive cerebellar degeneration, sensitivity to ionizing radiation, immunodeficiency, telangiectasia, premature aging, recurrent sinopulmonary infections, and increased risk of malignancy, especially of lymphoid origin. The Ataxia Telangiectasia Mutated gene, ATM, the causative gene for A-T, encodes the ATM protein, which plays an important role in the activation of cell-cycle checkpoints and initiation of DNA repair in response to DNA damage. Targeted next-generation sequencing (NGS) was performed on an Iranian 5-year-old boy who presented with truncal and limb ataxia, telangiectasia of the eye, Hodgkin lymphoma, hyperpigmentation, total alopecia, hepatomegaly, and dysarthria. Sanger sequencing was used to confirm the candidate pathogenic variants. Computational docking was done using the HEX software to examine how this change affects the interactions of ATM with the upstream and downstream proteins. Three different variants were identified comprising two homozygous SNPs and one novel homozygous frameshift variant (c.8046_8047delTA, p.Thr2682ThrfsX5), which creates a stop codon in exon 57 leaving the protein truncated at its C-terminal portion. Therefore, the activation and phosphorylation of target proteins are lost. Moreover, the HEX software confirmed that the mutated protein lost its interaction with upstream and downstream proteins. The variant was classified as pathogenic based on the American College of Medical Genetics and Genomics guideline. This study expands the spectrum of ATM pathogenic variants in Iran and demonstrates the utility of targeted NGS in genetic diagnostics. Copyright © 2017. Published by Elsevier B.V.

  6. Unlocking hidden genomic sequence

    PubMed Central

    Keith, Jonathan M.; Cochran, Duncan A. E.; Lala, Gita H.; Adams, Peter; Bryant, Darryn; Mitchelson, Keith R.

    2004-01-01

    Despite the success of conventional Sanger sequencing, significant regions of many genomes still present major obstacles to sequencing. Here we propose a novel approach with the potential to alleviate a wide range of sequencing difficulties. The technique involves extracting target DNA sequence from variants generated by introduction of random mutations. The introduction of mutations does not destroy original sequence information, but distributes it amongst multiple variants. Some of these variants lack problematic features of the target and are more amenable to conventional sequencing. The technique has been successfully demonstrated with mutation levels up to an average 18% base substitution and has been used to read previously intractable poly(A), AT-rich and GC-rich motifs. PMID:14973330
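
    The claim that random mutation distributes the original sequence information across variants, rather than destroying it, can be illustrated with a per-position majority vote. This is only a toy sketch under strong assumptions that the abstract does not state: the variants are substitution-only, equal in length, and already aligned, and the sequences below are invented. The reconstruction method actually used in the paper is not described here and may differ.

```python
from collections import Counter


def consensus(variants):
    """Majority base at each column of equal-length, pre-aligned sequences."""
    if len({len(v) for v in variants}) != 1:
        raise ValueError("variants must be non-empty and have equal length")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*variants))


# Invented target containing a poly(A) run, plus five variants that each
# carry one or two random substitutions relative to it.
target = "AAAAAAAAGGCC"
variants = [
    "AAGAAAAAGGCC",
    "AAAAACAAGGTC",
    "ATAAAAAAGGCC",
    "AAAAAAGAGACC",
    "AAAACAAAGGCC",
]

recovered = consensus(variants)
print(recovered, recovered == target)
```

    Because each substitution lands at a different position in a different variant, every column still has a clear majority base and the vote recovers the target even though no single variant matches it.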

  7. Germ-line variants identified by next generation sequencing in a panel of estrogen and cancer associated genes correlate with poor clinical outcome in Lynch syndrome patients.

    PubMed

    Jóri, Balazs; Kamps, Rick; Xanthoulea, Sofia; Delvoux, Bert; Blok, Marinus J; Van de Vijver, Koen K; de Koning, Bart; Oei, Felicia Trups; Tops, Carli M; Speel, Ernst Jm; Kruitwagen, Roy F; Gomez-Garcia, Encarna B; Romano, Andrea

    2015-12-01

    The risk of developing colorectal and endometrial cancers among subjects testing positive for a pathogenic Lynch syndrome mutation varies, making risk prediction difficult. Genetic risk modifiers alter the risk conferred by inherited Lynch syndrome mutations, and their identification can improve genetic counseling. We aimed at identifying rare genetic modifiers of the risk of Lynch syndrome endometrial cancer. A family-based approach was used to assess the presence of genetic risk modifiers among 35 Lynch syndrome mutation carriers having either a poor clinical phenotype (early age of endometrial cancer diagnosis or multiple cancers) or a neutral clinical phenotype. Putative genetic risk modifiers were identified by Next Generation Sequencing among a panel of 154 genes involved in endometrial physiology and carcinogenesis. A simple pipeline, based on an allele frequency lower than 0.001 and on predicted non-conservative amino-acid substitutions, returned 54 variants that were considered putative risk modifiers. The presence of two or more risk-modifying variants in women carrying a pathogenic Lynch syndrome mutation was associated with a poor clinical phenotype. A gene panel is proposed that comprises genes that can carry variants with putative modifying effects on the risk of Lynch syndrome endometrial cancer. Validation in further studies is warranted before considering the possible use of this tool in genetic counseling.
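
    The two-criterion filter described in this abstract (allele frequency below 0.001 plus a predicted non-conservative amino-acid substitution) can be sketched as follows. The abstract does not say how non-conservative substitutions were predicted, so the physicochemical classes in `AA_CLASS`, the `Variant` record, and the gene names in the example are assumptions made purely for illustration.

```python
from dataclasses import dataclass

# Assumed amino-acid classes; a substitution that changes class is treated as
# non-conservative. The study's actual prediction method is not stated.
AA_CLASS = {
    **dict.fromkeys("AVLIMFWPG", "nonpolar"),
    **dict.fromkeys("STCYNQ", "polar"),
    **dict.fromkeys("KRH", "basic"),
    **dict.fromkeys("DE", "acidic"),
}


@dataclass(frozen=True)
class Variant:
    gene: str                # hypothetical gene label
    ref_aa: str              # one-letter reference amino acid
    alt_aa: str              # one-letter substituted amino acid
    allele_frequency: float  # population allele frequency


def is_non_conservative(variant):
    """True when the substitution moves between the assumed classes."""
    return AA_CLASS[variant.ref_aa] != AA_CLASS[variant.alt_aa]


def putative_modifiers(variants, max_af=0.001):
    """Keep rare variants (frequency below max_af) that are non-conservative."""
    return [
        v for v in variants
        if v.allele_frequency < max_af and is_non_conservative(v)
    ]


candidates = [
    Variant("GENE_A", "R", "Q", 0.0004),  # rare, basic -> polar: kept
    Variant("GENE_B", "L", "I", 0.0002),  # rare but conservative: dropped
    Variant("GENE_C", "D", "K", 0.0200),  # non-conservative but common: dropped
]
kept = putative_modifiers(candidates)
```

    Only the first record passes both criteria; the 0.001 cut-off is taken from the abstract, while everything else is placeholder data.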

  8. Identification of random nucleic acid sequence aberrations using dual capture probes which hybridize to different chromosome regions

    DOEpatents

    Lucas, J.N.; Straume, T.; Bogen, K.T.

    1998-03-24

    A method is provided for detecting nucleic acid sequence aberrations using two immobilization steps. According to the method, a nucleic acid sequence aberration is detected by detecting nucleic acid sequences having both a first nucleic acid sequence type (e.g., from a first chromosome) and a second nucleic acid sequence type (e.g., from a second chromosome), the presence of the first and the second nucleic acid sequence type on the same nucleic acid sequence indicating the presence of a nucleic acid sequence aberration. In the method, immobilization of a first hybridization probe is used to isolate a first set of nucleic acids in the sample which contain the first nucleic acid sequence type. Immobilization of a second hybridization probe is then used to isolate a second set of nucleic acids from within the first set of nucleic acids which contain the second nucleic acid sequence type. The second set of nucleic acids are then detected, their presence indicating the presence of a nucleic acid sequence aberration. 14 figs.

  9. An inversion of 25 base pairs causes feline GM2 gangliosidosis variant.

    PubMed

    Martin, Douglas R; Krum, Barbara K; Varadarajan, G S; Hathcock, Terri L; Smith, Bruce F; Baker, Henry J

    2004-05-01

    In G(M2) gangliosidosis variant 0, a defect in the beta-subunit of lysosomal beta-N-acetylhexosaminidase (EC 3.2.1.52) causes abnormal accumulation of G(M2) ganglioside and severe neurodegeneration. Distinct feline models of G(M2) gangliosidosis variant 0 have been described in both domestic shorthair and Korat cats. In this study, we determined that the causative mutation of G(M2) gangliosidosis in the domestic shorthair cat is a 25-base-pair inversion at the extreme 3' end of the beta-subunit (HEXB) coding sequence, which introduces three amino acid substitutions at the carboxyl terminus of the protein and a translational stop that is eight amino acids premature. Cats homozygous for the 25-base-pair inversion express levels of beta-subunit mRNA approximately 190% of normal and protein levels only 10-20% of normal. Because the 25-base-pair inversion is similar to mutations in the terminal exon of human HEXB, the domestic shorthair cat should serve as an appropriate model to study the molecular pathogenesis of human G(M2) gangliosidosis variant 0 (Sandhoff disease).
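
    To make the nature of a short inversion concrete: on a given strand, an inverted segment reads as the reverse complement of the original segment, which is why it can change codons and shift the stop position. The sketch below applies that operation to a made-up sequence; it does not use the feline HEXB sequence, and `invert_segment` is a hypothetical helper name.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def invert_segment(seq, start, length):
    """Replace seq[start:start+length] with its reverse complement."""
    if start < 0 or length < 0 or start + length > len(seq):
        raise ValueError("segment lies outside the sequence")
    segment = seq[start:start + length]
    return seq[:start] + reverse_complement(segment) + seq[start + length:]


# Made-up sequence: a start codon, a 25-base segment, and a stop codon.
segment_25 = "ACGTTGCAAGGCTTACCGATAGGCT"
wild_type = "ATG" + segment_25 + "TAA"
inverted = invert_segment(wild_type, start=3, length=25)
```

    Applying the same inversion a second time restores the original sequence, which is a convenient sanity check for the helper.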

  10. Nucleic and Amino Acid Sequences Support Structure-Based Viral Classification.

    PubMed

    Sinclair, Robert M; Ravantti, Janne J; Bamford, Dennis H

    2017-04-15

    Viral capsids ensure viral genome integrity by protecting the enclosed nucleic acids. Interactions between the genome and capsid and between individual capsid proteins (i.e., capsid architecture) are intimate and are expected to be characterized by strong evolutionary conservation. For this reason, a capsid structure-based viral classification has been proposed as a way to bring order to the viral universe. The seeming lack of sufficient sequence similarity to reproduce this classification has made it difficult to reject structural convergence as the basis for the classification. We reinvestigate whether the structure-based classification for viral coat proteins making icosahedral virus capsids is in fact supported by previously undetected sequence similarity. Since codon choices can influence nascent protein folding cotranslationally, we searched for both amino acid and nucleotide sequence similarity. To demonstrate the sensitivity of the approach, we identify a candidate gene for the pandoravirus capsid protein. We show that the structure-based classification is strongly supported by amino acid and also nucleotide sequence similarities, suggesting that the similarities are due to common descent. The correspondence between structure-based and sequence-based analyses of the same proteins shown here allow them to be used in future analyses of the relationship between linear sequence information and macromolecular function, as well as between linear sequence and protein folds. IMPORTANCE Viral capsids protect nucleic acid genomes, which in turn encode capsid proteins. This tight coupling of protein shell and nucleic acids, together with strong functional constraints on capsid protein folding and architecture, leads to the hypothesis that capsid protein-coding nucleotide sequences may retain signatures of ancient viral evolution. We have been able to show that this is indeed the case, using the major capsid proteins of viruses forming icosahedral capsids. 

  11. Nucleic and Amino Acid Sequences Support Structure-Based Viral Classification

    PubMed Central

    Sinclair, Robert M.; Ravantti, Janne J.

    2017-01-01

    ABSTRACT Viral capsids ensure viral genome integrity by protecting the enclosed nucleic acids. Interactions between the genome and capsid and between individual capsid proteins (i.e., capsid architecture) are intimate and are expected to be characterized by strong evolutionary conservation. For this reason, a capsid structure-based viral classification has been proposed as a way to bring order to the viral universe. The seeming lack of sufficient sequence similarity to reproduce this classification has made it difficult to reject structural convergence as the basis for the classification. We reinvestigate whether the structure-based classification for viral coat proteins making icosahedral virus capsids is in fact supported by previously undetected sequence similarity. Since codon choices can influence nascent protein folding cotranslationally, we searched for both amino acid and nucleotide sequence similarity. To demonstrate the sensitivity of the approach, we identify a candidate gene for the pandoravirus capsid protein. We show that the structure-based classification is strongly supported by amino acid and also nucleotide sequence similarities, suggesting that the similarities are due to common descent. The correspondence between structure-based and sequence-based analyses of the same proteins shown here allow them to be used in future analyses of the relationship between linear sequence information and macromolecular function, as well as between linear sequence and protein folds. IMPORTANCE Viral capsids protect nucleic acid genomes, which in turn encode capsid proteins. This tight coupling of protein shell and nucleic acids, together with strong functional constraints on capsid protein folding and architecture, leads to the hypothesis that capsid protein-coding nucleotide sequences may retain signatures of ancient viral evolution. We have been able to show that this is indeed the case, using the major capsid proteins of viruses forming icosahedral capsids

  12. Functional Consequences of a Novel Variant of PCSK1

    PubMed Central

    Pickett, Lindsay A.; Yourshaw, Michael; Albornoz, Valeria; Chen, Zijun; Solorzano-Vargas, R. Sergio; Nelson, Stanley F.; Martín, Martín G.; Lindberg, Iris

    2013-01-01

    Background Common single nucleotide polymorphisms (SNPs) in proprotein convertase subtilisin/kexin type 1 with modest effects on PC1/3 in vitro have been associated with obesity in five genome-wide association studies and with diabetes in one genome-wide association study. We here present a novel SNP and compare its biosynthesis, secretion and catalytic activity to wild-type enzyme and to SNPs that have been linked to obesity. Methodology/Principal Findings A novel PC1/3 variant introducing an Arg to Gln amino acid substitution at residue 80 (within the secondary cleavage site of the prodomain) (rs1799904) was studied. This novel variant was selected for analysis from the 1000 Genomes sequencing project based on its predicted deleterious effect on enzyme function and its comparatively high allele frequency. The existence of the R80Q (rs1799904) variant was verified by Sanger sequencing. The effects of this novel variant on the biosynthesis, secretion, and catalytic activity were determined; the previously described obesity risk SNPs N221D (rs6232), Q665E/S690T (rs6234/rs6235), and the Q665E and S690T SNPs (analyzed separately) were included for comparative purposes. The novel R80Q (rs1799904) variant described in this study resulted in significantly detrimental effects on both the maturation and in vitro catalytic activity of PC1/3. Conclusion/Significance Our findings that this novel R80Q (rs1799904) variant both exhibits adverse effects on PC1/3 activity and is prevalent in the population suggest that further biochemical and genetic analysis to assess its contribution to the risk of metabolic disease within the general population is warranted. PMID:23383060

  13. Detection of clinically relevant copy-number variants by exome sequencing in a large cohort of genetic disorders

    PubMed Central

    Pfundt, Rolph; del Rosario, Marisol; Vissers, Lisenka E.L.M.; Kwint, Michael P.; Janssen, Irene M.; de Leeuw, Nicole; Yntema, Helger G.; Nelen, Marcel R.; Lugtenberg, Dorien; Kamsteeg, Erik-Jan; Wieskamp, Nienke; Stegmann, Alexander P.A.; Stevens, Servi J.C.; Rodenburg, Richard J.T.; Simons, Annet; Mensenkamp, Arjen R.; Rinne, Tuula; Gilissen, Christian; Scheffer, Hans; Veltman, Joris A.; Hehir-Kwa, Jayne Y.

    2017-01-01

    Purpose: Copy-number variation is a common source of genomic variation and an important genetic cause of disease. Microarray-based analysis of copy-number variants (CNVs) has become a first-tier diagnostic test for patients with neurodevelopmental disorders, with a diagnostic yield of 10–20%. However, for most other genetic disorders, the role of CNVs is less clear and most diagnostic genetic studies are generally limited to the study of single-nucleotide variants (SNVs) and other small variants. With the introduction of exome and genome sequencing, it is now possible to detect both SNVs and CNVs using an exome- or genome-wide approach with a single test. Methods: We performed exome-based read-depth CNV screening on data from 2,603 patients affected by a range of genetic disorders for which exome sequencing was performed in a diagnostic setting. Results: In total, 123 clinically relevant CNVs ranging in size from 727 bp to 15.3 Mb were detected, which resulted in 51 conclusive diagnoses and an overall increase in diagnostic yield of ~2% (ranging from 0 to 5.8% per disorder). Conclusions: This study shows that CNVs play an important role in a broad range of genetic disorders and that detection via exome-based CNV profiling results in an increase in the diagnostic yield without additional testing, bringing us closer to single-test genomics. Genet Med advance online publication 27 October 2016 PMID:28574513

  14. Clinical Validation of Copy Number Variant Detection from Targeted Next-Generation Sequencing Panels.

    PubMed

    Kerkhof, Jennifer; Schenkel, Laila C; Reilly, Jack; McRobbie, Sheri; Aref-Eshghi, Erfan; Stuart, Alan; Rupar, C Anthony; Adams, Paul; Hegele, Robert A; Lin, Hanxin; Rodenhiser, David; Knoll, Joan; Ainsworth, Peter J; Sadikovic, Bekim

    2017-11-01

    Next-generation sequencing (NGS) technology has rapidly replaced Sanger sequencing in the assessment of sequence variations in clinical genetics laboratories. One major limitation of current NGS approaches is their limited ability to detect copy number variations (CNVs) larger than approximately 50 bp. Because these represent a major mutational burden in many genetic disorders, parallel CNV assessment using alternate supplemental methods, along with the NGS analysis, is normally required, resulting in increased labor, costs, and turnaround times. The objective of this study was to clinically validate a novel CNV detection algorithm using targeted clinical NGS gene panel data. We have applied this approach in a retrospective cohort of 391 samples and a prospective cohort of 2375 samples and found a 100% sensitivity (95% CI, 89%-100%) for 37 unique events and a high degree of specificity to detect CNVs across nine distinct targeted NGS gene panels. This NGS CNV pipeline enables stand-alone first-tier assessment for CNV and sequence variants in a clinical laboratory setting, dispensing with the need for parallel CNV analysis using classic techniques, such as microarray, long-range PCR, or multiplex ligation-dependent probe amplification. This NGS CNV pipeline can also be applied to the assessment of complex genomic regions, including pseudogenic DNA sequences, such as the PMS2CL gene, and to mitochondrial genome heteroplasmy detection. Copyright © 2017 American Society for Investigative Pathology and the Association for Molecular Pathology. Published by Elsevier Inc. All rights reserved.

  15. Targeted deep sequencing identifies rare loss-of-function variants in IFNGR1 for risk of atopic dermatitis complicated by eczema herpeticum.

    PubMed

    Gao, Li; Bin, Lianghua; Rafaels, Nicholas M; Huang, Lili; Potee, Joseph; Ruczinski, Ingo; Beaty, Terri H; Paller, Amy S; Schneider, Lynda C; Gallo, Rich; Hanifin, Jon M; Beck, Lisa A; Geha, Raif S; Mathias, Rasika A; Barnes, Kathleen C; Leung, Donald Y M

    2015-12-01

    A subset of atopic dermatitis is associated with increased susceptibility to eczema herpeticum (ADEH+). We previously reported that common single nucleotide polymorphisms (SNPs) in the IFN-γ (IFNG) and IFN-γ receptor 1 (IFNGR1) genes were associated with the ADEH+ phenotype. We sought to interrogate the role of rare variants in interferon pathway genes for the risk of ADEH+. We performed targeted sequencing of interferon pathway genes (IFNG, IFNGR1, IFNAR1, and IL12RB1) in 228 European American patients with AD selected according to their eczema herpeticum status, and severity was measured by using the Eczema Area and Severity Index. Replication genotyping was performed in independent samples of 219 European American and 333 African American subjects. Functional investigation of loss-of-function variants was conducted by using site-directed mutagenesis. We identified 494 single nucleotide variants encompassing 105 kb of sequence, including 145 common, 349 (70.6%) rare (minor allele frequency <5%), and 86 (17.4%) novel variants, of which 2.8% were coding synonymous, 93.3% were noncoding (64.6% intronic), and 3.8% were missense. We identified 6 rare IFNGR1 missense variants, including 3 damaging variants (Val14Met [V14M], Val61Ile, and Tyr397Cys [Y397C]) conferring a higher risk for ADEH+ (P = .031). Variants V14M and Y397C were confirmed to be deleterious, leading to partial IFNGR1 deficiency. Seven common IFNGR1 SNPs, along with common protective haplotypes (2-7 SNPs), conferred a reduced risk of ADEH+ (P = .015-.002 and P = .0015-.0004, respectively), and both SNP and haplotype associations were replicated in an independent African American sample (P = .004-.0001 and P = .001-.0001, respectively). Our results provide evidence that both common and rare genetic variants in the gene encoding IFNGR1 are implicated in susceptibility to the ADEH+ phenotype. Copyright © 2015 American Academy of Allergy, Asthma & Immunology. Published by Elsevier Inc. All rights reserved.

  16. Sequence variants in four genes underlying Bardet-Biedl syndrome in consanguineous families

    PubMed Central

    Ullah, Asmat; Umair, Muhammad; Yousaf, Maryam; Khan, Sher Alam; Nazim-ud-din, Muhammad; Shah, Khadim; Ahmad, Farooq; Azeem, Zahid; Ali, Ghazanfar; Alhaddad, Bader; Rafique, Afzal; Jan, Abid; Haack, Tobias B.; Strom, Tim M.; Meitinger, Thomas; Ghous, Tahseen

    2017-01-01

    Purpose To investigate the molecular basis of Bardet-Biedl syndrome (BBS) in five consanguineous families of Pakistani origin. Methods Linkage in two families (A and B) was established to BBS7 on chromosome 4q27, in family C to BBS8 on chromosome 14q32.1, and in family D to BBS10 on chromosome 12q21.2. Family E was investigated directly with exome sequence analysis. Results Sanger sequencing revealed two novel mutations and three previously reported mutations in the BBS genes. These mutations include two deletions (c.580_582delGCA, c.1592_1597delTTCCAG) in the BBS7 gene, a missense mutation (p.Gln449His) in the BBS8 gene, a frameshift mutation (c.271_272insT) in the BBS10 gene, and a nonsense mutation (p.Ser40*) in the MKKS (BBS6) gene. Conclusions Two novel mutations and three previously reported variants, identified in the present study, further extend the body of evidence implicating BBS6, BBS7, BBS8, and BBS10 in causing BBS. PMID:28761321

  17. Variants in the human intestinal fatty acid binding protein 2 gene in obese subjects.

    PubMed

    Sipiläinen, R; Uusitupa, M; Heikkinen, S; Rissanen, A; Laakso, M

    1997-08-01

    Fatty acid binding protein 2 gene (FABP2) has been proposed to be an important candidate gene for insulin resistance; therefore, it also could be a promising candidate gene for obesity. We screened the whole coding region of the FABP2 gene in 40 obese nondiabetic Finnish subjects. Furthermore, we investigated the effects of the codon 54 polymorphism of this gene (Ala-->Thr) on insulin levels and basal metabolic rate in 170 obese subjects. The frequencies of the variants found in exon 4 (GTA-->GTG) and 3'-noncoding region (GCGCA-->GCACA), as well as the allele frequencies for the variable lengths of the ATT repeat sequence in intron 2 did not differ between the obese subjects and nonobese controls. The frequency of threonine-encoding allele in codon 54 of the FABP2 gene did not differ between obese and control subjects (28 vs. 29%, respectively). In the obese group there were no differences in gender distribution, age, weight, body mass index, lean body mass, percentage of body fat, waist circumference, and waist-to-hip ratio among the individuals homozygous for Ala54, heterozygous for Thr54, and homozygous for Thr54-encoding alleles. Similarly, fasting serum insulin, glucose, lipids and lipoprotein concentrations, basal metabolic rate (adjusted for lean body mass and age), respiratory quotient, and rates of glucose and lipid oxidation did not differ among the groups. We conclude that obesity is not associated with specific variants in the FABP2 gene. Furthermore, the codon 54 Ala to Thr polymorphism of this gene does not influence insulin levels or basal metabolic rate in obese Finns.

  18. Longitudinal Antigenic Sequences and Sites from Intra-Host Evolution (LASSIE) identifies immune-selected HIV variants

    DOE PAGES

    Hraber, Peter; Korber, Bette; Wagh, Kshitij; ...

    2015-10-21

    Within-host genetic sequencing from samples collected over time provides a dynamic view of how viruses evade host immunity. Immune-driven mutations might stimulate neutralization breadth by selecting antibodies adapted to cycles of immune escape that generate within-subject epitope diversity. Comprehensive identification of immune-escape mutations is experimentally and computationally challenging. With current technology, many more viral sequences can readily be obtained than can be tested for binding and neutralization, making down-selection necessary. Typically, this is done manually, by picking variants that represent different time-points and branches on a phylogenetic tree. Such strategies are likely to miss many relevant mutations and combinations of mutations, and to be redundant for other mutations. Longitudinal Antigenic Sequences and Sites from Intrahost Evolution (LASSIE) uses transmitted founder loss to identify virus “hot-spots” under putative immune selection and chooses sequences that represent recurrent mutations in selected sites. LASSIE favors earliest sequences in which mutations arise. Here, with well-characterized longitudinal Env sequences, we confirmed selected sites were concentrated in antibody contacts and selected sequences represented diverse antigenic phenotypes. Finally, practical applications include rapidly identifying immune targets under selective pressure within a subject, selecting minimal sets of reagents for immunological assays that characterize evolving antibody responses, and for immunogens in polyvalent “cocktail” vaccines.
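
    The notion of transmitted-founder (TF) loss mentioned in this abstract can be illustrated with a minimal sketch: for each aligned site, compute the fraction of sequences at a time point that no longer match the TF residue, then flag sites whose peak loss reaches a cut-off. This is not the LASSIE implementation; the function names, the 0.8 cut-off, and the four-residue toy alignment are all assumptions for illustration.

```python
def tf_loss(tf, sequences):
    """Per-site fraction of aligned sequences that differ from the TF residue."""
    n = len(sequences)
    return [
        sum(seq[i] != tf[i] for seq in sequences) / n
        for i in range(len(tf))
    ]


def selected_sites(tf, timepoints, cutoff=0.8):
    """Indices of sites whose peak TF loss over all time points is >= cutoff.

    `timepoints` maps a sampling day to a list of aligned sequences; the
    cutoff value is an illustrative assumption.
    """
    peak = [0.0] * len(tf)
    for sequences in timepoints.values():
        for i, loss in enumerate(tf_loss(tf, sequences)):
            peak[i] = max(peak[i], loss)
    return [i for i, loss in enumerate(peak) if loss >= cutoff]


# Toy alignment: the TF form and samples from two hypothetical days.
tf = "NKTE"
timepoints = {
    30: ["NKTE", "NKTE", "NKTE", "NRTE"],
    180: ["NRTE", "NRTE", "NRTE", "NRTE", "NKAE"],
}
hot_spots = selected_sites(tf, timepoints)
```

    In the toy data, the second site drifts away from the TF residue in most later sequences and is the only one flagged.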

  19. Deep sequencing is an appropriate tool for the selection of unique Hepatitis C virus (HCV) variants after single genomic amplification

    PubMed Central

    Guinoiseau, Thibault; Moreau, Alain; Hohnadel, Guillaume; Ngo-Giang-Huong, Nicole; Brulard, Celine; Vourc’h, Patrick; Goudeau, Alain; Gaudy-Graffin, Catherine

    2017-01-01

    Hepatitis C virus (HCV) evolves rapidly in a single host and circulates as a quasispecies, which is a complex mixture of genetically distinct but closely related viruses, namely variants. To identify intra-individual diversity and investigate the functional properties of the variants in vitro, it is necessary to define the quasispecies composition and isolate the HCV variants. This is possible using single genome amplification (SGA). This technique, based on serially diluted cDNA to amplify a single cDNA molecule (clonal amplicon), has already been used to determine individual HCV diversity. In these studies, positive PCR reactions from SGA were directly sequenced using Sanger technology. Non-clonal amplicons must be detected so that they can be excluded, which facilitates further functional analysis. Here, we compared Next Generation Sequencing (NGS) with de novo assembly and Sanger sequencing for their ability to distinguish clonal and non-clonal amplicons after SGA on one plasma specimen. All amplicons (n = 42) classified as clonal by NGS were also classified as clonal by Sanger sequencing. No double peaks were seen on electropherograms for non-clonal amplicons with position-specific nucleotide variation below 15% by NGS. Altogether, NGS circumvented many of the difficulties encountered when using Sanger sequencing after SGA and is an appropriate tool to reliably select clonal amplicons for further functional studies. PMID:28362878
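
    Distinguishing clonal from non-clonal amplicons by position-specific nucleotide variation can be sketched as below. The abstract does not state the NGS cut-off used for classification (the 15% figure refers to what Sanger electropherograms failed to show), so the threshold is left as a required parameter, and the pileup structure and the 0.05 value in the example are hypothetical.

```python
def max_minor_fraction(pileup):
    """Largest per-position fraction of reads that disagree with the major base.

    `pileup` is a list with one dict per position, mapping base -> read count.
    """
    worst = 0.0
    for counts in pileup:
        depth = sum(counts.values())
        if depth == 0:
            continue  # no coverage, nothing to assess at this position
        minor_reads = depth - max(counts.values())
        worst = max(worst, minor_reads / depth)
    return worst


def is_clonal(pileup, threshold):
    """Call an amplicon clonal when no position reaches the given threshold."""
    return max_minor_fraction(pileup) < threshold


# Hypothetical read counts for two amplicons, two positions each.
clonal_amplicon = [{"A": 998, "G": 2}, {"C": 1000}]
mixed_amplicon = [{"A": 900, "G": 100}, {"C": 1000}]
calls = (is_clonal(clonal_amplicon, 0.05), is_clonal(mixed_amplicon, 0.05))
```

    A 10% minor variant, as in the second amplicon, is the kind of mixture the abstract reports as invisible on Sanger traces but measurable by NGS.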

  20. Screening of SHOX gene sequence variants in Saudi Arabian children with idiopathic short stature.

    PubMed

    Alharthi, Abdulla A; El-Hallous, Ehab I; Talaat, Iman M; Alghamdi, Hamed A; Almalki, Matar I; Gaber, Ahmed

    2017-10-01

    Short stature affects approximately 2%-3% of children, representing one of the most frequent disorders for which clinical attention is sought during childhood. Despite assumed genetic heterogeneity, mutations or deletions in the short stature homeobox-containing gene (SHOX) are frequently detected in subjects with short stature. Idiopathic short stature (ISS) refers to patients with short stature for various unknown reasons. The goal of this study was to screen all the exons of SHOX to identify related mutations. We screened all the exons of SHOX for mutations in 105 children with ISS (57 girls and 48 boys) living in Taif governorate, KSA, using a direct DNA sequencing method. Height, arm span, and sitting height were recorded, and subischial leg length was calculated. A total of 30 of 105 ISS patients (28%) carried six polymorphic variants in exons 1, 2, 4, and 6. One mutation was found in the DNA-binding domain region of exon 4. Three of these polymorphic variants were novel, while the others were reported previously. There were no significant differences in anthropometric measures in ISS patients with and without identifiable polymorphic variants in SHOX. In Saudi Arabian ISS patients, it is possible that genes other than SHOX are involved in longitudinal growth. Additional molecular analysis is required to diagnose and understand the etiology of this disease.

  1. Synthesis of betulinic acid derivatives as entry inhibitors against HIV-1 and bevirimat-resistant HIV-1 variants.

    PubMed

    Dang, Zhao; Qian, Keduo; Ho, Phong; Zhu, Lei; Lee, Kuo-Hsiung; Huang, Li; Chen, Chin-Ho

    2012-08-15

    Betulinic acid derivatives modified at the C28 position, such as compound A43D, are HIV-1 entry inhibitors; in contrast, derivatives modified at the C3 position instead of C28 give HIV-1 maturation inhibitors such as bevirimat. Bevirimat exhibited promising pharmacokinetic profiles in clinical trials, but its effectiveness was compromised by the high baseline drug resistance of HIV-1 variants with polymorphism in the putative drug binding site. In an effort to determine whether the viruses with bevirimat-resistant polymorphism also altered their sensitivities to the betulinic acid derivatives that inhibit HIV-1 entry, a series of new betulinic acid entry inhibitors were synthesized and tested for their activities against HIV-1 NL4-3 and NL4-3 variants resistant to bevirimat. The results show that the bevirimat-resistant viruses were approximately 5- to 10-fold more sensitive to three new glutamine ester derivatives (13, 15 and 38) and A43D in an HIV-1 multi-cycle replication assay. In contrast, the wild-type NL4-3 and the bevirimat-resistant variants were equally sensitive to the HIV-1 RT inhibitor AZT. In addition, these three new compounds showed markedly improved microsomal stability compared to A43D. Copyright © 2012 Elsevier Ltd. All rights reserved.

  2. Adhesion of glucosyltransferase phase variants to Streptococcus gordonii bacterium-glucan substrata may involve lipoteichoic acid.

    PubMed

    Vickerman, M M; Jones, G W

    1992-10-01

    Growing Streptococcus gordonii Spp+ phase variants, which have normal levels of glucosyltransferase (GTF) activity, use sucrose to promote their accumulation on surfaces by forming a cohesive bacterium-insoluble glucan polymer mass (BPM). Spp- phase variants, which have lower levels of GTF activity, do not form BPMs and do not remain in BPMs formed by Spp+ cells when grown in mixed cultures. To test the hypothesis that segregation of attached Spp+ and unattached Spp- cells was due to differences in adhesiveness, adhesion between washed, [3H]thymidine-labeled cells and preformed BPM substrata was measured. Unexpectedly, the results showed that cells of both phenotypes, as well as GTF-negative cells, attached equally well to preformed BPMs, indicating that attachment to BPMs was independent of cell surface GTF activity. Initial characterization of this binding interaction suggested that a protease-sensitive component on the washed cells may be binding to lipoteichoic acids sequestered in the BPM, since exogenous lipoteichoic acid inhibited adhesion. Surprisingly, the adhesion of both Spp+ and Spp- cells was markedly inhibited in the presence of sucrose, which also released lipoteichoic acid from the BPM. These in vitro findings suggest that, in vivo, sucrose and lipoteichoic acid may modify dental plaque development by enhancing or inhibiting the attachment of additional bacteria.

  3. Adhesion of glucosyltransferase phase variants to Streptococcus gordonii bacterium-glucan substrata may involve lipoteichoic acid.

    PubMed Central

    Vickerman, M M; Jones, G W

    1992-01-01

    Growing Streptococcus gordonii Spp+ phase variants, which have normal levels of glucosyltransferase (GTF) activity, use sucrose to promote their accumulation on surfaces by forming a cohesive bacterium-insoluble glucan polymer mass (BPM). Spp- phase variants, which have lower levels of GTF activity, do not form BPMs and do not remain in BPMs formed by Spp+ cells when grown in mixed cultures. To test the hypothesis that segregation of attached Spp+ and unattached Spp- cells was due to differences in adhesiveness, adhesion between washed, [3H]thymidine-labeled cells and preformed BPM substrata was measured. Unexpectedly, the results showed that cells of both phenotypes, as well as GTF-negative cells, attached equally well to preformed BPMs, indicating that attachment to BPMs was independent of cell surface GTF activity. Initial characterization of this binding interaction suggested that a protease-sensitive component on the washed cells may be binding to lipoteichoic acids sequestered in the BPM, since exogenous lipoteichoic acid inhibited adhesion. Surprisingly, the adhesion of both Spp+ and Spp- cells was markedly inhibited in the presence of sucrose, which also released lipoteichoic acid from the BPM. These in vitro findings suggest that, in vivo, sucrose and lipoteichoic acid may modify dental plaque development by enhancing or inhibiting the attachment of additional bacteria. PMID:1398940

  4. Beta-glucosidase I variants with improved properties

    DOEpatents

    Bott, Richard R.; Kaper, Thijs; Kelemen, Bradley; Goedegebuur, Frits; Hommes, Ronaldus Wilhelmus; Kralj, Slavko; Kruithof, Paulien; Nikolaev, Igor; Van Der Kley, Wilhelmus Antonious Hendricus; Van Lieshout, Johannes Franciscus Thomas; Van Stigt Thans, Sander

    2016-09-20

    The present disclosure is generally directed to enzymes and in particular beta-glucosidase variants. Also described are nucleic acids encoding beta-glucosidase variants, compositions comprising beta-glucosidase variants, methods of using beta-glucosidase variants, and methods of identifying additional useful beta-glucosidase variants.

  5. PNPLA3 variant I148M is associated with altered hepatic lipid composition in humans.

    PubMed

    Peter, Andreas; Kovarova, Marketa; Nadalin, Silvio; Cermak, Tomas; Königsrainer, Alfred; Machicao, Fausto; Stefan, Norbert; Häring, Hans-Ulrich; Schleicher, Erwin

    2014-10-01

    The common sequence variant I148M of the patatin-like phospholipase domain-containing protein 3 gene (PNPLA3) is associated with increased hepatic triacylglycerol (TAG) content, but not with insulin resistance, in humans. The PNPLA3 (I148M) variant was previously reported to alter the specificity of the encoded enzyme and subsequently affect lipid composition. We analysed the fatty acid composition of five lipid fractions from liver tissue samples from 52 individuals, including 19 carriers of the minor PNPLA3 (I148M) variant. PNPLA3 (I148M) was associated with a strong increase (1.75-fold) in liver TAGs, but with no change in other lipid fractions. PNPLA3 (I148M) minor allele carriers had an increased n-3 polyunsaturated fatty acid (PUFA) α-linolenic acid content and reductions in several n-6 PUFAs in the liver TAG fraction. Furthermore, there was a strong inverse correlation between n-6 PUFA and TAG content independent of PNPLA3 genotype. In a multivariate model including liver fat content, PNPLA3 genotype and fatty acid composition, two significant differences could be exclusively attributed to the PNPLA3 (I148M) minor allele: reduced stearic acid and increased α-linolenic acid content in the hepatic TAG fraction. These changes therefore suggest a mechanism to explain the PNPLA3 (I148M)-dependent increase in liver fat content without causing insulin resistance. Stearic acid can induce insulin resistance, whereas α-linolenic acid may protect against it.

  6. Paenibacillus polymyxa PKB1 produces variants of polymyxin B-type antibiotics.

    PubMed

    Shaheen, Mohamed; Li, Jingru; Ross, Avena C; Vederas, John C; Jensen, Susan E

    2011-12-23

    Polymyxins are cationic lipopeptide antibiotics active against many species of Gram-negative bacteria. We sequenced the gene cluster for polymyxin biosynthesis from Paenibacillus polymyxa PKB1. The 40.8 kb gene cluster comprises three nonribosomal peptide synthetase-encoding genes and two ABC transporter-like genes. Disruption of a peptide synthetase gene abolished all antibiotic production, whereas deletion of one or both transporter genes only reduced antibiotic production. Computational analysis of the peptide synthetase modules suggested that the enzyme system produces variant forms of polymyxin B (1 and 2), with D-2,4-diaminobutyrate instead of L-2,4-diaminobutyrate in amino acid position 3. Two antibacterial metabolites were resolved by HPLC and identified by high-resolution mass spectrometry and MS/MS sequencing as the expected variants 3 and 4 of polymyxin B(1) (1) and B(2) (2). Stereochemical analysis confirmed the presence of both D-2,4-diaminobutyrate and L-2,4-diaminobutyrate residues. Copyright © 2011 Elsevier Ltd. All rights reserved.

  7. Rare Variant, Gene-Based Association Study of Hereditary Melanoma Using Whole-Exome Sequencing

    PubMed Central

    Artomov, Mykyta; Stratigos, Alexander J; Kim, Ivana; Kumar, Raj; Lauss, Martin; Reddy, Bobby Y; Miao, Benchun; Daniela Robles-Espinoza, Carla; Sankar, Aravind; Njauw, Ching-Ni; Shannon, Kristen; Gragoudas, Evangelos S; Marie Lane, Anne; Iyer, Vivek; Newton-Bishop, Julia A; Timothy Bishop, D; Holland, Elizabeth A; Mann, Graham J; Singh, Tarjinder; Daly, Mark J; Tsao, Hensin

    2017-01-01

    Background: Extraordinary progress has been made in our understanding of common variants in many diseases, including melanoma. Because the contribution of rare coding variants is not as well characterized, we performed an exome-wide, gene-based association study of familial cutaneous melanoma (CM) and ocular melanoma (OM). Methods: Using 11 990 jointly processed individual DNA samples, whole-exome sequencing was performed, followed by large-scale joint variant calling using GATK (Genome Analysis ToolKit). PLINK/SEQ was used for statistical analysis of genetic variation. Four models were used to estimate the association among different types of variants. In vitro functional validation was performed using three human melanoma cell lines in 2D and 3D proliferation assays. In vivo tumor growth was assessed using xenografts of human A375 melanoma cells in nude mice (eight mice per group). All statistical tests were two-sided. Results: Strong signals were detected for CDKN2A (Pmin = 6.16 × 10⁻⁸) in the CM cohort (n = 273) and BAP1 (Pmin = 3.83 × 10⁻⁶) in the OM (n = 99) cohort. Eleven genes that exhibited borderline association (P < 10⁻⁴) were independently validated using The Cancer Genome Atlas melanoma cohort (379 CM, 47 OM) and a matched set of 3563 European controls with CDKN2A (P = .009), BAP1 (P = .03), and EBF3 (P = 4.75 × 10⁻⁴), a candidate risk locus, all showing evidence of replication. EBF3 was then evaluated using germline data from a set of 132 familial melanoma cases and 4769 controls of UK origin (joint P = 1.37 × 10⁻⁵). Somatically, loss of EBF3 expression correlated with progression, poorer outcome, and high MITF tumors. Functionally, induction of EBF3 in melanoma cells reduced cell growth in vitro, retarded tumor formation in vivo, and reduced MITF levels. Conclusions: The results of this large rare variant germline association study further define the mutational landscape of hereditary melanoma and

  8. Complete amino acid sequence of bovine colostrum low-Mr cysteine proteinase inhibitor.

    PubMed

    Hirado, M; Tsunasawa, S; Sakiyama, F; Niinobe, M; Fujii, S

    1985-07-01

    The complete amino acid sequence of bovine colostrum cysteine proteinase inhibitor was determined by sequencing native inhibitor and peptides obtained by cyanogen bromide degradation, Achromobacter lysylendopeptidase digestion and partial acid hydrolysis of reduced and S-carboxymethylated protein. Achromobacter peptidase digestion was successfully used to isolate two disulfide-containing peptides. The inhibitor consists of 112 amino acids with an Mr of 12787. Two disulfide bonds were established between Cys 66 and Cys 77 and between Cys 90 and Cys 110. A high degree of homology in the sequence was found between the colostrum inhibitor and human gamma-trace, human salivary acidic protein and chicken egg-white cystatin.

  9. Novel oxytocin receptor variants in laboring women requiring high doses of oxytocin.

    PubMed

    Reinl, Erin L; Goodwin, Zane A; Raghuraman, Nandini; Lee, Grace Y; Jo, Erin Y; Gezahegn, Beakal M; Pillai, Meghan K; Cahill, Alison G; de Guzman Strong, Cristina; England, Sarah K

    2017-08-01

    Although oxytocin is commonly used to augment or induce labor, it is difficult to predict its effectiveness because oxytocin dose requirements vary significantly among women. One possibility is that women requiring high or low doses of oxytocin have variations in the oxytocin receptor gene. The objective was to identify oxytocin receptor gene variants in laboring women with low and high oxytocin dosage requirements. Term, nulliparous women requiring oxytocin doses of ≤4 mU/min (low-dose-requiring, n = 83) or ≥20 mU/min (high-dose-requiring, n = 104) for labor augmentation or induction provided consent to a postpartum blood draw as a source of genomic DNA. Targeted-amplicon sequencing (coverage >30×) with MiSeq (Illumina) was performed to discover variants in the coding exons of the oxytocin receptor gene. Baseline relevant clinical history, outcomes, demographics, and oxytocin receptor gene sequence variants and their allele frequencies were compared between low-dose-requiring and high-dose-requiring women. The Sorting Intolerant From Tolerant (SIFT) algorithm was used to predict the effect of variants on oxytocin receptor function. The Fisher exact or χ² tests were used for categorical variables, and Student t tests or Wilcoxon rank sum tests were used for continuous variables. A P value < .05 was considered statistically significant. The high-dose-requiring women had greater rates of obesity and diabetes and were more likely to have undergone labor induction and required prostaglandins. High-dose-requiring women were more likely to undergo cesarean delivery for first-stage arrest and less likely to undergo cesarean delivery for nonreassuring fetal status. Targeted sequencing of the oxytocin receptor gene in the total cohort (n = 187) revealed 30 distinct coding variants: 17 nonsynonymous, 11 synonymous, and 2 small structural variants. One novel variant (A243T) was found in both the low- and high-dose-requiring groups. Three novel variants (Y106H, A240_A249del, and P197

  10. A novel histone variant localized in nucleoli of higher plant cells.

    PubMed

    Tanaka, I; Akahori, Y; Gomi, K; Suzuki, T; Ueda, K

    1999-07-01

    Immunofluorescence staining with antisera raised against p35, a basic nuclear protein that accumulates in the pollen nuclei of Lilium longiflorum, specifically stained the nucleoli in interphase nuclei of somatic tissues, including root and leaf, and in pachytene nuclei during meiotic division, whereas antisera raised against histone H1 uniformly stained the entire chromatin domain with the exception of the nucleoli in these nuclei. Further, p35-specific antisera stained the nucleoli in root and leaf nuclei of the monocotyledonous plants Tulipa gesneriana, Allium cepa and Triticum aestivum and of the dicotyledonous plants Vicia faba and Nicotiana tabacum. Thus, these novel antisera stained the nucleoli in cells of all higher plants examined, although the staining patterns within nucleoli were somewhat different among plant species and tissues. The full-length cDNA of p35 was cloned on the basis of the partial amino acid sequence. The deduced amino acid composition and amino acid sequence of p35 indicate that this nucleolar protein is a novel variant of histone H1. Further, p35 was strongly bound to ribosomal DNA in vitro. The results of immunoblotting of histones extracted from each tissue of the various plant species with the nucleolus-specific antibodies also suggested the conservation of similar epitope(s) in both mono- and dicotyledonous plants. From these results, it is suggested that similar variants of histone H1 are specifically distributed in the nucleoli of all plant species and help to organize the nucleolar chromatin.

  11. Cellobiohydrolase variants and polynucleotides encoding same

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wogulis, Mark

    The present invention relates to variants of a parent cellobiohydrolase II. The present invention also relates to polynucleotides encoding the variants; nucleic acid constructs, vectors, and host cells comprising the polynucleotides; and methods of using the variants.

  12. Clinical detection of deletion structural variants in whole-genome sequences

    PubMed Central

    Noll, Aaron C; Miller, Neil A; Smith, Laurie D; Yoo, Byunggil; Fiedler, Stephanie; Cooley, Linda D; Willig, Laurel K; Petrikin, Josh E; Cakici, Julie; Lesko, John; Newton, Angela; Detherage, Kali; Thiffault, Isabelle; Saunders, Carol J; Farrow, Emily G; Kingsmore, Stephen F

    2016-01-01

    Optimal management of acutely ill infants with monogenetic diseases requires rapid identification of causative haplotypes. Whole-genome sequencing (WGS) has been shown to identify pathogenic nucleotide variants in such infants. Deletion structural variants (DSVs, >50 nt) are implicated in many genetic diseases, and tools have been designed to identify DSVs using short-read WGS. Optimisation and integration of these tools into a WGS pipeline could improve diagnostic sensitivity and specificity of WGS. In addition, it may improve turnaround time when compared with current CNV assays, enhancing utility in acute settings. Here we describe DSV detection methods for use in WGS for rapid diagnosis in acutely ill infants: SKALD (Screening Konsensus and Annotation of Large Deletions) combines calls from two tools (Breakdancer and GenomeStrip) with calibrated filters and clinical interpretation rules. In four WGS runs, the average analytic precision (positive predictive value) of SKALD was 78%, and recall (sensitivity) was 27%, when compared with validated reference DSV calls. When retrospectively applied to a cohort of 36 families with acutely ill infants SKALD identified causative DSVs in two. The first was heterozygous deletion of exons 1–3 of MMP21 in trans with a heterozygous frame-shift deletion in two siblings with transposition of the great arteries and heterotaxy. In a newborn female with dysmorphic features, ventricular septal defect and persistent pulmonary hypertension, SKALD identified the breakpoints of a heterozygous, de novo 1p36.32p36.13 deletion. In summary, consensus DSV calling, implemented in an 8-h computational pipeline with parameterised filtering, has the potential to increase the diagnostic yield of WGS in acutely ill neonates and discover novel disease genes. PMID:29263817

  13. MACARON: A python framework to identify and re-annotate multi-base affected codons in whole genome/exome sequence data.

    PubMed

    Khan, Waqasuddin; Saripella, Ganapathi Varma-; Ludwig, Thomas; Cuppens, Tania; Thibord, Florian; Génin, Emmanuelle; Deleuze, Jean-Francois; Trégouët, David-Alexandre

    2018-05-03

    Predicted deleteriousness of coding variants is a frequently used criterion to filter out variants detected in next-generation sequencing projects and to select candidates impacting on the risk of human diseases. Most available dedicated tools implement a base-to-base annotation approach that could be biased in presence of several variants in the same genetic codon. We here proposed the MACARON program that, from a standard VCF file, identifies, re-annotates and predicts the amino acid change resulting from multiple single nucleotide variants (SNVs) within the same genetic codon. Applied to the whole exome dataset of 573 individuals, MACARON identifies 114 situations where multiple SNVs within a genetic codon induce an amino acid change that is different from those predicted by standard single SNV annotation tool. Such events are not uncommon and deserve to be studied in sequencing projects with inconclusive findings. MACARON is written in python with codes available on the GENMED website (www.genmed.fr). Supplementary data are available at Bioinformatics online.
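
    The effect described in this abstract can be illustrated with a small sketch. It is not MACARON's code: the function names are hypothetical, the codon is given directly rather than read from a VCF file, and both SNVs are assumed to lie on the same haplotype. It shows only that translating the jointly mutated codon can give a different amino acid than translating each SNV on its own.

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def apply_snvs(codon, snvs):
    """Return the codon after applying (offset, alt_base) substitutions."""
    bases = list(codon)
    for offset, alt in snvs:
        bases[offset] = alt
    return "".join(bases)


def annotate(ref_codon, snvs):
    """Hypothetical helper: compare per-SNV and joint amino acid predictions.

    Assumes every SNV in `snvs` sits on the same haplotype.
    Returns (reference amino acid, per-SNV amino acids, joint amino acid).
    """
    separate = [CODON_TABLE[apply_snvs(ref_codon, [snv])] for snv in snvs]
    joint = CODON_TABLE[apply_snvs(ref_codon, snvs)]
    return CODON_TABLE[ref_codon], separate, joint


# CAG (Gln) with C>T at codon position 1 and A>G at codon position 2.
ref_aa, separate_aa, joint_aa = annotate("CAG", [(0, "T"), (1, "G")])
print(ref_aa, separate_aa, joint_aa)
```

    In this example, base-by-base annotation reports a stop codon (TAG) and an arginine (CGG), whereas the combined codon TGG encodes tryptophan.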

  14. PVRL1 Variants Contribute to Non-Syndromic Cleft Lip and Palate in Multiple Populations

    PubMed Central

    Avila, Joseph R.; Jezewski, Peter A.; Vieira, Alexandre R.; Orioli, Iêda M.; Castilla, Eduardo E.; Christensen, Kaare; Daack-Hirsch, Sandra; Romitti, Paul A.; Murray, Jeffrey C.

    2007-01-01

    Poliovirus Receptor Like-1 (PVRL1) is a member of the immunoglobulin super family that acts in the initiation and maintenance of epithelial adherens junctions and is mutated in the cleft lip and palate/ectodermal dysplasia 1 syndrome (CLPED1, OMIM #225000). In addition, a common non-sense mutation in PVRL1 was discovered more often among non-syndromic sporadic clefting cases in Northern Venezuela in a previous case-control study. The present work sought to ascertain the role of PVRL1 in the sporadic forms of orofacial clefting in multiple populations. Multiple rare and common variants from all three splice isoforms were initially ascertained by sequencing 92 Iowan and 86 Filipino cases and CEPH controls. Using a family-based analysis to examine these variants, the common glycine allele of the G361V coding variant was significantly overtransmitted among all orofacial clefting phenotypes (P = 0.005). This represented G361V genotyping from over 800 Iowan, Danish, and Filipino families. Among four rare amino acid changes found within the V1 and C1 domains, S112T and T131A were found adjacent to critical amino acid positions within the V1 variable domain, regions previously shown to mediate cell-to-cell and cell-to-virus adhesion. The T131A variant was not found in over 1,300 non-affected control samples although the alanine is found in other species. The serine of the S112T variant position is conserved across all known PVRL1 sequences. Together these data suggest that both rare and common mutations within PVRL1 make a minor contribution to disrupting the initiation and regulation of cell-to-cell adhesion and downstream morphogenesis of the embryonic face. PMID:17089422

  15. Germline sequence variants in TGM3 and RGS22 confer risk of basal cell carcinoma

    PubMed Central

    Stacey, Simon N.; Sulem, Patrick; Gudbjartsson, Daniel F.; Jonasdottir, Aslaug; Thorleifsson, Gudmar; Gudjonsson, Sigurjon A.; Masson, Gisli; Gudmundsson, Julius; Sigurgeirsson, Bardur; Benediktsdottir, Kristrun R.; Thorisdottir, Kristin; Ragnarsson, Rafn; Fuentelsaz, Victoria; Corredera, Cristina; Grasa, Matilde; Planelles, Dolores; Sanmartin, Onofre; Rudnai, Peter; Gurzau, Eugene; Koppova, Kvetoslava; Hemminki, Kari; Nexø, Bjørn A; Tjønneland, Anne; Overvad, Kim; Johannsdottir, Hrefna; Helgadottir, Hafdis T.; Thorsteinsdottir, Unnur; Kong, Augustine; Vogel, Ulla; Kumar, Rajiv; Nagore, Eduardo; Mayordomo, José I.; Rafnar, Thorunn; Olafsson, Jon H.; Stefansson, Kari

    2014-01-01

    To search for new sequence variants that confer risk of cutaneous basal cell carcinoma (BCC), we conducted a genome-wide association study of 38.5 million single nucleotide polymorphisms (SNPs) and small indels identified through whole-genome sequencing of 2230 Icelanders. We imputed genotypes for 4208 BCC patients and 109 408 controls using Illumina SNP chip typing data, carried out association tests and replicated the findings in independent population samples. We found new BCC susceptibility loci at TGM3 (rs214782[G], P = 5.5 × 10⁻¹⁷, OR = 1.29) and RGS22 (rs7006527[C], P = 8.7 × 10⁻¹³, OR = 0.77). TGM3 encodes transglutaminase type 3, which plays a key role in production of the cornified envelope during epidermal differentiation. PMID:24403052

  16. Rare Variant Association Test with Multiple Phenotypes

    PubMed Central

    Lee, Selyeong; Won, Sungho; Kim, Young Jin; Kim, Yongkang; Kim, Bong-Jo; Park, Taesung

    2016-01-01

    Although genome-wide association studies (GWAS) have now discovered thousands of genetic variants associated with common traits, such variants cannot explain the large degree of “missing heritability,” likely due to rare variants. The advent of next generation sequencing technology has allowed rare variant detection and association with common traits, often by investigating specific genomic regions for rare variant effects on a trait. Although multiply correlated phenotypes are often concurrently observed in GWAS, most studies analyze only single phenotypes, which may lessen statistical power. To increase power, multivariate analyses, which consider correlations between multiple phenotypes, can be used. However, few existing multi-variant analyses can identify rare variants for assessing multiple phenotypes. Here, we propose Multivariate Association Analysis using Score Statistics (MAAUSS), to identify rare variants associated with multiple phenotypes, based on the widely used Sequence Kernel Association Test (SKAT) for a single phenotype. We applied MAAUSS to Whole Exome Sequencing (WES) data from a Korean population of 1,058 subjects, to discover genes associated with multiple traits of liver function. We then assessed validation of those genes by a replication study, using an independent dataset of 3,445 individuals. Notably, we detected the gene ZNF620 among five significant genes. We then performed a simulation study to compare MAAUSS's performance with existing methods. Overall, MAAUSS successfully conserved type 1 error rates and in many cases, had a higher power than the existing methods. This study illustrates a feasible and straightforward approach for identifying rare variants correlated with multiple phenotypes, with likely relevance to missing heritability. PMID:28039885

  17. Nucleic acid sequence detection using multiplexed oligonucleotide PCR

    DOEpatents

    Nolan, John P [Santa Fe, NM; White, P Scott [Los Alamos, NM

    2006-12-26

    Methods for rapidly detecting single or multiple sequence alleles in a sample nucleic acid are described. Provided are all of the oligonucleotide pairs capable of annealing specifically to a target allele and discriminating among possible sequences thereof, and ligating to each other to form an oligonucleotide complex when a particular sequence feature is present (or, alternatively, absent) in the sample nucleic acid. The design of each oligonucleotide pair permits the subsequent high-level PCR amplification of a specific amplicon when the oligonucleotide complex is formed, but not when the oligonucleotide complex is not formed. The presence or absence of the specific amplicon is used to detect the allele. Detection of the specific amplicon may be achieved using a variety of methods well known in the art, including without limitation, oligonucleotide capture onto DNA chips or microarrays, oligonucleotide capture onto beads or microspheres, electrophoresis, and mass spectrometry. Various labels and address-capture tags may be employed in the amplicon detection step of multiplexed assays, as further described herein.

  18. Seq2Logo: a method for construction and visualization of amino acid binding motifs and sequence profiles including sequence weighting, pseudo counts and two-sided representation of amino acid enrichment and depletion

    PubMed Central

    Thomsen, Martin Christen Frølund; Nielsen, Morten

    2012-01-01

    Seq2Logo is a web-based sequence logo generator. Sequence logos are a graphical representation of the information content stored in a multiple sequence alignment (MSA) and provide a compact and highly intuitive representation of the position-specific amino acid composition of binding motifs, active sites, etc. in biological sequences. Accurate generation of sequence logos is often compromised by sequence redundancy and low number of observations. Moreover, most methods available for sequence logo generation focus on displaying the position-specific enrichment of amino acids, discarding the equally valuable information related to amino acid depletion. Seq2logo aims at resolving these issues allowing the user to include sequence weighting to correct for data redundancy, pseudo counts to correct for low number of observations and different logotype representations each capturing different aspects related to amino acid enrichment and depletion. Besides allowing input in the format of peptides and MSA, Seq2Logo accepts input as Blast sequence profiles, providing easy access for non-expert end-users to characterize and identify functionally conserved/variable amino acids in any given protein of interest. The output from the server is a sequence logo and a PSSM. Seq2Logo is available at http://www.cbs.dtu.dk/biotools/Seq2Logo (14 May 2012, date last accessed). PMID:22638583
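
    As a rough illustration of the information content that a sequence logo displays, the sketch below computes a Shannon-type height per alignment column. It is not Seq2Logo's implementation: the function names are hypothetical, every sequence is weighted equally, and the flat additive pseudo count is a simple stand-in for whatever pseudo count scheme the server applies.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_information(column, pseudo=0.0):
    """Information content (bits) of one alignment column.

    `pseudo` is a flat additive pseudo count per amino acid (an assumption
    made for this sketch). Characters outside the 20 standard residues,
    such as gaps, are ignored.
    """
    residues = [aa for aa in column if aa in AMINO_ACIDS]
    total = len(residues) + pseudo * len(AMINO_ACIDS)
    if total == 0:
        return 0.0
    counts = Counter(residues)
    entropy = 0.0
    for aa in AMINO_ACIDS:
        p = (counts.get(aa, 0) + pseudo) / total
        if p > 0:
            entropy -= p * math.log2(p)
    # Maximum entropy for 20 residues minus the observed entropy.
    return math.log2(len(AMINO_ACIDS)) - entropy


def logo_heights(peptides, pseudo=0.0):
    """Per-position information content for equal-length peptides."""
    return [column_information(col, pseudo) for col in zip(*peptides)]


print(logo_heights(["AC", "AD"]))
print(logo_heights(["AC", "AD"], pseudo=1.0))
```

    With only two observations, the unsmoothed heights overstate conservation; adding pseudo counts pulls them down, which is the low-count correction the abstract refers to.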

  19. LoRTE: Detecting transposon-induced genomic variants using low coverage PacBio long read sequences.

    PubMed

    Disdero, Eric; Filée, Jonathan

    2017-01-01

    Population genomic analysis of transposable elements has greatly benefited from recent advances of sequencing technologies. However, the short size of the reads and the propensity of transposable elements to nest in highly repeated regions of genomes limit the efficiency of bioinformatic tools when Illumina or 454 technologies are used. Fortunately, long read sequencing technologies generating read lengths that may span the entire length of full transposons are now available. However, existing TE population genomic software was not designed to handle long reads and the development of new dedicated tools is needed. LoRTE is the first tool able to use PacBio long read sequences to identify transposon deletions and insertions between a reference genome and genomes of different strains or populations. Tested against simulated and genuine Drosophila melanogaster PacBio datasets, LoRTE appears to be a reliable and broadly applicable tool to study the dynamic and evolutionary impact of transposable elements using low coverage, long read sequences. LoRTE is an efficient and accurate tool to identify structural genomic variants caused by TE insertion or deletion. LoRTE is available for download at http://www.egce.cnrs-gif.fr/?p=6422.

  20. NOTCH3 variants and risk of ischemic stroke.

    PubMed

    Ross, Owen A; Soto-Ortolaza, Alexandra I; Heckman, Michael G; Verbeeck, Christophe; Serie, Daniel J; Rayaprolu, Sruti; Rich, Stephen S; Nalls, Michael A; Singleton, Andrew; Guerreiro, Rita; Kinsella, Emma; Wszolek, Zbigniew K; Brott, Thomas G; Brown, Robert D; Worrall, Bradford B; Meschia, James F

    2013-01-01

    Mutations within the NOTCH3 gene cause cerebral autosomal dominant arteriopathy with subcortical infarcts and leukoencephalopathy (CADASIL). CADASIL mutations appear to be restricted to the first twenty-four exons, resulting in the gain or loss of a cysteine amino acid. The role of other exonic NOTCH3 variation not involving cysteine residues and mutations in exons 25-33 in ischemic stroke remains unresolved. All 33 exons of NOTCH3 were sequenced in 269 Caucasian probands from the Siblings With Ischemic Stroke Study (SWISS), a 70-center North American affected sibling pair study and 95 healthy Caucasian control subjects. Variants identified by sequencing in the SWISS probands were then tested for association with ischemic stroke using US Caucasian controls collected at the Mayo Clinic (n=654), and further assessed in a Caucasian (n=802) and African American (n=298) patient-control series collected through the Ischemic Stroke Genetics Study (ISGS). Sequencing of the 269 SWISS probands identified one (0.4%) with small vessel type stroke carrying a known CADASIL mutation (p.R558C; Exon 11). Of the 19 common NOTCH3 variants identified, the only variant significantly associated with ischemic stroke after multiple testing adjustment was p.R1560P (rs78501403; Exon 25) in the combined SWISS and ISGS Caucasian series (Odds Ratio [OR] 0.50, P=0.0022) where presence of the minor allele was protective against ischemic stroke. Although only significant prior to adjustment for multiple testing, p.T101T (rs3815188; Exon 3) was associated with an increased risk of small-vessel stroke (OR: 1.56, P=0.008) and p.P380P (rs61749020; Exon 7) was associated with decreased risk of large-vessel stroke (OR: 0.35, P=0.047) in Caucasians. No significant associations were observed in the small African American series. 
    Cysteine-affecting NOTCH3 mutations are rare in patients with typical ischemic stroke; however, our observation that common NOTCH3 variants may be associated with risk of ischemic

  1. DNA Sequence Variants in the Five Prime Untranslated Region of the Cyclooxygenase-2 Gene Are Commonly Found in Healthy Dogs and Gray Wolves.

    PubMed

    Safra, Noa; Hayward, Louisa J; Aguilar, Miriam; Sacks, Benjamin N; Westropp, Jodi L; Mohr, F Charles; Mellersh, Cathryn S; Bannasch, Danika L

    2015-01-01

    The aim of this study was to investigate the frequency of regional DNA variants upstream to the translation initiation site of the canine Cyclooxygenase-2 (Cox-2) gene in healthy dogs. Cox-2 plays a role in various disease conditions such as acute and chronic inflammation, osteoarthritis and malignancy. A role for Cox-2 DNA variants in genetic predisposition to canine renal dysplasia has been proposed and dog breeders have been encouraged to select against these DNA variants. We sequenced 272-422 bases in 152 dogs unaffected by renal dysplasia and found 19 different haplotypes including 11 genetic variants which had not been described previously. We genotyped 7 gray wolves to ascertain the wildtype variant and found that the wolves we analyzed had predominantly the second most common DNA variant found in dogs. Our results demonstrate an elevated level of regional polymorphism that appears to be a feature of healthy domesticated dogs.

  2. Sequencing of sporadic Attention-Deficit Hyperactivity Disorder (ADHD) identifies novel and potentially pathogenic de novo variants and excludes overlap with genes associated with autism spectrum disorder.

    PubMed

    Kim, Daniel Seung; Burt, Amber A; Ranchalis, Jane E; Wilmot, Beth; Smith, Joshua D; Patterson, Karynne E; Coe, Bradley P; Li, Yatong K; Bamshad, Michael J; Nikolas, Molly; Eichler, Evan E; Swanson, James M; Nigg, Joel T; Nickerson, Deborah A; Jarvik, Gail P

    2017-06-01

    Attention-Deficit Hyperactivity Disorder (ADHD) has high heritability; however, studies of common variation account for <5% of ADHD variance. Using data from affected participants without a family history of ADHD, we sought to identify de novo variants that could account for sporadic ADHD. Considering a total of 128 families, two analyses were conducted in parallel: first, in 11 unaffected parent/affected proband trios (or quads with the addition of an unaffected sibling) we completed exome sequencing. Six de novo missense variants at highly conserved bases were identified and validated from four of the 11 families: the brain-expressed genes TBC1D9, DAGLA, QARS, CSMD2, TRPM2, and WDR83. Separately, in 117 unrelated probands with sporadic ADHD, we sequenced a panel of 26 genes implicated in intellectual disability (ID) and autism spectrum disorder (ASD) to evaluate whether variation in ASD/ID-associated genes was also present in participants with ADHD. Only one putative deleterious variant (Gln600STOP) in CHD1L was identified; this was found in a single proband. Notably, no other nonsense, splice, frameshift, or highly conserved missense variants in the 26 gene panel were identified and validated. These data suggest that de novo variant analysis in families with independently adjudicated sporadic ADHD diagnosis can identify novel genes implicated in ADHD pathogenesis. Moreover, that only one of the 128 cases (0.8%, 11 exome, and 117 MIP sequenced participants) had putative deleterious variants within our data in 26 genes related to ID and ASD suggests significant independence in the genetic pathogenesis of ADHD as compared to ASD and ID phenotypes. © 2017 Wiley Periodicals, Inc.

  3. Identification of missing variants by combining multiple analytic pipelines.

    PubMed

    Ren, Yingxue; Reddy, Joseph S; Pottier, Cyril; Sarangi, Vivekananda; Tian, Shulan; Sinnwell, Jason P; McDonnell, Shannon K; Biernacka, Joanna M; Carrasquillo, Minerva M; Ross, Owen A; Ertekin-Taner, Nilüfer; Rademakers, Rosa; Hudson, Matthew; Mainzer, Liudmila Sergeevna; Asmann, Yan W

    2018-04-16

    After decades of identifying risk factors using array-based genome-wide association studies (GWAS), genetic research of complex diseases has shifted to sequencing-based rare variants discovery. This requires large sample sizes for statistical power and has brought up questions about whether the current variant calling practices are adequate for large cohorts. It is well-known that there are discrepancies between variants called by different pipelines, and that using a single pipeline always misses true variants exclusively identifiable by other pipelines. Nonetheless, it is common practice today to call variants by one pipeline due to computational cost and assume that false negative calls are a small percent of total. We analyzed 10,000 exomes from the Alzheimer's Disease Sequencing Project (ADSP) using multiple analytic pipelines consisting of different read aligners and variant calling strategies. We compared variants identified by using two aligners in 50, 100, 200, 500, 1000, and 1952 samples; and compared variants identified by adding single-sample genotyping to the default multi-sample joint genotyping in 50, 100, 500, 2000, 5000 and 10,000 samples. We found that using a single pipeline missed increasing numbers of high-quality variants correlated with sample sizes. By combining two read aligners and two variant calling strategies, we rescued 30% of pass-QC variants at sample size of 2000, and 56% at 10,000 samples. The rescued variants had higher proportions of low frequency (minor allele frequency [MAF] 1-5%) and rare (MAF < 1%) variants, which are the very type of variants of interest. In 660 Alzheimer's disease cases with earlier onset ages of ≤65, 4 out of 13 (31%) previously-published rare pathogenic and protective mutations in APP, PSEN1, and PSEN2 genes were undetected by the default one-pipeline approach but recovered by the multi-pipeline approach. Identification of the complete variant set from sequencing data is the prerequisite of genetic
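
    The bookkeeping behind a "rescued variant" count can be sketched as a set operation. This is an illustration under stated assumptions, not the authors' pipeline: the function name is hypothetical, each variant is keyed by a (chromosome, position, ref, alt) tuple, every call set is assumed to have already passed QC, and the choice of the combined set as the denominator is an assumption, since the abstract does not define it.

```python
def rescue_summary(default_calls, *other_calls):
    """Hypothetical helper: combine call sets and count rescued variants.

    default_calls: pass-QC variants from the default single pipeline.
    other_calls:   pass-QC variants from each additional pipeline.
    Returns (combined set, rescued set, rescued fraction of combined set).
    """
    default_set = set(default_calls)
    combined = default_set.union(*other_calls)
    rescued = combined - default_set  # found only by the added pipelines
    fraction = len(rescued) / len(combined) if combined else 0.0
    return combined, rescued, fraction


# Toy call sets keyed by (chromosome, position, ref, alt).
default_pipeline = {("1", 100, "A", "G"), ("1", 200, "C", "T")}
second_aligner = {("1", 200, "C", "T"), ("2", 50, "G", "A")}

combined, rescued, fraction = rescue_summary(default_pipeline, second_aligner)
print(sorted(rescued), round(fraction, 3))
```

    Keying on the full tuple treats differently represented indels as distinct variants, so real call sets would normally be normalized before comparison.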

  4. Expression of an estrogen-regulated variant transcript of the peroxisomal branched chain fatty acid oxidase ACOX2 in breast carcinomas.

    PubMed

    Bjørklund, Sunniva Stordal; Kristensen, Vessela N; Seiler, Michael; Kumar, Surendra; Alnæs, Grethe I Grenaker; Ming, Yao; Kerrigan, John; Naume, Bjørn; Sachidanandam, Ravi; Bhanot, Gyan; Børresen-Dale, Anne-Lise; Ganesan, Shridar

    2015-07-17

    Alternate transcripts from a single gene locus greatly enhance the combinatorial flexibility of the human transcriptome. Different patterns of exon usage have been observed when comparing normal tissue to cancers, suggesting that variant transcripts may play a role in the tumor phenotype. Ribonucleic acid-sequencing (RNA-seq) data from breast cancer samples were used to identify an intronic start variant transcript of Acyl-CoA oxidase 2, ACOX2 (ACOX2-i9). Difference in expression between Estrogen Receptor (ER) positive and ER negative patients was assessed by the Wilcoxon rank sum test, and the findings were validated in The Cancer Genome Atlas (TCGA) breast cancer dataset (BRCA). ACOX2-i9 expression was also assessed in cell lines using both quantitative reverse transcriptase-polymerase chain reaction (qRT-PCR) and Western blot analysis. Knock down by short hairpin RNA (shRNA) and colony formation assays were used to determine whether ACOX2-i9 expression would influence cellular fitness. The effect of ACOX2-i9 expression on patient survival was assessed by the Kaplan-Meier survival function, and association to clinical parameters was analyzed using a Fisher exact test. The expression and translation of ACOX2-i9 into a 25 kDa protein was demonstrated in HepG2 cells as well as in several breast cancer cell lines. shRNA knock down of the ACOX2-i9 variant resulted in decreased cell viability of T47D and MDA-MB 436 cells. Moreover, expression of ACOX2-i9 was shown to be estrogen regulated, being induced by propyl pyrazoletriol and inhibited by tamoxifen and fulvestrant in ER+ T47D and MCF-7 cells, but not in the ER- MDA-MB 436 cell line. This variant transcript showed expression predominantly in ER-positive breast tumors as assessed in our initial set of 53 breast cancers and further validated in 87 tumor/normal pairs from the TCGA breast cancer dataset, and expression was associated with better outcome in ER positive patients. ACOX2-i9 is specifically enriched in ER+ breast
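
    The abstract above reports that association with clinical parameters was analyzed using a Fisher exact test. As a minimal sketch of how such a test works on a 2x2 table, the following standard-library Python computes a two-sided P value from the hypergeometric distribution. The function name and the counts are hypothetical illustrations and are not taken from the study.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    p_obs = prob(a)
    # Sum every table that is no more probable than the observed one
    # (small tolerance guards against floating-point ties).
    return sum(
        p for p in (prob(x) for x in range(lo, hi + 1))
        if p <= p_obs * (1 + 1e-9)
    )


# Hypothetical counts, NOT from the study:
# rows = variant expressed / not expressed, columns = ER+ / ER-.
p_value = fisher_exact_two_sided(8, 2, 1, 5)
print(f"two-sided P = {p_value:.4f}")
```

    Summing the tables that are at most as probable as the observed one is one common definition of the two-sided P value; other definitions exist and can give slightly different results.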

  5. Determination of a novel integron-located variant (blaOXA-320) of Class D β-lactamase in Proteus mirabilis.

    PubMed

    Cicek, Aysegul Copur; Duzgun, Azer Ozad; Saral, Aysegul; Sandalli, Cemal

    2014-10-01

    Proteus mirabilis (P. mirabilis) is one of the Gram-negative pathogens encountered in clinical specimens. A clinical isolate (TRP41) of P. mirabilis was isolated from a Turkish patient in Turkey. The isolate was identified using the API 32GN system and 16S rRNA gene sequencing and was found to be resistant to ampicillin/sulbactam, piperacillin, tetracycline, and trimethoprim/sulfamethoxazole. This isolate harbored a Class 1 integron gene cassette, and its DNA sequence analysis revealed a novel blaOXA variant exhibiting one amino acid substitution (Asn266Ile) from blaOXA-1. This new variant of OXA was located on a Class 1 integron together with the aadA1 gene encoding aminoglycoside-modifying enzymes. According to sequence records, the new variant was named blaOXA-320. The cassette array and size of the integron were found to be blaOXA-320-aadA1 and 2086 bp, respectively. The blaOXA-320 gene is not transferable according to the conjugation experiment. In this study, we report the first identification of the blaOXA-320-aadA1 gene cassette, a novel variant of Class D β-lactamase, in P. mirabilis from Turkey. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  6. Kangaroo IGF-II is structurally and functionally similar to the human [Ser29]-IGF-II variant.

    PubMed

    Yandell, C A; Francis, G L; Wheldrake, J F; Upton, Z

    1999-06-01

    Kangaroo IGF-II has been purified from western grey kangaroo (Macropus fuliginosus) serum and characterised in a number of in vitro assays. In addition, the complete cDNA sequence of mature IGF-II has been obtained by reverse-transcription polymerase chain reaction. Comparison of the kangaroo IGF-II cDNA sequence with known IGF-II sequences from other species revealed that it is very similar to the human variant, [Ser29]-hIGF-II. Both the variant and kangaroo IGF-II contain an insert of nine nucleotides that encode the amino acids Leu-Pro-Gly at the junction of the B and C domains of the mature protein. The deduced kangaroo IGF-II protein sequence also contains three other amino acid changes that are not observed in human IGF-II. These amino acid differences share similarities with the changes described in many of the IGF-IIs reported for non-mammalian species. Characterisation of human IGF-II, kangaroo IGF-II, chicken IGF-II and [Ser29]-hIGF-II in a number of in vitro assays revealed that all four proteins are functionally very similar. No significant differences were observed in the ability of the IGF-IIs to bind to the bovine IGF-II/cation-independent mannose 6-phosphate receptor or to stimulate protein synthesis in rat L6 myoblasts. However, differences were observed in their abilities to bind to IGF-binding proteins (IGFBPs) present in human serum. Kangaroo, chicken and [Ser29]-hIGF-II had lower apparent affinities for human IGFBPs than did human IGF-II. Thus, it appears that the major circulating form of IGF-II in the kangaroo and a minor form of IGF-II found in human serum are structurally and functionally very similar. This suggests that the splice site that generates both the variant and major form of human IGF-II must have evolved after the divergence of marsupials from placental mammals.

  7. Identification of rare genetic variants in Italian patients with dementia by targeted gene sequencing.

    PubMed

    Bartoletti-Stella, Anna; Baiardi, Simone; Stanzani-Maserati, Michelangelo; Piras, Silvia; Caffarra, Paolo; Raggi, Alberto; Pantieri, Roberta; Baldassari, Sara; Caporali, Leonardo; Abu-Rumeileh, Samir; Linarello, Simona; Liguori, Rocco; Parchi, Piero; Capellari, Sabina

    2018-06-01

    Genetics is intricately involved in the etiology of neurodegenerative dementias. The incidence of monogenic dementia among all neurodegenerative forms is unknown due to the lack of systematic studies and of patient/clinician access to extensive diagnostic procedures. In this study, we conducted targeted sequencing in 246 clinically heterogeneous patients, mainly with early-onset and/or familial neurodegenerative dementia, using a custom-designed next-generation sequencing panel covering 27 genes known to harbor mutations that can cause different types of dementia, in addition to the detection of C9orf72 repeat expansions. Forty-nine patients (19.9%) carried known pathogenic or novel, likely pathogenic, variants, involving both common (presenilin 1, presenilin 2, C9orf72, and granulin) and rare (optineurin, serpin family I member 1 and protein kinase cyclic adenosine monophosphate (cAMP)-dependent type I regulatory subunit beta) dementia-associated genes. Our results support the use of extended next-generation sequencing panels as a quick, accurate, and cost-effective method for diagnosis in clinical practice. This approach could have a significant impact on the proportion of tested patients, especially among those with an early disease onset. Copyright © 2018 Elsevier Inc. All rights reserved.

  8. Occurrence of novel GII.17 and GII.21 norovirus variants in the coastal environment of South Korea in 2015

    PubMed Central

    Koo, Eung Seo; Kim, Man Su; Choi, Yong Seon; Park, Kwon-Sam; Jeong, Yong Seok

    2017-01-01

    Human norovirus (HNoV), a positive-sense RNA virus, is the main causative agent of acute viral gastroenteritis. Multiple pandemic variants of the genogroup II genotype 4 (GII.4) of NoV have attracted great attention from researchers worldwide. However, novel variants of GII.17 have been overtaking those pandemic variants in some areas of East Asia. To investigate the environmental occurrence of GII in South Korea, we collected water samples from coastal streams and a neighboring waste water treatment plant in North Jeolla province (in March, July, and December of 2015). Based on capsid gene region C analysis, four different genotypes (GII.4, GII.13, GII.17, and GII.21) were detected, with much higher prevalence of GII.17 than of GII.4. Additional sequence analyses of the ORF1-ORF2 junction and ORF2 from the water samples revealed that the GII.17 sequences in this study were closely related to the novel strains of GII.P17-GII.17, the main causative variants of the 2014–2015 HNoV outbreak in China and Japan. In addition, the GII.P21-GII.21 variants were identified in this study and they had new amino acid sequence variations in the blockade epitopes of the P2 domain. From these results, we present two important findings: 1) the novel GII.P17-GII.17 variants appeared to be predominant in the study area, and 2) new GII.21 variants have emerged in South Korea. PMID:28199388

  9. Gene-diet interaction of a common FADS1 variant with marine polyunsaturated fatty acids for fatty acid composition in plasma and erythrocytes among men.

    PubMed

    Takkunen, Markus J; de Mello, Vanessa D; Schwab, Ursula S; Kuusisto, Johanna; Vaittinen, Maija; Ågren, Jyrki J; Laakso, Markku; Pihlajamäki, Jussi; Uusitupa, Matti I J

    2016-02-01

    Limited information exists on how the relationship between dietary intake of fat and fatty acids in erythrocytes and plasma is modulated by polymorphisms in the FADS gene cluster. We examined gene-diet interaction of total marine PUFA intake with a known gene encoding Δ-5 desaturase enzyme (FADS1) variant (rs174550) for fatty acids in erythrocyte membranes and plasma phospholipids (PL), cholesteryl esters (CE), and triglycerides (TG). In this cross-sectional study, fatty acid compositions were measured using GC, and total intake of polyunsaturated fat from fish and fish oil was estimated using a food frequency questionnaire in a subsample (n = 962) of the Metabolic Syndrome in Men Study. We found nominally significant gene-diet interactions for eicosapentaenoic acid (EPA, 20:5n-3) in erythrocytes (p_interaction = 0.032) and for EPA in plasma PL (p_interaction = 0.062), CE (p_interaction = 0.035), and TG (p_interaction = 0.035), as well as for docosapentaenoic acid (22:5n-3) in PL (p_interaction = 0.007). After excluding omega-3 supplement users, we found a significant gene-diet interaction for EPA in erythrocytes (p_interaction < 0.003). In a separate cohort of the Kuopio Obesity Surgery Study, the same locus was strongly associated with hepatic mRNA expression of FADS1 (p = 1.5 × 10^-10). FADS1 variants may modulate the relationship between marine fatty acid intake and circulating levels of long-chain omega-3 fatty acids. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  10. Sequence variants of Toll-like receptor 4 and susceptibility to prostate cancer.

    PubMed

    Chen, Yen-Ching; Giovannucci, Edward; Lazarus, Ross; Kraft, Peter; Ketkar, Shamika; Hunter, David J

    2005-12-15

    Chronic inflammation has been hypothesized to be a risk factor for prostate cancer. The Toll-like receptor 4 (TLR4) presents the bacterial lipopolysaccharide (LPS), which interacts with ligand-binding protein and CD14 (LPS receptor) and activates expression of inflammatory genes through nuclear factor-kappaB and mitogen-activated protein kinase signaling. A previous case-control study found a modest association of a polymorphism in the TLR4 gene [11381G/C, GG versus GC/CC: odds ratio (OR), 1.26] with risk of prostate cancer. We assessed if sequence variants of TLR4 were associated with the risk of prostate cancer. In a nested case-control design within the Health Professionals Follow-up Study, we identified 700 participants with prostate cancer diagnosed after they had provided a blood specimen in 1993 and before January 2000. Controls were 700 age-matched men without prostate cancer who had had a prostate-specific antigen test after providing a blood specimen. We genotyped 16 common (>5%) single nucleotide polymorphisms (SNP) discovered in a resequencing study spanning TLR4 to test for association between sequence variation in TLR4 and prostate cancer. Homozygosity for the variant alleles of eight SNPs was associated with a statistically significantly lower risk of prostate cancer (TLR4_1893, TLR4_2032, TLR4_2437, TLR4_7764, TLR4_11912, TLR4_16649, TLR4_17050, and TLR4_17923), but the TLR4_15844 polymorphism corresponding to 11381G/C was not associated with prostate cancer (GG versus CG/CC: OR, 1.01; 95% confidence interval, 0.79-1.29). Six common haplotypes (cumulative frequency, 81%) were observed; the global test for association between haplotypes and prostate cancer was statistically significant (χ² = 14.8 on 6 degrees of freedom; P = 0.02). Two common haplotypes were statistically significantly associated with altered risk of prostate cancer. Inherited polymorphisms of the innate immune gene TLR4 are associated with risk of prostate cancer.
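
    The global haplotype test in the abstract above is reported as a chi-square statistic of 14.8 on 6 degrees of freedom with P = 0.02. As a worked check of that arithmetic, the sketch below uses the closed-form upper-tail probability that holds for even degrees of freedom. The function name is our own, and this is only a check of the reported P value, not a reconstruction of the authors' analysis.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with even df.

    For df = 2k the survival function has the closed form
    exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only holds for positive even df")
    half = x / 2.0
    terms = sum(half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * terms


# Statistic and degrees of freedom as reported in the abstract.
p_value = chi2_sf_even_df(14.8, 6)
print(f"P = {p_value:.3f}")
```

    The result is approximately 0.022, which rounds to the reported P = 0.02.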

  11. Identification of seven novel loci associated with amino acid levels using single-variant and gene-based tests in 8545 Finnish men from the METSIM study.

    PubMed

    Teslovich, Tanya M; Kim, Daniel Seung; Yin, Xianyong; Stancáková, Alena; Jackson, Anne U; Wielscher, Matthias; Naj, Adam; Perry, John R B; Huyghe, Jeroen R; Stringham, Heather M; Davis, James P; Raulerson, Chelsea K; Welch, Ryan P; Fuchsberger, Christian; Locke, Adam E; Sim, Xueling; Chines, Peter S; Narisu, Narisu; Kangas, Antti J; Soininen, Pasi; Ala-Korpela, Mika; Gudnason, Vilmundur; Musani, Solomon K; Jarvelin, Marjo-Riitta; Schellenberg, Gerard D; Speliotes, Elizabeth K; Kuusisto, Johanna; Collins, Francis S; Boehnke, Michael; Laakso, Markku; Mohlke, Karen L

    2018-05-01

    Comprehensive metabolite profiling captures many highly heritable traits, including amino acid levels, which are potentially sensitive biomarkers for disease pathogenesis. To better understand the contribution of genetic variation to amino acid levels, we performed single variant and gene-based tests of association between nine serum amino acids (alanine, glutamine, glycine, histidine, isoleucine, leucine, phenylalanine, tyrosine, and valine) and 16.6 million genotyped and imputed variants in 8545 non-diabetic Finnish men from the METabolic Syndrome In Men (METSIM) study with replication in Northern Finland Birth Cohort (NFBC1966). We identified five novel loci associated with amino acid levels (P < 5×10^-8): LOC157273/PPP1R3B with glycine (rs9987289, P = 2.3×10^-26); ZFHX3 (chr16:73326579, minor allele frequency (MAF) = 0.42%, P = 3.6×10^-9), LIPC (rs10468017, P = 1.5×10^-8), and WWOX (rs9937914, P = 3.8×10^-8) with alanine; and TRIB1 with tyrosine (rs28601761, P = 8×10^-9). Gene-based tests identified two novel genes harboring missense variants of MAF <1% that show aggregate association with amino acid levels: PYCR1 with glycine (P_gene = 1.5×10^-6) and BCAT2 with valine (P_gene = 7.4×10^-7); neither gene was implicated by single variant association tests. These findings are among the first applications of gene-based tests to identify new loci for amino acid levels. In addition to the seven novel gene associations, we identified five independent signals at established amino acid loci, including two rare variant signals at GLDC (rs138640017, MAF = 0.95%, P_conditional = 5.8×10^-40) with glycine levels and HAL (rs141635447, MAF = 0.46%, P_conditional = 9.4×10^-11) with histidine levels. Examination of all single variant association results in our data revealed a strong inverse relationship between effect size and MAF (P_trend < 0.001). These novel signals provide further insight into the molecular mechanisms of amino acid metabolism and potentially, their perturbations in

  12. Characterization of Novel Missense Variants of SERPINA1 Gene Causing Alpha-1 Antitrypsin Deficiency.

    PubMed

    Matamala, Nerea; Lara, Beatriz; Gomez-Mariano, Gema; Martínez, Selene; Retana, Diana; Fernandez, Taiomara; Silvestre, Ramona Angeles; Belmonte, Irene; Rodriguez-Frias, Francisco; Vilar, Marçal; Sáez, Raquel; Iturbe, Igor; Castillo, Silvia; Molina-Molina, María; Texido, Anna; Tirado-Conde, Gema; Lopez-Campos, Jose Luis; Posada, Manuel; Blanco, Ignacio; Janciauskiene, Sabina; Martinez-Delgado, Beatriz

    2018-06-01

    The SERPINA1 gene is highly polymorphic, with more than 100 variants described in databases. SERPINA1 encodes the alpha-1 antitrypsin (AAT) protein, and severe deficiency of AAT is a major contributor to pulmonary emphysema and liver diseases. In Spanish patients with AAT deficiency, we identified seven new variants of the SERPINA1 gene involving amino acid substitutions in different exons: PiSDonosti (S+Ser14Phe), PiTijarafe (Ile50Asn), PiSevilla (Ala58Asp), PiCadiz (Glu151Lys), PiTarragona (Phe227Cys), PiPuerto Real (Thr249Ala), and PiValencia (Lys328Glu). We examined the characteristics of these variants and the putative association with the disease. Mutant proteins were overexpressed in HEK293T cells, and AAT expression, polymerization, degradation, and secretion, as well as antielastase activity, were analyzed by periodic acid-Schiff staining, Western blotting, pulse-chase, and elastase inhibition assays. When overexpressed, S+S14F, I50N, A58D, F227C, and T249A variants formed intracellular polymers and did not secrete AAT protein. Both the E151K and K328E variants secreted AAT protein and did not form polymers, although K328E showed intracellular retention and reduced antielastase activity. We conclude that deficient variants may be more frequent than previously thought and that their discovery is possible only by the complete sequencing of the gene and subsequent functional characterization. Better knowledge of SERPINA1 variants would improve diagnosis and management of individuals with AAT deficiency.

  13. WEB-server for search of a periodicity in amino acid and nucleotide sequences

    NASA Astrophysics Data System (ADS)

    Frenkel, F. E.; Skryabin, K. G.; Korotkov, E. V.

    2017-12-01

    A new web server (http://victoria.biengi.ac.ru/splinter/login.php) was designed and developed to search for periodicity in nucleotide and amino acid sequences. The web server operation is based upon a new mathematical method of searching for multiple alignments, which is founded on the optimization of position weight matrices, as well as on two-dimensional dynamic programming. This approach allows the construction of multiple alignments of indistinctly similar amino acid and nucleotide sequences that have accumulated more than 1.5 substitutions per amino acid or nucleotide, without performing pairwise sequence comparisons. The article examines the principles of the web server operation and two examples of studying amino acid and nucleotide sequences, as well as information that could be obtained using the web server.

  14. Early Strains of Multidrug-Resistant Salmonella enterica Serovar Kentucky Sequence Type 198 from Southeast Asia Harbor Salmonella Genomic Island 1-J Variants with a Novel Insertion Sequence

    PubMed Central

    Le Hello, Simon; Weill, François-Xavier; Guibert, Véronique; Praud, Karine; Cloeckaert, Axel

    2012-01-01

    Salmonella genomic island 1 (SGI1) is a 43-kb integrative mobilizable element that harbors a great diversity of multidrug resistance gene clusters described in numerous Salmonella enterica serovars and also in Proteus mirabilis. The majority of SGI1 variants contain an In104-derivative complex class 1 integron inserted between resolvase gene res and open reading frame (ORF) S044 in SGI1. Recently, the international spread of ciprofloxacin-resistant S. enterica serovar Kentucky sequence type 198 (ST198) containing SGI1-K variants has been reported. A retrospective study was undertaken to characterize ST198 S. Kentucky strains isolated before the spread of the epidemic ST198-SGI1-K population in Africa and the Middle East. Here, we characterized 12 ST198 S. Kentucky strains isolated between 1969 and 1999, mainly from humans returning from Southeast Asia (n = 10 strains) or Israel (n = 1 strain) or from meat in Egypt (n = 1 strain). None of these ST198 S. Kentucky strains belonged to the XbaI pulsotype X1 associated with the African epidemic clone; all belonged to pulsotype X2. SGI1-J subgroup variants containing different complex integrons with a partial transposition module and inserted within ORF S023 of SGI1 were detected in six strains. The SGI1-J4 variant containing a partially deleted class 1 integron and thus showing a narrow resistance phenotype to sulfonamides was identified in two epidemiologically unrelated strains from Indonesia. The four remaining strains harbored a novel SGI1-J variant, named SGI1-J6, which contained aadA2, floR2, tetR(G)-tetA(G), and sul1 resistance genes within its complex integron. Moreover, in all these S. Kentucky isolates, a novel insertion sequence related to the IS630 family and named ISSen5 was found inserted upstream of the SGI1 complex integron in ORF S023. Thus, two subpopulations of S. Kentucky ST198 independently and exclusively acquired the SGI1 during the 1980s and 1990s. Unlike the ST198-X1 African epidemic subpopulation, the

  15. Early strains of multidrug-resistant Salmonella enterica serovar Kentucky sequence type 198 from Southeast Asia harbor Salmonella genomic island 1-J variants with a novel insertion sequence.

    PubMed

    Le Hello, Simon; Weill, François-Xavier; Guibert, Véronique; Praud, Karine; Cloeckaert, Axel; Doublet, Benoît

    2012-10-01

    Salmonella genomic island 1 (SGI1) is a 43-kb integrative mobilizable element that harbors a great diversity of multidrug resistance gene clusters described in numerous Salmonella enterica serovars and also in Proteus mirabilis. The majority of SGI1 variants contain an In104-derivative complex class 1 integron inserted between resolvase gene res and open reading frame (ORF) S044 in SGI1. Recently, the international spread of ciprofloxacin-resistant S. enterica serovar Kentucky sequence type 198 (ST198) containing SGI1-K variants has been reported. A retrospective study was undertaken to characterize ST198 S. Kentucky strains isolated before the spread of the epidemic ST198-SGI1-K population in Africa and the Middle East. Here, we characterized 12 ST198 S. Kentucky strains isolated between 1969 and 1999, mainly from humans returning from Southeast Asia (n = 10 strains) or Israel (n = 1 strain) or from meat in Egypt (n = 1 strain). None of these ST198 S. Kentucky strains belonged to the XbaI pulsotype X1 associated with the African epidemic clone; all belonged to pulsotype X2. SGI1-J subgroup variants containing different complex integrons with a partial transposition module and inserted within ORF S023 of SGI1 were detected in six strains. The SGI1-J4 variant containing a partially deleted class 1 integron and thus showing a narrow resistance phenotype to sulfonamides was identified in two epidemiologically unrelated strains from Indonesia. The four remaining strains harbored a novel SGI1-J variant, named SGI1-J6, which contained aadA2, floR2, tetR(G)-tetA(G), and sul1 resistance genes within its complex integron. Moreover, in all these S. Kentucky isolates, a novel insertion sequence related to the IS630 family and named ISSen5 was found inserted upstream of the SGI1 complex integron in ORF S023. Thus, two subpopulations of S. Kentucky ST198 independently and exclusively acquired the SGI1 during the 1980s and 1990s. Unlike the ST198-X1 African epidemic subpopulation, the

  16. Detection and isolation of nucleic acid sequences using a bifunctional hybridization probe

    DOEpatents

    Lucas, Joe N.; Straume, Tore; Bogen, Kenneth T.

    2000-01-01

    A method for detecting and isolating a target sequence in a sample of nucleic acids is provided using a bifunctional hybridization probe capable of hybridizing to the target sequence that includes a detectable marker and a first complexing agent capable of forming a binding pair with a second complexing agent. A kit is also provided for detecting a target sequence in a sample of nucleic acids using a bifunctional hybridization probe according to this method.

  17. International interlaboratory study comparing single organism 16S rRNA gene sequencing data: Beyond consensus sequence comparisons

    PubMed Central

    Olson, Nathan D.; Lund, Steven P.; Zook, Justin M.; Rojas-Cornejo, Fabiola; Beck, Brian; Foy, Carole; Huggett, Jim; Whale, Alexandra S.; Sui, Zhiwei; Baoutina, Anna; Dobeson, Michael; Partis, Lina; Morrow, Jayne B.

    2015-01-01

    This study presents the results from an interlaboratory sequencing study for which we developed a novel high-resolution method for comparing data from different sequencing platforms for a multi-copy, paralogous gene. The combination of PCR amplification and 16S ribosomal RNA gene (16S rRNA) sequencing has revolutionized bacteriology by enabling rapid identification, frequently without the need for culture. To assess variability between laboratories in sequencing 16S rRNA, six laboratories sequenced the gene encoding the 16S rRNA from Escherichia coli O157:H7 strain EDL933 and Listeria monocytogenes serovar 4b strain NCTC11994. Participants performed sequencing methods and protocols available in their laboratories: Sanger sequencing, Roche 454 pyrosequencing®, or Ion Torrent PGM®. The sequencing data were evaluated on three levels: (1) identity of biologically conserved position, (2) ratio of 16S rRNA gene copies featuring identified variants, and (3) the collection of variant combinations in a set of 16S rRNA gene copies. The same set of biologically conserved positions was identified for each sequencing method. Analytical methods using Bayesian and maximum likelihood statistics were developed to estimate variant copy ratios, which describe the ratio of nucleotides at each identified biologically variable position, as well as the likely set of variant combinations present in 16S rRNA gene copies. Our results indicate that estimated variant copy ratios at biologically variable positions were only reproducible for high throughput sequencing methods. Furthermore, the likely variant combination set was only reproducible with increased sequencing depth and longer read lengths. We also demonstrate novel methods for evaluating variable positions when comparing multi-copy gene sequence data from multiple laboratories generated using multiple sequencing technologies. PMID:27077030
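
    The abstract above mentions maximum likelihood estimation of variant copy ratios at biologically variable positions. As a rough sketch of that general idea, and not of the authors' actual method, the Python below picks the number of gene copies carrying a variant base that maximizes a binomial likelihood of the observed read counts. The symmetric error model, the error rate, the copy number of 7, and the read counts are all assumptions made for illustration.

```python
import math


def log_likelihood(variant_reads: int, total_reads: int,
                   copy_ratio: float, error_rate: float) -> float:
    """Binomial log-likelihood (constant term dropped) of the read counts."""
    # A read shows the variant base if it comes from a variant copy and is
    # read correctly, or from another copy and is misread (assumed model).
    p = copy_ratio * (1 - error_rate) + (1 - copy_ratio) * error_rate
    return (variant_reads * math.log(p)
            + (total_reads - variant_reads) * math.log(1 - p))


def ml_copy_count(variant_reads: int, total_reads: int,
                  n_copies: int, error_rate: float = 0.01) -> int:
    """Number of copies (0..n_copies) carrying the variant with highest likelihood."""
    if not 0 < error_rate < 0.5:
        raise ValueError("error_rate must be in (0, 0.5)")
    return max(
        range(n_copies + 1),
        key=lambda j: log_likelihood(variant_reads, total_reads,
                                     j / n_copies, error_rate),
    )


# Hypothetical example: 290 of 1000 reads show the variant base at one
# position of a gene assumed to be present in 7 copies.
best = ml_copy_count(290, 1000, n_copies=7)
print(f"most likely: {best} of 7 copies carry the variant")
```

    Restricting the candidates to j/N reflects that a multi-copy gene can only carry the variant in a whole number of copies; deeper sequencing sharpens the likelihood around the true ratio, which is in line with the reproducibility finding reported in the abstract.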

  18. The rs2231142 variant of the ABCG2 gene is associated with uric acid levels and gout among Japanese people.

    PubMed

    Yamagishi, Kazumasa; Tanigawa, Takeshi; Kitamura, Akihiko; Köttgen, Anna; Folsom, Aaron R; Iso, Hiroyasu

    2010-08-01

    Recent genome-wide association and functional studies have shown that the ABCG2 gene encodes a urate transporter, and a common causal ABCG2 variant, rs2231142, leads to elevated uric acid levels and prevalent gout among Whites and Blacks. We examined whether this finding is observed in a Japanese population, since Asians have a high reported prevalence of the T-risk allele. A total of 3923 Japanese people from the Circulatory Risk in Communities Study aged 40-90 years were genotyped for rs2231142. Associations of the rs2231142 variant with serum uric acid levels and prevalence of gout and hyperuricaemia were examined. The frequency of the T-risk allele was 31% in this Japanese sample. Multivariable adjusted mean uric acid levels were 7-9 micromol/l higher for TG and TT than GG carriers (P-additive = 0.0006). The multivariable-adjusted odds ratio (OR) of prevalent gout was 1.37 (95% CI 0.68, 2.76) for TG and 4.37 (95% CI 1.98, 9.62) for TT compared with the GG carriers (P-additive = 0.001). When evaluating the combined outcome of hyperuricaemia and gout, the respective ORs were 1.40 (95% CI 1.04, 1.87) for TG and 1.88 (95% CI 1.23, 2.89) for TT carriers. The population attributable risk was 29% for gout and 19% for gout and/or hyperuricaemia. The association of the causal ABCG2 rs2231142 variant with uric acid levels and gout was confirmed in a sample of Japanese ancestry. Our study emphasizes the importance of this common causal variant in a population with a high risk allele frequency, especially as more Japanese adopt a Western lifestyle with a concomitant increase in mean serum uric acid levels.

  19. Sex is a moderator of the association between NOS1AP sequence variants and QTc in two long QT syndrome founder populations: a pedigree-based measured genotype association analysis.

    PubMed

    Winbo, Annika; Stattin, Eva-Lena; Westin, Ida Maria; Norberg, Anna; Persson, Johan; Jensen, Steen M; Rydberg, Annika

    2017-07-18

    Sequence variants in the NOS1AP gene have repeatedly been reported to influence QTc, albeit with moderate effect sizes. In the long QT syndrome (LQTS), this may contribute to the substantial QTc variance seen among carriers of identical pathogenic sequence variants. Here we assess three non-coding NOS1AP sequence variants, chosen for their previously reported strong association with QTc in normal and LQTS populations, for association with QTc in two Swedish LQT1 founder populations. This study included 312 individuals (58% females) from two LQT1 founder populations, of whom 227 were genotype positive, segregating either Y111C (n = 148) or R518* (n = 79) pathogenic sequence variants in the KCNQ1 gene, and 85 were genotype negative. All were genotyped for NOS1AP sequence variants rs12143842, rs16847548 and rs4657139, and tested for association with QTc length (effect size presented as mean difference between derived and wildtype, in ms), using a pedigree-based measured genotype association analysis. Mean QTc was obtained by repeated manual measurement (preferably in lead II) by one observer using coded 50 mm/s standard 12-lead ECGs. A substantial variance in mean QTc was seen in genotype positives 476 ± 36 ms (Y111C 483 ± 34 ms; R518* 462 ± 34 ms) and genotype negatives 433 ± 24 ms. Female sex was significantly associated with QTc prolongation in all genotype groups (p < 0.001). In a multivariable analysis including the entire study population and adjusted for KCNQ1 genotype, sex and age, NOS1AP sequence variants rs12143842 and rs16847548 (but not rs4657139) were significantly associated with QT prolongation, +18 ms (p = 0.0007) and +17 ms (p = 0.006), respectively. Significant sex-interactions were detected for both sequence variants (interaction term r = 0.892, p < 0.001 and r = 0.944, p < 0.001, respectively). Notably, across the genotype groups, when stratified by sex neither rs12143842 nor rs16847548 was significantly associated with

  20. Chromosome specific repetitive DNA sequences

    DOEpatents

    Moyzis, Robert K.; Meyne, Julianne

    1991-01-01

    A method is provided for determining specific nucleotide sequences useful in forming a probe which can identify specific chromosomes, preferably through in situ hybridization within the cell itself. In one embodiment, chromosome preferential nucleotide sequences are first determined from a library of recombinant DNA clones having families of repetitive sequences. Library clones are identified with a low homology with a sequence of repetitive DNA families to which the first clones respectively belong and variant sequences are then identified by selecting clones having a pattern of hybridization with genomic DNA dissimilar to the hybridization pattern shown by the respective families. In another embodiment, variant sequences are selected from a sequence of a known repetitive DNA family. The selected variant sequence is classified as chromosome specific, chromosome preferential, or chromosome nonspecific. Sequences which are classified as chromosome preferential are further sequenced and regions are identified having a low homology with other regions of the chromosome preferential sequence or with known sequences of other family members. This invention is the result of a contract with the Department of Energy (Contract No. W-7405-ENG-36).

  1. Association analysis for udder index and milking speed with imputed whole-genome sequence variants in Nordic Holstein cattle.

    PubMed

    Jardim, Júlia Gazzoni; Guldbrandtsen, Bernt; Lund, Mogens Sandø; Sahana, Goutam

    2018-03-01

    Genome-wide association testing facilitates the identification of genetic variants associated with complex traits. Mapping genes that promote genetic resistance to mastitis could reduce the cost of antibiotic use and enhance animal welfare and milk production by improving outcomes of breeding for udder health. Using imputed whole-genome sequence variants, we carried out association studies for 2 traits related to udder health, udder index, and milking speed in Nordic Holstein cattle. A total of 4,921 bulls genotyped with the BovineSNP50 BeadChip array were imputed to high-density genotypes (Illumina BovineHD BeadChip, Illumina, San Diego, CA) and, subsequently, to whole-genome sequence variants. An association analysis was carried out using a linear mixed model. Phenotypes used in the association analyses were deregressed breeding values. Multitrait meta-analysis was carried out for these 2 traits. We identified 10 and 8 chromosomes harboring markers that were significantly associated with udder index and milking speed, respectively. Strongest association signals were observed on chromosome 20 for udder index and chromosome 19 for milking speed. Multitrait meta-analysis identified 13 chromosomes harboring associated markers for the combination of udder index and milking speed. The associated region on chromosome 20 overlapped with earlier reported quantitative trait loci for similar traits in other cattle populations. Moreover, this region was located close to the FYB gene, which is involved in platelet activation and controls IL-2 expression; FYB is a strong candidate gene for udder health and worthy of further investigation. Copyright © 2018 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.

  2. Soil amino acid composition across a boreal forest successional sequence

    Treesearch

    Nancy R. Werdin-Pfisterer; Knut Kielland; Richard D. Boone

    2009-01-01

    Soil amino acids are important sources of organic nitrogen for plant nutrition, yet few studies have examined which amino acids are most prevalent in the soil. In this study, we examined the composition, concentration, and seasonal patterns of soil amino acids across a primary successional sequence encompassing a natural gradient of plant productivity and soil...

  3. elPrep: High-Performance Preparation of Sequence Alignment/Map Files for Variant Calling

    PubMed Central

    Decap, Dries; Fostier, Jan; Reumers, Joke

    2015-01-01

    elPrep is a high-performance tool for preparing sequence alignment/map files for variant calling in sequencing pipelines. It can be used as a replacement for SAMtools and Picard for preparation steps such as filtering, sorting, marking duplicates, reordering contigs, and so on, while producing identical results. What sets elPrep apart is its software architecture that allows executing preparation pipelines by making only a single pass through the data, no matter how many preparation steps are used in the pipeline. elPrep is designed as a multithreaded application that runs entirely in memory, avoids repeated file I/O, and merges the computation of several preparation steps to significantly speed up the execution time. For example, for a preparation pipeline of five steps on a whole-exome BAM file (NA12878), we reduce the execution time from about 1:40 hours, when using a combination of SAMtools and Picard, to about 15 minutes when using elPrep, while utilising the same server resources, here 48 threads and 23GB of RAM. For the same pipeline on whole-genome data (NA12878), elPrep reduces the runtime from 24 hours to less than 5 hours. As a typical clinical study may contain sequencing data for hundreds of patients, elPrep can remove several hundreds of hours of computing time, and thus substantially reduce analysis time and cost. PMID:26182406
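    The single-pass idea described in this abstract can be illustrated with a minimal Python sketch. This is not elPrep's implementation or API: the record layout (plain dicts with hypothetical keys `mapped` and `rg`) and the two step functions are assumptions made for illustration only. Each step returns a transformed record, or None to drop it, and all steps are applied to a record during one traversal of the data.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Optional

Record = Dict[str, object]          # hypothetical stand-in for one alignment record
Step = Callable[[Record], Optional[Record]]


def drop_unmapped(rec: Record) -> Optional[Record]:
    """Filter step: discard records that are not marked as mapped."""
    return rec if rec.get("mapped", False) else None


def add_read_group(rec: Record) -> Optional[Record]:
    """Transform step: attach a (hypothetical) read-group tag."""
    tagged = dict(rec)
    tagged["rg"] = "sample1"
    return tagged


def single_pass(records: Iterable[Record], steps: List[Step]) -> Iterator[Record]:
    """Apply every step to each record while traversing the input once."""
    for rec in records:
        current: Optional[Record] = rec
        for step in steps:
            current = step(current)
            if current is None:      # record was filtered out; skip remaining steps
                break
        if current is not None:
            yield current


reads = [
    {"name": "r1", "mapped": True},
    {"name": "r2", "mapped": False},
    {"name": "r3", "mapped": True},
]
prepared = list(single_pass(reads, [drop_unmapped, add_read_group]))
```

    Steps such as sorting or duplicate marking need more than per-record state, so this sketch covers only filter-like and tag-like steps.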

  4. Preferential amino acid sequences in alumina-catalyzed peptide bond formation.

    PubMed

    Bujdák, J; Rode, B M

    2002-05-21

    The catalytic effect of activated alumina on amino acid condensation was investigated. The readiness of amino acids to form peptide sequences was estimated on the basis of the yield of dipeptides and was found to decrease in the order glycine (Gly), alanine (Ala), leucine (Leu), valine (Val), proline (Pro). For example, approximately 15% Gly was converted to the dipeptide (Gly(2)), 5% to cyclic anhydride (cyc(Gly(2))) and small amounts of tri- (Gly(3)) and tetrapeptide (Gly(4)) were formed after 28 days. On the other hand, only trace amounts of Pro(2) were formed from proline under the same conditions. Preferential formation of certain sequences was observed in the mixed reaction systems containing two amino acids. For example, almost ten times more Gly-Val than Val-Gly was formed in the Gly+Val reaction system. The preferred sequences can be explained on the basis of an inductive effect that side groups have on the nucleophilicity and electrophilicity, respectively, of the amino and carboxyl groups. A comparison with published data of amino acid reactions in other reaction systems revealed that the main trends of preferential sequence formation were the same as those described for the salt-induced peptide formation (SIPF) reaction. The results of this work and other previously published papers show that alumina and related mineral surfaces might have played a crucial role in the prebiotic formation of the first peptides on the primitive earth.

  5. Detection of Clinically Relevant Genetic Variants in Autism Spectrum Disorder by Whole-Genome Sequencing

    PubMed Central

    Jiang, Yong-hui; Yuen, Ryan K.C.; Jin, Xin; Wang, Mingbang; Chen, Nong; Wu, Xueli; Ju, Jia; Mei, Junpu; Shi, Yujian; He, Mingze; Wang, Guangbiao; Liang, Jieqin; Wang, Zhe; Cao, Dandan; Carter, Melissa T.; Chrysler, Christina; Drmic, Irene E.; Howe, Jennifer L.; Lau, Lynette; Marshall, Christian R.; Merico, Daniele; Nalpathamkalam, Thomas; Thiruvahindrapuram, Bhooma; Thompson, Ann; Uddin, Mohammed; Walker, Susan; Luo, Jun; Anagnostou, Evdokia; Zwaigenbaum, Lonnie; Ring, Robert H.; Wang, Jian; Lajonchere, Clara; Wang, Jun; Shih, Andy; Szatmari, Peter; Yang, Huanming; Dawson, Geraldine; Li, Yingrui; Scherer, Stephen W.

    2013-01-01

    Autism Spectrum Disorder (ASD) demonstrates high heritability and familial clustering, yet the genetic causes remain only partially understood as a result of extensive clinical and genomic heterogeneity. Whole-genome sequencing (WGS) shows promise as a tool for identifying ASD risk genes as well as unreported mutations in known loci, but an assessment of its full utility in an ASD group has not been performed. We used WGS to examine 32 families with ASD to detect de novo or rare inherited genetic variants predicted to be deleterious (loss-of-function and damaging missense mutations). Among ASD probands, we identified deleterious de novo mutations in six of 32 (19%) families and X-linked or autosomal inherited alterations in ten of 32 (31%) families (some had combinations of mutations). The proportion of families identified with such putative mutations was larger than has been previously reported; this yield was in part due to the comprehensive and uniform coverage afforded by WGS. Deleterious variants were found in four unrecognized, nine known, and eight candidate ASD risk genes. Examples include CAPRIN1 and AFF2 (both linked to FMR1, which is involved in fragile X syndrome), VIP (involved in social-cognitive deficits), and other genes such as SCN2A and KCNQ2 (linked to epilepsy), NRXN1, and CHD7, which causes ASD-associated CHARGE syndrome. Taken together, these results suggest that WGS and thorough bioinformatic analyses for de novo and rare inherited mutations will improve the detection of genetic variants likely to be associated with ASD or its accompanying clinical symptoms. PMID:23849776

  6. Cellobiohydrolase variants and polynucleotides encoding the same

    DOEpatents

    Wogulis, Mark

    2014-09-09

    The present invention relates to variants of a parent cellobiohydrolase. The present invention also relates to polynucleotides encoding the cellobiohydrolase variants; nucleic acid constructs, vectors, and host cells comprising the polynucleotides; and methods of using the cellobiohydrolase variants.

  7. nbCNV: a multi-constrained optimization model for discovering copy number variants in single-cell sequencing data.

    PubMed

    Zhang, Changsheng; Cai, Hongmin; Huang, Jingying; Song, Yan

    2016-09-17

    Variations in DNA copy number contribute substantially to the development of several diseases, including autism, schizophrenia and cancer. Single-cell sequencing technology allows the dissection of genomic heterogeneity at the single-cell level, thereby providing important evolutionary information about cancer cells. In contrast to traditional bulk sequencing, single-cell sequencing requires the amplification of the whole genome of a single cell to accumulate enough samples for sequencing. However, the amplification process inevitably introduces amplification bias, resulting in an over-dispersed portion of the sequencing data. A recent study has shown that the over-dispersed portion of the single-cell sequencing data could be well modelled by negative binomial distributions. We developed a read-depth based method, nbCNV, to detect copy number variants (CNVs). The nbCNV method uses two constraints, sparsity and smoothness, to fit the CNV patterns under the assumption that the read signals follow a negative binomial distribution. The problem of CNV detection was formulated as a quadratic optimization problem, and was solved by an efficient numerical solution based on the classical alternating direction minimization method. Extensive experiments to compare nbCNV with existing benchmark models were conducted on both simulated data and empirical single-cell sequencing data. The results of those experiments demonstrate that nbCNV achieves superior performance and high robustness for the detection of CNVs in single-cell sequencing data.

  8. Detection of genome-wide copy number variants in myeloid malignancies using next-generation sequencing.

    PubMed

    Shen, Wei; Paxton, Christian N; Szankasi, Philippe; Longhurst, Maria; Schumacher, Jonathan A; Frizzell, Kimberly A; Sorrells, Shelly M; Clayton, Adam L; Jattani, Rakhi P; Patel, Jay L; Toydemir, Reha; Kelley, Todd W; Xu, Xinjie

    2018-04-01

    Genetic abnormalities, including copy number variants (CNV), copy number neutral loss of heterozygosity (CN-LOH) and gene mutations, underlie the pathogenesis of myeloid malignancies and serve as important diagnostic, prognostic and/or therapeutic markers. Currently, multiple testing strategies are required for comprehensive genetic testing in myeloid malignancies. The aim of this proof-of-principle study was to investigate the feasibility of combining detection of genome-wide large CNVs, CN-LOH and targeted gene mutations into a single assay using next-generation sequencing (NGS). For genome-wide CNV detection, we designed a single nucleotide polymorphism (SNP) sequencing backbone with 22 762 SNP regions evenly distributed across the entire genome. For targeted mutation detection, 62 frequently mutated genes in myeloid malignancies were targeted. We combined this SNP sequencing backbone with a targeted mutation panel, and sequenced 9 healthy individuals and 16 patients with myeloid malignancies using NGS. We detected 52 somatic CNVs, 11 instances of CN-LOH and 39 oncogenic mutations in the 16 patients with myeloid malignancies, and none in the 9 healthy individuals. All CNVs and CN-LOH were confirmed by SNP microarray analysis. We describe a genome-wide SNP sequencing backbone which allows for sensitive detection of genome-wide CNVs and CN-LOH using NGS. This proof-of-principle study has demonstrated that this strategy can provide more comprehensive genetic profiling for patients with myeloid malignancies using a single assay. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.

  9. Rare mtDNA variants in Leber hereditary optic neuropathy families with recurrence of myoclonus.

    PubMed

    La Morgia, C; Achilli, A; Iommarini, L; Barboni, P; Pala, M; Olivieri, A; Zanna, C; Vidoni, S; Tonon, C; Lodi, R; Vetrugno, R; Mostacci, B; Liguori, R; Carroccia, R; Montagna, P; Rugolo, M; Torroni, A; Carelli, V

    2008-03-04

    To investigate the mechanisms underlying myoclonus in Leber hereditary optic neuropathy (LHON). Five patients and one unaffected carrier from two Italian families bearing the homoplasmic 11778/ND4 and 3460/ND1 mutations underwent a uniform investigation including neurophysiologic studies, muscle biopsy, serum lactic acid after exercise, and muscle ((31)P) and cerebral ((1)H) magnetic resonance spectroscopy (MRS). Biochemical investigations on fibroblasts and complete mitochondrial DNA (mtDNA) sequences of both families were also performed. All six individuals had myoclonus. In spite of a normal EEG background and the absence of giant SEPs and C reflex, EEG-EMG back-averaging showed a preceding jerk-locked EEG potential, consistent with a cortical generator of the myoclonus. Specific comorbidities in the 11778/ND4 family included muscular cramps and psychiatric disorders, whereas features common to both families were migraine and cardiologic abnormalities. Signs of mitochondrial proliferation were seen in muscle biopsies and lactic acid elevation was observed in four of six patients. (31)P-MRS was abnormal in five of six patients and (1)H-MRS showed ventricular accumulation of lactic acid in three of six patients. Fibroblast ATP depletion was evident at 48 hours incubation with galactose in LHON/myoclonus patients. Sequence analysis revealed haplogroup T2 (11778/ND4 family) and U4a (3460/ND1 family) mtDNAs. A functional role for the non-synonymous 4136A>G/ND1, 9139G>A/ATPase6, and 15773G>A/cyt b variants was supported by amino acid conservation analysis. Myoclonus and other comorbidities characterized our Leber hereditary optic neuropathy (LHON) families. Functional investigations disclosed a bioenergetic impairment in all individuals. Our sequence analysis suggests that the LHON plus phenotype in our cases may relate to the synergic role of mtDNA variants.

  10. 37 CFR 1.821 - Nucleotide and/or amino acid sequence disclosures in patent applications.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 37 Patents, Trademarks, and Copyrights 1 2010-07-01 2010-07-01 false Nucleotide and/or amino acid... Biotechnology Invention Disclosures Application Disclosures Containing Nucleotide And/or Amino Acid Sequences § 1.821 Nucleotide and/or amino acid sequence disclosures in patent applications. (a) Nucleotide and...

  11. 37 CFR 1.821 - Nucleotide and/or amino acid sequence disclosures in patent applications.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 37 Patents, Trademarks, and Copyrights 1 2011-07-01 2011-07-01 false Nucleotide and/or amino acid... Biotechnology Invention Disclosures Application Disclosures Containing Nucleotide And/or Amino Acid Sequences § 1.821 Nucleotide and/or amino acid sequence disclosures in patent applications. (a) Nucleotide and...

  12. Androgen Receptor and its Splice Variant, AR-V7, Differentially Regulate FOXA1 Sensitive Genes in LNCaP Prostate Cancer Cells

    PubMed Central

    Krause, William C.; Shafi, Ayesha A.; Nakka, Manjula; Weigel, Nancy L.

    2014-01-01

    Prostate cancer (PCa) is an androgen-dependent disease, and tumors that are resistant to androgen ablation therapy often remain androgen receptor (AR) dependent. Among the contributors to castration-resistant PCa are AR splice variants that lack the ligand-binding domain (LBD). Instead, they have small amounts of unique sequence derived from cryptic exons or from out of frame translation. The AR-V7 (or AR3) variant is constitutively active and is expressed under conditions consistent with CRPC. AR-V7 is reported to regulate a transcriptional program that is similar but not identical to that of AR. However, it is unknown whether these differences are due to the unique sequence in AR-V7, or simply to loss of the LBD. To examine transcriptional regulation by AR-V7, we have used lentiviruses encoding AR-V7 (amino acids 1-627 of AR with the 16 amino acids unique to the variant) to prepare a derivative of the androgen-dependent LNCaP cells with inducible expression of AR-V7. An additional cell line was generated with regulated expression of AR-NTD (amino acids 1-660 of AR); this mutant lacks the LBD but does not have the AR-V7 specific sequence. We find that AR and AR-V7 have distinct activities on target genes that are co-regulated by FOXA1. Transcripts regulated by AR-V7 were similarly regulated by AR-NTD, indicating that loss of the LBD is sufficient for the observed differences. Differential regulation of target genes correlates with preferential recruitment of AR or AR-V7 to specific cis-regulatory DNA sequences providing an explanation for some of the observed differences in target gene regulation. PMID:25008967

  13. Genetic variants of the unsaturated fatty acid receptor GPR120 relating to obesity in dogs

    PubMed Central

    MIYABE, Masahiro; GIN, Azusa; ONOZAWA, Eri; DAIMON, Mana; YAMADA, Hana; ODA, Hitomi; MORI, Akihiro; MOMOTA, Yutaka; AZAKAMI, Daigo; YAMAMOTO, Ichiro; MOCHIZUKI, Mariko; SAKO, Toshinori; TAMURA, Katsutoshi; ISHIOKA, Katsumi

    2015-01-01

    G protein-coupled receptor (GPR) 120 is an unsaturated fatty acid receptor, which is associated with various physiological functions. It is reported that the genetic variant of GPR120, p.Arg270His, is detected more often in obese people, and this genetic variation functionally relates to obesity in humans. Obesity is also a common nutritional disorder in dogs, but the genetic factors have not yet been identified in dogs. In this study, we investigated the molecular structure of canine GPR120 and searched for candidate genetic variants which may relate to obesity in dogs. Canine GPR120 was highly homologous to those of other species, and seven transmembrane domains and two N-glycosylation sites were conserved. GPR120 mRNA was expressed in lung, jejunum, ileum, colon, hypothalamus, hippocampus, spinal cord, bone marrow, dermis and white adipose tissues in dogs, as in mice and humans. Genetic variants of GPR120 were explored in 141 client-owned dogs, and 5 synonymous and 4 non-synonymous variants were found. The variant c.595C>A (p.Pro199Thr) was found in 40 dogs, and the gene frequency was significantly higher in dogs with higher body condition scores, i.e. 0.320 in BCS4–5 dogs, 0.175 in BCS3 dogs and 0.000 in BCS2 dogs. We conclude that c.595C>A (p.Pro199Thr) is a candidate variant relating to obesity, which may be helpful for nutritional management of dogs. PMID:25960032
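    As a small worked check of the variant notation in this abstract, the Python sketch below maps the coding position in c.595C>A to its codon and confirms that a C>A change at the first codon base turns any proline codon into a threonine codon, consistent with p.Pro199Thr. It assumes the usual convention that coding position 1 is the first base of the start codon; the function names and the regular expression are hypothetical helpers, not part of any cited tool.

```python
import re

# Standard genetic code entries for the two amino acids involved.
PRO_CODONS = {"CCA", "CCC", "CCG", "CCT"}
THR_CODONS = {"ACA", "ACC", "ACG", "ACT"}


def parse_substitution(hgvs_c: str):
    """Split a simple coding substitution such as 'c.595C>A' into its parts."""
    match = re.fullmatch(r"c\.(\d+)([ACGT])>([ACGT])", hgvs_c)
    if match is None:
        raise ValueError(f"unsupported notation: {hgvs_c}")
    return int(match.group(1)), match.group(2), match.group(3)


def codon_of(c_pos: int):
    """Return (codon number, position within codon), both 1-based."""
    return (c_pos - 1) // 3 + 1, (c_pos - 1) % 3 + 1


pos, ref, alt = parse_substitution("c.595C>A")
codon_number, codon_offset = codon_of(pos)

# Apply the substitution to every proline codon at the computed offset.
mutated = {c[:codon_offset - 1] + alt + c[codon_offset:] for c in PRO_CODONS}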

  14. Genetic variants of the unsaturated fatty acid receptor GPR120 relating to obesity in dogs.

    PubMed

    Miyabe, Masahiro; Gin, Azusa; Onozawa, Eri; Daimon, Mana; Yamada, Hana; Oda, Hitomi; Mori, Akihiro; Momota, Yutaka; Azakami, Daigo; Yamamoto, Ichiro; Mochizuki, Mariko; Sako, Toshinori; Tamura, Katsutoshi; Ishioka, Katsumi

    2015-10-01

    G protein-coupled receptor (GPR) 120 is an unsaturated fatty acid receptor, which is associated with various physiological functions. It is reported that the genetic variant of GPR120, p.Arg270His, is detected more often in obese people, and this genetic variation functionally relates to obesity in humans. Obesity is also a common nutritional disorder in dogs, but the genetic factors have not yet been identified in dogs. In this study, we investigated the molecular structure of canine GPR120 and searched for candidate genetic variants which may relate to obesity in dogs. Canine GPR120 was highly homologous to those of other species, and seven transmembrane domains and two N-glycosylation sites were conserved. GPR120 mRNA was expressed in lung, jejunum, ileum, colon, hypothalamus, hippocampus, spinal cord, bone marrow, dermis and white adipose tissues in dogs, as in mice and humans. Genetic variants of GPR120 were explored in 141 client-owned dogs, and 5 synonymous and 4 non-synonymous variants were found. The variant c.595C>A (p.Pro199Thr) was found in 40 dogs, and the gene frequency was significantly higher in dogs with higher body condition scores, i.e. 0.320 in BCS4-5 dogs, 0.175 in BCS3 dogs and 0.000 in BCS2 dogs. We conclude that c.595C>A (p.Pro199Thr) is a candidate variant relating to obesity, which may be helpful for nutritional management of dogs.

  15. BreaKmer: detection of structural variation in targeted massively parallel sequencing data using kmers.

    PubMed

    Abo, Ryan P; Ducar, Matthew; Garcia, Elizabeth P; Thorner, Aaron R; Rojas-Rudilla, Vanesa; Lin, Ling; Sholl, Lynette M; Hahn, William C; Meyerson, Matthew; Lindeman, Neal I; Van Hummelen, Paul; MacConaill, Laura E

    2015-02-18

    Genomic structural variation (SV), a common hallmark of cancer, has important predictive and therapeutic implications. However, accurately detecting SV using high-throughput sequencing data remains challenging, especially for 'targeted' resequencing efforts. This is critically important in the clinical setting where targeted resequencing is frequently being applied to rapidly assess clinically actionable mutations in tumor biopsies in a cost-effective manner. We present BreaKmer, a novel approach that uses a 'kmer' strategy to assemble misaligned sequence reads for predicting insertions, deletions, inversions, tandem duplications and translocations at base-pair resolution in targeted resequencing data. Variants are predicted by realigning an assembled consensus sequence created from sequence reads that were abnormally aligned to the reference genome. Using targeted resequencing data from tumor specimens with orthogonally validated SV, non-tumor samples and whole-genome sequencing data, BreaKmer had a 97.4% overall sensitivity for known events and predicted 17 positively validated, novel variants. Relative to four publicly available algorithms, BreaKmer detected SV with increased sensitivity and limited calls in non-tumor samples, key features for variant analysis of tumor specimens in both the clinical and research settings. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
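    The general notion of a 'kmer' strategy can be sketched in a few lines of Python: collect the k-mers of a reference sequence, then keep the read k-mers that do not occur in it, since those are candidates for spanning a breakpoint. This is only an illustration of the concept, not BreaKmer's algorithm or code; the value of k, the toy sequences, and the function names are all assumptions.

```python
from typing import Iterable, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def novel_kmers(reads: Iterable[str], reference: str, k: int) -> Set[str]:
    """Return k-mers that appear in the reads but nowhere in the reference."""
    ref_kmers = kmers(reference, k)
    found: Set[str] = set()
    for read in reads:
        found |= kmers(read, k) - ref_kmers
    return found


# Toy data: the first read matches the reference; the second diverges after "CGTT".
reference = "ACGTACGTTAGC"
reads = ["ACGTACGT", "CGTTGGAC"]
candidates = novel_kmers(reads, reference, k=4)
```

    In a real pipeline the reads carrying such k-mers would then be assembled and realigned; that part is outside the scope of this sketch.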

  16. Comparative transcriptome analysis of three color variants of the sea cucumber Apostichopus japonicus.

    PubMed

    Jo, Jihoon; Park, Jongsun; Lee, Hyun-Gwan; Kern, Elizabeth M A; Cheon, Seongmin; Jin, Soyeong; Park, Joong-Ki; Cho, Sung-Jin; Park, Chungoo

    2016-08-01

    The sea cucumber Apostichopus japonicus Selenka 1867 represents an important resource in biomedical research, traditional medicine, and the seafood industry. Much of the commercial value of A. japonicus is determined by dorsal/ventral color variation (red, green, and black), yet the taxonomic relationships between these color variants are not clearly understood. We performed the first comparative analysis of de novo assembled transcriptome data from three color variants of A. japonicus. Using the Illumina platform, we sequenced nearly 177,596,774 clean reads representing a total of 18.2Gbp of sea cucumber transcriptome. A comparison of over 0.3 million transcript scaffolds against the Uniprot/Swiss-Prot database yielded 8513, 8602, and 8588 positive matches for green, red, and black body color transcriptomes, respectively. Using the Panther gene classification system, we assessed an extensive and diverse set of expressed genes in three color variants and found that (1) among the three color variants of A. japonicus, genes associated with RNA binding protein, oxidoreductase, nucleic acid binding, transferase, and KRAB box transcription factor were most commonly expressed; and (2) the main protein functional classes are differently regulated in all three color variants (extracellular matrix protein and phosphatase for green color, transporter and potassium channel for red color, and G-protein modulator and enzyme modulator for black color). This work will assist in the discovery and annotation of novel genes that play significant morphological and physiological roles in color variants of A. japonicus, and these sequence data will provide a useful set of resources for the rapidly growing sea cucumber aquaculture industry. Copyright © 2016 Elsevier B.V. All rights reserved.

  17. Common and rare von Willebrand factor (VWF) coding variants, VWF levels, and factor VIII levels in African Americans: the NHLBI Exome Sequencing Project.

    PubMed

    Johnsen, Jill M; Auer, Paul L; Morrison, Alanna C; Jiao, Shuo; Wei, Peng; Haessler, Jeffrey; Fox, Keolu; McGee, Sean R; Smith, Joshua D; Carlson, Christopher S; Smith, Nicholas; Boerwinkle, Eric; Kooperberg, Charles; Nickerson, Deborah A; Rich, Stephen S; Green, David; Peters, Ulrike; Cushman, Mary; Reiner, Alex P

    2013-07-25

    Several rare European von Willebrand disease missense variants of VWF (including p.Arg2185Gln and p.His817Gln) were recently reported to be common in apparently healthy African Americans (AAs). Using data from the NHLBI Exome Sequencing Project, we assessed the association of these and other VWF coding variants with von Willebrand factor (VWF) and factor VIII (FVIII) levels in 4468 AAs. Of 30 nonsynonymous VWF variants, 6 were significantly and independently associated (P < .001) with levels of VWF and/or FVIII. Each additional copy of the common VWF variants encoding p.Thr789Ala or p.Asp1472His was associated with 6 to 8 IU/dL higher VWF levels. The VWF variant encoding p.Arg2185Gln was associated with 7 to 13 IU/dL lower VWF and FVIII levels. The type 2N-related VWF variant encoding p.His817Gln was associated with 17 IU/dL lower FVIII level but normal VWF level. A novel, rare missense VWF variant that predicts disruption of an O-glycosylation site (p.Ser1486Leu) and a rare variant encoding p.Arg2287Trp were each associated with 30 to 40 IU/dL lower VWF level (P < .001). In summary, several common and rare VWF missense variants contribute to phenotypic differences in VWF and FVIII among AAs.

  18. Beta-glucosidase variants and polynucleotides encoding same

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wogulis, Mark; Harris, Paul; Osborn, David

    The present invention relates to beta-glucosidase variants, e.g. beta-glucosidase variants of a parent Family GH3A beta-glucosidase from Aspergillus fumigatus. The present invention also relates to polynucleotides encoding the beta-glucosidase variants; nucleic acid constructs, vectors, and host cells comprising the polynucleotides; and methods of using the beta-glucosidase variants.

  19. Neurodevelopmental disease-associated de novo mutations and rare sequence variants affect TRIO GDP/GTP exchange factor activity.

    PubMed

    Katrancha, Sara M; Wu, Yi; Zhu, Minsheng; Eipper, Betty A; Koleske, Anthony J; Mains, Richard E

    2017-12-01

    Bipolar disorder, schizophrenia, autism and intellectual disability are complex neurodevelopmental disorders, debilitating millions of people. Therapeutic progress is limited by poor understanding of underlying molecular pathways. Using a targeted search, we identified an enrichment of de novo mutations in the gene encoding the 330-kDa triple functional domain (TRIO) protein associated with neurodevelopmental disorders. By generating multiple TRIO antibodies, we show that the smaller TRIO9 isoform is the major brain protein product, and its levels decrease after birth. TRIO9 contains two guanine nucleotide exchange factor (GEF) domains with distinct specificities: GEF1 activates both Rac1 and RhoG; GEF2 activates RhoA. To understand the impact of disease-associated de novo mutations and other rare sequence variants on TRIO function, we utilized two FRET-based biosensors: a Rac1 biosensor to study mutations in TRIO (T)GEF1, and a RhoA biosensor to study mutations in TGEF2. We discovered that one autism-associated de novo mutation in TGEF1 (K1431M), at the TGEF1/Rac1 interface, markedly decreased its overall activity toward Rac1. A schizophrenia-associated rare sequence variant in TGEF1 (F1538Intron) was substantially less active, normalized to protein level and expressed poorly. Overall, mutations in TGEF1 decreased GEF1 activity toward Rac1. One bipolar disorder-associated rare variant (M2145T) in TGEF2 impaired inhibition by the TGEF2 pleckstrin-homology domain, resulting in dramatically increased TGEF2 activity. Overall, genetic damage to both TGEF domains altered TRIO catalytic activity, decreasing TGEF1 activity and increasing TGEF2 activity. Importantly, both GEF changes are expected to decrease neurite outgrowth, perhaps consistent with their association with neurodevelopmental disorders. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  20. OVA: integrating molecular and physical phenotype data from multiple biomedical domain ontologies with variant filtering for enhanced variant prioritization.

    PubMed

    Antanaviciute, Agne; Watson, Christopher M; Harrison, Sally M; Lascelles, Carolina; Crinnion, Laura; Markham, Alexander F; Bonthron, David T; Carr, Ian M

    2015-12-01

    Exome sequencing has become a de facto standard method for Mendelian disease gene discovery in recent years, yet identifying disease-causing mutations among thousands of candidate variants remains a non-trivial task. Here we describe a new variant prioritization tool, OVA (ontology variant analysis), in which user-provided phenotypic information is exploited to infer deeper biological context. OVA combines a knowledge-based approach with a variant-filtering framework. It reduces the number of candidate variants by considering genotype and predicted effect on protein sequence, and scores the remainder on biological relevance to the query phenotype. We take advantage of several ontologies in order to bridge knowledge across multiple biomedical domains and facilitate computational analysis of annotations pertaining to genes, diseases, phenotypes, tissues and pathways. In this way, OVA combines information regarding molecular and physical phenotypes and integrates both human and model organism data to effectively prioritize variants. By assessing performance on both known and novel disease mutations, we show that OVA performs biologically meaningful candidate variant prioritization and can be more accurate than another recently published candidate variant prioritization tool. OVA is freely accessible at http://dna2.leeds.ac.uk:8080/OVA/index.jsp. Supplementary data are available at Bioinformatics online. Contact: umaan@leeds.ac.uk. © The Author 2015. Published by Oxford University Press.

  1. An Improved Variant of Soybean Type 1 Diacylglycerol Acyltransferase Increases the Oil Content and Decreases the Soluble Carbohydrate Content of Soybeans.

    PubMed

    Roesler, Keith; Shen, Bo; Bermudez, Ericka; Li, Changjiang; Hunt, Joanne; Damude, Howard G; Ripp, Kevin G; Everard, John D; Booth, John R; Castaneda, Leandro; Feng, Lizhi; Meyer, Knut

    2016-06-01

    Kinetically improved diacylglycerol acyltransferase (DGAT) variants were created to favorably alter carbon partitioning in soybean (Glycine max) seeds. Initially, variants of a type 1 DGAT from a high-oil, high-oleic acid plant seed, Corylus americana, were screened for high oil content in Saccharomyces cerevisiae. Nearly all DGAT variants examined from high-oil strains had increased affinity for oleoyl-CoA, with S0.5 values decreased as much as 4.7-fold compared with the wild-type value of 0.94 µM. Improved soybean DGAT variants were then designed to include amino acid substitutions observed in promising C. americana DGAT variants. The expression of soybean and C. americana DGAT variants in soybean somatic embryos resulted in oil contents as high as 10% and 12%, respectively, compared with only 5% and 7.6% oil achieved by overexpressing the corresponding wild-type DGATs. The affinity for oleoyl-CoA correlated strongly with oil content. The soybean DGAT variant that gave the greatest oil increase contained 14 amino acid substitutions out of a total of 504 (97% sequence identity with native). Seed-preferred expression of this soybean DGAT1 variant increased oil content of soybean seeds by an average of 3% (16% relative increase) in highly replicated, single-location field trials. The DGAT transgenes significantly reduced the soluble carbohydrate content of mature seeds and increased the seed protein content of some events. This study demonstrated that engineering of the native DGAT enzyme is an effective strategy to improve the oil content and value of soybeans. © 2016 American Society of Plant Biologists. All Rights Reserved.
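    Two of the figures quoted in this abstract can be reproduced with simple arithmetic, shown here as a short Python sketch. The function name is a hypothetical helper, and the identity formula assumes that all 14 changes are substitutions with no insertions or deletions, as the abstract states.

```python
def percent_identity(length: int, substitutions: int) -> float:
    """Percent of positions unchanged when only substitutions are present."""
    return 100.0 * (length - substitutions) / length


# 14 substitutions in a 504-residue protein, as reported in the abstract.
identity = percent_identity(504, 14)

# A 4.7-fold decrease from the wild-type S0.5 of 0.94 (micromolar) gives the
# lowest S0.5 implied by the abstract.
lowest_s05 = 0.94 / 4.7
```

    The identity evaluates to roughly 97.2%, which rounds to the 97% reported, and the implied lowest S0.5 is about 0.2 in the same units as the wild-type value.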

  2. The Saccharomyces Genome Database Variant Viewer

    PubMed Central

    Sheppard, Travis K.; Hitz, Benjamin C.; Engel, Stacia R.; Song, Giltae; Balakrishnan, Rama; Binkley, Gail; Costanzo, Maria C.; Dalusag, Kyla S.; Demeter, Janos; Hellerstedt, Sage T.; Karra, Kalpana; Nash, Robert S.; Paskov, Kelley M.; Skrzypek, Marek S.; Weng, Shuai; Wong, Edith D.; Cherry, J. Michael

    2016-01-01

    The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org) is the authoritative community resource for the Saccharomyces cerevisiae reference genome sequence and its annotation. In recent years, we have moved toward increased representation of sequence variation and allelic differences within S. cerevisiae. The publication of numerous additional genomes has motivated the creation of new tools for their annotation and analysis. Here we present the Variant Viewer: a dynamic open-source web application for the visualization of genomic and proteomic differences. Multiple sequence alignments have been constructed across high quality genome sequences from 11 different S. cerevisiae strains and stored in the SGD. The alignments and summaries are encoded in JSON and used to create a two-tiered dynamic view of the budding yeast pan-genome, available at http://www.yeastgenome.org/variant-viewer. PMID:26578556

  3. Top-down mass spectrometry reveals new sequence variants of the major bovine seminal plasma protein PDC-109.

    PubMed

    Laitaoja, Mikko; Sankhala, Rajeshwer S; Swamy, Musti J; Jänis, Janne

    2012-07-01

    The major protein of bovine seminal plasma, PDC-109, is a 109-residue polypeptide that exists as a polydisperse aggregate under native conditions. The oligomeric state of this aggregate varies with ionic strength and the presence of lipids. Binding of PDC-109 to choline phospholipids on the sperm plasma membrane results in an efflux of cholesterol and choline phospholipids, which is an important step in sperm capacitation. In this study, Fourier transform ion cyclotron resonance mass spectrometry was used to analyze PDC-109 purified from bovine seminal plasma. In addition to the previously known PDC-109 variants, four new sequence variants were identified by top-down mass spectrometry. For example, a protein variant containing point mutations P10L and G14R was identified along with another form having a 14-residue truncation in the N-terminal region. Two other minor variants could also be identified from the affinity-purified PDC-109. These results demonstrate that PDC-109 is naturally produced as a mixture of several protein forms, most of which have not been detected in previous studies. Native mass spectrometry revealed that PDC-109 is exclusively monomeric at low protein concentrations, suggesting that the protein oligomers are weakly bound and can easily be disrupted. Ligand binding to PDC-109 was also investigated, and it was observed that two molecules of O-phosphorylcholine bind to each PDC-109 monomer, consistent with previous reports. Copyright © 2012 John Wiley & Sons, Ltd.

  4. The complete amino acid sequence of human skeletal-muscle fructose-bisphosphate aldolase.

    PubMed Central

    Freemont, P S; Dunbar, B; Fothergill-Gilmore, L A

    1988-01-01

    The complete amino acid sequence of human skeletal-muscle fructose-bisphosphate aldolase, comprising 363 residues, was determined. The sequence was deduced by automated sequencing of CNBr-cleavage, o-iodosobenzoic acid-cleavage, trypsin-digest and staphylococcal-proteinase-digest fragments. Comparison of the sequence with other class I aldolase sequences shows that the mammalian muscle isoenzyme is one of the most highly conserved enzymes known, with only about 2% of the residues changing per 100 million years. Non-mammalian aldolases appear to be evolving at the same rate as other glycolytic enzymes, with about 4% of the residues changing per 100 million years. Secondary-structure predictions are analysed in an accompanying paper [Sawyer, Fothergill-Gilmore & Freemont (1988) Biochem. J. 249, 789-793]. PMID:3355497

  5. Candidate Sequence Variants and Fetal Hemoglobin in Children with Sickle Cell Disease Treated with Hydroxyurea

    PubMed Central

    Green, Nancy S.; Ender, Katherine L.; Pashankar, Farzana; Driscoll, Catherine; Giardina, Patricia J.; Mullen, Craig A.; Clark, Lorraine N.; Manwani, Deepa; Crotty, Jennifer; Kisselev, Sergey; Neville, Kathleen A.; Hoppe, Carolyn; Barral, Sandra

    2013-01-01

    Background Fetal hemoglobin level is a heritable complex trait that strongly correlates with the clinical severity of sickle cell disease. Only a few genetic loci have been identified as robustly associated with fetal hemoglobin in patients with sickle cell disease, primarily adults. The sole approved pharmacologic therapy for this disease is hydroxyurea, with effects largely attributable to induction of fetal hemoglobin. Methodology/Principal Findings In a multi-site observational analysis of children with sickle cell disease, candidate single nucleotide polymorphisms associated with baseline fetal hemoglobin levels in adult sickle cell disease were examined in children at baseline and induced by hydroxyurea therapy. For baseline levels, single marker analysis demonstrated significant association with BCL11A and the beta and epsilon globin loci (HBB and HBE, respectively), with an additive attributable variance from these loci of 23%. Among a subset of children on hydroxyurea, baseline fetal hemoglobin levels explained 33% of the variance in induced levels. The variant in HBE accounted for an additional 13% of the variance in induced levels, while variants in the HBB and BCL11A loci did not contribute beyond baseline levels. Conclusions/Significance These findings clarify the overlap between baseline and hydroxyurea-induced fetal hemoglobin levels in pediatric disease. Studies assessing influences of specific sequence variants in these and other genetic loci in larger populations and in unusual hydroxyurea responders are needed to further understand the maintenance and therapeutic induction of fetal hemoglobin in pediatric sickle cell disease. PMID:23409025

  6. Rare variant associations with waist-to-hip ratio in European-American and African-American women from the NHLBI-Exome Sequencing Project.

    PubMed

    Kan, Mengyuan; Auer, Paul L; Wang, Gao T; Bucasas, Kristine L; Hooker, Stanley; Rodriguez, Alejandra; Li, Biao; Ellis, Jaclyn; Adrienne Cupples, L; Ida Chen, Yii-Der; Dupuis, Josée; Fox, Caroline S; Gross, Myron D; Smith, Joshua D; Heard-Costa, Nancy; Meigs, James B; Pankow, James S; Rotter, Jerome I; Siscovick, David; Wilson, James G; Shendure, Jay; Jackson, Rebecca; Peters, Ulrike; Zhong, Hua; Lin, Danyu; Hsu, Li; Franceschini, Nora; Carlson, Chris; Abecasis, Goncalo; Gabriel, Stacey; Bamshad, Michael J; Altshuler, David; Nickerson, Deborah A; North, Kari E; Lange, Leslie A; Reiner, Alexander P; Leal, Suzanne M

    2016-08-01

    Waist-to-hip ratio (WHR), a relative comparison of waist and hip circumferences, is an easily accessible measurement of body fat distribution, in particular central abdominal fat. A high WHR indicates more intra-abdominal fat deposition and is an established risk factor for cardiovascular disease and type 2 diabetes. Recent genome-wide association studies have identified numerous common genetic loci influencing WHR, but the contributions of rare variants have not been previously reported. We investigated rare variant associations with WHR in 1510 European-American and 1186 African-American women from the National Heart, Lung, and Blood Institute-Exome Sequencing Project. Association analysis was performed on the gene level using several rare variant association methods. The strongest association was observed for rare variants in IKBKB (P=4.0 × 10(-8)) in European-Americans, where rare variants in this gene are predicted to decrease WHRs. The activation of the IKBKB gene is involved in inflammatory processes and insulin resistance, which may affect normal food intake and body weight and shape. Meanwhile, aggregation of rare variants in COBLL1, previously found to harbor common variants associated with WHR and fasting insulin, was nominally associated (P=2.23 × 10(-4)) with higher WHR in European-Americans. However, these significant results are not shared between African-Americans and European-Americans, which may be due to differences in the allelic architecture of the two populations and the small sample sizes. Our study indicates that the combined effect of rare variants contributes to the inter-individual variation in fat distribution through the regulation of insulin response.

  7. Guillain-Barré Syndrome: A Variant Consisting of Facial Diplegia and Paresthesia with Left Facial Hemiplegia Associated with Antibodies to Galactocerebroside and Phosphatidic Acid.

    PubMed

    Nishiguchi, Sho; Branch, Joel; Tsuchiya, Tsubasa; Ito, Ryoji; Kawada, Junya

    2017-10-02

    BACKGROUND A rare variant of Guillain-Barré syndrome (GBS) consists of facial diplegia and paresthesia, but an even more rare association is with facial hemiplegia, similar to Bell's palsy. This case report is of this rare variant of GBS that was associated with IgG antibodies to galactocerebroside and phosphatidic acid. CASE REPORT A 54-year-old man presented with lower left facial palsy and paresthesia of his extremities, following an upper respiratory tract infection. Physical examination confirmed lower left facial palsy and paresthesia of his extremities with hyporeflexia of his lower limbs and sensory loss of all four extremities. The differential diagnosis was between a variant of GBS and Bell's palsy. Following initial treatment with glucocorticoids followed by intravenous immunoglobulin (IVIG), his sensory abnormalities resolved. Serum IgG antibodies to galactocerebroside and phosphatidic acid were positive in this patient, but no other antibodies to glycolipids or phospholipids were found. Five months following discharge from hospital, his left facial palsy had improved. CONCLUSIONS A case of a rare variant of GBS is presented with facial diplegia and paresthesia and with unilateral facial palsy. This rare variant of GBS may mimic Bell's palsy. In this case, IgG antibodies to galactocerebroside and phosphatidic acid were detected.

  8. Sequences Of Amino Acids For Human Serum Albumin

    NASA Technical Reports Server (NTRS)

    Carter, Daniel C.

    1992-01-01

    Sequences of amino acids defined for use in making polypeptides one-third to one-sixth as large as parent human serum albumin molecule. Smaller, chemically stable peptides have diverse applications including service as artificial human serum and as active components of biosensors and chromatographic matrices. In applications involving production of artificial sera from new sequences, little or no concern about viral contaminants. Smaller genetically engineered polypeptides more easily expressed and produced in large quantities, making commercial isolation and production more feasible and profitable.

  9. Analysis of CHRNA7 rare variants in autism spectrum disorder susceptibility.

    PubMed

    Bacchelli, Elena; Battaglia, Agatino; Cameli, Cinzia; Lomartire, Silvia; Tancredi, Raffaella; Thomson, Susanne; Sutcliffe, James S; Maestrini, Elena

    2015-04-01

    Chromosome 15q13.3 recurrent microdeletions are causally associated with a wide range of phenotypes, including autism spectrum disorder (ASD), seizures, intellectual disability, and other psychiatric conditions. Whether the reciprocal microduplication is pathogenic is less certain. CHRNA7, encoding for the alpha7 subunit of the neuronal nicotinic acetylcholine receptor, is considered the likely culprit gene in mediating neurological phenotypes in 15q13.3 deletion cases. To assess if CHRNA7 rare variants confer risk to ASD, we performed copy number variant analysis and Sanger sequencing of the CHRNA7 coding sequence in a sample of 135 ASD cases. Sequence variation in this gene remains largely unexplored, given the existence of a fusion gene, CHRFAM7A, which includes a nearly identical partial duplication of CHRNA7. Hence, attempts to sequence coding exons must distinguish between CHRNA7 and CHRFAM7A, making next-generation sequencing approaches unreliable for this purpose. A CHRNA7 microduplication was detected in a patient with autism and moderate cognitive impairment; while no rare damaging variants were identified in the coding region, we detected rare variants in the promoter region, previously described to functionally reduce transcription. This study represents the first sequence variant analysis of CHRNA7 in a sample of idiopathic autism. © 2015 Wiley Periodicals, Inc.

  10. Full-length genome analysis of two genetically distinct variants of porcine epidemic diarrhea virus in Thailand.

    PubMed

    Cheun-Arom, Thaniwan; Temeeyasen, Gun; Tripipat, Thitima; Kaewprommal, Pavita; Piriyapongsa, Jittima; Sukrong, Suchada; Chongcharoen, Wanchai; Tantituvanont, Angkana; Nilubol, Dachrit

    2016-10-01

    Porcine epidemic diarrhea virus (PEDV) has continued to cause sporadic outbreaks in Thailand since 2007 and a pandemic variant containing an insertion and deletion in the spike gene was responsible for outbreaks. In 2014, there were further outbreaks of the disease occurring within four months of each other. In this study, the full-length genome sequences of two genetically distinct PEDV isolates from the outbreaks were characterized. The two PEDV isolates, CBR1/2014 and EAS1/2014, were 28,039 and 28,033 nucleotides in length and showed 96.2% and 93.6% similarities at nucleotide and amino acid levels respectively. In total, we have observed 1048 nucleotide substitutions throughout the genome. Compared to EAS1/2014, CBR1/2014 has 2 insertions, of 4 ((56)GENQ(59)) and 1 ((140)N) amino acids at positions 56-59 and 140, and 2 deletions, of 2 ((160)DG(161)) and 1 ((1199)Y) amino acids at positions 160-161 and 1199. The phylogenetic analysis based on full-length genome of CBR1/2014 isolate has grouped the virus with the pandemic variants. In contrast, EAS1/2014 isolate was grouped with CV777, LZC and SM98, a classical variant. Our findings demonstrated the emergence of EAS1/2014, a classical variant which is novel to Thailand and genetically distinct from the currently circulating endemic variants. This study warrants further investigations into molecular epidemiology and genetic evolution of the PEDV in Thailand. Copyright © 2016 Elsevier B.V. All rights reserved.

  11. THAP1/DYT6 sequence variants in non-DYT1 early-onset primary dystonia in China and their effects on RNA expression.

    PubMed

    Cheng, Fu Bo; Ozelius, Laurie J; Wan, Xin Hua; Feng, Jia Chun; Ma, Ling Yan; Yang, Ying Mai; Wang, Lin

    2012-02-01

    Mutations in the THAP1 gene were recently identified as the cause of DYT6 primary dystonia. More than 40 mutations in this gene have been described in different populations. However, no previous report has identified sequence variations that affect the transcript process of the THAP1 gene. In addition, the mutation frequency in Chinese early-onset primary dystonia has not been well characterized. One hundred and two unrelated patients with non-DYT1 early-onset primary dystonia (age at onset <26 years), family members of participants with mutations, and 200 neurologically normal controls were screened for THAP1 gene mutations. The effects of the identified mutations on RNA expression were analyzed using semi-quantitative real-time PCR. Seven sequence variants (c.63_66del TTTC, c.161G>T, c.224A>T, c.267G>A, c.339T>C, c.449A>C, and c.539T>C) were identified in this group of patients (6.9%). In this cohort, 15 subjects (seven unrelated patients and eight family members) were detected to have THAP1 sequence variants. Among these 15 subjects, 11 were manifested (penetrance of DYT6 was 73.3%) and seven presented with craniocervical involvement (63.6%). However, one patient manifested paroxysmal headshake, and one presented with essential hand tremor. Semi-quantitative real-time PCR indicated that a novel silent mutation (c.267G>A) decreased the expression of THAP1 in human lymphocytes. Our findings indicated that THAP1 sequence variants are not common in non-DYT1 early-onset primary dystonia in China and that the clinical manifestation may vary. One silent mutation (c.267G>A) was shown to affect THAP1 expression.
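
    The percentages quoted in this abstract can be re-derived from its own counts. The sketch below is only an arithmetic check; the variable and function names are illustrative and do not come from the paper.

```python
# Counts as stated in the abstract; names are hypothetical, chosen for readability.
patients_screened = 102      # unrelated non-DYT1 early-onset patients
patients_with_variant = 7    # unrelated patients carrying a THAP1 variant
carriers_total = 15          # 7 unrelated patients + 8 family members
carriers_manifesting = 11    # carriers who manifested dystonia
craniocervical = 7           # manifesting carriers with craniocervical involvement


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


variant_frequency = percent(patients_with_variant, patients_screened)   # 6.9
penetrance = percent(carriers_manifesting, carriers_total)              # 73.3
craniocervical_share = percent(craniocervical, carriers_manifesting)    # 63.6

print(variant_frequency, penetrance, craniocervical_share)
```

    Note that the 63.6% figure uses the 11 manifesting carriers as its denominator, not all 15 carriers.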

  12. Isolated nucleic acids encoding antipathogenic polypeptides and uses thereof

    DOEpatents

    Altier, Daniel J.; Crane, Virginia C.; Ellanskaya, Irina; Ellanskaya, Natalia; Gilliam, Jacob T.; Hunter-Cevera, Jennie; Presnail, James K.; Schepers, Eric J.; Simmons, Carl R.; Torok, Tamas; Yalpani, Nasser

    2010-04-20

    Compositions and methods for protecting a plant from a pathogen, particularly a fungal pathogen, are provided. Compositions include amino acid sequences, and variants and fragments thereof, for antipathogenic polypeptides that were isolated from fungal fermentation broths. Nucleic acids that encode the antipathogenic polypeptides are also provided. A method for inducing pathogen resistance in a plant using the nucleotide sequences disclosed herein is further provided. The method comprises introducing into a plant an expression cassette comprising a promoter operably linked to a nucleotide sequence that encodes an antipathogenic polypeptide of the invention. Compositions comprising an antipathogenic polypeptide or a transformed microorganism comprising a nucleic acid of the invention in combination with a carrier and methods of using these compositions to protect a plant from a pathogen are further provided. Transformed plants, plant cells, seeds, and microorganisms comprising a nucleotide sequence that encodes an antipathogenic polypeptide of the invention are also disclosed.

  13. Limited Variation in BK Virus T-Cell Epitopes Revealed by Next-Generation Sequencing

    PubMed Central

    Sahoo, Malaya K.; Tan, Susanna K.; Chen, Sharon F.; Kapusinszky, Beatrix; Concepcion, Katherine R.; Kjelson, Lynn; Mallempati, Kalyan; Farina, Heidi M.; Fernández-Viña, Marcelo; Tyan, Dolly; Grimm, Paul C.; Anderson, Matthew W.; Concepcion, Waldo

    2015-01-01

    BK virus (BKV) infection causing end-organ disease remains a formidable challenge to the hematopoietic cell transplant (HCT) and kidney transplant fields. As BKV-specific treatments are limited, immunologic-based therapies may be a promising and novel therapeutic option for transplant recipients with persistent BKV infection. Here, we describe a whole-genome, deep-sequencing methodology and bioinformatics pipeline that identify BKV variants across the genome and at BKV-specific HLA-A2-, HLA-B0702-, and HLA-B08-restricted CD8 T-cell epitopes. BKV whole genomes were amplified using long-range PCR with four inverse primer sets, and fragmentation libraries were sequenced on the Ion Torrent Personal Genome Machine (PGM). An error model and variant-calling algorithm were developed to accurately identify rare variants. A total of 65 samples from 18 pediatric HCT and kidney recipients with quantifiable BKV DNAemia underwent whole-genome sequencing. Limited genetic variation was observed. The median number of amino acid variants identified per sample was 8 (range, 2 to 37; interquartile range, 10), with the majority of variants (77%) detected at a frequency of <5%. When normalized for length, there was no statistical difference in the median number of variants across all genes. Similarly, the predominant virus population within samples harbored T-cell epitopes similar to the reference BKV strain that was matched for the BKV genotype. Despite the conservation of epitopes, low-level variants in T-cell epitopes were detected in 77.7% (14/18) of patients. Understanding epitope variation across the whole genome provides insight into the virus-immune interface and may help guide the development of protocols for novel immunologic-based therapies. PMID:26202116

  14. Clinical implications of SCN1A missense and truncation variants in a large Japanese cohort with Dravet syndrome.

    PubMed

    Ishii, Atsushi; Watkins, Joseph C; Chen, Debbie; Hirose, Shinichi; Hammer, Michael F

    2017-02-01

    Two major classes of SCN1A variants are associated with Dravet syndrome (DS): those that result in haploinsufficiency (truncating) and those that result in an amino acid substitution (missense). The aim of this retrospective study was to describe the first large cohort of Japanese patients with SCN1A mutation-positive DS (n = 285), and investigate the relationship between variant (type and position) and clinical expression and response to treatment. We sequenced all exons and intron-exon boundaries of SCN1A in our cohort, investigated differences in the distribution of truncating and missense variants, tested for associations between variant type and phenotype, and compared these patterns with those of cohorts with milder epilepsy and healthy individuals. Unlike truncation variants, missense variants are found at higher density in the S4 voltage sensor and pore loops and at lower density in the domain I-II and II-III linkers and the first three segments of domain II. Relative to healthy individuals, there is an increased frequency of truncating (but not missense) variants in the noncoding C-terminus. The rate of cognitive decline is more rapid for patients with truncation variants regardless of age at seizure onset, whereas age at onset is a predictor of the rate of cognitive decline for patients with missense variants. We found significant differences in the distribution of truncating and missense variants across the SCN1A sequence among healthy individuals, patients with DS, and those with milder forms of SCN1A-variant positive epilepsy. Testing for associations with phenotype revealed that variant type can be predictive of rate of cognitive decline. Analysis of descriptive medication data suggests that in addition to conventional drug therapy in DS, bromide, clonazepam and topiramate may reduce seizure frequency. Wiley Periodicals, Inc. © 2016 International League Against Epilepsy.

  15. Rapid sequence evolution of street rabies glycoprotein is related to the highly heterogeneous nature of the viral population.

    PubMed

    Benmansour, A; Brahimi, M; Tuffereau, C; Coulon, P; Lafay, F; Flamand, A

    1992-03-01

    The sequence of the glycoprotein gene of a street rabies virus was determined directly using fragments of a rabid dog brain after PCR amplification. Compared with that of the prototype strain CVS, this sequence displayed 10% divergence in overall amino acid composition. However only 6% divergence was noted in the ectodomain suggesting that structural constraints are exerted on this portion of the glycoprotein. A human strain isolated on cell culture from the saliva of a patient with clinical rabies had only five amino acid differences with the canine isolate, an indication of their close relatedness. These differences could have originated during transmission from dog to dog, or from dog to man, or during isolation on cell culture; they are nonetheless indicative of a genetic evolution of street rabies virus. This evolution was further evidenced by the selection of cell-adapted variants which displayed new amino acid substitutions in the glycoprotein. One of them concerned antigenic site III where arginine at position 333 was replaced by glutamine. As expected this substitution conferred resistance to a site IIIa monoclonal antibody (MAb), but surprisingly did not abolish neurovirulence for adult mice. However, a decrease in the neurovirulence of the cell-adapted variant in the presence of a site IIIa specific MAb was noted, suggesting that neurovirulence was due to a subpopulation neutralizable by the MAb. Simultaneous presence of both the parental and variant sequences was indeed evidenced in the brain of a mouse inoculated with the cell-adapted variant; during multiplication in the mouse brain, the frequency of the parental sequence rose from less than 10% to nearly 50%, indicating the selective advantage conferred by arginine 333 in nervous tissue. Altogether these results were suggestive of an intrinsic heterogeneity of street rabies virus. This heterogeneity was further demonstrated by the sequencing of molecular clones of the glycoprotein gene, which

  16. Nucleic acids encoding antifungal polypeptides and uses thereof

    DOEpatents

    Altier, Daniel J.; Ellanskaya, I. A.; Gilliam, Jacob T.; Hunter-Cevera, Jennie; Presnail, James K; Schepers, Eric; Simmons, Carl R.; Torok, Tamas; Yalpani, Nasser

    2010-11-02

    Compositions and methods for protecting a plant from a pathogen, particularly a fungal pathogen, are provided. Compositions include an amino acid sequence, and variants and fragments thereof, for an antipathogenic polypeptide that was isolated from a fungal fermentation broth. Nucleic acid molecules that encode the antipathogenic polypeptides of the invention, and antipathogenic domains thereof, are also provided. A method for inducing pathogen resistance in a plant using the nucleotide sequences disclosed herein is further provided. The method comprises introducing into a plant an expression cassette comprising a promoter operably linked to a nucleotide sequence that encodes an antipathogenic polypeptide of the invention. Compositions comprising an antipathogenic polypeptide or a transformed microorganism comprising a nucleic acid of the invention in combination with a carrier and methods of using these compositions to protect a plant from a pathogen are further provided. Transformed plants, plant cells, seeds, and microorganisms comprising a nucleotide sequence that encodes an antipathogenic polypeptide of the invention are also disclosed.

  17. Association of levels of fasting glucose and insulin with rare variants at the chromosome 11p11.2-MADD locus: Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) Consortium Targeted Sequencing Study.

    PubMed

    Cornes, Belinda K; Brody, Jennifer A; Nikpoor, Naghmeh; Morrison, Alanna C; Chu, Huan; Ahn, Byung Soo; Wang, Shuai; Dauriz, Marco; Barzilay, Joshua I; Dupuis, Josée; Florez, Jose C; Coresh, Josef; Gibbs, Richard A; Kao, W H Linda; Liu, Ching-Ti; McKnight, Barbara; Muzny, Donna; Pankow, James S; Reid, Jeffrey G; White, Charles C; Johnson, Andrew D; Wong, Tien Y; Psaty, Bruce M; Boerwinkle, Eric; Rotter, Jerome I; Siscovick, David S; Sladek, Robert; Meigs, James B

    2014-06-01

    Common variation at the 11p11.2 locus, encompassing MADD, ACP2, NR1H3, MYBPC3, and SPI1, has been associated in genome-wide association studies with fasting glucose and insulin (FI). In the Cohorts for Heart and Aging Research in Genomic Epidemiology Targeted Sequencing Study, we sequenced 5 gene regions at 11p11.2 to identify rare, potentially functional variants influencing fasting glucose or FI levels. Sequencing (mean depth, 38×) across 16.1 kb in 3566 individuals without diabetes mellitus identified 653 variants, 79.9% of which were rare (minor allele frequency <1%) and novel. We analyzed rare variants in 5 gene regions with FI or fasting glucose using the sequence kernel association test. At NR1H3, 53 rare variants were jointly associated with FI (P=2.73×10(-3)); of these, 7 were predicted to have regulatory function and showed association with FI (P=1.28×10(-3)). Conditioning on 2 previously associated variants at MADD (rs7944584, rs10838687) did not attenuate this association, suggesting that there are >2 independent signals at 11p11.2. One predicted regulatory variant, chr11:47227430 (hg18; minor allele frequency=0.00068), contributed 20.6% to the overall sequence kernel association test score at NR1H3, lies in intron 2 of NR1H3, and is a predicted binding site for forkhead box A1 (FOXA1), a transcription factor associated with insulin regulation. In human HepG2 hepatoma cells, the rare chr11:47227430 A allele disrupted FOXA1 binding and reduced FOXA1-dependent transcriptional activity. Sequencing at 11p11.2-NR1H3 identified rare variation associated with FI. One variant, chr11:47227430, seems to be functional, with the rare A allele reducing transcription factor FOXA1 binding and FOXA1-dependent transcriptional activity. © 2014 American Heart Association, Inc.

  18. A Nonsense Variant in the ACADVL Gene in German Hunting Terriers with Exercise Induced Metabolic Myopathy.

    PubMed

    Lepori, Vincent; Mühlhause, Franziska; Sewell, Adrian C; Jagannathan, Vidhya; Janzen, Nils; Rosati, Marco; Alves de Sousa, Filipe Miguel Maximiano; Tschopp, Aurélie; Schüpbach, Gertraud; Matiasek, Kaspar; Tipold, Andrea; Leeb, Tosso; Kornberg, Marion

    2018-05-04

    Several enzymes are involved in fatty acid oxidation, which is a key process in mitochondrial energy production. Inherited defects affecting any step of fatty acid oxidation can result in clinical disease. We present here an extended family of German Hunting Terriers with 10 dogs affected by clinical signs of exercise induced weakness, muscle pain, and suspected rhabdomyolysis. The combination of clinical signs, muscle histopathology and acylcarnitine analysis with an elevated tetradecenoylcarnitine (C14:1) peak suggested a possible diagnosis of acyl-CoA dehydrogenase very long chain deficiency (ACADVLD). Whole genome sequence analysis of one affected dog and 191 controls revealed a nonsense variant in the ACADVL gene encoding acyl-CoA dehydrogenase very long chain, c.1728C>A or p.(Tyr576*). The variant showed perfect association with the phenotype in the 10 affected and more than 500 control dogs of various breeds. Pathogenic variants in the ACADVL gene have been reported in humans with similar myopathic phenotypes. We therefore considered the detected variant to be the most likely candidate causative variant for the observed exercise induced myopathy. To our knowledge, this is the first description of this disease in dogs, which we propose to name exercise induced metabolic myopathy (EIMM), and the identification of the first canine pathogenic ACADVL variant. Our findings provide a large animal model for a known human disease and will enable genetic testing to avoid the unintentional breeding of affected offspring. Copyright © 2018 Lepori et al.

  19. Comparative characterization of random-sequence proteins consisting of 5, 12, and 20 kinds of amino acids

    PubMed Central

    Tanaka, Junko; Doi, Nobuhide; Takashima, Hideaki; Yanagawa, Hiroshi

    2010-01-01

    Screening of functional proteins from a random-sequence library has been used to evolve novel proteins in the field of evolutionary protein engineering. However, random-sequence proteins consisting of the 20 natural amino acids tend to aggregate, and the occurrence rate of functional proteins in a random-sequence library is low. From the viewpoint of the origin of life, it has been proposed that primordial proteins consisted of a limited set of amino acids that could have been abundantly formed early during chemical evolution. We have previously found that members of a random-sequence protein library constructed with five primitive amino acids show high solubility (Doi et al., Protein Eng Des Sel 2005;18:279–284). Although such a library is expected to be appropriate for finding functional proteins, the functionality may be limited, because they have no positively charged amino acid. Here, we constructed three libraries of 120-amino acid, random-sequence proteins using alphabets of 5, 12, and 20 amino acids by preselection using mRNA display (to eliminate sequences containing stop codons and frameshifts) and characterized and compared the structural properties of random-sequence proteins arbitrarily chosen from these libraries. We found that random-sequence proteins constructed with the 12-member alphabet (including five primitive amino acids and positively charged amino acids) have higher solubility than those constructed with the 20-member alphabet, though other biophysical properties are very similar in the two libraries. Thus, a library of moderate complexity constructed from 12 amino acids may be a more appropriate resource for functional screening than one constructed from 20 amino acids. PMID:20162614

  20. Animal selection for whole genome sequencing by quantifying the unique contribution of homozygous haplotypes sequenced

    USDA-ARS's Scientific Manuscript database

    Major whole genome sequencing projects promise to identify rare and causal variants within livestock species; however, the efficient selection of animals for sequencing remains a major problem within these surveys. The goal of this project was to develop a library of high accuracy genetic variants f...

  1. Network Analysis of Sequence-Function Relationships and Exploration of Sequence Space of TEM β-Lactamases.

    PubMed

    Zeil, Catharina; Widmann, Michael; Fademrecht, Silvia; Vogel, Constantin; Pleiss, Jürgen

    2016-05-01

    The Lactamase Engineering Database (www.LacED.uni-stuttgart.de) was developed to facilitate the classification and analysis of TEM β-lactamases. The current version contains 474 TEM variants. Two hundred fifty-nine variants form a large scale-free network of highly connected point mutants. The network was divided into three subnetworks which were enriched by single phenotypes: one network with predominantly 2be and two networks with 2br phenotypes. Fifteen positions were found to be highly variable, contributing to the majority of the observed variants. Since it is expected that a considerable fraction of the theoretical sequence space is functional, the currently sequenced 474 variants represent only the tip of the iceberg of functional TEM β-lactamase variants which form a huge natural reservoir of highly interconnected variants. Almost 50% of the variants are part of a quartet. Thus, two single mutations that result in functional enzymes can be combined into a functional protein. Most of these quartets consist of the same phenotype, or the mutations are additive with respect to the phenotype. By predicting quartets from triplets, 3,916 unknown variants were constructed. Eighty-seven variants complement multiple quartets and therefore have a high probability of being functional. The construction of a TEM β-lactamase network and subsequent analyses by clustering and quartet prediction are valuable tools to gain new insights into the viable sequence space of TEM β-lactamases and to predict their phenotype. The highly connected sequence space of TEM β-lactamases is ideally suited to network analysis and demonstrates the strengths of network analysis over tree reconstruction methods. Copyright © 2016, American Society for Microbiology. All Rights Reserved.
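
    The idea of predicting quartets from triplets can be illustrated with a small sketch. It assumes, as a reading of the abstract rather than a description of the authors' implementation, that a quartet is a set of four variants {base, base+a, base+b, base+a+b} for two point mutations a and b, and that a variant is predicted whenever exactly three corners of such a quartet are already known. The function name, the mutation labels m1 to m3, and the representation of variants as sets of mutations are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def predict_from_triplets(known_variants):
    """Count, for each unknown variant, how many quartets it would complete.

    Each variant is a set of point-mutation labels relative to a reference.
    A quartet is assumed to be {base, base+a, base+b, base+a+b}.
    """
    known = {frozenset(v) for v in known_variants}
    mutations = sorted(set().union(*known)) if known else []
    completions = Counter()
    seen = set()  # quartets already examined, keyed by (base, a, b)
    for variant in known:
        for a, b in combinations(mutations, 2):
            base = variant - {a, b}
            key = (base, a, b)
            if key in seen:
                continue
            seen.add(key)
            corners = [base, base | {a}, base | {b}, base | {a, b}]
            missing = [c for c in corners if c not in known]
            if len(missing) == 1:  # a triplet: exactly one corner is unknown
                completions[missing[0]] += 1
    return completions


# Hypothetical toy data: reference, three single mutants, three double mutants.
example_known = [
    set(), {"m1"}, {"m2"}, {"m3"},
    {"m1", "m2"}, {"m1", "m3"}, {"m2", "m3"},
]
predicted = predict_from_triplets(example_known)
for variant, count in predicted.items():
    print(sorted(variant), "completes", count, "quartet(s)")
```

    In this toy input the triple mutant is the only predicted variant and it completes three quartets, which mirrors the abstract's notion of variants that "complement multiple quartets".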

  2. An Improved Variant of Soybean Type 1 Diacylglycerol Acyltransferase Increases the Oil Content and Decreases the Soluble Carbohydrate Content of Soybeans

    PubMed Central

    Shen, Bo; Damude, Howard G.; Everard, John D.; Booth, John R.

    2016-01-01

    Kinetically improved diacylglycerol acyltransferase (DGAT) variants were created to favorably alter carbon partitioning in soybean (Glycine max) seeds. Initially, variants of a type 1 DGAT from a high-oil, high-oleic acid plant seed, Corylus americana, were screened for high oil content in Saccharomyces cerevisiae. Nearly all DGAT variants examined from high-oil strains had increased affinity for oleoyl-CoA, with S0.5 values decreased as much as 4.7-fold compared with the wild-type value of 0.94 µM. Improved soybean DGAT variants were then designed to include amino acid substitutions observed in promising C. americana DGAT variants. The expression of soybean and C. americana DGAT variants in soybean somatic embryos resulted in oil contents as high as 10% and 12%, respectively, compared with only 5% and 7.6% oil achieved by overexpressing the corresponding wild-type DGATs. The affinity for oleoyl-CoA correlated strongly with oil content. The soybean DGAT variant that gave the greatest oil increase contained 14 amino acid substitutions out of a total of 504 (97% sequence identity with native). Seed-preferred expression of this soybean DGAT1 variant increased oil content of soybean seeds by an average of 3% (16% relative increase) in highly replicated, single-location field trials. The DGAT transgenes significantly reduced the soluble carbohydrate content of mature seeds and increased the seed protein content of some events. This study demonstrated that engineering of the native DGAT enzyme is an effective strategy to improve the oil content and value of soybeans. PMID:27208257

  3. 37 CFR 1.822 - Symbols and format to be used for nucleotide and/or amino acid sequence data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... for nucleotide and/or amino acid sequence data. 1.822 Section 1.822 Patents, Trademarks, and... Amino Acid Sequences § 1.822 Symbols and format to be used for nucleotide and/or amino acid sequence data. (a) The symbols and format to be used for nucleotide and/or amino acid sequence data shall...

  4. Amino acid sequence of a trypsin inhibitor from a Spirometra (Spirometra erinaceieuropaei).

    PubMed

    Sanda, A; Uchida, A; Itagaki, T; Kobayashi, H; Inokuchi, N; Koyama, T; Iwama, M; Ohgi, K; Irie, M

    2001-12-01

    A trypsin inhibitor that is highly homologous with bovine pancreatic trypsin inhibitor (BPTI) was co-purified along with RNase from Spirometra (Spirometra erinaceieuropaei). The amino acid sequence of this inhibitor (SETI) and the nucleotide sequence of the cDNA encoding this protein were determined by protein chemistry and gene technology. SETI contains 68 amino acid residues and has a molecular mass of 7,798 Da. SETI has 31 amino acid residues that are identical with BPTI's sequence, including 6 half-cystine and 5 aromatic amino acid residues. The active site Lys residue in BPTI is replaced by an Arg residue in SETI. SETI is an effective inhibitor of trypsin and moderately inhibits α-chymotrypsin, but is a weaker inhibitor of elastase and subtilisin. SETI was expressed by E. coli containing a PelB vector carrying the SETI-encoding cDNA; an expression yield of 0.68 mg/l was obtained. The phylogenetic relationship of SETI and the other BPTI-like trypsin inhibitors was analyzed using maximum likelihood inference methods.

  5. Targeted Deep Sequencing Identifies Rare ‘loss-of-function’ Variants in IFNGR1 for Risk of Atopic Dermatitis Complicated by Eczema Herpeticum

    PubMed Central

    Gao, Li; Rafaels, Nicholas M; Huang, Lili; Potee, Joseph; Ruczinski, Ingo; Beaty, Terri H.; Paller, Amy S.; Schneider, Lynda C.; Gallo, Rich; Hanifin, Jon M.; Beck, Lisa A.; Geha, Raif S.; Mathias, Rasika A.; Leung, Donald Y. M.

    2015-01-01

    Background: A subset of atopic dermatitis (AD) is associated with increased susceptibility to eczema herpeticum (ADEH+). We previously reported that common single nucleotide polymorphisms (SNPs) in interferon-gamma (IFNG) and receptor 1 (IFNGR1) were associated with ADEH+ phenotype. Objective: To interrogate the role of rare variants in IFN-pathway genes for risk of ADEH+. Methods: We performed targeted sequencing of interferon-pathway genes (IFNG, IFNGR1, IFNAR1 and IL12RB1) in 228 European American (EA) AD patients selected according to their EH status and severity measured by Eczema Area and Severity Index (EASI). Replication genotyping was performed in independent samples of 219 EA and 333 African Americans (AA). Functional investigation of ‘loss-of-function’ variants was conducted using site-directed mutagenesis. Results: We identified 494 single nucleotide variants (SNVs) encompassing 105kb of sequence, including 145 common, 349 (70.6%) rare (minor allele frequency (MAF) <5%) and 86 (17.4%) novel variants, of which 2.8% were coding-synonymous, 93.3% were non-coding (64.6% intronic), and 3.8% were missense. We identified six rare IFNGR1 missense variants, including three damaging variants (Val14Met (V14M), Val61Ile and Tyr397Cys (Y397C)) conferring a higher risk for ADEH+ (P=0.031). Variants V14M and Y397C were confirmed to be deleterious, leading to partial IFNGR1 deficiency. Seven common IFNGR1 SNPs, along with common protective haplotypes (2 to 7-SNPs), conferred a reduced risk of ADEH+ (P=0.015-0.002, P=0.0015-0.0004, respectively), and both SNP and haplotype associations were replicated in an independent AA sample (P=0.004-0.0001 and P=0.001-0.0001, respectively). Conclusion: Our results provide evidence that both rare and common genetic variants in the gene encoding IFNGR1 are implicated in susceptibility to the ADEH+ phenotype. Capsule Summary: We provided the first evidence that rare functional IFNGR1 mutations contribute to a defective systemic IFN-γ immune response that accounts

  6. BMP15 c.-9C>G promoter sequence variant may contribute to the cause of non-syndromic premature ovarian failure.

    PubMed

    Fonseca, Dora Janeth; Ortega-Recalde, Oscar; Esteban-Perez, Clara; Moreno-Ortiz, Harold; Patiño, Liliana Catherine; Bermúdez, Olga María; Ortiz, Angela María; Restrepo, Carlos M; Lucena, Elkin; Laissue, Paul

    2014-11-01

    BMP15 has drawn particular attention in the pathophysiology of reproduction, as its mutations in mammalian species have been related to different reproductive phenotypes. In humans, BMP15 coding regions have been sequenced in large panels of women with premature ovarian failure (POF), but only some mutations have been definitely validated as causing the phenotype. A functional association between the BMP15 c.-9C>G promoter polymorphism and the cause of POF has been reported. The aim of this study was to determine the potential functional effect of this sequence variant on specific BMP15 promoter transactivation disturbances. Bioinformatics was used to identify transcription factor binding sites located on the promoter region of BMP15. Reverse transcription polymerase chain reaction was used to study specific gene expression in ovarian tissue. Luciferase reporter assays were used to establish transactivation disturbances caused by the BMP15 c.-9C>G variant. The c.-9C>G variant was found to modify the PITX1 transcription factor binding site. PITX1 and BMP15 were co-expressed in human and mouse ovarian tissue, and PITX1 transactivated both BMP15 promoter versions (-9C and -9G). It was found that the BMP15 c.-9G allele was related to increased BMP15 transcription, supporting c.-9C>G as a causal agent of POF. Copyright © 2014 Reproductive Healthcare Ltd. Published by Elsevier Ltd. All rights reserved.

  7. Complete cDNA sequence of SAP-like pentraxin from Limulus polyphemus: implications for pentraxin evolution.

    PubMed

    Tharia, Hazel A; Shrive, Annette K; Mills, John D; Arme, Chris; Williams, Gwyn T; Greenhough, Trevor J

    2002-02-22

    The serum amyloid P component (SAP)-like pentraxin Limulus polyphemus SAP is a recently discovered, distinct pentraxin species, of known structure, which does not bind phosphocholine and whose N-terminal sequence has been shown to differ markedly from the highly conserved N terminus of all other known horseshoe crab pentraxins. The complete cDNA sequence of Limulus SAP, and the derived amino acid sequence, the first invertebrate SAP-like pentraxin sequence, have been determined. Two sequences were identified that differed only in the length of the 3' untranslated region. Limulus SAP is synthesised as a precursor protein of 234 amino acid residues, the first 17 residues encoding a signal peptide that is absent from the mature protein. Phylogenetic analysis clusters Limulus SAP pentraxin with the horseshoe crab C-reactive proteins (CRPs) rather than the mammalian SAPs, which are clustered with mammalian CRPs. The deduced amino acid sequence shares 22% identity with both human SAP and CRP, which are 51% identical, and 31-35% with horseshoe crab CRPs. These analyses indicate that gene duplication of CRP (or SAP), followed by sequence divergence and the evolution of CRP and/or SAP function, occurred independently along the chordate and arthropod evolutionary lines rather than in a common ancestor. They further indicate that the CRP/SAP gene duplication event in Limulus occurred before both the emergence of the Limulus CRP variants and the mammalian CRP/SAP gene duplication. Limulus SAP, which does not exhibit the CRP characteristic of calcium-dependent binding to phosphocholine, is established as a pentraxin species distinct from all other known horseshoe crab pentraxins that exist in many variant forms sharing a high level of sequence homology. Copyright 2002 Elsevier Science Ltd.

  8. 37 CFR 1.823 - Requirements for nucleotide and/or amino acid sequences as part of the application.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... and/or amino acid sequences as part of the application. 1.823 Section 1.823 Patents, Trademarks, and... Amino Acid Sequences § 1.823 Requirements for nucleotide and/or amino acid sequences as part of the... incorporation-by-reference of the Sequence Listing as required by § 1.52(e)(5). The presentation of the...

  9. 37 CFR 1.823 - Requirements for nucleotide and/or amino acid sequences as part of the application.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... and/or amino acid sequences as part of the application. 1.823 Section 1.823 Patents, Trademarks, and... Amino Acid Sequences § 1.823 Requirements for nucleotide and/or amino acid sequences as part of the... incorporation-by-reference of the Sequence Listing as required by § 1.52(e)(5). The presentation of the...

  10. 37 CFR 1.823 - Requirements for nucleotide and/or amino acid sequences as part of the application.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... and/or amino acid sequences as part of the application. 1.823 Section 1.823 Patents, Trademarks, and... Amino Acid Sequences § 1.823 Requirements for nucleotide and/or amino acid sequences as part of the... incorporation-by-reference of the Sequence Listing as required by § 1.52(e)(5). The presentation of the...

  11. 37 CFR 1.823 - Requirements for nucleotide and/or amino acid sequences as part of the application.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... and/or amino acid sequences as part of the application. 1.823 Section 1.823 Patents, Trademarks, and... Amino Acid Sequences § 1.823 Requirements for nucleotide and/or amino acid sequences as part of the... incorporation-by-reference of the Sequence Listing as required by § 1.52(e)(5). The presentation of the...

  12. 37 CFR 1.823 - Requirements for nucleotide and/or amino acid sequences as part of the application.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... and/or amino acid sequences as part of the application. 1.823 Section 1.823 Patents, Trademarks, and... Amino Acid Sequences § 1.823 Requirements for nucleotide and/or amino acid sequences as part of the... incorporation-by-reference of the Sequence Listing as required by § 1.52(e)(5). The presentation of the...

  13. Telomere extension by telomerase and ALT generates variant repeats by mechanistically distinct processes

    PubMed Central

    Lee, Michael; Hills, Mark; Conomos, Dimitri; Stutz, Michael D.; Dagg, Rebecca A.; Lau, Loretta M.S.; Reddel, Roger R.; Pickett, Hilda A.

    2014-01-01

    Telomeres are terminal repetitive DNA sequences on chromosomes, and are considered to comprise almost exclusively hexameric TTAGGG repeats. We have evaluated telomere sequence content in human cells using whole-genome sequencing followed by telomere read extraction in a panel of mortal cell strains and immortal cell lines. We identified a wide range of telomere variant repeats in human cells, and found evidence that variant repeats are generated by mechanistically distinct processes during telomerase- and ALT-mediated telomere lengthening. Telomerase-mediated telomere extension resulted in biased repeat synthesis of variant repeats that differed from the canonical sequence at positions 1 and 3, but not at positions 2, 4, 5 or 6. This indicates that telomerase is most likely an error-prone reverse transcriptase that misincorporates nucleotides at specific positions on the telomerase RNA template. In contrast, cell lines that use the ALT pathway contained a large range of variant repeats that varied greatly between lines. This is consistent with variant repeats spreading from proximal telomeric regions throughout telomeres in a stochastic manner by recombination-mediated templating of DNA synthesis. The presence of unexpectedly large numbers of variant repeats in cells utilizing either telomere maintenance mechanism suggests a conserved role for variant sequences at human telomeres. PMID:24225324
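    As a rough illustration of the kind of tally this abstract reports (variant repeats classified by the position at which they differ from TTAGGG), the following minimal sketch counts single-base variant hexamers per position. It assumes the read is already extracted and starts in frame with the repeat unit; the function name `variant_positions` and the example read are hypothetical and do not reproduce the authors' pipeline.

```python
from collections import Counter

CANONICAL = "TTAGGG"


def variant_positions(read):
    """Tally, per 1-based position, hexamers that differ from the
    canonical repeat at exactly that one position.

    Assumes `read` is in frame with the repeat; trailing bases that do
    not fill a whole hexamer are ignored.
    """
    tally = Counter()
    for start in range(0, len(read) - 5, 6):
        hexamer = read[start:start + 6]
        diffs = [
            pos + 1
            for pos, (base, ref) in enumerate(zip(hexamer, CANONICAL))
            if base != ref
        ]
        if len(diffs) == 1:  # keep only single-substitution variants
            tally[diffs[0]] += 1
    return tally


# Hypothetical read: one canonical repeat, then variants at positions 1 and 3.
example_tally = variant_positions("TTAGGG" + "GTAGGG" + "TTGGGG")
```

    Real reads are rarely in frame and may contain insertions or deletions, so a practical analysis would need alignment or frame detection before a tally like this is meaningful.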

  14. Common and rare variants associated with kidney stones and biochemical traits.

    PubMed

    Oddsson, Asmundur; Sulem, Patrick; Helgason, Hannes; Edvardsson, Vidar O; Thorleifsson, Gudmar; Sveinbjörnsson, Gardar; Haraldsdottir, Eik; Eyjolfsson, Gudmundur I; Sigurdardottir, Olof; Olafsson, Isleifur; Masson, Gisli; Holm, Hilma; Gudbjartsson, Daniel F; Thorsteinsdottir, Unnur; Indridason, Olafur S; Palsson, Runolfur; Stefansson, Kari

    2015-08-14

    Kidney stone disease is a complex disorder with a strong genetic component. We conducted a genome-wide association study of 28.3 million sequence variants detected through whole-genome sequencing of 2,636 Icelanders that were imputed into 5,419 kidney stone cases, including 2,172 cases with a history of recurrent kidney stones, and 279,870 controls. We identify sequence variants associating with kidney stones at ALPL (rs1256328[T], odds ratio (OR)=1.21, P=5.8 × 10(-10)) and a suggestive association at CASR (rs7627468[A], OR=1.16, P=2.0 × 10(-8)). Focusing our analysis on coding sequence variants in 63 genes with preferential kidney expression, we identify two rare missense variants, SLC34A1 p.Tyr489Cys (OR=2.38, P=2.8 × 10(-5)) and TRPV5 p.Leu530Arg (OR=3.62, P=4.1 × 10(-5)), associating with recurrent kidney stones. We also observe associations of the identified kidney stone variants with biochemical traits in a large population set, indicating a potential biological mechanism.

  15. Methods for engineering polypeptide variants via somatic hypermutation and polypeptide made thereby

    DOEpatents

    Tsien, Roger Y; Wang, Lei

    2015-01-13

    Methods using somatic hypermutation (SHM) for producing polypeptide and nucleic acid variants, and nucleic acids encoding such polypeptide variants are disclosed. Such variants may have desired properties. Also disclosed are novel polypeptides, such as improved fluorescent proteins, produced by the novel methods, and nucleic acids, vectors, and host cells comprising such vectors.

  16. Nucleotide sequence analysis of the gene encoding the Deinococcus radiodurans surface protein, derived amino acid sequence, and complementary protein chemical studies

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Peters, J.; Peters, M.; Lottspeich, F.

    1987-11-01

    The complete nucleotide sequence of the gene encoding the surface (hexagonally packed intermediate (HPI))-layer polypeptide of Deinococcus radiodurans Sark was determined and found to encode a polypeptide of 1036 amino acids. Amino acid sequence analysis of about 30% of the residues revealed that the mature polypeptide consists of at least 978 amino acids. The N terminus was blocked to Edman degradation. The results of proteolytic modification of the HPI layer in situ and M_r estimations of the HPI polypeptide expressed in Escherichia coli indicated that there is a leader sequence. The N-terminal region contained a very high percentage (29%) of threonine and serine, including a cluster of nine consecutive serine or threonine residues, whereas a stretch near the C terminus was extremely rich in aromatic amino acids (29%). The protein contained at least two disulfide bridges, as well as tightly bound reducing sugars and fatty acids.

  17. Complex analysis of urate transporters SLC2A9, SLC22A12 and functional characterization of non-synonymous allelic variants of GLUT9 in the Czech population: no evidence of effect on hyperuricemia and gout.

    PubMed

    Hurba, Olha; Mancikova, Andrea; Krylov, Vladimir; Pavlikova, Marketa; Pavelka, Karel; Stibůrková, Blanka

    2014-01-01

    Using European descent Czech populations, we performed a study of SLC2A9 and SLC22A12 genes previously identified as being associated with serum uric acid concentrations and gout. This is the first study of the impact of non-synonymous allelic variants on the function of GLUT9 except for patients suffering from renal hypouricemia type 2. The cohort consisted of 250 individuals (150 controls, 54 nonspecific hyperuricemics and 46 primary gout and/or hyperuricemia subjects). We analyzed 13 exons of SLC2A9 (GLUT9 variant 1 and GLUT9 variant 2) and 10 exons of SLC22A12 by PCR amplification and direct sequencing. Allelic variants were prepared and their urate uptake and subcellular localization were studied using a Xenopus oocyte expression system. The functional studies were analyzed using the non-parametric Wilcoxon and Kruskal-Wallis tests; the association study used the Fisher exact test and a linear regression approach. We identified a total of 52 sequence variants (12 unpublished). Eight non-synonymous allelic variants were found only in SLC2A9: rs6820230, rs2276961, rs144196049, rs112404957, rs73225891, rs16890979, rs3733591 and rs2280205. None of these variants showed any significant difference in the expression of GLUT9 and in urate transport. In the association study, eight variants showed a possible association with hyperuricemia. However, seven of these were in introns and the one exon-located variant, rs7932775, did not show a statistically significant association with serum uric acid concentration. Our results did not confirm any effect of SLC22A12 and SLC2A9 variants on serum uric acid concentration. Our complex approach using association analysis together with functional and immunohistochemical characterization of non-synonymous allelic variants did not show any influence on expression, subcellular localization and urate uptake of GLUT9.

  18. Fibrinogen Lincoln: a new truncated alpha chain variant with delayed clotting.

    PubMed

    Ridgway, H J; Brennan, S O; Gibbons, S; George, P M

    1996-04-01

    A patient referred for preoperative investigation of prolonged bleeding and easy bruising was found to have increased thrombin and reptilase times; however, the thrombin-catalysed release of fibrinopeptides A and B was normal. Analysis of five other family members, spanning three generations, indicated that three had a similar defect and suggested autosomal dominant inheritance. Non-reducing SDS-PAGE of purified fibrinogen from affected individuals showed that the 340 kD form of their fibrinogen ran as a doublet. SSCP (single-stranded conformational polymorphism) analysis of exon 5 of the A alpha gene, which encodes the C-terminal half of the chain, confirmed the presence of a mutation. Cycle sequencing of PCR-amplified DNA revealed a 13 base pair deletion (nt 4758-4770), resulting in a frame-shift at Ala 475, which translates as four new amino acids before terminating at a new stop codon (-476His-Cys-Leu-Ala-Stop). The presence of a circulating truncated A alpha chain was confirmed when SDS-PAGE gels were probed with an alpha chain-specific antiserum, which showed that the variant A alpha chain comigrated with gamma chains. The truncation results in a variant A alpha chain with a deletion of 131 amino acids (480-610), and four new amino acids at the C-terminal.

  19. Rare variant associations with waist-to-hip ratio in European-American and African-American women from the NHLBI-Exome Sequencing Project

    PubMed Central

    Kan, Mengyuan; Auer, Paul L; Wang, Gao T; Bucasas, Kristine L; Hooker, Stanley; Rodriguez, Alejandra; Li, Biao; Ellis, Jaclyn; Adrienne Cupples, L; Ida Chen, Yii-Der; Dupuis, Josée; Fox, Caroline S; Gross, Myron D; Smith, Joshua D; Heard-Costa, Nancy; Meigs, James B; Pankow, James S; Rotter, Jerome I; Siscovick, David; Wilson, James G; Shendure, Jay; Jackson, Rebecca; Peters, Ulrike; Zhong, Hua; Lin, Danyu; Hsu, Li; Franceschini, Nora; Carlson, Chris; Abecasis, Goncalo; Gabriel, Stacey; Bamshad, Michael J; Altshuler, David; Nickerson, Deborah A; North, Kari E; Lange, Leslie A; Reiner, Alexander P; Leal, Suzanne M

    2016-01-01

    Waist-to-hip ratio (WHR), a relative comparison of waist and hip circumferences, is an easily accessible measurement of body fat distribution, in particular central abdominal fat. A high WHR indicates more intra-abdominal fat deposition and is an established risk factor for cardiovascular disease and type 2 diabetes. Recent genome-wide association studies have identified numerous common genetic loci influencing WHR, but the contributions of rare variants have not been previously reported. We investigated rare variant associations with WHR in 1510 European-American and 1186 African-American women from the National Heart, Lung, and Blood Institute-Exome Sequencing Project. Association analysis was performed on the gene level using several rare variant association methods. The strongest association was observed for rare variants in IKBKB (P=4.0 × 10(-8)) in European-Americans, where rare variants in this gene are predicted to decrease WHRs. The activation of the IKBKB gene is involved in inflammatory processes and insulin resistance, which may affect normal food intake and body weight and shape. Meanwhile, aggregation of rare variants in COBLL1, previously found to harbor common variants associated with WHR and fasting insulin, was nominally associated (P=2.23 × 10(-4)) with higher WHR in European-Americans. However, these significant results are not shared between African-Americans and European-Americans, which may be due to differences in the allelic architecture of the two populations and the small sample sizes. Our study indicates that the combined effect of rare variants contributes to the inter-individual variation in fat distribution through the regulation of insulin response. PMID:26757982

  20. HUMAN LIVER FATTY ACID BINDING PROTEIN (L-FABP) T94A VARIANT ALTERS STRUCTURE, STABILITY, AND INTERACTION WITH FIBRATES

    PubMed Central

    Martin, Gregory G.; McIntosh, Avery L.; Huang, Huan; Gupta, Shipra; Atshaves, Barbara P.; Landrock, Kerstin K.; Landrock, Danilo; Kier, Ann B.; Schroeder, Friedhelm

    2014-01-01

    Although the human L-FABP T94A variant arises from the most commonly occurring SNP in the entire FABP family, there is a complete lack of understanding regarding the role of this polymorphism in human disease. It has been hypothesized that the T94A substitution results in complete loss of ligand binding ability and function analogous to L-FABP gene ablation. This possibility was addressed using recombinant human WT T94T and T94A variant L-FABP and cultured primary human hepatocytes. Non-conservative replacement of the medium sized, polar, uncharged T residue by a smaller, nonpolar, aliphatic A residue at position 94 of human L-FABP significantly increased L-FABP protein α-helical structure at the expense of β-sheet and concomitantly decreased thermal stability. T94A did not alter binding affinities for PPARα agonist ligands (phytanic acid, fenofibrate, fenofibric acid). While T94A did not alter the impact of phytanic acid and only slightly altered that of fenofibrate on human L-FABP secondary structure, the active metabolite fenofibric acid altered T94A secondary structure much more than that of WT T94T L-FABP. Finally, in cultured primary human hepatocytes the T94A variant exhibited significantly reduced fibrate-mediated induction of PPARα-regulated proteins such as L-FABP, FATP5, and PPARα itself. Thus, while T94A substitution did not alter the affinity of human L-FABP for PPARα agonist ligands, it significantly altered human L-FABP structure, stability, as well as conformational and functional response to fibrate. PMID:24299557

  1. Method of generating polynucleotides encoding enhanced folding variants

    DOEpatents

    Bradbury, Andrew M.; Kiss, Csaba; Waldo, Geoffrey S.

    2017-05-02

    The invention provides directed evolution methods for improving the folding, solubility and stability (including thermostability) characteristics of polypeptides. In one aspect, the invention provides a method for generating folding and stability-enhanced variants of proteins, including but not limited to fluorescent proteins, chromophoric proteins and enzymes. In another aspect, the invention provides methods for generating thermostable variants of a target protein or polypeptide via an internal destabilization baiting strategy. Internal destabilization of a protein of interest is achieved by inserting a heterologous, folding-destabilizing sequence (folding interference domain) within DNA encoding the protein of interest, then evolving the protein sequences adjacent to the heterologous insertion to overcome the destabilization (using any number of mutagenesis methods), thereby creating a library of variants. The variants in the library are expressed, and those with enhanced folding characteristics are selected.

  2. Structure and genetic variability of envelope glycoproteins of two antigenic variants of caprine arthritis-encephalitis lentivirus.

    PubMed

    Knowles, D P; Cheevers, W P; McGuire, T C; Brassfield, A L; Harwood, W G; Stem, T A

    1991-11-01

    To define the structure of the caprine arthritis-encephalitis virus (CAEV) env gene and characterize genetic changes which occur during antigenic variation, we sequenced the env genes of CAEV-63 and CAEV-Co, two antigenic variants of CAEV defined by serum neutralization. The deduced primary translation product of the CAEV env gene consists of a 60- to 80-amino-acid signal peptide followed by an amino-terminal surface protein (SU) and a carboxy-terminal transmembrane protein (TM) separated by an Arg-Lys-Lys-Arg cleavage site. The signal peptide cleavage site was verified by amino-terminal amino acid sequencing of native CAEV-63 SU. In addition, immunoprecipitation of [35S]methionine-labeled CAEV-63 proteins by sera from goats immunized with recombinant vaccinia virus expressing the CAEV-63 env gene confirmed that antibodies induced by env-encoded recombinant proteins react specifically with native virion SU and TM. The env genes of CAEV-63 and CAEV-Co encode 28 conserved cysteines and 25 conserved potential N-linked glycosylation sites. Nucleotide sequence variability results in 62 amino acid changes and one deletion within the SU and 34 amino acid changes within the TM.

  3. Structure and genetic variability of envelope glycoproteins of two antigenic variants of caprine arthritis-encephalitis lentivirus.

    PubMed Central

    Knowles, D P; Cheevers, W P; McGuire, T C; Brassfield, A L; Harwood, W G; Stem, T A

    1991-01-01

    To define the structure of the caprine arthritis-encephalitis virus (CAEV) env gene and characterize genetic changes which occur during antigenic variation, we sequenced the env genes of CAEV-63 and CAEV-Co, two antigenic variants of CAEV defined by serum neutralization. The deduced primary translation product of the CAEV env gene consists of a 60- to 80-amino-acid signal peptide followed by an amino-terminal surface protein (SU) and a carboxy-terminal transmembrane protein (TM) separated by an Arg-Lys-Lys-Arg cleavage site. The signal peptide cleavage site was verified by amino-terminal amino acid sequencing of native CAEV-63 SU. In addition, immunoprecipitation of [35S]methionine-labeled CAEV-63 proteins by sera from goats immunized with recombinant vaccinia virus expressing the CAEV-63 env gene confirmed that antibodies induced by env-encoded recombinant proteins react specifically with native virion SU and TM. The env genes of CAEV-63 and CAEV-Co encode 28 conserved cysteines and 25 conserved potential N-linked glycosylation sites. Nucleotide sequence variability results in 62 amino acid changes and one deletion within the SU and 34 amino acid changes within the TM. PMID:1656067

  4. Association of low-frequency and rare coding-sequence variants with blood lipids and Coronary Heart Disease in 56,000 whites and blacks

    USDA-ARS's Scientific Manuscript database

    Low-frequency coding DNA sequence variants in the proprotein convertase subtilisin/kexin type 9 gene (PCSK9) lower plasma low-density lipoprotein cholesterol (LDL-C), protect against risk of coronary heart disease (CHD), and have prompted the development of a new class of therapeutics. It is uncerta...

  5. Identification of pathogenic gene variants in small families with intellectually disabled siblings by exome sequencing.

    PubMed

    Schuurs-Hoeijmakers, Janneke H M; Vulto-van Silfhout, Anneke T; Vissers, Lisenka E L M; van de Vondervoort, Ilse I G M; van Bon, Bregje W M; de Ligt, Joep; Gilissen, Christian; Hehir-Kwa, Jayne Y; Neveling, Kornelia; del Rosario, Marisol; Hira, Gausiya; Reitano, Santina; Vitello, Aurelio; Failla, Pinella; Greco, Donatella; Fichera, Marco; Galesi, Ornella; Kleefstra, Tjitske; Greally, Marie T; Ockeloen, Charlotte W; Willemsen, Marjolein H; Bongers, Ernie M H F; Janssen, Irene M; Pfundt, Rolph; Veltman, Joris A; Romano, Corrado; Willemsen, Michèl A; van Bokhoven, Hans; Brunner, Han G; de Vries, Bert B A; de Brouwer, Arjan P M

    2013-12-01

    Intellectual disability (ID) is a common neurodevelopmental disorder affecting 1-3% of the general population. Mutations in more than 10% of all human genes are considered to be involved in this disorder, although the majority of these genes are still unknown. We investigated 19 small non-consanguineous families with two to five affected siblings in order to identify pathogenic gene variants in known, novel and potential ID candidate genes. Non-consanguineous families have been largely ignored in gene identification studies as small family size precludes prior mapping of the genetic defect. Using exome sequencing, we identified pathogenic mutations in three genes, DDHD2, SLC6A8, and SLC9A6, of which the latter two have previously been implicated in X-linked ID phenotypes. In addition, we identified potentially pathogenic mutations in BCORL1 on the X-chromosome and in MCM3AP, PTPRT, SYNE1, and ZNF528 on autosomes. We show that potentially pathogenic gene variants can be identified in small, non-consanguineous families with as few as two affected siblings, thus emphasising their value in the identification of syndromic and non-syndromic ID genes.

  6. GTRAC: fast retrieval from compressed collections of genomic variants

    PubMed Central

    Tatwawadi, Kedar; Hernaez, Mikel; Ochoa, Idoia; Weissman, Tsachy

    2016-01-01

    Motivation: The dramatic decrease in the cost of sequencing has resulted in the generation of huge amounts of genomic data, as evidenced by projects such as the UK10K and the Million Veteran Project, with the number of sequenced genomes ranging in the order of 10 K to 1 M. Due to the large redundancies among genomic sequences of individuals from the same species, most of the medical research deals with the variants in the sequences as compared with a reference sequence, rather than with the complete genomic sequences. Consequently, millions of genomes represented as variants are stored in databases. These databases are constantly updated and queried to extract information such as the common variants among individuals or groups of individuals. Previous algorithms for compression of this type of databases lack efficient random access capabilities, rendering querying the database for particular variants and/or individuals extremely inefficient, to the point where compression is often relinquished altogether. Results: We present a new algorithm for this task, called GTRAC, that achieves significant compression ratios while allowing fast random access over the compressed database. For example, GTRAC is able to compress a Homo sapiens dataset containing 1092 samples in 1.1 GB (compression ratio of 160), while allowing for decompression of specific samples in less than a second and decompression of specific variants in 17 ms. GTRAC uses and adapts techniques from information theory, such as a specialized Lempel-Ziv compressor, and tailored succinct data structures. Availability and Implementation: The GTRAC algorithm is available for download at: https://github.com/kedartatwawadi/GTRAC Contact: kedart@stanford.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27587665
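
    The access pattern described above (genomes stored as variants against a reference, then queried per sample or per variant) can be pictured with a toy in-memory store. This is only an illustrative sketch and not the GTRAC algorithm: it does no compression, and the class name VariantStore and its methods are hypothetical.

```python
from typing import Dict, Iterable, List, Set


class VariantStore:
    """Toy store: each sample is kept as the set of variant-site indices it carries."""

    def __init__(self) -> None:
        self._by_sample: Dict[str, Set[int]] = {}

    def add(self, sample: str, sites: Iterable[int]) -> None:
        # Record which variant sites differ from the reference in this sample.
        self._by_sample.setdefault(sample, set()).update(sites)

    def sample_variants(self, sample: str) -> List[int]:
        # "Decompress" one sample: all sites where it differs from the reference.
        return sorted(self._by_sample[sample])

    def carriers(self, site: int) -> List[str]:
        # "Decompress" one variant: all samples that carry it.
        return sorted(s for s, sites in self._by_sample.items() if site in sites)

    def common(self, samples: Iterable[str]) -> List[int]:
        # Variants shared by every sample in the group.
        sets = [self._by_sample[s] for s in samples]
        return sorted(set.intersection(*sets)) if sets else []


store = VariantStore()
store.add("s1", [3, 7, 9])
store.add("s2", [7, 9, 12])
print(store.carriers(7), store.common(["s1", "s2"]))
```

    A real compressed index would have to answer the same two kinds of query without expanding the whole database, which is the property the abstract emphasises.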

  7. Settling the score: variant prioritization and Mendelian disease

    PubMed Central

    Eilbeck, Karen; Quinlan, Aaron; Yandell, Mark

    2018-01-01

    When investigating Mendelian disease using exome or genome sequencing, distinguishing disease-causing genetic variants from the multitude of candidate variants is a complex, multidimensional task. Many prioritization tools and online interpretation resources exist, and professional organizations have offered clinical guidelines for review and return of prioritization results. In this Review, we describe the strengths and weaknesses of widely used computational approaches, explain their roles in the diagnostic and discovery process and discuss how they can inform (and misinform) expert reviewers. We place variant prioritization in the wider context of gene prioritization, burden testing and genotype–phenotype association, and we discuss opportunities and challenges introduced by whole-genome sequencing. PMID:28804138

  8. A Bioinformatics Workflow for Variant Peptide Detection in Shotgun Proteomics*

    PubMed Central

    Li, Jing; Su, Zengliu; Ma, Ze-Qiang; Slebos, Robbert J. C.; Halvey, Patrick; Tabb, David L.; Liebler, Daniel C.; Pao, William; Zhang, Bing

    2011-01-01

    Shotgun proteomics data analysis usually relies on database search. However, commonly used protein sequence databases do not contain information on protein variants and thus prevent variant peptides and proteins from being identified. Including known coding variations into protein sequence databases could help alleviate this problem. Based on our recently published human Cancer Proteome Variation Database, we have created a protein sequence database that comprehensively annotates thousands of cancer-related coding variants collected in the Cancer Proteome Variation Database as well as noncancer-specific ones from the Single Nucleotide Polymorphism Database (dbSNP). Using this database, we then developed a data analysis workflow for variant peptide identification in shotgun proteomics. The high risk of false positive variant identifications was addressed by a modified false discovery rate estimation method. Analysis of colorectal cancer cell lines SW480, RKO, and HCT-116 revealed a total of 81 peptides that contain either noncancer-specific or cancer-related variations. Twenty-three out of 26 variants randomly selected from the 81 were confirmed by genomic sequencing. We further applied the workflow to data sets from three individual colorectal tumor specimens. A total of 204 distinct variant peptides were detected, and five carried known cancer-related mutations. Each individual showed a specific pattern of cancer-related mutations, suggesting potential use of this type of information for personalized medicine. Compatibility of the workflow has been tested with four popular database search engines including Sequest, Mascot, X!Tandem, and MyriMatch. In summary, we have developed a workflow that effectively uses existing genomic data to enable variant peptide detection in proteomics. PMID:21389108
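
    The idea of adding known coding variants to a search database can be sketched as follows: apply an amino acid substitution to a protein sequence, digest both versions in silico, and keep the peptides unique to the variant. This is a minimal sketch, not the authors' workflow; the example sequence, the function names, and the simple trypsin rule (cleave after K or R unless followed by P, no missed cleavages) are assumptions made for illustration.

```python
import re
from typing import List


def apply_substitution(protein: str, pos: int, ref: str, alt: str) -> str:
    """Return the protein with residue `pos` (1-based) changed from ref to alt."""
    if protein[pos - 1] != ref:
        raise ValueError(f"expected {ref} at position {pos}, found {protein[pos - 1]}")
    return protein[: pos - 1] + alt + protein[pos:]


def tryptic_peptides(protein: str) -> List[str]:
    """Split after K or R unless the next residue is P (no missed cleavages)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def variant_only_peptides(protein: str, pos: int, ref: str, alt: str) -> List[str]:
    """Peptides produced by the variant protein but not by the reference."""
    reference = set(tryptic_peptides(protein))
    variant = tryptic_peptides(apply_substitution(protein, pos, ref, alt))
    return [p for p in variant if p not in reference]


# Hypothetical sequence and substitution (Y5C), for illustration only.
example = "MKTAYIAKQRQISFVK"
print(variant_only_peptides(example, 5, "Y", "C"))
```

    Only the variant-specific peptides need to be appended to the search database, since every other peptide is already covered by the reference entry.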

  9. Positional bias in variant calls against draft reference assemblies.

    PubMed

    Briskine, Roman V; Shimizu, Kentaro K

    2017-03-28

    Whole genome resequencing projects may implement variant calling using draft reference genomes assembled de novo from short-read libraries. Despite the lower quality of such assemblies, they have allowed researchers to extend a wide range of population genetic and genome-wide association analyses to non-model species. As the variant calling pipelines are complex and involve many software packages, it is important to understand inherent biases and limitations at each step of the analysis. In this article, we report a positional bias present in variant calling performed against draft reference assemblies constructed from de Bruijn or string overlap graphs. We assessed how frequently variants appeared at each position counted from the ends of a contig or scaffold sequence, and discovered an unexpectedly high number of variants at the positions related to the length of either k-mers or reads used for the assembly. We detected the bias both in publicly available draft assemblies from the Assemblathon 2 competition and in the assemblies we generated from our simulated short-read data. Simulations confirmed that the bias-causing variants are predominantly false positives induced by reads from spatially distant repeated sequences. The bias is particularly strong in contig assemblies. Scaffolding does not eliminate the bias but tends to mitigate it because of the changes in variants' relative positions and alterations in read alignments. The bias can be effectively reduced by filtering out the variants that reside in repetitive elements. Draft genome sequences generated by several popular assemblers appear to be susceptible to the positional bias, potentially affecting many resequencing projects in non-model species. The bias is inherent to the assembly algorithms and arises from their particular handling of repeated sequences. It is recommended to reduce the bias by filtering, especially if a higher-quality genome assembly cannot be achieved. Our findings can help other researchers to
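
    The measurement described in this record, counting how often variants fall at each position from the ends of a contig or scaffold, can be sketched as a simple histogram. This is an illustrative sketch under assumed inputs (1-based variant positions and a dictionary of contig lengths); the function name is hypothetical and it is not the authors' code.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def end_distance_histogram(
    variants: Iterable[Tuple[str, int]],
    contig_lengths: Dict[str, int],
) -> Counter:
    """Count variants by their 1-based distance from the nearest contig end."""
    hist: Counter = Counter()
    for contig, pos in variants:
        length = contig_lengths[contig]
        # Position counted from the left end is `pos`; from the right end it is
        # `length - pos + 1`. Keep whichever end is closer.
        hist[min(pos, length - pos + 1)] += 1
    return hist


# Toy data: two variants sit 31 bases from an end of a 100-base contig.
lengths = {"contig1": 100}
calls = [("contig1", 31), ("contig1", 70), ("contig1", 50)]
print(end_distance_histogram(calls, lengths).most_common(1))
```

    A peak in such a histogram at a distance close to the k-mer or read length would be the kind of signal the abstract reports.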

  10. Amino acid sequence of tyrosinase from Neurospora crassa.

    PubMed Central

    Lerch, K

    1978-01-01

    The amino-acid sequence of tyrosinase from Neurospora crassa (monophenol,dihydroxyphenylalanine:oxygen oxidoreductase, EC 1.14.18.1) is reported. This copper-containing oxidase consists of a single polypeptide chain of 407 amino acids. The primary structure was determined by automated and manual sequence analysis on fragments produced by cleavage with cyanogen bromide and on peptides obtained by digestion with trypsin, pepsin, thermolysin, or chymotrypsin. The amino terminus of the protein is acetylated and the single cysteinyl residue 96 is covalently linked via a thioether bridge to histidyl residue 94. The formation and the possible role of this unusual structure in Neurospora tyrosinase is discussed. Dye-sensitized photooxidation of apotyrosinase and active-site-directed inactivation of the native enzyme indicate the possible involvement of histidyl residues 188, 192, 289, and 305 or 306 as ligands to the active-site copper as well as in the catalytic mechanism of this monooxygenase. PMID:151279

  11. Identification of a Novel De Novo Variant in the PAX3 Gene in Waardenburg Syndrome by Diagnostic Exome Sequencing: The First Molecular Diagnosis in Korea.

    PubMed

    Jang, Mi-Ae; Lee, Taeheon; Lee, Junnam; Cho, Eun-Hae; Ki, Chang-Seok

    2015-05-01

    Waardenburg syndrome (WS) is a clinically and genetically heterogeneous hereditary auditory pigmentary disorder characterized by congenital sensorineural hearing loss and iris discoloration. Many genes have been linked to WS, including PAX3, MITF, SNAI2, EDNRB, EDN3, and SOX10, and many additional genes have been associated with disorders with phenotypic overlap with WS. To screen all possible genes associated with WS and congenital deafness simultaneously, we performed diagnostic exome sequencing (DES) in a male patient with clinical features consistent with WS. Using DES, we identified a novel missense variant (c.220C>G; p.Arg74Gly) in exon 2 of the PAX3 gene in the patient. Further analysis by Sanger sequencing of the patient and his parents revealed a de novo occurrence of the variant. Our findings show that DES can be a useful tool for the identification of pathogenic gene variants in WS patients and for differentiation between WS and similar disorders. To the best of our knowledge, this is the first report of genetically confirmed WS in Korea.
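
    The paired notation c.220C>G; p.Arg74Gly can be checked with a little arithmetic: under the HGVS convention, coding position 1 is the A of the ATG start codon, so coding position 220 is the first base of codon 74. The sketch below illustrates that check; the function names are hypothetical, and the PAX3 reference codon itself is not given in the abstract, so the code only shows that every CGN arginine codon becomes a glycine codon when its first base changes to G.

```python
from typing import Tuple


def codon_of_coding_position(c_pos: int) -> Tuple[int, int]:
    """Map a 1-based coding position to (codon number, position within codon)."""
    if c_pos < 1:
        raise ValueError("coding positions start at 1")
    return (c_pos - 1) // 3 + 1, (c_pos - 1) % 3 + 1


def substitute(codon: str, pos_in_codon: int, alt: str) -> str:
    """Replace the base at the given 1-based position within a codon."""
    return codon[: pos_in_codon - 1] + alt + codon[pos_in_codon:]


ARG_CGN = {"CGT", "CGC", "CGA", "CGG"}   # arginine codons beginning with C
GLY = {"GGT", "GGC", "GGA", "GGG"}       # all glycine codons

codon_number, offset = codon_of_coding_position(220)
print(codon_number, offset)
print(all(substitute(c, offset, "G") in GLY for c in ARG_CGN))
```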

  12. Identification of a Novel De Novo Variant in the PAX3 Gene in Waardenburg Syndrome by Diagnostic Exome Sequencing: The First Molecular Diagnosis in Korea

    PubMed Central

    Jang, Mi-Ae; Lee, Taeheon; Lee, Junnam

    2015-01-01

    Waardenburg syndrome (WS) is a clinically and genetically heterogeneous hereditary auditory pigmentary disorder characterized by congenital sensorineural hearing loss and iris discoloration. Many genes have been linked to WS, including PAX3, MITF, SNAI2, EDNRB, EDN3, and SOX10, and many additional genes have been associated with disorders with phenotypic overlap with WS. To screen all possible genes associated with WS and congenital deafness simultaneously, we performed diagnostic exome sequencing (DES) in a male patient with clinical features consistent with WS. Using DES, we identified a novel missense variant (c.220C>G; p.Arg74Gly) in exon 2 of the PAX3 gene in the patient. Further analysis by Sanger sequencing of the patient and his parents revealed a de novo occurrence of the variant. Our findings show that DES can be a useful tool for the identification of pathogenic gene variants in WS patients and for differentiation between WS and similar disorders. To the best of our knowledge, this is the first report of genetically confirmed WS in Korea. PMID:25932447

  13. Genetic variants in desaturase gene, erythrocyte fatty acids, and risk for type 2 diabetes in Chinese Hans.

    PubMed

    Huang, Tao; Sun, Jianqin; Chen, Yanqiu; Xie, Hua; Xu, Danfeng; Huang, Jinyan; Li, Duo

    2014-01-01

    The aim of this study was to examine the association of the genetic variants in the fatty acid desaturase (FADS) gene cluster with erythrocyte phospholipid fatty acids (PLFA), and their relation to risk for type 2 diabetes mellitus (T2DM) in Han Chinese. Seven hundred and fifty-eight patients with T2DM and 400 healthy individuals were recruited. The erythrocyte PLFA and single-nucleotide polymorphism were determined by standard method. Minor allele homozygotes and heterozygotes of rs174575 and rs174537 had lower PL 20:4 ω-6 levels in healthy individuals. Minor allele homozygotes and heterozygotes of rs174455 in FADS3 gene had lower levels of 22:5 ω-3, 20:4 ω-6, and Δ5desaturase activity in patients with T2DM. Erythrocyte membrane PL 18:3 ω-3 (P for trend = 0.002), 22:5 ω-3 (P for trend < 0.001), ω-3 polyunsaturated fatty acid (P for trend < 0.001), and ω-3:ω-6 (P for trend < 0.001) were significantly inversely associated with risk for T2DM. Genetic variants in the FADS gene cluster are associated with altered erythrocyte PLFAs. High levels of PL 18:3 ω-3, 22:5 ω-3, and total ω-3 polyunsaturated fatty acid were associated with low risk for T2DM. Copyright © 2014 Elsevier Inc. All rights reserved.

  14. Fine-mapping the human leukocyte antigen locus in rheumatoid arthritis and other rheumatic diseases: identifying causal amino acid variants?

    PubMed

    van Heemst, Jurgen; Huizinga, Tom J W; van der Woude, Diane; Toes, René E M

    2015-05-01

    To provide an update on and the context of the recent findings obtained with novel statistical methods on the association of the human leukocyte antigen (HLA) locus with rheumatic diseases. Novel single nucleotide polymorphism fine-mapping data obtained for the HLA locus have indicated the strongest association with amino acid positions 11 and 13 of the HLA-DRB1 molecule for several rheumatic diseases. On the basis of these data, a dominant role for position 11/13 in driving the association with these diseases is proposed and the identification of causal variants in the HLA region in relation to disease susceptibility implicated. The HLA class II locus is the most important risk factor for several rheumatic diseases. Recently, new statistical approaches have identified previously unrecognized amino acid positions in the HLA-DR molecule that associate with anticitrullinated protein antibody-negative and anticitrullinated protein antibody-positive rheumatoid arthritis. Likewise, similar findings have been made for other rheumatic conditions such as giant-cell arteritis and systemic lupus erythematosus. Interestingly, all these studies point toward an association with the same amino acid positions: amino acid positions 11 and 13 of the HLA-DR β chain. As both these positions influence peptide binding by HLA-DR and have been implicated in antigen presentation, the novel fine-mapping approach is proposed to map causal variants in the HLA region relevant to rheumatoid arthritis and several rheumatic diseases. If these interpretations are correct, they would direct the biological research aiming to address the explanation for the HLA-disease association. Here, we provide an overview of the recent findings and of evidence from the literature indicating that, although relevant new insights have been obtained on HLA-disease associations, the interpretation of these amino acids as causal variants explaining such associations should be taken with caution.

  15. OVAS: an open-source variant analysis suite with inheritance modelling.

    PubMed

    Mozere, Monika; Tekman, Mehmet; Kari, Jameela; Bockenhauer, Detlef; Kleta, Robert; Stanescu, Horia

    2018-02-08

    The advent of modern high-throughput genetics continually broadens the gap between the rising volume of sequencing data and the tools required to process them. The need to pinpoint a small subset of functionally important variants has now shifted towards identifying the critical differences between normal variants and disease-causing ones. The ever-increasing reliance on cloud-based services for sequence analysis and the non-transparent methods they utilize has prompted the need for more in-situ services that can provide a safer and more accessible environment to process patient data, especially in circumstances where continuous internet usage is limited. To address these issues, we herein propose our standalone Open-source Variant Analysis Sequencing (OVAS) pipeline, consisting of three key stages of processing that pertain to the separate modes of annotation, filtering, and interpretation. Core annotation performs variant-mapping to gene-isoforms at the exon/intron level, appends functional data pertaining to the type of variant mutation, and determines hetero/homozygosity. An extensive inheritance-modelling module in conjunction with 11 other filtering components can be used in sequence ranging from single quality control to multi-file penetrance model specifics such as X-linked recessive or mosaicism. Depending on the type of interpretation required, additional annotation is performed to identify organ specificity through gene expression and protein domains. In the course of this paper we analysed an autosomal recessive case study. OVAS made effective use of the filtering modules to recapitulate the results of the study by identifying the prescribed compound-heterozygous disease pattern from exome-capture sequence input samples. OVAS is an offline open-source modular-driven analysis environment designed to annotate and extract useful variants from Variant Call Format (VCF) files, and process them under an inheritance context through a top-down filtering schema of
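
    The compound-heterozygous pattern mentioned in this record can be illustrated with a small trio filter: a gene qualifies when the affected child carries at least one heterozygous variant inherited only from the mother and at least one inherited only from the father. This is a simplified sketch and not OVAS code; the Call record, the VCF-style genotype strings, and the gene and variant names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Call(NamedTuple):
    gene: str
    variant: str
    child: str    # genotypes as VCF-style strings: "0/0", "0/1", "1/1"
    mother: str
    father: str


def compound_het_genes(calls: Iterable[Call]) -> Dict[str, List[str]]:
    """Return genes with maternally and paternally inherited het variants."""
    maternal: Dict[str, List[str]] = defaultdict(list)
    paternal: Dict[str, List[str]] = defaultdict(list)
    for c in calls:
        if c.child != "0/1":
            continue  # only heterozygous calls in the child are considered
        if c.mother == "0/1" and c.father == "0/0":
            maternal[c.gene].append(c.variant)
        elif c.father == "0/1" and c.mother == "0/0":
            paternal[c.gene].append(c.variant)
    # Keep genes hit on both parental haplotypes.
    return {g: sorted(maternal[g] + paternal[g]) for g in maternal if g in paternal}


trio_calls = [
    Call("GENE_A", "v1", "0/1", "0/1", "0/0"),
    Call("GENE_A", "v2", "0/1", "0/0", "0/1"),
    Call("GENE_B", "v3", "0/1", "0/1", "0/0"),
]
print(compound_het_genes(trio_calls))
```

    A production filter would also need to handle missing genotypes, de novo calls, and phasing, none of which this sketch attempts.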

  16. A novel variant in the SLC12A1 gene in two families with antenatal Bartter syndrome.

    PubMed

    Breinbjerg, Anders; Siggaard Rittig, Charlotte; Gregersen, Niels; Rittig, Søren; Hvarregaard Christensen, Jane

    2017-01-01

    Bartter syndrome is an autosomal-recessive inherited disease in which patients present with hypokalaemia and metabolic alkalosis. We present two apparently nonrelated cases with antenatal Bartter syndrome type I, due to a novel variant in the SLC12A1 gene encoding the bumetanide-sensitive sodium-(potassium)-chloride cotransporter 2 in the thick ascending limb of the loop of Henle. Blood samples were received from the two cases and 19 of their relatives, and deoxyribonucleic acid was extracted. The coding regions of the SLC12A1 gene were amplified using polymerase chain reaction, followed by bidirectional direct deoxyribonucleic acid sequencing. Each affected child in the two families was homozygous for a novel inherited variant in the SLC12A1 gene, c.1614T>A. The variant predicts a change from a tyrosine codon to a stop codon (p.Tyr538Ter). The two cases presented antenatally and at six months of age, respectively. The two cases were homozygous for the same variant in the SLC12A1 gene, but presented clinically at different ages. This could possibly be explained by the presence of other gene variants or environmental factors modifying the phenotypes. The phenotypes of the patients were similar to other patients with antenatal Bartter syndrome. ©2016 Foundation Acta Paediatrica. Published by John Wiley & Sons Ltd.

  17. PANTHER-PSEP: predicting disease-causing genetic variants using position-specific evolutionary preservation.

    PubMed

    Tang, Haiming; Thomas, Paul D

    2016-07-15

    PANTHER-PSEP is a new software tool for predicting non-synonymous genetic variants that may play a causal role in human disease. Several previous variant pathogenicity prediction methods have been proposed that quantify evolutionary conservation among homologous proteins from different organisms. PANTHER-PSEP employs a related but distinct metric based on 'evolutionary preservation': homologous proteins are used to reconstruct the likely sequences of ancestral proteins at nodes in a phylogenetic tree, and the history of each amino acid can be traced back in time from its current state to estimate how long that state has been preserved in its ancestors. Here, we describe the PSEP tool, and assess its performance on standard benchmarks for distinguishing disease-associated from neutral variation in humans. On these benchmarks, PSEP outperforms not only previous tools that utilize evolutionary conservation, but also several highly used tools that include multiple other sources of information as well. For predicting pathogenic human variants, the trace back of course starts with a human 'reference' protein sequence, but the PSEP tool can also be applied to predicting deleterious or pathogenic variants in reference proteins from any of the ∼100 other species in the PANTHER database. PANTHER-PSEP is freely available on the web at http://pantherdb.org/tools/csnpScoreForm.jsp Users can also download the command-line based tool at ftp://ftp.pantherdb.org/cSNP_analysis/PSEP/ CONTACT: pdthomas@usc.edu Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
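
    The trace-back idea in this record can be sketched as a walk from the present-day sequence towards the root, accumulating branch lengths for as long as the reconstructed ancestral amino acid matches the current one. This is a minimal illustration and not the PSEP implementation; the function name, the input layout, and the time units are assumptions.

```python
from typing import List, Tuple


def preservation_time(current_aa: str, lineage: List[Tuple[str, float]]) -> float:
    """Sum branch lengths back in time while the ancestral state is unchanged.

    `lineage` lists (reconstructed amino acid, branch length leading to that
    ancestor), ordered from the most recent ancestor to the root.
    """
    total = 0.0
    for ancestral_aa, branch_length in lineage:
        if ancestral_aa != current_aa:
            break  # the state changed here, so preservation ends
        total += branch_length
    return total


# Hypothetical lineage: R is preserved through two ancestors, then K appears.
path = [("R", 30.0), ("R", 60.0), ("K", 100.0), ("R", 200.0)]
print(preservation_time("R", path))
```

    Under this reading, a position preserved for a long time is one where a substitution is more likely to be damaging, which is the intuition the abstract describes.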

  18. SIBIS: a Bayesian model for inconsistent protein sequence estimation.

    PubMed

    Khenoussi, Walyd; Vanhoutrève, Renaud; Poch, Olivier; Thompson, Julie D

    2014-09-01

    The prediction of protein coding genes is a major challenge that depends on the quality of genome sequencing, the accuracy of the model used to elucidate the exonic structure of the genes and the complexity of the gene splicing process leading to different protein variants. As a consequence, today's protein databases contain a huge amount of inconsistency, due to both natural variants and sequence prediction errors. We have developed a new method, called SIBIS, to detect such inconsistencies based on the evolutionary information in multiple sequence alignments. A Bayesian framework, combined with Dirichlet mixture models, is used to estimate the probability of observing specific amino acids and to detect inconsistent or erroneous sequence segments. We evaluated the performance of SIBIS on a reference set of protein sequences with experimentally validated errors and showed that the sensitivity is significantly higher than previous methods, with only a small loss of specificity. We also assessed a large set of human sequences from the UniProt database and found evidence of inconsistency in 48% of the previously uncharacterized sequences. We conclude that the integration of quality control methods like SIBIS in automatic analysis pipelines will be critical for the robust inference of structural, functional and phylogenetic information from these sequences. Source code, implemented in C on a Linux system, and the datasets of protein sequences are freely available for download at http://www.lbgi.fr/~julie/SIBIS. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
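
    Estimating the probability of observing an amino acid in an alignment column with a Dirichlet mixture prior can be sketched with the standard posterior-mean formula: each component contributes (count + pseudocount) / (total + sum of pseudocounts), weighted by how well that component explains the observed counts. This is a generic textbook sketch, not the SIBIS source; the function names and the uniform single-component prior in the example are assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def _log_beta(alpha: Sequence[float]) -> float:
    # Log of the multivariate beta function B(alpha).
    return sum(math.lgamma(a) for a in alpha) - math.lgamma(sum(alpha))


def posterior_probs(
    column: str, components: List[Tuple[float, Sequence[float]]]
) -> Dict[str, float]:
    """Posterior mean amino acid probabilities for one alignment column.

    `components` is a list of (mixture weight, 20 Dirichlet pseudocounts).
    Characters outside the 20 standard amino acids (e.g. gaps) are ignored.
    """
    counts = [column.count(a) for a in AMINO_ACIDS]
    n = sum(counts)
    # Posterior weight of each component given the observed counts.
    log_w = [
        math.log(q) + _log_beta([c + a for c, a in zip(counts, alpha)]) - _log_beta(alpha)
        for q, alpha in components
    ]
    top = max(log_w)
    w = [math.exp(x - top) for x in log_w]
    z = sum(w)
    return {
        aa: sum(
            (wj / z) * (counts[i] + alpha[i]) / (n + sum(alpha))
            for wj, (_, alpha) in zip(w, components)
        )
        for i, aa in enumerate(AMINO_ACIDS)
    }


uniform_prior = [(1.0, [1.0] * 20)]
probs = posterior_probs("LLLLLLLLII", uniform_prior)
print(round(probs["L"], 3), round(probs["W"], 3))
```

    A residue whose estimated probability in its column is very low would be a candidate inconsistency; how SIBIS turns such probabilities into segment-level calls is not specified in the abstract.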

  19. A statistical method for the detection of variants from next-generation resequencing of DNA pools.

    PubMed

    Bansal, Vikas

    2010-06-15

    Next-generation sequencing technologies have enabled the sequencing of several human genomes in their entirety. However, the routine resequencing of complete genomes remains infeasible. The massive capacity of next-generation sequencers can be harnessed for sequencing specific genomic regions in hundreds to thousands of individuals. Sequencing-based association studies are currently limited by the low level of multiplexing offered by sequencing platforms. Pooled sequencing represents a cost-effective approach for studying rare variants in large populations. To utilize the power of DNA pooling, it is important to accurately identify sequence variants from pooled sequencing data. Detection of rare variants from pooled sequencing represents a different challenge than detection of variants from individual sequencing. We describe a novel statistical approach, CRISP [Comprehensive Read analysis for Identification of Single Nucleotide Polymorphisms (SNPs) from Pooled sequencing] that is able to identify both rare and common variants by using two approaches: (i) comparing the distribution of allele counts across multiple pools using contingency tables and (ii) evaluating the probability of observing multiple non-reference base calls due to sequencing errors alone. Information about the distribution of reads between the forward and reverse strands and the size of the pools is also incorporated within this framework to filter out false variants. Validation of CRISP on two separate pooled sequencing datasets generated using the Illumina Genome Analyzer demonstrates that it can detect 80-85% of SNPs identified using individual sequencing while achieving a low false discovery rate (3-5%). Comparison with previous methods for pooled SNP detection demonstrates the significantly lower false positive and false negative rates for CRISP. Implementation of this method is available at http://polymorphism.scripps.edu/~vbansal/software/CRISP/.
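
    The two ingredients named in this record can be illustrated with standard-library Python: (i) a contingency-table statistic comparing reference and alternate allele counts across pools, and (ii) the probability of seeing at least the observed number of non-reference base calls from sequencing errors alone. This is a simplified sketch and not the CRISP implementation: a Pearson chi-square statistic stands in for the contingency-table test, a fixed per-base error rate is assumed, and all function names are hypothetical.

```python
import math
from typing import List, Tuple


def chi_square_stat(pools: List[Tuple[int, int]]) -> float:
    """Pearson chi-square statistic for a pools x (ref, alt) count table."""
    total_ref = sum(r for r, _ in pools)
    total_alt = sum(a for _, a in pools)
    total = total_ref + total_alt
    stat = 0.0
    for ref, alt in pools:
        n = ref + alt
        for observed, column_total in ((ref, total_ref), (alt, total_alt)):
            expected = n * column_total / total
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat


def error_only_pvalue(n_reads: int, n_alt: int, error_rate: float) -> float:
    """P(at least n_alt non-reference calls among n_reads | errors only)."""
    return sum(
        math.comb(n_reads, k) * error_rate**k * (1 - error_rate) ** (n_reads - k)
        for k in range(n_alt, n_reads + 1)
    )


# Toy counts: the alternate allele appears in only one of two pools.
print(chi_square_stat([(100, 0), (90, 10)]))
print(error_only_pvalue(100, 10, 0.01))
```

    A large statistic in (i) suggests the allele is unevenly distributed across pools, while a small probability in (ii) suggests the non-reference calls are unlikely to be sequencing noise.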

  20. Rare Variants in RTEL1 Are Associated with Familial Interstitial Pneumonia

    PubMed Central

    Cogan, Joy D.; Zhao, Min; Mitchell, Daphne B.; Rives, Lynette; Markin, Cheryl; Garnett, Errine T.; Montgomery, Keri H.; Mason, Wendi R.; McKean, David F.; Powers, Julia; Murphy, Elissa; Olson, Lana M.; Choi, Leena; Cheng, Dong-Sheng; Blue, Elizabeth Marchani; Young, Lisa R.; Lancaster, Lisa H.; Steele, Mark P.; Brown, Kevin K.; Schwarz, Marvin I.; Fingerlin, Tasha E.; Schwartz, David A.; Lawson, William E.; Loyd, James E.; Zhao, Zhongming; Phillips, John A.; Blackwell, Timothy S.

    2015-01-01

    Rationale: Up to 20% of cases of idiopathic interstitial pneumonia cluster in families, comprising the syndrome of familial interstitial pneumonia (FIP); however, the genetic basis of FIP remains uncertain in most families. Objectives: To determine if new disease-causing rare genetic variants could be identified using whole-exome sequencing of affected members from FIP families, providing additional insights into disease pathogenesis. Methods: Affected subjects from 25 kindreds were selected from an ongoing FIP registry for whole-exome sequencing from genomic DNA. Candidate rare variants were confirmed by Sanger sequencing, and cosegregation analysis was performed in families, followed by additional sequencing of affected individuals from another 163 kindreds. Measurements and Main Results: We identified a potentially damaging rare variant in the gene encoding for regulator of telomere elongation helicase 1 (RTEL1) that segregated with disease and was associated with very short telomeres in peripheral blood mononuclear cells in 1 of 25 families in our original whole-exome sequencing cohort. Evaluation of affected individuals in 163 additional kindreds revealed another eight families (4.7%) with heterozygous rare variants in RTEL1 that segregated with clinical FIP. Probands and unaffected carriers of these rare variants had short telomeres (<10% for age) in peripheral blood mononuclear cells and increased T-circle formation, suggesting impaired RTEL1 function. Conclusions: Rare loss-of-function variants in RTEL1 represent a newly defined genetic predisposition for FIP, supporting the importance of telomere-related pathways in pulmonary fibrosis. PMID:25607374

  1. Rare variants in RTEL1 are associated with familial interstitial pneumonia.

    PubMed

    Cogan, Joy D; Kropski, Jonathan A; Zhao, Min; Mitchell, Daphne B; Rives, Lynette; Markin, Cheryl; Garnett, Errine T; Montgomery, Keri H; Mason, Wendi R; McKean, David F; Powers, Julia; Murphy, Elissa; Olson, Lana M; Choi, Leena; Cheng, Dong-Sheng; Blue, Elizabeth Marchani; Young, Lisa R; Lancaster, Lisa H; Steele, Mark P; Brown, Kevin K; Schwarz, Marvin I; Fingerlin, Tasha E; Schwartz, David A; Lawson, William E; Loyd, James E; Zhao, Zhongming; Phillips, John A; Blackwell, Timothy S

    2015-03-15

    Up to 20% of cases of idiopathic interstitial pneumonia cluster in families, comprising the syndrome of familial interstitial pneumonia (FIP); however, the genetic basis of FIP remains uncertain in most families. To determine if new disease-causing rare genetic variants could be identified using whole-exome sequencing of affected members from FIP families, providing additional insights into disease pathogenesis. Affected subjects from 25 kindreds were selected from an ongoing FIP registry for whole-exome sequencing from genomic DNA. Candidate rare variants were confirmed by Sanger sequencing, and cosegregation analysis was performed in families, followed by additional sequencing of affected individuals from another 163 kindreds. We identified a potentially damaging rare variant in the gene encoding for regulator of telomere elongation helicase 1 (RTEL1) that segregated with disease and was associated with very short telomeres in peripheral blood mononuclear cells in 1 of 25 families in our original whole-exome sequencing cohort. Evaluation of affected individuals in 163 additional kindreds revealed another eight families (4.7%) with heterozygous rare variants in RTEL1 that segregated with clinical FIP. Probands and unaffected carriers of these rare variants had short telomeres (<10% for age) in peripheral blood mononuclear cells and increased T-circle formation, suggesting impaired RTEL1 function. Rare loss-of-function variants in RTEL1 represent a newly defined genetic predisposition for FIP, supporting the importance of telomere-related pathways in pulmonary fibrosis.

  2. β-Globin gene sequencing of hemoglobin Austin revises the historically reported electrophoretic migration pattern.

    PubMed

    Racsa, Lori D; Luu, Hung S; Park, Jason Y; Mitui, Midori; Timmons, Charles F

    2014-06-01

    Hemoglobin (Hb) Austin was defined in 1977, using amino acid sequencing of samples from 3 unrelated Mexican-Americans, as a substitution of serine for arginine at position 40 of the β-globin chain (Arg40Ser). Its electrophoretic migration on both cellulose acetate (pH 8.4) and citrate agar (pH 6.2) was reported between Hb F and Hb A, and this description persists in reference literature. Objectives: To review the clinical features and redefine the diagnostic characteristics of Hb Austin. Eight samples from 6 unrelated individuals and 2 siblings, all with Hispanic surnames, were submitted for abnormal Hb identification between June 2010 and September 2011. High-performance liquid chromatography, isoelectric focusing (IEF), citrate agar electrophoresis, and bidirectional DNA sequencing of the entire β-globin gene were performed. DNA sequencing confirmed all 8 individuals to be heterozygous for Hb Austin (Arg40Ser). Retention time on high-performance liquid chromatography and migration on citrate agar electrophoresis were consistent with that identification. Migration on IEF, however, was not between Hb F and Hb A, as predicted from the report of cellulose acetate electrophoresis. By IEF, Hb Austin migrated anodal to ("faster than") Hb A. Hemoglobin Austin (Arg40Ser) appears on IEF as a "fast," anodally migrating, Hb variant, just as would be expected from its amino acid substitution. The cited historic report is, at best, not applicable to IEF and is probably erroneous. Our observation of 8 cases in 16 months suggests that this variant may be relatively common in some Hispanic populations, making its recognition important. Furthermore, gene sequencing is proving itself a powerful and reliable tool for definitive identification of Hb variants.

  3. ICO amplicon NGS data analysis: a Web tool for variant detection in common high-risk hereditary cancer genes analyzed by amplicon GS Junior next-generation sequencing.

    PubMed

    Lopez-Doriga, Adriana; Feliubadaló, Lídia; Menéndez, Mireia; Lopez-Doriga, Sergio; Morón-Duran, Francisco D; del Valle, Jesús; Tornero, Eva; Montes, Eva; Cuesta, Raquel; Campos, Olga; Gómez, Carolina; Pineda, Marta; González, Sara; Moreno, Victor; Capellá, Gabriel; Lázaro, Conxi

    2014-03-01

    Next-generation sequencing (NGS) has revolutionized genomic research and is set to have a major impact on genetic diagnostics thanks to the advent of benchtop sequencers and flexible kits for targeted libraries. Among the main hurdles in NGS are the difficulty of performing bioinformatic analysis of the huge volume of data generated and the high number of false positive calls that could be obtained, depending on the NGS technology and the analysis pipeline. Here, we present the development of a free and user-friendly Web data analysis tool that detects and filters sequence variants, provides coverage information, and allows the user to customize some basic parameters. The tool has been developed to provide accurate genetic analysis of targeted sequencing of common high-risk hereditary cancer genes using amplicon libraries run in a GS Junior System. The Web resource is linked to our own mutation database, to assist in the clinical classification of identified variants. We believe that this tool will greatly facilitate the use of the NGS approach in routine laboratories.

  4. The impact of FADS genetic variants on ω6 polyunsaturated fatty acid metabolism in African Americans

    PubMed Central

    2011-01-01

    Background: Arachidonic acid (AA) is a long-chain omega-6 polyunsaturated fatty acid (PUFA) synthesized from the precursor dihomo-gamma-linolenic acid (DGLA) that plays a vital role in immunity and inflammation. Variants in the Fatty Acid Desaturase (FADS) family of genes on chromosome 11q have been shown to play a role in PUFA metabolism in populations of European and Asian ancestry; no work has been done in populations of African ancestry to date. Results: In this study, we report that African Americans have significantly higher circulating levels of plasma AA (p = 1.35 × 10^-48) and lower DGLA levels (p = 9.80 × 10^-11) than European Americans. Tests for association in N = 329 individuals across 80 single nucleotide polymorphisms (SNPs) in the Fatty Acid Desaturase (FADS) locus revealed significant association with AA, DGLA and the AA/DGLA ratio, a measure of enzymatic efficiency, in both racial groups (peak signal p = 2.85 × 10^-16 in African Americans, 2.68 × 10^-23 in European Americans). Ancestry-related differences were observed at an upstream marker previously associated with AA levels (rs174537), wherein 79-82% of African Americans carry two copies of the G allele compared to only 42-45% of European Americans. Importantly, the allelic effect of the G allele, which is associated with enhanced conversion of DGLA to AA, on enzymatic efficiency was similar in both groups. Conclusions: We conclude that the impact of FADS genetic variants on PUFA metabolism, specifically AA levels, is likely more pronounced in African Americans due to the larger proportion of individuals carrying the genotype associated with increased FADS1 enzymatic conversion of DGLA to AA. PMID:21599946

  5. Nanopores and nucleic acids: prospects for ultrarapid sequencing

    NASA Technical Reports Server (NTRS)

    Deamer, D. W.; Akeson, M.

    2000-01-01

    DNA and RNA molecules can be detected as they are driven through a nanopore by an applied electric field at rates ranging from several hundred microseconds to a few milliseconds per molecule. The nanopore can rapidly discriminate between pyrimidine and purine segments along a single-stranded nucleic acid molecule. Nanopore detection and characterization of single molecules represents a new method for directly reading information encoded in linear polymers. If single-nucleotide resolution can be achieved, it is possible that nucleic acid sequences can be determined at rates exceeding a thousand bases per second.

  6. Functional characterization of rare FOXP2 variants in neurodevelopmental disorder.

    PubMed

    Estruch, Sara B; Graham, Sarah A; Chinnappa, Swathi M; Deriziotis, Pelagia; Fisher, Simon E

    2016-01-01

    Heterozygous disruption of FOXP2 causes a rare form of speech and language impairment. Screens of the FOXP2 sequence in individuals with speech/language-related disorders have identified several rare protein-altering variants, but their phenotypic relevance is often unclear. FOXP2 encodes a transcription factor with a forkhead box DNA-binding domain, but little is known about the functions of protein regions outside this domain. We performed detailed functional analyses of seven rare FOXP2 variants found in affected cases, including three which have not been previously characterized, testing intracellular localization, transcriptional regulation, dimerization, and interaction with other proteins. To shed further light on molecular functions of FOXP2, we characterized the interaction between this transcription factor and co-repressor proteins of the C-terminal binding protein (CTBP) family. Finally, we analysed the functional significance of the polyglutamine tracts in FOXP2, since tract length variations have been reported in cases of neurodevelopmental disorder. We confirmed etiological roles of multiple FOXP2 variants. Of three variants that have been suggested to cause speech/language disorder, but never before been characterized, only one showed functional effects. For the other two, we found no effects on protein function in any assays, suggesting that they are incidental to the phenotype. We identified a CTBP-binding region within the N-terminal portion of FOXP2. This region includes two amino acid substitutions that occurred on the human lineage following the split from chimpanzees. However, we did not observe any effects of these amino acid changes on CTBP binding or other core aspects of FOXP2 function. Finally, we found that FOXP2 variants with reduced polyglutamine tracts did not exhibit altered behaviour in cellular assays, indicating that such tracts are non-essential for core aspects of FOXP2 function, and that tract variation is unlikely to be a

  7. Assessing pathogenicity for novel mutation/sequence variants: the value of healthy older individuals.

    PubMed

    Zatz, Mayana; Pavanello, Rita de Cassia M; Lourenço, Naila Cristina V; Cerqueira, Antonia; Lazar, Monize; Vainzof, Mariz

    2012-12-01

    Improvement in DNA technology is increasingly revealing unexpected/unknown mutations in healthy persons and generating anxiety due to their still unknown health consequences. We report a 44-year-old healthy father of a 10-year-old daughter with bilateral coloboma and hearing loss, but without muscle weakness, in whom a whole-genome CGH revealed a deletion of exons 38-44 in the dystrophin gene. This mutation was inherited from her asymptomatic father, who was further clinically and molecularly evaluated for prognosis and genetic counseling (GC). This deletion was never identified by us in 982 Duchenne/Becker patients. To assess whether the present case represents a rare case of non-penetrance, and aiming to obtain more information for prognosis and GC, we suggested that healthy older relatives submit their DNA for analysis, and several complied. Mutation analysis revealed that his mother, brother, and 56-year-old maternal uncle also carry the 38-44 deletion, suggesting that it is an unlikely cause of muscle weakness. Genome sequencing will disclose mutations and variants whose health impact is still unknown, raising important problems in interpreting results, defining prognosis, and discussing GC. We suggest that, in addition to family history, keeping the DNA of older relatives could be very informative, in particular for those interested in having their genome sequenced.

  8. VIPER: a web application for rapid expert review of variant calls.

    PubMed

    Wöste, Marius; Dugas, Martin

    2018-06-01

    With the rapid development in next-generation sequencing, cost and time requirements for genomic sequencing are decreasing, enabling applications in many areas such as cancer research. Many tools have been developed to analyze genomic variation ranging from single nucleotide variants to whole chromosomal aberrations. As sequencing throughput increases, the number of variants called by such tools also grows. The often-employed manual inspection of such calls is thus becoming a time-consuming procedure. We developed the Variant InsPector and Expert Rating tool (VIPER) to speed up this process by integrating the Integrative Genomics Viewer into a web application. Analysts can then quickly iterate through variants, apply filters and make decisions based on the generated images and variant metadata. VIPER was successfully employed in analyses with manual inspection of more than 10 000 calls. VIPER is implemented in Java and Javascript and is freely available at https://github.com/MarWoes/viper. Contact: marius.woeste@uni-muenster.de. Supplementary data are available at Bioinformatics online.

  9. PubDNA Finder: a web database linking full-text articles to sequences of nucleic acids.

    PubMed

    García-Remesal, Miguel; Cuevas, Alejandro; Pérez-Rey, David; Martín, Luis; Anguita, Alberto; de la Iglesia, Diana; de la Calle, Guillermo; Crespo, José; Maojo, Víctor

    2010-11-01

    PubDNA Finder is an online repository that we have created to link PubMed Central manuscripts to the sequences of nucleic acids appearing in them. It extends the search capabilities provided by PubMed Central by enabling researchers to perform advanced searches involving sequences of nucleic acids. This includes, among other features (i) searching for papers mentioning one or more specific sequences of nucleic acids and (ii) retrieving the genetic sequences appearing in different articles. These additional query capabilities are provided by a searchable index that we created by using the full text of the 176 672 papers available at PubMed Central at the time of writing and the sequences of nucleic acids appearing in them. To automatically extract the genetic sequences occurring in each paper, we used an original method we have developed. The database is updated monthly by automatically connecting to the PubMed Central FTP site to retrieve and index new manuscripts. Users can query the database via the web interface provided. PubDNA Finder can be freely accessed at http://servet.dia.fi.upm.es:8080/pubdnafinder

  10. Identifying Mendelian disease genes with the Variant Effect Scoring Tool

    PubMed Central

    2013-01-01

    Background Whole exome sequencing studies identify hundreds to thousands of rare protein coding variants of ambiguous significance for human health. Computational tools are needed to accelerate the identification of specific variants and genes that contribute to human disease. Results We have developed the Variant Effect Scoring Tool (VEST), a supervised machine learning-based classifier, to prioritize rare missense variants with likely involvement in human disease. The VEST classifier training set comprised ~ 45,000 disease mutations from the latest Human Gene Mutation Database release and another ~45,000 high frequency (allele frequency >1%) putatively neutral missense variants from the Exome Sequencing Project. VEST outperforms some of the most popular methods for prioritizing missense variants in carefully designed holdout benchmarking experiments (VEST ROC AUC = 0.91, PolyPhen2 ROC AUC = 0.86, SIFT4.0 ROC AUC = 0.84). VEST estimates variant score p-values against a null distribution of VEST scores for neutral variants not included in the VEST training set. These p-values can be aggregated at the gene level across multiple disease exomes to rank genes for probable disease involvement. We tested the ability of an aggregate VEST gene score to identify candidate Mendelian disease genes, based on whole-exome sequencing of a small number of disease cases. We used whole-exome data for two Mendelian disorders for which the causal gene is known. Considering only genes that contained variants in all cases, the VEST gene score ranked dihydroorotate dehydrogenase (DHODH) number 2 of 2253 genes in four cases of Miller syndrome, and myosin-3 (MYH3) number 2 of 2313 genes in three cases of Freeman Sheldon syndrome. Conclusions Our results demonstrate the potential power gain of aggregating bioinformatics variant scores into gene-level scores and the general utility of bioinformatics in assisting the search for disease genes in large-scale exome sequencing studies. VEST is

  11. Common and rare variants associated with kidney stones and biochemical traits

    PubMed Central

    Oddsson, Asmundur; Sulem, Patrick; Helgason, Hannes; Edvardsson, Vidar O.; Thorleifsson, Gudmar; Sveinbjörnsson, Gardar; Haraldsdottir, Eik; Eyjolfsson, Gudmundur I.; Sigurdardottir, Olof; Olafsson, Isleifur; Masson, Gisli; Holm, Hilma; Gudbjartsson, Daniel F.; Thorsteinsdottir, Unnur; Indridason, Olafur S.; Palsson, Runolfur; Stefansson, Kari

    2015-01-01

    Kidney stone disease is a complex disorder with a strong genetic component. We conducted a genome-wide association study of 28.3 million sequence variants detected through whole-genome sequencing of 2,636 Icelanders that were imputed into 5,419 kidney stone cases, including 2,172 cases with a history of recurrent kidney stones, and 279,870 controls. We identify sequence variants associating with kidney stones at ALPL (rs1256328[T], odds ratio (OR)=1.21, P=5.8 × 10^-10) and a suggestive association at CASR (rs7627468[A], OR=1.16, P=2.0 × 10^-8). Focusing our analysis on coding sequence variants in 63 genes with preferential kidney expression we identify two rare missense variants SLC34A1 p.Tyr489Cys (OR=2.38, P=2.8 × 10^-5) and TRPV5 p.Leu530Arg (OR=3.62, P=4.1 × 10^-5) associating with recurrent kidney stones. We also observe associations of the identified kidney stone variants with biochemical traits in a large population set, indicating potential biological mechanism. PMID:26272126
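
    The odds ratios quoted above can be illustrated with a minimal sketch that computes an OR and an approximate 95% confidence interval from a 2x2 carrier table (Woolf's log method). This is not the authors' analysis, which used imputed genotypes in a much larger association framework; the function name and the counts below are hypothetical and chosen only for illustration.

```python
import math


def odds_ratio_with_ci(case_carriers, case_noncarriers,
                       control_carriers, control_noncarriers, z=1.96):
    """Return (odds ratio, lower, upper) for a 2x2 table.

    The interval uses the normal approximation on the log odds ratio
    (Woolf's method); z=1.96 gives an approximate 95% interval.
    """
    a, b, c, d = (case_carriers, case_noncarriers,
                  control_carriers, control_noncarriers)
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    estimate = (a * d) / (b * c)
    # Standard error of log(OR) is the root of the summed reciprocal counts.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper


# Hypothetical counts, NOT taken from the study.
estimate, lower, upper = odds_ratio_with_ci(10, 90, 5, 95)
print(f"OR = {estimate:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

    With rare variants, one or more cells can be very small, in which case the normal approximation is unreliable and an exact or regression-based method would be preferable.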

  12. PUF60 variants cause a syndrome of ID, short stature, microcephaly, coloboma, craniofacial, cardiac, renal and spinal features

    PubMed Central

    Low, Karen J; Ansari, Morad; Abou Jamra, Rami; Clarke, Angus; El Chehadeh, Salima; FitzPatrick, David R; Greenslade, Mark; Henderson, Alex; Hurst, Jane; Keller, Kory; Kuentz, Paul; Prescott, Trine; Roessler, Franziska; Selmer, Kaja K; Schneider, Michael C; Stewart, Fiona; Tatton-Brown, Katrina; Thevenon, Julien; Vigeland, Magnus D; Vogt, Julie; Willems, Marjolaine; Zonana, Jonathan; DDD Study; Smithson, Sarah F

    2017-01-01

    PUF60 encodes a nucleic acid-binding protein, a component of multimeric complexes regulating RNA splicing and transcription. In 2013, patients with microdeletions of chromosome 8q24.3 including PUF60 were found to have developmental delay, microcephaly, craniofacial, renal and cardiac defects. Very similar phenotypes have been described in six patients with variants in PUF60, suggesting that it underlies the syndrome. We report 12 additional patients with PUF60 variants who were ascertained using exome sequencing: six through the Deciphering Developmental Disorders Study and six through similar projects. Detailed phenotypic analysis of all patients was undertaken. All 12 patients had de novo heterozygous PUF60 variants on exome analysis, each confirmed by Sanger sequencing: four frameshift variants resulting in premature stop codons, three missense variants that clustered within the RNA recognition motif of PUF60 and five essential splice-site (ESS) variants. Analysis of cDNA from a fibroblast cell line derived from one of the patients with an ESS variant revealed aberrant splicing. The consistent feature was developmental delay and most patients had short stature. The phenotypic variability was striking; however, we observed similarities including spinal segmentation anomalies, congenital heart disease, ocular colobomata, hand anomalies and (in two patients) unilateral renal agenesis/horseshoe kidney. Characteristic facial features included micrognathia, a thin upper lip and long philtrum, narrow almond-shaped palpebral fissures, synophrys, flared eyebrows and facial hypertrichosis. Heterozygote loss-of-function variants in PUF60 cause a phenotype comprising growth/developmental delay and craniofacial, cardiac, renal, ocular and spinal anomalies, adding to disorders of human development resulting from aberrant RNA processing/spliceosomal function. PMID:28327570

  13. Whole-genome sequence-based analysis of thyroid function.

    PubMed

    Taylor, Peter N; Porcu, Eleonora; Chew, Shelby; Campbell, Purdey J; Traglia, Michela; Brown, Suzanne J; Mullin, Benjamin H; Shihab, Hashem A; Min, Josine; Walter, Klaudia; Memari, Yasin; Huang, Jie; Barnes, Michael R; Beilby, John P; Charoen, Pimphen; Danecek, Petr; Dudbridge, Frank; Forgetta, Vincenzo; Greenwood, Celia; Grundberg, Elin; Johnson, Andrew D; Hui, Jennie; Lim, Ee M; McCarthy, Shane; Muddyman, Dawn; Panicker, Vijay; Perry, John R B; Bell, Jordana T; Yuan, Wei; Relton, Caroline; Gaunt, Tom; Schlessinger, David; Abecasis, Goncalo; Cucca, Francesco; Surdulescu, Gabriela L; Woltersdorf, Wolfram; Zeggini, Eleftheria; Zheng, Hou-Feng; Toniolo, Daniela; Dayan, Colin M; Naitza, Silvia; Walsh, John P; Spector, Tim; Davey Smith, George; Durbin, Richard; Richards, J Brent; Sanna, Serena; Soranzo, Nicole; Timpson, Nicholas J; Wilson, Scott G

    2015-03-06

    Normal thyroid function is essential for health, but its genetic architecture remains poorly understood. Here, for the heritable thyroid traits thyrotropin (TSH) and free thyroxine (FT4), we analyse whole-genome sequence data from the UK10K project (N=2,287). Using additional whole-genome sequence and deeply imputed data sets, we report meta-analysis results for common variants (MAF≥1%) associated with TSH and FT4 (N=16,335). For TSH, we identify a novel variant in SYN2 (MAF=23.5%, P=6.15 × 10(-9)) and a new independent variant in PDE8B (MAF=10.4%, P=5.94 × 10(-14)). For FT4, we report a low-frequency variant near B4GALT6/SLC25A52 (MAF=3.2%, P=1.27 × 10(-9)) tagging a rare TTR variant (MAF=0.4%, P=2.14 × 10(-11)). All common variants explain ≥20% of the variance in TSH and FT4. Analysis of rare variants (MAF<1%) using sequence kernel association testing reveals a novel association with FT4 in NRG1. Our results demonstrate that increased coverage in whole-genome sequence association studies identifies novel variants associated with thyroid function.

  14. Extreme-phenotype genome-wide association study (XP-GWAS): a method for identifying trait-associated variants by sequencing pools of individuals selected from a diversity panel.

    PubMed

    Yang, Jinliang; Jiang, Haiying; Yeh, Cheng-Ting; Yu, Jianming; Jeddeloh, Jeffrey A; Nettleton, Dan; Schnable, Patrick S

    2015-11-01

    Although approaches for performing genome-wide association studies (GWAS) are well developed, conventional GWAS requires high-density genotyping of large numbers of individuals from a diversity panel. Here we report a method for performing GWAS that does not require genotyping of large numbers of individuals. Instead XP-GWAS (extreme-phenotype GWAS) relies on genotyping pools of individuals from a diversity panel that have extreme phenotypes. This analysis measures allele frequencies in the extreme pools, enabling discovery of associations between genetic variants and traits of interest. This method was evaluated in maize (Zea mays) using the well-characterized kernel row number trait, which was selected to enable comparisons between the results of XP-GWAS and conventional GWAS. An exome-sequencing strategy was used to focus sequencing resources on genes and their flanking regions. A total of 0.94 million variants were identified and served as evaluation markers; comparisons among pools showed that 145 of these variants were statistically associated with the kernel row number phenotype. These trait-associated variants were significantly enriched in regions identified by conventional GWAS. XP-GWAS was able to resolve several linked QTL and detect trait-associated variants within a single gene under a QTL peak. XP-GWAS is expected to be particularly valuable for detecting genes or alleles responsible for quantitative variation in species for which extensive genotyping resources are not available, such as wild progenitors of crops, orphan crops, and other poorly characterized species such as those of ecological interest. © 2015 The Authors The Plant Journal published by Society for Experimental Biology and John Wiley & Sons Ltd.
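
    The idea of comparing allele frequencies between extreme-phenotype pools can be sketched as a simple 2x2 chi-square test on pooled read counts. This is only an illustration of the general principle: the abstract does not state which statistical model the authors used, and the function names and read counts below are hypothetical.

```python
def allele_frequency(alt_reads, ref_reads):
    """Estimate the alternate-allele frequency in a pool from read counts."""
    total = alt_reads + ref_reads
    if total == 0:
        raise ValueError("no reads cover this site")
    return alt_reads / total


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (1 degree of freedom, no continuity
    correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("a row or column of the table sums to zero")
    return n * (a * d - b * c) ** 2 / denominator


# Hypothetical read counts at one variant site, NOT taken from the study.
high_alt, high_ref = 80, 20   # pool with extreme high phenotype
low_alt, low_ref = 40, 60     # pool with extreme low phenotype

high_freq = allele_frequency(high_alt, high_ref)
low_freq = allele_frequency(low_alt, low_ref)
statistic = chi_square_2x2(high_alt, high_ref, low_alt, low_ref)

# 3.841 is the 5% critical value of the chi-square distribution with 1 df.
print(high_freq, low_freq, statistic, statistic > 3.841)
```

    A per-site test like this ignores sampling noise from pooling and the multiple-testing burden across hundreds of thousands of variants, both of which a real analysis would need to address.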

  15. Identification of interleukin-26 in the dromedary camel (Camelus dromedarius): Evidence of alternative splicing and isolation of novel splice variants.

    PubMed

    Premraj, Avinash; Nautiyal, Binita; Aleyas, Abi G; Rasool, Thaha Jamal

    2015-10-01

    Interleukin-26 (IL-26) is a member of the IL-10 family of cytokines. Though conserved across vertebrates, the IL-26 gene is functionally inactivated in a few mammals like rat, mouse and horse. We report here the identification, isolation and cloning of the cDNA of IL-26 from the dromedary camel. The camel cDNA contains a 516 bp open reading frame encoding a 171 amino acid precursor protein, including a 21 amino acid signal peptide. Sequence analysis revealed high similarity with other mammalian IL-26 homologs and the conservation of IL-10 cytokine family domain structure including key amino acid residues. We also report the identification and cloning of four novel transcript variants produced by alternative splicing at the Exon 3-Exon 4 regions of the gene. Three of the alternative splice variants had premature termination codons and are predicted to code for truncated proteins. The transcript variant 4 (Tv4), having an insertion of an extra 120 bp in the ORF, was predicted to encode a full length protein product with 40 extra amino acid residues. The mRNA transcripts of all the variants were identified in lymph node, whereas fewer variants were observed in other tissues like blood, liver and kidney. The expression of Tv2 and Tv3 was found to be upregulated in mitogen induced camel peripheral blood mononuclear cells. IL-26-Tv2 expression was also induced in camel fibroblast cells infected with Camel pox virus in vitro. The identification of the transcript variants of IL-26 from the dromedary camel is the first report of alternative splicing for IL-26 in a species in which the gene has not been inactivated. Copyright © 2015 Elsevier Ltd. All rights reserved.
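
    The length figures in this abstract follow from codon arithmetic, which the short sketch below checks. It assumes that the stated 516 bp ORF includes the stop codon and that the 120 bp insertion is in frame; the function name is hypothetical and the mature-protein length is derived here rather than stated in the abstract.

```python
def protein_length_from_orf(orf_bp, includes_stop=True):
    """Number of amino acids encoded by an ORF of orf_bp nucleotides."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_bp // 3
    # The stop codon does not code for an amino acid.
    return codons - 1 if includes_stop else codons


precursor_aa = protein_length_from_orf(516)  # 172 codons, one of them a stop
signal_peptide_aa = 21                       # from the abstract
mature_aa = precursor_aa - signal_peptide_aa # derived, assuming cleavage of the signal peptide
extra_aa_in_tv4 = 120 // 3                   # in-frame 120 bp insertion

print(precursor_aa, mature_aa, extra_aa_in_tv4)
```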

  16. Complete complementary DNA-derived amino acid sequence of canine cardiac phospholamban.

    PubMed Central

    Fujii, J; Ueno, A; Kitano, K; Tanaka, S; Kadoma, M; Tada, M

    1987-01-01

    Complementary DNA (cDNA) clones specific for phospholamban of sarcoplasmic reticulum membranes have been isolated from a canine cardiac cDNA library. The amino acid sequence deduced from the cDNA sequence indicates that phospholamban consists of 52 amino acid residues and lacks an amino-terminal signal sequence. The protein has an inferred mol wt 6,080 that is in agreement with its apparent monomeric mol wt 6,000, estimated previously by sodium dodecyl sulfate-polyacrylamide gel electrophoresis. Phospholamban contains two distinct domains, a hydrophilic region at the amino terminus (domain I) and a hydrophobic region at the carboxy terminus (domain II). We propose that domain I is localized at the cytoplasmic surface and offers phosphorylatable sites whereas domain II is anchored into the sarcoplasmic reticulum membrane. PMID:3793929

  17. GTRAC: fast retrieval from compressed collections of genomic variants.

    PubMed

    Tatwawadi, Kedar; Hernaez, Mikel; Ochoa, Idoia; Weissman, Tsachy

    2016-09-01

    The dramatic decrease in the cost of sequencing has resulted in the generation of huge amounts of genomic data, as evidenced by projects such as the UK10K and the Million Veteran Project, with the number of sequenced genomes ranging in the order of 10 K to 1 M. Due to the large redundancies among genomic sequences of individuals from the same species, most of the medical research deals with the variants in the sequences as compared with a reference sequence, rather than with the complete genomic sequences. Consequently, millions of genomes represented as variants are stored in databases. These databases are constantly updated and queried to extract information such as the common variants among individuals or groups of individuals. Previous algorithms for compression of this type of database lack efficient random access capabilities, rendering querying the database for particular variants and/or individuals extremely inefficient, to the point where compression is often relinquished altogether. We present a new algorithm for this task, called GTRAC, that achieves significant compression ratios while allowing fast random access over the compressed database. For example, GTRAC is able to compress a Homo sapiens dataset containing 1092 samples in 1.1 GB (compression ratio of 160), while allowing for decompression of specific samples in less than a second and decompression of specific variants in 17 ms. GTRAC uses and adapts techniques from information theory, such as a specialized Lempel-Ziv compressor, and tailored succinct data structures. The GTRAC algorithm is available for download at: https://github.com/kedartatwawadi/GTRAC. Contact: kedart@stanford.edu. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
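
    The quoted compression figures can be related to each other with a small back-of-the-envelope sketch. It assumes that "compression ratio" means uncompressed size divided by compressed size and that 1 GB is 1000 MB; the abstract states neither, so the derived sizes are only rough estimates, and the function names are hypothetical.

```python
def uncompressed_size_gb(compressed_gb, compression_ratio):
    """Implied original size, assuming ratio = uncompressed / compressed."""
    return compressed_gb * compression_ratio


def compressed_mb_per_sample(compressed_gb, n_samples, mb_per_gb=1000):
    """Average compressed footprint per sample (assumes 1 GB = 1000 MB)."""
    return compressed_gb * mb_per_gb / n_samples


# Figures from the abstract: 1092 samples, 1.1 GB compressed, ratio of 160.
original_gb = uncompressed_size_gb(1.1, 160)
per_sample_mb = compressed_mb_per_sample(1.1, 1092)

print(f"implied uncompressed size: about {original_gb:.0f} GB")
print(f"average compressed size per sample: about {per_sample_mb:.2f} MB")
```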

  18. Variants in SKP1, PROB1, and IL17B genes at keratoconus 5q31.1–q35.3 susceptibility locus identified by whole-exome sequencing

    PubMed Central

    Karolak, Justyna A; Gambin, Tomasz; Pitarque, Jose A; Molinari, Andrea; Jhangiani, Shalini; Stankiewicz, Pawel; Lupski, James R; Gajecka, Marzena

    2017-01-01

    Keratoconus (KTCN) is a protrusion and thinning of the cornea, resulting in impairment of visual function. The extreme genetic heterogeneity makes it difficult to discover factors unambiguously influencing the KTCN phenotype. In this study, we used whole-exome sequencing (WES) and Sanger sequencing to reduce the number of candidate genes at the 5q31.1–q35.3 locus and to prioritize other potentially relevant variants in an Ecuadorian family with KTCN. We applied WES in two affected KTCN individuals from the Ecuadorian family that showed a suggestive linkage between the KTCN phenotype and the 5q31.1–q35.3 locus. Putative variants identified by WES were further evaluated in this family using Sanger sequencing. Exome capture discovered a total of 173 rare (minor allele frequency <0.001 in control population) nonsynonymous variants in both affected individuals. Among them, 16 SNVs were selected for further evaluation. Segregation analysis revealed that variants c.475T>G in SKP1, c.671G>A in PROB1, and c.527G>A in IL17B in the 5q31.1–q35.3 linkage region, and c.850G>A in HKDC1 in the 10q22 locus completely segregated with the phenotype in the studied KTCN family. We demonstrate that a combination of various techniques significantly narrowed the studied genomic region and reduced the list of the putative exonic variants. Moreover, since this locus overlapped two other chromosomal regions previously recognized in distinct KTCN studies, our findings suggest that this 5q31.1–q35.3 locus might be linked with KTCN. PMID:27703147

  19. LenVarDB: database of length-variant protein domains.

    PubMed

    Mutt, Eshita; Mathew, Oommen K; Sowdhamini, Ramanathan

    2014-01-01

    Protein domains are functionally and structurally independent modules, which add to the functional variety of proteins. This array of functional diversity has been enabled by evolutionary changes, such as amino acid substitutions or insertions or deletions, occurring in these protein domains. Length variations (indels) can introduce changes at structural, functional and interaction levels. LenVarDB (freely available at http://caps.ncbs.res.in/lenvardb/) traces these length variations, starting from structure-based sequence alignments in our Protein Alignments organized as Structural Superfamilies (PASS2) database, across 731 structural classification of proteins (SCOP)-based protein domain superfamilies connected to 2 730 625 sequence homologues. Alignment of sequence homologues corresponding to a structural domain is available, starting from a structure-based sequence alignment of the superfamily. Orientation of the length-variant (indel) regions in protein domains can be visualized by mapping them on the structure and on the alignment. Knowledge about location of length variations within protein domains and their visual representation will be useful in predicting changes within structurally or functionally relevant sites, which may ultimately regulate protein function. Non-technical summary: Evolutionary changes bring about natural changes to proteins that may be found in many organisms. Such changes could be reflected as amino acid substitutions or insertions-deletions (indels) in protein sequences. LenVarDB is a database that provides an early overview of observed length variations that were set among 731 protein families and after examining >2 million sequences. Indels are followed up to observe if they are close to the active site such that they can affect the activity of proteins. Inclusion of such information can aid the design of bioengineering experiments.

  20. A variant in the sonic hedgehog regulatory sequence (ZRS) is associated with triphalangeal thumb and deregulates expression in the developing limb

    PubMed Central

    Furniss, Dominic; Lettice, Laura A.; Taylor, Indira B.; Critchley, Paul S.; Giele, Henk; Hill, Robert E.; Wilkie, Andrew O.M.

    2008-01-01

    A locus for triphalangeal thumb, variably associated with pre-axial polydactyly, was previously identified in the zone of polarizing activity regulatory sequence (ZRS), a long range limb-specific enhancer of the Sonic Hedgehog (SHH) gene at human chromosome 7q36.3. Here, we demonstrate that a 295T>C variant in the human ZRS, previously thought to represent a neutral polymorphism, acts as a dominant allele with reduced penetrance. We found this variant in three independently ascertained probands from southern England with triphalangeal thumb, demonstrated significant linkage of the phenotype to the variant (LOD = 4.1), and identified a shared microsatellite haplotype around the ZRS, suggesting that the probands share a common ancestor. An individual homozygous for the 295C allele presented with isolated bilateral triphalangeal thumb resembling the heterozygous phenotype, suggesting that the variant is largely dominant to the wild-type allele. As a functional test of the pathogenicity of the 295C allele, we utilized a mutated ZRS construct to demonstrate that it can drive ectopic anterior expression of a reporter gene in the developing mouse forelimb. We conclude that the 295T>C variant is in fact pathogenic and, in southern England, appears to be the most common cause of triphalangeal thumb. Depending on the dispersal of the founding mutation, it may play a wider role in the aetiology of this disorder. PMID:18463159

  1. Guillain-Barré Syndrome: A Variant Consisting of Facial Diplegia and Paresthesia with Left Facial Hemiplegia Associated with Antibodies to Galactocerebroside and Phosphatidic Acid

    PubMed Central

    Nishiguchi, Sho; Branch, Joel; Tsuchiya, Tsubasa; Ito, Ryoji; Kawada, Junya

    2017-01-01

    Patient: Male, 54 Final Diagnosis: Guillain-Barré syndrome Symptoms: Paresthesia of extremities • unilateral facial palsy Medication: — Clinical Procedure: — Specialty: Neurology Objective: Unusual clinical course Background: A rare variant of Guillain-Barré syndrome (GBS) consists of facial diplegia and paresthesia, but an even more rare association is with facial hemiplegia, similar to Bell’s palsy. This case report is of this rare variant of GBS that was associated with IgG antibodies to galactocerebroside and phosphatidic acid. Case Report: A 54-year-old man presented with lower left facial palsy and paresthesia of his extremities, following an upper respiratory tract infection. Physical examination confirmed lower left facial palsy and paresthesia of his extremities with hyporeflexia of his lower limbs and sensory loss of all four extremities. The differential diagnosis was between a variant of GBS and Bell’s palsy. Following initial treatment with glucocorticoids followed by intravenous immunoglobulin (IVIG), his sensory abnormalities resolved. Serum IgG antibodies to galactocerebroside and phosphatidic acid were positive in this patient, but no other antibodies to glycolipids or phospholipids were found. Five months following discharge from hospital, his left facial palsy had improved. Conclusions: A case of a rare variant of GBS is presented with facial diplegia and paresthesia and with unilateral facial palsy. This rare variant of GBS may mimic Bell’s palsy. In this case, IgG antibodies to galactocerebroside and phosphatidic acid were detected. PMID:28966341

  2. Comparison of illumina and 454 deep sequencing in participants failing raltegravir-based antiretroviral therapy.

    PubMed

    Li, Jonathan Z; Chapman, Brad; Charlebois, Patrick; Hofmann, Oliver; Weiner, Brian; Porter, Alyssa J; Samuel, Reshmi; Vardhanabhuti, Saran; Zheng, Lu; Eron, Joseph; Taiwo, Babafemi; Zody, Michael C; Henn, Matthew R; Kuritzkes, Daniel R; Hide, Winston; Wilson, Cara C; Berzins, Baiba I; Acosta, Edward P; Bastow, Barbara; Kim, Peter S; Read, Sarah W; Janik, Jennifer; Meres, Debra S; Lederman, Michael M; Mong-Kryspin, Lori; Shaw, Karl E; Zimmerman, Louis G; Leavitt, Randi; De La Rosa, Guy; Jennings, Amy

    2014-01-01

    The impact of raltegravir-resistant HIV-1 minority variants (MVs) on raltegravir treatment failure is unknown. Illumina sequencing offers greater throughput than 454, but sequence analysis tools for viral sequencing are needed. We evaluated Illumina and 454 for the detection of HIV-1 raltegravir-resistant MVs. A5262 was a single-arm study of raltegravir and darunavir/ritonavir in treatment-naïve patients. Pre-treatment plasma was obtained from 5 participants with raltegravir resistance at the time of virologic failure. A control library was created by pooling integrase clones at predefined proportions. Multiplexed sequencing was performed with Illumina and 454 platforms at comparable costs. Illumina sequence analysis was performed with the novel snp-assess tool and 454 sequencing was analyzed with V-Phaser. Illumina sequencing resulted in significantly higher sequence coverage and a 0.095% limit of detection. Illumina accurately detected all MVs in the control library at ≥0.5% and 7/10 MVs expected at 0.1%. 454 sequencing failed to detect any MVs at 0.1% with 5 false positive calls. For MVs detected in the patient samples by both 454 and Illumina, the correlation in the detected variant frequencies was high (R² = 0.92, P<0.001). Illumina sequencing detected 2.4-fold more nucleotide MVs and 2.9-fold more amino acid MVs than 454. The only raltegravir-resistant MV detected was an E138K mutation in one participant by Illumina sequencing, but not by 454. In participants of A5262 with raltegravir resistance at virologic failure, baseline raltegravir-resistant MVs were rarely detected. At comparable costs to 454 sequencing, Illumina demonstrated greater depth of coverage, increased sensitivity for detecting HIV MVs, and fewer false positive variant calls.

  3. Expression Variants of the Lipogenic AGPAT6 Gene Affect Diverse Milk Composition Phenotypes in Bos taurus

    PubMed Central

    Littlejohn, Mathew D.; Tiplady, Kathryn; Lopdell, Thomas; Law, Tania A.; Scott, Andrew; Harland, Chad; Sherlock, Ric; Henty, Kristen; Obolonkin, Vlad; Lehnert, Klaus; MacGibbon, Alistair; Spelman, Richard J.; Davis, Stephen R.; Snell, Russell G.

    2014-01-01

    Milk is composed of a complex mixture of lipids, proteins, carbohydrates and various vitamins and minerals as a source of nutrition for young mammals. The composition of milk varies between individuals, with lipid composition in particular being highly heritable. Recent reports have highlighted a region of bovine chromosome 27 harbouring variants affecting milk fat percentage and fatty acid content. We aimed to further investigate this locus in two independent cattle populations, consisting of a Holstein-Friesian x Jersey crossbreed pedigree of 711 F2 cows, and a collection of 32,530 mixed ancestry Bos taurus cows. Bayesian genome-wide association mapping using markers imputed from the Illumina BovineHD chip revealed a large quantitative trait locus (QTL) for milk fat percentage on chromosome 27, present in both populations. We also investigated a range of other milk composition phenotypes, and report additional associations at this locus for fat yield, protein percentage and yield, lactose percentage and yield, milk volume, and the proportions of numerous milk fatty acids. We then used mammary RNA sequence data from 212 lactating cows to assess the transcript abundance of genes located in the milk fat percentage QTL interval. This analysis revealed a strong eQTL for AGPAT6, demonstrating that high milk fat percentage genotype is also additively associated with increased expression of the AGPAT6 gene. Finally, we used whole genome sequence data from six F1 sires to target a panel of novel AGPAT6 locus variants for genotyping in the F2 crossbreed population. Association analysis of 58 of these variants revealed highly significant association for polymorphisms mapping to the 5′UTR exons and intron 1 of AGPAT6. Taken together, these data suggest that variants affecting the expression of AGPAT6 are causally involved in differential milk fat synthesis, with pleiotropic consequences for a diverse range of other milk components. PMID:24465687

  4. Complementation of the Function of Glycoprotein H of Human Herpesvirus 6 Variant A by Glycoprotein H of Variant B in the Virus Life Cycle

    PubMed Central

    Oyaizu, Hiroko; Tang, Huamin; Ota, Megumi; Takenaka, Nobuyuki; Ozono, Keiichi; Yamanishi, Koichi

    2012-01-01

    Human herpesvirus 6 (HHV-6) is a T-cell-tropic betaherpesvirus. HHV-6 can be classified into two variants, HHV-6 variant A (HHV-6A) and HHV-6B, based on genetic, antigenic, and cell tropisms, although the homology of their entire genomic sequences is nearly 90%. The HHV-6A glycoprotein complex gH/gL/gQ1/gQ2 is a viral ligand that binds to the cellular receptor human CD46. Because gH has 94.3% amino acid identity between the variants, here we examined whether gH from one variant could complement its loss in the other. Recently, we successfully reconstituted HHV-6A from its cloned genome in a bacterial artificial chromosome (BAC) (rHHV-6ABAC). Using this system, we constructed HHV-6ABAC DNA containing the HHV-6B gH (BgH) gene instead of the HHV-6A gH (AgH) gene in Escherichia coli. Recombinant HHV-6ABAC expressing BgH (rHHV-6ABAC-BgH) was successfully reconstituted. In addition, a monoclonal antibody that blocks HHV-6B but not HHV-6A infection neutralized rHHV-6ABAC-BgH but not rHHV-6ABAC. These results indicate that HHV-6B gH can complement the function of HHV-6A gH in the viral infectious cycle. PMID:22647694

  5. Sequence variants of KHDRBS1 as high penetrance susceptibility risks for primary ovarian insufficiency by mis-regulating mRNA alternative splicing.

    PubMed

    Wang, Binbin; Li, Lin; Zhu, Ying; Zhang, Wei; Wang, Xi; Chen, Beili; Li, Tengyan; Pan, Hong; Wang, Jing; Kee, Kehkooi; Cao, Yunxia

    2017-10-01

    Does a novel heterozygous KHDRBS1 variant, identified using whole-exome sequencing (WES) in two patients with primary ovarian insufficiency (POI) in a pedigree, cause defects in mRNA alternative splicing? The heterozygous variant of KHDRBS1 was confirmed to cause defects in alternative splicing of many genes involved in DNA replication and repair. Studies in mice revealed that Khdrbs1-deficient females are subfertile, which manifests as delayed sexual maturity and significantly reduced numbers of secondary and pre-antral follicles. No mutation of KHDRBS1, however, has been reported in patients with POI. This genetic and functional study used WES to find putative mutations in a POI pedigree. Altogether, 215 idiopathic POI patients and 400 healthy controls were screened for KHDRBS1 mutations. Two POI patients were subjected to WES to identify sequence variants. Mutational analysis of the KHDRBS1 gene in 215 idiopathic POI patients and 400 healthy controls was performed. RNA-sequencing was carried out to find the mis-regulation of gene expression due to KHDRBS1 mutation. Bioinformatics was used to analyze the change in alternative splicing events. We identified a heterozygous mutation (c.460A > G, p.M154V) in KHDRBS1 in two patients. Further mutational analysis of the KHDRBS1 gene in 215 idiopathic POI patients found one heterozygous mutation (c.263C > T, p.P88L). We failed to find these two mutations in 400 healthy control women. Using RNA-sequencing, we found that the KGN cells expressing the M154V KHDRBS1 mutant had different expression of 66 genes compared with wild-type (WT) cells. Furthermore, 145 genes were alternatively spliced in M154V cells, and these genes were enriched for DNA replication and repair function, revealing a potential underlying mechanism of the pathology that leads to POI.
    Although the in vitro assays demonstrated the effect of the KHDRBS1 variant on alternative splicing, further studies are needed to validate the in vivo effects on germ

  6. The complete amino acid sequence of human erythrocyte diphosphoglycerate mutase.

    PubMed Central

    Haggarty, N W; Dunbar, B; Fothergill, L A

    1983-01-01

    The complete amino acid sequence of human erythrocyte diphosphoglycerate mutase, comprising 239 residues, was determined. The sequence was deduced from the four cyanogen bromide fragments, and from the peptides derived from these fragments after digestion with a number of proteolytic enzymes. Comparison of this sequence with that of the yeast glycolytic enzyme, phosphoglycerate mutase, shows that these enzymes are 47% identical. Most, but not all, of the residues implicated as being important for the activity of the glycolytic mutase are conserved in the erythrocyte diphosphoglycerate mutase. PMID:6313356

  7. VarBin, a novel method for classifying true and false positive variants in NGS data

    PubMed Central

    2013-01-01

    Background Variant discovery for rare genetic diseases using Illumina genome or exome sequencing involves screening of up to millions of variants to find only the one or few causative variant(s). Sequencing or alignment errors create "false positive" variants, which are often retained in the variant screening process. Methods to remove false positive variants often retain many false positive variants. This report presents VarBin, a method to prioritize variants based on a false positive variant likelihood prediction. Methods VarBin uses the Genome Analysis Toolkit variant calling software to calculate the variant-to-wild type genotype likelihood ratio at each variant change and position divided by read depth. The resulting Phred-scaled, likelihood-ratio by depth (PLRD) was used to segregate variants into 4 Bins with Bin 1 variants most likely true and Bin 4 most likely false positive. PLRD values were calculated for a proband of interest and 41 additional Illumina HiSeq, exome and whole genome samples (proband's family or unrelated samples). At variant sites without apparent sequencing or alignment error, wild type/non-variant calls cluster near -3 PLRD and variant calls typically cluster above 10 PLRD. Sites with systematic variant calling problems (evident from variant quality scores and biases as well as displayed on the IGV viewer) tend to have higher and more variable wild type/non-variant PLRD values. Depending on the separation of a proband's variant PLRD value from the cluster of wild type/non-variant PLRD values for background samples at the same variant change and position, the VarBin method's classification is assigned to each proband variant (Bin 1 to Bin 4). Results To assess VarBin performance, Sanger sequencing was performed on 98 variants in the proband and background samples. True variants were confirmed in 97% of Bin 1 variants, 30% of Bin 2, and 0% of Bin 3/Bin 4.
    Conclusions These data indicate that VarBin correctly classifies the majority of true
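
    The PLRD statistic described in the Methods above can be sketched roughly as follows. This is one illustrative reading of the abstract, not the authors' implementation: the function names, the choice of Phred-scaled genotype likelihoods (PL values) as input, the separation measure, and the example numbers are all assumptions. No Bin cut-offs are shown because the abstract does not state them.

```python
from statistics import mean


def plrd(pl_wild_type: float, pl_variant: float, depth: int) -> float:
    """Phred-scaled variant-to-wild-type likelihood ratio divided by read depth.

    Assumption: inputs are Phred-scaled likelihoods, PL = -10 * log10(L),
    so 10 * log10(L_variant / L_wild_type) == pl_wild_type - pl_variant.
    """
    if depth <= 0:
        raise ValueError("depth must be positive")
    return (pl_wild_type - pl_variant) / depth


def separation(proband_plrd: float, background_plrds: list[float]) -> float:
    """Hypothetical separation measure: distance of the proband's PLRD
    from the mean PLRD of background wild-type calls at the same site."""
    return proband_plrd - mean(background_plrds)


# Made-up values for illustration only.
variant_call = plrd(pl_wild_type=300, pl_variant=0, depth=25)
wild_type_call = plrd(pl_wild_type=0, pl_variant=90, depth=30)
print(variant_call, wild_type_call)
print(separation(variant_call, [wild_type_call, wild_type_call]))
```

    With these made-up inputs, the variant call lands above 10 and the wild-type call at -3, matching the clusters the abstract describes. A site with systematic calling problems would show a smaller separation.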

  8. Association of genetic variants of GRIN2B with autism.

    PubMed

    Pan, Yongcheng; Chen, Jingjing; Guo, Hui; Ou, Jianjun; Peng, Yu; Liu, Qiong; Shen, Yidong; Shi, Lijuan; Liu, Yalan; Xiong, Zhimin; Zhu, Tengfei; Luo, Sanchuan; Hu, Zhengmao; Zhao, Jingping; Xia, Kun

    2015-02-06

    Autism (MIM 209850) is a complex neurodevelopmental disorder characterized by social communication impairments and restricted repetitive behaviors. It has a high heritability, although much remains unclear. To evaluate genetic variants of GRIN2B in autism etiology, we performed a system association study of common and rare variants of GRIN2B and autism in cohorts from a Chinese population, involving a total sample of 1,945 subjects. Meta-analysis of a triad family cohort and a case-control cohort identified significant associations between multiple common variants and autism risk (Pmin = 1.73 × 10⁻⁴). Significantly, the haplotype involved with the top common variants also showed significant association (P = 1.78 × 10⁻⁶). Sanger sequencing of 275 probands from a triad cohort identified several variants in coding regions, including four common variants and seven rare variants. Two of the common coding variants were located in the autism-related linkage disequilibrium (LD) block, and both were significantly associated with autism (P < 9 × 10⁻³) using an independent control cohort. Burden analysis and case-only analysis of rare coding variants identified by Sanger sequencing did not find this association. Our study reveals for the first time that common variants and related haplotypes of GRIN2B are associated with autism risk.

  9. Computational Redesign of Acyl-ACP Thioesterase with Improved Selectivity toward Medium-Chain-Length Fatty Acids

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Grisewood, Matthew J.; Hernández-Lozada, Néstor J.; Thoden, James B.

    Enzyme and metabolic engineering offer the potential to develop biocatalysts for converting natural resources to a wide range of chemicals. To broaden the scope of potential products beyond natural metabolites, methods of engineering enzymes to accept alternative substrates and/or perform novel chemistries must be developed. DNA synthesis can create large libraries of enzyme-coding sequences, but most biochemistries lack a simple assay to screen for promising enzyme variants. Our solution to this challenge is structure-guided mutagenesis, in which optimization algorithms select the best sequences from libraries based on specified criteria (i.e., binding selectivity). We demonstrate this approach by identifying medium-chain (C8–C12) acyl-ACP thioesterases through structure-guided mutagenesis. Medium-chain fatty acids, which are products of thioesterase-catalyzed hydrolysis, are limited in natural abundance, compared to long-chain fatty acids; the limited supply leads to high costs of C6–C10 oleochemicals such as fatty alcohols, amines, and esters. Here, we applied computational tools to tune substrate binding of the highly active ‘TesA thioesterase in Escherichia coli. We used the IPRO algorithm to design thioesterase variants with enhanced C12 or C8 specificity, while maintaining high activity. After four rounds of structure-guided mutagenesis, we identified 3 variants with enhanced production of dodecanoic acid (C12) and 27 variants with enhanced production of octanoic acid (C8). The top variants reached up to 49% C12 and 50% C8 while exceeding native levels of total free fatty acids. A comparably sized library created by random mutagenesis failed to identify promising mutants. The chain length-preference of ‘TesA and the best mutant were confirmed in vitro using acyl-CoA substrates. Molecular dynamics simulations, confirmed by resolved crystal structures, of ‘TesA variants suggest that hydrophobic forces govern ‘TesA substrate specificity

  10. Computational Redesign of Acyl-ACP Thioesterase with Improved Selectivity toward Medium-Chain-Length Fatty Acids

    DOE PAGES

    Grisewood, Matthew J.; Hernández-Lozada, Néstor J.; Thoden, James B.; ...

    2017-04-20

    Enzyme and metabolic engineering offer the potential to develop biocatalysts for converting natural resources to a wide range of chemicals. To broaden the scope of potential products beyond natural metabolites, methods of engineering enzymes to accept alternative substrates and/or perform novel chemistries must be developed. DNA synthesis can create large libraries of enzyme-coding sequences, but most biochemistries lack a simple assay to screen for promising enzyme variants. Our solution to this challenge is structure-guided mutagenesis, in which optimization algorithms select the best sequences from libraries based on specified criteria (i.e., binding selectivity). We demonstrate this approach by identifying medium-chain (C8–C12) acyl-ACP thioesterases through structure-guided mutagenesis. Medium-chain fatty acids, which are products of thioesterase-catalyzed hydrolysis, are limited in natural abundance, compared to long-chain fatty acids; the limited supply leads to high costs of C6–C10 oleochemicals such as fatty alcohols, amines, and esters. Here, we applied computational tools to tune substrate binding of the highly active ‘TesA thioesterase in Escherichia coli. We used the IPRO algorithm to design thioesterase variants with enhanced C12 or C8 specificity, while maintaining high activity. After four rounds of structure-guided mutagenesis, we identified 3 variants with enhanced production of dodecanoic acid (C12) and 27 variants with enhanced production of octanoic acid (C8). The top variants reached up to 49% C12 and 50% C8 while exceeding native levels of total free fatty acids. A comparably sized library created by random mutagenesis failed to identify promising mutants. The chain length-preference of ‘TesA and the best mutant were confirmed in vitro using acyl-CoA substrates. Molecular dynamics simulations, confirmed by resolved crystal structures, of ‘TesA variants suggest that hydrophobic forces govern ‘TesA substrate specificity

  11. Quantum-Sequencing: Biophysics of quantum tunneling through nucleic acids

    NASA Astrophysics Data System (ADS)

    Casamada Ribot, Josep; Chatterjee, Anushree; Nagpal, Prashant

    2014-03-01

    Tunneling microscopy and spectroscopy have been used extensively in the physical surface sciences to study quantum tunneling, to measure the electronic local density of states of nanomaterials, and to characterize adsorbed species. Quantum-Sequencing (Q-Seq) is a new method based on tunneling microscopy for electronic sequencing of single molecules of nucleic acids. A major goal of third-generation sequencing technologies is to develop a fast, reliable, enzyme-free single-molecule sequencing method. Here, we present the unique "electronic fingerprints" for all nucleotides on DNA and RNA using Q-Seq, along with their intrinsic biophysical parameters. We have analyzed tunneling spectra for the nucleotides at different pH conditions and determined the HOMO, LUMO and energy gap for all of them. In addition, we show a number of biophysical parameters to further characterize all nucleobases (electron and hole transition voltage and energy barriers). These results highlight the robustness of Q-Seq as a technique for next-generation sequencing.

  12. Amino acid sequence of the Amur tiger prion protein.

    PubMed

    Wu, Changde; Pang, Wanyong; Zhao, Deming

    2006-10-01

    Prion diseases are fatal neurodegenerative disorders in humans and animals associated with conformational conversion of a cellular prion protein (PrP(C)) into the pathologic isoform (PrP(Sc)). Various data indicate that polymorphisms within the open reading frame (ORF) of PrP are associated with susceptibility and control the species barrier in prion diseases. In the present study, partial Prnp sequences from 25 Amur tigers (tPrnp) were cloned and screened for polymorphisms. Four single nucleotide polymorphisms (T423C, A501G, C511A, A610G) were found; the C511A and A610G nucleotide substitutions resulted in the amino acid changes Lysine171Glutamine and Alanine204Threonine, respectively. The tPrnp amino acid sequence is similar to those of the house cat (Felis catus) and sheep, but differs significantly from the two other cat Prnp sequences that were previously deposited in GenBank.

  13. Functional analysis of four naturally occurring variants of human constitutive androstane receptor.

    PubMed

    Ikeda, Shinobu; Kurose, Kouichi; Jinno, Hideto; Sai, Kimie; Ozawa, Shogo; Hasegawa, Ryuichi; Komamura, Kazuo; Kotake, Takeshi; Morishita, Hideki; Kamakura, Shiro; Kitakaze, Masafumi; Tomoike, Hitonobu; Tamura, Tomohide; Yamamoto, Noboru; Kunitoh, Hideo; Yamada, Yasuhide; Ohe, Yuichiro; Shimada, Yasuhiro; Shirao, Kuniaki; Kubota, Kaoru; Minami, Hironobu; Ohtsu, Atsushi; Yoshida, Teruhiko; Saijo, Nagahiro; Saito, Yoshiro; Sawada, Jun-ichi

    2005-01-01

    The human constitutive androstane receptor (CAR, NR1I3) is a member of the orphan nuclear receptor superfamily that plays an important role in the control of drug metabolism and disposition. In this study, we sequenced all the coding exons of the NR1I3 gene for 334 Japanese subjects. We identified three novel single nucleotide polymorphisms (SNPs) that induce non-synonymous alterations of amino acids (His246Arg, Leu308Pro, and Asn323Ser) residing in the ligand-binding domain of CAR, in addition to the Val133Gly variant, which was another CAR variant identified in our previous study. We performed functional analysis of these four naturally occurring CAR variants in COS-7 cells using a CYP3A4 promoter/enhancer reporter gene that includes the CAR responsive elements. The His246Arg variant caused marked reductions in both transactivation of the reporter gene and in the response to 6-(4-chlorophenyl)imidazo[2,1-b][1,3]thiazole-5-carbaldehyde O-(3,4-dichlorobenzyl)oxime (CITCO), which is a human CAR-specific agonist. The transactivation ability of the Leu308Pro variant was also significantly decreased, but its responsiveness to CITCO was not abrogated. The transactivation ability and CITCO response of the Val133Gly and Asn323Ser variants did not change as compared to the wild-type CAR. These data suggest that the His246Arg and Leu308Pro variants, especially His246Arg, may influence the expression of drug-metabolizing enzymes and transporters that are transactivated by CAR.

  14. Method for high-volume sequencing of nucleic acids: random and directed priming with libraries of oligonucleotides

    DOEpatents

    Studier, F. William

    1995-04-18

    Random and directed priming methods for determining nucleotide sequences by enzymatic sequencing techniques, using libraries of primers of lengths 8, 9 or 10 bases, are disclosed. These methods permit direct sequencing of nucleic acids as large as 45,000 base pairs or larger without the necessity for subcloning. Individual primers are used repeatedly to prime sequence reactions in many different nucleic acid molecules. Libraries containing as few as 10,000 octamers, 14,200 nonamers, or 44,000 decamers would have the capacity to determine the sequence of almost any cosmid DNA. Random priming with a fixed set of primers from a smaller library can also be used to initiate the sequencing of individual nucleic acid molecules, with the sequence being completed by directed priming with primers from the library. In contrast to random cloning techniques, a combined random and directed priming strategy is far more efficient.
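
    The library sizes quoted above can be put in context with a little arithmetic. The sketch below is a back-of-the-envelope estimate, not the selection scheme disclosed in the patent. It assumes a template with uniformly random base composition and a library that behaves like a random subset of all possible k-mers. The function names are hypothetical.

```python
def library_fraction(library_size: int, k: int) -> float:
    """Fraction of all 4**k possible k-mers that the library contains."""
    return library_size / 4 ** k


def expected_priming_sites(template_bp: int, library_size: int, k: int) -> float:
    """Expected number of exact-match sites over both strands.

    Assumption: every k-mer window of the template matches a library
    primer independently with probability library_fraction(...).
    """
    windows_both_strands = 2 * (template_bp - k + 1)
    return windows_both_strands * library_fraction(library_size, k)


# Library sizes and the 45,000 bp template length are taken from the abstract.
for k, size in [(8, 10_000), (9, 14_200), (10, 44_000)]:
    frac = library_fraction(size, k)
    sites = expected_priming_sites(45_000, size, k)
    print(f"k={k}: {frac:.1%} of all {4 ** k:,} {k}-mers, ~{sites:,.0f} expected sites")
```

    Under these simplifying assumptions, even a library holding a small fraction of all possible primers would be expected to find many matching sites on a cosmid-sized template. This is consistent with the abstract's claim that such libraries could sequence almost any cosmid DNA.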

  15. Method for high-volume sequencing of nucleic acids: random and directed priming with libraries of oligonucleotides

    DOEpatents

    Studier, F.W.

    1995-04-18

    Random and directed priming methods for determining nucleotide sequences by enzymatic sequencing techniques, using libraries of primers of lengths 8, 9 or 10 bases, are disclosed. These methods permit direct sequencing of nucleic acids as large as 45,000 base pairs or larger without the necessity for subcloning. Individual primers are used repeatedly to prime sequence reactions in many different nucleic acid molecules. Libraries containing as few as 10,000 octamers, 14,200 nonamers, or 44,000 decamers would have the capacity to determine the sequence of almost any cosmid DNA. Random priming with a fixed set of primers from a smaller library can also be used to initiate the sequencing of individual nucleic acid molecules, with the sequence being completed by directed priming with primers from the library. In contrast to random cloning techniques, a combined random and directed priming strategy is far more efficient. 2 figs.

  16. A rare variant in MYH6 is associated with high risk of sick sinus syndrome

    PubMed Central

    Holm, Hilma; Gudbjartsson, Daniel F; Sulem, Patrick; Masson, Gisli; Helgadottir, Hafdis Th; Zanon, Carlo; Magnusson, Olafur Th; Helgason, Agnar; Saemundsdottir, Jona; Gylfason, Arnaldur; Stefansdottir, Hrafnhildur; Gretarsdottir, Solveig; Matthiasson, Stefan E; Thorgeirsson, Guðmundur; Jonasdottir, Aslaug; Sigurdsson, Asgeir; Stefansson, Hreinn; Werge, Thomas; Rafnar, Thorunn; Kiemeney, Lambertus A; Parvez, Babar; Muhammad, Raafia; Roden, Dan M; Darbar, Dawood; Thorleifsson, Gudmar; Walters, G Bragi; Kong, Augustine; Thorsteinsdottir, Unnur; Arnar, David O; Stefansson, Kari

    2011-01-01

    Through complementary application of SNP genotyping, whole-genome sequencing and imputation in 38,384 Icelanders, we have discovered a previously unidentified sick sinus syndrome susceptibility gene, MYH6, encoding the alpha heavy chain subunit of cardiac myosin. A missense variant in this gene, c.2161C>T, results in the conceptual amino acid substitution p.Arg721Trp, has an allelic frequency of 0.38% in Icelanders and associates with sick sinus syndrome with an odds ratio = 12.53 and P = 1.5 × 10⁻²⁹. We show that the lifetime risk of being diagnosed with sick sinus syndrome is around 6% for non-carriers of c.2161C>T but is approximately 50% for carriers of the c.2161C>T variant. PMID:21378987

  17. Exome Sequencing in Suspected Monogenic Dyslipidemias

    PubMed Central

    Stitziel, Nathan O.; Peloso, Gina M.; Abifadel, Marianne; Cefalu, Angelo B.; Fouchier, Sigrid; Motazacker, M. Mahdi; Tada, Hayato; Larach, Daniel B.; Awan, Zuhier; Haller, Jorge F.; Pullinger, Clive R.; Varret, Mathilde; Rabès, Jean-Pierre; Noto, Davide; Tarugi, Patrizia; Kawashiri, Masa-aki; Nohara, Atsushi; Yamagishi, Masakazu; Risman, Marjorie; Deo, Rahul; Ruel, Isabelle; Shendure, Jay; Nickerson, Deborah A.; Wilson, James G.; Rich, Stephen S.; Gupta, Namrata; Farlow, Deborah N.; Neale, Benjamin M.; Daly, Mark J.; Kane, John P.; Freeman, Mason W.; Genest, Jacques; Rader, Daniel J.; Mabuchi, Hiroshi; Kastelein, John J.P.; Hovingh, G. Kees; Averna, Maurizio R.; Gabriel, Stacey; Boileau, Catherine; Kathiresan, Sekar

    2015-01-01

    Background Exome sequencing is a promising tool for gene mapping in Mendelian disorders. We utilized this technique in an attempt to identify novel genes underlying monogenic dyslipidemias. Methods and Results We performed exome sequencing on 213 selected family members from 41 kindreds with suspected Mendelian inheritance of extreme levels of low-density lipoprotein (LDL) cholesterol (after candidate gene sequencing excluded known genetic causes for high LDL cholesterol families) or high-density lipoprotein (HDL) cholesterol. We used standard analytic approaches to identify candidate variants and also assigned a polygenic score to each individual in order to account for their burden of common genetic variants known to influence lipid levels. In nine families, we identified likely pathogenic variants in known lipid genes (ABCA1, APOB, APOE, LDLR, LIPA, and PCSK9); however, we were unable to identify obvious genetic etiologies in the remaining 32 families despite follow-up analyses. We identified three factors that limited novel gene discovery: (1) imperfect sequencing coverage across the exome hid potentially causal variants; (2) large numbers of shared rare alleles within families obfuscated causal variant identification; and (3) individuals from 15% of families carried a significant burden of common lipid-related alleles, suggesting complex inheritance can masquerade as monogenic disease. Conclusions We identified the genetic basis of disease in nine of 41 families; however, none of these represented novel gene discoveries. Our results highlight the promise and limitations of exome sequencing as a discovery technique in suspected monogenic dyslipidemias. Considering the confounders identified may inform the design of future exome sequencing studies. PMID:25632026

  18. Complex Analysis of Urate Transporters SLC2A9, SLC22A12 and Functional Characterization of Non-Synonymous Allelic Variants of GLUT9 in the Czech Population: No Evidence of Effect on Hyperuricemia and Gout

    PubMed Central

    Hurba, Olha; Mancikova, Andrea; Krylov, Vladimir; Pavlikova, Marketa; Pavelka, Karel; Stibůrková, Blanka

    2014-01-01

    Objective Using Czech populations of European descent, we performed a study of SLC2A9 and SLC22A12 genes previously identified as being associated with serum uric acid concentrations and gout. This is the first study of the impact of non-synonymous allelic variants on the function of GLUT9 except for patients suffering from renal hypouricemia type 2. Methods The cohort consisted of 250 individuals (150 controls, 54 nonspecific hyperuricemics and 46 primary gout and/or hyperuricemia subjects). We analyzed 13 exons of SLC2A9 (GLUT9 variant 1 and GLUT9 variant 2) and 10 exons of SLC22A12 by PCR amplification and direct sequencing. Allelic variants were prepared and their urate uptake and subcellular localization were studied using a Xenopus oocyte expression system. The functional studies were analyzed using the non-parametric Wilcoxon and Kruskal-Wallis tests; the association study used the Fisher exact test and linear regression approach. Results We identified a total of 52 sequence variants (12 unpublished). Eight non-synonymous allelic variants were found only in SLC2A9: rs6820230, rs2276961, rs144196049, rs112404957, rs73225891, rs16890979, rs3733591 and rs2280205. None of these variants showed any significant difference in the expression of GLUT9 and in urate transport. In the association study, eight variants showed a possible association with hyperuricemia. However, seven of these were in introns and the one exon-located variant, rs7932775, did not show a statistically significant association with serum uric acid concentration. Conclusion Our results did not confirm any effect of SLC22A12 and SLC2A9 variants on serum uric acid concentration. Our complex approach using association analysis together with functional and immunohistochemical characterization of non-synonymous allelic variants did not show any influence on expression, subcellular localization and urate uptake of GLUT9. PMID:25268603

  19. Whole-genome sequencing reveals a coding non-pathogenic variant tagging a non-coding pathogenic hexanucleotide repeat expansion in C9orf72 as cause of amyotrophic lateral sclerosis.

    PubMed

    Herdewyn, Sarah; Zhao, Hui; Moisse, Matthieu; Race, Valérie; Matthijs, Gert; Reumers, Joke; Kusters, Benno; Schelhaas, Helenius J; van den Berg, Leonard H; Goris, An; Robberecht, Wim; Lambrechts, Diether; Van Damme, Philip

    2012-06-01

    Motor neuron degeneration in amyotrophic lateral sclerosis (ALS) has a familial cause in 10% of patients. Despite significant advances in the genetics of the disease, many families remain unexplained. We performed whole-genome sequencing in five family members from a pedigree with autosomal-dominant classical ALS. A family-based elimination approach was used to identify novel coding variants segregating with the disease. This list of variants was effectively shortened by genotyping these variants in 2 additional unaffected family members and 1500 unrelated population-specific controls. A novel rare coding variant in SPAG8 on chromosome 9p13.3 segregated with the disease and was not observed in controls. Mutations in SPAG8 were not encountered in 34 other unexplained ALS pedigrees, including 1 with linkage to chromosome 9p13.2-23.3. The shared haplotype containing the SPAG8 variant in this small pedigree was 22.7 Mb and overlapped with the core 9p21 linkage locus for ALS and frontotemporal dementia. Based on differences in coverage depth of known variable tandem repeat regions between affected and non-affected family members, the shared haplotype was found to contain an expanded hexanucleotide (GGGGCC)n repeat in C9orf72 in the affected members. Our results demonstrate that rare coding variants identified by whole-genome sequencing can tag a shared haplotype containing a non-coding pathogenic mutation and that changes in coverage depth can be used to reveal tandem repeat expansions. They also confirm (GGGGCC)n repeat expansions in C9orf72 as a cause of familial ALS.

  20. REVEL: An Ensemble Method for Predicting the Pathogenicity of Rare Missense Variants.

    PubMed

    Ioannidis, Nilah M; Rothstein, Joseph H; Pejaver, Vikas; Middha, Sumit; McDonnell, Shannon K; Baheti, Saurabh; Musolf, Anthony; Li, Qing; Holzinger, Emily; Karyadi, Danielle; Cannon-Albright, Lisa A; Teerlink, Craig C; Stanford, Janet L; Isaacs, William B; Xu, Jianfeng; Cooney, Kathleen A; Lange, Ethan M; Schleutker, Johanna; Carpten, John D; Powell, Isaac J; Cussenot, Olivier; Cancel-Tassin, Geraldine; Giles, Graham G; MacInnis, Robert J; Maier, Christiane; Hsieh, Chih-Lin; Wiklund, Fredrik; Catalona, William J; Foulkes, William D; Mandal, Diptasri; Eeles, Rosalind A; Kote-Jarai, Zsofia; Bustamante, Carlos D; Schaid, Daniel J; Hastie, Trevor; Ostrander, Elaine A; Bailey-Wilson, Joan E; Radivojac, Predrag; Thibodeau, Stephen N; Whittemore, Alice S; Sieh, Weiva

    2016-10-06

    The vast majority of coding variants are rare, and assessment of the contribution of rare variants to complex traits is hampered by low statistical power and limited functional data. Improved methods for predicting the pathogenicity of rare coding variants are needed to facilitate the discovery of disease variants from exome sequencing studies. We developed REVEL (rare exome variant ensemble learner), an ensemble method for predicting the pathogenicity of missense variants on the basis of individual tools: MutPred, FATHMM, VEST, PolyPhen, SIFT, PROVEAN, MutationAssessor, MutationTaster, LRT, GERP, SiPhy, phyloP, and phastCons. REVEL was trained with recently discovered pathogenic and rare neutral missense variants, excluding those previously used to train its constituent tools. When applied to two independent test sets, REVEL had the best overall performance (p < 10^-12) as compared to any individual tool and seven ensemble methods: MetaSVM, MetaLR, KGGSeq, Condel, CADD, DANN, and Eigen. Importantly, REVEL also had the best performance for distinguishing pathogenic from rare neutral variants with allele frequencies <0.5%. The area under the receiver operating characteristic curve (AUC) for REVEL was 0.046-0.182 higher in an independent test set of 935 recent SwissVar disease variants and 123,935 putatively neutral exome sequencing variants and 0.027-0.143 higher in an independent test set of 1,953 pathogenic and 2,406 benign variants recently reported in ClinVar than the AUCs for other ensemble methods. We provide pre-computed REVEL scores for all possible human missense variants to facilitate the identification of pathogenic variants in the sea of rare variants discovered as sequencing studies expand in scale. Copyright © 2016 American Society of Human Genetics. All rights reserved.
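
    The AUC comparison reported in this abstract can be illustrated with a minimal sketch. The function below computes the area under the ROC curve as the fraction of (pathogenic, neutral) score pairs that are ranked correctly, counting ties as half. The scores are invented for illustration and are not REVEL outputs; the function name is likewise hypothetical.

```python
def roc_auc(pathogenic_scores, neutral_scores):
    """Rank-based AUC: probability that a pathogenic variant outscores a neutral one."""
    wins = 0.0
    for p in pathogenic_scores:
        for n in neutral_scores:
            if p > n:
                wins += 1.0   # correctly ranked pair
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pathogenic_scores) * len(neutral_scores))

# Hypothetical predictor scores (higher = more likely pathogenic).
pathogenic = [0.9, 0.8, 0.4]
neutral = [0.7, 0.3, 0.2, 0.1]

auc = roc_auc(pathogenic, neutral)
print(f"AUC = {auc:.3f}")
```

    With these made-up scores, 11 of the 12 pairs are ranked correctly. A difference of 0.046-0.182 in AUC, as quoted above, is a difference in this pairwise ranking probability.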

  1. VaDiR: an integrated approach to Variant Detection in RNA.

    PubMed

    Neums, Lisa; Suenaga, Seiji; Beyerlein, Peter; Anders, Sara; Koestler, Devin; Mariani, Andrea; Chien, Jeremy

    2018-02-01

    Advances in next-generation DNA sequencing technologies are now enabling detailed characterization of sequence variations in cancer genomes. With whole-genome sequencing, variations in coding and non-coding sequences can be discovered, but its cost currently limits its general use in research. Whole-exome sequencing is used to characterize sequence variations in coding regions, but the cost associated with capture reagents and biases in capture rate limit its full use in research. Additional limitations include uncertainty in assigning the functional significance of the mutations when these mutations are observed in the non-coding region or in genes that are not expressed in cancer tissue. We investigated the feasibility of uncovering mutations from expressed genes using RNA sequencing datasets with a method called Variant Detection in RNA (VaDiR) that integrates 3 variant callers, namely: SNPiR, RVBoost, and MuTect2. The combination of all 3 methods, which we called Tier 1 variants, produced the highest precision with true positive mutations from RNA-seq that could be validated at the DNA level. We also found that the integration of Tier 1 variants with those called by MuTect2 and SNPiR produced the highest recall with acceptable precision. Finally, we observed a higher rate of mutation discovery in genes that are expressed at higher levels. Our method, VaDiR, provides a possibility of uncovering mutations from RNA sequencing datasets that could be useful in further functional analysis. In addition, our approach allows orthogonal validation of DNA-based mutation discovery by providing complementary sequence variation analysis from paired RNA/DNA sequencing datasets.
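
    The Tier 1 idea described above (variants reported by all three callers) can be sketched as a set intersection. This is only an illustration under stated assumptions: the variant keys, the call sets, and the DNA-validated truth set are all invented, and the sketch does not reproduce how VaDiR itself is implemented.

```python
def precision_recall(called, truth):
    """Precision and recall of a call set against a validated truth set."""
    true_positives = len(called & truth)
    precision = true_positives / len(called) if called else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall

# Hypothetical calls keyed by (chromosome, position, ref, alt).
A = ("chr1", 100, "A", "G")
B = ("chr2", 200, "C", "T")
C = ("chr3", 300, "G", "A")
D = ("chr4", 400, "T", "C")
E = ("chr5", 500, "A", "T")
F = ("chr6", 600, "G", "C")

snpir = {A, B, C, D}
rvboost = {A, B, E}
mutect2 = {A, B, C, F}
dna_validated = {A, B, C}          # assumed truth set

tier1 = snpir & rvboost & mutect2  # called by all three callers
any_caller = snpir | rvboost | mutect2

print("Tier 1:", precision_recall(tier1, dna_validated))
print("Any caller:", precision_recall(any_caller, dna_validated))
```

    In this toy data the intersection trades recall for precision, which matches the qualitative pattern the abstract reports for Tier 1 variants.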

  2. A novel HLA-B allele, B*5214, detected in a Taiwanese volunteer bone marrow donor using a sequence-based typing method.

    PubMed

    Chen, M J; Chu, C C; Shyr, M H; Lin, C L; Lin, P Y; Yang, K L

    2010-02-01

    HLA-B*5214, a novel rare allele of the HLA-B*52 variant, was found in a Taiwanese volunteer bone marrow donor by a sequence-based typing method. The sequence of B*5214 is identical to that of B*520101 in exon 2 but differs from B*520101 in exon 3 at nucleotide positions 419 A-->T and 435 A-->G. Alteration of these two nucleotides resulted in an amino acid substitution at residue 116 Y-->F (TAC-->TTC) and a silent exchange at residue 121 K-->K (AAA-->AAG).
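
    The two codon changes quoted above can be checked against the standard genetic code with a short sketch. The lookup table is deliberately limited to the four codons named in the abstract, and the function name is a hypothetical helper.

```python
# Standard genetic code entries for the codons named in the abstract only.
CODON_TABLE = {"TAC": "Y", "TTC": "F", "AAA": "K", "AAG": "K"}

def classify_change(ref_codon, alt_codon):
    """Translate both codons and label the change as silent or missense."""
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    kind = "silent" if ref_aa == alt_aa else "missense"
    return ref_aa, alt_aa, kind

# Residue number -> (reference codon, B*5214 codon), as given in the abstract.
changes = {116: ("TAC", "TTC"), 121: ("AAA", "AAG")}
results = {res: classify_change(*codons) for res, codons in changes.items()}

for residue, (ref_aa, alt_aa, kind) in results.items():
    print(f"residue {residue}: {ref_aa}->{alt_aa} ({kind})")
```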

  3. Stability and function of interdomain linker variants of glucoamylase 1 from Aspergillus niger.

    PubMed

    Sauer, J; Christensen, T; Frandsen, T P; Mirgorodskaya, E; McGuire, K A; Driguez, H; Roepstorff, P; Sigurskjold, B W; Svensson, B

    2001-08-07

    Several variants of glucoamylase 1 (GA1) from Aspergillus niger were created in which the highly O-glycosylated peptide (aa 468--508) connecting the (alpha/alpha)(6)-barrel catalytic domain and the starch binding domain was substituted at the gene level by equivalent segments of glucoamylases from Hormoconis resinae, Humicola grisea, and Rhizopus oryzae encoding 5, 19, and 36 amino acid residues. Variants were constructed in which the H. resinae linker was elongated by proline-rich sequences as this linker itself apparently was too short to allow formation of the corresponding protein variant. Size and isoelectric point of GA1 variants reflected differences in linker length, posttranslational modification, and net charge. While calculated polypeptide chain molecular masses for wild-type GA1, a nonnatural proline-rich linker variant, H. grisea, and R. oryzae linker variants were 65,784, 63,777, 63,912, and 65,614 Da, respectively, MALDI-TOF-MS gave values of 82,042, 73,800, 73,413, and 90,793 Da, respectively, where the latter value could partly be explained by an N-glycosylation site introduced near the linker C-terminus. The k(cat) and K(m) for hydrolysis of maltooligodextrins and soluble starch, and the rate of hydrolysis of barley starch granules were essentially the same for the variants as for wild-type GA1. beta-Cyclodextrin, acarbose, and two heterobidentate inhibitors were found by isothermal titration calorimetry to bind to the catalytic and starch binding domains of the linker variants, indicating that the function of the active site and the starch binding site was maintained. The stability of GA1 linker variants toward GdnHCl and heat, however, was reduced compared to wild-type.
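
    The gap between the calculated polypeptide masses and the MALDI-TOF-MS values quoted above can be made explicit with simple subtraction. The sketch below uses only the eight numbers from the abstract; reading the excess as mostly post-translational modification (glycosylation) is an interpretation consistent with the abstract, not a figure it reports.

```python
# (calculated polypeptide mass, MALDI-TOF-MS mass) in Da, from the abstract.
masses = {
    "wild-type GA1": (65784, 82042),
    "proline-rich linker variant": (63777, 73800),
    "H. grisea linker variant": (63912, 73413),
    "R. oryzae linker variant": (65614, 90793),
}

# Mass not accounted for by the polypeptide chain alone.
excess = {
    name: measured - calculated
    for name, (calculated, measured) in masses.items()
}

for name, delta in excess.items():
    calculated = masses[name][0]
    print(f"{name}: +{delta} Da ({delta / calculated:.1%} above calculated)")
```

    The R. oryzae linker variant shows the largest excess, in line with the extra N-glycosylation site mentioned in the abstract.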

  4. Two Novel Variants Affecting CDKL5 Transcript Associated with Epileptic Encephalopathy.

    PubMed

    Neupauerová, Jana; Štěrbová, Katalin; Vlčková, Markéta; Sebroňová, Věra; Maříková, Tat'ána; Krůtová, Marcela; David, Staněk; Kršek, Pavel; Žaliová, Markéta; Seeman, Pavel; Laššuthová, Petra

    2017-10-01

    Variants in the human X-linked cyclin-dependent kinase-like 5 (CDKL5) gene have been reported as being etiologically associated with early infantile epileptic encephalopathy type 2 (EIEE2). We report on two patients, a boy and a girl, with EIEE2 who presented with early onset epilepsy, hypotonia, severe intellectual disability, and poor eye contact. Massively parallel sequencing (MPS) of a custom-designed gene panel for epilepsy and epileptic encephalopathy containing 112 epilepsy-related genes was performed. Sanger sequencing was used to confirm the novel variants. For confirmation of the functional consequence of an intronic CDKL5 variant in patient 2, an RNA study was done. DNA sequencing revealed de novo variants in CDKL5: a c.2578C>T (p.Gln860*) variant present in a hemizygous state in a 3-year-old boy, and a potential splice site variant c.463+5G>A in a heterozygous state in a 5-year-old girl. Multiple in silico splicing algorithms predicted a highly reduced splice site score for c.463+5G>A. A subsequent mRNA study confirmed an aberrant shorter transcript lacking exon 7. Our data confirmed that variants in CDKL5 are associated with EIEE2. There is credible evidence that the novel identified variants are pathogenic and, therefore, are likely the cause of the disease in the presented patients. In one of the patients a stop codon variant is predicted to produce a truncated protein, and in the other patient an intronic variant results in aberrant splicing.

  5. Histone H3 Variants in Trichomonas vaginalis

    PubMed Central

    Zubáčová, Zuzana; Hostomská, Jitka

    2012-01-01

    The parabasalid protist Trichomonas vaginalis is a widespread parasite that affects humans, frequently causing vaginitis in infected women. Trichomonad mitosis is marked by the persistence of the nuclear membrane and the presence of an asymmetric extranuclear spindle with no obvious direct connection to the chromosomes. No centromeric markers have been described in T. vaginalis, which has prevented a detailed analysis of mitotic events in this organism. In other eukaryotes, nucleosomes of centromeric chromatin contain the histone H3 variant CenH3. The principal aim of this work was to identify a CenH3 homolog in T. vaginalis. We performed a screen of the T. vaginalis genome to retrieve sequences of canonical and variant H3 histones. Three variant histone H3 proteins were identified, and the subcellular localization of their epitope-tagged variants was determined. The localization of the variant TVAG_185390 could not be distinguished from that of the canonical H3 histone. The sequence of the variant TVAG_087830 closely resembled that of histone H3. The tagged protein colocalized with sites of active transcription, indicating that the variant TVAG_087830 represented H3.3 in T. vaginalis. The third H3 variant (TVAG_224460) was localized to 6 or 12 distinct spots at the periphery of the nucleus, corresponding to the number of chromosomes in G1 phase and G2 phase, respectively. We propose that this variant represents the centromeric marker CenH3 and thus can be employed as a tool to study mitosis in T. vaginalis. Furthermore, we suggest that the peripheral distribution of CenH3 within the nucleus results from the association of centromeres with the nuclear envelope throughout the cell cycle. PMID:22408228

  6. Inferring Short-Range Linkage Information from Sequencing Chromatograms

    PubMed Central

    Beggel, Bastian; Neumann-Fraune, Maria; Kaiser, Rolf; Verheyen, Jens; Lengauer, Thomas

    2013-01-01

    Direct Sanger sequencing of viral genome populations yields multiple ambiguous sequence positions. It is not straightforward to derive linkage information from sequencing chromatograms, which in turn hampers the correct interpretation of the sequence data. We present a method for determining the variants existing in a viral quasispecies in the case of two nearby ambiguous sequence positions by exploiting the effect of sequence context-dependent incorporation of dideoxynucleotides. The computational model was trained on data from sequencing chromatograms of clonal variants and was evaluated on two test sets of in vitro mixtures. The approach achieved high accuracies in identifying the mixture components of 97.4% on a test set in which the positions to be analyzed are only one base apart from each other, and of 84.5% on a test set in which the ambiguous positions are separated by three bases. In silico experiments suggest two major limitations of our approach in terms of accuracy. First, due to a basic limitation of Sanger sequencing, it is not possible to reliably detect minor variants with a relative frequency of no more than 10%. Second, the model cannot distinguish between mixtures of two or four clonal variants, if one of two sets of linear constraints is fulfilled. Furthermore, the approach requires repetitive sequencing of all variants that might be present in the mixture to be analyzed. Nevertheless, the effectiveness of our method on the two in vitro test sets shows that short-range linkage information of two ambiguous sequence positions can be inferred from Sanger sequencing chromatograms without any further assumptions on the mixture composition. Additionally, our model provides new insights into the established and widely used Sanger sequencing technology. The source code of our method is made available at http://bioinf.mpi-inf.mpg.de/publications/beggel/linkageinformation.zip. PMID:24376502

  7. Continuously tunable nucleic acid hybridization probes.

    PubMed

    Wu, Lucia R; Wang, Juexiao Sherry; Fang, John Z; Evans, Emily R; Pinto, Alessandro; Pekker, Irena; Boykin, Richard; Ngouenet, Celine; Webster, Philippa J; Beechem, Joseph; Zhang, David Yu

    2015-12-01

    In silico-designed nucleic acid probes and primers often do not achieve favorable specificity and sensitivity tradeoffs on the first try, and iterative empirical sequence-based optimization is needed, particularly in multiplexed assays. We present a novel, on-the-fly method of tuning probe affinity and selectivity by adjusting the stoichiometry of auxiliary species, which allows for independent and decoupled adjustment of the hybridization yield for different probes in multiplexed assays. Using this method, we achieved near-continuous tuning of probe effective free energy. To demonstrate our approach, we enforced uniform capture efficiency of 31 DNA molecules (GC content, 0-100%), maximized the signal difference for 11 pairs of single-nucleotide variants and performed tunable hybrid capture of mRNA from total RNA. Using the Nanostring nCounter platform, we applied stoichiometric tuning to simultaneously adjust yields for a 24-plex assay, and we show multiplexed quantitation of RNA sequences and variants from formalin-fixed, paraffin-embedded samples.

  8. Continuously Tunable Nucleic Acid Hybridization Probes

    PubMed Central

    Wu, Lucia R.; Wang, J. Sherry; Fang, John Z.; Reiser, Emily; Pinto, Alessandro; Pekker, Irena; Boykin, Richard; Ngouenet, Celine; Webster, Philippa J.; Beechem, Joseph; Zhang, David Yu

    2015-01-01

    In silico designed nucleic acid probes and primers often fail to achieve favorable specificity and sensitivity tradeoffs on the first try, and iterative empirical sequence-based optimization is needed, particularly in multiplexed assays. Here, we present a novel, on-the-fly method of tuning probe affinity and selectivity via the stoichiometry of auxiliary species, allowing independent and decoupled adjustment of hybridization yield for different probes in multiplexed assays. Using this method, we achieve near-continuous tuning of probe effective free energy (0.03 kcal·mol−1 granularity). As applications, we enforced uniform capture efficiency of 31 DNA molecules (GC content 0% – 100%), maximized signal difference for 11 pairs of single nucleotide variants, and performed tunable hybrid-capture of mRNA from total RNA. Using the Nanostring nCounter platform, we applied stoichiometric tuning to simultaneously adjust yields for a 24-plex assay, and we show multiplexed quantitation of RNA sequences and variants from formalin-fixed, paraffin-embedded samples (FFPE). PMID:26480474

  9. Conservation of Shannon's redundancy for proteins. [information theory applied to amino acid sequences]

    NASA Technical Reports Server (NTRS)

    Gatlin, L. L.

    1974-01-01

    Concepts of information theory are applied to examine various proteins in terms of their redundancy in natural originators such as animals and plants. The Monte Carlo method is used to derive information parameters for random protein sequences. Real protein sequence parameters are compared with the standard parameters of protein sequences having a specific length. The tendency of a chain to contain some amino acids more frequently than others and the tendency of a chain to contain certain amino acid pairs more frequently than other pairs are used as randomness measures of individual protein sequences. Non-periodic proteins are generally found to have random Shannon redundancies except in cases of constraints due to short chain length and genetic codes. Redundant characteristics of highly periodic proteins are discussed. A degree of periodicity parameter is derived.
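
    One of the randomness measures mentioned above, the tendency of a chain to contain some amino acids more often than others, can be sketched with first-order Shannon entropy. This is a simplified, composition-only redundancy (1 minus entropy over its maximum for a 20-letter alphabet); it ignores the amino acid pair statistics the abstract also uses and is not claimed to be the report's exact definition. Function names and example sequences are hypothetical.

```python
import math
from collections import Counter

def shannon_entropy(sequence):
    """First-order Shannon entropy in bits per residue."""
    counts = Counter(sequence)
    total = len(sequence)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def composition_redundancy(sequence, alphabet_size=20):
    """1 - H / H_max, where H_max = log2(alphabet_size)."""
    return 1.0 - shannon_entropy(sequence) / math.log2(alphabet_size)

uniform = "ACDEFGHIKLMNPQRSTVWY"   # each of the 20 amino acids once
periodic = "GPGPGPGPGPGPGPGPGPGP"  # toy two-residue repeat

print(f"uniform:  {composition_redundancy(uniform):.3f}")
print(f"periodic: {composition_redundancy(periodic):.3f}")
```

    A sequence using all 20 residues equally has redundancy near zero, while the two-residue repeat carries only 1 bit per residue out of a possible log2(20), so its redundancy is high.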

  10. Discovery of a Mammalian Splice Variant of Myostatin That Stimulates Myogenesis

    PubMed Central

    Jeanplong, Ferenc; Falconer, Shelley J.; Oldham, Jenny M.; Thomas, Mark; Gray, Tarra S.; Hennebry, Alex; Matthews, Kenneth G.; Kemp, Frederick C.; Patel, Ketan; Berry, Carole; Nicholas, Gina; McMahon, Christopher D.

    2013-01-01

    Myostatin plays a fundamental role in regulating the size of skeletal muscles. To date, only a single myostatin gene and no splice variants have been identified in mammals. Here we describe the splicing of a cryptic intron that removes the coding sequence for the receptor binding moiety of sheep myostatin. The deduced polypeptide sequence of the myostatin splice variant (MSV) contains a 256 amino acid N-terminal domain, which is common to myostatin, and a unique C-terminus of 65 amino acids. Western immunoblotting demonstrated that MSV mRNA is translated into protein, which is present in skeletal muscles. To determine the biological role of MSV, we developed an MSV over-expressing C2C12 myoblast line and showed that it proliferated faster than that of the control line in association with an increased abundance of the CDK2/Cyclin E complex in the nucleus. Recombinant protein made for the novel C-terminus of MSV also stimulated myoblast proliferation and bound to myostatin with high affinity as determined by surface plasmon resonance assay. Therefore, we postulated that MSV functions as a binding protein and antagonist of myostatin. Consistent with our postulate, myostatin protein was co-immunoprecipitated from skeletal muscle extracts with an MSV-specific antibody. MSV over-expression in C2C12 myoblasts blocked myostatin-induced Smad2/3-dependent signaling, thereby confirming that MSV antagonizes the canonical myostatin pathway. Furthermore, MSV over-expression increased the abundance of MyoD, Myogenin and MRF4 proteins (P<0.05), which indicates that MSV stimulates myogenesis through the induction of myogenic regulatory factors. To help elucidate a possible role in vivo, we observed that MSV protein was more abundant during early post-natal muscle development, while myostatin remained unchanged, which suggests that MSV may promote the growth of skeletal muscles. We conclude that MSV represents a unique example of intra-genic regulation in which a splice variant

  11. Characterization of a novel variant of Mycobacterium chimaera.

    PubMed

    van Ingen, J; Hoefsloot, W; Buijtels, P C A M; Tortoli, E; Supply, P; Dekhuijzen, P N R; Boeree, M J; van Soolingen, D

    2012-09-01

    In this study, nonchromogenic mycobacteria were isolated from pulmonary samples of three patients in the Netherlands. All isolates had identical, unique 16S rRNA gene and 16S-23S ITS sequences, which were closely related to those of Mycobacterium chimaera and Mycobacterium marseillense. The biochemical features of the isolates differed slightly from those of M. chimaera, suggesting that the isolates may represent a possible separate species within the Mycobacterium avium complex (MAC). However, the cell-wall mycolic acid pattern, analysed by HPLC, and the partial sequences of the hsp65 and rpoB genes were identical to those of M. chimaera. We concluded that the isolates represent a novel variant of M. chimaera. The results of this analysis have led us to question the currently used methods of species definition for members of the genus Mycobacterium, which are based largely on 16S rRNA or rpoB gene sequencing. Definitions based on a single genetic target are likely to be insufficient. Genetic divergence, especially in the MAC, yields strains that cannot be confidently assigned to a specific species based on the analysis of a single genetic target.

  12. Cellulase variants with improved expression, activity and stability, and use thereof

    DOEpatents

    Aehle, Wolfgang; Bott, Richard R; Bower, Benjamin; Caspi, Jonathan; Estell, David A; Goedegebuur, Frits; Hommes, Ronaldus W.J.; Kaper, Thijs; Kelemen, Bradley; Kralj, Slavko; Van Lieshout, Johan; Nikolaev, Igor; Van Stigt Thans, Sander; Wallace, Louise; Vogtentanz, Gudrun; Sandgren, Mats

    2014-03-25

    The present disclosure relates to cellulase variants. In particular the present disclosure relates to cellulase variants having improved expression, activity and/or stability. Also described are nucleic acids encoding the cellulase variants, compositions comprising the cellulase variants, and methods of use thereof.

  13. Cellulase variants with improved expression, activity and stability, and use thereof

    DOEpatents

    Aehle, Wolfgang; Bott, Richard R.; Bower, Benjamin S.; Caspi, Jonathan; Goedegebuur, Frits; Hommes, Ronaldus Wilhelmus Joannes; Kaper, Thijs; Kelemen, Bradley R.; Kralj, Slavko; Van Lieshout, Johannes Franciscus Thomas; Nikolaev, Igor; Wallace, Louise; Van Stigt Thans, Sander; Vogtentanz, Gudrun; Sandgren, Mats

    2016-12-20

    The present disclosure relates to cellulase variants. In particular the present disclosure relates to cellulase variants having improved expression, activity and/or stability. Also described are nucleic acids encoding the cellulase variants, compositions comprising the cellulase variants, and methods of use thereof.

  14. Hundreds of variants clustered in genomic loci and biological pathways affect human height

    PubMed Central

    Lango Allen, Hana; Estrada, Karol; Lettre, Guillaume; Berndt, Sonja I.; Weedon, Michael N.; Rivadeneira, Fernando; Willer, Cristen J.; Jackson, Anne U.; Vedantam, Sailaja; Raychaudhuri, Soumya; Ferreira, Teresa; Wood, Andrew R.; Weyant, Robert J.; Segrè, Ayellet V.; Speliotes, Elizabeth K.; Wheeler, Eleanor; Soranzo, Nicole; Park, Ju-Hyun; Yang, Jian; Gudbjartsson, Daniel; Heard-Costa, Nancy L.; Randall, Joshua C.; Qi, Lu; Smith, Albert Vernon; Mägi, Reedik; Pastinen, Tomi; Liang, Liming; Heid, Iris M.; Luan, Jian'an; Thorleifsson, Gudmar; Winkler, Thomas W.; Goddard, Michael E.; Lo, Ken Sin; Palmer, Cameron; Workalemahu, Tsegaselassie; Aulchenko, Yurii S.; Johansson, Åsa; Zillikens, M.Carola; Feitosa, Mary F.; Esko, Tõnu; Johnson, Toby; Ketkar, Shamika; Kraft, Peter; Mangino, Massimo; Prokopenko, Inga; Absher, Devin; Albrecht, Eva; Ernst, Florian; Glazer, Nicole L.; Hayward, Caroline; Hottenga, Jouke-Jan; Jacobs, Kevin B.; Knowles, Joshua W.; Kutalik, Zoltán; Monda, Keri L.; Polasek, Ozren; Preuss, Michael; Rayner, Nigel W.; Robertson, Neil R.; Steinthorsdottir, Valgerdur; Tyrer, Jonathan P.; Voight, Benjamin F.; Wiklund, Fredrik; Xu, Jianfeng; Zhao, Jing Hua; Nyholt, Dale R.; Pellikka, Niina; Perola, Markus; Perry, John R.B.; Surakka, Ida; Tammesoo, Mari-Liis; Altmaier, Elizabeth L.; Amin, Najaf; Aspelund, Thor; Bhangale, Tushar; Boucher, Gabrielle; Chasman, Daniel I.; Chen, Constance; Coin, Lachlan; Cooper, Matthew N.; Dixon, Anna L.; Gibson, Quince; Grundberg, Elin; Hao, Ke; Junttila, M. Juhani; Kaplan, Lee M.; Kettunen, Johannes; König, Inke R.; Kwan, Tony; Lawrence, Robert W.; Levinson, Douglas F.; Lorentzon, Mattias; McKnight, Barbara; Morris, Andrew P.; Müller, Martina; Ngwa, Julius Suh; Purcell, Shaun; Rafelt, Suzanne; Salem, Rany M.; Salvi, Erika; Sanna, Serena; Shi, Jianxin; Sovio, Ulla; Thompson, John R.; Turchin, Michael C.; Vandenput, Liesbeth; Verlaan, Dominique J.; Vitart, Veronique; White, Charles C.; Ziegler, Andreas; Almgren, Peter; Balmforth, Anthony J.; Campbell, Harry; Citterio, Lorena; De Grandi, Alessandro; Dominiczak, Anna; Duan, Jubao; Elliott, Paul; Elosua, Roberto; Eriksson, Johan G.; Freimer, Nelson B.; Geus, Eco J.C.; Glorioso, Nicola; Haiqing, Shen; Hartikainen, Anna-Liisa; Havulinna, Aki S.; Hicks, Andrew A.; Hui, Jennie; Igl, Wilmar; Illig, Thomas; Jula, Antti; Kajantie, Eero; Kilpeläinen, Tuomas O.; Koiranen, Markku; Kolcic, Ivana; Koskinen, Seppo; Kovacs, Peter; Laitinen, Jaana; Liu, Jianjun; Lokki, Marja-Liisa; Marusic, Ana; Maschio, Andrea; Meitinger, Thomas; Mulas, Antonella; Paré, Guillaume; Parker, Alex N.; Peden, John F.; Petersmann, Astrid; Pichler, Irene; Pietiläinen, Kirsi H.; Pouta, Anneli; Ridderstråle, Martin; Rotter, Jerome I.; Sambrook, Jennifer G.; Sanders, Alan R.; Schmidt, Carsten Oliver; Sinisalo, Juha; Smit, Jan H.; Stringham, Heather M.; Walters, G.Bragi; Widen, Elisabeth; Wild, Sarah H.; Willemsen, Gonneke; Zagato, Laura; Zgaga, Lina; Zitting, Paavo; Alavere, Helene; Farrall, Martin; McArdle, Wendy L.; Nelis, Mari; Peters, Marjolein J.; Ripatti, Samuli; van Meurs, Joyce B.J.; Aben, Katja K.; Ardlie, Kristin G; Beckmann, Jacques S.; Beilby, John P.; Bergman, Richard N.; Bergmann, Sven; Collins, Francis S.; Cusi, Daniele; den Heijer, Martin; Eiriksdottir, Gudny; Gejman, Pablo V.; Hall, Alistair S.; Hamsten, Anders; Huikuri, Heikki V.; Iribarren, Carlos; Kähönen, Mika; Kaprio, Jaakko; Kathiresan, Sekar; Kiemeney, Lambertus; Kocher, Thomas; Launer, Lenore J.; Lehtimäki, Terho; Melander, Olle; Mosley, Tom H.; Musk, Arthur W.; Nieminen, Markku S.; O'Donnell, Christopher J.; Ohlsson, Claes; Oostra, Ben; Palmer, Lyle J.; Raitakari, Olli; Ridker, Paul M.; Rioux, John D.; Rissanen, Aila; Rivolta, Carlo; Schunkert, Heribert; Shuldiner, Alan R.; Siscovick, David S.; Stumvoll, Michael; Tönjes, Anke; Tuomilehto, Jaakko; van Ommen, Gert-Jan; Viikari, Jorma; Heath, Andrew C.; Martin, Nicholas G.; Montgomery, Grant W.; Province, Michael A.; Kayser, Manfred; Arnold, Alice M.; Atwood, Larry D.; Boerwinkle, Eric; Chanock, Stephen J.; Deloukas, Panos; Gieger, Christian; Grönberg, Henrik; Hall, Per; Hattersley, Andrew T.; Hengstenberg, Christian; Hoffman, Wolfgang; Lathrop, G.Mark; Salomaa, Veikko; Schreiber, Stefan; Uda, Manuela; Waterworth, Dawn; Wright, Alan F.; Assimes, Themistocles L.; Barroso, Inês; Hofman, Albert; Mohlke, Karen L.; Boomsma, Dorret I.; Caulfield, Mark J.; Cupples, L.Adrienne; Erdmann, Jeanette; Fox, Caroline S.; Gudnason, Vilmundur; Gyllensten, Ulf; Harris, Tamara B.; Hayes, Richard B.; Jarvelin, Marjo-Riitta; Mooser, Vincent; Munroe, Patricia B.; Ouwehand, Willem H.; Penninx, Brenda W.; Pramstaller, Peter P.; Quertermous, Thomas; Rudan, Igor; Samani, Nilesh J.; Spector, Timothy D.; Völzke, Henry; Watkins, Hugh; Wilson, James F.; Groop, Leif C.; Haritunians, Talin; Hu, Frank B.; Kaplan, Robert C.; Metspalu, Andres; North, Kari E.; Schlessinger, David; Wareham, Nicholas J.; Hunter, David J.; O'Connell, Jeffrey R.; Strachan, David P.; Wichmann, H.-Erich; Borecki, Ingrid B.; van Duijn, Cornelia M.; Schadt, Eric E.; Thorsteinsdottir, Unnur; Peltonen, Leena; Uitterlinden, André; Visscher, Peter M.; Chatterjee, Nilanjan; Loos, Ruth J.F.; Boehnke, Michael; McCarthy, Mark I.; Ingelsson, Erik; Lindgren, Cecilia M.; Abecasis, Gonçalo R.; Stefansson, Kari; Frayling, Timothy M.; Hirschhorn, Joel N

    2010-01-01

    Most common human traits and diseases have a polygenic pattern of inheritance: DNA sequence variants at many genetic loci influence phenotype. Genome-wide association (GWA) studies have identified >600 variants associated with human traits [1], but these typically explain small fractions of phenotypic variation, raising questions about the utility of further studies. Here, using 183,727 individuals, we show that hundreds of genetic variants, in at least 180 loci, influence adult height, a highly heritable and classic polygenic trait [2,3]. The large number of loci reveals patterns with important implications for genetic studies of common human diseases and traits. First, the 180 loci are not random, but instead are enriched for genes that are connected in biological pathways (P=0.016), and that underlie skeletal growth defects (P<0.001). Second, the likely causal gene is often located near the most strongly associated variant: in 13 of 21 loci containing a known skeletal growth gene, that gene was closest to the associated variant. Third, at least 19 loci have multiple independently associated variants, suggesting that allelic heterogeneity is a frequent feature of polygenic traits, that comprehensive explorations of already-discovered loci should discover additional variants, and that an appreciable fraction of associated loci may have been identified. Fourth, associated variants are enriched for likely functional effects on genes, being over-represented amongst variants that alter amino acid structure of proteins and expression levels of nearby genes. Our data explain ∼10% of the phenotypic variation in height, and we estimate that unidentified common variants of similar effect sizes would increase this figure to ∼16% of phenotypic variation (∼20% of heritable variation). Although additional approaches are needed to fully dissect the genetic architecture of polygenic human traits, our findings indicate that GWA studies can identify large numbers of loci that

  15. Complete Amino Acid Sequence of a Copper/Zinc-Superoxide Dismutase from Ginger Rhizome.

    PubMed

    Nishiyama, Yuki; Fukamizo, Tamo; Yoneda, Kazunari; Araki, Tomohiro

    2017-04-01

    Superoxide dismutase (SOD) is an antioxidant enzyme protecting cells from oxidative stress. Ginger (Zingiber officinale) is known for its antioxidant properties; however, there are no data on SODs from ginger rhizomes. In this study, we purified SOD from the rhizome of Z. officinale (Zo-SOD) and determined its complete amino acid sequence using N-terminal sequencing, amino acid analysis, and de novo sequencing by tandem mass spectrometry. Zo-SOD consists of 151 amino acids with two signature Cu/Zn-SOD motifs and has high similarity to other plant Cu/Zn-SODs. Multiple sequence alignment showed that Cu/Zn-binding residues and cysteines forming a disulfide bond, which are highly conserved in Cu/Zn-SODs, are also present in Zo-SOD. Phylogenetic analysis revealed that plant Cu/Zn-SODs clustered into distinct chloroplastic, cytoplasmic, and intermediate groups. Among them, only chloroplastic enzymes carried amino acid substitutions in the region functionally important for enzymatic activity, suggesting that chloroplastic SODs may have a function distinct from those of SODs localized in other subcellular compartments. The nucleotide sequence of the Zo-SOD coding region was obtained by reverse-translation, and the gene was synthesized, cloned, and expressed. The recombinant Zo-SOD demonstrated pH stability in the range of 5-10, which is similar to other reported Cu/Zn-SODs, and thermal stability in the range of 10-60 °C, which is higher than that for most plant Cu/Zn-SODs but lower compared to the enzyme from a Z. officinale relative Curcuma aromatica.

  16. Protein location prediction using atomic composition and global features of the amino acid sequence

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Cherian, Betsy Sheena; Nair, Achuthsankar S.

    2010-01-22

    The subcellular location of a protein is useful information for determining its function, screening for drug candidates, designing vaccines, annotating gene products, and selecting relevant proteins for further study. Computational prediction of subcellular localization deals with predicting the location of a protein from its amino acid sequence. For a computational localization prediction method to be more accurate, it should exploit all possible relevant biological features that contribute to the subcellular localization. In this work, we extracted the biological features from the full-length protein sequence to incorporate more biological information. A new biological feature, the distribution of atomic composition, is used effectively with multiple physicochemical properties, amino acid composition, three-part amino acid composition, and sequence similarity to predict the subcellular location of the protein. Support Vector Machines are designed for four modules, and the prediction is made by a weighted voting system. Our system makes predictions with accuracies of 100, 82.47, and 88.81 for the self-consistency test, jackknife test, and independent data test, respectively. Our results provide evidence that prediction based on biological features derived from the full-length amino acid sequence gives better accuracy than prediction based on features derived from the N-terminal alone. Considering the features as a distribution within the entire sequence brings out the underlying property distribution in greater detail and enhances the prediction accuracy.
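
    The weighted voting step mentioned above can be sketched as follows. The abstract does not give module names, weights, or class labels, so everything in this example (the four module identifiers, their weights, and the predicted locations) is a hypothetical placeholder rather than the authors' configuration.

```python
from collections import defaultdict

def weighted_vote(predictions, weights):
    """Sum the weight of each module behind its predicted label; return the top label."""
    scores = defaultdict(float)
    for module, label in predictions.items():
        scores[label] += weights[module]
    # On an exact tie, the label that was scored first wins.
    return max(scores, key=scores.get)

# Hypothetical per-module predictions for one protein.
predictions = {
    "module_1": "nucleus",
    "module_2": "cytoplasm",
    "module_3": "nucleus",
    "module_4": "cytoplasm",
}
# Hypothetical module weights (e.g. derived from validation accuracy).
weights = {"module_1": 0.30, "module_2": 0.20, "module_3": 0.25, "module_4": 0.25}

winner = weighted_vote(predictions, weights)
print(winner)
```

    Here the two modules voting for the nucleus carry a combined weight of 0.55 against 0.45, so the vote resolves a 2-2 split that simple majority voting could not.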

  17. Reliable Detection of Herpes Simplex Virus Sequence Variation by High-Throughput Resequencing.

    PubMed

    Morse, Alison M; Calabro, Kaitlyn R; Fear, Justin M; Bloom, David C; McIntyre, Lauren M

    2017-08-16

    High-throughput sequencing (HTS) has resulted in data for a number of herpes simplex virus (HSV) laboratory strains and clinical isolates. The knowledge of these sequences has been critical for investigating viral pathogenicity. However, the assembly of complete herpesviral genomes, including HSV, is complicated due to the existence of large repeat regions and arrays of smaller reiterated sequences that are commonly found in these genomes. In addition, the inherent genetic variation in populations of isolates for viruses and other microorganisms presents an additional challenge to many existing HTS sequence assembly pipelines. Here, we evaluate two approaches for the identification of genetic variants in HSV1 strains using Illumina short read sequencing data. The first, a reference-based approach, identifies variants from reads aligned to a reference sequence and the second, a de novo assembly approach, identifies variants from reads aligned to de novo assembled consensus sequences. Of critical importance for both approaches is the reduction in the number of low complexity regions through the construction of a non-redundant reference genome. We compared variants identified in the two methods. Our results indicate that approximately 85% of variants are identified regardless of the approach. The reference-based approach to variant discovery captures an additional 15% representing variants divergent from the HSV1 reference possibly due to viral passage. Reference-based approaches are significantly less labor-intensive and identify variants across the genome where de novo assembly-based approaches are limited to regions where contigs have been successfully assembled. In addition, regions of poor quality assembly can lead to false variant identification in de novo consensus sequences. For viruses with a well-assembled reference genome, a reference-based approach is recommended.
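
    The comparison of variant calls between the two approaches can be sketched as a set operation. The example assumes each variant is represented as a (position, reference allele, alternate allele) tuple already expressed in the same coordinate system; that representation, the function name, and the toy calls are assumptions for illustration and do not reproduce the authors' pipeline.

```python
def compare_variant_calls(reference_based, de_novo):
    """Summarize the overlap between two collections of variant calls.

    Each variant is a hashable tuple such as (position, ref_allele, alt_allele).
    """
    ref_set, novo_set = set(reference_based), set(de_novo)
    shared = ref_set & novo_set
    union = ref_set | novo_set
    return {
        "shared": len(shared),
        "reference_only": len(ref_set - novo_set),
        "de_novo_only": len(novo_set - ref_set),
        # Fraction of all distinct variants found by both approaches.
        "shared_fraction": len(shared) / len(union) if union else 0.0,
    }


if __name__ == "__main__":
    # Made-up calls, not real HSV1 data.
    ref_calls = [(101, "A", "G"), (250, "C", "T"), (900, "G", "A"), (1500, "T", "C")]
    novo_calls = [(101, "A", "G"), (250, "C", "T"), (900, "G", "A")]
    print(compare_variant_calls(ref_calls, novo_calls))
```

    In practice, calls made against de novo consensus sequences would first have to be mapped back to reference coordinates before a comparison like this is meaningful.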

  18. Rare missense variants in POT1 predispose to familial cutaneous malignant melanoma

    PubMed Central

    Shi, Jianxin; Yang, Xiaohong R.; Ballew, Bari; Rotunno, Melissa; Calista, Donato; Fargnoli, Maria Concetta; Ghiorzo, Paola; Paillerets, Brigitte Bressac-de; Nagore, Eduardo; Avril, Marie Francoise; Caporaso, Neil E.; McMaster, Mary L.; Cullen, Michael; Wang, Zhaoming; Zhang, Xijun; Bruno, William; Pastorino, Lorenza; Queirolo, Paola; Banuls-Roca, Jose; Garcia-Casado, Zaida; Vaysse, Amaury; Mohamdi, Hamida; Riazalhosseini, Yasser; Foglio, Mario; Jouenne, Fanélie; Hua, Xing; Hyland, Paula L.; Yin, Jinhu; Vallabhaneni, Haritha; Chai, Weihang; Minghetti, Paola; Pellegrini, Cristina; Ravichandran, Sarangan; Eggermont, Alexander; Lathrop, Mark; Peris, Ketty; Scarra, Giovanna Bianchi; Landi, Giorgio; Savage, Sharon A.; Sampson, Joshua N.; He, Ji; Yeager, Meredith; Goldin, Lynn R.; Demenais, Florence; Chanock, Stephen J.; Tucker, Margaret A.; Goldstein, Alisa M.; Liu, Yie; Landi, Maria Teresa

    2014-01-01

    Although CDKN2A is the most frequent high-risk melanoma susceptibility gene, the underlying genetic factors for most melanoma-prone families remain unknown. Using whole exome sequencing, we identified a rare variant that arose as a founder mutation in the telomere shelterin POT1 gene (g.7:124493086 C>T, Ser270Asn) in five unrelated melanoma-prone families from Romagna, Italy. Carriers of this variant had increased telomere length and elevated fragile telomeres suggesting that this variant perturbs telomere maintenance. Two additional rare POT1 variants were identified in all cases sequenced in two other Italian families, yielding a frequency of POT1 variants comparable to that of CDKN2A mutations in this population. These variants were not found in public databases or in 2,038 genotyped Italian controls. We also identified two rare recurrent POT1 variants in American and French familial melanoma cases. Our findings suggest that POT1 is a major susceptibility gene for familial melanoma in several populations. PMID:24686846

  19. The role of functionally defective rare germline variants of sialic acid acetylesterase in autoimmune Addison's disease

    PubMed Central

    Gan, Earn H; MacArthur, Katie; Mitchell, Anna L; Pearce, Simon H S

    2012-01-01

    Background: Autoimmune Addison's disease (AAD) is a rare condition with a complex genetic basis. A panel of rare and functionally defective genetic variants in the sialic acid acetylesterase (SIAE) gene has recently been implicated in several common autoimmune conditions. We performed a case–control study to determine whether these rare variants are associated with a rarer condition, AAD. Method: We analysed nine SIAE gene variants (W48X, M89V, C196F, C226G, R230W, T312M, Y349C, F404S and R479C) in a United Kingdom cohort of 378 AAD subjects and 387 healthy controls. All samples were genotyped using Sequenom iPlex chemistry to characterise primer extension products. Results: A heterozygous rare allele at codon 312 (312*M) was found in one AAD patient (0.13%) but was not detected in the healthy controls. The commoner, functionally recessive variant at codon 89 (89*V) was found to be homozygous in two AAD patients but was only found in the heterozygous state in controls. Taking into account all nine alleles examined, 4/378 (1.06%) AAD patients and 1/387 (0.25%) healthy controls carried the defective SIAE alleles, with a calculated odds ratio of 4.13 (95% CI 0.44–97.45, two-tailed P value 0.212, NS). Conclusion: We demonstrated the presence of 89*V homozygotes and the 312*M rare allele in the AAD cohort, but overall, our analysis does not support a role for rare variants in SIAE in the pathogenesis of AAD. However, the relatively small collection of AAD patients limits the power to exclude a small effect. PMID:23011869
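
    The reported odds ratio can be checked from the counts in the abstract (4 carriers among 378 patients, 1 carrier among 387 controls). The sketch below computes the sample odds ratio and, as an assumption, a two-sided Fisher's exact test; the abstract does not state which test produced the P value, and the function names are hypothetical. The confidence interval is not recomputed here.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Sample odds ratio for the 2x2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Probability that k of the col1 carriers fall in the first row.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    total = 0.0
    for k in range(lowest, highest + 1):
        p = prob(k)
        if p <= p_observed * (1 + 1e-9):  # small tolerance for float ties
            total += p
    return total


if __name__ == "__main__":
    # Rows: AAD patients, healthy controls; columns: carriers, non-carriers.
    a, b, c, d = 4, 378 - 4, 1, 387 - 1
    print(round(odds_ratio(a, b, c, d), 2))
    print(round(fisher_exact_two_sided(a, b, c, d), 3))
```

    Under these assumptions the odds ratio is (4 × 386) / (374 × 1) ≈ 4.13, in agreement with the abstract, and the Fisher P value comes out close to the reported 0.212.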

  20. Exome Sequence Analysis of 14 Families With High Myopia.

    PubMed

    Kloss, Bethany A; Tompson, Stuart W; Whisenhunt, Kristina N; Quow, Krystina L; Huang, Samuel J; Pavelec, Derek M; Rosenberg, Thomas; Young, Terri L

    2017-04-01

    To identify causal gene mutations in 14 families with autosomal dominant (AD) high myopia using exome sequencing. Select individuals from 14 large Caucasian families with high myopia were exome sequenced. Gene variants were filtered to identify potential pathogenic changes. Sanger sequencing was used to confirm variants in original DNA, and to test for disease cosegregation in additional family members. Candidate genes and chromosomal loci previously associated with myopic refractive error and its endophenotypes were comprehensively screened. In 14 high myopia families, we identified 73 rare and 31 novel gene variants as candidates for pathogenicity. In seven of these families, two of the novel and eight of the rare variants were within known myopia loci. A total of 104 heterozygous nonsynonymous rare variants in 104 genes were identified in 10 out of 14 probands. Each variant cosegregated with affection status. No rare variants were identified in genes known to cause myopia or in genes closest to published genome-wide association study association signals for refractive error or its endophenotypes. Whole exome sequencing was performed to determine gene variants implicated in the pathogenesis of AD high myopia. This study provides new genes for consideration in the pathogenesis of high myopia, and may aid in the development of genetic profiling of those at greatest risk for attendant ocular morbidities of this disorder.