Science.gov

Sample records for acid sequence variants

  1. On human disease-causing amino acid variants: statistical study of sequence and structural patterns

    PubMed Central

    Alexov, Emil

    2015-01-01

    Statistical analysis was carried out on a large set of naturally occurring human amino acid variations, and it was demonstrated that there is a preference for some amino acid substitutions to be associated with diseases. At the amino acid sequence level, it was shown that disease-causing variants frequently involve drastic changes in amino acid physico-chemical properties such as charge, hydrophobicity and geometry. Structural analysis of variants involved in diseases and of variants frequently observed in the human population showed similar trends: disease-causing variants tend to cause more changes in the hydrogen bond network and salt bridges than harmless amino acid mutations do. Analysis of thermodynamics data reported in the literature, both experimental and computational, indicated that disease-causing variants tend to destabilize proteins and their interactions, which prompted us to investigate the effects of amino acid mutations in large databases of experimentally measured energy changes in unrelated proteins. Although the experimental datasets were neither linked to diseases nor exclusive to human proteins, the observed trends were the same: amino acid mutations tend to destabilize proteins and their interactions. Bearing in mind that structural and thermodynamic properties are interrelated, it is pointed out that a large change in any of them is anticipated to cause a disease. PMID:25689729
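    The idea of a "drastic" physico-chemical change in the abstract above can be sketched in a few lines of Python. This is an illustration only, not the scoring used in the study: the formal charges at roughly neutral pH, the use of the Kyte-Doolittle hydropathy scale, the function names and the cutoff of 3.0 are all assumptions made for this example, and side-chain geometry is not modelled at all.

```python
# Illustrative sketch only: the property tables, function names and cutoff are
# assumptions for this example, not the method of the study summarized above.

# Kyte-Doolittle hydropathy index per residue (one-letter codes).
HYDROPATHY = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}

# Formal side-chain charge near neutral pH (His is treated as neutral here).
CHARGE = {"D": -1, "E": -1, "K": 1, "R": 1}


def describe_substitution(wild_type: str, variant: str) -> dict:
    """Return the change in formal charge and hydropathy for one substitution."""
    delta_charge = CHARGE.get(variant, 0) - CHARGE.get(wild_type, 0)
    delta_hydropathy = round(HYDROPATHY[variant] - HYDROPATHY[wild_type], 1)
    return {"delta_charge": delta_charge, "delta_hydropathy": delta_hydropathy}


def is_drastic(wild_type: str, variant: str, hydropathy_cutoff: float = 3.0) -> bool:
    """Flag a substitution that changes charge or shifts hydropathy strongly.

    The cutoff is a hypothetical value chosen for illustration.
    """
    change = describe_substitution(wild_type, variant)
    return (change["delta_charge"] != 0
            or abs(change["delta_hydropathy"]) >= hydropathy_cutoff)


if __name__ == "__main__":
    print(describe_substitution("K", "E"))  # lysine to glutamic acid
    print(is_drastic("I", "V"))             # isoleucine to valine
```

    Under these assumptions a lysine-to-glutamic-acid change is flagged because the formal charge flips, while isoleucine-to-valine is not. A glycine-to-tryptophan change would also pass unflagged, which shows why a geometry or volume term would be needed in any serious version.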

  2. Amino acid substitutions in genetic variants of human serum albumin and in sequences inferred from molecular cloning

    SciTech Connect

    Takahashi, N.; Takahashi, Y.; Blumberg, B.S.; Putnam, F.W.

    1987-07-01

    The structural changes in four genetic variants of human serum albumin were analyzed by tandem high-pressure liquid chromatography (HPLC) of the tryptic peptides, HPLC mapping and isoelectric focusing of the CNBr fragments, and amino acid sequence analysis of the purified peptides. Lysine-372 of normal (common) albumin A was changed to glutamic acid both in albumin Naskapi, a widespread polymorphic variant of North American Indians, and in albumin Mersin found in Eti Turks. The two variants also exhibited anomalous migration in NaDodSO4/PAGE, which is attributed to a conformational change. The identity of albumins Naskapi and Mersin may have originated through descent from a common mid-Asiatic founder of the two migrating ethnic groups, or it may represent identical but independent mutations of the albumin gene. In albumin Adana, from Eti Turks, the substitution site was not identified but was localized to the region from positions 447 through 548. The substitution of aspartic acid-550 by glycine was found in albumin Mexico-2 from four individuals of the Pima tribe. Although only single-point substitutions have been found in these and in certain other genetic variants of human albumin, five differences exist in the amino acid sequences inferred from cDNA sequences by workers in three other laboratories. However, our results on albumin A and on 14 different genetic variants accord with the amino acid sequence of albumin deduced from the genomic sequence. The apparent amino acid substitutions inferred from comparison of individual cDNA sequences probably reflect artifacts in cloning or in cDNA sequence analysis rather than polymorphism of the coding sections of the albumin gene.

  3. Amino acid substitutions in genetic variants of human serum albumin and in sequences inferred from molecular cloning.

    PubMed

    Takahashi, N; Takahashi, Y; Blumberg, B S; Putnam, F W

    1987-07-01

    The structural changes in four genetic variants of human serum albumin were analyzed by tandem high-pressure liquid chromatography (HPLC) of the tryptic peptides, HPLC mapping and isoelectric focusing of the CNBr fragments, and amino acid sequence analysis of the purified peptides. Lysine-372 of normal (common) albumin A was changed to glutamic acid both in albumin Naskapi, a widespread polymorphic variant of North American Indians, and in albumin Mersin found in Eti Turks. The two variants also exhibited anomalous migration in NaDodSO4/PAGE, which is attributed to a conformational change. The identity of albumins Naskapi and Mersin may have originated through descent from a common mid-Asiatic founder of the two migrating ethnic groups, or it may represent identical but independent mutations of the albumin gene. In albumin Adana, from Eti Turks, the substitution site was not identified but was localized to the region from positions 447 through 548. The substitution of aspartic acid-550 by glycine was found in albumin Mexico-2 from four individuals of the Pima tribe. Although only single-point substitutions have been found in these and in certain other genetic variants of human albumin, five differences exist in the amino acid sequences inferred from cDNA sequences by workers in three other laboratories. However, our results on albumin A and on 14 different genetic variants accord with the amino acid sequence of albumin deduced from the genomic sequence. The apparent amino acid substitutions inferred from comparison of individual cDNA sequences probably reflect artifacts in cloning or in cDNA sequence analysis rather than polymorphism of the coding sections of the albumin gene.

  4. Amino acid sequence and variant forms of favin, a lectin from Vicia faba.

    PubMed

    Hopp, T P; Hemperly, J J; Cunningham, B A

    1982-04-25

    We have determined the complete amino acid sequence (182 residues) of the beta chain of favin, the glucose-binding lectin from fava beans (Vicia faba), and have established that the carbohydrate moiety is attached to Asn 168. Together with the sequence of the alpha chain previously reported (Hemperly, J. J., Hopp, T. P., Becker, J. W., and Cunningham, B. A. (1979) J. Biol. Chem. 254, 6803-6810), these data complete the analysis of the primary structure of the lectin. We have also examined minor polypeptides that appear in all preparations of favin. Two lower molecular weight species (Mr = 9,500-11,600) appear to be fragments of the beta chain resulting from cleavage following Asn 76, whereas six high molecular weight forms (Mr = 25,000 or greater) appear to include aggregates of the beta chain and possibly some alternative products of chain processing. PMID:7068646

  5. Data in support of the discovery of alternative splicing variants of quail LEPR and the evolutionary conservation of qLEPRl by nucleotide and amino acid sequences alignment

    PubMed Central

    Wang, Dandan; Xu, Chunlin; Wang, Taian; Li, Hong; Li, Yanmin; Ren, Junxiao; Tian, Yadong; Li, Zhuanjian; Jiao, Yuping; Kang, Xiangtao; Liu, Xiaojun

    2015-01-01

    Leptin receptor (LEPR) belongs to the class I cytokine receptor superfamily, whose members share common structural features and signal transduction pathways. Although multiple LEPR isoforms derived from one gene have been identified in mammals, they have rarely been found in avian species, apart from the long LEPR. Four alternative splicing variants of quail LEPR (qLEPR) were cloned and sequenced for the first time (Wang et al., 2015 [1]). To define the patterns of the four splicing variants (qLEPRl, qLEPR-a, qLEPR-b and qLEPR-c) and locate the conserved regions of qLEPRl, this data article provides a nucleotide sequence alignment of qLEPR and an amino acid sequence alignment of representative vertebrate LEPR. The detailed analysis is shown in [1]. PMID:26759819

  6. Better prediction of functional effects for sequence variants

    PubMed Central

    2015-01-01

    Elucidating the effects of naturally occurring genetic variation is one of the major challenges for personalized health and personalized medicine. Here, we introduce SNAP2, a novel neural network based classifier that improves over the state-of-the-art in distinguishing between effect and neutral variants. Our method's improved performance results from screening many potentially relevant protein features and from refining our development data sets. Cross-validated on >100k experimentally annotated variants, SNAP2 significantly outperformed other methods, attaining a two-state accuracy (effect/neutral) of 83%. SNAP2 also outperformed combinations of other methods. Performance increased for human variants but much more so for other organisms. Our method's carefully calibrated reliability index informs selection of variants for experimental follow up, with the most strongly predicted half of all effect variants predicted at over 96% accuracy. As expected, the evolutionary information from automatically generated multiple sequence alignments gave the strongest signal for the prediction. However, we also optimized our new method to perform surprisingly well even without alignments. This feature reduces prediction runtime by over two orders of magnitude, enables cross-genome comparisons, and renders our new method as the best solution for the 10-20% of sequence orphans. SNAP2 is available at: https://rostlab.org/services/snap2web Definitions used: Delta, input feature that results from computing the difference between feature scores for native amino acid and feature scores for variant amino acid; nsSNP, non-synonymous SNP; PMD, Protein Mutant Database; SNAP, Screening for non-acceptable polymorphisms; SNP, single nucleotide polymorphism; variant, any amino acid changing sequence variant. PMID:26110438

  7. Sequence Variant Descriptions: HGVS Nomenclature and Mutalyzer.

    PubMed

    den Dunnen, Johan T

    2016-01-01

    Consistent and unambiguous description of sequence variants is essential to report and exchange information on the analysis of a genome, in particular in DNA diagnostics. The HGVS nomenclature (recommendations for the description of sequence variants as originally proposed by the Human Genome Variation Society) has gradually been accepted as the international standard for variant description. In this unit, we describe the current recommendations (HGVS version 15.11) regarding how to describe variants at the DNA, RNA, and protein level. We explain the rationale and give example descriptions for all variant types: substitution, deletion, duplication, insertion, inversion, conversion, and complex, as well as special types occurring only on the RNA (splicing) or protein level (nonsense, frame shift, extension). Finally, we point users to available support tools and give examples for the use of the freely available Mutalyzer suite. An extensive version of the HGVS recommendations is available online at http://varnomen.hgvs.org/. © 2016 by John Wiley & Sons, Inc. PMID:27367167
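    To make the shape of such a description concrete, here is a minimal Python sketch that splits a simple DNA-level substitution of the form reference:level.positionREF>ALT into its parts. It is not a validator and does not stand in for Mutalyzer or the full recommendations: the regular expression, the class and function names, and the sample strings are assumptions made for this example, and deletions, duplications, insertions, and RNA or protein descriptions are deliberately out of scope.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical pattern for this example: optional reference sequence, a DNA
# level prefix (c, g, m or n), a position that may carry an intronic offset
# such as 88+2, and a single-base substitution written as REF>ALT.
_SUBSTITUTION = re.compile(
    r"^(?:(?P<reference>[A-Za-z0-9_.]+):)?"
    r"(?P<level>[cgmn])\."
    r"(?P<position>[-*]?\d+(?:[+-]\d+)?)"
    r"(?P<ref_base>[ACGT])>(?P<alt_base>[ACGT])$"
)


@dataclass(frozen=True)
class SimpleSubstitution:
    reference: Optional[str]  # e.g. a transcript accession; None if omitted
    level: str                # "c", "g", "m" or "n"
    position: str             # kept as text so offsets like "88+2" survive
    ref_base: str
    alt_base: str


def parse_simple_substitution(description: str) -> Optional[SimpleSubstitution]:
    """Parse a simple DNA substitution; return None for anything else."""
    match = _SUBSTITUTION.match(description.strip())
    if match is None:
        return None
    return SimpleSubstitution(**match.groupdict())


if __name__ == "__main__":
    print(parse_simple_substitution("NM_004006.2:c.4375C>T"))
    print(parse_simple_substitution("c.88+2T>G"))
    print(parse_simple_substitution("c.76_78del"))  # not a substitution
```

    Returning None for unsupported descriptions, rather than guessing, keeps the sketch honest about its narrow scope; real variant descriptions should be checked with a dedicated tool.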

  8. Nanopore sequencing detects structural variants in cancer

    PubMed Central

    Norris, Alexis L.; Workman, Rachael E.; Fan, Yunfan; Eshleman, James R.; Timp, Winston

    2016-01-01

    Despite advances in sequencing, structural variants (SVs) remain difficult to reliably detect due to the short read length (<300 bp) of 2nd generation sequencing. Not only do the reads (or paired-end reads) need to straddle a breakpoint, but repetitive elements often lead to ambiguities in the alignment of short reads. We propose to use the long-reads (up to 20 kb) possible with 3rd generation sequencing, specifically nanopore sequencing on the MinION. Nanopore sequencing relies on a similar concept to a Coulter counter, reading the DNA sequence from the change in electrical current resulting from a DNA strand being forced through a nanometer-sized pore embedded in a membrane. Though nanopore sequencing currently has a relatively high mismatch rate that precludes base substitution and small frameshift mutation detection, its accuracy is sufficient for SV detection because of its long reads. In fact, long reads in some cases may improve SV detection efficiency. We have tested nanopore sequencing to detect a series of well-characterized SVs, including large deletions, inversions, and translocations that inactivate the CDKN2A/p16 and SMAD4/DPC4 tumor suppressor genes in pancreatic cancer. Using PCR amplicon mixes, we have demonstrated that nanopore sequencing can detect large deletions, translocations and inversions at dilutions as low as 1:100, with as few as 500 reads per sample. Given the speed, small footprint, and low capital cost, nanopore sequencing could become the ideal tool for the low-level detection of cancer-associated SVs needed for molecular relapse, early detection, or therapeutic monitoring. PMID:26787508

  9. Analysis of DNA Sequence Variants Detected by High Throughput Sequencing

    PubMed Central

    Adams, David R; Sincan, Murat; Fajardo, Karin Fuentes; Mullikin, James C; Pierson, Tyler M; Toro, Camilo; Boerkoel, Cornelius F; Tifft, Cynthia J; Gahl, William A; Markello, Tom C

    2014-01-01

    The Undiagnosed Diseases Program at the National Institutes of Health uses High Throughput Sequencing (HTS) to diagnose rare and novel diseases. HTS techniques generate large numbers of DNA sequence variants, which must be analyzed and filtered to find candidates for disease causation. Despite the publication of an increasing number of successful exome-based projects, there has been little formal discussion of the analytic steps applied to HTS variant lists. We present the results of our experience with over 30 families for whom HTS sequencing was used in an attempt to find clinical diagnoses. For each family, exome sequence was augmented with high-density SNP-array data. We present a discussion of the theory and practical application of each analytic step and provide example data to illustrate our approach. The paper is designed to provide an analytic roadmap for variant analysis, thereby enabling a wide range of researchers and clinical genetics practitioners to perform direct analysis of HTS data for their patients and projects. PMID:22290882

  10. Strategies to choose from millions of imputed sequence variants

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Millions of sequence variants are known, but subsets are needed for routine genomic predictions or to include on genotyping arrays. Variant selection and imputation strategies were tested using 26 984 simulated reference bulls, of which 1 000 had 30 million sequence variants, 773 had 600 000 markers...

  11. Guidelines for investigating causality of sequence variants in human disease.

    PubMed

    MacArthur, D G; Manolio, T A; Dimmock, D P; Rehm, H L; Shendure, J; Abecasis, G R; Adams, D R; Altman, R B; Antonarakis, S E; Ashley, E A; Barrett, J C; Biesecker, L G; Conrad, D F; Cooper, G M; Cox, N J; Daly, M J; Gerstein, M B; Goldstein, D B; Hirschhorn, J N; Leal, S M; Pennacchio, L A; Stamatoyannopoulos, J A; Sunyaev, S R; Valle, D; Voight, B F; Winckler, W; Gunter, C

    2014-04-24

    The discovery of rare genetic variants is accelerating, and clear guidelines for distinguishing disease-causing sequence variants from the many potentially functional variants present in any human genome are urgently needed. Without rigorous standards we risk an acceleration of false-positive reports of causality, which would impede the translation of genomic research findings into the clinical diagnostic setting and hinder biological understanding of disease. Here we discuss the key challenges of assessing sequence variants in human disease, integrating both gene-level and variant-level support for causality. We propose guidelines for summarizing confidence in variant pathogenicity and highlight several areas that require further resource development.

  12. αIIbβ3 variants defined by next-generation sequencing: Predicting variants likely to cause Glanzmann thrombasthenia

    PubMed Central

    Buitrago, Lorena; Rendon, Augusto; Liang, Yupu; Simeoni, Ilenia; Negri, Ana; Filizola, Marta; Ouwehand, Willem H.; Coller, Barry S.; Alessi, Marie-Christine; Ballmaier, Matthias; Bariana, Tadbir; Bellissimo, Daniel; Bertoli, Marta; Bray, Paul; Bury, Loredana; Carrell, Robin; Cattaneo, Marco; Collins, Peter; French, Deborah; Favier, Remi; Freson, Kathleen; Furie, Bruce; Germeshausen, Manuela; Ghevaert, Cedric; Gomez, Keith; Goodeve, Anne; Gresele, Paolo; Guerrero, Jose; Hampshire, Dan J.; Hadinnapola, Charaka; Heemskerk, Johan; Henskens, Yvonne; Hill, Marian; Hogg, Nancy; Johnsen, Jill; Kahr, Walter; Kerr, Ron; Kunishima, Shinji; Laffan, Michael; Natwani, Amit; Neerman-Arbez, Marguerite; Nurden, Paquita; Nurden, Alan; Ormiston, Mark; Othman, Maha; Ouwehand, Willem; Perry, David; Vilk, Shoshana Ravel; Reitsma, Pieter; Rondina, Matthew; Simeoni, Ilenia; Smethurst, Peter; Stephens, Jonathan; Stevenson, William; Szkotak, Artur; Turro, Ernest; Van Geet, Christel; Vries, Minka; Ward, June; Waye, John; Westbury, Sarah; Whiteheart, Sidney; Wilcox, David; Zhang, Bi

    2015-01-01

    Next-generation sequencing is transforming our understanding of human genetic variation but assessing the functional impact of novel variants presents challenges. We analyzed missense variants in the integrin αIIbβ3 receptor subunit genes ITGA2B and ITGB3 identified by whole-exome or -genome sequencing in the ThromboGenomics project, comprising ∼32,000 alleles from 16,108 individuals. We analyzed the results in comparison with 111 missense variants in these genes previously reported as being associated with Glanzmann thrombasthenia (GT), 20 associated with alloimmune thrombocytopenia, and 5 associated with aniso/macrothrombocytopenia. We identified 114 novel missense variants in ITGA2B (affecting ∼11% of the amino acids) and 68 novel missense variants in ITGB3 (affecting ∼9% of the amino acids). Of the variants, 96% had minor allele frequencies (MAF) < 0.1%, indicating their rarity. Based on sequence conservation, MAF, and location on a complete model of αIIbβ3, we selected three novel variants that affect amino acids previously associated with GT for expression in HEK293 cells. αIIb P176H and β3 C547G severely reduced αIIbβ3 expression, whereas αIIb P943A partially reduced αIIbβ3 expression and had no effect on fibrinogen binding. We used receiver operating characteristic curves of combined annotation-dependent depletion, Polyphen 2-HDIV, and sorting intolerant from tolerant to estimate the percentage of novel variants likely to be deleterious. At optimal cut-off values, which had 69–98% sensitivity in detecting GT mutations, between 27% and 71% of the novel αIIb or β3 missense variants were predicted to be deleterious. Our data have implications for understanding the evolutionary pressure on αIIbβ3 and highlight the challenges in predicting the clinical significance of novel missense variants. PMID:25827233
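    The minor allele frequency figures in the abstract above come down to simple arithmetic, sketched here in Python. The sketch assumes autosomal diploid genotypes and that every individual was genotyped at the site; both are assumptions made for this example rather than statements from the study, and the function names are hypothetical.

```python
def minor_allele_frequency(allele_count: int, n_individuals: int, ploidy: int = 2) -> float:
    """Frequency of the rarer allele, given how often one allele was observed."""
    total_alleles = n_individuals * ploidy
    frequency = allele_count / total_alleles
    return min(frequency, 1.0 - frequency)


def max_count_below(threshold: float, n_individuals: int, ploidy: int = 2) -> int:
    """Largest allele count whose frequency stays strictly below the threshold."""
    total_alleles = n_individuals * ploidy
    count = 0
    while (count + 1) / total_alleles < threshold:
        count += 1
    return count


if __name__ == "__main__":
    individuals = 16108                       # cohort size quoted in the abstract
    print(individuals * 2)                    # total alleles if all are diploid
    print(max_count_below(0.001, individuals))
    print(minor_allele_frequency(1, individuals))
```

    With 16,108 diploid individuals there are 32,216 alleles, in line with the "∼32,000 alleles" quoted above, so under these assumptions a variant stays below a MAF of 0.1% only if it is seen on at most 32 alleles in the whole cohort.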

  13. Selection of sequence variants to improve dairy cattle genomic predictions

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Genomic prediction reliabilities improved when adding selected sequence variants from run 5 of the 1,000 bull genomes project. High density (HD) imputed genotypes for 26,970 progeny tested Holstein bulls were combined with sequence variants for 444 Holstein animals. The first test included 481,904 c...

  14. Protective variant for hippocampal atrophy identified by whole exome sequencing.

    PubMed

    Nho, Kwangsik; Kim, Sungeun; Risacher, Shannon L; Shen, Li; Corneveaux, Jason J; Swaminathan, Shanker; Lin, Hai; Ramanan, Vijay K; Liu, Yunlong; Foroud, Tatiana M; Inlow, Mark H; Siniard, Ashley L; Reiman, Rebecca A; Aisen, Paul S; Petersen, Ronald C; Green, Robert C; Jack, Clifford R; Weiner, Michael W; Baldwin, Clinton T; Lunetta, Kathryn L; Farrer, Lindsay A; Furney, Simon J; Lovestone, Simon; Simmons, Andrew; Mecocci, Patrizia; Vellas, Bruno; Tsolaki, Magda; Kloszewska, Iwona; Soininen, Hilkka; McDonald, Brenna C; Farlow, Martin R; Ghetti, Bernardino; Huentelman, Matthew J; Saykin, Andrew J

    2015-03-01

    We used whole-exome sequencing to identify variants other than APOE associated with the rate of hippocampal atrophy in amnestic mild cognitive impairment. An in-silico predicted missense variant in REST (rs3796529) was found exclusively in subjects with slow hippocampal volume loss and validated using unbiased whole-brain analysis and meta-analysis across 5 independent cohorts. REST is a master regulator of neurogenesis and neuronal differentiation that has not been previously implicated in Alzheimer's disease. These findings nominate REST and its functional pathways as protective and illustrate the potential of combining next-generation sequencing with neuroimaging to discover novel disease mechanisms and potential therapeutic targets. PMID:25559091

  15. HGVS Recommendations for the Description of Sequence Variants: 2016 Update.

    PubMed

    den Dunnen, Johan T; Dalgleish, Raymond; Maglott, Donna R; Hart, Reece K; Greenblatt, Marc S; McGowan-Jordan, Jean; Roux, Anne-Francoise; Smith, Timothy; Antonarakis, Stylianos E; Taschner, Peter E M

    2016-06-01

    The consistent and unambiguous description of sequence variants is essential to report and exchange information on the analysis of a genome. In particular, DNA diagnostics critically depends on accurate and standardized description and sharing of the variants detected. The sequence variant nomenclature system proposed in 2000 by the Human Genome Variation Society has been widely adopted and has developed into an internationally accepted standard. The recommendations are currently commissioned through a Sequence Variant Description Working Group (SVD-WG) operating under the auspices of three international organizations: the Human Genome Variation Society (HGVS), the Human Variome Project (HVP), and the Human Genome Organization (HUGO). Requests for modifications and extensions go through the SVD-WG following a standard procedure including a community consultation step. Version numbers are assigned to the nomenclature system to allow users to specify the version used in their variant descriptions. Here, we present the current recommendations, HGVS version 15.11, and briefly summarize the changes that were made since the 2000 publication. Most focus has been on removing inconsistencies and tightening definitions allowing automatic data processing. An extensive version of the recommendations is available online, at http://www.HGVS.org/varnomen. PMID:26931183

  16. Sequencing Structural Variants in Cancer for Precision Therapeutics.

    PubMed

    Macintyre, Geoff; Ylstra, Bauke; Brenton, James D

    2016-09-01

    The identification of mutations that guide therapy selection for patients with cancer is now routine in many clinical centres. The majority of assays used for solid tumour profiling use DNA sequencing to interrogate somatic point mutations because they are relatively easy to identify and interpret. Many cancers, however, including high-grade serous ovarian, oesophageal, and small-cell lung cancer, are driven by somatic structural variants that are not measured by these assays. Therefore, there is currently an unmet need for clinical assays that can cheaply and rapidly profile structural variants in solid tumours. In this review we survey the landscape of 'actionable' structural variants in cancer and identify promising detection strategies based on massively-parallel sequencing. PMID:27478068

  17. Phosphodiesterase sequence variants may predispose to prostate cancer

    PubMed Central

    de Alexandre, Rodrigo Bertollo; Horvath, Anelia; Szarek, Eva; Manning, Allison D.; Leal, Leticia Ferro; Kardauke, Fabio; Epstein, Jonathan A.; Carraro, Dirce Maria; Soares, Fernando Augusto; Apanasovich, Tatiyana; Stratakis, Constantine A.; Faucz, Fabio Rueda

    2015-01-01

    We hypothesized that mutations that inactivate phosphodiesterase (PDE) activity and lead to increased cyclic AMP (cAMP) and cyclic GMP (cGMP) levels may be associated with prostate cancer (PCa). We sequenced the entire PDE coding sequences in the DNA of 16 biopsy samples from PCa patients. Novel mutations were confirmed in the somatic or germline state by Sanger sequencing. Data were then compared to the 1000 Genome Project. PDE, CREB and pCREB protein expression was also studied in all samples, in both normal and abnormal tissue, by immunofluorescence. We identified 3 previously described PDE sequence variants that were significantly higher in PCa. Four novel sequence variations, one each in the PDE4B, PDE6C, PDE7B and PDE10A genes, respectively, were also found in the PCa samples. Interestingly, PDE10A and PDE4B novel variants that were present in 19% and 6% of the patients, respectively, were found in the tumor tissue only. In patients carrying PDE defects, there was pCREB accumulation (p<0.001), and an increase of the pCREB/CREB ratio (patients 0.97± 0.03; controls 0.52± 0.03; p-value < 0.001) by immunohistochemical analysis. We conclude that PDE sequence variants may play a role in the predisposition and/or progression to PCa at the germline and/or somatic state, respectively. Larger such studies are needed to confirm these findings. PMID:25979379

  18. Composition for nucleic acid sequencing

    SciTech Connect

    Korlach, Jonas; Webb, Watt W.; Levene, Michael; Turner, Stephen; Craighead, Harold G.; Foquet, Mathieu

    2008-08-26

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid, i.e. the activity of a nucleic acid polymerizing enzyme on the template nucleic acid molecule to be sequenced is followed in real time. The sequence is deduced by identifying which base is being incorporated into the growing complementary strand of the target nucleic acid by the catalytic activity of the nucleic acid polymerizing enzyme at each step in the sequence of base additions. A polymerase on the target nucleic acid molecule complex is provided in a position suitable to move along the target nucleic acid molecule and extend the oligonucleotide primer at an active site. A plurality of labelled types of nucleotide analogs are provided proximate to the active site, with each distinguishable type of nucleotide analog being complementary to a different nucleotide in the target nucleic acid sequence. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand at the active site, where the nucleotide analog being added is complementary to the nucleotide of the target nucleic acid at the active site. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. The steps of providing labelled nucleotide analogs, polymerizing the growing nucleic acid strand, and identifying the added nucleotide analog are repeated so that the nucleic acid strand is further extended and the sequence of the target nucleic acid is determined.

  19. Simulating Sequences of the Human Genome with Rare Variants

    PubMed Central

    Peng, Bo; Liu, Xiaoming

    2011-01-01

    Objective: Simulated samples have been widely used in the development of efficient statistical methods identifying genetic variants that predispose to human genetic diseases. Although it is well known that natural selection has a strong influence on the number and diversity of rare genetic variations in human populations, existing simulation methods are limited in their ability to simulate multi-locus selection models with realistic distributions of the random fitness effects of newly arising mutants. Methods: We developed a computer program to simulate large populations of gene sequences using a forward-time simulation approach. This program is capable of simulating several multi-locus fitness schemes with arbitrary diploid single-locus selection models with random or locus-specific fitness effects. Arbitrary quantitative trait or disease models can be applied to the simulated populations from which individual- or family-based samples can be drawn and analyzed. Results: Using realistic demographic and natural selection models estimated from empirical sequence data, datasets simulated using our method differ significantly in the number and diversity of rare variants from datasets simulated using existing methods that ignore natural selection. Our program thus provides a useful tool to simulate datasets with realistic distributions of rare genetic variants for the study of genetic diseases caused by such variants. PMID:21212684

  20. Differences in regulatory sequences of naturally occurring JC virus variants.

    PubMed Central

    Martin, J D; King, D M; Slauch, J M; Frisque, R J

    1985-01-01

    The regulatory region was sequenced for DNAs representative of seven independent isolates of JC virus, the probable agent of progressive multifocal leukoencephalopathy. The isolates included an oncogenic variant (MAD-4), an antigenic variant (MAD-11), and two different isolates derived from the urine (MAD-7) and from the brain (MAD-8) of the same patient. The representative DNAs were molecularly cloned directly from diseased brain tissue and from human fetal glial cells infected with the corresponding isolated viruses. The regulatory sequences of these DNAs were compared with those of the prototype isolate, MAD-1, sequenced previously (R. J. Frisque, J. Virol. 46:170-176, 1983). We found that the regulatory region of JC viral DNA is highly variable due to complex alterations of the previously described 98-base-pair repeat of MAD-1 DNA. On the basis of these alterations, there are two general types of JC virus. There were no consistent alterations in regulatory sequences which could distinguish brain tissue DNAs from tissue culture DNAs. Furthermore, for each isolate except MAD-1 (R. J. Frisque, J. Virol. 46:170-176, 1983), the regulatory regions of brain tissue and tissue culture DNAs were not identical. The arrangement, sequence, or both of potential regulatory elements (TATA sequence, GGGXGGPuPu, tandem repeats) of JC viral DNAs are sufficiently different from those in other viral and eucaryotic systems that they may effect the unique properties of this slow virus. PMID:2981353

  1. Fast single-pass alignment and variant calling using sequencing data

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Sequencing research requires efficient computation. Few programs use already known information about DNA variants when aligning sequence data to the reference map. New program findmap.f90 reads the previous variant list before aligning sequence, calling variant alleles, and summing the allele counts...

  2. M2SG: mapping human disease-related genetic variants to protein sequences and genomic loci

    PubMed Central

    Ji, Renkai; Cong, Qian; Li, Wenlin; Grishin, Nick V.

    2013-01-01

    Summary: Online Mendelian Inheritance in Man (OMIM) is a manually curated compendium of human genetic variants and the corresponding phenotypes, mostly human diseases. Instead of directly documenting the native sequences for gene entries, OMIM links its entries to protein and DNA sequences in other databases. However, because of the existence of gene isoforms and errors in OMIM records, mapping a specific OMIM mutation to its corresponding protein sequence is not trivial. Combining computer programs and extensive manual curation of OMIM full-text descriptions and original literature, we mapped 98% of OMIM amino acid substitutions (AASs) and all SwissProt Variant (SwissVar) disease-related AASs to reference sequences and confidently mapped 99.96% of all AASs to the genomic loci. Based on the results, we developed an online database and interactive web server (M2SG) to (i) retrieve the mapped OMIM and SwissVar variants for a given protein sequence; and (ii) obtain related proteins and mutations for an input disease phenotype. This database will be useful for analyzing sequences, understanding the effect of mutations, identifying important genetic variations and designing experiments on a protein of interest. Availability and implementation: The database and web server are freely available at http://prodata.swmed.edu/M2S/mut2seq.cgi. Contact: grishin@chop.swmed.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:24002112

  3. Amino acid substitutions in inherited albumin variants from Amerindian and Japanese populations

    SciTech Connect

    Takahashi, N.; Takahashi, Y.; Isobe, T.; Putnam, F.W.; Fujita, M.; Satoh, C.; Neel, J.V.

    1987-11-01

    The authors report an effort to determine the basis for the altered migration of seven inherited albumin variants detected by one-dimensional electrophoresis in population surveys involving tribal Amerindians and Japanese children. An amino acid substitution has thus far been determined for four of the variants. The randomness in the albumin polypeptide of these and the other sixteen independently ascertained amino acid substitutions of albumin and proalbumin thus far established was analyzed; the clustering of eight of these at two positions in the six-amino acid propeptide sequence seems noteworthy. By comparison with other proteins studied by electrophoresis, albumin exhibits average variability. It is a paradox that individuals who, for genetic reasons, lack albumin exhibit no obvious ill effects; yet, electrophoretic variants of albumin are no more numerous than are variants of proteins, the absence of which results in severe disease.

  4. Identifying novel sequence variants of RNA 3D motifs

    PubMed Central

    Zirbel, Craig L.; Roll, James; Sweeney, Blake A.; Petrov, Anton I.; Pirrung, Meg; Leontis, Neocles B.

    2015-01-01

    Predicting RNA 3D structure from sequence is a major challenge in biophysics. An important sub-goal is accurately identifying recurrent 3D motifs from RNA internal and hairpin loop sequences extracted from secondary structure (2D) diagrams. We have developed and validated new probabilistic models for 3D motif sequences based on hybrid Stochastic Context-Free Grammars and Markov Random Fields (SCFG/MRF). The SCFG/MRF models are constructed using atomic-resolution RNA 3D structures. To parameterize each model, we use all instances of each motif found in the RNA 3D Motif Atlas and annotations of pairwise nucleotide interactions generated by the FR3D software. Isostericity relations between non-Watson–Crick basepairs are used in scoring sequence variants. SCFG techniques model nested pairs and insertions, while MRF ideas handle crossing interactions and base triples. We use test sets of randomly-generated sequences to set acceptance and rejection thresholds for each motif group and thus control the false positive rate. Validation was carried out by comparing results for four motif groups to RMDetect. The software developed for sequence scoring (JAR3D) is structured to automatically incorporate new motifs as they accumulate in the RNA 3D Motif Atlas when new structures are solved and is available free for download. PMID:26130723
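
    The thresholding step mentioned in this abstract (scoring randomly generated sequences to control the false positive rate) can be illustrated with a minimal sketch. This is not the JAR3D implementation: the uniform random-sequence model, the `toy_score` function, and the 5% target are all assumptions chosen only to make the example runnable.

```python
import random

def random_sequence(length, rng, alphabet="ACGU"):
    """Draw a uniformly random RNA sequence (hypothetical null model)."""
    return "".join(rng.choice(alphabet) for _ in range(length))

def toy_score(seq):
    """Placeholder score (G/C fraction); stands in for a real motif-model score."""
    return sum(base in "GC" for base in seq) / len(seq)

def acceptance_threshold(null_scores, max_fpr):
    """Return a cutoff t such that at most max_fpr of null_scores are strictly above t."""
    ranked = sorted(null_scores, reverse=True)
    k = int(max_fpr * len(ranked))   # number of null scores allowed above the cutoff
    k = min(k, len(ranked) - 1)
    return ranked[k]

rng = random.Random(0)               # fixed seed so the run is reproducible
null_scores = [toy_score(random_sequence(12, rng)) for _ in range(10_000)]
cutoff = acceptance_threshold(null_scores, max_fpr=0.05)
observed_fpr = sum(s > cutoff for s in null_scores) / len(null_scores)
print(f"cutoff={cutoff:.3f}, empirical FPR={observed_fpr:.4f}")
```

    A candidate sequence would be accepted only if its score is strictly greater than `cutoff`; by construction, no more than the target fraction of the random sequences pass that test.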

  5. Hypomorphic variants of cationic amino acid transporter 3 in males with autism spectrum disorders.

    PubMed

    Nava, Caroline; Rupp, Johanna; Boissel, Jean-Paul; Mignot, Cyril; Rastetter, Agnès; Amiet, Claire; Jacquette, Aurélia; Dupuits, Céline; Bouteiller, Delphine; Keren, Boris; Ruberg, Merle; Faudet, Anne; Doummar, Diane; Philippe, Anne; Périsse, Didier; Laurent, Claudine; Lebrun, Nicolas; Guillemot, Vincent; Chelly, Jamel; Cohen, David; Héron, Delphine; Brice, Alexis; Closs, Ellen I; Depienne, Christel

    2015-12-01

    Cationic amino acid transporters (CATs) mediate the entry of L-type cationic amino acids (arginine, ornithine and lysine) into cells, including neurons. CAT-3, encoded by the SLC7A3 gene on chromosome X, is one of the three CATs present in the human genome, with selective expression in brain. SLC7A3 is highly intolerant to variation in humans, as attested by the low frequency of deleterious variants in available databases, but the impact of variants in this gene in humans remains undefined. In this study, we identified a missense variant in SLC7A3, encoding the CAT-3 cationic amino acid transporter, on chromosome X by exome sequencing in two brothers with autism spectrum disorder (ASD). We then sequenced the SLC7A3 coding sequence in 148 male patients with ASD and identified three additional rare missense variants in unrelated patients. Functional analyses of the mutant transporters showed that two of the four identified variants cause severe or moderate loss of CAT-3 function due to altered protein stability or abnormal trafficking to the plasma membrane. The patient with the most deleterious SLC7A3 variant had high-functioning autism and epilepsy, and also carries a de novo 16p11.2 duplication possibly contributing to his phenotype. This study shows that rare hypomorphic variants of SLC7A3 exist in male individuals and suggests that SLC7A3 variants possibly contribute to the etiology of ASD in male subjects in association with other genetic factors. PMID:26215737

  6. mitoSAVE: mitochondrial sequence analysis of variants in Excel.

    PubMed

    King, Jonathan L; Sajantila, Antti; Budowle, Bruce

    2014-09-01

    The mitochondrial genome (mtGenome) contains genetic information amenable to numerous applications such as medical research, population and evolutionary studies, and human identity testing. However, inconsistent nomenclature assignment makes haplotype comparison difficult and can lead to false exclusion of potentially useful profiles. Massively Parallel Sequencing (MPS) is a platform for sequencing large datasets and potentially whole populations with relative ease. However, the data generated are not easily parsed and interpreted. With this in mind, mitoSAVE has been developed to enable fast conversion of Variant Call Format (VCF) files. mitoSAVE is an Excel-based workbook that converts data within the VCF into mtDNA haplotypes using phylogenetically established nomenclature as well as rule-based alignments consistent with current forensic standards. mitoSAVE is formatted for the human mitochondrial genome; however, it can easily be adapted to support other reasonably small genomes.
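
    To make the VCF-to-haplotype idea concrete, here is a minimal sketch that reads VCF text and writes single-base substitutions in a common "position plus variant base" shorthand. It is not mitoSAVE's method: the function names and the example records are hypothetical, and insertions and deletions are deliberately skipped because their forensic nomenclature depends on alignment rules that this sketch does not attempt.

```python
def vcf_substitutions(vcf_text):
    """Yield (position, ref, alt) for single-base substitutions in VCF text."""
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):    # skip meta-information and header lines
            continue
        fields = line.split("\t")               # CHROM POS ID REF ALT QUAL FILTER INFO
        pos, ref, alts = int(fields[1]), fields[3], fields[4]
        for alt in alts.split(","):             # ALT may list several alleles
            if len(ref) == 1 and len(alt) == 1 and alt in "ACGT":
                yield pos, ref, alt

def haplotype_string(vcf_text):
    """Join substitutions as '<position><alt base>' tokens, sorted by position."""
    calls = sorted(vcf_substitutions(vcf_text))
    return " ".join(f"{pos}{alt}" for pos, _ref, alt in calls)

# Made-up example records, for illustration only.
example_vcf = "\n".join([
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chrM\t263\t.\tA\tG\t.\tPASS\t.",
    "chrM\t73\t.\tA\tG\t.\tPASS\t.",
    "chrM\t310\t.\tT\tTC\t.\tPASS\t.",          # insertion: skipped by this sketch
])
print(haplotype_string(example_vcf))            # 73G 263G
```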

  7. Whole-exome sequencing identifies variants in invasive pituitary adenomas

    PubMed Central

    Lan, Xiaolei; Gao, Hua; Wang, Fei; Feng, Jie; Bai, Jiwei; Zhao, Peng; Cao, Lei; Gui, Songbai; Gong, Lei; Zhang, Yazhuo

    2016-01-01

    Pituitary adenomas exhibit a wide range of behaviors. The prediction of invasion or malignant behavior in pituitary adenomas remains challenging. The objective of the present study was to identify the genetic abnormalities associated with invasion in sporadic pituitary adenomas. In the present study, the exomes of six invasive pituitary adenomas (IPA) and six non-invasive pituitary adenomas (nIPA) were sequenced by whole-exome sequencing. Variants were confirmed by dideoxynucleotide sequencing, and candidate driver genes were assessed in an additional 28 pituitary adenomas. A total of 15 identified variants were mainly associated with angiogenesis, metabolism, cell cycle phase, cellular component organization, cytoskeleton and biogenesis immune at a cellular level, including 13 variants that occurred as single nucleotide variants and 2 that comprised insertions. The messenger RNA (mRNA) levels of diffuse panbronchiolitis critical region 1 (DPCR1), KIAA0226, myxovirus (influenza virus) resistance, proline-rich protein BstNI subfamily 3, PR domain containing 2, with ZNF domain, RIZ1 (PRDM2), PR domain containing 8 (PRDM8), SPANX family member N2 (SPANXN2), TRIO and F-actin binding protein and zinc finger protein 717 in IPA specimens were 50% decreased compared with nIPA specimens. In particular, DPCR1, PRDM2, PRDM8 and SPANXN2 mRNA levels in IPA specimens were approximately four-fold lower compared with nIPA specimens (P=0.003, 0.007, 0.009 and 0.004, respectively). By contrast, the mRNA levels of dentin sialophospho protein, EGF like domain, multiple 7 (EGFL7), low density lipoprotein receptor-related protein 1B and dynein, axonemal, assembly factor 1 (LRRC50) were increased in IPA compared with nIPA specimens (P=0.041, 0.037, 0.022 and 0.013, respectively). Furthermore, decreased PRDM2 expression was associated with tumor recurrence. The findings of the present study indicate that DPCR1, EGFL7, the PRDM family and LRRC50 in pituitary adenomas are modifiers of

  9. Eliminating tyrosine sequence variants in CHO cell lines producing recombinant monoclonal antibodies.

    PubMed

    Feeney, Lauren; Carvalhal, Veronica; Yu, X Christopher; Chan, Betty; Michels, David A; Wang, Yajun Jennifer; Shen, Amy; Ressl, Jan; Dusel, Brendon; Laird, Michael W

    2013-04-01

    Amino acid sequence variants are defined as unintended amino acid sequence changes that contribute to product variation with potential impact to product safety, immunogenicity, and efficacy. Therefore, it is important to understand the propensity for sequence variant (SV) formation during the production of recombinant proteins for therapeutic use. During the development of clinical therapeutic products, several monoclonal antibodies (mAbs) produced from Chinese Hamster Ovary (CHO) cells exhibited SVs at low levels (≤3%) in multiple locations throughout the mAbs. In these examples, the cell culture process depleted tyrosine, and the tyrosine residues in the recombinant mAbs were replaced with phenylalanine or histidine. In this work, it is demonstrated that tyrosine supplementation eliminated the tyrosine SVs, while early tyrosine starvation significantly increased the SV level in all mAbs tested. Additionally, it was determined that phenylalanine is the amino acid preferentially misincorporated in the absence of tyrosine over histidine, with no other amino acid misincorporated in the absence of tyrosine, phenylalanine, and histidine. The data support that the tyrosine SVs are due to mistranslation and not DNA mutation, most likely due to tRNA(Tyr) mischarging due to the structural similarities between tyrosine and phenylalanine.

  10. Structure prediction and analysis of neuraminidase sequence variants.

    PubMed

    Thayer, Kelly M

    2016-07-01

    Analyzing protein structure has become an integral aspect of understanding systems of biochemical import. The laboratory experiment endeavors to introduce protein folding to ascertain structures of proteins for which the structure is unavailable, as well as to critically evaluate the quality of the prediction obtained. The model system used is the highly mutable influenza virus protein neuraminidase, which is the key target in the development of therapeutics. In light of recent pandemics, understanding how mutations confer drug resistance, which translates at the molecular level to understanding how different sequence variants differ, constitutes an area of great interest because of the ramifications in public health. This lab targets upper level undergraduate biochemistry students, and aims to introduce tools to be used to explore protein folding and protein visualization in the context of the neuraminidase case study. Students proceed to critically evaluate the folded models by comparison with crystallographic structures. When validity is established, they fold a neuraminidase sequence for which a structure is not available. Through structural alignment and visual inspection of the 150 loop, students gain molecular insight into two possible conformations of the protein, which are actively being studied. Folding the third chosen sequence mimics a true research environment in allowing students to generate a structure from a sequence for which a structure was not previously available, and to assess whether their particular variant has an open or closed loop. From this vantage, they are then challenged to speculate about the connection between loop conformation and drug susceptibility. © 2016 by The International Union of Biochemistry and Molecular Biology, 44(4):361-376, 2016. PMID:26900942

  11. High speed nucleic acid sequencing

    SciTech Connect

    Korlach, Jonas; Webb, Watt W.; Levene, Michael; Turner, Stephen; Craighead, Harold G.; Foquet, Mathieu

    2011-05-17

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid. Each type of labeled nucleotide comprises an acceptor fluorophore attached to a phosphate portion of the nucleotide such that the fluorophore is removed upon incorporation into a growing strand. Fluorescent signal is emitted via fluorescence resonance energy transfer between the donor fluorophore and the acceptor fluorophore as each nucleotide is incorporated into the growing strand. The sequence is deduced by identifying which base is being incorporated into the growing strand.

  12. Predicted Molecular Effects of Sequence Variants Link to System Level of Disease.

    PubMed

    Reeb, Jonas; Hecht, Maximilian; Mahlich, Yannick; Bromberg, Yana; Rost, Burkhard

    2016-08-01

    Developments in experimental and computational biology are advancing our understanding of how protein sequence variation impacts molecular protein function. However, the leap from the micro level of molecular function to the macro level of the whole organism, e.g. disease, remains barred. Here, we present new results emphasizing earlier work that suggested some links from molecular function to disease. We focused on non-synonymous single nucleotide variants, also referred to as single amino acid variants (SAVs). Building upon OMIA (Online Mendelian Inheritance in Animals), we introduced a curated set of 117 disease-causing SAVs in animals. Methods optimized to capture effects upon molecular function often correctly predict human (OMIM) and animal (OMIA) Mendelian disease-causing variants. We also predicted effects of human disease-causing variants in the mouse model, i.e. we put OMIM SAVs into mouse orthologs. Overall, fewer variants were predicted with effect in the model organism than in the original organism. Our results, along with other recent studies, demonstrate that predictions of molecular effects capture some important aspects of disease. Thus, in silico methods focusing on the micro level of molecular function can help to understand the macro system level of disease. PMID:27536940

  13. Predicted Molecular Effects of Sequence Variants Link to System Level of Disease

    PubMed Central

    Bromberg, Yana; Rost, Burkhard

    2016-01-01

    Developments in experimental and computational biology are advancing our understanding of how protein sequence variation impacts molecular protein function. However, the leap from the micro level of molecular function to the macro level of the whole organism, e.g. disease, remains barred. Here, we present new results emphasizing earlier work that suggested some links from molecular function to disease. We focused on non-synonymous single nucleotide variants, also referred to as single amino acid variants (SAVs). Building upon OMIA (Online Mendelian Inheritance in Animals), we introduced a curated set of 117 disease-causing SAVs in animals. Methods optimized to capture effects upon molecular function often correctly predict human (OMIM) and animal (OMIA) Mendelian disease-causing variants. We also predicted effects of human disease-causing variants in the mouse model, i.e. we put OMIM SAVs into mouse orthologs. Overall, fewer variants were predicted with effect in the model organism than in the original organism. Our results, along with other recent studies, demonstrate that predictions of molecular effects capture some important aspects of disease. Thus, in silico methods focusing on the micro level of molecular function can help to understand the macro system level of disease. PMID:27536940

  14. Whole-genome sequencing is more powerful than whole-exome sequencing for detecting exome variants

    PubMed Central

    Belkadi, Aziz; Bolze, Alexandre; Itan, Yuval; Cobat, Aurélie; Vincent, Quentin B.; Antipenko, Alexander; Shang, Lei; Boisson, Bertrand; Casanova, Jean-Laurent; Abel, Laurent

    2015-01-01

    We compared whole-exome sequencing (WES) and whole-genome sequencing (WGS) in six unrelated individuals. In the regions targeted by WES capture (81.5% of the consensus coding genome), the mean numbers of single-nucleotide variants (SNVs) and small insertions/deletions (indels) detected per sample were 84,192 and 13,325, respectively, for WES, and 84,968 and 12,702, respectively, for WGS. For both SNVs and indels, the distributions of coverage depth, genotype quality, and minor read ratio were more uniform for WGS than for WES. After filtering, a mean of 74,398 (95.3%) high-quality (HQ) SNVs and 9,033 (70.6%) HQ indels were called by both platforms. A mean of 105 coding HQ SNVs and 32 indels was identified exclusively by WES whereas 692 HQ SNVs and 105 indels were identified exclusively by WGS. We Sanger-sequenced a random selection of these exclusive variants. For SNVs, the proportion of false-positive variants was higher for WES (78%) than for WGS (17%). The estimated mean number of real coding SNVs (656 variants, ∼3% of all coding HQ SNVs) identified by WGS and missed by WES was greater than the number of SNVs identified by WES and missed by WGS (26 variants). For indels, the proportions of false-positive variants were similar for WES (44%) and WGS (46%). Finally, WES was not reliable for the detection of copy-number variations, almost all of which extended beyond the targeted regions. Although currently more expensive, WGS is more powerful than WES for detecting potential disease-causing mutations within WES regions, particularly those due to SNVs. PMID:25827230

  15. Whole-genome sequencing is more powerful than whole-exome sequencing for detecting exome variants.

    PubMed

    Belkadi, Aziz; Bolze, Alexandre; Itan, Yuval; Cobat, Aurélie; Vincent, Quentin B; Antipenko, Alexander; Shang, Lei; Boisson, Bertrand; Casanova, Jean-Laurent; Abel, Laurent

    2015-04-28

    We compared whole-exome sequencing (WES) and whole-genome sequencing (WGS) in six unrelated individuals. In the regions targeted by WES capture (81.5% of the consensus coding genome), the mean numbers of single-nucleotide variants (SNVs) and small insertions/deletions (indels) detected per sample were 84,192 and 13,325, respectively, for WES, and 84,968 and 12,702, respectively, for WGS. For both SNVs and indels, the distributions of coverage depth, genotype quality, and minor read ratio were more uniform for WGS than for WES. After filtering, a mean of 74,398 (95.3%) high-quality (HQ) SNVs and 9,033 (70.6%) HQ indels were called by both platforms. A mean of 105 coding HQ SNVs and 32 indels was identified exclusively by WES whereas 692 HQ SNVs and 105 indels were identified exclusively by WGS. We Sanger-sequenced a random selection of these exclusive variants. For SNVs, the proportion of false-positive variants was higher for WES (78%) than for WGS (17%). The estimated mean number of real coding SNVs (656 variants, ∼3% of all coding HQ SNVs) identified by WGS and missed by WES was greater than the number of SNVs identified by WES and missed by WGS (26 variants). For indels, the proportions of false-positive variants were similar for WES (44%) and WGS (46%). Finally, WES was not reliable for the detection of copy-number variations, almost all of which extended beyond the targeted regions. Although currently more expensive, WGS is more powerful than WES for detecting potential disease-causing mutations within WES regions, particularly those due to SNVs. PMID:25827230

  16. Eleven new sequence variants of citrus exocortis viroid and the correlation of sequence with pathogenicity.

    PubMed Central

    Visvader, J E; Symons, R H

    1985-01-01

    Full-length double-stranded cDNA was prepared from purified circular RNA of two new Australian field isolates of citrus exocortis viroid (CEV) using two synthetic oligodeoxynucleotide primers. The cDNA was then cloned into the phage vector M13mp9 for sequence analysis. Sequencing of nine cDNA clones of isolate CEV-DE30 and eleven cDNA clones of isolate CEV-J indicated that both isolates consisted of a mixture of viroid species and led to the discovery of eleven new sequence variants of CEV. These new variants, together with the six reported previously, form two classes of sequence which differ by a minimum of 26 nucleotides in a total of 370 to 375 residues. These two classes correlate with two biologically distinct groups when propagated on tomato plants where one produces severe symptoms and the other gives rise to mild symptoms. Two regions of the native structure of CEV, comprising 18% of the total residues, differ between the sequence variants of mild and severe isolates. Whether or not both of these regions are essential for the variation in pathogenicity has yet to be determined. PMID:2582367

  17. Effect of Amino Acid Polymorphisms of House Dust Mite Der p 2 Variants on Allergic Sensitization

    PubMed Central

    Tanyaratsrisakul, Sasipa; Jirapongsananuruk, Orathai; Kulwanich, Bhakkawarat; Hales, Belinda J.; Thomas, Wayne R.

    2016-01-01

    Purpose: The sequence variations of the Der p 2 allergen of Dermatophagoides pteronyssinus diverge along 2 pathways with particular amino acid substitutions at positions 40, 47, 111, and 114. The environmental prevalence and IgE binding to Der p 2 variants differ among regions. To compare IgE binding to Der p 2 variants between sera from Bangkok, Thailand and Perth, Western Australia with different variants and to determine the variant-specificity of antibodies induced by vaccination with recombinant variants. Methods: The structures of recombinant variants produced in yeast were compared by circular dichroism and 1-anilinonaphthalene 8-sulfonic acid staining of their lipid-binding cavity. Sera from subjects in Bangkok and Perth where different variants are found were compared by the affinity (IC50) of IgE cross-reactivity to different variants and by direct IgE binding. Mice were immunized with the variants Der p 2.0101 and Der p 2.0110, and their IgG binding to Der p 2.0103, 2.0104, and 2.0109 was measured. Results: The secondary structures of the recombinant variants resembled the natural allergen but with differences in ANS binding. The IC50 of Der p 2.0101 required 7-fold higher concentrations to inhibit IgE binding to the high-IgE-binding Der p 2.0104 than for homologous inhibition in sera from Bangkok where it is absent, while in sera from Perth that have both variants the IC50 was the same and low. Reciprocal results were obtained for Der p 2.0110 not found in Perth. Direct binding revealed that Der p 2.0104 was best for detecting IgE in both regions, followed by Der p 2.0101 with binding to other variants showing larger differences. Mouse anti-Der p 2.0101 antibodies had a high affinity of cross-reactivity but bound poorly to other variants. Conclusions: The affinity of IgE antibody cross-reactivity, the direct IgE binding, and the specificities of antibodies induced by vaccination show that measures of allergic sensitization and therapeutic strategies could be

  18. Functional annotation of non-coding sequence variants

    PubMed Central

    Ritchie, Graham R. S.; Dunham, Ian; Zeggini, Eleftheria; Flicek, Paul

    2016-01-01

    Identifying functionally relevant variants against the background of ubiquitous genetic variation is a major challenge in human genetics. For variants that fall in protein-coding regions our understanding of the genetic code and splicing allow us to identify likely candidates, but interpreting variants that fall outside of genic regions is more difficult. Here we present a new tool, GWAVA, which supports prioritisation of non-coding variants by integrating a range of annotations. PMID:24487584

  19. Many amino acid substitution variants identified in DNA repair genes during human population screenings are predicted to impact protein function

    SciTech Connect

    Xi, T; Jones, I M; Mohrenweiser, H W

    2003-11-03

    Over 520 different amino acid substitution variants have been previously identified in the systematic screening of 91 human DNA repair genes for sequence variation. Two algorithms were employed to predict the impact of these amino acid substitutions on protein activity. Sorting Intolerant From Tolerant (SIFT) classified 226 of 508 variants (44%) as "Intolerant". Polymorphism Phenotyping (PolyPhen) classed 165 of 489 amino acid substitutions (34%) as "Probably or Possibly Damaging". Another 9-15% of the variants were classed as "Potentially Intolerant or Damaging". The results from the two algorithms are highly associated, with concordance in predicted impact observed for approximately 62% of the variants. Twenty one to thirty one percent of the variant proteins are predicted to exhibit reduced activity by both algorithms. These variants occur at slightly lower individual allele frequency than do the variants classified as "Tolerant" or "Benign". Both algorithms correctly predicted the impact of 26 functionally characterized amino acid substitutions in the APE1 protein on biochemical activity, with one exception. It is concluded that a substantial fraction of the missense variants observed in the general human population are functionally relevant. These variants are expected to be the molecular genetic and biochemical basis for the associations of reduced DNA repair capacity phenotypes with elevated cancer risk.
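
    As a quick arithmetic check, the two classification rates quoted in this abstract follow directly from the counts it gives. The short sketch below only recomputes those percentages; the helper name `percent` is introduced here for illustration.

```python
def percent(part, whole):
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)

sift = percent(226, 508)        # variants SIFT classified as "Intolerant"
polyphen = percent(165, 489)    # substitutions PolyPhen classed as probably/possibly damaging

print(f"SIFT: {sift}%")         # SIFT: 44%
print(f"PolyPhen: {polyphen}%") # PolyPhen: 34%
```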

  20. Predicting effects of noncoding variants with deep learning–based sequence model

    PubMed Central

    Zhou, Jian; Troyanskaya, Olga G

    2016-01-01

    Identifying functional effects of noncoding variants is a major challenge in human genetics. To predict the noncoding-variant effects de novo from sequence, we developed a deep learning–based algorithmic framework, DeepSEA (http://deepsea.princeton.edu/), that directly learns a regulatory sequence code from large-scale chromatin-profiling data, enabling prediction of chromatin effects of sequence alterations with single-nucleotide sensitivity. We further used this capability to improve prioritization of functional variants including expression quantitative trait loci (eQTLs) and disease-associated variants. PMID:26301843

  1. False discovery rates for rare variants from sequenced data.

    PubMed

    Capanu, Marinela; Seshan, Venkatraman E

    2015-02-01

    The detection of rare deleterious variants is the preeminent current technical challenge in statistical genetics. Sorting the deleterious from neutral variants at a disease locus is challenging because of the sparseness of the evidence for each individual variant. Hierarchical modeling and Bayesian model uncertainty are two techniques that have been shown to be promising in pinpointing individual rare variants that may be driving the association. Interpreting the results from these techniques from the perspective of multiple testing is a challenge and the goal of this article is to better understand their false discovery properties. Using simulations, we conclude that accurate false discovery control cannot be achieved in this framework unless the magnitude of the variants' risk is large and the hierarchical characteristics have high accuracy in distinguishing deleterious from neutral variants.

  2. Association of an ACSL1 gene variant with polyunsaturated fatty acids in bovine skeletal muscle

    PubMed Central

    2011-01-01

    Background: The intramuscular fat deposition and the fatty acid profiles of beef affect meat quality. High proportions of unsaturated fatty acids are related to beef flavor and are beneficial for the nutritional value of meat. Moreover, a variety of clinical and epidemiologic studies showed that particularly long-chain omega-3 fatty acids from animal sources have a positive impact on human health and disease. Results: To screen for genetic factors affecting fatty acid profiles in beef, we initially performed a microsatellite-based genome scan in an F2 Charolais × German Holstein resource population and identified a quantitative trait locus (QTL) for fatty acid composition in a region on bovine chromosome 27 where previously QTL affecting marbling score had been detected in beef cattle populations. The long-chain acyl-CoA synthetase 1 (ACSL1) gene was identified as the most plausible functional and positional candidate gene in the QTL interval due to its direct impact on fatty acid metabolism and its position in the QTL interval. ACSL1 is necessary for synthesis of long-chain acyl-CoA esters, fatty acid degradation and phospholipid remodeling. We validated the genomic annotation of the bovine ACSL1 gene by in silico comparative sequence analysis and experimental verification. Re-sequencing of the complete coding, exon-flanking intronic sequences, 3' untranslated region (3'UTR) and partial promoter region of the ACSL1 gene revealed three synonymous mutations in exons 6, 7, and 20, six noncoding intronic gene variants, six polymorphisms in the promoter region, and four variants in the 3'UTR. The association analysis identified the gene variant in intron 5 of the ACSL1 gene (c.481-233A>G) to be significantly associated with the relative content of distinct fractions and ratios of fatty acids (e.g., n-3 fatty acids, polyunsaturated, n-3 long-chain polyunsaturated fatty acids, trans vaccenic acid) in skeletal muscle. A tentative association of the ACSL1 gene

  3. Computational Approach to Annotating Variants of Unknown Significance in Clinical Next Generation Sequencing.

    PubMed

    Schulz, Wade L; Tormey, Christopher A; Torres, Richard

    2015-01-01

    Next generation sequencing (NGS) has become a common technology in the clinical laboratory, particularly for the analysis of malignant neoplasms. However, most mutations identified by NGS are variants of unknown clinical significance (VOUS). Although the approach to define these variants differs by institution, software algorithms that predict variant effect on protein function may be used. However, these algorithms commonly generate conflicting results, potentially adding uncertainty to interpretation. In this review, we examine several computational tools used to predict whether a variant has clinical significance. In addition to describing the role of these tools in clinical diagnostics, we assess their efficacy in analyzing known pathogenic and benign variants in hematologic malignancies.

  4. Sequence variants from whole genome sequencing a large group of Icelanders.

    PubMed

    Gudbjartsson, Daniel F; Sulem, Patrick; Helgason, Hannes; Gylfason, Arnaldur; Gudjonsson, Sigurjon A; Zink, Florian; Oddson, Asmundur; Magnusson, Gisli; Halldorsson, Bjarni V; Hjartarson, Eirikur; Sigurdsson, Gunnar Th; Kong, Augustine; Helgason, Agnar; Masson, Gisli; Magnusson, Olafur Th; Thorsteinsdottir, Unnur; Stefansson, Kari

    2015-01-01

    We have accumulated considerable data on the genetic makeup of the Icelandic population by sequencing the whole genomes of 2,636 Icelanders to a depth of at least 10X and by chip genotyping 101,584 more. The sequencing was done with Illumina technology. The median sequencing depth was 20X and 909 individuals were sequenced to a depth of at least 30X. We found 20 million single nucleotide polymorphisms (SNPs) and 1.5 million insertions/deletions (indels) that passed stringent quality control. Almost all the common SNPs (derived allele frequency (DAF) over 2%) that we identified in Iceland have been observed by either dbSNP (build 137) or the Exome Sequencing Project (ESP) while only 60 and 20% of rare (DAF<0.5%) SNPs and indels in coding regions, the most heavily studied parts of the genome, have been observed in the public databases. Features of our variant data, such as the transition/transversion ratio and the length distribution of indels, are similar to published reports. PMID:25977816
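
    The transition/transversion (Ti/Tv) ratio mentioned in this abstract is a common sanity check on SNP call sets. The following is a minimal sketch of how such a ratio could be computed from (reference, alternate) base pairs; the function names and the toy calls are illustrative assumptions, not the authors' pipeline or data.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def is_transition(ref: str, alt: str) -> bool:
    """True for purine<->purine or pyrimidine<->pyrimidine substitutions."""
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def ti_tv_ratio(snps):
    """Return transitions / transversions for an iterable of (ref, alt) pairs.

    Pairs that are not single-base substitutions are skipped.
    """
    counts = Counter()
    for ref, alt in snps:
        ref, alt = ref.upper(), alt.upper()
        if ref == alt or ref not in BASES or alt not in BASES:
            continue
        counts["ti" if is_transition(ref, alt) else "tv"] += 1
    if counts["tv"] == 0:
        raise ValueError("no transversions observed; ratio is undefined")
    return counts["ti"] / counts["tv"]


# Hypothetical toy calls: four transitions and two transversions.
toy_calls = [("A", "G"), ("C", "T"), ("G", "A"), ("T", "C"), ("A", "C"), ("G", "T")]
ratio = ti_tv_ratio(toy_calls)
```

    On real data the pairs would typically be read from a variant call file; that step is omitted here.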

  5. Rare Variants Association Analysis in Large-Scale Sequencing Studies at the Single Locus Level

    PubMed Central

    Lu, Wenbin; Tzeng, Jung-Ying

    2016-01-01

    Genetic association analyses of rare variants in next-generation sequencing (NGS) studies are fundamentally challenging due to the presence of a very large number of candidate variants at extremely low minor allele frequencies. Recent developments often focus on pooling multiple variants to provide association analysis at the gene instead of the locus level. Nonetheless, pinpointing individual variants is a critical goal for genomic research, as such information can facilitate the precise delineation of molecular mechanisms and functions of genetic factors on diseases. Due to the extreme rarity of mutations and high dimensionality, the significance of causal variants cannot easily stand out from that of noncausal ones. Consequently, standard false-positive control procedures, such as the Bonferroni and false discovery rate (FDR), are often impractical to apply, as a majority of the causal variants can only be identified along with a few but unknown number of noncausal variants. To provide informative analysis of individual variants in large-scale sequencing studies, we propose the Adaptive False-Negative Control (AFNC) procedure that can include a large proportion of causal variants with high confidence by introducing a novel statistical inquiry to determine those variants that can be confidently dispatched as noncausal. The AFNC provides a general framework that can accommodate a variety of models and significance tests. The procedure is computationally efficient and can adapt to the underlying proportion of causal variants and quality of significance rankings. Extensive simulation studies across a plethora of scenarios demonstrate that the AFNC is advantageous for identifying individual rare variants, whereas the Bonferroni and FDR are exceedingly over-conservative for rare variants association studies. In the analyses of the CoLaus dataset, AFNC has identified individual variants most responsible for gene-level significances. Moreover, single-variant results
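
    For orientation, the two baseline false-positive control procedures that this abstract contrasts with AFNC can be sketched as follows. This is a generic illustration of the Bonferroni correction and the Benjamini-Hochberg FDR step-up procedure on made-up p-values; it does not implement AFNC, whose details are not given in the abstract.

```python
def bonferroni(pvals, alpha=0.05):
    """Reject H0 for p-values at or below alpha / m (family-wise error control)."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def benjamini_hochberg(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (false discovery rate control)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    # Find the largest rank k whose p-value satisfies p_(k) <= alpha * k / m.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= alpha * rank / m:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max.
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            reject[i] = True
    return reject


# Hypothetical single-variant p-values, for illustration only.
toy_p = [0.001, 0.008, 0.02, 0.03, 0.2, 0.5, 0.7, 0.9]
bonf_hits = bonferroni(toy_p)
bh_hits = benjamini_hochberg(toy_p)
```

    With these toy values Bonferroni retains one variant and Benjamini-Hochberg retains two, which illustrates how conservative both thresholds are when many variants are tested at once.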

  6. Consensus sequence determination and elucidation of the evolutionary history of a rotavirus Wa variant reveal a close relationship to various Wa variants derived from the original Wa strain.

    PubMed

    Wentzel, Johannes F; Yuan, Lijuan; Rao, Shujing; van Dijk, Alberdina A; O'Neill, Hester G

    2013-12-01

    The consensus nucleotide sequence of a human rotavirus Wa strain, with only a partially known passage history, was determined with sequence-independent amplification and next generation 454® pyrosequencing. This rotavirus Wa strain had the expected genome constellation of G1-P[8]-I1-R1-C1-M1-A1-N1-T1-E1-H1 and was designated RVA/Human-tc/USA/WaCS/1974/G1P[8]. Phylogenetic analyses revealed a close relationship to four human rotavirus Wa variants (Wag5re, Wag7/8re, ParWa and VirWa) derived from the original 1974 human isolate. There were rearrangements in the Wag5re- and Wag7/8re variants in genome segments 5 (Wag5re) and 7 and 8 (Wag7/8re), which were not present in WaCS. Pairwise comparisons and a combined molecular clock for the Wa rotavirus genome indicated a close relationship between WaCS and ParWa and VirWa. These results suggest that WaCS is most probably an early cell culture adapted variant from the initial gnotobiotic pig passaged Wa isolate. Evolutionary pressure analysis identified a possible negative selected amino acid site in VP1 (genome segment 1) and a likely positive selected site in VP4 (genome segment 4). The WaCS may be more appropriate as a rotavirus Wa reference sequence than the current composite Wa reference genome.

  7. Whole-Genome sequencing and genetic variant analysis of a quarter Horse mare

    PubMed Central

    2012-01-01

    Background The catalog of genetic variants in the horse genome originates from a few select animals, the majority originating from the Thoroughbred mare used for the equine genome sequencing project. The purpose of this study was to identify genetic variants, including single nucleotide polymorphisms (SNPs), insertion/deletion polymorphisms (INDELs), and copy number variants (CNVs) in the genome of an individual Quarter Horse mare sequenced by next-generation sequencing. Results Using massively parallel paired-end sequencing, we generated 59.6 Gb of DNA sequence from a Quarter Horse mare resulting in an average of 24.7X sequence coverage. Reads were mapped to approximately 97% of the reference Thoroughbred genome. Unmapped reads were de novo assembled resulting in 19.1 Mb of new genomic sequence in the horse. Using a stringent filtering method, we identified 3.1 million SNPs, 193 thousand INDELs, and 282 CNVs. Genetic variants were annotated to determine their impact on gene structure and function. Additionally, we genotyped this Quarter Horse for mutations of known diseases and for variants associated with particular traits. Functional clustering analysis of genetic variants revealed that most of the genetic variation in the horse's genome was enriched in sensory perception, signal transduction, and immunity and defense pathways. Conclusions This is the first sequencing of a horse genome by next-generation sequencing and the first genomic sequence of an individual Quarter Horse mare. We have increased the catalog of genetic variants for use in equine genomics by the addition of novel SNPs, INDELs, and CNVs. The genetic variants described here will be a useful resource for future studies of genetic variation regulating performance traits and diseases in equids. PMID:22340285

  8. Population sequencing of two endocannabinoid metabolic genes identifies rare and common regulatory variants associated with extreme obesity and metabolite level

    PubMed Central

    2010-01-01

    Background Targeted re-sequencing of candidate genes in individuals at the extremes of a quantitative phenotype distribution is a method of choice to gain information on the contribution of rare variants to disease susceptibility. The endocannabinoid system mediates signaling in the brain and peripheral tissues involved in the regulation of energy balance, is highly active in obese patients, and represents a strong candidate pathway to examine for genetic association with body mass index (BMI). Results We sequenced two intervals (covering 188 kb) encoding the endocannabinoid metabolic enzymes fatty-acid amide hydrolase (FAAH) and monoglyceride lipase (MGLL) in 147 normal controls and 142 extremely obese cases. After applying quality filters, we called 1,393 high quality single nucleotide variants, 55% of which are rare, and 143 indels. Using single marker tests and collapsed marker tests, we identified four intervals associated with BMI: the FAAH promoter, the MGLL promoter, MGLL intron 2, and MGLL intron 3. Two of these intervals are composed of rare variants and the majority of the associated variants are located in promoter sequences or in predicted transcriptional enhancers, suggesting a regulatory role. The set of rare variants in the FAAH promoter associated with BMI is also associated with increased levels of the FAAH substrate anandamide, further implicating a functional role in obesity. Conclusions Our study, which is one of the first reports of a sequence-based association study using next-generation sequencing of candidate genes, provides insights into study design and analysis approaches and demonstrates the importance of examining regulatory elements rather than exclusively focusing on exon sequences. PMID:21118518

  9. Draft Genome Sequences of Five Yersinia pseudotuberculosis ST19 Isolates and One Isolate Variant.

    PubMed

    Platonov, Mikhail E; Blouin, Yann; Evseeva, Vera V; Afanas'ev, Maxim V; Pourcel, Christine; Balakhonov, Sergey V; Vergnaud, Gilles; Anisimov, Andrey P

    2013-04-11

    We report the first draft genome sequences of five Yersinia pseudotuberculosis isolates of sequence type (ST) 19 and of a variant from one of the five isolates. The total length of assemblies ranged from 4,226,485 bp to 4,274,148 bp, including between 3,808 and 3,843 predicted coding sequences.

  10. Draft Genome Sequences of Five Yersinia pseudotuberculosis ST19 Isolates and One Isolate Variant

    PubMed Central

    Platonov, Mikhail E.; Blouin, Yann; Evseeva, Vera V.; Afanas’ev, Maxim V.; Pourcel, Christine; Balakhonov, Sergey V.

    2013-01-01

    We report the first draft genome sequences of five Yersinia pseudotuberculosis isolates of sequence type (ST) 19 and of a variant from one of the five isolates. The total length of assemblies ranged from 4,226,485 bp to 4,274,148 bp, including between 3,808 and 3,843 predicted coding sequences. PMID:23580708

  11. Complete Genome Sequence of Pseudomonas aeruginosa Phage-Resistant Variant PA1RG

    PubMed Central

    Li, Gang; Lu, Shuguang; Shen, Mengyu; Le, Shuai; Tan, Yinling; Li, Ming; Zhao, Xia; Wang, Jing; Shen, Wei; Guo, Keke; Yang, Yuhui; Zhu, Hongbin; Li, Shu; Zhu, Junmin; Rao, Xiancai

    2016-01-01

    Bacteria have evolved several defense systems against phage predation. Here, we report the 6,500,439-bp complete genome sequence of the Pseudomonas aeruginosa phage-resistant variant PA1RG. Single-molecule real-time (SMRT) sequencing and de novo assembly revealed a single contig with 320-fold sequence coverage. PMID:26893434

  12. Complete Genome Sequence of Pseudomonas aeruginosa Phage-Resistant Variant PA1RG.

    PubMed

    Li, Gang; Lu, Shuguang; Shen, Mengyu; Le, Shuai; Tan, Yinling; Li, Ming; Zhao, Xia; Wang, Jing; Shen, Wei; Guo, Keke; Yang, Yuhui; Zhu, Hongbin; Li, Shu; Zhu, Junmin; Rao, Xiancai; Hu, Fuquan

    2016-01-01

    Bacteria have evolved several defense systems against phage predation. Here, we report the 6,500,439-bp complete genome sequence of the Pseudomonas aeruginosa phage-resistant variant PA1RG. Single-molecule real-time (SMRT) sequencing and de novo assembly revealed a single contig with 320-fold sequence coverage. PMID:26893434

  13. Chip-based sequencing nucleic acids

    SciTech Connect

    Beer, Neil Reginald

    2014-08-26

    A system for fast DNA sequencing by amplification of genetic material within microreactors, denaturing, demulsifying, and then sequencing the material, while retaining it in a PCR/sequencing zone by a magnetic field. One embodiment includes sequencing nucleic acids on a microchip that includes a microchannel flow channel in the microchip. The nucleic acids are isolated and hybridized to magnetic nanoparticles or to magnetic polystyrene-coated beads. Microreactor droplets are formed in the microchannel flow channel. The microreactor droplets containing the nucleic acids and the magnetic nanoparticles are retained in a magnetic trap in the microchannel flow channel and sequenced.

  14. Detection and characterization of two co-infection variant strains of avian orthoreovirus (ARV) in young layer chickens using next-generation sequencing (NGS).

    PubMed

    Tang, Yi; Lin, Lin; Sebastian, Aswathy; Lu, Huaguang

    2016-04-19

    Using next-generation sequencing (NGS) for full genomic characterization studies of the newly emerging avian orthoreovirus (ARV) field strains isolated in Pennsylvania poultry, we identified two co-infection ARV variant strains from one ARV isolate obtained from ARV-affected young layer chickens. The de novo assembly of the ARV reads generated 19 contigs of two different ARV variant strains according to 10 genome segments of each ARV strain. The two variants had the same M2 segment. The complete genomes of each of the two variant strains were 23,493 bp in length, and 10 dsRNA segments ranged from 1192 bp (S4) to 3958 bp (L1), encoding 12 viral proteins. Sequence comparison of nucleotide (nt) and amino acid (aa) sequences of all 10 genome segments revealed 58.1-100% nt and 51.4-100% aa identity between the two variant strains, and 54.3-89.4% nt and 49.5-98.1% aa identity between the two variants and classic vaccine strains. Phylogenetic analysis revealed a moderate to significant nt sequence divergence between the two variant and ARV reference strains. These findings have demonstrated the first naturally occurring co-infection of two ARV variants in commercial young layer chickens, providing scientific evidence that multiple ARV strains can be simultaneously present in one host species of chickens.

  15. Detection and characterization of two co-infection variant strains of avian orthoreovirus (ARV) in young layer chickens using next-generation sequencing (NGS)

    PubMed Central

    Tang, Yi; Lin, Lin; Sebastian, Aswathy; Lu, Huaguang

    2016-01-01

    Using next-generation sequencing (NGS) for full genomic characterization studies of the newly emerging avian orthoreovirus (ARV) field strains isolated in Pennsylvania poultry, we identified two co-infection ARV variant strains from one ARV isolate obtained from ARV-affected young layer chickens. The de novo assembly of the ARV reads generated 19 contigs of two different ARV variant strains according to 10 genome segments of each ARV strain. The two variants had the same M2 segment. The complete genomes of each of the two variant strains were 23,493 bp in length, and 10 dsRNA segments ranged from 1192 bp (S4) to 3958 bp (L1), encoding 12 viral proteins. Sequence comparison of nucleotide (nt) and amino acid (aa) sequences of all 10 genome segments revealed 58.1–100% nt and 51.4–100% aa identity between the two variant strains, and 54.3–89.4% nt and 49.5–98.1% aa identity between the two variants and classic vaccine strains. Phylogenetic analysis revealed a moderate to significant nt sequence divergence between the two variant and ARV reference strains. These findings have demonstrated the first naturally occurring co-infection of two ARV variants in commercial young layer chickens, providing scientific evidence that multiple ARV strains can be simultaneously present in one host species of chickens. PMID:27089943

  16. Test for Rare Variants by Environment Interactions in Sequencing Association Studies

    PubMed Central

    Lin, Xinyi; Lee, Seunggeun; Wu, Michael C.; Wang, Chaolong; Chen, Han; Li, Zilin; Lin, Xihong

    2015-01-01

    Summary We consider in this paper testing rare variants by environment interactions in sequencing association studies. Current methods for studying the association of rare variants with traits cannot be readily applied for testing for rare variants by environment interactions, as these methods do not effectively control for the main effects of rare variants, leading to unstable results and/or inflated Type 1 error rates. We will first analytically study the bias of the use of conventional burden based tests for rare variants by environment interactions, and show the tests can often be invalid and result in inflated Type 1 error rates. To overcome these difficulties, we develop the interaction sequence kernel association test (iSKAT) for assessing rare variants by environment interactions. The proposed test iSKAT is optimal in a class of variance component tests and is powerful and robust to the proportion of variants in a gene that interact with environment and the signs of the effects. This test properly controls for the main effects of the rare variants using weighted ridge regression while adjusting for covariates. We demonstrate the performance of iSKAT using simulation studies and illustrate its application by analysis of a candidate gene sequencing study of plasma adiponectin levels. PMID:26229047

  17. Molecular analyses of an acidic transthyretin Asn 90 variant.

    PubMed Central

    Saraiva, M J; Almeida, M R; Alves, I L; Moreira, P; Gawinowicz, M; Costa, P P; Rauh, S; Banhzoff, A; Altland, K

    1991-01-01

    A mutation in transthyretin (TTR Asn 90) has been identified in the Portuguese and German populations. This variant has a lower pI and was found by screening analyses in 2/4,000 German subjects and in 4/1,200 Portuguese by using either double one-dimensional (D1-D) electrophoresis with isoelectric focusing (IEF) or hybrid isoelectric focusing in immobilized pH gradient (HIEF) as the final separation step. The Portuguese population sample was from the area where TTR Met 30-associated familial amyloidotic polyneuropathy (FAP) prevails, and it was divided into (a) a group of 500 individuals belonging to FAP kindreds and (b) a group of 700 collected at random. HIEF showed two particular situations: (1) one case, from an FAP kindred, was simultaneously a carrier of the Met 30 substitution and the acidic variant, and (2) one individual, from the randomly selected Portuguese sample, had only the acidic monomer. Comparative peptide mapping, by HPLC, of the acidic variant carriers and of normal TTR showed the presence of an abnormal tryptic peptide, not present in the normal TTR digests, with an asparagine-for-histidine substitution at position 90 explained by a single base change of adenine for cytosine in the histidine codon. This was confirmed at the DNA level by RFLP analyses of PCR-amplified material after digestion with SphI and BsmI. In all carriers of the Asn 90 substitution, no indicators were found for an association with traits characteristic for FAP. PMID:1850190
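
    The statement that a single adenine-for-cytosine change in the histidine codon yields asparagine can be checked against the standard genetic code. The sketch below, which assumes coding-strand DNA codons and uses illustrative function names, enumerates every single-base change that converts a histidine codon into an asparagine codon.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order at each position.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def single_base_paths(from_aa: str, to_aa: str):
    """List (codon, position, old_base, new_base, new_codon) for every
    single-base substitution that turns a from_aa codon into a to_aa codon.
    Positions are 1-based within the codon."""
    paths = []
    for codon, aa in CODON_TABLE.items():
        if aa != from_aa:
            continue
        for pos, new_base in product(range(3), BASES):
            if new_base == codon[pos]:
                continue
            mutated = codon[:pos] + new_base + codon[pos + 1:]
            if CODON_TABLE[mutated] == to_aa:
                paths.append((codon, pos + 1, codon[pos], new_base, mutated))
    return paths


his_to_asn = single_base_paths("H", "N")
```

    Both routes found this way (CAT to AAT and CAC to AAC) are C-to-A changes at the first codon position, in line with the abstract.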

  18. Optimized filtering reduces the error rate in detecting genomic variants by short-read sequencing.

    PubMed

    Reumers, Joke; De Rijk, Peter; Zhao, Hui; Liekens, Anthony; Smeets, Dominiek; Cleary, John; Van Loo, Peter; Van Den Bossche, Maarten; Catthoor, Kirsten; Sabbe, Bernard; Despierre, Evelyn; Vergote, Ignace; Hilbush, Brian; Lambrechts, Diether; Del-Favero, Jurgen

    2012-01-01

    Distinguishing single-nucleotide variants (SNVs) from errors in whole-genome sequences remains challenging. Here we describe a set of filters, together with a freely accessible software tool, that selectively reduce error rates and thereby facilitate variant detection in data from two short-read sequencing technologies, Complete Genomics and Illumina. By sequencing the nearly identical genomes from monozygotic twins and considering shared SNVs as 'true variants' and discordant SNVs as 'errors', we optimized thresholds for 12 individual filters and assessed which of the 1,048 filter combinations were effective in terms of sensitivity and specificity. Cumulative application of all effective filters reduced the error rate by 290-fold, facilitating the identification of genetic differences between monozygotic twins. We also applied an adapted, less stringent set of filters to reliably identify somatic mutations in a highly rearranged tumor and to identify variants in the NA19240 HapMap genome relative to a reference set of SNVs. PMID:22178994
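
    The twin-based evaluation idea in this abstract, where shared SNVs are treated as true variants and discordant SNVs as errors, can be illustrated with a small sketch. The depth filter, its threshold, the discordance-rate definition, and the toy calls below are all assumptions made for illustration; they are not the 12 filters or the software tool described by the authors.

```python
def twin_discordance(calls_a, calls_b):
    """Compare two call sets from monozygotic twins.

    Returns (shared, discordant, rate), where rate is the fraction of all
    distinct calls that appear in only one twin (an assumed error proxy).
    """
    a, b = set(calls_a), set(calls_b)
    shared = a & b
    discordant = a ^ b
    total = len(shared) + len(discordant)
    rate = len(discordant) / total if total else 0.0
    return shared, discordant, rate


def depth_filter(calls, min_depth=10):
    """Hypothetical filter: keep calls with read depth >= min_depth."""
    return {variant for variant, depth in calls.items() if depth >= min_depth}


# Toy calls keyed by (chrom, pos, ref, alt) with read depth as the value.
twin1 = {("chr1", 100, "A", "G"): 30, ("chr1", 200, "C", "T"): 28, ("chr2", 50, "G", "A"): 4}
twin2 = {("chr1", 100, "A", "G"): 33, ("chr1", 200, "C", "T"): 25, ("chr2", 75, "T", "C"): 3}

_, _, rate_before = twin_discordance(twin1, twin2)
_, _, rate_after = twin_discordance(depth_filter(twin1), depth_filter(twin2))
```

    A threshold would be tuned by repeating this comparison over candidate values and checking that shared calls are not lost along with the discordant ones.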

  19. Distinguishing proteins from arbitrary amino acid sequences.

    PubMed

    Yau, Stephen S-T; Mao, Wei-Guang; Benson, Max; He, Rong Lucy

    2015-01-01

    What kinds of amino acid sequences could possibly be protein sequences? From all existing databases that we can find, known proteins are only a small fraction of all possible combinations of amino acids. Beginning with Sanger's first detailed determination of a protein sequence in 1952, previous studies have focused on describing the structure of existing protein sequences in order to construct the protein universe. No one, however, has developed a criterion for determining whether an arbitrary amino acid sequence can be a protein. Here we show that when the collection of arbitrary amino acid sequences is viewed in an appropriate geometric context, the protein sequences cluster together. This leads to a new computational test, described here, that has proved to be remarkably accurate at determining whether an arbitrary amino acid sequence can be a protein. Even more, if the results of this test indicate that the sequence can be a protein, and it is indeed a protein sequence, then its identity as a protein sequence is uniquely defined. We anticipate our computational test will be useful for those who are attempting to complete the job of discovering all proteins, or constructing the protein universe. PMID:25609314

  20. Pooled-DNA Sequencing for Elucidating New Genomic Risk Factors, Rare Variants Underlying Alzheimer's Disease.

    PubMed

    Jin, Sheng Chih; Benitez, Bruno A; Deming, Yuetiva; Cruchaga, Carlos

    2016-01-01

    Analyses of genome-wide association studies (GWAS) for complex disorders usually identify common variants with a relatively small effect size that only explain a small proportion of phenotypic heritability. Several studies have suggested that a significant fraction of heritability may be explained by low-frequency (minor allele frequency (MAF) of 1-5%) and rare variants that are not contained in the commercial GWAS genotyping arrays (Schork et al., Curr Opin Genet Dev 19:212, 2009). Rare variants can also have relatively large effects on risk for developing human diseases or disease phenotype (Cruchaga et al., PLoS One 7:e31039, 2012). However, it is necessary to perform next-generation sequencing (NGS) studies in a large population (>4,000 samples) to detect a significant rare-variant association. Several NGS methods, such as custom capture sequencing and amplicon-based sequencing, are designed to screen a small proportion of the genome, but most of these methods are limited in the number of samples that can be multiplexed (i.e., most sequencing kits provide only 96 distinct indices). Additionally, the sequencing library preparation for 4,000 samples remains expensive, and thus conducting NGS studies with the aforementioned methods is not feasible for most research laboratories. The need for low-cost large-scale rare-variant detection makes pooled-DNA sequencing an ideally efficient and cost-effective technique to identify rare variants in target regions by sequencing hundreds to thousands of samples. Our recent work has demonstrated that pooled-DNA sequencing can accurately detect rare variants in targeted regions in multiple DNA samples with high sensitivity and specificity (Jin et al., Alzheimers Res Ther 4:34, 2012). In these studies we used a well-established pooled-DNA sequencing approach and a computational package, SPLINTER (short indel prediction by large deviation inference and nonlinear true frequency estimation by recursion) (Vallania et al., Genome Res

  1. Method for sequencing nucleic acid molecules

    DOEpatents

    Korlach, Jonas; Webb, Watt W.; Levene, Michael; Turner, Stephen; Craighead, Harold G.; Foquet, Mathieu

    2006-06-06

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid, i.e. the activity of a nucleic acid polymerizing enzyme on the template nucleic acid molecule to be sequenced is followed in real time. The sequence is deduced by identifying which base is being incorporated into the growing complementary strand of the target nucleic acid by the catalytic activity of the nucleic acid polymerizing enzyme at each step in the sequence of base additions. A polymerase on the target nucleic acid molecule complex is provided in a position suitable to move along the target nucleic acid molecule and extend the oligonucleotide primer at an active site. A plurality of labelled types of nucleotide analogs are provided proximate to the active site, with each distinguishable type of nucleotide analog being complementary to a different nucleotide in the target nucleic acid sequence. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand at the active site, where the nucleotide analog being added is complementary to the nucleotide of the target nucleic acid at the active site. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. The steps of providing labelled nucleotide analogs, polymerizing the growing nucleic acid strand, and identifying the added nucleotide analog are repeated so that the nucleic acid strand is further extended and the sequence of the target nucleic acid is determined.

  2. Method for sequencing nucleic acid molecules

    DOEpatents

    Korlach, Jonas; Webb, Watt W.; Levene, Michael; Turner, Stephen; Craighead, Harold G.; Foquet, Mathieu

    2006-05-30

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid, i.e. the activity of a nucleic acid polymerizing enzyme on the template nucleic acid molecule to be sequenced is followed in real time. The sequence is deduced by identifying which base is being incorporated into the growing complementary strand of the target nucleic acid by the catalytic activity of the nucleic acid polymerizing enzyme at each step in the sequence of base additions. A polymerase on the target nucleic acid molecule complex is provided in a position suitable to move along the target nucleic acid molecule and extend the oligonucleotide primer at an active site. A plurality of labelled types of nucleotide analogs are provided proximate to the active site, with each distinguishable type of nucleotide analog being complementary to a different nucleotide in the target nucleic acid sequence. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand at the active site, where the nucleotide analog being added is complementary to the nucleotide of the target nucleic acid at the active site. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. The steps of providing labelled nucleotide analogs, polymerizing the growing nucleic acid strand, and identifying the added nucleotide analog are repeated so that the nucleic acid strand is further extended and the sequence of the target nucleic acid is determined.

  3. Somatic mutations and germline sequence variants in the expressed tyrosine kinase genes of patients with de novo acute myeloid leukemia

    PubMed Central

    Xiang, Zhifu; Walgren, Richard; Zhao, Yu; Kasai, Yumi; Miner, Tracie; Ries, Rhonda E.; Lubman, Olga; Fremont, Daved H.; McLellan, Michael D.; Payton, Jacqueline E.; Westervelt, Peter; DiPersio, John F.; Link, Daniel C.; Walter, Matthew J.; Graubert, Timothy A.; Watson, Mark; Baty, Jack; Heath, Sharon; Shannon, William D.; Nagarajan, Rakesh; Bloomfield, Clara D.; Mardis, Elaine R.; Wilson, Richard K.; Ley, Timothy J.

    2008-01-01

    Activating mutations in tyrosine kinase (TK) genes (eg, FLT3 and KIT) are found in more than 30% of patients with de novo acute myeloid leukemia (AML); many groups have speculated that mutations in other TK genes may be present in the remaining 70%. We performed high-throughput resequencing of the kinase domains of 26 TK genes (11 receptor TK; 15 cytoplasmic TK) expressed in most AML patients using genomic DNA from the bone marrow (tumor) and matched skin biopsy samples (“germline”) from 94 patients with de novo AML; sequence variants were validated in an additional 94 AML tumor samples (14.3 million base pairs of sequence were obtained and analyzed). We identified known somatic mutations in FLT3, KIT, and JAK2 TK genes at the expected frequencies and found 4 novel somatic mutations, JAK1 V623A, JAK1 T478S, DDR1 A803V, and NTRK1 S677N, once each in 4 respective patients of 188 tested. We also identified novel germline sequence changes encoding amino acid substitutions (ie, nonsynonymous changes) in 14 TK genes, including TYK2, which had the largest number of nonsynonymous sequence variants (11 total detected). Additional studies will be required to define the roles that these somatic and germline TK gene variants play in AML pathogenesis. PMID:18270328

  4. A Unified Mixed-Effects Model for Rare-Variant Association in Sequencing Studies

    PubMed Central

    Sun, Jianping; Zheng, Yingye; Hsu, Li

    2013-01-01

    For rare-variant association analysis, due to the extremely low frequencies of these variants, it is necessary to aggregate them by a prior set (e.g., genes and pathways) in order to achieve adequate power. In this paper, we consider hierarchical models to relate a set of rare variants to phenotype by modeling the effects of variants as a function of variant characteristics while allowing for variant-specific effect (heterogeneity). We derive a set of two score statistics, testing the group effect by variant characteristics and the heterogeneity effect. We make a novel modification to these score statistics so that they are independent under the null hypothesis and their asymptotic distributions can be derived. As a result, the computational burden is greatly reduced compared with permutation-based tests. Our approach provides a general testing framework for rare variants association, which includes many commonly used tests, such as the burden test [Li and Leal, 2008] and the sequence kernel association test [Wu et al., 2011], as special cases. Furthermore, in contrast to these tests, our proposed test has an added capacity to identify which components of variant characteristics and heterogeneity contribute to the association. Simulations under a wide range of scenarios show that the proposed test is valid, robust and powerful. An application to the Dallas Heart Study illustrates that apart from identifying genes with significant associations, the new method also provides additional information regarding the source of the association. Such information may be useful for generating hypotheses in future studies. PMID:23483651

  5. Whole-Exome Sequencing Identifies Rare and Low-Frequency Coding Variants Associated with LDL Cholesterol

    PubMed Central

    Lange, Leslie A.; Hu, Youna; Zhang, He; Xue, Chenyi; Schmidt, Ellen M.; Tang, Zheng-Zheng; Bizon, Chris; Lange, Ethan M.; Smith, Joshua D.; Turner, Emily H.; Jun, Goo; Kang, Hyun Min; Peloso, Gina; Auer, Paul; Li, Kuo-ping; Flannick, Jason; Zhang, Ji; Fuchsberger, Christian; Gaulton, Kyle; Lindgren, Cecilia; Locke, Adam; Manning, Alisa; Sim, Xueling; Rivas, Manuel A.; Holmen, Oddgeir L.; Gottesman, Omri; Lu, Yingchang; Ruderfer, Douglas; Stahl, Eli A.; Duan, Qing; Li, Yun; Durda, Peter; Jiao, Shuo; Isaacs, Aaron; Hofman, Albert; Bis, Joshua C.; Correa, Adolfo; Griswold, Michael E.; Jakobsdottir, Johanna; Smith, Albert V.; Schreiner, Pamela J.; Feitosa, Mary F.; Zhang, Qunyuan; Huffman, Jennifer E.; Crosby, Jacy; Wassel, Christina L.; Do, Ron; Franceschini, Nora; Martin, Lisa W.; Robinson, Jennifer G.; Assimes, Themistocles L.; Crosslin, David R.; Rosenthal, Elisabeth A.; Tsai, Michael; Rieder, Mark J.; Farlow, Deborah N.; Folsom, Aaron R.; Lumley, Thomas; Fox, Ervin R.; Carlson, Christopher S.; Peters, Ulrike; Jackson, Rebecca D.; van Duijn, Cornelia M.; Uitterlinden, André G.; Levy, Daniel; Rotter, Jerome I.; Taylor, Herman A.; Gudnason, Vilmundur; Siscovick, David S.; Fornage, Myriam; Borecki, Ingrid B.; Hayward, Caroline; Rudan, Igor; Chen, Y. Eugene; Bottinger, Erwin P.; Loos, Ruth J.F.; Sætrom, Pål; Hveem, Kristian; Boehnke, Michael; Groop, Leif; McCarthy, Mark; Meitinger, Thomas; Ballantyne, Christie M.; Gabriel, Stacey B.; O’Donnell, Christopher J.; Post, Wendy S.; North, Kari E.; Reiner, Alexander P.; Boerwinkle, Eric; Psaty, Bruce M.; Altshuler, David; Kathiresan, Sekar; Lin, Dan-Yu; Jarvik, Gail P.; Cupples, L. Adrienne; Kooperberg, Charles; Wilson, James G.; Nickerson, Deborah A.; Abecasis, Goncalo R.; Rich, Stephen S.; Tracy, Russell P.; Willer, Cristen J.; Gabriel, Stacey B.; Altshuler, David M.; Abecasis, Gonçalo R.; Allayee, Hooman; Cresci, Sharon; Daly, Mark J.; de Bakker, Paul I.W.; DePristo, Mark A.; Do, Ron; Donnelly, Peter; Farlow, Deborah N.; Fennell, Tim; Garimella, Kiran; Hazen, Stanley L.; Hu, Youna; Jordan, Daniel M.; Jun, Goo; Kathiresan, Sekar; Kang, Hyun Min; Kiezun, Adam; Lettre, Guillaume; Li, Bingshan; Li, Mingyao; Newton-Cheh, Christopher H.; Padmanabhan, Sandosh; Peloso, Gina; Pulit, Sara; Rader, Daniel J.; Reich, David; Reilly, Muredach P.; Rivas, Manuel A.; Schwartz, Steve; Scott, Laura; Siscovick, David S.; Spertus, John A.; Stitziel, Nathaniel O.; Stoletzki, Nina; Sunyaev, Shamil R.; Voight, Benjamin F.; Willer, Cristen J.; Rich, Stephen S.; Akylbekova, Ermeg; Atwood, Larry D.; Ballantyne, Christie M.; Barbalic, Maja; Barr, R. Graham; Benjamin, Emelia J.; Bis, Joshua; Boerwinkle, Eric; Bowden, Donald W.; Brody, Jennifer; Budoff, Matthew; Burke, Greg; Buxbaum, Sarah; Carr, Jeff; Chen, Donna T.; Chen, Ida Y.; Chen, Wei-Min; Concannon, Pat; Crosby, Jacy; Cupples, L. Adrienne; D’Agostino, Ralph; DeStefano, Anita L.; Dreisbach, Albert; Dupuis, Josée; Durda, J. Peter; Ellis, Jaclyn; Folsom, Aaron R.; Fornage, Myriam; Fox, Caroline S.; Fox, Ervin; Funari, Vincent; Ganesh, Santhi K.; Gardin, Julius; Goff, David; Gordon, Ora; Grody, Wayne; Gross, Myron; Guo, Xiuqing; Hall, Ira M.; Heard-Costa, Nancy L.; Heckbert, Susan R.; Heintz, Nicholas; Herrington, David M.; Hickson, DeMarc; Huang, Jie; Hwang, Shih-Jen; Jacobs, David R.; Jenny, Nancy S.; Johnson, Andrew D.; Johnson, Craig W.; Kawut, Steven; Kronmal, Richard; Kurz, Raluca; Lange, Ethan M.; Lange, Leslie A.; Larson, Martin G.; Lawson, Mark; Lewis, Cora E.; Levy, Daniel; Li, Dalin; Lin, Honghuang; Liu, Chunyu; Liu, Jiankang; Liu, Kiang; Liu, Xiaoming; Liu, Yongmei; Longstreth, William T.; Loria, Cay; Lumley, Thomas; Lunetta, Kathryn; Mackey, Aaron J.; Mackey, Rachel; Manichaikul, Ani; Maxwell, Taylor; McKnight, Barbara; Meigs, James B.; Morrison, Alanna C.; Musani, Solomon K.; Mychaleckyj, Josyf C.; Nettleton, Jennifer A.; North, Kari; O’Donnell, Christopher J.; O’Leary, Daniel; Ong, Frank; Palmas, Walter; Pankow, James S.; Pankratz, Nathan D.; Paul, Shom; Perez, Marco; Person, Sharina D.; Polak, Joseph; Post, Wendy S.; Psaty, Bruce M.; Quinlan, Aaron R.; Raffel, Leslie J.; Ramachandran, Vasan S.; Reiner, Alexander P.; Rice, Kenneth; Rotter, Jerome I.; Sanders, Jill P.; Schreiner, Pamela; Seshadri, Sudha; Shea, Steve; Sidney, Stephen; Silverstein, Kevin; Smith, Nicholas L.; Sotoodehnia, Nona; Srinivasan, Asoke; Taylor, Herman A.; Taylor, Kent; Thomas, Fridtjof; Tracy, Russell P.; Tsai, Michael Y.; Volcik, Kelly A.; Wassel, Christina L.; Watson, Karol; Wei, Gina; White, Wendy; Wiggins, Kerri L.; Wilk, Jemma B.; Williams, O. 
Dale; Wilson, Gregory; Wilson, James G.; Wolf, Phillip; Zakai, Neil A.; Hardy, John; Meschia, James F.; Nalls, Michael; Singleton, Andrew; Worrall, Brad; Bamshad, Michael J.; Barnes, Kathleen C.; Abdulhamid, Ibrahim; Accurso, Frank; Anbar, Ran; Beaty, Terri; Bigham, Abigail; Black, Phillip; Bleecker, Eugene; Buckingham, Kati; Cairns, Anne Marie; Caplan, Daniel; Chatfield, Barbara; Chidekel, Aaron; Cho, Michael; Christiani, David C.; Crapo, James D.; Crouch, Julia; Daley, Denise; Dang, Anthony; Dang, Hong; De Paula, Alicia; DeCelie-Germana, Joan; Drumm, Allen DozorMitch; Dyson, Maynard; Emerson, Julia; Emond, Mary J.; Ferkol, Thomas; Fink, Robert; Foster, Cassandra; Froh, Deborah; Gao, Li; Gershan, William; Gibson, Ronald L.; Godwin, Elizabeth; Gondor, Magdalen; Gutierrez, Hector; Hansel, Nadia N.; Hassoun, Paul M.; Hiatt, Peter; Hokanson, John E.; Howenstine, Michelle; Hummer, Laura K.; Kanga, Jamshed; Kim, Yoonhee; Knowles, Michael R.; Konstan, Michael; Lahiri, Thomas; Laird, Nan; Lange, Christoph; Lin, Lin; Lin, Xihong; Louie, Tin L.; Lynch, David; Make, Barry; Martin, Thomas R.; Mathai, Steve C.; Mathias, Rasika A.; McNamara, John; McNamara, Sharon; Meyers, Deborah; Millard, Susan; Mogayzel, Peter; Moss, Richard; Murray, Tanda; Nielson, Dennis; Noyes, Blakeslee; O’Neal, Wanda; Orenstein, David; O’Sullivan, Brian; Pace, Rhonda; Pare, Peter; Parker, H. 
Worth; Passero, Mary Ann; Perkett, Elizabeth; Prestridge, Adrienne; Rafaels, Nicholas M.; Ramsey, Bonnie; Regan, Elizabeth; Ren, Clement; Retsch-Bogart, George; Rock, Michael; Rosen, Antony; Rosenfeld, Margaret; Ruczinski, Ingo; Sanford, Andrew; Schaeffer, David; Sell, Cindy; Sheehan, Daniel; Silverman, Edwin K.; Sin, Don; Spencer, Terry; Stonebraker, Jackie; Tabor, Holly K.; Varlotta, Laurie; Vergara, Candelaria I.; Weiss, Robert; Wigley, Fred; Wise, Robert A.; Wright, Fred A.; Wurfel, Mark M.; Zanni, Robert; Zou, Fei; Nickerson, Deborah A.; Rieder, Mark J.; Green, Phil; Shendure, Jay; Akey, Joshua M.; Bustamante, Carlos D.; Crosslin, David R.; Eichler, Evan E.; Fox, P. Keolu; Fu, Wenqing; Gordon, Adam; Gravel, Simon; Jarvik, Gail P.; Johnsen, Jill M.; Kan, Mengyuan; Kenny, Eimear E.; Kidd, Jeffrey M.; Lara-Garduno, Fremiet; Leal, Suzanne M.; Liu, Dajiang J.; McGee, Sean; O’Connor, Timothy D.; Paeper, Bryan; Robertson, Peggy D.; Smith, Joshua D.; Staples, Jeffrey C.; Tennessen, Jacob A.; Turner, Emily H.; Wang, Gao; Yi, Qian; Jackson, Rebecca; Peters, Ulrike; Carlson, Christopher S.; Anderson, Garnet; Anton-Culver, Hoda; Assimes, Themistocles L.; Auer, Paul L.; Beresford, Shirley; Bizon, Chris; Black, Henry; Brunner, Robert; Brzyski, Robert; Burwen, Dale; Caan, Bette; Carty, Cara L.; Chlebowski, Rowan; Cummings, Steven; Curb, J. 
David; Eaton, Charles B.; Ford, Leslie; Franceschini, Nora; Fullerton, Stephanie M.; Gass, Margery; Geller, Nancy; Heiss, Gerardo; Howard, Barbara V.; Hsu, Li; Hutter, Carolyn M.; Ioannidis, John; Jiao, Shuo; Johnson, Karen C.; Kooperberg, Charles; Kuller, Lewis; LaCroix, Andrea; Lakshminarayan, Kamakshi; Lane, Dorothy; Lasser, Norman; LeBlanc, Erin; Li, Kuo-Ping; Limacher, Marian; Lin, Dan-Yu; Logsdon, Benjamin A.; Ludlam, Shari; Manson, JoAnn E.; Margolis, Karen; Martin, Lisa; McGowan, Joan; Monda, Keri L.; Kotchen, Jane Morley; Nathan, Lauren; Ockene, Judith; O’Sullivan, Mary Jo; Phillips, Lawrence S.; Prentice, Ross L.; Robbins, John; Robinson, Jennifer G.; Rossouw, Jacques E.; Sangi-Haghpeykar, Haleh; Sarto, Gloria E.; Shumaker, Sally; Simon, Michael S.; Stefanick, Marcia L.; Stein, Evan; Tang, Hua; Taylor, Kira C.; Thomson, Cynthia A.; Thornton, Timothy A.; Van Horn, Linda; Vitolins, Mara; Wactawski-Wende, Jean; Wallace, Robert; Wassertheil-Smoller, Sylvia; Zeng, Donglin; Applebaum-Bowden, Deborah; Feolo, Michael; Gan, Weiniu; Paltoo, Dina N.; Sholinsky, Phyliss; Sturcke, Anne

    2014-01-01

    Elevated low-density lipoprotein cholesterol (LDL-C) is a treatable, heritable risk factor for cardiovascular disease. Genome-wide association studies (GWASs) have identified 157 variants associated with lipid levels but are not well suited to assess the impact of rare and low-frequency variants. To determine whether rare or low-frequency coding variants are associated with LDL-C, we exome sequenced 2,005 individuals, including 554 individuals selected for extreme LDL-C (>98th or <2nd percentile). Follow-up analyses included sequencing of 1,302 additional individuals and genotype-based analysis of 52,221 individuals. We observed significant evidence of association between LDL-C and the burden of rare or low-frequency variants in PNPLA5, encoding a phospholipase-domain-containing protein, and both known and previously unidentified variants in PCSK9, LDLR and APOB, three known lipid-related genes. The effect sizes for the burden of rare variants for each associated gene were substantially higher than those observed for individual SNPs identified from GWASs. We replicated the PNPLA5 signal in an independent large-scale sequencing study of 2,084 individuals. In conclusion, this large whole-exome-sequencing study for LDL-C identified a gene not known to be implicated in LDL-C and provides unique insight into the design and analysis of similar experiments. PMID:24507775

  6. Whole-exome sequencing identifies rare and low-frequency coding variants associated with LDL cholesterol.

    PubMed

    Lange, Leslie A; Hu, Youna; Zhang, He; Xue, Chenyi; Schmidt, Ellen M; Tang, Zheng-Zheng; Bizon, Chris; Lange, Ethan M; Smith, Joshua D; Turner, Emily H; Jun, Goo; Kang, Hyun Min; Peloso, Gina; Auer, Paul; Li, Kuo-Ping; Flannick, Jason; Zhang, Ji; Fuchsberger, Christian; Gaulton, Kyle; Lindgren, Cecilia; Locke, Adam; Manning, Alisa; Sim, Xueling; Rivas, Manuel A; Holmen, Oddgeir L; Gottesman, Omri; Lu, Yingchang; Ruderfer, Douglas; Stahl, Eli A; Duan, Qing; Li, Yun; Durda, Peter; Jiao, Shuo; Isaacs, Aaron; Hofman, Albert; Bis, Joshua C; Correa, Adolfo; Griswold, Michael E; Jakobsdottir, Johanna; Smith, Albert V; Schreiner, Pamela J; Feitosa, Mary F; Zhang, Qunyuan; Huffman, Jennifer E; Crosby, Jacy; Wassel, Christina L; Do, Ron; Franceschini, Nora; Martin, Lisa W; Robinson, Jennifer G; Assimes, Themistocles L; Crosslin, David R; Rosenthal, Elisabeth A; Tsai, Michael; Rieder, Mark J; Farlow, Deborah N; Folsom, Aaron R; Lumley, Thomas; Fox, Ervin R; Carlson, Christopher S; Peters, Ulrike; Jackson, Rebecca D; van Duijn, Cornelia M; Uitterlinden, André G; Levy, Daniel; Rotter, Jerome I; Taylor, Herman A; Gudnason, Vilmundur; Siscovick, David S; Fornage, Myriam; Borecki, Ingrid B; Hayward, Caroline; Rudan, Igor; Chen, Y Eugene; Bottinger, Erwin P; Loos, Ruth J F; Sætrom, Pål; Hveem, Kristian; Boehnke, Michael; Groop, Leif; McCarthy, Mark; Meitinger, Thomas; Ballantyne, Christie M; Gabriel, Stacey B; O'Donnell, Christopher J; Post, Wendy S; North, Kari E; Reiner, Alexander P; Boerwinkle, Eric; Psaty, Bruce M; Altshuler, David; Kathiresan, Sekar; Lin, Dan-Yu; Jarvik, Gail P; Cupples, L Adrienne; Kooperberg, Charles; Wilson, James G; Nickerson, Deborah A; Abecasis, Goncalo R; Rich, Stephen S; Tracy, Russell P; Willer, Cristen J

    2014-02-01

    Elevated low-density lipoprotein cholesterol (LDL-C) is a treatable, heritable risk factor for cardiovascular disease. Genome-wide association studies (GWASs) have identified 157 variants associated with lipid levels but are not well suited to assess the impact of rare and low-frequency variants. To determine whether rare or low-frequency coding variants are associated with LDL-C, we exome sequenced 2,005 individuals, including 554 individuals selected for extreme LDL-C (>98th or <2nd percentile). Follow-up analyses included sequencing of 1,302 additional individuals and genotype-based analysis of 52,221 individuals. We observed significant evidence of association between LDL-C and the burden of rare or low-frequency variants in PNPLA5, encoding a phospholipase-domain-containing protein, and both known and previously unidentified variants in PCSK9, LDLR and APOB, three known lipid-related genes. The effect sizes for the burden of rare variants for each associated gene were substantially higher than those observed for individual SNPs identified from GWASs. We replicated the PNPLA5 signal in an independent large-scale sequencing study of 2,084 individuals. In conclusion, this large whole-exome-sequencing study for LDL-C identified a gene not known to be implicated in LDL-C and provides unique insight into the design and analysis of similar experiments.

  7. Single-variant and multi-variant trend tests for genetic association with next-generation sequencing that are robust to sequencing error.

    PubMed

    Kim, Wonkuk; Londono, Douglas; Zhou, Lisheng; Xing, Jinchuan; Nato, Alejandro Q; Musolf, Anthony; Matise, Tara C; Finch, Stephen J; Gordon, Derek

    2012-01-01

    As with any new technology, next-generation sequencing (NGS) has potential advantages and potential challenges. One advantage is the identification of multiple causal variants for disease that might otherwise be missed by SNP-chip technology. One potential challenge is misclassification error (as with any emerging technology) and the issue of power loss due to multiple testing. Here, we develop an extension of the linear trend test for association that incorporates differential misclassification error and may be applied to any number of SNPs. We call the statistic the linear trend test allowing for error, applied to NGS, or LTTae,NGS. The observed data are phenotypes for unrelated cases and controls, coverage, and the number of putative causal variants for every individual at all SNPs. We simulate data considering multiple factors (disease mode of inheritance, genotype relative risk, causal variant frequency, sequence error rate in cases, sequence error rate in controls, number of loci, and others) and evaluate type I error rate and power for each vector of factor settings. We compare our results with two recently published NGS statistics. Also, we create a fictitious disease model based on downloaded 1000 Genomes data for 5 SNPs and 388 individuals, and apply our statistic to those data. We find that the LTTae,NGS maintains the correct type I error rate in all simulations (differential and non-differential error), while the other statistics show large inflation in type I error for lower coverage. Power is approximately the same for all three statistics in the presence of non-differential error. Application of our statistic to the 1000 Genomes data suggests that, for the data downloaded, there is a 1.5% sequence misclassification rate over all SNPs. Finally, application of the multi-variant form of LTTae,NGS shows high power for a number of simulation settings, although it can have
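
    For orientation, the classical linear (Cochran-Armitage) trend test that this abstract extends can be sketched as follows. This is the standard single-SNP test with no error model, not the LTTae,NGS statistic itself; the function name, the additive scores (0, 1, 2), and the toy counts are illustrative assumptions.

```python
import math

def trend_test(cases, controls, scores=(0, 1, 2)):
    """Cochran-Armitage linear trend test on a 2 x k genotype table.

    cases[j] and controls[j] are the numbers of cases and controls carrying
    j copies of the variant allele. Returns (z, two_sided_p).
    """
    r_total = sum(cases)        # number of cases
    s_total = sum(controls)     # number of controls
    n = r_total + s_total
    cols = [r + s for r, s in zip(cases, controls)]  # genotype column totals

    # Trend statistic: score-weighted contrast between cases and controls.
    t = sum(w * (r * s_total - s * r_total)
            for w, r, s in zip(scores, cases, controls))

    # Variance of the trend statistic under the null hypothesis.
    term1 = sum(w * w * c * (n - c) for w, c in zip(scores, cols))
    term2 = sum(scores[i] * scores[j] * cols[i] * cols[j]
                for i in range(len(cols)) for j in range(i + 1, len(cols)))
    var_t = (r_total * s_total / n) * (term1 - 2 * term2)

    z = t / math.sqrt(var_t)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return z, p

# Toy counts (hypothetical): carriers are enriched among cases.
z_toy, p_toy = trend_test(cases=[10, 20, 30], controls=[30, 20, 10])
```

    The test treats every genotype call as correct. Differential misclassification between cases and controls, as can happen at low coverage, is the situation the abstract's extension is designed to handle.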

  8. NGS-eval: NGS Error analysis and novel sequence VAriant detection tooL.

    PubMed

    May, Ali; Abeln, Sanne; Buijs, Mark J; Heringa, Jaap; Crielaard, Wim; Brandt, Bernd W

    2015-07-01

    Massively parallel sequencing of microbial genetic markers (MGMs) is used to uncover the species composition in a multitude of ecological niches. These sequencing runs often contain a sample with known composition that can be used to evaluate the sequencing quality or to detect novel sequence variants. With NGS-eval, the reads from such (mock) samples can be used to (i) explore the differences between the reads and their references and to (ii) estimate the sequencing error rate. This tool maps these reads to references and calculates as well as visualizes the different types of sequencing errors. Clearly, sequencing errors can only be accurately calculated if the reference sequences are correct. However, even with known strains, it is not straightforward to select the correct references from databases. We previously analysed a pyrosequencing dataset from a mock sample to estimate sequencing error rates and detected sequence variants in our mock community, allowing us to obtain an accurate error estimation. Here, we demonstrate the variant detection and error analysis capability of NGS-eval with Illumina MiSeq reads from the same mock community. While tailored towards the field of metagenomics, this server can be used for any type of MGM-based reads. NGS-eval is available at http://www.ibi.vu.nl/programs/ngsevalwww/.
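
    The kind of error tally described above can be sketched in a few lines, assuming the reads have already been aligned to their references with gaps written as "-". This is not NGS-eval's implementation; the function name, the input format, and the toy alignments are hypothetical.

```python
from collections import Counter

def error_profile(aligned_pairs):
    """Count matches, substitutions, insertions and deletions over
    (read, reference) pairs of equal-length, gap-padded aligned strings.
    Returns (counts, overall_error_rate)."""
    counts = Counter()
    for read, ref in aligned_pairs:
        if len(read) != len(ref):
            raise ValueError("aligned sequences must have equal length")
        for r, f in zip(read, ref):
            if r == "-" and f == "-":
                continue                      # padding column, ignore
            if f == "-":
                counts["insertion"] += 1      # base in read, absent in reference
            elif r == "-":
                counts["deletion"] += 1       # base in reference, absent in read
            elif r != f:
                counts["substitution"] += 1
            else:
                counts["match"] += 1
    total = sum(counts.values())
    errors = total - counts["match"]
    rate = errors / total if total else 0.0
    return counts, rate

# Toy alignments (hypothetical reads against hypothetical references).
toy_pairs = [
    ("ACGT-A", "ACCTTA"),   # one substitution, one deletion
    ("AC-GT", "ACAGT"),     # one deletion
    ("ACGGT", "AC-GT"),     # one insertion
]
toy_counts, toy_rate = error_profile(toy_pairs)
```

    As the abstract notes, such a rate is only meaningful if the reference is correct: a genuine sequence variant in the mock community would be counted here as a run of sequencing errors.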

  9. A survey of tools for variant analysis of next-generation genome sequencing data

    PubMed Central

    Pabinger, Stephan; Dander, Andreas; Fischer, Maria; Snajder, Rene; Sperk, Michael; Efremova, Mirjana; Krabichler, Birgit; Speicher, Michael R.; Zschocke, Johannes

    2014-01-01

    Recent advances in genome sequencing technologies provide unprecedented opportunities to characterize individual genomic landscapes and identify mutations relevant for diagnosis and therapy. Specifically, whole-exome sequencing using next-generation sequencing (NGS) technologies is gaining popularity in the human genetics community due to the moderate costs, manageable data amounts and straightforward interpretation of analysis results. While whole-exome and, in the near future, whole-genome sequencing are becoming commodities, data analysis still poses significant challenges and led to the development of a plethora of tools supporting specific parts of the analysis workflow or providing a complete solution. Here, we surveyed 205 tools for whole-genome/whole-exome sequencing data analysis supporting five distinct analytical steps: quality assessment, alignment, variant identification, variant annotation and visualization. We report an overview of the functionality, features and specific requirements of the individual tools. We then selected 32 programs for variant identification, variant annotation and visualization, which were subjected to hands-on evaluation using four data sets: one set of exome data from two patients with a rare disease for testing identification of germline mutations, two cancer data sets for testing variant callers for somatic mutations, copy number variations and structural variations, and one semi-synthetic data set for testing identification of copy number variations. Our comprehensive survey and evaluation of NGS tools provides a valuable guideline for human geneticists working on Mendelian disorders, complex diseases and cancers. PMID:23341494

  10. Whole exome sequencing identifies genetic variants in inherited thrombocytopenia with secondary qualitative function defects

    PubMed Central

    Johnson, Ben; Lowe, Gillian C.; Futterer, Jane; Lordkipanidzé, Marie; MacDonald, David; Simpson, Michael A.; Sanchez-Guiú, Isabel; Drake, Sian; Bem, Danai; Leo, Vincenzo; Fletcher, Sarah J.; Dawood, Ban; Rivera, José; Allsup, David; Biss, Tina; Bolton-Maggs, Paula HB; Collins, Peter; Curry, Nicola; Grimley, Charlotte; James, Beki; Makris, Mike; Motwani, Jayashree; Pavord, Sue; Talks, Katherine; Thachil, Jecko; Wilde, Jonathan; Williams, Mike; Harrison, Paul; Gissen, Paul; Mundell, Stuart; Mumford, Andrew; Daly, Martina E.; Watson, Steve P.; Morgan, Neil V.

    2016-01-01

    Inherited thrombocytopenias are a heterogeneous group of disorders characterized by abnormally low platelet counts which can be associated with abnormal bleeding. Next-generation sequencing has previously been employed in these disorders for the confirmation of suspected genetic abnormalities, and more recently in the discovery of novel disease-causing genes. However, its full potential has not yet been exploited. Over the past 6 years we have sequenced the exomes from 55 patients, including 37 index cases and 18 additional family members, all of whom were recruited to the UK Genotyping and Phenotyping of Platelets study. All patients had inherited or sustained thrombocytopenia of unknown etiology with platelet counts varying from 11×10^9/L to 186×10^9/L. Of the 51 patients phenotypically tested, 37 (73%) had an additional secondary qualitative platelet defect. Using whole exome sequencing analysis we have identified “pathogenic” or “likely pathogenic” variants in 46% (17/37) of our index patients with thrombocytopenia. In addition, we report variants of uncertain significance in 12 index cases, including novel candidate genetic variants in previously unreported genes in four index cases. These results demonstrate that whole exome sequencing is an efficient method for elucidating potential pathogenic genetic variants in inherited thrombocytopenia. Whole exome sequencing also has the added benefit of discovering potentially pathogenic genetic variants for further study in novel genes not previously implicated in inherited thrombocytopenia. PMID:27479822

  11. Transcriptome Sequencing of a Large Human Family Identifies the Impact of Rare Noncoding Variants

    PubMed Central

    Li, Xin; Battle, Alexis; Karczewski, Konrad J.; Zappala, Zach; Knowles, David A.; Smith, Kevin S.; Kukurba, Kim R.; Wu, Eric; Simon, Noah; Montgomery, Stephen B.

    2014-01-01

    Recent and rapid human population growth has led to an excess of rare genetic variants that are expected to contribute to an individual’s genetic burden of disease risk. To date, much of the focus has been on rare protein-coding variants, for which potential impact can be estimated from the genetic code, but determining the impact of rare noncoding variants has been more challenging. To improve our understanding of such variants, we combined high-quality genome sequencing and RNA sequencing data from a 17-individual, three-generation family to contrast expression quantitative trait loci (eQTLs) and splicing quantitative trait loci (sQTLs) within this family to eQTLs and sQTLs within a population sample. Using this design, we found that eQTLs and sQTLs with large effects in the family were enriched with rare regulatory and splicing variants (minor allele frequency < 0.01). They were also more likely to influence essential genes and genes involved in complex disease. In addition, we tested the capacity of diverse noncoding annotation to predict the impact of rare noncoding variants. We found that distance to the transcription start site, evolutionary constraint, and epigenetic annotation were considerably more informative for predicting the impact of rare variants than for predicting the impact of common variants. These results highlight that rare noncoding variants are important contributors to individual gene-expression profiles and further demonstrate a significant capability for genomic annotation to predict the impact of rare noncoding variants. PMID:25192044
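
    The rarity criterion used above (minor allele frequency < 0.01) can be made concrete with a short sketch. It assumes diploid genotypes coded as the number of alternate alleles (0, 1 or 2), with None for a missing call; the function names, the variant identifiers, and the toy data are hypothetical and do not come from the study.

```python
def minor_allele_frequency(genotypes):
    """Minor allele frequency from diploid alternate-allele counts (0/1/2).
    Missing calls (None) are skipped."""
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes")
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1 - alt_freq)

def rare_variants(genotype_table, threshold=0.01):
    """Return IDs of variants that are polymorphic but have MAF < threshold."""
    return [variant_id
            for variant_id, genotypes in genotype_table.items()
            if 0 < minor_allele_frequency(genotypes) < threshold]

# Toy population sample of 100 individuals (hypothetical variant IDs).
toy_table = {
    "var_a": [0] * 99 + [1],        # 1 of 200 alleles  -> MAF 0.005
    "var_b": [1] * 10 + [0] * 90,   # 10 of 200 alleles -> MAF 0.05
    "var_c": [2] * 99 + [1],        # reference allele is the minor one
    "var_d": [0] * 100,             # monomorphic, not counted as rare
}
toy_rare = rare_variants(toy_table)
```

    In a small sample such as a 17-member family, an allele cannot be observed at a frequency below 0.01, so a population sample like the one the study contrasts against is the natural place to apply such a threshold.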

  12. A rare sequence variant in intron 1 of THAP1 is associated with primary dystonia

    PubMed Central

    Vemula, Satya R; Xiao, Jianfeng; Zhao, Yu; Bastian, Robert W; Perlmutter, Joel S; Racette, Brad A; Paniello, Randal C; Wszolek, Zbigniew K; Uitti, Ryan J; Van Gerpen, Jay A; Hedera, Peter; Truong, Daniel D; Blitzer, Andrew; Rudzińska, Monika; Momčilović, Dragana; Jinnah, Hyder A; Frei, Karen; Pfeiffer, Ronald F; LeDoux, Mark S

    2014-01-01

    Although coding variants in THAP1 have been causally associated with primary dystonia, the contribution of noncoding variants remains uncertain. Herein, we examine a previously identified Intron 1 variant (c.71+9C>A, rs200209986). Among 1672 subjects with mainly adult-onset primary dystonia, 12 harbored the variant in contrast to 1/1574 controls (P < 0.01). Dystonia classification included cervical dystonia (N = 3), laryngeal dystonia (adductor subtype, N = 3), jaw-opening oromandibular dystonia (N = 1), blepharospasm (N = 2), and unclassified (N = 3). Age of dystonia onset ranged from 25 to 69 years (mean = 54 years). In comparison to controls with no identified THAP1 sequence variants, the c.71+9C>A variant was associated with an elevated ratio of Isoform 1 (NM_018105) to Isoform 2 (NM_199003) in leukocytes. In silico and minigene analyses indicated that c.71+9C>A alters THAP1 splicing. Lymphoblastoid cells harboring the c.71+9C>A variant showed extensive apoptosis with relatively fewer cells in the G2 phase of the cell cycle. Differentially expressed genes from lymphoblastoid cells revealed that the c.71+9C>A variant exerts effects on DNA synthesis, cell growth and proliferation, cell survival, and cytotoxicity. In aggregate, these data indicate that THAP1 c.71+9C>A is a risk factor for adult-onset primary dystonia. PMID:24936516
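
    The carrier counts reported above (12 of 1672 cases versus 1 of 1574 controls) can be checked with an exact test on the 2 x 2 table. The abstract does not say which test produced P < 0.01, so the two-sided Fisher's exact test below is an assumption chosen for illustration; the function name is hypothetical, and math.comb requires Python 3.8 or later.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):
        # Probability that x of the col1 carriers fall in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))

# Rows: cases, controls. Columns: carriers, non-carriers of c.71+9C>A.
p_value = fisher_exact_two_sided(12, 1672 - 12, 1, 1574 - 1)
```

    Under these assumptions the computed p_value falls below 0.01, in line with the reported significance. An exact test is a natural choice here because the carrier counts are far too small for a chi-square approximation.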

  13. Comparison of Sequencing Platforms for Single Nucleotide Variant Calls in a Human Sample

    PubMed Central

    Miller, Webb; Guillory, Joseph; Stinson, Jeremy; Seshagiri, Somasekar

    2013-01-01

    Next-generation sequencing platforms coupled with advanced bioinformatic tools enable re-sequencing of the human genome at high speed and with large cost savings. We compare sequencing platforms from Roche/454 (GS FLX), Illumina/HiSeq (HiSeq 2000), and Life Technologies/SOLiD (SOLiD 3 ECC) for their ability to identify single nucleotide substitutions in whole genome sequences from the same human sample. We report on significant GC-related bias observed in the data sequenced on Illumina and SOLiD platforms. The differences in the variant calls were investigated with regard to coverage and sequencing error. Some of the variants called by only one or two of the platforms were experimentally tested using mass spectrometry, a method that is independent of DNA sequencing. We establish several platform-specific reasons why variants remained unreported. We report the indels called using the three sequencing technologies, and from the obtained results we conclude that sequencing human genomes with more than a single platform and multiple libraries is beneficial when a high level of accuracy is required. PMID:23405114

  14. Filovirus RefSeq Entries: Evaluation and Selection of Filovirus Type Variants, Type Sequences, and Names

    PubMed Central

    Kuhn, Jens H.; Andersen, Kristian G.; Bào, Yīmíng; Bavari, Sina; Becker, Stephan; Bennett, Richard S.; Bergman, Nicholas H.; Blinkova, Olga; Bradfute, Steven; Brister, J. Rodney; Bukreyev, Alexander; Chandran, Kartik; Chepurnov, Alexander A.; Davey, Robert A.; Dietzgen, Ralf G.; Doggett, Norman A.; Dolnik, Olga; Dye, John M.; Enterlein, Sven; Fenimore, Paul W.; Formenty, Pierre; Freiberg, Alexander N.; Garry, Robert F.; Garza, Nicole L.; Gire, Stephen K.; Gonzalez, Jean-Paul; Griffiths, Anthony; Happi, Christian T.; Hensley, Lisa E.; Herbert, Andrew S.; Hevey, Michael C.; Hoenen, Thomas; Honko, Anna N.; Ignatyev, Georgy M.; Jahrling, Peter B.; Johnson, Joshua C.; Johnson, Karl M.; Kindrachuk, Jason; Klenk, Hans-Dieter; Kobinger, Gary; Kochel, Tadeusz J.; Lackemeyer, Matthew G.; Lackner, Daniel F.; Leroy, Eric M.; Lever, Mark S.; Mühlberger, Elke; Netesov, Sergey V.; Olinger, Gene G.; Omilabu, Sunday A.; Palacios, Gustavo; Panchal, Rekha G.; Park, Daniel J.; Patterson, Jean L.; Paweska, Janusz T.; Peters, Clarence J.; Pettitt, James; Pitt, Louise; Radoshitzky, Sheli R.; Ryabchikova, Elena I.; Saphire, Erica Ollmann; Sabeti, Pardis C.; Sealfon, Rachel; Shestopalov, Aleksandr M.; Smither, Sophie J.; Sullivan, Nancy J.; Swanepoel, Robert; Takada, Ayato; Towner, Jonathan S.; van der Groen, Guido; Volchkov, Viktor E.; Volchkova, Valentina A.; Wahl-Jensen, Victoria; Warren, Travis K.; Warfield, Kelly L.; Weidmann, Manfred; Nichol, Stuart T.

    2014-01-01

    Sequence determination of complete or coding-complete genomes of viruses is becoming common practice for supporting the work of epidemiologists, ecologists, virologists, and taxonomists. Sequencing duration and costs are rapidly decreasing, sequencing hardware is under modification for use by non-experts, and software is constantly being improved to simplify sequence data management and analysis. Thus, analysis of virus disease outbreaks on the molecular level is now feasible, including characterization of the evolution of individual virus populations in single patients over time. The increasing accumulation of sequencing data creates a management problem for the curators of commonly used sequence databases and an entry retrieval problem for end users. Therefore, utilizing the data to their fullest potential will require setting nomenclature and annotation standards for virus isolates and associated genomic sequences. The National Center for Biotechnology Information’s (NCBI’s) RefSeq is a non-redundant, curated database for reference (or type) nucleotide sequence records that supplies source data to numerous other databases. Building on recently proposed templates for filovirus variant naming [<virus name> (<strain>)/<isolation host-suffix>/<country of sampling>/<year of sampling>/<genetic variant designation>-<isolate designation>], we report consensus decisions from a majority of past and currently active filovirus experts on the eight filovirus type variants and isolates to be represented in RefSeq, their final designations, and their associated sequences. PMID:25256396

  15. Filovirus RefSeq entries: evaluation and selection of filovirus type variants, type sequences, and names.

    PubMed

    Kuhn, Jens H; Andersen, Kristian G; Bào, Yīmíng; Bavari, Sina; Becker, Stephan; Bennett, Richard S; Bergman, Nicholas H; Blinkova, Olga; Bradfute, Steven; Brister, J Rodney; Bukreyev, Alexander; Chandran, Kartik; Chepurnov, Alexander A; Davey, Robert A; Dietzgen, Ralf G; Doggett, Norman A; Dolnik, Olga; Dye, John M; Enterlein, Sven; Fenimore, Paul W; Formenty, Pierre; Freiberg, Alexander N; Garry, Robert F; Garza, Nicole L; Gire, Stephen K; Gonzalez, Jean-Paul; Griffiths, Anthony; Happi, Christian T; Hensley, Lisa E; Herbert, Andrew S; Hevey, Michael C; Hoenen, Thomas; Honko, Anna N; Ignatyev, Georgy M; Jahrling, Peter B; Johnson, Joshua C; Johnson, Karl M; Kindrachuk, Jason; Klenk, Hans-Dieter; Kobinger, Gary; Kochel, Tadeusz J; Lackemeyer, Matthew G; Lackner, Daniel F; Leroy, Eric M; Lever, Mark S; Mühlberger, Elke; Netesov, Sergey V; Olinger, Gene G; Omilabu, Sunday A; Palacios, Gustavo; Panchal, Rekha G; Park, Daniel J; Patterson, Jean L; Paweska, Janusz T; Peters, Clarence J; Pettitt, James; Pitt, Louise; Radoshitzky, Sheli R; Ryabchikova, Elena I; Saphire, Erica Ollmann; Sabeti, Pardis C; Sealfon, Rachel; Shestopalov, Aleksandr M; Smither, Sophie J; Sullivan, Nancy J; Swanepoel, Robert; Takada, Ayato; Towner, Jonathan S; van der Groen, Guido; Volchkov, Viktor E; Volchkova, Valentina A; Wahl-Jensen, Victoria; Warren, Travis K; Warfield, Kelly L; Weidmann, Manfred; Nichol, Stuart T

    2014-09-26

    Sequence determination of complete or coding-complete genomes of viruses is becoming common practice for supporting the work of epidemiologists, ecologists, virologists, and taxonomists. Sequencing duration and costs are rapidly decreasing, sequencing hardware is under modification for use by non-experts, and software is constantly being improved to simplify sequence data management and analysis. Thus, analysis of virus disease outbreaks on the molecular level is now feasible, including characterization of the evolution of individual virus populations in single patients over time. The increasing accumulation of sequencing data creates a management problem for the curators of commonly used sequence databases and an entry retrieval problem for end users. Therefore, utilizing the data to their fullest potential will require setting nomenclature and annotation standards for virus isolates and associated genomic sequences. The National Center for Biotechnology Information's (NCBI's) RefSeq is a non-redundant, curated database for reference (or type) nucleotide sequence records that supplies source data to numerous other databases. Building on recently proposed templates for filovirus variant naming [<virus name> (<strain>)/<isolation host-suffix>/<country of sampling>/<year of sampling>/<genetic variant designation>-<isolate designation>], we report consensus decisions from a majority of past and currently active filovirus experts on the eight filovirus type variants and isolates to be represented in RefSeq, their final designations, and their associated sequences.

  16. Comparison of sequencing platforms for single nucleotide variant calls in a human sample.

    PubMed

    Ratan, Aakrosh; Miller, Webb; Guillory, Joseph; Stinson, Jeremy; Seshagiri, Somasekar; Schuster, Stephan C

    2013-01-01

    Next-generation sequencing platforms coupled with advanced bioinformatic tools enable re-sequencing of the human genome at high speed and with large cost savings. We compare sequencing platforms from Roche/454 (GS FLX), Illumina/HiSeq (HiSeq 2000), and Life Technologies/SOLiD (SOLiD 3 ECC) for their ability to identify single nucleotide substitutions in whole genome sequences from the same human sample. We report on significant GC-related bias observed in the data sequenced on Illumina and SOLiD platforms. The differences in the variant calls were investigated with regard to coverage and sequencing error. Some of the variants called by only one or two of the platforms were experimentally tested using mass spectrometry, a method that is independent of DNA sequencing. We establish several platform-specific reasons why variants remained unreported. We report the indels called using the three sequencing technologies, and from the obtained results we conclude that sequencing human genomes with more than a single platform and multiple libraries is beneficial when a high level of accuracy is required.

  17. Deep Sequencing Reveals Novel Genetic Variants in Children with Acute Liver Failure and Tissue Evidence of Impaired Energy Metabolism

    PubMed Central

    Valencia, C. Alexander; Wang, Xinjian; Wang, Jin; Peters, Anna; Simmons, Julia R.; Moran, Molly C.; Mathur, Abhinav; Husami, Ammar; Qian, Yaping; Sheridan, Rachel; Bove, Kevin E.; Witte, David; Huang, Taosheng; Miethke, Alexander G.

    2016-01-01

    Background & Aims: The etiology of acute liver failure (ALF) remains elusive in almost half of affected children. We hypothesized that inherited mitochondrial and fatty acid oxidation disorders were occult etiological factors in patients with idiopathic ALF and impaired energy metabolism. Methods: Twelve patients with elevated blood molar lactate/pyruvate ratio and indeterminate etiology were selected from a retrospective cohort of 74 subjects with ALF because their fixed and frozen liver samples were available for histological, ultrastructural, molecular and biochemical analysis. Results: A customized next-generation sequencing panel for 26 genes associated with mitochondrial and fatty acid oxidation defects revealed mutations and sequence variants in five subjects. Variants involved the genes ACAD9, POLG, POLG2, DGUOK, and RRM2B; the latter not previously reported in subjects with ALF. The explanted livers of the patients with heterozygous, truncating insertion mutations in RRM2B showed patchy micro- and macrovesicular steatosis, decreased mitochondrial DNA (mtDNA) content <30% of controls, and reduced respiratory chain complex activity; both patients had good post-transplant outcome. One infant with severe lactic acidosis was found to carry two heterozygous variants in ACAD9, which was associated with isolated complex I deficiency and diffuse hypergranular hepatocytes. The two subjects with heterozygous variants of unknown clinical significance in POLG and DGUOK developed ALF following drug exposure. Their hepatocytes displayed abnormal mitochondria by electron microscopy. Conclusion: Targeted next generation sequencing and correlation with histological, ultrastructural and functional studies on liver tissue in children with elevated lactate/pyruvate ratio expand the spectrum of genes associated with pediatric ALF. PMID:27483465

  18. Phasing for medical sequencing using rare variants and large haplotype reference panels

    PubMed Central

    Sharp, Kevin; Kretzschmar, Warren; Delaneau, Olivier; Marchini, Jonathan

    2016-01-01

    Motivation: There is growing recognition that estimating haplotypes from high coverage sequencing of single samples in clinical settings is an important problem. At the same time very large datasets consisting of tens and hundreds of thousands of high-coverage sequenced samples will soon be available. We describe a method that takes advantage of these huge human genetic variation resources and rare variant sharing patterns to estimate haplotypes on single sequenced samples. Sharing rare variants between two individuals is more likely to arise from a recent common ancestor and, hence, also more likely to indicate similar shared haplotypes over a substantial flanking region of sequence. Results: Our method exploits this idea to select a small set of highly informative copying states within a Hidden Markov Model (HMM) phasing algorithm. Using rare variants in this way allows us to avoid iterative MCMC methods to infer haplotypes. Compared to other approaches that do not explicitly use rare variants we obtain significant gains in phasing accuracy, less variation over phasing runs and improvements in speed. For example, using a reference panel of 7420 haplotypes from the UK10K project, we are able to reduce switch error rates by up to 50% when phasing samples sequenced at high-coverage. In addition, a single step rephasing of the UK10K panel, using rare variant information, has a downstream impact on phasing performance. These results represent a proof of concept that rare variant sharing patterns can be utilized to phase large high-coverage sequencing studies such as the 100 000 Genomes Project dataset. Availability and implementation: A webserver that includes an implementation of this new method and allows phasing of high-coverage clinical samples is available at https://phasingserver.stats.ox.ac.uk/. Contact: marchini@stats.ox.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27153703

  19. Improved detection of artifactual viral minority variants in high-throughput sequencing data.

    PubMed

    Welkers, Matthijs R A; Jonges, Marcel; Jeeninga, Rienk E; Koopmans, Marion P G; de Jong, Menno D

    2014-01-01

    High-throughput sequencing (HTS) of viral samples provides important information on the presence of viral minority variants. However, detection and accurate quantification are limited by the capacity to distinguish biological from artificial variation. In this study, errors related to the Illumina HiSeq2000 library generation and HTS process were investigated by determining minority variant frequencies in an influenza A/WSN/1933(H1N1) virus reverse-genetics plasmid pool. Errors related to amplification and sequencing were determined using the same plasmid pool, by generation of infectious virus using reverse genetics followed by in duplo reverse-transcriptase PCR (RT-PCR) amplification and HTS in the same sequence run. Results showed that after "best practice" quality control (QC), within the plasmid pool, one minority variant with a frequency >0.5% was identified, while 84 and 139 were identified in the RT-PCR amplified samples, indicating RT-PCR amplification artificially increased variation. Detailed analysis showed that artifactual minority variants could be identified by two major technical characteristics: their predominant presence in a single read orientation and uneven distribution of mismatches over the length of the reads. We demonstrate that by addition of two QC steps 95% of the artifactual minority variants could be identified. When our analysis approach was applied to three clinical samples, 68% of the initially identified minority variants were identified as artifacts. Our study clearly demonstrated that, without additional QC steps, overestimation of viral minority variants is very likely to occur, mainly as a consequence of the required RT-PCR amplification step. The improved ability to detect and correct for artifactual minority variants increases data resolution and could aid both past and future studies incorporating HTS. The source code has been made available through Sourceforge (https://sourceforge.net/projects/mva-ngs). PMID:25657642

  20. VARIANT: Command Line, Web service and Web interface for fast and accurate functional characterization of variants found by Next-Generation Sequencing

    PubMed Central

    Medina, Ignacio; De Maria, Alejandro; Bleda, Marta; Salavert, Francisco; Alonso, Roberto; Gonzalez, Cristina Y.; Dopazo, Joaquin

    2012-01-01

    The massive use of Next-Generation Sequencing (NGS) technologies is uncovering an unexpected amount of variability. The functional characterization of such variability, particularly in the most common form of variation found, the Single Nucleotide Variants (SNVs), has become a priority that needs to be addressed in a systematic way. VARIANT (VARIant ANalysis Tool) reports information on the variants found that includes consequence type and annotations taken from different databases and repositories (SNPs and variants from dbSNP and 1000 genomes, and disease-related variants from the Genome-Wide Association Study (GWAS) catalog, Online Mendelian Inheritance in Man (OMIM), Catalog of Somatic Mutations in Cancer (COSMIC) mutations, etc.). VARIANT also produces a rich variety of annotations that include information on the regulatory (transcription factor or miRNA-binding sites, etc.) or structural roles, or on the selective pressures on the sites affected by the variation. This information allows extending the conventional reports beyond the coding regions and expands the knowledge on the contribution of non-coding or synonymous variants to the phenotype studied. Contrary to other tools, VARIANT uses a remote database and operates through efficient RESTful Web Services that optimize search and transaction operations. In this way, local problems of installation, update or disk size limitations are overcome without the need to sacrifice speed (thousands of variants are processed per minute). VARIANT is available at: http://variant.bioinfo.cipf.es. PMID:22693211

  1. Novel Y-chromosome Short Tandem Repeat Variants Detected Through the Use of Massively Parallel Sequencing

    PubMed Central

    Warshauer, David H.; Churchill, Jennifer D.; Novroski, Nicole; King, Jonathan L.; Budowle, Bruce

    2015-01-01

    Massively parallel sequencing (MPS) technology is capable of determining the sizes of short tandem repeat (STR) alleles as well as their individual nucleotide sequences. Thus, single nucleotide polymorphisms (SNPs) within the repeat regions of STRs and variations in the pattern of repeat units in a given repeat motif can be used to differentiate alleles of the same length. In this study, MPS was used to sequence 28 forensically-relevant Y-chromosome STRs in a set of 41 DNA samples from the 3 major U.S. population groups (African Americans, Caucasians, and Hispanics). The resulting sequence data, which were analyzed with STRait Razor v2.0, revealed 37 unique allele sequence variants that have not been previously reported. Of these, 19 sequences were variations of documented sequences resulting from the presence of intra-repeat SNPs or alternative repeat unit patterns. Despite a limited sampling, two of the most frequently-observed variants were found only in African American samples. The remaining 18 variants represented allele sequences for which there were no published data with which to compare. These findings illustrate the great potential of MPS with regard to increasing the resolving power of STR typing and emphasize the need for sample population characterization of STR alleles. PMID:26391384

  2. Exome sequencing identifies a rare HSPG2 variant associated with familial idiopathic scoliosis.

    PubMed

    Baschal, Erin E; Wethey, Cambria I; Swindle, Kandice; Baschal, Robin M; Gowan, Katherine; Tang, Nelson L S; Alvarado, David M; Haller, Gabe E; Dobbs, Matthew B; Taylor, Matthew R G; Gurnett, Christina A; Jones, Kenneth L; Miller, Nancy H

    2014-12-12

    Idiopathic scoliosis occurs in 3% of individuals and has an unknown etiology. The objective of this study was to identify rare variants that contribute to the etiology of idiopathic scoliosis by using exome sequencing in a multigenerational family with idiopathic scoliosis. Exome sequencing was completed for three members of this multigenerational family with idiopathic scoliosis, resulting in the identification of a variant in the HSPG2 gene as a potential contributor to the phenotype. The HSPG2 gene was sequenced in a separate cohort of 100 unrelated individuals affected with idiopathic scoliosis and also was examined in an independent idiopathic scoliosis population. The exome sequencing and subsequent bioinformatics filtering resulted in 16 potentially damaging and rare coding variants. One of these variants, p.Asn786Ser, is located in the HSPG2 gene. The variant p.Asn786Ser also is overrepresented in a larger cohort of idiopathic scoliosis cases compared with a control population (P = 0.024). Furthermore, we identified additional rare HSPG2 variants that are predicted to be damaging in two independent cohorts of individuals with idiopathic scoliosis. The HSPG2 gene encodes a ubiquitous multifunctional protein within the extracellular matrix in which loss-of-function mutations are known to result in a musculoskeletal phenotype in both mice and humans. Based on these results, we conclude that rare variants in the HSPG2 gene potentially contribute to the idiopathic scoliosis phenotype in a subset of patients with idiopathic scoliosis. Further studies must be completed to confirm the effect of the HSPG2 gene on the idiopathic scoliosis phenotype.

  3. Exome sequencing of case-unaffected-parents trios reveals recessive and de novo genetic variants in sporadic ALS

    PubMed Central

    Steinberg, Karyn Meltz; Yu, Bing; Koboldt, Daniel C.; Mardis, Elaine R.; Pamphlett, Roger

    2015-01-01

    The contribution of genetic variants to sporadic amyotrophic lateral sclerosis (ALS) remains largely unknown. Either recessive or de novo variants could result in an apparently sporadic occurrence of ALS. In an attempt to find such variants we sequenced the exomes of 44 ALS-unaffected-parents trios. Rare and potentially damaging compound heterozygous variants were found in 27% of ALS patients, homozygous recessive variants in 14% and coding de novo variants in 27%. In 20% of patients more than one of the above variants was present. Genes with recessive variants were enriched in nucleotide binding capacity, ATPase activity, and the dynein heavy chain. Genes with de novo variants were enriched in transcription regulation and cell cycle processes. This trio study indicates that rare private recessive variants could be a mechanism underlying some cases of sporadic ALS, and that de novo mutations are also likely to play a part in the disease. PMID:25773295

  4. Bovine Parathyroid Hormone: Amino Acid Sequence

    PubMed Central

    Brewer, H. Bryan; Ronan, Rosemary

    1970-01-01

    Bovine parathyroid hormone has been isolated in homogeneous form, and its complete amino acid sequence determined. The bovine hormone is a single chain, 84 amino acids long. It contains amino-terminal alanine and carboxyl-terminal glutamine. The bovine parathyroid hormone is approximately three times the length of the newly discovered hormone, thyrocalcitonin, whose action is reciprocal to parathyroid hormone. PMID:5275384

  5. Exome sequencing in pooled DNA samples to identify maternal pre-eclampsia risk variants.

    PubMed

    Kaartokallio, Tea; Wang, Jingwen; Heinonen, Seppo; Kajantie, Eero; Kivinen, Katja; Pouta, Anneli; Gerdhem, Paul; Jiao, Hong; Kere, Juha; Laivuori, Hannele

    2016-01-01

    Pre-eclampsia is a common pregnancy disorder that is a major cause of maternal and perinatal mortality and morbidity. Variants predisposing to pre-eclampsia might be under negative evolutionary selection that is likely to keep their population frequencies low. We exome sequenced samples from a hundred Finnish pre-eclamptic women in pools of ten to screen for low-frequency, large-effect risk variants for pre-eclampsia. After filtering and additional genotyping steps, we selected 28 low-frequency missense, nonsense and splice site variants that were enriched in the pre-eclampsia pools compared to reference data, and genotyped the variants in 1353 pre-eclamptic and 699 non-pre-eclamptic women to test their association with pre-eclampsia and quantitative traits relevant for the disease. Genotypes from the SISu project (n = 6118 exome sequenced Finnish samples) were included in the binary trait association analysis as a population reference to increase statistical power. In these analyses, none of the variants tested reached genome-wide significance. In conclusion, the genetic risk for pre-eclampsia is likely complex even in a population isolate like Finland, and larger sample sizes will be necessary to detect risk variants. PMID:27384325

  6. Novel pathogenic variants and genes for myopathies identified by whole exome sequencing

    PubMed Central

    Hunter, Jesse M; Ahearn, Mary Ellen; Balak, Christopher D; Liang, Winnie S; Kurdoglu, Ahmet; Corneveaux, Jason J; Russell, Megan; Huentelman, Matthew J; Craig, David W; Carpten, John; Coons, Stephen W; DeMello, Daphne E; Hall, Judith G; Bernes, Saunder M; Baumbach-Reardon, Lisa

    2015-01-01

    Neuromuscular diseases (NMD) account for a significant proportion of infant and childhood mortality and devastating chronic disease. Determining the specific diagnosis of NMD is challenging due to thousands of unique or rare genetic variants that result in overlapping phenotypes. We present four unique childhood myopathy cases characterized by relatively mild muscle weakness, slowly progressing course, mildly elevated creatine phosphokinase (CPK), and contractures. We also present two additional cases characterized by severe prenatal/neonatal myopathy. Prior extensive genetic testing and histology of these cases did not reveal the genetic etiology of disease. Here, we applied whole exome sequencing (WES) and bioinformatics to identify likely causal pathogenic variants in each pedigree. In two cases, we identified novel pathogenic variants in COL6A3. In a third case, we identified novel likely pathogenic variants in COL6A6 and COL6A3. We identified a novel splice variant in EMD in a fourth case. Finally, we classify two cases as calcium channelopathies with identification of novel pathogenic variants in RYR1 and CACNA1S. These are the first cases of myopathies reported to be caused by variants in COL6A6 and CACNA1S. Our results demonstrate the utility and genetic diagnostic value of WES in the broad class of NMD phenotypes. PMID:26247046

  7. Rare variant phasing and haplotypic expression from RNA sequencing with phASER.

    PubMed

    Castel, Stephane E; Mohammadi, Pejman; Chung, Wendy K; Shen, Yufeng; Lappalainen, Tuuli

    2016-01-01

    Haplotype phasing of genetic variants is important for clinical interpretation of the genome, population genetic analysis and functional genomic analysis of allelic activity. Here we present phASER, an accurate approach for phasing variants that are overlapped by sequencing reads, including those from RNA sequencing (RNA-seq), which often span multiple exons due to splicing. Using diverse RNA-seq data we demonstrate that this provides more accurate phasing of rare variants compared with population-based phasing and allows phasing of variants in the same gene up to hundreds of kilobases away that cannot be obtained from DNA sequencing (DNA-seq) reads. We show that in the context of medical genetic studies this improves the resolution of compound heterozygotes. Additionally, phASER provides measures of haplotypic expression that increase power and accuracy in studies of allelic expression. In summary, phasing using RNA-seq and phASER is accurate and improves studies where rare variant haplotypes or allelic expression is needed. PMID:27605262

  8. From days to hours: reporting clinically actionable variants from whole genome sequencing.

    PubMed

    Middha, Sumit; Baheti, Saurabh; Hart, Steven N; Kocher, Jean-Pierre A

    2014-01-01

    As the cost of whole genome sequencing (WGS) decreases, clinical laboratories will be looking at broadly adopting this technology to screen for variants of clinical significance. To fully leverage this technology in a clinical setting, results need to be reported quickly, as the turnaround rate could potentially impact patient care. The latest sequencers can sequence a whole human genome in about 24 hours. However, depending on the computing infrastructure available, the processing of data can take several days, with the majority of computing time devoted to aligning reads to genomic regions that are to date not clinically interpretable. In an attempt to accelerate the reporting of clinically actionable variants, we have investigated the utility of a multi-step alignment algorithm focused on aligning reads and calling variants in genomic regions of clinical relevance prior to processing the remaining reads on the whole genome. This iterative workflow significantly accelerates the reporting of clinically actionable variants with no loss of accuracy when compared to genotypes obtained with the OMNI SNP platform or to variants detected with a standard workflow that combines Novoalign and GATK. PMID:24505267

  9. Rare variant phasing and haplotypic expression from RNA sequencing with phASER

    PubMed Central

    Castel, Stephane E.; Mohammadi, Pejman; Chung, Wendy K.; Shen, Yufeng; Lappalainen, Tuuli

    2016-01-01

    Haplotype phasing of genetic variants is important for clinical interpretation of the genome, population genetic analysis and functional genomic analysis of allelic activity. Here we present phASER, an accurate approach for phasing variants that are overlapped by sequencing reads, including those from RNA sequencing (RNA-seq), which often span multiple exons due to splicing. Using diverse RNA-seq data we demonstrate that this provides more accurate phasing of rare variants compared with population-based phasing and allows phasing of variants in the same gene up to hundreds of kilobases away that cannot be obtained from DNA sequencing (DNA-seq) reads. We show that in the context of medical genetic studies this improves the resolution of compound heterozygotes. Additionally, phASER provides measures of haplotypic expression that increase power and accuracy in studies of allelic expression. In summary, phasing using RNA-seq and phASER is accurate and improves studies where rare variant haplotypes or allelic expression is needed. PMID:27605262

  10. HLA class II sequence variants influence tuberculosis risk in populations of European ancestry

    PubMed Central

    Sveinbjornsson, Gardar; Gudbjartsson, Daniel F.; Halldorsson, Bjarni V.; Kristinsson, Karl G.; Gottfredsson, Magnus; Barrett, Jeffrey C.; Gudmundsson, Larus J.; Blondal, Kai; Gylfason, Arnaldur; Gudjonsson, Sigurjon Axel; Helgadottir, Hafdis T.; Jonasdottir, Adalbjorg; Jonasdottir, Aslaug; Karason, Ari; Kardum, Ljiljana Bulat; Knežević, Jelena; Kristjansson, Helgi; Kristjansson, Mar; Love, Arthur; Luo, Yang; Magnusson, Olafur T.; Sulem, Patrick; Kong, Augustine; Masson, Gisli; Thorsteinsdottir, Unnur; Dembic, Zlatko; Nejentsev, Sergey; Blondal, Thorsteinn; Jonsdottir, Ingileif; Stefansson, Kari

    2016-01-01

    Mycobacterium tuberculosis (M. tuberculosis) infections cause 9.0 million new tuberculosis (TB) cases and 1.5 million deaths annually. To search for sequence variants that confer risk of TB we tested 28.3 million variants identified through whole-genome sequencing of 2,636 Icelanders for association with TB (8,162 cases and 277,643 controls), pulmonary TB (PTB), and M. tuberculosis infection. We found association of three sequence variants in the HLA class II region: rs557011[T] (MAF=40.2%) with M. tuberculosis infection (OR=1.14, P=3.1×10^-13) and PTB (OR=1.25, P=5.8×10^-12) and rs9271378[G] (MAF=32.5%) with PTB (OR=0.78, P=2.5×10^-12), both located between HLA-DQA1 and HLA-DRB1. Finally, a missense variant p.Ala210Thr in HLA-DQA1 (MAF=19.1%, rs9272785) shows association with M. tuberculosis infection (P=9.3×10^-9, OR=1.14). The association of these variants with PTB was replicated in large samples of European ancestry from Russia and Croatia (P<5.9×10^-4). These findings demonstrate that the HLA class II region contributes to the complex genetic risk of tuberculosis, possibly through reduced presentation of protective M. tuberculosis antigens to T cells. PMID:26829749

  11. Variant Humicola grisea CBH1.1

    DOEpatents

    Goedegebuur, Frits; Gualfetti, Peter; Mitchinson, Colin; Larenas, Edmund

    2011-05-31

    Disclosed are variants of Humicola grisea Cel7A (CBH1.1), H. jecorina CBH1 variant or S. thermophilium CBH1, nucleic acids encoding the same and methods for producing the same. The variant cellulases have the amino acid sequence of a glycosyl hydrolase of family 7A wherein one or more amino acid residues are substituted.

  12. Variant Humicola grisea CBH1.1

    DOEpatents

    Goedegebuur, Frits; Gualfetti, Peter; Mitchinson, Colin; Larenas, Edmund

    2008-12-02

    Disclosed are variants of Humicola grisea Cel7A (CBH1.1), H. jecorina CBH1 variant or S. thermophilium CBH1, nucleic acids encoding the same and methods for producing the same. The variant cellulases have the amino acid sequence of a glycosyl hydrolase of family 7A wherein one or more amino acid residues are substituted.

  13. Variant Humicola grisea CBH1.1

    DOEpatents

    Goedegebuur, Frits; Gualfetti, Peter; Mitchinson, Colin; Larenas, Edmund

    2012-08-07

    Disclosed are variants of Humicola grisea Cel7A (CBH1.1), H. jecorina CBH1 variant or S. thermophilium CBH1, nucleic acids encoding the same and methods for producing the same. The variant cellulases have the amino acid sequence of a glycosyl hydrolase of family 7A wherein one or more amino acid residues are substituted.

  14. Variant Humicola grisea CBH1.1

    DOEpatents

    Goedegebuur, Frits; Gualfetti, Peter; Mitchinson, Colin; Larenas, Edmund

    2011-08-16

    Disclosed are variants of Humicola grisea Cel7A (CBH1.1), H. jecorina CBH1 variant or S. thermophilium CBH1, nucleic acids encoding the same and methods for producing the same. The variant cellulases have the amino acid sequence of a glycosyl hydrolase of family 7A wherein one or more amino acid residues are substituted.

  15. Variant Humicola grisea CBH1.1

    SciTech Connect

    Goedegebuur, Frits; Gualfetti, Peter; Mitchinson, Colin; Larenas, Edmund

    2014-03-18

    Disclosed are variants of Humicola grisea Cel7A (CBH1.1), H. jecorina CBH1 variant or S. thermophilium CBH1, nucleic acids encoding the same and methods for producing the same. The variant cellulases have the amino acid sequence of a glycosyl hydrolase of family 7A wherein one or more amino acid residues are substituted.

  16. Variant Humicola grisea CBH1.1

    SciTech Connect

    Goedegebuur, Frits; Gualfetti, Peter; Mitchinson, Colin; Larenas, Edmund

    2014-09-09

    Disclosed are variants of Humicola grisea Cel7A (CBH1.1), H. jecorina CBH1 variant or S. thermophilium CBH1, nucleic acids encoding the same and methods for producing the same. The variant cellulases have the amino acid sequence of a glycosyl hydrolase of family 7A wherein one or more amino acid residues are substituted.

  17. Variant Humicola grisea CBH1.1

    SciTech Connect

    Goedegebuur, Frits; Gualfetti, Peter; Mitchinson, Colin; Larenas, Edmund

    2013-02-19

    Disclosed are variants of Humicola grisea Cel7A (CBH1.1), H. jecorina CBH1 variant or S. thermophilium CBH1, nucleic acids encoding the same and methods for producing the same. The variant cellulases have the amino acid sequence of a glycosyl hydrolase of family 7A wherein one or more amino acid residues are substituted.

  18. Integrating mRNA and protein sequencing enables the detection and quantitative profiling of natural protein sequence variants of Populus trichocarpa

    DOE PAGES

    Abraham, Paul E.; Wang, Xiaojing; Ranjan, Priya; Zhang, Bing; Tuskan, Gerald A.; Hettich, Robert L.; Nookaew, Intawat

    2015-10-20

    The availability of next-generation sequencing technologies has rapidly transformed our ability to link genotypes to phenotypes, and as such, promises to facilitate the dissection of genetic contribution to complex traits. Although discoveries of genetic associations will further our understanding of biology, once candidate variants have been identified, investigators are faced with the challenge of characterizing the functional effects on proteins encoded by such genes. Here we show how next-generation RNA sequencing data can be exploited to construct genotype-specific protein sequence databases, which provide a clearer picture of the molecular toolbox underlying cellular and organismal processes and their variation in a natural population. For this study, we used two individual genotypes (DENA-17-3 and VNDL-27-4) from a recent genome wide association (GWA) study of Populus trichocarpa, an obligate outcrosser that exhibits tremendous phenotypic variation across the natural population. This strategy allowed us to comprehensively catalogue proteins containing single amino acid polymorphisms (SAAPs) and insertions and deletions (INDELS). Based on large-scale identification of SAAPs, we profiled the frequency of 128 types of naturally occurring amino acid substitutions, with a subset of SAAPs occurring in regions of the genome having strong polymorphism patterns consistent with recent positive and/or divergent selection. In addition, we were able to explore the diploid landscape of Populus at the proteome-level, allowing the characterization of heterozygous variants.

  19. Integrating mRNA and protein sequencing enables the detection and quantitative profiling of natural protein sequence variants of Populus trichocarpa

    SciTech Connect

    Abraham, Paul E.; Wang, Xiaojing; Ranjan, Priya; Zhang, Bing; Tuskan, Gerald A.; Hettich, Robert L.; Nookaew, Intawat

    2015-10-20

    The availability of next-generation sequencing technologies has rapidly transformed our ability to link genotypes to phenotypes, and as such, promises to facilitate the dissection of genetic contribution to complex traits. Although discoveries of genetic associations will further our understanding of biology, once candidate variants have been identified, investigators are faced with the challenge of characterizing the functional effects on proteins encoded by such genes. Here we show how next-generation RNA sequencing data can be exploited to construct genotype-specific protein sequence databases, which provide a clearer picture of the molecular toolbox underlying cellular and organismal processes and their variation in a natural population. For this study, we used two individual genotypes (DENA-17-3 and VNDL-27-4) from a recent genome wide association (GWA) study of Populus trichocarpa, an obligate outcrosser that exhibits tremendous phenotypic variation across the natural population. This strategy allowed us to comprehensively catalogue proteins containing single amino acid polymorphisms (SAAPs) and insertions and deletions (INDELS). Based on large-scale identification of SAAPs, we profiled the frequency of 128 types of naturally occurring amino acid substitutions, with a subset of SAAPs occurring in regions of the genome having strong polymorphism patterns consistent with recent positive and/or divergent selection. In addition, we were able to explore the diploid landscape of Populus at the proteome-level, allowing the characterization of heterozygous variants.

  20. Exome sequencing of ion channel genes reveals complex variant profiles confounding personal risk assessment in epilepsy

    PubMed Central

    Klassen, Tara; Davis, Caleb; Goldman, Alica; Burgess, Dan; Chen, Tim; Wheeler, David; McPherson, John; Bourquin, Traci; Lewis, Lora; Villasana, Donna; Morgan, Margaret; Muzny, Donna; Gibbs, Richard; Noebels, Jeffrey

    2011-01-01

    Ion channel mutations are an important cause of rare Mendelian disorders affecting brain, heart, and other tissues. We performed parallel exome sequencing of 237 channel genes in a well characterized human sample, comparing variant profiles of unaffected individuals to those with the most common neuronal excitability disorder, sporadic idiopathic epilepsy. Rare missense variation in known Mendelian disease genes is prevalent in both groups at similar complexity, revealing that even deleterious ion channel mutations confer uncertain risk to an individual depending on the other variants with which they are combined. Our findings indicate that variant discovery via large scale sequencing efforts is only a first step in illuminating the complex allelic architecture underlying personal disease risk. We propose that in silico modeling of channel variation in realistic cell and network models will be crucial to future strategies assessing mutation profile pathogenicity and drug response in individuals with a broad spectrum of excitability disorders. PMID:21703448

  1. Whole exome sequencing of rare variants in EIF4G1 and VPS35 in Parkinson disease

    PubMed Central

    Nuytemans, Karen; Bademci, Guney; Inchausti, Vanessa; Dressen, Amy; Kinnamon, Daniel D.; Mehta, Arpit; Wang, Liyong; Züchner, Stephan; Beecham, Gary W.; Martin, Eden R.; Scott, William K.

    2013-01-01

    Objective: Recently, vacuolar protein sorting 35 (VPS35) and eukaryotic translation initiation factor 4 gamma 1 (EIF4G1) have been identified as 2 causal Parkinson disease (PD) genes. We used whole exome sequencing for rapid, parallel analysis of variations in these 2 genes. Methods: We performed whole exome sequencing in 213 patients with PD and 272 control individuals. Those rare variants (RVs) with <5% frequency in the exome variant server database and our own control data were considered for analysis. We performed joint gene-based tests for association using RVASSOC and SKAT (Sequence Kernel Association Test) as well as single-variant test statistics. Results: We identified 3 novel VPS35 variations that changed the coded amino acid (nonsynonymous) in 3 cases. Two variations were in multiplex families and neither segregated with PD. In EIF4G1, we identified 11 (9 nonsynonymous and 2 small indels) RVs including the reported pathogenic mutation p.R1205H, which segregated in all affected members of a large family, but also in 1 unaffected 86-year-old family member. Two additional RVs were found in isolated patients only. Whereas initial association studies suggested an association (p = 0.04) with all RVs in EIF4G1, subsequent testing in a second dataset for the driving variant (p.F1461) suggested no association between RVs in the gene and PD. Conclusions: We confirm that the specific EIF4G1 variation p.R1205H seems to be a strong PD risk factor, but is nonpenetrant in at least one 86-year-old. A few other select RVs in both genes could not be ruled out as causal. However, there was no evidence for an overall contribution of genetic variability in VPS35 or EIF4G1 to PD development in our dataset. PMID:23408866

  2. Common 5S rRNA variants are likely to be accepted in many sequence contexts

    NASA Technical Reports Server (NTRS)

    Zhang, Zhengdong; D'Souza, Lisa M.; Lee, Youn-Hyung; Fox, George E.

    2003-01-01

    Over evolutionary time RNA sequences which are successfully fixed in a population are selected from among those that satisfy the structural and chemical requirements imposed by the function of the RNA. These sequences together comprise the structure space of the RNA. In principle, a comprehensive understanding of RNA structure and function would make it possible to enumerate which specific RNA sequences belong to a particular structure space and which do not. We are using bacterial 5S rRNA as a model system to attempt to identify principles that can be used to predict which sequences do or do not belong to the 5S rRNA structure space. One promising idea is the very intuitive notion that frequently seen sequence changes in an aligned data set of naturally occurring 5S rRNAs would be widely accepted in many other 5S rRNA sequence contexts. To test this hypothesis, we first developed well-defined operational definitions for a Vibrio region of the 5S rRNA structure space and what is meant by a highly variable position. Fourteen sequence variants (10 point changes and 4 base-pair changes) were identified in this way, which, by the hypothesis, would be expected to incorporate successfully in any of the known sequences in the Vibrio region. All 14 of these changes were constructed and separately introduced into the Vibrio proteolyticus 5S rRNA sequence where they are not normally found. Each variant was evaluated for its ability to function as a valid 5S rRNA in an E. coli cellular context. It was found that 93% (13/14) of the variants tested are likely valid 5S rRNAs in this context. In addition, seven variants were constructed that, although present in the Vibrio region, did not meet the stringent criteria for a highly variable position. In this case, 86% (6/7) are likely valid. As a control we also examined seven variants that are seldom or never seen in the Vibrio region of 5S rRNA sequence space. In this case only two of seven were found to be potentially valid. 

  3. Molecular Cloning and Expression of Sequence Variants of Manganese Superoxide Dismutase Genes from Wheat

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Reactive oxygen species (ROS) are very harmful to living organisms due to the potential oxidation of membrane lipids, DNA, proteins, and carbohydrates. Transformed E.coli strain QC 871, superoxide dismutase (SOD) double-mutant, with three sequence variant MnSOD1, MnSOD2, and MnSOD3 manganese supero...

  4. An integrated framework for discovery and genotyping of genomic variants from high-throughput sequencing experiments.

    PubMed

    Duitama, Jorge; Quintero, Juan Camilo; Cruz, Daniel Felipe; Quintero, Constanza; Hubmann, Georg; Foulquié-Moreno, Maria R; Verstrepen, Kevin J; Thevelein, Johan M; Tohme, Joe

    2014-04-01

    Recent advances in high-throughput sequencing (HTS) technologies and computing capacity have produced unprecedented amounts of genomic data that have unraveled the genetics of phenotypic variability in several species. However, operating and integrating current software tools for data analysis still require important investments in highly skilled personnel. Developing accurate, efficient and user-friendly software packages for HTS data analysis will lead to a more rapid discovery of genomic elements relevant to medical, agricultural and industrial applications. We therefore developed Next-Generation Sequencing Eclipse Plug-in (NGSEP), a new software tool for integrated, efficient and user-friendly detection of single nucleotide variants (SNVs), indels and copy number variants (CNVs). NGSEP includes modules for read alignment, sorting, merging, functional annotation of variants, filtering and quality statistics. Analysis of sequencing experiments in yeast, rice and human samples shows that NGSEP has superior accuracy and efficiency, compared with currently available packages for variants detection. We also show that only a comprehensive and accurate identification of repeat regions and CNVs allows researchers to properly separate SNVs from differences between copies of repeat elements. We expect that NGSEP will become a strong support tool to empower the analysis of sequencing data in a wide range of research projects on different species. PMID:24413664

  5. An integrated framework for discovery and genotyping of genomic variants from high-throughput sequencing experiments

    PubMed Central

    Duitama, Jorge; Quintero, Juan Camilo; Cruz, Daniel Felipe; Quintero, Constanza; Hubmann, Georg; Foulquié-Moreno, Maria R.; Verstrepen, Kevin J.; Thevelein, Johan M.; Tohme, Joe

    2014-01-01

    Recent advances in high-throughput sequencing (HTS) technologies and computing capacity have produced unprecedented amounts of genomic data that have unraveled the genetics of phenotypic variability in several species. However, operating and integrating current software tools for data analysis still require important investments in highly skilled personnel. Developing accurate, efficient and user-friendly software packages for HTS data analysis will lead to a more rapid discovery of genomic elements relevant to medical, agricultural and industrial applications. We therefore developed Next-Generation Sequencing Eclipse Plug-in (NGSEP), a new software tool for integrated, efficient and user-friendly detection of single nucleotide variants (SNVs), indels and copy number variants (CNVs). NGSEP includes modules for read alignment, sorting, merging, functional annotation of variants, filtering and quality statistics. Analysis of sequencing experiments in yeast, rice and human samples shows that NGSEP has superior accuracy and efficiency, compared with currently available packages for variants detection. We also show that only a comprehensive and accurate identification of repeat regions and CNVs allows researchers to properly separate SNVs from differences between copies of repeat elements. We expect that NGSEP will become a strong support tool to empower the analysis of sequencing data in a wide range of research projects on different species. PMID:24413664

  7. A two-dimensional pooling strategy for rare variant detection on next-generation sequencing platforms.

    PubMed

    Zuzarte, Philip C; Denroche, Robert E; Fehringer, Gordon; Katzov-Eckert, Hagit; Hung, Rayjean J; McPherson, John D

    2014-01-01

    We describe a method for pooling and sequencing DNA from a large number of individual samples while preserving information regarding sample identity. DNA from 576 individuals was arranged into four 12 row by 12 column matrices and then pooled by row and by column resulting in 96 total pools with 12 individuals in each pool. Pooling of DNA was carried out in a two-dimensional fashion, such that DNA from each individual is present in exactly one row pool and exactly one column pool. By considering the variants observed in the rows and columns of a matrix we are able to trace rare variants back to the specific individuals that carry them. The pooled DNA samples were enriched over a 250 kb region previously identified by GWAS to significantly predispose individuals to lung cancer. All 96 pools (12 row and 12 column pools from 4 matrices) were barcoded and sequenced on an Illumina HiSeq 2000 instrument with an average depth of coverage greater than 4,000×. Verification based on Ion PGM sequencing confirmed the presence of 91.4% of confidently classified SNVs assayed. In this way, each individual sample is sequenced in multiple pools providing more accurate variant calling than a single pool or a multiplexed approach. This provides a powerful method for rare variant detection in regions of interest at a reduced cost to the researcher.
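    The row-and-column decoding described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names (`pools_for`, `candidate_carriers`), the 0-based numbering of individuals, and the row-major layout within each matrix are assumptions made for the example. Only the dimensions (four 12 × 12 matrices, 576 individuals, 96 pools) come from the abstract.

```python
from itertools import product

# Dimensions taken from the abstract; everything else is illustrative.
ROWS = COLS = 12
MATRICES = 4
PER_MATRIX = ROWS * COLS  # 144 individuals per matrix


def pools_for(individual):
    """Return the row pool and column pool holding an individual (0..575).

    A pool is encoded as (matrix, "row" | "col", index). The row-major
    layout is an assumption of this sketch.
    """
    matrix, offset = divmod(individual, PER_MATRIX)
    row, col = divmod(offset, COLS)
    return (matrix, "row", row), (matrix, "col", col)


def candidate_carriers(positive_pools):
    """List individuals whose row pool and column pool both show a variant.

    With a single carrier per matrix the answer is unique; with several
    carriers in one matrix the intersections can include non-carriers.
    """
    carriers = []
    for matrix in range(MATRICES):
        rows = sorted(i for (m, kind, i) in positive_pools
                      if m == matrix and kind == "row")
        cols = sorted(i for (m, kind, i) in positive_pools
                      if m == matrix and kind == "col")
        for r, c in product(rows, cols):
            carriers.append(matrix * PER_MATRIX + r * COLS + c)
    return carriers


# Total number of pools: 4 matrices x (12 row pools + 12 column pools).
TOTAL_POOLS = MATRICES * (ROWS + COLS)
```

    For a rare variant carried by a single individual in a matrix, exactly one row pool and one column pool are positive, so the intersection names that individual. When two carriers share a matrix but sit in different rows and columns, four intersections are reported, and the extra candidates would need follow-up genotyping to resolve.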

  8. Consensus Genotyper for Exome Sequencing (CGES): improving the quality of exome variant genotypes

    PubMed Central

    Trubetskoy, Vassily; Rodriguez, Alex; Dave, Uptal; Campbell, Nicholas; Crawford, Emily L.; Cook, Edwin H.; Sutcliffe, James S.; Foster, Ian; Madduri, Ravi; Cox, Nancy J.; Davis, Lea K.

    2015-01-01

    Motivation: The development of cost-effective next-generation sequencing methods has spurred the development of high-throughput bioinformatics tools for detection of sequence variation. With many disparate variant-calling algorithms available, investigators must ask, ‘Which method is best for my data?’ Machine learning research has shown that so-called ensemble methods that combine the output of multiple models can dramatically improve classifier performance. Here we describe a novel variant-calling approach based on an ensemble of variant-calling algorithms, which we term the Consensus Genotyper for Exome Sequencing (CGES). CGES uses a two-stage voting scheme among four algorithm implementations. While our ensemble method can accept variants generated by any variant-calling algorithm, we used GATK2.8, SAMtools, FreeBayes and Atlas-SNP2 in building CGES because of their performance, widespread adoption and diverse but complementary algorithms. Results: We apply CGES to 132 samples sequenced at the Hudson Alpha Institute for Biotechnology (HAIB, Huntsville, AL) using the Nimblegen Exome Capture and Illumina sequencing technology. Our sample set consisted of 40 complete trios, two families of four, one parent–child duo and two unrelated individuals. CGES yielded the fewest total variant calls (N_CGES = 139,897), the highest Ts/Tv ratio (3.02), the lowest Mendelian error rate across all genotypes (0.028%), the highest rediscovery rate from the Exome Variant Server (EVS; 89.3%) and 1000 Genomes (1KG; 84.1%) and the highest positive predictive value (PPV; 96.1%) for a random sample of previously validated de novo variants. We describe these and other quality control (QC) metrics from consensus data and explain how the CGES pipeline can be used to generate call sets of varying quality stringency, including consensus calls present across all four algorithms, calls that are consistent across any three out of four algorithms, calls that are consistent across any two out
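    The idea of call sets at different stringency levels (variants reported by all four callers, by any three, or by any two) can be illustrated with a simple vote count. This is only a sketch of threshold-based consensus, not the two-stage CGES voting scheme, whose details are not given in the abstract. The function name `consensus_calls`, the `(chrom, pos, ref, alt)` tuple encoding, and the toy call sets are all hypothetical.

```python
from collections import Counter


def consensus_calls(call_sets, min_callers):
    """Return variants reported by at least `min_callers` of the callers.

    call_sets maps a caller name to the set of variants it reported.
    Each variant is a hashable key; here (chrom, pos, ref, alt) tuples.
    This simple threshold vote is an assumption of the sketch, not the
    published CGES procedure.
    """
    votes = Counter(v for calls in call_sets.values() for v in calls)
    return {v for v, n in votes.items() if n >= min_callers}


# Toy example with made-up variants (not real data).
A = ("1", 100, "A", "G")
B = ("1", 200, "C", "T")
C = ("2", 300, "G", "A")
toy_calls = {
    "GATK": {A, B, C},
    "SAMtools": {A, B},
    "FreeBayes": {A, B},
    "Atlas-SNP2": {A},
}
strict = consensus_calls(toy_calls, 4)   # reported by all four callers
relaxed = consensus_calls(toy_calls, 2)  # reported by any two or more
```

    Raising `min_callers` trades sensitivity for precision: the strict set keeps only the variant every caller agrees on, while the relaxed set also admits variants supported by a subset of callers.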

  9. Exploration of the arrest peptide sequence space reveals arrest-enhanced variants.

    PubMed

    Cymer, Florian; Hedman, Rickard; Ismail, Nurzian; von Heijne, Gunnar

    2015-04-17

    Translational arrest peptides (APs) are short stretches of polypeptides that induce translational stalling when synthesized on a ribosome. Mechanical pulling forces acting on the nascent chain can weaken or even abolish stalling. APs can therefore be used as in vivo force sensors, making it possible to measure the forces that act on a nascent chain during translation with single-residue resolution. It is also possible to score the relative strengths of APs by subjecting them to a given pulling force and ranking them according to stalling efficiency. Using the latter approach, we now report an extensive mutagenesis scan of a strong mutant variant of the Mannheimia succiniciproducens SecM AP and identify mutations that further increase the stalling efficiency. Combining three such mutations, we designed an AP that withstands the strongest pulling force we are able to generate at present. We further show that diproline stretches in a nascent protein act as very strong APs when translation is carried out in the absence of elongation factor P. Our findings highlight critical residues in APs, show that certain amino acid sequences induce very strong translational arrest and provide a toolbox of APs of varying strengths that can be used for in vivo force measurements.

  10. Longitudinal studies on maternal HIV-1 variants by biological phenotyping, sequence analysis and viral load.

    PubMed

    Renta, J Y; Cadilla, C L; Vega, M E; Hillyer, G V; Estrada, C; Jiménez, E; Abreu, E; Méndez, I; Gandía, J; Meléndez-Guerrero, L M

    1997-11-01

    In this study, the HIV-1 variant viruses from ten pregnant women and their infants were isolated and characterized longitudinally in order to determine the role that viral envelope (gp120-V3 loop) gene variation and viral tropism play in vertical transmission. Biological phenotyping of each HIV variant was accomplished by growth in MT-2 cells and in macrophages from healthy, non-HIV-infected donors. Genetic characterization of the variants was accomplished by DNA sequence analysis. All the women enrolled in this study received ZDV therapy. Virus was cultured from eight out of ten env V3-PCR positive mothers. HIV-1 isolates were all non-syncytium-inducing variants. None of the mothers were found to transmit HIV, as determined by DNA PCR and quantitative co-cultures on their infants, who were seronegative for HIV-1 through one year after birth. Viral cultures from infant blood samples were negative and infants were all healthy. However, nested env V3-PCR detected proviral DNA in five out of ten infants. In contrast, conventional gag-PCR was negative in the same five infants. Sequences of the five maternal-infant pairs were different, suggesting unique infant HIV-1 variants. The three highest maternal viral load values corresponded to infants that were env V3-PCR positive. These results suggest that HIV-1 particles are transmitted from ZDV-treated mothers to infants. Infant follow-up is recommended to determine if HIV-1 has been inhibited by the immune system of the infants.

  11. Whole-exome sequencing identifies rare pathogenic variants in new predisposition genes for familial colorectal cancer

    PubMed Central

    Esteban-Jurado, Clara; Vila-Casadesús, Maria; Garre, Pilar; Lozano, Juan José; Pristoupilova, Anna; Beltran, Sergi; Muñoz, Jenifer; Ocaña, Teresa; Balaguer, Francesc; López-Cerón, Maria; Cuatrecasas, Miriam; Franch-Expósito, Sebastià; Piqué, Josep M.; Castells, Antoni; Carracedo, Angel; Ruiz-Ponte, Clara; Abulí, Anna; Bessa, Xavier; Andreu, Montserrat; Bujanda, Luis; Caldés, Trinidad; Castellví-Bel, Sergi

    2015-01-01

    Purpose: Colorectal cancer is an important cause of mortality in the developed world. Hereditary forms are due to germ-line mutations in APC, MUTYH, and the mismatch repair genes, but many cases present familial aggregation but an unknown inherited cause. The hypothesis of rare high-penetrance mutations in new genes is a likely explanation for the underlying predisposition in some of these familial cases. Methods: Exome sequencing was performed in 43 patients with colorectal cancer from 29 families with strong disease aggregation without mutations in known hereditary colorectal cancer genes. Data analysis selected only very rare variants (0–0.1%), producing a putative loss of function and located in genes with a role compatible with cancer. Variants in genes previously involved in hereditary colorectal cancer or nearby previous colorectal cancer genome-wide association study hits were also chosen. Results: Twenty-eight final candidate variants were selected and validated by Sanger sequencing. Correct family segregation and somatic studies were used to categorize the most interesting variants in CDKN1B, XRCC4, EPHX1, NFKBIZ, SMARCA4, and BARD1. Conclusion: We identified new potential colorectal cancer predisposition variants in genes that have a role in cancer predisposition and are involved in DNA repair and the cell cycle, which supports their putative involvement in germ-line predisposition to this neoplasm. PMID:25058500

  12. Colocalisation of predicted exonic splicing enhancers in BRCA2 with reported sequence variants.

    PubMed

    Pettigrew, Christopher A; Wayte, Nicola; Wronski, Ania; Lovelock, Paul K; Spurdle, Amanda B; Brown, Melissa A

    2008-07-01

    Disruption of the breast cancer susceptibility gene BRCA2 is associated with increased risk of developing breast and ovarian cancer. Over 1800 sequence changes in BRCA2 have been reported, although for many the pathogenicity is unclear. Classifying these changes remains a challenge, as they may disrupt regulatory sequences as well as the primary protein coding sequence. Sequence changes located in the splice site consensus sequences often disrupt splicing; however, sequence changes located within exons are also able to alter splicing patterns. Unfortunately, the presence of these exonic splicing enhancers (ESEs) and the functional effect of variants within ESEs are currently difficult to predict. We have previously developed a method of predicting which sequence changes within exons are likely to affect splicing, using BRCA1 as an example. In this paper, we have predicted ESEs in BRCA2 using the web-based tool ESEfinder and incorporated the same series of filters (increased threshold, 125 nt limit and evolutionary conservation of the motif) in order to identify predicted ESEs that are more likely to be functional. Initially 1114 ESEs were predicted for BRCA2; however, after all the filters were included, this figure was reduced to 31, or 3% of the original number of predicted ESEs. Reported unclassified sequence variants in BRCA2 were found to colocalise to 55% (17/31) of these conserved ESEs, while polymorphisms colocalised to 0 of the conserved ESEs. In summary, we have identified a subset of unclassified sequence variants in BRCA2 that may adversely affect splicing and thereby contribute to BRCA2 disruption.

  13. When is it MODY? Challenges in the Interpretation of Sequence Variants in MODY Genes.

    PubMed

    Althari, Sara; Gloyn, Anna L

    2015-01-01

    The genomics revolution has raised more questions than it has provided answers. Big data from large population-scale resequencing studies are increasingly deconstructing classic notions of Mendelian disease genetics, which support a simplistic correlation between mutational severity and phenotypic outcome. The boundaries are being blurred as the body of evidence showing monogenic disease-causing alleles in healthy genomes, and in the genomes of individuals with increased common complex disease risk, continues to grow. In this review, we focus on the newly emerging challenges which pertain to the interpretation of sequence variants in genes implicated in the pathogenesis of maturity-onset diabetes of the young (MODY), a presumed monogenic form of diabetes characterized by Mendelian inheritance. These challenges highlight the complexities surrounding the assignments of pathogenicity, in particular to rare protein-altering variants, and bring to the forefront some profound clinical diagnostic implications. As MODY is both genetically and clinically heterogeneous, an accurate molecular diagnosis and cautious extrapolation of sequence data are critical to effective disease management and treatment. The biological and translational value of sequence information can only be attained by adopting a multitude of confirmatory analyses, which interrogate variant implication in disease from every possible angle. Indeed, studies which have effectively detected rare damaging variants in known MODY genes in normoglycemic individuals question the existence of a single gene mutation scenario: does monogenic diabetes exist when the genetic culprits of MODY have been systematically identified in individuals without MODY? PMID:27111119

  14. MutaBind estimates and interprets the effects of sequence variants on protein-protein interactions.

    PubMed

    Li, Minghui; Simonetti, Franco L; Goncearenco, Alexander; Panchenko, Anna R

    2016-07-01

    Proteins engage in highly selective interactions with their macromolecular partners. Sequence variants that alter protein binding affinity may cause significant perturbations or complete abolishment of function, potentially leading to diseases. There exists a persistent need to develop a mechanistic understanding of impacts of variants on proteins. To address this need we introduce a new computational method MutaBind to evaluate the effects of sequence variants and disease mutations on protein interactions and calculate the quantitative changes in binding affinity. The MutaBind method uses molecular mechanics force fields, statistical potentials and fast side-chain optimization algorithms. The MutaBind server maps mutations on a structural protein complex, calculates the associated changes in binding affinity, determines the deleterious effect of a mutation, estimates the confidence of this prediction and produces a mutant structural model for download. MutaBind can be applied to a large number of problems, including determination of potential driver mutations in cancer and other diseases, elucidation of the effects of sequence variants on protein fitness in evolution and protein design. MutaBind is available at http://www.ncbi.nlm.nih.gov/projects/mutabind/. PMID:27150810

  15. Using Whole Exome Sequencing to Identify Candidate Genes With Rare Variants In Nonsyndromic Cleft Lip and Palate.

    PubMed

    Aylward, Alana; Cai, Yi; Lee, Andrew; Blue, Elizabeth; Rabinowitz, Daniel; Haddad, Joseph

    2016-07-01

    Studies suggest that nonsyndromic cleft lip and palate (NSCLP) is polygenic with variable penetrance, presenting a challenge in identifying all causal genetic variants. Despite relatively high prevalence of NSCLP among Amerindian populations, no large whole exome sequencing (WES) studies have been completed in this population. Our goal was to identify candidate genes with rare genetic variants for NSCLP in a Honduran population using WES. WES was performed on two to four members of 27 multiplex Honduran families. Genetic variants with a minor allele frequency > 1% in reference databases were removed. Heterozygous variants consistent with dominant disease with incomplete penetrance were ascertained, and variants with predicted functional consequence were prioritized for analysis. Pedigree-specific P-values were calculated as the probability of all affected members in the pedigree being carriers, given that at least one is a carrier. Preliminary results identified 3,727 heterozygous rare variants; 1,282 were predicted to be functionally consequential. Twenty-three genes had variants of interest in ≥3 families, where some genes had different variants in each family, giving a total of 50 variants. Variant validation via Sanger sequencing of the families and unrelated unaffected controls excluded variants that were sequencing errors or common variants not in databases, leaving four genes with candidate variants in ≥3 families. Of these, candidate variants in two genes consistently segregate with NSCLP as a dominant variant with incomplete penetrance: ACSS2 and PHYH. Rare variants found at the same gene in all affected individuals in several families are likely to be directly related to NSCLP. PMID:27229527

  16. Sequencing of SCN5A identifies rare and common variants associated with cardiac conduction

    PubMed Central

    Magnani, Jared W.; Brody, Jennifer A.; Prins, Bram P.; Arking, Dan E.; Lin, Honghuang; Yin, Xiaoyan; Liu, Ching-Ti; Morrison, Alanna C.; Zhang, Feng; Spector, Tim D.; Alonso, Alvaro; Bis, Joshua C.; Heckbert, Susan R.; Lumley, Thomas; Sitlani, Colleen M.; Cupples, L. Adrienne; Lubitz, Steven A.; Soliman, Elsayed Z.; Pulit, Sara L.; Newton-Cheh, Christopher; O'Donnell, Christopher J.; Ellinor, Patrick T.; Benjamin, Emelia J.; Muzny, Donna M.; Gibbs, Richard A.; Santibanez, Jireh; Taylor, Herman A.; Rotter, Jerome I.; Lange, Leslie A.; Psaty, Bruce M.; Jackson, Rebecca; Rich, Stephen S.; Boerwinkle, Eric; Jamshidi, Yalda; Sotoodehnia, Nona

    2014-01-01

    Background: The cardiac sodium channel SCN5A regulates atrioventricular and ventricular conduction. Genetic variants in this gene are associated with PR and QRS intervals. We sought to further characterize the contribution of rare and common coding variation in SCN5A to cardiac conduction. Methods and Results: In the Cohorts for Heart and Aging Research in Genomic Epidemiology Targeted Sequencing Study (CHARGE), we performed targeted exonic sequencing of SCN5A (n=3699, European-ancestry individuals) and identified 4 common (minor allele frequency >1%) and 157 rare variants. Common and rare SCN5A coding variants were examined for association with PR and QRS intervals through meta-analysis of European-ancestry participants from CHARGE, NHLBI’s Exome Sequencing Project (ESP, n=607) and the UK10K (n=1275) and by examining ESP African-ancestry participants (n=972). Rare coding SCN5A variants in aggregate were associated with PR interval in European and African-ancestry participants (P=1.3×10^−3). Three common variants were associated with PR and/or QRS interval duration among European-ancestry participants and one among African-ancestry participants. These included two well-known missense variants: rs1805124 (H558R) was associated with PR and QRS shortening in European-ancestry participants (P=6.25×10^−4 and P=5.2×10^−3, respectively) and rs7626962 (S1102Y) was associated with PR shortening in those of African ancestry (P=2.82×10^−3). Among European-ancestry participants, two novel synonymous variants, rs1805126 and rs6599230, were associated with cardiac conduction. Our top signal, rs1805126, was associated with PR and QRS lengthening (P=3.35×10^−7 and P=2.69×10^−4, respectively), and rs6599230 was associated with PR shortening (P=2.67×10^−5). Conclusions: By sequencing SCN5A, we identified novel common and rare coding variants associated with cardiac conduction. PMID:24951663

  17. Extending Rare-Variant Testing Strategies: Analysis of Noncoding Sequence and Imputed Genotypes

    PubMed Central

    Zawistowski, Matthew; Gopalakrishnan, Shyam; Ding, Jun; Li, Yun; Grimm, Sara; Zöllner, Sebastian

    2010-01-01

    Next Generation Sequencing Technology has revolutionized our ability to study the contribution of rare genetic variation to heritable traits. However, existing single-marker association tests are underpowered for detecting rare risk variants. A more powerful approach involves pooling methods that combine multiple rare variants from the same gene into a single test statistic. Proposed pooling methods can be limited because they generally assume high-quality genotypes derived from deep-coverage sequencing, which may not be available. In this paper, we consider an intuitive and computationally efficient pooling statistic, the cumulative minor-allele test (CMAT). We assess the performance of the CMAT and other pooling methods on datasets simulated with population genetic models to contain realistic levels of neutral variation. We consider study designs ranging from exon-only to whole-gene analyses that contain noncoding variants. For all study designs, the CMAT achieves power comparable to that of previously proposed methods. We then extend the CMAT to probabilistic genotypes and describe application to low-coverage sequencing and imputation data. We show that augmenting sequence data with imputed samples is a practical method for increasing the power of rare-variant studies. We also provide a method of controlling for confounding variables such as population stratification. Finally, we demonstrate that our method makes it possible to use external imputation templates to analyze rare variants imputed into existing GWAS datasets. As proof of principle, we performed a CMAT analysis of more than 8 million SNPs that we imputed into the GAIN psoriasis dataset by using haplotypes from the 1000 Genomes Project. PMID:21070896
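    The general idea of pooling rare variants from one gene into a single test statistic, and of extending it to probabilistic genotypes, can be sketched as below. This is not the published CMAT formula, which the abstract does not state. The sketch substitutes an ordinary Pearson chi-square on a 2 × 2 table of pooled minor versus major allele counts in cases versus controls, and it accepts fractional dosages (expected minor-allele counts between 0 and 2) so that imputed genotypes can be summed in the same way. Function names and the toy data are hypothetical.

```python
def pooled_allele_counts(genotypes):
    """Sum minor and major allele counts over all individuals and variants.

    genotypes holds one list per individual, with one minor-allele dosage
    per variant in the gene. Dosages may be 0, 1, 2 or fractional
    expected counts from imputation.
    """
    minor = sum(sum(ind) for ind in genotypes)
    total = sum(2 * len(ind) for ind in genotypes)  # two alleles per site
    return minor, total - minor


def allele_count_chi2(cases, controls):
    """Pearson chi-square on the pooled 2 x 2 allele-count table.

    A stand-in for a pooling statistic, chosen for illustration only.
    Returns 0.0 when a margin is empty and the statistic is undefined.
    """
    a, b = pooled_allele_counts(cases)      # case minor, case major
    c, d = pooled_allele_counts(controls)   # control minor, control major
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


# Toy data: two individuals per group, two rare variants in the gene.
toy_cases = [[1, 0], [0, 1]]
toy_controls = [[0, 0], [0, 0]]
toy_stat = allele_count_chi2(toy_cases, toy_controls)
```

    In practice, significance for a pooled statistic of this kind is usually assessed by permuting case and control labels rather than by a chi-square reference distribution, because alleles within an individual and across linked variants are not independent.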

  18. Identification of polymorphisms and sequence variants in the human homologue of the mouse natural resistance-associated macrophage protein gene

    SciTech Connect

    Liu, Jing; Fujiwara, T.M.; Buu, N.T.; Sanchez, F.O.; Cellier, M.; Paradis, A.J.; Frappier, D.; Skamene, E.; Gros, P.; Morgan, K.

    1995-04-01

    The most common mycobacterial disease in humans is tuberculosis, and there is evidence for genetic factors in susceptibility to tuberculosis. In the mouse, the Bcg gene controls macrophage priming for activation and is a major gene for susceptibility to infection with mycobacteria. A candidate gene for Bcg was identified by positional cloning and was designated "natural resistance-associated macrophage protein gene" (Nramp1), and the human homologue (NRAMP1) has recently been cloned. Here we report (1) the physical mapping of NRAMP1 close to VIL in chromosome region 2q35 by PCR analysis of somatic cell hybrids and YAC cloning and (2) the identification of nine sequence variants in NRAMP1. Of the four variants in the coding region, there were two missense mutations and two silent substitutions. The missense mutations were a conservative alanine-to-valine substitution at codon 318 in exon 9 and an aspartic acid-to-asparagine substitution at codon 543 in the predicted cytoplasmic tail of the NRAMP1 protein. A microsatellite was located in the immediate 5′ region of the gene, three variants were in introns, and one variant was located in the 3′ UTR. The allele frequencies of each of the nine variants were determined in DNA samples of 60 Caucasians and 20 Asians. In addition, we have physically linked two highly polymorphic microsatellite markers, D2S104 and D2S173, to NRAMP1 on a 1.5-Mb YAC contig. These molecular markers will be useful to assess the role of NRAMP1 in susceptibility to tuberculosis and other macrophage-mediated diseases. 40 refs., 3 figs., 2 tabs.

  19. Amplicon Sequencing of Colorectal Cancer: Variant Calling in Frozen and Formalin-Fixed Samples

    PubMed Central

    Betge, Johannes; Kerr, Grainne; Miersch, Thilo; Leible, Svenja; Erdmann, Gerrit; Galata, Christian L.; Zhan, Tianzuo; Gaiser, Timo; Post, Stefan; Ebert, Matthias P.; Horisberger, Karoline; Boutros, Michael

    2015-01-01

    Next generation sequencing (NGS) is an emerging technology becoming relevant for genotyping of clinical samples. Here, we assessed the stability of amplicon sequencing from formalin-fixed paraffin-embedded (FFPE) and paired frozen samples from colorectal cancer metastases with different analysis pipelines. 212 amplicon regions in 48 cancer related genes were sequenced with Illumina MiSeq using DNA isolated from resection specimens from 17 patients with colorectal cancer liver metastases. From ten of these patients, paired fresh frozen and routinely processed FFPE tissue was available for comparative study. Sample quality of FFPE tissues was determined by the amount of amplifiable DNA using qPCR, sequencing libraries were evaluated using Bioanalyzer. Three bioinformatic pipelines were compared for analysis of amplicon sequencing data. Selected hot spot mutations were reviewed using Sanger sequencing. In the sequenced samples from 16 patients, 29 non-synonymous coding mutations were identified in eleven genes. Most frequent were mutations in TP53 (10), APC (7), PIK3CA (3) and KRAS (2). A high concordance of FFPE and paired frozen tissue samples was observed in ten matched samples, revealing 21 identical mutation calls and only two mutations differing. Comparison of these results with two other commonly used variant calling tools, however, showed high discrepancies. Hence, amplicon sequencing can potentially be used to identify hot spot mutations in colorectal cancer metastases in frozen and FFPE tissue. However, remarkable differences exist among results of different variant calling tools, which are not only related to DNA sample quality. Our study highlights the need for standardization and benchmarking of variant calling pipelines, which will be required for translational and clinical applications. PMID:26010451

  20. Phenolic acid esterases, coding sequences and methods

    DOEpatents

    Blum, David L.; Kataeva, Irina; Li, Xin-Liang; Ljungdahl, Lars G.

    2002-01-01

    Described herein are four phenolic acid esterases, three of which correspond to domains of previously unknown function within bacterial xylanases, from XynY and XynZ of Clostridium thermocellum and from a xylanase of Ruminococcus. The fourth specifically exemplified xylanase is a protein encoded within the genome of Orpinomyces PC-2. The amino acids of these polypeptides and nucleotide sequences encoding them are provided. Recombinant host cells, expression vectors and methods for the recombinant production of phenolic acid esterases are also provided.

  1. Rapid Detection of Rare Deleterious Variants by Next Generation Sequencing with Optional Microarray SNP Genotype Data

    PubMed Central

    Watson, Christopher M.; Crinnion, Laura A.; Gurgel‐Gianetti, Juliana; Harrison, Sally M.; Daly, Catherine; Antanavicuite, Agne; Lascelles, Carolina; Markham, Alexander F.; Pena, Sergio D. J.; Bonthron, David T.

    2015-01-01

    Autozygosity mapping is a powerful technique for the identification of rare, autosomal recessive, disease‐causing genes. The ease with which this category of disease gene can be identified has greatly increased through the availability of genome‐wide SNP genotyping microarrays and subsequently of exome sequencing. Although these methods have simplified the generation of experimental data, its analysis, particularly when disparate data types must be integrated, remains time consuming. Moreover, the huge volume of sequence variant data generated from next generation sequencing experiments opens up the possibility of using these data instead of microarray genotype data to identify disease loci. To allow these two types of data to be used in an integrated fashion, we have developed AgileVCFMapper, a program that performs both the mapping of disease loci by SNP genotyping and the analysis of potentially deleterious variants using exome sequence variant data, in a single step. This method does not require microarray SNP genotype data, although analysis with a combination of microarray and exome genotype data enables more precise delineation of disease loci, due to superior marker density and distribution. PMID:26037133

  2. Novel scripts for improved annotation and selection of variants from whole exome sequencing in cancer research.

    PubMed

    Hansen, Marcus Celik; Nederby, Line; Roug, Anne; Villesen, Palle; Kjeldsen, Eigil; Nyvold, Charlotte Guldborg; Hokland, Peter

    2015-01-01

    Sequencing the exome is quickly becoming the preferred method for discovering disease-inducing mutations. While obtaining data sets is a straightforward procedure, the subsequent analysis and interpretation of the data is a limiting step for clinical applications. Thus, while the initial mutation and variant calling can be performed by a bioinformatician or trained researcher, the output from robust packages such as MuTect and GATK is not directly informative for the general life scientist. In an attempt to obviate this problem we have created complementary Wolfram scripts, which enable easy downstream annotation and selection, presented here in the perspective of hematological relevance. The scripts also give the researcher the opportunity to extend the analysis by having the full-fledged programming and analysis environment of Mathematica at hand. In brief, post-processing is performed by:
    • Mapping of germ line and somatic variants to coding regions, and defining variant sets within Mathematica.
    • Processing of variants in variant effect predictor.
    • Extended annotation, relevance scoring and defining focus areas through the provided functions.
    PMID:26150983

  3. HLA class II sequence variants influence tuberculosis risk in populations of European ancestry.

    PubMed

    Sveinbjornsson, Gardar; Gudbjartsson, Daniel F; Halldorsson, Bjarni V; Kristinsson, Karl G; Gottfredsson, Magnus; Barrett, Jeffrey C; Gudmundsson, Larus J; Blondal, Kai; Gylfason, Arnaldur; Gudjonsson, Sigurjon Axel; Helgadottir, Hafdis T; Jonasdottir, Adalbjorg; Jonasdottir, Aslaug; Karason, Ari; Kardum, Ljiljana Bulat; Knežević, Jelena; Kristjansson, Helgi; Kristjansson, Mar; Love, Arthur; Luo, Yang; Magnusson, Olafur T; Sulem, Patrick; Kong, Augustine; Masson, Gisli; Thorsteinsdottir, Unnur; Dembic, Zlatko; Nejentsev, Sergey; Blondal, Thorsteinn; Jonsdottir, Ingileif; Stefansson, Kari

    2016-03-01

    Mycobacterium tuberculosis infections cause 9 million new tuberculosis cases and 1.5 million deaths annually. To identify variants conferring risk of tuberculosis, we tested 28.3 million variants identified through whole-genome sequencing of 2,636 Icelanders for association with tuberculosis (8,162 cases and 277,643 controls), pulmonary tuberculosis (PTB) and M. tuberculosis infection. We found association of three variants in the region harboring genes encoding the class II human leukocyte antigens (HLAs): rs557011[T] (minor allele frequency (MAF) = 40.2%), associated with M. tuberculosis infection (odds ratio (OR) = 1.14, P = 3.1 × 10^-13) and PTB (OR = 1.25, P = 5.8 × 10^-12), and rs9271378[G] (MAF = 32.5%), associated with PTB (OR = 0.78, P = 2.5 × 10^-12)--both located between HLA-DQA1 and HLA-DRB1--and a missense variant encoding p.Ala210Thr in HLA-DQA1 (MAF = 19.1%, rs9272785), associated with M. tuberculosis infection (P = 9.3 × 10^-9, OR = 1.14). We replicated association of these variants with PTB in samples of European ancestry from Russia and Croatia (P < 5.9 × 10^-4). These findings show that the HLA class II region contributes to genetic risk of tuberculosis, possibly through reduced presentation of protective M. tuberculosis antigens to T cells.
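
    For readers unfamiliar with the odds ratios quoted above, the sketch below computes an allelic odds ratio and a Woolf (log-scale) 95% confidence interval from a 2x2 table of allele counts. The counts are hypothetical and are not taken from the study, and the study's own estimates may well come from a regression model rather than a raw table.

```python
import math

def odds_ratio(case_alt, case_ref, control_alt, control_ref):
    """Odds of carrying the tested allele in cases divided by the odds in controls."""
    return (case_alt * control_ref) / (case_ref * control_alt)

def woolf_confidence_interval(case_alt, case_ref, control_alt, control_ref, z=1.96):
    """Approximate confidence interval built on the log odds ratio (Woolf method)."""
    log_or = math.log(odds_ratio(case_alt, case_ref, control_alt, control_ref))
    standard_error = math.sqrt(
        1 / case_alt + 1 / case_ref + 1 / control_alt + 1 / control_ref
    )
    return math.exp(log_or - z * standard_error), math.exp(log_or + z * standard_error)

# Hypothetical allele counts: tested allele vs. other allele, in cases and controls.
toy_counts = (30, 70, 20, 80)
toy_or = odds_ratio(*toy_counts)
toy_low, toy_high = woolf_confidence_interval(*toy_counts)
```

    An odds ratio below 1, such as the 0.78 reported for rs9271378[G], indicates that the tested allele is less common among cases than among controls.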

  4. CLAMMS: a scalable algorithm for calling common and rare copy number variants from exome sequencing data

    PubMed Central

    Packer, Jonathan S.; Maxwell, Evan K.; O’Dushlaine, Colm; Lopez, Alexander E.; Dewey, Frederick E.; Chernomorsky, Rostislav; Baras, Aris; Overton, John D.; Habegger, Lukas; Reid, Jeffrey G.

    2016-01-01

    Motivation: Several algorithms exist for detecting copy number variants (CNVs) from human exome sequencing read depth, but previous tools have not been well suited for large population studies on the order of tens or hundreds of thousands of exomes. Their limitations include being difficult to integrate into automated variant-calling pipelines and being ill-suited for detecting common variants. To address these issues, we developed a new algorithm—Copy number estimation using Lattice-Aligned Mixture Models (CLAMMS)—which is highly scalable and suitable for detecting CNVs across the whole allele frequency spectrum. Results: In this note, we summarize the methods and intended use-case of CLAMMS, compare it to previous algorithms and briefly describe results of validation experiments. We evaluate the adherence of CNV calls from CLAMMS and four other algorithms to Mendelian inheritance patterns on a pedigree; we compare calls from CLAMMS and other algorithms to calls from SNP genotyping arrays for a set of 3164 samples; and we use TaqMan quantitative polymerase chain reaction to validate CNVs predicted by CLAMMS at 39 loci (95% of rare variants validate; across 19 common variant loci, the mean precision and recall are 99% and 94%, respectively). In the Supplementary Materials (available at the CLAMMS Github repository), we present our methods and validation results in greater detail. Availability and implementation: https://github.com/rgcgithub/clamms (implemented in C). Contact: jeffrey.reid@regeneron.com Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26382196
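
    The precision and recall figures quoted in this abstract can be read with the help of the following minimal sketch, which compares a set of predicted calls against a validation set. The sample identifiers are hypothetical, and treating each call as a simple set member is an assumption made for brevity; the abstract does not describe how CLAMMS calls were matched to the TaqMan results.

```python
def precision_recall(predicted, truth):
    """Return (precision, recall) for predicted calls against a validated truth set."""
    predicted, truth = set(predicted), set(truth)
    true_positives = len(predicted & truth)
    # Guard against empty sets so the function never divides by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall

# Hypothetical carriers of one copy number variant.
toy_predicted = {"s1", "s2", "s3", "s4"}
toy_truth = {"s1", "s2", "s3", "s5", "s6"}
toy_precision, toy_recall = precision_recall(toy_predicted, toy_truth)
```

    Precision answers how many reported calls were confirmed, and recall answers how many confirmed events were reported, so the two must be read together.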

  5. De novo sequencing and variant calling with nanopores using PoreSeq

    PubMed Central

    Szalay, Tamas; Golovchenko, Jene A.

    2016-01-01

    The single-molecule accuracy of nanopore sequencing has been an area of rapid academic and commercial advancement, but remains challenging for the de novo analysis of genomes. We introduce here a novel algorithm for the error correction of nanopore data, utilizing statistical models of the physical system in order to obtain high accuracy de novo sequences at a range of coverage depths. We demonstrate the technique by sequencing M13 bacteriophage DNA to 99% accuracy at moderate coverage as well as its use in an assembly pipeline by sequencing E. coli and λ DNA at a range of coverages. We also show the algorithm’s ability to accurately classify sequence variants at far lower coverage than existing methods. PMID:26352647

  6. Complete Nucleotide Sequence Analysis of the Norovirus GII.4 Sydney Variant in South Korea

    PubMed Central

    Park, Ji-Sun; Lee, Sung-Geun; Cho, Han-Gil; Jheong, Weon-Hwa; Paik, Soon-Young

    2015-01-01

    Norovirus is the primary cause of acute gastroenteritis in individuals of all ages. In Australia, a new strain of norovirus (GII.4) was identified in March 2012, and this strain has spread rapidly around the world. In August 2012, this new GII.4 strain was identified in patients in South Korea. Therefore, to examine the characteristics of the epidemic norovirus GII.4 2012 variant in South Korea, we conducted KM272334 full-length genomic analysis. The genome of the gg-12-08-04 strain consisted of 7,558 bp and contained three open reading frame (ORF) composites throughout the whole genome: ORF1 (5,100 bp), ORF2 (1,623 bp), and ORF3 (807 bp). Phylogenetic analyses showed that gg-12-08-04 belonged to the GII.4 Sydney 2012 variant, sharing 98.92% nucleotide similarity with this variant strain. According to SimPlot analysis, the gg-12-08-04 strain was a recombinant strain with breakpoint at the ORF1/2 junction between Osaka 2007 and Apeldoorn 2008 strains. This study is the first report of the complete sequence of the GII.4 Sydney 2012 strain in South Korea. Therefore, this may represent the standard sequence of the norovirus GII.4 2012 variant in South Korea and could therefore be useful for the development of norovirus vaccines. PMID:25688356

  7. Adjusted sequence kernel association test for rare variants controlling for cryptic and family relatedness.

    PubMed

    Oualkacha, Karim; Dastani, Zari; Li, Rui; Cingolani, Pablo E; Spector, Timothy D; Hammond, Christopher J; Richards, J Brent; Ciampi, Antonio; Greenwood, Celia M T

    2013-05-01

    Recent progress in sequencing technologies makes it possible to identify rare and unique variants that may be associated with complex traits. However, the results of such efforts depend crucially on the use of efficient statistical methods and study designs. Although family-based designs might enrich a data set for familial rare disease variants, most existing rare variant association approaches assume independence of all individuals. We introduce here a framework for association testing of rare variants in family-based designs. This framework is an adaptation of the sequence kernel association test (SKAT) which allows us to control for family structure. Our adjusted SKAT (ASKAT) combines the SKAT approach and the factored spectrally transformed linear mixed models (FaST-LMMs) algorithm to capture family effects based on a LMM incorporating the realized proportion of the genome that is identical by descent between pairs of individuals, and using restricted maximum likelihood methods for estimation. In simulation studies, we evaluated type I error and power of this proposed method and we showed that regardless of the level of the trait heritability, our approach has good control of type I error and good power. Since our approach uses FaST-LMM to calculate variance components for the proposed mixed model, ASKAT is reasonably fast and can analyze hundreds of thousands of markers. Data from the UK twins consortium are presented to illustrate the ASKAT methodology. PMID:23529756
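
    The kernel statistic at the core of SKAT can be sketched for the simplest case: a weighted linear kernel, an intercept-only null model, and unrelated individuals. The sketch deliberately omits the mixed-model adjustment for relatedness that distinguishes ASKAT, and it stops at the statistic, whose null distribution is a mixture of chi-square variables. Function and variable names are hypothetical.

```python
def skat_q(genotypes, phenotypes, weights):
    """Weighted linear-kernel score statistic.

    Q = sum_j (w_j * S_j)^2, where S_j = sum_i g_ij * (y_i - mean(y)).
    genotypes is a list of per-sample rows with 0/1/2 minor-allele counts.
    """
    n_samples = len(phenotypes)
    mean_y = sum(phenotypes) / n_samples
    residuals = [y - mean_y for y in phenotypes]  # intercept-only null model
    q = 0.0
    for j, weight in enumerate(weights):
        score = sum(genotypes[i][j] * residuals[i] for i in range(n_samples))
        q += (weight * score) ** 2
    return q

# Hypothetical data: 4 unrelated samples, 2 variants, equal weights.
toy_genotypes = [[1, 0], [0, 1], [1, 1], [0, 0]]
toy_phenotypes = [1.0, 0.0, 1.0, 0.0]
toy_q = skat_q(toy_genotypes, toy_phenotypes, weights=[1.0, 1.0])
```

    Because each variant's score is squared before summing, variants with opposite directions of effect do not cancel, which is the usual motivation for kernel tests over simple burden counts.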

  8. Complete nucleotide sequence analysis of the norovirus GII.4 Sydney variant in South Korea.

    PubMed

    Park, Ji-Sun; Lee, Sung-Geun; Jin, Ji-Young; Cho, Han-Gil; Jheong, Weon-Hwa; Paik, Soon-Young

    2015-01-01

    Norovirus is the primary cause of acute gastroenteritis in individuals of all ages. In Australia, a new strain of norovirus (GII.4) was identified in March 2012, and this strain has spread rapidly around the world. In August 2012, this new GII.4 strain was identified in patients in South Korea. Therefore, to examine the characteristics of the epidemic norovirus GII.4 2012 variant in South Korea, we conducted KM272334 full-length genomic analysis. The genome of the gg-12-08-04 strain consisted of 7,558 bp and contained three open reading frame (ORF) composites throughout the whole genome: ORF1 (5,100 bp), ORF2 (1,623 bp), and ORF3 (807 bp). Phylogenetic analyses showed that gg-12-08-04 belonged to the GII.4 Sydney 2012 variant, sharing 98.92% nucleotide similarity with this variant strain. According to SimPlot analysis, the gg-12-08-04 strain was a recombinant strain with breakpoint at the ORF1/2 junction between Osaka 2007 and Apeldoorn 2008 strains. This study is the first report of the complete sequence of the GII.4 Sydney 2012 strain in South Korea. Therefore, this may represent the standard sequence of the norovirus GII.4 2012 variant in South Korea and could therefore be useful for the development of norovirus vaccines.

  9. Coordinate amplification of metallothionein I and II gene sequences in cadmium-resistant CHO variants

    SciTech Connect

    Hildebrand, C.E.; Crawford, B.D.; Enger, M.D.

    1983-01-01

    Cadmium-resistant (Cd^r) variants of the Chinese hamster cell line, CHO, have been derived by stepwise selection for growth in medium containing CdCl2. These variants show coordinately increased production of both metallothionein (MT) I and II and are stably resistant to Cd^2+ in the absence of continued selection. Genomic DNAs from these Cd^r sublines were analyzed for both MT gene copy number and MT gene organization, using cDNA sequence probes specific for each of the two Chinese hamster isometallothioneins. These analyses revealed coordinate amplification of MT I and II genes in all Cd^r variants which had increased copies of MT-encoding sequences. In situ hybridization of an MT-encoding probe to mitotic chromosomes of a Cd^r variant, which has amplified MT genes at least 14-fold, revealed a single chromosomal site of hybridization. These results suggest that the isoMTs constitute a functionally related gene cluster which amplifies coordinately in response to toxic metal stress.

  10. Method for identifying and quantifying nucleic acid sequence aberrations

    DOEpatents

    Lucas, J.N.; Straume, T.; Bogen, K.T.

    1998-07-21

    A method is disclosed for detecting nucleic acid sequence aberrations by detecting nucleic acid sequences having both a first and a second nucleic acid sequence type, the presence of the first and second sequence type on the same nucleic acid sequence indicating the presence of a nucleic acid sequence aberration. The method uses a first hybridization probe which includes a nucleic acid sequence that is complementary to a first sequence type and a first complexing agent capable of attaching to a second complexing agent and a second hybridization probe which includes a nucleic acid sequence that selectively hybridizes to the second nucleic acid sequence type over the first sequence type and includes a detectable marker for detecting the second hybridization probe. 11 figs.

  11. Method for identifying and quantifying nucleic acid sequence aberrations

    DOEpatents

    Lucas, Joe N.; Straume, Tore; Bogen, Kenneth T.

    1998-01-01

    A method for detecting nucleic acid sequence aberrations by detecting nucleic acid sequences having both a first and a second nucleic acid sequence type, the presence of the first and second sequence type on the same nucleic acid sequence indicating the presence of a nucleic acid sequence aberration. The method uses a first hybridization probe which includes a nucleic acid sequence that is complementary to a first sequence type and a first complexing agent capable of attaching to a second complexing agent and a second hybridization probe which includes a nucleic acid sequence that selectively hybridizes to the second nucleic acid sequence type over the first sequence type and includes a detectable marker for detecting the second hybridization probe.

  12. Integrating mapping-, assembly- and haplotype-based approaches for calling variants in clinical sequencing applications.

    PubMed

    Rimmer, Andy; Phan, Hang; Mathieson, Iain; Iqbal, Zamin; Twigg, Stephen R F; Wilkie, Andrew O M; McVean, Gil; Lunter, Gerton

    2014-08-01

    High-throughput DNA sequencing technology has transformed genetic research and is starting to make an impact on clinical practice. However, analyzing high-throughput sequencing data remains challenging, particularly in clinical settings where accuracy and turnaround times are critical. We present a new approach to this problem, implemented in a software package called Platypus. Platypus achieves high sensitivity and specificity for SNPs, indels and complex polymorphisms by using local de novo assembly to generate candidate variants, followed by local realignment and probabilistic haplotype estimation. It is an order of magnitude faster than existing tools and generates calls from raw aligned read data without preprocessing. We demonstrate the performance of Platypus in clinically relevant experimental designs by comparing with SAMtools and GATK on whole-genome and exome-capture data, by identifying de novo variation in 15 parent-offspring trios with high sensitivity and specificity, and by estimating human leukocyte antigen genotypes directly from variant calls. PMID:25017105

  13. Polymorphisms and variants in the prion protein sequence of European moose (Alces alces), reindeer (Rangifer tarandus), roe deer (Capreolus capreolus) and fallow deer (Dama dama) in Scandinavia.

    PubMed

    Wik, Lotta; Mikko, Sofia; Klingeborn, Mikael; Stéen, Margareta; Simonsson, Magnus; Linné, Tommy

    2012-07-01

    The prion protein (PrP) sequence of European moose, reindeer, roe deer and fallow deer in Scandinavia has high homology to the PrP sequence of North American cervids. Variants in the European moose PrP sequence were found at amino acid position 109 as K or Q. The 109Q variant is unique in the PrP sequence of vertebrates. During the 1980s a wasting syndrome in Swedish moose, Moose Wasting Syndrome (MWS), was described. SNP analysis demonstrated a difference in the observed genotype proportions of the heterozygous Q/K and homozygous Q/Q variants in the MWS animals compared with the healthy animals. In MWS moose the allele frequencies for 109K and 109Q were 0.73 and 0.27, respectively, and for healthy animals 0.69 and 0.31. Both alleles were seen as heterozygotes and homozygotes. In reindeer, PrP sequence variation was demonstrated at codon 176 as D or N and codon 225 as S or Y. The PrP sequences in roe deer and fallow deer were identical with published GenBank sequences.
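
    Allele frequencies such as those reported for 109K and 109Q follow directly from genotype counts, as the sketch below shows. The genotype counts used here are hypothetical; the abstract reports only the resulting frequencies, not the underlying counts.

```python
def allele_frequencies(n_kk, n_kq, n_qq):
    """Return (freq_K, freq_Q) from counts of K/K, K/Q and Q/Q animals."""
    total_alleles = 2 * (n_kk + n_kq + n_qq)  # two alleles per diploid animal
    freq_k = (2 * n_kk + n_kq) / total_alleles
    return freq_k, 1.0 - freq_k

# Hypothetical genotype counts for 100 animals.
toy_freq_k, toy_freq_q = allele_frequencies(n_kk=50, n_kq=40, n_qq=10)
```

    Each homozygote contributes two copies of its allele and each heterozygote one copy of each, which is why the heterozygote count enters the numerator only once.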

  14. Sequence variant classification and reporting: recommendations for improving the interpretation of cancer susceptibility genetic test results.

    PubMed

    Plon, Sharon E; Eccles, Diana M; Easton, Douglas; Foulkes, William D; Genuardi, Maurizio; Greenblatt, Marc S; Hogervorst, Frans B L; Hoogerbrugge, Nicoline; Spurdle, Amanda B; Tavtigian, Sean V

    2008-11-01

    Genetic testing of cancer susceptibility genes is now widely applied in clinical practice to predict risk of developing cancer. In general, sequence-based testing of germline DNA is used to determine whether an individual carries a change that is clearly likely to disrupt normal gene function. Genetic testing may detect changes that are clearly pathogenic, clearly neutral, or variants of unclear clinical significance. Such variants present a considerable challenge to the diagnostic laboratory and the receiving clinician in terms of interpretation and clear presentation of the implications of the result to the patient. There does not appear to be a consistent approach to interpreting and reporting the clinical significance of variants either among genes or among laboratories. The potential for confusion among clinicians and patients is considerable and misinterpretation may lead to inappropriate clinical consequences. In this article we review the current state of sequence-based genetic testing, describe other standardized reporting systems used in oncology, and propose a standardized classification system for application to sequence-based results for cancer predisposition genes. We suggest a system of five classes of variants based on the degree of likelihood of pathogenicity. Each class is associated with specific recommendations for clinical management of at-risk relatives that will depend on the syndrome. We propose that panels of experts on each cancer predisposition syndrome facilitate the classification scheme and designate appropriate surveillance and cancer management guidelines. The international adoption of a standardized reporting system should improve the clinical utility of sequence-based genetic tests to predict cancer risk. PMID:18951446

  15. Optimization of short amino acid sequences classifier

    NASA Astrophysics Data System (ADS)

    Barcz, Aleksy; Szymański, Zbigniew

    This article describes processing methods used for classifying short amino acid sequences. The data are 9-symbol string representations of amino acid sequences, divided into 49 data sets, each containing samples labeled as reacting or not reacting with a given enzyme. The goal of the classification is to determine, for a single enzyme, whether an amino acid sequence would react with it. Each data set is processed separately. Feature selection is performed to reduce the number of dimensions for each data set. The method used for feature selection consists of two phases. During the first phase, significant positions are selected using Classification and Regression Trees. Afterwards, symbols appearing at the selected positions are substituted with numeric values of amino acid properties taken from the AAindex database. In the second phase, the new set of features is reduced using a correlation-based ranking formula and Gram-Schmidt orthogonalization. Finally, the preprocessed data are used for training LS-SVM classifiers. SPDE, an evolutionary algorithm, is used to obtain optimal hyperparameters for the LS-SVM classifier, such as the error penalty parameter C and kernel-specific hyperparameters. A simple score penalty is used to adapt the SPDE algorithm to the task of selecting the classifiers with the best performance measures.

  16. Direct sequence evaluation of the major outer membrane protein gene variant regions of Chlamydia trachomatis subtypes D', I', and L2'.

    PubMed Central

    Dean, D; Patton, M; Stephens, R S

    1991-01-01

    The nucleotide sequences of variable segments (VS) 1, 2, and 4 for the major outer membrane protein gene (omp1) of Chlamydia trachomatis were determined for serologically defined subtypes D', I', and L2'. Asymmetric DNA amplification was used to produce single-stranded DNA for direct sequencing. Amino acid substitutions were detected in VS1, VS2, and VS4 for I', in VS2 for L2', and in VS4 for D'. DNA sequencing of omp1 variant regions may be an important method for evaluating the molecular epidemiology of Chlamydia spp. PMID:1706325

  17. Direct sequence evaluation of the major outer membrane protein gene variant regions of Chlamydia trachomatis subtypes D', I', and L2'.

    PubMed

    Dean, D; Patton, M; Stephens, R S

    1991-04-01

    The nucleotide sequences of variable segments (VS) 1, 2, and 4 for the major outer membrane protein gene (omp1) of Chlamydia trachomatis were determined for serologically defined subtypes D', I', and L2'. Asymmetric DNA amplification was used to produce single-stranded DNA for direct sequencing. Amino acid substitutions were detected in VS1, VS2, and VS4 for I', in VS2 for L2', and in VS4 for D'. DNA sequencing of omp1 variant regions may be an important method for evaluating the molecular epidemiology of Chlamydia spp.

  18. Methods for analyzing nucleic acid sequences

    DOEpatents

    Korlach, Jonas; Webb, Watt W.; Levene, Michael; Turner, Stephen; Craighead, Harold G.; Foquet, Mathieu

    2011-05-17

    The present invention is directed to a method of sequencing a target nucleic acid. The method provides a complex comprising a polymerase enzyme, a target nucleic acid molecule, and a primer, wherein the complex is immobilized on a support. A fluorescent label is attached to a terminal phosphate group of the nucleotide or nucleotide analog. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. The time duration of the signal from labeled nucleotides or nucleotide analogs that become incorporated is distinguished from freely diffusing labels by a longer retention in the observation volume for the nucleotides or nucleotide analogs that become incorporated than for the freely diffusing labels.

  19. Nucleotide sequence of a hop stunt viroid variant isolated from citrus growing in Taiwan.

    PubMed

    Hsu, Y H; Chen, W; Owens, R A

    1995-01-01

    The 303 nucleotide sequence of HSVd-citrus(T), a hop stunt viroid (HSVd) variant present in Etrog citron growing in Taiwan, was determined from cDNAs amplified by the polymerase chain reaction. HSVd-citrus(T) is very similar to several HSVd isolates previously recovered from citrus or cucumber, and exhibits microsequence heterogeneity at positions 154 and 181. Phylogenetic analysis using maximum parsimony grouped HSVd-citrus(T) with seven other isolates from citrus and cucumber in a large cluster of "citrus-type" isolates. A similar analysis revealed marked differences in both the extent and distribution of sequence variation among naturally occurring isolates of potato spindle tuber viroid.

  20. Evaluation of Nine Somatic Variant Callers for Detection of Somatic Mutations in Exome and Targeted Deep Sequencing Data.

    PubMed

    Krøigård, Anne Bruun; Thomassen, Mads; Lænkholm, Anne-Vibeke; Kruse, Torben A; Larsen, Martin Jakob

    2016-01-01

    Next generation sequencing is extensively applied to catalogue somatic mutations in cancer, in research settings and increasingly in clinical settings for molecular diagnostics, guiding therapy decisions. Somatic variant callers perform paired comparisons of sequencing data from cancer tissue and matched normal tissue in order to detect somatic mutations. The advent of many new somatic variant callers creates a need for comparison and validation of the tools, as no de facto standard for detection of somatic mutations exists and only limited comparisons have been reported. We have performed a comprehensive evaluation using exome sequencing and targeted deep sequencing data of paired tumor-normal samples from five breast cancer patients to evaluate the performance of nine publicly available somatic variant callers: EBCall, Mutect, Seurat, Shimmer, Indelocator, Somatic Sniper, Strelka, VarScan 2 and Virmid for the detection of single nucleotide mutations and small deletions and insertions. We report a large variation in the number of calls from the nine somatic variant callers on the same sequencing data and highly variable agreement. Sequencing depth had markedly diverse impact on individual callers, as for some callers, increased sequencing depth highly improved sensitivity. For SNV calling, we report EBCall, Mutect, Virmid and Strelka to be the most reliable somatic variant callers for both exome sequencing and targeted deep sequencing. For indel calling, EBCall is superior due to high sensitivity and robustness to changes in sequencing depths.

  1. Evaluation of Nine Somatic Variant Callers for Detection of Somatic Mutations in Exome and Targeted Deep Sequencing Data

    PubMed Central

    Krøigård, Anne Bruun; Thomassen, Mads; Lænkholm, Anne-Vibeke; Kruse, Torben A.; Larsen, Martin Jakob

    2016-01-01

    Next generation sequencing is extensively applied to catalogue somatic mutations in cancer, in research settings and increasingly in clinical settings for molecular diagnostics, guiding therapy decisions. Somatic variant callers perform paired comparisons of sequencing data from cancer tissue and matched normal tissue in order to detect somatic mutations. The advent of many new somatic variant callers creates a need for comparison and validation of the tools, as no de facto standard for detection of somatic mutations exists and only limited comparisons have been reported. We have performed a comprehensive evaluation using exome sequencing and targeted deep sequencing data of paired tumor-normal samples from five breast cancer patients to evaluate the performance of nine publicly available somatic variant callers: EBCall, Mutect, Seurat, Shimmer, Indelocator, Somatic Sniper, Strelka, VarScan 2 and Virmid for the detection of single nucleotide mutations and small deletions and insertions. We report a large variation in the number of calls from the nine somatic variant callers on the same sequencing data and highly variable agreement. Sequencing depth had markedly diverse impact on individual callers, as for some callers, increased sequencing depth highly improved sensitivity. For SNV calling, we report EBCall, Mutect, Virmid and Strelka to be the most reliable somatic variant callers for both exome sequencing and targeted deep sequencing. For indel calling, EBCall is superior due to high sensitivity and robustness to changes in sequencing depths. PMID:27002637

  2. Family-based association test using both common and rare variants and accounting for directions of effects for sequencing data.

    PubMed

    Chung, Ren-Hua; Tsai, Wei-Yun; Martin, Eden R

    2014-01-01

    Current family-based association tests for sequencing data were mainly developed for identifying rare variants associated with a complex disease. As the disease can be influenced by the joint effects of common and rare variants, common variants with modest effects may not be identified by the methods focusing on rare variants. Moreover, variants can have risk, neutral, or protective effects. Association tests that can effectively select groups of common and rare variants that are likely to be causal and consider the directions of effects have become important. We developed the Ordered Subset - Variable Threshold - Pedigree Disequilibrium Test (OVPDT), a combination of three algorithms, for association analysis in family sequencing data. The ordered subset algorithm is used to select a subset of common variants based on their relative risks, calculated using only parental mating types. The variable threshold algorithm is used to search for an optimal allele frequency threshold such that rare variants below the threshold are more likely to be causal. The PDT statistics from both rare and common variants selected by the two algorithms are combined as the OVPDT statistic. A permutation procedure is used in OVPDT to calculate the p-value. We used simulations to demonstrate that OVPDT has the correct type I error rates under different scenarios and compared the power of OVPDT with two other family-based association tests. The results suggested that OVPDT can have more power than the other tests if both common and rare variants have effects on the disease in a region.

  3. Sequence variants in oxytocin pathway genes and preterm birth: a candidate gene association study

    PubMed Central

    2013-01-01

    Background Preterm birth (PTB) is a complex disorder associated with significant neonatal mortality and morbidity and long-term adverse health consequences. Multiple lines of evidence suggest that genetic factors play an important role in its etiology. This study was designed to identify genetic variation associated with PTB in oxytocin pathway genes whose role in parturition is well known. Methods To identify common genetic variants predisposing to PTB, we genotyped 16 single nucleotide polymorphisms (SNPs) in the oxytocin (OXT), oxytocin receptor (OXTR), and leucyl/cystinyl aminopeptidase (LNPEP) genes in 651 case infants from the U.S. and one or both of their parents. In addition, we examined the role of rare genetic variation in susceptibility to PTB by conducting direct sequence analysis of OXTR in 1394 cases and 1112 controls from the U.S., Argentina, Denmark, and Finland. This study was further extended to maternal triads (maternal grandparents-mother of a case infant, N=309). We also performed in vitro analysis of selected rare OXTR missense variants to evaluate their functional importance. Results Maternal genetic effect analysis of the SNP genotype data revealed four SNPs in LNPEP that show significant association with prematurity. In our case–control sequence analysis, we detected fourteen coding variants in exon 3 of OXTR, all but four of which were found in cases only. Of the fourteen variants, three were previously unreported novel rare variants. When the sequence data from the maternal triads were analyzed using the transmission disequilibrium test, two common missense SNPs (rs4686302 and rs237902) in OXTR showed suggestive association for three gestational age subgroups. In vitro functional assays showed a significant difference in ligand binding between wild-type and two mutant receptors. Conclusions Our study suggests an association between maternal common polymorphisms in LNPEP and susceptibility to PTB. Maternal OXTR missense SNPs rs4686302

  4. Targeted next-generation sequencing reveals multiple deleterious variants in OPLL-associated genes

    PubMed Central

    Chen, Xin; Guo, Jun; Cai, Tao; Zhang, Fengshan; Pan, Shengfa; Zhang, Li; Wang, Shaobo; Zhou, Feifei; Diao, Yinze; Zhao, Yanbin; Chen, Zhen; Liu, Xiaoguang; Chen, Zhongqiang; Liu, Zhongjun; Sun, Yu; Du, Jie

    2016-01-01

    Ossification of the posterior longitudinal ligament of the spine (OPLL), which is characterized by ectopic bone formation in the spinal ligaments, can cause spinal-cord compression. To date, at least 11 susceptibility genes have been genetically linked to OPLL. In order to identify potential deleterious alleles in these OPLL-associated genes, we designed a capture array encompassing all coding regions of the target genes for next-generation sequencing (NGS) in a cohort of 55 unrelated patients with OPLL. By bioinformatics analyses, we successfully identified three novel and five extremely rare variants (MAF < 0.005). These variants were predicted to be deleterious by various commonly used algorithms, thereby resulting in missense mutations in four OPLL-associated genes (i.e., COL6A1, COL11A2, FGFR1, and BMP2). Furthermore, potential effects of the p.Q89E variant of BMP2 were confirmed by a markedly increased BMP2 level in peripheral blood samples from the patient carrying it. Notably, seven of the variants were found to be associated with the patients with continuous subtype changes by cervical spinal radiological analyses. Taken together, our findings revealed for the first time that deleterious coding variants of the four OPLL-associated genes are potentially pathogenic in the patients with OPLL. PMID:27246988

  5. Next-generation sequencing reveals large connected networks of intra-host HCV variants

    PubMed Central

    2014-01-01

    Background Next-generation sequencing (NGS) allows for sampling numerous viral variants from infected patients. This provides a novel opportunity to represent and study the mutational landscape of Hepatitis C Virus (HCV) within a single host. Results Intra-host variants of the HCV E1/E2 region were extensively sampled from 58 chronically infected patients. After NGS error correction, the average numbers of reads and variants obtained from each sample were 3202 and 464, respectively. The distance between each pair of variants was calculated and networks were created for each patient, where each node is a variant and two nodes are connected by a link if the nucleotide distance between them is 1. The work focused on large components having > 5% of all reads, which on average account for 93.7% of all reads found in a patient. The distance between any two variants calculated over the component correlated strongly with nucleotide distances (r = 0.9499; p = 0.0001), a better correlation than the one obtained with Neighbour-Joining trees (r = 0.7624; p = 0.0001). In each patient, components were well separated, with the average distance between components (6.53%) being 10 times greater than within each component (0.68%). The ratio of nonsynonymous to synonymous changes was calculated and some patients (6.9%) showed a mixture of networks under strong negative and positive selection. All components were robust to in silico stochastic sampling; even after randomly removing 85% of all reads, the largest connected component in the new subsample still involved 82.4% of remaining nodes. In vitro sampling showed that 93.02% of components present in the original sample were also found in experimental replicas, with 81.6% of reads found in both. When syringe-sharing transmission events were simulated, 91.2% of all simulated transmission events seeded all components present in the source. Conclusions Most intra-host variants are organized into distinct single-mutation components that are: well
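
    The network construction described in this abstract can be illustrated with a short sketch. It assumes that variants are aligned sequences of equal length, takes "nucleotide distance" to mean Hamming distance, and applies the 5% read cutoff as a strict inequality. The function names and the toy read counts are hypothetical and are not taken from the study.

```python
from collections import defaultdict
from itertools import combinations


def hamming(a, b):
    """Count mismatching positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def one_step_components(read_counts):
    """Connected components of the graph that links variants at distance 1.

    read_counts maps each variant sequence to its number of reads.
    """
    neighbours = defaultdict(set)
    for a, b in combinations(read_counts, 2):
        if hamming(a, b) == 1:
            neighbours[a].add(b)
            neighbours[b].add(a)

    seen, components = set(), []
    for start in read_counts:
        if start in seen:
            continue
        # Depth-first traversal collects everything reachable from `start`.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        components.append(component)
    return components


def large_components(read_counts, min_fraction=0.05):
    """Keep components that hold more than min_fraction of all reads."""
    total = sum(read_counts.values())
    return [
        c for c in one_step_components(read_counts)
        if sum(read_counts[v] for v in c) / total > min_fraction
    ]


# Toy data (hypothetical): three linked variants plus two isolated ones.
example_counts = {"ACGT": 50, "ACGA": 30, "ACCA": 15, "TTTT": 3, "GGGG": 2}
print(large_components(example_counts))
```

    The all-pairs comparison is quadratic in the number of variants, which is acceptable for a few hundred variants per sample but would need indexing for much larger sets.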

  6. Predicting Mendelian Disease-Causing Non-Synonymous Single Nucleotide Variants in Exome Sequencing Studies

    PubMed Central

    Bao, Su-Ying; Yang, Wanling; Ho, Shu-Leong; Song, Yong-Qiang; Sham, Pak C.

    2013-01-01

    Exome sequencing is becoming a standard tool for mapping Mendelian disease-causing (or pathogenic) non-synonymous single nucleotide variants (nsSNVs). Minor allele frequency (MAF) filtering approach and functional prediction methods are commonly used to identify candidate pathogenic mutations in these studies. Combining multiple functional prediction methods may increase accuracy in prediction. Here, we propose to use a logit model to combine multiple prediction methods and compute an unbiased probability of a rare variant being pathogenic. Also, for the first time we assess the predictive power of seven prediction methods (including SIFT, PolyPhen2, CONDEL, and logit) in predicting pathogenic nsSNVs from other rare variants, which reflects the situation after MAF filtering is done in exome-sequencing studies. We found that a logit model combining all or some original prediction methods outperforms other methods examined, but is unable to discriminate between autosomal dominant and autosomal recessive disease mutations. Finally, based on the predictions of the logit model, we estimate that an individual has around 5% of rare nsSNVs that are pathogenic and carries ∼22 pathogenic derived alleles at least, which if made homozygous by consanguineous marriages may lead to recessive diseases. PMID:23341771
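
    The idea of combining several prediction scores through a logit model can be sketched as follows. This is only an illustration of the general technique: the function name, the weights, and the intercept are hypothetical placeholders, whereas in the study the coefficients would be estimated from training data.

```python
import math


def combined_probability(scores, weights, intercept):
    """Logistic combination of per-method scores into one probability.

    scores    -- one numeric score per prediction method
    weights   -- one coefficient per method (hypothetical values here)
    intercept -- the model's constant term (hypothetical value here)
    """
    if len(scores) != len(weights):
        raise ValueError("need exactly one weight per score")
    z = intercept + sum(w * s for w, s in zip(weights, scores))
    # Numerically stable logistic function for large positive or negative z.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Hypothetical scores from three methods, each scaled to the range 0..1.
print(combined_probability([0.9, 0.8, 0.7], [2.0, 1.5, 1.0], -2.5))
```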

  7. Predicting mendelian disease-causing non-synonymous single nucleotide variants in exome sequencing studies.

    PubMed

    Li, Miao-Xin; Kwan, Johnny S H; Bao, Su-Ying; Yang, Wanling; Ho, Shu-Leong; Song, Yong-Qiang; Sham, Pak C

    2013-01-01

    Exome sequencing is becoming a standard tool for mapping Mendelian disease-causing (or pathogenic) non-synonymous single nucleotide variants (nsSNVs). Minor allele frequency (MAF) filtering approach and functional prediction methods are commonly used to identify candidate pathogenic mutations in these studies. Combining multiple functional prediction methods may increase accuracy in prediction. Here, we propose to use a logit model to combine multiple prediction methods and compute an unbiased probability of a rare variant being pathogenic. Also, for the first time we assess the predictive power of seven prediction methods (including SIFT, PolyPhen2, CONDEL, and logit) in predicting pathogenic nsSNVs from other rare variants, which reflects the situation after MAF filtering is done in exome-sequencing studies. We found that a logit model combining all or some original prediction methods outperforms other methods examined, but is unable to discriminate between autosomal dominant and autosomal recessive disease mutations. Finally, based on the predictions of the logit model, we estimate that an individual has around 5% of rare nsSNVs that are pathogenic and carries ~22 pathogenic derived alleles at least, which if made homozygous by consanguineous marriages may lead to recessive diseases. PMID:23341771

  8. Novel inhibitors of human leukocyte elastase and cathepsin G. Sequence variants of squash seed protease inhibitor with altered protease selectivity

    SciTech Connect

    McWherter, C.A.; Walkenhorst, W.F.; Glover, G.I.; Campbell, E.J.

    1989-07-11

    Novel peptide inhibitors of human leukocyte elastase (HLE) and cathepsin G (CG) were prepared by solid-phase peptide synthesis of P1 amino acid sequence variants of Cucurbita maxima trypsin inhibitor III (CMTI-III), a 29-residue peptide found in squash seed. A systematic study of P1 variants indicated that P1 Arg, Lys, Leu, Ala, Phe, and Met inhibit trypsin; P1 Val, Ile, Gly, Leu, Ala, Phe, and Met inhibit HLE; P1 Leu, Ala, Phe, and Met inhibit CG and chymotrypsin. Variants with P1 Val, Ile, or Gly were selective inhibitors of HLE, while inhibition of trypsin required P1 amino acids with an unbranched β carbon. Studies of Val-5-CMTI-III (P1 Val) inhibition of HLE demonstrated a 1:1 binding stoichiometry with a (K_i)_app of 8.7 nM. Inhibition of HLE by Gly-5-CMTI-III indicated a significant role for reactive-site structural moieties other than the P1 side chain. Val-5-CMTI-III inhibited both HLE and human polymorphonuclear leukocyte (PMN) proteolysis of surface-bound ¹²⁵I-labeled fibronectin. Val-5-CMTI-III was more effective at preventing turnover of a peptide p-nitroanilide substrate than halting dissolution of ¹²⁵I-labeled fibronectin. It was about as effective as human serum α1-proteinase inhibitor in preventing PMN degradation of the connective tissue substrate. In addition to providing interesting candidates for controlling inflammatory cell proteolytic injury, the CMTI-based inhibitors are ideal for studying molecular recognition because of their small size, their ease of preparation, and the availability of sensitive and quantitative assays for intermolecular interactions.

  9. Clinical Interpretation of Variants from Next-Generation Sequencing: The 2016 Scientific Meeting of the Human Genome Variation Society.

    PubMed

    Oetting, William S; Brookes, Anthony J; Béroud, Christophe; Taschner, Peter E

    2016-10-01

    The 2016 scientific meeting of the Human Genome Variation Society (HGVS; http://www.hgvs.org) was held on the 20th of May in Barcelona, Spain, with the theme of "Clinical Interpretation of Variants from Next-Generation Sequencing."

  10. Rare Variants in Neurodegeneration Associated Genes Revealed by Targeted Panel Sequencing in a German ALS Cohort

    PubMed Central

    Krüger, Stefanie; Battke, Florian; Sprecher, Andrea; Munz, Marita; Synofzik, Matthis; Schöls, Ludger; Gasser, Thomas; Grehl, Torsten; Prudlo, Johannes; Biskup, Saskia

    2016-01-01

    Amyotrophic lateral sclerosis (ALS) is a progressive fatal multisystemic neurodegenerative disorder caused by preferential degeneration of upper and lower motor neurons. To further delineate the genetic architecture of the disease, we used comprehensive panel sequencing in a cohort of 80 German ALS patients. The panel covered 39 confirmed ALS genes and candidate genes, as well as 238 genes associated with other entities of the neurodegenerative disease spectrum. In addition, we performed repeat length analysis for C9orf72. Our aim was to (1) identify potentially disease-causing variants, to (2) assess a proposed model of polygenic inheritance in ALS and to (3) connect ALS with other neurodegenerative entities. We identified 79 rare potentially pathogenic variants in 27 ALS associated genes in familial and sporadic cases. Five patients had pathogenic C9orf72 repeat expansions, a further four patients harbored intermediate length repeat expansions. Our findings demonstrate that a genetic background of the disease can actually be found in a large proportion of seemingly sporadic cases and that it is not limited to putative most frequently affected genes such as C9orf72 or SOD1. Assessing the polygenic nature of ALS, we identified 15 patients carrying at least two rare potentially pathogenic variants in ALS associated genes including pathogenic or intermediate C9orf72 repeat expansions. Multiple variants might influence severity or duration of disease or could account for intrafamilial phenotypic variability or reduced penetrance. However, we could not observe a correlation with age of onset in this study. We further detected potentially pathogenic variants in other neurodegeneration associated genes in 12 patients, supporting the hypothesis of common pathways in neurodegenerative diseases and linking ALS to other entities of the neurodegenerative spectrum. Most interestingly we found variants in GBE1 and SPG7 which might represent differential diagnoses. Based on our

  11. Diversity of acid stress resistant variants of Listeria monocytogenes and the potential role of ribosomal protein S21 encoded by rpsU

    PubMed Central

    Metselaar, Karin I.; den Besten, Heidy M. W.; Boekhorst, Jos; van Hijum, Sacha A. F. T.; Zwietering, Marcel H.; Abee, Tjakko

    2015-01-01

    The dynamic response of microorganisms to environmental conditions depends on the behavior of individual cells within the population. Adverse environments can select for stable stress resistant subpopulations. In this study, we aimed to get more insight in the diversity within Listeria monocytogenes LO28 populations, and the genetic basis for the increased resistance of stable resistant fractions isolated after acid exposure. Phenotypic cluster analysis of 23 variants resulted in three clusters and four individual variants and revealed multiple-stress resistance, with both unique and overlapping features related to stress resistance, growth, motility, biofilm formation, and virulence indicators. A higher glutamate decarboxylase activity correlated with increased acid resistance. Whole genome sequencing revealed mutations in rpsU, encoding ribosomal protein S21 in the largest phenotypic cluster, while mutations in ctsR, which were previously shown to be responsible for increased resistance of heat and high hydrostatic pressure resistant variants, were not found in the acid resistant variants. This underlined that large population diversity exists within one L. monocytogenes strain and that different adverse conditions drive selection for different variants. The finding that acid stress selects for rpsU variants provides potential insights in the mechanisms underlying population diversity of L. monocytogenes. PMID:26005439

  12. Genetic Mapping and Exome Sequencing Identify Variants Associated with Five Novel Diseases

    PubMed Central

    Puffenberger, Erik G.; Jinks, Robert N.; Sougnez, Carrie; Cibulskis, Kristian; Willert, Rebecca A.; Achilly, Nathan P.; Cassidy, Ryan P.; Fiorentini, Christopher J.; Heiken, Kory F.; Lawrence, Johnny J.; Mahoney, Molly H.; Miller, Christopher J.; Nair, Devika T.; Politi, Kristin A.; Worcester, Kimberly N.; Setton, Roni A.; DiPiazza, Rosa; Sherman, Eric A.; Eastman, James T.; Francklyn, Christopher; Robey-Bond, Susan; Rider, Nicholas L.; Gabriel, Stacey; Morton, D. Holmes; Strauss, Kevin A.

    2012-01-01

    The Clinic for Special Children (CSC) has integrated biochemical and molecular methods into a rural pediatric practice serving Old Order Amish and Mennonite (Plain) children. Among the Plain people, we have used single nucleotide polymorphism (SNP) microarrays to genetically map recessive disorders to large autozygous haplotype blocks (mean = 4.4 Mb) that contain many genes (mean = 79). For some, uninformative mapping or large gene lists preclude disease-gene identification by Sanger sequencing. Seven such conditions were selected for exome sequencing at the Broad Institute; all had been previously mapped at the CSC using low density SNP microarrays coupled with autozygosity and linkage analyses. Using between 1 and 5 patient samples per disorder, we identified sequence variants in the known disease-causing genes SLC6A3 and FLVCR1, and present evidence to strongly support the pathogenicity of variants identified in TUBGCP6, BRAT1, SNIP1, CRADD, and HARS. Our results reveal the power of coupling new genotyping technologies to population-specific genetic knowledge and robust clinical data. PMID:22279524

  13. RefCNV: Identification of Gene-Based Copy Number Variants Using Whole Exome Sequencing

    PubMed Central

    Chang, Lun-Ching; Das, Biswajit; Lih, Chih-Jian; Si, Han; Camalier, Corinne E.; McGregor, Paul M.; Polley, Eric

    2016-01-01

    With rapid advances in DNA sequencing technologies, whole exome sequencing (WES) has become a popular approach for detecting somatic mutations in oncology studies. The initial intent of WES was to characterize single nucleotide variants, but it was observed that the number of sequencing reads that mapped to a genomic region correlated with the DNA copy number variants (CNVs). We propose a method RefCNV that uses a reference set to estimate the distribution of the coverage for each exon. The construction of the reference set includes an evaluation of the sources of variability in the coverage distribution. We observed that the processing steps had an impact on the coverage distribution. For each exon, we compared the observed coverage with the expected normal coverage. Thresholds for determining CNVs were selected to control the false-positive error rate. RefCNV prediction correlated significantly (r = 0.96–0.86) with CNV measured by digital polymerase chain reaction for MET (7q31), EGFR (7p12), or ERBB2 (17q12) in 13 tumor cell lines. The genome-wide CNV analysis showed a good overall correlation (Spearman’s coefficient = 0.82) between RefCNV estimation and publicly available CNV data in Cancer Cell Line Encyclopedia. RefCNV also showed better performance than three other CNV estimation methods in genome-wide CNV analysis. PMID:27147817
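
    The general idea of comparing an exon's observed coverage with a reference distribution can be sketched as below. This is not the RefCNV implementation: it assumes coverage values have already been normalised, models the reference set with a simple mean and standard deviation, and uses a hypothetical z-score threshold of 3, whereas the abstract states that thresholds were chosen to control the false-positive rate.

```python
from statistics import mean, stdev


def call_exon(observed, reference, z_threshold=3.0):
    """Label one exon as 'gain', 'loss', or 'neutral'.

    observed    -- normalised coverage of the exon in the test sample
    reference   -- normalised coverages of the same exon in reference samples
    z_threshold -- hypothetical cutoff; not a value from the paper
    """
    mu = mean(reference)
    sigma = stdev(reference)  # sample standard deviation, needs >= 2 values
    if sigma == 0:
        raise ValueError("reference coverage has no variability")
    z = (observed - mu) / sigma
    if z > z_threshold:
        return "gain"
    if z < -z_threshold:
        return "loss"
    return "neutral"


# Hypothetical normalised coverages for one exon across five reference samples.
reference_set = [1.0, 1.1, 0.9, 1.05, 0.95]
for value in (2.0, 0.4, 1.02):
    print(value, call_exon(value, reference_set))
```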

  14. 77 FR 65537 - Requirements for Patent Applications Containing Nucleotide Sequence and/or Amino Acid Sequence...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-10-29

    ... Amino Acid Sequence Disclosures ACTION: Proposed collection; comment request. SUMMARY: The United States....'' SUPPLEMENTARY INFORMATION: I. Abstract Patent applications that contain nucleotide and/or amino acid...

  15. Integrating sequencing technologies in personal genomics: optimal low cost reconstruction of structural variants.

    PubMed

    Du, Jiang; Bjornson, Robert D; Zhang, Zhengdong D; Kong, Yong; Snyder, Michael; Gerstein, Mark B

    2009-07-01

    The goal of human genome re-sequencing is obtaining an accurate assembly of an individual's genome. Recently, there has been great excitement in the development of many technologies for this (e.g. medium and short read sequencing from companies such as 454 and SOLiD, and high-density oligo-arrays from Affymetrix and NimbleGen), with even more expected to appear. The costs and sensitivities of these technologies differ considerably from each other. As an important goal of personal genomics is to reduce the cost of re-sequencing to an affordable point, it is worthwhile to consider optimally integrating technologies. Here, we build a simulation toolbox that will help us optimally combine different technologies for genome re-sequencing, especially in reconstructing large structural variants (SVs). SV reconstruction is considered the most challenging step in human genome re-sequencing. (It is sometimes even harder than de novo assembly of small genomes because of the duplications and repetitive sequences in the human genome.) To this end, we formulate canonical problems that are representative of issues in reconstruction and are of small enough scale to be computationally tractable and simulatable. Using semi-realistic simulations, we show how we can combine different technologies to optimally solve the assembly at low cost. With mappability maps, our simulations efficiently handle the inhomogeneous repeat-containing structure of the human genome and the computational complexity of practical assembly algorithms. They quantitatively show how combining different read lengths is more cost-effective than using one length, how an optimal mixed sequencing strategy for reconstructing large novel SVs usually also gives accurate detection of SNPs/indels, how paired-end reads can improve reconstruction efficiency, and how adding in arrays is more efficient than just sequencing for disentangling some complex SVs. Our strategy should facilitate the sequencing of human genomes at

  16. An efficient and scalable analysis framework for variant extraction and refinement from population-scale DNA sequence data

    PubMed Central

    Jun, Goo; Wing, Mary Kate; Abecasis, Gonçalo R.; Kang, Hyun Min

    2015-01-01

    The analysis of next-generation sequencing data is computationally and statistically challenging because of the massive volume of data and imperfect data quality. We present GotCloud, a pipeline for efficiently detecting and genotyping high-quality variants from large-scale sequencing data. GotCloud automates sequence alignment, sample-level quality control, variant calling, filtering of likely artifacts using machine-learning techniques, and genotype refinement using haplotype information. The pipeline can process thousands of samples in parallel and requires less computational resources than current alternatives. Experiments with whole-genome and exome-targeted sequence data generated by the 1000 Genomes Project show that the pipeline provides effective filtering against false positive variants and high power to detect true variants. Our pipeline has already contributed to variant detection and genotyping in several large-scale sequencing projects, including the 1000 Genomes Project and the NHLBI Exome Sequencing Project. We hope it will now prove useful to many medical sequencing studies. PMID:25883319

  17. VarDict: a novel and versatile variant caller for next-generation sequencing in cancer research.

    PubMed

    Lai, Zhongwu; Markovets, Aleksandra; Ahdesmaki, Miika; Chapman, Brad; Hofmann, Oliver; McEwen, Robert; Johnson, Justin; Dougherty, Brian; Barrett, J Carl; Dry, Jonathan R

    2016-06-20

    Accurate variant calling in next generation sequencing (NGS) is critical to understand cancer genomes better. Here we present VarDict, a novel and versatile variant caller for both DNA- and RNA-sequencing data. VarDict simultaneously calls SNV, MNV, InDels, complex and structural variants, expanding the detected genetic driver landscape of tumors. It performs local realignments on the fly for more accurate allele frequency estimation. VarDict performance scales linearly to sequencing depth, enabling ultra-deep sequencing used to explore tumor evolution or detect tumor DNA circulating in blood. In addition, VarDict performs amplicon aware variant calling for polymerase chain reaction (PCR)-based targeted sequencing often used in diagnostic settings, and is able to detect PCR artifacts. Finally, VarDict also detects differences in somatic and loss of heterozygosity variants between paired samples. VarDict reprocessing of The Cancer Genome Atlas (TCGA) Lung Adenocarcinoma dataset called known driver mutations in KRAS, EGFR, BRAF, PIK3CA and MET in 16% more patients than previously published variant calls. We believe VarDict will greatly facilitate application of NGS in clinical cancer research. PMID:27060149

  18. VarDict: a novel and versatile variant caller for next-generation sequencing in cancer research

    PubMed Central

    Lai, Zhongwu; Markovets, Aleksandra; Ahdesmaki, Miika; Chapman, Brad; Hofmann, Oliver; McEwen, Robert; Johnson, Justin; Dougherty, Brian; Barrett, J. Carl; Dry, Jonathan R.

    2016-01-01

    Accurate variant calling in next generation sequencing (NGS) is critical to understand cancer genomes better. Here we present VarDict, a novel and versatile variant caller for both DNA- and RNA-sequencing data. VarDict simultaneously calls SNV, MNV, InDels, complex and structural variants, expanding the detected genetic driver landscape of tumors. It performs local realignments on the fly for more accurate allele frequency estimation. VarDict performance scales linearly to sequencing depth, enabling ultra-deep sequencing used to explore tumor evolution or detect tumor DNA circulating in blood. In addition, VarDict performs amplicon aware variant calling for polymerase chain reaction (PCR)-based targeted sequencing often used in diagnostic settings, and is able to detect PCR artifacts. Finally, VarDict also detects differences in somatic and loss of heterozygosity variants between paired samples. VarDict reprocessing of The Cancer Genome Atlas (TCGA) Lung Adenocarcinoma dataset called known driver mutations in KRAS, EGFR, BRAF, PIK3CA and MET in 16% more patients than previously published variant calls. We believe VarDict will greatly facilitate application of NGS in clinical cancer research. PMID:27060149

  19. Focus group discussions on secondary variants and next-generation sequencing technologies.

    PubMed

    Christenhusz, Gabrielle M; Devriendt, Koenraad; Van Esch, Hilde; Dierickx, Kris

    2015-04-01

    The clinical application of new genetic technologies will be and already is of great benefit to children with unexplained developmental disabilities or congenital anomalies. In most cases, it will be their parents who, together with medical professionals, make decisions about what should be disclosed and how the information will be used. We conducted eight exploratory focus group discussions with stakeholders to provide a broad sketch of concerns and ideas around the communication of results from next-generation sequencing technologies involving children. Stakeholders included those with (grand-) children of various ages and those without children; those involved professionally with genetics and those who were not; and a range of ages. Participants were asked to focus on which secondary variants they would and would not want disclosed about their (hypothetical) children or themselves. While the literature often concentrates on the medical and scientific characteristics of secondary variants, focus group participants were also interested in factors involving the parent-child relationship and the broader context. This resulted in more flexibility surrounding the types of secondary variants disclosed to parents than much of the literature currently supports. In addition, participants would on occasion use the same factors to argue opposing positions. The "Family Illness Paradigms model" can help explain this seeming contradiction. This model emphasises the importance of how the family reacts to personal and family experiences of disease and loss, more than the fact of having these experiences.

  20. Genetic and Functional Sequence Variants of the SIRT3 Gene Promoter in Myocardial Infarction

    PubMed Central

    Yin, Xiaoyun; Pang, Shuchao; Huang, Jian; Cui, Yinghua; Yan, Bo

    2016-01-01

    Coronary artery disease (CAD), including myocardial infarction (MI), is a common complex disease that is caused by atherosclerosis. Although a large number of genetic variants have been associated with CAD, only 10% of CAD cases could be explained. It has been proposed that low-frequency and rare genetic variants may be the main causes of CAD. SIRT3, a mitochondrial deacetylase, plays important roles in mitochondrial function and metabolism. Lack of SIRT3 in experimental animals leads to several age-related diseases, including cardiovascular diseases. Therefore, SIRT3 gene variants may contribute to MI development. In this study, the SIRT3 gene promoter was genetically and functionally analyzed in large cohorts of MI patients (n = 319) and ethnic-matched controls (n = 322). In total, twenty-three DNA sequence variants (DSVs) were identified, including 10 single-nucleotide polymorphisms (SNPs). Six novel heterozygous DSVs, g.237307A>G, g.237270G>A, g.237023_25del, g.236653C>A, g.236628G>C, g.236557T>C, and two SNPs, g.237030C>T (rs12293349) and g.237022C>G (rs369344513), were identified in nine MI patients, but in none of the controls. Three SNPs, g.236473C>T (rs11246029), g.236380_81ins (rs71019893) and g.236370C>G (rs185277566), were significantly more frequent in MI patients than in controls (P<0.05). These DSVs and SNPs, except g.236557T>C, significantly decreased the transcriptional activity of the SIRT3 gene promoter in cultured HEK-293 cells and H9c2 cells. Therefore, these DSVs identified in MI patients may change the SIRT3 level by affecting the transcriptional activity of the SIRT3 gene promoter, contributing to MI development as a risk factor. PMID:27078640

  1. ClinLabGeneticist: a tool for clinical management of genetic variants from whole exome sequencing in clinical genetic laboratories.

    PubMed

    Wang, Jinlian; Liao, Jun; Zhang, Jinglan; Cheng, Wei-Yi; Hakenberg, Jörg; Ma, Meng; Webb, Bryn D; Ramasamudram-Chakravarthi, Rajasekar; Karger, Lisa; Mehta, Lakshmi; Kornreich, Ruth; Diaz, George A; Li, Shuyu; Edelmann, Lisa; Chen, Rong

    2015-01-01

    Routine clinical application of whole exome sequencing remains challenging due to difficulties in variant interpretation, large dataset management, and workflow integration. We describe a tool named ClinLabGeneticist to implement a workflow in clinical laboratories for management of variant assessment in genetic testing and disease diagnosis. We established an extensive variant annotation data source for the identification of pathogenic variants. A dashboard was deployed to aid a multi-step, hierarchical review process leading to final clinical decisions on genetic variant assessment. In addition, a central database was built to archive all of the genetic testing data, notes, and comments throughout the review process, variant validation data by Sanger sequencing as well as the final clinical reports for future reference. The entire workflow including data entry, distribution of work assignments, variant evaluation and review, selection of variants for validation, report generation, and communications between various personnel is integrated into a single data management platform. Three case studies are presented to illustrate the utility of ClinLabGeneticist. ClinLabGeneticist is freely available to academia at http://rongchenlab.org/software/clinlabgeneticist . PMID:26338694

  2. NGS-Logistics: federated analysis of NGS sequence variants across multiple locations.

    PubMed

    Ardeshirdavani, Amin; Souche, Erika; Dehaspe, Luc; Van Houdt, Jeroen; Vermeesch, Joris Robert; Moreau, Yves

    2014-01-01

    As many personal genomes are being sequenced, collaborative analysis of those genomes has become essential. However, analysis of personal genomic data raises important privacy and confidentiality issues. We propose a methodology for federated analysis of sequence variants from personal genomes. Specific base-pair positions and/or regions are queried for samples to which the user has access but also for the whole population. The statistical results do not breach data confidentiality but allow further exploration of the data; researchers can negotiate access to relevant samples through pseudonymous identifiers. This approach minimizes the impact on data confidentiality while enabling powerful data analysis by gaining access to important rare samples. Our methodology is implemented in an open source tool called NGS-Logistics, freely available at https://ngsl.esat.kuleuven.be.
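
    The federated idea described above, where each location answers a region query with aggregate statistics only, can be sketched in a few lines. This is a toy model under stated assumptions, not the NGS-Logistics implementation or API: the classes `Variant` and `Site` and the function `federated_query` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    sample_id: str  # stays inside the site; never returned to the caller
    chrom: str
    pos: int
    alt: str


class Site:
    """One data-holding location (hypothetical). Sample-level records never leave it."""

    def __init__(self, name: str, variants: list, n_samples: int):
        self.name = name
        self._variants = variants
        self.n_samples = n_samples

    def count_carriers(self, chrom: str, start: int, end: int) -> dict:
        # Count distinct samples with at least one variant in the region.
        carriers = {
            v.sample_id
            for v in self._variants
            if v.chrom == chrom and start <= v.pos <= end
        }
        # Only aggregate numbers are exposed, not the sample identifiers.
        return {"site": self.name, "carriers": len(carriers), "samples": self.n_samples}


def federated_query(sites: list, chrom: str, start: int, end: int) -> dict:
    """Send the same region query to every site and merge the aggregate answers."""
    per_site = [site.count_carriers(chrom, start, end) for site in sites]
    return {
        "carriers": sum(r["carriers"] for r in per_site),
        "samples": sum(r["samples"] for r in per_site),
        "per_site": per_site,
    }


# Made-up example data for two sites.
site_a = Site("A", [Variant("S1", "chr1", 100, "T"),
                    Variant("S1", "chr1", 150, "G"),
                    Variant("S2", "chr2", 100, "C")], n_samples=3)
site_b = Site("B", [Variant("S9", "chr1", 120, "A")], n_samples=2)

result = federated_query([site_a, site_b], "chr1", 90, 160)
print(result["carriers"], "carriers among", result["samples"], "samples")
```

    A real system would add authentication, pseudonymous identifiers for negotiating access, and safeguards against re-identification from small counts; none of that is modelled here.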

  3. Targeted Re-Sequencing Approach of Candidate Genes Implicates Rare Potentially Functional Variants in Tourette Syndrome Etiology

    PubMed Central

    Alexander, John; Potamianou, Hera; Xing, Jinchuan; Deng, Li; Karagiannidis, Iordanis; Tsetsos, Fotis; Drineas, Petros; Tarnok, Zsanett; Rizzo, Renata; Wolanczyk, Tomasz; Farkas, Luca; Nagy, Peter; Szymanska, Urszula; Androutsos, Christos; Tsironi, Vaia; Koumoula, Anastasia; Barta, Csaba; Sandor, Paul; Barr, Cathy L.; Tischfield, Jay; Paschou, Peristera; Heiman, Gary A.; Georgitsi, Marianthi

    2016-01-01

    Although the genetic basis of Tourette Syndrome (TS) remains unclear, several candidate genes have been implicated. Using a set of 382 TS individuals of European ancestry we investigated four candidate genes for TS (HDC, SLITRK1, BTBD9, and SLC6A4) in an effort to identify possibly causal variants using a targeted re-sequencing approach by next generation sequencing technology. Identification of possible disease causing variants under different modes of inheritance was performed using the algorithms implemented in VAAST. We prioritized variants using Variant ranker and validated five rare variants via Sanger sequencing in HDC and SLITRK1, all of which are predicted to be deleterious. Intriguingly, one of the identified variants is in linkage disequilibrium with a variant that is included among the top hits of a genome-wide association study for response to citalopram treatment, an antidepressant drug with off-label use also in obsessive compulsive disorder. Our findings provide additional evidence for the implication of these two genes in TS susceptibility and the possible role of these proteins in the pathobiology of TS should be revisited. PMID:27708560

  4. Detection of nucleic acid sequences by invader-directed cleavage

    DOEpatents

    Brow, Mary Ann D.; Hall, Jeff Steven Grotelueschen; Lyamichev, Victor; Olive, David Michael; Prudent, James Robert

    1999-01-01

    The present invention relates to means for the detection and characterization of nucleic acid sequences, as well as variations in nucleic acid sequences. The present invention also relates to methods for forming a nucleic acid cleavage structure on a target sequence and cleaving the nucleic acid cleavage structure in a site-specific manner. The 5' nuclease activity of a variety of enzymes is used to cleave the target-dependent cleavage structure, thereby indicating the presence of specific nucleic acid sequences or specific variations thereof. The present invention further relates to methods and devices for the separation of nucleic acid molecules based on charge.

  5. Exome sequencing is an efficient tool for variant late-infantile neuronal ceroid lipofuscinosis molecular diagnosis.

    PubMed

    Patiño, Liliana Catherine; Battu, Rajani; Ortega-Recalde, Oscar; Nallathambi, Jeyabalan; Anandula, Venkata Ramana; Renukaradhya, Umashankar; Laissue, Paul

    2014-01-01

    The neuronal ceroid-lipofuscinoses (NCL) are a group of neurodegenerative disorders characterized by epilepsy, visual failure, progressive mental and motor deterioration, myoclonus, dementia and reduced life expectancy. Classically, NCL-affected individuals have been classified into six categories, which have been mainly defined by the clinical onset of symptoms. However, some patients cannot be easily included in a specific group because of significant variation in the age of onset and disease progression. Molecular genetics has emerged in recent years as a useful tool for enhancing NCL subtype classification. Fourteen NCL genetic forms (CLN1 to CLN14) have been described to date. The variant late-infantile form of the disease has been linked to CLN5, CLN6, CLN7 (MFSD8) and CLN8 mutations. Despite advances in the diagnosis of neurodegenerative disorders, mutations in these genes may cause similar phenotypes, which makes accurate candidate gene selection for direct sequencing difficult. Three siblings who were affected by variant late-infantile NCL are reported in the present study. We used whole-exome sequencing, direct sequencing and in silico approaches to identify the molecular basis of the disease. We identified the novel c.1219T>C (p.Trp407Arg) and c.1361T>C (p.Met454Thr) MFSD8 pathogenic mutations. Our results highlighted next generation sequencing as a novel and powerful methodological approach for the rapid determination of the molecular diagnosis of NCL. They also provide information regarding the phenotypic and molecular spectrum of CLN7 disease.

  6. Exome sequencing is an efficient tool for variant late-infantile neuronal ceroid lipofuscinosis molecular diagnosis.

    PubMed

    Patiño, Liliana Catherine; Battu, Rajani; Ortega-Recalde, Oscar; Nallathambi, Jeyabalan; Anandula, Venkata Ramana; Renukaradhya, Umashankar; Laissue, Paul

    2014-01-01

    The neuronal ceroid-lipofuscinoses (NCL) are a group of neurodegenerative disorders characterized by epilepsy, visual failure, progressive mental and motor deterioration, myoclonus, dementia and reduced life expectancy. Classically, NCL-affected individuals have been classified into six categories, which have been mainly defined by the clinical onset of symptoms. However, some patients cannot be easily included in a specific group because of significant variation in the age of onset and disease progression. Molecular genetics has emerged in recent years as a useful tool for enhancing NCL subtype classification. Fourteen NCL genetic forms (CLN1 to CLN14) have been described to date. The variant late-infantile form of the disease has been linked to CLN5, CLN6, CLN7 (MFSD8) and CLN8 mutations. Despite advances in the diagnosis of neurodegenerative disorders, mutations in these genes may cause similar phenotypes, which makes accurate candidate gene selection for direct sequencing difficult. Three siblings who were affected by variant late-infantile NCL are reported in the present study. We used whole-exome sequencing, direct sequencing and in silico approaches to identify the molecular basis of the disease. We identified the novel c.1219T>C (p.Trp407Arg) and c.1361T>C (p.Met454Thr) MFSD8 pathogenic mutations. Our results highlighted next generation sequencing as a novel and powerful methodological approach for the rapid determination of the molecular diagnosis of NCL. They also provide information regarding the phenotypic and molecular spectrum of CLN7 disease. PMID:25333361
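
    The coding (c.) and protein (p.) descriptions of the two MFSD8 variants above can be cross-checked with simple arithmetic: coding position n falls in codon (n - 1) div 3 + 1. The sketch below does that check. The reference codons are not given in the abstract; they are inferred here from the standard genetic code, in which tryptophan (TGG) and methionine (ATG) each have a single codon. The helper names are hypothetical, and the small codon table covers only the four codons needed.

```python
import re

# Minimal excerpt of the standard genetic code (DNA codons), enough for this check.
CODON_TO_AA = {"TGG": "Trp", "CGG": "Arg", "ATG": "Met", "ACG": "Thr"}


def parse_substitution(hgvs_c: str):
    """Parse a simple coding substitution such as 'c.1219T>C'."""
    match = re.fullmatch(r"c\.(\d+)([ACGT])>([ACGT])", hgvs_c)
    if match is None:
        raise ValueError(f"not a simple substitution: {hgvs_c}")
    return int(match.group(1)), match.group(2), match.group(3)


def codon_location(c_pos: int):
    """Return (codon number, position within codon), both 1-based."""
    return (c_pos - 1) // 3 + 1, (c_pos - 1) % 3 + 1


def mutate_codon(ref_codon: str, offset: int, ref: str, alt: str) -> str:
    """Apply the substitution at the 1-based offset, checking the reference base."""
    if ref_codon[offset - 1] != ref:
        raise ValueError("reference base does not match the codon")
    return ref_codon[: offset - 1] + alt + ref_codon[offset:]


def check(hgvs_c: str, ref_codon: str):
    pos, ref, alt = parse_substitution(hgvs_c)
    codon_number, offset = codon_location(pos)
    new_codon = mutate_codon(ref_codon, offset, ref, alt)
    return codon_number, CODON_TO_AA[ref_codon], CODON_TO_AA[new_codon]


print(check("c.1219T>C", "TGG"))  # compare with p.Trp407Arg
print(check("c.1361T>C", "ATG"))  # compare with p.Met454Thr
```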

  7. Exome Sequencing Reveals Novel Rare Variants in the Ryanodine Receptor and Calcium Channel Genes in Malignant Hyperthermia Families

    PubMed Central

    Kim, Jerry H.; Jarvik, Gail P.; Browning, Brian L.; Rajagopalan, Ramakrishnan; Gordon, Adam S.; Rieder, Mark J.; Robertson, Peggy D.; Nickerson, Deborah A.; Fisher, Nickla A.; Hopkins, Philip M.

    2014-01-01

    Background: About half of malignant hyperthermia (MH) cases are associated with skeletal muscle ryanodine receptor 1 (RYR1) and calcium channel, voltage-dependent, L type, α1S subunit (CACNA1S) gene mutations, leaving many with an unknown cause. We chose to apply a sequencing approach to uncover causal variants in unknown cases. Sequencing the exome, the protein-coding region of the genome, has power at low sample sizes and has identified the cause of over a dozen Mendelian disorders. Methods: We considered four families with multiple MH cases but in whom no mutations in RYR1 and CACNA1S had been identified by Sanger sequencing of complementary DNA. Exome sequences of two affected individuals per family, chosen for maximum genetic distance, were compared. Variants were ranked by allele frequency, protein change, and measures of conservation among mammals to assess likelihood of causation. Finally, putative pathogenic mutations were genotyped in other family members to verify cosegregation with MH. Results: Exome sequencing revealed 1 rare RYR1 nonsynonymous variant in each of 3 families (Asp1056His, Val2627Met, Val4234Leu), and 1 CACNA1S variant (Thr1009Lys) in a 4th family. These were not seen in variant databases or in our control population sample of 5379 exomes. Follow-up sequencing in other family members verified cosegregation of alleles with MH. Conclusions: Using both exome sequencing and allele frequency data from large sequencing efforts may aid genetic diagnosis of MH. In our sample, it was more sensitive for variant detection in known genes than Sanger sequencing of complementary DNA, and it allows for the possibility of novel gene discovery. PMID:24013571
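
    The ranking step in the Methods (allele frequency, protein change, conservation) can be pictured as a filter followed by a sort. The abstract gives no thresholds or scoring formula, so everything below is an assumption made for illustration: the `Candidate` fields, the `max_af` cutoff, the sort order, and the placeholder variants, which are invented and are not the study's data.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str             # placeholder label, not a real variant
    allele_freq: float    # frequency in a reference population (hypothetical values)
    nonsynonymous: bool   # does the variant change the protein?
    conservation: float   # 0..1, higher = more conserved among mammals (hypothetical scale)


def rank_candidates(candidates, max_af=0.001):
    """Keep rare protein-changing variants, then order by rarity and conservation.

    max_af is an assumed cutoff chosen for this sketch.
    """
    kept = [c for c in candidates if c.nonsynonymous and c.allele_freq <= max_af]
    # Rarest first; ties broken by higher conservation.
    return sorted(kept, key=lambda c: (c.allele_freq, -c.conservation))


# Invented example input.
candidates = [
    Candidate("variant_A", 0.0, True, 0.90),
    Candidate("variant_B", 0.2, True, 0.95),    # too common
    Candidate("variant_C", 0.0, True, 0.99),
    Candidate("variant_D", 0.0, False, 0.99),   # synonymous, dropped
]

ranked = rank_candidates(candidates)
print([c.name for c in ranked])
```

    Variants that survive such a ranking would still need the cosegregation check described in the abstract before being treated as causal.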

  8. Investigation of rare and low-frequency variants using high-throughput sequencing with pooled DNA samples.

    PubMed

    Wang, Jingwen; Skoog, Tiina; Einarsdottir, Elisabet; Kaartokallio, Tea; Laivuori, Hannele; Grauers, Anna; Gerdhem, Paul; Hytönen, Marjo; Lohi, Hannes; Kere, Juha; Jiao, Hong

    2016-01-01

    High-throughput sequencing using pooled DNA samples can facilitate genome-wide studies on rare and low-frequency variants in a large population. Some major questions concerning the pooled sequencing strategy are whether rare and low-frequency variants can be detected reliably, and whether estimated minor allele frequencies (MAFs) can represent the actual values obtained from individually genotyped samples. In this study, we evaluated MAF estimates using three variant detection tools with two sets of pooled whole exome sequencing (WES) and one set of pooled whole genome sequencing (WGS) data. Both GATK and Freebayes displayed high sensitivity, specificity and accuracy when detecting rare or low-frequency variants. For the WGS study, 56% of the low-frequency variants on the Illumina array have identical MAFs and 26% differ by one allele between sequencing and individual genotyping data. The MAF estimates from WGS correlated well (r = 0.94) with those from Illumina arrays. The MAFs from the pooled WES data also showed high concordance (r = 0.88) with those from the individual genotyping data. In conclusion, the MAFs estimated from pooled DNA sequencing data reflect the MAFs in individually genotyped samples well. The pooling strategy can thus be a rapid and cost-effective approach for the initial screening in large-scale association studies. PMID:27633116

  9. Investigation of rare and low-frequency variants using high-throughput sequencing with pooled DNA samples

    PubMed Central

    Wang, Jingwen; Skoog, Tiina; Einarsdottir, Elisabet; Kaartokallio, Tea; Laivuori, Hannele; Grauers, Anna; Gerdhem, Paul; Hytönen, Marjo; Lohi, Hannes; Kere, Juha; Jiao, Hong

    2016-01-01

    High-throughput sequencing using pooled DNA samples can facilitate genome-wide studies on rare and low-frequency variants in a large population. Some major questions concerning the pooled sequencing strategy are whether rare and low-frequency variants can be detected reliably, and whether estimated minor allele frequencies (MAFs) can represent the actual values obtained from individually genotyped samples. In this study, we evaluated MAF estimates using three variant detection tools with two sets of pooled whole exome sequencing (WES) and one set of pooled whole genome sequencing (WGS) data. Both GATK and Freebayes displayed high sensitivity, specificity and accuracy when detecting rare or low-frequency variants. For the WGS study, 56% of the low-frequency variants on the Illumina array have identical MAFs and 26% differ by one allele between sequencing and individual genotyping data. The MAF estimates from WGS correlated well (r = 0.94) with those from Illumina arrays. The MAFs from the pooled WES data also showed high concordance (r = 0.88) with those from the individual genotyping data. In conclusion, the MAFs estimated from pooled DNA sequencing data reflect the MAFs in individually genotyped samples well. The pooling strategy can thus be a rapid and cost-effective approach for the initial screening in large-scale association studies. PMID:27633116
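
    The comparisons reported above (identical MAFs, a one-allele difference, and a correlation coefficient) rest on simple quantities that can be sketched as follows. This is a minimal illustration, not the study's pipeline: it assumes diploid individuals contributing equally to the pool, estimates frequency as the fraction of reads carrying the alternative allele, and uses invented numbers. The function names are hypothetical.

```python
from math import sqrt


def pooled_maf(alt_reads: int, total_reads: int) -> float:
    """Estimate the minor allele frequency from read counts in a pooled sample."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    freq = alt_reads / total_reads
    return min(freq, 1.0 - freq)


def allele_count_difference(maf_pool: float, maf_genotyped: float, n_individuals: int) -> int:
    """Express a MAF difference in whole alleles, for a pool of diploid individuals."""
    # One allele corresponds to a frequency step of 1 / (2 * n_individuals).
    return round(abs(maf_pool - maf_genotyped) * 2 * n_individuals)


def pearson_r(xs, ys) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Invented example: 5 alternative reads out of 100 in a pool of 50 individuals,
# compared with a genotyping-based MAF of 0.04.
estimate = pooled_maf(5, 100)
print(estimate, allele_count_difference(estimate, 0.04, 50))
```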

  10. EGFR variant heterogeneity in glioblastoma resolved through single-nucleus sequencing

    PubMed Central

    Francis, Joshua M.; Zhang, Cheng-Zhong; Maire, Cecile L.; Jung, Joonil; Manzo, Veronica E.; Adalsteinsson, Viktor A.; Homer, Heather; Haidar, Sam; Blumenstiel, Brendan; Pedamallu, Chandra Sekhar; Ligon, Azra H.; Love, J. Christopher; Meyerson, Matthew; Ligon, Keith L.

    2014-01-01

    Glioblastomas with EGFR amplification represent approximately 50% of newly diagnosed cases and recent studies have revealed frequent coexistence of multiple EGFR aberrations within the same tumor with implications for mutation cooperation and treatment resistance. However, bulk tumor sequencing studies cannot resolve the patterns of how the multiple EGFR aberrations coexist with other mutations within single tumor cells. Here we applied a population-based single-cell whole genome sequencing methodology to characterize genomic heterogeneity in EGFR amplified glioblastomas. Our analysis effectively identified clonal events, including a novel translocation of a super enhancer to the TERT promoter, as well as subclonal loss-of-heterozygosity and multiple EGFR mutational variants within tumors. Correlating the EGFR mutations onto the cellular hierarchy revealed that EGFR truncation variants (EGFRvII and EGFR Carboxyl-terminal deletions) identified in the bulk tumor segregate into non-overlapping subclonal populations. In vitro and in vivo functional studies show EGFRvII is oncogenic and sensitive to EGFR inhibitors currently in clinical trials. Thus the association between diverse activating mutations in EGFR and other subclonal mutations within a single tumor supports an intrinsic mechanism for proliferative and clonal diversification with broad implications in resistance to treatment. PMID:24893890

  11. Variants of beta-glucosidases

    SciTech Connect

    Fidantsef, Ana; Lamsa, Michael; Gorre-Clancy, Brian

    2014-10-07

    The present invention relates to variants of a parent beta-glucosidase, comprising a substitution at one or more positions corresponding to positions 142, 183, 266, and 703 of amino acids 1 to 842 of SEQ ID NO: 2 or corresponding to positions 142, 183, 266, and 705 of amino acids 1 to 844 of SEQ ID NO: 70, wherein the variant has beta-glucosidase activity. The present invention also relates to nucleotide sequences encoding the variant beta-glucosidases and to nucleic acid constructs, vectors, and host cells comprising the nucleotide sequences.

  12. Variants of beta-glucosidase

    SciTech Connect

    Fidantsef, Ana; Lamsa, Michael; Gorre-Clancy, Brian

    2015-07-14

    The present invention relates to variants of a parent beta-glucosidase, comprising a substitution at one or more positions corresponding to positions 142, 183, 266, and 703 of amino acids 1 to 842 of SEQ ID NO: 2 or corresponding to positions 142, 183, 266, and 705 of amino acids 1 to 844 of SEQ ID NO: 70, wherein the variant has beta-glucosidase activity. The present invention also relates to nucleotide sequences encoding the variant beta-glucosidases and to nucleic acid constructs, vectors, and host cells comprising the nucleotide sequences.

  13. Variants of beta-glucosidase

    SciTech Connect

    Fidantsef, Ana; Lamsa, Michael; Gorre-Clancy, Brian

    2009-12-29

    The present invention relates to variants of a parent beta-glucosidase, comprising a substitution at one or more positions corresponding to positions 142, 183, 266, and 703 of amino acids 1 to 842 of SEQ ID NO: 2 or corresponding to positions 142, 183, 266, and 705 of amino acids 1 to 844 of SEQ ID NO: 70, wherein the variant has beta-glucosidase activity. The present invention also relates to nucleotide sequences encoding the variant beta-glucosidases and to nucleic acid constructs, vectors, and host cells comprising the nucleotide sequences.

  14. Variants of beta-glucosidases

    DOEpatents

    Fidantsef, Ana; Lamsa, Michael; Clancy, Brian Gorre

    2008-08-19

    The present invention relates to variants of a parent beta-glucosidase, comprising a substitution at one or more positions corresponding to positions 142, 183, 266, and 703 of amino acids 1 to 842 of SEQ ID NO: 2 or corresponding to positions 142, 183, 266, and 705 of amino acids 1 to 844 of SEQ ID NO: 70, wherein the variant has beta-glucosidase activity. The present invention also relates to nucleotide sequences encoding the variant beta-glucosidases and to nucleic acid constructs, vectors, and host cells comprising the nucleotide sequences.

  15. Characterization of Intra-Type Variants of Oncogenic Human Papillomaviruses by Next-Generation Deep Sequencing of the E6/E7 Region

    PubMed Central

    Lavezzo, Enrico; Masi, Giulia; Toppo, Stefano; Franchin, Elisa; Gazzola, Valentina; Sinigaglia, Alessandro; Masiero, Serena; Trevisan, Marta; Pagni, Silvana; Palù, Giorgio; Barzon, Luisa

    2016-01-01

    Different human papillomavirus (HPV) types are characterized by differences in tissue tropism and ability to promote cell proliferation and transformation. In addition, clinical and experimental studies have shown that some genetic variants/lineages of high-risk HPV (HR-HPV) types are characterized by increased oncogenic activity and probability to induce cancer. In this study, we designed and validated a new method based on multiplex PCR-deep sequencing of the E6/E7 region of HR-HPV types to characterize HPV intra-type variants in clinical specimens. Validation experiments demonstrated that this method allowed reliable identification of the different lineages of oncogenic HPV types. Advantages of this method over other published methods were represented by its ability to detect variants of all HR-HPV types in a single reaction, to detect variants of HR-HPV types in clinical specimens with multiple infections, and, being based on sequencing of the full E6/E7 region, to detect amino acid changes in these oncogenes potentially associated with increased transforming activity. PMID:26985902

  16. Characterization of Intra-Type Variants of Oncogenic Human Papillomaviruses by Next-Generation Deep Sequencing of the E6/E7 Region.

    PubMed

    Lavezzo, Enrico; Masi, Giulia; Toppo, Stefano; Franchin, Elisa; Gazzola, Valentina; Sinigaglia, Alessandro; Masiero, Serena; Trevisan, Marta; Pagni, Silvana; Palù, Giorgio; Barzon, Luisa

    2016-03-14

    Different human papillomavirus (HPV) types are characterized by differences in tissue tropism and ability to promote cell proliferation and transformation. In addition, clinical and experimental studies have shown that some genetic variants/lineages of high-risk HPV (HR-HPV) types are characterized by increased oncogenic activity and probability to induce cancer. In this study, we designed and validated a new method based on multiplex PCR-deep sequencing of the E6/E7 region of HR-HPV types to characterize HPV intra-type variants in clinical specimens. Validation experiments demonstrated that this method allowed reliable identification of the different lineages of oncogenic HPV types. Advantages of this method over other published methods were represented by its ability to detect variants of all HR-HPV types in a single reaction, to detect variants of HR-HPV types in clinical specimens with multiple infections, and, being based on sequencing of the full E6/E7 region, to detect amino acid changes in these oncogenes potentially associated with increased transforming activity.

  17. Characterization of Intra-Type Variants of Oncogenic Human Papillomaviruses by Next-Generation Deep Sequencing of the E6/E7 Region.

    PubMed

    Lavezzo, Enrico; Masi, Giulia; Toppo, Stefano; Franchin, Elisa; Gazzola, Valentina; Sinigaglia, Alessandro; Masiero, Serena; Trevisan, Marta; Pagni, Silvana; Palù, Giorgio; Barzon, Luisa

    2016-03-01

    Different human papillomavirus (HPV) types are characterized by differences in tissue tropism and ability to promote cell proliferation and transformation. In addition, clinical and experimental studies have shown that some genetic variants/lineages of high-risk HPV (HR-HPV) types are characterized by increased oncogenic activity and probability to induce cancer. In this study, we designed and validated a new method based on multiplex PCR-deep sequencing of the E6/E7 region of HR-HPV types to characterize HPV intra-type variants in clinical specimens. Validation experiments demonstrated that this method allowed reliable identification of the different lineages of oncogenic HPV types. Advantages of this method over other published methods were represented by its ability to detect variants of all HR-HPV types in a single reaction, to detect variants of HR-HPV types in clinical specimens with multiple infections, and, being based on sequencing of the full E6/E7 region, to detect amino acid changes in these oncogenes potentially associated with increased transforming activity. PMID:26985902

  18. Incorporating predicted functions of nonsynonymous variants into gene-based analysis of exome sequencing data: a comparative study

    PubMed Central

    2011-01-01

    Next-generation sequencing has opened up new avenues for the genetic study of complex traits. However, because of the small number of observations for any given rare allele and high sequencing error, it is a challenge to identify functional rare variants associated with the phenotype of interest. Recent research shows that grouping variants by gene and incorporating computationally predicted functions of variants may provide higher statistical power. On the other hand, many algorithms are available for predicting the damaging effects of nonsynonymous variants. Here, we use the simulated mini-exome data of Genetic Analysis Workshop 17 to study and compare the effects of incorporating the functional predictions of single-nucleotide polymorphisms using two popular algorithms, SIFT and PolyPhen-2, into a gene-based association test. We also propose a simple mixture model that can effectively combine test results based on different functional prediction algorithms. PMID:22373178

  19. Exome sequencing reveals frequent deleterious germline variants in cancer susceptibility genes in women with invasive breast cancer undergoing neoadjuvant chemotherapy.

    PubMed

    Ellingson, Marissa S; Hart, Steven N; Kalari, Krishna R; Suman, Vera; Schahl, Kimberly A; Dockter, Travis J; Felten, Sara J; Sinnwell, Jason P; Thompson, Kevin J; Tang, Xiaojia; Vedell, Peter T; Barman, Poulami; Sicotte, Hugues; Eckel-Passow, Jeanette E; Northfelt, Donald W; Gray, Richard J; McLaughlin, Sarah A; Moreno-Aspitia, Alvaro; Ingle, James N; Moyer, Ann M; Visscher, Daniel W; Jones, Katie; Conners, Amy; McDonough, Michelle; Wieben, Eric D; Wang, Liewei; Weinshilboum, Richard; Boughey, Judy C; Goetz, Matthew P

    2015-09-01

    When sequencing blood and tumor samples to identify targetable somatic variants for cancer therapy, clinically relevant germline variants may be uncovered. We evaluated the prevalence of deleterious germline variants in cancer susceptibility genes in women with breast cancer referred for neoadjuvant chemotherapy and returned clinically actionable results to patients. Exome sequencing was performed on blood samples from women with invasive breast cancer referred for neoadjuvant chemotherapy. Germline variants within 142 hereditary cancer susceptibility genes were filtered and reviewed for pathogenicity. Return of results was offered to patients with deleterious variants in actionable genes if they were not aware of their result through clinical testing. 124 patients were enrolled (median age 51) with the following subtypes: triple negative (n = 43, 34.7%), HER2+ (n = 37, 29.8%), luminal B (n = 31, 25%), and luminal A (n = 13, 10.5%). Twenty-eight deleterious variants were identified in 26/124 (21.0%) patients in the following genes: ATM (n = 3), BLM (n = 1), BRCA1 (n = 4), BRCA2 (n = 8), CHEK2 (n = 2), FANCA (n = 1), FANCI (n = 1), FANCL (n = 1), FANCM (n = 1), FH (n = 1), MLH3 (n = 1), MUTYH (n = 2), PALB2 (n = 1), and WRN (n = 1). 121/124 (97.6%) patients consented to return of research results. Thirteen (10.5%) had actionable variants, including four that were returned to patients and led to changes in medical management. Deleterious variants in cancer susceptibility genes are highly prevalent in patients with invasive breast cancer referred for neoadjuvant chemotherapy undergoing exome sequencing. Detection of these variants impacts medical management. PMID:26296701
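
    The counts and percentages in this abstract are internally consistent, which the short check below confirms using only the figures quoted above. The variable names are introduced here for illustration.

```python
N_PATIENTS = 124

# Subtype counts as reported in the abstract.
subtypes = {"triple negative": 43, "HER2+": 37, "luminal B": 31, "luminal A": 13}

# Deleterious variants per gene as reported in the abstract.
variants_per_gene = {
    "ATM": 3, "BLM": 1, "BRCA1": 4, "BRCA2": 8, "CHEK2": 2, "FANCA": 1,
    "FANCI": 1, "FANCL": 1, "FANCM": 1, "FH": 1, "MLH3": 1, "MUTYH": 2,
    "PALB2": 1, "WRN": 1,
}


def pct(count: int, total: int) -> float:
    """Percentage rounded to one decimal place, as in the abstract."""
    return round(100 * count / total, 1)


subtype_pct = {name: pct(n, N_PATIENTS) for name, n in subtypes.items()}
total_variants = sum(variants_per_gene.values())

print(subtype_pct)
print("patients covered by subtypes:", sum(subtypes.values()))
print("total deleterious variants:", total_variants)
print("carriers:", pct(26, N_PATIENTS), "%; consented:", pct(121, N_PATIENTS), "%")
```

    Because 28 variants were found in 26 patients, at least one patient must carry more than one deleterious variant.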

  20. Proline-rich sequence recognition: II. Proteomics analysis of Tsg101 ubiquitin-E2-like variant (UEV) interactions.

    PubMed

    Schlundt, Andreas; Sticht, Jana; Piotukh, Kirill; Kosslick, Daniela; Jahnke, Nadin; Keller, Sandro; Schuemann, Michael; Krause, Eberhard; Freund, Christian

    2009-11-01

    The tumor maintenance protein Tsg101 has recently gained much attention because of its involvement in endosomal sorting, virus release, cytokinesis, and carcinogenesis. The ubiquitin-E2-like variant (UEV) domain of the protein interacts with proline-rich sequences of target proteins that contain P(S/T)AP amino acid motifs and weakly binds to the ubiquitin moiety of proteins committed to sorting or degradation. Here we performed peptide spot analysis and phage display to refine the peptide binding specificity of the Tsg101 UEV domain. A mass spectrometric proteomics approach that combines domain-based pulldown experiments, binding site inactivation, and stable isotope labeling by amino acids in cell culture (SILAC) was then used to delineate the relative importance of the peptide and ubiquitin binding sites. Clearly, "PTAP" interactions dominate target recognition, and we identified several novel binders, for example the poly(A)-binding protein 1 (PABP1), Sec24b, NFkappaB2, and eIF4b. For PABP1 and eIF4b the interactions were confirmed in the context of the corresponding full-length proteins in cellular lysates. Therefore, our results strongly suggest additional roles of Tsg101 in cellular regulation of mRNA translation. Regulation of Tsg101 itself by the ubiquitin ligase TAL (Tsg101-associated ligase) is most likely conferred by a single PSAP binding motif that enables the interaction with Tsg101 UEV. Together with the results from the accompanying article (Kofler, M., Schuemann, M., Merz, C., Kosslick, D., Schlundt, A., Tannert, A., Schaefer, M., Lührmann, R., Krause, E., and Freund, C. (2009) Proline-rich sequence recognition: I. Marking GYF and WW domain assembly sites in early spliceosomal complexes. Mol. Cell. Proteomics 8, 2461-2473) on GYF and WW domain pathways, our work defines major proline-rich sequence-mediated interaction networks that contribute to the modular assembly of physiologically relevant protein complexes.

  1. Multi-species sequence comparison reveals conservation of ghrelin gene-derived splice variants encoding a truncated ghrelin peptide.

    PubMed

    Seim, Inge; Jeffery, Penny L; Thomas, Patrick B; Walpole, Carina M; Maugham, Michelle; Fung, Jenny N T; Yap, Pei-Yi; O'Keeffe, Angela J; Lai, John; Whiteside, Eliza J; Herington, Adrian C; Chopin, Lisa K

    2016-06-01

    The peptide hormone ghrelin is a potent orexigen produced predominantly in the stomach. It has a number of other biological actions, including roles in appetite stimulation, energy balance, the stimulation of growth hormone release and the regulation of cell proliferation. Recently, several ghrelin gene splice variants have been described. Here, we attempted to identify conserved alternative splicing of the ghrelin gene by cross-species sequence comparisons. We identified a novel human exon 2-deleted variant and provide preliminary evidence that this splice variant and in1-ghrelin encode a C-terminally truncated form of the ghrelin peptide, termed minighrelin. These variants are expressed in humans and mice, demonstrating conservation of alternative splicing spanning 90 million years. Minighrelin appears to have similar actions to full-length ghrelin, as treatment with exogenous minighrelin peptide stimulates appetite and feeding in mice. Forced expression of the exon 2-deleted preproghrelin variant mirrors the effect of the canonical preproghrelin, stimulating cell proliferation and migration in the PC3 prostate cancer cell line. This is the first study to characterise an exon 2-deleted preproghrelin variant and to demonstrate sequence conservation of ghrelin gene-derived splice variants that encode a truncated ghrelin peptide. This adds further impetus for studies into the alternative splicing of the ghrelin gene and the function of novel ghrelin peptides in vertebrates.

  2. Hybridization and sequencing of nucleic acids using base pair mismatches

    DOEpatents

    Fodor, Stephen P. A.; Lipshutz, Robert J.; Huang, Xiaohua

    2001-01-01

    Devices and techniques for hybridization of nucleic acids and for determining the sequence of nucleic acids. Arrays of nucleic acids are formed by techniques, preferably high resolution, light-directed techniques. Positions of hybridization of a target nucleic acid are determined by, e.g., epifluorescence microscopy. Devices and techniques are proposed to determine the sequence of a target nucleic acid more efficiently and more quickly through such synthesis and detection techniques.

  3. Genome Sequences of Simian Hemorrhagic Fever Virus Variant NIH LVR42-0/M6941 Isolates (Arteriviridae: Arterivirus)

    PubMed Central

    Lauck, Michael; Palacios, Gustavo; Wiley, Michael R.; Lǐ, Yànhuá; Fāng, Yīng; Lackemeyer, Matthew G.; Caì, Yíngyún; Bailey, Adam L.; Postnikova, Elena; Radoshitzky, Sheli R.; Johnson, Reed F.; Alkhovsky, Sergey V.; Deriabin, Petr G.; Friedrich, Thomas C.; Goldberg, Tony L.; Jahrling, Peter B.; O’Connor, David H.

    2014-01-01

    Simian hemorrhagic fever virus (SHFV) variant NIH LVR42-0/M6941 is the only remaining SHFV in culture, and only a single genome sequence record exists in GenBank/RefSeq. We compared the genomic sequence of NIH LVR42-0/M6941 acquired from the ATCC in 2011 to NIH LVR42-0/M6941 genomes sequenced directly from nonhuman primates experimentally infected in 1989. PMID:25301647

  4. Sequencing and computational analysis of complete genome sequences of Citrus yellow mosaic badna virus from acid lime and pummelo.

    PubMed

    Borah, Basanta K; Johnson, A M Anthony; Sai Gopal, D V R; Dasgupta, Indranil

    2009-08-01

    Citrus yellow mosaic badna virus (CMBV), a member of the Family Caulimoviridae, Genus Badnavirus, is the causative agent of Citrus mosaic disease in India. Although the virus has been detected in several citrus species, only two full-length genomes, one each from Sweet orange and Rangpur lime, are available in publicly accessible databases. In order to obtain a better understanding of the genetic variability of the virus in other citrus mosaic-affected citrus species, we performed the cloning and sequence analysis of complete genomes of CMBV from two additional citrus species, Acid lime and Pummelo. We show that CMBV genomes from the two hosts share high homology with previously reported CMBV sequences and hence conclude that the new isolates represent variants of the virus present in these species. Based on in silico sequence analysis, we predict the possible function of the protein encoded by one of the five ORFs.

  5. Next generation exome sequencing of paediatric inflammatory bowel disease patients identifies rare and novel variants in candidate genes

    PubMed Central

    Christodoulou, Katja; Wiskin, Anthony E; Gibson, Jane; Tapper, William; Willis, Claire; Afzal, Nadeem A; Upstill-Goddard, Rosanna; Holloway, John W; Simpson, Michael A; Beattie, R Mark; Collins, Andrew

    2013-01-01

    Background: Multiple genes have been implicated by association studies in altering inflammatory bowel disease (IBD) predisposition. Paediatric patients often manifest more extensive disease and a particularly severe disease course. It is likely that genetic predisposition plays a more substantial role in this group. Objective: To identify the spectrum of rare and novel variation in known IBD susceptibility genes using exome sequencing analysis in eight individual cases of childhood onset severe disease. Design: DNA samples from the eight patients underwent targeted exome capture and sequencing. Data were processed through an analytical pipeline to align sequence reads, conduct quality checks, and identify and annotate variants where patient sequence differed from the reference sequence. For each patient, the entire complement of rare variation within strongly associated candidate genes was catalogued. Results: Across the panel of 169 known IBD susceptibility genes, approximately 300 variants in 104 genes were found. Excluding splicing and HLA-class variants, 58 variants across 39 of these genes were classified as rare, with an alternative allele frequency of <5%, of which 17 were novel. Only two patients with early onset Crohn's disease exhibited rare deleterious variations within NOD2: the previously described R702W variant was the sole NOD2 variant in one patient, while the second patient also carried the L1007 frameshift insertion. Both patients harboured other potentially damaging mutations in the GSDMB, ERAP2 and SEC16A genes. The two patients severely affected with ulcerative colitis exhibited a distinct profile: both carried potentially detrimental variation in the BACH2 and IL10 genes not seen in other patients. Conclusion: For each of the eight individuals studied, all non-synonymous, truncating and frameshift mutations across all known IBD genes were identified. A unique profile of rare and potentially damaging variants was evident for each patient with this
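
    The rarity filter described in this abstract (alternative allele frequency below 5%, with splicing and HLA-class variants excluded) can be illustrated with a minimal sketch. The record layout, the gene-name test for HLA, and all frequencies below are hypothetical placeholders, not data or code from the study.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Variant:
    gene: str
    consequence: str        # e.g. "missense", "frameshift", "splicing"
    alt_allele_freq: float  # alternative allele frequency (illustrative values only)


RARE_AAF = 0.05  # threshold quoted in the abstract (<5%)


def is_hla(gene: str) -> bool:
    # Assumption: HLA-class genes are recognised by their "HLA-" name prefix.
    return gene.startswith("HLA-")


def rare_candidates(variants: List[Variant]) -> List[Variant]:
    """Keep rare variants, dropping splicing and HLA-class calls."""
    return [
        v for v in variants
        if v.alt_allele_freq < RARE_AAF
        and v.consequence != "splicing"
        and not is_hla(v.gene)
    ]


# Hypothetical calls; the frequencies are made up for illustration.
calls = [
    Variant("NOD2", "missense", 0.04),
    Variant("NOD2", "frameshift", 0.02),
    Variant("IL10", "missense", 0.30),
    Variant("HLA-DRB1", "missense", 0.01),
    Variant("BACH2", "splicing", 0.01),
]
rare = rare_candidates(calls)
```

    In this toy input only the two NOD2 calls pass all three conditions; a real pipeline would take frequencies from a population reference rather than hard-coded values.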

  6. Detection of Clinically Relevant Genetic Variants in Autism Spectrum Disorder by Whole-Genome Sequencing

    PubMed Central

    Jiang, Yong-hui; Yuen, Ryan K.C.; Jin, Xin; Wang, Mingbang; Chen, Nong; Wu, Xueli; Ju, Jia; Mei, Junpu; Shi, Yujian; He, Mingze; Wang, Guangbiao; Liang, Jieqin; Wang, Zhe; Cao, Dandan; Carter, Melissa T.; Chrysler, Christina; Drmic, Irene E.; Howe, Jennifer L.; Lau, Lynette; Marshall, Christian R.; Merico, Daniele; Nalpathamkalam, Thomas; Thiruvahindrapuram, Bhooma; Thompson, Ann; Uddin, Mohammed; Walker, Susan; Luo, Jun; Anagnostou, Evdokia; Zwaigenbaum, Lonnie; Ring, Robert H.; Wang, Jian; Lajonchere, Clara; Wang, Jun; Shih, Andy; Szatmari, Peter; Yang, Huanming; Dawson, Geraldine; Li, Yingrui; Scherer, Stephen W.

    2013-01-01

    Autism Spectrum Disorder (ASD) demonstrates high heritability and familial clustering, yet the genetic causes remain only partially understood as a result of extensive clinical and genomic heterogeneity. Whole-genome sequencing (WGS) shows promise as a tool for identifying ASD risk genes as well as unreported mutations in known loci, but an assessment of its full utility in an ASD group has not been performed. We used WGS to examine 32 families with ASD to detect de novo or rare inherited genetic variants predicted to be deleterious (loss-of-function and damaging missense mutations). Among ASD probands, we identified deleterious de novo mutations in six of 32 (19%) families and X-linked or autosomal inherited alterations in ten of 32 (31%) families (some had combinations of mutations). The proportion of families identified with such putative mutations was larger than has been previously reported; this yield was in part due to the comprehensive and uniform coverage afforded by WGS. Deleterious variants were found in four unrecognized, nine known, and eight candidate ASD risk genes. Examples include CAPRIN1 and AFF2 (both linked to FMR1, which is involved in fragile X syndrome), VIP (involved in social-cognitive deficits), and other genes such as SCN2A and KCNQ2 (linked to epilepsy), NRXN1, and CHD7, which causes ASD-associated CHARGE syndrome. Taken together, these results suggest that WGS and thorough bioinformatic analyses for de novo and rare inherited mutations will improve the detection of genetic variants likely to be associated with ASD or its accompanying clinical symptoms. PMID:23849776

  7. Amino acid and cDNA sequences of lysozyme from Hyalophora cecropia

    PubMed Central

    Engström, Å.; Xanthopoulos, K. G.; Boman, H. G.; Bennich, H.

    1985-01-01

    The amino acid and cDNA sequences of lysozyme from the giant silk moth Hyalophora cecropia have been determined. This enzyme is one of several immune proteins produced by the diapausing pupae after injection of bacteria. Cecropia lysozyme is composed of 120 amino acids, has a mol. wt. of 13.8 kd and shows great similarity with vertebrate lysozymes of the chicken type. The amino acid residues responsible for the catalytic activity and for the binding of substrate are essentially conserved. Three allelic variants of the Cecropia enzyme are identified. A comparison of the chicken and the Cecropia lysozymes shows that there is a 40% identity at both the amino acid and the nucleotide level. Some evolutionary aspects of the sequence data are discussed. PMID:16453632

  8. elPrep: High-Performance Preparation of Sequence Alignment/Map Files for Variant Calling.

    PubMed

    Herzeel, Charlotte; Costanza, Pascal; Decap, Dries; Fostier, Jan; Reumers, Joke

    2015-01-01

    elPrep is a high-performance tool for preparing sequence alignment/map files for variant calling in sequencing pipelines. It can be used as a replacement for SAMtools and Picard for preparation steps such as filtering, sorting, marking duplicates, reordering contigs, and so on, while producing identical results. What sets elPrep apart is its software architecture that allows executing preparation pipelines by making only a single pass through the data, no matter how many preparation steps are used in the pipeline. elPrep is designed as a multithreaded application that runs entirely in memory, avoids repeated file I/O, and merges the computation of several preparation steps to significantly speed up the execution time. For example, for a preparation pipeline of five steps on a whole-exome BAM file (NA12878), we reduce the execution time from about 1:40 hours, when using a combination of SAMtools and Picard, to about 15 minutes when using elPrep, while utilising the same server resources, here 48 threads and 23GB of RAM. For the same pipeline on whole-genome data (NA12878), elPrep reduces the runtime from 24 hours to less than 5 hours. As a typical clinical study may contain sequencing data for hundreds of patients, elPrep can remove several hundreds of hours of computing time, and thus substantially reduce analysis time and cost. PMID:26182406
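
    The single-pass idea in this abstract, where several preparation steps are applied to each record in one traversal rather than one traversal per step, can be sketched as follows. This is not elPrep's implementation or API: the record dictionaries, step functions and names are hypothetical, and only per-record steps are shown (sorting and duplicate marking need information across records and are left out).

```python
from typing import Callable, Dict, Iterable, Iterator, List, Optional

Record = Dict[str, object]                   # simplified stand-in for one alignment record
Step = Callable[[Record], Optional[Record]]  # a step returns None to drop the record

UNMAPPED = 0x4  # SAM FLAG bit meaning "segment unmapped"


def drop_unmapped(rec: Record) -> Optional[Record]:
    """Filtering step: discard reads whose FLAG marks them as unmapped."""
    return None if int(rec["flag"]) & UNMAPPED else rec


def strip_chr_prefix(rec: Record) -> Optional[Record]:
    """Renaming step: turn 'chr1' into '1' without mutating the input record."""
    name = str(rec["rname"])
    if name.startswith("chr"):
        rec = {**rec, "rname": name[3:]}
    return rec


def run_single_pass(records: Iterable[Record], steps: List[Step]) -> Iterator[Record]:
    """Apply every step to each record during one traversal of the data."""
    for rec in records:
        current: Optional[Record] = rec
        for step in steps:
            current = step(current)
            if current is None:  # record was filtered out; skip remaining steps
                break
        if current is not None:
            yield current


reads = [
    {"qname": "r1", "flag": 0, "rname": "chr1"},
    {"qname": "r2", "flag": 4, "rname": "*"},
    {"qname": "r3", "flag": 16, "rname": "chr2"},
]
kept = list(run_single_pass(reads, [drop_unmapped, strip_chr_prefix]))
```

    Adding a further per-record step extends the list passed to `run_single_pass` without adding another pass over the input, which is the property the abstract highlights.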

  9. Characterization of functional methylomes by next-generation capture sequencing identifies novel disease-associated variants.

    PubMed

    Allum, Fiona; Shao, Xiaojian; Guénard, Frédéric; Simon, Marie-Michelle; Busche, Stephan; Caron, Maxime; Lambourne, John; Lessard, Julie; Tandre, Karolina; Hedman, Åsa K; Kwan, Tony; Ge, Bing; Rönnblom, Lars; McCarthy, Mark I; Deloukas, Panos; Richmond, Todd; Burgess, Daniel; Spector, Timothy D; Tchernof, André; Marceau, Simon; Lathrop, Mark; Vohl, Marie-Claude; Pastinen, Tomi; Grundberg, Elin

    2015-05-29

    Most genome-wide methylation studies (EWAS) of multifactorial disease traits use targeted arrays or enrichment methodologies preferentially covering CpG-dense regions, to characterize sufficiently large samples. To overcome this limitation, we present here a new customizable, cost-effective approach, methylC-capture sequencing (MCC-Seq), for sequencing functional methylomes, while simultaneously providing genetic variation information. To illustrate MCC-Seq, we use whole-genome bisulfite sequencing on adipose tissue (AT) samples and public databases to design AT-specific panels. We establish its efficiency for high-density interrogation of methylome variability by systematic comparisons with other approaches and demonstrate its applicability by identifying novel methylation variation within enhancers strongly correlated to plasma triglyceride and HDL-cholesterol, including at CD36. Our more comprehensive AT panel assesses tissue methylation and genotypes in parallel at ∼4 and ∼3 M sites, respectively. Our study demonstrates that MCC-Seq provides comparable accuracy to alternative approaches but enables more efficient cataloguing of functional and disease-relevant epigenetic and genetic variants for large-scale EWAS.

  10. Whole-Exome Sequencing Links a Variant in DHDDS to Retinitis Pigmentosa

    PubMed Central

    Züchner, Stephan; Dallman, Julia; Wen, Rong; Beecham, Gary; Naj, Adam; Farooq, Amjad; Kohli, Martin A.; Whitehead, Patrice L.; Hulme, William; Konidari, Ioanna; Edwards, Yvonne J.K.; Cai, Guiqing; Peter, Inga; Seo, David; Buxbaum, Joseph D.; Haines, Jonathan L.; Blanton, Susan; Young, Juan; Alfonso, Eduardo; Vance, Jeffery M.; Lam, Byron L.; Peričak-Vance, Margaret A.

    2011-01-01

    Increasingly, mutations in genes causing Mendelian disease will be supported by individual and small families only; however, exome sequencing studies have thus far focused on syndromic phenotypes characterized by low locus heterogeneity. In contrast, retinitis pigmentosa (RP) is caused by >50 known genes, which still explain only half of the clinical cases. In a single, one-generation, nonsyndromic RP family, we have identified a gene, dehydrodolichol diphosphate synthase (DHDDS), demonstrating the power of combining whole-exome sequencing with rapid in vivo studies. DHDDS is a highly conserved essential enzyme for dolichol synthesis, permitting global N-linked glycosylation. Zebrafish studies showed virtually identical photoreceptor defects as observed with N-linked glycosylation-interfering mutations in the light-sensing protein rhodopsin. The identified Lys42Glu variant likely arose from an ancestral founder, because eight of the nine identified alleles in 27,174 control chromosomes were of confirmed Ashkenazi Jewish ethnicity. These findings demonstrate the power of exome sequencing linked to functional studies when faced with challenging study designs and, importantly, link RP to the pathways of N-linked glycosylation, which promise new avenues for therapeutic interventions. PMID:21295283

  11. Characterization of functional methylomes by next-generation capture sequencing identifies novel disease-associated variants

    PubMed Central

    Allum, Fiona; Shao, Xiaojian; Guénard, Frédéric; Simon, Marie-Michelle; Busche, Stephan; Caron, Maxime; Lambourne, John; Lessard, Julie; Tandre, Karolina; Hedman, Åsa K.; Kwan, Tony; Ge, Bing; Rönnblom, Lars; McCarthy, Mark I.; Deloukas, Panos; Richmond, Todd; Burgess, Daniel; Spector, Timothy D.; Tchernof, André; Marceau, Simon; Lathrop, Mark; Vohl, Marie-Claude; Pastinen, Tomi; Grundberg, Elin; Ahmadi, Kourosh R.; Ainali, Chrysanthi; Barrett, Amy; Bataille, Veronique; Bell, Jordana T.; Buil, Alfonso; Dermitzakis, Emmanouil T.; Dimas, Antigone S.; Durbin, Richard; Glass, Daniel; Hassanali, Neelam; Ingle, Catherine; Knowles, David; Krestyaninova, Maria; Lindgren, Cecilia M.; Lowe, Christopher E.; Meduri, Eshwar; di Meglio, Paola; Min, Josine L.; Montgomery, Stephen B.; Nestle, Frank O.; Nica, Alexandra C.; Nisbet, James; O'Rahilly, Stephen; Parts, Leopold; Potter, Simon; Sandling, Johanna; Sekowska, Magdalena; Shin, So-Youn; Small, Kerrin S.; Soranzo, Nicole; Surdulescu, Gabriela; Travers, Mary E.; Tsaprouni, Loukia; Tsoka, Sophia; Wilk, Alicja; Yang, Tsun-Po; Zondervan, Krina T.

    2015-01-01

    Most genome-wide methylation studies (EWAS) of multifactorial disease traits use targeted arrays or enrichment methodologies preferentially covering CpG-dense regions, to characterize sufficiently large samples. To overcome this limitation, we present here a new customizable, cost-effective approach, methylC-capture sequencing (MCC-Seq), for sequencing functional methylomes, while simultaneously providing genetic variation information. To illustrate MCC-Seq, we use whole-genome bisulfite sequencing on adipose tissue (AT) samples and public databases to design AT-specific panels. We establish its efficiency for high-density interrogation of methylome variability by systematic comparisons with other approaches and demonstrate its applicability by identifying novel methylation variation within enhancers strongly correlated to plasma triglyceride and HDL-cholesterol, including at CD36. Our more comprehensive AT panel assesses tissue methylation and genotypes in parallel at ∼4 and ∼3 M sites, respectively. Our study demonstrates that MCC-Seq provides comparable accuracy to alternative approaches but enables more efficient cataloguing of functional and disease-relevant epigenetic and genetic variants for large-scale EWAS. PMID:26021296

  13. elPrep: High-Performance Preparation of Sequence Alignment/Map Files for Variant Calling

    PubMed Central

    Decap, Dries; Fostier, Jan; Reumers, Joke

    2015-01-01

    elPrep is a high-performance tool for preparing sequence alignment/map files for variant calling in sequencing pipelines. It can be used as a replacement for SAMtools and Picard for preparation steps such as filtering, sorting, marking duplicates, reordering contigs, and so on, while producing identical results. What sets elPrep apart is its software architecture that allows executing preparation pipelines by making only a single pass through the data, no matter how many preparation steps are used in the pipeline. elPrep is designed as a multithreaded application that runs entirely in memory, avoids repeated file I/O, and merges the computation of several preparation steps to significantly speed up the execution time. For example, for a preparation pipeline of five steps on a whole-exome BAM file (NA12878), we reduce the execution time from about 1:40 hours, when using a combination of SAMtools and Picard, to about 15 minutes when using elPrep, while utilising the same server resources, here 48 threads and 23GB of RAM. For the same pipeline on whole-genome data (NA12878), elPrep reduces the runtime from 24 hours to less than 5 hours. As a typical clinical study may contain sequencing data for hundreds of patients, elPrep can remove several hundreds of hours of computing time, and thus substantially reduce analysis time and cost. PMID:26182406

  14. Leveraging long read sequencing from a single individual to provide a comprehensive resource for benchmarking variant calling methods.

    PubMed

    Mu, John C; Tootoonchi Afshar, Pegah; Mohiyuddin, Marghoob; Chen, Xi; Li, Jian; Bani Asadi, Narges; Gerstein, Mark B; Wong, Wing H; Lam, Hugo Y K

    2015-09-28

    A high-confidence, comprehensive human variant set is critical in assessing accuracy of sequencing algorithms, which are crucial in precision medicine based on high-throughput sequencing. Although recent works have attempted to provide such a resource, they still do not encompass all major types of variants including structural variants (SVs). Thus, we leveraged the massive high-quality Sanger sequences from the HuRef genome to construct by far the most comprehensive gold set of a single individual, which was cross validated with deep Illumina sequencing, population datasets, and well-established algorithms. It was a necessary effort to completely reanalyze the HuRef genome as its previously published variants were mostly reported five years ago, suffering from compatibility, organization, and accuracy issues that prevent their direct use in benchmarking. Our extensive analysis and validation resulted in a gold set with high specificity and sensitivity. In contrast to the current gold sets of the NA12878 or HS1011 genomes, our gold set is the first that includes small variants, deletion SVs and insertion SVs up to a hundred thousand base-pairs. We demonstrate the utility of our HuRef gold set to benchmark several published SV detection tools.

  15. Identification of Genome-Wide Variants and Discovery of Variants Associated with Brassica rapa Clubroot Resistance Gene Rcr1 through Bulked Segregant RNA Sequencing.

    PubMed

    Yu, Fengqun; Zhang, Xingguo; Huang, Zhen; Chu, Mingguang; Song, Tao; Falk, Kevin C; Deora, Abhinandan; Chen, Qilin; Zhang, Yan; McGregor, Linda; Gossen, Bruce D; McDonald, Mary Ruth; Peng, Gary

    2016-01-01

    Clubroot, caused by Plasmodiophora brassicae, is an important disease on Brassica species worldwide. A clubroot resistance gene, Rcr1, with efficacy against pathotype 3 of P. brassicae, was previously mapped to chromosome A03 of B. rapa in pak choy cultivar "Flower Nabana". In the current study, resistance to pathotypes 2, 5 and 6 was shown to be associated with the Rcr1 region on chromosome A03. Bulked segregant RNA sequencing was performed and short read sequences were assembled into 10 chromosomes of the B. rapa reference genome v1.5. For the resistant (R) bulks, a total of 351.8 million (M) sequences, 30,836.5 million bases (Mb) in length, produced 120-fold coverage of the reference genome. For the susceptible (S) bulks, 322.9 M sequences, 28,216.6 Mb in length, produced 109-fold coverage. In total, 776.2 K single nucleotide polymorphisms (SNPs) and 122.2 K insertions/deletions (InDels) in R bulks and 762.8 K SNPs and 118.7 K InDels in S bulks were identified; each chromosome had about 87% SNPs and 13% InDels, with 78% monomorphic and 22% polymorphic variants between the R and S bulks. Polymorphic variants on each chromosome were usually below 23%, but made up 34% of the variants on chromosome A03. There were 35 genes annotated in the Rcr1 target region and variants were identified in 21 genes. The numbers of polymorphic variants differed significantly among the genes. Four of them encode Toll-Interleukin-1 receptor / nucleotide-binding site / leucine-rich-repeat proteins; Bra019409 and Bra019410 harbored the higher numbers of polymorphic variants, which indicates that they are more likely candidates of Rcr1. Fourteen SNP markers in the target region were genotyped using the Kompetitive Allele Specific PCR method and were confirmed to associate with Rcr1. Selected SNP markers were analyzed with 26 recombinants obtained from a segregating population consisting of 1587 plants, indicating that they were completely linked to Rcr1. Nine SNP markers were used for marker

  17. Identification of Genome-Wide Variants and Discovery of Variants Associated with Brassica rapa Clubroot Resistance Gene Rcr1 through Bulked Segregant RNA Sequencing

    PubMed Central

    Yu, Fengqun; Zhang, Xingguo; Huang, Zhen; Chu, Mingguang; Song, Tao; Falk, Kevin C.; Deora, Abhinandan; Chen, Qilin; Zhang, Yan; McGregor, Linda; Gossen, Bruce D.; McDonald, Mary Ruth; Peng, Gary

    2016-01-01

    Clubroot, caused by Plasmodiophora brassicae, is an important disease on Brassica species worldwide. A clubroot resistance gene, Rcr1, with efficacy against pathotype 3 of P. brassicae, was previously mapped to chromosome A03 of B. rapa in pak choy cultivar “Flower Nabana”. In the current study, resistance to pathotypes 2, 5 and 6 was shown to be associated with the Rcr1 region on chromosome A03. Bulked segregant RNA sequencing was performed and short read sequences were assembled into 10 chromosomes of the B. rapa reference genome v1.5. For the resistant (R) bulks, a total of 351.8 million (M) sequences, 30,836.5 million bases (Mb) in length, produced 120-fold coverage of the reference genome. For the susceptible (S) bulks, 322.9 M sequences, 28,216.6 Mb in length, produced 109-fold coverage. In total, 776.2 K single nucleotide polymorphisms (SNPs) and 122.2 K insertions/deletions (InDels) in R bulks and 762.8 K SNPs and 118.7 K InDels in S bulks were identified; each chromosome had about 87% SNPs and 13% InDels, with 78% monomorphic and 22% polymorphic variants between the R and S bulks. Polymorphic variants on each chromosome were usually below 23%, but made up 34% of the variants on chromosome A03. There were 35 genes annotated in the Rcr1 target region and variants were identified in 21 genes. The numbers of polymorphic variants differed significantly among the genes. Four of them encode Toll-Interleukin-1 receptor / nucleotide-binding site / leucine-rich-repeat proteins; Bra019409 and Bra019410 harbored the higher numbers of polymorphic variants, which indicates that they are more likely candidates of Rcr1. Fourteen SNP markers in the target region were genotyped using the Kompetitive Allele Specific PCR method and were confirmed to associate with Rcr1. Selected SNP markers were analyzed with 26 recombinants obtained from a segregating population consisting of 1587 plants, indicating that they were completely linked to Rcr1. Nine SNP markers were used for marker
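
    The coverage and variant-class figures quoted in this abstract can be cross-checked with simple arithmetic, assuming fold coverage is total sequenced bases divided by reference size. The sketch below uses only numbers from the abstract; the function and variable names are illustrative, and the SNP share is computed over the whole bulk rather than per chromosome.

```python
def implied_reference_size_mb(total_bases_mb: float, fold_coverage: float) -> float:
    """Reference size implied by coverage = total bases / reference size."""
    return total_bases_mb / fold_coverage


def snp_fraction(snps_k: float, indels_k: float) -> float:
    """Share of SNPs among all called variants (SNPs + InDels)."""
    return snps_k / (snps_k + indels_k)


# Resistant (R) and susceptible (S) bulks, values as quoted in the abstract.
r_size_mb = implied_reference_size_mb(30836.5, 120)
s_size_mb = implied_reference_size_mb(28216.6, 109)

r_snp_share = snp_fraction(776.2, 122.2)
s_snp_share = snp_fraction(762.8, 118.7)

print(f"Implied reference size: R {r_size_mb:.1f} Mb, S {s_size_mb:.1f} Mb")
print(f"SNP share of variants: R {r_snp_share:.1%}, S {s_snp_share:.1%}")
```

    Both bulks imply a reference of roughly 257 to 259 Mb, and the overall SNP share of about 86% is close to the per-chromosome figure of about 87% given in the abstract.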

  18. Dual-color detection of DNA sequence variants by ligase-mediated analysis

    SciTech Connect

    Samiotaki, M.; Kwiatkowski, M.; Parik, J.; Landegren, U.

    1994-03-15

    Genetic screening for sequence variants associated with disease is assuming increasing importance in clinical medicine as well as in research. The authors describe an efficient method for such analyses, comprising a combination of practical features: (1) Amplified DNA samples are analyzed for their ability to serve as templates in standardized allele-specific ligation reactions between oligonucleotide probes; (2) Two allele-specific probes, differentially labeled with either of two lanthanide labels, compete for ligation to a third oligonucleotide (the signal from the two labeled probes can thus be directly compared in a sensitive time-resolved fluorescence detection reaction); and (3) Large sets of analyses are processed in parallel using a 96-pin capture manifold, serving to reduce pipetting steps and the risk of contamination. The authors present here the basis of the technique and its application to the screening for two common mutations causing cystic fibrosis and α1-antitrypsin deficiency. 19 refs., 4 figs.

  19. High-Throughput Sequencing Reveals Single Nucleotide Variants in Longer-Kernel Bread Wheat

    PubMed Central

    Chen, Feng; Zhu, Zibo; Zhou, Xiaobian; Yan, Yan; Dong, Zhongdong; Cui, Dangqun

    2016-01-01

    The transcriptomes of bread wheat Yunong 201 and its ethyl methanesulfonate derivative Yunong 3114 were obtained by next-generation sequencing technology. Single nucleotide variants (SNVs) in the wheat strains were explored and compared. A total of 5907 and 6287 non-synonymous SNVs were acquired for Yunong 201 and 3114, respectively. A total of 4021 genes with SNVs were obtained. The genes that underwent non-synonymous SNVs were significantly involved in ATP binding, protein phosphorylation, and cellular protein metabolic process. The heat map analysis also indicated that most of these mutant genes were significantly differentially expressed at different developmental stages. The SNVs in these genes possibly contribute to the longer kernel length of Yunong 3114. Our data provide useful information on wheat transcriptome for future studies on wheat functional genomics. This study could also help in illustrating the gene functions of the non-synonymous SNVs of Yunong 201 and 3114. PMID:27551288

  20. High-Throughput Sequencing Reveals Single Nucleotide Variants in Longer-Kernel Bread Wheat.

    PubMed

    Chen, Feng; Zhu, Zibo; Zhou, Xiaobian; Yan, Yan; Dong, Zhongdong; Cui, Dangqun

    2016-01-01

    The transcriptomes of bread wheat Yunong 201 and its ethyl methanesulfonate derivative Yunong 3114 were obtained by next-generation sequencing technology. Single nucleotide variants (SNVs) in the wheat strains were explored and compared. A total of 5907 and 6287 non-synonymous SNVs were acquired for Yunong 201 and 3114, respectively. A total of 4021 genes with SNVs were obtained. The genes that underwent non-synonymous SNVs were significantly involved in ATP binding, protein phosphorylation, and cellular protein metabolic process. The heat map analysis also indicated that most of these mutant genes were significantly differentially expressed at different developmental stages. The SNVs in these genes possibly contribute to the longer kernel length of Yunong 3114. Our data provide useful information on wheat transcriptome for future studies on wheat functional genomics. This study could also help in illustrating the gene functions of the non-synonymous SNVs of Yunong 201 and 3114. PMID:27551288

  1. Associations between variants of FADS genes and omega-3 and omega-6 milk fatty acids of Canadian Holstein cows

    PubMed Central

    2014-01-01

    Background: Fatty acid desaturase 1 (FADS1) and 2 (FADS2) genes code respectively for the enzymes delta-5 and delta-6 desaturases, which are rate-limiting enzymes in the synthesis of polyunsaturated omega-3 and omega-6 fatty acids (FAs). Omega-3 and -6 FAs as well as conjugated linoleic acid (CLA) are present in bovine milk and have demonstrated positive health effects in humans. Studies in humans have shown significant relationships between genetic variants in FADS1 and 2 genes with plasma and tissue concentrations of omega-3 and -6 FAs. The aim of this study was to evaluate the extent of sequence variations within these two genes in Canadian Holstein cows as well as the association between sequence variants and health promoting FAs in milk. Results: Thirty-three SNPs were detected within the studied regions of genes including a synonymous mutation (FADS1-07, rs42187261, 306Tyr > Tyr) in exon 8 of FADS1, a non-synonymous mutation (FADS2-14, rs211580559, 294Ala > Val) within FADS2 exon 7, a splice site SNP (FADS2-05, rs211263660), a 3′UTR SNP (FADS2-23, rs109772589), and another 3′UTR SNP with an effect on a microRNA binding site within FADS2 gene (FADS2-19, rs210169303). Association analyses showed significant relations between three out of seven tested SNPs and several FAs. Significant associations (FDR P < 0.05) were recorded between FADS2-23 (rs109772589) and two omega-6 FAs (dihomo-gamma-linolenic acid [C20:3n6] and arachidonic acid [C20:4n6]), FADS1-07 (rs42187261) and one omega-3 FA (eicosapentaenoic acid, C20:5n3) and tricosanoic acid (C23:0), and one intronic SNP, FADS1-01 (rs136261927) and C20:3n6. Conclusion: Our study has demonstrated positive associations between three SNPs within FADS1 and FADS2 genes (a SNP within the 3′UTR, a synonymous SNP and an intronic SNP), with three milk PUFAs of Canadian Holstein cows, thus suggesting possible involvement of synonymous and non-coding region variants in FA synthesis. These SNPs may serve as

  2. Direct mutation analysis by high-throughput sequencing: from germline to low-abundant, somatic variants

    PubMed Central

    Gundry, Michael; Vijg, Jan

    2011-01-01

    DNA mutations are the source of genetic variation within populations. The majority of mutations with observable effects are deleterious. In humans mutations in the germ line can cause genetic disease. In somatic cells multiple rounds of mutations and selection lead to cancer. The study of genetic variation has progressed rapidly since the completion of the draft sequence of the human genome. Recent advances in sequencing technology, most importantly the introduction of massively parallel sequencing (MPS), have resulted in more than a hundred-fold reduction in the time and cost required for sequencing nucleic acids. These improvements have greatly expanded the use of sequencing as a practical tool for mutation analysis. While in the past the high cost of sequencing limited mutation analysis to selectable markers or small forward mutation targets assumed to be representative for the genome overall, current platforms allow whole genome sequencing for less than $5,000. This has already given rise to direct estimates of germline mutation rates in multiple organisms including humans by comparing whole genome sequences between parents and offspring. Here we present a brief history of the field of mutation research, with a focus on classical tools for the measurement of mutation rates. We then review MPS, how it is currently applied and the new insight into human and animal mutation frequencies and spectra that has been obtained from whole genome sequencing. While great progress has been made, we note that the single most important limitation of current MPS approaches for mutation analysis is the inability to address low-abundance mutations that turn somatic tissues into mosaics of cells. Such mutations are at the basis of intra-tumor heterogeneity, with important implications for clinical diagnosis, and could also contribute to somatic diseases other than cancer, including aging. Some possible approaches to gain access to low-abundance mutations are discussed, with a

  3. HapFABIA: identification of very short segments of identity by descent characterized by rare variants in large sequencing data.

    PubMed

    Hochreiter, Sepp

    2013-12-01

    Identity by descent (IBD) can be reliably detected for long shared DNA segments, which are found in related individuals. However, many studies contain cohorts of unrelated individuals that share only short IBD segments. New sequencing technologies facilitate identification of short IBD segments through rare variants, which convey more information on IBD than common variants. Current IBD detection methods, however, are not designed to use rare variants for the detection of short IBD segments. Short IBD segments reveal genetic structures at high resolution. Therefore, they can help to improve imputation and phasing, to increase genotyping accuracy for low-coverage sequencing and to increase the power of association studies. Since short IBD segments are further assumed to be old, they can shed light on the evolutionary history of humans. We propose HapFABIA, a computational method that applies biclustering to identify very short IBD segments characterized by rare variants. HapFABIA is designed to detect short IBD segments in genotype data that were obtained from next-generation sequencing, but can also be applied to DNA microarray data. Especially in next-generation sequencing data, HapFABIA exploits rare variants for IBD detection. HapFABIA significantly outperformed competing algorithms at detecting short IBD segments on artificial and simulated data with rare variants. HapFABIA identified 160 588 different short IBD segments characterized by rare variants with a median length of 23 kb (mean 24 kb) in data for chromosome 1 of the 1000 Genomes Project. These short IBD segments contain 752 000 single nucleotide variants (SNVs), which account for 39% of the rare variants and 23.5% of all variants. The vast majority (152 000 IBD segments) are shared by Africans, while only 19 000 and 11 000 are shared by Europeans and Asians, respectively. IBD segments that match the Denisova or the Neandertal genome are found significantly more often in Asians and Europeans but also, in

  4. Mapping whole genome shotgun sequence and variant calling in mammalian species without their reference genomes.

    PubMed

    Kalbfleisch, Ted; Heaton, Michael P

    2013-01-01

    Genomics research in mammals has produced reference genome sequences that are essential for identifying variation associated with disease. High quality reference genome sequences are now available for humans, model species, and economically important agricultural animals. Comparisons between these species have provided unique insights into mammalian gene function. However, the number of species with reference genomes is small compared to those needed for studying molecular evolutionary relationships in the tree of life. For example, among the even-toed ungulates there are approximately 300 species whose phylogenetic relationships have been calculated in the 10k trees project. Only six of these have reference genomes: cattle, swine, sheep, goat, water buffalo, and bison. Although reference sequences will eventually be developed for additional hoof stock, the resources in terms of time, money, infrastructure and expertise required to develop a quality reference genome may be unattainable for most species for at least another decade. In this work we mapped 35 Gb of next generation sequence data of a Katahdin sheep to its own species' reference genome (Ovis aries Oar3.1) and to that of a species that diverged 15 to 30 million years ago (Bos taurus UMD3.1). In total, 56% of reads covered 76% of UMD3.1 to an average depth of 6.8 reads per site; 83 million variants were identified, of which 78 million were homozygous and likely represent interspecies nucleotide differences. Excluding repeat regions and sex chromosomes, nearly 3.7 million heterozygous sites were identified in this animal vs. bovine UMD3.1, representing polymorphisms occurring in sheep. Of these, 41% could be readily mapped to orthologous positions in ovine Oar3.1 with 80% corroborated as heterozygous. These variant sites, identified via interspecies mapping could be used for comparative genomics, disease association studies, and ultimately to understand mammalian gene

  5. Recurrent triploidy due to a failure to complete maternal meiosis II: whole-exome sequencing reveals candidate variants.

    PubMed

    Filges, I; Manokhina, I; Peñaherrera, M S; McFadden, D E; Louie, K; Nosova, E; Friedman, J M; Robinson, W P

    2015-04-01

    Triploidy is a relatively common cause of miscarriage; however, recurrent triploidy has rarely been reported. A healthy 34-year-old woman was ascertained because of 18 consecutive miscarriages with triploidy found in all 5 karyotyped losses. Molecular results in a sixth loss were also consistent with triploidy. Genotyping of markers near the centromere on multiple chromosomes suggested that all six triploid conceptuses occurred as a result of failure to complete meiosis II (MII). The proband's mother had also experienced recurrent miscarriage, with a total of 18 miscarriages. Based on the hypothesis that an inherited autosomal-dominant maternal predisposition would explain the phenotype, whole-exome sequencing of the proband and her parents was undertaken to identify potential candidate variants. After filtering for quality and rarity, potentially damaging variants shared between the proband and her mother were identified in 47 genes. Variants in genes coding for proteins implicated in oocyte maturation, oocyte activation or polar body extrusion were then prioritized. Eight of the most promising candidate variants were confirmed by Sanger sequencing. These included a novel change in the PLCD4 gene, and a rare variant in the OSBPL5 gene, which have been implicated in oocyte activation upon fertilization and completion of MII. Several variants in genes coding proteins playing a role in oocyte maturation and early embryonic development were also identified. The genes identified may be candidates for the study in other women experiencing recurrent triploidy or recurrent IVF failure.

  6. Prioritization Of Nonsynonymous Single Nucleotide Variants For Exome Sequencing Studies Via Integrative Learning On Multiple Genomic Data

    PubMed Central

    Wu, Mengmeng; Wu, Jiaxin; Chen, Ting; Jiang, Rui

    2015-01-01

    The rapid advancement of next generation sequencing technology has greatly accelerated progress in understanding human inherited diseases via such innovations as exome sequencing. Nevertheless, the identification of causative variants from sequencing data remains a great challenge. Traditional statistical genetics approaches such as linkage analysis and association studies have limited power in analyzing exome sequencing data, while relying simply on filtration strategies and predicted functional implications of mutations to pinpoint pathogenic variants is prone to producing false positives. To overcome these limitations, we herein propose a supervised learning approach, termed snvForest, to prioritize candidate nonsynonymous single nucleotide variants for a specific type of disease by integrating 11 functional scores at the variant level and 8 association scores at the gene level. We conduct a series of large-scale in silico validation experiments, demonstrating the effectiveness of snvForest across 2,511 diseases of different inheritance styles and the superiority of our approach over two state-of-the-art methods. We further apply snvForest to three real exome sequencing data sets of epileptic encephalopathies and intellectual disability to show the ability of our approach to identify causative de novo mutations for these complex diseases. The online service and standalone software of snvForest are available at http://bioinfo.au.tsinghua.edu.cn/jianglab/snvforest. PMID:26459872

  7. Prioritization Of Nonsynonymous Single Nucleotide Variants For Exome Sequencing Studies Via Integrative Learning On Multiple Genomic Data.

    PubMed

    Wu, Mengmeng; Wu, Jiaxin; Chen, Ting; Jiang, Rui

    2015-01-01

    The rapid advancement of next generation sequencing technology has greatly accelerated progress in understanding human inherited diseases via such innovations as exome sequencing. Nevertheless, the identification of causative variants from sequencing data remains a great challenge. Traditional statistical genetics approaches such as linkage analysis and association studies have limited power in analyzing exome sequencing data, while relying simply on filtration strategies and predicted functional implications of mutations to pinpoint pathogenic variants is prone to producing false positives. To overcome these limitations, we herein propose a supervised learning approach, termed snvForest, to prioritize candidate nonsynonymous single nucleotide variants for a specific type of disease by integrating 11 functional scores at the variant level and 8 association scores at the gene level. We conduct a series of large-scale in silico validation experiments, demonstrating the effectiveness of snvForest across 2,511 diseases of different inheritance styles and the superiority of our approach over two state-of-the-art methods. We further apply snvForest to three real exome sequencing data sets of epileptic encephalopathies and intellectual disability to show the ability of our approach to identify causative de novo mutations for these complex diseases. The online service and standalone software of snvForest are available at http://bioinfo.au.tsinghua.edu.cn/jianglab/snvforest. PMID:26459872

  8. Predicting intrinsic disorder from amino acid sequence.

    PubMed

    Obradovic, Zoran; Peng, Kang; Vucetic, Slobodan; Radivojac, Predrag; Brown, Celeste J; Dunker, A Keith

    2003-01-01

    Blind predictions of intrinsic order and disorder were made on 42 proteins subsequently revealed to contain 9,044 ordered residues, 284 disordered residues in 26 segments of length 30 residues or less, and 281 disordered residues in 2 disordered segments of length greater than 30 residues. The accuracies of the six predictors used in this experiment ranged from 77% to 91% for the ordered regions and from 56% to 78% for the disordered segments. The average of the order and disorder predictions ranged from 73% to 77%. The prediction of disorder in the shorter segments was poor, from 25% to 66% correct, while the prediction of disorder in the longer segments was better, from 75% to 95% correct. Four of the predictors were composed of ensembles of neural networks. This enabled them to deal more efficiently with the large asymmetry in the training data through diversified sampling from the significantly larger ordered set and achieve better accuracy on ordered and long disordered regions. The exclusive use of long disordered regions for predictor training likely contributed to the disparity of the predictions on long versus short disordered regions, while averaging the output values over 61-residue windows to eliminate short predictions of order or disorder probably contributed to the even greater disparity for three of the predictors. This experiment supports the predictability of intrinsic disorder from amino acid sequence. PMID:14579347
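
    The reported "average of the order and disorder predictions" can be read as an unweighted mean of the two per-class accuracies, which matters here because ordered residues greatly outnumber disordered ones. The sketch below illustrates that reading; the residue counts come from the abstract, but pairing 91% order accuracy with 56% disorder accuracy for a single predictor is a hypothetical choice, and treating the reported average as exactly this mean is an assumption.

```python
# Residue counts taken from the abstract above.
N_ORDERED = 9_044
N_DISORDERED = 284 + 281  # short-segment plus long-segment disordered residues


def balanced_accuracy(acc_order: float, acc_disorder: float) -> float:
    """Unweighted mean of the per-class accuracies."""
    return (acc_order + acc_disorder) / 2


def residue_weighted_accuracy(acc_order: float, acc_disorder: float,
                              n_order: int = N_ORDERED,
                              n_disorder: int = N_DISORDERED) -> float:
    """Overall fraction of residues predicted correctly."""
    correct = acc_order * n_order + acc_disorder * n_disorder
    return correct / (n_order + n_disorder)


# Hypothetical predictor: the pairing of these two values is an assumption.
acc_order, acc_disorder = 0.91, 0.56
balanced = balanced_accuracy(acc_order, acc_disorder)
weighted = residue_weighted_accuracy(acc_order, acc_disorder)
print(f"balanced accuracy:         {balanced:.3f}")
print(f"residue-weighted accuracy: {weighted:.3f}")
```

    Because the ordered class dominates, the residue-weighted figure sits much closer to the order accuracy, whereas the unweighted mean gives both classes equal say.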

  9. Introduction to Deep Sequencing and Its Application to Drug Addiction Research with a Focus on Rare Variants

    PubMed Central

    Wang, Shaolin; Yang, Zhongli; Ma, Jennie Z.; Payne, Thomas J.; Li, Ming D

    2013-01-01

    Through linkage analysis, candidate gene approaches, and genome-wide association studies (GWAS), many genetic susceptibility factors for substance dependence have been discovered, such as the aldehyde dehydrogenase 2 gene (ALDH2) for alcohol dependence (AD) and nicotinic acetylcholine receptor (nAChR) subunit variants on chromosomes 8 and 15 for nicotine dependence (ND). However, these confirmed genetic factors contribute only a small portion of the heritability responsible for each addiction. Among many potential factors, rare variants in those identified and unidentified susceptibility genes are thought to contribute greatly to the missing heritability. Several studies focusing on rare variants have been conducted by taking advantage of next-generation sequencing technologies, which revealed that some rare variants of nAChR subunits are associated with ND in both genetic and functional studies. However, these studies investigated variants for only a small number of genes and need to be expanded to broad regions/genes in a larger population. This review presents an update on recently developed methods for rare-variant identification and association analysis and on studies focused on rare-variant discovery and function related to addictions. PMID:23990377

  10. Genome-wide association study reveals five nucleotide sequence variants for carcass traits in beef cattle.

    PubMed

    Kim, Y; Ryu, J; Woo, J; Kim, J B; Kim, C Y; Lee, C

    2011-08-01

    Genetic associations of nucleotide sequence variants with carcass traits in beef cattle were investigated using a genome-wide single nucleotide polymorphism (SNP) assay. Three hundred and thirteen Korean cattle were genotyped with the Illumina BovineSNP50 BeadChip, and 39,129 SNPs from 311 animals were analysed for each carcass phenotype after filtering by quality assurance. Five sequence markers were associated with one of the meat quantity or quality traits: rs109593638 on chromosome 3 with marbling score, rs109821175 on chromosome 11 and rs110862496 on chromosome 13 with backfat thickness (BFT), and rs110228023 on chromosome 6 and rs110201414 on chromosome 16 with eye muscle area (EMA) (P < 1.27 × 10^-6, Bonferroni P < 0.05). The ss96319521 SNP, located within dishevelled homolog 1 (DVL1), a gene with functions in muscle development, would be a desirable candidate marker. Individuals with genotype CC at this gene appeared to have increases in both EMA and carcass weight. Fine-mapping would be required to refine each of the five association signals shown in the current study for future application in marker-assisted selection for genetic improvement of beef quality and quantity.
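
    The per-SNP significance threshold quoted above appears consistent with a Bonferroni correction, in which the family-wise error level is divided by the number of tests. The following is a minimal sketch of that arithmetic; the function names are illustrative, and the assumption that the study divided 0.05 by exactly the 39,129 analysed SNPs is ours rather than something the abstract states.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test P-value threshold that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def bonferroni_adjust(p_value: float, n_tests: int) -> float:
    """Bonferroni-adjusted P-value, capped at 1."""
    return min(1.0, p_value * n_tests)


# Assumed inputs: family-wise alpha of 0.05 and the 39,129 SNPs analysed.
threshold = bonferroni_threshold(0.05, 39_129)
print(f"per-SNP threshold: {threshold:.2e}")

# A raw P-value just under the threshold stays below 0.05 after adjustment.
print(bonferroni_adjust(1.0e-6, 39_129) < 0.05)
```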

  11. Assessing pathogenicity for novel mutation/sequence variants: the value of healthy older individuals.

    PubMed

    Zatz, Mayana; Pavanello, Rita de Cassia M; Lourenço, Naila Cristina V; Cerqueira, Antonia; Lazar, Monize; Vainzof, Mariz

    2012-12-01

    Improvements in DNA technology are increasingly revealing unexpected or unknown mutations in healthy persons and generating anxiety because their health consequences are still unknown. We report a 10-year-old girl with bilateral coloboma and hearing loss, but without muscle weakness, in whom whole-genome CGH revealed a deletion of exons 38-44 in the dystrophin gene. This mutation was inherited from her healthy, asymptomatic 44-year-old father, who was further clinically and molecularly evaluated for prognosis and genetic counseling (GC). We have never identified this deletion in 982 Duchenne/Becker patients. To assess whether the present case represents a rare case of non-penetrance, and aiming to obtain more information for prognosis and GC, we suggested that healthy older relatives submit their DNA for analysis, and several agreed. Mutation analysis revealed that the father's mother, brother, and 56-year-old maternal uncle also carry the 38-44 deletion, suggesting that it is an unlikely cause of muscle weakness. Genome sequencing will disclose mutations and variants whose health impact is still unknown, raising important problems in interpreting results, defining prognosis, and discussing GC. We suggest that, in addition to family history, keeping the DNA of older relatives could be very informative, in particular for those interested in having their genome sequenced.

  12. Pooled Sequencing of Candidate Genes Implicates Rare Variants in the Development of Asthma Following Severe RSV Bronchiolitis in Infancy.

    PubMed

    Torgerson, Dara G; Giri, Tusar; Druley, Todd E; Zheng, Jie; Huntsman, Scott; Seibold, Max A; Young, Andrew L; Schweiger, Toni; Yin-Declue, Huiqing; Sajol, Geneline D; Schechtman, Kenneth B; Hernandez, Ryan D; Randolph, Adrienne G; Bacharier, Leonard B; Castro, Mario

    2015-01-01

    Severe infection with respiratory syncytial virus (RSV) during infancy is strongly associated with the development of asthma. To identify genetic variation that contributes to asthma following severe RSV bronchiolitis during infancy, we sequenced the coding exons of 131 asthma candidate genes in 182 European and African American children with severe RSV bronchiolitis in infancy using anonymous pools for variant discovery, and then directly genotyped a set of 190 nonsynonymous variants. Association testing was performed for physician-diagnosed asthma before the 7th birthday (asthma) using genotypes from 6,500 individuals from the Exome Sequencing Project (ESP) as controls to gain statistical power. In addition, among patients with severe RSV bronchiolitis during infancy, we examined genetic associations with asthma, active asthma, persistent wheeze, and bronchial hyperreactivity (methacholine PC20) at age 6 years. We identified four rare nonsynonymous variants that were significantly associated with asthma following severe RSV bronchiolitis, including single variants in ADRB2, FLG and NCAM1 in European Americans (p = 4.6 × 10^-4, 1.9 × 10^-13 and 5.0 × 10^-5, respectively), and NOS1 in African Americans (p = 2.3 × 10^-11). One of the variants was a highly functional nonsynonymous variant in ADRB2 (rs1800888), which was also nominally associated with asthma (p = 0.027) and active asthma (p = 0.013) among European Americans with severe RSV bronchiolitis without including the ESP. Our results suggest that rare nonsynonymous variants contribute to the development of asthma following severe RSV bronchiolitis in infancy, notably in ADRB2. Additional studies are required to explore the role of rare variants in the etiology of asthma and asthma-related traits following severe RSV bronchiolitis.

  13. Analysis of ANK3 and CACNA1C variants identified in bipolar disorder whole genome sequence data

    PubMed Central

    Fiorentino, Alessia; O'Brien, Niamh Louise; Locke, Devin Paul; McQuillin, Andrew; Jarram, Alexandra; Anjorin, Adebayo; Kandaswamy, Radhika; Curtis, David; Blizard, Robert Alan; Gurling, Hugh Malcolm Douglas

    2014-01-01

    Objectives: Genetic markers in the genes encoding ankyrin 3 (ANK3) and the α-calcium channel subunit (CACNA1C) are associated with bipolar disorder (BP). The associated variants in the CACNA1C gene are mainly within intron 3 of the gene. ANK3 BP-associated variants are in two distinct clusters at the ends of the gene, indicating disease allele heterogeneity. Methods: In order to screen both coding and non-coding regions to identify potential aetiological variants, we used whole-genome sequencing in 99 BP cases. Variants with markedly different allele frequencies in the BP samples and the 1000 Genomes Project European data were genotyped in 1,510 BP cases and 1,095 controls. Results: We found that the CACNA1C intron 3 variant, rs79398153, potentially affecting an ENCyclopedia of DNA Elements (ENCODE)-defined region, showed an association with BP (p = 0.015). We also found the ANK3 BP-associated variant rs139972937, responsible for an asparagine to serine change (p = 0.042). However, a previous study had not found support for an association between rs139972937 and BP. The variants at ANK3 and CACNA1C previously known to be associated with BP were not in linkage disequilibrium with either of the two variants that we identified and these are therefore independent of the previous haplotypes implicated by genome-wide association. Conclusions: Sequencing in additional BP samples is needed to find the molecular pathology that explains the previous association findings. If changes similar to those we have found can be shown to have an effect on the expression and function of ANK3 and CACNA1C, they might help to explain the so-called ‘missing heritability’ of BP. PMID:24716743

  14. Sequence analysis of three pigmentation genes in the Newfoundland population of Canis latrans links the Golden Retriever Mc1r variant to white coat color in coyotes.

    PubMed

    Brockerville, Ryan M; McGrath, Michael J; Pilgrim, Brettney L; Marshall, H Dawn

    2013-04-01

    Three genes, Mc1r, Agouti, and CBD103, interact in a type-switching process that controls much of the pigmentation variation observed in mammals. A deletion in the CBD103 gene is responsible for dominant black color in dogs, while the white-phased black bear ("spirit bear") of British Columbia, Canada, is the lightest documented color variant caused by a mutation in Mc1r. Rare all-white animals have recently been discovered in a new northeastern population of the coyote in insular Newfoundland and Labrador, Canada. To investigate the causative gene and mutation of white coat in coyotes, we sequenced the three type-switching genes in white and dark-phased animals from Newfoundland. The only sequence variants unambiguously associated with white color were in Mc1r, and one of these variants causes the amino acid variant R306Ter, a premature stop codon also linked to coat color in Golden Retrievers and other dogs with yellow/red coats. The allele carrying R306Ter in coyotes matches that in the Golden Retriever at other variable amino acid sites and hence may have originated in these dogs. Coyotes experienced introgression with wolves and dogs as they colonized northeastern North America, and coyote/Golden Retriever interactions have been observed in Newfoundland. We speculate that natural selection, with or without a founder effect, may contribute to the observed frequency of white coyotes in Newfoundland, as it has contributed to the high frequency of white bears, and of a domestic dog-derived CBD allele in gray wolves. PMID:23297074

  15. Sequence analysis of three pigmentation genes in the Newfoundland population of Canis latrans links the Golden Retriever Mc1r variant to white coat color in coyotes.

    PubMed

    Brockerville, Ryan M; McGrath, Michael J; Pilgrim, Brettney L; Marshall, H Dawn

    2013-04-01

    Three genes, Mc1r, Agouti, and CBD103, interact in a type-switching process that controls much of the pigmentation variation observed in mammals. A deletion in the CBD103 gene is responsible for dominant black color in dogs, while the white-phased black bear ("spirit bear") of British Columbia, Canada, is the lightest documented color variant caused by a mutation in Mc1r. Rare all-white animals have recently been discovered in a new northeastern population of the coyote in insular Newfoundland and Labrador, Canada. To investigate the causative gene and mutation of white coat in coyotes, we sequenced the three type-switching genes in white and dark-phased animals from Newfoundland. The only sequence variants unambiguously associated with white color were in Mc1r, and one of these variants causes the amino acid variant R306Ter, a premature stop codon also linked to coat color in Golden Retrievers and other dogs with yellow/red coats. The allele carrying R306Ter in coyotes matches that in the Golden Retriever at other variable amino acid sites and hence may have originated in these dogs. Coyotes experienced introgression with wolves and dogs as they colonized northeastern North America, and coyote/Golden Retriever interactions have been observed in Newfoundland. We speculate that natural selection, with or without a founder effect, may contribute to the observed frequency of white coyotes in Newfoundland, as it has contributed to the high frequency of white bears, and of a domestic dog-derived CBD allele in gray wolves.

  16. Sequence analysis of MHC class I alpha 2 domain exon variants in one diploid and two haploid Atlantic salmon pedigrees.

    PubMed

    Grimholt, U; Olsaker, I; Lingaas, F; Lie, O

    1997-12-01

    Genetic diversity in the second domain exon of Atlantic salmon (Salmo salar) major histocompatibility complex (Mhc) class I was investigated in two dams and nine of their haploid offspring by means of polymerase chain reaction (PCR) and DNA sequence analysis. A similar study was also performed on nine diploid offspring from one of these dams. The complex segregation patterns and sequence similarities between variants make definitive allele, haplotype and locus assignments difficult. There are, however, indications of six Mhc-Sasa class I loci and a fairly well-defined haplotype of four variants. One non-polymorphic variant present in most specimens could be a salmon analogue to the human non-classical loci. PMID:9589580

  17. Evolutionary balancing is critical for correctly forecasting disease-associated amino acid variants.

    PubMed

    Liu, Li; Kumar, Sudhir

    2013-06-01

    Computational predictions have become indispensable for evaluating the disease-related impact of nonsynonymous single-nucleotide variants discovered in exome sequencing. Many such methods have their roots in molecular evolution, as they use information derived from multiple sequence alignments. We show that the performance of current methods (e.g., PolyPhen-2 and SIFT) is improved significantly by optimizing their statistical models on evolutionarily balanced training data, where equal numbers of positive and negative controls within each evolutionary conservation class are used. Evolutionary balancing significantly reduces the false-positive rates for variants observed at highly conserved sites and false-negative rates for variants observed at fast evolving sites. Use of these improved methods enables more accurate forecasting when concordant diagnosis from multiple methods is regarded as a more reliable indicator of the prediction. Applied to a large exome variation data set, we find that the current methods produce concordant predictions for less than half of the population variants. These advances are implemented in a web resource for use in practical applications (www.mypeg.info, last accessed March 13, 2013). PMID:23462317
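
    The idea of an evolutionarily balanced training set, with equal numbers of positive and negative controls within each conservation class, can be sketched as stratified downsampling. This is only an illustration under stated assumptions: the record fields `conservation_class` and `is_disease`, the function name, and the choice to downsample the larger group are all hypothetical and are not taken from the authors' implementation.

```python
import random
from collections import defaultdict


def balance_by_conservation_class(records, seed=0):
    """Downsample so each conservation class has equal positives and negatives.

    `records` is a list of dicts with the hypothetical keys
    'conservation_class' (any sortable label) and 'is_disease' (bool).
    """
    rng = random.Random(seed)
    groups = defaultdict(lambda: {True: [], False: []})
    for rec in records:
        groups[rec["conservation_class"]][rec["is_disease"]].append(rec)

    balanced = []
    for cls in sorted(groups):
        positives, negatives = groups[cls][True], groups[cls][False]
        k = min(len(positives), len(negatives))  # size of the smaller group
        balanced.extend(rng.sample(positives, k))
        balanced.extend(rng.sample(negatives, k))
    return balanced


# Toy data: disease variants dominate the conserved class, neutral ones the fast class.
toy = (
    [{"conservation_class": "high", "is_disease": True}] * 5
    + [{"conservation_class": "high", "is_disease": False}] * 2
    + [{"conservation_class": "low", "is_disease": True}] * 1
    + [{"conservation_class": "low", "is_disease": False}] * 4
)
balanced = balance_by_conservation_class(toy)
print(len(balanced))
```

    Training on the balanced subset keeps a model from learning conservation class as a proxy for the label, which is one plausible reading of why the abstract reports fewer false positives at conserved sites and fewer false negatives at fast-evolving sites.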

  18. Evolutionary balancing is critical for correctly forecasting disease-associated amino acid variants.

    PubMed

    Liu, Li; Kumar, Sudhir

    2013-06-01

    Computational predictions have become indispensable for evaluating the disease-related impact of nonsynonymous single-nucleotide variants discovered in exome sequencing. Many such methods have their roots in molecular evolution, as they use information derived from multiple sequence alignments. We show that the performance of current methods (e.g., PolyPhen-2 and SIFT) is improved significantly by optimizing their statistical models on evolutionarily balanced training data, where equal numbers of positive and negative controls within each evolutionary conservation class are used. Evolutionary balancing significantly reduces the false-positive rates for variants observed at highly conserved sites and false-negative rates for variants observed at fast evolving sites. Use of these improved methods enables more accurate forecasting when concordant diagnosis from multiple methods is regarded as a more reliable indicator of the prediction. Applied to a large exome variation data set, we find that the current methods produce concordant predictions for less than half of the population variants. These advances are implemented in a web resource for use in practical applications (www.mypeg.info, last accessed March 13, 2013).

  19. Exploring the feasibility of using copy number variants as genetic markers through large-scale whole genome sequencing experiments

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Copy number variants (CNV) are large scale duplications or deletions of genomic sequence that are caused by a diverse set of molecular phenomena that are distinct from single nucleotide polymorphism (SNP) formation. Due to their different mechanisms of formation, CNVs are often difficult to track us...

  20. A rare variant of the mtDNA HVS1 sequence in the hairs of Napoléon's family

    PubMed Central

    2010-01-01

    This paper describes the finding of a rare variant in the sequence of the hypervariable segment (HVS1) of mitochondrial DNA (mtDNA) extracted from two preserved hairs, authenticated as belonging to the French Emperor Napoléon I (Napoléon Bonaparte). This rare variant is a mutation that changes the base C to T at position 16,184 (16184C→T), and it constitutes the only mutation found in this HVS1 sequence. This mutation is rare, because it was not found in a reference database (P < 0.05). In a personal database (M. Pala) comprising 37,000 different sequences, the 16184C→T mutation was found in only three samples; thus in this database the mutation frequency was about 0.008% (a fraction of roughly 0.00008). This mutation 16184C→T was also the only variant found subsequently in the HVS1 sequences of mtDNAs extracted from Napoléon's mother (Letizia) and from his youngest sister (Caroline), confirming that this mutation is maternally inherited. This 16184C→T variant could be used for genetic verification to authenticate any doubtful material and determine whether it should indeed be attributed to Napoléon. PMID:21092341
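
    The frequency implied by three occurrences among 37,000 sequences can be checked directly. The short sketch below computes it both as a fraction and as a percentage; the variable names are illustrative only.

```python
# Counts taken from the abstract above.
carriers = 3
database_size = 37_000

fraction = carriers / database_size  # frequency as a proportion
percent = 100 * fraction             # the same frequency as a percentage

print(f"fraction: {fraction:.5f}")
print(f"percent:  {percent:.4f}%")
```

    Expressed as a proportion the value is on the order of 0.00008, while the percentage is one hundred times larger, so the two forms should not be interchanged when the frequency is quoted.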

  1. Methods and compositions for efficient nucleic acid sequencing

    DOEpatents

    Drmanac, Radoje

    2002-01-01

    Disclosed are novel methods and compositions for rapid and highly efficient nucleic acid sequencing based upon hybridization with two sets of small oligonucleotide probes of known sequences. Extremely large nucleic acid molecules, including chromosomes and non-amplified RNA, may be sequenced without prior cloning or subcloning steps. The methods of the invention also solve various current problems associated with sequencing technology such as, for example, high noise to signal ratios and difficult discrimination, attaching many nucleic acid fragments to a surface, preparing many, longer or more complex probes and labelling more species.

  2. Methods and compositions for efficient nucleic acid sequencing

    DOEpatents

    Drmanac, Radoje

    2006-07-04

    Disclosed are novel methods and compositions for rapid and highly efficient nucleic acid sequencing based upon hybridization with two sets of small oligonucleotide probes of known sequences. Extremely large nucleic acid molecules, including chromosomes and non-amplified RNA, may be sequenced without prior cloning or subcloning steps. The methods of the invention also solve various current problems associated with sequencing technology such as, for example, high noise to signal ratios and difficult discrimination, attaching many nucleic acid fragments to a surface, preparing many, longer or more complex probes and labelling more species.

  3. Kit for detecting nucleic acid sequences using competitive hybridization probes

    DOEpatents

    Lucas, Joe N.; Straume, Tore; Bogen, Kenneth T.

    2001-01-01

    A kit is provided for detecting a target nucleic acid sequence in a sample, the kit comprising: a first hybridization probe which includes a nucleic acid sequence that is sufficiently complementary to selectively hybridize to a first portion of the target sequence, the first hybridization probe including a first complexing agent for forming a binding pair with a second complexing agent; and a second hybridization probe which includes a nucleic acid sequence that is sufficiently complementary to selectively hybridize to a second portion of the target sequence to which the first hybridization probe does not selectively hybridize, the second hybridization probe including a detectable marker; a third hybridization probe which includes a nucleic acid sequence that is sufficiently complementary to selectively hybridize to a first portion of the target sequence, the third hybridization probe including the same detectable marker as the second hybridization probe; and a fourth hybridization probe which includes a nucleic acid sequence that is sufficiently complementary to selectively hybridize to a second portion of the target sequence to which the third hybridization probe does not selectively hybridize, the fourth hybridization probe including the first complexing agent for forming a binding pair with the second complexing agent; wherein the first and second hybridization probes are capable of simultaneously hybridizing to the target sequence and the third and fourth hybridization probes are capable of simultaneously hybridizing to the target sequence, the detectable marker is not present on the first or fourth hybridization probes and the first, second, third, and fourth hybridization probes each include a competitive nucleic acid sequence which is sufficiently complementary to a third portion of the target sequence that the competitive sequences of the first, second, third, and fourth hybridization probes compete with each other to hybridize to the third portion of the

  4. Whole-exome imputation of sequence variants identified two novel alleles associated with adult body height in African Americans.

    PubMed

    Du, Mengmeng; Auer, Paul L; Jiao, Shuo; Haessler, Jeffrey; Altshuler, David; Boerwinkle, Eric; Carlson, Christopher S; Carty, Cara L; Chen, Yii-Der Ida; Curtis, Keith; Franceschini, Nora; Hsu, Li; Jackson, Rebecca; Lange, Leslie A; Lettre, Guillaume; Monda, Keri L; Nickerson, Deborah A; Reiner, Alex P; Rich, Stephen S; Rosse, Stephanie A; Rotter, Jerome I; Willer, Cristen J; Wilson, James G; North, Kari; Kooperberg, Charles; Heard-Costa, Nancy; Peters, Ulrike

    2014-12-15

    Adult body height is a quantitative trait for which genome-wide association studies (GWAS) have identified numerous loci, primarily in European populations. These loci, comprising common variants, explain <10% of the phenotypic variance in height. We searched for novel associations between height and common (minor allele frequency, MAF ≥5%) or infrequent (0.5% < MAF < 5%) variants across the exome in African Americans. Using a reference panel of 1692 African Americans and 471 Europeans from the National Heart, Lung, and Blood Institute's (NHLBI) Exome Sequencing Project (ESP), we imputed whole-exome sequence data into 13 719 African Americans with existing array-based GWAS data (discovery). Variants achieving a height-association threshold of P < 5E-06 in the imputed dataset were followed up in an independent sample of 1989 African Americans with whole-exome sequence data (replication). We used P < 2.5E-07 (=0.05/196 779 variants) to define statistically significant associations in meta-analyses combining the discovery and replication sets (N = 15 708). We discovered and replicated three independent loci for association: 5p13.3/C5orf22/rs17410035 (MAF = 0.10, β = 0.64 cm, P = 8.3E-08), 13q14.2/SPRYD7/rs114089985 (MAF = 0.03, β = 1.46 cm, P = 4.8E-10) and 17q23.3/GH2/rs2006123 (MAF = 0.30; β = 0.47 cm; P = 4.7E-09). Conditional analyses suggested 5p13.3 (C5orf22/rs17410035) and 13q14.2 (SPRYD7/rs114089985) may harbor novel height alleles independent of previous GWAS-identified variants (r(2) with GWAS loci <0.01); whereas 17q23.3/GH2/rs2006123 was correlated with GWAS-identified variants in European and African populations. Notably, 13q14.2/rs114089985 is infrequent in African Americans (MAF = 3%), extremely rare in European Americans (MAF = 0.03%), and monomorphic in Asian populations, suggesting it may be an African-American-specific height allele. Our findings demonstrate that whole-exome imputation of sequence variants can identify low-frequency variants
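
The significance threshold quoted in this abstract is a Bonferroni correction (0.05 divided by 196 779 variants). A minimal Python sketch of that arithmetic follows; the function and variable names are illustrative assumptions, and only the numbers come from the abstract.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Values quoted in the abstract.
ALPHA = 0.05
N_VARIANTS = 196_779

threshold = bonferroni_threshold(ALPHA, N_VARIANTS)

# P-values reported in the abstract for the three replicated loci.
reported = {
    "rs17410035": 8.3e-08,
    "rs114089985": 4.8e-10,
    "rs2006123": 4.7e-09,
}

# A locus counts as significant when its P-value is below the threshold.
significant = {rsid: p < threshold for rsid, p in reported.items()}
```

The unrounded threshold is slightly above the 2.5E-07 quoted in the abstract, which appears to be rounded down.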

  5. Analysis and Annotation of Nucleic Acid Sequence

    SciTech Connect

    States, David J.

    2004-07-28

    The aims of this project were to develop improved methods for computational genome annotation and to apply these methods to improve the annotation of genomic sequence data with a specific focus on human genome sequencing. The project resulted in a substantial body of published work. Notable contributions of this project were the identification of basecalling and lane tracking as error processes in genome sequencing and contributions to improved methods for these steps in genome sequencing. This technology improved the accuracy and throughput of genome sequence analysis. Probabilistic methods for physical map construction were developed. Improved methods for sequence alignment, alternative splicing analysis, promoter identification and NF kappa B response gene prediction were also developed.

  6. Exome sequencing in an admixed isolated population indicates NFXL1 variants confer a risk for specific language impairment.

    PubMed

    Villanueva, Pía; Nudel, Ron; Hoischen, Alexander; Fernández, María Angélica; Simpson, Nuala H; Gilissen, Christian; Reader, Rose H; Jara, Lillian; Echeverry, María Magdalena; Francks, Clyde; Baird, Gillian; Conti-Ramsden, Gina; O'Hare, Anne; Bolton, Patrick F; Hennessy, Elizabeth R; Palomino, Hernán; Carvajal-Carmona, Luis; Veltman, Joris A; Cazier, Jean-Baptiste; De Barbieri, Zulema; Fisher, Simon E; Newbury, Dianne F

    2015-03-01

    Children affected by Specific Language Impairment (SLI) fail to acquire age-appropriate language skills despite adequate intelligence and opportunity. SLI is highly heritable, but the understanding of underlying genetic mechanisms has proved challenging. In this study, we use molecular genetic techniques to investigate an admixed isolated founder population from the Robinson Crusoe Island (Chile), who are affected by a high incidence of SLI, increasing the power to discover contributory genetic factors. We utilize exome sequencing in selected individuals from this population to identify eight coding variants that are of putative significance. We then apply association analyses across the wider population to highlight a single rare coding variant (rs144169475, Minor Allele Frequency of 4.1% in admixed South American populations) in the NFXL1 gene that confers a nonsynonymous change (N150K) and is significantly associated with language impairment in the Robinson Crusoe population (p = 2.04 × 10^-4, 8 variants tested). Subsequent sequencing of NFXL1 in 117 UK SLI cases identified four individuals with heterozygous variants predicted to be of functional consequence. We conclude that coding variants within NFXL1 confer an increased risk of SLI within a complex genetic model.

  7. Exome Sequencing in an Admixed Isolated Population Indicates NFXL1 Variants Confer a Risk for Specific Language Impairment

    PubMed Central

    Villanueva, Pía; Nudel, Ron; Hoischen, Alexander; Fernández, María Angélica; Simpson, Nuala H.; Gilissen, Christian; Reader, Rose H.; Jara, Lillian; Echeverry, Maria Magdalena; Francks, Clyde; Baird, Gillian; Conti-Ramsden, Gina; O’Hare, Anne; Bolton, Patrick F.; Hennessy, Elizabeth R.; Palomino, Hernán; Carvajal-Carmona, Luis; Veltman, Joris A.; Cazier, Jean-Baptiste; De Barbieri, Zulema

    2015-01-01

    Children affected by Specific Language Impairment (SLI) fail to acquire age-appropriate language skills despite adequate intelligence and opportunity. SLI is highly heritable, but the understanding of underlying genetic mechanisms has proved challenging. In this study, we use molecular genetic techniques to investigate an admixed isolated founder population from the Robinson Crusoe Island (Chile), who are affected by a high incidence of SLI, increasing the power to discover contributory genetic factors. We utilize exome sequencing in selected individuals from this population to identify eight coding variants that are of putative significance. We then apply association analyses across the wider population to highlight a single rare coding variant (rs144169475, Minor Allele Frequency of 4.1% in admixed South American populations) in the NFXL1 gene that confers a nonsynonymous change (N150K) and is significantly associated with language impairment in the Robinson Crusoe population (p = 2.04 × 10^-4, 8 variants tested). Subsequent sequencing of NFXL1 in 117 UK SLI cases identified four individuals with heterozygous variants predicted to be of functional consequence. We conclude that coding variants within NFXL1 confer an increased risk of SLI within a complex genetic model. PMID:25781923

  8. Compact variant-rich customized sequence database and a fast and sensitive database search for efficient proteogenomic analyses.

    PubMed

    Park, Heejin; Bae, Junwoo; Kim, Hyunwoo; Kim, Sangok; Kim, Hokeun; Mun, Dong-Gi; Joh, Yoonsung; Lee, Wonyeop; Chae, Sehyun; Lee, Sanghyuk; Kim, Hark Kyun; Hwang, Daehee; Lee, Sang-Won; Paek, Eunok

    2014-12-01

    In proteogenomic analysis, construction of a compact, customized database from mRNA-seq data and a sensitive search of both reference and customized databases are essential to accurately determine protein abundances and structural variations at the protein level. However, these tasks have not been systematically explored, but rather performed in an ad-hoc fashion. Here, we present an effective method for constructing a compact database containing comprehensive sequences of sample-specific variants--single nucleotide variants, insertions/deletions, and stop-codon mutations derived from Exome-seq and RNA-seq data. It, however, occupies less space by storing variant peptides, not variant proteins. We also present an efficient search method for both customized and reference databases. The separate searches of the two databases increase the search time, and a unified search is less sensitive to identify variant peptides due to the smaller size of the customized database, compared to the reference database, in the target-decoy setting. Our method searches the unified database once, but performs target-decoy validations separately. Experimental results show that our approach is as fast as the unified search and as sensitive as the separate searches. Our customized database includes mutation information in the headers of variant peptides, thereby facilitating the inspection of peptide-spectrum matches.
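
The idea of searching a unified database once while validating the two classes of matches separately can be sketched as below. This is an illustration under stated assumptions, not the authors' implementation: the `PSM` record, the decoy/target ratio used as the FDR estimate, and all function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class PSM:
    """One peptide-spectrum match from a single unified search (hypothetical record)."""
    peptide: str
    score: float       # higher is better
    is_decoy: bool     # True if the match is to a decoy sequence
    is_variant: bool   # True if the match is to the customized (variant) database


def accept_at_fdr(psms: Iterable[PSM], max_fdr: float) -> List[PSM]:
    """Keep target PSMs from the longest score-ranked prefix whose
    decoy/target ratio does not exceed max_fdr."""
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    targets = decoys = cutoff = 0
    for i, psm in enumerate(ranked, start=1):
        if psm.is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            cutoff = i
    return [p for p in ranked[:cutoff] if not p.is_decoy]


def validate_separately(psms: Iterable[PSM], max_fdr: float = 0.01) -> Tuple[List[PSM], List[PSM]]:
    """Split the results of one unified search by database of origin,
    then run target-decoy validation on each subset independently."""
    psms = list(psms)
    reference = [p for p in psms if not p.is_variant]
    variant = [p for p in psms if p.is_variant]
    return accept_at_fdr(reference, max_fdr), accept_at_fdr(variant, max_fdr)
```

Validating each subset on its own keeps the much larger reference database from dominating the score cutoff applied to variant peptides, which is the sensitivity concern the abstract raises about a plain unified search.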

  9. Solid phase sequencing of double-stranded nucleic acids

    DOEpatents

    Fu, Dong-Jing; Cantor, Charles R.; Koster, Hubert; Smith, Cassandra L.

    2002-01-01

    This invention relates to methods for detecting and sequencing of target double-stranded nucleic acid sequences, to nucleic acid probes and arrays of probes useful in these methods, and to kits and systems which contain these probes. Useful methods involve hybridizing the nucleic acids or nucleic acids which represent complementary or homologous sequences of the target to an array of nucleic acid probes. These probes comprise a single-stranded portion, an optional double-stranded portion and a variable sequence within the single-stranded portion. The molecular weights of the hybridized nucleic acids of the set can be determined by mass spectroscopy, and the sequence of the target determined from the molecular weights of the fragments. Nucleic acids whose sequences can be determined include nucleic acids in biological samples such as patient biopsies and environmental samples. Probes may be fixed to a solid support such as a hybridization chip to facilitate automated determination of molecular weights and identification of the target sequence.
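
One generic way to read a sequence from fragment molecular weights is a mass ladder: when fragments differ in length by one nucleotide, each mass difference identifies the added base. The Python sketch below illustrates that general principle only. The abstract does not describe the patent's actual procedure, so the ladder assumption, the tolerance, and the function name are all hypothetical; the residue masses are the commonly used average values for DNA nucleotide monophosphate residues.

```python
# Average masses (Da) of DNA nucleotide monophosphate residues within a strand.
RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}


def read_ladder(masses, tolerance=0.5):
    """Infer bases from a set of fragment masses that differ by one
    nucleotide each (an idealized ladder, assumed for illustration)."""
    ordered = sorted(masses)
    bases = []
    for lighter, heavier in zip(ordered, ordered[1:]):
        delta = heavier - lighter
        # Pick the residue whose mass is closest to the observed difference.
        base, error = min(
            ((b, abs(delta - m)) for b, m in RESIDUE_MASS.items()),
            key=lambda pair: pair[1],
        )
        if error > tolerance:
            raise ValueError(f"mass difference {delta:.2f} Da matches no single base")
        bases.append(base)
    return "".join(bases)
```

The closest pair of residue masses (A and T) differs by about 9 Da, so the default tolerance in this sketch leaves a wide margin; real spectra would need calibration and adduct handling that are out of scope here.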

  10. Ultradeep Sequencing for Detection of Quasispecies Variants in the Major Hydrophilic Region of Hepatitis B Virus in Indonesian Patients.

    PubMed

    Yamani, Laura Navika; Yano, Yoshihiko; Utsumi, Takako; Juniastuti; Wandono, Hadi; Widjanarko, Doddy; Triantanoe, Ari; Wasityastuti, Widya; Liang, Yujiao; Okada, Rina; Tanahashi, Toshihito; Murakami, Yoshiki; Azuma, Takeshi; Soetjipto; Lusida, Maria Inge; Hayashi, Yoshitake

    2015-10-01

    Quasispecies of hepatitis B virus (HBV) with variations in the major hydrophilic region (MHR) of the HBV surface antigen (HBsAg) can evolve during infection, allowing HBV to evade neutralizing antibodies. These escape variants may contribute to chronic infections. In this study, we looked for MHR variants in HBV quasispecies using ultradeep sequencing and evaluated the relationship between these variants and clinical manifestations in infected patients. We enrolled 30 Indonesian patients with hepatitis B infection (11 with chronic hepatitis and 19 with advanced liver disease). The most common subgenotype/subtype of HBV was B3/adw (97%). The HBsAg titer was lower in patients with advanced liver disease than that in patients with chronic hepatitis. The MHR variants were grouped based on the percentage of the viral population affected: major, ≥20% of the total population; intermediate, 5% to <20%; and minor, 1% to <5%. The rates of MHR variation that were present in the major and intermediate viral population were significantly greater in patients with advanced liver disease than those in chronic patients. The most frequent MHR variants related to immune evasion in the major and intermediate populations were P120Q/T, T123A, P127T, Q129H/R, M133L/T, and G145R. The major population of MHR variants causing impaired HBsAg secretion (e.g., G119R, Q129R, T140I, and G145R) was detected only in advanced liver disease patients. This is the first study to use ultradeep sequencing for the detection of MHR variants of HBV quasispecies in Indonesian patients. We found that a greater number of MHR variations was related to disease severity and reduced likelihood of HBsAg titer.
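
The grouping rule stated in this abstract (major, intermediate, minor) maps directly onto a small classifier. The sketch below uses the thresholds from the abstract; the function name and the label for variants under 1% are assumptions, since the abstract does not name that case.

```python
def classify_mhr_variant(percent_of_population: float) -> str:
    """Group a variant by the share of the viral population carrying it,
    using the cut-offs quoted in the abstract."""
    if not 0.0 <= percent_of_population <= 100.0:
        raise ValueError("percentage must be between 0 and 100")
    if percent_of_population >= 20.0:
        return "major"
    if percent_of_population >= 5.0:
        return "intermediate"
    if percent_of_population >= 1.0:
        return "minor"
    # Assumption: the abstract does not label variants below 1%.
    return "below threshold"


# Hypothetical read-frequency values, for illustration only.
examples = {v: classify_mhr_variant(v) for v in (35.0, 20.0, 7.5, 1.0, 0.4)}
```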

  11. Identification of cancer predisposition variants in apparently healthy individuals using a next-generation sequencing-based family genomics approach.

    PubMed

    Karageorgos, Ioannis; Mizzi, Clint; Giannopoulou, Efstathia; Pavlidis, Cristiana; Peters, Brock A; Zagoriti, Zoi; Stenson, Peter D; Mitropoulos, Konstantinos; Borg, Joseph; Kalofonos, Haralabos P; Drmanac, Radoje; Stubbs, Andrew; van der Spek, Peter; Cooper, David N; Katsila, Theodora; Patrinos, George P

    2015-01-01

    Cancer, like many common disorders, has a complex etiology, often with a strong genetic component and with multiple environmental factors contributing to susceptibility. A considerable number of genomic variants have been previously reported to be causative of, or associated with, an increased risk for various types of cancer. Here, we adopted a next-generation sequencing approach in 11 members of two families of Greek descent to identify all genomic variants with the potential to predispose family members to cancer. Cross-comparison with data from the Human Gene Mutation Database identified a total of 571 variants, from which 47 % were disease-associated polymorphisms, 26 % disease-associated polymorphisms with additional supporting functional evidence, 19 % functional polymorphisms with in vitro/laboratory or in vivo supporting evidence but no known disease association, 4 % putative disease-causing mutations but with some residual doubt as to their pathological significance, and 3 % disease-causing mutations. Subsequent analysis, focused on the latter variant class most likely to be involved in cancer predisposition, revealed two variants of prime interest, namely MSH2 c.2732T>A (p.L911R) and BRCA1 c.2955delC, the first of which is novel. KMT2D c.13895delC and c.1940C>A variants are additionally reported as incidental findings. The next-generation sequencing-based family genomics approach described herein has the potential to be applied to other types of complex genetic disorder in order to identify variants of potential pathological significance. PMID:26092435

  12. Ultradeep Sequencing for Detection of Quasispecies Variants in the Major Hydrophilic Region of Hepatitis B Virus in Indonesian Patients

    PubMed Central

    Yamani, Laura Navika; Utsumi, Takako; Juniastuti; Wandono, Hadi; Widjanarko, Doddy; Triantanoe, Ari; Wasityastuti, Widya; Liang, Yujiao; Okada, Rina; Tanahashi, Toshihito; Murakami, Yoshiki; Azuma, Takeshi; Soetjipto; Lusida, Maria Inge; Hayashi, Yoshitake

    2015-01-01

    Quasispecies of hepatitis B virus (HBV) with variations in the major hydrophilic region (MHR) of the HBV surface antigen (HBsAg) can evolve during infection, allowing HBV to evade neutralizing antibodies. These escape variants may contribute to chronic infections. In this study, we looked for MHR variants in HBV quasispecies using ultradeep sequencing and evaluated the relationship between these variants and clinical manifestations in infected patients. We enrolled 30 Indonesian patients with hepatitis B infection (11 with chronic hepatitis and 19 with advanced liver disease). The most common subgenotype/subtype of HBV was B3/adw (97%). The HBsAg titer was lower in patients with advanced liver disease than that in patients with chronic hepatitis. The MHR variants were grouped based on the percentage of the viral population affected: major, ≥20% of the total population; intermediate, 5% to <20%; and minor, 1% to <5%. The rates of MHR variation that were present in the major and intermediate viral population were significantly greater in patients with advanced liver disease than those in chronic patients. The most frequent MHR variants related to immune evasion in the major and intermediate populations were P120Q/T, T123A, P127T, Q129H/R, M133L/T, and G145R. The major population of MHR variants causing impaired HBsAg secretion (e.g., G119R, Q129R, T140I, and G145R) was detected only in advanced liver disease patients. This is the first study to use ultradeep sequencing for the detection of MHR variants of HBV quasispecies in Indonesian patients. We found that a greater number of MHR variations was related to disease severity and reduced likelihood of HBsAg titer. PMID:26202119

  14. Characterization of Genomic Variants Associated with Scout and Recruit Behavioral Castes in Honey Bees Using Whole-Genome Sequencing.

    PubMed

    Southey, Bruce R; Zhu, Ping; Carr-Markell, Morgan K; Liang, Zhengzheng S; Zayed, Amro; Li, Ruiqiang; Robinson, Gene E; Rodriguez-Zas, Sandra L

    2016-01-01

    Among forager honey bees, scouts seek new resources and return to the colony, enlisting recruits to collect these resources. Differentially expressed genes between these behaviors and genetic variability in scouting phenotypes have been reported. Whole-genome sequencing of 44 Apis mellifera scouts and recruits was undertaken to detect variants and further understand the genetic architecture underlying the behavioral differences between scouts and recruits. The median coverage depth in recruits and scouts was 10.01 and 10.7 X, respectively. Representation of bacterial species among the unmapped reads reflected a more diverse microbiome in scouts than recruits. Overall, 1,412,705 polymorphic positions were analyzed for associations with scouting behavior, and 212 significant (p-value < 0.0001) associations with scouting corresponding to 137 positions were detected. Most frequent putative transcription factor binding sites proximal to significant variants included Broad-complex 4, Broad-complex 1, Hunchback, and CF2-II. Three variants associated with scouting were located within coding regions of ncRNAs including one codon change (LOC102653644) and 2 frameshift indels (LOC102654879 and LOC102655256). Significant variants were also identified on the 5'UTR of membrin, and 3'UTRs of laccase 2 and diacylglycerol kinase theta. The 60 significant variants located within introns corresponded to 39 genes and most of these positions were > 1000 bp apart from each other. A number of these variants were mapped to ncRNA LOC100578102, solute carrier family 12 member 6-like gene, and LOC100576965 (meprin and TRAF-C homology domain containing gene). Functional categories represented among the genes corresponding to significant variants included: neuronal function, exoskeleton, immune response, salivary gland development, and enzymatic food processing. These categories offer a glimpse into the molecular support to the behaviors of scouts and recruits. The level of association between

  15. Performance of ACMG-AMP Variant-Interpretation Guidelines among Nine Laboratories in the Clinical Sequencing Exploratory Research Consortium.

    PubMed

    Amendola, Laura M; Jarvik, Gail P; Leo, Michael C; McLaughlin, Heather M; Akkari, Yassmine; Amaral, Michelle D; Berg, Jonathan S; Biswas, Sawona; Bowling, Kevin M; Conlin, Laura K; Cooper, Greg M; Dorschner, Michael O; Dulik, Matthew C; Ghazani, Arezou A; Ghosh, Rajarshi; Green, Robert C; Hart, Ragan; Horton, Carrie; Johnston, Jennifer J; Lebo, Matthew S; Milosavljevic, Aleksandar; Ou, Jeffrey; Pak, Christine M; Patel, Ronak Y; Punj, Sumit; Richards, Carolyn Sue; Salama, Joseph; Strande, Natasha T; Yang, Yaping; Plon, Sharon E; Biesecker, Leslie G; Rehm, Heidi L

    2016-06-01

    Evaluating the pathogenicity of a variant is challenging given the plethora of types of genetic evidence that laboratories consider. Deciding how to weigh each type of evidence is difficult, and standards have been needed. In 2015, the American College of Medical Genetics and Genomics (ACMG) and the Association for Molecular Pathology (AMP) published guidelines for the assessment of variants in genes associated with Mendelian diseases. Nine molecular diagnostic laboratories involved in the Clinical Sequencing Exploratory Research (CSER) consortium piloted these guidelines on 99 variants spanning all categories (pathogenic, likely pathogenic, uncertain significance, likely benign, and benign). Nine variants were distributed to all laboratories, and the remaining 90 were evaluated by three laboratories. The laboratories classified each variant by using both the laboratory's own method and the ACMG-AMP criteria. The agreement between the two methods used within laboratories was high (K-alpha = 0.91) with 79% concordance. However, there was only 34% concordance for either classification system across laboratories. After consensus discussions and detailed review of the ACMG-AMP criteria, concordance increased to 71%. Causes of initial discordance in ACMG-AMP classifications were identified, and recommendations on clarification and increased specification of the ACMG-AMP criteria were made. In summary, although an initial pilot of the ACMG-AMP guidelines did not lead to increased concordance in variant interpretation, comparing variant interpretations to identify differences and having a common framework to facilitate resolution of those differences were beneficial for improving agreement, allowing iterative movement toward increased reporting consistency for variants in genes associated with monogenic disease. PMID:27181684

  16. Identification of variants in primary and recurrent glioblastoma using a cancer-specific gene panel and whole exome sequencing.

    PubMed

    Virk, Selene M; Gibson, Richard M; Quinones-Mateu, Miguel E; Barnholtz-Sloan, Jill S

    2015-01-01

    Glioblastoma (GBM) is an aggressive, malignant brain tumor typically resulting in death of the patient within one year following diagnosis, and those who survive beyond this point usually present with tumor recurrence within two years (5-year survival is 5%). The genetic heterogeneity of GBM has made the molecular characterization of these tumors an area of great interest and has led to identification of molecular subtypes in GBM. The availability of sequencing platforms that are both fast and economical can further the adoption of tumor sequencing in the clinical environment, potentially leading to identification of clinically actionable genetic targets. In this pilot study, comprising triplet samples of normal blood, primary tumor, and recurrent tumor from three patients, we compared the ability of Illumina whole exome sequencing (ExomeSeq) and the Ion AmpliSeq Comprehensive Cancer Panel (CCP) to identify somatic variants in patient-paired primary and recurrent tumor samples. Thirteen genes were found to harbor variants, the majority of which were exclusive to the ExomeSeq data. Surprisingly, only two variants were identified by both platforms and they were located within the PTCH1 and NF1 genes. Although preliminary in nature, this work highlights major differences in variant identification in data generated from the two platforms. Additional studies with larger sample sizes are needed to further explore the differences between these technologies and to enhance our understanding of the clinical utility of panel-based platforms in genomic profiling of brain tumors. PMID:25950952

  17. The Swedish new variant of Chlamydia trachomatis: genome sequence, morphology, cell tropism and phenotypic characterization

    PubMed Central

    Unemo, Magnus; Seth-Smith, Helena M. B.; Cutcliffe, Lesley T.; Skilton, Rachel J.; Barlow, David; Goulding, David; Persson, Kenneth; Harris, Simon R.; Kelly, Anne; Bjartling, Carina; Fredlund, Hans; Olcén, Per; Thomson, Nicholas R.; Clarke, Ian N.

    2010-01-01

    Chlamydia trachomatis is a major cause of bacterial sexually transmitted infections worldwide. In 2006, a new variant of C. trachomatis (nvCT), carrying a 377 bp deletion within the plasmid, was reported in Sweden. This deletion included the targets used by the commercial diagnostic systems from Roche and Abbott. The nvCT is clonal (serovar/genovar E) and it spread rapidly in Sweden, undiagnosed by these systems. The degree of spread may also indicate an increased biological fitness of nvCT. The aims of this study were to describe the genome of nvCT, to compare the nvCT genome to all available C. trachomatis genome sequences and to investigate the biological properties of nvCT. An early nvCT isolate (Sweden2) was analysed by genome sequencing, growth kinetics, microscopy, cell tropism assay and antimicrobial susceptibility testing. It was compared with relevant C. trachomatis isolates, including a similar serovar E C. trachomatis wild-type strain that circulated in Sweden prior to the initially undetected expansion of nvCT. The nvCT genome does not contain any major genetic polymorphisms – the genes for central metabolism, development cycle and virulence are conserved – or phenotypic characteristics that indicate any altered biological fitness. This is supported by the observations that the nvCT and wild-type C. trachomatis infections are very similar in terms of epidemiological distribution, and that differences in clinical signs are only described, in one study, in women. In conclusion, the nvCT does not appear to have any altered biological fitness. Therefore, the rapid transmission of nvCT in Sweden was due to the strong diagnostic selective advantage and its introduction into a high-frequency transmitting population. PMID:20093289

  18. Sequence variants affecting eosinophil numbers associate with asthma and myocardial infarction.

    PubMed

    Gudbjartsson, Daniel F; Bjornsdottir, Unnur S; Halapi, Eva; Helgadottir, Anna; Sulem, Patrick; Jonsdottir, Gudrun M; Thorleifsson, Gudmar; Helgadottir, Hafdis; Steinthorsdottir, Valgerdur; Stefansson, Hreinn; Williams, Carolyn; Hui, Jennie; Beilby, John; Warrington, Nicole M; James, Alan; Palmer, Lyle J; Koppelman, Gerard H; Heinzmann, Andrea; Krueger, Marcus; Boezen, H Marike; Wheatley, Amanda; Altmuller, Janine; Shin, Hyoung Doo; Uh, Soo-Taek; Cheong, Hyun Sub; Jonsdottir, Brynja; Gislason, David; Park, Choon-Sik; Rasmussen, Linda M; Porsbjerg, Celeste; Hansen, Jakob W; Backer, Vibeke; Werge, Thomas; Janson, Christer; Jönsson, Ulla-Britt; Ng, Maggie C Y; Chan, Juliana; So, Wing Yee; Ma, Ronald; Shah, Svati H; Granger, Christopher B; Quyyumi, Arshed A; Levey, Allan I; Vaccarino, Viola; Reilly, Muredach P; Rader, Daniel J; Williams, Michael J A; van Rij, Andre M; Jones, Gregory T; Trabetti, Elisabetta; Malerba, Giovanni; Pignatti, Pier Franco; Boner, Attilio; Pescollderungg, Lydia; Girelli, Domenico; Olivieri, Oliviero; Martinelli, Nicola; Ludviksson, Bjorn R; Ludviksdottir, Dora; Eyjolfsson, Gudmundur I; Arnar, David; Thorgeirsson, Gudmundur; Deichmann, Klaus; Thompson, Philip J; Wjst, Matthias; Hall, Ian P; Postma, Dirkje S; Gislason, Thorarinn; Gulcher, Jeffrey; Kong, Augustine; Jonsdottir, Ingileif; Thorsteinsdottir, Unnur; Stefansson, Kari

    2009-03-01

    Eosinophils are pleiotropic multifunctional leukocytes involved in initiation and propagation of inflammatory responses and thus have important roles in the pathogenesis of inflammatory diseases. Here we describe a genome-wide association scan for sequence variants affecting eosinophil counts in blood of 9,392 Icelanders. The most significant SNPs were studied further in 12,118 Europeans and 5,212 East Asians. SNPs at 2q12 (rs1420101), 2q13 (rs12619285), 3q21 (rs4857855), 5q31 (rs4143832) and 12q24 (rs3184504) reached genome-wide significance (P = 5.3 x 10(-14), 5.4 x 10(-10), 8.6 x 10(-17), 1.2 x 10(-10) and 6.5 x 10(-19), respectively). A SNP at IL1RL1 associated with asthma (P = 5.5 x 10(-12)) in a collection of ten different populations (7,996 cases and 44,890 controls). SNPs at WDR36, IL33 and MYB that showed suggestive association with eosinophil counts were also associated with atopic asthma (P = 4.2 x 10(-6), 2.2 x 10(-5) and 2.4 x 10(-4), respectively). We also found that a nonsynonymous SNP at 12q24, in SH2B3, associated significantly (P = 8.6 x 10(-8)) with myocardial infarction in six different populations (6,650 cases and 40,621 controls).

  19. Dietary fatty acids modulate associations between genetic variants and circulating fatty acids in plasma and erythrocyte membranes: meta-analysis of 9 studies in the CHARGE consortium

    PubMed Central

    Smith, Caren E.; Follis, Jack L.; Nettleton, Jennifer A.; Foy, Millennia; Wu, Jason H.Y.; Ma, Yiyi; Tanaka, Toshiko; Manichakul, Ani W.; Wu, Hongyu; Chu, Audrey Y.; Steffen, Lyn M.; Fornage, Myriam; Mozaffarian, Dariush; Kabagambe, Edmond K.; Ferruci, Luigi; da Chen, Yii-Der I; Rich, Stephen S.; Djoussé, Luc; Ridker, Paul M.; Tang, Weihong; McKnight, Barbara; Tsai, Michael Y.; Bandinelli, Stefania; Rotter, Jerome I.; Hu, Frank B.; Chasman, Daniel I.; Psaty, Bruce M.; Arnett, Donna K.; King, Irena B.; Sun, Qi; Wang, Lu; Lumley, Thomas; Chiuve, Stephanie E.; Siscovick, David S; Ordovás, José M.; Lemaitre, Rozenn N.

    2015-01-01

    Scope: Tissue concentrations of omega-3 fatty acids may reduce cardiovascular disease risk, and genetic variants are associated with circulating fatty acid concentrations. Whether dietary fatty acids interact with genetic variants to modify circulating omega-3 fatty acids is unclear. Objective: We evaluated interactions between genetic variants and fatty acid intakes for circulating alpha-linolenic acid (ALA), eicosapentaenoic acid (EPA), docosahexaenoic acid (DHA) and docosapentaenoic acid (DPA). Methods and Results: We conducted meta-analyses (N up to 11,668) evaluating interactions between dietary fatty acids and genetic variants (rs174538 and rs174548 in FADS1 (fatty acid desaturase 1), rs7435 in AGPAT3 (1-acyl-sn-glycerol-3-phosphate), rs4985167 in PDXDC1 (pyridoxal-dependent decarboxylase domain-containing 1), rs780094 in GCKR (glucokinase regulatory protein) and rs3734398 in ELOVL2 (fatty acid elongase 2)). Stratification by measurement compartment (plasma vs. erythrocyte) revealed compartment-specific interactions between FADS1 rs174538 and rs174548 and dietary ALA and linoleic acid for DHA and DPA. Conclusion: Our findings reinforce earlier reports that genetically based differences in circulating fatty acids may be partially due to differences in the conversion of fatty acid precursors. Further, the fatty acid measurement compartment may modify gene-diet relationships, and considering compartment may improve the detection of gene-fatty acid interactions for circulating fatty acid outcomes. PMID:25626431

  20. Genetic variants of the unsaturated fatty acid receptor GPR120 relating to obesity in dogs

    PubMed Central

    MIYABE, Masahiro; GIN, Azusa; ONOZAWA, Eri; DAIMON, Mana; YAMADA, Hana; ODA, Hitomi; MORI, Akihiro; MOMOTA, Yutaka; AZAKAMI, Daigo; YAMAMOTO, Ichiro; MOCHIZUKI, Mariko; SAKO, Toshinori; TAMURA, Katsutoshi; ISHIOKA, Katsumi

    2015-01-01

    G protein-coupled receptor (GPR) 120 is an unsaturated fatty acid receptor, which is associated with various physiological functions. It is reported that the genetic variant of GPR120, p.Arg270His, is detected more often in obese people, and this genetic variation functionally relates to obesity in humans. Obesity is also a common nutritional disorder in dogs, but the genetic factors have never been identified in dogs. In this study, we investigated the molecular structure of canine GPR120 and searched for candidate genetic variants which may relate to obesity in dogs. Canine GPR120 was highly homologous to those of other species, and seven transmembrane domains and two N-glycosylation sites were conserved. GPR120 mRNA was expressed in lung, jejunum, ileum, colon, hypothalamus, hippocampus, spinal cord, bone marrow, dermis and white adipose tissues in dogs, as in mice and humans. Genetic variants of GPR120 were explored in 141 client-owned dogs, and 5 synonymous and 4 non-synonymous variants were found. The variant c.595C>A (p.Pro199Thr) was found in 40 dogs, and the gene frequency was significantly higher in dogs with higher body condition scores, i.e. 0.320 in BCS4–5 dogs, 0.175 in BCS3 dogs and 0.000 in BCS2 dogs. We conclude that c.595C>A (p.Pro199Thr) is a candidate variant relating to obesity, which may be helpful for nutritional management of dogs. PMID:25960032
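
    The per-group gene (allele) frequencies quoted in the abstract above are the kind of figure that can be derived from genotype counts. The sketch below is only an illustration of that arithmetic for a biallelic variant in a diploid species; the function name and the example counts are hypothetical and are not taken from the study, which does not report genotype counts in this abstract.

```python
def allele_frequency(n_hom_alt: int, n_het: int, n_total: int) -> float:
    """Frequency of the variant allele in a group of diploid individuals.

    n_hom_alt: individuals carrying two copies of the variant allele
    n_het:     individuals carrying one copy
    n_total:   all genotyped individuals in the group
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if n_hom_alt < 0 or n_het < 0 or n_hom_alt + n_het > n_total:
        raise ValueError("carrier counts are inconsistent with n_total")
    # Each individual contributes two alleles to the denominator.
    return (2 * n_hom_alt + n_het) / (2 * n_total)


# Hypothetical group: 10 dogs, 2 homozygous and 4 heterozygous for the variant.
example_frequency = allele_frequency(n_hom_alt=2, n_het=4, n_total=10)
```

    Computing the frequency separately for each body condition score group would then allow the groups to be compared, as the abstract does.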

  1. In search of rare variants: preliminary results from whole genome sequencing of 1,325 individuals with psychophysiological endophenotypes.

    PubMed

    Vrieze, Scott I; Malone, Stephen M; Vaidyanathan, Uma; Kwong, Alan; Kang, Hyun Min; Zhan, Xiaowei; Flickinger, Matthew; Irons, Daniel; Jun, Goo; Locke, Adam E; Pistis, Giorgio; Porcu, Eleonora; Levy, Shawn; Myers, Richard M; Oetting, William; McGue, Matt; Abecasis, Goncalo; Iacono, William G

    2014-12-01

    Whole genome sequencing was completed on 1,325 individuals from 602 families, identifying 27 million autosomal variants. Genetic association tests were conducted for those individuals who had been assessed for one or more of 17 endophenotypes (N range = 802-1,185). No significant associations were found. These 27 million variants were then imputed into the full sample of individuals with psychophysiological data (N range = 3,088-4,469) and again tested for associations with the 17 endophenotypes. No association was significant. Using a gene-based variable threshold burden test of nonsynonymous variants, we obtained five significant associations. These findings are preliminary and call for additional analysis of this rich sample. We argue that larger samples, alternative study designs, and additional bioinformatics approaches will be necessary to discover associations between these endophenotypes and genomic variation. PMID:25387710

  2. In search of rare variants: Preliminary results from whole genome sequencing of 1,325 individuals with psychophysiological endophenotypes

    PubMed Central

    VRIEZE, SCOTT I.; MALONE, STEPHEN M.; VAIDYANATHAN, UMA; KWONG, ALAN; KANG, HYUN MIN; ZHAN, XIAOWEI; FLICKINGER, MATTHEW; IRONS, DANIEL; JUN, GOO; LOCKE, ADAM E.; PISTIS, GIORGIO; PORCU, ELEONORA; LEVY, SHAWN; MYERS, RICHARD M.; OETTING, WILLIAM; MCGUE, MATT; ABECASIS, GONCALO; IACONO, WILLIAM G.

    2014-01-01

    Whole genome sequencing was completed on 1,325 individuals from 602 families, identifying 27 million autosomal variants. Genetic association tests were conducted for those individuals who had been assessed for one or more of 17 endophenotypes (N range = 802–1,185). No significant associations were found. These 27 million variants were then imputed into the full sample of individuals with psychophysiological data (N range = 3,088–4,469) and again tested for associations with the 17 endophenotypes. No association was significant. Using a gene-based variable threshold burden test of nonsynonymous variants, we obtained five significant associations. These findings are preliminary and call for additional analysis of this rich sample. We argue that larger samples, alternative study designs, and additional bioinformatics approaches will be necessary to discover associations between these endophenotypes and genomic variation. PMID:25387710

  3. Exome Sequencing Analysis Reveals Variants in Primary Immunodeficiency Genes in Patients With Very Early Onset Inflammatory Bowel Disease

    PubMed Central

    Kelsen, Judith R.; Dawany, Noor; Moran, Christopher J.; Petersen, Britt-Sabina; Sarmady, Mahdi; Sasson, Ariella; Pauly-Hubbard, Helen; Martinez, Alejandro; Maurer, Kelly; Soong, Joanne; Rappaport, Eric; Franke, Andre; Keller, Andreas; Winter, Harland S.; Mamula, Petar; Piccoli, David; Artis, David; Sonnenberg, Gregory F.; Daly, Mark; Sullivan, Kathleen E.; Baldassano, Robert N.; Devoto, Marcella

    2016-01-01

    Background & Aims Very early onset inflammatory bowel disease (VEO-IBD), IBD diagnosed ≤5 y of age, frequently presents with a different and more severe phenotype than older-onset IBD. We investigated whether patients with VEO-IBD carry rare or novel variants in genes associated with immunodeficiencies that might contribute to disease development. Methods Patients with VEO-IBD and parents (when available) were recruited from the Children's Hospital of Philadelphia from March 2013 through July 2014. We analyzed DNA from 125 patients with VEO-IBD (ages 3 weeks to 4 y) and 19 parents, 4 of whom also had IBD. Exome capture was performed by Agilent SureSelect V4, and sequencing was performed using the Illumina HiSeq platform. Alignment to human genome GRCh37 was achieved followed by post-processing and variant calling. Following functional annotation, candidate variants were analyzed for change in protein function, minor allele frequency <0.1%, and scaled combined annotation dependent depletion scores ≤10. We focused on genes associated with primary immunodeficiencies and related pathways. An additional 210 exome samples from patients with pediatric IBD (n=45) or adult-onset Crohn's disease (n=20) and healthy individuals (controls, n=145) were obtained from the University of Kiel, Germany and used as control groups. Results Four-hundred genes and regions associated with primary immunodeficiency, covering approximately 6500 coding exons totaling > 1 Mbp of coding sequence, were selected from the whole exome data. Our analysis revealed novel and rare variants within these genes that could contribute to the development of VEO-IBD, including rare heterozygous missense variants in IL10RA and previously unidentified variants in MSH5 and CD19. Conclusions In an exome sequence analysis of patients with VEO-IBD and their parents, we identified variants in genes that regulate B- and T-cell functions and could contribute to pathogenesis. Our analysis could lead to the

  4. From Artificial Amino Acids to Sequence-Defined Targeted Oligoaminoamides.

    PubMed

    Morys, Stephan; Wagner, Ernst; Lächelt, Ulrich

    2016-01-01

    Artificial oligoamino acids with appropriate protecting groups can be used for the sequential assembly of oligoaminoamides on solid-phase. With the help of these oligoamino acids multifunctional nucleic acid (NA) carriers can be designed and produced in highly defined topologies. Here we describe the synthesis of the artificial oligoamino acid Fmoc-Stp(Boc3)-OH, the subsequent assembly into sequence-defined oligomers and the formulation of tumor-targeted plasmid DNA (pDNA) polyplexes. PMID:27436323

  5. Next-generation re-sequencing of genes involved in increased platelet reactivity in diabetic patients on acetylsalicylic acid.

    PubMed

    Postula, Marek; Janicki, Piotr K; Eyileten, Ceren; Rosiak, Marek; Kaplon-Cieslicka, Agnieszka; Sugino, Shigekazu; Wilimski, Radosław; Kosior, Dariusz A; Opolski, Grzegorz; Filipiak, Krzysztof J; Mirowska-Guzel, Dagmara

    2016-06-01

    The objective of this study was to investigate whether rare missense genetic variants in several genes related to platelet functions and acetylsalicylic acid (ASA) response are associated with platelet reactivity in patients with type 2 diabetes (T2DM) on ASA therapy. Fifty-eight exons and corresponding introns of eight selected genes, including PTGS1, PTGS2, TBXAS1, PTGIS, ADRA2A, ADRA2B, TBXA2R, and P2RY1, were re-sequenced in 230 DNA samples from T2DM patients by using pooled PCR amplification and next-generation sequencing on the Illumina HiSeq2000. The observed non-synonymous variants were confirmed by individual genotyping of 384 DNA samples comprising the individuals from the original discovery pools and an additional verification cohort of 154 ASA-treated T2DM patients. The association between the investigated phenotypes (ASA-induced changes in platelet reactivity by PFA-100, VerifyNow and serum thromboxane B2 level [sTxB2]) and accumulation of rare missense variants (genetic burden) in the investigated genes was tested using statistical collapsing tests. We identified a total of 35 exonic variants, including 3 common missense variants, 15 rare missense variants, and 17 synonymous variants in the 8 investigated genes. The rare missense variants exhibited a statistically significant difference in the accumulation pattern between groups of patients with increased and normal platelet reactivity based on the PFA-100 assay. Our study suggests that the genetic burden of rare functional variants in eight genes may contribute to differences in platelet reactivity measured with the PFA-100 assay in T2DM patients treated with ASA. PMID:26599574

  6. Genome Sequence of Rough and Smooth Variants of Pleomorphic Strain Lactobacillus farciminis CNCM-I-3699

    PubMed Central

    Tareb, R.; Bernardeau, M.

    2015-01-01

    The probiotic Lactobacillus farciminis CNCM-I-3699 is a pleomorphic strain exhibiting smooth and rough variants. We report their complete genomes, each consisting of a chromosome of 2.4 Mb and a plasmid of 6,417 bp. The smooth variant differs by the presence of an additional plasmid of 35,418 bp. PMID:26383668

  7. Whole-Genome Sequencing of a Canine Family Trio Reveals a FAM83G Variant Associated with Hereditary Footpad Hyperkeratosis.

    PubMed

    Sayyab, Shumaila; Viluma, Agnese; Bergvall, Kerstin; Brunberg, Emma; Jagannathan, Vidhya; Leeb, Tosso; Andersson, Göran; Bergström, Tomas F

    2016-03-01

    Over 250 Mendelian traits and disorders caused by rare alleles have been mapped in the canine genome. Although each disease is rare in the dog as a species, they are collectively common and have a major impact on canine health. With SNP-based genotyping arrays, genome-wide association studies (GWAS) have proven to be a powerful method to map the genomic region of interest when 10-20 cases and 10-20 controls are available. However, to identify the genetic variant in associated regions, fine-mapping and targeted resequencing are required. Here we present a new approach using whole-genome sequencing (WGS) of a family trio without prior GWAS. As a proof-of-concept, we chose an autosomal recessive disease known as hereditary footpad hyperkeratosis (HFH) in Kromfohrländer dogs. To our knowledge, this is the first time this family trio WGS-approach has been used successfully to identify a genetic variant that perfectly segregates with a canine disorder. The sequencing of three Kromfohrländer dogs from a family trio (an affected offspring and both its healthy parents) resulted in an average genome coverage of 9.2X per individual. After applying stringent filtering criteria for candidate causative coding variants, 527 single nucleotide variants (SNVs) and 15 indels were found to be homozygous in the affected offspring and heterozygous in the parents. Using the computer software packages ANNOVAR and SIFT to functionally annotate coding sequence differences and to predict their functional effect resulted in seven candidate variants located in six different genes. Of these, only FAM83G:c.155G>C (p.R52P) was found to be concordant in eight additional cases and 16 healthy Kromfohrländer dogs. PMID:26747202
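
    The segregation filter described in the abstract above (homozygous in the affected offspring, heterozygous in both healthy parents) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: the `TrioCall` record, the 0/1 allele encoding (0 = reference, 1 = alternate) and the example variant names are all hypothetical, and a real analysis would read genotypes from a VCF and apply quality filters first.

```python
from dataclasses import dataclass
from typing import List, Tuple

Genotype = Tuple[int, int]  # hypothetical encoding: 0 = reference, 1 = alternate


@dataclass(frozen=True)
class TrioCall:
    """Genotypes of one variant in an affected offspring and both parents."""
    variant: str
    child: Genotype
    sire: Genotype
    dam: Genotype


def is_hom_alt(gt: Genotype) -> bool:
    return gt == (1, 1)


def is_het(gt: Genotype) -> bool:
    return sorted(gt) == [0, 1]


def recessive_candidates(calls: List[TrioCall]) -> List[str]:
    """Keep variants consistent with autosomal recessive inheritance in a trio."""
    return [
        c.variant
        for c in calls
        if is_hom_alt(c.child) and is_het(c.sire) and is_het(c.dam)
    ]


# Hypothetical calls, for illustration only.
example_calls = [
    TrioCall("varA", child=(1, 1), sire=(0, 1), dam=(1, 0)),  # passes
    TrioCall("varB", child=(1, 1), sire=(1, 1), dam=(0, 1)),  # sire homozygous
    TrioCall("varC", child=(0, 1), sire=(0, 1), dam=(0, 1)),  # child heterozygous
]
example_candidates = recessive_candidates(example_calls)
```

    Variants surviving this filter would still need functional annotation and checking in additional cases and controls, as the abstract describes.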

  8. Whole-Genome Sequencing of a Canine Family Trio Reveals a FAM83G Variant Associated with Hereditary Footpad Hyperkeratosis

    PubMed Central

    Sayyab, Shumaila; Viluma, Agnese; Bergvall, Kerstin; Brunberg, Emma; Jagannathan, Vidhya; Leeb, Tosso; Andersson, Göran; Bergström, Tomas F.

    2016-01-01

    Over 250 Mendelian traits and disorders caused by rare alleles have been mapped in the canine genome. Although each disease is rare in the dog as a species, they are collectively common and have a major impact on canine health. With SNP-based genotyping arrays, genome-wide association studies (GWAS) have proven to be a powerful method to map the genomic region of interest when 10–20 cases and 10–20 controls are available. However, to identify the genetic variant in associated regions, fine-mapping and targeted resequencing are required. Here we present a new approach using whole-genome sequencing (WGS) of a family trio without prior GWAS. As a proof-of-concept, we chose an autosomal recessive disease known as hereditary footpad hyperkeratosis (HFH) in Kromfohrländer dogs. To our knowledge, this is the first time this family trio WGS-approach has been used successfully to identify a genetic variant that perfectly segregates with a canine disorder. The sequencing of three Kromfohrländer dogs from a family trio (an affected offspring and both its healthy parents) resulted in an average genome coverage of 9.2X per individual. After applying stringent filtering criteria for candidate causative coding variants, 527 single nucleotide variants (SNVs) and 15 indels were found to be homozygous in the affected offspring and heterozygous in the parents. Using the computer software packages ANNOVAR and SIFT to functionally annotate coding sequence differences and to predict their functional effect resulted in seven candidate variants located in six different genes. Of these, only FAM83G:c.155G>C (p.R52P) was found to be concordant in eight additional cases and 16 healthy Kromfohrländer dogs. PMID:26747202

  9. CBH1 homologs and variant CBH1 cellulases

    DOEpatents

    Goedegebuur, Frits; Gualfetti, Peter; Mitchinson, Colin; Neefe, Paulien

    2008-11-18

    Disclosed are a number of homologs and variants of Hypocrea jecorina Cel7A (formerly Trichoderma reesei cellobiohydrolase I or CBH1), nucleic acids encoding the same and methods for producing the same. The homologs and variant cellulases have the amino acid sequence of a glycosyl hydrolase of family 7A wherein one or more amino acid residues are substituted and/or deleted.

  10. CBH1 homologs and variant CBH1 cellulases

    DOEpatents

    Goedegebuur, Frits; Gualfetti, Peter; Mitchinson, Colin; Neefe, Paulien

    2011-05-31

    Disclosed are a number of homologs and variants of Hypocrea jecorina Cel7A (formerly Trichoderma reesei cellobiohydrolase I or CBH1), nucleic acids encoding the same and methods for producing the same. The homologs and variant cellulases have the amino acid sequence of a glycosyl hydrolase of family 7A wherein one or more amino acid residues are substituted and/or deleted.

  11. Clinically relevant variants identified in thoracic aortic aneurysm patients by research exome sequencing.

    PubMed

    Schubert, Jeffrey A; Landis, Benjamin J; Shikany, Amy R; Hinton, Robert B; Ware, Stephanie M

    2016-05-01

    Thoracic aortic aneurysm (TAA) is a genetically heterogeneous disease involving subclinical and progressive dilation of the thoracic aorta, which can lead to life-threatening complications such as dissection or rupture. Genetic testing is important for risk stratification and identification of at risk family members, and clinically available genetic testing panels have been expanding rapidly. However, when past testing results are normal, there is little evidence to guide decision-making about the indications and timing to pursue additional clinical genetic testing. Results from research based genetic testing can help inform this process. Here we present 10 TAA patients who have a family history of disease and who enrolled in research-based exome testing. Nine of these ten patients had previous clinical genetic testing that did not identify the cause of disease. We sought to determine the number of rare variants in 23 known TAA associated genes identified by research-based exome testing. In total, we found 10 rare variants in six patients. Likely pathogenic variants included a TGFB2 variant in one patient and a SMAD3 variant in another. These variants have been reported previously in individuals with similar phenotypes. Variants of uncertain significance of particular interest included novel variants in MYLK and MFAP5, which were identified in a third patient. In total, clinically reportable rare variants were found in 6/10 (60%) patients, with at least 2/10 (20%) patients having likely pathogenic variants identified. These data indicate that consideration of re-testing is important in TAA patients with previous negative or inconclusive results. PMID:26854089

  12. Angiogenesis-associated sequence variants relative to breast cancer recurrence and survival

    PubMed Central

    Brock, Guy N.; VanCleave, Tiva T.; Benford, Marnita L.; Lavender, Nicole A.; Kruer, Traci L.; Wittliff, James L.

    2016-01-01

    Introduction Breast cancer (BrCA) risk stratification using clinico-pathological biomarkers helps improve disease prognosis prediction. However, disease recurrence rates remain unfavorable and individualized clinical management strategies are needed. Consequently, we evaluated the influence of 14 sequence variants detected in IL-10, TGF-β1, VEGF, and their associated receptors as effective predictors of BrCA clinical outcomes. Methods Tumor DNA samples collected from 441 BrCA patients were genotyped using TaqMan-PCR. Most selected targets alter cytokine serum/plasma levels or signaling pathways. Relationships between genetic profiles and recurrence as well as disease-related mortality were evaluated using cumulative incidence curves and competing risk regression models. Results The VEGF−2578 C allele was associated with a 1.3- to 1.6-fold increase in BrCA recurrence (HRtrend = 1.28; 95% CI = 0.96–1.72) and disease-related mortality (HRtrend = 1.56; 95% CI = 0.93–2.56). Although this marker was marginally significant relative to BrCA outcomes, there were substantial gains in the 5- and 8-year predictive accuracy compared to standard prognostic indicators. Among ER+/PR+ status patients, there was a significant impact of the VEGF−2578 CC genotype on disease recurrence and predictive accuracy. Conclusions Our findings suggest inheritance of the VEGF−2578 C allele could serve as an independent indicator of BrCA prognosis. The VEGF−2578 marker may have clinical implications among a subset of ER+/PR+ patients with an aggressive phenotype. Because the VEGF−2578 C allele is linked to high VEGF expression, this cytokine is a potential prognostic and targeted clinical management tool. PMID:20571871

  13. Construction and Application of Variants of the Pseudomonas fluorescens EBC191 Arylacetonitrilase for Increased Production of Acids or Amides

    PubMed Central

    Sosedov, Olga; Baum, Stefanie; Bürger, Sibylle; Matzer, Kathrin; Kiziak, Christoph; Stolz, Andreas

    2010-01-01

    The arylacetonitrilase from Pseudomonas fluorescens EBC191 differs from previously studied arylacetonitrilases by its low enantiospecificity during the turnover of mandelonitrile and by the large amounts of amides that are formed in the course of this reaction. In the sequence of the nitrilase from P. fluorescens, a cysteine residue (Cys163) is present directly adjacent (toward the amino terminus) to the catalytically active cysteine residue, which is rather unusual among bacterial nitrilases. Therefore, this cysteine residue was exchanged in the nitrilase from P. fluorescens EBC191 for various amino acid residues which are present in other nitrilases at the homologous position. The influence of these mutations on the reaction specificity and enantiospecificity was analyzed with (R,S)-mandelonitrile and (R,S)-2-phenylpropionitrile as substrates. The mutants obtained demonstrated significant differences in their amide-forming capacities. The exchange of Cys163 for asparagine or glutamine residues resulted in significantly increased amounts of amides formed. In contrast, a substitution for alanine or serine residues decreased the amounts of amides formed. The newly discovered mutation was combined with previously identified mutations which also resulted in increased amide formation. Thus, variants were constructed which possessed, in addition to the mutation Cys163Asn, a deletion at the C terminus of the enzyme and/or the modification Ala165Arg. These constructs demonstrated increased amide formation capacity in comparison to the mutants carrying only single mutations. The recombinant plasmids that encoded enzyme variants which formed large amounts of mandeloamide or that formed almost stoichiometric amounts of mandelic acid from mandelonitrile were used to transform Escherichia coli strains that expressed a plant-derived (S)-hydroxynitrile lyase. The whole-cell biocatalysts obtained in this way converted benzaldehyde plus cyanide either to (S)-mandeloamide or (S

  14. Antagonistic lactic acid bacteria isolated from goat milk and identification of a novel nisin variant Lactococcus lactis

    PubMed Central

    2014-01-01

    Background The raw goat milk microbiota is considered a good source of novel bacteriocinogenic lactic acid bacteria (LAB) strains that can be exploited as an alternative for use as biopreservatives in foods. The constant demand for such alternative tools justifies studies that investigate the antimicrobial potential of such strains. Results The obtained data identified a predominance of Lactococcus and Enterococcus strains in raw goat milk microbiota with antimicrobial activity against Listeria monocytogenes ATCC 7644. Enzymatic assays confirmed the bacteriocinogenic nature of the antimicrobial substances produced by the isolated strains, and PCR reactions detected a variety of bacteriocin-related genes in their genomes. Rep-PCR identified broad genetic variability among the Enterococcus isolates, and close relations between the Lactococcus strains. The sequencing of PCR products from nis-positive Lactococcus allowed the identification of a predicted nisin variant not previously described and possessing a wide inhibitory spectrum. Conclusions Raw goat milk was confirmed as a good source of novel bacteriocinogenic LAB strains, having identified Lactococcus isolates possessing variations in their genomes that suggest the production of a nisin variant not yet described and with potential for use as biopreservatives in food due to its broad spectrum of action. PMID:24521354

  15. In vivo distribution and cytopathology of variants of human immunodeficiency virus type 1 showing restricted sequence variability in the V3 loop.

    PubMed Central

    Donaldson, Y K; Bell, J E; Holmes, E C; Hughes, E S; Brown, H K; Simmonds, P

    1994-01-01

    The distribution, cell tropism, and cytopathology in vivo of human immunodeficiency virus (HIV) were investigated in postmortem tissue samples from a series of HIV-infected individuals who died either of complications associated with AIDS or for unrelated reasons while they were asymptomatic. Proviral sequences were detected at a high copy number in lymphoid tissue of both presymptomatic patients and patients with AIDS, whereas significant infection of nonlymphoid tissue such as that from brains, spinal cords, and lungs was confined to those with AIDS. V3 loop sequences from both groups showed highly restricted sequence variability and a low overall positive charge of the encoded amino acid sequence compared with those of standard laboratory isolates of HIV type 1 (HIV-1). The low charge and the restriction in sequence variability were comparable to those observed with isolates showing a non-syncytium-inducing (NSI) and macrophage-tropic phenotype in vitro. All patients were either exclusively infected (six of seven cases) or predominantly infected (one case) with variants with a predicted NSI/macrophage-tropic phenotype, irrespective of the degree of disease progression. p24 antigen was detected by immunocytochemical staining of paraffin-fixed sections in the germinal centers within lymphoid tissue, although little or no antigen was found in areas of lymph node or spleen containing T lymphocytes from either presymptomatic patients or patients with AIDS. The predominant p24 antigen-expressing cells in the lungs and brains of the patients with AIDS were macrophages and microglia (in brains), frequently forming multinucleated giant cells (syncytia) even though the V3 loop sequences of these variants resembled those of NSI isolates in vitro. These studies indicate that lack of syncytium-forming ability in established T-cell lines does not necessarily predict syncytium-forming ability in primary target cells in vivo. Furthermore, variants of HIV with V3 sequences

  16. Multiple Amino Acid Sequence Alignment Nitrogenase Component 1: Insights into Phylogenetics and Structure-Function Relationships

    PubMed Central

    Howard, James B.; Kechris, Katerina J.; Rees, Douglas C.; Glazer, Alexander N.

    2013-01-01

    Amino acid residues critical for a protein's structure-function are retained by natural selection and these residues are identified by the level of variance in co-aligned homologous protein sequences. The relevant residues in the nitrogen fixation Component 1 α- and β-subunits were identified by the alignment of 95 protein sequences. Proteins were included from species encompassing multiple microbial phyla and diverse ecological niches as well as the nitrogen fixation genotypes, anf, nif, and vnf, which encode proteins associated with cofactors differing at one metal site. After adjusting for differences in sequence length, insertions, and deletions, the remaining >85% of the sequence co-aligned the subunits from the three genotypes. Six Groups, designated Anf, Vnf, and Nif I-IV, were assigned based upon genetic origin, sequence adjustments, and conserved residues. Both subunits subdivided into the same groups. Invariant and single variant residues were identified and were defined as “core” for nitrogenase function. Three species in Group Nif-III, Candidatus Desulforudis audaxviator, Desulfotomaculum kuznetsovii, and Thermodesulfatator indicus, were found to have a seleno-cysteine that replaces one cysteinyl ligand of the 8Fe:7S P-cluster. Subsets of invariant residues, limited to individual groups, were identified; these unique residues help identify the gene of origin (anf, nif, or vnf) yet should not be considered diagnostic of the metal content of associated cofactors. Fourteen of the 19 residues that compose the cofactor pocket are invariant or single variant; the other five residues are highly variable but do not correlate with the putative metal content of the cofactor. The variable residues are clustered on one side of the cofactor, away from other functional centers in the three dimensional structure. Many of the invariant and single variant residues were not previously recognized as potentially critical and their identification provides the bases

  17. Identification of rare DNA sequence variants in high-risk autism families and their prevalence in a large case/control population

    PubMed Central

    2014-01-01

    Background Genetics clearly plays a major role in the etiology of autism spectrum disorders (ASDs), but studies to date are only beginning to characterize the causal genetic variants responsible. Until recently, studies using multiple extended multi-generation families to identify ASD risk genes had not been undertaken. Methods We identified haplotypes shared among individuals with ASDs in large multiplex families, followed by targeted DNA capture and sequencing to identify potential causal variants. We also assayed the prevalence of the identified variants in a large ASD case/control population. Results We identified 584 non-conservative missense, nonsense, frameshift and splice site variants that might predispose to autism in our high-risk families. Eleven of these variants were observed to have odds ratios greater than 1.5 in a set of 1,541 unrelated children with autism and 5,785 controls. Three variants, in the RAB11FIP5, ABP1, and JMJD7-PLA2G4B genes, each were observed in a single case and not in any controls. These variants also were not seen in public sequence databases, suggesting that they may be rare causal ASD variants. Twenty-eight additional rare variants were observed only in high-risk ASD families. Collectively, these 39 variants identify 36 genes as ASD risk genes. Segregation of sequence variants and of copy number variants previously detected in these families reveals a complex pattern, with only a RAB11FIP5 variant segregating to all affected individuals in one two-generation pedigree. Some affected individuals were found to have multiple potential risk alleles, including sequence variants and copy number variants (CNVs), suggesting that the high incidence of autism in these families could be best explained by variants at multiple loci. Conclusions Our study is the first to use haplotype sharing to identify familial ASD risk loci. In total, we identified 39 variants in 36 genes that may confer a genetic risk of developing autism. The

  18. Segments of amino acid sequence similarity in beta-amylases.

    PubMed

    Friedberg, F; Rhodes, C

    1988-01-01

    In alpha-amylases from animals, plants and bacteria and in beta-amylases from plants and bacteria, a number of segments exhibit amino acid sequence similarity specific to the alpha or to the beta type, respectively. In the case of the beta-amylases the similar sequence regions are extensive and they are disrupted only by short interspersed dissimilar regions. Close to the C terminus, however, no such sequence similarity exists. PMID:2464171

  19. DNA sequence variants in the carbonyl reductase 1 (cbr1) gene in seven breeds of Canis lupus familiaris.

    PubMed

    Cheng, Q; Sanborn, C; Ferguson, D; Blanco, J G

    2012-04-27

    The anticancer anthracyclines doxorubicin and daunorubicin are used to treat a variety of cancers in dogs. The therapeutic utility of anthracyclines is limited by cardiotoxicity in some cases. Synthesis of anthracycline alcohol metabolites by carbonyl reductase 1 (CBR1) is crucial for the pathogenesis of cardiotoxicity. We hypothesize that genetic polymorphisms in canine cbr1 contribute to the variable pharmacodynamics of anthracyclines in dogs. DNA sequence variants in canine cbr1 were investigated in DNA samples from dogs of seven breeds. Thirteen SNPs were detected in canine cbr1. A 10-bp deletion in the 5'-untranslated region (5'-UTR) was found in specimens from the Labrador Retriever, Beagle, Siberian Husky, and Boxer breeds. The 5'-UTR also included a polymorphic "hot spot" region immediately downstream of the 10-bp deletion. DNA sequence variants in the "hot spot region" ranged from 1 to 21 bp in length. Bioinformatics searches identified a cluster of three to six potential binding sites for the transcription factor Sp1 in the DNA segment containing both the "hot spot" region and the 10-bp deletion. This information provides a foundation to allow us to investigate whether DNA sequence variants in the 5'-UTR of canine cbr1 impact the pharmacodynamics of anticancer anthracyclines in dogs.

  20. Influenza virus sequence feature variant type analysis: evidence of a role for NS1 in influenza virus host range restriction.

    PubMed

    Noronha, Jyothi M; Liu, Mengya; Squires, R Burke; Pickett, Brett E; Hale, Benjamin G; Air, Gillian M; Galloway, Summer E; Takimoto, Toru; Schmolke, Mirco; Hunt, Victoria; Klem, Edward; García-Sastre, Adolfo; McGee, Monnie; Scheuermann, Richard H

    2012-05-01

    Genetic drift of influenza virus genomic sequences occurs through the combined effects of sequence alterations introduced by a low-fidelity polymerase and the varying selective pressures experienced as the virus migrates through different host environments. While traditional phylogenetic analysis is useful in tracking the evolutionary heritage of these viruses, the specific genetic determinants that dictate important phenotypic characteristics are often difficult to discern within the complex genetic background arising through evolution. Here we describe a novel influenza virus sequence feature variant type (Flu-SFVT) approach, made available through the public Influenza Research Database resource (www.fludb.org), in which variant types (VTs) identified in defined influenza virus protein sequence features (SFs) are used for genotype-phenotype association studies. Since SFs have been defined for all influenza virus proteins based on known structural, functional, and immune epitope recognition properties, the Flu-SFVT approach allows the rapid identification of the molecular genetic determinants of important influenza virus characteristics and their connection to underlying biological functions. We demonstrate the use of the SFVT approach to obtain statistical evidence for effects of NS1 protein sequence variations in dictating influenza virus host range restriction.

  1. High-Throughput Sequencing of mGluR Signaling Pathway Genes Reveals Enrichment of Rare Variants in Autism

    PubMed Central

    Hovhannisyan, Hayk; Trautman, Edwin; Pinard, Robert; Rathmell, Barbara; Carpenter, Randall; Margulies, David

    2012-01-01

    Identification of common molecular pathways affected by genetic variation in autism is important for understanding disease pathogenesis and devising effective therapies. Here, we test the hypothesis that rare genetic variation in the metabotropic glutamate-receptor (mGluR) signaling pathway contributes to autism susceptibility. Single-nucleotide variants in genes encoding components of the mGluR signaling pathway were identified by high-throughput multiplex sequencing of pooled samples from 290 non-syndromic autism cases and 300 ethnically matched controls on two independent next-generation platforms. This analysis revealed significant enrichment of rare functional variants in the mGluR pathway in autism cases. Higher burdens of rare, potentially deleterious variants were identified in autism cases for three pathway genes previously implicated in syndromic autism spectrum disorder, TSC1, TSC2, and SHANK3, suggesting that genetic variation in these genes also contributes to risk for non-syndromic autism. In addition, our analysis identified HOMER1, which encodes a postsynaptic density-localized scaffolding protein that interacts with Shank3 to regulate mGluR activity, as a novel autism-risk gene. Rare, potentially deleterious HOMER1 variants identified uniquely in the autism population affected functionally important protein regions or regulatory sequences and co-segregated closely with autism among children of affected families. We also identified rare ASD-associated coding variants predicted to have damaging effects on components of the Ras/MAPK cascade. Collectively, these findings suggest that altered signaling downstream of mGluRs contributes to the pathogenesis of non-syndromic autism. PMID:22558107

  2. Development and Validation of a Scalable Next-Generation Sequencing System for Assessing Relevant Somatic Variants in Solid Tumors

    PubMed Central

    Hovelson, Daniel H.; McDaniel, Andrew S.; Cani, Andi K.; Johnson, Bryan; Rhodes, Kate; Williams, Paul D.; Bandla, Santhoshi; Bien, Geoffrey; Choppa, Paul; Hyland, Fiona; Gottimukkala, Rajesh; Liu, Guoying; Manivannan, Manimozhi; Schageman, Jeoffrey; Ballesteros-Villagrana, Efren; Grasso, Catherine S.; Quist, Michael J.; Yadati, Venkata; Amin, Anmol; Siddiqui, Javed; Betz, Bryan L.; Knudsen, Karen E.; Cooney, Kathleen A.; Feng, Felix Y.; Roh, Michael H.; Nelson, Peter S.; Liu, Chia-Jen; Beer, David G.; Wyngaard, Peter; Chinnaiyan, Arul M.; Sadis, Seth; Rhodes, Daniel R.; Tomlins, Scott A.

    2015-01-01

    Next-generation sequencing (NGS) has enabled genome-wide personalized oncology efforts at centers and companies with the specialty expertise and infrastructure required to identify and prioritize actionable variants. Such approaches are not scalable, preventing widespread adoption. Likewise, most targeted NGS approaches fail to assess key relevant genomic alteration classes. To address these challenges, we predefined the catalog of relevant solid tumor somatic genome variants (gain-of-function or loss-of-function mutations, high-level copy number alterations, and gene fusions) through comprehensive bioinformatics analysis of >700,000 samples. To detect these variants, we developed the Oncomine Comprehensive Panel (OCP), an integrative NGS-based assay [compatible with < 20 ng of DNA/RNA from formalin-fixed paraffin-embedded (FFPE) tissues], coupled with an informatics pipeline to specifically identify relevant predefined variants and created a knowledge base of related potential treatments, current practice guidelines, and open clinical trials. We validated OCP using molecular standards and more than 300 FFPE tumor samples, achieving >95% accuracy for KRAS, epidermal growth factor receptor, and BRAF mutation detection as well as for ALK and TMPRSS2:ERG gene fusions. Associating positive variants with potential targeted treatments demonstrated that 6% to 42% of profiled samples (depending on cancer type) harbored alterations beyond routine molecular testing that were associated with approved or guideline-referenced therapies. As a translational research tool, OCP identified adaptive CTNNB1 amplifications/mutations in treated prostate cancers. Through predefining somatic variants in solid tumors and compiling associated potential treatment strategies, OCP represents a simplified, broadly applicable targeted NGS system with the potential to advance precision oncology efforts. PMID:25925381

  3. Gene-Based Sequencing Identifies Lipid-Influencing Variants with Ethnicity-Specific Effects in African Americans

    PubMed Central

    Bentley, Amy R.; Chen, Guanjie; Shriner, Daniel; Doumatey, Ayo P.; Zhou, Jie; Huang, Hanxia; Mullikin, James C.; Blakesley, Robert W.; Hansen, Nancy F.; Bouffard, Gerard G.; Cherukuri, Praveen F.; Maskeri, Baishali; Young, Alice C.; Adeyemo, Adebowale; Rotimi, Charles N.

    2014-01-01

    Although a considerable proportion of serum lipids loci identified in European ancestry individuals (EA) replicate in African Americans (AA), interethnic differences in the distribution of serum lipids suggest that some genetic determinants differ by ethnicity. We conducted a comprehensive evaluation of five lipid candidate genes to identify variants with ethnicity-specific effects. We sequenced ABCA1, LCAT, LPL, PON1, and SERPINE1 in 48 AA individuals with extreme serum lipid concentrations (high HDLC/low TG or low HDLC/high TG). Identified variants were genotyped in the full population-based sample of AA (n = 1694) and tested for an association with serum lipids. rs328 (LPL) and correlated variants were associated with higher HDLC and lower TG. Interestingly, a stronger effect was observed on a “European” vs. “African” genetic background at this locus. To investigate this effect, we evaluated the region among West Africans (WA). For TG, the effect size among WA was the same in AA with only African local ancestry (2–3% lower TG), while the larger association among AA with local European ancestry matched previous reports in EA (10%). For HDLC, there was no association with rs328 in AA with only African local ancestry or in WA, while the association among AA with European local ancestry was much greater than what has been observed for EA (15 vs. ∼5 mg/dl), suggesting an interaction with an environmental or genetic factor that differs by ethnicity. Beyond this ancestry effect, the importance of African ancestry-focused, sequence-based work was also highlighted by serum lipid associations of variants that were in higher frequency (or present only) among those of African ancestry. By beginning our study with the sequence variation present in AA individuals, investigating local ancestry effects, and seeking replication in WA, we were able to comprehensively evaluate the role of a set of candidate genes in serum lipids in AA. PMID:24603370

  4. Gene-based sequencing identifies lipid-influencing variants with ethnicity-specific effects in African Americans.

    PubMed

    Bentley, Amy R; Chen, Guanjie; Shriner, Daniel; Doumatey, Ayo P; Zhou, Jie; Huang, Hanxia; Mullikin, James C; Blakesley, Robert W; Hansen, Nancy F; Bouffard, Gerard G; Cherukuri, Praveen F; Maskeri, Baishali; Young, Alice C; Adeyemo, Adebowale; Rotimi, Charles N

    2014-03-01

    Although a considerable proportion of serum lipids loci identified in European ancestry individuals (EA) replicate in African Americans (AA), interethnic differences in the distribution of serum lipids suggest that some genetic determinants differ by ethnicity. We conducted a comprehensive evaluation of five lipid candidate genes to identify variants with ethnicity-specific effects. We sequenced ABCA1, LCAT, LPL, PON1, and SERPINE1 in 48 AA individuals with extreme serum lipid concentrations (high HDLC/low TG or low HDLC/high TG). Identified variants were genotyped in the full population-based sample of AA (n = 1694) and tested for an association with serum lipids. rs328 (LPL) and correlated variants were associated with higher HDLC and lower TG. Interestingly, a stronger effect was observed on a "European" vs. "African" genetic background at this locus. To investigate this effect, we evaluated the region among West Africans (WA). For TG, the effect size among WA was the same in AA with only African local ancestry (2-3% lower TG), while the larger association among AA with local European ancestry matched previous reports in EA (10%). For HDLC, there was no association with rs328 in AA with only African local ancestry or in WA, while the association among AA with European local ancestry was much greater than what has been observed for EA (15 vs. ∼ 5 mg/dl), suggesting an interaction with an environmental or genetic factor that differs by ethnicity. Beyond this ancestry effect, the importance of African ancestry-focused, sequence-based work was also highlighted by serum lipid associations of variants that were in higher frequency (or present only) among those of African ancestry. By beginning our study with the sequence variation present in AA individuals, investigating local ancestry effects, and seeking replication in WA, we were able to comprehensively evaluate the role of a set of candidate genes in serum lipids in AA. PMID:24603370

  5. Gene-based sequencing identifies lipid-influencing variants with ethnicity-specific effects in African Americans.

    PubMed

    Bentley, Amy R; Chen, Guanjie; Shriner, Daniel; Doumatey, Ayo P; Zhou, Jie; Huang, Hanxia; Mullikin, James C; Blakesley, Robert W; Hansen, Nancy F; Bouffard, Gerard G; Cherukuri, Praveen F; Maskeri, Baishali; Young, Alice C; Adeyemo, Adebowale; Rotimi, Charles N

    2014-03-01

    Although a considerable proportion of serum lipids loci identified in European ancestry individuals (EA) replicate in African Americans (AA), interethnic differences in the distribution of serum lipids suggest that some genetic determinants differ by ethnicity. We conducted a comprehensive evaluation of five lipid candidate genes to identify variants with ethnicity-specific effects. We sequenced ABCA1, LCAT, LPL, PON1, and SERPINE1 in 48 AA individuals with extreme serum lipid concentrations (high HDLC/low TG or low HDLC/high TG). Identified variants were genotyped in the full population-based sample of AA (n = 1694) and tested for an association with serum lipids. rs328 (LPL) and correlated variants were associated with higher HDLC and lower TG. Interestingly, a stronger effect was observed on a "European" vs. "African" genetic background at this locus. To investigate this effect, we evaluated the region among West Africans (WA). For TG, the effect size among WA was the same in AA with only African local ancestry (2-3% lower TG), while the larger association among AA with local European ancestry matched previous reports in EA (10%). For HDLC, there was no association with rs328 in AA with only African local ancestry or in WA, while the association among AA with European local ancestry was much greater than what has been observed for EA (15 vs. ∼ 5 mg/dl), suggesting an interaction with an environmental or genetic factor that differs by ethnicity. Beyond this ancestry effect, the importance of African ancestry-focused, sequence-based work was also highlighted by serum lipid associations of variants that were in higher frequency (or present only) among those of African ancestry. By beginning our study with the sequence variation present in AA individuals, investigating local ancestry effects, and seeking replication in WA, we were able to comprehensively evaluate the role of a set of candidate genes in serum lipids in AA.

  6. Variant upstream regulatory region sequences differentially regulate human papillomavirus type 16 DNA replication throughout the viral life cycle.

    PubMed

    Hubert, Walter G

    2005-05-01

    While the central role of the viral upstream regulatory region (URR) in the human papillomavirus (HPV) life cycle has been well established, its effects on viral replication factor expression and plasmid replication of HPV type 16 (HPV16) remain unclear. Some nonprototypic variants of HPV16 contain altered URR sequences and are considered to increase the oncogenic risk of infections. To determine the relationship between viral replication and variant URRs, hybrid viral genomes were constructed with the replication-competent HPV16 prototype W12 and analyzed in assays which recapitulate the different phases of normal viral replication. The establishment efficiencies of hybrid HPV16 genomes differed about 20-fold among European prototypes and variants from Africa and America. Generally, European and African genomes exhibited the lowest replication efficiencies. The high replication levels observed with American variants were primarily attributable to their efficient expression of the replication factors E1 and E2. The maintenance levels of these viral genomes varied about fivefold, which correlated with their respective establishment phenotypes and published P(97) activities. Vegetative DNA amplification could also be observed with replicating HPV16 genomes. These results indicate that efficient E1/E2 expression and elevated plasmid replication levels during the persistent stage of infection may comprise a risk factor in HPV16-mediated oncogenesis.

  7. Complete Genome Sequences of Eight Human Papillomavirus Type 16 Asian American and European Variant Isolates from Cervical Biopsies and Lesions in Indian Women

    PubMed Central

    Mandal, Paramita; Sen, Shrinka; Bhattacharya, Amrapali; Roy Chowdhury, Rahul; Mondal, Nidhu Ranjan

    2016-01-01

    Human papillomavirus type 16 (HPV16), a member of the Papillomaviridae family, is the primary etiological agent of cervical cancer. Here, we report the complete genome sequences of four HPV16 Asian American variants and four European variants, isolated from cervical biopsies and scrapings in India. PMID:27198009

  8. Characterization of Genomic Variants Associated with Scout and Recruit Behavioral Castes in Honey Bees Using Whole-Genome Sequencing

    PubMed Central

    Southey, Bruce R.; Zhu, Ping; Carr-Markell, Morgan K.; Liang, Zhengzheng S.; Zayed, Amro; Li, Ruiqiang; Robinson, Gene E.; Rodriguez-Zas, Sandra L.

    2016-01-01

    Among forager honey bees, scouts seek new resources and return to the colony, enlisting recruits to collect these resources. Differentially expressed genes between these behaviors and genetic variability in scouting phenotypes have been reported. Whole-genome sequencing of 44 Apis mellifera scouts and recruits was undertaken to detect variants and further understand the genetic architecture underlying the behavioral differences between scouts and recruits. The median coverage depth in recruits and scouts was 10.01 and 10.7 X, respectively. Representation of bacterial species among the unmapped reads reflected a more diverse microbiome in scouts than recruits. Overall, 1,412,705 polymorphic positions were analyzed for associations with scouting behavior, and 212 significant (p-value < 0.0001) associations with scouting corresponding to 137 positions were detected. Most frequent putative transcription factor binding sites proximal to significant variants included Broad-complex 4, Broad-complex 1, Hunchback, and CF2-II. Three variants associated with scouting were located within coding regions of ncRNAs including one codon change (LOC102653644) and 2 frameshift indels (LOC102654879 and LOC102655256). Significant variants were also identified on the 5’UTR of membrin, and 3’UTRs of laccase 2 and diacylglycerol kinase theta. The 60 significant variants located within introns corresponded to 39 genes and most of these positions were > 1000 bp apart from each other. A number of these variants were mapped to ncRNA LOC100578102, solute carrier family 12 member 6-like gene, and LOC100576965 (meprin and TRAF-C homology domain containing gene). Functional categories represented among the genes corresponding to significant variants included: neuronal function, exoskeleton, immune response, salivary gland development, and enzymatic food processing. These categories offer a glimpse into the molecular support to the behaviors of scouts and recruits. The level of association

  9. Characterization of the Two Intra-Individual Sequence Variants in the 18S rRNA Gene in the Plant Parasitic Nematode, Rotylenchulus reniformis

    PubMed Central

    Nyaku, Seloame T.; Sripathi, Venkateswara R.; Kantety, Ramesh V.; Gu, Yong Q.; Lawrence, Kathy; Sharma, Govind C.

    2013-01-01

    The 18S rRNA gene is fundamental to cellular and organismal protein synthesis and because of its stable persistence through generations it is also used in phylogenetic analysis among taxa. Sequence variation in this gene within a single species is rare, but it has been observed in a few metazoan organisms. More frequently, it has been reported in the non-transcribed spacer region. Here, we have identified two sequence variants within the near full coding region of 18S rRNA gene from a single reniform nematode (RN) Rotylenchulus reniformis labeled as reniform nematode variant 1 (RN_VAR1) and variant 2 (RN_VAR2). All sequences from three of the four isolates had both RN variants in their sequences; however, isolate 13B had only RN variant 2 sequence. Specific variable base sites (96 or 5.5%) were found within the 18S rRNA gene that can clearly distinguish the two 18S rDNA variants of RN, in 11 (25.0%) and 33 (75.0%) of the 44 RN clones, for RN_VAR1 and RN_VAR2, respectively. Neighbor-joining trees show that the RN_VAR1 is very similar to the previously existing R. reniformis sequence in GenBank, while the RN_VAR2 sequence is more divergent. This is the first report of the identification of two major variants of the 18S rRNA gene in the same single RN, and documents the specific base variation between the two variants, and hypothesizes on simultaneous co-existence of these two variants for this gene. PMID:23593343

  10. Characterization of the two intra-individual sequence variants in the 18S rRNA gene in the plant parasitic nematode, Rotylenchulus reniformis.

    PubMed

    Nyaku, Seloame T; Sripathi, Venkateswara R; Kantety, Ramesh V; Gu, Yong Q; Lawrence, Kathy; Sharma, Govind C

    2013-01-01

    The 18S rRNA gene is fundamental to cellular and organismal protein synthesis and because of its stable persistence through generations it is also used in phylogenetic analysis among taxa. Sequence variation in this gene within a single species is rare, but it has been observed in a few metazoan organisms. More frequently, it has been reported in the non-transcribed spacer region. Here, we have identified two sequence variants within the near full coding region of 18S rRNA gene from a single reniform nematode (RN) Rotylenchulus reniformis labeled as reniform nematode variant 1 (RN_VAR1) and variant 2 (RN_VAR2). All sequences from three of the four isolates had both RN variants in their sequences; however, isolate 13B had only RN variant 2 sequence. Specific variable base sites (96 or 5.5%) were found within the 18S rRNA gene that can clearly distinguish the two 18S rDNA variants of RN, in 11 (25.0%) and 33 (75.0%) of the 44 RN clones, for RN_VAR1 and RN_VAR2, respectively. Neighbor-joining trees show that the RN_VAR1 is very similar to the previously existing R. reniformis sequence in GenBank, while the RN_VAR2 sequence is more divergent. This is the first report of the identification of two major variants of the 18S rRNA gene in the same single RN, and documents the specific base variation between the two variants, and hypothesizes on simultaneous co-existence of these two variants for this gene.

  11. Kinetic and Sequence-Structure-Function Analysis of Known LinA Variants with Different Hexachlorocyclohexane Isomers

    PubMed Central

    Kumari, Kirti; Pandey, Gunjan; Jackson, Colin J.; Russell, Robyn J.; Oakeshott, John G.; Lal, Rup

    2011-01-01

    Background: Here we report specific activities of all seven naturally occurring LinA variants towards three different isomers, α, γ and δ, of a priority persistent pollutant, hexachlorocyclohexane (HCH). Sequence-structure-function differences contributing to the differences in their stereospecificity for α-, γ-, and δ-HCH and enantiospecificity for (+)- and (−)-α-HCH are also discussed. Methodology/Principal Findings: Enzyme kinetic studies were performed with purified LinA variants. Models of LinA2B90A A110T, A111C, A110T/A111C and LinA1B90A were constructed using the FoldX computer algorithm. Turnover rates (min−1) showed that the LinAs exhibited differential substrate affinity amongst the four HCH isomers tested. α-HCH was found to be the most preferred substrate by all LinAs, followed by the γ and then δ isomer. Conclusions/Significance: The kinetic observations suggest that LinA-γ1-7 is the best variant for developing an enzyme-based bioremediation technology for HCH. The majority of the sequence variation in the various linA genes that have been isolated is not neutral, but alters the enantio- and stereoselectivity of the encoded proteins. PMID:21949868

  12. Association between SLC2A9 transporter gene variants and uric acid phenotypes in African American and white families

    PubMed Central

    de Andrade, Mariza; Matsumoto, Martha; Mosley, Tom H.; Kardia, Sharon; Turner, Stephen T.

    2011-01-01

    Objectives. SLC2A9 gene variants associate with serum uric acid in white populations, but little is known about African American populations. Since SLC2A9 is a transporter, gene variants may be expected to associate more closely with the fractional excretion of urate, a measure of renal tubular transport, than with serum uric acid, which is influenced by production and extrarenal clearance. Methods. Genotypes of single nucleotide polymorphisms (SNPs) distributed across the SLC2A9 gene were obtained in the Genetic Epidemiology Network of Arteriopathy cohorts. The associations of SNPs with serum uric acid, fractional excretion of urate and urine urate-to-creatinine ratio were assessed with adjustments for age, sex, diuretic use, BMI, homocysteine and triglycerides. Results. We identified SLC2A9 gene variants that were associated with serum uric acid in 1155 African American subjects (53 SNPs) and 1132 white subjects (63 SNPs). The most statistically significant SNPs in African American subjects (rs13113918) and white subjects (rs11723439) were in the latter half of the gene and explained 2.7 and 2.8% of the variation in serum uric acid, respectively. After adjustment for this SNP in African Americans, 0.9% of the variation in serum uric acid was explained by an SNP (rs1568318) in the first half of the gene. Unexpectedly, SLC2A9 gene variants had stronger associations with serum uric acid than with fractional excretion of urate. Conclusions. These findings support two different loci by which SLC2A9 variants affect uric acid levels in African Americans and suggest SLC2A9 variants affect serum uric acid level via renal and extrarenal clearance. PMID:21186168

  13. FamSeq: a variant calling program for family-based sequencing data using graphics processing units.

    PubMed

    Peng, Gang; Fan, Yu; Wang, Wenyi

    2014-10-01

    Various algorithms have been developed for variant calling using next-generation sequencing data, and various methods have been applied to reduce the associated false positive and false negative rates. Few variant calling programs, however, utilize the pedigree information when the family-based sequencing data are available. Here, we present a program, FamSeq, which reduces both false positive and false negative rates by incorporating the pedigree information from the Mendelian genetic model into variant calling. To accommodate variations in data complexity, FamSeq consists of four distinct implementations of the Mendelian genetic model: the Bayesian network algorithm, a graphics processing unit version of the Bayesian network algorithm, the Elston-Stewart algorithm and the Markov chain Monte Carlo algorithm. To make the software efficient and applicable to large families, we parallelized the Bayesian network algorithm that copes with pedigrees with inbreeding loops without losing calculation precision on an NVIDIA graphics processing unit. In order to compare the difference in the four methods, we applied FamSeq to pedigree sequencing data with family sizes that varied from 7 to 12. When there is no inbreeding loop in the pedigree, the Elston-Stewart algorithm gives analytical results in a short time. If there are inbreeding loops in the pedigree, we recommend the Bayesian network method, which provides exact answers. To improve the computing speed of the Bayesian network method, we parallelized the computation on a graphics processing unit. This allowed the Bayesian network method to process the whole genome sequencing data of a family of 12 individuals within two days, which was a 10-fold time reduction compared to the time required for this computation on a central processing unit.
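
    The pedigree-aware idea in this abstract can be illustrated on the simplest possible family, a father-mother-child trio. The sketch below is not FamSeq's code and uses none of its four algorithms as implemented; it is a brute-force enumeration over all genotype combinations, assuming biallelic sites, per-individual genotype likelihoods supplied by the caller, and a Hardy-Weinberg prior for the parents. The function names and the default allele frequency are hypothetical choices made for illustration.

```python
from itertools import product

GENOTYPES = (0, 1, 2)  # copies of the alternate allele


def hardy_weinberg_prior(alt_freq):
    """Founder genotype prior under Hardy-Weinberg (an assumption of this sketch)."""
    p = alt_freq
    return {0: (1 - p) ** 2, 1: 2 * p * (1 - p), 2: p ** 2}


def mendelian(child, father, mother):
    """P(child genotype | parental genotypes) under Mendelian transmission."""
    f = father / 2  # chance the father passes on the alternate allele
    m = mother / 2  # chance the mother passes on the alternate allele
    if child == 0:
        return (1 - f) * (1 - m)
    if child == 1:
        return f * (1 - m) + (1 - f) * m
    return f * m


def trio_posteriors(lik_father, lik_mother, lik_child, alt_freq=0.01):
    """Marginal genotype posteriors for each trio member.

    Each lik_* maps genotype (0, 1, 2) to P(reads | genotype) for that person.
    alt_freq is a hypothetical population allele frequency for the prior.
    """
    prior = hardy_weinberg_prior(alt_freq)
    marginals = {
        who: dict.fromkeys(GENOTYPES, 0.0)
        for who in ("father", "mother", "child")
    }
    total = 0.0
    # Enumerate every joint genotype assignment and weight it by
    # founder priors, read likelihoods and the transmission probability.
    for gf, gm, gc in product(GENOTYPES, repeat=3):
        weight = (
            prior[gf] * lik_father[gf]
            * prior[gm] * lik_mother[gm]
            * mendelian(gc, gf, gm) * lik_child[gc]
        )
        total += weight
        marginals["father"][gf] += weight
        marginals["mother"][gm] += weight
        marginals["child"][gc] += weight
    if total == 0.0:
        raise ValueError("likelihoods are inconsistent with Mendelian inheritance")
    return {
        who: {g: w / total for g, w in dist.items()}
        for who, dist in marginals.items()
    }
```

    In this toy setting, if both parents are confidently homozygous for the alternate allele, the child's posterior concentrates on that genotype even when the child's own reads weakly favour a heterozygous call. Enumeration grows exponentially with family size, which is presumably why larger pedigrees call for the dedicated algorithms the abstract lists.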

  14. Candidate genes for congenital diaphragmatic hernia from animal models: sequencing of fog2 and pdgfra reveals rare variants in diaphragmatic hernia patients

    SciTech Connect

    Bleyl, S.B.; Moshrefi, A.; Shaw, G.M.; Saijoh, Y.; Schoenwolf,G.C.; Pennacchio, L.A.; Slavotinek, A.M.

    2007-05-11

    Congenital diaphragmatic hernia (CDH) is a common, life-threatening birth defect. Although there is strong evidence implicating genetic factors in its pathogenesis, few causative genes have been identified, and in isolated CDH, only one de novo, nonsense mutation has been reported in FOG2 in a female with posterior diaphragmatic eventration. We report here that the homozygous null mouse for the Pdgfra gene has posterolateral diaphragmatic defects and thus is a model for human CDH. We hypothesized that mutations in this gene could cause human CDH. We sequenced PDGFRa and FOG2 in 96 patients with CDH, of which 53 had isolated CDH (55.2 percent), 36 had CDH and additional anomalies (37.5 percent), and 7 had CDH and known chromosome aberrations (7.3 percent). For FOG2, we identified novel sequence alterations predicting p.M703L and p.T843A in two patients with isolated CDH that were absent in 526 and 564 control chromosomes respectively. These altered amino acids were highly conserved. However, due to the lack of available parental DNA samples we were not able to determine if the sequence alterations were de novo. For PDGFRa, we found a single variant predicting p.L967V in a patient with CDH and multiple anomalies that was absent in 768 control chromosomes. This patient also had one cell with trisomy 15 on skin fibroblast culture, a finding of uncertain significance. Although our study identified sequence variants in FOG2 and PDGFRa, we have not definitively established the variants as mutations and we found no evidence that CDH commonly results from mutations in these genes.

  15. Longitudinal Antigenic Sequences and Sites from Intra-Host Evolution (LASSIE) identifies immune-selected HIV variants

    DOE PAGES

    Hraber, Peter; Korber, Bette; Wagh, Kshitij; Giorgi, Elena; Bhattacharya, Tanmoy; Gnanakaran, S.; Lapedes, Alan S.; Learn, Gerald H.; Kreider, Edward F.; Li, Yingying; et al

    2015-10-21

    Within-host genetic sequencing from samples collected over time provides a dynamic view of how viruses evade host immunity. Immune-driven mutations might stimulate neutralization breadth by selecting antibodies adapted to cycles of immune escape that generate within-subject epitope diversity. Comprehensive identification of immune-escape mutations is experimentally and computationally challenging. With current technology, many more viral sequences can readily be obtained than can be tested for binding and neutralization, making down-selection necessary. Typically, this is done manually, by picking variants that represent different time-points and branches on a phylogenetic tree. Such strategies are likely to miss many relevant mutations and combinations of mutations, and to be redundant for other mutations. Longitudinal Antigenic Sequences and Sites from Intrahost Evolution (LASSIE) uses transmitted founder loss to identify virus “hot-spots” under putative immune selection and chooses sequences that represent recurrent mutations in selected sites. LASSIE favors earliest sequences in which mutations arise. Here, with well-characterized longitudinal Env sequences, we confirmed selected sites were concentrated in antibody contacts and selected sequences represented diverse antigenic phenotypes. Finally, practical applications include rapidly identifying immune targets under selective pressure within a subject, selecting minimal sets of reagents for immunological assays that characterize evolving antibody responses, and for immunogens in polyvalent “cocktail” vaccines.

  16. Longitudinal Antigenic Sequences and Sites from Intra-Host Evolution (LASSIE) identifies immune-selected HIV variants

    SciTech Connect

    Hraber, Peter; Korber, Bette; Wagh, Kshitij; Giorgi, Elena; Bhattacharya, Tanmoy; Gnanakaran, S.; Lapedes, Alan S.; Learn, Gerald H.; Kreider, Edward F.; Li, Yingying; Shaw, George M.; Hahn, Beatrice H.; Montefiori, David C.; Alam, S. Munir; Bonsignori, Mattia; Moody, M. Anthony; Liao, Hua-Xin; Gao, Feng; Haynes, Barton

    2015-10-21

    Within-host genetic sequencing from samples collected over time provides a dynamic view of how viruses evade host immunity. Immune-driven mutations might stimulate neutralization breadth by selecting antibodies adapted to cycles of immune escape that generate within-subject epitope diversity. Comprehensive identification of immune-escape mutations is experimentally and computationally challenging. With current technology, many more viral sequences can readily be obtained than can be tested for binding and neutralization, making down-selection necessary. Typically, this is done manually, by picking variants that represent different time-points and branches on a phylogenetic tree. Such strategies are likely to miss many relevant mutations and combinations of mutations, and to be redundant for other mutations. Longitudinal Antigenic Sequences and Sites from Intrahost Evolution (LASSIE) uses transmitted founder loss to identify virus “hot-spots” under putative immune selection and chooses sequences that represent recurrent mutations in selected sites. LASSIE favors earliest sequences in which mutations arise. Here, with well-characterized longitudinal Env sequences, we confirmed selected sites were concentrated in antibody contacts and selected sequences represented diverse antigenic phenotypes. Finally, practical applications include rapidly identifying immune targets under selective pressure within a subject, selecting minimal sets of reagents for immunological assays that characterize evolving antibody responses, and for immunogens in polyvalent “cocktail” vaccines.
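
    The notion of "transmitted founder loss" mentioned in this abstract can be made concrete with a small sketch. The code below is not the LASSIE software; it is a minimal illustration that assumes pre-aligned, equal-length sequences grouped by sampling time, measures at each alignment position the fraction of sequences that no longer carry the transmitted-founder (TF) residue, and flags positions whose loss reaches a cutoff at any time point. The function names, the 0-based site indices, and the default cutoff of 0.8 are assumptions made for illustration, and the second step described in the abstract (choosing representative sequences) is not covered.

```python
def tf_loss(tf_seq, samples_by_time):
    """Per-site fraction of sequences differing from the TF residue.

    tf_seq: the transmitted-founder sequence (aligned).
    samples_by_time: dict mapping a time point to a list of aligned sequences.
    Returns a dict mapping each time point to a list of loss fractions.
    """
    loss = {}
    for time_point, seqs in samples_by_time.items():
        if not seqs:
            raise ValueError(f"no sequences at time point {time_point!r}")
        for seq in seqs:
            if len(seq) != len(tf_seq):
                raise ValueError("all sequences must match the TF alignment length")
        loss[time_point] = [
            sum(seq[i] != tf_seq[i] for seq in seqs) / len(seqs)
            for i in range(len(tf_seq))
        ]
    return loss


def selected_sites(loss, cutoff=0.8):
    """0-based sites whose TF loss reaches the cutoff at any time point.

    The cutoff value is a hypothetical default for this sketch.
    """
    if not loss:
        return []
    n_sites = len(next(iter(loss.values())))
    return [
        i for i in range(n_sites)
        if max(per_site[i] for per_site in loss.values()) >= cutoff
    ]
```

    Called with a TF sequence and a dictionary such as `{30: [...], 180: [...]}` keyed by days since infection, `tf_loss` gives one loss profile per time point, and `selected_sites` reduces those profiles to a list of candidate positions.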

  17. Red-Shifted Aequorin Variants Incorporating Non-Canonical Amino Acids: Applications in In Vivo Imaging.

    PubMed

    Grinstead, Kristen M; Rowe, Laura; Ensor, Charles M; Joel, Smita; Daftarian, Pirouz; Dikici, Emre; Zingg, Jean-Marc; Daunert, Sylvia

    2016-01-01

    The increased importance of in vivo diagnostics has posed new demands for imaging technologies. In that regard, there is a need for imaging molecules capable of expanding the applications of current state-of-the-art imaging in vivo diagnostics. To that end, there is a desire for new reporter molecules that provide strong signals, are non-toxic, and can be tailored to diagnose or monitor the progression of a number of diseases. Aequorin is a non-toxic photoprotein that can be used as a sensitive marker for bioluminescence in vivo imaging. The sensitivity of aequorin is due to the fact that bioluminescence is a rare phenomenon in nature and, therefore, it does not suffer from autofluorescence, which contributes to background emission. Emission of bioluminescence in the blue region of the spectrum by aequorin only occurs when calcium and its luciferin, coelenterazine, are bound to the protein and trigger a biochemical reaction that results in light generation. It is this reaction that endows aequorin with unique characteristics, making it ideally suited for a number of applications in bioanalysis and imaging. Herein we report the site-specific incorporation of non-canonical or non-natural amino acids and several coelenterazine analogues, resulting in a catalog of 72 cysteine-free aequorin variants which expand the potential applications of these photoproteins by providing several red-shifted mutants better suited to use in vivo. In vivo studies in mouse models using the transparent tissue of the eye confirmed the activity of the aequorin variants incorporating L-4-iodophenylalanine and L-4-methoxyphenylalanine after injection into the eye and topical addition of coelenterazine. The signal also remained localized within the eye. This is the first time that aequorin variants incorporating non-canonical amino acids have been shown to be active in vivo and useful as reporters in bioluminescence imaging. PMID:27367859

  18. Red-Shifted Aequorin Variants Incorporating Non-Canonical Amino Acids: Applications in In Vivo Imaging

    PubMed Central

    Grinstead, Kristen M.; Rowe, Laura; Ensor, Charles M.; Joel, Smita; Daftarian, Pirouz; Dikici, Emre; Zingg, Jean-Marc; Daunert, Sylvia

    2016-01-01

    The increased importance of in vivo diagnostics has posed new demands for imaging technologies. In that regard, there is a need for imaging molecules capable of expanding the applications of current state-of-the-art imaging in vivo diagnostics. To that end, there is a desire for new reporter molecules that provide strong signals, are non-toxic, and can be tailored to diagnose or monitor the progression of a number of diseases. Aequorin is a non-toxic photoprotein that can be used as a sensitive marker for bioluminescence in vivo imaging. The sensitivity of aequorin is due to the fact that bioluminescence is a rare phenomenon in nature and, therefore, it does not suffer from autofluorescence, which contributes to background emission. Emission of bioluminescence in the blue region of the spectrum by aequorin only occurs when calcium and its luciferin, coelenterazine, are bound to the protein and trigger a biochemical reaction that results in light generation. It is this reaction that endows aequorin with unique characteristics, making it ideally suited for a number of applications in bioanalysis and imaging. Herein we report the site-specific incorporation of non-canonical or non-natural amino acids and several coelenterazine analogues, resulting in a catalog of 72 cysteine-free aequorin variants which expand the potential applications of these photoproteins by providing several red-shifted mutants better suited to use in vivo. In vivo studies in mouse models using the transparent tissue of the eye confirmed the activity of the aequorin variants incorporating L-4-iodophenylalanine and L-4-methoxyphenylalanine after injection into the eye and topical addition of coelenterazine. The signal also remained localized within the eye. This is the first time that aequorin variants incorporating non-canonical amino acids have been shown to be active in vivo and useful as reporters in bioluminescence imaging. PMID:27367859

  19. Early strains of multidrug-resistant Salmonella enterica serovar Kentucky sequence type 198 from Southeast Asia harbor Salmonella genomic island 1-J variants with a novel insertion sequence.

    PubMed

    Le Hello, Simon; Weill, François-Xavier; Guibert, Véronique; Praud, Karine; Cloeckaert, Axel; Doublet, Benoît

    2012-10-01

    Salmonella genomic island 1 (SGI1) is a 43-kb integrative mobilizable element that harbors a great diversity of multidrug resistance gene clusters described in numerous Salmonella enterica serovars and also in Proteus mirabilis. The majority of SGI1 variants contain an In104-derivative complex class 1 integron inserted between resolvase gene res and open reading frame (ORF) S044 in SGI1. Recently, the international spread of ciprofloxacin-resistant S. enterica serovar Kentucky sequence type 198 (ST198) containing SGI1-K variants has been reported. A retrospective study was undertaken to characterize ST198 S. Kentucky strains isolated before the spread of the epidemic ST198-SGI1-K population in Africa and the Middle East. Here, we characterized 12 ST198 S. Kentucky strains isolated between 1969 and 1999, mainly from humans returning from Southeast Asia (n = 10 strains) or Israel (n = 1 strain) or from meat in Egypt (n = 1 strain). All these ST198 S. Kentucky strains did not belong to the XbaI pulsotype X1 associated with the African epidemic clone but to pulsotype X2. SGI1-J subgroup variants containing different complex integrons with a partial transposition module and inserted within ORF S023 of SGI1 were detected in six strains. The SGI1-J4 variant containing a partially deleted class 1 integron and thus showing a narrow resistance phenotype to sulfonamides was identified in two epidemiologically unrelated strains from Indonesia. The four remaining strains harbored a novel SGI1-J variant, named SGI1-J6, which contained aadA2, floR2, tetR(G)-tetA(G), and sul1 resistance genes within its complex integron. Moreover, in all these S. Kentucky isolates, a novel insertion sequence related to the IS630 family and named ISSen5 was found inserted upstream of the SGI1 complex integron in ORF S023. Thus, two subpopulations of S. Kentucky ST198 independently and exclusively acquired the SGI1 during the 1980s and 1990s. 
Unlike the ST198-X1 African epidemic subpopulation, the

  20. Early Strains of Multidrug-Resistant Salmonella enterica Serovar Kentucky Sequence Type 198 from Southeast Asia Harbor Salmonella Genomic Island 1-J Variants with a Novel Insertion Sequence

    PubMed Central

    Le Hello, Simon; Weill, François-Xavier; Guibert, Véronique; Praud, Karine; Cloeckaert, Axel

    2012-01-01

    Salmonella genomic island 1 (SGI1) is a 43-kb integrative mobilizable element that harbors a great diversity of multidrug resistance gene clusters described in numerous Salmonella enterica serovars and also in Proteus mirabilis. The majority of SGI1 variants contain an In104-derivative complex class 1 integron inserted between resolvase gene res and open reading frame (ORF) S044 in SGI1. Recently, the international spread of ciprofloxacin-resistant S. enterica serovar Kentucky sequence type 198 (ST198) containing SGI1-K variants has been reported. A retrospective study was undertaken to characterize ST198 S. Kentucky strains isolated before the spread of the epidemic ST198-SGI1-K population in Africa and the Middle East. Here, we characterized 12 ST198 S. Kentucky strains isolated between 1969 and 1999, mainly from humans returning from Southeast Asia (n = 10 strains) or Israel (n = 1 strain) or from meat in Egypt (n = 1 strain). All these ST198 S. Kentucky strains did not belong to the XbaI pulsotype X1 associated with the African epidemic clone but to pulsotype X2. SGI1-J subgroup variants containing different complex integrons with a partial transposition module and inserted within ORF S023 of SGI1 were detected in six strains. The SGI1-J4 variant containing a partially deleted class 1 integron and thus showing a narrow resistance phenotype to sulfonamides was identified in two epidemiologically unrelated strains from Indonesia. The four remaining strains harbored a novel SGI1-J variant, named SGI1-J6, which contained aadA2, floR2, tetR(G)-tetA(G), and sul1 resistance genes within its complex integron. Moreover, in all these S. Kentucky isolates, a novel insertion sequence related to the IS630 family and named ISSen5 was found inserted upstream of the SGI1 complex integron in ORF S023. Thus, two subpopulations of S. Kentucky ST198 independently and exclusively acquired the SGI1 during the 1980s and 1990s. 
Unlike the ST198-X1 African epidemic subpopulation, the

  1. Amino acid sequences of proteins from Leptospira serovar pomona.

    PubMed

    Alves, S F; Lefebvre, R B; Probert, W

    2000-01-01

    This report describes partial amino acid sequences of three putative outer envelope proteins from Leptospira serovar pomona. To obtain internal fragments for protein sequencing, enzymatic and chemical digestion was performed. The enzyme clostripain was used to digest the 32 and 45 kDa proteins. In situ digestion of the 40 kDa protein was accomplished using cyanogen bromide. The 32 kDa protein generated two fragments, one of 21 kDa and another of 10 kDa that yielded five residues. A 24 kDa fragment that yielded nineteen amino acid residues was obtained from the 45 kDa protein. A 20 kDa fragment that yielded a sequence of twenty amino acids was obtained from the 40 kDa protein.

  2. The amino acid sequence of Staphylococcus aureus penicillinase.

    PubMed Central

    Ambler, R P

    1975-01-01

    The amino acid sequence of the penicillinase (penicillin amido-beta-lactamhydrolase, EC 3.5.2.6) from Staphylococcus aureus strain PC1 was determined. The protein consists of a single polypeptide chain of 257 residues, and the sequence was determined by characterization of tryptic, chymotryptic, peptic and CNBr peptides, with some additional evidence from thermolysin and S. aureus proteinase peptides. A mistake in the preliminary report of the sequence is corrected; residues 113-116 are now thought to be -Lys-Lys-Val-Lys- rather than -Lys-Val-Lys-Lys-. Detailed evidence for the amino acid sequence has been deposited as Supplementary Publication SUP 50056 (91 pages) at the British Library (Lending Division), Boston Spa, Wetherby, West Yorkshire LS23 7BQ, U.K., from whom copies may be obtained on the terms given in Biochem. J. (1975) 145, 5. PMID:1218078

  3. Ancient human sialic acid variant restricts an emerging zoonotic malaria parasite

    PubMed Central

    Dankwa, Selasi; Lim, Caeul; Bei, Amy K.; Jiang, Rays H. Y.; Abshire, James R.; Patel, Saurabh D.; Goldberg, Jonathan M.; Moreno, Yovany; Kono, Maya; Niles, Jacquin C.; Duraisingh, Manoj T.

    2016-01-01

    Plasmodium knowlesi is a zoonotic parasite transmitted from macaques causing malaria in humans in Southeast Asia. Plasmodium parasites bind to red blood cell (RBC) surface receptors, many of which are sialylated. While macaques synthesize the sialic acid variant N-glycolylneuraminic acid (Neu5Gc), humans cannot because of a mutation in the enzyme CMAH that converts N-acetylneuraminic acid (Neu5Ac) to Neu5Gc. Here we reconstitute CMAH in human RBCs for the reintroduction of Neu5Gc, which results in enhancement of P. knowlesi invasion. We show that two P. knowlesi invasion ligands, PkDBPβ and PkDBPγ, bind specifically to Neu5Gc-containing receptors. A human-adapted P. knowlesi line invades human RBCs independently of Neu5Gc, with duplication of the sialic acid-independent invasion ligand, PkDBPα and loss of PkDBPγ. Our results suggest that absence of Neu5Gc on human RBCs limits P. knowlesi invasion, but that parasites may evolve to invade human RBCs through the use of sialic acid-independent pathways. PMID:27041489

  4. Ethnic-specific associations of rare and low-frequency DNA sequence variants with asthma

    PubMed Central

    Igartua, Catherine; Myers, Rachel A.; Mathias, Rasika A.; Pino-Yanes, Maria; Eng, Celeste; Graves, Penelope E.; Levin, Albert M.; Del-Rio-Navarro, Blanca E.; Jackson, Daniel J.; Livne, Oren E.; Rafaels, Nicholas; Edlund, Christopher K.; Yang, James J.; Huntsman, Scott; Salam, Muhammad T.; Romieu, Isabelle; Mourad, Raphael; Gern, James E.; Lemanske, Robert F.; Wyss, Annah; Hoppin, Jane A.; Barnes, Kathleen C.; Burchard, Esteban G.; Gauderman, W. James; Martinez, Fernando D.; Raby, Benjamin A.; Weiss, Scott T.; Williams, L. Keoki; London, Stephanie J.; Gilliland, Frank D.; Nicolae, Dan L.; Ober, Carole

    2015-01-01

    Common variants at many loci have been robustly associated with asthma but explain little of the overall genetic risk. Here we investigate the role of rare (<1%) and low-frequency (1–5%) variants using the Illumina HumanExome BeadChip array in 4,794 asthma cases, 4,707 non-asthmatic controls and 590 case–parent trios representing European Americans, African Americans/African Caribbeans and Latinos. Our study reveals one low-frequency missense mutation in the GRASP gene that is associated with asthma in the Latino sample (P=4.31 × 10⁻⁶; OR=1.25; MAF=1.21%) and two genes harbouring functional variants that are associated with asthma in a gene-based analysis: GSDMB at the 17q12–21 asthma locus in the Latino and combined samples (P=7.81 × 10⁻⁸ and 4.09 × 10⁻⁸, respectively) and MTHFR in the African ancestry sample (P=1.72 × 10⁻⁶). Our results suggest that associations with rare and low-frequency variants are ethnic specific and not likely to explain a significant proportion of the ‘missing heritability’ of asthma. PMID:25591454
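    The frequency classes used in this abstract (rare below 1%, low-frequency from 1% to 5%) can be written as a small helper. This is a sketch for illustration; the function name is hypothetical, and treating exactly 5% as low-frequency and everything above it as common is an assumption, since the abstract does not state how the boundaries are handled.

```python
def frequency_class(maf):
    """Classify a variant by minor allele frequency (MAF, as a fraction).

    Thresholds follow the abstract: rare < 1%, low-frequency 1-5%.
    Boundary handling at exactly 1% and 5% is an assumption.
    """
    if not 0.0 <= maf <= 0.5:
        raise ValueError("a minor allele frequency must lie between 0 and 0.5")
    if maf < 0.01:
        return "rare"
    if maf <= 0.05:
        return "low-frequency"
    return "common"
```

    Under these thresholds the GRASP missense variant reported above (MAF=1.21%, passed as 0.0121) falls in the low-frequency class.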

  5. Complete genome sequence of Campylobacter jejuni RM1285, a rod-shaped morphological variant

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Campylobacter jejuni is a spiral-shaped Gram-negative food-borne human pathogen found on poultry products. Strain RM1285 is a rod-shaped variant of this species. The genome of RM1285 was determined to be 1,635,803 bp with a G+C content of 30.5%....

  6. DNA Sequence Variants in the Five Prime Untranslated Region of the Cyclooxygenase-2 Gene Are Commonly Found in Healthy Dogs and Gray Wolves.

    PubMed

    Safra, Noa; Hayward, Louisa J; Aguilar, Miriam; Sacks, Benjamin N; Westropp, Jodi L; Mohr, F Charles; Mellersh, Cathryn S; Bannasch, Danika L

    2015-01-01

    The aim of this study was to investigate the frequency of regional DNA variants upstream to the translation initiation site of the canine Cyclooxygenase-2 (Cox-2) gene in healthy dogs. Cox-2 plays a role in various disease conditions such as acute and chronic inflammation, osteoarthritis and malignancy. A role for Cox-2 DNA variants in genetic predisposition to canine renal dysplasia has been proposed and dog breeders have been encouraged to select against these DNA variants. We sequenced 272-422 bases in 152 dogs unaffected by renal dysplasia and found 19 different haplotypes including 11 genetic variants which had not been described previously. We genotyped 7 gray wolves to ascertain the wildtype variant and found that the wolves we analyzed had predominantly the second most common DNA variant found in dogs. Our results demonstrate an elevated level of regional polymorphism that appears to be a feature of healthy domesticated dogs.

  7. DNA Sequence Variants in the Five Prime Untranslated Region of the Cyclooxygenase-2 Gene Are Commonly Found in Healthy Dogs and Gray Wolves

    PubMed Central

    Safra, Noa; Hayward, Louisa J.; Aguilar, Miriam; Sacks, Benjamin N.; Westropp, Jodi L.; Mohr, F. Charles; Mellersh, Cathryn S.; Bannasch, Danika L.

    2015-01-01

    The aim of this study was to investigate the frequency of regional DNA variants upstream to the translation initiation site of the canine Cyclooxygenase-2 (Cox-2) gene in healthy dogs. Cox-2 plays a role in various disease conditions such as acute and chronic inflammation, osteoarthritis and malignancy. A role for Cox-2 DNA variants in genetic predisposition to canine renal dysplasia has been proposed and dog breeders have been encouraged to select against these DNA variants. We sequenced 272–422 bases in 152 dogs unaffected by renal dysplasia and found 19 different haplotypes including 11 genetic variants which had not been described previously. We genotyped 7 gray wolves to ascertain the wildtype variant and found that the wolves we analyzed had predominantly the second most common DNA variant found in dogs. Our results demonstrate an elevated level of regional polymorphism that appears to be a feature of healthy domesticated dogs. PMID:26244515

  8. Different Variants in Reverse Transcriptase Domain Determined by Ultra-deep Sequencing in Treatment-naïve and Treated Indonesian Patients Infected with Hepatitis B Virus.

    PubMed

    Wasityastuti, Widya; Yano, Yoshihiko; Widasari, Dewiyani Indah; Yamani, Laura Navika; Ratnasari, Neneng; Heriyanto, Didik Setyo; Okada, Rina; Tanahashi, Toshihito; Murakami, Yoshiki; Azuma, Takeshi; Hayashi, Yoshitake

    2016-01-01

    A nucleos(t)ide analog (NA) is the common antiviral drug available for directly treating hepatitis B virus (HBV) infection. However, its application has led to the emergence of NA-resistant mutations mostly in a conserved region of the reverse transcriptase domain of HBV polymerase. Harboring NA-resistant mutations decreases drug effectiveness and increases the frequency of end-stage liver disease. The invention of next-generation sequencing that can generate thousands of sequences from viral complex mixtures provides opportunities to detect minor changes and early viral evolution under drug stress. The present study used ultra-deep sequencing to evaluate discrepant quasispecies in the reverse transcriptase domain of HBV including NA-resistant hotspots between seven treatment-naïve Indonesian patients infected with HBV and five at the early phase of treatment. The most common sub-genotype was HBV B3 (83.34%). The substitution rate of variants determined among amino acids with a ratio of ≥ 1% changes was higher among the population in conserved regions (23.19% vs. 4.59%, P = 0.001) and in the inter-reverse transcriptase domain (23.95% vs. 2.94%, P = 0.002) in treatment naïve, than in treated patients. Nine hotspots of antiviral resistance were identified in both groups, and the mean frequency of changes in all patients was < 1%. The known rtM204I mutation was the most frequent in both groups. The lower rate of variants in HBV quasispecies in patients undergoing treatment could be associated with virus elimination and the extinction of sensitive species by NA therapy. The present findings imply that HBV quasispecies dynamically change during treatment. PMID:27492206

  10. Prebiotically plausible mechanisms increase compositional diversity of nucleic acid sequences

    PubMed Central

    Derr, Julien; Manapat, Michael L.; Rajamani, Sudha; Leu, Kevin; Xulvi-Brunet, Ramon; Joseph, Isaac; Nowak, Martin A.; Chen, Irene A.

    2012-01-01

    During the origin of life, the biological information of nucleic acid polymers must have increased to encode functional molecules (the RNA world). Ribozymes tend to be compositionally unbiased, as is the vast majority of possible sequence space. However, ribonucleotides vary greatly in synthetic yield, reactivity and degradation rate, and their non-enzymatic polymerization results in compositionally biased sequences. While natural selection could lead to complex sequences, molecules with some activity are required to begin this process. Was the emergence of compositionally diverse sequences a matter of chance, or could prebiotically plausible reactions counter chemical biases to increase the probability of finding a ribozyme? Our in silico simulations using a two-letter alphabet show that template-directed ligation and high concatenation rates counter compositional bias and shift the pool toward longer sequences, permitting greater exploration of sequence space and stable folding. We verified experimentally that unbiased DNA sequences are more efficient templates for ligation, thus increasing the compositional diversity of the pool. Our work suggests that prebiotically plausible chemical mechanisms of nucleic acid polymerization and ligation could predispose toward a diverse pool of longer, potentially structured molecules. Such mechanisms could have set the stage for the appearance of functional activity very early in the emergence of life. PMID:22319215

  11. The amino-acid sequence of kangaroo pancreatic ribonuclease.

    PubMed

    Gaastra, W; Welling, G W; Beintema, J J

    1978-05-01

    Red kangaroo (Macropus rufus) ribonuclease was isolated from pancreatic tissue by affinity chromatography. The amino acid sequence was determined by automatic sequencing of overlapping large fragments and by analysis of shorter peptides obtained by digestion with a number of proteolytic enzymes. The polypeptide chain consists of 122 amino acid residues. Compared to other ribonucleases, the N-terminal residue and residue 114 are deleted. In other pancreatic ribonucleases position 114 is occupied by a cis proline residue in an external loop at the surface of the molecule. Other remarkable substitutions are the presence of a tyrosine residue at position 123 instead of a serine which forms a hydrogen bond with the pyrimidine ring of a nucleotide substrate, and a number of hydrophobic-hydrophilic interchanges in the sequence 51-55, which forms part of an alpha-helix in bovine ribonuclease and exhibits few substitutions in the placental mammals. Kangaroo ribonuclease contains no carbohydrate, although the enzyme possesses a recognition site for carbohydrate attachment in the sequence Asn-Val-Thr (62-64). The enzyme differs at about 35-40% of the positions from all other mammalian pancreatic ribonucleases sequenced to date, which is in agreement with the early divergence between the marsupials and the placental mammals. From fragmentary data a tentative sequence of red-necked wallaby (Macropus rufogriseus) pancreatic ribonuclease has been derived. Eight differences with the kangaroo sequence were found. PMID:658039

  12. Sequencing PDX1 (insulin promoter factor 1) in 1788 UK individuals found 5% had a low frequency coding variant, but these variants are not associated with Type 2 diabetes

    PubMed Central

    Edghill, E L; Khamis, A; Weedon, M N; Walker, M; Hitman, G A; McCarthy, M I; Owen, K R; Ellard, S; Hattersley, A T; Frayling, T M

    2011-01-01

    Aim: Genome-wide association studies have identified > 30 common variants associated with Type 2 diabetes (> 5% minor allele frequency). These variants have small effects on individual risk and do not account for a large proportion of the heritable component of the disease. Monogenic forms of diabetes are caused by mutations that occur in < 1:2000 individuals and follow strict patterns of inheritance. In contrast, the role of low frequency genetic variants (minor allele frequency 0.1–5%) in Type 2 diabetes is not known. The aim of this study was to assess the role of low frequency PDX1 (also called IPF1) variants in Type 2 diabetes. Methods: We sequenced the coding and flanking intronic regions of PDX1 in 910 patients with Type 2 diabetes and 878 control subjects. Results: We identified a total of 26 variants that occurred in 5.3% of individuals, 14 of which occurred once. Only D76N occurred in > 1%. We found no difference in carrier frequency between patients (5.7%) and control subjects (5.0%) (P = 0.46). There were also no differences between patients and control subjects when analyses were limited to subsets of variants. The strongest subset were those variants in the DNA binding domain where all five variants identified were only found in patients (P = 0.06). Conclusion: Approximately 5% of UK individuals carry a PDX1 variant, but there is no evidence that these variants, either individually or cumulatively, predispose to Type 2 diabetes. Further studies will need to consider strategies to assess the role of multiple variants that occur in < 1 in 1000 individuals. PMID:21569088

  13. Stem pitting and seedling yellows symptoms of Citrus tristeza virus infection may be determined by minor sequence variants.

    PubMed

    Cerni, Silvija; Ruscić, Jelena; Nolasco, Gustavo; Gatin, Zivko; Krajacić, Mladen; Skorić, Dijana

    2008-02-01

    The isolates of Citrus tristeza virus (CTV), the most destructive viral pathogen of citrus, display a high level of variability. As a result of genetic bottleneck induced by the bud-inoculation of CTV-infected material, inoculated seedlings of Citrus wilsonii Tanaka displayed different symptoms. All successfully grafted plants showed severe symptoms of stem pitting and seedling yellows, while plants in which inoculated buds died displayed mild symptoms. Since complex CTV population structure was detected in the parental host, the aim of this work was to investigate how it changed after the virus transmission, and to correlate it with observed symptoms. The coat protein gene sequence of the predominant genotype was identical in parental and grafted plants and clustered to the phylogenetic group 5 encompassing severe reference isolates. In seedlings displaying severe symptoms, the low-frequency variants clustering to other phylogenetic groups were detected, as well. Indicator plants were inoculated with buds taken from unsuccessfully grafted C. wilsonii seedlings. Surprisingly, they displayed no severe symptoms despite the presence of phylogenetic group 5 genomic variants. The results suggest that the appearance of severe symptoms in this case is probably induced by a complex CTV population structure found in seedlings displaying severe symptoms, and not directly by the predominant genomic variant. PMID:18074213

  14. Pancreatic ribonucleases of mammals with ruminant-like digestion. Amino-acid sequences of hippopotamus and sloth ribonucleases.

    PubMed

    Havinga, J; Beintema, J J

    1980-09-01

    High levels of pancreatic ribonucleases are found in ruminants, species that have a ruminant-like digestion and several species with coecal digestion. Pancreatic ribonucleases from several independently evolved species with ruminant-like digestion were investigated to test a hypothesis that glycosylation of ribonucleases may have some function in species with coecal digestion and that glycosylation of the enzyme may not be advantageous for ruminants. Ribonucleases from the hippopotamus, two-toed sloth and three-toed sloth were isolated by extraction with sulfuric acid and affinity chromatography. Complete amino acid sequences were determined for the ribonucleases from the hippopotamus and two-toed sloth and a partial sequence for the enzyme from the three-toed sloth. The amino acids 75-78 of hippopotamus ribonuclease were positioned by homology with other artiodactyl ribonucleases. In hippopotamus ribonuclease a heterogeneity was found at position 37, half of the molecules containing glutamine and the other half lysine. Hippopotamus ribonuclease differs less from pig and bovine ribonuclease than these differ from each other, because more ancestral characteristics have been retained. Although hippopotamus ribonuclease contains all four Asn-X-Ser/Thr sequences previously found to be glycosylation sites in one or more pancreatic ribonucleases, only the sequence Asn-Met-Thr (34-36) is glycosylated in the variant with glutamine at position 37, while the variant with lysine at this position is carbohydrate-free. Both sloth ribonucleases are completely glycosylated at the sequence Asn-Met-Thr (34-36) with a simple type of carbohydrate chain. The amino acid sequence of two-toed sloth ribonuclease shows some interesting coupled replacements.
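    The Asn-X-Ser/Thr glycosylation motif mentioned in this abstract can be located in a one-letter protein sequence with a short scan. The sketch below is illustrative only: the function name is hypothetical, and excluding proline at the X position is a commonly applied refinement that the abstract itself does not state. A match marks a potential site; as the abstract notes, not every such sequence is actually glycosylated.

```python
import re

# Asn, then any residue except Pro (assumed refinement), then Ser or Thr.
# The lookahead lets overlapping motifs (e.g. "NNTS") each be reported.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def find_sequons(sequence):
    """Return (1-based position of Asn, three-residue motif) for each match."""
    seq = sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(seq)]
```

    Positions are reported 1-based so that they can be compared directly with residue numbering such as Asn-Met-Thr (34-36) in the abstract, provided the input sequence uses the same numbering.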

  15. Identification of Rare Causal Variants in Sequence-Based Studies: Methods and Applications to VPS13B, a Gene Involved in Cohen Syndrome and Autism

    PubMed Central

    De Rubeis, Silvia; McCallum, Kenneth; Buxbaum, Joseph D.

    2014-01-01

    Pinpointing the small number of causal variants among the abundant naturally occurring genetic variation is a difficult challenge, but a crucial one for understanding precise molecular mechanisms of disease and follow-up functional studies. We propose and investigate two complementary statistical approaches for identification of rare causal variants in sequencing studies: a backward elimination procedure based on groupwise association tests, and a hierarchical approach that can integrate sequencing data with diverse functional and evolutionary conservation annotations for individual variants. Using simulations, we show that incorporation of multiple bioinformatic predictors of deleteriousness, such as PolyPhen-2, SIFT and GERP++ scores, can improve the power to discover truly causal variants. As proof of principle, we apply the proposed methods to VPS13B, a gene mutated in the rare neurodevelopmental disorder called Cohen syndrome, and recently reported with recessive variants in autism. We identify a small set of promising candidates for causal variants, including two loss-of-function variants and a rare, homozygous probably-damaging variant that could contribute to autism risk. PMID:25502226

  16. Development of an expert system for amino acid sequence identification.

    PubMed

    Hu, L; Saulinskas, E F; Johnson, P; Harrington, P B

    1996-08-01

    An expert system for amino acid sequence identification has been developed. The algorithm uses heuristic rules developed by human experts in protein sequencing. The system is applied to the chromatographic data of phenylthiohydantoin-amino acids acquired from an automated sequencer. The peak intensities in the current cycle are compared with those in the previous cycle, while the calibration and succeeding cycles are used as ancillary identification criteria when necessary. The retention time for each chromatographic peak in each cycle is corrected by the corresponding peak in the calibration cycle at the same run. The main improvement of our system compared with the onboard software used by the Applied Biosystems 477A Protein/Peptide Sequencer is that each peak in each cycle is assigned an identification name according to the corrected retention time to be used for the comparison with different cycles. The system was developed from analyses of ribonuclease A and evaluated by runs of four other protein samples that were not used in rule development. This paper demonstrates that rules developed by human experts can be automatically applied to sequence assignment. The expert system performed more accurately than the onboard software of the protein sequencer, in that the misidentification rates for the expert system were around 7%, whereas those for the onboard software were between 13 and 21%.
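    The core comparison described in this abstract, peak intensities in the current cycle against those in the previous cycle, can be sketched as follows. This is not the authors' rule set: the function name, the input layout, the use of an absolute intensity increase, the 20.0 threshold and the "X" placeholder for an uncalled cycle are all assumptions for illustration, and the retention-time correction step is omitted.

```python
def call_residues(cycles, min_increase=20.0):
    """Assign one residue per sequencing cycle.

    cycles: list of dicts mapping residue name -> peak intensity, one dict
            per cycle, in order (hypothetical input layout).
    A residue is called when it shows the largest intensity increase over
    the previous cycle and that increase reaches min_increase (assumed
    threshold); otherwise the cycle is reported as "X" (uncalled).
    """
    calls = []
    previous = {}
    for cycle in cycles:
        best_residue, best_increase = None, float("-inf")
        for residue, intensity in cycle.items():
            # Carry-over from the previous cycle is subtracted out.
            increase = intensity - previous.get(residue, 0.0)
            if increase > best_increase:
                best_residue, best_increase = residue, increase
        if best_residue is not None and best_increase >= min_increase:
            calls.append(best_residue)
        else:
            calls.append("X")
        previous = cycle
    return calls
```

    Comparing against the previous cycle rather than taking the tallest peak matters because signal from an earlier residue typically lags into later cycles; a residue that is high but falling is then not called again.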

  17. In silico comparative characterization of pharmacogenomic missense variants

    PubMed Central

    2014-01-01

    Background: Missense pharmacogenomic (PGx) variants refer to amino acid substitutions that potentially affect the pharmacokinetic (PK) or pharmacodynamic (PD) response to drug therapies. The PGx variants, as compared to disease-associated variants, have not been investigated as deeply. The ability to computationally predict future PGx variants is desirable; however, it is not clear what data sets should be used or what features are beneficial to this end. Hence we carried out a comparative characterization of PGx variants with annotated neutral and disease variants from UniProt, to test the predictive power of sequence conservation and structural information in discriminating these three groups. Results: 126 PGx variants of high quality from PharmGKB were selected and two data sets were created: one set contained 416 variants with structural and sequence information, and the other set contained 1,265 variants with sequence information only. In terms of sequence conservation, PGx variants are more conserved than neutral variants and much less conserved than disease variants. A weighted random forest was used to strike a more balanced classification for PGx variants. Generally, structural features are helpful in discriminating PGx variants from the other two groups, but classification of PGx variants from neutral polymorphisms is still much less effective than classification between disease and neutral variants. Conclusions: We found that PGx variants are much more similar to neutral variants than to disease variants in the feature space consisting of residue conservation, neighboring residue conservation, number of neighbors, and protein solvent accessibility. Such similarity poses great difficulty in the classification of PGx variants and polymorphisms. PMID:25057096

  18. Coding Variants at Hexa-allelic Amino Acid 13 of HLA-DRB1 Explain Independent SNP Associations with Follicular Lymphoma Risk

    PubMed Central

    Foo, Jia Nee; Smedby, Karin E.; Akers, Nicholas K.; Berglund, Mattias; Irwan, Ishak D.; Jia, Xiaoming; Li, Yi; Conde, Lucia; Darabi, Hatef; Bracci, Paige M.; Melbye, Mads; Adami, Hans-Olov; Glimelius, Bengt; Khor, Chiea Chuen; Hjalgrim, Henrik; Padyukov, Leonid; Humphreys, Keith; Enblad, Gunilla; Skibola, Christine F.; de Bakker, Paul I.W.; Liu, Jianjun

    2013-01-01

    Non-Hodgkin lymphoma represents a diverse group of blood malignancies, of which follicular lymphoma (FL) is a common subtype. Previous genome-wide association studies (GWASs) have identified in the human leukocyte antigen (HLA) class II region multiple independent SNPs that are significantly associated with FL risk. To dissect these signals and determine whether coding variants in HLA genes are responsible for the associations, we conducted imputation, HLA typing, and sequencing in three independent populations for a total of 689 cases and 2,446 controls. We identified a hexa-allelic amino acid polymorphism at position 13 of the HLA-DR beta chain that showed the strongest association with FL within the major histocompatibility complex (MHC) region (multiallelic p = 2.3 × 10⁻¹⁵). Out of six possible amino acids that occurred at that position within the population, we classified two as high risk (Tyr and Phe), two as low risk (Ser and Arg), and two as moderate risk (His and Gly). There was a 4.2-fold difference in risk (95% confidence interval = 2.9–6.1) between subjects carrying two alleles encoding high-risk amino acids and those carrying two alleles encoding low-risk amino acids (p = 1.01 × 10⁻¹⁴). This coding variant might explain the complex SNP associations identified by GWASs and suggests a common HLA-DR antigen-driven mechanism for the pathogenesis of FL and rheumatoid arthritis. PMID:23791106

  19. Coding variants at hexa-allelic amino acid 13 of HLA-DRB1 explain independent SNP associations with follicular lymphoma risk.

    PubMed

    Foo, Jia Nee; Smedby, Karin E; Akers, Nicholas K; Berglund, Mattias; Irwan, Ishak D; Jia, Xiaoming; Li, Yi; Conde, Lucia; Darabi, Hatef; Bracci, Paige M; Melbye, Mads; Adami, Hans-Olov; Glimelius, Bengt; Khor, Chiea Chuen; Hjalgrim, Henrik; Padyukov, Leonid; Humphreys, Keith; Enblad, Gunilla; Skibola, Christine F; de Bakker, Paul I W; Liu, Jianjun

    2013-07-11

    Non-Hodgkin lymphoma represents a diverse group of blood malignancies, of which follicular lymphoma (FL) is a common subtype. Previous genome-wide association studies (GWASs) have identified in the human leukocyte antigen (HLA) class II region multiple independent SNPs that are significantly associated with FL risk. To dissect these signals and determine whether coding variants in HLA genes are responsible for the associations, we conducted imputation, HLA typing, and sequencing in three independent populations for a total of 689 cases and 2,446 controls. We identified a hexa-allelic amino acid polymorphism at position 13 of the HLA-DR beta chain that showed the strongest association with FL within the major histocompatibility complex (MHC) region (multiallelic p = 2.3 × 10⁻¹⁵). Out of six possible amino acids that occurred at that position within the population, we classified two as high risk (Tyr and Phe), two as low risk (Ser and Arg), and two as moderate risk (His and Gly). There was a 4.2-fold difference in risk (95% confidence interval = 2.9-6.1) between subjects carrying two alleles encoding high-risk amino acids and those carrying two alleles encoding low-risk amino acids (p = 1.01 × 10⁻¹⁴). This coding variant might explain the complex SNP associations identified by GWASs and suggests a common HLA-DR antigen-driven mechanism for the pathogenesis of FL and rheumatoid arthritis.

  20. The IBO germination quantitative trait locus encodes a phosphatase 2C-related variant with a nonsynonymous amino acid change that interferes with abscisic acid signaling.

    PubMed

    Amiguet-Vercher, Amélia; Santuari, Luca; Gonzalez-Guzman, Miguel; Depuydt, Stephen; Rodriguez, Pedro L; Hardtke, Christian S

    2015-02-01

    Natural genetic variation is crucial for adaptability of plants to different environments. Seed dormancy prevents precocious germination in unsuitable conditions and is an adaptation to a major macro-environmental parameter, the seasonal variation in temperature and day length. Here we report the isolation of IBO, a quantitative trait locus (QTL) that governs c. 30% of germination rate variance in an Arabidopsis recombinant inbred line (RIL) population derived from the parental accessions Eilenburg-0 (Eil-0) and Loch Ness-0 (Lc-0). IBO encodes an uncharacterized phosphatase 2C-related protein, but neither the Eil-0 nor the Lc-0 variant, which differ in a single amino acid, has any appreciable phosphatase activity in in vitro assays. However, we found that the amino acid change in the Lc-0 variant of the IBO protein confers reduced germination rate. Moreover, unlike the Eil-0 variant of the protein, the Lc-0 variant can interfere with the activity of the phosphatase 2C ABSCISIC ACID INSENSITIVE 1 in vitro. This suggests that the Lc-0 variant possibly interferes with abscisic acid signaling, a notion that is supported by physiological assays. Thus, we isolated an example of a QTL allele with a nonsynonymous amino acid change that might mediate local adaptation of seed germination timing. PMID:25490966

  1. PeSV-Fisher: Identification of Somatic and Non-Somatic Structural Variants Using Next Generation Sequencing Data

    PubMed Central

    Rabionet, Raquel; Tubio, Jose M. C.; Martínez-Fundichely, Alexander; Cáceres, Mario; Gut, Marta; Ossowski, Stephan; Estivill, Xavier

    2013-01-01

    Next-generation sequencing technologies expedited research to develop efficient computational tools for the identification of structural variants (SVs) and their use to study human diseases. As deeper data is obtained, the existence of higher complexity SVs in some genomes becomes more evident, but the detection and definition of most of these complex rearrangements is still in its infancy. The full characterization of SVs is a key aspect for discovering their biological implications. Here we present a pipeline (PeSV-Fisher) for the detection of deletions, gains, intra- and inter-chromosomal translocations, and inversions, at very reasonable computational costs. We further provide comprehensive information on co-localization of SVs in the genome, a crucial aspect for studying their biological consequences. The algorithm uses a combination of methods based on paired-reads and read-depth strategies. PeSV-Fisher has been designed with the aim to facilitate identification of somatic variation, and, as such, it is capable of analysing two or more samples simultaneously, producing a list of non-shared variants between samples. We tested PeSV-Fisher on available sequencing data, and compared its behaviour to that of frequently deployed tools (BreakDancer and VariationHunter). We have also tested this algorithm on our own sequencing data, obtained from a tumour and a normal blood sample of a patient with chronic lymphocytic leukaemia, on which we have also validated the results by targeted re-sequencing of different kinds of predictions. This allowed us to determine confidence parameters that influence the reliability of breakpoint predictions. Availability: PeSV-Fisher is available at http://gd.crg.eu/tools. PMID:23704902

  2. Identification of Novel Variants in LTBP2 and PXDN Using Whole-Exome Sequencing in Developmental and Congenital Glaucoma

    PubMed Central

    Micheal, Shazia; Siddiqui, Sorath Noorani; Zafar, Saemah Nuzhat; Iqbal, Aftab; Khan, Muhammad Imran; den Hollander, Anneke I.

    2016-01-01

    Background: Primary congenital glaucoma (PCG) is the most common form of glaucoma in children. PCG occurs due to developmental defects in the trabecular meshwork and anterior chamber of the eye. The purpose of this study is to identify the causative genetic variants in three families with developmental and primary congenital glaucoma with a recessive inheritance pattern. Methods: DNA samples were obtained from consanguineous families of Pakistani ancestry. The CYP1B1 gene was sequenced in the affected probands by conventional Sanger DNA sequencing. Whole exome sequencing (WES) was performed in DNA samples of four individuals belonging to three different CYP1B1-negative families. Variants identified by WES were validated by Sanger sequencing. Results: WES identified potentially causative novel mutations in the latent transforming growth factor beta binding protein 2 (LTBP2) gene in two PCG families. In the first family a novel missense mutation (c.4934G>A; p.Arg1645Glu) co-segregates with the disease phenotype, and in the second family a novel frameshift mutation (c.4031_4032insA; p.Asp1345Glyfs*6) was identified. In a third family with developmental glaucoma a novel mutation (c.3496G>A; p.Gly1166Arg) was identified in the PXDN gene, which segregates with the disease. Conclusions: We identified three novel mutations in glaucoma families using WES; two in the LTBP2 gene and one in the PXDN gene. The results will not only enhance our current understanding of the genetic basis of glaucoma, but may also contribute to a better understanding of the diverse phenotypic consequences caused by mutations in these genes. PMID:27409795

  3. Hydrogen Exchange Mass Spectrometry of Related Proteins with Divergent Sequences: A Comparative Study of HIV-1 Nef Allelic Variants

    NASA Astrophysics Data System (ADS)

    Wales, Thomas E.; Poe, Jerrod A.; Emert-Sedlak, Lori; Morgan, Christopher R.; Smithgall, Thomas E.; Engen, John R.

    2016-06-01

    Hydrogen exchange mass spectrometry can be used to compare the conformation and dynamics of proteins that are similar in tertiary structure. If relative deuterium levels are measured, differences in sequence, deuterium forward- and back-exchange, peptide retention time, and protease digestion patterns all complicate the data analysis. We illustrate what can be learned from such data sets by analyzing five variants (Consensus G2E, SF2, NL4-3, ELI, and LTNP4) of the HIV-1 Nef protein, both alone and when bound to the human Hck SH3 domain. Regions with similar sequence could be compared between variants. Although many of the hydrogen exchange features were preserved across the five proteins, the kinetics of Nef binding to Hck SH3 were not the same. These observations may be related to biological function, particularly for ELI Nef where we also observed an impaired ability to downregulate CD4 surface presentation. The data illustrate some of the caveats that must be considered for comparison experiments and provide a framework for investigations of other protein relatives, families, and superfamilies with HX MS.

  4. ZFP57 recognizes multiple and closely spaced sequence motif variants to maintain repressive epigenetic marks in mouse embryonic stem cells

    PubMed Central

    Anvar, Zahra; Cammisa, Marco; Riso, Vincenzo; Baglivo, Ilaria; Kukreja, Harpreet; Sparago, Angela; Girardot, Michael; Lad, Shraddha; De Feis, Italia; Cerrato, Flavia; Angelini, Claudia; Feil, Robert; Pedone, Paolo V.; Grimaldi, Giovanna; Riccio, Andrea

    2016-01-01

    Imprinting Control Regions (ICRs) need to maintain their parental allele-specific DNA methylation during early embryogenesis despite genome-wide demethylation and subsequent de novo methylation. ZFP57 and KAP1 are both required for maintaining the repressive DNA methylation and H3-lysine-9-trimethylation (H3K9me3) at ICRs. In vitro, ZFP57 binds a specific hexanucleotide motif that is enriched at its genomic binding sites. We now demonstrate in mouse embryonic stem cells (ESCs) that SNPs disrupting closely-spaced hexanucleotide motifs are associated with lack of ZFP57 binding and H3K9me3 enrichment. Through a transgenic approach in mouse ESCs, we further demonstrate that an ICR fragment containing three ZFP57 motif sequences recapitulates the original methylated or unmethylated status when integrated into the genome at an ectopic position. Mutation of Zfp57 or the hexanucleotide motifs led to loss of ZFP57 binding and DNA methylation of the transgene. Finally, we identified a sequence variant of the hexanucleotide motif that interacts with ZFP57 both in vivo and in vitro. The presence of multiple and closely located copies of ZFP57 motif variants emerges as a distinct characteristic that is required for the faithful maintenance of repressive epigenetic marks at ICRs and other ZFP57 binding sites. PMID:26481358

  5. Increased breadth and depth of cytotoxic T lymphocytes responses against HIV-1-B Nef by inclusion of epitope variant sequences.

    PubMed

    Rolland, Morgane; Frahm, Nicole; Nickle, David C; Jojic, Nebojsa; Deng, Wenjie; Allen, Todd M; Brander, Christian; Heckerman, David E; Mullins, James I

    2011-03-28

    Different vaccine approaches cope with HIV-1 diversity, ranging from centralized(1-4) to variability-encompassing(5-7) antigens. For all these strategies, a concern remains: how does HIV-1 diversity impact epitope recognition by the immune system? We studied the relationship between HIV-1 diversity and CD8(+) T Lymphocytes (CTL) targeting of HIV-1 subtype B Nef using 944 peptides (10-mers overlapping by nine amino acids (AA)) that corresponded to consensus peptides and their most common variants in the HIV-1-B virus population. IFN-γ ELISpot assays were performed using freshly isolated PBMC from 26 HIV-1-infected persons. Three hundred and fifty peptides elicited a response in at least one individual. Individuals targeted a median of 7 discrete regions. Overall, 33% of responses were directed against viral variants but not elicited against consensus-based test peptides. However, there was no significant relationship between the frequency of a 10-mer in the viral population and either its frequency of recognition (Spearman's correlation coefficient ρ = 0.24) or the magnitude of the responses (ρ = 0.16). We found that peptides with a single mutation compared to the consensus were likely to be recognized (especially if the change was conservative) and to elicit responses of similar magnitude as the consensus peptide. Our results indicate that cross-reactivity between rare and frequent variants is likely to play a role in the expansion of CTL responses, and that maximizing antigenic diversity in a vaccine may increase the breadth and depth of CTL responses. However, since there are few obvious preferred pathways to virologic escape, the diversity that may be required to block all potential escape pathways may be too large for a realistic vaccine to accommodate. Furthermore, since peptides were not recognized based on their frequency in the population, it remains unclear by which mechanisms variability-inclusive antigens (i.e., constructs enriched with frequent

  6. Common and rare von Willebrand factor (VWF) coding variants, VWF levels, and factor VIII levels in African Americans: the NHLBI Exome Sequencing Project

    PubMed Central

    Johnsen, Jill M.; Auer, Paul L.; Morrison, Alanna C.; Jiao, Shuo; Wei, Peng; Haessler, Jeffrey; Fox, Keolu; McGee, Sean R.; Smith, Joshua D.; Carlson, Christopher S.; Smith, Nicholas; Boerwinkle, Eric; Kooperberg, Charles; Nickerson, Deborah A.; Rich, Stephen S.; Green, David; Peters, Ulrike; Cushman, Mary

    2013-01-01

    Several rare European von Willebrand disease missense variants of VWF (including p.Arg2185Gln and p.His817Gln) were recently reported to be common in apparently healthy African Americans (AAs). Using data from the NHLBI Exome Sequencing Project, we assessed the association of these and other VWF coding variants with von Willebrand factor (VWF) and factor VIII (FVIII) levels in 4468 AAs. Of 30 nonsynonymous VWF variants, 6 were significantly and independently associated (P < .001) with levels of VWF and/or FVIII. Each additional copy of the common VWF variants encoding p.Thr789Ala or p.Asp1472His was associated with 6 to 8 IU/dL higher VWF levels. The VWF variant encoding p.Arg2185Gln was associated with 7 to 13 IU/dL lower VWF and FVIII levels. The type 2N-related VWF variant encoding p.His817Gln was associated with 17 IU/dL lower FVIII level but normal VWF level. A novel, rare missense VWF variant that predicts disruption of an O-glycosylation site (p.Ser1486Leu) and a rare variant encoding p.Arg2287Trp were each associated with 30 to 40 IU/dL lower VWF level (P < .001). In summary, several common and rare VWF missense variants contribute to phenotypic differences in VWF and FVIII among AAs. PMID:23690449

  7. Genetic Variants in the FADS Gene: Implications for Dietary Recommendations for Fatty Acid Intake

    PubMed Central

    Mathias, Rasika A.; Pani, Vrindarani; Chilton, Floyd H.

    2014-01-01

    Unequivocally, genetic variants within the fatty acid desaturase (FADS) cluster are determinants of long chain polyunsaturated fatty acid (LC-PUFA) levels in circulation, cells and tissues. A recent series of papers has addressed these associations in the context of ancestry; evidence clearly supports that the associations are robust to ethnicity. However, ∼80% of African Americans carry two copies of the alleles associated with increased levels of arachidonic acid, compared to only ∼45% of European Americans, raising important questions of whether gene-PUFA interactions induced by a modern western diet are differentially driving the risk of diseases of inflammation in diverse populations, and whether these interactions are leading to health disparities. We highlight an important aspect thus far missing in the debate regarding dietary recommendations; we contend that current evidence from genetics strongly suggests that an individual's genetic architecture, or at the very least that of the population from which an individual is sampled, must be factored into dietary recommendations currently in place. PMID:24977108

  8. Complete Genome Sequence of Human Norovirus GII.4_2006b, a Variant of Minerva 2006

    PubMed Central

    Yang, Zhihui; Mammel, Mark K.

    2016-01-01

    In 2006, the National Calicivirus Laboratory at the U.S. Centers for Disease Control and Prevention (CDC) confirmed multistate outbreaks of norovirus infection and identified two new GII.4 norovirus strains (Minerva and Laurens) through partial sequencing of the major capsid (VP1) gene. Here, we report the first complete genome sequence of the GII.4 Minerva isolate. PMID:26823589

  9. Complete Genome Sequence of Human Norovirus GII.4_2006b, a Variant of Minerva 2006.

    PubMed

    Yang, Zhihui; Mammel, Mark K; Kulka, Michael

    2016-01-01

    In 2006, the National Calicivirus Laboratory at the U.S. Centers for Disease Control and Prevention (CDC) confirmed multistate outbreaks of norovirus infection and identified two new GII.4 norovirus strains (Minerva and Laurens) through partial sequencing of the major capsid (VP1) gene. Here, we report the first complete genome sequence of the GII.4 Minerva isolate. PMID:26823589

  10. Pilot whole-exome sequencing of a German early-onset Alzheimer's disease cohort reveals a substantial frequency of PSEN2 variants.

    PubMed

    Blauwendraat, Cornelis; Wilke, Carlo; Jansen, Iris E; Schulte, Claudia; Simón-Sánchez, Javier; Metzger, Florian G; Bender, Benjamin; Gasser, Thomas; Maetzler, Walter; Rizzu, Patrizia; Heutink, Peter; Synofzik, Matthis

    2016-01-01

    Early-onset Alzheimer's disease (EOAD) accounts for 1%-2% of all Alzheimer's disease (AD) subjects, with large variation in the reported genetic contribution of known dementia genes. In this pilot study, we genetically characterized a German EOAD cohort (23 subjects) by whole-exome sequencing, capturing variants in all recognized AD and frontotemporal dementia genes. After variant filtering, we identified 7 events of altogether 6 different rare variants in 6 subjects, including 4 novel variants. Four of the 6 variants, observed in 5 different index subjects (5/23 = 22%), were considered to be possibly pathogenic. These included 2 presenilin 2 (PSEN2) variants (p.N141I, previously denoted as a Volga German variant and observed in 2 index subjects; and p.L238P), 1 amyloid precursor protein variant (p.I716M), and 1 presenilin 1 variant (ΔE9). Using a control exome data set of 96 ethnically matched neurodegenerative disease controls (Parkinson's disease), we identified only 1 variant (PSEN2 p.T18M) (1%), demonstrating a significantly higher mutational burden in the EOAD group (p > 0.0001). Our findings demonstrate a substantial frequency of variants in dementia genes in EOAD, including several seemingly "sporadic" subjects. This indicates that heritability in EOAD might be higher than assumed. The finding of 3 subjects carrying potential pathogenic PSEN2 variants suggests that, in specific populations, PSEN2 variants might be as frequent as (or more frequent than) presenilin 1 variants, for example, in German populations which are influenced by Volga German heritage. Variants in AD genes were also associated with rare phenotypes such as frontal AD or primary progressive aphasia, demonstrating the need to screen AD genes in frontotemporal dementia-like phenotypes.

  11. Application of Two-Part Statistics for Comparison of Sequence Variant Counts

    PubMed Central

    Wagner, Brandie D.; Robertson, Charles E.; Harris, J. Kirk

    2011-01-01

    Investigation of microbial communities, particularly human associated communities, is significantly enhanced by the vast amounts of sequence data produced by high throughput sequencing technologies. However, these data create high-dimensional complex data sets that consist of a large proportion of zeros, non-negative skewed counts, and frequently, limited number of samples. These features distinguish sequence data from other forms of high-dimensional data, and are not adequately addressed by statistical approaches in common use. Ultimately, medical studies may identify targeted interventions or treatments, but lack of analytic tools for feature selection and identification of taxa responsible for differences between groups, is hindering advancement. The objective of this paper is to examine the application of a two-part statistic to identify taxa that differ between two groups. The advantages of the two-part statistic over common statistical tests applied to sequence count datasets are discussed. Results from the t-test, the Wilcoxon test, and the two-part test are compared using sequence counts from microbial ecology studies in cystic fibrosis and from cenote samples. We show superior performance of the two-part statistic for analysis of sequence data. The improved performance in microbial ecology studies was independent of study type and sequence technology used. PMID:21629788
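
    For readers unfamiliar with two-part tests, the sketch below shows one common construction for zero-inflated counts: a z statistic for the difference in the proportion of zeros is combined with a rank-sum z statistic on the non-zero values, and the sum of squares is referred to a chi-square distribution with 2 degrees of freedom. This is an assumed, simplified form (normal approximation, no tie correction) and may differ in detail from the statistic used in the paper; all function names are illustrative.

```python
import math

def zero_proportion_z(zeros_a, n_a, zeros_b, n_b):
    """Pooled two-proportion z statistic for the share of zero counts."""
    pooled = (zeros_a + zeros_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return 0.0 if se == 0 else (zeros_a / n_a - zeros_b / n_b) / se

def rank_sum_z(xs, ys):
    """Wilcoxon rank-sum z (normal approximation, midranks, no tie correction)."""
    pooled = sorted(xs + ys)
    ranks = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        ranks[pooled[i]] = (i + 1 + j) / 2  # mean of 1-based ranks i+1..j
        i = j
    n1, n2 = len(xs), len(ys)
    w = sum(ranks[x] for x in xs)
    mean = n1 * (n1 + n2 + 1) / 2
    var = n1 * n2 * (n1 + n2 + 1) / 12
    return (w - mean) / math.sqrt(var)

def two_part_test(counts_a, counts_b):
    """Return (statistic, p_value) for two groups of non-negative counts."""
    nz_a = [c for c in counts_a if c > 0]
    nz_b = [c for c in counts_b if c > 0]
    z_zero = zero_proportion_z(len(counts_a) - len(nz_a), len(counts_a),
                               len(counts_b) - len(nz_b), len(counts_b))
    if nz_a and nz_b:
        stat = z_zero ** 2 + rank_sum_z(nz_a, nz_b) ** 2
        return stat, math.exp(-stat / 2)        # chi-square survival, 2 df
    stat = z_zero ** 2                          # only the zero part is testable
    return stat, math.erfc(math.sqrt(stat / 2))  # chi-square survival, 1 df

# Hypothetical counts of one taxon in two groups of six samples each.
stat, p = two_part_test([0, 0, 0, 0, 5, 6], [1, 2, 3, 4, 0, 7])
```

    The appeal of this construction for sequence counts is that a taxon can differ between groups either in how often it is absent or in how abundant it is when present, and the combined statistic is sensitive to both.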

  12. Serine protease variants encoded by Echis ocellatus venom gland cDNA: cloning and sequencing analysis.

    PubMed

    Hasson, S S; Mothana, R A; Sallam, T A; Al-balushi, M S; Rahman, M T; Al-Jabri, A A

    2010-01-01

    Envenoming by Echis saw-scaled vipers is the leading cause of death and morbidity in Africa due to snake bite. Despite its medical importance, there have been few investigations into the toxin composition of the venom of this viper. Here, we report the cloning of cDNA sequences encoding four groups or isoforms of the haemostasis-disruptive serine protease proteins (SPs) from the venom glands of Echis ocellatus. All these SP sequences encoded the cysteine residue scaffold that forms the six disulphide bonds responsible for the characteristic tertiary structure of venom serine proteases. All the Echis ocellatus EoSP groups showed varying degrees of sequence similarity to published viper venom SPs. However, these groups also showed marked intercluster sequence conservation, which was significantly different from that of previously published viper SPs. Because viper venom SPs exhibit a high degree of sequence similarity and yet exert profoundly different effects on the mammalian haemostatic system, no attempt was made to assign functionality to the new Echis ocellatus EoSPs on the basis of sequence alone. The extraordinary level of interspecific and intergeneric sequence conservation exhibited by the Echis ocellatus EoSPs and analogous serine proteases from other viper species leads us to speculate that antibodies to representative molecules should neutralise the biological function of this important group of venom toxins in vipers that are distributed throughout Africa, the Middle East, and the Indian subcontinent, a possibility that we will exploit by epidermal DNA immunization. PMID:20936075

  13. The genome of RNA tumor viruses contains polyadenylic acid sequences.

    PubMed

    Green, M; Cartas, M

    1972-04-01

    The 70S genome of two RNA tumor viruses, murine sarcoma virus and avian myeloblastosis virus, binds to Millipore filters in buffer with high salt concentration and to glass fiber filters containing poly(U). These observations suggest that 70S RNA contains adenylic acid-rich sequences. When digested by pancreatic RNase, 70S RNA of murine sarcoma virus yielded poly(A) sequences that contain 91% adenylic acid. These poly(A) sequences sedimented as a relatively homogenous peak in sucrose gradients with a sedimentation coefficient of 4-5 S, but had a mobility during polyacrylamide gel electrophoresis that corresponds to molecules that sediment at 6-7 S. If we estimate a molecular weight for each sequence of 30,000-60,000 (100-200 nucleotides) and a molecular weight for viral 70S RNA of 3-12 million, each viral genome could contain 1-8 poly(A) sequences. Possible functions of poly(A) in the infecting viral RNA may include a role in the initiation of viral DNA or RNA synthesis, in protein maturation, or in the assembly of the viral genome.

  14. Comparative sequence analyses of genome and transcriptome reveal novel transcripts and variants in the Asian elephant Elephas maximus.

    PubMed

    Reddy, Puli Chandramouli; Sinha, Ishani; Kelkar, Ashwin; Habib, Farhat; Pradhan, Saurabh J; Sukumar, Raman; Galande, Sanjeev

    2015-12-01

    The Asian elephant Elephas maximus and the African elephant Loxodonta africana, which diverged 5-7 million years ago, exhibit differences in their physiology, behaviour and morphology. A comparative genomics approach would be useful and necessary for evolutionary and functional genetic studies of elephants. We sequenced E. maximus and mapped the reads to L. africana at ~15X coverage. Through comparative sequence analyses, we have identified Asian elephant-specific homozygous, non-synonymous single nucleotide variants (SNVs) that map to 1514 protein coding genes, many of which are involved in olfaction. We also present the first report of a high-coverage transcriptome sequence in E. maximus from peripheral blood lymphocytes. We have identified 103 novel protein coding transcripts and 66 long non-coding RNAs (lncRNAs). We also report the presence of 181 protein domains unique to elephants when compared to other Afrotheria species. Each of these findings can be further investigated to gain a better understanding of functional differences unique to elephant species, as well as those unique to elephantids in comparison with other mammals. This work therefore provides a valuable resource to explore the immense research potential of comparative analyses of transcriptome and genome sequences in the Asian elephant.

  15. Comparative sequence analyses of genome and transcriptome reveal novel transcripts and variants in the Asian elephant Elephas maximus.

    PubMed

    Reddy, Puli Chandramouli; Sinha, Ishani; Kelkar, Ashwin; Habib, Farhat; Pradhan, Saurabh J; Sukumar, Raman; Galande, Sanjeev

    2015-12-01

    The Asian elephant Elephas maximus and the African elephant Loxodonta africana, which diverged 5-7 million years ago, exhibit differences in their physiology, behaviour and morphology. A comparative genomics approach would be useful and necessary for evolutionary and functional genetic studies of elephants. We sequenced E. maximus and mapped the reads to L. africana at ~15X coverage. Through comparative sequence analyses, we have identified Asian elephant-specific homozygous, non-synonymous single nucleotide variants (SNVs) that map to 1514 protein coding genes, many of which are involved in olfaction. We also present the first report of a high-coverage transcriptome sequence in E. maximus from peripheral blood lymphocytes. We have identified 103 novel protein coding transcripts and 66 long non-coding RNAs (lncRNAs). We also report the presence of 181 protein domains unique to elephants when compared to other Afrotheria species. Each of these findings can be further investigated to gain a better understanding of functional differences unique to elephant species, as well as those unique to elephantids in comparison with other mammals. This work therefore provides a valuable resource to explore the immense research potential of comparative analyses of transcriptome and genome sequences in the Asian elephant. PMID:26648035

  16. A sequence variant at 4p16.3 confers susceptibility to urinary bladder cancer

    PubMed Central

    Kiemeney, Lambertus A; Sulem, Patrick; Besenbacher, Soren; Vermeulen, Sita H; Sigurdsson, Asgeir; Thorleifsson, Gudmar; Gudbjartsson, Daniel F; Stacey, Simon N; Gudmundsson, Julius; Zanon, Carlo; Kostic, Jelena; Masson, Gisli; Bjarnason, Hjordis; Palsson, Stefan T; Skarphedinsson, Oskar B; Gudjonsson, Sigurjon A; Witjes, J Alfred; Grotenhuis, Anne J; Verhaegh, Gerald W; Bishop, D Timothy; Sak, Sei Chung; Choudhury, Ananya; Elliott, Faye; Barrett, Jennifer H; Hurst, Carolyn D; de Verdier, Petra J; Ryk, Charlotta; Rudnai, Peter; Gurzau, Eugene; Koppova, Kvetoslava; Vineis, Paolo; Polidoro, Silvia; Guarrera, Simonetta; Sacerdote, Carlotta; Campagna, Marcello; Placidi, Donatella; Arici, Cecilia; Zeegers, Maurice P; Kellen, Eliane; Gutierrez, Berta Saez; Sanz-Velez, José I; Sanchez-Zalabardo, Manuel; Valdivia, Gabriel; Garcia-Prats, Maria D; Hengstler, Jan G; Blaszkewicz, Meinolf; Dietrich, Holger; Ophoff, Roel A; van den Berg, Leonard H; Alexiusdottir, Kristin; Kristjansson, Kristleifur; Geirsson, Gudmundur; Nikulasson, Sigfus; Petursdottir, Vigdis; Kong, Augustine; Thorgeirsson, Thorgeir; Mungan, N Aydin; Lindblom, Annika; van Es, Michael A; Porru, Stefano; Buntinx, Frank; Golka, Klaus; Mayordomo, José I; Kumar, Rajiv; Matullo, Giuseppe; Steineck, Gunnar; Kiltie, Anne E; Aben, Katja K H; Jonsson, Eirikur; Thorsteinsdottir, Unnur; Knowles, Margaret A; Rafnar, Thorunn; Stefansson, Kari

    2010-01-01

    Previously, we reported germline DNA variants associated with risk of urinary bladder cancer (UBC) in Dutch and Icelandic subjects. Here we expanded the Icelandic sample set and tested the top 20 markers from the combined analysis in several European case-control sample sets, with a total of 4,739 cases and 45,549 controls. The T allele of rs798766 on 4p16.3 was found to associate with UBC (odds ratio = 1.24, P = 9.9 × 10−12). rs798766 is located in an intron of TACC3, 70 kb from FGFR3, which often harbors activating somatic mutations in low-grade, noninvasive UBC. Notably, rs798766[T] shows stronger association with low-grade and low-stage UBC than with more aggressive forms of the disease and is associated with higher risk of recurrence in low-grade stage Ta tumors. The frequency of rs798766[T] is higher in Ta tumors that carry an activating mutation in FGFR3 than in Ta tumors with wild-type FGFR3. Our results show a link between germline variants, somatic mutations of FGFR3 and risk of UBC. PMID:20348956

  17. Variants of glycoside hydrolases

    DOEpatents

    Teter, Sarah; Ward, Connie; Cherry, Joel; Jones, Aubrey; Harris, Paul; Yi, Jung

    2011-04-26

    The present invention relates to variants of a parent glycoside hydrolase, comprising a substitution at one or more positions corresponding to positions 21, 94, 157, 205, 206, 247, 337, 350, 373, 383, 438, 455, 467, and 486 of amino acids 1 to 513 of SEQ ID NO: 2, and optionally further comprising a substitution at one or more positions corresponding to positions 8, 22, 41, 49, 57, 113, 193, 196, 226, 227, 246, 251, 255, 259, 301, 356, 371, 411, and 462 of amino acids 1 to 513 of SEQ ID NO: 2, wherein the variants have glycoside hydrolase activity. The present invention also relates to nucleotide sequences encoding the variant glycoside hydrolases and to nucleic acid constructs, vectors, and host cells comprising the nucleotide sequences.

  18. Variants of glycoside hydrolases

    SciTech Connect

    Teter, Sarah; Ward, Connie; Cherry, Joel; Jones, Aubrey; Harris, Paul; Yi, Jung

    2013-02-26

    The present invention relates to variants of a parent glycoside hydrolase, comprising a substitution at one or more positions corresponding to positions 21, 94, 157, 205, 206, 247, 337, 350, 373, 383, 438, 455, 467, and 486 of amino acids 1 to 513 of SEQ ID NO: 2, and optionally further comprising a substitution at one or more positions corresponding to positions 8, 22, 41, 49, 57, 113, 193, 196, 226, 227, 246, 251, 255, 259, 301, 356, 371, 411, and 462 of amino acids 1 to 513 of SEQ ID NO: 2, wherein the variants have glycoside hydrolase activity. The present invention also relates to nucleotide sequences encoding the variant glycoside hydrolases and to nucleic acid constructs, vectors, and host cells comprising the nucleotide sequences.

  19. Sequences Of Amino Acids For Human Serum Albumin

    NASA Technical Reports Server (NTRS)

    Carter, Daniel C.

    1992-01-01

    Sequences of amino acids defined for use in making polypeptides one-third to one-sixth as large as parent human serum albumin molecule. Smaller, chemically stable peptides have diverse applications including service as artificial human serum and as active components of biosensors and chromatographic matrices. In applications involving production of artificial sera from new sequences, little or no concern about viral contaminants. Smaller genetically engineered polypeptides more easily expressed and produced in large quantities, making commercial isolation and production more feasible and profitable.

  20. miRNA Nomenclature: A View Incorporating Genetic Origins, Biosynthetic Pathways, and Sequence Variants.

    PubMed

    Desvignes, T; Batzel, P; Berezikov, E; Eilbeck, K; Eppig, J T; McAndrews, M S; Singer, A; Postlethwait, J H

    2015-11-01

    High-throughput sequencing of miRNAs has revealed the diversity and variability of mature and functional short noncoding RNAs, including their genomic origins, biogenesis pathways, sequence variability, and newly identified products such as miRNA-offset RNAs (moRs). Here we review known cases of alternative mature miRNA-like RNA fragments and propose a revised definition of miRNAs to encompass this diversity. We then review nomenclature guidelines for miRNAs and propose to extend nomenclature conventions to align with those for protein-coding genes established by international consortia. Finally, we suggest a system to encompass the full complexity of sequence variations (i.e., isomiRs) in the analysis of small RNA sequencing experiments.

  1. Aging as accelerated accumulation of somatic variants: whole-genome sequencing of centenarian and middle-aged monozygotic twin pairs.

    PubMed

    Ye, Kai; Beekman, Marian; Lameijer, Eric-Wubbo; Zhang, Yanju; Moed, Matthijs H; van den Akker, Erik B; Deelen, Joris; Houwing-Duistermaat, Jeanine J; Kremer, Dennis; Anvar, Seyed Yahya; Laros, Jeroen F J; Jones, David; Raine, Keiran; Blackburne, Ben; Potluri, Shobha; Long, Quan; Guryev, Victor; van der Breggen, Ruud; Westendorp, Rudi G J; 't Hoen, Peter A C; den Dunnen, Johan; van Ommen, Gert Jan B; Willemsen, Gonneke; Pitts, Steven J; Cox, David R; Ning, Zemin; Boomsma, Dorret I; Slagboom, P Eline

    2013-12-01

    It has been postulated that aging is the consequence of an accelerated accumulation of somatic DNA mutations and that subsequent errors in the primary structure of proteins ultimately reach levels sufficient to affect organismal functions. The technical limitations of detecting somatic changes and the lack of insight about the minimum level of erroneous proteins to cause an error catastrophe hampered any firm conclusions on these theories. In this study, we sequenced the whole genome of DNA in whole blood of two pairs of monozygotic (MZ) twins, 40 and 100 years old, by two independent next-generation sequencing (NGS) platforms (Illumina and Complete Genomics). Potentially discordant single-base substitutions supported by both platforms were validated extensively by Sanger, Roche 454, and Ion Torrent sequencing. We demonstrate that the genomes of the two twin pairs are germ-line identical between co-twins, and that the genomes of the 100-year-old MZ twins are discerned by eight confirmed somatic single-base substitutions, five of which are within introns. Putative somatic variation between the 40-year-old twins was not confirmed in the validation phase. We conclude from this systematic effort that by using two independent NGS platforms, somatic single nucleotide substitutions can be detected, and that a century of life did not result in a large number of detectable somatic mutations in blood. The low number of somatic variants observed by using two NGS platforms might provide a framework for detecting disease-related somatic variants in phenotypically discordant MZ twins. PMID:24182360

  2. Pooled sequencing and rare variant association tests for identifying the determinants of emerging drug resistance in malaria parasites.

    PubMed

    Cheeseman, Ian H; McDew-White, Marina; Phyo, Aung Pyae; Sriprawat, Kanlaya; Nosten, François; Anderson, Timothy J C

    2015-04-01

    We explored the potential of pooled sequencing to swiftly and economically identify selective sweeps due to emerging artemisinin (ART) resistance in a South-East Asian malaria parasite population. ART resistance is defined by slow parasite clearance from the blood of ART-treated patients and mutations in the kelch gene (chr. 13) have been strongly implicated to play a role. We constructed triplicate pools of 70 slow-clearing (resistant) and 70 fast-clearing (sensitive) infections collected from the Thai-Myanmar border and sequenced these to high (∼150-fold) read depth. Allele frequency estimates from pools showed almost perfect correlation (Lin's concordance = 0.98) with allele frequencies at 93 single nucleotide polymorphisms measured directly from individual infections, giving us confidence in the accuracy of this approach. By mapping genome-wide divergence (FST) between pools of drug-resistant and drug-sensitive parasites, we identified two large (>150 kb) regions (on chrs. 13 and 14) and 17 smaller candidate genome regions. To identify individual genes within these genome regions, we resequenced an additional 38 parasite genomes (16 slow and 22 fast-clearing) and performed rare variant association tests. These confirmed kelch as a major molecular marker for ART resistance (P = 6.03 × 10^-6). This two-tier approach is powerful because pooled sequencing rapidly narrows down genome regions of interest, while targeted rare variant association testing within these regions can pinpoint the genetic basis of resistance. We show that our approach is robust to recurrent mutation and the generation of soft selective sweeps, which are predicted to be common in pathogen populations with large effective population sizes, and may confound more traditional gene mapping approaches. PMID:25534029

  3. Pooled Sequencing and Rare Variant Association Tests for Identifying the Determinants of Emerging Drug Resistance in Malaria Parasites

    PubMed Central

    Cheeseman, Ian H.; McDew-White, Marina; Phyo, Aung Pyae; Sriprawat, Kanlaya; Nosten, François; Anderson, Timothy J.C.

    2015-01-01

    We explored the potential of pooled sequencing to swiftly and economically identify selective sweeps due to emerging artemisinin (ART) resistance in a South-East Asian malaria parasite population. ART resistance is defined by slow parasite clearance from the blood of ART-treated patients and mutations in the kelch gene (chr. 13) have been strongly implicated to play a role. We constructed triplicate pools of 70 slow-clearing (resistant) and 70 fast-clearing (sensitive) infections collected from the Thai–Myanmar border and sequenced these to high (∼150-fold) read depth. Allele frequency estimates from pools showed almost perfect correlation (Lin’s concordance = 0.98) with allele frequencies at 93 single nucleotide polymorphisms measured directly from individual infections, giving us confidence in the accuracy of this approach. By mapping genome-wide divergence (FST) between pools of drug-resistant and drug-sensitive parasites, we identified two large (>150 kb) regions (on chrs. 13 and 14) and 17 smaller candidate genome regions. To identify individual genes within these genome regions, we resequenced an additional 38 parasite genomes (16 slow and 22 fast-clearing) and performed rare variant association tests. These confirmed kelch as a major molecular marker for ART resistance (P = 6.03 × 10−6). This two-tier approach is powerful because pooled sequencing rapidly narrows down genome regions of interest, while targeted rare variant association testing within these regions can pinpoint the genetic basis of resistance. We show that our approach is robust to recurrent mutation and the generation of soft selective sweeps, which are predicted to be common in pathogen populations with large effective population sizes, and may confound more traditional gene mapping approaches. PMID:25534029
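
    The divergence scan described in this abstract compares allele frequencies between the resistant and sensitive pools. The sketch below is only an illustration of that idea, assuming one biallelic SNP, two equally weighted pools, and the textbook heterozygosity-based form of FST; the abstract does not say which FST estimator was used, and the function name and example frequencies are hypothetical.

```python
def fst_biallelic(p1: float, p2: float) -> float:
    """Heterozygosity-based F_ST for one biallelic SNP in two equally weighted pools.

    p1 and p2 are the frequencies of the same allele in pool 1 and pool 2.
    This is a textbook form, not necessarily the estimator used in the study.
    """
    for p in (p1, p2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("allele frequencies must lie in [0, 1]")
    p_bar = (p1 + p2) / 2.0
    # Expected heterozygosity if both pools were one population.
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    # Mean expected heterozygosity within each pool.
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    if h_total == 0.0:
        return 0.0  # the site is fixed for the same allele in both pools
    return (h_total - h_within) / h_total


if __name__ == "__main__":
    # Hypothetical allele frequencies in the resistant and sensitive pools.
    print(fst_biallelic(0.8, 0.2))
```

    Values near 0 indicate similar allele frequencies in the two pools; values near 1 indicate that the pools are close to fixed for different alleles, which is the kind of signal a genome-wide scan would flag.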

  4. Genetic analyses of GII.17 norovirus strains in diarrheal disease outbreaks from December 2014 to March 2015 in Japan reveal a novel polymerase sequence and amino acid substitutions in the capsid region.

    PubMed

    Matsushima, Y; Ishikawa, M; Shimizu, T; Komane, A; Kasuo, S; Shinohara, M; Nagasawa, K; Kimura, H; Ryo, A; Okabe, N; Haga, K; Doan, Y H; Katayama, K; Shimizu, H

    2015-01-01

    A novel GII.P17-GII.17 variant norovirus emerged as a major cause of norovirus outbreaks from December 2014 to March 2015 in Japan. Named Hu/GII/JP/2014/GII.P17-GII.17, this variant has a newly identified GII.P17 type RNA-dependent RNA polymerase, while the capsid sequence displays amino acid substitutions around histo-blood group antigen (HBGA) binding sites. Several variants caused by mutations in the capsid region have previously been observed in the GII.4 genotype. Monitoring the GII.17 variant's geographical spread and evolution is important.

  5. Nucleic acid sequence design via efficient ensemble defect optimization.

    PubMed

    Zadeh, Joseph N; Wolfe, Brian R; Pierce, Niles A

    2011-02-01

    We describe an algorithm for designing the sequence of one or more interacting nucleic acid strands intended to adopt a target secondary structure at equilibrium. Sequence design is formulated as an optimization problem with the goal of reducing the ensemble defect below a user-specified stop condition. For a candidate sequence and a given target secondary structure, the ensemble defect is the average number of incorrectly paired nucleotides at equilibrium evaluated over the ensemble of unpseudoknotted secondary structures. To reduce the computational cost of accepting or rejecting mutations to a random initial sequence, candidate mutations are evaluated on the leaf nodes of a tree-decomposition of the target structure. During leaf optimization, defect-weighted mutation sampling is used to select each candidate mutation position with probability proportional to its contribution to the ensemble defect of the leaf. As subsequences are merged moving up the tree, emergent structural defects resulting from crosstalk between sibling sequences are eliminated via reoptimization within the defective subtree starting from new random subsequences. Using a Θ(N^3) dynamic program to evaluate the ensemble defect of a target structure with N nucleotides, this hierarchical approach implies an asymptotic optimality bound on design time: for sufficiently large N, the cost of sequence design is bounded below by 4/3 the cost of a single evaluation of the ensemble defect for the full sequence. Hence, the design algorithm has time complexity Ω(N^3). For target structures containing N ∈ {100, 200, 400, 800, 1600, 3200} nucleotides and duplex stems ranging from 1 to 30 base pairs, RNA sequence designs at 37°C typically succeed in satisfying a stop condition with ensemble defect less than N/100. Empirically, the sequence design algorithm exhibits asymptotic optimality and the exponent in the time complexity bound is sharp.
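
    The ensemble defect and the defect-weighted sampling step defined in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the equilibrium pair probabilities have already been computed elsewhere (the dynamic program is not reproduced here) and are passed in as a table whose extra last column holds the probability that a base is unpaired. All function and variable names are hypothetical.

```python
import random


def ensemble_defect(pair_probs, target_pairs):
    """Return (total defect, per-nucleotide defect contributions).

    pair_probs[i][j] for j < N is the equilibrium probability that base i pairs
    with base j; pair_probs[i][N] is the probability that base i is unpaired.
    target_pairs[i] is the partner of base i in the target structure, or None
    if base i is unpaired in the target.
    """
    n = len(target_pairs)
    per_base = []
    for i, partner in enumerate(target_pairs):
        column = n if partner is None else partner
        # Probability that base i is NOT in its target pairing state.
        per_base.append(1.0 - pair_probs[i][column])
    return sum(per_base), per_base


def sample_mutation_position(per_base, rng=random):
    """Pick a position with probability proportional to its defect contribution."""
    if sum(per_base) <= 0.0:
        return None  # no defect left, so there is nothing to mutate
    return rng.choices(range(len(per_base)), weights=per_base, k=1)[0]


if __name__ == "__main__":
    # Toy two-base duplex: the target pairs base 0 with base 1.
    probs = [[0.0, 0.9, 0.1],
             [0.9, 0.0, 0.1]]
    total, contributions = ensemble_defect(probs, [1, 0])
    print(total, contributions, sample_mutation_position(contributions))
```

    Dividing the total by N gives a normalized defect that can be compared against a stop condition such as the N/100 threshold mentioned in the abstract.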

  6. Sequence analysis of the novel HLA-Cw*08 variant allele, Cw*0820, in a Chinese Han individual.

    PubMed

    Deng, Z-H; Xu, Y-P; Wang, D-M

    2009-09-01

    A novel human leukocyte antigen (HLA) allele, HLA-Cw*0820, was identified in a Chinese Han individual. It differs from the closest allele Cw*080101 by single nucleotide change at genomic nucleotide (nt) 1615 G>A (coding sequence nt 652 G>A, codon 194 GTC>ATC) in exon 4, which results in an amino acid change Val194Ile.

  7. A multifactorial likelihood model for MMR gene variant classification incorporating probabilities based on sequence bioinformatics and tumor characteristics: a report from the Colon Cancer Family Registry.

    PubMed

    Thompson, Bryony A; Goldgar, David E; Paterson, Carol; Clendenning, Mark; Walters, Rhiannon; Arnold, Sven; Parsons, Michael T; Walsh, Michael D; Gallinger, Steven; Haile, Robert W; Hopper, John L; Jenkins, Mark A; Lemarchand, Loic; Lindor, Noralane M; Newcomb, Polly A; Thibodeau, Stephen N; Young, Joanne P; Buchanan, Daniel D; Tavtigian, Sean V; Spurdle, Amanda B

    2013-01-01

    Mismatch repair (MMR) gene sequence variants of uncertain clinical significance are often identified in suspected Lynch syndrome families, and this constitutes a challenge for both researchers and clinicians. Multifactorial likelihood model approaches provide a quantitative measure of MMR variant pathogenicity, but first require input of likelihood ratios (LRs) for different MMR variation-associated characteristics from appropriate, well-characterized reference datasets. Microsatellite instability (MSI) and somatic BRAF tumor data for unselected colorectal cancer probands of known pathogenic variant status were used to derive LRs for tumor characteristics using the Colon Cancer Family Registry (CFR) resource. These tumor LRs were combined with variant segregation within families, and estimates of prior probability of pathogenicity based on sequence conservation and position, to analyze 44 unclassified variants identified initially in Australasian Colon CFR families. In addition, in vitro splicing analyses were conducted on the subset of variants based on bioinformatic splicing predictions. The LR in favor of pathogenicity was estimated to be ~12-fold for a colorectal tumor with a BRAF mutation-negative MSI-H phenotype. For 31 of the 44 variants, the posterior probabilities of pathogenicity were such that altered clinical management would be indicated. Our findings provide a working multifactorial likelihood model for classification that carefully considers mode of ascertainment for gene testing.
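
    The combination step in a multifactorial likelihood model is Bayes' rule in odds form: prior odds are multiplied by the likelihood ratio from each independent line of evidence. The sketch below assumes the LRs are independent and uses a hypothetical prior of 0.1; only the ~12-fold tumor LR comes from the abstract, and the function name is illustrative.

```python
from math import prod


def posterior_probability(prior: float, likelihood_ratios) -> float:
    """Combine a prior probability of pathogenicity with independent LRs.

    posterior odds = prior odds * product of likelihood ratios
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * prod(likelihood_ratios)
    return posterior_odds / (1.0 + posterior_odds)


if __name__ == "__main__":
    # Hypothetical prior of 0.1 combined with the ~12-fold tumor LR
    # reported for a BRAF mutation-negative MSI-H colorectal tumor.
    print(posterior_probability(0.1, [12.0]))
```

    Further evidence, such as a segregation LR, would simply be appended to the list, provided it can reasonably be treated as independent of the tumor data.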

  8. On combining protein sequences and nucleic acid sequences in phylogenetic analysis: the homeobox protein case.

    PubMed

    Agosti, D; Jacobs, D; DeSalle, R

    1996-01-01

    Amino acid encoding genes contain character state information that may be useful for phylogenetic analysis on at least two levels. The nucleotide sequence and the translated amino acid sequences have both been employed separately as character states for cladistic studies of various taxa, including studies of the genealogy of genes in multigene families. In essence, amino acid sequences and nucleic acid sequences are two different ways of character coding the information in a gene. Silent positions in the nucleotide sequence (first or third positions in codons that can accrue change without changing the identity of the amino acid that the triplet codes for) may accrue change relatively rapidly and become saturated, losing the pattern of historical divergence. On the other hand, non-silent nucleotide alterations and their accompanying amino acid changes may evolve too slowly to reveal relationships among closely related taxa. In general, the dynamics of sequence change in silent and non-silent positions in protein coding genes result in homoplasy and lack of resolution, respectively. We suggest that the combination of nucleic acid and the translated amino acid coded character states into the same data matrix for phylogenetic analysis addresses some of the problems caused by the rapid change of silent nucleotide positions and overall slow rate of change of non-silent nucleotide positions and slowly changing amino acid positions. One major theoretical problem with this approach is the apparent non-independence of the two sources of characters. However, there are at least three possible outcomes when comparing protein coding nucleic acid sequences with their translated amino acids in a phylogenetic context on a codon by codon basis. First, the two character sets for a codon may be entirely congruent with respect to the information they convey about the relationships of a certain set of taxa. 
Second, one character set may display no information concerning a phylogenetic

  9. Nanopores and nucleic acids: prospects for ultrarapid sequencing

    NASA Technical Reports Server (NTRS)

    Deamer, D. W.; Akeson, M.

    2000-01-01

    DNA and RNA molecules can be detected as they are driven through a nanopore by an applied electric field at rates ranging from several hundred microseconds to a few milliseconds per molecule. The nanopore can rapidly discriminate between pyrimidine and purine segments along a single-stranded nucleic acid molecule. Nanopore detection and characterization of single molecules represents a new method for directly reading information encoded in linear polymers. If single-nucleotide resolution can be achieved, it is possible that nucleic acid sequences can be determined at rates exceeding a thousand bases per second.

  10. Development of a Targeted Multi-Disorder High-Throughput Sequencing Assay for the Effective Identification of Disease-Causing Variants

    PubMed Central

    Delio, Maria; Patel, Kunjan; Maslov, Alex; Marion, Robert W.; McDonald, Thomas V.; Cadoff, Evan M.; Golden, Aaron; Greally, John M.; Vijg, Jan; Morrow, Bernice; Montagna, Cristina

    2015-01-01

    Background: While next generation sequencing (NGS) is a useful tool for identifying genetic variants to aid diagnosis and support therapy decisions, high sequencing costs have limited its application within routine clinical care, especially in economically depressed areas. To investigate the utility of a multi-disease NGS based genetic test, we designed a custom sequencing assay targeting over thirty disease-associated areas including cardiac disorders, intellectual disabilities, hearing loss, collagenopathies, muscular dystrophy, Ashkenazi Jewish genetic disorders, and complex Mendelian disorders. We focused on these specific areas based on the interest of our collaborative clinical team, which suggested that these diseases were the ones most in need of a sequencing-based screening assay. Results: We targeted all coding, untranslated regions (UTR) and flanking intronic regions of 650 known disease-associated genes using the Roche-NimbleGen EZ SeqCapV3 capture system and sequenced on the Illumina HiSeq 2500 Rapid Run platform. Eight controls with known variants and one HapMap sample were first sequenced to assess the performance of the panel. Subsequently, as a proof of principle and to explore the possible utility of our test, we analyzed test disease subjects (n = 16). Eight had known Mendelian disorders and eight had complex pediatric diseases. In addition, to assess whether copy number variation may be useful as a companion assay for these specific disease areas, we used the Affymetrix Genome-Wide SNP Array 6.0 to analyze the same samples. Conclusion: We identified potentially disease-associated variants: 22 missense, 4 nonsense, 1 frameshift, and 1 splice variants (16 previously identified, 12 novel among dbSNP and 15 novel among NHLBI Exome Variant Server). We found multi-disease targeted high-throughput sequencing to be a cost-efficient approach for detecting disease-associated variants to aid diagnosis. PMID:26214305

  11. Genome-wide Studies of Copy Number Variation and Exome Sequencing Identify Rare Variants in BAG3 as a Cause of Dilated Cardiomyopathy

    PubMed Central

    Norton, Nadine; Li, Duanxiang; Rieder, Mark J.; Siegfried, Jill D.; Rampersaud, Evadnie; Züchner, Stephan; Mangos, Steve; Gonzalez-Quintana, Jorge; Wang, Libin; McGee, Sean; Reiser, Jochen; Martin, Eden; Nickerson, Deborah A.; Hershberger, Ray E.

    2011-01-01

    Dilated cardiomyopathy commonly causes heart failure and is the most frequent precipitating cause of heart transplantation. Familial dilated cardiomyopathy has been shown to be caused by rare variant mutations in more than 30 genes but only ∼35% of its genetic cause has been identified, principally by using linkage-based or candidate gene discovery approaches. In a multigenerational family with autosomal dominant transmission, we employed whole-exome sequencing in a proband and three of his affected family members, and genome-wide copy number variation in the proband and his affected father and unaffected mother. Exome sequencing identified 428 single point variants resulting in missense, nonsense, or splice site changes. Genome-wide copy number analysis identified 51 insertion deletions and 440 copy number variants > 1 kb. Of these, an 8733 bp deletion, encompassing exon 4 of the heat shock protein cochaperone BCL2-associated athanogene 3 (BAG3), was found in seven affected family members and was absent in 355 controls. To establish the relevance of variants in this protein class in genetic DCM, we sequenced the coding exons in BAG3 in 311 other unrelated DCM probands and identified one frameshift, two nonsense, and four missense rare variants absent in 355 control DNAs, four of which were familial and segregated with disease. Knockdown of bag3 in a zebrafish model recapitulated DCM and heart failure. We conclude that new comprehensive genomic approaches have identified rare variants in BAG3 as causative of DCM. PMID:21353195

  12. Variability of the progeny of a sequence variant Citrus bent leaf viroid (CBLVd).

    PubMed

    Gandía, M; Duran-Vila, N

    2004-02-01

    A field isolate of CBLVd was previously shown to contain two dominant subpopulations (I and II), which differed by the presence or absence of a Sal I restriction site in the PCR product [10]. Here we demonstrate the infectivity and symptom expression of subpopulation II by inoculating Etrog citron with a single representative haplotype. The resulting progeny was characterised as a heterogeneous population of closely related variants with a new fitness peak represented by a haplotype that was not identified in the original isolate. This demonstrates that CBLVd conforms to a "quasispecies" model. The progeny shared features of the two subpopulations of the original isolate, indicating that the original isolate probably arose from a single CBLVd ancestor.

  13. Complete Nucleotide Sequence of IncP-1β Plasmid pDTC28 Reveals a Non-Functional Variant of the blaGES-Type Gene.

    PubMed

    Dang, Bingjun; Mao, Daqing; Luo, Yi

    2016-01-01

    Plasmid pDTC28 was isolated from the sediments of Haihe River using E. coli CV601 (gfp-tagged) as recipient and indigenous bacteria from the sediment as donors. This plasmid confers reduced susceptibility to tetracycline and sulfamethoxazole. The complete sequence of plasmid pDTC28 was 61,503 bp in length with an average G+C content of 64.09%. Plasmid pDTC28 belongs to the IncP-1β group by phylogenetic analysis. The backbones of plasmid pDTC28 and other IncP-1β plasmids are very classical and conserved, whereas the accessory regions of these plasmids are diverse. A blaGES-5-like gene was found on the accessory region, and this blaGES-5-like gene contained 18 silent mutations and 7 missense mutations compared with the blaGES-5 gene. The mutations resulted in 7 amino acid substitutions in GES-5 carbapenemase, causing the loss of function of the blaGES-5-like gene on plasmid pDTC28 against carbapenems and even β-lactams. The enzyme produced by the blaGES-5-like gene cassette may be a new variant of GES-type enzymes. Thus, the plasmid sequenced in this study will expand our understanding of GES-type β-lactamases and provide insights into the genetic platforms used for the dissemination of GES-type genes. PMID:27152950

  14. Promoter variants determine γ-aminobutyric acid homeostasis-related gene transcription in human epileptic hippocampi.

    PubMed

    Pernhorst, Katharina; Raabe, Anna; Niehusmann, Pitt; van Loo, Karen M J; Grote, Alexander; Hoffmann, Per; Cichon, Sven; Sander, Thomas; Schoch, Susanne; Becker, Albert J

    2011-12-01

    The functional consequences of single nucleotide polymorphisms associated with episodic brain disorders such as epilepsy and depression are unclear. Allelic associations with generalized epilepsies have been reported for single nucleotide polymorphisms rs1883415 (ALDH5A1; succinic semialdehyde dehydrogenase) and rs4906902 (GABRB3; GABAA β3), both of which are present in the 5' regulatory region of genes involved in γ-aminobutyric acid (GABA) homeostasis. To address their allelic association with episodic brain disorders and allele-specific impact on the transcriptional regulation of these genes in human brain tissue, DNA and messenger RNA (mRNA) isolated from hippocampi were obtained at epilepsy surgery of 146 pharmacoresistant mesial temporal lobe epilepsy (mTLE) patients and from 651 healthy controls. We found that the C allele of rs1883415 is accumulated to a greater extent in mTLE versus controls. By real-time quantitative reverse transcription-polymerase chain reaction analyses, individuals homozygous for the C allele showed higher ALDH5A1 mRNA expression. The rs4906902 G allele of the GABRB3 gene was overrepresented in mTLE patients with depression; individuals homozygous for the G allele showed reduced GABRB3 mRNA expression. Bioinformatic analyses suggest that rs1883415 and rs4906902 alter the DNA binding affinity of the transcription factors Egr-3 in ALDH5A1 and MEF-2 in GABRB3 promoters, respectively. Using in vitro luciferase transfection assays, we observed that, in both cases, the transcription factors regulate gene expression depending on the allelic variant in the same direction as in the human hippocampi. Our data suggest that distinct promoter variants may sensitize individuals for differential, potentially stimulus-induced alterations of GABA homeostasis-relevant gene expression. This might contribute to the episodic onset of symptoms and point to new targets for pharmacotherapies.

  15. The amino acid sequence of Escherichia coli cyanase.

    PubMed

    Chin, C C; Anderson, P M; Wold, F

    1983-01-10

    The amino acid sequence of the enzyme cyanase (cyanate hydrolase) from Escherichia coli has been determined by automatic Edman degradation of the intact protein and of its component peptides. The primary peptides used in the sequencing were produced by cyanogen bromide cleavage at the methionine residues, yielding 4 peptides plus free homoserine from the NH2-terminal methionine, and by trypsin cleavage at the 7 arginine residues after acetylation of the lysines. Secondary peptides required for overlaps and COOH-terminal sequences were produced by chymotrypsin or clostripain cleavage of some of the larger peptides. The complete sequence of the cyanase subunit consists of 156 amino acid residues (Mr 16,350). Based on the observation that the cysteine-containing peptide is obtained as a disulfide-linked dimer, it is proposed that the covalent structure of cyanase is made up of two subunits linked by a disulfide bond between the single cysteine residue in each subunit. The native enzyme (Mr 150,000) then appears to be a complex of four or five such subunit dimers.

  16. Predicting the functional consequences of non-synonymous DNA sequence variants--evaluation of bioinformatics tools and development of a consensus strategy.

    PubMed

    Frousios, Kimon; Iliopoulos, Costas S; Schlitt, Thomas; Simpson, Michael A

    2013-10-01

    The study of DNA sequence variation has been transformed by recent advances in DNA sequencing technologies. Determination of the functional consequences of sequence variant alleles offers potential insight as to how genotype may influence phenotype. Even within protein coding regions of the genome, establishing the consequences of variation on gene and protein function is challenging and requires substantial laboratory investigation. However, a series of bioinformatics tools have been developed to predict whether non-synonymous variants are neutral or disease-causing. In this study we evaluate the performance of nine such methods (SIFT, PolyPhen2, SNPs&GO, PhD-SNP, PANTHER, Mutation Assessor, MutPred, Condel and CAROL) and develop CoVEC (Consensus Variant Effect Classification), a tool that integrates the prediction results from four of these methods. We demonstrate that the CoVEC approach outperforms most individual methods and highlights the benefit of combining results from multiple tools. PMID:23831115
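
    The general idea of a consensus classifier can be illustrated with a simple majority vote over per-tool calls. This sketch is not the CoVEC algorithm, whose integration rule is not described in the abstract; the tool names in the example are taken from the nine listed above purely for illustration, and the labels and function name are hypothetical.

```python
from collections import Counter


def consensus_call(predictions: dict) -> str:
    """Majority vote over per-tool labels; ties are reported as 'uncertain'.

    predictions maps a tool name to its label, e.g. 'damaging' or 'neutral'.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    ranked = Counter(predictions.values()).most_common()
    # A tie for first place means the tools give no clear majority.
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "uncertain"
    return ranked[0][0]


if __name__ == "__main__":
    # Hypothetical calls for a single non-synonymous variant.
    calls = {
        "SIFT": "damaging",
        "PolyPhen2": "damaging",
        "Mutation Assessor": "neutral",
        "Condel": "damaging",
    }
    print(consensus_call(calls))
```

    A weighted vote, in which each tool's call is scaled by its measured accuracy on a benchmark set, is a natural refinement of the same pattern.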

  17. Characterization of a novel HLA-A2 variant, A*0214, by ARMS-PCR and DNA sequencing

    SciTech Connect

    Krausa, P.; Bodmer, J.G.; Browning, M.J.; Barouch, D.; Hill, A.V.S.; McMichael, A.J.; Mason, C.

    1995-01-01

    With recent advances in DNA-based methods for typing HLA class I alleles, the discrimination of new variants not readily detectable by serological or biochemical means has become possible. We used an ARMS polymerase chain reaction (PCR)-based system that identifies HLA-A locus alleles to high resolution to type a Kenyan Black African individual. The unexpected result was indicative of a novel HLA-A*02 allele. Genomic DNA from this individual was analyzed by ARMS-PCR low-resolution typing, high resolution subtyping, and PCR gene mapping. Low-resolution typing identified the sample as HLA-A*02, A*33. High-resolution HLA-A*02 subtyping (data not published), however, suggested the sample was A*02 heterozygous (A*0205, A*0206). In view of the low-resolution HLA-A locus typing, this raised the possibility that the A*02 gene was a new HLA-A*02 variant that contained sequence motifs characteristic of both the A*0205 and A*0206 alleles. 6 refs., 1 fig., 2 tabs.

  18. Using Sequence Variants in Linkage Disequilibrium with Causative Mutations to Improve Across-Breed Prediction in Dairy Cattle: A Simulation Study.

    PubMed

    van den Berg, Irene; Boichard, Didier; Guldbrandtsen, Bernt; Lund, Mogens S

    2016-01-01

    Sequence data are expected to increase the reliability of genomic prediction by containing causative mutations directly, especially in cases where low linkage disequilibrium between markers and causative mutations limits prediction reliability, such as across-breed prediction in dairy cattle. In practice, the causative mutations are unknown, and prediction with only variants in perfect linkage disequilibrium with the causative mutations is not realistic, leading to a reduced reliability compared to knowing the causative variants. Our objective was to use sequence data to investigate its potential benefits for the prediction of genomic relationships, and consequently the reliability of genomic breeding values. We used sequence data from five dairy cattle breeds, and a larger number of imputed sequences for two of the five breeds. We focused on the influence of linkage disequilibrium between markers and causative mutations, and assumed that a fraction of the causative mutations was shared across breeds and had the same effect across breeds. By comparing the loss in reliability of different scenarios, varying the distance between markers and causative mutations, using either all genome-wide markers from commercial SNP chips, or only the markers closest to the causative mutations, we demonstrate the importance of using only variants very close to the causative mutations, especially for across-breed prediction. Rare variants improved prediction only if they were very close to rare causative mutations, and all causative mutations were rare. Our results show that sequence data can potentially improve genomic prediction, but careful selection of markers is essential. PMID:27317779

  19. Using Sequence Variants in Linkage Disequilibrium with Causative Mutations to Improve Across-Breed Prediction in Dairy Cattle: A Simulation Study

    PubMed Central

    van den Berg, Irene; Boichard, Didier; Guldbrandtsen, Bernt; Lund, Mogens S.

    2016-01-01

    Sequence data are expected to increase the reliability of genomic prediction by containing causative mutations directly, especially in cases where low linkage disequilibrium between markers and causative mutations limits prediction reliability, such as across-breed prediction in dairy cattle. In practice, the causative mutations are unknown, and prediction with only variants in perfect linkage disequilibrium with the causative mutations is not realistic, leading to a reduced reliability compared to knowing the causative variants. Our objective was to use sequence data to investigate its potential benefits for the prediction of genomic relationships, and consequently the reliability of genomic breeding values. We used sequence data from five dairy cattle breeds, and a larger number of imputed sequences for two of the five breeds. We focused on the influence of linkage disequilibrium between markers and causative mutations, and assumed that a fraction of the causative mutations was shared across breeds and had the same effect across breeds. By comparing the loss in reliability of different scenarios, varying the distance between markers and causative mutations, using either all genome-wide markers from commercial SNP chips, or only the markers closest to the causative mutations, we demonstrate the importance of using only variants very close to the causative mutations, especially for across-breed prediction. Rare variants improved prediction only if they were very close to rare causative mutations, and all causative mutations were rare. Our results show that sequence data can potentially improve genomic prediction, but careful selection of markers is essential. PMID:27317779

  20. Are sites with multiple single nucleotide variants in cancer genomes a consequence of drivers, hypermutable sites or sequencing errors?

    PubMed Central

    Carr, Antony M.

    2016-01-01

    Across independent cancer genomes it has been observed that some sites have been recurrently hit by single nucleotide variants (SNVs). Such recurrently hit sites might be either (i) drivers of cancer that are positively selected during oncogenesis, (ii) due to mutation rate variation, or (iii) due to sequencing and assembly errors. We have investigated the cause of recurrently hit sites in a dataset of >3 million SNVs from 507 complete cancer genome sequences. We find evidence that many sites have been hit significantly more often than one would expect by chance, even taking into account the effect of the adjacent nucleotides on the rate of mutation. We find that the density of these recurrently hit sites is higher in non-coding than coding DNA and hence conclude that most of them are unlikely to be drivers. We also find that most of them are found in parts of the genome that are not uniquely mappable and hence are likely to be due to mapping errors. In support of the error hypothesis, we find that recurrently hit sites are not randomly distributed across sequences from different laboratories. We fit a model to the data in which the rate of mutation is constant across sites but the rate of error varies. This model suggests that ∼4% of all SNVs are errors in this dataset, but that the rate of error varies by thousands-of-fold between sites. PMID:27688957
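
    The "more often than expected by chance" comparison can be made concrete with a very rough null model. The sketch below assumes that SNVs fall uniformly at random over a genome of about 3 × 10^9 sites (the genome size is an assumption, not a figure from the abstract) and uses a Poisson approximation; unlike the study, it ignores the effect of adjacent nucleotides. The function name `poisson_tail` is hypothetical.

```python
import math


def poisson_tail(k, lam, terms=50):
    """P(X >= k) for X ~ Poisson(lam), summed directly from the k-th term.

    Summing the upper tail avoids the cancellation in 1 - P(X < k) when the
    tail is tiny. The fixed number of terms is adequate for small lam only.
    """
    term = math.exp(-lam) * lam**k / math.factorial(k)
    total = 0.0
    for i in range(k, k + terms):
        total += term
        term *= lam / (i + 1)  # next Poisson term from the previous one
    return total


n_snvs = 3_000_000          # ">3 million SNVs" from the abstract
genome_sites = 3.0e9        # ASSUMED approximate genome size
lam = n_snvs / genome_sites  # expected hits per site under uniformity

# Expected number of sites hit at least twice if hits were purely random.
expected_recurrent = genome_sites * poisson_tail(2, lam)
print(f"expected sites with >= 2 hits under the null: {expected_recurrent:.0f}")
```

    An observed count of recurrently hit sites far above this expectation would point to drivers, mutation rate variation, or errors, which is the question the study then separates.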

  2. Vascular Ehlers-Danlos Syndrome in siblings with biallelic COL3A1 sequence variants and marked clinical variability in the extended family.

    PubMed

    Jørgensen, Agnete; Fagerheim, Toril; Rand-Hendriksen, Svend; Lunde, Per I; Vorren, Torgrim O; Pepin, Melanie G; Leistritz, Dru F; Byers, Peter H

    2015-06-01

    Vascular Ehlers-Danlos Syndrome (vEDS), also known as EDS type IV, is considered to be an autosomal dominant disorder caused by sequence variants in COL3A1, which encodes the chains of type III procollagen. We identified a family in which there was marked clinical variation with the earliest death due to extensive aortic dissection at age 15 years and other family members in their eighties with no complications. The proband was born with right-sided clubfoot but was otherwise healthy until he died unexpectedly at 15 years. His sister, in addition to signs consistent with vascular EDS, had bilateral frontal and parietal polymicrogyria. The proband and his sister each had two COL3A1 sequence variants, c.1786C>T, p.(Arg596*) in exon 26 and c.3851G>A, p.(Gly1284Glu) in exon 50 on different alleles. Cells from the compound heterozygote produced a reduced amount of type III procollagen, all the chains of which had abnormal electrophoretic mobility. Biallelic sequence variants have a significantly worse outcome than heterozygous variants for either null mutations or missense mutations, and frontoparietal polymicrogyria may be an added phenotype feature. This genetic constellation provides a very rare explanation for marked intrafamilial clinical variation due to sequence variants in COL3A1. PMID:25205403

  3. Vascular Ehlers–Danlos Syndrome in siblings with biallelic COL3A1 sequence variants and marked clinical variability in the extended family

    PubMed Central

    Jørgensen, Agnete; Fagerheim, Toril; Rand-Hendriksen, Svend; Lunde, Per I; Vorren, Torgrim O; Pepin, Melanie G; Leistritz, Dru F; Byers, Peter H

    2015-01-01

    Vascular Ehlers–Danlos Syndrome (vEDS), also known as EDS type IV, is considered to be an autosomal dominant disorder caused by sequence variants in COL3A1, which encodes the chains of type III procollagen. We identified a family in which there was marked clinical variation with the earliest death due to extensive aortic dissection at age 15 years and other family members in their eighties with no complications. The proband was born with right-sided clubfoot but was otherwise healthy until he died unexpectedly at 15 years. His sister, in addition to signs consistent with vascular EDS, had bilateral frontal and parietal polymicrogyria. The proband and his sister each had two COL3A1 sequence variants, c.1786C>T, p.(Arg596*) in exon 26 and c.3851G>A, p.(Gly1284Glu) in exon 50 on different alleles. Cells from the compound heterozygote produced a reduced amount of type III procollagen, all the chains of which had abnormal electrophoretic mobility. Biallelic sequence variants have a significantly worse outcome than heterozygous variants for either null mutations or missense mutations, and frontoparietal polymicrogyria may be an added phenotype feature. This genetic constellation provides a very rare explanation for marked intrafamilial clinical variation due to sequence variants in COL3A1. PMID:25205403

  4. Identification of Bari Transposons in 23 Sequenced Drosophila Genomes Reveals Novel Structural Variants, MITEs and Horizontal Transfer.

    PubMed

    Palazzo, Antonio; Lovero, Domenica; D'Addabbo, Pietro; Caizzi, Ruggiero; Marsano, René Massimiliano

    2016-01-01

    Bari elements are members of the Tc1-mariner superfamily of DNA transposons, originally discovered in Drosophila melanogaster, and subsequently identified in silico in 11 sequenced Drosophila genomes and as experimentally isolated in four non-sequenced Drosophila species. Bari-like elements have also been studied for their mobility both in vivo and in vitro. We analyzed 23 Drosophila genomes and carried out a detailed characterization of the Bari elements identified, including those from the heterochromatic Bari1 cluster in D. melanogaster. We have annotated 401 copies of Bari elements classified either as putatively autonomous or inactive according to the structure of the terminal sequences and the presence of a complete transposase-coding region. Analyses of the integration sites revealed that Bari transposase prefers AT-rich sequences in which the TA target is cleaved and duplicated. Furthermore, evaluation of transposon co-occurrence near the integration sites of Bari elements showed a non-random distribution of other transposable elements. We also unveil the existence of a putatively autonomous Bari1 variant characterized by two identical long Terminal Inverted Repeats, in D. rhopaloa. In addition, we detected MITEs related to Bari transposons in 9 species. Phylogenetic analyses based on the transposase gene and the terminal sequences confirmed that Bari-like elements are distributed into three subfamilies. A few inconsistencies in the Bari phylogenetic tree with respect to the Drosophila species tree could be explained by the occurrence of horizontal transfer events, as also suggested by the results of dS analyses. This study further clarifies the evolutionary dynamics of Bari transposons and increases our understanding of the biology of Tc1-mariner elements. PMID:27213270

  5. Identification of Bari Transposons in 23 Sequenced Drosophila Genomes Reveals Novel Structural Variants, MITEs and Horizontal Transfer

    PubMed Central

    D’Addabbo, Pietro; Caizzi, Ruggiero

    2016-01-01

    Bari elements are members of the Tc1-mariner superfamily of DNA transposons, originally discovered in Drosophila melanogaster, and subsequently identified in silico in 11 sequenced Drosophila genomes and as experimentally isolated in four non-sequenced Drosophila species. Bari-like elements have also been studied for their mobility both in vivo and in vitro. We analyzed 23 Drosophila genomes and carried out a detailed characterization of the Bari elements identified, including those from the heterochromatic Bari1 cluster in D. melanogaster. We have annotated 401 copies of Bari elements classified either as putatively autonomous or inactive according to the structure of the terminal sequences and the presence of a complete transposase-coding region. Analyses of the integration sites revealed that Bari transposase prefers AT-rich sequences in which the TA target is cleaved and duplicated. Furthermore, evaluation of transposon co-occurrence near the integration sites of Bari elements showed a non-random distribution of other transposable elements. We also unveil the existence of a putatively autonomous Bari1 variant characterized by two identical long Terminal Inverted Repeats, in D. rhopaloa. In addition, we detected MITEs related to Bari transposons in 9 species. Phylogenetic analyses based on the transposase gene and the terminal sequences confirmed that Bari-like elements are distributed into three subfamilies. A few inconsistencies in the Bari phylogenetic tree with respect to the Drosophila species tree could be explained by the occurrence of horizontal transfer events, as also suggested by the results of dS analyses. This study further clarifies the evolutionary dynamics of Bari transposons and increases our understanding of the biology of Tc1-mariner elements. PMID:27213270

  6. Distinct Acid Resistance and Survival Fitness Displayed by Curli Variants of Enterohemorrhagic Escherichia coli O157:H7

    PubMed Central

    Carter, Michelle Q.; Brandl, Maria T.; Louie, Jacqueline W.; Kyle, Jennifer L.; Carychao, Diana K.; Cooley, Michael B.; Parker, Craig T.; Bates, Anne H.; Mandrell, Robert E.

    2011-01-01

    Curli are adhesive fimbriae of Enterobacteriaceae and are involved in surface attachment, cell aggregation, and biofilm formation. Here, we report that both inter- and intrastrain variations in curli production are widespread in enterohemorrhagic Escherichia coli O157:H7. The relative proportions of curli-producing variants (C+) and curli-deficient variants (C−) in an E. coli O157:H7 cell population varied depending on the growth conditions. In variants derived from the 2006 U.S. spinach outbreak strains, the shift between the C+ and C− subpopulations occurred mostly in response to starvation and was unidirectional from C− to C+; in variants derived from the 1993 hamburger outbreak strains, the shift occurred primarily in response to oxygen depletion and was bidirectional. Furthermore, curli variants derived from the same strain displayed marked differences in survival fitness: C+ variants grew to higher concentrations in nutrient-limited conditions than C− variants, whereas C− variants were significantly more acid resistant than C+ variants. This difference in acid resistance does not appear to be linked to the curli fimbriae per se, since a csgA deletion mutant in either a C+ or a C− variant exhibited an acid resistance similar to that of its parental strain. Our data suggest that natural curli variants of E. coli O157:H7 carry several distinct physiological properties that are important for their environmental survival. Maintenance of curli variants in an E. coli O157:H7 population may provide a survival strategy in which C+ variants are selected in a nutrient-limited environment, whereas C− variants are selected in an acidic environment, such as the stomach of an animal host, including that of a human. PMID:21478320

  7. Kinetic and Sequence-Structure-Function Analysis of LinB Enzyme Variants with β- and δ-Hexachlorocyclohexane

    PubMed Central

    Kumari, Kirti; Sharma, Pooja; Lal, Rup; Oakeshott, John G.; Pandey, Gunjan

    2014-01-01

    The organochlorine insecticide hexachlorocyclohexane (HCH) has recently been classified as a ‘Persistent Organic Pollutant’ by the Stockholm Convention. The LinB haloalkane dehalogenase is a key upstream enzyme in the recently evolved Lin pathway for the catabolism of HCH in bacteria. Here we report a sequence-structure-function analysis of ten naturally occurring and thirteen synthetic mutants of LinB. One of the synthetic mutants was found to have ∼80-fold more activity for β- and δ-hexachlorocyclohexane. Based on detailed biophysical calculations, molecular dynamics and ensemble docking calculations, we propose that the latter variant is more active because of alterations to the shape of its active site and increased conformational plasticity. PMID:25076214

  8. Exome sequencing identifies recessive CDK5RAP2 variants in patients with isolated agenesis of corpus callosum.

    PubMed

    Jouan, Loubna; Ouled Amar Bencheikh, Bouchra; Daoud, Hussein; Dionne-Laporte, Alexandre; Dobrzeniecka, Sylvia; Spiegelman, Dan; Rochefort, Daniel; Hince, Pascale; Szuto, Anna; Lassonde, Maryse; Barbelanne, Marine; Tsang, William Y; Dion, Patrick A; Théoret, Hugo; Rouleau, Guy A

    2016-04-01

    Agenesis of the corpus callosum (ACC) is a common brain malformation which can be observed either as an isolated condition or as part of numerous congenital syndromes. Therefore, cognitive and neurological involvements in patients with ACC are variable, from mild linguistic and behavioral impairments to more severe neurological deficits. To date, the underlying genetic causes of isolated ACC remain elusive and causative genes have yet to be identified. We performed exome sequencing on three acallosal siblings from the same non-consanguineous family and identified compound heterozygous variants, p.[Gly94Arg];[Asn1232Ser], in the protein encoded by the CDK5RAP2 gene, also known as MCPH3, a gene previously reported to cause autosomal recessive primary microcephaly. Our findings suggest a novel role for this gene in the pathogenesis of isolated ACC. PMID:26197979

  9. A sequence variant of Staphylococcus hominis with a high prevalence of oxacillin and fluoroquinolone resistance.

    PubMed

    Fitzgibbon, J E; Nahvi, M D; Dubin, D T; John, J F

    2001-11-01

    A newly identified subspecies of Staphylococcus hominis, S. hominis subsp. novobiosepticus, was found to be the cause of several invasive infections at a hospital in New Jersey. This subspecies differs from classical S. hominis, now S. hominis subsp. hominis, by the phenotypic characteristics of novobiocin resistance and the inability to ferment trehalose. DNA sequences of segments of 16S rRNA, DNA gyrase (gyrA), and DNA topoisomerase IV (grlA) genes were determined for the type strains of the 2 subspecies, and for 34 S. hominis clinical isolates. The 16S rRNA sequences of the type strains differed at 3 positions over 410 bp; the grlA sequences differed at 6 positions over 119 bp. These sequence differences define S. hominis subsp. novobiosepticus and S. hominis subsp. hominis "sequevars." Of 34 S. hominis clinical isolates, 31 were S. hominis subsp. novobiosepticus sequevars, 28 of which were resistant to both oxacillin and ciprofloxacin. The clinical microbiology laboratory, using a MicroScan system, identified 7 of the 31 S. hominis subsp. novobiosepticus sequevars as S. hominis subsp. hominis on the basis of phenotypic characteristics. Three S. hominis subsp. hominis sequevars were all identified phenotypically as S. hominis subsp. hominis and were oxacillin- and ciprofloxacin-susceptible. Although the precise relationship between the S. hominis sequevars and their phenotypic subspecies remains to be determined, our results indicate that antibiotic-resistant clinical isolates of S. hominis belong almost exclusively to the S. hominis subsp. novobiosepticus sequevar.

  10. Complete Genome Sequence of the Porcine Epidemic Diarrhea Virus Variant CH/HNYF/2014.

    PubMed

    Li, Renfeng; Tian, Xiangqin; Qiao, Songlin; Guo, Junqing; Xie, Weitao; Zhang, Gaiping

    2015-01-01

    Sow's milk is a potential route for the vertical transmission of porcine epidemic diarrhea virus (PEDV) from sow to suckling piglet. We report here the complete genome sequence of PEDV strain CH/HNYF/2014, which was isolated from milk samples. This information provides further understanding of the transmission mechanisms and genetic diversity of PEDV. PMID:26679593

  11. Rare, Low-Frequency, and Common Variants in the Protein-Coding Sequence of Biological Candidate Genes from GWASs Contribute to Risk of Rheumatoid Arthritis

    PubMed Central

    Diogo, Dorothée; Kurreeman, Fina; Stahl, Eli A.; Liao, Katherine P.; Gupta, Namrata; Greenberg, Jeffrey D.; Rivas, Manuel A.; Hickey, Brendan; Flannick, Jason; Thomson, Brian; Guiducci, Candace; Ripke, Stephan; Adzhubey, Ivan; Barton, Anne; Kremer, Joel M.; Alfredsson, Lars; Sunyaev, Shamil; Martin, Javier; Zhernakova, Alexandra; Bowes, John; Eyre, Steve; Siminovitch, Katherine A.; Gregersen, Peter K.; Worthington, Jane; Klareskog, Lars; Padyukov, Leonid; Raychaudhuri, Soumya; Plenge, Robert M.

    2013-01-01

    The extent to which variants in the protein-coding sequence of genes contribute to risk of rheumatoid arthritis (RA) is unknown. In this study, we addressed this issue by deep exon sequencing and large-scale genotyping of 25 biological candidate genes located within RA risk loci discovered by genome-wide association studies (GWASs). First, we assessed the contribution of rare coding variants in the 25 genes to the risk of RA in a pooled sequencing study of 500 RA cases and 650 controls of European ancestry. We observed an accumulation of rare nonsynonymous variants exclusive to RA cases in IL2RA and IL2RB (burden test: p = 0.007 and p = 0.018, respectively). Next, we assessed the aggregate contribution of low-frequency and common coding variants to the risk of RA by dense genotyping of the 25 gene loci in 10,609 RA cases and 35,605 controls. We observed a strong enrichment of coding variants with a nominal signal of association with RA (p < 0.05) after adjusting for the best signal of association at the loci (p_enrichment = 6.4 × 10^−4). For one locus containing CD2, we found that a missense variant, rs699738 (c.798C>A [p.His266Gln]), and a noncoding variant, rs624988, reside on distinct haplotypes and independently contribute to the risk of RA (p = 4.6 × 10^−6). Overall, our results indicate that variants (distributed across the allele-frequency spectrum) within the protein-coding portion of a subset of biological candidate genes identified by GWASs contribute to the risk of RA. Further, we have demonstrated that very large sample sizes will be required for comprehensively identifying the independent alleles contributing to the missing heritability of RA. PMID:23261300

  12. The KL-VS sequence variant of Klotho and cancer risk in BRCA1 and BRCA2 mutation carriers

    PubMed Central

    Laitman, Yael; Kuchenbaecker, Karoline B.; Rantala, Johanna; Hogervorst, Frans; Peock, Susan; Godwin, Andrew K.; Arason, Adalgeir; Kirchhoff, Tomas; Offit, Kenneth; Isaacs, Claudine; Schmutzler, Rita K.; Wappenschmidt, Barbara; Nevanlinna, Heli; Chen, Xiaoqing; Chenevix-Trench, Georgia; Healey, Sue; Couch, Fergus; Peterlongo, Paolo; Radice, Paolo; Nathanson, Katherine L.; Caligo, Maria Adelaide; Neuhausen, Susan L.; Ganz, Patricia; Sinilnikova, Olga M.; McGuffog, Lesley; Easton, Douglas F.; Antoniou, Antonis C.; Wolf, Ido

    2012-01-01

    Klotho (KL) is a putative tumor suppressor gene in breast and pancreatic cancers located at chromosome 13q12. A functional sequence variant of Klotho (KL-VS) was previously reported to modify breast cancer risk in Jewish BRCA1 mutation carriers. The effect of this variant on breast and ovarian cancer risks in non-Jewish BRCA1/BRCA2 mutation carriers has not been reported. The KL-VS variant was genotyped in women of European ancestry carrying a BRCA mutation: 5,741 BRCA1 mutation carriers (2,997 with breast cancer, 705 with ovarian cancer, and 2,039 cancer free women) and 3,339 BRCA2 mutation carriers (1,846 with breast cancer, 207 with ovarian cancer, and 1,286 cancer free women) from 16 centers. Genotyping was accomplished using TaqMan® allelic discrimination or matrix-assisted laser desorption/ionization time-of-flight mass spectrometry. Data were analyzed within a retrospective cohort approach, stratified by country of origin and Ashkenazi Jewish origin. The per-allele hazard ratio (HR) for breast cancer was 1.02 (95% CI 0.93–1.12, P = 0.66) for BRCA1 mutation carriers and 0.92 (95% CI 0.82–1.04, P = 0.17) for BRCA2 mutation carriers. Results remained unaltered when analysis excluded prevalent breast cancer cases. Similarly, the per-allele HR for ovarian cancer was 1.01 (95% CI 0.84–1.20, P = 0.95) for BRCA1 mutation carriers and 0.9 (95% CI 0.66–1.22, P = 0.45) for BRCA2 mutation carriers. The risk did not change when carriers of the 6174delT mutation were excluded. There was a lack of association of the KL-VS Klotho variant with either breast or ovarian cancer risk in BRCA1 and BRCA2 mutation carriers. PMID:22212556
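
    The reported hazard ratios, confidence intervals and P values can be cross-checked with a short calculation. The sketch below assumes the intervals are Wald intervals that are symmetric on the log scale (an assumption, since the abstract does not describe the interval method) and recovers an approximate two-sided P value from each HR and its 95% CI. The function name `p_from_hr_ci` is hypothetical.

```python
import math


def p_from_hr_ci(hr, lower, upper, z_crit=1.959964):
    """Approximate two-sided P value from a hazard ratio and its 95% CI.

    Assumes a Wald interval on the log scale: log(HR) +/- z_crit * SE.
    Returns (z, p).
    """
    se = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    z = math.log(hr) / se
    # Two-sided normal tail: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Per-allele breast cancer estimates quoted in the abstract.
z1, p1 = p_from_hr_ci(1.02, 0.93, 1.12)  # BRCA1 carriers, reported P = 0.66
z2, p2 = p_from_hr_ci(0.92, 0.82, 1.04)  # BRCA2 carriers, reported P = 0.17
print(f"BRCA1: z = {z1:.2f}, p = {p1:.2f}")
print(f"BRCA2: z = {z2:.2f}, p = {p2:.2f}")
```

    Under these assumptions the recovered values land close to the reported ones; small differences are expected because the published HR and CI bounds are rounded to two decimals.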

  13. Quantum-Sequencing: Biophysics of quantum tunneling through nucleic acids

    NASA Astrophysics Data System (ADS)

    Casamada Ribot, Josep; Chatterjee, Anushree; Nagpal, Prashant

    2014-03-01

    Tunneling microscopy and spectroscopy have been used extensively in the physical surface sciences to study quantum tunneling, to measure the electronic local density of states of nanomaterials, and to characterize adsorbed species. Quantum-Sequencing (Q-Seq) is a new method based on tunneling microscopy for electronic sequencing of single molecules of nucleic acids. A major goal of third-generation sequencing technologies is to develop a fast, reliable, enzyme-free single-molecule sequencing method. Here, we present the unique "electronic fingerprints" for all nucleotides on DNA and RNA using Q-Seq, along with their intrinsic biophysical parameters. We have analyzed tunneling spectra for the nucleotides at different pH conditions and analyzed the HOMO, LUMO and energy gap for all of them. In addition, we show a number of biophysical parameters to further characterize all nucleobases (electron and hole transition voltage and energy barriers). These results highlight the robustness of Q-Seq as a technique for next-generation sequencing.

  14. Enhanced sensitivity to neutralizing antibodies in a variant of equine infectious anemia virus is linked to amino acid substitutions in the surface unit envelope glycoprotein.

    PubMed Central

    Cook, R F; Berger, S L; Rushlow, K E; McManus, J M; Cook, S J; Harrold, S; Raabe, M L; Montelaro, R C; Issel, C J

    1995-01-01

    Serial passage of the prototype (PR) cell-adapted Wyoming strain of equine infectious anemia virus (EIAV) in fetal donkey dermal (FDD) rather than fetal horse (designated fetal equine kidney [FEK]) cell cultures resulted in the generation of a variant virus strain which produced accelerated cytopathic effects in FDD cells and was 100- to 1,000-fold more sensitive to neutralizing antibodies than its parent. This neutralization-sensitive variant was designated the FDD strain. Although there were differences in glycosylation between the PR and FDD strains, passage of the FDD virus in FEK cells did not reduce its sensitivity to neutralizing antibody. Nucleotide sequencing of the region encoding the surface unit (SU) protein from the FDD strain revealed nine amino acid substitutions compared with the PR strain. Two of these substitutions resulted in changes in the polarity of charge, four caused the introduction of a charged residue, and three had no net change in charge. Nucleotide sequence analysis was extended to the region of the FDD virus genome encoding the extracellular domain of the transmembrane envelope glycoprotein (TM). Unlike the situation with the FDD virus coding region, there were minor variations in nucleotide sequence between individual molecular clones containing this region of the TM gene. Although each clone contained three nucleotide substitutions compared with the PR strain, only one of these was common to all, and this did not affect the amino acid content. Of the remaining two nucleotide substitutions, only one resulted in an amino acid change, and in each case, this change appeared to be conservative. To determine if amino acid substitutions in the SU protein of FDD cell-grown viruses were responsible for the enhanced sensitivity to neutralizing antibodies, chimeric viruses were constructed by using an infectious molecular clone of EIAV. These chimeric viruses contained all of the amino acid substitutions found in the FDD virus strain and were

  15. Sequence variants in the PTCH1 gene associate with spine bone mineral density and osteoporotic fractures.

    PubMed

    Styrkarsdottir, Unnur; Thorleifsson, Gudmar; Gudjonsson, Sigurjon A; Sigurdsson, Asgeir; Center, Jacqueline R; Lee, Seung Hun; Nguyen, Tuan V; Kwok, Timothy C Y; Lee, Jenny S W; Ho, Suzanne C; Woo, Jean; Leung, Ping-C; Kim, Beom-Jun; Rafnar, Thorunn; Kiemeney, Lambertus A; Ingvarsson, Thorvaldur; Koh, Jung-Min; Tang, Nelson L S; Eisman, John A; Christiansen, Claus; Sigurdsson, Gunnar; Thorsteinsdottir, Unnur; Stefansson, Kari

    2016-01-01

    Bone mineral density (BMD) is a measure of osteoporosis and is useful in evaluating the risk of fracture. In a genome-wide association study of BMD among 20,100 Icelanders, with follow-up in 10,091 subjects of European and East-Asian descent, we found a new BMD locus that harbours the PTCH1 gene, represented by rs28377268 (freq. 11.4-22.6%) that associates with reduced spine BMD (P=1.0 × 10^-11, β=-0.09). We also identified a new spine BMD signal in RSPO3, rs577721086 (freq. 6.8%), that associates with increased spine BMD (P=6.6 × 10^-10, β=0.14). Importantly, both variants associate with osteoporotic fractures and affect expression of the PTCH1 and RSPO3 genes that is in line with their influence on BMD and known biological function of these genes. Additional new BMD signals were also found at the AXIN1 and SOST loci and a new lead SNP at the EN1 locus. PMID:26733130

  16. Sequence variants at CHRNB3-CHRNA6 and CYP2A6 affect smoking behavior

    PubMed Central

    Thorgeirsson, Thorgeir E.; Gudbjartsson, Daniel F.; Surakka, Ida; Vink, Jacqueline M.; Amin, Najaf; Geller, Frank; Sulem, Patrick; Rafnar, Thorunn; Esko, Tõnu; Walter, Stefan; Gieger, Christian; Rawal, Rajesh; Mangino, Massimo; Prokopenko, Inga; Mägi, Reedik; Keskitalo, Kaisu; Gudjonsdottir, Iris H.; Gretarsdottir, Solveig; Stefansson, Hreinn; Thompson, John R.; Aulchenko, Yurii S.; Nelis, Mari; Aben, Katja K.; den Heijer, Martin; Dirksen, Asger; Ashraf, Haseem; Soranzo, Nicole; Valdes, Ana M; Steves, Claire; Uitterlinden, André G; Hofman, Albert; Tönjes, Anke; Kovacs, Peter; Hottenga, Jouke Jan; Willemsen, Gonneke; Vogelzangs, Nicole; Döring, Angela; Dahmen, Norbert; Nitz, Barbara; Pergadia, Michele L.; Saez, Berta; De Diego, Veronica; Lezcano, Victoria; Garcia-Prats, Maria D.; Ripatti, Samuli; Perola, Markus; Kettunen, Johannes; Hartikainen, Anna-Liisa; Pouta, Anneli; Laitinen, Jaana; Isohanni, Matti; Huei-Yi, Shen; Allen, Maxine; Krestyaninova, Maria; Hall, Alistair S; Jones, Gregory T.; van Rij, Andre M.; Mueller, Thomas; Dieplinger, Benjamin; Haltmayer, Meinhard; Jonsson, Steinn; Matthiasson, Stefan E.; Oskarsson, Hogni; Tyrfingsson, Thorarinn; Kiemeney, Lambertus A.; Mayordomo, Jose I.; Lindholt, Jes S; Pedersen, Jesper Holst; Franklin, Wilbur A.; Wolf, Holly; Montgomery, Grant W.; Heath, Andrew C.; Martin, Nicholas G.; Madden, Pamela A.F.; Giegling, Ina; Rujescu, Dan; Järvelin, Marjo-Riitta; Salomaa, Veikko; Stumvoll, Michael; Spector, Tim D; Wichmann, H-Erich; Metspalu, Andres; Samani, Nilesh J.; Penninx, Brenda W.; Oostra, Ben A.; Boomsma, Dorret I.; Tiemeier, Henning; van Duijn, Cornelia M.; Kaprio, Jaakko; Gulcher, Jeffrey R.; McCarthy, Mark I.; Peltonen, Leena; Thorsteinsdottir, Unnur; Stefansson, Kari

    2011-01-01

    Smoking is a risk factor for most of the diseases leading in mortality. We conducted genome-wide association (GWA) meta-analyses of smoking data within the ENGAGE consortium to search for common alleles associating with the number of cigarettes smoked per day (CPD) in smokers (N=31,266) and smoking initiation (N=46,481). We tested selected SNPs in a second stage (N=45,691 smokers), and assessed some in a third sample (N=9,040). Variants in three genomic regions associated with CPD (P < 5 × 10^−8), including previously identified SNPs at 15q25 represented by rs1051730-A (0.80 CPD, P=2.4 × 10^−69), and SNPs at 19q13 and 8p11, represented by rs4105144-C (0.39 CPD, P=2.2 × 10^−12) and rs6474412-T (0.29 CPD, P=1.4 × 10^−8), respectively. Among the genes at the two novel loci are genes encoding nicotine-metabolizing enzymes (CYP2A6 and CYP2B6) and nicotinic acetylcholine receptor subunits (CHRNB3 and CHRNA6) highlighted in previous studies of nicotine dependence. Nominal associations with lung cancer were observed at both 8p11 (rs6474412-T, OR=1.09, P=0.04) and 19q13 (rs4105144-C, OR=1.12, P=0.0006). PMID:20418888

  17. Sequence variants in the PTCH1 gene associate with spine bone mineral density and osteoporotic fractures

    PubMed Central

    Styrkarsdottir, Unnur; Thorleifsson, Gudmar; Gudjonsson, Sigurjon A.; Sigurdsson, Asgeir; Center, Jacqueline R.; Lee, Seung Hun; Nguyen, Tuan V.; Kwok, Timothy C.Y.; Lee, Jenny S.W.; Ho, Suzanne C.; Woo, Jean; Leung, Ping-C.; Kim, Beom-Jun; Rafnar, Thorunn; Kiemeney, Lambertus A.; Ingvarsson, Thorvaldur; Koh, Jung-Min; Tang, Nelson L.S.; Eisman, John A.; Christiansen, Claus; Sigurdsson, Gunnar; Thorsteinsdottir, Unnur; Stefansson, Kari

    2016-01-01

    Bone mineral density (BMD) is a measure of osteoporosis and is useful in evaluating the risk of fracture. In a genome-wide association study of BMD among 20,100 Icelanders, with follow-up in 10,091 subjects of European and East-Asian descent, we found a new BMD locus that harbours the PTCH1 gene, represented by rs28377268 (freq. 11.4–22.6%) that associates with reduced spine BMD (P=1.0 × 10^−11, β=−0.09). We also identified a new spine BMD signal in RSPO3, rs577721086 (freq. 6.8%), that associates with increased spine BMD (P=6.6 × 10^−10, β=0.14). Importantly, both variants associate with osteoporotic fractures and affect expression of the PTCH1 and RSPO3 genes that is in line with their influence on BMD and known biological function of these genes. Additional new BMD signals were also found at the AXIN1 and SOST loci and a new lead SNP at the EN1 locus. PMID:26733130

  18. Sequence variants in the PTCH1 gene associate with spine bone mineral density and osteoporotic fractures.

    PubMed

    Styrkarsdottir, Unnur; Thorleifsson, Gudmar; Gudjonsson, Sigurjon A; Sigurdsson, Asgeir; Center, Jacqueline R; Lee, Seung Hun; Nguyen, Tuan V; Kwok, Timothy C Y; Lee, Jenny S W; Ho, Suzanne C; Woo, Jean; Leung, Ping-C; Kim, Beom-Jun; Rafnar, Thorunn; Kiemeney, Lambertus A; Ingvarsson, Thorvaldur; Koh, Jung-Min; Tang, Nelson L S; Eisman, John A; Christiansen, Claus; Sigurdsson, Gunnar; Thorsteinsdottir, Unnur; Stefansson, Kari

    2016-01-01

    Bone mineral density (BMD) is a measure of osteoporosis and is useful in evaluating the risk of fracture. In a genome-wide association study of BMD among 20,100 Icelanders, with follow-up in 10,091 subjects of European and East-Asian descent, we found a new BMD locus that harbours the PTCH1 gene, represented by rs28377268 (freq. 11.4-22.6%) that associates with reduced spine BMD (P=1.0 × 10(-11), β=-0.09). We also identified a new spine BMD signal in RSPO3, rs577721086 (freq. 6.8%), that associates with increased spine BMD (P=6.6 × 10(-10), β=0.14). Importantly, both variants associate with osteoporotic fractures and affect expression of the PTCH1 and RSPO3 genes that is in line with their influence on BMD and known biological function of these genes. Additional new BMD signals were also found at the AXIN1 and SOST loci and a new lead SNP at the EN1 locus.

  19. Sequence variant on 8q24 confers susceptibility to urinary bladder cancer

    PubMed Central

    Kiemeney, Lambertus A.; Thorlacius, Steinunn; Sulem, Patrick; Geller, Frank; Aben, Katja K.H.; Stacey, Simon N.; Gudmundsson, Julius; Jakobsdottir, Margret; Bergthorsson, Jon T.; Sigurdsson, Asgeir; Blondal, Thorarinn; Witjes, J. Alfred; Vermeulen, Sita H.; Hulsbergen-van de Kaa, Christina A.; Swinkels, Dorine W.; Ploeg, Martine; Cornel, Erik B.; Vergunst, Henk; Thorgeirsson, Thorgeir E.; Gudbjartsson, Daniel; Gudjonsson, Sigurjon A.; Thorleifsson, Gudmar; Kristinsson, Kari T.; Mouy, Magali; Snorradottir, Steinunn; Placidi, Donatella; Campagna, Marcello; Arici, Cecilia; Koppova, Kvetoslava; Gurzau, Eugene; Rudnai, Peter; Kellen, Eliane; Polidoro, Silvia; Guarrera, Simonetta; Sacerdote, Carlotta; Sanchez, Manuel; Saez, Berta; Valdivia, Gabriel; Ryk, Charlotta; de Verdier, Petra; Lindblom, Annika; Golka, Klaus; Bishop, D. Timothy; Knowles, Margaret A.; Nikulasson, Sigfus; Petursdottir, Vigdis; Jonsson, Eirikur; Geirsson, Gudmundur; Kristjansson, Baldvin; Mayordomo, Jose I.; Steineck, Gunnar; Porru, Stefano; Buntinx, Frank; Zeegers, Maurice P.; Fletcher, Tony; Kumar, Rajiv; Matullo, Giuseppe; Vineis, Paolo; Kiltie, Anne E.; Gulcher, Jeffrey R.; Thorsteinsdottir, Unnur; Kong, Augustine; Rafnar, Thorunn; Stefansson, Kari

    2015-01-01

    We conducted a genome wide SNP association study on 1,803 Urinary Bladder Cancer (UBC) cases and 34,336 controls from Iceland and the Netherlands and follow up studies in seven additional case control groups (2,165 cases and 3,800 controls). The strongest association was observed with allele T of rs9642880 on chromosome 8q24, 30 kb upstream of the c-Myc gene (allele specific OR=1.22; P=9.34 × 10^−12). Approximately 20% of individuals of European ancestry are homozygous for rs9642880 (T) and their estimated risk of developing UBC is 1.49 times that of non-carriers with population attributable risk (PAR) of 17%. No association was observed between UBC and the four 8q24 variants previously associated with prostate, colorectal and breast cancers, nor did rs9642880 associate with any of these three cancers. A weaker signal, but nonetheless of genome wide significance, was captured by rs710521 (A) located near the TP63 gene on chromosome 3q28 (allele specific OR=1.19; P=1.15 × 10^−7). PMID:18794855

  20. Genome-wide association study for endocrine fertility traits using single nucleotide polymorphism arrays and sequence variants in dairy cattle.

    PubMed

    Tenghe, A M M; Bouwman, A C; Berglund, B; Strandberg, E; de Koning, D J; Veerkamp, R F

    2016-07-01

    Endocrine fertility traits, which are defined from progesterone concentration levels in milk, are interesting indicators of dairy cow fertility because they more directly reflect the cow's own reproductive physiology than classical fertility traits, which are more biased by farm management decisions. The aim of this study was to detect quantitative trait loci (QTL) for 7 endocrine fertility traits in dairy cows by performing a genome-wide association study with 85k single nucleotide polymorphisms (SNP), and then fine-map targeted QTL regions, using imputed sequence variants. Two classical fertility traits were also analyzed for QTL with 85k SNP. The association between a SNP and a phenotype was assessed by single-locus regression for each SNP, using a linear mixed model that included a random polygenic effect. A total of 2,447 Holstein Friesian cows with 5,339 lactations with both phenotypes and genotypes were used for association analysis. Heritability estimates ranged from 0.09 to 0.15 for endocrine fertility traits and 0.03 to 0.10 for classical fertility traits. The genome-wide association study identified 17 QTL regions for endocrine fertility traits on Bos taurus autosomes (BTA) 2, 3, 8, 12, 15, 17, 23, and 25. The highest number (5) of QTL regions from the genome-wide association study was identified for the endocrine trait "proportion of samples with luteal activity." Overlapping QTL regions were found between endocrine traits on BTA 2, 3, and 17. For the classical trait calving to first service, 3 QTL regions were identified on BTA 3, 15, and 23, and an overlapping region was identified on BTA 23 with endocrine traits. Fine-mapping target regions for the endocrine traits on BTA 2 and 3 using imputed sequence variants confirmed the QTL from the genome-wide association study, and identified several associated variants that can contribute to an index of markers for genetic improvement of fertility. Several potential candidate genes underlying endocrine

  1. Whole-genome re-sequencing for the identification of high contribution susceptibility gene variants in patients with type 2 diabetes.

    PubMed

    Sun, Xiaojuan; Sui, Weiguo; Wang, Xiaobing; Hou, Xianliang; Ou, Minglin; Dai, Yong; Xiang, Yueying

    2016-05-01

    There is increasing evidence that several genes are associated with an increased risk of type 2 diabetes (T2D); genome-wide association investigations and whole-genome re-sequencing investigations offer a useful approach for the identification of genes involved in common human diseases. To further investigate which polymorphisms confer susceptibility to T2D, the present study screened for high-contribution susceptibility gene variants in Chinese patients with T2D using whole-genome re-sequencing with DNA pooling. In total, 100 Chinese individuals with T2D and 100 healthy Chinese individuals were analyzed by whole-genome re-sequencing with DNA pooling. To minimize the likelihood of systematic bias in sampling, paired-end libraries with an insert size of 500 bp were prepared for all samples, which were then subjected to whole-genome sequencing. Each library contained four lanes. The average sequencing depth was 35.70. In the present study, 1.36 GB of clean sequence data were generated, and the resulting calculated T2D genome consensus sequence covered 99.88% of the hg19 sequence. A total of 3,974,307 single nucleotide polymorphisms were identified, of which 99.88% were in the dbSNP database. The present study also found 642,189 insertions and deletions, 5,590 structure variants (SVs), 4,713 copy number variants (CNVs) and 13,049 single nucleotide variants. A total of 1,884 somatic CNVs and 74 somatic SVs were significantly different between the cases and controls. Therefore, the present study provided validation of whole-genome re-sequencing using the DNA pooling approach. It also generated a whole-genome re-sequencing genotype database for future investigations of T2D. PMID:27035118

  2. Whole-genome re-sequencing for the identification of high contribution susceptibility gene variants in patients with type 2 diabetes

    PubMed Central

    SUN, XIAOJUAN; SUI, WEIGUO; WANG, XIAOBING; HOU, XIANLIANG; OU, MINGLIN; DAI, YONG; XIANG, YUEYING

    2016-01-01

    There is increasing evidence that several genes are associated with an increased risk of type 2 diabetes (T2D); genome-wide association investigations and whole-genome re-sequencing investigations offer a useful approach for the identification of genes involved in common human diseases. To further investigate which polymorphisms confer susceptibility to T2D, the present study screened for high-contribution susceptibility gene variants in Chinese patients with T2D using whole-genome re-sequencing with DNA pooling. In total, 100 Chinese individuals with T2D and 100 healthy Chinese individuals were analyzed by whole-genome re-sequencing with DNA pooling. To minimize the likelihood of systematic bias in sampling, paired-end libraries with an insert size of 500 bp were prepared for all samples, which were then subjected to whole-genome sequencing. Each library contained four lanes. The average sequencing depth was 35.70. In the present study, 1.36 GB of clean sequence data were generated, and the resulting calculated T2D genome consensus sequence covered 99.88% of the hg19 sequence. A total of 3,974,307 single nucleotide polymorphisms were identified, of which 99.88% were in the dbSNP database. The present study also found 642,189 insertions and deletions, 5,590 structure variants (SVs), 4,713 copy number variants (CNVs) and 13,049 single nucleotide variants. A total of 1,884 somatic CNVs and 74 somatic SVs were significantly different between the cases and controls. Therefore, the present study provided validation of whole-genome re-sequencing using the DNA pooling approach. It also generated a whole-genome re-sequencing genotype database for future investigations of T2D. PMID:27035118

  3. VNTRseek—a computational tool to detect tandem repeat variants in high-throughput sequencing data

    PubMed Central

    Gelfand, Yevgeniy; Hernandez, Yozen; Loving, Joshua; Benson, Gary

    2014-01-01

    DNA tandem repeats (TRs) are ubiquitous genomic features which consist of two or more adjacent copies of an underlying pattern sequence. The copies may be identical or approximate. Variable number of tandem repeats or VNTRs are polymorphic TR loci in which the number of pattern copies is variable. In this paper we describe VNTRseek, our software for discovery of minisatellite VNTRs (pattern size ≥ 7 nucleotides) using whole genome sequencing data. VNTRseek maps sequencing reads to a set of reference TRs and then identifies putative VNTRs based on a discrepancy between the copy number of a reference and its mapped reads. VNTRseek was used to analyze the Watson and Khoisan genomes (454 technology) and two 1000 Genomes family trios (Illumina). In the Watson genome, we identified 752 VNTRs with pattern sizes ranging from 7 to 84 nt. In the Khoisan genome, we identified 2572 VNTRs with pattern sizes ranging from 7 to 105 nt. In the trios, we identified between 2660 and 3822 VNTRs per individual and found nearly 100% consistency with Mendelian inheritance. VNTRseek is, to the best of our knowledge, the first software for genome-wide detection of minisatellite VNTRs. It is available at http://orca.bu.edu/vntrseek/. PMID:25056320
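
    The detection idea in this abstract (compare the copy number of a reference TR with the copy numbers seen in the reads mapped to it) can be illustrated with a minimal sketch. This is not VNTRseek's actual implementation: the function name, the rounding of copy numbers to whole copies, and the `min_support` read threshold are all assumptions made for illustration.

```python
from collections import Counter


def putative_vntr_alleles(ref_copies, read_copies, min_support=2):
    """Return non-reference copy numbers supported by enough mapped reads.

    ref_copies  -- copy number of the pattern in the reference TR
    read_copies -- copy numbers observed in reads mapped to that TR
    min_support -- hypothetical minimum number of reads per allele
    """
    ref = round(ref_copies)
    # Count how many reads support each whole-number copy count.
    support = Counter(round(c) for c in read_copies)
    # A locus is a putative VNTR if some well-supported count differs
    # from the reference count.
    return sorted(c for c, n in support.items() if c != ref and n >= min_support)


# Reference has 5 copies; two reads show 6 copies, one read shows 4.
alleles = putative_vntr_alleles(5.0, [5.0, 5.1, 6.0, 6.1, 4.0])
print(alleles)
```

    Requiring more than one supporting read is one simple way to keep a single mis-mapped or error-containing read from producing a call; a real tool would need considerably more careful filtering.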

  4. cDNA sequences of variant forms of human placenta diamine oxidase

    SciTech Connect

    Zhang, X.; Kim, J.; McIntire, S.

    1995-08-01

    Genes for two forms of human placenta diamine oxidase (dao) were cloned from a cDNA library and sequenced. One gene, pdao1, is identical in length to human kidney dao but differs from it by two bases in the coding region and differs slightly in the 3′- and 5′-noncoding regions. The second gene, pdao2, is nearly identical to these genes in the coding region, except that it has an extra 57-nucleotide coding segment near the 3′ end of this region. This segment corresponds to the contiguous sequence of the 3′ end of intron 3 of human kidney dao. pdao2 also differs significantly from pdao1 and human kidney dao in a 13-base sequence in the t′-noncoding region. It is proposed that pdao1 and human kidney dao are polymorphic forms of the same allele. Whether pdao2 is a polymorph of these two is not certain, because of the significant differences in the coding and noncoding regions. pdao2 may represent a different allele. 21 refs., 2 figs.

  5. Identification of functional variants for cleft lip with or without cleft palate in or near PAX7, FGFR2, and NOG by targeted sequencing of GWAS loci.

    PubMed

    Leslie, Elizabeth J; Taub, Margaret A; Liu, Huan; Steinberg, Karyn Meltz; Koboldt, Daniel C; Zhang, Qunyuan; Carlson, Jenna C; Hetmanski, Jacqueline B; Wang, Hang; Larson, David E; Fulton, Robert S; Kousa, Youssef A; Fakhouri, Walid D; Naji, Ali; Ruczinski, Ingo; Begum, Ferdouse; Parker, Margaret M; Busch, Tamara; Standley, Jennifer; Rigdon, Jennifer; Hecht, Jacqueline T; Scott, Alan F; Wehby, George L; Christensen, Kaare; Czeizel, Andrew E; Deleyiannis, Frederic W-B; Schutte, Brian C; Wilson, Richard K; Cornell, Robert A; Lidral, Andrew C; Weinstock, George M; Beaty, Terri H; Marazita, Mary L; Murray, Jeffrey C

    2015-03-01

    Although genome-wide association studies (GWASs) for nonsyndromic orofacial clefts have identified multiple strongly associated regions, the causal variants are unknown. To address this, we selected 13 regions from GWASs and other studies, performed targeted sequencing in 1,409 Asian and European trios, and carried out a series of statistical and functional analyses. Within a cluster of strongly associated common variants near NOG, we found that one, rs227727, disrupts enhancer activity. We furthermore identified significant clusters of non-coding rare variants near NTN1 and NOG and found several rare coding variants likely to affect protein function, including four nonsense variants in ARHGAP29. We confirmed 48 de novo mutations and, based on best biological evidence available, chose two of these for functional assays. One mutation in PAX7 disrupted the DNA binding of the encoded transcription factor in an in vitro assay. The second, a non-coding mutation, disrupted the activity of a neural crest enhancer downstream of FGFR2 both in vitro and in vivo. This targeted sequencing study provides strong functional evidence implicating several specific variants as primary contributory risk alleles for nonsyndromic clefting in humans.

  6. Identification of Functional Variants for Cleft Lip with or without Cleft Palate in or near PAX7, FGFR2, and NOG by Targeted Sequencing of GWAS Loci

    PubMed Central

    Leslie, Elizabeth J.; Taub, Margaret A.; Liu, Huan; Steinberg, Karyn Meltz; Koboldt, Daniel C.; Zhang, Qunyuan; Carlson, Jenna C.; Hetmanski, Jacqueline B.; Wang, Hang; Larson, David E.; Fulton, Robert S.; Kousa, Youssef A.; Fakhouri, Walid D.; Naji, Ali; Ruczinski, Ingo; Begum, Ferdouse; Parker, Margaret M.; Busch, Tamara; Standley, Jennifer; Rigdon, Jennifer; Hecht, Jacqueline T.; Scott, Alan F.; Wehby, George L.; Christensen, Kaare; Czeizel, Andrew E.; Deleyiannis, Frederic W.-B.; Schutte, Brian C.; Wilson, Richard K.; Cornell, Robert A.; Lidral, Andrew C.; Weinstock, George M.; Beaty, Terri H.; Marazita, Mary L.; Murray, Jeffrey C.

    2015-01-01

    Although genome-wide association studies (GWASs) for nonsyndromic orofacial clefts have identified multiple strongly associated regions, the causal variants are unknown. To address this, we selected 13 regions from GWASs and other studies, performed targeted sequencing in 1,409 Asian and European trios, and carried out a series of statistical and functional analyses. Within a cluster of strongly associated common variants near NOG, we found that one, rs227727, disrupts enhancer activity. We furthermore identified significant clusters of non-coding rare variants near NTN1 and NOG and found several rare coding variants likely to affect protein function, including four nonsense variants in ARHGAP29. We confirmed 48 de novo mutations and, based on best biological evidence available, chose two of these for functional assays. One mutation in PAX7 disrupted the DNA binding of the encoded transcription factor in an in vitro assay. The second, a non-coding mutation, disrupted the activity of a neural crest enhancer downstream of FGFR2 both in vitro and in vivo. This targeted sequencing study provides strong functional evidence implicating several specific variants as primary contributory risk alleles for nonsyndromic clefting in humans. PMID:25704602

  7. Association of low-frequency and rare coding-sequence variants with blood lipids and coronary heart disease in 56,000 whites and blacks.

    PubMed

    Peloso, Gina M; Auer, Paul L; Bis, Joshua C; Voorman, Arend; Morrison, Alanna C; Stitziel, Nathan O; Brody, Jennifer A; Khetarpal, Sumeet A; Crosby, Jacy R; Fornage, Myriam; Isaacs, Aaron; Jakobsdottir, Johanna; Feitosa, Mary F; Davies, Gail; Huffman, Jennifer E; Manichaikul, Ani; Davis, Brian; Lohman, Kurt; Joon, Aron Y; Smith, Albert V; Grove, Megan L; Zanoni, Paolo; Redon, Valeska; Demissie, Serkalem; Lawson, Kim; Peters, Ulrike; Carlson, Christopher; Jackson, Rebecca D; Ryckman, Kelli K; Mackey, Rachel H; Robinson, Jennifer G; Siscovick, David S; Schreiner, Pamela J; Mychaleckyj, Josyf C; Pankow, James S; Hofman, Albert; Uitterlinden, Andre G; Harris, Tamara B; Taylor, Kent D; Stafford, Jeanette M; Reynolds, Lindsay M; Marioni, Riccardo E; Dehghan, Abbas; Franco, Oscar H; Patel, Aniruddh P; Lu, Yingchang; Hindy, George; Gottesman, Omri; Bottinger, Erwin P; Melander, Olle; Orho-Melander, Marju; Loos, Ruth J F; Duga, Stefano; Merlini, Piera Angelica; Farrall, Martin; Goel, Anuj; Asselta, Rosanna; Girelli, Domenico; Martinelli, Nicola; Shah, Svati H; Kraus, William E; Li, Mingyao; Rader, Daniel J; Reilly, Muredach P; McPherson, Ruth; Watkins, Hugh; Ardissino, Diego; Zhang, Qunyuan; Wang, Judy; Tsai, Michael Y; Taylor, Herman A; Correa, Adolfo; Griswold, Michael E; Lange, Leslie A; Starr, John M; Rudan, Igor; Eiriksdottir, Gudny; Launer, Lenore J; Ordovas, Jose M; Levy, Daniel; Chen, Y-D Ida; Reiner, Alexander P; Hayward, Caroline; Polasek, Ozren; Deary, Ian J; Borecki, Ingrid B; Liu, Yongmei; Gudnason, Vilmundur; Wilson, James G; van Duijn, Cornelia M; Kooperberg, Charles; Rich, Stephen S; Psaty, Bruce M; Rotter, Jerome I; O'Donnell, Christopher J; Rice, Kenneth; Boerwinkle, Eric; Kathiresan, Sekar; Cupples, L Adrienne

    2014-02-01

    Low-frequency coding DNA sequence variants in the proprotein convertase subtilisin/kexin type 9 gene (PCSK9) lower plasma low-density lipoprotein cholesterol (LDL-C), protect against risk of coronary heart disease (CHD), and have prompted the development of a new class of therapeutics. It is uncertain whether the PCSK9 example represents a paradigm or an isolated exception. We used the "Exome Array" to genotype >200,000 low-frequency and rare coding sequence variants across the genome in 56,538 individuals (42,208 European ancestry [EA] and 14,330 African ancestry [AA]) and tested these variants for association with LDL-C, high-density lipoprotein cholesterol (HDL-C), and triglycerides. Although we did not identify new genes associated with LDL-C, we did identify four low-frequency (frequencies between 0.1% and 2%) variants (ANGPTL8 rs145464906 [c.361C>T; p.Gln121*], PAFAH1B2 rs186808413 [c.482C>T; p.Ser161Leu], COL18A1 rs114139997 [c.331G>A; p.Gly111Arg], and PCSK7 rs142953140 [c.1511G>A; p.Arg504His]) with large effects on HDL-C and/or triglycerides. None of these four variants was associated with risk for CHD, suggesting that examples of low-frequency coding variants with robust effects on both lipids and CHD will be limited.

  8. Association of Low-Frequency and Rare Coding-Sequence Variants with Blood Lipids and Coronary Heart Disease in 56,000 Whites and Blacks

    PubMed Central

    Peloso, Gina M.; Auer, Paul L.; Bis, Joshua C.; Voorman, Arend; Morrison, Alanna C.; Stitziel, Nathan O.; Brody, Jennifer A.; Khetarpal, Sumeet A.; Crosby, Jacy R.; Fornage, Myriam; Isaacs, Aaron; Jakobsdottir, Johanna; Feitosa, Mary F.; Davies, Gail; Huffman, Jennifer E.; Manichaikul, Ani; Davis, Brian; Lohman, Kurt; Joon, Aron Y.; Smith, Albert V.; Grove, Megan L.; Zanoni, Paolo; Redon, Valeska; Demissie, Serkalem; Lawson, Kim; Peters, Ulrike; Carlson, Christopher; Jackson, Rebecca D.; Ryckman, Kelli K.; Mackey, Rachel H.; Robinson, Jennifer G.; Siscovick, David S.; Schreiner, Pamela J.; Mychaleckyj, Josyf C.; Pankow, James S.; Hofman, Albert; Uitterlinden, Andre G.; Harris, Tamara B.; Taylor, Kent D.; Stafford, Jeanette M.; Reynolds, Lindsay M.; Marioni, Riccardo E.; Dehghan, Abbas; Franco, Oscar H.; Patel, Aniruddh P.; Lu, Yingchang; Hindy, George; Gottesman, Omri; Bottinger, Erwin P.; Melander, Olle; Orho-Melander, Marju; Loos, Ruth J.F.; Duga, Stefano; Merlini, Piera Angelica; Farrall, Martin; Goel, Anuj; Asselta, Rosanna; Girelli, Domenico; Martinelli, Nicola; Shah, Svati H.; Kraus, William E.; Li, Mingyao; Rader, Daniel J.; Reilly, Muredach P.; McPherson, Ruth; Watkins, Hugh; Ardissino, Diego; Zhang, Qunyuan; Wang, Judy; Tsai, Michael Y.; Taylor, Herman A.; Correa, Adolfo; Griswold, Michael E.; Lange, Leslie A.; Starr, John M.; Rudan, Igor; Eiriksdottir, Gudny; Launer, Lenore J.; Ordovas, Jose M.; Levy, Daniel; Chen, Y.-D. Ida; Reiner, Alexander P.; Hayward, Caroline; Polasek, Ozren; Deary, Ian J.; Borecki, Ingrid B.; Liu, Yongmei; Gudnason, Vilmundur; Wilson, James G.; van Duijn, Cornelia M.; Kooperberg, Charles; Rich, Stephen S.; Psaty, Bruce M.; Rotter, Jerome I.; O’Donnell, Christopher J.; Rice, Kenneth; Boerwinkle, Eric; Kathiresan, Sekar; Cupples, L. Adrienne

    2014-01-01

    Low-frequency coding DNA sequence variants in the proprotein convertase subtilisin/kexin type 9 gene (PCSK9) lower plasma low-density lipoprotein cholesterol (LDL-C), protect against risk of coronary heart disease (CHD), and have prompted the development of a new class of therapeutics. It is uncertain whether the PCSK9 example represents a paradigm or an isolated exception. We used the “Exome Array” to genotype >200,000 low-frequency and rare coding sequence variants across the genome in 56,538 individuals (42,208 European ancestry [EA] and 14,330 African ancestry [AA]) and tested these variants for association with LDL-C, high-density lipoprotein cholesterol (HDL-C), and triglycerides. Although we did not identify new genes associated with LDL-C, we did identify four low-frequency (frequencies between 0.1% and 2%) variants (ANGPTL8 rs145464906 [c.361C>T; p.Gln121∗], PAFAH1B2 rs186808413 [c.482C>T; p.Ser161Leu], COL18A1 rs114139997 [c.331G>A; p.Gly111Arg], and PCSK7 rs142953140 [c.1511G>A; p.Arg504His]) with large effects on HDL-C and/or triglycerides. None of these four variants was associated with risk for CHD, suggesting that examples of low-frequency coding variants with robust effects on both lipids and CHD will be limited. PMID:24507774

  9. Optimal Unified Approach for Rare-Variant Association Testing with Application to Small-Sample Case-Control Whole-Exome Sequencing Studies

    PubMed Central

    Lee, Seunggeun; Emond, Mary J.; Bamshad, Michael J.; Barnes, Kathleen C.; Rieder, Mark J.; Nickerson, Deborah A.; Christiani, David C.; Wurfel, Mark M.; Lin, Xihong

    2012-01-01

    We propose in this paper a unified approach for testing the association between rare variants and phenotypes in sequencing association studies. This approach maximizes power by adaptively using the data to optimally combine the burden test and the nonburden sequence kernel association test (SKAT). Burden tests are more powerful when most variants in a region are causal and the effects are in the same direction, whereas SKAT is more powerful when a large fraction of the variants in a region are noncausal or the effects of causal variants are in different directions. The proposed unified test maintains the power in both scenarios. We show that the unified test corresponds to the optimal test in an extended family of SKAT tests, which we refer to as SKAT-O. The second goal of this paper is to develop a small-sample adjustment procedure for the proposed methods for the correction of conservative type I error rates of SKAT family tests when the trait of interest is dichotomous and the sample size is small. Both small-sample-adjusted SKAT and the optimal unified test (SKAT-O) are computationally efficient and can easily be applied to genome-wide sequencing association studies. We evaluate the finite sample performance of the proposed methods using extensive simulation studies and illustrate their application using the acute-lung-injury exome-sequencing data of the National Heart, Lung, and Blood Institute Exome Sequencing Project. PMID:22863193
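
    The contrast between the burden test and SKAT described above can be made concrete with a small sketch. It assumes an intercept-only null model for a continuous or 0/1 trait (no covariates), and it computes only the test statistics, not p-values; the function names are hypothetical. The combined statistic follows the form Q_rho = (1 - rho) Q_SKAT + rho Q_burden used for the SKAT-O family.

```python
def score_statistics(genotypes, phenotypes):
    """Per-variant scores S_j = sum_i g_ij * (y_i - mean(y)).

    genotypes  -- one row per individual, one allele count per variant
    phenotypes -- one trait value per individual
    Assumes an intercept-only null model (no covariates).
    """
    n = len(phenotypes)
    mean_y = sum(phenotypes) / n
    residuals = [y - mean_y for y in phenotypes]
    n_variants = len(genotypes[0])
    return [
        sum(genotypes[i][j] * residuals[i] for i in range(n))
        for j in range(n_variants)
    ]


def q_skat(scores, weights):
    # Sum of squared weighted scores: the direction of each effect is ignored.
    return sum((w * s) ** 2 for w, s in zip(weights, scores))


def q_burden(scores, weights):
    # Square of the summed weighted scores: opposite effects cancel out.
    return sum(w * s for w, s in zip(weights, scores)) ** 2


def q_rho(scores, weights, rho):
    # rho = 0 gives SKAT, rho = 1 gives the burden statistic.
    return (1 - rho) * q_skat(scores, weights) + rho * q_burden(scores, weights)


# Two rare variants whose carriers differ in trait value in opposite directions.
genotypes = [[1, 0], [0, 1], [0, 0], [0, 0]]
phenotypes = [1, 0, 0, 0]
scores = score_statistics(genotypes, phenotypes)
for rho in (0.0, 0.5, 1.0):
    print(rho, q_rho(scores, [1.0, 1.0], rho))
```

    In this toy example the two scores have opposite signs, so the burden statistic shrinks while the SKAT statistic does not, which mirrors the abstract's point about effects in different directions. Choosing rho adaptively and obtaining a valid p-value for the resulting test is the part the paper contributes and is not attempted here.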

  10. Molecular Evidence for Mother-to-Child Transmission of Multiple Variants by Analysis of RNA and DNA Sequences of Human Immunodeficiency Virus Type 1

    PubMed Central

    Pasquier, C.; Cayrou, C.; Blancher, A.; Tourne-Petheil, C.; Berrebi, A.; Tricoire, J.; Puel, J.; Izopet, J.

    1998-01-01

    We have examined the viral selection that may occur during transmission by studying the env gene sequences from four cases of mother-to-child transmission of human immunodeficiency virus type 1. The V3 region sequences were directly amplified from both plasma viral RNA and peripheral blood mononuclear cells containing proviral DNA from mothers at delivery and at the time of diagnosis for children. Transmission occurred perinatally in three cases. The similarity of the viral sequences in each infant sample contrasted with the heterogeneous viral populations in the mothers. Phylogenetic analysis indicated the transmission of one or a few closely related maternal minor virus variants. In contrast, the child virus population in the fourth case was as heterogeneous as that of his mother, and phylogenetic analysis strongly suggested the transmission of multiple maternal variants. This case of multiple transmission was confirmed by analyzing sequences obtained at three times after delivery. Strains with sequences corresponding to the syncytium-inducing phenotype were also transmitted in this fourth case, and this was associated with the rapid development of disease in the child. There was no evidence for transmission of particular viral variants from mother to infant. We have thus described a particular case of vertical human immunodeficiency virus type 1 transmission with the transmission of multiple maternal variants to the infant and a rapid, fatal outcome in the child. PMID:9765386

  11. Accurate variant detection across non-amplified and whole genome amplified DNA using targeted next generation sequencing

    PubMed Central

    2012-01-01

    Background: Many hypothesis-driven genetic studies require the ability to comprehensively and efficiently target specific regions of the genome to detect sequence variations. Often, sample availability is limited requiring the use of whole genome amplification (WGA). We evaluated a high-throughput microdroplet-based PCR approach in combination with next generation sequencing (NGS) to target 384 discrete exons from 373 genes involved in cancer. In our evaluation, we compared the performance of six non-amplified gDNA samples from two HapMap family trios. Three of these samples were also preamplified by WGA and evaluated. We tested sample pooling or multiplexing strategies at different stages of the tested targeted NGS (T-NGS) workflow. Results: The results demonstrated comparable sequence performance between non-amplified and preamplified samples and between different indexing strategies [sequence specificity of 66.0% ± 3.4%, uniformity (coverage at 0.2× of the mean) of 85.6% ± 0.6%]. The average genotype concordance maintained across all the samples was 99.5% ± 0.4%, regardless of sample type or pooling strategy. We did not detect any errors in the Mendelian patterns of inheritance of genotypes between the parents and offspring within each trio. We also demonstrated the ability to detect minor allele frequencies within the pooled samples that conform to predicted models. Conclusion: Our described PCR-based sample multiplex approach and the ability to use WGA material for NGS may enable researchers to perform deep resequencing studies and explore variants at very low frequencies and cost. PMID:22994565

  12. VisCap: inference and visualization of germ-line copy-number variants from targeted clinical sequencing data

    PubMed Central

    Pugh, Trevor J.; Amr, Sami S.; Bowser, Mark J.; Gowrisankar, Sivakumar; Hynes, Elizabeth; Mahanta, Lisa M.; Rehm, Heidi L.; Funke, Birgit; Lebo, Matthew S.

    2016-01-01

    Purpose: To develop and validate VisCap, a software program targeted to clinical laboratories for inference and visualization of germ-line copy-number variants (CNVs) from targeted next-generation sequencing data. Genet Med 18 7, 712–719. Methods: VisCap calculates the fraction of overall sequence coverage assigned to genomic intervals and computes log2 ratios of these values to the median of reference samples profiled using the same test configuration. Candidate CNVs are called when log2 ratios exceed user-defined thresholds. Results: We optimized VisCap using 14 cases with known CNVs, followed by prospective analysis of 1,104 cases referred for diagnostic DNA sequencing. To verify calls in the prospective cohort, we used droplet digital polymerase chain reaction (PCR) to confirm 10/27 candidate CNVs and 72/72 copy-neutral genomic regions scored by VisCap. We also used a genome-wide bead array to confirm the absence of CNV calls across panels applied to 10 cases. To improve specificity, we instituted a visual scoring system that enabled experienced reviewers to differentiate true-positive from false-positive calls with minimal impact on laboratory workflow. Conclusions: VisCap is a sensitive method for inferring CNVs from targeted sequence data from targeted gene panels. Visual scoring of data underlying CNV calls is a critical step to reduce false-positive calls for follow-up testing. PMID:26681316
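
    The calculation summarized in the Methods sentence (coverage fractions per interval, log2 ratios against the median of reference samples, thresholded calls) can be sketched as follows. This is an illustrative sketch rather than VisCap's code: the function names, the data layout (a dict of interval name to read depth), and the threshold values of -0.55 and 0.40 are assumptions standing in for the "user-defined thresholds".

```python
import math
from statistics import median


def coverage_fractions(coverage):
    """Return each interval's share of the sample's total coverage."""
    total = sum(coverage.values())
    return {interval: depth / total for interval, depth in coverage.items()}


def log2_ratios(sample, references):
    """log2 of the sample's fraction over the median reference fraction."""
    sample_frac = coverage_fractions(sample)
    ref_fracs = [coverage_fractions(ref) for ref in references]
    ratios = {}
    for interval, frac in sample_frac.items():
        ref_median = median(ref[interval] for ref in ref_fracs)
        if frac == 0 or ref_median == 0:
            continue  # ratio undefined; skipped in this sketch
        ratios[interval] = math.log2(frac / ref_median)
    return ratios


def call_cnvs(ratios, loss_threshold=-0.55, gain_threshold=0.40):
    """Label intervals whose log2 ratio crosses a (hypothetical) threshold."""
    calls = {}
    for interval, ratio in ratios.items():
        if ratio <= loss_threshold:
            calls[interval] = "loss"
        elif ratio >= gain_threshold:
            calls[interval] = "gain"
    return calls


# Three reference samples with even coverage; the test sample has half
# the expected depth over interval "e2".
references = [{"e1": 100, "e2": 100, "e3": 100, "e4": 100} for _ in range(3)]
sample = {"e1": 100, "e2": 50, "e3": 100, "e4": 100}
print(call_cnvs(log2_ratios(sample, references)))
```

    Working with fractions of total coverage rather than raw depth makes samples sequenced to different depths comparable, and taking the median across reference samples limits the influence of any single unusual reference.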

  13. Functional Variants in DPYSL2 Sequence Increase Risk of Schizophrenia and Suggest a Link to mTOR Signaling

    PubMed Central

    Liu, Yaping; Pham, Xuan; Zhang, Lilei; Chen, Pei-lung; Burzynski, Grzegorz; McGaughey, David M.; He, Shan; McGrath, John A.; Wolyniec, Paula; Fallin, Margaret D.; Pierce, Megan S.; McCallion, Andrew S.; Pulver, Ann E.; Avramopoulos, Dimitrios; Valle, David

    2014-01-01

    Numerous linkage and association studies by our group and others have implicated DPYSL2 at 8p21.2 in schizophrenia. Here we explore DPYSL2 for functional variation that underlies these associations. We sequenced all 14 exons of DPYSL2 as well as 27 conserved noncoding regions at the locus in 137 cases and 151 controls. We identified 120 variants, eight of which we genotyped in an additional 729 cases and 1542 controls. Several were significantly associated with schizophrenia, including a three single-nucleotide polymorphism (SNP) haplotype in the proximal promoter, two SNPs in intron 1, and a polymorphic dinucleotide repeat in the 5′-untranslated region that alters sequences predicted to be involved in translational regulation by mammalian target of rapamycin signaling. The 3-SNP promoter haplotype and the sequence surrounding one of the intron 1 SNPs direct tissue-specific expression in the nervous systems of Zebrafish in a pattern consistent with the two endogenous dpysl2 paralogs. In addition, two SNP haplotypes over the coding exons and 3′ end of DPYSL2 showed association with opposing sex-specific risks. These data suggest that these polymorphic, schizophrenia-associated sequences function as regulatory elements for DPYSL2 expression. In transient transfection assays, the high risk allele of the polymorphic dinucleotide repeat diminished reporter expression by 3- to 4-fold. Both the high- and low-risk alleles respond to allosteric mTOR inhibition by rapamycin until, at high drug levels, allelic differences are eliminated. Our results suggest that reduced transcription and mTOR-regulated translation of certain DPYSL2 isoforms increase the risk for schizophrenia. PMID:25416705

  14. Hepatitis C Virus (HCV) NS3 sequence diversity and antiviral resistance-associated variant frequency in HCV/HIV coinfection.

    PubMed

    Jabara, Cassandra B; Hu, Fengyu; Mollan, Katie R; Williford, Sara E; Menezes, Prema; Yang, Yan; Eron, Joseph J; Fried, Michael W; Hudgens, Michael G; Jones, Corbin D; Swanstrom, Ronald; Lemon, Stanley M

    2014-10-01

    HIV coinfection accelerates disease progression in chronic hepatitis C and reduces sustained antiviral responses (SVR) to interferon-based therapy. New direct-acting antivirals (DAAs) promise higher SVR rates, but the selection of preexisting resistance-associated variants (RAVs) may lead to virologic breakthrough or relapse. Thus, pretreatment frequencies of RAVs are likely determinants of treatment outcome but typically are below levels at which the viral sequence can be accurately resolved. Moreover, it is not known how HIV coinfection influences RAV frequency. We adopted an accurate high-throughput sequencing strategy to compare nucleotide diversity in HCV NS3 protease-coding sequences in 20 monoinfected and 20 coinfected subjects with well-controlled HIV infection. Differences in mean pairwise nucleotide diversity (π), Tajima's D statistic, and Shannon entropy index suggested that the genetic diversity of HCV is reduced in coinfection. Among coinfected subjects, diversity correlated positively with increases in CD4(+) T cells on antiretroviral therapy, suggesting T cell responses are important determinants of diversity. At a median sequencing depth of 0.084%, preexisting RAVs were readily identified. Q80K, which negatively impacts clinical responses to simeprevir, was encoded by more than 99% of viral RNAs in 17 of the 40 subjects. RAVs other than Q80K were identified in 39 of 40 subjects, mostly at frequencies near 0.1%. RAV frequency did not differ significantly between monoinfected and coinfected subjects. We conclude that HCV genetic diversity is reduced in patients with well-controlled HIV infection, likely reflecting impaired T cell immunity. However, RAV frequency is not increased and should not adversely influence the outcome of DAA therapy.
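
    Two of the diversity measures named above can be illustrated with a small sketch. It uses common textbook definitions (mean pairwise proportion of differing sites for π, and Shannon entropy over the frequencies of distinct sequences); the abstract does not give the study's exact formulas, so these definitions, the function names, and the toy reads are assumptions.

```python
import math
from collections import Counter
from itertools import combinations

def pairwise_diversity(seqs):
    """Mean proportion of differing sites over all pairs of equal-length aligned sequences."""
    pairs = list(combinations(seqs, 2))
    diffs = [sum(a != b for a, b in zip(s1, s2)) / len(s1) for s1, s2 in pairs]
    return sum(diffs) / len(pairs)

def shannon_entropy(seqs):
    """Shannon entropy (natural log) of the distribution of distinct sequences."""
    n = len(seqs)
    return -sum((c / n) * math.log(c / n) for c in Counter(seqs).values())

# Toy alignment: four reads, one of which differs at the last position
reads = ["ACGT", "ACGT", "ACGA", "ACGT"]
pi = pairwise_diversity(reads)
entropy = shannon_entropy(reads)
```

    For the toy reads, three of the six pairs differ at one of four sites, so π is 0.125. Lower values of both measures indicate a less diverse viral population.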

  15. TMC-SNPdb: an Indian germline variant database derived from whole exome sequences.

    PubMed

    Upadhyay, Pawan; Gardi, Nilesh; Desai, Sanket; Sahoo, Bikram; Singh, Ankita; Togar, Trupti; Iyer, Prajish; Prasad, Ratnam; Chandrani, Pratik; Gupta, Sudeep; Dutt, Amit

    2016-01-01

    Cancer is predominantly a somatic disease. A mutant allele present in a cancer cell genome is considered somatic when it is absent from the paired normal genome and from public SNP databases. The current build of dbSNP, the most comprehensive public SNP database, however, inadequately represents several non-European Caucasian populations, posing a limitation in cancer genomic analyses of data from these populations. We present the Tata Memorial Centre-SNP database (TMC-SNPdb) as the first open source, flexible, upgradable, and freely available SNP database (accessible through dbSNP build 149 and ANNOVAR), representing 114 309 unique germline variants generated from whole exome data of 62 normal samples derived from cancer patients of Indian origin. The TMC-SNPdb is presented with a companion subtraction tool that can be executed with a command line option or using an easy-to-use graphical user interface, with the ability to deplete additional Indian population specific SNPs over and above dbSNP and 1000 Genomes databases. Using an institutionally generated whole exome data set of 132 samples of Indian origin, we demonstrate that TMC-SNPdb could deplete 42, 33 and 28% false positive somatic events post dbSNP depletion in Indian origin tongue, gallbladder, and cervical cancer samples, respectively. Beyond cancer somatic analyses, we anticipate utility of the TMC-SNPdb in several Mendelian germline diseases. In addition to dbSNP build 149 and ANNOVAR, the TMC-SNPdb along with the subtraction tool is available for download in the public domain at the following Database URL: http://www.actrec.gov.in/pi-webpages/AmitDutt/TMCSNP/TMCSNPdp.html. PMID:27402678
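
    The idea behind a subtraction step of this kind can be sketched in a few lines. This is not the TMC-SNPdb companion tool or its interface: the function name, the tuple representation of a variant, and the coordinates below are hypothetical and serve only to show the set-difference logic.

```python
def deplete_germline(candidates, germline_db):
    """Keep only candidate variants that are absent from the germline set.

    Variants are represented as (chrom, pos, ref, alt) tuples (an assumed format).
    """
    return [v for v in candidates if v not in germline_db]

# Hypothetical germline database entries and candidate somatic calls
germline_db = {("chr1", 1000, "A", "G"), ("chr7", 2000, "C", "T")}
candidates = [("chr1", 1000, "A", "G"), ("chr2", 3000, "G", "A")]
somatic = deplete_germline(candidates, germline_db)
```

    Here the first candidate matches a germline entry and is removed, leaving one putative somatic event. Storing the database as a set keeps each lookup at constant average cost.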

  16. Nucleic acid sequence detection using multiplexed oligonucleotide PCR

    SciTech Connect

    Nolan, John P.; White, P. Scott

    2006-12-26

    Methods for rapidly detecting single or multiple sequence alleles in a sample nucleic acid are described. Provided are all of the oligonucleotide pairs capable of annealing specifically to a target allele and discriminating among possible sequences thereof, and ligating to each other to form an oligonucleotide complex when a particular sequence feature is present (or, alternatively, absent) in the sample nucleic acid. The design of each oligonucleotide pair permits the subsequent high-level PCR amplification of a specific amplicon when the oligonucleotide complex is formed, but not when the oligonucleotide complex is not formed. The presence or absence of the specific amplicon is used to detect the allele. Detection of the specific amplicon may be achieved using a variety of methods well known in the art, including without limitation, oligonucleotide capture onto DNA chips or microarrays, oligonucleotide capture onto beads or microspheres, electrophoresis, and mass spectrometry. Various labels and address-capture tags may be employed in the amplicon detection step of multiplexed assays, as further described herein.

  17. Molecular cloning and amino acid sequence of human 5-lipoxygenase

    SciTech Connect

    Matsumoto, T.; Funk, C.D.; Radmark, O.; Hoeoeg, J.O.; Joernvall, H.; Samuelsson, B.

    1988-01-01

    5-Lipoxygenase (EC 1.13.11.34), a Ca²⁺- and ATP-requiring enzyme, catalyzes the first two steps in the biosynthesis of the peptidoleukotrienes and the chemotactic factor leukotriene B₄. A cDNA clone corresponding to 5-lipoxygenase was isolated from a human lung lambda gt11 expression library by immunoscreening with a polyclonal antibody. Additional clones from a human placenta lambda gt11 cDNA library were obtained by plaque hybridization with the ³²P-labeled lung cDNA clone. Sequence data obtained from several overlapping clones indicate that the composite DNAs contain the complete coding region for the enzyme. From the deduced primary structure, 5-lipoxygenase encodes a 673 amino acid protein with a calculated molecular weight of 77,839. Direct analysis of the native protein and its proteolytic fragments confirmed the deduced composition, the amino-terminal amino acid sequence, and the structure of many internal segments. 5-Lipoxygenase has no apparent sequence homology with leukotriene A₄ hydrolase or Ca²⁺-binding proteins. RNA blot analysis indicated substantial amounts of an mRNA species of approximately 2700 nucleotides in leukocytes, lung, and placenta.

  18. Characterization and amino acid sequence of a fatty acid-binding protein from human heart.

    PubMed Central

    Offner, G D; Brecher, P; Sawlivich, W B; Costello, C E; Troxler, R F

    1988-01-01

    The complete amino acid sequence of a fatty acid-binding protein from human heart was determined by automated Edman degradation of CNBr, BNPS-skatole [3'-bromo-3-methyl-2-(2-nitrobenzenesulphenyl)indolenine], hydroxylamine, Staphylococcus aureus V8 proteinase, tryptic and chymotryptic peptides, and by digestion of the protein with carboxypeptidase A. The sequence of the blocked N-terminal tryptic peptide from citraconylated protein was determined by collisionally induced decomposition mass spectrometry. The protein contains 132 amino acid residues, is enriched with respect to threonine and lysine, lacks cysteine, has an acetylated valine residue at the N-terminus, and has an Mr of 14768 and an isoelectric point of 5.25. This protein contains two short internal repeated sequences from residues 48-54 and from residues 114-119 located within regions of predicted beta-structure and decreasing hydrophobicity. These short repeats are contained within two longer repeated regions from residues 48-60 and residues 114-125, which display 62% sequence similarity. These regions could accommodate the charged and uncharged moieties of long-chain fatty acids and may represent fatty acid-binding domains consistent with the finding that human heart fatty acid-binding protein binds 2 mol of oleate or palmitate/mol of protein. Detailed evidence for the amino acid sequences of the peptides has been deposited as Supplementary Publication SUP 50143 (23 pages) at the British Library Lending Division, Boston Spa, Yorkshire LS23 7BQ, U.K., from whom copies may be obtained as indicated in Biochem. J. (1988) 249, 5. PMID:3421901

  19. Targeted Deep Sequencing in Multiple-Affected Sibships of European Ancestry Identifies Rare Deleterious Variants in PTPN22 That Confer Risk for Type 1 Diabetes.

    PubMed

    Ge, Yan; Onengut-Gumuscu, Suna; Quinlan, Aaron R; Mackey, Aaron J; Wright, Jocyndra A; Buckner, Jane H; Habib, Tania; Rich, Stephen S; Concannon, Patrick

    2016-03-01

    Despite finding more than 40 risk loci for type 1 diabetes (T1D), the causative variants and genes remain largely unknown. Here, we sought to identify rare deleterious variants of moderate-to-large effects contributing to T1D. We deeply sequenced 301 protein-coding genes located in 49 previously reported T1D risk loci in 70 T1D cases of European ancestry. These cases were selected from putatively high-risk families that had three or more siblings diagnosed with T1D at early ages. A cluster of rare deleterious variants in PTPN22 was identified, including two novel frameshift mutations (ss538819444 and rs371865329) and two missense variants (rs74163663 and rs56048322). Genotyping in 3,609 T1D families showed that rs56048322 was significantly associated with T1D and that this association was independent of the T1D-associated common variant rs2476601. The risk allele at rs56048322 affects splicing of PTPN22, resulting in the production of two alternative PTPN22 transcripts and a novel isoform of LYP (the protein encoded by PTPN22). This isoform competes with the wild-type LYP for binding to CSK and results in hyporesponsiveness of CD4(+) T cells to antigen stimulation in T1D subjects. These findings demonstrate that in addition to common variants, rare deleterious variants in PTPN22 exist and can affect T1D risk.

  20. Detection of Potato spindle tuber viroid sequence variants derived from PSTVd-infected Phelipanche ramosa in flower organs of tomato plants

    PubMed Central

    Vachev, Tihomir; Ivanova, Desislava; Yahubyan, Galina; Naimov, Samir; Minkov, Ivan; Gozmanova, Mariyana

    2014-01-01

    Potato spindle tuber viroid (PSTVd) is an infectious small, circular, non-coding single-stranded RNA that induces disease in many crop species, ornamental plants, weeds and parasitic plants. PSTVd propagates in its host as a population of closely related but non-identical RNA variants referred to as quasispecies. Recently, we have described three de novo arising PSTVd variants in the parasitic plant Phelipanche ramosa after mechanical inoculation with the PSTVd KF440-2 isolate. These P. ramosa derived mutants were designated as G241-C, C208-U and C227-U PSTVd variants. Each of these variants carries a single-nucleotide substitution compared to the PSTVd KF440-2 sequence from which they are considered to have evolved. Here we complement our previous studies on these mutants by exploring their potential to infect the floral organs of tomato plants. We found that the PSTVd G241-C and C208-U variants were able to replicate in systemic leaves and floral organs of tomato plants, while the PSTVd C227-U variant did not develop systemic infection. Furthermore, we analysed the progeny of these PSTVd variants in sepals and petals of tomato plants for retention of the specific mutations. PMID:26019526

  1. Discovery of common Asian copy number variants using integrated high-resolution array CGH and massively parallel DNA sequencing.

    PubMed

    Park, Hansoo; Kim, Jong-Il; Ju, Young Seok; Gokcumen, Omer; Mills, Ryan E; Kim, Sheehyun; Lee, Seungbok; Suh, Dongwhan; Hong, Dongwan; Kang, Hyunseok Peter; Yoo, Yun Joo; Shin, Jong-Yeon; Kim, Hyun-Jin; Yavartanoo, Maryam; Chang, Young Wha; Ha, Jung-Sook; Chong, Wilson; Hwang, Ga-Ram; Darvishi, Katayoon; Kim, Hyeran; Yang, Song Ju; Yang, Kap-Seok; Kim, Hyungtae; Hurles, Matthew E; Scherer, Stephen W; Carter, Nigel P; Tyler-Smith, Chris; Lee, Charles; Seo, Jeong-Sun

    2010-05-01

    Copy number variants (CNVs) account for the majority of human genomic diversity in terms of base coverage. Here, we have developed and applied a new method to combine high-resolution array comparative genomic hybridization (CGH) data with whole-genome DNA sequencing data to obtain a comprehensive catalog of common CNVs in Asian individuals. The genomes of 30 individuals from three Asian populations (Korean, Chinese and Japanese) were interrogated with an ultra-high-resolution array CGH platform containing 24 million probes. Whole-genome sequencing data from a reference genome (NA10851, with 28.3x coverage) and two Asian genomes (AK1, with 27.8x coverage and AK2, with 32.0x coverage) were used to transform the relative copy number information obtained from array CGH experiments into absolute copy number values. We discovered 5,177 CNVs, of which 3,547 were putative Asian-specific CNVs. These common CNVs in Asian populations will be a useful resource for subsequent genetic studies in these populations, and the new method of calling absolute CNVs will be essential for applying CNV data to personalized medicine.

  2. Genomic variants of genes associated with three horticultural traits in apple revealed by genome re-sequencing

    PubMed Central

    Zhang, Shijie; Chen, Weiping; Xin, Lu; Gao, Zhihong; Hou, Yingjun; Yu, Xinyi; Zhang, Zhen; Qu, Shenchun

    2014-01-01

    The apple (Malus × domestica Borkh.) cultivar ‘Su Shuai’ exhibits greater disease resistance, shorter internodes and lighter fruit flavor compared with its parents ‘Golden Delicious’ and ‘Indo’. To obtain a comprehensive overview of the sequence variation in these three horticultural traits, the genomes of ‘Su Shuai’ and ‘Indo’ were resequenced using next-generation sequencing and compared to the genome of ‘Golden Delicious’. A wide range of genetic variations were detected, including 2 454 406 and 18 749 349 single nucleotide polymorphisms (SNPs) and 59 547 and 50 143 structural variants (SVs) in the ‘Indo’ and ‘Su Shuai’ genomes, respectively. Among the SVs in ‘Su Shuai’, 17 genes related to disease resistance, 10 genes related to Gibberellin (GA) and 19 genes associated with fruit flavor were identified. The expression patterns of eight of the SV genes were examined using reverse transcription-quantitative polymerase chain reaction (RT-qPCR). The results of this study illustrate the genomic variation in these cultivars and provide evidence for a genetic basis for the horticultural traits of disease resistance, short internodes and lighter flavor exhibited in these cultivars. These results provide a genetic basis for the phenotypic characteristics of ‘Su Shuai’ and, as such, these SVs could serve as gene-specific molecular markers in marker-assisted breeding of apples. PMID:26504548

  3. Genome wide association study of uric acid in Indian population and interaction of identified variants with Type 2 diabetes.

    PubMed

    Giri, Anil K; Banerjee, Priyanka; Chakraborty, Shraddha; Kauser, Yasmeen; Undru, Aditya; Roy, Suki; Parekatt, Vaisak; Ghosh, Saurabh; Tandon, Nikhil; Bharadwaj, Dwaipayan

    2016-02-23

    Abnormal level of Serum Uric Acid (SUA) is an important marker and risk factor for complex diseases including Type 2 Diabetes. Since the genetic determinants of uric acid in Indians are totally unexplored, we tried to identify common variants associated with SUA in Indians using a Genome Wide Association Study (GWAS). Association of five known variants in the SLC2A9 and SLC22A11 genes with SUA level in 4,834 normoglycemics (1,109 in discovery and 3,725 in validation phase) was revealed, with different effect sizes in Indians compared to other major ethnic populations of the world. Combined analysis of 1,077 T2DM subjects (772 in discovery and 305 in validation phase) and normoglycemics revealed an additional GWAS signal in the ABCG2 gene. Differences in effect sizes of ABCG2 and SLC2A9 gene variants were observed between normoglycemics and T2DM patients. We identified two novel variants near long non-coding RNA genes AL356739.1 and AC064865.1 with nearly genome wide significance level. Meta-analysis and in silico replication in 11,745 individuals from the AUSTWIN consortium improved association for rs12206002 in the AL356739.1 gene to sub-genome wide association level. Our results extend the association of SLC2A9, SLC22A11 and ABCG2 genes with SUA level in Indians and enrich the body of evidence for an interrelationship between SUA level and T2DM.

  4. Genome wide association study of uric acid in Indian population and interaction of identified variants with Type 2 diabetes

    PubMed Central

    Giri, Anil K; Banerjee, Priyanka; Chakraborty, Shraddha; Kauser, Yasmeen; Undru, Aditya; Roy, Suki; Parekatt, Vaisak; Ghosh, Saurabh; Tandon, Nikhil; Bharadwaj, Dwaipayan

    2016-01-01

    Abnormal level of Serum Uric Acid (SUA) is an important marker and risk factor for complex diseases including Type 2 Diabetes. Since the genetic determinants of uric acid in Indians are totally unexplored, we tried to identify common variants associated with SUA in Indians using a Genome Wide Association Study (GWAS). Association of five known variants in the SLC2A9 and SLC22A11 genes with SUA level in 4,834 normoglycemics (1,109 in discovery and 3,725 in validation phase) was revealed, with different effect sizes in Indians compared to other major ethnic populations of the world. Combined analysis of 1,077 T2DM subjects (772 in discovery and 305 in validation phase) and normoglycemics revealed an additional GWAS signal in the ABCG2 gene. Differences in effect sizes of ABCG2 and SLC2A9 gene variants were observed between normoglycemics and T2DM patients. We identified two novel variants near long non-coding RNA genes AL356739.1 and AC064865.1 with nearly genome wide significance level. Meta-analysis and in silico replication in 11,745 individuals from the AUSTWIN consortium improved association for rs12206002 in the AL356739.1 gene to sub-genome wide association level. Our results extend the association of SLC2A9, SLC22A11 and ABCG2 genes with SUA level in Indians and enrich the body of evidence for an interrelationship between SUA level and T2DM. PMID:26902266

  5. Characterization of a factor IX variant with a glycine207 to glutamic acid mutation.

    PubMed

    Lin, S W; Lin, C N; Hamaguchi, N; Smith, K J; Shen, M C

    1994-09-15

    Factor IXTaipei9 is a factor IX variant from a hemophilia B patient with reduced levels of circulating protein molecules (cross-reacting material reduced, CRM). This variant contained a glycine (Gly) to glutamic acid (Glu) substitution at the 207th codon of mature factor IX. The functional consequences of the Gly→Glu mutation in factor IXTaipei9 (IXG207E) were characterized in this study. Plasma-derived IXG207E exhibited a mobility similar to that of normal factor IX on sodium dodecyl sulfate-polyacrylamide gel electrophoresis. Its specific activity was estimated to be 3.5% that of the purified normal factor IX in a one-stage partial thromboplastin time assay (aPTT). Cleavage of factor IXG207E by factor XIa or factor VIIa-tissue factor complex appeared to be normal. When the calcium-dependent conformational change was examined by monitoring quenching of intrinsic fluorescence, both normal factor IX and IXG207E exhibited equivalent intrinsic fluorescence quenching. Activated factor IXG207E (IXaG207E) also binds antithrombin III as well as normal factor IXa does. However, aberrant binding of the active site probe p-aminobenzamidine was observed for factor XIa-activated factor IXG207E, indicating that the active site pocket of the heavy chain of factor IXaG207E was abnormal. Moreover, the rate of activation of factor X by factor IXaG207E, as measured in a purified system using chromogenic substrates, was estimated to be 1/40 of that of normal factor IXa. A computer-modeled heavy-chain structure of factor IXa predicts a hydrophobic environment surrounding Gly-207, and this Gly forms a hydrogen bond to the active site serine-365. The molecular mechanism of the Gly→Glu mutation in factor IXTaipei9 might result in the alteration of the microenvironment of the active site pocket, which renders the active site serine-365 inaccessible to its substrate. PMID:7915915

  6. New approaches for computer analysis of nucleic acid sequences.

    PubMed

    Karlin, S; Ghandour, G; Ost, F; Tavare, S; Korn, L J

    1983-09-01

    A new high-speed computer algorithm is outlined that ascertains within and between nucleic acid and protein sequences all direct repeats, dyad symmetries, and other structural relationships. Large repeats, repeats of high frequency, dyad symmetries of specified stem length and loop distance, and their distributions are determined. Significance of homologies is assessed by a hierarchy of permutation procedures. Applications are made to papovaviruses, the human papillomavirus HPV, lambda phage, the human and mouse mitochondrial genomes, and the human and mouse immunoglobulin kappa-chain genes. PMID:6577449
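
    The two kinds of structure named in this abstract, direct repeats and dyad symmetries with a given stem length and loop distance, can be illustrated with a naive sketch. It is not the high-speed algorithm of the paper, which the abstract does not describe in detail; the function names and the toy sequences are assumptions made for the example.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of a DNA string over the alphabet A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]

def direct_repeats(seq, k):
    """Map each k-mer that occurs more than once to its start positions."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return {kmer: pos for kmer, pos in positions.items() if len(pos) > 1}

def dyad_symmetries(seq, stem, loop):
    """Start positions where a stem is followed, after `loop` bases, by its reverse complement."""
    hits = []
    for i in range(len(seq) - 2 * stem - loop + 1):
        left = seq[i:i + stem]
        right = seq[i + stem + loop:i + 2 * stem + loop]
        if right == reverse_complement(left):
            hits.append(i)
    return hits

repeats = direct_repeats("GAATTCAAAGAATTC", 6)
dyads = dyad_symmetries("GGGCCAAAAGGCCC", stem=5, loop=4)
```

    In the first toy sequence the 6-mer GAATTC occurs at positions 0 and 9; in the second, the stem GGGCC at position 0 pairs with GGCCC after a four-base loop. Assessing significance by permutation, as the abstract describes, would require an additional step that is not shown.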

  7. Structural and functional interaction of fatty acids with human liver fatty acid-binding protein (L-FABP) T94A variant.

    PubMed

    Huang, Huan; McIntosh, Avery L; Martin, Gregory G; Landrock, Kerstin K; Landrock, Danilo; Gupta, Shipra; Atshaves, Barbara P; Kier, Ann B; Schroeder, Friedhelm

    2014-05-01

    The human liver fatty acid-binding protein (L-FABP) T94A variant, the most common in the FABP family, has been associated with elevated liver triglyceride levels. How this amino acid substitution elicits these effects is not known. This issue was addressed using human recombinant wild-type (WT) and T94A variant L-FABP proteins as well as cultured primary human hepatocytes expressing the respective proteins (genotyped as TT, TC and CC). The T94A substitution did not alter or only slightly altered L-FABP binding affinities for saturated, monounsaturated or polyunsaturated long chain fatty acids, nor did it change the affinity for intermediates of triglyceride synthesis. Nevertheless, the T94A substitution markedly altered the secondary structural response of L-FABP induced by binding long chain fatty acids or intermediates of triglyceride synthesis. Finally, the T94A substitution markedly decreased the levels of induction of peroxisome proliferator-activated receptor α-regulated proteins such as L-FABP, fatty acid transport protein 5 and peroxisome proliferator-activated receptor α itself mediated by the polyunsaturated fatty acids eicosapentaenoic acid and docosahexaenoic acid in cultured primary human hepatocytes. Thus, although the T94A substitution did not alter the affinity of human L-FABP for long chain fatty acids, it significantly altered human L-FABP structure and stability, as well as the conformational and functional response to these ligands.

  8. Association of low-frequency and rare coding-sequence variants with blood lipids and Coronary Heart Disease in 56,000 whites and blacks

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Low-frequency coding DNA sequence variants in the proprotein convertase subtilisin/kexin type 9 gene (PCSK9) lower plasma low-density lipoprotein cholesterol (LDL-C), protect against risk of coronary heart disease (CHD), and have prompted the development of a new class of therapeutics. It is uncerta...

  9. Discovery of potential new gene variants and inflammatory cytokine associations with fibromyalgia syndrome by whole exome sequencing.

    PubMed

    Feng, Jinong; Zhang, Zhifang; Wu, Xiwei; Mao, Allen; Chang, Frances; Deng, Xutao; Gao, Harry; Ouyang, Ching; Dery, Kenneth J; Le, Keith; Longmate, Jeffrey; Marek, Claudia; St Amand, R Paul; Krontiris, Theodore G; Shively, John E

    2013-01-01

    Fibromyalgia syndrome (FMS) is a chronic musculoskeletal pain disorder affecting 2% to 5% of the general population. Both genetic and environmental factors may be involved. To ascertain in an unbiased manner which genes play a role in the disorder, we performed complete exome sequencing on a subset of FMS patients. Out of 150 nuclear families (trios) DNA from 19 probands was subjected to complete exome sequencing. Since >80,000 SNPs were found per proband, the data were further filtered, including analysis of those with stop codons, a rare frequency (<2.5%) in the 1000 Genomes database, and presence in at least 2/19 probands sequenced. Two nonsense mutations, W32X in C11orf40 and Q100X in ZNF77 among 150 FMS trios had a significantly elevated frequency of transmission to affected probands (p = 0.026 and p = 0.032, respectively) and were present in a subset of 13% and 11% of FMS patients, respectively. Among 9 patients bearing more than one of the variants we have described, 4 had onset of symptoms between the ages of 10 and 18. The subset with the C11orf40 mutation had elevated plasma levels of the inflammatory cytokines, MCP-1 and IP-10, compared with unaffected controls or FMS patients with the wild-type allele. Similarly, patients with the ZNF77 mutation have elevated levels of the inflammatory cytokine, IL-12, compared with controls or patients with the wild type allele. Our results strongly implicate an inflammatory basis for FMS, as well as specific cytokine dysregulation, in at least 35% of our FMS cohort.

  10. Clinical Implementation of Germline Cancer Pharmacogenetic Variants during the Next-Generation Sequencing Era

    PubMed Central

    Gillis, Nancy K.; Patel, Jai N.; Innocenti, Federico

    2014-01-01

    Over 100 FDA-approved medications include pharmacogenetic biomarkers in the drug label, many with cancer indications referencing germline DNA variations. With the advent of next-generation sequencing (NGS) and its rapidly increasing uptake into cancer research and clinical practice, an enormous amount of data to inform documented gene-drug associations will be collected, which must be exploited to optimize patient benefit. This state-of-the-art article focuses on the implementation of germline cancer pharmacogenetics into clinical practice. Specifically, it discusses the importance of germline variation in cancer and the role of NGS in pharmacogenetic discovery and implementation. In the context of a scenario where massive NGS-based genetic information will be increasingly available to health stakeholders, this review explores the ongoing debate over the threshold of evidence necessary for implementation, provides an overview of recommendations in cancer by professional organizations and regulatory bodies, discusses limitations of current guidelines and strategies to improve third-party coverage. PMID:24136381

  11. Analysis of 'Fuji' apple somatic variants from next-generation sequencing.

    PubMed

    Lee, H S; Kim, G H; Kwon, S I; Kim, J H; Kwon, Y S; Choi, C

    2016-01-01

    The domesticated apple (Malus x domestica Borkh.) is a major fruit crop of temperate regions of the world. 'Fuji' apple (Ralls Genet x Delicious), a famous apple cultivar in Korea, has been very popular since its promotion in Japan in 1958. 'Fuji' and its bud mutant cultivars possess variable levels of genetic diversity. Nonetheless, the phenotypes of each group, which are classified into the bud mutation groups: early season, fruiting spur, and coloring, are similar. Despite attempts to identify these bud mutation cultivars, molecular markers, which were developed before the emergence of next-generation sequencing technology, have not been able to distinguish each cultivar easily. In this study, we adopted the resequencing technique using the 'Golden Delicious' (Grimes Golden x Unknown) apple genome as a reference. SNPs (single nucleotide polymorphisms) and InDels (insertions or deletions) of 'Fuji' apple and its bud mutant cultivar were detected and SNPs and unique InDels distinct to each cultivar were identified. Data from this study may be used to identify bud mutant cultivars of 'Fuji' apples and be useful for further breeding of apples. PMID:27525934

  12. Characterization of HSD17B1 sequence variants in breast cancer cases from French Canadian families with high risk of breast and ovarian cancer.

    PubMed

    Plourde, Marie; Samson, Carolle; Durocher, Francine; Sinilnokova, Olga; Simard, Jacques

    2008-03-01

    A family history of disease and estrogen exposure are risk factors for breast cancer. The HSD17B1 gene encodes a key steroidogenic enzyme that catalyses the final step of estradiol biosynthesis, rendering it a good candidate gene for breast cancer susceptibility. The current study was designed to screen for HSD17B1 germline mutations potentially involved in breast cancer susceptibility. DNA samples from 50 individuals affected with breast cancer from non-BRCA1/2 French Canadian families with a high risk of breast and ovarian cancer were screened for sequence variants in HSD17B1. Our study identified 28 sequence variants, including three non-synonymous variants, p.Ala238Val, p.Arg259His, p.Ser313Gly, one of which (p.Arg259His) was not previously reported. Functional assays failed to show changes in either activity or recombinant protein levels for any of the three variants. Thus, our resequencing analysis does not support the existence of deleterious, gain-of-function or transcription mutations in HSD17B1, which could explain the clustering of breast cancer cases in non-BRCA1/2 high-risk French Canadian families. However, a haplotype-based approach was used to establish tSNPs, providing a valuable tool for further searches of common disease-associated variants in this gene, using large cohorts.

  13. Rational design of translational pausing without altering the amino acid sequence dramatically promotes soluble protein expression: a strategic demonstration.

    PubMed

    Chen, Wei; Jin, Jingjie; Gu, Wei; Wei, Bo; Lei, Yun; Xiong, Sheng; Zhang, Gong

    2014-11-10

    The production of many pharmaceutical and industrial proteins in prokaryotic hosts is hindered by the insolubility of industrial expression products resulting from misfolding. Even with a correct primary sequence, an improper translation elongation rate in a heterologous expression system is an important cause of misfolding. In silico analysis revealed that most of the endogenous Escherichia coli genes display translational pausing sites that promote correct folding, and almost 1/5 of the genes have pausing sites at the 3'-termini of their coding sequence. Therefore, we established a novel strategy to efficiently promote the expression of soluble and active proteins without altering the amino acid sequence or expression conditions. This strategy uses the rational design of translational pausing based on structural information solely through synonymous substitutions, i.e. with no change to the amino acid sequence. We demonstrated this strategy on a promising antiviral candidate, Cyanovirin-N (CVN), which could not be efficiently expressed in any previously reported system. By introducing silent mutations, we increased the soluble expression level in E. coli by 2000-fold without altering the CVN protein sequence, and the specific activity was slightly higher for the optimized CVN than for the wild-type variant. This strategy introduces new possibilities for the production of bioactive recombinant proteins.
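
The constraint described above, recoding a gene through synonymous substitutions only, can be checked mechanically. The following is a minimal Python sketch, not the authors' design tool: it builds the standard genetic code and confirms that a recoded DNA sequence translates to the same protein. The function names and the short example sequences are hypothetical and chosen only for illustration.

```python
# Standard genetic code in the conventional compact layout (base order T, C, A, G).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate a DNA coding sequence codon by codon ('*' marks a stop)."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def is_synonymous_recoding(original, recoded):
    """True if both sequences have equal length and encode the same protein."""
    return len(original) == len(recoded) and translate(original) == translate(recoded)


# Hypothetical example: CTG -> TTA (both Leu) and AAA -> AAG (both Lys).
print(translate("ATGCTGAAA"), translate("ATGTTAAAG"))
print(is_synonymous_recoding("ATGCTGAAA", "ATGTTAAAG"))
```

A check like this only verifies that the protein sequence is unchanged; it says nothing about where pausing sites fall, which the abstract attributes to a separate structure-based design step.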

  14. Complete nucleotide sequence of an Amerindian human T-cell lymphotropic virus type II (HTLV-II) isolate: identification of a variant HTLV-II subtype b from a Guaymi Indian.

    PubMed Central

    Pardi, D; Switzer, W M; Hadlock, K G; Kaplan, J E; Lal, R B; Folks, T M

    1993-01-01

    The complete nucleotide sequence of a human T-cell lymphotropic virus type II (HTLV-II) isolate from a Panamanian Guaymi Indian was determined and analyzed. When this new viral isolate (HTLV-IIG12) was compared with prototypic HTLV-IIMoT, the overall nucleotide sequence similarity was 95.4%, while the predicted amino acid sequence similarity was 97.5%. Although the overall percentage of nucleotide and amino acid identity with prototypic HTLV-IIMoT (subtype a) was high, HTLV-IIG12 displayed several distinctive features that defined it as an HTLV-II subtype b. However, there were several characteristics unique to this isolate, which included a cluster of nucleotide substitutions in the pre-gag region and changes in restriction enzyme sites within the pre-gag region and the gag, pol, env, and pX genes. In addition, two nucleotide changes in the C terminus of the Tax protein coding sequence inserted an Arg residue for a stop codon and appeared to result in a larger tax gene product in HTLV-IIG12. Although the HTLV-IIG12 isolate appears to be a variant of the prototypic HTLV-IIb, this information represents the first complete nucleotide sequence of any HTLV-II subtype b. These data will allow further studies on the evolutionary relationships between the HTLV-II subtypes and between HTLV-I and HTLV-II. PMID:8331724

  15. Rare variant associations with waist-to-hip ratio in European-American and African-American women from the NHLBI-Exome Sequencing Project.

    PubMed

    Kan, Mengyuan; Auer, Paul L; Wang, Gao T; Bucasas, Kristine L; Hooker, Stanley; Rodriguez, Alejandra; Li, Biao; Ellis, Jaclyn; Adrienne Cupples, L; Ida Chen, Yii-Der; Dupuis, Josée; Fox, Caroline S; Gross, Myron D; Smith, Joshua D; Heard-Costa, Nancy; Meigs, James B; Pankow, James S; Rotter, Jerome I; Siscovick, David; Wilson, James G; Shendure, Jay; Jackson, Rebecca; Peters, Ulrike; Zhong, Hua; Lin, Danyu; Hsu, Li; Franceschini, Nora; Carlson, Chris; Abecasis, Goncalo; Gabriel, Stacey; Bamshad, Michael J; Altshuler, David; Nickerson, Deborah A; North, Kari E; Lange, Leslie A; Reiner, Alexander P; Leal, Suzanne M

    2016-08-01

    Waist-to-hip ratio (WHR), a relative comparison of waist and hip circumferences, is an easily accessible measurement of body fat distribution, in particular central abdominal fat. A high WHR indicates more intra-abdominal fat deposition and is an established risk factor for cardiovascular disease and type 2 diabetes. Recent genome-wide association studies have identified numerous common genetic loci influencing WHR, but the contributions of rare variants have not been previously reported. We investigated rare variant associations with WHR in 1510 European-American and 1186 African-American women from the National Heart, Lung, and Blood Institute-Exome Sequencing Project. Association analysis was performed on the gene level using several rare variant association methods. The strongest association was observed for rare variants in IKBKB (P = 4.0 × 10^-8) in European-Americans, where rare variants in this gene are predicted to decrease WHR. The activation of the IKBKB gene is involved in inflammatory processes and insulin resistance, which may affect normal food intake and body weight and shape. Meanwhile, the aggregation of rare variants in COBLL1, previously found to harbor common variants associated with WHR and fasting insulin, was nominally associated (P = 2.23 × 10^-4) with higher WHR in European-Americans. However, these significant results are not shared between African-Americans and European-Americans, which may be due to differences in the allelic architecture of the two populations and the small sample sizes. Our study indicates that the combined effect of rare variants contributes to the inter-individual variation in fat distribution through the regulation of insulin response.

  16. Data on the evolutionary history of the V(D)J recombination-activating protein 1 - RAG1 coupled with sequence and variant analyses.

    PubMed

    Kumar, Abhishek; Bhandari, Anita; Sarde, Sandeep J; Muppavarapu, Sekhar; Tandon, Ravi

    2016-09-01

    RAG1 protein is one of the key components of the RAG complex regulating V(D)J recombination. There are only a few studies of RAG1 concerning its evolutionary history, detailed sequence and mutational hotspots. Herein, we present our datasets used for the recent comprehensive study of RAG1 based on sequence, phylogenetic and genetic variant analyses (Kumar et al., 2015) [1]. Protein sequence alignment helped in characterizing the conserved domains and regions of RAG1. It also aided in unraveling ancestral RAG1 in the sea urchin. Human genetic variant analyses revealed 751 mutational hotspots, located both in the coding and the non-coding regions. For further analysis and discussion, see (Kumar et al., 2015) [1]. PMID:27284568

  17. Systematic Identification of Single Amino Acid Variants in Glioma Stem-Cell-Derived Chromosome 19 Proteins

    PubMed Central

    2015-01-01

    Novel proteoforms with single amino acid variations represent proteins that often have altered biological functions but are less explored in the human proteome. We have developed an approach, searching high quality shotgun proteomic data against an extended protein database, to identify expressed mutant proteoforms in glioma stem cell (GSC) lines. The systematic search of MS/MS spectra using PEAKS 7.0 as the search engine has recognized 17 chromosome 19 proteins in GSCs with altered amino acid sequences. The results were further verified by manual spectral examination, validating 19 proteoforms. One of the novel findings, a mutant form of branched-chain aminotransferase 2 (p.Thr186Arg), was verified at the transcript level and by targeted proteomics in several glioma stem cell lines. The structure of this proteoform was examined by molecular modeling in order to estimate conformational changes due to mutation that might lead to functional modifications potentially linked to glioma. Based on our initial findings, we believe that our approach presented could contribute to construct a more complete map of the human functional proteome. PMID:25399873

  18. Haplotype combination of the bovine CFL2 gene sequence variants and association with growth traits in Qinchuan cattle.

    PubMed

    Sun, Yujia; Lan, Xianyong; Lei, Chuzhao; Zhang, Chunlei; Chen, Hong

    2015-06-01

    The aim of this study was to examine the association of cofilin2 (CFL2) gene polymorphisms with growth traits in Chinese Qinchuan (QC) cattle. Three single nucleotide polymorphisms (SNPs) were identified in the bovine CFL2 gene using DNA sequencing and (forced) PCR-RFLP methods. These polymorphisms included a missense mutation (NC_007319.5: g. C 2213 G) in exon 4, one synonymous mutation (NC_007319.5: g. T 1694 A) in exon 4, and a mutation (NC_007319.5: g. G 1500 A) in intron 2. In addition, we evaluated the haplotype frequency and linkage disequilibrium coefficient of the three sequence variants in 488 QC cattle. All three SNPs in QC cattle belonged to an intermediate level of genetic diversity (0.25 < PIC < 0.33). Association analysis indicated that SNPs G 1500 A, T 1694 A and C 2213 G were significantly associated with growth traits in the QC population. The results of our study suggest that the CFL2 gene may be a strong candidate gene that affects growth traits in the QC cattle breeding program.
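
The abstract mentions haplotype frequencies and a linkage disequilibrium coefficient without giving formulas. As a rough sketch, the commonly used pairwise measures D, D' and r^2 can be computed from one haplotype frequency and two allele frequencies as below. This is a textbook formulation, not necessarily the authors' software, and the function name and input values are hypothetical.

```python
def ld_stats(p_ab, p_a, p_b):
    """Pairwise linkage disequilibrium between two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and B at locus 2
    p_a:  frequency of allele A at locus 1
    p_b:  frequency of allele B at locus 2
    Returns (D, D', r^2).
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0  # D normalized by its attainable maximum
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0  # squared allelic correlation
    return d, d_prime, r2


# Hypothetical frequencies, for illustration only.
d, d_prime, r2 = ld_stats(p_ab=0.40, p_a=0.50, p_b=0.60)
print(f"D = {d:.3f}, D' = {d_prime:.3f}, r^2 = {r2:.3f}")
```

In practice the haplotype frequency p_ab is itself estimated from unphased genotypes, which this sketch takes as given.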

  19. Enhancer sequence variants and transcription-factor deregulation synergize to construct pathogenic regulatory circuits in B-cell lymphoma.

    PubMed

    Koues, Olivia I; Kowalewski, Rodney A; Chang, Li-Wei; Pyfrom, Sarah C; Schmidt, Jennifer A; Luo, Hong; Sandoval, Luis E; Hughes, Tyler B; Bednarski, Jeffrey J; Cashen, Amanda F; Payton, Jacqueline E; Oltz, Eugene M

    2015-01-20

    Most B-cell lymphomas arise in the germinal center (GC), where humoral immune responses evolve from potentially oncogenic cycles of mutation, proliferation, and clonal selection. Although lymphoma gene expression diverges significantly from GC B cells, underlying mechanisms that alter the activities of corresponding regulatory elements (REs) remain elusive. Here we define the complete pathogenic circuitry of human follicular lymphoma (FL), which activates or decommissions REs from normal GC B cells and commandeers enhancers from other lineages. Moreover, independent sets of transcription factors, whose expression was deregulated in FL, targeted commandeered versus decommissioned REs. Our approach revealed two distinct subtypes of low-grade FL, whose pathogenic circuitries resembled GC B or activated B cells. FL-altered enhancers also were enriched for sequence variants, including somatic mutations, which disrupt transcription-factor binding and expression of circuit-linked genes. Thus, the pathogenic regulatory circuitry of FL reveals distinct genetic and epigenetic etiologies for GC B-cell transformation.

  20. Next-generation sequencing of hereditary hemochromatosis-related genes: Novel likely pathogenic variants found in the Portuguese population.

    PubMed

    Faria, Ricardo; Silva, Bruno; Silva, Catarina; Loureiro, Pedro; Queiroz, Ana; Fraga, Sofia; Esteves, Jorge; Mendes, Diana; Fleming, Rita; Vieira, Luís; Gonçalves, João; Faustino, Paula

    2016-10-01

    Hereditary hemochromatosis (HH) is an autosomal recessive disorder characterized by excessive iron absorption resulting in pathologically increased body iron stores. It is typically associated with common HFE gene mutations (p.Cys282Tyr and p.His63Asp). However, in Southern European populations up to one third of HH patients do not carry the risk genotypes. This study aimed to explore the use of next-generation sequencing (NGS) technology to analyse a panel of iron metabolism-related genes (HFE, TFR2, HJV, HAMP, SLC40A1, and FTL) in 87 non-classic HH Portuguese patients. A total of 1241 genetic alterations were detected corresponding to 53 different variants, 13 of which were not described in the available public databases. Among them, five were predicted to be potentially pathogenic: three novel mutations in TFR2 [two missense (p.Leu750Pro and p.Ala777Val) and one intronic splicing mutation (c.967-1G>C)], one missense mutation in HFE (p.Tyr230Cys), and one mutation in the 5'-UTR of the HAMP gene (c.-25G>A). The results reported here illustrate the usefulness of NGS for targeted iron metabolism-related gene panels, as a likely cost-effective approach for molecular genetics diagnosis of non-classic HH patients. Simultaneously, it has contributed to the knowledge of the pathophysiology of those rare iron metabolism-related disorders. PMID:27667161

  1. Stability of monomeric Cro variants: Isoenergetic transformation of a type I′ to a type II′ β-hairpin by single amino acid replacements

    PubMed Central

    Mollah, A.K.M.M.; Stennis, Rhonda L.; Mossing, Michael C.

    2003-01-01

    The thermodynamic stabilities of three monomeric variants of the bacteriophage λ Cro repressor that differ only in the sequence of two amino acids at the apex of an engineered β-hairpin have been determined. The sequences of the turns are EVK-XX-EVK, where the two central residues are DG, GG, and GT, respectively. Standard-state unfolding free energies, determined from circular dichroism measurements as a function of urea concentration, range from 2.4 to 2.7 kcal/mole, while those determined from guanidine hydrochloride range from 2.8 to 3.3 kcal/mole for the three proteins. Thermal denaturation yields van’t Hoff unfolding enthalpies of 36 to 40 kcal/mole at midpoint temperatures in the range of 53 to 58°C. Extrapolation of the thermal denaturation free energies with heat capacities of 400 to 600 cal/mole deg gives good agreement with the parameters determined in denaturant titrations. As predicted from statistical surveys of amino acid replacements in β-hairpins, energetic barriers to transformation from a type I′ turn (DG) to a type II′ turn (GT) can be quite small. PMID:12717034
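
The extrapolation from thermal denaturation to a standard-state free energy is typically done with the Gibbs-Helmholtz relation, ΔG(T) = ΔH_m (1 − T/T_m) + ΔC_p [(T − T_m) − T ln(T/T_m)]. Whether the authors used exactly this form is an assumption. The Python sketch below evaluates it with illustrative inputs taken from the middle of the ranges quoted in the abstract (38 kcal/mole, 55.5 °C, 0.5 kcal/mole deg); these midpoints and the 25 °C reference temperature are choices made here, not values reported for any single variant.

```python
import math


def unfolding_free_energy(temp_c, tm_c, dh_m, dcp):
    """Gibbs-Helmholtz unfolding free energy in kcal/mole.

    temp_c: temperature of interest (deg C)
    tm_c:   thermal denaturation midpoint (deg C), where the free energy is zero
    dh_m:   van't Hoff unfolding enthalpy at the midpoint (kcal/mole)
    dcp:    heat capacity change on unfolding (kcal/mole per degree)
    """
    t = temp_c + 273.15
    tm = tm_c + 273.15
    return dh_m * (1 - t / tm) + dcp * ((t - tm) - t * math.log(t / tm))


# Illustrative mid-range inputs; see the assumptions stated above.
dg_25 = unfolding_free_energy(temp_c=25.0, tm_c=55.5, dh_m=38.0, dcp=0.5)
print(f"Estimated unfolding free energy at 25 C: {dg_25:.2f} kcal/mole")
```

With these inputs the estimate lands inside the 2.4 to 3.3 kcal/mole window obtained from the denaturant titrations, which is the kind of agreement the abstract describes.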

  2. Stability of monomeric Cro variants: Isoenergetic transformation of a type I' to a type II' beta-hairpin by single amino acid replacements.

    PubMed

    Mollah, A K M M; Stennis, Rhonda L; Mossing, Michael C

    2003-05-01

    The thermodynamic stabilities of three monomeric variants of the bacteriophage lambda Cro repressor that differ only in the sequence of two amino acids at the apex of an engineered beta-hairpin have been determined. The sequences of the turns are EVK-XX-EVK, where the two central residues are DG, GG, and GT, respectively. Standard-state unfolding free energies, determined from circular dichroism measurements as a function of urea concentration, range from 2.4 to 2.7 kcal/mole, while those determined from guanidine hydrochloride range from 2.8 to 3.3 kcal/mole for the three proteins. Thermal denaturation yields van't Hoff unfolding enthalpies of 36 to 40 kcal/mole at midpoint temperatures in the range of 53 to 58 degrees C. Extrapolation of the thermal denaturation free energies with heat capacities of 400 to 600 cal/mole deg gives good agreement with the parameters determined in denaturant titrations. As predicted from statistical surveys of amino acid replacements in beta-hairpins, energetic barriers to transformation from a type I' turn (DG) to a type II' turn (GT) can be quite small.

  3. Predicting protein disorder by analyzing amino acid sequence

    PubMed Central

    Yang, Jack Y; Yang, Mary Qu

    2008-01-01

    Background: Many protein regions and some entire proteins have no definite tertiary structure, presenting instead as dynamic, disordered ensembles under different physicochemical circumstances. These proteins and regions are known as Intrinsically Unstructured Proteins (IUP). IUP have been associated with a wide range of protein functions, along with roles in diseases characterized by protein misfolding and aggregation. Results: Identifying IUP is an important task in structural and functional genomics. We extract useful features from sequences and develop machine learning algorithms for this task. We compare our IUP predictor with PONDRs (mainly neural-network-based predictors), disEMBL (also based on neural networks) and Globplot (based on disorder propensity). Conclusion: We find that augmenting features derived from physicochemical properties of amino acids (such as hydrophobicity, complexity, etc.) and using an ensemble method proved beneficial. The IUP predictor is a viable alternative software tool for identifying IUP protein regions and proteins. PMID:18831799

  4. Targeted sequencing of the Paget's disease associated 14q32 locus identifies several missense coding variants in RIN3 that predispose to Paget's disease of bone

    PubMed Central

    Vallet, Mahéva; Soares, Dinesh C.; Wani, Sachin; Sophocleous, Antonia; Warner, Jon; Salter, Donald M.; Ralston, Stuart H.; Albagha, Omar M.E.

    2015-01-01

    Paget's disease of bone (PDB) is a common disorder with a strong genetic component characterized by increased but disorganized bone remodelling. Previous genome-wide association studies identified a locus on chromosome 14q32 tagged by rs10498635 which was significantly associated with susceptibility to PDB in several European populations. Here we conducted fine-mapping and targeted sequencing of the candidate locus to identify possible functional variants. Imputation in 741 PDB patients and 2699 controls confirmed that the association was confined to a 60 kb region in the RIN3 gene and conditional analysis adjusting for rs10498635 identified no new independent signals. Sequencing of the RIN3 gene identified a common missense variant (p.R279C) that was strongly associated with the disease (OR = 0.64; P = 1.4 × 10^-9), and was in strong linkage disequilibrium with rs10498635. A further 13 rare missense variants were identified, seven of which were novel and detected only in PDB cases. When combined, these rare variants were over-represented in cases compared with controls (OR = 3.72; P = 8.9 × 10^-10). Most rare variants were located in a region that encodes a proline-rich, intrinsically disordered domain of the protein and many were predicted to be pathogenic. RIN3 was expressed in bone tissue and its expression level was ∼10-fold higher in osteoclasts compared with osteoblasts. We conclude that susceptibility to PDB at the 14q32 locus is mediated by a combination of common and rare coding variants in RIN3 and suggest that RIN3 may contribute to PDB susceptibility by affecting osteoclast function. PMID:25701875

  5. Phylogenetic analysis of beta-papillomaviruses as inferred from nucleotide and amino acid sequence data.

    PubMed

    Gottschling, Marc; Köhler, Anja; Stockfleth, Eggert; Nindl, Ingo

    2007-01-01

    Human papillomaviruses (HPV) of the beta-group seem to be involved in the pathogenesis of non-melanoma skin cancer. Papillomaviruses are host specific and are considered to be closely co-evolving with their hosts. Evolutionary incongruence between early genes and late genes has been reported among oncogenic genital alpha-papillomaviruses and considerably challenges phylogenetic reconstructions. We investigated the relationships of 29 beta-HPV (25 types plus four putative new types, subtypes, or variants) as inferred from codon-aligned and amino acid sequence data of the genes E1, E2, E6, E7, L1, and L2 using likelihood, distance, and parsimony approaches. An analysis of an L1 fragment included additional nucleotide and amino acid sequences from seven non-human beta-papillomaviruses. The evolution of early genes and late genes did not conflict significantly in beta-papillomaviruses based on partition homogeneity tests (p ≥ 0.001). As inferred from the complete genome analyses, beta-papillomaviruses were monophyletic and segregated into four highly supported monophyletic assemblages corresponding to the species 1, 2, 3, and fused 4/5. They basically split into the species 1 and the remainder of beta-papillomaviruses, whose species 3, 4, and 5 constituted the sister group of species 2. beta-Papillomaviruses have been isolated from humans, apes, and monkeys, and phylogenetic analyses of the L1 fragment showed non-human papillomaviruses to be highly polyphyletic, nesting within the HPV species. Thus, host and virus phylogenies were not congruent in beta-papillomaviruses, and multiple invasions across species borders may contribute (additionally to host-linked evolution) to their diversification.

  6. Heterogeneity of amino acid sequence in hippopotamus cytochrome c.

    PubMed

    Thompson, R B; Borden, D; Tarr, G E; Margoliash, E

    1978-12-25

    The amino acid sequences of chymotryptic and tryptic peptides of Hippopotamus amphibius cytochrome c were determined by a recent modification of the manual Edman sequential degradation procedure. They were ordered by comparison with the structure of the hog protein. The hippopotamus protein differs in three positions: serine, alanine, and glutamine replace alanine, glutamic acid, and lysine in positions 43, 92, and 100, respectively. Since the artiodactyl suborders diverged in the mid-Eocene some 50 million years ago, the fact that representatives of some of them show no differences in their cytochromes c (cow, sheep, and hog), while another exhibits as many as three such differences, verifies that even in relatively closely related lines of descent the rate at which cytochrome c changes in the course of evolution is not constant. Furthermore, 10.6% of the hippopotamus cytochrome c preparation was shown to contain isoleucine instead of valine at position 3, indicating that one of the four animals from which the protein was obtained was heterozygous in the cytochrome c gene. Such heterogeneity is a necessary condition of evolutionary variation and has not been previously observed in the cytochrome c of a wild mammalian population.
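
The inference that one of the four animals was heterozygous rests on simple allele counting. As a small illustrative Python sketch, assuming each animal contributed equally to the pooled preparation and both alleles are expressed equally (neither assumption is stated in the abstract), one heterozygote among four diploid animals predicts 1 variant allele in 8, close to the 10.6% observed. The function name is hypothetical.

```python
from fractions import Fraction


def expected_variant_fraction(n_animals, n_heterozygous, ploidy=2):
    """Expected fraction of variant protein in an equally pooled preparation.

    Assumes each animal contributes the same amount of protein and that each
    heterozygote expresses both alleles equally; these are simplifications.
    """
    return Fraction(n_heterozygous, n_animals * ploidy)


expected = expected_variant_fraction(n_animals=4, n_heterozygous=1)
observed = 0.106  # isoleucine at position 3, as reported in the abstract
print(f"expected {float(expected):.3f} vs observed {observed:.3f}")
```

Two heterozygotes would predict 25% under the same assumptions, which is much further from the observed value.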

  7. Analysis of coding variants identified from exome sequencing resources for association with diabetic and non-diabetic nephropathy in African Americans.

    PubMed

    Cooke Bailey, Jessica N; Palmer, Nicholette D; Ng, Maggie C Y; Bonomo, Jason A; Hicks, Pamela J; Hester, Jessica M; Langefeld, Carl D; Freedman, Barry I; Bowden, Donald W

    2014-06-01

    Prior studies have identified common genetic variants influencing diabetic and non-diabetic nephropathy, diseases which disproportionately affect African Americans. Recently, exome sequencing techniques have facilitated identification of coding variants on a genome-wide basis in large samples. Exonic variants in known or suspected end-stage kidney disease (ESKD) or nephropathy genes can be tested for their ability to identify association either singly or in combination with known associated common variants. Coding variants in genes with prior evidence for association with ESKD or nephropathy were identified in the NHLBI-ESP GO database and genotyped in 5,045 African Americans (3,324 cases with type 2 diabetes associated nephropathy [T2D-ESKD] or non-T2D ESKD, and 1,721 controls) and 1,465 European Americans (568 T2D-ESKD cases and 897 controls). Logistic regression analyses were performed to assess association, with admixture and APOL1 risk status incorporated as covariates. Ten of 31 SNPs were associated in African Americans; four replicated in European Americans. In African Americans, SNPs in OR2L8, OR2AK2, C6orf167 (MMS22L), LIMK2, APOL3, APOL2, and APOL1 were nominally associated (P = 1.8 × 10^-4 to 0.044). Haplotype analysis of common and coding variants increased evidence of association at the OR2L13 and APOL1 loci (P = 6.2 × 10^-5 and 4.6 × 10^-5, respectively). SNPs replicating in European Americans were in OR2AK2, LIMK2, and APOL2 (P = 0.0010 to 0.037). Meta-analyses highlighted four SNPs associated in T2D-ESKD and all-cause ESKD. Results from this study suggest a role for coding variants in the development of diabetic, non-diabetic, and/or all-cause ESKD in African Americans and/or European Americans.

  8. Analysis of Coding Variants Identified from Exome Sequencing Resources for Association with Diabetic and Non-diabetic Nephropathy in African Americans

    PubMed Central

    Ng, Maggie C.Y.; Bonomo, Jason A.; Hicks, Pamela J.; Hester, Jessica M.; Langefeld, Carl D.; Freedman, Barry I.; Bowden, Donald W.

    2014-01-01

    Prior studies have identified common genetic variants influencing diabetic and non-diabetic nephropathy, diseases which disproportionately affect African Americans. Recently, exome sequencing techniques have facilitated identification of coding variants on a genome-wide basis in large samples. Exonic variants in known or suspected end-stage kidney disease (ESKD) or nephropathy genes can be tested for their ability to identify association either singly or in combination with known associated common variants. Coding variants in genes with prior evidence for association with ESKD or nephropathy were identified in the NHLBI-ESP GO database and genotyped in 5045 African Americans (3324 cases with type 2 diabetes associated nephropathy [T2D-ESKD] or non-T2D ESKD, and 1721 controls) and 1465 European Americans (568 T2D-ESKD cases and 897 controls). Logistic regression analyses were performed to assess association, with admixture and APOL1 risk status incorporated as covariates. Ten of 31 SNPs were associated in African Americans; four replicated in European Americans. In African Americans, SNPs in OR2L8, OR2AK2, C6orf167 (MMS22L), LIMK2, APOL3, APOL2, and APOL1 were nominally associated (P = 1.8 × 10^-4 to 0.044). Haplotype analysis of common and coding variants increased evidence of association at the OR2L13 and APOL1 loci (P = 6.2 × 10^-5 and 4.6 × 10^-5, respectively). SNPs replicating in European Americans were in OR2AK2, LIMK2, and APOL2 (P = 0.0010 to 0.037). Meta-analyses highlighted four SNPs associated in T2D-ESKD and all-cause ESKD. Results from this study suggest a role for coding variants in the development of diabetic, non-diabetic, and/or all-cause ESKD in African Americans and/or European Americans. PMID:24385048

  9. Amino acid substitutions in naturally occurring variants of ail result in altered invasion activity.

    PubMed Central

    Beer, K B; Miller, V L

    1992-01-01

    Yersinia enterocolitica is the causative agent of a variety of gastrointestinal syndromes ranging from acute enteritis to mesenteric lymphadenitis. In addition, systemic infections resulting in high mortality rates can occur in elderly and immunocompromised patients. More than 50 serotypes of Y. enterocolitica have been identified, but only a few of them commonly cause disease in otherwise healthy hosts. Those serotypes that cause disease have been divided into two groups, American and non-American, based on their geographical distributions, biotypes, and pathogenicity. We have been studying two genes, inv and ail, from Y. enterocolitica that confer in tissue culture assays an invasive phenotype that strongly correlates with virulence. Some differences between the American and non-American serotypes at the ail locus were noted previously and have been investigated further in this report. The ail locus was cloned from seven Y. enterocolitica strains (seven different serotypes). Although the different clones produced similar amounts of Ail, the product of the ail gene from non-American serotypes (AilNA) was less able to promote invasion by Escherichia coli than was the product of the ail gene from American serotypes (AilA). This difference is probably due to one or more of the eight amino acid changes found in the derived amino acid sequence for the mature form of AilNA compared with that of AilA. Seven of these changes are predicted to be in cell surface domains of the protein (a model for the proposed folding of Ail within the outer membrane is presented). These results are discussed in relation to the growing family of outer membrane proteins, which includes Lom from bacteriophage lambda, PagC from Salmonella typhimurium, and OmpX from Enterobacter cloacae. PMID:1370953

  10. Whole-exome sequencing and imaging genetics identify functional variants for rate of change in hippocampal volume in mild cognitive impairment.

    PubMed

    Nho, K; Corneveaux, J J; Kim, S; Lin, H; Risacher, S L; Shen, L; Swaminathan, S; Ramanan, V K; Liu, Y; Foroud, T; Inlow, M H; Siniard, A L; Reiman, R A; Aisen, P S; Petersen, R C; Green, R C; Jack, C R; Weiner, M W; Baldwin, C T; Lunetta, K; Farrer, L A; Furney, S J; Lovestone, S; Simmons, A; Mecocci, P; Vellas, B; Tsolaki, M; Kloszewska, I; Soininen, H; McDonald, B C; Farlow, M R; Ghetti, B; Huentelman, M J; Saykin, A J

    2013-07-01

    Whole-exome sequencing of individuals with mild cognitive impairment, combined with genotype imputation, was used to identify coding variants other than the apolipoprotein E (APOE) ε4 allele associated with rate of hippocampal volume loss using an extreme trait design. Matched unrelated APOE ε3 homozygous male Caucasian participants from the Alzheimer's Disease Neuroimaging Initiative (ADNI) were selected at the extremes of the 2-year longitudinal change distribution of hippocampal volume (eight subjects with rapid rates of atrophy and eight with slow/stable rates of atrophy). We identified 57 non-synonymous single nucleotide variants (SNVs) which were found exclusively in at least 4 of 8 subjects in the rapid atrophy group, but not in any of the 8 subjects in the slow atrophy group. Among these SNVs, the variants that accounted for the greatest group difference and were predicted in silico as 'probably damaging' missense variants were rs9610775 (CARD10) and rs1136410 (PARP1). To further investigate and extend the exome findings in a larger sample, we conducted quantitative trait analysis including whole-brain search in the remaining ADNI APOE ε3/ε3 group (N=315). Genetic variation within PARP1 and CARD10 was associated with rate of hippocampal neurodegeneration in APOE ε3/ε3. Meta-analysis across five independent cross sectional cohorts indicated that rs1136410 is also significantly associated with hippocampal volume in APOE ε3/ε3 individuals (N=923). Larger sequencing studies and longitudinal follow-up are needed for confirmation. The combination of next-generation sequencing and quantitative imaging phenotypes holds significant promise for discovery of variants involved in neurodegeneration. PMID:23608917

  11. Polymorphic sites in the African population detected by sequence analysis of the glucose-6-phosphate dehydrogenase gene outline the evolution of the variants A and A-.

    PubMed Central

    Vulliamy, T J; Othman, A; Town, M; Nathwani, A; Falusi, A G; Mason, P J; Luzzatto, L

    1991-01-01

    The human X chromosome-linked gene encoding glucose-6-phosphate dehydrogenase (G6PD; EC 1.1.1.49) is known to be highly polymorphic from the biochemical characterization of enzyme variants. The variant A (with enzyme activity in the normal range) and the variant A- (associated with enzyme deficiency) each have a frequency of about 0.2 in several African populations. Two restriction fragment length polymorphisms have also been found in people of African descent, but not in other populations, whereas a silent mutation has been shown to be polymorphic in Mediterranean, Middle Eastern, African, and Indian populations. We report now on two additional polymorphisms that we have detected by sequence analysis, one in intron 7 and one in intron 8. The analysis of 54 African male subjects for the seven polymorphic sites, clustered within 3 kilobases of the G6PD gene, has revealed only 7 of the 128 possible haplotypes, indicating marked linkage disequilibrium. These data have enabled us to suggest an evolutionary pathway for the different mutations, with only a single ambiguity. The mutation underlying the A variant is the most ancient and the mutation underlying the A- variant is the most recent. Since it seems reasonable that the A- allele is subject to positive selection by malaria, whereas the other alleles are neutral, G6PD may lend itself to the analysis of the role of random genetic drift and selection in determining allele frequencies within a single genetic locus in human populations. PMID:1924316
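
The figure of 128 possible haplotypes follows from seven biallelic sites: each site independently carries one of two alleles, giving 2^7 combinations. The short Python sketch below makes the count explicit and enumerates the combinations; the 0/1 allele coding and the function names are illustrative choices, not notation from the study.

```python
from itertools import product


def possible_haplotypes(n_sites, alleles_per_site=2):
    """Number of distinct haplotypes if every site varies independently."""
    return alleles_per_site ** n_sites


def enumerate_haplotypes(n_sites, alleles=(0, 1)):
    """Yield every combination of alleles across the sites as a tuple."""
    return product(alleles, repeat=n_sites)


total = possible_haplotypes(7)
observed = 7  # haplotypes actually seen among the 54 subjects, per the abstract
print(f"{observed} of {total} possible haplotypes observed")
```

Seeing only 7 of these combinations in 54 X chromosomes is what the abstract reads as marked linkage disequilibrium.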

  12. A variant in the sonic hedgehog regulatory sequence (ZRS) is associated with triphalangeal thumb and deregulates expression in the developing limb

    PubMed Central

    Furniss, Dominic; Lettice, Laura A.; Taylor, Indira B.; Critchley, Paul S.; Giele, Henk; Hill, Robert E.; Wilkie, Andrew O.M.

    2008-01-01

    A locus for triphalangeal thumb, variably associated with pre-axial polydactyly, was previously identified in the zone of polarizing activity regulatory sequence (ZRS), a long range limb-specific enhancer of the Sonic Hedgehog (SHH) gene at human chromosome 7q36.3. Here, we demonstrate that a 295T>C variant in the human ZRS, previously thought to represent a neutral polymorphism, acts as a dominant allele with reduced penetrance. We found this variant in three independently ascertained probands from southern England with triphalangeal thumb, demonstrated significant linkage of the phenotype to the variant (LOD = 4.1), and identified a shared microsatellite haplotype around the ZRS, suggesting that the probands share a common ancestor. An individual homozygous for the 295C allele presented with isolated bilateral triphalangeal thumb resembling the heterozygous phenotype, suggesting that the variant is largely dominant to the wild-type allele. As a functional test of the pathogenicity of the 295C allele, we utilized a mutated ZRS construct to demonstrate that it can drive ectopic anterior expression of a reporter gene in the developing mouse forelimb. We conclude that the 295T>C variant is in fact pathogenic and, in southern England, appears to be the most common cause of triphalangeal thumb. Depending on the dispersal of the founding mutation, it may play a wider role in the aetiology of this disorder. PMID:18463159

  13. Development and Validation of a Template-Independent Next-Generation Sequencing Assay for Detecting Low-Level Resistance-Associated Variants of Hepatitis C Virus.

    PubMed

    Wei, Bo; Kang, John; Kibukawa, Miho; Chen, Lei; Qiu, Ping; Lahser, Fred; Marton, Matthew; Levitan, Diane

    2016-09-01

    To develop hepatitis C virus (HCV) direct-acting antiviral (DAA) drugs that can treat most HCV genotypes and offer higher barriers for treatment-resistant mutations, it is important to study resistance-associated variants (RAVs). Current commercially available RAV detection assays rely on genotype- or subtype-specific template-dependent PCR amplification. These assays are limited to genotypes and subtypes that are often prevalent in developed countries because of availability of public sequence databases. To support global clinical trials of DAAs, we developed and validated a template-independent (TI) next-generation sequencing (NGS) assay for HCV whole genome sequencing that can perform HCV subtyping, detect HCV mixed genotype or subtype infection, and identify low-level RAVs at a 5% fraction of the viral population with sensitivity and positive predictive value ≥ 0.9. We compared TI-NGS with commercial genotype- or subtype-specific Sanger sequencing assays, and found that TI-NGS both confirmed most of the variants called by Sanger sequencing and avoided biases likely caused by PCR primers used in Sanger sequencing. To confirm the TI-NGS assay's variant calls at the positions discrepant with Sanger sequencing, we custom designed template-dependent NGS assays and obtained 100% concordance with the TI-NGS assay. The ability to reliably detect low-level RAVs in HCV samples of any subtype without PCR primer-related bias makes this TI-NGS assay an important tool in studying HCV DAA drug resistance. PMID:27393904
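
The validation criterion quoted above (sensitivity and positive predictive value of at least 0.9) uses the standard definitions, sketched below. The counts are invented for illustration and are not taken from the study; the function names are hypothetical.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true variants that the assay called."""
    if tp + fn == 0:
        raise ValueError("no true variants to evaluate")
    return tp / (tp + fn)


def positive_predictive_value(tp: int, fp: int) -> float:
    """Fraction of assay calls that are true variants."""
    if tp + fp == 0:
        raise ValueError("no variant calls to evaluate")
    return tp / (tp + fp)


def meets_threshold(tp: int, fp: int, fn: int, threshold: float = 0.9) -> bool:
    """True when both metrics reach the threshold."""
    return (
        sensitivity(tp, fn) >= threshold
        and positive_predictive_value(tp, fp) >= threshold
    )


# Hypothetical counts: 95 true positives, 5 false positives, 8 false negatives.
tp, fp, fn = 95, 5, 8
print(round(sensitivity(tp, fn), 3))
print(round(positive_predictive_value(tp, fp), 3))
print(meets_threshold(tp, fp, fn))
```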

  14. Lake Louise mutation detection meeting 2013: clinical translation of next-generation sequencing requires optimization of workflows and interpretation of variants.

    PubMed

    Smith, Amanda; Boycott, Kym M; Jarinova, Olga

    2014-02-01

    With the exponential reduction of the cost of next-generation sequencing (NGS), it is no longer the generation of data but the analysis and interpretation of massive amounts of sequencing data that are seen as key challenges for the effective integration of these technologies into clinical practice. Clinical geneticists, informaticians, and scientists from 17 countries gathered for the 12th International Symposium on Mutation in the Genome at the Fairmont Chateau Lake Louise (Canada) to discuss technological advances and applications of NGS and consider possible approaches to the challenges of clinical translation. Here, we provide an overview of the main themes of the meeting that included development of innovative solutions for variant sharing, tools and resources for NGS analysis, novel technology and methodology development, NGS-based discovery of disease pathogenesis, development of multigene NGS sequencing panels for clinical use, exploring diagnostic utility of whole-exome and whole-genome sequencing, and, finally, integration of genomic sequencing into the clinic.

  15. Human liver apolipoprotein B-100 cDNA: complete nucleic acid and derived amino acid sequence.

    PubMed Central

    Law, S W; Grant, S M; Higuchi, K; Hospattankar, A; Lackner, K; Lee, N; Brewer, H B

    1986-01-01

    Human apolipoprotein B-100 (apoB-100), the ligand on low density lipoproteins that interacts with the low density lipoprotein receptor and initiates receptor-mediated endocytosis and low density lipoprotein catabolism, has been cloned, and the complete nucleic acid and derived amino acid sequences have been determined. ApoB-100 cDNAs were isolated from normal human liver cDNA libraries utilizing immunoscreening as well as filter hybridization with radiolabeled apoB-100 oligodeoxynucleotides. The apoB-100 mRNA is 14.1 kilobases long encoding a mature apoB-100 protein of 4536 amino acids with a calculated amino acid molecular weight of 512,723. ApoB-100 contains 20 potential glycosylation sites, and 12 of a total of 25 cysteine residues are located in the amino-terminal region of the apolipoprotein providing a potential globular structure of the amino terminus of the protein. ApoB-100 contains relatively few regions of amphipathic helices, but compared to other human apolipoproteins it is enriched in beta-structure. The delineation of the entire human apoB-100 sequence will now permit a detailed analysis of the conformation of the protein, the low density lipoprotein receptor binding domain(s), and the structural relationship between apoB-100 and apoB-48 and will provide the basis for the study of genetic defects in apoB-100 in patients with dyslipoproteinemias. PMID:3464946

  16. Mutation analysis and characterization of HSD17B2 sequence variants in breast cancer cases from French Canadian families with high risk of breast and ovarian cancer.

    PubMed

    Plourde, Marie; Manhes, Caroline; Leblanc, Gilles; Durocher, Francine; Dumont, Martine; Sinilnikova, Olga; Simard, Jacques

    2008-04-01

    Estrogen exposure is a risk factor for breast cancer. Given that the HSD17B2 gene encodes an enzyme that catalyses estradiol inactivation, it appears to be a good candidate breast cancer susceptibility gene. This study was designed to screen for HSD17B2 germline mutations potentially involved in breast cancer predisposition. Our re-sequencing analysis did not identify any deleterious germline mutations, and therefore mutations in HSD17B2 do not explain the clustering of breast cancer cases in non-BRCA1/2 high-risk French Canadian families. However, six sequence variants were identified, including two novel missense variants. Expression assays revealed that p.Ala111Asp and p.Gly160Arg did not alter the catalytic properties of 17beta-hydroxysteroid dehydrogenase type 2 enzyme, although p.Ala111Asp appears to affect protein stability resulting in significant decreases in the protein levels, providing valuable information on structure-function relationship.

  17. A new natural hGH variant--17.5 kd--produced by alternative splicing. An additional consensus sequence which might play a role in branchpoint selection.

    PubMed Central

    Lecomte, C M; Renard, A; Martial, J A

    1987-01-01

    From a human pituitary cDNA library, we have cloned 3 distinct human growth hormone (hGH) cDNAs, coding respectively for the 22 K hGH, the 20 K variant, and a yet unknown 17.5 K variant. S1 mapping analysis using human pituitary RNA confirms the existence of at least four distinct hGH mRNAs originating from alternative acceptor sites at the second intron of the primary transcript. We have analysed the hGH gene sequence to explain the high frequency of alternative splicings which occur only at this location. In this study we propose CTTGNNPyPyPy as an additional consensus sequence guiding the selection of the branched nucleotide. PMID:3627992
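
The proposed consensus CTTGNNPyPyPy can be read as a pattern in which N is any nucleotide and Py is a pyrimidine (C or T). A minimal sketch of scanning a DNA string for it follows; the test sequence is made up and is not the hGH gene sequence, and `find_motif` is a hypothetical helper.

```python
import re
from typing import List, Tuple

# CTTG, then two arbitrary bases (N N), then three pyrimidines (Py = C or T).
# The lookahead lets overlapping occurrences be reported as well.
BRANCH_MOTIF = re.compile(r"(?=(CTTG[ACGT]{2}[CT]{3}))")


def find_motif(seq: str) -> List[Tuple[int, str]]:
    """Return (0-based start, matched 9-mer) for each motif occurrence."""
    seq = seq.upper()
    return [(m.start(), m.group(1)) for m in BRANCH_MOTIF.finditer(seq)]


# Made-up example sequence containing one occurrence.
example = "AAGCTTGAGTCCTTAG"
print(find_motif(example))
```

For RNA input, the same pattern would need U in place of T.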

  18. A polymorphic variant of human erythrocyte carbonic anhydrase I with a widespread distribution in Australian aborigines, CAI Australia-9 (8 Asp leads to Gly): purification, properties, amino acid substitution, and possible physiological significance of the variant enzyme.

    PubMed

    Jones, G L; Shaw, D C

    1982-10-01

    Carbonic anhydrase I (EC 4.2.1.1) purified from the pooled packed red blood cells of 100 individuals typed as heterozygous for the common Australian Aboriginal carbonic anhydrase I variant CAI Australia-9 had a slightly higher specific CO2 hydratase or esterase (toward p-nitrophenyl acetate) activity than the normal component and a higher Km and Vmax using the esterase substrate. The variant enzyme was slightly more resistant to heat inactivation. The extent of inhibition of both enzymes by the specific inhibitor acetazolamide was identical, as was their immunological behavior and the lability of the active-site zinc ion. The variant enzyme was more resistant to chloride inhibition. The physiological importance of this observation is discussed in the context of a proposed adaptive advantage of the variant gene in the arid western and central regions of Australia. The amino acid substitution in the Aboriginal variant of a glycine for an aspartic acid residue has been located at residue 8 from the N terminus (i.e., 8 Asp leads to Gly), by proteolytic and partial acid hydrolyses. The possible effects of this substitution on the structure and function of the molecule are discussed. PMID:6817746

  19. Sequencing the GRHL3 Coding Region Reveals Rare Truncating Mutations and a Common Susceptibility Variant for Nonsyndromic Cleft Palate.

    PubMed

    Mangold, Elisabeth; Böhmer, Anne C; Ishorst, Nina; Hoebel, Ann-Kathrin; Gültepe, Pinar; Schuenke, Hannah; Klamt, Johanna; Hofmann, Andrea; Gölz, Lina; Raff, Ruth; Tessmann, Peter; Nowak, Stefanie; Reutter, Heiko; Hemprich, Alexander; Kreusch, Thomas; Kramer, Franz-Josef; Braumann, Bert; Reich, Rudolf; Schmidt, Gül; Jäger, Andreas; Reiter, Rudolf; Brosch, Sibylle; Stavusis, Janis; Ishida, Miho; Seselgyte, Rimante; Moore, Gudrun E; Nöthen, Markus M; Borck, Guntram; Aldhorae, Khalid A; Lace, Baiba; Stanier, Philip; Knapp, Michael; Ludwig, Kerstin U

    2016-04-01

    Nonsyndromic cleft lip with/without cleft palate (nsCL/P) and nonsyndromic cleft palate only (nsCPO) are the most frequent subphenotypes of orofacial clefts. A common syndromic form of orofacial clefting is Van der Woude syndrome (VWS) where individuals have CL/P or CPO, often but not always associated with lower lip pits. Recently, ∼5% of VWS-affected individuals were identified with mutations in the grainy head-like 3 gene (GRHL3). To investigate GRHL3 in nonsyndromic clefting, we sequenced its coding region in 576 Europeans with nsCL/P and 96 with nsCPO. Most strikingly, nsCPO-affected individuals had a higher minor allele frequency for rs41268753 (0.099) than control subjects (0.049; p = 1.24 × 10^-2). This association was replicated in nsCPO/control cohorts from Latvia, Yemen, and the UK (p_combined = 2.63 × 10^-5; OR_allelic = 2.46 [95% CI 1.6-3.7]) and reached genome-wide significance in combination with imputed data from a GWAS in nsCPO triads (p = 2.73 × 10^-9). Notably, rs41268753 is not associated with nsCL/P (p = 0.45). rs41268753 encodes the highly conserved p.Thr454Met (c.1361C>T) (GERP = 5.3), which prediction programs denote as deleterious, has a CADD score of 29.6, and increases protein binding capacity in silico. Sequencing also revealed four novel truncating GRHL3 mutations including two that were de novo in four families, where all nine individuals harboring mutations had nsCPO. This is important for genetic counseling: given that VWS is rare compared to nsCPO, our data suggest that dominant GRHL3 mutations are more likely to cause nonsyndromic than syndromic CPO. Thus, with rare dominant mutations and a common risk variant in the coding region, we have identified an important contribution for GRHL3 in nsCPO. PMID:27018475
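
An allelic odds ratio compares the odds of carrying the minor allele in cases and in controls. The sketch below applies that definition to the two minor allele frequencies quoted for the sequenced sample (0.099 and 0.049). The resulting value is a back-of-envelope illustration only: the abstract does not report an odds ratio for this sample, and the 2.46 it gives refers to the combined replication cohorts.

```python
def allelic_odds_ratio(maf_cases: float, maf_controls: float) -> float:
    """Odds ratio for the minor allele, computed from allele frequencies."""
    for freq in (maf_cases, maf_controls):
        if not 0.0 < freq < 1.0:
            raise ValueError("allele frequencies must lie strictly between 0 and 1")
    odds_cases = maf_cases / (1.0 - maf_cases)
    odds_controls = maf_controls / (1.0 - maf_controls)
    return odds_cases / odds_controls


# Frequencies of rs41268753 quoted in the abstract (nsCPO cases vs. controls).
discovery_or = allelic_odds_ratio(0.099, 0.049)
print(f"{discovery_or:.2f}")
```

A confidence interval would require the underlying allele counts, which the abstract does not give.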

  20. Sequencing the GRHL3 Coding Region Reveals Rare Truncating Mutations and a Common Susceptibility Variant for Nonsyndromic Cleft Palate

    PubMed Central

    Mangold, Elisabeth; Böhmer, Anne C.; Ishorst, Nina; Hoebel, Ann-Kathrin; Gültepe, Pinar; Schuenke, Hannah; Klamt, Johanna; Hofmann, Andrea; Gölz, Lina; Raff, Ruth; Tessmann, Peter; Nowak, Stefanie; Reutter, Heiko; Hemprich, Alexander; Kreusch, Thomas; Kramer, Franz-Josef; Braumann, Bert; Reich, Rudolf; Schmidt, Gül; Jäger, Andreas; Reiter, Rudolf; Brosch, Sibylle; Stavusis, Janis; Ishida, Miho; Seselgyte, Rimante; Moore, Gudrun E.; Nöthen, Markus M.; Borck, Guntram; Aldhorae, Khalid A.; Lace, Baiba; Stanier, Philip; Knapp, Michael; Ludwig, Kerstin U.

    2016-01-01

    Nonsyndromic cleft lip with/without cleft palate (nsCL/P) and nonsyndromic cleft palate only (nsCPO) are the most frequent subphenotypes of orofacial clefts. A common syndromic form of orofacial clefting is Van der Woude syndrome (VWS) where individuals have CL/P or CPO, often but not always associated with lower lip pits. Recently, ∼5% of VWS-affected individuals were identified with mutations in the grainy head-like 3 gene (GRHL3). To investigate GRHL3 in nonsyndromic clefting, we sequenced its coding region in 576 Europeans with nsCL/P and 96 with nsCPO. Most strikingly, nsCPO-affected individuals had a higher minor allele frequency for rs41268753 (0.099) than control subjects (0.049; p = 1.24 × 10^-2). This association was replicated in nsCPO/control cohorts from Latvia, Yemen, and the UK (p_combined = 2.63 × 10^-5; OR_allelic = 2.46 [95% CI 1.6–3.7]) and reached genome-wide significance in combination with imputed data from a GWAS in nsCPO triads (p = 2.73 × 10^-9). Notably, rs41268753 is not associated with nsCL/P (p = 0.45). rs41268753 encodes the highly conserved p.Thr454Met (c.1361C>T) (GERP = 5.3), which prediction programs denote as deleterious, has a CADD score of 29.6, and increases protein binding capacity in silico. Sequencing also revealed four novel truncating GRHL3 mutations including two that were de novo in four families, where all nine individuals harboring mutations had nsCPO. This is important for genetic counseling: given that VWS is rare compared to nsCPO, our data suggest that dominant GRHL3 mutations are more likely to cause nonsyndromic than syndromic CPO. Thus, with rare dominant mutations and a common risk variant in the coding region, we have identified an important contribution for GRHL3 in nsCPO. PMID:27018475

  1. Host Competence and Helicase Activity Differences Exhibited by West Nile Viral Variants Expressing NS3-249 Amino Acid Polymorphisms

    PubMed Central

    Langevin, Stanley A.; Bowen, Richard A.; Reisen, William K.; Andrade, Christy C.; Ramey, Wanichaya N.; Maharaj, Payal D.; Anishchenko, Michael; Kenney, Joan L.; Duggal, Nisha K.; Romo, Hannah; Bera, Aloke Kumar; Sanders, Todd A.; Bosco-Lauth, Angela; Smith, Janet L.; Kuhn, Richard; Brault, Aaron C.

    2014-01-01

    A single helicase amino acid substitution, NS3-T249P, has been shown to increase viremia magnitude/mortality in American crows (AMCRs) following West Nile virus (WNV) infection. Lineage/intra-lineage geographic variants exhibit consistent amino acid polymorphisms at this locus; however, the majority of WNV isolates associated with recent outbreaks reported worldwide have a proline at the NS3-249 residue. In order to evaluate the impact of NS3-249 variants on avian and mammalian virulence, multiple amino acid substitutions were engineered into a WNV infectious cDNA (NY99; NS3-249P) and the resulting viruses inoculated into AMCRs, house sparrows (HOSPs) and mice. Differential viremia profiles were observed between mutant viruses in the two bird species; however, the NS3-249P virus produced the highest mean peak viral loads in both avian models. In contrast, this avian modulating virulence determinant had no effect on LD50 or the neurovirulence phenotype in the murine model. Recombinant helicase proteins demonstrated variable helicase and ATPase activities; however, differences did not correlate with avian or murine viremia phenotypes. These in vitro and in vivo data indicate that avian-specific phenotypes are modulated by critical viral-host protein interactions involving the NS3-249 residue that directly influence transmission efficiency and therefore the magnitude of WNV epizootics in nature. PMID:24971589

  2. Association Between Variants of PRDM1 and NDP52 and Crohn’s Disease, Based on Exome Sequencing and Functional Studies

    PubMed Central

    Ellinghaus, David; Zhang, Hu; Zeissig, Sebastian; Lipinski, Simone; Till, Andreas; Jiang, Tao; Stade, Björn; Bromberg, Yana; Ellinghaus, Eva; Keller, Andreas; Rivas, Manuel A; Skieceviciene, Jurgita; Doncheva, Nadezhda T; Liu, Xiao; Liu, Qing; Jiang, Fuman; Forster, Michael; Mayr, Gabriele; Albrecht, Mario; Häsler, Robert; Boehm, Bernhard O; Goodall, Jane; Berzuini, Carlo R; Lee, James; Andersen, Vibeke; Vogel, Ulla; Kupcinskas, Limas; Kayser, Manfred; Krawczak, Michael; Nikolaus, Susanna; Weersma, Rinse K; Ponsioen, Cyriel Y; Sans, Miquel; Wijmenga, Cisca; Strachan, David P; McArdle, Wendy L; Vermeire, Séverine; Rutgeerts, Paul; Sanderson, Jeremy D; Mathew, Christopher G; Vatn, Morten H; Wang, Jun; Nöthen, Markus M; Duerr, Richard H; Büning, Carsten; Brand, Stephan; Glas, Jürgen; Winkelmann, Juliane; Illig, Thomas; Latiano, Anna; Annese, Vito; Halfvarson, Jonas; D’Amato, Mauro; Daly, Mark J; Nothnagel, Michael; Karlsen, Tom H; Subramani, Suresh; Rosenstiel, Philip; Schreiber, Stefan; Parkes, Miles; Franke, Andre

    2013-01-01

    Background & Aims: Genome-wide association studies (GWASs) have identified 140 Crohn’s disease (CD) susceptibility loci. For most loci, the variants that cause disease are not known and the genes affected by these variants have not been identified. We aimed to identify variants that cause CD through detailed sequencing, genetic association, expression, and functional studies. Methods: We sequenced whole exomes of 42 unrelated subjects with Crohn’s disease (CD) and 5 healthy individuals (controls), and then filtered single-nucleotide variants by incorporating association results from meta-analyses of CD GWASs and in silico mutation effect prediction algorithms. We then genotyped 9348 patients with CD, 2868 with ulcerative colitis, and 14,567 controls, and analyzed associated variants in functional studies using materials from patients and controls and in vitro model systems. Results: We identified rare missense mutations in PR domain-containing 1 (PRDM1) and associated these with CD. These increased proliferation of T cells and secretion of cytokines upon activation, and increased expression of the adhesion molecule L-selectin. A common CD risk allele, identified in GWASs, correlated with reduced expression of PRDM1 in ileal biopsies and peripheral blood mononuclear cells (combined P = 1.6 × 10^-8). We identified an association between CD and a common missense variant, Val248Ala, in nuclear domain 10 protein 52 (NDP52) (P = 4.83 × 10^-9). We found that this variant impairs the regulatory functions of NDP52 to inhibit NFκB activation of genes that regulate inflammation and affect stability of proteins in toll-like receptor pathways. Conclusions: We have extended GWAS results and provide evidence that variants in PRDM1 and NDP52 determine susceptibility to CD. PRDM1 maps adjacent to a CD interval identified in GWASs and encodes a transcription factor expressed by T and B cells. NDP52 is an adaptor protein that functions in selective autophagy of intracellular bacteria and

  3. Human retroviruses and AIDS 1996. A compilation and analysis of nucleic acid and amino acid sequences

    SciTech Connect

    Myers, G.; Foley, B.; Korber, B.; Mellors, J.W.; Jeang, K.T.; Wain-Hobson, S.

    1997-04-01

    This compendium and the accompanying floppy diskettes are the result of an effort to compile and rapidly publish all relevant molecular data concerning the human immunodeficiency viruses (HIV) and related retroviruses. The scope of the compendium and database is best summarized by the five parts that it comprises: (1) Nucleic Acid Alignments and Sequences; (2) Amino Acid Alignments; (3) Analysis; (4) Related Sequences; and (5) Database Communications. Information within all the parts is updated throughout the year on the Web site, http://hiv-web.lanl.gov. While this publication could take the form of a review or sequence monograph, it is not so conceived. Instead, the literature from which the database is derived has simply been summarized and some elementary computational analyses have been performed upon the data. Interpretation and commentary have been avoided insofar as possible so that the reader can form his or her own judgments concerning the complex information. In addition to the general descriptions of the parts of the compendium, the user should read the individual introductions for each part.

  4. A HSV-1 variant (1720) generates four equimolar isomers despite a 9200-bp deletion from TRL and sequences between 9200 np and 97,000 np in inverted orientation being covalently bound to sequences 94,000-126,372 np.

    PubMed

    Harland, J; Brown, S M

    1992-08-01

    The genome structure of a spontaneously generated HSV-1 strain 17 variant, 1720, has been determined by restriction endonuclease and Southern blot analysis. The short segment of 1720 is unaltered compared to the parental strain 17 genome, whereas the long segment is extensively rearranged. Almost all of TRL (approximately 9.2 kb) has been deleted and consequently IRL is converted into unique sequence. Sequences from approximately 9200 nucleotide position (np) to 97,000 np are present in inverted orientation, covalently bound to sequences in the prototype orientation from approximately 94,000 np to the L/S junction at 126,372 np. Thus, sequences from 94,000 np to 97,000 np are now diploid, with one copy in the normal orientation and location, and the other at the long terminus as an inverted repeat; no inversion of the intervening unique sequences occurs about this novel inverted repeat. In contrast, normal inversions of the long and short segments occur to give four equimolar genomic isomers, indicating that the novel long terminus has gained an "a" sequence. The duplication of sequences between 94,000 np and 97,000 np results in a genome containing two copies of UL43 and one complete and one partial copy each of genes UL42 and UL44 encoding the 65 kD DNA-binding protein and glycoprotein C, respectively. The variant has been shown to grow normally in vitro following high multiplicity infection.

  5. Relationship between a Common Variant in the Fatty Acid Desaturase (FADS) Cluster and Eicosanoid Generation in Humans*

    PubMed Central

    Hester, Austin G.; Murphy, Robert C.; Uhlson, Charis J.; Ivester, Priscilla; Lee, Tammy C.; Sergeant, Susan; Miller, Leslie R.; Howard, Timothy D.; Mathias, Rasika A.; Chilton, Floyd H.

    2014-01-01

    Dramatic shifts in the Western diet have led to a marked increase in the dietary intake of the n-6 polyunsaturated fatty acid (PUFA), linoleic acid (LA). Dietary LA can then be converted to arachidonic acid (ARA) in three enzymatic steps. Two of these steps are catalyzed by enzymes encoded in the fatty acid desaturase (FADS) cluster (chromosome 11, 11q12.2-q13), and certain genetic variants within the cluster are highly associated with ARA levels. However, no study to date has examined whether these variants further influence pro-inflammatory, cyclooxygenase and lipoxygenase eicosanoid products. This study examined the impact of a highly influential FADS SNP, rs174537, on leukotriene, HETE, prostaglandin, and thromboxane biosynthesis in stimulated whole blood. Thirty subjects were genotyped at rs174537 (GG, n = 11; GT, n = 13; TT, n = 6), a panel of fatty acids from whole serum was analyzed, and precursor-to-product PUFA ratios were calculated as a marker of the capacity of tissues (particularly the liver) to synthesize long chain PUFAs. Eicosanoids produced by stimulated human blood were measured by LC-MS/MS. We observed an association between rs174537 and the ratio of ARA/LA, leukotriene B4, and 5-HETE but no effect on levels of cyclooxygenase products. Our results suggest that variation at rs174537 impacts not only the synthesis of ARA but also the overall capacity of whole blood to synthesize 5-lipoxygenase products; these genotype-related changes in eicosanoid levels could have important implications in a variety of inflammatory diseases. PMID:24962583
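
The genotype counts given for rs174537 (GG, n = 11; GT, n = 13; TT, n = 6) determine the allele frequencies in this sample by simple allele counting. A minimal sketch follows; the function name is hypothetical, and the frequencies are derived here rather than reported in the abstract.

```python
from typing import Dict


def allele_frequencies(n_gg: int, n_gt: int, n_tt: int) -> Dict[str, float]:
    """Allele frequencies from diploid genotype counts (two alleles per subject)."""
    total_alleles = 2 * (n_gg + n_gt + n_tt)
    if total_alleles == 0:
        raise ValueError("no genotyped subjects")
    g_count = 2 * n_gg + n_gt  # each GG carries two G alleles, each GT one
    t_count = 2 * n_tt + n_gt  # each TT carries two T alleles, each GT one
    return {"G": g_count / total_alleles, "T": t_count / total_alleles}


# Genotype counts quoted in the abstract (30 subjects, 60 alleles).
freqs = allele_frequencies(n_gg=11, n_gt=13, n_tt=6)
print({allele: round(f, 3) for allele, f in freqs.items()})
```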

  7. Differentially Expressed Genes in Endometrium and Corpus Luteum of Holstein Cows Selected for High and Low Fertility Are Enriched for Sequence Variants Associated with Fertility.

    PubMed

    Moore, Stephen G; Pryce, Jennie E; Hayes, Ben J; Chamberlain, Amanda J; Kemper, Kathryn E; Berry, Donagh P; McCabe, Matt; Cormican, Paul; Lonergan, Pat; Fair, Trudee; Butler, Stephen T

    2016-01-01

    Despite the importance of fertility in humans and livestock, there has been little success dissecting the genetic basis of fertility. Our hypothesis was that genes differentially expressed in the endometrium and corpus luteum on Day 13 of the estrous cycle between cows with either good or poor genetic merit for fertility would be enriched for genetic variants associated with fertility. We combined a unique genetic model of fertility (cattle that have been selected for high and low fertility and show substantial difference in fertility) with gene expression data from these cattle and genome-wide association study (GWAS) results in ∼20,000 cattle to identify quantitative trait loci (QTL) regions and sequence variants associated with genetic variation in fertility. Two hundred and forty-five QTL regions and 17 sequence variants associated primarily with prostaglandin F2alpha, steroidogenesis, mRNA processing, energy status, and immune-related processes were identified. Ninety-three of the QTL regions were validated by two independent GWAS, with signals for fertility detected primarily on chromosomes 18, 5, 7, 8, and 29. Plausible causative mutations were identified, including one missense variant significantly associated with fertility and predicted to affect the protein function of EIF4EBP3. The results of this study enhance our understanding of 1) the contribution of the endometrium and corpus luteum transcriptome to phenotypic fertility differences and 2) the genetic architecture of fertility in dairy cattle. Including these variants in predictions of genomic breeding values may improve the rate of genetic gain for this critical trait.

  8. Generic and sequence-variant specific molecular assays for the detection of the highly variable Grapevine leafroll-associated virus 3.

    PubMed

    Chooi, Kar Mun; Cohen, Daniel; Pearson, Michael N

    2013-04-01

    Grapevine leafroll-associated virus 3 (GLRaV-3) is an economically important virus, which is found in all grapevine growing regions worldwide. Its accurate detection in nursery and field samples is of high importance for certification schemes and disease management programmes. To reduce false negatives that can be caused by sequence variability, a new universal primer pair was designed against a divergent sequence data set, targeting the open reading frame 4 (heat shock protein 70 homologue gene), and optimised for conventional one-step RT-PCR and one-step SYBR Green real-time RT-PCR assays. In addition, primer pairs for the simultaneous detection of specific GLRaV-3 variants from groups 1, 2, 6 (specifically NZ-1) and the outlier NZ2 variant, and the generic detection of variants from groups 1 to 5 were designed and optimised as a conventional one-step multiplex RT-PCR assay using the plant nad5 gene as an internal control (i.e. one-step hexaplex RT-PCR). Results showed that the generic and variant specific assays detected in vitro RNA transcripts from a range of 1×10^1 to 1×10^8 copies of amplicon per μl diluted in healthy total RNA from Vitis vinifera cv. Cabernet Sauvignon. Furthermore, the assays were employed effectively to screen 157 germplasm and 159 commercial field samples. Thus results demonstrate that the GLRaV-3 generic and variant-specific assays are prospective tools that will be beneficial for certification schemes and disease management programmes, as well as biological and epidemiological studies of the divergent GLRaV-3 populations. PMID:23313884

  9. Exome sequencing reveals a novel WDR45 frameshift mutation and inherited POLR3A heterozygous variants in a female with a complex phenotype and mixed brain MRI findings.

    PubMed

    Khalifa, Mohamed; Naffaa, Lena

    2015-08-01

    WDR45 and POLR3A are newly recognized genes; each is associated with a distinct neurodegenerative disease. WDR45 is an X-linked gene associated with a dominant form of Neurodegeneration with Brain Iron Accumulation (NBIA), manifested by progressive disabilities, dystonia, cognitive decline, spastic paraplegia, neuropsychiatric abnormalities and iron deposition in the basal ganglia on brain imaging. POLR3A, on the other hand, is an autosomal gene, and its mutations cause a recessive form of a hypomyelination with leukodystrophy disease, also known as 4H syndrome, characterized by congenital Hypomyelination with thinning of the corpus callosum, Hypodontia and Hypogonadotropic Hypogonadism. We report on a female child with severe intellectual disability, aphasia, short stature, ataxia, failure to thrive and structural brain abnormalities. Brain MRI obtained in late infancy showed hypomyelination involving the central periventricular white matter and thinning of the corpus callosum with no evidence of iron accumulation. Brain MRI obtained in childhood showed stable hypomyelination, with progressive iron accumulation in the basal ganglia, in particular in the globus pallidus and substantia nigra. Whole Exome Sequencing (WES) identified a novel WDR45 frameshift deleterious mutation in Exon 9 (c.587-588del) and also revealed three POLR3A missense heterozygous variants. The first is a maternally inherited novel missense variant in exon 4 (c.346A > G). Exon 13 carried two heterozygous missense variants, a maternally inherited variant (c.1724A > T) and a paternally inherited variant (1745G > A). These variants are considered likely damaging. The patient's complex clinical phenotype and mixed brain MRI findings might be attributed to the confounding effects of the expression of these two mutant genes.

  10. Transcriptome Sequencing in Response to Salicylic Acid in Salvia miltiorrhiza.

    PubMed

    Zhang, Xiaoru; Dong, Juane; Liu, Hailong; Wang, Jiao; Qi, Yuexin; Liang, Zongsuo

    2016-01-01

    Salvia miltiorrhiza is a traditional Chinese herbal medicine, whose quality and yield are often affected by diseases and environmental stresses during its growing season. Salicylic acid (SA) plays a significant role in plants responding to biotic and abiotic stresses, but the involved regulatory factors and their signaling mechanisms are largely unknown. In order to identify the genes involved in SA signaling, the RNA sequencing (RNA-seq) strategy was employed to evaluate the transcriptional profiles in S. miltiorrhiza cell cultures. A total of 50,778 unigenes were assembled, in which 5,316 unigenes were differentially expressed among 0-, 2-, and 8-h SA induction. The up-regulated genes were mainly involved in stimulus response and multi-organism process. A core set of candidate novel genes coding SA signaling component proteins was identified. Many transcription factors (e.g., WRKY, bHLH and GRAS) and genes involved in hormone signal transduction were differentially expressed in response to SA induction. Detailed analysis revealed that genes associated with defense signaling, such as antioxidant system genes, cytochrome P450s and ATP-binding cassette transporters, were significantly overexpressed, which can be used as genetic tools to investigate disease resistance. Our transcriptome analysis will help understand SA signaling and its mechanism of defense systems in S. miltiorrhiza. PMID:26808150

  11. Transcriptome Sequencing in Response to Salicylic Acid in Salvia miltiorrhiza

    PubMed Central

    Zhang, Xiaoru; Dong, Juane; Liu, Hailong; Wang, Jiao; Qi, Yuexin; Liang, Zongsuo

    2016-01-01

    Salvia miltiorrhiza is a traditional Chinese herbal medicine, whose quality and yield are often affected by diseases and environmental stresses during its growing season. Salicylic acid (SA) plays a significant role in plants responding to biotic and abiotic stresses, but the involved regulatory factors and their signaling mechanisms are largely unknown. In order to identify the genes involved in SA signaling, the RNA sequencing (RNA-seq) strategy was employed to evaluate the transcriptional profiles in S. miltiorrhiza cell cultures. A total of 50,778 unigenes were assembled, in which 5,316 unigenes were differentially expressed among 0-, 2-, and 8-h SA induction. The up-regulated genes were mainly involved in stimulus response and multi-organism process. A core set of candidate novel genes coding SA signaling component proteins was identified. Many transcription factors (e.g., WRKY, bHLH and GRAS) and genes involved in hormone signal transduction were differentially expressed in response to SA induction. Detailed analysis revealed that genes associated with defense signaling, such as antioxidant system genes, cytochrome P450s and ATP-binding cassette transporters, were significantly overexpressed, which can be used as genetic tools to investigate disease resistance. Our transcriptome analysis will help understand SA signaling and its mechanism of defense systems in S. miltiorrhiza. PMID:26808150

  12. Amino acid sequence diversity of the major human papillomavirus capsid protein: implications for current and next generation vaccines.

    PubMed

    Ahmed, Amina I; Bissett, Sara L; Beddows, Simon

    2013-08-01

    Despite the fidelity of host cell polymerases, the human papillomavirus (HPV) displays a degree of genomic polymorphism resulting in distinct genotypes and intra-type variants. The current HPV vaccines target the most prevalent genotypes associated with cervical cancer (HPV16/18) and genital warts (HPV6/11). Although these vaccines confer some measure of cross-protection, a multivalent HPV vaccine is in the pipeline that aims to broaden vaccine protection against other cervical cancer-associated genotypes including HPV31, HPV33, HPV45, HPV52 and HPV58. Both current and next generation vaccines comprise virus-like particles, based upon the major capsid protein, L1, and vaccine-induced, type-specific protection is likely mediated by neutralizing antibodies targeting L1 surface-exposed domains. The aim of this study was to perform an in silico analysis of existing full length L1 sequences representing vaccine-relevant HPV genotypes in order to address the degree of naturally-occurring, intra-type polymorphisms. In total, 1281 sequences from the Americas, Africa, Asia and Europe were assembled. Intra-type entropy was low and/or limited to non-surface-exposed residues for HPV6, HPV11 and HPV52 suggesting a minimal effect on vaccine antibodies for these genotypes. For HPV16, intra-type entropy was high but the present analysis did not reveal any significant polymorphisms not previously identified. For HPV31, HPV33, HPV58, however, intra-type entropy was high, mostly mapped to surface-exposed domains and in some cases within known neutralizing antibody epitopes. For HPV18 and HPV45 there were too few sequences for a definitive analysis, but HPV45 displayed some degree of surface-exposed residue diversity. In most cases, the reference sequence for each genotype represented a minority variant and the consensus L1 sequences for HPV18, HPV31, HPV45 and HPV58 did not reflect the L1 sequence of the currently available HPV pseudoviruses. These data highlight a number of variant
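
    The record above reports "intra-type entropy" for aligned L1 sequences but does not state the formula. A common choice is the per-position Shannon entropy of an alignment column, and the sketch below assumes that definition. The function names and the toy fragments are hypothetical and are not taken from the study.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of one alignment column, ignoring gap characters."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0
    total = len(residues)
    # H = sum over residues of p * log2(1 / p); a fully conserved column gives 0.0
    return sum((n / total) * math.log2(total / n) for n in Counter(residues).values())


def alignment_entropy(sequences):
    """Per-position entropy for a list of aligned, equal-length sequences."""
    if len({len(s) for s in sequences}) > 1:
        raise ValueError("sequences must be aligned to the same length")
    return [column_entropy(col) for col in zip(*sequences)]


# Toy aligned fragments (hypothetical, not real L1 sequences)
toy = ["MSLW", "MSLW", "MALW", "MTLW"]
profile = alignment_entropy(toy)
```

    In this toy alignment only the second position varies, so it is the only position with non-zero entropy. Mapping high-entropy positions onto surface-exposed loops would need structural annotation that is outside this sketch.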

  13. Natural vs. random protein sequences: Discovering combinatorics properties on amino acid words.

    PubMed

    Santoni, Daniele; Felici, Giovanni; Vergni, Davide

    2016-02-21

    Random mutations and natural selection have driven the evolution of the protein amino acid sequences that we observe in nature at present. Which of the two is the dominant force in protein evolution still lacks an unambiguous answer. Random mutations tend to randomize protein sequences while, in order to have the correct functionality, one expects that selection mechanisms impose rigid constraints on amino acid sequences. Moreover, one also has to consider that the space of all possible amino acid sequences is so astonishingly large that a well-tuned amino acid sequence could reasonably be indistinguishable from a random one. In order to study whether random and natural amino acid sequences can be discriminated, we introduce different measures of association between pairs of amino acids in a sequence, and apply them to a dataset of 1047 natural protein sequences and 10,470 random sequences, carefully generated in order to preserve the relative length and amino acid distribution of the natural proteins. We analyze the multidimensional measures with machine learning techniques and show that, to a reasonable extent, natural protein sequences can be differentiated from random ones.

  14. Natural vs. random protein sequences: Discovering combinatorics properties on amino acid words.

    PubMed

    Santoni, Daniele; Felici, Giovanni; Vergni, Davide

    2016-02-21

    Random mutations and natural selection have driven the evolution of the protein amino acid sequences that we observe in nature at present. Which of the two is the dominant force in protein evolution still lacks an unambiguous answer. Random mutations tend to randomize protein sequences while, in order to have the correct functionality, one expects that selection mechanisms impose rigid constraints on amino acid sequences. Moreover, one also has to consider that the space of all possible amino acid sequences is so astonishingly large that a well-tuned amino acid sequence could reasonably be indistinguishable from a random one. In order to study whether random and natural amino acid sequences can be discriminated, we introduce different measures of association between pairs of amino acids in a sequence, and apply them to a dataset of 1047 natural protein sequences and 10,470 random sequences, carefully generated in order to preserve the relative length and amino acid distribution of the natural proteins. We analyze the multidimensional measures with machine learning techniques and show that, to a reasonable extent, natural protein sequences can be differentiated from random ones. PMID:26656109
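
    The abstract does not say how the random sequences were generated or which pair-association measures were used. One simple way to preserve both length and amino acid composition exactly is to shuffle each natural sequence. The sketch below assumes that approach, with ten shuffled copies per natural sequence to match the 1047 to 10,470 ratio. `shuffled_copies`, `pair_counts` and the toy sequence are hypothetical names and data, and the pair counter is only an illustrative stand-in for the study's measures.

```python
import random
from collections import Counter


def shuffled_copies(seq, n_copies=10, seed=0):
    """Return n_copies random permutations of seq (same length, same composition)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    copies = []
    for _ in range(n_copies):
        residues = list(seq)
        rng.shuffle(residues)
        copies.append("".join(residues))
    return copies


def pair_counts(seq, gap=1):
    """Count ordered residue pairs (a, b) where b lies `gap` positions after a (gap >= 1)."""
    return Counter(zip(seq, seq[gap:]))


natural = "MKTAYIAKQRQISFVKSHFSRQ"  # hypothetical toy sequence, not from the dataset
randoms = shuffled_copies(natural)
```

    Comparing `pair_counts(natural)` with the counts from `randoms` gives a crude view of pair statistics that shuffling destroys. A feature vector built from such counts could then be passed to a classifier, which is the general shape of the analysis the abstract describes.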

  15. Amino acid sequence of Coprinus macrorhizus peroxidase and cDNA sequence encoding Coprinus cinereus peroxidase. A new family of fungal peroxidases.

    PubMed

    Baunsgaard, L; Dalbøge, H; Houen, G; Rasmussen, E M; Welinder, K G

    1993-04-01

    Sequence analysis and cDNA cloning of Coprinus peroxidase (CIP) were undertaken to expand the understanding of the relationships of structure, function and molecular genetics of the secretory heme peroxidases from fungi and plants. Amino acid sequencing of Coprinus macrorhizus peroxidase, and cDNA sequencing of Coprinus cinereus peroxidase showed that the mature proteins are identical in amino acid sequence, 343 residues in size and preceded by a 20-residue signal peptide. Their likely identity to peroxidase from Arthromyces ramosus is discussed. CIP has an 8-residue, glycine-rich N-terminal extension blocked with a pyroglutamate residue which is absent in other fungal peroxidases. The presence of pyroglutamate, formed by cyclization of glutamine, and the finding of a minor fraction of a variant form lacking the N-terminal residue, indicate that signal peptidase cleavage is followed by further enzymic processing. CIP is 40-45% identical in amino-acid sequence to 11 lignin peroxidases from four fungal species, and 42-43% identical to the two known Mn-peroxidases. Like these white-rot fungal peroxidases, CIP has an additional segment of approximately 40 residues at the C-terminus which is absent in plant peroxidases. Although CIP is much more similar to horseradish peroxidase (HRP C) in substrate specificity, specific activity and pH optimum than to white-rot fungal peroxidases, the sequences of CIP and HRP C showed only 18% identity. Hence, CIP qualifies as the first member of a new family of fungal peroxidases. The nine invariant residues present in all plant, fungal and bacterial heme peroxidases are also found in CIP. The present data support the hypothesis that only one chromosomal CIP gene exists. In contrast, a large number of secretory plant and fungal peroxidases are expressed from several peroxidase gene clusters. Analyses of three batches of CIP protein and of 49 CIP clones revealed the existence of only two highly similar alleles indicating less

  16. Whole-Exome Sequencing Identifies Loci Associated with Blood Cell Traits and Reveals a Role for Alternative GFI1B Splice Variants in Human Hematopoiesis.

    PubMed

    Polfus, Linda M; Khajuria, Rajiv K; Schick, Ursula M; Pankratz, Nathan; P