Science.gov

Sample records for acid sequence variants

  1. On human disease-causing amino acid variants: statistical study of sequence and structural patterns

    PubMed Central

    Alexov, Emil

    2015-01-01

    Statistical analysis was carried out on a large set of naturally occurring human amino acid variations, and it was demonstrated that some amino acid substitutions are preferentially associated with diseases. At the amino acid sequence level, it was shown that disease-causing variants frequently involve drastic changes in the physico-chemical properties of amino acids, such as charge, hydrophobicity and geometry. Structural analysis of variants involved in diseases and of variants frequently observed in the human population showed similar trends: disease-causing variants tend to cause more changes in the hydrogen bond network and salt bridges than harmless amino acid mutations. Analysis of thermodynamics data reported in the literature, both experimental and computational, indicated that disease-causing variants tend to destabilize proteins and their interactions, which prompted us to investigate the effects of amino acid mutations on large databases of experimentally measured energy changes in unrelated proteins. Although the experimental datasets were neither linked to diseases nor exclusive to human proteins, the observed trends were the same: amino acid mutations tend to destabilize proteins and their interactions. Bearing in mind that structural and thermodynamic properties are interrelated, it is pointed out that a large change in any of them is anticipated to cause disease. PMID:25689729

  2. Data in support of the discovery of alternative splicing variants of quail LEPR and the evolutionary conservation of qLEPRl by nucleotide and amino acid sequences alignment.

    PubMed

    Wang, Dandan; Xu, Chunlin; Wang, Taian; Li, Hong; Li, Yanmin; Ren, Junxiao; Tian, Yadong; Li, Zhuanjian; Jiao, Yuping; Kang, Xiangtao; Liu, Xiaojun

    2016-03-01

    Leptin receptor (LEPR) belongs to the class I cytokine receptor superfamily, whose members share common structural features and signal transduction pathways. Although multiple LEPR isoforms, which are derived from one gene, were identified in mammals, they were rarely found in avian species except for the long LEPR. Four alternative splicing variants of quail LEPR (qLEPR) had been cloned and sequenced for the first time (Wang et al., 2015 [1]). To define patterns of the four splicing variants (qLEPRl, qLEPR-a, qLEPR-b and qLEPR-c) and locate the conserved regions of qLEPRl, this data article provides nucleotide sequence alignment of qLEPR and amino acid sequence alignment of representative vertebrate LEPR. The detailed analysis was shown in [1]. PMID:26759819

  3. Data in support of the discovery of alternative splicing variants of quail LEPR and the evolutionary conservation of qLEPRl by nucleotide and amino acid sequences alignment

    PubMed Central

    Wang, Dandan; Xu, Chunlin; Wang, Taian; Li, Hong; Li, Yanmin; Ren, Junxiao; Tian, Yadong; Li, Zhuanjian; Jiao, Yuping; Kang, Xiangtao; Liu, Xiaojun

    2015-01-01

    Leptin receptor (LEPR) belongs to the class I cytokine receptor superfamily, whose members share common structural features and signal transduction pathways. Although multiple LEPR isoforms, which are derived from one gene, were identified in mammals, they were rarely found in avian species except for the long LEPR. Four alternative splicing variants of quail LEPR (qLEPR) had been cloned and sequenced for the first time (Wang et al., 2015 [1]). To define patterns of the four splicing variants (qLEPRl, qLEPR-a, qLEPR-b and qLEPR-c) and locate the conserved regions of qLEPRl, this data article provides nucleotide sequence alignment of qLEPR and amino acid sequence alignment of representative vertebrate LEPR. The detailed analysis was shown in [1]. PMID:26759819

  4. Better prediction of functional effects for sequence variants

    PubMed Central

    2015-01-01

    Elucidating the effects of naturally occurring genetic variation is one of the major challenges for personalized health and personalized medicine. Here, we introduce SNAP2, a novel neural network based classifier that improves over the state-of-the-art in distinguishing between effect and neutral variants. Our method's improved performance results from screening many potentially relevant protein features and from refining our development data sets. Cross-validated on >100k experimentally annotated variants, SNAP2 significantly outperformed other methods, attaining a two-state accuracy (effect/neutral) of 83%. SNAP2 also outperformed combinations of other methods. Performance increased for human variants but much more so for other organisms. Our method's carefully calibrated reliability index informs selection of variants for experimental follow up, with the most strongly predicted half of all effect variants predicted at over 96% accuracy. As expected, the evolutionary information from automatically generated multiple sequence alignments gave the strongest signal for the prediction. However, we also optimized our new method to perform surprisingly well even without alignments. This feature reduces prediction runtime by over two orders of magnitude, enables cross-genome comparisons, and renders our new method as the best solution for the 10-20% of sequence orphans. SNAP2 is available at: https://rostlab.org/services/snap2web Definitions used: Delta, input feature that results from computing the difference between the feature scores for the native amino acid and the feature scores for the variant amino acid; nsSNP, non-synonymous SNP; PMD, Protein Mutant Database; SNAP, Screening for non-acceptable polymorphisms; SNP, single nucleotide polymorphism; variant, any amino acid changing sequence variant. PMID:26110438

  5. Sequence Variant Descriptions: HGVS Nomenclature and Mutalyzer.

    PubMed

    den Dunnen, Johan T

    2016-01-01

    Consistent and unambiguous description of sequence variants is essential to report and exchange information on the analysis of a genome, in particular in DNA diagnostics. The HGVS nomenclature (recommendations for the description of sequence variants as originally proposed by the Human Genome Variation Society) has gradually been accepted as the international standard for variant description. In this unit, we describe the current recommendations (HGVS version 15.11) regarding how to describe variants at the DNA, RNA, and protein level. We explain the rationale and give example descriptions for all variant types: substitution, deletion, duplication, insertion, inversion, conversion, and complex, as well as special types occurring only on the RNA (splicing) or protein level (nonsense, frame shift, extension). Finally, we point users to available support tools and give examples for the use of the freely available Mutalyzer suite. An extensive version of the HGVS recommendations is available online at http://varnomen.hgvs.org/. © 2016 by John Wiley & Sons, Inc. PMID:27367167
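
    As an illustration of the simplest variant type named above (substitution), the following sketch splits a DNA-level description such as c.1436C>T, the CPT1A variant that appears in other records of this listing, into its parts. It is not the Mutalyzer implementation; the function name parse_substitution and the regular expression are assumptions made for this example, and the pattern deliberately covers only plain c. and g. substitutions at a simple numeric position (no intronic offsets, ranges, or other variant types).

```python
import re

# Assumed, simplified pattern: optional "reference:" prefix, a coordinate
# level (c = coding DNA, g = genomic), a numeric position, and REF>ALT.
SUBSTITUTION = re.compile(
    r"^(?:(?P<reference>[^:\s]+):)?"
    r"(?P<level>[cg])\."
    r"(?P<position>\d+)"
    r"(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)


def parse_substitution(description):
    """Return the parts of a simple substitution, or None if it does not match."""
    match = SUBSTITUTION.match(description.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["position"] = int(fields["position"])
    return fields


if __name__ == "__main__":
    print(parse_substitution("c.1436C>T"))
```

    Descriptions outside this narrow subset return None rather than a partial parse, so a caller can hand them to a full validator.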

  6. Efficient analysis of mouse genome sequences reveal many nonsense variants.

    PubMed

    Steeland, Sophie; Timmermans, Steven; Van Ryckeghem, Sara; Hulpiau, Paco; Saeys, Yvan; Van Montagu, Marc; Vandenbroucke, Roosmarijn E; Libert, Claude

    2016-05-17

    Genetic polymorphisms in coding genes play an important role when using mouse inbred strains as research models. They have been shown to influence research results, explain phenotypical differences between inbred strains, and increase the amount of interesting gene variants present in the many available inbred lines. SPRET/Ei is an inbred strain derived from Mus spretus that has ∼1% sequence difference with the C57BL/6J reference genome. We obtained a listing of all SNPs and insertions/deletions (indels) present in SPRET/Ei from the Mouse Genomes Project (Wellcome Trust Sanger Institute) and processed these data to obtain an overview of all transcripts having nonsynonymous coding sequence variants. We identified 8,883 unique variants affecting 10,096 different transcripts from 6,328 protein-coding genes, which is about 28% of all coding genes. Because only a subset of these variants results in drastic changes in proteins, we focused on variations that are nonsense mutations that ultimately resulted in a gain of a stop codon. These genes were identified by in silico changing the C57BL/6J coding sequences to the SPRET/Ei sequences, converting them to amino acid (AA) sequences, and comparing the AA sequences. All variants and transcripts affected were also stored in a database, which can be browsed using a SPRET/Ei M. spretus variants web tool (www.spretus.org), including a manual. We validated the tool by demonstrating the loss of function of three proteins predicted to be severely truncated, namely Fas, IRAK2, and IFNγR1. PMID:27147605
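
    The in silico step described above (substitute strain alleles into the reference coding sequence, translate, compare the amino acid sequences) can be sketched as follows. This is a toy illustration, not the authors' pipeline: the function names and the example sequence are invented for this sketch, only single-nucleotide substitutions at 0-based positions are handled, and indels are ignored.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds):
    """Translate a coding sequence up to (not including) the first stop codon."""
    protein = []
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    for i in range(0, usable, 3):
        amino_acid = CODON_TABLE[cds[i:i + 3].upper()]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def apply_snps(cds, snps):
    """Return cds with substitutions applied; snps maps 0-based position -> base."""
    sequence = list(cds)
    for position, alt_base in snps.items():
        sequence[position] = alt_base
    return "".join(sequence)


def is_stop_gain(reference_cds, snps):
    """True if the substituted sequence translates to a shorter protein."""
    reference_protein = translate(reference_cds)
    variant_protein = translate(apply_snps(reference_cds, snps))
    return len(variant_protein) < len(reference_protein)


if __name__ == "__main__":
    toy_cds = "ATGCAAGGTTAA"  # hypothetical sequence: Met-Gln-Gly-stop
    print(translate(toy_cds))
    print(is_stop_gain(toy_cds, {3: "T"}))  # CAA -> TAA introduces a stop
```

    Comparing translated lengths keeps the check independent of where in the transcript the new stop codon arises.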

  7. Nanopore sequencing detects structural variants in cancer.

    PubMed

    Norris, Alexis L; Workman, Rachael E; Fan, Yunfan; Eshleman, James R; Timp, Winston

    2016-03-01

    Despite advances in sequencing, structural variants (SVs) remain difficult to reliably detect due to the short read length (<300 bp) of 2nd generation sequencing. Not only do the reads (or paired-end reads) need to straddle a breakpoint, but repetitive elements often lead to ambiguities in the alignment of short reads. We propose to use the long-reads (up to 20 kb) possible with 3rd generation sequencing, specifically nanopore sequencing on the MinION. Nanopore sequencing relies on a similar concept to a Coulter counter, reading the DNA sequence from the change in electrical current resulting from a DNA strand being forced through a nanometer-sized pore embedded in a membrane. Though nanopore sequencing currently has a relatively high mismatch rate that precludes base substitution and small frameshift mutation detection, its accuracy is sufficient for SV detection because of its long reads. In fact, long reads in some cases may improve SV detection efficiency. We have tested nanopore sequencing to detect a series of well-characterized SVs, including large deletions, inversions, and translocations that inactivate the CDKN2A/p16 and SMAD4/DPC4 tumor suppressor genes in pancreatic cancer. Using PCR amplicon mixes, we have demonstrated that nanopore sequencing can detect large deletions, translocations and inversions at dilutions as low as 1:100, with as few as 500 reads per sample. Given the speed, small footprint, and low capital cost, nanopore sequencing could become the ideal tool for the low-level detection of cancer-associated SVs needed for molecular relapse, early detection, or therapeutic monitoring. PMID:26787508

  8. Nanopore sequencing detects structural variants in cancer

    PubMed Central

    Norris, Alexis L.; Workman, Rachael E.; Fan, Yunfan; Eshleman, James R.; Timp, Winston

    2016-01-01

    Despite advances in sequencing, structural variants (SVs) remain difficult to reliably detect due to the short read length (<300 bp) of 2nd generation sequencing. Not only do the reads (or paired-end reads) need to straddle a breakpoint, but repetitive elements often lead to ambiguities in the alignment of short reads. We propose to use the long-reads (up to 20 kb) possible with 3rd generation sequencing, specifically nanopore sequencing on the MinION. Nanopore sequencing relies on a similar concept to a Coulter counter, reading the DNA sequence from the change in electrical current resulting from a DNA strand being forced through a nanometer-sized pore embedded in a membrane. Though nanopore sequencing currently has a relatively high mismatch rate that precludes base substitution and small frameshift mutation detection, its accuracy is sufficient for SV detection because of its long reads. In fact, long reads in some cases may improve SV detection efficiency. We have tested nanopore sequencing to detect a series of well-characterized SVs, including large deletions, inversions, and translocations that inactivate the CDKN2A/p16 and SMAD4/DPC4 tumor suppressor genes in pancreatic cancer. Using PCR amplicon mixes, we have demonstrated that nanopore sequencing can detect large deletions, translocations and inversions at dilutions as low as 1:100, with as few as 500 reads per sample. Given the speed, small footprint, and low capital cost, nanopore sequencing could become the ideal tool for the low-level detection of cancer-associated SVs needed for molecular relapse, early detection, or therapeutic monitoring. PMID:26787508

  9. Strategies to choose from millions of imputed sequence variants

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Millions of sequence variants are known, but subsets are needed for routine genomic predictions or to include on genotyping arrays. Variant selection and imputation strategies were tested using 26 984 simulated reference bulls, of which 1 000 had 30 million sequence variants, 773 had 600 000 markers...

  10. αIIbβ3 variants defined by next-generation sequencing: Predicting variants likely to cause Glanzmann thrombasthenia

    PubMed Central

    Buitrago, Lorena; Rendon, Augusto; Liang, Yupu; Simeoni, Ilenia; Negri, Ana; Filizola, Marta; Ouwehand, Willem H.; Coller, Barry S.; Alessi, Marie-Christine; Ballmaier, Matthias; Bariana, Tadbir; Bellissimo, Daniel; Bertoli, Marta; Bray, Paul; Bury, Loredana; Carrell, Robin; Cattaneo, Marco; Collins, Peter; French, Deborah; Favier, Remi; Freson, Kathleen; Furie, Bruce; Germeshausen, Manuela; Ghevaert, Cedric; Gomez, Keith; Goodeve, Anne; Gresele, Paolo; Guerrero, Jose; Hampshire, Dan J.; Hadinnapola, Charaka; Heemskerk, Johan; Henskens, Yvonne; Hill, Marian; Hogg, Nancy; Johnsen, Jill; Kahr, Walter; Kerr, Ron; Kunishima, Shinji; Laffan, Michael; Natwani, Amit; Neerman-Arbez, Marguerite; Nurden, Paquita; Nurden, Alan; Ormiston, Mark; Othman, Maha; Ouwehand, Willem; Perry, David; Vilk, Shoshana Ravel; Reitsma, Pieter; Rondina, Matthew; Simeoni, Ilenia; Smethurst, Peter; Stephens, Jonathan; Stevenson, William; Szkotak, Artur; Turro, Ernest; Van Geet, Christel; Vries, Minka; Ward, June; Waye, John; Westbury, Sarah; Whiteheart, Sidney; Wilcox, David; Zhang, Bi

    2015-01-01

    Next-generation sequencing is transforming our understanding of human genetic variation but assessing the functional impact of novel variants presents challenges. We analyzed missense variants in the integrin αIIbβ3 receptor subunit genes ITGA2B and ITGB3 identified by whole-exome or -genome sequencing in the ThromboGenomics project, comprising ∼32,000 alleles from 16,108 individuals. We analyzed the results in comparison with 111 missense variants in these genes previously reported as being associated with Glanzmann thrombasthenia (GT), 20 associated with alloimmune thrombocytopenia, and 5 associated with aniso/macrothrombocytopenia. We identified 114 novel missense variants in ITGA2B (affecting ∼11% of the amino acids) and 68 novel missense variants in ITGB3 (affecting ∼9% of the amino acids). Of the variants, 96% had minor allele frequencies (MAF) < 0.1%, indicating their rarity. Based on sequence conservation, MAF, and location on a complete model of αIIbβ3, we selected three novel variants that affect amino acids previously associated with GT for expression in HEK293 cells. αIIb P176H and β3 C547G severely reduced αIIbβ3 expression, whereas αIIb P943A partially reduced αIIbβ3 expression and had no effect on fibrinogen binding. We used receiver operating characteristic curves of combined annotation-dependent depletion, Polyphen 2-HDIV, and sorting intolerant from tolerant to estimate the percentage of novel variants likely to be deleterious. At optimal cut-off values, which had 69–98% sensitivity in detecting GT mutations, between 27% and 71% of the novel αIIb or β3 missense variants were predicted to be deleterious. Our data have implications for understanding the evolutionary pressure on αIIbβ3 and highlight the challenges in predicting the clinical significance of novel missense variants. PMID:25827233

  11. αIIbβ3 variants defined by next-generation sequencing: predicting variants likely to cause Glanzmann thrombasthenia.

    PubMed

    Buitrago, Lorena; Rendon, Augusto; Liang, Yupu; Simeoni, Ilenia; Negri, Ana; Filizola, Marta; Ouwehand, Willem H; Coller, Barry S

    2015-04-14

    Next-generation sequencing is transforming our understanding of human genetic variation but assessing the functional impact of novel variants presents challenges. We analyzed missense variants in the integrin αIIbβ3 receptor subunit genes ITGA2B and ITGB3 identified by whole-exome or -genome sequencing in the ThromboGenomics project, comprising ∼32,000 alleles from 16,108 individuals. We analyzed the results in comparison with 111 missense variants in these genes previously reported as being associated with Glanzmann thrombasthenia (GT), 20 associated with alloimmune thrombocytopenia, and 5 associated with aniso/macrothrombocytopenia. We identified 114 novel missense variants in ITGA2B (affecting ∼11% of the amino acids) and 68 novel missense variants in ITGB3 (affecting ∼9% of the amino acids). Of the variants, 96% had minor allele frequencies (MAF) < 0.1%, indicating their rarity. Based on sequence conservation, MAF, and location on a complete model of αIIbβ3, we selected three novel variants that affect amino acids previously associated with GT for expression in HEK293 cells. αIIb P176H and β3 C547G severely reduced αIIbβ3 expression, whereas αIIb P943A partially reduced αIIbβ3 expression and had no effect on fibrinogen binding. We used receiver operating characteristic curves of combined annotation-dependent depletion, Polyphen 2-HDIV, and sorting intolerant from tolerant to estimate the percentage of novel variants likely to be deleterious. At optimal cut-off values, which had 69-98% sensitivity in detecting GT mutations, between 27% and 71% of the novel αIIb or β3 missense variants were predicted to be deleterious. Our data have implications for understanding the evolutionary pressure on αIIbβ3 and highlight the challenges in predicting the clinical significance of novel missense variants. PMID:25827233

  12. Impaired fasting tolerance among Alaska Native Children with a common Carnitine Palmitoyltransferase 1A sequence variant

    PubMed Central

    Gillingham, Melanie B.; Hirschfeld, Matthew; Lowe, Sarah; Matern, Dietrich; Shoemaker, James; Lambert, William E.; Koeller, David M.

    2011-01-01

    A high prevalence of the sequence variant c.1436C>T in the CPT1A gene has been identified among Alaska Native newborns but the clinical implications of this variant are unknown. We conducted medically supervised fasts in 5 children homozygous for the c.1436C>T variant. Plasma free fatty acids increased normally in these children but their long-chain acylcarnitine and ketone production was significantly blunted. The fast was terminated early in two subjects due to symptoms of hypoglycemia. Homozygosity for the c.1436C>T sequence variant of CPT1A impairs fasting ketogenesis, and can cause hypoketotic hypoglycemia in young children. PMID:21763168

  13. Selection of sequence variants to improve dairy cattle genomic predictions

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Genomic prediction reliabilities improved when adding selected sequence variants from run 5 of the 1,000 bull genomes project. High density (HD) imputed genotypes for 26,970 progeny tested Holstein bulls were combined with sequence variants for 444 Holstein animals. The first test included 481,904 c...

  14. Reproducibility of Variant Calls in Replicate Next Generation Sequencing Experiments

    PubMed Central

    Qi, Yuan; Liu, Xiuping; Liu, Chang-gong; Wang, Bailing; Hess, Kenneth R.; Symmans, W. Fraser; Shi, Weiwei; Pusztai, Lajos

    2015-01-01

    Nucleotide alterations detected by next generation sequencing are not always true biological changes but could represent sequencing errors. Even highly accurate methods can yield substantial error rates when applied to millions of nucleotides. In this study, we examined the reproducibility of nucleotide variant calls in replicate sequencing experiments of the same genomic DNA. We performed targeted sequencing of all known human protein kinase genes (kinome) (~3.2 Mb) using the SOLiD v4 platform. Seventeen breast cancer samples were sequenced in duplicate (n=14) or triplicate (n=3) to assess concordance of all calls and single nucleotide variant (SNV) calls. The concordance rates over the entire sequenced region were >99.99%, while the concordance rates for SNVs were 54.3-75.5%. There was substantial variation in basic sequencing metrics from experiment to experiment. The type of nucleotide substitution and genomic location of the variant had little impact on concordance but concordance increased with coverage level, variant allele count (VAC), variant allele frequency (VAF), variant allele quality and p-value of SNV-call. The most important determinants of concordance were VAC and VAF. Even using the highest stringency of QC metrics the reproducibility of SNV calls was around 80% suggesting that erroneous variant calling can be as high as 20-40% in a single experiment. The sequence data have been deposited into the European Genome-phenome Archive (EGA) with accession number EGAS00001000826. PMID:26136146
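
    A minimal sketch of how concordance between two replicate call sets might be computed is given below. The abstract does not state the exact formula used in the study, so the definition here (calls shared by both replicates divided by calls made in either replicate) is an assumption, as are the function name and the made-up coordinates.

```python
def snv_concordance(calls_a, calls_b):
    """Fraction of SNV calls shared by two replicates (intersection / union).

    Each call is a hashable tuple such as (chromosome, position, ref, alt).
    Assumed definition for illustration; returns None when neither
    replicate has any call, because the ratio is then undefined.
    """
    set_a, set_b = set(calls_a), set(calls_b)
    union = set_a | set_b
    if not union:
        return None
    return len(set_a & set_b) / len(union)


if __name__ == "__main__":
    # Hypothetical calls, not data from the study.
    replicate_1 = [("chr1", 100, "A", "G"), ("chr2", 250, "C", "T"), ("chr3", 42, "G", "A")]
    replicate_2 = [("chr1", 100, "A", "G"), ("chr2", 250, "C", "T"), ("chr5", 7, "T", "C")]
    print(snv_concordance(replicate_1, replicate_2))
```

    Restricting the comparison to called variants, rather than to every sequenced position, is what separates the SNV concordance from the much higher whole-region concordance that the abstract reports.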

  15. Identifying rare variants associated with complex traits via sequencing

    PubMed Central

    Li, Bingshan; Liu, Dajiang J.; Leal, Suzanne M.

    2013-01-01

    Although genome-wide association studies have been successful in detecting associations with common variants, there is currently an increasing interest in identifying low frequency and rare variants associated with complex traits. Next-generation sequencing technologies make it feasible to survey the full spectrum of genetic variation in coding regions or the entire genome. Due to the low frequency of rare variants, coupled with allelic heterogeneity, however, the association analysis for rare variants is challenging and traditional methods are ineffective. Recently a battery of new statistical methods has been proposed for identifying rare variants associated with complex traits. These methods test for associations by aggregating multiple rare variants across a gene or a genomic region, or a group of variants in the genome. In this Unit, we describe key concepts for rare variant association for complex traits, survey some of the recent methods and discuss their statistical power under various scenarios, and provide practical guidance on analyzing next-generation sequencing data for identifying rare variants associated with complex traits. PMID:23853079
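
    The aggregation idea mentioned above can be illustrated with a simple collapsing (burden) score: for each individual, count the rare alleles carried across all variants in a gene. This sketch is not one of the specific methods surveyed in the unit; the function names, the genotype coding (0, 1 or 2 copies of the alternate allele), the default threshold, and the assumption that the alternate allele is the minor allele are all choices made for this example.

```python
def minor_allele_frequency(genotypes):
    """MAF for one variant from alternate-allele counts (0, 1, 2) in diploid individuals."""
    alt_frequency = sum(genotypes) / (2 * len(genotypes))
    return min(alt_frequency, 1 - alt_frequency)


def burden_scores(genotype_matrix, maf_threshold=0.01):
    """Per-individual count of alternate alleles at rare variants.

    genotype_matrix[v][i] is the alternate-allele count of variant v in
    individual i. Variants with MAF of 0 or at/above the threshold are skipped.
    Assumes the alternate allele is the minor allele at every rare variant.
    """
    n_individuals = len(genotype_matrix[0])
    rare_variants = [
        row for row in genotype_matrix
        if 0 < minor_allele_frequency(row) < maf_threshold
    ]
    return [sum(row[i] for row in rare_variants) for i in range(n_individuals)]


if __name__ == "__main__":
    # Hypothetical genotypes: 3 variants (rows) in 5 individuals (columns).
    toy_matrix = [
        [1, 0, 0, 0, 0],  # rare
        [0, 1, 1, 1, 1],  # common
        [0, 0, 0, 0, 1],  # rare
    ]
    # A loose threshold is used only because the toy sample is tiny.
    print(burden_scores(toy_matrix, maf_threshold=0.2))
```

    The resulting per-individual scores would then be compared between cases and controls or regressed on a quantitative trait; that testing step is left out of the sketch.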

  16. Analysis of amino acid substitutions in AraC variants that respond to triacetic acid lactone.

    PubMed

    Frei, Christopher S; Wang, Zhiqing; Qian, Shuai; Deutsch, Samuel; Sutter, Markus; Cirino, Patrick C

    2016-04-01

    The Escherichia coli regulatory protein AraC regulates expression of ara genes in response to l-arabinose. In efforts to develop genetically encoded molecular reporters, we previously engineered an AraC variant that responds to the compound triacetic acid lactone (TAL). This variant (named "AraC-TAL1") was isolated by screening a library of AraC variants, in which five amino acid positions in the ligand-binding pocket were simultaneously randomized. Screening was carried out through multiple rounds of alternating positive and negative fluorescence-activated cell sorting. Here we show that changing the screening protocol results in the identification of different TAL-responsive variants (nine new variants). Individual substituted residues within these variants were found to primarily act cooperatively toward the gene expression response. Finally, X-ray diffraction was used to solve the crystal structure of the apo AraC-TAL1 ligand-binding domain. The resolved crystal structure confirms that this variant takes on a structure nearly identical to the apo wild-type AraC ligand-binding domain (root-mean-square deviation 0.93 Å), suggesting that AraC-TAL1 behaves similar to wild-type with regard to ligand recognition and gene regulation. Our results provide amino acid sequence-function data sets for training and validating AraC modeling studies, and contribute to our understanding of how to design new biosensors based on AraC. PMID:26749125

  17. Impaired fasting tolerance among Alaska native children with a common carnitine palmitoyltransferase 1A sequence variant.

    PubMed

    Gillingham, Melanie B; Hirschfeld, Matthew; Lowe, Sarah; Matern, Dietrich; Shoemaker, James; Lambert, William E; Koeller, David M

    2011-11-01

    A high prevalence of the sequence variant c.1436C→T in the CPT1A gene has been identified among Alaska Native newborns but the clinical implications of this variant are unknown. We conducted medically supervised fasts in 5 children homozygous for the c.1436C→T variant. Plasma free fatty acids increased normally in these children but their long-chain acylcarnitine and ketone production was significantly blunted. The fast was terminated early in two subjects due to symptoms of hypoglycemia. Homozygosity for the c.1436C→T sequence variant of CPT1A impairs fasting ketogenesis, and can cause hypoketotic hypoglycemia in young children. Trial registration: www.clinicaltrials.gov NCT00653666, "Metabolic Consequences of CPT1A Deficiency". PMID:21763168

  18. Protective variant for hippocampal atrophy identified by whole exome sequencing.

    PubMed

    Nho, Kwangsik; Kim, Sungeun; Risacher, Shannon L; Shen, Li; Corneveaux, Jason J; Swaminathan, Shanker; Lin, Hai; Ramanan, Vijay K; Liu, Yunlong; Foroud, Tatiana M; Inlow, Mark H; Siniard, Ashley L; Reiman, Rebecca A; Aisen, Paul S; Petersen, Ronald C; Green, Robert C; Jack, Clifford R; Weiner, Michael W; Baldwin, Clinton T; Lunetta, Kathryn L; Farrer, Lindsay A; Furney, Simon J; Lovestone, Simon; Simmons, Andrew; Mecocci, Patrizia; Vellas, Bruno; Tsolaki, Magda; Kloszewska, Iwona; Soininen, Hilkka; McDonald, Brenna C; Farlow, Martin R; Ghetti, Bernardino; Huentelman, Matthew J; Saykin, Andrew J

    2015-03-01

    We used whole-exome sequencing to identify variants other than APOE associated with the rate of hippocampal atrophy in amnestic mild cognitive impairment. An in-silico predicted missense variant in REST (rs3796529) was found exclusively in subjects with slow hippocampal volume loss and validated using unbiased whole-brain analysis and meta-analysis across 5 independent cohorts. REST is a master regulator of neurogenesis and neuronal differentiation that has not been previously implicated in Alzheimer's disease. These findings nominate REST and its functional pathways as protective and illustrate the potential of combining next-generation sequencing with neuroimaging to discover novel disease mechanisms and potential therapeutic targets. PMID:25559091

  19. HGVS Recommendations for the Description of Sequence Variants: 2016 Update.

    PubMed

    den Dunnen, Johan T; Dalgleish, Raymond; Maglott, Donna R; Hart, Reece K; Greenblatt, Marc S; McGowan-Jordan, Jean; Roux, Anne-Francoise; Smith, Timothy; Antonarakis, Stylianos E; Taschner, Peter E M

    2016-06-01

    The consistent and unambiguous description of sequence variants is essential to report and exchange information on the analysis of a genome. In particular, DNA diagnostics critically depends on accurate and standardized description and sharing of the variants detected. The sequence variant nomenclature system proposed in 2000 by the Human Genome Variation Society has been widely adopted and has developed into an internationally accepted standard. The recommendations are currently commissioned through a Sequence Variant Description Working Group (SVD-WG) operating under the auspices of three international organizations: the Human Genome Variation Society (HGVS), the Human Variome Project (HVP), and the Human Genome Organization (HUGO). Requests for modifications and extensions go through the SVD-WG following a standard procedure including a community consultation step. Version numbers are assigned to the nomenclature system to allow users to specify the version used in their variant descriptions. Here, we present the current recommendations, HGVS version 15.11, and briefly summarize the changes that were made since the 2000 publication. Most focus has been on removing inconsistencies and tightening definitions allowing automatic data processing. An extensive version of the recommendations is available online, at http://www.HGVS.org/varnomen. PMID:26931183

  20. Sequencing Structural Variants in Cancer for Precision Therapeutics.

    PubMed

    Macintyre, Geoff; Ylstra, Bauke; Brenton, James D

    2016-09-01

    The identification of mutations that guide therapy selection for patients with cancer is now routine in many clinical centres. The majority of assays used for solid tumour profiling use DNA sequencing to interrogate somatic point mutations because they are relatively easy to identify and interpret. Many cancers, however, including high-grade serous ovarian, oesophageal, and small-cell lung cancer, are driven by somatic structural variants that are not measured by these assays. Therefore, there is currently an unmet need for clinical assays that can cheaply and rapidly profile structural variants in solid tumours. In this review we survey the landscape of 'actionable' structural variants in cancer and identify promising detection strategies based on massively-parallel sequencing. PMID:27478068

  1. Composition for nucleic acid sequencing

    DOEpatents

    Korlach, Jonas; Webb, Watt W.; Levene, Michael; Turner, Stephen; Craighead, Harold G.; Foquet, Mathieu

    2008-08-26

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid, i.e. the activity of a nucleic acid polymerizing enzyme on the template nucleic acid molecule to be sequenced is followed in real time. The sequence is deduced by identifying which base is being incorporated into the growing complementary strand of the target nucleic acid by the catalytic activity of the nucleic acid polymerizing enzyme at each step in the sequence of base additions. A polymerase on the target nucleic acid molecule complex is provided in a position suitable to move along the target nucleic acid molecule and extend the oligonucleotide primer at an active site. A plurality of labelled types of nucleotide analogs are provided proximate to the active site, with each distinguishable type of nucleotide analog being complementary to a different nucleotide in the target nucleic acid sequence. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand at the active site, where the nucleotide analog being added is complementary to the nucleotide of the target nucleic acid at the active site. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. The steps of providing labelled nucleotide analogs, polymerizing the growing nucleic acid strand, and identifying the added nucleotide analog are repeated so that the nucleic acid strand is further extended and the sequence of the target nucleic acid is determined.

  2. A mitochondrial DNA variant, identified in Leber hereditary optic neuropathy patients, which extends the amino acid sequence of cytochrome c oxidase subunit I.

    PubMed Central

    Brown, M D; Yang, C C; Trounce, I; Torroni, A; Lott, M T; Wallace, D C

    1992-01-01

    A G-to-A transition at nucleotide pair (np) 7444 in the mtDNA was found to correlate with Leber hereditary optic neuropathy (LHON). The mutation eliminates the termination codon of the cytochrome c oxidase subunit I (COI) gene, extending the COI polypeptide by three amino acids. The mutation was discovered as an XbaI restriction-endonuclease-site loss present in 2 (9.1%) of 22 LHON patients who lacked the np 11778 LHON mutation and in 6 (1.1%) of 545 unaffected controls. The mutant polypeptide has an altered mobility on SDS-PAGE, suggesting a structural alteration, and the cytochrome c oxidase enzyme activity of patient lymphocytes is reduced approximately 40% relative to that in controls. These data suggest that the np 7444 mutation results in partial respiratory deficiency and thus contributes to the onset of LHON. PMID:1322638

  3. Fast single-pass alignment and variant calling using sequencing data

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Sequencing research requires efficient computation. Few programs use already known information about DNA variants when aligning sequence data to the reference map. New program findmap.f90 reads the previous variant list before aligning sequence, calling variant alleles, and summing the allele counts...

  4. M2SG: mapping human disease-related genetic variants to protein sequences and genomic loci

    PubMed Central

    Ji, Renkai; Cong, Qian; Li, Wenlin; Grishin, Nick V.

    2013-01-01

    Summary: Online Mendelian Inheritance in Man (OMIM) is a manually curated compendium of human genetic variants and the corresponding phenotypes, mostly human diseases. Instead of directly documenting the native sequences for gene entries, OMIM links its entries to protein and DNA sequences in other databases. However, because of the existence of gene isoforms and errors in OMIM records, mapping a specific OMIM mutation to its corresponding protein sequence is not trivial. Combining computer programs and extensive manual curation of OMIM full-text descriptions and original literature, we mapped 98% of OMIM amino acid substitutions (AASs) and all SwissProt Variant (SwissVar) disease-related AASs to reference sequences and confidently mapped 99.96% of all AASs to the genomic loci. Based on the results, we developed an online database and interactive web server (M2SG) to (i) retrieve the mapped OMIM and SwissVar variants for a given protein sequence; and (ii) obtain related proteins and mutations for an input disease phenotype. This database will be useful for analyzing sequences, understanding the effect of mutations, identifying important genetic variations and designing experiments on a protein of interest. Availability and implementation: The database and web server are freely available at http://prodata.swmed.edu/M2S/mut2seq.cgi. Contact: grishin@chop.swmed.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:24002112

  5. An optimized approach to the rapid assessment and detection of sequence variants in recombinant protein products.

    PubMed

    Brady, Lowell J; Scott, Rebecca A; Balland, Alain

    2015-05-01

    The development of sensitive techniques to detect sequence variants (SVs), which naturally arise due to DNA mutations and errors in transcription/translation (amino acid misincorporations), has resulted in increased attention to their potential presence in protein-based biologic drugs in recent years. Often, these SVs may be below 0.1%, adding challenges for consistent and accurate detection. Furthermore, the presence of false-positive (FP) signals, a hallmark of SV analysis, requires time-consuming analyst inspection of the data to sort true from erroneous signal. Consequently, gaps in information about the prevalence, type, and impact of SVs in marketed and in-development products are significant. Here, we report the results of a simple, straightforward, and sensitive approach to sequence variant analysis. This strategy employs mixing of two samples of an antibody or protein with the same amino acid sequence in a dilution series followed by subsequent sequence variant analysis. Using automated peptide map analysis software, a quantitative assessment of the levels of SVs in each sample can be made based on the signal derived from the mass spectrometric data. We used this strategy to rapidly detect differences in sequence variants in a monoclonal antibody after a change in process scale, and in a comparison of three mAbs as part of a biosimilar program. This approach is powerful, as true signals can be readily distinguished from FP signal, even at a level well below 0.1%, by using a simple linear regression analysis across the data set with little to no inspection of the MS/MS data. Additionally, the data produced from these studies can also be used to make a quantitative assessment of relative levels of product quality attributes. The information provided here extends the published knowledge about SVs and provides context for the discussion around the potential impact of these SVs on product heterogeneity and immunogenicity. PMID:25795027
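
    The idea of separating true signals from false positives by regression over a dilution series can be sketched as follows. This is only an illustration: the abstract does not state the acceptance criterion, so the R^2 threshold, the function names, and the example signals are all assumptions. The premise is that a variant truly present in one sample should give a signal that tracks the mixing fraction linearly, while a false positive should not.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit y = slope * x + intercept, plus R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # A perfectly flat signal carries no trend, so report R^2 = 0 for it.
    r_squared = (sxy ** 2) / (sxx * syy) if syy > 0 else 0.0
    return slope, intercept, r_squared


def looks_like_true_variant(fractions, signals, min_r_squared=0.95):
    """Flag a candidate whose signal scales linearly with the mixing fraction.

    min_r_squared is a hypothetical cut-off chosen for this sketch.
    """
    slope, _, r_squared = linear_fit(fractions, signals)
    return slope != 0 and r_squared >= min_r_squared


if __name__ == "__main__":
    # Fraction of sample A in each A/B mixture (hypothetical series).
    fractions = [0.0, 0.25, 0.5, 0.75, 1.0]
    # Made-up relative signals (%) for two candidate variants.
    tracking_signal = [0.00, 0.02, 0.04, 0.06, 0.08]
    erratic_signal = [0.03, 0.01, 0.04, 0.02, 0.03]
    print(looks_like_true_variant(fractions, tracking_signal))
    print(looks_like_true_variant(fractions, erratic_signal))
```

    The fit is written out by hand to keep the sketch dependency-free; any standard regression routine would serve the same purpose.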

  6. SuSPect: enhanced prediction of single amino acid variant (SAV) phenotype using network features.

    PubMed

    Yates, Christopher M; Filippis, Ioannis; Kelley, Lawrence A; Sternberg, Michael J E

    2014-07-15

    Whole-genome and exome sequencing studies reveal many genetic variants between individuals, some of which are linked to disease. Many of these variants lead to single amino acid variants (SAVs), and accurate prediction of their phenotypic impact is important. Incorporating sequence conservation and network-level features, we have developed a method, SuSPect (Disease-Susceptibility-based SAV Phenotype Prediction), for predicting how likely SAVs are to be associated with disease. SuSPect performs significantly better than other available batch methods on the VariBench benchmarking dataset, with a balanced accuracy of 82%. SuSPect is available at www.sbg.bio.ic.ac.uk/suspect. The Web site has been implemented in Perl and SQLite and is compatible with modern browsers. An SQLite database of possible missense variants in the human proteome is available to download at www.sbg.bio.ic.ac.uk/suspect/download.html. PMID:24810707
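
    For readers unfamiliar with the metric, balanced accuracy is conventionally the mean of sensitivity and specificity, which makes it less sensitive to class imbalance than plain accuracy. The sketch below uses made-up confusion-matrix counts, not SuSPect or VariBench results.

```python
def balanced_accuracy(tp, fn, tn, fp):
    """Mean of sensitivity (true-positive rate) and specificity (true-negative rate)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def accuracy(tp, fn, tn, fp):
    """Plain accuracy, shown for contrast on an imbalanced set."""
    return (tp + tn) / (tp + fn + tn + fp)


if __name__ == "__main__":
    # Hypothetical counts: 100 disease-associated and 1000 neutral variants.
    counts = dict(tp=90, fn=10, tn=700, fp=300)
    print(balanced_accuracy(**counts))
    print(accuracy(**counts))
```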

  7. High speed nucleic acid sequencing

    DOEpatents

    Korlach, Jonas; Webb, Watt W.; Levene, Michael; Turner, Stephen; Craighead, Harold G.; Foquet, Mathieu

    2011-05-17

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid. Each type of labeled nucleotide comprises an acceptor fluorophore attached to a phosphate portion of the nucleotide such that the fluorophore is removed upon incorporation into a growing strand. Fluorescent signal is emitted via fluorescence resonance energy transfer between the donor fluorophore and the acceptor fluorophore as each nucleotide is incorporated into the growing strand. The sequence is deduced by identifying which base is being incorporated into the growing strand.

  8. Identifying novel sequence variants of RNA 3D motifs

    PubMed Central

    Zirbel, Craig L.; Roll, James; Sweeney, Blake A.; Petrov, Anton I.; Pirrung, Meg; Leontis, Neocles B.

    2015-01-01

    Predicting RNA 3D structure from sequence is a major challenge in biophysics. An important sub-goal is accurately identifying recurrent 3D motifs from RNA internal and hairpin loop sequences extracted from secondary structure (2D) diagrams. We have developed and validated new probabilistic models for 3D motif sequences based on hybrid Stochastic Context-Free Grammars and Markov Random Fields (SCFG/MRF). The SCFG/MRF models are constructed using atomic-resolution RNA 3D structures. To parameterize each model, we use all instances of each motif found in the RNA 3D Motif Atlas and annotations of pairwise nucleotide interactions generated by the FR3D software. Isostericity relations between non-Watson–Crick basepairs are used in scoring sequence variants. SCFG techniques model nested pairs and insertions, while MRF ideas handle crossing interactions and base triples. We use test sets of randomly-generated sequences to set acceptance and rejection thresholds for each motif group and thus control the false positive rate. Validation was carried out by comparing results for four motif groups to RMDetect. The software developed for sequence scoring (JAR3D) is structured to automatically incorporate new motifs as they accumulate in the RNA 3D Motif Atlas when new structures are solved and is available free for download. PMID:26130723

  9. Identifying novel sequence variants of RNA 3D motifs.

    PubMed

    Zirbel, Craig L; Roll, James; Sweeney, Blake A; Petrov, Anton I; Pirrung, Meg; Leontis, Neocles B

    2015-09-01

    Predicting RNA 3D structure from sequence is a major challenge in biophysics. An important sub-goal is accurately identifying recurrent 3D motifs from RNA internal and hairpin loop sequences extracted from secondary structure (2D) diagrams. We have developed and validated new probabilistic models for 3D motif sequences based on hybrid Stochastic Context-Free Grammars and Markov Random Fields (SCFG/MRF). The SCFG/MRF models are constructed using atomic-resolution RNA 3D structures. To parameterize each model, we use all instances of each motif found in the RNA 3D Motif Atlas and annotations of pairwise nucleotide interactions generated by the FR3D software. Isostericity relations between non-Watson-Crick basepairs are used in scoring sequence variants. SCFG techniques model nested pairs and insertions, while MRF ideas handle crossing interactions and base triples. We use test sets of randomly-generated sequences to set acceptance and rejection thresholds for each motif group and thus control the false positive rate. Validation was carried out by comparing results for four motif groups to RMDetect. The software developed for sequence scoring (JAR3D) is structured to automatically incorporate new motifs as they accumulate in the RNA 3D Motif Atlas when new structures are solved and is available free for download. PMID:26130723

  10. Structure prediction and analysis of neuraminidase sequence variants.

    PubMed

    Thayer, Kelly M

    2016-07-01

    Analyzing protein structure has become an integral aspect of understanding systems of biochemical import. The laboratory experiment endeavors to introduce protein folding to ascertain structures of proteins for which the structure is unavailable, as well as to critically evaluate the quality of the prediction obtained. The model system used is the highly mutable influenza virus protein neuraminidase, which is the key target in the development of therapeutics. In light of recent pandemics, understanding how mutations confer drug resistance, which translates at the molecular level to understanding how different sequence variants differ, constitutes an area of great interest because of the ramifications in public health. This lab targets upper level undergraduate biochemistry students, and aims to introduce tools to be used to explore protein folding and protein visualization in the context of the neuraminidase case study. Students proceed to critically evaluate the folded models by comparison with crystallographic structures. When validity is established, they fold a neuraminidase sequence for which a structure is not available. Through structural alignment and visual inspection of the 150 loop, students gain molecular insight into two possible conformations of the protein, which are actively being studied. Folding the third chosen sequence mimics a true research environment in allowing students to generate a structure from a sequence for which a structure was not previously available, and to assess whether their particular variant has an open or closed loop. From this vantage, they are then challenged to speculate about the connection between loop conformation and drug susceptibility. © 2016 by The International Union of Biochemistry and Molecular Biology, 44(4):361-376, 2016. PMID:26900942

  11. Predicted Molecular Effects of Sequence Variants Link to System Level of Disease.

    PubMed

    Reeb, Jonas; Hecht, Maximilian; Mahlich, Yannick; Bromberg, Yana; Rost, Burkhard

    2016-08-01

    Developments in experimental and computational biology are advancing our understanding of how protein sequence variation impacts molecular protein function. However, the leap from the micro level of molecular function to the macro level of the whole organism, e.g. disease, remains barred. Here, we present new results emphasizing earlier work that suggested some links from molecular function to disease. We focused on non-synonymous single nucleotide variants, also referred to as single amino acid variants (SAVs). Building upon OMIA (Online Mendelian Inheritance in Animals), we introduced a curated set of 117 disease-causing SAVs in animals. Methods optimized to capture effects upon molecular function often correctly predict human (OMIM) and animal (OMIA) Mendelian disease-causing variants. We also predicted effects of human disease-causing variants in the mouse model, i.e. we put OMIM SAVs into mouse orthologs. Overall, fewer variants were predicted with effect in the model organism than in the original organism. Our results, along with other recent studies, demonstrate that predictions of molecular effects capture some important aspects of disease. Thus, in silico methods focusing on the micro level of molecular function can help to understand the macro system level of disease. PMID:27536940

  12. Whole-genome sequencing is more powerful than whole-exome sequencing for detecting exome variants

    PubMed Central

    Belkadi, Aziz; Bolze, Alexandre; Itan, Yuval; Cobat, Aurélie; Vincent, Quentin B.; Antipenko, Alexander; Shang, Lei; Boisson, Bertrand; Casanova, Jean-Laurent; Abel, Laurent

    2015-01-01

    We compared whole-exome sequencing (WES) and whole-genome sequencing (WGS) in six unrelated individuals. In the regions targeted by WES capture (81.5% of the consensus coding genome), the mean numbers of single-nucleotide variants (SNVs) and small insertions/deletions (indels) detected per sample were 84,192 and 13,325, respectively, for WES, and 84,968 and 12,702, respectively, for WGS. For both SNVs and indels, the distributions of coverage depth, genotype quality, and minor read ratio were more uniform for WGS than for WES. After filtering, a mean of 74,398 (95.3%) high-quality (HQ) SNVs and 9,033 (70.6%) HQ indels were called by both platforms. A mean of 105 coding HQ SNVs and 32 indels was identified exclusively by WES whereas 692 HQ SNVs and 105 indels were identified exclusively by WGS. We Sanger-sequenced a random selection of these exclusive variants. For SNVs, the proportion of false-positive variants was higher for WES (78%) than for WGS (17%). The estimated mean number of real coding SNVs (656 variants, ∼3% of all coding HQ SNVs) identified by WGS and missed by WES was greater than the number of SNVs identified by WES and missed by WGS (26 variants). For indels, the proportions of false-positive variants were similar for WES (44%) and WGS (46%). Finally, WES was not reliable for the detection of copy-number variations, almost all of which extended beyond the targeted regions. Although currently more expensive, WGS is more powerful than WES for detecting potential disease-causing mutations within WES regions, particularly those due to SNVs. PMID:25827230

  13. Whole-genome sequencing is more powerful than whole-exome sequencing for detecting exome variants.

    PubMed

    Belkadi, Aziz; Bolze, Alexandre; Itan, Yuval; Cobat, Aurélie; Vincent, Quentin B; Antipenko, Alexander; Shang, Lei; Boisson, Bertrand; Casanova, Jean-Laurent; Abel, Laurent

    2015-04-28

    We compared whole-exome sequencing (WES) and whole-genome sequencing (WGS) in six unrelated individuals. In the regions targeted by WES capture (81.5% of the consensus coding genome), the mean numbers of single-nucleotide variants (SNVs) and small insertions/deletions (indels) detected per sample were 84,192 and 13,325, respectively, for WES, and 84,968 and 12,702, respectively, for WGS. For both SNVs and indels, the distributions of coverage depth, genotype quality, and minor read ratio were more uniform for WGS than for WES. After filtering, a mean of 74,398 (95.3%) high-quality (HQ) SNVs and 9,033 (70.6%) HQ indels were called by both platforms. A mean of 105 coding HQ SNVs and 32 indels was identified exclusively by WES whereas 692 HQ SNVs and 105 indels were identified exclusively by WGS. We Sanger-sequenced a random selection of these exclusive variants. For SNVs, the proportion of false-positive variants was higher for WES (78%) than for WGS (17%). The estimated mean number of real coding SNVs (656 variants, ∼3% of all coding HQ SNVs) identified by WGS and missed by WES was greater than the number of SNVs identified by WES and missed by WGS (26 variants). For indels, the proportions of false-positive variants were similar for WES (44%) and WGS (46%). Finally, WES was not reliable for the detection of copy-number variations, almost all of which extended beyond the targeted regions. Although currently more expensive, WGS is more powerful than WES for detecting potential disease-causing mutations within WES regions, particularly those due to SNVs. PMID:25827230

  14. Human microsomal epoxide hydrolase: genetic polymorphism and functional expression in vitro of amino acid variants

    PubMed Central

    Hassett, Christopher; Aicher, Lauri; Sidhu, Jaspreet S.

    2016-01-01

    Human microsomal epoxide hydrolase (mEH) is a biotransformation enzyme that metabolizes reactive epoxide intermediates to more water-soluble trans-dihydrodiol derivatives. We compared protein-coding sequences from six full-length human mEH DNA clones and assessed potential amino acid variation at seven positions. The prevalence of these variants was assessed in at least 37 unrelated individuals using polymerase chain reaction experiments. Only Tyr/His 113 (exon 3) and His/Arg 139 (exon 4) variants were observed. The genotype frequencies determined for residue 113 alleles indicate that this locus may not be in Hardy-Weinberg equilibrium, whereas frequencies observed for residue 139 alleles were similar to expected values. Nucleotide sequences coding for the variant amino acids were constructed in an mEH cDNA using site-directed mutagenesis, and each was expressed in vitro by transient transfection of COS-1 cells. Epoxide hydrolase mRNA level, catalytic activity, and immunoreactive protein were evaluated for each construct. The results of these analyses demonstrated relatively uniform levels of mEH RNA expression between the constructs. mEH enzymatic activity and immunoreactive protein were strongly correlated, indicating that mEH specific activity was similar for each variant. However, marked differences were noted in the relative amounts of immunoreactive protein and enzymatic activity resulting from the amino acid substitutions. These data suggest that common human mEH amino acid polymorphisms may alter enzymatic function, possibly by modifying protein stability. PMID:7516776
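
    A Hardy-Weinberg check of the kind mentioned in this record is usually a chi-square comparison of observed genotype counts with the counts expected from the allele frequencies. The sketch below shows that standard calculation; the genotype counts are invented for illustration and are not the study's data, and the function name is hypothetical.

```python
def hardy_weinberg_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic for deviation from Hardy-Weinberg proportions.

    Arguments are observed counts of the two homozygotes (n_aa, n_bb) and the
    heterozygote (n_ab). Assumes both alleles were observed at least once.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele a
    q = 1 - p                        # frequency of allele b
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return p, expected, chi2


if __name__ == "__main__":
    # With one degree of freedom, chi2 > 3.841 corresponds to P < 0.05.
    for counts in [(25, 50, 25), (40, 20, 40)]:  # hypothetical genotype counts
        p, expected, chi2 = hardy_weinberg_chi_square(*counts)
        print(counts, round(p, 3), [round(e, 1) for e in expected], round(chi2, 2))
```

    With small samples, an exact test is often preferred over the chi-square approximation.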

  15. Many amino acid substitution variants identified in DNA repair genes during human population screenings are predicted to impact protein function

    SciTech Connect

    Xi, T; Jones, I M; Mohrenweiser, H W

    2003-11-03

    Over 520 different amino acid substitution variants have been previously identified in the systematic screening of 91 human DNA repair genes for sequence variation. Two algorithms were employed to predict the impact of these amino acid substitutions on protein activity. Sorting Intolerant From Tolerant (SIFT) classified 226 of 508 variants (44%) as "Intolerant". Polymorphism Phenotyping (PolyPhen) classed 165 of 489 amino acid substitutions (34%) as "Probably or Possibly Damaging". Another 9-15% of the variants were classed as "Potentially Intolerant or Damaging". The results from the two algorithms are highly associated, with concordance in predicted impact observed for approximately 62% of the variants. Twenty-one to thirty-one percent of the variant proteins are predicted to exhibit reduced activity by both algorithms. These variants occur at slightly lower individual allele frequency than do the variants classified as "Tolerant" or "Benign". Both algorithms correctly predicted the impact of 26 functionally characterized amino acid substitutions in the APE1 protein on biochemical activity, with one exception. It is concluded that a substantial fraction of the missense variants observed in the general human population are functionally relevant. These variants are expected to be the molecular genetic and biochemical basis for the associations of reduced DNA repair capacity phenotypes with elevated cancer risk.

  16. Beyond Rare-Variant Association Testing: Pinpointing Rare Causal Variants in Case-Control Sequencing Study

    PubMed Central

    Lin, Wan-Yu

    2016-01-01

    Rare-variant association testing usually requires some method of aggregation. The next important step is to pinpoint individual rare causal variants among a large number of variants within a genetic region. Recently, Ionita-Laza et al. proposed a backward elimination (BE) procedure that can identify individual causal variants among the many variants in a gene. The BE procedure removes a variant if excluding this variant can lead to a smaller P-value for the BURDEN test (referred to as “BE-BURDEN”) or the SKAT test (referred to as “BE-SKAT”). We here use the adaptive combination of P-values (ADA) method to pinpoint causal variants. Unlike most gene-based association tests, the ADA statistic is built upon per-site P-values of individual variants. It is straightforward to select important variants given the optimal P-value truncation threshold found by ADA. We performed comprehensive simulations to compare ADA with BE-SKAT and BE-BURDEN. Ranking these three approaches according to positive predictive values (PPVs), the percentage of truly causal variants among the total selected variants, we found ADA > BE-SKAT > BE-BURDEN across all simulation scenarios. We therefore recommend using ADA to pinpoint plausible rare causal variants in a gene. PMID:26903168
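
    The ranking criterion used in this record, the positive predictive value, follows directly from its definition in the abstract: the share of selected variants that are truly causal. A minimal sketch with hypothetical variant identifiers (the ADA and BE procedures themselves are not reproduced here):

```python
def positive_predictive_value(selected, causal):
    """Fraction of selected variants that are truly causal.

    Returns NaN when nothing was selected, since the ratio is undefined.
    """
    selected = set(selected)
    if not selected:
        return float("nan")
    return len(selected & set(causal)) / len(selected)


if __name__ == "__main__":
    # Hypothetical simulation replicate: known causal sites and a method's picks.
    causal = {"v1", "v2", "v7"}
    picked = {"v1", "v2", "v3", "v4"}
    print(positive_predictive_value(picked, causal))
```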

  17. Functional annotation of non-coding sequence variants

    PubMed Central

    Ritchie, Graham R. S.; Dunham, Ian; Zeggini, Eleftheria; Flicek, Paul

    2016-01-01

    Identifying functionally relevant variants against the background of ubiquitous genetic variation is a major challenge in human genetics. For variants that fall in protein-coding regions our understanding of the genetic code and splicing allow us to identify likely candidates, but interpreting variants that fall outside of genic regions is more difficult. Here we present a new tool, GWAVA, which supports prioritisation of non-coding variants by integrating a range of annotations. PMID:24487584

  18. Detection and sequence determination of a new variant beta-lactoglobulin II from donkey.

    PubMed

    Cunsolo, Vincenzo; Costa, Alessia; Saletti, Rosaria; Muccilli, Vera; Foti, Salvatore

    2007-01-01

    The sequence determination of a new variant of beta-LG II, detected as a minor component by reversed-phase high-performance liquid chromatography/electrospray ionization mass spectrometry (RP-HPLC/ESI-MS) analysis of the whey fraction from a milk sample taken from an individual donkey belonging to the 'Ragusana' species of eastern Sicily, is reported. Direct RP-HPLC/ESI-MS analysis of the whey fraction from this milk sample allowed the identification of a new variant of beta-LG II, based on the determination of the M(r) of the intact protein. The new protein, with an experimentally determined M(r) of 18311 Da, was detected as a minor component in the whey fraction investigated. Matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) MS and RP-HPLC/ESI-MS/MS analyses of the tryptic digest of the new protein demonstrate that it presents two amino acid substitutions with respect to the sequence of beta-LG II A, namely a substitution Pro→Cys at position 110, and a substitution Asp→Gly at position 162. The disulfide bonds between the four cysteines, not directly determined in donkey's and horse's beta-LG II, were shown to occur between Cys(106)-Cys(120) and Cys(66)-Cys(161), as in other mammalian beta-LGs. The new beta-LG II variant from donkey was named D. PMID:17377935

  19. A comparison of 454 sequencing and clonal sequencing for the characterization of hepatitis C virus NS3 variants.

    PubMed

    Ho, Cynthia K Y; Welkers, Matthijs R A; Thomas, Xiomara V; Sullivan, James C; Kieffer, Tara L; Reesink, Henk W; Rebers, Sjoerd P H; de Jong, Menno D; Schinkel, Janke; Molenkamp, Richard

    2015-07-01

    We compared 454 amplicon sequencing with clonal sequencing for the characterization of intra-host hepatitis C virus (HCV) NS3 variants. Clonal and 454 sequences were obtained from 12 patients enrolled in a clinical phase I study for telaprevir, an NS3-4a protease inhibitor. Thirty-nine datasets were used to compare the consensus sequence, average pairwise distance, normalized Shannon entropy, phylogenetic tree topology and the number and frequency of variants derived from both sequencing techniques. In general, a good concordance was observed between both techniques for the majority of datasets. Discordant results were observed for 5 out of 39 clonal and 454 datasets, which could be attributed to primer-related selective amplification used for clonal sequencing. Both 454 and clonal datasets consisted of a few major variants and a large number of low-frequency variants. Telaprevir resistance-associated variants were observed in low frequencies and were detected more often by 454. We conclude that performance of 454 and clonal sequencing is comparable for the characterization of intra-host virus populations. Not surprisingly, 454 is superior for the detection of low frequency resistance-associated variants. However, despite the greater coverage, 454 failed to detect some low frequency variants detected by clonal sequencing. PMID:25818622
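
    Two of the diversity measures named in this record can be sketched briefly. The abstract does not give the formulas, so the sketch assumes one common convention: Shannon entropy of the haplotype frequencies divided by ln(N), with N the number of sequences, and average pairwise distance as the mean proportion of differing sites between aligned, equal-length sequences. The example sequences are invented.

```python
import math
from collections import Counter
from itertools import combinations


def normalized_shannon_entropy(sequences):
    """Shannon entropy of haplotype frequencies, normalized by ln(N).

    N is the number of sequences; this normalization is an assumed convention.
    """
    n = len(sequences)
    if n < 2:
        return 0.0
    counts = Counter(sequences)
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return entropy / math.log(n)


def average_pairwise_distance(sequences):
    """Mean proportion of differing sites over all pairs of aligned sequences."""
    pairs = list(combinations(sequences, 2))
    if not pairs:
        return 0.0

    def p_distance(a, b):
        return sum(x != y for x, y in zip(a, b)) / len(a)

    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical aligned clonal sequences from one sample.
    clones = ["ACGT", "ACGT", "ACGA", "TCGA"]
    print(normalized_shannon_entropy(clones))
    print(average_pairwise_distance(clones))
```

    Because the normalization depends on N, values computed from a few dozen clones and from thousands of 454 reads are not directly comparable without care.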

  20. Consensus Rules in Variant Detection from Next-Generation Sequencing Data

    PubMed Central

    Jia, Peilin; Li, Fei; Xia, Jufeng; Chen, Haiquan; Ji, Hongbin; Pao, William; Zhao, Zhongming

    2012-01-01

    A critical step in detecting variants from next-generation sequencing data is post hoc filtering of putative variants called or predicted by computational tools. Here, we highlight four critical parameters that could enhance the accuracy of called single nucleotide variants and insertions/deletions: quality and deepness, refinement and improvement of initial mapping, allele/strand balance, and examination of spurious genes. Use of these sequence features appropriately in variant filtering could greatly improve validation rates, thereby saving time and costs in next-generation sequencing projects. PMID:22715385
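
    Post hoc filtering of this kind can be sketched as a set of threshold checks on each putative call. The example covers only two of the four parameters named in the abstract (quality and depth, and allele/strand balance); the field names and every threshold are hypothetical and would need tuning for a real pipeline.

```python
from dataclasses import dataclass


@dataclass
class VariantCall:
    name: str
    depth: int          # total reads covering the site
    quality: float      # Phred-scaled call quality
    ref_reads: int      # reads supporting the reference allele
    alt_forward: int    # alternate-allele reads on the forward strand
    alt_reverse: int    # alternate-allele reads on the reverse strand


def passes_filters(call, min_depth=10, min_quality=30.0,
                   min_alt_fraction=0.2, min_strand_fraction=0.1):
    """Apply depth, quality, allele-balance and strand-balance checks."""
    alt_reads = call.alt_forward + call.alt_reverse
    if call.depth < min_depth or call.quality < min_quality:
        return False
    if alt_reads == 0:
        return False
    # Allele balance: the alternate allele must make up enough of the reads.
    if alt_reads / (call.ref_reads + alt_reads) < min_alt_fraction:
        return False
    # Strand balance: alternate reads should not all come from one strand.
    minor_strand = min(call.alt_forward, call.alt_reverse)
    return minor_strand / alt_reads >= min_strand_fraction


if __name__ == "__main__":
    calls = [
        VariantCall("balanced", 40, 50.0, 20, 10, 10),
        VariantCall("strand_biased", 40, 50.0, 20, 20, 0),
        VariantCall("shallow", 5, 50.0, 3, 1, 1),
    ]
    for call in calls:
        print(call.name, passes_filters(call))
```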

  1. Predicting effects of noncoding variants with deep learning-based sequence model.

    PubMed

    Zhou, Jian; Troyanskaya, Olga G

    2015-10-01

    Identifying functional effects of noncoding variants is a major challenge in human genetics. To predict the noncoding-variant effects de novo from sequence, we developed a deep learning-based algorithmic framework, DeepSEA (http://deepsea.princeton.edu/), that directly learns a regulatory sequence code from large-scale chromatin-profiling data, enabling prediction of chromatin effects of sequence alterations with single-nucleotide sensitivity. We further used this capability to improve prioritization of functional variants including expression quantitative trait loci (eQTLs) and disease-associated variants. PMID:26301843

  2. Predicting effects of noncoding variants with deep learning–based sequence model

    PubMed Central

    Zhou, Jian; Troyanskaya, Olga G

    2016-01-01

    Identifying functional effects of noncoding variants is a major challenge in human genetics. To predict the noncoding-variant effects de novo from sequence, we developed a deep learning–based algorithmic framework, DeepSEA (http://deepsea.princeton.edu/), that directly learns a regulatory sequence code from large-scale chromatin-profiling data, enabling prediction of chromatin effects of sequence alterations with single-nucleotide sensitivity. We further used this capability to improve prioritization of functional variants including expression quantitative trait loci (eQTLs) and disease-associated variants. PMID:26301843

  3. Rapid identification of an antibody DNA construct rearrangement sequence variant by mass spectrometry.

    PubMed

    Scott, Rebecca A; Rogers, Rich; Balland, Alain; Brady, Lowell J

    2014-01-01

    During cell line development for an IgG1 antibody candidate (mAb1), a C-terminal extension was identified in 2 product candidate clones expressed in CHO-K1 cell line. The extension was initially observed as the presence of anomalous new peaks in these clones after analysis by cation exchange chromatography (CEX-HPLC) and reduced capillary electrophoresis (rCE-SDS). Reduced mass analysis of these CHO-K1 clones revealed that a larger than expected mass was present on a sub-population of the heavy chain species, which could not be explained by any known chemical or post-translational modifications. It was suspected that this additional mass on the heavy chain was due to the presence of an additional amino acid sequence. To identify the suspected additional sequence, de novo sequencing in combination with proteomic searching was performed against translated DNA vectors for the heavy chain and light chain. Peptides unique to the clones containing the extension were identified matching short sequences (corresponding to 9 and 35 amino acids, respectively) from 2 non-coding sections of the light chain vector construct. After investigation, this extension was observed to be due to the re-arrangement of the DNA construct, with the addition of amino acids derived from the light chain vector non-translated sequence to the C-terminus of the heavy chain. This observation showed the power of proteomic mass spectrometric techniques to identify an unexpected antibody sequence variant using de novo sequencing combined with database searching, and allowed for rapid identification of the root cause for new peaks in the cation exchange and rCE-SDS assays. PMID:25484040

  4. Discovery of Rare Variants via Sequencing: Implications for the Design of Complex Trait Association Studies

    PubMed Central

    Li, Bingshan; Leal, Suzanne M.

    2009-01-01

    There is strong evidence that rare variants are involved in complex disease etiology. The first step in implicating rare variants in disease etiology is their identification through sequencing in both randomly ascertained samples (e.g., the 1,000 Genomes Project) and samples ascertained according to disease status. We investigated to what extent rare variants will be observed across the genome and in candidate genes in randomly ascertained samples, the magnitude of variant enrichment in diseased individuals, and biases that can occur due to how variants are discovered. Although sequencing cases can enrich for causal variants, when a gene or genes are not involved in disease etiology, limiting variant discovery to cases can lead to association studies with dramatically inflated false positive rates. PMID:19436704

  5. Replication Strategies for Rare Variant Complex Trait Association Studies via Next-Generation Sequencing

    PubMed Central

    Liu, Dajiang J.; Leal, Suzanne M.

    2010-01-01

    There is solid evidence that complex traits can be caused by rare variants. Next-generation sequencing technologies are powerful tools for mapping rare variants. Confirmation of significant findings in stage 1 through replication in an independent stage 2 sample is necessary for association studies. For gene-based mapping of rare variants, two replication strategies are possible: (1) variant-based replication, wherein only variants from nucleotide sites uncovered in stage 1 are genotyped and followed up and (2) sequence-based replication, wherein the gene region is sequenced in the replication sample and both known and novel variants are tested. The efficiency of the two strategies is dependent on the proportions of causative variants discovered in stage 1 and sequencing/genotyping errors. With rigorous population genetic and phenotypic models, it is demonstrated that sequence-based replication is consistently more powerful. However, the power gain is small (1) for large-scale studies with thousands of individuals, because a large fraction of causative variant sites can be observed and (2) for small- to medium-scale studies with a few hundred samples, because a large proportion of the locus population attributable risk can be explained by the uncovered variants. Therefore, genotyping can be a temporary solution for replicating genetic studies if stage 1 and 2 samples are drawn from the same population. However, sequence-based replication is advantageous if the stage 1 sample is small or novel variant discovery is also of interest. It is shown that currently attainable levels of sequencing error only minimally affect the comparison, and the advantage of sequence-based replication remains. PMID:21129725

  6. Characterization of alanine to valine sequence variants in the Fc region of nivolumab biosimilar produced in Chinese hamster ovary cells.

    PubMed

    Li, Yantao; Fu, Tuo; Liu, Tao; Guo, Huaizu; Guo, Qingcheng; Xu, Jin; Zhang, Dapeng; Qian, Weizhu; Dai, Jianxin; Li, Bohua; Guo, Yajun; Hou, Sheng; Wang, Hao

    2016-07-01

    Nivolumab is a therapeutic fully human IgG4 antibody to programmed death 1 (PD-1). In this study, a nivolumab biosimilar, which was produced in our laboratory, was analyzed and characterized. Sequence variants that contain undesired amino acid sequences may cause concern during biosimilar bioprocess development. We found that low levels of sequence variants were detected in the heavy chain of the nivolumab biosimilar by ultra performance liquid chromatography (UPLC) and tandem mass spectrometry. The variant was further identified with UPLC-MS/MS after IdeS or trypsin digestion. The sequence variant was confirmed through addition of synthetic mutant peptide. Subsequently, the mixed base signal of normal and mutant sequence was detected through DNA sequencing. The relative levels of mutant A424V in the Fc region of the heavy chain were determined to be 12.25% and 13.54%, via base peak intensity (BPI) and UV chromatography of the tryptic peptide mapping, respectively. The A424V variant was also quantified by real-time PCR (RT-PCR) at the DNA and RNA level, which was 19.2% and 16.8%, respectively. The relative content of the mutant was consistent at the DNA, RNA and protein level, indicating that the A424V mutation may have little influence at transcriptional or translational levels. These results demonstrate that orthogonal state-of-the-art techniques such as LC-UV-MS and RT-PCR should be implemented to characterize recombinant proteins and cell lines for development of biosimilars. Our study suggests that it is important to establish an integrated and effective analytical method to monitor and characterize sequence variants during antibody drug development, especially for antibody biosimilar products. PMID:27050807

  7. Sequence variants from whole genome sequencing a large group of Icelanders.

    PubMed

    Gudbjartsson, Daniel F; Sulem, Patrick; Helgason, Hannes; Gylfason, Arnaldur; Gudjonsson, Sigurjon A; Zink, Florian; Oddson, Asmundur; Magnusson, Gisli; Halldorsson, Bjarni V; Hjartarson, Eirikur; Sigurdsson, Gunnar Th; Kong, Augustine; Helgason, Agnar; Masson, Gisli; Magnusson, Olafur Th; Thorsteinsdottir, Unnur; Stefansson, Kari

    2015-01-01

    We have accumulated considerable data on the genetic makeup of the Icelandic population by sequencing the whole genomes of 2,636 Icelanders to a depth of at least 10X and by chip genotyping 101,584 more. The sequencing was done with Illumina technology. The median sequencing depth was 20X and 909 individuals were sequenced to a depth of at least 30X. We found 20 million single nucleotide polymorphisms (SNPs) and 1.5 million insertions/deletions (indels) that passed stringent quality control. Almost all the common SNPs (derived allele frequency (DAF) over 2%) that we identified in Iceland have been observed by either dbSNP (build 137) or the Exome Sequencing Project (ESP) while only 60 and 20% of rare (DAF<0.5%) SNPs and indels in coding regions, the most heavily studied parts of the genome, have been observed in the public databases. Features of our variant data, such as the transition/transversion ratio and the length distribution of indels, are similar to published reports. PMID:25977816
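
    The transition/transversion ratio mentioned above is a common sanity check on SNP calls. A minimal sketch of how it can be computed, assuming SNPs are given as (reference base, alternate base) pairs; the function names and the toy input are illustrative and are not taken from the study:

```python
# A transition keeps the base within its chemical class
# (purine <-> purine or pyrimidine <-> pyrimidine).
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """Return True for A<->G or C<->T substitutions."""
    pair = {ref.upper(), alt.upper()}
    return len(pair) == 2 and (pair <= PURINES or pair <= PYRIMIDINES)


def ts_tv_ratio(snps):
    """Compute transitions / transversions for an iterable of (ref, alt) pairs."""
    ts = tv = 0
    for ref, alt in snps:
        if ref.upper() == alt.upper():
            continue  # not a substitution, ignore
        if is_transition(ref, alt):
            ts += 1
        else:
            tv += 1
    if tv == 0:
        raise ValueError("no transversions observed; ratio is undefined")
    return ts / tv


# Hypothetical toy input (3 transitions, 2 transversions), not study data.
toy_snps = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C"), ("G", "T")]
print(ts_tv_ratio(toy_snps))
```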

  8. Emergence of telaprevir-resistant variants detected by ultra-deep sequencing after triple therapy in patients infected with HCV genotype 1.

    PubMed

    Akuta, Norio; Suzuki, Fumitaka; Seko, Yuya; Kawamura, Yusuke; Sezaki, Hitomi; Suzuki, Yoshiyuki; Hosaka, Tetsuya; Kobayashi, Masahiro; Hara, Tasuku; Kobayashi, Mariko; Saitoh, Satoshi; Arase, Yasuji; Ikeda, Kenji; Kumada, Hiromitsu

    2013-06-01

    Using ultra-deep sequencing technology, the present study was designed to investigate whether the emergence of telaprevir-resistant variants (amino acid substitutions at positions aa36, aa54, aa155, aa156, and aa170 in the HCV NS3 region) after commencement of triple therapy with telaprevir/peginterferon (PEG-IFN)/ribavirin could be predicted at baseline in previous non-responders to dual therapy. Fourteen patients infected with HCV genotype 1 who did not respond to previous PEG-IFN/ribavirin received a 24-week regimen of triple therapy, and were evaluated for appearance of telaprevir-resistant variants (amino acid substitutions of more than 0.2% among the total coverage) by ultra-deep sequencing. The sustained virological response rate was 28.6% (4 of 14 patients), which was significantly higher in patients with Arg70 (substitution at core aa70) and partial response (type of previous response to PEG-IFN/ribavirin) than in other patients. Telaprevir-resistant variants at baseline were detected in 7.1% (1 of 14 patients) by direct sequencing and in 21.4% (3 of 14 patients) by ultra-deep sequencing. The appearance of telaprevir-resistant variants was examined by ultra-deep sequencing in the 10 patients who did not achieve a sustained virological response. De novo variants emerged at re-elevation of viral load, regardless of variant frequencies at baseline (one patient with very high frequency variants [T54S: 99.9%], two patients with very low frequency variants [V36A: 0.2%; and V170A: 0.4%], and seven patients with undetectable variants). It is concluded that it is difficult to predict at baseline the emergence of telaprevir-resistant variants after commencement of triple therapy in prior non-responders of HCV genotype 1, even with the use of ultra-deep sequencing. PMID:23588728
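
    The detection rule quoted above (substitutions of more than 0.2% among the total coverage) can be illustrated with a short sketch. It assumes the residue observed in each read at one position is already available as a list; the helper names and read counts are hypothetical and do not reproduce the authors' pipeline:

```python
from collections import Counter


def variant_frequencies(residues, reference):
    """Fraction of reads carrying each non-reference residue at one position."""
    counts = Counter(residues)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {aa: n / total for aa, n in counts.items() if aa != reference}


def detected_variants(residues, reference, threshold=0.002):
    """Keep substitutions whose frequency exceeds the threshold (0.2% by default)."""
    freqs = variant_frequencies(residues, reference)
    return {aa: f for aa, f in freqs.items() if f > threshold}


# Hypothetical coverage at NS3 aa36: 9,970 reads with V, 25 with A, 5 with M.
reads_at_aa36 = ["V"] * 9970 + ["A"] * 25 + ["M"] * 5
calls = detected_variants(reads_at_aa36, reference="V")
print(calls)  # A (0.25%) passes the cutoff; M (0.05%) does not
```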

  9. Rare Variants Association Analysis in Large-Scale Sequencing Studies at the Single Locus Level

    PubMed Central

    Lu, Wenbin; Tzeng, Jung-Ying

    2016-01-01

    Genetic association analyses of rare variants in next-generation sequencing (NGS) studies are fundamentally challenging due to the presence of a very large number of candidate variants at extremely low minor allele frequencies. Recent developments often focus on pooling multiple variants to provide association analysis at the gene instead of the locus level. Nonetheless, pinpointing individual variants is a critical goal for genomic research, as such information can facilitate the precise delineation of molecular mechanisms and functions of genetic factors on diseases. Due to the extreme rarity of mutations and high dimensionality, significances of causal variants cannot easily stand out from those of noncausal ones. Consequently, standard false-positive control procedures, such as the Bonferroni and false discovery rate (FDR), are often impractical to apply, as a majority of the causal variants can only be identified along with a small but unknown number of noncausal variants. To provide informative analysis of individual variants in large-scale sequencing studies, we propose the Adaptive False-Negative Control (AFNC) procedure that can include a large proportion of causal variants with high confidence by introducing a novel statistical inquiry to determine those variants that can be confidently dispatched as noncausal. The AFNC provides a general framework that can accommodate a variety of models and significance tests. The procedure is computationally efficient and can adapt to the underlying proportion of causal variants and quality of significance rankings. Extensive simulation studies across a plethora of scenarios demonstrate that the AFNC is advantageous for identifying individual rare variants, whereas the Bonferroni and FDR are exceedingly over-conservative for rare variants association studies. In the analyses of the CoLaus dataset, AFNC has identified individual variants most responsible for gene-level significances. Moreover, single-variant results
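
    For readers unfamiliar with the two baseline procedures named in this record, the sketch below contrasts Bonferroni correction with the Benjamini-Hochberg FDR step-up rule on a toy list of p-values. It does not implement AFNC, whose details are not given here; the function names and p-values are illustrative assumptions:

```python
def bonferroni(pvalues, alpha=0.05):
    """Reject H0 for p-values at or below alpha divided by the number of tests."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg(pvalues, alpha=0.05):
    """Benjamini-Hochberg step-up rule controlling the false discovery rate."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k with p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            k_max = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            rejected[idx] = True
    return rejected


# Hypothetical single-variant p-values, not results from the paper.
toy_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print(sum(bonferroni(toy_p)), sum(benjamini_hochberg(toy_p)))
```

    On this toy input Bonferroni rejects one hypothesis and Benjamini-Hochberg rejects two, which shows how both rules can leave most borderline signals unselected.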

  10. Chip-based sequencing nucleic acids

    DOEpatents

    Beer, Neil Reginald

    2014-08-26

    A system for fast DNA sequencing by amplification of genetic material within microreactors, denaturing, demulsifying, and then sequencing the material, while retaining it in a PCR/sequencing zone by a magnetic field. One embodiment includes sequencing nucleic acids on a microchip that includes a microchannel flow channel in the microchip. The nucleic acids are isolated and hybridized to magnetic nanoparticles or to magnetic polystyrene-coated beads. Microreactor droplets are formed in the microchannel flow channel. The microreactor droplets containing the nucleic acids and the magnetic nanoparticles are retained in a magnetic trap in the microchannel flow channel and sequenced.

  11. Complete Genome Sequence of a Variant Pseudorabies Virus Strain Isolated in Central China

    PubMed Central

    Xiang, Shuangshuang; Zhou, Zhi; Hu, Xule; Li, Yingying; Zhang, Chaolin; Wang, Juan; Li, Xiangdong

    2016-01-01

    Pseudorabies virus (PRV) variants have been prevalent in China since 2011 and have caused huge economic losses to the Chinese pig industry. Here, we report the genome sequence of a PRV variant HN1201 that was isolated from diseased animals in central China in 2011. PMID:26988055

  12. Detection and characterization of two co-infection variant strains of avian orthoreovirus (ARV) in young layer chickens using next-generation sequencing (NGS)

    PubMed Central

    Tang, Yi; Lin, Lin; Sebastian, Aswathy; Lu, Huaguang

    2016-01-01

    Using next-generation sequencing (NGS) for full genomic characterization studies of the newly emerging avian orthoreovirus (ARV) field strains isolated in Pennsylvania poultry, we identified two co-infection ARV variant strains from one ARV isolate obtained from ARV-affected young layer chickens. The de novo assembly of the ARV reads generated 19 contigs of two different ARV variant strains according to 10 genome segments of each ARV strain. The two variants had the same M2 segment. The complete genomes of each of the two variant strains were 23,493 bp in length, and 10 dsRNA segments ranged from 1192 bp (S4) to 3958 bp (L1), encoding 12 viral proteins. Sequence comparison of nucleotide (nt) and amino acid (aa) sequences of all 10 genome segments revealed 58.1–100% nt and 51.4–100% aa identity between the two variant strains, and 54.3–89.4% nt and 49.5–98.1% aa identity between the two variants and classic vaccine strains. Phylogenetic analysis revealed a moderate to significant nt sequence divergence between the two variant and ARV reference strains. These findings have demonstrated the first naturally occurring co-infection of two ARV variants in commercial young layer chickens, providing scientific evidence that multiple ARV strains can be simultaneously present in one host species of chickens. PMID:27089943

  13. Distinguishing Proteins From Arbitrary Amino Acid Sequences

    PubMed Central

    Yau, Stephen S.-T.; Mao, Wei-Guang; Benson, Max; He, Rong Lucy

    2015-01-01

    What kinds of amino acid sequences could possibly be protein sequences? From all existing databases that we can find, known proteins are only a small fraction of all possible combinations of amino acids. Beginning with Sanger's first detailed determination of a protein sequence in 1952, previous studies have focused on describing the structure of existing protein sequences in order to construct the protein universe. No one, however, has developed a criterion for determining whether an arbitrary amino acid sequence can be a protein. Here we show that when the collection of arbitrary amino acid sequences is viewed in an appropriate geometric context, the protein sequences cluster together. This leads to a new computational test, described here, that has proved to be remarkably accurate at determining whether an arbitrary amino acid sequence can be a protein. Even more, if the results of this test indicate that the sequence can be a protein, and it is indeed a protein sequence, then its identity as a protein sequence is uniquely defined. We anticipate our computational test will be useful for those who are attempting to complete the job of discovering all proteins, or constructing the protein universe. PMID:25609314

  14. The Swiss-Prot variant page and the ModSNP database: a resource for sequence and structure information on human protein variants.

    PubMed

    Yip, Yum L; Scheib, Holger; Diemand, Alexander V; Gattiker, Alexandre; Famiglietti, Livia M; Gasteiger, Elisabeth; Bairoch, Amos

    2004-05-01

    Missense mutation leading to single amino acid polymorphism (SAP) is the type of mutation most frequently related to human diseases. The Swiss-Prot protein knowledgebase records information on such mutations in various sections of a protein entry, namely in the "feature," "comment," and "reference" fields. To facilitate users in obtaining the most relevant information about each human SAP recorded in the knowledgebase, the Swiss-Prot Variant web pages were created to provide a summary of available sequence information, as well as additional structural information on each variant. In particular, the ModSNP database was set up to store information related to SAPs and to manage the modeling of SAPs onto protein structures via an automatic homology modeling pipeline. Currently, among the 16,566 human SAPs recorded in the Swiss-Prot knowledgebase (release 42.5, 21 November 2003), more than 25% have corresponding 3D-models. Of these variants, 47% are related to disease, 26% are polymorphisms, and 27% are not yet clearly classified. The ModSNP database is updated and the subsequent model construction pipeline is launched with each weekly Swiss-Prot release. Thus, the ModSNP database represents a valuable resource for the structural analysis of protein variation. The Swiss-Prot variant pages are accessible from the NiceProt view of a Swiss-Prot entry on the ExPASy server (www.expasy.org/), via a hyperlink created for the stable and unique identifier FTId of each human SAP. PMID:15108278

  15. Test for Rare Variants by Environment Interactions in Sequencing Association Studies

    PubMed Central

    Lin, Xinyi; Lee, Seunggeun; Wu, Michael C.; Wang, Chaolong; Chen, Han; Li, Zilin; Lin, Xihong

    2015-01-01

    Summary: We consider in this paper testing rare variants by environment interactions in sequencing association studies. Current methods for studying the association of rare variants with traits cannot be readily applied for testing for rare variants by environment interactions, as these methods do not effectively control for the main effects of rare variants, leading to unstable results and/or inflated Type 1 error rates. We will first analytically study the bias of the use of conventional burden based tests for rare variants by environment interactions, and show the tests can often be invalid and result in inflated Type 1 error rates. To overcome these difficulties, we develop the interaction sequence kernel association test (iSKAT) for assessing rare variants by environment interactions. The proposed test iSKAT is optimal in a class of variance component tests and is powerful and robust to the proportion of variants in a gene that interact with environment and the signs of the effects. This test properly controls for the main effects of the rare variants using weighted ridge regression while adjusting for covariates. We demonstrate the performance of iSKAT using simulation studies and illustrate its application by analysis of a candidate gene sequencing study of plasma adiponectin levels. PMID:26229047

  16. Method for sequencing nucleic acid molecules

    DOEpatents

    Korlach, Jonas; Webb, Watt W.; Levene, Michael; Turner, Stephen; Craighead, Harold G.; Foquet, Mathieu

    2006-05-30

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid, i.e. the activity of a nucleic acid polymerizing enzyme on the template nucleic acid molecule to be sequenced is followed in real time. The sequence is deduced by identifying which base is being incorporated into the growing complementary strand of the target nucleic acid by the catalytic activity of the nucleic acid polymerizing enzyme at each step in the sequence of base additions. A polymerase on the target nucleic acid molecule complex is provided in a position suitable to move along the target nucleic acid molecule and extend the oligonucleotide primer at an active site. A plurality of labelled types of nucleotide analogs are provided proximate to the active site, with each distinguishable type of nucleotide analog being complementary to a different nucleotide in the target nucleic acid sequence. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand at the active site, where the nucleotide analog being added is complementary to the nucleotide of the target nucleic acid at the active site. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. The steps of providing labelled nucleotide analogs, polymerizing the growing nucleic acid strand, and identifying the added nucleotide analog are repeated so that the nucleic acid strand is further extended and the sequence of the target nucleic acid is determined.

  17. Method for sequencing nucleic acid molecules

    DOEpatents

    Korlach, Jonas; Webb, Watt W.; Levene, Michael; Turner, Stephen; Craighead, Harold G.; Foquet, Mathieu

    2006-06-06

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid, i.e. the activity of a nucleic acid polymerizing enzyme on the template nucleic acid molecule to be sequenced is followed in real time. The sequence is deduced by identifying which base is being incorporated into the growing complementary strand of the target nucleic acid by the catalytic activity of the nucleic acid polymerizing enzyme at each step in the sequence of base additions. A polymerase on the target nucleic acid molecule complex is provided in a position suitable to move along the target nucleic acid molecule and extend the oligonucleotide primer at an active site. A plurality of labelled types of nucleotide analogs are provided proximate to the active site, with each distinguishable type of nucleotide analog being complementary to a different nucleotide in the target nucleic acid sequence. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand at the active site, where the nucleotide analog being added is complementary to the nucleotide of the target nucleic acid at the active site. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. The steps of providing labelled nucleotide analogs, polymerizing the growing nucleic acid strand, and identifying the added nucleotide analog are repeated so that the nucleic acid strand is further extended and the sequence of the target nucleic acid is determined.

  18. RVTESTS: an efficient and comprehensive tool for rare variant association analysis using sequence data

    PubMed Central

    Zhan, Xiaowei; Hu, Youna; Li, Bingshan; Abecasis, Goncalo R.; Liu, Dajiang J.

    2016-01-01

    Motivation: Next-generation sequencing technologies have enabled the large-scale assessment of the impact of rare and low-frequency genetic variants for complex human diseases. Gene-level association tests are often performed to analyze rare variants, where multiple rare variants in a gene region are analyzed jointly. Applying gene-level association tests to analyze sequence data often requires integrating multiple heterogeneous sources of information (e.g. annotations, functional prediction scores, allele frequencies, genotypes and phenotypes) to determine the optimal analysis unit and prioritize causal variants. Given the complexity and scale of current sequence datasets and bioinformatics databases, there is a compelling need for more efficient software tools to facilitate these analyses. To answer this challenge, we developed RVTESTS, which implements a broad set of rare variant association statistics and supports the analysis of autosomal and X-linked variants for both unrelated and related individuals. RVTESTS also provides useful companion features for annotating sequence variants, integrating bioinformatics databases, performing data quality control and sample selection. We illustrate the advantages of RVTESTS in functionality and efficiency using the 1000 Genomes Project data. Availability and implementation: RVTESTS is available on Linux, MacOS and Windows. Source code and executable files can be obtained at https://github.com/zhanxw/rvtests Contact: zhanxw@gmail.com; goncalo@umich.edu; dajiang.liu@outlook.com Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27153000
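
    The idea of a gene-level test, in which multiple rare variants in a gene region are analyzed jointly, can be sketched with a simple carrier-collapsing (burden-style) comparison. This is an illustrative toy under stated assumptions, not the RVTESTS implementation or its command-line interface; the function names, the 1% frequency cutoff, and the data are hypothetical:

```python
from math import comb


def collapse_carriers(genotypes, mafs, maf_cutoff=0.01):
    """Mark each individual as a carrier if any rare variant in the gene is present.

    genotypes: one list per individual of minor-allele counts (0, 1 or 2) per variant.
    mafs: minor allele frequency of each variant, in the same order.
    """
    rare = [j for j, maf in enumerate(mafs) if maf < maf_cutoff]
    return [any(person[j] > 0 for j in rare) for person in genotypes]


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher exact p-value for enrichment of carriers among cases.

    Table layout: a = case carriers, b = case non-carriers,
                  c = control carriers, d = control non-carriers.
    """
    n_cases, n_carriers, n_total = a + b, a + c, a + b + c + d
    denom = comb(n_total, n_cases)
    upper = min(n_cases, n_carriers)
    return sum(
        comb(n_carriers, k) * comb(n_total - n_carriers, n_cases - k)
        for k in range(a, upper + 1)
    ) / denom


# Hypothetical gene with three variants; the third is common and is ignored.
mafs = [0.002, 0.004, 0.20]
cases = [[1, 0, 0], [0, 1, 1], [1, 0, 2], [0, 0, 1]]
controls = [[0, 0, 1], [0, 0, 0], [0, 0, 2], [0, 0, 0]]

case_carriers = sum(collapse_carriers(cases, mafs))
control_carriers = sum(collapse_carriers(controls, mafs))
p_value = fisher_one_sided(
    case_carriers, len(cases) - case_carriers,
    control_carriers, len(controls) - control_carriers,
)
print(case_carriers, control_carriers, p_value)
```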

  19. Detection of Genomic Structural Variants from Next-Generation Sequencing Data

    PubMed Central

    Tattini, Lorenzo; D’Aurizio, Romina; Magi, Alberto

    2015-01-01

    Structural variants are genomic rearrangements larger than 50 bp accounting for around 1% of the variation among human genomes. They impact on phenotypic diversity and play a role in various diseases including neurological/neurocognitive disorders and cancer development and progression. Dissecting structural variants from next-generation sequencing data presents several challenges and a number of approaches have been proposed in the literature. In this mini review, we describe and summarize the latest tools – and their underlying algorithms – designed for the analysis of whole-genome sequencing, whole-exome sequencing, custom captures, and amplicon sequencing data, pointing out the major advantages/drawbacks. We also report a summary of the most recent applications of third-generation sequencing platforms. This assessment provides a guided indication – with particular emphasis on human genetics and copy number variants – for researchers involved in the investigation of these genomic events. PMID:26161383

  20. Simple and efficient identification of rare recessive pathologically important sequence variants from next generation exome sequence data.

    PubMed

    Carr, Ian M; Morgan, Joanne; Watson, Christopher; Melnik, Svitlana; Diggle, Christine P; Logan, Clare V; Harrison, Sally M; Taylor, Graham R; Pena, Sergio D J; Markham, Alexander F; Alkuraya, Fowzan S; Black, Graeme C M; Ali, Manir; Bonthron, David T

    2013-07-01

    Massively parallel ("next generation") DNA sequencing (NGS) has quickly become the method of choice for seeking pathogenic mutations in rare uncharacterized monogenic diseases. Typically, before DNA sequencing, protein-coding regions are enriched from patient genomic DNA, representing either the entire genome ("exome sequencing") or selected mapped candidate loci. Sequence variants, identified as differences between the patient's and the human genome reference sequences, are then filtered according to various quality parameters. Changes are screened against datasets of known polymorphisms, such as dbSNP and the 1000 Genomes Project, in the effort to narrow the list of candidate causative variants. An increasing number of commercial services now offer to both generate and align NGS data to a reference genome. This potentially allows small groups with limited computing infrastructure and informatics skills to utilize this technology. However, the capability to effectively filter and assess sequence variants is still an important bottleneck in the identification of deleterious sequence variants in both research and diagnostic settings. We have developed an approach to this problem comprising a user-friendly suite of programs that can interactively analyze, filter and screen data from enrichment-capture NGS data. These programs ("Agile Suite") are particularly suitable for small-scale gene discovery or for diagnostic analysis. PMID:23554237
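
    The filtering steps described above (quality filtering, then screening against known polymorphisms) can be sketched as follows. This is a minimal illustration and not the Agile Suite code; the `Variant` fields, the quality cutoff, and the restriction to homozygous calls (suggested by the recessive setting in the title) are assumptions made for the example:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str
    quality: float    # caller-reported quality score (scale is an assumption)
    homozygous: bool  # True if the patient carries two copies of the alt allele

    @property
    def key(self):
        """Identity used to look the variant up in a polymorphism set."""
        return (self.chrom, self.pos, self.ref, self.alt)


def filter_candidates(variants, known_polymorphisms, min_quality=30.0):
    """Keep good-quality, homozygous variants absent from known polymorphisms."""
    return [
        v for v in variants
        if v.quality >= min_quality
        and v.homozygous
        and v.key not in known_polymorphisms
    ]


# Hypothetical inputs; a real set would be built from resources such as dbSNP.
known = {("1", 1000, "A", "G")}
calls = [
    Variant("1", 1000, "A", "G", 50.0, True),   # known polymorphism
    Variant("2", 500, "C", "T", 12.0, True),    # low quality
    Variant("3", 42, "G", "A", 60.0, False),    # heterozygous
    Variant("4", 77, "T", "C", 45.0, True),     # remaining candidate
]
candidates = filter_candidates(calls, known)
print([v.key for v in candidates])
```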

  1. Identification of rare variants from exome sequence in a large pedigree with autism

    PubMed Central

    Marchani, E. E.; Chapman, N. H.; Cheung, C. Y. K.; Ankenman, K.; Stanaway, I. B.; Coon, H. H.; Nickerson, D.; Bernier, R.; Brkanac, Z.; Wijsman, E. M.

    2013-01-01

    We carried out analyses with the goal of identifying rare variants in exome sequence data that contribute to disease risk for a complex trait. We analyzed a large, 47-member multigenerational pedigree with 11 cases of autism spectrum disorder, using genotypes from three technologies representing increasing resolution: a multiallelic linkage marker panel; a dense diallelic marker panel; and variants from exome sequencing. Genome-scan marker genotypes were available on most subjects, and exome sequence data was available on 5 subjects. We used genome-scan linkage analysis to identify and prioritize the chromosome 22 region of interest, and to select subjects for exome sequencing. Inheritance vectors (IVs) generated by Markov chain Monte Carlo analysis of multilocus marker data were the foundation of most analyses. Genotype imputation used IVs to determine which sequence variants reside on the haplotype that co-segregates with the autism diagnosis. Together with a rare-allele frequency filter, we identified only one rare variant on the risk haplotype, illustrating the potential of this approach to prioritize variants. The associated gene, MYH9, is biologically unlikely, and we speculate that for this complex trait, the key variants may lie outside the exome. PMID:23594493

  2. Somatic mutations and germline sequence variants in the expressed tyrosine kinase genes of patients with de novo acute myeloid leukemia

    PubMed Central

    Xiang, Zhifu; Walgren, Richard; Zhao, Yu; Kasai, Yumi; Miner, Tracie; Ries, Rhonda E.; Lubman, Olga; Fremont, Daved H.; McLellan, Michael D.; Payton, Jacqueline E.; Westervelt, Peter; DiPersio, John F.; Link, Daniel C.; Walter, Matthew J.; Graubert, Timothy A.; Watson, Mark; Baty, Jack; Heath, Sharon; Shannon, William D.; Nagarajan, Rakesh; Bloomfield, Clara D.; Mardis, Elaine R.; Wilson, Richard K.; Ley, Timothy J.

    2008-01-01

    Activating mutations in tyrosine kinase (TK) genes (eg, FLT3 and KIT) are found in more than 30% of patients with de novo acute myeloid leukemia (AML); many groups have speculated that mutations in other TK genes may be present in the remaining 70%. We performed high-throughput resequencing of the kinase domains of 26 TK genes (11 receptor TK; 15 cytoplasmic TK) expressed in most AML patients using genomic DNA from the bone marrow (tumor) and matched skin biopsy samples (“germline”) from 94 patients with de novo AML; sequence variants were validated in an additional 94 AML tumor samples (14.3 million base pairs of sequence were obtained and analyzed). We identified known somatic mutations in FLT3, KIT, and JAK2 TK genes at the expected frequencies and found 4 novel somatic mutations, JAK1V623A, JAK1T478S, DDR1A803V, and NTRK1S677N, once each in 4 respective patients of 188 tested. We also identified novel germline sequence changes encoding amino acid substitutions (ie, nonsynonymous changes) in 14 TK genes, including TYK2, which had the largest number of nonsynonymous sequence variants (11 total detected). Additional studies will be required to define the roles that these somatic and germline TK gene variants play in AML pathogenesis. PMID:18270328

  3. Re-sequencing expands our understanding of the phenotypic impact of variants at GWAS loci.

    PubMed

    Service, Susan K; Teslovich, Tanya M; Fuchsberger, Christian; Ramensky, Vasily; Yajnik, Pranav; Koboldt, Daniel C; Larson, David E; Zhang, Qunyuan; Lin, Ling; Welch, Ryan; Ding, Li; McLellan, Michael D; O'Laughlin, Michele; Fronick, Catrina; Fulton, Lucinda L; Magrini, Vincent; Swift, Amy; Elliott, Paul; Jarvelin, Marjo-Riitta; Kaakinen, Marika; McCarthy, Mark I; Peltonen, Leena; Pouta, Anneli; Bonnycastle, Lori L; Collins, Francis S; Narisu, Narisu; Stringham, Heather M; Tuomilehto, Jaakko; Ripatti, Samuli; Fulton, Robert S; Sabatti, Chiara; Wilson, Richard K; Boehnke, Michael; Freimer, Nelson B

    2014-01-01

    Genome-wide association studies (GWAS) have identified >500 common variants associated with quantitative metabolic traits, but in aggregate such variants explain at most 20-30% of the heritable component of population variation in these traits. To further investigate the impact of genotypic variation on metabolic traits, we conducted re-sequencing studies in >6,000 members of a Finnish population cohort (The Northern Finland Birth Cohort of 1966 [NFBC]) and a type 2 diabetes case-control sample (The Finland-United States Investigation of NIDDM Genetics [FUSION] study). By sequencing the coding sequence and 5' and 3' untranslated regions of 78 genes at 17 GWAS loci associated with one or more of six metabolic traits (serum levels of fasting HDL-C, LDL-C, total cholesterol, triglycerides, plasma glucose, and insulin), and conducting both single-variant and gene-level association tests, we obtained a more complete understanding of phenotype-genotype associations at eight of these loci. At all eight of these loci, the identification of new associations provides significant evidence for multiple genetic signals to one or more phenotypes, and at two loci, in the genes ABCA1 and CETP, we found significant gene-level evidence of association to non-synonymous variants with MAF<1%. Additionally, two potentially deleterious variants that demonstrated significant associations (rs138726309, a missense variant in G6PC2, and rs28933094, a missense variant in LIPC) were considerably more common in these Finnish samples than in European reference populations, supporting our prior hypothesis that deleterious variants could attain high frequencies in this isolated population, likely due to the effects of population bottlenecks. Our results highlight the value of large, well-phenotyped samples for rare-variant association analysis, and the challenge of evaluating the phenotypic impact of such variants. PMID:24497850

  4. Integrated platform for detection of DNA sequence variants using capillary array electrophoresis

    SciTech Connect

    Qingbro, Li; Liu, Zhaowei; Monroe, Heidi M; Culiat, Cymbeline T

    2002-08-01

    We have developed a highly versatile platform that performs temperature gradient capillary electrophoresis (TGCE) for mutation/single-nucleotide polymorphism (SNP) detection, sequencing and mutation/SNP genotyping for identification of sequence variants on an automated 24-, 96- or 192-capillary array instrument. In the first mode, multiple DNA samples consisting of homoduplexes and heteroduplexes are separated by CE, during which a temperature gradient is applied that covers all possible temperatures of 50% melting equilibrium (Tms) for the samples. The differences in Tms result in separation of homoduplexes from heteroduplexes, thereby identifying the presence of DNA variants. The sequencing mode is then used to determine the exact location of the mutation/SNPs in the DNA variants. The first two modes allow the rapid identification of variants from the screening of a large number of samples. Only the variants need to be sequenced. The third mode utilizes multiplexed single-base extensions (SBEs) to survey mutations and SNPs at the known sites of DNA sequence. The TGCE approach combined with sequencing and SBE is fast and cost-effective for high-throughput mutation/SNP detection.

  5. Whole-Exome Sequencing Identifies Rare and Low-Frequency Coding Variants Associated with LDL Cholesterol

    PubMed Central

    Lange, Leslie A.; Hu, Youna; Zhang, He; Xue, Chenyi; Schmidt, Ellen M.; Tang, Zheng-Zheng; Bizon, Chris; Lange, Ethan M.; Smith, Joshua D.; Turner, Emily H.; Jun, Goo; Kang, Hyun Min; Peloso, Gina; Auer, Paul; Li, Kuo-ping; Flannick, Jason; Zhang, Ji; Fuchsberger, Christian; Gaulton, Kyle; Lindgren, Cecilia; Locke, Adam; Manning, Alisa; Sim, Xueling; Rivas, Manuel A.; Holmen, Oddgeir L.; Gottesman, Omri; Lu, Yingchang; Ruderfer, Douglas; Stahl, Eli A.; Duan, Qing; Li, Yun; Durda, Peter; Jiao, Shuo; Isaacs, Aaron; Hofman, Albert; Bis, Joshua C.; Correa, Adolfo; Griswold, Michael E.; Jakobsdottir, Johanna; Smith, Albert V.; Schreiner, Pamela J.; Feitosa, Mary F.; Zhang, Qunyuan; Huffman, Jennifer E.; Crosby, Jacy; Wassel, Christina L.; Do, Ron; Franceschini, Nora; Martin, Lisa W.; Robinson, Jennifer G.; Assimes, Themistocles L.; Crosslin, David R.; Rosenthal, Elisabeth A.; Tsai, Michael; Rieder, Mark J.; Farlow, Deborah N.; Folsom, Aaron R.; Lumley, Thomas; Fox, Ervin R.; Carlson, Christopher S.; Peters, Ulrike; Jackson, Rebecca D.; van Duijn, Cornelia M.; Uitterlinden, André G.; Levy, Daniel; Rotter, Jerome I.; Taylor, Herman A.; Gudnason, Vilmundur; Siscovick, David S.; Fornage, Myriam; Borecki, Ingrid B.; Hayward, Caroline; Rudan, Igor; Chen, Y. Eugene; Bottinger, Erwin P.; Loos, Ruth J.F.; Sætrom, Pål; Hveem, Kristian; Boehnke, Michael; Groop, Leif; McCarthy, Mark; Meitinger, Thomas; Ballantyne, Christie M.; Gabriel, Stacey B.; O’Donnell, Christopher J.; Post, Wendy S.; North, Kari E.; Reiner, Alexander P.; Boerwinkle, Eric; Psaty, Bruce M.; Altshuler, David; Kathiresan, Sekar; Lin, Dan-Yu; Jarvik, Gail P.; Cupples, L. Adrienne; Kooperberg, Charles; Wilson, James G.; Nickerson, Deborah A.; Abecasis, Goncalo R.; Rich, Stephen S.; Tracy, Russell P.; Willer, Cristen J.; Gabriel, Stacey B.; Altshuler, David M.; Abecasis, Gonçalo R.; Allayee, Hooman; Cresci, Sharon; Daly, Mark J.; de Bakker, Paul I.W.; DePristo, Mark A.; Do, Ron; Donnelly, Peter; Farlow, Deborah N.; Fennell, Tim; Garimella, Kiran; Hazen, Stanley L.; Hu, Youna; Jordan, Daniel M.; Jun, Goo; Kathiresan, Sekar; Kang, Hyun Min; Kiezun, Adam; Lettre, Guillaume; Li, Bingshan; Li, Mingyao; Newton-Cheh, Christopher H.; Padmanabhan, Sandosh; Peloso, Gina; Pulit, Sara; Rader, Daniel J.; Reich, David; Reilly, Muredach P.; Rivas, Manuel A.; Schwartz, Steve; Scott, Laura; Siscovick, David S.; Spertus, John A.; Stitziel, Nathaniel O.; Stoletzki, Nina; Sunyaev, Shamil R.; Voight, Benjamin F.; Willer, Cristen J.; Rich, Stephen S.; Akylbekova, Ermeg; Atwood, Larry D.; Ballantyne, Christie M.; Barbalic, Maja; Barr, R. Graham; Benjamin, Emelia J.; Bis, Joshua; Boerwinkle, Eric; Bowden, Donald W.; Brody, Jennifer; Budoff, Matthew; Burke, Greg; Buxbaum, Sarah; Carr, Jeff; Chen, Donna T.; Chen, Ida Y.; Chen, Wei-Min; Concannon, Pat; Crosby, Jacy; Cupples, L. Adrienne; D’Agostino, Ralph; DeStefano, Anita L.; Dreisbach, Albert; Dupuis, Josée; Durda, J. Peter; Ellis, Jaclyn; Folsom, Aaron R.; Fornage, Myriam; Fox, Caroline S.; Fox, Ervin; Funari, Vincent; Ganesh, Santhi K.; Gardin, Julius; Goff, David; Gordon, Ora; Grody, Wayne; Gross, Myron; Guo, Xiuqing; Hall, Ira M.; Heard-Costa, Nancy L.; Heckbert, Susan R.; Heintz, Nicholas; Herrington, David M.; Hickson, DeMarc; Huang, Jie; Hwang, Shih-Jen; Jacobs, David R.; Jenny, Nancy S.; Johnson, Andrew D.; Johnson, Craig W.; Kawut, Steven; Kronmal, Richard; Kurz, Raluca; Lange, Ethan M.; Lange, Leslie A.; Larson, Martin G.; Lawson, Mark; Lewis, Cora E.; Levy, Daniel; Li, Dalin; Lin, Honghuang; Liu, Chunyu; Liu, Jiankang; Liu, Kiang; Liu, Xiaoming; Liu, Yongmei; Longstreth, William T.; Loria, Cay; Lumley, Thomas; Lunetta, Kathryn; Mackey, Aaron J.; Mackey, Rachel; Manichaikul, Ani; Maxwell, Taylor; McKnight, Barbara; Meigs, James B.; Morrison, Alanna C.; Musani, Solomon K.; Mychaleckyj, Josyf C.; Nettleton, Jennifer A.; North, Kari; O’Donnell, Christopher J.; O’Leary, Daniel; Ong, Frank; Palmas, Walter

    2014-01-01

    Elevated low-density lipoprotein cholesterol (LDL-C) is a treatable, heritable risk factor for cardiovascular disease. Genome-wide association studies (GWASs) have identified 157 variants associated with lipid levels but are not well suited to assess the impact of rare and low-frequency variants. To determine whether rare or low-frequency coding variants are associated with LDL-C, we exome sequenced 2,005 individuals, including 554 individuals selected for extreme LDL-C (>98th or <2nd percentile). Follow-up analyses included sequencing of 1,302 additional individuals and genotype-based analysis of 52,221 individuals. We observed significant evidence of association between LDL-C and the burden of rare or low-frequency variants in PNPLA5, encoding a phospholipase-domain-containing protein, and both known and previously unidentified variants in PCSK9, LDLR and APOB, three known lipid-related genes. The effect sizes for the burden of rare variants for each associated gene were substantially higher than those observed for individual SNPs identified from GWASs. We replicated the PNPLA5 signal in an independent large-scale sequencing study of 2,084 individuals. In conclusion, this large whole-exome-sequencing study for LDL-C identified a gene not known to be implicated in LDL-C and provides unique insight into the design and analysis of similar experiments. PMID:24507775

  6. Whole-exome sequencing identifies rare and low-frequency coding variants associated with LDL cholesterol.

    PubMed

    Lange, Leslie A; Hu, Youna; Zhang, He; Xue, Chenyi; Schmidt, Ellen M; Tang, Zheng-Zheng; Bizon, Chris; Lange, Ethan M; Smith, Joshua D; Turner, Emily H; Jun, Goo; Kang, Hyun Min; Peloso, Gina; Auer, Paul; Li, Kuo-Ping; Flannick, Jason; Zhang, Ji; Fuchsberger, Christian; Gaulton, Kyle; Lindgren, Cecilia; Locke, Adam; Manning, Alisa; Sim, Xueling; Rivas, Manuel A; Holmen, Oddgeir L; Gottesman, Omri; Lu, Yingchang; Ruderfer, Douglas; Stahl, Eli A; Duan, Qing; Li, Yun; Durda, Peter; Jiao, Shuo; Isaacs, Aaron; Hofman, Albert; Bis, Joshua C; Correa, Adolfo; Griswold, Michael E; Jakobsdottir, Johanna; Smith, Albert V; Schreiner, Pamela J; Feitosa, Mary F; Zhang, Qunyuan; Huffman, Jennifer E; Crosby, Jacy; Wassel, Christina L; Do, Ron; Franceschini, Nora; Martin, Lisa W; Robinson, Jennifer G; Assimes, Themistocles L; Crosslin, David R; Rosenthal, Elisabeth A; Tsai, Michael; Rieder, Mark J; Farlow, Deborah N; Folsom, Aaron R; Lumley, Thomas; Fox, Ervin R; Carlson, Christopher S; Peters, Ulrike; Jackson, Rebecca D; van Duijn, Cornelia M; Uitterlinden, André G; Levy, Daniel; Rotter, Jerome I; Taylor, Herman A; Gudnason, Vilmundur; Siscovick, David S; Fornage, Myriam; Borecki, Ingrid B; Hayward, Caroline; Rudan, Igor; Chen, Y Eugene; Bottinger, Erwin P; Loos, Ruth J F; Sætrom, Pål; Hveem, Kristian; Boehnke, Michael; Groop, Leif; McCarthy, Mark; Meitinger, Thomas; Ballantyne, Christie M; Gabriel, Stacey B; O'Donnell, Christopher J; Post, Wendy S; North, Kari E; Reiner, Alexander P; Boerwinkle, Eric; Psaty, Bruce M; Altshuler, David; Kathiresan, Sekar; Lin, Dan-Yu; Jarvik, Gail P; Cupples, L Adrienne; Kooperberg, Charles; Wilson, James G; Nickerson, Deborah A; Abecasis, Goncalo R; Rich, Stephen S; Tracy, Russell P; Willer, Cristen J

    2014-02-01

    Elevated low-density lipoprotein cholesterol (LDL-C) is a treatable, heritable risk factor for cardiovascular disease. Genome-wide association studies (GWASs) have identified 157 variants associated with lipid levels but are not well suited to assess the impact of rare and low-frequency variants. To determine whether rare or low-frequency coding variants are associated with LDL-C, we exome sequenced 2,005 individuals, including 554 individuals selected for extreme LDL-C (>98th or <2nd percentile). Follow-up analyses included sequencing of 1,302 additional individuals and genotype-based analysis of 52,221 individuals. We observed significant evidence of association between LDL-C and the burden of rare or low-frequency variants in PNPLA5, encoding a phospholipase-domain-containing protein, and both known and previously unidentified variants in PCSK9, LDLR and APOB, three known lipid-related genes. The effect sizes for the burden of rare variants for each associated gene were substantially higher than those observed for individual SNPs identified from GWASs. We replicated the PNPLA5 signal in an independent large-scale sequencing study of 2,084 individuals. In conclusion, this large whole-exome-sequencing study for LDL-C identified a gene not known to be implicated in LDL-C and provides unique insight into the design and analysis of similar experiments. PMID:24507775

  7. A unified mixed-effects model for rare-variant association in sequencing studies.

    PubMed

    Sun, Jianping; Zheng, Yingye; Hsu, Li

    2013-05-01

    In rare-variant association analysis, the extremely low frequencies of these variants make it necessary to aggregate them by a prior set (e.g., genes and pathways) in order to achieve adequate power. In this paper, we consider hierarchical models to relate a set of rare variants to phenotype by modeling the effects of variants as a function of variant characteristics while allowing for variant-specific effects (heterogeneity). We derive a set of two score statistics, testing the group effect by variant characteristics and the heterogeneity effect. We make a novel modification to these score statistics so that they are independent under the null hypothesis and their asymptotic distributions can be derived. As a result, the computational burden is greatly reduced compared with permutation-based tests. Our approach provides a general testing framework for rare-variant association, which includes many commonly used tests, such as the burden test [Li and Leal, 2008] and the sequence kernel association test [Wu et al., 2011], as special cases. Furthermore, in contrast to these tests, our proposed test has an added capacity to identify which components of variant characteristics and heterogeneity contribute to the association. Simulations under a wide range of scenarios show that the proposed test is valid, robust, and powerful. An application to the Dallas Heart Study illustrates that apart from identifying genes with significant associations, the new method also provides additional information regarding the source of the association. Such information may be useful for generating hypotheses in future studies. PMID:23483651
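
    To make the idea of aggregating rare variants concrete, the following is a minimal sketch of a simple collapsing (burden-style) test with a permutation p-value. It is not the mixed-effects model proposed in the abstract, and it is only loosely in the spirit of the burden test cited there; the function names, the use of a Pearson correlation as the test statistic, and the toy data are all assumptions made for illustration. The permutation loop is the kind of computation the abstract says its asymptotic approach avoids.

```python
import random

def burden_scores(genotypes):
    """Collapse each individual's rare-variant genotypes (minor-allele
    counts 0/1/2, one list per individual) into a single burden score."""
    return [sum(row) for row in genotypes]

def pearson_r(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5

def burden_permutation_test(genotypes, phenotype, n_perm=2000, seed=0):
    """Return (|r| between burden score and phenotype, permutation p-value).
    Hypothetical helper: phenotypes are shuffled to build the null."""
    rng = random.Random(seed)
    scores = burden_scores(genotypes)
    observed = abs(pearson_r(scores, phenotype))
    shuffled = list(phenotype)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(scores, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)

# Toy data (invented): six individuals, two rare variants in one gene.
toy_genotypes = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1], [0, 0]]
toy_phenotype = [0.1, 1.0, 1.2, 2.1, 3.0, 0.0]
stat, p_value = burden_permutation_test(toy_genotypes, toy_phenotype, n_perm=500)
print(f"|r| = {stat:.3f}, permutation p = {p_value:.4f}")
```

    Summing allele counts treats every variant as acting in the same direction with equal weight, which is the assumption that variant-specific (heterogeneity) effects are meant to relax.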

  8. Single variant and multi-variant trend tests for genetic association with next generation sequencing that are robust to sequencing error

    PubMed Central

    Kim, Wonkuk; Londono, Douglas; Zhou, Lisheng; Xing, Jinchuan; Nato, Andrew; Musolf, Anthony; Matise, Tara C.; Finch, Stephen J.; Gordon, Derek

    2013-01-01

    As with any new technology, next generation sequencing (NGS) has potential advantages and potential challenges. One advantage is the identification of multiple causal variants for disease that might otherwise be missed by SNP-chip technology. One potential challenge is misclassification error (as with any emerging technology) and the issue of power loss due to multiple testing. Here, we develop an extension of the linear trend test for association that incorporates differential misclassification error and may be applied to any number of SNPs. We call the statistic the linear trend test allowing for error, applied to NGS, or LTTae,NGS. This statistic allows for differential misclassification. The observed data are phenotypes for unrelated cases and controls, coverage, and the number of putative causal variants for every individual at all SNPs. We simulate data considering multiple factors (disease mode of inheritance, genotype relative risk, causal variant frequency, sequence error rate in cases, sequence error rate in controls, number of loci, and others) and evaluate type I error rate and power for each vector of factor settings. We compare our results with two recently published NGS statistics. Also, we create a fictitious disease model, based on downloaded 1000 Genomes data for 5 SNPs and 388 individuals, and apply our statistic to that data. We find that the LTTae,NGS maintains the correct type I error rate in all simulations (differential and non-differential error), while the other statistics show large inflation in type I error for lower coverage. Power is approximately the same for all three statistics in the presence of non-differential error. Application of our statistic to the 1000 Genomes data suggests that, for the data downloaded, there is a 1.5% sequence misclassification rate over all SNPs. Finally, application of the multi-variant form of LTTae,NGS shows high power for a number of simulation settings, although it can have

  9. A survey of tools for variant analysis of next-generation genome sequencing data

    PubMed Central

    Pabinger, Stephan; Dander, Andreas; Fischer, Maria; Snajder, Rene; Sperk, Michael; Efremova, Mirjana; Krabichler, Birgit; Speicher, Michael R.; Zschocke, Johannes

    2014-01-01

    Recent advances in genome sequencing technologies provide unprecedented opportunities to characterize individual genomic landscapes and identify mutations relevant for diagnosis and therapy. Specifically, whole-exome sequencing using next-generation sequencing (NGS) technologies is gaining popularity in the human genetics community due to the moderate costs, manageable data amounts and straightforward interpretation of analysis results. While whole-exome and, in the near future, whole-genome sequencing are becoming commodities, data analysis still poses significant challenges and led to the development of a plethora of tools supporting specific parts of the analysis workflow or providing a complete solution. Here, we surveyed 205 tools for whole-genome/whole-exome sequencing data analysis supporting five distinct analytical steps: quality assessment, alignment, variant identification, variant annotation and visualization. We report an overview of the functionality, features and specific requirements of the individual tools. We then selected 32 programs for variant identification, variant annotation and visualization, which were subjected to hands-on evaluation using four data sets: one set of exome data from two patients with a rare disease for testing identification of germline mutations, two cancer data sets for testing variant callers for somatic mutations, copy number variations and structural variations, and one semi-synthetic data set for testing identification of copy number variations. Our comprehensive survey and evaluation of NGS tools provides a valuable guideline for human geneticists working on Mendelian disorders, complex diseases and cancers. PMID:23341494

  10. Identification of Novel FMR1 Variants by Massively Parallel Sequencing in Developmentally Delayed Males

    PubMed Central

    Collins, Stephen C.; Bray, Steven M.; Suhl, Joshua A.; Cutler, David J.; Coffee, Bradford; Zwick, Michael E.; Warren, Stephen T.

    2010-01-01

    Fragile X syndrome (FXS), the most common inherited form of developmental delay, is typically caused by CGG-repeat expansion in FMR1. However, little attention has been paid to sequence variants in FMR1. Through the use of pooled-template massively parallel sequencing, we identified 130 novel FMR1 sequence variants in a population of 963 developmentally delayed males without CGG-repeat expansion mutations. Among these, we identified a novel missense change, p.R138Q, which alters a conserved residue in the nuclear localization signal of FMRP. We have also identified three promoter mutations in this population, all of which significantly reduce in vitro levels of FMR1 transcription. Additionally, we identified 10 noncoding variants of possible functional significance in the introns and 3’-untranslated region of FMR1, including two predicted splice site mutations. These findings greatly expand the catalogue of known FMR1 sequence variants and suggest that FMR1 sequence variants may represent an important cause of developmental delay. PMID:20799337

  11. Deep Sequencing Reveals Novel Genetic Variants in Children with Acute Liver Failure and Tissue Evidence of Impaired Energy Metabolism

    PubMed Central

    Valencia, C. Alexander; Wang, Xinjian; Wang, Jin; Peters, Anna; Simmons, Julia R.; Moran, Molly C.; Mathur, Abhinav; Husami, Ammar; Qian, Yaping; Sheridan, Rachel; Bove, Kevin E.; Witte, David; Huang, Taosheng; Miethke, Alexander G.

    2016-01-01

    Background & Aims The etiology of acute liver failure (ALF) remains elusive in almost half of affected children. We hypothesized that inherited mitochondrial and fatty acid oxidation disorders were occult etiological factors in patients with idiopathic ALF and impaired energy metabolism. Methods Twelve patients with elevated blood molar lactate/pyruvate ratio and indeterminate etiology were selected from a retrospective cohort of 74 subjects with ALF because their fixed and frozen liver samples were available for histological, ultrastructural, molecular and biochemical analysis. Results A customized next-generation sequencing panel for 26 genes associated with mitochondrial and fatty acid oxidation defects revealed mutations and sequence variants in five subjects. Variants involved the genes ACAD9, POLG, POLG2, DGUOK, and RRM2B; the latter not previously reported in subjects with ALF. The explanted livers of the patients with heterozygous, truncating insertion mutations in RRM2B showed patchy micro- and macrovesicular steatosis, decreased mitochondrial DNA (mtDNA) content <30% of controls, and reduced respiratory chain complex activity; both patients had good post-transplant outcome. One infant with severe lactic acidosis was found to carry two heterozygous variants in ACAD9, which was associated with isolated complex I deficiency and diffuse hypergranular hepatocytes. The two subjects with heterozygous variants of unknown clinical significance in POLG and DGUOK developed ALF following drug exposure. Their hepatocytes displayed abnormal mitochondria by electron microscopy. Conclusion Targeted next generation sequencing and correlation with histological, ultrastructural and functional studies on liver tissue in children with elevated lactate/pyruvate ratio expand the spectrum of genes associated with pediatric ALF. PMID:27483465

  12. Filovirus RefSeq Entries: Evaluation and Selection of Filovirus Type Variants, Type Sequences, and Names

    PubMed Central

    Kuhn, Jens H.; Andersen, Kristian G.; Bào, Yīmíng; Bavari, Sina; Becker, Stephan; Bennett, Richard S.; Bergman, Nicholas H.; Blinkova, Olga; Bradfute, Steven; Brister, J. Rodney; Bukreyev, Alexander; Chandran, Kartik; Chepurnov, Alexander A.; Davey, Robert A.; Dietzgen, Ralf G.; Doggett, Norman A.; Dolnik, Olga; Dye, John M.; Enterlein, Sven; Fenimore, Paul W.; Formenty, Pierre; Freiberg, Alexander N.; Garry, Robert F.; Garza, Nicole L.; Gire, Stephen K.; Gonzalez, Jean-Paul; Griffiths, Anthony; Happi, Christian T.; Hensley, Lisa E.; Herbert, Andrew S.; Hevey, Michael C.; Hoenen, Thomas; Honko, Anna N.; Ignatyev, Georgy M.; Jahrling, Peter B.; Johnson, Joshua C.; Johnson, Karl M.; Kindrachuk, Jason; Klenk, Hans-Dieter; Kobinger, Gary; Kochel, Tadeusz J.; Lackemeyer, Matthew G.; Lackner, Daniel F.; Leroy, Eric M.; Lever, Mark S.; Mühlberger, Elke; Netesov, Sergey V.; Olinger, Gene G.; Omilabu, Sunday A.; Palacios, Gustavo; Panchal, Rekha G.; Park, Daniel J.; Patterson, Jean L.; Paweska, Janusz T.; Peters, Clarence J.; Pettitt, James; Pitt, Louise; Radoshitzky, Sheli R.; Ryabchikova, Elena I.; Saphire, Erica Ollmann; Sabeti, Pardis C.; Sealfon, Rachel; Shestopalov, Aleksandr M.; Smither, Sophie J.; Sullivan, Nancy J.; Swanepoel, Robert; Takada, Ayato; Towner, Jonathan S.; van der Groen, Guido; Volchkov, Viktor E.; Volchkova, Valentina A.; Wahl-Jensen, Victoria; Warren, Travis K.; Warfield, Kelly L.; Weidmann, Manfred; Nichol, Stuart T.

    2014-01-01

    Sequence determination of complete or coding-complete genomes of viruses is becoming common practice for supporting the work of epidemiologists, ecologists, virologists, and taxonomists. Sequencing duration and costs are rapidly decreasing, sequencing hardware is under modification for use by non-experts, and software is constantly being improved to simplify sequence data management and analysis. Thus, analysis of virus disease outbreaks on the molecular level is now feasible, including characterization of the evolution of individual virus populations in single patients over time. The increasing accumulation of sequencing data creates a management problem for the curators of commonly used sequence databases and an entry retrieval problem for end users. Therefore, utilizing the data to their fullest potential will require setting nomenclature and annotation standards for virus isolates and associated genomic sequences. The National Center for Biotechnology Information’s (NCBI’s) RefSeq is a non-redundant, curated database for reference (or type) nucleotide sequence records that supplies source data to numerous other databases. Building on recently proposed templates for filovirus variant naming [<virus name> (<strain>)/<isolation host-suffix>/<country of sampling>/<year of sampling>/<genetic variant designation>-<isolate designation>], we report consensus decisions from a majority of past and currently active filovirus experts on the eight filovirus type variants and isolates to be represented in RefSeq, their final designations, and their associated sequences. PMID:25256396

  13. Filovirus RefSeq entries: evaluation and selection of filovirus type variants, type sequences, and names.

    PubMed

    Kuhn, Jens H; Andersen, Kristian G; Bào, Yīmíng; Bavari, Sina; Becker, Stephan; Bennett, Richard S; Bergman, Nicholas H; Blinkova, Olga; Bradfute, Steven; Brister, J Rodney; Bukreyev, Alexander; Chandran, Kartik; Chepurnov, Alexander A; Davey, Robert A; Dietzgen, Ralf G; Doggett, Norman A; Dolnik, Olga; Dye, John M; Enterlein, Sven; Fenimore, Paul W; Formenty, Pierre; Freiberg, Alexander N; Garry, Robert F; Garza, Nicole L; Gire, Stephen K; Gonzalez, Jean-Paul; Griffiths, Anthony; Happi, Christian T; Hensley, Lisa E; Herbert, Andrew S; Hevey, Michael C; Hoenen, Thomas; Honko, Anna N; Ignatyev, Georgy M; Jahrling, Peter B; Johnson, Joshua C; Johnson, Karl M; Kindrachuk, Jason; Klenk, Hans-Dieter; Kobinger, Gary; Kochel, Tadeusz J; Lackemeyer, Matthew G; Lackner, Daniel F; Leroy, Eric M; Lever, Mark S; Mühlberger, Elke; Netesov, Sergey V; Olinger, Gene G; Omilabu, Sunday A; Palacios, Gustavo; Panchal, Rekha G; Park, Daniel J; Patterson, Jean L; Paweska, Janusz T; Peters, Clarence J; Pettitt, James; Pitt, Louise; Radoshitzky, Sheli R; Ryabchikova, Elena I; Saphire, Erica Ollmann; Sabeti, Pardis C; Sealfon, Rachel; Shestopalov, Aleksandr M; Smither, Sophie J; Sullivan, Nancy J; Swanepoel, Robert; Takada, Ayato; Towner, Jonathan S; van der Groen, Guido; Volchkov, Viktor E; Volchkova, Valentina A; Wahl-Jensen, Victoria; Warren, Travis K; Warfield, Kelly L; Weidmann, Manfred; Nichol, Stuart T

    2014-09-01

    Sequence determination of complete or coding-complete genomes of viruses is becoming common practice for supporting the work of epidemiologists, ecologists, virologists, and taxonomists. Sequencing duration and costs are rapidly decreasing, sequencing hardware is under modification for use by non-experts, and software is constantly being improved to simplify sequence data management and analysis. Thus, analysis of virus disease outbreaks on the molecular level is now feasible, including characterization of the evolution of individual virus populations in single patients over time. The increasing accumulation of sequencing data creates a management problem for the curators of commonly used sequence databases and an entry retrieval problem for end users. Therefore, utilizing the data to their fullest potential will require setting nomenclature and annotation standards for virus isolates and associated genomic sequences. The National Center for Biotechnology Information's (NCBI's) RefSeq is a non-redundant, curated database for reference (or type) nucleotide sequence records that supplies source data to numerous other databases. Building on recently proposed templates for filovirus variant naming [<virus name> (<strain>)/<isolation host-suffix>/<country of sampling>/<year of sampling>/<genetic variant designation>-<isolate designation>], we report consensus decisions from a majority of past and currently active filovirus experts on the eight filovirus type variants and isolates to be represented in RefSeq, their final designations, and their associated sequences. PMID:25256396

  14. A rare sequence variant in intron 1 of THAP1 is associated with primary dystonia.

    PubMed

    Vemula, Satya R; Xiao, Jianfeng; Zhao, Yu; Bastian, Robert W; Perlmutter, Joel S; Racette, Brad A; Paniello, Randal C; Wszolek, Zbigniew K; Uitti, Ryan J; Van Gerpen, Jay A; Hedera, Peter; Truong, Daniel D; Blitzer, Andrew; Rudzińska, Monika; Momčilović, Dragana; Jinnah, Hyder A; Frei, Karen; Pfeiffer, Ronald F; LeDoux, Mark S

    2014-05-01

    Although coding variants in THAP1 have been causally associated with primary dystonia, the contribution of noncoding variants remains uncertain. Herein, we examine a previously identified Intron 1 variant (c.71+9C>A, rs200209986). Among 1672 subjects with mainly adult-onset primary dystonia, 12 harbored the variant in contrast to 1/1574 controls (P < 0.01). Dystonia classification included cervical dystonia (N = 3), laryngeal dystonia (adductor subtype, N = 3), jaw-opening oromandibular dystonia (N = 1), blepharospasm (N = 2), and unclassified (N = 3). Age of dystonia onset ranged from 25 to 69 years (mean = 54 years). In comparison to controls with no identified THAP1 sequence variants, the c.71+9C>A variant was associated with an elevated ratio of Isoform 1 (NM_018105) to Isoform 2 (NM_199003) in leukocytes. In silico and minigene analyses indicated that c.71+9C>A alters THAP1 splicing. Lymphoblastoid cells harboring the c.71+9C>A variant showed extensive apoptosis with relatively fewer cells in the G2 phase of the cell cycle. Differentially expressed genes from lymphoblastoid cells revealed that the c.71+9C>A variant exerts effects on DNA synthesis, cell growth and proliferation, cell survival, and cytotoxicity. In aggregate, these data indicate that THAP1 c.71+9C>A is a risk factor for adult-onset primary dystonia. PMID:24936516
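
    The carrier counts reported above (12 of 1672 cases versus 1 of 1574 controls) can be checked with a standard test on a 2x2 table. The abstract does not say which test produced P < 0.01, so the two-sided Fisher exact test below is an assumption chosen for illustration, as are the function name and the reading of "1/1574" as one carrier among 1574 controls.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that x of the col1 carriers fall in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # The small tolerance guards against floating-point ties.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))

# Cases: 12 carriers, 1660 non-carriers; controls: 1 carrier, 1573 non-carriers.
p_value = fisher_exact_two_sided(12, 1660, 1, 1573)
print(f"two-sided Fisher exact p = {p_value:.4g}")
```

    Under these assumptions the p-value falls below 0.01, in line with the reported threshold.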

  15. A rare sequence variant in intron 1 of THAP1 is associated with primary dystonia

    PubMed Central

    Vemula, Satya R; Xiao, Jianfeng; Zhao, Yu; Bastian, Robert W; Perlmutter, Joel S; Racette, Brad A; Paniello, Randal C; Wszolek, Zbigniew K; Uitti, Ryan J; Van Gerpen, Jay A; Hedera, Peter; Truong, Daniel D; Blitzer, Andrew; Rudzińska, Monika; Momčilović, Dragana; Jinnah, Hyder A; Frei, Karen; Pfeiffer, Ronald F; LeDoux, Mark S

    2014-01-01

    Although coding variants in THAP1 have been causally associated with primary dystonia, the contribution of noncoding variants remains uncertain. Herein, we examine a previously identified Intron 1 variant (c.71+9C>A, rs200209986). Among 1672 subjects with mainly adult-onset primary dystonia, 12 harbored the variant in contrast to 1/1574 controls (P < 0.01). Dystonia classification included cervical dystonia (N = 3), laryngeal dystonia (adductor subtype, N = 3), jaw-opening oromandibular dystonia (N = 1), blepharospasm (N = 2), and unclassified (N = 3). Age of dystonia onset ranged from 25 to 69 years (mean = 54 years). In comparison to controls with no identified THAP1 sequence variants, the c.71+9C>A variant was associated with an elevated ratio of Isoform 1 (NM_018105) to Isoform 2 (NM_199003) in leukocytes. In silico and minigene analyses indicated that c.71+9C>A alters THAP1 splicing. Lymphoblastoid cells harboring the c.71+9C>A variant showed extensive apoptosis with relatively fewer cells in the G2 phase of the cell cycle. Differentially expressed genes from lymphoblastoid cells revealed that the c.71+9C>A variant exerts effects on DNA synthesis, cell growth and proliferation, cell survival, and cytotoxicity. In aggregate, these data indicate that THAP1 c.71+9C>A is a risk factor for adult-onset primary dystonia. PMID:24936516

  16. Improved detection of artifactual viral minority variants in high-throughput sequencing data.

    PubMed

    Welkers, Matthijs R A; Jonges, Marcel; Jeeninga, Rienk E; Koopmans, Marion P G; de Jong, Menno D

    2014-01-01

    High-throughput sequencing (HTS) of viral samples provides important information on the presence of viral minority variants. However, detection and accurate quantification is limited by the capacity to distinguish biological from artificial variation. In this study, errors related to the Illumina HiSeq2000 library generation and HTS process were investigated by determining minority variant frequencies in an influenza A/WSN/1933(H1N1) virus reverse-genetics plasmid pool. Errors related to amplification and sequencing were determined using the same plasmid pool, by generation of infectious virus using reverse genetics followed by in duplo reverse-transcriptase PCR (RT-PCR) amplification and HTS in the same sequence run. Results showed that after "best practice" quality control (QC), within the plasmid pool, one minority variant with a frequency >0.5% was identified, while 84 and 139 were identified in the RT-PCR amplified samples, indicating RT-PCR amplification artificially increased variation. Detailed analysis showed that artifactual minority variants could be identified by two major technical characteristics: their predominant presence in a single read orientation and uneven distribution of mismatches over the length of the reads. We demonstrate that by addition of two QC steps 95% of the artifactual minority variants could be identified. When our analysis approach was applied to three clinical samples 68% of the initially identified minority variants were identified as artifacts. Our study clearly demonstrated that, without additional QC steps, overestimation of viral minority variants is very likely to occur, mainly as a consequence of the required RT-PCR amplification step. The improved ability to detect and correct for artifactual minority variants, increases data resolution and could aid both past and future studies incorporating HTS. The source code has been made available through Sourceforge (https://sourceforge.net/projects/mva-ngs). PMID:25657642
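
    The two technical characteristics named above (predominance in a single read orientation and uneven placement of mismatches along the read) lend themselves to a simple filter. The sketch below is not the authors' published code, which the abstract says is available on Sourceforge; the function name, the input layout, and all three thresholds are hypothetical values picked for illustration.

```python
def looks_artifactual(support, max_strand_fraction=0.9,
                      end_window=0.1, max_end_fraction=0.8):
    """Flag a minority variant as a likely artifact.

    support: list of (strand, pos, read_len) tuples, one per read that
    carries the variant; strand is '+' or '-', pos is the 0-based
    position of the mismatch within the read.
    All thresholds are illustrative assumptions, not published values.
    """
    if not support:
        return False
    total = len(support)

    # Characteristic 1: variant seen almost exclusively in one orientation.
    forward = sum(1 for strand, _, _ in support if strand == "+")
    strand_fraction = max(forward, total - forward) / total

    # Characteristic 2: mismatches clustered near the read ends rather
    # than spread evenly over the read length.
    near_end = 0
    for _, pos, read_len in support:
        rel = pos / (read_len - 1) if read_len > 1 else 0.0
        if rel < end_window or rel > 1 - end_window:
            near_end += 1
    end_fraction = near_end / total

    return (strand_fraction > max_strand_fraction
            or end_fraction > max_end_fraction)

# Invented examples: a balanced, mid-read variant versus a one-strand variant.
balanced = [("+", 50, 100), ("-", 40, 100), ("+", 60, 100), ("-", 30, 100)]
one_strand = [("+", 50, 100)] * 10
print(looks_artifactual(balanced), looks_artifactual(one_strand))
```

    A real pipeline would probably use a statistical test for each characteristic instead of fixed cut-offs, since appropriate thresholds depend on coverage.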

  17. Comparison of Multi-Sample Variant Calling Methods for Whole Genome Sequencing

    PubMed Central

    Nho, Kwangsik; West, John D.; Li, Huian; Henschel, Robert; Bharthur, Apoorva; Tavares, Michel C.; Saykin, Andrew J.

    2015-01-01

    Rapid advancement of next-generation sequencing (NGS) technologies has facilitated the search for genetic susceptibility factors that influence disease risk in the field of human genetics. In particular whole genome sequencing (WGS) has been used to obtain the most comprehensive genetic variation of an individual and perform detailed evaluation of all genetic variation. To this end, sophisticated methods to accurately call high-quality variants and genotypes simultaneously on a cohort of individuals from raw sequence data are required. On chromosome 22 of 818 WGS data from the Alzheimer's Disease Neuroimaging Initiative (ADNI), which is the largest WGS related to a single disease, we compared two multi-sample variant calling methods for the detection of single nucleotide variants (SNVs) and short insertions and deletions (indels) in WGS: (1) reduce the analysis-ready reads (BAM) file to a manageable size by keeping only essential information for variant calling (“REDUCE”) and (2) call variants individually on each sample and then perform a joint genotyping analysis of the variant files produced for all samples in a cohort (“JOINT”). JOINT identified 515,210 SNVs and 60,042 indels, while REDUCE identified 358,303 SNVs and 52,855 indels. JOINT identified many more SNVs and indels compared to REDUCE. Both methods had concordance rate of 99.60% for SNVs and 99.06% for indels. For SNVs, evaluation with HumanOmni 2.5M genotyping arrays revealed a concordance rate of 99.68% for JOINT and 99.50% for REDUCE. REDUCE needed more computational time and memory compared to JOINT. Our findings indicate that the multi-sample variant calling method using the JOINT process is a promising strategy for the variant detection, which should facilitate our understanding of the underlying pathogenesis of human diseases. PMID:26167514
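
    The abstract reports concordance rates without defining them. As a rough illustration, the sketch below assumes concordance means the fraction of sites called by both methods that received the same genotype; the function name, the site keys, and the toy calls are invented. The count differences at the end use only the totals quoted in the abstract.

```python
def genotype_concordance(calls_a, calls_b):
    """Fraction of shared sites with identical genotypes.

    calls_a, calls_b: dicts mapping a site key such as
    (chrom, pos, ref, alt) to a genotype string such as '0/1'.
    Returns None when the two call sets share no sites.
    """
    shared = calls_a.keys() & calls_b.keys()
    if not shared:
        return None
    matching = sum(1 for site in shared if calls_a[site] == calls_b[site])
    return matching / len(shared)

# Toy call sets (invented positions and genotypes).
joint_calls = {
    ("22", 100, "A", "G"): "0/1",
    ("22", 200, "C", "T"): "1/1",
    ("22", 300, "G", "A"): "0/1",
    ("22", 400, "T", "C"): "0/1",   # called only by this method
}
reduce_calls = {
    ("22", 100, "A", "G"): "0/1",
    ("22", 200, "C", "T"): "0/1",   # discordant genotype
    ("22", 300, "G", "A"): "0/1",
}
rate = genotype_concordance(joint_calls, reduce_calls)
print(f"concordance at shared sites: {rate:.2%}")

# Differences between the totals quoted in the abstract.
extra_snvs = 515_210 - 358_303    # SNVs found by JOINT but not matched in count by REDUCE
extra_indels = 60_042 - 52_855
print(extra_snvs, extra_indels)
```

    Note that a concordance computed this way says nothing about sites found by only one method, which is why the abstract reports the call counts separately.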

  18. Novel Y-chromosome Short Tandem Repeat Variants Detected Through the Use of Massively Parallel Sequencing.

    PubMed

    Warshauer, David H; Churchill, Jennifer D; Novroski, Nicole; King, Jonathan L; Budowle, Bruce

    2015-08-01

    Massively parallel sequencing (MPS) technology is capable of determining the sizes of short tandem repeat (STR) alleles as well as their individual nucleotide sequences. Thus, single nucleotide polymorphisms (SNPs) within the repeat regions of STRs and variations in the pattern of repeat units in a given repeat motif can be used to differentiate alleles of the same length. In this study, MPS was used to sequence 28 forensically-relevant Y-chromosome STRs in a set of 41 DNA samples from the 3 major U.S. population groups (African Americans, Caucasians, and Hispanics). The resulting sequence data, which were analyzed with STRait Razor v2.0, revealed 37 unique allele sequence variants that have not been previously reported. Of these, 19 sequences were variations of documented sequences resulting from the presence of intra-repeat SNPs or alternative repeat unit patterns. Despite a limited sampling, two of the most frequently-observed variants were found only in African American samples. The remaining 18 variants represented allele sequences for which there were no published data with which to compare. These findings illustrate the great potential of MPS with regard to increasing the resolving power of STR typing and emphasize the need for sample population characterization of STR alleles. PMID:26391384
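
    The idea that sequence information can separate STR alleles of the same length can be sketched as follows. This is only an illustration under stated assumptions: the locus names, the repeat motifs and the function name are hypothetical, and the sketch is not the STRait Razor algorithm.

```python
from collections import defaultdict


def same_length_sequence_variants(observations):
    """Group observed alleles by (locus, length in bases) and keep the groups
    that contain more than one distinct nucleotide sequence."""
    groups = defaultdict(set)
    for locus, sequence in observations:
        groups[(locus, len(sequence))].add(sequence)
    return {key: sorted(seqs) for key, seqs in groups.items() if len(seqs) > 1}


# Hypothetical observations: (locus, allele sequence).
observed = [
    ("LOCUS_A", "TCTA" * 5),
    ("LOCUS_A", "TCTA" * 2 + "TCTG" + "TCTA" * 2),  # same length, intra-repeat SNP
    ("LOCUS_A", "TCTA" * 6),                        # different length
    ("LOCUS_B", "GATA" * 4),
]

variants = same_length_sequence_variants(observed)
for (locus, length), sequences in variants.items():
    print(locus, length, sequences)
```

    Length-based typing alone would report the first two LOCUS_A alleles as identical; comparing their sequences distinguishes them.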

  19. Novel Y-chromosome Short Tandem Repeat Variants Detected Through the Use of Massively Parallel Sequencing

    PubMed Central

    Warshauer, David H.; Churchill, Jennifer D.; Novroski, Nicole; King, Jonathan L.; Budowle, Bruce

    2015-01-01

    Massively parallel sequencing (MPS) technology is capable of determining the sizes of short tandem repeat (STR) alleles as well as their individual nucleotide sequences. Thus, single nucleotide polymorphisms (SNPs) within the repeat regions of STRs and variations in the pattern of repeat units in a given repeat motif can be used to differentiate alleles of the same length. In this study, MPS was used to sequence 28 forensically-relevant Y-chromosome STRs in a set of 41 DNA samples from the 3 major U.S. population groups (African Americans, Caucasians, and Hispanics). The resulting sequence data, which were analyzed with STRait Razor v2.0, revealed 37 unique allele sequence variants that have not been previously reported. Of these, 19 sequences were variations of documented sequences resulting from the presence of intra-repeat SNPs or alternative repeat unit patterns. Despite a limited sampling, two of the most frequently-observed variants were found only in African American samples. The remaining 18 variants represented allele sequences for which there were no published data with which to compare. These findings illustrate the great potential of MPS with regard to increasing the resolving power of STR typing and emphasize the need for sample population characterization of STR alleles. PMID:26391384

  20. Variant Humicola grisea CBH1.1

    SciTech Connect

    Goedegebuur, Frits; Gualfetti, Peter; Mitchinson, Colin; Larenas, Edmund

    2013-02-19

    Disclosed are variants of Humicola grisea Cel7A (CBH1.1), H. jecorina CBH1 variant or S. thermophilium CBH1, nucleic acids encoding the same and methods for producing the same. The variant cellulases have the amino acid sequence of a glycosyl hydrolase of family 7A wherein one or more amino acid residues are substituted.

  1. Variant Humicola grisea CBH1.1

    SciTech Connect

    Goedegebuur, Frits; Gualfetti, Peter; Mitchinson, Colin; Larenas, Edmund

    2014-03-18

    Disclosed are variants of Humicola grisea Cel7A (CBH1.1), H. jecorina CBH1 variant or S. thermophilium CBH1, nucleic acids encoding the same and methods for producing the same. The variant cellulases have the amino acid sequence of a glycosyl hydrolase of family 7A wherein one or more amino acid residues are substituted.

  2. Variant Humicola grisea CBH1.1

    SciTech Connect

    Goedegebuur, Frits; Gualfetti, Peter; Mitchinson, Colin; Larenas, Edmund

    2014-09-09

    Disclosed are variants of Humicola grisea Cel7A (CBH1.1), H. jecorina CBH1 variant or S. thermophilium CBH1, nucleic acids encoding the same and methods for producing the same. The variant cellulases have the amino acid sequence of a glycosyl hydrolase of family 7A wherein one or more amino acid residues are substituted.

  3. Variant Humicola grisea CBH1.1

    DOEpatents

    Goedegebuur, Frits; Gualfetti, Peter; Mitchinson, Colin; Larenas, Edmund

    2011-08-16

    Disclosed are variants of Humicola grisea Cel7A (CBH1.1), H. jecorina CBH1 variant or S. thermophilium CBH1, nucleic acids encoding the same and methods for producing the same. The variant cellulases have the amino acid sequence of a glycosyl hydrolase of family 7A wherein one or more amino acid residues are substituted.

  4. Variant Humicola grisea CBH1.1

    DOEpatents

    Goedegebuur, Frits; Gualfetti, Peter; Mitchinson, Colin; Larenas, Edmund

    2011-05-31

    Disclosed are variants of Humicola grisea Cel7A (CBH1.1), H. jecorina CBH1 variant or S. thermophilium CBH1, nucleic acids encoding the same and methods for producing the same. The variant cellulases have the amino acid sequence of a glycosyl hydrolase of family 7A wherein one or more amino acid residues are substituted.

  5. Variant Humicola grisea CBH1.1

    DOEpatents

    Goedegebuur, Frits; Gualfetti, Peter; Mitchinson, Colin; Larenas, Edmund

    2008-12-02

    Disclosed are variants of Humicola grisea Cel7A (CBH1.1), H. jecorina CBH1 variant or S. thermophilium CBH1, nucleic acids encoding the same and methods for producing the same. The variant cellulases have the amino acid sequence of a glycosyl hydrolase of family 7A wherein one or more amino acid residues are substituted.

  6. Variant Humicola grisea CBH1.1

    DOEpatents

    Goedegebuur, Frits; Gualfetti, Peter; Mitchinson, Colin; Larenas, Edmund

    2012-08-07

    Disclosed are variants of Humicola grisea Cel7A (CBH1.1), H. jecorina CBH1 variant or S. thermophilium CBH1, nucleic acids encoding the same and methods for producing the same. The variant cellulases have the amino acid sequence of a glycosyl hydrolase of family 7A wherein one or more amino acid residues are substituted.

  7. Exome sequencing of case-unaffected-parents trios reveals recessive and de novo genetic variants in sporadic ALS

    PubMed Central

    Steinberg, Karyn Meltz; Yu, Bing; Koboldt, Daniel C.; Mardis, Elaine R.; Pamphlett, Roger

    2015-01-01

    The contribution of genetic variants to sporadic amyotrophic lateral sclerosis (ALS) remains largely unknown. Either recessive or de novo variants could result in an apparently sporadic occurrence of ALS. In an attempt to find such variants we sequenced the exomes of 44 ALS-unaffected-parents trios. Rare and potentially damaging compound heterozygous variants were found in 27% of ALS patients, homozygous recessive variants in 14% and coding de novo variants in 27%. In 20% of patients more than one of the above variants was present. Genes with recessive variants were enriched in nucleotide binding capacity, ATPase activity, and the dynein heavy chain. Genes with de novo variants were enriched in transcription regulation and cell cycle processes. This trio study indicates that rare private recessive variants could be a mechanism underlying some cases of sporadic ALS, and that de novo mutations are also likely to play a part in the disease. PMID:25773295

  8. SNPlice: variants that modulate Intron retention from RNA-sequencing data

    PubMed Central

    Movassagh, Mercedeh; Kowsari, Kamran; Seyfi, Ali; Kokkinaki, Maria; Edwards, Nathan J.; Golestaneh, Nady; Horvath, Anelia

    2015-01-01

    Rationale: The growing recognition of the importance of splicing, together with rapidly accumulating RNA-sequencing data, demand robust high-throughput approaches, which efficiently analyze experimentally derived whole-transcriptome splice profiles. Results: We have developed a computational approach, called SNPlice, for identifying cis-acting, splice-modulating variants from RNA-seq datasets. SNPlice mines RNA-seq datasets to find reads that span single-nucleotide variant (SNV) loci and nearby splice junctions, assessing the co-occurrence of variants and molecules that remain unspliced at nearby exon–intron boundaries. Hence, SNPlice highlights variants preferentially occurring on intron-containing molecules, possibly resulting from altered splicing. To illustrate co-occurrence of variant nucleotide and exon–intron boundary, allele-specific sequencing was used. SNPlice results are generally consistent with splice-prediction tools, but also indicate splice-modulating elements missed by other algorithms. SNPlice can be applied to identify variants that correlate with unexpected splicing events, and to measure the splice-modulating potential of canonical splice-site SNVs. Availability and implementation: SNPlice is freely available for download from https://code.google.com/p/snplice/ as a self-contained binary package for 64-bit Linux computers and as python source-code. Contact: pmudvari@gwu.edu or horvatha@gwu.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25481010

  9. Exome sequencing in pooled DNA samples to identify maternal pre-eclampsia risk variants

    PubMed Central

    Kaartokallio, Tea; Wang, Jingwen; Heinonen, Seppo; Kajantie, Eero; Kivinen, Katja; Pouta, Anneli; Gerdhem, Paul; Jiao, Hong; Kere, Juha; Laivuori, Hannele

    2016-01-01

    Pre-eclampsia is a common pregnancy disorder that is a major cause for maternal and perinatal mortality and morbidity. Variants predisposing to pre-eclampsia might be under negative evolutionary selection that is likely to keep their population frequencies low. We exome sequenced samples from a hundred Finnish pre-eclamptic women in pools of ten to screen for low-frequency, large-effect risk variants for pre-eclampsia. After filtering and additional genotyping steps, we selected 28 low-frequency missense, nonsense and splice site variants that were enriched in the pre-eclampsia pools compared to reference data, and genotyped the variants in 1353 pre-eclamptic and 699 non-pre-eclamptic women to test the association of them with pre-eclampsia and quantitative traits relevant for the disease. Genotypes from the SISu project (n = 6118 exome sequenced Finnish samples) were included in the binary trait association analysis as a population reference to increase statistical power. In these analyses, none of the variants tested reached genome-wide significance. In conclusion, the genetic risk for pre-eclampsia is likely complex even in a population isolate like Finland, and larger sample sizes will be necessary to detect risk variants. PMID:27384325

  10. Exome sequencing in pooled DNA samples to identify maternal pre-eclampsia risk variants.

    PubMed

    Kaartokallio, Tea; Wang, Jingwen; Heinonen, Seppo; Kajantie, Eero; Kivinen, Katja; Pouta, Anneli; Gerdhem, Paul; Jiao, Hong; Kere, Juha; Laivuori, Hannele

    2016-01-01

    Pre-eclampsia is a common pregnancy disorder that is a major cause for maternal and perinatal mortality and morbidity. Variants predisposing to pre-eclampsia might be under negative evolutionary selection that is likely to keep their population frequencies low. We exome sequenced samples from a hundred Finnish pre-eclamptic women in pools of ten to screen for low-frequency, large-effect risk variants for pre-eclampsia. After filtering and additional genotyping steps, we selected 28 low-frequency missense, nonsense and splice site variants that were enriched in the pre-eclampsia pools compared to reference data, and genotyped the variants in 1353 pre-eclamptic and 699 non-pre-eclamptic women to test the association of them with pre-eclampsia and quantitative traits relevant for the disease. Genotypes from the SISu project (n = 6118 exome sequenced Finnish samples) were included in the binary trait association analysis as a population reference to increase statistical power. In these analyses, none of the variants tested reached genome-wide significance. In conclusion, the genetic risk for pre-eclampsia is likely complex even in a population isolate like Finland, and larger sample sizes will be necessary to detect risk variants. PMID:27384325

  11. Integrating mRNA and protein sequencing enables the detection and quantitative profiling of natural protein sequence variants of Populus trichocarpa

    SciTech Connect

    Abraham, Paul E.; Wang, Xiaojing; Ranjan, Priya; Zhang, Bing; Tuskan, Gerald A.; Hettich, Robert L.; Nookaew, Intawat

    2015-10-20

    The availability of next-generation sequencing technologies has rapidly transformed our ability to link genotypes to phenotypes, and as such, promises to facilitate the dissection of genetic contribution to complex traits. Although discoveries of genetic associations will further our understanding of biology, once candidate variants have been identified, investigators are faced with the challenge of characterizing the functional effects on proteins encoded by such genes. Here we show how next-generation RNA sequencing data can be exploited to construct genotype-specific protein sequence databases, which provide a clearer picture of the molecular toolbox underlying cellular and organismal processes and their variation in a natural population. For this study, we used two individual genotypes (DENA-17-3 and VNDL-27-4) from a recent genome wide association (GWA) study of Populus trichocarpa, an obligate outcrosser that exhibits tremendous phenotypic variation across the natural population. This strategy allowed us to comprehensively catalogue proteins containing single amino acid polymorphisms (SAAPs) and insertions and deletions (INDELS). Based on large-scale identification of SAAPs, we profiled the frequency of 128 types of naturally occurring amino acid substitutions, with a subset of SAAPs occurring in regions of the genome having strong polymorphism patterns consistent with recent positive and/or divergent selection. In addition, we were able to explore the diploid landscape of Populus at the proteome-level, allowing the characterization of heterozygous variants.
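
    The construction of a genotype-specific protein sequence from a reference sequence and a list of single amino acid polymorphisms can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function name, the toy reference sequence and the substitutions are hypothetical.

```python
def apply_saaps(reference, substitutions):
    """Return a copy of `reference` with each substitution applied.

    Assumption: `substitutions` is a list of (position, ref_aa, alt_aa) tuples
    with 1-based positions. The stated reference residue is checked first, so
    a mismatch between the variant list and the sequence raises an error.
    """
    residues = list(reference)
    for position, ref_aa, alt_aa in substitutions:
        if not 1 <= position <= len(residues):
            raise ValueError(f"position {position} is outside the sequence")
        if residues[position - 1] != ref_aa:
            raise ValueError(
                f"expected {ref_aa} at position {position}, "
                f"found {residues[position - 1]}"
            )
        residues[position - 1] = alt_aa
    return "".join(residues)


# Toy reference protein and two hypothetical substitutions (A4V and Q9E).
reference_protein = "MKTAYIAKQR"
variant_protein = apply_saaps(reference_protein, [(4, "A", "V"), (9, "Q", "E")])
print(variant_protein)
```

    Writing one such sequence per genotype into a search database is what lets peptide spectra be matched against the variant form as well as the reference form.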

  12. Integrating mRNA and protein sequencing enables the detection and quantitative profiling of natural protein sequence variants of Populus trichocarpa

    DOE PAGESBeta

    Abraham, Paul E.; Wang, Xiaojing; Ranjan, Priya; Zhang, Bing; Tuskan, Gerald A.; Hettich, Robert L.; Nookaew, Intawat

    2015-10-20

    The availability of next-generation sequencing technologies has rapidly transformed our ability to link genotypes to phenotypes, and as such, promises to facilitate the dissection of genetic contribution to complex traits. Although discoveries of genetic associations will further our understanding of biology, once candidate variants have been identified, investigators are faced with the challenge of characterizing the functional effects on proteins encoded by such genes. Here we show how next-generation RNA sequencing data can be exploited to construct genotype-specific protein sequence databases, which provide a clearer picture of the molecular toolbox underlying cellular and organismal processes and their variation in a natural population. For this study, we used two individual genotypes (DENA-17-3 and VNDL-27-4) from a recent genome wide association (GWA) study of Populus trichocarpa, an obligate outcrosser that exhibits tremendous phenotypic variation across the natural population. This strategy allowed us to comprehensively catalogue proteins containing single amino acid polymorphisms (SAAPs) and insertions and deletions (INDELS). Based on large-scale identification of SAAPs, we profiled the frequency of 128 types of naturally occurring amino acid substitutions, with a subset of SAAPs occurring in regions of the genome having strong polymorphism patterns consistent with recent positive and/or divergent selection. In addition, we were able to explore the diploid landscape of Populus at the proteome-level, allowing the characterization of heterozygous variants.

  13. CML10, a variant of calmodulin, modulates ascorbic acid synthesis.

    PubMed

    Cho, Kwang-Moon; Nguyen, Ha Thi Kim; Kim, Soo Youn; Shin, Jin Seok; Cho, Dong Hwa; Hong, Seung Beom; Shin, Jeong Sheop; Ok, Sung Han

    2016-01-01

    Calmodulins (CaMs) regulate numerous Ca(2+)-mediated cellular processes in plants by interacting with their respective downstream effectors. Due to the limited number of CaMs, other calcium sensors modulate the regulation of Ca(2+)-mediated cellular processes that are not managed by CaMs. Of 50 CaM-like (CML) proteins identified in Arabidopsis thaliana, we characterized the function of CML10. Yeast two-hybrid screening revealed phosphomannomutase (PMM) as a putative interaction partner of CML10. In vitro and in vivo interaction assays were performed to analyze the interaction mechanisms of CML10 and PMM. PMM activity and the phenotypes of cml10 knock-down mutants were studied to elucidate the role(s) of the CML10-PMM interaction. PMM interacted specifically with CML10 in the presence of Ca(2+) through its multiple interaction motifs. This interaction promoted the activity of PMM. The phenotypes of cml10 knock-down mutants were more sensitive to stress conditions than wild-type plants, corresponding with the fact that PMM is an enzyme which modulates the biosynthesis of ascorbic acid, an antioxidant. The results of this research demonstrate that a calcium sensor, CML10, which is an evolutionary variant of CaM, modulates the stress responses in Arabidopsis by regulating ascorbic acid production. PMID:26315131

  14. Novel pathogenic variants and genes for myopathies identified by whole exome sequencing

    PubMed Central

    Hunter, Jesse M; Ahearn, Mary Ellen; Balak, Christopher D; Liang, Winnie S; Kurdoglu, Ahmet; Corneveaux, Jason J; Russell, Megan; Huentelman, Matthew J; Craig, David W; Carpten, John; Coons, Stephen W; DeMello, Daphne E; Hall, Judith G; Bernes, Saunder M; Baumbach-Reardon, Lisa

    2015-01-01

    Neuromuscular diseases (NMD) account for a significant proportion of infant and childhood mortality and devastating chronic disease. Determining the specific diagnosis of NMD is challenging due to thousands of unique or rare genetic variants that result in overlapping phenotypes. We present four unique childhood myopathy cases characterized by relatively mild muscle weakness, slowly progressing course, mildly elevated creatine phosphokinase (CPK), and contractures. We also present two additional cases characterized by severe prenatal/neonatal myopathy. Prior extensive genetic testing and histology of these cases did not reveal the genetic etiology of disease. Here, we applied whole exome sequencing (WES) and bioinformatics to identify likely causal pathogenic variants in each pedigree. In two cases, we identified novel pathogenic variants in COL6A3. In a third case, we identified novel likely pathogenic variants in COL6A6 and COL6A3. We identified a novel splice variant in EMD in a fourth case. Finally, we classify two cases as calcium channelopathies with identification of novel pathogenic variants in RYR1 and CACNA1S. These are the first cases of myopathies reported to be caused by variants in COL6A6 and CACNA1S. Our results demonstrate the utility and genetic diagnostic value of WES in the broad class of NMD phenotypes. PMID:26247046

  15. Novel pathogenic variants and genes for myopathies identified by whole exome sequencing.

    PubMed

    Hunter, Jesse M; Ahearn, Mary Ellen; Balak, Christopher D; Liang, Winnie S; Kurdoglu, Ahmet; Corneveaux, Jason J; Russell, Megan; Huentelman, Matthew J; Craig, David W; Carpten, John; Coons, Stephen W; DeMello, Daphne E; Hall, Judith G; Bernes, Saunder M; Baumbach-Reardon, Lisa

    2015-07-01

    Neuromuscular diseases (NMD) account for a significant proportion of infant and childhood mortality and devastating chronic disease. Determining the specific diagnosis of NMD is challenging due to thousands of unique or rare genetic variants that result in overlapping phenotypes. We present four unique childhood myopathy cases characterized by relatively mild muscle weakness, slowly progressing course, mildly elevated creatine phosphokinase (CPK), and contractures. We also present two additional cases characterized by severe prenatal/neonatal myopathy. Prior extensive genetic testing and histology of these cases did not reveal the genetic etiology of disease. Here, we applied whole exome sequencing (WES) and bioinformatics to identify likely causal pathogenic variants in each pedigree. In two cases, we identified novel pathogenic variants in COL6A3. In a third case, we identified novel likely pathogenic variants in COL6A6 and COL6A3. We identified a novel splice variant in EMD in a fourth case. Finally, we classify two cases as calcium channelopathies with identification of novel pathogenic variants in RYR1 and CACNA1S. These are the first cases of myopathies reported to be caused by variants in COL6A6 and CACNA1S. Our results demonstrate the utility and genetic diagnostic value of WES in the broad class of NMD phenotypes. PMID:26247046

  16. Sequencing rare and common APOL1 coding variants to determine kidney disease risk.

    PubMed

    Limou, Sophie; Nelson, George W; Lecordier, Laurence; An, Ping; O'hUigin, Colm S; David, Victor A; Binns-Roemer, Elizabeth A; Guiblet, Wilfried M; Oleksyk, Taras K; Pays, Etienne; Kopp, Jeffrey B; Winkler, Cheryl A

    2015-10-01

    A third of African Americans with sporadic focal segmental glomerulosclerosis (FSGS) or HIV-associated nephropathy (HIVAN) do not carry APOL1 renal risk genotypes. This raises the possibility that other APOL1 variants may contribute to kidney disease. To address this question, we sequenced all APOL1 exons in 1437 Americans of African and European descent, including 464 patients with biopsy-proven FSGS/HIVAN. Testing for association with 33 common and rare variants with FSGS/HIVAN revealed no association independent of strong recessive G1 and G2 effects. Seeking additional variants that might have been under selection by pathogens and could represent candidates for kidney disease risk, we also sequenced an additional 1112 individuals representing 53 global populations. Except for G1 and G2, none of the 7 common codon-altering variants showed evidence of selection or could restore lysis against trypanosomes causing human African trypanosomiasis. Thus, only APOL1 G1 and G2 confer renal risk, and other common and rare APOL1 missense variants, including the archaic G3 haplotype, do not contribute to sporadic FSGS and HIVAN in the US population. Hence, in most potential clinical or screening applications, our study suggests that sequencing APOL1 exons is unlikely to bring additional information compared to genotyping only APOL1 G1 and G2 risk alleles. PMID:25993319

  17. Rare variant discovery by deep whole-genome sequencing of 1,070 Japanese individuals

    PubMed Central

    Nagasaki, Masao; Yasuda, Jun; Katsuoka, Fumiki; Nariai, Naoki; Kojima, Kaname; Kawai, Yosuke; Yamaguchi-Kabata, Yumi; Yokozawa, Junji; Danjoh, Inaho; Saito, Sakae; Sato, Yukuto; Mimori, Takahiro; Tsuda, Kaoru; Saito, Rumiko; Pan, Xiaoqing; Nishikawa, Satoshi; Ito, Shin; Kuroki, Yoko; Tanabe, Osamu; Fuse, Nobuo; Kuriyama, Shinichi; Kiyomoto, Hideyasu; Hozawa, Atsushi; Minegishi, Naoko; Engel, James Douglas; Kinoshita, Kengo; Kure, Shigeo; Yaegashi, Nobuo; Tsuboi, Akito; Nagami, Fuji; Kawame, Hiroshi; Tomita, Hiroaki; Tsuji, Ichiro; Nakaya, Jun; Sugawara, Junichi; Suzuki, Kichiya; Kikuya, Masahiro; Abe, Michiaki; Nakaya, Naoki; Osumi, Noriko; Yamashita, Riu; Ogishima, Soichi; Takai, Takako; Tominaga, Teiji; Taki, Yasuyuki; Suzuki, Yoichi; Yamamoto, Masayuki

    2015-01-01

    The Tohoku Medical Megabank Organization reports the whole-genome sequences of 1,070 healthy Japanese individuals and construction of a Japanese population reference panel (1KJPN). Here we identify through this high-coverage sequencing (32.4× on average), 21.2 million, including 12 million novel, single-nucleotide variants (SNVs) at an estimated false discovery rate of <1.0%. This detailed analysis detected signatures for purifying selection on regulatory elements as well as coding regions. We also catalogue structural variants, including 3.4 million insertions and deletions, and 25,923 genic copy-number variants. The 1KJPN was effective for imputing genotypes of the Japanese population genome wide. These data demonstrate the value of high-coverage sequencing for constructing population-specific variant panels, which covers 99.0% SNVs of minor allele frequency ≥0.1%, and its value for identifying causal rare variants of complex human disease phenotypes in genetic association studies. PMID:26292667

  18. Rare variant phasing and haplotypic expression from RNA sequencing with phASER.

    PubMed

    Castel, Stephane E; Mohammadi, Pejman; Chung, Wendy K; Shen, Yufeng; Lappalainen, Tuuli

    2016-01-01

    Haplotype phasing of genetic variants is important for clinical interpretation of the genome, population genetic analysis and functional genomic analysis of allelic activity. Here we present phASER, an accurate approach for phasing variants that are overlapped by sequencing reads, including those from RNA sequencing (RNA-seq), which often span multiple exons due to splicing. Using diverse RNA-seq data we demonstrate that this provides more accurate phasing of rare variants compared with population-based phasing and allows phasing of variants in the same gene up to hundreds of kilobases away that cannot be obtained from DNA sequencing (DNA-seq) reads. We show that in the context of medical genetic studies this improves the resolution of compound heterozygotes. Additionally, phASER provides measures of haplotypic expression that increase power and accuracy in studies of allelic expression. In summary, phasing using RNA-seq and phASER is accurate and improves studies where rare variant haplotypes or allelic expression is needed. PMID:27605262

  19. Common 5S rRNA variants are likely to be accepted in many sequence contexts

    NASA Technical Reports Server (NTRS)

    Zhang, Zhengdong; D'Souza, Lisa M.; Lee, Youn-Hyung; Fox, George E.

    2003-01-01

    Over evolutionary time RNA sequences which are successfully fixed in a population are selected from among those that satisfy the structural and chemical requirements imposed by the function of the RNA. These sequences together comprise the structure space of the RNA. In principle, a comprehensive understanding of RNA structure and function would make it possible to enumerate which specific RNA sequences belong to a particular structure space and which do not. We are using bacterial 5S rRNA as a model system to attempt to identify principles that can be used to predict which sequences do or do not belong to the 5S rRNA structure space. One promising idea is the very intuitive notion that frequently seen sequence changes in an aligned data set of naturally occurring 5S rRNAs would be widely accepted in many other 5S rRNA sequence contexts. To test this hypothesis, we first developed well-defined operational definitions for a Vibrio region of the 5S rRNA structure space and what is meant by a highly variable position. Fourteen sequence variants (10 point changes and 4 base-pair changes) were identified in this way, which, by the hypothesis, would be expected to incorporate successfully in any of the known sequences in the Vibrio region. All 14 of these changes were constructed and separately introduced into the Vibrio proteolyticus 5S rRNA sequence where they are not normally found. Each variant was evaluated for its ability to function as a valid 5S rRNA in an E. coli cellular context. It was found that 93% (13/14) of the variants tested are likely valid 5S rRNAs in this context. In addition, seven variants were constructed that, although present in the Vibrio region, did not meet the stringent criteria for a highly variable position. In this case, 86% (6/7) are likely valid. As a control we also examined seven variants that are seldom or never seen in the Vibrio region of 5S rRNA sequence space. In this case only two of seven were found to be potentially valid. 

  20. Molecular Cloning and Expression of Sequence Variants of Manganese Superoxide Dismutase Genes from Wheat

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Reactive oxygen species (ROS) are very harmful to living organisms due to the potential oxidation of membrane lipids, DNA, proteins, and carbohydrates. Transformed E. coli strain QC 871, superoxide dismutase (SOD) double-mutant, with three sequence variant MnSOD1, MnSOD2, and MnSOD3 manganese supero...

  1. An integrated framework for discovery and genotyping of genomic variants from high-throughput sequencing experiments.

    PubMed

    Duitama, Jorge; Quintero, Juan Camilo; Cruz, Daniel Felipe; Quintero, Constanza; Hubmann, Georg; Foulquié-Moreno, Maria R; Verstrepen, Kevin J; Thevelein, Johan M; Tohme, Joe

    2014-04-01

    Recent advances in high-throughput sequencing (HTS) technologies and computing capacity have produced unprecedented amounts of genomic data that have unraveled the genetics of phenotypic variability in several species. However, operating and integrating current software tools for data analysis still require important investments in highly skilled personnel. Developing accurate, efficient and user-friendly software packages for HTS data analysis will lead to a more rapid discovery of genomic elements relevant to medical, agricultural and industrial applications. We therefore developed Next-Generation Sequencing Eclipse Plug-in (NGSEP), a new software tool for integrated, efficient and user-friendly detection of single nucleotide variants (SNVs), indels and copy number variants (CNVs). NGSEP includes modules for read alignment, sorting, merging, functional annotation of variants, filtering and quality statistics. Analysis of sequencing experiments in yeast, rice and human samples shows that NGSEP has superior accuracy and efficiency, compared with currently available packages for variants detection. We also show that only a comprehensive and accurate identification of repeat regions and CNVs allows researchers to properly separate SNVs from differences between copies of repeat elements. We expect that NGSEP will become a strong support tool to empower the analysis of sequencing data in a wide range of research projects on different species. PMID:24413664

  2. An integrated framework for discovery and genotyping of genomic variants from high-throughput sequencing experiments

    PubMed Central

    Duitama, Jorge; Quintero, Juan Camilo; Cruz, Daniel Felipe; Quintero, Constanza; Hubmann, Georg; Foulquié-Moreno, Maria R.; Verstrepen, Kevin J.; Thevelein, Johan M.; Tohme, Joe

    2014-01-01

    Recent advances in high-throughput sequencing (HTS) technologies and computing capacity have produced unprecedented amounts of genomic data that have unraveled the genetics of phenotypic variability in several species. However, operating and integrating current software tools for data analysis still require important investments in highly skilled personnel. Developing accurate, efficient and user-friendly software packages for HTS data analysis will lead to a more rapid discovery of genomic elements relevant to medical, agricultural and industrial applications. We therefore developed Next-Generation Sequencing Eclipse Plug-in (NGSEP), a new software tool for integrated, efficient and user-friendly detection of single nucleotide variants (SNVs), indels and copy number variants (CNVs). NGSEP includes modules for read alignment, sorting, merging, functional annotation of variants, filtering and quality statistics. Analysis of sequencing experiments in yeast, rice and human samples shows that NGSEP has superior accuracy and efficiency, compared with currently available packages for variants detection. We also show that only a comprehensive and accurate identification of repeat regions and CNVs allows researchers to properly separate SNVs from differences between copies of repeat elements. We expect that NGSEP will become a strong support tool to empower the analysis of sequencing data in a wide range of research projects on different species. PMID:24413664

  3. Rare FOXC1 variants in congenital glaucoma: identification of translation regulatory sequences.

    PubMed

    Medina-Trillo, Cristina; Aroca-Aguilar, José-Daniel; Méndez-Hernández, Carmen-Dora; Morales, Laura; García-Antón, Maite; García-Feijoo, Julián; Escribano, Julio

    2016-05-01

    Primary congenital glaucoma (PCG) is the cause of a significant proportion of inherited visual loss in children, but the underlying mechanism is poorly understood. In this study, we assessed the relationship between PCG and FOXC1 variants by Sanger sequencing the proximal promoter and transcribed sequence of FOXC1 from a cohort of 133 PCG families with no known CYP1B1 or MYOC mutations. The pathogenicity of the identified variants was evaluated by functional analyses. Ten patients (7.5%) with no family history of glaucoma carried five different rare heterozygous FOXC1 variants with both increased (rs77888940:C>G, c.-429C>G, rs730882054:c.1134_1144del(CGGCGGCGCGG), p.(G380Rfs*144) and rs35717904:A>T, c.*734A>T) and decreased (rs185790394: C>T, c.-244C>T and rs79691946:C>T, p.(P297S)) transactivation, ranging from 50 to 180% of the wild-type activity. The five variants did not show monogenic segregation, and four of them were absent in a control group (n=233). To the best of our knowledge, one of these variants (p.(G380Rfs*144)) has not previously been described. One of the FOXC1 variant carriers (p.(P297S)) also coinherited a functionally altered rare PITX2 heterozygous variant (rs6533526:C>T, c.*454C>T). Bioinformatics and functional analyses provided novel information on three of these variants. c.-429C>G potentially disrupts a consensus sequence for a terminal oligopyrimidine tract, whereas c.-244C>T may alter the RNA secondary structure in the 5'-untranslated region (UTR) that affects mRNA translation. In addition, p.(G380Rfs*144) led to increased protein stability. In summary, these data reveal the presence of translation regulatory sequences in the UTRs of FOXC1 and provide evidence for a possible role of rare FOXC1 variants as modifying factors of goniodysgenesis in PCG. PMID:26220699

  4. Phenolic acid esterases, coding sequences and methods

    DOEpatents

    Blum, David L.; Kataeva, Irina; Li, Xin-Liang; Ljungdahl, Lars G.

    2002-01-01

    Described herein are four phenolic acid esterases, three of which correspond to domains of previously unknown function within bacterial xylanases, from XynY and XynZ of Clostridium thermocellum and from a xylanase of Ruminococcus. The fourth specifically exemplified xylanase is a protein encoded within the genome of Orpinomyces PC-2. The amino acids of these polypeptides and nucleotide sequences encoding them are provided. Recombinant host cells, expression vectors and methods for the recombinant production of phenolic acid esterases are also provided.

  5. Amino-Acid Sequence of Porcine Pepsin

    PubMed Central

    Tang, J.; Sepulveda, P.; Marciniszyn, J.; Chen, K. C. S.; Huang, W-Y.; Tao, N.; Liu, D.; Lanier, J. P.

    1973-01-01

    As the culmination of several years of experiments, we propose a complete amino-acid sequence for porcine pepsin, an enzyme containing 327 amino-acid residues in a single polypeptide chain. In the sequence determination, the enzyme was treated with cyanogen bromide. Five resulting fragments were purified. The amino-acid sequence of four of the fragments accounted for 290 residues. Because the structure of a 37-residue carboxyl-terminal fragment was already known, it was not studied. The alignment of these fragments was determined from the sequence of methionyl-peptides we had previously reported. We also discovered the locations of active-site aspartyl residues, as well as the pairing of the three disulfide bridges. A minor component of commercial crystalline pepsin was found to contain two extra amino-acid residues, Ala-Leu-, at the amino-terminus of the molecule. This minor component was apparently derived from a different site of cleavage during the activation of porcine pepsinogen. PMID:4587252

  6. Method for identifying and quantifying nucleic acid sequence aberrations

    DOEpatents

    Lucas, Joe N.; Straume, Tore; Bogen, Kenneth T.

    1998-01-01

    A method for detecting nucleic acid sequence aberrations by detecting nucleic acid sequences having both a first and a second nucleic acid sequence type, the presence of the first and second sequence type on the same nucleic acid sequence indicating the presence of a nucleic acid sequence aberration. The method uses a first hybridization probe which includes a nucleic acid sequence that is complementary to a first sequence type and a first complexing agent capable of attaching to a second complexing agent and a second hybridization probe which includes a nucleic acid sequence that selectively hybridizes to the second nucleic acid sequence type over the first sequence type and includes a detectable marker for detecting the second hybridization probe.

  7. Method for identifying and quantifying nucleic acid sequence aberrations

    DOEpatents

    Lucas, J.N.; Straume, T.; Bogen, K.T.

    1998-07-21

    A method is disclosed for detecting nucleic acid sequence aberrations by detecting nucleic acid sequences having both a first and a second nucleic acid sequence type, the presence of the first and second sequence type on the same nucleic acid sequence indicating the presence of a nucleic acid sequence aberration. The method uses a first hybridization probe which includes a nucleic acid sequence that is complementary to a first sequence type and a first complexing agent capable of attaching to a second complexing agent and a second hybridization probe which includes a nucleic acid sequence that selectively hybridizes to the second nucleic acid sequence type over the first sequence type and includes a detectable marker for detecting the second hybridization probe. 11 figs.

  8. MutaBind estimates and interprets the effects of sequence variants on protein-protein interactions.

    PubMed

    Li, Minghui; Simonetti, Franco L; Goncearenco, Alexander; Panchenko, Anna R

    2016-07-01

    Proteins engage in highly selective interactions with their macromolecular partners. Sequence variants that alter protein binding affinity may cause significant perturbations or complete abolishment of function, potentially leading to diseases. There exists a persistent need to develop a mechanistic understanding of impacts of variants on proteins. To address this need we introduce a new computational method MutaBind to evaluate the effects of sequence variants and disease mutations on protein interactions and calculate the quantitative changes in binding affinity. The MutaBind method uses molecular mechanics force fields, statistical potentials and fast side-chain optimization algorithms. The MutaBind server maps mutations on a structural protein complex, calculates the associated changes in binding affinity, determines the deleterious effect of a mutation, estimates the confidence of this prediction and produces a mutant structural model for download. MutaBind can be applied to a large number of problems, including determination of potential driver mutations in cancer and other diseases, elucidation of the effects of sequence variants on protein fitness in evolution and protein design. MutaBind is available at http://www.ncbi.nlm.nih.gov/projects/mutabind/. PMID:27150810

  9. Using Whole Exome Sequencing to Identify Candidate Genes With Rare Variants In Nonsyndromic Cleft Lip and Palate.

    PubMed

    Aylward, Alana; Cai, Yi; Lee, Andrew; Blue, Elizabeth; Rabinowitz, Daniel; Haddad, Joseph

    2016-07-01

    Studies suggest that nonsyndromic cleft lip and palate (NSCLP) is polygenic with variable penetrance, presenting a challenge in identifying all causal genetic variants. Despite relatively high prevalence of NSCLP among Amerindian populations, no large whole exome sequencing (WES) studies have been completed in this population. Our goal was to identify candidate genes with rare genetic variants for NSCLP in a Honduran population using WES. WES was performed on two to four members of 27 multiplex Honduran families. Genetic variants with a minor allele frequency > 1% in reference databases were removed. Heterozygous variants consistent with dominant disease with incomplete penetrance were ascertained, and variants with predicted functional consequence were prioritized for analysis. Pedigree-specific P-values were calculated as the probability of all affected members in the pedigree being carriers, given that at least one is a carrier. Preliminary results identified 3,727 heterozygous rare variants; 1,282 were predicted to be functionally consequential. Twenty-three genes had variants of interest in ≥3 families, where some genes had different variants in each family, giving a total of 50 variants. Variant validation via Sanger sequencing of the families and unrelated unaffected controls excluded variants that were sequencing errors or common variants not in databases, leaving four genes with candidate variants in ≥3 families. Of these, candidate variants in two genes consistently segregate with NSCLP as a dominant variant with incomplete penetrance: ACSS2 and PHYH. Rare variants found at the same gene in all affected individuals in several families are likely to be directly related to NSCLP. PMID:27229527
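    The frequency filter described in the abstract above (variants with a minor allele frequency above 1% in reference databases are removed) can be sketched as follows. This is only an illustrative sketch, not the authors' pipeline: the `Variant` record, the `keep_rare` helper, the gene names and the MAF values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str    # hypothetical gene label
    maf: float   # minor allele frequency as a fraction (0.01 == 1%)


def keep_rare(variants, max_maf=0.01):
    """Keep variants whose reference MAF is at most max_maf (1% by default)."""
    return [v for v in variants if v.maf <= max_maf]


# Made-up records for illustration only.
variants = [
    Variant("GENE_A", 0.002),
    Variant("GENE_B", 0.0005),
    Variant("GENE_C", 0.15),   # common variant, should be filtered out
]

rare = keep_rare(variants)
print([v.gene for v in rare])
```

    The threshold is passed as a parameter because different studies use different cut-offs; 1% is simply the value quoted in this abstract.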

  10. Sequencing of SCN5A identifies rare and common variants associated with cardiac conduction

    PubMed Central

    Magnani, Jared W.; Brody, Jennifer A.; Prins, Bram P.; Arking, Dan E.; Lin, Honghuang; Yin, Xiaoyan; Liu, Ching-Ti; Morrison, Alanna C.; Zhang, Feng; Spector, Tim D.; Alonso, Alvaro; Bis, Joshua C.; Heckbert, Susan R.; Lumley, Thomas; Sitlani, Colleen M.; Cupples, L. Adrienne; Lubitz, Steven A.; Soliman, Elsayed Z.; Pulit, Sara L.; Newton-Cheh, Christopher; O'Donnell, Christopher J.; Ellinor, Patrick T.; Benjamin, Emelia J.; Muzny, Donna M.; Gibbs, Richard A.; Santibanez, Jireh; Taylor, Herman A.; Rotter, Jerome I.; Lange, Leslie A.; Psaty, Bruce M.; Jackson, Rebecca; Rich, Stephen S.; Boerwinkle, Eric; Jamshidi, Yalda; Sotoodehnia, Nona

    2014-01-01

    Background: The cardiac sodium channel SCN5A regulates atrioventricular and ventricular conduction. Genetic variants in this gene are associated with PR and QRS intervals. We sought to further characterize the contribution of rare and common coding variation in SCN5A to cardiac conduction. Methods and Results: In the Cohorts for Heart and Aging Research in Genomic Epidemiology Targeted Sequencing Study (CHARGE), we performed targeted exonic sequencing of SCN5A (n=3699, European-ancestry individuals) and identified 4 common (minor allele frequency >1%) and 157 rare variants. Common and rare SCN5A coding variants were examined for association with PR and QRS intervals through meta-analysis of European ancestry participants from CHARGE, NHLBI’s Exome Sequencing Project (ESP, n=607) and the UK10K (n=1275) and by examining ESP African-ancestry participants (N=972). Rare coding SCN5A variants in aggregate were associated with PR interval in European and African-ancestry participants (P=1.3×10^-3). Three common variants were associated with PR and/or QRS interval duration among European-ancestry participants and one among African-ancestry participants. These included two well-known missense variants: rs1805124 (H558R) was associated with PR and QRS shortening in European-ancestry participants (P=6.25×10^-4 and P=5.2×10^-3 respectively) and rs7626962 (S1102Y) was associated with PR shortening in those of African ancestry (P=2.82×10^-3). Among European-ancestry participants, two novel synonymous variants, rs1805126 and rs6599230, were associated with cardiac conduction. Our top signal, rs1805126, was associated with PR and QRS lengthening (P=3.35×10^-7 and P=2.69×10^-4 respectively), and rs6599230 was associated with PR shortening (P=2.67×10^-5). Conclusions: By sequencing SCN5A, we identified novel common and rare coding variants associated with cardiac conduction. PMID:24951663

  11. ATM sequence variants and risk of radiation-induced subcutaneous fibrosis after postmastectomy radiotherapy

    SciTech Connect

    Andreassen, Christian N.; Overgaard, Jens; Alsner, Jan; Overgaard, Marie; Herskind, Carsten; Cesaretti, Jamie A.; Atencio, David P.; Green, Sheryl; Formenti, Silvia C.; Stock, Richard G.; Rosenstein, Barry S. (E-mail: barry.rosenstein@mssm.edu)

    2006-03-01

    Purpose: To examine the hypothesis that women who are carriers of genetic alterations in the ATM gene are more likely to develop subcutaneous fibrosis after radiotherapy for treatment of breast cancer compared with patients who do not possess DNA sequence variations in this gene. Methods and Materials: DNA samples isolated from fibroblast cell lines established from 41 women treated with postmastectomy radiotherapy for breast cancer were screened for genetic variants in ATM using denaturing high-performance liquid chromatography (DHPLC). A minimum follow-up of 2 years enabled analysis of late effects to generate dose-response curves and to estimate the dose that resulted in a 50% incidence of Grade 3 fibrosis (ED50). Results: A total of 26 genetic alterations in the expressed portions of the ATM gene, or within 10 bases of each exon in regions encompassing putative splice sites, were detected in 22 patients. The ED50 (95% confidence interval) of 60.2 (55.7-65.1) Gy calculated for patients without a sequence variation did not differ significantly from the ED50 of 58.4 (54.0-63.1) Gy for the group of patients with any ATM sequence abnormality. The ED50 of 53.7 (50.2-57.5) Gy for those patients who were either homozygous or heterozygous for the G→A polymorphism at nucleotide 5557, which results in substitution of asparagine for aspartic acid at position 1853 of the ATM protein, was substantially lower than the ED50 of 60.8 (57.0-64.8) Gy for patients who were not carriers of this sequence alteration. This resulted in an enhancement ratio (ratio of the ED50 values) of 1.13 (1.05-1.22), which was significantly greater than unity. Conclusion: The results of this study suggest an association between the ATM codon 1853 Asn/Asp and Asn/Asn genotypes with the development of Grade 3 fibrosis in breast cancer patients treated with radiotherapy.
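    The enhancement ratio quoted above is the ratio of the two dose estimates, 60.8 Gy for non-carriers and 53.7 Gy for carriers of the nucleotide 5557 polymorphism. The short check below reproduces only that point estimate; the function and parameter names are hypothetical, and the confidence interval is not recomputed because the abstract does not give the data needed for it.

```python
def enhancement_ratio(dose_non_carriers_gy, dose_carriers_gy):
    """Ratio of the dose estimate for non-carriers to that for carriers."""
    return dose_non_carriers_gy / dose_carriers_gy


# Point estimates taken from the abstract above (in Gy).
ratio = enhancement_ratio(60.8, 53.7)
print(round(ratio, 2))
```

    A ratio above 1 means that carriers reach the same incidence of Grade 3 fibrosis at a lower dose.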

  12. FAVR (Filtering and Annotation of Variants that are Rare): methods to facilitate the analysis of rare germline genetic variants from massively parallel sequencing datasets

    PubMed Central

    2013-01-01

    Background Characterising genetic diversity through the analysis of massively parallel sequencing (MPS) data offers enormous potential to significantly improve our understanding of the genetic basis for observed phenotypes, including predisposition to and progression of complex human disease. Great challenges remain in resolving genetic variants that are genuine from the millions of artefactual signals. Results FAVR is a suite of new methods designed to work with commonly used MPS analysis pipelines to assist in the resolution of some of the issues related to the analysis of the vast amount of resulting data, with a focus on relatively rare genetic variants. To the best of our knowledge, no equivalent method has previously been described. The most important and novel aspect of FAVR is the use of signatures in comparator sequence alignment files during variant filtering, and annotation of variants potentially shared between individuals. The FAVR methods use these signatures to facilitate filtering of (i) platform and/or mapping-specific artefacts, (ii) common genetic variants, and, where relevant, (iii) artefacts derived from imbalanced paired-end sequencing, as well as annotation of genetic variants based on evidence of co-occurrence in individuals. We applied conventional variant calling to whole-exome sequencing datasets, produced using both SOLiD and TruSeq chemistries, with or without downstream processing by FAVR methods. We demonstrate a 3-fold smaller rare single nucleotide variant shortlist with no detected reduction in sensitivity. This analysis included Sanger sequencing of rare variant signals not evident in dbSNP131, assessment of known variant signal preservation, and comparison of observed and expected rare variant numbers across a range of first cousin pairs. The principles described herein were applied in our recent publication identifying XRCC2 as a new breast cancer risk gene and have been made publicly available as a suite of software

  13. Methods for analyzing nucleic acid sequences

    DOEpatents

    Korlach, Jonas; Webb, Watt W.; Levene, Michael; Turner, Stephen; Craighead, Harold G.; Foquet, Mathieu

    2011-05-17

    The present invention is directed to a method of sequencing a target nucleic acid. The method provides a complex comprising a polymerase enzyme, a target nucleic acid molecule, and a primer, wherein the complex is immobilized on a support. A fluorescent label is attached to a terminal phosphate group of the nucleotide or nucleotide analog. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. The time duration of the signal from labeled nucleotides or nucleotide analogs that become incorporated is distinguished from freely diffusing labels by a longer retention in the observation volume for the nucleotides or nucleotide analogs that become incorporated than for the freely diffusing labels.

  14. Rapid Detection of Rare Deleterious Variants by Next Generation Sequencing with Optional Microarray SNP Genotype Data

    PubMed Central

    Watson, Christopher M.; Crinnion, Laura A.; Gurgel‐Gianetti, Juliana; Harrison, Sally M.; Daly, Catherine; Antanavicuite, Agne; Lascelles, Carolina; Markham, Alexander F.; Pena, Sergio D. J.; Bonthron, David T.

    2015-01-01

    Autozygosity mapping is a powerful technique for the identification of rare, autosomal recessive, disease‐causing genes. The ease with which this category of disease gene can be identified has greatly increased through the availability of genome‐wide SNP genotyping microarrays and subsequently of exome sequencing. Although these methods have simplified the generation of experimental data, its analysis, particularly when disparate data types must be integrated, remains time consuming. Moreover, the huge volume of sequence variant data generated from next generation sequencing experiments opens up the possibility of using these data instead of microarray genotype data to identify disease loci. To allow these two types of data to be used in an integrated fashion, we have developed AgileVCFMapper, a program that performs both the mapping of disease loci by SNP genotyping and the analysis of potentially deleterious variants using exome sequence variant data, in a single step. This method does not require microarray SNP genotype data, although analysis with a combination of microarray and exome genotype data enables more precise delineation of disease loci, due to superior marker density and distribution. PMID:26037133

  15. De novo sequencing and variant calling with nanopores using PoreSeq

    PubMed Central

    Szalay, Tamas; Golovchenko, Jene A.

    2016-01-01

    The single-molecule accuracy of nanopore sequencing has been an area of rapid academic and commercial advancement, but remains challenging for the de novo analysis of genomes. We introduce here a novel algorithm for the error correction of nanopore data, utilizing statistical models of the physical system in order to obtain high accuracy de novo sequences at a range of coverage depths. We demonstrate the technique by sequencing M13 bacteriophage DNA to 99% accuracy at moderate coverage as well as its use in an assembly pipeline by sequencing E. coli and λ DNA at a range of coverages. We also show the algorithm’s ability to accurately classify sequence variants at far lower coverage than existing methods. PMID:26352647

  16. CLAMMS: a scalable algorithm for calling common and rare copy number variants from exome sequencing data

    PubMed Central

    Packer, Jonathan S.; Maxwell, Evan K.; O’Dushlaine, Colm; Lopez, Alexander E.; Dewey, Frederick E.; Chernomorsky, Rostislav; Baras, Aris; Overton, John D.; Habegger, Lukas; Reid, Jeffrey G.

    2016-01-01

    Motivation: Several algorithms exist for detecting copy number variants (CNVs) from human exome sequencing read depth, but previous tools have not been well suited for large population studies on the order of tens or hundreds of thousands of exomes. Their limitations include being difficult to integrate into automated variant-calling pipelines and being ill-suited for detecting common variants. To address these issues, we developed a new algorithm—Copy number estimation using Lattice-Aligned Mixture Models (CLAMMS)—which is highly scalable and suitable for detecting CNVs across the whole allele frequency spectrum. Results: In this note, we summarize the methods and intended use-case of CLAMMS, compare it to previous algorithms and briefly describe results of validation experiments. We evaluate the adherence of CNV calls from CLAMMS and four other algorithms to Mendelian inheritance patterns on a pedigree; we compare calls from CLAMMS and other algorithms to calls from SNP genotyping arrays for a set of 3164 samples; and we use TaqMan quantitative polymerase chain reaction to validate CNVs predicted by CLAMMS at 39 loci (95% of rare variants validate; across 19 common variant loci, the mean precision and recall are 99% and 94%, respectively). In the Supplementary Materials (available at the CLAMMS Github repository), we present our methods and validation results in greater detail. Availability and implementation: https://github.com/rgcgithub/clamms (implemented in C). Contact: jeffrey.reid@regeneron.com Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26382196
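    The validation figures in the abstract above are reported as precision and recall. As a reminder of how these two quantities are defined, here is a minimal sketch; the function name and the counts are hypothetical and are not taken from the CLAMMS validation data.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Return (precision, recall) from confusion-matrix counts.

    precision = TP / (TP + FP): fraction of calls that are real.
    recall    = TP / (TP + FN): fraction of real events that were called.
    """
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return precision, recall


# Made-up counts for illustration only.
precision, recall = precision_recall(true_pos=90, false_pos=10, false_neg=30)
print(precision, recall)
```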

  17. HLA class II sequence variants influence tuberculosis risk in populations of European ancestry.

    PubMed

    Sveinbjornsson, Gardar; Gudbjartsson, Daniel F; Halldorsson, Bjarni V; Kristinsson, Karl G; Gottfredsson, Magnus; Barrett, Jeffrey C; Gudmundsson, Larus J; Blondal, Kai; Gylfason, Arnaldur; Gudjonsson, Sigurjon Axel; Helgadottir, Hafdis T; Jonasdottir, Adalbjorg; Jonasdottir, Aslaug; Karason, Ari; Kardum, Ljiljana Bulat; Knežević, Jelena; Kristjansson, Helgi; Kristjansson, Mar; Love, Arthur; Luo, Yang; Magnusson, Olafur T; Sulem, Patrick; Kong, Augustine; Masson, Gisli; Thorsteinsdottir, Unnur; Dembic, Zlatko; Nejentsev, Sergey; Blondal, Thorsteinn; Jonsdottir, Ingileif; Stefansson, Kari

    2016-03-01

    Mycobacterium tuberculosis infections cause 9 million new tuberculosis cases and 1.5 million deaths annually. To identify variants conferring risk of tuberculosis, we tested 28.3 million variants identified through whole-genome sequencing of 2,636 Icelanders for association with tuberculosis (8,162 cases and 277,643 controls), pulmonary tuberculosis (PTB) and M. tuberculosis infection. We found association of three variants in the region harboring genes encoding the class II human leukocyte antigens (HLAs): rs557011[T] (minor allele frequency (MAF) = 40.2%), associated with M. tuberculosis infection (odds ratio (OR) = 1.14, P = 3.1 × 10^-13) and PTB (OR = 1.25, P = 5.8 × 10^-12), and rs9271378[G] (MAF = 32.5%), associated with PTB (OR = 0.78, P = 2.5 × 10^-12)--both located between HLA-DQA1 and HLA-DRB1--and a missense variant encoding p.Ala210Thr in HLA-DQA1 (MAF = 19.1%, rs9272785), associated with M. tuberculosis infection (P = 9.3 × 10^-9, OR = 1.14). We replicated association of these variants with PTB in samples of European ancestry from Russia and Croatia (P < 5.9 × 10^-4). These findings show that the HLA class II region contributes to genetic risk of tuberculosis, possibly through reduced presentation of protective M. tuberculosis antigens to T cells. PMID:26829749

  18. Adjusted sequence kernel association test for rare variants controlling for cryptic and family relatedness.

    PubMed

    Oualkacha, Karim; Dastani, Zari; Li, Rui; Cingolani, Pablo E; Spector, Timothy D; Hammond, Christopher J; Richards, J Brent; Ciampi, Antonio; Greenwood, Celia M T

    2013-05-01

    Recent progress in sequencing technologies makes it possible to identify rare and unique variants that may be associated with complex traits. However, the results of such efforts depend crucially on the use of efficient statistical methods and study designs. Although family-based designs might enrich a data set for familial rare disease variants, most existing rare variant association approaches assume independence of all individuals. We introduce here a framework for association testing of rare variants in family-based designs. This framework is an adaptation of the sequence kernel association test (SKAT) which allows us to control for family structure. Our adjusted SKAT (ASKAT) combines the SKAT approach and the factored spectrally transformed linear mixed models (FaST-LMMs) algorithm to capture family effects based on a LMM incorporating the realized proportion of the genome that is identical by descent between pairs of individuals, and using restricted maximum likelihood methods for estimation. In simulation studies, we evaluated type I error and power of this proposed method and we showed that regardless of the level of the trait heritability, our approach has good control of type I error and good power. Since our approach uses FaST-LMM to calculate variance components for the proposed mixed model, ASKAT is reasonably fast and can analyze hundreds of thousands of markers. Data from the UK twins consortium are presented to illustrate the ASKAT methodology. PMID:23529756

  19. Allelic diversity in MCAD deficiency: the biochemical classification of 54 variants identified during 5 years of ACADM sequencing.

    PubMed

    Smith, Emily H; Thomas, Cheryl; McHugh, David; Gavrilov, Dimitar; Raymond, Kimiyo; Rinaldo, Piero; Tortorelli, Silvia; Matern, Dietrich; Highsmith, W Edward; Oglesbee, Devin

    2010-07-01

    Medium-chain acyl-CoA dehydrogenase (MCAD) deficiency is a commonly detected fatty acid oxidation disorder and its diagnosis relies on both biochemical and molecular analyses. Over a 5-year period, sequencing all 12 exons of the MCAD gene (ACADM) in our laboratory revealed a total of 54 variants in 549 subjects analyzed. As most molecular ACADM testing is referred for the follow-up of an abnormal newborn screening result obtained from an asymptomatic newborn, the identification of a novel DNA variant, or "variant of unknown significance (VUS)," presents clinicians with a dilemma. Frequently, the results of molecular analyses are correlated to biochemical findings, such as the concentration of octanoylcarnitine (C8) in plasma and the excretion of hexanoylglycine (HG) in urine. Here, we describe the classification of genotypes harboring at least one VUS through the comparison of C8 and HG values measured in individuals who are carriers of, or affected with, MCAD deficiency on the basis of the following genotypes: c.985A>G/wildtype, c.199T>C/c.985A>G and c.985A>G/c.985A>G. Our findings emphasize the importance of obtaining both plasma and urine when following up positive newborn screening results and may influence the way physicians counsel their asymptomatic patients about MCAD deficiency after genetic analysis. PMID:20434380

  20. Visualization and probability-based scoring of structural variants within repetitive sequences

    PubMed Central

    Halper-Stromberg, Eitan; Steranka, Jared; Burns, Kathleen H.; Sabunciyan, Sarven; Irizarry, Rafael A.

    2014-01-01

    Motivation: Repetitive sequences account for approximately half of the human genome. Accurately ascertaining sequences in these regions with next generation sequencers is challenging, and requires a different set of analytical techniques than for reads originating from unique sequences. Complicating the matter are repetitive regions subject to programmed rearrangements, as is the case with the antigen-binding domains in the Immunoglobulin (Ig) and T-cell receptor (TCR) loci. Results: We developed a probability-based score and visualization method to aid in distinguishing true structural variants from alignment artifacts. We demonstrate the usefulness of this method in its ability to separate real structural variants from false positives generated with existing upstream analysis tools. We validated our approach using both target-capture and whole-genome experiments. Capture sequencing reads were generated from primary lymphoid tumors, cancer cell lines and an EBV-transformed lymphoblast cell line over the Ig and TCR loci. Whole-genome sequencing reads were from a lymphoblastoid cell line. Availability: We implement our method as an R package available at https://github.com/Eitan177/targetSeqView. Code to reproduce the figures and results is also available. Contact: ehalper2@jhmi.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:24501098

  1. Polymorphisms and variants in the prion protein sequence of European moose (Alces alces), reindeer (Rangifer tarandus), roe deer (Capreolus capreolus) and fallow deer (Dama dama) in Scandinavia

    PubMed Central

    Wik, Lotta; Mikko, Sofia; Klingeborn, Mikael; Stéen, Margareta; Simonsson, Magnus; Linné, Tommy

    2012-01-01

    The prion protein (PrP) sequence of European moose, reindeer, roe deer and fallow deer in Scandinavia has high homology to the PrP sequence of North American cervids. Variants in the European moose PrP sequence were found at amino acid position 109 as K or Q. The 109Q variant is unique in the PrP sequence of vertebrates. During the 1980s a wasting syndrome in Swedish moose, Moose Wasting Syndrome (MWS), was described. SNP analysis demonstrated a difference in the observed genotype proportions of the heterozygous Q/K and homozygous Q/Q variants in the MWS animals compared with the healthy animals. In MWS moose the allele frequencies for 109K and 109Q were 0.73 and 0.27, respectively, and for healthy animals 0.69 and 0.31. Both alleles were seen as heterozygotes and homozygotes. In reindeer, PrP sequence variation was demonstrated at codon 176 as D or N and codon 225 as S or Y. The PrP sequences in roe deer and fallow deer were identical with published GenBank sequences. PMID:22441661
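    Allele frequencies such as those reported above for 109K and 109Q are conventionally obtained by counting alleles across genotypes: each homozygote contributes two copies of its allele and each heterozygote one copy of each. The sketch below illustrates that calculation; the function name and the genotype counts are hypothetical and are not the study's data.

```python
def allele_frequencies(n_kk, n_kq, n_qq):
    """Return (freq_K, freq_Q) from counts of K/K, K/Q and Q/Q genotypes."""
    total_alleles = 2 * (n_kk + n_kq + n_qq)   # two alleles per animal
    freq_k = (2 * n_kk + n_kq) / total_alleles
    freq_q = (2 * n_qq + n_kq) / total_alleles
    return freq_k, freq_q


# Made-up genotype counts for illustration only.
freq_k, freq_q = allele_frequencies(n_kk=50, n_kq=40, n_qq=10)
print(freq_k, freq_q)
```

    The two frequencies always sum to 1 for a two-allele site, which is a useful sanity check on reported values.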

  2. Integrating mapping-, assembly- and haplotype-based approaches for calling variants in clinical sequencing applications

    PubMed Central

    Mathieson, Iain; Iqbal, Zamin; Twigg, Stephen R F; Wilkie, Andrew O M; McVean, Gil; Lunter, Gerton

    2016-01-01

    High-throughput DNA sequencing technology has transformed genetic research and is starting to make an impact on clinical practice. However, analyzing high-throughput sequencing data remains challenging, particularly in clinical settings where accuracy and turnaround times are critical. We present a new approach to this problem, implemented in a software package called Platypus. Platypus achieves high sensitivity and specificity for SNPs, indels and complex polymorphisms by using local de novo assembly to generate candidate variants, followed by local realignment and probabilistic haplotype estimation. It is an order of magnitude faster than existing tools and generates calls from raw aligned read data without preprocessing. We demonstrate the performance of Platypus in clinically relevant experimental designs by comparing with SAMtools and GATK on whole-genome and exome-capture data, by identifying de novo variation in 15 parent-offspring trios with high sensitivity and specificity, and by estimating human leukocyte antigen genotypes directly from variant calls. PMID:25017105

  3. Integrating mapping-, assembly- and haplotype-based approaches for calling variants in clinical sequencing applications.

    PubMed

    Rimmer, Andy; Phan, Hang; Mathieson, Iain; Iqbal, Zamin; Twigg, Stephen R F; Wilkie, Andrew O M; McVean, Gil; Lunter, Gerton

    2014-08-01

    High-throughput DNA sequencing technology has transformed genetic research and is starting to make an impact on clinical practice. However, analyzing high-throughput sequencing data remains challenging, particularly in clinical settings where accuracy and turnaround times are critical. We present a new approach to this problem, implemented in a software package called Platypus. Platypus achieves high sensitivity and specificity for SNPs, indels and complex polymorphisms by using local de novo assembly to generate candidate variants, followed by local realignment and probabilistic haplotype estimation. It is an order of magnitude faster than existing tools and generates calls from raw aligned read data without preprocessing. We demonstrate the performance of Platypus in clinically relevant experimental designs by comparing with SAMtools and GATK on whole-genome and exome-capture data, by identifying de novo variation in 15 parent-offspring trios with high sensitivity and specificity, and by estimating human leukocyte antigen genotypes directly from variant calls. PMID:25017105

  4. Frameshift Sequence Variants in the Human Lipase-H Gene Causing Hypotrichosis.

    PubMed

    Mehmood, Sabba; Shah, Sayed Hajan; Jan, Abid; Younus, Muhammad; Ahmad, Farooq; Ayub, Muhammad; Ahmad, Wasim

    2016-01-01

    Hypotrichosis is a condition of abnormal hair pattern characterized by sparse to absent hair on different parts of the body, including the scalp. The condition is often characterized by tightly curled woolly hairs, discoloration of hair, and development of multiple keratin-filled cysts or papules on the body. Sequence analysis of the lipase H (LIPH) gene, mapped on chromosome 3q27.3, led to the identification of a novel frameshift deletion variant (c.932delC, p.Pro311Leufs*3) in one family and a previously reported 2-bp deletion (c.659_660delTA) in five other families, all segregating hypotrichosis and woolly hair in an autosomal recessive pattern. The study further extends the body of evidence that sequence variants in the LIPH gene result in the hypotrichosis and woolly hair phenotype. PMID:26645693

  5. Possession of ATM Sequence Variants as Predictor for Late Normal Tissue Responses in Breast Cancer Patients Treated With Radiotherapy

    SciTech Connect

    Ho, Alice Y.; Fan, Grace; Atencio, David P.; Green, Sheryl; Formenti, Silvia C.; Haffty, Bruce G.; Iyengar, Preetha B.A.; Bernstein, Jonine L.; Stock, Richard G.; Cesaretti, Jamie A.; Rosenstein, Barry S.

    2007-11-01

    Purpose: The ATM gene product is a central component of cell cycle regulation and genomic surveillance. We hypothesized that DNA sequence alterations in ATM predict for adverse effects after external beam radiotherapy for early breast cancer. Methods and Materials: A total of 131 patients with a minimum of 2 years follow-up who had undergone breast-conserving surgery and adjuvant radiotherapy were screened for sequence alterations in ATM using DNA from blood lymphocytes. Genetic variants were identified using denaturing high performance liquid chromatography. The Radiation Therapy Oncology Group late morbidity scoring schemes for skin and subcutaneous tissues were applied to quantify the radiation-induced effects. Results: Of the 131 patients, 51 possessed ATM sequence alterations located within exons or in short intron regions flanking each exon that encompass putative splice site regions. Of these 51 patients, 21 (41%) exhibited a minimum of a Grade 2 late radiation response. In contrast, of the 80 patients without an ATM sequence variation, only 18 (23%) had radiation-induced adverse responses, for an odds ratio of 2.4 (95% confidence interval, 1.1-5.2). Fifteen patients were heterozygous for the G→A polymorphism at nucleotide 5557, which causes substitution of asparagine for aspartic acid at position 1853 of the ATM protein. Of these 15 patients, 8 (53%) exhibited a Grade 2-4 late response compared with 31 (27%) of the 116 patients without this alteration, for an odds ratio of 3.1 (95% confidence interval, 1.1-9.4). Conclusion: Sequence variants located in the ATM gene, in particular the 5557 G→A polymorphism, may predict for late adverse radiation responses in breast cancer patients.
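
    The first odds ratio quoted above can be reproduced from the counts given in the abstract. The sketch below assumes a Woolf (log-normal) confidence interval; the abstract does not say which interval method the authors used, so this is an illustration rather than their calculation, and the function name is hypothetical.

```python
import math


def odds_ratio_ci(exposed_cases, exposed_noncases,
                  unexposed_cases, unexposed_noncases, z=1.96):
    """Odds ratio for a 2x2 table with a Woolf (log-normal) interval.

    The interval method is an assumption; the abstract does not name one.
    """
    a, b, c, d = (exposed_cases, exposed_noncases,
                  unexposed_cases, unexposed_noncases)
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)
    return odds_ratio, lower, upper


# Counts from the abstract: 21 of 51 carriers of any ATM alteration and
# 18 of 80 non-carriers showed a Grade >= 2 late response.
any_variant = odds_ratio_ci(21, 51 - 21, 18, 80 - 18)
print("OR = %.1f (95%% CI %.1f-%.1f)" % any_variant)  # OR = 2.4 (95% CI 1.1-5.2)
```

    The same function accepts the counts for the nucleotide 5557 polymorphism (8 of 15 carriers versus 31 of 116 non-carriers); with a different interval method the bounds may differ slightly from the published ones.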

  6. Ethanol and Reactive Species Increase Basal Sequence Heterogeneity of Hepatitis C Virus and Produce Variants with Reduced Susceptibility to Antivirals

    PubMed Central

    Seronello, Scott; Montanez, Jessica; Presleigh, Kristen; Barlow, Miriam; Park, Seung Bum; Choi, Jinah

    2011-01-01

    Hepatitis C virus (HCV) exhibits a high level of genetic variability, and variants with reduced susceptibility to antivirals can occur even before treatment begins. In addition, alcohol decreases efficacy of antiviral therapy and increases sequence heterogeneity of HCV RNA, but how ethanol affects HCV sequence is unknown. Ethanol metabolism and HCV infection increase the level of reactive species that can alter cell metabolism, modify signaling, and potentially act as mutagens of the viral RNA. Therefore, we investigated whether ethanol and reactive species affected the basal sequence variability of HCV RNA in hepatocytes. Human hepatoma cells supporting a continuous replication of genotype 1b HCV RNA (Con1, AJ242652) were exposed to ethanol, acetaldehyde, hydrogen peroxide, or L-buthionine-S,R-sulfoximine (BSO), which decreases intracellular glutathione as seen in patients. Then, the NS5A region was sequenced and compared with genotype 1b HCV sequences in the database. Ethanol and BSO elevated nucleotide and amino acid substitution rates of HCV RNA by 4- to 18-fold within 48 hrs, accompanied by oxidative RNA damage. Iron chelator and glutathione ester decreased both RNA damage and mutation rates. Furthermore, infectious HCV and HCV core gene were sufficient to induce oxidative RNA damage even in the absence of ethanol or BSO. Interestingly, the dn/ds ratio and percentage of sites undergoing positive selection increased with ethanol and BSO, resulting in an increased detection of NS5A variants with reduced susceptibility to interferon alpha, cyclosporine, and ribavirin and others implicated in immune tolerance and modulation of viral replication. Therefore, alcohol is likely to synergize with virus-induced oxidative/nitrosative stress to modulate the basal mutation rate of HCV. Positive selection induced by alcohol and reactive species may contribute to antiviral resistance. PMID:22087316

  7. Using VAAST to Identify Disease-Associated Variants in Next-Generation Sequencing Data

    PubMed Central

    Kennedy, Brett; Kronenberg, Zev; Hu, Hao; Moore, Barry; Flygare, Steven; Reese, Martin G.; Jorde, Lynn B.; Yandell, Mark; Huff, Chad

    2014-01-01

    The VAAST pipeline is specifically designed to identify disease-associated alleles in next-generation sequencing data. In the protocols presented in this paper, we outline the best practices for variant prioritization using VAAST. Examples and test data are provided for case-control, small pedigree, and large pedigree analyses. These protocols will teach users the fundamentals of VAAST, VAAST 2.0, and pVAAST analyses. PMID:24763993

  8. Disease Variant Landscape of a Large Multiethnic Population of Moyamoya Patients by Exome Sequencing

    PubMed Central

    Shoemaker, Lorelei D.; Clark, Michael J.; Patwardhan, Anil; Chandratillake, Gemma; Garcia, Sarah; Chen, Rong; Morgan, Alexander A.; Leng, Nan; Kirk, Scott; Chen, Richard; Cook, Douglas J.; Snyder, Michael; Steinberg, Gary K.

    2015-01-01

    Moyamoya disease (MMD) is a rare disorder characterized by cerebrovascular occlusion and development of hemorrhage-prone collateral vessels. Approximately 10–12% of cases are familial, with a presumed low penetrance autosomal dominant pattern of inheritance. Diagnosis commonly occurs only after clinical presentation. The recent identification of the RNF213 founder mutation (p.R4810K) in the Asian population has made a significant contribution, but the etiology of this disease remains unclear. To further develop the variant landscape of MMD, we performed high-depth whole exome sequencing of 125 unrelated, predominantly nonfamilial, ethnically diverse MMD patients in parallel with 125 internally sequenced, matched controls using the same exome and analysis platform. Three subpopulations were established: Asian, Caucasian, and non-RNF213 founder mutation cases. We provided additional support for the previously observed RNF213 founder mutation (p.R4810K) in Asian cases (P = 6.01×10⁻⁵) that was enriched among East Asians compared to Southeast Asian and Pacific Islander cases (P = 9.52×10⁻⁴) and was absent in all Caucasian cases. The most enriched variant in Caucasian (P = 7.93×10⁻⁴) and non-RNF213 founder mutation (P = 1.51×10⁻³) cases was ZXDC (p.P562L), a gene involved in MHC Class II activation. Collapsing variant methodology ranked OBSCN, a gene involved in myofibrillogenesis, as most enriched in Caucasian (P = 1.07×10⁻⁴) and non-RNF213 founder mutation cases (P = 5.31×10⁻⁵). These findings further support the East Asian origins of the RNF213 (p.R4810K) variant and more fully describe the genetic landscape of multiethnic MMD, revealing novel, alternative candidate variants and genes that may be important in MMD etiology and diagnosis. PMID:26530418

  9. Disease Variant Landscape of a Large Multiethnic Population of Moyamoya Patients by Exome Sequencing.

    PubMed

    Shoemaker, Lorelei D; Clark, Michael J; Patwardhan, Anil; Chandratillake, Gemma; Garcia, Sarah; Chen, Rong; Morgan, Alexander A; Leng, Nan; Kirk, Scott; Chen, Richard; Cook, Douglas J; Snyder, Michael; Steinberg, Gary K

    2016-01-01

    Moyamoya disease (MMD) is a rare disorder characterized by cerebrovascular occlusion and development of hemorrhage-prone collateral vessels. Approximately 10-12% of cases are familial, with a presumed low penetrance autosomal dominant pattern of inheritance. Diagnosis commonly occurs only after clinical presentation. The recent identification of the RNF213 founder mutation (p.R4810K) in the Asian population has made a significant contribution, but the etiology of this disease remains unclear. To further develop the variant landscape of MMD, we performed high-depth whole exome sequencing of 125 unrelated, predominantly nonfamilial, ethnically diverse MMD patients in parallel with 125 internally sequenced, matched controls using the same exome and analysis platform. Three subpopulations were established: Asian, Caucasian, and non-RNF213 founder mutation cases. We provided additional support for the previously observed RNF213 founder mutation (p.R4810K) in Asian cases (P = 6.01×10⁻⁵) that was enriched among East Asians compared to Southeast Asian and Pacific Islander cases (P = 9.52×10⁻⁴) and was absent in all Caucasian cases. The most enriched variant in Caucasian (P = 7.93×10⁻⁴) and non-RNF213 founder mutation (P = 1.51×10⁻³) cases was ZXDC (p.P562L), a gene involved in MHC Class II activation. Collapsing variant methodology ranked OBSCN, a gene involved in myofibrillogenesis, as most enriched in Caucasian (P = 1.07×10⁻⁴) and non-RNF213 founder mutation cases (P = 5.31×10⁻⁵). These findings further support the East Asian origins of the RNF213 (p.R4810K) variant and more fully describe the genetic landscape of multiethnic MMD, revealing novel, alternative candidate variants and genes that may be important in MMD etiology and diagnosis. PMID:26530418

  10. Sequencing of LRP2 reveals multiple rare variants associated with urinary trefoil factor-3.

    PubMed

    McMahon, Gearoid M; Olden, Matthias; Garnaas, Maija; Yang, Qiong; Liu, Xuan; Hwang, Shih-Jen; Larson, Martin G; Goessling, Wolfram; Fox, Caroline S

    2014-12-01

    Novel biomarkers are being investigated to identify patients with kidney disease. We measured a panel of 13 urinary biomarkers in participants from the Offspring Cohort of the Framingham Heart Study. Using an Affymetrix chip with imputation to 2.5 M single-nucleotide polymorphisms (SNPs), we conducted a GWAS of these biomarkers (n=2640) followed by exonic sequencing and genotyping. Functional studies in zebrafish were used to investigate histologic correlation with renal function. Across all 13 biomarkers, there were 97 significant SNPs at three loci. Lead SNPs at each locus were rs6555820 (P=6.7×10⁻⁴⁹; minor allele frequency [MAF]=0.49) in HAVCR1 (associated with kidney injury molecule-1), rs7565788 (P=2.15×10⁻¹⁶; MAF=0.22) in LRP2 (associated with trefoil factor 3 [TFF3]), and rs11048230 (P=4.77×10⁻⁸; MAF=0.10) in an intergenic region near RASSF8 (associated with vascular endothelial growth factor). Validation in the CKDGen Consortium (n=67,093) showed that only rs7565788 at LRP2, which encodes megalin, was associated with eGFR (P=0.003). Sequencing of exons 16-72 of LRP2 in 200 unrelated individuals at extremes of urinary TFF3 levels identified 197 variants (152 rare; MAF<0.05), 31 of which (27 rare) were nonsynonymous. In aggregate testing, rare variants were associated with urinary TFF3 levels (P=0.003), and the lead GWAS signal was not explained by these variants. Knockdown of LRP2 in zebrafish did not alter the renal phenotype in static or kidney injury models. In conclusion, this study revealed common variants associated with urinary levels of TFF3, kidney injury molecule-1, and vascular endothelial growth factor and identified a cluster of rare variants independently associated with TFF3. PMID:24876117

  11. Novel Pathogenic Variant (c.3178G>A) in the SMC1A Gene in a Family With Cornelia de Lange Syndrome Identified by Exome Sequencing

    PubMed Central

    Jang, Mi-Ae; Lee, Chang-Woo

    2015-01-01

    Cornelia de Lange syndrome (CdLS) is a clinically and genetically heterogeneous congenital anomaly. Mutations in the NIPBL gene account for half of the affected individuals. We describe a family with CdLS carrying a novel pathogenic variant of the SMC1A gene identified by exome sequencing. The proband was a 3-yr-old boy presenting with a developmental delay. He had distinctive facial features without major structural anomalies and tested negative for NIPBL gene mutations. His younger sister, mother, and maternal grandmother presented with mild mental retardation. By exome sequencing of the proband, a novel SMC1A variant, c.3178G>A, was identified, which was expected to cause an amino acid substitution (p.Glu1060Lys) in the highly conserved coiled-coil domain of the SMC1A protein. Sanger sequencing confirmed that the three female relatives with mental retardation also carry this variant. Our results reveal that SMC1A gene defects are associated with milder phenotypes of CdLS. Furthermore, we showed that exome sequencing could be a useful tool to identify pathogenic variants in patients with CdLS. PMID:26354354

  12. Novel pathogenic variant (c.3178G>A) in the SMC1A gene in a family with Cornelia de Lange syndrome identified by exome sequencing.

    PubMed

    Jang, Mi Ae; Lee, Chang Woo; Kim, Jin Kyung; Ki, Chang Seok

    2015-11-01

    Cornelia de Lange syndrome (CdLS) is a clinically and genetically heterogeneous congenital anomaly. Mutations in the NIPBL gene account for half of the affected individuals. We describe a family with CdLS carrying a novel pathogenic variant of the SMC1A gene identified by exome sequencing. The proband was a 3-yr-old boy presenting with a developmental delay. He had distinctive facial features without major structural anomalies and tested negative for NIPBL gene mutations. His younger sister, mother, and maternal grandmother presented with mild mental retardation. By exome sequencing of the proband, a novel SMC1A variant, c.3178G>A, was identified, which was expected to cause an amino acid substitution (p.Glu1060Lys) in the highly conserved coiled-coil domain of the SMC1A protein. Sanger sequencing confirmed that the three female relatives with mental retardation also carry this variant. Our results reveal that SMC1A gene defects are associated with milder phenotypes of CdLS. Furthermore, we showed that exome sequencing could be a useful tool to identify pathogenic variants in patients with CdLS. PMID:26354354

  13. Evaluation of Nine Somatic Variant Callers for Detection of Somatic Mutations in Exome and Targeted Deep Sequencing Data

    PubMed Central

    Krøigård, Anne Bruun; Thomassen, Mads; Lænkholm, Anne-Vibeke; Kruse, Torben A.; Larsen, Martin Jakob

    2016-01-01

    Next generation sequencing is extensively applied to catalogue somatic mutations in cancer, in research settings and increasingly in clinical settings for molecular diagnostics, guiding therapy decisions. Somatic variant callers perform paired comparisons of sequencing data from cancer tissue and matched normal tissue in order to detect somatic mutations. The advent of many new somatic variant callers creates a need for comparison and validation of the tools, as no de facto standard for detection of somatic mutations exists and only limited comparisons have been reported. We have performed a comprehensive evaluation using exome sequencing and targeted deep sequencing data of paired tumor-normal samples from five breast cancer patients to evaluate the performance of nine publicly available somatic variant callers: EBCall, Mutect, Seurat, Shimmer, Indelocator, Somatic Sniper, Strelka, VarScan 2 and Virmid for the detection of single nucleotide mutations and small deletions and insertions. We report a large variation in the number of calls from the nine somatic variant callers on the same sequencing data and highly variable agreement. Sequencing depth had markedly diverse impact on individual callers, as for some callers, increased sequencing depth highly improved sensitivity. For SNV calling, we report EBCall, Mutect, Virmid and Strelka to be the most reliable somatic variant callers for both exome sequencing and targeted deep sequencing. For indel calling, EBCall is superior due to high sensitivity and robustness to changes in sequencing depths. PMID:27002637

  14. Complete Genome Sequences of Two Genetically Distinct Variants of Porcine Epidemic Diarrhea Virus in the Eastern Region of Thailand

    PubMed Central

    Cheun-Arom, Thaniwan; Temeeyasen, Gun; Srijangwad, Anchalee; Tripipat, Thitima; Sangmalee, Suphattra; Vui, Dam Thi; Chuanasa, Taksina; Tantituvanont, Angkana

    2015-01-01

    Porcine epidemic diarrhea virus (PEDV) has continued to cause sporadic outbreaks in Thailand since 2007. Previously, PEDV in Thailand was a new variant containing an insertion and deletion in the spike gene. Herein, full-length genome sequences are reported for two variants of PEDV isolates from pigs displaying diarrhea in Thailand. PMID:26112783

  15. Sequence variants in oxytocin pathway genes and preterm birth: a candidate gene association study

    PubMed Central

    2013-01-01

    Background Preterm birth (PTB) is a complex disorder associated with significant neonatal mortality and morbidity and long-term adverse health consequences. Multiple lines of evidence suggest that genetic factors play an important role in its etiology. This study was designed to identify genetic variation associated with PTB in oxytocin pathway genes whose role in parturition is well known. Methods To identify common genetic variants predisposing to PTB, we genotyped 16 single nucleotide polymorphisms (SNPs) in the oxytocin (OXT), oxytocin receptor (OXTR), and leucyl/cystinyl aminopeptidase (LNPEP) genes in 651 case infants from the U.S. and one or both of their parents. In addition, we examined the role of rare genetic variation in susceptibility to PTB by conducting direct sequence analysis of OXTR in 1394 cases and 1112 controls from the U.S., Argentina, Denmark, and Finland. This study was further extended to maternal triads (maternal grandparents-mother of a case infant, N=309). We also performed in vitro analysis of selected rare OXTR missense variants to evaluate their functional importance. Results Maternal genetic effect analysis of the SNP genotype data revealed four SNPs in LNPEP that show significant association with prematurity. In our case–control sequence analysis, we detected fourteen coding variants in exon 3 of OXTR, all but four of which were found in cases only. Of the fourteen variants, three were previously unreported novel rare variants. When the sequence data from the maternal triads were analyzed using the transmission disequilibrium test, two common missense SNPs (rs4686302 and rs237902) in OXTR showed suggestive association for three gestational age subgroups. In vitro functional assays showed a significant difference in ligand binding between wild-type and two mutant receptors. Conclusions Our study suggests an association between maternal common polymorphisms in LNPEP and susceptibility to PTB. Maternal OXTR missense SNPs rs4686302

  16. Targeted next-generation sequencing reveals multiple deleterious variants in OPLL-associated genes.

    PubMed

    Chen, Xin; Guo, Jun; Cai, Tao; Zhang, Fengshan; Pan, Shengfa; Zhang, Li; Wang, Shaobo; Zhou, Feifei; Diao, Yinze; Zhao, Yanbin; Chen, Zhen; Liu, Xiaoguang; Chen, Zhongqiang; Liu, Zhongjun; Sun, Yu; Du, Jie

    2016-01-01

    Ossification of the posterior longitudinal ligament of the spine (OPLL), which is characterized by ectopic bone formation in the spinal ligaments, can cause spinal-cord compression. To date, at least 11 susceptibility genes have been genetically linked to OPLL. In order to identify potential deleterious alleles in these OPLL-associated genes, we designed a capture array encompassing all coding regions of the target genes for next-generation sequencing (NGS) in a cohort of 55 unrelated patients with OPLL. By bioinformatics analyses, we successfully identified three novel and five extremely rare variants (MAF < 0.005). These variants, which result in missense mutations in four OPLL-associated genes (i.e., COL6A1, COL11A2, FGFR1, and BMP2), were predicted to be deleterious by various commonly used algorithms. Furthermore, a potential effect of the BMP2 p.Q89E variant was confirmed by a markedly increased BMP2 level in peripheral blood samples from the patient carrying it. Notably, seven of the variants were associated with patients showing continuous-subtype changes on cervical spinal radiological analysis. Taken together, our findings revealed for the first time that deleterious coding variants of the four OPLL-associated genes are potentially pathogenic in patients with OPLL. PMID:27246988

  17. Targeted next-generation sequencing reveals multiple deleterious variants in OPLL-associated genes

    PubMed Central

    Chen, Xin; Guo, Jun; Cai, Tao; Zhang, Fengshan; Pan, Shengfa; Zhang, Li; Wang, Shaobo; Zhou, Feifei; Diao, Yinze; Zhao, Yanbin; Chen, Zhen; Liu, Xiaoguang; Chen, Zhongqiang; Liu, Zhongjun; Sun, Yu; Du, Jie

    2016-01-01

    Ossification of the posterior longitudinal ligament of the spine (OPLL), which is characterized by ectopic bone formation in the spinal ligaments, can cause spinal-cord compression. To date, at least 11 susceptibility genes have been genetically linked to OPLL. In order to identify potential deleterious alleles in these OPLL-associated genes, we designed a capture array encompassing all coding regions of the target genes for next-generation sequencing (NGS) in a cohort of 55 unrelated patients with OPLL. By bioinformatics analyses, we successfully identified three novel and five extremely rare variants (MAF < 0.005). These variants, which result in missense mutations in four OPLL-associated genes (i.e., COL6A1, COL11A2, FGFR1, and BMP2), were predicted to be deleterious by various commonly used algorithms. Furthermore, a potential effect of the BMP2 p.Q89E variant was confirmed by a markedly increased BMP2 level in peripheral blood samples from the patient carrying it. Notably, seven of the variants were associated with patients showing continuous-subtype changes on cervical spinal radiological analysis. Taken together, our findings revealed for the first time that deleterious coding variants of the four OPLL-associated genes are potentially pathogenic in patients with OPLL. PMID:27246988

  18. Large-scale identification of sequence variants impacting human transcription factor occupancy in vivo

    PubMed Central

    Maurano, Matthew T.; Haugen, Eric; Sandstrom, Richard; Vierstra, Jeff; Shafer, Anthony; Kaul, Rajinder; Stamatoyannopoulos, John A.

    2015-01-01

    The function of human regulatory regions depends exquisitely on their local genomic environment and cellular context, complicating experimental analysis of the expanding pool of common disease- and trait-associated variants that localize within regulatory DNA. We leverage allelically resolved genomic DNaseI footprinting data encompassing 166 individuals and 114 cell types to identify >60,000 common variants that directly impact transcription factor occupancy and regulatory DNA accessibility in vivo. The unprecedented scale of these data enables systematic analysis of the impact of sequence variation on transcription factor occupancy in vivo. We leverage this analysis to develop accurate models of variation affecting the recognition sites for diverse transcription factors, and apply these models to discriminate nearly 500,000 common regulatory variants likely to affect transcription factor occupancy across the human genome. The approach and results provide a novel foundation for analysis and interpretation of noncoding variation in complete human genomes, and for systems-level investigation of disease-associated variants. PMID:26502339

  19. Hierarchical Bayesian model for rare variant association analysis integrating genotype uncertainty in human sequence data.

    PubMed

    He, Liang; Pitkäniemi, Janne; Sarin, Antti-Pekka; Salomaa, Veikko; Sillanpää, Mikko J; Ripatti, Samuli

    2015-02-01

    Next-generation sequencing (NGS) has led to the study of rare genetic variants, which possibly explain the missing heritability for complex diseases. Most existing methods for rare variant (RV) association detection do not account for the common presence of sequencing errors in NGS data. The errors can largely affect the power and perturb the accuracy of association tests due to rare observations of minor alleles. We developed a hierarchical Bayesian approach to estimate the association between RVs and complex diseases. Our integrated framework combines the misclassification probability with shrinkage-based Bayesian variable selection. It allows for flexibility in handling neutral and protective RVs with measurement error, and is robust enough for detecting causal RVs with a wide spectrum of minor allele frequency (MAF). Imputation uncertainty and MAF are incorporated into the integrated framework to achieve the optimal statistical power. We demonstrate that sequencing error does significantly affect the findings, and our proposed model can take advantage of it to improve statistical power in both simulated and real data. We further show that our model outperforms existing methods, such as sequence kernel association test (SKAT). Finally, we illustrate the behavior of the proposed method using a Finnish low-density lipoprotein cholesterol study, and show that it identifies an RV known as FH North Karelia in LDLR gene with three carriers in 1,155 individuals, which is missed by both SKAT and Granvil. PMID:25395270

  20. Identification of novel functional sequence variants in the gene for peptidase inhibitor 3

    PubMed Central

    Chowdhury, Mahboob A; Kuivaniemi, Helena; Romero, Roberto; Edwin, Samuel; Chaiworapongsa, Tinnakorn; Tromp, Gerard

    2006-01-01

    Background: Peptidase inhibitor 3 (PI3) inhibits neutrophil elastase and proteinase-3, and has a potential role in skin and lung diseases as well as in cancer. Genome-wide expression profiling of chorioamniotic membranes revealed decreased expression of PI3 in women with preterm premature rupture of membranes. To elucidate the molecular mechanisms contributing to the decreased expression in amniotic membranes, the PI3 gene was searched for sequence variations and the functional significance of the identified promoter variants was studied. Methods: Single nucleotide polymorphisms (SNPs) were identified by direct sequencing of PCR products spanning a region from 1,173 bp upstream to 1,266 bp downstream of the translation start site. Fourteen SNPs were genotyped in 112 unrelated individuals and nine SNPs in 24. Putative transcription factor binding sites detected by in silico search were verified by electrophoretic mobility shift assay (EMSA) using nuclear extracts from HeLa and amnion cells. Deviation from Hardy-Weinberg equilibrium (HWE) was tested by a χ² goodness-of-fit test. Haplotypes were estimated using the expectation-maximization (EM) algorithm. Results: Twenty-three sequence variations were identified by direct sequencing of polymerase chain reaction (PCR) products covering 2,439 nt of the PI3 gene (-1,173 nt of promoter sequences and all three exons). Analysis of 112 unrelated individuals showed that 20 variants had minor allele frequencies (MAF) ranging from 0.02 to 0.46 representing "true polymorphisms", while three had MAF ≤ 0.01. Eleven variants were in the promoter region; several putative transcription factor binding sites were found at these sites by database searches. Differential binding of transcription factors was demonstrated at two polymorphic sites by electrophoretic mobility shift assays, both in amniotic and HeLa cell nuclear extracts. Differential binding of the transcription factor GATA1 at -689C>G site was confirmed by a
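
    The Hardy-Weinberg goodness-of-fit test named in the Methods above can be sketched for a single biallelic SNP as follows. The genotype counts in the example are hypothetical (the abstract reports none), and the one-degree-of-freedom p-value is an assumption of this sketch, not a detail taken from the paper.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test for Hardy-Weinberg equilibrium.

    Takes observed genotype counts for a biallelic site and returns
    (frequency of allele A, chi-square statistic, p-value with 1 df).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # allele A frequency from genotype counts
    q = 1.0 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    # Skip classes with zero expectation (monomorphic sites).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return p, chi2, p_value


# Hypothetical genotype counts for 112 individuals (not from the study).
freq_a, chi2, p_value = hwe_chi_square(50, 50, 12)
print(f"p(A) = {freq_a:.3f}, chi2 = {chi2:.3f}, p-value = {p_value:.3f}")
```

    For sites with very low minor allele frequencies, such as the three variants with MAF ≤ 0.01, expected counts become small and an exact test is generally preferred over the χ² approximation.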

  1. Molecular cloning and nucleotide sequence of cDNA for human glucose-6-phosphate dehydrogenase variant A(-)

    SciTech Connect

    Hirono, A.; Beutler, E.

    1988-06-01

    Glucose-6-phosphate dehydrogenase A(-) is a common variant in Blacks that causes sensitivity to drug- and infection-induced hemolytic anemia. A cDNA library was constructed from Epstein-Barr virus-transformed lymphoblastoid cells from a male who was G6PD A(-). One of four cDNA clones isolated contained a sequence not found in the other clones nor in the published cDNA sequence. Consisting of 138 bases and coding for 46 amino acids, this segment of cDNA apparently is derived from alternative splicing involving the 3′ end of intron 7. Comparison of the remaining sequences of these clones with the published sequence revealed three nucleotide substitutions: C³³→G, G²⁰²→A, and A³⁷⁶→G. Each change produces a new restriction site. Genomic DNA from five G6PD A(-) individuals was amplified by the polymerase chain reaction. The finding of the same mutation in G6PD A(-) as is found in G6PD A(+) strongly suggests that the G6PD A(-) mutation arose in an individual with G6PD A(+), adding another mutation that causes the in vivo instability of this enzyme protein.

  2. Homozygous sequence variants in the FKBP10 gene underlie osteogenesis imperfecta in consanguineous families.

    PubMed

    Umair, Muhammad; Hassan, Annum; Jan, Abid; Ahmad, Farooq; Imran, Muhammad; Samman, Muhammad I; Basit, Sulman; Ahmad, Wasim

    2016-03-01

    Osteogenesis imperfecta (OI, MIM 610968) is a genetically and clinically heterogeneous disorder characterized by bone fragility. It is one of the rare forms of skeletal deformity caused by sequence variants in at least 14 different genes, including FKBP10 (MIM 607063) encoding protein FKBP65. Here we present three consanguineous families of Pakistani origin segregating OI in an autosomal-recessive pattern. Genotyping using either single-nucleotide polymorphism markers by Affymetrix GeneChip Human Mapping 250K Nsp array or polymorphic microsatellite markers revealed a homozygous region, containing a candidate gene FKBP10, among affected members on chromosome 17q21.2. Sequencing the FKBP10 gene revealed a homozygous novel nonsense variant (c.1490G>A, p.Trp497*) in the family A and two previously reported variants, including a missense (c.344G>A, p.Arg115Gln), in the family B and duplication of a nucleotide C (c.831dupC, p.Gly278ArgfsX295) in the family C. Our findings further extend the body of evidence that supports the importance of FKBP10 gene in the development of skeletal system. PMID:26538303

  3. Predicting Mendelian Disease-Causing Non-Synonymous Single Nucleotide Variants in Exome Sequencing Studies

    PubMed Central

    Bao, Su-Ying; Yang, Wanling; Ho, Shu-Leong; Song, Yong-Qiang; Sham, Pak C.

    2013-01-01

    Exome sequencing is becoming a standard tool for mapping Mendelian disease-causing (or pathogenic) non-synonymous single nucleotide variants (nsSNVs). Minor allele frequency (MAF) filtering approach and functional prediction methods are commonly used to identify candidate pathogenic mutations in these studies. Combining multiple functional prediction methods may increase accuracy in prediction. Here, we propose to use a logit model to combine multiple prediction methods and compute an unbiased probability of a rare variant being pathogenic. Also, for the first time we assess the predictive power of seven prediction methods (including SIFT, PolyPhen2, CONDEL, and logit) in predicting pathogenic nsSNVs from other rare variants, which reflects the situation after MAF filtering is done in exome-sequencing studies. We found that a logit model combining all or some original prediction methods outperforms other methods examined, but is unable to discriminate between autosomal dominant and autosomal recessive disease mutations. Finally, based on the predictions of the logit model, we estimate that an individual has around 5% of rare nsSNVs that are pathogenic and carries ∼22 pathogenic derived alleles at least, which if made homozygous by consanguineous marriages may lead to recessive diseases. PMID:23341771
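
    The idea of combining several prediction methods through a logit model can be sketched as follows. This is only an illustration: the intercept, the weights, and the assumption that every tool's score has been rescaled to [0, 1] with higher meaning more damaging are hypothetical and are not taken from the study.

```python
import math

# Hypothetical coefficients for illustration only; the fitted values
# from the study are not given in the abstract.
INTERCEPT = -3.0
WEIGHTS = {"sift": 2.0, "polyphen2": 1.5, "condel": 1.0}


def pathogenic_probability(scores):
    """Combine per-tool scores into one probability with a logistic model.

    Assumes each score is already rescaled to [0, 1], higher = more damaging.
    """
    z = INTERCEPT + sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


benign_like = pathogenic_probability({"sift": 0.1, "polyphen2": 0.05, "condel": 0.2})
damaging_like = pathogenic_probability({"sift": 0.95, "polyphen2": 0.99, "condel": 0.9})
print(f"{benign_like:.3f} {damaging_like:.3f}")
```

    In practice the coefficients would be fitted on labelled pathogenic and rare neutral variants rather than chosen by hand.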

  4. Predicting mendelian disease-causing non-synonymous single nucleotide variants in exome sequencing studies.

    PubMed

    Li, Miao-Xin; Kwan, Johnny S H; Bao, Su-Ying; Yang, Wanling; Ho, Shu-Leong; Song, Yong-Qiang; Sham, Pak C

    2013-01-01

    Exome sequencing is becoming a standard tool for mapping Mendelian disease-causing (or pathogenic) non-synonymous single nucleotide variants (nsSNVs). Minor allele frequency (MAF) filtering approach and functional prediction methods are commonly used to identify candidate pathogenic mutations in these studies. Combining multiple functional prediction methods may increase accuracy in prediction. Here, we propose to use a logit model to combine multiple prediction methods and compute an unbiased probability of a rare variant being pathogenic. Also, for the first time we assess the predictive power of seven prediction methods (including SIFT, PolyPhen2, CONDEL, and logit) in predicting pathogenic nsSNVs from other rare variants, which reflects the situation after MAF filtering is done in exome-sequencing studies. We found that a logit model combining all or some original prediction methods outperforms other methods examined, but is unable to discriminate between autosomal dominant and autosomal recessive disease mutations. Finally, based on the predictions of the logit model, we estimate that an individual has around 5% of rare nsSNVs that are pathogenic and carries ~22 pathogenic derived alleles at least, which if made homozygous by consanguineous marriages may lead to recessive diseases. PMID:23341771

  5. Genetics Talks to Epigenetics? The Interplay Between Sequence Variants and Chromatin Structure

    PubMed Central

    Zaina, Silvio; Pérez-Luque, Elva L; Lund, Gertrud

    2010-01-01

    Transcription is regulated by two major mechanisms. On the one hand, changes in DNA sequence are responsible for genetic gene regulation. On the other hand, chromatin structure regulates gene activity at the epigenetic level. Given the fundamental participation of these mechanisms in transcriptional regulation of virtually any gene, they are likely to co-regulate a significant proportion of the genome. The simple concept behind this idea is that a mutation may have a significant impact on local chromatin structure by modifying DNA methylation patterns or histone type recruitment. Yet, the relevance of these interactions is poorly understood. Elucidating how genetic and epigenetic mechanisms co-participate in regulating transcription may assist in some of the unresolved cases of genetic variant-phenotype association. One example is loci that have biologically predictable functions but genotypes that fail to correlate with phenotype, particularly disease outcome. Conversely, a crosstalk between genetics and epigenetics may provide a mechanistic explanation for cases in which a convincing association between phenotype and a genetic variant has been established, but the latter does not lie in a promoter or protein coding sequence. Here, we review recently published data in the field and discuss their implications for genetic variant-phenotype association studies. PMID:21286314

  6. Diversity of acid stress resistant variants of Listeria monocytogenes and the potential role of ribosomal protein S21 encoded by rpsU

    PubMed Central

    Metselaar, Karin I.; den Besten, Heidy M. W.; Boekhorst, Jos; van Hijum, Sacha A. F. T.; Zwietering, Marcel H.; Abee, Tjakko

    2015-01-01

    The dynamic response of microorganisms to environmental conditions depends on the behavior of individual cells within the population. Adverse environments can select for stable stress resistant subpopulations. In this study, we aimed to get more insight into the diversity within Listeria monocytogenes LO28 populations, and the genetic basis for the increased resistance of stable resistant fractions isolated after acid exposure. Phenotypic cluster analysis of 23 variants resulted in three clusters and four individual variants and revealed multiple-stress resistance, with both unique and overlapping features related to stress resistance, growth, motility, biofilm formation, and virulence indicators. A higher glutamate decarboxylase activity correlated with increased acid resistance. Whole genome sequencing revealed mutations in rpsU, encoding ribosomal protein S21, in the largest phenotypic cluster, while mutations in ctsR, which were previously shown to be responsible for increased resistance of heat and high hydrostatic pressure resistant variants, were not found in the acid resistant variants. This underlined that large population diversity exists within one L. monocytogenes strain and that different adverse conditions drive selection for different variants. The finding that acid stress selects for rpsU variants provides potential insights into the mechanisms underlying population diversity of L. monocytogenes. PMID:26005439

  7. A variant of Plasmodium ovale; analysis of its 18S ribosomal RNA gene sequence.

    PubMed

    Miyake, H; Suwa, S; Kimura, M; Wataya, Y

    1997-01-01

    We report here a new variant of a human malaria parasite, found by comparing diagnostic results obtained with a new DNA diagnostic method, microtiter plate-hybridization (MPH), and with the traditional microscopic method. A total of five cases of malaria were diagnosed as microscopy-positive but MPH-negative; one case was found during epidemiological research in Vietnam and four cases were imported malaria in Japan. Although the parasites were morphologically quite similar to typical P. ovale under microscopy, sequence analysis of a PCR-amplified DNA fragment revealed that their 18S ribosomal RNA gene sequence differed from the published sequence of P. ovale. Combining MPH with microscopic examination provides a new method for detecting a new type of malaria parasite that is difficult to distinguish morphologically. PMID:9586115

  8. Sequencing the IL4 locus in African Americans implicates rare noncoding variants in asthma susceptibility

    PubMed Central

    Haller, Gabe; Torgerson, Dara G.; Ober, Carole; Thompson, Emma E

    2014-01-01

    Background: Common genetic variations in the IL4 gene have been associated with asthma and atopy in European and Asian populations, but not in African Americans. Objective: Because populations of African descent have increased levels of genetic variation compared to other populations, particularly with respect to low frequency or rare variants, we hypothesized that rare variants in the IL4 gene contribute to the development of asthma in African Americans. Methods: To test this hypothesis, we sequenced the IL4 locus in 72 African Americans with asthma and 70 African American non-asthmatic controls to identify novel and rare polymorphisms in the IL4 gene that may be contributing to asthma susceptibility. Results: We report an excess of private non-coding SNPs in the subjects with asthma compared to non-asthmatic control subjects (P=0.031). Tajima’s D is significantly more negative in cases (−0.375) compared to controls (−0.073) (P=0.04), reflecting an excess of rare variants in the cases. Conclusions: Our findings indicate that SNPs at the IL4 locus that are potentially exclusive to African Americans are associated with susceptibility to asthma. Only three of the 26 private SNPs (i.e., SNPs present only in the cases or only in the controls) are tagged by single SNPs on one of the common genotyping platforms used in genome-wide association studies. We also find that most of the private SNPs cannot be reliably imputed, highlighting the importance of sequencing to identify genetic variants contributing to common diseases in African Americans. PMID:19910025
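
    For readers unfamiliar with Tajima's D, the statistic cited in this abstract, the sketch below computes it from aligned haplotypes using the standard formula (Tajima 1989). The toy haplotypes are invented for illustration and have no connection to the IL4 data. A sample dominated by rare variants yields a negative D, which is the pattern the abstract reports for cases.

```python
import math
from itertools import combinations


def tajimas_d(haplotypes):
    """Tajima's D for equal-length haplotype strings, one per sampled chromosome.

    Returns None when D is undefined (fewer than 4 sequences or no
    segregating sites).
    """
    n = len(haplotypes)
    # S: number of segregating (polymorphic) alignment columns.
    s = sum(1 for column in zip(*haplotypes) if len(set(column)) > 1)
    if n < 4 or s == 0:
        return None
    # pi: mean number of pairwise differences.
    pairs = list(combinations(haplotypes, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Toy data: five singleton variants versus five variants at 50% frequency.
rare_only = ["AAAAA"] * 5 + ["TAAAA", "ATAAA", "AATAA", "AAATA", "AAAAT"]
balanced = ["AAAAA"] * 5 + ["TTTTT"] * 5
print(tajimas_d(rare_only), tajimas_d(balanced))
```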

  9. Identification of two novel SMCHD1 sequence variants in families with FSHD-like muscular dystrophy.

    PubMed

    Winston, Jincy; Duerden, Laura; Mort, Matthew; Frayling, Ian M; Rogers, Mark T; Upadhyaya, Meena

    2015-01-01

    Facioscapulohumeral muscular dystrophy 1 (FSHD1) is caused by a contraction in the number of D4Z4 repeats on chromosome 4, resulting in relaxation of D4Z4 chromatin and inappropriate expression of DUX4 in skeletal muscle. Clinical severity is inversely related to the number of repeats. In contrast, FSHD2 patients also have inappropriate expression of DUX4 in skeletal muscle, but due to constitutional mutations in SMCHD1 (structural maintenance of chromosomes flexible hinge domain containing 1), which cause global hypomethylation and hence general relaxation of chromatin. Thirty patients originally referred for FSHD testing were screened for SMCHD1 mutations. Twenty-nine had >11 D4Z4 repeats. SMCHD1 c.1040+1G>A, a pathogenic splice-site variant, was identified in an FSHD1 family with a borderline number of D4Z4 repeats (10) and a variable phenotype (in which a LMNA1 sequence variant was previously described), and SMCHD1 c.2606G>T, a putative missense variant (p.Gly869Val) with strong in vitro indications of pathogenicity, was identified in a family with an unusual muscular dystrophy with some FSHD-like features. The two families described here emphasise the genetic complexity of muscular dystrophies. As SMCHD1 has a wider role in global genomic methylation, the possibility exists that it could be involved in other complex undiagnosed muscle disorders. Thus far, only 15 constitutional mutations have been identified in SMCHD1, and these two sequence variants add to the molecular and phenotypic spectrum associated with FSHD. PMID:24755953

  10. The Use of Non-Variant Sites to Improve the Clinical Assessment of Whole-Genome Sequence Data

    PubMed Central

    Griggio, Francesca; Garonzi, Marianna; Cantaloni, Chiara; Centomo, Cesare; Vargas, Sergio Marin; Descombes, Patrick; Marquis, Julien; Collino, Sebastiano; Franceschi, Claudio; Garagnani, Paolo; Salisbury, Benjamin A.; Harvey, John Max; Delledonne, Massimo

    2015-01-01

    Genetic testing, which is now a routine part of clinical practice and disease management protocols, is often based on the assessment of small panels of variants or genes. On the other hand, continuous improvements in the speed and per-base costs of sequencing have now made whole exome sequencing (WES) and whole genome sequencing (WGS) viable strategies for targeted or complete genetic analysis, respectively. Standard WGS/WES data analytical workflows generally rely on calling sequence variants with respect to the reference genome sequence. However, the reference genome sequence contains a large number of sites represented by rare alleles, by known pathogenic alleles and by alleles strongly associated with disease by GWAS. It is thus critical, for clinical applications of WGS and WES, to determine whether non-variant sites are homozygous for the reference allele or whether the corresponding genotype cannot be reliably called. Here we show that an alternative analytical approach based on the analysis of both variant and non-variant sites from WGS data allows genotyping of more than 92% of sites corresponding to known SNPs, compared to 6% genotyped by standard variant analysis. These include homozygous reference sites of clinical interest, thus leading to a broad and comprehensive characterization of variation necessary for an accurate evaluation of disease risk. Altogether, our findings indicate that characterization of both variant and non-variant clinically informative sites in the genome is necessary to allow an accurate clinical assessment of a personal genome. Finally, we propose a highly efficient extended VCF (eVCF) file format that stores genotype calls for sites of clinical interest while remaining compatible with current variant interpretation software. PMID:26147798

  11. BI-29 VARIANT ANALYSIS OF PRIMARY AND RECURRENT GLIOBLASTOMA USING ION AMPLISEQ™ COMPREHENSIVE CANCER PANEL AND WHOLE EXOME SEQUENCING

    PubMed Central

    Virk, Selene; Gibson, Richard; Barnholtz-Sloan, Jill; Quinones-Mateu, Miguel

    2014-01-01

    BACKGROUND: Glioblastoma is the most deadly and frequently occurring adult primary brain tumor. The characterization of genetic variants and molecular signatures in glioblastoma is heavily reliant upon genomic sequencing. The availability of rapid and economical sequencing platforms is necessary for the widespread adoption of high-throughput sequencing in the clinical environment. METHODS: Utilizing patient-matched triplet samples consisting of normal blood and snap-frozen primary and recurrent glioblastoma tumor samples from the Ohio Brain Tumor Study, we compared whole exome sequencing data from TCGA to sequencing data obtained from Ion AmpliSeq™ Comprehensive Cancer Panel (CCP). RESULTS: As we anticipated, the number of variants identified from the exome sequencing data (n = 619) was greater than those identified from the Ion AmpliSeq™ CCP data (n = 22). Surprisingly, there were only six variants common across both data sets. In addition, none of the variants from the Ion AmpliSeq™ CCP data were shared across patient samples. CONCLUSIONS: Our pilot results suggest disparities in both the number and category of mutations identified from analysis of data generated from the Ion AmpliSeq™ CCP and whole exome sequencing. Future studies are needed to elucidate the nature of these differences and to determine the clinical relevance of variants that may be associated with glioblastoma recurrence and response to treatment. High-throughput sequencing based cancer panels may be improved by the development of brain tumor specific panels.

  12. ROVER variant caller: read-pair overlap considerate variant-calling software applied to PCR-based massively parallel sequencing datasets

    PubMed Central

    2014-01-01

    Background: We recently described Hi-Plex, a highly multiplexed PCR-based target-enrichment system for massively parallel sequencing (MPS), which allows the uniform definition of library size so that subsequent paired-end sequencing can achieve complete overlap of read pairs. Variant calling from Hi-Plex-derived datasets can thus rely on the identification of variants appearing in both reads of read-pairs, permitting stringent filtering of sequencing chemistry-induced errors. These principles underlie ROVER software (derived from Read Overlap PCR-MPS variant caller), which we have recently used to report the screening for genetic mutations in the breast cancer predisposition gene PALB2. Here, we describe the algorithms underlying ROVER and its usage. Results: ROVER enables users to quickly and accurately identify genetic variants from PCR-targeted, overlapping paired-end MPS datasets. The open-source availability of the software and threshold tailorability enables broad access for a range of PCR-MPS users. Methods: ROVER is implemented in Python and runs on all popular POSIX-like operating systems (Linux, OS X). The software accepts a tab-delimited text file listing the coordinates of the target-specific primers used for targeted enrichment based on a specified genome-build. It also accepts aligned sequence files resulting from mapping to the same genome-build. ROVER identifies the amplicon a given read-pair represents and removes the primer sequences by using the mapping co-ordinates and primer co-ordinates. It considers overlapping read-pairs with respect to primer-intervening sequence. Only when a variant is observed in both reads of a read-pair does the signal contribute to a tally of read-pairs containing or not containing the variant. A user-defined threshold informs the minimum number of, and proportion of, read-pairs a variant must be observed in for a ‘call’ to be made. ROVER also reports the depth of coverage across amplicons to facilitate the
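
    The read-pair overlap rule described above can be sketched in a few lines. This is not ROVER's own code: the function name, the variant labels, and the default thresholds are hypothetical, and alignment, amplicon assignment, and primer trimming are assumed to have been done already.

```python
from collections import Counter


def call_variants(read_pairs, min_pairs=2, min_proportion=0.15):
    """Call variants for one amplicon from overlapping read pairs.

    read_pairs: list of (variants_in_read1, variants_in_read2), each a set
    of hashable variant labels. A variant is tallied for a pair only when
    both reads of that pair show it. Thresholds here are illustrative.
    """
    total = len(read_pairs)
    tally = Counter()
    for read1_variants, read2_variants in read_pairs:
        for variant in read1_variants & read2_variants:
            tally[variant] += 1
    return {
        variant: count
        for variant, count in tally.items()
        if count >= min_pairs and count / total >= min_proportion
    }


# Toy amplicon with 10 read pairs (labels are made up).
pairs = (
    [({"chr1:100A>G"}, {"chr1:100A>G"})] * 3   # seen in both reads
    + [({"chr1:105C>T"}, set())] * 2           # seen in one read only
    + [(set(), set())] * 5                     # reference-only pairs
)
calls = call_variants(pairs)
print(calls)
```

    The variant seen in only one read of each pair never enters the tally, which is how this kind of filter discards likely sequencing-chemistry errors.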

  13. Detection of nucleic acid sequences by invader-directed cleavage

    DOEpatents

    Brow, Mary Ann D.; Hall, Jeff Steven Grotelueschen; Lyamichev, Victor; Olive, David Michael; Prudent, James Robert

    1999-01-01

    The present invention relates to means for the detection and characterization of nucleic acid sequences, as well as variations in nucleic acid sequences. The present invention also relates to methods for forming a nucleic acid cleavage structure on a target sequence and cleaving the nucleic acid cleavage structure in a site-specific manner. The 5' nuclease activity of a variety of enzymes is used to cleave the target-dependent cleavage structure, thereby indicating the presence of specific nucleic acid sequences or specific variations thereof. The present invention further relates to methods and devices for the separation of nucleic acid molecules based on charge.

  14. Genetic Mapping and Exome Sequencing Identify Variants Associated with Five Novel Diseases

    PubMed Central

    Puffenberger, Erik G.; Jinks, Robert N.; Sougnez, Carrie; Cibulskis, Kristian; Willert, Rebecca A.; Achilly, Nathan P.; Cassidy, Ryan P.; Fiorentini, Christopher J.; Heiken, Kory F.; Lawrence, Johnny J.; Mahoney, Molly H.; Miller, Christopher J.; Nair, Devika T.; Politi, Kristin A.; Worcester, Kimberly N.; Setton, Roni A.; DiPiazza, Rosa; Sherman, Eric A.; Eastman, James T.; Francklyn, Christopher; Robey-Bond, Susan; Rider, Nicholas L.; Gabriel, Stacey; Morton, D. Holmes; Strauss, Kevin A.

    2012-01-01

    The Clinic for Special Children (CSC) has integrated biochemical and molecular methods into a rural pediatric practice serving Old Order Amish and Mennonite (Plain) children. Among the Plain people, we have used single nucleotide polymorphism (SNP) microarrays to genetically map recessive disorders to large autozygous haplotype blocks (mean = 4.4 Mb) that contain many genes (mean = 79). For some, uninformative mapping or large gene lists preclude disease-gene identification by Sanger sequencing. Seven such conditions were selected for exome sequencing at the Broad Institute; all had been previously mapped at the CSC using low density SNP microarrays coupled with autozygosity and linkage analyses. Using between 1 and 5 patient samples per disorder, we identified sequence variants in the known disease-causing genes SLC6A3 and FLVCR1, and present evidence to strongly support the pathogenicity of variants identified in TUBGCP6, BRAT1, SNIP1, CRADD, and HARS. Our results reveal the power of coupling new genotyping technologies to population-specific genetic knowledge and robust clinical data. PMID:22279524

  15. RefCNV: Identification of Gene-Based Copy Number Variants Using Whole Exome Sequencing

    PubMed Central

    Chang, Lun-Ching; Das, Biswajit; Lih, Chih-Jian; Si, Han; Camalier, Corinne E.; McGregor, Paul M.; Polley, Eric

    2016-01-01

    With rapid advances in DNA sequencing technologies, whole exome sequencing (WES) has become a popular approach for detecting somatic mutations in oncology studies. The initial intent of WES was to characterize single nucleotide variants, but it was observed that the number of sequencing reads that mapped to a genomic region correlated with the DNA copy number variants (CNVs). We propose a method RefCNV that uses a reference set to estimate the distribution of the coverage for each exon. The construction of the reference set includes an evaluation of the sources of variability in the coverage distribution. We observed that the processing steps had an impact on the coverage distribution. For each exon, we compared the observed coverage with the expected normal coverage. Thresholds for determining CNVs were selected to control the false-positive error rate. RefCNV prediction correlated significantly (r = 0.96–0.86) with CNV measured by digital polymerase chain reaction for MET (7q31), EGFR (7p12), or ERBB2 (17q12) in 13 tumor cell lines. The genome-wide CNV analysis showed a good overall correlation (Spearman’s coefficient = 0.82) between RefCNV estimation and publicly available CNV data in Cancer Cell Line Encyclopedia. RefCNV also showed better performance than three other CNV estimation methods in genome-wide CNV analysis. PMID:27147817
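
    The comparison of an exon's observed coverage with its expected coverage from a reference set can be illustrated with a simple z-score on log2 coverage. This is a simplified stand-in, not the RefCNV statistic itself: the function name, the threshold of 3, and the toy coverage values are assumptions, and coverage is assumed to be normalized, positive, and not identical across all reference samples.

```python
import math
import statistics


def call_exon_cnv(observed, reference_coverages, z_threshold=3.0):
    """Classify one exon as 'gain', 'loss' or 'neutral'.

    observed: normalized coverage of the exon in the test sample.
    reference_coverages: normalized coverage of the same exon in the
    reference samples (at least two values, not all equal).
    """
    ref_logs = [math.log2(c) for c in reference_coverages]
    mean = statistics.mean(ref_logs)
    sd = statistics.stdev(ref_logs)
    z = (math.log2(observed) - mean) / sd
    if z > z_threshold:
        return "gain"
    if z < -z_threshold:
        return "loss"
    return "neutral"


reference = [100, 110, 95, 105, 90, 102]  # toy reference-set coverage
print(call_exon_cnv(400, reference), call_exon_cnv(20, reference), call_exon_cnv(101, reference))
```

    Raising the threshold trades sensitivity for a lower false-positive rate, which mirrors the role of the thresholds mentioned in the abstract.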

  16. Integrating Sequencing Technologies in Personal Genomics: Optimal Low Cost Reconstruction of Structural Variants

    PubMed Central

    Du, Jiang; Bjornson, Robert D.; Zhang, Zhengdong D.; Kong, Yong; Snyder, Michael; Gerstein, Mark B.

    2009-01-01

    The goal of human genome re-sequencing is obtaining an accurate assembly of an individual's genome. Recently, there has been great excitement in the development of many technologies for this (e.g. medium and short read sequencing from companies such as 454 and SOLiD, and high-density oligo-arrays from Affymetrix and NimbleGen), with even more expected to appear. The costs and sensitivities of these technologies differ considerably from each other. As an important goal of personal genomics is to reduce the cost of re-sequencing to an affordable point, it is worthwhile to consider optimally integrating technologies. Here, we build a simulation toolbox that will help us optimally combine different technologies for genome re-sequencing, especially in reconstructing large structural variants (SVs). SV reconstruction is considered the most challenging step in human genome re-sequencing. (It is sometimes even harder than de novo assembly of small genomes because of the duplications and repetitive sequences in the human genome.) To this end, we formulate canonical problems that are representative of issues in reconstruction and are of small enough scale to be computationally tractable and simulatable. Using semi-realistic simulations, we show how we can combine different technologies to optimally solve the assembly at low cost. With mappability maps, our simulations efficiently handle the inhomogeneous repeat-containing structure of the human genome and the computational complexity of practical assembly algorithms. They quantitatively show how combining different read lengths is more cost-effective than using one length, how an optimal mixed sequencing strategy for reconstructing large novel SVs usually also gives accurate detection of SNPs/indels, how paired-end reads can improve reconstruction efficiency, and how adding in arrays is more efficient than just sequencing for disentangling some complex SVs. Our strategy should facilitate the sequencing of human genomes at

  17. EXCAVATOR: detecting copy number variants from whole-exome sequencing data.

    PubMed

    Magi, Alberto; Tattini, Lorenzo; Cifola, Ingrid; D'Aurizio, Romina; Benelli, Matteo; Mangano, Eleonora; Battaglia, Cristina; Bonora, Elena; Kurg, Ants; Seri, Marco; Magini, Pamela; Giusti, Betti; Romeo, Giovanni; Pippucci, Tommaso; De Bellis, Gianluca; Abbate, Rosanna; Gensini, Gian Franco

    2013-01-01

    We developed a novel software tool, EXCAVATOR, for the detection of copy number variants (CNVs) from whole-exome sequencing data. EXCAVATOR combines a three-step normalization procedure with a novel heterogeneous hidden Markov model algorithm and a calling method that classifies genomic regions into five copy number states. We validate EXCAVATOR on three datasets and compare the results with three other methods. These analyses show that EXCAVATOR outperforms the other methods and is therefore a valuable tool for the investigation of CNVs in large-scale projects, as well as in clinical research and diagnostics. EXCAVATOR is freely available at http://sourceforge.net/projects/excavatortool/. PMID:24172663

  18. An efficient and scalable analysis framework for variant extraction and refinement from population-scale DNA sequence data

    PubMed Central

    Jun, Goo; Wing, Mary Kate; Abecasis, Gonçalo R.; Kang, Hyun Min

    2015-01-01

    The analysis of next-generation sequencing data is computationally and statistically challenging because of the massive volume of data and imperfect data quality. We present GotCloud, a pipeline for efficiently detecting and genotyping high-quality variants from large-scale sequencing data. GotCloud automates sequence alignment, sample-level quality control, variant calling, filtering of likely artifacts using machine-learning techniques, and genotype refinement using haplotype information. The pipeline can process thousands of samples in parallel and requires fewer computational resources than current alternatives. Experiments with whole-genome and exome-targeted sequence data generated by the 1000 Genomes Project show that the pipeline provides effective filtering against false positive variants and high power to detect true variants. Our pipeline has already contributed to variant detection and genotyping in several large-scale sequencing projects, including the 1000 Genomes Project and the NHLBI Exome Sequencing Project. We hope it will now prove useful to many medical sequencing studies. PMID:25883319

  19. VarDict: a novel and versatile variant caller for next-generation sequencing in cancer research.

    PubMed

    Lai, Zhongwu; Markovets, Aleksandra; Ahdesmaki, Miika; Chapman, Brad; Hofmann, Oliver; McEwen, Robert; Johnson, Justin; Dougherty, Brian; Barrett, J Carl; Dry, Jonathan R

    2016-06-20

    Accurate variant calling in next generation sequencing (NGS) is critical to understand cancer genomes better. Here we present VarDict, a novel and versatile variant caller for both DNA- and RNA-sequencing data. VarDict simultaneously calls SNV, MNV, InDels, complex and structural variants, expanding the detected genetic driver landscape of tumors. It performs local realignments on the fly for more accurate allele frequency estimation. VarDict performance scales linearly to sequencing depth, enabling ultra-deep sequencing used to explore tumor evolution or detect tumor DNA circulating in blood. In addition, VarDict performs amplicon aware variant calling for polymerase chain reaction (PCR)-based targeted sequencing often used in diagnostic settings, and is able to detect PCR artifacts. Finally, VarDict also detects differences in somatic and loss of heterozygosity variants between paired samples. VarDict reprocessing of The Cancer Genome Atlas (TCGA) Lung Adenocarcinoma dataset called known driver mutations in KRAS, EGFR, BRAF, PIK3CA and MET in 16% more patients than previously published variant calls. We believe VarDict will greatly facilitate application of NGS in clinical cancer research. PMID:27060149

  20. VarDict: a novel and versatile variant caller for next-generation sequencing in cancer research

    PubMed Central

    Lai, Zhongwu; Markovets, Aleksandra; Ahdesmaki, Miika; Chapman, Brad; Hofmann, Oliver; McEwen, Robert; Johnson, Justin; Dougherty, Brian; Barrett, J. Carl; Dry, Jonathan R.

    2016-01-01

    Accurate variant calling in next generation sequencing (NGS) is critical to understand cancer genomes better. Here we present VarDict, a novel and versatile variant caller for both DNA- and RNA-sequencing data. VarDict simultaneously calls SNV, MNV, InDels, complex and structural variants, expanding the detected genetic driver landscape of tumors. It performs local realignments on the fly for more accurate allele frequency estimation. VarDict performance scales linearly to sequencing depth, enabling ultra-deep sequencing used to explore tumor evolution or detect tumor DNA circulating in blood. In addition, VarDict performs amplicon aware variant calling for polymerase chain reaction (PCR)-based targeted sequencing often used in diagnostic settings, and is able to detect PCR artifacts. Finally, VarDict also detects differences in somatic and loss of heterozygosity variants between paired samples. VarDict reprocessing of The Cancer Genome Atlas (TCGA) Lung Adenocarcinoma dataset called known driver mutations in KRAS, EGFR, BRAF, PIK3CA and MET in 16% more patients than previously published variant calls. We believe VarDict will greatly facilitate application of NGS in clinical cancer research. PMID:27060149

  1. Snake venom. The amino acid sequence of protein A from Dendroaspis polylepis polylepis (black mamba) venom.

    PubMed

    Joubert, F J; Strydom, D J

    1980-12-01

    Protein A from Dendroaspis polylepis polylepis venom comprises 81 amino acids, including ten half-cystine residues. The complete primary structures of protein A and its variant A' were elucidated. The sequences of proteins A and A', which differ in a single position, show no homology with various neurotoxins and non-neurotoxic proteins and represent a new type of elapid venom protein. PMID:7461607

  2. The number of candidate variants in exome sequencing for Mendelian disease under no genetic heterogeneity.

    PubMed

    Nishino, Jo; Mano, Shuhei

    2013-01-01

    There has been recent success in identifying disease-causing variants in Mendelian disorders by exome sequencing followed by simple filtering techniques. Studies generally assume complete or high penetrance. However, there are likely many failed and unpublished studies due in part to incomplete penetrance or phenocopy. In this study, the expected number of candidate single-nucleotide variants (SNVs) in exome data for autosomal dominant or recessive Mendelian disorders was investigated under the assumption of "no genetic heterogeneity." All variants were assumed to be under the "null model," and sample allele frequencies were modeled using standard population genetics theory. To investigate the properties of pedigree data, full-sibs were considered in addition to unrelated individuals. In both cases, particularly regarding full-sibs, the number of SNVs remained very high without controls. The high efficacy of controls was also confirmed. When controls were used with a relatively large total sample size (e.g., N = 20, 50), filtering that incorporated incomplete penetrance and phenocopy efficiently reduced the number of candidate SNVs. This suggests that filtering is useful when an assumption of "no genetic heterogeneity" is appropriate and could provide general guidelines for sample size determination. PMID:23762180

  3. Focus group discussions on secondary variants and next-generation sequencing technologies.

    PubMed

    Christenhusz, Gabrielle M; Devriendt, Koenraad; Van Esch, Hilde; Dierickx, Kris

    2015-04-01

    The clinical application of new genetic technologies will be and already is of great benefit to children with unexplained developmental disabilities or congenital anomalies. In most cases, it will be their parents who, together with medical professionals, make decisions about what should be disclosed and how the information will be used. We conducted eight exploratory focus group discussions with stakeholders to provide a broad sketch of concerns and ideas around the communication of results from next-generation sequencing technologies involving children. Stakeholders included those with (grand-) children of various ages and those without children; those involved professionally with genetics and those who were not; and a range of ages. Participants were asked to focus on which secondary variants they would and would not want disclosed about their (hypothetical) children or themselves. While the literature often concentrates on the medical and scientific characteristics of secondary variants, focus group participants were also interested in factors involving the parent-child relationship and the broader context. This resulted in more flexibility surrounding the types of secondary variants disclosed to parents than much of the literature currently supports. In addition, participants would on occasion use the same factors to argue opposing positions. The "Family Illness Paradigms model" can help explain this seeming contradiction. This model emphasises the importance of how the family reacts to personal and family experiences of disease and loss, more than the fact of having these experiences. PMID:25662393

  4. Amino Acid Substitutions in a Variant of IMP-1 Metallo-β-Lactamase

    PubMed Central

    Iyobe, Shizuko; Kusadokoro, Haruko; Ozaki, Junko; Matsumura, Naoki; Minami, Shinzaburo; Haruta, Shin; Sawai, Tetsuo; O'Hara, Koji

    2000-01-01

    In the course of surveying for the carbapenem-hydrolyzing metallo-β-lactamase gene blaIMP in pathogenic bacteria by the PCR method, we detected a gene encoding a variant metallo-β-lactamase, designated IMP-3, which differed from IMP-1 by having low hydrolyzing activity for penicillins and carbapenems. PCR product direct sequencing of a 2.2-kb segment revealed that the gene blaIMP-3 was located on a cassette inserted within a class I integron in the pMS390 plasmid. The 741-bp nucleotide sequence of blaIMP-3 was identical to that of blaIMP-1, except for seven base substitutions. Among these were two, at nucleotide positions 314 and 640, which caused amino acid alterations. Hybrid bla genes were constructed from blaIMP-3 and blaIMP-1 by recombinant DNA techniques, and β-lactamases encoded by these genes were compared with those of the parents IMP-3 and IMP-1 under the same experimental conditions. The kinetic parameters indicated that the inefficient hydrolysis of benzylpenicillin, ampicillin, imipenem, and ceftazidime by IMP-3 was due to the substitution of glycine for serine at amino acid residue 196 in the mature enzyme. This alteration corresponded to the presence of guanine instead of an adenine at nucleotide position 640 of the blaIMP-3 gene. This indicated that extension of the substrate profile in the metallo-β-lactamase IMP-1 compared to IMP-3 is the result of a one-step single-base mutation, suggesting that the gene blaIMP-3 is an ancestor of blaIMP-1. PMID:10898670
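
    The relationship between the nucleotide positions and the affected codons can be checked with a little arithmetic. The sketch below assumes that positions 314 and 640 are 1-based positions within the 741-bp coding sequence starting at the first base of the start codon; the helper name is hypothetical.

```python
def codon_of(nt_pos):
    """Map a 1-based coding-sequence position to (codon number, base within codon).

    Both returned values are 1-based.
    """
    codon, offset = divmod(nt_pos - 1, 3)
    return codon + 1, offset + 1


for pos in (314, 640, 741):
    print(pos, codon_of(pos))
```

    Under that assumption, position 640 is the first base of codon 214 of the precursor. The abstract places the serine/glycine difference at residue 196 of the mature enzyme, so the two numbers agree only if the mature numbering omits an 18-residue leader; that offset is an inference from the numbers, not something the abstract states. An adenine-to-guanine difference at the first codon position is also consistent with the standard genetic code, in which AGU/AGC encode serine and GGN encodes glycine.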

  5. Variants of beta-glucosidase

    SciTech Connect

    Fidantsef, Ana; Lamsa, Michael; Gorre-Clancy, Brian

    2015-07-14

    The present invention relates to variants of a parent beta-glucosidase, comprising a substitution at one or more positions corresponding to positions 142, 183, 266, and 703 of amino acids 1 to 842 of SEQ ID NO: 2 or corresponding to positions 142, 183, 266, and 705 of amino acids 1 to 844 of SEQ ID NO: 70, wherein the variant has beta-glucosidase activity. The present invention also relates to nucleotide sequences encoding the variant beta-glucosidases and to nucleic acid constructs, vectors, and host cells comprising the nucleotide sequences.

  6. Variants of beta-glucosidases

    SciTech Connect

    Fidantsef, Ana; Lamsa, Michael; Gorre-Clancy, Brian

    2014-10-07

    The present invention relates to variants of a parent beta-glucosidase, comprising a substitution at one or more positions corresponding to positions 142, 183, 266, and 703 of amino acids 1 to 842 of SEQ ID NO: 2 or corresponding to positions 142, 183, 266, and 705 of amino acids 1 to 844 of SEQ ID NO: 70, wherein the variant has beta-glucosidase activity. The present invention also relates to nucleotide sequences encoding the variant beta-glucosidases and to nucleic acid constructs, vectors, and host cells comprising the nucleotide sequences.

  7. Variants of beta-glucosidases

    DOEpatents

    Fidantsef, Ana; Lamsa, Michael; Clancy, Brian Gorre

    2008-08-19

    The present invention relates to variants of a parent beta-glucosidase, comprising a substitution at one or more positions corresponding to positions 142, 183, 266, and 703 of amino acids 1 to 842 of SEQ ID NO: 2 or corresponding to positions 142, 183, 266, and 705 of amino acids 1 to 844 of SEQ ID NO: 70, wherein the variant has beta-glucosidase activity. The present invention also relates to nucleotide sequences encoding the variant beta-glucosidases and to nucleic acid constructs, vectors, and host cells comprising the nucleotide sequences.

  8. Variants of beta-glucosidase

    DOEpatents

    Fidantsef, Ana; Lamsa, Michael; Gorre-Clancy, Brian

    2009-12-29

    The present invention relates to variants of a parent beta-glucosidase, comprising a substitution at one or more positions corresponding to positions 142, 183, 266, and 703 of amino acids 1 to 842 of SEQ ID NO: 2 or corresponding to positions 142, 183, 266, and 705 of amino acids 1 to 844 of SEQ ID NO: 70, wherein the variant has beta-glucosidase activity. The present invention also relates to nucleotide sequences encoding the variant beta-glucosidases and to nucleic acid constructs, vectors, and host cells comprising the nucleotide sequences.

  9. Genetic and Functional Sequence Variants of the SIRT3 Gene Promoter in Myocardial Infarction

    PubMed Central

    Yin, Xiaoyun; Pang, Shuchao; Huang, Jian; Cui, Yinghua; Yan, Bo

    2016-01-01

    Coronary artery disease (CAD), including myocardial infarction (MI), is a common complex disease that is caused by atherosclerosis. Although a large number of genetic variants have been associated with CAD, only 10% of CAD cases could be explained. It has been proposed that low-frequency and rare genetic variants may be the main causes of CAD. SIRT3, a mitochondrial deacetylase, plays important roles in mitochondrial function and metabolism. Lack of SIRT3 in experimental animals leads to several age-related diseases, including cardiovascular diseases. Therefore, SIRT3 gene variants may contribute to MI development. In this study, the SIRT3 gene promoter was genetically and functionally analyzed in large cohorts of MI patients (n = 319) and ethnic-matched controls (n = 322). A total of twenty-three DNA sequence variants (DSVs) were identified, including 10 single-nucleotide polymorphisms (SNPs). Six novel heterozygous DSVs, g.237307A>G, g.237270G>A, g.237023_25del, g.236653C>A, g.236628G>C, g.236557T>C, and two SNPs g.237030C>T (rs12293349) and g.237022C>G (rs369344513), were identified in nine MI patients, but in none of the controls. Three SNPs, g.236473C>T (rs11246029), g.236380_81ins (rs71019893) and g.236370C>G (rs185277566), were significantly more frequent in MI patients than in controls (P<0.05). These DSVs and SNPs, except g.236557T>C, significantly decreased the transcriptional activity of the SIRT3 gene promoter in cultured HEK-293 cells and H9c2 cells. Therefore, these DSVs identified in MI patients may change SIRT3 level by affecting the transcriptional activity of the SIRT3 gene promoter, contributing to MI development as a risk factor. PMID:27078640

  10. ClinLabGeneticist: a tool for clinical management of genetic variants from whole exome sequencing in clinical genetic laboratories.

    PubMed

    Wang, Jinlian; Liao, Jun; Zhang, Jinglan; Cheng, Wei-Yi; Hakenberg, Jörg; Ma, Meng; Webb, Bryn D; Ramasamudram-Chakravarthi, Rajasekar; Karger, Lisa; Mehta, Lakshmi; Kornreich, Ruth; Diaz, George A; Li, Shuyu; Edelmann, Lisa; Chen, Rong

    2015-01-01

    Routine clinical application of whole exome sequencing remains challenging due to difficulties in variant interpretation, large dataset management, and workflow integration. We describe a tool named ClinLabGeneticist to implement a workflow in clinical laboratories for management of variant assessment in genetic testing and disease diagnosis. We established an extensive variant annotation data source for the identification of pathogenic variants. A dashboard was deployed to aid a multi-step, hierarchical review process leading to final clinical decisions on genetic variant assessment. In addition, a central database was built to archive all of the genetic testing data, notes, and comments throughout the review process, variant validation data by Sanger sequencing as well as the final clinical reports for future reference. The entire workflow including data entry, distribution of work assignments, variant evaluation and review, selection of variants for validation, report generation, and communications between various personnel is integrated into a single data management platform. Three case studies are presented to illustrate the utility of ClinLabGeneticist. ClinLabGeneticist is freely available to academia at http://rongchenlab.org/software/clinlabgeneticist . PMID:26338694

  11. Germline sequence variants in TGM3 and RGS22 confer risk of basal cell carcinoma

    PubMed Central

    Stacey, Simon N.; Sulem, Patrick; Gudbjartsson, Daniel F.; Jonasdottir, Aslaug; Thorleifsson, Gudmar; Gudjonsson, Sigurjon A.; Masson, Gisli; Gudmundsson, Julius; Sigurgeirsson, Bardur; Benediktsdottir, Kristrun R.; Thorisdottir, Kristin; Ragnarsson, Rafn; Fuentelsaz, Victoria; Corredera, Cristina; Grasa, Matilde; Planelles, Dolores; Sanmartin, Onofre; Rudnai, Peter; Gurzau, Eugene; Koppova, Kvetoslava; Hemminki, Kari; Nexø, Bjørn A; Tjønneland, Anne; Overvad, Kim; Johannsdottir, Hrefna; Helgadottir, Hafdis T.; Thorsteinsdottir, Unnur; Kong, Augustine; Vogel, Ulla; Kumar, Rajiv; Nagore, Eduardo; Mayordomo, José I.; Rafnar, Thorunn; Olafsson, Jon H.; Stefansson, Kari

    2014-01-01

    To search for new sequence variants that confer risk of cutaneous basal cell carcinoma (BCC), we conducted a genome-wide association study of 38.5 million single nucleotide polymorphisms (SNPs) and small indels identified through whole-genome sequencing of 2230 Icelanders. We imputed genotypes for 4208 BCC patients and 109 408 controls using Illumina SNP chip typing data, carried out association tests and replicated the findings in independent population samples. We found new BCC susceptibility loci at TGM3 (rs214782[G], P = 5.5 × 10^-17, OR = 1.29) and RGS22 (rs7006527[C], P = 8.7 × 10^-13, OR = 0.77). TGM3 encodes transglutaminase type 3, which plays a key role in production of the cornified envelope during epidermal differentiation. PMID:24403052

  12. Hybridization and sequencing of nucleic acids using base pair mismatches

    DOEpatents

    Fodor, Stephen P. A.; Lipshutz, Robert J.; Huang, Xiaohua

    2001-01-01

    Devices and techniques for hybridization of nucleic acids and for determining the sequence of nucleic acids. Arrays of nucleic acids are formed by techniques, preferably high resolution, light-directed techniques. Positions of hybridization of a target nucleic acid are determined by, e.g., epifluorescence microscopy. Devices and techniques are proposed to determine the sequence of a target nucleic acid more efficiently and more quickly through such synthesis and detection techniques.

  13. Exome Sequencing Reveals Novel Rare Variants in the Ryanodine Receptor and Calcium Channel Genes in Malignant Hyperthermia Families

    PubMed Central

    Kim, Jerry H.; Jarvik, Gail P.; Browning, Brian L.; Rajagopalan, Ramakrishnan; Gordon, Adam S.; Rieder, Mark J.; Robertson, Peggy D.; Nickerson, Deborah A.; Fisher, Nickla A.; Hopkins, Philip M.

    2014-01-01

    Background: About half of malignant hyperthermia (MH) cases are associated with skeletal muscle ryanodine receptor 1 (RYR1) and calcium channel, voltage-dependent, L type, α1S subunit (CACNA1S) gene mutations, leaving many with an unknown cause. We chose to apply a sequencing approach to uncover causal variants in unknown cases. Sequencing the exome, the protein-coding region of the genome, has power at low sample sizes and has identified the cause of over a dozen Mendelian disorders. Methods: We considered four families with multiple MH cases but in whom no mutations in RYR1 and CACNA1S had been identified by Sanger sequencing of complementary DNA. Exome sequences of two affected individuals per family, chosen for maximum genetic distance, were compared. Variants were ranked by allele frequency, protein change, and measures of conservation among mammals to assess likelihood of causation. Finally, putative pathogenic mutations were genotyped in other family members to verify cosegregation with MH. Results: Exome sequencing revealed 1 rare RYR1 nonsynonymous variant in each of 3 families (Asp1056His, Val2627Met, Val4234Leu), and 1 CACNA1S variant (Thr1009Lys) in a 4th family. These were not seen in variant databases or in our control population sample of 5379 exomes. Follow-up sequencing in other family members verified cosegregation of alleles with MH. Conclusions: Using both exome sequencing and allele frequency data from large sequencing efforts may aid genetic diagnosis of MH. In our sample, it was more sensitive for variant detection in known genes than Sanger sequencing of complementary DNA, and allows for the possibility of novel gene discovery. PMID:24013571

  14. SEQCHIP: a powerful method to integrate sequence and genotype data for the detection of rare variant associations

    PubMed Central

    Liu, Dajiang J.; Leal, Suzanne M.

    2012-01-01

    Motivation: Next-generation sequencing greatly increases the capacity to detect rare-variant complex-trait associations. However, it is still expensive to sequence a large number of samples and therefore often small datasets are used. Given cost constraints, a potentially more powerful two-step strategy is to sequence a subset of the sample to discover variants, and genotype the identified variants in the remaining sample. If only cases are sequenced, directly combining sequence and genotype data will lead to inflated type-I errors in rare-variant association analysis. Although several methods have been developed to correct for the bias, they are either underpowered or theoretically invalid. We proposed a new method SEQCHIP to integrate genotype and sequence data, which can be used with most existing rare-variant tests. Results: It is demonstrated using both simulated and real datasets that the SEQCHIP method has controlled type-I errors, and is substantially more powerful than all other currently available methods. Availability: SEQCHIP is implemented in an R-Package and is available at http://linkage.rockefeller.edu/suzanne/seqchip/Seqchip.htm Contacts: dajiang@umich.edu or sleal@bcm.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:22556370

  15. A Unified Framework for Detecting Rare Variant Quantitative Trait Associations in Pedigree and Unrelated Individuals via Sequence Data

    PubMed Central

    Liu, Dajiang J.; Leal, Suzanne M.

    2012-01-01

    Objectives: There is great interest in sequencing unrelated or pedigree samples to detect rare variant quantitative trait associations. In order to reduce the cost of sequencing and improve power, many studies sequence selected samples with extreme traits. Existing methods for detecting rare variant associations were developed for unrelated samples. Methods are needed to analyze (selected or randomly ascertained) pedigree samples. Methods: We propose a unified framework of modeling extreme trait genetic associations (MEGA) with rare variants. Using MEGA and appropriate permutation algorithms, many rare variant tests can be extended to family data. As an application, we compared study designs using both sib-pairs and unrelated individuals. Extensive simulations were carried out using realistic population genetic and complex trait models. Results: It is demonstrated that when extreme sampling is implemented within equal-sized cohorts of unrelated individuals or sib-pairs, analyzing unrelated individuals is consistently more powerful than studying sib-pairs. A higher proportion of rare variants can be identified through sequencing unrelated samples compared to sibs. Alternatively, if samples are ascertained using fixed thresholds from an infinite-sized population, sequencing one sib with the most extreme trait from each extreme concordant sib-pair is consistently the most powerful design. Conclusions: MEGA will play an important role in the analysis of sequence-based genetic association studies. PMID:22555759

  16. Exome sequencing is an efficient tool for variant late-infantile neuronal ceroid lipofuscinosis molecular diagnosis.

    PubMed

    Patiño, Liliana Catherine; Battu, Rajani; Ortega-Recalde, Oscar; Nallathambi, Jeyabalan; Anandula, Venkata Ramana; Renukaradhya, Umashankar; Laissue, Paul

    2014-01-01

    The neuronal ceroid-lipofuscinoses (NCL) are a group of neurodegenerative disorders characterized by epilepsy, visual failure, progressive mental and motor deterioration, myoclonus, dementia and reduced life expectancy. Classically, NCL-affected individuals have been classified into six categories, which have been mainly defined by the clinical onset of symptoms. However, some patients cannot be easily included in a specific group because of significant variation in the age of onset and disease progression. Molecular genetics has emerged in recent years as a useful tool for enhancing NCL subtype classification. Fourteen NCL genetic forms (CLN1 to CLN14) have been described to date. The variant late-infantile form of the disease has been linked to CLN5, CLN6, CLN7 (MFSD8) and CLN8 mutations. Despite advances in the diagnosis of neurodegenerative disorders, mutations in these genes may cause similar phenotypes, which makes accurate candidate gene selection for direct sequencing difficult. Three siblings who were affected by variant late-infantile NCL are reported in the present study. We used whole-exome sequencing, direct sequencing and in silico approaches to identify the molecular basis of the disease. We identified the novel c.1219T>C (p.Trp407Arg) and c.1361T>C (p.Met454Thr) MFSD8 pathogenic mutations. Our results highlighted next generation sequencing as a novel and powerful methodological approach for the rapid determination of the molecular diagnosis of NCL. They also provide information regarding the phenotypic and molecular spectrum of CLN7 disease. PMID:25333361

  17. A Multiple-Sequence Variant of the Multiple-Baseline Design: A Strategy for Analysis of Sequence Effects and Treatment Comparison.

    ERIC Educational Resources Information Center

    Noell, George H.; Gresham, Frank M.

    2001-01-01

    Describes design logic and potential uses of a variant of the multiple-baseline design. The multiple-baseline multiple-sequence (MBL-MS) consists of multiple-baseline designs that are interlaced with one another and include all possible sequences of treatments. The MBL-MS design appears to be primarily useful for comparison of treatments taking…

  18. EGFR variant heterogeneity in glioblastoma resolved through single-nucleus sequencing

    PubMed Central

    Francis, Joshua M.; Zhang, Cheng-Zhong; Maire, Cecile L.; Jung, Joonil; Manzo, Veronica E.; Adalsteinsson, Viktor A.; Homer, Heather; Haidar, Sam; Blumenstiel, Brendan; Pedamallu, Chandra Sekhar; Ligon, Azra H.; Love, J. Christopher; Meyerson, Matthew; Ligon, Keith L.

    2014-01-01

    Glioblastomas with EGFR amplification represent approximately 50% of newly diagnosed cases and recent studies have revealed frequent coexistence of multiple EGFR aberrations within the same tumor with implications for mutation cooperation and treatment resistance. However, bulk tumor sequencing studies cannot resolve the patterns of how the multiple EGFR aberrations coexist with other mutations within single tumor cells. Here we applied a population-based single-cell whole genome sequencing methodology to characterize genomic heterogeneity in EGFR amplified glioblastomas. Our analysis effectively identified clonal events, including a novel translocation of a super enhancer to the TERT promoter, as well as subclonal loss-of-heterozygosity and multiple EGFR mutational variants within tumors. Correlating the EGFR mutations onto the cellular hierarchy revealed that EGFR truncation variants (EGFRvII and EGFR Carboxyl-terminal deletions) identified in the bulk tumor segregate into non-overlapping subclonal populations. In vitro and in vivo functional studies show EGFRvII is oncogenic and sensitive to EGFR inhibitors currently in clinical trials. Thus the association between diverse activating mutations in EGFR and other subclonal mutations within a single tumor supports an intrinsic mechanism for proliferative and clonal diversification with broad implications in resistance to treatment. PMID:24893890

  19. Sequence Variation in the Small-Subunit rRNA Gene of Plasmodium malariae and Prevalence of Isolates with the Variant Sequence in Sichuan, China

    PubMed Central

    Liu, Qing; Zhu, Shenghua; Mizuno, Sahoko; Kimura, Masatsugu; Liu, Peina; Isomura, Shin; Wang, Xingzhen; Kawamoto, Fumihiko

    1998-01-01

    By two PCR-based diagnostic methods, Plasmodium malariae infections have been rediscovered at two foci in the Sichuan province of China, a region where no cases of P. malariae have been officially reported for the last 2 decades. In addition, a variant form of P. malariae which has a deletion of 19 bp and seven substitutions of base pairs in the target sequence of the small-subunit (SSU) rRNA gene was detected with high frequency. Alignment analysis of Plasmodium sp. SSU rRNA gene sequences revealed that the 5′ region of the variant sequence is identical to that of P. vivax or P. knowlesi and its 3′ region is identical to that of P. malariae. The same sequence variations were also found in P. malariae isolates collected along the Thai-Myanmar border, suggesting a wide distribution of this variant form from southern China to Southeast Asia. PMID:9774600

  20. Testing Rare-Variant Association without Calling Genotypes Allows for Systematic Differences in Sequencing between Cases and Controls

    PubMed Central

    Hu, Yi-Juan; Liao, Peizhou; Johnston, H. Richard; Allen, Andrew S.; Satten, Glen A.

    2016-01-01

    Next-generation sequencing of DNA provides an unprecedented opportunity to discover rare genetic variants associated with complex diseases and traits. However, the common practice of first calling underlying genotypes and then treating the called values as known is prone to false positive findings, especially when genotyping errors are systematically different between cases and controls. This happens whenever cases and controls are sequenced at different depths, on different platforms, or in different batches. In this article, we provide a likelihood-based approach to testing rare variant associations that directly models sequencing reads without calling genotypes. We consider the (weighted) burden test statistic, which is the (weighted) sum of the score statistic for assessing effects of individual variants on the trait of interest. Because variant locations are unknown, we develop a simple, computationally efficient screening algorithm to estimate the loci that are variants. Because our burden statistic may not have mean zero after screening, we develop a novel bootstrap procedure for assessing the significance of the burden statistic. We demonstrate through extensive simulation studies that the proposed tests are robust to a wide range of differential sequencing qualities between cases and controls, and are at least as powerful as the standard genotype calling approach when the latter controls type I error. An application to the UK10K data reveals novel rare variants in gene BTBD18 associated with childhood onset obesity. The relevant software is freely available. PMID:27152526
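
    The algebraic form of the weighted burden statistic mentioned in this abstract, a weighted sum of per-variant score statistics, can be sketched as below. This is an illustration only: it uses called genotypes (or allele dosages) for simplicity, which is exactly the step the authors' likelihood-based method avoids, and the function name, the mean-centred score form, and the toy data are assumptions rather than the paper's implementation.

```python
def weighted_burden(genotypes, trait, weights):
    """Return (burden statistic, per-variant scores).

    genotypes: one row per individual, one column per variant (allele counts).
    trait: one value per individual (e.g. 1 = case, 0 = control).
    weights: one weight per variant.
    The score for variant j is sum_i (y_i - mean(y)) * g_ij.
    """
    mean = sum(trait) / len(trait)
    residuals = [y - mean for y in trait]
    scores = [
        sum(r * row[j] for r, row in zip(residuals, genotypes))
        for j in range(len(weights))
    ]
    burden = sum(w * u for w, u in zip(weights, scores))
    return burden, scores


# Hypothetical toy data: four individuals, two rare variants.
genotypes = [[1, 0], [1, 1], [0, 0], [0, 0]]
trait = [1, 1, 0, 0]
burden, scores = weighted_burden(genotypes, trait, weights=[1.0, 2.0])
```

    Setting all weights to 1 gives the unweighted burden statistic. Assessing significance is a separate step; the abstract describes a bootstrap procedure for that, which is not reproduced here.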

  1. Testing Rare-Variant Association without Calling Genotypes Allows for Systematic Differences in Sequencing between Cases and Controls.

    PubMed

    Hu, Yi-Juan; Liao, Peizhou; Johnston, H Richard; Allen, Andrew S; Satten, Glen A

    2016-05-01

    Next-generation sequencing of DNA provides an unprecedented opportunity to discover rare genetic variants associated with complex diseases and traits. However, the common practice of first calling underlying genotypes and then treating the called values as known is prone to false positive findings, especially when genotyping errors are systematically different between cases and controls. This happens whenever cases and controls are sequenced at different depths, on different platforms, or in different batches. In this article, we provide a likelihood-based approach to testing rare variant associations that directly models sequencing reads without calling genotypes. We consider the (weighted) burden test statistic, which is the (weighted) sum of the score statistic for assessing effects of individual variants on the trait of interest. Because variant locations are unknown, we develop a simple, computationally efficient screening algorithm to estimate the loci that are variants. Because our burden statistic may not have mean zero after screening, we develop a novel bootstrap procedure for assessing the significance of the burden statistic. We demonstrate through extensive simulation studies that the proposed tests are robust to a wide range of differential sequencing qualities between cases and controls, and are at least as powerful as the standard genotype calling approach when the latter controls type I error. An application to the UK10K data reveals novel rare variants in gene BTBD18 associated with childhood onset obesity. The relevant software is freely available. PMID:27152526

  2. Characterization of Intra-Type Variants of Oncogenic Human Papillomaviruses by Next-Generation Deep Sequencing of the E6/E7 Region.

    PubMed

    Lavezzo, Enrico; Masi, Giulia; Toppo, Stefano; Franchin, Elisa; Gazzola, Valentina; Sinigaglia, Alessandro; Masiero, Serena; Trevisan, Marta; Pagni, Silvana; Palù, Giorgio; Barzon, Luisa

    2016-03-01

    Different human papillomavirus (HPV) types are characterized by differences in tissue tropism and ability to promote cell proliferation and transformation. In addition, clinical and experimental studies have shown that some genetic variants/lineages of high-risk HPV (HR-HPV) types are characterized by increased oncogenic activity and probability to induce cancer. In this study, we designed and validated a new method based on multiplex PCR-deep sequencing of the E6/E7 region of HR-HPV types to characterize HPV intra-type variants in clinical specimens. Validation experiments demonstrated that this method allowed reliable identification of the different lineages of oncogenic HPV types. Advantages of this method over other published methods were represented by its ability to detect variants of all HR-HPV types in a single reaction, to detect variants of HR-HPV types in clinical specimens with multiple infections, and, being based on sequencing of the full E6/E7 region, to detect amino acid changes in these oncogenes potentially associated with increased transforming activity. PMID:26985902

  3. Characterization of Intra-Type Variants of Oncogenic Human Papillomaviruses by Next-Generation Deep Sequencing of the E6/E7 Region

    PubMed Central

    Lavezzo, Enrico; Masi, Giulia; Toppo, Stefano; Franchin, Elisa; Gazzola, Valentina; Sinigaglia, Alessandro; Masiero, Serena; Trevisan, Marta; Pagni, Silvana; Palù, Giorgio; Barzon, Luisa

    2016-01-01

    Different human papillomavirus (HPV) types are characterized by differences in tissue tropism and ability to promote cell proliferation and transformation. In addition, clinical and experimental studies have shown that some genetic variants/lineages of high-risk HPV (HR-HPV) types are characterized by increased oncogenic activity and probability to induce cancer. In this study, we designed and validated a new method based on multiplex PCR-deep sequencing of the E6/E7 region of HR-HPV types to characterize HPV intra-type variants in clinical specimens. Validation experiments demonstrated that this method allowed reliable identification of the different lineages of oncogenic HPV types. Advantages of this method over other published methods were represented by its ability to detect variants of all HR-HPV types in a single reaction, to detect variants of HR-HPV types in clinical specimens with multiple infections, and, being based on sequencing of the full E6/E7 region, to detect amino acid changes in these oncogenes potentially associated with increased transforming activity. PMID:26985902

  4. 77 FR 65537 - Requirements for Patent Applications Containing Nucleotide Sequence and/or Amino Acid Sequence...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-10-29

    ... Amino Acid Sequence Disclosures ACTION: Proposed collection; comment request. SUMMARY: The United States....'' SUPPLEMENTARY INFORMATION: I. Abstract Patent applications that contain nucleotide and/or amino acid sequence disclosures must include a copy of the sequence listing in accordance with the requirements in 37 CFR...

  5. Identifying bottlenecks in transient and stable production of recombinant monoclonal-antibody sequence variants in Chinese hamster ovary cells

    PubMed Central

    Mason, Megan; Sweeney, Bernadette; Cain, Katharine; Stephens, Paul; Sharfstein, Susan T.

    2012-01-01

    The increasing demand for antibody-based therapeutics has emphasized the need for technologies to improve recombinant antibody titers from mammalian cell lines. Moreover, as antibody therapeutics address an increasing spectrum of indications, interest has increased in antibody engineering to improve affinity and biological activity. However, the cellular mechanisms that dictate expression and the relationships between antibody sequence and expression level remain poorly understood. Fundamental understanding of how mammalian cells handle high levels of transgene expression and of the relationship between sequence and expression is vital to the development of new antibodies and for increasing recombinant antibody titers. In this work, we analyzed a pair of mutants that vary by a single amino acid at Kabat position 49 (heavy chain framework), resulting in differential transient and stable titers with no apparent loss of antigen affinity. Through analysis of mRNA, gene copy number, intracellular antibody content, and secreted antibody, we found that while translational/post-translational mechanisms are limiting in transient systems, it appears that the amount of available transgenic mRNA becomes the limiting event upon stable integration of the recombinant genes. We also show that amino acid substitution at residue 49 results in production of a non-secreted HC variant and postulate that stable antibody expression is maintained at a level which prevents toxic accumulation of this HC-related protein. This study highlights the need for proper sequence engineering strategies when developing therapeutic antibodies and alludes to the early analysis of transient expression systems to identify the potential for aberrant stable expression behavior. PMID:22467228

  6. International Lung Cancer Consortium: Pooled Analysis of Sequence Variants in DNA Repair and Cell Cycle Pathways

    PubMed Central

    Hung, Rayjean J.; Christiani, David C.; Risch, Angela; Popanda, Odilia; Haugen, Aage; Zienolddiny, Shan; Benhamou, Simone; Bouchardy, Christine; Lan, Qing; Spitz, Margaret R.; Wichmann, H.-Erich; LeMarchand, Loic; Vineis, Paolo; Matullo, Giuseppe; Kiyohara, Chikako; Zhang, Zuo-Feng; Pezeshki, Benhnaz; Harris, Curtis; Mechanic, Leah; Seow, Adeline; Ng, Daniel P.K.; Szeszenia-Dabrowska, Neonila; Zaridze, David; Lissowska, Jolanta; Rudnai, Peter; Fabianova, Eleonora; Mates, Dana; Foretova, Lenka; Janout, Vladimir; Bencko, Vladimir; Caporaso, Neil; Chen, Chu; Duell, Eric J.; Goodman, Gary; Field, John K.; Houlston, Richard S.; Hong, Yun-Chul; Landi, Maria Teresa; Lazarus, Philip; Muscat, Joshua; McLaughlin, John; Schwartz, Ann G.; Shen, Hongbing; Stucker, Isabelle; Tajima, Kazuo; Matsuo, Keitaro; Thun, Michael; Yang, Ping; Wiencke, John; Andrew, Angeline S.; Monnier, Stephanie; Boffetta, Paolo; Brennan, Paul

    2009-01-01

    Background: The International Lung Cancer Consortium was established in 2004. To clarify the role of DNA repair genes in lung cancer susceptibility, we conducted a pooled analysis of genetic variants in DNA repair pathways, whose associations have been investigated by at least 3 individual studies. Methods: Data from 14 studies were pooled for 18 sequence variants in 12 DNA repair genes, including APEX1, OGG1, XRCC1, XRCC2, XRCC3, ERCC1, XPD, XPF, XPG, XPA, MGMT, and TP53. The total number of subjects included in the analysis for each variant ranged from 2,073 to 13,955 subjects. Results: Four of the variants were found to be weakly associated with lung cancer risk with borderline significance: these were XRCC3 T241M [heterozygote odds ratio (OR), 0.89; 95% confidence interval (95% CI), 0.79–0.99 and homozygote OR, 0.84; 95% CI, 0.71–1.00] based on 3,467 cases and 5,021 controls from 8 studies, XPD K751Q (heterozygote OR, 0.99; 95% CI, 0.89–1.10 and homozygote OR, 1.19; 95% CI, 1.02–1.39) based on 6,463 cases and 6,603 controls from 9 studies, and TP53 R72P (heterozygote OR, 1.14; 95% CI, 1.00–1.29 and homozygote OR, 1.20; 95% CI, 1.02–1.42) based on 3,610 cases and 5,293 controls from 6 studies. OGG1 S326C homozygote was suggested to be associated with lung cancer risk in Caucasians (homozygote OR, 1.34; 95% CI, 1.01–1.79) based on 2,569 cases and 4,178 controls from 4 studies but not in Asians. The other 14 variants did not exhibit main effects on lung cancer risk. Discussion: In addition to data pooling, future priorities of International Lung Cancer Consortium include coordinated genotyping and multistage validation for ongoing genome-wide association studies. PMID:18990748
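
    For readers unfamiliar with how figures such as "OR, 1.19; 95% CI, 1.02–1.39" arise, the sketch below computes a crude odds ratio and a Wald (Woolf) confidence interval from a 2×2 table. It is an illustration under stated assumptions: the abstract does not give genotype counts, so the counts here are hypothetical, and a pooled analysis of this kind would normally adjust for study and covariates rather than use a crude table.

```python
import math


def odds_ratio_ci(case_carriers, case_noncarriers, ctrl_carriers, ctrl_noncarriers, z=1.96):
    """Crude odds ratio with a Wald confidence interval on the log scale."""
    odds_ratio = (case_carriers * ctrl_noncarriers) / (case_noncarriers * ctrl_carriers)
    # Standard error of log(OR) from the four cell counts (Woolf's method).
    se = math.sqrt(
        1 / case_carriers
        + 1 / case_noncarriers
        + 1 / ctrl_carriers
        + 1 / ctrl_noncarriers
    )
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Hypothetical counts, not taken from the study.
odds_ratio, lower, upper = odds_ratio_ci(20, 80, 10, 90)
```

    An interval whose bound sits at or very near 1.00, as for several variants in this abstract, is what the authors describe as borderline significance.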

  7. Complexity of murine cardiomyocyte miRNA biogenesis, sequence variant expression and function.

    PubMed

    Humphreys, David T; Hynes, Carly J; Patel, Hardip R; Wei, Grace H; Cannon, Leah; Fatkin, Diane; Suter, Catherine M; Clancy, Jennifer L; Preiss, Thomas

    2012-01-01

    microRNAs (miRNAs) are critical to heart development and disease. Emerging research indicates that regulated precursor processing can give rise to an unexpected diversity of miRNA variants. We subjected small RNA from murine HL-1 cardiomyocyte cells to next generation sequencing to investigate the relevance of such diversity to cardiac biology. ∼40 million tags were mapped to known miRNA hairpin sequences as deposited in miRBase version 16, calling 403 generic miRNAs as appreciably expressed. Hairpin arm bias broadly agreed with miRBase annotation, although 44 miR* were unexpectedly abundant (>20% of tags); conversely, 33 -5p/-3p annotated hairpins were asymmetrically expressed. Overall, variability was infrequent at the 5' start but common at the 3' end of miRNAs (5.2% and 52.3% of tags, respectively). Nevertheless, 105 miRNAs showed marked 5' isomiR expression (>20% of tags). Among these was miR-133a, a miRNA with important cardiac functions, and we demonstrated differential mRNA targeting by two of its prevalent 5' isomiRs. Analyses of miRNA termini and base-pairing patterns around Drosha and Dicer cleavage regions confirmed the known bias towards uridine at the 5' most position of miRNAs, as well as supporting the thermodynamic asymmetry rule for miRNA strand selection and a role for local structural distortions in fine tuning miRNA processing. We further recorded appreciable expression of 5 novel miR*, 38 extreme variants and 8 antisense miRNAs. Analysis of genome-mapped tags revealed 147 novel candidate miRNAs. In summary, we revealed pronounced sequence diversity among cardiomyocyte miRNAs, knowledge of which will underpin future research into the mechanisms involved in miRNA biogenesis and, importantly, cardiac function, disease and therapy. PMID:22319597

  8. A unified method for detecting secondary trait associations with rare variants: application to sequence data.

    PubMed

    Liu, Dajiang J; Leal, Suzanne M

    2012-01-01

    Next-generation sequencing has made possible the detection of rare variant (RV) associations with quantitative traits (QT). Due to high sequencing cost, many studies can only sequence a modest number of selected samples with extreme QT. Therefore association testing in individual studies can be underpowered. Besides the primary trait, many clinically important secondary traits are often measured. It is highly beneficial if multiple studies can be jointly analyzed for detecting associations with commonly measured traits. However, analyzing secondary traits in selected samples can be biased if sample ascertainment is not properly modeled. Some methods exist for analyzing secondary traits in selected samples, where some burden tests can be implemented. However p-values can only be evaluated analytically via asymptotic approximations, which may not be accurate. Additionally, potentially more powerful sequence kernel association tests, variable selection-based methods, and burden tests that require permutations cannot be incorporated. To overcome these limitations, we developed a unified method for analyzing secondary trait associations with RVs (STAR) in selected samples, incorporating all RV tests. Statistical significance can be evaluated either through permutations or analytically. STAR makes it possible to apply more powerful RV tests to analyze secondary trait associations. It also enables jointly analyzing multiple cohorts ascertained under different study designs, which greatly boosts power. The performance of STAR and commonly used RV association tests were comprehensively evaluated using simulation studies. STAR was also implemented to analyze a dataset from the SardiNIA project where samples with extreme low-density lipoprotein levels were sequenced. A significant association between LDLR and systolic blood pressure was identified, which is supported by pharmacogenetic studies. In summary, for sequencing studies, STAR is an important tool for detecting

  9. A Unified Method for Detecting Secondary Trait Associations with Rare Variants: Application to Sequence Data

    PubMed Central

    Liu, Dajiang J.; Leal, Suzanne M.

    2012-01-01

    Next-generation sequencing has made possible the detection of rare variant (RV) associations with quantitative traits (QT). Due to high sequencing cost, many studies can only sequence a modest number of selected samples with extreme QT. Therefore association testing in individual studies can be underpowered. Besides the primary trait, many clinically important secondary traits are often measured. It is highly beneficial if multiple studies can be jointly analyzed for detecting associations with commonly measured traits. However, analyzing secondary traits in selected samples can be biased if sample ascertainment is not properly modeled. Some methods exist for analyzing secondary traits in selected samples, where some burden tests can be implemented. However p-values can only be evaluated analytically via asymptotic approximations, which may not be accurate. Additionally, potentially more powerful sequence kernel association tests, variable selection-based methods, and burden tests that require permutations cannot be incorporated. To overcome these limitations, we developed a unified method for analyzing secondary trait associations with RVs (STAR) in selected samples, incorporating all RV tests. Statistical significance can be evaluated either through permutations or analytically. STAR makes it possible to apply more powerful RV tests to analyze secondary trait associations. It also enables jointly analyzing multiple cohorts ascertained under different study designs, which greatly boosts power. The performance of STAR and commonly used RV association tests were comprehensively evaluated using simulation studies. STAR was also implemented to analyze a dataset from the SardiNIA project where samples with extreme low-density lipoprotein levels were sequenced. A significant association between LDLR and systolic blood pressure was identified, which is supported by pharmacogenetic studies. In summary, for sequencing studies, STAR is an important tool for detecting

  10. Sequencing-based variant detection in the polyploid crop oilseed rape

    PubMed Central

    2013-01-01

    Background The detection and exploitation of genetic variation underpins crop improvement. However, the polyploid nature of the genomes of many of our most important crops represents a barrier, particularly for the analysis of variation within genes. To overcome this, we aimed to develop methodologies based on amplicon sequencing that involve the incorporation of barcoded amplification tags (BATs) into PCR products. Results A protocol was developed to tag PCR products with 5’ 6-base oligonucleotide barcode extensions before pooling for sequencing library production using standard Illumina adapters. A computational method was developed for the de-convolution of products and the robust detection and scoring of sequence variants. Using this methodology, amplicons targeted to gene sequences were screened across a B. napus mapping population and the resulting allele scoring strings for 24 markers linkage mapped to the expected regions of the genome. Furthermore, using one-dimensional 8-fold pooling, 4608 lines of a B. napus mutation population were screened for induced mutations in a locus-specific amplicon (an orthologue of GL2.b) and mixed product of three co-amplified loci (orthologues of FAD2), identifying 10 and 41 mutants respectively. Conclusions The utilisation of barcode tags to de-convolute pooled PCR products in multiplexed, variation screening via Illumina sequencing provides a cost effective method for SNP genotyping and mutation detection and, potentially, markers for causative changes, even in polyploid species. Combining this approach with existing Illumina multiplexing workflows allows the analysis of thousands of lines cheaply and efficiently in a single sequencing run with minimal library production costs. PMID:23915099
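
    The de-convolution step described in this abstract can be illustrated with a minimal sketch. It assumes each read begins with its 6-base barcode and that barcodes must match exactly. The function name, barcode sequences, sample names, and reads are hypothetical and are not taken from the authors' published method, which also scores sequence variants.

```python
from collections import defaultdict

BARCODE_LEN = 6  # the abstract describes 6-base 5' barcode extensions


def demultiplex(reads, barcode_to_sample):
    """Group reads by their leading barcode and strip the barcode.

    Assumption: exact barcode matching only; reads whose first six
    bases are not a known barcode are returned separately.
    """
    by_sample = defaultdict(list)
    unassigned = []
    for read in reads:
        barcode, insert = read[:BARCODE_LEN], read[BARCODE_LEN:]
        sample = barcode_to_sample.get(barcode)
        if sample is None:
            unassigned.append(read)  # unknown or error-containing barcode
        else:
            by_sample[sample].append(insert)
    return dict(by_sample), unassigned


# Hypothetical barcodes and reads, for illustration only.
barcodes = {"ACGTAC": "line_1", "TGCATG": "line_2"}
reads = ["ACGTACGGATTC", "TGCATGGGATTC", "ACGTACGGACTC", "NNNNNNGGATTC"]
assigned, unassigned = demultiplex(reads, barcodes)
```

    In a pooled screen, the per-sample read groups would then be compared against the reference amplicon to score alleles; that step is not shown here.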

  11. Multi-species sequence comparison reveals conservation of ghrelin gene-derived splice variants encoding a truncated ghrelin peptide.

    PubMed

    Seim, Inge; Jeffery, Penny L; Thomas, Patrick B; Walpole, Carina M; Maugham, Michelle; Fung, Jenny N T; Yap, Pei-Yi; O'Keeffe, Angela J; Lai, John; Whiteside, Eliza J; Herington, Adrian C; Chopin, Lisa K

    2016-06-01

    The peptide hormone ghrelin is a potent orexigen produced predominantly in the stomach. It has a number of other biological actions, including roles in appetite stimulation, energy balance, the stimulation of growth hormone release and the regulation of cell proliferation. Recently, several ghrelin gene splice variants have been described. Here, we attempted to identify conserved alternative splicing of the ghrelin gene by cross-species sequence comparisons. We identified a novel human exon 2-deleted variant and provide preliminary evidence that this splice variant and in1-ghrelin encode a C-terminally truncated form of the ghrelin peptide, termed minighrelin. These variants are expressed in humans and mice, demonstrating conservation of alternative splicing spanning 90 million years. Minighrelin appears to have similar actions to full-length ghrelin, as treatment with exogenous minighrelin peptide stimulates appetite and feeding in mice. Forced expression of the exon 2-deleted preproghrelin variant mirrors the effect of the canonical preproghrelin, stimulating cell proliferation and migration in the PC3 prostate cancer cell line. This is the first study to characterise an exon 2-deleted preproghrelin variant and to demonstrate sequence conservation of ghrelin gene-derived splice variants that encode a truncated ghrelin peptide. This adds further impetus for studies into the alternative splicing of the ghrelin gene and the function of novel ghrelin peptides in vertebrates. PMID:26792793

  12. Somatic mosaicism and variant frequency detected by next-generation sequencing in X-linked Alport syndrome.

    PubMed

    Fu, Xue Jun; Nozu, Kandai; Kaito, Hiroshi; Ninchoji, Takeshi; Morisada, Naoya; Nakanishi, Koichi; Yoshikawa, Norishige; Ohtsubo, Hiromi; Matsunoshita, Natsuki; Kamiyoshi, Naohiro; Matsumura, Chieko; Takagi, Nobuaki; Maekawa, Kohei; Taniguchi-Ikeda, Mariko; Iijima, Kazumoto

    2016-03-01

    X-linked Alport syndrome (XLAS) is a progressive, hereditary nephropathy. Although men with XLAS usually develop end-stage renal disease before 30 years of age, some men show a milder phenotype and develop end-stage renal disease later in life. However, the molecular mechanisms associated with this milder phenotype have not been fully identified. We genetically diagnosed 186 patients with suspected XLAS between January 2006 and August 2014. Genetic examination involved: (1) extraction and analysis of genomic DNA using PCR and direct sequencing using Sanger's method and (2) next-generation sequencing to detect variant allele frequencies. We identified somatic mosaic variants in the type IV collagen, α5 gene (COL4A5) in four patients. Interestingly, two of these four patients with variant frequencies in kidney biopsies or urinary sediment cells of ≥50% showed hematuria and moderate proteinuria, whereas the other two with variant frequencies of <50% were asymptomatic or only had hematuria. De novo variants can occur even in asymptomatic male cases of XLAS resulting in mosaicism, with important implications for genetic counseling. This is the first study to show a tendency between the variant allele frequency and disease severity in male XLAS patients with somatic mosaic variants in COL4A5. Although this is a very rare status of somatic mosaicism, further analysis is needed to show this correlation in a larger population. PMID:26014433
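
    The variant allele frequency discussed in this abstract is, in its simplest form, the fraction of reads at a site that carry the variant allele. The sketch below assumes that simple definition and uses hypothetical read counts; the function names are illustrative, and the 50% cut-off only mirrors the grouping used in the abstract.

```python
def variant_allele_frequency(variant_reads, total_reads):
    """Fraction of reads covering a site that carry the variant allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= variant_reads <= total_reads:
        raise ValueError("variant_reads must lie between 0 and total_reads")
    return variant_reads / total_reads


def vaf_group(vaf, threshold=0.5):
    """Label a frequency relative to the 50% cut-off used in the abstract."""
    return ">=50%" if vaf >= threshold else "<50%"


# Hypothetical read counts, for illustration only.
high = variant_allele_frequency(130, 200)  # 0.65
low = variant_allele_frequency(30, 200)    # 0.15
groups = (vaf_group(high), vaf_group(low))
```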

  13. Next-Generation Sequencing and Novel Variant Determination in a Cohort of 92 Familial Exudative Vitreoretinopathy Patients

    PubMed Central

    Salvo, Jason; Lyubasyuk, Vera; Xu, Mingchu; Wang, Hui; Wang, Feng; Nguyen, Duy; Wang, Keqing; Luo, Hongrong; Wen, Cindy; Shi, Catherine; Lin, Danni; Zhang, Kang; Chen, Rui

    2015-01-01

    Purpose. Familial exudative vitreoretinopathy (FEVR) is a developmental disease that can cause visual impairment and retinal detachment at a young age. Four genes involved in the Wnt signaling pathway were previously linked to this disease: NDP, FZD4, LRP5, and TSPAN12. Identification of novel disease-causing alleles allows for a deeper understanding of the disease, better molecular diagnosis, and improved treatment. Methods. Sequencing libraries from 92 FEVR patients were generated using a custom capture panel to enrich for 163 known retinal disease-causing genes in humans. Samples were processed using next generation sequencing (NGS) techniques followed by data analysis to identify and classify single nucleotide variants and small insertions and deletions. Sanger validation and segregation testing were used to verify suspected variants. Results. Of the cohort of 92, 45 patients were potentially solved (48.9%). Solved cases resulted from the determination of 49 unique mutations, 41 of which are novel. Of the novel variants discovered, 13 were highly likely to cause FEVR due to the nature of these variants (frameshifting indels, splicing mutations, and nonsense variant types). To our knowledge, this is the largest study of a FEVR cohort using NGS. Conclusions. We were able to determine probable disease-causing variants in a large number of FEVR patients, the majority of which were novel. Knowledge of these variants will help to further characterize and diagnose FEVR. PMID:25711638

  14. The contribution of lactic acid to acidification of tumours: studies of variant cells lacking lactate dehydrogenase.

    PubMed Central

    Yamagata, M.; Hasuda, K.; Stamato, T.; Tannock, I. F.

    1998-01-01

    Solid tumours develop an acidic extracellular environment with high concentration of lactic acid, and lactic acid produced by glycolysis has been assumed to be the major cause of tumour acidity. Experiments using lactate dehydrogenase (LDH)-deficient ras-transfected Chinese hamster ovarian cells have been undertaken to address directly the hypothesis that lactic acid production is responsible for tumour acidification. The variant cells produce negligible quantities of lactic acid and consume minimal amounts of glucose compared with parental cells. Lactate-producing parental cells acidified lightly-buffered medium but variant cells did not. Tumours derived from parental and variant cells implanted into nude mice were found to have mean values of extracellular pH (pHe) of 7.03 +/- 0.03 and 7.03 +/- 0.05, respectively, both of which were significantly lower than that of normal muscle (pHe = 7.43 +/- 0.03; P < 0.001). Lactic acid concentration in variant tumours (450 +/- 90 microg g(-1) wet weight) was much lower than that in parental tumours (1880 +/- 140 microg g(-1)) and similar to that in serum (400 +/- 35 microg g(-1)). These data show discordance between mean levels of pHe and lactate content in tumours; the results support those of Newell et al (1993) and suggest that the production of lactic acid via glycolysis causes acidification of culture medium, but is not the only mechanism, and is probably not the major mechanism responsible for the development of an acidic environment within solid tumours. PMID:9667639

  15. Associations between variants of FADS genes and omega-3 and omega-6 milk fatty acids of Canadian Holstein cows

    PubMed Central

    2014-01-01

    Background Fatty acid desaturase 1 (FADS1) and 2 (FADS2) genes code respectively for the enzymes delta-5 and delta-6 desaturases which are rate limiting enzymes in the synthesis of polyunsaturated omega-3 and omega-6 fatty acids (FAs). Omega-3 and -6 FAs as well as conjugated linoleic acid (CLA) are present in bovine milk and have demonstrated positive health effects in humans. Studies in humans have shown significant relationships between genetic variants in FADS1 and 2 genes with plasma and tissue concentrations of omega-3 and -6 FAs. The aim of this study was to evaluate the extent of sequence variations within these two genes in Canadian Holstein cows as well as the association between sequence variants and health promoting FAs in milk. Results Thirty three SNPs were detected within the studied regions of genes including a synonymous mutation (FADS1-07, rs42187261, 306Tyr > Tyr) in exon 8 of FADS1, a non-synonymous mutation (FADS2-14, rs211580559, 294Ala > Val) within FADS2 exon 7, a splice site SNP (FADS2-05, rs211263660), a 3′UTR SNP (FADS2-23, rs109772589), and another 3′UTR SNP with an effect on a microRNA binding site within FADS2 gene (FADS2-19, rs210169303). Association analyses showed significant relations between three out of seven tested SNPs and several FAs. Significant associations (FDR P < 0.05) were recorded between FADS2-23 (rs109772589) and two omega-6 FAs (dihomogamma linolenic acid [C20:3n6] and arachidonic acid [C20:4n6]), FADS1-07 (rs42187261) and one omega-3 FA (eicosapentaenoic acid, C20:5n3) and tricosanoic acid (C23:0), and one intronic SNP, FADS1-01 (rs136261927) and C20:3n6. Conclusion Our study has demonstrated positive associations between three SNPs within FADS1 and FADS2 genes (a SNP within the 3′UTR, a synonymous SNP and an intronic SNP), with three milk PUFAs of Canadian Holstein cows thus suggesting possible involvement of synonymous and non-coding region variants in FA synthesis. These SNPs may serve as

  16. Molecular cloning and nucleotide sequence of cDNA for human glucose-6-phosphate dehydrogenase variant A(-).

    PubMed Central

    Hirono, A; Beutler, E

    1988-01-01

    Glucose-6-phosphate dehydrogenase (G6PD; D-glucose-6-phosphate:NADP+ oxidoreductase, EC 1.1.1.49) A(-) is a common variant in Blacks that causes sensitivity to drug- and infection-induced hemolytic anemia. A cDNA library was constructed from Epstein-Barr virus-transformed lymphoblastoid cells from a male who was G6PD A(-). One of four cDNA clones isolated contained a sequence not found in the other clones nor in the published cDNA sequence. Consisting of 138 bases and coding 46 amino acids, this segment of cDNA apparently is derived from the alternative splicing involving the 3' end of intron 7. Comparison of the remaining sequences of these clones with the published sequence revealed three nucleotide substitutions: C33→G, G202→A, and A376→G. Each change produces a new restriction site. Genomic DNA from five G6PD A(-) individuals was amplified by the polymerase chain reaction. The base substitution at position 376, identical to the substitution that has been reported in G6PD A(+), was present in all G6PD A(-) samples and none of the control G6PD B(+) samples examined. The substitution at position 202 was found in four of the five G6PD A(-) samples and no normal control sample. At position 33 guanine was found in all G6PD A(-) samples and seven G6PD B(+) control samples and is, presumably, the usual nucleotide found at this position. The finding of the same mutation in G6PD A(-) as is found in G6PD A(+) strongly suggests that the G6PD A(-) mutation arose in an individual with G6PD A(+), adding another mutation that causes the in vivo instability of this enzyme protein. PMID:2836867

  17. Next generation exome sequencing of paediatric inflammatory bowel disease patients identifies rare and novel variants in candidate genes

    PubMed Central

    Christodoulou, Katja; Wiskin, Anthony E; Gibson, Jane; Tapper, William; Willis, Claire; Afzal, Nadeem A; Upstill-Goddard, Rosanna; Holloway, John W; Simpson, Michael A; Beattie, R Mark; Collins, Andrew

    2013-01-01

    Background Multiple genes have been implicated by association studies in altering inflammatory bowel disease (IBD) predisposition. Paediatric patients often manifest more extensive disease and a particularly severe disease course. It is likely that genetic predisposition plays a more substantial role in this group. Objective To identify the spectrum of rare and novel variation in known IBD susceptibility genes using exome sequencing analysis in eight individual cases of childhood onset severe disease. Design DNA samples from the eight patients underwent targeted exome capture and sequencing. Data were processed through an analytical pipeline to align sequence reads, conduct quality checks, and identify and annotate variants where patient sequence differed from the reference sequence. For each patient, the entire complement of rare variation within strongly associated candidate genes was catalogued. Results Across the panel of 169 known IBD susceptibility genes, approximately 300 variants in 104 genes were found. Excluding splicing and HLA-class variants, 58 variants across 39 of these genes were classified as rare, with an alternative allele frequency of <5%, of which 17 were novel. Only two patients with early onset Crohn's disease exhibited rare deleterious variations within NOD2: the previously described R702W variant was the sole NOD2 variant in one patient, while the second patient also carried the L1007 frameshift insertion. Both patients harboured other potentially damaging mutations in the GSDMB, ERAP2 and SEC16A genes. The two patients severely affected with ulcerative colitis exhibited a distinct profile: both carried potentially detrimental variation in the BACH2 and IL10 genes not seen in other patients. Conclusion For each of the eight individuals studied, all non-synonymous, truncating and frameshift mutations across all known IBD genes were identified. A unique profile of rare and potentially damaging variants was evident for each patient with this

  18. Detection of Clinically Relevant Genetic Variants in Autism Spectrum Disorder by Whole-Genome Sequencing

    PubMed Central

    Jiang, Yong-hui; Yuen, Ryan K.C.; Jin, Xin; Wang, Mingbang; Chen, Nong; Wu, Xueli; Ju, Jia; Mei, Junpu; Shi, Yujian; He, Mingze; Wang, Guangbiao; Liang, Jieqin; Wang, Zhe; Cao, Dandan; Carter, Melissa T.; Chrysler, Christina; Drmic, Irene E.; Howe, Jennifer L.; Lau, Lynette; Marshall, Christian R.; Merico, Daniele; Nalpathamkalam, Thomas; Thiruvahindrapuram, Bhooma; Thompson, Ann; Uddin, Mohammed; Walker, Susan; Luo, Jun; Anagnostou, Evdokia; Zwaigenbaum, Lonnie; Ring, Robert H.; Wang, Jian; Lajonchere, Clara; Wang, Jun; Shih, Andy; Szatmari, Peter; Yang, Huanming; Dawson, Geraldine; Li, Yingrui; Scherer, Stephen W.

    2013-01-01

    Autism Spectrum Disorder (ASD) demonstrates high heritability and familial clustering, yet the genetic causes remain only partially understood as a result of extensive clinical and genomic heterogeneity. Whole-genome sequencing (WGS) shows promise as a tool for identifying ASD risk genes as well as unreported mutations in known loci, but an assessment of its full utility in an ASD group has not been performed. We used WGS to examine 32 families with ASD to detect de novo or rare inherited genetic variants predicted to be deleterious (loss-of-function and damaging missense mutations). Among ASD probands, we identified deleterious de novo mutations in six of 32 (19%) families and X-linked or autosomal inherited alterations in ten of 32 (31%) families (some had combinations of mutations). The proportion of families identified with such putative mutations was larger than has been previously reported; this yield was in part due to the comprehensive and uniform coverage afforded by WGS. Deleterious variants were found in four unrecognized, nine known, and eight candidate ASD risk genes. Examples include CAPRIN1 and AFF2 (both linked to FMR1, which is involved in fragile X syndrome), VIP (involved in social-cognitive deficits), and other genes such as SCN2A and KCNQ2 (linked to epilepsy), NRXN1, and CHD7, which causes ASD-associated CHARGE syndrome. Taken together, these results suggest that WGS and thorough bioinformatic analyses for de novo and rare inherited mutations will improve the detection of genetic variants likely to be associated with ASD or its accompanying clinical symptoms. PMID:23849776

  19. Predicting intrinsic disorder from amino acid sequence.

    PubMed

    Obradovic, Zoran; Peng, Kang; Vucetic, Slobodan; Radivojac, Predrag; Brown, Celeste J; Dunker, A Keith

    2003-01-01

    Blind predictions of intrinsic order and disorder were made on 42 proteins subsequently revealed to contain 9,044 ordered residues, 284 disordered residues in 26 segments of length 30 residues or less, and 281 disordered residues in 2 disordered segments of length greater than 30 residues. The accuracies of the six predictors used in this experiment ranged from 77% to 91% for the ordered regions and from 56% to 78% for the disordered segments. The average of the order and disorder predictions ranged from 73% to 77%. The prediction of disorder in the shorter segments was poor, from 25% to 66% correct, while the prediction of disorder in the longer segments was better, from 75% to 95% correct. Four of the predictors were composed of ensembles of neural networks. This enabled them to deal more efficiently with the large asymmetry in the training data through diversified sampling from the significantly larger ordered set and achieve better accuracy on ordered and long disordered regions. The exclusive use of long disordered regions for predictor training likely contributed to the disparity of the predictions on long versus short disordered regions, while averaging the output values over 61-residue windows to eliminate short predictions of order or disorder probably contributed to the even greater disparity for three of the predictors. This experiment supports the predictability of intrinsic disorder from amino acid sequence. PMID:14579347
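
    The effect of averaging predictor outputs over 61-residue windows, which the abstract suggests eliminated short predictions of order or disorder, can be illustrated with a minimal sketch. It assumes a centred window that is truncated at the sequence ends and a 0.5 decision threshold; neither detail is stated in the abstract, and the scores below are hypothetical.

```python
def window_average(scores, window=61):
    """Smooth per-residue scores with a centred moving average.

    Assumption: the window is truncated at the sequence ends, so
    terminal residues are averaged over fewer than `window` values.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(scores)):
        lo = max(0, i - half)
        hi = min(len(scores), i + half + 1)
        segment = scores[lo:hi]
        smoothed.append(sum(segment) / len(segment))
    return smoothed


# Hypothetical raw scores: a 5-residue disordered stretch (score 1.0)
# inside an otherwise ordered 100-residue protein (score 0.0).
raw = [0.0] * 100
for position in range(50, 55):
    raw[position] = 1.0

smoothed = window_average(raw)
```

    With these inputs the smoothed score at the centre of the short stretch is 5/61, well below the assumed 0.5 threshold, so the short segment would no longer be called disordered.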

  20. Whole Exome Sequencing of Distant Relatives in Multiplex Families Implicates Rare Variants in Candidate Genes for Oral Clefts

    PubMed Central

    Bureau, Alexandre; Parker, Margaret M.; Ruczinski, Ingo; Taub, Margaret A.; Marazita, Mary L.; Murray, Jeffrey C.; Mangold, Elisabeth; Noethen, Markus M.; Ludwig, Kirsten U.; Hetmanski, Jacqueline B.; Bailey-Wilson, Joan E.; Cropp, Cheryl D.; Li, Qing; Szymczak, Silke; Albacha-Hejazi, Hasan; Alqosayer, Khalid; Field, L. Leigh; Wu-Chou, Yah-Huei; Doheny, Kimberly F.; Ling, Hua; Scott, Alan F.; Beaty, Terri H.

    2014-01-01

    A dozen genes/regions have been confirmed as genetic risk factors for oral clefts in human association and linkage studies, and animal models argue even more genes may be involved. Genomic sequencing studies should identify specific causal variants and may reveal additional genes as influencing risk to oral clefts, which have a complex and heterogeneous etiology. We conducted a whole exome sequencing (WES) study to search for potentially causal variants using affected relatives drawn from multiplex cleft families. Two or three affected second, third, and higher degree relatives from 55 multiplex families were sequenced. We examined rare single nucleotide variants (SNVs) shared by affected relatives in 348 recognized candidate genes. Exact probabilities that affected relatives would share these rare variants were calculated, given pedigree structures, and corrected for the number of variants tested. Five novel and potentially damaging SNVs shared by affected distant relatives were found and confirmed by Sanger sequencing. One damaging SNV in CDH1, shared by three affected second cousins from a single family, attained statistical significance (P = 0.02 after correcting for multiple tests). Family-based designs such as the one used in this WES study offer important advantages for identifying genes likely to be causing complex and heterogeneous disorders. PMID:24793288

  1. Direct mutation analysis by high-throughput sequencing: from germline to low-abundant, somatic variants

    PubMed Central

    Gundry, Michael; Vijg, Jan

    2011-01-01

    DNA mutations are the source of genetic variation within populations. The majority of mutations with observable effects are deleterious. In humans mutations in the germ line can cause genetic disease. In somatic cells multiple rounds of mutations and selection lead to cancer. The study of genetic variation has progressed rapidly since the completion of the draft sequence of the human genome. Recent advances in sequencing technology, most importantly the introduction of massively parallel sequencing (MPS), have resulted in more than a hundred-fold reduction in the time and cost required for sequencing nucleic acids. These improvements have greatly expanded the use of sequencing as a practical tool for mutation analysis. While in the past the high cost of sequencing limited mutation analysis to selectable markers or small forward mutation targets assumed to be representative for the genome overall, current platforms allow whole genome sequencing for less than $5,000. This has already given rise to direct estimates of germline mutation rates in multiple organisms including humans by comparing whole genome sequences between parents and offspring. Here we present a brief history of the field of mutation research, with a focus on classical tools for the measurement of mutation rates. We then review MPS, how it is currently applied and the new insight into human and animal mutation frequencies and spectra that has been obtained from whole genome sequencing. While great progress has been made, we note that the single most important limitation of current MPS approaches for mutation analysis is the inability to address low-abundance mutations that turn somatic tissues into mosaics of cells. Such mutations are at the basis of intra-tumor heterogeneity, with important implications for clinical diagnosis, and could also contribute to somatic diseases other than cancer, including aging. Some possible approaches to gain access to low-abundance mutations are discussed, with a

  2. elPrep: High-Performance Preparation of Sequence Alignment/Map Files for Variant Calling

    PubMed Central

    Decap, Dries; Fostier, Jan; Reumers, Joke

    2015-01-01

    elPrep is a high-performance tool for preparing sequence alignment/map files for variant calling in sequencing pipelines. It can be used as a replacement for SAMtools and Picard for preparation steps such as filtering, sorting, marking duplicates, reordering contigs, and so on, while producing identical results. What sets elPrep apart is its software architecture that allows executing preparation pipelines by making only a single pass through the data, no matter how many preparation steps are used in the pipeline. elPrep is designed as a multithreaded application that runs entirely in memory, avoids repeated file I/O, and merges the computation of several preparation steps to significantly speed up the execution time. For example, for a preparation pipeline of five steps on a whole-exome BAM file (NA12878), we reduce the execution time from about 1:40 hours, when using a combination of SAMtools and Picard, to about 15 minutes when using elPrep, while utilising the same server resources, here 48 threads and 23GB of RAM. For the same pipeline on whole-genome data (NA12878), elPrep reduces the runtime from 24 hours to less than 5 hours. As a typical clinical study may contain sequencing data for hundreds of patients, elPrep can remove several hundreds of hours of computing time, and thus substantially reduce analysis time and cost. PMID:26182406

  3. Whole-Exome Sequencing Links a Variant in DHDDS to Retinitis Pigmentosa

    PubMed Central

    Züchner, Stephan; Dallman, Julia; Wen, Rong; Beecham, Gary; Naj, Adam; Farooq, Amjad; Kohli, Martin A.; Whitehead, Patrice L.; Hulme, William; Konidari, Ioanna; Edwards, Yvonne J.K.; Cai, Guiqing; Peter, Inga; Seo, David; Buxbaum, Joseph D.; Haines, Jonathan L.; Blanton, Susan; Young, Juan; Alfonso, Eduardo; Vance, Jeffery M.; Lam, Byron L.; Peričak-Vance, Margaret A.

    2011-01-01

    Increasingly, mutations in genes causing Mendelian disease will be supported by individual and small families only; however, exome sequencing studies have thus far focused on syndromic phenotypes characterized by low locus heterogeneity. In contrast, retinitis pigmentosa (RP) is caused by >50 known genes, which still explain only half of the clinical cases. In a single, one-generation, nonsyndromic RP family, we have identified a gene, dehydrodolichol diphosphate synthase (DHDDS), demonstrating the power of combining whole-exome sequencing with rapid in vivo studies. DHDDS is a highly conserved essential enzyme for dolichol synthesis, permitting global N-linked glycosylation. Zebrafish studies showed virtually identical photoreceptor defects as observed with N-linked glycosylation-interfering mutations in the light-sensing protein rhodopsin. The identified Lys42Glu variant likely arose from an ancestral founder, because eight of the nine identified alleles in 27,174 control chromosomes were of confirmed Ashkenazi Jewish ethnicity. These findings demonstrate the power of exome sequencing linked to functional studies when faced with challenging study designs and, importantly, link RP to the pathways of N-linked glycosylation, which promise new avenues for therapeutic interventions. PMID:21295283

  4. Characterization of functional methylomes by next-generation capture sequencing identifies novel disease-associated variants

    PubMed Central

    Allum, Fiona; Shao, Xiaojian; Guénard, Frédéric; Simon, Marie-Michelle; Busche, Stephan; Caron, Maxime; Lambourne, John; Lessard, Julie; Tandre, Karolina; Hedman, Åsa K.; Kwan, Tony; Ge, Bing; Rönnblom, Lars; McCarthy, Mark I.; Deloukas, Panos; Richmond, Todd; Burgess, Daniel; Spector, Timothy D.; Tchernof, André; Marceau, Simon; Lathrop, Mark; Vohl, Marie-Claude; Pastinen, Tomi; Grundberg, Elin; Ahmadi, Kourosh R.; Ainali, Chrysanthi; Barrett, Amy; Bataille, Veronique; Bell, Jordana T.; Buil, Alfonso; Dermitzakis, Emmanouil T.; Dimas, Antigone S.; Durbin, Richard; Glass, Daniel; Hassanali, Neelam; Ingle, Catherine; Knowles, David; Krestyaninova, Maria; Lindgren, Cecilia M.; Lowe, Christopher E.; Meduri, Eshwar; di Meglio, Paola; Min, Josine L.; Montgomery, Stephen B.; Nestle, Frank O.; Nica, Alexandra C.; Nisbet, James; O'Rahilly, Stephen; Parts, Leopold; Potter, Simon; Sandling, Johanna; Sekowska, Magdalena; Shin, So-Youn; Small, Kerrin S.; Soranzo, Nicole; Surdulescu, Gabriela; Travers, Mary E.; Tsaprouni, Loukia; Tsoka, Sophia; Wilk, Alicja; Yang, Tsun-Po; Zondervan, Krina T.

    2015-01-01

    Most genome-wide methylation studies (EWAS) of multifactorial disease traits use targeted arrays or enrichment methodologies preferentially covering CpG-dense regions, to characterize sufficiently large samples. To overcome this limitation, we present here a new customizable, cost-effective approach, methylC-capture sequencing (MCC-Seq), for sequencing functional methylomes, while simultaneously providing genetic variation information. To illustrate MCC-Seq, we use whole-genome bisulfite sequencing on adipose tissue (AT) samples and public databases to design AT-specific panels. We establish its efficiency for high-density interrogation of methylome variability by systematic comparisons with other approaches and demonstrate its applicability by identifying novel methylation variation within enhancers strongly correlated to plasma triglyceride and HDL-cholesterol, including at CD36. Our more comprehensive AT panel assesses tissue methylation and genotypes in parallel at ∼4 and ∼3 M sites, respectively. Our study demonstrates that MCC-Seq provides comparable accuracy to alternative approaches but enables more efficient cataloguing of functional and disease-relevant epigenetic and genetic variants for large-scale EWAS. PMID:26021296

  5. Identification of Genome-Wide Variants and Discovery of Variants Associated with Brassica rapa Clubroot Resistance Gene Rcr1 through Bulked Segregant RNA Sequencing.

    PubMed

    Yu, Fengqun; Zhang, Xingguo; Huang, Zhen; Chu, Mingguang; Song, Tao; Falk, Kevin C; Deora, Abhinandan; Chen, Qilin; Zhang, Yan; McGregor, Linda; Gossen, Bruce D; McDonald, Mary Ruth; Peng, Gary

    2016-01-01

    Clubroot, caused by Plasmodiophora brassicae, is an important disease of Brassica species worldwide. A clubroot resistance gene, Rcr1, with efficacy against pathotype 3 of P. brassicae, was previously mapped to chromosome A03 of B. rapa in pak choy cultivar "Flower Nabana". In the current study, resistance to pathotypes 2, 5 and 6 was shown to be associated with the Rcr1 region on chromosome A03. Bulked segregant RNA sequencing was performed and short read sequences were assembled into 10 chromosomes of the B. rapa reference genome v1.5. For the resistant (R) bulks, a total of 351.8 million (M) sequences, 30,836.5 million bases (Mb) in length, produced 120-fold coverage of the reference genome. For the susceptible (S) bulks, 322.9 M sequences, 28,216.6 Mb in length, produced 109-fold coverage. In total, 776.2 K single nucleotide polymorphisms (SNPs) and 122.2 K insertions/deletions (InDels) in R bulks and 762.8 K SNPs and 118.7 K InDels in S bulks were identified; each chromosome had about 87% SNPs and 13% InDels, with 78% monomorphic and 22% polymorphic variants between the R and S bulks. Polymorphic variants on each chromosome were usually below 23%, but made up 34% of the variants on chromosome A03. There were 35 genes annotated in the Rcr1 target region and variants were identified in 21 genes. The numbers of polymorphic variants differed significantly among the genes. Four of them encode Toll-Interleukin-1 receptor / nucleotide-binding site / leucine-rich-repeat proteins; Bra019409 and Bra019410 harbored the higher numbers of polymorphic variants, which indicates that they are more likely candidates of Rcr1. Fourteen SNP markers in the target region were genotyped using the Kompetitive Allele Specific PCR method and were confirmed to associate with Rcr1. Selected SNP markers were analyzed with 26 recombinants obtained from a segregating population consisting of 1587 plants, indicating that they were completely linked to Rcr1. Nine SNP markers were used for marker

  6. Identification of Genome-Wide Variants and Discovery of Variants Associated with Brassica rapa Clubroot Resistance Gene Rcr1 through Bulked Segregant RNA Sequencing

    PubMed Central

    Yu, Fengqun; Zhang, Xingguo; Huang, Zhen; Chu, Mingguang; Song, Tao; Falk, Kevin C.; Deora, Abhinandan; Chen, Qilin; Zhang, Yan; McGregor, Linda; Gossen, Bruce D.; McDonald, Mary Ruth; Peng, Gary

    2016-01-01

    Clubroot, caused by Plasmodiophora brassicae, is an important disease of Brassica species worldwide. A clubroot resistance gene, Rcr1, with efficacy against pathotype 3 of P. brassicae, was previously mapped to chromosome A03 of B. rapa in pak choy cultivar “Flower Nabana”. In the current study, resistance to pathotypes 2, 5 and 6 was shown to be associated with the Rcr1 region on chromosome A03. Bulked segregant RNA sequencing was performed and short read sequences were assembled into 10 chromosomes of the B. rapa reference genome v1.5. For the resistant (R) bulks, a total of 351.8 million (M) sequences, 30,836.5 million bases (Mb) in length, produced 120-fold coverage of the reference genome. For the susceptible (S) bulks, 322.9 M sequences, 28,216.6 Mb in length, produced 109-fold coverage. In total, 776.2 K single nucleotide polymorphisms (SNPs) and 122.2 K insertions/deletions (InDels) in R bulks and 762.8 K SNPs and 118.7 K InDels in S bulks were identified; each chromosome had about 87% SNPs and 13% InDels, with 78% monomorphic and 22% polymorphic variants between the R and S bulks. Polymorphic variants on each chromosome were usually below 23%, but made up 34% of the variants on chromosome A03. There were 35 genes annotated in the Rcr1 target region and variants were identified in 21 genes. The numbers of polymorphic variants differed significantly among the genes. Four of them encode Toll-Interleukin-1 receptor / nucleotide-binding site / leucine-rich-repeat proteins; Bra019409 and Bra019410 harbored the higher numbers of polymorphic variants, which indicates that they are more likely candidates of Rcr1. Fourteen SNP markers in the target region were genotyped using the Kompetitive Allele Specific PCR method and were confirmed to associate with Rcr1. Selected SNP markers were analyzed with 26 recombinants obtained from a segregating population consisting of 1587 plants, indicating that they were completely linked to Rcr1. Nine SNP markers were used for marker

  7. High-Throughput Sequencing Reveals Single Nucleotide Variants in Longer-Kernel Bread Wheat

    PubMed Central

    Chen, Feng; Zhu, Zibo; Zhou, Xiaobian; Yan, Yan; Dong, Zhongdong; Cui, Dangqun

    2016-01-01

    The transcriptomes of bread wheat Yunong 201 and its ethyl methanesulfonate derivative Yunong 3114 were obtained by next-generation sequencing technology. Single nucleotide variants (SNVs) in the wheat strains were explored and compared. A total of 5907 and 6287 non-synonymous SNVs were acquired for Yunong 201 and 3114, respectively. A total of 4021 genes with SNVs were obtained. The genes carrying non-synonymous SNVs were significantly involved in ATP binding, protein phosphorylation, and cellular protein metabolic process. The heat map analysis also indicated that most of these mutant genes were significantly differentially expressed at different developmental stages. The SNVs in these genes possibly contribute to the longer kernel length of Yunong 3114. Our data provide useful information on the wheat transcriptome for future studies on wheat functional genomics. This study could also help in illustrating the gene functions of the non-synonymous SNVs of Yunong 201 and 3114. PMID:27551288

  8. Dual-color detection of DNA sequence variants by ligase-mediated analysis

    SciTech Connect

    Samiotaki, M.; Kwiatkowski, M.; Parik, J.; Landegren, U.

    1994-03-15

    Genetic screening for sequence variants associated with disease is assuming increasing importance in clinical medicine as well as in research. The authors describe an efficient method for such analyses, comprising a combination of practical features: (1) Amplified DNA samples are analyzed for their ability to serve as templates in standardized allele-specific ligation reactions between oligonucleotide probes; (2) Two allele-specific probes, differentially labeled with either of two lanthanide labels, compete for ligation to a third oligonucleotide (the signal from the two labeled probes can thus be directly compared in a sensitive time-resolved fluorescence detection reaction); and (3) Large sets of analyses are processed in parallel using a 96-pin capture manifold, serving to reduce pipetting steps and the risk of contamination. The authors present here the basis of the technique and its application to the screening for two common mutations causing cystic fibrosis and α1-antitrypsin deficiency. 19 refs., 4 figs.
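    Because the two labeled probes compete for the same ligation partner, their signals can be compared directly. The following Python sketch shows one way such a comparison could be turned into a genotype call. The log-ratio rule, the `het_window` threshold, and the function name `call_genotype` are illustrative assumptions, not the authors' published procedure, and background subtraction is ignored.

```python
import math


def call_genotype(signal_a: float, signal_b: float, het_window: float = 1.0) -> str:
    """Classify a sample from the two allele-specific label signals.

    Assumption: a log2 ratio beyond +/- het_window indicates that only one
    allele-specific probe was ligated; anything in between is treated as
    heterozygous. The threshold is hypothetical and would need calibration.
    """
    if signal_a <= 0 or signal_b <= 0:
        raise ValueError("signals must be positive")
    log_ratio = math.log2(signal_a / signal_b)
    if log_ratio > het_window:
        return "homozygous A"
    if log_ratio < -het_window:
        return "homozygous B"
    return "heterozygous"


# Made-up signal values for three samples (arbitrary fluorescence units).
calls = [
    call_genotype(1000.0, 50.0),
    call_genotype(40.0, 900.0),
    call_genotype(500.0, 450.0),
]
```

    In practice the window would be set from control samples of known genotype rather than fixed in advance.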

  9. High-Throughput Sequencing Reveals Single Nucleotide Variants in Longer-Kernel Bread Wheat.

    PubMed

    Chen, Feng; Zhu, Zibo; Zhou, Xiaobian; Yan, Yan; Dong, Zhongdong; Cui, Dangqun

    2016-01-01

    The transcriptomes of bread wheat Yunong 201 and its ethyl methanesulfonate derivative Yunong 3114 were obtained by next-generation sequencing technology. Single nucleotide variants (SNVs) in the wheat strains were explored and compared. A total of 5907 and 6287 non-synonymous SNVs were acquired for Yunong 201 and 3114, respectively. A total of 4021 genes with SNVs were obtained. The genes carrying non-synonymous SNVs were significantly involved in ATP binding, protein phosphorylation, and cellular protein metabolic process. The heat map analysis also indicated that most of these mutant genes were significantly differentially expressed at different developmental stages. The SNVs in these genes possibly contribute to the longer kernel length of Yunong 3114. Our data provide useful information on the wheat transcriptome for future studies on wheat functional genomics. This study could also help in illustrating the gene functions of the non-synonymous SNVs of Yunong 201 and 3114. PMID:27551288

  10. HapFABIA: Identification of very short segments of identity by descent characterized by rare variants in large sequencing data

    PubMed Central

    Hochreiter, Sepp

    2013-01-01

    Identity by descent (IBD) can be reliably detected for long shared DNA segments, which are found in related individuals. However, many studies contain cohorts of unrelated individuals that share only short IBD segments. New sequencing technologies facilitate identification of short IBD segments through rare variants, which convey more information on IBD than common variants. Current IBD detection methods, however, are not designed to use rare variants for the detection of short IBD segments. Short IBD segments reveal genetic structures at high resolution. Therefore, they can help to improve imputation and phasing, to increase genotyping accuracy for low-coverage sequencing and to increase the power of association studies. Since short IBD segments are further assumed to be old, they can shed light on the evolutionary history of humans. We propose HapFABIA, a computational method that applies biclustering to identify very short IBD segments characterized by rare variants. HapFABIA is designed to detect short IBD segments in genotype data that were obtained from next-generation sequencing, but can also be applied to DNA microarray data. Especially in next-generation sequencing data, HapFABIA exploits rare variants for IBD detection. HapFABIA significantly outperformed competing algorithms at detecting short IBD segments on artificial and simulated data with rare variants. HapFABIA identified 160 588 different short IBD segments characterized by rare variants with a median length of 23 kb (mean 24 kb) in data for chromosome 1 of the 1000 Genomes Project. These short IBD segments contain 752 000 single nucleotide variants (SNVs), which account for 39% of the rare variants and 23.5% of all variants. The vast majority—152 000 IBD segments—are shared by Africans, while only 19 000 and 11 000 are shared by Europeans and Asians, respectively. IBD segments that match the Denisova or the Neandertal genome are found significantly more often in Asians and Europeans but also

  11. HapFABIA: identification of very short segments of identity by descent characterized by rare variants in large sequencing data.

    PubMed

    Hochreiter, Sepp

    2013-12-01

    Identity by descent (IBD) can be reliably detected for long shared DNA segments, which are found in related individuals. However, many studies contain cohorts of unrelated individuals that share only short IBD segments. New sequencing technologies facilitate identification of short IBD segments through rare variants, which convey more information on IBD than common variants. Current IBD detection methods, however, are not designed to use rare variants for the detection of short IBD segments. Short IBD segments reveal genetic structures at high resolution. Therefore, they can help to improve imputation and phasing, to increase genotyping accuracy for low-coverage sequencing and to increase the power of association studies. Since short IBD segments are further assumed to be old, they can shed light on the evolutionary history of humans. We propose HapFABIA, a computational method that applies biclustering to identify very short IBD segments characterized by rare variants. HapFABIA is designed to detect short IBD segments in genotype data that were obtained from next-generation sequencing, but can also be applied to DNA microarray data. Especially in next-generation sequencing data, HapFABIA exploits rare variants for IBD detection. HapFABIA significantly outperformed competing algorithms at detecting short IBD segments on artificial and simulated data with rare variants. HapFABIA identified 160 588 different short IBD segments characterized by rare variants with a median length of 23 kb (mean 24 kb) in data for chromosome 1 of the 1000 Genomes Project. These short IBD segments contain 752 000 single nucleotide variants (SNVs), which account for 39% of the rare variants and 23.5% of all variants. The vast majority, 152 000 IBD segments, are shared by Africans, while only 19 000 and 11 000 are shared by Europeans and Asians, respectively. IBD segments that match the Denisova or the Neandertal genome are found significantly more often in Asians and Europeans but also, in

  12. Mapping whole genome shotgun sequence and variant calling in mammalian species without their reference genomes.

    PubMed

    Kalbfleisch, Ted; Heaton, Michael P

    2013-01-01

    Genomics research in mammals has produced reference genome sequences that are essential for identifying variation associated with disease. High quality reference genome sequences are now available for humans, model species, and economically important agricultural animals. Comparisons between these species have provided unique insights into mammalian gene function. However, the number of species with reference genomes is small compared to those needed for studying molecular evolutionary relationships in the tree of life. For example, among the even-toed ungulates there are approximately 300 species whose phylogenetic relationships have been calculated in the 10k trees project. Only six of these have reference genomes: cattle, swine, sheep, goat, water buffalo, and bison. Although reference sequences will eventually be developed for additional hoof stock, the resources in terms of time, money, infrastructure and expertise required to develop a quality reference genome may be unattainable for most species for at least another decade. In this work we mapped 35 Gb of next generation sequence data of a Katahdin sheep to its own species' reference genome (Ovis aries Oar3.1) and to that of a species that diverged 15 to 30 million years ago (Bos taurus UMD3.1). In total, 56% of reads covered 76% of UMD3.1 to an average depth of 6.8 reads per site; 83 million variants were identified, of which 78 million were homozygous and likely represent interspecies nucleotide differences. Excluding repeat regions and sex chromosomes, nearly 3.7 million heterozygous sites were identified in this animal vs. bovine UMD3.1, representing polymorphisms occurring in sheep. Of these, 41% could be readily mapped to orthologous positions in ovine Oar3.1 with 80% corroborated as heterozygous. These variant sites, identified via interspecies mapping could be used for comparative genomics, disease association studies, and ultimately to understand mammalian gene

  13. Methods and compositions for efficient nucleic acid sequencing

    DOEpatents

    Drmanac, Radoje

    2002-01-01

    Disclosed are novel methods and compositions for rapid and highly efficient nucleic acid sequencing based upon hybridization with two sets of small oligonucleotide probes of known sequences. Extremely large nucleic acid molecules, including chromosomes and non-amplified RNA, may be sequenced without prior cloning or subcloning steps. The methods of the invention also solve various current problems associated with sequencing technology such as, for example, high noise to signal ratios and difficult discrimination, attaching many nucleic acid fragments to a surface, preparing many, longer or more complex probes and labelling more species.

  14. Methods and compositions for efficient nucleic acid sequencing

    DOEpatents

    Drmanac, Radoje

    2006-07-04

    Disclosed are novel methods and compositions for rapid and highly efficient nucleic acid sequencing based upon hybridization with two sets of small oligonucleotide probes of known sequences. Extremely large nucleic acid molecules, including chromosomes and non-amplified RNA, may be sequenced without prior cloning or subcloning steps. The methods of the invention also solve various current problems associated with sequencing technology such as, for example, high noise to signal ratios and difficult discrimination, attaching many nucleic acid fragments to a surface, preparing many, longer or more complex probes and labelling more species.

  15. Kit for detecting nucleic acid sequences using competitive hybridization probes

    DOEpatents

    Lucas, Joe N.; Straume, Tore; Bogen, Kenneth T.

    2001-01-01

    A kit is provided for detecting a target nucleic acid sequence in a sample, the kit comprising: a first hybridization probe which includes a nucleic acid sequence that is sufficiently complementary to selectively hybridize to a first portion of the target sequence, the first hybridization probe including a first complexing agent for forming a binding pair with a second complexing agent; and a second hybridization probe which includes a nucleic acid sequence that is sufficiently complementary to selectively hybridize to a second portion of the target sequence to which the first hybridization probe does not selectively hybridize, the second hybridization probe including a detectable marker; a third hybridization probe which includes a nucleic acid sequence that is sufficiently complementary to selectively hybridize to a first portion of the target sequence, the third hybridization probe including the same detectable marker as the second hybridization probe; and a fourth hybridization probe which includes a nucleic acid sequence that is sufficiently complementary to selectively hybridize to a second portion of the target sequence to which the third hybridization probe does not selectively hybridize, the fourth hybridization probe including the first complexing agent for forming a binding pair with the second complexing agent; wherein the first and second hybridization probes are capable of simultaneously hybridizing to the target sequence and the third and fourth hybridization probes are capable of simultaneously hybridizing to the target sequence, the detectable marker is not present on the first or fourth hybridization probes and the first, second, third, and fourth hybridization probes each include a competitive nucleic acid sequence which is sufficiently complementary to a third portion of the target sequence that the competitive sequences of the first, second, third, and fourth hybridization probes compete with each other to hybridize to the third portion of the

  16. Prioritization Of Nonsynonymous Single Nucleotide Variants For Exome Sequencing Studies Via Integrative Learning On Multiple Genomic Data

    PubMed Central

    Wu, Mengmeng; Wu, Jiaxin; Chen, Ting; Jiang, Rui

    2015-01-01

    The rapid advancement of next generation sequencing technology has greatly accelerated progress in understanding human inherited diseases via such innovations as exome sequencing. Nevertheless, the identification of causative variants from sequencing data remains a great challenge. Traditional statistical genetics approaches such as linkage analysis and association studies have limited power in analyzing exome sequencing data, while relying on simple filtration strategies and predicted functional implications of mutations to pinpoint pathogenic variants is prone to producing false positives. To overcome these limitations, we herein propose a supervised learning approach, termed snvForest, to prioritize candidate nonsynonymous single nucleotide variants for a specific type of disease by integrating 11 functional scores at the variant level and 8 association scores at the gene level. We conduct a series of large-scale in silico validation experiments, demonstrating the effectiveness of snvForest across 2,511 diseases of different inheritance styles and the superiority of our approach over two state-of-the-art methods. We further apply snvForest to three real exome sequencing data sets of epileptic encephalopathies and intellectual disability to show the ability of our approach to identify causative de novo mutations for these complex diseases. The online service and standalone software of snvForest are available at http://bioinfo.au.tsinghua.edu.cn/jianglab/snvforest. PMID:26459872

  17. Prioritization Of Nonsynonymous Single Nucleotide Variants For Exome Sequencing Studies Via Integrative Learning On Multiple Genomic Data.

    PubMed

    Wu, Mengmeng; Wu, Jiaxin; Chen, Ting; Jiang, Rui

    2015-01-01

    The rapid advancement of next generation sequencing technology has greatly accelerated progress in understanding human inherited diseases via such innovations as exome sequencing. Nevertheless, the identification of causative variants from sequencing data remains a great challenge. Traditional statistical genetics approaches such as linkage analysis and association studies have limited power in analyzing exome sequencing data, while relying on simple filtration strategies and predicted functional implications of mutations to pinpoint pathogenic variants is prone to producing false positives. To overcome these limitations, we herein propose a supervised learning approach, termed snvForest, to prioritize candidate nonsynonymous single nucleotide variants for a specific type of disease by integrating 11 functional scores at the variant level and 8 association scores at the gene level. We conduct a series of large-scale in silico validation experiments, demonstrating the effectiveness of snvForest across 2,511 diseases of different inheritance styles and the superiority of our approach over two state-of-the-art methods. We further apply snvForest to three real exome sequencing data sets of epileptic encephalopathies and intellectual disability to show the ability of our approach to identify causative de novo mutations for these complex diseases. The online service and standalone software of snvForest are available at http://bioinfo.au.tsinghua.edu.cn/jianglab/snvforest. PMID:26459872

  18. Introduction to Deep Sequencing and Its Application to Drug Addiction Research with a Focus on Rare Variants

    PubMed Central

    Wang, Shaolin; Yang, Zhongli; Ma, Jennie Z.; Payne, Thomas J.; Li, Ming D.

    2013-01-01

    Through linkage analysis, candidate gene approach, and genome-wide association studies (GWAS), many genetic susceptibility factors for substance dependence have been discovered, such as the aldehyde dehydrogenase gene (ALDH2) for alcohol dependence (AD) and nicotinic acetylcholine receptor (nAChR) subunit variants on chromosomes 8 and 15 for nicotine dependence (ND). However, these confirmed genetic factors contribute only a small portion of the heritability responsible for each addiction. Among many potential factors, rare variants in those identified and unidentified susceptibility genes are supposed to contribute greatly to the missing heritability. Several studies focusing on rare variants have been conducted by taking advantage of next-generation sequencing technologies, which revealed that some rare variants of nAChR subunits are associated with ND in both genetic and functional studies. However, these studies investigated variants for only a small number of genes and need to be expanded to broad regions/genes in a larger population. This review presents an update on recently developed methods for rare-variant identification and association analysis and on studies focused on rare-variant discovery and function related to addictions. PMID:23990377

  19. Introduction to deep sequencing and its application to drug addiction research with a focus on rare variants.

    PubMed

    Wang, Shaolin; Yang, Zhongli; Ma, Jennie Z; Payne, Thomas J; Li, Ming D

    2014-02-01

    Through linkage analysis, candidate gene approach, and genome-wide association studies (GWAS), many genetic susceptibility factors for substance dependence have been discovered, such as the aldehyde dehydrogenase gene (ALDH2) for alcohol dependence (AD) and nicotinic acetylcholine receptor (nAChR) subunit variants on chromosomes 8 and 15 for nicotine dependence (ND). However, these confirmed genetic factors contribute only a small portion of the heritability responsible for each addiction. Among many potential factors, rare variants in those identified and unidentified susceptibility genes are supposed to contribute greatly to the missing heritability. Several studies focusing on rare variants have been conducted by taking advantage of next-generation sequencing technologies, which revealed that some rare variants of nAChR subunits are associated with ND in both genetic and functional studies. However, these studies investigated variants for only a small number of genes and need to be expanded to broad regions/genes in a larger population. This review presents an update on recently developed methods for rare-variant identification and association analysis and on studies focused on rare-variant discovery and function related to addictions. PMID:23990377

  20. Evolutionary balancing is critical for correctly forecasting disease-associated amino acid variants.

    PubMed

    Liu, Li; Kumar, Sudhir

    2013-06-01

    Computational predictions have become indispensable for evaluating the disease-related impact of nonsynonymous single-nucleotide variants discovered in exome sequencing. Many such methods have their roots in molecular evolution, as they use information derived from multiple sequence alignments. We show that the performance of current methods (e.g., PolyPhen-2 and SIFT) is improved significantly by optimizing their statistical models on evolutionarily balanced training data, where equal numbers of positive and negative controls within each evolutionary conservation class are used. Evolutionary balancing significantly reduces the false-positive rates for variants observed at highly conserved sites and false-negative rates for variants observed at fast evolving sites. Use of these improved methods enables more accurate forecasting when concordant diagnosis from multiple methods is regarded as a more reliable indicator of the prediction. Applied to a large exome variation data set, we find that the current methods produce concordant predictions for less than half of the population variants. These advances are implemented in a web resource for use in practical applications (www.mypeg.info, last accessed March 13, 2013). PMID:23462317
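    The balancing step described in this abstract, equal numbers of positive and negative controls within each evolutionary conservation class, can be sketched as a stratified downsampling routine. This is an illustrative Python sketch only: the tuple layout, the class labels "high" and "low", and the function name `balance_by_class` are assumptions, and the authors' actual training procedure may differ.

```python
import random
from collections import defaultdict


def balance_by_class(variants, seed=0):
    """Downsample so each conservation class has equal positives and negatives.

    variants: iterable of (variant_id, conservation_class, is_disease) tuples.
    Assumption: balancing is done by random downsampling of the larger group.
    """
    rng = random.Random(seed)
    groups = defaultdict(lambda: {True: [], False: []})
    for variant in variants:
        _, cls, is_disease = variant
        groups[cls][is_disease].append(variant)

    balanced = []
    for cls in sorted(groups):
        positives, negatives = groups[cls][True], groups[cls][False]
        n = min(len(positives), len(negatives))
        balanced.extend(rng.sample(positives, n))
        balanced.extend(rng.sample(negatives, n))
    return balanced


# Made-up training set: three disease and one neutral variant at highly
# conserved sites, one disease and two neutral variants at fast-evolving sites.
training = [
    ("v1", "high", True), ("v2", "high", True), ("v3", "high", True),
    ("v4", "high", False),
    ("v5", "low", True),
    ("v6", "low", False), ("v7", "low", False),
]
balanced = balance_by_class(training)
```

    A classifier trained on `balanced` no longer sees disease variants over-represented at conserved sites or neutral variants over-represented at fast-evolving sites, which is the bias the abstract says balancing reduces.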

  1. Sequence analysis of three pigmentation genes in the Newfoundland population of Canis latrans links the Golden Retriever Mc1r variant to white coat color in coyotes.

    PubMed

    Brockerville, Ryan M; McGrath, Michael J; Pilgrim, Brettney L; Marshall, H Dawn

    2013-04-01

    Three genes, Mc1r, Agouti, and CBD103, interact in a type-switching process that controls much of the pigmentation variation observed in mammals. A deletion in the CBD103 gene is responsible for dominant black color in dogs, while the white-phased black bear ("spirit bear") of British Columbia, Canada, is the lightest documented color variant caused by a mutation in Mc1r. Rare all-white animals have recently been discovered in a new northeastern population of the coyote in insular Newfoundland and Labrador, Canada. To investigate the causative gene and mutation of white coat in coyotes, we sequenced the three type-switching genes in white and dark-phased animals from Newfoundland. The only sequence variants unambiguously associated with white color were in Mc1r, and one of these variants causes the amino acid variant R306Ter, a premature stop codon also linked to coat color in Golden Retrievers and other dogs with yellow/red coats. The allele carrying R306Ter in coyotes matches that in the Golden Retriever at other variable amino acid sites and hence may have originated in these dogs. Coyotes experienced introgression with wolves and dogs as they colonized northeastern North America, and coyote/Golden Retriever interactions have been observed in Newfoundland. We speculate that natural selection, with or without a founder effect, may contribute to the observed frequency of white coyotes in Newfoundland, as it has contributed to the high frequency of white bears, and of a domestic dog-derived CBD allele in gray wolves. PMID:23297074

  2. Solid phase sequencing of double-stranded nucleic acids

    DOEpatents

    Fu, Dong-Jing; Cantor, Charles R.; Koster, Hubert; Smith, Cassandra L.

    2002-01-01

    This invention relates to methods for detecting and sequencing of target double-stranded nucleic acid sequences, to nucleic acid probes and arrays of probes useful in these methods, and to kits and systems which contain these probes. Useful methods involve hybridizing the nucleic acids or nucleic acids which represent complementary or homologous sequences of the target to an array of nucleic acid probes. These probes comprise a single-stranded portion, an optional double-stranded portion and a variable sequence within the single-stranded portion. The molecular weights of the hybridized nucleic acids of the set can be determined by mass spectroscopy, and the sequence of the target determined from the molecular weights of the fragments. Nucleic acids whose sequences can be determined include nucleic acids in biological samples such as patient biopsies and environmental samples. Probes may be fixed to a solid support such as a hybridization chip to facilitate automated determination of molecular weights and identification of the target sequence.

  3. Analysis and Annotation of Nucleic Acid Sequence

    SciTech Connect

    States, David J.

    2004-07-28

    The aims of this project were to develop improved methods for computational genome annotation and to apply these methods to improve the annotation of genomic sequence data with a specific focus on human genome sequencing. The project resulted in a substantial body of published work. Notable contributions of this project were the identification of basecalling and lane tracking as error processes in genome sequencing and contributions to improved methods for these steps in genome sequencing. This technology improved the accuracy and throughput of genome sequence analysis. Probabilistic methods for physical map construction were developed. Improved methods for sequence alignment, alternative splicing analysis, promoter identification and NF kappa B response gene prediction were also developed.

  4. Analysis and Annotation of Nucleic Acid Sequence

    SciTech Connect

    David J. States

    1998-08-01

    The aims of this project were to develop improved methods for computational genome annotation and to apply these methods to improve the annotation of genomic sequence data with a specific focus on human genome sequencing. The project resulted in a substantial body of published work. Notable contributions of this project were the identification of basecalling and lane tracking as error processes in genome sequencing and contributions to improved methods for these steps in genome sequencing. This technology improved the accuracy and throughput of genome sequence analysis. Probabilistic methods for physical map construction were developed. Improved methods for sequence alignment, alternative splicing analysis, promoter identification and NF kappa B response gene prediction were also developed.

  5. Exome sequencing reveals recurrent germ line variants in patients with familial Waldenström macroglobulinemia.

    PubMed

    Roccaro, Aldo M; Sacco, Antonio; Shi, Jiantao; Chiarini, Marco; Perilla-Glen, Adriana; Manier, Salomon; Glavey, Siobhan; Aljawai, Yosra; Mishima, Yuji; Kawano, Yawara; Moschetta, Michele; Correll, Mick; Improgo, Ma Reina; Brown, Jennifer R; Imberti, Luisa; Rossi, Giuseppe; Castillo, Jorge J; Treon, Steven P; Freedman, Matthew L; Van Allen, Eliezer M; Hide, Winston; Hiller, Elaine; Rainville, Irene; Ghobrial, Irene M

    2016-05-26

    Familial aggregation of Waldenström macroglobulinemia (WM) cases, and the clustering of B-cell lymphoproliferative disorders among first-degree relatives of WM patients, has been reported. Nevertheless, the possible contribution of inherited susceptibility to familial WM remains unrevealed. We performed whole exome sequencing on germ line DNA obtained from 4 family members in which coinheritance for WM was documented in 3 of them, and screened additional independent 246 cases by using gene-specific mutation sequencing. Among the shared germ line variants, LAPTM5(c403t) and HCLS1(g496a) were the most recurrent, being present in 3/3 affected members of the index family, detected in 8% of the unrelated familial cases, and present in 0.5% of the nonfamilial cases and in <0.05 of a control population. LAPTM5 and HCLS1 appeared as relevant WM candidate genes that characterized familial WM individuals and were also functionally relevant to the tumor clone. These findings highlight potentially novel contributors for the genetic predisposition to familial WM and indicate that LAPTM5(c403t) and HCLS1(g496a) may represent predisposition alleles in patients with familial WM. PMID:26903547

  6. Assessing pathogenicity for novel mutation/sequence variants: the value of healthy older individuals.

    PubMed

    Zatz, Mayana; Pavanello, Rita de Cassia M; Lourenço, Naila Cristina V; Cerqueira, Antonia; Lazar, Monize; Vainzof, Mariz

    2012-12-01

    Improvement in DNA technology is increasingly revealing unexpected/unknown mutations in healthy persons and generating anxiety due to their still unknown health consequences. We report a 44-year-old healthy father of a 10-year-old daughter with bilateral coloboma and hearing loss, but without muscle weakness, in whom a whole-genome CGH revealed a deletion of exons 38-44 in the dystrophin gene. This mutation was inherited from her asymptomatic father, who was further clinically and molecularly evaluated for prognosis and genetic counseling (GC). This deletion was never identified by us in 982 Duchenne/Becker patients. To assess whether the present case represents a rare case of non-penetrance, and aiming to obtain more information for prognosis and GC, we suggested that healthy older relatives submit their DNA for analysis, to which several complied. Mutation analysis revealed that his mother, brother, and 56-year-old maternal uncle also carry the 38-44 deletion, suggesting that it is an unlikely cause of muscle weakness. Genome sequencing will disclose mutations and variants whose health impact is still unknown, raising important problems in interpreting results, defining prognosis, and discussing GC. We suggest that, in addition to family history, keeping the DNA of older relatives could be very informative, in particular for those interested in having their genome sequenced. PMID:22707356

  7. Analysis of ANK3 and CACNA1C variants identified in bipolar disorder whole genome sequence data

    PubMed Central

    Fiorentino, Alessia; O'Brien, Niamh Louise; Locke, Devin Paul; McQuillin, Andrew; Jarram, Alexandra; Anjorin, Adebayo; Kandaswamy, Radhika; Curtis, David; Blizard, Robert Alan; Gurling, Hugh Malcolm Douglas

    2014-01-01

    Objectives Genetic markers in the genes encoding ankyrin 3 (ANK3) and the α-calcium channel subunit (CACNA1C) are associated with bipolar disorder (BP). The associated variants in the CACNA1C gene are mainly within intron 3 of the gene. ANK3 BP-associated variants are in two distinct clusters at the ends of the gene, indicating disease allele heterogeneity. Methods In order to screen both coding and non-coding regions to identify potential aetiological variants, we used whole-genome sequencing in 99 BP cases. Variants with markedly different allele frequencies in the BP samples and the 1,000 genomes project European data were genotyped in 1,510 BP cases and 1,095 controls. Results We found that the CACNA1C intron 3 variant, rs79398153, potentially affecting an ENCyclopedia of DNA Elements (ENCODE)-defined region, showed an association with BP (p = 0.015). We also found the ANK3 BP-associated variant rs139972937, responsible for an asparagine to serine change (p = 0.042). However, a previous study had not found support for an association between rs139972937 and BP. The variants at ANK3 and CACNA1C previously known to be associated with BP were not in linkage disequilibrium with either of the two variants that we identified and these are therefore independent of the previous haplotypes implicated by genome-wide association. Conclusions Sequencing in additional BP samples is needed to find the molecular pathology that explains the previous association findings. If changes similar to those we have found can be shown to have an effect on the expression and function of ANK3 and CACNA1C, they might help to explain the so-called ‘missing heritability’ of BP. PMID:24716743

  8. Pooled Sequencing of Candidate Genes Implicates Rare Variants in the Development of Asthma Following Severe RSV Bronchiolitis in Infancy

    PubMed Central

    Torgerson, Dara G.; Giri, Tusar; Druley, Todd E.; Zheng, Jie; Huntsman, Scott; Seibold, Max A.; Young, Andrew L.; Schweiger, Toni; Yin-Declue, Huiqing; Sajol, Geneline D.; Schechtman, Kenneth B; Hernandez, Ryan D.; Randolph, Adrienne G.; Bacharier, Leonard B.; Castro, Mario

    2015-01-01

    Severe infection with respiratory syncytial virus (RSV) during infancy is strongly associated with the development of asthma. To identify genetic variation that contributes to asthma following severe RSV bronchiolitis during infancy, we sequenced the coding exons of 131 asthma candidate genes in 182 European and African American children with severe RSV bronchiolitis in infancy using anonymous pools for variant discovery, and then directly genotyped a set of 190 nonsynonymous variants. Association testing was performed for physician-diagnosed asthma before the 7th birthday (asthma) using genotypes from 6,500 individuals from the Exome Sequencing Project (ESP) as controls to gain statistical power. In addition, among patients with severe RSV bronchiolitis during infancy, we examined genetic associations with asthma, active asthma, persistent wheeze, and bronchial hyperreactivity (methacholine PC20) at age 6 years. We identified four rare nonsynonymous variants that were significantly associated with asthma following severe RSV bronchiolitis, including single variants in ADRB2, FLG and NCAM1 in European Americans (p = 4.6 × 10⁻⁴, 1.9 × 10⁻¹³ and 5.0 × 10⁻⁵, respectively), and NOS1 in African Americans (p = 2.3 × 10⁻¹¹). One of the variants was a highly functional nonsynonymous variant in ADRB2 (rs1800888), which was also nominally associated with asthma (p = 0.027) and active asthma (p = 0.013) among European Americans with severe RSV bronchiolitis without including the ESP. Our results suggest that rare nonsynonymous variants contribute to the development of asthma following severe RSV bronchiolitis in infancy, notably in ADRB2. Additional studies are required to explore the role of rare variants in the etiology of asthma and asthma-related traits following severe RSV bronchiolitis. PMID:26587832

  9. Identification of deep intronic variants in 15 haemophilia A patients by next generation sequencing of the whole factor VIII gene.

    PubMed

    Bach, J Elisa; Wolf, Beat; Oldenburg, Johannes; Müller, Clemens R; Rost, Simone

    2015-10-01

    Current screening methods for factor VIII gene (F8) mutations can reveal the causative alteration in the vast majority of haemophilia A patients. Yet, standard diagnostic methods fail in about 2% of cases. This study aimed at analysing the entire intronic sequences of the F8 gene in 15 haemophilia A patients by next generation sequencing. All patients had a mild to moderate phenotype and no mutation in the coding sequence and splice sites of the F8 gene could be diagnosed so far. Next generation sequencing data revealed 23 deep intronic candidate variants in several F8 introns, including six recurrent variants and three variants that have been described before. One patient additionally showed a deletion of 9.2 kb in intron 1, mediated by Alu-type repeats. Several bioinformatic tools were used to score the variants in comparison to known pathogenic F8 mutations in order to predict their deleteriousness. Pedigree analyses showed a correct segregation pattern for three of the presumptive mutations. In each of the 15 patients analysed, at least one deep intronic variant in the F8 gene was identified and predicted to alter F8 mRNA splicing. Reduced F8 mRNA levels and/or stability would be well compatible with the patients' mild to moderate haemophilia A phenotypes. The next generation sequencing approach used proved an efficient method to screen the complete F8 gene and could be applied as a one-stop sequencing method for molecular diagnostics of haemophilia A. PMID:25948085

  10. Sequence analysis of MHC class I alpha 2 domain exon variants in one diploid and two haploid Atlantic salmon pedigrees.

    PubMed

    Grimholt, U; Olsaker, I; Lingaas, F; Lie, O

    1997-12-01

    Genetic diversity in the second domain exon of Atlantic salmon (Salmo salar) major histocompatibility complex (Mhc) class I was investigated in two dams and nine of their haploid offspring by means of polymerase chain reaction (PCR) and DNA sequence analysis. A similar study was also performed on nine diploid offspring from one of these dams. The complex segregation patterns and sequence similarities between variants make definitive allele, haplotype and locus assignments difficult. There are, however, indications of six Mhc-Sasa class I loci and a fairly well-defined haplotype of four variants. One non-polymorphic variant present in most specimens could be a salmon analogue to the human non-classical loci. PMID:9589580

  11. Genome Sequence of WAU86/88-1, a New Variant of Vaccinia Virus Lister Strain from Poland.

    PubMed

    Mavian, Carla; López-Bueno, Alberto; Alcamí, Antonio

    2014-01-01

    The poxviruses Warsaw Agricultural University 86 (WAU86) and 88-1 (WAU88-1) were isolated in 1986 to 1988 from separate outbreaks in laboratory mice in Poland and described as ectromelia virus isolates. The genome sequences of these poxviruses reveal that they are almost identical and represent a novel variant of the vaccinia virus Lister strain. PMID:24407630

  12. A Systematic Assessment of Accuracy in Detecting Somatic Mosaic Variants by Deep Amplicon Sequencing: Application to NF2 Gene

    PubMed Central

    Sestini, Roberta; Candita, Luisa; Capone, Gabriele Lorenzo; Barbetti, Lorenzo; Falconi, Serena; Frusconi, Sabrina; Giotti, Irene; Giuliani, Costanza; Torricelli, Francesca; Benelli, Matteo; Papi, Laura

    2015-01-01

    The accurate detection of low-allelic variants is still challenging, particularly for the identification of somatic mosaicism, where a matched control sample is not available. High throughput sequencing, by the simultaneous and independent analysis of thousands of different DNA fragments, might overcome many of the limits of traditional methods, greatly increasing the sensitivity. However, it is necessary to take into account the high number of false positives that may arise due to the lack of matched control samples. Here, we applied deep amplicon sequencing to the analysis of samples with known genotype and variant allele fraction (VAF) followed by a tailored statistical analysis. This method allowed us to define a minimum value of VAF for detecting mosaic variants with high accuracy. Then, we exploited the estimated VAF to select candidate alterations in NF2 gene in 34 samples with unknown genotype (30 blood and 4 tumor DNAs), demonstrating the suitability of our method. The strategy we propose optimizes the use of deep amplicon sequencing for the identification of low abundance variants. Moreover, our method can be applied to different high throughput sequencing approaches to estimate the background noise and define the accuracy of the experimental design. PMID:26066488
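
    As a rough illustration of the VAF-based selection described here, the sketch below computes the variant allele fraction as alternate reads divided by total reads and keeps the sites at or above a minimum VAF. The abstract does not state the cutoff or the statistical model, so `min_vaf`, the site names, and the read counts are all hypothetical values chosen for the example.

```python
def variant_allele_fraction(alt_reads, total_reads):
    """VAF = reads supporting the variant / total reads covering the site."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads

def select_candidates(sites, min_vaf):
    """Keep site names whose VAF is at or above `min_vaf`.

    `sites` is a list of (name, alt_reads, total_reads) tuples.
    `min_vaf` is a hypothetical cutoff, not a value from the study.
    """
    return [name for name, alt, total in sites
            if variant_allele_fraction(alt, total) >= min_vaf]

# Made-up read counts at two sites sequenced to 10,000x depth.
sites = [("site_a", 50, 10000), ("site_b", 5, 10000)]
candidates = select_candidates(sites, min_vaf=0.002)
```

    In practice the cutoff would come from the background noise measured on samples of known genotype, as the abstract describes.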

  13. Exploring the feasibility of using copy number variants as genetic markers through large-scale whole genome sequencing experiments

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Copy number variants (CNV) are large scale duplications or deletions of genomic sequence that are caused by a diverse set of molecular phenomena that are distinct from single nucleotide polymorphism (SNP) formation. Due to their different mechanisms of formation, CNVs are often difficult to track us...

  14. tmVar: a text mining approach for extracting sequence variants in biomedical literature

    PubMed Central

    Wei, Chih-Hsuan; Harris, Bethany R.; Kao, Hung-Yu; Lu, Zhiyong

    2013-01-01

    Motivation: Text-mining mutation information from the literature has become a critical part of the bioinformatics approach for the analysis and interpretation of sequence variations in complex diseases in the post-genomic era. It has also been used for assisting the creation of disease-related mutation databases. Most existing approaches are rule-based and focus on limited types of sequence variations, such as protein point mutations. Thus, extending their extraction scope requires significant manual efforts in examining new instances and developing corresponding rules. As such, new automatic approaches are greatly needed for extracting different kinds of mutations with high accuracy. Results: Here, we report tmVar, a text-mining approach based on conditional random field (CRF) for extracting a wide range of sequence variants described at protein, DNA and RNA levels according to a standard nomenclature developed by the Human Genome Variation Society. By doing so, we cover several important types of mutations that were not considered in past studies. Using a novel CRF label model and feature set, our method achieves higher performance than a state-of-the-art method on both our corpus (91.4 versus 78.1% in F-measure) and their own gold standard (93.9 versus 89.4% in F-measure). These results suggest that tmVar is a high-performance method for mutation extraction from biomedical literature. Availability: tmVar software and its corpus of 500 manually curated abstracts are available for download at http://www.ncbi.nlm.nih.gov/CBBresearch/Lu/pub/tmVar. Contact: zhiyong.lu@nih.gov PMID:23564842
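
    To make the contrast with rule-based extraction concrete, here is a minimal sketch of the kind of single-rule extractor the abstract describes as limited: one regular expression for protein point mutations written as one-letter amino acid, position, one-letter amino acid. It is not tmVar's CRF method, and the pattern, function names, and sample sentence are assumptions for illustration. The F-measure helper uses the standard harmonic mean of precision and recall.

```python
import re

# One-letter codes for the 20 standard amino acids.
AA = "ACDEFGHIKLMNPQRSTVWY"
POINT_MUTATION = re.compile(rf"\b([{AA}])(\d+)([{AA}])\b")

def extract_point_mutations(text):
    """Return (wild_type, position, mutant) for mentions such as G145R."""
    return [(wt, int(pos), mut) for wt, pos, mut in POINT_MUTATION.findall(text)]

def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

sentence = "Variants G145R and T123A were detected; c.403C>T and rs17410035 were not matched."
mentions = extract_point_mutations(sentence)
```

    The DNA-level mention and the rs identifier in the sample sentence are missed by this rule, which is the limitation in extraction scope that the abstract points to.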

  15. Whole-exome imputation of sequence variants identified two novel alleles associated with adult body height in African Americans

    PubMed Central

    Du, Mengmeng; Auer, Paul L.; Jiao, Shuo; Haessler, Jeffrey; Altshuler, David; Boerwinkle, Eric; Carlson, Christopher S.; Carty, Cara L.; Chen, Yii-Der Ida; Curtis, Keith; Franceschini, Nora; Hsu, Li; Jackson, Rebecca; Lange, Leslie A.; Lettre, Guillaume; Monda, Keri L.; Nickerson, Deborah A.; Reiner, Alex P.; Rich, Stephen S.; Rosse, Stephanie A.; Rotter, Jerome I.; Willer, Cristen J.; Wilson, James G.; North, Kari; Kooperberg, Charles; Heard-Costa, Nancy; Peters, Ulrike

    2014-01-01

    Adult body height is a quantitative trait for which genome-wide association studies (GWAS) have identified numerous loci, primarily in European populations. These loci, comprising common variants, explain <10% of the phenotypic variance in height. We searched for novel associations between height and common (minor allele frequency, MAF ≥5%) or infrequent (0.5% < MAF < 5%) variants across the exome in African Americans. Using a reference panel of 1692 African Americans and 471 Europeans from the National Heart, Lung, and Blood Institute's (NHLBI) Exome Sequencing Project (ESP), we imputed whole-exome sequence data into 13 719 African Americans with existing array-based GWAS data (discovery). Variants achieving a height-association threshold of P < 5E−06 in the imputed dataset were followed up in an independent sample of 1989 African Americans with whole-exome sequence data (replication). We used P < 2.5E−07 (=0.05/196 779 variants) to define statistically significant associations in meta-analyses combining the discovery and replication sets (N = 15 708). We discovered and replicated three independent loci for association: 5p13.3/C5orf22/rs17410035 (MAF = 0.10, β = 0.64 cm, P = 8.3E−08), 13q14.2/SPRYD7/rs114089985 (MAF = 0.03, β = 1.46 cm, P = 4.8E−10) and 17q23.3/GH2/rs2006123 (MAF = 0.30; β = 0.47 cm; P = 4.7E−09). Conditional analyses suggested 5p13.3 (C5orf22/rs17410035) and 13q14.2 (SPRYD7/rs114089985) may harbor novel height alleles independent of previous GWAS-identified variants (r² with GWAS loci <0.01); whereas 17q23.3/GH2/rs2006123 was correlated with GWAS-identified variants in European and African populations. Notably, 13q14.2/rs114089985 is infrequent in African Americans (MAF = 3%), extremely rare in European Americans (MAF = 0.03%), and monomorphic in Asian populations, suggesting it may be an African-American-specific height allele. Our findings demonstrate that whole-exome imputation of sequence variants can identify low
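
    The significance threshold and frequency classes quoted in this abstract follow from simple arithmetic, sketched below. The Bonferroni division reproduces the stated 0.05/196 779 figure, and the frequency cutoffs are the ones given in the text. The function names and the "other" label for frequencies outside the two studied classes are assumptions for the example.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests

def maf_class(maf):
    """Classify a minor allele frequency using the abstract's cutoffs.

    common: MAF >= 5%; infrequent: 0.5% < MAF < 5%.
    "other" is a placeholder label for anything outside those ranges.
    """
    if maf >= 0.05:
        return "common"
    if 0.005 < maf < 0.05:
        return "infrequent"
    return "other"

threshold = bonferroni_threshold(0.05, 196779)  # roughly 2.5E-07
classes = [maf_class(m) for m in (0.10, 0.03, 0.30)]
```

    The three MAF values in the usage line are those reported for rs17410035, rs114089985 and rs2006123.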

  16. Dietary fatty acids modulate associations between genetic variants and circulating fatty acids in plasma and erythrocyte membranes: meta-analysis of 9 studies in the CHARGE consortium

    PubMed Central

    Smith, Caren E.; Follis, Jack L.; Nettleton, Jennifer A.; Foy, Millennia; Wu, Jason H.Y.; Ma, Yiyi; Tanaka, Toshiko; Manichakul, Ani W.; Wu, Hongyu; Chu, Audrey Y.; Steffen, Lyn M.; Fornage, Myriam; Mozaffarian, Dariush; Kabagambe, Edmond K.; Ferruci, Luigi; da Chen, Yii-Der I; Rich, Stephen S.; Djoussé, Luc; Ridker, Paul M.; Tang, Weihong; McKnight, Barbara; Tsai, Michael Y.; Bandinelli, Stefania; Rotter, Jerome I.; Hu, Frank B.; Chasman, Daniel I.; Psaty, Bruce M.; Arnett, Donna K.; King, Irena B.; Sun, Qi; Wang, Lu; Lumley, Thomas; Chiuve, Stephanie E.; Siscovick, David S; Ordovás, José M.; Lemaitre, Rozenn N.

    2015-01-01

    Scope Tissue concentrations of omega-3 fatty acids may reduce cardiovascular disease risk, and genetic variants are associated with circulating fatty acids concentrations. Whether dietary fatty acids interact with genetic variants to modify circulating omega-3 fatty acids is unclear. Objective We evaluated interactions between genetic variants and fatty acid intakes for circulating alpha-linolenic acid (ALA), eicosapentaenoic acid (EPA), docosahexaenoic acid (DHA) and docosapentaenoic acid (DPA). Methods and Results We conducted meta-analyses (N to 11,668) evaluating interactions between dietary fatty acids and genetic variants (rs174538 and rs174548 in FADS1 (fatty acid desaturase 1), rs7435 in AGPAT3 (1-acyl-sn-glycerol-3-phosphate), rs4985167 in PDXDC1 (pyridoxal-dependent decarboxylase domain-containing 1), rs780094 in GCKR (glucokinase regulatory protein) and rs3734398 in ELOVL2 (fatty acid elongase 2)). Stratification by measurement compartment (plasma vs. erythrocyte) revealed compartment-specific interactions between FADS1 rs174538 and rs174548 and dietary ALA and linoleic acid for DHA and DPA. Conclusion Our findings reinforce earlier reports that genetically-based differences in circulating fatty acids may be partially due to differences in the conversion of fatty acid precursors. Further, fatty acids measurement compartment may modify gene-diet relationships, and considering compartment may improve the detection of gene-fatty acids interactions for circulating fatty acid outcomes. PMID:25626431

  17. Exome Sequencing in an Admixed Isolated Population Indicates NFXL1 Variants Confer a Risk for Specific Language Impairment

    PubMed Central

    Villanueva, Pía; Nudel, Ron; Hoischen, Alexander; Fernández, María Angélica; Simpson, Nuala H.; Gilissen, Christian; Reader, Rose H.; Jara, Lillian; Echeverry, Maria Magdalena; Francks, Clyde; Baird, Gillian; Conti-Ramsden, Gina; O’Hare, Anne; Bolton, Patrick F.; Hennessy, Elizabeth R.; Palomino, Hernán; Carvajal-Carmona, Luis; Veltman, Joris A.; Cazier, Jean-Baptiste; De Barbieri, Zulema

    2015-01-01

    Children affected by Specific Language Impairment (SLI) fail to acquire age appropriate language skills despite adequate intelligence and opportunity. SLI is highly heritable, but the understanding of underlying genetic mechanisms has proved challenging. In this study, we use molecular genetic techniques to investigate an admixed isolated founder population from the Robinson Crusoe Island (Chile), who are affected by a high incidence of SLI, increasing the power to discover contributory genetic factors. We utilize exome sequencing in selected individuals from this population to identify eight coding variants that are of putative significance. We then apply association analyses across the wider population to highlight a single rare coding variant (rs144169475, Minor Allele Frequency of 4.1% in admixed South American populations) in the NFXL1 gene that confers a nonsynonymous change (N150K) and is significantly associated with language impairment in the Robinson Crusoe population (p = 2.04 × 10⁻⁴, 8 variants tested). Subsequent sequencing of NFXL1 in 117 UK SLI cases identified four individuals with heterozygous variants predicted to be of functional consequence. We conclude that coding variants within NFXL1 confer an increased risk of SLI within a complex genetic model. PMID:25781923

  18. Exome sequencing in an admixed isolated population indicates NFXL1 variants confer a risk for specific language impairment.

    PubMed

    Villanueva, Pía; Nudel, Ron; Hoischen, Alexander; Fernández, María Angélica; Simpson, Nuala H; Gilissen, Christian; Reader, Rose H; Jara, Lillian; Echeverry, María Magdalena; Francks, Clyde; Baird, Gillian; Conti-Ramsden, Gina; O'Hare, Anne; Bolton, Patrick F; Hennessy, Elizabeth R; Palomino, Hernán; Carvajal-Carmona, Luis; Veltman, Joris A; Cazier, Jean-Baptiste; De Barbieri, Zulema; Fisher, Simon E; Newbury, Dianne F

    2015-03-01

    Children affected by Specific Language Impairment (SLI) fail to acquire age appropriate language skills despite adequate intelligence and opportunity. SLI is highly heritable, but the understanding of underlying genetic mechanisms has proved challenging. In this study, we use molecular genetic techniques to investigate an admixed isolated founder population from the Robinson Crusoe Island (Chile), who are affected by a high incidence of SLI, increasing the power to discover contributory genetic factors. We utilize exome sequencing in selected individuals from this population to identify eight coding variants that are of putative significance. We then apply association analyses across the wider population to highlight a single rare coding variant (rs144169475, Minor Allele Frequency of 4.1% in admixed South American populations) in the NFXL1 gene that confers a nonsynonymous change (N150K) and is significantly associated with language impairment in the Robinson Crusoe population (p = 2.04 × 10⁻⁴, 8 variants tested). Subsequent sequencing of NFXL1 in 117 UK SLI cases identified four individuals with heterozygous variants predicted to be of functional consequence. We conclude that coding variants within NFXL1 confer an increased risk of SLI within a complex genetic model. PMID:25781923

  19. Compact variant-rich customized sequence database and a fast and sensitive database search for efficient proteogenomic analyses.

    PubMed

    Park, Heejin; Bae, Junwoo; Kim, Hyunwoo; Kim, Sangok; Kim, Hokeun; Mun, Dong-Gi; Joh, Yoonsung; Lee, Wonyeop; Chae, Sehyun; Lee, Sanghyuk; Kim, Hark Kyun; Hwang, Daehee; Lee, Sang-Won; Paek, Eunok

    2014-12-01

    In proteogenomic analysis, construction of a compact, customized database from mRNA-seq data and a sensitive search of both reference and customized databases are essential to accurately determine protein abundances and structural variations at the protein level. However, these tasks have not been systematically explored, but rather performed in an ad-hoc fashion. Here, we present an effective method for constructing a compact database containing comprehensive sequences of sample-specific variants--single nucleotide variants, insertions/deletions, and stop-codon mutations derived from Exome-seq and RNA-seq data. It, however, occupies less space by storing variant peptides, not variant proteins. We also present an efficient search method for both customized and reference databases. The separate searches of the two databases increase the search time, and a unified search is less sensitive to identify variant peptides due to the smaller size of the customized database, compared to the reference database, in the target-decoy setting. Our method searches the unified database once, but performs target-decoy validations separately. Experimental results show that our approach is as fast as the unified search and as sensitive as the separate searches. Our customized database includes mutation information in the headers of variant peptides, thereby facilitating the inspection of peptide-spectrum matches. PMID:25316439
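
    The idea of searching one unified database but validating the two classes of matches separately can be sketched with a simple per-group target-decoy estimate. The abstract does not give the authors' exact procedure, so this is only an assumption-laden illustration: FDR is estimated as decoy hits divided by target hits above a score threshold (a common convention), and the tuple layout, group names, and scores are invented for the example.

```python
def fdr_at_threshold(psms, threshold):
    """Estimate FDR as decoy hits / target hits at or above `threshold`.

    `psms` is a list of (score, is_decoy) pairs.
    """
    targets = sum(1 for score, is_decoy in psms
                  if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms
                 if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0

def separate_fdr(psms, threshold):
    """Group peptide-spectrum matches by database class, then estimate FDR per group.

    `psms` is a list of (score, is_decoy, group) tuples from one unified search.
    """
    by_group = {}
    for score, is_decoy, group in psms:
        by_group.setdefault(group, []).append((score, is_decoy))
    return {group: fdr_at_threshold(members, threshold)
            for group, members in by_group.items()}

# Made-up matches from a single unified search.
psms = [
    (10, False, "reference"), (9, False, "reference"), (8, True, "reference"),
    (9, False, "variant"), (8, True, "variant"),
]
rates = separate_fdr(psms, threshold=8)
```

    Estimating the rate per group keeps the much smaller set of variant-peptide matches from being judged against the error profile of the large reference database.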

  20. Ultradeep Sequencing for Detection of Quasispecies Variants in the Major Hydrophilic Region of Hepatitis B Virus in Indonesian Patients

    PubMed Central

    Yamani, Laura Navika; Utsumi, Takako; Juniastuti; Wandono, Hadi; Widjanarko, Doddy; Triantanoe, Ari; Wasityastuti, Widya; Liang, Yujiao; Okada, Rina; Tanahashi, Toshihito; Murakami, Yoshiki; Azuma, Takeshi; Soetjipto; Lusida, Maria Inge; Hayashi, Yoshitake

    2015-01-01

    Quasispecies of hepatitis B virus (HBV) with variations in the major hydrophilic region (MHR) of the HBV surface antigen (HBsAg) can evolve during infection, allowing HBV to evade neutralizing antibodies. These escape variants may contribute to chronic infections. In this study, we looked for MHR variants in HBV quasispecies using ultradeep sequencing and evaluated the relationship between these variants and clinical manifestations in infected patients. We enrolled 30 Indonesian patients with hepatitis B infection (11 with chronic hepatitis and 19 with advanced liver disease). The most common subgenotype/subtype of HBV was B3/adw (97%). The HBsAg titer was lower in patients with advanced liver disease than that in patients with chronic hepatitis. The MHR variants were grouped based on the percentage of the viral population affected: major, ≥20% of the total population; intermediate, 5% to <20%; and minor, 1% to <5%. The rates of MHR variation that were present in the major and intermediate viral population were significantly greater in patients with advanced liver disease than those in chronic patients. The most frequent MHR variants related to immune evasion in the major and intermediate populations were P120Q/T, T123A, P127T, Q129H/R, M133L/T, and G145R. The major population of MHR variants causing impaired HBsAg secretion (e.g., G119R, Q129R, T140I, and G145R) was detected only in advanced liver disease patients. This is the first study to use ultradeep sequencing for the detection of MHR variants of HBV quasispecies in Indonesian patients. We found that a greater number of MHR variations was related to disease severity and reduced likelihood of HBsAg titer. PMID:26202119
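
    The grouping of MHR variants by share of the viral population can be written as a small classifier. The cutoffs are the ones stated in the abstract; the function name and the choice to return None below 1% are assumptions for the example.

```python
def classify_mhr_variant(percent_of_population):
    """Group a variant by the percentage of the viral population carrying it.

    major: >= 20%; intermediate: 5% to < 20%; minor: 1% to < 5%.
    Returns None below 1%, which the abstract does not classify.
    """
    if percent_of_population >= 20:
        return "major"
    if percent_of_population >= 5:
        return "intermediate"
    if percent_of_population >= 1:
        return "minor"
    return None

groups = [classify_mhr_variant(p) for p in (35.0, 20.0, 7.5, 1.0, 0.4)]
```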

  1. Variant association tools for quality control and analysis of large-scale sequence and genotyping array data.

    PubMed

    Wang, Gao T; Peng, Bo; Leal, Suzanne M

    2014-05-01

    Currently there is great interest in detecting associations between complex traits and rare variants. In this report, we describe Variant Association Tools (VAT) and the VAT pipeline, which implements best practices for rare-variant association studies. Highlights of VAT include variant-site and call-level quality control (QC), summary statistics, phenotype- and genotype-based sample selection, variant annotation, selection of variants for association analysis, and a collection of rare-variant association methods for analyzing qualitative and quantitative traits. The association testing framework for VAT is regression based, which readily allows for flexible construction of association models with multiple covariates and weighting themes based on allele frequencies or predicted functionality. Additionally, pathway analyses, conditional analyses, and analyses of gene-gene and gene-environment interactions can be performed. VAT is capable of rapidly scanning through data by using multi-process computation, adaptive permutation, and simultaneously conducting association analysis via multiple methods. Results are available in text or graphic file formats and additionally can be output to relational databases for further annotation and filtering. An interface to R language also facilitates user implementation of novel association methods. The VAT's data QC and association-analysis pipeline can be applied to sequence, imputed, and genotyping array, e.g., "exome chip," data, providing a reliable and reproducible computational environment in which to analyze small- to large-scale studies with data from the latest genotyping and sequencing technologies. Application of the VAT pipeline is demonstrated through analysis of data from the 1000 Genomes project. PMID:24791902

  2. Ultradeep Sequencing for Detection of Quasispecies Variants in the Major Hydrophilic Region of Hepatitis B Virus in Indonesian Patients.

    PubMed

    Yamani, Laura Navika; Yano, Yoshihiko; Utsumi, Takako; Juniastuti; Wandono, Hadi; Widjanarko, Doddy; Triantanoe, Ari; Wasityastuti, Widya; Liang, Yujiao; Okada, Rina; Tanahashi, Toshihito; Murakami, Yoshiki; Azuma, Takeshi; Soetjipto; Lusida, Maria Inge; Hayashi, Yoshitake

    2015-10-01

    Quasispecies of hepatitis B virus (HBV) with variations in the major hydrophilic region (MHR) of the HBV surface antigen (HBsAg) can evolve during infection, allowing HBV to evade neutralizing antibodies. These escape variants may contribute to chronic infections. In this study, we looked for MHR variants in HBV quasispecies using ultradeep sequencing and evaluated the relationship between these variants and clinical manifestations in infected patients. We enrolled 30 Indonesian patients with hepatitis B infection (11 with chronic hepatitis and 19 with advanced liver disease). The most common subgenotype/subtype of HBV was B3/adw (97%). The HBsAg titer was lower in patients with advanced liver disease than in patients with chronic hepatitis. The MHR variants were grouped based on the percentage of the viral population affected: major, ≥20% of the total population; intermediate, 5% to <20%; and minor, 1% to <5%. The rates of MHR variation present in the major and intermediate viral populations were significantly greater in patients with advanced liver disease than in patients with chronic hepatitis. The most frequent MHR variants related to immune evasion in the major and intermediate populations were P120Q/T, T123A, P127T, Q129H/R, M133L/T, and G145R. The major population of MHR variants causing impaired HBsAg secretion (e.g., G119R, Q129R, T140I, and G145R) was detected only in advanced liver disease patients. This is the first study to use ultradeep sequencing for the detection of MHR variants of HBV quasispecies in Indonesian patients. We found that a greater number of MHR variations was related to disease severity and to a reduced HBsAg titer. PMID:26202119
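
    The abundance tiers named in this abstract (major, at least 20% of the viral population; intermediate, 5% to under 20%; minor, 1% to under 5%) can be written as a small classifier. The sketch below is illustrative only: the function name `classify_mhr_variant` and the choice to return `None` for frequencies below 1% are assumptions, not part of the study's pipeline.

```python
from typing import Optional


def classify_mhr_variant(percent_of_population: float) -> Optional[str]:
    """Assign an abundance tier from the share of the viral population carrying a variant.

    Thresholds follow the abstract: major >= 20%, intermediate 5% to < 20%,
    minor 1% to < 5%. Returning None below 1% is an assumption of this sketch.
    """
    if not 0.0 <= percent_of_population <= 100.0:
        raise ValueError("percentage must lie between 0 and 100")
    if percent_of_population >= 20.0:
        return "major"
    if percent_of_population >= 5.0:
        return "intermediate"
    if percent_of_population >= 1.0:
        return "minor"
    return None


# Hypothetical frequencies, chosen only to exercise each branch.
for pct in (35.0, 7.5, 1.2, 0.4):
    print(pct, classify_mhr_variant(pct))
```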

  3. From Artificial Amino Acids to Sequence-Defined Targeted Oligoaminoamides.

    PubMed

    Morys, Stephan; Wagner, Ernst; Lächelt, Ulrich

    2016-01-01

    Artificial oligoamino acids with appropriate protecting groups can be used for the sequential assembly of oligoaminoamides on solid phase. With the help of these oligoamino acids, multifunctional nucleic acid (NA) carriers can be designed and produced in highly defined topologies. Here we describe the synthesis of the artificial oligoamino acid Fmoc-Stp(Boc3)-OH, the subsequent assembly into sequence-defined oligomers and the formulation of tumor-targeted plasmid DNA (pDNA) polyplexes. PMID:27436323

  4. A unified test of linkage analysis and rare-variant association for analysis of pedigree sequence data.

    PubMed

    Hu, Hao; Roach, Jared C; Coon, Hilary; Guthery, Stephen L; Voelkerding, Karl V; Margraf, Rebecca L; Durtschi, Jacob D; Tavtigian, Sean V; Shankaracharya; Wu, Wilfred; Scheet, Paul; Wang, Shuoguo; Xing, Jinchuan; Glusman, Gustavo; Hubley, Robert; Li, Hong; Garg, Vidu; Moore, Barry; Hood, Leroy; Galas, David J; Srivastava, Deepak; Reese, Martin G; Jorde, Lynn B; Yandell, Mark; Huff, Chad D

    2014-07-01

    High-throughput sequencing of related individuals has become an important tool for studying human disease. However, owing to technical complexity and lack of available tools, most pedigree-based sequencing studies rely on an ad hoc combination of suboptimal analyses. Here we present pedigree-VAAST (pVAAST), a disease-gene identification tool designed for high-throughput sequence data in pedigrees. pVAAST uses a sequence-based model to perform variant and gene-based linkage analysis. Linkage information is then combined with functional prediction and rare variant case-control association information in a unified statistical framework. pVAAST outperformed linkage and rare-variant association tests in simulations and identified disease-causing genes from whole-genome sequence data in three human pedigrees with dominant, recessive and de novo inheritance patterns. The approach is robust to incomplete penetrance and locus heterogeneity and is applicable to a wide variety of genetic traits. pVAAST maintains high power across studies of monogenic, high-penetrance phenotypes in a single pedigree to highly polygenic, common phenotypes involving hundreds of pedigrees. PMID:24837662

  5. Genetic variants of the unsaturated fatty acid receptor GPR120 relating to obesity in dogs

    PubMed Central

    MIYABE, Masahiro; GIN, Azusa; ONOZAWA, Eri; DAIMON, Mana; YAMADA, Hana; ODA, Hitomi; MORI, Akihiro; MOMOTA, Yutaka; AZAKAMI, Daigo; YAMAMOTO, Ichiro; MOCHIZUKI, Mariko; SAKO, Toshinori; TAMURA, Katsutoshi; ISHIOKA, Katsumi

    2015-01-01

    G protein-coupled receptor (GPR) 120 is an unsaturated fatty acid receptor associated with various physiological functions. The genetic variant of GPR120, p.Arg270His, is reported to be detected more often in obese people, and this variation is functionally related to obesity in humans. Obesity is also a common nutritional disorder in dogs, but the genetic factors have not yet been identified in dogs. In this study, we investigated the molecular structure of canine GPR120 and searched for candidate genetic variants that may relate to obesity in dogs. Canine GPR120 was highly homologous to those of other species, and seven transmembrane domains and two N-glycosylation sites were conserved. GPR120 mRNA was expressed in lung, jejunum, ileum, colon, hypothalamus, hippocampus, spinal cord, bone marrow, dermis and white adipose tissues in dogs, as in mice and humans. Genetic variants of GPR120 were explored in 141 client-owned dogs, and 5 synonymous and 4 non-synonymous variants were found. The variant c.595C>A (p.Pro199Thr) was found in 40 dogs, and the gene frequency was significantly higher in dogs with higher body condition scores (BCS), i.e. 0.320 in BCS4–5 dogs, 0.175 in BCS3 dogs and 0.000 in BCS2 dogs. We conclude that c.595C>A (p.Pro199Thr) is a candidate variant relating to obesity, which may be helpful for nutritional management of dogs. PMID:25960032

  6. De novo assembly and next-generation sequencing to analyse full-length gene variants from codon-barcoded libraries.

    PubMed

    Cho, Namjin; Hwang, Byungjin; Yoon, Jung-ki; Park, Sangun; Lee, Joongoo; Seo, Han Na; Lee, Jeewon; Huh, Sunghoon; Chung, Jinsoo; Bang, Duhee

    2015-01-01

    Interpreting epistatic interactions is crucial for understanding evolutionary dynamics of complex genetic systems and unveiling structure and function of genetic pathways. Although high resolution mapping of en masse variant libraries enables molecular biologists to address genotype-phenotype relationships, long-read sequencing technology remains indispensable to assess functional relationships between mutations that lie far apart. Here, we introduce JigsawSeq for multiplexed sequence identification of pooled gene variant libraries by combining a codon-based molecular barcoding strategy and de novo assembly of short-read data. We first validated JigsawSeq on small sub-pools and observed high precision and recall under various experimental settings. With extensive simulations, we then applied JigsawSeq to large-scale gene variant libraries to show that our method can be reliably scaled using next-generation sequencing. JigsawSeq may serve as a rapid screening tool for functional genomics and offer the opportunity to explore evolutionary trajectories of protein variants. PMID:26387459

  7. De novo assembly and next-generation sequencing to analyse full-length gene variants from codon-barcoded libraries

    PubMed Central

    Cho, Namjin; Hwang, Byungjin; Yoon, Jung-ki; Park, Sangun; Lee, Joongoo; Seo, Han Na; Lee, Jeewon; Huh, Sunghoon; Chung, Jinsoo; Bang, Duhee

    2015-01-01

    Interpreting epistatic interactions is crucial for understanding evolutionary dynamics of complex genetic systems and unveiling structure and function of genetic pathways. Although high resolution mapping of en masse variant libraries enables molecular biologists to address genotype-phenotype relationships, long-read sequencing technology remains indispensable to assess functional relationships between mutations that lie far apart. Here, we introduce JigsawSeq for multiplexed sequence identification of pooled gene variant libraries by combining a codon-based molecular barcoding strategy and de novo assembly of short-read data. We first validated JigsawSeq on small sub-pools and observed high precision and recall under various experimental settings. With extensive simulations, we then applied JigsawSeq to large-scale gene variant libraries to show that our method can be reliably scaled using next-generation sequencing. JigsawSeq may serve as a rapid screening tool for functional genomics and offer the opportunity to explore evolutionary trajectories of protein variants. PMID:26387459

  8. Identification of Variants in Primary and Recurrent Glioblastoma Using a Cancer-Specific Gene Panel and Whole Exome Sequencing

    PubMed Central

    Virk, Selene M.; Gibson, Richard M.; Quinones-Mateu, Miguel E.; Barnholtz-Sloan, Jill S.

    2015-01-01

    Glioblastoma (GBM) is an aggressive, malignant brain tumor typically resulting in death of the patient within one year following diagnosis, and those who survive beyond this point usually present with tumor recurrence within two years (5-year survival is 5%). The genetic heterogeneity of GBM has made the molecular characterization of these tumors an area of great interest and has led to identification of molecular subtypes in GBM. The availability of sequencing platforms that are both fast and economical can further the adoption of tumor sequencing in the clinical environment, potentially leading to identification of clinically actionable genetic targets. In this pilot study, comprising triplet samples of normal blood, primary tumor, and recurrent tumor from three patients, we compared the ability of Illumina whole exome sequencing (ExomeSeq) and the Ion AmpliSeq Comprehensive Cancer Panel (CCP) to identify somatic variants in patient-paired primary and recurrent tumor samples. Thirteen genes were found to harbor variants, the majority of which were exclusive to the ExomeSeq data. Surprisingly, only two variants were identified by both platforms, and they were located within the PTCH1 and NF1 genes. Although preliminary in nature, this work highlights major differences in variant identification in data generated from the two platforms. Additional studies with larger sample sizes are needed to further explore the differences between these technologies and to enhance our understanding of the clinical utility of panel-based platforms in genomic profiling of brain tumors. PMID:25950952

  9. Characterization of Genomic Variants Associated with Scout and Recruit Behavioral Castes in Honey Bees Using Whole-Genome Sequencing.

    PubMed

    Southey, Bruce R; Zhu, Ping; Carr-Markell, Morgan K; Liang, Zhengzheng S; Zayed, Amro; Li, Ruiqiang; Robinson, Gene E; Rodriguez-Zas, Sandra L

    2016-01-01

    Among forager honey bees, scouts seek new resources and return to the colony, enlisting recruits to collect these resources. Differentially expressed genes between these behaviors and genetic variability in scouting phenotypes have been reported. Whole-genome sequencing of 44 Apis mellifera scouts and recruits was undertaken to detect variants and further understand the genetic architecture underlying the behavioral differences between scouts and recruits. The median coverage depth in recruits and scouts was 10.01 and 10.7 X, respectively. Representation of bacterial species among the unmapped reads reflected a more diverse microbiome in scouts than recruits. Overall, 1,412,705 polymorphic positions were analyzed for associations with scouting behavior, and 212 significant (p-value < 0.0001) associations with scouting corresponding to 137 positions were detected. Most frequent putative transcription factor binding sites proximal to significant variants included Broad-complex 4, Broad-complex 1, Hunchback, and CF2-II. Three variants associated with scouting were located within coding regions of ncRNAs including one codon change (LOC102653644) and 2 frameshift indels (LOC102654879 and LOC102655256). Significant variants were also identified on the 5'UTR of membrin, and 3'UTRs of laccase 2 and diacylglycerol kinase theta. The 60 significant variants located within introns corresponded to 39 genes and most of these positions were > 1000 bp apart from each other. A number of these variants were mapped to ncRNA LOC100578102, solute carrier family 12 member 6-like gene, and LOC100576965 (meprin and TRAF-C homology domain containing gene). Functional categories represented among the genes corresponding to significant variants included: neuronal function, exoskeleton, immune response, salivary gland development, and enzymatic food processing. These categories offer a glimpse into the molecular support to the behaviors of scouts and recruits. The level of association between

  10. Performance of ACMG-AMP Variant-Interpretation Guidelines among Nine Laboratories in the Clinical Sequencing Exploratory Research Consortium.

    PubMed

    Amendola, Laura M; Jarvik, Gail P; Leo, Michael C; McLaughlin, Heather M; Akkari, Yassmine; Amaral, Michelle D; Berg, Jonathan S; Biswas, Sawona; Bowling, Kevin M; Conlin, Laura K; Cooper, Greg M; Dorschner, Michael O; Dulik, Matthew C; Ghazani, Arezou A; Ghosh, Rajarshi; Green, Robert C; Hart, Ragan; Horton, Carrie; Johnston, Jennifer J; Lebo, Matthew S; Milosavljevic, Aleksandar; Ou, Jeffrey; Pak, Christine M; Patel, Ronak Y; Punj, Sumit; Richards, Carolyn Sue; Salama, Joseph; Strande, Natasha T; Yang, Yaping; Plon, Sharon E; Biesecker, Leslie G; Rehm, Heidi L

    2016-06-01

    Evaluating the pathogenicity of a variant is challenging given the plethora of types of genetic evidence that laboratories consider. Deciding how to weigh each type of evidence is difficult, and standards have been needed. In 2015, the American College of Medical Genetics and Genomics (ACMG) and the Association for Molecular Pathology (AMP) published guidelines for the assessment of variants in genes associated with Mendelian diseases. Nine molecular diagnostic laboratories involved in the Clinical Sequencing Exploratory Research (CSER) consortium piloted these guidelines on 99 variants spanning all categories (pathogenic, likely pathogenic, uncertain significance, likely benign, and benign). Nine variants were distributed to all laboratories, and the remaining 90 were evaluated by three laboratories. The laboratories classified each variant by using both the laboratory's own method and the ACMG-AMP criteria. The agreement between the two methods used within laboratories was high (K-alpha = 0.91) with 79% concordance. However, there was only 34% concordance for either classification system across laboratories. After consensus discussions and detailed review of the ACMG-AMP criteria, concordance increased to 71%. Causes of initial discordance in ACMG-AMP classifications were identified, and recommendations on clarification and increased specification of the ACMG-AMP criteria were made. In summary, although an initial pilot of the ACMG-AMP guidelines did not lead to increased concordance in variant interpretation, comparing variant interpretations to identify differences and having a common framework to facilitate resolution of those differences were beneficial for improving agreement, allowing iterative movement toward increased reporting consistency for variants in genes associated with monogenic disease. PMID:27181684
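
    The concordance figures quoted in this abstract can be read as the share of variants on which every reviewing laboratory agreed. The sketch below shows that arithmetic under a stated assumption: concordance is taken to mean complete agreement among all laboratories that classified a variant, which may differ from the exact definition used in the study. The function name and the example classifications are hypothetical.

```python
from typing import Dict, List


def cross_lab_concordance(calls: Dict[str, List[str]]) -> float:
    """Fraction of variants for which every laboratory assigned the same class.

    `calls` maps a variant identifier to the classifications it received,
    one entry per laboratory that reviewed it.
    """
    if not calls:
        raise ValueError("no variants supplied")
    concordant = sum(1 for labels in calls.values() if len(set(labels)) == 1)
    return concordant / len(calls)


# Hypothetical classifications, not data from the study.
example_calls = {
    "variant_A": ["pathogenic", "pathogenic", "pathogenic"],
    "variant_B": ["likely benign", "uncertain significance", "likely benign"],
    "variant_C": ["benign", "benign", "benign"],
    "variant_D": ["likely pathogenic", "pathogenic", "uncertain significance"],
}
print(f"{cross_lab_concordance(example_calls):.0%}")  # 2 of 4 variants agree: 50%
```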

  11. Whole-exome sequencing to identify genetic risk variants underlying inhibitor development in severe hemophilia A patients.

    PubMed

    Gorski, Marcin M; Blighe, Kevin; Lotta, Luca A; Pappalardo, Emanuela; Garagiola, Isabella; Mancini, Ilaria; Mancuso, Maria Elisa; Fasulo, Maria Rosaria; Santagostino, Elena; Peyvandi, Flora

    2016-06-01

    The development of neutralizing antibodies (inhibitors) against coagulation factor VIII (FVIII) is the most problematic and costly complication of FVIII replacement therapy that affects up to 30% of previously untreated patients with severe hemophilia A. The development of inhibitors is a multifactorial complication involving environmental and genetic factors. Among the latter, F8 gene mutations, ethnicity, family history of inhibitors, and polymorphisms affecting genes involved in the immune response have been previously investigated. To identify novel genetic elements underlying the risk of inhibitor development in patients with severe hemophilia A, we applied whole-exome sequencing (WES) and data analysis in a selected group of 26 Italian patients with (n = 17) and without (n = 9) inhibitors. WES revealed several rare, damaging variants in immunoregulatory genes as novel candidate mutations. A case-control association analysis using Cochran-Armitage and Fisher's exact statistical tests identified 1364 statistically significant variants. Hierarchical clustering of these genetic variants showed 2 distinct patterns of homozygous variants with a protective or harmful role in inhibitor development. When looking solely at coding variants, a total of 28 nonsynonymous variants were identified and replicated in 53 inhibitor-positive and 174 inhibitor-negative Italian severe hemophilia A patients using a TaqMan genotyping assay. The genotyping results revealed 10 variants showing estimated odds ratios in the same direction as in the discovery phase and confirmed the association of the rs3754689 missense variant (OR 0.58; 95% CI 0.36-0.94; P = .028) in a highly conserved haplotype region surrounding the LCT locus on chromosome 2q21 with inhibitor development. PMID:27060170

  12. The Swedish new variant of Chlamydia trachomatis: genome sequence, morphology, cell tropism and phenotypic characterization

    PubMed Central

    Unemo, Magnus; Seth-Smith, Helena M. B.; Cutcliffe, Lesley T.; Skilton, Rachel J.; Barlow, David; Goulding, David; Persson, Kenneth; Harris, Simon R.; Kelly, Anne; Bjartling, Carina; Fredlund, Hans; Olcén, Per; Thomson, Nicholas R.; Clarke, Ian N.

    2010-01-01

    Chlamydia trachomatis is a major cause of bacterial sexually transmitted infections worldwide. In 2006, a new variant of C. trachomatis (nvCT), carrying a 377 bp deletion within the plasmid, was reported in Sweden. This deletion included the targets used by the commercial diagnostic systems from Roche and Abbott. The nvCT is clonal (serovar/genovar E) and it spread rapidly in Sweden, undiagnosed by these systems. The degree of spread may also indicate an increased biological fitness of nvCT. The aims of this study were to describe the genome of nvCT, to compare the nvCT genome to all available C. trachomatis genome sequences and to investigate the biological properties of nvCT. An early nvCT isolate (Sweden2) was analysed by genome sequencing, growth kinetics, microscopy, cell tropism assay and antimicrobial susceptibility testing. It was compared with relevant C. trachomatis isolates, including a similar serovar E C. trachomatis wild-type strain that circulated in Sweden prior to the initially undetected expansion of nvCT. The nvCT genome does not contain any major genetic polymorphisms – the genes for central metabolism, development cycle and virulence are conserved – or phenotypic characteristics that indicate any altered biological fitness. This is supported by the observations that the nvCT and wild-type C. trachomatis infections are very similar in terms of epidemiological distribution, and that differences in clinical signs are only described, in one study, in women. In conclusion, the nvCT does not appear to have any altered biological fitness. Therefore, the rapid transmission of nvCT in Sweden was due to the strong diagnostic selective advantage and its introduction into a high-frequency transmitting population. PMID:20093289

  13. Next-generation re-sequencing of genes involved in increased platelet reactivity in diabetic patients on acetylsalicylic acid.

    PubMed

    Postula, Marek; Janicki, Piotr K; Eyileten, Ceren; Rosiak, Marek; Kaplon-Cieslicka, Agnieszka; Sugino, Shigekazu; Wilimski, Radosław; Kosior, Dariusz A; Opolski, Grzegorz; Filipiak, Krzysztof J; Mirowska-Guzel, Dagmara

    2016-06-01

    The objective of this study was to investigate whether rare missense genetic variants in several genes related to platelet functions and acetylsalicylic acid (ASA) response are associated with the platelet reactivity in patients with type 2 diabetes (T2D) on ASA therapy. Fifty-eight exons and corresponding introns of eight selected genes, including PTGS1, PTGS2, TBXAS1, PTGIS, ADRA2A, ADRA2B, TBXA2R, and P2RY1, were re-sequenced in 230 DNA samples from T2D patients by using a pooled PCR amplification and next-generation sequencing by Illumina HiSeq2000. The observed non-synonymous variants were confirmed by individual genotyping of 384 DNA samples comprising the individuals from the original discovery pools and an additional verification cohort of 154 ASA-treated T2D patients. The association between investigated phenotypes (ASA-induced changes in platelet reactivity by PFA-100, VerifyNow and serum thromboxane B2 level [sTxB2]), and accumulation of rare missense variants (genetic burden) in investigated genes was tested using statistical collapsing tests. We identified a total of 35 exonic variants, including 3 common missense variants, 15 rare missense variants, and 17 synonymous variants in 8 investigated genes. The rare missense variants exhibited a statistically significant difference in the accumulation pattern between groups of patients with increased and normal platelet reactivity based on the PFA-100 assay. Our study suggests that genetic burden of the rare functional variants in eight genes may contribute to differences in the platelet reactivity measured with the PFA-100 assay in the T2D patients treated with ASA. PMID:26599574

  14. Detecting frame shifts by amino acid sequence comparison.

    PubMed

    Claverie, J M

    1993-12-20

    Various amino acid substitution scoring matrices are used in conjunction with local alignment programs to detect regions of similarity and infer potential common ancestry between proteins. The usual scoring schemes derive from the implicit hypothesis that related proteins evolve from a common ancestor by the accumulation of point mutations and that amino acids tend to be progressively substituted by others with similar properties. However, other frequent single mutation events, like nucleotide insertion or deletion and gene inversion, change the translation reading frame and cause previously encoded amino acid sequences to become unrecognizable at once. Here, I derive five new types of scoring matrix, each capable of detecting a specific frame shift (deletion, insertion and inversion in 3 frames) and use them with a regular local alignment program to detect amino acid sequences that may have derived from alternative reading frames of the same nucleotide sequence. Frame shifts are inferred from the sole comparison of the protein sequences. The five scoring matrices were used with the BLASTP program to compare all the protein sequences in the Swissprot database. Surprisingly, the searches revealed hundreds of highly significant frame shift matches, of which many are likely to represent sequencing errors. Others provide some evidence that frame shift mutations might be used in protein evolution as a way to create new amino acid sequences from pre-existing coding regions. PMID:7903399

  15. CBH1 homologs and variant CBH1 cellulases

    DOEpatents

    Goedegebuur, Frits; Gualfetti, Peter; Mitchinson, Colin; Neefe, Paulien

    2011-05-31

    Disclosed are a number of homologs and variants of Hypocrea jecorina Cel7A (formerly Trichoderma reesei cellobiohydrolase I or CBH1), nucleic acids encoding the same and methods for producing the same. The homologs and variant cellulases have the amino acid sequence of a glycosyl hydrolase of family 7A wherein one or more amino acid residues are substituted and/or deleted.

  16. CBH1 homologs and variant CBH1 cellulases

    DOEpatents

    Goedegebuur, Frits; Gualfetti, Peter; Mitchinson, Colin; Neefe, Paulien

    2008-11-18

    Disclosed are a number of homologs and variants of Hypocrea jecorina Cel7A (formerly Trichoderma reesei cellobiohydrolase I or CBH1), nucleic acids encoding the same and methods for producing the same. The homologs and variant cellulases have the amino acid sequence of a glycosyl hydrolase of family 7A wherein one or more amino acid residues are substituted and/or deleted.

  17. Multiple Amino Acid Sequence Alignment Nitrogenase Component 1: Insights into Phylogenetics and Structure-Function Relationships

    PubMed Central

    Howard, James B.; Kechris, Katerina J.; Rees, Douglas C.; Glazer, Alexander N.

    2013-01-01

    Amino acid residues critical for a protein's structure-function are retained by natural selection and these residues are identified by the level of variance in co-aligned homologous protein sequences. The relevant residues in the nitrogen fixation Component 1 α- and β-subunits were identified by the alignment of 95 protein sequences. Proteins were included from species encompassing multiple microbial phyla and diverse ecological niches as well as the nitrogen fixation genotypes, anf, nif, and vnf, which encode proteins associated with cofactors differing at one metal site. After adjusting for differences in sequence length, insertions, and deletions, the remaining >85% of the sequence co-aligned the subunits from the three genotypes. Six Groups, designated Anf, Vnf, and Nif I-IV, were assigned based upon genetic origin, sequence adjustments, and conserved residues. Both subunits subdivided into the same groups. Invariant and single variant residues were identified and were defined as “core” for nitrogenase function. Three species in Group Nif-III, Candidatus Desulforudis audaxviator, Desulfotomaculum kuznetsovii, and Thermodesulfatator indicus, were found to have a seleno-cysteine that replaces one cysteinyl ligand of the 8Fe:7S, P-cluster. Subsets of invariant residues, limited to individual groups, were identified; these unique residues help identify the gene of origin (anf, nif, or vnf) yet should not be considered diagnostic of the metal content of associated cofactors. Fourteen of the 19 residues that compose the cofactor pocket are invariant or single variant; the other five residues are highly variable but do not correlate with the putative metal content of the cofactor. The variable residues are clustered on one side of the cofactor, away from other functional centers in the three dimensional structure. Many of the invariant and single variant residues were not previously recognized as potentially critical and their identification provides the bases

  18. Segments of amino acid sequence similarity in beta-amylases.

    PubMed

    Friedberg, F; Rhodes, C

    1988-01-01

    In alpha-amylases from animals, plants and bacteria and in beta-amylases from plants and bacteria, a number of segments exhibit amino acid sequence similarity specific to the alpha or to the beta type, respectively. In the case of the beta-amylases, the similar sequence regions are extensive and are disrupted only by short interspersed dissimilar regions. Close to the C terminus, however, no such sequence similarity exists. PMID:2464171

  19. In search of rare variants: preliminary results from whole genome sequencing of 1,325 individuals with psychophysiological endophenotypes.

    PubMed

    Vrieze, Scott I; Malone, Stephen M; Vaidyanathan, Uma; Kwong, Alan; Kang, Hyun Min; Zhan, Xiaowei; Flickinger, Matthew; Irons, Daniel; Jun, Goo; Locke, Adam E; Pistis, Giorgio; Porcu, Eleonora; Levy, Shawn; Myers, Richard M; Oetting, William; McGue, Matt; Abecasis, Goncalo; Iacono, William G

    2014-12-01

    Whole genome sequencing was completed on 1,325 individuals from 602 families, identifying 27 million autosomal variants. Genetic association tests were conducted for those individuals who had been assessed for one or more of 17 endophenotypes (N range = 802-1,185). No significant associations were found. These 27 million variants were then imputed into the full sample of individuals with psychophysiological data (N range = 3,088-4,469) and again tested for associations with the 17 endophenotypes. No association was significant. Using a gene-based variable threshold burden test of nonsynonymous variants, we obtained five significant associations. These findings are preliminary and call for additional analysis of this rich sample. We argue that larger samples, alternative study designs, and additional bioinformatics approaches will be necessary to discover associations between these endophenotypes and genomic variation. PMID:25387710

  20. Antagonistic lactic acid bacteria isolated from goat milk and identification of a novel nisin variant Lactococcus lactis

    PubMed Central

    2014-01-01

    Background: The raw goat milk microbiota is considered a good source of novel bacteriocinogenic lactic acid bacteria (LAB) strains that can be exploited as an alternative for use as biopreservatives in foods. The constant demand for such alternative tools justifies studies that investigate the antimicrobial potential of such strains. Results: The obtained data identified a predominance of Lactococcus and Enterococcus strains in raw goat milk microbiota with antimicrobial activity against Listeria monocytogenes ATCC 7644. Enzymatic assays confirmed the bacteriocinogenic nature of the antimicrobial substances produced by the isolated strains, and PCR reactions detected a variety of bacteriocin-related genes in their genomes. Rep-PCR identified broad genetic variability among the Enterococcus isolates, and close relations between the Lactococcus strains. The sequencing of PCR products from nis-positive Lactococcus allowed the identification of a predicted nisin variant not previously described and possessing a wide inhibitory spectrum. Conclusions: Raw goat milk was confirmed as a good source of novel bacteriocinogenic LAB strains, having identified Lactococcus isolates possessing variations in their genomes that suggest the production of a nisin variant not yet described and with potential for use as biopreservatives in food due to its broad spectrum of action. PMID:24521354

  1. Whole-Genome Sequencing of a Canine Family Trio Reveals a FAM83G Variant Associated with Hereditary Footpad Hyperkeratosis.

    PubMed

    Sayyab, Shumaila; Viluma, Agnese; Bergvall, Kerstin; Brunberg, Emma; Jagannathan, Vidhya; Leeb, Tosso; Andersson, Göran; Bergström, Tomas F

    2016-03-01

    Over 250 Mendelian traits and disorders caused by rare alleles have been mapped in the canine genome. Although each disease is rare in the dog as a species, they are collectively common and have a major impact on canine health. With SNP-based genotyping arrays, genome-wide association studies (GWAS) have proven to be a powerful method to map the genomic region of interest when 10-20 cases and 10-20 controls are available. However, to identify the genetic variant in associated regions, fine-mapping and targeted resequencing are required. Here we present a new approach using whole-genome sequencing (WGS) of a family trio without prior GWAS. As a proof-of-concept, we chose an autosomal recessive disease known as hereditary footpad hyperkeratosis (HFH) in Kromfohrländer dogs. To our knowledge, this is the first time this family trio WGS-approach has been used successfully to identify a genetic variant that perfectly segregates with a canine disorder. The sequencing of three Kromfohrländer dogs from a family trio (an affected offspring and both its healthy parents) resulted in an average genome coverage of 9.2X per individual. After applying stringent filtering criteria for candidate causative coding variants, 527 single nucleotide variants (SNVs) and 15 indels were found to be homozygous in the affected offspring and heterozygous in the parents. Using the software packages ANNOVAR and SIFT to functionally annotate coding sequence differences and to predict their functional effect resulted in seven candidate variants located in six different genes. Of these, only FAM83G:c.155G>C (p.R52P) was found to be concordant in eight additional cases and 16 healthy Kromfohrländer dogs. PMID:26747202
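
    The segregation filter described in this abstract (homozygous in the affected offspring, heterozygous in both healthy parents) is the pattern expected for an autosomal recessive variant in a trio. The sketch below illustrates only that one filtering step. The record layout, the field names (`offspring`, `sire`, `dam`) and the genotype strings are hypothetical; the study's actual pipeline and its additional filtering criteria are not reproduced here.

```python
from typing import Dict, List

HOM_ALT = "1/1"  # homozygous for the alternate allele
HET = "0/1"      # heterozygous


def recessive_trio_candidates(variants: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep variants homozygous in the affected offspring and heterozygous in both parents."""
    return [
        v for v in variants
        if v["offspring"] == HOM_ALT and v["sire"] == HET and v["dam"] == HET
    ]


# Hypothetical genotype records using VCF-style unphased genotype strings.
example_variants = [
    {"id": "var1", "offspring": "1/1", "sire": "0/1", "dam": "0/1"},
    {"id": "var2", "offspring": "0/1", "sire": "0/1", "dam": "0/0"},
    {"id": "var3", "offspring": "1/1", "sire": "1/1", "dam": "0/1"},
]
print([v["id"] for v in recessive_trio_candidates(example_variants)])  # ['var1']
```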

  2. The eSNV-detect: a computational system to identify expressed single nucleotide variants from transcriptome sequencing data.

    PubMed

    Tang, Xiaojia; Baheti, Saurabh; Shameer, Khader; Thompson, Kevin J; Wills, Quin; Niu, Nifang; Holcomb, Ilona N; Boutet, Stephane C; Ramakrishnan, Ramesh; Kachergus, Jennifer M; Kocher, Jean-Pierre A; Weinshilboum, Richard M; Wang, Liewei; Thompson, E Aubrey; Kalari, Krishna R

    2014-12-16

    Rapid development of next generation sequencing technology has enabled the identification of genomic alterations from short sequencing reads. There are a number of software pipelines available for calling single nucleotide variants from genomic DNA but no comprehensive pipelines to identify, annotate and prioritize expressed SNVs (eSNVs) from non-directional paired-end RNA-Seq data. We have developed the eSNV-Detect, a novel computational system, which utilizes data from multiple aligners to call, even at low read depths, and rank variants from RNA-Seq. Multi-platform comparisons with the eSNV-Detect variant candidates were performed. The method was first applied to RNA-Seq from a lymphoblastoid cell-line, achieving 99.7% precision and 91.0% sensitivity in the expressed SNPs for the matching HumanOmni2.5 BeadChip data. Comparison of RNA-Seq eSNV candidates from 25 ER+ breast tumors from The Cancer Genome Atlas (TCGA) project with whole exome coding data showed 90.6-96.8% precision and 91.6-95.7% sensitivity. Contrasting single-cell mRNA-Seq variants with matching traditional multicellular RNA-Seq data for the MDA-MB-231 breast cancer cell-line delineated variant heterogeneity among the single cells. Further, Sanger sequencing validation was performed for an ER+ breast tumor with paired normal adjacent tissue, validating 29 out of 31 candidate eSNVs. The source code and user manuals of the eSNV-Detect pipeline for Sun Grid Engine and virtual machine are available at http://bioinformaticstools.mayo.edu/research/esnv-detect/. PMID:25352556
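
    The precision and sensitivity figures quoted above compare called variants against a reference set. A minimal sketch of that comparison follows. It assumes variants are represented as (chromosome, position, alternate allele) tuples, and the example calls are made up. It is not part of the eSNV-Detect code.

```python
def precision_and_sensitivity(called, truth):
    """Compare a set of called variants with a reference ("truth") set."""
    true_pos = len(called & truth)    # called and present in the reference
    false_pos = len(called - truth)   # called but absent from the reference
    false_neg = len(truth - called)   # in the reference but not called
    precision = true_pos / (true_pos + false_pos) if called else 0.0
    sensitivity = true_pos / (true_pos + false_neg) if truth else 0.0
    return precision, sensitivity


# Hypothetical variant calls and reference genotypes
called = {("chr1", 100, "A"), ("chr1", 200, "T"), ("chr2", 50, "G"), ("chr3", 10, "C")}
truth = {("chr1", 100, "A"), ("chr1", 200, "T"), ("chr2", 50, "G"), ("chr4", 5, "T")}

precision, sensitivity = precision_and_sensitivity(called, truth)
print(f"precision={precision:.2f} sensitivity={sensitivity:.2f}")

# The Sanger validation rate reported in the abstract: 29 of 31 candidates
sanger_rate = 29 / 31
print(f"Sanger validation rate: {sanger_rate:.1%}")
```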

  3. Whole-Genome Sequencing of a Canine Family Trio Reveals a FAM83G Variant Associated with Hereditary Footpad Hyperkeratosis

    PubMed Central

    Sayyab, Shumaila; Viluma, Agnese; Bergvall, Kerstin; Brunberg, Emma; Jagannathan, Vidhya; Leeb, Tosso; Andersson, Göran; Bergström, Tomas F.

    2016-01-01

    Over 250 Mendelian traits and disorders caused by rare alleles have been mapped in the canine genome. Although each disease is rare in the dog as a species, they are collectively common and have a major impact on canine health. With SNP-based genotyping arrays, genome-wide association studies (GWAS) have proven to be a powerful method to map the genomic region of interest when 10–20 cases and 10–20 controls are available. However, to identify the genetic variant in associated regions, fine-mapping and targeted resequencing are required. Here we present a new approach using whole-genome sequencing (WGS) of a family trio without prior GWAS. As a proof-of-concept, we chose an autosomal recessive disease known as hereditary footpad hyperkeratosis (HFH) in Kromfohrländer dogs. To our knowledge, this is the first time this family trio WGS-approach has been used successfully to identify a genetic variant that perfectly segregates with a canine disorder. The sequencing of three Kromfohrländer dogs from a family trio (an affected offspring and both its healthy parents) resulted in an average genome coverage of 9.2X per individual. After applying stringent filtering criteria for candidate causative coding variants, 527 single nucleotide variants (SNVs) and 15 indels were found to be homozygous in the affected offspring and heterozygous in the parents. Using the computer software packages ANNOVAR and SIFT to functionally annotate coding sequence differences and to predict their functional effect resulted in seven candidate variants located in six different genes. Of these, only FAM83G:c.155G>C (p.R52P) was found to be concordant in eight additional cases and 16 healthy Kromfohrländer dogs. PMID:26747202

  4. Genome Sequence of Rough and Smooth Variants of Pleomorphic Strain Lactobacillus farciminis CNCM-I-3699

    PubMed Central

    Tareb, R.; Bernardeau, M.

    2015-01-01

    The probiotic Lactobacillus farciminis CNCM-I-3699 is a pleomorphic strain exhibiting smooth and rough variants. We report their complete genomes consisting of a chromosome of 2.4 Mb and a plasmid of 6,417 bp. The smooth variant differs by the presence of an additional plasmid of 35,418 bp. PMID:26383668

  5. Germ-line variants identified by next generation sequencing in a panel of estrogen and cancer associated genes correlate with poor clinical outcome in Lynch syndrome patients

    PubMed Central

    Jóri, Balazs; Delvoux, Bert; Blok, Marinus J.; Van de Vijver, Koen K.; de Koning, Bart; Oei, Felicia Trups; Tops, Carli M.; Speel, Ernst J. M.; Kruitwagen, Roy F.; Gomez-Garcia, Encarna B.; Romano, Andrea

    2015-01-01

    Background: The risk of developing colorectal and endometrial cancers among subjects testing positive for a pathogenic Lynch syndrome mutation varies, making risk prediction difficult. Genetic risk modifiers alter the risk conferred by inherited Lynch syndrome mutations, and their identification can improve genetic counseling. We aimed at identifying rare genetic modifiers of the risk of Lynch syndrome endometrial cancer. Methods: A family-based approach was used to assess the presence of genetic risk modifiers among 35 Lynch syndrome mutation carriers having either a poor clinical phenotype (early age of endometrial cancer diagnosis or multiple cancers) or a neutral clinical phenotype. Putative genetic risk modifiers were identified by Next Generation Sequencing among a panel of 154 genes involved in endometrial physiology and carcinogenesis. Results: A simple pipeline, based on an allele frequency lower than 0.001 and on predicted non-conservative amino-acid substitutions, returned 54 variants that were considered putative risk modifiers. The presence of two or more risk-modifying variants in women carrying a pathogenic Lynch syndrome mutation was associated with a poor clinical phenotype. Conclusion: A gene panel is proposed that comprises genes that can carry variants with putative modifying effects on the risk of Lynch syndrome endometrial cancer. Validation in further studies is warranted before considering the possible use of this tool in genetic counseling. PMID:26517685

  6. In vivo distribution and cytopathology of variants of human immunodeficiency virus type 1 showing restricted sequence variability in the V3 loop.

    PubMed Central

    Donaldson, Y K; Bell, J E; Holmes, E C; Hughes, E S; Brown, H K; Simmonds, P

    1994-01-01

    The distribution, cell tropism, and cytopathology in vivo of human immunodeficiency virus (HIV) was investigated in postmortem tissue samples from a series of HIV-infected individuals who died either of complications associated with AIDS or for unrelated reasons while they were asymptomatic. Proviral sequences were detected at a high copy number in lymphoid tissue of both presymptomatic patients and patients with AIDS, whereas significant infection of nonlymphoid tissue such as that from brains, spinal cords, and lungs was confined to those with AIDS. V3 loop sequences from both groups showed highly restricted sequence variability and a low overall positive charge of the encoded amino acid sequence compared with those of standard laboratory isolates of HIV type 1 (HIV-1). The low charge and the restriction in sequence variability were comparable to those observed with isolates showing a non-syncytium-inducing (NSI) and macrophage-tropic phenotype in vitro. All patients were either exclusively infected (six of seven cases) or predominantly infected (one case) with variants with a predicted NSI/macrophage-tropic phenotype, irrespective of the degree of disease progression. p24 antigen was detected by immunocytochemical staining of paraffin-fixed sections in the germinal centers within lymphoid tissue, although little or no antigen was found in areas of lymph node or spleen containing T lymphocytes from either presymptomatic patients or patients with AIDS. The predominant p24 antigen-expressing cells in the lungs and brains of the patients with AIDS were macrophages and microglia (in brains), frequently forming multinucleated giant cells (syncytia) even though the V3 loop sequences of these variants resembled those of NSI isolates in vitro. These studies indicate that lack of syncytium-forming ability in established T-cell lines does not necessarily predict syncytium-forming ability in primary target cells in vivo. Furthermore, variants of HIV with V3 sequences

  7. Clinically relevant variants identified in thoracic aortic aneurysm patients by research exome sequencing.

    PubMed

    Schubert, Jeffrey A; Landis, Benjamin J; Shikany, Amy R; Hinton, Robert B; Ware, Stephanie M

    2016-05-01

    Thoracic aortic aneurysm (TAA) is a genetically heterogeneous disease involving subclinical and progressive dilation of the thoracic aorta, which can lead to life-threatening complications such as dissection or rupture. Genetic testing is important for risk stratification and identification of at risk family members, and clinically available genetic testing panels have been expanding rapidly. However, when past testing results are normal, there is little evidence to guide decision-making about the indications and timing to pursue additional clinical genetic testing. Results from research based genetic testing can help inform this process. Here we present 10 TAA patients who have a family history of disease and who enrolled in research-based exome testing. Nine of these ten patients had previous clinical genetic testing that did not identify the cause of disease. We sought to determine the number of rare variants in 23 known TAA associated genes identified by research-based exome testing. In total, we found 10 rare variants in six patients. Likely pathogenic variants included a TGFB2 variant in one patient and a SMAD3 variant in another. These variants have been reported previously in individuals with similar phenotypes. Variants of uncertain significance of particular interest included novel variants in MYLK and MFAP5, which were identified in a third patient. In total, clinically reportable rare variants were found in 6/10 (60%) patients, with at least 2/10 (20%) patients having likely pathogenic variants identified. These data indicate that consideration of re-testing is important in TAA patients with previous negative or inconclusive results. PMID:26854089

  8. Genetic correction of PSA values using sequence variants associated with PSA levels

    PubMed Central

    Gudmundsson, Julius; Besenbacher, Soren; Sulem, Patrick; Gudbjartsson, Daniel F.; Olafsson, Isleifur; Arinbjarnarson, Sturla; Agnarsson, Bjarni A.; Benediktsdottir, Kristrun R.; Isaksson, Helgi J.; Kostic, Jelena P.; Gudjonsson, Sigurjon A.; Stacey, Simon N.; Gylfason, Arnaldur; Sigurdsson, Asgeir; Holm, Hilma; Bjornsdottir, Unnur S.; Eyjolfsson, Gudmundur I.; Navarrete, Sebastian; Fuertes, Fernando; Garcia-Prats, Maria D.; Polo, Eduardo; Checherita, Ionel A.; Jinga, Mariana; Badea, Paula; Aben, Katja K.; Schalken, Jack A.; van Oort, Inge M.; Sweep, Fred C.; Helfand, Brian T.; Davis, Michael; Donovan, Jenny L.; Hamdy, Freddie C.; Kristjansson, Kristleifur; Gulcher, Jeffrey R.; Masson, Gisli; Kong, Augustine; Catalona, William J.; Mayordomo, Jose I.; Geirsson, Gudmundur; Einarsson, Gudmundur V.; Barkardottir, Rosa B.; Jonsson, Eirikur; Jinga, Viorel; Mates, Dana; Kiemeney, Lambertus A.; Neal, David E.; Thorsteinsdottir, Unnur; Rafnar, Thorunn; Stefansson, Kari

    2013-01-01

    Measuring serum levels of the prostate specific antigen (PSA) is the most common screening method for prostate cancer. However, PSA levels are affected by a number of factors apart from neoplasia. Notably, around 40% of the variability of PSA levels in the general population is accounted for by inherited factors, suggesting that it may be possible to improve both sensitivity and specificity by adjusting test results for genetic effects. In order to search for sequence variants that associate with PSA levels, we performed a genome-wide association study and follow-up analysis using PSA information from 15,757 Icelandic and 454 British men not diagnosed with prostate cancer. Overall, we detected a genome-wide significant association between PSA levels and SNPs at six loci: 5p15.33 (rs2736098), 10q11 (rs10993994), 10q26 (rs10788160), 12q24 (rs11067228), 17q12 (rs4430796), and 19q13.33 (rs17632542; KLK3: I179T), each with Pcombined < 3×10^-10. Among 3,834 men who underwent a biopsy of the prostate, the 10q26, 12q24, and 19q13.33 alleles that associate with high PSA levels are associated with higher probability of a negative biopsy (OR between 1.15 and 1.27). Assessment of association between the 6 loci and prostate cancer risk in 5,325 cases and 41,417 controls from Iceland, the Netherlands, Spain, Romania, and the US showed that the SNPs at 10q26 and 12q24 were exclusively associated with PSA levels, whereas the other 4 loci also were associated with prostate cancer risk. We propose that a personalized PSA cutoff value, based on genotype, should be used when deciding to perform a prostate biopsy. PMID:21160077

  9. Interpretation, stratification and evidence for sequence variants affecting mRNA splicing in complete human genome sequences.

    PubMed

    Shirley, Ben C; Mucaki, Eliseos J; Whitehead, Tyson; Costea, Paul I; Akan, Pelin; Rogan, Peter K

    2013-04-01

    Information theory-based methods have been shown to be sensitive and specific for predicting and quantifying the effects of non-coding mutations in Mendelian diseases. We present the Shannon pipeline software for genome-scale mutation analysis and provide evidence that the software predicts variants affecting mRNA splicing. Individual information contents (in bits) of reference and variant splice sites are compared and significant differences are annotated and prioritized. The software has been implemented for CLC-Bio Genomics platform. Annotation indicates the context of novel mutations as well as common and rare SNPs with splicing effects. Potential natural and cryptic mRNA splicing variants are identified, and null mutations are distinguished from leaky mutations. Mutations and rare SNPs were predicted in genomes of three cancer cell lines (U2OS, U251 and A431), which were supported by expression analyses. After filtering, tractable numbers of potentially deleterious variants are predicted by the software, suitable for further laboratory investigation. In these cell lines, novel functional variants comprised 6-17 inactivating mutations, 1-5 leaky mutations and 6-13 cryptic splicing mutations. Predicted effects were validated by RNA-seq analysis of the three aforementioned cancer cell lines, and expression microarray analysis of SNPs in HapMap cell lines. PMID:23499923
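
    The comparison of individual information contents (in bits) between a reference and a variant splice site can be illustrated with a toy calculation. The sketch below assumes the commonly used per-position weight of 2 + log2(frequency) for a four-letter alphabet, omits any small-sample correction, and builds its weights from a handful of made-up aligned sites with a pseudocount. The Shannon pipeline itself relies on its own curated models, which are not reproduced here.

```python
import math


def weight_matrix(aligned_sites, pseudocount=1.0):
    """Build per-position weights (bits) from equal-length aligned sites."""
    length = len(aligned_sites[0])
    matrix = []
    for pos in range(length):
        counts = {base: pseudocount for base in "ACGT"}
        for site in aligned_sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        # 2 bits is the maximum per position for a four-letter alphabet
        matrix.append({b: 2.0 + math.log2(c / total) for b, c in counts.items()})
    return matrix


def individual_information(site, matrix):
    """Sum the weights of the bases actually present in one site."""
    return sum(column[base] for base, column in zip(site, matrix))


# Hypothetical aligned donor-like sites used only to build the toy weights
training_sites = ["CAGGTAAGT", "AAGGTGAGT", "CAGGTAAGA", "GAGGTAAGT"]
matrix = weight_matrix(training_sites)

ri_reference = individual_information("CAGGTAAGT", matrix)
ri_variant = individual_information("CAGATAAGT", matrix)  # G>A at position 4
print(f"reference: {ri_reference:.2f} bits, variant: {ri_variant:.2f} bits")
print(f"change: {ri_variant - ri_reference:.2f} bits")
```

    A large drop in information content, as in this toy variant, is the kind of difference that would be flagged for prioritization.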

  10. 37 CFR 1.821 - Nucleotide and/or amino acid sequence disclosures in patent applications.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... acids are not intended to be embraced by this definition. Any amino acid sequence that contains post-translationally modified amino acids may be described as the amino acid sequence that is initially translated... sequence of four or more amino acids or an unbranched sequence of ten or more nucleotides....

  11. 37 CFR 1.821 - Nucleotide and/or amino acid sequence disclosures in patent applications.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... acids are not intended to be embraced by this definition. Any amino acid sequence that contains post-translationally modified amino acids may be described as the amino acid sequence that is initially translated... sequence of four or more amino acids or an unbranched sequence of ten or more nucleotides....

  12. 37 CFR 1.821 - Nucleotide and/or amino acid sequence disclosures in patent applications.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... acids are not intended to be embraced by this definition. Any amino acid sequence that contains post-translationally modified amino acids may be described as the amino acid sequence that is initially translated... sequence of four or more amino acids or an unbranched sequence of ten or more nucleotides....

  13. Identification of rare DNA sequence variants in high-risk autism families and their prevalence in a large case/control population

    PubMed Central

    2014-01-01

    Background: Genetics clearly plays a major role in the etiology of autism spectrum disorders (ASDs), but studies to date are only beginning to characterize the causal genetic variants responsible. Until recently, studies using multiple extended multi-generation families to identify ASD risk genes had not been undertaken. Methods: We identified haplotypes shared among individuals with ASDs in large multiplex families, followed by targeted DNA capture and sequencing to identify potential causal variants. We also assayed the prevalence of the identified variants in a large ASD case/control population. Results: We identified 584 non-conservative missense, nonsense, frameshift and splice site variants that might predispose to autism in our high-risk families. Eleven of these variants were observed to have odds ratios greater than 1.5 in a set of 1,541 unrelated children with autism and 5,785 controls. Three variants, in the RAB11FIP5, ABP1, and JMJD7-PLA2G4B genes, each were observed in a single case and not in any controls. These variants also were not seen in public sequence databases, suggesting that they may be rare causal ASD variants. Twenty-eight additional rare variants were observed only in high-risk ASD families. Collectively, these 39 variants identify 36 genes as ASD risk genes. Segregation of sequence variants and of copy number variants previously detected in these families reveals a complex pattern, with only a RAB11FIP5 variant segregating to all affected individuals in one two-generation pedigree. Some affected individuals were found to have multiple potential risk alleles, including sequence variants and copy number variants (CNVs), suggesting that the high incidence of autism in these families could be best explained by variants at multiple loci. Conclusions: Our study is the first to use haplotype sharing to identify familial ASD risk loci. In total, we identified 39 variants in 36 genes that may confer a genetic risk of developing autism. The

  14. Cell culture media supplementation of bioflavonoids for the targeted reduction of acidic species charge variants on recombinant therapeutic proteins.

    PubMed

    Hossler, Patrick; Wang, Min; McDermott, Sean; Racicot, Christopher; Chemfe, Kofi; Zhang, Yun; Chumsae, Christopher; Manuilov, Anton

    2015-01-01

    Charge variants in recombinant proteins are an important series of protein modifications, whose potential role on protein stability, activity, immunogenicity, and pharmacokinetics continues to be studied. Monoclonal antibodies in particular have been shown to have a wide range of acidic species variants, including those associated with the addition of covalent modifications as well as the chemical degradation at specific peptide regions on the antibody. These variants play a significant role toward the overall heterogeneity of recombinant therapeutic proteins and are typically monitored during manufacturing to ensure they lie within proven acceptable ranges. In this work, it has been found that the supplementation of members of the bioflavonoid chemical family into mammalian cell culture media was effective toward the reduction of acidic species charge variants on recombinant monoclonal antibodies and dual variable domain immunoglobulins. The demonstrated reduction in acidic species through the use of bioflavonoids facilitates the manufacturing of a less heterogeneous product with potential improvements in antibody structure and function. PMID:25920009

  15. Whole Exome Sequencing for a Patient with Rubinstein-Taybi Syndrome Reveals de Novo Variants besides an Overt CREBBP Mutation

    PubMed Central

    Yoo, Hee Jeong; Kim, Kyung; Kim, In Hyang; Rho, Seong-Hwan; Park, Jong-Eun; Lee, Ki Young; Kim, Soon Ae; Choi, Byung Yoon; Kim, Namshin

    2015-01-01

    Rubinstein-Taybi syndrome (RSTS) is a rare condition with a prevalence of 1 in 125,000–720,000 births and is characterized by clinical features that include facial, dental, and limb dysmorphology and growth retardation. Most cases of RSTS occur sporadically and are caused by de novo mutations. Cytogenetic or molecular abnormalities are detected in only 55% of RSTS cases. Previous genetic studies have yielded inconsistent results due to the variety of methods used for genetic analysis. The purpose of this study was to use whole exome sequencing (WES) to evaluate the genetic causes of RSTS in a young girl presenting with an Autism phenotype. We used the Autism diagnostic observation schedule (ADOS) and Autism diagnostic interview revised (ADI-R) to confirm her diagnosis of Autism. In addition, various questionnaires were used to evaluate other psychiatric features. We used WES to analyze the DNA sequences of the patient and her parents and to search for de novo variants. The patient showed all the typical features of Autism. WES revealed a de novo frameshift mutation in CREBBP and de novo sequence variants in TNC and IGFALS genes. Mutations in the CREBBP gene have been extensively reported in RSTS patients, while potential missense mutations in TNC and IGFALS genes have not previously been associated with RSTS. The TNC and IGFALS genes are involved in central nervous system development and growth. It is possible for patients with RSTS to have additional de novo variants that could account for previously unexplained phenotypes. PMID:25768348

  16. Whole exome sequencing for a patient with Rubinstein-Taybi syndrome reveals de novo variants besides an overt CREBBP mutation.

    PubMed

    Yoo, Hee Jeong; Kim, Kyung; Kim, In Hyang; Rho, Seong-Hwan; Park, Jong-Eun; Lee, Ki Young; Kim, Soon Ae; Choi, Byung Yoon; Kim, Namshin

    2015-01-01

    Rubinstein-Taybi syndrome (RSTS) is a rare condition with a prevalence of 1 in 125,000-720,000 births and is characterized by clinical features that include facial, dental, and limb dysmorphology and growth retardation. Most cases of RSTS occur sporadically and are caused by de novo mutations. Cytogenetic or molecular abnormalities are detected in only 55% of RSTS cases. Previous genetic studies have yielded inconsistent results due to the variety of methods used for genetic analysis. The purpose of this study was to use whole exome sequencing (WES) to evaluate the genetic causes of RSTS in a young girl presenting with an Autism phenotype. We used the Autism diagnostic observation schedule (ADOS) and Autism diagnostic interview revised (ADI-R) to confirm her diagnosis of Autism. In addition, various questionnaires were used to evaluate other psychiatric features. We used WES to analyze the DNA sequences of the patient and her parents and to search for de novo variants. The patient showed all the typical features of Autism. WES revealed a de novo frameshift mutation in CREBBP and de novo sequence variants in TNC and IGFALS genes. Mutations in the CREBBP gene have been extensively reported in RSTS patients, while potential missense mutations in TNC and IGFALS genes have not previously been associated with RSTS. The TNC and IGFALS genes are involved in central nervous system development and growth. It is possible for patients with RSTS to have additional de novo variants that could account for previously unexplained phenotypes. PMID:25768348

  17. A method to find palindromes in nucleic acid sequences.

    PubMed

    Anjana, Ramnath; Shankar, Mani; Vaishnavi, Marthandan Kirti; Sekar, Kanagaraj

    2013-01-01

    Various types of sequences in the human genome are known to play important roles in different aspects of genomic functioning. Among these sequences, palindromic nucleic acid sequences are one such type that have been studied in detail and found to influence a wide variety of genomic characteristics. For a nucleotide sequence to be considered as a palindrome, its complementary strand must read the same in the opposite direction. For example, both the strands, i.e., the strand going from 5' to 3' and its complementary strand from 3' to 5', must be complementary. A typical nucleotide palindromic sequence would be TATA (5' to 3') and its complementary sequence from 3' to 5' would be ATAT. Thus, a new method has been developed using dynamic programming to fetch the palindromic nucleic acid sequences. The new method uses less memory and thereby it increases the overall speed and efficiency. The proposed method has been tested using the bacterial (3891 KB bases) and human chromosomal sequences (Chr-18: 74366 kb and Chr-Y: 25554 kb) and the computation time for finding the palindromic sequences is in milliseconds. PMID:23515654
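
    The definition of a nucleotide palindrome given above (a sequence equal to its own reverse complement, such as TATA) can be checked directly. The sketch below uses a plain window scan rather than the dynamic programming method the abstract describes, so it illustrates only the definition and not the authors' algorithm. The function names and length limits are assumptions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the complementary strand read in the 5' to 3' direction."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_palindrome(seq):
    """A nucleotide palindrome equals its own reverse complement."""
    return seq.upper() == reverse_complement(seq)


def find_palindromes(seq, min_len=4, max_len=8):
    """Naively scan every even-length window and report palindromes.

    Odd lengths are skipped: the middle base would have to be its own
    complement, which is impossible for A, C, G and T.
    """
    seq = seq.upper()
    first_len = min_len + (min_len % 2)  # round up to an even length
    hits = []
    for start in range(len(seq)):
        for length in range(first_len, max_len + 1, 2):
            end = start + length
            if end > len(seq):
                break
            window = seq[start:end]
            if is_palindrome(window):
                hits.append((start, window))
    return hits


print(is_palindrome("TATA"))
hits = find_palindromes("GGAATTCC")
print(hits)
```

    The scan takes time proportional to the sequence length times the number of window sizes tried, which is adequate for illustration but not tuned for chromosome-scale input.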

  18. Enhanced Acid Tolerance in Bifidobacterium longum by Adaptive Evolution: Comparison of the Genes between the Acid-Resistant Variant and Wild-Type Strain.

    PubMed

    Jiang, Yunyun; Ren, Fazheng; Liu, Songling; Zhao, Liang; Guo, Huiyuan; Hou, Caiyun

    2016-03-28

    Acid stress can affect the viability of probiotics, especially Bifidobacterium. This study aimed to improve the acid tolerance of Bifidobacterium longum BBMN68 using adaptive evolution. The stress responses and genomic differences of the parental strain and the variant strain were compared under acid stress. The most acid-resistant mutant strain (BBMN68m) was isolated from more than 100 asexual lines, which were adapted to the acid stress for the 10th, 20th, 30th, 40th, and 50th repeats, respectively. The variant strain showed a significant increase in acid tolerance under conditions of pH 2.5 for 2 h (from 7.92 to 4.44 log CFU/ml) compared with the wild-type strain (WT, from 7.87 to 0 log CFU/ml). The surface of the variant strain was also smoother. Comparative whole-genome analysis showed that the galactosyl transferase D gene (cpsD, bbmn68_1012), a key gene involved in exopolysaccharide (EPS) synthesis, was altered by two nucleotides in the mutant, causing alteration in amino acids, pI (from 8.94 to 9.19), and predicted protein structure. Meanwhile, cpsD expression and EPS production were also reduced in the variant strain (p < 0.05) compared with WT, and the exogenous WT-EPS in the variant strain reduced its acid-resistant ability. These results suggested EPS was related to acid responses of BBMN68. PMID:26608165

  19. High-Throughput Sequencing of mGluR Signaling Pathway Genes Reveals Enrichment of Rare Variants in Autism

    PubMed Central

    Hovhannisyan, Hayk; Trautman, Edwin; Pinard, Robert; Rathmell, Barbara; Carpenter, Randall; Margulies, David

    2012-01-01

    Identification of common molecular pathways affected by genetic variation in autism is important for understanding disease pathogenesis and devising effective therapies. Here, we test the hypothesis that rare genetic variation in the metabotropic glutamate-receptor (mGluR) signaling pathway contributes to autism susceptibility. Single-nucleotide variants in genes encoding components of the mGluR signaling pathway were identified by high-throughput multiplex sequencing of pooled samples from 290 non-syndromic autism cases and 300 ethnically matched controls on two independent next-generation platforms. This analysis revealed significant enrichment of rare functional variants in the mGluR pathway in autism cases. Higher burdens of rare, potentially deleterious variants were identified in autism cases for three pathway genes previously implicated in syndromic autism spectrum disorder, TSC1, TSC2, and SHANK3, suggesting that genetic variation in these genes also contributes to risk for non-syndromic autism. In addition, our analysis identified HOMER1, which encodes a postsynaptic density-localized scaffolding protein that interacts with Shank3 to regulate mGluR activity, as a novel autism-risk gene. Rare, potentially deleterious HOMER1 variants identified uniquely in the autism population affected functionally important protein regions or regulatory sequences and co-segregated closely with autism among children of affected families. We also identified rare ASD-associated coding variants predicted to have damaging effects on components of the Ras/MAPK cascade. Collectively, these findings suggest that altered signaling downstream of mGluRs contributes to the pathogenesis of non-syndromic autism. PMID:22558107

  20. DRD1 rare variants associated with tardive-like dystonia: a pilot pathway sequencing study in dystonia.

    PubMed

    Groen, Justus L; Ritz, Katja; Warner, Tom T; Baas, Frank; Tijssen, Marina A J

    2014-07-01

    The dystonias are a clinically heterogeneous group with a complex genetic background. To gain more insight into genetic risk factors in dystonia we used a pathway sequence approach in patients with an extreme dystonia phenotype (n = 26). We assessed all coding and non-coding variants in candidate genes in the D1-like subclass of dopamine receptor genes (DRD1, DRD5) and the synaptic vesicle pathway linked to torsinA (TOR1A, STON2, SNAPIN, KLC1 and THAP1), spanning 96 kb. Two rare missense variants in DRD1 were found: c.68G>A (p.Arg23His) in the screening group and c.776C>A (p.Ser259Tyr) in an additional screen of 15 selected dystonia patients. Genetic burden analysis of DRD1 rare variants in patients (4.8%) versus European American controls from ESP (0.72%) reveals an OR of 5.35 (95% CI 1.29-23.1). No rare missense SNVs in the synaptic vesicle pathway were found. Sequencing of TOR1A showed variant enrichment in haplotype 2, possibly accountable for contradictory results in previous association studies. Two new rare SNVs were detected in THAP1, including a nonsense mutation (p.Gln167Ter) and a splice site variant (c.72-1G>A). Screening for rare SNVs of candidate pathways in a phenotype-extreme population appears to be a promising alternative method to identify genetic risk factors in complex disorders like primary torsion dystonia. These findings indicate a role for rare genetic variation in dopamine processing genes in dystonia pathophysiology. PMID:24768614

  1. Development and Validation of a Scalable Next-Generation Sequencing System for Assessing Relevant Somatic Variants in Solid Tumors

    PubMed Central

    Hovelson, Daniel H.; McDaniel, Andrew S.; Cani, Andi K.; Johnson, Bryan; Rhodes, Kate; Williams, Paul D.; Bandla, Santhoshi; Bien, Geoffrey; Choppa, Paul; Hyland, Fiona; Gottimukkala, Rajesh; Liu, Guoying; Manivannan, Manimozhi; Schageman, Jeoffrey; Ballesteros-Villagrana, Efren; Grasso, Catherine S.; Quist, Michael J.; Yadati, Venkata; Amin, Anmol; Siddiqui, Javed; Betz, Bryan L.; Knudsen, Karen E.; Cooney, Kathleen A.; Feng, Felix Y.; Roh, Michael H.; Nelson, Peter S.; Liu, Chia-Jen; Beer, David G.; Wyngaard, Peter; Chinnaiyan, Arul M.; Sadis, Seth; Rhodes, Daniel R.; Tomlins, Scott A.

    2015-01-01

    Next-generation sequencing (NGS) has enabled genome-wide personalized oncology efforts at centers and companies with the specialty expertise and infrastructure required to identify and prioritize actionable variants. Such approaches are not scalable, preventing widespread adoption. Likewise, most targeted NGS approaches fail to assess key relevant genomic alteration classes. To address these challenges, we predefined the catalog of relevant solid tumor somatic genome variants (gain-of-function or loss-of-function mutations, high-level copy number alterations, and gene fusions) through comprehensive bioinformatics analysis of >700,000 samples. To detect these variants, we developed the Oncomine Comprehensive Panel (OCP), an integrative NGS-based assay [compatible with < 20 ng of DNA/RNA from formalin-fixed paraffin-embedded (FFPE) tissues], coupled with an informatics pipeline to specifically identify relevant predefined variants and created a knowledge base of related potential treatments, current practice guidelines, and open clinical trials. We validated OCP using molecular standards and more than 300 FFPE tumor samples, achieving >95% accuracy for KRAS, epidermal growth factor receptor, and BRAF mutation detection as well as for ALK and TMPRSS2:ERG gene fusions. Associating positive variants with potential targeted treatments demonstrated that 6% to 42% of profiled samples (depending on cancer type) harbored alterations beyond routine molecular testing that were associated with approved or guideline-referenced therapies. As a translational research tool, OCP identified adaptive CTNNB1 amplifications/mutations in treated prostate cancers. Through predefining somatic variants in solid tumors and compiling associated potential treatment strategies, OCP represents a simplified, broadly applicable targeted NGS system with the potential to advance precision oncology efforts. PMID:25925381

  2. Amino acid sequence repertoire of the bacterial proteome and the occurrence of untranslatable sequences.

    PubMed

    Navon, Sharon Penias; Kornberg, Guy; Chen, Jin; Schwartzman, Tali; Tsai, Albert; Puglisi, Elisabetta Viani; Puglisi, Joseph D; Adir, Noam

    2016-06-28

    Bioinformatic analysis of Escherichia coli proteomes revealed that all possible amino acid triplet sequences occur at their expected frequencies, with four exceptions. Two of the four underrepresented sequences (URSs) were shown to interfere with translation in vivo and in vitro. Enlarging the URS by a single amino acid resulted in increased translational inhibition. Single-molecule methods revealed stalling of translation at the entrance of the peptide exit tunnel of the ribosome, adjacent to ribosomal nucleotides A2062 and U2585. Interaction with these same ribosomal residues is involved in regulation of translation by longer, naturally occurring protein sequences. The E. coli exit tunnel has evidently evolved to minimize interaction with the exit tunnel and maximize the sequence diversity of the proteome, although allowing some interactions for regulatory purposes. Bioinformatic analysis of the human proteome revealed no underrepresented triplet sequences, possibly reflecting an absence of regulation by interaction with the exit tunnel. PMID:27307442
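
    The observed-versus-expected comparison of amino acid triplets described above can be sketched as follows. This is an illustration only: the abstract does not state how expected frequencies were computed, so the sketch assumes the expected count of a triplet is the product of its three single-residue frequencies times the number of triplet windows. The function name `triplet_obs_exp` is hypothetical.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def triplet_obs_exp(proteins):
    """Return {triplet: (observed_count, expected_count)} for all 8000 triplets.

    Assumption (not from the source): the expected count is the number of
    triplet windows times the product of the single-residue frequencies.
    """
    residue_counts = Counter()
    triplet_counts = Counter()
    n_windows = 0
    for seq in proteins:
        residue_counts.update(seq)
        # Slide a window of width 3 along each protein sequence.
        for i in range(len(seq) - 2):
            triplet_counts[seq[i:i + 3]] += 1
            n_windows += 1
    total = sum(residue_counts.values())
    freq = {aa: residue_counts[aa] / total for aa in AMINO_ACIDS}
    result = {}
    for a, b, c in product(AMINO_ACIDS, repeat=3):
        expected = n_windows * freq[a] * freq[b] * freq[c]
        result[a + b + c] = (triplet_counts[a + b + c], expected)
    return result
```

    Triplets whose observed count falls far below the expected count would be candidates for underrepresented sequences; a real analysis would also need a statistical test and a full proteome as input.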

  3. Variant fatty acid-like molecules Conjugation, novel approaches for extending the stability of therapeutic peptides

    PubMed Central

    Li, Ying; Wang, Yuli; Wei, Qunchao; Zheng, Xuemin; Tang, Lida; Kong, Dexin; Gong, Min

    2015-01-01

    The multiple physiological properties of glucagon-like peptide-1 (GLP-1) make it a promising drug candidate for the treatment of type 2 diabetes. However, the in vivo half-life of GLP-1 is short due to rapid degradation by dipeptidyl peptidase-IV (DPP-IV) and renal clearance. The poor stability of GLP-1 has significantly limited its clinical utility; however, many studies are focused on extending its stability. Fatty acid conjugation is a traditional approach for extending the stability of therapeutic peptides because of the high binding affinity of human serum albumin for fatty acids. However, the conjugate requires a complex synthetic approach, usually involving Lys and occasionally involving a linker. In the current study, we conjugated the GLP-1 molecule with fatty acid derivatives to simplify the synthesis steps. Human serum albumin binding assays indicated that the retained carboxyl groups of the fatty acids helped maintain a tight affinity to HSA. The conjugation of fatty acid-like molecules improved the stability and increased the binding affinity of GLP-1 to HSA. The use of fatty acid-like molecules as conjugating components allowed variant conjugation positions and freed carboxyl groups for other potential uses. This may be a novel, long-acting strategy for the development of therapeutic peptides. PMID:26658631

  4. Variant fatty acid-like molecules Conjugation, novel approaches for extending the stability of therapeutic peptides.

    PubMed

    Li, Ying; Wang, Yuli; Wei, Qunchao; Zheng, Xuemin; Tang, Lida; Kong, Dexin; Gong, Min

    2015-01-01

    The multiple physiological properties of glucagon-like peptide-1 (GLP-1) make it a promising drug candidate for the treatment of type 2 diabetes. However, the in vivo half-life of GLP-1 is short due to rapid degradation by dipeptidyl peptidase-IV (DPP-IV) and renal clearance. The poor stability of GLP-1 has significantly limited its clinical utility; however, many studies are focused on extending its stability. Fatty acid conjugation is a traditional approach for extending the stability of therapeutic peptides because of the high binding affinity of human serum albumin for fatty acids. However, the conjugate requires a complex synthetic approach, usually involving Lys and occasionally involving a linker. In the current study, we conjugated the GLP-1 molecule with fatty acid derivatives to simplify the synthesis steps. Human serum albumin binding assays indicated that the retained carboxyl groups of the fatty acids helped maintain a tight affinity to HSA. The conjugation of fatty acid-like molecules improved the stability and increased the binding affinity of GLP-1 to HSA. The use of fatty acid-like molecules as conjugating components allowed variant conjugation positions and freed carboxyl groups for other potential uses. This may be a novel, long-acting strategy for the development of therapeutic peptides. PMID:26658631

  5. Complete Genome Sequences of Eight Human Papillomavirus Type 16 Asian American and European Variant Isolates from Cervical Biopsies and Lesions in Indian Women

    PubMed Central

    Mandal, Paramita; Sen, Shrinka; Bhattacharya, Amrapali; Roy Chowdhury, Rahul; Mondal, Nidhu Ranjan

    2016-01-01

    Human papillomavirus type 16 (HPV16), a member of the Papillomaviridae family, is the primary etiological agent of cervical cancer. Here, we report the complete genome sequences of four HPV16 Asian American variants and four European variants, isolated from cervical biopsies and scrapings in India. PMID:27198009

  6. Complete Genome Sequences of Eight Human Papillomavirus Type 16 Asian American and European Variant Isolates from Cervical Biopsies and Lesions in Indian Women.

    PubMed

    Mandal, Paramita; Bhattacharjee, Bornali; Sen, Shrinka; Bhattacharya, Amrapali; Roy Chowdhury, Rahul; Mondal, Nidhu Ranjan; Sengupta, Sharmila

    2016-01-01

    Human papillomavirus type 16 (HPV16), a member of the Papillomaviridae family, is the primary etiological agent of cervical cancer. Here, we report the complete genome sequences of four HPV16 Asian American variants and four European variants, isolated from cervical biopsies and scrapings in India. PMID:27198009

  7. Characterization of Genomic Variants Associated with Scout and Recruit Behavioral Castes in Honey Bees Using Whole-Genome Sequencing

    PubMed Central

    Southey, Bruce R.; Zhu, Ping; Carr-Markell, Morgan K.; Liang, Zhengzheng S.; Zayed, Amro; Li, Ruiqiang; Robinson, Gene E.; Rodriguez-Zas, Sandra L.

    2016-01-01

    Among forager honey bees, scouts seek new resources and return to the colony, enlisting recruits to collect these resources. Differentially expressed genes between these behaviors and genetic variability in scouting phenotypes have been reported. Whole-genome sequencing of 44 Apis mellifera scouts and recruits was undertaken to detect variants and further understand the genetic architecture underlying the behavioral differences between scouts and recruits. The median coverage depth in recruits and scouts was 10.01 and 10.7 X, respectively. Representation of bacterial species among the unmapped reads reflected a more diverse microbiome in scouts than recruits. Overall, 1,412,705 polymorphic positions were analyzed for associations with scouting behavior, and 212 significant (p-value < 0.0001) associations with scouting corresponding to 137 positions were detected. Most frequent putative transcription factor binding sites proximal to significant variants included Broad-complex 4, Broad-complex 1, Hunchback, and CF2-II. Three variants associated with scouting were located within coding regions of ncRNAs including one codon change (LOC102653644) and 2 frameshift indels (LOC102654879 and LOC102655256). Significant variants were also identified on the 5’UTR of membrin, and 3’UTRs of laccase 2 and diacylglycerol kinase theta. The 60 significant variants located within introns corresponded to 39 genes and most of these positions were > 1000 bp apart from each other. A number of these variants were mapped to ncRNA LOC100578102, solute carrier family 12 member 6-like gene, and LOC100576965 (meprin and TRAF-C homology domain containing gene). Functional categories represented among the genes corresponding to significant variants included: neuronal function, exoskeleton, immune response, salivary gland development, and enzymatic food processing. These categories offer a glimpse into the molecular support to the behaviors of scouts and recruits. The level of association

  8. On Efficient and Accurate Calculation of Significance P-Values for Sequence Kernel Association Testing of Variant Set.

    PubMed

    Wu, Baolin; Guan, Weihua; Pankow, James S

    2016-03-01

    The objective of this paper is to discuss and develop alternative computational methods to accurately and efficiently calculate significance P-values for the commonly used sequence kernel association test (SKAT) and adaptive sum of SKAT and burden test (SKAT-O) for variant set association. We show that the existing software can lead to either conservative or inflated type I errors. We develop alternative and efficient computational algorithms that quickly compute the SKAT P-value and have well-controlled type I errors. In addition, we derive an alternative and simplified formula for calculating the significance P-value of SKAT-O, which sheds light on the development of efficient and accurate numerical algorithms. We implement the proposed methods in the publicly available R package that can be readily used or adapted to large-scale sequencing studies. Given that more and more large-scale exome and whole genome sequencing or re-sequencing studies are being conducted, the proposed methods are practically very important. We conduct extensive numerical studies to investigate the performance of the proposed methods. We further illustrate their usefulness with application to associations between rare exonic variants and fasting glucose levels in the Atherosclerosis Risk in Communities (ARIC) study. PMID:26757198

  9. Red-Shifted Aequorin Variants Incorporating Non-Canonical Amino Acids: Applications in In Vivo Imaging

    PubMed Central

    Grinstead, Kristen M.; Rowe, Laura; Ensor, Charles M.; Joel, Smita; Daftarian, Pirouz; Dikici, Emre; Zingg, Jean-Marc; Daunert, Sylvia

    2016-01-01

    The increased importance of in vivo diagnostics has posed new demands for imaging technologies. In that regard, there is a need for imaging molecules capable of expanding the applications of current state-of-the-art imaging in vivo diagnostics. To that end, there is a desire for new reporter molecules that provide strong signals, are non-toxic, and can be tailored to diagnose or monitor the progression of a number of diseases. Aequorin is a non-toxic photoprotein that can be used as a sensitive marker for bioluminescence in vivo imaging. The sensitivity of aequorin is due to the fact that bioluminescence is a rare phenomenon in nature and, therefore, it does not suffer from autofluorescence, which contributes to background emission. Emission of bioluminescence in the blue region of the spectrum by aequorin only occurs when calcium, and its luciferin coelenterazine, are bound to the protein and trigger a biochemical reaction that results in light generation. It is this reaction that endows aequorin with unique characteristics, making it ideally suited for a number of applications in bioanalysis and imaging. Herein we report the site-specific incorporation of non-canonical or non-natural amino acids and several coelenterazine analogues, resulting in a catalog of 72 cysteine-free aequorin variants which expand the potential applications of these photoproteins by providing several red-shifted mutants better suited to use in vivo. In vivo studies in mouse models using the transparent tissue of the eye confirmed the activity of the aequorin variants incorporating L-4-iodophenylalanine and L-4-methoxyphenylalanine after injection into the eye and topical addition of coelenterazine. The signal also remained localized within the eye. This is the first time that aequorin variants incorporating non-canonical amino acids have been shown to be active in vivo and useful as reporters in bioluminescence imaging. PMID:27367859

  10. Exome-wide association analysis reveals novel coding sequence variants associated with lipid traits in Chinese.

    PubMed

    Tang, Clara S; Zhang, He; Cheung, Chloe Y Y; Xu, Ming; Ho, Jenny C Y; Zhou, Wei; Cherny, Stacey S; Zhang, Yan; Holmen, Oddgeir; Au, Ka-Wing; Yu, Haiyi; Xu, Lin; Jia, Jia; Porsch, Robert M; Sun, Lijie; Xu, Weixian; Zheng, Huiping; Wong, Lai-Yung; Mu, Yiming; Dou, Jingtao; Fong, Carol H Y; Wang, Shuyu; Hong, Xueyu; Dong, Liguang; Liao, Yanhua; Wang, Jiansong; Lam, Levina S M; Su, Xi; Yan, Hua; Yang, Min-Lee; Chen, Jin; Siu, Chung-Wah; Xie, Gaoqiang; Woo, Yu-Cho; Wu, Yangfeng; Tan, Kathryn C B; Hveem, Kristian; Cheung, Bernard M Y; Zöllner, Sebastian; Xu, Aimin; Eugene Chen, Y; Jiang, Chao Qiang; Zhang, Youyi; Lam, Tai-Hing; Ganesh, Santhi K; Huo, Yong; Sham, Pak C; Lam, Karen S L; Willer, Cristen J; Tse, Hung-Fat; Gao, Wei

    2015-01-01

    Blood lipids are important risk factors for coronary artery disease (CAD). Here we perform an exome-wide association study by genotyping 12,685 Chinese, using a custom Illumina HumanExome BeadChip, to identify additional loci influencing lipid levels. Single-variant association analysis on 65,671 single nucleotide polymorphisms reveals 19 loci associated with lipids at exome-wide significance (P<2.69 × 10^−7), including three Asian-specific coding variants in known genes (CETP p.Asp459Gly, PCSK9 p.Arg93Cys and LDLR p.Arg257Trp). Furthermore, missense variants at two novel loci, PNPLA3 p.Ile148Met and PKD1L3 p.Thr429Ser, also influence levels of triglycerides and low-density lipoprotein cholesterol, respectively. Another novel gene, TEAD2, is found to be associated with high-density lipoprotein cholesterol through gene-based association analysis. Most of these newly identified coding variants show suggestive association (P<0.05) with CAD. These findings demonstrate that exome-wide genotyping on samples of non-European ancestry can identify additional population-specific possible causal variants, shedding light on novel lipid biology and CAD. PMID:26690388

  11. Exome-wide association analysis reveals novel coding sequence variants associated with lipid traits in Chinese

    PubMed Central

    Tang, Clara S.; Zhang, He; Cheung, Chloe Y. Y.; Xu, Ming; Ho, Jenny C. Y.; Zhou, Wei; Cherny, Stacey S.; Zhang, Yan; Holmen, Oddgeir; Au, Ka-Wing; Yu, Haiyi; Xu, Lin; Jia, Jia; Porsch, Robert M.; Sun, Lijie; Xu, Weixian; Zheng, Huiping; Wong, Lai-Yung; Mu, Yiming; Dou, Jingtao; Fong, Carol H. Y.; Wang, Shuyu; Hong, Xueyu; Dong, Liguang; Liao, Yanhua; Wang, Jiansong; Lam, Levina S. M.; Su, Xi; Yan, Hua; Yang, Min-Lee; Chen, Jin; Siu, Chung-Wah; Xie, Gaoqiang; Woo, Yu-Cho; Wu, Yangfeng; Tan, Kathryn C. B.; Hveem, Kristian; Cheung, Bernard M. Y.; Zöllner, Sebastian; Xu, Aimin; Eugene Chen, Y; Jiang, Chao Qiang; Zhang, Youyi; Lam, Tai-Hing; Ganesh, Santhi K.; Huo, Yong; Sham, Pak C.; Lam, Karen S. L.; Willer, Cristen J.; Tse, Hung-Fat; Gao, Wei

    2015-01-01

    Blood lipids are important risk factors for coronary artery disease (CAD). Here we perform an exome-wide association study by genotyping 12,685 Chinese, using a custom Illumina HumanExome BeadChip, to identify additional loci influencing lipid levels. Single-variant association analysis on 65,671 single nucleotide polymorphisms reveals 19 loci associated with lipids at exome-wide significance (P<2.69 × 10^−7), including three Asian-specific coding variants in known genes (CETP p.Asp459Gly, PCSK9 p.Arg93Cys and LDLR p.Arg257Trp). Furthermore, missense variants at two novel loci, PNPLA3 p.Ile148Met and PKD1L3 p.Thr429Ser, also influence levels of triglycerides and low-density lipoprotein cholesterol, respectively. Another novel gene, TEAD2, is found to be associated with high-density lipoprotein cholesterol through gene-based association analysis. Most of these newly identified coding variants show suggestive association (P<0.05) with CAD. These findings demonstrate that exome-wide genotyping on samples of non-European ancestry can identify additional population-specific possible causal variants, shedding light on novel lipid biology and CAD. PMID:26690388

  12. On Quantum Algorithm for Multiple Alignment of Amino Acid Sequences

    NASA Astrophysics Data System (ADS)

    Iriyama, Satoshi; Ohya, Masanori

    2009-02-01

    The alignment of genome sequences or amino acid sequences is one of the fundamental operations for the study of life. The usual computational complexity of the multiple alignment of N sequences with common length L by dynamic programming is O(L^N). This alignment is considered one of the NP problems, so it is desirable to find an efficient algorithm for multiple alignment. In this paper we therefore propose a quantum algorithm for multiple alignment based on the works [12,1,2], in which an NP-complete problem was shown to be a P problem by means of a quantum algorithm and chaos information dynamics.
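
    To make the dynamic-programming cost concrete, the sketch below scores a global alignment of two sequences (the N = 2 case) with the classical Needleman-Wunsch recurrence. It is not the quantum algorithm of the record above; the function name and the scoring values (match +1, mismatch -1, gap -1) are illustrative assumptions. The table has (L+1)^2 cells for two sequences of length L, and each additional sequence adds one dimension, which is where the L^N growth comes from.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score for two sequences.

    Scoring parameters are illustrative defaults, not taken from the source.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] = best score aligning a[:i] with b[:j]
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap          # a[:i] aligned against gaps only
    for j in range(1, cols):
        score[0][j] = j * gap          # b[:j] aligned against gaps only
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap     # gap inserted in b
            left = score[i][j - 1] + gap   # gap inserted in a
            score[i][j] = max(diag, up, left)
    return score[-1][-1]
```

    Filling every cell once gives O(L^2) time for two sequences; the same recurrence over an N-dimensional table is what makes exact multiple alignment expensive.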

  13. Ancient human sialic acid variant restricts an emerging zoonotic malaria parasite

    PubMed Central

    Dankwa, Selasi; Lim, Caeul; Bei, Amy K.; Jiang, Rays H. Y.; Abshire, James R.; Patel, Saurabh D.; Goldberg, Jonathan M.; Moreno, Yovany; Kono, Maya; Niles, Jacquin C.; Duraisingh, Manoj T.

    2016-01-01

    Plasmodium knowlesi is a zoonotic parasite transmitted from macaques causing malaria in humans in Southeast Asia. Plasmodium parasites bind to red blood cell (RBC) surface receptors, many of which are sialylated. While macaques synthesize the sialic acid variant N-glycolylneuraminic acid (Neu5Gc), humans cannot because of a mutation in the enzyme CMAH that converts N-acetylneuraminic acid (Neu5Ac) to Neu5Gc. Here we reconstitute CMAH in human RBCs for the reintroduction of Neu5Gc, which results in enhancement of P. knowlesi invasion. We show that two P. knowlesi invasion ligands, PkDBPβ and PkDBPγ, bind specifically to Neu5Gc-containing receptors. A human-adapted P. knowlesi line invades human RBCs independently of Neu5Gc, with duplication of the sialic acid-independent invasion ligand, PkDBPα and loss of PkDBPγ. Our results suggest that absence of Neu5Gc on human RBCs limits P. knowlesi invasion, but that parasites may evolve to invade human RBCs through the use of sialic acid-independent pathways. PMID:27041489

  14. Ancient human sialic acid variant restricts an emerging zoonotic malaria parasite.

    PubMed

    Dankwa, Selasi; Lim, Caeul; Bei, Amy K; Jiang, Rays H Y; Abshire, James R; Patel, Saurabh D; Goldberg, Jonathan M; Moreno, Yovany; Kono, Maya; Niles, Jacquin C; Duraisingh, Manoj T

    2016-01-01

    Plasmodium knowlesi is a zoonotic parasite transmitted from macaques causing malaria in humans in Southeast Asia. Plasmodium parasites bind to red blood cell (RBC) surface receptors, many of which are sialylated. While macaques synthesize the sialic acid variant N-glycolylneuraminic acid (Neu5Gc), humans cannot because of a mutation in the enzyme CMAH that converts N-acetylneuraminic acid (Neu5Ac) to Neu5Gc. Here we reconstitute CMAH in human RBCs for the reintroduction of Neu5Gc, which results in enhancement of P. knowlesi invasion. We show that two P. knowlesi invasion ligands, PkDBPβ and PkDBPγ, bind specifically to Neu5Gc-containing receptors. A human-adapted P. knowlesi line invades human RBCs independently of Neu5Gc, with duplication of the sialic acid-independent invasion ligand, PkDBPα and loss of PkDBPγ. Our results suggest that absence of Neu5Gc on human RBCs limits P. knowlesi invasion, but that parasites may evolve to invade human RBCs through the use of sialic acid-independent pathways. PMID:27041489

  15. Candidate genes for congenital diaphragmatic hernia from animal models: sequencing of fog2 and pdgfra reveals rare variants in diaphragmatic hernia patients

    SciTech Connect

    Bleyl, S.B.; Moshrefi, A.; Shaw, G.M.; Saijoh, Y.; Schoenwolf, G.C.; Pennacchio, L.A.; Slavotinek, A.M.

    2007-05-11

    Congenital diaphragmatic hernia (CDH) is a common, life-threatening birth defect. Although there is strong evidence implicating genetic factors in its pathogenesis, few causative genes have been identified, and in isolated CDH, only one de novo, nonsense mutation has been reported in FOG2 in a female with posterior diaphragmatic eventration. We report here that the homozygous null mouse for the Pdgfra gene has posterolateral diaphragmatic defects and thus is a model for human CDH. We hypothesized that mutations in this gene could cause human CDH. We sequenced PDGFRa and FOG2 in 96 patients with CDH, of which 53 had isolated CDH (55.2 percent), 36 had CDH and additional anomalies (37.5 percent), and 7 had CDH and known chromosome aberrations (7.3 percent). For FOG2, we identified novel sequence alterations predicting p.M703L and p.T843A in two patients with isolated CDH that were absent in 526 and 564 control chromosomes respectively. These altered amino acids were highly conserved. However, due to the lack of available parental DNA samples we were not able to determine if the sequence alterations were de novo. For PDGFRa, we found a single variant predicting p.L967V in a patient with CDH and multiple anomalies that was absent in 768 control chromosomes. This patient also had one cell with trisomy 15 on skin fibroblast culture, a finding of uncertain significance. Although our study identified sequence variants in FOG2 and PDGFRa, we have not definitively established the variants as mutations and we found no evidence that CDH commonly results from mutations in these genes.

  16. Prebiotically plausible mechanisms increase compositional diversity of nucleic acid sequences

    PubMed Central

    Derr, Julien; Manapat, Michael L.; Rajamani, Sudha; Leu, Kevin; Xulvi-Brunet, Ramon; Joseph, Isaac; Nowak, Martin A.; Chen, Irene A.

    2012-01-01

    During the origin of life, the biological information of nucleic acid polymers must have increased to encode functional molecules (the RNA world). Ribozymes tend to be compositionally unbiased, as is the vast majority of possible sequence space. However, ribonucleotides vary greatly in synthetic yield, reactivity and degradation rate, and their non-enzymatic polymerization results in compositionally biased sequences. While natural selection could lead to complex sequences, molecules with some activity are required to begin this process. Was the emergence of compositionally diverse sequences a matter of chance, or could prebiotically plausible reactions counter chemical biases to increase the probability of finding a ribozyme? Our in silico simulations using a two-letter alphabet show that template-directed ligation and high concatenation rates counter compositional bias and shift the pool toward longer sequences, permitting greater exploration of sequence space and stable folding. We verified experimentally that unbiased DNA sequences are more efficient templates for ligation, thus increasing the compositional diversity of the pool. Our work suggests that prebiotically plausible chemical mechanisms of nucleic acid polymerization and ligation could predispose toward a diverse pool of longer, potentially structured molecules. Such mechanisms could have set the stage for the appearance of functional activity very early in the emergence of life. PMID:22319215

  17. The amino-acid sequence of kangaroo pancreatic ribonuclease.

    PubMed

    Gaastra, W; Welling, G W; Beintema, J J

    1978-05-01

    Red kangaroo (Macropus rufus) ribonuclease was isolated from pancreatic tissue by affinity chromatography. The amino acid sequence was determined by automatic sequencing of overlapping large fragments and by analysis of shorter peptides obtained by digestion with a number of proteolytic enzymes. The polypeptide chain consists of 122 amino acid residues. Compared to other ribonucleases, the N-terminal residue and residue 114 are deleted. In other pancreatic ribonucleases position 114 is occupied by a cis proline residue in an external loop at the surface of the molecule. Other remarkable substitutions are the presence of a tyrosine residue at position 123 instead of a serine which forms a hydrogen bond with the pyrimidine ring of a nucleotide substrate, and a number of hydrophobic-hydrophilic interchanges in the sequence 51-55, which forms part of an alpha-helix in bovine ribonuclease and exhibits few substitutions in the placental mammals. Kangaroo ribonuclease contains no carbohydrate, although the enzyme possesses a recognition site for carbohydrate attachment in the sequence Asn-Val-Thr (62-64). The enzyme differs at about 35-40% of the positions from all other mammalian pancreatic ribonucleases sequenced to date, which is in agreement with the early divergence between the marsupials and the placental mammals. From fragmentary data a tentative sequence of red-necked wallaby (Macropus rufogriseus) pancreatic ribonuclease has been derived. Eight differences with the kangaroo sequence were found. PMID:658039

  18. Longitudinal Antigenic Sequences and Sites from Intra-Host Evolution (LASSIE) identifies immune-selected HIV variants

    SciTech Connect

    Hraber, Peter; Korber, Bette; Wagh, Kshitij; Giorgi, Elena; Bhattacharya, Tanmoy; Gnanakaran, S.; Lapedes, Alan S.; Learn, Gerald H.; Kreider, Edward F.; Li, Yingying; Shaw, George M.; Hahn, Beatrice H.; Montefiori, David C.; Alam, S. Munir; Bonsignori, Mattia; Moody, M. Anthony; Liao, Hua-Xin; Gao, Feng; Haynes, Barton

    2015-10-21

    Within-host genetic sequencing from samples collected over time provides a dynamic view of how viruses evade host immunity. Immune-driven mutations might stimulate neutralization breadth by selecting antibodies adapted to cycles of immune escape that generate within-subject epitope diversity. Comprehensive identification of immune-escape mutations is experimentally and computationally challenging. With current technology, many more viral sequences can readily be obtained than can be tested for binding and neutralization, making down-selection necessary. Typically, this is done manually, by picking variants that represent different time-points and branches on a phylogenetic tree. Such strategies are likely to miss many relevant mutations and combinations of mutations, and to be redundant for other mutations. Longitudinal Antigenic Sequences and Sites from Intrahost Evolution (LASSIE) uses transmitted founder loss to identify virus “hot-spots” under putative immune selection and chooses sequences that represent recurrent mutations in selected sites. LASSIE favors earliest sequences in which mutations arise. Here, with well-characterized longitudinal Env sequences, we confirmed selected sites were concentrated in antibody contacts and selected sequences represented diverse antigenic phenotypes. Finally, practical applications include rapidly identifying immune targets under selective pressure within a subject, selecting minimal sets of reagents for immunological assays that characterize evolving antibody responses, and for immunogens in polyvalent “cocktail” vaccines.

  19. Longitudinal Antigenic Sequences and Sites from Intra-Host Evolution (LASSIE) Identifies Immune-Selected HIV Variants

    PubMed Central

    Hraber, Peter; Korber, Bette; Wagh, Kshitij; Giorgi, Elena E.; Bhattacharya, Tanmoy; Gnanakaran, S.; Lapedes, Alan S.; Learn, Gerald H.; Kreider, Edward F.; Li, Yingying; Shaw, George M.; Hahn, Beatrice H.; Montefiori, David C.; Alam, S. Munir; Bonsignori, Mattia; Moody, M. Anthony; Liao, Hua-Xin; Gao, Feng; Haynes, Barton F.

    2015-01-01

    Within-host genetic sequencing from samples collected over time provides a dynamic view of how viruses evade host immunity. Immune-driven mutations might stimulate neutralization breadth by selecting antibodies adapted to cycles of immune escape that generate within-subject epitope diversity. Comprehensive identification of immune-escape mutations is experimentally and computationally challenging. With current technology, many more viral sequences can readily be obtained than can be tested for binding and neutralization, making down-selection necessary. Typically, this is done manually, by picking variants that represent different time-points and branches on a phylogenetic tree. Such strategies are likely to miss many relevant mutations and combinations of mutations, and to be redundant for other mutations. Longitudinal Antigenic Sequences and Sites from Intrahost Evolution (LASSIE) uses transmitted founder loss to identify virus “hot-spots” under putative immune selection and chooses sequences that represent recurrent mutations in selected sites. LASSIE favors earliest sequences in which mutations arise. With well-characterized longitudinal Env sequences, we confirmed selected sites were concentrated in antibody contacts and selected sequences represented diverse antigenic phenotypes. Practical applications include rapidly identifying immune targets under selective pressure within a subject, selecting minimal sets of reagents for immunological assays that characterize evolving antibody responses, and for immunogens in polyvalent “cocktail” vaccines. PMID:26506369

  20. Longitudinal Antigenic Sequences and Sites from Intra-Host Evolution (LASSIE) identifies immune-selected HIV variants

    DOE PAGESBeta

    Hraber, Peter; Korber, Bette; Wagh, Kshitij; Giorgi, Elena; Bhattacharya, Tanmoy; Gnanakaran, S.; Lapedes, Alan S.; Learn, Gerald H.; Kreider, Edward F.; Li, Yingying; et al

    2015-10-21

    Within-host genetic sequencing from samples collected over time provides a dynamic view of how viruses evade host immunity. Immune-driven mutations might stimulate neutralization breadth by selecting antibodies adapted to cycles of immune escape that generate within-subject epitope diversity. Comprehensive identification of immune-escape mutations is experimentally and computationally challenging. With current technology, many more viral sequences can readily be obtained than can be tested for binding and neutralization, making down-selection necessary. Typically, this is done manually, by picking variants that represent different time-points and branches on a phylogenetic tree. Such strategies are likely to miss many relevant mutations and combinations of mutations, and to be redundant for other mutations. Longitudinal Antigenic Sequences and Sites from Intrahost Evolution (LASSIE) uses transmitted founder loss to identify virus “hot-spots” under putative immune selection and chooses sequences that represent recurrent mutations in selected sites. LASSIE favors earliest sequences in which mutations arise. Here, with well-characterized longitudinal Env sequences, we confirmed selected sites were concentrated in antibody contacts and selected sequences represented diverse antigenic phenotypes. Finally, practical applications include rapidly identifying immune targets under selective pressure within a subject, selecting minimal sets of reagents for immunological assays that characterize evolving antibody responses, and for immunogens in polyvalent “cocktail” vaccines.

  1. A Novel Adaptive Method for the Analysis of Next-Generation Sequencing Data to Detect Complex Trait Associations with Rare Variants Due to Gene Main Effects and Interactions

    PubMed Central

    Liu, Dajiang J.; Leal, Suzanne M.

    2010-01-01

    There is solid evidence that rare variants contribute to complex disease etiology. Next-generation sequencing technologies make it possible to uncover rare variants within candidate genes, exomes, and genomes. Working in a novel framework, the kernel-based adaptive cluster (KBAC) was developed to perform powerful gene/locus based rare variant association testing. The KBAC combines variant classification and association testing in a coherent framework. Covariates can also be incorporated in the analysis to control for potential confounders including age, sex, and population substructure. To evaluate the power of KBAC: 1) variant data was simulated using rigorous population genetic models for both Europeans and Africans, with parameters estimated from sequence data, and 2) phenotypes were generated using models motivated by complex diseases including breast cancer and Hirschsprung's disease. It is demonstrated that the KBAC has superior power compared to other rare variant analysis methods, such as the combined multivariate and collapsing and weight sum statistic. In the presence of variant misclassification and gene interaction, association testing using KBAC is particularly advantageous. The KBAC method was also applied to test for associations, using sequence data from the Dallas Heart Study, between energy metabolism traits and rare variants in ANGPTL 3,4,5 and 6 genes. A number of novel associations were identified, including the associations of high density lipoprotein and very low density lipoprotein with ANGPTL4. The KBAC method is implemented in a user-friendly R package. PMID:20976247

  2. Complete genome sequence of Campylobacter jejuni RM1285, a rod-shaped morphological variant

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Campylobacter jejuni is a spiral-shaped Gram-negative food-borne human pathogen found on poultry products. Strain RM1285 is a rod-shaped variant of this species. The genome of RM1285 was determined to be 1,635,803 bp with a G+C content of 30.5%....
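
The G+C content quoted in the record above is the fraction of bases in a sequence that are G or C. A minimal sketch of that calculation follows; the function name `gc_content` and the toy sequence are illustrative assumptions, not part of the record, and the RM1285 genome itself is not reproduced here.

```python
def gc_content(sequence: str) -> float:
    """Return the G+C fraction of a DNA sequence.

    Case is ignored, and symbols other than A, C, G, T (for example N)
    are excluded from the denominator.
    """
    seq = sequence.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return (seq.count("G") + seq.count("C")) / counted


# Toy sequence (hypothetical, not taken from RM1285): 3 of its 10 bases are G or C.
toy = "ATGATTACAG"
print(f"{gc_content(toy):.1%}")  # prints 30.0%
```

For a whole genome the same function would be applied to the concatenated sequence read from a FASTA file.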

  3. Ethnic-specific associations of rare and low-frequency DNA sequence variants with asthma

    PubMed Central

    Igartua, Catherine; Myers, Rachel A.; Mathias, Rasika A.; Pino-Yanes, Maria; Eng, Celeste; Graves, Penelope E.; Levin, Albert M.; Del-Rio-Navarro, Blanca E.; Jackson, Daniel J.; Livne, Oren E.; Rafaels, Nicholas; Edlund, Christopher K.; Yang, James J.; Huntsman, Scott; Salam, Muhammad T.; Romieu, Isabelle; Mourad, Raphael; Gern, James E.; Lemanske, Robert F.; Wyss, Annah; Hoppin, Jane A.; Barnes, Kathleen C.; Burchard, Esteban G.; Gauderman, W. James; Martinez, Fernando D.; Raby, Benjamin A.; Weiss, Scott T.; Williams, L. Keoki; London, Stephanie J.; Gilliland, Frank D.; Nicolae, Dan L.; Ober, Carole

    2015-01-01

    Common variants at many loci have been robustly associated with asthma but explain little of the overall genetic risk. Here we investigate the role of rare (<1%) and low-frequency (1–5%) variants using the Illumina HumanExome BeadChip array in 4,794 asthma cases, 4,707 non-asthmatic controls and 590 case–parent trios representing European Americans, African Americans/African Caribbeans and Latinos. Our study reveals one low-frequency missense mutation in the GRASP gene that is associated with asthma in the Latino sample (P=4.31 × 10^-6; OR=1.25; MAF=1.21%) and two genes harbouring functional variants that are associated with asthma in a gene-based analysis: GSDMB at the 17q12–21 asthma locus in the Latino and combined samples (P=7.81 × 10^-8 and 4.09 × 10^-8, respectively) and MTHFR in the African ancestry sample (P=1.72 × 10^-6). Our results suggest that associations with rare and low-frequency variants are ethnic specific and not likely to explain a significant proportion of the ‘missing heritability’ of asthma. PMID:25591454
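
The frequency bins used in the record above (rare below 1%, low-frequency from 1% to 5%) can be written as a small classifier. This is a sketch only: the function name `classify_maf` is hypothetical, and treating exactly 5% as low-frequency and anything above it as "common" is an assumption about the boundaries that the abstract does not spell out.

```python
def classify_maf(maf: float) -> str:
    """Bin a minor allele frequency given as a fraction between 0 and 0.5."""
    if not 0.0 <= maf <= 0.5:
        raise ValueError("minor allele frequency must lie between 0 and 0.5")
    if maf < 0.01:
        return "rare"
    if maf <= 0.05:  # assumption: the upper bound of the 1-5% bin is inclusive
        return "low-frequency"
    return "common"


# The GRASP missense variant is reported with MAF = 1.21%.
print(classify_maf(0.0121))  # prints low-frequency
```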

  4. Different Variants in Reverse Transcriptase Domain Determined by Ultra-deep Sequencing in Treatment-naïve and Treated Indonesian Patients Infected with Hepatitis B Virus.

    PubMed

    Wasityastuti, Widya; Yano, Yoshihiko; Widasari, Dewiyani Indah; Yamani, Laura Navika; Ratnasari, Neneng; Heriyanto, Didik Setyo; Okada, Rina; Tanahashi, Toshihito; Murakami, Yoshiki; Azuma, Takeshi; Hayashi, Yoshitake

    2016-01-01

    A nucleos(t)ide analog (NA) is the common antiviral drug available for directly treating hepatitis B virus (HBV) infection. However, its application has led to the emergence of NA-resistant mutations mostly in a conserved region of the reverse transcriptase domain of HBV polymerase. Harboring NA-resistant mutations decreases drug effectiveness and increases the frequency of end-stage liver disease. The invention of next-generation sequencing that can generate thousands of sequences from viral complex mixtures provides opportunities to detect minor changes and early viral evolution under drug stress. The present study used ultra-deep sequencing to evaluate discrepant quasispecies in the reverse transcriptase domain of HBV including NA-resistant hotspots between seven treatment-naïve Indonesian patients infected with HBV and five at the early phase of treatment. The most common sub-genotype was HBV B3 (83.34%). The substitution rate of variants determined among amino acids with a ratio of ≥ 1% changes was higher among the population in conserved regions (23.19% vs. 4.59%, P = 0.001) and in the inter-reverse transcriptase domain (23.95% vs. 2.94%, P = 0.002) in treatment naïve, than in treated patients. Nine hotspots of antiviral resistance were identified in both groups, and the mean frequency of changes in all patients was < 1%. The known rtM204I mutation was the most frequent in both groups. The lower rate of variants in HBV quasispecies in patients undergoing treatment could be associated with virus elimination and the extinction of sensitive species by NA therapy. The present findings imply that HBV quasispecies dynamically change during treatment. PMID:27492206

  5. Large-scale analysis of peptide sequence variants: the case for high-field asymmetric waveform ion mobility spectrometry.

    PubMed

    Creese, Andrew J; Smart, Jade; Cooper, Helen J

    2013-05-21

    Large scale analysis of proteins by mass spectrometry is becoming increasingly routine; however, the presence of peptide isomers remains a significant challenge for both identification and quantitation in proteomics. Classes of isomers include sequence inversions, structural isomers, and localization variants. In many cases, liquid chromatography is inadequate for separation of peptide isomers. The resulting tandem mass spectra are composite, containing fragments from multiple precursor ions. The benefits of high-field asymmetric waveform ion mobility spectrometry (FAIMS) for proteomics have been demonstrated by a number of groups, but previous work has focused on extending proteome coverage generally. Here, we present a systematic study of the benefits of FAIMS for a key challenge in proteomics, that of peptide isomers. We have applied FAIMS to the analysis of a phosphopeptide library comprising the sequences GPSGXVpSXAQLX(K/R) and SXPFKXpSPLXFG(K/R), where X = ADEFGLSTVY. The library has defined limits enabling us to make valid conclusions regarding FAIMS performance. The library contains numerous sequence inversions and structural isomers. In addition, there are large numbers of theoretical localization variants, allowing false localization rates to be determined. The FAIMS approach is compared with reversed-phase liquid chromatography and strong cation exchange chromatography. The FAIMS approach identified 35% of the peptide library, whereas LC-MS/MS alone identified 8% and LC-MS/MS with strong cation exchange chromatography prefractionation identified 17.3% of the library. PMID:23646896
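
To see why such a library is rich in isomers, the two templates in the record above can be expanded programmatically. The sketch below assumes that every X position varies independently over the ten listed residues and that both C-terminal residues occur with every combination; the abstract does not state this, so the resulting size is an upper bound under that assumption. The names `expand`, `library` and `groups` are illustrative.

```python
from collections import Counter
from itertools import product

RESIDUES = "ADEFGLSTVY"                          # allowed residues at each X
TEMPLATES = ("GPSGXVpSXAQLX", "SXPFKXpSPLXFG")   # "pS" marks phosphoserine
C_TERMINI = "KR"                                 # the (K/R) position


def expand(template):
    """Yield every peptide obtained by filling the X positions of a template."""
    parts = template.split("X")
    for combo in product(RESIDUES, repeat=len(parts) - 1):
        core = parts[0] + "".join(r + p for r, p in zip(combo, parts[1:]))
        for c_term in C_TERMINI:
            yield core + c_term


library = [peptide for template in TEMPLATES for peptide in expand(template)]

# Peptides with the same residue composition have the same mass, so each
# composition with more than one member is a group of sequence isomers.
groups = Counter("".join(sorted(peptide)) for peptide in library)
isomeric = sum(n for n in groups.values() if n > 1)

print(len(library), "peptides,", len(groups), "distinct compositions,",
      isomeric, "peptides with at least one isomer")
```

Under these assumptions each template gives 10 × 10 × 10 × 2 = 2,000 sequences, so 4,000 in total, and most of them share their composition with at least one other member of the library.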

  6. Mapping whole genome shotgun sequence and variant calling in mammalian species without their reference genomes

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Genomics research in mammals has produced reference genome sequences that are essential for identifying variation associated with disease. High quality reference genome sequences are now available for humans, model species, and economically important agricultural animals. Comparisons between these s...

  7. Amino acid sequence of Salmonella typhimurium branched-chain amino acid aminotransferase.

    PubMed

    Feild, M J; Nguyen, D C; Armstrong, F B

    1989-06-13

    The complete amino acid sequence of the subunit of branched-chain amino acid aminotransferase (transaminase B, EC 2.6.1.42) of Salmonella typhimurium was determined. An Escherichia coli recombinant containing the ilvGEDAY gene cluster of Salmonella was used as the source of the hexameric enzyme. The peptide fragments used for sequencing were generated by treatment with trypsin, Staphylococcus aureus V8 protease, endoproteinase Lys-C, and cyanogen bromide. The enzyme subunit contains 308 residues and has a molecular weight of 33,920. To determine the coenzyme-binding site, the pyridoxal 5-phosphate containing enzyme was treated with tritiated sodium borohydride prior to trypsin digestion. Peptide map comparisons with an apoenzyme tryptic digest and monitoring radioactivity incorporation allowed identification of the pyridoxylated peptide, which was then isolated and sequenced. The coenzyme-binding site is the lysyl residue at position 159. The amino acid sequence of Salmonella transaminase B is 97.4% identical with that of Escherichia coli, differing in only eight amino acid positions. Sequence comparisons of transaminase B to other known aminotransferase sequences revealed limited sequence similarity (24-33%) when conserved amino acid substitutions are allowed and alignments were forced to occur on the coenzyme-binding site. PMID:2669973

  8. NDesign: software for study design for the detection of rare variants from next-generation sequencing data.

    PubMed

    Sugaya, Yuki; Akazawa, Yasuaki; Saito, Akira; Kamitsuji, Shigeo

    2012-10-01

    We developed a software program, NDesign, for the design of a study intended for detecting rare variants from next-generation sequencing (NGS) data. In this study design, the optimal depth of coverage and the average depth of coverage are first evaluated, and then the ability of the designed experiment to obtain a desired power is determined. NDesign has been developed to calculate both these depths, as well as to evaluate the power of the designed experiment. It has a simple implementation in the JavaScript language, and is expected to enable researchers to design optimal NGS studies. PMID:22786579

  9. Sequencing PDX1 (insulin promoter factor 1) in 1788 UK individuals found 5% had a low frequency coding variant, but these variants are not associated with Type 2 diabetes

    PubMed Central

    Edghill, E L; Khamis, A; Weedon, M N; Walker, M; Hitman, G A; McCarthy, M I; Owen, K R; Ellard, S; Hattersley, A T; Frayling, T M

    2011-01-01

    Aim Genome-wide association studies have identified > 30 common variants associated with Type 2 diabetes (> 5% minor allele frequency). These variants have small effects on individual risk and do not account for a large proportion of the heritable component of the disease. Monogenic forms of diabetes are caused by mutations that occur in < 1:2000 individuals and follow strict patterns of inheritance. In contrast, the role of low frequency genetic variants (minor allele frequency 0.1–5%) in Type 2 diabetes is not known. The aim of this study was to assess the role of low frequency PDX1 (also called IPF1) variants in Type 2 diabetes. Methods We sequenced the coding and flanking intronic regions of PDX1 in 910 patients with Type 2 diabetes and 878 control subjects. Results We identified a total of 26 variants that occurred in 5.3% of individuals, 14 of which occurred once. Only D76N occurred in > 1%. We found no difference in carrier frequency between patients (5.7%) and control subjects (5.0%) (P = 0.46). There were also no differences between patients and control subjects when analyses were limited to subsets of variants. The strongest subset were those variants in the DNA binding domain where all five variants identified were only found in patients (P = 0.06). Conclusion Approximately 5% of UK individuals carry a PDX1 variant, but there is no evidence that these variants, either individually or cumulatively, predispose to Type 2 diabetes. Further studies will need to consider strategies to assess the role of multiple variants that occur in < 1 in 1000 individuals. PMID:21569088

  10. Stem pitting and seedling yellows symptoms of Citrus tristeza virus infection may be determined by minor sequence variants.

    PubMed

    Cerni, Silvija; Ruscić, Jelena; Nolasco, Gustavo; Gatin, Zivko; Krajacić, Mladen; Skorić, Dijana

    2008-02-01

    The isolates of Citrus tristeza virus (CTV), the most destructive viral pathogen of citrus, display a high level of variability. As a result of genetic bottleneck induced by the bud-inoculation of CTV-infected material, inoculated seedlings of Citrus wilsonii Tanaka displayed different symptoms. All successfully grafted plants showed severe symptoms of stem pitting and seedling yellows, while plants in which inoculated buds died displayed mild symptoms. Since complex CTV population structure was detected in the parental host, the aim of this work was to investigate how it changed after the virus transmission, and to correlate it with observed symptoms. The coat protein gene sequence of the predominant genotype was identical in parental and grafted plants and clustered to the phylogenetic group 5 encompassing severe reference isolates. In seedlings displaying severe symptoms, the low-frequency variants clustering to other phylogenetic groups were detected, as well. Indicator plants were inoculated with buds taken from unsuccessfully grafted C. wilsonii seedlings. Surprisingly, they displayed no severe symptoms despite the presence of phylogenetic group 5 genomic variants. The results suggest that the appearance of severe symptoms in this case is probably induced by a complex CTV population structure found in seedlings displaying severe symptoms, and not directly by the predominant genomic variant. PMID:18074213

  11. Whole exome sequencing reveals de novo pathogenic variants in KAT6A as a cause of a neurodevelopmental disorder.

    PubMed

    Millan, Francisca; Cho, Megan T; Retterer, Kyle; Monaghan, Kristin G; Bai, Renkui; Vitazka, Patrik; Everman, David B; Smith, Brooke; Angle, Brad; Roberts, Victoria; Immken, LaDonna; Nagakura, Honey; DiFazio, Marc; Sherr, Elliott; Haverfield, Eden; Friedman, Bethany; Telegrafi, Aida; Juusola, Jane; Chung, Wendy K; Bale, Sherri

    2016-07-01

    Neurodevelopmental disorders (NDD) are common, with 1-3% of the general population being affected, but the etiology is unknown in most individuals. Clinical whole-exome sequencing (WES) has proven to be a powerful tool for the identification of pathogenic variants leading to Mendelian disorders, among which NDD represent a significant percentage. Performing WES with a trio approach has proven to be extremely effective in identifying de novo pathogenic variants as a common cause of NDD. Here we report six unrelated individuals with a common phenotype consisting of NDD with severe speech delay, hypotonia, and facial dysmorphism. These patients underwent WES with a trio approach and de novo heterozygous predicted pathogenic novel variants in the KAT6A gene were identified. The KAT6A gene encodes a histone acetyltransferase protein and it has long been known for its structural involvement in acute myeloid leukemia; however, it has not previously been associated with any congenital disorder. In animal models the KAT6A ortholog is involved in transcriptional regulation during development. Given the similar findings in animal models and our patients' phenotypes, we hypothesize that KAT6A could play a role in development of the brain, face, and heart in humans. © 2016 Wiley Periodicals, Inc. PMID:27133397

  12. Quantitative analysis of single amino acid variant peptides associated with pancreatic cancer in serum by an isobaric labeling quantitative method.

    PubMed

    Nie, Song; Yin, Haidi; Tan, Zhijing; Anderson, Michelle A; Ruffin, Mack T; Simeone, Diane M; Lubman, David M

    2014-12-01

    Single amino acid variations are highly associated with many human diseases. The direct detection of peptides containing single amino acid variants (SAAVs) derived from nonsynonymous single nucleotide polymorphisms (SNPs) in serum can provide unique opportunities for SAAV associated biomarker discovery. In the present study, an isobaric labeling quantitative strategy was applied to identify and quantify variant peptides in serum samples of pancreatic cancer patients and other benign controls. The largest number of SAAV peptides to date in serum including 96 unique variant peptides were quantified in this quantitative analysis, of which five variant peptides showed a statistically significant difference between pancreatic cancer and other controls (p-value < 0.05). Significant differences in the variant peptide SDNCEDTPEAGYFAVAVVK from serotransferrin were detected between pancreatic cancer and controls, which was further validated by selected reaction monitoring (SRM) analysis. The novel biomarker panel obtained by combining α-1-antichymotrypsin (AACT), Thrombospondin-1 (THBS1) and this variant peptide showed an excellent diagnostic performance in discriminating pancreatic cancer from healthy controls (AUC = 0.98) and chronic pancreatitis (AUC = 0.90). These results suggest that large-scale analysis of SAAV peptides in serum may provide a new direction for biomarker discovery research. PMID:25393578

  13. Amino acid sequence of bovine heart coupling factor 6.

    PubMed Central

    Fang, J K; Jacobs, J W; Kanner, B I; Racker, E; Bradshaw, R A

    1984-01-01

    The amino acid sequence of bovine heart mitochondrial coupling factor 6 (F6) has been determined by automated Edman degradation of the whole protein and derived peptides. Preparations based on heat precipitation and ethanol extraction showed allotypic variation at three positions while material further purified by HPLC yielded only one sequence that also differed by a Phe-Thr replacement at residue 62. The mature protein contains 76 amino acids with a calculated molecular weight of 9006 and a pI of approximately 5, in good agreement with experimentally measured values. The charged amino acids are mainly clustered at the termini and in one section in the middle; these three polar segments are separated by two segments relatively rich in nonpolar residues. Chou-Fasman analysis suggests three stretches of alpha-helix coinciding with (or within) the high-charge-density sequences with a single beta-turn at the first polar-nonpolar junction. Comparison of the F6 sequence with those of other proteins did not reveal any homologous structures. PMID:6149548

  14. Stabilization of Microsatellite Sequences by Variant Repeats in the Yeast Saccharomyces Cerevisiae

    PubMed Central

    Petes, T. D.; Greenwell, P. W.; Dominska, M.

    1997-01-01

    We examined the effect of a single variant repeat on the stability of a 51-base pair (bp) microsatellite (poly GT). We found that the insertion stabilizes the microsatellite about fivefold in wild-type strains. The stabilizing effect of the variant base was also observed in strains with mutations in the DNA mismatch repair genes pms1, msh2 and msh3, indicating that this effect does not require a functional DNA mismatch repair system. Most of the microsatellite alterations in the pms1, msh2 and msh3 strains were additions or deletions of single GT repeats, but about half of the alterations in the wild-type and msh6 strains were large (>8 bp) deletions or additions. PMID:9178000

  15. High-accuracy biodistribution analysis of adeno-associated virus variants by double barcode sequencing

    PubMed Central

    Marsic, Damien; Méndez-Gómez, Héctor R; Zolotukhin, Sergei

    2015-01-01

    Biodistribution analysis is a key step in the evaluation of adeno-associated virus (AAV) capsid variants, whether natural isolates or produced by rational design or directed evolution. Indeed, when screening candidate vectors, accurate knowledge about which tissues are infected and how efficiently is essential. We describe the design, validation, and application of a new vector, pTR-UF50-BC, encoding a bioluminescent protein, a fluorescent protein and a DNA barcode, which can be used to visualize localization of transduction at the organism, organ, tissue, or cellular levels. In addition, by linking capsid variants to different barcoded versions of the vector and amplifying the barcode region from various tissue samples using barcoded primers, biodistribution of viral genomes can be analyzed with high accuracy and efficiency. PMID:26793739
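
The double-barcode idea in the record above, one barcode in the vector identifying the capsid variant and one in the primer identifying the tissue sample, amounts to tallying reads by a pair of keys. The sketch below illustrates that bookkeeping only. The barcode sequences, their lengths, their position at the start of each read, and the names `count_reads` and `fractions_by_tissue` are all hypothetical; none of them come from the pTR-UF50-BC design.

```python
from collections import Counter

# Hypothetical barcodes for illustration only; the real vector and primer
# barcodes are not given in the abstract.
SAMPLE_BARCODES = {"ACGT": "liver", "TGCA": "heart"}
VARIANT_BARCODES = {"GGAACC": "capsid_A", "CCTTGG": "capsid_B"}


def count_reads(reads):
    """Tally reads per (tissue, capsid variant).

    Assumes each read starts with a 4-nt primer barcode followed directly
    by a 6-nt vector barcode; reads with an unknown barcode are skipped.
    """
    counts = Counter()
    for read in reads:
        tissue = SAMPLE_BARCODES.get(read[:4])
        variant = VARIANT_BARCODES.get(read[4:10])
        if tissue is None or variant is None:
            continue
        counts[(tissue, variant)] += 1
    return counts


def fractions_by_tissue(counts):
    """Convert raw counts into each variant's share of reads within a tissue."""
    totals = Counter()
    for (tissue, _variant), n in counts.items():
        totals[tissue] += n
    return {key: n / totals[key[0]] for key, n in counts.items()}


reads = ["ACGTGGAACCTTT", "ACGTCCTTGGAAA", "ACGTGGAACCAAA",
         "TGCACCTTGGCCC", "NNNNGGAACCAAA"]
print(fractions_by_tissue(count_reads(reads)))
```

A real analysis would also need to handle sequencing errors in the barcodes and normalise for input dose, which this sketch leaves out.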

  16. Prevalence and functional analysis of sequence variants in the ATR checkpoint mediator Claspin (CLSPN)

    PubMed Central

    Zhang, Jianmin; Song, Young-Han; Brannigan, Brian W.; Wahrer, Doke C. R.; Schiripo, Taryn A.; Harris, Patricia L.; Haserlat, Sara M.; Ulkus, Lindsey E.; Shannon, Kristen M.; Garber, Judy E.; Freedman, Matthew L.; Henderson, Brian E.; Zou, Lee; Sgroi, Dennis C.; Haber, Daniel A.; Bell, Daphne W.

    2009-01-01

    Mutational inactivation of genes controlling the DNA damage response contributes to cancer susceptibility within families and within the general population as well as to sporadic tumorigenesis. Claspin (CLSPN) encodes a recently recognized mediator protein essential for the ATR and CHK1-dependent checkpoint elicited by replicative stress or the presence of single-stranded DNA. Here we describe a study to determine whether mutational disruption of CLSPN contributes to cancer susceptibility and sporadic tumorigenesis. We resequenced CLSPN from the germline of selected cancer families with a history of breast cancer (n=25) or a multicancer phenotype (n=46) as well as from a panel of sporadic cancer cell-lines (n=52) derived from a variety of tumor types. Eight nonsynonymous variants, including a recurrent mutation, were identified from the germline of two cancer-prone individuals and five cancer cell-lines of breast, ovarian and hematopoietic origin. None of the variants was present within population controls. In contrast, mutations were rare within genes encoding the CLSPN-interacting protein ATR and its binding partner ATRIP. One variant of CLSPN, encoding the I783S missense mutation, was defective in its ability to mediate CHK1 phosphorylation following DNA damage and was unable to rescue sensitivity to replicative stress in CLSPN-depleted cells. Taken together, these observations raise the possibility that CLSPN may encode a component of the DNA damage response pathway that is targeted by mutations in human cancers, suggesting the need for larger population-based studies to investigate whether CLSPN variants contribute to cancer susceptibility. PMID:19737971

  17. Genetic Variants in the FADS Gene: Implications for Dietary Recommendations for Fatty Acid Intake.

    PubMed

    Mathias, Rasika A; Pani, Vrindarani; Chilton, Floyd H

    2014-06-01

    Unequivocally, genetic variants within the fatty acid desaturase (FADS) cluster are determinants of long chain polyunsaturated fatty acid (LC-PUFA) levels in circulation, cells and tissues. A recent series of papers have addressed these associations in the context of ancestry; evidence clearly supports that the associations are robust to ethnicity. However, ∼80% of African Americans carry two copies of the alleles associated with increased levels of arachidonic acid, compared to only ∼45% of European Americans, raising important questions of whether gene-PUFA interactions induced by a modern western diet are differentially driving the risk of diseases of inflammation in diverse populations, and whether these interactions are leading to health disparities. We highlight an important aspect thus far missing in the debate regarding dietary recommendations; we contend that current evidence from genetics strongly suggests that an individual's genetic architecture, or at the very least that of the population from which an individual is sampled, must be factored into dietary recommendations currently in place. PMID:24977108

  18. Identification of Rare Causal Variants in Sequence-Based Studies: Methods and Applications to VPS13B, a Gene Involved in Cohen Syndrome and Autism

    PubMed Central

    De Rubeis, Silvia; McCallum, Kenneth; Buxbaum, Joseph D.

    2014-01-01

    Pinpointing the small number of causal variants among the abundant naturally occurring genetic variation is a difficult challenge, but a crucial one for understanding precise molecular mechanisms of disease and follow-up functional studies. We propose and investigate two complementary statistical approaches for identification of rare causal variants in sequencing studies: a backward elimination procedure based on groupwise association tests, and a hierarchical approach that can integrate sequencing data with diverse functional and evolutionary conservation annotations for individual variants. Using simulations, we show that incorporation of multiple bioinformatic predictors of deleteriousness, such as PolyPhen-2, SIFT and GERP++ scores, can improve the power to discover truly causal variants. As proof of principle, we apply the proposed methods to VPS13B, a gene mutated in the rare neurodevelopmental disorder called Cohen syndrome, and recently reported with recessive variants in autism. We identify a small set of promising candidates for causal variants, including two loss-of-function variants and a rare, homozygous probably-damaging variant that could contribute to autism risk. PMID:25502226

  19. Sequences Of Amino Acids For Human Serum Albumin

    NASA Technical Reports Server (NTRS)

    Carter, Daniel C.

    1992-01-01

    Sequences of amino acids defined for use in making polypeptides one-third to one-sixth as large as parent human serum albumin molecule. Smaller, chemically stable peptides have diverse applications including service as artificial human serum and as active components of biosensors and chromatographic matrices. In applications involving production of artificial sera from new sequences, little or no concern about viral contaminants. Smaller genetically engineered polypeptides more easily expressed and produced in large quantities, making commercial isolation and production more feasible and profitable.

  20. Analysis of Amino Acid Variation in the P2 Domain of the GII-4 Norovirus VP1 Protein Reveals Putative Variant-Specific Epitopes

    PubMed Central

    Allen, David J.; Gray, Jim J.; Gallimore, Chris I.; Xerry, Jacqueline; Iturriza-Gómara, Miren

    2008-01-01

    Background Human noroviruses are a highly diverse group of viruses classified into three of the five currently recognised Norovirus genogroups, and contain numerous genotypes or genetic clusters. Noroviruses are the major aetiological agent of endemic gastroenteritis in all age groups, as well as the cause of periodic epidemic gastroenteritis. The noroviruses most commonly associated with outbreaks of gastroenteritis are genogroup II genotype 4 (GII-4) strains. The relationship between genotypes of noroviruses with their phenotypes and antigenic profile remains poorly understood through an inability to culture these viruses and the lack of a suitable animal model. Methodology/Principal Findings Here we describe a study of the diversity of amino acid sequences of the highly variable P2 region in the major capsid protein, VP1, of the GII-4 human norovirus strains using sequence analysis and homology modelling techniques. Conclusions/Significance Our data identify two sites in this region, which show significant amino acid substitutions associated with the appearance of variant strains responsible for epidemics with major public health impact. Homology modelling studies revealed the exposed nature of these sites on the capsid surface, providing supportive structural data that these two sites are likely to be associated with putative variant-specific epitopes. Furthermore, the patterns in the evolution of these viruses at these sites suggest that noroviruses follow a neutral network pattern of evolution. PMID:18213393

  1. Identification of Novel Variants in LTBP2 and PXDN Using Whole-Exome Sequencing in Developmental and Congenital Glaucoma

    PubMed Central

    Micheal, Shazia; Siddiqui, Sorath Noorani; Zafar, Saemah Nuzhat; Iqbal, Aftab; Khan, Muhammad Imran; den Hollander, Anneke I.

    2016-01-01

    Background Primary congenital glaucoma (PCG) is the most common form of glaucoma in children. PCG occurs due to the developmental defects in the trabecular meshwork and anterior chamber of the eye. The purpose of this study is to identify the causative genetic variants in three families with developmental and primary congenital glaucoma (PCG) with a recessive inheritance pattern. Methods DNA samples were obtained from consanguineous families of Pakistani ancestry. The CYP1B1 gene was sequenced in the affected probands by conventional Sanger DNA sequencing. Whole exome sequencing (WES) was performed in DNA samples of four individuals belonging to three different CYP1B1-negative families. Variants identified by WES were validated by Sanger sequencing. Results WES identified potentially causative novel mutations in the latent transforming growth factor beta binding protein 2 (LTBP2) gene in two PCG families. In the first family a novel missense mutation (c.4934G>A; p.Arg1645Glu) co-segregates with the disease phenotype, and in the second family a novel frameshift mutation (c.4031_4032insA; p.Asp1345Glyfs*6) was identified. In a third family with developmental glaucoma a novel mutation (c.3496G>A; p.Gly1166Arg) was identified in the PXDN gene, which segregates with the disease. Conclusions We identified three novel mutations in glaucoma families using WES; two in the LTBP2 gene and one in the PXDN gene. The results will not only enhance our current understanding of the genetic basis of glaucoma, but may also contribute to a better understanding of the diverse phenotypic consequences caused by mutations in these genes. PMID:27409795

  2. Hydrogen Exchange Mass Spectrometry of Related Proteins with Divergent Sequences: A Comparative Study of HIV-1 Nef Allelic Variants

    NASA Astrophysics Data System (ADS)

    Wales, Thomas E.; Poe, Jerrod A.; Emert-Sedlak, Lori; Morgan, Christopher R.; Smithgall, Thomas E.; Engen, John R.

    2016-03-01

    Hydrogen exchange mass spectrometry can be used to compare the conformation and dynamics of proteins that are similar in tertiary structure. If relative deuterium levels are measured, differences in sequence, deuterium forward- and back-exchange, peptide retention time, and protease digestion patterns all complicate the data analysis. We illustrate what can be learned from such data sets by analyzing five variants (Consensus G2E, SF2, NL4-3, ELI, and LTNP4) of the HIV-1 Nef protein, both alone and when bound to the human Hck SH3 domain. Regions with similar sequence could be compared between variants. Although much of the hydrogen exchange features were preserved across the five proteins, the kinetics of Nef binding to Hck SH3 were not the same. These observations may be related to biological function, particularly for ELI Nef where we also observed an impaired ability to downregulate CD4 surface presentation. The data illustrate some of the caveats that must be considered for comparison experiments and provide a framework for investigations of other protein relatives, families, and superfamilies with HX MS.

  3. ZFP57 recognizes multiple and closely spaced sequence motif variants to maintain repressive epigenetic marks in mouse embryonic stem cells

    PubMed Central

    Anvar, Zahra; Cammisa, Marco; Riso, Vincenzo; Baglivo, Ilaria; Kukreja, Harpreet; Sparago, Angela; Girardot, Michael; Lad, Shraddha; De Feis, Italia; Cerrato, Flavia; Angelini, Claudia; Feil, Robert; Pedone, Paolo V.; Grimaldi, Giovanna; Riccio, Andrea

    2016-01-01

    Imprinting Control Regions (ICRs) need to maintain their parental allele-specific DNA methylation during early embryogenesis despite genome-wide demethylation and subsequent de novo methylation. ZFP57 and KAP1 are both required for maintaining the repressive DNA methylation and H3-lysine-9-trimethylation (H3K9me3) at ICRs. In vitro, ZFP57 binds a specific hexanucleotide motif that is enriched at its genomic binding sites. We now demonstrate in mouse embryonic stem cells (ESCs) that SNPs disrupting closely-spaced hexanucleotide motifs are associated with lack of ZFP57 binding and H3K9me3 enrichment. Through a transgenic approach in mouse ESCs, we further demonstrate that an ICR fragment containing three ZFP57 motif sequences recapitulates the original methylated or unmethylated status when integrated into the genome at an ectopic position. Mutation of Zfp57 or the hexanucleotide motifs led to loss of ZFP57 binding and DNA methylation of the transgene. Finally, we identified a sequence variant of the hexanucleotide motif that interacts with ZFP57 both in vivo and in vitro. The presence of multiple and closely located copies of ZFP57 motif variants emerges as a distinct characteristic that is required for the faithful maintenance of repressive epigenetic marks at ICRs and other ZFP57 binding sites. PMID:26481358

  4. Hydrogen Exchange Mass Spectrometry of Related Proteins with Divergent Sequences: A Comparative Study of HIV-1 Nef Allelic Variants.

    PubMed

    Wales, Thomas E; Poe, Jerrod A; Emert-Sedlak, Lori; Morgan, Christopher R; Smithgall, Thomas E; Engen, John R

    2016-06-01

    Hydrogen exchange mass spectrometry can be used to compare the conformation and dynamics of proteins that are similar in tertiary structure. If relative deuterium levels are measured, differences in sequence, deuterium forward- and back-exchange, peptide retention time, and protease digestion patterns all complicate the data analysis. We illustrate what can be learned from such data sets by analyzing five variants (Consensus G2E, SF2, NL4-3, ELI, and LTNP4) of the HIV-1 Nef protein, both alone and when bound to the human Hck SH3 domain. Regions with similar sequence could be compared between variants. Although much of the hydrogen exchange features were preserved across the five proteins, the kinetics of Nef binding to Hck SH3 were not the same. These observations may be related to biological function, particularly for ELI Nef where we also observed an impaired ability to downregulate CD4 surface presentation. The data illustrate some of the caveats that must be considered for comparison experiments and provide a framework for investigations of other protein relatives, families, and superfamilies with HX MS. PMID:27032648

  5. Hydrogen Exchange Mass Spectrometry of Related Proteins with Divergent Sequences: A Comparative Study of HIV-1 Nef Allelic Variants

    NASA Astrophysics Data System (ADS)

    Wales, Thomas E.; Poe, Jerrod A.; Emert-Sedlak, Lori; Morgan, Christopher R.; Smithgall, Thomas E.; Engen, John R.

    2016-06-01

    Hydrogen exchange mass spectrometry can be used to compare the conformation and dynamics of proteins that are similar in tertiary structure. If relative deuterium levels are measured, differences in sequence, deuterium forward- and back-exchange, peptide retention time, and protease digestion patterns all complicate the data analysis. We illustrate what can be learned from such data sets by analyzing five variants (Consensus G2E, SF2, NL4-3, ELI, and LTNP4) of the HIV-1 Nef protein, both alone and when bound to the human Hck SH3 domain. Regions with similar sequence could be compared between variants. Although much of the hydrogen exchange features were preserved across the five proteins, the kinetics of Nef binding to Hck SH3 were not the same. These observations may be related to biological function, particularly for ELI Nef where we also observed an impaired ability to downregulate CD4 surface presentation. The data illustrate some of the caveats that must be considered for comparison experiments and provide a framework for investigations of other protein relatives, families, and superfamilies with HX MS.

  6. Nanopores and nucleic acids: prospects for ultrarapid sequencing

    NASA Technical Reports Server (NTRS)

    Deamer, D. W.; Akeson, M.

    2000-01-01

    DNA and RNA molecules can be detected as they are driven through a nanopore by an applied electric field at rates ranging from several hundred microseconds to a few milliseconds per molecule. The nanopore can rapidly discriminate between pyrimidine and purine segments along a single-stranded nucleic acid molecule. Nanopore detection and characterization of single molecules represents a new method for directly reading information encoded in linear polymers. If single-nucleotide resolution can be achieved, it is possible that nucleic acid sequences can be determined at rates exceeding a thousand bases per second.

  7. Secondary Variants in Individuals Undergoing Exome Sequencing: Screening of 572 Individuals Identifies High-Penetrance Mutations in Cancer-Susceptibility Genes

    PubMed Central

    Johnston, Jennifer J.; Rubinstein, Wendy S.; Facio, Flavia M.; Ng, David; Singh, Larry N.; Teer, Jamie K.; Mullikin, James C.; Biesecker, Leslie G.

    2012-01-01

    Genome- and exome-sequencing costs are continuing to fall, and many individuals are undergoing these assessments as research participants and patients. The issue of secondary (so-called incidental) findings in exome analysis is controversial, and data are needed on methods of detection and their frequency. We piloted secondary variant detection by analyzing exomes for mutations in cancer-susceptibility syndromes in subjects ascertained for atherosclerosis phenotypes. We performed exome sequencing on 572 ClinSeq participants, and in 37 genes, we interpreted variants that cause high-penetrance cancer syndromes by using an algorithm that filtered results on the basis of mutation type, quality, and frequency and that filtered mutation-database entries on the basis of defined categories of causation. We identified 454 sequence variants that differed from the human reference. Exclusions were made on the basis of sequence quality (26 variants) and high frequency in the cohort (77 variants) or dbSNP (17 variants), leaving 334 variants of potential clinical importance. These were further filtered on the basis of curation of literature reports. Seven participants, four of whom were of Ashkenazi Jewish descent and three of whom did not meet family-history-based referral criteria, had deleterious BRCA1 or BRCA2 mutations. One participant had a deleterious SDHC mutation, which causes paragangliomas. Exome sequencing, coupled with multidisciplinary interpretation, detected clinically important mutations in cancer-susceptibility genes; four of such mutations were in individuals without a significant family history of disease. We conclude that secondary variants of high clinical importance will be detected at an appreciable frequency in exomes, and we suggest that priority be given to the development of more efficient modes of interpretation with trials in larger patient groups. PMID:22703879
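    The exclusion counts reported above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the stage labels are hypothetical names, and it assumes the three exclusion categories are disjoint (each variant is removed at most once), which the abstract does not state explicitly.

```python
# Counts taken from the abstract above; the stage labels are illustrative names.
TOTAL_VARIANTS = 454
EXCLUSIONS = {
    "low sequence quality": 26,
    "high frequency in cohort": 77,
    "high frequency in dbSNP": 17,
}


def remaining_after_exclusions(total, exclusions):
    """Subtract each exclusion count in turn and return the running totals.

    Assumes the exclusion categories do not overlap.
    """
    remaining = total
    steps = []
    for reason, count in exclusions.items():
        remaining -= count
        steps.append((reason, remaining))
    return steps


steps = remaining_after_exclusions(TOTAL_VARIANTS, EXCLUSIONS)
for reason, left in steps:
    print(f"after excluding {reason}: {left} variants remain")
```

    Under that assumption the final running total matches the 334 variants of potential clinical importance given in the abstract.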

  8. Complete Genome Sequence of Human Norovirus GII.4_2006b, a Variant of Minerva 2006

    PubMed Central

    Yang, Zhihui; Mammel, Mark K.

    2016-01-01

    In 2006, the National Calicivirus Laboratory at the U.S. Centers for Disease Control and Prevention (CDC) confirmed multistate outbreaks of norovirus infection and identified two new GII.4 norovirus strains (Minerva and Laurens) through partial sequencing of the major capsid (VP1) gene. Here, we report the first complete genome sequence of the GII.4 Minerva isolate. PMID:26823589

  9. Amino acid sequence of the Amur tiger prion protein.

    PubMed

    Wu, Changde; Pang, Wanyong; Zhao, Deming

    2006-10-01

    Prion diseases are fatal neurodegenerative disorders in humans and animals associated with conformational conversion of a cellular prion protein (PrP(C)) into the pathologic isoform (PrP(Sc)). Various data indicate that the polymorphisms within the open reading frame (ORF) of PrP are associated with susceptibility and control the species barrier in prion diseases. In the present study, partial Prnp from 25 Amur tigers (tPrnp) were cloned and screened for polymorphisms. Four single nucleotide polymorphisms (T423C, A501G, C511A, A610G) were found; the C511A and A610G nucleotide substitutions resulted in the amino acid changes Lysine171Glutamine and Alanine204Threonine, respectively. The tPrnp amino acid sequence is similar to house cat (Felis catus) and sheep, but differs significantly from two other cat Prnp sequences that were previously deposited in GenBank. PMID:16780982
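    The link between the nucleotide positions and the residue numbers quoted above can be sketched as follows. This is a minimal illustration that assumes the SNP positions are 1-based and counted from the first nucleotide of the ORF; the function name and the dictionary are hypothetical and not taken from the study.

```python
def codon_position(nt_pos):
    """Map a 1-based ORF nucleotide position to (codon number, position within codon)."""
    if nt_pos < 1:
        raise ValueError("nucleotide positions are 1-based")
    codon = (nt_pos - 1) // 3 + 1   # which amino acid residue the base belongs to
    offset = (nt_pos - 1) % 3 + 1   # 1, 2 or 3 within that codon
    return codon, offset


# Positions of the four SNPs named in the abstract (numbering assumed ORF-relative).
SNPS = {"T423C": 423, "A501G": 501, "C511A": 511, "A610G": 610}
codons = {name: codon_position(pos) for name, pos in SNPS.items()}

for name, (codon, offset) in codons.items():
    print(f"{name}: codon {codon}, base {offset} of 3")
```

    With this numbering, positions 511 and 610 fall on the first base of codons 171 and 204, the two residues reported as changed, while positions 423 and 501 fall on the third base of their codons.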

  10. Variants of glycoside hydrolases

    SciTech Connect

    Teter, Sarah; Ward, Connie; Cherry, Joel; Jones, Aubrey; Harris, Paul; Yi, Jung

    2013-02-26

    The present invention relates to variants of a parent glycoside hydrolase, comprising a substitution at one or more positions corresponding to positions 21, 94, 157, 205, 206, 247, 337, 350, 373, 383, 438, 455, 467, and 486 of amino acids 1 to 513 of SEQ ID NO: 2, and optionally further comprising a substitution at one or more positions corresponding to positions 8, 22, 41, 49, 57, 113, 193, 196, 226, 227, 246, 251, 255, 259, 301, 356, 371, 411, and 462 of amino acids 1 to 513 of SEQ ID NO: 2, wherein the variants have glycoside hydrolase activity. The present invention also relates to nucleotide sequences encoding the variant glycoside hydrolases and to nucleic acid constructs, vectors, and host cells comprising the nucleotide sequences.

  11. Variants of glycoside hydrolases

    DOEpatents

    Teter, Sarah; Ward, Connie; Cherry, Joel; Jones, Aubrey; Harris, Paul; Yi, Jung

    2011-04-26

    The present invention relates to variants of a parent glycoside hydrolase, comprising a substitution at one or more positions corresponding to positions 21, 94, 157, 205, 206, 247, 337, 350, 373, 383, 438, 455, 467, and 486 of amino acids 1 to 513 of SEQ ID NO: 2, and optionally further comprising a substitution at one or more positions corresponding to positions 8, 22, 41, 49, 57, 113, 193, 196, 226, 227, 246, 251, 255, 259, 301, 356, 371, 411, and 462 of amino acids 1 to 513 of SEQ ID NO: 2, wherein the variants have glycoside hydrolase activity. The present invention also relates to nucleotide sequences encoding the variant glycoside hydrolases and to nucleic acid constructs, vectors, and host cells comprising the nucleotide sequences.

  12. Serine protease variants encoded by Echis ocellatus venom gland cDNA: cloning and sequencing analysis.

    PubMed

    Hasson, S S; Mothana, R A; Sallam, T A; Al-balushi, M S; Rahman, M T; Al-Jabri, A A

    2010-01-01

    Envenoming by Echis saw-scaled viper is the leading cause of death and morbidity in Africa due to snake bite. Despite its medical importance, there have been few investigations into the toxin composition of the venom of this viper. Here, we report the cloning of cDNA sequences encoding four groups or isoforms of the haemostasis-disruptive Serine protease proteins (SPs) from the venom glands of Echis ocellatus. All these SP sequences encoded the cysteine residues scaffold that form the 6-disulphide bonds responsible for the characteristic tertiary structure of venom serine proteases. All the Echis ocellatus EoSP groups showed varying degrees of sequence similarity to published viper venom SPs. However, these groups also showed marked intercluster sequence conservation across them which were significantly different from that of previously published viper SPs. Because viper venom SPs exhibit a high degree of sequence similarity and yet exert profoundly different effects on the mammalian haemostatic system, no attempt was made to assign functionality to the new Echis ocellatus EoSPs on the basis of sequence alone. The extraordinary level of interspecific and intergeneric sequence conservation exhibited by the Echis ocellatus EoSPs and analogous serine proteases from other viper species leads us to speculate that antibodies to representative molecules should neutralise (that we will exploit, by epidermal DNA immunization) the biological function of this important group of venom toxins in vipers that are distributed throughout Africa, the Middle East, and the Indian subcontinent. PMID:20936075

  13. Comparative sequence analyses of genome and transcriptome reveal novel transcripts and variants in the Asian elephant Elephas maximus.

    PubMed

    Reddy, Puli Chandramouli; Sinha, Ishani; Kelkar, Ashwin; Habib, Farhat; Pradhan, Saurabh J; Sukumar, Raman; Galande, Sanjeev

    2015-12-01

    The Asian elephant Elephas maximus and the African elephant Loxodonta africana that diverged 5-7 million years ago exhibit differences in their physiology, behaviour and morphology. A comparative genomics approach would be useful and necessary for evolutionary and functional genetic studies of elephants. We sequenced the E. maximus genome and mapped it to L. africana at ~15X coverage. Through comparative sequence analyses, we have identified Asian elephant specific homozygous, non-synonymous single nucleotide variants (SNVs) that map to 1514 protein coding genes, many of which are involved in olfaction. We also present the first report of a high-coverage transcriptome sequence in E. maximus from peripheral blood lymphocytes. We have identified 103 novel protein coding transcripts and 66 long non-coding (lnc)RNAs. We also report the presence of 181 protein domains unique to elephants when compared to other Afrotheria species. Each of these findings can be further investigated to gain a better understanding of functional differences unique to elephant species, as well as those unique to elephantids in comparison with other mammals. This work therefore provides a valuable resource to explore the immense research potential of comparative analyses of transcriptome and genome sequences in the Asian elephant. PMID:26648035

  14. Sequence analysis of the dimerization initiation site of concordant and discordant viral variants superinfecting HIV type 1 patients.

    PubMed

    Mayr, Luzia; Powell, Rebecca; Kinge, Thompson; Nyambi, Phillipe N

    2011-11-01

    For HIV recombination to occur, the RNAs from two infecting strains within a cell must dimerize at the dimerization initiation site (DIS). We examined the sequence identity at the DIS (697-731 bp, Hxb2 numbering engine) in patients superinfected with concordant HIV-1 strains and compared them to those with discordant strains. Viral RNA in sequential plasma from four subjects superinfected with subtype-discordant and two subjects superinfected with subtype-concordant HIV-1 strains was extracted, amplified (5' LTR-early gag: 526-1200 bp, Hxb2 numbering engine), sequenced, and analyzed to determine their compatibility for dimerization in vivo. The concordant viruses infecting the two subjects exhibited identical sequences in the 35-bp-long DIS region while sequences from the discordant viruses revealed single nucleotide changes that were located in the DIS loop (715 bp), its flanking nucleotides (710 bp and 717 bp), and the DIS stem (719 bp). Evidence from in vitro experiments demonstrates that these in vivo changes identified can abolish dimerization and reduce recombination frequency. Therefore, these results revealing differences in the DIS of discordant strains versus the similarity noted for the concordant strains may contribute to the differences in the frequency of recombination in patients superinfected with such HIV-1 variants. PMID:21453132

  15. Intramuscular fat content and genetic variants at fatty acid-binding protein loci in Austrian pigs.

    PubMed

    Nechtelberger, D; Pires, V; Söolknet, J; Stur; Brem, G; Mueller, M; Mueller, S

    2001-11-01

    Intramuscular fat is an important meat quality trait in pig production. Previously, genetic variants of the heart fatty acid-binding protein (H-FABP) gene and the adipocyte fatty acid-binding protein (A-FABP) gene were suggested to be associated with intramuscular fat content. The objective of this investigation was to study these associations in the three most important Austrian breeding populations (Piétrain, Large White, and Landrace). Restriction fragment length polymorphism analysis of the H-FABP gene revealed a new MspI polymorphic site and genetic variation in all three breeds. Microsatellite analysis of the A-FABP locus showed up to nine different microsatellite alleles segregating. In Austrian breeds, no significant influence of the A-FABP and H-FABP gene polymorphisms on intramuscular fat could be detected. We also evaluated possible associations between the genetic variations at the H-FABP and A-FABP loci and other growth and carcass traits (average daily gain, feed conversion ratio, lean meat content, pH values, meat color, and drip loss). With regard to the extent of the effects, these genetic markers cannot be recommended for selection on growth and carcass traits in Austrian breeding populations. PMID:11768107

  16. Pooled sequencing and rare variant association tests for identifying the determinants of emerging drug resistance in malaria parasites.

    PubMed

    Cheeseman, Ian H; McDew-White, Marina; Phyo, Aung Pyae; Sriprawat, Kanlaya; Nosten, François; Anderson, Timothy J C

    2015-04-01

    We explored the potential of pooled sequencing to swiftly and economically identify selective sweeps due to emerging artemisinin (ART) resistance in a South-East Asian malaria parasite population. ART resistance is defined by slow parasite clearance from the blood of ART-treated patients and mutations in the kelch gene (chr. 13) have been strongly implicated to play a role. We constructed triplicate pools of 70 slow-clearing (resistant) and 70 fast-clearing (sensitive) infections collected from the Thai-Myanmar border and sequenced these to high (∼ 150-fold) read depth. Allele frequency estimates from pools showed almost perfect correlation (Lin's concordance = 0.98) with allele frequencies at 93 single nucleotide polymorphisms measured directly from individual infections, giving us confidence in the accuracy of this approach. By mapping genome-wide divergence (FST) between pools of drug-resistant and drug-sensitive parasites, we identified two large (>150 kb) regions (on chrs. 13 and 14) and 17 smaller candidate genome regions. To identify individual genes within these genome regions, we resequenced an additional 38 parasite genomes (16 slow and 22 fast-clearing) and performed rare variant association tests. These confirmed kelch as a major molecular marker for ART resistance (P = 6.03 × 10(-6)). This two-tier approach is powerful because pooled sequencing rapidly narrows down genome regions of interest, while targeted rare variant association testing within these regions can pinpoint the genetic basis of resistance. We show that our approach is robust to recurrent mutation and the generation of soft selective sweeps, which are predicted to be common in pathogen populations with large effective population sizes, and may confound more traditional gene mapping approaches. PMID:25534029

  17. Pooled Sequencing and Rare Variant Association Tests for Identifying the Determinants of Emerging Drug Resistance in Malaria Parasites

    PubMed Central

    Cheeseman, Ian H.; McDew-White, Marina; Phyo, Aung Pyae; Sriprawat, Kanlaya; Nosten, François; Anderson, Timothy J.C.

    2015-01-01

    We explored the potential of pooled sequencing to swiftly and economically identify selective sweeps due to emerging artemisinin (ART) resistance in a South-East Asian malaria parasite population. ART resistance is defined by slow parasite clearance from the blood of ART-treated patients and mutations in the kelch gene (chr. 13) have been strongly implicated to play a role. We constructed triplicate pools of 70 slow-clearing (resistant) and 70 fast-clearing (sensitive) infections collected from the Thai–Myanmar border and sequenced these to high (∼150-fold) read depth. Allele frequency estimates from pools showed almost perfect correlation (Lin’s concordance = 0.98) with allele frequencies at 93 single nucleotide polymorphisms measured directly from individual infections, giving us confidence in the accuracy of this approach. By mapping genome-wide divergence (FST) between pools of drug-resistant and drug-sensitive parasites, we identified two large (>150 kb) regions (on chrs. 13 and 14) and 17 smaller candidate genome regions. To identify individual genes within these genome regions, we resequenced an additional 38 parasite genomes (16 slow and 22 fast-clearing) and performed rare variant association tests. These confirmed kelch as a major molecular marker for ART resistance (P = 6.03 × 10⁻⁶). This two-tier approach is powerful because pooled sequencing rapidly narrows down genome regions of interest, while targeted rare variant association testing within these regions can pinpoint the genetic basis of resistance. We show that our approach is robust to recurrent mutation and the generation of soft selective sweeps, which are predicted to be common in pathogen populations with large effective population sizes, and may confound more traditional gene mapping approaches. PMID:25534029
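    The divergence statistic mentioned above can be illustrated for a single biallelic site. The abstract does not say which FST estimator was used, so the sketch below assumes the textbook heterozygosity-based definition, FST = (HT - HS) / HT, with the two pools weighted equally; the function name and the example frequencies are hypothetical.

```python
def fst_two_pools(p_resistant, p_sensitive):
    """Heterozygosity-based FST at one biallelic site for two equally weighted pools.

    p_resistant, p_sensitive: frequency of the same allele in each pool (0..1).
    """
    for p in (p_resistant, p_sensitive):
        if not 0.0 <= p <= 1.0:
            raise ValueError("allele frequencies must lie in [0, 1]")
    # Mean expected heterozygosity within pools.
    h_s = (2 * p_resistant * (1 - p_resistant)
           + 2 * p_sensitive * (1 - p_sensitive)) / 2
    # Expected heterozygosity of the combined population.
    p_bar = (p_resistant + p_sensitive) / 2
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        return 0.0  # site is monomorphic in both pools; no divergence to measure
    return (h_t - h_s) / h_t


# Hypothetical frequencies: identical pools, strongly diverged pools, fixed difference.
print(fst_two_pools(0.5, 0.5))
print(fst_two_pools(0.2, 0.8))
print(fst_two_pools(0.0, 1.0))
```

    Scanning such a statistic along the genome and looking for runs of high values is one simple way to picture how regions like those on chrs. 13 and 14 stand out; the study's actual windowing and estimator may differ.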

  18. Copy number variants calling for single cell sequencing data by multi-constrained optimization.

    PubMed

    Xu, Bo; Cai, Hongmin; Zhang, Changsheng; Yang, Xi; Han, Guoqiang

    2016-08-01

    Variations in DNA copy number carry important information on genome evolution and regulation of DNA replication in cancer cells. The rapid development of single-cell sequencing technology allows one to explore gene expression heterogeneity among single cells, thus providing important cancer cell evolution information. Single-cell DNA/RNA sequencing data usually have low genome coverage, which requires an extra step of amplification to accumulate enough samples. However, such amplification will introduce large bias and makes bioinformatics analysis challenging. Accurately modeling the distribution of sequencing data and effectively suppressing the bias influence is the key to successful variation analysis. Recent advances demonstrate that the technical noise introduced by amplification is more likely to follow a negative binomial distribution, a special case of Poisson distribution. Thus, we tackle the problem of CNV detection by formulating it as a quadratic optimization problem involving two constraints, in which the underlying signals are corrupted by Poisson-distributed noise. By imposing the constraints of sparsity and smoothness, the reconstructed read depth signals from single-cell sequencing data are anticipated to fit the CNV patterns more accurately. An efficient numerical solution based on the classical alternating direction minimization method (ADMM) is tailored to solve the proposed model. We demonstrate the advantages of the proposed method using both synthetic and empirical single-cell sequencing data. Our experimental results demonstrate that the proposed method achieves excellent performance and high promise of success with single-cell sequencing data. PMID:26923213
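    One generic way to write a read-depth reconstruction problem with sparsity and smoothness penalties is shown below. This is an illustrative formulation only, not the model from the paper: the least-squares data term, the symbols y (observed read depth in n genomic bins), x (reconstructed signal), and the weights λ1 and λ2 are all assumptions introduced here.

```latex
\min_{x \in \mathbb{R}^{n}} \;
  \tfrac{1}{2}\,\lVert y - x \rVert_2^2
  \;+\; \lambda_1 \lVert x \rVert_1
  \;+\; \lambda_2 \sum_{i=1}^{n-1} \lvert x_{i+1} - x_i \rvert
```

    The first penalty favours a signal that is zero in most bins (sparsity) and the second favours piecewise-constant segments (smoothness). Objectives of this form are commonly split into simpler subproblems and solved with alternating-direction schemes such as the one named in the abstract.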

  19. Quantum-Sequencing: Biophysics of quantum tunneling through nucleic acids

    NASA Astrophysics Data System (ADS)

    Casamada Ribot, Josep; Chatterjee, Anushree; Nagpal, Prashant

    2014-03-01

    Tunneling microscopy and spectroscopy have been used extensively in physical surface sciences to study quantum tunneling, to measure the electronic local density of states of nanomaterials, and to characterize adsorbed species. Quantum-Sequencing (Q-Seq) is a new method based on tunneling microscopy for electronic sequencing of single molecules of nucleic acids. A major goal of third-generation sequencing technologies is to develop a fast, reliable, enzyme-free single-molecule sequencing method. Here, we present the unique "electronic fingerprints" for all nucleotides on DNA and RNA using Q-Seq, along with their intrinsic biophysical parameters. We have analyzed tunneling spectra for the nucleotides at different pH conditions and analyzed the HOMO, LUMO and energy gap for all of them. In addition, we show a number of biophysical parameters to further characterize all nucleobases (electron and hole transition voltage and energy barriers). These results highlight the robustness of Q-Seq as a technique for next-generation sequencing.

  20. Development of a Targeted Multi-Disorder High-Throughput Sequencing Assay for the Effective Identification of Disease-Causing Variants

    PubMed Central

    Delio, Maria; Patel, Kunjan; Maslov, Alex; Marion, Robert W.; McDonald, Thomas V.; Cadoff, Evan M.; Golden, Aaron; Greally, John M.; Vijg, Jan; Morrow, Bernice; Montagna, Cristina

    2015-01-01

    Background: While next generation sequencing (NGS) is a useful tool for the identification of genetic variants to aid diagnosis and support therapy decision, high sequencing costs have limited its application within routine clinical care, especially in economically depressed areas. To investigate the utility of a multi-disease NGS based genetic test, we designed a custom sequencing assay targeting over thirty disease-associated areas including cardiac disorders, intellectual disabilities, hearing loss, collagenopathies, muscular dystrophy, Ashkenazi Jewish genetic disorders, and complex Mendelian disorders. We focused on these specific areas based on the interest of our collaborative clinical team, suggesting these diseases being the ones in need for the development of a sequencing-screening assay. Results: We targeted all coding, untranslated regions (UTR) and flanking intronic regions of 650 known disease-associated genes using the Roche-NimbleGen EZ SeqCapV3 capture system and sequenced on the Illumina HiSeq 2500 Rapid Run platform. Eight controls with known variants and one HapMap sample were first sequenced to assess the performance of the panel. Subsequently, as a proof of principle and to explore the possible utility of our test, we analyzed test disease subjects (n = 16). Eight had known Mendelian disorders and eight had complex pediatric diseases. In addition, to assess whether copy number variation may be of utility as a companion assay relative to these specific disease areas, we used the Affymetrix Genome-Wide SNP Array 6.0 to analyze the same samples. Conclusion: We identified potentially disease-associated variants: 22 missense, 4 nonsense, 1 frameshift, and 1 splice variants (16 previously identified, 12 novel among dbSNP and 15 novel among NHLBI Exome Variant Server). We found multi-disease targeted high-throughput sequencing to be a cost efficient approach in detecting disease-associated variants to aid diagnosis. PMID:26214305

  1. Complete Nucleotide Sequence of IncP-1β Plasmid pDTC28 Reveals a Non-Functional Variant of the blaGES-Type Gene.

    PubMed

    Dang, Bingjun; Mao, Daqing; Luo, Yi

    2016-01-01

    Plasmid pDTC28 was isolated from the sediments of Haihe River using E. coli CV601 (gfp-tagged) as recipient and indigenous bacteria from the sediment as donors. This plasmid confers reduced susceptibility to tetracycline and sulfamethoxazole. The complete sequence of plasmid pDTC28 was 61,503 bp in length with an average G+C content of 64.09%. Plasmid pDTC28 belongs to the IncP-1β group by phylogenetic analysis. The backbones of plasmid pDTC28 and other IncP-1β plasmids are very classical and conserved, whereas the accessory regions of these plasmids are diverse. A blaGES-5-like gene was found on the accessory region, and this blaGES-5-like gene contained 18 silent mutations and 7 missense mutations compared with the blaGES-5 gene. The mutations resulted in 7 amino acid substitutions in GES-5 carbapenemase, causing the loss of function of the blaGES-5-like gene on plasmid pDTC28 against carbapenems and even β-lactams. The enzyme produced by the blaGES-5-like gene cassette may be a new variant of GES-type enzymes. Thus, the plasmid sequenced in this study will expand our understanding of GES-type β-lactamases and provide insights into the genetic platforms used for the dissemination of GES-type genes. PMID:27152950
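    The average G+C content quoted above is a simple base-composition statistic. The sketch below shows how such a percentage is typically computed; the function name and the toy sequence are hypothetical, and the actual pDTC28 sequence is not reproduced here.

```python
def gc_content(sequence):
    """Return the percentage of G and C among the unambiguous bases of a DNA sequence.

    Characters other than A, C, G and T (for example N) are ignored.
    """
    bases = [b for b in sequence.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A, C, G or T bases")
    gc = sum(1 for b in bases if b in "GC")
    return 100.0 * gc / len(bases)


# Toy example only; a real analysis would read the plasmid sequence from a FASTA file.
toy_sequence = "GGCCATGCNN"
print(f"G+C content: {gc_content(toy_sequence):.2f}%")
```

    Applied to the full 61,503 bp plasmid sequence, a calculation of this kind would be expected to give a value near the reported 64.09%, depending on how ambiguous bases are handled.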

  2. Complete Nucleotide Sequence of IncP-1β Plasmid pDTC28 Reveals a Non-Functional Variant of the blaGES-Type Gene

    PubMed Central

    Dang, Bingjun; Mao, Daqing; Luo, Yi

    2016-01-01

    Plasmid pDTC28 was isolated from the sediments of Haihe River using E. coli CV601 (gfp-tagged) as recipient and indigenous bacteria from the sediment as donors. This plasmid confers reduced susceptibility to tetracycline and sulfamethoxazole. The complete sequence of plasmid pDTC28 was 61,503 bp in length with an average G+C content of 64.09%. Plasmid pDTC28 belongs to the IncP-1β group by phylogenetic analysis. The backbones of plasmid pDTC28 and other IncP-1β plasmids are very classical and conserved, whereas the accessory regions of these plasmids are diverse. A blaGES-5-like gene was found on the accessory region, and this blaGES-5-like gene contained 18 silent mutations and 7 missense mutations compared with the blaGES-5 gene. The mutations resulted in 7 amino acid substitutions in GES-5 carbapenemase, causing the loss of function of the blaGES-5-like gene on plasmid pDTC28 against carbapenems and even β-lactams. The enzyme produced by the blaGES-5-like gene cassette may be a new variant of GES-type enzymes. Thus, the plasmid sequenced in this study will expand our understanding of GES-type β-lactamases and provide insights into the genetic platforms used for the dissemination of GES-type genes. PMID:27152950

  3. Correlation between fibroin amino acid sequence and physical silk properties.

    PubMed

    Fedic, Robert; Zurovec, Michal; Sehnal, Frantisek

    2003-09-12

    The fiber properties of lepidopteran silk depend on the amino acid repeats that interact during H-fibroin polymerization. The aim of our research was to relate repeat composition to insect biology and fiber strength. Representative regions of the H-fibroin genes were sequenced and analyzed in three pyralid species: wax moth (Galleria mellonella), European flour moth (Ephestia kuehniella), and Indian meal moth (Plodia interpunctella). The amino acid repeats are species-specific, evidently a diversification of an ancestral region of 43 residues, and include three types of regularly dispersed motifs: modifications of GSSAASAA sequence, stretches of tripeptides GXZ where X and Z represent bulky residues, and sequences similar to PVIVIEE. No concatenations of GX dipeptide or alanine, which are typical for Bombyx silkworms and Antheraea silk moths, respectively, were found. Despite different repeat structure, the silks of G. mellonella and E. kuehniella exhibit similar tensile strength as the Bombyx and Antheraea silks. We suggest that in these latter two species, variations in the repeat length obstruct repeat alignment, but sufficiently long stretches of iterated residues get superposed to interact. In the pyralid H-fibroins, interactions of the widely separated and diverse motifs depend on the precision of repeat matching; silk is strong in G. mellonella and E. kuehniella, with 2-3 types of long homogeneous repeats, and nearly 10 times weaker in P. interpunctella, with seven types of shorter erratic repeats. The high proportion of large amino acids in the H-fibroin of pyralids has probably evolved in connection with the spinning habit of caterpillars that live in protective silk tubes and spin continuously, enlarging the tubes on one end and partly devouring the other one. The silk serves as a depot of energetically rich and essential amino acids that may be scarce in the diet. PMID:12816957

  4. Amino acid sequence of the nonsecretory ribonuclease of human urine.

    PubMed

    Beintema, J J; Hofsteenge, J; Iwama, M; Morita, T; Ohgi, K; Irie, M; Sugiyama, R H; Schieven, G L; Dekker, C A; Glitz, D G

    1988-06-14

    The amino acid sequence of a nonsecretory ribonuclease isolated from human urine was determined except for the identity of the residue at position 7. Sequence information indicates that the ribonucleases of human liver and spleen and an eosinophil-derived neurotoxin are identical or very closely related gene products. The sequence is identical at about 30% of the amino acid positions with those of all of the secreted mammalian ribonucleases for which information is available. Identical residues include active-site residues histidine-12, histidine-119, and lysine-41, other residues known to be important for substrate binding and catalytic activity, and all eight half-cystine residues common to these enzymes. Major differences include a deletion of six residues in the (so-called) S-peptide loop, insertions of two, and nine residues, respectively, in three other external loops of the molecule, and an addition of three residues at the amino terminus. The sequence shows the human nonsecretory ribonuclease to belong to the same ribonuclease superfamily as the mammalian secretory ribonucleases, turtle pancreatic ribonuclease, and human angiogenin. Sequence data suggest that a gene duplication occurred in an ancient vertebrate ancestor; one branch led to the nonsecretory ribonuclease, while the other branch led to a second duplication, with one line leading to the secretory ribonucleases (in mammals) and the second line leading to pancreatic ribonuclease in turtle and an angiogenic factor in mammals (human angiogenin). The nonsecretory ribonuclease has five short carbohydrate chains attached via asparagine residues at the surface of the molecule; these chains may have been shortened by exoglycosidase action.(ABSTRACT TRUNCATED AT 250 WORDS) PMID:3166997

  5. Distinct Acid Resistance and Survival Fitness Displayed by Curli Variants of Enterohemorrhagic Escherichia coli O157:H7

    PubMed Central

    Carter, Michelle Q.; Brandl, Maria T.; Louie, Jacqueline W.; Kyle, Jennifer L.; Carychao, Diana K.; Cooley, Michael B.; Parker, Craig T.; Bates, Anne H.; Mandrell, Robert E.

    2011-01-01

    Curli are adhesive fimbriae of Enterobacteriaceae and are involved in surface attachment, cell aggregation, and biofilm formation. Here, we report that both inter- and intrastrain variations in curli production are widespread in enterohemorrhagic Escherichia coli O157:H7. The relative proportions of curli-producing variants (C+) and curli-deficient variants (C−) in an E. coli O157:H7 cell population varied depending on the growth conditions. In variants derived from the 2006 U.S. spinach outbreak strains, the shift between the C+ and C− subpopulations occurred mostly in response to starvation and was unidirectional from C− to C+; in variants derived from the 1993 hamburger outbreak strains, the shift occurred primarily in response to oxygen depletion and was bidirectional. Furthermore, curli variants derived from the same strain displayed marked differences in survival fitness: C+ variants grew to higher concentrations in nutrient-limited conditions than C− variants, whereas C− variants were significantly more acid resistant than C+ variants. This difference in acid resistance does not appear to be linked to the curli fimbriae per se, since a csgA deletion mutant in either a C+ or a C− variant exhibited an acid resistance similar to that of its parental strain. Our data suggest that natural curli variants of E. coli O157:H7 carry several distinct physiological properties that are important for their environmental survival. Maintenance of curli variants in an E. coli O157:H7 population may provide a survival strategy in which C+ variants are selected in a nutrient-limited environment, whereas C− variants are selected in an acidic environment, such as the stomach of an animal host, including that of a human. PMID:21478320

  6. Evaluation of a 5-tier scheme proposed for classification of sequence variants using bioinformatic and splicing assay data: inter-reviewer variability and promotion of minimum reporting guidelines.

    PubMed

    Walker, Logan C; Whiley, Phillip J; Houdayer, Claude; Hansen, Thomas V O; Vega, Ana; Santamarina, Marta; Blanco, Ana; Fachal, Laura; Southey, Melissa C; Lafferty, Alan; Colombo, Mara; De Vecchi, Giovanna; Radice, Paolo; Spurdle, Amanda B

    2013-10-01

    Splicing assays are commonly undertaken in the clinical setting to assess the clinical relevance of sequence variants in disease predisposition genes. A 5-tier classification system incorporating both bioinformatic and splicing assay information was previously proposed as a method to provide consistent clinical classification of such variants. Members of the ENIGMA Consortium Splicing Working Group undertook a study to assess the applicability of the scheme to published assay results, and the consistency of classifications across multiple reviewers. Splicing assay data were identified for 235 BRCA1 and 176 BRCA2 unique variants, from 77 publications. At least six independent reviewers from research and/or clinical settings comprehensively examined splicing assay methods and data reported for 22 variant assays of 21 variants in four publications, and classified the variants using the 5-tier classification scheme. Inconsistencies in variant classification occurred between reviewers for 17 of the variant assays. These could be attributed to a combination of ambiguity in presentation of the classification criteria, differences in interpretation of the data provided, nonstandardized reporting of results, and the lack of quantitative data for the aberrant transcripts. We propose suggestions for minimum reporting guidelines for splicing assays, and improvements to the 5-tier splicing classification system to allow future evaluation of its performance as a clinical tool. PMID:23893897

  7. Long insert whole genome sequencing for copy number variant and translocation detection

    PubMed Central

    Liang, Winnie S.; Aldrich, Jessica; Tembe, Waibhav; Kurdoglu, Ahmet; Cherni, Irene; Phillips, Lori; Reiman, Rebecca; Baker, Angela; Weiss, Glen J.; Carpten, John D.; Craig, David W.

    2014-01-01

    As next-generation sequencing continues to have an expanding presence in the clinic, the identification of the most cost-effective and robust strategy for identifying copy number changes and translocations in tumor genomes is needed. We hypothesized that performing shallow whole genome sequencing (WGS) of 900–1000-bp inserts (long insert WGS, LI-WGS) improves our ability to detect these events, compared with shallow WGS of 300–400-bp inserts. A priori analyses show that LI-WGS requires less sequencing compared with short insert WGS to achieve a target physical coverage, and that LI-WGS requires less sequence coverage to detect a heterozygous event with a power of 0.99. We thus developed an LI-WGS library preparation protocol based off of Illumina’s WGS library preparation protocol and illustrate the feasibility of performing LI-WGS. We additionally applied LI-WGS to three separate tumor/normal DNA pairs collected from patients diagnosed with different cancers to demonstrate our application of LI-WGS on actual patient samples for identification of somatic copy number alterations and translocations. With the evolution of sequencing technologies and bioinformatics analyses, we show that modifications to current approaches may improve our ability to interrogate cancer genomes. PMID:24071583
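
    The claim that long inserts need less sequencing to reach a target physical coverage can be illustrated with simple arithmetic. The sketch below is not taken from the paper: the genome size, read length, and target coverage are assumed values, and the function names are hypothetical. Physical coverage is taken here as read pairs times insert size divided by genome size, and sequence coverage as read pairs times two reads times read length divided by genome size.

```python
import math


def read_pairs_for_physical_coverage(genome_size_bp, insert_size_bp, target_physical_coverage):
    """Read pairs needed so that inserts span the genome the target number of times."""
    return math.ceil(target_physical_coverage * genome_size_bp / insert_size_bp)


def sequence_coverage(read_pairs, read_length_bp, genome_size_bp):
    """Base-level coverage produced by paired-end reads (two reads per pair)."""
    return read_pairs * 2 * read_length_bp / genome_size_bp


# Assumed illustrative values, not figures reported in the abstract.
GENOME_SIZE_BP = 3_000_000_000
READ_LENGTH_BP = 100
TARGET_PHYSICAL_COVERAGE = 30

for insert_size_bp in (350, 950):  # mid-points of the 300-400 bp and 900-1000 bp ranges
    pairs = read_pairs_for_physical_coverage(
        GENOME_SIZE_BP, insert_size_bp, TARGET_PHYSICAL_COVERAGE
    )
    seq_cov = sequence_coverage(pairs, READ_LENGTH_BP, GENOME_SIZE_BP)
    print(f"insert={insert_size_bp} bp  read pairs={pairs:,}  sequence coverage={seq_cov:.2f}x")
```

    Under these assumptions the longer insert reaches the same physical coverage with fewer read pairs, and therefore at a lower sequence coverage, which is the direction of the effect the abstract describes.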

  8. Characterization and amino acid sequence of a fatty acid-binding protein from human heart.

    PubMed

    Offner, G D; Brecher, P; Sawlivich, W B; Costello, C E; Troxler, R F

    1988-05-15

    The complete amino acid sequence of a fatty acid-binding protein from human heart was determined by automated Edman degradation of CNBr, BNPS-skatole [3'-bromo-3-methyl-2-(2-nitrobenzenesulphenyl)indolenine], hydroxylamine, Staphylococcus aureus V8 proteinase, tryptic and chymotryptic peptides, and by digestion of the protein with carboxypeptidase A. The sequence of the blocked N-terminal tryptic peptide from citraconylated protein was determined by collisionally induced decomposition mass spectrometry. The protein contains 132 amino acid residues, is enriched with respect to threonine and lysine, lacks cysteine, has an acetylated valine residue at the N-terminus, and has an Mr of 14768 and an isoelectric point of 5.25. This protein contains two short internal repeated sequences from residues 48-54 and from residues 114-119 located within regions of predicted beta-structure and decreasing hydrophobicity. These short repeats are contained within two longer repeated regions from residues 48-60 and residues 114-125, which display 62% sequence similarity. These regions could accommodate the charged and uncharged moieties of long-chain fatty acids and may represent fatty acid-binding domains consistent with the finding that human heart fatty acid-binding protein binds 2 mol of oleate or palmitate/mol of protein. Detailed evidence for the amino acid sequences of the peptides has been deposited as Supplementary Publication SUP 50143 (23 pages) at the British Library Lending Division, Boston Spa, Yorkshire LS23 7BQ, U.K., from whom copies may be obtained as indicated in Biochem. J. (1988) 249, 5. PMID:3421901

  9. Predicting the functional consequences of non-synonymous DNA sequence variants--evaluation of bioinformatics tools and development of a consensus strategy.

    PubMed

    Frousios, Kimon; Iliopoulos, Costas S; Schlitt, Thomas; Simpson, Michael A

    2013-10-01

    The study of DNA sequence variation has been transformed by recent advances in DNA sequencing technologies. Determination of the functional consequences of sequence variant alleles offers potential insight as to how genotype may influence phenotype. Even within protein coding regions of the genome, establishing the consequences of variation on gene and protein function is challenging and requires substantial laboratory investigation. However, a series of bioinformatics tools have been developed to predict whether non-synonymous variants are neutral or disease-causing. In this study we evaluate the performance of nine such methods (SIFT, PolyPhen2, SNPs&GO, PhD-SNP, PANTHER, Mutation Assessor, MutPred, Condel and CAROL) and develop CoVEC (Consensus Variant Effect Classification), a tool that integrates the prediction results from four of these methods. We demonstrate that the CoVEC approach outperforms most individual methods and highlights the benefit of combining results from multiple tools. PMID:23831115
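
    The general idea of combining several predictors into one call can be sketched as an unweighted majority vote. This is only an illustration of consensus classification, not the CoVEC method itself: the abstract does not say which four tools are integrated or how their outputs are combined, so the tool names, labels, and tie handling below are all assumptions.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-tool labels into one consensus label.

    predictions maps a tool name to "deleterious", "neutral", or None
    (None means the tool gave no prediction). Ties and empty inputs
    return "no_call" rather than guessing.
    """
    votes = Counter(label for label in predictions.values() if label is not None)
    if not votes:
        return "no_call"
    ranked = votes.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "no_call"
    return ranked[0][0]


# Hypothetical tool names and outputs, for illustration only.
example = {
    "tool_a": "deleterious",
    "tool_b": "deleterious",
    "tool_c": "neutral",
    "tool_d": None,
}
print(majority_vote(example))
```

    Returning "no_call" on a tie is a design choice of this sketch; a weighted vote or a trained classifier over the tool scores would be other ways to resolve disagreement.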

  10. Using Sequence Variants in Linkage Disequilibrium with Causative Mutations to Improve Across-Breed Prediction in Dairy Cattle: A Simulation Study.

    PubMed

    van den Berg, Irene; Boichard, Didier; Guldbrandtsen, Bernt; Lund, Mogens S

    2016-01-01

    Sequence data are expected to increase the reliability of genomic prediction by containing causative mutations directly, especially in cases where low linkage disequilibrium between markers and causative mutations limits prediction reliability, such as across-breed prediction in dairy cattle. In practice, the causative mutations are unknown, and prediction with only variants in perfect linkage disequilibrium with the causative mutations is not realistic, leading to a reduced reliability compared to knowing the causative variants. Our objective was to use sequence data to investigate the potential benefits of sequence data for the prediction of genomic relationships, and consequently reliability of genomic breeding values. We used sequence data from five dairy cattle breeds, and a larger number of imputed sequences for two of the five breeds. We focused on the influence of linkage disequilibrium between markers and causative mutations, and assumed that a fraction of the causative mutations was shared across breeds and had the same effect across breeds. By comparing the loss in reliability of different scenarios, varying the distance between markers and causative mutations, using either all genome wide markers from commercial SNP chips, or only the markers closest to the causative mutations, we demonstrate the importance of using only variants very close to the causative mutations, especially for across-breed prediction. Rare variants improved prediction only if they were very close to rare causative mutations, and all causative mutations were rare. Our results show that sequence data can potentially improve genomic prediction, but careful selection of markers is essential. PMID:27317779
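
    Linkage disequilibrium between a marker and a causative mutation is commonly summarised by the squared correlation r². The sketch below computes it from haplotype and allele frequencies using the standard definition D = p_AB - p_A p_B and r² = D² / (p_A (1 - p_A) p_B (1 - p_B)). The function name and the example frequencies are assumptions for illustration and are not values from the study.

```python
def ld_r_squared(p_ab, p_a, p_b):
    """Squared correlation between two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and allele B at locus 2
    p_a:  frequency of allele A at locus 1
    p_b:  frequency of allele B at locus 2
    """
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        raise ValueError("r^2 is undefined when either locus is monomorphic")
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium
    return d * d / denominator


# Assumed example frequencies: a marker always inherited with the causative allele,
# and a marker that is statistically independent of it.
print(ld_r_squared(p_ab=0.30, p_a=0.30, p_b=0.30))  # marker in perfect LD
print(ld_r_squared(p_ab=0.09, p_a=0.30, p_b=0.30))  # marker in linkage equilibrium
```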

  11. Using Sequence Variants in Linkage Disequilibrium with Causative Mutations to Improve Across-Breed Prediction in Dairy Cattle: A Simulation Study

    PubMed Central

    van den Berg, Irene; Boichard, Didier; Guldbrandtsen, Bernt; Lund, Mogens S.

    2016-01-01

    Sequence data are expected to increase the reliability of genomic prediction by containing causative mutations directly, especially in cases where low linkage disequilibrium between markers and causative mutations limits prediction reliability, such as across-breed prediction in dairy cattle. In practice, the causative mutations are unknown, and prediction with only variants in perfect linkage disequilibrium with the causative mutations is not realistic, leading to a reduced reliability compared to knowing the causative variants. Our objective was to use sequence data to investigate the potential benefits of sequence data for the prediction of genomic relationships, and consequently reliability of genomic breeding values. We used sequence data from five dairy cattle breeds, and a larger number of imputed sequences for two of the five breeds. We focused on the influence of linkage disequilibrium between markers and causative mutations, and assumed that a fraction of the causative mutations was shared across breeds and had the same effect across breeds. By comparing the loss in reliability of different scenarios, varying the distance between markers and causative mutations, using either all genome wide markers from commercial SNP chips, or only the markers closest to the causative mutations, we demonstrate the importance of using only variants very close to the causative mutations, especially for across-breed prediction. Rare variants improved prediction only if they were very close to rare causative mutations, and all causative mutations were rare. Our results show that sequence data can potentially improve genomic prediction, but careful selection of markers is essential. PMID:27317779

  12. Molecular cloning and amino acid sequence of human 5-lipoxygenase

    SciTech Connect

    Matsumoto, T.; Funk, C.D.; Radmark, O.; Hoeoeg, J.O.; Joernvall, H.; Samuelsson, B.

    1988-01-01

    5-Lipoxygenase (EC 1.13.11.34), a Ca²⁺- and ATP-requiring enzyme, catalyzes the first two steps in the biosynthesis of the peptidoleukotrienes and the chemotactic factor leukotriene B₄. A cDNA clone corresponding to 5-lipoxygenase was isolated from a human lung lambda gt11 expression library by immunoscreening with a polyclonal antibody. Additional clones from a human placenta lambda gt11 cDNA library were obtained by plaque hybridization with the ³²P-labeled lung cDNA clone. Sequence data obtained from several overlapping clones indicate that the composite DNAs contain the complete coding region for the enzyme. From the deduced primary structure, 5-lipoxygenase encodes a 673 amino acid protein with a calculated molecular weight of 77,839. Direct analysis of the native protein and its proteolytic fragments confirmed the deduced composition, the amino-terminal amino acid sequence, and the structure of many internal segments. 5-Lipoxygenase has no apparent sequence homology with leukotriene A₄ hydrolase or Ca²⁺-binding proteins. RNA blot analysis indicated substantial amounts of an mRNA species of approximately 2700 nucleotides in leukocytes, lung, and placenta.

  13. Nucleic acid sequence detection using multiplexed oligonucleotide PCR

    DOEpatents

    Nolan, John P.; White, P. Scott

    2006-12-26

    Methods for rapidly detecting single or multiple sequence alleles in a sample nucleic acid are described. Provided are all of the oligonucleotide pairs capable of annealing specifically to a target allele and discriminating among possible sequences thereof, and ligating to each other to form an oligonucleotide complex when a particular sequence feature is present (or, alternatively, absent) in the sample nucleic acid. The design of each oligonucleotide pair permits the subsequent high-level PCR amplification of a specific amplicon when the oligonucleotide complex is formed, but not when the oligonucleotide complex is not formed. The presence or absence of the specific amplicon is used to detect the allele. Detection of the specific amplicon may be achieved using a variety of methods well known in the art, including without limitation, oligonucleotide capture onto DNA chips or microarrays, oligonucleotide capture onto beads or microspheres, electrophoresis, and mass spectrometry. Various labels and address-capture tags may be employed in the amplicon detection step of multiplexed assays, as further described herein.

  14. The amino acid sequence of rabbit muscle triose phosphate isomerase.

    PubMed Central

    Corran, P H; Waley, S G

    1975-01-01

    The amino acid sequence of rabbit muscle triose phosphate isomerase was deduced by characterizing peptides that overlap the tryptic peptides. Thiol groups were modified by oxidation, carboxymethylation or aminoethylation. About 50 peptides that provided information about overlaps were isolated; the peptides were mostly characterized by their compositions and N-terminal residues. The peptide chains contain 248 amino acid residues, and no evidence for dissimilarity of the two subunits that comprise the native enzyme was found. The sequence of the rabbit muscle enzyme may be compared with that of the coelacanth enzyme (Kolb et al., 1974): 84% of the residues are in identical positions. Similarly, comparison of the sequence with that inferred for the chicken enzyme (Furth et al., 1974) shows that 87% of the residues are in identical positions. Limited though these comparisons are, they suggest that triose phosphate isomerase has one of the lowest rates of evolutionary change. An extended version of the present paper has been deposited as Supplementary Publication SUP 50040 (42 pages) at the British Library (Lending Division) (formerly the National Lending Library for Science and Technology), Boston Spa, Yorks. LS23 7BQ, U.K., from whom copies can be obtained on the terms given in Biochem. J. (1975) 145, 5. PMID:1171682
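
    Figures such as "84% of the residues are in identical positions" come from counting matching residues across two aligned sequences. A minimal sketch of that calculation follows. It assumes the two sequences are already aligned to equal length and that columns containing a gap are excluded from the denominator, which is one of several conventions in use. The toy sequences are invented and are not triose phosphate isomerase data.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of aligned, ungapped columns in which both residues are the same."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for residue_a, residue_b in zip(aligned_a, aligned_b):
        if residue_a == gap or residue_b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if residue_a == residue_b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Invented toy alignment: 6 of 8 columns match.
print(percent_identity("MKTAYIAK", "MKSAYLAK"))  # 75.0
```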

  15. Association analysis for feet and legs disorders with whole-genome sequence variants in 3 dairy cattle breeds.

    PubMed

    Wu, Xiaoping; Guldbrandtsen, Bernt; Lund, Mogens Sandø; Sahana, Goutam

    2016-09-01

    Identification of genetic variants associated with feet and legs disorders (FLD) will aid in the genetic improvement of these traits by providing knowledge on genes that influence trait variations. In Denmark, FLD in cattle has been recorded since the 1990s. In this report, we used deregressed breeding values as response variables for a genome-wide association study. Bulls (5,334 Danish Holstein, 4,237 Nordic Red Dairy Cattle, and 1,180 Danish Jersey) with deregressed estimated breeding values were genotyped with the Illumina Bovine 54k single nucleotide polymorphism (SNP) genotyping array. Genotypes were imputed to whole-genome sequence variants, and then 22,751,039 SNP on 29 autosomes were used for an association analysis. A modified linear mixed-model approach (efficient mixed-model association eXpedited, EMMAX) and a linear mixed model were used for association analysis. We identified 5 (3,854 SNP), 3 (13,642 SNP), and 0 quantitative trait locus (QTL) regions associated with the FLD index in Danish Holstein, Nordic Red Dairy Cattle, and Danish Jersey populations, respectively. We did not identify any QTL that were common among the 3 breeds. In a meta-analysis of the 3 breeds, 4 QTL regions were significant, but no additional QTL region was identified compared with within-breed analyses. Comparison between top SNP locations within these QTL regions and known genes suggested that RASGRP1, LCORL, MOS, and MITF may be candidate genes for FLD in dairy cattle. PMID:27344389

  16. Toll-Like Receptor (TLR)-Associated Sequence Variants and Prostate Cancer Risk among Men of African Descent

    PubMed Central

    Rogers, Erica N.; Jones, Dominique; Kidd, Nayla C.; Yeyeodu, Susan; Brock, Guy; Ragin, Camille; Jackson, Maria; McFarlane-Anderson, Norma; Tulloch-Reid, Marshall; Kimbro, K. Sean; Kidd, LaCreis R.

    2013-01-01

    BACKGROUND: Recent advances demonstrate a relationship between chronic/recurrent inflammation and prostate cancer (PCA). Among inflammatory regulators, toll-like receptors (TLRs) play a critical role in innate immune responses. However, it remains unclear whether variant TLR genes influence PCA risk among men of African descent. Therefore, we evaluated the impact of 32 TLR-associated single nucleotide polymorphisms (SNPs) on PCA risk among African-Americans and Jamaicans. METHODS: SNP profiles of 814 subjects were evaluated using Illumina’s Veracode genotyping platform. Single and combined effects of SNPs in relation to PCA risk were assessed using age-adjusted logistic regression and entropy-based multifactor dimensionality reduction (MDR) models. RESULTS: Seven sequence variants detected in TLR6, TOLLIP, IRAK4, IRF3 were marginally related to PCA. However, none of these effects remained significant after adjusting for multiple hypothesis testing. Nevertheless, MDR modeling revealed a complex interaction between IRAK4 rs4251545 and TLR2 rs1898830 as a significant predictor of PCA risk among U.S. men (permutation testing p-value = 0.001). CONCLUSIONS: MDR identified an interaction between IRAK4 and TLR2 as the best two factor model for predicting PCA risk among men of African descent. However, these findings require further assessment and validation. PMID:23657238

  17. Vascular Ehlers-Danlos Syndrome in siblings with biallelic COL3A1 sequence variants and marked clinical variability in the extended family.

    PubMed

    Jørgensen, Agnete; Fagerheim, Toril; Rand-Hendriksen, Svend; Lunde, Per I; Vorren, Torgrim O; Pepin, Melanie G; Leistritz, Dru F; Byers, Peter H

    2015-06-01

    Vascular Ehlers-Danlos Syndrome (vEDS), also known as EDS type IV, is considered to be an autosomal dominant disorder caused by sequence variants in COL3A1, which encodes the chains of type III procollagen. We identified a family in which there was marked clinical variation with the earliest death due to extensive aortic dissection at age 15 years and other family members in their eighties with no complications. The proband was born with right-sided clubfoot but was otherwise healthy until he died unexpectedly at 15 years. His sister, in addition to signs consistent with vascular EDS, had bilateral frontal and parietal polymicrogyria. The proband and his sister each had two COL3A1 sequence variants, c.1786C>T, p.(Arg596*) in exon 26 and c.3851G>A, p.(Gly1284Glu) in exon 50 on different alleles. Cells from the compound heterozygote produced a reduced amount of type III procollagen, all the chains of which had abnormal electrophoretic mobility. Biallelic sequence variants have a significantly worse outcome than heterozygous variants for either null mutations or missense mutations, and frontoparietal polymicrogyria may be an added phenotype feature. This genetic constellation provides a very rare explanation for marked intrafamilial clinical variation due to sequence variants in COL3A1. PMID:25205403
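
    The coding-DNA and protein descriptions of the two variants can be cross-checked with integer arithmetic, since each codon covers three coding nucleotides. The sketch below assumes standard HGVS coding numbering, in which c.1 is the A of the ATG start codon, and handles only simple exonic substitutions. The helper names are hypothetical.

```python
import re

# Matches simple coding substitutions such as "c.1786C>T"; intronic offsets,
# UTR positions, and other variant types are deliberately not handled.
HGVS_SUBSTITUTION = re.compile(r"^c\.(\d+)([ACGT])>([ACGT])$")


def parse_substitution(hgvs):
    """Return (position, reference base, alternate base) for a simple c. substitution."""
    match = HGVS_SUBSTITUTION.match(hgvs)
    if match is None:
        raise ValueError(f"not a simple coding substitution: {hgvs!r}")
    return int(match.group(1)), match.group(2), match.group(3)


def codon_number(cdna_position):
    """1-based codon (amino acid) number containing a 1-based coding position."""
    return (cdna_position - 1) // 3 + 1


def position_in_codon(cdna_position):
    """Position within the codon: 1, 2, or 3."""
    return (cdna_position - 1) % 3 + 1


for variant in ("c.1786C>T", "c.3851G>A"):
    position, ref, alt = parse_substitution(variant)
    print(variant, codon_number(position), position_in_codon(position))
```

    The computed codon numbers, 596 and 1284, agree with the protein-level descriptions p.(Arg596*) and p.(Gly1284Glu) given in the abstract.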

  18. Vascular Ehlers–Danlos Syndrome in siblings with biallelic COL3A1 sequence variants and marked clinical variability in the extended family

    PubMed Central

    Jørgensen, Agnete; Fagerheim, Toril; Rand-Hendriksen, Svend; Lunde, Per I; Vorren, Torgrim O; Pepin, Melanie G; Leistritz, Dru F; Byers, Peter H

    2015-01-01

    Vascular Ehlers–Danlos Syndrome (vEDS), also known as EDS type IV, is considered to be an autosomal dominant disorder caused by sequence variants in COL3A1, which encodes the chains of type III procollagen. We identified a family in which there was marked clinical variation with the earliest death due to extensive aortic dissection at age 15 years and other family members in their eighties with no complications. The proband was born with right-sided clubfoot but was otherwise healthy until he died unexpectedly at 15 years. His sister, in addition to signs consistent with vascular EDS, had bilateral frontal and parietal polymicrogyria. The proband and his sister each had two COL3A1 sequence variants, c.1786C>T, p.(Arg596*) in exon 26 and c.3851G>A, p.(Gly1284Glu) in exon 50 on different alleles. Cells from the compound heterozygote produced a reduced amount of type III procollagen, all the chains of which had abnormal electrophoretic mobility. Biallelic sequence variants have a significantly worse outcome than heterozygous variants for either null mutations or missense mutations, and frontoparietal polymicrogyria may be an added phenotype feature. This genetic constellation provides a very rare explanation for marked intrafamilial clinical variation due to sequence variants in COL3A1. PMID:25205403

  19. The amino acid sequence of chymopapain from Carica papaya.

    PubMed Central

    Watson, D C; Yaguchi, M; Lynn, K R

    1990-01-01

    Chymopapain is a polypeptide of 218 amino acid residues. It has considerable structural similarity with papain and papaya proteinase omega, including conservation of the catalytic site and of the disulphide bonding. Chymopapain is like papaya proteinase omega in carrying four extra residues between papain positions 168 and 169, but differs from both papaya proteinases in the composition of its S2 subsite, as well as in having a second thiol group, Cys-117. Some evidence for the amino acid sequence of chymopapain has been deposited as Supplementary Publication SUP 50153 (12 pages) at the British Library Document Supply Centre, Boston Spa., Wetherby, West Yorkshire LS23 7BQ, U.K., from whom copies may be obtained on the terms indicated in Biochem. J. (1990) 265, 5. The information comprises Supplement Tables 1-4, which contain, in order, amino acid compositions of peptides from tryptic, peptic, CNBr and mild acid cleavages, Supplement Fig. 1, showing re-fractionation of selected peaks from Fig. 2 of the main paper. Supplement Fig. 2, showing cation-exchange chromatography of the earliest-eluted peak of Fig. 3 of the main paper, Supplement Fig. 3, showing reverse-phase h.p.l.c. of the later-eluted peak from Fig. 3 of the main paper, and Supplement Fig. 4, showing the separation of peptides after mild acid hydrolysis of CNBr-cleavage fragment CB3. PMID:2106878

  20. The amino acid sequence of chymopapain from Carica papaya.

    PubMed

    Watson, D C; Yaguchi, M; Lynn, K R

    1990-02-15

    Chymopapain is a polypeptide of 218 amino acid residues. It has considerable structural similarity with papain and papaya proteinase omega, including conservation of the catalytic site and of the disulphide bonding. Chymopapain is like papaya proteinase omega in carrying four extra residues between papain positions 168 and 169, but differs from both papaya proteinases in the composition of its S2 subsite, as well as in having a second thiol group, Cys-117. Some evidence for the amino acid sequence of chymopapain has been deposited as Supplementary Publication SUP 50153 (12 pages) at the British Library Document Supply Centre, Boston Spa., Wetherby, West Yorkshire LS23 7BQ, U.K., from whom copies may be obtained on the terms indicated in Biochem. J. (1990) 265, 5. The information comprises Supplement Tables 1-4, which contain, in order, amino acid compositions of peptides from tryptic, peptic, CNBr and mild acid cleavages, Supplement Fig. 1, showing re-fractionation of selected peaks from Fig. 2 of the main paper. Supplement Fig. 2, showing cation-exchange chromatography of the earliest-eluted peak of Fig. 3 of the main paper, Supplement Fig. 3, showing reverse-phase h.p.l.c. of the later-eluted peak from Fig. 3 of the main paper, and Supplement Fig. 4, showing the separation of peptides after mild acid hydrolysis of CNBr-cleavage fragment CB3. PMID:2106878

  1. Identification of Bari Transposons in 23 Sequenced Drosophila Genomes Reveals Novel Structural Variants, MITEs and Horizontal Transfer.

    PubMed

    Palazzo, Antonio; Lovero, Domenica; D'Addabbo, Pietro; Caizzi, Ruggiero; Marsano, René Massimiliano

    2016-01-01

    Bari elements are members of the Tc1-mariner superfamily of DNA transposons, originally discovered in Drosophila melanogaster, and subsequently identified in silico in 11 sequenced Drosophila genomes and as experimentally isolated in four non-sequenced Drosophila species. Bari-like elements have been also studied for their mobility both in vivo and in vitro. We analyzed 23 Drosophila genomes and carried out a detailed characterization of the Bari elements identified, including those from the heterochromatic Bari1 cluster in D. melanogaster. We have annotated 401 copies of Bari elements classified either as putatively autonomous or inactive according to the structure of the terminal sequences and the presence of a complete transposase-coding region. Analyses of the integration sites revealed that Bari transposase prefers AT-rich sequences in which the TA target is cleaved and duplicated. Furthermore evaluation of transposon's co-occurrence near the integration sites of Bari elements showed a non-random distribution of other transposable elements. We also unveil the existence of a putatively autonomous Bari1 variant characterized by two identical long Terminal Inverted Repeats, in D. rhopaloa. In addition, we detected MITEs related to Bari transposons in 9 species. Phylogenetic analyses based on transposase gene and the terminal sequences confirmed that Bari-like elements are distributed into three subfamilies. A few inconsistencies in Bari phylogenetic tree with respect to the Drosophila species tree could be explained by the occurrence of horizontal transfer events as also suggested by the results of dS analyses. This study further clarifies the Bari transposon's evolutionary dynamics and increases our understanding on the Tc1-mariner elements' biology. PMID:27213270

  2. Identification of Bari Transposons in 23 Sequenced Drosophila Genomes Reveals Novel Structural Variants, MITEs and Horizontal Transfer

    PubMed Central

    D’Addabbo, Pietro; Caizzi, Ruggiero

    2016-01-01

    Bari elements are members of the Tc1-mariner superfamily of DNA transposons, originally discovered in Drosophila melanogaster, and subsequently identified in silico in 11 sequenced Drosophila genomes and as experimentally isolated in four non-sequenced Drosophila species. Bari-like elements have been also studied for their mobility both in vivo and in vitro. We analyzed 23 Drosophila genomes and carried out a detailed characterization of the Bari elements identified, including those from the heterochromatic Bari1 cluster in D. melanogaster. We have annotated 401 copies of Bari elements classified either as putatively autonomous or inactive according to the structure of the terminal sequences and the presence of a complete transposase-coding region. Analyses of the integration sites revealed that Bari transposase prefers AT-rich sequences in which the TA target is cleaved and duplicated. Furthermore evaluation of transposon’s co-occurrence near the integration sites of Bari elements showed a non-random distribution of other transposable elements. We also unveil the existence of a putatively autonomous Bari1 variant characterized by two identical long Terminal Inverted Repeats, in D. rhopaloa. In addition, we detected MITEs related to Bari transposons in 9 species. Phylogenetic analyses based on transposase gene and the terminal sequences confirmed that Bari-like elements are distributed into three subfamilies. A few inconsistencies in Bari phylogenetic tree with respect to the Drosophila species tree could be explained by the occurrence of horizontal transfer events as also suggested by the results of dS analyses. This study further clarifies the Bari transposon’s evolutionary dynamics and increases our understanding on the Tc1-mariner elements’ biology. PMID:27213270

  3. Whole genome sequencing of a natural recombinant Toxoplasma gondii strain reveals chromosome sorting and local allelic variants

    PubMed Central

    Bontell, Irene Lindström; Hall, Neil; Ashelford, Kevin E; Dubey, JP; Boyle, Jon P; Lindh, Johan; Smith, Judith E

    2009-01-01

    Background: Toxoplasma gondii is a zoonotic parasite of global importance. In common with many protozoan parasites it has the capacity for sexual recombination, but current evidence suggests this is rarely employed. The global population structure is dominated by a small number of clonal genotypes, which exhibit biallelic variation and limited intralineage divergence. Little is known of the genotypes present in Africa despite the importance of AIDS-associated toxoplasmosis. Results: We here present extensive sequence analysis of eight isolates from Uganda, including the whole genome sequencing of a type II/III recombinant isolate, TgCkUg2. 454 sequencing gave 84% coverage across the approximate 61 Mb genome and over 70,000 single nucleotide polymorphisms (SNPs) were mapped against reference strains. TgCkUg2 was shown to contain entire chromosomes of either type II or type III origin, demonstrating chromosome sorting rather than intrachromosomal recombination. We mapped 1,252 novel polymorphisms and clusters of new SNPs within coding sequence implied selective pressure on a number of genes, including surface antigens and rhoptry proteins. Further sequencing of the remaining isolates, six type II and one type III strain, confirmed the presence of novel SNPs, suggesting these are local allelic variants within Ugandan type II strains. In mice, the type III isolate had parasite burdens at least 30-fold higher than type II isolates, while the recombinant strain had an intermediate burden. Conclusions: Our data demonstrate that recombination between clonal lineages does occur in nature but there is nevertheless close homology between African and North American isolates. The quantity of high confidence SNP data generated in this study and the availability of the putative parental strains to this natural recombinant provide an excellent basis for future studies of the genetic divergence and of genotype-phenotype relationships. PMID:19457243

  4. Panel sequencing for clinically oriented variant screening and copy number detection in 142 untreated multiple myeloma patients

    PubMed Central

    Kortuem, K M; Braggio, E; Bruins, L; Barrio, S; Shi, C S; Zhu, Y X; Tibes, R; Viswanatha, D; Votruba, P; Ahmann, G; Fonseca, R; Jedlowski, P; Schlam, I; Kumar, S; Bergsagel, P L; Stewart, A K

    2016-01-01

    We employed a customized Multiple Myeloma (MM)-specific Mutation Panel (M3P) to screen a homogenous cohort of 142 untreated MM patients for relevant mutations in a selection of disease-specific genes. M3Pv2.0 includes 77 genes selected for being either actionable targets, potentially related to drug–response or part of known key pathways in MM biology. We identified mutations in potentially actionable genes in 49% of patients and provided prognostic evidence of STAT3 mutations. This panel may serve as a practical alternative to more comprehensive sequencing approaches, providing genomic information in a timely and cost-effective manner, thus allowing clinically oriented variant screening in MM. PMID:26918361

  5. Kinetic and Sequence-Structure-Function Analysis of LinB Enzyme Variants with β- and δ-Hexachlorocyclohexane

    PubMed Central

    Kumari, Kirti; Sharma, Pooja; Lal, Rup; Oakeshott, John G.; Pandey, Gunjan

    2014-01-01

    Organochlorine insecticide hexachlorocyclohexane (HCH) has recently been classified as a ‘Persistent Organic pollutant’ by the Stockholm Convention. The LinB haloalkane dehalogenase is a key upstream enzyme in the recently evolved Lin pathway for the catabolism of HCH in bacteria. Here we report a sequence-structure-function analysis of ten naturally occurring and thirteen synthetic mutants of LinB. One of the synthetic mutants was found to have ∼80 fold more activity for β- and δ-hexachlorocyclohexane. Based on detailed biophysical calculations, molecular dynamics and ensemble docking calculations, we propose that the latter variant is more active because of alterations to the shape of its active site and increased conformational plasticity. PMID:25076214

  6. Kinetic and sequence-structure-function analysis of LinB enzyme variants with β- and δ-hexachlorocyclohexane.

    PubMed

    Pandey, Rinku; Lucent, Del; Kumari, Kirti; Sharma, Pooja; Lal, Rup; Oakeshott, John G; Pandey, Gunjan

    2014-01-01

    Organochlorine insecticide hexachlorocyclohexane (HCH) has recently been classified as a 'Persistent Organic pollutant' by the Stockholm Convention. The LinB haloalkane dehalogenase is a key upstream enzyme in the recently evolved Lin pathway for the catabolism of HCH in bacteria. Here we report a sequence-structure-function analysis of ten naturally occurring and thirteen synthetic mutants of LinB. One of the synthetic mutants was found to have ∼80 fold more activity for β- and δ-hexachlorocyclohexane. Based on detailed biophysical calculations, molecular dynamics and ensemble docking calculations, we propose that the latter variant is more active because of alterations to the shape of its active site and increased conformational plasticity. PMID:25076214

  7. Exome sequencing identifies recessive CDK5RAP2 variants in patients with isolated agenesis of corpus callosum.

    PubMed

    Jouan, Loubna; Ouled Amar Bencheikh, Bouchra; Daoud, Hussein; Dionne-Laporte, Alexandre; Dobrzeniecka, Sylvia; Spiegelman, Dan; Rochefort, Daniel; Hince, Pascale; Szuto, Anna; Lassonde, Maryse; Barbelanne, Marine; Tsang, William Y; Dion, Patrick A; Théoret, Hugo; Rouleau, Guy A

    2016-04-01

    Agenesis of the corpus callosum (ACC) is a common brain malformation which can be observed either as an isolated condition or as part of numerous congenital syndromes. Therefore, cognitive and neurological involvements in patients with ACC are variable, from mild linguistic and behavioral impairments to more severe neurological deficits. To date, the underlying genetic causes of isolated ACC remain elusive and causative genes have yet to be identified. We performed exome sequencing on three acallosal siblings from the same non-consanguineous family and identified compound heterozygous variants, p.[Gly94Arg];[Asn1232Ser], in the protein encoded by the CDK5RAP2 gene, also known as MCPH3, a gene previously reported to cause autosomal recessive primary microcephaly. Our findings suggest a novel role for this gene in the pathogenesis of isolated ACC. PMID:26197979

  8. Genetic integrity of somaclonal variants in tea (Camellia sinensis (L.) O Kuntze) as revealed by inter simple sequence repeats.

    PubMed

    Thomas, Jibu; Vijayan, Deepu; Joshi, Sarvottam D; Joseph Lopez, S; Raj Kumar, R

    2006-05-17

    Adoption of the inter simple sequence repeats (ISSR) technique to analyze the genetic variability of somatic embryo derived tea plants was evaluated. Morphological characterisation of the field grown plants revealed no identical character aligning with the parent, UPASI-10. Out of 40 primers, 15 that exhibited concurrent polymorphism were selected for the study. Genetic variability of somaclones derived from single line cotyledonary culture ranged from 33.0 to 55.0%. A unique fragment of 1.2 kb was visible in the majority of the accessions, whereas fragments shorter than 0.6 kb were noticed in only 50% of the variants. Out of 120 interactions attempted using Pearson's correlation coefficient, only 9.2% of somaclones exhibited significant similarity at the genetic level. A dendrogram constructed based on the simple matching coefficient revealed a distance of 2.257-3.317 between the final clusters. This strengthens the case for wide genetic variation among the somaclones. PMID:16360228

  9. Complete Genome Sequence of the Porcine Epidemic Diarrhea Virus Variant CH/HNYF/2014.

    PubMed

    Li, Renfeng; Tian, Xiangqin; Qiao, Songlin; Guo, Junqing; Xie, Weitao; Zhang, Gaiping

    2015-01-01

    Sow's milk is a potential route for the vertical transmission of porcine epidemic diarrhea virus (PEDV) from sow to suckling piglet. We report here the complete genome sequence of PEDV strain CH/HNYF/2014, which was isolated from milk samples. This information provides further understanding of the transmission mechanisms and genetic diversity of PEDV. PMID:26679593

  10. The KL-VS sequence variant of Klotho and cancer risk in BRCA1 and BRCA2 mutation carriers

    PubMed Central

    Laitman, Yael; Kuchenbaecker, Karoline B.; Rantala, Johanna; Hogervorst, Frans; Peock, Susan; Godwin, Andrew K.; Arason, Adalgeir; Kirchhoff, Tomas; Offit, Kenneth; Isaacs, Claudine; Schmutzler, Rita K.; Wappenschmidt, Barbara; Nevanlinna, Heli; Chen, Xiaoqing; Chenevix-Trench, Georgia; Healey, Sue; Couch, Fergus; Peterlongo, Paolo; Radice, Paolo; Nathanson, Katherine L.; Caligo, Maria Adelaide; Neuhausen, Susan L.; Ganz, Patricia; Sinilnikova, Olga M.; McGuffog, Lesley; Easton, Douglas F.; Antoniou, Antonis C.; Wolf, Ido

    2012-01-01

    Klotho (KL) is a putative tumor suppressor gene in breast and pancreatic cancers located at chromosome 13q12. A functional sequence variant of Klotho (KL-VS) was previously reported to modify breast cancer risk in Jewish BRCA1 mutation carriers. The effect of this variant on breast and ovarian cancer risks in non-Jewish BRCA1/BRCA2 mutation carriers has not been reported. The KL-VS variant was genotyped in women of European ancestry carrying a BRCA mutation: 5,741 BRCA1 mutation carriers (2,997 with breast cancer, 705 with ovarian cancer, and 2,039 cancer free women) and 3,339 BRCA2 mutation carriers (1,846 with breast cancer, 207 with ovarian cancer, and 1,286 cancer free women) from 16 centers. Genotyping was accomplished using TaqMan® allelic discrimination or matrix-assisted laser desorption/ionization time-of-flight mass spectrometry. Data were analyzed within a retrospective cohort approach, stratified by country of origin and Ashkenazi Jewish origin. The per-allele hazard ratio (HR) for breast cancer was 1.02 (95% CI 0.93–1.12, P = 0.66) for BRCA1 mutation carriers and 0.92 (95% CI 0.82–1.04, P = 0.17) for BRCA2 mutation carriers. Results remained unaltered when analysis excluded prevalent breast cancer cases. Similarly, the per-allele HR for ovarian cancer was 1.01 (95% CI 0.84–1.20, P = 0.95) for BRCA1 mutation carriers and 0.9 (95% CI 0.66–1.22, P = 0.45) for BRCA2 mutation carriers. The risk did not change when carriers of the 6174delT mutation were excluded. There was a lack of association of the KL-VS Klotho variant with either breast or ovarian cancer risk in BRCA1 and BRCA2 mutation carriers. PMID:22212556

  11. Whole-genome re-sequencing for the identification of high contribution susceptibility gene variants in patients with type 2 diabetes

    PubMed Central

    SUN, XIAOJUAN; SUI, WEIGUO; WANG, XIAOBING; HOU, XIANLIANG; OU, MINGLIN; DAI, YONG; XIANG, YUEYING

    2016-01-01

    There is increasing evidence that several genes are associated with an increased risk of type 2 diabetes (T2D); genome-wide association investigations and whole-genome re-sequencing investigations offer a useful approach for the identification of genes involved in common human diseases. To further investigate which polymorphisms confer susceptibility to T2D, the present study screened for high-contribution susceptibility gene variants in Chinese patients with T2D using whole-genome re-sequencing with DNA pooling. In total, 100 Chinese individuals with T2D and 100 healthy Chinese individuals were analyzed using whole-genome re-sequencing with DNA pooling. To minimize the likelihood of systematic bias in sampling, paired-end libraries with an insert size of 500 bp were prepared for all samples, which were then subjected to whole-genome sequencing. Each library contained four lanes. The average sequencing depth was 35.70. In the present study, 1.36 GB of clean sequence data were generated, and the resulting calculated T2D genome consensus sequence covered 99.88% of the hg19 sequence. A total of 3,974,307 single nucleotide polymorphisms were identified, of which 99.88% were in the dbSNP database. The present study also found 642,189 insertions and deletions, 5,590 structural variants (SVs), 4,713 copy number variants (CNVs) and 13,049 single nucleotide variants. A total of 1,884 somatic CNVs and 74 somatic SVs were significantly different between the cases and controls. Therefore, the present study provided validation of whole-genome re-sequencing using the DNA pooling approach. It also generated a whole-genome re-sequencing genotype database for future investigations of T2D. PMID:27035118

  12. Whole-genome re-sequencing for the identification of high contribution susceptibility gene variants in patients with type 2 diabetes.

    PubMed

    Sun, Xiaojuan; Sui, Weiguo; Wang, Xiaobing; Hou, Xianliang; Ou, Minglin; Dai, Yong; Xiang, Yueying

    2016-05-01

    There is increasing evidence that several genes are associated with an increased risk of type 2 diabetes (T2D); genome-wide association investigations and whole-genome re-sequencing investigations offer a useful approach for the identification of genes involved in common human diseases. To further investigate which polymorphisms confer susceptibility to T2D, the present study screened for high-contribution susceptibility gene variants in Chinese patients with T2D using whole-genome re-sequencing with DNA pooling. In total, 100 Chinese individuals with T2D and 100 healthy Chinese individuals were analyzed using whole-genome re-sequencing with DNA pooling. To minimize the likelihood of systematic bias in sampling, paired-end libraries with an insert size of 500 bp were prepared for all samples, which were then subjected to whole-genome sequencing. Each library contained four lanes. The average sequencing depth was 35.70. In the present study, 1.36 GB of clean sequence data were generated, and the resulting calculated T2D genome consensus sequence covered 99.88% of the hg19 sequence. A total of 3,974,307 single nucleotide polymorphisms were identified, of which 99.88% were in the dbSNP database. The present study also found 642,189 insertions and deletions, 5,590 structural variants (SVs), 4,713 copy number variants (CNVs) and 13,049 single nucleotide variants. A total of 1,884 somatic CNVs and 74 somatic SVs were significantly different between the cases and controls. Therefore, the present study provided validation of whole-genome re-sequencing using the DNA pooling approach. It also generated a whole-genome re-sequencing genotype database for future investigations of T2D. PMID:27035118

  13. Genome-wide association study for endocrine fertility traits using single nucleotide polymorphism arrays and sequence variants in dairy cattle.

    PubMed

    Tenghe, A M M; Bouwman, A C; Berglund, B; Strandberg, E; de Koning, D J; Veerkamp, R F

    2016-07-01

    Endocrine fertility traits, which are defined from progesterone concentration levels in milk, are interesting indicators of dairy cow fertility because they more directly reflect the cow's own reproductive physiology than classical fertility traits, which are more biased by farm management decisions. The aim of this study was to detect quantitative trait loci (QTL) for 7 endocrine fertility traits in dairy cows by performing a genome-wide association study with 85k single nucleotide polymorphisms (SNP), and then fine-map targeted QTL regions, using imputed sequence variants. Two classical fertility traits were also analyzed for QTL with 85k SNP. The association between a SNP and a phenotype was assessed by single-locus regression for each SNP, using a linear mixed model that included a random polygenic effect. A total of 2,447 Holstein Friesian cows with 5,339 lactations with both phenotypes and genotypes were used for association analysis. Heritability estimates ranged from 0.09 to 0.15 for endocrine fertility traits and 0.03 to 0.10 for classical fertility traits. The genome-wide association study identified 17 QTL regions for endocrine fertility traits on Bos taurus autosomes (BTA) 2, 3, 8, 12, 15, 17, 23, and 25. The highest number (5) of QTL regions from the genome-wide association study was identified for the endocrine trait "proportion of samples with luteal activity." Overlapping QTL regions were found between endocrine traits on BTA 2, 3, and 17. For the classical trait calving to first service, 3 QTL regions were identified on BTA 3, 15, and 23, and an overlapping region was identified on BTA 23 with endocrine traits. Fine-mapping target regions for the endocrine traits on BTA 2 and 3 using imputed sequence variants confirmed the QTL from the genome-wide association study, and identified several associated variants that can contribute to an index of markers for genetic improvement of fertility. Several potential candidate genes underlying endocrine

  14. Sequence variants at CHRNB3-CHRNA6 and CYP2A6 affect smoking behavior

    PubMed Central

    Thorgeirsson, Thorgeir E.; Gudbjartsson, Daniel F.; Surakka, Ida; Vink, Jacqueline M.; Amin, Najaf; Geller, Frank; Sulem, Patrick; Rafnar, Thorunn; Esko, Tõnu; Walter, Stefan; Gieger, Christian; Rawal, Rajesh; Mangino, Massimo; Prokopenko, Inga; Mägi, Reedik; Keskitalo, Kaisu; Gudjonsdottir, Iris H.; Gretarsdottir, Solveig; Stefansson, Hreinn; Thompson, John R.; Aulchenko, Yurii S.; Nelis, Mari; Aben, Katja K.; den Heijer, Martin; Dirksen, Asger; Ashraf, Haseem; Soranzo, Nicole; Valdes, Ana M; Steves, Claire; Uitterlinden, André G; Hofman, Albert; Tönjes, Anke; Kovacs, Peter; Hottenga, Jouke Jan; Willemsen, Gonneke; Vogelzangs, Nicole; Döring, Angela; Dahmen, Norbert; Nitz, Barbara; Pergadia, Michele L.; Saez, Berta; De Diego, Veronica; Lezcano, Victoria; Garcia-Prats, Maria D.; Ripatti, Samuli; Perola, Markus; Kettunen, Johannes; Hartikainen, Anna-Liisa; Pouta, Anneli; Laitinen, Jaana; Isohanni, Matti; Huei-Yi, Shen; Allen, Maxine; Krestyaninova, Maria; Hall, Alistair S; Jones, Gregory T.; van Rij, Andre M.; Mueller, Thomas; Dieplinger, Benjamin; Haltmayer, Meinhard; Jonsson, Steinn; Matthiasson, Stefan E.; Oskarsson, Hogni; Tyrfingsson, Thorarinn; Kiemeney, Lambertus A.; Mayordomo, Jose I.; Lindholt, Jes S; Pedersen, Jesper Holst; Franklin, Wilbur A.; Wolf, Holly; Montgomery, Grant W.; Heath, Andrew C.; Martin, Nicholas G.; Madden, Pamela A.F.; Giegling, Ina; Rujescu, Dan; Järvelin, Marjo-Riitta; Salomaa, Veikko; Stumvoll, Michael; Spector, Tim D; Wichmann, H-Erich; Metspalu, Andres; Samani, Nilesh J.; Penninx, Brenda W.; Oostra, Ben A.; Boomsma, Dorret I.; Tiemeier, Henning; van Duijn, Cornelia M.; Kaprio, Jaakko; Gulcher, Jeffrey R.; McCarthy, Mark I.; Peltonen, Leena; Thorsteinsdottir, Unnur; Stefansson, Kari

    2011-01-01

    Smoking is a risk factor for most of the diseases leading in mortality [1]. We conducted genome-wide association (GWA) meta-analyses of smoking data within the ENGAGE consortium to search for common alleles associating with the number of cigarettes smoked per day (CPD) in smokers (N=31,266) and smoking initiation (N=46,481). We tested selected SNPs in a second stage (N=45,691 smokers), and assessed some in a third sample (N=9,040). Variants in three genomic regions associated with CPD (P < 5×10^-8), including previously identified SNPs at 15q25 represented by rs1051730-A (0.80 CPD, P = 2.4×10^-69), and SNPs at 19q13 and 8p11, represented by rs4105144-C (0.39 CPD, P = 2.2×10^-12) and rs6474412-T (0.29 CPD, P = 1.4×10^-8), respectively. Among the genes at the two novel loci are genes encoding nicotine-metabolizing enzymes (CYP2A6 and CYP2B6) and nicotinic acetylcholine receptor subunits (CHRNB3 and CHRNA6) highlighted in previous studies of nicotine dependence [2-3]. Nominal associations with lung cancer were observed at both 8p11 (rs6474412-T, OR = 1.09, P = 0.04) and 19q13 (rs4105144-C, OR = 1.12, P = 0.0006). PMID:20418888

  15. Amino acid sequence prerequisites for the formation of cn ions.

    PubMed

    Downard, K M; Biemann, K

    1993-11-01

    Amino acid sequence prerequisites are described for the formation of cn ions observed in high-energy collision-induced decomposition spectra of peptides. It is shown that the formation of cn ions is promoted by the nature of the amino acid C-terminal to the cleavage site. A propensity for cn cleavage preceding threonine, and to a lesser extent tryptophan, lysine, and serine, is demonstrated where fragmentation is directed N-terminally at these residues. In addition, the nature of the residue N-terminal to the cleavage site is shown to have little effect on cn ion formation. A mechanism for cn ion formation is proposed and its applicability to the results observed is discussed. PMID:24227531

  16. Ultrasensitive nucleic acid sequence detection by single-molecule electrophoresis

    SciTech Connect

    Castro, A; Shera, E.B.

    1996-09-01

    This is the final report of a one-year laboratory-directed research and development project at Los Alamos National Laboratory. There has been considerable interest in the development of very sensitive clinical diagnostic techniques over the last few years. Many pathogenic agents are often present in extremely small concentrations in clinical samples, especially at the initial stages of infection, making their detection very difficult. This project sought to develop a new technique for the detection and accurate quantification of specific bacterial and viral nucleic acid sequences in clinical samples. The scheme involved the use of novel hybridization probes for the detection of nucleic acids combined with our recently developed technique of single-molecule electrophoresis. This project is directly relevant to the DOE's Defense Programs strategic directions in the area of biological warfare counter-proliferation.

  17. Exome Sequencing of Only Seven Qataris Identifies Potentially Deleterious Variants in the Qatari Population

    PubMed Central

    Rodriguez-Flores, Juan L.; Fuller, Jennifer; Hackett, Neil R.; Salit, Jacqueline; Malek, Joel A.; Al-Dous, Eman; Chouchane, Lotfi; Zirie, Mahmoud; Jayoussi, Amin; Mahmoud, Mai A.; Crystal, Ronald G.; Mezey, Jason G.

    2012-01-01

    The Qatari population, located at the Arabian migration crossroads of Africa and Eurasia, is comprised of Bedouin, Persian and African genetic subgroups. By deep exome sequencing of only 7 Qataris, including individuals in each subgroup, we identified 2,750 nonsynonymous SNPs predicted to be deleterious, many of which are linked to human health, or are in genes linked to human health. Many of these SNPs were at significantly elevated deleterious allele frequency in Qataris compared to other populations worldwide. Despite the small sample size, SNP allele frequency was highly correlated with a larger Qatari sample. Together, the data demonstrate that exome sequencing of only a small number of individuals can reveal genetic variations with potential health consequences in understudied populations. PMID:23139751

  18. Optimal unified approach for rare-variant association testing with application to small-sample case-control whole-exome sequencing studies.

    PubMed

    Lee, Seunggeun; Emond, Mary J; Bamshad, Michael J; Barnes, Kathleen C; Rieder, Mark J; Nickerson, Deborah A; Christiani, David C; Wurfel, Mark M; Lin, Xihong

    2012-08-10

    We propose in this paper a unified approach for testing the association between rare variants and phenotypes in sequencing association studies. This approach maximizes power by adaptively using the data to optimally combine the burden test and the nonburden sequence kernel association test (SKAT). Burden tests are more powerful when most variants in a region are causal and the effects are in the same direction, whereas SKAT is more powerful when a large fraction of the variants in a region are noncausal or the effects of causal variants are in different directions. The proposed unified test maintains the power in both scenarios. We show that the unified test corresponds to the optimal test in an extended family of SKAT tests, which we refer to as SKAT-O. The second goal of this paper is to develop a small-sample adjustment procedure for the proposed methods for the correction of conservative type I error rates of SKAT family tests when the trait of interest is dichotomous and the sample size is small. Both small-sample-adjusted SKAT and the optimal unified test (SKAT-O) are computationally efficient and can easily be applied to genome-wide sequencing association studies. We evaluate the finite sample performance of the proposed methods using extensive simulation studies and illustrate their application using the acute-lung-injury exome-sequencing data of the National Heart, Lung, and Blood Institute Exome Sequencing Project. PMID:22863193
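
    The relationship between the burden test and SKAT described above can be sketched numerically. The following is a minimal illustration, not the authors' implementation: it assumes an intercept-only null model, unit variant weights, and made-up toy data, and it computes only the test statistics. The combined statistic is written as Q_rho = (1 - rho) * Q_SKAT + rho * Q_burden; the p-value calculation and the search over rho that SKAT-O performs are omitted. All function names are hypothetical.

```python
def score_statistics(genotypes, phenotypes, weights):
    """Return (Q_burden, Q_SKAT) for one region.

    genotypes:  list of per-sample rows, one allele count per variant
    phenotypes: list of trait values, one per sample
    weights:    list of per-variant weights (unit weights assumed in the demo)
    """
    n = len(phenotypes)
    # Assumption: null model with an intercept only (no covariates).
    mean_y = sum(phenotypes) / n
    residuals = [y - mean_y for y in phenotypes]

    # Per-variant score: S_j = sum_i G_ij * (y_i - mean_y)
    scores = [
        sum(genotypes[i][j] * residuals[i] for i in range(n))
        for j in range(len(weights))
    ]

    # Burden: collapse the weighted scores first, then square.
    q_burden = sum(w * s for w, s in zip(weights, scores)) ** 2
    # SKAT: square each weighted score first, then sum.
    q_skat = sum((w * s) ** 2 for w, s in zip(weights, scores))
    return q_burden, q_skat


def combined_statistic(q_burden, q_skat, rho):
    """Q_rho = (1 - rho) * Q_SKAT + rho * Q_burden, with 0 <= rho <= 1."""
    return (1.0 - rho) * q_skat + rho * q_burden


# Toy data: two variants whose effects point in opposite directions.
toy_genotypes = [[1, 0], [0, 1], [0, 0], [0, 0]]
toy_phenotypes = [1, 0, 1, 0]
toy_weights = [1.0, 1.0]

q_b, q_s = score_statistics(toy_genotypes, toy_phenotypes, toy_weights)
```

    In this toy region the two variant scores cancel in the burden statistic but not in the SKAT statistic, which mirrors the statement that SKAT retains power when causal variants act in different directions.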

  19. Optimal Unified Approach for Rare-Variant Association Testing with Application to Small-Sample Case-Control Whole-Exome Sequencing Studies

    PubMed Central

    Lee, Seunggeun; Emond, Mary J.; Bamshad, Michael J.; Barnes, Kathleen C.; Rieder, Mark J.; Nickerson, Deborah A.; Christiani, David C.; Wurfel, Mark M.; Lin, Xihong

    2012-01-01

    We propose in this paper a unified approach for testing the association between rare variants and phenotypes in sequencing association studies. This approach maximizes power by adaptively using the data to optimally combine the burden test and the nonburden sequence kernel association test (SKAT). Burden tests are more powerful when most variants in a region are causal and the effects are in the same direction, whereas SKAT is more powerful when a large fraction of the variants in a region are noncausal or the effects of causal variants are in different directions. The proposed unified test maintains the power in both scenarios. We show that the unified test corresponds to the optimal test in an extended family of SKAT tests, which we refer to as SKAT-O. The second goal of this paper is to develop a small-sample adjustment procedure for the proposed methods for the correction of conservative type I error rates of SKAT family tests when the trait of interest is dichotomous and the sample size is small. Both small-sample-adjusted SKAT and the optimal unified test (SKAT-O) are computationally efficient and can easily be applied to genome-wide sequencing association studies. We evaluate the finite sample performance of the proposed methods using extensive simulation studies and illustrate their application using the acute-lung-injury exome-sequencing data of the National Heart, Lung, and Blood Institute Exome Sequencing Project. PMID:22863193

  20. cDNA sequences of variant forms of human placenta diamine oxidase

    SciTech Connect

    Zhang, X.; Kim, J.; McIntire, S.

    1995-08-01

    Genes for two forms of human placenta diamine oxidase (dao) were cloned from a cDNA library and sequenced. One gene, pdao1, is identical in length to human kidney dao but differs from it by two bases in the coding region and differs slightly in the 3'- and 5'-noncoding regions. The second gene, pdao2, is nearly identical to these genes in the coding region, except that it has an extra 57-nucleotide coding segment near the 3' end of this region. This segment corresponds to the contiguous sequence of the 3' end of intron 3 of human kidney dao. pdao2 also differs significantly from pdao1 and human kidney dao in a 13-base sequence in the t'-noncoding region. It is proposed that pdao1 and human kidney dao are polymorphic forms of the same allele. Whether pdao2 is a polymorph of these two is not certain, because of the significant differences in the coding and noncoding regions. pdao2 may represent a different allele. 21 refs., 2 figs.

  1. Identification of Functional Variants for Cleft Lip with or without Cleft Palate in or near PAX7, FGFR2, and NOG by Targeted Sequencing of GWAS Loci

    PubMed Central

    Leslie, Elizabeth J.; Taub, Margaret A.; Liu, Huan; Steinberg, Karyn Meltz; Koboldt, Daniel C.; Zhang, Qunyuan; Carlson, Jenna C.; Hetmanski, Jacqueline B.; Wang, Hang; Larson, David E.; Fulton, Robert S.; Kousa, Youssef A.; Fakhouri, Walid D.; Naji, Ali; Ruczinski, Ingo; Begum, Ferdouse; Parker, Margaret M.; Busch, Tamara; Standley, Jennifer; Rigdon, Jennifer; Hecht, Jacqueline T.; Scott, Alan F.; Wehby, George L.; Christensen, Kaare; Czeizel, Andrew E.; Deleyiannis, Frederic W.-B.; Schutte, Brian C.; Wilson, Richard K.; Cornell, Robert A.; Lidral, Andrew C.; Weinstock, George M.; Beaty, Terri H.; Marazita, Mary L.; Murray, Jeffrey C.

    2015-01-01

    Although genome-wide association studies (GWASs) for nonsyndromic orofacial clefts have identified multiple strongly associated regions, the causal variants are unknown. To address this, we selected 13 regions from GWASs and other studies, performed targeted sequencing in 1,409 Asian and European trios, and carried out a series of statistical and functional analyses. Within a cluster of strongly associated common variants near NOG, we found that one, rs227727, disrupts enhancer activity. We furthermore identified significant clusters of non-coding rare variants near NTN1 and NOG and found several rare coding variants likely to affect protein function, including four nonsense variants in ARHGAP29. We confirmed 48 de novo mutations and, based on best biological evidence available, chose two of these for functional assays. One mutation in PAX7 disrupted the DNA binding of the encoded transcription factor in an in vitro assay. The second, a non-coding mutation, disrupted the activity of a neural crest enhancer downstream of FGFR2 both in vitro and in vivo. This targeted sequencing study provides strong functional evidence implicating several specific variants as primary contributory risk alleles for nonsyndromic clefting in humans. PMID:25704602

  2. Association of Low-Frequency and Rare Coding-Sequence Variants with Blood Lipids and Coronary Heart Disease in 56,000 Whites and Blacks

    PubMed Central

    Peloso, Gina M.; Auer, Paul L.; Bis, Joshua C.; Voorman, Arend; Morrison, Alanna C.; Stitziel, Nathan O.; Brody, Jennifer A.; Khetarpal, Sumeet A.; Crosby, Jacy R.; Fornage, Myriam; Isaacs, Aaron; Jakobsdottir, Johanna; Feitosa, Mary F.; Davies, Gail; Huffman, Jennifer E.; Manichaikul, Ani; Davis, Brian; Lohman, Kurt; Joon, Aron Y.; Smith, Albert V.; Grove, Megan L.; Zanoni, Paolo; Redon, Valeska; Demissie, Serkalem; Lawson, Kim; Peters, Ulrike; Carlson, Christopher; Jackson, Rebecca D.; Ryckman, Kelli K.; Mackey, Rachel H.; Robinson, Jennifer G.; Siscovick, David S.; Schreiner, Pamela J.; Mychaleckyj, Josyf C.; Pankow, James S.; Hofman, Albert; Uitterlinden, Andre G.; Harris, Tamara B.; Taylor, Kent D.; Stafford, Jeanette M.; Reynolds, Lindsay M.; Marioni, Riccardo E.; Dehghan, Abbas; Franco, Oscar H.; Patel, Aniruddh P.; Lu, Yingchang; Hindy, George; Gottesman, Omri; Bottinger, Erwin P.; Melander, Olle; Orho-Melander, Marju; Loos, Ruth J.F.; Duga, Stefano; Merlini, Piera Angelica; Farrall, Martin; Goel, Anuj; Asselta, Rosanna; Girelli, Domenico; Martinelli, Nicola; Shah, Svati H.; Kraus, William E.; Li, Mingyao; Rader, Daniel J.; Reilly, Muredach P.; McPherson, Ruth; Watkins, Hugh; Ardissino, Diego; Zhang, Qunyuan; Wang, Judy; Tsai, Michael Y.; Taylor, Herman A.; Correa, Adolfo; Griswold, Michael E.; Lange, Leslie A.; Starr, John M.; Rudan, Igor; Eiriksdottir, Gudny; Launer, Lenore J.; Ordovas, Jose M.; Levy, Daniel; Chen, Y.-D. Ida; Reiner, Alexander P.; Hayward, Caroline; Polasek, Ozren; Deary, Ian J.; Borecki, Ingrid B.; Liu, Yongmei; Gudnason, Vilmundur; Wilson, James G.; van Duijn, Cornelia M.; Kooperberg, Charles; Rich, Stephen S.; Psaty, Bruce M.; Rotter, Jerome I.; O’Donnell, Christopher J.; Rice, Kenneth; Boerwinkle, Eric; Kathiresan, Sekar; Cupples, L. Adrienne

    2014-01-01

    Low-frequency coding DNA sequence variants in the proprotein convertase subtilisin/kexin type 9 gene (PCSK9) lower plasma low-density lipoprotein cholesterol (LDL-C), protect against risk of coronary heart disease (CHD), and have prompted the development of a new class of therapeutics. It is uncertain whether the PCSK9 example represents a paradigm or an isolated exception. We used the “Exome Array” to genotype >200,000 low-frequency and rare coding sequence variants across the genome in 56,538 individuals (42,208 European ancestry [EA] and 14,330 African ancestry [AA]) and tested these variants for association with LDL-C, high-density lipoprotein cholesterol (HDL-C), and triglycerides. Although we did not identify new genes associated with LDL-C, we did identify four low-frequency (frequencies between 0.1% and 2%) variants (ANGPTL8 rs145464906 [c.361C>T; p.Gln121∗], PAFAH1B2 rs186808413 [c.482C>T; p.Ser161Leu], COL18A1 rs114139997 [c.331G>A; p.Gly111Arg], and PCSK7 rs142953140 [c.1511G>A; p.Arg504His]) with large effects on HDL-C and/or triglycerides. None of these four variants was associated with risk for CHD, suggesting that examples of low-frequency coding variants with robust effects on both lipids and CHD will be limited. PMID:24507774

  3. Association of low-frequency and rare coding-sequence variants with blood lipids and coronary heart disease in 56,000 whites and blacks.

    PubMed

    Peloso, Gina M; Auer, Paul L; Bis, Joshua C; Voorman, Arend; Morrison, Alanna C; Stitziel, Nathan O; Brody, Jennifer A; Khetarpal, Sumeet A; Crosby, Jacy R; Fornage, Myriam; Isaacs, Aaron; Jakobsdottir, Johanna; Feitosa, Mary F; Davies, Gail; Huffman, Jennifer E; Manichaikul, Ani; Davis, Brian; Lohman, Kurt; Joon, Aron Y; Smith, Albert V; Grove, Megan L; Zanoni, Paolo; Redon, Valeska; Demissie, Serkalem; Lawson, Kim; Peters, Ulrike; Carlson, Christopher; Jackson, Rebecca D; Ryckman, Kelli K; Mackey, Rachel H; Robinson, Jennifer G; Siscovick, David S; Schreiner, Pamela J; Mychaleckyj, Josyf C; Pankow, James S; Hofman, Albert; Uitterlinden, Andre G; Harris, Tamara B; Taylor, Kent D; Stafford, Jeanette M; Reynolds, Lindsay M; Marioni, Riccardo E; Dehghan, Abbas; Franco, Oscar H; Patel, Aniruddh P; Lu, Yingchang; Hindy, George; Gottesman, Omri; Bottinger, Erwin P; Melander, Olle; Orho-Melander, Marju; Loos, Ruth J F; Duga, Stefano; Merlini, Piera Angelica; Farrall, Martin; Goel, Anuj; Asselta, Rosanna; Girelli, Domenico; Martinelli, Nicola; Shah, Svati H; Kraus, William E; Li, Mingyao; Rader, Daniel J; Reilly, Muredach P; McPherson, Ruth; Watkins, Hugh; Ardissino, Diego; Zhang, Qunyuan; Wang, Judy; Tsai, Michael Y; Taylor, Herman A; Correa, Adolfo; Griswold, Michael E; Lange, Leslie A; Starr, John M; Rudan, Igor; Eiriksdottir, Gudny; Launer, Lenore J; Ordovas, Jose M; Levy, Daniel; Chen, Y-D Ida; Reiner, Alexander P; Hayward, Caroline; Polasek, Ozren; Deary, Ian J; Borecki, Ingrid B; Liu, Yongmei; Gudnason, Vilmundur; Wilson, James G; van Duijn, Cornelia M; Kooperberg, Charles; Rich, Stephen S; Psaty, Bruce M; Rotter, Jerome I; O'Donnell, Christopher J; Rice, Kenneth; Boerwinkle, Eric; Kathiresan, Sekar; Cupples, L Adrienne

    2014-02-01

    Low-frequency coding DNA sequence variants in the proprotein convertase subtilisin/kexin type 9 gene (PCSK9) lower plasma low-density lipoprotein cholesterol (LDL-C), protect against risk of coronary heart disease (CHD), and have prompted the development of a new class of therapeutics. It is uncertain whether the PCSK9 example represents a paradigm or an isolated exception. We used the "Exome Array" to genotype >200,000 low-frequency and rare coding sequence variants across the genome in 56,538 individuals (42,208 European ancestry [EA] and 14,330 African ancestry [AA]) and tested these variants for association with LDL-C, high-density lipoprotein cholesterol (HDL-C), and triglycerides. Although we did not identify new genes associated with LDL-C, we did identify four low-frequency (frequencies between 0.1% and 2%) variants (ANGPTL8 rs145464906 [c.361C>T; p.Gln121*], PAFAH1B2 rs186808413 [c.482C>T; p.Ser161Leu], COL18A1 rs114139997 [c.331G>A; p.Gly111Arg], and PCSK7 rs142953140 [c.1511G>A; p.Arg504His]) with large effects on HDL-C and/or triglycerides. None of these four variants was associated with risk for CHD, suggesting that examples of low-frequency coding variants with robust effects on both lipids and CHD will be limited. PMID:24507774

  4. VisCap: inference and visualization of germ-line copy-number variants from targeted clinical sequencing data

    PubMed Central

    Pugh, Trevor J.; Amr, Sami S.; Bowser, Mark J.; Gowrisankar, Sivakumar; Hynes, Elizabeth; Mahanta, Lisa M.; Rehm, Heidi L.; Funke, Birgit; Lebo, Matthew S.

    2016-01-01

    Purpose: To develop and validate VisCap, a software program targeted to clinical laboratories for inference and visualization of germ-line copy-number variants (CNVs) from targeted next-generation sequencing data. Genet Med 18 7, 712–719. Methods: VisCap calculates the fraction of overall sequence coverage assigned to genomic intervals and computes log2 ratios of these values to the median of reference samples profiled using the same test configuration. Candidate CNVs are called when log2 ratios exceed user-defined thresholds. Results: We optimized VisCap using 14 cases with known CNVs, followed by prospective analysis of 1,104 cases referred for diagnostic DNA sequencing. To verify calls in the prospective cohort, we used droplet digital polymerase chain reaction (PCR) to confirm 10/27 candidate CNVs and 72/72 copy-neutral genomic regions scored by VisCap. We also used a genome-wide bead array to confirm the absence of CNV calls across panels applied to 10 cases. To improve specificity, we instituted a visual scoring system that enabled experienced reviewers to differentiate true-positive from false-positive calls with minimal impact on laboratory workflow. Conclusions: VisCap is a sensitive method for inferring CNVs from targeted sequence data from targeted gene panels. Visual scoring of data underlying CNV calls is a critical step to reduce false-positive calls for follow-up testing. PMID:26681316
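
    The Methods step in this abstract can be sketched in a few lines of Python. This is a minimal illustration of the coverage-fraction and log2-ratio idea only, not VisCap's actual code: the function names, the toy coverage values, and the gain/loss thresholds (0.40 and -0.55) are assumptions made for the example.

```python
import math
import statistics

def coverage_fractions(coverage):
    """Fraction of a sample's total coverage that falls in each interval."""
    total = sum(coverage)
    return [c / total for c in coverage]

def log2_ratios(sample_coverage, reference_coverages):
    """log2 of each interval's fraction over the median reference fraction."""
    sample = coverage_fractions(sample_coverage)
    refs = [coverage_fractions(r) for r in reference_coverages]
    medians = [statistics.median(column) for column in zip(*refs)]
    return [math.log2(s / m) for s, m in zip(sample, medians)]

def call_cnvs(ratios, gain=0.40, loss=-0.55):
    """Flag intervals whose log2 ratio crosses the (hypothetical) thresholds."""
    calls = []
    for index, ratio in enumerate(ratios):
        if ratio >= gain:
            calls.append((index, "gain"))
        elif ratio <= loss:
            calls.append((index, "loss"))
    return calls

# Toy data (hypothetical): four intervals, three reference samples.
references = [[100, 100, 100, 100], [200, 200, 200, 200], [90, 90, 90, 90]]
sample = [100, 100, 50, 100]  # third interval has half the coverage

ratios = log2_ratios(sample, references)
print(call_cnvs(ratios))  # [(2, 'loss')]
```

    In this toy case the halved interval also raises the other intervals' fractions slightly (log2 ratio of about 0.19), which stays below the assumed gain threshold. Intervals with zero coverage would need separate handling, since log2 of zero is undefined.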

  5. Expression of PROKR1 and PROKR2 in Human Enteric Neural Precursor Cells and Identification of Sequence Variants Suggest a Role in HSCR

    PubMed Central

    Ruiz-Ferrer, Macarena; Torroglosa, Ana; Núñez-Torres, Rocío; de Agustín, Juan Carlos; Antiñolo, Guillermo; Borrego, Salud

    2011-01-01

    Background: The enteric nervous system (ENS) is entirely derived from neural crest and its normal development is regulated by specific molecular pathways. Failure in complete ENS formation results in aganglionic gut conditions such as Hirschsprung's disease (HSCR). Recently, PROKR1 expression has been demonstrated in mouse enteric neural crest derived cells and Prok-1 was shown to work coordinately with GDNF in the development of the ENS. Principal Findings: In the present report, ENS progenitors were isolated and characterized from the ganglionic gut from children diagnosed with and without HSCR, and the expression of prokineticin receptors was examined. Immunocytochemical analysis of neurosphere-forming cells demonstrated that both PROKR1 and PROKR2 were present in human enteric neural crest cells. In addition, we also performed a mutational analysis of PROKR1, PROKR2, PROK1 and PROK2 genes in a cohort of HSCR patients, evaluating them for the first time as susceptibility genes for the disease. Several missense variants were detected, most of them affecting highly conserved amino acid residues of the protein and located in functional domains of both receptors, which suggests a possible deleterious effect on their biological function. Conclusions: Our results suggest that not only PROKR1, but also PROKR2 might mediate complementary signalling to the RET/GFRα1/GDNF pathway supporting proliferation/survival and differentiation of precursor cells during ENS development. These findings, together with the detection of sequence variants in PROKR1, PROK1 and PROKR2 genes associated with HSCR and, in some cases, in combination with RET or GDNF mutations, provide the first evidence to consider them as susceptibility genes for HSCR. PMID:21858136

  6. Genome wide association study of uric acid in Indian population and interaction of identified variants with Type 2 diabetes

    PubMed Central

    Giri, Anil K; Banerjee, Priyanka; Chakraborty, Shraddha; Kauser, Yasmeen; Undru, Aditya; Roy, Suki; Parekatt, Vaisak; Ghosh, Saurabh; Tandon, Nikhil; Bharadwaj, Dwaipayan

    2016-01-01

    An abnormal level of Serum Uric Acid (SUA) is an important marker and risk factor for complex diseases including Type 2 Diabetes. Since the genetic determinants of uric acid in Indians are entirely unexplored, we sought to identify common variants associated with SUA in Indians using a Genome Wide Association Study (GWAS). Association of five known variants in SLC2A9 and SLC22A11 genes with SUA level in 4,834 normoglycemics (1,109 in discovery and 3,725 in validation phase) was revealed, with different effect sizes in Indians compared to other major ethnic populations of the world. Combined analysis of 1,077 T2DM subjects (772 in discovery and 305 in validation phase) and normoglycemics revealed an additional GWAS signal in the ABCG2 gene. Differences in effect sizes of ABCG2 and SLC2A9 gene variants were observed between normoglycemics and T2DM patients. We identified two novel variants near long non-coding RNA genes AL356739.1 and AC064865.1 with nearly genome wide significance level. Meta-analysis and in silico replication in 11,745 individuals from AUSTWIN consortium improved association for rs12206002 in AL356739.1 gene to sub-genome wide association level. Our results extend the association of SLC2A9, SLC22A11 and ABCG2 genes with SUA level to Indians and add to the evidence for an interrelationship between SUA level and T2DM. PMID:26902266

  7. The pH low insertion peptide pHLIP Variant 3 as a novel marker of acidic malignant lesions

    PubMed Central

    Tapmeier, Thomas T.; Moshnikova, Anna; Beech, John; Allen, Danny; Kinchesh, Paul; Smart, Sean; Harris, Adrian; McIntyre, Alan; Engelman, Donald M.; Andreev, Oleg A.; Reshetnyak, Yana K.; Muschel, Ruth J.

    2015-01-01

    Current strategies for early detection of breast and other cancers are limited in part because some lesions identified as potentially malignant do not develop into aggressive tumors. Acid pH has been suggested as a key characteristic of aggressive tumors that might distinguish aggressive lesions from more indolent pathology. We therefore investigated the novel class of molecules, pH low insertion peptides (pHLIPs), as markers of low pH in tumor allografts and of malignant lesions in a mouse model of spontaneous breast cancer, BALB/neu-T. pHLIP Variant 3 (Var3) conjugated with fluorescent Alexa546 was shown to insert into tumor spheroids in a sequence-specific manner. Its signal reflected pH in murine tumors. It was induced by carbonic anhydrase IX (CAIX) overexpression and inhibited by acetazolamide (AZA) administration. By using 31P magnetic resonance spectroscopy (MRS), we demonstrated that pHLIP Var3 was retained in tumors of pH equal to or less than 6.7 but not in tissues of higher pH. In BALB/neu-T mice at different stages of the disease, the fluorescent signal from pHLIP Var3 marked cancerous lesions with a very low false-positive rate. However, only ∼60% of the smallest lesions retained a pHLIP Var3 signal, suggesting heterogeneity in pH. Taken together, these results show that pHLIP can identify regions of lower pH, allowing for its development as a theranostic tool for clinical applications. PMID:26195776

  8. Sequence variants at CHRNB3-CHRNA6 and CYP2A6 affect smoking behavior.

    PubMed

    Thorgeirsson, Thorgeir E; Gudbjartsson, Daniel F; Surakka, Ida; Vink, Jacqueline M; Amin, Najaf; Geller, Frank; Sulem, Patrick; Rafnar, Thorunn; Esko, Tõnu; Walter, Stefan; Gieger, Christian; Rawal, Rajesh; Mangino, Massimo; Prokopenko, Inga; Mägi, Reedik; Keskitalo, Kaisu; Gudjonsdottir, Iris H; Gretarsdottir, Solveig; Stefansson, Hreinn; Thompson, John R; Aulchenko, Yurii S; Nelis, Mari; Aben, Katja K; den Heijer, Martin; Dirksen, Asger; Ashraf, Haseem; Soranzo, Nicole; Valdes, Ana M; Steves, Claire; Uitterlinden, André G; Hofman, Albert; Tönjes, Anke; Kovacs, Peter; Hottenga, Jouke Jan; Willemsen, Gonneke; Vogelzangs, Nicole; Döring, Angela; Dahmen, Norbert; Nitz, Barbara; Pergadia, Michele L; Saez, Berta; De Diego, Veronica; Lezcano, Victoria; Garcia-Prats, Maria D; Ripatti, Samuli; Perola, Markus; Kettunen, Johannes; Hartikainen, Anna-Liisa; Pouta, Anneli; Laitinen, Jaana; Isohanni, Matti; Huei-Yi, Shen; Allen, Maxine; Krestyaninova, Maria; Hall, Alistair S; Jones, Gregory T; van Rij, Andre M; Mueller, Thomas; Dieplinger, Benjamin; Haltmayer, Meinhard; Jonsson, Steinn; Matthiasson, Stefan E; Oskarsson, Hogni; Tyrfingsson, Thorarinn; Kiemeney, Lambertus A; Mayordomo, Jose I; Lindholt, Jes S; Pedersen, Jesper Holst; Franklin, Wilbur A; Wolf, Holly; Montgomery, Grant W; Heath, Andrew C; Martin, Nicholas G; Madden, Pamela A F; Giegling, Ina; Rujescu, Dan; Järvelin, Marjo-Riitta; Salomaa, Veikko; Stumvoll, Michael; Spector, Tim D; Wichmann, H-Erich; Metspalu, Andres; Samani, Nilesh J; Penninx, Brenda W; Oostra, Ben A; Boomsma, Dorret I; Tiemeier, Henning; van Duijn, Cornelia M; Kaprio, Jaakko; Gulcher, Jeffrey R; McCarthy, Mark I; Peltonen, Leena; Thorsteinsdottir, Unnur; Stefansson, Kari

    2010-05-01

    Smoking is a common risk factor for many diseases. We conducted genome-wide association meta-analyses for the number of cigarettes smoked per day (CPD) in smokers (n = 31,266) and smoking initiation (n = 46,481) using samples from the ENGAGE Consortium. In a second stage, we tested selected SNPs with in silico replication in the Tobacco and Genetics (TAG) and Glaxo Smith Kline (Ox-GSK) consortia cohorts (n = 45,691 smokers) and assessed some of those in a third sample of European ancestry (n = 9,040). Variants in three genomic regions associated with CPD (P < 5 × 10^-8), including previously identified SNPs at 15q25 represented by rs1051730[A] (effect size = 0.80 CPD, P = 2.4 × 10^-69), and SNPs at 19q13 and 8p11, represented by rs4105144[C] (effect size = 0.39 CPD, P = 2.2 × 10^-12) and rs6474412[T] (effect size = 0.29 CPD, P = 1.4 × 10^-8), respectively. Among the genes at the two newly associated loci are genes encoding nicotine-metabolizing enzymes (CYP2A6 and CYP2B6) and nicotinic acetylcholine receptor subunits (CHRNB3 and CHRNA6), all of which have been highlighted in previous studies of smoking and nicotine dependence. Nominal associations with lung cancer were observed at both 8p11 (rs6474412[T], odds ratio (OR) = 1.09, P = 0.04) and 19q13 (rs4105144[C], OR = 1.12, P = 0.0006). PMID:20418888

  9. TMC-SNPdb: an Indian germline variant database derived from whole exome sequences.

    PubMed

    Upadhyay, Pawan; Gardi, Nilesh; Desai, Sanket; Sahoo, Bikram; Singh, Ankita; Togar, Trupti; Iyer, Prajish; Prasad, Ratnam; Chandrani, Pratik; Gupta, Sudeep; Dutt, Amit

    2016-01-01

    Cancer is predominantly a somatic disease. A mutant allele present in a cancer cell genome is considered somatic when it is absent from the paired normal genome and from public SNP databases. The current build of dbSNP, the most comprehensive public SNP database, however, inadequately represents several non-European Caucasian populations, posing a limitation in cancer genomic analyses of data from these populations. We present the Tata Memorial Centre-SNP DataBase (TMC-SNPdb) as the first open source, flexible, upgradable, and freely available SNP database (accessible through dbSNP build 149 and ANNOVAR), representing 114 309 unique germline variants generated from whole exome data of 62 normal samples derived from cancer patients of Indian origin. The TMC-SNPdb is presented with a companion subtraction tool that can be executed from the command line or through an easy-to-use graphical user interface, with the ability to deplete additional Indian population specific SNPs over and above dbSNP and 1000 Genomes databases. Using an institutionally generated whole exome data set of 132 samples of Indian origin, we demonstrate that TMC-SNPdb could deplete 42, 33 and 28% false positive somatic events post dbSNP depletion in Indian origin tongue, gallbladder, and cervical cancer samples, respectively. Beyond cancer somatic analyses, we anticipate utility of the TMC-SNPdb in several Mendelian germline diseases. In addition to dbSNP build 149 and ANNOVAR, the TMC-SNPdb along with the subtraction tool is available for download in the public domain at the following: Database URL: http://www.actrec.gov.in/pi-webpages/AmitDutt/TMCSNP/TMCSNPdp.html. PMID:27402678
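
    The subtraction step described in this abstract amounts to a set difference on variant keys. The sketch below assumes variants are keyed by (chromosome, position, reference allele, alternate allele); the function name and the toy variants are hypothetical and do not reflect the TMC-SNPdb tool's actual interface.

```python
def deplete_germline(tumor_variants, paired_normal, *germline_databases):
    """Keep tumor variants absent from the paired normal and every database."""
    known_germline = set(paired_normal)
    for database in germline_databases:
        known_germline.update(database)
    return [v for v in tumor_variants if v not in known_germline]

# Toy variants (hypothetical): (chromosome, position, ref allele, alt allele).
tumor = [
    ("chr1", 1000, "A", "G"),
    ("chr2", 2000, "C", "T"),
    ("chr3", 3000, "G", "A"),
    ("chr4", 4000, "T", "C"),
]
normal = [("chr1", 1000, "A", "G")]
public_db = {("chr2", 2000, "C", "T")}      # stands in for a public SNP database
population_db = {("chr3", 3000, "G", "A")}  # stands in for a population-specific set

after_public = deplete_germline(tumor, normal, public_db)
after_population = deplete_germline(tumor, normal, public_db, population_db)
print(len(after_public), len(after_population))  # 2 1
```

    Adding the population-specific set removes one more candidate in this toy example, which mirrors the kind of extra depletion the abstract reports after dbSNP filtering.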

  10. Association analysis of bovine Foxa2 gene single sequence variant and haplotype combinations with growth traits in Chinese cattle.

    PubMed

    Liu, Mei; Li, Mijie; Wang, Shaoqiang; Xu, Yao; Lan, Xianyong; Li, Zhuanjian; Lei, Chuzhao; Yang, Dongying; Jia, Yutang; Chen, Hong

    2014-02-25

    Forkhead box A2 (Foxa2) has been recognized as one of the most potent transcriptional activators implicated in the control of feeding behavior and energy homeostasis. However, research on the effects of genetic variations of the Foxa2 gene on growth traits is lacking. Therefore, this study detected Foxa2 gene polymorphisms by DNA pool sequencing, PCR-RFLP and PCR-ACRS methods in 822 individuals from three Chinese cattle breeds. Four sequence variants (SVs) were identified, including two mutations (SV1, g. 7005 C>T and SV2, g. 7044 C>G) in intron 4, one mutation (SV3, g. 8449 A>G) in exon 5 and one mutation (SV4, g. 8537 T>C) in the 3'UTR. Notably, association analysis of the single mutations with growth traits in all individuals (at 24 months) revealed statistically significant differences for the four SVs, and the SV4 locus was highly significantly associated with growth traits in all three breeds (P<0.05 or P<0.01). Meanwhile, haplotype combination CCCCAGTC was also associated with better chest girth and body weight in Jiaxian Red cattle (P<0.05). This is the first comprehensive study of the variability of the bovine Foxa2 gene, and it points to candidate molecular markers for cattle breeding. PMID:24333857

  11. Genomic variants of genes associated with three horticultural traits in apple revealed by genome re-sequencing

    PubMed Central

    Zhang, Shijie; Chen, Weiping; Xin, Lu; Gao, Zhihong; Hou, Yingjun; Yu, Xinyi; Zhang, Zhen; Qu, Shenchun

    2014-01-01

    The apple (Malus × domestica Borkh.) cultivar ‘Su Shuai’ exhibits greater disease resistance, shorter internodes and lighter fruit flavor compared with its parents ‘Golden Delicious’ and ‘Indo’. To obtain a comprehensive overview of the sequence variation in these three horticultural traits, the genomes of ‘Su Shuai’ and ‘Indo’ were resequenced using next-generation sequencing and compared to the genome of ‘Golden Delicious’. A wide range of genetic variations were detected, including 2 454 406 and 18 749 349 single nucleotide polymorphisms (SNPs) and 59 547 and 50 143 structural variants (SVs) in the ‘Indo’ and ‘Su Shuai’ genomes, respectively. Among the SVs in ‘Su Shuai’, 17 genes related to disease resistance, 10 genes related to Gibberellin (GA) and 19 genes associated with fruit flavor were identified. The expression patterns of eight of the SV genes were examined using reverse transcription-quantitative polymerase chain reaction (RT-qPCR). The results of this study illustrate the genomic variation in these cultivars and provide evidence for a genetic basis for the horticultural traits of disease resistance, short internodes and lighter flavor exhibited in these cultivars. These results provide a genetic basis for the phenotypic characteristics of ‘Su Shuai’ and, as such, these SVs could serve as gene-specific molecular markers in marker-assisted breeding of apples. PMID:26504548

  12. ADAM19 and HTR4 Variants and Pulmonary Function: The Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) Targeted Sequencing Study

    PubMed Central

    London, Stephanie J.; Gao, Wei; Gharib, Sina A.; Hancock, Dana B.; Wilk, Jemma B.; House, John S.; Gibbs, Richard A.; Muzny, Donna M.; Lumley, Thomas; Franceschini, Nora; North, Kari E.; Psaty, Bruce M.; Kovar, Christie L.; Coresh, Josef; Zhou, Yanhua; Heckbert, Susan R.; Brody, Jennifer A.; Morrison, Alanna C.; Dupuis, Josée

    2014-01-01

    Background: The pulmonary function measures of forced expiratory volume in one second (FEV1) and its ratio to forced vital capacity (FVC) are used in the diagnosis and monitoring of lung diseases and predict cardiovascular mortality in the general population. Genome wide association studies (GWAS) have identified numerous loci associated with FEV1 and FEV1/FVC but the causal variants remain uncertain. We hypothesized that novel or rare variants poorly tagged by GWAS may explain the significant associations between FEV1/FVC and two genes: ADAM19 and HTR4. Methods and Results: We sequenced ADAM19 and its promoter region along with the approximately 21 kb portion of HTR4 harboring GWAS SNPs for pulmonary function and analyzed associations with FEV1/FVC among 3,983 participants of European ancestry from Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE). Meta-analysis of common variants in each region identified statistically significant associations (316 tests, P < 1.58×10^−4) with FEV1/FVC for 14 ADAM19 SNPs and 24 HTR4 SNPs. After conditioning on the sentinel GWAS hit in each gene [ADAM19 rs1422795, minor allele frequency (MAF)=0.33 and HTR4 rs11168048, MAF=0.40] one SNP remained statistically significant (ADAM19 rs13155908, MAF = 0.12, P = 1.56×10^−4). Analysis of rare variants (MAF < 1%) using Sequence Kernel Association Test did not identify associations with either region. Conclusions: Sequencing identified one common variant associated with FEV1/FVC independently of the sentinel ADAM19 GWAS hit and supports the original HTR4 GWAS findings. Rare variants do not appear to underlie GWAS associations with pulmonary function for common variants in ADAM19 and HTR4. PMID:24951661
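
    The significance cutoff quoted in this abstract is consistent with a Bonferroni correction of a 0.05 family-wise error rate across the 316 tests. The 0.05 level is an assumption here, since the abstract states only the number of tests and the resulting cutoff. A short check, with a hypothetical helper name:

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance cutoff under a Bonferroni correction."""
    return alpha / n_tests

# alpha = 0.05 is assumed; 316 tests is taken from the abstract.
threshold = bonferroni_threshold(0.05, 316)
print(f"{threshold:.2e}")   # 1.58e-04

# The conditional P value reported for ADAM19 rs13155908 falls just under it.
print(1.56e-4 < threshold)  # True
```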

  13. Wildlife sequences of islet amyloid polypeptide (IAPP) identify critical species variants for fibrillization.

    PubMed

    Fortin, Jessica S; Benoit-Biancamano, Marie-Odile

    2015-01-01

    Amyloid can be detected in the islets of Langerhans in a majority of type 2 diabetic patients. These deposits have been associated with β-cell death, thereby furthering diabetes progression. Islet amyloid polypeptide (IAPP) amyloidogenicity is quite variable among animal species, and studying this variability could further our understanding of the mechanisms involved in the aggregation process. Thus, the general aim of this study was to identify IAPP isoforms in different animal species and characterize their propensity to form fibrillar aggregates. A library of 23 peptides (fragment 8-32) was designed to study the amyloid formation using in silico analysis and in vitro assays. Amyloid formation was impeded when the NFLVH motif found in segment 8-20 was substituted by DFLGR or KFLIR segments. The 29P, 14K and 18R substitutions were often present in non-amyloidogenic sequences. Non-amyloidogenic sequences were obtained from Leontopithecus rosalia, Tursiops truncatus and Vicugna pacos. Fragment peptides from 34 species were amyloidogenic. To conclude, this project advances our knowledge on the comparative pathogenesis of amyloidosis in type II diabetes. It is conceivable that the additional information gained may help point towards new therapeutic strategies for diabetes patients. PMID:26300107

  14. Whole-exome sequencing identifies rare, functional CFH variants in families with macular degeneration

    PubMed Central

    Yu, Yi; Triebwasser, Michael P.; Wong, Edwin K. S.; Schramm, Elizabeth C.; Thomas, Brett; Reynolds, Robyn; Mardis, Elaine R.; Atkinson, John P.; Daly, Mark; Raychaudhuri, Soumya; Kavanagh, David; Seddon, Johanna M.

    2014-01-01

    We sequenced the whole exome of 35 cases and 7 controls from 9 age-related macular degeneration (AMD) families in whom known common genetic risk alleles could not explain their high disease burden and/or their early-onset advanced disease. Two families harbored novel rare mutations in CFH (R53C and D90G). R53C segregates perfectly with AMD in 11 cases (heterozygous) and 1 elderly control (reference allele) (LOD = 5.07, P = 6.7 × 10^−7). In an independent cohort, 4 out of 1676 cases but none of the 745 examined controls or 4300 NHLBI Exome Sequencing Project (ESP) samples carried the R53C mutation (P = 0.0039). In another family of six siblings, D90G similarly segregated with AMD in five cases and one control (LOD = 1.22, P = 0.009). No other sample in our large cohort or the ESP had this mutation. Functional studies demonstrated that R53C decreased the ability of FH to perform decay accelerating activity. D90G exhibited a decrease in cofactor-mediated inactivation. Both of these changes would lead to a loss of regulatory activity, resulting in excessive alternative pathway activation. This study represents an initial application of the whole-exome strategy to families with early-onset AMD. It successfully identified high impact alleles leading to clearer functional insight into AMD etiopathogenesis. PMID:24847005

  15. Whole-exome sequencing identifies rare, functional CFH variants in families with macular degeneration.

    PubMed

    Yu, Yi; Triebwasser, Michael P; Wong, Edwin K S; Schramm, Elizabeth C; Thomas, Brett; Reynolds, Robyn; Mardis, Elaine R; Atkinson, John P; Daly, Mark; Raychaudhuri, Soumya; Kavanagh, David; Seddon, Johanna M

    2014-10-01

    We sequenced the whole exome of 35 cases and 7 controls from 9 age-related macular degeneration (AMD) families in whom known common genetic risk alleles could not explain their high disease burden and/or their early-onset advanced disease. Two families harbored novel rare mutations in CFH (R53C and D90G). R53C segregates perfectly with AMD in 11 cases (heterozygous) and 1 elderly control (reference allele) (LOD = 5.07, P = 6.7 × 10^-7). In an independent cohort, 4 out of 1676 cases but none of the 745 examined controls or 4300 NHLBI Exome Sequencing Project (ESP) samples carried the R53C mutation (P = 0.0039). In another family of six siblings, D90G similarly segregated with AMD in five cases and one control (LOD = 1.22, P = 0.009). No other sample in our large cohort or the ESP had this mutation. Functional studies demonstrated that R53C decreased the ability of FH to perform decay accelerating activity. D90G exhibited a decrease in cofactor-mediated inactivation. Both of these changes would lead to a loss of regulatory activity, resulting in excessive alternative pathway activation. This study represents an initial application of the whole-exome strategy to families with early-onset AMD. It successfully identified high impact alleles leading to clearer functional insight into AMD etiopathogenesis. PMID:24847005

  16. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology.

    PubMed

    Richards, Sue; Aziz, Nazneen; Bale, Sherri; Bick, David; Das, Soma; Gastier-Foster, Julie; Grody, Wayne W; Hegde, Madhuri; Lyon, Elaine; Spector, Elaine; Voelkerding, Karl; Rehm, Heidi L

    2015-05-01

    The American College of Medical Genetics and Genomics (ACMG) previously developed guidance for the interpretation of sequence variants.(1) In the past decade, sequencing technology has evolved rapidly with the advent of high-throughput next-generation sequencing. By adopting and leveraging next-generation sequencing, clinical laboratories are now performing an ever-increasing catalogue of genetic testing spanning genotyping, single genes, gene panels, exomes, genomes, transcriptomes, and epigenetic assays for genetic disorders. By virtue of increased complexity, this shift in genetic testing has been accompanied by new challenges in sequence interpretation. In this context the ACMG convened a workgroup in 2013 comprising representatives from the ACMG, the Association for Molecular Pathology (AMP), and the College of American Pathologists to revisit and revise the standards and guidelines for the interpretation of sequence variants. The group consisted of clinical laboratory directors and clinicians. This report represents expert opinion of the workgroup with input from ACMG, AMP, and College of American Pathologists stakeholders. These recommendations primarily apply to the breadth of genetic tests used in clinical laboratories, including genotyping, single genes, panels, exomes, and genomes. This report recommends the use of specific standard terminology-"pathogenic," "likely pathogenic," "uncertain significance," "likely benign," and "benign"-to describe variants identified in genes that cause Mendelian disorders. Moreover, this recommendation describes a process for classifying variants into these five categories based on criteria using typical types of variant evidence (e.g., population data, computational data, functional data, segregation data). Because of the increased complexity of analysis and interpretation of clinical genetic testing described in this report, the ACMG strongly recommends that clinical molecular genetic testing should be performed in a

  17. Full genome sequence analysis of a wild, non-MLV-related type 2 Hungarian PRRSV variant isolated in Europe.

    PubMed

    Balka, Gyula; Wang, Xiong; Olasz, Ferenc; Bálint, Ádám; Kiss, István; Bányai, Krisztián; Rusvai, Miklós; Stadejek, Tomasz; Marthaler, Douglas; Murtaugh, Michael P; Zádori, Zoltán

    2015-03-16

    Porcine reproductive and respiratory syndrome virus (PRRSV) is a widespread pathogen of pigs causing significant economic losses to the swine industry. The expanding diversity of PRRSV strains makes the diagnosis, control and eradication of the disease more and more difficult. In the present study, the authors report the full genome sequencing of a type 2 PRRSV strain isolated from piglet carcasses in Hungary. Next generation sequencing was used to determine the complete genome sequence of the isolate (PRRSV-2/Hungary/102/2012). Recombination analysis performed with the available full-length genome sequences showed no evidence of recombination with other known PRRSV strains. Unique deletions and an insertion were found in the nsp2 region of PRRSV-2/Hungary/102/2012 when it was compared to the highly virulent VR2332 and JXA-1 prototype strains. The majority of amino acid alterations in GP4 and GP5 of the virus were in the known antigenic regions, suggesting an important role for immunological pressure in PRRSV-2/Hungary/102/2012 evolution. Phylogenetic analysis revealed that it belongs to lineage 1 or 2 of type 2 PRRSV. Considering the lack of related PRRSV in Europe, except for a partial sequence from Slovakia, the ancestor of PRRSV-2/Hungary/102/2012 was most probably transported from North America. It is the first documented type 2 PRRSV isolated in Europe that is not related to the Ingelvac MLV. PMID:25616050

  18. Association of low-frequency and rare coding-sequence variants with blood lipids and Coronary Heart Disease in 56,000 whites and blacks

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Low-frequency coding DNA sequence variants in the proprotein convertase subtilisin/kexin type 9 gene (PCSK9) lower plasma low-density lipoprotein cholesterol (LDL-C), protect against risk of coronary heart disease (CHD), and have prompted the development of a new class of therapeutics. It is uncerta...

  19. Absorption of Amino Acids and Peptides in a Child with a Variant of Hartnup Disease and Coexistent Coeliac Disease

    PubMed Central

    Tarlow, M. J.; Seakins, J. W. T.; Lloyd, June K.; Matthews, D. M.; Cheng, B.; Thomas, A. J.

    1972-01-01

    A child with a variant of Hartnup disease and co-existent coeliac disease is described. Oral tolerance tests with L-histidine, L-tyrosine, and glycyl-L-tyrosine, and in vitro uptake studies on a small intestinal biopsy with L-histidine and glycyl-L-histidine, showed impaired absorption of the free amino acids, and showed that absorption of tyrosine and mucosal uptake of histidine was better from the dipeptides than from the free amino acids. This supports the hypothesis that the intestinal mucosa can take up small peptides intact, and that the peptide uptake mechanism is not involved in the intestinal defect of Hartnup disease. PMID:5086513

  20. Clinical Implementation of Germline Cancer Pharmacogenetic Variants during the Next-Generation Sequencing Era

    PubMed Central

    Gillis, Nancy K.; Patel, Jai N.; Innocenti, Federico

    2014-01-01

    Over 100 FDA-approved medications include pharmacogenetic biomarkers in the drug label, many with cancer indications referencing germline DNA variations. With the advent of next-generation sequencing (NGS) and its rapidly increasing uptake into cancer research and clinical practice, an enormous amount of data to inform documented gene-drug associations will be collected, which must be exploited to optimize patient benefit. This state-of-the-art article focuses on the implementation of germline cancer pharmacogenetics into clinical practice. Specifically, it discusses the importance of germline variation in cancer and the role of NGS in pharmacogenetic discovery and implementation. In the context of a scenario where massive NGS-based genetic information will be increasingly available to health stakeholders, this review explores the ongoing debate over the threshold of evidence necessary for implementation, provides an overview of recommendations in cancer by professional organizations and regulatory bodies, discusses limitations of current guidelines and strategies to improve third-party coverage. PMID:24136381

  1. Novel variants in MLL confer to bladder cancer recurrence identified by whole-exome sequencing

    PubMed Central

    Wang, Yongqiang; Huang, Yi; Liu, Huan; Li, Feida; He, Luyun; Sun, Da; Yu, Yuan; Li, Qiaoling; Huang, Peide; Zhang, Meng; Zhao, Xin; Bi, Tengteng; Zhuang, Xuehan; Zhang, Liyan; Lu, Jingxiao; Sun, Xiaojuan; Zhou, Fangjian; Liu, Chunxiao; Yang, Guosheng; Hou, Yong; Fan, Zusen; Cai, Zhiming

    2016-01-01

    Bladder cancer (BC) is distinguished by a high rate of recurrence after surgery, but the underlying mechanisms remain poorly understood. Here we performed whole-exome sequencing of 37 BC individuals including 20 primary and 17 recurrent samples in which the primary and recurrent samples were not from the same patient. We found that MLL, EP400, PRDM2, ANK3 and CHD5 were altered exclusively in recurrent BCs. Specifically, the recurrent BCs and bladder cancer cells with MLL mutation displayed increased histone H3 tri-methyl K4 (H3K4me3) modification at the tissue and cell levels and showed enhanced expression of the downstream genes GATA4 and ETS1. In addition, MLL-mutated bladder cancer cells generated with CRISPR/Cas9 showed greater resistance to epirubicin (a chemotherapy drug for bladder cancer) than wild-type cells. Additionally, BC patients with high expression of GATA4 and ETS1 had a significantly shorter lifespan than patients with low expression. Our study provided an overview of the genetic basis of recurrent bladder cancer and discovered that genetic alterations of MLL were involved in BC relapse. The increased H3K4me3 modification and the increased expression of GATA4 and ETS1 may be promising targets for the diagnosis and therapy of relapsed bladder cancer. PMID:26625313

  2. Analysis of 'Fuji' apple somatic variants from next-generation sequencing.

    PubMed

    Lee, H S; Kim, G H; Kwon, S I; Kim, J H; Kwon, Y S; Choi, C

    2016-01-01

    The domesticated apple (Malus x domestica Borkh.) is a major fruit crop of temperate regions of the world. 'Fuji' apple (Ralls Genet x Delicious), a famous apple cultivar in Korea, has been very popular since its promotion in Japan in 1958. 'Fuji' and its bud mutant cultivars possess variable levels of genetic diversity. Nonetheless, the phenotypes of each group, which are classified into the bud mutation groups: early season, fruiting spur, and coloring, are similar. Despite attempts to identify these bud mutation cultivars, molecular markers, which were developed before the emergence of next-generation sequencing technology, have not been able to distinguish each cultivar easily. In this study, we adopted the resequencing technique using the 'Golden Delicious' (Grimes Golden x Unknown) apple genome as a reference. SNPs (single nucleotide polymorphisms) and InDels (insertions or deletions) of 'Fuji' apple and its bud mutant cultivar were detected and SNPs and unique InDels distinct to each cultivar were identified. Data from this study may be used to identify bud mutant cultivars of 'Fuji' apples and be useful for further breeding of apples. PMID:27525934
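
    The abstract describes identifying SNPs and InDels that are distinct to each cultivar. A minimal sketch of that comparison step, assuming each cultivar's calls against the reference are already available as a set of (chromosome, position, ref, alt) tuples; the function name and the example calls are hypothetical and are not taken from the study:

```python
def private_variants(calls_by_cultivar):
    """Return, for each cultivar, the variant calls found in no other cultivar.

    calls_by_cultivar maps a cultivar name to a set of
    (chromosome, position, ref_allele, alt_allele) tuples.
    """
    private = {}
    for name, calls in calls_by_cultivar.items():
        # Union of the calls made in every other cultivar.
        others = set().union(
            *(c for other, c in calls_by_cultivar.items() if other != name)
        )
        private[name] = calls - others
    return private


# Hypothetical calls, for illustration only.
example_calls = {
    "Fuji": {("chr1", 100, "A", "G"), ("chr2", 50, "T", "TA")},
    "BudMutantA": {("chr1", 100, "A", "G"), ("chr3", 7, "C", "T")},
}
unique_calls = private_variants(example_calls)
```

    Variants shared by all cultivars drop out, leaving only candidate markers that could separate one cultivar from the rest.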

  3. Systematic Identification of Single Amino Acid Variants in Glioma Stem-Cell-Derived Chromosome 19 Proteins

    PubMed Central

    2015-01-01

    Novel proteoforms with single amino acid variations represent proteins that often have altered biological functions but are less explored in the human proteome. We have developed an approach, searching high quality shotgun proteomic data against an extended protein database, to identify expressed mutant proteoforms in glioma stem cell (GSC) lines. The systematic search of MS/MS spectra using PEAKS 7.0 as the search engine has recognized 17 chromosome 19 proteins in GSCs with altered amino acid sequences. The results were further verified by manual spectral examination, validating 19 proteoforms. One of the novel findings, a mutant form of branched-chain aminotransferase 2 (p.Thr186Arg), was verified at the transcript level and by targeted proteomics in several glioma stem cell lines. The structure of this proteoform was examined by molecular modeling in order to estimate conformational changes due to mutation that might lead to functional modifications potentially linked to glioma. Based on our initial findings, we believe that our approach presented could contribute to construct a more complete map of the human functional proteome. PMID:25399873

  4. Systematic identification of single amino acid variants in glioma stem-cell-derived chromosome 19 proteins.

    PubMed

    Lichti, Cheryl F; Mostovenko, Ekaterina; Wadsworth, Paul A; Lynch, Gillian C; Pettitt, B Montgomery; Sulman, Erik P; Wang, Qianghu; Lang, Frederick F; Rezeli, Melinda; Marko-Varga, György; Végvári, Ákos; Nilsson, Carol L

    2015-02-01

    Novel proteoforms with single amino acid variations represent proteins that often have altered biological functions but are less explored in the human proteome. We have developed an approach, searching high quality shotgun proteomic data against an extended protein database, to identify expressed mutant proteoforms in glioma stem cell (GSC) lines. The systematic search of MS/MS spectra using PEAKS 7.0 as the search engine has recognized 17 chromosome 19 proteins in GSCs with altered amino acid sequences. The results were further verified by manual spectral examination, validating 19 proteoforms. One of the novel findings, a mutant form of branched-chain aminotransferase 2 (p.Thr186Arg), was verified at the transcript level and by targeted proteomics in several glioma stem cell lines. The structure of this proteoform was examined by molecular modeling in order to estimate conformational changes due to mutation that might lead to functional modifications potentially linked to glioma. Based on our initial findings, we believe that our approach presented could contribute to construct a more complete map of the human functional proteome. PMID:25399873

  5. 37 CFR 1.822 - Symbols and format to be used for nucleotide and/or amino acid sequence data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... in the sequence. (4) The enumeration of amino acids may start at the first amino acid of the first..., counting backwards starting with the amino acid next to number 1. Otherwise, the enumeration of amino acids... sequence every 5 amino acids. The enumeration method for amino acid sequences that is set forth......

  6. 37 CFR 1.822 - Symbols and format to be used for nucleotide and/or amino acid sequence data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... in the sequence. (4) The enumeration of amino acids may start at the first amino acid of the first..., counting backwards starting with the amino acid next to number 1. Otherwise, the enumeration of amino acids... sequence every 5 amino acids. The enumeration method for amino acid sequences that is set forth......

  7. 37 CFR 1.822 - Symbols and format to be used for nucleotide and/or amino acid sequence data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... in the sequence. (4) The enumeration of amino acids may start at the first amino acid of the first..., counting backwards starting with the amino acid next to number 1. Otherwise, the enumeration of amino acids... sequence every 5 amino acids. The enumeration method for amino acid sequences that is set forth......

  8. IGF2BP2 Alternative Variants Associated with Glutamic Acid Decarboxylase Antibodies Negative Diabetes in Malaysian Subjects

    PubMed Central

    Salem, Sameer D.; Saif-Ali, Riyadh; Ismail, Ikram S.; Al-Hamodi, Zaid; Poh, Rozaida; Muniandy, Sekaran

    2012-01-01

    Background: The association of insulin-like growth factor 2 mRNA-binding protein 2 (IGF2BP2) common variants (rs4402960 and rs1470579) with type 2 diabetes (T2D) has been examined in different populations. The aim of this study was to evaluate the association of alternative variants of IGF2BP2 (rs6777038, rs16860234 and rs7651090) with glutamic acid decarboxylase antibody (GADA)-negative diabetes in Malaysian subjects. Methods/Principal Findings: The IGF2BP2 single nucleotide polymorphisms (SNPs) rs6777038, rs16860234 and rs7651090 were genotyped in 1107 GADA-negative diabetic patients and 620 Asian control subjects from Malaysia. The additive genetic model adjusted for age, race, gender and BMI showed that the alternative variants rs6777038, rs16860234 and rs7651090 of IGF2BP2 were associated with GADA-negative diabetes (OR = 1.21, 1.36 and 1.35; P = 0.03, 0.0004 and 0.0002, respectively). In addition, the CCG haplotype and the CCG-TCG diplotype increased the risk of diabetes (OR = 1.51, P = 0.01; OR = 2.36, P = 0.009, respectively). Conclusions/Significance: IGF2BP2 alternative variants were associated with GADA-negative diabetes. The IGF2BP2 haplotypes and diplotypes increased the risk of diabetes in Malaysian subjects. PMID:23029108

  9. Arg287Gln VARIANT OF EPHX2 AND EPOXYEICOSATRIENOIC ACIDS ARE ASSOCIATED WITH INSULIN SENSITIVITY IN HUMANS

    PubMed Central

    Ramirez, Claudia E.; Shuey, Megan M.; Milne, Ginger L.; Gilbert, Kimberly; Hui, Nian; Yu, Chang; Luther, James M.; Brown, Nancy J.

    2014-01-01

    Epoxyeicosatrienoic acids (EETs) protect against the development of insulin resistance in rodents. EETs are hydrolyzed to less biologically active diols by soluble epoxide hydrolase (encoded by EPHX2). Functional variants of EPHX2 encode enzymes with increased (Lys55Arg) or decreased (Arg287Gln) hydrolase activity. This study tested the hypothesis that variants of EPHX2 are associated with insulin sensitivity or secretion in humans. Subjects participating in metabolic phenotyping studies were genotyped. Eighty-five subjects underwent hyperglycemic clamps. There was no relationship between the Lys55Arg genotype and insulin sensitivity or secretion. In contrast, the EPHX2 287Gln variant was associated with a higher insulin sensitivity index (p=0.019 controlling for body mass index and metabolic syndrome). Also, there was an interactive effect of EPHX2 Arg287Gln genotype and body mass index on insulin sensitivity index (p=0.029). There was no relationship between EPHX2 Arg287Gln genotype and acute or late-phase glucose-stimulated insulin secretion, but disposition index was higher in 287Gln carriers compared with Arg/Arg (p=0.022). Plasma EETs correlated with insulin sensitivity index (r=0.64, p=0.015 for total EETs) and were decreased in the metabolic syndrome. A genetic variant that results in decreased soluble epoxide hydrolase activity is associated with increased insulin sensitivity, as are higher EETs. PMID:25173047

  10. Predicting protein disorder by analyzing amino acid sequence

    PubMed Central

    Yang, Jack Y; Yang, Mary Qu

    2008-01-01

    Background: Many protein regions and some entire proteins have no definite tertiary structure, presenting instead as dynamic, disordered ensembles under different physicochemical circumstances. These proteins and regions are known as Intrinsically Unstructured Proteins (IUP). IUP have been associated with a wide range of protein functions, along with roles in diseases characterized by protein misfolding and aggregation. Results: Identifying IUP is an important task in structural and functional genomics. We extract useful features from sequences and develop machine learning algorithms for this task. We compare our IUP predictor with PONDRs (mainly neural-network-based predictors), disEMBL (also based on neural networks) and Globplot (based on disorder propensity). Conclusion: We find that augmenting features derived from physicochemical properties of amino acids (such as hydrophobicity, complexity etc.) and using an ensemble method proved beneficial. The IUP predictor is a viable alternative software tool for identifying IUP protein regions and proteins. PMID:18831799
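
    The abstract mentions sequence features derived from hydrophobicity and complexity but does not say how they were computed. As a rough illustration only, the sketch below computes two such features per sliding window: mean hydropathy on the Kyte-Doolittle scale and Shannon entropy as a simple complexity measure. The choice of scale, the window width and the function names are assumptions, not the authors' method:

```python
import math
from collections import Counter

# Kyte-Doolittle hydropathy values; using this particular scale is an assumption.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def shannon_entropy(window):
    """Shannon entropy (bits) of the residue composition of a window."""
    counts = Counter(window)
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def window_features(sequence, width=5):
    """Return a (mean hydropathy, entropy) pair for every sliding window."""
    features = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        hydropathy = sum(KYTE_DOOLITTLE[aa] for aa in window) / width
        features.append((hydropathy, shannon_entropy(window)))
    return features
```

    Low-entropy, low-hydropathy windows are the kind of signal a downstream classifier could use; how such features are weighted or combined is outside what the abstract states.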

  11. Transcriptome sequencing reveals a profile that corresponds to genomic variants in Waldenström macroglobulinemia.

    PubMed

    Hunter, Zachary R; Xu, Lian; Yang, Guang; Tsakmaklis, Nicholas; Vos, Josephine M; Liu, Xia; Chen, Jie; Manning, Robert J; Chen, Jiaji G; Brodsky, Philip; Patterson, Christopher J; Gustine, Joshua; Dubeau, Toni; Castillo, Jorge J; Anderson, Kenneth C; Munshi, Nikhil M; Treon, Steven P

    2016-08-11

    Whole-genome sequencing has identified highly prevalent somatic mutations including MYD88, CXCR4, and ARID1A in Waldenström macroglobulinemia (WM). The impact of these and other somatic mutations on transcriptional regulation in WM remains to be clarified. We performed next-generation transcriptional profiling in 57 WM patients and compared findings to healthy donor B cells. Compared with healthy donors, WM patient samples showed greatly enhanced expression of the VDJ recombination genes DNTT, RAG1, and RAG2, but not AICDA. Genes related to CXCR4 signaling were also upregulated and included CXCR4, CXCL12, and VCAM1 regardless of CXCR4 mutation status, indicating a potential role for CXCR4 signaling in all WM patients. The WM transcriptional profile was equally dissimilar to healthy memory B cells and circulating B cells, likely due to increased differentiation rather than cellular origin. The profile for CXCR4 mutations corresponded to diminished B-cell differentiation and suppression of tumor suppressors upregulated by MYD88 mutations in a manner associated with the suppression of TLR4 signaling relative to those mutated for MYD88 alone. Promoter methylation studies of top findings failed to explain this suppressive effect but identified aberrant methylation patterns in MYD88 wild-type patients. CXCR4 and MYD88 transcription were negatively correlated, demonstrated allele-specific transcription bias, and, along with CXCL13, were associated with bone marrow disease involvement. Distinct gene expression profiles for patients with wild-type MYD88, mutated ARID1A, familial predisposition to WM, chr6q deletions, chr3q amplifications, and trisomy 4 are also described. The findings provide novel insights into the molecular pathogenesis and opportunities for targeted therapeutic strategies for WM. PMID:27301862

  12. Association of E26 Transformation Specific Sequence 1 Variants with Rheumatoid Arthritis in Chinese Han Population

    PubMed Central

    Yang, Bin; Cai, Bei; Su, Zhenzhen; Wang, Lanlan

    2015-01-01

    Objective: E26 transformation specific sequence 1 (ETS-1) belongs to the ETS family of transcription factors that regulate the expression of various immune-related genes. Increasing evidence indicates that ETS-1 could contribute to the pathogenesis of autoimmune disease. Recent research has suggested that ETS-1 might be correlated with rheumatoid arthritis (RA), but the relationship is not clearly defined. In this study, we aimed to identify whether polymorphisms of ETS-1 play a role in RA susceptibility and development in the Chinese Han population. Methods: Four single nucleotide polymorphisms (SNPs) within ETS-1 were selected based on HapMap data and previous association studies. Whole blood and serum samples were obtained from 158 patients with RA and 192 healthy subjects. Genotyping was performed with a polymerase chain reaction-high resolution melting (PCR-HRM) assay and the data were analyzed using SPSS 17.0. Results: A significant positive correlation was observed between the SNP rs73013527 of ETS-1 and RA susceptibility, DAS28 and CRP (P<0.001, P = 0.001, and P = 0.028, respectively). Carriage of the haplotype CCT or TCT for rs4937333, rs11221332 and rs73013527 was associated with a decreased risk of RA as compared to controls. No statistically significant difference was observed in the distribution of rs10893872, rs4937333 and rs11221332 genotypes between RA patients and controls. Conclusions: Our data further support that ETS-1 has a relevant role in the pathogenesis and development of RA. Allele T of rs73013527 plays a protective role in the occurrence of RA but is a risk factor for high disease activity. The SNPs rs10893872, rs11221332 and rs4937333 are not associated with RA susceptibility or clinical features. PMID:26241881

  13. Data on the evolutionary history of the V(D)J recombination-activating protein 1 - RAG1 coupled with sequence and variant analyses.

    PubMed

    Kumar, Abhishek; Bhandari, Anita; Sarde, Sandeep J; Muppavarapu, Sekhar; Tandon, Ravi

    2016-09-01

    RAG1 protein is one of the key components of the RAG complex regulating V(D)J recombination. There are only a few studies of RAG1 concerning its evolutionary history, detailed sequence and mutational hotspots. Herein, we present our datasets used for the recent comprehensive study of RAG1 based on sequence, phylogenetic and genetic variant analyses (Kumar et al., 2015) [1]. Protein sequence alignment helped in characterizing the conserved domains and regions of RAG1. It also aided in unraveling ancestral RAG1 in the sea urchin. Human genetic variant analyses revealed 751 mutational hotspots, located both in the coding and the non-coding regions. For further analysis and discussion, see (Kumar et al., 2015) [1]. PMID:27284568

  14. Rare variant associations with waist-to-hip ratio in European-American and African-American women from the NHLBI-Exome Sequencing Project.

    PubMed

    Kan, Mengyuan; Auer, Paul L; Wang, Gao T; Bucasas, Kristine L; Hooker, Stanley; Rodriguez, Alejandra; Li, Biao; Ellis, Jaclyn; Adrienne Cupples, L; Ida Chen, Yii-Der; Dupuis, Josée; Fox, Caroline S; Gross, Myron D; Smith, Joshua D; Heard-Costa, Nancy; Meigs, James B; Pankow, James S; Rotter, Jerome I; Siscovick, David; Wilson, James G; Shendure, Jay; Jackson, Rebecca; Peters, Ulrike; Zhong, Hua; Lin, Danyu; Hsu, Li; Franceschini, Nora; Carlson, Chris; Abecasis, Goncalo; Gabriel, Stacey; Bamshad, Michael J; Altshuler, David; Nickerson, Deborah A; North, Kari E; Lange, Leslie A; Reiner, Alexander P; Leal, Suzanne M

    2016-08-01

    Waist-to-hip ratio (WHR), a relative comparison of waist and hip circumferences, is an easily accessible measurement of body fat distribution, in particular central abdominal fat. A high WHR indicates more intra-abdominal fat deposition and is an established risk factor for cardiovascular disease and type 2 diabetes. Recent genome-wide association studies have identified numerous common genetic loci influencing WHR, but the contributions of rare variants have not been previously reported. We investigated rare variant associations with WHR in 1510 European-American and 1186 African-American women from the National Heart, Lung, and Blood Institute-Exome Sequencing Project. Association analysis was performed on the gene level using several rare variant association methods. The strongest association was observed for rare variants in IKBKB (P = 4.0 × 10^−8) in European-Americans, where rare variants in this gene are predicted to decrease WHRs. The activation of the IKBKB gene is involved in inflammatory processes and insulin resistance, which may affect normal food intake and body weight and shape. Meanwhile, aggregation of rare variants in COBLL1, previously found to harbor common variants associated with WHR and fasting insulin, was nominally associated (P = 2.23 × 10^−4) with higher WHR in European-Americans. However, these significant results are not shared between African-Americans and European-Americans, which may be due to differences in the allelic architecture of the two populations and the small sample sizes. Our study indicates that the combined effect of rare variants contributes to the inter-individual variation in fat distribution through the regulation of insulin response. PMID:26757982

  15. Sequence variants at CYP1A1–CYP1A2 and AHR associate with coffee consumption

    PubMed Central

    Sulem, Patrick; Gudbjartsson, Daniel F.; Geller, Frank; Prokopenko, Inga; Feenstra, Bjarke; Aben, Katja K.H.; Franke, Barbara; den Heijer, Martin; Kovacs, Peter; Stumvoll, Michael; Mägi, Reedik; Yanek, Lisa R.; Becker, Lewis C.; Boyd, Heather A.; Stacey, Simon N.; Walters, G. Bragi; Jonasdottir, Adalbjorg; Thorleifsson, Gudmar; Holm, Hilma; Gudjonsson, Sigurjon A.; Rafnar, Thorunn; Björnsdottir, Gyda; Becker, Diane M.; Melbye, Mads; Kong, Augustine; Tönjes, Anke; Thorgeirsson, Thorgeir; Thorsteinsdottir, Unnur; Kiemeney, Lambertus A.; Stefansson, Kari

    2011-01-01

    Coffee is the most commonly used stimulant and caffeine is its main psychoactive ingredient. The heritability of coffee consumption has been estimated at around 50%. We performed a meta-analysis of four genome-wide association studies of coffee consumption among coffee drinkers from Iceland (n = 2680), the Netherlands (n = 2791), the Sorbs Slavonic population isolate in Germany (n = 771) and the USA (n = 369) using both directly genotyped and imputed single nucleotide polymorphisms (SNPs) (2.5 million SNPs). SNPs at the two most significant loci were also genotyped in a sample set from Iceland (n = 2430) and a Danish sample set consisting of pregnant women (n = 1620). Combining all data, two sequence variants significantly associated with increased coffee consumption: rs2472297-T located between CYP1A1 and CYP1A2 at 15q24 (P = 5.4 × 10^−14) and rs6968865-T near aryl hydrocarbon receptor (AHR) at 7p21 (P = 2.3 × 10^−11). An effect of ∼0.2 cups a day per allele was observed for both SNPs. CYP1A2 is the main caffeine metabolizing enzyme and is also involved in drug metabolism. AHR detects xenobiotics, such as polycyclic aryl hydrocarbons found in roasted coffee, and induces transcription of CYP1A1 and CYP1A2. The association of these SNPs with coffee consumption was present in both smokers and non-smokers. PMID:21357676
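
    The abstract reports a meta-analysis across four cohorts but does not name the pooling method. One common choice is a fixed-effect, inverse-variance weighted combination of per-cohort effect estimates; the sketch below shows that calculation under this assumption. The function name is hypothetical and the per-cohort numbers in the usage line are placeholders, not values from the study:

```python
import math


def fixed_effect_meta(estimates):
    """Pool (beta, standard_error) pairs by inverse-variance weighting.

    Returns the pooled beta, its standard error, the z score and a
    two-sided p-value under a normal approximation.
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total_weight = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided tail of N(0, 1)
    return beta, se, z, p_value


# Placeholder per-cohort estimates (cups/day per allele, standard error).
pooled = fixed_effect_meta([(0.20, 0.10), (0.20, 0.10)])
```

    Larger cohorts have smaller standard errors and therefore dominate the pooled estimate, which is why a signal can reach genome-wide significance in combination even when no single cohort does.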

  16. Development and Validation of a Next-Generation Sequencing Assay for BRCA1 and BRCA2 Variants for the Clinical Laboratory.

    PubMed

    Strom, Charles M; Rivera, Steven; Elzinga, Christopher; Angeloni, Taraneh; Rosenthal, Sun Hee; Goos-Root, Dana; Siaw, Martin; Platt, Jamie; Braastadt, Cory; Cheng, Linda; Ross, David; Sun, Weimin

    2015-01-01

    The objective of this study was to design and validate a next-generation sequencing (NGS) assay to detect BRCA1 and BRCA2 mutations. We developed an assay using random shearing of genomic DNA followed by RNA bait tile hybridization and sequencing on both the Illumina MiSeq and Ion Personal Genome Machine (PGM). We determined that the MiSeq Reporter software supplied with the instrument could not detect deletions greater than 9 base pairs. Therefore, we developed an alternative alignment and variant calling software, Quest Sequencing Analysis Pipeline (QSAP), that was capable of detecting large deletions and insertions. In validation studies, we used DNA from 27 stem cell lines, all with known deleterious BRCA1 or BRCA2 mutations, and DNA from 67 consented control individuals who had a total of 352 benign variants. Both the MiSeq/QSAP combination and PGM/Torrent Suite combination had 100% sensitivity for the 379 known variants in the validation series. However, the PGM/Torrent Suite combination had a lower intra- and inter-assay precision of 96.2% and 96.7%, respectively, when compared to the MiSeq/QSAP combination of 100% and 99.4%, respectively. All PGM/Torrent Suite inconsistencies were false-positive variant assignments. We began commercial testing using both platforms and in the first 521 clinical samples MiSeq/QSAP had 100% sensitivity for BRCA1/2 variants, including a 64-bp deletion and a 10-bp insertion not identified by PGM/Torrent Suite, which also suffered from a high false-positive rate. Neither the MiSeq nor the PGM platform with its supplied alignment and variant calling software is appropriate for a clinical laboratory BRCA sequencing test. We have developed an NGS BRCA1/2 sequencing assay, MiSeq/QSAP, with 100% analytic sensitivity and specificity in the validation set consisting of 379 variants. The MiSeq/QSAP combination has sufficient performance for use in a clinical laboratory. PMID:26295337

  17. Development and Validation of a Next-Generation Sequencing Assay for BRCA1 and BRCA2 Variants for the Clinical Laboratory

    PubMed Central

    Strom, Charles M.; Rivera, Steven; Elzinga, Christopher; Angeloni, Taraneh; Rosenthal, Sun Hee; Goos-Root, Dana; Siaw, Martin; Platt, Jamie; Braastadt, Cory; Cheng, Linda; Ross, David; Sun, Weimin

    2015-01-01

    The objective of this study was to design and validate a next-generation sequencing (NGS) assay to detect BRCA1 and BRCA2 mutations. We developed an assay using random shearing of genomic DNA followed by RNA bait tile hybridization and sequencing on both the Illumina MiSeq and Ion Personal Genome Machine (PGM). We determined that the MiSeq Reporter software supplied with the instrument could not detect deletions greater than 9 base pairs. Therefore, we developed an alternative alignment and variant calling software, Quest Sequencing Analysis Pipeline (QSAP), that was capable of detecting large deletions and insertions. In validation studies, we used DNA from 27 stem cell lines, all with known deleterious BRCA1 or BRCA2 mutations, and DNA from 67 consented control individuals who had a total of 352 benign variants. Both the MiSeq/QSAP combination and PGM/Torrent Suite combination had 100% sensitivity for the 379 known variants in the validation series. However, the PGM/Torrent Suite combination had a lower intra- and inter-assay precision of 96.2% and 96.7%, respectively, when compared to the MiSeq/QSAP combination of 100% and 99.4%, respectively. All PGM/Torrent Suite inconsistencies were false-positive variant assignments. We began commercial testing using both platforms and in the first 521 clinical samples MiSeq/QSAP had 100% sensitivity for BRCA1/2 variants, including a 64-bp deletion and a 10-bp insertion not identified by PGM/Torrent Suite, which also suffered from a high false-positive rate. Neither the MiSeq nor the PGM platform with its supplied alignment and variant calling software is appropriate for a clinical laboratory BRCA sequencing test. We have developed an NGS BRCA1/2 sequencing assay, MiSeq/QSAP, with 100% analytic sensitivity and specificity in the validation set consisting of 379 variants. The MiSeq/QSAP combination has sufficient performance for use in a clinical laboratory. PMID:26295337

  18. Structural gene and complete amino acid sequence of Pseudomonas aeruginosa IFO 3455 elastase.

    PubMed Central

    Fukushima, J; Yamamoto, S; Morihara, K; Atsumi, Y; Takeuchi, H; Kawamoto, S; Okuda, K

    1989-01-01

    The DNA encoding the elastase of Pseudomonas aeruginosa IFO 3455 was cloned, and its complete nucleotide sequence was determined. When the cloned gene was ligated to pUC18, the Escherichia coli expression vector, bacteria carrying the gene exhibited high levels of both elastase activity and elastase antigens. The amino acid sequence, deduced from the nucleotide sequence, revealed that the mature elastase consisted of 301 amino acids with a relative molecular mass of 32,926 daltons. The amino acid composition predicted from the DNA sequence was quite similar to the chemically determined composition of purified elastase reported previously. We also observed nucleotide sequence encoding a signal peptide and "pro" sequence consisting of 197 amino acids upstream from the mature elastase protein gene. The amino acid sequence analysis revealed that both the N-terminal sequence of the purified elastase and the N-terminal side sequences of the C-terminal tryptic peptide as well as the internal lysyl peptide fragment were completely identical to the deduced amino acid sequences. The pattern of identity of amino acid sequences was quite evident in the regions that include structurally and functionally important residues of Bacillus subtilis thermolysin. PMID:2493453

  19. Targeted sequencing of the Paget's disease associated 14q32 locus identifies several missense coding variants in RIN3 that predispose to Paget's disease of bone

    PubMed Central

    Vallet, Mahéva; Soares, Dinesh C.; Wani, Sachin; Sophocleous, Antonia; Warner, Jon; Salter, Donald M.; Ralston, Stuart H.; Albagha, Omar M.E.

    2015-01-01

    Paget's disease of bone (PDB) is a common disorder with a strong genetic component characterized by increased but disorganized bone remodelling. Previous genome-wide association studies identified a locus on chromosome 14q32 tagged by rs10498635 which was significantly associated with susceptibility to PDB in several European populations. Here we conducted fine-mapping and targeted sequencing of the candidate locus to identify possible functional variants. Imputation in 741 PDB patients and 2699 controls confirmed that the association was confined to a 60 kb region in the RIN3 gene and conditional analysis adjusting for rs10498635 identified no new independent signals. Sequencing of the RIN3 gene identified a common missense variant (p.R279C) that was strongly associated with the disease (OR = 0.64; P = 1.4 × 10^−9), and was in strong linkage disequilibrium with rs10498635. A further 13 rare missense variants were identified, seven of which were novel and detected only in PDB cases. When combined, these rare variants were over-represented in cases compared with controls (OR = 3.72; P = 8.9 × 10^−10). Most rare variants were located in a region that encodes a proline-rich, intrinsically disordered domain of the protein and many were predicted to be pathogenic. RIN3 was expressed in bone tissue and its expression level was ∼10-fold higher in osteoclasts compared with osteoblasts. We conclude that susceptibility to PDB at the 14q32 locus is mediated by a combination of common and rare coding variants in RIN3 and suggest that RIN3 may contribute to PDB susceptibility by affecting osteoclast function. PMID:25701875

  20. Human retroviruses and AIDS 1996. A compilation and analysis of nucleic acid and amino acid sequences

    SciTech Connect

    Myers, G.; Foley, B.; Korber, B.; Mellors, J.W.; Jeang, K.T.; Wain-Hobson, S.

    1997-04-01

    This compendium and the accompanying floppy diskettes are the result of an effort to compile and rapidly publish all relevant molecular data concerning the human immunodeficiency viruses (HIV) and related retroviruses. The scope of the compendium and database is best summarized by the five parts that it comprises: (1) Nucleic Acid Alignments and Sequences; (2) Amino Acid Alignments; (3) Analysis; (4) Related Sequences; and (5) Database Communications. Information within all the parts is updated throughout the year on the Web site, http://hiv-web.lanl.gov. While this publication could take the form of a review or sequence monograph, it is not so conceived. Instead, the literature from which the database is derived has simply been summarized and some elementary computational analyses have been performed upon the data. Interpretation and commentary have been avoided insofar as possible so that the reader can form his or her own judgments concerning the complex information. In addition to the general descriptions of the parts of the compendium, the user should read the individual introductions for each part.

  1. Effect of temperature shift on levels of acidic charge variants in IgG monoclonal antibodies in Chinese hamster ovary cell culture.

    PubMed

    Kishishita, Shohei; Nishikawa, Tomoko; Shinoda, Yasuharu; Nagashima, Hiroaki; Okamoto, Hiroshi; Takuma, Shinya; Aoyagi, Hideki

    2015-06-01

    During the production of therapeutic monoclonal antibodies (mAbs), not only enhancement of mAb productivity but also control of quality attributes is critical. Charge variants, which are among the most important quality attributes, can substantially affect the in vitro and in vivo properties of mAbs. During process development for the production of mAbs in a Chinese hamster ovary cell line, we have observed that an improvement in mAb titer is accompanied by an increase in the content of acidic charge variants. Here, to help maintain comparability among mAbs, we aimed to identify the process parameters that controlled the content of acidic charge variants. First, we used a Plackett-Burman design to identify the effect of selected process parameters on the acidic charge variant content. Eight process parameters were selected by using a failure modes and effects analysis. Among these, temperature shift was identified from the Plackett-Burman design as the factor most influencing the acidic charge variant content. We then investigated in more detail the effects of shift temperature and temperature shift timing on this content. The content decreased with a shift to a lower temperature and with earlier timing of this temperature shift. Our observations suggest that Plackett-Burman designs are advantageous for preliminary screening of bioprocess parameters. We report here for the first time that temperature downshift is beneficial for effective control of the acidic peak variant content. PMID:25466646
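
    The abstract screens eight process parameters with a Plackett-Burman design but does not give the number of runs. As an illustration, the sketch below builds the standard 12-run Plackett-Burman design (the smallest one that accommodates eight two-level factors) by cyclically shifting the usual generator row and appending an all-low run. Choosing the 12-run design is an assumption, and the function name is hypothetical:

```python
# Generator row of the standard 12-run Plackett-Burman design
# (+1 = high level, -1 = low level).
PB12_GENERATOR = [1, 1, -1, 1, 1, 1, -1, -1, -1, 1, -1]


def plackett_burman_12(n_factors):
    """Return a 12-run, two-level screening design for up to 11 factors.

    Each row is one experimental run; each column is one factor.
    """
    if not 1 <= n_factors <= 11:
        raise ValueError("a 12-run design screens between 1 and 11 factors")
    k = len(PB12_GENERATOR)
    # Eleven cyclic shifts of the generator row...
    rows = [
        [PB12_GENERATOR[(col - shift) % k] for col in range(k)]
        for shift in range(k)
    ]
    # ...plus a final run with every factor at its low level.
    rows.append([-1] * k)
    # Unused columns are simply dropped.
    return [row[:n_factors] for row in rows]


design = plackett_burman_12(8)  # 12 runs x 8 process parameters
```

    Every column is balanced (six high, six low) and every pair of columns is orthogonal, so each main effect can be estimated from 12 runs instead of the 256 runs of a full two-level factorial in eight factors. Interactions are not resolved, which is why such a design suits preliminary screening.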

  2. Development and Validation of a Template-Independent Next-Generation Sequencing Assay for Detecting Low-Level Resistance-Associated Variants of Hepatitis C Virus.

    PubMed

    Wei, Bo; Kang, John; Kibukawa, Miho; Chen, Lei; Qiu, Ping; Lahser, Fred; Marton, Matthew; Levitan, Diane

    2016-09-01

    To develop hepatitis C virus (HCV) direct-acting antiviral (DAA) drugs that can treat most HCV genotypes and offer higher barriers for treatment-resistant mutations, it is important to study resistance-associated variants (RAVs). Current commercially available RAV detection assays rely on genotype- or subtype-specific template-dependent PCR amplification. These assays are limited to genotypes and subtypes that are often prevalent in developed countries because of the availability of public sequence databases. To support global clinical trials of DAAs, we developed and validated a template-independent (TI) next-generation sequencing (NGS) assay for HCV whole genome sequencing that can perform HCV subtyping, detect HCV mixed genotype or subtype infection, and identify low-level RAVs at a 5% fraction of the viral population with sensitivity and positive predictive value ≥ 0.9. We compared TI-NGS with commercial genotype- or subtype-specific Sanger sequencing assays, and found that TI-NGS both confirmed most of the variants called by Sanger sequencing and avoided biases likely caused by PCR primers used in Sanger sequencing. To confirm the TI-NGS assay's variant calls at the positions discrepant with Sanger sequencing, we custom designed template-dependent NGS assays and obtained 100% concordance with the TI-NGS assay. The ability to reliably detect low-level RAVs in HCV samples of any subtype without PCR primer-related bias makes this TI-NGS assay an important tool in studying HCV DAA drug resistance. PMID:27393904
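
    The abstract states a 5% detection fraction and a sensitivity and positive predictive value of at least 0.9, without describing the calling procedure. The sketch below illustrates only how those quantities are defined: a variant is called when its read fraction reaches the threshold, and the calls are then scored against a known truth set. The function names, variant labels and read counts are hypothetical:

```python
def call_low_level_variants(read_counts, min_fraction=0.05):
    """Return the variants whose supporting-read fraction meets the threshold.

    read_counts maps a variant label to (variant_reads, total_depth).
    """
    called = set()
    for variant, (variant_reads, depth) in read_counts.items():
        if depth > 0 and variant_reads / depth >= min_fraction:
            called.add(variant)
    return called


def sensitivity_and_ppv(called, truth):
    """Score a call set against the variants known to be present."""
    true_pos = len(called & truth)
    false_neg = len(truth - called)
    false_pos = len(called - truth)
    sensitivity = true_pos / (true_pos + false_neg) if truth else float("nan")
    ppv = true_pos / (true_pos + false_pos) if called else float("nan")
    return sensitivity, ppv


# Hypothetical counts: var1 sits exactly at 5%, var2 just below, var3 well above.
counts = {"var1": (50, 1000), "var2": (49, 1000), "var3": (300, 1000)}
calls = call_low_level_variants(counts)
```

    Sensitivity is the share of true variants that were called, and positive predictive value is the share of calls that are true, so lowering the threshold tends to raise the first at the expense of the second.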

  3. Analysis of Coding Variants Identified from Exome Sequencing Resources for Association with Diabetic and Non-diabetic Nephropathy in African Americans

    PubMed Central

    Ng, Maggie C.Y.; Bonomo, Jason A.; Hicks, Pamela J.; Hester, Jessica M.; Langefeld, Carl D.; Freedman, Barry I.; Bowden, Donald W.

    2014-01-01

    Prior studies have identified common genetic variants influencing diabetic and non-diabetic nephropathy, diseases which disproportionately affect African Americans. Recently, exome sequencing techniques have facilitated identification of coding variants on a genome-wide basis in large samples. Exonic variants in known or suspected end-stage kidney disease (ESKD) or nephropathy genes can be tested for their ability to identify association either singly or in combination with known associated common variants. Coding variants in genes with prior evidence for association with ESKD or nephropathy were identified in the NHLBI-ESP GO database and genotyped in 5045 African Americans (3324 cases with type 2 diabetes associated nephropathy [T2D-ESKD] or non-T2D ESKD, and 1721 controls) and 1465 European Americans (568 T2D-ESKD cases and 897 controls). Logistic regression analyses were performed to assess association, with admixture and APOL1 risk status incorporated as covariates. Ten of 31 SNPs were associated in African Americans; four replicated in European Americans. In African Americans, SNPs in OR2L8, OR2AK2, C6orf167 (MMS22L), LIMK2, APOL3, APOL2, and APOL1 were nominally associated (P=1.8×10⁻⁴ to 0.044). Haplotype analysis of common and coding variants increased evidence of association at the OR2L13 and APOL1 loci (P=6.2×10⁻⁵ and 4.6×10⁻⁵, respectively). SNPs replicating in European Americans were in OR2AK2, LIMK2, and APOL2 (P=0.0010 to 0.037). Meta-analyses highlighted four SNPs associated in T2D-ESKD and all-cause ESKD. Results from this study suggest a role for coding variants in the development of diabetic, non-diabetic, and/or all-cause ESKD in African Americans and/or European Americans. PMID:24385048

  4. Whole-exome sequencing and imaging genetics identify functional variants for rate of change in hippocampal volume in mild cognitive impairment.

    PubMed

    Nho, K; Corneveaux, J J; Kim, S; Lin, H; Risacher, S L; Shen, L; Swaminathan, S; Ramanan, V K; Liu, Y; Foroud, T; Inlow, M H; Siniard, A L; Reiman, R A; Aisen, P S; Petersen, R C; Green, R C; Jack, C R; Weiner, M W; Baldwin, C T; Lunetta, K; Farrer, L A; Furney, S J; Lovestone, S; Simmons, A; Mecocci, P; Vellas, B; Tsolaki, M; Kloszewska, I; Soininen, H; McDonald, B C; Farlow, M R; Ghetti, B; Huentelman, M J; Saykin, A J

    2013-07-01

    Whole-exome sequencing of individuals with mild cognitive impairment, combined with genotype imputation, was used to identify coding variants other than the apolipoprotein E (APOE) ε4 allele associated with rate of hippocampal volume loss using an extreme trait design. Matched unrelated APOE ε3 homozygous male Caucasian participants from the Alzheimer's Disease Neuroimaging Initiative (ADNI) were selected at the extremes of the 2-year longitudinal change distribution of hippocampal volume (eight subjects with rapid rates of atrophy and eight with slow/stable rates of atrophy). We identified 57 non-synonymous single nucleotide variants (SNVs) which were found exclusively in at least 4 of 8 subjects in the rapid atrophy group, but not in any of the 8 subjects in the slow atrophy group. Among these SNVs, the variants that accounted for the greatest group difference and were predicted in silico as 'probably damaging' missense variants were rs9610775 (CARD10) and rs1136410 (PARP1). To further investigate and extend the exome findings in a larger sample, we conducted quantitative trait analysis including whole-brain search in the remaining ADNI APOE ε3/ε3 group (N=315). Genetic variation within PARP1 and CARD10 was associated with rate of hippocampal neurodegeneration in APOE ε3/ε3. Meta-analysis across five independent cross sectional cohorts indicated that rs1136410 is also significantly associated with hippocampal volume in APOE ε3/ε3 individuals (N=923). Larger sequencing studies and longitudinal follow-up are needed for confirmation. The combination of next-generation sequencing and quantitative imaging phenotypes holds significant promise for discovery of variants involved in neurodegeneration. PMID:23608917
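
    The extreme-trait selection rule stated above (present in at least 4 of 8 rapid-atrophy subjects and in none of the 8 slow-atrophy subjects) amounts to a simple filter. The sketch below assumes a hypothetical input format in which each variant maps to one carrier flag per subject; the function name is likewise illustrative.

```python
def extreme_trait_candidates(carriers_rapid, carriers_slow, min_rapid=4):
    """Select variants seen in >= min_rapid rapid-group subjects and no slow-group subject.

    Each argument maps a variant ID to a list of booleans, one per subject,
    where True means the subject carries the variant.
    """
    candidates = []
    for variant, rapid_flags in carriers_rapid.items():
        slow_flags = carriers_slow.get(variant, [])
        if sum(rapid_flags) >= min_rapid and not any(slow_flags):
            candidates.append(variant)
    return candidates
```

    In the study this step was followed by in silico damage prediction and by testing in a larger sample; the filter alone only narrows the candidate list.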

  5. A variant in the sonic hedgehog regulatory sequence (ZRS) is associated with triphalangeal thumb and deregulates expression in the developing limb

    PubMed Central

    Furniss, Dominic; Lettice, Laura A.; Taylor, Indira B.; Critchley, Paul S.; Giele, Henk; Hill, Robert E.; Wilkie, Andrew O.M.

    2008-01-01

    A locus for triphalangeal thumb, variably associated with pre-axial polydactyly, was previously identified in the zone of polarizing activity regulatory sequence (ZRS), a long range limb-specific enhancer of the Sonic Hedgehog (SHH) gene at human chromosome 7q36.3. Here, we demonstrate that a 295T>C variant in the human ZRS, previously thought to represent a neutral polymorphism, acts as a dominant allele with reduced penetrance. We found this variant in three independently ascertained probands from southern England with triphalangeal thumb, demonstrated significant linkage of the phenotype to the variant (LOD = 4.1), and identified a shared microsatellite haplotype around the ZRS, suggesting that the probands share a common ancestor. An individual homozygous for the 295C allele presented with isolated bilateral triphalangeal thumb resembling the heterozygous phenotype, suggesting that the variant is largely dominant to the wild-type allele. As a functional test of the pathogenicity of the 295C allele, we utilized a mutated ZRS construct to demonstrate that it can drive ectopic anterior expression of a reporter gene in the developing mouse forelimb. We conclude that the 295T>C variant is in fact pathogenic and, in southern England, appears to be the most common cause of triphalangeal thumb. Depending on the dispersal of the founding mutation, it may play a wider role in the aetiology of this disorder. PMID:18463159

  6. Meta-Analysis of 28,141 Individuals Identifies Common Variants within Five New Loci That Influence Uric Acid Concentrations

    PubMed Central

    Sanna, Serena; Teumer, Alexander; Vitart, Veronique; Perola, Markus; Mangino, Massimo; Albrecht, Eva; Wallace, Chris; Farrall, Martin; Johansson, Åsa; Nyholt, Dale R.; Aulchenko, Yurii; Beckmann, Jacques S.; Bergmann, Sven; Bochud, Murielle; Brown, Morris; Campbell, Harry; Connell, John; Dominiczak, Anna; Homuth, Georg; Lamina, Claudia; McCarthy, Mark I.; Meitinger, Thomas; Mooser, Vincent; Munroe, Patricia; Nauck, Matthias; Peden, John; Prokisch, Holger; Salo, Perttu; Salomaa, Veikko; Samani, Nilesh J.; Schlessinger, David; Uda, Manuela; Völker, Uwe; Waeber, Gérard; Waterworth, Dawn; Wang-Sattler, Rui; Wright, Alan F.; Adamski, Jerzy; Whitfield, John B.; Gyllensten, Ulf; Wilson, James F.; Rudan, Igor; Pramstaller, Peter; Watkins, Hugh; Doering, Angela; Wichmann, H.-Erich; Spector, Tim D.; Peltonen, Leena; Völzke, Henry; Nagaraja, Ramaiah; Vollenweider, Peter; Caulfield, Mark; Illig, Thomas; Gieger, Christian

    2009-01-01

    Elevated serum uric acid levels cause gout and are a risk factor for cardiovascular disease and diabetes. To investigate the polygenetic basis of serum uric acid levels, we conducted a meta-analysis of genome-wide association scans from 14 studies totalling 28,141 participants of European descent, resulting in identification of 954 SNPs distributed across nine loci that exceeded the threshold of genome-wide significance, five of which are novel. Overall, the common variants associated with serum uric acid levels fall in the following nine regions: SLC2A9 (p = 5.2×10⁻²⁰¹), ABCG2 (p = 3.1×10⁻²⁶), SLC17A1 (p = 3.0×10⁻¹⁴), SLC22A11 (p = 6.7×10⁻¹⁴), SLC22A12 (p = 2.0×10⁻⁹), SLC16A9 (p = 1.1×10⁻⁸), GCKR (p = 1.4×10⁻⁹), LRRC16A (p = 8.5×10⁻⁹), and near PDZK1 (p = 2.7×10⁻⁹). Identified variants were analyzed for gender differences. We found that the minor allele for rs734553 in SLC2A9 has greater influence in lowering uric acid levels in women and the minor allele of rs2231142 in ABCG2 elevates uric acid levels more strongly in men compared to women. To further characterize the identified variants, we analyzed their association with a panel of metabolites. rs12356193 within SLC16A9 was associated with DL-carnitine (p = 4.0×10⁻²⁶) and propionyl-L-carnitine (p = 5.0×10⁻⁸) concentrations, which in turn were associated with serum UA levels (p = 1.4×10⁻⁵⁷ and p = 8.1×10⁻⁵⁴, respectively), forming a triangle between SNP, metabolites, and UA levels. Taken together, these associations highlight additional pathways that are important in the regulation of serum uric acid levels and point toward novel potential targets for pharmacological intervention to prevent or treat hyperuricemia. In addition, these findings strongly support the hypothesis that transport proteins are key in regulating serum uric acid levels. PMID:19503597

  7. Natural vs. random protein sequences: Discovering combinatorics properties on amino acid words.

    PubMed

    Santoni, Daniele; Felici, Giovanni; Vergni, Davide

    2016-02-21

    Random mutations and natural selection have driven the evolution of the protein amino acid sequences that we observe at present in nature. Which of the two is the dominant force in protein evolution still lacks an unambiguous answer. Random mutations tend to randomize protein sequences while, in order to have the correct functionality, one expects that selection mechanisms impose rigid constraints on amino acid sequences. Moreover, one also has to consider that the space of all possible amino acid sequences is so astonishingly large that it could be reasonable for a well-tuned amino acid sequence to be indistinguishable from a random one. In order to study the possibility of discriminating between random and natural amino acid sequences, we introduce different measures of association between pairs of amino acids in a sequence, and apply them to a dataset of 1047 natural protein sequences and 10,470 random sequences, carefully generated in order to preserve the relative length and amino acid distribution of the natural proteins. We analyze the multidimensional measures with machine learning techniques and show that, to a reasonable extent, natural protein sequences can be differentiated from random ones. PMID:26656109

  8. Transcriptome Sequencing in Response to Salicylic Acid in Salvia miltiorrhiza

    PubMed Central

    Zhang, Xiaoru; Dong, Juane; Liu, Hailong; Wang, Jiao; Qi, Yuexin; Liang, Zongsuo

    2016-01-01

    Salvia miltiorrhiza is a traditional Chinese herbal medicine, whose quality and yield are often affected by diseases and environmental stresses during its growing season. Salicylic acid (SA) plays a significant role in plants responding to biotic and abiotic stresses, but the involved regulatory factors and their signaling mechanisms are largely unknown. In order to identify the genes involved in SA signaling, the RNA sequencing (RNA-seq) strategy was employed to evaluate the transcriptional profiles in S. miltiorrhiza cell cultures. A total of 50,778 unigenes were assembled, in which 5,316 unigenes were differentially expressed among 0-, 2-, and 8-h SA induction. The up-regulated genes were mainly involved in stimulus response and multi-organism process. A core set of candidate novel genes coding SA signaling component proteins was identified. Many transcription factors (e.g., WRKY, bHLH and GRAS) and genes involved in hormone signal transduction were differentially expressed in response to SA induction. Detailed analysis revealed that genes associated with defense signaling, such as antioxidant system genes, cytochrome P450s and ATP-binding cassette transporters, were significantly overexpressed, which can be used as genetic tools to investigate disease resistance. Our transcriptome analysis will help understand SA signaling and its mechanism of defense systems in S. miltiorrhiza. PMID:26808150

  9. Transcriptome Sequencing in Response to Salicylic Acid in Salvia miltiorrhiza.

    PubMed

    Zhang, Xiaoru; Dong, Juane; Liu, Hailong; Wang, Jiao; Qi, Yuexin; Liang, Zongsuo

    2016-01-01

    Salvia miltiorrhiza is a traditional Chinese herbal medicine, whose quality and yield are often affected by diseases and environmental stresses during its growing season. Salicylic acid (SA) plays a significant role in plants responding to biotic and abiotic stresses, but the involved regulatory factors and their signaling mechanisms are largely unknown. In order to identify the genes involved in SA signaling, the RNA sequencing (RNA-seq) strategy was employed to evaluate the transcriptional profiles in S. miltiorrhiza cell cultures. A total of 50,778 unigenes were assembled, in which 5,316 unigenes were differentially expressed among 0-, 2-, and 8-h SA induction. The up-regulated genes were mainly involved in stimulus response and multi-organism process. A core set of candidate novel genes coding SA signaling component proteins was identified. Many transcription factors (e.g., WRKY, bHLH and GRAS) and genes involved in hormone signal transduction were differentially expressed in response to SA induction. Detailed analysis revealed that genes associated with defense signaling, such as antioxidant system genes, cytochrome P450s and ATP-binding cassette transporters, were significantly overexpressed, which can be used as genetic tools to investigate disease resistance. Our transcriptome analysis will help understand SA signaling and its mechanism of defense systems in S. miltiorrhiza. PMID:26808150

  10. Inosine Triphosphate Pyrophosphohydrolase (ITPA) polymorphic sequence variants in adult hematological malignancy patients and possible association with mitochondrial DNA defects

    PubMed Central

    2013-01-01

    Background: Inosine triphosphate pyrophosphohydrolase (ITPase) is a ‘house-cleaning’ enzyme that degrades non-canonical (‘rogue’) nucleotides. Complete deficiency is fatal in knockout mice, but a mutant polymorphism resulting in low enzyme activity with an accumulation of ITP and other non-canonical nucleotides appears benign in humans. We hypothesised that reduced ITPase activity may cause acquired mitochondrial DNA (mtDNA) defects. Furthermore, we investigated whether accumulating mtDNA defects may then be a risk factor for cell transformation, in adult haematological malignancy (AHM). Methods: DNA was extracted from peripheral blood and bone marrow samples. Microarray-based sequencing of mtDNA was performed on 13 AHM patients confirmed as carrying the ITPA 94C>A mutation causing low ITPase activity, and 4 AHM patients with wildtype ITPA. The frequencies of ITPA 94C>A and IVS2+21A>C polymorphisms were studied from 85 available AHM patients. Results: ITPA 94C>A was associated with a significant increase in total heteroplasmic/homoplasmic mtDNA mutations (p<0.009) compared with wildtype ITPA, following exclusion of haplogroup variants. This suggested that low ITPase activity may induce mitochondrial abnormalities. Compared to the normal population, frequencies for the 94C>A and IVS2+21A>C mutant alleles among the AHM patients were higher for myelodysplastic syndrome (MDS), but below significance; were approximately equivalent for chronic lymphoblastic leukemia; and were lower for acute myeloid leukemia. Conclusions: This study invokes a new paradigm for the evolution of MDS, where nucleotide imbalances produced by defects in ‘house-cleaning’ genes may induce mitochondrial dysfunction, compromising cell integrity. It supports recent studies which point towards an important role for ITPase in cellular surveillance of rogue nucleotides. PMID:23547827

  11. Replication fitness of multiple nonnucleoside reverse transcriptase-resistant HIV-1 variants in the presence of etravirine measured by 454 deep sequencing.

    PubMed

    Brumme, Chanson J; Huber, Kelly D; Dong, Winnie; Poon, Art F Y; Harrigan, P Richard; Sluis-Cremer, Nicolas

    2013-08-01

    We applied an efficient method to characterize the relative fitness levels of multiple nonnucleoside reverse transcriptase (NNRTI)-resistant HIV-1 variants by simultaneous competitive culture and 454 deep sequencing. Using this method, we show that the Y181V mutation in the HIV-1 reverse transcriptase in particular confers a clear selective advantage to the virus over 14 other NNRTI resistance mutations in the presence of etravirine in vitro. PMID:23720723

  12. Characterization of ISXax1, a Novel Insertion Sequence Restricted to Xanthomonas axonopodis pv. phaseoli (Variants fuscans and non-fuscans) and Xanthomonas axonopodis pv. vesicatoria

    PubMed Central

    Alavi, Seyed Mehdi; Poussier, Stéphane; Manceau, Charles

    2007-01-01

    ISXax1 is a novel insertion sequence belonging to the IS256 and Mutator families. Dot blot, Southern blot, and PCR analyses revealed that ISXax1 is restricted to Xanthomonas axonopodis pv. phaseoli (variants fuscans and non-fuscans) and X. axonopodis pv. vesicatoria strains. Directed AFLP also showed that a high degree of polymorphism is associated with ISXax1 insertion in these strains. PMID:17209062

  13. Sequencing the GRHL3 Coding Region Reveals Rare Truncating Mutations and a Common Susceptibility Variant for Nonsyndromic Cleft Palate.

    PubMed

    Mangold, Elisabeth; Böhmer, Anne C; Ishorst, Nina; Hoebel, Ann-Kathrin; Gültepe, Pinar; Schuenke, Hannah; Klamt, Johanna; Hofmann, Andrea; Gölz, Lina; Raff, Ruth; Tessmann, Peter; Nowak, Stefanie; Reutter, Heiko; Hemprich, Alexander; Kreusch, Thomas; Kramer, Franz-Josef; Braumann, Bert; Reich, Rudolf; Schmidt, Gül; Jäger, Andreas; Reiter, Rudolf; Brosch, Sibylle; Stavusis, Janis; Ishida, Miho; Seselgyte, Rimante; Moore, Gudrun E; Nöthen, Markus M; Borck, Guntram; Aldhorae, Khalid A; Lace, Baiba; Stanier, Philip; Knapp, Michael; Ludwig, Kerstin U

    2016-04-01

    Nonsyndromic cleft lip with/without cleft palate (nsCL/P) and nonsyndromic cleft palate only (nsCPO) are the most frequent subphenotypes of orofacial clefts. A common syndromic form of orofacial clefting is Van der Woude syndrome (VWS) where individuals have CL/P or CPO, often but not always associated with lower lip pits. Recently, ∼5% of VWS-affected individuals were identified with mutations in the grainy head-like 3 gene (GRHL3). To investigate GRHL3 in nonsyndromic clefting, we sequenced its coding region in 576 Europeans with nsCL/P and 96 with nsCPO. Most strikingly, nsCPO-affected individuals had a higher minor allele frequency for rs41268753 (0.099) than control subjects (0.049; p = 1.24 × 10(-2)). This association was replicated in nsCPO/control cohorts from Latvia, Yemen, and the UK (p_combined = 2.63 × 10(-5); OR_allelic = 2.46 [95% CI 1.6-3.7]) and reached genome-wide significance in combination with imputed data from a GWAS in nsCPO triads (p = 2.73 × 10(-9)). Notably, rs41268753 is not associated with nsCL/P (p = 0.45). rs41268753 encodes the highly conserved p.Thr454Met (c.1361C>T) (GERP = 5.3), which prediction programs denote as deleterious, has a CADD score of 29.6, and increases protein binding capacity in silico. Sequencing also revealed four novel truncating GRHL3 mutations including two that were de novo in four families, where all nine individuals harboring mutations had nsCPO. This is important for genetic counseling: given that VWS is rare compared to nsCPO, our data suggest that dominant GRHL3 mutations are more likely to cause nonsyndromic than syndromic CPO. Thus, with rare dominant mutations and a common risk variant in the coding region, we have identified an important contribution for GRHL3 in nsCPO. PMID:27018475
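
    As a rough arithmetic check, an allelic odds ratio can be computed from the two minor allele frequencies quoted for the sequenced cohort (0.099 in nsCPO cases, 0.049 in controls). The function name is hypothetical, and the result is not expected to match the reported combined odds ratio of 2.46, which pools the replication cohorts.

```python
def allelic_odds_ratio(maf_cases, maf_controls):
    """Odds of carrying the minor allele in cases divided by the odds in controls."""
    odds_cases = maf_cases / (1.0 - maf_cases)
    odds_controls = maf_controls / (1.0 - maf_controls)
    return odds_cases / odds_controls


# Frequencies taken from the abstract (sequenced cohort only).
discovery_or = allelic_odds_ratio(0.099, 0.049)
```

    For these inputs the ratio comes out at roughly 2.13, the same direction and a similar magnitude as the combined estimate.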

  14. Relationship between a common variant in the fatty acid desaturase (FADS) cluster and eicosanoid generation in humans.

    PubMed

    Hester, Austin G; Murphy, Robert C; Uhlson, Charis J; Ivester, Priscilla; Lee, Tammy C; Sergeant, Susan; Miller, Leslie R; Howard, Timothy D; Mathias, Rasika A; Chilton, Floyd H

    2014-08-01

    Dramatic shifts in the Western diet have led to a marked increase in the dietary intake of the n-6 polyunsaturated fatty acid (PUFA), linoleic acid (LA). Dietary LA can then be converted to arachidonic acid (ARA) utilizing three enzymatic steps. Two of these steps are encoded for by the fatty acid desaturase (FADS) cluster (chromosome 11, 11q12.2-q13) and certain genetic variants within the cluster are highly associated with ARA levels. However, no study to date has examined whether these variants further influence pro-inflammatory, cyclooxygenase and lipoxygenase eicosanoid products. This study examined the impact of a highly influential FADS SNP, rs174537 on leukotriene, HETE, prostaglandin, and thromboxane biosynthesis in stimulated whole blood. Thirty subjects were genotyped at rs174537 (GG, n = 11; GT, n = 13; TT, n = 6), a panel of fatty acids from whole serum was analyzed, and precursor-to-product PUFA ratios were calculated as a marker of the capacity of tissues (particularly the liver) to synthesize long chain PUFAs. Eicosanoids produced by stimulated human blood were measured by LC-MS/MS. We observed an association between rs174537 and the ratio of ARA/LA, leukotriene B4, and 5-HETE but no effect on levels of cyclooxygenase products. Our results suggest that variation at rs174537 not only impacts the synthesis of ARA but the overall capacity of whole blood to synthesize 5-lipoxygenase products; these genotype-related changes in eicosanoid levels could have important implications in a variety of inflammatory diseases. PMID:24962583

  15. Association Between Variants of PRDM1 and NDP52 and Crohn’s Disease, Based on Exome Sequencing and Functional Studies

    PubMed Central

    Ellinghaus, David; Zhang, Hu; Zeissig, Sebastian; Lipinski, Simone; Till, Andreas; Jiang, Tao; Stade, Björn; Bromberg, Yana; Ellinghaus, Eva; Keller, Andreas; Rivas, Manuel A; Skieceviciene, Jurgita; Doncheva, Nadezhda T; Liu, Xiao; Liu, Qing; Jiang, Fuman; Forster, Michael; Mayr, Gabriele; Albrecht, Mario; Häsler, Robert; Boehm, Bernhard O; Goodall, Jane; Berzuini, Carlo R; Lee, James; Andersen, Vibeke; Vogel, Ulla; Kupcinskas, Limas; Kayser, Manfred; Krawczak, Michael; Nikolaus, Susanna; Weersma, Rinse K; Ponsioen, Cyriel Y; Sans, Miquel; Wijmenga, Cisca; Strachan, David P; McArdle, Wendy L; Vermeire, Séverine; Rutgeerts, Paul; Sanderson, Jeremy D; Mathew, Christopher G; Vatn, Morten H; Wang, Jun; Nöthen, Markus M; Duerr, Richard H; Büning, Carsten; Brand, Stephan; Glas, Jürgen; Winkelmann, Juliane; Illig, Thomas; Latiano, Anna; Annese, Vito; Halfvarson, Jonas; D’Amato, Mauro; Daly, Mark J; Nothnagel, Michael; Karlsen, Tom H; Subramani, Suresh; Rosenstiel, Philip; Schreiber, Stefan; Parkes, Miles; Franke, Andre

    2013-01-01

    Background & Aims: Genome-wide association studies (GWASs) have identified 140 Crohn’s disease (CD) susceptibility loci. For most loci, the variants that cause disease are not known and the genes affected by these variants have not been identified. We aimed to identify variants that cause CD through detailed sequencing, genetic association, expression, and functional studies. Methods: We sequenced whole exomes of 42 unrelated subjects with CD and 5 healthy individuals (controls), and then filtered single-nucleotide variants by incorporating association results from meta-analyses of CD GWASs and in silico mutation effect prediction algorithms. We then genotyped 9348 patients with CD, 2868 with ulcerative colitis, and 14,567 controls, and analyzed associated variants in functional studies using materials from patients and controls and in vitro model systems. Results: We identified rare missense mutations in PR domain-containing 1 (PRDM1) and associated these with CD. These increased proliferation of T cells and secretion of cytokines upon activation, and increased expression of the adhesion molecule L-selectin. A common CD risk allele, identified in GWASs, correlated with reduced expression of PRDM1 in ileal biopsies and peripheral blood mononuclear cells (combined P=1.6×10⁻⁸). We identified an association between CD and a common missense variant, Val248Ala, in nuclear domain 10 protein 52 (NDP52) (P=4.83×10⁻⁹). We found that this variant impairs the regulatory functions of NDP52 to inhibit NFκB activation of genes that regulate inflammation and affect stability of proteins in toll-like receptor pathways. Conclusions: We have extended GWAS results and provide evidence that variants in PRDM1 and NDP52 determine susceptibility to CD. PRDM1 maps adjacent to a CD interval identified in GWASs and encodes a transcription factor expressed by T and B cells. NDP52 is an adaptor protein that functions in selective autophagy of intracellular bacteria and

  16. The LITAF/SIMPLE I92V sequence variant results in an earlier age of onset of CMT1A/HNPP diseases.

    PubMed

    Sinkiewicz-Darol, Elena; Lacerda, Andressa Ferreira; Kostera-Pruszczyk, Anna; Potulska-Chromik, Anna; Sokołowska, Beata; Kabzińska, Dagmara; Brunetti, Craig R; Hausmanowa-Petrusewicz, Irena; Kochański, Andrzej

    2015-01-01

    Charcot-Marie-Tooth disease type 1A (CMT1A) and hereditary neuropathy with liability to pressure palsies (HNPP) represent the most common heritable neuromuscular disorders. Molecular diagnostics of CMT1A/HNPP diseases confirm clinical diagnosis, but their value is limited to the clinical course and prognosis. However, no biomarkers of CMT1A/HNPP have been identified. We decided to explore if the LITAF/SIMPLE gene shared a functional link to the PMP22 gene, whose duplication or deletion results in CMT1A and HNPP, respectively. By studying a large cohort of CMT1A/HNPP-affected patients, we found that the LITAF I92V sequence variant predisposes patients to an earlier age of onset of both the CMT1A and HNPP diseases. Using cell transfection experiments, we showed that the LITAF I92V sequence variant partially mislocalizes to the mitochondria in contrast to wild-type LITAF which localizes to the late endosome/lysosomes and is associated with a tendency for PMP22 to accumulate in the cells. Overall, this study shows that the I92V LITAF sequence variant would be a good candidate for a biomarker in the case of the CMT1A/HNPP disorders. PMID:25342198

  17. Analysis of the Enzymatic Activity of an NS3 Helicase Genotype 3a Variant Sequence Obtained from a Relapse Patient

    PubMed Central

    Provazzi, Paola J. S.; Mukherjee, Sourav; Hanson, Alicia M.; Nogueira, Mauricio L.; Carneiro, Bruno M.; Frick, David N.; Rahal, Paula

    2015-01-01

    The hepatitis C virus (HCV) is a species of diverse genotypes that infect over 170 million people worldwide, causing chronic inflammation, cirrhosis and hepatocellular carcinoma. HCV genotype 3a is common in Brazil, and it is associated with a relatively poor response to current direct-acting antiviral therapies. The HCV NS3 protein cleaves part of the HCV polyprotein, and cellular antiviral proteins. It is therefore the target of several HCV drugs. In addition to its protease activity, NS3 is also an RNA helicase. Previously, HCV present in a relapse patient was found to harbor a mutation known to be lethal to HCV genotype 1b. The point mutation encodes the amino acid substitution W501R in the helicase RNA binding site. To examine how the W501R substitution affects NS3 helicase activity in a genotype 3a background, wild type and W501R genotype 3a NS3 alleles were sub-cloned, expressed in E. coli, and the recombinant proteins were purified and characterized. The impact of the W501R allele on genotype 2a and 3a subgenomic replicons was also analyzed. Assays monitoring helicase-catalyzed DNA and RNA unwinding revealed that the catalytic efficiency of wild type genotype 3a NS3 helicase was more than 600 times greater than the W501R protein. Other assays revealed that the W501R protein bound DNA less than 2 times weaker than wild type, and both proteins hydrolyzed ATP at similar rates. In Huh7.5 cells, both genotype 2a and 3a subgenomic HCV replicons harboring the W501R allele showed a severe defect in replication. Since the W501R allele is carried as a minor variant, its replication would therefore need to be attributed to the trans-complementation by other wild type quasispecies. PMID:26658750

  18. Matrix genes of measles virus and canine distemper virus: cloning, nucleotide sequences, and deduced amino acid sequences.

    PubMed Central

    Bellini, W J; Englund, G; Richardson, C D; Rozenblatt, S; Lazzarini, R A

    1986-01-01

    The nucleotide sequences encoding the matrix (M) proteins of measles virus (MV) and canine distemper virus (CDV) were determined from cDNA clones containing these genes in their entirety. In both cases, single open reading frames specifying basic proteins of 335 amino acid residues were predicted from the nucleotide sequences. Both viral messages were composed of approximately 1,450 nucleotides and contained 400 nucleotides of presumptive noncoding sequences at their respective 3' ends. MV and CDV M-protein-coding regions were 67% homologous at the nucleotide level and 76% homologous at the amino acid level. Only chance homology was observed in the 400-nucleotide trailer sequences. Comparisons of the M protein sequences of MV and CDV with the sequence reported for Sendai virus (B. M. Blumberg, K. Rose, M. G. Simona, L. Roux, C. Giorgi, and D. Kolakofsky, J. Virol. 52:656-663; Y. Hidaka, T. Kanda, K. Iwasaki, A. Nomoto, T. Shioda, and H. Shibuta, Nucleic Acids Res. 12:7965-7973) indicated the greatest homology among these M proteins in the carboxyterminal third of the molecule. Secondary-structure analyses of this shared region indicated a structurally conserved, hydrophobic sequence which possibly interacted with the lipid bilayer. PMID:3754588

  19. Differentially Expressed Genes in Endometrium and Corpus Luteum of Holstein Cows Selected for High and Low Fertility Are Enriched for Sequence Variants Associated with Fertility.

    PubMed

    Moore, Stephen G; Pryce, Jennie E; Hayes, Ben J; Chamberlain, Amanda J; Kemper, Kathryn E; Berry, Donagh P; McCabe, Matt; Cormican, Paul; Lonergan, Pat; Fair, Trudee; Butler, Stephen T

    2016-01-01

    Despite the importance of fertility in humans and livestock, there has been little success dissecting the genetic basis of fertility. Our hypothesis was that genes differentially expressed in the endometrium and corpus luteum on Day 13 of the estrous cycle between cows with either good or poor genetic merit for fertility would be enriched for genetic variants associated with fertility. We combined a unique genetic model of fertility (cattle that have been selected for high and low fertility and show substantial difference in fertility) with gene expression data from these cattle and genome-wide association study (GWAS) results in ∼20,000 cattle to identify quantitative trait loci (QTL) regions and sequence variants associated with genetic variation in fertility. Two hundred and forty-five QTL regions and 17 sequence variants associated primarily with prostaglandin F2alpha, steroidogenesis, mRNA processing, energy status, and immune-related processes were identified. Ninety-three of the QTL regions were validated by two independent GWAS, with signals for fertility detected primarily on chromosomes 18, 5, 7, 8, and 29. Plausible causative mutations were identified, including one missense variant significantly associated with fertility and predicted to affect the protein function of EIF4EBP3. The results of this study enhance our understanding of 1) the contribution of the endometrium and corpus luteum transcriptome to phenotypic fertility differences and 2) the genetic architecture of fertility in dairy cattle. Including these variants in predictions of genomic breeding values may improve the rate of genetic gain for this critical trait. PMID:26607721

  20. Generic and sequence-variant specific molecular assays for the detection of the highly variable Grapevine leafroll-associated virus 3.

    PubMed

    Chooi, Kar Mun; Cohen, Daniel; Pearson, Michael N

    2013-04-01

    Grapevine leafroll-associated virus 3 (GLRaV-3) is an economically important virus, which is found in all grapevine growing regions worldwide. Its accurate detection in nursery and field samples is of high importance for certification schemes and disease management programmes. To reduce false negatives that can be caused by sequence variability, a new universal primer pair was designed against a divergent sequence data set, targeting the open reading frame 4 (heat shock protein 70 homologue gene), and optimised for conventional one-step RT-PCR and one-step SYBR Green real-time RT-PCR assays. In addition, primer pairs for the simultaneous detection of specific GLRaV-3 variants from groups 1, 2, 6 (specifically NZ-1) and the outlier NZ2 variant, and the generic detection of variants from groups 1 to 5 were designed and optimised as a conventional one-step multiplex RT-PCR assay using the plant nad5 gene as an internal control (i.e. one-step hexaplex RT-PCR). Results showed that the generic and variant specific assays detected in vitro RNA transcripts from a range of 1×10^1 to 1×10^8 copies of amplicon per μl diluted in healthy total RNA from Vitis vinifera cv. Cabernet Sauvignon. Furthermore, the assays were employed effectively to screen 157 germplasm and 159 commercial field samples. Thus results demonstrate that the GLRaV-3 generic and variant-specific assays are prospective tools that will be beneficial for certification schemes and disease management programmes, as well as biological and epidemiological studies of the divergent GLRaV-3 populations. PMID:23313884

  1. Targeted genomic enrichment and massively parallel sequencing identifies novel nonsyndromic hearing impairment pathogenic variants in Cameroonian families.

    PubMed

    Lebeko, K; Sloan-Heggen, C M; Noubiap, J J N; Dandara, C; Kolbe, D L; Ephraim, S S; Booth, K T; Azaiez, H; Santos-Cortez, R L P; Leal, S M; Smith, R J H; Wonkam, A

    2016-09-01

    In sub-Saharan Africa, GJB2-related nonsyndromic hearing impairment (NSHI) is rare. Ten Cameroonian families were studied using a platform (OtoSCOPE®) with 116 genes. In seven of 10 families (70%), 12 pathogenic variants were identified in six genes. Five of the 12 (41.6%) variants are novel. These results confirm the efficiency of comprehensive genetic testing in defining the causes of NSHI in sub-Saharan Africa. PMID:27246798

  2. Sialic Acid Is a Cellular Receptor for Coxsackievirus A24 Variant, an Emerging Virus with Pandemic Potential

    PubMed Central

    Nilsson, Emma C.; Jamshidi, Fariba; Johansson, Susanne M. C.; Oberste, M. Steven; Arnberg, Niklas

    2008-01-01