Sample records for sequence variants detected

  1. A statistical method for the detection of variants from next-generation resequencing of DNA pools.

    PubMed

    Bansal, Vikas

    2010-06-15

    Next-generation sequencing technologies have enabled the sequencing of several human genomes in their entirety. However, the routine resequencing of complete genomes remains infeasible. The massive capacity of next-generation sequencers can be harnessed for sequencing specific genomic regions in hundreds to thousands of individuals. Sequencing-based association studies are currently limited by the low level of multiplexing offered by sequencing platforms. Pooled sequencing represents a cost-effective approach for studying rare variants in large populations. To utilize the power of DNA pooling, it is important to accurately identify sequence variants from pooled sequencing data. Detection of rare variants from pooled sequencing represents a different challenge than detection of variants from individual sequencing. We describe a novel statistical approach, CRISP [Comprehensive Read analysis for Identification of Single Nucleotide Polymorphisms (SNPs) from Pooled sequencing] that is able to identify both rare and common variants by using two approaches: (i) comparing the distribution of allele counts across multiple pools using contingency tables and (ii) evaluating the probability of observing multiple non-reference base calls due to sequencing errors alone. Information about the distribution of reads between the forward and reverse strands and the size of the pools is also incorporated within this framework to filter out false variants. Validation of CRISP on two separate pooled sequencing datasets generated using the Illumina Genome Analyzer demonstrates that it can detect 80-85% of SNPs identified using individual sequencing while achieving a low false discovery rate (3-5%). Comparison with previous methods for pooled SNP detection demonstrates the significantly lower false positive and false negative rates for CRISP. Implementation of this method is available at http://polymorphism.scripps.edu/~vbansal/software/CRISP/.
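
    The two ideas named in this abstract can be illustrated with a small sketch. It is not the CRISP implementation: the Pearson chi-square statistic stands in for whatever contingency-table test CRISP actually uses, the error rate is an assumed input, and the function names are hypothetical.

```python
from math import comb

def binom_tail(k, n, err):
    """P(X >= k) for X ~ Binomial(n, err): the chance of seeing k or more
    non-reference base calls among n reads from sequencing errors alone."""
    return sum(comb(n, i) * err**i * (1 - err)**(n - i) for i in range(k, n + 1))

def chi_square_stat(ref_counts, alt_counts):
    """Pearson chi-square statistic for a 2 x P table of reference and
    alternate read counts across P pools (assumes at least one read overall)."""
    total_ref, total_alt = sum(ref_counts), sum(alt_counts)
    total = total_ref + total_alt
    stat = 0.0
    for ref, alt in zip(ref_counts, alt_counts):
        pool = ref + alt
        for observed, row_total in ((ref, total_ref), (alt, total_alt)):
            expected = pool * row_total / total
            if expected > 0:  # skip empty rows to avoid dividing by zero
                stat += (observed - expected) ** 2 / expected
    return stat

# Pools with identical allele proportions give a statistic of zero.
same = chi_square_stat([90, 90], [10, 10])
# A pool carrying all the alternate reads gives a large statistic.
skewed = chi_square_stat([10, 0], [0, 10])
```

    A large statistic suggests the allele is unevenly distributed across pools, while a small binomial tail probability suggests the non-reference calls are unlikely to be errors alone.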

  2. VarDict: a novel and versatile variant caller for next-generation sequencing in cancer research

    PubMed Central

    Lai, Zhongwu; Markovets, Aleksandra; Ahdesmaki, Miika; Chapman, Brad; Hofmann, Oliver; McEwen, Robert; Johnson, Justin; Dougherty, Brian; Barrett, J. Carl; Dry, Jonathan R.

    2016-01-01

    Accurate variant calling in next generation sequencing (NGS) is critical to understand cancer genomes better. Here we present VarDict, a novel and versatile variant caller for both DNA- and RNA-sequencing data. VarDict simultaneously calls SNV, MNV, InDels, complex and structural variants, expanding the detected genetic driver landscape of tumors. It performs local realignments on the fly for more accurate allele frequency estimation. VarDict performance scales linearly to sequencing depth, enabling ultra-deep sequencing used to explore tumor evolution or detect tumor DNA circulating in blood. In addition, VarDict performs amplicon aware variant calling for polymerase chain reaction (PCR)-based targeted sequencing often used in diagnostic settings, and is able to detect PCR artifacts. Finally, VarDict also detects differences in somatic and loss of heterozygosity variants between paired samples. VarDict reprocessing of The Cancer Genome Atlas (TCGA) Lung Adenocarcinoma dataset called known driver mutations in KRAS, EGFR, BRAF, PIK3CA and MET in 16% more patients than previously published variant calls. We believe VarDict will greatly facilitate application of NGS in clinical cancer research. PMID:27060149

  3. An efficient and scalable analysis framework for variant extraction and refinement from population-scale DNA sequence data.

    PubMed

    Jun, Goo; Wing, Mary Kate; Abecasis, Gonçalo R; Kang, Hyun Min

    2015-06-01

    The analysis of next-generation sequencing data is computationally and statistically challenging because of the massive volume of data and imperfect data quality. We present GotCloud, a pipeline for efficiently detecting and genotyping high-quality variants from large-scale sequencing data. GotCloud automates sequence alignment, sample-level quality control, variant calling, filtering of likely artifacts using machine-learning techniques, and genotype refinement using haplotype information. The pipeline can process thousands of samples in parallel and requires fewer computational resources than current alternatives. Experiments with whole-genome and exome-targeted sequence data generated by the 1000 Genomes Project show that the pipeline provides effective filtering against false positive variants and high power to detect true variants. Our pipeline has already contributed to variant detection and genotyping in several large-scale sequencing projects, including the 1000 Genomes Project and the NHLBI Exome Sequencing Project. We hope it will now prove useful to many medical sequencing studies. © 2015 Jun et al.; Published by Cold Spring Harbor Laboratory Press.

  4. Linkage disequilibrium among commonly genotyped SNP and variants detected from bull sequence

    USDA-ARS's Scientific Manuscript database

    Genomic prediction utilizing causal variants could increase selection accuracy above that achieved with SNP genotyped by commercial assays. A number of variants detected from sequencing influential sires are likely to be causal, but noticeable improvements in prediction accuracy using imputed sequen...

  5. Evaluation of Nine Somatic Variant Callers for Detection of Somatic Mutations in Exome and Targeted Deep Sequencing Data.

    PubMed

    Krøigård, Anne Bruun; Thomassen, Mads; Lænkholm, Anne-Vibeke; Kruse, Torben A; Larsen, Martin Jakob

    2016-01-01

    Next generation sequencing is extensively applied to catalogue somatic mutations in cancer, in research settings and increasingly in clinical settings for molecular diagnostics, guiding therapy decisions. Somatic variant callers perform paired comparisons of sequencing data from cancer tissue and matched normal tissue in order to detect somatic mutations. The advent of many new somatic variant callers creates a need for comparison and validation of the tools, as no de facto standard for detection of somatic mutations exists and only limited comparisons have been reported. We have performed a comprehensive evaluation using exome sequencing and targeted deep sequencing data of paired tumor-normal samples from five breast cancer patients to evaluate the performance of nine publicly available somatic variant callers: EBCall, Mutect, Seurat, Shimmer, Indelocator, Somatic Sniper, Strelka, VarScan 2 and Virmid for the detection of single nucleotide mutations and small deletions and insertions. We report a large variation in the number of calls from the nine somatic variant callers on the same sequencing data and highly variable agreement. Sequencing depth had markedly diverse impact on individual callers, as for some callers, increased sequencing depth highly improved sensitivity. For SNV calling, we report EBCall, Mutect, Virmid and Strelka to be the most reliable somatic variant callers for both exome sequencing and targeted deep sequencing. For indel calling, EBCall is superior due to high sensitivity and robustness to changes in sequencing depths.

  6. Evaluation of Nine Somatic Variant Callers for Detection of Somatic Mutations in Exome and Targeted Deep Sequencing Data

    PubMed Central

    Krøigård, Anne Bruun; Thomassen, Mads; Lænkholm, Anne-Vibeke; Kruse, Torben A.; Larsen, Martin Jakob

    2016-01-01

    Next generation sequencing is extensively applied to catalogue somatic mutations in cancer, in research settings and increasingly in clinical settings for molecular diagnostics, guiding therapy decisions. Somatic variant callers perform paired comparisons of sequencing data from cancer tissue and matched normal tissue in order to detect somatic mutations. The advent of many new somatic variant callers creates a need for comparison and validation of the tools, as no de facto standard for detection of somatic mutations exists and only limited comparisons have been reported. We have performed a comprehensive evaluation using exome sequencing and targeted deep sequencing data of paired tumor-normal samples from five breast cancer patients to evaluate the performance of nine publicly available somatic variant callers: EBCall, Mutect, Seurat, Shimmer, Indelocator, Somatic Sniper, Strelka, VarScan 2 and Virmid for the detection of single nucleotide mutations and small deletions and insertions. We report a large variation in the number of calls from the nine somatic variant callers on the same sequencing data and highly variable agreement. Sequencing depth had markedly diverse impact on individual callers, as for some callers, increased sequencing depth highly improved sensitivity. For SNV calling, we report EBCall, Mutect, Virmid and Strelka to be the most reliable somatic variant callers for both exome sequencing and targeted deep sequencing. For indel calling, EBCall is superior due to high sensitivity and robustness to changes in sequencing depths. PMID:27002637

  7. Multiplexed enrichment of rare DNA variants via sequence-selective and temperature-robust amplification

    PubMed Central

    Wu, Lucia R.; Chen, Sherry X.; Wu, Yalei; Patel, Abhijit A.; Zhang, David Yu

    2018-01-01

    Rare DNA-sequence variants hold important clinical and biological information, but existing detection techniques are expensive, complex, allele-specific, or don’t allow for significant multiplexing. Here, we report a temperature-robust polymerase-chain-reaction method, which we term blocker displacement amplification (BDA), that selectively amplifies all sequence variants, including single-nucleotide variants (SNVs), within a roughly 20-nucleotide window by 1,000-fold over wild-type sequences. This allows for easy detection and quantitation of hundreds of potential variants originally at ≤0.1% in allele frequency. BDA is compatible with inexpensive thermocycler instrumentation and employs a rationally designed competitive hybridization reaction to achieve comparable enrichment performance across annealing temperatures ranging from 56 °C to 64 °C. To show the sequence generality of BDA, we demonstrate enrichment of 156 SNVs and the reliable detection of single-digit copies. We also show that the BDA detection of rare driver mutations in cell-free DNA samples extracted from the blood plasma of lung-cancer patients is highly consistent with deep sequencing using molecular lineage tags, with a receiver operator characteristic accuracy of 95%. PMID:29805844
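
    To see why a 1,000-fold enrichment makes a 0.1% variant easy to detect, a back-of-the-envelope calculation helps. The sketch below assumes an idealized model in which every variant molecule is amplified by the same fold factor relative to wild type; the function name is hypothetical and the model is not taken from the paper.

```python
def enriched_vaf(initial_vaf, fold_enrichment):
    """Variant allele fraction after variants are amplified fold_enrichment
    times more than wild-type molecules (idealized, uniform enrichment)."""
    variant = initial_vaf * fold_enrichment
    wild_type = 1.0 - initial_vaf
    return variant / (variant + wild_type)

# A variant starting at 0.1% with 1,000-fold enrichment ends near 50%.
after = enriched_vaf(0.001, 1000)
```

    Under this model the variant goes from one molecule in a thousand to roughly half of the amplified product, which is well within reach of ordinary readout methods.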

  8. Higher criticism approach to detect rare variants using whole genome sequencing data

    PubMed Central

    2014-01-01

    Because of low statistical power of single-variant tests for whole genome sequencing (WGS) data, the association test for variant groups is a key approach for genetic mapping. To address the features of sparse and weak genetic effects to be detected, the higher criticism (HC) approach has been proposed and theoretically has proven optimal for detecting sparse and weak genetic effects. Here we develop a strategy to apply the HC approach to WGS data that contains rare variants as the majority. By using Genetic Analysis Workshop 18 "dose" genetic data with simulated phenotypes, we assess the performance of HC under a variety of strategies for grouping variants and collapsing rare variants. The HC approach is compared with the minimal p-value method and the sequence kernel association test. The results show that the HC approach is preferred for detecting weak genetic effects. PMID:25519367
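
    For readers unfamiliar with higher criticism, the statistic in its commonly cited form (Donoho and Jin) can be computed from a set of p-values as sketched below. This is a generic illustration, not the grouping or collapsing strategy evaluated in the paper; the `fraction` parameter, which restricts the search to the smallest p-values, is an assumed convention.

```python
from math import sqrt

def higher_criticism(p_values, fraction=0.5):
    """HC = max over i of sqrt(n) * (i/n - p_(i)) / sqrt(p_(i) * (1 - p_(i))),
    taken over the smallest `fraction` of the sorted p-values."""
    ordered = sorted(p_values)
    n = len(ordered)
    limit = max(1, int(fraction * n))
    best = float("-inf")
    for i, p in enumerate(ordered[:limit], start=1):
        if 0.0 < p < 1.0:  # the denominator is undefined at exactly 0 or 1
            best = max(best, sqrt(n) * (i / n - p) / sqrt(p * (1.0 - p)))
    return best

# One very small p-value among otherwise unremarkable ones drives HC up.
hc = higher_criticism([0.001, 0.5, 0.9, 0.8])
```

    The statistic is large when a few p-values are much smaller than expected under the null, which is the sparse and weak signal pattern the abstract describes.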

  9. Comparison and evaluation of two exome capture kits and sequencing platforms for variant calling.

    PubMed

    Zhang, Guoqiang; Wang, Jianfeng; Yang, Jin; Li, Wenjie; Deng, Yutian; Li, Jing; Huang, Jun; Hu, Songnian; Zhang, Bing

    2015-08-05

    To promote the clinical application of next-generation sequencing, it is important to obtain accurate and consistent variants of target genomic regions at low cost. Ion Proton, the latest updated semiconductor-based sequencing instrument from Life Technologies, is designed to provide investigators with an inexpensive platform for human whole exome sequencing that achieves a rapid turnaround time. However, few studies have comprehensively compared and evaluated the accuracy of variant calling between Ion Proton and Illumina sequencing platforms such as HiSeq 2000, which is the most popular sequencing platform for the human genome. The Ion Proton sequencer combined with the Ion TargetSeq Exome Enrichment Kit together make up TargetSeq-Proton, whereas SureSelect-HiSeq is based on the Agilent SureSelect Human All Exon v4 Kit and the HiSeq 2000 sequencer. Here, we sequenced exonic DNA from four human blood samples using both TargetSeq-Proton and SureSelect-HiSeq. We then called variants in the exonic regions that overlapped between the two exome capture kits (33.6 Mb). The rates of shared variant loci called by the two sequencing platforms ranged from 68.0 to 75.3% across the four samples, whereas the concordance of co-detected variant loci reached 99%. Sanger sequencing validation revealed that the validated rate of concordant single nucleotide polymorphisms (SNPs) (91.5%) was higher than the SNPs specific to TargetSeq-Proton (60.0%) or specific to SureSelect-HiSeq (88.3%). With regard to 1-bp small insertions and deletions (InDels), the Sanger sequencing validated rates of concordant variants (100.0%) and SureSelect-HiSeq-specific (89.6%) were higher than those of TargetSeq-Proton-specific (15.8%). In the sequencing of exonic regions, a combination of two sequencing strategies (SureSelect-HiSeq and TargetSeq-Proton) increased the variant calling specificity for concordant variant loci and the sensitivity for variant loci called by any one platform. However, for the sequencing of platform-specific variants, the accuracy of variant calling by HiSeq 2000 was higher than that of Ion Proton, particularly for InDel detection. Moreover, the variant calling software also influences the detection of SNPs and, specifically, InDels in Ion Proton exome sequencing.

  10. Detecting very low allele fraction variants using targeted DNA sequencing and a novel molecular barcode-aware variant caller.

    PubMed

    Xu, Chang; Nezami Ranjbar, Mohammad R; Wu, Zhong; DiCarlo, John; Wang, Yexun

    2017-01-03

    Detection of DNA mutations at very low allele fractions with high accuracy will significantly improve the effectiveness of precision medicine for cancer patients. To achieve this goal through next generation sequencing, researchers need a detection method that 1) captures rare mutation-containing DNA fragments efficiently in the mix of abundant wild-type DNA; 2) sequences the DNA library extensively to deep coverage; and 3) distinguishes low level true variants from amplification and sequencing errors with high accuracy. Targeted enrichment using PCR primers provides researchers with a convenient way to achieve deep sequencing for a small, yet most relevant region using benchtop sequencers. Molecular barcoding (or indexing) provides a unique solution for reducing sequencing artifacts analytically. Although different molecular barcoding schemes have been reported in recent literature, most variant calling has been done on limited targets, using simple custom scripts. The analytical performance of barcode-aware variant calling can be significantly improved by incorporating advanced statistical models. We present here a highly efficient, simple and scalable enrichment protocol that integrates molecular barcodes in multiplex PCR amplification. In addition, we developed smCounter, an open source, generic, barcode-aware variant caller based on a Bayesian probabilistic model. smCounter was optimized and benchmarked on two independent read sets with SNVs and indels at 5 and 1% allele fractions. Variants were called with very good sensitivity and specificity within coding regions. We demonstrated that we can accurately detect somatic mutations with allele fractions as low as 1% in coding regions using our enrichment protocol and variant caller.

  11. BlackOPs: increasing confidence in variant detection through mappability filtering.

    PubMed

    Cabanski, Christopher R; Wilkerson, Matthew D; Soloway, Matthew; Parker, Joel S; Liu, Jinze; Prins, Jan F; Marron, J S; Perou, Charles M; Hayes, D Neil

    2013-10-01

    Identifying variants using high-throughput sequencing data is currently a challenge because true biological variants can be indistinguishable from technical artifacts. One source of technical artifact results from incorrectly aligning experimentally observed sequences to their true genomic origin ('mismapping') and inferring differences in mismapped sequences to be true variants. We developed BlackOPs, an open-source tool that simulates experimental RNA-seq and DNA whole exome sequences derived from the reference genome, aligns these sequences by custom parameters, detects variants and outputs a blacklist of positions and alleles caused by mismapping. Blacklists contain thousands of artifact variants that are indistinguishable from true variants and, for a given sample, are expected to be almost completely false positives. We show that these blacklist positions are specific to the alignment algorithm and read length used, and BlackOPs allows users to generate a blacklist specific to their experimental setup. We queried the dbSNP and COSMIC variant databases and found numerous variants indistinguishable from mapping errors. We demonstrate how filtering against blacklist positions reduces the number of potential false variants using an RNA-seq glioblastoma cell line data set. In summary, accounting for mapping-caused variants tuned to experimental setups reduces false positives and, therefore, improves genome characterization by high-throughput sequencing.
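
    The filtering step described here amounts to a set lookup on position and allele. The sketch below is a minimal illustration, not the BlackOPs code or its file format: the record layout, the `filter_blacklisted` name and the example coordinates are all hypothetical.

```python
def filter_blacklisted(calls, blacklist):
    """Keep only calls whose (chrom, pos, alt) key is absent from the blacklist."""
    return [c for c in calls if (c["chrom"], c["pos"], c["alt"]) not in blacklist]

# Hypothetical blacklist entry and variant calls, for illustration only.
blacklist = {("chr1", 12345, "T")}
calls = [
    {"chrom": "chr1", "pos": 12345, "alt": "T"},  # matches position and allele
    {"chrom": "chr1", "pos": 12345, "alt": "G"},  # same position, other allele
    {"chrom": "chr2", "pos": 500, "alt": "A"},
]
kept = filter_blacklisted(calls, blacklist)
```

    Keying on the allele as well as the position means a different alternate allele at a blacklisted position is retained, which matches the abstract's description of a blacklist of positions and alleles.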

  12. Investigation of rare and low-frequency variants using high-throughput sequencing with pooled DNA samples

    PubMed Central

    Wang, Jingwen; Skoog, Tiina; Einarsdottir, Elisabet; Kaartokallio, Tea; Laivuori, Hannele; Grauers, Anna; Gerdhem, Paul; Hytönen, Marjo; Lohi, Hannes; Kere, Juha; Jiao, Hong

    2016-01-01

    High-throughput sequencing using pooled DNA samples can facilitate genome-wide studies on rare and low-frequency variants in a large population. Some major questions concerning the pooling sequencing strategy are whether rare and low-frequency variants can be detected reliably, and whether estimated minor allele frequencies (MAFs) can represent the actual values obtained from individually genotyped samples. In this study, we evaluated MAF estimates using three variant detection tools with two sets of pooled whole exome sequencing (WES) and one set of pooled whole genome sequencing (WGS) data. Both GATK and Freebayes displayed high sensitivity, specificity and accuracy when detecting rare or low-frequency variants. For the WGS study, 56% of the low-frequency variants in Illumina array have identical MAFs and 26% have one allele difference between sequencing and individual genotyping data. The MAF estimates from WGS correlated well (r = 0.94) with those from Illumina arrays. The MAFs from the pooled WES data also showed high concordance (r = 0.88) with those from the individual genotyping data. In conclusion, the MAFs estimated from pooled DNA sequencing data reflect the MAFs in individually genotyped samples well. The pooling strategy can thus be a rapid and cost-effective approach for the initial screening in large-scale association studies. PMID:27633116
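
    A pooled MAF estimate and its comparison with genotype data can be sketched as follows. The sketch assumes diploid individuals contributing equal amounts of DNA to the pool, and it reads "one allele difference" as an allele count that differs by one; both are assumptions, and the function names and numbers are hypothetical.

```python
def pooled_maf(alt_reads, depth):
    """Minor allele frequency estimated from read counts at one site in a pool."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    freq = alt_reads / depth
    return min(freq, 1.0 - freq)

def estimated_allele_count(alt_reads, depth, n_individuals):
    """Estimated number of minor alleles among the 2N chromosomes in the pool."""
    return round(pooled_maf(alt_reads, depth) * 2 * n_individuals)

# 30 alternate reads at 1,000x depth in a pool of 50 individuals.
count = estimated_allele_count(30, 1000, 50)
# Compared with a hypothetical count of 4 from individual genotyping.
difference = abs(count - 4)
```

    Converting the read fraction to a whole number of alleles makes it straightforward to ask whether pooled and individual data agree exactly or differ by a single allele.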

  13. Clinical Validation and Implementation of a Targeted Next-Generation Sequencing Assay to Detect Somatic Variants in Non-Small Cell Lung, Melanoma, and Gastrointestinal Malignancies

    PubMed Central

    Fisher, Kevin E.; Zhang, Linsheng; Wang, Jason; Smith, Geoffrey H.; Newman, Scott; Schneider, Thomas M.; Pillai, Rathi N.; Kudchadkar, Ragini R.; Owonikoko, Taofeek K.; Ramalingam, Suresh S.; Lawson, David H.; Delman, Keith A.; El-Rayes, Bassel F.; Wilson, Malania M.; Sullivan, H. Clifford; Morrison, Annie S.; Balci, Serdar; Adsay, N. Volkan; Gal, Anthony A.; Sica, Gabriel L.; Saxe, Debra F.; Mann, Karen P.; Hill, Charles E.; Khuri, Fadlo R.; Rossi, Michael R.

    2017-01-01

    We tested and clinically validated a targeted next-generation sequencing (NGS) mutation panel using 80 formalin-fixed, paraffin-embedded (FFPE) tumor samples. Forty non-small cell lung carcinoma (NSCLC), 30 melanoma, and 30 gastrointestinal (12 colonic, 10 gastric, and 8 pancreatic adenocarcinoma) FFPE samples were selected from laboratory archives. After appropriate specimen and nucleic acid quality control, 80 NGS libraries were prepared using the Illumina TruSight tumor (TST) kit and sequenced on the Illumina MiSeq. Sequence alignment, variant calling, and sequencing quality control were performed using vendor software and laboratory-developed analysis workflows. TST generated ≥500× coverage for 98.4% of the 13,952 targeted bases. Reproducible and accurate variant calling was achieved at ≥5% variant allele frequency with 8 to 12 multiplexed samples per MiSeq flow cell. TST detected 112 variants overall, and confirmed all known single-nucleotide variants (n = 27), deletions (n = 5), insertions (n = 3), and multinucleotide variants (n = 3). TST detected at least one variant in 85.0% (68/80), and two or more variants in 36.2% (29/80), of samples. TP53 was the most frequently mutated gene in NSCLC (13 variants; 13/32 samples), gastrointestinal malignancies (15 variants; 13/25 samples), and overall (30 variants; 28/80 samples). BRAF mutations were most common in melanoma (nine variants; 9/23 samples). Clinically relevant NGS data can be obtained from routine clinical FFPE solid tumor specimens using TST, benchtop instruments, and vendor-supplied bioinformatics pipelines. PMID:26801070

  14. Validation of a next-generation sequencing assay for clinical molecular oncology.

    PubMed

    Cottrell, Catherine E; Al-Kateb, Hussam; Bredemeyer, Andrew J; Duncavage, Eric J; Spencer, David H; Abel, Haley J; Lockwood, Christina M; Hagemann, Ian S; O'Guin, Stephanie M; Burcea, Lauren C; Sawyer, Christopher S; Oschwald, Dayna M; Stratman, Jennifer L; Sher, Dorie A; Johnson, Mark R; Brown, Justin T; Cliften, Paul F; George, Bijoy; McIntosh, Leslie D; Shrivastava, Savita; Nguyen, Tudung T; Payton, Jacqueline E; Watson, Mark A; Crosby, Seth D; Head, Richard D; Mitra, Robi D; Nagarajan, Rakesh; Kulkarni, Shashikant; Seibert, Karen; Virgin, Herbert W; Milbrandt, Jeffrey; Pfeifer, John D

    2014-01-01

    Currently, oncology testing includes molecular studies and cytogenetic analysis to detect genetic aberrations of clinical significance. Next-generation sequencing (NGS) allows rapid analysis of multiple genes for clinically actionable somatic variants. The WUCaMP assay uses targeted capture for NGS analysis of 25 cancer-associated genes to detect mutations at actionable loci. We present clinical validation of the assay and a detailed framework for design and validation of similar clinical assays. Deep sequencing of 78 tumor specimens (≥ 1000× average unique coverage across the capture region) achieved high sensitivity for detecting somatic variants at low allele fraction (AF). Validation revealed sensitivities and specificities of 100% for detection of single-nucleotide variants (SNVs) within coding regions, compared with SNP array sequence data (95% CI = 83.4-100.0 for sensitivity and 94.2-100.0 for specificity) or whole-genome sequencing (95% CI = 89.1-100.0 for sensitivity and 99.9-100.0 for specificity) of HapMap samples. Sensitivity for detecting variants at an observed 10% AF was 100% (95% CI = 93.2-100.0) in HapMap mixes. Analysis of 15 masked specimens harboring clinically reported variants yielded concordant calls for 13/13 variants at AF of ≥ 15%. The WUCaMP assay is a robust and sensitive method to detect somatic variants of clinical significance in molecular oncology laboratories, with reduced time and cost of genetic analysis allowing for strategic patient management. Copyright © 2014 American Society for Investigative Pathology and the Association for Molecular Pathology. Published by Elsevier Inc. All rights reserved.
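
    Reported sensitivities of 100% still carry a confidence interval whose lower bound depends on the number of variants tested. The abstract does not state which interval method was used or how many variants underlie each estimate, so the sketch below simply shows the exact (Clopper-Pearson) interval for the special case where every trial succeeds; the function name is hypothetical.

```python
def all_success_lower_bound(n, confidence=0.95):
    """Lower limit of the two-sided Clopper-Pearson interval when all n trials
    succeed. The upper limit is 1. In this special case the limit has the
    closed form (alpha / 2) ** (1 / n)."""
    if n <= 0:
        raise ValueError("n must be positive")
    alpha = 1.0 - confidence
    return (alpha / 2.0) ** (1.0 / n)

# With 10 of 10 variants detected, the lower limit is about 69%.
lower_10 = all_success_lower_bound(10)
# With 100 of 100 detected, the lower limit rises to about 96%.
lower_100 = all_success_lower_bound(100)
```

    The lower bound rises with sample size, which is why the same 100% point estimate can come with quite different intervals across comparisons.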

  15. Pooled-DNA Sequencing for Elucidating New Genomic Risk Factors, Rare Variants Underlying Alzheimer's Disease.

    PubMed

    Jin, Sheng Chih; Benitez, Bruno A; Deming, Yuetiva; Cruchaga, Carlos

    2016-01-01

    Analyses of genome-wide association studies (GWAS) for complex disorders usually identify common variants with a relatively small effect size that only explain a small proportion of phenotypic heritability. Several studies have suggested that a significant fraction of heritability may be explained by low-frequency (minor allele frequency (MAF) of 1-5%) and rare variants that are not contained in the commercial GWAS genotyping arrays (Schork et al., Curr Opin Genet Dev 19:212, 2009). Rare variants can also have relatively large effects on risk for developing human diseases or disease phenotype (Cruchaga et al., PLoS One 7:e31039, 2012). However, it is necessary to perform next-generation sequencing (NGS) studies in a large population (>4,000 samples) to detect a significant rare-variant association. Several NGS methods, such as custom capture sequencing and amplicon-based sequencing, are designed to screen a small proportion of the genome, but most of these methods are limited in the number of samples that can be multiplexed (i.e., most sequencing kits provide only 96 distinct indexes). Additionally, the sequencing library preparation for 4,000 samples remains expensive, and thus conducting NGS studies with the aforementioned methods is not feasible for most research laboratories. The need for low-cost, large-scale rare-variant detection makes pooled-DNA sequencing an ideally efficient and cost-effective technique to identify rare variants in target regions by sequencing hundreds to thousands of samples. Our recent work has demonstrated that pooled-DNA sequencing can accurately detect rare variants in targeted regions in multiple DNA samples with high sensitivity and specificity (Jin et al., Alzheimers Res Ther 4:34, 2012). In these studies we used a well-established pooled-DNA sequencing approach and a computational package, SPLINTER (short indel prediction by large deviation inference and nonlinear true frequency estimation by recursion) (Vallania et al., Genome Res 20:1711, 2010), for accurate identification of rare variants in large DNA pools. Given an average sequencing coverage of 30× per haploid genome, SPLINTER can detect rare variants and short indels up to 4 base pairs (bp) with high sensitivity and specificity (up to 1 haploid allele in a pool as large as 500 individuals). Step-by-step instructions on how to conduct pooled-DNA sequencing experiments and data analyses are described in this chapter.
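
    The coverage figures quoted for SPLINTER translate into concrete read counts. The sketch below works through that arithmetic under the assumption of diploid individuals contributing equally to the pool; the function name is hypothetical and nothing here reflects SPLINTER's internals.

```python
def pooled_expectation(n_individuals, coverage_per_haploid, variant_alleles=1):
    """Return (haploid genomes, total depth, allele frequency, expected variant
    reads) for a pool of diploid individuals sequenced at a given coverage per
    haploid genome."""
    haploid_genomes = 2 * n_individuals
    total_depth = coverage_per_haploid * haploid_genomes
    allele_freq = variant_alleles / haploid_genomes
    # Each haploid genome contributes coverage_per_haploid reads on average.
    expected_variant_reads = coverage_per_haploid * variant_alleles
    return haploid_genomes, total_depth, allele_freq, expected_variant_reads

# 500 individuals at 30x per haploid genome, one variant allele in the pool:
# 1,000 haploid genomes, 30,000x total depth, frequency 0.001, about 30 reads.
result = pooled_expectation(500, 30)
```

    A single allele in a pool of 500 therefore sits at a frequency of 0.1% and is expected to appear in about 30 of 30,000 reads, so the caller must separate it from sequencing errors occurring at a comparable rate.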

  16. Diff-seq: A high throughput sequencing-based mismatch detection assay for DNA variant enrichment and discovery

    PubMed Central

    Karas, Vlad O; Sinnott-Armstrong, Nicholas A; Varghese, Vici; Shafer, Robert W; Greenleaf, William J; Sherlock, Gavin

    2018-01-01

    Much of the within-species genetic variation is in the form of single nucleotide polymorphisms (SNPs), typically detected by whole genome sequencing (WGS) or microarray-based technologies. However, WGS produces mostly uninformative reads that perfectly match the reference, while microarrays require genome-specific reagents. We have developed Diff-seq, a sequencing-based mismatch detection assay for SNP discovery without the requirement for specialized nucleic-acid reagents. Diff-seq leverages the Surveyor endonuclease to cleave mismatched DNA molecules that are generated after cross-annealing of a complex pool of DNA fragments. Sequencing libraries enriched for Surveyor-cleaved molecules result in increased coverage at the variant sites. Diff-seq detected all mismatches present in an initial test substrate, with specific enrichment dependent on the identity and context of the variation. Application to viral sequences resulted in increased observation of variant alleles in a biologically relevant context. Diff-seq has the potential to increase the sensitivity and efficiency of high-throughput sequencing in the detection of variation. PMID:29361139

  17. MYO7A and USH2A gene sequence variants in Italian patients with Usher syndrome.

    PubMed

    Sodi, Andrea; Mariottini, Alessandro; Passerini, Ilaria; Murro, Vittoria; Tachyla, Iryna; Bianchi, Benedetta; Menchini, Ugo; Torricelli, Francesca

    2014-01-01

    To analyze the spectrum of sequence variants in the MYO7A and USH2A genes in a group of Italian patients affected by Usher syndrome (USH). Thirty-six Italian patients with a diagnosis of USH were recruited. They received a standard ophthalmologic examination, visual field testing, optical coherence tomography (OCT) scan, and electrophysiological tests. Fluorescein angiography and fundus autofluorescence imaging were performed in selected cases. All the patients underwent an audiologic examination for the 0.25-8,000 Hz frequencies. Vestibular function was evaluated with specific tests. DNA samples were analyzed for sequence variants of the MYO7A gene (for USH1) and the USH2A gene (for USH2) with direct sequencing techniques. A few patients were analyzed for both genes. In the MYO7A gene, ten missense variants were found; three patients were compound heterozygous, and two were homozygous. Thirty-four USH2A gene variants were detected, including eight missense variants, nine nonsense variants, six splicing variants, and 11 duplications/deletions; 19 patients were compound heterozygous, and three were homozygous. Four MYO7A and 17 USH2A variants have already been described in the literature. Among the novel mutations there are four USH2A large deletions, detected with multiplex ligation dependent probe amplification (MLPA) technology. Two potentially pathogenic variants were found in 27 patients (75%). Affected patients showed variable clinical pictures without a clear genotype-phenotype correlation. Ten variants in the MYO7A gene and 34 variants in the USH2A gene were detected in Italian patients with USH at a high detection rate. A selective analysis of these genes may be valuable for molecular analysis, combining diagnostic efficiency with little time wastage and less resource consumption.

  18. Ultra-deep mutant spectrum profiling: improving sequencing accuracy using overlapping read pairs.

    PubMed

    Chen-Harris, Haiyin; Borucki, Monica K; Torres, Clinton; Slezak, Tom R; Allen, Jonathan E

    2013-02-12

    High throughput sequencing is beginning to make a transformative impact in the area of viral evolution. Deep sequencing has the potential to reveal the mutant spectrum within a viral sample at high resolution, thus enabling the close examination of viral mutational dynamics both within and between hosts. The challenge, however, is to accurately model the errors in the sequencing data and differentiate real viral mutations, particularly those that exist at low frequencies, from sequencing errors. We demonstrate that overlapping read pairs (ORP) -- generated by combining short fragment sequencing libraries and longer sequencing reads -- significantly reduce sequencing error rates and improve rare variant detection accuracy. Using this sequencing protocol and an error model optimized for variant detection, we are able to capture a large number of genetic mutations present within a viral population at ultra-low frequency levels (<0.05%). Our rare variant detection strategies have important implications beyond viral evolution and can be applied to any basic and clinical research area that requires the identification of rare mutations.
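
    The benefit of overlapping read pairs can be illustrated with a simple probability argument. The sketch below is not the authors' error model: it assumes the two reads err independently at the same per-base rate and that a miscalled base is equally likely to be any of the three alternatives. The 1% rate is an illustrative value, not one reported in the abstract.

```python
def concordant_error_rate(per_read_error, n_alternatives=3):
    """Probability that both overlapping reads miscall a base AND agree on the
    same wrong base, assuming independent errors spread evenly over the
    alternative bases."""
    return per_read_error ** 2 / n_alternatives

# With an illustrative 1% per-read error rate, the concordant error rate is
# about 3.3e-5, below a 0.05% (5e-4) variant frequency.
paired = concordant_error_rate(0.01)
```

    Errors introduced before sequencing, such as PCR errors, are shared by both reads of a pair and are not reduced by this mechanism, so the figure is a best case under the stated assumptions.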

  19. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ruggles, Kelly V.; Tang, Zuojian; Wang, Xuya

    Improvements in mass spectrometry (MS)-based peptide sequencing provide a new opportunity to determine whether polymorphisms, mutations and splice variants identified in cancer cells are translated. Herein we therefore describe a proteogenomic data integration tool (QUILTS) and illustrate its application to whole genome, transcriptome and global MS peptide sequence datasets generated from a pair of luminal and basal-like breast cancer patient derived xenografts (PDX). The sensitivity of proteogenomic analysis for single nucleotide variant (SNV) expression and novel splice junction (NSJ) detection was probed using multiple MS/MS process replicates. Despite over thirty sample replicates, only about 10% of all SNV (somatic and germline) detected by both DNA and RNA sequencing were observed as peptides. An even smaller proportion of peptides corresponding to NSJ observed by RNA sequencing were detected (<0.1%). Peptides mapping to DNA-detected SNV without a detectable mRNA transcript were also observed, demonstrating that the transcriptome coverage was also incomplete (~80%). In contrast to germ-line variants, somatic variants were less likely to be detected at the peptide level in the basal-like tumor than the luminal tumor, raising the possibility of differential translation or protein degradation effects. In conclusion, the QUILTS program integrates DNA, RNA and peptide sequencing to assess the degree to which somatic mutations are translated and therefore biologically active. By identifying gaps in sequence coverage QUILTS benchmarks current technology and assesses progress towards whole cancer proteome and transcriptome analysis.

  20. Genotype-specific signal generation based on digestion of 3-way DNA junctions: application to KRAS variation detection.

    PubMed

    Amicarelli, Giulia; Adlerstein, Daniel; Shehi, Erlet; Wang, Fengfei; Makrigiorgos, G Mike

    2006-10-01

    Genotyping methods that reveal single-nucleotide differences are useful for a wide range of applications. We used digestion of 3-way DNA junctions in a novel technology, OneCutEventAmplificatioN (OCEAN) that allows sequence-specific signal generation and amplification. We combined OCEAN with peptide-nucleic-acid (PNA)-based variant enrichment to detect and simultaneously genotype v-Ki-ras2 Kirsten rat sarcoma viral oncogene homolog (KRAS) codon 12 sequence variants in human tissue specimens. We analyzed KRAS codon 12 sequence variants in 106 lung cancer surgical specimens. We conducted a PNA-PCR reaction that suppresses wild-type KRAS amplification and genotyped the product with a set of OCEAN reactions carried out in fluorescence microplate format. The isothermal OCEAN assay enabled a 3-way DNA junction to form between the specific target nucleic acid, a fluorescently labeled "amplifier", and an "anchor". The amplifier-anchor contact contains the recognition site for a restriction enzyme. Digestion produces a cleaved amplifier and generation of a fluorescent signal. The cleaved amplifier dissociates from the 3-way DNA junction, allowing a new amplifier to bind and propagate the reaction. The system detected and genotyped KRAS sequence variants down to approximately 0.3% variant-to-wild-type alleles. PNA-PCR/OCEAN had a concordance rate with PNA-PCR/sequencing of 93% to 98%, depending on the exact implementation. Concordance rate with restriction endonuclease-mediated selective-PCR/sequencing was 89%. OCEAN is a practical and low-cost novel technology for sequence-specific signal generation. Reliable analysis of KRAS sequence alterations in human specimens circumvents the requirement for sequencing. Application is expected in genotyping KRAS codon 12 sequence variants in surgical specimens or in bodily fluids, as well as single-base variations and sequence alterations in other genes.

  1. Analysis of selected genes associated with cardiomyopathy by next-generation sequencing.

    PubMed

    Szabadosova, Viktoria; Boronova, Iveta; Ferenc, Peter; Tothova, Iveta; Bernasovska, Jarmila; Zigova, Michaela; Kmec, Jan; Bernasovsky, Ivan

    2018-02-01

    As the leading cause of congestive heart failure, cardiomyopathy represents a heterogeneous group of heart muscle disorders. Despite considerable progress being made in the genetic diagnosis of cardiomyopathy by detection of the mutations in the most prevalent cardiomyopathy genes, the cause remains unsolved in many patients. High-throughput mutation screening in the disease genes for cardiomyopathy is now possible through target enrichment followed by next-generation sequencing. The aim of the study was to analyze a panel of genes associated with dilated or hypertrophic cardiomyopathy based on previously published results in order to identify the subjects at risk. Next-generation sequencing on the Illumina HiSeq 2500 platform was used to detect sequence variants in 16 individuals diagnosed with dilated or hypertrophic cardiomyopathy. Detected variants were filtered and the functional impact of amino acid changes was predicted by computational programs. DNA samples of the 16 patients were analyzed by whole exome sequencing. We identified six nonsynonymous variants that were shown to be pathogenic by all of the prediction software used: rs3744998 (EPG5), rs11551768 (MGME1), rs148374985 (MURC), rs78461695 (PLEC), rs17158558 (RET) and rs2295190 (SYNE1). Two of the analyzed sequence variants had minor allele frequency (MAF)<0.01: rs148374985 (MURC), rs34580776 (MYBPC3). Our data support the potential role of the detected variants in pathogenesis of dilated or hypertrophic cardiomyopathy; however, the possibility that these variants might not be true disease-causing variants but are susceptibility alleles that require additional mutations or injury to cause the clinical phenotype of disease must be considered. © 2017 Wiley Periodicals, Inc.

  2. SvABA: genome-wide detection of structural variants and indels by local assembly.

    PubMed

    Wala, Jeremiah A; Bandopadhayay, Pratiti; Greenwald, Noah F; O'Rourke, Ryan; Sharpe, Ted; Stewart, Chip; Schumacher, Steve; Li, Yilong; Weischenfeldt, Joachim; Yao, Xiaotong; Nusbaum, Chad; Campbell, Peter; Getz, Gad; Meyerson, Matthew; Zhang, Cheng-Zhong; Imielinski, Marcin; Beroukhim, Rameen

    2018-04-01

    Structural variants (SVs), including small insertion and deletion variants (indels), are challenging to detect through standard alignment-based variant calling methods. Sequence assembly offers a powerful approach to identifying SVs, but is difficult to apply at scale genome-wide for SV detection due to its computational complexity and the difficulty of extracting SVs from assembly contigs. We describe SvABA, an efficient and accurate method for detecting SVs from short-read sequencing data using genome-wide local assembly with low memory and computing requirements. We evaluated SvABA's performance on the NA12878 human genome and in simulated and real cancer genomes. SvABA demonstrates superior sensitivity and specificity across a large spectrum of SVs and substantially improves detection performance for variants in the 20-300 bp range, compared with existing methods. SvABA also identifies complex somatic rearrangements with chains of short (<1000 bp) templated-sequence insertions copied from distant genomic regions. We applied SvABA to 344 cancer genomes from 11 cancer types and found that short templated-sequence insertions occur in ∼4% of all somatic rearrangements. Finally, we demonstrate that SvABA can identify sites of viral integration and cancer driver alterations containing medium-sized (50-300 bp) SVs. © 2018 Wala et al.; Published by Cold Spring Harbor Laboratory Press.

  3. Comparison of Illumina and 454 deep sequencing in participants failing raltegravir-based antiretroviral therapy.

    PubMed

    Li, Jonathan Z; Chapman, Brad; Charlebois, Patrick; Hofmann, Oliver; Weiner, Brian; Porter, Alyssa J; Samuel, Reshmi; Vardhanabhuti, Saran; Zheng, Lu; Eron, Joseph; Taiwo, Babafemi; Zody, Michael C; Henn, Matthew R; Kuritzkes, Daniel R; Hide, Winston; Wilson, Cara C; Berzins, Baiba I; Acosta, Edward P; Bastow, Barbara; Kim, Peter S; Read, Sarah W; Janik, Jennifer; Meres, Debra S; Lederman, Michael M; Mong-Kryspin, Lori; Shaw, Karl E; Zimmerman, Louis G; Leavitt, Randi; De La Rosa, Guy; Jennings, Amy

    2014-01-01

    The impact of raltegravir-resistant HIV-1 minority variants (MVs) on raltegravir treatment failure is unknown. Illumina sequencing offers greater throughput than 454, but sequence analysis tools for viral sequencing are needed. We evaluated Illumina and 454 for the detection of HIV-1 raltegravir-resistant MVs. A5262 was a single-arm study of raltegravir and darunavir/ritonavir in treatment-naïve patients. Pre-treatment plasma was obtained from 5 participants with raltegravir resistance at the time of virologic failure. A control library was created by pooling integrase clones at predefined proportions. Multiplexed sequencing was performed with Illumina and 454 platforms at comparable costs. Illumina sequence analysis was performed with the novel snp-assess tool and 454 sequencing was analyzed with V-Phaser. Illumina sequencing resulted in significantly higher sequence coverage and a 0.095% limit of detection. Illumina accurately detected all MVs in the control library at ≥0.5% and 7/10 MVs expected at 0.1%. 454 sequencing failed to detect any MVs at 0.1% with 5 false positive calls. For MVs detected in the patient samples by both 454 and Illumina, the correlation in the detected variant frequencies was high (R2 = 0.92, P<0.001). Illumina sequencing detected 2.4-fold greater nucleotide MVs and 2.9-fold greater amino acid MVs compared to 454. The only raltegravir-resistant MV detected was an E138K mutation in one participant by Illumina sequencing, but not by 454. In participants of A5262 with raltegravir resistance at virologic failure, baseline raltegravir-resistant MVs were rarely detected. At comparable costs to 454 sequencing, Illumina demonstrated greater depth of coverage, increased sensitivity for detecting HIV MVs, and fewer false positive variant calls.

  4. Quick, sensitive and specific detection and evaluation of quantification of minor variants by high-throughput sequencing.

    PubMed

    Leung, Ross Ka-Kit; Dong, Zhi Qiang; Sa, Fei; Chong, Cheong Meng; Lei, Si Wan; Tsui, Stephen Kwok-Wing; Lee, Simon Ming-Yuen

    2014-02-01

    Minor variants have significant implications in quasispecies evolution, early cancer detection and non-invasive fetal genotyping but their accurate detection by next-generation sequencing (NGS) is hampered by sequencing errors. We generated sequencing data from mixtures at predetermined ratios in order to provide insight into sequencing errors and variations that can arise for which simulation cannot be performed. The information also enables better parameterization in depth of coverage, read quality and heterogeneity, library preparation techniques, technical repeatability for mathematical modeling, theory development and simulation experimental design. We devised minor variant authentication rules that achieved 100% accuracy in both testing and validation experiments. The rules are free from tedious inspection of alignment accuracy, sequencing read quality or errors introduced by homopolymers. The authentication processes only require minor variants to: (1) have minimum depth of coverage larger than 30; (2) be reported by (a) four or more variant callers, or (b) DiBayes or LoFreq, plus SNVer (or BWA when no results are returned by SNVer), and with the interassay coefficient of variation (CV) no larger than 0.1. Quantification accuracy undermined by sequencing errors could neither be overcome by ultra-deep sequencing, nor recruiting more variant callers to reach a consensus, such that consistent underestimation and overestimation (i.e. low CV) were observed. To accommodate stochastic error and adjust the observed ratio within a specified accuracy, we presented a proof of concept for the use of a double calibration curve for quantification, which provides an important reference towards potential industrial-scale fabrication of calibrants for NGS.
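
    The authentication rules listed in this abstract can be sketched as a small predicate. This is only one reading of the abstract, not the authors' implementation: the `MinorVariantCall` record, its field names, and the use of the sample standard deviation for the coefficient of variation are assumptions, and the CV limit is applied here to every candidate rather than to rule (b) alone, which the wording leaves open.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass(frozen=True)
class MinorVariantCall:
    """Hypothetical record for one candidate minor variant."""
    min_depth: int                # lowest depth of coverage seen across assays
    callers: frozenset            # names of the callers that reported the variant
    snver_returned_results: bool  # False when SNVer produced no output at all
    frequencies: tuple            # variant frequency measured in each replicate assay


def coefficient_of_variation(values) -> float:
    """Sample standard deviation divided by the mean (assumed definition)."""
    return stdev(values) / mean(values)


def is_authenticated(call: MinorVariantCall,
                     depth_floor: int = 30,
                     min_callers: int = 4,
                     max_cv: float = 0.1) -> bool:
    # Rule (1): depth of coverage must be strictly larger than 30.
    if call.min_depth <= depth_floor:
        return False
    # Interassay CV needs at least two replicates and must not exceed 0.1.
    if len(call.frequencies) < 2:
        return False
    if coefficient_of_variation(call.frequencies) > max_cv:
        return False
    # Rule (2a): four or more variant callers agree.
    if len(call.callers) >= min_callers:
        return True
    # Rule (2b): DiBayes or LoFreq, plus SNVer (BWA only if SNVer returned nothing).
    sensitive_caller = bool(call.callers & {"DiBayes", "LoFreq"})
    confirming_caller = "SNVer" if call.snver_returned_results else "BWA"
    return sensitive_caller and confirming_caller in call.callers


if __name__ == "__main__":
    example = MinorVariantCall(
        min_depth=50,
        callers=frozenset({"LoFreq", "SNVer"}),
        snver_returned_results=True,
        frequencies=(0.0100, 0.0105, 0.0110),
    )
    print(is_authenticated(example))
```

    Keeping the thresholds as keyword arguments makes it easy to see how sensitive a call set is to each rule without touching the logic.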

  5. Increased Sensitivity of Diagnostic Mutation Detection by Re-analysis Incorporating Local Reassembly of Sequence Reads.

    PubMed

    Watson, Christopher M; Camm, Nick; Crinnion, Laura A; Clokie, Samuel; Robinson, Rachel L; Adlard, Julian; Charlton, Ruth; Markham, Alexander F; Carr, Ian M; Bonthron, David T

    2017-12-01

    Diagnostic genetic testing programmes based on next-generation DNA sequencing have resulted in the accrual of large datasets of targeted raw sequence data. Most diagnostic laboratories process these data through an automated variant-calling pipeline. Validation of the chosen analytical methods typically depends on confirming the detection of known sequence variants. Despite improvements in short-read alignment methods, current pipelines are known to be comparatively poor at detecting large insertion/deletion mutations. We performed clinical validation of a local reassembly tool, ABRA (assembly-based realigner), through retrospective reanalysis of a cohort of more than 2000 hereditary cancer cases. ABRA enabled detection of a 96-bp deletion, 4-bp insertion mutation in PMS2 that had been initially identified using a comparative read-depth approach. We applied an updated pipeline incorporating ABRA to the entire cohort of 2000 cases and identified one previously undetected pathogenic variant, a 23-bp duplication in PTEN. We demonstrate the effect of read length on the ability to detect insertion/deletion variants by comparing HiSeq2500 (2 × 101-bp) and NextSeq500 (2 × 151-bp) sequence data for a range of variants and thereby show that the limitations of shorter read lengths can be mitigated using appropriate informatics tools. This work highlights the need for ongoing development of diagnostic pipelines to maximize test sensitivity. We also draw attention to the large differences in computational infrastructure required to perform day-to-day versus large-scale reprocessing tasks.

  6. Sensitivity of BRCA1/2 testing in high-risk breast/ovarian/male breast cancer families: little contribution of comprehensive RNA/NGS panel testing.

    PubMed

    Byers, Helen; Wallis, Yvonne; van Veen, Elke M; Lalloo, Fiona; Reay, Kim; Smith, Philip; Wallace, Andrew J; Bowers, Naomi; Newman, William G; Evans, D Gareth

    2016-11-01

    The sensitivity of testing BRCA1 and BRCA2 remains unresolved as the frequency of deep intronic splicing variants has not been defined in high-risk familial breast/ovarian cancer families. This variant category is reported at significant frequency in other tumour predisposition genes, including NF1 and MSH2. We carried out comprehensive whole gene RNA analysis on 45 high-risk breast/ovary and male breast cancer families with no identified pathogenic variant on exonic sequencing and copy number analysis of BRCA1/2. In addition, we undertook variant screening of a 10-gene high/moderate risk breast/ovarian cancer panel by next-generation sequencing. DNA testing identified the causative variant in 50/56 (89%) breast/ovarian/male breast cancer families with Manchester scores of ≥50 with two variants being confirmed to affect splicing on RNA analysis. RNA sequencing of BRCA1/BRCA2 on 45 individuals from high-risk families identified no deep intronic variants and did not suggest loss of RNA expression as a cause of lost sensitivity. Panel testing in 42 samples identified a known RAD51D variant, a high-risk ATM variant in another breast ovary family and a truncating CHEK2 mutation. Current exonic sequencing and copy number analysis variant detection methods of BRCA1/2 have high sensitivity in high-risk breast/ovarian cancer families. Sequence analysis of RNA does not identify any variants undetected by current analysis of BRCA1/2. However, RNA analysis clarified the pathogenicity of variants of unknown significance detected by current methods. The low diagnostic uplift achieved through sequence analysis of the other known breast/ovarian cancer susceptibility genes indicates that further high-risk genes remain to be identified.

  7. Resequencing Pathogen Microarray (RPM) for prospective detection and identification of emergent pathogen strains and variants

    NASA Astrophysics Data System (ADS)

    Tibbetts, Clark; Lichanska, Agnieszka M.; Borsuk, Lisa A.; Weslowski, Brian; Morris, Leah M.; Lorence, Matthew C.; Schafer, Klaus O.; Campos, Joseph; Sene, Mohamadou; Myers, Christopher A.; Faix, Dennis; Blair, Patrick J.; Brown, Jason; Metzgar, David

    2010-04-01

    High-density resequencing microarrays support simultaneous detection and identification of multiple viral and bacterial pathogens. Because detection and identification using RPM is based upon multiple specimen-specific target pathogen gene sequences generated in the individual test, the test results enable both a differential diagnostic analysis and epidemiological tracking of detected pathogen strains and variants from one specimen to the next. The RPM assay enables detection and identification of pathogen sequences that share as little as 80% sequence similarity to prototype target gene sequences represented as detector tiles on the array. This capability enables the RPM to detect and identify previously unknown strains and variants of a detected pathogen, as in sentinel cases associated with an infectious disease outbreak. We illustrate this capability using assay results from testing influenza A virus vaccines configured with strains that were first defined years after the design of the RPM microarray. Results are also presented from RPM-Flu testing of three specimens independently confirmed to be positive for the 2009 Novel H1N1 outbreak strain of influenza virus.

  8. HapFABIA: Identification of very short segments of identity by descent characterized by rare variants in large sequencing data

    PubMed Central

    Hochreiter, Sepp

    2013-01-01

    Identity by descent (IBD) can be reliably detected for long shared DNA segments, which are found in related individuals. However, many studies contain cohorts of unrelated individuals that share only short IBD segments. New sequencing technologies facilitate identification of short IBD segments through rare variants, which convey more information on IBD than common variants. Current IBD detection methods, however, are not designed to use rare variants for the detection of short IBD segments. Short IBD segments reveal genetic structures at high resolution. Therefore, they can help to improve imputation and phasing, to increase genotyping accuracy for low-coverage sequencing and to increase the power of association studies. Since short IBD segments are further assumed to be old, they can shed light on the evolutionary history of humans. We propose HapFABIA, a computational method that applies biclustering to identify very short IBD segments characterized by rare variants. HapFABIA is designed to detect short IBD segments in genotype data that were obtained from next-generation sequencing, but can also be applied to DNA microarray data. Especially in next-generation sequencing data, HapFABIA exploits rare variants for IBD detection. HapFABIA significantly outperformed competing algorithms at detecting short IBD segments on artificial and simulated data with rare variants. HapFABIA identified 160 588 different short IBD segments characterized by rare variants with a median length of 23 kb (mean 24 kb) in data for chromosome 1 of the 1000 Genomes Project. These short IBD segments contain 752 000 single nucleotide variants (SNVs), which account for 39% of the rare variants and 23.5% of all variants. The vast majority—152 000 IBD segments—are shared by Africans, while only 19 000 and 11 000 are shared by Europeans and Asians, respectively. 
    IBD segments that match the Denisova or the Neandertal genome are found significantly more often in Asians and Europeans but also, in some cases exclusively, in Africans. The lengths of IBD segments and their sharing between continental populations indicate that many short IBD segments from chromosome 1 existed before humans migrated out of Africa. Thus, rare variants that tag these short IBD segments predate human migration from Africa. The software package HapFABIA is available from Bioconductor. All data sets, result files and programs for data simulation, preprocessing and evaluation are supplied at http://www.bioinf.jku.at/research/short-IBD. PMID:24174545

  9. Characterization of alanine to valine sequence variants in the Fc region of nivolumab biosimilar produced in Chinese hamster ovary cells.

    PubMed

    Li, Yantao; Fu, Tuo; Liu, Tao; Guo, Huaizu; Guo, Qingcheng; Xu, Jin; Zhang, Dapeng; Qian, Weizhu; Dai, Jianxin; Li, Bohua; Guo, Yajun; Hou, Sheng; Wang, Hao

    2016-07-01

    Nivolumab is a therapeutic fully human IgG4 antibody to programmed death 1 (PD-1). In this study, a nivolumab biosimilar, which was produced in our laboratory, was analyzed and characterized. Sequence variants that contain undesired amino acid sequences may cause concern during biosimilar bioprocess development. We found that low levels of sequence variants were detected in the heavy chain of the nivolumab biosimilar by ultra performance liquid chromatography (UPLC) and tandem mass spectrometry. The variant was further identified with UPLC-MS/MS after IdeS or trypsin digestion. The sequence variant was confirmed through addition of synthetic mutant peptide. Subsequently, the mixed base signal of normal and mutant sequence was detected through DNA sequencing. The relative levels of mutant A424V in the Fc region of the heavy chain were determined to be 12.25% and 13.54%, via base peak intensity (BPI) and UV chromatography of the tryptic peptide mapping, respectively. The A424V variant was also quantified by real-time PCR (RT-PCR) at the DNA and RNA level, which gave 19.2% and 16.8%, respectively. The relative content of the mutant was consistent at the DNA, RNA and protein level, indicating that the A424V mutation may have little influence at transcriptional or translational levels. These results demonstrate that orthogonal state-of-the-art techniques such as LC-UV-MS and RT-PCR should be implemented to characterize recombinant proteins and cell lines for development of biosimilars. Our study suggests that it is important to establish an integrated and effective analytical method to monitor and characterize sequence variants during antibody drug development, especially for antibody biosimilar products.

  10. Analysis of CHRNA7 rare variants in autism spectrum disorder susceptibility.

    PubMed

    Bacchelli, Elena; Battaglia, Agatino; Cameli, Cinzia; Lomartire, Silvia; Tancredi, Raffaella; Thomson, Susanne; Sutcliffe, James S; Maestrini, Elena

    2015-04-01

    Chromosome 15q13.3 recurrent microdeletions are causally associated with a wide range of phenotypes, including autism spectrum disorder (ASD), seizures, intellectual disability, and other psychiatric conditions. Whether the reciprocal microduplication is pathogenic is less certain. CHRNA7, encoding for the alpha7 subunit of the neuronal nicotinic acetylcholine receptor, is considered the likely culprit gene in mediating neurological phenotypes in 15q13.3 deletion cases. To assess if CHRNA7 rare variants confer risk to ASD, we performed copy number variant analysis and Sanger sequencing of the CHRNA7 coding sequence in a sample of 135 ASD cases. Sequence variation in this gene remains largely unexplored, given the existence of a fusion gene, CHRFAM7A, which includes a nearly identical partial duplication of CHRNA7. Hence, attempts to sequence coding exons must distinguish between CHRNA7 and CHRFAM7A, making next-generation sequencing approaches unreliable for this purpose. A CHRNA7 microduplication was detected in a patient with autism and moderate cognitive impairment; while no rare damaging variants were identified in the coding region, we detected rare variants in the promoter region, previously described to functionally reduce transcription. This study represents the first sequence variant analysis of CHRNA7 in a sample of idiopathic autism. © 2015 Wiley Periodicals, Inc.

  11. Comprehensive Rare Variant Analysis via Whole-Genome Sequencing to Determine the Molecular Pathology of Inherited Retinal Disease.

    PubMed

    Carss, Keren J; Arno, Gavin; Erwood, Marie; Stephens, Jonathan; Sanchis-Juan, Alba; Hull, Sarah; Megy, Karyn; Grozeva, Detelina; Dewhurst, Eleanor; Malka, Samantha; Plagnol, Vincent; Penkett, Christopher; Stirrups, Kathleen; Rizzo, Roberta; Wright, Genevieve; Josifova, Dragana; Bitner-Glindzicz, Maria; Scott, Richard H; Clement, Emma; Allen, Louise; Armstrong, Ruth; Brady, Angela F; Carmichael, Jenny; Chitre, Manali; Henderson, Robert H H; Hurst, Jane; MacLaren, Robert E; Murphy, Elaine; Paterson, Joan; Rosser, Elisabeth; Thompson, Dorothy A; Wakeling, Emma; Ouwehand, Willem H; Michaelides, Michel; Moore, Anthony T; Webster, Andrew R; Raymond, F Lucy

    2017-01-05

    Inherited retinal disease is a common cause of visual impairment and represents a highly heterogeneous group of conditions. Here, we present findings from a cohort of 722 individuals with inherited retinal disease, who have had whole-genome sequencing (n = 605), whole-exome sequencing (n = 72), or both (n = 45) performed, as part of the NIHR-BioResource Rare Diseases research study. We identified pathogenic variants (single-nucleotide variants, indels, or structural variants) for 404/722 (56%) individuals. Whole-genome sequencing gives unprecedented power to detect three categories of pathogenic variants in particular: structural variants, variants in GC-rich regions, which have significantly improved coverage compared to whole-exome sequencing, and variants in non-coding regulatory regions. In addition to previously reported pathogenic regulatory variants, we have identified a previously unreported pathogenic intronic variant in CHM in two males with choroideremia. We have also identified 19 genes not previously known to be associated with inherited retinal disease, which harbor biallelic predicted protein-truncating variants in unsolved cases. Whole-genome sequencing is an increasingly important comprehensive method with which to investigate the genetic causes of inherited retinal disease. Copyright © 2017. Published by Elsevier Inc.

  12. A Bioinformatics Workflow for Variant Peptide Detection in Shotgun Proteomics

    PubMed Central

    Li, Jing; Su, Zengliu; Ma, Ze-Qiang; Slebos, Robbert J. C.; Halvey, Patrick; Tabb, David L.; Liebler, Daniel C.; Pao, William; Zhang, Bing

    2011-01-01

    Shotgun proteomics data analysis usually relies on database search. However, commonly used protein sequence databases do not contain information on protein variants and thus prevent variant peptides and proteins from being identified. Including known coding variations in protein sequence databases could help alleviate this problem. Based on our recently published human Cancer Proteome Variation Database, we have created a protein sequence database that comprehensively annotates thousands of cancer-related coding variants collected in the Cancer Proteome Variation Database as well as noncancer-specific ones from the Single Nucleotide Polymorphism Database (dbSNP). Using this database, we then developed a data analysis workflow for variant peptide identification in shotgun proteomics. The high risk of false positive variant identifications was addressed by a modified false discovery rate estimation method. Analysis of colorectal cancer cell lines SW480, RKO, and HCT-116 revealed a total of 81 peptides that contain either noncancer-specific or cancer-related variations. Twenty-three out of 26 variants randomly selected from the 81 were confirmed by genomic sequencing. We further applied the workflow on data sets from three individual colorectal tumor specimens. A total of 204 distinct variant peptides were detected, and five carried known cancer-related mutations. Each individual showed a specific pattern of cancer-related mutations, suggesting potential use of this type of information for personalized medicine. Compatibility of the workflow has been tested with four popular database search engines including Sequest, Mascot, X!Tandem, and MyriMatch. In summary, we have developed a workflow that effectively uses existing genomic data to enable variant peptide detection in proteomics. PMID:21389108

  13. Vecuum: identification and filtration of false somatic variants caused by recombinant vector contamination.

    PubMed

    Kim, Junho; Maeng, Ju Heon; Lim, Jae Seok; Son, Hyeonju; Lee, Junehawk; Lee, Jeong Ho; Kim, Sangwoo

    2016-10-15

    Advances in sequencing technologies have remarkably lowered the detection limit of somatic variants to a low frequency. However, calling mutations at this range is still confounded by many factors including environmental contamination. Vector contamination is a continuously occurring issue and is especially problematic since vector inserts are hardly distinguishable from the sample sequences. Such inserts, which may harbor polymorphisms and engineered functional mutations, can result in calling false variants at corresponding sites. Numerous vector-screening methods have been developed, but none could handle contamination from inserts because they focus on vector backbone sequences alone. We developed a novel method, Vecuum, that identifies vector-originated reads and resultant false variants. Since vector inserts are generally constructed from intron-less cDNAs, Vecuum identifies vector-originated reads by inspecting the clipping patterns at exon junctions. False variant calls are further detected based on the biased distribution of mutant alleles to vector-originated reads. Tests on simulated and spike-in experimental data validated that Vecuum could detect 93% of vector contaminants and could remove up to 87% of variant-like false calls with 100% precision. Application to public sequence datasets demonstrated the utility of Vecuum in detecting false variants resulting from various types of external contamination. Java-based implementation of the method is available at http://vecuum.sourceforge.net/ Contact: swkim@yuhs.ac. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  14. Molecular characterization of canine parvovirus strains in Argentina: Detection of the pathogenic variant CPV2c in vaccinated dogs.

    PubMed

    Calderon, Marina Gallo; Mattion, Nora; Bucafusco, Danilo; Fogel, Fernando; Remorini, Patricia; La Torre, Jose

    2009-08-01

    PCR amplification with sequence-specific primers was used to detect canine parvovirus (CPV) DNA in 38 rectal swabs from Argentine domestic dogs with symptoms compatible with parvovirus disease. Twenty-seven out of 38 samples analyzed were CPV positive. The classical CPV2 strain was not detected in any of the samples, but nine samples were identified as CPV2a variant and 18 samples as CPV2b variant. Further sequence analysis revealed a mutation at amino acid 426 of the VP2 gene (Asp426Glu), characteristic of the CPV2c variant, in 14 out of 18 of the samples identified initially by PCR as CPV2b. The appearance of CPV2c variant in Argentina might be dated at least to the year 2003. Three different pathogenic CPV variants circulating currently in the Argentine domestic dog population were identified, with CPV2c being the only variant affecting vaccinated and unvaccinated dogs during the year 2008.

  15. Construction of a combinatorial pipeline using two somatic variant calling methods for whole exome sequence data of gastric cancer.

    PubMed

    Kohmoto, Tomohiro; Masuda, Kiyoshi; Naruto, Takuya; Tange, Shoichiro; Shoda, Katsutoshi; Hamada, Junichi; Saito, Masako; Ichikawa, Daisuke; Tajima, Atsushi; Otsuji, Eigo; Imoto, Issei

    2017-01-01

    High-throughput next-generation sequencing is a powerful tool to identify the genotypic landscapes of somatic variants and therapeutic targets in various cancers including gastric cancer, forming the basis for personalized medicine in the clinical setting. Although the advent of many computational algorithms leads to higher accuracy in somatic variant calling, no standard method exists due to the limitations of each method. Here, we constructed a new pipeline. We combined two different somatic variant callers with different algorithms, Strelka and VarScan 2, and evaluated performance using whole exome sequencing data obtained from 19 Japanese cases with gastric cancer (GC); then, we characterized these tumors based on identified driver molecular alterations. More single nucleotide variants (SNVs) and small insertions/deletions were detected by Strelka and VarScan 2, respectively. SNVs detected by both tools showed higher accuracy for estimating somatic variants compared with those detected by only one of the two tools and accurately showed the mutation signature and mutations of driver genes reported for GC. Our combinatorial pipeline may have an advantage in detection of somatic mutations in GC and may be useful for further genomic characterization of Japanese patients with GC to improve the efficacy of GC treatments. J. Med. Invest. 64: 233-240, August, 2017.
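
    The combination step, keeping SNVs reported by both callers, can be sketched as a set intersection. The sketch below is an illustration only: it assumes each caller's output has already been reduced to (chromosome, position, ref, alt) tuples, and the variants shown are invented toy values, not Strelka or VarScan 2 output.

```python
def variant_key(chrom, pos, ref, alt):
    """Normalize a call to a hashable key so calls from two tools can be compared."""
    return (chrom, int(pos), ref.upper(), alt.upper())

def combine_calls(calls_a, calls_b):
    """Split two call sets into shared calls and calls unique to each caller."""
    a = {variant_key(*v) for v in calls_a}
    b = {variant_key(*v) for v in calls_b}
    return {"both": a & b, "only_a": a - b, "only_b": b - a}

# Toy call sets (hypothetical values, for illustration only).
caller_a = [("chr1", 1000, "A", "T"), ("chr2", 2000, "g", "c")]
caller_b = [("chr2", 2000, "G", "C"), ("chr3", 3000, "T", "TA")]

result = combine_calls(caller_a, caller_b)
print(sorted(result["both"]))
```

    Exact matching on position and alleles is the simplest possible rule; indels in particular may need left-normalization before two callers' records can be compared, which this sketch does not attempt.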

  16. Utility of NIST Whole-Genome Reference Materials for the Technical Validation of a Multigene Next-Generation Sequencing Test.

    PubMed

    Shum, Bennett O V; Henner, Ilya; Belluoccio, Daniele; Hinchcliffe, Marcus J

    2017-07-01

    The sensitivity and specificity of next-generation sequencing laboratory developed tests (LDTs) are typically determined by an analyte-specific approach. Analyte-specific validations use disease-specific controls to assess an LDT's ability to detect known pathogenic variants. Alternatively, a methods-based approach can be used for LDT technical validations. Methods-focused validations do not use disease-specific controls but use benchmark reference DNA that contains known variants (benign, variants of unknown significance, and pathogenic) to assess variant calling accuracy of a next-generation sequencing workflow. Recently, four whole-genome reference materials (RMs) from the National Institute of Standards and Technology (NIST) were released to standardize methods-based validations of next-generation sequencing panels across laboratories. We provide a practical method for using NIST RMs to validate multigene panels. We analyzed the utility of RMs in validating a novel newborn screening test that targets 70 genes, called NEO1. Despite the NIST RM variant truth set originating from multiple sequencing platforms, replicates, and library types, we discovered a 5.2% false-negative variant detection rate in the RM truth set genes that were assessed in our validation. We developed a strategy using complementary non-RM controls to demonstrate 99.6% sensitivity of the NEO1 test in detecting variants. Our findings have implications for laboratories or proficiency testing organizations using whole-genome NIST RMs for testing. Copyright © 2017 American Society for Investigative Pathology and the Association for Molecular Pathology. Published by Elsevier Inc. All rights reserved.

  17. A survey of single nucleotide polymorphisms identified from whole-genome sequencing and their functional effect in the porcine genome

    USDA-ARS's Scientific Manuscript database

    Genetic variants detected from sequence have been used to successfully identify causal variants and map complex traits in several organisms. High and moderate impact variants, those expected to alter or disrupt the protein coded by a gene and those that regulate protein production, likely have a mor...

  18. Mass Spectrometric Determination of ILPR G-quadruplex Binding Sites in Insulin and IGF-2

    PubMed Central

    Xiao, JunFeng

    2009-01-01

    The insulin-linked polymorphic region (ILPR) of the human insulin gene promoter region forms G-quadruplex structures in vitro. Previous studies show that insulin and insulin-like growth factor-2 (IGF-2) exhibit high affinity binding in vitro to 2-repeat sequences of ILPR variants a and h, but negligible binding to variant i. Two-repeat sequences of variants a and h form intramolecular G-quadruplex structures that are not evidenced for variant i. Here we report on the use of protein digestion combined with affinity capture and MALDI-MS detection to pinpoint ILPR binding sites in insulin and IGF-2. Peptides captured by ILPR variants a and h were sequenced by MALDI-MS/MS, LC-MS and in silico digestion. On-bead digestion of insulin-ILPR variant a complexes supported the conclusions. The results indicate that the sequence VCG(N)RGF is generally present in the captured peptides and is likely involved in the affinity binding interactions of the proteins with the ILPR G-quadruplexes. The significance of arginine in the interactions was studied by comparing the affinities of synthesized peptides VCGERGF and VCGEAGF with ILPR variant a. Peptides from other regions of the proteins that are connected through disulfide linkages were also detected in some capture experiments. Identification of binding sites could facilitate design of DNA binding ligands for capture and detection of insulin and IGF-2. The interactions may have biological significance as well. PMID:19747845

  19. An Analysis of the Sensitivity of Proteogenomic Mapping of Somatic Mutations and Novel Splicing Events in Cancer.

    PubMed

    Ruggles, Kelly V; Tang, Zuojian; Wang, Xuya; Grover, Himanshu; Askenazi, Manor; Teubl, Jennifer; Cao, Song; McLellan, Michael D; Clauser, Karl R; Tabb, David L; Mertins, Philipp; Slebos, Robbert; Erdmann-Gilmore, Petra; Li, Shunqiang; Gunawardena, Harsha P; Xie, Ling; Liu, Tao; Zhou, Jian-Ying; Sun, Shisheng; Hoadley, Katherine A; Perou, Charles M; Chen, Xian; Davies, Sherri R; Maher, Christopher A; Kinsinger, Christopher R; Rodland, Karen D; Zhang, Hui; Zhang, Zhen; Ding, Li; Townsend, R Reid; Rodriguez, Henry; Chan, Daniel; Smith, Richard D; Liebler, Daniel C; Carr, Steven A; Payne, Samuel; Ellis, Matthew J; Fenyő, David

    2016-03-01

    Improvements in mass spectrometry (MS)-based peptide sequencing provide a new opportunity to determine whether polymorphisms, mutations, and splice variants identified in cancer cells are translated. Herein, we apply a proteogenomic data integration tool (QUILTS) to illustrate protein variant discovery using whole genome, whole transcriptome, and global proteome datasets generated from a pair of luminal and basal-like breast-cancer-patient-derived xenografts (PDX). The sensitivity of proteogenomic analysis for single nucleotide variant (SNV) expression and novel splice junction (NSJ) detection was probed using multiple MS/MS sample process replicates defined here as an independent tandem MS experiment using identical sample material. Despite analysis of over 30 sample process replicates, only about 10% of SNVs (somatic and germline) detected by both DNA and RNA sequencing were observed as peptides. An even smaller proportion of peptides corresponding to NSJ observed by RNA sequencing were detected (<0.1%). Peptides mapping to DNA-detected SNVs without a detectable mRNA transcript were also observed, suggesting that transcriptome coverage was incomplete (∼80%). In contrast to germline variants, somatic variants were less likely to be detected at the peptide level in the basal-like tumor than in the luminal tumor, raising the possibility of differential translation or protein degradation effects. In conclusion, this large-scale proteogenomic integration allowed us to determine the degree to which mutations are translated and identify gaps in sequence coverage, thereby benchmarking current technology and progress toward whole cancer proteome and transcriptome analysis. © 2016 by The American Society for Biochemistry and Molecular Biology, Inc.

  20. Targeted next-generation sequencing makes new molecular diagnoses and expands genotype-phenotype relationship in Ehlers-Danlos syndrome.

    PubMed

    Weerakkody, Ruwan A; Vandrovcova, Jana; Kanonidou, Christina; Mueller, Michael; Gampawar, Piyush; Ibrahim, Yousef; Norsworthy, Penny; Biggs, Jennifer; Abdullah, Abdulshakur; Ross, David; Black, Holly A; Ferguson, David; Cheshire, Nicholas J; Kazkaz, Hanadi; Grahame, Rodney; Ghali, Neeti; Vandersteen, Anthony; Pope, F Michael; Aitman, Timothy J

    2016-11-01

    Ehlers-Danlos syndrome (EDS) comprises a group of overlapping hereditary disorders of connective tissue with significant morbidity and mortality, including major vascular complications. We sought to identify the diagnostic utility of a next-generation sequencing (NGS) panel in a mixed EDS cohort. We developed and applied PCR-based NGS assays for targeted, unbiased sequencing of 12 collagen and aortopathy genes to a cohort of 177 unrelated EDS patients. Variants were scored blind to previous genetic testing and then compared with results of previous Sanger sequencing. Twenty-eight pathogenic variants in COL5A1/2, COL3A1, FBN1, and COL1A1 and four likely pathogenic variants in COL1A1, TGFBR1/2, and SMAD3 were identified by the NGS assays. These included all previously detected single-nucleotide and other short pathogenic variants in these genes, and seven newly detected pathogenic or likely pathogenic variants leading to clinically significant diagnostic revisions. Twenty-two variants of uncertain significance were identified, seven of which were in aortopathy genes and required clinical follow-up. Unbiased NGS-based sequencing made new molecular diagnoses outside the expected EDS genotype-phenotype relationship and identified previously undetected clinically actionable variants in aortopathy susceptibility genes. These data may be of value in guiding future clinical pathways for genetic diagnosis in EDS. Genet Med 18 11, 1119-1127.

  1. A survey of single nucleotide polymorphisms identified from whole-genome sequencing and their functional effect in the porcine genome.

    PubMed

    Keel, B N; Nonneman, D J; Rohrer, G A

    2017-08-01

    Genetic variants detected from sequence have been used to successfully identify causal variants and map complex traits in several organisms. High and moderate impact variants, those expected to alter or disrupt the protein coded by a gene and those that regulate protein production, likely have a more significant effect on phenotypic variation than do other types of genetic variants. Hence, a comprehensive list of these functional variants would be of considerable interest in swine genomic studies, particularly those targeting fertility and production traits. Whole-genome sequence was obtained from 72 of the founders of an intensely phenotyped experimental swine herd at the U.S. Meat Animal Research Center (USMARC). These animals included all 24 of the founding boars (12 Duroc and 12 Landrace) and 48 Yorkshire-Landrace composite sows. Sequence reads were mapped to the Sscrofa10.2 genome build, resulting in a mean of 6.1 fold (×) coverage per genome. A total of 22 342 915 high confidence SNPs were identified from the sequenced genomes. These included 21 million previously reported SNPs and 79% of the 62 163 SNPs on the PorcineSNP60 BeadChip assay. Variation was detected in the coding sequence or untranslated regions (UTRs) of 87.8% of the genes in the porcine genome: loss-of-function variants were predicted in 504 genes, 10 202 genes contained nonsynonymous variants, 10 773 had variation in UTRs and 13 010 genes contained synonymous variants. Approximately 139 000 SNPs were classified as loss-of-function, nonsynonymous or regulatory, which suggests that over 99% of the variation detected in our pigs could potentially be ignored, allowing us to focus on a much smaller number of functional SNPs during future analyses. Published 2017. This article is a U.S. Government work and is in the public domain in the USA.
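
    The "over 99%" claim follows directly from the counts quoted in the abstract, and the arithmetic can be checked in a few lines. The variable names are illustrative, and 139 000 is the abstract's approximate figure, so the results are approximate as well.

```python
# Counts quoted in the abstract.
total_snps = 22_342_915      # high confidence SNPs identified
functional_snps = 139_000    # "approximately 139 000" loss-of-function, nonsynonymous or regulatory

# Share of SNPs in the functional classes, and the remainder that could be set aside.
functional_pct = 100.0 * functional_snps / total_snps
ignorable_pct = 100.0 - functional_pct

# 79% of the 62 163 SNPs on the PorcineSNP60 BeadChip were among the calls.
chip_snps = 62_163
chip_recovered = round(0.79 * chip_snps)

print(f"functional: {functional_pct:.2f}%  ignorable: {ignorable_pct:.2f}%")
print(f"BeadChip SNPs recovered: about {chip_recovered}")
```

    The functional classes make up roughly 0.6% of the detected SNPs, which is where the abstract's "over 99%" comes from.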

  2. Next generation sequencing identifies abnormal Y chromosome and candidate causal variants in premature ovarian failure patients.

    PubMed

    Lee, Yujung; Kim, Changshin; Park, YoungJoon; Pyun, Jung-A; Kwack, KyuBum

    2016-12-01

    Premature ovarian failure (POF) is characterized by heterogeneous genetic causes such as chromosomal abnormalities and variants in causal genes. Recent technical developments have made it possible to detect genome-wide variants, including chromosomal abnormalities, by next generation sequencing (NGS). Among 37 Korean POF patients, XY karyotype with distal part deletions of Y chromosome, Yp11.32-31 and Yp12 end part, was observed in two patients through NGS. Six deleterious variants in POF genes were also detected, which might explain the pathogenesis of POF with abnormalities in the sex chromosomes. Additionally, the two POF patients had no mutation in SRY but three non-synonymous variants were detected in genes regarding sex reversal. These findings suggest candidate causes of POF and sex reversal and show the suitability of NGS for approaching the heterogeneous pathogenesis of POF. Copyright © 2016 Elsevier Inc. All rights reserved.

  3. Genomic variation in macrophage-cultured European porcine reproductive and respiratory syndrome virus Olot/91 revealed using ultra-deep next generation sequencing.

    PubMed

    Lu, Zen H; Brown, Alexander; Wilson, Alison D; Calvert, Jay G; Balasch, Monica; Fuentes-Utrilla, Pablo; Loecherbach, Julia; Turner, Frances; Talbot, Richard; Archibald, Alan L; Ait-Ali, Tahar

    2014-03-04

    Porcine Reproductive and Respiratory Syndrome (PRRS) is a disease of major economic impact worldwide. The etiologic agent of this disease is the PRRS virus (PRRSV). Increasing evidence suggests that microevolution within a coexisting quasispecies population can give rise to high sequence heterogeneity in PRRSV. We developed a pipeline based on the ultra-deep next generation sequencing approach to first construct the complete genome of a European PRRSV, strain Olot/91, cultured on macrophages and then capture the rare variants representative of the mixed quasispecies population. Olot/91 differs from the reference Lelystad strain by about 5% and a total of 88 variants, with frequencies as low as 1%, were detected in the mixed population. These variants included 16 non-synonymous variants concentrated in the genes encoding structural and nonstructural proteins, including Glycoprotein 2a and 5. Using an ultra-deep sequencing methodology, the complete genome of Olot/91 was constructed without any prior knowledge of the sequence. Rare variants that constitute minor fractions of the heterogeneous PRRSV population could successfully be detected to allow further exploration of microevolutionary events.

  4. Evaluation of three read-depth based CNV detection tools using whole-exome sequencing data.

    PubMed

    Yao, Ruen; Zhang, Cheng; Yu, Tingting; Li, Niu; Hu, Xuyun; Wang, Xiumin; Wang, Jian; Shen, Yiping

    2017-01-01

    Whole exome sequencing (WES) has been widely accepted as a robust and cost-effective approach for clinical genetic testing of small sequence variants. Detection of copy number variants (CNV) within WES data has become possible through the development of various algorithms and software programs that utilize read-depth as the main information. The aim of this study was to evaluate three commonly used, WES read-depth based CNV detection programs using high-resolution chromosomal microarray analysis (CMA) as a standard. Paired CMA and WES data were acquired for 45 samples. A total of 219 CNVs (size ranged from 2.3 kb to 35 Mb) identified on three CMA platforms (Affymetrix, Agilent and Illumina) were used as standards. CNVs were called from WES data using XHMM, CoNIFER, and CNVnator with modified settings. All three software packages detected an elevated proportion of small variants (< 20 kb) compared to CMA. XHMM and CoNIFER had poor detection sensitivity (22.2 and 14.6%), which correlated with the number of capturing probes involved. CNVnator detected most variants and had better sensitivity (87.7%); however, it suffered from an overwhelming detection of small CNVs below 20 kb, which required further confirmation. Size estimation of variants was exaggerated by CNVnator and understated by XHMM and CoNIFER. Low concordances of CNV, detected by three different read-depth based programs, indicate the immature status of WES-based CNV detection. Low sensitivity and uncertain specificity of WES-based CNV detection in comparison with CMA based CNV detection suggests that CMA will continue to play an important role in detecting clinical grade CNV in the NGS era, which is largely based on WES.
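
    Sensitivity against a CMA standard can be computed as the fraction of standard CNVs matched by at least one WES call. The abstract does not state the matching rule the authors used; the sketch below assumes a 50% reciprocal-overlap criterion, which is a common convention rather than the study's documented method, and all coordinates are invented.

```python
def reciprocal_overlap(a, b):
    """Smaller of the two overlap fractions for half-open intervals (start, end)."""
    (a_start, a_end), (b_start, b_end) = a, b
    overlap = min(a_end, b_end) - max(a_start, b_start)
    if overlap <= 0:
        return 0.0
    return min(overlap / (a_end - a_start), overlap / (b_end - b_start))

def sensitivity(standard, calls, min_ro=0.5):
    """Fraction of standard CNVs matched by a call on the same chromosome."""
    detected = sum(
        1
        for chrom, start, end in standard
        if any(
            c_chrom == chrom
            and reciprocal_overlap((start, end), (c_start, c_end)) >= min_ro
            for c_chrom, c_start, c_end in calls
        )
    )
    return detected / len(standard)

# Toy data (hypothetical coordinates): one standard CNV is matched, one is missed.
standard_cnvs = [("chr1", 1000, 5000), ("chr2", 10000, 40000)]
wes_calls = [("chr1", 1200, 5200), ("chr2", 100000, 120000)]
print(sensitivity(standard_cnvs, wes_calls))
```

    A fuller comparison would also match on CNV type (gain versus loss); that is omitted here to keep the example short.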

  5. Silent genetic alterations identified by targeted next-generation sequencing in pheochromocytoma/paraganglioma: A clinicopathological correlations.

    PubMed

    Pillai, Suja; Gopalan, Vinod; Lo, Chung Y; Liew, Victor; Smith, Robert A; Lam, Alfred King Y

    2017-02-01

    The goal of this pilot study was to develop a customized, cost-effective amplicon panel (Ampliseq) for target sequencing in a cohort of patients with sporadic phaeochromocytoma/paraganglioma. Phaeochromocytoma/paragangliomas from 25 patients were analysed by targeted next-generation sequencing approach using an Ion Torrent PGM instrument. Primers for 15 target genes (NF1, RET, VHL, SDHA, SDHB, SDHC, SDHD, SDHAF2, TMEM127, MAX, MEN1, KIF1Bβ, EPAS1, CDKN2 & PHD2) were designed using ion ampliseq designer. Ion Reporter software and Ingenuity® Variant Analysis™ software (www.ingenuity.com/variants) from Ingenuity Systems were used to analyze these results. Overall, 713 variants were identified. The variants identified from the Ion Reporter ranged from 64 to 161 per patient. Single nucleotide variants (SNV) were the most common. Further annotation with the help of Ingenuity variant analysis revealed 29 of these 713 variants were deletions. Of these, six variants were non-pathogenic and four were likely to be pathogenic. The remaining 19 variants were of uncertain significance. The most frequently altered gene in the cohort was KIF1B followed by NF1. Novel KIF1B pathogenic variant c.3375+1G>A was identified. The mutation was noted in a patient with clinically confirmed neurofibromatosis. Chromosome 1 showed the presence of maximum number of variants. Use of targeted next-generation sequencing is a sensitive method for detecting genetic changes in patients with phaeochromocytoma/paraganglioma. The precise detection of these genetic changes helps in understanding the pathogenesis of these tumours. Copyright © 2016 Elsevier Inc. All rights reserved.

  6. Evolution of simeprevir-resistant variants over time by ultra-deep sequencing in HCV genotype 1b.

    PubMed

    Akuta, Norio; Suzuki, Fumitaka; Sezaki, Hitomi; Suzuki, Yoshiyuki; Hosaka, Tetsuya; Kobayashi, Masahiro; Kobayashi, Mariko; Saitoh, Satoshi; Ikeda, Kenji; Kumada, Hiromitsu

    2014-08-01

    Using ultra-deep sequencing technology, the present study was designed to investigate the evolution of simeprevir-resistant variants (amino acid substitutions of aa80, aa155, aa156, and aa168 positions in HCV NS3 region) over time. In Toranomon Hospital, 18 Japanese patients infected with HCV genotype 1b received triple therapy of simeprevir/PEG-IFN/ribavirin (DRAGON or CONCERT study). Sustained virological response rate was 67%, and that was significantly higher in patients with IL28B rs8099917 TT than in those with non-TT. Six patients, who did not achieve sustained virological response, were tested for resistant variants by ultra-deep sequencing, at the baseline, at the time of re-elevation of viral loads, and at 96 weeks after the completion of treatment. Twelve of 18 resistant variants, detected at re-elevation of viral load, were de novo resistant variants. Ten of 12 de novo resistant variants became undetectable over time, whereas five of seven resistant variants detected at baseline persisted over time. In one patient, variants of Q80R at baseline (0.3%) increased at 96 weeks after the cessation of treatment (10.2%), and de novo resistant variants of D168E (0.3%) also increased at 96 weeks after the cessation of treatment (9.7%). In conclusion, the present study indicates that the emergence of simeprevir-resistant variants after the start of treatment could not be predicted at baseline, and the majority of de novo resistant variants become undetectable over time. Further large-scale prospective studies should be performed to investigate the clinical utility in detecting simeprevir-resistant variants. © 2014 Wiley Periodicals, Inc.

  7. Novel approach to genetic analysis and results in 3000 hemophilia patients enrolled in the My Life, Our Future initiative

    PubMed Central

    Johnsen, Jill M.; Fletcher, Shelley N.; Huston, Haley; Roberge, Sarah; Martin, Beth K.; Kircher, Martin; Josephson, Neil C.; Shendure, Jay; Ruuska, Sarah; Koerper, Marion A.; Morales, Jaime; Pierce, Glenn F.; Aschman, Diane J.

    2017-01-01

    Hemophilia A and B are rare, X-linked bleeding disorders. My Life, Our Future (MLOF) is a collaborative project established to genotype and study hemophilia. Patients were enrolled at US hemophilia treatment centers (HTCs). Genotyping was performed centrally using next-generation sequencing (NGS) with an approach that detected common F8 gene inversions simultaneously with F8 and F9 gene sequencing followed by confirmation using standard genotyping methods. Sixty-nine HTCs enrolled the first 3000 patients in under 3 years. Clinically reportable DNA variants were detected in 98.1% (2357/2401) of hemophilia A and 99.3% (595/599) of hemophilia B patients. Of the 924 unique variants found, 285 were novel. Predicted gene-disrupting variants were common in severe disease; missense variants predominated in mild–moderate disease. Novel DNA variants accounted for ∼30% of variants found and were detected continuously throughout the project, indicating that additional variation likely remains undiscovered. The NGS approach detected >1 reportable variants in 36 patients (10 females), a finding with potential clinical implications. NGS also detected incidental variants unlikely to cause disease, including 11 variants previously reported in hemophilia. Although these genes are thought to be conserved, our findings support caution in interpretation of new variants. In summary, MLOF has contributed significantly toward variant annotation in the F8 and F9 genes. In the near future, investigators will be able to access MLOF data and repository samples for research to advance our understanding of hemophilia. PMID:29296726
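
    The detection rates quoted above can be recomputed from the counts given in the abstract. This is only a consistency check; the variable names are illustrative.

```python
# (patients with a reportable variant, patients genotyped), as quoted in the abstract.
hemophilia_a = (2357, 2401)
hemophilia_b = (595, 599)
unique_variants, novel_variants = 924, 285

def percent(part, whole):
    """Express part/whole as a percentage."""
    return 100.0 * part / whole

total_patients = hemophilia_a[1] + hemophilia_b[1]
rate_a = percent(*hemophilia_a)
rate_b = percent(*hemophilia_b)
novel_share = percent(novel_variants, unique_variants)

print(total_patients, round(rate_a, 2), round(rate_b, 2), round(novel_share, 1))
```

    The two cohorts add up to the 3000 enrolled patients. The hemophilia A rate works out to about 98.17%, which the abstract gives as 98.1% (apparently truncated rather than rounded); the hemophilia B rate is about 99.33% and the novel share about 30.8%, matching the reported 99.3% and ∼30%.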

  8. Genomic prediction using preselected DNA variants from a GWAS with whole-genome sequence data in Holstein-Friesian cattle.

    PubMed

    Veerkamp, Roel F; Bouwman, Aniek C; Schrooten, Chris; Calus, Mario P L

    2016-12-01

    Whole-genome sequence data is expected to capture genetic variation more completely than common genotyping panels. Our objective was to compare the proportion of variance explained and the accuracy of genomic prediction by using imputed sequence data or preselected SNPs from a genome-wide association study (GWAS) with imputed whole-genome sequence data. Phenotypes were available for 5503 Holstein-Friesian bulls. Genotypes were imputed up to whole-genome sequence (13,789,029 segregating DNA variants) by using run 4 of the 1000 bull genomes project. The program GCTA was used to perform GWAS for protein yield (PY), somatic cell score (SCS) and interval from first to last insemination (IFL). From the GWAS, subsets of variants were selected and genomic relationship matrices (GRM) were used to estimate the variance explained in 2087 validation animals and to evaluate the genomic prediction ability. Finally, two GRM were fitted together in several models to evaluate the effect of selected variants that were in competition with all the other variants. The GRM based on full sequence data explained only marginally more genetic variation than that based on common SNP panels: for PY, SCS and IFL, genomic heritability improved from 0.81 to 0.83, 0.83 to 0.87 and 0.69 to 0.72, respectively. Sequence data also helped to identify more variants linked to quantitative trait loci and resulted in clearer GWAS peaks across the genome. The proportion of total variance explained by the selected variants combined in a GRM was considerably smaller than that explained by all variants (less than 0.31 for all traits). When selected variants were used, accuracy of genomic predictions decreased and bias increased. Although 35 to 42 variants were detected that together explained 13 to 19% of the total variance (18 to 23% of the genetic variance) when fitted alone, there was no advantage in using dense sequence information for genomic prediction in the Holstein data used in our study. Detection and selection of variants within a single breed are difficult due to long-range linkage disequilibrium. Stringent selection of variants resulted in more biased genomic predictions, although this might be due to the training population being the same dataset from which the selected variants were identified.
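
    The abstract does not say how the GRM were computed. For orientation only, and on the assumption that the standard GCTA definition was used, the relationship between animals j and k over N variants is commonly written as below; the symbols are introduced here for illustration and do not appear in the abstract.

```latex
% Assumed (standard GCTA-style) genomic relationship between animals j and k.
% x_{ij}: allele count (0, 1 or 2) of animal j at variant i
% p_i:    frequency of the counted allele at variant i
% N:      number of variants included in the matrix
A_{jk} = \frac{1}{N} \sum_{i=1}^{N}
         \frac{(x_{ij} - 2p_i)\,(x_{ik} - 2p_i)}{2p_i\,(1 - p_i)}
```

    Under this definition, a GRM built from a preselected subset simply restricts the sum to the selected variants, which is how a second GRM can be fitted alongside one built from all variants.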

  9. Evaluation of targeted exome sequencing for 28 protein-based blood group systems, including the homologous gene systems, for blood group genotyping.

    PubMed

    Schoeman, Elizna M; Lopez, Genghis H; McGowan, Eunike C; Millard, Glenda M; O'Brien, Helen; Roulis, Eileen V; Liew, Yew-Wah; Martin, Jacqueline R; McGrath, Kelli A; Powley, Tanya; Flower, Robert L; Hyland, Catherine A

    2017-04-01

    Blood group single nucleotide polymorphism genotyping probes for a limited range of polymorphisms. This study investigated whether massively parallel sequencing (also known as next-generation sequencing), with a targeted exome strategy, provides an extended blood group genotype and the extent to which massively parallel sequencing correctly genotypes in homologous gene systems, such as RH and MNS. Donor samples (n = 28) that were extensively phenotyped and genotyped using single nucleotide polymorphism typing, were analyzed using the TruSight One Sequencing Panel and MiSeq platform. Genes for 28 protein-based blood group systems, GATA1, and KLF1 were analyzed. Copy number variation analysis was used to characterize complex structural variants in the GYPC and RH systems. The average sequencing depth per target region was 66.2 ± 39.8. Each sample harbored on average 43 ± 9 variants, of which 10 ± 3 were used for genotyping. For the 28 samples, massively parallel sequencing variant sequences correctly matched expected sequences based on single nucleotide polymorphism genotyping data. Copy number variation analysis defined the Rh C/c alleles and complex RHD hybrids. Hybrid RHD*D-CE-D variants were correctly identified, but copy number variation analysis did not confidently distinguish between D and CE exon deletion versus rearrangement. The targeted exome sequencing strategy employed extended the range of blood group genotypes detected compared with single nucleotide polymorphism typing. This single-test format included detection of complex MNS hybrid cases and, with copy number variation analysis, defined RH hybrid genes along with the RHCE*C allele hitherto difficult to resolve by variant detection. The approach is economical compared with whole-genome sequencing and is suitable for a red blood cell reference laboratory setting. © 2017 AABB.

  10. Exome Sequencing Fails to Identify the Genetic Cause of Aicardi Syndrome.

    PubMed

    Lund, Caroline; Striano, Pasquale; Sorte, Hanne Sørmo; Parisi, Pasquale; Iacomino, Michele; Sheng, Ying; Vigeland, Magnus D; Øye, Anne-Marte; Møller, Rikke Steensbjerre; Selmer, Kaja K; Zara, Federico

    2016-09-01

    Aicardi syndrome (AS) is a well-characterized neurodevelopmental disorder with an unknown etiology. In this study, we performed whole-exome sequencing in 11 female patients with the diagnosis of AS, in order to identify the disease-causing gene. In particular, we focused on detecting variants in the X chromosome, including the analysis of variants with a low number of sequencing reads, in case of somatic mosaicism. For 2 of the patients, we also sequenced the exome of the parents to search for de novo mutations. We did not identify any genetic variants likely to be damaging. Only one single missense variant was identified by the de novo analyses of the 2 trios, and this was considered benign. The failure to identify a disease gene in this study may be due to technical limitations of our study design, including the possibility that the genetic aberration leading to AS is situated in a non-exonic region or that the mutation is somatic and not detectable by our approach. Alternatively, it is possible that AS is genetically heterogeneous and that 11 patients are not sufficient to reveal the causative genes. Future studies of AS should consider designs that also explore non-exonic regions and apply a sequencing depth sufficient to detect low-grade somatic mosaicism.

  11. Sequence data and association statistics from 12,940 type 2 diabetes cases and controls.

    PubMed

    Flannick, Jason; Fuchsberger, Christian; Mahajan, Anubha; Teslovich, Tanya M; Agarwala, Vineeta; Gaulton, Kyle J; Caulkins, Lizz; Koesterer, Ryan; Ma, Clement; Moutsianas, Loukas; McCarthy, Davis J; Rivas, Manuel A; Perry, John R B; Sim, Xueling; Blackwell, Thomas W; Robertson, Neil R; Rayner, N William; Cingolani, Pablo; Locke, Adam E; Tajes, Juan Fernandez; Highland, Heather M; Dupuis, Josee; Chines, Peter S; Lindgren, Cecilia M; Hartl, Christopher; Jackson, Anne U; Chen, Han; Huyghe, Jeroen R; van de Bunt, Martijn; Pearson, Richard D; Kumar, Ashish; Müller-Nurasyid, Martina; Grarup, Niels; Stringham, Heather M; Gamazon, Eric R; Lee, Jaehoon; Chen, Yuhui; Scott, Robert A; Below, Jennifer E; Chen, Peng; Huang, Jinyan; Go, Min Jin; Stitzel, Michael L; Pasko, Dorota; Parker, Stephen C J; Varga, Tibor V; Green, Todd; Beer, Nicola L; Day-Williams, Aaron G; Ferreira, Teresa; Fingerlin, Tasha; Horikoshi, Momoko; Hu, Cheng; Huh, Iksoo; Ikram, Mohammad Kamran; Kim, Bong-Jo; Kim, Yongkang; Kim, Young Jin; Kwon, Min-Seok; Lee, Juyoung; Lee, Selyeong; Lin, Keng-Han; Maxwell, Taylor J; Nagai, Yoshihiko; Wang, Xu; Welch, Ryan P; Yoon, Joon; Zhang, Weihua; Barzilai, Nir; Voight, Benjamin F; Han, Bok-Ghee; Jenkinson, Christopher P; Kuulasmaa, Teemu; Kuusisto, Johanna; Manning, Alisa; Ng, Maggie C Y; Palmer, Nicholette D; Balkau, Beverley; Stančáková, Alena; Abboud, Hanna E; Boeing, Heiner; Giedraitis, Vilmantas; Prabhakaran, Dorairaj; Gottesman, Omri; Scott, James; Carey, Jason; Kwan, Phoenix; Grant, George; Smith, Joshua D; Neale, Benjamin M; Purcell, Shaun; Butterworth, Adam S; Howson, Joanna M M; Lee, Heung Man; Lu, Yingchang; Kwak, Soo-Heon; Zhao, Wei; Danesh, John; Lam, Vincent K L; Park, Kyong Soo; Saleheen, Danish; So, Wing Yee; Tam, Claudia H T; Afzal, Uzma; Aguilar, David; Arya, Rector; Aung, Tin; Chan, Edmund; Navarro, Carmen; Cheng, Ching-Yu; Palli, Domenico; Correa, Adolfo; Curran, Joanne E; Rybin, Dennis; Farook, Vidya S; Fowler, Sharon P; Freedman, Barry I; 
Griswold, Michael; Hale, Daniel Esten; Hicks, Pamela J; Khor, Chiea-Chuen; Kumar, Satish; Lehne, Benjamin; Thuillier, Dorothée; Lim, Wei Yen; Liu, Jianjun; Loh, Marie; Musani, Solomon K; Puppala, Sobha; Scott, William R; Yengo, Loïc; Tan, Sian-Tsung; Taylor, Herman A; Thameem, Farook; Wilson, Gregory; Wong, Tien Yin; Njølstad, Pål Rasmus; Levy, Jonathan C; Mangino, Massimo; Bonnycastle, Lori L; Schwarzmayr, Thomas; Fadista, João; Surdulescu, Gabriela L; Herder, Christian; Groves, Christopher J; Wieland, Thomas; Bork-Jensen, Jette; Brandslund, Ivan; Christensen, Cramer; Koistinen, Heikki A; Doney, Alex S F; Kinnunen, Leena; Esko, Tõnu; Farmer, Andrew J; Hakaste, Liisa; Hodgkiss, Dylan; Kravic, Jasmina; Lyssenko, Valeri; Hollensted, Mette; Jørgensen, Marit E; Jørgensen, Torben; Ladenvall, Claes; Justesen, Johanne Marie; Käräjämäki, Annemari; Kriebel, Jennifer; Rathmann, Wolfgang; Lannfelt, Lars; Lauritzen, Torsten; Narisu, Narisu; Linneberg, Allan; Melander, Olle; Milani, Lili; Neville, Matt; Orho-Melander, Marju; Qi, Lu; Qi, Qibin; Roden, Michael; Rolandsson, Olov; Swift, Amy; Rosengren, Anders H; Stirrups, Kathleen; Wood, Andrew R; Mihailov, Evelin; Blancher, Christine; Carneiro, Mauricio O; Maguire, Jared; Poplin, Ryan; Shakir, Khalid; Fennell, Timothy; DePristo, Mark; de Angelis, Martin Hrabé; Deloukas, Panos; Gjesing, Anette P; Jun, Goo; Nilsson, Peter; Murphy, Jacquelyn; Onofrio, Robert; Thorand, Barbara; Hansen, Torben; Meisinger, Christa; Hu, Frank B; Isomaa, Bo; Karpe, Fredrik; Liang, Liming; Peters, Annette; Huth, Cornelia; O'Rahilly, Stephen P; Palmer, Colin N A; Pedersen, Oluf; Rauramaa, Rainer; Tuomilehto, Jaakko; Salomaa, Veikko; Watanabe, Richard M; Syvänen, Ann-Christine; Bergman, Richard N; Bharadwaj, Dwaipayan; Bottinger, Erwin P; Cho, Yoon Shin; Chandak, Giriraj R; Chan, Juliana Cn; Chia, Kee Seng; Daly, Mark J; Ebrahim, Shah B; Langenberg, Claudia; Elliott, Paul; Jablonski, Kathleen A; Lehman, Donna M; Jia, Weiping; Ma, Ronald C W; Pollin, Toni I; 
Sandhu, Manjinder; Tandon, Nikhil; Froguel, Philippe; Barroso, Inês; Teo, Yik Ying; Zeggini, Eleftheria; Loos, Ruth J F; Small, Kerrin S; Ried, Janina S; DeFronzo, Ralph A; Grallert, Harald; Glaser, Benjamin; Metspalu, Andres; Wareham, Nicholas J; Walker, Mark; Banks, Eric; Gieger, Christian; Ingelsson, Erik; Im, Hae Kyung; Illig, Thomas; Franks, Paul W; Buck, Gemma; Trakalo, Joseph; Buck, David; Prokopenko, Inga; Mägi, Reedik; Lind, Lars; Farjoun, Yossi; Owen, Katharine R; Gloyn, Anna L; Strauch, Konstantin; Tuomi, Tiinamaija; Kooner, Jaspal Singh; Lee, Jong-Young; Park, Taesung; Donnelly, Peter; Morris, Andrew D; Hattersley, Andrew T; Bowden, Donald W; Collins, Francis S; Atzmon, Gil; Chambers, John C; Spector, Timothy D; Laakso, Markku; Strom, Tim M; Bell, Graeme I; Blangero, John; Duggirala, Ravindranath; Tai, E Shyong; McVean, Gilean; Hanis, Craig L; Wilson, James G; Seielstad, Mark; Frayling, Timothy M; Meigs, James B; Cox, Nancy J; Sladek, Rob; Lander, Eric S; Gabriel, Stacey; Mohlke, Karen L; Meitinger, Thomas; Groop, Leif; Abecasis, Goncalo; Scott, Laura J; Morris, Andrew P; Kang, Hyun Min; Altshuler, David; Burtt, Noël P; Florez, Jose C; Boehnke, Michael; McCarthy, Mark I

    2017-12-19

    To investigate the genetic basis of type 2 diabetes (T2D) to high resolution, the GoT2D and T2D-GENES consortia catalogued variation from whole-genome sequencing of 2,657 European individuals and exome sequencing of 12,940 individuals of multiple ancestries. Over 27M SNPs, indels, and structural variants were identified, including 99% of low-frequency (minor allele frequency [MAF] 0.1-5%) non-coding variants in the whole-genome sequenced individuals and 99.7% of low-frequency coding variants in the whole-exome sequenced individuals. Each variant was tested for association with T2D in the sequenced individuals, and, to increase power, most were tested in larger numbers of individuals (>80% of low-frequency coding variants in ~82 K Europeans via the exome chip, and ~90% of low-frequency non-coding variants in ~44 K Europeans via genotype imputation). The variants, genotypes, and association statistics from these analyses provide the largest reference to date of human genetic information relevant to T2D, for use in activities such as T2D-focused genotype imputation, functional characterization of variants or genes, and other novel analyses to detect associations between sequence variation and T2D.

  12. Sequence data and association statistics from 12,940 type 2 diabetes cases and controls

    PubMed Central

    Flannick, Jason; Fuchsberger, Christian; Mahajan, Anubha; Teslovich, Tanya M.; Agarwala, Vineeta; Gaulton, Kyle J.; Caulkins, Lizz; Koesterer, Ryan; Ma, Clement; Moutsianas, Loukas; McCarthy, Davis J.; Rivas, Manuel A.; Perry, John R. B.; Sim, Xueling; Blackwell, Thomas W.; Robertson, Neil R.; Rayner, N William; Cingolani, Pablo; Locke, Adam E.; Tajes, Juan Fernandez; Highland, Heather M.; Dupuis, Josee; Chines, Peter S.; Lindgren, Cecilia M.; Hartl, Christopher; Jackson, Anne U.; Chen, Han; Huyghe, Jeroen R.; van de Bunt, Martijn; Pearson, Richard D.; Kumar, Ashish; Müller-Nurasyid, Martina; Grarup, Niels; Stringham, Heather M.; Gamazon, Eric R.; Lee, Jaehoon; Chen, Yuhui; Scott, Robert A.; Below, Jennifer E.; Chen, Peng; Huang, Jinyan; Go, Min Jin; Stitzel, Michael L.; Pasko, Dorota; Parker, Stephen C. J.; Varga, Tibor V.; Green, Todd; Beer, Nicola L.; Day-Williams, Aaron G.; Ferreira, Teresa; Fingerlin, Tasha; Horikoshi, Momoko; Hu, Cheng; Huh, Iksoo; Ikram, Mohammad Kamran; Kim, Bong-Jo; Kim, Yongkang; Kim, Young Jin; Kwon, Min-Seok; Lee, Juyoung; Lee, Selyeong; Lin, Keng-Han; Maxwell, Taylor J.; Nagai, Yoshihiko; Wang, Xu; Welch, Ryan P.; Yoon, Joon; Zhang, Weihua; Barzilai, Nir; Voight, Benjamin F.; Han, Bok-Ghee; Jenkinson, Christopher P.; Kuulasmaa, Teemu; Kuusisto, Johanna; Manning, Alisa; Ng, Maggie C. Y.; Palmer, Nicholette D.; Balkau, Beverley; Stančáková, Alena; Abboud, Hanna E.; Boeing, Heiner; Giedraitis, Vilmantas; Prabhakaran, Dorairaj; Gottesman, Omri; Scott, James; Carey, Jason; Kwan, Phoenix; Grant, George; Smith, Joshua D.; Neale, Benjamin M.; Purcell, Shaun; Butterworth, Adam S.; Howson, Joanna M. M.; Lee, Heung Man; Lu, Yingchang; Kwak, Soo-Heon; Zhao, Wei; Danesh, John; Lam, Vincent K. L.; Park, Kyong Soo; Saleheen, Danish; So, Wing Yee; Tam, Claudia H. T.; Afzal, Uzma; Aguilar, David; Arya, Rector; Aung, Tin; Chan, Edmund; Navarro, Carmen; Cheng, Ching-Yu; Palli, Domenico; Correa, Adolfo; Curran, Joanne E.; Rybin, Dennis; Farook, Vidya S.; Fowler, Sharon P.; Freedman, Barry I.; Griswold, Michael; Hale, Daniel Esten; Hicks, Pamela J.; Khor, Chiea-Chuen; Kumar, Satish; Lehne, Benjamin; Thuillier, Dorothée; Lim, Wei Yen; Liu, Jianjun; Loh, Marie; Musani, Solomon K.; Puppala, Sobha; Scott, William R.; Yengo, Loïc; Tan, Sian-Tsung; Taylor, Herman A.; Thameem, Farook; Wilson, Gregory; Wong, Tien Yin; Njølstad, Pål Rasmus; Levy, Jonathan C.; Mangino, Massimo; Bonnycastle, Lori L.; Schwarzmayr, Thomas; Fadista, João; Surdulescu, Gabriela L.; Herder, Christian; Groves, Christopher J.; Wieland, Thomas; Bork-Jensen, Jette; Brandslund, Ivan; Christensen, Cramer; Koistinen, Heikki A.; Doney, Alex S. F.; Kinnunen, Leena; Esko, Tõnu; Farmer, Andrew J.; Hakaste, Liisa; Hodgkiss, Dylan; Kravic, Jasmina; Lyssenko, Valeri; Hollensted, Mette; Jørgensen, Marit E.; Jørgensen, Torben; Ladenvall, Claes; Justesen, Johanne Marie; Käräjämäki, Annemari; Kriebel, Jennifer; Rathmann, Wolfgang; Lannfelt, Lars; Lauritzen, Torsten; Narisu, Narisu; Linneberg, Allan; Melander, Olle; Milani, Lili; Neville, Matt; Orho-Melander, Marju; Qi, Lu; Qi, Qibin; Roden, Michael; Rolandsson, Olov; Swift, Amy; Rosengren, Anders H.; Stirrups, Kathleen; Wood, Andrew R.; Mihailov, Evelin; Blancher, Christine; Carneiro, Mauricio O.; Maguire, Jared; Poplin, Ryan; Shakir, Khalid; Fennell, Timothy; DePristo, Mark; de Angelis, Martin Hrabé; Deloukas, Panos; Gjesing, Anette P.; Jun, Goo; Nilsson, Peter; Murphy, Jacquelyn; Onofrio, Robert; Thorand, Barbara; Hansen, Torben; Meisinger, Christa; Hu, Frank B.; Isomaa, Bo; Karpe, Fredrik; Liang, Liming; Peters, Annette; Huth, Cornelia; O'Rahilly, Stephen P; Palmer, Colin N. A.; Pedersen, Oluf; Rauramaa, Rainer; Tuomilehto, Jaakko; Salomaa, Veikko; Watanabe, Richard M.; Syvänen, Ann-Christine; Bergman, Richard N.; Bharadwaj, Dwaipayan; Bottinger, Erwin P.; Cho, Yoon Shin; Chandak, Giriraj R.; Chan, Juliana CN; Chia, Kee Seng; Daly, Mark J.; Ebrahim, Shah B.; Langenberg, Claudia; Elliott, Paul; Jablonski, Kathleen A.; Lehman, Donna M.; Jia, Weiping; Ma, Ronald C. W.; Pollin, Toni I.; Sandhu, Manjinder; Tandon, Nikhil; Froguel, Philippe; Barroso, Inês; Teo, Yik Ying; Zeggini, Eleftheria; Loos, Ruth J. F.; Small, Kerrin S.; Ried, Janina S.; DeFronzo, Ralph A.; Grallert, Harald; Glaser, Benjamin; Metspalu, Andres; Wareham, Nicholas J.; Walker, Mark; Banks, Eric; Gieger, Christian; Ingelsson, Erik; Im, Hae Kyung; Illig, Thomas; Franks, Paul W.; Buck, Gemma; Trakalo, Joseph; Buck, David; Prokopenko, Inga; Mägi, Reedik; Lind, Lars; Farjoun, Yossi; Owen, Katharine R.; Gloyn, Anna L.; Strauch, Konstantin; Tuomi, Tiinamaija; Kooner, Jaspal Singh; Lee, Jong-Young; Park, Taesung; Donnelly, Peter; Morris, Andrew D.; Hattersley, Andrew T.; Bowden, Donald W.; Collins, Francis S.; Atzmon, Gil; Chambers, John C.; Spector, Timothy D.; Laakso, Markku; Strom, Tim M.; Bell, Graeme I.; Blangero, John; Duggirala, Ravindranath; Tai, E. Shyong; McVean, Gilean; Hanis, Craig L.; Wilson, James G.; Seielstad, Mark; Frayling, Timothy M.; Meigs, James B.; Cox, Nancy J.; Sladek, Rob; Lander, Eric S.; Gabriel, Stacey; Mohlke, Karen L.; Meitinger, Thomas; Groop, Leif; Abecasis, Goncalo; Scott, Laura J.; Morris, Andrew P.; Kang, Hyun Min; Altshuler, David; Burtt, Noël P.; Florez, Jose C.; Boehnke, Michael; McCarthy, Mark I.

    2017-01-01

    To investigate the genetic basis of type 2 diabetes (T2D) to high resolution, the GoT2D and T2D-GENES consortia catalogued variation from whole-genome sequencing of 2,657 European individuals and exome sequencing of 12,940 individuals of multiple ancestries. Over 27M SNPs, indels, and structural variants were identified, including 99% of low-frequency (minor allele frequency [MAF] 0.1–5%) non-coding variants in the whole-genome sequenced individuals and 99.7% of low-frequency coding variants in the whole-exome sequenced individuals. Each variant was tested for association with T2D in the sequenced individuals, and, to increase power, most were tested in larger numbers of individuals (>80% of low-frequency coding variants in ~82 K Europeans via the exome chip, and ~90% of low-frequency non-coding variants in ~44 K Europeans via genotype imputation). The variants, genotypes, and association statistics from these analyses provide the largest reference to date of human genetic information relevant to T2D, for use in activities such as T2D-focused genotype imputation, functional characterization of variants or genes, and other novel analyses to detect associations between sequence variation and T2D. PMID:29257133

  13. VaDiR: an integrated approach to Variant Detection in RNA.

    PubMed

    Neums, Lisa; Suenaga, Seiji; Beyerlein, Peter; Anders, Sara; Koestler, Devin; Mariani, Andrea; Chien, Jeremy

    2018-02-01

    Advances in next-generation DNA sequencing technologies are now enabling detailed characterization of sequence variations in cancer genomes. With whole-genome sequencing, variations in coding and non-coding sequences can be discovered, but its cost currently limits its general use in research. Whole-exome sequencing is used to characterize sequence variations in coding regions, but the cost associated with capture reagents and biases in capture rate limit its full use in research. Additional limitations include uncertainty in assigning the functional significance of the mutations when these mutations are observed in the non-coding region or in genes that are not expressed in cancer tissue. We investigated the feasibility of uncovering mutations from expressed genes using RNA sequencing datasets with a method called Variant Detection in RNA (VaDiR) that integrates 3 variant callers, namely: SNPiR, RVBoost, and MuTect2. Variants called by all 3 methods, which we called Tier 1 variants, produced the highest precision with true positive mutations from RNA-seq that could be validated at the DNA level. We also found that the integration of Tier 1 variants with those called by MuTect2 and SNPiR produced the highest recall with acceptable precision. Finally, we observed a higher rate of mutation discovery in genes that are expressed at higher levels. Our method, VaDiR, provides a possibility of uncovering mutations from RNA sequencing datasets that could be useful in further functional analysis. In addition, our approach allows orthogonal validation of DNA-based mutation discovery by providing complementary sequence variation analysis from paired RNA/DNA sequencing datasets.
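
    The tiering idea in this abstract can be illustrated with plain set operations. The following is a minimal sketch, not VaDiR itself: it assumes each caller's output has already been reduced to a set of (chromosome, position, ref, alt) tuples, it reads "Tier 1" as the intersection of all three call sets, and it reads the broader high-recall set as Tier 1 plus variants shared by MuTect2 and SNPiR, which is one possible interpretation of the abstract. The function names and the toy variants are hypothetical.

```python
from typing import Set, Tuple

# (chromosome, position, reference allele, alternate allele)
Variant = Tuple[str, int, str, str]


def tier_variants(snpir: Set[Variant], rvboost: Set[Variant], mutect2: Set[Variant]):
    """Return (tier1, extended) call sets.

    tier1: variants reported by all three callers.
    extended: tier1 plus variants shared by MuTect2 and SNPiR
              (an assumed reading of the abstract, not a confirmed rule).
    """
    tier1 = snpir & rvboost & mutect2
    extended = tier1 | (mutect2 & snpir)
    return tier1, extended


def precision_recall(called: Set[Variant], truth: Set[Variant]):
    """Precision and recall of a call set against DNA-validated variants."""
    true_positives = len(called & truth)
    precision = true_positives / len(called) if called else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Toy data, invented for illustration only.
a = ("chr1", 100, "A", "G")
b = ("chr2", 200, "C", "T")
c = ("chr3", 300, "G", "A")
d = ("chr4", 400, "T", "C")
e = ("chr5", 500, "A", "T")
f = ("chr6", 600, "G", "C")

snpir_calls = {a, b, c}
rvboost_calls = {a, d}
mutect2_calls = {a, b, e}
dna_validated = {a, b, f}

tier1, extended = tier_variants(snpir_calls, rvboost_calls, mutect2_calls)
tier1_pr = precision_recall(tier1, dna_validated)
extended_pr = precision_recall(extended, dna_validated)
print("Tier 1:", sorted(tier1), tier1_pr)
print("Extended:", sorted(extended), extended_pr)
```

    In this toy case the stricter set keeps precision high at the cost of recall, while the broader set recovers one more validated variant, which mirrors the trade-off the abstract reports.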

  14. Effect of Next-Generation Exome Sequencing Depth for Discovery of Diagnostic Variants.

    PubMed

    Kim, Kyung; Seong, Moon-Woo; Chung, Won-Hyong; Park, Sung Sup; Leem, Sangseob; Park, Won; Kim, Jihyun; Lee, KiYoung; Park, Rae Woong; Kim, Namshin

    2015-06-01

    Sequencing depth, which is directly related to the cost and time required for the generation, processing, and maintenance of next-generation sequencing data, is an important factor in the practical utilization of such data in clinical fields. Unfortunately, identifying an exome sequencing depth adequate for clinical use is a challenge that has not been addressed extensively. Here, we investigate the effect of exome sequencing depth on the discovery of sequence variants for clinical use. Toward this, we sequenced ten germ-line blood samples from breast cancer patients on the Illumina platform GAII(x) at a high depth of ~200×. We observed that most function-related diverse variants in the human exonic regions could be detected at a sequencing depth of 120×. Furthermore, investigation using a diagnostic gene set showed that the number of clinical variants identified using exome sequencing reached a plateau at an average sequencing depth of about 120×. Moreover, the phenomena were consistent across the breast cancer samples.

  15. Simultaneous mutation and copy number variation (CNV) detection by multiplex PCR-based GS-FLX sequencing.

    PubMed

    Goossens, Dirk; Moens, Lotte N; Nelis, Eva; Lenaerts, An-Sofie; Glassee, Wim; Kalbe, Andreas; Frey, Bruno; Kopal, Guido; De Jonghe, Peter; De Rijk, Peter; Del-Favero, Jurgen

    2009-03-01

    We evaluated multiplex PCR amplification as a front-end for high-throughput sequencing, to widen the applicability of massive parallel sequencers for the detailed analysis of complex genomes. Using multiplex PCR reactions, we sequenced the complete coding regions of seven genes implicated in peripheral neuropathies in 40 individuals on a GS-FLX genome sequencer (Roche). The resulting dataset showed highly specific and uniform amplification. Comparison of the GS-FLX sequencing data with the dataset generated by Sanger sequencing confirmed the detection of all variants present and proved the sensitivity of the method for mutation detection. In addition, we showed that we could exploit the multiplexed PCR amplicons to determine individual copy number variation (CNV), increasing the spectrum of detected variations to both genetic and genomic variants. We conclude that our straightforward procedure substantially expands the applicability of the massive parallel sequencers for sequencing projects of a moderate number of amplicons (50-500) with typical applications in resequencing exons in positional or functional candidate regions and molecular genetic diagnostics. © 2008 Wiley-Liss, Inc.
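
    The abstract does not say how CNV was derived from the amplicon data. One common read-depth approach, shown here purely as an assumption and not as the authors' procedure, is to convert each amplicon's read count to a fraction of the sample total and compare it with the median fraction in reference samples. The function names, the thresholds (0.75 and 1.25) and all read counts below are hypothetical.

```python
from statistics import median


def amplicon_fractions(counts):
    """Convert raw per-amplicon read counts to fractions of the sample total."""
    total = sum(counts.values())
    return {amplicon: n / total for amplicon, n in counts.items()}


def copy_number_calls(sample_counts, reference_counts, loss_below=0.75, gain_above=1.25):
    """Label each amplicon as loss, gain or normal from its depth ratio.

    The ratio is the sample's read fraction divided by the median read
    fraction of the same amplicon across the reference samples.
    Thresholds are illustrative assumptions, not validated cut-offs.
    """
    sample_frac = amplicon_fractions(sample_counts)
    reference_fracs = [amplicon_fractions(ref) for ref in reference_counts]
    calls = {}
    for amplicon, frac in sample_frac.items():
        baseline = median(ref[amplicon] for ref in reference_fracs)
        ratio = frac / baseline
        if ratio < loss_below:
            state = "loss"
        elif ratio > gain_above:
            state = "gain"
        else:
            state = "normal"
        calls[amplicon] = (round(ratio, 2), state)
    return calls


# Invented read counts: three reference samples and one test sample in which
# amp1 has roughly half the expected depth (e.g. a heterozygous deletion).
references = [
    {"amp1": 1000, "amp2": 1000, "amp3": 1000, "amp4": 1000},
    {"amp1": 900, "amp2": 900, "amp3": 900, "amp4": 900},
    {"amp1": 1100, "amp2": 1100, "amp3": 1100, "amp4": 1100},
]
sample = {"amp1": 500, "amp2": 1000, "amp3": 1000, "amp4": 1000}

calls = copy_number_calls(sample, references)
for amplicon, (ratio, state) in calls.items():
    print(amplicon, ratio, state)
```

    Normalizing by the sample total slightly inflates the ratios of unaffected amplicons when one amplicon is deleted, so a real analysis would typically normalize against control amplicons or use a more robust scaling.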

  16. Analysis of Genes Involved in Body Weight Regulation by Targeted Re-Sequencing.

    PubMed

    Volckmar, Anna-Lena; Han, Chung Ting; Pütter, Carolin; Haas, Stefan; Vogel, Carla I G; Knoll, Nadja; Struve, Christoph; Göbel, Maria; Haas, Katharina; Herrfurth, Nikolas; Jarick, Ivonne; Grallert, Harald; Schürmann, Annette; Al-Hasani, Hadi; Hebebrand, Johannes; Sauer, Sascha; Hinney, Anke

    2016-01-01

    Genes involved in body weight regulation that were previously investigated in genome-wide association studies (GWAS) and in animal models were target-enriched and then analyzed by massively parallel next-generation sequencing. We enriched and re-sequenced continuous genomic regions comprising FTO, MC4R, TMEM18, SDCCAG8, TNKS, MSRA and TBC1D1 in a screening sample of 196 extremely obese children and adolescents with age and sex specific body mass index (BMI) ≥ 99th percentile and 176 lean adults (BMI ≤ 15th percentile). 22 variants were confirmed by Sanger sequencing. Genotyping was performed in up to 705 independent obesity trios (extremely obese child and both parents), 243 extremely obese cases and 261 lean adults. We detected 20 different non-synonymous variants, one frame shift and one nonsense mutation in the 7 continuous genomic regions in study groups of different weight extremes. For SNP Arg695Cys (rs58983546) in TBC1D1 we detected nominal association with obesity (pTDT = 0.03 in 705 trios). Eleven of the variants were rare and thus were detected only heterozygously in up to ten individuals of the complete screening sample of 372 individuals. Two of them (in FTO and MSRA) were found in lean individuals, nine in extremely obese. In silico analyses of the 11 variants did not reveal functional implications for the mutations. Concordant with our hypothesis we detected a rare variant that potentially leads to loss of FTO function in a lean individual. For TBC1D1, contrary to our hypothesis, the loss of function variant (Arg443Stop) was found in an obese individual. Functional in vitro studies are warranted.
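
    The pTDT value above refers to the transmission disequilibrium test applied to parent-child trios. As a minimal sketch of the standard statistic, assuming only heterozygous parents are counted: if the allele of interest is transmitted b times and not transmitted c times, the test statistic is (b - c)^2 / (b + c), compared against a chi-square distribution with 1 degree of freedom. The function name and the counts used below are invented for illustration and are not taken from the study.

```python
import math


def tdt(transmitted: int, untransmitted: int):
    """Transmission disequilibrium test for one biallelic variant.

    transmitted / untransmitted: how often heterozygous parents did or did
    not pass the allele of interest to their affected child.
    Returns (chi-square statistic, two-sided p-value with 1 degree of freedom).
    """
    informative = transmitted + untransmitted
    if informative == 0:
        raise ValueError("no informative (heterozygous) parents")
    chi2 = (transmitted - untransmitted) ** 2 / informative
    # Survival function of a chi-square variable with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts, not from the study.
chi2, p_value = tdt(transmitted=30, untransmitted=15)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")
```

    Only heterozygous parents are informative, so the number of contributing transmissions can be far smaller than the number of trios genotyped, especially for rare variants.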

  17. Simultaneous detection of human mitochondrial DNA and nuclear-inserted mitochondrial-origin sequences (NumtS) using forensic mtDNA amplification strategies and pyrosequencing technology.

    PubMed

    Bintz, Brittania J; Dixon, Groves B; Wilson, Mark R

    2014-07-01

    Next-generation sequencing technologies enable the identification of minor mitochondrial DNA variants with higher sensitivity than Sanger methods. In this study, mixtures of human mtDNA control region amplicons were subjected to pyrosequencing to determine the detection threshold of the Roche GS Junior® instrument (Roche Applied Science, Indianapolis, IN). In addition to expected variants, a set of reproducible variants was consistently found in reads from one particular amplicon. A BLASTn search of the variant sequence revealed identity to a segment of a 611-bp nuclear insertion of the mitochondrial control region (NumtS) spanning the primer-binding sites of this amplicon (Nature 1995;378:489). Primers (Hum Genet 2012;131:757; Hum Biol 1996;68:847) flanking the insertion were used to confirm the presence or absence of the NumtS in buccal DNA extracts from twenty donors. These results further our understanding of human mtDNA variation and are expected to have a positive impact on the interpretation of mtDNA profiles using deep-sequencing methods in casework. © 2014 American Academy of Forensic Sciences.

  18. Application of PCR-LDR-nucleic acid detection strip in detection of YMDD mutation in hepatitis B patients treated with lamivudine.

    PubMed

    Xu, Gaolian; You, Qimin; Pickerill, Sam; Zhong, Huayan; Wang, Hongying; Shi, Jian; Luo, Ying; You, Paul; Kong, Huimin; Lu, Fengmin; Hu, Lin

    2010-07-01

    Chronic hepatitis B virus (CHBV) infection causes cirrhosis and hepatocellular carcinoma. Lamivudine (LAM) has been successfully used to treat CHBV infections but prolonged use leads to the emergence of drug-resistant variants. This is primarily linked to a mutation in the tyrosine-methionine-aspartate-aspartate (YMDD) motif of the HBV polymerase gene at position 204. Rapid diagnosis of drug-resistant HBV is necessary for a prompt treatment response. Common diagnostic methods such as sequencing and restriction fragment length polymorphism (RFLP) analysis lack sensitivity and require significant processing. The aim of this study was to demonstrate the usefulness of a novel diagnostic method that combines polymerase chain reaction (PCR), ligase detection reaction (LDR) and a nucleic acid detection strip (NADS) in detecting site-specific mutations related to HBV LAM resistance. We compared this method (PLNA) to direct sequencing and RFLP analysis in 50 clinical samples from HBV infected patients. There was 90% concordance between all three results. PLNA detected more samples containing mutant variants than both sequencing and RFLP analysis and was more sensitive in detecting mixed variant populations. Plasmid standards indicated that the sensitivity of PLNA is at or below 3,000 copies per ml and that it can detect a minor variant at 5% of the total viral population. This warrants its further development and suggests that the PLNA method could be a useful tool in detecting LAM resistance. © 2010 Wiley-Liss, Inc.

  19. Segregation and recombination of a multipartite mitochondrial DNA in populations of the potato cyst nematode Globodera pallida.

    PubMed

    Armstrong, Miles R; Husmeier, Dirk; Phillips, Mark S; Blok, Vivian C

    2007-06-01

    The discovery that the potato cyst nematode Globodera pallida has a multipartite mitochondrial DNA (mtDNA) composed, at least in part, of six small circular mtDNAs (scmtDNAs) raised a number of questions concerning the population-level processes that might act on such a complex genome. Here we report our observations on the distribution of some scmtDNAs among a sample of European and South American G. pallida populations. The occurrence of sequence variants of scmtDNA IV in population P4A from South America, and that particular sequence variants are common to the individuals within a single cyst, is described. Evidence for recombination of sequence variants of scmtDNA IV in P4A is also reported. The mosaic structure of P4A scmtDNA IV sequences was revealed using several detection methods and recombination breakpoints were independently detected by maximum likelihood and Bayesian MCMC methods.

  20. Screening for single nucleotide variants, small indels and exon deletions with a next-generation sequencing based gene panel approach for Usher syndrome

    PubMed Central

    Krawitz, Peter M; Schiska, Daniela; Krüger, Ulrike; Appelt, Sandra; Heinrich, Verena; Parkhomchuk, Dmitri; Timmermann, Bernd; Millan, Jose M; Robinson, Peter N; Mundlos, Stefan; Hecht, Jochen; Gross, Manfred

    2014-01-01

    Usher syndrome is an autosomal recessive disorder characterized both by deafness and blindness. For the three clinical subtypes of Usher syndrome causal mutations in altogether 12 genes and a modifier gene have been identified. Due to the genetic heterogeneity of Usher syndrome, the molecular analysis is well suited to a comprehensive and parallelized analysis of all known genes by next-generation sequencing (NGS) approaches. We describe here the targeted enrichment and deep sequencing for exons of Usher genes and compare the costs and workload of this approach with those of Sanger sequencing. We also present a bioinformatics analysis pipeline that allows us to detect single-nucleotide variants, short insertions and deletions, as well as copy number variations of one or more exons on the same sequence data. Additionally, we present a flexible in silico gene panel for the analysis of sequence variants, in which newly identified genes can easily be included. We applied this approach to a cohort of 44 Usher patients and detected biallelic pathogenic mutations in 35 individuals and monoallelic mutations in eight individuals of our cohort. Thirty-nine of the sequence variants, including two heterozygous deletions comprising several exons of USH2A, have not been reported so far. Our NGS-based approach allowed us to assess single-nucleotide variants, small indels, and whole exon deletions in a single test. The described diagnostic approach is fast and cost-effective with a high molecular diagnostic yield. PMID:25333064

  1. Screening for single nucleotide variants, small indels and exon deletions with a next-generation sequencing based gene panel approach for Usher syndrome.

    PubMed

    Krawitz, Peter M; Schiska, Daniela; Krüger, Ulrike; Appelt, Sandra; Heinrich, Verena; Parkhomchuk, Dmitri; Timmermann, Bernd; Millan, Jose M; Robinson, Peter N; Mundlos, Stefan; Hecht, Jochen; Gross, Manfred

    2014-09-01

    Usher syndrome is an autosomal recessive disorder characterized both by deafness and blindness. For the three clinical subtypes of Usher syndrome causal mutations in altogether 12 genes and a modifier gene have been identified. Due to the genetic heterogeneity of Usher syndrome, the molecular analysis is well suited to a comprehensive and parallelized analysis of all known genes by next-generation sequencing (NGS) approaches. We describe here the targeted enrichment and deep sequencing for exons of Usher genes and compare the costs and workload of this approach with those of Sanger sequencing. We also present a bioinformatics analysis pipeline that allows us to detect single-nucleotide variants, short insertions and deletions, as well as copy number variations of one or more exons on the same sequence data. Additionally, we present a flexible in silico gene panel for the analysis of sequence variants, in which newly identified genes can easily be included. We applied this approach to a cohort of 44 Usher patients and detected biallelic pathogenic mutations in 35 individuals and monoallelic mutations in eight individuals of our cohort. Thirty-nine of the sequence variants, including two heterozygous deletions comprising several exons of USH2A, have not been reported so far. Our NGS-based approach allowed us to assess single-nucleotide variants, small indels, and whole exon deletions in a single test. The described diagnostic approach is fast and cost-effective with a high molecular diagnostic yield.

  2. RAPTR-SV: a hybrid method for the detection of structural variants

    USDA-ARS's Scientific Manuscript database

    Motivation: Identification of Structural Variants (SV) in sequence data results in a large number of false positive calls using existing software, which overburdens subsequent validation. Results: Simulations using RAPTR-SV and another software package that uses a similar algorithm for SV detection...

  3. Comparative oncology DNA sequencing of canine T cell lymphoma via human hotspot panel

    PubMed Central

    Beheshti, Afshin; Pilichowska, Monika; Burgess, Kristine; Ricks-Santi, Luisel; McNiel, Elizabeth; London, Cheryl B.; Ravi, Dashnamoorthy; Evens, Andrew M.

    2018-01-01

    T-cell lymphoma (TCL) is an uncommon and aggressive form of human cancer. Lymphoma is the most common hematopoietic tumor in canines (companion animals), with TCL representing approximately 30% of diagnoses. Collectively, the canine is an appealing model for cancer research given the spontaneous occurrence of cancer, intact immune system, and phylogenetic proximity to humans. We sought to establish mutational congruence of the canine with known human TCL mutations in order to identify potential actionable oncogenic pathways. Following pathologic confirmation, DNA was sequenced in 16 canine TCL (cTCL) cases using a custom Human Cancer Hotspot Panel of 68 genes commonly mutated in human TCL. Sequencing identified 4,527,638 total reads with average length of 229 bases containing 346 unique variants and 1,474 total variants; each sample had an average of 92 variants. Among these, there were 258 germline and 32 somatic variants. Among the 32 somatic variants there were 8 missense variants, 1 splice junction variant and the remaining were intron or synonymous variants. A frequency of 4 somatic mutations per sample was noted with >7 mutations detected in MET, KDR, STK11 and BRAF. Expression of these associated proteins was also detected via Western blot analyses. In addition, Sanger sequencing confirmed three variants of high quality (MYC, MET, and TP53 missense mutation). Taken together, the mutational spectrum and protein analyses showed mutations in signaling pathways similar to human TCL and also identified novel mutations that may serve as drug targets as well as potential biomarkers. PMID:29854308

  4. A massive parallel sequencing workflow for diagnostic genetic testing of mismatch repair genes

    PubMed Central

    Hansen, Maren F; Neckmann, Ulrike; Lavik, Liss A S; Vold, Trine; Gilde, Bodil; Toft, Ragnhild K; Sjursen, Wenche

    2014-01-01

    The purpose of this study was to develop a massive parallel sequencing (MPS) workflow for diagnostic analysis of mismatch repair (MMR) genes using the GS Junior system (Roche). A pathogenic variant in one of four MMR genes (MLH1, PMS2, MSH6, and MSH2) is the cause of Lynch Syndrome (LS), which mainly predisposes to colorectal cancer. We used an amplicon-based sequencing method allowing specific and preferential amplification of the MMR genes including PMS2, of which several pseudogenes exist. The amplicons were pooled at different ratios to obtain coverage uniformity and maximize the throughput of a single GS Junior run. In total, 60 previously identified and distinct variants (substitutions and indels) were sequenced by MPS and successfully detected. The heterozygote detection range was from 19% to 63% and dependent on sequence context and coverage. We were able to distinguish between false-positive and true-positive calls in homopolymeric regions by cross-sample comparison and evaluation of flow signal distributions. In addition, we filtered variants according to a predefined status, which facilitated variant annotation. Our study shows that implementation of MPS in routine diagnostics of LS can accelerate sample throughput and reduce costs without compromising sensitivity, compared to Sanger sequencing. PMID:24689082

  5. Generic and sequence-variant specific molecular assays for the detection of the highly variable Grapevine leafroll-associated virus 3.

    PubMed

    Chooi, Kar Mun; Cohen, Daniel; Pearson, Michael N

    2013-04-01

    Grapevine leafroll-associated virus 3 (GLRaV-3) is an economically important virus, which is found in all grapevine growing regions worldwide. Its accurate detection in nursery and field samples is of high importance for certification schemes and disease management programmes. To reduce false negatives that can be caused by sequence variability, a new universal primer pair was designed against a divergent sequence data set, targeting the open reading frame 4 (heat shock protein 70 homologue gene), and optimised for conventional one-step RT-PCR and one-step SYBR Green real-time RT-PCR assays. In addition, primer pairs for the simultaneous detection of specific GLRaV-3 variants from groups 1, 2, 6 (specifically NZ-1) and the outlier NZ2 variant, and the generic detection of variants from groups 1 to 5 were designed and optimised as a conventional one-step multiplex RT-PCR assay using the plant nad5 gene as an internal control (i.e. one-step hexaplex RT-PCR). Results showed that the generic and variant specific assays detected in vitro RNA transcripts from a range of 1×10^1 to 1×10^8 copies of amplicon per μl diluted in healthy total RNA from Vitis vinifera cv. Cabernet Sauvignon. Furthermore, the assays were employed effectively to screen 157 germplasm and 159 commercial field samples. Thus, the results demonstrate that the GLRaV-3 generic and variant-specific assays are prospective tools that will be beneficial for certification schemes and disease management programmes, as well as biological and epidemiological studies of the divergent GLRaV-3 populations. Copyright © 2013 Elsevier B.V. All rights reserved.

  6. Comparison of an In Vitro Diagnostic Next-Generation Sequencing Assay with Sanger Sequencing for HIV-1 Genotypic Resistance Testing.

    PubMed

    Tzou, Philip L; Ariyaratne, Pramila; Varghese, Vici; Lee, Charlie; Rakhmanaliev, Elian; Villy, Carolin; Yee, Meiqi; Tan, Kevin; Michel, Gerd; Pinsky, Benjamin A; Shafer, Robert W

    2018-06-01

    The ability of next-generation sequencing (NGS) technologies to detect low frequency HIV-1 drug resistance mutations (DRMs) not detected by dideoxynucleotide Sanger sequencing has potential advantages for improved patient outcomes. We compared the performance of an in vitro diagnostic (IVD) NGS assay, the Sentosa SQ HIV genotyping assay for HIV-1 genotypic resistance testing, with Sanger sequencing on 138 protease/reverse transcriptase (RT) and 39 integrase sequences. The NGS assay used a 5% threshold for reporting low-frequency variants. The level of complete plus partial nucleotide sequence concordance between Sanger sequencing and NGS was 99.9%. Among the 138 protease/RT sequences, a mean of 6.4 DRMs was identified by both Sanger and NGS, a mean of 0.5 DRM was detected by NGS alone, and a mean of 0.1 DRM was detected by Sanger sequencing alone. Among the 39 integrase sequences, a mean of 1.6 DRMs was detected by both Sanger sequencing and NGS and a mean of 0.15 DRM was detected by NGS alone. Compared with Sanger sequencing, NGS estimated higher levels of resistance to one or more antiretroviral drugs for 18.2% of protease/RT sequences and 5.1% of integrase sequences. There was little evidence for technical artifacts in the NGS sequences, but the G-to-A hypermutation was detected in three samples. In conclusion, the IVD NGS assay evaluated in this study was highly concordant with Sanger sequencing. At the 5% threshold for reporting minority variants, NGS appeared to attain a modestly increased sensitivity for detecting low-frequency DRMs without compromising sequence accuracy. Copyright © 2018 American Society for Microbiology.
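
    The 5% reporting threshold mentioned above amounts to a simple frequency filter on per-mutation read counts. The sketch below is only an illustration of that filter, not the Sentosa assay's software: the function name is hypothetical, and the read counts attached to the mutation labels are invented.

```python
def reportable_variants(observations, threshold=0.05):
    """Keep mutations whose read frequency is at or above the threshold.

    observations maps a mutation label to (variant_reads, total_reads).
    Positions without coverage are skipped rather than reported.
    """
    reported = {}
    for mutation, (variant_reads, total_reads) in observations.items():
        if total_reads == 0:
            continue
        frequency = variant_reads / total_reads
        if frequency >= threshold:
            reported[mutation] = frequency
    return reported


# Invented read counts for illustration only.
observations = {
    "M184V": (950, 1000),  # dominant variant
    "K103N": (60, 1000),   # 6%, just above the 5% reporting threshold
    "E138A": (20, 1000),   # 2%, below the threshold and not reported
}

reported = reportable_variants(observations)
for mutation, frequency in sorted(reported.items()):
    print(f"{mutation}: {frequency:.1%}")
```

    Lowering the threshold would report more minority variants but also more sequencing artifacts, which is why the threshold is fixed in advance rather than tuned per sample.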

  7. Double Hits in Schizophrenia.

    PubMed

    Vorstman, Jacob A S; Olde Loohuis, Loes M; Kahn, René S; Ophoff, Roel A

    2018-05-14

    The co-occurrence of a Copy Number Variant (CNV) and a functional variant on the other allele may be a relevant genetic mechanism in schizophrenia. We hypothesized that the cumulative burden of such double hits - in particular those composed of a deletion and a coding single nucleotide variation (SNV) - is increased in patients with schizophrenia. We combined CNV data with coding variants data in 795 patients with schizophrenia and 474 controls. To limit false CNV detection, only CNVs called by two algorithms were included. CNV-affected genes were subsequently examined for coding SNVs, which we termed "CNV-SNVs". Correcting for total queried sequence, we assessed the CNV-SNV burden and the combined predicted deleterious effect. We estimated p-values by permutation of the phenotype. We detected 105 CNV-SNVs: 67 in duplicated and 38 in deleted genic sequence. While the difference in CNV-SNV rates was not significant, the combined deleteriousness inferred by CNV-SNVs in deleted sequence was almost fourfold higher in cases compared to controls (nominal p = 0.009). This effect may be driven by a higher number of CNV-SNVs and/or by a higher degree of predicted deleteriousness of CNV-SNVs. No such effect was observed for duplications. We provide early evidence that deletions co-occurring with a functional variant may be relevant, albeit of modest impact, for the genetic etiology of schizophrenia. Large-scale consortium studies are required to validate our findings. Sequence-based analyses would provide the best resolution for detection of CNVs as well as coding variants genome-wide.
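
The abstract above estimates p-values by permuting the phenotype. A minimal sketch of that general technique follows, assuming a per-subject burden score and a one-sided test for a higher mean burden in cases. The function name, the test statistic, and the add-one correction are illustrative assumptions, not the authors' implementation.

```python
import random

def permutation_p_value(burden, is_case, n_permutations=10000, seed=0):
    """One-sided permutation p-value for a higher mean burden in cases.

    burden  : per-subject burden scores (hypothetical values)
    is_case : parallel list of booleans (True = case, False = control)
    """
    rng = random.Random(seed)

    def mean_difference(labels):
        cases = [b for b, c in zip(burden, labels) if c]
        controls = [b for b, c in zip(burden, labels) if not c]
        return sum(cases) / len(cases) - sum(controls) / len(controls)

    observed = mean_difference(is_case)
    labels = list(is_case)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(labels)  # break any link between phenotype and burden
        if mean_difference(labels) >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)
```

Shuffling the case/control labels while leaving the burden scores fixed simulates the null hypothesis that phenotype and burden are unrelated; the p-value is the share of shuffles that look at least as extreme as the observed data.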

  8. Next-generation sequencing for genetic testing of familial colorectal cancer syndromes.

    PubMed

    Simbolo, Michele; Mafficini, Andrea; Agostini, Marco; Pedrazzani, Corrado; Bedin, Chiara; Urso, Emanuele D; Nitti, Donato; Turri, Giona; Scardoni, Maria; Fassan, Matteo; Scarpa, Aldo

    2015-01-01

    Genetic screening in families at high risk of developing colorectal cancer (CRC) prevents incurable disease and permits personalized therapeutic and follow-up strategies. The advancement of next-generation sequencing (NGS) technologies has revolutionized the throughput of DNA sequencing. A series of 16 probands for either familial adenomatous polyposis (FAP; 8 cases) or hereditary nonpolyposis colorectal cancer (HNPCC; 8 cases) were investigated for intragenic mutations in five genes associated with familial CRC syndromes (APC, MUTYH, MLH1, MSH2, MSH6) applying both a custom multigene Ion AmpliSeq NGS panel and conventional Sanger sequencing. Fourteen pathogenic variants were detected in 13/16 FAP/HNPCC probands (81.3 %); one FAP proband presented two co-existing pathogenic variants, one in APC and one in MUTYH. Thirteen of these 14 pathogenic variants were detected by both NGS and Sanger, while one MSH2 mutation (L280FfsX3) was identified only by Sanger sequencing. This is due to a limitation of the NGS approach in resolving sequences close to or within homopolymeric stretches of DNA. To evaluate the performance of our NGS custom panel we assessed its capability to resolve the DNA sequences corresponding to 2225 pathogenic variants reported in the COSMIC database for APC, MUTYH, MLH1, MSH2, MSH6. Our NGS custom panel resolves the sequences where 2108 (94.7 %) of these variants occur. The remaining 117 mutations reside inside or in close proximity to homopolymer stretches; of these 27 (1.2 %) are imprecisely identified by the software but can be resolved by visual inspection of the region, while the remaining 90 variants (4.0 %) are blind spots. In summary, our custom panel would miss 4 % (90/2225) of pathogenic variants that would need a small set of Sanger sequencing reactions to be solved. 
The multiplex NGS approach has the advantage of analyzing multiple genes in multiple samples simultaneously, requiring only a reduced number of Sanger sequences to resolve homopolymeric DNA regions not adequately assessed by NGS. The implementation of NGS approaches in routine diagnostics of familial CRC is cost-effective and significantly reduces diagnostic turnaround times.
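
The coverage percentages quoted in the abstract above can be rechecked from its own counts. The short sketch below only reproduces that arithmetic; the variable and function names are illustrative.

```python
# Counts quoted in the abstract above.
total_variants = 2225      # COSMIC pathogenic variants assessed
resolved = 2108            # sequences resolved by the NGS panel
visually_resolvable = 27   # imprecisely called, resolvable by inspection
blind_spots = 90           # not resolvable by the panel

def percent(part, whole):
    """Percentage of `whole` represented by `part`, to one decimal place."""
    return round(100 * part / whole, 1)

# Variants in or near homopolymer stretches are the two unresolved groups.
near_homopolymers = visually_resolvable + blind_spots

for label, count in [
    ("resolved", resolved),
    ("visually resolvable", visually_resolvable),
    ("blind spots", blind_spots),
]:
    print(f"{label}: {count}/{total_variants} = {percent(count, total_variants)} %")
```

The three groups sum to the 2225 variants assessed, and the two unresolved groups sum to the 117 variants the abstract places in or near homopolymer stretches.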

  9. Whole Exome Sequencing Identifies Rare Protein-Coding Variants in Behçet's Disease.

    PubMed

    Ognenovski, Mikhail; Renauer, Paul; Gensterblum, Elizabeth; Kötter, Ina; Xenitidis, Theodoros; Henes, Jörg C; Casali, Bruno; Salvarani, Carlo; Direskeneli, Haner; Kaufman, Kenneth M; Sawalha, Amr H

    2016-05-01

    Behçet's disease (BD) is a systemic inflammatory disease with an incompletely understood etiology. Despite the identification of multiple common genetic variants associated with BD, rare genetic variants have been less explored. We undertook this study to investigate the role of rare variants in BD by performing whole exome sequencing in BD patients of European descent. Whole exome sequencing was performed in a discovery set comprising 14 German BD patients of European descent. For replication and validation, Sanger sequencing and Sequenom genotyping were performed in the discovery set and in 2 additional independent sets of 49 German BD patients and 129 Italian BD patients of European descent. Genetic association analysis was then performed in BD patients and 503 controls of European descent. Functional effects of associated genetic variants were assessed using bioinformatic approaches. Using whole exome sequencing, we identified 77 rare variants (in 74 genes) with predicted protein-damaging effects in BD. These variants were genotyped in 2 additional patient sets and then analyzed to reveal significant associations with BD at 2 genetic variants detected in all 3 patient sets that remained significant after Bonferroni correction. We detected genetic association between BD and LIMK2 (rs149034313), involved in regulating cytoskeletal reorganization, and between BD and NEIL1 (rs5745908), involved in base excision DNA repair (P = 3.22 × 10⁻⁴ and P = 5.16 × 10⁻⁴, respectively). The LIMK2 association is a missense variant with predicted protein damage that may influence functional interactions with proteins involved in cytoskeletal regulation by Rho GTPase, inflammation mediated by chemokine and cytokine signaling pathways, T cell activation, and angiogenesis (Bonferroni-corrected P = 5.63 × 10⁻¹⁴, P = 7.29 × 10⁻⁶, P = 1.15 × 10⁻⁵, and P = 6.40 × 10⁻³, respectively). 
The genetic association in NEIL1 is a predicted splice donor variant that may introduce a deleterious intron retention and result in a noncoding transcript variant. We used whole exome sequencing in BD for the first time and identified 2 rare putative protein-damaging genetic variants associated with this disease. These genetic variants might influence cytoskeletal regulation and DNA repair mechanisms in BD and might provide further insight into increased leukocyte tissue infiltration and the role of oxidative stress in BD. © 2016, American College of Rheumatology.
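
The abstract above reports that two associations remained significant after Bonferroni correction. As a hedged illustration of how such a check works, the sketch below assumes the correction was applied across the 77 genotyped variants at a family-wise alpha of 0.05; neither the number of tests nor the alpha level is stated in the abstract, and the function name is hypothetical.

```python
def bonferroni_significant(p_values, n_tests, alpha=0.05):
    """Flag which p-values pass a Bonferroni-corrected threshold.

    p_values : dict mapping a label to its uncorrected p-value
    n_tests  : number of tests corrected for (assumed here, not reported)
    alpha    : family-wise error rate (0.05 is an assumption)
    """
    threshold = alpha / n_tests
    return {label: p <= threshold for label, p in p_values.items()}

# P-values quoted in the abstract; 77 tests is an assumption.
reported = {"LIMK2 rs149034313": 3.22e-4, "NEIL1 rs5745908": 5.16e-4}
print(bonferroni_significant(reported, n_tests=77))
```

Under these assumptions the per-test threshold is 0.05 / 77, roughly 6.5 × 10⁻⁴, and both reported p-values fall below it.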

  10. A Follow-Up of the Multicenter Collaborative Study on HIV-1 Drug Resistance and Tropism Testing Using 454 Ultra Deep Pyrosequencing

    PubMed Central

    St. John, Elizabeth P.; Simen, Birgitte B.; Turenchalk, Gregory S.; Braverman, Michael S.; Abbate, Isabella; Aerssens, Jeroen; Bouchez, Olivier; Gabriel, Christian; Izopet, Jacques; Meixenberger, Karolin; Di Giallonardo, Francesca; Schlapbach, Ralph; Paredes, Roger; Sakwa, James; Schmitz-Agheguian, Gudrun G.; Thielen, Alexander; Victor, Martin

    2016-01-01

    Background: Ultra deep sequencing is of increasing use not only in research but also in diagnostics. For implementation of ultra deep sequencing assays in clinical laboratories for routine diagnostics, intra- and inter-laboratory testing are of the utmost importance. Methods: A multicenter study was conducted to validate an updated assay design for 454 Life Sciences’ GS FLX Titanium system targeting protease/reverse transcriptase (RTP) and env (V3) regions to identify HIV-1 drug-resistance mutations and determine co-receptor use with high sensitivity. The study included 30 HIV-1 subtype B and 6 subtype non-B samples with viral titers (VT) of 3,940–447,400 copies/mL, two dilution series (52,129–1,340 and 25,130–734 copies/mL), and triplicate samples. Amplicons spanning PR codons 10–99, RT codons 1–251 and the entire V3 region were generated using barcoded primers. Analysis was performed using the GS Amplicon Variant Analyzer and geno2pheno for tropism. For comparison, population sequencing was performed using the ViroSeq HIV-1 genotyping system. Results: The median sequencing depth across the 11 sites was 1,829 reads per position for RTP (IQR 592–3,488) and 2,410 for V3 (IQR 786–3,695). Ten preselected drug-resistant variants were measured across sites and showed high inter-laboratory correlation across all sites with data (P<0.001). The triplicate samples of a plasmid mixture confirmed the high inter-laboratory consistency (mean% ± stdev: 4.6 ± 0.5, 4.8 ± 0.4, 4.9 ± 0.3) and revealed good intra-laboratory consistency (mean% range ± stdev range: 4.2–5.2 ± 0.04–0.65). In the two dilution series, no variants >20% were missed, variants 2–10% were detected at most sites (even at low VT), and variants 1–2% were detected by some sites. All mutations detected by population sequencing were also detected by UDS. 
Conclusions: This assay design results in an accurate and reproducible approach to analyze HIV-1 mutant spectra, even at variant frequencies well below those routinely detectable by population sequencing. PMID:26756901

  11. Reliable Detection of Herpes Simplex Virus Sequence Variation by High-Throughput Resequencing.

    PubMed

    Morse, Alison M; Calabro, Kaitlyn R; Fear, Justin M; Bloom, David C; McIntyre, Lauren M

    2017-08-16

    High-throughput sequencing (HTS) has resulted in data for a number of herpes simplex virus (HSV) laboratory strains and clinical isolates. The knowledge of these sequences has been critical for investigating viral pathogenicity. However, the assembly of complete herpesviral genomes, including HSV, is complicated due to the existence of large repeat regions and arrays of smaller reiterated sequences that are commonly found in these genomes. In addition, the inherent genetic variation in populations of isolates for viruses and other microorganisms presents an additional challenge to many existing HTS sequence assembly pipelines. Here, we evaluate two approaches for the identification of genetic variants in HSV1 strains using Illumina short read sequencing data. The first, a reference-based approach, identifies variants from reads aligned to a reference sequence and the second, a de novo assembly approach, identifies variants from reads aligned to de novo assembled consensus sequences. Of critical importance for both approaches is the reduction in the number of low complexity regions through the construction of a non-redundant reference genome. We compared variants identified in the two methods. Our results indicate that approximately 85% of variants are identified regardless of the approach. The reference-based approach to variant discovery captures an additional 15% representing variants divergent from the HSV1 reference possibly due to viral passage. Reference-based approaches are significantly less labor-intensive and identify variants across the genome where de novo assembly-based approaches are limited to regions where contigs have been successfully assembled. In addition, regions of poor quality assembly can lead to false variant identification in de novo consensus sequences. For viruses with a well-assembled reference genome, a reference-based approach is recommended.

  12. BreaKmer: detection of structural variation in targeted massively parallel sequencing data using kmers.

    PubMed

    Abo, Ryan P; Ducar, Matthew; Garcia, Elizabeth P; Thorner, Aaron R; Rojas-Rudilla, Vanesa; Lin, Ling; Sholl, Lynette M; Hahn, William C; Meyerson, Matthew; Lindeman, Neal I; Van Hummelen, Paul; MacConaill, Laura E

    2015-02-18

    Genomic structural variation (SV), a common hallmark of cancer, has important predictive and therapeutic implications. However, accurately detecting SV using high-throughput sequencing data remains challenging, especially for 'targeted' resequencing efforts. This is critically important in the clinical setting where targeted resequencing is frequently being applied to rapidly assess clinically actionable mutations in tumor biopsies in a cost-effective manner. We present BreaKmer, a novel approach that uses a 'kmer' strategy to assemble misaligned sequence reads for predicting insertions, deletions, inversions, tandem duplications and translocations at base-pair resolution in targeted resequencing data. Variants are predicted by realigning an assembled consensus sequence created from sequence reads that were abnormally aligned to the reference genome. Using targeted resequencing data from tumor specimens with orthogonally validated SV, non-tumor samples and whole-genome sequencing data, BreaKmer had a 97.4% overall sensitivity for known events and predicted 17 positively validated, novel variants. Relative to four publicly available algorithms, BreaKmer detected SV with increased sensitivity and limited calls in non-tumor samples, key features for variant analysis of tumor specimens in both the clinical and research settings. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
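
To make the 'kmer' idea in the abstract above concrete, here is a toy sketch of how k-mers can flag read sequence that is absent from a reference, such as sequence spanning a deletion breakpoint. This illustrates the general principle only and is an assumption, not BreaKmer's actual algorithm; the function names and the example sequences are hypothetical.

```python
def kmers(sequence, k):
    """Return the set of all substrings of length k in `sequence`."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}

def novel_kmers(reads, reference, k):
    """K-mers seen in the reads but absent from the reference sequence."""
    reference_kmers = kmers(reference, k)
    read_kmers = set()
    for read in reads:
        read_kmers |= kmers(read, k)
    return read_kmers - reference_kmers

# Hypothetical example: the read lacks the 'GGG' present in the reference,
# so the k-mers that span the deletion junction do not occur in the reference.
reference = "AAACCCGGGTTT"
reads = ["AAACCCTTT"]
print(sorted(novel_kmers(reads, reference, k=4)))
```

K-mers that survive the subtraction cluster around the junction, which is why they are a natural seed for assembling reads that support a structural variant.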

  13. BETASEQ: a powerful novel method to control type-I error inflation in partially sequenced data for rare variant association testing.

    PubMed

    Yan, Song; Li, Yun

    2014-02-15

    Despite its great capability to detect rare variant associations, next-generation sequencing is still prohibitively expensive when applied to large samples. In case-control studies, it is thus appealing to sequence only a subset of cases to discover variants and genotype the identified variants in controls and the remaining cases under the reasonable assumption that causal variants are usually enriched among cases. However, this approach leads to inflated type-I error if analyzed naively for rare variant association. Several methods have been proposed in recent literature to control type-I error at the cost of either excluding some sequenced cases or correcting the genotypes of discovered rare variants. All of these approaches thus suffer from a certain extent of information loss and are therefore underpowered. We propose a novel method (BETASEQ), which corrects inflation of type-I error by supplementing pseudo-variants while keeping the original sequence and genotype data intact. Extensive simulations and real data analysis demonstrate that, in most practical situations, BETASEQ leads to higher testing powers than existing approaches with guaranteed (controlled or conservative) type-I error. BETASEQ and associated R files, including documentation and examples, are available at http://www.unc.edu/~yunmli/betaseq

  14. Uptake, Results, and Outcomes of Germline Multiple-Gene Sequencing After Diagnosis of Breast Cancer.

    PubMed

    Kurian, Allison W; Ward, Kevin C; Hamilton, Ann S; Deapen, Dennis M; Abrahamse, Paul; Bondarenko, Irina; Li, Yun; Hawley, Sarah T; Morrow, Monica; Jagsi, Reshma; Katz, Steven J

    2018-05-10

    Low-cost sequencing of multiple genes is increasingly available for cancer risk assessment. Little is known about uptake or outcomes of multiple-gene sequencing after breast cancer diagnosis in community practice. To examine the effect of multiple-gene sequencing on the experience and treatment outcomes for patients with breast cancer. For this population-based retrospective cohort study, patients with breast cancer diagnosed from January 2013 to December 2015 and accrued from SEER registries across Georgia and in Los Angeles, California, were surveyed (n = 5080, response rate = 70%). Responses were merged with SEER data and results of clinical genetic tests, either BRCA1 and BRCA2 (BRCA1/2) sequencing only or including additional other genes (multiple-gene sequencing), provided by 4 laboratories. Type of testing (multiple-gene sequencing vs BRCA1/2-only sequencing), test results (negative, variant of unknown significance, or pathogenic variant), patient experiences with testing (timing of testing, who discussed results), and treatment (strength of patient consideration of, and surgeon recommendation for, prophylactic mastectomy), and prophylactic mastectomy receipt. We defined a patient subgroup with higher pretest risk of carrying a pathogenic variant according to practice guidelines. Among 5026 patients (mean [SD] age, 59.9 [10.7]), 1316 (26.2%) were linked to genetic results from any laboratory. Multiple-gene sequencing increasingly replaced BRCA1/2-only testing over time: in 2013, the rate of multiple-gene sequencing was 25.6% and BRCA1/2-only testing, 74.4%; in 2015 the rate of multiple-gene sequencing was 66.5% and BRCA1/2-only testing, 33.5%. Multiple-gene sequencing was more often ordered by genetic counselors (multiple-gene sequencing, 25.5% and BRCA1/2-only testing, 15.3%) and delayed until after surgery (multiple-gene sequencing, 32.5% and BRCA1/2-only testing, 19.9%). 
Multiple-gene sequencing substantially increased rate of detection of any pathogenic variant (multiple-gene sequencing: higher-risk patients, 12%; average-risk patients, 4.2% and BRCA1/2-only testing: higher-risk patients, 7.8%; average-risk patients, 2.2%) and variants of uncertain significance, especially in minorities (multiple-gene sequencing: white patients, 23.7%; black patients, 44.5%; and Asian patients, 50.9% and BRCA1/2-only testing: white patients, 2.2%; black patients, 5.6%; and Asian patients, 0%). Multiple-gene sequencing was not associated with an increase in the rate of prophylactic mastectomy use, which was highest with pathogenic variants in BRCA1/2 (BRCA1/2, 79.0%; other pathogenic variant, 37.6%; variant of uncertain significance, 30.2%; negative, 35.3%). Multiple-gene sequencing rapidly replaced BRCA1/2-only testing for patients with breast cancer in the community and enabled 2-fold higher detection of clinically relevant pathogenic variants without an associated increase in prophylactic mastectomy. However, important targets for improvement in the clinical utility of multiple-gene sequencing include postsurgical delay and racial/ethnic disparity in variants of uncertain significance.

  15. Sequencing Structural Variants in Cancer for Precision Therapeutics.

    PubMed

    Macintyre, Geoff; Ylstra, Bauke; Brenton, James D

    2016-09-01

    The identification of mutations that guide therapy selection for patients with cancer is now routine in many clinical centres. The majority of assays used for solid tumour profiling use DNA sequencing to interrogate somatic point mutations because they are relatively easy to identify and interpret. Many cancers, however, including high-grade serous ovarian, oesophageal, and small-cell lung cancer, are driven by somatic structural variants that are not measured by these assays. Therefore, there is currently an unmet need for clinical assays that can cheaply and rapidly profile structural variants in solid tumours. In this review we survey the landscape of 'actionable' structural variants in cancer and identify promising detection strategies based on massively-parallel sequencing. Copyright © 2016 Elsevier Ltd. All rights reserved.

  16. Hemoglobin analyses in the Netherlands reveal more than 80 different variants including six novel ones.

    PubMed

    van Zwieten, Rob; Veldthuis, Martijn; Delzenne, Barend; Berghuis, Jeffrey; Groen, Joke; Ait Ichou, Fatima; Clifford, Els; Harteveld, Cornelis L; Stroobants, An K

    2014-01-01

    More than 20,000 blood samples of individuals living in The Netherlands and suspected of hemolytic anemia or diabetes were analyzed by high resolution cation exchange high performance liquid chromatography (HPLC). Besides common disease-related hemoglobins (Hbs), rare variants were also detected. The variant Hbs were retrospectively analyzed by capillary zone electrophoresis (CZE) and by isoelectric focusing (IEF). For unambiguous identification, the globin genes were sequenced. Most of the 80 Hb variants detected by initial screening on HPLC were also separated by capillary electrophoresis (CE), but a few variants were only detectable with one of these methods. Some variants were unstable, had thalassemic properties or increased oxygen affinity, and some interfered with Hb A2 measurement, detection of sickle cell Hb or Hb A1c quantification. Two of the six novel variants, Hb Enschede (HBA2: c.308G > A, p.Ser103Asn) and Hb Weesp (HBA1: c.301C > T, p.Leu101Phe), had no clinical consequences. In contrast, two others appeared clinically significant: Hb Ede (HBB: c.53A > T, p.Lys18Met) caused thalassemia and Hb Waterland (HBB: c.428C > T, p.Ala143Val) was related to mild polycythemia. Hb A2-Venlo (HBD: c.193G > A, p.Gly65Ser) and Hb A2-Rotterdam (HBD: c.38A > C, p.Asn13Thr) interfered with Hb A2 quantification. This survey shows that HPLC analysis followed by globin gene sequencing of rare variants is an effective method to reveal Hb variants.

  17. Integrating multiple genomic data to predict disease-causing nonsynonymous single nucleotide variants in exome sequencing studies.

    PubMed

    Wu, Jiaxin; Li, Yanda; Jiang, Rui

    2014-03-01

    Exome sequencing has been widely used in detecting pathogenic nonsynonymous single nucleotide variants (SNVs) for human inherited diseases. However, traditional statistical genetics methods are ineffective in analyzing exome sequencing data, due to such facts as the large number of sequenced variants, the presence of non-negligible fraction of pathogenic rare variants or de novo mutations, and the limited size of affected and normal populations. Indeed, prevalent applications of exome sequencing have been appealing for an effective computational method for identifying causative nonsynonymous SNVs from a large number of sequenced variants. Here, we propose a bioinformatics approach called SPRING (Snv PRioritization via the INtegration of Genomic data) for identifying pathogenic nonsynonymous SNVs for a given query disease. Based on six functional effect scores calculated by existing methods (SIFT, PolyPhen2, LRT, MutationTaster, GERP and PhyloP) and five association scores derived from a variety of genomic data sources (gene ontology, protein-protein interactions, protein sequences, protein domain annotations and gene pathway annotations), SPRING calculates the statistical significance that an SNV is causative for a query disease and hence provides a means of prioritizing candidate SNVs. With a series of comprehensive validation experiments, we demonstrate that SPRING is valid for diseases whose genetic bases are either partly known or completely unknown and effective for diseases with a variety of inheritance styles. In applications of our method to real exome sequencing data sets, we show the capability of SPRING in detecting causative de novo mutations for autism, epileptic encephalopathies and intellectual disability. We further provide an online service, the standalone software and genome-wide predictions of causative SNVs for 5,080 diseases at http://bioinfo.au.tsinghua.edu.cn/spring.

  18. High-resolution melting (HRM) re-analysis of a polyposis patients cohort reveals previously undetected heterozygous and mosaic APC gene mutations.

    PubMed

    Out, Astrid A; van Minderhout, Ivonne J H M; van der Stoep, Nienke; van Bommel, Lysette S R; Kluijt, Irma; Aalfs, Cora; Voorendt, Marsha; Vossen, Rolf H A M; Nielsen, Maartje; Vasen, Hans F A; Morreau, Hans; Devilee, Peter; Tops, Carli M J; Hes, Frederik J

    2015-06-01

    Familial adenomatous polyposis is most frequently caused by pathogenic variants in either the APC gene or the MUTYH gene. The detection rate of pathogenic variants depends on the severity of the phenotype and sensitivity of the screening method, including sensitivity for mosaic variants. For 171 patients with multiple colorectal polyps without previously detectable pathogenic variant, APC was reanalyzed in leukocyte DNA by one uniform technique: high-resolution melting (HRM) analysis. Serial dilution of heterozygous DNA resulted in a lowest detectable allelic fraction of 6% for the majority of variants. HRM analysis and subsequent sequencing detected pathogenic fully heterozygous APC variants in 10 (6%) of the patients and pathogenic mosaic variants in 2 (1%). All these variants were previously missed by various conventional scanning methods. In parallel, HRM APC scanning was applied to DNA isolated from polyp tissue of two additional patients with apparently sporadic polyposis and without detectable pathogenic APC variant in leukocyte DNA. In both patients a pathogenic mosaic APC variant was present in multiple polyps. The detection of pathogenic APC variants in 7% of the patients, including mosaics, illustrates the usefulness of a complete APC gene reanalysis of previously tested patients, by a supplementary scanning method. HRM is a sensitive and fast pre-screening method for reliable detection of heterozygous and mosaic variants, which can be applied to leukocyte and polyp derived DNA.

  19. Detection of Emerging Vaccine-Related Polioviruses by Deep Sequencing.

    PubMed

    Sahoo, Malaya K; Holubar, Marisa; Huang, ChunHong; Mohamed-Hadley, Alisha; Liu, Yuanyuan; Waggoner, Jesse J; Troy, Stephanie B; Garcia-Garcia, Lourdes; Ferreyra-Reyes, Leticia; Maldonado, Yvonne; Pinsky, Benjamin A

    2017-07-01

    Oral poliovirus vaccine can mutate to regain neurovirulence. To date, evaluation of these mutations has been performed primarily on culture-enriched isolates by using conventional Sanger sequencing. We therefore developed a culture-independent, deep-sequencing method targeting the 5' untranslated region (UTR) and P1 genomic region to characterize vaccine-related poliovirus variants. Error analysis of the deep-sequencing method demonstrated reliable detection of poliovirus mutations at levels of <1%, depending on read depth. Sequencing of viral nucleic acids from the stool of vaccinated, asymptomatic children and their close contacts collected during a prospective cohort study in Veracruz, Mexico, revealed no vaccine-derived polioviruses. This was expected given that the longest duration between sequenced sample collection and the end of the most recent national immunization week was 66 days. However, we identified many low-level variants (<5%) distributed across the 5' UTR and P1 genomic region in all three Sabin serotypes, as well as vaccine-related viruses with multiple canonical mutations associated with phenotypic reversion present at high levels (>90%). These results suggest that monitoring emerging vaccine-related poliovirus variants by deep sequencing may aid in the poliovirus endgame and efforts to ensure global polio eradication. Copyright © 2017 Sahoo et al.

  20. Ultradeep Sequencing for Detection of Quasispecies Variants in the Major Hydrophilic Region of Hepatitis B Virus in Indonesian Patients

    PubMed Central

    Yamani, Laura Navika; Utsumi, Takako; Juniastuti; Wandono, Hadi; Widjanarko, Doddy; Triantanoe, Ari; Wasityastuti, Widya; Liang, Yujiao; Okada, Rina; Tanahashi, Toshihito; Murakami, Yoshiki; Azuma, Takeshi; Soetjipto; Lusida, Maria Inge; Hayashi, Yoshitake

    2015-01-01

    Quasispecies of hepatitis B virus (HBV) with variations in the major hydrophilic region (MHR) of the HBV surface antigen (HBsAg) can evolve during infection, allowing HBV to evade neutralizing antibodies. These escape variants may contribute to chronic infections. In this study, we looked for MHR variants in HBV quasispecies using ultradeep sequencing and evaluated the relationship between these variants and clinical manifestations in infected patients. We enrolled 30 Indonesian patients with hepatitis B infection (11 with chronic hepatitis and 19 with advanced liver disease). The most common subgenotype/subtype of HBV was B3/adw (97%). The HBsAg titer was lower in patients with advanced liver disease than in patients with chronic hepatitis. The MHR variants were grouped based on the percentage of the viral population affected: major, ≥20% of the total population; intermediate, 5% to <20%; and minor, 1% to <5%. The rates of MHR variation that were present in the major and intermediate viral populations were significantly greater in patients with advanced liver disease than in chronic patients. The most frequent MHR variants related to immune evasion in the major and intermediate populations were P120Q/T, T123A, P127T, Q129H/R, M133L/T, and G145R. The major population of MHR variants causing impaired HBsAg secretion (e.g., G119R, Q129R, T140I, and G145R) was detected only in advanced liver disease patients. This is the first study to use ultradeep sequencing for the detection of MHR variants of HBV quasispecies in Indonesian patients. We found that a greater number of MHR variations was related to disease severity and reduced likelihood of HBsAg titer. PMID:26202119
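
The frequency bins defined in the abstract above (major, ≥20%; intermediate, 5% to <20%; minor, 1% to <5%) translate directly into a small classifier. The sketch below is illustrative; the function name is hypothetical, and returning `None` below 1% is an assumption, since the abstract does not say how such variants were handled.

```python
def classify_mhr_variant(percent_of_population):
    """Bin a variant by the share of the viral population carrying it.

    Thresholds follow the abstract: major >= 20 %, intermediate 5 % to < 20 %,
    minor 1 % to < 5 %. Values below 1 % return None (an assumption).
    """
    if percent_of_population >= 20:
        return "major"
    if percent_of_population >= 5:
        return "intermediate"
    if percent_of_population >= 1:
        return "minor"
    return None

# Hypothetical frequencies, including the boundary values.
for frequency in (35.0, 20.0, 19.9, 5.0, 4.9, 1.0, 0.4):
    print(frequency, classify_mhr_variant(frequency))
```

Checking the thresholds from highest to lowest keeps each bin's lower bound inclusive and its upper bound exclusive, matching the "5% to <20%" style of the definitions.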

  1. Sequence Variation in the Small-Subunit rRNA Gene of Plasmodium malariae and Prevalence of Isolates with the Variant Sequence in Sichuan, China

    PubMed Central

    Liu, Qing; Zhu, Shenghua; Mizuno, Sahoko; Kimura, Masatsugu; Liu, Peina; Isomura, Shin; Wang, Xingzhen; Kawamoto, Fumihiko

    1998-01-01

    By two PCR-based diagnostic methods, Plasmodium malariae infections have been rediscovered at two foci in the Sichuan province of China, a region where no cases of P. malariae have been officially reported for the last 2 decades. In addition, a variant form of P. malariae which has a deletion of 19 bp and seven substitutions of base pairs in the target sequence of the small-subunit (SSU) rRNA gene was detected with high frequency. Alignment analysis of Plasmodium sp. SSU rRNA gene sequences revealed that the 5′ region of the variant sequence is identical to that of P. vivax or P. knowlesi and its 3′ region is identical to that of P. malariae. The same sequence variations were also found in P. malariae isolates collected along the Thai-Myanmar border, suggesting a wide distribution of this variant form from southern China to Southeast Asia. PMID:9774600

  2. Identification of pathogen genomic variants through an integrated pipeline

    PubMed Central

    2014-01-01

    Background Whole-genome sequencing represents a powerful experimental tool for pathogen research. We present methods for the analysis of small eukaryotic genomes, including a streamlined system (called Platypus) for finding single nucleotide and copy number variants as well as recombination events. Results We have validated our pipeline using four sets of Plasmodium falciparum drug resistant data containing 26 clones from 3D7 and Dd2 background strains, identifying an average of 11 single nucleotide variants per clone. We also identify 8 copy number variants with contributions to resistance, and report for the first time that all analyzed amplification events are in tandem. Conclusions The Platypus pipeline provides malaria researchers with a powerful tool to analyze short read sequencing data. It provides an accurate way to detect SNVs using known software packages, and a novel methodology for detection of CNVs, though it does not currently support detection of small indels. We have validated that the pipeline detects known SNVs in a variety of samples while filtering out spurious data. We bundle the methods into a freely available package. PMID:24589256

  3. Rare Variant Association Test with Multiple Phenotypes

    PubMed Central

    Lee, Selyeong; Won, Sungho; Kim, Young Jin; Kim, Yongkang; Kim, Bong-Jo; Park, Taesung

    2016-01-01

    Although genome-wide association studies (GWAS) have now discovered thousands of genetic variants associated with common traits, such variants cannot explain the large degree of “missing heritability,” likely due to rare variants. The advent of next generation sequencing technology has allowed rare variant detection and association with common traits, often by investigating specific genomic regions for rare variant effects on a trait. Although multiple correlated phenotypes are often concurrently observed in GWAS, most studies analyze only single phenotypes, which may lessen statistical power. To increase power, multivariate analyses, which consider correlations between multiple phenotypes, can be used. However, few existing multivariate analyses can identify rare variants for assessing multiple phenotypes. Here, we propose Multivariate Association Analysis using Score Statistics (MAAUSS), to identify rare variants associated with multiple phenotypes, based on the widely used Sequence Kernel Association Test (SKAT) for a single phenotype. We applied MAAUSS to Whole Exome Sequencing (WES) data from a Korean population of 1,058 subjects, to discover genes associated with multiple traits of liver function. We then assessed validation of those genes by a replication study, using an independent dataset of 3,445 individuals. Notably, we detected the gene ZNF620 among five significant genes. We then performed a simulation study to compare MAAUSS's performance with existing methods. Overall, MAAUSS successfully conserved type 1 error rates and, in many cases, had a higher power than the existing methods. This study illustrates a feasible and straightforward approach for identifying rare variants correlated with multiple phenotypes, with likely relevance to missing heritability. PMID:28039885

  4. Next-generation sequencing of the monogenic obesity genes LEP, LEPR, MC4R, PCSK1 and POMC in a Norwegian cohort of patients with morbid obesity and normal weight controls.

    PubMed

    Nordang, Gry B N; Busk, Øyvind L; Tveten, Kristian; Hanevik, Hans Ivar; Fell, Anne Kristin M; Hjelmesæth, Jøran; Holla, Øystein L; Hertel, Jens K

    2017-05-01

    Rare sequence variants in at least five genes are known to cause monogenic obesity. In this study we aimed to investigate the prevalence of, and characterize, rare coding and splice site variants in LEP, LEPR, MC4R, PCSK1 and POMC in patients with morbid obesity and normal weight controls. Targeted next-generation sequencing of all exons in LEP, LEPR, MC4R, PCSK1 and POMC was performed in 485 patients with morbid obesity and 327 normal weight population-based controls from Norway. In total 151 variants were detected. Twenty-eight (18.5%) of these were rare, coding or splice variants and five (3.3%) were novel. All individuals, except one control, were heterozygous for the 28 variants, and the distribution of the rare variants showed a significantly higher carrier frequency among cases than controls (9.9% vs. 4.9%, p=0.011). Four variants in MC4R were classified as pathogenic or likely pathogenic. Four cases (0.8%) of monogenic obesity were detected, all due to MC4R variants previously linked to monogenic obesity. Significant differences in carrier frequencies among patients with morbid obesity and normal weight controls suggest an association between heterozygous rare coding variants in these five genes and morbid obesity. However, additional studies in larger cohorts and functional testing of the novel variants identified are required to confirm the findings. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.
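
The carrier-frequency comparison above (9.9% vs. 4.9%) can be illustrated with a small worked example. The sketch below is not the authors' analysis: the abstract does not say which test produced p=0.011, and the carrier counts (48 of 485 cases, 16 of 327 controls) are back-calculated from the reported percentages, so they are assumptions. It uses a two-sided Fisher's exact test as one common choice for a 2x2 table.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability that k of the col1 "carriers" fall in row 1.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# Assumed counts, reconstructed from the reported 9.9% and 4.9% carrier frequencies.
case_carriers, case_total = 48, 485
control_carriers, control_total = 16, 327

p_value = fisher_exact_two_sided(
    case_carriers, case_total - case_carriers,
    control_carriers, control_total - control_carriers,
)
print(f"cases {100 * case_carriers / case_total:.1f}%, "
      f"controls {100 * control_carriers / control_total:.1f}%, p = {p_value:.4f}")
```

Because the counts are reconstructed and the original test is unknown, the printed p-value should only be expected to be of the same order as the one reported.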

  5. ICO amplicon NGS data analysis: a Web tool for variant detection in common high-risk hereditary cancer genes analyzed by amplicon GS Junior next-generation sequencing.

    PubMed

    Lopez-Doriga, Adriana; Feliubadaló, Lídia; Menéndez, Mireia; Lopez-Doriga, Sergio; Morón-Duran, Francisco D; del Valle, Jesús; Tornero, Eva; Montes, Eva; Cuesta, Raquel; Campos, Olga; Gómez, Carolina; Pineda, Marta; González, Sara; Moreno, Victor; Capellá, Gabriel; Lázaro, Conxi

    2014-03-01

    Next-generation sequencing (NGS) has revolutionized genomic research and is set to have a major impact on genetic diagnostics thanks to the advent of benchtop sequencers and flexible kits for targeted libraries. Among the main hurdles in NGS are the difficulty of performing bioinformatic analysis of the huge volume of data generated and the high number of false positive calls that could be obtained, depending on the NGS technology and the analysis pipeline. Here, we present the development of a free and user-friendly Web data analysis tool that detects and filters sequence variants, provides coverage information, and allows the user to customize some basic parameters. The tool has been developed to provide accurate genetic analysis of targeted sequencing of common high-risk hereditary cancer genes using amplicon libraries run in a GS Junior System. The Web resource is linked to our own mutation database, to assist in the clinical classification of identified variants. We believe that this tool will greatly facilitate the use of the NGS approach in routine laboratories.

  6. Mosaic CREBBP mutation causes overlapping clinical features of Rubinstein-Taybi and Filippi syndromes.

    PubMed

    de Vries, Tamar I; Monroe, Glen R; van Belzen, Martine J; van der Lans, Christian A; Savelberg, Sanne Mc; Newman, William G; van Haaften, Gijs; Nievelstein, Rutger A; van Haelst, Mieke M

    2016-08-01

    Rubinstein-Taybi syndrome (RTS, OMIM 180849) and Filippi syndrome (FLPIS, OMIM 272440) are both rare syndromes, with multiple congenital anomalies and intellectual deficit (MCA/ID). We present a patient with intellectual deficit, short stature, bilateral syndactyly of hands and feet, broad thumbs, ocular abnormalities, and dysmorphic facial features. These clinical features suggest both RTS and FLPIS. Initial DNA analysis of DNA isolated from blood did not identify variants to confirm either of these syndrome diagnoses. Whole-exome sequencing identified a homozygous variant in C9orf173, which was novel at the time of analysis. Further Sanger sequencing analysis of FLPIS cases tested negative for CKAP2L variants did not, however, reveal any further variants. Subsequent analysis using DNA isolated from buccal mucosa revealed a mosaic variant in CREBBP. This report highlights the importance of excluding mosaic variants in patients with a strong but atypical clinical presentation of a MCA/ID syndrome if no disease-causing variants can be detected in DNA isolated from blood samples. As the striking syndactyly observed in the present case is typical for FLPIS, we suggest CREBBP analysis in saliva samples for FLPIS syndrome cases in which no causal CKAP2L variant is detected.

  7. First detection of canine parvovirus type 2b from diarrheic dogs in Himachal Pradesh.

    PubMed

    Sharma, Shalini; Dhar, Prasenjit; Thakur, Aneesh; Sharma, Vivek; Sharma, Mandeep

    2016-09-01

    The present study was conducted to detect the presence of canine parvovirus (CPV) among diarrheic dogs in Himachal Pradesh and to identify the most prevalent antigenic variant of CPV based on molecular typing and sequence analysis of VP2 gene. A total of 102 fecal samples were collected from clinical cases of diarrhea or hemorrhagic gastroenteritis from CPV vaccinated or non-vaccinated dogs. Samples were tested using CPV-specific polymerase chain reaction (PCR) targeting VP2 gene, multiplex PCR for detection of CPV-2a and CPV-2b antigenic variants, and a PCR for the detection of CPV-2c. CPV-2b isolate was cultured on Madin-Darby canine kidney (MDCK) cell lines and sequenced using VP2 structural protein gene. Multiple alignment and phylogenetic analysis were done using ClustalW and MEGA6, with the phylogeny inferred using the Neighbor-Joining method. No sample was found positive for the original CPV strain usually present in the vaccine. However, about 50% (52 out of 102) of the samples were found to be positive with CPV-2ab PCR assay that detects newer variants of CPV circulating in the field. In addition, multiplex PCR assay that identifies both CPV-2ab and CPV-2b revealed that CPV-2b was the major antigenic variant present in the affected dogs. A PCR positive isolate of CPV-2b was adapted to grow in MDCK cells and produced characteristic cytopathic effect after 5th passage. Multiple sequence alignment showed that the VP2 structural gene of the CPV-2b isolate (Accession number HG004610) used in the study was similar to other sequenced isolates in the NCBI sequence database, with 98-99% homology. This study reports the first detection of CPV-2b in dogs with hemorrhagic gastroenteritis in Himachal Pradesh and absence of other antigenic types of CPV. Further, CPV-specific PCR assay can be used for rapid confirmation of circulating virus strains under field conditions.

  8. A near full-length open reading frame next generation sequencing assay for genotyping and identification of resistance-associated variants in hepatitis C virus.

    PubMed

    Pedersen, M S; Fahnøe, U; Hansen, T A; Pedersen, A G; Jenssen, H; Bukh, J; Schønning, K

    2018-06-01

    The current treatment options for hepatitis C virus (HCV), based on direct acting antivirals (DAA), are dependent on virus genotype and previous treatment experience. Treatment failures have been associated with detection of resistance-associated substitutions (RASs) in the DAA targets of HCV, the NS3, NS5A and NS5B proteins. The aim was to develop a next generation sequencing based method that provides genotype and detection of HCV NS3, NS5A, and NS5B RASs without prior knowledge of sample genotype. In total, 101 residual plasma samples from patients with HCV covering 10 different viral subtypes across 4 genotypes with viral loads of 3.84-7.61 Log IU/mL were included. All samples were de-identified and consequently prior treatment status for patients was unknown. Almost full open reading frame amplicons (∼ 9 kb) were generated using RT-PCR with a single primer set. The resulting amplicons were sequenced with high throughput sequencing and analysed using an in-house developed script for detecting RASs. The method successfully amplified and sequenced 94% (95/101) of samples with an average coverage of 14,035; four of six failed samples were genotype 4a. Samples analysed twice yielded reproducible nucleotide frequencies across all sites. RASs were detected in 21/95 (22%) samples at a 15% threshold. The method identified one patient infected with two genotype 2b variants, and the presence of subgenomic deletion variants in 8 (8.4%) of 95 successfully sequenced samples. The presented method may provide identification of HCV genotype, RAS detection, and detection of multiple HCV infections without prior knowledge of sample genotype. Copyright © 2018 Elsevier B.V. All rights reserved.
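
To make the "15% threshold" in the abstract above concrete, here is a minimal sketch of frequency-threshold calling. It is not the authors' in-house script, which the abstract does not describe; the function name, the site labels, the reference residues, and the read counts are all made up for illustration.

```python
def call_substitutions(counts_by_site, reference, threshold=0.15):
    """Report non-reference residues whose within-sample frequency meets the threshold.

    counts_by_site: site label -> {residue: read count}   (hypothetical input layout)
    reference:      site label -> reference residue
    """
    calls = []
    for site, counts in sorted(counts_by_site.items()):
        depth = sum(counts.values())
        if depth == 0:
            continue  # no coverage, nothing to call
        for residue, n in sorted(counts.items()):
            freq = n / depth
            if residue != reference[site] and freq >= threshold:
                calls.append((site, residue, round(freq, 4)))
    return calls


# Invented example data: one substitution at 20% and one at 5%.
reference = {"NS5A:93": "Y", "NS3:80": "Q"}
counts_by_site = {
    "NS5A:93": {"Y": 8000, "H": 2000},
    "NS3:80": {"Q": 9500, "K": 500},
}
calls = call_substitutions(counts_by_site, reference)
print(calls)
```

Only the substitution at 20% frequency passes the 15% cutoff; the 5% substitution is left uncalled.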

  9. MToolBox: a highly automated pipeline for heteroplasmy annotation and prioritization analysis of human mitochondrial variants in high-throughput sequencing

    PubMed Central

    Diroma, Maria Angela; Santorsola, Mariangela; Guttà, Cristiano; Gasparre, Giuseppe; Picardi, Ernesto; Pesole, Graziano; Attimonelli, Marcella

    2014-01-01

    Motivation: The increasing availability of mitochondria-targeted and off-target sequencing data in whole-exome and whole-genome sequencing studies (WXS and WGS) has raised the demand for effective pipelines to accurately measure heteroplasmy and to easily recognize the most functionally important mitochondrial variants among a huge number of candidates. To this purpose, we developed MToolBox, a highly automated pipeline to reconstruct and analyze human mitochondrial DNA from high-throughput sequencing data. Results: MToolBox implements an effective computational strategy for mitochondrial genome assembly and haplogroup assignment, also including a prioritization analysis of detected variants. MToolBox provides a Variant Call Format file featuring, for the first time, allele-specific heteroplasmy and annotation files with prioritized variants. MToolBox was tested on simulated samples and applied on 1000 Genomes WXS datasets. Availability and implementation: MToolBox package is available at https://sourceforge.net/projects/mtoolbox/. Contact: marcella.attimonelli@uniba.it Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25028726

  10. High depth, whole-genome sequencing of cholera isolates from Haiti and the Dominican Republic.

    PubMed

    Sealfon, Rachel; Gire, Stephen; Ellis, Crystal; Calderwood, Stephen; Qadri, Firdausi; Hensley, Lisa; Kellis, Manolis; Ryan, Edward T; LaRocque, Regina C; Harris, Jason B; Sabeti, Pardis C

    2012-09-11

    Whole-genome sequencing is an important tool for understanding microbial evolution and identifying the emergence of functionally important variants over the course of epidemics. In October 2010, a severe cholera epidemic began in Haiti, with additional cases identified in the neighboring Dominican Republic. We used whole-genome approaches to sequence four Vibrio cholerae isolates from Haiti and the Dominican Republic and three additional V. cholerae isolates to a high depth of coverage (>2000x); four of the seven isolates were previously sequenced. Using these sequence data, we examined the effect of depth of coverage and sequencing platform on genome assembly and identification of sequence variants. We found that 50x coverage is sufficient to construct a whole-genome assembly and to accurately call most variants from 100 base pair paired-end sequencing reads. Phylogenetic analysis between the newly sequenced and thirty-three previously sequenced V. cholerae isolates indicates that the Haitian and Dominican Republic isolates are closest to strains from South Asia. The Haitian and Dominican Republic isolates form a tight cluster, with only four variants unique to individual isolates. These variants are located in the CTX region, the SXT region, and the core genome. Of the 126 mutations identified that separate the Haiti-Dominican Republic cluster from the V. cholerae reference strain (N16961), 73 are non-synonymous changes, and a number of these changes cluster in specific genes and pathways. Sequence variant analyses of V. cholerae isolates, including multiple isolates from the Haitian outbreak, identify coverage-specific and technology-specific effects on variant detection, and provide insight into genomic change and functional evolution during an epidemic.
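
The coverage figures above follow the usual relation mean depth = (number of reads × read length) / genome size. The sketch below works this through for the 50x and 100 bp values from the abstract; the genome size of 4,000,000 bp is a round-number assumption for V. cholerae, not a figure from the abstract, and the function names are illustrative.

```python
import math


def mean_depth(n_reads, read_length_bp, genome_size_bp):
    """Expected mean depth of coverage, ignoring unmapped and duplicate reads."""
    return n_reads * read_length_bp / genome_size_bp


def reads_for_depth(target_depth, read_length_bp, genome_size_bp):
    """Smallest number of reads whose expected mean depth reaches the target."""
    return math.ceil(target_depth * genome_size_bp / read_length_bp)


GENOME_SIZE_BP = 4_000_000  # assumed approximate genome size
READ_LENGTH_BP = 100        # read length stated in the abstract

reads_50x = reads_for_depth(50, READ_LENGTH_BP, GENOME_SIZE_BP)
print(reads_50x, "reads, i.e.", reads_50x // 2, "read pairs")
print(mean_depth(reads_50x, READ_LENGTH_BP, GENOME_SIZE_BP), "x expected mean depth")
```

Real runs need some margin on top of this, since not every read maps and coverage is never perfectly even.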

  11. A statistical approach to detection of copy number variations in PCR-enriched targeted sequencing data.

    PubMed

    Demidov, German; Simakova, Tamara; Vnuchkova, Julia; Bragin, Anton

    2016-10-22

    Multiplex polymerase chain reaction (PCR) is a common enrichment technique for targeted massive parallel sequencing (MPS) protocols. MPS is widely used in biomedical research and clinical diagnostics as a fast and accurate tool for the detection of short genetic variations. However, identification of larger variations such as structural variants and copy number variations (CNV) remains a challenge for targeted MPS. Some approaches and tools for structural variant detection have been proposed, but they have limitations and often require datasets of a certain type, size and expected number of amplicons affected by CNVs. In the paper, we describe a novel algorithm for high-resolution germinal CNV detection in PCR-enriched targeted sequencing data and present an accompanying tool. We have developed a machine learning algorithm for the detection of large duplications and deletions in targeted sequencing data generated with a PCR-based enrichment step. We have performed verification studies and established the algorithm's sensitivity and specificity. We have compared the developed tool with other available methods applicable to the described data and revealed its higher performance. We showed that our method has high specificity and sensitivity for high-resolution copy number detection in targeted sequencing data using a large cohort of samples.

  12. Common and rare variants associated with kidney stones and biochemical traits

    PubMed Central

    Oddsson, Asmundur; Sulem, Patrick; Helgason, Hannes; Edvardsson, Vidar O.; Thorleifsson, Gudmar; Sveinbjörnsson, Gardar; Haraldsdottir, Eik; Eyjolfsson, Gudmundur I.; Sigurdardottir, Olof; Olafsson, Isleifur; Masson, Gisli; Holm, Hilma; Gudbjartsson, Daniel F.; Thorsteinsdottir, Unnur; Indridason, Olafur S.; Palsson, Runolfur; Stefansson, Kari

    2015-01-01

    Kidney stone disease is a complex disorder with a strong genetic component. We conducted a genome-wide association study of 28.3 million sequence variants detected through whole-genome sequencing of 2,636 Icelanders that were imputed into 5,419 kidney stone cases, including 2,172 cases with a history of recurrent kidney stones, and 279,870 controls. We identify sequence variants associating with kidney stones at ALPL (rs1256328[T], odds ratio (OR)=1.21, P=5.8 × 10^-10) and a suggestive association at CASR (rs7627468[A], OR=1.16, P=2.0 × 10^-8). Focusing our analysis on coding sequence variants in 63 genes with preferential kidney expression we identify two rare missense variants SLC34A1 p.Tyr489Cys (OR=2.38, P=2.8 × 10^-5) and TRPV5 p.Leu530Arg (OR=3.62, P=4.1 × 10^-5) associating with recurrent kidney stones. We also observe associations of the identified kidney stone variants with biochemical traits in a large population set, indicating potential biological mechanism. PMID:26272126

  13. Common and rare variants associated with kidney stones and biochemical traits.

    PubMed

    Oddsson, Asmundur; Sulem, Patrick; Helgason, Hannes; Edvardsson, Vidar O; Thorleifsson, Gudmar; Sveinbjörnsson, Gardar; Haraldsdottir, Eik; Eyjolfsson, Gudmundur I; Sigurdardottir, Olof; Olafsson, Isleifur; Masson, Gisli; Holm, Hilma; Gudbjartsson, Daniel F; Thorsteinsdottir, Unnur; Indridason, Olafur S; Palsson, Runolfur; Stefansson, Kari

    2015-08-14

    Kidney stone disease is a complex disorder with a strong genetic component. We conducted a genome-wide association study of 28.3 million sequence variants detected through whole-genome sequencing of 2,636 Icelanders that were imputed into 5,419 kidney stone cases, including 2,172 cases with a history of recurrent kidney stones, and 279,870 controls. We identify sequence variants associating with kidney stones at ALPL (rs1256328[T], odds ratio (OR)=1.21, P=5.8 × 10(-10)) and a suggestive association at CASR (rs7627468[A], OR=1.16, P=2.0 × 10(-8)). Focusing our analysis on coding sequence variants in 63 genes with preferential kidney expression we identify two rare missense variants SLC34A1 p.Tyr489Cys (OR=2.38, P=2.8 × 10(-5)) and TRPV5 p.Leu530Arg (OR=3.62, P=4.1 × 10(-5)) associating with recurrent kidney stones. We also observe associations of the identified kidney stone variants with biochemical traits in a large population set, indicating potential biological mechanism.

  14. From days to hours: reporting clinically actionable variants from whole genome sequencing.

    PubMed

    Middha, Sumit; Baheti, Saurabh; Hart, Steven N; Kocher, Jean-Pierre A

    2014-01-01

    As the cost of whole genome sequencing (WGS) decreases, clinical laboratories will be looking at broadly adopting this technology to screen for variants of clinical significance. To fully leverage this technology in a clinical setting, results need to be reported quickly, as the turnaround rate could potentially impact patient care. The latest sequencers can sequence a whole human genome in about 24 hours. However, depending on the computing infrastructure available, the processing of data can take several days, with the majority of computing time devoted to aligning reads to genomic regions that are to date not clinically interpretable. In an attempt to accelerate the reporting of clinically actionable variants, we have investigated the utility of a multi-step alignment algorithm focused on aligning reads and calling variants in genomic regions of clinical relevance prior to processing the remaining reads on the whole genome. This iterative workflow significantly accelerates the reporting of clinically actionable variants with no loss of accuracy when compared to genotypes obtained with the OMNI SNP platform or to variants detected with a standard workflow that combines Novoalign and GATK.

  15. AmpliVar: mutation detection in high-throughput sequence from amplicon-based libraries.

    PubMed

    Hsu, Arthur L; Kondrashova, Olga; Lunke, Sebastian; Love, Clare J; Meldrum, Cliff; Marquis-Nicholson, Renate; Corboy, Greg; Pham, Kym; Wakefield, Matthew; Waring, Paul M; Taylor, Graham R

    2015-04-01

    Conventional means of identifying variants in high-throughput sequencing align each read against a reference sequence, and then call variants at each position. Here, we demonstrate an orthogonal means of identifying sequence variation by grouping the reads as amplicons prior to any alignment. We used AmpliVar to make key-value hashes of sequence reads and group reads as individual amplicons using a table of flanking sequences. Low-abundance reads were removed according to a selectable threshold, and reads above this threshold were aligned as groups, rather than as individual reads, permitting the use of sensitive alignment tools. We show that this approach is more sensitive, more specific, and more computationally efficient than comparable methods for the analysis of amplicon-based high-throughput sequencing data. The method can be extended to enable alignment-free confirmation of variants seen in hybridization capture target-enrichment data. © 2015 WILEY PERIODICALS, INC.
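
The grouping idea described above (hash identical reads, assign them to amplicons by flanking sequence, drop low-abundance reads) can be sketched as follows. This is an illustration of the general idea only, not AmpliVar's actual code; the function name, the prefix/suffix matching rule, the `min_count` default, and the toy sequences are all assumptions.

```python
from collections import Counter, defaultdict


def group_reads_by_amplicon(reads, flanks, min_count=2):
    """Collapse identical reads, then bin them by amplicon using flanking sequences.

    reads:  iterable of read sequences
    flanks: amplicon name -> (left flank, right flank)   (hypothetical table layout)
    """
    read_counts = Counter(reads)  # key-value hash: sequence -> number of occurrences
    groups = defaultdict(dict)
    for seq, n in read_counts.items():
        if n < min_count:
            continue  # low-abundance read, treated as noise in this sketch
        for name, (left, right) in flanks.items():
            if seq.startswith(left) and seq.endswith(right):
                groups[name][seq] = n
                break
    return dict(groups)


# Toy data: two amplicons, one variant read group, one singleton that gets dropped.
flanks = {"amp1": ("ACGT", "GGCA"), "amp2": ("TTGA", "CGAT")}
reads = (["ACGTTTGGCA"] * 5 + ["ACGTATGGCA"] * 3
         + ["ACGTCTGGCA"] + ["TTGACCCGAT"] * 4)
groups = group_reads_by_amplicon(reads, flanks)
print(groups)
```

Each surviving group holds a handful of distinct sequences with counts, so a downstream aligner would only need to handle one representative per group rather than every read.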

  16. Carbapenem-Resistant Acinetobacter baumannii from Serbia: Revision of CarO Classification

    PubMed Central

    Novovic, Katarina; Mihajlovic, Sanja; Vasiljevic, Zorica; Filipic, Brankica; Begovic, Jelena; Jovcic, Branko

    2015-01-01

    Carbapenem-resistant A. baumannii present a significant therapeutic challenge for the treatment of nosocomial infections in many European countries. Although it is known that the gradient of A. baumannii prevalence increases from northern to southern Europe, this study provides the first data from Serbia. Twenty-eight carbapenem-resistant A. baumannii clinical isolates were collected at a Serbian pediatric hospital during a 2-year period. The majority of isolates (67.68%) belonged to the sequence type Group 1, European clonal complex II. All isolates harbored intrinsic OXA-51 and AmpC cephalosporinase. OXA-23 was detected in 16 isolates (57.14%), OXA-24 in 23 isolates (82.14%) and OXA-58 in 11 isolates (39.29%). Six of the isolates (21.43%) harbored all of the analyzed oxacillinases, except OXA-143 and OXA-235 that were not detected in this study. Production of oxacillinases was detected in different pulsotypes indicating the presence of horizontal gene transfer. NDM-1, VIM and IMP were not detected in analyzed clinical A. baumannii isolates. ISAba1 insertion sequence was present upstream of OXA-51 in one isolate, upstream of AmpC in 13 isolates and upstream of OXA-23 in 10 isolates. In silico analysis of carO sequences from analyzed A. baumannii isolates revealed the existence of two out of six highly polymorphic CarO variants. The phylogenetic analysis of CarO protein among Acinetobacter species revised the previous classification of CarO variants into three groups based on strong bootstrap scores in the tree analysis. Group I comprises four variants (I-IV) while Groups II and III contain only one variant each. One half of the Serbian clinical isolates belong to Group I variant I, while the other half belongs to Group I variant III. PMID:25822626

  17. An Integrated Tool to Study MHC Region: Accurate SNV Detection and HLA Genes Typing in Human MHC Region Using Targeted High-Throughput Sequencing

    PubMed Central

    Liu, Xiao; Xu, Yinyin; Liang, Dequan; Gao, Peng; Sun, Yepeng; Gifford, Benjamin; D’Ascenzo, Mark; Liu, Xiaomin; Tellier, Laurent C. A. M.; Yang, Fang; Tong, Xin; Chen, Dan; Zheng, Jing; Li, Weiyang; Richmond, Todd; Xu, Xun; Wang, Jun; Li, Yingrui

    2013-01-01

    The major histocompatibility complex (MHC) is one of the most variable and gene-dense regions of the human genome. Most studies of the MHC, and associated regions, focus on minor variants and HLA typing, many of which have been demonstrated to be associated with human disease susceptibility and metabolic pathways. However, the detection of variants in the MHC region, and diagnostic HLA typing, still lacks a coherent, standardized, cost effective and high coverage protocol of clinical quality and reliability. In this paper, we presented such a method for the accurate detection of minor variants and HLA types in the human MHC region, using high-throughput, high-coverage sequencing of target regions. A probe set was designed to template upon the 8 annotated human MHC haplotypes, and to encompass the 5 megabases (Mb) of the extended MHC region. We deployed our probes upon three genetically diverse human samples for probe set evaluation, and sequencing data show that ∼97% of the MHC region, and over 99% of the genes in MHC region, are covered with sufficient depth and good evenness. 98% of genotypes called by this capture sequencing prove consistent with established HapMap genotypes. We have concurrently developed a one-step pipeline for calling any HLA type referenced in the IMGT/HLA database from this target capture sequencing data, which shows over 96% typing accuracy when deployed at 4-digit resolution. This cost-effective and highly accurate approach for variant detection and HLA typing in the MHC region may lend further insight into immune-mediated disease studies, and may find clinical utility in transplantation medicine research. This one-step pipeline is released for general evaluation and use by the scientific community. PMID:23894464

  18. Simple, multiplexed, PCR-based barcoding of DNA enables sensitive mutation detection in liquid biopsies using sequencing.

    PubMed

    Ståhlberg, Anders; Krzyzanowski, Paul M; Jackson, Jennifer B; Egyud, Matthew; Stein, Lincoln; Godfrey, Tony E

    2016-06-20

    Detection of cell-free DNA in liquid biopsies offers great potential for use in non-invasive prenatal testing and as a cancer biomarker. Fetal and tumor DNA fractions however can be extremely low in these samples and ultra-sensitive methods are required for their detection. Here, we report an extremely simple and fast method for introduction of barcodes into DNA libraries made from 5 ng of DNA. Barcoded adapter primers are designed with an oligonucleotide hairpin structure to protect the molecular barcodes during the first rounds of polymerase chain reaction (PCR) and prevent them from participating in mis-priming events. Our approach enables high-level multiplexing and next-generation sequencing library construction with flexible library content. We show that uniform libraries of 1-, 5-, 13- and 31-plex can be generated. Utilizing the barcodes to generate consensus reads for each original DNA molecule reduces background sequencing noise and allows detection of variant alleles below 0.1% frequency in clonal cell line DNA and in cell-free plasma DNA. Thus, our approach bridges the gap between the highly sensitive but specific capabilities of digital PCR, which only allows a limited number of variants to be analyzed, with the broad target capability of next-generation sequencing which traditionally lacks the sensitivity to detect rare variants. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
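
The consensus step mentioned above, where reads sharing a molecular barcode are merged so that isolated sequencing errors are voted out, can be sketched as below. This is a generic majority-vote illustration, not the authors' pipeline; the function name, the minimum family size of 3, and the toy reads are assumptions.

```python
from collections import Counter, defaultdict


def consensus_by_barcode(tagged_reads, min_family_size=3):
    """Build one majority-vote consensus sequence per molecular barcode.

    tagged_reads: iterable of (barcode, sequence) pairs   (hypothetical input layout)
    """
    families = defaultdict(list)
    for barcode, seq in tagged_reads:
        families[barcode].append(seq)

    consensus = {}
    for barcode, seqs in families.items():
        if len(seqs) < min_family_size:
            continue  # too few reads to out-vote an error
        if len({len(s) for s in seqs}) != 1:
            continue  # sketch only handles equal-length reads
        # Take the most common base in each column of the read family.
        consensus[barcode] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*seqs)
        )
    return consensus


# Toy data: family AAT contains one read with a single-base error (G -> T).
tagged_reads = [
    ("AAT", "ACGT"), ("AAT", "ACGT"), ("AAT", "ACTT"),
    ("GGC", "ACGA"), ("GGC", "ACGA"), ("GGC", "ACGA"),
    ("TTT", "ACGT"),
]
consensus = consensus_by_barcode(tagged_reads)
print(consensus)
```

In the toy data the error in family AAT disappears from the consensus, while the change shared by every read in family GGC is kept, which is the property that lets barcoding separate real low-frequency variants from sequencing noise.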

  19. A Protein Domain and Family Based Approach to Rare Variant Association Analysis.

    PubMed

    Richardson, Tom G; Shihab, Hashem A; Rivas, Manuel A; McCarthy, Mark I; Campbell, Colin; Timpson, Nicholas J; Gaunt, Tom R

    2016-01-01

    It has become common practice to analyse large scale sequencing data with statistical approaches based around the aggregation of rare variants within the same gene. We applied a novel approach to rare variant analysis by collapsing variants together using protein domain and family coordinates, regarded to be a more discrete definition of a biologically functional unit. Using Pfam definitions, we collapsed rare variants (Minor Allele Frequency ≤ 1%) together in three different ways: 1) variants within single genomic regions which map to individual protein domains; 2) variants within two individual protein domain regions which are predicted to be responsible for a protein-protein interaction; 3) all variants within combined regions from multiple genes responsible for coding the same protein domain (i.e. protein families). A conventional collapsing analysis using gene coordinates was also undertaken for comparison. We used UK10K sequence data and investigated associations between regions of variants and lipid traits using the sequence kernel association test (SKAT). We observed no strong evidence of association between regions of variants based on Pfam domain definitions and lipid traits. Quantile-Quantile plots illustrated that the overall distributions of p-values from the protein domain analyses were comparable to that of a conventional gene-based approach. Deviations from this distribution suggested that collapsing by either protein domain or gene definitions may be favourable depending on the trait analysed. We have collapsed rare variants together using protein domain and family coordinates to present an alternative approach over collapsing across conventionally used gene-based regions. Although no strong evidence of association was detected in these analyses, future studies may still find value in adopting these approaches to detect previously unidentified association signals.
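
The grouping step described above, selecting variants with minor allele frequency ≤ 1% and collapsing them by region coordinates, can be sketched as follows. Note the substitution: the study tested regions with SKAT, whereas this sketch computes a simple per-sample burden count purely to show how variants are assigned to regions. All names, coordinates, and genotypes are invented.

```python
def collapse_rare_variants(variants, regions, genotypes, maf_cutoff=0.01):
    """Sum minor-allele counts of rare variants per region and per sample.

    variants:  list of (variant id, position, minor allele frequency)
    regions:   region name -> (start, end), inclusive coordinates
    genotypes: variant id -> list of minor-allele counts, one per sample
    """
    n_samples = len(next(iter(genotypes.values())))
    burden = {}
    for name, (start, end) in regions.items():
        scores = [0] * n_samples
        for variant_id, position, maf in variants:
            if maf <= maf_cutoff and start <= position <= end:
                for i, allele_count in enumerate(genotypes[variant_id]):
                    scores[i] += allele_count
        burden[name] = scores
    return burden


# Invented example: v2 is common (MAF 20%) and is therefore excluded.
variants = [("v1", 105, 0.004), ("v2", 150, 0.20), ("v3", 180, 0.009), ("v4", 320, 0.002)]
regions = {"domainA": (100, 200), "domainB": (300, 400)}
genotypes = {"v1": [0, 1, 0], "v2": [1, 2, 1], "v3": [1, 0, 0], "v4": [0, 0, 1]}
burden = collapse_rare_variants(variants, regions, genotypes)
print(burden)
```

Swapping the `regions` table between gene coordinates and domain coordinates is all that changes between a conventional gene-based collapse and the domain-based variants described in the abstract.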

  20. Validation and optimization of the Ion Torrent S5 XL sequencer and Oncomine workflow for BRCA1 and BRCA2 genetic testing.

    PubMed

    Shin, Saeam; Kim, Yoonjung; Chul Oh, Seoung; Yu, Nae; Lee, Seung-Tae; Rak Choi, Jong; Lee, Kyung-A

    2017-05-23

    In this study, we validated the analytical performance of BRCA1/2 sequencing using Ion Torrent's new bench-top sequencer with amplicon panel with optimized bioinformatics pipelines. Using 43 samples that were previously validated by Illumina's MiSeq platform and/or by Sanger sequencing/multiplex ligation-dependent probe amplification, we amplified the target with the Oncomine™ BRCA Research Assay and sequenced on Ion Torrent S5 XL (Thermo Fisher Scientific, Waltham, MA, USA). We compared two bioinformatics pipelines for optimal processing of S5 XL sequence data: the Torrent Suite with a plug-in Torrent Variant Caller (Thermo Fisher Scientific), and commercial NextGENe software (Softgenetics, State College, PA, USA). All expected 681 single nucleotide variants, 15 small indels, and three copy number variants were correctly called, except one common variant adjacent to a rare variant on the primer-binding site. The sensitivity, specificity, false positive rate, and accuracy for detection of single nucleotide variant and small indels of S5 XL sequencing were 99.85%, 100%, 0%, and 99.99% for the Torrent Variant Caller and 99.85%, 99.99%, 0.14%, and 99.99% for NextGENe, respectively. The reproducibility of variant calling was 100%, and the precision of variant frequency also showed good performance with coefficients of variation between 0.32 and 5.29%. We obtained highly accurate data through uniform and sufficient coverage depth over all target regions and through optimization of the bioinformatics pipeline. We confirmed that our platform is accurate and practical for diagnostic BRCA1/2 testing in a clinical laboratory.
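
The performance figures above are standard confusion-matrix ratios. The sketch below shows how they are computed. The true-positive and false-negative counts are inferred from the abstract (681 SNVs plus 15 small indels expected, one missed), while the true-negative count is a made-up placeholder, since the abstract does not report the number of reference positions assessed.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard confusion-matrix summaries for a variant-calling validation."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "false_positive_rate": fp / (fp + tn),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# tp/fn inferred from the abstract; tn is a hypothetical placeholder.
metrics = diagnostic_metrics(tp=695, fp=0, tn=10_000, fn=1)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.2f}%")
```

With one missed call out of 696 the sensitivity lands close to the reported value; the accuracy depends on the placeholder true-negative count and should not be compared with the published figure.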

  1. QQ-SNV: single nucleotide variant detection at low frequency by comparing the quality quantiles.

    PubMed

    Van der Borght, Koen; Thys, Kim; Wetzels, Yves; Clement, Lieven; Verbist, Bie; Reumers, Joke; van Vlijmen, Herman; Aerssens, Jeroen

    2015-11-10

    Next generation sequencing enables studying heterogeneous populations of viral infections. When the sequencing is done at high coverage depth ("deep sequencing"), low frequency variants can be detected. Here we present QQ-SNV (http://sourceforge.net/projects/qqsnv), a logistic regression classifier model developed for the Illumina sequencing platforms that uses the quantiles of the quality scores, to distinguish true single nucleotide variants from sequencing errors based on the estimated SNV probability. To train the model, we created a dataset of an in silico mixture of five HIV-1 plasmids. Testing of our method in comparison to the existing methods LoFreq, ShoRAH, and V-Phaser 2 was performed on two HIV and four HCV plasmid mixture datasets and one influenza H1N1 clinical dataset. For default application of QQ-SNV, variants were called using a SNV probability cutoff of 0.5 (QQ-SNV(D)). To improve the sensitivity we used a SNV probability cutoff of 0.0001 (QQ-SNV(HS)). To also increase specificity, SNVs called were overruled when their frequency was below the 80th percentile calculated on the distribution of error frequencies (QQ-SNV(HS-P80)). When comparing QQ-SNV versus the other methods on the plasmid mixture test sets, QQ-SNV(D) performed similarly to the existing approaches. QQ-SNV(HS) was more sensitive on all test sets but with more false positives. QQ-SNV(HS-P80) was found to be the most accurate method over all test sets by balancing sensitivity and specificity. When applied to a paired-end HCV sequencing study, with lowest spiked-in true frequency of 0.5%, QQ-SNV(HS-P80) revealed a sensitivity of 100% (vs. 40-60% for the existing methods) and a specificity of 100% (vs. 98.0-99.7% for the existing methods). In addition, QQ-SNV required the least overall computation time to process the test sets. Finally, when testing on a clinical sample, four putative true variants with frequency below 0.5% were consistently detected by QQ-SNV(HS-P80) from different generations of Illumina sequencers. We developed and successfully evaluated a novel method, called QQ-SNV, for highly efficient single nucleotide variant calling on Illumina deep sequencing virology data.
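
    The three calling modes described in this abstract (D, HS, HS-P80) can be sketched as a simple decision rule. This is only an illustration of the thresholds quoted above, not the QQ-SNV implementation: the function names, the inclusive comparisons, the interpolation used for the percentile, and the way the error-frequency distribution is supplied are all assumptions.

```python
from typing import List


def percentile(values: List[float], q: float) -> float:
    """Percentile (q in [0, 100]) of a non-empty list, by linear interpolation."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def call_snv(prob: float, freq: float, error_freqs: List[float], mode: str = "D") -> bool:
    """Decide whether a candidate SNV is called.

    prob        -- estimated SNV probability from the classifier (hypothetical input)
    freq        -- observed variant frequency
    error_freqs -- observed error frequencies (hypothetical input; the abstract
                   does not say how this distribution is built)
    mode        -- "D" (cutoff 0.5), "HS" (cutoff 0.0001), or "HS-P80"
    """
    if mode == "D":
        return prob >= 0.5
    if mode == "HS":
        return prob >= 0.0001
    if mode == "HS-P80":
        # HS call, overruled when the frequency is below the 80th percentile
        # of the error-frequency distribution.
        return prob >= 0.0001 and freq >= percentile(error_freqs, 80)
    raise ValueError(f"unknown mode: {mode}")
```

    With this rule, a candidate with a probability of 0.3 is rejected in mode D, accepted in mode HS, and accepted in mode HS-P80 only if its frequency reaches the 80th percentile of the error frequencies.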

  2. A low density microarray method for the identification of human papillomavirus type 18 variants.

    PubMed

    Meza-Menchaca, Thuluz; Williams, John; Rodríguez-Estrada, Rocío B; García-Bravo, Aracely; Ramos-Ligonio, Ángel; López-Monteon, Aracely; Zepeda, Rossana C

    2013-09-26

    We describe a novel microarray-based method for the screening of oncogenic human papillomavirus 18 (HPV-18) molecular variants. Because sequencing methodology may underestimate samples containing more than one variant, we designed a specific and sensitive stacking DNA hybridization assay. This technology can be used to discriminate between three possible phylogenetic branches of HPV-18. Probes were attached covalently on glass slides and hybridized with single-stranded DNA targets. Prior to hybridization with the probes, the target strands were pre-annealed with the three auxiliary contiguous oligonucleotides flanking the target sequences. HPV-18-positive cell lines and cervical samples were used to evaluate the performance of this HPV DNA microarray. Our results demonstrate that the HPV-18 variants hybridized specifically to probes, with no detection of unspecific signals. Specific probes successfully reveal detectable point mutations in these variants. The present DNA oligoarray system can be used as a reliable, sensitive and specific method for HPV-18 variant screening. Furthermore, this simple assay allows the use of inexpensive equipment, making it accessible in resource-poor settings.

  3. A Low Density Microarray Method for the Identification of Human Papillomavirus Type 18 Variants

    PubMed Central

    Meza-Menchaca, Thuluz; Williams, John; Rodríguez-Estrada, Rocío B.; García-Bravo, Aracely; Ramos-Ligonio, Ángel; López-Monteon, Aracely; Zepeda, Rossana C.

    2013-01-01

    We describe a novel microarray-based method for the screening of oncogenic human papillomavirus 18 (HPV-18) molecular variants. Because sequencing methodology may underestimate samples containing more than one variant, we designed a specific and sensitive stacking DNA hybridization assay. This technology can be used to discriminate between three possible phylogenetic branches of HPV-18. Probes were attached covalently on glass slides and hybridized with single-stranded DNA targets. Prior to hybridization with the probes, the target strands were pre-annealed with the three auxiliary contiguous oligonucleotides flanking the target sequences. HPV-18-positive cell lines and cervical samples were used to evaluate the performance of this HPV DNA microarray. Our results demonstrate that the HPV-18 variants hybridized specifically to probes, with no detection of unspecific signals. Specific probes successfully reveal detectable point mutations in these variants. The present DNA oligoarray system can be used as a reliable, sensitive and specific method for HPV-18 variant screening. Furthermore, this simple assay allows the use of inexpensive equipment, making it accessible in resource-poor settings. PMID:24077317

  4. TaqMan based real time PCR assay targeting EML4-ALK fusion transcripts in NSCLC.

    PubMed

    Robesova, Blanka; Bajerova, Monika; Liskova, Kvetoslava; Skrickova, Jana; Tomiskova, Marcela; Pospisilova, Sarka; Mayer, Jiri; Dvorakova, Dana

    2014-07-01

    Lung cancer with the ALK rearrangement constitutes only a small fraction of patients with non-small cell lung cancer (NSCLC). However, in the era of molecular-targeted therapy, efficient patient selection is crucial for successful treatment. In this context, an effective method for EML4-ALK detection is necessary. We developed a new highly sensitive variant-specific TaqMan-based real-time PCR assay applicable to RNA from formalin-fixed paraffin-embedded tissue (FFPE). This assay was used to analyze the EML4-ALK gene in 96 non-selected NSCLC specimens and compared with two other methods (end-point PCR and break-apart FISH). EML4-ALK was detected in 33/96 (34%) specimens using variant-specific real-time PCR, but in only 23/96 (24%) using end-point PCR. All real-time PCR positive samples were confirmed with direct sequencing. A total of 46 specimens were subsequently analyzed by all three detection methods. Using variant-specific real-time PCR we identified EML4-ALK transcript in 17/46 (37%) specimens, using end-point PCR in 13/46 (28%) specimens, and positive ALK rearrangement by FISH was detected in 8/46 (17.4%) specimens. Moreover, using variant-specific real-time PCR, 5 specimens showed more than one EML4-ALK variant simultaneously (in 2 cases the variants 1+3a+3b, in 2 specimens the variants 1+3a and in 1 specimen the variants 1+3b). In one of the 96 cases, both the EML4-ALK fusion gene and an EGFR mutation were detected. All simultaneous genetic variants were confirmed using end-point PCR and direct sequencing. Our variant-specific real-time PCR assay is highly sensitive, fast, financially acceptable, applicable to FFPE and seems to be a valuable tool for the rapid prescreening of NSCLC patients in clinical practice, so that most patients able to benefit from targeted therapy could be identified. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.

  5. Deep whole-genome sequencing of 100 southeast Asian Malays.

    PubMed

    Wong, Lai-Ping; Ong, Rick Twee-Hee; Poh, Wan-Ting; Liu, Xuanyao; Chen, Peng; Li, Ruoying; Lam, Kevin Koi-Yau; Pillai, Nisha Esakimuthu; Sim, Kar-Seng; Xu, Haiyan; Sim, Ngak-Leng; Teo, Shu-Mei; Foo, Jia-Nee; Tan, Linda Wei-Lin; Lim, Yenly; Koo, Seok-Hwee; Gan, Linda Seo-Hwee; Cheng, Ching-Yu; Wee, Sharon; Yap, Eric Peng-Huat; Ng, Pauline Crystal; Lim, Wei-Yen; Soong, Richie; Wenk, Markus Rene; Aung, Tin; Wong, Tien-Yin; Khor, Chiea-Chuen; Little, Peter; Chia, Kee-Seng; Teo, Yik-Ying

    2013-01-10

    Whole-genome sequencing across multiple samples in a population provides an unprecedented opportunity for comprehensively characterizing the polymorphic variants in the population. Although the 1000 Genomes Project (1KGP) has offered brief insights into the value of population-level sequencing, the low coverage has compromised the ability to confidently detect rare and low-frequency variants. In addition, the composition of populations in the 1KGP is not complete, despite the fact that the study design has been extended to more than 2,500 samples from more than 20 population groups. The Malays are one of the Austronesian groups predominantly present in Southeast Asia and Oceania, and the Singapore Sequencing Malay Project (SSMP) aims to perform deep whole-genome sequencing of 100 healthy Malays. By sequencing at a minimum of 30× coverage, we have illustrated the higher sensitivity at detecting low-frequency and rare variants and the ability to investigate the presence of hotspots of functional mutations. Compared to the low-pass sequencing in the 1KGP, the deeper coverage allows more functional variants to be identified for each person. A comparison of the fidelity of genotype imputation of Malays indicated that a population-specific reference panel, such as the SSMP, outperforms a cosmopolitan panel with larger number of individuals for common SNPs. For lower-frequency (<5%) markers, a larger number of individuals might have to be whole-genome sequenced so that the accuracy currently afforded by the 1KGP can be achieved. The SSMP data are expected to be the benchmark for evaluating the value of deep population-level sequencing versus low-pass sequencing, especially in populations that are poorly represented in population-genetics studies. Copyright © 2013 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.

  6. Deep Whole-Genome Sequencing of 100 Southeast Asian Malays

    PubMed Central

    Wong, Lai-Ping; Ong, Rick Twee-Hee; Poh, Wan-Ting; Liu, Xuanyao; Chen, Peng; Li, Ruoying; Lam, Kevin Koi-Yau; Pillai, Nisha Esakimuthu; Sim, Kar-Seng; Xu, Haiyan; Sim, Ngak-Leng; Teo, Shu-Mei; Foo, Jia-Nee; Tan, Linda Wei-Lin; Lim, Yenly; Koo, Seok-Hwee; Gan, Linda Seo-Hwee; Cheng, Ching-Yu; Wee, Sharon; Yap, Eric Peng-Huat; Ng, Pauline Crystal; Lim, Wei-Yen; Soong, Richie; Wenk, Markus Rene; Aung, Tin; Wong, Tien-Yin; Khor, Chiea-Chuen; Little, Peter; Chia, Kee-Seng; Teo, Yik-Ying

    2013-01-01

    Whole-genome sequencing across multiple samples in a population provides an unprecedented opportunity for comprehensively characterizing the polymorphic variants in the population. Although the 1000 Genomes Project (1KGP) has offered brief insights into the value of population-level sequencing, the low coverage has compromised the ability to confidently detect rare and low-frequency variants. In addition, the composition of populations in the 1KGP is not complete, despite the fact that the study design has been extended to more than 2,500 samples from more than 20 population groups. The Malays are one of the Austronesian groups predominantly present in Southeast Asia and Oceania, and the Singapore Sequencing Malay Project (SSMP) aims to perform deep whole-genome sequencing of 100 healthy Malays. By sequencing at a minimum of 30× coverage, we have illustrated the higher sensitivity at detecting low-frequency and rare variants and the ability to investigate the presence of hotspots of functional mutations. Compared to the low-pass sequencing in the 1KGP, the deeper coverage allows more functional variants to be identified for each person. A comparison of the fidelity of genotype imputation of Malays indicated that a population-specific reference panel, such as the SSMP, outperforms a cosmopolitan panel with larger number of individuals for common SNPs. For lower-frequency (<5%) markers, a larger number of individuals might have to be whole-genome sequenced so that the accuracy currently afforded by the 1KGP can be achieved. The SSMP data are expected to be the benchmark for evaluating the value of deep population-level sequencing versus low-pass sequencing, especially in populations that are poorly represented in population-genetics studies. PMID:23290073

  7. Performance Comparison of Bench-Top Next Generation Sequencers Using Microdroplet PCR-Based Enrichment for Targeted Sequencing in Patients with Autism Spectrum Disorder

    PubMed Central

    Okamoto, Nobuhiko; Nakashima, Mitsuko; Tsurusaki, Yoshinori; Miyake, Noriko; Saitsu, Hirotomo; Matsumoto, Naomichi

    2013-01-01

    Next-generation sequencing (NGS) combined with enrichment of target genes enables highly efficient and low-cost sequencing of multiple genes for genetic diseases. The aim of this study was to validate the accuracy and sensitivity of our method for comprehensive mutation detection in autism spectrum disorder (ASD). We assessed the performance of the bench-top Ion Torrent PGM and Illumina MiSeq platforms as optimized solutions for mutation detection, using microdroplet PCR-based enrichment of 62 ASD associated genes. Ten patients with known mutations were sequenced using NGS to validate the sensitivity of our method. The overall read quality was better with MiSeq, largely because of the increased indel-related error associated with PGM. The sensitivity of SNV detection was similar between the two platforms, suggesting they are both suitable for SNV detection in the human genome. Next, we used these methods to analyze 28 patients with ASD, and identified 22 novel variants in genes associated with ASD, with one mutation detected by MiSeq only. Thus, our results support the combination of target gene enrichment and NGS as a valuable molecular method for investigating rare variants in ASD. PMID:24066114

  8. Genome and Transcriptome Sequencing of the Ostreid herpesvirus 1 From Tomales Bay, California

    NASA Astrophysics Data System (ADS)

    Burge, C. A.; Langevin, S.; Closek, C. J.; Roberts, S. B.; Friedman, C. S.

    2016-02-01

    Mass mortalities of larval and seed bivalve molluscs attributed to the Ostreid herpesvirus 1 (OsHV-1) occur globally. OsHV-1 was fully sequenced and characterized as a member of the Family Malacoherpesviridae. Multiple strains of OsHV-1 exist and may vary in virulence, i.e. OsHV-1 µvar. For most global variants of OsHV-1, sequence data is limited to PCR-based sequencing of segments, including two recent genomes. In the United States, OsHV-1 is limited to detection in adjacent embayments in California, Tomales and Drakes bays. Limited DNA sequence data of OsHV-1 infecting oysters in Tomales Bay indicates the virus detected in Tomales Bay is similar but not identical to any one global variant of OsHV-1. In order to better understand both strain variation and virulence of OsHV-1 infecting oysters in Tomales Bay, we used genomic and transcriptomic sequencing. Meta-genomic sequencing (Illumina MiSeq) was conducted from infected oysters (n=4 per year) collected in 2003, 2007, and 2014, where full OsHV-1 genome sequences and low overall microbial diversity were achieved from highly infected oysters. Increased microbial diversity was detected in three of four samples sequenced from 2003, where qPCR based genome copy numbers of OsHV-1 were lower. Expression analysis (SOLiD RNA sequencing) of OsHV-1 genes expressed in oyster larvae at 24 hours post exposure revealed a nearly complete transcriptome, with several highly expressed genes, which are similar to recent transcriptomic analyses of other OsHV-1 variants. Taken together, our results indicate that genome and transcriptome sequencing may be powerful tools in understanding both strain variation and virulence of non-culturable marine viruses.

  9. High-throughput matrix-assisted laser desorption ionization-time of flight mass spectrometry as an alternative approach to monitoring drug resistance of hepatitis B virus.

    PubMed

    Rybicka, Magda; Stalke, Piotr; Dreczewski, Marcin; Smiatacz, Tomasz; Bielawski, Krzysztof Piotr

    2014-01-01

    Long-term antiviral therapy of chronic hepatitis B virus (HBV) infection can lead to the selection of drug-resistant HBV variants and treatment failure. Moreover, these HBV strains are possibly present in treatment-naive patients. Currently available assays for the detection of HBV drug resistance can identify mutants that constitute ≥5% of the viral population. Furthermore, drug-resistant HBV variants can be detected when a viral load is >10^4 copies/ml (1,718 IU/ml). The aim of this study was to compare matrix-assisted laser desorption ionization-time of flight mass spectrometry (MALDI-TOF MS) and multitemperature single-strand conformation polymorphism (MSSCP) with commercially available assays for the detection of drug-resistant HBV strains. HBV DNA was extracted from 87 serum samples acquired from 45 chronic hepatitis B (CHB) patients. The 37 selected HBV variants were analyzed in 4 separate primer extension reactions on the MALDI-TOF MS. Moreover, MSSCP for identifying drug-resistant HBV YMDD variants was developed and turned out to be more sensitive than INNOLiPA HBV DR and direct sequencing. MALDI-TOF MS had the capability to detect mutant strains within a mixed viral population occurring with an allelic frequency of approximately 1% (with a specific value of ≥10^2 copies/ml, also expressed as ≥17.18 IU/ml). In our study, MSSCP detected 98% of the HBV YMDD variants among strains detected by the MALDI-TOF MS assay. The routine tests revealed results of 40% and 11%, respectively, for INNOLiPA and direct sequencing. The commonly available HBV tests are less sensitive than MALDI-TOF MS in the detection of HBV-resistant variants, including quasispecies.
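
    The two viral-load thresholds quoted in this abstract are given in both copies/ml and IU/ml, which implies a fixed conversion ratio. The sketch below derives that ratio only from the figures in the abstract (10,000 copies/ml stated as 1,718 IU/ml); the constant and function names are illustrative, and conversion factors in practice may differ between assays.

```python
# Ratio implied by the abstract: 10,000 copies/ml corresponds to 1,718 IU/ml,
# i.e. roughly 5.82 copies per IU. This is an inference from the quoted
# figures, not an assay-independent constant.
COPIES_PER_IU = 1e4 / 1718


def copies_to_iu(copies_per_ml: float) -> float:
    """Convert an HBV DNA load from copies/ml to IU/ml."""
    return copies_per_ml / COPIES_PER_IU


def iu_to_copies(iu_per_ml: float) -> float:
    """Convert an HBV DNA load from IU/ml to copies/ml."""
    return iu_per_ml * COPIES_PER_IU
```

    Applying the same ratio to the lower threshold of 100 copies/ml reproduces the 17.18 IU/ml figure given for MALDI-TOF MS, so the two pairs of values in the abstract are mutually consistent.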

  10. Comparison of Ion Personal Genome Machine Platforms for the Detection of Variants in BRCA1 and BRCA2.

    PubMed

    Hwang, Sang Mee; Lee, Ki Chan; Lee, Min Seob; Park, Kyoung Un

    2018-01-01

    Transition to next generation sequencing (NGS) for BRCA1/BRCA2 analysis in clinical laboratories is ongoing, but different platforms and/or data analysis pipelines give different results, which complicates implementation. We have evaluated the Ion Personal Genome Machine (PGM) Platforms (Ion PGM, Ion PGM Dx, Thermo Fisher Scientific) for the analysis of BRCA1/2. The results of Ion PGM with OTG-snpcaller, a pipeline based on Torrent mapping alignment program and Genome Analysis Toolkit, from 75 clinical samples and 14 reference DNA samples were compared with Sanger sequencing for BRCA1/BRCA2. Ten clinical samples and 14 reference DNA samples were additionally sequenced by Ion PGM Dx with Torrent Suite. Fifty types of variants including 18 pathogenic or variants of unknown significance were identified from 75 clinical samples and known variants of the reference samples were confirmed by Sanger sequencing and/or NGS. One false-negative result was present for Ion PGM/OTG-snpcaller: an indel variant misidentified as a single nucleotide variant. However, eight discordant results were present for Ion PGM Dx/Torrent Suite, with both false-positive and false-negative results. A 40-bp deletion, a 4-bp deletion and a 1-bp deletion variant were not called and a false-positive deletion was identified. Four other variants were misidentified as another variant. Ion PGM/OTG-snpcaller showed acceptable performance with good concordance with Sanger sequencing. However, Ion PGM Dx/Torrent Suite showed many discrepant results not suitable for use in a clinical laboratory, requiring further optimization of the data analysis for calling variants.

  11. FAVR (Filtering and Annotation of Variants that are Rare): methods to facilitate the analysis of rare germline genetic variants from massively parallel sequencing datasets

    PubMed Central

    2013-01-01

    Background: Characterising genetic diversity through the analysis of massively parallel sequencing (MPS) data offers enormous potential to significantly improve our understanding of the genetic basis for observed phenotypes, including predisposition to and progression of complex human disease. Great challenges remain in resolving genetic variants that are genuine from the millions of artefactual signals. Results: FAVR is a suite of new methods designed to work with commonly used MPS analysis pipelines to assist in the resolution of some of the issues related to the analysis of the vast amount of resulting data, with a focus on relatively rare genetic variants. To the best of our knowledge, no equivalent method has previously been described. The most important and novel aspect of FAVR is the use of signatures in comparator sequence alignment files during variant filtering, and annotation of variants potentially shared between individuals. The FAVR methods use these signatures to facilitate filtering of (i) platform and/or mapping-specific artefacts, (ii) common genetic variants, and, where relevant, (iii) artefacts derived from imbalanced paired-end sequencing, as well as annotation of genetic variants based on evidence of co-occurrence in individuals. We applied conventional variant calling to whole-exome sequencing datasets, produced using both SOLiD and TruSeq chemistries, with or without downstream processing by FAVR methods. We demonstrate a 3-fold smaller rare single nucleotide variant shortlist with no detected reduction in sensitivity. This analysis included Sanger sequencing of rare variant signals not evident in dbSNP131, assessment of known variant signal preservation, and comparison of observed and expected rare variant numbers across a range of first cousin pairs. The principles described herein were applied in our recent publication identifying XRCC2 as a new breast cancer risk gene and have been made publicly available as a suite of software tools. Conclusions: FAVR is a platform-agnostic suite of methods that significantly enhances the analysis of large volumes of sequencing data for the study of rare genetic variants and their influence on phenotypes. PMID:23441864
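
    The comparator-based filtering idea in this abstract can be illustrated with a deliberately simplified sketch. It is not the FAVR code: FAVR works from signatures in comparator alignment files, whereas this example assumes those signals have already been reduced to one set of variant keys per comparator sample. The `Variant` tuple layout, the function name and the `max_comparators` parameter are all hypothetical.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical variant key: (chromosome, position, reference allele, alternate allele)
Variant = Tuple[str, int, str, str]


def filter_by_comparators(
    candidates: Iterable[Variant],
    comparator_signals: List[Set[Variant]],
    max_comparators: int = 0,
) -> List[Variant]:
    """Keep candidates whose signal appears in at most `max_comparators` comparators.

    A signal that recurs across unrelated comparator samples is more likely to
    be a platform or mapping artefact, or a common variant, than a rare variant.
    """
    kept = []
    for variant in candidates:
        seen_in = sum(1 for signals in comparator_signals if variant in signals)
        if seen_in <= max_comparators:
            kept.append(variant)
    return kept
```

    Raising `max_comparators` relaxes the filter, which may be appropriate when some comparators are relatives who could genuinely share a rare variant; the abstract handles that case through annotation of co-occurrence rather than removal.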

  12. Towards Clinical Molecular Diagnosis of Inherited Cardiac Conditions: A Comparison of Bench-Top Genome DNA Sequencers

    PubMed Central

    Wilkinson, Samuel L.; John, Shibu; Walsh, Roddy; Novotny, Tomas; Valaskova, Iveta; Gupta, Manu; Game, Laurence; Barton, Paul J R.; Cook, Stuart A.; Ware, James S.

    2013-01-01

    Background: Molecular genetic testing is recommended for diagnosis of inherited cardiac disease, to guide prognosis and treatment, but access is often limited by cost and availability. Recently introduced high-throughput bench-top DNA sequencing platforms have the potential to overcome these limitations. Methodology/Principal Findings: We evaluated two next-generation sequencing (NGS) platforms for molecular diagnostics. The protein-coding regions of six genes associated with inherited arrhythmia syndromes were amplified from 15 human samples using parallelised multiplex PCR (Access Array, Fluidigm), and sequenced on the MiSeq (Illumina) and Ion Torrent PGM (Life Technologies). Overall, 97.9% of the target was sequenced adequately for variant calling on the MiSeq, and 96.8% on the Ion Torrent PGM. Regions missed tended to be of high GC-content, and most were problematic for both platforms. Variant calling was assessed using 107 variants detected using Sanger sequencing: within adequately sequenced regions, variant calling on both platforms was highly accurate (Sensitivity: MiSeq 100%, PGM 99.1%. Positive predictive value: MiSeq 95.9%, PGM 95.5%). At the time of the study the Ion Torrent PGM had a lower capital cost and individual runs were cheaper and faster. The MiSeq had a higher capacity (requiring fewer runs), with reduced hands-on time and simpler laboratory workflows. Both provide significant cost and time savings over conventional methods, even allowing for adjunct Sanger sequencing to validate findings and sequence exons missed by NGS. Conclusions/Significance: MiSeq and Ion Torrent PGM both provide accurate variant detection as part of a PCR-based molecular diagnostic workflow, and provide alternative platforms for molecular diagnosis of inherited cardiac conditions. Though there were performance differences at this throughput, platforms differed primarily in terms of cost, scalability, protocol stability and ease of use. Compared with current molecular genetic diagnostic tests for inherited cardiac arrhythmias, these NGS approaches are faster, less expensive, and yet more comprehensive. PMID:23861798
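
    The sensitivity and positive predictive value figures in this abstract follow the standard definitions, sketched below. The function names are illustrative, and the abstract does not report the underlying true-positive, false-negative or false-positive counts, so any counts passed in are assumptions. The PGM sensitivity of 99.1% is consistent with 106 of 107 Sanger variants being detected, but that breakdown is an inference rather than a reported figure.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of reference (e.g. Sanger-confirmed) variants that were called."""
    total = true_pos + false_neg
    if total == 0:
        raise ValueError("no reference variants")
    return true_pos / total


def positive_predictive_value(true_pos: int, false_pos: int) -> float:
    """Fraction of called variants that are true variants."""
    total = true_pos + false_pos
    if total == 0:
        raise ValueError("no called variants")
    return true_pos / total
```

    Because sensitivity depends only on missed reference variants and positive predictive value only on spurious calls, a platform can score 100% on one and noticeably lower on the other, as the MiSeq figures above show.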

  13. Phylogeny and S1 Gene Variation of Infectious Bronchitis Virus Detected in Broilers and Layers in Turkey.

    PubMed

    Yilmaz, Huseyin; Altan, Eda; Cizmecigil, Utku Y; Gurel, Aydin; Ozturk, Gulay Yuzbasioglu; Bamac, Ozge Erdogan; Aydin, Ozge; Britton, Paul; Monne, Isabella; Cetinkaya, Burhan; Morgan, Kenton L; Faburay, Bonto; Richt, Juergen A; Turan, Nuri

    2016-09-01

    The avian coronavirus infectious bronchitis virus (AvCoV-IBV) is recognized as an important global pathogen because new variants are a continuous threat to the poultry industry worldwide. This study investigates the genetic origin and diversity of AvCoV-IBV by analysis of the S1 sequence derived from 49 broiler flocks and 14 layer flocks in different regions of Turkey. AvCoV-IBV RNA was detected in 41 (83.6%) of the broiler flocks and nine (64.2%) of the layer flocks by TaqMan real-time RT-PCR. In addition, AvCoV-IBV RNA was detected in the tracheas (27/30, 90%), lungs (31/49, 62.2%), caecal tonsils (7/22, 31.8%), and kidneys (4/49, 8.1%) of the broiler flocks examined. Pathologic lesions, hemorrhages, and mononuclear infiltrations were predominantly observed in tracheas and to a lesser extent in the lungs and a few in kidneys. A phylogenetic tree based on partial S1 sequences of the detected AvCoV-IBVs (including isolates) revealed that 1) viruses detected in five broiler flocks were similar to the IBV vaccines Ma5, H120, M41; 2) viruses detected in 24 broiler flocks were similar to those previously reported from Turkey and to Israel variant-2 strains; 3) viruses detected in seven layer flocks were different from those found in any of the broiler flocks but similar to viruses previously reported from Iran, India, and China (similar to Israel variant-1 and 4/91 serotypes); and 4) the AvCoV-IBV Israeli variant-2 strain found circulating in Turkey appears to be undergoing molecular evolution. In conclusion, genetically different AvCoV-IBV strains, including vaccine-like strains, based on their partial S1 sequence, are circulating in broiler and layer chicken flocks in Turkey and the Israeli variant-2 strain is undergoing evolution.

  14. Preconception Carrier Screening by Genome Sequencing: Results from the Clinical Laboratory.

    PubMed

    Punj, Sumit; Akkari, Yassmine; Huang, Jennifer; Yang, Fei; Creason, Allison; Pak, Christine; Potter, Amiee; Dorschner, Michael O; Nickerson, Deborah A; Robertson, Peggy D; Jarvik, Gail P; Amendola, Laura M; Schleit, Jennifer; Simpson, Dana Kostiner; Rope, Alan F; Reiss, Jacob; Kauffman, Tia; Gilmore, Marian J; Himes, Patricia; Wilfond, Benjamin; Goddard, Katrina A B; Richards, C Sue

    2018-06-07

    Advances in sequencing technologies permit the analysis of a larger selection of genes for preconception carrier screening. The study was designed as a sequential carrier screen using genome sequencing to analyze 728 gene-disorder pairs for carrier and medically actionable conditions in 131 women and their partners (n = 71) who were planning a pregnancy. We report here on the clinical laboratory results from this expanded carrier screening program. Variants were filtered and classified using the latest American College of Medical Genetics and Genomics (ACMG) guideline; only pathogenic and likely pathogenic variants were confirmed by orthologous methods before being reported. Novel missense variants were classified as variants of uncertain significance. We reported 304 variants in 202 participants. Twelve carrier couples (12/71 couples tested) were identified for common conditions; eight were carriers for hereditary hemochromatosis. Although both known and novel variants were reported, 48% of all reported variants were missense. For novel splice-site variants, RNA-splicing assays were performed to aid in classification. We reported ten copy-number variants and five variants in non-coding regions. One novel variant was reported in F8, associated with hemophilia A; prenatal testing showed that the male fetus harbored this variant and the neonate suffered a life-threatening hemorrhage which was anticipated and appropriately managed. Moreover, 3% of participants had variants that were medically actionable. Compared with targeted mutation screening, genome sequencing improves the sensitivity of detecting clinically significant variants. While certain novel variant interpretation remains challenging, the ACMG guidelines are useful to classify variants in a healthy population. Copyright © 2018 American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.

  15. Ultrasensitive Genotypic Detection of Antiviral Resistance in Hepatitis B Virus Clinical Isolates

    PubMed Central

    Fang, Jie; Wichroski, Michael J.; Levine, Steven M.; Baldick, Carl J.; Mazzucco, Charles E.; Walsh, Ann W.; Kienzle, Bernadette K.; Rose, Ronald E.; Pokornowski, Kevin A.; Colonno, Richard J.; Tenney, Daniel J.

    2009-01-01

    Amino acid substitutions that confer reduced susceptibility to antivirals arise spontaneously through error-prone viral polymerases and are selected as a result of antiviral therapy. Resistance substitutions first emerge in a fraction of the circulating virus population, below the limit of detection by nucleotide sequencing of either the population or limited sets of cloned isolates. These variants can expand under drug pressure to dominate the circulating virus population. To enhance detection of these viruses in clinical samples, we established a highly sensitive quantitative, real-time allele-specific PCR assay for hepatitis B virus (HBV) DNA. Sensitivity was accomplished using a high-fidelity DNA polymerase and oligonucleotide primers containing locked nucleic acid bases. Quantitative measurement of resistant and wild-type variants was accomplished using sequence-matched standards. Detection methodology that was not reliant on hybridization probes, and assay modifications, minimized the effect of patient-specific sequence polymorphisms. The method was validated using samples from patients chronically infected with HBV through parallel sequencing of large numbers of cloned isolates. Viruses with resistance to lamivudine and other l-nucleoside analogs and entecavir, involving 17 different nucleotide substitutions, were reliably detected at levels at or below 0.1% of the total population. The method worked across HBV genotypes. Longitudinal analysis of patient samples showed earlier emergence of resistance on therapy than was seen with sequencing methodologies, including some cases of resistance that existed prior to treatment. In summary, we established and validated an ultrasensitive method for measuring resistant HBV variants in clinical specimens, which enabled earlier, quantitative measurement of resistance to therapy. PMID:19433559

  16. Limited Variation in BK Virus T-Cell Epitopes Revealed by Next-Generation Sequencing

    PubMed Central

    Sahoo, Malaya K.; Tan, Susanna K.; Chen, Sharon F.; Kapusinszky, Beatrix; Concepcion, Katherine R.; Kjelson, Lynn; Mallempati, Kalyan; Farina, Heidi M.; Fernández-Viña, Marcelo; Tyan, Dolly; Grimm, Paul C.; Anderson, Matthew W.; Concepcion, Waldo

    2015-01-01

    BK virus (BKV) infection causing end-organ disease remains a formidable challenge to the hematopoietic cell transplant (HCT) and kidney transplant fields. As BKV-specific treatments are limited, immunologic-based therapies may be a promising and novel therapeutic option for transplant recipients with persistent BKV infection. Here, we describe a whole-genome, deep-sequencing methodology and bioinformatics pipeline that identify BKV variants across the genome and at BKV-specific HLA-A2-, HLA-B0702-, and HLA-B08-restricted CD8 T-cell epitopes. BKV whole genomes were amplified using long-range PCR with four inverse primer sets, and fragmentation libraries were sequenced on the Ion Torrent Personal Genome Machine (PGM). An error model and variant-calling algorithm were developed to accurately identify rare variants. A total of 65 samples from 18 pediatric HCT and kidney recipients with quantifiable BKV DNAemia underwent whole-genome sequencing. Limited genetic variation was observed. The median number of amino acid variants identified per sample was 8 (range, 2 to 37; interquartile range, 10), with the majority of variants (77%) detected at a frequency of <5%. When normalized for length, there was no statistical difference in the median number of variants across all genes. Similarly, the predominant virus population within samples harbored T-cell epitopes similar to the reference BKV strain that was matched for the BKV genotype. Despite the conservation of epitopes, low-level variants in T-cell epitopes were detected in 77.7% (14/18) of patients. Understanding epitope variation across the whole genome provides insight into the virus-immune interface and may help guide the development of protocols for novel immunologic-based therapies. PMID:26202116

  17. Leveraging long read sequencing from a single individual to provide a comprehensive resource for benchmarking variant calling methods

    PubMed Central

    Mu, John C.; Tootoonchi Afshar, Pegah; Mohiyuddin, Marghoob; Chen, Xi; Li, Jian; Bani Asadi, Narges; Gerstein, Mark B.; Wong, Wing H.; Lam, Hugo Y. K.

    2015-01-01

    A high-confidence, comprehensive human variant set is critical in assessing accuracy of sequencing algorithms, which are crucial in precision medicine based on high-throughput sequencing. Although recent works have attempted to provide such a resource, they still do not encompass all major types of variants including structural variants (SVs). Thus, we leveraged the massive high-quality Sanger sequences from the HuRef genome to construct by far the most comprehensive gold set of a single individual, which was cross validated with deep Illumina sequencing, population datasets, and well-established algorithms. It was a necessary effort to completely reanalyze the HuRef genome as its previously published variants were mostly reported five years ago, suffering from compatibility, organization, and accuracy issues that prevent their direct use in benchmarking. Our extensive analysis and validation resulted in a gold set with high specificity and sensitivity. In contrast to the current gold sets of the NA12878 or HS1011 genomes, our gold set is the first that includes small variants, deletion SVs and insertion SVs up to a hundred thousand base-pairs. We demonstrate the utility of our HuRef gold set to benchmark several published SV detection tools. PMID:26412485

  18. Exon 11 skipping of SCN10A coding for voltage-gated sodium channels in dorsal root ganglia

    PubMed Central

    Schirmeyer, Jana; Szafranski, Karol; Leipold, Enrico; Mawrin, Christian; Platzer, Matthias; Heinemann, Stefan H

    2014-01-01

    The voltage-gated sodium channel NaV1.8 (encoded by SCN10A) is predominantly expressed in dorsal root ganglia (DRG) and plays a critical role in pain perception. We analyzed SCN10A transcripts isolated from human DRGs using deep sequencing and found a novel splice variant lacking exon 11, which codes for 98 amino acids of the domain I/II linker. Quantitative PCR analysis revealed an abundance of this variant of up to 5–10% in human, while no such variants were detected in mouse or rat. Since no obvious functional differences between channels with and without the exon-11 sequence were detected, it is suggested that SCN10A exon 11 skipping in humans is a tolerated event. PMID:24763188

  19. Mosaic CREBBP mutation causes overlapping clinical features of Rubinstein–Taybi and Filippi syndromes

    PubMed Central

    de Vries, Tamar I; R Monroe, Glen; van Belzen, Martine J; van der Lans, Christian A; Savelberg, Sanne MC; Newman, William G; van Haaften, Gijs; Nievelstein, Rutger A; van Haelst, Mieke M

    2016-01-01

    Rubinstein–Taybi syndrome (RTS, OMIM 180849) and Filippi syndrome (FLPIS, OMIM 272440) are both rare syndromes, with multiple congenital anomalies and intellectual deficit (MCA/ID). We present a patient with intellectual deficit, short stature, bilateral syndactyly of hands and feet, broad thumbs, ocular abnormalities, and dysmorphic facial features. These clinical features suggest both RTS and FLPIS. Initial DNA analysis of DNA isolated from blood did not identify variants to confirm either of these syndrome diagnoses. Whole-exome sequencing identified a homozygous variant in C9orf173, which was novel at the time of analysis. Further Sanger sequencing analysis of FLPIS cases tested negative for CKAP2L variants did not, however, reveal any further variants. Subsequent analysis using DNA isolated from buccal mucosa revealed a mosaic variant in CREBBP. This report highlights the importance of excluding mosaic variants in patients with a strong but atypical clinical presentation of a MCA/ID syndrome if no disease-causing variants can be detected in DNA isolated from blood samples. As the striking syndactyly observed in the present case is typical for FLPIS, we suggest CREBBP analysis in saliva samples for FLPIS syndrome cases in which no causal CKAP2L variant is detected. PMID:26956253

  20. Analysis of Sequence Variation and Risk Association of Human Papillomavirus 52 Variants Circulating in Korea

    PubMed Central

    Choi, Youn Jin; Ki, Eun Young; Zhang, Chuqing; Ho, Wendy C. S.; Lee, Sung-Jong; Jeong, Min Jin

    2016-01-01

    Introduction: Human papillomavirus (HPV) 52 is a carcinogenic, high-risk genotype frequently detected in cervical cancer cases from East Asia, including Korea. Materials and Methods: Sequences of HPV52 detected in 91 cervical samples collected from women attending Seoul St. Mary’s Hospital were analyzed. HPV52 genomic sequences were obtained by polymerase chain reaction (PCR)-based sequencing and analyzed using Seq-Scape software, and phylogenetic trees were constructed using MEGA6 software. Results: Of the 91 cervical samples, 40 were normal, 22 were low-grade lesions, 21 were high-grade lesions and 7 were squamous cell carcinomas. Four HPV52 variant lineages (A, B, C and D) were identified. Lineage B was the most frequently detected lineage, followed by lineage C. By analyzing the two most frequently detected lineages (B and C), we found that distinct variations existed in each lineage. We also found that a lineage B-specific mutation K93R (A379G) was associated with an increased risk of cervical neoplasia. Conclusions: To our knowledge, we are the first to reveal the predominance of the HPV52 lineages B and C in Korea. We also found that these lineages harbored distinct genetic alterations that may affect oncogenicity. Our findings increase our understanding of the heterogeneity of HPV52 variants, and may be useful for the development of new diagnostic assays and therapeutic vaccines. PMID:27977741

  1. Consensus generation and variant detection by Celera Assembler.

    PubMed

    Denisov, Gennady; Walenz, Brian; Halpern, Aaron L; Miller, Jason; Axelrod, Nelson; Levy, Samuel; Sutton, Granger

    2008-04-15

    We present an algorithm to identify allelic variation given a Whole Genome Shotgun (WGS) assembly of haploid sequences, and to produce a set of haploid consensus sequences rather than a single consensus sequence. Existing WGS assemblers take a column-by-column approach to consensus generation, and produce a single consensus sequence which can be inconsistent with the underlying haploid alleles, and inconsistent with any of the aligned sequence reads. Our new algorithm uses a dynamic windowing approach. It detects alleles by simultaneously processing the portions of aligned reads spanning a region of sequence variation, assigns reads to their respective alleles, phases adjacent variant alleles and generates a consensus sequence corresponding to each confirmed allele. This algorithm was used to produce the first diploid genome sequence of an individual human. It can also be applied to assemblies of multiple diploid individuals and hybrid assemblies of multiple haploid organisms. When applied to the individual human genome assembly, the new algorithm detects exactly two confirmed alleles and reports two consensus sequences in 98.98% of the total number of 2,033,311 detected regions of sequence variation. In 33,269 out of 460,373 detected regions of size >1 bp, it fixes the constructed errors of a mosaic haploid representation of a diploid locus as produced by the original Celera Assembler consensus algorithm. Using an optimized procedure calibrated against 1 506 344 known SNPs, it detects 438 814 new heterozygous SNPs with a false positive rate of 12%. The open source code is available at: http://wgs-assembler.cvs.sourceforge.net/wgs-assembler/
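
    As a rough illustration of the windowing idea in this abstract (not the Celera Assembler implementation), the Python sketch below groups aligned reads by the bases they carry across a region of variation and keeps only alleles supported by a minimum number of reads. The function name, the `min_support` parameter, the gap-free alignment model and the toy reads are all hypothetical.

```python
from collections import Counter

def confirmed_alleles(reads, start, end, min_support=2):
    """Group reads by their bases in window [start, end) and keep supported alleles.

    `reads` is a list of (offset, sequence) pairs, where offset is the
    alignment start of the read in assembly coordinates (gap-free toy model).
    """
    counts = Counter()
    for offset, seq in reads:
        # Only reads spanning the whole window can be assigned to an allele.
        if offset <= start and offset + len(seq) >= end:
            counts[seq[start - offset:end - offset]] += 1
    # An allele is treated as "confirmed" when enough reads support it.
    return {allele: n for allele, n in counts.items() if n >= min_support}

# Hypothetical toy alignment: two alleles ("TA" and "AA") at positions 3-4.
reads = [
    (0, "ACGTACGT"),
    (0, "ACGTACGT"),
    (1, "CGAAACG"),
    (2, "GAAACGT"),
    (4, "ACGT"),      # does not span the window, so it is ignored
    (0, "ACGGACGT"),  # single read, so its allele stays unconfirmed
]
print(confirmed_alleles(reads, start=3, end=5))
```

    A real assembler would also have to handle gapped alignments, base qualities and phasing between adjacent windows, none of which this sketch attempts.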

  2. [A new variant of the simian T-lymphotropic retrovirus type I (STLV-IF) in the Sukhumi colony of hamadryas baboons].

    PubMed

    Chikobaeva, M G; Schatzl, H; Rose, D; Bush, U; Iakovleva, L A; Deinhardt, F; Helm, K; Lapin, B A

    1993-01-01

    A polymerase chain reaction (PCR) assay was developed for the detection of simian T-lymphotropic virus type 1 (STLV-1) infection of P. hamadryas and for direct sequencing, using oligonucleotide primer pairs specific for the tax and env regions of the related human T-lymphotropic virus type 1 (HTLV-1). Excellent specificity was shown in the detection of STLV-1 provirus in infected baboons by PCR using HTLV-1-derived primers. The nucleotide sequences of env 467bp and tax 159bp of the proviral genome (env position 5700-6137, tax position 7373-7498 HTLV-1, according to Seiki et al., 1983) derived from STLV-1-infected P. hamadryas were analysed using PCR and direct sequencing techniques. Two STLV-1 isolates from different sources (Sukhumi main-SuTLV-1 and forest stocks-STLV-1F) were compared. Two variants of STLV-1 among P. hamadryas with different levels of homology to HTLV-1 were found (83.8% and 95.2%, respectively). A possible role of nucleotide changes in env and tax sequenced fragments and oncogenicity of STLV-1 variants is discussed.

  3. Mutation detection of E6 and LCR genes from HPV 16 associated with carcinogenesis.

    PubMed

    Mosmann, Jessica P; Monetti, Marina S; Frutos, Maria C; Kiguen, Ana X; Venezuela, Raul F; Cuffini, Cecilia G

    2015-01-01

    Human papillomavirus (HPV) is responsible for one of the most frequent sexually transmitted infections. The first phylogenetic analysis was based on a LCR region fragment. Nowadays, 4 variants are known: African (Af-1, Af-2), Asian-American (AA) and European (E). However, the existence of sub-lineages of the European variant has been proposed, with specific mutations in the E6 and LCR sequences possibly being related to persistent viral infections. The aim of this study was a phylogenetic study of HPV16 sequences of endocervical samples from Cordoba, in order to detect the circulating lineages and analyze the presence of mutations that could be correlated with malignant disease. The phylogenetic analysis determined that 86% of the samples belonged to the E variant, 7% to AF-1 and the remaining 7% to AF-2. The most frequent mutation in LCR sequences was G7521A, in 80% of the analyzed samples; it affects the binding site of a transcription factor that could contribute to carcinogenesis. In the E6 sequences, the most common mutation was T350G (L83V), detected in 67% of the samples, associated with increased risk of persistent infection. The high detection rate of the European lineage correlated with patterns of human migration. This study emphasizes the importance of recognizing circulating lineages, as well as the detection of mutations associated with high-grade neoplastic lesions that could be correlated to the development of carcinogenic lesions.

  4. HGVS Recommendations for the Description of Sequence Variants: 2016 Update.

    PubMed

    den Dunnen, Johan T; Dalgleish, Raymond; Maglott, Donna R; Hart, Reece K; Greenblatt, Marc S; McGowan-Jordan, Jean; Roux, Anne-Francoise; Smith, Timothy; Antonarakis, Stylianos E; Taschner, Peter E M

    2016-06-01

    The consistent and unambiguous description of sequence variants is essential to report and exchange information on the analysis of a genome. In particular, DNA diagnostics critically depends on accurate and standardized description and sharing of the variants detected. The sequence variant nomenclature system proposed in 2000 by the Human Genome Variation Society has been widely adopted and has developed into an internationally accepted standard. The recommendations are currently commissioned through a Sequence Variant Description Working Group (SVD-WG) operating under the auspices of three international organizations: the Human Genome Variation Society (HGVS), the Human Variome Project (HVP), and the Human Genome Organization (HUGO). Requests for modifications and extensions go through the SVD-WG following a standard procedure including a community consultation step. Version numbers are assigned to the nomenclature system to allow users to specify the version used in their variant descriptions. Here, we present the current recommendations, HGVS version 15.11, and briefly summarize the changes that were made since the 2000 publication. Most focus has been on removing inconsistencies and tightening definitions allowing automatic data processing. An extensive version of the recommendations is available online, at http://www.HGVS.org/varnomen. © 2016 WILEY PERIODICALS, INC.
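
    To make the point about automatic data processing concrete, here is a minimal Python sketch that parses only the simplest kind of variant description, a single-nucleotide substitution written as reference, coordinate type, position and `ref>alt`. The regular expression, the function name and the example string are illustrative assumptions covering a small subset of the nomenclature; they are not a validator for the full recommendations.

```python
import re

# Simplified pattern: reference accession, coordinate type, position, ref>alt.
# Covers only plain single-nucleotide substitutions such as "c.4375C>T".
SUBSTITUTION = re.compile(
    r"^(?P<reference>[A-Za-z0-9_]+\.\d+):"
    r"(?P<kind>[cgmn])\."
    r"(?P<position>\d+)"
    r"(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)

def parse_substitution(description):
    """Return the parts of a simple substitution description, or None."""
    match = SUBSTITUTION.match(description)
    if match is None:
        return None
    parts = match.groupdict()
    parts["position"] = int(parts["position"])
    return parts

# Illustrative description string; anything outside the subset returns None.
print(parse_substitution("NM_004006.2:c.4375C>T"))
print(parse_substitution("c.4375C>T"))
```

    Deletions, insertions, intronic offsets and protein-level descriptions would each need their own rules, which is why a dedicated, maintained parser is usually preferable to an ad hoc pattern like this one.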

  5. Whole-Exome Sequencing to Decipher the Genetic Heterogeneity of Hearing Loss in a Chinese Family with Deaf by Deaf Mating

    PubMed Central

    Qing, Jie; Yan, Denise; Zhou, Yuan; Liu, Qiong; Wu, Weijing; Xiao, Zian; Liu, Yuyuan; Liu, Jia; Du, Lilin; Xie, Dinghua; Liu, Xue Zhong

    2014-01-01

    Inherited deafness has been shown to have high genetic heterogeneity. For many decades, linkage analysis and candidate gene approaches have been the main tools to elucidate the genetics of hearing loss. However, this associated study design is costly, time-consuming, and unsuitable for small families. This is mainly due to the inadequate numbers of available affected individuals, locus heterogeneity, and assortative mating. Exome sequencing has now become technically feasible and a cost-effective method for detection of disease variants underlying Mendelian disorders due to the recent advances in next-generation sequencing (NGS) technologies. In the present study, we have combined both the Deafness Gene Mutation Detection Array and exome sequencing to identify deafness causative variants in a large Chinese composite family with deaf by deaf mating. The simultaneous screening of the 9 common deafness mutations using the allele-specific PCR-based universal array resulted in the identification of the 1555A>G mutation in the mitochondrial DNA (mtDNA) 12S rRNA in affected individuals in one branch of the family. We then subjected the mutation-negative cases to exome sequencing and identified novel causative variants in the MYH14 and WFS1 genes. This report confirms the effective use of an NGS technique to detect pathogenic mutations in affected individuals who were not candidates for classical genetic studies. PMID:25289672

  6. Detection of a divergent variant of grapevine virus F by next-generation sequencing.

    PubMed

    Molenaar, Nicholas; Burger, Johan T; Maree, Hans J

    2015-08-01

    The complete genome sequence of a South African isolate of grapevine virus F (GVF) is presented. It was first detected by metagenomic next-generation sequencing of field samples and validated through direct Sanger sequencing. The genome sequence of GVF isolate V5 consists of 7539 nucleotides and contains a poly(A) tail. It has a typical vitivirus genome arrangement that comprises five open reading frames (ORFs), which share only 88.96 % nucleotide sequence identity with the existing complete GVF genome sequence (JX105428).

  7. Detection of clinically relevant copy-number variants by exome sequencing in a large cohort of genetic disorders

    PubMed Central

    Pfundt, Rolph; del Rosario, Marisol; Vissers, Lisenka E.L.M.; Kwint, Michael P.; Janssen, Irene M.; de Leeuw, Nicole; Yntema, Helger G.; Nelen, Marcel R.; Lugtenberg, Dorien; Kamsteeg, Erik-Jan; Wieskamp, Nienke; Stegmann, Alexander P.A.; Stevens, Servi J.C.; Rodenburg, Richard J.T.; Simons, Annet; Mensenkamp, Arjen R.; Rinne, Tuula; Gilissen, Christian; Scheffer, Hans; Veltman, Joris A.; Hehir-Kwa, Jayne Y.

    2017-01-01

    Purpose: Copy-number variation is a common source of genomic variation and an important genetic cause of disease. Microarray-based analysis of copy-number variants (CNVs) has become a first-tier diagnostic test for patients with neurodevelopmental disorders, with a diagnostic yield of 10–20%. However, for most other genetic disorders, the role of CNVs is less clear and most diagnostic genetic studies are generally limited to the study of single-nucleotide variants (SNVs) and other small variants. With the introduction of exome and genome sequencing, it is now possible to detect both SNVs and CNVs using an exome- or genome-wide approach with a single test. Methods: We performed exome-based read-depth CNV screening on data from 2,603 patients affected by a range of genetic disorders for which exome sequencing was performed in a diagnostic setting. Results: In total, 123 clinically relevant CNVs ranging in size from 727 bp to 15.3 Mb were detected, which resulted in 51 conclusive diagnoses and an overall increase in diagnostic yield of ~2% (ranging from 0 to 5.8% per disorder). Conclusions: This study shows that CNVs play an important role in a broad range of genetic disorders and that detection via exome-based CNV profiling results in an increase in the diagnostic yield without additional testing, bringing us closer to single-test genomics. Genet Med advance online publication 27 October 2016 PMID:28574513
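
    The abstract does not say which read-depth method was used, so the following Python sketch only illustrates the general principle behind exome-based read-depth CNV screening: compare each target's share of the total depth with its share in a reference, and flag large log2 deviations. The function name, the `loss` and `gain` thresholds and the toy depths are assumptions chosen for illustration.

```python
import math

def call_cnv_candidates(sample_depth, reference_depth, loss=-0.6, gain=0.4):
    """Flag targets whose normalized log2 depth ratio suggests a loss or gain.

    Both inputs map target name -> mean read depth (reference depths must be
    positive). The thresholds are illustrative, not values from the study.
    """
    # Normalize by total depth so that overall sequencing yield cancels out.
    sample_total = sum(sample_depth.values())
    reference_total = sum(reference_depth.values())
    calls = {}
    for target, depth in sample_depth.items():
        expected = reference_depth[target] / reference_total
        observed = depth / sample_total
        if observed == 0:
            calls[target] = "loss"  # no reads at all; log2 is undefined
            continue
        ratio = math.log2(observed / expected)
        if ratio <= loss:
            calls[target] = "loss"
        elif ratio >= gain:
            calls[target] = "gain"
    return calls

# Hypothetical depths: exon2 has roughly half the expected coverage.
sample = {"exon1": 100, "exon2": 50, "exon3": 100, "exon4": 100}
reference = {"exon1": 100, "exon2": 100, "exon3": 100, "exon4": 100}
print(call_cnv_candidates(sample, reference))
```

    With only a handful of targets the normalization is itself distorted by the variant, so production tools typically use many targets, a panel of reference samples and segmentation across neighbouring exons.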

  8. Significance of functional disease-causal/susceptible variants identified by whole-genome analyses for the understanding of human diseases.

    PubMed

    Hitomi, Yuki; Tokunaga, Katsushi

    2017-01-01

    Human genome variation may cause differences in traits and disease risks. Disease-causal/susceptible genes and variants for both common and rare diseases can be detected by comprehensive whole-genome analyses, such as whole-genome sequencing (WGS), using next-generation sequencing (NGS) technology and genome-wide association studies (GWAS). Here, in addition to the application of an NGS as a whole-genome analysis method, we summarize approaches for the identification of functional disease-causal/susceptible variants from abundant genetic variants in the human genome and methods for evaluating their functional effects in human diseases, using an NGS and in silico and in vitro functional analyses. We also discuss the clinical applications of the functional disease causal/susceptible variants to personalized medicine.

  9. Whole Genome Sequencing Increases Molecular Diagnostic Yield Compared with Current Diagnostic Testing for Inherited Retinal Disease.

    PubMed

    Ellingford, Jamie M; Barton, Stephanie; Bhaskar, Sanjeev; Williams, Simon G; Sergouniotis, Panagiotis I; O'Sullivan, James; Lamb, Janine A; Perveen, Rahat; Hall, Georgina; Newman, William G; Bishop, Paul N; Roberts, Stephen A; Leach, Rick; Tearle, Rick; Bayliss, Stuart; Ramsden, Simon C; Nemeth, Andrea H; Black, Graeme C M

    2016-05-01

    To compare the efficacy of whole genome sequencing (WGS) with targeted next-generation sequencing (NGS) in the diagnosis of inherited retinal disease (IRD). Case series. A total of 562 patients diagnosed with IRD. We performed a direct comparative analysis of current molecular diagnostics with WGS. We retrospectively reviewed the findings from a diagnostic NGS DNA test for 562 patients with IRD. A subset of 46 of 562 patients (encompassing potential clinical outcomes of diagnostic analysis) also underwent WGS, and we compared mutation detection rates and molecular diagnostic yields. In addition, we compared the sensitivity and specificity of the 2 techniques to identify known single nucleotide variants (SNVs) using 6 control samples with publicly available genotype data. Diagnostic yield of genomic testing. Across known disease-causing genes, targeted NGS and WGS achieved similar levels of sensitivity and specificity for SNV detection. However, WGS also identified 14 clinically relevant genetic variants that had not been identified by NGS diagnostic testing for the 46 individuals with IRD. These variants included large deletions and variants in noncoding regions of the genome. Identification of these variants confirmed a molecular diagnosis of IRD for 11 of the 33 individuals referred for WGS who had not obtained a molecular diagnosis through targeted NGS testing. Weighted estimates, accounting for population structure, suggest that WGS methods could result in an overall 29% (95% confidence interval, 15-45) uplift in diagnostic yield. We show that WGS methods can detect disease-causing genetic variants missed by current NGS diagnostic methodologies for IRD and thereby demonstrate the clinical utility and additional value of WGS. Copyright © 2016 American Academy of Ophthalmology. Published by Elsevier Inc. All rights reserved.

  10. Longitudinal Detection and Persistence of Minority Drug-Resistant Populations and Their Effect on Salvage Therapy

    PubMed Central

    Nishizawa, Masako; Matsuda, Masakazu; Hattori, Junko; Shiino, Teiichiro; Matano, Tetsuro; Heneine, Walid; Johnson, Jeffrey A.; Sugiura, Wataru

    2015-01-01

    Background: Drug-resistant HIV are more prevalent and persist longer than previously demonstrated by bulk sequencing due to the ability to detect low-frequency variants. To clarify a clinical benefit to monitoring minority-level drug resistance populations as a guide to select active drugs for salvage therapy, we retrospectively analyzed the dynamics of low-frequency drug-resistant population in antiretroviral (ARV)-exposed drug resistant individuals. Materials and Methods: Six HIV-infected individuals treated with ARV for more than five years were analyzed. These individuals had difficulty in controlling viremia, and treatment regimens were switched multiple times guided by standard drug resistance testing using bulk sequencing. To detect minority variant populations with drug resistance, we used a highly sensitive allele-specific PCR (AS-PCR) with detection thresholds of 0.3–2%. According to ARV used in these individuals, we focused on the following seven reverse transcriptase inhibitor-resistant mutations: M41L, K65R, K70R, K103N, Y181C, M184V, and T215F/Y. Results of AS-PCR were compared with bulk sequencing data for concordance and presence of additional mutations. To clarify the genetic relationship between low-frequency and high-frequency populations, AS-PCR amplicon sequences were compared with bulk sequences in phylogenetic analysis. Results: The use of AS-PCR enabled detection of the drug-resistant mutations, M41L, K103N, Y181C, M184V and T215Y, present as low-frequency populations in five of the six individuals. These drug resistant variants persisted for several years without ARV pressure. Phylogenetic analysis indicated that pre-existing K103N and T215I variants had close genetic relationships with high-frequency K103N and T215I observed during treatment. Discussion and Conclusion: Our results demonstrate the long-term persistence of drug-resistant viruses in the absence of drug pressure. The rapid virologic failures with pre-existing mutant viruses detectable by AS-PCR highlight the clinical importance of low-frequency drug-resistant viruses. Thus, our results highlight the usefulness of AS-PCR and support its expanded evaluation in ART clinical management. PMID:26360259

  11. Molecular Diagnosis of Cystic Fibrosis.

    PubMed

    Deignan, Joshua L; Grody, Wayne W

    2016-01-01

    This unit describes a recommended approach to identifying causal genetic variants in an individual suspected of having cystic fibrosis. An introduction to the genetics and clinical presentation of cystic fibrosis is initially presented, followed by a description of the two main strategies used in the molecular diagnosis of cystic fibrosis: (1) an initial targeted variant panel used to detect only the most common cystic fibrosis-causing variants in the CFTR gene, and (2) sequencing of the entire coding region of the CFTR gene to detect additional rare causal CFTR variants. Finally, the unit concludes with a discussion regarding the analytic and clinical validity of these approaches. Copyright © 2016 John Wiley & Sons, Inc.

  12. Targeted Deep Resequencing Identifies Coding Variants in the PEAR1 Gene That Play a Role in Platelet Aggregation

    PubMed Central

    Kim, Yoonhee; Suktitipat, Bhoom; Yanek, Lisa R.; Faraday, Nauder; Wilson, Alexander F.; Becker, Diane M.; Becker, Lewis C.; Mathias, Rasika A.

    2013-01-01

    Platelet aggregation is heritable, and genome-wide association studies have detected strong associations with a common intronic variant of the platelet endothelial aggregation receptor1 (PEAR1) gene both in African American and European American individuals. In this study, we used a sequencing approach to identify additional exonic variants in PEAR1 that may also determine variability in platelet aggregation in the GeneSTAR Study. A 0.3 Mb targeted region on chromosome 1q23.1 including the entire PEAR1 gene was Sanger sequenced in 104 subjects (45% male, 49% African American, age = 52±13) selected on the basis of hyper- and hypo- aggregation across three different agonists (collagen, epinephrine, and adenosine diphosphate). Single-variant and multi-variant burden tests for association were performed. Of the 235 variants identified through sequencing, 61 were novel, and three of these were missense variants. More rare variants (MAF<5%) were noted in African Americans compared to European Americans (108 vs. 45). The common intronic GWAS-identified variant (rs12041331) demonstrated the most significant association signal in African Americans (p = 4.020×10^−4); no association was seen for additional exonic variants in this group. In contrast, multi-variant burden tests indicated that exonic variants play a more significant role in European Americans (p = 0.0099 for the collective coding variants compared to p = 0.0565 for intronic variant rs12041331). Imputation of the individual exonic variants in the rest of the GeneSTAR European American cohort (N = 1,965) supports the results noted in the sequenced discovery sample: p = 3.56×10^−4, 2.27×10^−7, 5.20×10^−5 for coding synonymous variant rs56260937 and collagen, epinephrine and adenosine diphosphate induced platelet aggregation, respectively. Sequencing approaches confirm that a common intronic variant has the strongest association with platelet aggregation in African Americans, and show that exonic variants play an additional role in platelet aggregation in European Americans. PMID:23704978

  13. Position-specific automated processing of V3 env ultra-deep pyrosequencing data for predicting HIV-1 tropism

    PubMed Central

    Jeanne, Nicolas; Saliou, Adrien; Carcenac, Romain; Lefebvre, Caroline; Dubois, Martine; Cazabat, Michelle; Nicot, Florence; Loiseau, Claire; Raymond, Stéphanie; Izopet, Jacques; Delobel, Pierre

    2015-01-01

    HIV-1 coreceptor usage must be accurately determined before starting CCR5 antagonist-based treatment as the presence of undetected minor CXCR4-using variants can cause subsequent virological failure. Ultra-deep pyrosequencing of HIV-1 V3 env allows detection of low levels of CXCR4-using variants that current genotypic approaches miss. However, the computation of the mass of sequence data and the need to identify true minor variants while excluding artifactual sequences generated during amplification and ultra-deep pyrosequencing is rate-limiting. Arbitrary fixed cut-offs below which minor variants are discarded are currently used, but the errors generated during ultra-deep pyrosequencing are sequence-dependent rather than random. We have developed an automated processing method for HIV-1 V3 env ultra-deep pyrosequencing data that uses biological filters to discard artifactual or non-functional V3 sequences, followed by statistical filters to determine position-specific sensitivity thresholds rather than arbitrary fixed cut-offs. It retains authentic sequences with point mutations at V3 positions of interest and discards artifactual ones with accurate sensitivity thresholds. PMID:26585833
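
    The abstract does not describe the statistical filters in detail. As one possible way to derive position-specific thresholds instead of a fixed cut-off, the Python sketch below assumes a binomial error model with a per-position error rate measured on a control, and finds the smallest variant read count that errors alone would rarely produce. The function names, the `alpha` level, the coverage and the error rates are all hypothetical.

```python
import math

def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p), with each term computed in log space."""
    if k <= 0:
        return 1.0
    total = 0.0
    for i in range(k, n + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_term)
    return min(total, 1.0)

def min_reads_for_call(coverage, error_rate, alpha=0.001):
    """Smallest variant read count unlikely (< alpha) to arise from errors alone."""
    for k in range(1, coverage + 1):
        if binomial_tail(coverage, k, error_rate) < alpha:
            return k
    return coverage + 1  # no count is convincing at this error rate

# Hypothetical per-position error rates (must lie strictly between 0 and 1).
control_error = {"pos11": 0.001, "pos25": 0.01}
thresholds = {pos: min_reads_for_call(1000, e) for pos, e in control_error.items()}
print(thresholds)
```

    Under these assumptions an error-prone position demands more supporting reads than a clean one at the same coverage, which is the behaviour a single fixed cut-off cannot provide.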

  14. Position-specific automated processing of V3 env ultra-deep pyrosequencing data for predicting HIV-1 tropism.

    PubMed

    Jeanne, Nicolas; Saliou, Adrien; Carcenac, Romain; Lefebvre, Caroline; Dubois, Martine; Cazabat, Michelle; Nicot, Florence; Loiseau, Claire; Raymond, Stéphanie; Izopet, Jacques; Delobel, Pierre

    2015-11-20

    HIV-1 coreceptor usage must be accurately determined before starting CCR5 antagonist-based treatment as the presence of undetected minor CXCR4-using variants can cause subsequent virological failure. Ultra-deep pyrosequencing of HIV-1 V3 env allows detection of low levels of CXCR4-using variants that current genotypic approaches miss. However, the computation of the mass of sequence data and the need to identify true minor variants while excluding artifactual sequences generated during amplification and ultra-deep pyrosequencing is rate-limiting. Arbitrary fixed cut-offs below which minor variants are discarded are currently used, but the errors generated during ultra-deep pyrosequencing are sequence-dependent rather than random. We have developed an automated processing method for HIV-1 V3 env ultra-deep pyrosequencing data that uses biological filters to discard artifactual or non-functional V3 sequences, followed by statistical filters to determine position-specific sensitivity thresholds rather than arbitrary fixed cut-offs. It retains authentic sequences with point mutations at V3 positions of interest and discards artifactual ones with accurate sensitivity thresholds.

  15. Mapping and phasing of structural variation in patient genomes using nanopore sequencing.

    PubMed

    Cretu Stancu, Mircea; van Roosmalen, Markus J; Renkens, Ivo; Nieboer, Marleen M; Middelkamp, Sjors; de Ligt, Joep; Pregno, Giulia; Giachino, Daniela; Mandrile, Giorgia; Espejo Valle-Inclan, Jose; Korzelius, Jerome; de Bruijn, Ewart; Cuppen, Edwin; Talkowski, Michael E; Marschall, Tobias; de Ridder, Jeroen; Kloosterman, Wigard P

    2017-11-06

    Despite improvements in genomics technology, the detection of structural variants (SVs) from short-read sequencing still poses challenges, particularly for complex variation. Here we analyse the genomes of two patients with congenital abnormalities using the MinION nanopore sequencer and a novel computational pipeline, NanoSV. We demonstrate that nanopore long reads are superior to short reads with regard to detection of de novo chromothripsis rearrangements. The long reads also enable efficient phasing of genetic variations, which we leveraged to determine the parental origin of all de novo chromothripsis breakpoints and to resolve the structure of these complex rearrangements. Additionally, genome-wide surveillance of inherited SVs reveals novel variants, missed in short-read data sets, a large proportion of which are retrotransposon insertions. We provide a first exploration of patient genome sequencing with a nanopore sequencer and demonstrate the value of long-read sequencing in mapping and phasing of SVs for both clinical and research applications.

  16. Engineering of a DNA Polymerase for Direct m6A Sequencing.

    PubMed

    Aschenbrenner, Joos; Werner, Stephan; Marchand, Virginie; Adam, Martina; Motorin, Yuri; Helm, Mark; Marx, Andreas

    2018-01-08

    Methods for the detection of RNA modifications are of fundamental importance for advancing epitranscriptomics. N6-methyladenosine (m6A) is the most abundant RNA modification in mammalian mRNA and is involved in the regulation of gene expression. Current detection techniques are laborious and rely on antibody-based enrichment of m6A-containing RNA prior to sequencing, since m6A modifications are generally "erased" during reverse transcription (RT). To overcome the drawbacks associated with indirect detection, we aimed to generate novel DNA polymerase variants for direct m6A sequencing. Therefore, we developed a screen to evolve an RT-active KlenTaq DNA polymerase variant that sets a mark for N6-methylation. We identified a mutant that exhibits increased misincorporation opposite m6A compared to unmodified A. Application of the generated DNA polymerase in next-generation sequencing allowed the identification of m6A sites directly from the sequencing data of untreated RNA samples. © 2017 The Authors. Published by Wiley-VCH Verlag GmbH & Co. KGaA.

  17. Screening of dementia genes by whole-exome sequencing in early-onset Alzheimer disease: input and lessons.

    PubMed

    Nicolas, Gaël; Wallon, David; Charbonnier, Camille; Quenez, Olivier; Rousseau, Stéphane; Richard, Anne-Claire; Rovelet-Lecrux, Anne; Coutant, Sophie; Le Guennec, Kilan; Bacq, Delphine; Garnier, Jean-Guillaume; Olaso, Robert; Boland, Anne; Meyer, Vincent; Deleuze, Jean-François; Munter, Hans Markus; Bourque, Guillaume; Auld, Daniel; Montpetit, Alexandre; Lathrop, Mark; Guyant-Maréchal, Lucie; Martinaud, Olivier; Pariente, Jérémie; Rollin-Sillaire, Adeline; Pasquier, Florence; Le Ber, Isabelle; Sarazin, Marie; Croisile, Bernard; Boutoleau-Bretonnière, Claire; Thomas-Antérion, Catherine; Paquet, Claire; Sauvée, Mathilde; Moreaud, Olivier; Gabelle, Audrey; Sellal, François; Ceccaldi, Mathieu; Chamard, Ludivine; Blanc, Frédéric; Frebourg, Thierry; Campion, Dominique; Hannequin, Didier

    2016-05-01

    Causative variants in APP, PSEN1 or PSEN2 account for a majority of cases of autosomal dominant early-onset Alzheimer disease (ADEOAD, onset before 65 years). Variant detection rates in other EOAD patients, that is, with family history of late-onset AD (LOAD) (and no incidence of EOAD) and sporadic cases might be much lower. We analyzed the genomes from 264 patients using whole-exome sequencing (WES) with high depth of coverage: 90 EOAD patients with family history of LOAD and no incidence of EOAD in the family and 174 patients with sporadic AD starting between 51 and 65 years. We found three PSEN1 and one PSEN2 causative, probably or possibly causative variants in four patients (1.5%). Given the absence of PSEN1, PSEN2 and APP causative variants, we investigated whether these 260 patients might be burdened with protein-modifying variants in 20 genes that were previously shown to cause other types of dementia when mutated. For this analysis, we included an additional set of 160 patients who were previously shown to be free of causative variants in PSEN1, PSEN2 and APP: 107 ADEOAD patients and 53 sporadic EOAD patients with an age of onset before 51 years. In these 420 patients, we detected no variant that might modify the function of the 20 dementia-causing genes. We conclude that EOAD patients with family history of LOAD and no incidence of EOAD in the family or patients with sporadic AD starting between 51 and 65 years have a low variant-detection rate in AD genes.

  18. The missing indels: an estimate of indel variation in a human genome and analysis of factors that impede detection

    PubMed Central

    Jiang, Yue; Turinsky, Andrei L.; Brudno, Michael

    2015-01-01

    With the development of High-Throughput Sequencing (HTS) thousands of human genomes have now been sequenced. Whenever different studies analyze the same genome they usually agree on the amount of single-nucleotide polymorphisms, but differ dramatically on the number of insertion and deletion variants (indels). Furthermore, there is evidence that indels are often severely under-reported. In this manuscript we derive the total number of indel variants in a human genome by combining data from different sequencing technologies, while assessing the indel detection accuracy. Our estimate of approximately 1 million indels in a Yoruban genome is much higher than the results reported in several recent HTS studies. We identify two key sources of difficulties in indel detection: the insufficient coverage, read length or alignment quality; and the presence of repeats, including short interspersed elements and homopolymers/dimers. We quantify the effect of these factors on indel detection. The quality of sequencing data plays a major role in improving indel detection by HTS methods. However, many indels exist in long homopolymers and repeats, where their detection is severely impeded. The true number of indel events is likely even higher than our current estimates, and new techniques and technologies will be required to detect them. PMID:26130710
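
    The abstract names long homopolymers as regions where indel detection is severely impeded. As an illustration only, the following minimal sketch lists homopolymer runs in a sequence so that indel calls falling inside them could be treated with extra caution. The function name, the `min_len` threshold of 5 and the example sequence are assumptions made here; they are not taken from the paper.

```python
from itertools import groupby


def homopolymer_runs(seq, min_len=5):
    """Return (start, base, length) for every run of a single base
    that is at least min_len long. Positions are 0-based."""
    runs = []
    pos = 0
    for base, group in groupby(seq.upper()):
        length = sum(1 for _ in group)
        if length >= min_len:
            runs.append((pos, base, length))
        pos += length
    return runs


# Hypothetical sequence: a 6-base A run and an 8-base T run.
example = "ACGT" + "A" * 6 + "GC" + "T" * 8 + "C"
print(homopolymer_runs(example))
```

    A real pipeline would apply the same idea to reference coordinates and intersect the runs with called indels; the threshold would need tuning per sequencing technology.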

  19. Sensitivity to sequencing depth in single-cell cancer genomics.

    PubMed

    Alves, João M; Posada, David

    2018-04-16

    Querying cancer genomes at single-cell resolution is expected to provide a powerful framework to understand in detail the dynamics of cancer evolution. However, given the high costs currently associated with single-cell sequencing, together with the inevitable technical noise arising from single-cell genome amplification, cost-effective strategies that maximize the quality of single-cell data are critically needed. Taking advantage of previously published single-cell whole-genome and whole-exome cancer datasets, we studied the impact of sequencing depth and sampling effort towards single-cell variant detection. Five single-cell whole-genome and whole-exome cancer datasets were independently downscaled to 25, 10, 5, and 1× sequencing depth. For each depth level, ten technical replicates were generated, resulting in a total of 6280 single-cell BAM files. The sensitivity of variant detection, including structural and driver mutations, genotyping, clonal inference, and phylogenetic reconstruction to sequencing depth was evaluated using recent tools specifically designed for single-cell data. Altogether, our results suggest that for relatively large sample sizes (25 or more cells) sequencing single tumor cells at depths > 5× does not drastically improve somatic variant discovery, characterization of clonal genotypes, or estimation of single-cell phylogenies. We suggest that sequencing multiple individual tumor cells at a modest depth represents an effective alternative to explore the mutational landscape and clonal evolutionary patterns of cancer genomes.
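
    The downscaling design described above can be checked with simple arithmetic. The sketch below is only an illustration: it assumes that every cell is downscaled to each of the four depths with ten replicates each, in which case 6280 files correspond to 157 cells (a figure inferred here, not stated in the abstract). The original depth of 50× in the usage line is hypothetical.

```python
TARGET_DEPTHS = (25, 10, 5, 1)  # depths named in the abstract
REPLICATES = 10                 # technical replicates per depth


def downsample_fraction(original_depth, target_depth):
    """Fraction of reads to keep so that mean depth drops to target_depth."""
    if original_depth <= 0 or target_depth <= 0:
        raise ValueError("depths must be positive")
    if target_depth > original_depth:
        raise ValueError("cannot downsample to a higher depth")
    return target_depth / original_depth


def n_downsampled_files(n_cells):
    """Number of files if every cell is downscaled to every depth,
    REPLICATES times each (an assumption about the study design)."""
    return n_cells * len(TARGET_DEPTHS) * REPLICATES


print(n_downsampled_files(157))    # 6280 under the stated assumption
print(downsample_fraction(50, 5))  # hypothetical 50x cell reduced to 5x
```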

  20. Population-based rare variant detection via pooled exome or custom hybridization capture with or without individual indexing.

    PubMed

    Ramos, Enrique; Levinson, Benjamin T; Chasnoff, Sara; Hughes, Andrew; Young, Andrew L; Thornton, Katherine; Li, Allie; Vallania, Francesco L M; Province, Michael; Druley, Todd E

    2012-12-06

    Rare genetic variation in the human population is a major source of pathophysiological variability and has been implicated in a host of complex phenotypes and diseases. Finding disease-related genes harboring disparate functional rare variants requires sequencing of many individuals across many genomic regions and comparing against unaffected cohorts. However, despite persistent declines in sequencing costs, population-based rare variant detection across large genomic target regions remains cost prohibitive for most investigators. In addition, DNA samples are often precious and hybridization methods typically require large amounts of input DNA. Pooled sample DNA sequencing is a cost and time-efficient strategy for surveying populations of individuals for rare variants. We set out to 1) create a scalable, multiplexing method for custom capture with or without individual DNA indexing that was amenable to low amounts of input DNA and 2) expand the functionality of the SPLINTER algorithm for calling substitutions, insertions and deletions across either candidate genes or the entire exome by integrating the variant calling algorithm with the dynamic programming aligner, Novoalign. We report methodology for pooled hybridization capture with pre-enrichment, indexed multiplexing of up to 48 individuals or non-indexed pooled sequencing of up to 92 individuals with as little as 70 ng of DNA per person. Modified solid phase reversible immobilization bead purification strategies enable no sample transfers from sonication in 96-well plates through adapter ligation, resulting in 50% less library preparation reagent consumption. Custom Y-shaped adapters containing novel 7 base pair index sequences with a Hamming distance of ≥2 were directly ligated onto fragmented source DNA, eliminating the need for PCR to incorporate indexes, followed by a custom blocking strategy using a single oligonucleotide regardless of index sequence. These results were obtained by aligning raw reads against the entire genome using Novoalign, followed by variant calling of non-indexed pools using SPLINTER or SAMtools for indexed samples. With these pipelines, we find sensitivity and specificity of 99.4% and 99.7% for pooled exome sequencing. Sensitivity, and to a lesser degree specificity, proved to be a function of coverage. For rare variants (≤2% minor allele frequency), we achieved sensitivity and specificity of ≥94.9% and ≥99.99% for custom capture of 2.5 Mb in multiplexed libraries of 22-48 individuals with only ≥5-fold coverage/chromosome, but these parameters improved to ≥98.7 and 100% with 20-fold coverage/chromosome. This highly scalable methodology enables accurate rare variant detection, with or without individual DNA sample indexing, while reducing the amount of required source DNA and total costs through less hybridization reagent consumption, multi-sample sonication in a standard PCR plate, multiplexed pre-enrichment pooling with a single hybridization and lower sequencing coverage required to obtain high sensitivity.
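
    The abstract specifies 7 base pair index sequences with a Hamming distance of ≥2. As a minimal sketch of how such a constraint could be verified for a candidate index set, the code below computes the smallest pairwise Hamming distance. The three index sequences are hypothetical and are not the indexes used in the study.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(indexes):
    """Smallest Hamming distance over all pairs in the index set."""
    return min(hamming(a, b) for a, b in combinations(indexes, 2))


# Hypothetical 7-bp indexes, for illustration only.
indexes = ["ACGTACG", "ACGTTGG", "TCGAACG"]
print(min_pairwise_distance(indexes) >= 2)
```

    With a minimum distance of 2, a single sequencing error in an index read can be detected, because it cannot turn one valid index into another; it cannot, however, be corrected unambiguously.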

  1. Inferring Short-Range Linkage Information from Sequencing Chromatograms

    PubMed Central

    Beggel, Bastian; Neumann-Fraune, Maria; Kaiser, Rolf; Verheyen, Jens; Lengauer, Thomas

    2013-01-01

    Direct Sanger sequencing of viral genome populations yields multiple ambiguous sequence positions. It is not straightforward to derive linkage information from sequencing chromatograms, which in turn hampers the correct interpretation of the sequence data. We present a method for determining the variants existing in a viral quasispecies in the case of two nearby ambiguous sequence positions by exploiting the effect of sequence context-dependent incorporation of dideoxynucleotides. The computational model was trained on data from sequencing chromatograms of clonal variants and was evaluated on two test sets of in vitro mixtures. The approach achieved high accuracies in identifying the mixture components of 97.4% on a test set in which the positions to be analyzed are only one base apart from each other, and of 84.5% on a test set in which the ambiguous positions are separated by three bases. In silico experiments suggest two major limitations of our approach in terms of accuracy. First, due to a basic limitation of Sanger sequencing, it is not possible to reliably detect minor variants with a relative frequency of no more than 10%. Second, the model cannot distinguish between mixtures of two or four clonal variants, if one of two sets of linear constraints is fulfilled. Furthermore, the approach requires repetitive sequencing of all variants that might be present in the mixture to be analyzed. Nevertheless, the effectiveness of our method on the two in vitro test sets shows that short-range linkage information of two ambiguous sequence positions can be inferred from Sanger sequencing chromatograms without any further assumptions on the mixture composition. Additionally, our model provides new insights into the established and widely used Sanger sequencing technology. The source code of our method is made available at http://bioinf.mpi-inf.mpg.de/publications/beggel/linkageinformation.zip. PMID:24376502

  2. Novel variants in PAX6 gene caused congenital aniridia in two Chinese families.

    PubMed

    Zhang, R; Linpeng, S; Wei, X; Li, H; Huang, Y; Guo, J; Wu, Q; Liang, D; Wu, L

    2017-06-01

    Purpose: To reveal the underlying genetic defect in two four-generation Chinese families with aniridia and explore the pathologic mechanism. Methods: Full ophthalmic examinations were performed in two families with aniridia. The PAX6 gene was directly sequenced in patients of the two families, and the detected variants were screened in unaffected family members and two hundred unrelated healthy controls. Real-time quantitative PCR was used to explore pathologic mechanisms of the two variants. Results: Aniridia, cataract, and oscillatory nystagmus were observed in patients of the two families. In addition, we observed corneal opacity and microphthalmus in family 1, and strabismus, left ectopia lentis, microphthalmus, and microcornea in family 2. Sanger sequencing detected a novel 1-bp duplication (c.50dupA) in family 1 and a novel 2-bp splice site deletion (c.765+1_765+2delGT) in family 2. Sequencing of cDNA indicated skipping of exon 9 caused by the splice site deletion, which, like the duplication, was predicted to cause a premature stop codon. The PAX6 mRNA level was significantly lower in patients with aniridia than in unaffected family members in both families, suggesting that the duplication and splice site deletion caused nonsense-mediated mRNA decay. Conclusions: Our study identified two novel PAX6 variants in two families with aniridia and revealed the pathogenicity of the variants; this would expand the variant spectrum of PAX6 and help us better understand the molecular basis of aniridia, thus facilitating genetic counseling.

  3. dbWGFP: a database and web server of human whole-genome single nucleotide variants and their functional predictions.

    PubMed

    Wu, Jiaxin; Wu, Mengmeng; Li, Lianshuo; Liu, Zhuo; Zeng, Wanwen; Jiang, Rui

    2016-01-01

    Recent advances in next-generation sequencing technology have enabled fast and low-cost detection of genetic variants across the entire human genome, making whole-genome sequencing an increasingly common approach in the study of disease-causing genetic variants. Nevertheless, a repository that collects predictions of functionally damaging effects of human genetic variants is still lacking, though it has been well recognized that such predictions play a central role in the analysis of whole-genome sequencing data. To fill this gap, we developed a database named dbWGFP (a database and web server of human whole-genome single nucleotide variants and their functional predictions) that contains functional predictions and annotations of nearly 8.58 billion possible human whole-genome single nucleotide variants. Specifically, this database integrates 48 functional predictions calculated by 17 popular computational methods and 44 valuable annotations obtained from various data sources. Standalone software, user-friendly query services and free downloads of this database are available at http://bioinfo.au.tsinghua.edu.cn/dbwgfp. dbWGFP provides a valuable resource for the analysis of whole-genome sequencing, exome sequencing and SNP array data, thereby complementing existing data sources and computational resources in deciphering genetic bases of human inherited diseases. © The Author(s) 2016. Published by Oxford University Press.

  4. Unique Variants in OPN1LW Cause Both Syndromic and Nonsyndromic X-Linked High Myopia Mapped to MYP1.

    PubMed

    Li, Jiali; Gao, Bei; Guan, Liping; Xiao, Xueshan; Zhang, Jianguo; Li, Shiqiang; Jiang, Hui; Jia, Xiaoyun; Yang, Jianhua; Guo, Xiangming; Yin, Ye; Wang, Jun; Zhang, Qingjiong

    2015-06-01

    MYP1 is a locus for X-linked syndromic and nonsyndromic high myopia. Recently, unique haplotypes in OPN1LW were found to be responsible for X-linked syndromic high myopia mapped to MYP1. The current study is to test if such variants in OPN1LW are also responsible for X-linked nonsyndromic high myopia mapped to MYP1. The proband of the family previously mapped to MYP1 was initially analyzed using whole-exome sequencing and whole-genome sequencing. Additional probands with early-onset high myopia were analyzed using whole-exome sequencing. Variants in OPN1LW were selected and confirmed by Sanger sequencing. Long-range and second PCR were used to determine the haplotype and the first gene of the red-green gene array. Candidate variants were further validated in family members and controls. The unique LVAVA haplotype in OPN1LW was detected in the family with X-linked nonsyndromic high myopia mapped to MYP1. In addition, this haplotype and a novel frameshift mutation (c.617_620dup, p.Phe208Argfs*51) in OPN1LW were detected in two other families with X-linked high myopia. The unique haplotype cosegregated with high myopia in the two families, with a maximum LOD score of 3.34 and 2.31 at θ = 0. OPN1LW with the variants in these families was the first gene in the red-green gene array and was not present in 247 male controls. Reevaluation of the clinical data in both families with the unique haplotype suggested nonsyndromic high myopia. Our study confirms the findings that unique variants in OPN1LW are responsible for both syndromic and nonsyndromic X-linked high myopia mapped to MYP1.

  5. Mutation Analysis of SLC26A4 for Pendred Syndrome and Nonsyndromic Hearing Loss by High-Resolution Melting

    PubMed Central

    Chen, Neng; Tranebjærg, Lisbeth; Rendtorff, Nanna Dahl; Schrijver, Iris

    2011-01-01

    Pendred syndrome and DFNB4 (autosomal recessive nonsyndromic congenital deafness, locus 4) are associated with autosomal recessive congenital sensorineural hearing loss and mutations in the SLC26A4 gene. Extensive allelic heterogeneity, however, necessitates analysis of all exons and splice sites to identify mutations for individual patients. Although Sanger sequencing is the gold standard for mutation detection, screening methods supplemented with targeted sequencing can provide a cost-effective alternative. One such method, denaturing high-performance liquid chromatography, was developed for clinical mutation detection in SLC26A4. However, this method inherently cannot distinguish homozygous changes from wild-type sequences. High-resolution melting (HRM), on the other hand, can detect heterozygous and homozygous changes cost-effectively, without any post-PCR modifications. We developed a closed-tube HRM mutation detection method specific for SLC26A4 that can be used in the clinical diagnostic setting. Twenty-eight primer pairs were designed to cover all 21 SLC26A4 exons and splice junction sequences. Using the resulting amplicons, initial HRM analysis detected all 45 variants previously identified by sequencing. Subsequently, a 384-well plate format was designed for up to three patient samples per run. Blinded HRM testing on these plates of patient samples collected over 1 year in a clinical diagnostic laboratory accurately detected all variants identified by sequencing. In conclusion, HRM with targeted sequencing is a reliable, simple, and cost-effective method for SLC26A4 mutation screening and detection. PMID:21704276

  6. Houston Methodist Variant Viewer: An Application to Support Clinical Laboratory Interpretation of Next-generation Sequencing Data for Cancer

    PubMed Central

    Christensen, Paul A.; Ni, Yunyun; Bao, Feifei; Hendrickson, Heather L.; Greenwood, Michael; Thomas, Jessica S.; Long, S. Wesley; Olsen, Randall J.

    2017-01-01

    Introduction: Next-generation-sequencing (NGS) is increasingly used in clinical and research protocols for patients with cancer. NGS assays are routinely used in clinical laboratories to detect mutations bearing on cancer diagnosis, prognosis and personalized therapy. A typical assay may interrogate 50 or more gene targets that encompass many thousands of possible gene variants. Analysis of NGS data in cancer is a labor-intensive process that can become overwhelming to the molecular pathologist or research scientist. Although commercial tools for NGS data analysis and interpretation are available, they are often costly, lack key functionality or cannot be customized by the end user. Methods: To facilitate NGS data analysis in our clinical molecular diagnostics laboratory, we created a custom bioinformatics tool termed Houston Methodist Variant Viewer (HMVV). HMVV is a Java-based solution that integrates sequencing instrument output, bioinformatics analysis, storage resources and end user interface. Results: Compared to the predicate method used in our clinical laboratory, HMVV markedly simplifies the bioinformatics workflow for the molecular technologist and facilitates the variant review by the molecular pathologist. Importantly, HMVV reduces time spent researching the biological significance of the variants detected, standardizes the online resources used to perform the variant investigation and assists generation of the annotated report for the electronic medical record. HMVV also maintains a searchable variant database, including the variant annotations generated by the pathologist, which is useful for downstream quality improvement and research projects. Conclusions: HMVV is a clinical grade, low-cost, feature-rich, highly customizable platform that we have made available for continued development by the pathology informatics community. PMID:29226007

  7. Detection and Heterogeneity of Herpesviruses Causing Pacheco's Disease in Parrots

    PubMed Central

    Tomaszewski, Elizabeth; Wilson, Van G.; Wigle, William L.; Phalen, David N.

    2001-01-01

    Pacheco's disease (PD) is a common, often fatal, disease of parrots. We cloned a virus isolate from a parrot that had characteristic lesions of PD. Three viral clones were partially sequenced, demonstrating that this virus was an alphaherpesvirus most closely related to the gallid herpesvirus 1. Five primer sets were developed from these sequences. The primer sets were used with PCR to screen tissues or tissue culture media suspected to contain viruses from 54 outbreaks of PD. The primer sets amplified DNA from all but one sample. Ten amplification patterns were detected, indicating that PD is caused by a genetically heterogeneous population of viruses. A single genetic variant (psittacid herpesvirus variant 1) amplified with all primer sets and was the most common virus variant (62.7%). A single primer set (23F) amplified DNA from all of the positive samples, suggesting that PCR could be used as a rapid postmortem assay for these viruses. PCR was found to be significantly more sensitive than tissue culture for the detection of psittacid herpesviruses. PMID:11158102

  8. Characteristics of MUTYH variants in Japanese colorectal polyposis patients.

    PubMed

    Takao, Misato; Yamaguchi, Tatsuro; Eguchi, Hidetaka; Tada, Yuhki; Kohda, Masakazu; Koizumi, Koichi; Horiguchi, Shin-Ichiro; Okazaki, Yasushi; Ishida, Hideyuki

    2018-06-01

    The base excision repair gene MUTYH is the causative gene of colorectal polyposis syndrome, which is an autosomal recessive disorder associated with a high risk of colorectal cancer. Since few studies have investigated the genotype-phenotype association in Japanese patients with MUTYH variants, the aim of this study was to clarify the clinicopathological findings in Japanese patients with MUTYH gene variants who were detected by screening causative genes associated with hereditary colorectal polyposis. After obtaining informed consent, genetic testing was performed using target enrichment sequencing of 26 genes, including MUTYH. Of the 31 Japanese patients with suspected hereditary colorectal polyposis, eight MUTYH variants were detected in five patients. MUTYH hotspot variants known for Caucasians, namely p.G396D and p.Y179D, were not among the detected variants. Of five patients, two with biallelic MUTYH variants were diagnosed with MUTYH-associated polyposis, while two others had monoallelic MUTYH variants. One patient had the p.P18L and p.G25D variants on the same allele; however, supportive data for considering these two variants 'pathogenic' were lacking. Two patients with biallelic MUTYH variants and two others with monoallelic MUTYH variants were identified among Japanese colorectal polyposis patients. Hotspot variants of the MUTYH gene for Caucasians were not hotspots for Japanese patients.

  9. Characterization of Canine parvovirus 2 variants circulating in Greece.

    PubMed

    Ntafis, Vasileios; Xylouri, Eftychia; Kalli, Iris; Desario, Costantina; Mari, Viviana; Decaro, Nicola; Buonavoglia, Canio

    2010-09-01

    The aim of the present study was to characterize Canine parvovirus 2 (CPV-2) variants currently circulating in Greece. Between March 2008 and March 2009, 167 fecal samples were collected from diarrheic dogs from different regions of Greece. Canine parvovirus 2 was detected by standard polymerase chain reaction, whereas minor groove binder probe assays were used to distinguish genetic variants and discriminate between vaccine and field strains. Of 84 CPV-2-positive samples, 81 CPV-2a, 1 CPV-2b, and 2 CPV-2c were detected. Vaccine strains were not detected in any sample. Sequence analysis of the VP2 gene of the 2 CPV-2c viruses revealed up to 100% amino acid identity with the CPV-2c strains previously detected in Europe. The results indicated that, unlike other European countries, CPV-2a remains the most common variant in Greece, and that the CPV-2c variant found in Europe is also present in Greece.

  10. Frequency of EBV LMP-1 Promoter and Coding Variations in Burkitt Lymphoma Samples in Africa and South America and Peripheral Blood in Uganda.

    PubMed

    Liao, Hsiao-Mei; Liu, Hebing; Lei, Heiyan; Li, Bingjie; Chin, Pei-Ju; Tsai, Shien; Bhatia, Kishor; Gutierrez, Marina; Epelman, Sidnei; Biggar, Robert J; Nkrumah, Francis; Neequaye, Janet; Ogwang, Martin D; Reynolds, Steven J; Lo, Shyh-Ching; Mbulaiteye, Sam M

    2018-06-02

    Epstein-Barr virus (EBV) is linked to several cancers, including endemic Burkitt lymphoma (eBL), but causal variants are unknown. We recently reported novel sequence variants in the LMP-1 gene and promoter in EBV genomes sequenced from 13 of 14 BL biopsies. Alignments of the novel sequence variants for 114 published EBV genomes, including 27 from BL cases, revealed four LMP-1 variant patterns, designated A to D. Pattern A variant was found in 48% of BL EBV genomes. Here, we used PCR-Sanger sequencing to evaluate 50 additional BL biopsies from Ghana, Brazil, and Argentina, and peripheral blood samples from 113 eBL cases and 115 controls in Uganda. Pattern A was found in 60.9% of 64 BL biopsies evaluated. Compared to PCR-negative subjects in Uganda, detection of Pattern A in peripheral blood was associated with eBL case status (odds ratio [OR] 31.7, 95% confidence interval: 6.8-149), controlling for relevant confounders. Variant Pattern A and Pattern D were associated with eBL case status, but with lower ORs (9.7 and 13.6, respectively). Our results support the hypothesis that EBV LMP-1 Pattern A may be associated with eBL, but it is not the sole associated variant. Further research is needed to replicate and elucidate our findings.
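
    For readers unfamiliar with how an odds ratio and its 95% confidence interval are obtained, the following minimal sketch computes an unadjusted odds ratio with a Wald interval from a 2×2 table. It is an illustration under stated assumptions: the counts are hypothetical, and the ORs reported in the abstract were adjusted for confounders, which this simple calculation does not do.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    a: cases with the variant      b: cases without the variant
    c: controls with the variant   d: controls without the variant
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, not data from the study.
print(odds_ratio_ci(20, 10, 5, 25))
```

    Wide intervals such as the one quoted in the abstract are typical when one of the four cells is small, because its reciprocal dominates the standard error.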

  11. Analysis of intra-host genetic diversity of Prunus necrotic ringspot virus (PNRSV) using amplicon next generation sequencing.

    PubMed

    Kinoti, Wycliff M; Constable, Fiona E; Nancarrow, Narelle; Plummer, Kim M; Rodoni, Brendan

    2017-01-01

    PCR amplicon next generation sequencing (NGS) analysis offers a broadly applicable and targeted approach to detect populations of both high- and low-frequency virus variants in one or more plant samples. In this study, amplicon NGS was used to explore the diversity of the tripartite genome virus, Prunus necrotic ringspot virus (PNRSV) from 53 PNRSV-infected trees using amplicons from conserved gene regions of each of PNRSV RNA1, RNA2 and RNA3. Sequencing of the amplicons from 53 PNRSV-infected trees revealed differing levels of polymorphism across the three different components of the PNRSV genome with a total number of 5040, 2083 and 5486 sequence variants observed for RNA1, RNA2 and RNA3 respectively. The RNA2 had the lowest diversity of sequences compared to RNA1 and RNA3, reflecting the lack of flexibility tolerated by the replicase gene that is encoded by this RNA component. Distinct PNRSV phylo-groups, consisting of closely related clusters of sequence variants, were observed in each of PNRSV RNA1, RNA2 and RNA3. Most plant samples had a single phylo-group for each RNA component. Haplotype network analysis showed that smaller clusters of PNRSV sequence variants were genetically connected to the largest sequence variant cluster within a phylo-group of each RNA component. Some plant samples had sequence variants occurring in multiple PNRSV phylo-groups in at least one of each RNA and these phylo-groups formed distinct clades that represent PNRSV genetic strains. Variants within the same phylo-group of each Prunus plant sample had ≥97% similarity and phylo-groups within a Prunus plant sample and between samples had ≤97% similarity. Based on the analysis of diversity, a definition of a PNRSV genetic strain was proposed. The proposed definition was applied to determine the number of PNRSV genetic strains in each of the plant samples and the complexity in defining genetic strains in multipartite genome viruses was explored.

  12. Ready to clone: CNV detection and breakpoint fine-mapping in breast and ovarian cancer susceptibility genes by high-resolution array CGH.

    PubMed

    Hackmann, Karl; Kuhlee, Franziska; Betcheva-Krajcir, Elitza; Kahlert, Anne-Karin; Mackenroth, Luisa; Klink, Barbara; Di Donato, Nataliya; Tzschach, Andreas; Kast, Karin; Wimberger, Pauline; Schrock, Evelin; Rump, Andreas

    2016-10-01

    Detection of predisposing copy number variants (CNV) in 330 families affected with hereditary breast and ovarian cancer (HBOC). In order to complement mutation detection with Illumina's TruSight Cancer panel, we designed a customized high-resolution 8 × 60k array for CGH (aCGH) that covers all 94 genes from the panel. Copy number variants with immediate clinical relevance were detected in 12 families (3.6%). Besides 3 known CNVs in CHEK2, RAD51C, and BRCA1, we identified 3 novel pathogenic CNVs in BRCA1 (deletion of exons 4-13, deletion of exons 12-18) and ATM (deletion of exons 57-63) plus an intragenic duplication of BRCA2 (exons 3-11) and an intronic BRCA1 variant with unknown pathogenicity. The precision of high-resolution aCGH enabled straightforward breakpoint amplification of a BRCA1 deletion, which subsequently allowed for fast and economic CNV verification in family members of the index patient. Furthermore, we used our aCGH data to validate an algorithm that was able to detect all identified copy number changes from next-generation sequencing (NGS) data. Copy number detection is a mandatory analysis in HBOC families, at least if no predisposing mutations were found by sequencing. Currently, high-resolution array CGH is our first choice of method of analysis due to unmatched detection precision. Although it seems possible to detect CNV from sequencing data, there currently is no satisfying tool to do so in a routine diagnostic setting.

  13. The Application of Next-Generation Sequencing for Mutation Detection in Autosomal-Dominant Hereditary Hearing Impairment.

    PubMed

    Gürtler, Nicolas; Röthlisberger, Benno; Ludin, Katja; Schlegel, Christoph; Lalwani, Anil K

    2017-07-01

    Identification of the causative mutation using next-generation sequencing in autosomal-dominant hereditary hearing impairment, as mutation analysis in hereditary hearing impairment by classic genetic methods is hindered by the high heterogeneity of the disease. Two Swiss families with autosomal-dominant hereditary hearing impairment. Amplified DNA libraries for next-generation sequencing were constructed from extracted genomic DNA, derived from peripheral blood, and enriched by a custom-made sequence capture library. Validated, pooled libraries were sequenced on an Illumina MiSeq instrument, 300 cycles and paired-end sequencing. Technical data analysis was performed with SeqMonk, variant analysis with GeneTalk or VariantStudio. The detection of mutations in genes related to hearing loss by next-generation sequencing was subsequently confirmed using specific polymerase-chain-reaction and Sanger sequencing. Mutation detection in hearing-loss-related genes. The first family harbored the mutation c.5383+5delGTGA in the TECTA-gene. In the second family, a novel mutation c.2614-2625delCATGGCGCCGTG in the WFS1-gene and a second mutation TCOF1-c.1028G>A were identified. Next-generation sequencing successfully identified the causative mutation in families with autosomal-dominant hereditary hearing impairment. The results helped to clarify the pathogenic role of a known mutation and led to the detection of a novel one. NGS represents a feasible approach with great future potential in the diagnostics of hereditary hearing impairment, even in smaller labs.

  14. Challenges imposed by minor reference alleles on the identification and reporting of clinical variants from exome data.

    PubMed

    Koko, Mahmoud; Abdallah, Mohammed O E; Amin, Mutaz; Ibrahim, Muntaser

    2018-01-15

    The conventional variant calling of pathogenic alleles in exome and genome sequencing requires the presence of the non-pathogenic alleles as genome references. This hinders the correct identification of variants with minor and/or pathogenic reference alleles warranting additional approaches for variant calling. More than 26,000 Exome Aggregation Consortium (ExAC) variants have a minor reference allele including variants with known ClinVar disease alleles. For instance, in a number of variants related to clotting disorders, the phenotype-associated allele is a human genome reference allele (rs6025, rs6003, rs1799983, and rs2227564 using the assembly hg19). We highlighted how the current variant calling standards miss homozygous reference disease variants in these sites and provided a bioinformatic panel that can be used to screen these variants using commonly available variant callers. We present exome sequencing results from an individual with venous thrombosis to emphasize how pathogenic alleles in clinically relevant variants escape variant calling while non-pathogenic alleles are detected. This article highlights the importance of specialized variant calling strategies in clinical variants with minor reference alleles especially in the context of personal genomes and exomes. We provide here a simple strategy to screen potential disease-causing variants when present in homozygous reference state.
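
    The screening idea described above can be sketched as follows. This is not the authors' bioinformatic panel; it is a minimal illustration that assumes genotypes are available as VCF-style strings keyed by rsID, and that a site absent from a variant-only call set but adequately covered is homozygous reference. The function and variable names are hypothetical. The four rsIDs are the ones named in the abstract; which allele is phenotype-associated at each site should be verified independently before any clinical use.

```python
# Sites named in the abstract where the hg19 reference allele is
# reported to be the phenotype-associated allele.
MINOR_REF_SITES = {"rs6025", "rs6003", "rs1799983", "rs2227564"}


def flag_homozygous_reference(genotypes, covered_sites):
    """Return the screened sites at which the sample is homozygous reference.

    genotypes: dict mapping rsID -> VCF-style genotype such as "0/1"
               (only sites present in the call set).
    covered_sites: set of rsIDs with adequate read coverage; a covered
               site missing from genotypes is assumed homozygous reference.
    """
    flagged = []
    for rsid in sorted(MINOR_REF_SITES):
        gt = genotypes.get(rsid)
        if gt is None:
            # Variant-only call sets omit homozygous reference sites.
            if rsid in covered_sites:
                flagged.append(rsid)
        elif set(gt.replace("|", "/").split("/")) == {"0"}:
            flagged.append(rsid)
    return flagged


# Hypothetical sample, for illustration only.
calls = {"rs6025": "0/0", "rs6003": "0/1"}
print(flag_homozygous_reference(calls, covered_sites={"rs1799983"}))
```

    Requiring coverage evidence before treating an absent site as homozygous reference avoids confusing "no variant called" with "no data".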

  15. Annotation of Sequence Variants in Cancer Samples: Processes and Pitfalls for Routine Assays in the Clinical Laboratory.

    PubMed

    Lee, Lobin A; Arvai, Kevin J; Jones, Dan

    2015-07-01

    As DNA sequencing of multigene panels becomes routine for cancer samples in the clinical laboratory, an efficient process for classifying variants has become more critical. Determining which germline variants are significant for cancer disposition and which somatic mutations are integral to cancer development or therapy response remains difficult, even for well-studied genes such as BRCA1 and TP53. We compare and contrast the general principles and lines of evidence commonly used to distinguish the significance of cancer-associated germline and somatic genetic variants. The factors important in each step of the analysis pipeline are reviewed, as are some of the publicly available annotation tools. Given the range of indications and uses of cancer sequencing assays, including diagnosis, staging, prognostication, theranostics, and residual disease detection, the need for flexible methods for scoring of variants is discussed. The usefulness of protein prediction tools and multimodal risk-based or Bayesian approaches is highlighted. Using TET2 variants encountered in hematologic neoplasms, several examples of this multifactorial approach to classifying sequence variants of unknown significance are presented. Although there are still significant gaps in the publicly available data for many cancer genes that limit the broad application of explicit algorithms for variant scoring, the elements of a more rigorous model are outlined. Copyright © 2015 American Society for Investigative Pathology and the Association for Molecular Pathology. Published by Elsevier Inc. All rights reserved.

  16. A power set-based statistical selection procedure to locate susceptible rare variants associated with complex traits with sequencing data.

    PubMed

    Sun, Hokeun; Wang, Shuang

    2014-08-15

    Existing association methods for rare variants from sequencing data have focused on aggregating variants in a gene or a genetic region because analysing individual rare variants is underpowered. However, these existing rare variant detection methods cannot identify which of the rare variants in a gene or a genetic region are associated with the complex disease or trait. Once phenotypic associations of a gene or a genetic region are identified, the natural next step in the association study with sequencing data is to locate the susceptible rare variants within the gene or the genetic region. In this article, we propose a power set-based statistical selection procedure that is able to identify the locations of the potentially susceptible rare variants within a disease-related gene or a genetic region. The selection performance of the proposed selection procedure was evaluated through simulation studies, where we demonstrated the feasibility and superior power over several comparable existing methods. In particular, the proposed method is able to handle the mixed effects when both risk and protective variants are present in a gene or a genetic region. The proposed selection procedure was also applied to the sequence data on the ANGPTL gene family from the Dallas Heart Study to identify potentially susceptible rare variants within the trait-related genes. An R package 'rvsel' can be downloaded from http://www.columbia.edu/~sw2206/ and http://statsun.pusan.ac.kr. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  17. Genome-Independent Identification of RNA Editing by Mutual Information (GIREMI) | Informatics Technology for Cancer Research (ITCR)

    Cancer.gov

    Identification of single-nucleotide variants in RNA-seq data. Current version focuses on detection of RNA editing sites without requiring genome sequence data. New version is under development to separately identify RNA editing sites and genetic variants using RNA-seq data alone.

  18. Variant discovery in the sheep milk transcriptome using RNA sequencing.

    PubMed

    Suárez-Vega, Aroa; Gutiérrez-Gil, Beatriz; Klopp, Christophe; Tosser-Klopp, Gwenola; Arranz, Juan José

    2017-02-15

    The identification of genetic variation underlying desired phenotypes is one of the main challenges of current livestock genetic research. High-throughput transcriptome sequencing (RNA-Seq) offers new opportunities for the detection of transcriptome variants (SNPs and short indels) in different tissues and species. In this study, we used RNA-Seq on Milk Sheep Somatic Cells (MSCs) with the goal of characterizing the genetic variation within the coding regions of the milk transcriptome in Churra and Assaf sheep, two common dairy sheep breeds farmed in Spain. A total of 216,637 variants were detected in the MSCs transcriptome of the eight ewes analyzed. Among them, a total of 57,795 variants were detected in the regions harboring Quantitative Trait Loci (QTL) for milk yield, protein percentage and fat percentage, of which 21.44% were novel variants. Among the total variants detected, 561 (2.52%) and 1,649 (7.42%) were predicted to produce high or moderate impact changes in the corresponding transcriptional unit, respectively. In the functional enrichment analysis of the genes positioned within selected QTL regions harboring novel relevant functional variants (high and moderate impact), the KEGG pathway with the highest enrichment was "protein processing in endoplasmic reticulum". Additionally, a total of 504 and 1,063 variants were identified in the genes encoding principal milk proteins and molecules involved in the lipid metabolism, respectively. Of these variants, 20 mutations were found to have putative relevant effects on the encoded proteins. We present herein the first transcriptomic approach aimed at identifying genetic variants of the genes expressed in the lactating mammary gland of sheep. Through the transcriptome analysis of variability within regions harboring QTL for milk yield, protein percentage and fat percentage, we have found several pathways and genes that harbor mutations that could affect dairy production traits. 
    Moreover, remarkable variants were also found in candidate genes coding for major milk proteins and proteins related to milk fat metabolism. Several of the SNPs found in this study could be included as suitable markers in genotyping platforms or custom SNP arrays to perform association analyses in commercial populations and apply genomic selection protocols in the dairy production industry.

  19. A powerful tool for genome analysis in maize: development and evaluation of the high density 600 k SNP genotyping array.

    PubMed

    Unterseer, Sandra; Bauer, Eva; Haberer, Georg; Seidel, Michael; Knaak, Carsten; Ouzunova, Milena; Meitinger, Thomas; Strom, Tim M; Fries, Ruedi; Pausch, Hubert; Bertani, Christofer; Davassi, Alessandro; Mayer, Klaus Fx; Schön, Chris-Carolin

    2014-09-29

    High density genotyping data are indispensable for genomic analyses of complex traits in animal and crop species. Maize is one of the most important crop plants worldwide; however, a high density SNP genotyping array for analysis of its large and highly dynamic genome has not been available so far. We developed a high density maize SNP array composed of 616,201 variants (SNPs and small indels). Initially, 57 M variants were discovered by sequencing 30 representative temperate maize lines and then stringently filtered for sequence quality scores and predicted conversion performance on the array, resulting in the selection of 1.2 M polymorphic variants assayed on two screening arrays. To identify high-confidence variants, 285 DNA samples from a broad genetic diversity panel of worldwide maize lines including the samples used for sequencing, important founder lines for European maize breeding, hybrids, and proprietary samples with European, US, semi-tropical, and tropical origin were used for experimental validation. We selected 616 k variants according to their performance during validation, support of genotype calls through sequencing data, and physical distribution for further analysis and for the design of the commercially available Affymetrix® Axiom® Maize Genotyping Array. This array is composed of 609,442 SNPs and 6,759 indels. Among these are 116,224 variants in coding regions and 45,655 SNPs of the Illumina® MaizeSNP50 BeadChip for study comparison. In a subset of 45,974 variants, apart from the target SNP additional off-target variants are detected, which show only a minor bias towards intermediate allele frequencies. We performed principal coordinate and admixture analyses to determine the ability of the array to detect and resolve population structure and investigated the extent of LD within a worldwide validation panel. The high density Affymetrix® Axiom® Maize Genotyping Array is optimized for European and American temperate maize and was developed based on a diverse sample panel by applying stringent quality filter criteria to ensure its suitability for a broad range of applications. With 600 k variants it is the largest currently publicly available genotyping array in crop species.

  20. New COL6A6 variant detected by whole-exome sequencing is linked to break points in intron 4 and 3′-UTR, deleting exon 5 of RHO, and causing adRP

    PubMed Central

    de Sousa Dias, Miguel; Hernan, Imma; Delás, Barbara; Pascual, Beatriz; Borràs, Emma; Gamundi, Maria José; Mañé, Begoña; Fernández-San José, Patricia; Ayuso, Carmen

    2015-01-01

    Purpose This study aimed to test a newly devised cost-effective multiplex PCR assay for the molecular diagnosis of autosomal dominant retinitis pigmentosa (adRP), as well as the use of whole-exome sequencing (WES) to detect disease-causing mutations in adRP. Methods Genomic DNA was extracted from peripheral blood lymphocytes of index patients with adRP and their affected and unaffected family members. We used a newly devised multiplex PCR assay capable of amplifying the genetic loci of RHO, PRPH2, RP1, PRPF3, PRPF8, PRPF31, IMPDH1, NRL, CRX, KLHL7, and NR2E3 to molecularly diagnose 18 index patients with adRP. We also performed WES in affected and unaffected members of four families with adRP in whom a disease-causing mutation was previously not found. Results We identified five previously reported mutations (p.Arg677X in the RP1 gene, p.Asp133Val and p.Arg195Leu in the PRPH2 gene, and p.Pro171Leu and p.Pro215Leu in the RHO gene) and one novel mutation (p.Val345Gly in the RHO gene) representing 33% detection of causative mutations in our adRP cohort. Comparative WES analysis showed a new variant (p.Gly103Arg in the COL6A6 gene) that segregated with the disease in one family with adRP. As this variant was linked with the RHO locus, we sequenced the complete RHO gene, which revealed a deletion in intron 4 that encompassed all of exon 5 and 28 bp of the 3′-untranslated region (UTR). Conclusions The novel multiplex PCR assay with next-generation sequencing (NGS) proved effective for detecting most of the adRP-causing mutations. A WES approach led to identification of a deletion in RHO through detection of a new linked variant in COL6A6. No pathogenic variants were identified in the remaining three families. Moreover, NGS and WES were inefficient for detecting the complete deletion of exon 5 in the RHO gene in one family with adRP. Carriers of this deletion showed variable clinical status, and two of these carriers had not previously been diagnosed with RP. 
    PMID:26321861

  1. nbCNV: a multi-constrained optimization model for discovering copy number variants in single-cell sequencing data.

    PubMed

    Zhang, Changsheng; Cai, Hongmin; Huang, Jingying; Song, Yan

    2016-09-17

    Variations in DNA copy number contribute substantially to the development of several diseases, including autism, schizophrenia and cancer. Single-cell sequencing technology allows the dissection of genomic heterogeneity at the single-cell level, thereby providing important evolutionary information about cancer cells. In contrast to traditional bulk sequencing, single-cell sequencing requires the amplification of the whole genome of a single cell to accumulate enough samples for sequencing. However, the amplification process inevitably introduces amplification bias, resulting in an over-dispersed portion of the sequencing data. A recent study has shown that the over-dispersed portion of the single-cell sequencing data can be modelled well by negative binomial distributions. We developed a read-depth based method, nbCNV, to detect copy number variants (CNVs). The nbCNV method uses two constraints, sparsity and smoothness, to fit the CNV patterns under the assumption that the read signals follow a negative binomial distribution. The problem of CNV detection was formulated as a quadratic optimization problem and was solved by an efficient numerical solution based on the classical alternating direction minimization method. Extensive experiments to compare nbCNV with existing benchmark models were conducted on both simulated data and empirical single-cell sequencing data. The results of those experiments demonstrate that nbCNV achieves superior performance and high robustness for the detection of CNVs in single-cell sequencing data.
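
    The distributional assumption mentioned above can be illustrated with a short sketch. It is not the nbCNV optimisation itself; it only shows why a negative binomial accommodates over-dispersed read counts better than a Poisson model with the same mean. The mean/dispersion parameterisation and the function names are choices made here for illustration.

```python
import math


def nb_logpmf(k, mu, r):
    """Log-probability of count k under a negative binomial with mean mu
    and dispersion r, for which variance = mu + mu**2 / r."""
    p = r / (r + mu)  # success probability in the (r, p) parameterisation
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(p) + k * math.log(1.0 - p))


def poisson_logpmf(k, mu):
    """Log-probability of count k under a Poisson with mean mu."""
    return k * math.log(mu) - mu - math.lgamma(k + 1)


def nb_variance(mu, r):
    """Variance of the negative binomial above; always larger than mu."""
    return mu + mu ** 2 / r


mu, r = 50.0, 5.0
print(nb_variance(mu, r))  # 550.0, versus a Poisson variance of 50.0
# A bin with twice the expected depth is far less surprising under the
# over-dispersed model than under the Poisson model.
print(nb_logpmf(100, mu, r) > poisson_logpmf(100, mu))  # True
```

    Smaller values of r give heavier tails; as r grows, the negative binomial approaches the Poisson distribution.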

  2. VARiD: a variation detection framework for color-space and letter-space platforms.

    PubMed

    Dalca, Adrian V; Rumble, Stephen M; Levy, Samuel; Brudno, Michael

    2010-06-15

    High-throughput sequencing (HTS) technologies are transforming the study of genomic variation. The various HTS technologies have different sequencing biases and error rates, and while most HTS technologies sequence the residues of the genome directly, generating base calls for each position, the Applied Biosystems SOLiD platform generates dibase-coded (color space) sequences. While combining data from the various platforms should increase the accuracy of variation detection, to date there are only a few tools that can identify variants from color space data, and none that can analyze color space and regular (letter space) data together. We present VARiD, a probabilistic method for variation detection from both letter- and color-space reads simultaneously. VARiD is based on a hidden Markov model and uses the forward-backward algorithm to accurately identify heterozygous, homozygous and tri-allelic SNPs, as well as micro-indels. Our analysis shows that VARiD performs better than the AB SOLiD toolset at detecting variants from color-space data alone, and improves the calls dramatically when letter- and color-space reads are combined. The toolset is freely available at http://compbio.cs.utoronto.ca/varid.
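
    The dibase (color space) coding mentioned above can be sketched briefly. This is only an illustration of the encoding, not of VARiD's hidden Markov model: each adjacent pair of bases maps to one of four colors, which can be computed as the XOR of two-bit base codes. The function name is hypothetical.

```python
# Two-bit codes for the bases; the color of a dinucleotide is the XOR of
# the two codes, so identical neighbours (AA, CC, GG, TT) give color 0.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def to_color_space(seq):
    """Encode a letter-space sequence as a list of colors (0-3), one per
    adjacent base pair."""
    codes = [BASE_CODE[b] for b in seq.upper()]
    return [a ^ b for a, b in zip(codes, codes[1:])]


ref = "ACGTAC"
alt = "ACGAAC"  # single-base substitution T -> A at 0-based position 3
ref_colors = to_color_space(ref)
alt_colors = to_color_space(alt)
changed = [i for i, (x, y) in enumerate(zip(ref_colors, alt_colors)) if x != y]
print(ref_colors)  # [1, 3, 1, 3, 1]
print(alt_colors)  # [1, 3, 2, 0, 1]
print(changed)     # [2, 3]
```

    A true single-base substitution changes two adjacent colors, whereas an isolated color error changes only one, which is one reason color-space reads need dedicated handling when combined with letter-space reads.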

  3. Rapid Detection of Rare Deleterious Variants by Next Generation Sequencing with Optional Microarray SNP Genotype Data

    PubMed Central

    Watson, Christopher M.; Crinnion, Laura A.; Gurgel‐Gianetti, Juliana; Harrison, Sally M.; Daly, Catherine; Antanavicuite, Agne; Lascelles, Carolina; Markham, Alexander F.; Pena, Sergio D. J.; Bonthron, David T.

    2015-01-01

    Autozygosity mapping is a powerful technique for the identification of rare, autosomal recessive, disease‐causing genes. The ease with which this category of disease gene can be identified has greatly increased through the availability of genome‐wide SNP genotyping microarrays and subsequently of exome sequencing. Although these methods have simplified the generation of experimental data, its analysis, particularly when disparate data types must be integrated, remains time consuming. Moreover, the huge volume of sequence variant data generated from next generation sequencing experiments opens up the possibility of using these data instead of microarray genotype data to identify disease loci. To allow these two types of data to be used in an integrated fashion, we have developed AgileVCFMapper, a program that performs both the mapping of disease loci by SNP genotyping and the analysis of potentially deleterious variants using exome sequence variant data, in a single step. This method does not require microarray SNP genotype data, although analysis with a combination of microarray and exome genotype data enables more precise delineation of disease loci, due to superior marker density and distribution. PMID:26037133
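
    The core idea of autozygosity mapping, looking for long stretches of consecutive homozygous genotypes, can be sketched as below. This is a simplified illustration and not the AgileVCFMapper algorithm: it assumes genotypes for one individual are already ordered by genomic position, ignores physical distances and genotyping error, and uses a hypothetical function name and threshold.

```python
def homozygous_runs(genotypes, min_markers=3):
    """Return (start, end) index pairs (inclusive) of runs of consecutive
    homozygous genotypes that are at least `min_markers` long.

    Each genotype is a two-character string such as "AA", "AB" or "BB".
    """
    runs, start = [], None
    for i, gt in enumerate(genotypes):
        is_hom = len(set(gt)) == 1
        if is_hom and start is None:
            start = i  # a new homozygous run begins here
        elif not is_hom and start is not None:
            if i - start >= min_markers:
                runs.append((start, i - 1))
            start = None  # a heterozygous call ends the run
    # Close a run that extends to the last marker.
    if start is not None and len(genotypes) - start >= min_markers:
        runs.append((start, len(genotypes) - 1))
    return runs


calls = ["AB", "AA", "BB", "AA", "AA", "AB", "BB", "AB", "AA", "AA", "BB"]
print(homozygous_runs(calls))  # [(1, 4), (8, 10)]
```

    In practice, candidate regions would be those shared by all affected individuals in a family, and denser markers give sharper region boundaries, as the abstract notes for combined microarray and exome data.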

  4. Functional analysis of a large set of BRCA2 exon 7 variants highlights the predictive value of hexamer scores in detecting alterations of exonic splicing regulatory elements.

    PubMed

    Di Giacomo, Daniela; Gaildrat, Pascaline; Abuli, Anna; Abdat, Julie; Frébourg, Thierry; Tosi, Mario; Martins, Alexandra

    2013-11-01

    Exonic variants can alter pre-mRNA splicing either by changing splice sites or by modifying splicing regulatory elements. Often these effects are difficult to predict and are only detected by performing RNA analyses. Here, we analyzed, in a minigene assay, 26 variants identified in the exon 7 of BRCA2, a cancer predisposition gene. Our results revealed eight new exon skipping mutations in this exon: one directly altering the 5' splice site and seven affecting potential regulatory elements. This brings the number of splicing regulatory mutations detected in BRCA2 exon 7 to a total of 11, a remarkably high number considering the total number of variants reported in this exon (n = 36), all tested in our minigene assay. We then exploited this large set of splicing data to test the predictive value of splicing regulator hexamers' scores recently established by Ke et al. Comparisons of hexamer-based predictions with our experimental data revealed high sensitivity in detecting variants that increased exon skipping, an important feature for prescreening variants before RNA analysis. In conclusion, hexamer scores represent a promising tool for predicting the biological consequences of exonic variants and may have important applications for the interpretation of variants detected by high-throughput sequencing. © 2013 WILEY PERIODICALS, INC.
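
    A hexamer-based prediction of this kind can be sketched as a score difference between the variant and wild-type sequence, summed over the hexamers that overlap the changed base. The sketch below is an assumption-laden illustration: the scores are placeholders invented for the example (not values from Ke et al.), hexamers absent from the table are treated as neutral, and the function names are hypothetical.

```python
def hexamers_covering(seq, pos):
    """All hexamers in `seq` that overlap 0-based position `pos`."""
    first = max(0, pos - 5)
    last = min(pos, len(seq) - 6)
    return [seq[i:i + 6] for i in range(first, last + 1)]


def total_score(seq, pos, scores):
    """Sum of scores over hexamers covering `pos`; hexamers missing from
    `scores` count as 0 (an assumption made for this sketch)."""
    return sum(scores.get(h, 0.0) for h in hexamers_covering(seq, pos))


def delta_hexamer_score(seq, pos, alt, scores):
    """Total hexamer score of the variant minus that of the wild type."""
    mutant = seq[:pos] + alt + seq[pos + 1:]
    return total_score(mutant, pos, scores) - total_score(seq, pos, scores)


# Placeholder scores for illustration only; not published values.
toy_scores = {"GAAGAA": 1.0, "AAGAAG": 0.75, "GATGAA": -0.5}
wild_type = "TTGAAGAAGTT"
print(delta_hexamer_score(wild_type, 4, "T", toy_scores))  # -2.25
```

    Assuming a sign convention in which positive scores mark enhancer-like hexamers and negative scores mark silencer-like ones, a strongly negative difference would flag a variant as a candidate for increased exon skipping, to be confirmed by RNA analysis.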

  5. Variant Profiling of Candidate Genes in Pancreatic Ductal Adenocarcinoma.

    PubMed

    Huang, Jiaqi; Löhr, Johannes-Matthias; Nilsson, Magnus; Segersvärd, Ralf; Matsson, Hans; Verbeke, Caroline; Heuchel, Rainer; Kere, Juha; Iafrate, A John; Zheng, Zongli; Ye, Weimin

    2015-11-01

    Pancreatic ductal adenocarcinoma (PDAC) has a poor prognosis. Variant profiling is crucial for developing personalized treatment and elucidating the etiology of this disease. Patients with PDAC undergoing surgery from 2007 to 2012 (n = 73) were followed from diagnosis until death or the end of the study. We applied an anchored multiplex PCR (AMP)-based next-generation sequencing (NGS) method to a panel of 65 selected genes and assessed analytical performance by sequencing a quantitative multiplex DNA reference standard. In clinical PDAC samples, detection of low-level KRAS (Kirsten rat sarcoma viral oncogene homolog) mutations was validated by allele-specific PCR and digital PCR. We compared overall survival of patients according to KRAS mutation status by log-rank test and applied logistic regression to evaluate the association between smoking and tumor variant types. The AMP-based NGS method could detect variants with allele frequencies as low as 1% given sufficient sequencing depth (>1500×). Low-frequency KRAS G12 mutations (allele frequency 1%-5%) were all confirmed by allele-specific PCR and digital PCR. The most prevalent genetic alterations were in KRAS (78% of patients), TP53 (tumor protein p53) (25%), and SMAD4 (SMAD family member 4) (8%). Overall survival in T3-stage PDAC patients differed among KRAS mutation subtypes (P = 0.019). Transversion variants were more common in ever-smokers than in never-smokers (odds ratio 5.7; 95% CI 1.2-27.8). The AMP-based NGS method is applicable for profiling tumor variants. Using this approach, we demonstrated that in PDAC patients, KRAS mutant subtype G12V is associated with poorer survival, and that transversion variants are more common among smokers. © 2015 American Association for Clinical Chemistry.
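
    The link between sequencing depth and the ability to detect a 1% allele frequency can be illustrated with a simple binomial calculation. This sketch is not the study's analysis: it ignores sequencing error and assumes, purely for illustration, that a variant is called when at least five supporting reads are seen. The function name and the read threshold are hypothetical.

```python
from math import comb


def prob_at_least(k, depth, vaf):
    """P(X >= k) for X ~ Binomial(depth, vaf): the chance of sampling at
    least k variant-supporting reads at the given depth."""
    below = sum(comb(depth, i) * vaf ** i * (1 - vaf) ** (depth - i)
                for i in range(k))
    return 1.0 - below


# Chance of seeing >= 5 supporting reads for a variant at 1% frequency.
for depth in (100, 500, 1500):
    print(depth, round(prob_at_least(5, depth, 0.01), 4))
```

    Under these assumptions the detection probability is below 1% at 100× but above 99% at 1500×, which is consistent with the abstract's point that low-frequency variants require sufficient depth.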

  6. Targeted Analysis of Whole Genome Sequence Data to Diagnose Genetic Cardiomyopathy

    DOE PAGES

    Golbus, Jessica R.; Puckelwartz, Megan J.; Dellefave-Castillo, Lisa; ...

    2014-09-01

    Background: Cardiomyopathy is highly heritable but genetically diverse. At present, genetic testing for cardiomyopathy uses targeted sequencing to simultaneously assess the coding regions of more than 50 genes. New genes are routinely added to panels to improve the diagnostic yield. With the anticipated $1000 genome, it is expected that genetic testing will shift towards comprehensive genome sequencing accompanied by targeted gene analysis. Therefore, we assessed the reliability of whole genome sequencing and targeted analysis to identify cardiomyopathy variants in 11 subjects with cardiomyopathy. Methods and Results: Whole genome sequencing with an average of 37× coverage was combined with targeted analysis focused on 204 genes linked to cardiomyopathy. Genetic variants were scored using multiple prediction algorithms combined with frequency data from public databases. This pipeline yielded 1-14 potentially pathogenic variants per individual. Variants were further analyzed using clinical criteria and/or segregation analysis. Three of three previously identified primary mutations were detected by this analysis. In six subjects for whom the primary mutation was previously unknown, we identified mutations that segregated with disease, had clinical correlates, and/or had additional pathological correlation to provide evidence for causality. For two subjects with previously known primary mutations, we identified additional variants that may act as modifiers of disease severity. In total, we identified the likely pathological mutation in 9 of 11 (82%) subjects. We conclude that these pilot data demonstrate that ~30-40× coverage whole genome sequencing combined with targeted analysis is feasible and sensitive to identify rare variants in cardiomyopathy-associated genes.

  7. Integrated sequence analysis pipeline provides one-stop solution for identifying disease-causing mutations.

    PubMed

    Hu, Hao; Wienker, Thomas F; Musante, Luciana; Kalscheuer, Vera M; Kahrizi, Kimia; Najmabadi, Hossein; Ropers, H Hilger

    2014-12-01

    Next-generation sequencing has greatly accelerated the search for disease-causing defects, but even for experts the data analysis can be a major challenge. To facilitate the data processing in a clinical setting, we have developed a novel medical resequencing analysis pipeline (MERAP). MERAP assesses the quality of sequencing, and has optimized capacity for calling variants, including single-nucleotide variants, insertions and deletions, copy-number variation, and other structural variants. MERAP identifies polymorphic and known causal variants by filtering against public domain databases, and flags nonsynonymous and splice-site changes. MERAP uses a logistic model to estimate the causal likelihood of a given missense variant. MERAP considers the relevant information such as phenotype and interaction with known disease-causing genes. MERAP compares favorably with GATK, one of the widely used tools, because of its higher sensitivity for detecting indels, its easy installation, and its economical use of computational resources. Upon testing more than 1,200 individuals with mutations in known and novel disease genes, MERAP proved highly reliable, as illustrated here for five families with disease-causing variants. We believe that the clinical implementation of MERAP will expedite the diagnostic process of many disease-causing defects. © 2014 WILEY PERIODICALS, INC.

  8. Novel rare variations of the oxytocin receptor (OXTR) gene in autism spectrum disorder individuals.

    PubMed

    Liu, Xiaoxi; Kawashima, Minae; Miyagawa, Taku; Otowa, Takeshi; Latt, Khun Zaw; Thiri, Myo; Nishida, Hisami; Sugiyama, Toshiro; Tsurusaki, Yoshinori; Matsumoto, Naomichi; Mabuchi, Akihiko; Tokunaga, Katsushi; Sasaki, Tsukasa

    2015-01-01

    The oxytocin receptor (OXTR) gene has been implicated as a risk gene for autism spectrum disorder (ASD), a neurodevelopmental disorder whose essential features are impairments in social communication and reciprocal interaction. Genetic associations between common variations in OXTR and ASD have been reported in multiple ethnic populations. However, little is known about the distribution of rare variations within OXTR in ASD patients. In this study, we resequenced the full length of OXTR in 105 ASD individuals using an approach that combined next-generation sequencing technology, long-range PCR and DNA pooling. We demonstrated that rare variants with a minor allele frequency as low as 0.05% could be reliably detected by our method. We identified 28 novel variants, including potential functional variants in the intron region and one rare missense variant (R150S). We subsequently performed Sanger sequencing and validated five novel variants located in previously suggested candidate regions in ASD individuals. Further sequencing of 312 healthy subjects showed that the burden of rare variants is significantly higher in ASD individuals than in healthy individuals. Our results suggest that rare variation in the OXTR gene might be involved in ASD.

  9. Functional Testing of SLC26A4 Variants—Clinical and Molecular Analysis of a Cohort with Enlarged Vestibular Aqueduct from Austria

    PubMed Central

    Bernardinelli, Emanuele; Nofziger, Charity; Patsch, Wolfgang; Rasp, Gerd; Paulmichl, Markus; Dossena, Silvia

    2018-01-01

    The prevalence and spectrum of sequence alterations in the SLC26A4 gene, which codes for the anion exchanger pendrin, are population-specific and account for at least 50% of cases of non-syndromic hearing loss associated with an enlarged vestibular aqueduct. A cohort of nineteen patients from Austria with hearing loss and a radiological alteration of the vestibular aqueduct underwent Sanger sequencing of SLC26A4 and GJB2, coding for connexin 26. The pathogenicity of sequence alterations detected was assessed by determining ion transport and molecular features of the corresponding SLC26A4 protein variants. In this group, four uncharacterized sequence alterations within the SLC26A4 coding region were found. Three of these lead to protein variants with abnormal functional and molecular features, while one should be considered with no pathogenic potential. Pathogenic SLC26A4 sequence alterations were only found in 12% of patients. SLC26A4 sequence alterations commonly found in other Caucasian populations were not detected. This survey represents the first study on the prevalence and spectrum of SLC26A4 sequence alterations in an Austrian cohort and further suggests that genetic testing should always be integrated with functional characterization and determination of the molecular features of protein variants in order to unequivocally identify or exclude a causal link between genotype and phenotype. PMID:29320412

  10. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Golbus, Jessica R.; Puckelwartz, Megan J.; Dellefave-Castillo, Lisa

    Background: Cardiomyopathy is highly heritable but genetically diverse. At present, genetic testing for cardiomyopathy uses targeted sequencing to simultaneously assess the coding regions of more than 50 genes. New genes are routinely added to panels to improve the diagnostic yield. With the anticipated $1000 genome, it is expected that genetic testing will shift towards comprehensive genome sequencing accompanied by targeted gene analysis. Therefore, we assessed the reliability of whole genome sequencing and targeted analysis to identify cardiomyopathy variants in 11 subjects with cardiomyopathy. Methods and Results: Whole genome sequencing with an average of 37× coverage was combined with targeted analysis focused on 204 genes linked to cardiomyopathy. Genetic variants were scored using multiple prediction algorithms combined with frequency data from public databases. This pipeline yielded 1-14 potentially pathogenic variants per individual. Variants were further analyzed using clinical criteria and/or segregation analysis. Three of three previously identified primary mutations were detected by this analysis. In six subjects for whom the primary mutation was previously unknown, we identified mutations that segregated with disease, had clinical correlates, and/or had additional pathological correlation to provide evidence for causality. For two subjects with previously known primary mutations, we identified additional variants that may act as modifiers of disease severity. In total, we identified the likely pathological mutation in 9 of 11 (82%) subjects. We conclude that these pilot data demonstrate that ~30-40× coverage whole genome sequencing combined with targeted analysis is feasible and sensitive to identify rare variants in cardiomyopathy-associated genes.

  11. Customisation of the exome data analysis pipeline using a combinatorial approach.

    PubMed

    Pattnaik, Swetansu; Vaidyanathan, Srividya; Pooja, Durgad G; Deepak, Sa; Panda, Binay

    2012-01-01

    The advent of next generation sequencing (NGS) technologies has revolutionised the way biologists produce, analyse and interpret data. Although NGS platforms provide a cost-effective way to discover genome-wide variants from a single experiment, variants discovered by NGS need follow-up validation due to the high error rates associated with various sequencing chemistries. Recently, whole exome sequencing has been proposed as an affordable option compared to whole genome runs, but it still requires follow-up validation of all the novel exomic variants. Customarily, a consensus approach is used to overcome the systematic errors inherent to the sequencing technology, alignment and post-alignment variant detection algorithms. However, this approach requires multiple sequencing chemistries, multiple alignment tools and multiple variant callers, which may not be viable in terms of time and money for individual investigators with limited informatics know-how. Biologists often lack the requisite training to deal with the huge amount of data produced by NGS runs and face difficulty in choosing from the list of freely available analytical tools for NGS data analysis. Hence, there is a need to customise the NGS data analysis pipeline to preferentially retain true variants by minimising the incidence of false positives and to make the choice of the right analytical tools easier. To this end, we have sampled different freely available tools used at the alignment and post-alignment stages, suggesting the use of the most suitable combination determined by a simple framework of pre-existing metrics to create significant datasets.
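
    The consensus approach referred to above can be sketched as keeping only the variants reported by a minimum number of independent call sets. This is a generic illustration rather than the authors' framework; the tuple representation of a variant, the caller names and the threshold are assumptions made for the example.

```python
from collections import Counter


def consensus_variants(call_sets, min_callers=2):
    """Return variants reported by at least `min_callers` call sets.

    Each call set is an iterable of (chrom, pos, ref, alt) tuples; a
    variant is counted once per call set even if listed twice.
    """
    counts = Counter(v for calls in call_sets for v in set(calls))
    return sorted(v for v, n in counts.items() if n >= min_callers)


# Hypothetical output of three callers run on the same sample.
caller_a = [("chr1", 100, "A", "G"), ("chr1", 250, "C", "T")]
caller_b = [("chr1", 100, "A", "G"), ("chr2", 40, "G", "A")]
caller_c = [("chr1", 100, "A", "G"), ("chr1", 250, "C", "T")]

print(consensus_variants([caller_a, caller_b, caller_c]))
print(consensus_variants([caller_a, caller_b, caller_c], min_callers=3))
```

    Raising the threshold trades sensitivity for a lower false-positive rate, which is the balance the abstract describes when choosing a tool combination.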

  12. Detection of new HLA-DPB1 alleles generated by interallelic gene conversion using PCR amplification of DPB1 second exon sequences from sperm

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Erlich, H.; Zangenberg, G.; Bugawan, T.

    The rate at which allelic diversity at the HLA class I and class II loci evolves has been the subject of considerable controversy, as have the mechanisms which generate new alleles. The patchwork pattern of polymorphism, particularly within the second exon of the HLA-DPB1 locus where the polymorphic sequence motifs are localized to 6 discrete regions, is consistent with the hypothesis that much of the allelic sequence variation may have been generated by segmental exchange (gene conversion). To measure the rate of new DPB1 variant generation, we have developed a strategy in which DPB1 second exon sequences are amplified from pools of FACS-sorted sperm (n=50) from a heterozygous sperm donor. Pools of sperm from these heterozygous individuals are amplified with an allele-specific primer for one allele and analyzed with sequence-specific oligonucleotide probes (SSOP) complementary to the other allele. This screening procedure, which is capable of detecting a single variant molecule in a pool of parental alleles, allows the identification of new variants that have been generated by recombination and/or gene conversion between the two parental alleles. To control for potential PCR artifacts, the same screening procedure was carried out with mixtures of sperm from DPB1 *0301/*0301 and DPB1 *0401/*0401 individuals. Pools containing putative new variant DPB1 alleles were analyzed further by cloning into M13 and sequencing the M13 clones. Our current estimate is that about 1/10,000 sperm from these heterozygous individuals represents a new DPB1 allele generated by micro-gene conversion within the second exon.

  13. Distinct Patterns of Somatic Mosaicism in the APC Gene in Neoplasms From Patients With Unexplained Adenomatous Polyposis.

    PubMed

    Jansen, Anne M L; Crobach, Stijn; Geurts-Giele, Willemina R R; van den Akker, Brendy E W M; Garcia, Marina Ventayol; Ruano, Dina; Nielsen, Maartje; Tops, Carli M J; Wijnen, Juul T; Hes, Frederik J; van Wezel, Tom; Dinjens, Winand N M; Morreau, Hans

    2017-02-01

    We investigated the presence and patterns of mosaicism in the APC gene in patients with colon neoplasms not associated with any other genetic variants; we performed deep sequence analysis of APC in at least 2 adenomas or carcinomas per patient. We identified mosaic variants in APC in adenomas from 9 of the 18 patients with 21 to approximately 100 adenomas. Mosaic variants of APC were variably detected in leukocyte DNA and/or non-neoplastic intestinal mucosa of these patients. In a comprehensive sequence analysis of 1 patient, we found no evidence for mosaicism in APC in non-neoplastic intestinal mucosa. One patient was found to carry a mosaic c.4666dupA APC variant in only 10 of 16 adenomas, indicating the importance of screening 2 or more adenomas for genetic variants. Copyright © 2017 AGA Institute. Published by Elsevier Inc. All rights reserved.

  14. A high-quality human reference panel reveals the complexity and distribution of genomic structural variants.

    PubMed

    Hehir-Kwa, Jayne Y; Marschall, Tobias; Kloosterman, Wigard P; Francioli, Laurent C; Baaijens, Jasmijn A; Dijkstra, Louis J; Abdellaoui, Abdel; Koval, Vyacheslav; Thung, Djie Tjwan; Wardenaar, René; Renkens, Ivo; Coe, Bradley P; Deelen, Patrick; de Ligt, Joep; Lameijer, Eric-Wubbo; van Dijk, Freerk; Hormozdiari, Fereydoun; Uitterlinden, André G; van Duijn, Cornelia M; Eichler, Evan E; de Bakker, Paul I W; Swertz, Morris A; Wijmenga, Cisca; van Ommen, Gert-Jan B; Slagboom, P Eline; Boomsma, Dorret I; Schönhuth, Alexander; Ye, Kai; Guryev, Victor

    2016-10-06

    Structural variation (SV) represents a major source of differences between individual human genomes and has been linked to disease phenotypes. However, the majority of studies neither provide a global view of the full spectrum of these variants nor integrate them into reference panels of genetic variation. Here, we analyse whole genome sequencing data of 769 individuals from 250 Dutch families, and provide a haplotype-resolved map of 1.9 million genome variants across 9 different variant classes, including novel forms of complex indels, and retrotransposition-mediated insertions of mobile elements and processed RNAs. A large proportion are previously underreported variants sized between 21 and 100 bp. We detect 4 megabases of novel sequence, encoding 11 new transcripts. Finally, we show 191 known, trait-associated SNPs to be in strong linkage disequilibrium with SVs and demonstrate that our panel facilitates accurate imputation of SVs in unrelated individuals.

  15. Reanalysis of BRCA1/2 negative high risk ovarian cancer patients reveals novel germline risk loci and insights into missing heritability

    PubMed Central

    Dyson, Gregory; Levin, Nancy K.; Chaudhry, Sophia; Rosati, Rita; Kalpage, Hasini; Simon, Michael S.; Tainsky, Michael A.

    2017-01-01

    While up to 25% of ovarian cancer (OVCA) cases are thought to be due to inherited factors, the majority of genetic risk remains unexplained. To address this gap, we sought to identify previously undescribed OVCA risk variants through whole exome sequencing (WES) and candidate gene analysis of 48 women with ovarian cancer who were selected for high risk of genetic inheritance yet were negative for any known pathogenic variants in either BRCA1 or BRCA2. In silico SNP analysis was employed to identify suspect variants, followed by validation using Sanger DNA sequencing. We identified five pathogenic variants in our sample, four of which are in two genes featured on current multi-gene panels (RAD51D, ATM). In addition, we found a pathogenic FANCM variant (R1931*) which has been recently implicated in familial breast cancer risk. Numerous rare, predicted-to-be-damaging variants of unknown significance were detected in genes on current commercial testing panels, most prominently in ATM (n = 6) and PALB2 (n = 5). The BRCA2 variant p.K3326*, resulting in a 93 amino acid truncation, was overrepresented in our sample (odds ratio = 4.95, p = 0.01) and coexisted in the germline of these women with other deleterious variants, suggesting a possible role as a modifier of genetic penetrance. Furthermore, we detected loss of function variants in non-panel genes involved in OVCA-relevant pathways (DNA repair and cell cycle control), including CHEK1, TP53I3, REC8, HMMR, RAD52, RAD1, POLK, POLQ, and MCM4. In summary, our study implicates novel risk loci and highlights the clinical utility of retesting BRCA1/2 negative OVCA patients by genomic sequencing and analysis of genes in relevant pathways. PMID:28591191

  16. The germline variants in DNA repair genes in pediatric medulloblastoma: a challenge for current therapeutic strategies.

    PubMed

    Trubicka, Joanna; Żemojtel, Tomasz; Hecht, Jochen; Falana, Katarzyna; Piekutowska-Abramczuk, Dorota; Płoski, Rafał; Perek-Polnik, Marta; Drogosiewicz, Monika; Grajkowska, Wiesława; Ciara, Elżbieta; Moszczyńska, Elżbieta; Dembowska-Bagińska, Bożenna; Perek, Danuta; Chrzanowska, Krystyna H; Krajewska-Walasek, Małgorzata; Łastowska, Maria

    2017-04-04

    The defects in DNA repair genes are potentially linked to development and response to therapy in medulloblastoma. Therefore the purpose of this study was to establish the spectrum and frequency of germline variants in selected DNA repair genes and their impact on response to chemotherapy in medulloblastoma patients. The following genes were investigated in 102 paediatric patients: MSH2 and RAD50 using targeted gene panel sequencing and NBN variants (p.I171V and p.K219fs*19) by Sanger sequencing. In three patients with presence of rare life-threatening adverse events (AE) and no detected variants in the analyzed genes, whole exome sequencing was performed. Based on combination of molecular and immunohistochemical evaluations tumors were divided into molecular subgroups. Presence of variants was tested for potential association with the occurrence of rare life-threatening AE and other clinical features. We have identified altogether six new potentially pathogenic variants in MSH2 (p.A733T and p.V606I), RAD50 (p.R1093*), FANCM (p.L694*), ERCC2 (p.R695C) and EXO1 (p.V738L), in addition to two known NBN variants. Five out of twelve patients with defects in either of MSH2, RAD50 and NBN genes suffered from rare life-threatening AE, more frequently than in control group (p = 0.0005). When all detected variants were taken into account, the majority of patients (8 out of 15) suffered from life-threatening toxicity during chemotherapy. Our results, based on the largest systematic study performed in a clinical setting, provide preliminary evidence for a link between defects in DNA repair genes and treatment related toxicity in children with medulloblastoma. The data suggest that patients with DNA repair gene variants could need special vigilance during and after courses of chemotherapy.

  17. Carrier screening in the era of expanding genetic technology.

    PubMed

    Arjunan, Aishwarya; Litwack, Karen; Collins, Nick; Charrow, Joel

    2016-12-01

    The Center for Jewish Genetics provides genetic education and carrier screening to individuals of Jewish descent. Carrier screening has traditionally been performed by targeted mutation analysis for founder mutations with an enzyme assay for Tay-Sachs carrier detection. The development of next-generation sequencing (NGS) allows for higher detection rates regardless of ethnicity. Here, we explore differences in carrier detection rates between genotyping and NGS in a primarily Jewish population. Peripheral blood samples or saliva samples were obtained from 506 individuals. All samples were analyzed by sequencing, targeted genotyping, triplet-repeat detection, and copy-number analysis; the analyses were carried out at Counsyl. Of 506 individuals screened, 288 were identified as carriers of at least 1 condition and 8 couples were carriers for the same disorder. A total of 434 pathogenic variants were identified. Three hundred twelve variants would have been detected via genotyping alone. Although no additional mutations were detected by NGS in diseases routinely screened for in the Ashkenazi Jewish population, 26.5% of carrier results and 2 carrier couples would have been missed without NGS in the larger panel. In a primarily Jewish population, NGS reveals a larger number of pathogenic variants and provides individuals with valuable information for family planning. Genet Med 18 12, 1214-1217.

  18. Integrating 400 million variants from 80,000 human samples with extensive annotations: towards a knowledge base to analyze disease cohorts.

    PubMed

    Hakenberg, Jörg; Cheng, Wei-Yi; Thomas, Philippe; Wang, Ying-Chih; Uzilov, Andrew V; Chen, Rong

    2016-01-08

    Data from a plethora of high-throughput sequencing studies is readily available to researchers, providing genetic variants detected in a variety of healthy and disease populations. While each individual cohort helps gain insights into polymorphic and disease-associated variants, a joint perspective can be more powerful in identifying polymorphisms, rare variants, disease-associations, genetic burden, somatic variants, and disease mechanisms. We have set up a Reference Variant Store (RVS) containing variants observed in a number of large-scale sequencing efforts, such as 1000 Genomes, ExAC, Scripps Wellderly, UK10K; various genotyping studies; and disease association databases. RVS holds extensive annotations pertaining to affected genes, functional impacts, disease associations, and population frequencies. RVS currently stores 400 million distinct variants observed in more than 80,000 human samples. RVS facilitates cross-study analysis to discover novel genetic risk factors, gene-disease associations, potential disease mechanisms, and actionable variants. Due to its large reference populations, RVS can also be employed for variant filtration and gene prioritization. A web interface to public datasets and annotations in RVS is available at https://rvs.u.hpc.mssm.edu/.

  19. Clinical Validation of Copy Number Variant Detection from Targeted Next-Generation Sequencing Panels.

    PubMed

    Kerkhof, Jennifer; Schenkel, Laila C; Reilly, Jack; McRobbie, Sheri; Aref-Eshghi, Erfan; Stuart, Alan; Rupar, C Anthony; Adams, Paul; Hegele, Robert A; Lin, Hanxin; Rodenhiser, David; Knoll, Joan; Ainsworth, Peter J; Sadikovic, Bekim

    2017-11-01

    Next-generation sequencing (NGS) technology has rapidly replaced Sanger sequencing in the assessment of sequence variations in clinical genetics laboratories. One major limitation of current NGS approaches is the ability to detect copy number variations (CNVs) approximately >50 bp. Because these represent a major mutational burden in many genetic disorders, parallel CNV assessment using alternate supplemental methods, along with the NGS analysis, is normally required, resulting in increased labor, costs, and turnaround times. The objective of this study was to clinically validate a novel CNV detection algorithm using targeted clinical NGS gene panel data. We have applied this approach in a retrospective cohort of 391 samples and a prospective cohort of 2375 samples and found a 100% sensitivity (95% CI, 89%-100%) for 37 unique events and a high degree of specificity to detect CNVs across nine distinct targeted NGS gene panels. This NGS CNV pipeline enables stand-alone first-tier assessment for CNV and sequence variants in a clinical laboratory setting, dispensing with the need for parallel CNV analysis using classic techniques, such as microarray, long-range PCR, or multiplex ligation-dependent probe amplification. This NGS CNV pipeline can also be applied to the assessment of complex genomic regions, including pseudogenic DNA sequences, such as the PMS2CL gene, and to mitochondrial genome heteroplasmy detection. Copyright © 2017 American Society for Investigative Pathology and the Association for Molecular Pathology. Published by Elsevier Inc. All rights reserved.

  20. Detection of somatic, subclonal and mosaic CNVs from sequencing | Division of Cancer Prevention

    Cancer.gov

    Progress in technology has made individual genome sequencing a clinical reality, with partial genome sequencing already in use in clinical care. In fact, it is expected that within a few years whole genome sequencing will be a standard procedure that will allow discovering personal genomic variants of all types and thus greatly facilitate individualized medicine. However, fast

  1. Development of a molecular diagnostic test for Retinitis Pigmentosa in the Japanese population.

    PubMed

    Maeda, Akiko; Yoshida, Akiko; Kawai, Kanako; Arai, Yuki; Akiba, Ryutaro; Inaba, Akira; Takagi, Seiji; Fujiki, Ryoji; Hirami, Yasuhiko; Kurimoto, Yasuo; Ohara, Osamu; Takahashi, Masayo

    2018-05-21

    Retinitis Pigmentosa (RP) is the most common form of inherited retinal dystrophy caused by different genetic variants. More than 60 causative genes have been identified to date. The establishment of cost-effective molecular diagnostic tests with high sensitivity and specificity can be beneficial for patients and clinicians. Here, we developed a clinical diagnostic test for RP in the Japanese population. Evaluation of diagnostic technology, Prospective, Clinical and experimental study. A panel of 39 genes reported to cause RP in Japanese patients was established. Next generation sequence (NGS) technology was applied for the analyses of 94 probands with RP and RP-related diseases. After interpretation of detected genetic variants, molecular diagnosis based on a study of the genetic variants and a clinical phenotype was made by a multidisciplinary team including clinicians, researchers and genetic counselors. NGS analyses found 14,343 variants from 94 probands. Among them, 189 variants in 83 probands (88.3% of all cases) were selected as pathogenic variants and 64 probands (68.1%) have variants which can cause diseases. After the deliberation of these 64 cases, molecular diagnosis was made in 43 probands (45.7%). The final molecular diagnostic rate with the current system combining supplemental Sanger sequencing was 47.9% (45 of 94 cases). The RP panel provides the significant advantage of detecting genetic variants with a high molecular diagnostic rate. This type of race-specific high-throughput genotyping allows us to conduct a cost-effective and clinically useful genetic diagnostic test.
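
    The rates quoted in this abstract all share the same denominator of 94 probands, and the arithmetic can be checked directly. The snippet below is only a convenience sketch; the helper `percent` and the dictionary labels are hypothetical names, while the counts are taken from the abstract.

```python
def percent(part, whole):
    """Percentage of `part` in `whole`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


probands = 94  # total probands analysed, from the abstract

rates = {
    "probands with selected pathogenic variants": percent(83, probands),
    "probands with potentially disease-causing variants": percent(64, probands),
    "molecular diagnosis from the panel": percent(43, probands),
    "final diagnosis with supplemental Sanger": percent(45, probands),
}

for label, value in rates.items():
    print(f"{label}: {value}%")
```

    The computed values match the 88.3%, 68.1%, 45.7% and 47.9% reported above, and show that supplemental Sanger sequencing added two diagnoses to the 43 obtained from the panel.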

  2. Mutation analysis of Leber congenital amaurosis‑associated genes in patients with retinitis pigmentosa.

    PubMed

    Shen, Tao; Guan, Liping; Li, Shiqiang; Zhang, Jianguo; Xiao, Xueshan; Jiang, Hui; Yang, Jianhua; Guo, Xiangming; Wang, Jun; Zhang, Qingjiong

    2015-03-01

    The genetic defects underlying approximately half of all retinitis pigmentosa (RP) cases are unknown. A number of genes responsible for Leber congenital amaurosis (LCA) may also cause RP when they are mutated. Our previous study revealed that variants in the most frequently mutated nine exons accounted for approximately half of the mutations detected in a cohort of patients with LCA. The aim of the present study was to detect mutations in LCA-associated genes in patients with RP using two different strategies. Sanger sequencing was used to screen mutations in the nine exons in 293 patients with RP and exome sequencing was used to detect variants in 12 LCA-associated genes in 157 of the 293 patients with RP and then to validate the variants by Sanger sequencing. Potential pathogenic mutations were identified in four patients with early onset RP, including homozygous CRB1 mutations in two patients, compound heterozygous CRB1 mutations in one patient and compound heterozygous CEP290 mutations in one patient. The present study indicated that mutations in CEP290 may also be associated with RP but not with LCA. With the exception of CEP290, the remaining 11 genes known to be associated with LCA but not with RP are unlikely to be a common cause of RP.

  3. Panel-based Genetic Diagnostic Testing for Inherited Eye Diseases is Highly Accurate and Reproducible and More Sensitive for Variant Detection Than Exome Sequencing

    PubMed Central

    Bujakowska, Kinga M.; Sousa, Maria E.; Fonseca-Kelly, Zoë D.; Taub, Daniel G.; Janessian, Maria; Wang, Dan Yi; Au, Elizabeth D.; Sims, Katherine B.; Sweetser, David A.; Fulton, Anne B.; Liu, Qin; Wiggs, Janey L.; Gai, Xiaowu; Pierce, Eric A.

    2015-01-01

    Purpose: Next-generation sequencing (NGS) based methods are being adopted broadly for genetic diagnostic testing, but the performance characteristics of these techniques have not been fully defined with regard to test accuracy and reproducibility. Methods: We developed a targeted enrichment and NGS approach for genetic diagnostic testing of patients with inherited eye disorders, including inherited retinal degenerations, optic atrophy and glaucoma. In preparation for providing this Genetic Eye Disease (GEDi) test on a CLIA-certified basis, we performed experiments to measure the sensitivity, specificity, reproducibility as well as the clinical sensitivity of the test. Results: The GEDi test is highly reproducible and accurate, with sensitivity and specificity for single nucleotide variant detection of 97.9% and 100%, respectively. The sensitivity for variant detection was notably better than the 88.3% achieved by whole exome sequencing (WES) using the same metrics, due to better coverage of targeted genes in the GEDi test compared to commercially available exome capture sets. Prospective testing of 192 patients with IRDs indicated that the clinical sensitivity of the GEDi test is high, with a diagnostic rate of 51%. Conclusion: The data suggest that based on quantified performance metrics, selective targeted enrichment is preferable to WES for genetic diagnostic testing. PMID:25412400

  4. Infections with multiple Cryptosporidium species and new genetic variants in young dairy calves on a farm located within a drinking water catchment area in New Zealand.

    PubMed

    Shrestha, Rima D; Grinberg, Alex; Dukkipati, Venkata S R; Pleydell, Eve J; Prattley, Deborah J; French, Nigel P

    2014-05-28

    Several Cryptosporidium species are known to infect cattle. However, the occurrence of mixed infections with more than one species and the impact of this phenomenon on animal and human health are poorly understood. Therefore, to detect the presence of mixed Cryptosporidium infections, 15 immunofluorescence-positive specimens obtained from 6-week-old calves' faeces (n=60) on one dairy farm were subjected to PCR-sequencing at multiple loci. DNA sequences of three Cryptosporidium species: C. parvum (15/15), C. bovis (3/15) and C. andersoni (1/15), and two new genetic variants were identified. There was evidence of mixed infections in five specimens. C. parvum, C. bovis and C. andersoni sequences were detected together in one specimen, C. parvum and C. bovis in two specimens, and C. parvum and C. parvum-like variants in the remaining two specimens. Sequencing of gp60 amplicons identified the IIaA19G4R1 (8/15) and IIaA18G3R1 (4/15) C. parvum subgenotypes. This study provides evidence of endemic mixed infections with the three main Cryptosporidium species of cattle and new genetic variants, in calves at the transition age of six weeks. The results add to the body of evidence describing Cryptosporidium isolates as genetically heterogeneous populations, and highlight the need for iterative genotyping to explore their genetic makeup. Copyright © 2014 Elsevier B.V. All rights reserved.

  5. Detection of genome-wide copy number variants in myeloid malignancies using next-generation sequencing.

    PubMed

    Shen, Wei; Paxton, Christian N; Szankasi, Philippe; Longhurst, Maria; Schumacher, Jonathan A; Frizzell, Kimberly A; Sorrells, Shelly M; Clayton, Adam L; Jattani, Rakhi P; Patel, Jay L; Toydemir, Reha; Kelley, Todd W; Xu, Xinjie

    2018-04-01

    Genetic abnormalities, including copy number variants (CNV), copy number neutral loss of heterozygosity (CN-LOH) and gene mutations, underlie the pathogenesis of myeloid malignancies and serve as important diagnostic, prognostic and/or therapeutic markers. Currently, multiple testing strategies are required for comprehensive genetic testing in myeloid malignancies. The aim of this proof-of-principle study was to investigate the feasibility of combining detection of genome-wide large CNVs, CN-LOH and targeted gene mutations into a single assay using next-generation sequencing (NGS). For genome-wide CNV detection, we designed a single nucleotide polymorphism (SNP) sequencing backbone with 22 762 SNP regions evenly distributed across the entire genome. For targeted mutation detection, 62 frequently mutated genes in myeloid malignancies were targeted. We combined this SNP sequencing backbone with a targeted mutation panel, and sequenced 9 healthy individuals and 16 patients with myeloid malignancies using NGS. We detected 52 somatic CNVs, 11 instances of CN-LOH and 39 oncogenic mutations in the 16 patients with myeloid malignancies, and none in the 9 healthy individuals. All CNVs and CN-LOH were confirmed by SNP microarray analysis. We describe a genome-wide SNP sequencing backbone which allows for sensitive detection of genome-wide CNVs and CN-LOH using NGS. This proof-of-principle study has demonstrated that this strategy can provide more comprehensive genetic profiling for patients with myeloid malignancies using a single assay. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.

  6. Sanger Confirmation Is Required to Achieve Optimal Sensitivity and Specificity in Next-Generation Sequencing Panel Testing.

    PubMed

    Mu, Wenbo; Lu, Hsiao-Mei; Chen, Jefferey; Li, Shuwei; Elliott, Aaron M

    2016-11-01

    Next-generation sequencing (NGS) has rapidly replaced Sanger sequencing as the method of choice for diagnostic gene-panel testing. For hereditary-cancer testing, the technical sensitivity and specificity of the assay are paramount as clinicians use results to make important clinical management and treatment decisions. There is significant debate within the diagnostics community regarding the necessity of confirming NGS variant calls by Sanger sequencing, considering that numerous laboratories report having 100% specificity from the NGS data alone. Here we report our results from 20,000 hereditary-cancer NGS panels spanning 47 genes, in which all 7845 nonpolymorphic variants were Sanger- sequenced. Of these, 98.7% were concordant between NGS and Sanger sequencing and 1.3% were identified as NGS false-positives, located mainly in complex genomic regions (A/T-rich regions, G/C-rich regions, homopolymer stretches, and pseudogene regions). Simulating a false-positive rate of zero by adjusting the variant-calling quality-score thresholds decreased the sensitivity of the assay from 100% to 97.8%, resulting in the missed detection of 176 Sanger-confirmed variants, the majority in complex genomic regions (n = 114) and mosaic mutations (n = 7). The data illustrate the importance of setting quality thresholds for panel testing only after thousands of samples have been processed and the necessity of Sanger confirmation of NGS variants to maintain the highest possible sensitivity. Copyright © 2016 American Society for Investigative Pathology and the Association for Molecular Pathology. Published by Elsevier Inc. All rights reserved.
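
    The percentages in this abstract can be turned back into approximate variant counts, which makes the trade-off easier to see. This is a rough sketch only: the published percentages are rounded, so the counts below are estimates rather than figures reported by the authors, and the variable names are hypothetical.

```python
total_variants = 7845          # Sanger-sequenced nonpolymorphic variants (from the abstract)
concordant_rate = 0.987        # reported NGS/Sanger concordance
false_positive_rate = 0.013    # reported NGS false-positive share
missed_at_zero_fp = 176        # Sanger-confirmed variants lost with stricter thresholds

# Approximate counts implied by the rounded percentages.
concordant = round(total_variants * concordant_rate)
false_positives = round(total_variants * false_positive_rate)

# Share of confirmed variants that stricter quality thresholds would discard.
missed_fraction = missed_at_zero_fp / concordant

print(f"Concordant variants (approx.): {concordant}")
print(f"NGS false positives (approx.): {false_positives}")
print(f"Confirmed variants missed at zero false positives: {missed_fraction:.1%}")
```

    On these estimates, removing roughly a hundred false positives by thresholding alone would cost 176 true variants, about 2% of the confirmed calls, which is consistent with the reported drop in sensitivity and with the authors' case for Sanger confirmation.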

  7. Computational evaluation of exome sequence data using human and model organism phenotypes improves diagnostic efficiency

    PubMed Central

    Bone, William P.; Washington, Nicole L.; Buske, Orion J.; Adams, David R.; Davis, Joie; Draper, David; Flynn, Elise D.; Girdea, Marta; Godfrey, Rena; Golas, Gretchen; Groden, Catherine; Jacobsen, Julius; Köhler, Sebastian; Lee, Elizabeth M. J.; Links, Amanda E.; Markello, Thomas C.; Mungall, Christopher J.; Nehrebecky, Michele; Robinson, Peter N.; Sincan, Murat; Soldatos, Ariane G.; Tifft, Cynthia J.; Toro, Camilo; Trang, Heather; Valkanas, Elise; Vasilevsky, Nicole; Wahl, Colleen; Wolfe, Lynne A.; Boerkoel, Cornelius F.; Brudno, Michael; Haendel, Melissa A.; Gahl, William A.; Smedley, Damian

    2016-01-01

    Purpose: Medical diagnosis and molecular or biochemical confirmation typically rely on the knowledge of the clinician. Although this is very difficult in extremely rare diseases, we hypothesized that the recording of patient phenotypes in Human Phenotype Ontology (HPO) terms and computationally ranking putative disease-associated sequence variants improves diagnosis, particularly for patients with atypical clinical profiles. Genet Med 18 6, 608–617. Methods: Using simulated exomes and the National Institutes of Health Undiagnosed Diseases Program (UDP) patient cohort and associated exome sequence, we tested our hypothesis using Exomiser. Exomiser ranks candidate variants based on patient phenotype similarity to (i) known disease–gene phenotypes, (ii) model organism phenotypes of candidate orthologs, and (iii) phenotypes of protein–protein association neighbors. Results: Benchmarking showed Exomiser ranked the causal variant as the top hit in 97% of known disease–gene associations and ranked the correct seeded variant in up to 87% when detectable disease–gene associations were unavailable. Using UDP data, Exomiser ranked the causative variant(s) within the top 10 variants for 11 previously diagnosed variants and achieved a diagnosis for 4 of 23 cases undiagnosed by clinical evaluation. Conclusion: Structured phenotyping of patients and computational analysis are effective adjuncts for diagnosing patients with genetic disorders. PMID:26562225

  8. Development and preliminary evaluation of a multiplexed amplification and next generation sequencing method for viral hemorrhagic fever diagnostics

    PubMed Central

    Radonić, Aleksandar; Kocak Tufan, Zeliha; Domingo, Cristina

    2017-01-01

    Background: We describe the development and evaluation of a novel method for targeted amplification and Next Generation Sequencing (NGS)-based identification of viral hemorrhagic fever (VHF) agents and assess the feasibility of this approach in diagnostics. Methodology: An ultrahigh-multiplex panel was designed with primers to amplify all known variants of VHF-associated viruses and relevant controls. The performance of the panel was evaluated via serially quantified nucleic acids from Yellow fever virus, Rift Valley fever virus, Crimean-Congo hemorrhagic fever (CCHF) virus, Ebola virus, Junin virus and Chikungunya virus in a semiconductor-based sequencing platform. A comparison of direct NGS and targeted amplification-NGS was performed. The panel was further tested via a real-time nanopore sequencing-based platform, using clinical specimens from CCHF patients. Principal findings: The multiplex primer panel comprises two pools of 285 and 256 primer pairs for the identification of 46 virus species causing hemorrhagic fevers, encompassing 6,130 genetic variants of the strains involved. In silico validation revealed that the panel detected over 97% of all known genetic variants of the targeted virus species. High levels of specificity and sensitivity were observed for the tested virus strains. Targeted amplification ensured viral read detection in specimens with the lowest virus concentration (1–10 genome equivalents) and enabled significant increases in specific reads over background for all viruses investigated. In clinical specimens, the panel enabled detection of the causative agent and its characterization within 10 minutes of sequencing, with sample-to-result time of less than 3.5 hours. Conclusions: Virus enrichment via targeted amplification followed by NGS is an applicable strategy for the diagnosis of VHFs which can be adapted for high-throughput or nanopore sequencing platforms and employed for surveillance or outbreak monitoring. PMID:29155823

  9. Development and preliminary evaluation of a multiplexed amplification and next generation sequencing method for viral hemorrhagic fever diagnostics.

    PubMed

    Brinkmann, Annika; Ergünay, Koray; Radonić, Aleksandar; Kocak Tufan, Zeliha; Domingo, Cristina; Nitsche, Andreas

    2017-11-01

    We describe the development and evaluation of a novel method for targeted amplification and Next Generation Sequencing (NGS)-based identification of viral hemorrhagic fever (VHF) agents and assess the feasibility of this approach in diagnostics. An ultrahigh-multiplex panel was designed with primers to amplify all known variants of VHF-associated viruses and relevant controls. The performance of the panel was evaluated via serially quantified nucleic acids from Yellow fever virus, Rift Valley fever virus, Crimean-Congo hemorrhagic fever (CCHF) virus, Ebola virus, Junin virus and Chikungunya virus in a semiconductor-based sequencing platform. A comparison of direct NGS and targeted amplification-NGS was performed. The panel was further tested via a real-time nanopore sequencing-based platform, using clinical specimens from CCHF patients. The multiplex primer panel comprises two pools of 285 and 256 primer pairs for the identification of 46 virus species causing hemorrhagic fevers, encompassing 6,130 genetic variants of the strains involved. In silico validation revealed that the panel detected over 97% of all known genetic variants of the targeted virus species. High levels of specificity and sensitivity were observed for the tested virus strains. Targeted amplification ensured viral read detection in specimens with the lowest virus concentration (1-10 genome equivalents) and enabled significant increases in specific reads over background for all viruses investigated. In clinical specimens, the panel enabled detection of the causative agent and its characterization within 10 minutes of sequencing, with sample-to-result time of less than 3.5 hours. Virus enrichment via targeted amplification followed by NGS is an applicable strategy for the diagnosis of VHFs which can be adapted for high-throughput or nanopore sequencing platforms and employed for surveillance or outbreak monitoring.

  10. Signatures of positive selection in the cis-regulatory sequences of the human oxytocin receptor (OXTR) and arginine vasopressin receptor 1a (AVPR1A) genes.

    PubMed

    Schaschl, Helmut; Huber, Susanne; Schaefer, Katrin; Windhager, Sonja; Wallner, Bernard; Fieder, Martin

    2015-05-13

    The evolutionarily highly conserved neurohypophyseal hormones oxytocin and arginine vasopressin play key roles in regulating social cognition and behaviours. The effects of these two peptides are mediated by their specific receptors, which are encoded by the oxytocin receptor (OXTR) and arginine vasopressin receptor 1a (AVPR1A) genes, respectively. In several species, polymorphisms in these genes have been linked to various behavioural traits. Little, however, is known about whether positive selection acts on sequence variants in genes influencing variation in human behaviours. We identified, in both neuroreceptor genes, signatures of balancing selection in the cis-acting regulatory sequences such as transcription factor binding and enhancer sequences, as well as in a transcriptional repressor sequence motif. Additionally, in intron 3 of the OXTR gene, the SNP rs59190448 appears to be under positive directional selection. For rs59190448, only one phenotypical association is known so far, but it is in high LD' (>0.8) with loci of known association; i.e., variants associated with key pro-social behaviours and mental disorders in humans. Only for one SNP on the OXTR gene (rs59190448) was a sign of positive directional selection detected with all three methods of selection detection. For rs59190448, however, only one phenotypical association is known, but rs59190448 is in high LD' (>0.8), with variants associated with important pro-social behaviours and mental disorders in humans. We also detected various signatures of balancing selection on both neuroreceptor genes.

  11. Clinical whole-genome sequencing from routine formalin-fixed, paraffin-embedded specimens: pilot study for the 100,000 Genomes Project.

    PubMed

    Robbe, Pauline; Popitsch, Niko; Knight, Samantha J L; Antoniou, Pavlos; Becq, Jennifer; He, Miao; Kanapin, Alexander; Samsonova, Anastasia; Vavoulis, Dimitrios V; Ross, Mark T; Kingsbury, Zoya; Cabes, Maite; Ramos, Sara D C; Page, Suzanne; Dreau, Helene; Ridout, Kate; Jones, Louise J; Tuff-Lacey, Alice; Henderson, Shirley; Mason, Joanne; Buffa, Francesca M; Verrill, Clare; Maldonado-Perez, David; Roxanis, Ioannis; Collantes, Elena; Browning, Lisa; Dhar, Sunanda; Damato, Stephen; Davies, Susan; Caulfield, Mark; Bentley, David R; Taylor, Jenny C; Turnbull, Clare; Schuh, Anna

    2018-02-01

    Purpose: Fresh-frozen (FF) tissue is the optimal source of DNA for whole-genome sequencing (WGS) of cancer patients. However, it is not always available, limiting the widespread application of WGS in clinical practice. We explored the viability of using formalin-fixed, paraffin-embedded (FFPE) tissues, available routinely for cancer patients, as a source of DNA for clinical WGS. Methods: We conducted a prospective study using DNAs from matched FF, FFPE, and peripheral blood germ-line specimens collected from 52 cancer patients (156 samples) following routine diagnostic protocols. We compared somatic variants detected in FFPE and matching FF samples. Results: We found the single-nucleotide variant agreement reached 71% across the genome and somatic copy-number alterations (CNAs) detection from FFPE samples was suboptimal (0.44 median correlation with FF) due to nonuniform coverage. CNA detection was improved significantly with lower reverse crosslinking temperature in FFPE DNA extraction (80 °C or 65 °C depending on the methods). Our final data showed somatic variant detection from FFPE for clinical decision making is possible. We detected 98% of clinically actionable variants (including 30/31 CNAs). Conclusion: We present the first prospective WGS study of cancer patients using FFPE specimens collected in a routine clinical environment proving WGS can be applied in the clinic. GENETICS in MEDICINE advance online publication, 1 February 2018; doi:10.1038/gim.2017.241.

  12. Brief Report: Late-Onset Cryopyrin-Associated Periodic Syndrome Due to Myeloid-Restricted Somatic NLRP3 Mosaicism.

    PubMed

    Mensa-Vilaro, Anna; Teresa Bosque, María; Magri, Giuliana; Honda, Yoshitaka; Martínez-Banaclocha, Helios; Casorran-Berges, Marta; Sintes, Jordi; González-Roca, Eva; Ruiz-Ortiz, Estibaliz; Heike, Toshio; Martínez-Garcia, Juan J; Baroja-Mazo, Alberto; Cerutti, Andrea; Nishikomori, Ryuta; Yagüe, Jordi; Pelegrín, Pablo; Delgado-Beltran, Concha; Aróstegui, Juan I

    2016-12-01

    Gain-of-function NLRP3 mutations cause cryopyrin-associated periodic syndrome (CAPS), with gene mosaicism playing a relevant role in the pathogenesis. This study was undertaken to characterize the genetic cause underlying late-onset but otherwise typical CAPS. We studied a 64-year-old patient who presented with recurrent episodes of urticaria-like rash, fever, conjunctivitis, and oligoarthritis at age 56 years. DNA was extracted from both unfractionated blood and isolated leukocyte and CD34+ subpopulations. Genetic studies were performed using both the Sanger method of DNA sequencing and next-generation sequencing (NGS) methods. In vitro and ex vivo analyses were performed to determine the consequences of the detected variant for the normal structure or function of the protein. NGS analyses revealed the novel p.Gln636Glu NLRP3 variant in unfractionated blood, with an allele frequency (18.4%) compatible with gene mosaicism. Sanger sequence chromatograms revealed a small peak corresponding to the variant allele. Amplicon-based deep sequencing revealed somatic NLRP3 mosaicism restricted to myeloid cells (31.8% in monocytes, 24.6% in neutrophils, and 11.2% in circulating CD34+ common myeloid progenitor cells) and its complete absence in lymphoid cells. Functional analyses confirmed the gain-of-function behavior of the gene variant and hyperactivity of the NLRP3 inflammasome in the patient. Treatment with anakinra resulted in good control of the disease. We identified the novel gain-of-function p.Gln636Glu NLRP3 mutation, which was detected as a somatic mutation restricted to myeloid cells, as the cause of late-onset but otherwise typical CAPS. Our results expand the diversity of CAPS toward milder phenotypes than previously reported, including those starting during adulthood. © 2016, American College of Rheumatology.
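
    The reasoning that an allele frequency of 18.4% is "compatible with gene mosaicism" rests on a germline heterozygous variant being expected near 50% of reads. A minimal sketch of that comparison follows; the read counts (184 variant reads out of 1000) are hypothetical and chosen only to reproduce the reported fraction, and the binomial check is an illustrative assumption, not the authors' analysis.

```python
from math import comb


def variant_allele_fraction(alt_reads: int, total_reads: int) -> float:
    """Fraction of reads at a position that carry the variant allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


def prob_at_most_under_het(alt_reads: int, total_reads: int) -> float:
    """P(X <= alt_reads) for X ~ Binomial(total_reads, 0.5).

    A very small value means the observed count is hard to explain
    by sampling noise around a germline heterozygous variant.
    """
    favourable = sum(comb(total_reads, k) for k in range(alt_reads + 1))
    return favourable / 2 ** total_reads


# Hypothetical read counts that give the reported 18.4% fraction.
vaf = variant_allele_fraction(184, 1000)
tail = prob_at_most_under_het(184, 1000)
print(f"VAF = {vaf:.3f}, P(X <= 184 | heterozygous) = {tail:.3g}")
```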

  13. Re-Ranking Sequencing Variants in the Post-GWAS Era for Accurate Causal Variant Identification

    PubMed Central

    Faye, Laura L.; Machiela, Mitchell J.; Kraft, Peter; Bull, Shelley B.; Sun, Lei

    2013-01-01

    Next generation sequencing has dramatically increased our ability to localize disease-causing variants by providing base-pair level information at costs increasingly feasible for the large sample sizes required to detect complex-trait associations. Yet, identification of causal variants within an established region of association remains a challenge. Counter-intuitively, certain factors that increase power to detect an associated region can decrease power to localize the causal variant. First, combining GWAS with imputation or low coverage sequencing to achieve the large sample sizes required for high power can have the unintended effect of producing differential genotyping error among SNPs. This tends to bias the relative evidence for association toward better genotyped SNPs. Second, re-use of GWAS data for fine-mapping exploits previous findings to ensure genome-wide significance in GWAS-associated regions. However, using GWAS findings to inform fine-mapping analysis can bias evidence away from the causal SNP toward the tag SNP and SNPs in high LD with the tag. Together these factors can reduce power to localize the causal SNP by more than half. Other strategies commonly employed to increase power to detect association, namely increasing sample size and using higher density genotyping arrays, can, in certain common scenarios, actually exacerbate these effects and further decrease power to localize causal variants. We develop a re-ranking procedure that accounts for these adverse effects and substantially improves the accuracy of causal SNP identification, often doubling the probability that the causal SNP is top-ranked. Application to the NCI BPC3 aggressive prostate cancer GWAS with imputation meta-analysis identified a new top SNP at 2 of 3 associated loci and several additional possible causal SNPs at these loci that may have otherwise been overlooked. This method is simple to implement using R scripts provided on the author's website. PMID:23950724

  14. Socrates: identification of genomic rearrangements in tumour genomes by re-aligning soft clipped reads

    PubMed Central

    Schröder, Jan; Hsu, Arthur; Boyle, Samantha E.; Macintyre, Geoff; Cmero, Marek; Tothill, Richard W.; Johnstone, Ricky W.; Shackleton, Mark; Papenfuss, Anthony T.

    2014-01-01

    Motivation: Methods for detecting somatic genome rearrangements in tumours using next-generation sequencing are vital in cancer genomics. Available algorithms use one or more sources of evidence, such as read depth, paired-end reads or split reads to predict structural variants. However, the problem remains challenging due to the significant computational burden and high false-positive or false-negative rates. Results: In this article, we present Socrates (SOft Clip re-alignment To idEntify Structural variants), a highly efficient and effective method for detecting genomic rearrangements in tumours that uses only split-read data. Socrates has single-nucleotide resolution, identifies micro-homologies and untemplated sequence at break points, has high sensitivity and high specificity and takes advantage of parallelism for efficient use of resources. We demonstrate using simulated and real data that Socrates performs well compared with a number of existing structural variant detection tools. Availability and implementation: Socrates is released as open source and available from http://bioinf.wehi.edu.au/socrates. Contact: papenfuss@wehi.edu.au Supplementary information: Supplementary data are available at Bioinformatics online. PMID:24389656
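
    Soft-clipped reads, the evidence Socrates re-aligns, are recorded in SAM/BAM alignments as `S` operations at either end of the CIGAR string. The sketch below shows one way to pull the clipped bases out of a read; it illustrates the data involved and is not Socrates' own implementation. The function name and the example reads are assumptions.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clips(cigar: str, seq: str) -> tuple[str, str]:
    """Return the (left, right) soft-clipped substrings of a read.

    Hard clips (H) can flank soft clips, but hard-clipped bases are
    not present in the read sequence, so they are ignored here.
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    core = [(n, op) for n, op in ops if op != "H"]
    left = core[0][0] if core and core[0][1] == "S" else 0
    right = core[-1][0] if len(core) > 1 and core[-1][1] == "S" else 0
    return seq[:left], seq[len(seq) - right:]


# Hypothetical reads: 5 bases clipped on the left, 4 on the right.
print(soft_clips("5S10M", "ACGTA" + "G" * 10))
print(soft_clips("10M4S", "A" * 10 + "CCGT"))
```

    In a split-read approach, clipped segments long enough to map uniquely would then be re-aligned to the reference to locate the other side of a candidate breakpoint.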

  15. BALSA: integrated secondary analysis for whole-genome and whole-exome sequencing, accelerated by GPU.

    PubMed

    Luo, Ruibang; Wong, Yiu-Lun; Law, Wai-Chun; Lee, Lap-Kei; Cheung, Jeanno; Liu, Chi-Man; Lam, Tak-Wah

    2014-01-01

    This paper reports an integrated solution, called BALSA, for the secondary analysis of next generation sequencing data; it exploits the computational power of GPU and an intricate memory management to give a fast and accurate analysis. From raw reads to variants (including SNPs and Indels), BALSA, using just a single computing node with a commodity GPU board, takes 5.5 h to process 50-fold whole genome sequencing (∼750 million 100 bp paired-end reads), or just 25 min for 210-fold whole exome sequencing. BALSA's speed is rooted in its parallel algorithms, which effectively exploit a GPU to speed up processes like alignment, realignment and statistical testing. BALSA incorporates a 16-genotype model to support the calling of SNPs and Indels and achieves competitive variant calling accuracy and sensitivity when compared to the ensemble of six popular variant callers. BALSA also supports efficient identification of somatic SNVs and CNVs; experiments showed that BALSA recovers all the previously validated somatic SNVs and CNVs, and it is more sensitive for somatic Indel detection. BALSA outputs variants in VCF format. A pileup-like SNAPSHOT format, while maintaining the same fidelity as BAM in variant calling, enables efficient storage and indexing, and facilitates app development for downstream analyses. BALSA is available at: http://sourceforge.net/p/balsa.

  16. Looking beyond the exome: a phenotype-first approach to molecular diagnostic resolution in rare and undiagnosed diseases.

    PubMed

    Pena, Loren D M; Jiang, Yong-Hui; Schoch, Kelly; Spillmann, Rebecca C; Walley, Nicole; Stong, Nicholas; Rapisardo Horn, Sarah; Sullivan, Jennifer A; McConkie-Rosell, Allyn; Kansagra, Sujay; Smith, Edward C; El-Dairi, Mays; Bellet, Jane; Keels, Martha Ann; Jasien, Joan; Kranz, Peter G; Noel, Richard; Nagaraj, Shashi K; Lark, Robert K; Wechsler, Daniel S G; Del Gaudio, Daniela; Leung, Marco L; Hendon, Laura G; Parker, Collette C; Jones, Kelly L; Goldstein, David B; Shashi, Vandana

    2018-04-01

    Purpose: To describe examples of missed pathogenic variants on whole-exome sequencing (WES) and the importance of deep phenotyping for further diagnostic testing. Methods: Guided by phenotypic information, three children with negative WES underwent targeted single-gene testing. Results: Individual 1 had a clinical diagnosis consistent with infantile systemic hyalinosis, although WES and a next-generation sequencing (NGS)-based ANTXR2 test were negative. Sanger sequencing of ANTXR2 revealed a homozygous single base pair insertion, previously missed by the WES variant caller software. Individual 2 had neurodevelopmental regression and cerebellar atrophy, with no diagnosis on WES. New clinical findings prompted Sanger sequencing and copy number testing of PLA2G6. A novel homozygous deletion of the noncoding exon 1 (not included in the WES capture kit) was detected, with extension into the promoter, confirming the clinical suspicion of infantile neuroaxonal dystrophy. Individual 3 had progressive ataxia, spasticity, and magnetic resonance image changes of vanishing white matter leukoencephalopathy. An NGS leukodystrophy gene panel and WES showed a heterozygous pathogenic variant in EIF2B5; no deletions/duplications were detected. Sanger sequencing of EIF2B5 showed a frameshift indel, probably missed owing to failure of alignment. Conclusion: These cases illustrate potential pitfalls of WES/NGS testing and the importance of phenotype-guided molecular testing in yielding diagnoses.

  17. Exome sequencing for simultaneous mutation screening in children with hemophagocytic lymphohistiocytosis.

    PubMed

    Mukda, Ekchol; Trachoo, Objoon; Pasomsub, Ekawat; Tiyasirichokchai, Rawiphorn; Iemwimangsa, Nareenart; Sosothikul, Darintr; Chantratita, Wasun; Pakakasama, Samart

    2017-08-01

    In the present study, we used exome sequencing to analyze PRF1, UNC13D, STX11, and STXBP2, as well as genes associated with primary immunodeficiency disease (RAB27A, LYST, AP3B1, SH2D1A, ITK, CD27, XIAP, and MAGT1) in Thai children with hemophagocytic lymphohistiocytosis (HLH). We performed mutation analysis of HLH-associated genes in 25 Thai children using an exome sequencing method. Genetic variations found within these target genes were compared to exome sequencing data from 133 healthy individuals. Variants identified with minor allele frequencies <5% and novel mutations were confirmed using Sanger sequencing. Exome sequencing data revealed 101 non-synonymous single nucleotide polymorphisms (SNPs) in all subjects. These SNPs were classified as pathogenic (n = 1), likely pathogenic (n = 16), variant of unknown significance (n = 12), or benign variant (n = 72). Homozygous, compound heterozygous, and double-gene heterozygous variants, involving mutations in PRF1 (n = 3), UNC13D (n = 2), STXBP2 (n = 3), LYST (n = 3), XIAP (n = 2), AP3B1 (n = 1), RAB27A (n = 1), and MAGT1 (n = 1), were demonstrated in 12 patients. Novel mutations were found in most patients in this study. In conclusion, exome sequencing demonstrated the ability to identify rare genetic variants in HLH patients. This method is useful in the detection of mutations in multi-gene associated diseases.

  18. αIIbβ3 variants defined by next-generation sequencing: Predicting variants likely to cause Glanzmann thrombasthenia

    PubMed Central

    Buitrago, Lorena; Rendon, Augusto; Liang, Yupu; Simeoni, Ilenia; Negri, Ana; Filizola, Marta; Ouwehand, Willem H.; Coller, Barry S.; Alessi, Marie-Christine; Ballmaier, Matthias; Bariana, Tadbir; Bellissimo, Daniel; Bertoli, Marta; Bray, Paul; Bury, Loredana; Carrell, Robin; Cattaneo, Marco; Collins, Peter; French, Deborah; Favier, Remi; Freson, Kathleen; Furie, Bruce; Germeshausen, Manuela; Ghevaert, Cedric; Gomez, Keith; Goodeve, Anne; Gresele, Paolo; Guerrero, Jose; Hampshire, Dan J.; Hadinnapola, Charaka; Heemskerk, Johan; Henskens, Yvonne; Hill, Marian; Hogg, Nancy; Johnsen, Jill; Kahr, Walter; Kerr, Ron; Kunishima, Shinji; Laffan, Michael; Natwani, Amit; Neerman-Arbez, Marguerite; Nurden, Paquita; Nurden, Alan; Ormiston, Mark; Othman, Maha; Ouwehand, Willem; Perry, David; Vilk, Shoshana Ravel; Reitsma, Pieter; Rondina, Matthew; Simeoni, Ilenia; Smethurst, Peter; Stephens, Jonathan; Stevenson, William; Szkotak, Artur; Turro, Ernest; Van Geet, Christel; Vries, Minka; Ward, June; Waye, John; Westbury, Sarah; Whiteheart, Sidney; Wilcox, David; Zhang, Bi

    2015-01-01

    Next-generation sequencing is transforming our understanding of human genetic variation but assessing the functional impact of novel variants presents challenges. We analyzed missense variants in the integrin αIIbβ3 receptor subunit genes ITGA2B and ITGB3 identified by whole-exome or -genome sequencing in the ThromboGenomics project, comprising ∼32,000 alleles from 16,108 individuals. We analyzed the results in comparison with 111 missense variants in these genes previously reported as being associated with Glanzmann thrombasthenia (GT), 20 associated with alloimmune thrombocytopenia, and 5 associated with aniso/macrothrombocytopenia. We identified 114 novel missense variants in ITGA2B (affecting ∼11% of the amino acids) and 68 novel missense variants in ITGB3 (affecting ∼9% of the amino acids). Of the variants, 96% had minor allele frequencies (MAF) < 0.1%, indicating their rarity. Based on sequence conservation, MAF, and location on a complete model of αIIbβ3, we selected three novel variants that affect amino acids previously associated with GT for expression in HEK293 cells. αIIb P176H and β3 C547G severely reduced αIIbβ3 expression, whereas αIIb P943A partially reduced αIIbβ3 expression and had no effect on fibrinogen binding. We used receiver operating characteristic curves of combined annotation-dependent depletion, Polyphen 2-HDIV, and sorting intolerant from tolerant to estimate the percentage of novel variants likely to be deleterious. At optimal cut-off values, which had 69–98% sensitivity in detecting GT mutations, between 27% and 71% of the novel αIIb or β3 missense variants were predicted to be deleterious. Our data have implications for understanding the evolutionary pressure on αIIbβ3 and highlight the challenges in predicting the clinical significance of novel missense variants. PMID:25827233

  19. Detection of the Canine Parvovirus 2c Subtype in Australian Dogs.

    PubMed

    Woolford, Lucy; Crocker, Paul; Bobrowski, Hannah; Baker, Trevor; Hemmatzadeh, Farhid

    2017-06-01

    Canine parvovirus (CPV-2) is an important cause of hemorrhagic enteritis in dogs. In Australia the disease has been associated with CPV-2a and CPV-2b variants. A third variant that emerged more recently overseas, CPV-2c, has not been detected in surveys of the Australian dog population. In this study, we report three cases of canine parvoviral enteritis associated with CPV-2c infection; case 1 occurred in an 8-week-old puppy that died following acute hemorrhagic enteritis. Cases 2 and 3 were an 11-month-old female entire Saint Bernard and a 9-month-old male entire Siberian husky, respectively, both of which had completed vaccination schedules and presented with vomiting or mild diarrhea only. Full genomic sequencing of parvoviral DNA from cases 1, 2, and 3 revealed greater than 99% homology to known CPV-2c variants, and predicted protein sequences from the VP2 region of viral DNA from all three cases identified glutamic acid at amino acid residue 426, characteristic of the CPV-2c variant. Veterinary professionals should be aware that CPV-2c is now present in Australia, detected in a puppy and vaccinated young adult dogs in this study. Further characterization of CPV-2c-associated disease and its prevalence in Australian dogs requires additional research.

  20. Ecotype-specific and chromosome-specific expansion of variant centromeric satellites in Arabidopsis thaliana.

    PubMed

    Ito, Hidetaka; Miura, Asuka; Takashima, Kazuya; Kakutani, Tetsuji

    2007-01-01

    Despite the conserved roles and conserved protein machineries of centromeres, their nucleotide sequences can be highly diverse even among related species. The diversity reflects rapid evolution, but the underlying mechanism is largely unknown. One approach to monitor rapid evolution is examination of intra-specific variation. Here we report variant centromeric satellites of Arabidopsis thaliana found through survey of 103 natural accessions (ecotypes). Among them, a cluster of variant centromeric satellites was detected in one ecotype, Cape Verde Islands (Cvi). Recombinant inbred mapping revealed that the variant satellites are distributed in centromeric region of the chromosome 5 (CEN5) of this ecotype. This apparently recent variant accumulation is associated with large deletion of a pericentromeric region and the expansion of satellite region. The variant satellite was bound to HTR12 (centromeric variant histone H3), although expansion of the satellite was not associated with comparable increase in the HTR12 binding. The results suggest that variant satellites with centromere function can rapidly accumulate in one centromere, supporting the model that the satellite repeats in the array are homogenized by occasional unequal crossing-over, which has a potential to generate an expansion of local sequence variants within a centromere cluster.

  1. Whole-Exome Sequencing in Age-Related Macular Degeneration Identifies Rare Variants in COL8A1, a Component of Bruch's Membrane.

    PubMed

    Corominas, Jordi; Colijn, Johanna M; Geerlings, Maartje J; Pauper, Marc; Bakker, Bjorn; Amin, Najaf; Lores Motta, Laura; Kersten, Eveline; Garanto, Alejandro; Verlouw, Joost A M; van Rooij, Jeroen G J; Kraaij, Robert; de Jong, Paulus T V M; Hofman, Albert; Vingerling, Johannes R; Schick, Tina; Fauser, Sascha; de Jong, Eiko K; van Duijn, Cornelia M; Hoyng, Carel B; Klaver, Caroline C W; den Hollander, Anneke I

    2018-04-26

    Genome-wide association studies and targeted sequencing studies of candidate genes have identified common and rare variants that are associated with age-related macular degeneration (AMD). Whole-exome sequencing (WES) studies allow a more comprehensive analysis of rare coding variants across all genes of the genome and will contribute to a better understanding of the underlying disease mechanisms. To date, the number of WES studies in AMD case-control cohorts remains scarce and sample sizes are limited. To scrutinize the role of rare protein-altering variants in AMD cause, we performed the largest WES study in AMD to date in a large European cohort consisting of 1125 AMD patients and 1361 control participants. Genome-wide case-control association study of WES data. One thousand one hundred twenty-five AMD patients and 1361 control participants. A single variant association test of WES data was performed to detect variants that are associated individually with AMD. The cumulative effect of multiple rare variants within 1 gene was analyzed using a gene-based CMC burden test. Immunohistochemistry was performed to determine the localization of the Col8a1 protein in mouse eyes. Genetic variants associated with AMD. We detected significantly more rare protein-altering variants in the COL8A1 gene in patients (22/2250 alleles [1.0%]) than in control participants (11/2722 alleles [0.4%]; P = 7.07×10⁻⁵). The association of rare variants in the COL8A1 gene is independent of the common intergenic variant (rs140647181) near the COL8A1 gene previously associated with AMD. We demonstrated that the Col8a1 protein localizes at Bruch's membrane. This study supported a role for protein-altering variants in the COL8A1 gene in AMD pathogenesis. We demonstrated the presence of Col8a1 in Bruch's membrane, further supporting the role of COL8A1 variants in AMD pathogenesis. Protein-altering variants in COL8A1 may alter the integrity of Bruch's membrane, contributing to the accumulation of drusen and the development of AMD. Copyright © 2018 American Academy of Ophthalmology. Published by Elsevier Inc. All rights reserved.
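
    The allele counts reported above (22/2250 in patients, 11/2722 in control participants) can be checked with a short calculation. The sketch below recomputes the two carrier-allele percentages and, as an illustrative assumption, runs a one-sided Fisher's exact test on the 2×2 allele table. That simple test is not the gene-based CMC burden test the study used, so it is not expected to reproduce the reported P value.

```python
from math import comb


def percent(count: int, total: int) -> float:
    """Percentage of alleles carrying a rare variant, rounded to 1 decimal."""
    return round(100 * count / total, 1)


def fisher_one_sided(a: int, b: int, c: int, d: int) -> float:
    """P(X >= a) under the hypergeometric null for the table [[a, b], [c, d]].

    Rows are groups (cases, controls); columns are variant / non-variant alleles.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    favourable = sum(
        comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1)
    )
    return favourable / comb(n, col1)


case_var, case_total = 22, 2250   # counts taken from the abstract
ctrl_var, ctrl_total = 11, 2722

print(percent(case_var, case_total), percent(ctrl_var, ctrl_total))
p = fisher_one_sided(case_var, case_total - case_var,
                     ctrl_var, ctrl_total - ctrl_var)
print(f"one-sided Fisher P = {p:.3g}")
```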

  2. Population sequencing reveals breed and sub-species specific CNVs in cattle

    USDA-ARS's Scientific Manuscript database

    Individualized copy number variation (CNV) maps have highlighted the need for population surveys of cattle to detect rare and common variants. While SNP and comparative genomic hybridization (CGH) arrays have provided preliminary data, next-generation sequence (NGS) data analysis offers an increased...

  3. VirVarSeq: a low-frequency virus variant detection pipeline for Illumina sequencing using adaptive base-calling accuracy filtering.

    PubMed

    Verbist, Bie M P; Thys, Kim; Reumers, Joke; Wetzels, Yves; Van der Borght, Koen; Talloen, Willem; Aerssens, Jeroen; Clement, Lieven; Thas, Olivier

    2015-01-01

    In virology, massively parallel sequencing (MPS) opens many opportunities for studying viral quasi-species, e.g. in HIV-1- and HCV-infected patients. This is essential for understanding pathways to resistance, which can substantially improve treatment. Although MPS platforms allow in-depth characterization of sequence variation, their measurements still involve substantial technical noise. For Illumina sequencing, single base substitutions are the main error source and impede powerful assessment of low-frequency mutations. Fortunately, base calls are complemented with quality scores (Qs) that are useful for differentiating errors from the real low-frequency mutations. A variant calling tool, Q-cpileup, is proposed, which exploits the Qs of nucleotides in a filtering strategy to increase specificity. The tool is imbedded in an open-source pipeline, VirVarSeq, which allows variant calling starting from fastq files. Using both plasmid mixtures and clinical samples, we show that Q-cpileup is able to reduce the number of false-positive findings. The filtering strategy is adaptive and provides an optimized threshold for individual samples in each sequencing run. Additionally, linkage information is kept between single-nucleotide polymorphisms as variants are called at the codon level. This enables virologists to have an immediate biological interpretation of the reported variants with respect to their antiviral drug responses. A comparison with existing SNP caller tools reveals that calling variants at the codon level with Q-cpileup results in an outstanding sensitivity while maintaining a good specificity for variants with frequencies down to 0.5%. The VirVarSeq is available, together with a user's guide and test data, at sourceforge: http://sourceforge.net/projects/virtools/?source=directory. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
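
    The idea of using base quality scores to filter codon-level calls can be sketched briefly. A Phred score Q corresponds to an error probability of 10^(-Q/10), and a codon is kept only if all three of its bases meet a quality threshold. The sketch assumes Phred+33 encoding, a read that starts in frame, and a fixed threshold; Q-cpileup's threshold is adaptive per sample, which is not reproduced here. All function names are hypothetical.

```python
def phred_to_error_prob(q: int) -> float:
    """Convert a Phred quality score to a base-call error probability."""
    return 10 ** (-q / 10)


def decode_phred33(qual: str) -> list[int]:
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(c) - 33 for c in qual]


def codons_passing(read: str, qual: str, q_min: int) -> list[tuple[int, str]]:
    """Return (codon_index, codon) for codons whose three bases all have Q >= q_min.

    Assumes the read starts at the first base of a codon (an
    illustrative simplification).
    """
    scores = decode_phred33(qual)
    kept = []
    for i in range(0, len(read) - len(read) % 3, 3):
        if min(scores[i:i + 3]) >= q_min:
            kept.append((i // 3, read[i:i + 3]))
    return kept


# Hypothetical read: the second codon contains one low-quality base ('#' = Q2).
print(codons_passing("ATGGCC", "IIII#I", q_min=30))
```

    Calling at the codon level in this way keeps the three bases of a codon together, which is what preserves linkage between neighbouring single-nucleotide changes.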

  4. Semiconductor Whole Exome Sequencing for the Identification of Genetic Variants in Colombian Patients Clinically Diagnosed with Long QT Syndrome.

    PubMed

    Burgos, Mariana; Arenas, Alvaro; Cabrera, Rodrigo

    2016-08-01

    Inherited long QT syndrome (LQTS) is a cardiac channelopathy characterized by a prolongation of QT interval and the risk of syncope, cardiac arrest, and sudden cardiac death. Genetic diagnosis of LQTS is critical in medical practice as results can guide adequate management of patients and distinguish phenocopies such as catecholaminergic polymorphic ventricular tachycardia (CPVT). However, extensive screening of large genomic regions is required in order to reliably identify genetic causes. Semiconductor whole exome sequencing (WES) is a promising approach for the identification of variants in the coding regions of most human genes. DNA samples from 21 Colombian patients clinically diagnosed with LQTS were enriched for coding regions using multiplex polymerase chain reaction (PCR) and subjected to WES using a semiconductor sequencer. Semiconductor WES showed mean coverage of 93.6 % for all coding regions relevant to LQTS at >10× depth with high intra- and inter-assay depth heterogeneity. Fifteen variants were detected in 12 patients in genes associated with LQTS. Three variants were identified in three patients in genes associated with CPVT. Co-segregation analysis was performed when possible. All variants were analyzed with two pathogenicity prediction algorithms. The overall prevalence of LQTS and CPVT variants in our cohort was 71.4 %. All LQTS variants previously identified through commercial genetic testing were identified. Standardized WES assays can be easily implemented, often at a lower cost than sequencing panels. Our results show that WES can identify LQTS-causing mutations and permits differential diagnosis of related conditions in a real-world clinical setting. However, high heterogeneity in sequencing depth and low coverage in the most relevant genes is expected to be associated with reduced analytical sensitivity.
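
    The coverage figure above (93.6 % of relevant coding regions at >10× depth) is a breadth-of-coverage measure: the fraction of target positions whose read depth exceeds a threshold. A minimal sketch, with a hypothetical function name and made-up depths:

```python
from typing import Iterable


def breadth_of_coverage(depths: Iterable[int], min_depth: int = 10) -> float:
    """Fraction of positions whose depth is strictly greater than min_depth."""
    depths = list(depths)
    if not depths:
        raise ValueError("no positions supplied")
    return sum(d > min_depth for d in depths) / len(depths)


# Hypothetical per-base depths across a small target region.
example = [0, 5, 10, 11, 50]
print(f"{100 * breadth_of_coverage(example):.1f} % of positions at >10x")
```

    Positions that fall at or below the threshold are where variants are most likely to be missed, which is the concern the abstract raises about analytical sensitivity.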

  5. Mutation analysis in 129 genes associated with other forms of retinal dystrophy in 157 families with retinitis pigmentosa based on exome sequencing.

    PubMed

    Xu, Yan; Guan, Liping; Xiao, Xueshan; Zhang, Jianguo; Li, Shiqiang; Jiang, Hui; Jia, Xiaoyun; Yang, Jianhua; Guo, Xiangming; Yin, Ye; Wang, Jun; Zhang, Qingjiong

    2015-01-01

    Mutations in 60 known genes were previously identified by exome sequencing in 79 of 157 families with retinitis pigmentosa (RP). This study analyzed variants in 129 genes associated with other forms of hereditary retinal dystrophy in the same cohort. Apart from the 73 genes previously analyzed, a further 129 genes responsible for other forms of hereditary retinal dystrophy were selected based on RetNet. Variants in the 129 genes determined by whole exome sequencing were selected and filtered by bioinformatics analysis. Candidate variants were confirmed by Sanger sequencing and validated by analysis of available family members and controls. A total of 90 candidate variants were present in the 129 genes. Sanger sequencing confirmed 83 of the 90 variants. Analysis of family members and controls excluded 76 of these 83 variants. The remaining seven variants were considered to be potential pathogenic mutations; these were c.899A>G, c.1814C>G, and c.2107C>T in BBS2; c.1073C>T and c.1669C>T in INPP5E; and c.3582C>G and c.5704-5C>G in CACNA1F. Six of these seven mutations were novel. The mutations were detected in five unrelated patients without a family history, including three patients with homozygous or compound heterozygous mutations in BBS2 and INPP5E, and two patients with hemizygous mutations in CACNA1F. None of the patients had mutations in the genes associated with autosomal dominant retinal dystrophy. Only a small portion of patients with RP, about 3% (5/157), had causative mutations in the 129 genes associated with other forms of hereditary retinal dystrophy.

  6. Analysis of intra-host genetic diversity of Prunus necrotic ringspot virus (PNRSV) using amplicon next generation sequencing

    PubMed Central

    Constable, Fiona E.; Nancarrow, Narelle; Plummer, Kim M.; Rodoni, Brendan

    2017-01-01

    PCR amplicon next generation sequencing (NGS) analysis offers a broadly applicable and targeted approach to detect populations of both high- and low-frequency virus variants in one or more plant samples. In this study, amplicon NGS was used to explore the diversity of the tripartite genome virus, Prunus necrotic ringspot virus (PNRSV) from 53 PNRSV-infected trees using amplicons from conserved gene regions of each of PNRSV RNA1, RNA2 and RNA3. Sequencing of the amplicons from 53 PNRSV-infected trees revealed differing levels of polymorphism across the three different components of the PNRSV genome with a total number of 5040, 2083 and 5486 sequence variants observed for RNA1, RNA2 and RNA3 respectively. The RNA2 had the lowest diversity of sequences compared to RNA1 and RNA3, reflecting the lack of flexibility tolerated by the replicase gene that is encoded by this RNA component. Distinct PNRSV phylo-groups, consisting of closely related clusters of sequence variants, were observed in each of PNRSV RNA1, RNA2 and RNA3. Most plant samples had a single phylo-group for each RNA component. Haplotype network analysis showed that smaller clusters of PNRSV sequence variants were genetically connected to the largest sequence variant cluster within a phylo-group of each RNA component. Some plant samples had sequence variants occurring in multiple PNRSV phylo-groups in at least one of each RNA and these phylo-groups formed distinct clades that represent PNRSV genetic strains. Variants within the same phylo-group of each Prunus plant sample had ≥97% similarity and phylo-groups within a Prunus plant sample and between samples had ≤97% similarity. Based on the analysis of diversity, a definition of a PNRSV genetic strain was proposed. The proposed definition was applied to determine the number of PNRSV genetic strains in each of the plant samples and the complexity in defining genetic strains in multipartite genome viruses was explored. PMID:28632759
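
    The 97% similarity boundary between phylo-groups can be illustrated with a small grouping sketch. It assumes pre-aligned, equal-length sequences and uses single-linkage grouping on percent identity; both are illustrative assumptions, since the abstract does not state how similarity or group membership was computed. The function names and test sequences are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of matching positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def group_by_identity(seqs: list[str], threshold: float = 97.0) -> list[list[str]]:
    """Single-linkage grouping: a sequence joins every group containing
    a member at least `threshold` percent identical to it, merging them."""
    groups: list[list[str]] = []
    for s in seqs:
        hits = [g for g in groups
                if any(percent_identity(s, m) >= threshold for m in g)]
        merged = [s]
        for g in hits:
            merged.extend(g)
        # Drop the groups that were merged, compared by object identity.
        groups = [g for g in groups if not any(g is h for h in hits)]
        groups.append(merged)
    return groups


# Hypothetical variants: two are 98% identical, one is unrelated.
v1 = "A" * 100
v2 = "A" * 98 + "CC"
v3 = "C" * 100
print([len(g) for g in group_by_identity([v1, v2, v3])])
```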

  7. Evaluation of point mutations in dystrophin gene in Iranian Duchenne and Becker muscular dystrophy patients: introducing three novel variants.

    PubMed

    Haghshenas, Maryam; Akbari, Mohammad Taghi; Karizi, Shohreh Zare; Deilamani, Faravareh Khordadpoor; Nafissi, Shahriar; Salehi, Zivar

    2016-06-01

    Duchenne and Becker muscular dystrophies (DMD and BMD) are X-linked neuromuscular diseases characterized by progressive muscular weakness and degeneration of skeletal muscles. Approximately two-thirds of the patients have large deletions or duplications in the dystrophin gene and the remaining one-third have point mutations. This study was performed to evaluate point mutations in Iranian DMD/BMD male patients. A total of 29 DNA samples from patients who did not show any large deletion/duplication mutations following multiplex polymerase chain reaction (PCR) and multiplex ligation-dependent probe amplification (MLPA) screening were sequenced for detection of point mutations in exons 50-79. Also exon 44 was sequenced in one sample in which a false positive deletion was detected by MLPA method. Cycle sequencing revealed four nonsense, one frameshift and two splice site mutations as well as two missense variants.

  8. Routine HLA-B genotyping with PCR-sequence-specific oligonucleotides detects a B*52 variant (B*5206).

    PubMed

    Hoelsch, K; Lenggeler, I; Pfannes, W; Knabe, H; Klein, H-G; Woelpl, A

    2005-05-01

    A new human leukocyte antigen (HLA)-B allele was found during routine typing of samples for a German unrelated bone marrow donor registry, the "Aktion Knochenmarkspende Bayern". After first interpretation of data of two independent low-resolution sequence-specific oligonucleotide typing tests, a B*51 variant was suggested. Further analysis via sequence-based typing identified the sequence as a new B*52 allele. This new allele, officially assigned as B*5206, differs from HLA-B*520102 by one nucleotide exchange in exon 2. The mutation is located at nucleotide position 274, at which a cytosine is substituted by a thymine, leading to an amino acid change at protein position 67 from serine (TCC) to phenylalanine (TTC).
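
    The codon change reported in this abstract can be checked against the standard genetic code. The sketch below is illustrative only: the two-entry codon table and the helper name `substitute` are assumptions made for this example and are not part of the typing workflow described in the record.

```python
# Subset of the standard genetic code covering the two codons named in the abstract.
CODON_TABLE = {"TCC": "Ser", "TTC": "Phe"}

def substitute(codon: str, offset: int, new_base: str) -> str:
    """Return `codon` with the base at 0-based `offset` replaced by `new_base`."""
    return codon[:offset] + new_base + codon[offset + 1:]

reference_codon = "TCC"
variant_codon = substitute(reference_codon, 1, "T")  # C>T at the second codon position

print(f"{reference_codon} ({CODON_TABLE[reference_codon]}) -> "
      f"{variant_codon} ({CODON_TABLE[variant_codon]})")
```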

  9. Frequency of pathogenic germline mutation in CHEK2, PALB2, MRE11, and RAD50 in patients at high risk for hereditary breast cancer.

    PubMed

    Kim, Haeyoung; Cho, Dae-Yeon; Choi, Doo Ho; Oh, Mijin; Shin, Inkyung; Park, Won; Huh, Seung Jae; Nam, Seok Jin; Lee, Jeong Eon; Kim, Seok Won

    2017-01-01

    This study was performed to evaluate the frequency of mutations in CHEK2, PALB2, MRE11, and RAD50 among Korean patients at high risk for hereditary breast cancer. A total of 235 Korean patients with hereditary breast cancer who tested negative for BRCA1/2 mutation were enrolled in this study. Entire coding regions of CHEK2, PALB2, MRE11, and RAD50 were analyzed using massively parallel sequencing (MPS). Sequence variants detected by MPS were confirmed by Sanger sequencing. Six patients (2.5 %) were found to have pathogenic variants in CHEK2 (n = 1), PALB2 (n = 2), MRE11 (n = 1), and RAD50 (n = 2). Among the pathogenic variants, PALB2 c.2257C>T was previously reported in other studies, while CHEK2 c.1245dupC, PALB2 c.1048C>T, MRE11 c.1773_1774delAA, RAD50 c.1276C>T, and RAD50 c.3811_3813delGAA were newly identified in this study. A total of 15 missense variants were found in the four genes among 26 patients; 7 patients had a variant in CHEK2, 11 in PALB2, 2 in MRE11, and 6 in RAD50. When in silico analyses were performed on the 15 missense variants, six variants (CHEK2 c.686A>G, PALB2 c.1492G>T, PALB2 c.3054G>C, MRE11 c.140C>T, RAD50 c.1456C>T, and RAD50 c.3790C>T) were predicted to be deleterious. Pathogenic variants in CHEK2, PALB2, MRE11, and RAD50 were detected in a small proportion of Korean patients with features of hereditary breast cancer.

  10. Genome sequencing of idiopathic pulmonary fibrosis in conjunction with a medical school human anatomy course.

    PubMed

    Kumar, Akash; Dougherty, Max; Findlay, Gregory M; Geisheker, Madeleine; Klein, Jason; Lazar, John; Machkovech, Heather; Resnick, Jesse; Resnick, Rebecca; Salter, Alexander I; Talebi-Liasi, Faezeh; Arakawa, Christopher; Baudin, Jacob; Bogaard, Andrew; Salesky, Rebecca; Zhou, Qian; Smith, Kelly; Clark, John I; Shendure, Jay; Horwitz, Marshall S

    2014-01-01

    Even in cases where there is no obvious family history of disease, genome sequencing may contribute to clinical diagnosis and management. Clinical application of the genome has not yet become routine, however, in part because physicians are still learning how best to utilize such information. As an educational research exercise performed in conjunction with our medical school human anatomy course, we explored the potential utility of determining the whole genome sequence of a patient who had died following a clinical diagnosis of idiopathic pulmonary fibrosis (IPF). Medical students performed dissection and whole genome sequencing of the cadaver. Gross and microscopic findings were more consistent with the fibrosing variant of nonspecific interstitial pneumonia (NSIP), as opposed to IPF per se. Variants in genes causing Mendelian disorders predisposing to IPF were not detected. However, whole genome sequencing identified several common variants associated with IPF, including a single nucleotide polymorphism (SNP), rs35705950, located in the promoter region of the gene encoding mucin glycoprotein MUC5B. The MUC5B promoter polymorphism was recently found to markedly elevate risk for IPF, though a particular association with NSIP has not been previously reported, nor has its contribution to disease risk previously been evaluated in the genome-wide context of all genetic variants. We did not identify additional predicted functional variants in a region of linkage disequilibrium (LD) adjacent to MUC5B, nor did we discover other likely risk-contributing variants elsewhere in the genome. Whole genome sequencing thus corroborates the association of rs35705950 with MUC5B dysregulation and interstitial lung disease. This novel exercise additionally served a unique mission in bridging clinical and basic science education.

  11. Biallelic germline and somatic mutations in malignant mesothelioma: multiple mutations in transcription regulators including mSWI/SNF genes.

    PubMed

    Yoshikawa, Yoshie; Sato, Ayuko; Tsujimura, Tohru; Otsuki, Taiichiro; Fukuoka, Kazuya; Hasegawa, Seiki; Nakano, Takashi; Hashimoto-Tamaoki, Tomoko

    2015-02-01

    We detected low levels of acetylation for histone H3 tail lysines in malignant mesothelioma (MM) cell lines resistant to histone deacetylase inhibitors. To identify the possible genetic causes related to the low histone acetylation levels, whole-exome sequencing was conducted with MM cell lines established from eight patients. A mono-allelic variant of BRD1 was common to two MM cell lines with very low acetylation levels. We identified 318 homozygous protein-damaging variants/mutations (18-78 variants/mutations per patient); annotation analysis showed enrichment of the molecules associated with mammalian SWI/SNF (mSWI/SNF) chromatin remodeling complexes and co-activators that facilitate initiation of transcription. In seven of the patients, we detected a combination of variants in histone modifiers or transcription factors/co-factors, in addition to variants in mSWI/SNF. Direct sequencing showed that homozygous mutations in SMARCA4, PBRM1 and ARID2 were somatic. In one patient, homozygous germline variants were observed for SMARCC1 and SETD2 in chr3p22.1-3p14.2. These exhibited extended germline homozygosity and were in regions containing somatic mutations, leading to a loss of BAP1 and PBRM1 expression in MM cell line. Most protein-damaging variants were heterozygous in normal tissues. Heterozygous germline variants were often converted into hemizygous variants by mono-allelic deletion, and were rarely homozygous because of acquired uniparental disomy. Our findings imply that MM might develop through the somatic inactivation of mSWI/SNF complex subunits and/or histone modifiers, including BAP1, in subjects that have rare germline variants of these transcription regulators and/or transcription factors/co-factors, and in regions prone to mono-allelic deletion during oncogenesis. © 2014 UICC.

  12. A novel EML4-ALK variant: exon 6 of EML4 fused to exon 19 of ALK.

    PubMed

    Penzel, Roland; Schirmacher, Peter; Warth, Arne

    2012-07-01

    Cytotoxic chemotherapy remains the mainstay of treatment for most patients with advanced disease. Recently, anaplastic lymphoma kinase (ALK) expression as a major target for successful treatment with ALK inhibitors was detected in a subset of non-small-cell lung carcinomas, usually as a result of echinoderm microtubule-associated protein-like 4 (EML4)-ALK rearrangements. Although the chromosomal breakpoint within the EML4 gene varied, the breakpoint within ALK was most frequently reported within intron 19 or rarely in exon 20. Therefore, the different EML4-ALK variants so far contain the same 3' portion of ALK starting with exon 20. Here, we report a novel EML4-ALK variant detected by reverse transcription polymerase chain reaction analysis. Subsequent sequencing revealed an EML4-ALK fusion variant in which exon 6 of EML4 was fused to exon 19 of ALK. It occurred in a predominant solid pulmonary adenocarcinoma of a 65-year-old woman with a clear split signal of ALK in fluorescence in situ hybridization analysis and a weakly homogeneous ALK expression in immunohistochemical staining. Because of the growing number of fusion variants a primary reverse transcription polymerase chain reaction-based screening for ALK-positive non-small-cell lung carcinoma patients may not be sufficient for predictive diagnostics but transcript-based approaches and sequencing of ALK fusion variants might finally contribute to an optimized selection of patients.

  13. Population sequencing reveals breed and sub-species specific CNVs in cattle

    USDA-ARS's Scientific Manuscript database

    Individualized copy number variation (CNV) maps have highlighted the need for population surveys of cattle to detect the rare and common variants. While SNP and comparative genomic hybridization (CGH) arrays have provided preliminary data, next-generation sequence (NGS) data analysis offers an incre...

  14. Multiplex Amplification Refractory Mutation System PCR (ARMS-PCR) provides sequencing independent typing of canine parvovirus.

    PubMed

    Chander, Vishal; Chakravarti, Soumendu; Gupta, Vikas; Nandi, Sukdeb; Singh, Mithilesh; Badasara, Surendra Kumar; Sharma, Chhavi; Mittal, Mitesh; Dandapat, S; Gupta, V K

    2016-12-01

    Canine parvovirus-2 antigenic variants (CPV-2a, CPV-2b and CPV-2c), ubiquitously distributed worldwide in the canine population, cause severe fatal gastroenteritis. Antigenic typing of CPV-2 remains a prime focus of research groups worldwide in understanding the disease epidemiology and virus evolution. The present study was thus envisioned to provide a simple, sequencing-independent, rapid, robust, specific, user-friendly technique for detecting and typing presently circulating CPV-2 antigenic variants. An ARMS-PCR strategy was employed using specific primers for CPV-2a, CPV-2b and CPV-2c to differentiate these antigenic types. ARMS-PCR was initially optimized with reference positive controls in two steps: the first reaction was used to differentiate CPV-2a from CPV-2b/CPV-2c, and the second reaction was carried out with CPV-2c-specific primers to confirm the presence of CPV-2c. Initial validation of the ARMS-PCR was carried out with 24 sequenced samples and the results were matched with the sequencing results. The ARMS-PCR technique was further used to screen and type 90 suspected clinical samples. Fifteen randomly selected suspected clinical samples that were typed with this technique were sequenced. The results of ARMS-PCR and the sequencing matched exactly with each other. The developed technique has the potential to become a sequencing-independent method for simultaneous detection and typing of CPV-2 antigenic variants in veterinary disease diagnostic laboratories globally. Copyright © 2016 Elsevier B.V. All rights reserved.

  15. Mutations in PIGY: expanding the phenotype of inherited glycosylphosphatidylinositol deficiencies

    PubMed Central

    Ilkovski, Biljana; Pagnamenta, Alistair T.; O'Grady, Gina L.; Kinoshita, Taroh; Howard, Malcolm F.; Lek, Monkol; Thomas, Brett; Turner, Anne; Christodoulou, John; Sillence, David; Knight, Samantha J.L.; Popitsch, Niko; Keays, David A.; Anzilotti, Consuelo; Goriely, Anne; Waddell, Leigh B.; Brilot, Fabienne; North, Kathryn N.; Kanzawa, Noriyuki; Macarthur, Daniel G.; Taylor, Jenny C.; Kini, Usha; Murakami, Yoshiko; Clarke, Nigel F.

    2015-01-01

    Glycosylphosphatidylinositol (GPI)-anchored proteins are ubiquitously expressed in the human body and are important for various functions at the cell surface. Mutations in many GPI biosynthesis genes have been described to date in patients with multi-system disease and together these constitute a subtype of congenital disorders of glycosylation. We used whole exome sequencing in two families to investigate the genetic basis of disease and used RNA and cellular studies to investigate the functional consequences of sequence variants in the PIGY gene. Two families with different phenotypes had homozygous recessive sequence variants in the GPI biosynthesis gene PIGY. Two sisters with c.137T>C (p.Leu46Pro) PIGY variants had multi-system disease including dysmorphism, seizures, severe developmental delay, cataracts and early death. There were significantly reduced levels of GPI-anchored proteins (CD55 and CD59) on the surface of patient-derived skin fibroblasts (∼20–50% compared with controls). In a second, consanguineous family, two siblings had moderate development delay and microcephaly. A homozygous PIGY promoter variant (c.-540G>A) was detected within a 7.7 Mb region of autozygosity. This variant was predicted to disrupt a SP1 consensus binding site and was shown to be associated with reduced gene expression. Mutations in PIGY can occur in coding and non-coding regions of the gene and cause variable phenotypes. This article contributes to understanding of the range of disease phenotypes and disease genes associated with deficiencies of the GPI-anchor biosynthesis pathway and also serves to highlight the potential importance of analysing variants detected in 5′-UTR regions despite their typically low coverage in exome data. PMID:26293662

  16. IMPACT: a whole-exome sequencing analysis pipeline for integrating molecular profiles with actionable therapeutics in clinical samples

    PubMed Central

    Hintzsche, Jennifer; Kim, Jihye; Yadav, Vinod; Amato, Carol; Robinson, Steven E; Seelenfreund, Eric; Shellman, Yiqun; Wisell, Joshua; Applegate, Allison; McCarter, Martin; Box, Neil; Tentler, John; De, Subhajyoti

    2016-01-01

    Objective Currently, there is a disconnect between finding a patient’s relevant molecular profile and predicting actionable therapeutics. Here we develop and implement the Integrating Molecular Profiles with Actionable Therapeutics (IMPACT) analysis pipeline, linking variants detected from whole-exome sequencing (WES) to actionable therapeutics. Methods and materials The IMPACT pipeline contains 4 analytical modules: detecting somatic variants, calling copy number alterations, predicting drugs against deleterious variants, and analyzing tumor heterogeneity. We tested the IMPACT pipeline on whole-exome sequencing data in The Cancer Genome Atlas (TCGA) lung adenocarcinoma samples with known EGFR mutations. We also used IMPACT to analyze melanoma patient tumor samples before treatment, after BRAF-inhibitor treatment, and after BRAF- and MEK-inhibitor treatment. Results IMPACT correctly identified known EGFR mutations in the TCGA lung adenocarcinoma samples. IMPACT linked these EGFR mutations to the appropriate Food and Drug Administration (FDA)-approved EGFR inhibitors. For the melanoma patient samples, we identified NRAS p.Q61K as an acquired resistance mutation to BRAF-inhibitor treatment. We also identified CDKN2A deletion as a novel acquired resistance mutation to BRAFi/MEKi inhibition. The IMPACT analysis pipeline predicts these somatic variants to actionable therapeutics. We observed the clonal dynamic in the tumor samples after various treatments. We showed that IMPACT not only helped in successful prioritization of clinically relevant variants but also linked these variations to possible targeted therapies. Conclusion IMPACT provides a new bioinformatics strategy to delineate candidate somatic variants and actionable therapies. This approach can be applied to other patient tumor samples to discover effective drug targets for personalized medicine. IMPACT is publicly available at http://tanlab.ucdenver.edu/IMPACT. PMID:27026619

  17. Mutations in PIGY: expanding the phenotype of inherited glycosylphosphatidylinositol deficiencies.

    PubMed

    Ilkovski, Biljana; Pagnamenta, Alistair T; O'Grady, Gina L; Kinoshita, Taroh; Howard, Malcolm F; Lek, Monkol; Thomas, Brett; Turner, Anne; Christodoulou, John; Sillence, David; Knight, Samantha J L; Popitsch, Niko; Keays, David A; Anzilotti, Consuelo; Goriely, Anne; Waddell, Leigh B; Brilot, Fabienne; North, Kathryn N; Kanzawa, Noriyuki; Macarthur, Daniel G; Taylor, Jenny C; Kini, Usha; Murakami, Yoshiko; Clarke, Nigel F

    2015-11-01

    Glycosylphosphatidylinositol (GPI)-anchored proteins are ubiquitously expressed in the human body and are important for various functions at the cell surface. Mutations in many GPI biosynthesis genes have been described to date in patients with multi-system disease and together these constitute a subtype of congenital disorders of glycosylation. We used whole exome sequencing in two families to investigate the genetic basis of disease and used RNA and cellular studies to investigate the functional consequences of sequence variants in the PIGY gene. Two families with different phenotypes had homozygous recessive sequence variants in the GPI biosynthesis gene PIGY. Two sisters with c.137T>C (p.Leu46Pro) PIGY variants had multi-system disease including dysmorphism, seizures, severe developmental delay, cataracts and early death. There were significantly reduced levels of GPI-anchored proteins (CD55 and CD59) on the surface of patient-derived skin fibroblasts (∼20-50% compared with controls). In a second, consanguineous family, two siblings had moderate development delay and microcephaly. A homozygous PIGY promoter variant (c.-540G>A) was detected within a 7.7 Mb region of autozygosity. This variant was predicted to disrupt a SP1 consensus binding site and was shown to be associated with reduced gene expression. Mutations in PIGY can occur in coding and non-coding regions of the gene and cause variable phenotypes. This article contributes to understanding of the range of disease phenotypes and disease genes associated with deficiencies of the GPI-anchor biosynthesis pathway and also serves to highlight the potential importance of analysing variants detected in 5'-UTR regions despite their typically low coverage in exome data. © The Author 2015. Published by Oxford University Press.

  18. Extreme-phenotype genome-wide association study (XP-GWAS): a method for identifying trait-associated variants by sequencing pools of individuals selected from a diversity panel.

    PubMed

    Yang, Jinliang; Jiang, Haiying; Yeh, Cheng-Ting; Yu, Jianming; Jeddeloh, Jeffrey A; Nettleton, Dan; Schnable, Patrick S

    2015-11-01

    Although approaches for performing genome-wide association studies (GWAS) are well developed, conventional GWAS requires high-density genotyping of large numbers of individuals from a diversity panel. Here we report a method for performing GWAS that does not require genotyping of large numbers of individuals. Instead XP-GWAS (extreme-phenotype GWAS) relies on genotyping pools of individuals from a diversity panel that have extreme phenotypes. This analysis measures allele frequencies in the extreme pools, enabling discovery of associations between genetic variants and traits of interest. This method was evaluated in maize (Zea mays) using the well-characterized kernel row number trait, which was selected to enable comparisons between the results of XP-GWAS and conventional GWAS. An exome-sequencing strategy was used to focus sequencing resources on genes and their flanking regions. A total of 0.94 million variants were identified and served as evaluation markers; comparisons among pools showed that 145 of these variants were statistically associated with the kernel row number phenotype. These trait-associated variants were significantly enriched in regions identified by conventional GWAS. XP-GWAS was able to resolve several linked QTL and detect trait-associated variants within a single gene under a QTL peak. XP-GWAS is expected to be particularly valuable for detecting genes or alleles responsible for quantitative variation in species for which extensive genotyping resources are not available, such as wild progenitors of crops, orphan crops, and other poorly characterized species such as those of ecological interest. © 2015 The Authors The Plant Journal published by Society for Experimental Biology and John Wiley & Sons Ltd.
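
    The idea of measuring allele frequencies in extreme-phenotype pools can be sketched from read counts. This is a minimal illustration under stated assumptions, not the XP-GWAS statistical model: the function names, the read counts and the use of a raw frequency difference (rather than a formal test) are all hypothetical.

```python
from typing import Tuple

def allele_frequency(alt_reads: int, total_reads: int) -> float:
    """Estimate the alternate-allele frequency in a pool from sequencing read counts."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads

def frequency_shift(low_pool: Tuple[int, int], high_pool: Tuple[int, int]) -> float:
    """Difference in alternate-allele frequency between the high and low phenotype pools."""
    return allele_frequency(*high_pool) - allele_frequency(*low_pool)

# Hypothetical (alt_reads, total_reads) at one variant site in each extreme pool.
low_pool = (12, 120)   # pool of individuals with low trait values
high_pool = (66, 110)  # pool of individuals with high trait values

shift = frequency_shift(low_pool, high_pool)
print(f"allele-frequency shift between pools: {shift:.2f}")
```

    A large shift at a site only flags it as a candidate; deciding whether it is statistically associated with the trait requires a test that accounts for pool size and sequencing depth.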

  19. IMPACT: a whole-exome sequencing analysis pipeline for integrating molecular profiles with actionable therapeutics in clinical samples.

    PubMed

    Hintzsche, Jennifer; Kim, Jihye; Yadav, Vinod; Amato, Carol; Robinson, Steven E; Seelenfreund, Eric; Shellman, Yiqun; Wisell, Joshua; Applegate, Allison; McCarter, Martin; Box, Neil; Tentler, John; De, Subhajyoti; Robinson, William A; Tan, Aik Choon

    2016-07-01

    Currently, there is a disconnect between finding a patient's relevant molecular profile and predicting actionable therapeutics. Here we develop and implement the Integrating Molecular Profiles with Actionable Therapeutics (IMPACT) analysis pipeline, linking variants detected from whole-exome sequencing (WES) to actionable therapeutics. The IMPACT pipeline contains 4 analytical modules: detecting somatic variants, calling copy number alterations, predicting drugs against deleterious variants, and analyzing tumor heterogeneity. We tested the IMPACT pipeline on whole-exome sequencing data in The Cancer Genome Atlas (TCGA) lung adenocarcinoma samples with known EGFR mutations. We also used IMPACT to analyze melanoma patient tumor samples before treatment, after BRAF-inhibitor treatment, and after BRAF- and MEK-inhibitor treatment. IMPACT correctly identified known EGFR mutations in the TCGA lung adenocarcinoma samples. IMPACT linked these EGFR mutations to the appropriate Food and Drug Administration (FDA)-approved EGFR inhibitors. For the melanoma patient samples, we identified NRAS p.Q61K as an acquired resistance mutation to BRAF-inhibitor treatment. We also identified CDKN2A deletion as a novel acquired resistance mutation to BRAFi/MEKi inhibition. The IMPACT analysis pipeline predicts these somatic variants to actionable therapeutics. We observed the clonal dynamic in the tumor samples after various treatments. We showed that IMPACT not only helped in successful prioritization of clinically relevant variants but also linked these variations to possible targeted therapies. IMPACT provides a new bioinformatics strategy to delineate candidate somatic variants and actionable therapies. This approach can be applied to other patient tumor samples to discover effective drug targets for personalized medicine. IMPACT is publicly available at http://tanlab.ucdenver.edu/IMPACT. © The Author 2016. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  20. Genotyping of 25 leukemia-associated genes in a single work flow by next-generation sequencing technology with low amounts of input template DNA.

    PubMed

    Rinke, Jenny; Schäfer, Vivien; Schmidt, Mathias; Ziermann, Janine; Kohlmann, Alexander; Hochhaus, Andreas; Ernst, Thomas

    2013-08-01

    We sought to establish a convenient, sensitive next-generation sequencing (NGS) method for genotyping the 26 most commonly mutated leukemia-associated genes in a single work flow and to optimize this method for low amounts of input template DNA. We designed 184 PCR amplicons that cover all of the candidate genes. NGS was performed with genomic DNA (gDNA) from a cohort of 10 individuals with chronic myelomonocytic leukemia. The results were compared with NGS data obtained from sequencing of DNA generated by whole-genome amplification (WGA) of 20 ng template gDNA. Differences between gDNA and WGA samples in variant frequencies were determined for 2 different WGA kits. For gDNA samples, 25 of 26 genes were successfully sequenced with a sensitivity of 5%, which was achieved by a median coverage of 492 reads (range, 308-636 reads) per amplicon. We identified 24 distinct mutations in 11 genes. With WGA samples, we reliably detected all mutations above 5% sensitivity with a median coverage of 506 reads (range, 256-653 reads) per amplicon. With all variants included in the analysis, WGA amplification by the 2 kits tested yielded differences in variant frequencies that ranged from -28.19% to +9.94% [mean (SD) difference, -0.2% (4.08%)] and from -35.03% to +18.67% [mean difference, -0.75% (5.12%)]. Our method permits simultaneous analysis of a wide range of leukemia-associated target genes in a single sequencing run. NGS can be performed after WGA of template DNA for reliable detection of variants without introducing appreciable bias.
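
    To put the coverage figures in this abstract in perspective, the short calculation below estimates how many reads would be expected to carry a variant at the 5% level. It is a rough sketch: it assumes "sensitivity of 5%" means a 5% variant read fraction, ignores sampling noise and sequencing error, and the function name is hypothetical.

```python
def expected_variant_reads(coverage: int, variant_fraction: float) -> float:
    """Expected number of reads carrying a variant present at `variant_fraction` of reads."""
    return coverage * variant_fraction

VARIANT_FRACTION = 0.05  # the 5% level quoted in the abstract

# Per-amplicon coverage values quoted in the abstract.
coverages = {
    "gDNA median": 492,
    "gDNA minimum": 308,
    "WGA median": 506,
    "WGA minimum": 256,
}

for label, coverage in coverages.items():
    reads = expected_variant_reads(coverage, VARIANT_FRACTION)
    print(f"{label}: {coverage} reads -> about {reads:.1f} variant reads")
```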

  1. BEST1 sequence variants in Italian patients with vitelliform macular dystrophy

    PubMed Central

    Sodi, Andrea; Passerini, Ilaria; Caputo, Roberto; Bacci, Giacomo Maria; Bodoj, Mirela; Torricelli, Francesca; Menchini, Ugo

    2012-01-01

    Purpose To analyze the spectrum of sequence variants in the BEST1 gene in a group of Italian patients affected by Best vitelliform macular dystrophy (VMD). Methods Thirty Italian patients with a diagnosis of VMD and 20 clinically healthy relatives were recruited. They belonged to 19 Italian families predominantly originating from central Italy. They received a standard ophthalmologic examination, OCT scan, and electrophysiological tests (ERG and EOG). Fluorescein and ICG angiographies and fundus autofluorescence imaging were performed in selected cases. DNA samples were analyzed for sequence variants of the BEST1 gene by direct sequencing techniques. Results Nine missense variants and one deletion were found in the affected patients; each patient carried one mutation. Five variants [c.73C>T (p.Arg25Trp), c.652C>T (p.Arg218Cys), c.652C>G (p.Arg218Gly), c.728C>T (p.Ala243Val), c.893T>C (p.Phe298Ser)] have already been described in literature while another five variants [c.217A>C (p.Ile73Leu), c.239T>G (p.Phe80Cys), c.883_885del (p.Ile295del), c.907G>A (p.Asp303Asn), c.911A>G (p.Asp304Gly)] had not previously been reported. Affected patients, sometimes even from the same family, occasionally showed variable phenotypes. One heterozygous variant was also found in five clinically healthy relatives with normal fundus, visual acuity and ERG but with abnormal EOG. Conclusions Ten variants in the BEST1 gene were detected in a group of individuals with clinically apparent VMD, and in some clinically normal individuals with an abnormal EOG. The high prevalence of novel variants and the frequent report of a specific variant (p.Arg25Trp) that has rarely been described in other ethnic groups suggests a distribution of BEST1 variants peculiar to Italian VMD patients. PMID:23213274

  2. Targeted next generation sequencing identified a novel mutation in MYO7A causing Usher syndrome type 1 in an Iranian consanguineous pedigree.

    PubMed

    Kooshavar, Daniz; Razipour, Masoumeh; Movasat, Morteza; Keramatipour, Mohammad

    2018-01-01

    Usher syndrome (USH) is characterized by congenital hearing loss and retinitis pigmentosa (RP) with a later onset. It is an autosomal recessive trait with clinical and genetic heterogeneity, which makes molecular diagnosis difficult. In this study, we introduce a pedigree with two affected members with USH type 1 and present a cost- and time-effective approach for genetic diagnosis of USH as a genetically heterogeneous disorder. Target region capture in the genes of interest, followed by next generation sequencing (NGS), was used to determine the causative mutations in one of the probands. Then segregation analysis in the pedigree was conducted using PCR-Sanger sequencing. Targeted NGS detected a novel homozygous nonsense variant c.4513G>T (p.Glu1505Ter) in MYO7A. The variant is segregating in the pedigree with an autosomal recessive pattern. In this study, a novel stop-gained variant c.4513G>T (p.Glu1505Ter) in MYO7A was found in an Iranian pedigree with two affected members with USH type 1. Bioinformatic as well as pedigree segregation analyses were in line with the pathogenic nature of this variant. The targeted NGS panel was shown to be an efficient method for mutation detection in hereditary disorders with locus heterogeneity. Copyright © 2017 Elsevier B.V. All rights reserved.

  3. Mutation Spectrum of the ABCA4 Gene in a Greek Cohort with Stargardt Disease: Identification of Novel Mutations and Evidence of Three Prevalent Mutated Alleles

    PubMed Central

    Vassiliki, Kokkinou; George, Koutsodontis; Polixeni, Stamatiou; Christoforos, Giatzakis; Minas, Aslanides Ioannis; Stavrenia, Koukoula; Ioannis, Datseris

    2018-01-01

    Aim To evaluate the frequency and pattern of disease-associated mutations of ABCA4 gene among Greek patients with presumed Stargardt disease (STGD1). Materials and Methods A total of 59 patients were analyzed for ABCA4 mutations using the ABCR400 microarray and PCR-based sequencing of all coding exons and flanking intronic regions. MLPA analysis as well as sequencing of two regions in introns 30 and 36 reported earlier to harbor deep intronic disease-associated variants was used in 4 selected cases. Results An overall detection rate of at least one mutant allele was achieved in 52 of the 59 patients (88.1%). Direct sequencing improved significantly the complete characterization rate, that is, identification of two mutations compared to the microarray analysis (93.1% versus 50%). In total, 40 distinct potentially disease-causing variants of the ABCA4 gene were detected, including six previously unreported potentially pathogenic variants. Among the disease-causing variants, in this cohort, the most frequent was c.5714+5G>A representing 16.1%, while p.Gly1961Glu and p.Leu541Pro represented 15.2% and 8.5%, respectively. Conclusions By using a combination of methods, we completely molecularly diagnosed 48 of the 59 patients studied. In addition, we identified six previously unreported, potentially pathogenic ABCA4 mutations. PMID:29854428

  4. Structural analysis of an HLA-B27 functional variant, B27d detected in American blacks

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rojo, S.; Aparicio, P.; Hansen, J.A.

    1987-11-15

    The structure of a new functional variant, B27d, has been established by comparative peptide mapping and radiochemical sequencing. This analysis completes the structural characterization of the six known histocompatibility leukocyte antigen (HLA)-B27 subtypes. The only detected amino acid change between the main HLA-B27.1 subtype and B27d is that of Tyr59 to His59. Position 59 has not been previously found to vary among class I HLA or H-2 antigens. Such substitution accounts for the reported isoelectric focusing pattern of this variant. HLA-B27d is the only B27 variant found to differ from other subtypes by a single amino acid replacement. The nature of the change is compatible with its origin by a point mutation from HLA-B27.1. Because B27d was found only in American blacks and in no other ethnic groups, it is suggested that this variant originated as a result of a mutation of the B27.1 gene that occurred within the black population. Structural analysis of B27d was done by comparative mapping. Radiochemical sequencing was carried out with 14C-labeled and 3H-labeled amino acids.

  5. Current state-of-art of STR sequencing in forensic genetics.

    PubMed

    Alonso, Antonio; Barrio, Pedro A; Müller, Petra; Köcher, Steffi; Berger, Burkhard; Martin, Pablo; Bodner, Martin; Willuweit, Sascha; Parson, Walther; Roewer, Lutz; Budowle, Bruce

    2018-05-11

    The current state of validation and implementation strategies of MPS technology for the analysis of STR markers for forensic genetics use is described, covering the topics of the current catalogue of commercial MPS-STR panels, leading MPS-platforms, and MPS-STR data analysis tools. In addition, the developmental and internal validation studies carried out to date to evaluate reliability, sensitivity, mixture analysis, concordance, and the ability to analyze challenged samples are summarized. The results of various MPS-STR population studies that showed a large number of new STR sequence variants that increase the power of discrimination in several forensically-relevant loci are also presented. Finally, various initiatives developed by several international projects and standardization (or guidelines) groups to facilitate application of MPS technology for STR marker analyses are discussed in regard to promoting a standard STR sequence nomenclature, performing population studies to detect sequence variants, and developing a universal system to translate sequence variants into a simple STR nomenclature (numbers and letters) compatible with national STR databases. This article is protected by copyright. All rights reserved.

  6. A note on the efficiencies of sampling strategies in two-stage Bayesian regional fine mapping of a quantitative trait.

    PubMed

    Chen, Zhijian; Craiu, Radu V; Bull, Shelley B

    2014-11-01

    In focused studies designed to follow up associations detected in a genome-wide association study (GWAS), investigators can proceed to fine-map a genomic region by targeted sequencing or dense genotyping of all variants in the region, aiming to identify a functional sequence variant. For the analysis of a quantitative trait, we consider a Bayesian approach to fine-mapping study design that incorporates stratification according to a promising GWAS tag SNP in the same region. Improved cost-efficiency can be achieved when the fine-mapping phase incorporates a two-stage design, with identification of a smaller set of more promising variants in a subsample taken in stage 1, followed by their evaluation in an independent stage 2 subsample. To avoid the potential negative impact of genetic model misspecification on inference we incorporate genetic model selection based on posterior probabilities for each competing model. Our simulation study shows that, compared to simple random sampling that ignores genetic information from GWAS, tag-SNP-based stratified sample allocation methods reduce the number of variants continuing to stage 2 and are more likely to promote the functional sequence variant into confirmation studies. © 2014 WILEY PERIODICALS, INC.

  7. Middle East Respiratory Syndrome Coronavirus Intra-Host Populations Are Characterized by Numerous High Frequency Variants

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Borucki, Monica K.; Lao, Victoria; Hwang, Mona

    Middle East respiratory syndrome coronavirus (MERS-CoV) is an emerging human pathogen related to SARS virus. In vitro studies indicate this virus may have a broad host range suggesting an increased pandemic potential. Genetic and epidemiological evidence indicate camels serve as a reservoir for MERS virus but the mechanism of cross species transmission is unclear and many questions remain regarding the susceptibility of humans to infection. Deep sequencing data was obtained from the nasal samples of three camels that had been experimentally infected with a human MERS-CoV isolate. A majority of the genome was covered and average coverage was greater than 12,000x depth. Although only 5 mutations were detected in the consensus sequences, 473 intrahost single nucleotide variants were identified. Lastly, many of these variants were present at high frequencies and could potentially influence viral phenotype and the sensitivity of detection assays that target these regions for primer or probe binding.

  8. Middle East Respiratory Syndrome Coronavirus Intra-Host Populations Are Characterized by Numerous High Frequency Variants

    DOE PAGES

    Borucki, Monica K.; Lao, Victoria; Hwang, Mona; ...

    2016-01-20

    Middle East respiratory syndrome coronavirus (MERS-CoV) is an emerging human pathogen related to SARS virus. In vitro studies indicate this virus may have a broad host range suggesting an increased pandemic potential. Genetic and epidemiological evidence indicate camels serve as a reservoir for MERS virus but the mechanism of cross species transmission is unclear and many questions remain regarding the susceptibility of humans to infection. Deep sequencing data was obtained from the nasal samples of three camels that had been experimentally infected with a human MERS-CoV isolate. A majority of the genome was covered and average coverage was greater than 12,000x depth. Although only 5 mutations were detected in the consensus sequences, 473 intrahost single nucleotide variants were identified. Lastly, many of these variants were present at high frequencies and could potentially influence viral phenotype and the sensitivity of detection assays that target these regions for primer or probe binding.

  9. Single nucleotide variants and InDels identified from whole-genome re-sequencing of Guzerat, Gyr, Girolando and Holstein cattle breeds.

    PubMed

    Stafuzza, Nedenia Bonvino; Zerlotini, Adhemar; Lobo, Francisco Pereira; Yamagishi, Michel Eduardo Beleza; Chud, Tatiane Cristina Seleguim; Caetano, Alexandre Rodrigues; Munari, Danísio Prado; Garrick, Dorian J; Machado, Marco Antonio; Martins, Marta Fonseca; Carvalho, Maria Raquel; Cole, John Bruce; Barbosa da Silva, Marcos Vinicius Gualberto

    2017-01-01

    Whole-genome re-sequencing, alignment and annotation analyses were undertaken for 12 sires representing four important cattle breeds in Brazil: Guzerat (multi-purpose), Gyr, Girolando and Holstein (dairy production). A total of approximately 4.3 billion reads from an Illumina HiSeq 2000 sequencer generated 10.7- to 16.4-fold genome coverage for each animal. A total of 27,441,279 single nucleotide variations (SNVs) and 3,828,041 insertions/deletions (InDels) were detected in the samples, of which 2,557,670 SNVs and 883,219 InDels were novel. The submission of these genetic variants to the dbSNP database significantly increased the number of known variants, particularly for the indicine genome. The concordance rate between genotypes obtained using the Bovine HD BeadChip array and the same variants identified by sequencing was about 99.05%. The annotation of variants identified numerous non-synonymous SNVs and frameshift InDels which could affect phenotypic variation. Functional enrichment analysis was performed and revealed that variants in the olfactory transduction pathway were overrepresented in all four cattle breeds, while the ECM-receptor interaction pathway was overrepresented in the Girolando and Guzerat breeds, the ABC transporters pathway was overrepresented only in the Holstein breed, and the metabolic pathways were overrepresented only in the Gyr breed. The genetic variants discovered here provide a rich resource to help identify potential genomic markers and their associated molecular mechanisms that impact economically important traits for Gyr, Girolando, Guzerat and Holstein breeding programs.

  10. Single nucleotide variants and InDels identified from whole-genome re-sequencing of Guzerat, Gyr, Girolando and Holstein cattle breeds

    PubMed Central

    Lobo, Francisco Pereira; Yamagishi, Michel Eduardo Beleza; Chud, Tatiane Cristina Seleguim; Caetano, Alexandre Rodrigues; Munari, Danísio Prado; Garrick, Dorian J.; Machado, Marco Antonio; Martins, Marta Fonseca; Carvalho, Maria Raquel; Cole, John Bruce; Barbosa da Silva, Marcos Vinicius Gualberto

    2017-01-01

    Whole-genome re-sequencing, alignment and annotation analyses were undertaken for 12 sires representing four important cattle breeds in Brazil: Guzerat (multi-purpose), Gyr, Girolando and Holstein (dairy production). A total of approximately 4.3 billion reads from an Illumina HiSeq 2000 sequencer generated 10.7- to 16.4-fold genome coverage for each animal. A total of 27,441,279 single nucleotide variations (SNVs) and 3,828,041 insertions/deletions (InDels) were detected in the samples, of which 2,557,670 SNVs and 883,219 InDels were novel. The submission of these genetic variants to the dbSNP database significantly increased the number of known variants, particularly for the indicine genome. The concordance rate between genotypes obtained using the Bovine HD BeadChip array and the same variants identified by sequencing was about 99.05%. The annotation of variants identified numerous non-synonymous SNVs and frameshift InDels which could affect phenotypic variation. Functional enrichment analysis was performed and revealed that variants in the olfactory transduction pathway were overrepresented in all four cattle breeds, while the ECM-receptor interaction pathway was overrepresented in the Girolando and Guzerat breeds, the ABC transporters pathway was overrepresented only in the Holstein breed, and the metabolic pathways were overrepresented only in the Gyr breed. The genetic variants discovered here provide a rich resource to help identify potential genomic markers and their associated molecular mechanisms that impact economically important traits for Gyr, Girolando, Guzerat and Holstein breeding programs. PMID:28323836

  11. ERASE-Seq: Leveraging replicate measurements to enhance ultralow frequency variant detection in NGS data

    PubMed Central

    Kamps-Hughes, Nick; McUsic, Andrew; Kurihara, Laurie; Harkins, Timothy T.; Pal, Prithwish; Ray, Claire

    2018-01-01

    The accurate detection of ultralow allele frequency variants in DNA samples is of interest in both research and medical settings, particularly in liquid biopsies where cancer mutational status is monitored from circulating DNA. Next-generation sequencing (NGS) technologies employing molecular barcoding have shown promise but significant sensitivity and specificity improvements are still needed to detect mutations in a majority of patients before the metastatic stage. To address this we present analytical validation data for ERASE-Seq (Elimination of Recurrent Artifacts and Stochastic Errors), a method for accurate and sensitive detection of ultralow frequency DNA variants in NGS data. ERASE-Seq differs from previous methods by creating a robust statistical framework to utilize technical replicates in conjunction with background error modeling, providing a 10 to 100-fold reduction in false positive rates compared to published molecular barcoding methods. ERASE-Seq was tested using spiked human DNA mixtures with clinically realistic DNA input quantities to detect SNVs and indels between 0.05% and 1% allele frequency, the range commonly found in liquid biopsy samples. Variants were detected with greater than 90% sensitivity and a false positive rate below 0.1 calls per 10,000 possible variants. The approach represents a significant performance improvement compared to molecular barcoding methods and does not require changing molecular reagents. PMID:29630678

  12. Molecular characterization of variant alpha-subunit of electron transfer flavoprotein in three patients with glutaric acidemia type II--and identification of glycine substitution for valine-157 in the sequence of the precursor, producing an unstable mature protein in a patient.

    PubMed Central

    Indo, Y; Glassberg, R; Yokota, I; Tanaka, K

    1991-01-01

    In our previous study of eight glutaric acidemia type II (GAII) fibroblast lines by using [35S]methionine labeling and immunoprecipitation, three of them had a defect in the synthesis of the alpha-subunit of electron transfer flavoprotein (alpha-ETF) (Ikeda et al. 1986). In one of them (YH1313) the labeling of the mature alpha-ETF was barely detectable, while that of the precursor (p) was stronger. In another (YH605) no synthesis of immunoreactive p alpha-ETF was detectable. In the third cell line (YH1391) the rate of variant p alpha-ETF synthesis was comparable to normal, but its electrophoretic mobility was slightly faster than normal. In the present study, the northern blot analysis revealed that all three mutant cell lines contained p alpha-ETF mRNA and that their size and amount were comparable to normal. In immunoblot analysis, both alpha- and beta-ETF bands were barely detectable in YH1313 and YH605 but were detectable in YH1391 in amounts comparable to normal. Sequencing of YH1313 p alpha-ETF cDNA via PCR identified a transversion of T-470 to G. We then devised a simple PCR method for the 119-bp section (T-443/G-561) for detecting this mutation. In the upstream primer, A-466 was artificially replaced with C, to introduce a BstNI site into the amplified copies in the presence of G-470 from the variant sequence. The genomic DNA analysis using this method demonstrated that YH1313 was homozygous for the T→G-470 transversion. It was not detected either in two other alpha-ETF-deficient GAII or in seven control cell lines. The alpha-ETF cDNA sequence in YH605 was identical to normal. PMID:1882842

  13. Low-abundance HIV drug-resistant viral variants in treatment-experienced persons correlate with historical antiretroviral use.

    PubMed

    Le, Thuy; Chiarella, Jennifer; Simen, Birgitte B; Hanczaruk, Bozena; Egholm, Michael; Landry, Marie L; Dieckhaus, Kevin; Rosen, Marc I; Kozal, Michael J

    2009-06-29

    It is largely unknown how frequently low-abundance HIV drug-resistant variants at levels under limit of detection of conventional genotyping (<20% of quasi-species) are present in antiretroviral-experienced persons experiencing virologic failure. Further, the clinical implications of low-abundance drug-resistant variants at time of virologic failure are unknown. Plasma samples from 22 antiretroviral-experienced subjects collected at time of virologic failure (viral load 1380 to 304,000 copies/mL) were obtained from a specimen bank (from 2004-2007). The prevalence and profile of drug-resistant mutations were determined using Sanger sequencing and ultra-deep pyrosequencing. Genotypes were interpreted using Stanford HIV database algorithm. Antiretroviral treatment histories were obtained by chart review and correlated with drug-resistant mutations. Low-abundance drug-resistant mutations were detected in all 22 subjects by deep sequencing and only in 3 subjects by Sanger sequencing. In total they accounted for 90 of 247 mutations (36%) detected by deep sequencing; the majority of these (95%) were not detected by standard genotyping. A mean of 4 additional mutations per subject were detected by deep sequencing (p<0.0001, 95%CI: 2.85-5.53). The additional low-abundance drug-resistant mutations increased a subject's genotypic resistance to one or more antiretrovirals in 17 of 22 subjects (77%). When correlated with subjects' antiretroviral treatment histories, the additional low-abundance drug-resistant mutations correlated with the failing antiretroviral drugs in 21% of subjects and correlated with historical antiretroviral use in 79% of subjects (OR, 13.73; 95% CI, 2.5-74.3, p = 0.0016). Low-abundance HIV drug-resistant mutations in antiretroviral-experienced subjects at time of virologic failure can increase a subject's overall burden of resistance, yet commonly go unrecognized by conventional genotyping. The majority of unrecognized resistant mutations correlate with historical antiretroviral use. Ultra-deep sequencing can provide important historical resistance information for clinicians when planning subsequent antiretroviral regimens for highly treatment-experienced patients, particularly when their prior treatment histories and longitudinal genotypes are not available.

  14. Low-Abundance HIV Drug-Resistant Viral Variants in Treatment-Experienced Persons Correlate with Historical Antiretroviral Use

    PubMed Central

    Le, Thuy; Chiarella, Jennifer; Simen, Birgitte B.; Hanczaruk, Bozena; Egholm, Michael; Landry, Marie L.; Dieckhaus, Kevin; Rosen, Marc I.; Kozal, Michael J.

    2009-01-01

    Background: It is largely unknown how frequently low-abundance HIV drug-resistant variants at levels under limit of detection of conventional genotyping (<20% of quasi-species) are present in antiretroviral-experienced persons experiencing virologic failure. Further, the clinical implications of low-abundance drug-resistant variants at time of virologic failure are unknown. Methodology/Principal Findings: Plasma samples from 22 antiretroviral-experienced subjects collected at time of virologic failure (viral load 1380 to 304,000 copies/mL) were obtained from a specimen bank (from 2004–2007). The prevalence and profile of drug-resistant mutations were determined using Sanger sequencing and ultra-deep pyrosequencing. Genotypes were interpreted using Stanford HIV database algorithm. Antiretroviral treatment histories were obtained by chart review and correlated with drug-resistant mutations. Low-abundance drug-resistant mutations were detected in all 22 subjects by deep sequencing and only in 3 subjects by Sanger sequencing. In total they accounted for 90 of 247 mutations (36%) detected by deep sequencing; the majority of these (95%) were not detected by standard genotyping. A mean of 4 additional mutations per subject were detected by deep sequencing (p<0.0001, 95%CI: 2.85–5.53). The additional low-abundance drug-resistant mutations increased a subject's genotypic resistance to one or more antiretrovirals in 17 of 22 subjects (77%). When correlated with subjects' antiretroviral treatment histories, the additional low-abundance drug-resistant mutations correlated with the failing antiretroviral drugs in 21% of subjects and correlated with historical antiretroviral use in 79% of subjects (OR, 13.73; 95% CI, 2.5–74.3, p = 0.0016). Conclusions/Significance: Low-abundance HIV drug-resistant mutations in antiretroviral-experienced subjects at time of virologic failure can increase a subject's overall burden of resistance, yet commonly go unrecognized by conventional genotyping. The majority of unrecognized resistant mutations correlate with historical antiretroviral use. Ultra-deep sequencing can provide important historical resistance information for clinicians when planning subsequent antiretroviral regimens for highly treatment-experienced patients, particularly when their prior treatment histories and longitudinal genotypes are not available. PMID:19562031

  15. The use of population-scale sequencing to identify CNVs impacting productive traits in different cattle breeds

    USDA-ARS's Scientific Manuscript database

    Individualized copy number variation (CNV) maps have highlighted the need for population surveys of cattle to detect rare and common variants. While SNP and comparative genomic hybridization (CGH) arrays have provided preliminary data, next-generation sequence (NGS) data analysis offers an increased...

  16. High-throughput interpretation of gene structure changes in human and nonhuman resequencing data, using ACE

    USDA-ARS's Scientific Manuscript database

    We describe a suite of software tools for identifying possible functional changes in gene structure that may result from sequence variants. ACE (“Assessing Changes to Exons”) converts phased genotype calls to a collection of explicit haplotype sequences, maps transcript annotations onto them, detect...

  17. LipidSeq: a next-generation clinical resequencing panel for monogenic dyslipidemias.

    PubMed

    Johansen, Christopher T; Dubé, Joseph B; Loyzer, Melissa N; MacDonald, Austin; Carter, David E; McIntyre, Adam D; Cao, Henian; Wang, Jian; Robinson, John F; Hegele, Robert A

    2014-04-01

    We report the design of a targeted resequencing panel for monogenic dyslipidemias, LipidSeq, for the purpose of replacing Sanger sequencing in the clinical detection of dyslipidemia-causing variants. We also evaluate the performance of the LipidSeq approach versus Sanger sequencing in 84 patients with a range of phenotypes including extreme blood lipid concentrations as well as additional dyslipidemias and related metabolic disorders. The panel performs well, with high concordance (95.2%) in samples with known mutations based on Sanger sequencing and a high detection rate (57.9%) of mutations likely to be causative for disease in samples not previously sequenced. Clinical implementation of LipidSeq has the potential to aid in the molecular diagnosis of patients with monogenic dyslipidemias with a high degree of speed and accuracy and at lower cost than either Sanger sequencing or whole exome sequencing. Furthermore, LipidSeq will help to provide a more focused picture of monogenic and polygenic contributors that underlie dyslipidemia while excluding the discovery of incidental pathogenic clinically actionable variants in nonmetabolism-related genes, such as oncogenes, that would otherwise be identified by a whole exome approach, thus minimizing potential ethical issues.

  18. LipidSeq: a next-generation clinical resequencing panel for monogenic dyslipidemias[S]

    PubMed Central

    Johansen, Christopher T.; Dubé, Joseph B.; Loyzer, Melissa N.; MacDonald, Austin; Carter, David E.; McIntyre, Adam D.; Cao, Henian; Wang, Jian; Robinson, John F.; Hegele, Robert A.

    2014-01-01

    We report the design of a targeted resequencing panel for monogenic dyslipidemias, LipidSeq, for the purpose of replacing Sanger sequencing in the clinical detection of dyslipidemia-causing variants. We also evaluate the performance of the LipidSeq approach versus Sanger sequencing in 84 patients with a range of phenotypes including extreme blood lipid concentrations as well as additional dyslipidemias and related metabolic disorders. The panel performs well, with high concordance (95.2%) in samples with known mutations based on Sanger sequencing and a high detection rate (57.9%) of mutations likely to be causative for disease in samples not previously sequenced. Clinical implementation of LipidSeq has the potential to aid in the molecular diagnosis of patients with monogenic dyslipidemias with a high degree of speed and accuracy and at lower cost than either Sanger sequencing or whole exome sequencing. Furthermore, LipidSeq will help to provide a more focused picture of monogenic and polygenic contributors that underlie dyslipidemia while excluding the discovery of incidental pathogenic clinically actionable variants in nonmetabolism-related genes, such as oncogenes, that would otherwise be identified by a whole exome approach, thus minimizing potential ethical issues. PMID:24503134

  19. A Window Into Clinical Next-Generation Sequencing-Based Oncology Testing Practices.

    PubMed

    Nagarajan, Rakesh; Bartley, Angela N; Bridge, Julia A; Jennings, Lawrence J; Kamel-Reid, Suzanne; Kim, Annette; Lazar, Alexander J; Lindeman, Neal I; Moncur, Joel; Rai, Alex J; Routbort, Mark J; Vasalos, Patricia; Merker, Jason D

    2017-12-01

    Detection of acquired variants in cancer is a paradigm of precision medicine, yet little has been reported about clinical laboratory practices across a broad range of laboratories. To use College of American Pathologists proficiency testing survey results to report on the results from surveys on next-generation sequencing-based oncology testing practices. College of American Pathologists proficiency testing survey results from more than 250 laboratories currently performing molecular oncology testing were used to determine laboratory trends in next-generation sequencing-based oncology testing. These presented data provide key information about the number of laboratories that currently offer or are planning to offer next-generation sequencing-based oncology testing. Furthermore, we present data from 60 laboratories performing next-generation sequencing-based oncology testing regarding specimen requirements and assay characteristics. The findings indicate that most laboratories are performing tumor-only targeted sequencing to detect single-nucleotide variants and small insertions and deletions, using desktop sequencers and predesigned commercial kits. Despite these trends, a diversity of approaches to testing exists. This information should be useful to further inform a variety of topics, including national discussions involving clinical laboratory quality systems, regulation and oversight of next-generation sequencing-based oncology testing, and precision oncology efforts in a data-driven manner.

  20. Hotspot mutations in cancer genes may be missed in routine diagnostics due to neighbouring sequence variants.

    PubMed

    Bartels, Stephan; Schipper, Elisa; Hasemeier, Britta; Kreipe, Hans; Lehmann, Ulrich

    2018-05-27

    The detection of hotspot mutations in key cancer genes is now an essential part of the diagnostic work-up in molecular pathology. Nearly all assays for mutation detection involve an amplification step. A second single nucleotide variant (SNV) on the same allele adjacent to a mutational hotspot can interfere with primer binding, leading to unnoticed allele-specific amplification of the wild type allele and thereby false-negative mutation testing. We present two diagnostic cases with false negative sequence results for JAK2 and SRSF2. In both cases mutations would have escaped detection if only one strand of DNA had been analysed. Because many commercially available diagnostic kits rely on the analysis of only one DNA strand they are prone to fail in cases like these. Detailed protocols and quality control measures to prevent corresponding pitfalls are presented. Copyright © 2017. Published by Elsevier Inc.
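
    The pitfall described above (a neighbouring SNV sitting under a primer) can be screened for with a simple interval check. The following is a minimal sketch, not the authors' protocol: the `Primer` type, the function name, and all coordinates are hypothetical and chosen for illustration only.

```python
from typing import List, NamedTuple, Tuple


class Primer(NamedTuple):
    name: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def variants_under_primer(
    primer: Primer, variants: List[Tuple[str, int]]
) -> List[Tuple[str, int]]:
    """Return the (chrom, pos) variants that fall inside the primer binding site."""
    return [
        (chrom, pos)
        for chrom, pos in variants
        if chrom == primer.chrom and primer.start <= pos <= primer.end
    ]


# Hypothetical coordinates, for illustration only.
primer = Primer("fwd_example", "chr9", 1000, 1020)
known_variants = [("chr9", 995), ("chr9", 1018), ("chr17", 1010)]
flagged = variants_under_primer(primer, known_variants)
print(flagged)
```

    Any flagged position would be a reason to inspect the assay design more closely, for example by analysing the opposite strand as the abstract suggests.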

  1. Single-cell whole exome and targeted sequencing in NPM1/FLT3 positive pediatric acute myeloid leukemia.

    PubMed

    Walter, Christiane; Pozzorini, Christian; Reinhardt, Katarina; Geffers, Robert; Xu, Zhenyu; Reinhardt, Dirk; von Neuhoff, Nils; Hanenberg, Helmut

    2018-02-01

    The small portion of leukemic stem cells (LSCs) in acute myeloid leukemia (AML) present in children and adolescents is often masked by the high background of AML blasts and normal hematopoietic cells. The aim of the current study was to establish a simple workflow for reliable genetic analysis of single LSC-enriched blasts from pediatric patients. For three AMLs with mutations in nucleophosmin 1 and/or fms-like tyrosine kinase 3, we performed whole genome amplification on sorted single-cell DNA followed by whole exome sequencing (WES). The corresponding bulk bone marrow DNAs were also analyzed by WES and by targeted sequencing (TS) that included 54 genes associated with myeloid malignancies. Analysis revealed that read coverage statistics were comparable between single-cell and bulk WES data, indicating high-quality whole genome amplification. From 102 single-cell variants, 72 single nucleotide variants and insertions or deletions (70%) were consistently found in the two bulk DNA analyses. Variants reliably detected in single cells were also present in TS. However, initial screening by WES with read counts between 50-72× failed to detect rare AML subclones in the bulk DNAs. In summary, our study demonstrated that single-cell WES combined with bulk DNA TS is a promising tool set for detecting AML subclones and possibly LSCs. © 2017 Wiley Periodicals, Inc.
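
    The observation that 50-72× coverage missed rare subclones is what a simple sampling model would predict. The sketch below assumes variant reads follow a binomial distribution and ignores sequencing error and amplification bias; the 1% allele fraction and the three-read calling threshold are illustrative assumptions, not values from the study.

```python
from math import comb


def prob_detect(depth: int, allele_fraction: float, min_alt_reads: int) -> float:
    """P(at least min_alt_reads variant reads) under Binomial(depth, allele_fraction)."""
    p_below = sum(
        comb(depth, k) * allele_fraction**k * (1 - allele_fraction) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_below


# Assumed values: 1% allele fraction, at least 3 supporting reads to call a variant.
for depth in (50, 72, 1000):
    print(depth, round(prob_detect(depth, 0.01, 3), 3))
```

    Under these assumptions the detection probability stays low at exome-level depth and approaches 1 only at the much higher depths typical of targeted sequencing.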

  2. G2S: a web-service for annotating genomic variants on 3D protein structures.

    PubMed

    Wang, Juexin; Sheridan, Robert; Sumer, S Onur; Schultz, Nikolaus; Xu, Dong; Gao, Jianjiong

    2018-06-01

    Accurately mapping and annotating genomic locations on 3D protein structures is a key step in structure-based analysis of genomic variants detected by recent large-scale sequencing efforts. There are several mapping resources currently available, but none of them provides a web API (Application Programming Interface) that supports programmatic access. We present G2S, a real-time web API that provides automated mapping of genomic variants on 3D protein structures. G2S can align genomic locations of variants, protein locations, or protein sequences to protein structures and retrieve the mapped residues from structures. The G2S API uses a REST-inspired design and can be used by various clients such as web browsers, command terminals, programming languages and other bioinformatics tools for bringing 3D structures into genomic variant analysis. The webserver and source code are freely available at https://g2s.genomenexus.org. Contact: g2s@genomenexus.org. Supplementary data are available at Bioinformatics online.

  3. Negligible impact of rare autoimmune-locus coding-region variants on missing heritability.

    PubMed

    Hunt, Karen A; Mistry, Vanisha; Bockett, Nicholas A; Ahmad, Tariq; Ban, Maria; Barker, Jonathan N; Barrett, Jeffrey C; Blackburn, Hannah; Brand, Oliver; Burren, Oliver; Capon, Francesca; Compston, Alastair; Gough, Stephen C L; Jostins, Luke; Kong, Yong; Lee, James C; Lek, Monkol; MacArthur, Daniel G; Mansfield, John C; Mathew, Christopher G; Mein, Charles A; Mirza, Muddassar; Nutland, Sarah; Onengut-Gumuscu, Suna; Papouli, Efterpi; Parkes, Miles; Rich, Stephen S; Sawcer, Steven; Satsangi, Jack; Simmonds, Matthew J; Trembath, Richard C; Walker, Neil M; Wozniak, Eva; Todd, John A; Simpson, Michael A; Plagnol, Vincent; van Heel, David A

    2013-06-13

    Genome-wide association studies (GWAS) have identified common variants of modest-effect size at hundreds of loci for common autoimmune diseases; however, a substantial fraction of heritability remains unexplained, to which rare variants may contribute. To discover rare variants and test them for association with a phenotype, most studies re-sequence a small initial sample size and then genotype the discovered variants in a larger sample set. This approach fails to analyse a large fraction of the rare variants present in the entire sample set. Here we perform simultaneous amplicon-sequencing-based variant discovery and genotyping for coding exons of 25 GWAS risk genes in 41,911 UK residents of white European origin, comprising 24,892 subjects with six autoimmune disease phenotypes and 17,019 controls, and show that rare coding-region variants at known loci have a negligible role in common autoimmune disease susceptibility. These results do not support the rare-variant synthetic genome-wide-association hypothesis (in which unobserved rare causal variants lead to association detected at common tag variants). Many known autoimmune disease risk loci contain multiple, independently associated, common and low-frequency variants, and so genes at these loci are a priori stronger candidates for harbouring rare coding-region variants than other genes. Our data indicate that the missing heritability for common autoimmune diseases may not be attributable to the rare coding-region variant portion of the allelic spectrum, but perhaps, as others have proposed, may be a result of many common-variant loci of weak effect.

  4. A FRMD7 variant in a Japanese family causes congenital nystagmus.

    PubMed

    Kohmoto, Tomohiro; Okamoto, Nana; Satomura, Shigeko; Naruto, Takuya; Komori, Takahide; Hashimoto, Toshiaki; Imoto, Issei

    2015-01-01

    Idiopathic congenital nystagmus (ICN) is a genetically heterogeneous eye movement disorder that causes a large proportion of childhood visual impairment. Here we describe a missense variant (p.L292P) within a mutation-rich region of FRMD7 detected in three affected male siblings in a Japanese family with X-linked ICN. Combining sequence analysis and results from structural and functional predictions, we report p.L292P as a variant potentially disrupting FRMD7 function associated with X-linked ICN.

  5. A FRMD7 variant in a Japanese family causes congenital nystagmus

    PubMed Central

    Kohmoto, Tomohiro; Okamoto, Nana; Satomura, Shigeko; Naruto, Takuya; Komori, Takahide; Hashimoto, Toshiaki; Imoto, Issei

    2015-01-01

    Idiopathic congenital nystagmus (ICN) is a genetically heterogeneous eye movement disorder that causes a large proportion of childhood visual impairment. Here we describe a missense variant (p.L292P) within a mutation-rich region of FRMD7 detected in three affected male siblings in a Japanese family with X-linked ICN. Combining sequence analysis and results from structural and functional predictions, we report p.L292P as a variant potentially disrupting FRMD7 function associated with X-linked ICN. PMID:27081518

  6. Detecting and Estimating Contamination of Human DNA Samples in Sequencing and Array-Based Genotype Data

    PubMed Central

    Jun, Goo; Flickinger, Matthew; Hetrick, Kurt N.; Romm, Jane M.; Doheny, Kimberly F.; Abecasis, Gonçalo R.; Boehnke, Michael; Kang, Hyun Min

    2012-01-01

    DNA sample contamination is a serious problem in DNA sequencing studies and may result in systematic genotype misclassification and false positive associations. Although methods exist to detect and filter out cross-species contamination, few methods to detect within-species sample contamination are available. In this paper, we describe methods to identify within-species DNA sample contamination based on (1) a combination of sequencing reads and array-based genotype data, (2) sequence reads alone, and (3) array-based genotype data alone. Analysis of sequencing reads allows contamination detection after sequence data is generated but prior to variant calling; analysis of array-based genotype data allows contamination detection prior to generation of costly sequence data. Through a combination of analysis of in silico and experimentally contaminated samples, we show that our methods can reliably detect and estimate levels of contamination as low as 1%. We evaluate the impact of DNA contamination on genotype accuracy and propose effective strategies to screen for and prevent DNA contamination in sequencing studies. PMID:23103226

  7. Comprehensive Cancer-Predisposition Gene Testing in an Adult Multiple Primary Tumor Series Shows a Broad Range of Deleterious Variants and Atypical Tumor Phenotypes.

    PubMed

    Whitworth, James; Smith, Philip S; Martin, Jose-Ezequiel; West, Hannah; Luchetti, Andrea; Rodger, Faye; Clark, Graeme; Carss, Keren; Stephens, Jonathan; Stirrups, Kathleen; Penkett, Chris; Mapeta, Rutendo; Ashford, Sofie; Megy, Karyn; Shakeel, Hassan; Ahmed, Munaza; Adlard, Julian; Barwell, Julian; Brewer, Carole; Casey, Ruth T; Armstrong, Ruth; Cole, Trevor; Evans, Dafydd Gareth; Fostira, Florentia; Greenhalgh, Lynn; Hanson, Helen; Henderson, Alex; Hoffman, Jonathan; Izatt, Louise; Kumar, Ajith; Kwong, Ava; Lalloo, Fiona; Ong, Kai Ren; Paterson, Joan; Park, Soo-Mi; Chen-Shtoyerman, Rakefet; Searle, Claire; Side, Lucy; Skytte, Anne-Bine; Snape, Katie; Woodward, Emma R; Tischkowitz, Marc D; Maher, Eamonn R

    2018-06-12

    Multiple primary tumors (MPTs) affect a substantial proportion of cancer survivors and can result from various causes, including inherited predisposition. Currently, germline genetic testing of MPT-affected individuals for variants in cancer-predisposition genes (CPGs) is mostly targeted by tumor type. We ascertained pre-assessed MPT individuals (with at least two primary tumors by age 60 years or at least three by 70 years) from genetics centers and performed whole-genome sequencing (WGS) on 460 individuals from 440 families. Despite previous negative genetic assessment and molecular investigations, pathogenic variants in moderate- and high-risk CPGs were detected in 67/440 (15.2%) probands. WGS detected variants that would not be (or were not) detected by targeted resequencing strategies, including low-frequency structural variants (6/440 [1.4%] probands). In most individuals with a germline variant assessed as pathogenic or likely pathogenic (P/LP), at least one of their tumor types was characteristic of variants in the relevant CPG. However, in 29 probands (42.2% of those with a P/LP variant), the tumor phenotype appeared discordant. The frequency of individuals with truncating or splice-site CPG variants and at least one discordant tumor type was significantly higher than in a control population (χ² = 43.642; p ≤ 0.0001). 2/67 (3%) probands with P/LP variants had evidence of multiple inherited neoplasia allele syndrome (MINAS) with deleterious variants in two CPGs. Together with variant detection rates from a previous series of similarly ascertained MPT-affected individuals, the present results suggest that first-line comprehensive CPG analysis in an MPT cohort referred to clinical genetics services would detect a deleterious variant in about a third of individuals. Copyright © 2018 The Authors. Published by Elsevier Inc. All rights reserved.

  8. Positional bias in variant calls against draft reference assemblies.

    PubMed

    Briskine, Roman V; Shimizu, Kentaro K

    2017-03-28

    Whole genome resequencing projects may implement variant calling using draft reference genomes assembled de novo from short-read libraries. Despite the lower quality of such assemblies, they have allowed researchers to extend a wide range of population genetic and genome-wide association analyses to non-model species. As the variant calling pipelines are complex and involve many software packages, it is important to understand inherent biases and limitations at each step of the analysis. In this article, we report a positional bias present in variant calling performed against draft reference assemblies constructed from de Bruijn or string overlap graphs. We assessed how frequently variants appeared at each position counted from the ends of a contig or scaffold sequence, and discovered an unexpectedly high number of variants at the positions related to the length of either k-mers or reads used for the assembly. We detected the bias both in publicly available draft assemblies from the Assemblathon 2 competition and in the assemblies we generated from our simulated short-read data. Simulations confirmed that the bias-causing variants are predominantly false positives induced by reads from spatially distant repeated sequences. The bias is particularly strong in contig assemblies. Scaffolding does not eliminate the bias but tends to mitigate it because of the changes in variants' relative positions and alterations in read alignments. The bias can be effectively reduced by filtering out the variants that reside in repetitive elements. Draft genome sequences generated by several popular assemblers appear to be susceptible to the positional bias, potentially affecting many resequencing projects in non-model species. The bias is inherent to the assembly algorithms and arises from their particular handling of repeated sequences. It is recommended to reduce the bias by filtering, especially if a higher-quality genome assembly cannot be achieved. Our findings can help other researchers to improve the quality of their variant data sets and reduce artefactual findings in downstream analyses.
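
    The tally of variants by position counted from the contig ends, as described in this abstract, can be sketched as below. This is only an illustration under stated assumptions, not the authors' pipeline: the function name `end_distance_histogram`, the input layout (a dict of contig lengths and a list of 1-based variant positions) and the toy data are all hypothetical.

```python
from collections import Counter

def end_distance_histogram(contig_lengths, variants):
    """Count variants by distance (in bases) from the nearest contig end.

    contig_lengths: dict mapping contig name -> length in bases (assumed layout)
    variants: iterable of (contig name, 1-based position) tuples (assumed layout)
    """
    counts = Counter()
    for contig, pos in variants:
        length = contig_lengths[contig]
        # Distance 1 means the first or the last base of the contig.
        distance = min(pos, length - pos + 1)
        counts[distance] += 1
    return counts

# Hypothetical toy data: three calls sit 31 bases from a contig end,
# the kind of pile-up one would inspect against the k-mer or read length.
contig_lengths = {"contig1": 1000, "contig2": 500}
variants = [("contig1", 31), ("contig1", 970), ("contig2", 31), ("contig2", 250)]
histogram = end_distance_histogram(contig_lengths, variants)
```

    A peak in such a histogram at a distance close to the assembly k-mer size or read length would be the signature to look for; whether it reflects false positives still has to be checked, for example against repeat annotation.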

  9. Statistical tests for detecting associations with groups of genetic variants: generalization, evaluation, and implementation

    PubMed Central

    Ferguson, John; Wheeler, William; Fu, YiPing; Prokunina-Olsson, Ludmila; Zhao, Hongyu; Sampson, Joshua

    2013-01-01

    With recent advances in sequencing, genotyping arrays, and imputation, GWAS now aim to identify associations with rare and uncommon genetic variants. Here, we describe and evaluate a class of statistics, generalized score statistics (GSS), that can test for an association between a group of genetic variants and a phenotype. GSS are a simple weighted sum of single-variant statistics and their cross-products. We show that the majority of statistics currently used to detect associations with rare variants are equivalent to choosing a specific set of weights within this framework. We then evaluate the power of various weighting schemes as a function of variant characteristics, such as MAF, the proportion associated with the phenotype, and the direction of effect. Ultimately, we find that two classical tests are robust and powerful, but details are provided as to when other GSS may perform favorably. The software package CRaVe is available at our website (http://dceg.cancer.gov/bb/tools/crave). PMID:23092956
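
    The phrase "a simple weighted sum of single-variant statistics and their cross-products" can be read as a quadratic form, Q = sum over i and j of w_ij * U_i * U_j, where U_i is the score statistic of variant i. The sketch below assumes that reading. The function names and the two weight choices are illustrative assumptions and are not taken from the CRaVe package.

```python
def generalized_score_statistic(scores, weights):
    """Return sum_i sum_j weights[i][j] * scores[i] * scores[j]."""
    n = len(scores)
    return sum(weights[i][j] * scores[i] * scores[j]
               for i in range(n) for j in range(n))

def burden_weights(a):
    """Weights w_ij = a_i * a_j, so Q collapses to (sum_i a_i * U_i) ** 2."""
    return [[ai * aj for aj in a] for ai in a]

def diagonal_weights(a):
    """Weights on the diagonal only, so Q = sum_i a_i**2 * U_i**2."""
    n = len(a)
    return [[a[i] ** 2 if i == j else 0.0 for j in range(n)] for i in range(n)]

# Hypothetical single-variant scores with mixed directions of effect.
scores = [1.0, -2.0, 0.5]
a = [1.0, 1.0, 2.0]
burden = generalized_score_statistic(scores, burden_weights(a))
dispersion = generalized_score_statistic(scores, diagonal_weights(a))
```

    With these toy numbers the burden-style statistic cancels to zero because the effects point in opposite directions, while the diagonal form does not. This is one way to see why the direction of effect matters when comparing weighting schemes.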

  10. Comprehensive analysis of the MLH1 promoter region in 480 patients with colorectal cancer and 1150 controls reveals new variants including one with a heritable constitutional MLH1 epimutation.

    PubMed

    Morak, Monika; Ibisler, Ayseguel; Keller, Gisela; Jessen, Ellen; Laner, Andreas; Gonzales-Fassrainer, Daniela; Locher, Melanie; Massdorf, Trisari; Nissen, Anke M; Benet-Pagès, Anna; Holinski-Feder, Elke

    2018-04-01

    Germline defects in MLH1, MSH2, MSH6 and PMS2 predisposing for Lynch syndrome (LS) are mainly based on sequence changes, whereas a constitutional epimutation of MLH1 (CEM) is exceptionally rare. This abnormal MLH1 promoter methylation is not hereditary when arising de novo, whereas a stably heritable and variant-induced CEM was described for one single allele. We searched for MLH1 promoter variants causing a germline or somatic methylation induction or transcriptional repression. We analysed the MLH1 promoter sequence in five different patient groups with colorectal cancer (CRC) (n=480) composed of patients with i) CEM (n=16), ii) unsolved loss of MLH1 expression in CRC (n=37), iii) CpG-island methylator-phenotype CRC (n=102), iv) patients with LS (n=83) and v) MLH1-proficient CRC (n=242) as controls. 1150 patients with non-LS tumours also served as controls to correctly judge the results. We detected 10 rare MLH1 promoter variants. One novel, complex MLH1 variant c.-63_-58delins18 is present in a patient with CRC with CEM and his sister, both showing a complete allele-specific promoter methylation and transcriptional silencing. The other nine promoter variants detected in 17 individuals were not associated with methylation. For four of these, a normal, biallelic MLH1 expression was found in the patients' cDNA. We report the second promoter variant stably inducing a hereditary CEM. Concerning the classification of promoter variants, we discuss contradictory results from the literature for two variants, describe classification discrepancies between existing rules for five variants, suggest the (re-)classification of five promoter variants to (likely) benign and regard four variants as functionally unclear. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.

  11. MACARON: A python framework to identify and re-annotate multi-base affected codons in whole genome/exome sequence data.

    PubMed

    Khan, Waqasuddin; Saripella, Ganapathi Varma; Ludwig, Thomas; Cuppens, Tania; Thibord, Florian; Génin, Emmanuelle; Deleuze, Jean-Francois; Trégouët, David-Alexandre

    2018-05-03

    Predicted deleteriousness of coding variants is a frequently used criterion to filter out variants detected in next-generation sequencing projects and to select candidates impacting on the risk of human diseases. Most available dedicated tools implement a base-to-base annotation approach that could be biased in the presence of several variants in the same genetic codon. Here we propose the MACARON program that, from a standard VCF file, identifies, re-annotates and predicts the amino acid change resulting from multiple single nucleotide variants (SNVs) within the same genetic codon. Applied to the whole exome dataset of 573 individuals, MACARON identifies 114 situations where multiple SNVs within a genetic codon induce an amino acid change that is different from those predicted by standard single-SNV annotation tools. Such events are not uncommon and deserve to be studied in sequencing projects with inconclusive findings. MACARON is written in Python with code available on the GENMED website (www.genmed.fr). Contact: david-alexandre.tregouet@inserm.fr. Supplementary data are available at Bioinformatics online.
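
    The difference between base-by-base annotation and codon-level re-annotation can be illustrated with a short sketch. This is not MACARON's code or interface: the function name `annotate_codon` and the input layout are hypothetical, and the sketch assumes the SNVs lie on the same haplotype and that the codon is given on the coding strand.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def annotate_codon(ref_codon, snvs):
    """Compare per-SNV and combined amino acid predictions for one codon.

    ref_codon: three-letter reference codon on the coding strand
    snvs: list of (offset within codon 0..2, alternate base) tuples
    Returns (amino acids predicted one SNV at a time, amino acid with all SNVs).
    """
    single = []
    combined = list(ref_codon)
    for offset, alt in snvs:
        one = list(ref_codon)
        one[offset] = alt          # apply this SNV alone
        single.append(CODON_TABLE["".join(one)])
        combined[offset] = alt     # accumulate all SNVs together
    return single, CODON_TABLE["".join(combined)]

# AAG (Lys) with A>G at the first base and A>C at the second base.
single, combined = annotate_codon("AAG", [(0, "G"), (1, "C")])
```

    Here each SNV on its own predicts Glu (GAG) or Thr (ACG), whereas the two together give Ala (GCG), the kind of discordance the abstract describes. A real tool would also need phasing information and strand handling, which this sketch leaves out.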

  12. Label-Free Relative Quantitation of Isobaric and Isomeric Human Histone H2A and H2B Variants by Fourier Transform Ion Cyclotron Resonance Top-Down MS/MS.

    PubMed

    Dang, Xibei; Singh, Amar; Spetman, Brian D; Nolan, Krystal D; Isaacs, Jennifer S; Dennis, Jonathan H; Dalton, Stephen; Marshall, Alan G; Young, Nicolas L

    2016-09-02

    Histone variants are known to play a central role in genome regulation and maintenance. However, many variants are inaccessible by antibody-based methods or bottom-up tandem mass spectrometry due to their highly similar sequences. For many, the only tractable approach is with intact protein top-down tandem mass spectrometry. Here, ultra-high-resolution FT-ICR MS and MS/MS yield quantitative relative abundances of all detected HeLa H2A and H2B isobaric and isomeric variants with a label-free approach. We extend the analysis to identify and relatively quantitate 16 proteoforms from 12 sequence variants of histone H2A and 10 proteoforms of histone H2B from three other cell lines: human embryonic stem cells (WA09), U937, and a prostate cancer cell line LaZ. The top-down MS/MS approach provides a path forward for more extensive elucidation of the biological role of many previously unstudied histone variants and post-translational modifications.

  13. CanvasDB: a local database infrastructure for analysis of targeted- and whole genome re-sequencing projects

    PubMed Central

    Ameur, Adam; Bunikis, Ignas; Enroth, Stefan; Gyllensten, Ulf

    2014-01-01

    CanvasDB is an infrastructure for management and analysis of genetic variants from massively parallel sequencing (MPS) projects. The system stores SNP and indel calls in a local database, designed to handle very large datasets, to allow for rapid analysis using simple commands in R. Functional annotations are included in the system, making it suitable for direct identification of disease-causing mutations in human exome- (WES) or whole-genome sequencing (WGS) projects. The system has a built-in filtering function implemented to simultaneously take into account variant calls from all individual samples. This enables advanced comparative analysis of variant distribution between groups of samples, including detection of candidate causative mutations within family structures and genome-wide association by sequencing. In most cases, these analyses are executed within just a matter of seconds, even when there are several hundreds of samples and millions of variants in the database. We demonstrate the scalability of canvasDB by importing the individual variant calls from all 1092 individuals present in the 1000 Genomes Project into the system, over 4.4 billion SNPs and indels in total. Our results show that canvasDB makes it possible to perform advanced analyses of large-scale WGS projects on a local server. Database URL: https://github.com/UppsalaGenomeCenter/CanvasDB PMID:25281234

  14. CanvasDB: a local database infrastructure for analysis of targeted- and whole genome re-sequencing projects.

    PubMed

    Ameur, Adam; Bunikis, Ignas; Enroth, Stefan; Gyllensten, Ulf

    2014-01-01

    CanvasDB is an infrastructure for management and analysis of genetic variants from massively parallel sequencing (MPS) projects. The system stores SNP and indel calls in a local database, designed to handle very large datasets, to allow for rapid analysis using simple commands in R. Functional annotations are included in the system, making it suitable for direct identification of disease-causing mutations in human exome- (WES) or whole-genome sequencing (WGS) projects. The system has a built-in filtering function implemented to simultaneously take into account variant calls from all individual samples. This enables advanced comparative analysis of variant distribution between groups of samples, including detection of candidate causative mutations within family structures and genome-wide association by sequencing. In most cases, these analyses are executed within just a matter of seconds, even when there are several hundreds of samples and millions of variants in the database. We demonstrate the scalability of canvasDB by importing the individual variant calls from all 1092 individuals present in the 1000 Genomes Project into the system, over 4.4 billion SNPs and indels in total. Our results show that canvasDB makes it possible to perform advanced analyses of large-scale WGS projects on a local server. Database URL: https://github.com/UppsalaGenomeCenter/CanvasDB. © The Author(s) 2014. Published by Oxford University Press.

  15. Distinctive Epstein-Barr virus variants associated with benign and malignant pediatric pathologies: LMP1 sequence characterization and linkage with other viral gene polymorphisms.

    PubMed

    Lorenzetti, Mario Alejandro; Gantuz, Magdalena; Altcheh, Jaime; De Matteo, Elena; Chabay, Paola Andrea; Preciado, María Victoria

    2012-03-01

    The ubiquitous Epstein-Barr virus (EBV) is related to the development of lymphoma and is also the etiological agent for infectious mononucleosis (IM). Sequence variations in the gene encoding LMP1 have been deeply studied in different pathologies and geographic regions. Controversial results propose the existence of tumor-related variants, while others argued in favor of a geographical distribution of these variants. Reports assessing EBV variants in IM were performed in adult patients who displayed multiple variant infections. In the present study, LMP1 variants in 15 pediatric patients with IM and 20 pediatric patients with EBV-associated lymphomas from Argentina were analyzed as representatives of benign and malignant infections in children, respectively. A 3-month follow-up study of LMP1 variants in peripheral blood cells and in oral secretions of patients with IM was performed. Moreover, an integrated linkage analysis was performed with variants of EBNA1 and the promoter region of BZLF1. Similar sequence polymorphisms were detected in both pathological conditions, IM and lymphoma, but these differ from those previously described in healthy donors from Argentina and Brazil. The results suggest that certain LMP1 polymorphisms, namely, the 30-bp deletion and high copy number of the 33-bp repeats, are associated with EBV-related pathologies, either benign or malignant, instead of just being tumor related. Additionally, this is the first study to describe the Alaskan variant in EBV-related lymphomas that previously was restricted to nasopharyngeal carcinomas from North America.

  16. Distinctive Epstein-Barr Virus Variants Associated with Benign and Malignant Pediatric Pathologies: LMP1 Sequence Characterization and Linkage with Other Viral Gene Polymorphisms

    PubMed Central

    Gantuz, Magdalena; Altcheh, Jaime; De Matteo, Elena; Chabay, Paola Andrea; Preciado, María Victoria

    2012-01-01

    The ubiquitous Epstein-Barr virus (EBV) is related to the development of lymphoma and is also the etiological agent for infectious mononucleosis (IM). Sequence variations in the gene encoding LMP1 have been deeply studied in different pathologies and geographic regions. Controversial results propose the existence of tumor-related variants, while others argued in favor of a geographical distribution of these variants. Reports assessing EBV variants in IM were performed in adult patients who displayed multiple variant infections. In the present study, LMP1 variants in 15 pediatric patients with IM and 20 pediatric patients with EBV-associated lymphomas from Argentina were analyzed as representatives of benign and malignant infections in children, respectively. A 3-month follow-up study of LMP1 variants in peripheral blood cells and in oral secretions of patients with IM was performed. Moreover, an integrated linkage analysis was performed with variants of EBNA1 and the promoter region of BZLF1. Similar sequence polymorphisms were detected in both pathological conditions, IM and lymphoma, but these differ from those previously described in healthy donors from Argentina and Brazil. The results suggest that certain LMP1 polymorphisms, namely, the 30-bp deletion and high copy number of the 33-bp repeats, are associated with EBV-related pathologies, either benign or malignant, instead of just being tumor related. Additionally, this is the first study to describe the Alaskan variant in EBV-related lymphomas that previously was restricted to nasopharyngeal carcinomas from North America. PMID:22205789

  17. Development and Validation of Targeted Next-Generation Sequencing Panels for Detection of Germline Variants in Inherited Diseases.

    PubMed

    Santani, Avni; Murrell, Jill; Funke, Birgit; Yu, Zhenming; Hegde, Madhuri; Mao, Rong; Ferreira-Gonzalez, Andrea; Voelkerding, Karl V; Weck, Karen E

    2017-06-01

    The number of targeted next-generation sequencing (NGS) panels for genetic diseases offered by clinical laboratories is rapidly increasing. Before an NGS-based test is implemented in a clinical laboratory, appropriate validation studies are needed to determine the performance characteristics of the test. To provide examples of assay design and validation of targeted NGS gene panels for the detection of germline variants associated with inherited disorders. The approaches used by 2 clinical laboratories for the development and validation of targeted NGS gene panels are described. Important design and validation considerations are examined. Clinical laboratories must validate performance specifications of each test prior to implementation. Test design specifications and validation data are provided, outlining important steps in validation of targeted NGS panels by clinical diagnostic laboratories.

  18. Looking beyond the exome: a phenotype-first approach to molecular diagnostic resolution in rare and undiagnosed diseases

    PubMed Central

    Pena, Loren DM; Jiang, Yong-Hui; Schoch, Kelly; Spillmann, Rebecca C.; Walley, Nicole; Stong, Nicholas; Horn, Sarah Rapisardo; Sullivan, Jennifer A.; McConkie-Rosell, Allyn; Kansagra, Sujay; Smith, Edward C.; El-Dairi, Mays; Bellet, Jane; Ann Keels, Martha; Jasien, Joan; Kranz, Peter G.; Noel, Richard; Nagaraj, Shashi K.; Lark, Robert K.; Wechsler, Daniel SG; del Gaudio, Daniela; Leung, Marco L.; Hendon, Laura G.; Parker, Collette C.; Jones, Kelly L.; Goldstein, David B.; Shashi, Vandana

    2017-01-01

    Purpose: To describe examples of missed pathogenic variants on whole exome sequencing (WES) and the importance of deep phenotyping for further diagnostic testing. Methods: Guided by phenotypic information, three children with negative WES underwent targeted single gene testing. Results: Individual 1 had a clinical diagnosis consistent with infantile systemic hyalinosis, although WES and an NGS-based ANTXR2 test were negative. Sanger sequencing of ANTXR2 revealed a homozygous single base pair insertion, previously missed by the WES variant caller software. Individual 2 had neurodevelopmental regression and cerebellar atrophy, with no diagnosis on WES. New clinical findings prompted Sanger sequencing and copy number testing of PLA2G6. A novel homozygous deletion of the non-coding exon 1 (not included in the WES capture kit) was detected, with extension into the promoter, confirming the clinical suspicion of infantile neuroaxonal dystrophy. Individual 3 had progressive ataxia, spasticity and MRI changes of vanishing white matter leukoencephalopathy. An NGS leukodystrophy gene panel and WES showed a heterozygous pathogenic variant in EIF2B5; no deletions/duplications were detected. Sanger sequencing of EIF2B5 showed a frameshift indel, likely missed due to failure of alignment. Conclusions: These cases illustrate potential pitfalls of WES/NGS testing, and the importance of phenotype-guided molecular testing in yielding diagnoses. PMID:28914269

  19. A tale of three next generation sequencing platforms: comparison of Ion Torrent, Pacific Biosciences and Illumina MiSeq sequencers.

    PubMed

    Quail, Michael A; Smith, Miriam; Coupland, Paul; Otto, Thomas D; Harris, Simon R; Connor, Thomas R; Bertoni, Anna; Swerdlow, Harold P; Gu, Yong

    2012-07-24

    Next generation sequencing (NGS) technology has revolutionized genomic and genetic research. The pace of change in this area is rapid with three major new sequencing platforms having been released in 2011: Ion Torrent's PGM, Pacific Biosciences' RS and the Illumina MiSeq. Here we compare the results obtained with those platforms to the performance of the Illumina HiSeq, the current market leader. In order to compare these platforms, and get sufficient coverage depth to allow meaningful analysis, we have sequenced a set of 4 microbial genomes with mean GC content ranging from 19.3 to 67.7%. Together, these represent a comprehensive range of genome content. Here we report our analysis of that sequence data in terms of coverage distribution, bias, GC distribution, variant detection and accuracy. Sequence generated by Ion Torrent, MiSeq and Pacific Biosciences technologies displays near perfect coverage behaviour on GC-rich, neutral and moderately AT-rich genomes, but a profound bias was observed upon sequencing the extremely AT-rich genome of Plasmodium falciparum on the PGM, resulting in no coverage for approximately 30% of the genome. We analysed the ability to call variants from each platform and found that we could call slightly more variants from Ion Torrent data compared to MiSeq data, but at the expense of a higher false positive rate. Variant calling from Pacific Biosciences data was possible but higher coverage depth was required. Context specific errors were observed in both PGM and MiSeq data, but not in that from the Pacific Biosciences platform. All three fast turnaround sequencers evaluated here were able to generate usable sequence. However there are key differences between the quality of that data and the applications it will support.

  20. Amplicon-based semiconductor sequencing of human exomes: performance evaluation and optimization strategies.

    PubMed

    Damiati, E; Borsani, G; Giacopuzzi, Edoardo

    2016-05-01

    The Ion Proton platform allows whole exome sequencing (WES) to be performed at low cost, providing rapid turnaround time and great flexibility. Products for WES on the Ion Proton system include the AmpliSeq Exome kit and the recently introduced HiQ sequencing chemistry. Here, we used gold standard variants from the GIAB consortium to assess performance in variant identification, characterize the erroneous calls and develop a filtering strategy to reduce false positives. The AmpliSeq Exome kit captures a large fraction of bases (>94 %) in human CDS, ClinVar genes and ACMG genes, but with 2,041 (7 %), 449 (13 %) and 11 (19 %) genes not fully represented, respectively. Overall, 515 protein coding genes contain hard-to-sequence regions, including 90 genes from ClinVar. Performance in variant detection was maximum at mean coverage >120×, while at 90× and 70× we measured a loss of variants of 3.2 and 4.5 %, respectively. WES using HiQ chemistry showed ~71/97.5 % sensitivity, ~37/2 % FDR and ~0.66/0.98 F1 score for indels and SNPs, respectively. The proposed low, medium or high-stringency filters reduced the amount of false positives by 10.2, 21.2 and 40.4 % for indels and 21.2, 41.9 and 68.2 % for SNPs, respectively. Amplicon-based WES on the Ion Proton platform using HiQ chemistry emerged as a competitive approach, with improved accuracy in variant identification. False-positive variants remain an issue for the Ion Torrent technology, but our filtering strategy can be applied to reduce erroneous variants.
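
    The sensitivity, FDR and F1 figures quoted in this abstract are linked by a standard identity: precision is 1 - FDR, and F1 is the harmonic mean of precision and sensitivity. The sketch below applies that identity to the rounded values from the abstract. The function name is an illustrative choice, and because the inputs are approximate the results should only be expected to land near the reported scores.

```python
def f1_from_sensitivity_and_fdr(sensitivity, fdr):
    """F1 score from sensitivity (recall) and false discovery rate.

    precision = 1 - FDR; F1 = 2 * precision * recall / (precision + recall)
    """
    precision = 1.0 - fdr
    return 2 * precision * sensitivity / (precision + sensitivity)

# Rounded values quoted in the abstract for HiQ chemistry.
indel_f1 = f1_from_sensitivity_and_fdr(0.71, 0.37)   # indels
snp_f1 = f1_from_sensitivity_and_fdr(0.975, 0.02)    # SNPs
```

    Both values come out close to the reported ~0.66 and ~0.98, which is a quick consistency check on the three metrics.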

  1. Molecular characterization of canine parvovirus variants (CPV-2a, CPV-2b, and CPV-2c) based on the VP2 gene in affected domestic dogs in Ecuador.

    PubMed

    De la Torre, David; Mafla, Eulalia; Puga, Byron; Erazo, Linda; Astolfi-Ferreira, Claudete; Ferreira, Antonio Piantino

    2018-04-01

    The objective of this study was to determine the presence of the variants of canine parvovirus (CPV)-2 in the city of Quito, Ecuador, due to the high domestic and street-type canine population, and to identify possible mutations at a genetic level that could be causing structural changes in the virus with a consequent influence on the immune response of the hosts. Thirty-five stool samples from different puppies with characteristic signs of the disease and positive for CPV through immunochromatography kits were collected from different veterinary clinics of the city. Polymerase chain reaction and DNA sequencing were used to determine the mutations in residue 426 of the VP2 gene, which determines the variants of CPV-2; in addition, four samples were chosen for complete sequencing of the VP2 gene to identify all possible mutations in the circulating strains in this region of the country. The results revealed the presence of the three variants of CPV-2 with a prevalence of 57.1% (20/35) for CPV-2a, 8.5% (3/35) for CPV-2b, and 34.3% (12/35) for CPV-2c. In addition, complete sequencing of the VP2 gene showed amino acid substitutions in residues 87, 101, 139, 219, 297, 300, 305, 322, 324, 375, 386, 426, 440, and 514 of the three Ecuadorian variants when compared with the original CPV-2 sequence. This study describes the detection of CPV variants in the city of Quito, Ecuador. Variants of CPV-2 (2a, 2b, and 2c) have been reported in South America, and there are cases in Ecuador where CPV-2 is affecting even vaccinated puppies.

  2. COLD-PCR: improving the sensitivity of molecular diagnostics assays

    PubMed Central

    Milbury, Coren A; Li, Jin; Liu, Pingfang; Makrigiorgos, G Mike

    2011-01-01

    The detection of low-abundance DNA variants or mutations is of particular interest to medical diagnostics, individualized patient treatment and cancer prognosis; however, detection sensitivity for low-abundance variants is a pronounced limitation of most currently available molecular assays. We have recently developed coamplification at lower denaturation temperature-PCR (COLD-PCR) to resolve this limitation. This novel form of PCR selectively amplifies low-abundance DNA variants from mixtures of wild-type and mutant-containing (or variant-containing) sequences, irrespective of the mutation type or position on the amplicon, by using a critical denaturation temperature. The use of a lower denaturation temperature in COLD-PCR results in selective denaturation of amplicons with mutation-containing molecules within wild-type mutant heteroduplexes or with a lower melting temperature. COLD-PCR can be used in lieu of conventional PCR in several molecular applications, thus enriching the mutant fraction and improving the sensitivity of downstream mutation detection by up to 100-fold. PMID:21405967

  3. Detection of de novo single nucleotide variants in offspring of atomic-bomb survivors close to the hypocenter by whole-genome sequencing.

    PubMed

    Horai, Makiko; Mishima, Hiroyuki; Hayashida, Chisa; Kinoshita, Akira; Nakane, Yoshibumi; Matsuo, Tatsuki; Tsuruda, Kazuto; Yanagihara, Katsunori; Sato, Shinya; Imanishi, Daisuke; Imaizumi, Yoshitaka; Hata, Tomoko; Miyazaki, Yasushi; Yoshiura, Koh-Ichiro

    2018-03-01

    Ionizing radiation released by the atomic bombs at Hiroshima and Nagasaki, Japan, in 1945 caused many long-term illnesses, including increased risks of malignancies such as leukemia and solid tumours. Radiation has demonstrated genetic effects in animal models, leading to concerns over the potential hereditary effects of atomic bomb-related radiation. However, no direct analyses of whole DNA have yet been reported. We therefore investigated de novo variants in offspring of atomic-bomb survivors by whole-genome sequencing (WGS). We collected peripheral blood from three trios, each comprising a father (atomic-bomb survivor with acute radiation symptoms), a non-exposed mother, and their child, none of whom had any past history of haematological disorders. One trio of non-exposed individuals was included as a control. DNA was extracted and the numbers of de novo single nucleotide variants in the children were counted by WGS with sequencing confirmation. Gross structural variants were also analysed. Written informed consent was obtained from all participants prior to the study. There were 62, 81, and 42 de novo single nucleotide variants in the children of atomic-bomb survivors, compared with 48 in the control trio. There were no gross structural variants in any trio. These findings are in accord with previously published results that also showed no significant genetic effects of atomic-bomb radiation on second-generation survivors.

  4. Cumulative role of rare and common putative functional genetic variants at NPAS3 in schizophrenia susceptibility.

    PubMed

    González-Peñas, Javier; Arrojo, Manuel; Paz, Eduardo; Brenlla, Julio; Páramo, Mario; Costas, Javier

    2015-10-01

    Schizophrenia may be considered a human-specific disorder that arose as a maladaptive by-product of human-specific brain evolution. Therefore, genetic variants involved in susceptibility to schizophrenia may be identified among those genes related to the acquisition of human-specific traits. NPAS3, a transcription factor involved in central nervous system development and neurogenesis, seems to be implicated in the evolution of the human brain, as it is the human gene with the most human-specific accelerated elements (HAEs), i.e., mammalian conserved regulatory sequences with accelerated evolution in the lineage leading to humans after the human-chimpanzee split. We hypothesize that any nucleotide variant at the NPAS3 HAEs may lead to altered susceptibility to schizophrenia. Twenty-one variants at these HAEs detected by the 1000 Genomes Project, as well as five additional variants taken from psychiatric genome-wide association studies, were genotyped in 538 schizophrenic patients and 539 controls from Galicia. Analyses at the haplotype level or based on the cumulative role of the variants assuming different susceptibility models did not find any significant association, in spite of sufficient power under several plausible scenarios regarding direction of effect and the specific role of rare and common variants. These results suggest that, contrary to our hypothesis, the special evolution of the NPAS3 HAEs in Homo relaxed the strong constraint on sequence that characterized these regions during mammalian evolution, allowing some sequence changes without any effect on schizophrenia risk. © 2015 Wiley Periodicals, Inc.

  5. Deep sequencing shows low-level oncogenic hepatitis B virus variants persists post-liver transplant despite potent anti-HBV prophylaxis.

    PubMed

    Lau, K C K; Osiowy, C; Giles, E; Lusina, B; van Marle, G; Burak, K W; Coffin, C S

    2018-06-01

    Recent studies suggest that withdrawal of hepatitis B immune globulin (HBIG) and nucleos(t)ide analogues (NA) prophylaxis may be considered in HBV surface antigen (HBsAg)-negative liver transplant (LT) recipients with a low risk of disease recurrence. However, the frequency of occult HBV infection (OBI) and HBV variants after LT in the current era of potent NA therapy is unknown. Twelve LT recipients on prophylaxis were tested in matched plasma and peripheral blood mononuclear cells (PBMCs) for HBV quasispecies by in-house nested PCR and next-generation sequencing of amplicons. HBV covalently closed circular DNA (cccDNA) was detected in Hirt DNA isolated from PBMCs with cccDNA-specific primers and confirmed by nucleic acid hybridization and Sanger sequencing. HBV mRNA in PBMC was detected with reverse-transcriptase nested PCR. In LT recipients on immunosuppressive therapy (10/12 male; median age 57.5 [IQR: 39.8-66.5]; median follow-up post-LT 60 months; 6 pre-LT hepatocellular carcinoma [HCC]), 9 were HBsAg-. HBV DNA was detected in all plasma and PBMC tested; cccDNA and/or mRNA was detected in the PBMC of 10/12 patients. Significant HBV quasispecies diversity (ie 143-2212 nonredundant HBV species) was noted in both sites, and single nucleotide polymorphisms associated with cirrhosis and HCC were detected at varying frequencies. In conclusion, OBI and HBV variants associated with severe liver disease persist in LT recipients on prophylaxis. Although HBV control and cccDNA transcriptional silencing may occur despite immunosuppression, complete virological eradication does not occur in LT recipients with a history of HBV-related end-stage liver disease. © 2018 John Wiley & Sons Ltd.

  6. Single-Molecule Sequencing Reveals Complex Genome Variation of Hepatitis B Virus during 15 Years of Chronic Infection following Liver Transplantation

    PubMed Central

    Betz-Stablein, B. D.; Töpfer, A.; Littlejohn, M.; Yuen, L.; Colledge, D.; Sozzi, V.; Angus, P.; Thompson, A.; Revill, P.; Beerenwinkel, N.; Warner, N.

    2016-01-01

    ABSTRACT Chronic hepatitis B (CHB) is prevalent worldwide. The infectious agent, hepatitis B virus (HBV), replicates via an RNA intermediate and is error prone, leading to the rapid generation of closely related but not identical viral variants, including those that can escape host immune responses and antiviral treatments. The complexity of CHB can be further enhanced by the presence of HBV variants with large deletions in the genome generated via splicing (spHBV variants). Although spHBV variants are incapable of autonomous replication, their replication is rescued by wild-type HBV. spHBV variants have been shown to enhance wild-type virus replication, and their prevalence increases with liver disease progression. Single-molecule deep sequencing was performed on whole HBV genomes extracted from samples, including the liver explant, longitudinally collected from a subject with CHB over a 15-year period after liver transplantation. By employing novel bioinformatics methods, this analysis showed that the dynamics of the viral population across a period of changing treatment regimens was complex. The spHBV variants detected in the liver explant remained present posttransplantation, and a highly diverse novel spHBV population as well as variants with multiple deletions in the pre-S genes emerged. The identification of novel mutations outside the HBV reverse transcriptase gene that co-occurred with known drug resistance-associated mutations highlights the relevance of using full-genome deep sequencing and supports the hypothesis that drug resistance involves interactions across the full length of the HBV genome. IMPORTANCE Single-molecule sequencing allowed the characterization, in unprecedented detail, of the evolution of HBV populations and offered unique insights into the dynamics of defective and spHBV variants following liver transplantation and complex treatment regimens. This analysis also showed the rapid adaptation of HBV populations to treatment regimens with evolving drug resistance phenotypes and evidence of purifying selection across the whole genome. Finally, the new open-source bioinformatics tools with the capacity to easily identify potential spliced variants from deep sequencing data are freely available. PMID:27252524

  7. Identification of a novel alternative splicing variant of hemocyanin from shrimp Litopenaeus vannamei.

    PubMed

    Zhao, Shan; Lu, Xin; Zhang, Yueling; Zhao, Xianliang; Zhong, Mingqi; Li, Shengkang; Lun, Jingsheng

    2013-01-01

    Recent evidence suggests that invertebrates express families of immune molecules with high levels of sequence diversity. Hemocyanin is an important non-specific immune molecule present in the hemolymph of both mollusks and arthropods. In the present study, we characterized a novel alternative splicing variant of hemocyanin (cHE1) from Litopenaeus vannamei that produced an mRNA transcript 2579 bp in length. The isoform contained two additional sequences of 296 and 267 bp at the 5'- and 3'-termini, respectively, in comparison to wild-type hemocyanin (cHE). The sequence of cHE1 shows 100% identity to that of hemocyanin genomic DNA (HE, which does not form an open reading frame), suggesting that cHE1 might be an alternative splicing variant due to intron retention. Moreover, cHE1 could be detected by RT-PCR in five tissues (heart, gill, stomach, intestine and brain), and in shrimp at stages from nauplius to mysis larva. Further, cHE1 mRNA transcripts were significantly increased in hearts after 12 h of infection with Vibrio parahaemolyticus or poly I:C, while no significant difference in the transcript levels of hepatopancreas cHE was detected in the pathogen-treated shrimp during the period. In summary, these results suggest a novel splicing variant of hemocyanin in shrimp, which might be involved in shrimp resistance to pathogenic infection. Copyright © 2013 Elsevier B.V. All rights reserved.

  8. Congenital chloride diarrhea needs to be distinguished from Bartter and Gitelman syndrome.

    PubMed

    Matsunoshita, Natsuki; Nozu, Kandai; Yoshikane, Masahide; Kawaguchi, Azusa; Fujita, Naoya; Morisada, Naoya; Ishimori, Shingo; Yamamura, Tomohiko; Minamikawa, Shogo; Horinouchi, Tomoko; Nakanishi, Keita; Fujimura, Junya; Ninchoji, Takeshi; Morioka, Ichiro; Nagase, Hiroaki; Taniguchi-Ikeda, Mariko; Kaito, Hiroshi; Iijima, Kazumoto

    2018-05-30

    Pseudo-Bartter/Gitelman syndrome (p-BS/GS) encompasses a clinically heterogeneous group of inherited or acquired disorders similar to Bartter syndrome (BS) or Gitelman syndrome (GS), both renal salt-losing tubulopathies. Phenotypic overlap frequently occurs between p-BS/GS and BS/GS, which are difficult to diagnose based on their clinical presentation and require genetic tests for accurate diagnosis. In addition, p-BS/GS can occur as a result of other inherited diseases such as cystic fibrosis, autosomal dominant hypocalcemia, Dent disease, or congenital chloride diarrhea (CCD). However, the detection of the variants in genes other than known BS/GS-causing genes by conventional Sanger sequencing requires substantial time and resources. We studied 27 cases clinically diagnosed with BS/GS, but with negative genetic tests for known BS/GS genes. We conducted targeted sequencing for 22 genes including genes responsible for tubulopathies and other inherited diseases manifesting with p-BS/GS symptoms. We detected the SLC26A3 gene variants responsible for CCD in two patients. In Patient 1, we found the SLC26A3 compound heterozygous variants: c.354delC and c.1008insT. In Patient 2, we identified the compound heterozygous variants: c.877G > A, p.(Glu293Lys), and c.1008insT. Our results suggest that a comprehensive genetic screening system using targeted sequencing is useful for the diagnosis of patients with p-BS/GS with alternative genetic origins.

  9. cnvScan: a CNV screening and annotation tool to improve the clinical utility of computational CNV prediction from exome sequencing data.

    PubMed

    Samarakoon, Pubudu Saneth; Sorte, Hanne Sørmo; Stray-Pedersen, Asbjørg; Rødningen, Olaug Kristin; Rognes, Torbjørn; Lyle, Robert

    2016-01-14

    With advances in next generation sequencing technology and analysis methods, single nucleotide variants (SNVs) and indels can be detected with high sensitivity and specificity in exome sequencing data. Recent studies have demonstrated the ability to detect disease-causing copy number variants (CNVs) in exome sequencing data. However, exonic CNV prediction programs have shown high false positive CNV counts, which is the major limiting factor for the applicability of these programs in clinical studies. We have developed a tool (cnvScan) to improve the clinical utility of computational CNV prediction in exome data. cnvScan can accept input from any CNV prediction program. cnvScan consists of two steps: CNV screening and CNV annotation. CNV screening evaluates CNV prediction using quality scores and refines this using an in-house CNV database, which greatly reduces the false positive rate. The annotation step provides functionally and clinically relevant information using multiple source datasets. We assessed the performance of cnvScan on CNV predictions from five different prediction programs using 64 exomes from Primary Immunodeficiency (PIDD) patients, and identified PIDD-causing CNVs in three individuals from two different families. In summary, cnvScan reduces the time and effort required to detect disease-causing CNVs by reducing the false positive count and providing annotation. This improves the clinical utility of CNV detection in exome data.
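
    The two-stage screening idea (a quality cut-off followed by a comparison against an in-house CNV database) can be sketched as follows. This is only an illustration under stated assumptions: the `CNV` record, the `min_quality` and `max_overlap` thresholds, and the reciprocal-overlap rule are all hypothetical and are not taken from cnvScan, whose actual scoring and database matching are not described in the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CNV:
    chrom: str
    start: int      # half-open interval [start, end)
    end: int
    kind: str       # "DEL" or "DUP"
    quality: float  # score reported by the prediction program


def reciprocal_overlap(a, b):
    """Smaller of the two overlap fractions; 0.0 if the calls do not overlap or differ in type."""
    if a.chrom != b.chrom or a.kind != b.kind:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start)
    if shared <= 0:
        return 0.0
    return min(shared / (a.end - a.start), shared / (b.end - b.start))


def screen(predictions, in_house, min_quality=30.0, max_overlap=0.5):
    """Drop low-quality calls and calls that match a CNV already seen in the in-house database."""
    kept = []
    for cnv in predictions:
        if cnv.quality < min_quality:
            continue
        if any(reciprocal_overlap(cnv, known) >= max_overlap for known in in_house):
            continue
        kept.append(cnv)
    return kept


# Hypothetical predictions and database entries
predictions = [
    CNV("chr1", 1000, 2000, "DEL", 50.0),
    CNV("chr1", 5000, 6000, "DUP", 10.0),   # fails the quality cut-off
    CNV("chr2", 100, 900, "DEL", 45.0),     # matches an in-house entry
]
in_house = [CNV("chr2", 150, 950, "DEL", 0.0)]
kept = screen(predictions, in_house)
print(kept)
```

    Treating recurrent in-house calls as likely artefacts or common polymorphisms is one plausible reading of how a local database can reduce false positives; the surviving calls would then go on to the annotation step.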

  10. New genetic variants of LATS1 detected in urinary bladder and colon cancer.

    PubMed

    Saadeldin, Mona K; Shawer, Heba; Mostafa, Ahmed; Kassem, Neemat M; Amleh, Asma; Siam, Rania

    2014-01-01

    LATS1, the large tumor suppressor 1 gene, encodes a serine/threonine kinase and is implicated in cell cycle progression. LATS1 is down-regulated in various human cancers, such as breast cancer and astrocytoma. Point mutations in LATS1 have been reported in human sarcomas. Additionally, loss of heterozygosity of the LATS1 chromosomal region predisposes to breast, ovarian, and cervical tumors. In the current study, we investigated LATS1 genetic variations, including single nucleotide polymorphisms (SNPs), in 28 Egyptian patients with either urinary bladder or colon cancer. The LATS1 gene was amplified and sequenced, and the expression of LATS1 at the RNA level was assessed in 12 urinary bladder cancer samples. We report the identification of a total of 29 variants, including previously identified SNPs, within LATS1 coding and non-coding sequences. A total of 18 variants were novel. The majority of the novel variants (13) mapped to intronic sequences and untranslated regions of the gene. Four of the five novel variants located in the coding region of the gene represented missense mutations within the serine/threonine kinase catalytic domain. Interestingly, LATS1 RNA steady-state levels were lost in urinary bladder cancerous tissue harboring four specific SNPs (16045 + 41736 + 34614 + 56177) positioned in the 5'UTR, intron 6, and two silent mutations within exon 4 and exon 8, respectively. This study identifies novel single-base-sequence alterations in the LATS1 gene. These newly identified variants could potentially be used as novel diagnostic or prognostic tools in cancer.

  11. mirVAFC: A Web Server for Prioritizations of Pathogenic Sequence Variants from Exome Sequencing Data via Classifications.

    PubMed

    Li, Zhongshan; Liu, Zhenwei; Jiang, Yi; Chen, Denghui; Ran, Xia; Sun, Zhong Sheng; Wu, Jinyu

    2017-01-01

    Exome sequencing has been widely used to identify the genetic variants underlying human genetic disorders for clinical diagnoses, but identifying pathogenic sequence variants among the huge number of benign ones is complicated and challenging. Here, we describe a new Web server named mirVAFC for prioritizing pathogenic sequence variants from clinical exome sequencing (CES) variant data of a single individual or a family. mirVAFC is able to comprehensively annotate sequence variants, filter out most irrelevant variants using custom criteria, classify variants into different categories according to estimated pathogenicity, and lastly prioritize pathogenic variants based on classifications and mutation effects. Case studies using different types of datasets for different diseases, from publications and from our in-house data, have shown that mirVAFC can efficiently identify the correct pathogenic candidates reported in the original work in each case. Overall, the Web server mirVAFC is specifically developed for identifying pathogenic sequence variants from family-based CES variants using classification-based prioritization. The mirVAFC Web server is freely accessible at https://www.wzgenomics.cn/mirVAFC/. © 2016 WILEY PERIODICALS, INC.

  12. Association between sequence variants in panicle development genes and the number of spikelets per panicle in rice.

    PubMed

    Jang, Su; Lee, Yunjoo; Lee, Gileung; Seo, Jeonghwan; Lee, Dongryung; Yu, Yoye; Chin, Joong Hyoun; Koh, Hee-Jong

    2018-01-15

    Balancing panicle-related traits, such as panicle length and the numbers of primary and secondary branches per panicle, is key to improving the number of spikelets per panicle in rice. Identifying genetic information contributes to a broader understanding of the roles of genes and provides candidate alleles for use as DNA markers. Discovering relations between panicle-related traits and sequence variants opens opportunities for molecular applications in rice breeding to improve the number of spikelets per panicle. In total, 142 polymorphic sites, which constructed 58 haplotypes, were detected in the coding regions of ten panicle development genes, and 35 sequence variants in six genes were significantly associated with panicle-related traits. Rice cultivars were clustered according to their sequence variant profiles. One of the four resultant clusters, which contained only indica and tong-il varieties, exhibited the largest average number of favorable alleles and the highest average number of spikelets per panicle, suggesting that the favorable allele combination found in this cluster was beneficial in increasing the number of spikelets per panicle. Favorable alleles identified in this study can be used to develop functional markers for rice breeding programs. Furthermore, stacking several favorable alleles has the potential to substantially improve the number of spikelets per panicle in rice.

  13. Next-generation sequencing reveals a novel NDP gene mutation in a Chinese family with Norrie disease.

    PubMed

    Huang, Xiaoyan; Tian, Mao; Li, Jiankang; Cui, Ling; Li, Min; Zhang, Jianguo

    2017-11-01

    Norrie disease (ND) is a rare X-linked genetic disorder, the main symptoms of which are congenital blindness and white pupils. It has been reported that ND is caused by mutations in the NDP gene. Although many mutations in NDP have been reported, the genetic cause for many patients remains unknown. In this study, the aim is to investigate the genetic defect in a five-generation family with typical symptoms of ND. To identify the causative gene, next-generation sequencing based target capture sequencing was performed. Segregation analysis of the candidate variant was performed in additional family members using Sanger sequencing. We identified a novel missense variant (c.314C>A) located within the NDP gene. The mutation cosegregated within all affected individuals in the family and was not found in unaffected members. By happenstance, in this family, we also detected a known pathogenic variant of retinitis pigmentosa in a healthy individual. c.314C>A mutation of NDP gene is a novel mutation and broadens the genetic spectrum of ND.

  14. Next-generation sequencing reveals a novel NDP gene mutation in a Chinese family with Norrie disease

    PubMed Central

    Huang, Xiaoyan; Tian, Mao; Li, Jiankang; Cui, Ling; Li, Min; Zhang, Jianguo

    2017-01-01

    Purpose: Norrie disease (ND) is a rare X-linked genetic disorder, the main symptoms of which are congenital blindness and white pupils. It has been reported that ND is caused by mutations in the NDP gene. Although many mutations in NDP have been reported, the genetic cause for many patients remains unknown. In this study, the aim is to investigate the genetic defect in a five-generation family with typical symptoms of ND. Methods: To identify the causative gene, next-generation sequencing based target capture sequencing was performed. Segregation analysis of the candidate variant was performed in additional family members using Sanger sequencing. Results: We identified a novel missense variant (c.314C>A) located within the NDP gene. The mutation cosegregated within all affected individuals in the family and was not found in unaffected members. By happenstance, in this family, we also detected a known pathogenic variant of retinitis pigmentosa in a healthy individual. Conclusion: c.314C>A mutation of NDP gene is a novel mutation and broadens the genetic spectrum of ND. PMID:29133643

  15. Occurrence of novel GII.17 and GII.21 norovirus variants in the coastal environment of South Korea in 2015

    PubMed Central

    Koo, Eung Seo; Kim, Man Su; Choi, Yong Seon; Park, Kwon-Sam; Jeong, Yong Seok

    2017-01-01

    Human norovirus (HNoV), a positive-sense RNA virus, is the main causative agent of acute viral gastroenteritis. Multiple pandemic variants of the genogroup II genotype 4 (GII.4) of NoV have attracted great attention from researchers worldwide. However, novel variants of GII.17 have been overtaking those pandemic variants in some areas of East Asia. To investigate the environmental occurrence of GII in South Korea, we collected water samples from coastal streams and a neighboring waste water treatment plant in North Jeolla province (in March, July, and December of 2015). Based on capsid gene region C analysis, four different genotypes (GII.4, GII.13, GII.17, and GII.21) were detected, with much higher prevalence of GII.17 than of GII.4. Additional sequence analyses of the ORF1-ORF2 junction and ORF2 from the water samples revealed that the GII.17 sequences in this study were closely related to the novel strains of GII.P17-GII.17, the main causative variants of the 2014–2015 HNoV outbreak in China and Japan. In addition, the GII.P21-GII.21 variants were identified in this study and they had new amino acid sequence variations in the blockade epitopes of the P2 domain. From these results, we present two important findings: 1) the novel GII.P17-GII.17 variants appeared to be predominant in the study area, and 2) new GII.21 variants have emerged in South Korea. PMID:28199388

  16. Rapid differentiation of citrus Hop stunt viroid variants by real-time RT-PCR and high resolution melting analysis.

    PubMed

    Loconsole, Giuliana; Onelge, Nuket; Yokomi, Raymond K; Kubaa, Raied Abou; Savino, Vito; Saponari, Maria

    2013-01-01

    The RNA genomes of pathogenic and non-pathogenic variants of citrus Hop stunt viroid (HSVd) differ by five to six nucleotides located within the variable (V) domain referred to as the "cachexia expression motif". Sensitive hosts such as mandarin and its hybrids are seriously affected by cachexia disease. Current methods to differentiate HSVd variants rely on lengthy greenhouse biological indexing on Parson's Special mandarin and/or direct nucleotide sequence analysis of amplicons from RT-PCR of HSVd-infected plants. Two independent high-throughput assays to segregate HSVd variants by real-time RT-PCR and High-Resolution Melting Temperature (HRM) analysis were developed: one based on EVAGreen dye; the other based on TaqMan probes. Primers for both assays targeted three differentiating nucleotides in the V domain, which separated HSVd variants into three clusters by distinct melting temperatures with a confidence level higher than 98%. The accuracy of the HRM assays was validated by nucleotide sequencing of representative samples within each HRM cluster and by testing 45 HSVd-infected field trees from California, Italy, Spain, Syria and Turkey. To our knowledge, this is the first report of a rapid and sensitive approach to detect and differentiate HSVd variants associated with different biological behaviors. Although HSVd is found in several crops including citrus, cachexia variants are restricted to some citrus-growing areas, particularly the Mediterranean Region. Rapid diagnosis of cachexia and non-cachexia variants is thus important for the management of HSVd in citrus and reduces the need for bioindexing and sequencing analysis. Copyright © 2013 Elsevier Ltd. All rights reserved.

  17. A novel variant of FGFR3 causes proportionate short stature.

    PubMed

    Kant, Sarina G; Cervenkova, Iveta; Balek, Lukas; Trantirek, Lukas; Santen, Gijs W E; de Vries, Martine C; van Duyvenvoorde, Hermine A; van der Wielen, Michiel J R; Verkerk, Annemieke J M H; Uitterlinden, André G; Hannema, Sabine E; Wit, Jan M; Oostdijk, Wilma; Krejci, Pavel; Losekoot, Monique

    2015-06-01

    Mutations of the fibroblast growth factor receptor 3 (FGFR3) cause various forms of short stature, of which the least severe phenotype is hypochondroplasia, mainly characterized by disproportionate short stature. Testing for an FGFR3 mutation is currently not part of routine diagnostic testing in children with short stature without disproportion. A three-generation family A with dominantly transmitted proportionate short stature was studied by whole-exome sequencing to identify the causal gene mutation. Functional studies and protein modeling studies were performed to confirm the pathogenicity of the mutation found in FGFR3. We performed Sanger sequencing in a second family B with dominant proportionate short stature and identified a rare variant in FGFR3. Exome sequencing and/or Sanger sequencing was performed, followed by functional studies using transfection of the mutant FGFR3 into cultured cells; homology modeling was used to construct a three-dimensional model of the two FGFR3 variants. A novel p.M528I mutation in FGFR3 was detected in family A, which segregates with short stature and proved to be activating in vitro. In family B, a rare variant (p.F384L) was found in FGFR3, which did not segregate with short stature and showed normal functionality in vitro compared with WT. Proportionate short stature can be caused by a mutation in FGFR3. Sequencing of this gene can be considered in patients with short stature, especially when there is an autosomal dominant pattern of inheritance. However, functional studies and segregation studies should be performed before concluding that a variant is pathogenic. © 2015 European Society of Endocrinology.

  18. Sequence variant classification and reporting: recommendations for improving the interpretation of cancer susceptibility genetic test results

    PubMed Central

    Plon, Sharon E.; Eccles, Diana M.; Easton, Douglas; Foulkes, William D.; Genuardi, Maurizio; Greenblatt, Marc S.; Hogervorst, Frans B.L.; Hoogerbrugge, Nicoline; Spurdle, Amanda B.; Tavtigian, Sean

    2011-01-01

    Genetic testing of cancer susceptibility genes is now widely applied in clinical practice to predict risk of developing cancer. In general, sequence-based testing of germline DNA is used to determine whether an individual carries a change that is clearly likely to disrupt normal gene function. Genetic testing may detect changes that are clearly pathogenic, clearly neutral or variants of unclear clinical significance. Such variants present a considerable challenge to the diagnostic laboratory and the receiving clinician in terms of interpretation and clear presentation of the implications of the result to the patient. There does not appear to be a consistent approach to interpreting and reporting the clinical significance of variants either among genes or among laboratories. The potential for confusion among clinicians and patients is considerable and misinterpretation may lead to inappropriate clinical consequences. In this article we review the current state of sequence-based genetic testing, describe other standardized reporting systems used in oncology and propose a standardized classification system for application to sequence based results for cancer predisposition genes. We suggest a system of five classes of variants based on the degree of likelihood of pathogenicity. Each class is associated with specific recommendations for clinical management of at-risk relatives that will depend on the syndrome. We propose that panels of experts on each cancer predisposition syndrome facilitate the classification scheme and designate appropriate surveillance and cancer management guidelines. The international adoption of a standardized reporting system should improve the clinical utility of sequence-based genetic tests to predict cancer risk. PMID:18951446

  19. Next-Generation Sequencing in Oncology: Genetic Diagnosis, Risk Prediction and Cancer Classification

    PubMed Central

    Kamps, Rick; Brandão, Rita D.; van den Bosch, Bianca J.; Paulussen, Aimee D. C.; Xanthoulea, Sofia; Blok, Marinus J.; Romano, Andrea

    2017-01-01

    Next-generation sequencing (NGS) technology has expanded in the last decades with significant improvements in the reliability, sequencing chemistry, pipeline analyses, data interpretation and costs. Such advances make the use of NGS feasible in clinical practice today. This review describes the recent technological developments in NGS applied to the field of oncology. A number of clinical applications are reviewed, i.e., mutation detection in inherited cancer syndromes based on DNA-sequencing, detection of spliceogenic variants based on RNA-sequencing, DNA-sequencing to identify risk modifiers and application for pre-implantation genetic diagnosis, cancer somatic mutation analysis, pharmacogenetics and liquid biopsy. Conclusive remarks, clinical limitations, implications and ethical considerations that relate to the different applications are provided. PMID:28146134

  20. A 3.4-kb Copy-Number Deletion near EPAS1 Is Significantly Enriched in High-Altitude Tibetans but Absent from the Denisovan Sequence

    PubMed Central

    Lou, Haiyi; Lu, Yan; Lu, Dongsheng; Fu, Ruiqing; Wang, Xiaoji; Feng, Qidi; Wu, Sijie; Yang, Yajun; Li, Shilin; Kang, Longli; Guan, Yaqun; Hoh, Boon-Peng; Chung, Yeun-Jun; Jin, Li; Su, Bing; Xu, Shuhua

    2015-01-01

    Tibetan high-altitude adaptation (HAA) has been studied extensively, and many candidate genes have been reported. Subsequent efforts targeting HAA functional variants, however, have not been that successful (e.g., no functional variant has been suggested for the top candidate HAA gene, EPAS1). With WinXPCNVer, a method developed in this study, we detected in microarray data a Tibetan-enriched deletion (TED) carried by 90% of Tibetans; 50% were homozygous for the deletion, whereas only 3% carried the TED and 0% carried the homozygous deletion in 2,792 worldwide samples (p < 10^-15). We employed long PCR and Sanger sequencing technologies to determine the exact copy number and breakpoints of the TED in 70 additional Tibetan and 182 diverse samples. The TED had identical boundaries (chr2: 46,694,276–46,697,683; hg19) and was 80 kb downstream of EPAS1. Notably, the TED was in strong linkage disequilibrium (LD; r^2 = 0.8) with EPAS1 variants associated with reduced blood concentrations of hemoglobin. It was also in complete LD with the 5-SNP motif, which was suspected to be introgressed from Denisovans, but the deletion itself was absent from the Denisovan sequence. Correspondingly, we detected that footprints of positive selection for the TED occurred 12,803 (95% confidence interval = 12,075–14,725) years ago. We further whole-genome deep sequenced (>60×) seven Tibetans and verified the TED but failed to identify any other copy-number variations with comparable patterns, giving this TED top priority for further study. We speculate that the specific patterns of the TED resulted from its own functionality in HAA of Tibetans or LD with a functional variant of EPAS1. PMID:26073780
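
    Two of the quantities quoted above can be checked or illustrated with a few lines of arithmetic. The sketch below assumes the reported boundaries are inclusive coordinates, which reproduces the 3.4-kb size in the title, and uses the standard haplotype-frequency definition of the LD statistic r squared. The function name and the haplotype counts are hypothetical and are not data from the study.

```python
# Deletion size from the reported hg19 boundaries (assumed inclusive)
ted_start, ted_end = 46_694_276, 46_697_683
ted_length = ted_end - ted_start + 1
print(ted_length)  # length in bp


def r_squared(n_AB, n_Ab, n_aB, n_ab):
    """LD r^2 between two biallelic loci, computed from the four haplotype counts."""
    n = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / n
    p_A = (n_AB + n_Ab) / n
    p_B = (n_AB + n_aB) / n
    d = p_AB - p_A * p_B  # disequilibrium coefficient D
    return d * d / (p_A * (1 - p_A) * p_B * (1 - p_B))


# Hypothetical haplotype counts: perfectly correlated loci, then independent loci
print(r_squared(50, 0, 0, 50))
print(r_squared(25, 25, 25, 25))
```

    A value of 1 corresponds to the complete LD described for the 5-SNP motif, and 0 to no association; the reported r^2 of 0.8 with the EPAS1 variants lies between the two.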

  1. Privacy preserving protocol for detecting genetic relatives using rare variants.

    PubMed

    Hormozdiari, Farhad; Joo, Jong Wha J; Wadia, Akshay; Guan, Feng; Ostrosky, Rafail; Sahai, Amit; Eskin, Eleazar

    2014-06-15

    High-throughput sequencing technologies have impacted many areas of genetic research. One such area is the identification of relatives from genetic data. The standard approach for the identification of genetic relatives collects the genomic data of all individuals and stores it in a database. Then, each pair of individuals is compared to detect the set of genetic relatives, and the matched individuals are informed. The main drawback of this approach is that individuals must share their genetic data with a trusted third party that performs the relatedness test. In this work, we propose a secure protocol to detect genetic relatives from sequencing data while not exposing any information about their genomes. We assume that individuals have access to their genome sequences but do not want to share their genomes with anyone else. Unlike previous approaches, our approach uses both common and rare variants, which provides the ability to detect much more distant relationships securely. Using simulated data generated from the 1000 Genomes data, we show that we can easily detect up to fifth-degree cousins, which was not possible using existing methods. We also show that our method can detect individuals with cryptic relationships in the 1000 Genomes data. The software is freely available for download at http://genetics.cs.ucla.edu/crypto/. © The Author 2014. Published by Oxford University Press.

  2. [Structural organization of 5S ribosomal DNA of Rosa rugosa].

    PubMed

    Tynkevych, Iu O; Volkov, R A

    2014-01-01

    In order to clarify molecular organization of the genomic region encoding 5S rRNA in diploid species Rosa rugosa several 5S rDNA repeated units were cloned and sequenced. Analysis of the obtained sequences revealed that only one length variant of 5S rDNA repeated units, which contains intact promoter elements in the intergenic spacer region (IGS) and appears to be transcriptionally active is present in the genome. Additionally, a limited number of 5S rDNA pseudogenes lacking a portion of coding sequence and the complete IGS was detected. A high level of sequence similarity (from 93.7 to 97.5%) between the IGS of major 5S rDNA variants of East Asian R. rugosa and North American R. nitida was found indicating comparatively recent divergence of these species.

  3. Distribution of gene mutations in sporadic congenital cataract in a Han Chinese population

    PubMed Central

    Li, Dan; Wang, Siying; Ye, Hongfei; Tang, Yating; Qiu, Xiaodi; Fan, Qi; Rong, Xianfang; Liu, Xin; Chen, Yuhong; Yang, Jin

    2016-01-01

    Purpose This study aimed to investigate the genetic effects underlying non-familial sporadic congenital cataract (SCC). Methods We collected DNA samples from 74 patients with SCC and 20 patients with traumatic cataract (TC) in an age-matched group and performed genomic sequencing of 61 lens-related genes with target region capture and next-generation sequencing (NGS). The suspected SCC variants were validated with MassARRAY and Sanger sequencing. DNA samples from 103 healthy subjects were used as additional controls in the confirmation examination. Results By filtering against common variants in public databases and those associated with TC cases, we identified 23 SCC-specific variants in 17 genes from 19 patients, which were predicted to be functional. These mutations were further confirmed by examination of the 103 healthy controls. Among the mutated genes, CRYBB3 had the highest mutation frequency, with mutations detected four times in four patients, followed by EPHA2, NHS, and WDR36, mutations of which were each detected two times in two patients. We observed that the four patients with CRYBB3 mutations had three different cataract phenotypes. Conclusions From this study, we conclude that SCC is clinically and genetically heterogeneous. This is the first study to report broad spectrum genotyping for patients with SCC. PMID:27307692

  4. A phylogenetic framework facilitates Y-STR variant discovery and classification via massively parallel sequencing.

    PubMed

    Huszar, Tunde I; Jobling, Mark A; Wetton, Jon H

    2018-04-12

    Short tandem repeats on the male-specific region of the Y chromosome (Y-STRs) are permanently linked as haplotypes, and therefore Y-STR sequence diversity can be considered within the robust framework of a phylogeny of haplogroups defined by single nucleotide polymorphisms (SNPs). Here we use massively parallel sequencing (MPS) to analyse the 23 Y-STRs in Promega's prototype PowerSeq™ Auto/Mito/Y System kit (containing the markers of the PowerPlex® Y23 [PPY23] System) in a set of 100 diverse Y chromosomes whose phylogenetic relationships are known from previous megabase-scale resequencing. Including allele duplications and alleles resulting from likely somatic mutation, we characterised 2311 alleles, demonstrating 99.83% concordance with capillary electrophoresis (CE) data on the same sample set. The set contains 267 distinct sequence-based alleles (an increase of 58% compared to the 169 detectable by CE), including 60 novel Y-STR variants phased with their flanking sequences which have not been reported previously to our knowledge. Variation includes 46 distinct alleles containing non-reference variants of SNPs/indels in both repeat and flanking regions, and 145 distinct alleles containing repeat pattern variants (RPV). For DYS385a,b, DYS481 and DYS390 we observed repeat count variation in short flanking segments previously considered invariable, and suggest new MPS-based structural designations based on these. We considered the observed variation in the context of the Y phylogeny: several specific haplogroup associations were observed for SNPs and indels, reflecting the low mutation rates of such variant types; however, RPVs showed less phylogenetic coherence and more recurrence, reflecting their relatively high mutation rates. 
    In conclusion, our study reveals considerable additional diversity at the Y-STRs of the PPY23 set via MPS analysis, demonstrates high concordance with CE data, facilitates nomenclature standardisation, and places Y-STR sequence variants in their phylogenetic context. Copyright © 2018 The Authors. Published by Elsevier B.V. All rights reserved.

  5. Viral metagenomic analysis of feces of wild small carnivores

    PubMed Central

    2014-01-01

    Background Recent studies have clearly demonstrated the enormous virus diversity that exists among wild animals. This exemplifies the required expansion of our knowledge of the virus diversity present in wildlife, as well as the potential transmission of these viruses to domestic animals or humans. Methods In the present study we evaluated the viral diversity of fecal samples (n = 42) collected from 10 different species of wild small carnivores inhabiting the northern part of Spain using random PCR in combination with next-generation sequencing. Samples were collected from American mink (Neovison vison), European mink (Mustela lutreola), European polecat (Mustela putorius), European pine marten (Martes martes), stone marten (Martes foina), Eurasian otter (Lutra lutra) and Eurasian badger (Meles meles) of the family of Mustelidae; common genet (Genetta genetta) of the family of Viverridae; red fox (Vulpes vulpes) of the family of Canidae and European wild cat (Felis silvestris) of the family of Felidae. Results A number of sequences of possible novel viruses or virus variants were detected, including a theilovirus, phleboviruses, an amdovirus, a kobuvirus and picobirnaviruses. Conclusions Using random PCR in combination with next generation sequencing, sequences of various novel viruses or virus variants were detected in fecal samples collected from Spanish carnivores. Detected novel viruses highlight the viral diversity that is present in fecal material of wild carnivores. PMID:24886057

  6. A typing scheme for the honeybee pathogen Melissococcus plutonius allows detection of disease transmission events and a study of the distribution of variants.

    PubMed

    Haynes, Edward; Helgason, Thorunn; Young, J Peter W; Thwaites, Richard; Budge, Giles E

    2013-08-01

    Melissococcus plutonius is the bacterial pathogen that causes European Foulbrood of honeybees, a globally important honeybee brood disease. We have used next-generation sequencing to identify highly polymorphic regions in an otherwise genetically homogeneous organism, and used these loci to create a modified MLST scheme. This synthesis of a proven typing scheme format with next-generation sequencing combines reliability and low costs with insights only available from high-throughput sequencing technologies. Using this scheme we show that the global distribution of M. plutonius variants is not uniform. We use the scheme in epidemiological studies to trace movements of infective material around England, insights that would have been impossible to confirm without the typing scheme. We also demonstrate the persistence of local variants over time. © 2013 Crown copyright. Reproduced with the permission of the Controller of Her Majesty's Stationery Office/Queen’s Printer for Scotland and Food and Environment Research Agency.

  7. Investigation of Outbreaks of Salmonella enterica Serovar Typhimurium and Its Monophasic Variants Using Whole-Genome Sequencing, Denmark

    PubMed Central

    Gymoese, Pernille; Sørensen, Gitte; Litrup, Eva; Olsen, John Elmerdal; Nielsen, Eva Møller

    2017-01-01

    Whole-genome sequencing is rapidly replacing current molecular typing methods for surveillance purposes. Our study evaluates core-genome single-nucleotide polymorphism analysis for outbreak detection and linking of sources of Salmonella enterica serovar Typhimurium and its monophasic variants during a 7-month surveillance period in Denmark. We reanalyzed and defined 8 previously characterized outbreaks from the phylogenetic relatedness of the isolates, epidemiologic data, and food traceback investigations. All outbreaks were identified, and we were able to exclude unrelated and include additional related human cases. We were furthermore able to link possible food and veterinary sources to the outbreaks. Isolates clustered according to sequence types (STs) 19, 34, and 36. Our study shows that core-genome single-nucleotide polymorphism analysis is suitable for surveillance and outbreak investigation for Salmonella Typhimurium (ST19 and ST36), but whole genome–wide analysis may be required for the tight genetic clone of monophasic variants (ST34). PMID:28930002

  8. Assessing the Power of Exome Chips.

    PubMed

    Page, Christian Magnus; Baranzini, Sergio E; Mevik, Bjørn-Helge; Bos, Steffan Daniel; Harbo, Hanne F; Andreassen, Bettina Kulle

    2015-01-01

    Genotyping chips for rare and low-frequency variants have recently gained popularity with the introduction of exome chips, but the utility of these chips remains unclear. These chips were designed using exome sequencing data from mainly American-European individuals, enriched for a narrow set of common diseases. In addition, it is well-known that the statistical power of detecting associations with rare and low-frequency variants is much lower compared to studies exclusively involving common variants. We developed a simulation program adaptable to any exome chip design to empirically evaluate the power of the exome chips. We implemented the main properties of the Illumina HumanExome BeadChip array. The simulated data sets were used to assess the power of exome chip based studies for varying effect sizes and causal variant scenarios. We applied two widely-used statistical approaches for rare and low-frequency variants, which collapse the variants into genetic regions or genes. Under optimal conditions, we found that a sample size of between 20,000 and 30,000 individuals was needed in order to detect modest effect sizes (0.5% < PAR < 1%) with 80% power. For small effect sizes (PAR < 0.5%), 60,000-100,000 individuals were needed in the presence of non-causal variants. In conclusion, we found that at least tens of thousands of individuals are necessary to detect modest effects under optimal conditions. In addition, when using rare variant chips on cohorts or diseases they were not originally designed for, the identification of associated variants or genes will be even more challenging.

  9. When is it MODY? Challenges in the Interpretation of Sequence Variants in MODY Genes

    PubMed Central

    Althari, Sara; Gloyn, Anna L.

    2015-01-01

    The genomics revolution has raised more questions than it has provided answers. Big data from large population-scale resequencing studies are increasingly deconstructing classic notions of Mendelian disease genetics, which support a simplistic correlation between mutational severity and phenotypic outcome. The boundaries are being blurred as the body of evidence showing monogenic disease-causing alleles in healthy genomes, and in the genomes of individuals with increased common complex disease risk, continues to grow. In this review, we focus on the newly emerging challenges which pertain to the interpretation of sequence variants in genes implicated in the pathogenesis of maturity-onset diabetes of the young (MODY), a presumed monogenic form of diabetes characterized by Mendelian inheritance. These challenges highlight the complexities surrounding the assignments of pathogenicity, in particular to rare protein-altering variants, and bring to the forefront some profound clinical diagnostic implications. As MODY is both genetically and clinically heterogeneous, an accurate molecular diagnosis and cautious extrapolation of sequence data are critical to effective disease management and treatment. The biological and translational value of sequence information can only be attained by adopting a multitude of confirmatory analyses, which interrogate variant implication in disease from every possible angle. Indeed, studies which have effectively detected rare damaging variants in known MODY genes in normoglycemic individuals question the existence of a single gene mutation scenario: does monogenic diabetes exist when the genetic culprits of MODY have been systematically identified in individuals without MODY? PMID:27111119

  10. ExScalibur: A High-Performance Cloud-Enabled Suite for Whole Exome Germline and Somatic Mutation Identification.

    PubMed

    Bao, Riyue; Hernandez, Kyle; Huang, Lei; Kang, Wenjun; Bartom, Elizabeth; Onel, Kenan; Volchenboum, Samuel; Andrade, Jorge

    2015-01-01

    Whole exome sequencing has facilitated the discovery of causal genetic variants associated with human diseases at deep coverage and low cost. In particular, the detection of somatic mutations from tumor/normal pairs has provided insights into the cancer genome. Although there is an abundance of publicly-available software for the detection of germline and somatic variants, concordance is generally limited among variant callers and alignment algorithms. Successful integration of variants detected by multiple methods requires in-depth knowledge of the software, access to high-performance computing resources, and advanced programming techniques. We present ExScalibur, a set of fully automated, highly scalable and modulated pipelines for whole exome data analysis. The suite integrates multiple alignment and variant calling algorithms for the accurate detection of germline and somatic mutations with close to 99% sensitivity and specificity. ExScalibur implements streamlined execution of analytical modules, real-time monitoring of pipeline progress, robust handling of errors and intuitive documentation that allows for increased reproducibility and sharing of results and workflows. It runs on local computers, high-performance computing clusters and cloud environments. In addition, we provide a data analysis report utility to facilitate visualization of the results that offers interactive exploration of quality control files, read alignment and variant calls, assisting downstream customization of potential disease-causing mutations. ExScalibur is open-source and is also available as a public image on Amazon cloud.

  11. Prevalence of pathogenic germline variants detected by multigene sequencing in unselected Japanese patients with ovarian cancer

    PubMed Central

    Hirasawa, Akira; Imoto, Issei; Naruto, Takuya; Akahane, Tomoko; Yamagami, Wataru; Nomura, Hiroyuki; Masuda, Kiyoshi; Susumu, Nobuyuki; Tsuda, Hitoshi; Aoki, Daisuke

    2017-01-01

    Pathogenic germline BRCA1, BRCA2 (BRCA1/2), and several other gene variants predispose women to primary ovarian, fallopian tube, and peritoneal carcinoma (OC), although variant frequency and relevance information is scarce in Japanese women with OC. Using targeted panel sequencing, we screened 230 unselected Japanese women with OC from our hospital-based cohort for pathogenic germline variants in 75 or 79 OC-associated genes. Pathogenic variants of 11 genes were identified in 41 (17.8%) women: 19 (8.3%; BRCA1), 8 (3.5%; BRCA2), 6 (2.6%; mismatch repair genes), 3 (1.3%; RAD51D), 2 (0.9%; ATM), 1 (0.4%; MRE11A), 1 (FANCC), and 1 (GABRA6). Carriers of BRCA1/2 or any other tested gene pathogenic variants were more likely to be diagnosed younger, have first or second-degree relatives with OC, and have OC classified as high-grade serous carcinoma (HGSC). After adjustment for these variables, all 3 features were independent predictive factors for pathogenic variants in any tested genes whereas only the latter two remained for variants in BRCA1/2. Our data indicate similar variant prevalence in Japanese patients with OC and other ethnic groups and suggest that HGSC and OC family history may facilitate genetic predisposition prediction in Japanese patients with OC and referring high-risk patients for genetic counseling and testing. PMID:29348823

  12. Prevalence of pathogenic germline variants detected by multigene sequencing in unselected Japanese patients with ovarian cancer.

    PubMed

    Hirasawa, Akira; Imoto, Issei; Naruto, Takuya; Akahane, Tomoko; Yamagami, Wataru; Nomura, Hiroyuki; Masuda, Kiyoshi; Susumu, Nobuyuki; Tsuda, Hitoshi; Aoki, Daisuke

    2017-12-22

    Pathogenic germline BRCA1, BRCA2 (BRCA1/2), and several other gene variants predispose women to primary ovarian, fallopian tube, and peritoneal carcinoma (OC), although variant frequency and relevance information is scarce in Japanese women with OC. Using targeted panel sequencing, we screened 230 unselected Japanese women with OC from our hospital-based cohort for pathogenic germline variants in 75 or 79 OC-associated genes. Pathogenic variants of 11 genes were identified in 41 (17.8%) women: 19 (8.3%; BRCA1), 8 (3.5%; BRCA2), 6 (2.6%; mismatch repair genes), 3 (1.3%; RAD51D), 2 (0.9%; ATM), 1 (0.4%; MRE11A), 1 (FANCC), and 1 (GABRA6). Carriers of BRCA1/2 or any other tested gene pathogenic variants were more likely to be diagnosed younger, have first or second-degree relatives with OC, and have OC classified as high-grade serous carcinoma (HGSC). After adjustment for these variables, all 3 features were independent predictive factors for pathogenic variants in any tested genes whereas only the latter two remained for variants in BRCA1/2. Our data indicate similar variant prevalence in Japanese patients with OC and other ethnic groups and suggest that HGSC and OC family history may facilitate genetic predisposition prediction in Japanese patients with OC and referring high-risk patients for genetic counseling and testing.

  13. Evolutional dynamics of 45S and 5S ribosomal DNA in ancient allohexaploid Atropa belladonna.

    PubMed

    Volkov, Roman A; Panchuk, Irina I; Borisjuk, Nikolai V; Hosiawa-Baranska, Marta; Maluszynska, Jolanta; Hemleben, Vera

    2017-01-23

    Polyploid hybrids represent a rich natural resource to study molecular evolution of plant genes and genomes. Here, we applied a combination of karyological and molecular methods to investigate chromosomal structure, molecular organization and evolution of ribosomal DNA (rDNA) in nightshade, Atropa belladonna (fam. Solanaceae), one of the oldest known allohexaploids among flowering plants. Because of their abundance and specific molecular organization (evolutionarily conserved coding regions linked to variable intergenic spacers, IGS), 45S and 5S rDNA are widely used in plant taxonomic and evolutionary studies. Molecular cloning and nucleotide sequencing of A. belladonna 45S rDNA repeats revealed a general structure characteristic of other Solanaceae species, and a very high sequence similarity of two length variants, with the only difference in number of short IGS subrepeats. These results, combined with the detection of three pairs of 45S rDNA loci on separate chromosomes, presumably inherited from both tetraploid and diploid ancestor species, exemplify intensive sequence homogenization that led to substitution/elimination of rDNA repeats of one parent. Chromosome silver-staining revealed that only four out of six 45S rDNA sites are frequently transcriptionally active, demonstrating nucleolar dominance. For 5S rDNA, three size variants of repeats were detected, with the major class represented by repeats containing all functional IGS elements required for transcription, the intermediate size repeats containing partially deleted IGS sequences, and the short 5S repeats containing severe defects both in the IGS and coding sequences. While shorter variants demonstrate an increased rate of base substitution, probably in their transition into pseudogenes, the functional 5S rDNA variants are nearly identical at the sequence level, pointing to their origin from a single parental species.
    Localization of the 5S rDNA genes on two chromosome pairs further supports uniparental inheritance from the tetraploid progenitor. The obtained molecular, cytogenetic and phylogenetic data demonstrate complex evolutionary dynamics of rDNA loci in allohexaploid species of Atropa belladonna. The high level of sequence unification revealed in 45S and 5S rDNA loci of this ancient hybrid species has seemingly been achieved by different molecular mechanisms.

  14. HPV-11 variability, persistence and progression to genital warts in men: the HIM study.

    PubMed

    Flores-Díaz, Ema; Sereday, Karen A; Ferreira, Silvaneide; Sirak, Bradley; Sobrinho, João Simão; Baggio, Maria Luiza; Galan, Lenice; Silva, Roberto C; Lazcano-Ponce, Eduardo; Giuliano, Anna R; Villa, Luisa L; Sichero, Laura

    2017-09-01

    HPV-11 and HPV-6 are the etiological agents of about 90 % of genital warts (GWs). The intra-typic variability of HPV-11 and its association with infection persistence and GW development remains undetermined. Here, HPV infection in men (HIM) participants who had an HPV-11 genital swab and/or GW, preceded or not by a normal skin genital swab were analysed. Genomic variants were characterized by PCR-sequencing and classified within lineages (A, B) and sublineages (A1, A2, A3, A4). HPV-11 A2 variants were the most frequently detected in the genital swab samples from controls and in both genital swabs and GW samples from cases. The same HPV-11 variant was detected in the GW sample and its preceding genital swab. There was a lack of association between any particular HPV-11 variant and the increased risk for GW development.

  15. Implementation and utilization of genetic testing in personalized medicine

    PubMed Central

    Abul-Husn, Noura S; Owusu Obeng, Aniwaa; Sanderson, Saskia C; Gottesman, Omri; Scott, Stuart A

    2014-01-01

    Clinical genetic testing began over 30 years ago with the availability of mutation detection for sickle cell disease diagnosis. Since then, the field has dramatically transformed to include gene sequencing, high-throughput targeted genotyping, prenatal mutation detection, preimplantation genetic diagnosis, population-based carrier screening, and now genome-wide analyses using microarrays and next-generation sequencing. Despite these significant advances in molecular technologies and testing capabilities, clinical genetics laboratories historically have been centered on mutation detection for Mendelian disorders. However, the ongoing identification of deoxyribonucleic acid (DNA) sequence variants associated with common diseases prompted the availability of testing for personal disease risk estimation, and created commercial opportunities for direct-to-consumer genetic testing companies that assay these variants. This germline genetic risk, in conjunction with other clinical, family, and demographic variables, are the key components of the personalized medicine paradigm, which aims to apply personal genomic and other relevant data into a patient’s clinical assessment to more precisely guide medical management. However, genetic testing for disease risk estimation is an ongoing topic of debate, largely due to inconsistencies in the results, concerns over clinical validity and utility, and the variable mode of delivery when returning genetic results to patients in the absence of traditional counseling. A related class of genetic testing with analogous issues of clinical utility and acceptance is pharmacogenetic testing, which interrogates sequence variants implicated in interindividual drug response variability. 
    Although clinical pharmacogenetic testing has not previously been widely adopted, advances in rapid turnaround time genetic testing technology and the recent implementation of preemptive genotyping programs at selected medical centers suggest that personalized medicine through pharmacogenetics is now a reality. This review aims to summarize the current state of implementing genetic testing for personalized medicine, with an emphasis on clinical pharmacogenetic testing. PMID:25206309

  16. Mapping DNA Methylation with High Throughput Nanopore Sequencing

    PubMed Central

    Rand, Arthur C.; Jain, Miten; Eizenga, Jordan M.; Musselman-Brown, Audrey; Olsen, Hugh E.; Akeson, Mark

    2017-01-01

    Chemical modifications to DNA regulate its biological function. We present a framework for mapping methylation to cytosine and adenosine with the Oxford Nanopore Technologies MinION using its ionic current signal. We map three cytosine variants and two adenine variants. The results show that our model is sensitive enough to detect changes in genomic DNA methylation levels as a function of growth phase in E. coli. PMID:28218897

  17. Construction of an Exome-Wide Risk Score for Schizophrenia Based on a Weighted Burden Test.

    PubMed

    Curtis, David

    2018-01-01

    Polygenic risk scores obtained as a weighted sum of associated variants can be used to explore association in additional data sets and to assign risk scores to individuals. The methods used to derive polygenic risk scores from common SNPs are not suitable for variants detected in whole exome sequencing studies. Rare variants, which may have major effects, are seen too infrequently to judge whether they are associated and may not be shared between training and test subjects. A method is proposed whereby variants are weighted according to their frequency, their annotations and the genes they affect. A weighted sum across all variants provides an individual risk score. Scores constructed in this way are used in a weighted burden test and are shown to be significantly different between schizophrenia cases and controls using a five-way cross-validation procedure. This approach represents a first attempt to summarise exome sequence variation into a summary risk score, which could be combined with risk scores from common variants and from environmental factors. It is hoped that the method could be developed further. © 2017 John Wiley & Sons Ltd/University College London.
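
    The weighted-sum idea described in this abstract can be illustrated with a minimal sketch. It is not the author's implementation: the weight tables, the linear frequency weighting, the choice to multiply the three weights together, and all names (Variant, frequency_weight, risk_score, GENE_A) are hypothetical assumptions made only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str
    allele_freq: float   # allele frequency in the reference sample
    annotation: str      # predicted functional effect


# Hypothetical weights; the abstract does not give the actual values.
ANNOTATION_WEIGHT = {"synonymous": 1.0, "missense": 5.0, "loss_of_function": 20.0}
GENE_WEIGHT = {"GENE_A": 2.0}  # genes not listed default to 1.0


def frequency_weight(allele_freq: float, max_weight: float = 10.0) -> float:
    """Assumed linear scheme: rarer variants receive larger weights.

    The weight falls from max_weight at frequency 0 to 1.0 at frequency 0.5.
    """
    minor_freq = min(allele_freq, 1.0 - allele_freq)
    return max_weight - (max_weight - 1.0) * (minor_freq / 0.5)


def variant_weight(v: Variant) -> float:
    """Combine frequency, annotation and gene weights (product is an assumption)."""
    return (
        frequency_weight(v.allele_freq)
        * ANNOTATION_WEIGHT.get(v.annotation, 1.0)
        * GENE_WEIGHT.get(v.gene, 1.0)
    )


def risk_score(variants: list[Variant], allele_counts: list[int]) -> float:
    """Weighted sum across all variants carried by one individual."""
    return sum(variant_weight(v) * c for v, c in zip(variants, allele_counts))


variants = [
    Variant("GENE_A", 0.0, "missense"),
    Variant("GENE_B", 0.5, "synonymous"),
]
score = risk_score(variants, allele_counts=[1, 2])
```

    In a burden-test setting, scores computed this way for cases and controls would then be compared; that comparison and the cross-validation step are not shown here.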

  18. Impact of Pathogen Population Heterogeneity and Stress-Resistant Variants on Food Safety.

    PubMed

    Abee, T; Koomen, J; Metselaar, K I; Zwietering, M H; den Besten, H M W

    2016-01-01

    This review elucidates the state-of-the-art knowledge about pathogen population heterogeneity and describes the genotypic and phenotypic analyses of persister subpopulations and stress-resistant variants. The molecular mechanisms underlying the generation of persister phenotypes and genetic variants are identified. Zooming in on Listeria monocytogenes, a comparative whole-genome sequence analysis of wild types and variants that enabled the identification of mutations in variants obtained after a single exposure to lethal food-relevant stresses is described. Genotypic and phenotypic features are compared to those for persistent strains isolated from food processing environments. Inactivation kinetics, models used for fitting, and the concept of kinetic modeling-based schemes for detection of variants are presented. Furthermore, robustness and fitness parameters of L. monocytogenes wild type and variants are used to model their performance in food chains. Finally, the impact of stress-resistant variants and persistence in food processing environments on food safety is discussed.

  19. Epidemiological evolution of canine parvovirus in the Portuguese domestic dog population.

    PubMed

    Miranda, Carla; Parrish, Colin R; Thompson, Gertrude

    2016-02-01

    Since its emergence, canine parvovirus type 2 (CPV-2) has caused disease pandemics with severe gastroenteritis signs, especially in puppies. As a consequence of the rapid evolution of CPV, a variety of genetic and antigenic variants have been reported circulating worldwide. The detection of additional variants of CPV circulating in the dog population in Portugal suggests monitoring of the disease is useful. The objectives of this study were to further detect and characterize circulating field variants from dogs with suspected CPV disease that were admitted to veterinary clinics distributed throughout the country during 2012-2014. Of the 260 fecal samples collected, 198 were CPV positive by PCR, and CPV antigen was detected in 61/109 samples by immunochromatographic (IC) test. The restriction fragment length polymorphism (RFLP) analysis of 167 samples revealed that 86 were CPV-2c. Sequence analysis of the 198 strains confirmed that CPV-2c was the dominant variant (51.5%), followed by CPV-2b (47.5%) and CPV-2a (1%). The variants were irregularly distributed throughout the country and some were detected with additional non-synonymous mutations in the VP2 gene. Phylogenetic analysis demonstrated that the isolates were similar to other European strains, and that this virus continues to evolve. Copyright © 2015 Elsevier B.V. All rights reserved.

  20. Oncogenic Human Papillomavirus (HPV) Type Distribution and HPV Type 16 E6 Variants in Two Spanish Population Groups with Different Levels of HPV Infection Risk

    PubMed Central

    Ortiz, M.; Torres, M.; Muñoz, L.; Fernández-García, E.; Canals, J.; Cabornero, A. I.; Aguilar, E.; Ballesteros, J.; del Amo, J.; García-Sáiz, A.

    2006-01-01

    The aim of this study is to determine oncogenic human papillomavirus (HPV) types and HPV type 16 (HPV16) variant distribution in two Spanish population groups, commercial sex workers and imprisoned women (CSW/IPW) and the general population. A multicenter cross-sectional study of 1,889 women from five clinical settings in two Spanish cities was conducted from May to November 2004. Oncogenic HPV infection was tested by a Hybrid Capture II (HC2) test, and positive samples were genotyped by direct sequencing using three different primer sets in L1 (MY09/11 and GP5+/GP6+) and E6/E7. HPV16 variants were identified by sequencing the E6, E2, and L1 regions. Four hundred twenty-five samples were positive for the HC2 test, 31.5% from CSW/IPW and 10.7% from the general population. HPV16 was the most frequent type. Distinct profiles of oncogenic HPV type prevalence were observed across the two populations. In order of decreasing frequency, HPV types 16, 31, 58, 66, 56, and 18 were most frequent in CSW/IPW women, and types 16, 31, 52, 68, 51, and 53 were most frequent in the general population. We analyzed HPV16 intratype variants, and a large majority (78.7%) belonged to the European lineage. AA variants were detected in 16.0% of cases. African variants belonging to classes Af1 (4.0%) and Af2 (1.3%) were detected. Different HPV types and HPV16 intratype variants are involved in oncogenic HPV infections in our population. These results suggest that HPV type distribution differs in CSW/IPW women and in the general population, although further analysis is necessary. PMID:16597872

  1. Oncogenic human papillomavirus (HPV) type distribution and HPV type 16 E6 variants in two Spanish population groups with different levels of HPV infection risk.

    PubMed

    Ortiz, M; Torres, M; Muñoz, L; Fernández-García, E; Canals, J; Cabornero, A I; Aguilar, E; Ballesteros, J; Del Amo, J; García-Sáiz, A

    2006-04-01

    The aim of this study is to determine oncogenic human papillomavirus (HPV) types and HPV type 16 (HPV16) variant distribution in two Spanish population groups, commercial sex workers and imprisoned women (CSW/IPW) and the general population. A multicenter cross-sectional study of 1,889 women from five clinical settings in two Spanish cities was conducted from May to November 2004. Oncogenic HPV infection was tested by a Hybrid Capture II (HC2) test, and positive samples were genotyped by direct sequencing using three different primer sets in L1 (MY09/11 and GP5+/GP6+) and E6/E7. HPV16 variants were identified by sequencing the E6, E2, and L1 regions. Four hundred twenty-five samples were positive for the HC2 test, 31.5% from CSW/IPW and 10.7% from the general population. HPV16 was the most frequent type. Distinct profiles of oncogenic HPV type prevalence were observed across the two populations. In order of decreasing frequency, HPV types 16, 31, 58, 66, 56, and 18 were most frequent in CSW/IPW women, and types 16, 31, 52, 68, 51, and 53 were most frequent in the general population. We analyzed HPV16 intratype variants, and a large majority (78.7%) belonged to the European lineage. AA variants were detected in 16.0% of cases. African variants belonging to classes Af1 (4.0%) and Af2 (1.3%) were detected. Different HPV types and HPV16 intratype variants are involved in oncogenic HPV infections in our population. These results suggest that HPV type distribution differs in CSW/IPW women and in the general population, although further analysis is necessary.

  2. Hybridization capture reveals evolution and conservation across the entire Koala retrovirus genome.

    PubMed

    Tsangaras, Kyriakos; Siracusa, Matthew C; Nikolaidis, Nikolas; Ishida, Yasuko; Cui, Pin; Vielgrader, Hanna; Helgen, Kristofer M; Roca, Alfred L; Greenwood, Alex D

    2014-01-01

    The koala retrovirus (KoRV) is the only retrovirus known to be in the midst of invading the germ line of its host species. Hybridization capture and next generation sequencing were used on modern and museum DNA samples of koala (Phascolarctos cinereus) to examine ca. 130 years of evolution across the full KoRV genome. Overall, the entire proviral genome appeared to be conserved across time in sequence, protein structure and transcriptional binding sites. A total of 138 polymorphisms were detected, of which 72 were found in more than one individual. At every polymorphic site in the museum koalas, one of the character states matched that of modern KoRV. Among non-synonymous polymorphisms, radical substitutions involving large physicochemical differences between amino acids were elevated in env, potentially reflecting anti-viral immune pressure or avoidance of receptor interference. Polymorphisms were not detected within two functional regions believed to affect infectivity. Host sequences flanking proviral integration sites were also captured, with few proviral loci shared among koalas. Recently described variants of KoRV, designated KoRV-B and KoRV-J, were not detected in museum samples, suggesting that these variants may be of recent origin.

  3. Hybridization Capture Reveals Evolution and Conservation across the Entire Koala Retrovirus Genome

    PubMed Central

    Ishida, Yasuko; Cui, Pin; Vielgrader, Hanna; Helgen, Kristofer M.; Roca, Alfred L.; Greenwood, Alex D.

    2014-01-01

    The koala retrovirus (KoRV) is the only retrovirus known to be in the midst of invading the germ line of its host species. Hybridization capture and next generation sequencing were used on modern and museum DNA samples of koala (Phascolarctos cinereus) to examine ca. 130 years of evolution across the full KoRV genome. Overall, the entire proviral genome appeared to be conserved across time in sequence, protein structure and transcriptional binding sites. A total of 138 polymorphisms were detected, of which 72 were found in more than one individual. At every polymorphic site in the museum koalas, one of the character states matched that of modern KoRV. Among non-synonymous polymorphisms, radical substitutions involving large physicochemical differences between amino acids were elevated in env, potentially reflecting anti-viral immune pressure or avoidance of receptor interference. Polymorphisms were not detected within two functional regions believed to affect infectivity. Host sequences flanking proviral integration sites were also captured, with few proviral loci shared among koalas. Recently described variants of KoRV, designated KoRV-B and KoRV-J, were not detected in museum samples, suggesting that these variants may be of recent origin. PMID:24752422

  4. HPV-6 Molecular Variants Association With the Development of Genital Warts in Men: The HIM Study

    PubMed Central

    Flores-Díaz, Ema; Sereday, Karen A.; Ferreira, Silvaneide; Sirak, Bradley; Sobrinho, João Simão; Baggio, Maria Luiza; Galan, Lenice; Silva, Roberto C.; Lazcano-Ponce, Eduardo; Giuliano, Anna R.; Villa, Luisa L.

    2017-01-01

    Abstract Background. Human papillomavirus type 6 (HPV-6) and HPV-11 are the etiological agents of approximately 90% of genital warts (GWs). The impact of HPV-6 genetic heterogeneity on persistence and progression to GWs remains undetermined. Methods. HPV Infection in Men (HIM) Study participants who had HPV-6 genital swabs and/or GWs preceded by a viable normal genital swab were analyzed. Variant characterization was performed by polymerase chain reaction sequencing and samples classified within lineages (A, B) and sublineages (B1, B2, B3, B4, B5). Country- and age-specific analyses were conducted for individual variants; odds ratios and 95% confidence intervals for the risk of GWs according to HPV-6 variants were calculated. Results. B3 variants were most prevalent. HPV-6 variant distribution differed between countries and case status. HPV-6 B1 variant prevalence was increased in GWs and genital swabs of cases compared to controls. There was a difference in B1 and B3 variant detection in GW and the preceding genital swab. We observed a significant association of HPV-6 B1 variant detection with GW development. Conclusions. HPV-6 B1 variants are more prevalent in genital swabs that precede GW development, and confer an increased risk for GW. Further research is warranted to understand the possible involvement of B1 variants in the progression to clinically relevant lesions. PMID:28011919

  5. Expanding the mutational spectrum of LZTR1 in schwannomatosis.

    PubMed

    Paganini, Irene; Chang, Vivian Y; Capone, Gabriele L; Vitte, Jeremie; Benelli, Matteo; Barbetti, Lorenzo; Sestini, Roberta; Trevisson, Eva; Hulsebos, Theo Jm; Giovannini, Marco; Nelson, Stanley F; Papi, Laura

    2015-07-01

    Schwannomatosis is characterized by the development of multiple non-vestibular, non-intradermal schwannomas. Constitutional inactivating variants in two genes, SMARCB1 and, very recently, LZTR1, have been reported. We performed exome sequencing of 13 schwannomatosis patients from 11 families without SMARCB1 deleterious variants. We identified four individuals with heterozygous loss-of-function variants in LZTR1. Sequencing of the germline of 60 additional patients identified 18 additional heterozygous variants in LZTR1. We identified LZTR1 variants in 43% and 30% of familial (three of the seven families) and sporadic patients, respectively. In addition, we tested LZTR1 protein immunostaining in 22 tumors from nine unrelated patients with and without LZTR1 deleterious variants. Tumors from individuals with LZTR1 variants lost the protein expression in at least a subset of tumor cells, consistent with a tumor suppressor mechanism. In conclusion, our study demonstrates that molecular analysis of LZTR1 may contribute to the molecular characterization of schwannomatosis patients, in addition to NF2 mutational analysis and the detection of chromosome 22 losses in tumor tissue. It will be especially useful in differentiating schwannomatosis from mosaic Neurofibromatosis type 2 (NF2). However, the role of LZTR1 in the pathogenesis of schwannomatosis needs further elucidation.

  6. Expanding the mutational spectrum of LZTR1 in schwannomatosis

    PubMed Central

    Paganini, Irene; Chang, Vivian Y; Capone, Gabriele L; Vitte, Jeremie; Benelli, Matteo; Barbetti, Lorenzo; Sestini, Roberta; Trevisson, Eva; Hulsebos, Theo JM; Giovannini, Marco; Nelson, Stanley F; Papi, Laura

    2015-01-01

    Schwannomatosis is characterized by the development of multiple non-vestibular, non-intradermal schwannomas. Constitutional inactivating variants in two genes, SMARCB1 and, very recently, LZTR1, have been reported. We performed exome sequencing of 13 schwannomatosis patients from 11 families without SMARCB1 deleterious variants. We identified four individuals with heterozygous loss-of-function variants in LZTR1. Sequencing of the germline of 60 additional patients identified 18 additional heterozygous variants in LZTR1. We identified LZTR1 variants in 43% and 30% of familial (three of the seven families) and sporadic patients, respectively. In addition, we tested LZTR1 protein immunostaining in 22 tumors from nine unrelated patients with and without LZTR1 deleterious variants. Tumors from individuals with LZTR1 variants lost the protein expression in at least a subset of tumor cells, consistent with a tumor suppressor mechanism. In conclusion, our study demonstrates that molecular analysis of LZTR1 may contribute to the molecular characterization of schwannomatosis patients, in addition to NF2 mutational analysis and the detection of chromosome 22 losses in tumor tissue. It will be especially useful in differentiating schwannomatosis from mosaic Neurofibromatosis type 2 (NF2). However, the role of LZTR1 in the pathogenesis of schwannomatosis needs further elucidation. PMID:25335493

  7. Loeffler 4.0: Diagnostic Metagenomics.

    PubMed

    Höper, Dirk; Wylezich, Claudia; Beer, Martin

    2017-01-01

    A new world of possibilities for "virus discovery" was opened up with high-throughput sequencing becoming available in the last decade. While scientifically metagenomic analysis was established before the start of the era of high-throughput sequencing, the availability of the first second-generation sequencers was the kick-off for diagnosticians to use sequencing for the detection of novel pathogens. Today, diagnostic metagenomics is becoming the standard procedure for the detection and genetic characterization of new viruses or novel virus variants. Here, we provide an overview of technical considerations of high-throughput sequencing-based diagnostic metagenomics together with selected examples of "virus discovery" for animal diseases or zoonoses and metagenomics for food safety or basic veterinary research. © 2017 Elsevier Inc. All rights reserved.

  8. Efficient and Accurate Algorithm for Cleaved Fragments Prediction (CFPA) in Protein Sequences Dataset Based on Consensus and Its Variants: A Novel Degradomics Prediction Application.

    PubMed

    El-Assaad, Atlal; Dawy, Zaher; Nemer, Georges; Hajj, Hazem; Kobeissy, Firas H

    2017-01-01

    Degradomics is a novel discipline that involves determination of the proteases/substrate fragmentation profile, called the substrate degradome, and has been recently applied in different disciplines. A major application of degradomics is its utility in the field of biomarkers where the breakdown products (BDPs) of different proteases have been investigated. Among the major proteases assessed, calpain and caspase proteases have been associated with the execution phases of the pro-apoptotic and pro-necrotic cell death, generating caspase/calpain-specific cleaved fragments. The distinction between calpain and caspase protein fragments has been applied to distinguish injury mechanisms. Advanced proteomics technology has been used to identify these BDPs experimentally. However, it has been a challenge to identify these BDPs with high precision and efficiency, especially if we are targeting a number of proteins at one time. In this chapter, we present a novel bioinformatic detection method that identifies BDPs accurately and efficiently with validation against experimental data. This method aims at predicting the consensus sequence occurrences and their variants in a large set of experimentally detected protein sequences based on state-of-the-art sequence matching and alignment algorithms. After detection, the method generates all the potential cleaved fragments by a specific protease. This space and time-efficient algorithm is flexible to handle the different orientations that the consensus sequence and the protein sequence can take before cleaving. It is O(mn) in space complexity and O(Nmn) in time complexity, with N number of protein sequences, m length of the consensus sequence, and n length of each protein sequence. Ultimately, this knowledge will feed into the development of a novel tool for researchers to detect diverse types of selected BDPs as putative disease markers, contributing to the diagnosis and treatment of related disorders.
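
    The consensus-matching and fragment-generation idea in the abstract above can be illustrated with a minimal sketch. This is not the CFPA implementation: the function names, the `X` wildcard, the mismatch budget used to model consensus "variants", and the example sequence are all assumptions made for illustration. The scan compares the consensus against every window of one protein, so it costs O(mn) per protein and O(Nmn) over N proteins.

```python
from typing import List


def find_sites(protein: str, consensus: str, max_mismatches: int = 0) -> List[int]:
    """Return start indices where `consensus` matches `protein`.

    Hypothetical helper: 'X' in the consensus matches any residue, and up to
    `max_mismatches` substitutions are tolerated to model consensus variants.
    """
    m, n = len(consensus), len(protein)
    hits = []
    for start in range(n - m + 1):
        mismatches = 0
        for c, p in zip(consensus, protein[start:start + m]):
            if c != "X" and c != p:
                mismatches += 1
                if mismatches > max_mismatches:
                    break
        else:
            # The inner loop finished without exceeding the mismatch budget.
            hits.append(start)
    return hits


def cleave(protein: str, consensus: str, cut_offset: int,
           max_mismatches: int = 0) -> List[str]:
    """Split `protein` after `cut_offset` residues of every matched site."""
    cuts = [s + cut_offset for s in find_sites(protein, consensus, max_mismatches)]
    fragments, prev = [], 0
    for cut in cuts:
        fragments.append(protein[prev:cut])
        prev = cut
    fragments.append(protein[prev:])
    return fragments


# Illustrative input only: a made-up sequence with one exact DEVD site
# and one single-substitution variant (DEAD).
example_protein = "MKDEVDGSAADEADLL"
exact_fragments = cleave(example_protein, "DEVD", cut_offset=4)
variant_fragments = cleave(example_protein, "DEVD", cut_offset=4, max_mismatches=1)
```

    Running `cleave` over a list of proteins in a loop gives the N-fold repetition of the per-protein scan; a production tool would presumably use alignment rather than this plain window comparison.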

  9. Genovar: a detection and visualization tool for genomic variants.

    PubMed

    Jung, Kwang Su; Moon, Sanghoon; Kim, Young Jin; Kim, Bong-Jo; Park, Kiejung

    2012-05-08

    Along with single nucleotide polymorphisms (SNPs), copy number variation (CNV) is considered an important source of genetic variation associated with disease susceptibility. Despite the importance of CNV, the tools currently available for its analysis often produce false positive results due to limitations such as low resolution of array platforms, platform specificity, and the type of CNV. To resolve this problem, spurious signals must be separated from true signals by visual inspection. None of the previously reported CNV analysis tools support this function and the simultaneous visualization of comparative genomic hybridization arrays (aCGH) and sequence alignment. The purpose of the present study was to develop a useful program for the efficient detection and visualization of CNV regions that enables the manual exclusion of erroneous signals. A JAVA-based stand-alone program called Genovar was developed. To ascertain whether a detected CNV region is a novel variant, Genovar compares the detected CNV regions with previously reported CNV regions using the Database of Genomic Variants (DGV, http://projects.tcag.ca/variation) and the Single Nucleotide Polymorphism Database (dbSNP). The current version of Genovar is capable of visualizing genomic data from sources such as the aCGH data file and sequence alignment format files. Genovar is freely accessible and provides a user-friendly graphic user interface (GUI) to facilitate the detection of CNV regions. The program also provides comprehensive information to help in the elimination of spurious signals by visual inspection, making Genovar a valuable tool for reducing false positive CNV results. http://genovar.sourceforge.net/.
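
    The comparison of detected CNV regions against previously reported regions can be sketched as an interval-overlap check. This is a minimal illustration, not Genovar's actual logic: the `(chromosome, start, end)` tuple layout, the "any overlap" criterion, and the coordinates are assumptions.

```python
from typing import List, Tuple

Region = Tuple[str, int, int]  # (chromosome, start, end), inclusive coordinates


def overlaps(a: Region, b: Region) -> bool:
    """True if two regions lie on the same chromosome and share at least one base."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def novel_regions(detected: List[Region], known: List[Region]) -> List[Region]:
    """Keep detected regions that overlap no previously reported region."""
    return [d for d in detected if not any(overlaps(d, k) for k in known)]


# Hypothetical coordinates, for illustration only.
detected_cnvs = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
known_cnvs = [("chr1", 150, 300), ("chr2", 300, 400)]
candidates = novel_regions(detected_cnvs, known_cnvs)
```

    A stricter rule, such as requiring a minimum reciprocal overlap before a region counts as already reported, would slot into `overlaps` without changing the rest.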

  10. Evolutionary conservation analysis increases the colocalization of predicted exonic splicing enhancers in the BRCA1 gene with missense sequence changes and in-frame deletions, but not polymorphisms

    PubMed Central

    Pettigrew, Christopher; Wayte, Nicola; Lovelock, Paul K; Tavtigian, Sean V; Chenevix-Trench, Georgia; Spurdle, Amanda B; Brown, Melissa A

    2005-01-01

    Introduction Aberrant pre-mRNA splicing can be more detrimental to the function of a gene than changes in the length or nature of the encoded amino acid sequence. Although predicting the effects of changes in consensus 5' and 3' splice sites near intron:exon boundaries is relatively straightforward, predicting the possible effects of changes in exonic splicing enhancers (ESEs) remains a challenge. Methods As an initial step toward determining which ESEs predicted by the web-based tool ESEfinder in the breast cancer susceptibility gene BRCA1 are likely to be functional, we have determined their evolutionary conservation and compared their location with known BRCA1 sequence variants. Results Using the default settings of ESEfinder, we initially detected 669 potential ESEs in the coding region of the BRCA1 gene. Increasing the threshold score reduced the total number to 464, while taking into consideration the proximity to splice donor and acceptor sites reduced the number to 211. Approximately 11% of these ESEs (23/211) either are identical at the nucleotide level in human, primates, mouse, cow, dog and opossum Brca1 (conserved) or are detectable by ESEfinder in the same position in the Brca1 sequence (shared). The frequency of conserved and shared predicted ESEs between human and mouse is higher in BRCA1 exons (2.8 per 100 nucleotides) than in introns (0.6 per 100 nucleotides). Of conserved or shared putative ESEs, 61% (14/23) were predicted to be affected by sequence variants reported in the Breast Cancer Information Core database. Applying the filters described above increased the colocalization of predicted ESEs with missense changes, in-frame deletions and unclassified variants predicted to be deleterious to protein function, whereas they decreased the colocalization with known polymorphisms or unclassified variants predicted to be neutral. Conclusion In this report we show that evolutionary conservation analysis may be used to improve the specificity of an ESE prediction tool. This is the first report on the prediction of the frequency and distribution of ESEs in the BRCA1 gene, and it is the first reported attempt to predict which ESEs are most likely to be functional and therefore which sequence variants in ESEs are most likely to be pathogenic. PMID:16280041
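
    The filtering counts quoted in this abstract can be checked with a few lines of arithmetic. The sketch below only re-derives the percentages from the counts stated above; the step labels and the `pct` helper are illustrative names, not part of ESEfinder.

```python
def pct(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


# Counts taken from the abstract; labels are shorthand for each filter.
ese_counts = [
    ("default settings", 669),
    ("raised threshold score", 464),
    ("splice-site proximity", 211),
]

conserved_or_shared = pct(23, 211)   # reported as "approximately 11%"
affected_by_variants = pct(14, 23)   # reported as 61%
retained_after_threshold = pct(ese_counts[1][1], ese_counts[0][1])
retained_after_proximity = pct(ese_counts[2][1], ese_counts[1][1])
```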

  11. Fast single-pass alignment and variant calling using sequencing data

    USDA-ARS's Scientific Manuscript database

    Sequencing research requires efficient computation. Few programs use already known information about DNA variants when aligning sequence data to the reference map. New program findmap.f90 reads the previous variant list before aligning sequence, calling variant alleles, and summing the allele counts...

  12. Variants in the PRPF8 Gene are Associated with Glaucoma.

    PubMed

    Micheal, Shazia; Hogewind, Barend F; Khan, Muhammad Imran; Siddiqui, Sorath Noorani; Zafar, Saemah Nuzhat; Akhtar, Farah; Qamar, Raheel; Hoyng, Carel B; den Hollander, Anneke I

    2018-05-01

    Glaucoma is a leading cause of irreversible blindness worldwide. Mutations in six genes have been associated with juvenile- and adult-onset familial primary open angle glaucoma (POAG) prior to this report, but they explain only a small proportion of the genetic load. The aim of the study is to identify novel genetic causes of POAG in families with adult-onset glaucoma. Whole exome sequencing (WES) was performed on DNA of two affected individuals, and predicted pathogenic variants were evaluated for segregation in four affected and three unaffected Dutch family members by Sanger sequencing. We identified a pathogenic variant (p.Val956Gly) in the PRPF8 gene, which segregates with the disease in a Dutch family. Targeted Sanger sequencing of PRPF8 in a panel of 40 POAG families (18 Pakistani and 22 Dutch) revealed two additional nonsynonymous variants (p.Pro13Leu and p.Met25Thr), which segregate with the disease in two other Pakistani families. Both variants were then analyzed in a case-control cohort consisting of 320 Pakistani POAG cases and 250 matched controls. The p.Pro13Leu and p.Met25Thr variants were identified in 14 and 20 cases, respectively, while they were not detected in controls (p values 0.0004 and 0.0001, respectively). Previously, PRPF8 mutations have been associated with autosomal dominant retinitis pigmentosa (RP). The PRPF8 variants associated with POAG are located at the N-terminus, while all RP-associated mutations cluster at the C-terminus, dictating a clear genotype-phenotype correlation.
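
    A carrier-count comparison of this kind (variant seen in some of 320 cases and in none of 250 controls) is commonly assessed with a 2x2 exact test. The abstract does not state which test produced its p values, so the two-sided Fisher exact test below is an assumed choice shown purely as a sketch; the function name and argument layout are illustrative.

```python
from math import comb


def fisher_two_sided(carriers_cases: int, n_cases: int,
                     carriers_controls: int, n_controls: int) -> float:
    """Two-sided Fisher exact p value for a 2x2 carrier/non-carrier table.

    Sums the hypergeometric probabilities of all tables (with fixed margins)
    that are no more likely than the observed one.
    """
    total = n_cases + n_controls
    carriers = carriers_cases + carriers_controls

    def weight(x: int) -> int:
        # Unnormalized probability that x of the carriers are cases.
        return comb(n_cases, x) * comb(n_controls, carriers - x)

    lowest = max(0, carriers - n_controls)
    highest = min(carriers, n_cases)
    observed = weight(carriers_cases)
    # Exact integer comparison avoids floating-point ties.
    extreme = sum(weight(x) for x in range(lowest, highest + 1)
                  if weight(x) <= observed)
    return extreme / comb(total, carriers)


# Counts from the abstract: 14 and 20 carriers among 320 cases, 0 among 250 controls.
p_pro13leu = fisher_two_sided(14, 320, 0, 250)
p_met25thr = fisher_two_sided(20, 320, 0, 250)
```

    Integer binomial coefficients keep the tail comparison exact, with a single floating-point division at the end.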

  13. Human papillomavirus variants among Inuit women in northern Quebec, Canada.

    PubMed

    Gauthier, Barbara; Coutlée, Francois; Franco, Eduardo L; Brassard, Paul

    2015-01-01

    Inuit communities in northern Quebec have high rates of human papillomavirus (HPV) infection, cervical cancer and cervical cancer-related mortality as compared to the Canadian population. HPV types can be further classified as intratypic variants based on the extent of homology in their nucleotide sequences. There is limited information on the distribution of intratypic variants in circumpolar areas. Our goal was to describe the HPV intratypic variants and associated baseline characteristics. We collected cervical cell samples in 2002-2006 from 676 Inuit women between the ages of 15 and 69 years in Nunavik. DNA isolates from high-risk HPVs were sequenced to determine the intratypic variant. There were 149 women that were positive for HPVs 16, 18, 31, 33, 35, 45, 52, 56 or 58 during follow-up. There were 5 different HPV16 variants, all of European lineage, among the 57 women positive for this type. There were 8 different variants of HPV18 present and all were of European lineage (n=21). The majority of samples of HPV31 (n=52) were of lineage B. The number of isolates and diversity of the other HPV types was low. Age was the only covariate associated with HPV16 variant category. These frequencies are similar to what was seen in another circumpolar region of Canada, although there appears to be less diversity as only European variants were detected. This study shows that most variants were clustered in one lineage for each HPV type.

  14. Epilepsy-related sudden unexpected death: targeted molecular analysis of inherited heart disease genes using next-generation DNA sequencing.

    PubMed

    Hata, Yukiko; Yoshida, Koji; Kinoshita, Koshi; Nishida, Naoki

    2017-05-01

    Inherited heart disease causing electric instability in the heart has been suggested to be a risk factor for sudden unexpected death in epilepsy (SUDEP). The purpose of this study was to reveal the correlation between epilepsy-related sudden unexpected death (SUD) and inherited heart disease. Twelve epilepsy-related SUD cases (seven males and five females, aged 11-78 years) were examined. Nine cases fulfilled the criteria of SUDEP, and three cases died by drowning. In addition to examining three major epilepsy-related genes, we used next-generation sequencing (NGS) to examine 73 inherited heart disease-related genes. We detected both known pathogenic variants and rare variants with minor allele frequencies of <0.5%. The pathogenicity of these variants was evaluated and graded by eight in silico predictive algorithms. Six known and six potential rare variants were detected. Among these, three known variants of LDB3, DSC2 and KCNE1 and three potential rare variants of MYH6, DSP and DSG2 were predicted by in silico analysis as possibly highly pathogenic in three of the nine SUDEP cases. Two of three cases with desmosome-related variants showed mild but possibly significant right ventricular dysplasia-like pathology. A case with LDB3 and MYH6 variants showed hypertrabeculation of the left ventricle and severe fibrosis of the cardiac conduction system. In the three drowning death cases, one case with mild prolonged QT interval had two variants in ANK2. This study shows that inherited heart disease may be a significant risk factor for SUD in some epilepsy cases, even if pathological findings of the heart had not progressed to an advanced stage of the disease. A combination of detailed pathological examination of the heart and gene analysis using NGS may be useful for evaluating arrhythmogenic potential of epilepsy-related SUD. © 2016 International Society of Neuropathology.

  15. Identifying Likely Transmission Pathways within a 10-Year Community Outbreak of Tuberculosis by High-Depth Whole Genome Sequencing

    PubMed Central

    Sadsad, Rosemarie; Martinez, Elena; Jelfs, Peter; Hill-Cawthorne, Grant A.; Gilbert, Gwendolyn L.; Marais, Ben J.; Sintchenko, Vitali

    2016-01-01

    Background Improved tuberculosis control and the need to contain the spread of drug-resistant strains provide a strong rationale for exploring tuberculosis transmission dynamics at the population level. Whole-genome sequencing provides optimal strain resolution, facilitating detailed mapping of potential transmission pathways. Methods We sequenced 22 isolates from a Mycobacterium tuberculosis cluster in New South Wales, Australia, identified during routine 24-locus mycobacterial interspersed repetitive unit typing. Following high-depth paired-end sequencing using the Illumina HiSeq 2000 platform, two independent pipelines were employed for analysis, both employing read mapping onto reference genomes as well as de novo assembly, to control biases in variant detection. In addition to single-nucleotide polymorphisms, the analyses also sought to identify insertions, deletions and structural variants. Results Isolates were highly similar, with a distance of 13 variants between the most distant members of the cluster. The most sensitive analysis classified the 22 isolates into 18 groups. Four of the isolates did not appear to share a recent common ancestor with the largest clade; another four isolates had an uncertain ancestral relationship with the largest clade. Conclusion Whole genome sequencing, with analysis of single-nucleotide polymorphisms, insertions, deletions, structural variants and subpopulations, enabled the highest possible level of discrimination between cluster members, clarifying likely transmission pathways and exposing the complexity of strain origin. The analysis provides a basis for targeted public health intervention and enhanced classification of future isolates linked to the cluster. PMID:26938641
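
    The notions of "distance in variants" and grouping of isolates used in this abstract can be sketched with set operations. This is a simplified illustration, not either of the study's pipelines: each isolate is assumed to be represented by the set of variants called against a common reference, and the isolate names and variant positions are made up.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Variant = Tuple[int, str]  # (reference position, alternate allele)


def variant_distance(a: Set[Variant], b: Set[Variant]) -> int:
    """Number of variants present in exactly one of the two isolates."""
    return len(a ^ b)


def max_pairwise_distance(isolates: Dict[str, Set[Variant]]) -> int:
    """Largest distance between any two isolates in the cluster."""
    return max(variant_distance(isolates[x], isolates[y])
               for x, y in combinations(isolates, 2))


def group_identical(isolates: Dict[str, Set[Variant]]) -> List[List[str]]:
    """Group isolates that carry exactly the same variant set."""
    groups: Dict[FrozenSet[Variant], List[str]] = {}
    for name, variants in isolates.items():
        groups.setdefault(frozenset(variants), []).append(name)
    return list(groups.values())


# Hypothetical isolates, for illustration only.
example_isolates = {
    "A": {(100, "T")},
    "B": {(100, "T")},
    "C": {(100, "T"), (2500, "G")},
    "D": {(7000, "C")},
}
cluster_span = max_pairwise_distance(example_isolates)
cluster_groups = group_identical(example_isolates)
```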

  16. Screening of whole genome sequences identified high-impact variants for stallion fertility.

    PubMed

    Schrimpf, Rahel; Gottschalk, Maren; Metzger, Julia; Martinsson, Gunilla; Sieme, Harald; Distl, Ottmar

    2016-04-14

    Stallion fertility is an economically important trait due to the increase of artificial insemination in horses. The availability of whole genome sequence data facilitates identification of rare high-impact variants contributing to stallion fertility. The aim of our study was to genotype rare high-impact variants retrieved from next-generation sequencing (NGS)-data of 11 horses in order to unravel harmful genetic variants in large samples of stallions. Gene ontology (GO) terms and search results from public databases were used to obtain a comprehensive list of human and mouse genes predicted to participate in the regulation of male reproduction. The corresponding equine orthologous genes were searched in whole genome sequence data of seven stallions and four mares and filtered for high-impact genetic variants using SnpEff, SIFT and Polyphen 2 software. All genetic variants with the missing homozygous mutant genotype were genotyped on 337 fertile stallions of 19 breeds using KASP genotyping assays or PCR-RFLP. Mixed linear model analysis was employed for an association analysis with de-regressed estimated breeding values of the paternal component of the pregnancy rate per estrus (EBV-PAT). We screened next generation sequenced data of whole genomes from 11 horses for equine genetic variants in 1194 human and mouse genes involved in male fertility and linked through common gene ontology (GO) with male reproductive processes. Variants were filtered for high-impact on protein structure and validated through SIFT and Polyphen 2. Only those genetic variants were followed up when the homozygote mutant genotype was missing in the detection sample comprising 11 horses. After this filtering process, 17 single nucleotide polymorphisms (SNPs) were left. These SNPs were genotyped in 337 fertile stallions of 19 breeds using KASP genotyping assays or PCR-RFLP. An association analysis in 216 Hanoverian stallions revealed a significant association of the splice-site disruption variant g.37455302G>A in NOTCH1 with the de-regressed estimated breeding values of the paternal component of the pregnancy rate per estrus (EBV-PAT). For 9 high-impact variants within the genes CFTR, OVGP1, FBXO43, TSSK6, PKD1, FOXP1, TCP11, SPATA31E1 and NOTCH1 (g.37453246G>C) absence of the homozygous mutant genotype in the validation sample of all 337 fertile stallions was obvious. Therefore, these variants were considered as potentially deleterious factors for stallion fertility. In conclusion, this study revealed 17 genetic variants with a predicted high damaging effect on protein structure and missing homozygous mutant genotype. The g.37455302G>A NOTCH1 variant was identified as a significant stallion fertility locus in Hanoverian stallions and a further 9 candidate fertility loci with missing homozygous mutant genotypes were validated in a panel including 19 horse breeds. To our knowledge this is the first study in horses using next generation sequencing data to uncover strong candidate factors for stallion fertility.
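
    The "missing homozygous mutant genotype" criterion can be illustrated with a small filter. This is a sketch under stated assumptions, not the authors' pipeline: genotypes are assumed to be two-letter strings of alleles, and the variant identifiers and genotype calls are invented for the example.

```python
from typing import Dict, List, Tuple


def lacks_homozygous_mutant(genotypes: List[str], mutant_allele: str) -> bool:
    """True if no animal in the sample is homozygous for the mutant allele."""
    return all(g != mutant_allele * 2 for g in genotypes)


def follow_up_candidates(
    variants: Dict[str, Tuple[str, List[str]]]
) -> List[str]:
    """Return IDs of variants for which the homozygous mutant genotype is absent.

    `variants` maps a variant ID to (mutant allele, genotype calls per animal).
    """
    return [vid for vid, (mutant, calls) in variants.items()
            if lacks_homozygous_mutant(calls, mutant)]


# Hypothetical genotype calls for three variants in a small detection sample.
example_variants = {
    "var1": ("A", ["GG", "GA", "GG", "AG"]),   # mutant seen only in heterozygotes
    "var2": ("C", ["GG", "CC", "GC", "GG"]),   # homozygous mutant present
    "var3": ("T", ["CC", "CC", "CC", "CC"]),   # mutant allele not observed
}
kept = follow_up_candidates(example_variants)
```

    In the study's design this criterion is applied after the high-impact prediction step; a real filter would probably also require that the mutant allele is observed at least once, which would drop `var3` here.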

  17. A pooling-based approach to mapping genetic variants associated with DNA methylation

    PubMed Central

    Kaplow, Irene M.; MacIsaac, Julia L.; Mah, Sarah M.; McEwen, Lisa M.; Kobor, Michael S.; Fraser, Hunter B.

    2015-01-01

    DNA methylation is an epigenetic modification that plays a key role in gene regulation. Previous studies have investigated its genetic basis by mapping genetic variants that are associated with DNA methylation at specific sites, but these have been limited to microarrays that cover <2% of the genome and cannot account for allele-specific methylation (ASM). Other studies have performed whole-genome bisulfite sequencing on a few individuals, but these lack statistical power to identify variants associated with DNA methylation. We present a novel approach in which bisulfite-treated DNA from many individuals is sequenced together in a single pool, resulting in a truly genome-wide map of DNA methylation. Compared to methods that do not account for ASM, our approach increases statistical power to detect associations while sharply reducing cost, effort, and experimental variability. As a proof of concept, we generated deep sequencing data from a pool of 60 human cell lines; we evaluated almost twice as many CpGs as the largest microarray studies and identified more than 2000 genetic variants associated with DNA methylation. We found that these variants are highly enriched for associations with chromatin accessibility and CTCF binding but are less likely to be associated with traits indirectly linked to DNA, such as gene expression and disease phenotypes. In summary, our approach allows genome-wide mapping of genetic variants associated with DNA methylation in any tissue of any species, without the need for individual-level genotype or methylation data. PMID:25910490

  18. Longitudinal studies on maternal HIV-1 variants by biological phenotyping, sequence analysis and viral load.

    PubMed

    Renta, J Y; Cadilla, C L; Vega, M E; Hillyer, G V; Estrada, C; Jiménez, E; Abreu, E; Méndez, I; Gandía, J; Meléndez-Guerrero, L M

    1997-11-01

    In this study, the HIV-1 variant viruses from ten pregnant women and their infants were isolated and characterized longitudinally in order to determine the role that viral envelope (gp120-V3 loop) gene variation and viral tropism play in vertical transmission. Biological phenotyping of each HIV variant was accomplished by growth in MT-2, and macrophages from healthy and non-HIV-infected donors. Genetic characterization of the variants was accomplished by DNA sequence analysis. All the women enrolled in this study received ZDV therapy. Virus was cultured from eight out of ten env V3-PCR positive mothers. HIV-1 isolates were all non-syncytium inducing variants. None of the mothers were found to transmit HIV, as determined by DNA PCR and quantitative co-cultures on their infants which were seronegative for HIV-1 through one year after birth. Viral cultures from infant blood samples were negative and infants were all healthy. However, nested env V3-PCR detected proviral DNA in five out of ten infants. In contrast, conventional gag-PCR was negative in the same five infants. Sequences of the five maternal-infant pairs were different, suggesting unique infant HIV-1 variants. The three highest maternal viral load values corresponded to infants that were env V3-PCR positive. These results suggest that HIV-1 particles are transmitted from ZDV-treated mothers to infants. Infant follow-up is recommended to determine if HIV-1 has been inhibited by the immune system of the infants.

  19. A pooling-based approach to mapping genetic variants associated with DNA methylation

    DOE PAGES

    Kaplow, Irene M.; MacIsaac, Julia L.; Mah, Sarah M.; ...

    2015-04-24

    DNA methylation is an epigenetic modification that plays a key role in gene regulation. Previous studies have investigated its genetic basis by mapping genetic variants that are associated with DNA methylation at specific sites, but these have been limited to microarrays that cover <2% of the genome and cannot account for allele-specific methylation (ASM). Other studies have performed whole-genome bisulfite sequencing on a few individuals, but these lack statistical power to identify variants associated with DNA methylation. We present a novel approach in which bisulfite-treated DNA from many individuals is sequenced together in a single pool, resulting in a truly genome-wide map of DNA methylation. Compared to methods that do not account for ASM, our approach increases statistical power to detect associations while sharply reducing cost, effort, and experimental variability. As a proof of concept, we generated deep sequencing data from a pool of 60 human cell lines; we evaluated almost twice as many CpGs as the largest microarray studies and identified more than 2000 genetic variants associated with DNA methylation. Here we found that these variants are highly enriched for associations with chromatin accessibility and CTCF binding but are less likely to be associated with traits indirectly linked to DNA, such as gene expression and disease phenotypes. In summary, our approach allows genome-wide mapping of genetic variants associated with DNA methylation in any tissue of any species, without the need for individual-level genotype or methylation data.

  20. Pre-capture multiplexing improves efficiency and cost-effectiveness of targeted genomic enrichment.

    PubMed

    Shearer, A Eliot; Hildebrand, Michael S; Ravi, Harini; Joshi, Swati; Guiffre, Angelica C; Novak, Barbara; Happe, Scott; LeProust, Emily M; Smith, Richard J H

    2012-11-14

    Targeted genomic enrichment (TGE) is a widely used method for isolating and enriching specific genomic regions prior to massively parallel sequencing. To make effective use of sequencer output, barcoding and sample pooling (multiplexing) after TGE and prior to sequencing (post-capture multiplexing) has become routine. While previous reports have indicated that multiplexing prior to capture (pre-capture multiplexing) is feasible, no thorough examination of the effect of this method has been completed on a large number of samples. Here we compare standard post-capture TGE to two levels of pre-capture multiplexing: 12 or 16 samples per pool. We evaluated these methods using standard TGE metrics and determined the ability to identify several classes of genetic mutations in three sets of 96 samples, including 48 controls. Our overall goal was to maximize cost reduction and minimize experimental time while maintaining a high percentage of reads on target and a high depth of coverage at thresholds required for variant detection. We adapted the standard post-capture TGE method for pre-capture TGE with several protocol modifications, including redesign of blocking oligonucleotides and optimization of enzymatic and amplification steps. Pre-capture multiplexing reduced costs for TGE by at least 38% and significantly reduced hands-on time during the TGE protocol. We found that pre-capture multiplexing reduced capture efficiency by 23 or 31% for pre-capture pools of 12 and 16, respectively. However efficiency losses at this step can be compensated by reducing the number of simultaneously sequenced samples. Pre-capture multiplexing and post-capture TGE performed similarly with respect to variant detection of positive control mutations. In addition, we detected no instances of sample switching due to aberrant barcode identification. 
    Pre-capture multiplexing improves efficiency of TGE experiments with respect to hands-on time and reagent use compared to standard post-capture TGE. A decrease in capture efficiency is observed when using pre-capture multiplexing; however, it does not negatively impact variant detection and can be accommodated by the experimental design.

  1. Artificial selection increased body weight but induced increase of runs of homozygosity in Hanwoo cattle

    PubMed Central

    Kim, Kwondo; Jung, Jaehoon; Caetano-Anollés, Kelsey; Sung, Samsun; Yoo, DongAhn; Choi, Bong-Hwan; Kim, Hyung-Chul; Jeong, Jin-Young; Cho, Yong-Min; Park, Eung-Woo; Choi, Tae-Jeong; Park, Byoungho; Lim, Dajeong

    2018-01-01

    Artificial selection has been demonstrated to have a rapid and significant effect on the phenotype and genome of an organism. However, most previous studies on artificial selection have focused solely on genomic sequences modified by artificial selection or genomic sequences associated with a specific trait. In this study, we generated whole genome sequencing data of 126 cattle under artificial selection and identified 24,973,862 single nucleotide variants to investigate the relationship among artificial selection, genomic sequences, and traits. Using runs of homozygosity detected from the variants, we showed an increase in inbreeding over decades and, at the same time, demonstrated a small influence of recent inbreeding on body weight. We also identified a ~0.2 Mb run-of-homozygosity segment that may have been created by recent artificial selection. This approach may aid in the development of genetic markers directly influenced by artificial selection, and provide insight into the process of artificial selection. PMID:29561881
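
    The abstract above relies on runs of homozygosity (ROH) detected from variant calls but does not say how they were called. As a rough illustration only, the sketch below scans one individual's genotypes (coded as alternate-allele counts 0, 1, 2) and reports stretches of consecutive homozygous calls. The function name, the thresholds, and the rule that any heterozygous call ends a run are assumptions made for this example, not the authors' method; production tools usually tolerate occasional heterozygous or missing calls.

```python
def runs_of_homozygosity(positions, genotypes, min_snps=3, min_length_bp=0):
    """Return (start_bp, end_bp, n_snps) for each run of homozygous calls.

    positions: sorted base-pair coordinates of the variants (hypothetical input).
    genotypes: alternate-allele counts per variant (0 or 2 = homozygous, 1 = het).
    """
    runs, start = [], None
    # A trailing heterozygous sentinel closes any run still open at the end.
    for i, g in enumerate(list(genotypes) + [1]):
        if g != 1:
            if start is None:
                start = i  # first homozygous call of a new run
            continue
        if start is not None:
            n_snps = i - start
            length = positions[i - 1] - positions[start]
            if n_snps >= min_snps and length >= min_length_bp:
                runs.append((positions[start], positions[i - 1], n_snps))
            start = None
    return runs
```

    With variants every 100 bp and genotypes 0, 0, 1, 2, 2, 0, 2, 1, only the four-variant stretch between the two heterozygous calls passes the default `min_snps=3` filter.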

  2. CRAWview: for viewing splicing variation, gene families, and polymorphism in clusters of ESTs and full-length sequences.

    PubMed

    Chou, A; Burke, J

    1999-05-01

    DNA sequence clustering has become a valuable method in support of gene discovery and gene expression analysis. Our interest lies in leveraging the sequence diversity within clusters of expressed sequence tags (ESTs) to model gene structure for the study of gene variants that arise from, among other things, alternative mRNA splicing, polymorphism, and divergence after gene duplication, fusion, and translocation events. In previous work, CRAW was developed to discover gene variants from assembled clusters of ESTs. Most importantly, novel gene features (the differing units between gene variants, for example alternative exons, polymorphisms, transposable elements, etc.) that are specialized to tissue, disease, population, or developmental states can be identified when these tools collate DNA source information with gene variant discrimination. While the goal is complete automation of novel feature and gene variant detection, current methods are far from perfect, and hence the development of effective tools for visualization and exploratory data analysis is of paramount importance in the process of sifting through candidate genes and validating targets. We present CRAWview, a Java-based visualization extension to CRAW. Features that vary between gene forms are displayed using an automatically generated color-coded index. The reporting format of CRAWview gives a brief, high-level summary report to display overlap and divergence within clusters of sequences as well as the ability to 'drill down' and see detailed information concerning regions of interest. Additionally, the alignment viewing and editing capabilities of CRAWview make it possible to interactively correct frame-shifts and otherwise edit cluster assemblies. We have implemented CRAWview as a Java application across Windows NT/95 and UNIX platforms. A beta version of CRAWview will be freely available to academic users from Pangea Systems (http://www.pangeasystems.com).

  3. ViVaMBC: estimating viral sequence variation in complex populations from illumina deep-sequencing data using model-based clustering.

    PubMed

    Verbist, Bie; Clement, Lieven; Reumers, Joke; Thys, Kim; Vapirev, Alexander; Talloen, Willem; Wetzels, Yves; Meys, Joris; Aerssens, Jeroen; Bijnens, Luc; Thas, Olivier

    2015-02-22

    Deep-sequencing allows for an in-depth characterization of sequence variation in complex populations. However, technology associated errors may impede a powerful assessment of low-frequency mutations. Fortunately, base calls are complemented with quality scores which are derived from a quadruplet of intensities, one channel for each nucleotide type for Illumina sequencing. The highest intensity of the four channels determines the base that is called. Mismatch bases can often be corrected by the second best base, i.e. the base with the second highest intensity in the quadruplet. A virus variant model-based clustering method, ViVaMBC, is presented that explores quality scores and second best base calls for identifying and quantifying viral variants. ViVaMBC is optimized to call variants at the codon level (nucleotide triplets) which enables immediate biological interpretation of the variants with respect to their antiviral drug responses. Using mixtures of HCV plasmids we show that our method accurately estimates frequencies down to 0.5%. The estimates are unbiased when average coverages of 25,000 are reached. A comparison with the SNP-callers V-Phaser2, ShoRAH, and LoFreq shows that ViVaMBC has a superb sensitivity and specificity for variants with frequencies above 0.4%. Unlike the competitors, ViVaMBC reports a higher number of false-positive findings with frequencies below 0.4% which might partially originate from picking up artificial variants introduced by errors in the sample and library preparation step. ViVaMBC is the first method to call viral variants directly at the codon level. The strength of the approach lies in modeling the error probabilities based on the quality scores. Although the use of second best base calls appeared very promising in our data exploration phase, their utility was limited. They provided a slight increase in sensitivity, which however does not warrant the additional computational cost of running the offline base caller. 
    Apparently a lot of information is already contained in the quality scores, enabling the model-based clustering procedure to adjust for the majority of the sequencing errors. Overall, the sensitivity of ViVaMBC is such that technical constraints like PCR errors start to form the bottleneck for low-frequency variant detection.
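
    The abstract above states that error probabilities are modeled from base quality scores. How ViVaMBC does this internally is not described here; the sketch below only shows the standard Phred relation, Q = -10 log10(p), that such models start from. The function names are illustrative, and the Phred+33 ASCII encoding is an assumption about the input FASTQ files.

```python
def phred_to_error_prob(q):
    """Standard Phred relation: Q = -10 * log10(p), so p = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def decode_phred33(quality_string):
    """Decode a FASTQ quality string, assuming Phred+33 ASCII encoding."""
    return [ord(ch) - 33 for ch in quality_string]


def expected_errors(quality_string):
    """Sum of per-base error probabilities across one read."""
    return sum(phred_to_error_prob(q) for q in decode_phred33(quality_string))
```

    Under this relation a base with Q20 has a 1% chance of being wrong and a base with Q30 a 0.1% chance, which is why variants at frequencies near 0.5% are hard to separate from sequencing error without modeling the scores.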

  4. Precise detection of de novo single nucleotide variants in human genomes.

    PubMed

    Gómez-Romero, Laura; Palacios-Flores, Kim; Reyes, José; García, Delfino; Boege, Margareta; Dávila, Guillermo; Flores, Margarita; Schatz, Michael C; Palacios, Rafael

    2018-05-22

    The precise determination of de novo genetic variants has enormous implications across different fields of biology and medicine, particularly personalized medicine. Currently, de novo variations are identified by mapping sample reads from a parent-offspring trio to a reference genome, allowing for a certain degree of differences. While widely used, this approach often introduces false-positive (FP) results due to misaligned reads and mischaracterized sequencing errors. In a previous study, we developed an alternative approach to accurately identify single nucleotide variants (SNVs) using only perfect matches. However, this approach could be applied only to haploid regions of the genome and was computationally intensive. In this study, we present a unique approach, coverage-based single nucleotide variant identification (COBASI), which allows the exploration of the entire genome using second-generation short sequence reads without extensive computing requirements. COBASI identifies SNVs using changes in coverage of exactly matching unique substrings, and is particularly suited for pinpointing de novo SNVs. Unlike other approaches that require population frequencies across hundreds of samples to filter out any methodological biases, COBASI can be applied to detect de novo SNVs within isolated families. We demonstrate this capability through extensive simulation studies and by studying a parent-offspring trio we sequenced using short reads. Experimental validation of all 58 candidate de novo SNVs and a selection of non-de novo SNVs found in the trio confirmed zero FP calls. COBASI is available as open source at https://github.com/Laura-Gomez/COBASI for any researcher to use. Copyright © 2018 the Author(s). Published by PNAS.
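
    The abstract above describes identifying SNVs from changes in the coverage of exactly matching unique substrings. The toy sketch below illustrates that idea only and is not the COBASI implementation: it counts exact k-mers in the reads, looks up the count of every reference k-mer that is unique within the reference, and reports runs of positions with zero coverage. All function names and the choice of k are assumptions for this example. An isolated homozygous SNV at reference position p removes the k reference k-mers that overlap it, so the expected signature is a zero run of length k ending at p.

```python
from collections import Counter


def kmer_counts(sequences, k):
    """Count every exact k-mer across a collection of sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def coverage_profile(reference, reads, k):
    """Read coverage of each reference k-mer; None where the k-mer is not unique."""
    ref_counts = kmer_counts([reference], k)
    read_counts = kmer_counts(reads, k)
    profile = []
    for i in range(len(reference) - k + 1):
        kmer = reference[i:i + k]
        # Counter returns 0 for k-mers never seen in the reads.
        profile.append(read_counts[kmer] if ref_counts[kmer] == 1 else None)
    return profile


def zero_coverage_runs(profile):
    """Return (start, end) index pairs of consecutive zero-coverage positions."""
    runs, start = [], None
    for i, cov in enumerate(profile):
        if cov == 0:
            if start is None:
                start = i
        elif start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(profile) - 1))
    return runs
```

    This sketch ignores sequencing errors, heterozygous sites (where coverage drops rather than vanishes), and reverse-complement reads, all of which a real caller has to handle.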

  5. Discovery of variant infectious salmon anaemia virus (ISAV) of European genotype in British Columbia, Canada.

    PubMed

    Kibenge, Molly Jt; Iwamoto, Tokinori; Wang, Yingwei; Morton, Alexandra; Routledge, Richard; Kibenge, Frederick Sb

    2016-01-06

    Infectious salmon anaemia (ISA) virus (ISAV) belongs to the genus Isavirus, family Orthomyxoviridae. ISAV occurs in two basic genotypes, North American and European. The European genotype is more widespread and shows greater genetic variation and greater virulence variation than the North American genotype. To date, all of the ISAV isolates from the clinical disease, ISA, have had deletions in the highly polymorphic region (HPR) on ISAV segment 6 (ISAV-HPRΔ) relative to ISAV-HPR0, named numerically from ISAV-HPR1 to over ISAV-HPR30. ISA outbreaks have only been reported in farmed Atlantic salmon, although ISAV has been detected by RT-PCR in wild fish. It is recognized that asymptomatically ISAV-infected fish exist. There is no universally accepted ISAV RT-qPCR TaqMan® assay. Most diagnostic laboratories use the primer-probe set targeting a 104 bp-fragment on ISAV segment 8. Some laboratories and researchers have found a primer-probe set targeting ISAV segment 7 to be more sensitive. Other researchers have published different ISAV segment 8 primer-probe sets that are highly sensitive. In this study, we tested 1,106 fish tissue samples collected from (i) market-bought farmed salmonids and (ii) wild salmon from throughout British Columbia (BC), Canada, for ISAV using real time RT-qPCR targeting segment 8 and/or conventional RT-PCR with segment 8 primers and segment 6 HPR primers, and by virus isolation attempts using Salmon head kidney (SHK-1 and ASK-2) cell line monolayers. The sequences from the conventional PCR products were compared by multiple alignment and phylogenetic analyses. Seventy-nine samples were "non-negative" with at least one of these tests in one or more replicates. The ISAV segment 6 HPR sequences from the PCR products matched ISAV variants, HPR5 on 29 samples, one sample had both HPR5 and HPR7b and one matched HPR0. All sequences were of European genotype. 
    In addition, alignment of sequences of the conventional PCR product segment 8 showed they had a single nucleotide mutation in the region of the probe sequence and a 9-nucleotide overlap with the reverse primer sequence of the real time RT-qPCR assay. None of the classical ISAV segment 8 sequences in GenBank have this mutation in the probe-binding site of the assay, suggesting the presence of a novel ISAV variant in BC. A phylogenetic tree of these sequences showed that some ISAV sequences diverged early from the classical European genotype sequences, while others have evolved separately. All virus isolation attempts on the samples were negative, and thus the samples were considered "negative" in terms of the threshold trigger set for Canadian federal regulatory action; i.e., successful virus isolation in cell culture. This is the first published report of the detection of ISAV sequences in fish from British Columbia, Canada. The sequences detected, both of ISAV-HPRΔ and ISAV-HPR0, are of European genotype. These sequences are different from the classical ISAV segment 8 sequences, and this difference suggests the presence of a new ISAV variant of European genotype in BC. Our results further suggest that ISAV-HPRΔ strains can be present without clinical disease in farmed fish and without being detected by virus isolation using fish cell lines.

  6. De novo nonsense and frameshift variants of TCF20 in individuals with intellectual disability and postnatal overgrowth.

    PubMed

    Schäfgen, Johanna; Cremer, Kirsten; Becker, Jessica; Wieland, Thomas; Zink, Alexander M; Kim, Sarah; Windheuser, Isabelle C; Kreiß, Martina; Aretz, Stefan; Strom, Tim M; Wieczorek, Dagmar; Engels, Hartmut

    2016-12-01

    Recently, germline variants of the transcriptional co-regulator gene TCF20 have been implicated in the aetiology of autism spectrum disorders (ASD). However, the knowledge about the associated clinical picture remains fragmentary. In this study, two individuals with de novo TCF20 sequence variants were identified in a cohort of 313 individuals with intellectual disability of unknown aetiology, which was analysed by whole exome sequencing using a child-parent trio design. Both detected variants - one nonsense and one frameshift variant - were truncating. A comprehensive clinical characterisation of the patients yielded mild intellectual disability, postnatal tall stature and macrocephaly, obesity and muscular hypotonia as common clinical signs while ASD was only present in one proband. The present report begins to establish the clinical picture of individuals with de novo nonsense and frameshift variants of TCF20 which includes features such as proportionate overgrowth and muscular hypotonia. Furthermore, intellectual disability/developmental delay seems to be fully penetrant amongst known individuals with de novo nonsense and frameshift variants of TCF20, whereas ASD is shown to be incompletely penetrant. The transcriptional co-regulator gene TCF20 is hereby added to the growing number of genes implicated in the aetiology of both ASD and intellectual disability. Furthermore, such de novo variants of TCF20 may represent a novel differential diagnosis in the overgrowth syndrome spectrum.

  7. Evaluation and optimisation of indel detection workflows for ion torrent sequencing of the BRCA1 and BRCA2 genes.

    PubMed

    Yeo, Zhen Xuan; Wong, Joshua Chee Leong; Rozen, Steven G; Lee, Ann Siew Gek

    2014-06-24

    The Ion Torrent PGM is a popular benchtop sequencer that shows promise in replacing conventional Sanger sequencing as the gold standard for mutation detection. Despite the PGM's reported high accuracy in calling single nucleotide variations, it tends to generate many false positive calls in detecting insertions and deletions (indels), which may hinder its utility for clinical genetic testing. Recently, the proprietary analytical workflow for the Ion Torrent sequencer, Torrent Suite (TS), underwent a series of upgrades. We evaluated three major upgrades of TS by calling indels in the BRCA1 and BRCA2 genes. Our analysis revealed that false negative indels could be generated by TS under both default calling parameters and parameters adjusted for maximum sensitivity. However, indel calling with the same data using the open source variant callers, GATK and SAMtools showed that false negatives could be minimised with the use of appropriate bioinformatics analysis. Furthermore, we identified two variant calling measures, Quality-by-Depth (QD) and VARiation of the Width of gaps and inserts (VARW), which substantially reduced false positive indels, including non-homopolymer associated errors without compromising sensitivity. In our best case scenario that involved the TMAP aligner and SAMtools, we achieved 100% sensitivity, 99.99% specificity and 29% False Discovery Rate (FDR) in indel calling from all 23 samples, which is a good performance for mutation screening using PGM. New versions of TS, BWA and GATK have shown improvements in indel calling sensitivity and specificity over their older counterpart. However, the variant caller of TS exhibits a lower sensitivity than GATK and SAMtools. 
    Our findings demonstrate that although indel calling from PGM sequences may appear to be noisy at first glance, proper computational indel calling analysis is able to maximize both the sensitivity and specificity at the single base level, paving the way for the usage of this technology for future clinical genetic testing.
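
    The abstract above reports 100% sensitivity, 99.99% specificity, and a 29% false discovery rate in the same experiment. The sketch below shows how the three figures are computed from a confusion matrix and why they can coexist: specificity is measured against the very large number of non-variant positions, while FDR is measured only against the calls made. The counts in the usage example are invented for illustration and are not taken from the study.

```python
def calling_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity and false discovery rate from confusion counts."""
    def ratio(num, den):
        # Return None when a rate is undefined (empty denominator).
        return num / den if den else None

    return {
        "sensitivity": ratio(tp, tp + fn),  # fraction of true variants found
        "specificity": ratio(tn, tn + fp),  # fraction of non-variant sites left uncalled
        "fdr": ratio(fp, fp + tp),          # fraction of calls that are false
    }


# Hypothetical counts: 10 true indels, 4 false calls, 40,000 true-negative sites.
metrics = calling_metrics(tp=10, fp=4, tn=40000, fn=0)
```

    With these made-up counts, four false positives barely move specificity but account for four of the fourteen calls, so the FDR is close to 29% while specificity stays above 99.99%.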

  8. Novel insights into the functional metabolic impact of an apparent de novo m.8993T>G variant in the MT-ATP6 gene associated with maternally inherited form of Leigh Syndrome.

    PubMed

    Uittenbogaard, Martine; Brantner, Christine A; Fang, ZiShui; Wong, Lee-Jun C; Gropman, Andrea; Chiaramello, Anne

    2018-03-27

    In this study, we report a novel perspective of metabolic consequences for the m.8993T>G variant using fibroblasts from a proband with clinical symptoms compatible with Maternally Inherited Leigh Syndrome (MILS). Definitive diagnosis was corroborated by mitochondrial DNA testing for the pathogenic variant m.8993T>G in MT-ATP6 subunit by Sanger sequencing. The long-range PCR followed by massively parallel sequencing method detected the near-homoplasmic m.8993T>G variant at 83% in the proband's fibroblasts and at 0.4% in the mother's fibroblasts. Our results are compatible with very low levels of germline heteroplasmy or an apparent de novo mutation. Our mitochondrial morphometric analysis reveals severe defects in mitochondrial cristae structure in the proband's fibroblasts. Our live-cell mitochondrial respiratory analyses show impaired oxidative phosphorylation with decreased spare respiratory capacity in response to energy stress in the proband's fibroblasts. We detected a diminished glycolysis with a lessened glycolytic capacity and reserve, revealing a stunted ability to switch to glycolysis upon full inhibition of OXPHOS activities. This dysregulated energy reprogramming results in a defective interplay between OXPHOS and glycolysis during an energy crisis. Our study sheds light on the potential pathophysiologic mechanism leading to chronic energy crisis in this MILS patient harboring the m.8993T>G variant. Copyright © 2018 Elsevier Inc. All rights reserved.

  9. HPV Genotyping of Modified General Primer-Amplicons Is More Analytically Sensitive and Specific by Sequencing than by Hybridization

    PubMed Central

    Meisal, Roger; Rounge, Trine Ballestad; Christiansen, Irene Kraus; Eieland, Alexander Kirkeby; Worren, Merete Molton; Molden, Tor Faksvaag; Kommedal, Øyvind; Hovig, Eivind; Leegaard, Truls Michael

    2017-01-01

    Sensitive and specific genotyping of human papillomaviruses (HPVs) is important for population-based surveillance of carcinogenic HPV types and for monitoring vaccine effectiveness. Here we compare HPV genotyping by Next Generation Sequencing (NGS) to an established DNA hybridization method. In DNA isolated from urine, the overall analytical sensitivity of NGS was found to be 22% higher than that of hybridization. NGS was also found to be the most specific method and expanded the detection repertoire beyond the 37 types of the DNA hybridization assay. Furthermore, NGS provided an increased resolution by identifying genetic variants of individual HPV types. The same Modified General Primers (MGP)-amplicon was used in both methods. The NGS method is described in detail to facilitate implementation in the clinical microbiology laboratory and includes suggestions for new standards for detection and calling of types and variants with improved resolution. PMID:28045981

  10. HPV Genotyping of Modified General Primer-Amplicons Is More Analytically Sensitive and Specific by Sequencing than by Hybridization.

    PubMed

    Meisal, Roger; Rounge, Trine Ballestad; Christiansen, Irene Kraus; Eieland, Alexander Kirkeby; Worren, Merete Molton; Molden, Tor Faksvaag; Kommedal, Øyvind; Hovig, Eivind; Leegaard, Truls Michael; Ambur, Ole Herman

    2017-01-01

    Sensitive and specific genotyping of human papillomaviruses (HPVs) is important for population-based surveillance of carcinogenic HPV types and for monitoring vaccine effectiveness. Here we compare HPV genotyping by Next Generation Sequencing (NGS) to an established DNA hybridization method. In DNA isolated from urine, the overall analytical sensitivity of NGS was found to be 22% higher than that of hybridization. NGS was also found to be the most specific method and expanded the detection repertoire beyond the 37 types of the DNA hybridization assay. Furthermore, NGS provided an increased resolution by identifying genetic variants of individual HPV types. The same Modified General Primers (MGP)-amplicon was used in both methods. The NGS method is described in detail to facilitate implementation in the clinical microbiology laboratory and includes suggestions for new standards for detection and calling of types and variants with improved resolution.

  11. Polymorphic human somatostatin gene is located on chromosome 3.

    PubMed Central

    Naylor, S L; Sakaguchi, A Y; Shen, L P; Bell, G I; Rutter, W J; Shows, T B

    1983-01-01

    Somatostatin is a 14-amino-acid neuropeptide and hormone that inhibits the secretion of several peptide hormones. The human gene for somatostatin SST has been cloned, and the sequence has been determined. This clone was used as a probe in chromosome mapping studies to detect the human somatostatin sequence in human-rodent hybrids. Southern blot analysis of 41 hybrids, including some containing translocations of human chromosomes, placed SST in the q21→qter region of chromosome 3. Human DNAs from unrelated individuals were screened for restriction fragment polymorphisms detectable by the somatostatin gene probe. Two polymorphisms were found: (i) an EcoRI variant located at the 3' end of the gene, found in Caucasian, U.S. Black, and Asian populations with a frequency of approximately 0.10 and (ii) a BamHI variant in the intron, which occurs in Caucasians at a frequency of 0.13. PMID:6133281

  12. Modeling read counts for CNV detection in exome sequencing data.

    PubMed

    Love, Michael I; Myšičková, Alena; Sun, Ruping; Kalscheuer, Vera; Vingron, Martin; Haas, Stefan A

    2011-11-08

    Varying depth of high-throughput sequencing reads along a chromosome makes it possible to observe copy number variants (CNVs) in a sample relative to a reference. In exome and other targeted sequencing projects, technical factors increase variation in read depth while reducing the number of observed locations, adding difficulty to the problem of identifying CNVs. We present a hidden Markov model for detecting CNVs from raw read count data, using background read depth from a control set as well as other positional covariates such as GC-content. The model, exomeCopy, is applied to a large chromosome X exome sequencing project identifying a list of large unique CNVs. CNVs predicted by the model and experimentally validated are then recovered using a cross-platform control set from publicly available exome sequencing data. Simulations show high sensitivity for detecting heterozygous and homozygous CNVs, outperforming normalization and state-of-the-art segmentation methods.
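
    The abstract above describes a hidden Markov model that segments read counts into copy-number states using background depth from a control set. The sketch below is a deliberately simplified stand-in, not exomeCopy itself: it assumes Poisson emissions whose mean is the background depth scaled by copy number over two, a single shared probability of staying in the same state, and no covariates such as GC content. The function names, state set, and `stay_prob` value are all assumptions for illustration. Viterbi decoding then returns the most likely copy number per window.

```python
import math


def poisson_logpmf(k, lam):
    """Log of the Poisson probability mass function (lam must be > 0)."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def viterbi_copy_number(counts, background, copy_states=(1, 2, 3), stay_prob=0.9):
    """Most likely copy-number path for per-window read counts.

    counts:     observed reads per window in the sample.
    background: expected reads per window at normal (diploid) copy number.
    """
    n = len(copy_states)
    switch = (1 - stay_prob) / (n - 1)
    log_trans = [[math.log(stay_prob if i == j else switch) for j in range(n)]
                 for i in range(n)]

    def emit(t, s):
        # Expected depth scales linearly with copy number relative to diploid.
        return poisson_logpmf(counts[t], background[t] * copy_states[s] / 2)

    score = [math.log(1 / n) + emit(0, s) for s in range(n)]
    back = []
    for t in range(1, len(counts)):
        new_score, pointers = [], []
        for j in range(n):
            best = max(range(n), key=lambda i: score[i] + log_trans[i][j])
            new_score.append(score[best] + log_trans[best][j] + emit(t, j))
            pointers.append(best)
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return [copy_states[s] for s in path]
```

    A copy-number state of 0 is left out because it would give a Poisson mean of zero; a fuller model would add a small floor or use an overdispersed emission distribution.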

  13. Ataxia telangiectasia presenting as dopa-responsive cervical dystonia

    PubMed Central

    Mohire, Mahavir D.; Schneider, Susanne A.; Stamelou, Maria; Wood, Nicholas W.; Bhatia, Kailash P.

    2013-01-01

    Objective: To identify the cause of cervical dopa-responsive dystonia (DRD) in a Muslim Indian family inherited in an apparently autosomal recessive fashion, as previously described in this journal. Methods: Previous testing for mutations in the genes known to cause DRD (GCH1, TH, and SPR) had been negative. Whole exome sequencing was performed on all 3 affected individuals for whom DNA was available to identify potentially pathogenic shared variants. Genotyping data obtained for all 3 affected individuals using the OmniExpress single nucleotide polymorphism chip (Illumina, San Diego, CA) were used to perform linkage analysis, autozygosity mapping, and copy number variation analysis. Sanger sequencing was used to confirm all variants. Results: After filtering of the variants, exome sequencing revealed 2 genes harboring potentially pathogenic compound heterozygous variants (ATM and LRRC16A). Of these, the variants in ATM segregated perfectly with the cervical DRD. Both mutations detected in ATM have been shown to be pathogenic, and α-fetoprotein, a marker of ataxia telangiectasia, was increased in all affected individuals. Conclusion: Biallelic mutations in ATM can cause DRD, and mutations in this gene should be considered in the differential diagnosis of unexplained DRD, particularly if the dystonia is cervical and if there is a recessive family history. ATM has previously been reported to cause isolated cervical dystonia, but never, to our knowledge, DRD. Individuals with dystonia related to ataxia telangiectasia may benefit from a trial of levodopa. PMID:23946315

  14. Molecular analysis of varicella vaccines and varicella-zoster virus from vaccine-related skin lesions.

    PubMed

    Thiele, Sonja; Borschewski, Aljona; Küchler, Judit; Bieberbach, Marc; Voigt, Sebastian; Ehlers, Bernhard

    2011-07-01

    To prevent complications that might follow an infection with varicella-zoster virus (VZV), the live attenuated Oka strain (V-Oka) is administered to children in many developed countries. Three vaccine brands (Varivax from Sanofi Pasteur MSD; Varilrix and Priorix-Tetra, both from Glaxo-Smith-Kline) are licensed in Germany and have been associated with both different degrees of vaccine effectiveness and adverse effects. To identify genetic variants in the vaccines that might contribute to rash-associated syndromes, single nucleotide polymorphism (SNP) profiles of variants from the three vaccines and rash-associated vaccine-type VZV from German vaccinees were quantitatively compared by PCR-based pyrosequencing (PSQ). The Varivax vaccine contained an estimated 3-fold higher diversity of VZV variants, with 20% more wild-type (wt) SNPs than Varilrix and Priorix-Tetra. These minor VZV variants in the vaccines were identified by analyzing cloned full-length open reading frame (ORF) orf62 sequences by chain termination sequencing and PSQ. Some of these sequences amplified from vaccine VZV were very similar or identical to those of the rash-associated vaccine-type VZV from vaccinees and were almost exclusively detected in Varivax. Therefore, minorities of rash-associated VZV variants are present in varicella vaccine formulations, and it can be concluded that the analysis of a core set of four SNPs is required as a minimum for a firm diagnostic differentiation of vaccine-type VZV from wt VZV.

  15. Same-day genomic and epigenomic diagnosis of brain tumors using real-time nanopore sequencing.

    PubMed

    Euskirchen, Philipp; Bielle, Franck; Labreche, Karim; Kloosterman, Wigard P; Rosenberg, Shai; Daniau, Mailys; Schmitt, Charlotte; Masliah-Planchon, Julien; Bourdeaut, Franck; Dehais, Caroline; Marie, Yannick; Delattre, Jean-Yves; Idbaih, Ahmed

    2017-11-01

    Molecular classification of cancer has entered clinical routine to inform diagnosis, prognosis, and treatment decisions. At the same time, new tumor entities have been identified that cannot be defined histologically. For central nervous system tumors, the current World Health Organization classification explicitly demands molecular testing, e.g., for 1p/19q-codeletion or IDH mutations, to make an integrated histomolecular diagnosis. However, a plethora of sophisticated technologies is currently needed to assess different genomic and epigenomic alterations and turnaround times are in the range of weeks, which makes standardized and widespread implementation difficult and hinders timely decision making. Here, we explored the potential of a pocket-size nanopore sequencing device for multimodal and rapid molecular diagnostics of cancer. Low-pass whole genome sequencing was used to simultaneously generate copy number (CN) and methylation profiles from native tumor DNA in the same sequencing run. Single nucleotide variants in IDH1, IDH2, TP53, H3F3A, and the TERT promoter region were identified using deep amplicon sequencing. Nanopore sequencing yielded ~0.1X genome coverage within 6 h and resulting CN and epigenetic profiles correlated well with matched microarray data. Diagnostically relevant alterations, such as 1p/19q codeletion, and focal amplifications could be recapitulated. Using ad hoc random forests, we could perform supervised pan-cancer classification to distinguish gliomas, medulloblastomas, and brain metastases of different primary sites. Single nucleotide variants in IDH1, IDH2, and H3F3A were identified using deep amplicon sequencing within minutes of sequencing. Detection of TP53 and TERT promoter mutations shows that sequencing of entire genes and GC-rich regions is feasible. Nanopore sequencing allows same-day detection of structural variants, point mutations, and methylation profiling using a single device with negligible capital cost. 
    It outperforms hybridization-based and current sequencing technologies with respect to time to diagnosis and required laboratory equipment and expertise, aiming to make precision medicine possible for every cancer patient, even in resource-restricted settings.

  16. Genetic Analyses of the NF1 Gene in Turkish Neurofibromatosis Type I Patients and Definition of three Novel Variants

    PubMed Central

    Ulusal, SD; Gürkan, H; Atlı, E; Özal, SA; Çiftdemir, M; Tozkır, H; Karal, Y; Güçlü, H; Eker, D; Görker, I

    2017-01-01

    Neurofibromatosis Type I (NF1) is a multisystemic autosomal dominant neurocutaneous disorder predisposing patients to have benign and/or malignant lesions predominantly of the skin, nervous system and bone. Loss-of-function mutations or deletions of the NF1 gene are responsible for NF1 disease. The involvement of various pathogenic variants, the size of the gene, and the presence of pseudogenes make it difficult to analyze. We aimed to report the results of 2 years of multiplex ligation-dependent probe amplification (MLPA) and next generation sequencing (NGS) for genetic diagnosis of NF1 applied at our genetic diagnosis center. MLPA, semiconductor sequencing and Sanger sequencing were performed on genomic DNA samples from 24 unrelated patients and their affected family members referred to our center with suspected NF1. In total, three novel and 12 known pathogenic variants and a whole gene deletion were determined. We suggest that next generation sequencing is a practical tool for genetic analysis of NF1. Deletion/duplication analysis with MLPA may also be helpful for patients who are clinically diagnosed with NF1 but do not have a detectable mutation by NGS. PMID:28924536

  17. Genetic basis of arrhythmogenic cardiomyopathy.

    PubMed

    Karmouch, Jennifer; Protonotarios, Alexandros; Syrris, Petros

    2018-05-01

    To date 16 genes have been associated with arrhythmogenic cardiomyopathy (ACM). Mutations in these genes can lead to a broad spectrum of phenotypic expression ranging from disease affecting predominantly the right or left ventricle, to biventricular subtypes. Understanding the genetic causes of ACM is important in diagnosis and management of the disorder. This review summarizes recent advances in molecular genetics and discusses the application of next-generation sequencing technology in genetic testing in ACM. Use of next-generation sequencing methods has resulted in the identification of novel causative variants and genes for ACM. The involvement of filamin C in ACM demonstrates the genetic overlap between ACM and other types of cardiomyopathy. Putative pathogenic variants have been detected in the cadherin 2 gene, which encodes a protein involved in cell adhesion. Large genomic rearrangements in desmosome genes have been systematically investigated in a cohort of ACM patients. Recent studies have identified novel causes of ACM providing new insights into the genetic spectrum of the disease and highlighting an overlapping phenotype between ACM and dilated cardiomyopathy. Next-generation sequencing is a useful tool for research and genetic diagnostic screening but interpretation of identified sequence variants requires caution and should be performed in specialized centres.

  18. Integrating mRNA and Protein Sequencing Enables the Detection and Quantitative Profiling of Natural Protein Sequence Variants of Populus trichocarpa.

    PubMed

    Abraham, Paul E; Wang, Xiaojing; Ranjan, Priya; Nookaew, Intawat; Zhang, Bing; Tuskan, Gerald A; Hettich, Robert L

    2015-12-04

    Next-generation sequencing has transformed the ability to link genotypes to phenotypes and facilitates the dissection of genetic contribution to complex traits. However, it is challenging to link genetic variants with the perturbed functional effects on proteins encoded by such genes. Here we show how RNA sequencing can be exploited to construct genotype-specific protein sequence databases to assess natural variation in proteins, providing information about the molecular toolbox driving cellular processes. For this study, we used two natural genotypes selected from a recent genome-wide association study of Populus trichocarpa, an obligate outcrosser with tremendous phenotypic variation across the natural population. This strategy allowed us to comprehensively catalogue proteins containing single amino acid polymorphisms (SAAPs), as well as insertions and deletions. We profiled the frequency of 128 types of naturally occurring amino acid substitutions, including both expected (neutral) and unexpected (non-neutral) SAAPs, with a subset occurring in regions of the genome having strong polymorphism patterns consistent with recent positive and/or divergent selection. By zeroing in on the molecular signatures of these important regions that might have previously been uncharacterized, we now provide a high-resolution molecular inventory that should improve accessibility and subsequent identification of natural protein variants in future genotype-to-phenotype studies.

  19. RNA-ID, a Powerful Tool for Identifying and Characterizing Regulatory Sequences.

    PubMed

    Brule, C E; Dean, K M; Grayhack, E J

    2016-01-01

    The identification and analysis of sequences that regulate gene expression is critical because regulated gene expression underlies biology. RNA-ID is an efficient and sensitive method to discover and investigate regulatory sequences in the yeast Saccharomyces cerevisiae, using fluorescence-based assays to detect green fluorescent protein (GFP) relative to a red fluorescent protein (RFP) control in individual cells. Putative regulatory sequences can be inserted either in-frame or upstream of a superfolder GFP fusion protein whose expression, like that of RFP, is driven by the bidirectional GAL1,10 promoter. In this chapter, we describe the methodology to identify and study cis-regulatory sequences in the RNA-ID system, explaining features and variations of the RNA-ID reporter, as well as some applications of this system. We describe in detail the methods to analyze a single regulatory sequence, from construction of a single GFP variant to assay of variants by flow cytometry, as well as modifications required to screen libraries of different strains simultaneously. We also describe subsequent analyses of regulatory sequences. © 2016 Elsevier Inc. All rights reserved.

  20. Application of Coamplification at Lower Denaturation Temperature-PCR Sequencing for Early Detection of Antiviral Drug Resistance Mutations of Hepatitis B Virus

    PubMed Central

    Wong, Danny Ka-Ho; Tsoi, Ottilia; Huang, Fung-Yu; Seto, Wai-Kay; Fung, James; Lai, Ching-Lung

    2014-01-01

    Nucleoside/nucleotide analogue treatment of chronic hepatitis B virus (HBV) infection is hampered by the emergence of drug resistance mutations. Conventional PCR sequencing cannot detect minor variants of <20%. We developed a modified co-amplification at lower denaturation temperature-PCR (COLD-PCR) method for the detection of HBV minority drug resistance mutations. The critical denaturation temperature for COLD-PCR was determined to be 78°C. Sensitivity of COLD-PCR sequencing was determined using serially diluted plasmids containing mixed proportions of HBV reverse transcriptase (rt) wild-type and mutant sequences. Conventional PCR sequencing detected mutations only if they were present in ≥25% of the viral population, whereas COLD-PCR sequencing detected mutations when they were present in 5 to 10% of the viral population. The performance of COLD-PCR was compared to conventional PCR sequencing and a line probe assay (LiPA) using 215 samples obtained from 136 lamivudine- or telbivudine-treated patients with virological breakthrough. Among these 215 samples, drug resistance mutations were detected in 155 (72%), 148 (69%), and 113 samples (53%) by LiPA, COLD-PCR, and conventional PCR sequencing, respectively. Nineteen (9%) samples had mutations detectable by COLD-PCR but not LiPA, while 26 (12%) samples had mutations detectable by LiPA but not COLD-PCR, indicating both methods were comparable (P = 0.371). COLD-PCR was more sensitive than conventional PCR sequencing. Thirty-five (16%) samples had mutations detectable by COLD-PCR but not conventional PCR sequencing, while none had mutations detected by conventional PCR sequencing but not COLD-PCR (P < 0.0001). COLD-PCR sequencing is a simple method which is comparable to LiPA and superior to conventional PCR sequencing in detecting minor lamivudine/telbivudine resistance mutations. PMID:24951803
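
    The paired comparisons above report discordant sample counts and P values, but the abstract does not name the statistical test. As a hedged illustration only, the sketch below assumes a McNemar test with continuity correction applied to the discordant pairs (19 vs 26, and 35 vs 0); under that assumption the results are consistent with the reported P = 0.371 and P < 0.0001. The function name mcnemar_p is hypothetical.

```python
import math


def mcnemar_p(b: int, c: int) -> float:
    """Two-sided McNemar test with continuity correction.

    b and c are the two discordant-pair counts (samples positive by one
    method only). The statistic is referred to a chi-square distribution
    with 1 degree of freedom. This is an assumed test, not one named in
    the abstract.
    """
    if b + c == 0:
        return 1.0
    stat = max(abs(b - c) - 1, 0) ** 2 / (b + c)
    # Survival function of chi-square with 1 df: P(X > stat) = erfc(sqrt(stat / 2))
    return math.erfc(math.sqrt(stat / 2))


# Discordant counts taken from the abstract
p_lipa = mcnemar_p(19, 26)  # COLD-PCR only vs LiPA only
p_conv = mcnemar_p(35, 0)   # COLD-PCR only vs conventional PCR only
print(f"COLD-PCR vs LiPA: P = {p_lipa:.3f}")
print(f"COLD-PCR vs conventional PCR sequencing: P = {p_conv:.2e}")
```

    Only the discordant pairs enter the statistic, which is why the 129 or more concordant samples do not affect the comparison.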

  1. ExScalibur: A High-Performance Cloud-Enabled Suite for Whole Exome Germline and Somatic Mutation Identification

    PubMed Central

    Huang, Lei; Kang, Wenjun; Bartom, Elizabeth; Onel, Kenan; Volchenboum, Samuel; Andrade, Jorge

    2015-01-01

    Whole exome sequencing has facilitated the discovery of causal genetic variants associated with human diseases at deep coverage and low cost. In particular, the detection of somatic mutations from tumor/normal pairs has provided insights into the cancer genome. Although there is an abundance of publicly-available software for the detection of germline and somatic variants, concordance is generally limited among variant callers and alignment algorithms. Successful integration of variants detected by multiple methods requires in-depth knowledge of the software, access to high-performance computing resources, and advanced programming techniques. We present ExScalibur, a set of fully automated, highly scalable and modulated pipelines for whole exome data analysis. The suite integrates multiple alignment and variant calling algorithms for the accurate detection of germline and somatic mutations with close to 99% sensitivity and specificity. ExScalibur implements streamlined execution of analytical modules, real-time monitoring of pipeline progress, robust handling of errors and intuitive documentation that allows for increased reproducibility and sharing of results and workflows. It runs on local computers, high-performance computing clusters and cloud environments. In addition, we provide a data analysis report utility to facilitate visualization of the results that offers interactive exploration of quality control files, read alignment and variant calls, assisting downstream customization of potential disease-causing mutations. ExScalibur is open-source and is also available as a public image on Amazon cloud. PMID:26271043
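
    The abstract notes that concordance among variant callers is limited and that the suite integrates several of them. How ExScalibur merges calls is not described here, so the following is only a minimal sketch of one common idea, majority-style voting across callers. The caller names, the variant tuples and the min_support parameter are all hypothetical.

```python
from collections import Counter
from typing import Dict, Set, Tuple

# A variant is identified by (chromosome, 1-based position, ref allele, alt allele)
Variant = Tuple[str, int, str, str]


def consensus_calls(calls_by_caller: Dict[str, Set[Variant]],
                    min_support: int = 2) -> Set[Variant]:
    """Keep variants reported by at least min_support callers."""
    counts = Counter(v for calls in calls_by_caller.values() for v in calls)
    return {v for v, n in counts.items() if n >= min_support}


# Hypothetical call sets from three unnamed callers
calls = {
    "caller_a": {("chr1", 1000, "A", "G"), ("chr2", 500, "C", "T")},
    "caller_b": {("chr1", 1000, "A", "G"), ("chr3", 42, "G", "A")},
    "caller_c": {("chr1", 1000, "A", "G"), ("chr2", 500, "C", "T")},
}
for variant in sorted(consensus_calls(calls)):
    print(variant)
```

    Raising min_support trades sensitivity for specificity; a real pipeline would also need to normalize variant representation before comparing calls.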

  2. Molecular epidemiology of GI and GII noroviruses in sewage: 1-year surveillance in eastern China.

    PubMed

    Zhou, N; Lin, X; Wang, S; Tao, Z; Xiong, P; Wang, H; Liu, Y; Song, Y; Xu, A

    2016-10-01

    This study aimed to determine the concentration and molecular epidemiology of GI and GII noroviruses in sewage in China. Twenty-three raw sewage samples were collected in the cities of Jinan and Linyi, eastern China, in 2014. All samples were positive for GI and GII noroviruses by TaqMan-based quantitative PCR. The mean concentrations of GI and GII noroviruses were 4·52 × 10^4 and 7·88 × 10^4 genome copies per litre, respectively. After reverse transcription-PCR, cloning and sequencing, 16 genotypes were identified. GI.6 (69·6%), GI.2 (65·2%), GII.13 (65·2%), GII.6 (60·9%) and GII.17 (60·9%) were the most common GI and GII genotypes. A recombination event was observed in two GI.6 sequences. GII.4 sequences belonged to the Sydney 2012 and Den Haag 2006b variants. Interestingly, the novel GII.17 Kawasaki308 variant was detected. These results reveal that multiple norovirus genotypes cocirculated in the local population. The risk of acute gastroenteritis outbreaks is high in the two cities, given the detection of the GII.17 Kawasaki308 variant and the high concentration of norovirus in raw sewage. This study demonstrates that sewage surveillance can be a useful approach to monitor noroviruses circulating in the population. © 2016 The Society for Applied Microbiology.

  3. Molecular detection and characterization of sapovirus in hospitalized children with acute gastroenteritis in the Philippines.

    PubMed

    Liu, Xiaofang; Yamamoto, Dai; Saito, Mariko; Imagawa, Toshifumi; Ablola, Adrianne; Tandoc, Amado O; Segubre-Mercado, Edelwisa; Lupisan, Socorro P; Okamoto, Michiko; Furuse, Yuki; Saito, Mayuko; Oshitani, Hitoshi

    2015-07-01

    Human sapovirus (SaV) is a causative agent of acute gastroenteritis. Recently, SaV detection has been increasing worldwide due to the emerging SaV genotype I.2. However, SaV infection has not been reported in the Philippines. This study evaluated the prevalence and genetic diversity of SaV in hospitalized children aged less than 5 years with acute gastroenteritis. Stool samples were collected from children with acute gastroenteritis at three hospitals in the Philippines from June 2012 to August 2013. SaV was detected by reverse transcription real-time PCR, and the polymerase and capsid gene sequences were analyzed. Full genome sequencing and recombination analysis were performed on possible recombinant viruses. SaV was detected in 7.0% of the tested stool samples (29/417). In 10 SaV-positive cases, other viruses were also detected, including rotavirus (n=6), norovirus (n=2), and human astrovirus (n=2). Four known SaV genotypes (GI.1 [7], GI.2 [2], GII.1 [12], and GV [2]) and one novel recombinant (n=3) were identified by polymerase and capsid gene sequence analysis. Full genome sequencing revealed that the 5' nontranslated region (NTR) and nonstructural protein region of the novel recombinant were closely related to the GII.1 Bristol/98/UK variant, whereas the structural protein region and 3' NTR were closely related to the GII.4 Kumamoto6/Mar2003/JPN variant. SaV was regularly detected in children hospitalized for acute gastroenteritis during the study period. A novel recombinant, SaV GII.1/GII.4, was identified in three cases at two different study sites. Copyright © 2015 The Authors. Published by Elsevier B.V. All rights reserved.

  4. Clinical Interpretation and Implications of Whole-Genome Sequencing

    PubMed Central

    Dewey, Frederick E.; Grove, Megan E.; Pan, Cuiping; Goldstein, Benjamin A.; Bernstein, Jonathan A.; Chaib, Hassan; Merker, Jason D.; Goldfeder, Rachel L.; Enns, Gregory M.; David, Sean P.; Pakdaman, Neda; Ormond, Kelly E.; Caleshu, Colleen; Kingham, Kerry; Klein, Teri E.; Whirl-Carrillo, Michelle; Sakamoto, Kenneth; Wheeler, Matthew T.; Butte, Atul J.; Ford, James M.; Boxer, Linda; Ioannidis, John P. A.; Yeung, Alan C.; Altman, Russ B.; Assimes, Themistocles L.; Snyder, Michael; Ashley, Euan A.; Quertermous, Thomas

    2014-01-01

    IMPORTANCE Whole-genome sequencing (WGS) is increasingly applied in clinical medicine and is expected to uncover clinically significant findings regardless of sequencing indication. OBJECTIVES To examine coverage and concordance of clinically relevant genetic variation provided by WGS technologies; to quantitate inherited disease risk and pharmacogenomic findings in WGS data and resources required for their discovery and interpretation; and to evaluate clinical action prompted by WGS findings. DESIGN, SETTING, AND PARTICIPANTS An exploratory study of 12 adult participants recruited at Stanford University Medical Center who underwent WGS between November 2011 and March 2012. A multidisciplinary team reviewed all potentially reportable genetic findings. Five physicians proposed initial clinical follow-up based on the genetic findings. MAIN OUTCOMES AND MEASURES Genome coverage and sequencing platform concordance in different categories of genetic disease risk, person-hours spent curating candidate disease-risk variants, interpretation agreement between trained curators and disease genetics databases, burden of inherited disease risk and pharmacogenomic findings, and burden and interrater agreement of proposed clinical follow-up. RESULTS Depending on sequencing platform, 10% to 19% of inherited disease genes were not covered to accepted standards for single nucleotide variant discovery. Genotype concordance was high for previously described single nucleotide genetic variants (99%-100%) but low for small insertion/deletion variants (53%-59%). Curation of 90 to 127 genetic variants in each participant required a median of 54 minutes (range, 5-223 minutes) per genetic variant, resulted in moderate classification agreement between professionals (Gross κ, 0.52; 95% CI, 0.40-0.64), and reclassified 69% of genetic variants cataloged as disease causing in mutation databases to variants of uncertain or lesser significance.
Two to 6 personal disease-risk findings were discovered in each participant, including 1 frameshift deletion in the BRCA1 gene implicated in hereditary breast and ovarian cancer. Physician review of sequencing findings prompted consideration of a median of 1 to 3 initial diagnostic tests and referrals per participant, with fair interrater agreement about the suitability of WGS findings for clinical follow-up (Fleiss κ, 0.24; P < .001). CONCLUSIONS AND RELEVANCE In this exploratory study of 12 volunteer adults, the use of WGS was associated with incomplete coverage of inherited disease genes, low reproducibility of detection of genetic variation with the highest potential clinical effects, and uncertainty about clinically reportable findings. In certain cases, WGS will identify clinically actionable genetic variants warranting early medical intervention. These issues should be considered when determining the role of WGS in clinical medicine. PMID:24618965
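
    Interrater agreement among the five physicians is summarized above with Fleiss κ. For readers unfamiliar with the statistic, the sketch below computes it from a count table using the standard formula. The ratings shown are invented toy data, not values from the study, and the two-category layout (follow-up vs no follow-up) is an assumption made for illustration.

```python
from typing import List


def fleiss_kappa(table: List[List[int]]) -> float:
    """Fleiss' kappa for a table where table[i][j] is the number of raters
    who assigned item i to category j. Every row must sum to the same
    number of raters (at least 2)."""
    n_items = len(table)
    n_raters = sum(table[0])
    n_cats = len(table[0])
    if any(sum(row) != n_raters for row in table):
        raise ValueError("every item must be rated by the same number of raters")
    # Observed agreement for each item, then averaged
    p_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in table]
    p_bar = sum(p_i) / n_items
    # Chance agreement from the overall category proportions
    p_j = [sum(row[j] for row in table) / (n_items * n_raters)
           for j in range(n_cats)]
    p_e = sum(p * p for p in p_j)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (p_bar - p_e) / (1 - p_e)


# Toy data: 4 hypothetical findings rated by 5 raters,
# columns = [follow-up warranted, no follow-up]
ratings = [[5, 0], [4, 1], [2, 3], [0, 5]]
print(f"Fleiss kappa = {fleiss_kappa(ratings):.2f}")
```

    Values near 0 indicate agreement no better than chance and values near 1 indicate near-perfect agreement, which is the scale on which the reported 0.24 is described as fair.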

  5. SIBIS: a Bayesian model for inconsistent protein sequence estimation.

    PubMed

    Khenoussi, Walyd; Vanhoutrève, Renaud; Poch, Olivier; Thompson, Julie D

    2014-09-01

    The prediction of protein coding genes is a major challenge that depends on the quality of genome sequencing, the accuracy of the model used to elucidate the exonic structure of the genes and the complexity of the gene splicing process leading to different protein variants. As a consequence, today's protein databases contain a huge amount of inconsistency, due to both natural variants and sequence prediction errors. We have developed a new method, called SIBIS, to detect such inconsistencies based on the evolutionary information in multiple sequence alignments. A Bayesian framework, combined with Dirichlet mixture models, is used to estimate the probability of observing specific amino acids and to detect inconsistent or erroneous sequence segments. We evaluated the performance of SIBIS on a reference set of protein sequences with experimentally validated errors and showed that the sensitivity is significantly higher than previous methods, with only a small loss of specificity. We also assessed a large set of human sequences from the UniProt database and found evidence of inconsistency in 48% of the previously uncharacterized sequences. We conclude that the integration of quality control methods like SIBIS in automatic analysis pipelines will be critical for the robust inference of structural, functional and phylogenetic information from these sequences. Source code, implemented in C on a linux system, and the datasets of protein sequences are freely available for download at http://www.lbgi.fr/~julie/SIBIS. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
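
    The abstract describes estimating amino acid probabilities in an alignment with a Bayesian framework and Dirichlet mixture models. The sketch below is a deliberately simplified stand-in, not the SIBIS algorithm: it uses a single symmetric Dirichlet prior instead of a mixture and computes the posterior mean probability of each residue in one alignment column. The pseudocount alpha and the example column are hypothetical.

```python
from collections import Counter
from typing import Dict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def posterior_probs(column: str, alpha: float = 0.5) -> Dict[str, float]:
    """Posterior mean residue probabilities for one alignment column under
    a symmetric Dirichlet(alpha) prior. Gaps and unknown symbols are ignored."""
    counts = Counter(aa for aa in column if aa in AMINO_ACIDS)
    total = sum(counts.values()) + alpha * len(AMINO_ACIDS)
    return {aa: (counts[aa] + alpha) / total for aa in AMINO_ACIDS}


# Hypothetical column: nine sequences carry Leu, one carries Trp
column = "LLLLLLLLLW"
probs = posterior_probs(column)
print(f"P(L) = {probs['L']:.3f}, P(W) = {probs['W']:.3f}, P(A) = {probs['A']:.3f}")
```

    A residue whose posterior probability is low across a run of consecutive columns would be a candidate inconsistent segment; a mixture prior, as named in the abstract, would replace the single alpha with several weighted components.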

  6. Molecular genetic studies of DMT1 on 12q in French-Canadian restless legs syndrome patients and families.

    PubMed

    Xiong, Lan; Dion, Patrick; Montplaisir, Jacques; Levchenko, Anastasia; Thibodeau, Pascale; Karemera, Liliane; Rivière, Jean-Baptiste; St-Onge, Judith; Gaspar, Claudia; Dubé, Marie-Pierre; Desautels, Alex; Turecki, Gustavo; Rouleau, Guy A

    2007-10-05

    Converging evidence from clinical observations, brain imaging and pathological findings strongly indicates impaired brain iron regulation in restless legs syndrome (RLS). Animal models with mutations in the divalent metal transporter 1 (DMT1) gene, an important brain iron transporter, demonstrate an iron deficiency profile similar to that found in RLS brain. The human DMT1 gene, mapped to chromosome 12q near the RLS1 locus, qualifies as an excellent functional and possible positional candidate for RLS. DMT1 protein levels were assessed in lymphoblastoid cell lines from RLS patients and controls. Linkage analyses were carried out with markers flanking and within the DMT1 gene. Selected patient samples from RLS families with compatible linkage to the RLS1 locus on 12q were fully sequenced in both the coding regions and the long stretches of UTR sequences. Finally, selected sequence variants were further studied in case/control and family-based association tests. A clinical association of anemia and RLS was further confirmed in this study. There was no detectable difference in DMT1 protein levels between RLS patient lymphoblastoid cell lines and normal controls. Non-parametric linkage analyses failed to identify any significant linkage signals within the DMT1 gene region. Sequencing of selected patients did not detect any sequence variant(s) compatible with DMT1 harboring RLS causative mutation(s). Further studies did not find any association between ten SNPs, spanning the whole DMT1 gene region, and RLS affection status. Finally, two DMT1 intronic SNPs showed positive association with RLS in patients with a history of anemia, when compared to RLS patients without anemia. (c) 2007 Wiley-Liss, Inc.

  7. Deep sequencing is an appropriate tool for the selection of unique Hepatitis C virus (HCV) variants after single genomic amplification.

    PubMed

    Guinoiseau, Thibault; Moreau, Alain; Hohnadel, Guillaume; Ngo-Giang-Huong, Nicole; Brulard, Celine; Vourc'h, Patrick; Goudeau, Alain; Gaudy-Graffin, Catherine

    2017-01-01

    Hepatitis C virus (HCV) evolves rapidly in a single host and circulates as a quasispecies, which is a complex mixture of genetically distinct but closely related viruses, namely variants. To identify intra-individual diversity and investigate the functional properties of variants in vitro, it is necessary to define the quasispecies composition and isolate the HCV variants. This is possible using single genome amplification (SGA). This technique, based on serially diluted cDNA to amplify a single cDNA molecule (clonal amplicon), has already been used to determine individual HCV diversity. In these studies, positive PCR reactions from SGA were directly sequenced using Sanger technology. Non-clonal amplicons must be detected and excluded to facilitate further functional analysis. Here, we compared Next Generation Sequencing (NGS) with de novo assembly and Sanger sequencing for their ability to distinguish clonal and non-clonal amplicons after SGA on one plasma specimen. All amplicons (n = 42) classified as clonal by NGS were also classified as clonal by Sanger sequencing. No double peaks were seen on electropherograms for non-clonal amplicons with position-specific nucleotide variation below 15% by NGS. Altogether, NGS circumvented many of the difficulties encountered when using Sanger sequencing after SGA and is an appropriate tool to reliably select clonal amplicons for further functional studies.

  8. Deep sequencing is an appropriate tool for the selection of unique Hepatitis C virus (HCV) variants after single genomic amplification

    PubMed Central

    Guinoiseau, Thibault; Moreau, Alain; Hohnadel, Guillaume; Ngo-Giang-Huong, Nicole; Brulard, Celine; Vourc’h, Patrick; Goudeau, Alain; Gaudy-Graffin, Catherine

    2017-01-01

    Hepatitis C virus (HCV) evolves rapidly in a single host and circulates as a quasispecies, which is a complex mixture of genetically distinct but closely related viruses, namely variants. To identify intra-individual diversity and investigate the functional properties of variants in vitro, it is necessary to define the quasispecies composition and isolate the HCV variants. This is possible using single genome amplification (SGA). This technique, based on serially diluted cDNA to amplify a single cDNA molecule (clonal amplicon), has already been used to determine individual HCV diversity. In these studies, positive PCR reactions from SGA were directly sequenced using Sanger technology. Non-clonal amplicons must be detected and excluded to facilitate further functional analysis. Here, we compared Next Generation Sequencing (NGS) with de novo assembly and Sanger sequencing for their ability to distinguish clonal and non-clonal amplicons after SGA on one plasma specimen. All amplicons (n = 42) classified as clonal by NGS were also classified as clonal by Sanger sequencing. No double peaks were seen on electropherograms for non-clonal amplicons with position-specific nucleotide variation below 15% by NGS. Altogether, NGS circumvented many of the difficulties encountered when using Sanger sequencing after SGA and is an appropriate tool to reliably select clonal amplicons for further functional studies. PMID:28362878
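
    The distinction between clonal and non-clonal amplicons rests on position-specific nucleotide variation in the NGS reads. The abstract does not give the classification rule the authors used, so the following is a minimal sketch under stated assumptions: an amplicon is treated as clonal when no position has a minor-base fraction above a chosen threshold. The 5% default, the function names and the read counts are all hypothetical.

```python
from typing import Dict, List

# One dict of base -> read count per alignment position
Pileup = List[Dict[str, int]]


def max_minor_fraction(pileup: Pileup) -> float:
    """Largest fraction of reads disagreeing with the majority base at any position."""
    worst = 0.0
    for counts in pileup:
        depth = sum(counts.values())
        if depth == 0:
            continue  # position without coverage carries no information
        major = max(counts.values())
        worst = max(worst, (depth - major) / depth)
    return worst


def is_clonal(pileup: Pileup, max_minor: float = 0.05) -> bool:
    """Assumed rule: clonal if every position stays at or below max_minor."""
    return max_minor_fraction(pileup) <= max_minor


# Hypothetical amplicons: the second has a 12% minor base at its first position
clonal_amplicon = [{"A": 995, "G": 5}, {"C": 1000}]
mixed_amplicon = [{"A": 880, "G": 120}, {"C": 1000}]
print(is_clonal(clonal_amplicon), is_clonal(mixed_amplicon))
```

    The mixed example sits below the 15% level at which, according to the abstract, Sanger electropherograms showed no double peaks, which is the kind of case where read-level counts add information.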

  9. Molecular characterization of canine parvovirus variants (CPV-2a, CPV-2b, and CPV-2c) based on the VP2 gene in affected domestic dogs in Ecuador

    PubMed Central

    la Torre, David De; Mafla, Eulalia; Puga, Byron; Erazo, Linda; Astolfi-Ferreira, Claudete; Ferreira, Antonio Piantino

    2018-01-01

    Aim: The objective of this study was to determine the presence of the variants of canine parvovirus (CPV)-2 in the city of Quito, Ecuador, due to the high domestic and street-type canine population, and to identify possible mutations at a genetic level that could be causing structural changes in the virus with a consequent influence on the immune response of the hosts. Materials and Methods: Thirty-five stool samples from different puppies with characteristic signs of the disease and positive for CPV by immunochromatography kits were collected from different veterinary clinics of the city. Polymerase chain reaction and DNA sequencing were used to determine the mutations in residue 426 of the VP2 gene, which determines the variants of CPV-2; in addition, four samples were chosen for complete sequencing of the VP2 gene to identify all possible mutations in the circulating strains in this region of the country. Results: The results revealed the presence of the three variants of CPV-2 with a prevalence of 57.1% (20/35) for CPV-2a, 8.5% (3/35) for CPV-2b, and 34.3% (12/35) for CPV-2c. In addition, complete sequencing of the VP2 gene showed amino acid substitutions in residues 87, 101, 139, 219, 297, 300, 305, 322, 324, 375, 386, 426, 440, and 514 of the three Ecuadorian variants when compared with the original CPV-2 sequence. Conclusion: This study describes the detection of CPV variants in the city of Quito, Ecuador. Variants of CPV-2 (2a, 2b, and 2c) have been reported in South America, and there are cases in Ecuador where CPV-2 is affecting even vaccinated puppies. PMID:29805214
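
    Typing by residue 426 of VP2, as described above, amounts to a lookup on one amino acid position. The sketch below assumes the commonly cited convention (Asn for CPV-2a, Asp for CPV-2b, Glu for CPV-2c), which the abstract itself does not spell out. The function name and the placeholder sequence are hypothetical, and the sequence is synthetic rather than a real VP2 protein.

```python
# Assumed mapping from the amino acid at VP2 residue 426 to the CPV-2 variant
VARIANT_BY_RESIDUE_426 = {"N": "CPV-2a", "D": "CPV-2b", "E": "CPV-2c"}


def classify_cpv2(vp2_protein: str) -> str:
    """Return the CPV-2 variant implied by residue 426 (1-based) of VP2."""
    if len(vp2_protein) < 426:
        raise ValueError("sequence does not cover residue 426")
    residue = vp2_protein[425].upper()  # 1-based residue 426 is index 425
    return VARIANT_BY_RESIDUE_426.get(residue, "unclassified")


# Synthetic placeholder of VP2 length (584 residues) with Glu at position 426
placeholder = "A" * 425 + "E" + "A" * 158
print(classify_cpv2(placeholder))
```

    Sequences with any other residue at position 426 are reported as unclassified rather than forced into one of the three variants.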

  10. Identification of novel genetic causes of Rett syndrome-like phenotypes.

    PubMed

    Lopes, Fátima; Barbosa, Mafalda; Ameur, Adam; Soares, Gabriela; de Sá, Joaquim; Dias, Ana Isabel; Oliveira, Guiomar; Cabral, Pedro; Temudo, Teresa; Calado, Eulália; Cruz, Isabel Fineza; Vieira, José Pedro; Oliveira, Renata; Esteves, Sofia; Sauer, Sascha; Jonasson, Inger; Syvänen, Ann-Christine; Gyllensten, Ulf; Pinto, Dalila; Maciel, Patrícia

    2016-03-01

    The aim of this work was to identify new genetic causes of Rett-like phenotypes using array comparative genomic hybridisation and a whole exome sequencing approach. We studied a cohort of 19 Portuguese patients (16 girls, 3 boys) with a clinical presentation significantly overlapping Rett syndrome (RTT). Genetic analysis included filtering of the single nucleotide variants and indels with preference for de novo, homozygous/compound heterozygous, or maternally inherited X-linked variants. Examination by MRI and muscle biopsies was also performed. Pathogenic genomic imbalances were found in two patients (10.5%): an 18q21.2 deletion encompassing four exons of the TCF4 gene and a mosaic UPD of chromosome 3. Variants in genes previously implicated in neurodevelopmental disorders (NDD) were identified in six patients (32%): de novo variants in EEF1A2, STXBP1 and ZNF238 were found in three patients, maternally inherited X-linked variants in SLC35A2, ZFX and SHROOM4 were detected in two male patients and one homozygous variant in EIF2B2 was detected in one patient. Variants were also detected in five novel NDD candidate genes (26%): we identified de novo variants in the RHOBTB2, SMARCA1 and GABBR2 genes; a homozygous variant in EIF4G1; and a compound heterozygous variant in HTT. Network analysis reveals that these genes interact by means of protein interactions with each other and with the known RTT genes. These findings expand the phenotypical spectrum of previously known NDD genes to encompass RTT-like clinical presentations and identify new candidate genes for RTT-like phenotypes. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://www.bmj.com/company/products-services/rights-and-licensing/

  11. Genotypic tropism testing by massively parallel sequencing: qualitative and quantitative analysis.

    PubMed

    Däumer, Martin; Kaiser, Rolf; Klein, Rolf; Lengauer, Thomas; Thiele, Bernhard; Thielen, Alexander

    2011-05-13

    Inferring viral tropism from genotype is a fast and inexpensive alternative to phenotypic testing. While being highly predictive when performed on clonal samples, sensitivity of predicting CXCR4-using (X4) variants drops substantially in clinical isolates. This is mainly attributed to minor variants not detected by standard bulk-sequencing. Massively parallel sequencing (MPS) detects single clones, thereby being much more sensitive. Using this technology we wanted to improve genotypic prediction of coreceptor usage. Plasma samples from 55 antiretroviral-treated patients tested for coreceptor usage with the Monogram Trofile Assay were sequenced with standard population-based approaches. Fourteen of these samples were selected for further analysis with MPS. Tropism was predicted from each sequence with geno2pheno[coreceptor]. Prediction based on bulk-sequencing yielded 59.1% sensitivity and 90.9% specificity compared to the Trofile assay. With MPS, 7600 reads were generated on average per isolate. Minorities of sequences with high confidence in CXCR4-usage were found in all samples, irrespective of phenotype. When using the default false-positive-rate of geno2pheno[coreceptor] (10%), and defining a minority cutoff of 5%, the results were concordant in all but one isolate. The combination of MPS and coreceptor usage prediction results in a fast and accurate alternative to phenotypic assays. The detection of X4-viruses in all isolates suggests that coreceptor usage as well as fitness of minorities is important for therapy outcome. The high sensitivity of this technology in combination with a quantitative description of the viral population may allow implementing meaningful cutoffs for predicting response to CCR5-antagonists in the presence of X4-minorities.
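
    The two thresholds mentioned above (a 10% false-positive rate per read and a 5% minority cutoff per sample) can be combined into a simple decision rule. The sketch below is one plausible reading, not the authors' implementation: a read counts as X4 when its false-positive-rate value is at or below the cutoff, and a sample is called X4 when the share of such reads reaches the minority cutoff. Whether each boundary is inclusive is an assumption, and the per-read values are invented.

```python
from typing import Sequence


def predict_sample_tropism(read_fprs: Sequence[float],
                           fpr_cutoff: float = 10.0,
                           minority_cutoff: float = 5.0) -> str:
    """Call a sample 'X4' or 'R5' from per-read false-positive-rate values (percent).

    A lower FPR value means higher confidence that the read is CXCR4-using.
    """
    if not read_fprs:
        raise ValueError("no reads supplied")
    x4_reads = sum(1 for fpr in read_fprs if fpr <= fpr_cutoff)
    x4_percent = 100.0 * x4_reads / len(read_fprs)
    return "X4" if x4_percent >= minority_cutoff else "R5"


# Hypothetical isolates of 1000 reads each
isolate_low = [2.0] * 30 + [60.0] * 970    # 3% of reads predicted X4
isolate_high = [2.0] * 80 + [60.0] * 920   # 8% of reads predicted X4
print(predict_sample_tropism(isolate_low), predict_sample_tropism(isolate_high))
```

    Because X4 minorities were found in all samples in the study, the sample-level call depends almost entirely on where the minority cutoff is placed.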

  12. PolyPhred analysis software for mutation detection from fluorescence-based sequence data.

    PubMed

    Montgomery, Kate T; Iartchouck, Oleg; Li, Li; Loomis, Stephanie; Obourn, Vanessa; Kucherlapati, Raju

    2008-10-01

    The ability to search for genetic variants that may be related to human disease is one of the most exciting consequences of the availability of the sequence of the human genome. Large cohorts of individuals exhibiting certain phenotypes can be studied and candidate genes resequenced. However, the challenge of analyzing sequence data from many individuals with accuracy, speed, and economy is great. This unit describes one set of software tools: Phred, Phrap, PolyPhred, and Consed. Coverage includes the advantages and disadvantages of these analysis tools, details for obtaining and using the software, and the results one may expect. The software is being continually updated to permit further automation of mutation analysis. Currently, however, at least some manual review is required if one wishes to identify 100% of the variants in a sample set.

  13. Next-generation sequencing reveals cryptic mtDNA diversity of Plasmodium relictum in the Hawaiian Islands

    USGS Publications Warehouse

    Jarvi, S.I.; Farias, M.E.; Lapointe, D.A.; Belcaid, M.; Atkinson, C.T.

    2013-01-01

    Next-generation 454 sequencing techniques were used to re-examine diversity of mitochondrial cytochrome b lineages of avian malaria (Plasmodium relictum) in Hawaii. We document a minimum of 23 variant lineages of the parasite based on single nucleotide transitional changes, in addition to the previously reported single lineage (GRW4). A new, publicly available portal (Integroomer) was developed for initial parsing of 454 datasets. Mean variant prevalence and frequency were higher in low elevation Hawaii Amakihi (Hemignathus virens) with Avipoxvirus-like lesions (P = 0·001), suggesting that the variants may be biologically distinct. By contrast, variant prevalence and frequency did not differ significantly among mid-elevation Apapane (Himatione sanguinea) with or without lesions (P = 0·691). The low frequency and the lack of detection of variants independent of GRW4 suggest that multiple independent introductions of P. relictum to Hawaii are unlikely. Multiple variants may have been introduced in heteroplasmy with GRW4 or exist within the tandem repeat structure of the mitochondrial genome. The discovery of multiple mitochondrial lineages of P. relictum in Hawaii provides a measure of genetic diversity within a geographically isolated population of this parasite and suggests the origins and evolution of parasite diversity may be more complicated than previously recognized.

  14. Next-generation sequencing reveals cryptic mtDNA diversity of Plasmodium relictum in the Hawaiian Islands.

    PubMed

    Jarvi, S I; Farias, M E; Lapointe, D A; Belcaid, M; Atkinson, C T

    2013-12-01

    Next-generation 454 sequencing techniques were used to re-examine diversity of mitochondrial cytochrome b lineages of avian malaria (Plasmodium relictum) in Hawaii. We document a minimum of 23 variant lineages of the parasite based on single nucleotide transitional changes, in addition to the previously reported single lineage (GRW4). A new, publicly available portal (Integroomer) was developed for initial parsing of 454 datasets. Mean variant prevalence and frequency was higher in low elevation Hawaii Amakihi (Hemignathus virens) with Avipoxvirus-like lesions (P = 0·001), suggesting that the variants may be biologically distinct. By contrast, variant prevalence and frequency did not differ significantly among mid-elevation Apapane (Himatione sanguinea) with or without lesions (P = 0·691). The low frequency and the lack of detection of variants independent of GRW4 suggest that multiple independent introductions of P. relictum to Hawaii are unlikely. Multiple variants may have been introduced in heteroplasmy with GRW4 or exist within the tandem repeat structure of the mitochondrial genome. The discovery of multiple mitochondrial lineages of P. relictum in Hawaii provides a measure of genetic diversity within a geographically isolated population of this parasite and suggests the origins and evolution of parasite diversity may be more complicated than previously recognized.

  15. Detection of Clinically Relevant Genetic Variants in Autism Spectrum Disorder by Whole-Genome Sequencing

    PubMed Central

    Jiang, Yong-hui; Yuen, Ryan K.C.; Jin, Xin; Wang, Mingbang; Chen, Nong; Wu, Xueli; Ju, Jia; Mei, Junpu; Shi, Yujian; He, Mingze; Wang, Guangbiao; Liang, Jieqin; Wang, Zhe; Cao, Dandan; Carter, Melissa T.; Chrysler, Christina; Drmic, Irene E.; Howe, Jennifer L.; Lau, Lynette; Marshall, Christian R.; Merico, Daniele; Nalpathamkalam, Thomas; Thiruvahindrapuram, Bhooma; Thompson, Ann; Uddin, Mohammed; Walker, Susan; Luo, Jun; Anagnostou, Evdokia; Zwaigenbaum, Lonnie; Ring, Robert H.; Wang, Jian; Lajonchere, Clara; Wang, Jun; Shih, Andy; Szatmari, Peter; Yang, Huanming; Dawson, Geraldine; Li, Yingrui; Scherer, Stephen W.

    2013-01-01

    Autism Spectrum Disorder (ASD) demonstrates high heritability and familial clustering, yet the genetic causes remain only partially understood as a result of extensive clinical and genomic heterogeneity. Whole-genome sequencing (WGS) shows promise as a tool for identifying ASD risk genes as well as unreported mutations in known loci, but an assessment of its full utility in an ASD group has not been performed. We used WGS to examine 32 families with ASD to detect de novo or rare inherited genetic variants predicted to be deleterious (loss-of-function and damaging missense mutations). Among ASD probands, we identified deleterious de novo mutations in six of 32 (19%) families and X-linked or autosomal inherited alterations in ten of 32 (31%) families (some had combinations of mutations). The proportion of families identified with such putative mutations was larger than has been previously reported; this yield was in part due to the comprehensive and uniform coverage afforded by WGS. Deleterious variants were found in four unrecognized, nine known, and eight candidate ASD risk genes. Examples include CAPRIN1 and AFF2 (both linked to FMR1, which is involved in fragile X syndrome), VIP (involved in social-cognitive deficits), and other genes such as SCN2A and KCNQ2 (linked to epilepsy), NRXN1, and CHD7, which causes ASD-associated CHARGE syndrome. Taken together, these results suggest that WGS and thorough bioinformatic analyses for de novo and rare inherited mutations will improve the detection of genetic variants likely to be associated with ASD or its accompanying clinical symptoms. PMID:23849776

  16. ColoSeq provides comprehensive Lynch and polyposis syndrome mutational analysis using massively parallel sequencing.

    PubMed

    Pritchard, Colin C; Smith, Christina; Salipante, Stephen J; Lee, Ming K; Thornton, Anne M; Nord, Alex S; Gulden, Cassandra; Kupfer, Sonia S; Swisher, Elizabeth M; Bennett, Robin L; Novetsky, Akiva P; Jarvik, Gail P; Olopade, Olufunmilayo I; Goodfellow, Paul J; King, Mary-Claire; Tait, Jonathan F; Walsh, Tom

    2012-07-01

    Lynch syndrome (hereditary nonpolyposis colon cancer) and adenomatous polyposis syndromes frequently have overlapping clinical features. Current approaches for molecular genetic testing are often stepwise, taking a best-candidate gene approach with testing of additional genes if initial results are negative. We report a comprehensive assay called ColoSeq that detects all classes of mutations in Lynch and polyposis syndrome genes using targeted capture and massively parallel next-generation sequencing on the Illumina HiSeq2000 instrument. In blinded specimens and colon cancer cell lines with defined mutations, ColoSeq correctly identified 28/28 (100%) pathogenic mutations in MLH1, MSH2, MSH6, PMS2, EPCAM, APC, and MUTYH, including single nucleotide variants (SNVs), small insertions and deletions, and large copy number variants. There was 100% reproducibility of mutation detection between independent runs. The assay correctly identified 222 of 224 heterozygous SNVs (99.4%) in HapMap samples, demonstrating high sensitivity of calling all variants across each captured gene. Average coverage was greater than 320 reads per base pair when the maximum of 96 index samples with barcodes were pooled. In a specificity study of 19 control patients without cancer from different ethnic backgrounds, we did not find any pathogenic mutations but detected two variants of uncertain significance. ColoSeq offers a powerful, cost-effective means of genetic testing for Lynch and polyposis syndromes that eliminates the need for stepwise testing and multiple follow-up clinical visits. Copyright © 2012 American Society for Investigative Pathology and the Association for Molecular Pathology. Published by Elsevier Inc. All rights reserved.

  17. A Comparison of Variant Calling Pipelines Using Genome in a Bottle as a Reference

    PubMed Central

    2015-01-01

    High-throughput sequencing, especially of exomes, is a popular diagnostic tool, but it is difficult to determine which tools are the best at analyzing this data. In this study, we use the NIST Genome in a Bottle results as a novel resource for validation of our exome analysis pipeline. We use six different aligners and five different variant callers to determine which pipeline, of the 30 total, performs the best on a human exome that was used to help generate the list of variants detected by the Genome in a Bottle Consortium. Of these 30 pipelines, we found that Novoalign in conjunction with GATK UnifiedGenotyper exhibited the highest sensitivity while maintaining a low number of false positives for SNVs. However, it is apparent that indels are still difficult for any pipeline to handle with none of the tools achieving an average sensitivity higher than 33% or a Positive Predictive Value (PPV) higher than 53%. Lastly, as expected, it was found that aligners can play as vital a role in variant detection as variant callers themselves. PMID:26539496
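    The two metrics quoted in this abstract, sensitivity and positive predictive value (PPV), can be computed by comparing a pipeline's call set against a truth set such as the Genome in a Bottle calls. The following is a minimal sketch, assuming variants are represented as (chromosome, position, ref, alt) tuples and matched exactly; the function name and the toy data are hypothetical and are not taken from the study.

```python
def sensitivity_and_ppv(called, truth):
    """Return (sensitivity, PPV) for a call set measured against a truth set.

    Variants are hashable keys such as (chrom, pos, ref, alt).
    Exact-match comparison is an assumption of this sketch.
    """
    called, truth = set(called), set(truth)
    tp = len(called & truth)   # true positives: called and in the truth set
    fn = len(truth - called)   # false negatives: in the truth set but missed
    fp = len(called - truth)   # false positives: called but not in the truth set
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    return sensitivity, ppv


# Toy data (hypothetical): three true variants, three calls, two shared.
truth = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 50, "G", "GA")}
called = {("chr1", 100, "A", "G"), ("chr2", 50, "G", "GA"), ("chr3", 10, "T", "C")}
sens, ppv = sensitivity_and_ppv(called, truth)
```

    Running the same function once per aligner and caller combination would give one (sensitivity, PPV) pair per pipeline, which is the kind of table such a comparison ranks.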

  18. Clinical testing of BRCA1 and BRCA2: a worldwide snapshot of technological practices.

    PubMed

    Toland, Amanda Ewart; Forman, Andrea; Couch, Fergus J; Culver, Julie O; Eccles, Diana M; Foulkes, William D; Hogervorst, Frans B L; Houdayer, Claude; Levy-Lahad, Ephrat; Monteiro, Alvaro N; Neuhausen, Susan L; Plon, Sharon E; Sharan, Shyam K; Spurdle, Amanda B; Szabo, Csilla; Brody, Lawrence C

    2018-01-01

    Clinical testing of BRCA1 and BRCA2 began over 20 years ago. With the expiration and overturning of the BRCA patents, limitations on which laboratories could offer commercial testing were lifted. These legal changes occurred at approximately the same time as the widespread adoption of massively parallel sequencing (MPS) technologies. Little is known about how these changes impacted laboratory practices for detecting genetic alterations in hereditary breast and ovarian cancer genes. Therefore, we sought to examine current laboratory genetic testing practices for BRCA1/BRCA2. We employed an online survey of 65 questions covering four areas: laboratory characteristics, details on technological methods, variant classification, and client-support information. Eight United States (US) laboratories and 78 non-US laboratories completed the survey. Most laboratories (93%; 80/86) used MPS platforms to identify variants. Laboratories differed widely on: (1) technologies used for large rearrangement detection; (2) criteria for minimum read depths; (3) non-coding regions sequenced; (4) variant classification criteria and approaches; (5) testing volume ranging from 2 to 2.5 × 10^5 tests annually; and (6) deposition of variants into public databases. These data may be useful for national and international agencies to set recommendations for quality standards for BRCA1/BRCA2 clinical testing. These standards could also be applied to testing of other disease genes.

  19. Trends in Correlation-Based Pattern Recognition and Tracking in Forward-Looking Infrared Imagery

    PubMed Central

    Alam, Mohammad S.; Bhuiyan, Sharif M. A.

    2014-01-01

    In this paper, we review the recent trends and advancements on correlation-based pattern recognition and tracking in forward-looking infrared (FLIR) imagery. In particular, we discuss matched filter-based correlation techniques for target detection and tracking which are widely used for various real time applications. We analyze and present test results involving recently reported matched filters such as the maximum average correlation height (MACH) filter and its variants, and distance classifier correlation filter (DCCF) and its variants. Test results are presented for both single/multiple target detection and tracking using various real-life FLIR image sequences. PMID:25061840

  20. Functional phosphodiesterase 11A mutations may modify the risk of familial and bilateral testicular germ cell tumors

    PubMed Central

    Horvath, Anelia; Korde, Larissa; Greene, Mark H.; Libe, Rosella; Osorio, Paulo; Faucz, Fabio Rueda; Raffin-Sanson, Marie Laure; Tsang, Kit Man; Drori-Herishanu, Limor; Patronas, Yianna; Remmers, Elaine F; Nikita, Maria-Elena; Moran, Jason; Greene, Joseph; Nesterova, Maria; Merino, Maria; Bertherat, Jerome; Stratakis, Constantine A.

    2009-01-01

    Inactivating germline mutations in phosphodiesterase 11A (PDE11A) have been implicated in adrenal tumor susceptibility. PDE11A is highly-expressed in endocrine steroidogenic tissues, especially the testis, and mice with inactivated Pde11a exhibit male infertility, a known testicular germ cell tumor (TGCT) risk factor. We sequenced the PDE11A gene-coding region in 95 patients with TGCT from 64 unrelated kindreds. We identified 8 non-synonymous substitutions in 20 patients from 15 families: four (R52T; F258Y; G291R; V820M) were newly-recognized, three (R804H; R867G; M878V) were functional variants previously implicated in adrenal tumor predisposition, and one (Y727C) was a known polymorphism. We compared the frequency of these variants in our patients to unrelated controls that had been screened and found negative for any endocrine diseases: only the two previously-reported variants, R804H and R867G, known to be frequent in general population, were detected in these controls. The frequency of all PDE11A-gene variants (combined) was significantly higher among patients with TGCT (P=0.0002), present in 19% of the families of our cohort. Most variants were detected in the general population, but functional studies showed that all these mutations reduced PDE activity, and that PDE11A protein expression was decreased (or absent) in TGCT samples from carriers. This is the first demonstration of a PDE gene’s involvement in TGCT, although the cAMP signaling pathway has been investigated extensively in other reproductive organs and their diseases. In conclusion, we report that PDE11A-inactivating sequence variants may modify the risk of familial and bilateral TGCT. PMID:19549888

  1. Detection of novel divergent arenaviruses in boid snakes with inclusion body disease in The Netherlands.

    PubMed

    Bodewes, R; Kik, M J L; Raj, V Stalin; Schapendonk, C M E; Haagmans, B L; Smits, S L; Osterhaus, A D M E

    2013-06-01

    Arenaviruses are bi-segmented negative-stranded RNA viruses, which were until recently only detected in rodents and humans. Now highly divergent arenaviruses have been identified in boid snakes with inclusion body disease (IBD). Here, we describe the identification of a new species and variants of the highly divergent arenaviruses, which were detected in tissues of captive boid snakes with IBD in The Netherlands by next-generation sequencing. Phylogenetic analysis of the complete sequence of the open reading frames of the four predicted proteins of one of the detected viruses revealed that this virus was most closely related to the recently identified Golden Gate virus, while considerable sequence differences were observed between the highly divergent arenaviruses detected in this study. These findings add to the recent identification of the highly divergent arenaviruses in boid snakes with IBD in the United States and indicate that these viruses also circulate among boid snakes in Europe.

  2. ToTem: a tool for variant calling pipeline optimization.

    PubMed

    Tom, Nikola; Tom, Ondrej; Malcikova, Jitka; Pavlova, Sarka; Kubesova, Blanka; Rausch, Tobias; Kolarik, Miroslav; Benes, Vladimir; Bystry, Vojtech; Pospisilova, Sarka

    2018-06-26

    High-throughput bioinformatics analyses of next generation sequencing (NGS) data often require challenging pipeline optimization. The key problem is choosing appropriate tools and selecting the best parameters for optimal precision and recall. Here we introduce ToTem, a tool for automated pipeline optimization. ToTem is a stand-alone web application with a comprehensive graphical user interface (GUI). ToTem is written in Java and PHP with an underlying connection to a MySQL database. Its primary role is to automatically generate, execute and benchmark different variant calling pipeline settings. Our tool allows an analysis to be started from any level of the process and with the possibility of plugging almost any tool or code. To prevent an over-fitting of pipeline parameters, ToTem ensures the reproducibility of these by using cross validation techniques that penalize the final precision, recall and F-measure. The results are interpreted as interactive graphs and tables allowing an optimal pipeline to be selected, based on the user's priorities. Using ToTem, we were able to optimize somatic variant calling from ultra-deep targeted gene sequencing (TGS) data and germline variant detection in whole genome sequencing (WGS) data. ToTem is a tool for automated pipeline optimization which is freely available as a web application at https://totem.software.
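    The abstract does not spell out how ToTem's cross validation works, so the following is only an illustrative sketch of the general idea: choose a pipeline setting on some samples by F-measure, then score that choice on held-out samples so that an over-fitted setting is exposed. The setting names, the per-sample (TP, FP, FN) counts and both function names are hypothetical, and this is not ToTem's implementation.

```python
def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall from raw counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def cross_validated_choice(counts, k):
    """Pick the best setting per fold and score it on the held-out samples.

    counts maps a setting name to a list of (tp, fp, fn) tuples, one per
    sample; all lists must have the same length n, with 2 <= k <= n.
    Returns a list of (chosen_setting, held_out_mean_f), one entry per fold.
    """
    n = len(next(iter(counts.values())))

    def mean_f(setting, idx):
        return sum(f_measure(*counts[setting][i]) for i in idx) / len(idx)

    results = []
    for fold in range(k):
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        # Select on the training samples only, then report held-out performance.
        best = max(counts, key=lambda s: mean_f(s, train_idx))
        results.append((best, mean_f(best, test_idx)))
    return results


# Hypothetical benchmark counts for two settings across four samples.
counts = {
    "min_depth=10": [(90, 10, 10), (80, 20, 20), (85, 15, 15), (88, 12, 12)],
    "min_depth=30": [(70, 2, 30), (60, 5, 40), (65, 3, 35), (68, 4, 32)],
}
folds = cross_validated_choice(counts, k=2)
```

    A large gap between the training-fold and held-out F-measure for the chosen setting would be a sign that the parameters fit the benchmark samples rather than the assay.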

  3. Human papillomavirus type-16 variants in Quechua aboriginals from Argentina.

    PubMed

    Picconi, María Alejandra; Alonio, Lidia Virginia; Sichero, Laura; Mbayed, Viviana; Villa, Luisa Lina; Gronda, Jorge; Campos, Rodolfo; Teyssié, Angélica

    2003-04-01

    Cervical carcinoma is the leading cause of cancer death in Quechua indians from Jujuy (northwestern Argentina). To determine the prevalence of HPV-16 variants, 106 HPV-16 positive cervical samples were studied, including 33 low-grade squamous intraepithelial lesions (LSIL), 28 high-grade squamous intraepithelial lesions (HSIL), 9 invasive cervical cancer (ICC), and 36 samples from women with normal colposcopy and cytology. HPV genome variability was examined in the L1 and E6 genes by PCR-hybridization. In a subset of 20 samples, a LCR fragment was also analyzed by PCR-sequencing. Most variants belonged to the European branch with subtle differences that depended on the viral gene fragment studied. Only about 10% of the specimens had non-European variants, including eight Asian-American, two Asian, and one North-American-1. E6 gene analysis revealed that 43% of the samples were identical to HPV-16 prototype, while 57% corresponded to variants. Interestingly, the majority (87%) of normal smears had HPV-16 prototype, whereas variants were detected mainly in SIL and ICC. LCR sequencing yielded 80% of variants, including 69% of European, 19% Asian-American, and 12% Asian. We identified a new variant, the Argentine Quechua-51 (AQ-51), similar to B-14 plus two additional changes: G7842-->A and A7837-->C; phylogenetic inference allocated it in the Asian-American branch. The high proportion of European variants may reflect Spanish colonial influence on these native Inca descendants. The predominance of HPV-16 variants in pathologic samples when compared to normal controls could have implications for the natural history of cervical lesions. Copyright 2003 Wiley-Liss, Inc.

  4. Specific Detection of Naturally Occurring Hepatitis C Virus Mutants with Resistance to Telaprevir and Boceprevir (Protease Inhibitors) among Treatment-Naïve Infected Individuals

    PubMed Central

    Fonseca-Coronado, Salvador; Escobar-Gutiérrez, Alejandro; Ruiz-Tovar, Karina; Cruz-Rivera, Mayra Yolanda; Rivera-Osorio, Pilar; Vazquez-Pichardo, Mauricio; Carpio-Pedroza, Juan Carlos; Ruíz-Pacheco, Juan Alberto; Cazares, Fernando

    2012-01-01

    The use of telaprevir and boceprevir, both protease inhibitors (PI), as part of the specifically targeted antiviral therapy for hepatitis C (STAT-C) has significantly improved sustained virologic response (SVR) rates. However, different clinical studies have also identified several mutations associated with viral resistance to both PIs. In the absence of selective pressure, drug-resistant hepatitis C virus (HCV) mutants are generally present at low frequency, making mutation detection challenging. Here, we describe a mismatch amplification mutation assay (MAMA) PCR method for the specific detection of naturally occurring drug-resistant HCV mutants. MAMA PCR successfully identified the corresponding HCV variants, while conventional methods such as direct sequencing, endpoint limiting dilution (EPLD), and bacterial cloning were not sensitive enough to detect circulating drug-resistant mutants in clinical specimens. Ultradeep pyrosequencing was used to confirm the presence of the corresponding HCV mutants. In treatment-naïve patients, the frequency of all resistant variants was below 1%. Deep amplicon sequencing allowed a detailed analysis of the structure of the viral population among these patients, showing that the evolution of the NS3 is limited to a rather small sequence space. Monitoring of HCV drug resistance before and during treatment is likely to provide important information for management of patients undergoing anti-HCV therapy. PMID:22116161

  5. Whole exome or genome sequencing: nurses need to prepare families for the possibilities.

    PubMed

    Prows, Cynthia A; Tran, Grace; Blosser, Beverly

    2014-12-01

    Aim: A discussion of whole exome sequencing and the type of possible results patients and families should be aware of before samples are obtained. Background: To find the genetic cause of a rare disorder, whole exome sequencing analyses all known and suspected human genes from a single sample. Over 20,000 detected DNA variants in each individual exome must be considered as possibly causing disease or disregarded as not relevant to the person's disease. In the process, unexpected gene variants associated with known diseases unrelated to the primary purpose of the test may be incidentally discovered. Because family members' DNA samples are often needed, gene variants associated with known genetic diseases or predispositions for diseases can also be discovered in their samples. Design: Discussion paper. Data sources: PubMed 2009-2013, list of references in retrieved articles, Google Scholar. Implications for nursing: Nurses need a general understanding of the scope of potential genomic information that may be revealed with whole exome sequencing to provide support and guidance to individuals and families during their decision-making process, while waiting for results and after disclosure. Nurse scientists who want to use whole exome sequencing in their study design and methods must decide early in study development if they will return primary whole exome sequencing research results and if they will give research participants choices about learning incidental research results. Conclusion: It is critical that nurses translate their knowledge about whole exome sequencing into their patient education and patient advocacy roles and relevant programmes of research. © 2014 John Wiley & Sons Ltd.

  6. Variants of human papillomavirus type 16 predispose toward persistent infection

    PubMed Central

    Zhang, Lei; Liao, Hong; Yang, Binlie; Geffre, Christopher P; Zhang, Ai; Zhou, Aizhi; Cao, Huimin; Wang, Jieru; Zhang, Zhenbo; Zheng, Wenxin

    2015-01-01

    A cohort study of 292 Chinese women was conducted to determine the relationship between human papillomavirus (HPV) type 16 variants and persistent viral infection. Enrolled patients were HPV16 positive and had both normal cytology and histology. Flow-through hybridization and gene chip technology were used to identify the HPV type. A PCR sequencing assay was performed to find HPV16 E2, E6 and E7 gene variants. The associations between these variants and persistent HPV16 infection were analyzed by Fisher’s exact test. It was found that the variants T178G, T350G and A442C in the E6 gene, as well as the C3158A and G3248A variants in the E2 gene, were associated with persistent HPV16 infection. No link was observed between E7 variants and persistent viral infection. Our findings suggest that detection of specific HPV variants would help identify patients who are at high risk for viral persistence and development of cervical neoplasia. PMID:26339417
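    Fisher's exact test, the method named in this abstract, tests for association in a 2x2 table such as variant carrier status against persistent versus cleared infection. The sketch below is a standard-library implementation of the two-sided test that sums the hypergeometric probabilities of all tables no more likely than the observed one. The abstract reports no per-variant counts, so the table used here is purely hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows could be, for example, variant carriers / non-carriers and columns
    persistent / cleared infection (an assumed layout for illustration).
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell
        # given fixed row and column totals.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table at most as probable as the observed one
    # (small tolerance guards against floating-point ties).
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


# Hypothetical counts: 8 of 10 carriers persistent, 1 of 6 non-carriers persistent.
p_value = fisher_exact_two_sided(8, 2, 1, 5)
```

    An exact test of this kind suits small cell counts, where a chi-square approximation would be unreliable.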

  7. SeqHBase: a big data toolset for family based sequencing data analysis.

    PubMed

    He, Min; Person, Thomas N; Hebbring, Scott J; Heinzen, Ethan; Ye, Zhan; Schrodi, Steven J; McPherson, Elizabeth W; Lin, Simon M; Peissig, Peggy L; Brilliant, Murray H; O'Rawe, Jason; Robison, Reid J; Lyon, Gholson J; Wang, Kai

    2015-04-01

    Whole-genome sequencing (WGS) and whole-exome sequencing (WES) technologies are increasingly used to identify disease-contributing mutations in human genomic studies. It can be a significant challenge to process such data, especially when a large family or cohort is sequenced. Our objective was to develop a big data toolset to efficiently manipulate genome-wide variants, functional annotations and coverage, together with conducting family based sequencing data analysis. Hadoop is a framework for reliable, scalable, distributed processing of large data sets using MapReduce programming models. Based on Hadoop and HBase, we developed SeqHBase, a big data-based toolset for analysing family based sequencing data to detect de novo, inherited homozygous, or compound heterozygous mutations that may contribute to disease manifestations. SeqHBase takes as input BAM files (for coverage at every site), variant call format (VCF) files (for variant calls) and functional annotations (for variant prioritisation). We applied SeqHBase to a 5-member nuclear family and a 10-member 3-generation family with WGS data, as well as a 4-member nuclear family with WES data. Analysis times were almost linearly scalable with number of data nodes. With 20 data nodes, SeqHBase took about 5 secs to analyse WES familial data and approximately 1 min to analyse WGS familial data. These results demonstrate SeqHBase's high efficiency and scalability, which is necessary as WGS and WES are rapidly becoming standard methods to study the genetics of familial disorders. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.
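    The three inheritance patterns named in this abstract (de novo, inherited homozygous, compound heterozygous) can be illustrated with a simple trio filter. This is a minimal in-memory sketch, assuming genotypes are already encoded as alternate-allele counts (0, 1 or 2) for child, father and mother; it does not use Hadoop or HBase, the function and gene names are hypothetical, and it is not SeqHBase's code.

```python
def classify_site(child, father, mother):
    """Classify one variant site in a trio by alternate-allele counts (0/1/2)."""
    if child > 0 and father == 0 and mother == 0:
        return "de_novo"               # present in the child, absent in both parents
    if child == 2 and father >= 1 and mother >= 1:
        return "inherited_homozygous"  # one alternate allele from each parent
    return None


def compound_het_genes(variants):
    """Return genes where the child carries one heterozygous variant inherited
    only from the father and another inherited only from the mother.

    variants is an iterable of dicts with keys: gene, child, father, mother.
    """
    paternal, maternal = set(), set()
    for v in variants:
        if v["child"] != 1:
            continue
        if v["father"] >= 1 and v["mother"] == 0:
            paternal.add(v["gene"])
        elif v["mother"] >= 1 and v["father"] == 0:
            maternal.add(v["gene"])
    return sorted(paternal & maternal)


# Hypothetical trio calls.
trio_calls = [
    {"gene": "GENE_A", "child": 1, "father": 1, "mother": 0},
    {"gene": "GENE_A", "child": 1, "father": 0, "mother": 1},
    {"gene": "GENE_B", "child": 1, "father": 0, "mother": 0},
    {"gene": "GENE_C", "child": 2, "father": 1, "mother": 1},
]
labels = [classify_site(v["child"], v["father"], v["mother"]) for v in trio_calls]
comp_het = compound_het_genes(trio_calls)
```

    A real analysis would also consult per-site coverage before calling a variant de novo, which is presumably why the tool takes BAM files as input alongside the VCF files.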

  8. Multiplex PCR for detection of plasmid-mediated colistin resistance determinants, mcr-1, mcr-2, mcr-3, mcr-4 and mcr-5 for surveillance purposes

    PubMed Central

    Rebelo, Ana Rita; Bortolaia, Valeria; Kjeldgaard, Jette S; Pedersen, Susanne K; Leekitcharoenphon, Pimlapas; Hansen, Inge M; Guerra, Beatriz; Malorny, Burkhard; Borowiak, Maria; Hammerl, Jens Andre; Battisti, Antonio; Franco, Alessia; Alba, Patricia; Perrin-Guyomard, Agnes; Granier, Sophie A; De Frutos Escobar, Cristina; Malhotra-Kumar, Surbhi; Villa, Laura; Carattoli, Alessandra; Hendriksen, Rene S

    2018-01-01

    Background and aim: Plasmid-mediated colistin resistance mechanisms have been identified worldwide in the past years. A multiplex polymerase chain reaction (PCR) protocol for detection of all currently known transferable colistin resistance genes (mcr-1 to mcr-5, and variants) in Enterobacteriaceae was developed for surveillance or research purposes. Methods: We designed four new primer pairs to amplify mcr-1, mcr-2, mcr-3 and mcr-4 gene products and used the originally described primers for mcr-5 to obtain a stepwise separation of ca 200 bp between amplicons. The primer pairs and amplification conditions allow for single or multiple detection of all currently described mcr genes and their variants present in Enterobacteriaceae. The protocol was validated testing 49 European Escherichia coli and Salmonella isolates of animal origin. Results: Multiplex PCR results in bovine and porcine isolates from Spain, Germany, France and Italy showed full concordance with whole genome sequence data. The method was able to detect mcr-1, mcr-3 and mcr-4 as singletons or in different combinations as they were present in the test isolates. One new mcr-4 variant, mcr-4.3, was also identified. Conclusions: This method allows rapid identification of mcr-positive bacteria and overcomes the challenges of phenotypic detection of colistin resistance. The multiplex PCR should be particularly interesting in settings or laboratories with limited resources for performing genetic analysis as it provides information on the mechanism of colistin resistance without requiring genome sequencing. PMID:29439754

  9. Myopathy With SQSTM1 and TIA1 Variants: Clinical and Pathological Features.

    PubMed

    Niu, Zhiyv; Pontifex, Carly Sabine; Berini, Sarah; Hamilton, Leslie E; Naddaf, Elie; Wieben, Eric; Aleff, Ross A; Martens, Kristina; Gruber, Angela; Engel, Andrew G; Pfeffer, Gerald; Milone, Margherita

    2018-01-01

    The aim of this study is to identify the molecular defect of three unrelated individuals with late-onset predominant distal myopathy; to describe the spectrum of phenotype resulting from the contributing role of two variants in genes located on two different chromosomes; and to highlight the underappreciated complex forms of genetic myopathies. Clinical and laboratory data of three unrelated probands with predominantly distal weakness manifesting in the sixth-seventh decade of life, and available affected and unaffected family members were reviewed. Next-generation sequencing panel, whole exome sequencing, and targeted analyses of family members were performed to elucidate the genetic etiology of the myopathy. Genetic analyses detected two contributing variants located on different chromosomes in three unrelated probands: a heterozygous pathogenic mutation in SQSTM1 (c.1175C>T, p.Pro392Leu) and a heterozygous variant in TIA1 (c.1070A>G, p.Asn357Ser). The affected fraternal twin of one proband also carries both variants, while the unaffected family members harbor one or none. Two unrelated probands (family 1, II.3, and family 3, II.1) have a distal myopathy with rimmed vacuoles that manifested with index extensor weakness; the other proband (family 2, I.1) has myofibrillar myopathy manifesting with hypercapnic respiratory insufficiency and distal weakness. The findings indicate that all the affected individuals have a myopathy associated with both variants in SQSTM1 and TIA1, respectively, suggesting that the two variants determine the phenotype and likely functionally interact. We speculate that the TIA1 variant is a modifier of the SQSTM1 mutation. We identify the combination of SQSTM1 and TIA1 variants as a novel genetic defect associated with myofibrillar myopathy and suggest to consider sequencing both genes in the molecular investigation of myopathy with rimmed vacuoles and myofibrillar myopathy although additional studies are needed to investigate the digenic nature of the disease.

  10. Next generation sequencing to dissect the genetic architecture of KNG1 and F11 loci using factor XI levels as an intermediate phenotype of thrombosis.

    PubMed

    Martin-Fernandez, Laura; Gavidia-Bovadilla, Giovana; Corrales, Irene; Brunel, Helena; Ramírez, Lorena; López, Sonia; Souto, Juan Carlos; Vidal, Francisco; Soria, José Manuel

    2017-01-01

    Venous thromboembolism is a complex disease with a high heritability. There are significant associations among Factor XI (FXI) levels and SNPs in the KNG1 and F11 loci. Our aim was to identify the genetic variation of KNG1 and F11 that might account for the variability of FXI levels. The KNG1 and F11 loci were sequenced completely in 110 unrelated individuals from the GAIT-2 (Genetic Analysis of Idiopathic Thrombophilia 2) Project using Next Generation Sequencing on an Illumina MiSeq. The GAIT-2 Project is a study of 935 individuals in 35 extended Spanish families selected through a proband with idiopathic thrombophilia. Among the 110 individuals, a subset of 40 individuals was chosen as a discovery sample for identifying variants. A total of 762 genetic variants were detected. Several significant associations were established among common variants and low-frequency variants sets in KNG1 and F11 with FXI levels using the PLINK and SKAT packages. Among these associations, those of rs710446 and five low-frequency variant sets in KNG1 with FXI level variation were significant after multiple testing correction and permutation. Also, two putative pathogenic mutations related to high and low FXI levels were identified by data filtering and in silico predictions. This study of KNG1 and F11 loci should help to understand the connection between genotypic variation and variation in FXI levels. The functional genetic variants should be useful as markers of thromboembolic risk.

  11. Somatic APC mosaicism and oligogenic inheritance in genetically unsolved colorectal adenomatous polyposis patients.

    PubMed

    Ciavarella, Michele; Miccoli, Sara; Prossomariti, Anna; Pippucci, Tommaso; Bonora, Elena; Buscherini, Francesco; Palombo, Flavia; Zuntini, Roberta; Balbi, Tiziana; Ceccarelli, Claudio; Bazzoli, Franco; Ricciardiello, Luigi; Turchetti, Daniela; Piazzi, Giulia

    2018-03-01

    Germline variants in the APC gene cause familial adenomatous polyposis. Inherited variants in MutYH, POLE, POLD1, NTHL1, and MSH3 genes and somatic APC mosaicism have been reported as alternative causes of polyposis. However, ~30-50% of cases of polyposis remain genetically unsolved. Thus, the aim of this study was to investigate the genetic causes of unexplained adenomatous polyposis. Eight sporadic cases with >20 adenomatous polyps by 35 years of age or >50 adenomatous polyps by 55 years of age, and no causative germline variants in APC and/or MutYH, were enrolled from a cohort of 56 subjects with adenomatous colorectal polyposis. APC gene mosaicism was investigated on DNA from colonic adenomas by Sanger sequencing or Whole Exome Sequencing (WES). Mosaicism extension to other tissues (peripheral blood, saliva, hair follicles) was evaluated using Sanger sequencing and/or digital PCR. APC second hit was investigated in adenomas from mosaic patients. WES was performed on DNA from peripheral blood to identify additional polyposis candidate variants. We identified APC mosaicism in 50% of patients. In three cases mosaicism was restricted to the colon, while in one it also extended to the duodenum and saliva. One patient without APC mosaicism, carrying an APC in-frame deletion of uncertain significance, was found to harbor rare germline variants in OGG1, POLQ, and EXO1 genes. In conclusion, our restrictive selection criteria improved the detection of mosaic APC patients. In addition, we showed for the first time that an oligogenic inheritance of rare variants might have a cooperative role in sporadic colorectal polyposis onset.

  12. High resolution identity testing of inactivated poliovirus vaccines

    PubMed Central

    Mee, Edward T.; Minor, Philip D.; Martin, Javier

    2015-01-01

    Background: Definitive identification of poliovirus strains in vaccines is essential for quality control, particularly where multiple wild-type and Sabin strains are produced in the same facility. Sequence-based identification provides the ultimate in identity testing and would offer several advantages over serological methods. Methods: We employed random RT-PCR and high throughput sequencing to recover full-length genome sequences from monovalent and trivalent poliovirus vaccine products at various stages of the manufacturing process. Results: All expected strains were detected in previously characterised products and the method permitted identification of strains comprising as little as 0.1% of sequence reads. Highly similar Mahoney and Sabin 1 strains were readily discriminated on the basis of specific variant positions. Analysis of a product known to contain incorrect strains demonstrated that the method correctly identified the contaminants. Conclusion: Random RT-PCR and shotgun sequencing provided high resolution identification of vaccine components. In addition to the recovery of full-length genome sequences, the method could also be easily adapted to the characterisation of minor variant frequencies and distinction of closely related products on the basis of distinguishing consensus and low frequency polymorphisms. PMID:26049003

  13. Evaluation of full S1 gene sequencing of classical and variant infectious bronchitis viruses extracted from allantoic fluid and FTA cards.

    PubMed

    Manswr, Basim; Ball, Christopher; Forrester, Anne; Chantrey, Julian; Ganapathy, Kannan

    2018-08-01

    Sequence variability in the S1 gene determines the genotype of infectious bronchitis virus (IBV) strains. A single RT-PCR assay was developed to amplify and sequence the full S1 gene for six classical and variant IBVs (M41, D274, 793B, IS/885/00, IS/1494/06 and Q1) enriched in allantoic fluid (AF) or the same AF inoculated onto Flinders Technology Association (FTA) cards. Representative strains from each genotype were grown in specific-pathogen-free eggs and RNA was extracted from AF. Full S1 gene amplification was achieved using primer A and primer 22.51. Products were sequenced using primers A, 1050+, 1380+ and SX3+ to obtain short sequences covering the full gene. Following serial dilutions of AF, detection limits of the partial assay were higher than those of the full S1 gene. Partial S1 sequences exhibited higher-than-average nucleotide similarity percentages (79%; 352 bp) compared to full S1 sequences (77%; 1756 bp), suggesting that full S1 analysis allows greater strain differentiation. For IBV detection from AF-inoculated FTA cards, four serotypes were incubated for up to 21 days at three temperatures, 4°C, room temperature (approximately 24°C) and 40°C. RNA was extracted and tested with partial and full S1 protocols. Through partial sequencing, all IBVs were successfully detected at all sampling points and storage temperatures. In contrast, using full S1 sequencing it was not possible to amplify the gene beyond 14 days or when stored at 40°C. Data presented show that for full S1 sequencing, a substantial amount of RNA is needed. Field samples collected onto FTA cards are unlikely to yield such quantity or quality. AF: allantoic fluid; CD50: ciliostatic dose 50; FTA: Flinders Technology Association; IB: infectious bronchitis; IBV: infectious bronchitis virus.

  14. On the Power and the Systematic Biases of the Detection of Chromosomal Inversions by Paired-End Genome Sequencing

    PubMed Central

    Lucas Lledó, José Ignacio; Cáceres, Mario

    2013-01-01

    One of the most used techniques to study structural variation at a genome level is paired-end mapping (PEM). PEM has the advantage of being able to detect balanced events, such as inversions and translocations. However, inversions are still quite difficult to predict reliably, especially from high-throughput sequencing data. We simulated realistic PEM experiments with different combinations of read and library fragment lengths, including sequencing errors and meaningful base-qualities, to quantify and track down the origin of false positives and negatives along sequencing, mapping, and downstream analysis. We show that PEM is very appropriate to detect a wide range of inversions, even with low coverage data. However, % of inversions located between segmental duplications are expected to go undetected by the most common sequencing strategies. In general, longer DNA libraries improve the detectability of inversions far better than increments of the coverage depth or the read length. Finally, we review the performance of three algorithms to detect inversions —SVDetect, GRIAL, and VariationHunter—, identify common pitfalls, and reveal important differences in their breakpoint precisions. These results stress the importance of the sequencing strategy for the detection of structural variants, especially inversions, and offer guidelines for the design of future genome sequencing projects. PMID:23637806

  15. Rabies in southeast Brazil: a change in the epidemiological pattern.

    PubMed

    Queiroz, Luzia Helena; Favoretto, Silvana Regina; Cunha, Elenice Maria S; Campos, Angélica Cristine A; Lopes, Marissol Cardoso; de Carvalho, Cristiano; Iamamoto, Keila; Araújo, Danielle Bastos; Venditti, Leandro Lima R; Ribeiro, Erica S; Pedro, Wagner André; Durigon, Edison Luiz

    2012-01-01

    This epidemiological study was conducted using antigenic and genetic characterisation of rabies virus isolates obtained from different animal species in the southeast of Brazil from 1993 to 2007. An alteration in the epidemiological profile was observed. One hundred two samples were tested using a panel of eight monoclonal antibodies, and 94 were genetically characterised by sequencing the nucleoprotein gene. From 1993 to 1997, antigenic variant 2 (AgV-2), related to a rabies virus maintained in dog populations, was responsible for rabies cases in dogs, cats, cattle and horses. Antigenic variant 3 (AgV-3), associated with Desmodus rotundus, was detected in a few cattle samples from rural areas. From 1998 to 2007, rabies virus was detected in bats and urban pets, and four distinct variants were identified. A nucleotide similarity analysis resulted in two primary groups comprising the dog and bat antigenic variants and showing the distinct endemic cycles maintained in the different animal species in this region.

  16. VariantBam: filtering and profiling of next-generational sequencing data using region-specific rules.

    PubMed

    Wala, Jeremiah; Zhang, Cheng-Zhong; Meyerson, Matthew; Beroukhim, Rameen

    2016-07-01

    We developed VariantBam, a C++ read filtering and profiling tool for use with BAM, CRAM and SAM sequencing files. VariantBam provides a flexible framework for extracting sequencing reads or read-pairs that satisfy combinations of rules, defined by any number of genomic intervals or variant sites. We have implemented filters based on alignment data, sequence motifs, regional coverage and base quality. For example, VariantBam achieved a median size reduction ratio of 3.1:1 when applied to 10 lung cancer whole genome BAMs by removing large tags and selecting for only high-quality variant-supporting reads and reads matching a large dictionary of sequence motifs. Thus VariantBam enables efficient storage of sequencing data while preserving the most relevant information for downstream analysis. VariantBam and full documentation are available at github.com/jwalabroad/VariantBam. Contact: rameen@broadinstitute.org. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  17. A Perfect Match Genomic Landscape Provides a Unified Framework for the Precise Detection of Variation in Natural and Synthetic Haploid Genomes

    PubMed Central

    Palacios-Flores, Kim; García-Sotelo, Jair; Castillo, Alejandra; Uribe, Carina; Aguilar, Luis; Morales, Lucía; Gómez-Romero, Laura; Reyes, José; Garciarubio, Alejandro; Boege, Margareta; Dávila, Guillermo

    2018-01-01

    We present a conceptually simple, sensitive, precise, and essentially nonstatistical solution for the analysis of genome variation in haploid organisms. The generation of a Perfect Match Genomic Landscape (PMGL), which computes intergenome identity with single nucleotide resolution, reveals signatures of variation wherever a query genome differs from a reference genome. Such signatures encode the precise location of different types of variants, including single nucleotide variants, deletions, insertions, and amplifications, effectively introducing the concept of a general signature of variation. The precise nature of variants is then resolved through the generation of targeted alignments between specific sets of sequence reads and known regions of the reference genome. Thus, the perfect match logic decouples the identification of the location of variants from the characterization of their nature, providing a unified framework for the detection of genome variation. We assessed the performance of the PMGL strategy via simulation experiments. We determined the variation profiles of natural genomes and of a synthetic chromosome, both in the context of haploid yeast strains. Our approach uncovered variants that have previously escaped detection. Moreover, our strategy is ideally suited for further refining high-quality reference genomes. The source codes for the automated PMGL pipeline have been deposited in a public repository. PMID:29367403

  18. Regression and Data Mining Methods for Analyses of Multiple Rare Variants in the Genetic Analysis Workshop 17 Mini-Exome Data

    PubMed Central

    Bailey-Wilson, Joan E.; Brennan, Jennifer S.; Bull, Shelley B; Culverhouse, Robert; Kim, Yoonhee; Jiang, Yuan; Jung, Jeesun; Li, Qing; Lamina, Claudia; Liu, Ying; Mägi, Reedik; Niu, Yue S.; Simpson, Claire L.; Wang, Libo; Yilmaz, Yildiz E.; Zhang, Heping; Zhang, Zhaogong

    2012-01-01

    Group 14 of Genetic Analysis Workshop 17 examined several issues related to analysis of complex traits using DNA sequence data. These issues included novel methods for analyzing rare genetic variants in an aggregated manner (often termed collapsing rare variants), evaluation of various study designs to increase power to detect effects of rare variants, and the use of machine learning approaches to model highly complex heterogeneous traits. Various published and novel methods for analyzing traits with extreme locus and allelic heterogeneity were applied to the simulated quantitative and disease phenotypes. Overall, we conclude that power is (as expected) dependent on locus-specific heritability or contribution to disease risk, large samples will be required to detect rare causal variants with small effect sizes, extreme phenotype sampling designs may increase power for smaller laboratory costs, methods that allow joint analysis of multiple variants per gene or pathway are more powerful in general than analyses of individual rare variants, population-specific analyses can be optimal when different subpopulations harbor private causal mutations, and machine learning methods may be useful for selecting subsets of predictors for follow-up in the presence of extreme locus heterogeneity and large numbers of potential predictors. PMID:22128066

  19. Development and validation of the JAX Cancer Treatment Profile™ for detection of clinically actionable mutations in solid tumors

    PubMed Central

    Ananda, Guruprasad; Mockus, Susan; Lundquist, Micaela; Spotlow, Vanessa; Simons, Al; Mitchell, Talia; Stafford, Grace; Philip, Vivek; Stearns, Timothy; Srivastava, Anuj; Barter, Mary; Rowe, Lucy; Malcolm, Joan; Bult, Carol; Karuturi, Radha Krishna Murthy; Rasmussen, Karen; Hinerfeld, Douglas

    2015-01-01

    Background: The continued development of targeted therapeutics for cancer treatment has required the concomitant development of more expansive methods for the molecular profiling of the patient’s tumor. We describe the validation of the JAX Cancer Treatment Profile™ (JAX-CTP™), a next generation sequencing (NGS)-based molecular diagnostic assay that detects actionable mutations in solid tumors to inform the selection of targeted therapeutics for cancer treatment. Methods: NGS libraries are generated from DNA extracted from formalin fixed paraffin embedded tumors. Using hybrid capture, the genes of interest are enriched and sequenced on the Illumina HiSeq 2500 or MiSeq sequencers followed by variant detection and functional and clinical annotation for the generation of a clinical report. Results: The JAX-CTP™ detects actionable variants, in the form of single nucleotide variations and small insertions and deletions (≤50 bp) in 190 genes in specimens with a neoplastic cell content of ≥10%. The JAX-CTP™ is also validated for the detection of clinically actionable gene amplifications. Conclusions: There is a lack of consensus in the molecular diagnostics field on the best method for the validation of NGS-based assays in oncology, thus the importance of communicating methods, as contained in this report. The growing number of targeted therapeutics and the complexity of the tumor genome necessitates continued development and refinement of advanced assays for tumor profiling to enable precision cancer treatment. PMID:25562415

  20. Diagnostics based on nucleic acid sequence variant profiling: PCR, hybridization, and NGS approaches.

    PubMed

    Khodakov, Dmitriy; Wang, Chunyan; Zhang, David Yu

    2016-10-01

    Nucleic acid sequence variations have been implicated in many diseases, and reliable detection and quantitation of DNA/RNA biomarkers can inform effective therapeutic action, enabling precision medicine. Nucleic acid analysis technologies being translated into the clinic can broadly be classified into hybridization, PCR, and sequencing, as well as their combinations. Here we review the molecular mechanisms of popular commercial assays, and their progress in translation into in vitro diagnostics. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.

  1. A 3.4-kb Copy-Number Deletion near EPAS1 Is Significantly Enriched in High-Altitude Tibetans but Absent from the Denisovan Sequence.

    PubMed

    Lou, Haiyi; Lu, Yan; Lu, Dongsheng; Fu, Ruiqing; Wang, Xiaoji; Feng, Qidi; Wu, Sijie; Yang, Yajun; Li, Shilin; Kang, Longli; Guan, Yaqun; Hoh, Boon-Peng; Chung, Yeun-Jun; Jin, Li; Su, Bing; Xu, Shuhua

    2015-07-02

    Tibetan high-altitude adaptation (HAA) has been studied extensively, and many candidate genes have been reported. Subsequent efforts targeting HAA functional variants, however, have not been that successful (e.g., no functional variant has been suggested for the top candidate HAA gene, EPAS1). With WinXPCNVer, a method developed in this study, we detected in microarray data a Tibetan-enriched deletion (TED) carried by 90% of Tibetans; 50% were homozygous for the deletion, whereas only 3% carried the TED and 0% carried the homozygous deletion in 2,792 worldwide samples (p < 10⁻¹⁵). We employed long PCR and Sanger sequencing technologies to determine the exact copy number and breakpoints of the TED in 70 additional Tibetan and 182 diverse samples. The TED had identical boundaries (chr2: 46,694,276-46,697,683; hg19) and was 80 kb downstream of EPAS1. Notably, the TED was in strong linkage disequilibrium (LD; r² = 0.8) with EPAS1 variants associated with reduced blood concentrations of hemoglobin. It was also in complete LD with the 5-SNP motif, which was suspected to be introgressed from Denisovans, but the deletion itself was absent from the Denisovan sequence. Correspondingly, we detected that footprints of positive selection for the TED occurred 12,803 (95% confidence interval = 12,075-14,725) years ago. We further whole-genome deep sequenced (>60×) seven Tibetans and verified the TED but failed to identify any other copy-number variations with comparable patterns, giving this TED top priority for further study. We speculate that the specific patterns of the TED resulted from its own functionality in HAA of Tibetans or LD with a functional variant of EPAS1. Copyright © 2015 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.

  2. Screening for duplications, deletions and a common intronic mutation detects 35% of second mutations in patients with USH2A monoallelic mutations on Sanger sequencing.

    PubMed

    Steele-Stallard, Heather B; Le Quesne Stabej, Polona; Lenassi, Eva; Luxon, Linda M; Claustres, Mireille; Roux, Anne-Francoise; Webster, Andrew R; Bitner-Glindzicz, Maria

    2013-08-08

    Usher Syndrome is the leading cause of inherited deaf-blindness. It is divided into three subtypes, of which the most common is Usher type 2, and the USH2A gene accounts for 75-80% of cases. Despite recent sequencing strategies, in our cohort a significant proportion of individuals with Usher type 2 have just one heterozygous disease-causing mutation in USH2A, or no convincing disease-causing mutations across nine Usher genes. The purpose of this study was to improve the molecular diagnosis in these families by screening USH2A for duplications, heterozygous deletions and a common pathogenic deep intronic variant USH2A: c.7595-2144A>G. Forty-nine Usher type 2 or atypical Usher families who had missing mutations (mono-allelic USH2A or no mutations following Sanger sequencing of nine Usher genes) were screened for duplications/deletions using the USH2A SALSA MLPA reagent kit (MRC-Holland). Identification of USH2A: c.7595-2144A>G was achieved by Sanger sequencing. Mutations were confirmed by a combination of reverse transcription PCR using RNA extracted from nasal epithelial cells or fibroblasts, and by array comparative genomic hybridisation with sequencing across the genomic breakpoints. Eight mutations were identified in 23 Usher type 2 families (35%) with one previously identified heterozygous disease-causing mutation in USH2A. These consisted of five heterozygous deletions, one duplication, and two heterozygous instances of the pathogenic variant USH2A: c.7595-2144A>G. No variants were found in the 15 Usher type 2 families with no previously identified disease-causing mutations. In 11 atypical families, none of whom had any previously identified convincing disease-causing mutations, the mutation USH2A: c.7595-2144A>G was identified in a heterozygous state in one family. All five deletions and the heterozygous duplication we report here are novel. This is the first time that a duplication in USH2A has been reported as a cause of Usher syndrome. 
    We found that 8 of 23 (35%) 'missing' mutations in Usher type 2 probands with only a single heterozygous USH2A mutation detected with Sanger sequencing could be attributed to deletions, duplications or a pathogenic deep intronic variant. Future mutation detection strategies and genetic counselling will need to take into account the prevalence of these types of mutations in order to provide a more comprehensive diagnostic service.

  3. Sex is a moderator of the association between NOS1AP sequence variants and QTc in two long QT syndrome founder populations: a pedigree-based measured genotype association analysis.

    PubMed

    Winbo, Annika; Stattin, Eva-Lena; Westin, Ida Maria; Norberg, Anna; Persson, Johan; Jensen, Steen M; Rydberg, Annika

    2017-07-18

    Sequence variants in the NOS1AP gene have repeatedly been reported to influence QTc, albeit with moderate effect sizes. In the long QT syndrome (LQTS), this may contribute to the substantial QTc variance seen among carriers of identical pathogenic sequence variants. Here we assess three non-coding NOS1AP sequence variants, chosen for their previously reported strong association with QTc in normal and LQTS populations, for association with QTc in two Swedish LQT1 founder populations. This study included 312 individuals (58% females) from two LQT1 founder populations, of whom 227 were genotype positive, segregating either Y111C (n = 148) or R518* (n = 79) pathogenic sequence variants in the KCNQ1 gene, and 85 were genotype negative. All were genotyped for NOS1AP sequence variants rs12143842, rs16847548 and rs4657139, and tested for association with QTc length (effect size presented as mean difference between derived and wildtype, in ms), using a pedigree-based measured genotype association analysis. Mean QTc was obtained by repeated manual measurement (preferably in lead II) by one observer using coded 50 mm/s standard 12-lead ECGs. A substantial variance in mean QTc was seen in genotype positives 476 ± 36 ms (Y111C 483 ± 34 ms; R518* 462 ± 34 ms) and genotype negatives 433 ± 24 ms. Female sex was significantly associated with QTc prolongation in all genotype groups (p < 0.001). In a multivariable analysis including the entire study population and adjusted for KCNQ1 genotype, sex and age, NOS1AP sequence variants rs12143842 and rs16847548 (but not rs4657139) were significantly associated with QT prolongation, +18 ms (p = 0.0007) and +17 ms (p = 0.006), respectively. Significant sex-interactions were detected for both sequence variants (interaction term r = 0.892, p < 0.001 and r = 0.944, p < 0.001, respectively).
    Notably, across the genotype groups, when stratified by sex neither rs12143842 nor rs16847548 was significantly associated with QTc in females (both p = 0.16) while in males, a prolongation of +19 ms and +8 ms (p = 0.002 and p = 0.02) was seen in multivariable analysis, explaining up to 23% of QTc variance in all males. Sex was identified as a moderator of the association between NOS1AP sequence variants and QTc in two LQT1 founder populations. This finding may contribute to QTc sex differences and affect the usefulness of NOS1AP as a marker for clinical risk stratification in LQTS.

  4. Epistasis analysis for quantitative traits by functional regression model.

    PubMed

    Zhang, Futao; Boerwinkle, Eric; Xiong, Momiao

    2014-06-01

    The critical barrier in interaction analysis for rare variants is that most traditional statistical methods for testing interactions were originally designed for testing the interaction between common variants and are difficult to apply to rare variants because of their prohibitive computational time and poor ability. The great challenges for successful detection of interactions with next-generation sequencing (NGS) data are (1) lack of methods for interaction analysis with rare variants, (2) severe multiple testing, and (3) time-consuming computations. To meet these challenges, we shift the paradigm of interaction analysis between two loci to interaction analysis between two sets of loci or genomic regions and collectively test interactions between all possible pairs of SNPs within two genomic regions. In other words, we take a genome region as a basic unit of interaction analysis and use high-dimensional data reduction and functional data analysis techniques to develop a novel functional regression model to collectively test interactions between all possible pairs of single nucleotide polymorphisms (SNPs) within two genome regions. By intensive simulations, we demonstrate that the functional regression models for interaction analysis of the quantitative trait have the correct type 1 error rates and a much better ability to detect interactions than the current pairwise interaction analysis. The proposed method was applied to exome sequence data from the NHLBI's Exome Sequencing Project (ESP) and CHARGE-S study. We discovered 27 pairs of genes showing significant interactions after applying the Bonferroni correction (P-values < 4.58 × 10⁻¹⁰) in the ESP, and 11 were replicated in the CHARGE-S study. © 2014 Zhang et al.; Published by Cold Spring Harbor Laboratory Press.
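
The Bonferroni threshold quoted in the abstract above can be related to a number of tests with one division. The following is a minimal sketch, not the authors' code: the family-wise alpha of 0.05 is an assumption (the abstract does not state it), and both function names are hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def implied_test_count(alpha: float, threshold: float) -> float:
    """Number of tests that would produce the given per-test threshold."""
    return alpha / threshold


# ASSUMPTION: alpha = 0.05 is a conventional choice, not taken from the abstract.
ASSUMED_ALPHA = 0.05
REPORTED_THRESHOLD = 4.58e-10  # quoted in the abstract

n_implied = implied_test_count(ASSUMED_ALPHA, REPORTED_THRESHOLD)
print(f"implied number of tests: {n_implied:.3g}")
```

Under that assumed alpha, the quoted threshold would correspond to roughly 1.09 × 10⁸ tests, which gives a feel for the multiple-testing burden of pairwise gene-gene interaction scans.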

  5. Whole genome sequence analyses of brain imaging measures in the Framingham Study.

    PubMed

    Sarnowski, Chloé; Satizabal, Claudia L; DeCarli, Charles; Pitsillides, Achilleas N; Cupples, L Adrienne; Vasan, Ramachandran S; Wilson, James G; Bis, Joshua C; Fornage, Myriam; Beiser, Alexa S; DeStefano, Anita L; Dupuis, Josée; Seshadri, Sudha

    2018-01-16

    We sought to identify rare variants influencing brain imaging phenotypes in the Framingham Heart Study by performing whole genome sequence association analyses within the Trans-Omics for Precision Medicine Program. We performed association analyses of cerebral and hippocampal volumes and white matter hyperintensity (WMH) in up to 2,180 individuals by testing the association of rank-normalized residuals from mixed-effect linear regression models adjusted for sex, age, and total intracranial volume with individual variants while accounting for familial relatedness. We conducted gene-based tests for rare variants using (1) a sliding-window approach, (2) a selection of functional exonic variants, or (3) all variants. We detected new loci in 1p21 for cerebral volume (minor allele frequency [MAF] 0.005, p = 10⁻⁸) and in 16q23 for hippocampal volume (MAF 0.05, p = 2.7 × 10⁻⁸). Previously identified associations in 12q24 for hippocampal volume (rs7294919, p = 4.4 × 10⁻⁴) and in 17q25 for WMH (rs7214628, p = 2.0 × 10⁻³) were confirmed. Gene-based tests detected associations (p ≤ 2.3 × 10⁻⁶) in new loci for cerebral (5q13, 8p12, 9q31, 13q12-q13, 15q24, 17q12, 19q13) and hippocampal volumes (2p12) and WMH (3q13, 4p15) including Alzheimer disease- (UNC5D) and Parkinson disease-associated genes (GBA). Pathway analyses evidenced enrichment of associated genes in immunity, inflammation, and Alzheimer disease and Parkinson disease pathways. Whole genome sequence-wide search reveals intriguing new loci associated with brain measures. Replication of novel loci is needed to confirm these findings. Copyright © 2017 The Author(s). Published by Wolters Kluwer Health, Inc. on behalf of the American Academy of Neurology.

  6. Analysis of whole exome sequencing with cardiometabolic traits using family-based linkage and association in the IRAS Family Study

    PubMed Central

    Tabb, Keri L.; Hellwege, Jacklyn N.; Palmer, Nicholette D.; Dimitrov, Latchezar; Sajuthi, Satria; Taylor, Kent D.; NG, Maggie C.Y.; Hawkins, Gregory A.; Chen, Yii-Der Ida; Brown, W. Mark; McWilliams, David; Williams, Adrienne; Lorenzo, Carlos; Norris, Jill M.; Long, Jirong; Rotter, Jerome I.; Curran, Joanne E.; Blangero, John; Wagenknecht, Lynne E.; Langefeld, Carl D.; Bowden, Donald W.

    2017-01-01

    Summary: Family-based methods are a potentially powerful tool to identify trait-defining genetic variants in extended families, particularly when used to complement conventional association analysis. We utilized two-point linkage analysis and single variant association analysis to evaluate whole exome sequencing (WES) data from 1,205 Hispanic Americans (78 families) from the Insulin Resistance Atherosclerosis Family Study. WES identified 211,612 variants above the minor allele frequency threshold of ≥0.005. These variants were tested for linkage and/or association with 50 cardiometabolic traits after quality control checks. Two-point linkage analysis yielded 10,580,600 LOD scores with 1,148 LOD scores ≥3, 183 LOD scores ≥4, and 29 LOD scores ≥5. The maximal novel LOD score was 5.50 for rs2289043:T>C, in UNC5C with subcutaneous adipose tissue volume. Association analysis identified 13 variants attaining genome-wide significance (p < 5 × 10⁻⁸), with the strongest association between rs651821:C>T in APOA5, and triglyceride levels (p = 3.67 × 10⁻¹⁰). Overall, there was a 5.2-fold increase in the number of informative variants detected by WES compared to exome chip analysis in this population, nearly 30% of which were novel variants relative to dbSNP build 138. Thus, integration of results from two-point linkage and single-variant association analysis from WES data enabled identification of novel signals potentially contributing to cardiometabolic traits. PMID:28067407
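
The counts in the abstract above are internally consistent if one LOD score was computed per variant-trait pair. The sketch below checks that arithmetic; the one-score-per-pair reading is an assumption inferred from the numbers, and the function names are hypothetical. The LOD-to-odds conversion uses the standard definition of a LOD score as the base-10 logarithm of a likelihood ratio.

```python
# Figures quoted in the abstract.
N_VARIANTS = 211_612
N_TRAITS = 50
REPORTED_LOD_SCORES = 10_580_600


def expected_lod_scores(n_variants: int, n_traits: int) -> int:
    """ASSUMPTION: one two-point LOD score per (variant, trait) pair."""
    return n_variants * n_traits


def lod_to_odds(lod: float) -> float:
    """Convert a LOD score to the likelihood ratio (odds) it represents."""
    return 10.0 ** lod


if __name__ == "__main__":
    total = expected_lod_scores(N_VARIANTS, N_TRAITS)
    print(total == REPORTED_LOD_SCORES)        # do the quoted figures agree?
    print(f"LOD 3 = {lod_to_odds(3.0):.0f}:1 odds in favour of linkage")
```

With more than ten million scores computed, a few exceeding the conventional LOD ≥ 3 cut-off would be expected even without linkage, which is one reason the abstract also reports counts at the stricter ≥4 and ≥5 levels.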

  7. GeneiASE: Detection of condition-dependent and static allele-specific expression from RNA-seq data without haplotype information

    PubMed Central

    Edsgärd, Daniel; Iglesias, Maria Jesus; Reilly, Sarah-Jayne; Hamsten, Anders; Tornvall, Per; Odeberg, Jacob; Emanuelsson, Olof

    2016-01-01

    Allele-specific expression (ASE) is the imbalance in transcription between maternal and paternal alleles at a locus and can be probed in single individuals using massively parallel DNA sequencing technology. Assessing ASE within a single sample provides a static picture of the ASE, but the magnitude of ASE for a given transcript may vary between different biological conditions in an individual. Such condition-dependent ASE could indicate a genetic variation with a functional role in the phenotypic difference. We investigated ASE through RNA-sequencing of primary white blood cells from eight human individuals before and after the controlled induction of an inflammatory response, and detected condition-dependent and static ASE at 211 and 13021 variants, respectively. We developed a method, GeneiASE, to detect genes exhibiting static or condition-dependent ASE in single individuals. GeneiASE performed consistently over a range of read depths and ASE effect sizes, and did not require phasing of variants to estimate haplotypes. We observed condition-dependent ASE related to the inflammatory response in 19 genes, and static ASE in 1389 genes. Allele-specific expression was confirmed by validation of variants through real-time quantitative RT-PCR, with RNA-seq and RT-PCR ASE effect-size correlations r = 0.67 and r = 0.94 for static and condition-dependent ASE, respectively. PMID:26887787
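
To make the idea of allelic imbalance concrete, here is a minimal sketch of the simplest per-variant test: an exact two-sided binomial test of reference versus alternate read counts against a 50:50 expectation. This is not the GeneiASE method, which per the abstract works at the gene level and handles condition-dependent ASE; the function name and the read counts in the usage lines are illustrative assumptions.

```python
from math import comb


def ase_binomial_p(ref_reads: int, alt_reads: int) -> float:
    """Exact two-sided binomial p-value for H0: both alleles equally expressed.

    Under p = 0.5 the distribution is symmetric, so the two-sided p-value is
    twice the probability of the smaller tail, capped at 1.
    """
    if ref_reads < 0 or alt_reads < 0:
        raise ValueError("read counts must be non-negative")
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0  # no reads, no evidence either way
    k = min(ref_reads, alt_reads)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2.0 * tail)


if __name__ == "__main__":
    # Hypothetical read counts at one heterozygous site.
    print(ase_binomial_p(5, 5))    # balanced expression
    print(ase_binomial_p(10, 0))   # all reads from one allele
```

A plain binomial test ignores overdispersion in RNA-seq counts and treats each variant independently, which is part of the motivation for gene-level approaches such as the one the abstract describes.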

  8. Chitayat-Hall and Schaaf-Yang syndromes: a common aetiology: expanding the phenotype of MAGEL2-related disorders.

    PubMed

    Jobling, Rebekah; Stavropoulos, Dimitri James; Marshall, Christian R; Cytrynbaum, Cheryl; Axford, Michelle M; Londero, Vanessa; Moalem, Sharon; Orr, Jennifer; Rossignol, Francis; Lopes, Fatima Daniela; Gauthier, Julie; Alos, Nathalie; Rupps, Rosemarie; McKinnon, Margaret; Adam, Shelin; Nowaczyk, Malgorzata J M; Walker, Susan; Scherer, Stephen W; Nassif, Christina; Hamdan, Fadi F; Deal, Cheri L; Soucy, Jean-François; Weksberg, Rosanna; Macleod, Patrick; Michaud, Jacques L; Chitayat, David

    2018-05-01

    Chitayat-Hall syndrome, initially described in 1990, is a rare condition characterised by distal arthrogryposis, intellectual disability, dysmorphic features and hypopituitarism, in particular growth hormone deficiency. The genetic aetiology has not been identified. We identified three unrelated families with a total of six affected patients with the clinical manifestations of Chitayat-Hall syndrome. Through whole exome or whole genome sequencing, pathogenic variants in the MAGEL2 gene were identified in all affected patients. All disease-causing sequence variants detected are predicted to result in a truncated protein, including one complex variant that comprised a deletion and inversion. Chitayat-Hall syndrome is caused by pathogenic variants in MAGEL2 and shares a common aetiology with the recently described Schaaf-Yang syndrome. The phenotype of MAGEL2-related disorders is expanded to include growth hormone deficiency as an important and treatable complication. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.

  9. Sequence variants of the DFNB31 gene among Usher syndrome patients of diverse origin

    PubMed Central

    Aller, Elena; Jaijo, Teresa; van Wijk, Erwin; Ebermann, Inga; Kersten, Ferry; García-García, Gema; Voesenek, Krysta; Aparisi, María José; Hoefsloot, Lies; Cremers, Cor; Díaz-Llopis, Manuel; Pennings, Ronald; Bolz, Hanno J.; Kremer, Hannie; Millán, José M.

    2010-01-01

    Purpose It has been demonstrated that mutations in deafness, autosomal recessive 31 (DFNB31), the gene encoding whirlin, are responsible for nonsyndromic hearing loss (NSHL; DFNB31) and Usher syndrome type II (USH2D). We screened DFNB31 in a large cohort of patients with different clinical subtypes of Usher syndrome (USH) to determine the prevalence of DFNB31 mutations among USH patients. Methods DFNB31 was screened in 149 USH2, 29 USH1, six atypical USH, and 11 unclassified USH patients from diverse ethnic backgrounds. Mutation detection was performed by direct sequencing of all coding exons. Results We identified 38 different variants among 195 patients. Most variants were clearly polymorphic, but at least two out of the 15 nonsynonymous variants (p.R350W and p.R882S) are predicted to impair whirlin structure and function, suggesting eventual pathogenicity. No putatively pathogenic mutation was found in the second allele of patients with these mutations. Conclusions DFNB31 is not a major cause of USH. PMID:20352026

  10. sapFinder: an R/Bioconductor package for detection of variant peptides in shotgun proteomics experiments.

    PubMed

    Wen, Bo; Xu, Shaohang; Sheynkman, Gloria M; Feng, Qiang; Lin, Liang; Wang, Quanhui; Xu, Xun; Wang, Jun; Liu, Siqi

    2014-11-01

    Single nucleotide variations (SNVs) located within a reading frame can result in single amino acid polymorphisms (SAPs), leading to alteration of the corresponding amino acid sequence as well as function of a protein. Accurate detection of SAPs is an important issue in proteomic analysis at the experimental and bioinformatic level. Herein, we present sapFinder, an R software package, for detection of the variant peptides based on tandem mass spectrometry (MS/MS)-based proteomics data. This package automates the construction of variation-associated databases from public SNV repositories or sample-specific next-generation sequencing (NGS) data and the identification of SAPs through database searching, post-processing and generation of HTML-based report with visualized interface. sapFinder is implemented as a Bioconductor package in R. The package and the vignette can be downloaded at http://bioconductor.org/packages/devel/bioc/html/sapFinder.html and are provided under a GPL-2 license.
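
    The idea of a variation-associated peptide database can be illustrated with a small sketch. It is not sapFinder's code or API (sapFinder is an R package); the function names, the toy protein sequence, and the simplified trypsin rule (cleave after K or R unless followed by P, no missed cleavages) are all assumptions made for illustration.

```python
import re


def apply_sap(protein, position, ref_aa, alt_aa):
    """Return the protein with one amino acid substituted (position is 1-based)."""
    index = position - 1
    if protein[index] != ref_aa:
        raise ValueError("reference residue does not match the sequence")
    return protein[:index] + alt_aa + protein[index + 1:]


def tryptic_peptides(protein):
    """Split a protein after K or R unless the next residue is P.

    Returns (1-based start position, peptide) pairs with no missed cleavages.
    """
    peptides = []
    start = 0
    for match in re.finditer(r"[KR](?!P)", protein):
        end = match.end()
        peptides.append((start + 1, protein[start:end]))
        start = end
    if start < len(protein):
        peptides.append((start + 1, protein[start:]))
    return peptides


def variant_peptide(protein, position, ref_aa, alt_aa):
    """Return the tryptic peptide of the mutated protein that covers the SAP site."""
    mutated = apply_sap(protein, position, ref_aa, alt_aa)
    for start, peptide in tryptic_peptides(mutated):
        if start <= position < start + len(peptide):
            return peptide
    raise ValueError("position lies outside the protein")


# Hypothetical protein and SAP (Y5H), chosen only to exercise the functions.
TOY_PROTEIN = "MKTAYIAKQRGLSPEK"
print(variant_peptide(TOY_PROTEIN, 5, "Y", "H"))
```

    A substitution that creates or removes K or R also changes the cleavage pattern; because the digest here is run on the mutated sequence, that case is handled without extra logic.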

  11. Germline contamination and leakage in whole genome somatic single nucleotide variant detection.

    PubMed

    Sendorek, Dorota H; Caloian, Cristian; Ellrott, Kyle; Bare, J Christopher; Yamaguchi, Takafumi N; Ewing, Adam D; Houlahan, Kathleen E; Norman, Thea C; Margolin, Adam A; Stuart, Joshua M; Boutros, Paul C

    2018-01-31

    The clinical sequencing of cancer genomes to personalize therapy is becoming routine across the world. However, concerns over patient re-identification from these data lead to questions about how tightly access should be controlled. It is not thought to be possible to re-identify patients from somatic variant data. However, somatic variant detection pipelines can mistakenly identify germline variants as somatic ones, a process called "germline leakage". The rate of germline leakage across different somatic variant detection pipelines is not well-understood, and it is uncertain whether or not somatic variant calls should be considered re-identifiable. To fill this gap, we quantified germline leakage across 259 sets of whole-genome somatic single nucleotide variant (SNV) predictions made by 21 teams as part of the ICGC-TCGA DREAM Somatic Mutation Calling Challenge. The median somatic SNV prediction set contained 4325 somatic SNVs and leaked one germline polymorphism. The level of germline leakage was inversely correlated with somatic SNV prediction accuracy and positively correlated with the amount of infiltrating normal cells. The specific germline variants leaked differed by tumour and algorithm. To aid in quantitation and correction of leakage, we created a tool, called GermlineFilter, for use in public-facing somatic SNV databases. The potential for patient re-identification from leaked germline variants in somatic SNV predictions has led to divergent open data access policies, based on different assessments of the risks. Indeed, a single, well-publicized re-identification event could reshape public perceptions of the values of genomic data sharing. We find that modern somatic SNV prediction pipelines have low germline-leakage rates, which can be further reduced, especially for cloud-sharing, using pre-filtering software.
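
    Counting germline leakage in a set of somatic calls can be sketched as a set lookup. This is an illustration only and does not reproduce GermlineFilter; the `Snv` record, the function names, and the example variants are hypothetical, and the patient's germline variants are assumed to be available as a list.

```python
from typing import NamedTuple


class Snv(NamedTuple):
    """A single nucleotide variant keyed by position and alleles."""
    chrom: str
    pos: int
    ref: str
    alt: str


def split_leaked(somatic_calls, germline_variants):
    """Separate somatic calls into (kept, leaked) by exact match to germline variants."""
    germline = set(germline_variants)
    kept = [v for v in somatic_calls if v not in germline]
    leaked = [v for v in somatic_calls if v in germline]
    return kept, leaked


def leakage_rate(somatic_calls, germline_variants):
    """Fraction of somatic calls that are actually known germline variants."""
    if not somatic_calls:
        return 0.0
    _, leaked = split_leaked(somatic_calls, germline_variants)
    return len(leaked) / len(somatic_calls)


# Hypothetical example data.
calls = [Snv("1", 100, "A", "T"), Snv("2", 200, "G", "C"), Snv("3", 300, "C", "A")]
germline = [Snv("2", 200, "G", "C"), Snv("7", 700, "T", "G")]
kept, leaked = split_leaked(calls, germline)
print(len(kept), len(leaked), leakage_rate(calls, germline))
```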

  12. Minority Human Immunodeficiency Virus Type 1 Variants in Antiretroviral-Naive Persons with Reverse Transcriptase Codon 215 Revertant Mutations

    PubMed Central

    Mitsuya, Yumi; Varghese, Vici; Wang, Chunlin; Liu, Tommy F.; Holmes, Susan P.; Jayakumar, Prerana; Gharizadeh, Baback; Ronaghi, Mostafa; Klein, Daniel; Fessel, W. Jeffrey; Shafer, Robert W.

    2008-01-01

    T215 revertant mutations such as T215C/D/E/S that evolve from the nucleoside reverse transcriptase (RT) inhibitor mutations T215Y/F have been found in about 3% of human immunodeficiency virus type 1 (HIV-1) isolates from newly diagnosed HIV-1-infected persons. We used a newly developed sequencing method—ultradeep pyrosequencing (UDPS; 454 Life Sciences)—to determine the frequency with which T215Y/F or other RT inhibitor resistance mutations could be detected as minority variants in samples from untreated persons that contain T215 revertants (“revertant” samples) compared with samples from untreated persons that lack such revertants (“control” samples). Among the 22 revertant and 29 control samples, UDPS detected a mean of 3.8 and 4.8 additional RT amino acid mutations, respectively. In 6 of 22 (27%) revertant samples and in 4 of 29 control samples (14%; P = 0.4), UDPS detected one or more RT inhibitor resistance mutations. T215Y or T215F was not detected in any of the revertant or control samples; however, 4 of 22 revertant samples had one or more T215 revertants that were detected by UDPS but not by direct PCR sequencing. The failure to detect viruses with T215Y/F in the 22 revertant samples in this study may result from the overwhelming replacement of transmitted T215Y variants by the more fit T215 revertants or from the primary transmission of a T215 revertant in a subset of persons with T215 revertants. PMID:18715933
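
    The comparison of 6 of 22 revertant samples with 4 of 29 control samples can be set up as a 2x2 contingency table. The abstract does not say which test produced P = 0.4, so the sketch below uses a two-sided Fisher exact test as an assumption; its result (about 0.30) need not match the reported value. The function name is illustrative.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the probabilities of all tables with the same margins that are
    no more likely than the observed one. Integer weights avoid rounding
    problems in the comparison.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def weight(k):
        # Unnormalised hypergeometric probability of k in the top-left cell.
        return comb(row1, k) * comb(row2, col1 - k)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    observed = weight(a)
    extreme = sum(
        weight(k) for k in range(lowest, highest + 1) if weight(k) <= observed
    )
    return extreme / comb(total, col1)


# Rows: revertant vs control samples; columns: resistance mutation detected or not.
p_value = fisher_exact_two_sided(6, 22 - 6, 4, 29 - 4)
print(round(p_value, 3))
```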

  13. Analysis of the 3’ untranslated regions of α-tubulin and S-crystallin mRNA and the identification of CPEB in dark- and light-adapted octopus retinas

    PubMed Central

    Kelly, Shannan; Yamamoto, Hideki

    2008-01-01

    Purpose We previously reported the differential expression and translation of mRNA and protein in dark- and light-adapted octopus retinas, which may result from cytoplasmic polyadenylation element (CPE)–dependent mRNA masking and unmasking. Here we investigate the presence of CPEs in α-tubulin and S-crystallin mRNA and report the identification of cytoplasmic polyadenylation element binding protein (CPEB) in light- and dark-adapted octopus retinas. Methods 3’-RACE and sequencing were used to isolate and analyze the 3’-UTRs of α-tubulin and S-crystallin mRNA. Total retinal protein isolated from light- and dark-adapted octopus retinas was subjected to western blot analysis followed by CPEB antibody detection, PEP-171 inhibition of CPEB, and dephosphorylation of CPEB. Results The following CPE-like sequence was detected in the 3’-UTR of isolated long S-crystallin mRNA variants: UUUAACA. No CPE or CPE-like sequences were detected in the 3’-UTRs of α-tubulin mRNA or of the short S-crystallin mRNA variants. Western blot analysis detected CPEB as two putative bands migrating between 60-80 kDa, while a third band migrated below 30 kDa in dark- and light-adapted retinas. Conclusions The detection of CPEB and the identification of the putative CPE-like sequences in the S-crystallin 3’-UTR suggest that CPEB may be involved in the activation of masked S-crystallin mRNA, but not in the regulation of α-tubulin mRNA, resulting in increased S-crystallin protein synthesis in dark-adapted octopus retinas. PMID:18682811

  14. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kaplow, Irene M.; MacIsaac, Julia L.; Mah, Sarah M.

    DNA methylation is an epigenetic modification that plays a key role in gene regulation. Previous studies have investigated its genetic basis by mapping genetic variants that are associated with DNA methylation at specific sites, but these have been limited to microarrays that cover <2% of the genome and cannot account for allele-specific methylation (ASM). Other studies have performed whole-genome bisulfite sequencing on a few individuals, but these lack statistical power to identify variants associated with DNA methylation. We present a novel approach in which bisulfite-treated DNA from many individuals is sequenced together in a single pool, resulting in a truly genome-wide map of DNA methylation. Compared to methods that do not account for ASM, our approach increases statistical power to detect associations while sharply reducing cost, effort, and experimental variability. As a proof of concept, we generated deep sequencing data from a pool of 60 human cell lines; we evaluated almost twice as many CpGs as the largest microarray studies and identified more than 2000 genetic variants associated with DNA methylation. Here we found that these variants are highly enriched for associations with chromatin accessibility and CTCF binding but are less likely to be associated with traits indirectly linked to DNA, such as gene expression and disease phenotypes. In summary, our approach allows genome-wide mapping of genetic variants associated with DNA methylation in any tissue of any species, without the need for individual-level genotype or methylation data.

  15. Exploring the unknown: assumptions about allelic architecture and strategies for susceptibility variant discovery.

    PubMed

    McCarthy, Mark I

    2009-07-03

    Identification of common-variant associations for many common disorders has been highly effective, but the loci detected so far typically explain only a small proportion of the genetic predisposition to disease. Extending explained genetic variance is one of the major near-term goals of human genetic research. Next-generation sequencing technologies offer great promise, but optimal strategies for their deployment remain uncertain, not least because we lack a clear view of the characteristics of the variants being sought. Here, I discuss what can and cannot be inferred about complex trait disease architecture from the information currently available and review the implications for future research strategies.

  16. Selecting sequence variants to improve genomic predictions for dairy cattle

    USDA-ARS's Scientific Manuscript database

    Millions of genetic variants have been identified by population-scale sequencing projects, but subsets are needed for routine genomic predictions or to include on genotyping arrays. Methods of selecting sequence variants were compared using both simulated sequence genotypes and actual data from run ...

  17. Sequencing thousands of single-cell genomes with combinatorial indexing.

    PubMed

    Vitak, Sarah A; Torkenczy, Kristof A; Rosenkrantz, Jimi L; Fields, Andrew J; Christiansen, Lena; Wong, Melissa H; Carbone, Lucia; Steemers, Frank J; Adey, Andrew

    2017-03-01

    Single-cell genome sequencing has proven valuable for the detection of somatic variation, particularly in the context of tumor evolution. Current technologies suffer from high library construction costs, which restrict the number of cells that can be assessed and thus impose limitations on the ability to measure heterogeneity within a tissue. Here, we present single-cell combinatorial indexed sequencing (SCI-seq) as a means of simultaneously generating thousands of low-pass single-cell libraries for detection of somatic copy-number variants. We constructed libraries for 16,698 single cells from a combination of cultured cell lines, primate frontal cortex tissue and two human adenocarcinomas, and obtained a detailed assessment of subclonal variation within a pancreatic tumor.

  18. X-exome sequencing of 405 unresolved families identifies seven novel intellectual disability genes.

    PubMed

    Hu, H; Haas, S A; Chelly, J; Van Esch, H; Raynaud, M; de Brouwer, A P M; Weinert, S; Froyen, G; Frints, S G M; Laumonnier, F; Zemojtel, T; Love, M I; Richard, H; Emde, A-K; Bienek, M; Jensen, C; Hambrock, M; Fischer, U; Langnick, C; Feldkamp, M; Wissink-Lindhout, W; Lebrun, N; Castelnau, L; Rucci, J; Montjean, R; Dorseuil, O; Billuart, P; Stuhlmann, T; Shaw, M; Corbett, M A; Gardner, A; Willis-Owen, S; Tan, C; Friend, K L; Belet, S; van Roozendaal, K E P; Jimenez-Pocquet, M; Moizard, M-P; Ronce, N; Sun, R; O'Keeffe, S; Chenna, R; van Bömmel, A; Göke, J; Hackett, A; Field, M; Christie, L; Boyle, J; Haan, E; Nelson, J; Turner, G; Baynam, G; Gillessen-Kaesbach, G; Müller, U; Steinberger, D; Budny, B; Badura-Stronka, M; Latos-Bieleńska, A; Ousager, L B; Wieacker, P; Rodríguez Criado, G; Bondeson, M-L; Annerén, G; Dufke, A; Cohen, M; Van Maldergem, L; Vincent-Delorme, C; Echenne, B; Simon-Bouy, B; Kleefstra, T; Willemsen, M; Fryns, J-P; Devriendt, K; Ullmann, R; Vingron, M; Wrogemann, K; Wienker, T F; Tzschach, A; van Bokhoven, H; Gecz, J; Jentsch, T J; Chen, W; Ropers, H-H; Kalscheuer, V M

    2016-01-01

    X-linked intellectual disability (XLID) is a clinically and genetically heterogeneous disorder. During the past two decades in excess of 100 X-chromosome ID genes have been identified. Yet, a large number of families mapping to the X-chromosome remained unresolved suggesting that more XLID genes or loci are yet to be identified. Here, we have investigated 405 unresolved families with XLID. We employed massively parallel sequencing of all X-chromosome exons in the index males. The majority of these males were previously tested negative for copy number variations and for mutations in a subset of known XLID genes by Sanger sequencing. In total, 745 X-chromosomal genes were screened. After stringent filtering, a total of 1297 non-recurrent exonic variants remained for prioritization. Co-segregation analysis of potential clinically relevant changes revealed that 80 families (20%) carried pathogenic variants in established XLID genes. In 19 families, we detected likely causative protein truncating and missense variants in 7 novel and validated XLID genes (CLCN4, CNKSR2, FRMPD4, KLHL15, LAS1L, RLIM and USP27X) and potentially deleterious variants in 2 novel candidate XLID genes (CDK16 and TAF1). We show that the CLCN4 and CNKSR2 variants impair protein functions as indicated by electrophysiological studies and altered differentiation of cultured primary neurons from Clcn4(-/-) mice or after mRNA knock-down. The newly identified and candidate XLID proteins belong to pathways and networks with established roles in cognitive function and intellectual disability in particular. We suggest that systematic sequencing of all X-chromosomal genes in a cohort of patients with genetic evidence for X-chromosome locus involvement may resolve up to 58% of Fragile X-negative cases.

  19. X-exome sequencing of 405 unresolved families identifies seven novel intellectual disability genes

    PubMed Central

    Hu, H; Haas, S A; Chelly, J; Van Esch, H; Raynaud, M; de Brouwer, A P M; Weinert, S; Froyen, G; Frints, S G M; Laumonnier, F; Zemojtel, T; Love, M I; Richard, H; Emde, A-K; Bienek, M; Jensen, C; Hambrock, M; Fischer, U; Langnick, C; Feldkamp, M; Wissink-Lindhout, W; Lebrun, N; Castelnau, L; Rucci, J; Montjean, R; Dorseuil, O; Billuart, P; Stuhlmann, T; Shaw, M; Corbett, M A; Gardner, A; Willis-Owen, S; Tan, C; Friend, K L; Belet, S; van Roozendaal, K E P; Jimenez-Pocquet, M; Moizard, M-P; Ronce, N; Sun, R; O'Keeffe, S; Chenna, R; van Bömmel, A; Göke, J; Hackett, A; Field, M; Christie, L; Boyle, J; Haan, E; Nelson, J; Turner, G; Baynam, G; Gillessen-Kaesbach, G; Müller, U; Steinberger, D; Budny, B; Badura-Stronka, M; Latos-Bieleńska, A; Ousager, L B; Wieacker, P; Rodríguez Criado, G; Bondeson, M-L; Annerén, G; Dufke, A; Cohen, M; Van Maldergem, L; Vincent-Delorme, C; Echenne, B; Simon-Bouy, B; Kleefstra, T; Willemsen, M; Fryns, J-P; Devriendt, K; Ullmann, R; Vingron, M; Wrogemann, K; Wienker, T F; Tzschach, A; van Bokhoven, H; Gecz, J; Jentsch, T J; Chen, W; Ropers, H-H; Kalscheuer, V M

    2016-01-01

    X-linked intellectual disability (XLID) is a clinically and genetically heterogeneous disorder. During the past two decades in excess of 100 X-chromosome ID genes have been identified. Yet, a large number of families mapping to the X-chromosome remained unresolved suggesting that more XLID genes or loci are yet to be identified. Here, we have investigated 405 unresolved families with XLID. We employed massively parallel sequencing of all X-chromosome exons in the index males. The majority of these males were previously tested negative for copy number variations and for mutations in a subset of known XLID genes by Sanger sequencing. In total, 745 X-chromosomal genes were screened. After stringent filtering, a total of 1297 non-recurrent exonic variants remained for prioritization. Co-segregation analysis of potential clinically relevant changes revealed that 80 families (20%) carried pathogenic variants in established XLID genes. In 19 families, we detected likely causative protein truncating and missense variants in 7 novel and validated XLID genes (CLCN4, CNKSR2, FRMPD4, KLHL15, LAS1L, RLIM and USP27X) and potentially deleterious variants in 2 novel candidate XLID genes (CDK16 and TAF1). We show that the CLCN4 and CNKSR2 variants impair protein functions as indicated by electrophysiological studies and altered differentiation of cultured primary neurons from Clcn4−/− mice or after mRNA knock-down. The newly identified and candidate XLID proteins belong to pathways and networks with established roles in cognitive function and intellectual disability in particular. We suggest that systematic sequencing of all X-chromosomal genes in a cohort of patients with genetic evidence for X-chromosome locus involvement may resolve up to 58% of Fragile X-negative cases. PMID:25644381

  20. Epidemic history of hepatitis C virus infection in two remote communities in Nigeria, West Africa.

    PubMed

    Forbi, Joseph C; Purdy, Michael A; Campo, David S; Vaughan, Gilberto; Dimitrova, Zoya E; Ganova-Raeva, Lilia M; Xia, Guo-Liang; Khudyakov, Yury E

    2012-07-01

    We investigated the molecular epidemiology and population dynamics of HCV infection among indigenes of two semi-isolated communities in North-Central Nigeria. Despite remoteness and isolation, ~15% of the population had serological or molecular markers of hepatitis C virus (HCV) infection. Phylogenetic analysis of the NS5b sequences obtained from 60 HCV-infected residents showed that HCV variants belonged to genotype 1 (n=51; 85%) and genotype 2 (n=9; 15%). All sequences were unique and intermixed in the phylogenetic tree with HCV sequences from people infected from other West African countries. The high-throughput 454 pyrosequencing of the HCV hypervariable region 1 and an empirical threshold error correction algorithm were used to evaluate intra-host heterogeneity of HCV strains of genotype 1 (n=43) and genotype 2 (n=6) from residents of the communities. Analysis revealed a rare detectable intermixing of HCV intra-host variants among residents. Identification of genetically close HCV variants among all known groups of relatives suggests a common intra-familial HCV transmission in the communities. Applying Bayesian coalescent analysis to the NS5b sequences, the most recent common ancestors for genotype 1 and 2 variants were estimated to have existed 675 and 286 years ago, respectively. Bayesian skyline plots suggest that HCV lineages of both genotypes identified in the Nigerian communities experienced epidemic growth for 200-300 years until the mid-20th century. The data suggest a massive introduction of numerous HCV variants to the communities during the 20th century in the background of a dynamic evolutionary history of the hepatitis C epidemic in Nigeria over the past three centuries.

  1. Analyzing Somatic Genome Rearrangements in Human Cancers by Using Whole-Exome Sequencing | Office of Cancer Genomics

    Cancer.gov

    Although exome sequencing data are generated primarily to detect single-nucleotide variants and indels, they can also be used to identify a subset of genomic rearrangements whose breakpoints are located in or near exons. Using >4,600 tumor and normal pairs across 15 cancer types, we identified over 9,000 high confidence somatic rearrangements, including a large number of gene fusions.

  2. Sequencing small genomic targets with high efficiency and extreme accuracy

    PubMed Central

    Schmitt, Michael W.; Fox, Edward J.; Prindle, Marc J.; Reid-Bayliss, Kate S.; True, Lawrence D.; Radich, Jerald P.; Loeb, Lawrence A.

    2015-01-01

    The detection of minority variants in mixed samples demands methods for enrichment and accurate sequencing of small genomic intervals. We describe an efficient approach based on sequential rounds of hybridization with biotinylated oligonucleotides, enabling more than one-million fold enrichment of genomic regions of interest. In conjunction with error correcting double-stranded molecular tags, our approach enables the quantification of mutations in individual DNA molecules. PMID:25849638

  3. Novel mutations in CRB1 gene identified in a Chinese pedigree with retinitis pigmentosa by targeted capture and next generation sequencing

    PubMed Central

    Lo, David; Weng, Jingning; Liu, Xiaohong; Yang, Juhua; He, Fen; Wang, Yun; Liu, Xuyang

    2016-01-01

    PURPOSE To detect the disease-causing gene in a Chinese pedigree with autosomal-recessive retinitis pigmentosa (ARRP). METHODS All subjects in this family underwent a complete ophthalmic examination. Targeted-capture next generation sequencing (NGS) was performed on the proband to detect variants. All variants were verified in the remaining family members by PCR amplification and Sanger sequencing. RESULTS All the affected subjects in this pedigree were diagnosed with retinitis pigmentosa (RP). The compound heterozygous c.138delA (p.Asp47IlefsX24) and c.1841G>T (p.Gly614Val) mutations in the Crumbs homolog 1 (CRB1) gene were identified in all the affected patients but not in the unaffected individuals in this family. One mutation was inherited from each parent. CONCLUSION The novel compound heterozygous mutations in CRB1 were identified in a Chinese pedigree with ARRP using targeted-capture next generation sequencing. After evaluating the significant heredity and impaired protein function, the compound heterozygous c.138delA (p.Asp47IlefsX24) and c.1841G>T (p.Gly614Val) mutations are the causal variants of early onset ARRP in this pedigree. To the best of our knowledge, there is no previous report regarding the compound mutations. PMID:27806333

  4. Pretreatment drug resistance in a large countrywide Ethiopian HIV-1C cohort: a comparison of Sanger and high-throughput sequencing.

    PubMed

    Telele, Nigus Fikrie; Kalu, Amare Worku; Gebre-Selassie, Solomon; Fekade, Daniel; Abdurahman, Samir; Marrone, Gaetano; Neogi, Ujjwal; Tegbaru, Belete; Sönnerborg, Anders

    2018-05-15

    Baseline plasma samples of 490 randomly selected antiretroviral therapy (ART) naïve patients from seven hospitals participating in the first nationwide Ethiopian HIV-1 cohort were analysed for surveillance drug resistance mutations (sDRM) by population based Sanger sequencing (PBSS). Next generation sequencing (NGS) was also used in a subset of 109 baseline samples. Treatment outcome after 6 and 12 months was assessed by on-treatment (OT) and intention-to-treat (ITT) analyses. Transmitted drug resistance (TDR) was detected in 3.9% (18/461) of successfully sequenced samples by PBSS. However, NGS detected sDRM more often (24%; 26/109) than PBSS (6%; 7/109) (p = 0.0001) and major integrase strand transfer inhibitors (INSTI) DRMs were also found in minor viral variants from five patients. Patients with sDRM had more frequent treatment failure in both OT and ITT analyses. The high rate of TDR by NGS and the identification of preexisting INSTI DRMs in minor viral variants from Ethiopian patients infected with wild-type HIV-1 subtype C underscore the importance of TDR surveillance in low- and middle-income countries and show the added value of high-throughput NGS in such studies.

  5. Rift Valley Fever, Sudan, 2007 and 2010

    PubMed Central

    Aradaib, Imadeldin E.; Erickson, Bobbie R.; Elageb, Rehab M.; Khristova, Marina L.; Carroll, Serena A.; Elkhidir, Isam M.; Karsany, Mubarak E.; Karrar, AbdelRahim E.; Elbashir, Mustafa I.

    2013-01-01

    To elucidate whether Rift Valley fever virus (RVFV) diversity in Sudan resulted from multiple introductions or from acquired changes over time from 1 introduction event, we generated complete genome sequences from RVFV strains detected during the 2007 and 2010 outbreaks. Phylogenetic analyses of small, medium, and large RNA segment sequences indicated several genetic RVFV variants were circulating in Sudan, which all grouped into Kenya-1 or Kenya-2 sublineages from the 2006–2008 eastern Africa epizootic. Bayesian analysis of sequence differences estimated that diversity among the 2007 and 2010 Sudan RVFV variants shared a most recent common ancestor circa 1996. The data suggest multiple introductions of RVFV into Sudan as part of sweeping epizootics from eastern Africa. The sequences indicate recent movement of RVFV and support the need for surveillance to recognize when and where RVFV circulates between epidemics, which can make data from prediction tools easier to interpret and preventive measures easier to direct toward high-risk areas. PMID:23347790

  6. Resistance-Associated NS5A Variants of Hepatitis C Virus Are Susceptible to Interferon-Based Therapy.

    PubMed

    Itakura, Jun; Kurosaki, Masayuki; Higuchi, Mayu; Takada, Hitomi; Nakakuki, Natsuko; Itakura, Yoshie; Tamaki, Nobuharu; Yasui, Yutaka; Suzuki, Shoko; Tsuchiya, Kaoru; Nakanishi, Hiroyuki; Takahashi, Yuka; Maekawa, Shinya; Enomoto, Nobuyuki; Izumi, Namiki

    2015-01-01

    The presence of resistance-associated variants (RAVs) of hepatitis C virus (HCV) attenuates the efficacy of direct acting antivirals (DAAs). The objective of this study was to characterize the susceptibility of RAVs to interferon-based therapy. Direct and deep sequencing were performed to detect Y93H RAV in the NS5A region. Twenty nine genotype 1b patients with detectable RAV at baseline were treated by a combination of simeprevir, pegylated interferon and ribavirin. The longitudinal changes in the proportion of Y93H RAV during therapy and at breakthrough or relapse were determined. By direct sequencing, Y93H RAV became undetectable or decreased in proportion at an early time point during therapy (within 7 days) in 57% of patients with both the Y93H variant and wild type virus at baseline when HCV RNA was still detectable. By deep sequencing, the proportion of Y93H RAV against Y93 wild type was 52.7% (5.8%-97.4%) at baseline which significantly decreased to 29.7% (0.16%-98.3%) within 7 days of initiation of treatment (p = 0.023). The proportion of Y93H RAV was reduced in 21 of 29 cases (72.4%) and a marked reduction of more than 10% was observed in 14 cases (48.7%). HCV RNA reduction was significantly greater for Y93H RAV (-3.65±1.3 logIU/mL/day) than the Y93 wild type (-3.35±1.0 logIU/mL/day) (p<0.001). Y93H RAV is more susceptible to interferon-based therapy than the Y93 wild type.

  7. Single Assay for Simultaneous Detection and Differential Identification of Human and Avian Influenza Virus Types, Subtypes, and Emergent Variants

    PubMed Central

    Metzgar, David; Myers, Christopher A.; Russell, Kevin L.; Faix, Dennis; Blair, Patrick J.; Brown, Jason; Vo, Scott; Swayne, David E.; Thomas, Colleen; Stenger, David A.; Lin, Baochuan; Malanoski, Anthony P.; Wang, Zheng; Blaney, Kate M.; Long, Nina C.; Schnur, Joel M.; Saad, Magdi D.; Borsuk, Lisa A.; Lichanska, Agnieszka M.; Lorence, Matthew C.; Weslowski, Brian; Schafer, Klaus O.; Tibbetts, Clark

    2010-01-01

    For more than four decades the cause of most type A influenza virus infections of humans has been attributed to only two viral subtypes, A/H1N1 or A/H3N2. In contrast, avian and other vertebrate species are a reservoir of type A influenza virus genome diversity, hosting strains representing at least 120 of 144 combinations of 16 viral hemagglutinin and 9 viral neuraminidase subtypes. Viral genome segment reassortments and mutations emerging within this reservoir may spawn new influenza virus strains as imminent epidemic or pandemic threats to human health and poultry production. Traditional methods to detect and differentiate influenza virus subtypes are either time-consuming and labor-intensive (culture-based) or remarkably insensitive (antibody-based). Molecular diagnostic assays based upon reverse transcriptase-polymerase chain reaction (RT-PCR) have short assay cycle time, and high analytical sensitivity and specificity. However, none of these diagnostic tests determine viral gene nucleotide sequences to distinguish strains and variants of a detected pathogen from one specimen to the next. Decision-quality, strain- and variant-specific pathogen gene sequence information may be critical for public health, infection control, surveillance, epidemiology, or medical/veterinary treatment planning. The Resequencing Pathogen Microarray (RPM-Flu) is a robust, highly multiplexed and target gene sequencing-based alternative to both traditional culture- or biomarker-based diagnostic tests. RPM-Flu is a single, simultaneous differential diagnostic assay for all subtype combinations of type A influenza viruses and for 30 other viral and bacterial pathogens that may cause influenza-like illness. These other pathogen targets of RPM-Flu may co-infect and compound the morbidity and/or mortality of patients with influenza. 
    The informative specificity of a single RPM-Flu test represents specimen-specific viral gene sequences as determinants of virus type, A/HN subtype, virulence, host-range, and resistance to antiviral agents. PMID:20140251
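
    The figure of 144 subtype combinations follows from pairing each of the 16 hemagglutinin subtypes with each of the 9 neuraminidase subtypes. A minimal sketch of that enumeration, with illustrative variable names and the conventional HxNy labels:

```python
from itertools import product

# 16 hemagglutinin (H1..H16) and 9 neuraminidase (N1..N9) subtypes, as in the abstract.
HEMAGGLUTININ = range(1, 17)
NEURAMINIDASE = range(1, 10)

subtypes = [f"H{h}N{n}" for h, n in product(HEMAGGLUTININ, NEURAMINIDASE)]

print(len(subtypes))         # total number of A/HN combinations
print(120 / len(subtypes))   # share of combinations reported in avian and other hosts
```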

  8. Single assay for simultaneous detection and differential identification of human and avian influenza virus types, subtypes, and emergent variants.

    PubMed

    Metzgar, David; Myers, Christopher A; Russell, Kevin L; Faix, Dennis; Blair, Patrick J; Brown, Jason; Vo, Scott; Swayne, David E; Thomas, Colleen; Stenger, David A; Lin, Baochuan; Malanoski, Anthony P; Wang, Zheng; Blaney, Kate M; Long, Nina C; Schnur, Joel M; Saad, Magdi D; Borsuk, Lisa A; Lichanska, Agnieszka M; Lorence, Matthew C; Weslowski, Brian; Schafer, Klaus O; Tibbetts, Clark

    2010-02-03

    For more than four decades the cause of most type A influenza virus infections of humans has been attributed to only two viral subtypes, A/H1N1 or A/H3N2. In contrast, avian and other vertebrate species are a reservoir of type A influenza virus genome diversity, hosting strains representing at least 120 of 144 combinations of 16 viral hemagglutinin and 9 viral neuraminidase subtypes. Viral genome segment reassortments and mutations emerging within this reservoir may spawn new influenza virus strains as imminent epidemic or pandemic threats to human health and poultry production. Traditional methods to detect and differentiate influenza virus subtypes are either time-consuming and labor-intensive (culture-based) or remarkably insensitive (antibody-based). Molecular diagnostic assays based upon reverse transcriptase-polymerase chain reaction (RT-PCR) have short assay cycle time, and high analytical sensitivity and specificity. However, none of these diagnostic tests determine viral gene nucleotide sequences to distinguish strains and variants of a detected pathogen from one specimen to the next. Decision-quality, strain- and variant-specific pathogen gene sequence information may be critical for public health, infection control, surveillance, epidemiology, or medical/veterinary treatment planning. The Resequencing Pathogen Microarray (RPM-Flu) is a robust, highly multiplexed and target gene sequencing-based alternative to both traditional culture- or biomarker-based diagnostic tests. RPM-Flu is a single, simultaneous differential diagnostic assay for all subtype combinations of type A influenza viruses and for 30 other viral and bacterial pathogens that may cause influenza-like illness. These other pathogen targets of RPM-Flu may co-infect and compound the morbidity and/or mortality of patients with influenza. 
The informative specificity of a single RPM-Flu test represents specimen-specific viral gene sequences as determinants of virus type, A/HN subtype, virulence, host-range, and resistance to antiviral agents.

  9. Whole-genome sequencing and genetic variant analysis of a Quarter Horse mare.

    PubMed

    Doan, Ryan; Cohen, Noah D; Sawyer, Jason; Ghaffari, Noushin; Johnson, Charlie D; Dindot, Scott V

    2012-02-17

    The catalog of genetic variants in the horse genome originates from a few select animals, the majority originating from the Thoroughbred mare used for the equine genome sequencing project. The purpose of this study was to identify genetic variants, including single nucleotide polymorphisms (SNPs), insertion/deletion polymorphisms (INDELs), and copy number variants (CNVs) in the genome of an individual Quarter Horse mare sequenced by next-generation sequencing. Using massively parallel paired-end sequencing, we generated 59.6 Gb of DNA sequence from a Quarter Horse mare resulting in an average of 24.7X sequence coverage. Reads were mapped to approximately 97% of the reference Thoroughbred genome. Unmapped reads were de novo assembled resulting in 19.1 Mb of new genomic sequence in the horse. Using a stringent filtering method, we identified 3.1 million SNPs, 193 thousand INDELs, and 282 CNVs. Genetic variants were annotated to determine their impact on gene structure and function. Additionally, we genotyped this Quarter Horse for mutations of known diseases and for variants associated with particular traits. Functional clustering analysis of genetic variants revealed that most of the genetic variation in the horse's genome was enriched in sensory perception, signal transduction, and immunity and defense pathways. This is the first sequencing of a horse genome by next-generation sequencing and the first genomic sequence of an individual Quarter Horse mare. We have increased the catalog of genetic variants for use in equine genomics by the addition of novel SNPs, INDELs, and CNVs. The genetic variants described here will be a useful resource for future studies of genetic variation regulating performance traits and diseases in equids.

  10. Detecting Genomic Clustering of Risk Variants from Sequence Data: Cases vs. Controls

    PubMed Central

    Schaid, Daniel J.; Sinnwell, Jason P.; McDonnell, Shannon K.; Thibodeau, Stephen N.

    2013-01-01

    As the ability to measure dense genetic markers approaches the limit of the DNA sequence itself, taking advantage of possible clustering of genetic variants in, and around, a gene would benefit genetic association analyses, and likely provide biological insights. The greatest benefit might be realized when multiple rare variants cluster in a functional region. Several statistical tests have been developed, one of which is based on the popular Kulldorff scan statistic for spatial clustering of disease. We extended another popular spatial clustering method – Tango’s statistic – to genomic sequence data. An advantage of Tango’s method is that it is rapid to compute, and when a single test statistic is computed, its distribution is well approximated by a scaled chi-square distribution, making computation of p-values very rapid. We compared the Type-I error rates and power of several clustering statistics, as well as the omnibus sequence kernel association test (SKAT). Although our version of Tango’s statistic, which we call the “Kernel Distance” statistic, took approximately half as long to compute as the Kulldorff scan statistic, it had slightly less power than the scan statistic. Our results showed that the Ionita-Laza version of Kulldorff’s scan statistic had the greatest power over a range of clustering scenarios. PMID:23842950

  11. Promises and pitfalls of Illumina sequencing for HIV resistance genotyping.

    PubMed

    Brumme, Chanson J; Poon, Art F Y

    2017-07-15

    Genetic sequencing ("genotyping") plays a critical role in the modern clinical management of HIV infection. This virus evolves rapidly within patients because of its error-prone reverse transcriptase and short generation time. Consequently, HIV variants with mutations that confer resistance to one or more antiretroviral drugs can emerge during sub-optimal treatment. There are now multiple HIV drug resistance interpretation algorithms that take the region of the HIV genome encoding the major drug targets as inputs; expert use of these algorithms can significantly improve clinical outcomes in HIV treatment. Next-generation sequencing has the potential to revolutionize HIV resistance genotyping by lowering the threshold at which rare but clinically significant HIV variants can be detected reproducibly, and by conferring improved cost-effectiveness in high-throughput scenarios. In this review, we discuss the relative merits and challenges of deploying the Illumina MiSeq instrument for clinical HIV genotyping. Copyright © 2016 Elsevier B.V. All rights reserved.

  12. [Pharmacogenomics study of 620 whole-exome sequencing: focusing on aspirin application].

    PubMed

    Yang, L; Lu, Y L; Wang, H J; Zhou, W H

    2016-05-01

    To investigate the allele frequencies of aspirin-response-related variants in different populations. The allele frequencies of reported clinically significant aspirin-response-related variants were evaluated based on 620 whole-exome sequencing (WES) datasets collected from 2013 to 2016 at the Children's Hospital of Fudan University. The local allele frequencies were then compared with the 1000 Genomes Project database using the χ² test. Thirty-eight clinically significant aspirin-response-related variants were detected in the 620 WES datasets. Ten (26%) of them were related to drug efficacy, while 28 (74%) were related to toxicity or adverse drug reaction (ADR). These variants were distributed across 33 genes. Twenty-three aspirin-related variants were analyzed further. The frequencies of 7 (rs1050891, rs6065, rs7862221, rs1065776, rs3818822, rs3775291 and rs1126643) did not differ significantly from those of the European and East Asian populations of the 1000 Genomes Project (P>0.01 for both); 10 (rs2228079, rs1613662, rs4523, rs28360521, rs1131882, rs1047626, rs3856806, rs2768759, rs7572857 and rs1126510) did not differ significantly from the East Asian population but differed significantly from the European population; 1 (rs2075797) did not differ significantly from the European population but differed from the East Asian population; and 5 variants (rs10279545, rs730012, rs16851030, rs1353411, rs1800469) differed from both the East Asian population (0.019, 0.058, 0.167, 0.452, 0.340 vs. 0.100, 0.151, 0.396, 0.568, 0.453; χ²=21.798, 20.400, 67.543, 16.531, 15.807; all P<0.01) and the European population (0.531, 0.312, 0.037, 0.179, 0.688; χ²=325.799, 92.877, 144.811, 156.471, 174.533; all P<0.01).
Most variants that have clinical significance in aspirin response are related to drug efficacy, drug toxicity, or ADR, indicating the urgency of variant screening in clinical practice. Significant population specificity was detected in aspirin-response-related variants in the local 620 WES datasets.
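
    The frequency comparison described in this abstract can be sketched as a Pearson χ² test on a 2×2 table of allele counts. This is a minimal illustration only: the abstract does not report allele counts or whether a continuity correction was applied, so the function name and the counts in the usage note are hypothetical and will not reproduce the published χ² values.

```python
import math


def chi_square_2x2(alt_a, total_a, alt_b, total_b):
    """Pearson chi-square test (1 d.f., no continuity correction).

    alt_a / total_a: alternate-allele count and total allele count in cohort A.
    alt_b / total_b: the same for cohort B.
    Assumes both alleles are observed at least once overall, so that no
    expected cell count is zero.
    """
    table = [[alt_a, total_a - alt_a], [alt_b, total_b - alt_b]]
    n = total_a + total_b
    row_sums = [sum(row) for row in table]
    col_sums = [table[0][j] + table[1][j] for j in range(2)]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    # With 1 degree of freedom the chi-square tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

    For example, `chi_square_2x2(24, 1240, 100, 1000)` compares a hypothetical local cohort (24 alternate alleles among 1240 chromosomes) with a hypothetical reference panel (100 among 1000); a P value below 0.01 would be read as a significant frequency difference under the threshold used in the abstract.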

  13. Use of next-generation sequencing to detect LDLR gene copy number variation in familial hypercholesterolemia

    PubMed Central

    Iacocca, Michael A.; Wang, Jian; Dron, Jacqueline S.; Robinson, John F.; McIntyre, Adam D.; Cao, Henian

    2017-01-01

    Familial hypercholesterolemia (FH) is a heritable condition of severely elevated LDL cholesterol, caused predominantly by autosomal codominant mutations in the LDL receptor gene (LDLR). In providing a molecular diagnosis for FH, the current procedure often includes targeted next-generation sequencing (NGS) panels for the detection of small-scale DNA variants, followed by multiplex ligation-dependent probe amplification (MLPA) in LDLR for the detection of whole-exon copy number variants (CNVs). The latter is essential because ∼10% of FH cases are attributed to CNVs in LDLR; accounting for them decreases false negative findings. Here, we determined the potential of replacing MLPA with bioinformatic analysis applied to NGS data, which uses depth-of-coverage analysis as its principal method to identify whole-exon CNV events. In analysis of 388 FH patient samples, there was 100% concordance in LDLR CNV detection between these two methods: 38 reported CNVs identified by MLPA were also successfully detected by our NGS method, while 350 samples negative for CNVs by MLPA were also negative by NGS. This result suggests that MLPA can be removed from the routine diagnostic screening for FH, significantly reducing associated costs, resources, and analysis time, while promoting more widespread assessment of this important class of mutations across diagnostic laboratories. PMID:28874442
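
    Depth-of-coverage analysis for whole-exon CNVs can be sketched as follows. This is an illustrative simplification, not the pipeline used in the study: the function name, the within-sample normalization, and the ratio thresholds (0.7 and 1.3) are assumptions chosen for the example.

```python
from statistics import median


def call_exon_cnvs(sample_depths, reference_depths, del_ratio=0.7, dup_ratio=1.3):
    """Label each exon 'deletion', 'duplication', or 'normal' from read depth.

    sample_depths: dict mapping exon name -> mean read depth in the test sample.
    reference_depths: list of dicts with the same keys, from CNV-negative samples.
    The thresholds are hypothetical; a heterozygous deletion is expected near a
    ratio of 0.5 and a single-copy duplication near 1.5.
    """
    def normalise(depths):
        # Divide by the sample total so that overall sequencing yield cancels out.
        total = sum(depths.values())
        return {exon: depth / total for exon, depth in depths.items()}

    sample_norm = normalise(sample_depths)
    refs_norm = [normalise(ref) for ref in reference_depths]

    calls = {}
    for exon, value in sample_norm.items():
        baseline = median(ref[exon] for ref in refs_norm)
        ratio = value / baseline
        if ratio < del_ratio:
            calls[exon] = "deletion"
        elif ratio > dup_ratio:
            calls[exon] = "duplication"
        else:
            calls[exon] = "normal"
    return calls
```

    In practice the normalization would draw on many more targets than a single gene, so that one altered exon does not noticeably shift the depth ratios of its neighbours.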

  14. Validation and Implementation of BRCA1/2 Variant Screening in Ovarian Tumor Tissue.

    PubMed

    de Jonge, Marthe M; Ruano, Dina; van Eijk, Ronald; van der Stoep, Nienke; Nielsen, Maartje; Wijnen, Juul T; Ter Haar, Natalja T; Baalbergen, Astrid; Bos, Monique E M M; Kagie, Marjolein J; Vreeswijk, Maaike P G; Gaarenstroom, Katja N; Kroep, Judith R; Smit, Vincent T H B M; Bosse, Tjalling; van Wezel, Tom; van Asperen, Christi J

    2018-06-21

    BRCA1/2 variant analysis in tumor tissue could streamline the referral of patients with epithelial ovarian, fallopian tube, or primary peritoneal cancer to genetic counselors and select patients who benefit most from targeted treatment. We investigated the sensitivity of BRCA1/2 variant analysis in formalin-fixed, paraffin-embedded tumor tissue using a combination of next-generation sequencing and copy number variant multiplex ligation-dependent probe amplification. After optimization using a training cohort of known BRCA1/2 mutation carriers, validation was performed in a prospective cohort (Clinical implementation Of BRCA1/2 screening in ovarian tumor tissue: COBRA-cohort) in which screening of BRCA1/2 tumor DNA and leukocyte germline DNA was performed in parallel. BRCA1 promoter hypermethylation and pedigree analysis were also performed. In the training cohort 45 of 46 germline BRCA1/2 variants were detected (sensitivity 98%). In the COBRA cohort (n=62), all six germline variants were identified (sensitivity 100%), together with five somatic BRCA1/2 variants and eight cases with BRCA1 promoter hypermethylation. In four BRCA1/2 variant-negative patients, surveillance or prophylactic management options were offered based on positive family histories. We conclude that BRCA1/2 formalin-fixed, paraffin-embedded tumor tissue analysis reliably detects BRCA1/2 variants. When taking family history of BRCA1/2 variant-negative patients into account, tumor BRCA1/2 variant screening allows more efficient selection of epithelial ovarian cancer patients for genetic counselling and simultaneously selects patients who benefit most from targeted treatment. Copyright © 2018. Published by Elsevier Inc.

  15. HPV-6 Molecular Variants Association With the Development of Genital Warts in Men: The HIM Study.

    PubMed

    Flores-Díaz, Ema; Sereday, Karen A; Ferreira, Silvaneide; Sirak, Bradley; Sobrinho, João Simão; Baggio, Maria Luiza; Galan, Lenice; Silva, Roberto C; Lazcano-Ponce, Eduardo; Giuliano, Anna R; Villa, Luisa L; Sichero, Laura

    2017-02-15

    Human papillomavirus type 6 (HPV-6) and HPV-11 are the etiological agents of approximately 90% of genital warts (GWs). The impact of HPV-6 genetic heterogeneity on persistence and progression to GWs remains undetermined. HPV Infection in Men (HIM) Study participants who had HPV-6 genital swabs and/or GWs preceded by a viable normal genital swab were analyzed. Variant characterization was performed by polymerase chain reaction sequencing, and samples were classified within lineages (A, B) and sublineages (B1, B2, B3, B4, B5). Country- and age-specific analyses were conducted for individual variants; odds ratios and 95% confidence intervals for the risk of GWs according to HPV-6 variants were calculated. B3 variants were most prevalent. The distribution of HPV-6 variants differed between countries and case status. The prevalence of HPV-6 B1 variants was increased in GWs and genital swabs of cases compared to controls. There was a difference in B1 and B3 variant detection in GWs and the preceding genital swab. We observed a significant association of HPV-6 B1 variant detection with GW development. HPV-6 B1 variants are more prevalent in genital swabs that precede GW development, and confer an increased risk for GW. Further research is warranted to understand the possible involvement of B1 variants in the progression to clinically relevant lesions. © The Author 2016. Published by Oxford University Press for the Infectious Diseases Society of America. All rights reserved. For permissions, e-mail: journals.permissions@oup.com.
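
    An odds ratio with a 95% confidence interval can be computed from a 2×2 table as sketched below. The abstract does not say how its estimates were obtained (they may come from a regression model), so the Woolf log-based interval and the function name here are assumptions for illustration.

```python
import math


def odds_ratio_ci(exposed_cases, unexposed_cases,
                  exposed_controls, unexposed_controls, z=1.96):
    """Odds ratio and Woolf (log-based) confidence interval.

    'Exposed' would mean, for instance, carrying a given variant sublineage.
    All four counts must be positive; z=1.96 gives a 95% interval.
    """
    a, b = exposed_cases, unexposed_cases
    c, d = exposed_controls, unexposed_controls
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper
```

    An interval that excludes 1 is conventionally read as a significant association between the exposure and the outcome.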

  16. Global characterization of copy number variants in epilepsy patients from whole genome sequencing

    PubMed Central

    Meloche, Caroline; Andrade, Danielle M.; Lafreniere, Ron G.; Gravel, Micheline; Spiegelman, Dan; Dionne-Laporte, Alexandre; Boelman, Cyrus; Hamdan, Fadi F.; Michaud, Jacques L.; Rouleau, Guy; Minassian, Berge A.; Bourque, Guillaume; Cossette, Patrick

    2018-01-01

    Epilepsy will affect nearly 3% of people at some point during their lifetime. Previous copy number variants (CNVs) studies of epilepsy have used array-based technology and were restricted to the detection of large or exonic events. In contrast, whole-genome sequencing (WGS) has the potential to more comprehensively profile CNVs but existing analytic methods suffer from limited accuracy. We show that this is in part due to the non-uniformity of read coverage, even after intra-sample normalization. To improve on this, we developed PopSV, an algorithm that uses multiple samples to control for technical variation and enables the robust detection of CNVs. Using WGS and PopSV, we performed a comprehensive characterization of CNVs in 198 individuals affected with epilepsy and 301 controls. For both large and small variants, we found an enrichment of rare exonic events in epilepsy patients, especially in genes with predicted loss-of-function intolerance. Notably, this genome-wide survey also revealed an enrichment of rare non-coding CNVs near previously known epilepsy genes. This enrichment was strongest for non-coding CNVs located within 100 Kbp of an epilepsy gene and in regions associated with changes in the gene expression, such as expression QTLs or DNase I hypersensitive sites. Finally, we report on 21 potentially damaging events that could be associated with known or new candidate epilepsy genes. Our results suggest that comprehensive sequence-based profiling of CNVs could help explain a larger fraction of epilepsy cases. PMID:29649218
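
    The idea of using multiple samples to control for bin-level technical variation can be sketched as a per-bin z-score against a set of reference samples. This is a simplified illustration and not the PopSV implementation; the function names and the threshold of 3 are assumptions.

```python
from statistics import mean, stdev


def bin_z_scores(sample_cov, reference_covs):
    """Z-score of each bin's coverage relative to the reference samples.

    sample_cov: list of normalized coverage values, one per genomic bin.
    reference_covs: list of such lists (at least two), one per reference sample.
    """
    z_scores = []
    for i, value in enumerate(sample_cov):
        ref_values = [ref[i] for ref in reference_covs]
        mu, sd = mean(ref_values), stdev(ref_values)
        # A bin with no variation across references carries no usable signal here.
        z_scores.append((value - mu) / sd if sd > 0 else 0.0)
    return z_scores


def flag_bins(z_scores, threshold=3.0):
    """Indices of bins whose coverage deviates strongly from the references."""
    return [i for i, z in enumerate(z_scores) if abs(z) > threshold]
```

    Because each bin is judged against its own spread across the reference samples, a bin with systematically uneven coverage is not flagged merely for differing from the genome-wide average.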

  17. Parallel targeted next generation sequencing of childhood and adult acute myeloid leukemia patients reveals uniform genomic profile of the disease.

    PubMed

    Marjanovic, Irena; Kostic, Jelena; Stanic, Bojana; Pejanovic, Nadja; Lucic, Bojana; Karan-Djurasevic, Teodora; Janic, Dragana; Dokmanovic, Lidija; Jankovic, Srdja; Vukovic, Nada Suvajdzic; Tomin, Dragica; Perisic, Ognjen; Rakocevic, Goran; Popovic, Milos; Pavlovic, Sonja; Tosic, Natasa

    2016-10-01

    The age-specific differences in the genetic mechanisms of myeloid leukemogenesis have been observed and studied previously. However, NGS technology has provided the possibility of obtaining a large amount of mutation data. We analyzed DNA samples from 20 childhood (cAML) and 20 adult AML (aAML) patients, using NGS targeted sequencing. The average coverage of high-quality sequences was 2981× per amplicon. A total of 412 (207 cAML, 205 aAML) variants in the coding regions were detected, of which only 122 (62 cAML and 60 aAML) were potentially protein-changing. Our results confirmed that AML contains a small number of genetic alterations (median 3 mutations/patient in both groups). The prevalence of the most frequent single-gene AML-associated mutations differed in cAML and aAML patient cohorts: IDH1 (0% cAML, 5% aAML), IDH2 (0% cAML, 10% aAML), NPM1 (10% cAML, 35% aAML). Additionally, potentially protein-changing variants were found in tyrosine kinase genes or genes encoding tyrosine kinase associated proteins (JAK3, ABL1, GNAQ, and EGFR) in cAML, while among aAML, the prevalence is directed towards variants in the methylation and histone modifying genes (IDH1, IDH2, and SMARCB1). Besides the uniform genomic profile of AML, specific genetic characteristics were detected exclusively in cAML and in aAML.

  18. Detection and assessment of copy number variation using PacBio long-read and Illumina sequencing in New Zealand dairy cattle.

    PubMed

    Couldrey, C; Keehan, M; Johnson, T; Tiplady, K; Winkelman, A; Littlejohn, M D; Scott, A; Kemper, K E; Hayes, B; Davis, S R; Spelman, R J

    2017-07-01

    Single nucleotide polymorphisms have been the DNA variant of choice for genomic prediction, largely because of the ease of single nucleotide polymorphism genotype collection. In contrast, structural variants (SV), which include copy number variants (CNV), translocations, insertions, and inversions, have eluded easy detection and characterization, particularly in nonhuman species. However, evidence increasingly shows that SV not only contribute a substantial proportion of genetic variation but also have significant influence on phenotypes. Here we present the discovery of CNV in a prominent New Zealand dairy bull using long-read PacBio (Pacific Biosciences, Menlo Park, CA) sequencing technology and the Sniffles SV discovery tool (version 0.0.1; https://github.com/fritzsedlazeck/Sniffles). The CNV identified from long reads were compared with CNV discovered in the same bull from Illumina sequencing using CNVnator (read depth-based tool; Illumina Inc., San Diego, CA) as a means of validation. Subsequently, further validation was undertaken using whole-genome Illumina sequencing of 556 cattle representing the wider New Zealand dairy cattle population. Very limited overlap was observed in CNV discovered from the 2 sequencing platforms, in part because of the differences in size of CNV detected. Only a few CNV were therefore able to be validated using this approach. However, the ability to use CNVnator to genotype the 557 cattle for copy number across all regions identified as putative CNV allowed a genome-wide assessment of transmission level of copy number based on pedigree. The more highly transmissible a putative CNV region was observed to be, the more likely the distribution of copy number was multimodal across the 557 sequenced animals. Furthermore, visual assessment of highly transmissible CNV regions provided evidence supporting the presence of CNV across the sequenced animals. 
This transmission-based approach was able to confirm a subset of CNV that segregates in the New Zealand dairy cattle population. Genome-wide identification and validation of CNV is an important step toward their inclusion in genomic selection strategies. The Authors. Published by the Federation of Animal Science Societies and Elsevier Inc. on behalf of the American Dairy Science Association®. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/3.0/).

  19. Sequencing of a Patient with Balanced Chromosome Abnormalities and Neurodevelopmental Disease Identifies Disruption of Multiple High Risk Loci by Structural Variation

    PubMed Central

    Blake, Jonathon; Riddell, Andrew; Theiss, Susanne; Gonzalez, Alexis Perez; Haase, Bettina; Jauch, Anna; Janssen, Johannes W. G.; Ibberson, David; Pavlinic, Dinko; Moog, Ute; Benes, Vladimir; Runz, Heiko

    2014-01-01

    Balanced chromosome abnormalities (BCAs) occur at a high frequency in healthy and diseased individuals, but cost-efficient strategies to identify BCAs and evaluate whether they contribute to a phenotype have not yet become widespread. Here we apply genome-wide mate-pair library sequencing to characterize structural variation in a patient with unclear neurodevelopmental disease (NDD) and complex de novo BCAs at the karyotype level. Nucleotide-level characterization of the clinically described BCA breakpoints revealed disruption of at least three NDD candidate genes (LINC00299, NUP205, PSMD14) that gave rise to abnormal mRNAs and could be assumed to be disease-causing. However, unbiased genome-wide analysis of the sequencing data for cryptic structural variation was key to revealing an additional submicroscopic inversion that truncates the schizophrenia- and bipolar disorder-associated brain transcription factor ZNF804A as an equally likely NDD-driving gene. Deep sequencing of fluorescent-sorted wild-type and derivative chromosomes confirmed the clinically undetected BCA. Moreover, deep sequencing further validated the high accuracy of mate-pair library sequencing in detecting structural variants larger than 10 kb, suggesting that this approach is powerful for clinical-grade genome-wide structural variant detection. Our study supports previous evidence for a role of ZNF804A in NDD and highlights the need for a more comprehensive assessment of structural variation in karyotypically abnormal individuals and patients with neurocognitive disease to avoid diagnostic deception. PMID:24625750

  20. Molecular Cytogenetics Guides Massively Parallel Sequencing of a Radiation-Induced Chromosome Translocation in Human Cells.

    PubMed

    Cornforth, Michael N; Anur, Pavana; Wang, Nicholas; Robinson, Erin; Ray, F Andrew; Bedford, Joel S; Loucas, Bradford D; Williams, Eli S; Peto, Myron; Spellman, Paul; Kollipara, Rahul; Kittler, Ralf; Gray, Joe W; Bailey, Susan M

    2018-05-11

    Chromosome rearrangements are large-scale structural variants that are recognized drivers of oncogenic events in cancers of all types. Cytogenetics allows for their rapid, genome-wide detection, but does not provide gene-level resolution. Massively parallel sequencing (MPS) promises DNA sequence-level characterization of the specific breakpoints involved, but is strongly influenced by bioinformatics filters that affect detection efficiency. We sought to characterize the breakpoint junctions of chromosomal translocations and inversions in the clonal derivatives of human cells exposed to ionizing radiation. Here, we describe the first successful use of DNA paired-end analysis to locate and sequence across the breakpoint junctions of a radiation-induced reciprocal translocation. The analyses employed, with varying degrees of success, several well-known bioinformatics algorithms, a task made difficult by the involvement of repetitive DNA sequences. As for underlying mechanisms, the results of Sanger sequencing suggested that the translocation in question was likely formed via microhomology-mediated non-homologous end joining (mmNHEJ). To our knowledge, this represents the first use of MPS to characterize the breakpoint junctions of a radiation-induced chromosomal translocation in human cells. Curiously, these same approaches were unsuccessful when applied to the analysis of inversions previously identified by directional genomic hybridization (dGH). We conclude that molecular cytogenetics continues to provide critical guidance for structural variant discovery, validation and in "tuning" analysis filters to enable robust breakpoint identification at the base pair level.

  1. Detection of Hemoglobin New York [β113 (G15) Val→Glu, GTG>GAG] in a Thai Woman by Capillary Electrophoresis.

    PubMed

    Panyasai, Sitthichai; Pornprasert, Sakorn

    2016-12-01

    Hemoglobin (Hb) New York [β113 (G15) Val→Glu, GTG>GAG] is a very rare β-chain variant found in Thailand. This variant is often missed by routine laboratory testing because Hb New York and Hb A have identical retention times on high performance liquid chromatography. We report here for the first time the detection of Hb New York in a Thai woman using capillary electrophoresis (CE). A peak of Hb New York was located ahead of Hb A at electrophoretic zone 11, at a level of 42.8%. DNA sequencing revealed the GTG>GAG mutation at codon 113 for Hb New York on one allele of the β-globin gene. Therefore, CE is highly effective in preventing misinterpretation of hemoglobin analysis in patients who are heterozygous for this variant.

  2. Spectrum of genetic variants of BRCA1 and BRCA2 in a German single center study.

    PubMed

    Meisel, Cornelia; Sadowski, Carolin Eva; Kohlstedt, Daniela; Keller, Katja; Stäritz, Franziska; Grübling, Nannette; Becker, Kerstin; Mackenroth, Luisa; Rump, Andreas; Schröck, Evelin; Arnold, Norbert; Wimberger, Pauline; Kast, Karin

    2017-05-01

    Determination of mutation status of BRCA1 and BRCA2 has become part of the clinical routine. However, the spectrum of genetic variants differs between populations. The aim of this study was to deliver a comprehensive description of all detected variants. In families fulfilling one of the German Consortium for Hereditary Breast and Ovarian Cancer (GC-HBOC) criteria for genetic testing, one affected individual was chosen for analysis. DNA of blood lymphocytes was amplified by PCR and prescreened by DHPLC. Aberrant fragments were sequenced. All coding exons and splice sites of BRCA1 and BRCA2 were analyzed. Screening for large rearrangements in both genes was performed by MLPA. Of 523 index patients, 121 (23.1%) were found to carry a pathogenic or likely pathogenic (class 4/5) mutation. A variant of unknown significance (VUS) was detected in 73/523 patients (13.9%). Two mutations, p.Gln1756Profs*74 and p.Cys61Gly, comprised 42.3% (n = 33/78) of all detected pathogenic mutations in BRCA1. Most of the other mutations were unique mutations. The most frequently detected mutation in BRCA2 was p.Val1283Lys (13.9%; n = 6/43). Altogether, 101 different neutral genetic variants were counted in BRCA1 (n = 35) and in BRCA2 (n = 66). The two most frequently detected mutations are founder mutations in Poland and the Czech Republic. More similarities seem to be shared with our direct neighbor countries than with other European countries. For comparison of the extended genotype, a shared database is needed.

  3. GRIDSS: sensitive and specific genomic rearrangement detection using positional de Bruijn graph assembly

    PubMed Central

    Do, Hongdo; Molania, Ramyar

    2017-01-01

    The identification of genomic rearrangements with high sensitivity and specificity using massively parallel sequencing remains a major challenge, particularly in precision medicine and cancer research. Here, we describe a new method for detecting rearrangements, GRIDSS (Genome Rearrangement IDentification Software Suite). GRIDSS is a multithreaded structural variant (SV) caller that performs efficient genome-wide break-end assembly prior to variant calling using a novel positional de Bruijn graph-based assembler. By combining assembly, split read, and read pair evidence using a probabilistic scoring, GRIDSS achieves high sensitivity and specificity on simulated, cell line, and patient tumor data, recently winning SV subchallenge #5 of the ICGC-TCGA DREAM8.5 Somatic Mutation Calling Challenge. On human cell line data, GRIDSS halves the false discovery rate compared to other recent methods while matching or exceeding their sensitivity. GRIDSS identifies nontemplate sequence insertions, microhomologies, and large imperfect homologies, estimates a quality score for each breakpoint, stratifies calls into high or low confidence, and supports multisample analysis. PMID:29097403
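
    The notion of a positional de Bruijn graph can be illustrated with a toy construction in which each node pairs a k-mer with the position where it is expected to occur. This is a hedged sketch of the general idea only, not GRIDSS's assembler; the function name and the input format (read sequence plus expected start position) are assumptions.

```python
from collections import defaultdict


def positional_de_bruijn(reads, k):
    """Build a toy positional de Bruijn graph.

    reads: iterable of (sequence, expected_start_position) pairs.
    Each node is a (k-mer, position) tuple, so the same k-mer seen at two
    different positions yields two distinct nodes.
    Returns (edges, weights): successor sets and per-node read support.
    """
    edges = defaultdict(set)
    weights = defaultdict(int)
    for seq, start in reads:
        nodes = [(seq[i:i + k], start + i) for i in range(len(seq) - k + 1)]
        for node in nodes:
            weights[node] += 1
        # Consecutive k-mers in a read overlap by k-1 bases and advance one position.
        for left, right in zip(nodes, nodes[1:]):
            edges[left].add(right)
    return edges, weights
```

    With the reads `("ACGTAC", 10)` and `("CGTACG", 11)` and `k=3`, the k-mer `ACG` appears as two separate nodes (positions 10 and 14), whereas a plain de Bruijn graph would merge them into one node and create a cycle.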

  4. Rapid molecular diagnostics of severe primary immunodeficiency determined by using targeted next-generation sequencing.

    PubMed

    Yu, Hui; Zhang, Victor Wei; Stray-Pedersen, Asbjørg; Hanson, Imelda Celine; Forbes, Lisa R; de la Morena, M Teresa; Chinn, Ivan K; Gorman, Elizabeth; Mendelsohn, Nancy J; Pozos, Tamara; Wiszniewski, Wojciech; Nicholas, Sarah K; Yates, Anne B; Moore, Lindsey E; Berge, Knut Erik; Sorte, Hanne; Bayer, Diana K; ALZahrani, Daifulah; Geha, Raif S; Feng, Yanming; Wang, Guoli; Orange, Jordan S; Lupski, James R; Wang, Jing; Wong, Lee-Jun

    2016-10-01

    Primary immunodeficiency diseases (PIDDs) are inherited disorders of the immune system. The most severe form, severe combined immunodeficiency (SCID), presents with profound deficiencies of T cells, B cells, or both at birth. If not treated promptly, affected patients usually do not live beyond infancy because of infections. Genetic heterogeneity of SCID frequently delays the diagnosis; a specific diagnosis is crucial for life-saving treatment and optimal management. We developed a next-generation sequencing (NGS)-based multigene-targeted panel for SCID and other severe PIDDs requiring rapid therapeutic actions in a clinical laboratory setting. The target gene capture/NGS assay provides an average read depth of approximately 1000×. The deep coverage facilitates simultaneous detection of single nucleotide variants and exonic copy number variants in one comprehensive assessment. Exons with insufficient coverage (<20× read depth) or high sequence homology (pseudogenes) are complemented by amplicon-based sequencing with specific primers to ensure 100% coverage of all targeted regions. Analysis of 20 patient samples with low T-cell receptor excision circle numbers on newborn screening or a positive family history or clinical suspicion of SCID or other severe PIDD identified deleterious mutations in 14 of them. Identified pathogenic variants included both single nucleotide variants and exonic copy number variants, such as hemizygous nonsense, frameshift, and missense changes in IL2RG; compound heterozygous changes in ATM, RAG1, and CIITA; homozygous changes in DCLRE1C and IL7R; and a heterozygous nonsense mutation in CHD7. High-throughput deep sequencing analysis with complete clinical validation greatly increases the diagnostic yield of severe primary immunodeficiency. Establishing a molecular diagnosis enables early immune reconstitution through prompt therapeutic intervention and guides management for improved long-term quality of life. 
Copyright © 2016 American Academy of Allergy, Asthma & Immunology. Published by Elsevier Inc. All rights reserved.
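
    The rule for deciding which exons need supplementary amplicon-based sequencing can be sketched as a simple filter. The 20× threshold comes from the abstract; the function name, exon identifiers, and the way homologous exons are supplied are hypothetical.

```python
def exons_needing_fill_in(exon_depths, homologous=frozenset(), min_depth=20):
    """Exons to complement with amplicon-based sequencing.

    exon_depths: dict mapping exon identifier -> read depth from the capture assay.
    homologous: exon identifiers known to share high sequence homology
                (for example with pseudogenes), flagged regardless of depth.
    An exon is selected if its depth is below min_depth or it is in 'homologous'.
    """
    return sorted(
        exon for exon, depth in exon_depths.items()
        if depth < min_depth or exon in homologous
    )
```

    For instance, with depths `{"GENE1_ex1": 1000, "GENE1_ex2": 12, "GENE2_ex1": 20}` only `GENE1_ex2` falls below the threshold, since a depth of exactly 20× is not considered insufficient under the "<20×" criterion.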

  5. Evaluation of exome variants using the Ion Proton Platform to sequence error-prone regions.

    PubMed

    Seo, Heewon; Park, Yoomi; Min, Byung Joo; Seo, Myung Eui; Kim, Ju Han

    2017-01-01

    The Ion Proton sequencer from Thermo Fisher accurately determines sequence variants from target regions with a rapid turnaround time at a low cost. However, misleading variant-calling errors can occur. We performed a systematic evaluation and manual curation of read-level alignments for the 675 ultrarare variants reported by the Ion Proton sequencer in 27 whole-exome sequencing datasets but not present in either the 1000 Genomes Project or the Exome Aggregation Consortium. We classified positive variant calls into 393 highly likely false positives, 126 likely false positives, and 156 likely true positives, which comprised 58.2%, 18.7%, and 23.1% of the variants, respectively. We identified four distinct error patterns of variant calling that may be bioinformatically corrected when using different strategies: simplicity region, SNV cluster, peripheral sequence read, and base inversion. Local de novo assembly successfully corrected 201 (38.7%) of the 519 highly likely or likely false positives. We also demonstrate that the two sequencing kits from Thermo Fisher (the Ion PI Sequencing 200 kit V3 and the Ion PI Hi-Q kit) exhibit different error profiles across different error types. A refined calling algorithm with better polymerase may improve the performance of the Ion Proton sequencing platform.

  6. Detection of a cfr(B) Variant in German Enterococcus faecium Clinical Isolates and the Impact on Linezolid Resistance in Enterococcus spp.

    PubMed

    Bender, Jennifer K; Fleige, Carola; Klare, Ingo; Fiedler, Stefan; Mischnik, Alexander; Mutters, Nico T; Dingle, Kate E; Werner, Guido

    2016-01-01

    The National Reference Centre for Staphylococci and Enterococci in Germany has received an increasing number of clinical linezolid-resistant E. faecium isolates in recent years. Five isolates harbored a cfr(B) variant gene locus the product of which is capable of conferring linezolid resistance. The cfr(B)-like methyltransferase gene was also detected in Clostridium difficile. Antimicrobial susceptibility was determined for cfr(B)-positive and linezolid-resistant E. faecium isolates and two isogenic C. difficile strains. All strains were subjected to whole genome sequencing and analyzed with respect to mutations in the 23S rDNA, rplC, rplD and rplV genes and integration sites of the cfr(B) variant locus. To evaluate methyltransferase function, the cfr(B) variant of Enterococcus and Clostridium was expressed in both E. coli and Enterococcus spp. Ribosomal target site mutations were detected in E. faecium strains but absent in clostridia. Sequencing revealed 99.9% identity between cfr(B) of Enterococcus and cfr of Clostridium. The methyltransferase gene is encoded by transposon Tn6218 which was present in C. difficile Ox3196, truncated in some E. faecium and absent in C. difficile Ox3206. The latter finding explains the lack of linezolid and chloramphenicol resistance in C. difficile Ox3206 and demonstrates for the first time a direct correlation of elevated linezolid MICs in C. difficile upon cfr acquisition. Tn6218 insertion sites revealed novel target loci for integration, both within the bacterial chromosome and as an integral part of plasmids. Importantly, the very first plasmid-association of a cfr(B) variant was observed. Although we failed to measure cfr(B)-mediated resistance in transformed laboratory strains the occurrence of the multidrug resistance gene cfr on putatively highly mobile and/or extrachromosomal DNA in clinical isolates is worrisome with respect to dissemination of antibiotic resistances.

  7. Detection of a cfr(B) Variant in German Enterococcus faecium Clinical Isolates and the Impact on Linezolid Resistance in Enterococcus spp.

    PubMed Central

    Fleige, Carola; Klare, Ingo; Fiedler, Stefan; Mischnik, Alexander; Mutters, Nico T.; Dingle, Kate E.; Werner, Guido

    2016-01-01

    The National Reference Centre for Staphylococci and Enterococci in Germany has received an increasing number of clinical linezolid-resistant E. faecium isolates in recent years. Five isolates harbored a cfr(B) variant gene locus the product of which is capable of conferring linezolid resistance. The cfr(B)-like methyltransferase gene was also detected in Clostridium difficile. Antimicrobial susceptibility was determined for cfr(B)-positive and linezolid-resistant E. faecium isolates and two isogenic C. difficile strains. All strains were subjected to whole genome sequencing and analyzed with respect to mutations in the 23S rDNA, rplC, rplD and rplV genes and integration sites of the cfr(B) variant locus. To evaluate methyltransferase function, the cfr(B) variant of Enterococcus and Clostridium was expressed in both E. coli and Enterococcus spp. Ribosomal target site mutations were detected in E. faecium strains but absent in clostridia. Sequencing revealed 99.9% identity between cfr(B) of Enterococcus and cfr of Clostridium. The methyltransferase gene is encoded by transposon Tn6218 which was present in C. difficile Ox3196, truncated in some E. faecium and absent in C. difficile Ox3206. The latter finding explains the lack of linezolid and chloramphenicol resistance in C. difficile Ox3206 and demonstrates for the first time a direct correlation of elevated linezolid MICs in C. difficile upon cfr acquisition. Tn6218 insertion sites revealed novel target loci for integration, both within the bacterial chromosome and as an integral part of plasmids. Importantly, the very first plasmid-association of a cfr(B) variant was observed. Although we failed to measure cfr(B)-mediated resistance in transformed laboratory strains the occurrence of the multidrug resistance gene cfr on putatively highly mobile and/or extrachromosomal DNA in clinical isolates is worrisome with respect to dissemination of antibiotic resistances. PMID:27893790

  8. Canine olfactory receptor gene polymorphism and its relation to odor detection performance by sniffer dogs.

    PubMed

    Lesniak, Anna; Walczak, Marta; Jezierski, Tadeusz; Sacharczuk, Mariusz; Gawkowski, Maciej; Jaszczak, Kazimierz

    2008-01-01

    The outstanding sensitivity of the canine olfactory system has been acknowledged by using sniffer dogs in military and civilian service for detection of a variety of odors. It is hypothesized that the canine olfactory ability is determined by polymorphisms in olfactory receptor (OR) genes. We investigated 5 OR genes for polymorphic sites which might affect the olfactory ability of service dogs in different fields of specific substance detection. All investigated OR DNA sequences proved to have allelic variants, the majority of which lead to protein sequence alteration. Homozygous individuals at 2 gene loci significantly differed in their detection skills from other genotypes. This suggests a role of specific alleles in odor detection and a linkage between single-nucleotide polymorphism and odor recognition efficiency.

  9. A programmable method for massively parallel targeted sequencing

    PubMed Central

    Hopmans, Erik S.; Natsoulis, Georges; Bell, John M.; Grimes, Susan M.; Sieh, Weiva; Ji, Hanlee P.

    2014-01-01

    We have developed a targeted resequencing approach referred to as Oligonucleotide-Selective Sequencing. In this study, we report a series of significant improvements and novel applications of this method whereby the surface of a sequencing flow cell is modified in situ to capture specific genomic regions of interest from a sample and then sequenced. These improvements include a fully automated targeted sequencing platform through the use of a standard Illumina cBot fluidics station. Targeting optimization increased the yield of total on-target sequencing data 2-fold compared to the previous iteration, while simultaneously increasing the percentage of reads that could be mapped to the human genome. The described assays cover up to 1421 genes with a total coverage of 5.5 Megabases (Mb). We demonstrate a 10-fold abundance uniformity of greater than 90% in 1 log distance from the median and a targeting rate of up to 95%. We also sequenced continuous genomic loci up to 1.5 Mb while simultaneously genotyping SNPs and genes. Variants with low minor allele fraction were sensitively detected at levels of 5%. Finally, we determined the exact breakpoint sequence of cancer rearrangements. Overall, this approach has high performance for selective sequencing of genome targets, configuration flexibility and variant calling accuracy. PMID:24782526

  10. A Novel Center Star Multiple Sequence Alignment Algorithm Based on Affine Gap Penalty and K-Band

    NASA Astrophysics Data System (ADS)

    Zou, Quan; Shan, Xiao; Jiang, Yi

    Multiple sequence alignment is one of the most important topics in computational biology, but existing methods cannot yet handle large data sets. With the development of copy-number variant (CNV) and single nucleotide polymorphism (SNP) research, many researchers want to align large numbers of similar sequences to detect CNVs and SNPs. In this paper, we propose a novel multiple sequence alignment algorithm based on affine gap penalty and k-band. It can align more quickly and accurately, which will be helpful for mining CNVs and SNPs. Experiments demonstrate the performance of our algorithm.
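
    To make the two ingredients named in this abstract concrete, the sketch below scores a pairwise global alignment with an affine gap penalty while filling only cells within k of the main diagonal, and then picks a center sequence as the one with the best total pairwise score, which is the first step of the center star method. It is an illustration under stated assumptions, not the authors' implementation: the function names, the scoring values (match 1, mismatch -1, gap open -2, gap extend -1) and the convention that a gap of length L scores gap_open + L * gap_extend are all assumed here.

```python
NEG_INF = float("-inf")


def banded_affine_score(a, b, k, match=1, mismatch=-1, gap_open=-2, gap_extend=-1):
    """Global alignment score with affine gaps, restricted to |i - j| <= k.

    M: best score ending in a (mis)match, X: ending in a gap in b,
    Y: ending in a gap in a. Full matrices are allocated for clarity;
    only the cells inside the band are ever computed.
    """
    n, m = len(a), len(b)
    if abs(n - m) > k:
        raise ValueError("band width k must be at least the length difference")
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(n + 1):
        for j in range(max(0, i - k), min(m, i + k) + 1):
            if i == 0 and j == 0:
                continue
            if i > 0 and j > 0:
                sub = match if a[i - 1] == b[j - 1] else mismatch
                M[i][j] = max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1]) + sub
            if i > 0:  # a[i-1] aligned to a gap: open a new gap or extend one
                X[i][j] = max(M[i - 1][j] + gap_open,
                              Y[i - 1][j] + gap_open,
                              X[i - 1][j]) + gap_extend
            if j > 0:  # b[j-1] aligned to a gap: open a new gap or extend one
                Y[i][j] = max(M[i][j - 1] + gap_open,
                              X[i][j - 1] + gap_open,
                              Y[i][j - 1]) + gap_extend
    return max(M[n][m], X[n][m], Y[n][m])


def pick_center(seqs, k):
    """Index of the sequence with the highest summed pairwise score."""
    totals = [
        sum(banded_affine_score(s, t, k) for j, t in enumerate(seqs) if j != i)
        for i, s in enumerate(seqs)
    ]
    return max(range(len(seqs)), key=totals.__getitem__)
```

    The band is what makes this attractive for highly similar sequences: when the optimal alignment stays close to the diagonal, the work per pair grows with k times the sequence length rather than with the product of the two lengths. A production version would also store only the band and would merge the pairwise alignments against the center into one multiple alignment, which this sketch omits.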

  11. Clinical and epidemiological characterization of a lymphogranuloma venereum outbreak in Madrid, Spain: co-circulation of two variants.

    PubMed

    Rodríguez-Domínguez, M; Puerta, T; Menéndez, B; González-Alba, J M; Rodríguez, C; Hellín, T; Vera, M; González-Sainz, F J; Clavo, P; Villa, M; Cantón, R; Del Romero, J; Galán, J C

    2014-03-01

    The lymphogranuloma venereum (LGV) outbreak described in the Netherlands in 2003 increased the interest in the genotyping of Chlamydia trachomatis. Although international surveillance programmes were implemented, these studies slowly decreased in the following years. Now data have revealed a new accumulation of LGV cases in those European countries with extended surveillance programmes. Between March 2009 and November 2011, a study was carried out to detect LGV cases in Madrid. The study was based on screening of C. trachomatis using commercial kits, followed by real-time pmpH-PCR discriminating LGV strains, and finally the ompA gene was sequenced for phylogenetic reconstruction. Ninety-four LGV infections were identified. The number of cases increased from 10 to 30 and then to 54 during 2009-2011. Incidence of LGV was strongly associated with men who have sex with men; but in 2011, LGV cases were described in women and heterosexual men. Sixty-nine patients were also human immunodeficiency virus (HIV) positive, with detectable viral loads at the moment of LGV diagnosis, suggesting a high risk of co-transmission. In fact, in four patients the diagnosis of HIV was simultaneous with LGV infection. The conventional treatment with doxycycline was prescribed in 75 patients, although in three patients the treatment failed. The sequencing of the ompA gene permitted identification of two independent transmission nodes. One was constituted by 25 sequences identical to the L2b variant, and a second node included 37 sequences identical to L2. This epidemiological situation characterized by the co-circulation of two LGV variants has not been previously described, reinforcing the need for screening and genotyping of LGV strains. © 2013 The Authors Clinical Microbiology and Infection © 2013 European Society of Clinical Microbiology and Infectious Diseases.

  12. Genotyping microarray (gene chip) for the ABCR (ABCA4) gene.

    PubMed

    Jaakson, K; Zernant, J; Külm, M; Hutchinson, A; Tonisson, N; Glavac, D; Ravnik-Glavac, M; Hawlina, M; Meltzer, M R; Caruso, R C; Testa, F; Maugeri, A; Hoyng, C B; Gouras, P; Simonelli, F; Lewis, R A; Lupski, J R; Cremers, F P M; Allikmets, R

    2003-11-01

    Genetic variation in the ABCR (ABCA4) gene has been associated with five distinct retinal phenotypes, including Stargardt disease/fundus flavimaculatus (STGD/FFM), cone-rod dystrophy (CRD), and age-related macular degeneration (AMD). Comparative genetic analyses of ABCR variation and diagnostics have been complicated by substantial allelic heterogeneity and by differences in screening methods. To overcome these limitations, we designed a genotyping microarray (gene chip) for ABCR that includes all approximately 400 disease-associated and other variants currently described, enabling simultaneous detection of all known ABCR variants. The ABCR genotyping microarray (the ABCR400 chip) was constructed by the arrayed primer extension (APEX) technology. Each sequence change in ABCR was included on the chip by synthesis and application of sequence-specific oligonucleotides. We validated the chip by screening 136 confirmed STGD patients and 96 healthy controls, each of whom we had analyzed previously by single strand conformation polymorphism (SSCP) technology and/or heteroduplex analysis. The microarray was >98% effective in determining the existing genetic variation and was comparable to direct sequencing in that it yielded many sequence changes undetected by SSCP. In STGD patient cohorts, the efficiency of the array to detect disease-associated alleles was between 54% and 78%, depending on the ethnic composition and degree of clinical and molecular characterization of a cohort. In addition, chip analysis suggested a high carrier frequency (up to 1:10) of ABCR variants in the general population. The ABCR genotyping microarray is a robust, cost-effective, and comprehensive screening tool for variation in one gene in which mutations are responsible for a substantial fraction of retinal disease. The ABCR chip is a prototype for the next generation of screening and diagnostic tools in ophthalmic genetics, bridging clinical and scientific research. 
Copyright 2003 Wiley-Liss, Inc.

  13. High-throughput interpretation of gene structure changes in human and nonhuman resequencing data, using ACE

    PubMed Central

    Majoros, William H.; Campbell, Michael S.; Holt, Carson; DeNardo, Erin K.; Ware, Doreen; Allen, Andrew S.; Yandell, Mark; Reddy, Timothy E.

    2017-01-01

    Abstract Motivation: The accurate interpretation of genetic variants is critical for characterizing genotype–phenotype associations. Because the effects of genetic variants can depend strongly on their local genomic context, accurate genome annotations are essential. Furthermore, as some variants have the potential to disrupt or alter gene structure, variant interpretation efforts stand to gain from the use of individualized annotations that account for differences in gene structure between individuals or strains. Results: We describe a suite of software tools for identifying possible functional changes in gene structure that may result from sequence variants. ACE (‘Assessing Changes to Exons’) converts phased genotype calls to a collection of explicit haplotype sequences, maps transcript annotations onto them, detects gene-structure changes and their possible repercussions, and identifies several classes of possible loss of function. Novel transcripts predicted by ACE are commonly supported by spliced RNA-seq reads, and can be used to improve read alignment and transcript quantification when an individual-specific genome sequence is available. Using publicly available RNA-seq data, we show that ACE predictions confirm earlier results regarding the quantitative effects of nonsense-mediated decay, and we show that predicted loss-of-function events are highly concordant with patterns of intolerance to mutations across the human population. ACE can be readily applied to diverse species including animals and plants, making it a broadly useful tool for use in eukaryotic population-based resequencing projects, particularly for assessing the joint impact of all variants at a locus. Availability and Implementation: ACE is written in open-source C++ and Perl and is available from geneprediction.org/ACE Contact: myandell@genetics.utah.edu or tim.reddy@duke.edu Supplementary information: Supplementary information is available at Bioinformatics online. PMID:28011790
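
    The first step described in this abstract, turning phased genotype calls into explicit haplotype sequences, can be illustrated with a small sketch. This is not ACE code and does not use its interfaces: the function name, the tuple layout (0-based position, reference base, alternate base, phased genotype such as "0|1") and the restriction to single-nucleotide substitutions are assumptions made for illustration. Handling indels would require coordinate shifting, which is omitted here.

```python
def build_haplotypes(reference, phased_snvs):
    """Apply phased single-nucleotide variants to a reference sequence.

    phased_snvs is a list of (pos0, ref, alt, genotype) tuples, where pos0 is
    a 0-based position and genotype is a phased diploid call such as "0|1"
    (allele on haplotype 1 | allele on haplotype 2).
    Returns the two haplotype sequences as strings.
    """
    haplotypes = [list(reference), list(reference)]
    for pos0, ref, alt, genotype in phased_snvs:
        if reference[pos0] != ref:
            raise ValueError(f"reference mismatch at position {pos0}")
        alleles = genotype.split("|")
        if len(alleles) != 2:
            raise ValueError(f"expected a phased diploid genotype, got {genotype!r}")
        for hap_index, allele in enumerate(alleles):
            if allele == "1":  # this haplotype carries the alternate base
                haplotypes[hap_index][pos0] = alt
    return "".join(haplotypes[0]), "".join(haplotypes[1])


# Hypothetical toy input: one heterozygous and one homozygous-alternate call.
hap1, hap2 = build_haplotypes(
    "ACGTAC",
    [(1, "C", "T", "0|1"), (4, "A", "G", "1|1")],
)
```

    Keeping the two haplotypes separate is what allows the joint effect of several variants on the same copy of a gene to be assessed, as the abstract emphasizes.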

  14. High-throughput interpretation of gene structure changes in human and nonhuman resequencing data, using ACE.

    PubMed

    Majoros, William H; Campbell, Michael S; Holt, Carson; DeNardo, Erin K; Ware, Doreen; Allen, Andrew S; Yandell, Mark; Reddy, Timothy E

    2017-05-15

    The accurate interpretation of genetic variants is critical for characterizing genotype-phenotype associations. Because the effects of genetic variants can depend strongly on their local genomic context, accurate genome annotations are essential. Furthermore, as some variants have the potential to disrupt or alter gene structure, variant interpretation efforts stand to gain from the use of individualized annotations that account for differences in gene structure between individuals or strains. We describe a suite of software tools for identifying possible functional changes in gene structure that may result from sequence variants. ACE ('Assessing Changes to Exons') converts phased genotype calls to a collection of explicit haplotype sequences, maps transcript annotations onto them, detects gene-structure changes and their possible repercussions, and identifies several classes of possible loss of function. Novel transcripts predicted by ACE are commonly supported by spliced RNA-seq reads, and can be used to improve read alignment and transcript quantification when an individual-specific genome sequence is available. Using publicly available RNA-seq data, we show that ACE predictions confirm earlier results regarding the quantitative effects of nonsense-mediated decay, and we show that predicted loss-of-function events are highly concordant with patterns of intolerance to mutations across the human population. ACE can be readily applied to diverse species including animals and plants, making it a broadly useful tool for use in eukaryotic population-based resequencing projects, particularly for assessing the joint impact of all variants at a locus. ACE is written in open-source C++ and Perl and is available from geneprediction.org/ACE. Contact: myandell@genetics.utah.edu or tim.reddy@duke.edu. Supplementary information is available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved.

  15. Detection of ATM germline variants by the p53 mitotic centrosomal localization test in BRCA1/2-negative patients with early-onset breast cancer.

    PubMed

    Prodosmo, Andrea; Buffone, Amelia; Mattioni, Manlio; Barnabei, Agnese; Persichetti, Agnese; De Leo, Aurora; Appetecchia, Marialuisa; Nicolussi, Arianna; Coppa, Anna; Sciacchitano, Salvatore; Giordano, Carolina; Pinnarò, Paola; Sanguineti, Giuseppe; Strigari, Lidia; Alessandrini, Gabriele; Facciolo, Francesco; Cosimelli, Maurizio; Grazi, Gian Luca; Corrado, Giacomo; Vizza, Enrico; Giannini, Giuseppe; Soddu, Silvia

    2016-09-06

    Variant ATM heterozygotes have an increased risk of developing cancer, cardiovascular diseases, and diabetes. Costs and time of sequencing and ATM variant complexity make large-scale, general population screenings not cost-effective yet. Recently, we developed a straightforward, rapid, and inexpensive test based on p53 mitotic centrosomal localization (p53-MCL) in peripheral blood mononuclear cells (PBMCs) that diagnoses mutant ATM zygosity and recognizes tumor-associated ATM polymorphisms. Fresh PBMCs from 496 cancer patients were analyzed by p53-MCL: 90 cases with familial BRCA1/2-positive and -negative breast and/or ovarian cancer, 337 with sporadic cancers (ovarian, lung, colon, and post-menopausal breast cancers), and 69 with breast/thyroid cancer. Variants were confirmed by ATM sequencing. A total of seven individuals with ATM variants were identified, 5/65 (7.7%) in breast cancer cases of familial breast and/or ovarian cancer and 2/69 (2.9%) in breast/thyroid cancer. No variant ATM carriers were found among the other cancer cases. Excluding a single case in which both BRCA1 and ATM were mutated, no p53-MCL alterations were observed in BRCA1/2-positive cases. These data validate p53-MCL as a reliable and specific test for germline ATM variants, confirm ATM as a breast cancer susceptibility gene, and highlight a possible association with breast/thyroid cancers.

  16. THAP1/DYT6 sequence variants in non-DYT1 early-onset primary dystonia in China and their effects on RNA expression.

    PubMed

    Cheng, Fu Bo; Ozelius, Laurie J; Wan, Xin Hua; Feng, Jia Chun; Ma, Ling Yan; Yang, Ying Mai; Wang, Lin

    2012-02-01

    Mutations in the THAP1 gene were recently identified as the cause of DYT6 primary dystonia. More than 40 mutations in this gene have been described in different populations. However, no previous report has identified sequence variations that affect the transcript process of the THAP1 gene. In addition, the mutation frequency in Chinese early-onset primary dystonia has not been well characterized. One hundred and two unrelated patients with non-DYT1 early-onset primary dystonia (age at onset <26 years), family members of participants with mutations, and 200 neurologically normal controls were screened for THAP1 gene mutations. The effects of the identified mutations on RNA expression were analyzed using semi-quantitative real-time PCR. Seven sequence variants (c.63_66del TTTC, c.161G>T, c.224A>T, c.267G>A, c.339T>C, c.449A>C, and c.539T>C) were identified in this group of patients (6.9%). In this cohort, 15 subjects (seven unrelated patients and eight family members) were detected to have THAP1 sequence variants. Among these 15 subjects, 11 were manifested (penetrance of DYT6 was 73.3%) and seven presented with craniocervical involvement (63.6%). However, one patient manifested paroxysmal headshake, and one presented with essential hand tremor. Semi-quantitative real-time PCR indicated that a novel silent mutation (c.267G>A) decreased the expression of THAP1 in human lymphocytes. Our findings indicated that THAP1 sequence variants are not common in non-DYT1 early-onset primary dystonia in China and that the clinical manifestation may vary. One silent mutation (c.267G>A) was shown to affect THAP1 expression.

  17. Principles and Recommendations for Standardizing the Use of the Next-Generation Sequencing Variant File in Clinical Settings.

    PubMed

    Lubin, Ira M; Aziz, Nazneen; Babb, Lawrence J; Ballinger, Dennis; Bisht, Himani; Church, Deanna M; Cordes, Shaun; Eilbeck, Karen; Hyland, Fiona; Kalman, Lisa; Landrum, Melissa; Lockhart, Edward R; Maglott, Donna; Marth, Gabor; Pfeifer, John D; Rehm, Heidi L; Roy, Somak; Tezak, Zivana; Truty, Rebecca; Ullman-Cullere, Mollie; Voelkerding, Karl V; Worthey, Elizabeth A; Zaranek, Alexander W; Zook, Justin M

    2017-05-01

    A national workgroup convened by the Centers for Disease Control and Prevention identified principles and made recommendations for standardizing the description of sequence data contained within the variant file generated during the course of clinical next-generation sequence analysis for diagnosing human heritable conditions. The specifications for variant files were initially developed to be flexible with regard to content representation to support a variety of research applications. This flexibility permits variation with regard to how sequence findings are described and this depends, in part, on the conventions used. For clinical laboratory testing, this poses a problem because these differences can compromise the capability to compare sequence findings among laboratories to confirm results and to query databases to identify clinically relevant variants. To provide for a more consistent representation of sequence findings described within variant files, the workgroup made several recommendations that considered alignment to a common reference sequence, variant caller settings, use of genomic coordinates, and gene and variant naming conventions. These recommendations were considered with regard to the existing variant file specifications presently used in the clinical setting. Adoption of these recommendations is anticipated to reduce the potential for ambiguity in describing sequence findings and facilitate the sharing of genomic data among clinical laboratories and other entities. Copyright © 2017 American Society for Investigative Pathology and the Association for Molecular Pathology. Published by Elsevier Inc. All rights reserved.
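
    The comparison problem this abstract describes, the same finding written differently by different laboratories, can be illustrated with a small sketch that reduces a variant to a key built from the reference assembly and genomic coordinates. This is not the workgroup's specification: the class name, field names, the chromosome-naming rule and the example coordinates are all hypothetical and chosen only to show why a common reference sequence and coordinate convention make records comparable.

```python
from typing import NamedTuple


class VariantKey(NamedTuple):
    """Hypothetical comparison key: assembly plus genomic coordinates."""
    assembly: str
    chrom: str
    pos: int
    ref: str
    alt: str


def normalize_chrom(name):
    """Map spellings such as 'chr7' and '7', or 'chrM' and 'MT', to one form."""
    name = name.strip()
    if name.lower().startswith("chr"):
        name = name[3:]
    name = name.upper()
    return "MT" if name in {"M", "MT"} else name


def variant_key(assembly, chrom, pos, ref, alt):
    """Build a key so that equivalent records compare equal."""
    return VariantKey(assembly, normalize_chrom(chrom), int(pos), ref.upper(), alt.upper())


# Two hypothetical records for the same change, written in different styles.
lab_a = variant_key("GRCh37", "chr7", "1000", "atct", "a")
lab_b = variant_key("GRCh37", "7", 1000, "ATCT", "A")
# The same coordinates on a different assembly are not the same variant.
lab_c = variant_key("GRCh38", "7", 1000, "ATCT", "A")
```

    Including the assembly in the key reflects the abstract's point about alignment to a common reference sequence: coordinates alone are ambiguous unless both laboratories state which reference they used.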

  18. Shewanella species as the origin of blaOXA-48 genes: insights into gene diversity, associated phenotypes and possible transfer mechanisms.

    PubMed

    Tacão, Marta; Araújo, Susana; Vendas, Maria; Alves, Artur; Henriques, Isabel

    2018-03-01

    Chromosome-encoded beta-lactamases of Shewanella spp. have been indicated as probable progenitors of blaOXA-48-like genes. However, these have been detected in few Shewanella spp. and dissemination mechanisms are unclear. Thus, our main objective was to confirm the role of Shewanella species as progenitors of blaOXA-48-like genes. In silico analysis of Shewanella genomes was performed to detect blaOXA-48-like genes and context, and 43 environmental Shewanella spp. were characterised. Clonal relatedness was determined by BOX-PCR. Phylogenetic affiliation was assessed by 16S rDNA and gyrB sequencing. Antibiotic susceptibility phenotypes were determined. The blaOXA-48-like genes and genetic context were inspected by PCR, hybridisation and sequence analysis. Gene variants were cloned in Escherichia coli and MICs were determined. Shewanella isolates were screened for integrons, plasmids and insertion sequences. Analysis of Shewanella spp. genomes showed that putative blaOXA-48-like is present in the majority and in an identical context. Isolates presenting unique BOX profiles affiliated with 11 Shewanella spp. blaOXA-48-like genes were detected in 22 isolates from 6 species. Genes encoded enzymes identical to OXA-48, OXA-204, OXA-181, and 7 new variants differing from OXA-48 from 2 to 82 amino acids. IS1999 was detected in 24 isolates, although not in the vicinity of blaOXA-48 genes. Recombinant E. coli strains presented altered MICs. The presence/absence of blaOXA-48-like genes was species-related. Gene variants encoded enzymes with hydrolytic spectra similar to OXA-48-like from non-shewanellae. From the mobile elements previously described in association with blaOXA-48-like genes, only the IS1999 was found in Shewanella, which indicates its relevance in blaOXA-48-like genes transfer to other hosts. Copyright © 2017 Elsevier B.V. and International Society of Chemotherapy. All rights reserved.

  19. Molecular screening of blue mussels indicated high mid-summer prevalence of human genogroup II Noroviruses, including the pandemic "GII.4 2012" variants in UK coastal waters during 2013.

    PubMed

    Biswas, Subhajit; Jackson, Philippa; Shannon, Rebecca; Dulwich, Katherine; Sukla, Soumi; Dixon, Ronald A

    This molecular study is the first report, to the best of our knowledge, on identification of norovirus, NoV GII.4 Sydney 2012 variants, from blue mussels collected from UK coastal waters. Blue mussels (three pooled samples from twelve mussels) collected during the 2013 summer months from UK coastal sites were screened by RT-PCR assays. PCR products of RdRP gene for noroviruses were purified, sequenced and subjected to phylogenetic analysis. All the samples tested positive for NoVs. Sequencing revealed that the NoV partial RdRP gene sequences from two pooled samples clustered with the pandemic "GII.4 Sydney variants" whilst the other pooled sample clustered with the NoV GII.2 variants. This molecular study indicated mussel contamination with pathogenic NoVs even during mid-summer in UK coastal waters which posed potential risk of NoV outbreaks irrespective of season. As the detection of Sydney 2012 NoV from our preliminary study of natural coastal mussels interestingly corroborated with NoV outbreaks in nearby areas during the same period, it emphasizes the importance of environmental surveillance work for forecast of high risk zones of NoV outbreaks. Copyright © 2017 Sociedade Brasileira de Microbiologia. Published by Elsevier Editora Ltda. All rights reserved.

  20. Next-generation sequencing meets genetic diagnostics: development of a comprehensive workflow for the analysis of BRCA1 and BRCA2 genes

    PubMed Central

    Feliubadaló, Lídia; Lopez-Doriga, Adriana; Castellsagué, Ester; del Valle, Jesús; Menéndez, Mireia; Tornero, Eva; Montes, Eva; Cuesta, Raquel; Gómez, Carolina; Campos, Olga; Pineda, Marta; González, Sara; Moreno, Victor; Brunet, Joan; Blanco, Ignacio; Serra, Eduard; Capellá, Gabriel; Lázaro, Conxi

    2013-01-01

    Next-generation sequencing (NGS) is changing genetic diagnosis due to its huge sequencing capacity and cost-effectiveness. The aim of this study was to develop an NGS-based workflow for routine diagnostics for hereditary breast and ovarian cancer syndrome (HBOCS), to improve genetic testing for BRCA1 and BRCA2. A NGS-based workflow was designed using BRCA MASTR kit amplicon libraries followed by GS Junior pyrosequencing. Data analysis combined Variant Identification Pipeline freely available software and ad hoc R scripts, including a cascade of filters to generate coverage and variant calling reports. A BRCA homopolymer assay was performed in parallel. A research scheme was designed in two parts. A Training Set of 28 DNA samples containing 23 unique pathogenic mutations and 213 other variants (33 unique) was used. The workflow was validated in a set of 14 samples from HBOCS families in parallel with the current diagnostic workflow (Validation Set). The NGS-based workflow developed permitted the identification of all pathogenic mutations and genetic variants, including those located in or close to homopolymers. The use of NGS for detecting copy-number alterations was also investigated. The workflow meets the sensitivity and specificity requirements for the genetic diagnosis of HBOCS and improves on the cost-effectiveness of current approaches. PMID:23249957

  1. Whole genome sequencing in the search for genes associated with the control of SIV infection in the Mauritian macaque model.

    PubMed

    de Manuel, Marc; Shiina, Takashi; Suzuki, Shingo; Dereuddre-Bosquet, Nathalie; Garchon, Henri-Jean; Tanaka, Masayuki; Congy-Jolivet, Nicolas; Aarnink, Alice; Le Grand, Roger; Marques-Bonet, Tomas; Blancher, Antoine

    2018-05-08

    In the Mauritian macaque experimentally inoculated with SIV, gene polymorphisms potentially associated with the plasma virus load at a set point, approximately 100 days post inoculation, were investigated. Among the 42 animals inoculated with 50 AID50 of the same strain of SIV, none of which received any preventive or curative treatment, nine individuals were selected: three with a plasma virus load (PVL) among the lowest, three with intermediate PVL values and three among the highest PVL values. The complete genomes of these nine animals were then analyzed. Initially, attention was focused on variants with a potential functional impact on protein encoding genes (non-synonymous SNPs (NS-SNPs) and splicing variants). Thus, 424 NS-SNPs possibly associated with PVL were detected. The 424 candidate SNPs were genotyped in these 42 SIV experimentally infected animals (including the nine animals subjected to whole genome sequencing). The genes containing variants most probably associated with PVL at a set time point are analyzed herein.

  2. Possible role of rare variants in Trace amine associated receptor 1 in schizophrenia.

    PubMed

    John, Jibin; Kukshal, Prachi; Bhatia, Triptish; Chowdari, K V; Nimgaonkar, V L; Deshpande, S N; Thelma, B K

    2017-11-01

    Schizophrenia (SZ) is a chronic mental illness with behavioral abnormalities. Recent common variant based genome wide association studies and rare variant detection using next generation sequencing approaches have identified numerous variants that confer risk for SZ, but etiology remains unclear, propelling continuing investigations. Using whole exome sequencing, we identified a rare heterozygous variant (c.545G>T; p.Cys182Phe) in Trace amine associated receptor 1 gene (TAAR1 6q23.2) in three affected members in a small SZ family. The variant, predicted to be damaging by 15 prediction tools, causes breakage of a conserved disulfide bond in this G-protein-coupled receptor. On screening this intronless gene for additional variant(s) in ~800 sporadic SZ patients, we identified six rare protein altering variants (MAF<0.001) namely p.Ser47Cys, p.Phe51Leu, p.Tyr294Ter, p.Leu295Ser in four unrelated north Indian cases (n=475); p.Ala109Thr and p.Val250Ala in two independent Caucasian/African-American patients (n=310). Five of these variants were also predicted to be damaging. Besides, a rare synonymous variant was observed in SZ patients. These rare variants were absent in north Indian healthy controls (n=410) but significantly enriched in patients (p=0.036). Conversely, three common coding SNPs (rs8192621, rs8192620 and rs8192619) and a promoter SNP (rs60266355) tested for association with SZ in the north Indian cohort were not significant (P>0.05). TAAR1 is a modulator of monoaminergic pathways and interacts with AKT signaling pathways. Substantial animal model based pharmacological and functional data implying its relevance in SZ are also available. However, this is the first report suggestive of the likely contribution of rare variants in this gene to SZ. Copyright © 2017 Elsevier B.V. All rights reserved.

  3. The genomic landscape shaped by selection on transposable elements across 18 mouse strains.

    PubMed

    Nellåker, Christoffer; Keane, Thomas M; Yalcin, Binnaz; Wong, Kim; Agam, Avigail; Belgard, T Grant; Flint, Jonathan; Adams, David J; Frankel, Wayne N; Ponting, Chris P

    2012-06-15

    Transposable element (TE)-derived sequence dominates the landscape of mammalian genomes and can modulate gene function by dysregulating transcription and translation. Our current knowledge of TEs in laboratory mouse strains is limited primarily to those present in the C57BL/6J reference genome, with most mouse TEs being drawn from three distinct classes, namely short interspersed nuclear elements (SINEs), long interspersed nuclear elements (LINEs) and the endogenous retrovirus (ERV) superfamily. Despite their high prevalence, the different genomic and gene properties controlling whether TEs are preferentially purged from, or are retained by, genetic drift or positive selection in mammalian genomes remain poorly defined. Using whole genome sequencing data from 13 classical laboratory and 4 wild-derived mouse inbred strains, we developed a comprehensive catalogue of 103,798 polymorphic TE variants. We employ this extensive data set to characterize TE variants across the Mus lineage, and to infer neutral and selective processes that have acted over 2 million years. Our results indicate that the majority of TE variants are introduced through the male germline and that only a minority of TE variants exert detectable changes in gene expression. However, among genes with differential expression across the strains there are twice as many TE variants identified as being putative causal variants as expected. Most TE variants that cause gene expression changes appear to be purged rapidly by purifying selection. Our findings demonstrate that past TE insertions have often been highly deleterious, and help to prioritize TE variants according to their likely contribution to gene expression or phenotype variation.

  4. Strelka: accurate somatic small-variant calling from sequenced tumor-normal sample pairs.

    PubMed

    Saunders, Christopher T; Wong, Wendy S W; Swamy, Sajani; Becq, Jennifer; Murray, Lisa J; Cheetham, R Keira

    2012-07-15

    Whole genome and exome sequencing of matched tumor-normal sample pairs is becoming routine in cancer research. The consequent increased demand for somatic variant analysis of paired samples requires methods specialized to model this problem so as to sensitively call variants at any practical level of tumor impurity. We describe Strelka, a method for somatic SNV and small indel detection from sequencing data of matched tumor-normal samples. The method uses a novel Bayesian approach which represents continuous allele frequencies for both tumor and normal samples, while leveraging the expected genotype structure of the normal. This is achieved by representing the normal sample as a mixture of germline variation with noise, and representing the tumor sample as a mixture of the normal sample with somatic variation. A natural consequence of the model structure is that sensitivity can be maintained at high tumor impurity without requiring purity estimates. We demonstrate that the method has superior accuracy and sensitivity on impure samples compared with approaches based on either diploid genotype likelihoods or general allele-frequency tests. The Strelka workflow source code is available at ftp://strelka@ftp.illumina.com/. Contact: csaunders@illumina.com

  5. Copy Number Variants in Obesity-Related Syndromes: Review and Perspectives on Novel Molecular Approaches

    PubMed Central

    Koiffmann, Celia Priszkulnik

    2012-01-01

    In recent decades, obesity has reached epidemic proportions worldwide and has become a major public health concern. Despite heritability estimates of 40 to 70% and the long-recognized genetic basis of obesity in a number of rare cases, the common obesity susceptibility variants identified by currently published genome-wide association studies (GWASs) explain only a small proportion of the individual variation in obesity risk. It was not until very recently that GWASs of copy number variants (CNVs) in individuals with extreme phenotypes reported a number of large and rare CNVs conferring a high risk of obesity, specifically deletions on chromosome 16p11.2. In this paper, we comment on the recent advances in the field of genetics of obesity with an emphasis on the genes and genomic regions implicated in highly penetrant forms of obesity associated with developmental disorders. Array genomic hybridization in this patient population has afforded discovery opportunities for CNVs that have not previously been detectable. This information can be used to generate new diagnostic arrays and sequencing platforms, which will likely enhance detection of known genetic conditions with the potential to elucidate new disease genes and ultimately help in developing a next-generation sequencing protocol relevant to clinical practice. PMID:23316347

  6. Increased norovirus activity was associated with a novel norovirus GII.17 variant in Beijing, China during winter 2014-2015.

    PubMed

    Gao, Zhiyong; Liu, Baiwei; Huo, Da; Yan, Hanqiu; Jia, Lei; Du, Yiwei; Qian, Haikun; Yang, Yang; Wang, Xiaoli; Li, Jie; Wang, Quanyi

    2015-12-18

    Norovirus (NoV) is a leading cause of sporadic cases and outbreaks of acute gastroenteritis (AGE). Increased NoV activity was observed in Beijing, China during winter 2014-2015; therefore, we examined the epidemiological patterns and genetic characteristics of NoV in the sporadic cases and outbreaks. The weekly number of infectious diarrhea cases reported by all hospitals in Beijing was analyzed through the China information system for disease control and prevention. Fecal specimens were collected from the outbreaks and outpatients with AGE, and GI and GII NoVs were detected using real time reverse transcription polymerase chain reaction. The partial capsid genes and RNA-dependent RNA polymerase (RdRp) genes of NoV were both amplified and sequenced, and genotyping and phylogenetic analyses were performed. Between December 2014 and March 2015, the number of infectious diarrhea cases in Beijing (10,626 cases) increased by 35.6% over that of the previous year (7835 cases), and the detection rate of NoV (29.8%, 191/640) among outpatients with AGE was significantly higher than in the previous year (12.9%, 79/613) (χ² = 53.252, P < 0.001). Between November 2014 and March 2015, 35 outbreaks of AGE were reported in Beijing, and NoVs were detected in 33 outbreaks, all of which belonged to the GII genogroup. NoVs were sequenced and genotyped in 22 outbreaks, among which 20 were caused by a novel GII.17 strain. Among outpatients with AGE, this novel GII.17 strain was first detected in an outpatient in August 2014, and it replaced GII.4 Sydney_2012 as the predominant variant between December 2014 and March 2015. A phylogenetic analysis of the capsid genes and RdRp genes revealed that this novel GII.17 strain was distinct from previously identified GII variants, and it was recently designated as GII.P17_GII.17. This variant was further clustered into two sub-groups, named GII.17_2012 and GII.17_2014. During winter 2014-2015, GII.17_2014 caused the majority of AGE outbreaks in China and Japan. During winter 2014-2015, a novel NoV GII.17 variant replaced the GII.4 variant Sydney 2012 as the predominant strain in Beijing, China and caused increased NoV activity.
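
    The arithmetic behind the reported figures can be checked with a short sketch. It assumes the statistic is a Pearson chi-square on the 2x2 table of NoV-positive and NoV-negative outpatients without continuity correction; the abstract does not name the test, but this assumption reproduces the reported value. The function and variable names are illustrative only.

```python
def pearson_chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


# Counts taken from the abstract: NoV-positive outpatients / outpatients tested.
pos_current, tested_current = 191, 640    # December 2014 to March 2015
pos_previous, tested_previous = 79, 613   # same period, previous year

chi2 = pearson_chi_square_2x2(
    pos_current, tested_current - pos_current,
    pos_previous, tested_previous - pos_previous,
)
rate_current = pos_current / tested_current
rate_previous = pos_previous / tested_previous

# Relative increase in reported infectious diarrhea cases (10,626 vs 7835).
case_increase = (10626 - 7835) / 7835

print(f"chi-square = {chi2:.3f}")
print(f"detection rate: {rate_current:.1%} vs {rate_previous:.1%}")
print(f"increase in cases: {case_increase:.1%}")
```

    Under that assumption the sketch gives a chi-square of 53.252, detection rates of 29.8% and 12.9%, and a 35.6% increase in cases, matching the values quoted above.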

  7. JAK2 Exon 14 Deletion in Patients with Chronic Myeloproliferative Neoplasms

    PubMed Central

    Ma, Wanlong; Kantarjian, Hagop; Zhang, Xi; Wang, Xiuqiang; Zhang, Zhong; Yeh, Chen-Hsiung; O'Brien, Susan; Giles, Francis; Bruey, Jean Marie; Albitar, Maher

    2010-01-01

    Background: The JAK2 V617F mutation in exon 14 is the most common mutation in chronic myeloproliferative neoplasms (MPNs); deletion of the entire exon 14 is rarely detected. In our previous study of >10,000 samples from patients with suspected MPNs tested for JAK2 mutations by reverse transcription-PCR (RT-PCR) with direct sequencing, complete deletion of exon 14 (Δexon14) constituted <1% of JAK2 mutations. This appears to be an alternative splicing mutation, not detectable with DNA-based testing. Methodology/Principal Findings: We investigated the possibility that MPN patients may express the JAK2 Δexon14 at low levels (<15% of total transcript) not routinely detectable by RT-PCR with direct sequencing. Using a sensitive RT-PCR–based fluorescent fragment analysis method to quantify JAK2 Δexon14 mRNA expression relative to wild-type, we tested 61 patients with confirmed MPNs, 183 with suspected MPNs (93 V617F-positive, 90 V617F-negative), and 46 healthy control subjects. The Δexon14 variant was detected in 9 of the 61 (15%) confirmed MPN patients, accounting for 3.96% to 33.85% (mean = 12.04%) of total JAK2 transcript. This variant was also detected in 51 of the 183 patients with suspected MPNs (27%), including 20 of the 93 (22%) with V617F (mean [range] expression = 5.41% [2.13%–26.22%]) and 31 of the 90 (34%) without V617F (mean [range] expression = 3.88% [2.08%–12.22%]). Immunoprecipitation studies demonstrated that patients expressing Δexon14 mRNA expressed a corresponding truncated JAK2 protein. The Δexon14 variant was not detected in the 46 control subjects. Conclusions/Significance: These data suggest that expression of the JAK2 Δexon14 splice variant, leading to a truncated JAK2 protein, is common in patients with MPNs. This alternatively spliced transcript appears to be more frequent in MPN patients without V617F mutation, in whom it might contribute to leukemogenesis. This mutation is missed if DNA rather than RNA is used for testing. PMID:20730051

  8. A Perfect Match Genomic Landscape Provides a Unified Framework for the Precise Detection of Variation in Natural and Synthetic Haploid Genomes.

    PubMed

    Palacios-Flores, Kim; García-Sotelo, Jair; Castillo, Alejandra; Uribe, Carina; Aguilar, Luis; Morales, Lucía; Gómez-Romero, Laura; Reyes, José; Garciarubio, Alejandro; Boege, Margareta; Dávila, Guillermo

    2018-04-01

    We present a conceptually simple, sensitive, precise, and essentially nonstatistical solution for the analysis of genome variation in haploid organisms. The generation of a Perfect Match Genomic Landscape (PMGL), which computes intergenome identity with single nucleotide resolution, reveals signatures of variation wherever a query genome differs from a reference genome. Such signatures encode the precise location of different types of variants, including single nucleotide variants, deletions, insertions, and amplifications, effectively introducing the concept of a general signature of variation. The precise nature of variants is then resolved through the generation of targeted alignments between specific sets of sequence reads and known regions of the reference genome. Thus, the perfect match logic decouples the identification of the location of variants from the characterization of their nature, providing a unified framework for the detection of genome variation. We assessed the performance of the PMGL strategy via simulation experiments. We determined the variation profiles of natural genomes and of a synthetic chromosome, both in the context of haploid yeast strains. Our approach uncovered variants that have previously escaped detection. Moreover, our strategy is ideally suited for further refining high-quality reference genomes. The source codes for the automated PMGL pipeline have been deposited in a public repository. Copyright © 2018 by the Genetics Society of America.
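
    The perfect match idea can be illustrated with a toy sketch. This is not the authors' PMGL pipeline; it only assumes that, for each reference position, one counts how many query reads contain the reference k-mer starting there, so that a run of zero counts marks a region where the query differs from the reference. The sequences, the value of k, and all function names are hypothetical.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer occurring in the query reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def perfect_match_landscape(reference, reads, k):
    """For each reference position, count reads' k-mers that match it exactly."""
    counts = kmer_counts(reads, k)
    return [counts[reference[i:i + k]] for i in range(len(reference) - k + 1)]


def zero_runs(landscape):
    """Return (start, end) index pairs of consecutive zero-count positions."""
    runs, start = [], None
    for i, depth in enumerate(landscape):
        if depth == 0 and start is None:
            start = i
        elif depth != 0 and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(landscape) - 1))
    return runs


# Hypothetical toy data: a 20-base reference and a query with one substitution.
reference = "ACGTACCTGATTGCAAGGCT"
query = reference[:9] + "C" + reference[10:]   # A -> C at 0-based position 9
reads = [query[i:i + 8] for i in range(len(query) - 8 + 1)]  # overlapping 8-base reads

k = 4
landscape = perfect_match_landscape(reference, reads, k)
runs = zero_runs(landscape)
print(runs)
```

    In this toy case the only zero run spans reference k-mer starts 6 to 9, that is, the k = 4 k-mers overlapping the substituted base, so the run ends at the variant position. Characterizing the variant itself would still require a targeted alignment, as the abstract describes.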

  9. [Detection of pathogenic mutations in Marfan syndrome by targeted next-generation semiconductor sequencing].

    PubMed

    Lu, Chaoxia; Wu, Wei; Xiao, Jifang; Meng, Yan; Zhang, Shuyang; Zhang, Xue

    2013-06-01

    Objective: To detect pathogenic mutations in Marfan syndrome (MFS) using an Ion Torrent Personal Genome Machine (PGM) and to validate the result of targeted next-generation semiconductor sequencing for the diagnosis of genetic disorders. Methods: Peripheral blood samples were collected from three MFS patients and a normal control with informed consent. Genomic DNA was isolated by a standard method and then subjected to targeted sequencing using an Ion AmpliSeq™ Inherited Disease Panel. Three multiplex PCR reactions were carried out to amplify the coding exons of 328 genes including FBN1, TGFBR1 and TGFBR2. DNA fragments from different samples were ligated with barcoded sequencing adaptors. Template preparation, emulsion PCR, and Ion Sphere Particle enrichment were carried out using an Ion OneTouch system. The Ion Sphere Particles were sequenced on a 318 chip using the PGM platform. Data from the PGM runs were processed using Ion Torrent Suite 3.2 software to generate sequence reads. After sequence alignment and extraction of SNPs and indels, all the variants were filtered against dbSNP137. DNA sequences were visualized with the Integrative Genomics Viewer. The most likely disease-causing variants were analyzed by Sanger sequencing. Results: PGM sequencing yielded an output of 855.80 Mb, with a >100× median sequencing depth and a coverage of >98% for the targeted regions in all four samples. After data analysis and database filtering, one known missense mutation (p.E1811K) and two novel premature termination mutations (p.E2264X and p.L871FfsX23) in the FBN1 gene were identified in the three MFS patients. All mutations were verified by conventional Sanger sequencing. Conclusion: Pathogenic FBN1 mutations were identified in all patients with MFS, indicating that targeted next-generation sequencing on PGM sequencers can be applied for accurate and high-throughput testing of genetic disorders.
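
    The database filtering step can be sketched as a set lookup. The sketch assumes variants are matched on chromosome, position, reference allele and alternate allele; the abstract does not state how matching against dbSNP was done, and the coordinates below are hypothetical toy values, not real FBN1 positions.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


def filter_known(called, known):
    """Keep only called variants that are absent from the known-variant set."""
    known_keys = set(known)
    return [v for v in called if v not in known_keys]


# Hypothetical toy calls and a hypothetical known-variant list.
called = [
    Variant("chr15", 1000, "G", "A"),
    Variant("chr15", 2000, "C", "T"),
    Variant("chr15", 3000, "TA", "T"),
]
known = [Variant("chr15", 2000, "C", "T")]

candidates = filter_known(called, known)
for v in candidates:
    print(v)
```

    Exact-match filtering of this kind drops only variants whose position and alleles are identical to a database entry, so the remaining candidates would still need review and orthogonal confirmation, here Sanger sequencing.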

  10. Interactive web-based identification and visualization of transcript shared sequences.

    PubMed

    Azhir, Alaleh; Merino, Louis-Henri; Nauen, David W

    2018-05-12

    We have developed TraC (Transcript Consensus), a web-based tool for detecting and visualizing shared sequences among two or more mRNA transcripts such as splice variants. Results including exon-exon boundaries are returned in a highly intuitive, data-rich, interactive plot that permits users to explore the similarities and differences of multiple transcript sequences. The online tool (http://labs.pathology.jhu.edu/nauen/trac/) is free to use. The source code is freely available for download (https://github.com/nauenlab/TraC). Copyright © 2018 Elsevier Inc. All rights reserved.

  11. Characterization of the two intra-individual sequence variants in the 18S rRNA gene in the plant parasitic nematode, Rotylenchulus reniformis.

    PubMed

    Nyaku, Seloame T; Sripathi, Venkateswara R; Kantety, Ramesh V; Gu, Yong Q; Lawrence, Kathy; Sharma, Govind C

    2013-01-01

    The 18S rRNA gene is fundamental to cellular and organismal protein synthesis and because of its stable persistence through generations it is also used in phylogenetic analysis among taxa. Sequence variation in this gene within a single species is rare, but it has been observed in a few metazoan organisms. More frequently, it has been reported in the non-transcribed spacer region. Here, we have identified two sequence variants within the near full coding region of 18S rRNA gene from a single reniform nematode (RN) Rotylenchulus reniformis labeled as reniform nematode variant 1 (RN_VAR1) and variant 2 (RN_VAR2). All sequences from three of the four isolates had both RN variants in their sequences; however, isolate 13B had only RN variant 2 sequence. Specific variable base sites (96 or 5.5%) were found within the 18S rRNA gene that can clearly distinguish the two 18S rDNA variants of RN, in 11 (25.0%) and 33 (75.0%) of the 44 RN clones, for RN_VAR1 and RN_VAR2, respectively. Neighbor-joining trees show that the RN_VAR1 is very similar to the previously existing R. reniformis sequence in GenBank, while the RN_VAR2 sequence is more divergent. This is the first report of the identification of two major variants of the 18S rRNA gene in the same single RN, and documents the specific base variation between the two variants, and hypothesizes on simultaneous co-existence of these two variants for this gene.

  12. Characterization of the Two Intra-Individual Sequence Variants in the 18S rRNA Gene in the Plant Parasitic Nematode, Rotylenchulus reniformis

    PubMed Central

    Nyaku, Seloame T.; Sripathi, Venkateswara R.; Kantety, Ramesh V.; Gu, Yong Q.; Lawrence, Kathy; Sharma, Govind C.

    2013-01-01

    The 18S rRNA gene is fundamental to cellular and organismal protein synthesis and because of its stable persistence through generations it is also used in phylogenetic analysis among taxa. Sequence variation in this gene within a single species is rare, but it has been observed in a few metazoan organisms. More frequently, it has been reported in the non-transcribed spacer region. Here, we have identified two sequence variants within the near full coding region of 18S rRNA gene from a single reniform nematode (RN) Rotylenchulus reniformis labeled as reniform nematode variant 1 (RN_VAR1) and variant 2 (RN_VAR2). All sequences from three of the four isolates had both RN variants in their sequences; however, isolate 13B had only RN variant 2 sequence. Specific variable base sites (96 or 5.5%) were found within the 18S rRNA gene that can clearly distinguish the two 18S rDNA variants of RN, in 11 (25.0%) and 33 (75.0%) of the 44 RN clones, for RN_VAR1 and RN_VAR2, respectively. Neighbor-joining trees show that the RN_VAR1 is very similar to the previously existing R. reniformis sequence in GenBank, while the RN_VAR2 sequence is more divergent. This is the first report of the identification of two major variants of the 18S rRNA gene in the same single RN, and documents the specific base variation between the two variants, and hypothesizes on simultaneous co-existence of these two variants for this gene. PMID:23593343

  13. Whole exome sequencing for familial bicuspid aortic valve identifies putative variants.

    PubMed

    Martin, Lisa J; Pilipenko, Valentina; Kaufman, Kenneth M; Cripe, Linda; Kottyan, Leah C; Keddache, Mehdi; Dexheimer, Phillip; Weirauch, Matthew T; Benson, D Woodrow

    2014-10-01

    Bicuspid aortic valve (BAV) is the most common congenital cardiovascular malformation. Although highly heritable, few causal variants have been identified. The purpose of this study was to identify genetic variants underlying BAV by whole exome sequencing a multiplex BAV kindred. Whole exome sequencing was performed on 17 individuals from a single family (BAV=3; other cardiovascular malformation, 3). Postvariant calling error control metrics were established after examining the relationship between Mendelian inheritance error rate and coverage, quality score, and call rate. To determine the most effective approach to identifying susceptibility variants from among 54 674 variants passing error control metrics, we evaluated 3 variant selection strategies frequently used in whole exome sequencing studies plus extended family linkage. No putative rare, high-effect variants were identified in all affected but no unaffected individuals. Eight high-effect variants were identified by ≥2 of the commonly used selection strategies; however, these were either common in the general population (>10%) or present in the majority of the unaffected family members. However, using extended family linkage, 3 synonymous variants were identified; all 3 variants were identified by at least one other strategy. These results suggest that traditional whole exome sequencing approaches, which assume causal variants alter coding sense, may be insufficient for BAV and other complex traits. Identification of disease-associated variants is facilitated by the use of segregation within families. © 2014 American Heart Association, Inc.
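
    The strictest selection rule mentioned above, a variant present in all affected and no unaffected relatives, can be sketched as a set test. This is an illustration under assumed inputs, not the study's pipeline: the sample identifiers, variant names and carrier lists are hypothetical.

```python
def segregating_variants(carriers_by_variant, affected, unaffected):
    """Return variants carried by every affected and by no unaffected individual."""
    affected, unaffected = set(affected), set(unaffected)
    return sorted(
        variant
        for variant, carriers in carriers_by_variant.items()
        if affected <= set(carriers) and not (unaffected & set(carriers))
    )


# Hypothetical toy pedigree: three affected and two unaffected relatives.
affected = ["A1", "A2", "A3"]
unaffected = ["U1", "U2"]
carriers_by_variant = {
    "var1": ["A1", "A2", "A3"],        # perfect segregation
    "var2": ["A1", "A2"],              # missing in one affected relative
    "var3": ["A1", "A2", "A3", "U1"],  # also carried by an unaffected relative
}

selected = segregating_variants(carriers_by_variant, affected, unaffected)
print(selected)
```

    Only var1 passes in this toy data. A rule this strict leaves no room for reduced penetrance or phenocopies, which is one reason a study might find no variant that satisfies it and turn to linkage within the extended family instead.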

  14. PRNP genetic variability and molecular typing of natural goat scrapie isolates in a high number of infected flocks

    PubMed Central

    2011-01-01

    One hundred and four scrapie positive and 77 negative goats from 34 Greek mixed flocks were analysed by prion protein gene sequencing and 17 caprine scrapie isolates from 11 flocks were submitted to molecular isolate typing. For the first time, the protective S146 variant was reported in Greece, while the protective K222 variant was detected in negative but also in five scrapie positive goats from heavily infected flocks. By immunoblotting, six isolates, including two from goat flockmates carrying the K222 variant, showed molecular features slightly different from all other Greek and Italian isolates co-analysed, possibly suggesting the presence of different scrapie strains in Greece. PMID:21961834

  15. PRNP genetic variability and molecular typing of natural goat scrapie isolates in a high number of infected flocks.

    PubMed

    Fragkiadaki, Eirini G; Vaccari, Gabriele; Ekateriniadou, Loukia V; Agrimi, Umberto; Giadinis, Nektarios D; Chiappini, Barbara; Esposito, Elena; Conte, Michela; Nonno, Romolo

    2011-09-30

    One hundred and four scrapie positive and 77 negative goats from 34 Greek mixed flocks were analysed by prion protein gene sequencing and 17 caprine scrapie isolates from 11 flocks were submitted to molecular isolate typing. For the first time, the protective S146 variant was reported in Greece, while the protective K222 variant was detected in negative but also in five scrapie positive goats from heavily infected flocks. By immunoblotting, six isolates, including two from goat flockmates carrying the K222 variant, showed molecular features slightly different from all other Greek and Italian isolates co-analysed, possibly suggesting the presence of different scrapie strains in Greece.

  16. High resolution identity testing of inactivated poliovirus vaccines.

    PubMed

    Mee, Edward T; Minor, Philip D; Martin, Javier

    2015-07-09

    Definitive identification of poliovirus strains in vaccines is essential for quality control, particularly where multiple wild-type and Sabin strains are produced in the same facility. Sequence-based identification provides the ultimate in identity testing and would offer several advantages over serological methods. We employed random RT-PCR and high throughput sequencing to recover full-length genome sequences from monovalent and trivalent poliovirus vaccine products at various stages of the manufacturing process. All expected strains were detected in previously characterised products and the method permitted identification of strains comprising as little as 0.1% of sequence reads. Highly similar Mahoney and Sabin 1 strains were readily discriminated on the basis of specific variant positions. Analysis of a product known to contain incorrect strains demonstrated that the method correctly identified the contaminants. Random RT-PCR and shotgun sequencing provided high resolution identification of vaccine components. In addition to the recovery of full-length genome sequences, the method could also be easily adapted to the characterisation of minor variant frequencies and distinction of closely related products on the basis of distinguishing consensus and low frequency polymorphisms. Copyright © 2015 The Authors. Published by Elsevier Ltd. All rights reserved.

  17. Whole Transcriptome Sequencing Enables Discovery and Analysis of Viruses in Archived Primary Central Nervous System Lymphomas

    PubMed Central

    DeBoever, Christopher; Reid, Erin G.; Smith, Erin N.; Wang, Xiaoyun; Dumaop, Wilmar; Harismendy, Olivier; Carson, Dennis; Richman, Douglas; Masliah, Eliezer; Frazer, Kelly A.

    2013-01-01

    Primary central nervous system lymphomas (PCNSL) have a dramatically increased prevalence among persons living with AIDS and are known to be associated with human Epstein Barr virus (EBV) infection. Previous work suggests that in some cases, co-infection with other viruses may be important for PCNSL pathogenesis. Viral transcription in tumor samples can be measured using next generation transcriptome sequencing. We demonstrate the ability of transcriptome sequencing to identify viruses, characterize viral expression, and identify viral variants by sequencing four archived AIDS-related PCNSL tissue samples and analyzing raw sequencing reads. EBV was detected in all four PCNSL samples and cytomegalovirus (CMV), JC polyomavirus (JCV), and HIV were also discovered, consistent with clinical diagnoses. CMV was found to express three long non-coding RNAs recently reported as expressed during active infection. Single nucleotide variants were observed in each of the viruses observed and three indels were found in CMV. No viruses were found in several control tumor types including 32 diffuse large B-cell lymphoma samples. This study demonstrates the ability of next generation transcriptome sequencing to accurately identify viruses, including DNA viruses, in solid human cancer tissue samples. PMID:24023918

  18. Next generation sequencing in women affected by nonsyndromic premature ovarian failure displays new potential causative genes and mutations.

    PubMed

    Fonseca, Dora Janeth; Patiño, Liliana Catherine; Suárez, Yohjana Carolina; de Jesús Rodríguez, Asid; Mateus, Heidi Eliana; Jiménez, Karen Marcela; Ortega-Recalde, Oscar; Díaz-Yamal, Ivonne; Laissue, Paul

    2015-07-01

    Objective: To identify new molecular actors involved in nonsyndromic premature ovarian failure (POF) etiology. Design: Retrospective case-control cohort study. Setting: University research group and IVF medical center. Patients: Twelve women affected by nonsyndromic POF. The control group included 176 women whose menopause had occurred after age 50 and had no antecedents regarding gynecological disease. A further 345 women from the same ethnic origin (general population group) were also recruited to assess allele frequency for potentially deleterious sequence variants. Interventions: Next generation sequencing (NGS), Sanger sequencing, and bioinformatics analysis. Main outcome measures: The complete coding regions of 70 candidate genes were massively sequenced, via NGS, in POF patients. Bioinformatics and genetics were used to confirm NGS results and to identify potential sequence variants related to the disease pathogenesis. Results: We have identified mutations in two novel genes, ADAMTS19 and BMPR2, that are potentially related to POF origin. LHCGR mutations, which might have contributed to the phenotype, were also detected. Conclusions: We thus recommend NGS as a powerful tool for identifying new molecular actors in POF and for future diagnostic/prognostic purposes. Copyright © 2015 American Society for Reproductive Medicine. Published by Elsevier Inc. All rights reserved.

  19. Guidelines for investigating causality of sequence variants in human disease

    PubMed Central

    MacArthur, D. G.; Manolio, T. A.; Dimmock, D. P.; Rehm, H. L.; Shendure, J.; Abecasis, G. R.; Adams, D. R.; Altman, R. B.; Antonarakis, S. E.; Ashley, E. A.; Barrett, J. C.; Biesecker, L. G.; Conrad, D. F.; Cooper, G. M.; Cox, N. J.; Daly, M. J.; Gerstein, M. B.; Goldstein, D. B.; Hirschhorn, J. N.; Leal, S. M.; Pennacchio, L. A.; Stamatoyannopoulos, J. A.; Sunyaev, S. R.; Valle, D.; Voight, B. F.; Winckler, W.; Gunter, C.

    2014-01-01

    The discovery of rare genetic variants is accelerating, and clear guidelines for distinguishing disease-causing sequence variants from the many potentially functional variants present in any human genome are urgently needed. Without rigorous standards we risk an acceleration of false-positive reports of causality, which would impede the translation of genomic research findings into the clinical diagnostic setting and hinder biological understanding of disease. Here we discuss the key challenges of assessing sequence variants in human disease, integrating both gene-level and variant-level support for causality. We propose guidelines for summarizing confidence in variant pathogenicity and highlight several areas that require further resource development. PMID:24759409

  20. Guidelines for investigating causality of sequence variants in human disease.

    PubMed

    MacArthur, D G; Manolio, T A; Dimmock, D P; Rehm, H L; Shendure, J; Abecasis, G R; Adams, D R; Altman, R B; Antonarakis, S E; Ashley, E A; Barrett, J C; Biesecker, L G; Conrad, D F; Cooper, G M; Cox, N J; Daly, M J; Gerstein, M B; Goldstein, D B; Hirschhorn, J N; Leal, S M; Pennacchio, L A; Stamatoyannopoulos, J A; Sunyaev, S R; Valle, D; Voight, B F; Winckler, W; Gunter, C

    2014-04-24

    The discovery of rare genetic variants is accelerating, and clear guidelines for distinguishing disease-causing sequence variants from the many potentially functional variants present in any human genome are urgently needed. Without rigorous standards we risk an acceleration of false-positive reports of causality, which would impede the translation of genomic research findings into the clinical diagnostic setting and hinder biological understanding of disease. Here we discuss the key challenges of assessing sequence variants in human disease, integrating both gene-level and variant-level support for causality. We propose guidelines for summarizing confidence in variant pathogenicity and highlight several areas that require further resource development.

  1. Identification of novel mutations and sequence variants in the SOX2 and CHX10 genes in patients with anophthalmia/microphthalmia

    PubMed Central

    Zhou, Jie; Kherani, Femida; Bardakjian, Tanya M.; Katowitz, James; Hughes, Nkecha; Schimmenti, Lisa A.; Schneider, Adele

    2008-01-01

    Purpose: Mutations in the SOX2 and CHX10 genes have been reported in patients with anophthalmia and/or microphthalmia. In this study, we evaluated 34 anophthalmic/microphthalmic patient DNA samples (two sets of siblings included) for mutations and sequence variants in SOX2 and CHX10. Methods: Conformational sensitive gel electrophoresis (CSGE) was used for the initial SOX2 and CHX10 screening of 34 affected individuals (two sets of siblings), five unaffected family members, and 80 healthy controls. Patient samples containing heteroduplexes were selected for sequence analysis. Base pair changes in SOX2 and CHX10 were confirmed by sequencing bidirectionally in patient samples. Results: Two novel heterozygous mutations and two sequence variants (one known) in SOX2 were identified in this cohort. Mutation c.310G>T (p.Glu104X), found in one patient, was in the region encoding the high mobility group (HMG) DNA-binding domain and resulted in a change from glutamic acid to a stop codon. The second mutation, noted in two affected siblings, was a single nucleotide deletion c.549delC (p.Pro184ArgfsX19) in the region encoding the activation domain, resulting in a frameshift and premature termination of the coding sequence. The shortened protein products may result in the loss of function. In addition, a novel nucleotide substitution c.*557G>A was identified in the 3′-untranslated region in one patient. The relationship between the nucleotide change and the protein function is indeterminate. A known single nucleotide polymorphism (c.*469C>A, SNP rs11915160) was also detected in 2 of the 34 patients. Screening of CHX10 identified two synonymous sequence variants, c.471C>T (p.Ser157Ser, rs35435463) and c.579G>A (p.Gln193Gln, novel SNP), and one non-synonymous sequence variant, c.871G>A (p.Asp291Asn, novel SNP). The non-synonymous polymorphism was also present in healthy controls, suggesting non-causality. Conclusions: These results support the role of SOX2 in ocular development. Loss of SOX2 function results in severe eye malformation. CHX10 was not implicated in microphthalmia/anophthalmia in our patient cohort. PMID:18385794

  2. Piroplasms in brown hyaenas (Parahyaena brunnea) and spotted hyaenas (Crocuta crocuta) in Namibia and South Africa are closely related to Babesia lengau.

    PubMed

    Burroughs, Richard E J; Penzhorn, Barend L; Wiesel, Ingrid; Barker, Nancy; Vorster, Ilse; Oosthuizen, Marinda C

    2017-02-01

    The objective of our study was identification and molecular characterization of piroplasms and rickettsias occurring in brown (Parahyaena brunnea) and spotted hyaenas (Crocuta crocuta) from various localities in Namibia and South Africa. Whole blood (n = 59) and skin (n = 3) specimens from brown (n = 15) and spotted hyaenas (n = 47) were screened for the presence of Babesia, Theileria, Ehrlichia and Anaplasma species using the reverse line blot (RLB) hybridization technique. PCR products of 52/62 (83.9%) of the specimens hybridized only with the Theileria/Babesia genus-specific probes and not with any of the species-specific probes, suggesting the presence of a novel species or variant of a species. No Ehrlichia and/or Anaplasma species DNA could be detected. A parasite 18S ribosomal RNA gene of brown (n = 3) and spotted hyaena (n = 6) specimens was subsequently amplified and cloned, and the recombinants were sequenced. Homologous sequence searches of databases indicated that the obtained sequences were most closely related to Babesia lengau, originally described from cheetahs (Acinonyx jubatus). Observed sequence similarities were subsequently confirmed by phylogenetic analyses which showed that the obtained hyaena sequences formed a monophyletic group with B. lengau, Babesia conradae and sequences previously isolated from humans and wildlife in the western USA. Within the B. lengau clade, the obtained sequences and the published B. lengau sequences were grouped into six distinct groups, of which groups I to V represented novel B. lengau genotypes and/or gene variants. We suggest that these genotypes cannot be classified as new Babesia species, but rather as variants of B. lengau. This is the first report of occurrence of piroplasms in brown hyaenas.

  3. Korean Variant Archive (KOVA): a reference database of genetic variations in the Korean population.

    PubMed

    Lee, Sangmoon; Seo, Jihae; Park, Jinman; Nam, Jae-Yong; Choi, Ahyoung; Ignatius, Jason S; Bjornson, Robert D; Chae, Jong-Hee; Jang, In-Jin; Lee, Sanghyuk; Park, Woong-Yang; Baek, Daehyun; Choi, Murim

    2017-06-27

    Despite efforts to interrogate human genome variation through large-scale databases, systematic preference toward populations of Caucasian descendants has resulted in unintended reduction of power in studying non-Caucasians. Here we report a compilation of coding variants from 1,055 healthy Korean individuals (KOVA; Korean Variant Archive). The samples were sequenced to a mean depth of 75x, yielding 101 singleton variants per individual. Population genetics analysis demonstrates that the Korean population is a distinct ethnic group comparable to other discrete ethnic groups in Africa and Europe, providing a rationale for such independent genomic datasets. Indeed, KOVA conferred 22.8% increased variant filtering power in addition to Exome Aggregation Consortium (ExAC) when used on Korean exomes. Functional assessment of nonsynonymous variants supported the presence of purifying selection in Koreans. Analysis of copy number variants detected 5.2 deletions and 10.3 amplifications per individual with an increased fraction of novel variants among smaller and rarer copy number variable segments. We also report a list of germline variants that are associated with increased tumor susceptibility. This catalog can function as a critical addition to the pre-existing variant databases in pursuing genetic studies of Korean individuals.

  4. A weighted U-statistic for genetic association analyses of sequencing data.

    PubMed

    Wei, Changshuai; Li, Ming; He, Zihuai; Vsevolozhskaya, Olga; Schaid, Daniel J; Lu, Qing

    2014-12-01

    With advancements in next-generation sequencing technology, a massive amount of sequencing data is generated, which offers a great opportunity to comprehensively investigate the role of rare variants in the genetic etiology of complex diseases. Nevertheless, the high-dimensional sequencing data poses a great challenge for statistical analysis. The association analyses based on traditional statistical methods suffer substantial power loss because of the low frequency of genetic variants and the extremely high dimensionality of the data. We developed a Weighted U Sequencing test, referred to as WU-SEQ, for the high-dimensional association analysis of sequencing data. Based on a nonparametric U-statistic, WU-SEQ makes no assumption of the underlying disease model and phenotype distribution, and can be applied to a variety of phenotypes. Through simulation studies and an empirical study, we showed that WU-SEQ outperformed a commonly used sequence kernel association test (SKAT) method when the underlying assumptions were violated (e.g., the phenotype followed a heavy-tailed distribution). Even when the assumptions were satisfied, WU-SEQ still attained comparable performance to SKAT. Finally, we applied WU-SEQ to sequencing data from the Dallas Heart Study (DHS), and detected an association between ANGPTL4 and very low density lipoprotein cholesterol. © 2014 WILEY PERIODICALS, INC.

  5. Detection of mosaicism for the polymorphic variants in the 5'-UTR of hOGG1 by cloning and sequence analysis and pyrosequencing.

    PubMed

    Cao, Lili; Li, Tianfeng; Zhu, Yanbei; Zhou, Wei; Guo, Wenwen; Cai, Zhenming; Xie, Yuan; He, Xuan; Li, Xinxiu; Zhu, Dalong; Wang, Yaping

    2013-04-01

    Mosaicism refers to the presence of genetically distinct cell lines within an organism or a tissue. Somatic mosaicism exists in distinct populations of somatic cells and commonly arises as a result of somatic mutations, mainly in early embryonic development. SNPs are important markers that distinguish between different individuals in heterogeneous biological samples and contribute greatly to disease risk association studies. In this work, we investigated the relationship between the functional variants in the 5'-UTR of the hOGG1 gene and the risk of type 2 diabetes. Upon detection of the polymorphisms c.-53G>C, c.-23A>G, and c.-18G>T in the hOGG1 gene, we found that mosaicism was present in 3/28 (10.71%), 7/51 (13.73%), and 1/44 (2.27%) patients respectively, who were carriers of these single nucleotide variations, by cloning and sequence analysis and pyrosequencing. Statistical analysis showed that the frequency of the variation c.-23A>G in the hOGG1 5'-UTR in type 2 diabetic patients was significantly higher than that in healthy controls. However, sequencing of the mutant alleles in mosaic individuals showed weak peaks that may affect detection of the SNPs and impair association-based investigations. © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  6. mtDNA-Server: next-generation sequencing data analysis of human mitochondrial DNA in the cloud.

    PubMed

    Weissensteiner, Hansi; Forer, Lukas; Fuchsberger, Christian; Schöpf, Bernd; Kloss-Brandstätter, Anita; Specht, Günther; Kronenberg, Florian; Schönherr, Sebastian

    2016-07-08

    Next generation sequencing (NGS) allows investigating mitochondrial DNA (mtDNA) characteristics such as heteroplasmy (i.e. intra-individual sequence variation) to a higher level of detail. While several pipelines for analyzing heteroplasmies exist, issues in usability, accuracy of results and interpreting final data limit their usage. Here we present mtDNA-Server, a scalable web server for the analysis of mtDNA studies of any size with a special focus on usability as well as reliable identification and quantification of heteroplasmic variants. The mtDNA-Server workflow includes parallel read alignment, heteroplasmy detection, artefact or contamination identification, variant annotation as well as several quality control metrics, often neglected in current mtDNA NGS studies. All computational steps are parallelized with Hadoop MapReduce and executed graphically with Cloudgene. We validated the underlying heteroplasmy and contamination detection model by generating four artificial sample mix-ups on two different NGS devices. Our evaluation data shows that mtDNA-Server detects heteroplasmies and artificial recombinations down to the 1% level with perfect specificity and outperforms existing approaches regarding sensitivity. mtDNA-Server is currently able to analyze the 1000G Phase 3 data (n = 2,504) in less than 5 h and is freely accessible at https://mtdna-server.uibk.ac.at. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  7. Extensive Horizontal Gene Transfer during Staphylococcus aureus Co-colonization In Vivo

    PubMed Central

    McCarthy, Alex J.; Loeffler, Anette; Witney, Adam A.; Gould, Katherine A.; Lloyd, David H.; Lindsay, Jodi A.

    2014-01-01

    Staphylococcus aureus is a commensal and major pathogen of humans and animals. Comparative genomics of S. aureus populations suggests that colonization of different host species is associated with carriage of mobile genetic elements (MGE), particularly bacteriophages and plasmids capable of encoding virulence, resistance, and immune evasion pathways. Antimicrobial-resistant S. aureus of livestock are a potential zoonotic threat to human health if they adapt to colonize humans efficiently. We utilized the technique of experimental evolution and co-colonized gnotobiotic piglets with both human- and pig-associated variants of the lineage clonal complex 398, and investigated growth and genetic changes over 16 days using whole genome sequencing. The human isolate survived co-colonization on piglets more efficiently than in vitro. During co-colonization, transfer of MGE from the pig to the human isolate was detected within 4 h. Extensive and repeated transfer of two bacteriophages and three plasmids resulted in colonization with isolates carrying a wide variety of mobilomes. Whole genome sequencing of progeny bacteria revealed no acquisition of core genome polymorphisms, highlighting the importance of MGE. Staphylococcus aureus bacteriophage recombination and integration into novel sites was detected experimentally for the first time. During colonization, clones coexisted and diversified rather than a single variant dominating. Unexpectedly, each piglet carried unique populations of bacterial variants, suggesting limited transmission of bacteria between piglets once colonized. Our data show that horizontal gene transfer occurs at very high frequency in vivo and significantly higher than that detectable in vitro. PMID:25260585

  8. Therapeutic strategies and genetic profile comparisons in small cell carcinoma and large cell neuroendocrine carcinoma of the lung using next-generation sequencing.

    PubMed

    Ito, Masaoki; Miyata, Yoshihiro; Hirano, Shoko; Kimura, Shingo; Irisuna, Fumiko; Ikeda, Kyoko; Kushitani, Kei; Tsutani, Yasuhiro; Ueda, Daisuke; Tsubokawa, Norifumi; Takeshima, Yukio; Okada, Morihito

    2017-12-12

    Small cell lung cancer (SCLC) and large cell neuroendocrine carcinoma (LCNEC) of the lung are classified as variants of endocrine carcinoma and subdivided into pure or combined type. Clinical benefit of targeted therapy has not been established in these tumors. This study aimed to compare genetic and clinicopathological features between SCLC and LCNEC or pure and combined types, and explore the possibility of targeted therapy using next-generation sequencing. In 13 SCLC and 22 LCNEC cases, 72 point mutations, 19 deletions, and 3 insertions were detected. As therapeutically targetable variants, mutations in EGFR (L858R), KRAS (G12D, G12A, G12V), and PIK3CA (E545K) were detected in 5 cases. The case harboring the EGFR mutation showed a response to an EGFR-tyrosine kinase inhibitor. However, no clinicopathological features were associated with therapeutically targetable cases. There was also no significant genetic feature distinguishing SCLC from LCNEC or pure from combined types. In conclusion, although patients with SCLC and LCNEC may benefit from targeted therapy, they were not identifiable by clinicopathologic background. There was also no significant genetic difference between SCLC and LCNEC, including between pure and combined types. Classifying SCLC and LCNEC in the same category is reasonable. However, distinguishing the pure type from the combined type was not validated. Comprehensive genetic analysis should be performed to detect targetable variants in any type of SCLC and LCNEC.

  9. Detection of hyper-conserved regions in hepatitis B virus X gene potentially useful for gene therapy.

    PubMed

    González, Carolina; Tabernero, David; Cortese, Maria Francesca; Gregori, Josep; Casillas, Rosario; Riveiro-Barciela, Mar; Godoy, Cristina; Sopena, Sara; Rando, Ariadna; Yll, Marçal; Lopez-Martinez, Rosa; Quer, Josep; Esteban, Rafael; Buti, Maria; Rodríguez-Frías, Francisco

    2018-05-21

    To detect hyper-conserved regions in the hepatitis B virus (HBV) X gene (HBX) 5' region that could be candidates for gene therapy. The study included 27 chronic hepatitis B treatment-naive patients in various clinical stages (from chronic infection to cirrhosis and hepatocellular carcinoma, both HBeAg-negative and HBeAg-positive), and infected with HBV genotypes A-F and H. In a serum sample from each patient with viremia > 3.5 log IU/mL, the HBX 5' end region [nucleotide (nt) 1255-1611] was PCR-amplified and submitted to next-generation sequencing (NGS). We assessed genotype variants by phylogenetic analysis, and evaluated conservation of this region by calculating the information content of each nucleotide position in a multiple alignment of all unique sequences (haplotypes) obtained by NGS. Conservation at the HBx protein amino acid (aa) level was also analyzed. NGS yielded 1333069 sequences from the 27 samples, with a median of 4578 sequences/sample (2487-9279, IQR 2817). In 14/27 patients (51.8%), phylogenetic analysis of viral nucleotide haplotypes showed a complex mixture of genotypic variants. Analysis of the information content in the haplotype multiple alignments detected 2 hyper-conserved nucleotide regions, one in the HBX upstream non-coding region (nt 1255-1286) and the other in the 5' end coding region (nt 1519-1603). This last region coded for a conserved amino acid region (aa 63-76) that partially overlaps a Kunitz-like domain. Two hyper-conserved regions detected in the HBX 5' end may be of value for targeted gene therapy, regardless of the patients' clinical stage or HBV genotype.
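    The abstract above scores conservation by the information content of each alignment position but does not give the formula. A minimal sketch, assuming the common Shannon-entropy definition (log2 of the alphabet size minus the column entropy, so 2 bits means a fully conserved nucleotide position), is shown below. The function names, the threshold and the minimum run length are illustrative assumptions, not values taken from the study.

```python
import math
from collections import Counter


def information_content(column, alphabet="ACGT"):
    """Information content (bits) of one alignment column.

    Assumed definition: log2(|alphabet|) - Shannon entropy of the
    observed base frequencies. Gaps and ambiguity codes are ignored.
    """
    counts = Counter(base for base in column if base in alphabet)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return math.log2(len(alphabet)) - entropy


def column_information(haplotypes):
    """Per-position information content for equal-length aligned sequences."""
    return [information_content(col) for col in zip(*haplotypes)]


def conserved_runs(ic, threshold=1.9, min_len=2):
    """Return (start, end) 0-based inclusive runs where ic >= threshold.

    threshold and min_len are hypothetical defaults for illustration.
    """
    runs, start = [], None
    # A sentinel below any threshold closes a run that reaches the end.
    for i, value in enumerate(list(ic) + [float("-inf")]):
        if value >= threshold and start is None:
            start = i
        elif value < threshold and start is not None:
            if i - start >= min_len:
                runs.append((start, i - 1))
            start = None
    return runs


# Hypothetical toy alignment: three conserved positions, one variable.
toy_haplotypes = ["ACGT", "ACGA", "ACGC", "ACGG"]
toy_ic = column_information(toy_haplotypes)
toy_runs = conserved_runs(toy_ic)
```

    In this toy alignment the first three columns are invariant and the last one carries all four bases equally, so the sketch reports a single conserved run covering positions 0 to 2. Weighting haplotypes by read abundance, if the study did so, would require passing frequencies instead of raw unique sequences.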

  10. Variants in KCNJ11 and BAD do not predict response to ketogenic dietary therapies for epilepsy.

    PubMed

    Schoeler, Natasha E; Leu, Costin; White, Jon; Plagnol, Vincent; Ellard, Sian; Matarin, Mar; Yellen, Gary; Thiele, Elizabeth A; Mackay, Mark; McMahon, Jacinta M; Scheffer, Ingrid E; Sander, Josemir W; Cross, J Helen; Sisodiya, Sanjay M

    2015-12-01

    In the absence of specific metabolic disorders, predictors of response to ketogenic dietary therapies (KDT) are unknown. We aimed to determine whether variants in established candidate genes KCNJ11 and BAD influence response to KDT. We sequenced KCNJ11 and BAD in individuals without previously-known glucose transporter type 1 deficiency syndrome or other metabolic disorders, who received KDT for epilepsy. Hospital records were used to obtain demographic and clinical data. Two response phenotypes were used: ≥ 50% seizure reduction and seizure-freedom at 3-month follow-up. Case/control association tests were conducted with KCNJ11 and BAD variants with minor allele frequency (MAF)>0.01, using PLINK. Response to KDT in individuals with variants with MAF<0.01 was evaluated. A total of 303 individuals had KCNJ11 and 246 individuals had BAD sequencing data and diet response data. Six SNPs in KCNJ11 and two in BAD had MAF>0.01. Eight variants in KCNJ11 and seven in BAD (of which three were previously-unreported) had MAF<0.01. No significant results were obtained from association analyses, with either KDT response phenotype. P-values were similar when accounting for ethnicity using a stratified Cochran-Mantel-Haenszel test. There did not seem to be a consistent effect of rare variants on response to KDT, although the cohort size was too small to assess significance. Common variants in KCNJ11 and BAD do not predict response to KDT for epilepsy. We can exclude, with 80% power, association from variants with a MAF of >0.05 and effect size >3. A larger sample size is needed to detect associations from rare variants or those with smaller effect sizes. Copyright © 2015 Elsevier B.V. All rights reserved.
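    The stratified Cochran-Mantel-Haenszel test mentioned above pools 2x2 tables across strata (here, ethnicity groups). The sketch below is a plain-Python illustration of the textbook statistic, not the PLINK implementation the authors used; the function name, the table layout and the counts are hypothetical and chosen only for illustration.

```python
import math


def cmh_test(tables, continuity=False):
    """Cochran-Mantel-Haenszel test for K stratified 2x2 tables.

    Each table is ((a, b), (c, d)), e.g. rows = responder / non-responder
    and columns = allele carrier / non-carrier within one stratum.
    Returns (chi-square statistic with 1 df, p-value).
    """
    diff_sum = 0.0  # sum over strata of observed a minus expected a
    var_sum = 0.0   # sum over strata of the hypergeometric variance of a
    for (a, b), (c, d) in tables:
        n = a + b + c + d
        if n < 2:
            continue  # a stratum this small carries no information
        row1, row2 = a + b, c + d
        col1, col2 = a + c, b + d
        diff_sum += a - row1 * col1 / n
        var_sum += row1 * row2 * col1 * col2 / (n * n * (n - 1))
    if var_sum == 0:
        raise ValueError("no informative strata")
    delta = abs(diff_sum)
    if continuity:
        delta = max(delta - 0.5, 0.0)
    stat = delta ** 2 / var_sum
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical counts for two ethnicity strata (not data from the study).
example_strata = [((12, 8), (5, 15)), ((9, 11), (7, 13))]
example_stat, example_p = cmh_test(example_strata)
```

    Comparing the p-value from this pooled test with the unstratified one is the kind of check the abstract describes: similar values suggest that population stratification is not driving the result.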

  11. A few nucleotide polymorphisms are sufficient to recruit nuclear factors differentially to the intron 1 of HPV-16 intratypic variants.

    PubMed

    López-Urrutia, Eduardo; Valdés, Jesús; Bonilla-Moreno, Raúl; Martínez-Salazar, Martha; Martínez-Garcia, Martha; Berumen, Jaime; Villegas-Sepúlveda, Nicolás

    2012-06-01

    The HPV-16 E6/E7 genes, which contain intron 1, are processed by alternative splicing and their transcripts are detected with a heterogeneous profile in tumour cells. Frequently, the HPV-16 positive carcinoma cells bear viral variants that contain single nucleotide polymorphisms in their DNA sequence. We were interested in analysing the contribution of this polymorphism to the heterogeneity in the pattern of the E6/E7 spliced transcripts. Using the E6/E7 sequences from three closely related HPV-16 variants, we have shown that a few nucleotide changes are sufficient to produce heterogeneity in the splicing profile. Furthermore, using mutants that contained a single SNP, we also showed that one nucleotide change was sufficient to reproduce the heterogeneous splicing profile. Additionally, a difference of two or three SNPs among these viral sequences was sufficient to recruit differentially several splicing factors to the polymorphic E6/E7 transcripts. Moreover, only one SNP was sufficient to alter the binding site of at least one splicing factor, changing the ability of splicing factors to bind the transcript. Finally, the factors that were differentially bound to the short form of intron 1 of one of these E6/E7 variants were identified as TIA1 and/or TIAR and U1-70k, while U2AF65, U5-52k and PTB were preferentially bound to the transcript of the other variants. Copyright © 2012 Elsevier B.V. All rights reserved.

  12. Retrospective genotype-phenotype analysis in a 305 patient cohort referred for testing of a targeted epilepsy panel.

    PubMed

    Hesse, Andrew N; Bevilacqua, Jennifer; Shankar, Kritika; Reddi, Honey V

    2018-05-16

    Epilepsy is a diverse neurological condition with extreme genetic and phenotypic heterogeneity. The introduction of next-generation sequencing into the clinical laboratory has made it possible to investigate hundreds of associated genes simultaneously for a patient, even in the absence of a clearly defined syndrome. This has resulted in the detection of rare and novel mutations at a rate well beyond our ability to characterize their effects. This retrospective study reviews genotype data in the context of available phenotypic information on 305 patients spanning the epileptic spectrum to identify established and novel patterns of correlation. Our epilepsy panel comprising 377 genes was used to sequence 305 patients referred for genetic testing. Qualifying variants were annotated with phenotypic data obtained from either the test requisition form or supporting clinical documentation. Observed phenotypes were compared with established phenotypes in OMIM, published literature and the ILAE's 2010 report on genetic testing to assess congruity with known gene aberrations. We identified a number of novel and recognized genetic variants consistent with established epileptic phenotypes. Forty-one pathogenic or predicted deleterious variants were detected in 39 patients with accompanying clinical documentation. Twenty-five of these variants across 15 genes were novel. Furthermore, evaluation of phenotype data for 194 patients with variants of unknown significance in genes with autosomal dominant and X-linked disease inheritance elucidated potentially disease-causing variants that were not currently characterized in the literature. Assessment of key genotype-phenotype correlations from our cohort provides insight into variant classification, as well as the importance of including ILAE recommended genes as part of minimum panel content for comprehensive epilepsy tests.
Many of the reported VUSs are likely genuine pathogenic variants driving the observed phenotypes, but not enough evidence is available for assertive classifications. Similar studies will provide more utility via mounting independent genotype-phenotype data from unrelated patients. The possible outcome would be a better molecular diagnostic product, with fewer indeterminate reports containing only VUSs. Copyright © 2018. Published by Elsevier B.V.

  13. Variant calling in low-coverage whole genome sequencing of a Native American population sample.

    PubMed

    Bizon, Chris; Spiegel, Michael; Chasse, Scott A; Gizer, Ian R; Li, Yun; Malc, Ewa P; Mieczkowski, Piotr A; Sailsbery, Josh K; Wang, Xiaoshu; Ehlers, Cindy L; Wilhelmsen, Kirk C

    2014-01-30

    The reduction in the cost of sequencing a human genome has led to the use of genotype sampling strategies in order to impute and infer the presence of sequence variants that can then be tested for associations with traits of interest. Low-coverage Whole Genome Sequencing (WGS) is a sampling strategy that overcomes some of the deficiencies seen in fixed content SNP array studies. Linkage-disequilibrium (LD) aware variant callers, such as the program Thunder, may provide a calling rate and accuracy that makes a low-coverage sequencing strategy viable. We examined the performance of an LD-aware variant calling strategy in a population of 708 low-coverage whole genome sequences from a community sample of Native Americans. We assessed variant calling through a comparison of the sequencing results to genotypes measured in 641 of the same subjects using a fixed content first generation exome array. The comparison was made using the variant calling routines GATK Unified Genotyper program and the LD-aware variant caller Thunder. Thunder was found to improve concordance in a coverage dependent fashion, while correctly calling nearly all of the common variants as well as a high percentage of the rare variants present in the sample. Low-coverage WGS is a strategy that appears to collect genetic information intermediate in scope between fixed content genotyping arrays and deep-coverage WGS. Our data suggests that low-coverage WGS is a viable strategy with a greater chance of discovering novel variants and associations than fixed content arrays for large sample association analyses.

  14. Outcome of ABCA4 disease-associated alleles in autosomal recessive retinal dystrophies: retrospective analysis in 420 Spanish families.

    PubMed

    Riveiro-Alvarez, Rosa; Lopez-Martinez, Miguel-Angel; Zernant, Jana; Aguirre-Lamban, Jana; Cantalapiedra, Diego; Avila-Fernandez, Almudena; Gimenez, Ascension; Lopez-Molina, Maria-Isabel; Garcia-Sandoval, Blanca; Blanco-Kelly, Fiona; Corton, Marta; Tatu, Sorina; Fernandez-San Jose, Patricia; Trujillo-Tiebas, Maria-Jose; Ramos, Carmen; Allikmets, Rando; Ayuso, Carmen

    2013-11-01

    To provide a comprehensive overview of all detected mutations in the ABCA4 gene in Spanish families with autosomal recessive retinal disorders, including Stargardt's disease (arSTGD), cone-rod dystrophy (arCRD), and retinitis pigmentosa (arRP), and to assess genotype-phenotype correlation and disease progression in 10 years by considering the type of variants and age at onset. Case series. A total of 420 unrelated Spanish families: 259 arSTGD, 86 arCRD, and 75 arRP. Spanish families were analyzed through a combination of ABCR400 genotyping microarray, denaturing high-performance liquid chromatography, and high-resolution melting scanning. Direct sequencing was used as a confirmation technique for the identified variants. Screening by multiple ligation probe analysis was used to detect possible large deletions or insertions in the ABCA4 gene. Selected families were analyzed further by next generation sequencing. DNA sequence variants, mutation detection rates, haplotypes, age at onset, central or peripheral vision loss, and night blindness. Overall, we detected 70.5% and 36.6% of all expected ABCA4 mutations in arSTGD and arCRD patient cohorts, respectively. In the fraction of the cohort where the ABCA4 gene was sequenced completely, the detection rates reached 73.6% for arSTGD and 66.7% for arCRD. However, the frequency of possibly pathogenic ABCA4 alleles in arRP families was only slightly higher than that in the general population. Moreover, in some families, mutations in other known arRP genes segregated with the disease phenotype. An increasing understanding of causal ABCA4 alleles in arSTGD and arCRD facilitates disease diagnosis and prognosis and also is paramount in selecting patients for emerging clinical trials of therapeutic interventions. Because ABCA4-associated diseases are evolving retinal dystrophies, assessment of age at onset, accurate clinical diagnosis, and genetic testing are crucial. 
We suggest that ABCA4 mutations may be associated with a retinitis pigmentosa-like phenotype often as a consequence of severe (null) mutations, in cases of long-term, advanced disease, or both. Patients with classical arRP phenotypes, especially from the onset of the disease, should be screened first for mutations in known arRP genes and not ABCA4. Copyright © 2013 American Academy of Ophthalmology. Published by Elsevier Inc. All rights reserved.

  15. De Novo Coding Variants Are Strongly Associated with Tourette Disorder

    PubMed Central

    Willsey, A. Jeremy; Fernandez, Thomas V.; Yu, Dongmei; King, Robert A.; Dietrich, Andrea; Xing, Jinchuan; Sanders, Stephan J.; Mandell, Jeffrey D.; Huang, Alden Y.; Richer, Petra; Smith, Louw; Dong, Shan; Samocha, Kaitlin E.; Neale, Benjamin M.; Coppola, Giovanni; Mathews, Carol A.; Tischfield, Jay A.; Scharf, Jeremiah M.; State, Matthew W.; Heiman, Gary A.

    2017-01-01

    SUMMARY Whole-exome sequencing (WES) and de novo variant detection have proven a powerful approach to gene discovery in complex neurodevelopmental disorders. We have completed WES of 325 Tourette disorder trios from the Tourette International Collaborative Genetics cohort and a replication sample of 186 trios from the Tourette Syndrome Association International Consortium on Genetics (511 total). We observe strong and consistent evidence for the contribution of de novo likely gene-disrupting (LGD) variants (rate ratio [RR] 2.32, p = 0.002). Additionally, de novo damaging variants (LGD and probably damaging missense) are overrepresented in probands (RR 1.37, p = 0.003). We identify four likely risk genes with multiple de novo damaging variants in unrelated probands: WWC1 (WW and C2 domain containing 1), CELSR3 (Cadherin EGF LAG seven-pass G-type receptor 3), NIPBL (Nipped-B-like), and FN1 (fibronectin 1). Overall, we estimate that de novo damaging variants in approximately 400 genes contribute risk in 12% of clinical cases. PMID:28472652

  16. A machine learning model to determine the accuracy of variant calls in capture-based next generation sequencing.

    PubMed

    van den Akker, Jeroen; Mishne, Gilad; Zimmer, Anjali D; Zhou, Alicia Y

    2018-04-17

    Next generation sequencing (NGS) has become a common technology for clinical genetic tests. The quality of NGS calls varies widely and is influenced by features like reference sequence characteristics, read depth, and mapping accuracy. With recent advances in NGS technology and software tools, the majority of variants called using NGS alone are in fact accurate and reliable. However, a small subset of difficult-to-call variants still requires orthogonal confirmation. For this reason, many clinical laboratories confirm NGS results using orthogonal technologies such as Sanger sequencing. Here, we report the development of a deterministic machine-learning-based model to differentiate between these two types of variant calls: those that do not require confirmation using an orthogonal technology (high confidence), and those that require additional quality testing (low confidence). This approach allows reliable NGS-based calling in a clinical setting by identifying the few important variant calls that require orthogonal confirmation. We developed and tested the model using a set of 7179 variants identified by a targeted NGS panel and re-tested by Sanger sequencing. The model incorporated several signals of sequence characteristics and call quality to determine if a variant was identified at high or low confidence. The model was tuned to eliminate false positives, defined as variants that were called by NGS but not confirmed by Sanger sequencing. The model achieved very high accuracy: 99.4% (95% confidence interval: +/- 0.03%). It categorized 92.2% (6622/7179) of the variants as high confidence, and 100% of these were confirmed to be present by Sanger sequencing. Among the variants that were categorized as low confidence, defined as NGS calls of low quality that are likely to be artifacts, 92.1% (513/557) were found to be not present by Sanger sequencing.
This work shows that NGS data contains sufficient characteristics for a machine-learning-based model to differentiate low from high confidence variants. Additionally, it reveals the importance of incorporating site-specific features as well as variant call features in such a model.
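    The percentages in the abstract above can be re-derived from the counts it reports. The short sketch below does that arithmetic; the helper name and variable names are illustrative. Treating the 99.4% accuracy as the share of calls whose confidence label agreed with the Sanger result is an assumption that happens to reproduce the reported figure, not a definition given in the abstract.

```python
def pct(numerator, denominator, digits=1):
    """Percentage rounded to the given number of decimal places."""
    return round(100.0 * numerator / denominator, digits)


# Counts as reported in the abstract.
total_variants = 7179
high_confidence = 6622          # all confirmed present by Sanger sequencing
low_confidence = total_variants - high_confidence
low_confidence_absent = 513     # low-confidence calls not found by Sanger

high_confidence_share = pct(high_confidence, total_variants)
low_confidence_absent_share = pct(low_confidence_absent, low_confidence)

# Assumed reading of "accuracy": high-confidence calls confirmed present
# plus low-confidence calls confirmed absent, over all variants.
label_agreement = pct(high_confidence + low_confidence_absent, total_variants)

# Low-confidence calls that Sanger sequencing nevertheless confirmed.
low_confidence_present = low_confidence - low_confidence_absent
```

    Under these counts, 44 of the 557 low-confidence calls were real variants, which is why the low-confidence group is sent for orthogonal confirmation rather than discarded.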

  17. SIN3A mutations are rare in men with azoospermia.

    PubMed

    Miyamoto, T; Koh, E; Tsujimura, A; Miyagawa, Y; Minase, G; Ueda, Y; Namiki, M; Sengoku, K

    2015-11-01

    A loss of function of the murine Sin3A gene resulted in male infertility with Sertoli cell-only syndrome (SCOS) phenotype in mice. Here, we investigated the relevance of this gene to human male infertility with azoospermia caused by SCOS. Mutation analysis of SIN3A in the coding region was performed on 80 Japanese patients. However, no variants could be detected. This study suggests a lack of association of SIN3A gene sequence variants with azoospermia caused by SCOS in humans. © 2014 Blackwell Verlag GmbH.

  18. Sensitive cell-based assay for determination of human immunodeficiency virus type 1 coreceptor tropism.

    PubMed

    Weber, Jan; Vazquez, Ana C; Winner, Dane; Gibson, Richard M; Rhea, Ariel M; Rose, Justine D; Wylie, Doug; Henry, Kenneth; Wright, Alison; King, Kevin; Archer, John; Poveda, Eva; Soriano, Vicente; Robertson, David L; Olivo, Paul D; Arts, Eric J; Quiñones-Mateu, Miguel E

    2013-05-01

    CCR5 antagonists are a powerful new class of antiretroviral drugs that require a companion assay to evaluate the presence of CXCR4-tropic (non-R5) viruses prior to use in human immunodeficiency virus (HIV)-infected individuals. In this study, we have developed, characterized, verified, and prevalidated a novel phenotypic test to determine HIV-1 coreceptor tropism (VERITROP) based on a sensitive cell-to-cell fusion assay. A proprietary vector was constructed containing a near-full-length HIV-1 genome with the yeast uracil biosynthesis (URA3) gene replacing the HIV-1 env coding sequence. Patient-derived HIV-1 PCR products were introduced by homologous recombination using an innovative yeast-based cloning strategy. The env-expressing vectors were then used in a cell-to-cell fusion assay to determine the presence of R5 and/or non-R5 HIV-1 variants within the viral population. Results were compared with (i) the original version of Trofile (Monogram Biosciences, San Francisco, CA), (ii) population sequencing, and (iii) 454 pyrosequencing, with the genotypic data analyzed using several bioinformatics tools, i.e., the 11/24/25 rule, Geno2Pheno (2% to 5.75%, 3.5%, or 10% false-positive rate [FPR]), and webPSSM. VERITROP consistently detected minority non-R5 variants from clinical specimens, with an analytical sensitivity of 0.3%, with viral loads of ≥1,000 copies/ml, and from B and non-B subtypes. In a pilot study, a 73.7% (56/76) concordance was observed with the original Trofile assay, with 19 of the 20 discordant results corresponding to non-R5 variants detected using VERITROP and not by the original Trofile assay. The degree of concordance of VERITROP and Trofile with population and deep sequencing results depended on the algorithm used to determine HIV-1 coreceptor tropism. Overall, VERITROP showed better concordance with deep sequencing/Geno2Pheno at a 0.3% detection threshold (67%), whereas Trofile matched better with population sequencing (79%). However, 454 sequencing using Geno2Pheno at a 10% FPR and 0.3% threshold and VERITROP more accurately predicted the success of a maraviroc-based regimen. In conclusion, VERITROP may promote the development of new HIV coreceptor antagonists and aid in the treatment and management of HIV-infected individuals prior to and/or during treatment with this class of drugs.

  19. Utility of whole-genome sequencing for detection of newborn screening disorders in a population cohort of 1,696 neonates.

    PubMed

    Bodian, Dale L; Klein, Elisabeth; Iyer, Ramaswamy K; Wong, Wendy S W; Kothiyal, Prachi; Stauffer, Daniel; Huddleston, Kathi C; Gaither, Amber D; Remsburg, Irina; Khromykh, Alina; Baker, Robin L; Maxwell, George L; Vockley, Joseph G; Niederhuber, John E; Solomon, Benjamin D

    2016-03-01

    To assess the potential of whole-genome sequencing (WGS) to replicate and augment results from conventional blood-based newborn screening (NBS). Research-generated WGS data from an ancestrally diverse cohort of 1,696 infants and both parents of each infant were analyzed for variants in 163 genes involved in disorders included or under discussion for inclusion in US NBS programs. WGS results were compared with results from state NBS and related follow-up testing. NBS genes are generally well covered by WGS. There is a median of one (range: 0-6) database-annotated pathogenic variant in the NBS genes per infant. Results of WGS and NBS in detecting 28 state-screened disorders and four hemoglobin traits were concordant for 88.6% of true positives (n = 35) and 98.9% of true negatives (n = 45,757). Of the five infants affected with a state-screened disorder, WGS identified two whereas NBS detected four. WGS yielded fewer false positives than NBS (0.037 vs. 0.17%) but more results of uncertain significance (0.90 vs. 0.013%). WGS may help rule in and rule out NBS disorders, pinpoint molecular diagnoses, and detect conditions not amenable to current NBS assays.

  20. An Engineered Kinetic Amplification Mechanism for Single Nucleotide Variant Discrimination by DNA Hybridization Probes.

    PubMed

    Chen, Sherry Xi; Seelig, Georg

    2016-04-20

    Even a single-nucleotide difference between the sequences of two otherwise identical biological nucleic acids can have dramatic functional consequences. Here, we use model-guided reaction pathway engineering to quantitatively improve the performance of selective hybridization probes in recognizing single nucleotide variants (SNVs). Specifically, we build a detection system that combines discrimination by competition with DNA strand displacement-based catalytic amplification. We show, both mathematically and experimentally, that the single nucleotide selectivity of such a system in binding to single-stranded DNA and RNA is quadratically better than discrimination due to competitive hybridization alone. As an additional benefit, the integrated circuit inherits the property of amplification and provides at least 10-fold better sensitivity than standard hybridization probes. Moreover, we demonstrate how the detection mechanism can be tuned such that the detection reaction is agnostic to the position of the SNV within the target sequence. In contrast, prior strand displacement-based probes designed for kinetic discrimination are highly sensitive to position effects. We apply our system to reliably discriminate between different members of the let-7 microRNA family that differ in only a single base position. Our results demonstrate the power of systematic reaction network design to quantitatively improve biotechnology.

  1. Integrating mRNA and protein sequencing enables the detection and quantitative profiling of natural protein sequence variants of Populus trichocarpa

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Abraham, Paul E.; Wang, Xiaojing; Ranjan, Priya

    The availability of next-generation sequencing technologies has rapidly transformed our ability to link genotypes to phenotypes, and as such, promises to facilitate the dissection of genetic contribution to complex traits. Although discoveries of genetic associations will further our understanding of biology, once candidate variants have been identified, investigators are faced with the challenge of characterizing the functional effects on proteins encoded by such genes. Here we show how next-generation RNA sequencing data can be exploited to construct genotype-specific protein sequence databases, which provide a clearer picture of the molecular toolbox underlying cellular and organismal processes and their variation in a natural population. For this study, we used two individual genotypes (DENA-17-3 and VNDL-27-4) from a recent genome wide association (GWA) study of Populus trichocarpa, an obligate outcrosser that exhibits tremendous phenotypic variation across the natural population. This strategy allowed us to comprehensively catalogue proteins containing single amino acid polymorphisms (SAAPs) and insertions and deletions (INDELS). Based on large-scale identification of SAAPs, we profiled the frequency of 128 types of naturally occurring amino acid substitutions, with a subset of SAAPs occurring in regions of the genome having strong polymorphism patterns consistent with recent positive and/or divergent selection. In addition, we were able to explore the diploid landscape of Populus at the proteome-level, allowing the characterization of heterozygous variants.

  2. Integrating mRNA and protein sequencing enables the detection and quantitative profiling of natural protein sequence variants of Populus trichocarpa

    DOE PAGES

    Abraham, Paul E.; Wang, Xiaojing; Ranjan, Priya; ...

    2015-10-20

    The availability of next-generation sequencing technologies has rapidly transformed our ability to link genotypes to phenotypes, and as such, promises to facilitate the dissection of genetic contribution to complex traits. Although discoveries of genetic associations will further our understanding of biology, once candidate variants have been identified, investigators are faced with the challenge of characterizing the functional effects on proteins encoded by such genes. Here we show how next-generation RNA sequencing data can be exploited to construct genotype-specific protein sequence databases, which provide a clearer picture of the molecular toolbox underlying cellular and organismal processes and their variation in a natural population. For this study, we used two individual genotypes (DENA-17-3 and VNDL-27-4) from a recent genome wide association (GWA) study of Populus trichocarpa, an obligate outcrosser that exhibits tremendous phenotypic variation across the natural population. This strategy allowed us to comprehensively catalogue proteins containing single amino acid polymorphisms (SAAPs) and insertions and deletions (INDELS). Based on large-scale identification of SAAPs, we profiled the frequency of 128 types of naturally occurring amino acid substitutions, with a subset of SAAPs occurring in regions of the genome having strong polymorphism patterns consistent with recent positive and/or divergent selection. In addition, we were able to explore the diploid landscape of Populus at the proteome-level, allowing the characterization of heterozygous variants.

  3. Exome sequencing reveals novel genetic loci influencing obesity-related traits in Hispanic children

    USDA-ARS's Scientific Manuscript database

    To perform whole exome sequencing in 928 Hispanic children and identify variants and genes associated with childhood obesity. Single-nucleotide variants (SNVs) were identified from Illumina whole exome sequencing data using integrated read mapping, variant calling, and an annotation pipeline (Mercury...

  4. Novel sequence variants in the TMIE gene in families with autosomal recessive nonsyndromic hearing impairment

    PubMed Central

    Santos, Regie Lyn P.; El-Shanti, Hatem; Sikandar, Shaheen; Lee, Kwanghyuk; Bhatti, Attya; Yan, Kai; Chahrour, Maria H.; McArthur, Nathan; Pham, Thanh L.; Mahasneh, Amjad Abdullah; Ahmad, Wasim

    2010-01-01

    To date, 37 genes have been identified for nonsyndromic hearing impairment (NSHI). Identifying the functional sequence variants within these genes and knowing their population-specific frequencies is of public health value, in particular for genetic screening for NSHI. To determine putatively functional sequence variants in the transmembrane inner ear (TMIE) gene in Pakistani and Jordanian families with autosomal recessive (AR) NSHI, four Jordanian and 168 Pakistani families with ARNSHI that is not due to GJB2 (CX26) were submitted to a genome scan. Two-point and multipoint parametric linkage analyses were performed, and families with logarithmic odds (LOD) scores of 1.0 or greater within the TMIE region underwent further DNA sequencing. The evolutionary conservation and location in predicted protein domains of amino acid residues where sequence variants occurred were studied to elucidate the possible effects of these sequence variants on function. Of seven families that were screened for TMIE, putatively functional sequence variants were found to segregate with hearing impairment in four families but were not seen in at least 110 ethnically matched control chromosomes. The previously reported c.241C>T (p.R81C) variant was observed in two Pakistani families. Two novel variants, c.92A>G (p.E31G) and the splice site mutation c.212–2A>C, were identified in one Pakistani and one Jordanian family, respectively. The c.92A>G (p.E31G) variant occurred at a residue that is conserved in the mouse and is predicted to be extracellular. Conservation and potential functionality of previously published mutations were also examined. The prevalence of functional TMIE variants in Pakistani families is 1.7% [95% confidence interval (CI) 0.3–4.8]. Further studies on the spectrum, prevalence rates, and functional effect of sequence variants in the TMIE gene in other populations should demonstrate the true importance of this gene as a cause of hearing impairment. PMID:16389551

  5. Polypeptide having or assisting in carbohydrate material degrading activity and uses thereof

    DOEpatents

    Schooneveld-Bergmans, Margot Elisabeth Francoise; Heijne, Wilbert Herman Marie; Los, Alrik Pieter

    2016-02-16

    The invention relates to a polypeptide which comprises the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 76% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 76% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well as the amino acid sequence of the full-length functional polypeptide and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.

  6. Polypeptide having beta-glucosidase activity and uses thereof

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Schoonneveld-Bergmans, Margot Elisabeth Francoise; Heijne, Wilbert Herman Marie; De Jong, Rene Marcel

    The invention relates to a polypeptide comprising the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 96% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 96% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well as the amino acid sequence of the full-length functional polypeptide and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.

  7. Polypeptide having swollenin activity and uses thereof

    DOEpatents

    Schoonneveld-Bergmans, Margot Elizabeth Francoise; Heijne, Wilbert Herman Marie; Vlasie, Monica D; Damveld, Robbertus Antonius

    2015-11-04

    The invention relates to a polypeptide comprising the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 73% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 73% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well as the amino acid sequence of the full-length functional polypeptide and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.

  8. Polypeptide having beta-glucosidase activity and uses thereof

    DOEpatents

    Schooneveld-Bergmans, Margot Elisabeth Francoise; Heijne, Wilbert Herman Marie; De Jong, Rene Marcel; Damveld, Robbertus Antonius

    2015-09-01

    The invention relates to a polypeptide comprising the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 70% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 70% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well as the amino acid sequence of the full-length functional polypeptide and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.

  9. Polypeptide having cellobiohydrolase activity and uses thereof

    DOEpatents

    Sagt, Cornelis Maria Jacobus; Schooneveld-Bergmans, Margot Elisabeth Francoise; Roubos, Johannes Andries; Los, Alrik Pieter

    2015-09-15

    The invention relates to a polypeptide comprising the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 93% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 93% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well as the amino acid sequence of the full-length functional polypeptide and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.

  10. Polypeptide having acetyl xylan esterase activity and uses thereof

    DOEpatents

    Schoonneveld-Bergmans, Margot Elisabeth Francoise; Heijne, Wilbert Herman Marie; Los, Alrik Pieter

    2015-10-20

    The invention relates to a polypeptide comprising the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 82% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 82% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well as the amino acid sequence of the full-length functional polypeptide and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.

  11. Polypeptide having carbohydrate degrading activity and uses thereof

    DOEpatents

    Schooneveld-Bergmans, Margot Elisabeth Francoise; Heijne, Wilbert Herman Marie; Vlasie, Monica Diana; Damveld, Robbertus Antonius

    2015-08-18

    The invention relates to a polypeptide comprising the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 73% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 73% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well as the amino acid sequence of the full-length functional polypeptide and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.

  12. Whole Exome Sequencing of Pediatric Gastric Adenocarcinoma Reveals an Atypical Presentation of Li-Fraumeni Syndrome

    PubMed Central

    Chang, Vivian Y.; Federman, Noah; Martinez-Agosto, Julian; Tatishchev, Sergei F.; Nelson, Stanley F.

    2014-01-01

    Background Gastric adenocarcinoma is a rare diagnosis in childhood. A 14-year old male patient presented with metastatic gastric adenocarcinoma, and a strong family history of colon cancer. Clinical sequencing of CDH1 and APC were negative. Whole exome sequencing was therefore applied to capture the majority of protein-coding regions for the identification of single-nucleotide variants, small insertion/deletions, and copy number abnormalities in the patient’s germline as well as primary tumor. Materials and Methods DNA was extracted from the patient’s blood, primary tumor, and the unaffected mother’s blood. DNA libraries were constructed and sequenced on Illumina HiSeq2000. Data were post-processed using Picard and Samtools, then analyzed with the Genome Analysis Toolkit. Variants were annotated using an in-house Ensembl-based program. Copy number was assessed using ExomeCNV. Results Each sample was sequenced to a mean depth of coverage of greater than 120×. A rare non-synonymous coding SNV in TP53 was identified in the germline. There were 10 somatic cancer protein-damaging variants that were not observed in the unaffected mother genome. ExomeCNV comparing tumor to the patient’s germline, identified abnormal copy number, spanning 6,946 genes. Conclusion We present an unusual case of Li-Fraumeni detected by whole exome sequencing. There were also likely driver somatic mutations in the gastric adenocarcinoma. These results highlight the need for more thorough and broad scale germline and cancer analyses to accurately inform patients of inherited risk to cancer and to identify somatic mutations. PMID:23015295

  13. Population genomic analysis of strain variation in Leptospirillum group II bacteria involved in acid mine drainage formation.

    PubMed

    Simmons, Sheri L; Dibartolo, Genevieve; Denef, Vincent J; Goltsman, Daniela S Aliaga; Thelen, Michael P; Banfield, Jillian F

    2008-07-22

    Deeply sampled community genomic (metagenomic) datasets enable comprehensive analysis of heterogeneity in natural microbial populations. In this study, we used sequence data obtained from the dominant member of a low-diversity natural chemoautotrophic microbial community to determine how coexisting closely related individuals differ from each other in terms of gene sequence and gene content, and to uncover evidence of evolutionary processes that occur over short timescales. DNA sequence obtained from an acid mine drainage biofilm was reconstructed, taking into account the effects of strain variation, to generate a nearly complete genome tiling path for a Leptospirillum group II species closely related to L. ferriphilum (sampling depth approximately 20x). The population is dominated by one sequence type, yet we detected evidence for relatively abundant variants (>99.5% sequence identity to the dominant type) at multiple loci, and a few rare variants. Blocks of other Leptospirillum group II types (approximately 94% sequence identity) have recombined into one or more variants. Variant blocks of both types are more numerous near the origin of replication. Heterogeneity in genetic potential within the population arises from localized variation in gene content, typically focused in integrated plasmid/phage-like regions. Some laterally transferred gene blocks encode physiologically important genes, including quorum-sensing genes of the LuxIR system. Overall, results suggest inter- and intrapopulation genetic exchange involving distinct parental genome types and implicate gain and loss of phage and plasmid genes in recent evolution of this Leptospirillum group II population. Population genetic analyses of single nucleotide polymorphisms indicate variation between closely related strains is not maintained by positive selection, suggesting that these regions do not represent adaptive differences between strains. Thus, the most likely explanation for the observed patterns of polymorphism is divergence of ancestral strains due to geographic isolation, followed by mixing and subsequent recombination.

  14. Population Genomic Analysis of Strain Variation in Leptospirillum Group II Bacteria Involved in Acid Mine Drainage Formation

    PubMed Central

    Denef, Vincent J; Goltsman, Daniela S. Aliaga; Thelen, Michael P; Banfield, Jillian F

    2008-01-01

    Deeply sampled community genomic (metagenomic) datasets enable comprehensive analysis of heterogeneity in natural microbial populations. In this study, we used sequence data obtained from the dominant member of a low-diversity natural chemoautotrophic microbial community to determine how coexisting closely related individuals differ from each other in terms of gene sequence and gene content, and to uncover evidence of evolutionary processes that occur over short timescales. DNA sequence obtained from an acid mine drainage biofilm was reconstructed, taking into account the effects of strain variation, to generate a nearly complete genome tiling path for a Leptospirillum group II species closely related to L. ferriphilum (sampling depth ∼20×). The population is dominated by one sequence type, yet we detected evidence for relatively abundant variants (>99.5% sequence identity to the dominant type) at multiple loci, and a few rare variants. Blocks of other Leptospirillum group II types (∼94% sequence identity) have recombined into one or more variants. Variant blocks of both types are more numerous near the origin of replication. Heterogeneity in genetic potential within the population arises from localized variation in gene content, typically focused in integrated plasmid/phage-like regions. Some laterally transferred gene blocks encode physiologically important genes, including quorum-sensing genes of the LuxIR system. Overall, results suggest inter- and intrapopulation genetic exchange involving distinct parental genome types and implicate gain and loss of phage and plasmid genes in recent evolution of this Leptospirillum group II population. Population genetic analyses of single nucleotide polymorphisms indicate variation between closely related strains is not maintained by positive selection, suggesting that these regions do not represent adaptive differences between strains. Thus, the most likely explanation for the observed patterns of polymorphism is divergence of ancestral strains due to geographic isolation, followed by mixing and subsequent recombination. PMID:18651792

  15. Droplet digital PCR technology promises new applications and research areas.

    PubMed

    Manoj, P

    2016-01-01

    Digital Polymerase Chain Reaction (dPCR) is used to quantify nucleic acids, and its applications include the detection and precise quantification of low-level pathogens, rare genetic sequences, copy number variants, rare mutations, and relative gene expression. Here the PCR is performed in a large number of reaction chambers or partitions, and the reaction is carried out in each partition individually. This separation allows a more reliable collection and sensitive measurement of nucleic acid. Results are calculated by counting the partitions containing amplified target sequence (positive droplets) and the partitions in which there is no amplification (negative droplets). The mean number of target sequences is calculated using a Poisson algorithm. Poisson correction compensates for the presence of more than one copy of the target gene in any droplet. The method provides information with accuracy and precision, is highly reproducible, and is less susceptible to inhibitors than qPCR. It has been demonstrated in studying variations in gene sequences, such as copy number variants and point mutations, distinguishing differences between expression of nearly identical alleles, and assessment of clinically relevant genetic variations, and it is routinely used for clonal amplification of samples for NGS methods. dPCR enables more reliable predictors of tumor status and patient prognosis by absolute quantitation using reference normalizations. Rare mitochondrial DNA deletions associated with a range of diseases and disorders as well as aging can be accurately detected with droplet digital PCR.
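
    The Poisson correction mentioned in this abstract can be illustrated with a minimal Python sketch. It assumes the commonly used dPCR estimate, in which the mean number of target copies per partition is -ln(1 - p) and p is the fraction of positive partitions. The function names and the partition volume argument are hypothetical and are not taken from the record above.

```python
import math


def copies_per_partition(positive: int, total: int) -> float:
    """Estimate mean target copies per partition (lambda) under a Poisson model.

    A partition is negative only if it received zero copies, so
    P(negative) = exp(-lambda) and lambda = -ln(1 - positive / total).
    """
    if total <= 0 or not 0 <= positive < total:
        # positive == total would mean every partition is saturated,
        # and the Poisson estimate is undefined in that case.
        raise ValueError("need 0 <= positive < total and total > 0")
    fraction_positive = positive / total
    return -math.log(1.0 - fraction_positive)


def concentration_copies_per_ul(positive: int, total: int,
                                partition_volume_nl: float) -> float:
    """Convert lambda to copies per microliter.

    partition_volume_nl is an assumed, instrument-specific input.
    """
    lam = copies_per_partition(positive, total)
    return lam / (partition_volume_nl * 1e-3)  # 1 nL = 1e-3 uL


if __name__ == "__main__":
    lam = copies_per_partition(5000, 20000)
    print(f"copies per partition: {lam:.4f}")
```

    The corrected estimate is always at least as large as the raw positive fraction, because some positive partitions hold more than one copy.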

  16. Steroid 5alpha-reductase 1 polymorphisms and testosterone/dihydrotestosterone ratio in male patients with hypospadias.

    PubMed

    Tria, Antje; Hiort, Olaf; Sinnecker, Gernot H G

    2004-01-01

    Defects in the steroid 5alpha-reductase type 2 (SRD5A2) activity cause decreased formation of dihydrotestosterone (DHT) from testosterone (T), resulting in defective masculinization of external genitalia; the T/DHT ratio is increased. We investigated 10 patients with elevated T/DHT ratios in whom mutations in the SRD5A2 and AR genes had been excluded to find out whether structural alterations of the SRD5A1 gene could contribute to their genital malformations. Single-strand conformation polymorphism analysis and direct sequencing were used to detect variations in the SRD5A1 gene of the patients and of 49 adult fertile men who served as controls. The sequence analysis of exon 3 of the SRD5A1 gene indicated an adenine-to-guanine change (ACA vs. ACG), both triplets encoding the amino acid residue threonine. The ACG sequence was detected in 57% of all subjects and was equally distributed in patients and controls. The T/DHT ratio was significantly higher in controls with the ACG variant as compared with those having the ACA variant. However, no particular sequence aberration was found in the SRD5A1 genes of either group. Mutant SRD5A1 isoenzyme does not seem to play a crucial role in the development of hypospadias. Copyright 2004 S. Karger AG, Basel

  17. Exome sequencing and genome-wide linkage analysis in 17 families illustrate the complex contribution of TTN truncating variants to dilated cardiomyopathy.

    PubMed

    Norton, Nadine; Li, Duanxiang; Rampersaud, Evadnie; Morales, Ana; Martin, Eden R; Zuchner, Stephan; Guo, Shengru; Gonzalez, Michael; Hedges, Dale J; Robertson, Peggy D; Krumm, Niklas; Nickerson, Deborah A; Hershberger, Ray E

    2013-04-01

    BACKGROUND- Familial dilated cardiomyopathy (DCM) is a genetically heterogeneous disease with >30 known genes. TTN truncating variants were recently implicated in a candidate gene study to cause 25% of familial and 18% of sporadic DCM cases. METHODS AND RESULTS- We used an unbiased genome-wide approach using both linkage analysis and variant filtering across the exome sequences of 48 individuals affected with DCM from 17 families to identify genetic cause. Linkage analysis ranked the TTN region as falling under the second highest genome-wide multipoint linkage peak, multipoint logarithm of odds, 1.59. We identified 6 TTN truncating variants carried by individuals affected with DCM in 7 of 17 DCM families (logarithm of odds, 2.99); 2 of these 7 families also had novel missense variants that segregated with disease. Two additional novel truncating TTN variants did not segregate with DCM. Nucleotide diversity at the TTN locus, including missense variants, was comparable with 5 other known DCM genes. The average number of missense variants in the exome sequences from the DCM cases or the ≈5400 cases from the Exome Sequencing Project was ≈23 per individual. The average number of TTN truncating variants in the Exome Sequencing Project was 0.014 per individual. We also identified a region (chr9q21.11-q22.31) with no known DCM genes with a maximum heterogeneity logarithm of odds score of 1.74. CONCLUSIONS- These data suggest that TTN truncating variants contribute to DCM cause. However, the lack of segregation of all identified TTN truncating variants illustrates the challenge of determining variant pathogenicity even with full exome sequencing.

  18. Picosecond-resolved FRET on non-amplified DNA for identifying individuals genetically susceptible to type-1 diabetes

    NASA Astrophysics Data System (ADS)

    Nardo, Luca; Tosi, Giovanna; Bondani, Maria; Accolla, Roberto; Andreoni, Alessandra

    2012-06-01

    By tens-of-picosecond resolved fluorescence detection we study Förster resonance energy transfer between a donor and a black-hole-quencher bound at the 5'- and 3'-positions of an oligonucleotide probe matching the highly polymorphic region between codons 51 and 58 of the human leukocyte antigen DQB1 0201 allele, conferring susceptibility to type-1 diabetes. The probe is annealed with non-amplified genomic DNAs carrying either the 0201 sequence or other DQB1 allelic variants. We detect the longest-lived donor fluorescence in the case of hybridization with the 0201 allele and definitely faster and distinct decays for the other allelic variants, some of which are single-nucleotide polymorphic.

  19. Identifying micro-inversions using high-throughput sequencing reads.

    PubMed

    He, Feifei; Li, Yang; Tang, Yu-Hang; Ma, Jian; Zhu, Huaiqiu

    2016-01-11

    The identification of inversions of DNA segments shorter than read length (e.g., 100 bp), defined as micro-inversions (MIs), remains challenging for next-generation sequencing reads. It is acknowledged that MIs are an important type of genomic variation and may play roles in causing genetic disease. However, current alignment methods are generally insensitive to MIs. Here we develop a novel tool, MID (Micro-Inversion Detector), to identify MIs in human genomes using next-generation sequencing reads. The algorithm of MID is designed based on a dynamic programming path-finding approach. What makes MID different from other variant detection tools is that MID can handle small MIs and multiple breakpoints within an unmapped read. Moreover, MID improves reliability in low coverage data by integrating multiple samples. Our evaluation demonstrated that MID outperforms Gustaf, which can currently detect inversions from 30 bp to 500 bp. To our knowledge, MID is the first method that can efficiently and reliably identify MIs from unmapped short next-generation sequencing reads. MID is reliable on low coverage data, which makes it suitable for large-scale projects such as the 1000 Genomes Project (1KGP). MID identified previously unknown MIs from the 1KGP that overlap with genes and regulatory elements in the human genome. We also identified MIs in cancer cell lines from the Cancer Cell Line Encyclopedia (CCLE). Therefore our tool is expected to be useful to improve the study of MIs as a type of genetic variant in the human genome. The source code can be downloaded from: http://cqb.pku.edu.cn/ZhuLab/MID .
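
    To make the notion of a micro-inversion inside a single read concrete, here is a small Python sketch. It is a brute-force illustration only and is not the dynamic programming path-finding algorithm used by MID. It assumes the read and the reference window have equal length and differ by one inverted segment, and the function names and the min_len parameter are hypothetical.

```python
from typing import Optional, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_micro_inversion(read: str, ref_window: str,
                         min_len: int = 4) -> Optional[Tuple[int, int]]:
    """Find a half-open interval [start, end) whose reverse complement in
    the read restores an exact match to the reference window.

    Returns None when the read already matches or no single inverted
    segment explains the mismatches.
    """
    if len(read) != len(ref_window):
        raise ValueError("read and reference window must have equal length")
    mismatches = [i for i, (a, b) in enumerate(zip(read, ref_window)) if a != b]
    if not mismatches:
        return None
    first, last = mismatches[0], mismatches[-1]
    n = len(read)
    # Every mismatch must lie inside the inverted segment, so try the
    # tightest interval first and then widen it on either side.
    for start in range(first, -1, -1):
        for end in range(last + 1, n + 1):
            if end - start < min_len:
                continue
            if reverse_complement(read[start:end]) == ref_window[start:end]:
                return start, end
    return None


if __name__ == "__main__":
    ref = "TTGACCATGCAGGA"
    read = ref[:4] + reverse_complement(ref[4:10]) + ref[10:]
    print(find_micro_inversion(read, ref))
```

    A single substituted base does not satisfy the reverse-complement test, so the sketch reports no inversion for an ordinary SNV.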

  20. RefCNV: Identification of Gene-Based Copy Number Variants Using Whole Exome Sequencing.

    PubMed

    Chang, Lun-Ching; Das, Biswajit; Lih, Chih-Jian; Si, Han; Camalier, Corinne E; McGregor, Paul M; Polley, Eric

    2016-01-01

    With rapid advances in DNA sequencing technologies, whole exome sequencing (WES) has become a popular approach for detecting somatic mutations in oncology studies. The initial intent of WES was to characterize single nucleotide variants, but it was observed that the number of sequencing reads that mapped to a genomic region correlated with the DNA copy number variants (CNVs). We propose a method RefCNV that uses a reference set to estimate the distribution of the coverage for each exon. The construction of the reference set includes an evaluation of the sources of variability in the coverage distribution. We observed that the processing steps had an impact on the coverage distribution. For each exon, we compared the observed coverage with the expected normal coverage. Thresholds for determining CNVs were selected to control the false-positive error rate. RefCNV prediction correlated significantly (r = 0.96-0.86) with CNV measured by digital polymerase chain reaction for MET (7q31), EGFR (7p12), or ERBB2 (17q12) in 13 tumor cell lines. The genome-wide CNV analysis showed a good overall correlation (Spearman's coefficient = 0.82) between RefCNV estimation and publicly available CNV data in Cancer Cell Line Encyclopedia. RefCNV also showed better performance than three other CNV estimation methods in genome-wide CNV analysis.
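
    The idea of comparing observed exon coverage with the coverage expected from a reference set can be sketched in Python as follows. This is a simplified z-score illustration under stated assumptions, not RefCNV's actual statistical model: coverage values are assumed to be already normalized for sequencing depth, and the function name and the threshold of 3.0 are hypothetical.

```python
from statistics import mean, stdev
from typing import List, Tuple


def call_exon_cnv(observed: float, reference: List[float],
                  z_threshold: float = 3.0) -> Tuple[str, float]:
    """Classify one exon as 'gain', 'loss' or 'neutral'.

    observed  -- normalized coverage of the exon in the test sample
    reference -- normalized coverage of the same exon in normal samples
    A stricter z_threshold lowers the false-positive rate at the cost
    of sensitivity.
    """
    if len(reference) < 2:
        raise ValueError("need at least two reference samples")
    mu = mean(reference)
    sd = stdev(reference)  # sample standard deviation
    if sd == 0:
        raise ValueError("reference coverage has zero variance")
    z = (observed - mu) / sd
    if z > z_threshold:
        return "gain", z
    if z < -z_threshold:
        return "loss", z
    return "neutral", z


if __name__ == "__main__":
    reference_set = [100.0, 102.0, 98.0, 101.0, 99.0]
    for coverage in (150.0, 100.0, 50.0):
        print(coverage, call_exon_cnv(coverage, reference_set))
```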
