Sample records for resequencing highly variable

  1. Evaluation of Quality Assessment Protocols for High Throughput Genome Resequencing Data

    PubMed Central

    Chiara, Matteo; Pavesi, Giulio

    2017-01-01

    Large-scale initiatives aiming to recover the complete sequence of thousands of human genomes are currently being undertaken worldwide, contributing to the generation of a comprehensive catalog of human genetic variation. The ultimate and most ambitious goal of human population scale genomics is the characterization of the so-called human “variome,” through the identification of causal mutations or haplotypes. Several research institutions worldwide currently use genotyping assays based on Next-Generation Sequencing (NGS) for diagnostics and clinical screenings, and the widespread application of such technologies promises major revolutions in medical science. Bioinformatic analysis of human resequencing data is one of the main factors limiting the effectiveness and general applicability of NGS for clinical studies. The need to combine multiple tools in dedicated protocols to accommodate different types of data (gene panels, exomes, or whole genomes), together with the high variability of the data, makes it difficult to establish a single strategy for general use. While several studies already compare the sensitivity and accuracy of bioinformatic pipelines for the identification of single nucleotide variants from resequencing data, little is known about the impact of quality assessment and read pre-processing strategies. In this work we discuss the major strengths and limitations of the various genome resequencing protocols that are currently used in molecular diagnostics and for the discovery of novel disease-causing mutations. Using publicly available data, we devise and suggest a series of best practices for pre-processing the data that consistently improve the outcome of genotyping with minimal impact on computational costs. PMID:28736571

  2. Application of resequencing to rice genomics, functional genomics and evolutionary analysis

    PubMed Central

    2014-01-01

    Rice is a model system used for crop genomics studies. The completion of the rice genome draft sequences in 2002 not only accelerated functional genome studies, but also initiated a new era of resequencing rice genomes. Using the rice reference genome, next-generation sequencing (NGS) on high-throughput sequencing systems can efficiently accomplish whole genome resequencing of various genetic populations and diverse germplasm resources. Resequencing technology has been effectively utilized in evolutionary analysis, rice genomics and functional genomics studies. This technique is beneficial both for bridging the knowledge gap between genotype and phenotype and for facilitating molecular breeding via gene design in rice. Here, we also discuss the limitations, applications and future prospects of rice resequencing. PMID:25006357

  3. GeneChip Resequencing of the Smallpox Virus Genome Can Identify Novel Strains: a Biodefense Application

    PubMed Central

    Sulaiman, Irshad M.; Tang, Kevin; Osborne, John; Sammons, Scott; Wohlhueter, Robert M.

    2007-01-01

    We developed a set of seven resequencing GeneChips, based on the complete genome sequences of 24 strains of smallpox virus (variola virus), for rapid characterization of this human-pathogenic virus. Each GeneChip was designed to analyze a divergent segment of approximately 30,000 bases of the smallpox virus genome. This study includes the hybridization results of 14 smallpox virus strains. Of the 14 smallpox virus strains hybridized, only 7 had sequence information included in the design of the smallpox virus resequencing GeneChips; similar information for the remaining strains was not tiled as a reference in these GeneChips. By use of variola virus-specific primers and long-range PCR, 22 overlapping amplicons were amplified to cover nearly the complete genome and hybridized with the smallpox virus resequencing GeneChip set. These GeneChips were successful in generating nucleotide sequences for all 14 of the smallpox virus strains hybridized. Analysis of the data indicated that the GeneChip resequencing by hybridization was fast and reproducible and that the smallpox virus resequencing GeneChips could differentiate the 14 smallpox virus strains characterized. This study also suggests that high-density resequencing GeneChips have potential biodefense applications and may be used as an alternate tool for rapid identification of smallpox virus in the future. PMID:17182757

  4. Targeted Re-Sequencing Emulsion PCR Panel for Myopathies: Results in 94 Cases.

    PubMed

    Punetha, Jaya; Kesari, Akanchha; Uapinyoying, Prech; Giri, Mamta; Clarke, Nigel F; Waddell, Leigh B; North, Kathryn N; Ghaoui, Roula; O'Grady, Gina L; Oates, Emily C; Sandaradura, Sarah A; Bönnemann, Carsten G; Donkervoort, Sandra; Plotz, Paul H; Smith, Edward C; Tesi-Rocha, Carolina; Bertorini, Tulio E; Tarnopolsky, Mark A; Reitter, Bernd; Hausmanowa-Petrusewicz, Irena; Hoffman, Eric P

    2016-05-27

    Molecular diagnostics in the genetic myopathies often requires testing of the largest and most complex transcript units in the human genome (DMD, TTN, NEB). Iteratively targeting single genes for sequencing has traditionally entailed high costs and long turnaround times. Exome sequencing has begun to supplant single-gene testing, but there are concerns regarding the coverage and depth needed for the very large and complex genes that frequently cause myopathies. We aimed to evaluate the efficiency of next-generation sequencing technologies in providing molecular diagnostics for patients with previously undiagnosed myopathies. We tested a targeted re-sequencing approach, using a 45-gene emulsion PCR myopathy panel with subsequent sequencing on the Illumina platform, in 94 undiagnosed patients. We compared the targeted re-sequencing approach to exome sequencing for 10 of these patients. We detected likely pathogenic mutations in 33 of 94 patients, a molecular diagnostic rate of approximately 35%. The remaining patients showed variants of unknown significance (35/94 patients) or no mutations in the 45 genes tested (26/94 patients). Mutation detection rates were similar for targeted re-sequencing and whole exome sequencing; however, exome sequencing showed a better distribution of reads and fewer exon dropouts. Given that the costs of highly parallel re-sequencing and whole exome sequencing are similar, and that exome sequencing now takes considerably less laboratory processing time than targeted re-sequencing, we recommend exome sequencing as the standard approach for molecular diagnostics of myopathies.
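    The diagnostic rate quoted in this abstract follows directly from the reported patient counts. A minimal Python check, using only the three counts given for the 94-patient cohort:

```python
# Patient counts reported for the 94-patient myopathy cohort.
outcomes = {
    "likely pathogenic": 33,
    "uncertain significance": 35,
    "no mutation detected": 26,
}
total = sum(outcomes.values())  # 33 + 35 + 26 = 94

# Share of the cohort in each outcome category, in percent (one decimal place).
rates = {label: round(100 * n / total, 1) for label, n in outcomes.items()}

for label, pct in rates.items():
    print(f"{label}: {pct}%")
```

    The first line printed is "likely pathogenic: 35.1%", which matches the "approximately 35%" stated above.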

  5. Global assessment of genomic variation in cattle by genome resequencing and high-throughput genotyping

    PubMed Central

    2011-01-01

    Background: Integration of genomic variation with phenotypic information is an effective approach for uncovering genotype-phenotype associations. This requires an accurate identification of the different types of variation in individual genomes. Results: We report the integration of the whole genome sequence of a single Holstein Friesian bull with data from single nucleotide polymorphism (SNP) and comparative genomic hybridization (CGH) array technologies to determine a comprehensive spectrum of genomic variation. The performance of resequencing SNP detection was assessed by combining SNPs that were identified to be either in identity by descent (IBD) or in copy number variation (CNV) with results from SNP array genotyping. Coding insertions and deletions (indels) were found to be enriched for size in multiples of 3 and were located near the N- and C-termini of proteins. For larger indels, a combination of split-read and read-pair approaches proved to be complementary in finding different signatures. CNVs were identified on the basis of the depth of sequenced reads, and by using SNP and CGH arrays. Conclusions: Our results provide high resolution mapping of diverse classes of genomic variation in an individual bovine genome and demonstrate that structural variation surpasses sequence variation as the main component of genomic variability. Better accuracy of SNP detection was achieved with little loss of sensitivity when algorithms that implemented mapping quality were used. IBD regions were found to be instrumental for calculating resequencing SNP accuracy, while SNP detection within CNVs tended to be less reliable. CNV discovery was affected dramatically by platform resolution and coverage biases. The combined data for this study showed that at a moderate level of sequencing coverage, an ensemble of platforms and tools can be applied together to maximize the accurate detection of sequence and structural variants. PMID:22082336
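    The enrichment of coding indels in multiples of 3 reflects the triplet genetic code: a net length change divisible by 3 adds or removes whole codons, while any other length shifts the downstream reading frame. A minimal sketch of that rule; the function name `frame_effect` is illustrative and not taken from the study:

```python
def frame_effect(net_length_change: int) -> str:
    """Classify a coding indel by its net length change in base pairs.

    Positive values are insertions, negative values deletions. Codons are 3 bp long,
    so only the size of the change modulo 3 determines whether the frame shifts.
    """
    if net_length_change == 0:
        raise ValueError("a net length change of 0 is not an indel")
    return "in-frame" if abs(net_length_change) % 3 == 0 else "frameshift"


for change in (3, -6, -1, 2):
    print(change, frame_effect(change))
```

    Under this rule a 3-bp insertion and a 6-bp deletion are in-frame, while a 1-bp deletion or a 2-bp insertion is a frameshift.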

  6. The Genome Sequencer FLX System--longer reads, more applications, straight forward bioinformatics and more complete data sets.

    PubMed

    Droege, Marcus; Hill, Brendon

    2008-08-31

    The Genome Sequencer FLX System (GS FLX), powered by 454 Sequencing, is a next-generation DNA sequencing technology featuring a unique mix of long reads, exceptional accuracy, and ultra-high throughput. It has been proven to be the most versatile of all currently available next-generation sequencing technologies, supporting many high-profile studies in over seven application categories. GS FLX users have pursued innovative research in de novo sequencing, re-sequencing of whole genomes and target DNA regions, metagenomics, and RNA analysis. 454 Sequencing is a powerful tool for human genetics research: it has recently been used to re-sequence the genome of an individual human, is currently being used to re-sequence the complete human exome and targeted genomic regions using the NimbleGen sequence capture process, and has detected low-frequency somatic mutations linked to cancer.

  7. ReSeqTools: an integrated toolkit for large-scale next-generation sequencing based resequencing analysis.

    PubMed

    He, W; Zhao, S; Liu, X; Dong, S; Lv, J; Liu, D; Wang, J; Meng, Z

    2013-12-04

    Large-scale next-generation sequencing (NGS)-based resequencing detects sequence variations, constructs evolutionary histories, and identifies phenotype-related genotypes. However, NGS-based resequencing studies generate extraordinarily large amounts of data, making computations difficult. Effective use and analysis of these data for NGS-based resequencing studies remains a difficult task for individual researchers. Here, we introduce ReSeqTools, a full-featured toolkit for NGS (Illumina sequencing)-based resequencing analysis, which processes raw data, interprets mapping results, and identifies and annotates sequence variations. ReSeqTools provides a broad set of scalable functions for routine resequencing analysis, organized in modules to facilitate customization of the analysis pipeline. ReSeqTools is designed to use compressed data files as input or output to save storage space, and it facilitates faster and more computationally efficient large-scale resequencing studies in a user-friendly manner. It offers many practical functions and generates useful statistics during the analysis pipeline, which significantly simplifies resequencing analysis. Its integrated algorithms and numerous sub-functions provide a solid foundation for specialized needs in resequencing projects. Users can combine these functions to construct their own pipelines for other purposes.

  8. Development of a High-Throughput Resequencing Array for the Detection of Pathogenic Mutations in Osteogenesis Imperfecta

    PubMed Central

    Wang, Yao; Cui, Yazhou; Zhou, Xiaoyan; Han, Jinxiang

    2015-01-01

    Objective: Osteogenesis imperfecta (OI) is a rare inherited skeletal disease, characterized by bone fragility and low bone density. Mutations in this disorder have been widely reported in various exonic hotspots of the candidate genes, including COL1A1, COL1A2, CRTAP, LEPRE1, and FKBP10, creating a great demand for precise genetic tests. However, large genome sizes make the process daunting and the analyses inefficient and expensive. Therefore, we aimed to develop a fast, accurate, efficient, and cheaper sequencing platform for OI diagnosis, using an advanced array-based technique. Method: A CustomSeq Affymetrix Resequencing Array was established for high-throughput sequencing of five genes simultaneously. Genomic DNA was extracted from 13 OI patients and 85 normal controls and amplified using long-range PCR (LR-PCR), followed by DNA fragmentation and chip hybridization according to standard Affymetrix protocols. Hybridization signals were determined using GeneChip Sequence Analysis Software (GSEQ). To examine feasibility, the outcome of the new resequencing approach was validated by conventional capillary sequencing. Result: Overall call rates using the resequencing array were 96–98%, and the agreement between microarray and capillary sequencing was 99.99%. Pathogenic mutations in 11 of 13 OI patients were successfully detected by the chip analysis without adjustment, and one mutation could also be identified by manual visual inspection. Conclusion: A high-throughput resequencing array was developed that detects the disease-associated mutations in OI, providing a potential tool to facilitate large-scale genetic screening of OI patients. This method also revealed a novel mutation. PMID:25742658

  9. Genome resequencing and comparative variome analysis in a Brassica rapa and Brassica oleracea collection.

    PubMed

    Cheng, Feng; Wu, Jian; Cai, Chengcheng; Fu, Lixia; Liang, Jianli; Borm, Theo; Zhuang, Mu; Zhang, Yangyong; Zhang, Fenglan; Bonnema, Guusje; Wang, Xiaowu

    2016-12-20

    The closely related species Brassica rapa and B. oleracea encompass a wide range of vegetable, fodder and oil crops. The release of their reference genomes has facilitated resequencing collections of B. rapa and B. oleracea aiming to build their variome datasets. These data can be used to investigate the evolutionary relationships between and within the different species and the domestication of the crops, hereafter named morphotypes. These data can also be used in genetic studies aiming at the identification of genes that influence agronomic traits. We selected and resequenced 199 B. rapa and 119 B. oleracea accessions representing 12 and nine morphotypes, respectively. Based on these resequencing data, we obtained 2,249,473 and 3,852,169 high quality SNPs (single-nucleotide polymorphisms), as well as 303,617 and 417,004 InDels for the B. rapa and B. oleracea populations, respectively. The variome datasets of B. rapa and B. oleracea represent valuable resources to researchers working on evolution, domestication or breeding of Brassica vegetable crops.

  10. Multiplexed Elimination of Wild-Type DNA and High-Resolution Melting Prior to Targeted Resequencing of Liquid Biopsies.

    PubMed

    Ladas, Ioannis; Fitarelli-Kiehl, Mariana; Song, Chen; Adalsteinsson, Viktor A; Parsons, Heather A; Lin, Nancy U; Wagle, Nikhil; Makrigiorgos, G Mike

    2017-10-01

    The use of clinical samples and circulating cell-free DNA (cfDNA) collected from liquid biopsies for diagnostic and prognostic applications in cancer is burgeoning, and improved methods that reduce the influence of the excess wild-type (WT) portion of the sample are desirable. Here we present enrichment of mutation-containing sequences using enzymatic degradation of WT DNA. Mutation enrichment is combined with high-resolution melting (HRM) performed in multiplexed closed-tube reactions as a rapid, cost-effective screening tool before targeted resequencing. We developed a homogeneous, closed-tube approach that uses a double-stranded DNA-specific nuclease to degrade WT DNA at multiple targets simultaneously. The No Denaturation Nuclease-assisted Minor Allele Enrichment with Probe Overlap (ND-NaME-PrO) uses WT oligonucleotides overlapping both strands on putative DNA targets. Under conditions of partial denaturation (DNA breathing), the oligonucleotide probes enhance double-stranded DNA-specific nuclease digestion at the selected targets, with high preference toward WT over mutant DNA. To validate ND-NaME-PrO, we used multiplexed HRM, digital PCR, and MiSeq targeted resequencing of mutated genomic DNA and cfDNA. Serial dilution of KRAS mutation-containing DNA shows mutation enrichment by 10- to 120-fold and detection of allelic fractions down to 0.01%. Multiplexed ND-NaME-PrO combined with multiplexed PCR-HRM showed mutation scanning of 10-20 DNA amplicons simultaneously. ND-NaME-PrO applied on cfDNA from clinical samples enables mutation enrichment and HRM scanning over 10 DNA targets. cfDNA mutations were enriched up to approximately 100-fold (average approximately 25-fold) and identified via targeted resequencing. Closed-tube homogeneous ND-NaME-PrO combined with multiplexed HRM is a convenient approach to efficiently enrich for mutations on multiple DNA targets and to enable prescreening before targeted resequencing.
© 2017 American Association for Clinical Chemistry.

  11. Characterization of 25 full-length S-RNase alleles, including flanking regions, from a pool of resequenced apple cultivars.

    PubMed

    De Franceschi, Paolo; Bianco, Luca; Cestaro, Alessandro; Dondini, Luca; Velasco, Riccardo

    2018-06-01

    Data obtained from Illumina resequencing of 63 apple cultivars were used to obtain full-length S-RNase sequences using a strategy based on both alignment and de novo assembly of reads. The reproductive biology of apple is regulated by the S-RNase-based gametophytic self-incompatibility system, which is genetically controlled by the single, multi-genic and multi-allelic S locus. Resequencing of apple cultivars has provided a huge amount of genetic data that can be aligned to the reference genome to characterize variation at a genome-wide level. However, this approach is not immediately applicable to the S locus, due to peculiar features such as its high degree of polymorphism, lack of colinearity between haplotypes and extensive presence of repetitive elements. In this study we describe a dedicated procedure for characterizing S-RNase alleles from resequenced cultivars. The S-genotype of 63 apple accessions is reported, and the full-length coding sequence was determined for the 25 S-RNase alleles present in the 63 resequenced cultivars; these included 10 previously incomplete sequences (S5, S6a, S6b, S8, S11, S23, S39, S46, S50 and S58). Moreover, sequence divergence clearly suggests that alleles S6a and S6b, proposed to be neutral variants of the same allele, should instead be considered different specificities. The promoter sequences have also been analyzed, highlighting regions of homology conserved among all the alleles.

  12. Quantification of differential gene expression by multiplexed targeted resequencing of cDNA

    PubMed Central

    Arts, Peer; van der Raadt, Jori; van Gestel, Sebastianus H.C.; Steehouwer, Marloes; Shendure, Jay; Hoischen, Alexander; Albers, Cornelis A.

    2017-01-01

    Whole-transcriptome or RNA sequencing (RNA-Seq) is a powerful and versatile tool for functional analysis of different types of RNA molecules, but sample reagent and sequencing cost can be prohibitive for hypothesis-driven studies where the aim is to quantify differential expression of a limited number of genes. Here we present an approach for quantification of differential mRNA expression by targeted resequencing of complementary DNA using single-molecule molecular inversion probes (cDNA-smMIPs) that enable highly multiplexed resequencing of cDNA target regions of ∼100 nucleotides and counting of individual molecules. We show that accurate estimates of differential expression can be obtained from molecule counts for hundreds of smMIPs per reaction and that smMIPs are also suitable for quantification of relative gene expression and allele-specific expression. Compared with low-coverage RNA-Seq and a hybridization-based targeted RNA-Seq method, cDNA-smMIPs are a cost-effective high-throughput tool for hypothesis-driven expression analysis in large numbers of genes (10 to 500) and samples (hundreds to thousands). PMID:28474677
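    Counting individual molecules, as described in this abstract, is commonly done by collapsing reads that share both a target and a molecular tag. The sketch below assumes a hypothetical input of `(target, tag)` pairs per sample and a simple normalization by each sample's total molecule count; the function names and example reads are illustrative, and this is not the cDNA-smMIPs analysis pipeline itself.

```python
import math
from collections import defaultdict


def count_molecules(reads):
    """Collapse reads into molecule counts per target.

    `reads` is an iterable of (target, tag) pairs; reads sharing both values are
    treated as copies of one captured molecule and counted once.
    """
    tags_per_target = defaultdict(set)
    for target, tag in reads:
        tags_per_target[target].add(tag)
    return {target: len(tags) for target, tags in tags_per_target.items()}


def log2_fold_change(counts_a, counts_b, target, pseudocount=0.5):
    """Log2 ratio of a target's share of all molecules in sample A versus sample B.

    The pseudocount keeps the ratio finite when a target has no molecules in a sample.
    """
    share_a = (counts_a.get(target, 0) + pseudocount) / sum(counts_a.values())
    share_b = (counts_b.get(target, 0) + pseudocount) / sum(counts_b.values())
    return math.log2(share_a / share_b)


# Hypothetical reads: (target region, molecular tag).
sample_a = [("GENE1", "AAT"), ("GENE1", "AAT"), ("GENE1", "CGT"), ("GENE2", "TTA")]
sample_b = [("GENE1", "GGC"), ("GENE2", "ACA"), ("GENE2", "ACT")]

counts_a = count_molecules(sample_a)  # {"GENE1": 2, "GENE2": 1}
counts_b = count_molecules(sample_b)  # {"GENE1": 1, "GENE2": 2}
print(log2_fold_change(counts_a, counts_b, "GENE1", pseudocount=0))  # prints 1.0
```

    The duplicated ("GENE1", "AAT") read counts once, so GENE1 makes up two thirds of the molecules in sample A and one third in sample B, a two-fold difference.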

  13. BreaKmer: detection of structural variation in targeted massively parallel sequencing data using kmers.

    PubMed

    Abo, Ryan P; Ducar, Matthew; Garcia, Elizabeth P; Thorner, Aaron R; Rojas-Rudilla, Vanesa; Lin, Ling; Sholl, Lynette M; Hahn, William C; Meyerson, Matthew; Lindeman, Neal I; Van Hummelen, Paul; MacConaill, Laura E

    2015-02-18

    Genomic structural variation (SV), a common hallmark of cancer, has important predictive and therapeutic implications. However, accurately detecting SV using high-throughput sequencing data remains challenging, especially for 'targeted' resequencing efforts. This is critically important in the clinical setting, where targeted resequencing is frequently applied to assess clinically actionable mutations in tumor biopsies rapidly and cost-effectively. We present BreaKmer, a novel approach that uses a 'kmer' strategy to assemble misaligned sequence reads for predicting insertions, deletions, inversions, tandem duplications and translocations at base-pair resolution in targeted resequencing data. Variants are predicted by realigning an assembled consensus sequence created from sequence reads that were abnormally aligned to the reference genome. Using targeted resequencing data from tumor specimens with orthogonally validated SV, non-tumor samples and whole-genome sequencing data, BreaKmer had a 97.4% overall sensitivity for known events and predicted 17 positively validated, novel variants. Relative to four publicly available algorithms, BreaKmer detected SV with increased sensitivity and limited calls in non-tumor samples, key features for variant analysis of tumor specimens in both the clinical and research settings. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
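    The general idea of flagging reads that carry k-mers absent from the reference can be sketched in a few lines. This is a generic illustration with hypothetical sequences and function names, not BreaKmer's implementation, which additionally assembles the flagged reads into a consensus sequence and realigns it to call the variant.

```python
def kmers(seq: str, k: int) -> set:
    """All overlapping substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def reads_with_novel_kmers(reads: dict, reference: str, k: int) -> dict:
    """Map each read name to the k-mers it contains that never occur in the reference.

    Reads whose k-mers all occur in the reference are omitted; the remaining reads are
    candidates for local assembly around a possible structural variant. Only the forward
    strand is considered, and sequencing errors will also produce novel k-mers.
    """
    ref_kmers = kmers(reference, k)
    hits = {}
    for name, seq in reads.items():
        novel = kmers(seq, k) - ref_kmers
        if novel:
            hits[name] = novel
    return hits


# Hypothetical example: read r2 spans the junction of a 3-bp deletion of "GGG".
reference = "AAACCCGGGTTT"
reads = {"r1": "ACCCGG", "r2": "CCCTTT"}
print(reads_with_novel_kmers(reads, reference, k=4))
```

    Only r2 is reported, because its 4-mers CCCT, CCTT and CTTT exist only across the deletion junction.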

  14. Genetic variability of the equine casein genes.

    PubMed

    Brinkmann, J; Jagannathan, V; Drögemüller, C; Rieder, S; Leeb, T; Thaller, G; Tetens, J

    2016-07-01

    The casein genes are known to be highly variable in typical dairy species, such as cattle and goat, but the knowledge about equine casein genes is limited. Nevertheless, mare milk production and consumption is gaining importance because of its high nutritive value, use in naturopathy, and hypoallergenic properties with respect to cow milk protein allergies. In the current study, the open reading frames of the 4 casein genes CSN1S1 (αS1-casein), CSN2 (β-casein), CSN1S2 (αS2-casein), and CSN3 (κ-casein) were resequenced in 253 horses of 14 breeds. The analysis revealed 21 nonsynonymous nucleotide exchanges, as well as 11 synonymous nucleotide exchanges, leading to a total of 31 putative protein isoforms predicted at the DNA level, 26 of which are considered novel. Although the majority of the alleles need to be confirmed at the transcript and protein level, a preliminary nomenclature was established for the equine casein alleles. Copyright © 2016 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.

  15. Single-Molecule Electrical Random Resequencing of DNA and RNA

    NASA Astrophysics Data System (ADS)

    Ohshiro, Takahito; Matsubara, Kazuki; Tsutsui, Makusu; Furuhashi, Masayuki; Taniguchi, Masateru; Kawai, Tomoji

    2012-07-01

    Two paradigm shifts in DNA sequencing technologies, from bulk to single molecules and from optical to electrical detection, are expected to realize label-free, low-cost DNA sequencing that does not require PCR amplification. These shifts are expected to lead to the development of high-throughput third-generation sequencing technologies for personalized medicine. Although nanopore devices have been proposed as third-generation DNA-sequencing devices, a significant milestone in these technologies has been attained by demonstrating a novel technique for resequencing DNA using electrical signals. Here we report single-molecule electrical resequencing of DNA and RNA using a hybrid method of identifying single-base molecules via tunneling currents and random sequencing. Our method reads sequences of nine types of DNA oligomers. The complete sequence of 5'-UGAGGUA-3' from the let-7 microRNA family was also identified by creating a composite of overlapping fragment sequences, each determined at random from the tunneling current conducted by single-base molecules as they passed between a pair of nanoelectrodes.
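    Building a composite sequence from overlapping fragments can be illustrated with a generic greedy overlap merge. Greedy merging is a swapped-in textbook technique, not the authors' procedure, and it can mis-assemble repetitive sequences; the fragments in the usage line are hypothetical pieces of the let-7 sequence quoted above, and `min_len` is an assumed minimum overlap.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b, or 0 if shorter than min_len."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_merge(fragments, min_len=2):
    """Repeatedly merge the pair of fragments with the longest overlap."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    # Fragments contained in another fragment add no information.
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]
    while len(frags) > 1:
        best_n, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # no overlaps left: return the separate contigs
        merged = frags[best_i] + frags[best_j][best_n:]
        frags = [f for k, f in enumerate(frags) if k not in (best_i, best_j)] + [merged]
    return frags


# Hypothetical fragments of 5'-UGAGGUA-3'.
print(greedy_merge(["UGAG", "GAGG", "GGUA"]))  # prints ['UGAGGUA']
```

    The first merge joins UGAG and GAGG on their 3-base overlap GAG; the second joins the result to GGUA on GG.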

  16. A targeted resequencing gene panel for focal epilepsy.

    PubMed

    Hildebrand, Michael S; Myers, Candace T; Carvill, Gemma L; Regan, Brigid M; Damiano, John A; Mullen, Saul A; Newton, Mark R; Nair, Umesh; Gazina, Elena V; Milligan, Carol J; Reid, Christopher A; Petrou, Steven; Scheffer, Ingrid E; Berkovic, Samuel F; Mefford, Heather C

    2016-04-26

    We report development of a targeted resequencing gene panel for focal epilepsy, the most prevalent phenotypic group of the epilepsies. The targeted resequencing gene panel was designed using molecular inversion probe (MIP) capture technology and sequenced using massively parallel Illumina sequencing. We demonstrated proof of principle that mutations can be detected in 4 previously genotyped focal epilepsy cases. We searched for both germline and somatic mutations in 251 patients with unsolved sporadic or familial focal epilepsy and identified 11 novel or very rare missense variants in 5 different genes: CHRNA4, GRIN2B, KCNT1, PCDH19, and SCN1A. Of these, 2 were predicted to be pathogenic or likely pathogenic, explaining ∼0.8% of the cohort, and 8 were of uncertain significance based on available data. We have developed and validated a targeted resequencing panel for focal epilepsies, the most important clinical class of epilepsies, accounting for about 60% of all cases. Our application of MIP technology is an innovative approach that will be advantageous in the clinical setting because it is highly sensitive, efficient, and cost-effective for screening large patient cohorts. Our findings indicate that mutations in known genes likely explain only a small proportion of focal epilepsy cases. This is not surprising given the established clinical and genetic heterogeneity of these disorders and underscores the importance of further gene discovery studies in this complex syndrome. © 2016 American Academy of Neurology.

  17. Quantitation of heteroplasmy of mtDNA sequence variants identified in a population of AD patients and controls by array-based resequencing.

    PubMed

    Coon, Keith D; Valla, Jon; Szelinger, Szabolics; Schneider, Lonnie E; Niedzielko, Tracy L; Brown, Kevin M; Pearson, John V; Halperin, Rebecca; Dunckley, Travis; Papassotiropoulos, Andreas; Caselli, Richard J; Reiman, Eric M; Stephan, Dietrich A

    2006-08-01

    The role of mitochondrial dysfunction in the pathogenesis of Alzheimer's disease (AD) has been well documented. Although evidence for the role of mitochondria in AD seems incontrovertible, the impact of mitochondrial DNA (mtDNA) mutations on AD etiology remains controversial. Mutations in mitochondrially encoded genes have repeatedly been implicated in the pathogenesis of AD, but many of these studies have been plagued by a lack of replication as well as potential contamination by nuclear-encoded mitochondrial pseudogenes. To assess the role of mtDNA mutations in the pathogenesis of AD, while avoiding the pitfalls of nuclear-encoded mitochondrial pseudogenes encountered in previous investigations and showcasing the benefits of a novel resequencing technology, we sequenced the entire coding region (15,452 bp) of mtDNA from 19 extremely well-characterized AD patients and 18 age-matched, unaffected controls utilizing a new, reliable, high-throughput array-based resequencing technique, the Human MitoChip. High-throughput, array-based DNA resequencing of the entire mtDNA coding region from platelets of 37 subjects revealed the presence of 208 loci displaying a total of 917 sequence variants. There were no statistically significant differences in overall mutational burden between cases and controls; however, 265 independent sites of statistically significant change between cases and controls were identified. Changed sites were found in genes associated with complexes I (30.2%), III (3.0%), IV (33.2%), and V (9.1%) as well as tRNA (10.6%) and rRNA (14.0%). Despite their statistical significance, the subtle nature of the observed changes makes it difficult to determine whether they represent true functional variants involved in AD etiology or merely naturally occurring dissimilarity.
Regardless, this study demonstrates the tremendous value of this novel mtDNA resequencing platform, which avoids the pitfalls of erroneously amplifying nuclear-encoded mtDNA pseudogenes, and our proposed analysis paradigm, which utilizes the availability of raw signal intensity values for each of the four potential alleles to facilitate quantitative estimates of mtDNA heteroplasmy. This information provides a potential new target for burgeoning diagnostics and therapeutics that could truly assist those suffering from this devastating disorder.
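    As a rough illustration of intensity-based quantification, a naive heteroplasmy estimate at one position is each base's share of the total signal across the four probes. This sketch assumes background-corrected, comparable intensities and uses hypothetical values and a hypothetical function name; it is not the analysis paradigm proposed in the study.

```python
def allele_fractions(intensities: dict) -> dict:
    """Each base's share of the total probe signal at one mtDNA position.

    Assumes the four intensities are already background-corrected and comparable
    across probes; real arrays need calibration before such ratios are meaningful.
    """
    total = sum(intensities.values())
    if total <= 0:
        raise ValueError("total intensity must be positive")
    return {base: value / total for base, value in intensities.items()}


# Hypothetical intensities for the A, C, G and T probes at a single position.
fractions = allele_fractions({"A": 800.0, "C": 20.0, "G": 150.0, "T": 30.0})
major = max(fractions, key=fractions.get)
print(major, fractions)  # naive reading: about 15% of the signal supports G
```

    Reading the minor-allele share directly from raw signal is the simplest possible estimator; cross-hybridization between probes would bias it upward.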

  18. A Novel Truncating LMNA Mutation in Patients with Cardiac Conduction Disorders and Dilated Cardiomyopathy.

    PubMed

    Kawakami, Hiroshi; Ogimoto, Akiyoshi; Tokunaga, Naohito; Nishimura, Kazuhisa; Kawakami, Hideo; Higashi, Haruhiko; Iio, Chiharuko; Kono, Tamami; Aono, Jun; Uetani, Teruyoshi; Nagai, Takayuki; Inoue, Katsuji; Suzuki, Jun; Ikeda, Shuntaro; Okura, Takafumi; Ohyagi, Yasumasa; Tabara, Yasuharu; Higaki, Jitsuo

    2018-05-30

    The cardiac phenotype of laminopathies is characterized by cardiac conduction disorders (CCDs) and dilated cardiomyopathy (DCM). Although laminopathies have been considered monogenic, they exhibit a remarkable degree of clinical variability. This case series aimed to detect the causal mutation and to investigate the causes of clinical variability in a Japanese family with inherited CCD and DCM. Of the five family members investigated, four had either CCD/DCM or CCD alone, while one subject had no cardiovascular disease and acted as a normal control. We performed targeted resequencing of 174 inherited cardiovascular disease-associated genes in this family, and pathological mutations were confirmed using Sanger sequencing. The degree of clinical severity and variability was also evaluated using long-term medical records. We discovered a novel heterozygous truncating lamin A/C (LMNA) mutation (c.774delG) in all four subjects with CCD. Because this mutation was predicted to cause a frameshift and premature termination (p.Gln258HisfsTer222) in LMNA, we believe that this LMNA mutation was the causal mutation in this family with CCD and laminopathies. In addition, gender-specific intrafamilial clinical variability was observed in this Japanese family, where affected males exhibited an earlier onset of CCD and more severe DCM compared to affected females. Using targeted resequencing, we discovered a novel truncating LMNA mutation associated with CCD and DCM in this family, characterized by gender differences in clinical severity in LMNA carriers. Our results suggest that in patients with laminopathy, clinical severity may be the result of multiple factors.

  19. Assessment of a targeted resequencing assay as a support tool in the diagnosis of lysosomal storage disorders

    PubMed Central

    2014-01-01

    Background: With over 50 different disorders and a combined incidence of up to 1/3000 births, lysosomal storage diseases (LSDs) constitute a major public health problem and place an enormous burden on affected individuals and their families. Many factors make LSD diagnosis difficult, including phenotype and penetrance variability, shared signs and symptoms, and problems inherent to biochemical diagnosis. Developing a powerful diagnostic tool could shorten the protracted diagnostic process for these families, lead to better outcomes for current and proposed therapies, and provide the basis for more appropriate genetic counseling. Methods: We have designed a targeted resequencing assay for the simultaneous testing of 57 lysosomal genes, using in-solution capture as the enrichment method and two different sequencing platforms. A total of 84 patients with a high, moderate, or low index of suspicion for LSD were enrolled at different centers in Spain and Portugal, including 18 positive controls. Results: We correctly diagnosed 18 positive blinded controls, provided a genetic diagnosis to 25 potential LSD patients, and ended 18 diagnostic odysseys. Conclusion: We report the assessment of a next-generation-sequencing-based approach as an accessory tool in the diagnosis of LSDs, a group of disorders with overlapping clinical profiles and genetic heterogeneity. We have also identified and quantified the strengths and limitations of next generation sequencing (NGS) technology applied to diagnosis. PMID:24767253

  20. Re-sequencing regions of the ovine Y chromosome in domestic and wild sheep reveals novel paternal haplotypes.

    PubMed

    Meadows, J R S; Kijas, J W

    2009-02-01

    The male-specific region of the ovine Y chromosome (MSY) remains poorly characterized, yet sequence variants from this region have the potential to reveal the wild progenitor of domestic sheep or examples of domestic and wild paternal introgression. The 5' promoter region of the sex-determining gene SRY was re-sequenced using a subset of wild sheep including bighorn (Ovis canadensis), thinhorn (Ovis dalli spp.), urial (Ovis vignei), argali (Ovis ammon), mouflon (Ovis musimon) and domestic sheep (Ovis aries). Seven novel SNPs (oY2-oY8) were revealed; these were polymorphic between but not within species. Re-sequencing and fragment analysis were applied to the MSY microsatellite SRYM18. It contains a complex compound repeat structure, and sequencing of three novel size fragments revealed that a pentanucleotide element remained fixed, whilst a dinucleotide element displayed variability within species. Comparison of the sequence between species revealed that urial and argali sheep grouped more closely with the mouflon and domestic breeds than with the pachyceriforms (bighorn and thinhorn). SNP and microsatellite data were combined to define six previously undetected haplotypes. Analysis revealed the mouflon as the only species to share a haplotype with domestic sheep, consistent with its status as a feral domesticate that has undergone male-mediated exchange with domestic animals. A comparison of the remaining wild species and domestic sheep revealed that O. aries is free from signatures of wild sheep introgression.

  1. Targeted resequencing reveals ALK fusions in non-small cell lung carcinomas detected by FISH, immunohistochemistry, and real-time RT-PCR: a comparison of four methods.

    PubMed

    Tuononen, Katja; Sarhadi, Virinder Kaur; Wirtanen, Aino; Rönty, Mikko; Salmenkivi, Kaisa; Knuuttila, Aija; Remes, Satu; Telaranta-Keerie, Aino I; Bloor, Stuart; Ellonen, Pekka; Knuutila, Sakari

    2013-01-01

    Anaplastic lymphoma receptor tyrosine kinase (ALK) gene rearrangements occur in a subgroup of non-small cell lung carcinomas (NSCLCs). The identification of these rearrangements is important for guiding treatment decisions. The aim of our study was to screen for ALK gene fusions in NSCLCs and to compare the results detected by targeted resequencing with results detected by commonly used methods, including fluorescence in situ hybridization (FISH), immunohistochemistry (IHC), and real-time reverse transcription-PCR (RT-PCR). Furthermore, we aimed to ascertain the potential of targeted resequencing in detection of ALK-rearranged lung carcinomas. We assessed ALK fusion status for 95 formalin-fixed paraffin-embedded tumor tissue specimens from 87 patients with NSCLC by FISH and real-time RT-PCR, for 57 specimens from 56 patients by targeted resequencing, and for 14 specimens from 14 patients by IHC. All methods were performed successfully on formalin-fixed paraffin-embedded tumor tissue material. We detected ALK fusions in 5.7% (5 out of 87) of patients examined. The results obtained from resequencing correlated significantly with those from FISH, real-time RT-PCR, and IHC. Targeted resequencing proved to be a promising method for ALK gene fusion detection in NSCLC. Means to reduce the material and turnaround time required for analysis are, however, needed.

  2. Contribution of CYP2B6 alleles in explaining extreme (S)-methadone plasma levels: a CYP2B6 gene resequencing study.

    PubMed

    Dobrinas, Maria; Crettol, Séverine; Oneda, Beatrice; Lahyani, Rachel; Rotger, Margalida; Choong, Eva; Lubomirov, Rubin; Csajka, Chantal; Eap, Chin B

    2013-02-01

    (S)-Methadone, metabolized mainly by CYP2B6, shows a wide interindividual variability in its pharmacokinetics and pharmacodynamics. Resequencing of the CYP2B6 gene was performed in 12 and 35 selected individuals with high (S)-methadone plasma exposure and low (S)-methadone plasma exposure, respectively, from a previously described cohort of 276 patients undergoing methadone maintenance treatment. Selected genetic polymorphisms were then analyzed in the complete cohort. The rs35303484 (*11; c.136A>G; M46V) polymorphism was overrepresented in the high (S)-methadone level group, whereas the rs3745274 (*9; c.516G>T; Q172H), rs2279344 (c.822+183G>A), and rs8192719 (c.1294+53C>T) polymorphisms were underrepresented in the low (S)-methadone level group, suggesting an association with decreased CYP2B6 activity. Conversely, the rs3211371 (*5; c.1459C>T; R487C) polymorphism was overrepresented in the low-level group, indicating increased CYP2B6 activity. A higher allele frequency was found in the high-level group compared with the low-level group for rs3745274 (*9; c.516G>T; Q172H), rs2279343 (*4; c.785A>G; K262R) (together representing CYP2B6*6), rs8192719 (c.1294+53C>T), and rs2279344 (c.822+183G>A), suggesting their involvement in decreased CYP2B6 activity. These results should be replicated in larger independent cohorts. Known genetic polymorphisms in CYP2B6 contribute toward explaining extreme (S)-methadone plasma levels observed in a cohort of patients following methadone maintenance treatment.
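    The allele-frequency comparison between the high- and low-exposure groups reduces to counting alleles from genotypes. The sketch below does this for one biallelic SNP; the genotype counts are hypothetical and sized only to match the 12 and 35 individuals mentioned above, not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class GenotypeCounts:
    """Genotype counts for one biallelic SNP in one group of diploid individuals."""
    hom_ref: int
    het: int
    hom_alt: int

    @property
    def n_individuals(self) -> int:
        return self.hom_ref + self.het + self.hom_alt

    def alt_allele_frequency(self) -> float:
        # Each heterozygote carries one alternate allele, each homozygote two.
        alt_alleles = self.het + 2 * self.hom_alt
        return alt_alleles / (2 * self.n_individuals)


# Hypothetical counts for illustration only.
high_exposure = GenotypeCounts(hom_ref=4, het=6, hom_alt=2)   # 12 people
low_exposure = GenotypeCounts(hom_ref=25, het=9, hom_alt=1)   # 35 people

print(f"high: {high_exposure.alt_allele_frequency():.3f}")  # high: 0.417
print(f"low:  {low_exposure.alt_allele_frequency():.3f}")   # low:  0.157
```

    Whether a difference like this is meaningful still requires a significance test and, as the abstract notes, replication in larger independent cohorts.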

  3. Resequencing of the common marmoset genome improves genome assemblies and gene-coding sequence analysis.

    PubMed

    Sato, Kengo; Kuroki, Yoko; Kumita, Wakako; Fujiyama, Asao; Toyoda, Atsushi; Kawai, Jun; Iriki, Atsushi; Sasaki, Erika; Okano, Hideyuki; Sakakibara, Yasubumi

    2015-11-20

    The first draft of the common marmoset (Callithrix jacchus) genome was published by the Marmoset Genome Sequencing and Analysis Consortium (MGSAC). The draft was based on whole-genome shotgun sequencing, and the current assembly version is Callithrix_jacchus-3.2.1, but 187,214 undetermined gap regions remain, along with supercontigs and relatively short contigs that are unmapped to chromosomes in the draft genome. We performed resequencing and assembly of the common marmoset genome by deep sequencing with high-throughput sequencing technology. Several sequencing runs on Illumina platforms were executed, and 181 Gbp of high-quality bases, including mate pairs with long insert lengths of 3, 8, 20, and 40 kbp, were obtained, corresponding to approximately 60× coverage. The resequencing significantly improved the MGSAC draft genome sequence. The contig N50, a statistical measure used to evaluate assembly quality, doubled. As a result, 51% of the contigs (total length: 299 Mbp) that were unmapped to chromosomes in the MGSAC draft were merged with chromosomal contigs, the improved genome sequence helped to detect 5,288 new genes homologous to human cDNAs, and the gaps in 5,187 transcripts of the Ensembl gene annotations were completely filled.
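    Two numbers in this abstract follow from simple formulas: fold coverage is total sequenced bases divided by genome size, and N50 is the length of the shortest contig among the longest contigs that together contain at least half of the assembled bases. The sketch below computes both; the genome size of about 3 Gbp is an assumed round figure, and the contig lengths are hypothetical.

```python
def n50(contig_lengths: list[int]) -> int:
    """Largest length L such that contigs of length >= L hold at least half the bases."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    raise ValueError("contig_lengths must not be empty")


def fold_coverage(total_bases: float, genome_size: float) -> float:
    """Average sequencing depth: bases sequenced per base of genome."""
    return total_bases / genome_size


print(n50([100, 80, 50, 40, 30]))          # 80
print(round(fold_coverage(181e9, 3.0e9)))  # 60 (assumed ~3 Gbp genome)
```

    In the toy example the two longest contigs hold 180 of 300 bases, so the N50 is 80; doubling the N50, as reported here, means half of the assembly now sits in contigs at least twice as long as before.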

  4. Multiplex amplification of large sets of human exons.

    PubMed

    Porreca, Gregory J; Zhang, Kun; Li, Jin Billy; Xie, Bin; Austin, Derek; Vassallo, Sara L; LeProust, Emily M; Peck, Bill J; Emig, Christopher J; Dahl, Fredrik; Gao, Yuan; Church, George M; Shendure, Jay

    2007-11-01

    A new generation of technologies is poised to reduce DNA sequencing costs by several orders of magnitude. But our ability to fully leverage the power of these technologies is crippled by the absence of suitable 'front-end' methods for isolating complex subsets of a mammalian genome at a scale that matches the throughput at which these platforms will routinely operate. We show that targeting oligonucleotides released from programmable microarrays can be used to capture and amplify approximately 10,000 human exons in a single multiplex reaction. Additionally, we show integration of this protocol with ultra-high-throughput sequencing for targeted variation discovery. Although the multiplex capture reaction is highly specific, we found that nonuniform capture is a key issue that will need to be resolved by additional optimization. We anticipate that highly multiplexed methods for targeted amplification will enable the comprehensive resequencing of human exons at a fraction of the cost of whole-genome resequencing.

  5. The Peach v2.0 release: high-resolution linkage mapping and deep resequencing improve chromosome-scale assembly and contiguity.

    PubMed

    Verde, Ignazio; Jenkins, Jerry; Dondini, Luca; Micali, Sabrina; Pagliarani, Giulia; Vendramin, Elisa; Paris, Roberta; Aramini, Valeria; Gazza, Laura; Rossini, Laura; Bassi, Daniele; Troggio, Michela; Shu, Shengqiang; Grimwood, Jane; Tartarini, Stefano; Dettori, Maria Teresa; Schmutz, Jeremy

    2017-03-11

    The availability of the peach genome sequence has fostered relevant research in peach and related Prunus species, enabling the identification of genes underlying important horticultural traits as well as the development of advanced tools for genetic and genomic analyses. The first release of the peach genome (Peach v1.0) represented a high-quality WGS (Whole Genome Shotgun) chromosome-scale assembly with high contiguity (contig L50 214.2 kb), large portions of mapped sequences (96%) and high base accuracy (99.96%). The aim of this work was to improve the quality of the first assembly by increasing the portion of mapped and oriented sequences, correcting misassemblies, and improving the contiguity and base accuracy using high-throughput linkage mapping and deep resequencing approaches. Four linkage maps with 3,576 molecular markers were used to improve the portion of mapped and oriented sequences (from 96.0% and 85.6% of Peach v1.0 to 99.2% and 98.2% of v2.0, respectively) and enabled a more detailed identification of discernible misassemblies (10.4 Mb in total). The deep resequencing approach corrected 859 homozygous SNPs (Single Nucleotide Polymorphisms) and 1347 homozygous indels. Moreover, the assembled NGS contigs enabled the closing of 212 gaps, with an improvement in the contig L50 of 19.2%. The improved high-quality peach genome assembly (Peach v2.0) represents a valuable tool for analyzing genetic diversity and domestication, and a vehicle for genetic improvement of peach and related Prunus species. Moreover, the important phylogenetic position of peach and the absence of recent whole genome duplication (WGD) events make peach a pivotal species for comparative genomics studies aiming at elucidating plant speciation and diversification processes.

  6. Population Genetic Analysis of the Uncoupling Proteins Supports a Role for UCP3 in Human Cold Resistance

    PubMed Central

    Hancock, Angela M.; Clark, Vanessa J.; Qian, Yudong; Di Rienzo, Anna

    2011-01-01

    Production of heat via nonshivering thermogenesis (NST) is critical for temperature homeostasis in mammals. Uncoupling protein UCP1 plays a central role in NST by uncoupling the proton gradients produced in the inner membranes of mitochondria to produce heat; however, the extent to which UCP1 homologues, UCP2 and UCP3, are involved in NST is the subject of an ongoing debate. We used an evolutionary approach to test the hypotheses that variants that are associated with increased expression of these genes (UCP1 −3826A, UCP2 −866A, and UCP3 −55T) show evidence of adaptation with winter climate. To that end, we calculated correlations between allele frequencies and winter climate variables for these single-nucleotide polymorphisms (SNPs), which we genotyped in a panel of 52 worldwide populations. We found significant correlations with winter climate for UCP1 −3826G/A and UCP3 −55C/T. Further, by analyzing previously published genotype data for these SNPs, we found that the peak of the correlation for the UCP1 region occurred at the disease-associated −3826A/G variant and that the UCP3 region has a striking signal overall, with several individual SNPs showing interesting patterns, including the −55C/T variant. Resequencing of the regions in a set of three diverse population samples helped to clarify the signals that we found with the genotype data. At UCP1, the resequencing data revealed modest evidence that the haplotype carrying the −3826A variant was driven to high frequency by selection. In the UCP3 region, combining results from the climate analysis and resequencing survey suggests a more complex model in which variants on multiple haplotypes may independently be correlated with temperature. This is further supported by an excess of intermediate frequency variants in the UCP3 region in the Han Chinese population.
Taken together, our results suggest that adaptation to climate influenced the global distribution of allele frequencies in UCP1 and UCP3 and provide an independent source of evidence for a role in cold resistance for UCP3. PMID:20802238
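    The climate analysis rests on correlating a SNP's allele frequency across populations with a climate variable. As a minimal sketch, the code below uses a Spearman rank correlation (Pearson correlation of average ranks); whether this matches the authors' exact statistic is an assumption, and the six populations' values are hypothetical.

```python
def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x: list[float], y: list[float]) -> float:
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical allele frequencies and mean winter temperatures (deg C).
allele_freq = [0.10, 0.20, 0.35, 0.50, 0.60, 0.80]
winter_temp = [25.0, 18.0, 10.0, 2.0, -5.0, -15.0]
print(spearman(allele_freq, winter_temp))  # -1.0
```

    A perfectly monotone toy trend gives -1; a real analysis would also need a significance test and some control for shared population history.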

  7. Portability of tag SNPs across isolated population groups: an example from India.

    PubMed

    Sarkar Roy, N; Farheen, S; Roy, N; Sengupta, S; Majumder, P P

    2008-01-01

    Isolated population groups are useful in conducting association studies of complex diseases to avoid various pitfalls, including those arising from population stratification. Since DNA resequencing is expensive, it is recommended that genotyping be carried out at tagSNP (tSNP) loci. For this, tSNPs identified in one isolated population need to be used in another. Unless tSNPs are highly portable across populations this strategy may result in loss of information in association studies. We examined the issue of tSNP portability by sampling individuals from 10 isolated ethnic groups from India. We generated DNA resequencing data pertaining to 3 genomic regions and identified tSNPs in each population. We defined an index of tSNP portability and showed that portability is low across isolated Indian ethnic groups. The extent of portability did not significantly correlate with genetic similarity among the populations studied here. We also analyzed our data with sequence data from individuals of African and European descent. Our results indicated that it may be necessary to carry out resequencing in a small number of individuals to discover SNPs and identify tSNPs in the specific isolated population in which a disease association study is to be conducted.
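    Tag SNPs are usually chosen so that each untyped SNP is in strong linkage disequilibrium with a genotyped one, commonly measured as r². The sketch computes r² for two biallelic SNPs from phased haplotypes. The abstract does not define the portability index itself, so the sketch stops at the pairwise statistic, and the 0.8 threshold is only an assumed, commonly used convention.

```python
def r_squared(haplotypes: list[tuple[int, int]]) -> float:
    """Pairwise LD r^2 between two biallelic SNPs.

    Each haplotype is (allele at SNP 1, allele at SNP 2), coded 0 or 1.
    """
    n = len(haplotypes)
    p_a = sum(h[0] for h in haplotypes) / n
    p_b = sum(h[1] for h in haplotypes) / n
    p_ab = sum(1 for h in haplotypes if h == (1, 1)) / n
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("a monomorphic SNP has undefined r^2")
    return d * d / denom


def tags(haplotypes: list[tuple[int, int]], threshold: float = 0.8) -> bool:
    """Whether SNP 1 tags SNP 2 at an r^2 threshold (0.8 is an assumed convention)."""
    return r_squared(haplotypes) >= threshold


print(r_squared([(0, 0), (0, 0), (1, 1), (1, 1)]))  # 1.0
print(r_squared([(0, 0), (0, 1), (1, 0), (1, 1)]))  # 0.0
```

    Computed separately on each population's haplotypes, the same SNP pair can show high r² in one group and low r² in another, which is one way a tag chosen in one isolated population can fail to carry over to the next.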

  8. Targeted resequencing in peanuts using the fluidigm access array

    USDA-ARS's Scientific Manuscript database

    The presence of homoeologous gene copies in allotetraploid peanut makes it challenging to select homologous SNPs differentiating two or more cultivars. An integrated approach of improved bioinformatics and targeted resequencing to select homologous SNPs in tetraploid peanut is needed. Raw transcrip...

  9. Resequencing and association analysis of OXTR with autism spectrum disorder in a Japanese population.

    PubMed

    Egawa, Jun; Watanabe, Yuichiro; Shibuya, Masako; Endo, Taro; Sugimoto, Atsunori; Igeta, Hirofumi; Nunokawa, Ayako; Inoue, Emiko; Someya, Toshiyuki

    2015-03-01

    The oxytocin receptor (OXTR) is implicated in the pathophysiology of autism spectrum disorder (ASD). A recent study found a rare non-synonymous OXTR gene variation, rs35062132 (R376G), associated with ASD in a Japanese population. In order to investigate the association between rare non-synonymous OXTR variations and ASD, we resequenced OXTR and performed association analysis with ASD in a Japanese population. We resequenced the OXTR coding region in 213 ASD patients. Rare non-synonymous OXTR variations detected by resequencing were genotyped in 213 patients and 667 controls. We detected three rare non-synonymous variations: rs35062132 (R376G/C), rs151257822 (G334D), and g.8809426G>T (R150S). However, there was no significant association between these rare non-synonymous variations and ASD. Our present study does not support the contribution of rare non-synonymous OXTR variations to ASD susceptibility in the Japanese population. © 2014 The Authors. Psychiatry and Clinical Neurosciences © 2014 Japanese Society of Psychiatry and Neurology.

  10. Resequencing Calculus

    ERIC Educational Resources Information Center

    Dwyer, Dave; Gruenwald, Mark; Stickles, Joe; Axtell, Mike

    2018-01-01

    Resequencing Calculus is a project that has reordered the typical delivery of Calculus material to better serve the needs of STEM majors. Funded twice by the National Science Foundation, this project has produced a three-semester textbook that has been piloted at numerous institutions, large and small, public and private. This paper describes the…

  11. Whole-genome resequencing: changing the paradigms of SNP detection, molecular mapping and gene discovery

    USDA-ARS's Scientific Manuscript database

    The next generation sequencing (NGS) technologies have opened a wealth of opportunities for plant breeding and genomics research, and changed the paradigms of marker detection, genotyping, and gene discovery. Abundant genomic resources have been generated using a whole genome resequencing (WGR) str...

  12. Resequencing microarray probe design for typing genetically diverse viruses: human rhinoviruses and enteroviruses

    PubMed Central

    Wang, Zheng; Malanoski, Anthony P; Lin, Baochuan; Kidd, Carolyn; Long, Nina C; Blaney, Kate M; Thach, Dzung C; Tibbetts, Clark; Stenger, David A

    2008-01-01

    Background Febrile respiratory illness (FRI) has a high impact on public health and global economics and poses a difficult challenge for differential diagnosis. A particular issue is the detection of genetically diverse pathogens, i.e. human rhinoviruses (HRV) and enteroviruses (HEV), which are frequent causes of FRI. Resequencing Pathogen Microarray technology has demonstrated potential for differential diagnosis of several respiratory pathogens simultaneously, but a high-confidence design method to select probes for genetically diverse viruses is lacking. Results Using HRV and HEV as test cases, we assess a general design strategy for detecting and serotyping genetically diverse viruses. A minimal number of probe sequences (26 for HRV and 13 for HEV), which were potentially capable of detecting all serotypes of HRV and HEV, were determined and implemented on the Resequencing Pathogen Microarray RPM-Flu v.30/31 (Tessarae RPM-Flu). The specificities of the designed probes were validated using 34 HRV and 28 HEV strains. All strains were successfully detected and identified at least to species level. 33 HRV strains and 16 HEV strains could be further differentiated to serotype level. Conclusion This study provides a fundamental evaluation of simultaneous detection and differential identification of genetically diverse RNA viruses with a minimal number of prototype sequences. The results demonstrated that the newly designed RPM-Flu v.30/31 can provide comprehensive and specific analysis of HRV and HEV samples, which implies that this design strategy will be applicable to other genetically diverse viruses. PMID:19046445
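    Choosing a minimal set of probe sequences that together detect every serotype can be framed as a set-cover problem. The sketch below uses the standard greedy approximation; this framing, the probe names, and the coverage table are assumptions for illustration, not the design method reported by the authors.

```python
def greedy_probe_selection(coverage: dict[str, set[str]], targets: set[str]) -> list[str]:
    """Repeatedly pick the probe that detects the most still-uncovered serotypes."""
    uncovered = set(targets)
    chosen = []
    while uncovered:
        best = max(coverage, key=lambda probe: len(coverage[probe] & uncovered))
        gained = coverage[best] & uncovered
        if not gained:
            raise ValueError(f"serotypes no probe detects: {sorted(uncovered)}")
        chosen.append(best)
        uncovered -= gained
    return chosen


# Hypothetical probes and the serotypes each is predicted to detect.
probe_hits = {
    "p1": {"A", "B"},
    "p2": {"B", "C", "D"},
    "p3": {"D", "E"},
    "p4": {"A", "E"},
}
serotypes = {"A", "B", "C", "D", "E"}
print(greedy_probe_selection(probe_hits, serotypes))  # ['p2', 'p4']
```

    Greedy selection is not guaranteed to find the smallest possible set, but its result is within a logarithmic factor of the optimum and it is simple to rerun as new strain sequences are added.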

  13. Targeted capture and resequencing of 1040 genes reveal environmentally driven functional variation in grey wolves.

    PubMed

    Schweizer, Rena M; Robinson, Jacqueline; Harrigan, Ryan; Silva, Pedro; Galverni, Marco; Musiani, Marco; Green, Richard E; Novembre, John; Wayne, Robert K

    2016-01-01

    In an era of ever-increasing amounts of whole-genome sequence data for individuals and populations, the utility of traditional single nucleotide polymorphism (SNP) array-based genome scans is uncertain. We previously performed a SNP array-based genome scan to identify candidate genes under selection in six distinct grey wolf (Canis lupus) ecotypes. Using this information, we designed a targeted capture array for 1040 genes, including all exons and flanking regions, as well as 5000 1-kb nongenic neutral regions, and resequenced these regions in 107 wolves. Selection tests revealed striking patterns of variation within candidate genes relative to noncandidate regions and identified potentially functional variants related to local adaptation. We found that 27% and 47% of candidate genes from the previous SNP array study had functional changes that were outliers in sweed and bayenv analyses, respectively. This result verifies the use of genomewide SNP surveys to tag genes that contain functional variants between populations. We highlight nonsynonymous variants in APOB, LIPG and USH2A that occur in functional domains of these proteins, and that demonstrate high correlation with precipitation seasonality and vegetation. We find that Arctic and High Arctic wolf ecotypes have higher numbers of genes under selection, which highlights their conservation value and heightened threat due to climate change. This study demonstrates that combining genomewide genotyping arrays with large-scale resequencing and environmental data provides a powerful approach to discern candidate functional variants in natural populations. © 2015 John Wiley & Sons Ltd.

  14. A novel COL11A1 missense mutation in siblings with non-ocular Stickler syndrome.

    PubMed

    Kohmoto, Tomohiro; Tsuji, Atsumi; Morita, Kei-Ichi; Naruto, Takuya; Masuda, Kiyoshi; Kashimada, Kenichi; Enomoto, Keisuke; Morio, Tomohiro; Harada, Hiroyuki; Imoto, Issei

    2016-01-01

    Stickler syndrome (STL) is an autosomal, dominantly inherited, clinically variable and genetically heterogeneous connective tissue disorder characterized by ocular, auditory, orofacial and skeletal abnormalities. We conducted targeted resequencing using a next-generation sequencer for molecular diagnosis of a 2-year-old girl who was clinically suspected of having STL with Pierre Robin sequence. We detected a novel heterozygous missense mutation, NM_001854.3:n.4838G>A [NM_001854.3 (COL11A1_v001):c.4520G>A], in COL11A1, resulting in a Gly to Asp substitution at position 1507 [NM_001854.3(COL11A1_i001)] within one of the collagen-like domains of the triple helical region. The same mutation was detected in her 4-year-old brother with cleft palate and high-frequency sensorineural hearing loss.
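    The link between the coding-DNA position c.4520 and amino-acid position 1507 is simple arithmetic: coding position N lies in codon ⌈N/3⌉. The sketch below computes the codon number and the position within the codon; the function name is hypothetical.

```python
def codon_of(c_pos: int) -> tuple[int, int]:
    """Map 1-based coding position c.N to (codon number, position within codon)."""
    if c_pos < 1:
        raise ValueError("coding positions start at c.1")
    return (c_pos - 1) // 3 + 1, (c_pos - 1) % 3 + 1


print(codon_of(4520))  # (1507, 2)
```

    A G>A change at the second position of a glycine codon (GGN) gives GAN, which encodes aspartate when the third base is T or C and glutamate when it is A or G, so the reported Gly-to-Asp substitution at position 1507 is consistent with this arithmetic.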

  15. Extreme-Depth Re-sequencing of Mitochondrial DNA Finds No Evidence of Paternal Transmission in Humans.

    PubMed

    Pyle, Angela; Hudson, Gavin; Wilson, Ian J; Coxhead, Jonathan; Smertenko, Tania; Herbert, Mary; Santibanez-Koref, Mauro; Chinnery, Patrick F

    2015-05-01

    Recent reports have questioned the accepted dogma that mammalian mitochondrial DNA (mtDNA) is strictly maternally inherited. In humans, the argument hinges on detecting a signature of inter-molecular recombination in mtDNA sequences sampled at the population level, inferring a paternal source for the mixed haplotypes. However, interpreting these data is fraught with difficulty, and direct experimental evidence is lacking. Using extreme-high depth mtDNA re-sequencing up to ~1.2 million-fold coverage, we find no evidence that paternal mtDNA haplotypes are transmitted to offspring in humans, thus excluding a simple dilution mechanism for uniparental transmission of mtDNA present in all healthy individuals. Our findings indicate that an active mechanism eliminates paternal mtDNA which likely acts at the molecular level.

  16. Extreme-Depth Re-sequencing of Mitochondrial DNA Finds No Evidence of Paternal Transmission in Humans

    PubMed Central

    Pyle, Angela; Hudson, Gavin; Wilson, Ian J.; Coxhead, Jonathan; Smertenko, Tania; Herbert, Mary; Santibanez-Koref, Mauro; Chinnery, Patrick F.

    2015-01-01

    Recent reports have questioned the accepted dogma that mammalian mitochondrial DNA (mtDNA) is strictly maternally inherited. In humans, the argument hinges on detecting a signature of inter-molecular recombination in mtDNA sequences sampled at the population level, inferring a paternal source for the mixed haplotypes. However, interpreting these data is fraught with difficulty, and direct experimental evidence is lacking. Using extreme-high depth mtDNA re-sequencing up to ~1.2 million-fold coverage, we find no evidence that paternal mtDNA haplotypes are transmitted to offspring in humans, thus excluding a simple dilution mechanism for uniparental transmission of mtDNA present in all healthy individuals. Our findings indicate that an active mechanism eliminates paternal mtDNA which likely acts at the molecular level. PMID:25973765
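    Depth matters here because a paternal haplotype present at a very low fraction yields only a handful of reads. As a rough sketch, assuming reads are sampled independently (a binomial model that ignores sequencing error, which real analyses must handle), the code below gives the probability of seeing at least k reads from a haplotype at fraction f with depth d. The fraction and read threshold are hypothetical, not values from the study.

```python
def prob_at_least_k_reads(depth: int, fraction: float, k: int) -> float:
    """P(X >= k) for X ~ Binomial(depth, fraction)."""
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    pmf = (1.0 - fraction) ** depth  # P(X = 0)
    below_k = 0.0
    for i in range(k):
        below_k += pmf
        # Recurrence: P(X = i + 1) = P(X = i) * (depth - i) / (i + 1) * f / (1 - f)
        pmf *= (depth - i) / (i + 1) * fraction / (1.0 - fraction)
    return 1.0 - below_k


# Hypothetical: a haplotype at 1 in 100,000 molecules, ~1.2 million-fold depth.
print(prob_at_least_k_reads(1_200_000, 1e-5, 3))
```

    With about 12 expected reads the haplotype is almost always seen at least three times; at a depth of a few thousand the same haplotype would usually yield no reads at all, which is why shallow data cannot rule out low-level paternal transmission.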

  17. Selective whole genome amplification for resequencing target microbial species from complex natural samples.

    PubMed

    Leichty, Aaron R; Brisson, Dustin

    2014-10-01

    Population genomic analyses have demonstrated power to address major questions in evolutionary and molecular microbiology. Collecting populations of genomes is hindered in many microbial species by the absence of a cost-effective and practical method to collect ample quantities of sufficiently pure genomic DNA for next-generation sequencing. Here we present a simple method to amplify genomes of a target microbial species present in a complex, natural sample. The selective whole genome amplification (SWGA) technique amplifies target genomes using nucleotide sequence motifs that are common in the target microbe genome, but rare in the background genomes, to prime the highly processive phi29 polymerase. SWGA thus selectively amplifies the target genome from samples in which it originally represented a minor fraction of the total DNA. The post-SWGA samples are enriched in target genomic DNA, which is ideal for population resequencing. We demonstrate the efficacy of SWGA using both laboratory-prepared mixtures of cultured microbes as well as a natural host-microbe association. Targeted amplification of Borrelia burgdorferi mixed with Escherichia coli at genome ratios of 1:2000 resulted in >10^5-fold amplification of the target genomes with <6.7-fold amplification of the background. SWGA-treated genomic extracts from Wolbachia pipientis-infected Drosophila melanogaster resulted in up to 70% of high-throughput resequencing reads mapping to the W. pipientis genome. By contrast, 2-9% of sequencing reads were derived from W. pipientis without prior amplification. The SWGA technique results in high sequencing coverage at a fraction of the sequencing effort, thus allowing population genomic studies at affordable costs. Copyright © 2014 by the Genetics Society of America.
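    The core idea of SWGA, picking primer motifs that are frequent in the target genome but rare in the background, can be sketched as a k-mer ranking. The scoring below (length-normalized frequency ratio with a pseudocount) and the toy sequences are assumptions; a practical primer design would also consider both strands, binding-site spacing, and melting temperature.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer on one strand of seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def rank_motifs(target: str, background: str, k: int,
                pseudocount: float = 1.0) -> list[str]:
    """Rank target k-mers by frequency in target relative to background."""
    target_counts = kmer_counts(target, k)
    background_counts = kmer_counts(background, k)
    target_total = max(len(target) - k + 1, 1)
    background_total = max(len(background) - k + 1, 1)
    scores = {
        motif: (count / target_total)
        / ((background_counts[motif] + pseudocount) / background_total)
        for motif, count in target_counts.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


print(rank_motifs("ATATATATAT", "GCGCGCGCGC", k=4))  # ['ATAT', 'TATA']
```

    For the Borrelia-in-E. coli test above, >10^5-fold amplification of a target present at 1 genome per 2,000 background genomes, against <6.7-fold for the background, implies at least roughly 100,000 : 13,400, or about 7.5 target genomes per background genome after amplification.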

  18. Physics First: An Informational Guide for Teachers, School Administrators, Parents, Scientists, and the Public

    ERIC Educational Resources Information Center

    American Association of Physics Teachers (NJ1), 2009

    2009-01-01

    Physics First represents an organizational alternative to the traditional high school science sequence. It calls for a re-sequencing of high school courses so that students study physics before chemistry and biology. The purpose of this pamphlet is to provide: (1) Basic information and rationale for the Physics First curriculum; (2) Strategies for…

  19. Whole-genome resequencing identifies the molecular genetic cause for the absence of a Gy5 glycinin protein in soybean PI 603408

    USDA-ARS's Scientific Manuscript database

    During ongoing proteomic analysis of the soybean (Glycine max (L.) Merr) germplasm collection, PI 603408 was identified as a landrace whose seeds lack accumulation of one of the major seed storage glycinin protein subunits. Whole genomic resequencing was used to identify a two-base deletion affectin...

  20. Single nucleotide variants and indels identified from whole-genome re-sequencing of Guzerat, Gyr, Girolando and Holstein cattle breeds

    USDA-ARS's Scientific Manuscript database

    Whole-genome re-sequencing, alignment and annotation analyses were undertaken for 12 sires representing four important cattle breeds in Brazil: Guzerat (multi-purpose), Gyr, Girolando and Holstein (dairy production). A total of approximately 4.3 billion reads from an Illumina HiSeq 2000 sequencer ge...

  1. Universal Detection and Identification of Avian Influenza Virus by Use of Resequencing Microarrays

    PubMed Central

    Lin, Baochuan; Malanoski, Anthony P.; Wang, Zheng; Blaney, Kate M.; Long, Nina C.; Meador, Carolyn E.; Metzgar, David; Myers, Christopher A.; Yingst, Samuel L.; Monteville, Marshall R.; Saad, Magdi D.; Schnur, Joel M.; Tibbetts, Clark; Stenger, David A.

    2009-01-01

    Zoonotic microbes have historically been, and continue to emerge as, threats to human health. The recent outbreaks of highly pathogenic avian influenza virus in bird populations and the appearance of some human infections have increased the concern of a possible new influenza pandemic, which highlights the need for broad-spectrum detection methods for rapidly identifying the spread or outbreak of all variants of avian influenza virus. In this study, we demonstrate that high-density resequencing pathogen microarrays (RPM) can be such a tool. The results from 37 influenza virus isolates show that the RPM platform is an effective means for detecting and subtyping influenza virus, while simultaneously providing sequence information for strain resolution, pathogenicity, and drug resistance without additional analysis. This study establishes that the RPM platform is a broad-spectrum pathogen detection and surveillance tool for monitoring the circulation of prevalent influenza viruses in the poultry industry and in wild birds or incidental exposures and infections in humans. PMID:19279171

  2. CoVaCS: a consensus variant calling system.

    PubMed

    Chiara, Matteo; Gioiosa, Silvia; Chillemi, Giovanni; D'Antonio, Mattia; Flati, Tiziano; Picardi, Ernesto; Zambelli, Federico; Horner, David Stephen; Pesole, Graziano; Castrignanò, Tiziana

    2018-02-05

    The advent and ongoing development of next-generation sequencing (NGS) technologies has led to a rapid increase in the production of human genome re-sequencing data, paving the way for personalized genomics and precision medicine. The body of genome resequencing data is progressively increasing, underlining the need for accurate and time-effective bioinformatics systems for genotyping, a crucial prerequisite for identification of candidate causal mutations in diagnostic screens. Here we present CoVaCS, a fully automated, highly accurate system with a web-based graphical interface for genotyping and variant annotation. Extensive tests on a gold-standard benchmark data set (the NA12878 Illumina Platinum genome) confirm that call sets based on our consensus strategy are fully in line with those attained by similar command-line-based approaches, and far more accurate than call sets from any individual tool. Importantly, our system exhibits better sensitivity and higher specificity than equivalent commercial software. CoVaCS offers optimized pipelines integrating state-of-the-art tools for variant calling and annotation for whole-genome sequencing (WGS), whole-exome sequencing (WES), and target-gene sequencing (TGS) data. The system is currently hosted at Cineca and offers the speed of an HPC computing facility, a crucial consideration when large numbers of samples must be analysed. All analyses are performed automatically, allowing high reproducibility of the results. As such, we believe that CoVaCS can be a valuable tool for the analysis of human genome resequencing studies. CoVaCS is available at: https://bioinformatics.cineca.it/covacs .
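    A consensus call set can be as simple as keeping variants reported by at least a minimum number of tools. The sketch below implements that vote; the abstract does not describe CoVaCS's actual consensus rules, so the threshold and data layout are assumptions, and a real pipeline would first normalize variant representations (for example, left-aligning indels) so that identical variants compare equal.

```python
from collections import Counter

# (chromosome, position, reference allele, alternate allele)
Variant = tuple[str, int, str, str]


def consensus_calls(call_sets: list[set[Variant]], min_support: int) -> set[Variant]:
    """Keep variants reported by at least min_support of the call sets."""
    support = Counter(variant for calls in call_sets for variant in calls)
    return {variant for variant, n in support.items() if n >= min_support}


# Hypothetical call sets from three variant callers.
caller_a = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T")}
caller_b = {("chr1", 100, "A", "G"), ("chr2", 50, "G", "A")}
caller_c = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr3", 10, "T", "C")}

print(sorted(consensus_calls([caller_a, caller_b, caller_c], min_support=2)))
# [('chr1', 100, 'A', 'G'), ('chr1', 200, 'C', 'T')]
```

    Raising `min_support` trades sensitivity for specificity: variants seen by a single tool are dropped, which is where individual callers tend to contribute most of their false positives.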

  3. A new single-nucleotide polymorphism database for rainbow trout generated through whole genome re-sequencing

    USDA-ARS's Scientific Manuscript database

    Single-nucleotide polymorphisms (SNPs) are highly abundant markers, which are broadly distributed in animal genomes. For rainbow trout, SNP discovery has been done through sequencing of restriction-site associated DNA (RAD) libraries, reduced representation libraries (RRL), RNA sequencing, and whole...

  4. A novel COL11A1 missense mutation in siblings with non-ocular Stickler syndrome

    PubMed Central

    Kohmoto, Tomohiro; Tsuji, Atsumi; Morita, Kei-ichi; Naruto, Takuya; Masuda, Kiyoshi; Kashimada, Kenichi; Enomoto, Keisuke; Morio, Tomohiro; Harada, Hiroyuki; Imoto, Issei

    2016-01-01

    Stickler syndrome (STL) is an autosomal, dominantly inherited, clinically variable and genetically heterogeneous connective tissue disorder characterized by ocular, auditory, orofacial and skeletal abnormalities. We conducted targeted resequencing using a next-generation sequencer for molecular diagnosis of a 2-year-old girl who was clinically suspected of having STL with Pierre Robin sequence. We detected a novel heterozygous missense mutation, NM_001854.3:n.4838G>A [NM_001854.3 (COL11A1_v001):c.4520G>A], in COL11A1, resulting in a Gly to Asp substitution at position 1507 [NM_001854.3(COL11A1_i001)] within one of the collagen-like domains of the triple helical region. The same mutation was detected in her 4-year-old brother with cleft palate and high-frequency sensorineural hearing loss. PMID:27081569
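
    The numbering in this report can be checked with a little arithmetic. In HGVS coding (c.) numbering, c.1 is the A of the ATG start codon, so c.4520 falls at the second base of codon 1507, matching the Gly1507 substitution; the n. and c. positions differ by 318 nucleotides, consistent with transcript sequence upstream of the start codon. Because only the Gly codons GGT and GGC become Asp codons after a G>A change at the second base (GGA and GGG would give Glu), the reported Gly-to-Asp change implies one of those two codons. The codon table below is restricted to the codons needed here, an assumption made for brevity.

```python
def codon_of(c_pos):
    """Map an HGVS c. position to (codon number, position within codon)."""
    return (c_pos - 1) // 3 + 1, (c_pos - 1) % 3 + 1

# Subset of the standard genetic code needed for this example
CODONS = {
    "GGT": "Gly", "GGC": "Gly", "GGA": "Gly", "GGG": "Gly",
    "GAT": "Asp", "GAC": "Asp", "GAA": "Glu", "GAG": "Glu",
}

def apply_change(codon, pos_in_codon, alt):
    """Replace one base (1-based position) in a codon."""
    i = pos_in_codon - 1
    return codon[:i] + alt + codon[i + 1:]

codon_no, pos = codon_of(4520)
print(codon_no, pos)   # 1507 2
print(4838 - 4520)     # 318

# Which Gly codons give Asp after a G>A change at codon position 2?
for gly in ("GGT", "GGC", "GGA", "GGG"):
    print(gly, "->", CODONS[apply_change(gly, pos, "A")])
```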

  5. Sequence Variability and Geographic Distribution of Lassa Virus, Sierra Leone

    PubMed Central

    Stockelman, Michael G.; Moses, Lina M.; Park, Matthew; Stenger, David A.; Ansumana, Rashid; Bausch, Daniel G.; Lin, Baochuan

    2015-01-01

    Lassa virus (LASV) is endemic to parts of West Africa and causes a highly fatal hemorrhagic fever. The multimammate rat (Mastomys natalensis) is the only known reservoir of LASV. Most human infections result from zoonotic transmission. The highly diverse LASV genome comprises 4 major lineages associated with different geographic locations. We used reverse transcription PCR and resequencing microarrays to detect LASV in 41 of 214 samples from rodents captured at 8 locations in Sierra Leone. Phylogenetic analysis of partial sequences of the nucleoprotein (NP), glycoprotein precursor (GPC), and polymerase (L) genes showed 5 separate clades within lineage IV of LASV in this country. The sequence diversity was higher than previously observed; mean diversity was 7.01% for the nucleoprotein gene at the nucleotide level. These results may have major implications for designing diagnostic tests and therapeutic agents for LASV infections in Sierra Leone. PMID:25811712
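
    A mean diversity figure such as the 7.01% reported for NP is commonly a mean pairwise nucleotide distance. The sketch below computes uncorrected p-distances over aligned sequences, skipping positions with gaps or ambiguity codes pairwise; the distance model and gap handling actually used by the authors are not stated, so both are assumptions, and the toy sequences are hypothetical.

```python
from itertools import combinations

VALID = set("ACGT")

def p_distance(a, b):
    """Fraction of differing sites, ignoring positions with gaps/ambiguity in either sequence."""
    compared = diffs = 0
    for x, y in zip(a, b):
        if x in VALID and y in VALID:
            compared += 1
            diffs += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared

def mean_pairwise_diversity(aligned):
    """Average p-distance over all pairs of aligned sequences."""
    pairs = list(combinations(aligned, 2))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)

seqs = ["ACGTACGTAC", "ACGTACGTAA", "ACGAACGTAC"]
print(f"{mean_pairwise_diversity(seqs):.2%}")
```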

  6. BAC-end sequence-based SNP mining in Allotetraploid Cotton (Gossypium) utilizing re-sequencing data, phylogenetic inferences and perspectives for genetic mapping

    USDA-ARS's Scientific Manuscript database

    A bacterial artificial chromosome (BAC) library and BAC-end sequences for Gossypium hirsutum L. have recently been developed. Here we report on genomic-based genome-wide SNP mining utilizing re-sequencing data with a BAC-end sequence reference for twelve G. hirsutum L. lines, one G. barbadense L. li...

  7. A Groupwise Association Test for Rare Mutations Using a Weighted Sum Statistic

    PubMed Central

    Madsen, Bo Eskerod; Browning, Sharon R.

    2009-01-01

    Resequencing is an emerging tool for identification of rare disease-associated mutations. Rare mutations are difficult to tag with SNP genotyping, as genotyping studies are designed to detect common variants. However, studies have shown that genetic heterogeneity is a probable scenario for common diseases, in which multiple rare mutations together explain a large proportion of the genetic basis for the disease. Thus, we propose a weighted-sum method to jointly analyse a group of mutations in order to test for groupwise association with disease status. For example, such a group of mutations may result from resequencing a gene. We compare the proposed weighted-sum method to alternative methods and show that it is powerful for identifying disease-associated genes, on both simulated and ENCODE data. Using the weighted-sum method, a resequencing study can identify a disease-associated gene with an overall population attributable risk (PAR) of 2%, even when each individual mutation has a much lower PAR, using 1,000 to 7,000 affected and unaffected individuals, depending on the underlying genetic model. This study thus demonstrates that resequencing studies can identify important genetic associations, provided that specialised analysis methods, such as the weighted-sum method, are used. PMID:19214210
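
    The weighted-sum idea can be sketched as follows. Each variant i gets a weight w_i = sqrt(n_i q_i (1 − q_i)), where q_i = (m_i + 1)/(2 u_i + 2) is estimated from the m_i mutant alleles observed among the u_i unaffected individuals genotyped and n_i is the total number genotyped; each individual's score is the sum of mutant-allele counts divided by w_i, and the statistic is the rank sum of affected individuals. As a simplification, this sketch reports an empirical one-sided permutation p-value, which may differ from the exact procedure in the paper; function names and toy data are hypothetical.

```python
import math
import random

def wss_weights(genotypes, affected):
    """w_i = sqrt(n_i * q_i * (1 - q_i)), with q_i estimated from unaffected individuals."""
    weights = []
    for i in range(len(genotypes[0])):
        n_total = n_unaff = m_unaff = 0
        for g, is_case in zip(genotypes, affected):
            if g[i] is None:  # missing genotype
                continue
            n_total += 1
            if not is_case:
                n_unaff += 1
                m_unaff += g[i]
        q = (m_unaff + 1) / (2 * n_unaff + 2)
        weights.append(math.sqrt(n_total * q * (1 - q)))
    return weights

def wss_scores(genotypes, weights):
    """Per-individual score: sum of mutant-allele counts divided by variant weights."""
    return [sum(g[i] / w for i, w in enumerate(weights) if g[i] is not None)
            for g in genotypes]

def average_ranks(values):
    """1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def wss_statistic(genotypes, affected):
    """Sum of ranks of affected individuals' scores."""
    ranks = average_ranks(wss_scores(genotypes, wss_weights(genotypes, affected)))
    return sum(r for r, is_case in zip(ranks, affected) if is_case)

def wss_test(genotypes, affected, n_perm=999, seed=0):
    """Observed statistic and empirical one-sided permutation p-value."""
    rng = random.Random(seed)
    observed = wss_statistic(genotypes, affected)
    labels = list(affected)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # weights are recomputed under each permuted labelling
        if wss_statistic(genotypes, labels) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)

# Toy data: 10 cases each carry one rare allele at one of 5 variants; 10 controls carry none
cases = [[1 if j == k % 5 else 0 for j in range(5)] for k in range(10)]
controls = [[0] * 5 for _ in range(10)]
x, p = wss_test(cases + controls, [True] * 10 + [False] * 10)
print(x, p)
```

    Estimating q_i from unaffected individuals only keeps a causal variant that is enriched in cases from being down-weighted, which is why the weights are recomputed for each permutation.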

  8. BAC-End Sequence-Based SNP Mining in Allotetraploid Cotton (Gossypium) Utilizing Resequencing Data, Phylogenetic Inferences, and Perspectives for Genetic Mapping

    PubMed Central

    Hulse-Kemp, Amanda M.; Ashrafi, Hamid; Stoffel, Kevin; Zheng, Xiuting; Saski, Christopher A.; Scheffler, Brian E.; Fang, David D.; Chen, Z. Jeffrey; Van Deynze, Allen; Stelly, David M.

    2015-01-01

    A bacterial artificial chromosome library and BAC-end sequences for cultivated cotton (Gossypium hirsutum L.) have recently been developed. This report presents genome-wide single nucleotide polymorphism (SNP) mining using resequencing data, with BAC-end sequences as a reference, by alignment of 12 G. hirsutum L. lines, one G. barbadense L. line, and one G. longicalyx Hutch and Lee line. A total of 132,262 intraspecific SNPs were developed for G. hirsutum, whereas 223,138 and 470,631 interspecific SNPs were developed for G. barbadense and G. longicalyx, respectively. Using a set of interspecific SNPs comprising 11 randomly selected SNPs and 77 SNPs putatively associated with the homeologous chromosome pair 12 and 26, we mapped 77 SNPs into two linkage groups representing these chromosomes, spanning a total of 236.2 cM in an interspecific F2 population (G. barbadense 3-79 × G. hirsutum TM-1). The mapping results validated the approach for reliably producing large numbers of both intraspecific and interspecific SNPs aligned to BAC-ends. This will allow future construction of high-density integrated physical and genetic maps for cotton and other complex polyploid genomes. The methods developed will also allow future Gossypium resequencing data to be automatically genotyped for the identified SNPs along the BAC-end sequence reference, for anchoring sequence assemblies and for comparative studies. PMID:25858960

  9. Development of cleaved amplified polymorphic sequence markers and a CAPS-based genetic linkage map in watermelon (Citrullus lanatus [Thunb.] Matsum. and Nakai) constructed using whole-genome re-sequencing data

    PubMed Central

    Liu, Shi; Gao, Peng; Zhu, Qianglong; Luan, Feishi; Davis, Angela R.; Wang, Xiaolu

    2016-01-01

    Cleaved amplified polymorphic sequence (CAPS) markers are useful tools for detecting single nucleotide polymorphisms (SNPs). This study detected SNP sites and converted them into CAPS markers based on high-throughput re-sequencing data in watermelon, for linkage map construction and quantitative trait locus (QTL) analysis. Two inbred lines, Cream of Saskatchewan (COS) and LSW-177, were re-sequenced and analyzed with a custom Perl script for CAPS marker development. For the two parental lines, 88.7% and 78.5% of the assembled sequences, respectively, mapped to the reference watermelon genome. Comparative analysis of the assembled genomes identified 225,693 SNPs and 19,268 indels between the two lines. A total of 532 CAPS primer pairs were designed with 16 restriction enzymes, of which 271 gave distinct bands of the expected length that were polymorphic after PCR and enzyme digestion, a polymorphic rate of 50.94%. Using the new CAPS markers, an initial CAPS-based genetic linkage map was constructed with the F2 population, spanning 1836.51 cM with 11 linkage groups and 301 markers. Twelve QTLs related to fruit flesh color, length, width, shape index, and brix content were detected. These new CAPS markers will be a valuable resource for breeding programs and genetic studies of watermelon. PMID:27162496
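
    The core step in CAPS development is asking whether a SNP creates or abolishes a restriction enzyme recognition site. The sketch below assumes a small hypothetical panel of three well-known enzymes with palindromic sites (EcoRI GAATTC, HindIII AAGCTT, BamHI GGATCC); it is not the authors' Perl script, and non-palindromic sites would also need a reverse-complement scan. The last line reproduces the reported polymorphic rate from the primer counts.

```python
# Hypothetical enzyme panel; all three sites are palindromic, so one strand suffices
ENZYMES = {"EcoRI": "GAATTC", "HindIII": "AAGCTT", "BamHI": "GGATCC"}

def site_covers(seq, pos, motif):
    """True if `motif` occurs in `seq` at any start that overlaps index `pos`."""
    k = len(motif)
    for start in range(max(0, pos - k + 1), min(pos, len(seq) - k) + 1):
        if seq[start:start + k] == motif:
            return True
    return False

def caps_candidates(ref_seq, pos, alt_base, enzymes=ENZYMES):
    """Enzymes whose site is present on exactly one allele at SNP index `pos`.

    Returns {enzyme: allele_that_is_cut}, with allele "ref" or "alt".
    """
    alt_seq = ref_seq[:pos] + alt_base + ref_seq[pos + 1:]
    hits = {}
    for name, motif in enzymes.items():
        on_ref = site_covers(ref_seq, pos, motif)
        on_alt = site_covers(alt_seq, pos, motif)
        if on_ref != on_alt:
            hits[name] = "ref" if on_ref else "alt"
    return hits

print(caps_candidates("TTGAATTCTT", 3, "C"))  # {'EcoRI': 'ref'}
print(caps_candidates("AGGATCAA", 6, "C"))    # {'BamHI': 'alt'}

# Polymorphic rate reported for the validated primer pairs
print(round(271 / 532 * 100, 2))              # 50.94
```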

  10. Next generation diagnostics of cystic fibrosis and CFTR-related disorders by targeted multiplex high-coverage resequencing of CFTR.

    PubMed

    Trujillano, D; Ramos, M D; González, J; Tornador, C; Sotillo, F; Escaramis, G; Ossowski, S; Armengol, L; Casals, T; Estivill, X

    2013-07-01

    Here we have developed a novel and much more efficient strategy for the complete molecular characterisation of the cystic fibrosis (CF) transmembrane conductance regulator (CFTR) gene, based on multiplexed targeted resequencing. We have tested this approach in a cohort of 92 samples with previously characterised CFTR mutations and polymorphisms. After enrichment of the pooled barcoded DNA libraries with a custom NimbleGen SeqCap EZ Choice array (Roche) and sequencing with a HiSeq2000 (Illumina) sequencer, we applied several bioinformatics tools to call mutations and polymorphisms in CFTR. The combination of several bioinformatics tools allowed us to detect all known pathogenic variants (point mutations, short insertions/deletions, and large genomic rearrangements) and polymorphisms (including the poly-T and poly-thymidine-guanine polymorphic tracts) in the 92 samples. In addition, we report the precise characterisation of the breakpoints of seven genomic rearrangements in CFTR, including those of a novel deletion of exon 22 and a complex 85 kb inversion that includes two large deletions affecting exons 4-8 and 12-21, respectively. This work is a proof of principle that targeted resequencing is an accurate and cost-effective approach for the genetic testing of CF and CFTR-related disorders (e.g., male infertility), amenable to routine clinical practice and ready to substitute classical molecular methods in medical genetics.

  11. High-density genetic map using whole-genome re-sequencing for fine mapping and candidate gene discovery for disease resistance in peanut

    USDA-ARS's Scientific Manuscript database

    High-density genetic linkage maps are essential for fine mapping QTLs controlling disease resistance traits, such as early leaf spot (ELS), late leaf spot (LLS), and Tomato spotted wilt virus (TSWV). With completion of the genome sequences of two diploid ancestors of cultivated peanut, we could use ...

  12. Canonical single nucleotide polymorphisms (SNPs) for high-resolution subtyping of Shiga-toxin producing Escherichia coli (STEC) O157:H7

    USDA-ARS's Scientific Manuscript database

    The objective of this study was to develop a canonical SNP panel for subtyping of Shiga-toxin producing Escherichia coli (STEC). To this end, 906 putative SNPs were identified using resequencing tiling arrays. A subset of 391 SNPs was further screened using high-throughput TaqMan PCR against a d...

  13. High-Throughput resequencing of maize landraces at genomic regions associated with flowering time

    USDA-ARS's Scientific Manuscript database

    Despite the reduction in the price of sequencing, it remains expensive to sequence and assemble whole, complex genomes of multiple samples for population studies, particularly for large genomes like those of many crop species. Enrichment of target genome regions coupled with next generation sequenci...

  14. High-throughput interpretation of gene structure changes in human and nonhuman resequencing data, using ACE

    USDA-ARS's Scientific Manuscript database

    We describe a suite of software tools for identifying possible functional changes in gene structure that may result from sequence variants. ACE (“Assessing Changes to Exons”) converts phased genotype calls to a collection of explicit haplotype sequences, maps transcript annotations onto them, detect...
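
    The first ACE step described above, turning phased genotype calls into explicit haplotype sequences, can be illustrated with a short sketch. It assumes biallelic SNVs only, 0-based positions, and VCF-style phased diploid genotypes such as "0|1"; indel handling and transcript mapping are omitted, and the function name `build_haplotypes` is hypothetical rather than part of ACE.

```python
def build_haplotypes(reference, variants):
    """Return the two haplotype sequences implied by phased SNV calls.

    variants: iterable of (pos, ref, alt, gt) with 0-based pos and a phased
    diploid genotype string such as "0|1" (allele 0 = ref, 1 = alt).
    """
    haps = [list(reference), list(reference)]
    for pos, ref, alt, gt in variants:
        if reference[pos] != ref:
            raise ValueError(f"REF mismatch at {pos}")
        if "|" not in gt:
            raise ValueError(f"genotype {gt!r} is unphased")
        for h, allele in enumerate(gt.split("|")):
            if allele == "1":
                haps[h][pos] = alt
    return ["".join(h) for h in haps]

ref = "ATGGCCTAA"
calls = [(3, "G", "A", "0|1"), (5, "C", "T", "1|1")]
print(build_haplotypes(ref, calls))  # ['ATGGCTTAA', 'ATGACTTAA']
```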

  15. A new single-nucleotide polymorphisms database for rainbow trout generated through whole genome resequencing of selected samples

    USDA-ARS's Scientific Manuscript database

    Single-nucleotide polymorphisms (SNPs) are highly abundant markers, which are broadly distributed in animal genomes. For rainbow trout, SNP discovery has been done through sequencing of restriction-site associated DNA (RAD) libraries, reduced representation libraries (RRL), RNA sequencing, and whole...

  16. A comparative analysis of exome capture.

    PubMed

    Parla, Jennifer S; Iossifov, Ivan; Grabill, Ian; Spector, Mona S; Kramer, Melissa; McCombie, W Richard

    2011-09-29

    Human exome resequencing using commercial target capture kits has been, and continues to be, used to sequence large numbers of individuals in the search for variants associated with various human diseases. We rigorously evaluated the capabilities of two solution exome capture kits. These analyses help clarify the strengths and limitations of the resulting data and systematically identify variables that should be considered when using them. Each exome kit performed well at capturing the targets it was designed to capture, which mainly correspond to the consensus coding sequence (CCDS) annotations of the human genome. In addition, within their respective targets, each capture kit coupled with high-coverage Illumina sequencing produced highly accurate nucleotide calls. However, other databases, such as the Reference Sequence collection (RefSeq), define the exome more broadly, and, not surprisingly, the exome kits did not capture these additional regions. Commercial exome capture kits provide a very efficient way to sequence selected areas of the genome at very high accuracy. Here we provide the data to help guide critical analyses of sequencing data derived from these products.
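
    The CCDS-versus-RefSeq point amounts to measuring how much of one annotation falls inside a kit's target intervals. A minimal sketch, assuming both sets are given as half-open (start, end) intervals on a single chromosome; the coordinates in the example are invented.

```python
def merge(intervals):
    """Merge overlapping or abutting half-open [start, end) intervals."""
    merged = []
    for s, e in sorted(intervals):
        if merged and s <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], e)
        else:
            merged.append([s, e])
    return merged

def covered_fraction(targets, regions):
    """Fraction of bases in `regions` that fall inside `targets`."""
    t, r = merge(targets), merge(regions)
    i = j = overlap = 0
    while i < len(t) and j < len(r):
        lo, hi = max(t[i][0], r[j][0]), min(t[i][1], r[j][1])
        overlap += max(0, hi - lo)
        if t[i][1] < r[j][1]:  # advance whichever interval ends first
            i += 1
        else:
            j += 1
    total = sum(e - s for s, e in r)
    return overlap / total

kit_targets = [(0, 100), (200, 300)]
annotation = [(50, 150), (250, 260)]
print(round(covered_fraction(kit_targets, annotation), 3))  # 0.545
```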

  17. Smallpox virus resequencing GeneChips can also rapidly ascertain species status for some zoonotic non-variola orthopoxviruses.

    PubMed

    Sulaiman, Irshad M; Sammons, Scott A; Wohlhueter, Robert M

    2008-04-01

    We recently developed a set of seven resequencing GeneChips for the rapid sequencing of Variola virus strains in the WHO Repository of the Centers for Disease Control and Prevention. In this study, we attempted to hybridize these GeneChips with some known non-Variola orthopoxvirus isolates, including monkeypox, cowpox, and vaccinia viruses, for rapid detection.

  18. Application of whole genome re-sequencing data in the development of diagnostic DNA markers tightly linked to a disease-resistance locus for marker-assisted selection in lupin (Lupinus angustifolius).

    PubMed

    Yang, Huaan; Jian, Jianbo; Li, Xuan; Renshaw, Daniel; Clements, Jonathan; Sweetingham, Mark W; Tan, Cong; Li, Chengdao

    2015-09-02

    Molecular marker-assisted breeding provides an efficient tool to develop improved crop varieties. A major challenge for the broad application of markers in marker-assisted selection is that the marker phenotypes must match plant phenotypes in a wide range of breeding germplasm. In this study, we used the legume crop species Lupinus angustifolius (lupin) to demonstrate the utility of whole genome sequencing and re-sequencing for the development of diagnostic markers for molecular plant breeding. Nine lupin cultivars released in Australia from 1973 to 2007 were subjected to whole genome re-sequencing. The re-sequencing data, together with the reference genome sequence data, were used in marker development, which revealed 180,596 to 795,735 SNP markers from pairwise comparisons among the cultivars. A total of 207,887 markers were anchored on the lupin genetic linkage map. Marker mining obtained an average of 387 SNP markers and 87 InDel markers for each of the 24 genome sequence assembly scaffolds bearing markers linked to 11 genes of agronomic interest. Using the R gene PhtjR, which confers resistance to phomopsis stem blight disease, as a test case, we discovered 17 candidate diagnostic markers by genotyping and selecting markers on a genetic linkage map. A further 243 candidate diagnostic markers were discovered by marker mining on a scaffold bearing non-diagnostic markers linked to the PhtjR gene. Nine of the ten candidate diagnostic markers tested were confirmed as truly diagnostic on a broad range of commercial cultivars. Markers developed using these strategies meet the requirements for broad application in molecular plant breeding. We demonstrated that low-cost genome sequencing and re-sequencing data were sufficient and very effective for the development of diagnostic markers for marker-assisted selection. The strategies used in this study may be applied to any trait or plant species.
    Whole genome sequencing and re-sequencing provide a powerful tool to overcome current limitations in molecular plant breeding, which will enable plant breeders to precisely pyramid favourable genes to develop super crop varieties to meet future food demands.
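
    The requirement that marker phenotypes match plant phenotypes across breeding germplasm can be stated as a simple check: a marker is diagnostic only if the resistance-linked allele is present in every resistant cultivar and absent from every susceptible one. The sketch below encodes that criterion; cultivar names, alleles and phenotypes are hypothetical, and this is not the authors' pipeline.

```python
def concordance(marker_alleles, resistant, resistance_allele):
    """Fraction of cultivars whose marker allele agrees with their phenotype.

    marker_alleles: dict cultivar -> allele observed at the marker
    resistant: dict cultivar -> True (resistant) / False (susceptible)
    """
    agree = sum((marker_alleles[c] == resistance_allele) == resistant[c]
                for c in resistant)
    return agree / len(resistant)

def is_diagnostic(marker_alleles, resistant, resistance_allele):
    """Diagnostic means perfect agreement across the whole cultivar panel."""
    return concordance(marker_alleles, resistant, resistance_allele) == 1.0

phenotype = {"cv1": True, "cv2": True, "cv3": False, "cv4": False}
marker_x = {"cv1": "A", "cv2": "A", "cv3": "G", "cv4": "G"}  # tracks resistance
marker_y = {"cv1": "A", "cv2": "G", "cv3": "G", "cv4": "G"}  # linked but not diagnostic
print(is_diagnostic(marker_x, phenotype, "A"), is_diagnostic(marker_y, phenotype, "A"))  # True False
```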

  19. Novel variants in human and monkey CETP.

    PubMed

    Lloyd, David B; Reynolds, Jennifer M; Cronan, Melissa T; Williams, Suzanne P; Lira, Maruja E; Wood, Linda S; Knight, Delvin R; Thompson, John F

    2005-10-15

    Variation in CETP has been shown to play an important role in HDL-C levels and cardiovascular disease. To better characterize this variation, the promoter and exonic DNA of CETP were resequenced in 189 individuals with extreme HDL-C levels or age. Two novel amino acid variants were found in humans (V-12D and Y361C); these and an additional variant not previously studied in vitro (R137W) were expressed. D-12 was not secreted and had no detectable activity in cells. C361 and W137 retained near-normal cholesteryl ester transfer activity when purified but were secreted less efficiently than wild type. Torcetrapib, a CETP inhibitor in clinical development with atorvastatin, inhibited wild-type CETP and the W137 and C361 variants to a similar extent. In addition, the level of variation in other species was assessed by resequencing DNA from nine cynomolgus monkeys. Numerous intronic and silent SNPs were found, as well as two variable amino acids. The amino acid-altering SNPs were genotyped in 29 monkeys and were not significantly associated with HDL-C levels. Three SNPs found in monkeys were identical to three found in humans, and all three occur at CpG sites.

  20. Rapid genotyping by low-coverage resequencing to construct genetic linkage maps of fungi: a case study in Lentinula edodes

    PubMed Central

    2013-01-01

    Background: Genetic linkage maps are important tools in breeding programmes and quantitative trait analyses. Traditional molecular markers used for genotyping are limited in throughput and efficiency. The advent of next-generation sequencing technologies has facilitated progeny genotyping and genetic linkage map construction in major grain crops. However, the applicability of the approach remains untested in fungi. Findings: Shiitake mushroom, Lentinula edodes, is a basidiomycetous fungus and one of the most popular cultivated edible mushrooms. Here, we developed a rapid genotyping method based on low-coverage (~0.5- to 1.5-fold) whole-genome resequencing. We used the approach to genotype 20 single-spore isolates derived from L. edodes strain L54 and constructed the first high-density sequence-based genetic linkage map of L. edodes. The accuracy of the proposed genotyping method was verified experimentally against the results of mating compatibility tests and PCR-single-strand conformation polymorphism analysis of a few known genes. The linkage map spanned a total genetic distance of 637.1 cM and contained 13 linkage groups. Two hundred sequence-based markers were placed on the map, with an average marker spacing of 3.4 cM. The accuracy of the map was confirmed by comparing the locations of known genes, such as matA and matB, with those on previous maps. Conclusions: We used the shiitake mushroom as an example to provide a proof of principle that low-coverage resequencing could allow rapid genotyping of basidiospore-derived progeny, which could in turn facilitate the construction of high-density genetic linkage maps of basidiomycetous fungi for quantitative trait analyses and improvement of genome assembly. PMID:23915543
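
    The reported average marker spacing follows from the map totals: 13 linkage groups holding 200 markers contain 200 − 13 = 187 inter-marker intervals, and 637.1 cM / 187 ≈ 3.4 cM. The sketch below checks that arithmetic and also illustrates one common way to genotype from very sparse reads: pooling the parental allele calls of consecutive SNPs into bins and taking the majority, on the expectation that each single-spore isolate carries one parental allele per locus. The binning scheme is an assumption for illustration, not necessarily the method used for L. edodes, and the read calls are invented.

```python
from collections import Counter

def call_bins(read_alleles, bin_size):
    """Majority parental call ("A" or "B") per bin of consecutive SNPs.

    read_alleles: per-SNP parental allele seen in the reads ("A" or "B"),
    or None when the SNP has no read coverage.
    """
    calls = []
    for start in range(0, len(read_alleles), bin_size):
        counts = Counter(a for a in read_alleles[start:start + bin_size] if a is not None)
        if counts["A"] > counts["B"]:
            calls.append("A")
        elif counts["B"] > counts["A"]:
            calls.append("B")
        else:
            calls.append(None)  # no reads or a tie: leave the bin missing
    return calls

def mean_marker_spacing(total_cm, n_markers, n_groups):
    """Average interval length; a linkage group holding m markers has m - 1 intervals."""
    return total_cm / (n_markers - n_groups)

isolate = ["A", "A", None, "B", "A",       # bin 1: A wins 3 to 1
           "B", "B", None, None, None,     # bin 2: B wins 2 to 0
           None, None, None, None, None]   # bin 3: no coverage
print(call_bins(isolate, 5))                          # ['A', 'B', None]
print(round(mean_marker_spacing(637.1, 200, 13), 1))  # 3.4
```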

  1. Genome-wide resequencing of KRICE_CORE reveals their potential for future breeding, as well as functional and evolutionary studies in the post-genomic era.

    PubMed

    Kim, Tae-Sung; He, Qiang; Kim, Kyu-Won; Yoon, Min-Young; Ra, Won-Hee; Li, Feng Peng; Tong, Wei; Yu, Jie; Oo, Win Htet; Choi, Buung; Heo, Eun-Beom; Yun, Byoung-Kook; Kwon, Soon-Jae; Kwon, Soon-Wook; Cho, Yoo-Hyun; Lee, Chang-Yong; Park, Beom-Seok; Park, Yong-Jin

    2016-05-26

    Rice germplasm collections continue to grow in number and size around the world. Since maintaining and screening such massive resources remains challenging, it is important to establish practical methods to manage them. A core collection, by definition, is a subset of the entire population that preserves the majority of its genetic diversity, enhancing the efficiency of germplasm utilization. Here, we report whole-genome resequencing of the 137 accessions of the rice mini core collection, or Korean rice core set (KRICE_CORE), which represents the 25,604 rice germplasm accessions deposited in the Korean genebank of the Rural Development Administration (RDA). We used the Illumina HiSeq 2000 and 2500 platforms to produce short reads and then assembled them at 9.8× depth using Nipponbare as a reference. Comparison of the sequences with the reference genome yielded more than 15 million (M) single nucleotide polymorphisms (SNPs) and 1.3 M INDELs. Phylogenetic and population analyses using 2,046,529 high-quality SNPs successfully assigned rice accessions to the relevant rice subgroups, suggesting that these SNPs capture evolutionary signatures that have accumulated in rice subpopulations. Furthermore, genome-wide association studies (GWAS) for four exemplary agronomic traits in KRICE_CORE demonstrate its utility, identifying previously defined genes or novel genetic factors that potentially regulate important phenotypes. This study provides strong evidence that KRICE_CORE, although small, contains high genetic and functional diversity across the genome. Thus, our resequencing results will be useful for future breeding, as well as for functional and evolutionary studies, in the post-genomic era.

  2. Monitoring therapy responses at the leukemic subclone level by ultra-deep amplicon resequencing in acute myeloid leukemia.

    PubMed

    Ojamies, P N; Kontro, M; Edgren, H; Ellonen, P; Lagström, S; Almusa, H; Miettinen, T; Eldfors, S; Tamborero, D; Wennerberg, K; Heckman, C; Porkka, K; Wolf, M; Kallioniemi, O

    2017-05-01

    In our individualized systems medicine program, personalized treatment options are identified and administered to chemorefractory acute myeloid leukemia (AML) patients based on exome sequencing and ex vivo drug sensitivity and resistance testing data. Here, we analyzed how clonal heterogeneity affects the responses of 13 AML patients to chemotherapy or targeted treatments using ultra-deep (average 68,000× coverage) amplicon resequencing. Using amplicon resequencing, we identified 16 variants in 4 patients (frequency 0.54-2%) that had not been detected previously by exome sequencing. A correlation-based method was developed to detect mutation-specific responses in serial samples across multiple time points. Significant subclone-specific responses were observed for both chemotherapy and targeted therapy. We detected subclonal responses in patients for whom clinical European LeukemiaNet (ELN) criteria showed no response. Subclonal responses also helped to identify putative mechanisms underlying drug sensitivities, such as sensitivity to azacitidine in DNMT3A-mutated cell clones and resistance to cytarabine in a subclone with loss of the NF1 gene. In summary, the ultra-deep amplicon resequencing method enables sensitive quantification of subclonal variants and their responses to therapies. This approach provides new opportunities for designing combinatorial therapies blocking multiple subclones as well as for real-time assessment of such treatments.
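
    Two quantities underlie this kind of monitoring: the variant allele frequency (VAF) at each time point, and the similarity of VAF trajectories between variants. At the reported average of 68,000× coverage, a variant at 0.54% VAF would be expected to contribute roughly 0.0054 × 68,000 ≈ 367 supporting reads. The sketch below uses a plain Pearson correlation of VAF trajectories as a stand-in; it is an assumption for illustration, not the authors' correlation-based method, and the read counts are invented.

```python
import math

def vaf(alt_reads, depth):
    """Variant allele frequency at one time point."""
    return alt_reads / depth

def pearson(x, y):
    """Pearson correlation of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Expected supporting reads for a 0.54% variant at 68,000x coverage
print(round(0.0054 * 68000))  # 367

# Hypothetical (alt_reads, depth) at four time points for three variants
v1 = [vaf(a, d) for a, d in [(1400, 70000), (700, 70000), (350, 70000), (140, 70000)]]
v2 = [vaf(a, d) for a, d in [(700, 68000), (340, 68000), (180, 68000), (70, 68000)]]
v3 = [vaf(a, d) for a, d in [(400, 66000), (900, 66000), (1500, 66000), (2100, 66000)]]
# v1 and v2 fall together (plausibly one subclone); v3 expands as they shrink
print(round(pearson(v1, v2), 3), round(pearson(v1, v3), 3))
```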

  3. LipidSeq: a next-generation clinical resequencing panel for monogenic dyslipidemias.

    PubMed

    Johansen, Christopher T; Dubé, Joseph B; Loyzer, Melissa N; MacDonald, Austin; Carter, David E; McIntyre, Adam D; Cao, Henian; Wang, Jian; Robinson, John F; Hegele, Robert A

    2014-04-01

    We report the design of a targeted resequencing panel for monogenic dyslipidemias, LipidSeq, for the purpose of replacing Sanger sequencing in the clinical detection of dyslipidemia-causing variants. We also evaluate the performance of the LipidSeq approach versus Sanger sequencing in 84 patients with a range of phenotypes including extreme blood lipid concentrations as well as additional dyslipidemias and related metabolic disorders. The panel performs well, with high concordance (95.2%) in samples with known mutations based on Sanger sequencing and a high detection rate (57.9%) of mutations likely to be causative for disease in samples not previously sequenced. Clinical implementation of LipidSeq has the potential to aid in the molecular diagnosis of patients with monogenic dyslipidemias with a high degree of speed and accuracy and at lower cost than either Sanger sequencing or whole exome sequencing. Furthermore, LipidSeq will help to provide a more focused picture of monogenic and polygenic contributors that underlie dyslipidemia while avoiding the discovery of incidental, clinically actionable pathogenic variants in nonmetabolism-related genes, such as oncogenes, that would otherwise be identified by a whole exome approach, thus minimizing potential ethical issues.

  4. LipidSeq: a next-generation clinical resequencing panel for monogenic dyslipidemias

    PubMed Central

    Johansen, Christopher T.; Dubé, Joseph B.; Loyzer, Melissa N.; MacDonald, Austin; Carter, David E.; McIntyre, Adam D.; Cao, Henian; Wang, Jian; Robinson, John F.; Hegele, Robert A.

    2014-01-01

    We report the design of a targeted resequencing panel for monogenic dyslipidemias, LipidSeq, for the purpose of replacing Sanger sequencing in the clinical detection of dyslipidemia-causing variants. We also evaluate the performance of the LipidSeq approach versus Sanger sequencing in 84 patients with a range of phenotypes including extreme blood lipid concentrations as well as additional dyslipidemias and related metabolic disorders. The panel performs well, with high concordance (95.2%) in samples with known mutations based on Sanger sequencing and a high detection rate (57.9%) of mutations likely to be causative for disease in samples not previously sequenced. Clinical implementation of LipidSeq has the potential to aid in the molecular diagnosis of patients with monogenic dyslipidemias with a high degree of speed and accuracy and at lower cost than either Sanger sequencing or whole exome sequencing. Furthermore, LipidSeq will help to provide a more focused picture of monogenic and polygenic contributors that underlie dyslipidemia while avoiding the discovery of incidental, clinically actionable pathogenic variants in nonmetabolism-related genes, such as oncogenes, that would otherwise be identified by a whole exome approach, thus minimizing potential ethical issues. PMID:24503134

  5. Population Structure and Domestication Revealed by High-Depth Resequencing of Korean Cultivated and Wild Soybean Genomes

    PubMed Central

    Chung, Won-Hyong; Jeong, Namhee; Kim, Jiwoong; Lee, Woo Kyu; Lee, Yun-Gyeong; Lee, Sang-Heon; Yoon, Woongchang; Kim, Jin-Hyun; Choi, Ik-Young; Choi, Hong-Kyu; Moon, Jung-Kyung; Kim, Namshin; Jeong, Soon-Chun

    2014-01-01

    Despite the importance of soybean as a major crop, genome-wide variation and evolution of cultivated soybeans are largely unknown. Here, we catalogued genome variation in an annual soybean population by high-depth resequencing of 10 cultivated and 6 wild accessions and obtained 3.87 million high-quality single-nucleotide polymorphisms (SNPs) after excluding the sites with missing data in any accession. Nuclear genome phylogeny supported a single origin for the cultivated soybeans. We identified 10-fold longer linkage disequilibrium (LD) in the wild soybean relative to wild maize and rice. Despite the small population size, the long LD and large SNP data allowed us to identify 206 candidate domestication regions with significantly lower diversity in the cultivated, but not in the wild, soybeans. Some of the genes in these candidate regions were associated with soybean homologues of canonical domestication genes. However, several examples, which are likely specific to soybean or eudicot crop plants, were also observed. Consequently, the variation data identified in this study should be valuable for breeding and for identifying agronomically important genes in soybeans. However, the long LD of wild soybeans may hinder pinpointing causal gene(s) in the candidate regions. PMID:24271940
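
    Linkage disequilibrium between two SNPs is commonly summarized as r²; comparisons of LD "length" such as the one above typically refer to the physical distance over which average r² decays to a chosen threshold. The sketch below computes r² from two-site haplotypes; treating each largely homozygous (self-pollinated) soybean accession as a single haplotype is an assumption, as are the toy data.

```python
def r_squared(haplotypes):
    """r^2 between two biallelic sites.

    haplotypes: list of (allele_at_site1, allele_at_site2), coded 0/1.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("one of the sites is monomorphic")
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium D
    return d * d / denom

complete = [(0, 0), (1, 1), (0, 0), (1, 1)]
independent = [(0, 0), (0, 1), (1, 0), (1, 1)]
print(r_squared(complete), r_squared(independent))  # 1.0 0.0
```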

  6. BAC-End Sequence-Based SNP Mining in Allotetraploid Cotton (Gossypium) Utilizing Resequencing Data, Phylogenetic Inferences, and Perspectives for Genetic Mapping.

    PubMed

    Hulse-Kemp, Amanda M; Ashrafi, Hamid; Stoffel, Kevin; Zheng, Xiuting; Saski, Christopher A; Scheffler, Brian E; Fang, David D; Chen, Z Jeffrey; Van Deynze, Allen; Stelly, David M

    2015-04-09

    A bacterial artificial chromosome library and BAC-end sequences for cultivated cotton (Gossypium hirsutum L.) have recently been developed. This report presents genome-wide single nucleotide polymorphism (SNP) mining using resequencing data, with BAC-end sequences as a reference, by alignment of 12 G. hirsutum L. lines, one G. barbadense L. line, and one G. longicalyx Hutch and Lee line. A total of 132,262 intraspecific SNPs were developed for G. hirsutum, whereas 223,138 and 470,631 interspecific SNPs were developed for G. barbadense and G. longicalyx, respectively. Using a set of interspecific SNPs comprising 11 randomly selected SNPs and 77 SNPs putatively associated with the homeologous chromosome pair 12 and 26, we mapped 77 SNPs into two linkage groups representing these chromosomes, spanning a total of 236.2 cM in an interspecific F2 population (G. barbadense 3-79 × G. hirsutum TM-1). The mapping results validated the approach for reliably producing large numbers of both intraspecific and interspecific SNPs aligned to BAC-ends. This will allow future construction of high-density integrated physical and genetic maps for cotton and other complex polyploid genomes. The methods developed will also allow future Gossypium resequencing data to be automatically genotyped for the identified SNPs along the BAC-end sequence reference, for anchoring sequence assemblies and for comparative studies. Copyright © 2015 Hulse-Kemp et al.

  7. A survey of genome-wide single nucleotide polymorphisms through genome resequencing in the Périgord black truffle (Tuber melanosporum Vittad.).

    PubMed

    Payen, Thibaut; Murat, Claude; Gigant, Anaïs; Morin, Emmanuelle; De Mita, Stéphane; Martin, Francis

    2015-09-01

    The Périgord black truffle (Tuber melanosporum Vittad.), considered a gastronomic delicacy worldwide, is an ectomycorrhizal filamentous fungus that is ecologically important in Mediterranean French, Italian and Spanish woodlands. In this study, we developed a novel resource of single nucleotide polymorphisms (SNPs) for T. melanosporum using Illumina high-throughput resequencing. The genomes of six T. melanosporum geographical accessions were sequenced to a depth of approximately 20×. These accessions were selected from different populations within the northern and southern regions of the species' geographical distribution. Approximately 80% of the reads for each of the six resequenced accessions mapped against the reference T. melanosporum genome assembly, giving an estimated core genome size of approximately 110 Mbp. A total of 442,326 SNPs, corresponding to 3540 SNPs/Mbp, were identified across the seven genomes. The SNPs occurred more frequently in repeated sequences (85%), although 4501 SNPs were also identified in the coding regions of 2587 genes. Using the ratio of nonsynonymous mutations per nonsynonymous site (pN) to synonymous mutations per synonymous site (pS) and Tajima's D index scanning the whole genome, we were able to identify genomic regions and genes potentially subjected to positive or purifying selection. The SNPs identified represent a valuable resource for future population genetics and genomics studies. © 2015 John Wiley & Sons Ltd.
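
    Tajima's D, one of the statistics used here to scan for selection, compares two estimates of diversity: the mean number of pairwise differences (π) and Watterson's estimate S/a₁ based on the number of segregating sites S. The sketch below implements the standard Tajima (1989) formula for a single alignment window; the toy alignments are hypothetical, and the window size, missing-data handling and pN/pS calculation used in the study are not reproduced.

```python
import math
from itertools import combinations

def tajimas_d(seqs):
    """Tajima's D for equal-length aligned sequences (requires n >= 4)."""
    n = len(seqs)
    if n < 4:
        raise ValueError("need at least 4 sequences")  # variance terms vanish for n < 4
    S = sum(1 for col in zip(*seqs) if len(set(col)) > 1)  # segregating sites
    if S == 0:
        raise ValueError("no segregating sites")
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1, e2 = c1 / a1, c2 / (a1 ** 2 + a2)
    return (pi - S / a1) / math.sqrt(e1 * S + e2 * S * (S - 1))

# Excess of rare variants pushes D below zero; intermediate-frequency variants push it above
print(round(tajimas_d(["AC", "AC", "AC", "TC"]), 3))
print(round(tajimas_d(["AC", "AC", "TC", "TC"]), 3))
```

    Negative D is consistent with purifying selection or population expansion and positive D with balancing selection or structure, so scans usually interpret it alongside other evidence such as pN/pS.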

  8. pyAmpli: an amplicon-based variant filter pipeline for targeted resequencing data.

    PubMed

    Beyens, Matthias; Boeckx, Nele; Van Camp, Guy; Op de Beeck, Ken; Vandeweyer, Geert

    2017-12-14

    Haloplex targeted resequencing is a popular method for analyzing both germline and somatic variants in gene panels. However, the wet-lab procedures involved may introduce false positives that need to be considered in subsequent data analysis. To our knowledge, no all-in-one package exists that implements a variant-filtering rationale addressing systematic errors related to amplicon enrichment. We present pyAmpli, a platform-independent, parallelized Python package that implements an amplicon-based germline and somatic variant filtering strategy for Haloplex data. pyAmpli can filter variants for systematic errors using user-defined criteria. We show that pyAmpli significantly increases specificity without reducing sensitivity, which is essential for reporting true-positive, clinically relevant mutations in gene panel data. pyAmpli is an easy-to-use software tool that increases the true-positive variant call rate in targeted resequencing data. It specifically reduces errors related to PCR-based enrichment of targeted regions.
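    pyAmpli's actual filtering criteria are not given in the abstract. Purely as a hypothetical illustration of the amplicon-based idea, the sketch below keeps a variant only if its alternate allele is seen in at least two independent amplicons, on the assumption that a PCR artifact usually arises within a single amplicon. The `AmpliconSupport` record, `passes_amplicon_filter` and both thresholds are invented for this example and are not pyAmpli's API.

```python
from dataclasses import dataclass


@dataclass
class AmpliconSupport:
    """Read counts for one variant within one amplicon (hypothetical record)."""
    amplicon_id: str
    ref_reads: int
    alt_reads: int


def passes_amplicon_filter(support, min_amplicons=2, min_vaf=0.1):
    """Hypothetical rule: pass if at least `min_amplicons` distinct amplicons
    each show an alternate-allele fraction of at least `min_vaf`."""
    supporting = set()
    for s in support:
        depth = s.ref_reads + s.alt_reads
        if depth and s.alt_reads / depth >= min_vaf:
            supporting.add(s.amplicon_id)
    return len(supporting) >= min_amplicons
```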

  9. Whole exome resequencing distinguishes cystic kidney diseases from phenocopies in renal ciliopathies

    PubMed Central

    Gee, Heon Yung; Otto, Edgar A.; Hurd, Toby W.; Ashraf, Shazia; Chaki, Moumita; Cluckey, Andrew; Vega-Warner, Virginia; Saisawat, Pawaree; Diaz, Katrina A.; Fang, Humphrey; Kohl, Stefan; Allen, Susan J.; Airik, Rannar; Zhou, Weibin; Ramaswami, Gokul; Janssen, Sabine; Fu, Clementine; Innis, Jamie L.; Weber, Stefanie; Vester, Udo; Davis, Erica E.; Katsanis, Nicholas; Fathy, Hanan M.; Jeck, Nikola; Klaus, Gunther; Nayir, Ahmet; Rahim, Khawla A.; Attrach, Ibrahim Al; Hassoun, Ibrahim Al; Ozturk, Savas; Drozdz, Dorota; Helmchen, Udo; O’Toole, John F.; Attanasio, Massimo; Nürnberg, Gudrun; Nürnberg, Peter; Washburn, Joseph; MacDonald, James; James, Jeffrey W.; Levy, Shawn; Hildebrandt, Friedhelm

    2013-01-01

    Rare single-gene disorders cause chronic disease. However, half of the 6,000 recessive single gene causes of disease are still unknown. Because recessive disease genes can illuminate, at least in part, disease pathomechanism, their identification offers direct opportunities for improved clinical management and potentially treatment. Rare diseases comprise the majority of chronic kidney disease (CKD) in children but are notoriously difficult to diagnose. Whole exome resequencing facilitates identification of recessive disease genes. However, its utility is impeded by the large number of genetic variants detected. We here overcome this limitation by combining homozygosity mapping with whole exome resequencing in 10 sib pairs with a nephronophthisis-related ciliopathy, which represents the most frequent genetic cause of CKD in the first three decades of life. In 7 of 10 sib-ships with a histologic or ultrasonographic diagnosis of nephronophthisis-related ciliopathy, we detect the causative gene. In six sib-ships, we identify mutations of known nephronophthisis-related ciliopathy genes, while in two additional sib-ships we found mutations in the known CKD-causing genes SLC4A1 and AGXT as phenocopies of nephronophthisis-related ciliopathy. Thus, whole exome resequencing establishes an efficient, non-invasive approach towards early detection and causation-based diagnosis of rare kidney diseases. This approach can be extended to other rare recessive disorders, thereby providing accurate diagnosis and facilitating the study of disease mechanisms. PMID:24257694

  10. Whole-exome resequencing distinguishes cystic kidney diseases from phenocopies in renal ciliopathies.

    PubMed

    Gee, Heon Yung; Otto, Edgar A; Hurd, Toby W; Ashraf, Shazia; Chaki, Moumita; Cluckey, Andrew; Vega-Warner, Virginia; Saisawat, Pawaree; Diaz, Katrina A; Fang, Humphrey; Kohl, Stefan; Allen, Susan J; Airik, Rannar; Zhou, Weibin; Ramaswami, Gokul; Janssen, Sabine; Fu, Clementine; Innis, Jamie L; Weber, Stefanie; Vester, Udo; Davis, Erica E; Katsanis, Nicholas; Fathy, Hanan M; Jeck, Nikola; Klaus, Gunther; Nayir, Ahmet; Rahim, Khawla A; Al Attrach, Ibrahim; Al Hassoun, Ibrahim; Ozturk, Savas; Drozdz, Dorota; Helmchen, Udo; O'Toole, John F; Attanasio, Massimo; Lewis, Richard A; Nürnberg, Gudrun; Nürnberg, Peter; Washburn, Joseph; MacDonald, James; Innis, Jeffrey W; Levy, Shawn; Hildebrandt, Friedhelm

    2014-04-01

    Rare single-gene disorders cause chronic disease. However, half of the 6000 recessive single gene causes of disease are still unknown. Because recessive disease genes can illuminate, at least in part, disease pathomechanism, their identification offers direct opportunities for improved clinical management and potentially treatment. Rare diseases comprise the majority of chronic kidney disease (CKD) in children but are notoriously difficult to diagnose. Whole-exome resequencing facilitates identification of recessive disease genes. However, its utility is impeded by the large number of genetic variants detected. We here overcome this limitation by combining homozygosity mapping with whole-exome resequencing in 10 sib pairs with a nephronophthisis-related ciliopathy, which represents the most frequent genetic cause of CKD in the first three decades of life. In 7 of 10 sibships with a histologic or ultrasonographic diagnosis of nephronophthisis-related ciliopathy, we detect the causative gene. In six sibships, we identify mutations of known nephronophthisis-related ciliopathy genes, while in two additional sibships we found mutations in the known CKD-causing genes SLC4A1 and AGXT as phenocopies of nephronophthisis-related ciliopathy. Thus, whole-exome resequencing establishes an efficient, noninvasive approach towards early detection and causation-based diagnosis of rare kidney diseases. This approach can be extended to other rare recessive disorders, thereby providing accurate diagnosis and facilitating the study of disease mechanisms.
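    The combination of homozygosity mapping and exome data described above can be sketched as a simple filter, assuming a recessive model in which affected siblings share a homozygous pathogenic variant. In this hypothetical example the shared homozygous regions are given as (chromosome, start, end) tuples, for instance from a separate homozygosity-mapping step, and genotypes use VCF-style strings; all names are illustrative, not the authors' tools.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    genotypes: tuple  # one VCF-style genotype per affected sibling, e.g. ("1/1", "1/1")


def in_regions(chrom, pos, regions):
    """True if (chrom, pos) lies inside any inclusive (chrom, start, end) interval."""
    return any(c == chrom and start <= pos <= end for c, start, end in regions)


def candidate_variants(variants, shared_homozygous_regions):
    """Keep variants that are homozygous-alternate in every affected sibling
    and fall inside a homozygous region shared by the sib pair."""
    return [
        v for v in variants
        if all(gt == "1/1" for gt in v.genotypes)
        and in_regions(v.chrom, v.pos, shared_homozygous_regions)
    ]
```

    Under these assumptions, compound-heterozygous causes would be missed, which is the usual trade-off of restricting the search to homozygous intervals.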

  11. A re-sequencing based assessment of genomic heterogeneity and fast neutron-induced deletions in a common bean cultivar

    USDA-ARS's Scientific Manuscript database

    A small fast neutron mutant population has been established from Phaseolus vulgaris cv. Red Hawk. We leveraged the available P. vulgaris genome sequence and high throughput next generation DNA sequencing to examine the genomic structure of five Phaseolus vulgaris cv. Red Hawk fast neutron mutants wi...

  12. PGen: large-scale genomic variations analysis workflow and browser in SoyKB.

    PubMed

    Liu, Yang; Khan, Saad M; Wang, Juexin; Rynge, Mats; Zhang, Yuanxun; Zeng, Shuai; Chen, Shiyuan; Maldonado Dos Santos, Joao V; Valliyodan, Babu; Calyam, Prasad P; Merchant, Nirav; Nguyen, Henry T; Xu, Dong; Joshi, Trupti

    2016-10-06

    With advances in next-generation sequencing (NGS) technology and significant reductions in sequencing costs, it is now possible to sequence large germplasm collections of crops, detect genome-scale genetic variation, and apply that knowledge to trait improvement. To facilitate large-scale analysis of genomic variation in NGS resequencing data, we have developed "PGen", an integrated and optimized workflow built on the Extreme Science and Engineering Discovery Environment (XSEDE) high-performance computing (HPC) virtual system, iPlant cloud data storage resources and the Pegasus workflow management system (Pegasus-WMS). The workflow allows users to identify single nucleotide polymorphisms (SNPs) and insertions/deletions (indels), annotate SNPs, and perform copy number variation analyses on multiple resequencing datasets in a user-friendly and seamless way. We have developed both a Linux version on GitHub (https://github.com/pegasus-isi/PGen-GenomicVariations-Workflow) and a web-based implementation of the PGen workflow integrated within the Soybean Knowledge Base (SoyKB) (http://soykb.org/Pegasus/index.php). Using PGen, we identified 10,218,140 SNPs and 1,398,982 indels from the analysis of 106 soybean lines sequenced at 15X coverage. The analysis also identified 297,245 non-synonymous SNPs and 3330 copy number variation (CNV) regions. SNPs identified with PGen from additional soybean resequencing projects, bringing the total to more than 500 soybean germplasm lines, have also been integrated. These SNPs are being used for trait improvement with genotype-to-phenotype prediction approaches developed in-house. To make NGS data easy to browse and access, we have also developed an NGS resequencing data browser (http://soykb.org/NGS_Resequence/NGS_index.php) within SoyKB that gives soybean researchers easy access to SNP and downstream analysis results.
The PGen workflow has been optimized, through thorough testing and validation, for the most efficient analysis of soybean data. This research serves as an example of best practices for developing genomics data analysis workflows that integrate remote HPC resources and efficient data management while remaining easy to use for biologists. The PGen workflow can also be easily customized to analyze data from other species.
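    PGen's own implementation is not reproduced here. As a minimal, hypothetical sketch of one step such a workflow performs, the function below counts SNPs and indels in the body of a VCF file by comparing REF and ALT allele lengths. It counts each ALT allele of a multi-allelic record separately and puts symbolic alleles such as `<DEL>` and multi-nucleotide substitutions into an `other` bin; these conventions are assumptions of the example.

```python
def classify_allele(ref, alt):
    """Classify one REF/ALT pair from a VCF record."""
    if alt.startswith("<") or alt in ("*", "."):
        return "other"  # symbolic, spanning-deletion, or missing allele
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(ref) != len(alt):
        return "indel"
    return "other"  # e.g. multi-nucleotide substitution (MNP)


def count_variant_types(vcf_lines):
    """Tally SNPs, indels and other alleles from VCF text lines."""
    counts = {"SNP": 0, "indel": 0, "other": 0}
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        ref, alts = fields[3], fields[4]  # VCF columns 4 and 5: REF and ALT
        for alt in alts.split(","):
            counts[classify_allele(ref, alt)] += 1
    return counts
```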

  13. Whole Genome Complete Resequencing of Bacillus subtilis Natto by Combining Long Reads with High-Quality Short Reads

    PubMed Central

    Kamada, Mayumi; Hase, Sumitaka; Sato, Kengo; Toyoda, Atsushi; Fujiyama, Asao; Sakakibara, Yasubumi

    2014-01-01

    De novo microbial genome sequencing reached a turning point with third-generation sequencing (TGS) platforms, and several microbial genomes have been improved with TGS long reads. Bacillus subtilis natto is closely related to the laboratory standard strain B. subtilis Marburg 168, and it is used to produce the traditional Japanese fermented food “natto.” The B. subtilis natto BEST195 genome was previously sequenced with short reads, but the assembly included some incomplete regions. We resequenced the BEST195 genome using a PacBio RS sequencer and obtained a complete genome sequence as a single gap-free scaffold; we also used Illumina MiSeq short reads to improve its quality. Comparing the new sequence with the previous BEST195 draft genome and the Marburg 168 genome, we found that the incomplete regions in the previous genome sequence could be attributed to GC bias and repetitive sequences, and we also identified some novel genes found only in the new genome. PMID:25329997
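    One way to look for the GC-bias effect mentioned above is to scan an assembly for windows of extreme GC content, which short-read libraries often cover poorly. The sketch below is a hypothetical illustration, not the authors' method; the window size, step and function name are arbitrary choices.

```python
def gc_windows(seq, window=1000, step=500):
    """Return (start, GC fraction) for sliding windows over a DNA sequence.

    Windows are reported only where a full window fits (or the whole
    sequence, if it is shorter than one window). Ambiguous bases such as
    N are excluded from the denominator.
    """
    seq = seq.upper()
    results = []
    last_start = max(len(seq) - window, 0)
    for start in range(0, last_start + 1, step):
        win = seq[start:start + window]
        acgt = sum(win.count(b) for b in "ACGT")
        gc = win.count("G") + win.count("C")
        results.append((start, gc / acgt if acgt else float("nan")))
    return results
```

    Windows far from the genome-wide mean could then be compared against gaps or low-coverage stretches in a short-read assembly.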

  14. Rapid genome resequencing of an atoxigenic strain of Aspergillus carbonarius

    DOE PAGES

    Cabañes, F. Javier; Sanseverino, Walter; Castellá, Gemma; ...

    2015-03-13

    In microorganisms, Ion Torrent sequencing technology has proved useful for whole-genome sequencing of bacterial genomes (5 Mbp). In our study, we used this technology for the first time to resequence a whole fungal genome (36 Mbp), that of a non-ochratoxin A-producing strain of Aspergillus carbonarius. Ochratoxin A (OTA) is a potent nephrotoxin found mainly in cereals and cereal products, but it also occurs in a variety of common foods and beverages. Because this strain does not produce OTA, we focused some of the bioinformatic analyses on genes involved in OTA biosynthesis, using as reference the genome of an OTA-producing strain of the same species. This study revealed a high accumulation of nonsense and missense mutations in several genes of the atoxigenic strain. Importantly, a twofold increase in the gene mutation ratio was observed in PKS- and NRPS-encoding genes, which are suggested to be involved in OTA biosynthesis.
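    Calling a coding change nonsense or missense comes down to translating the reference and mutant codons. The sketch below does this with the standard genetic code; it illustrates the classification only, is not the pipeline used in the study, and assumes each codon has already been extracted in coding-strand orientation.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_codon_change(ref_codon, alt_codon):
    """Classify a single-codon change as synonymous, missense, nonsense or stop-loss."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"  # premature stop codon
    if ref_aa == "*":
        return "stop-loss"
    return "missense"
```

    For example, `classify_codon_change("TGG", "TGA")` returns `"nonsense"` (Trp to stop).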

  15. Whole genome resequencing of the human parasite Schistosoma mansoni reveals population history and effects of selection.

    PubMed

    Crellen, Thomas; Allan, Fiona; David, Sophia; Durrant, Caroline; Huckvale, Thomas; Holroyd, Nancy; Emery, Aidan M; Rollinson, David; Aanensen, David M; Berriman, Matthew; Webster, Joanne P; Cotton, James A

    2016-02-16

    Schistosoma mansoni is a parasitic fluke that infects millions of people in the developing world. This study presents the first application of population genomics to S. mansoni, based on high-coverage resequencing data from 10 global isolates and an isolate of the closely related Schistosoma rodhaini, which infects rodents. Using population genetic tests, we document genes under directional and balancing selection in S. mansoni that may facilitate adaptation to the human host. Coalescent modeling dates the speciation of S. mansoni and S. rodhaini to 107.5-147.6 KYA, a period that overlaps with the earliest archaeological evidence for fishing in Africa. Our results indicate that S. mansoni originated in East Africa and experienced a decline in effective population size 20-90 KYA, before dispersing across the continent during the Holocene. In addition, we find strong evidence that S. mansoni migrated to the New World with the 16th-19th century Atlantic slave trade.

  16. Rapid genome resequencing of an atoxigenic strain of Aspergillus carbonarius

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Cabañes, F. Javier; Sanseverino, Walter; Castellá, Gemma

    In microorganisms, Ion Torrent sequencing technology has proved useful for whole-genome sequencing of bacterial genomes (5 Mbp). In our study, we used this technology for the first time to resequence a whole fungal genome (36 Mbp), that of a non-ochratoxin A-producing strain of Aspergillus carbonarius. Ochratoxin A (OTA) is a potent nephrotoxin found mainly in cereals and cereal products, but it also occurs in a variety of common foods and beverages. Because this strain does not produce OTA, we focused some of the bioinformatic analyses on genes involved in OTA biosynthesis, using as reference the genome of an OTA-producing strain of the same species. This study revealed a high accumulation of nonsense and missense mutations in several genes of the atoxigenic strain. Importantly, a twofold increase in the gene mutation ratio was observed in PKS- and NRPS-encoding genes, which are suggested to be involved in OTA biosynthesis.

  17. Quantitative genome re-sequencing defines multiple mutations conferring chloroquine resistance in rodent malaria

    PubMed Central

    2012-01-01

    Background Drug resistance in the malaria parasite Plasmodium falciparum severely compromises the treatment and control of malaria. Knowledge of the critical mutations conferring resistance to particular drugs is important for understanding modes of drug action and mechanisms of resistance, and is required to design better therapies and limit drug resistance. A mutation in the gene (pfcrt) encoding a membrane transporter has been identified as a principal determinant of chloroquine resistance in P. falciparum, but we lack a full account of higher-level chloroquine resistance. Furthermore, the determinants of resistance in the other major human malaria parasite, P. vivax, are not known. To address these questions, we investigated the genetic basis of chloroquine resistance in an isogenic lineage of the rodent malaria parasite P. chabaudi in which high-level resistance to chloroquine has been progressively selected under laboratory conditions. Results Loci containing the critical genes were mapped by Linkage Group Selection, using a genetic cross between the high-level chloroquine-resistant mutant and a genetically distinct sensitive strain. A novel high-resolution quantitative whole-genome re-sequencing approach revealed three regions of selection, on chr11, chr03 and chr02, that appear progressively at increasing drug doses. Whole-genome sequencing of the chloroquine-resistant parent identified just four point mutations in different genes on these chromosomes. Three mutations are located at the foci of the selection valleys and are therefore predicted to confer different levels of chloroquine resistance. The critical mutation conferring the first level of chloroquine resistance is found in aat1, a putative amino acid transporter. Conclusions Quantitative trait loci conferring selectable phenotypes, such as drug resistance, can be mapped directly using progressive genome-wide linkage group selection.
Quantitative genome-wide short-read genome resequencing can be used to reveal these signatures of drug selection at high resolution. The identities of three genes (and mutations within them) conferring different levels of chloroquine resistance generate insights regarding the genetic architecture and mechanisms of resistance to chloroquine and other drugs. Importantly, their orthologues may now be evaluated for critical or accessory roles in chloroquine resistance in human malarias P. vivax and P. falciparum. PMID:22435897
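    In Linkage Group Selection, drug treatment of the progeny of a resistant × sensitive cross depletes sensitive-parent alleles near the loci that confer resistance, which shows up as a "valley" in allele frequency. Assuming per-marker read counts for the two parental alleles in the selected progeny pool, a minimal sketch of finding such valleys is shown below; the window size, threshold and names are hypothetical, and the published quantitative approach is likely more elaborate.

```python
def allele_freq(sensitive_reads, resistant_reads):
    """Fraction of reads carrying the sensitive parent's allele (None if no reads)."""
    total = sensitive_reads + resistant_reads
    return sensitive_reads / total if total else None


def selection_valleys(markers, window=50_000, threshold=0.2):
    """Find windows depleted of the sensitive parent's alleles.

    markers: list of (chrom, pos, sensitive_reads, resistant_reads).
    Returns sorted (chrom, window_start, mean_freq) for windows whose mean
    sensitive-allele frequency falls below `threshold`.
    """
    bins = {}
    for chrom, pos, s, r in markers:
        f = allele_freq(s, r)
        if f is None:
            continue  # skip markers without coverage
        bins.setdefault((chrom, pos // window * window), []).append(f)
    return sorted(
        (chrom, start, sum(fs) / len(fs))
        for (chrom, start), fs in bins.items()
        if sum(fs) / len(fs) < threshold
    )
```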

  18. High-Throughput SNP Discovery through Deep Resequencing of a Reduced Representation Library to Anchor and Orient Scaffolds in the Soybean Whole Genome Sequence

    USDA-ARS's Scientific Manuscript database

    The soybean Consensus Map 4.0 facilitated the anchoring of 95.6% of the soybean whole genome sequence developed by the Joint Genome Institute, Department of Energy but only properly oriented 66% of the sequence scaffolds. To find additional single nucleotide polymorphism (SNP) markers for additiona...

  19. Nucleotide diversity and linkage disequilibrium in wild avocado (Persea americana Mill.).

    PubMed

    Chen, Haofeng; Morrell, Peter L; de la Cruz, Marlene; Clegg, Michael T

    2008-01-01

    Resequencing studies provide the ultimate resolution of genetic diversity because they identify all mutations in a gene that are present within the sampled individuals. We report a resequencing study of Persea americana, a subtropical tree species native to Meso- and Central America and the progenitor of cultivated avocado. The sample includes 21 wild accessions from Mexico, Costa Rica, Ecuador, and the Dominican Republic. Estimates of nucleotide polymorphism and linkage disequilibrium (LD) are obtained from fully resolved haplotype data from 4 nuclear loci spanning 5960 nucleotide sites. Results show that, although avocado is a subtropical tree crop and a predominantly outcrossing plant, its overall level of genetic variation is not exceptionally high (nucleotide diversity at silent sites, π_sil = 0.0102) compared with available estimates from temperate plant species. Intralocus LD decays rapidly to half its initial value within about 1 kb. Recombination rate estimates based on the sequence data show that the rate is not exceptionally high compared with annual plants such as wild barley or maize. Interlocus LD is significant owing to substantial population structure induced by the mixing of the 3 botanical races of avocado.
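    Both statistics in this abstract can be sketched directly from phased haplotypes. The example below computes nucleotide diversity (π, the mean number of pairwise differences per site) and the LD measure r² between two biallelic sites. It is a simplified illustration: it uses all supplied sites rather than silent sites only, assumes no missing data, and is not the authors' software.

```python
from itertools import combinations


def nucleotide_diversity(haplotypes):
    """Mean pairwise differences per site (pi) for aligned haplotypes."""
    n_sites = len(haplotypes[0])
    pairs = list(combinations(haplotypes, 2))
    diffs = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return diffs / len(pairs) / n_sites


def r_squared(haplotypes, i, j):
    """r^2 between biallelic sites i and j (0-based)."""
    n = len(haplotypes)
    # Track the allele carried by the first haplotype at each site.
    a, b = haplotypes[0][i], haplotypes[0][j]
    p_a = sum(h[i] == a for h in haplotypes) / n
    p_b = sum(h[j] == b for h in haplotypes) / n
    p_ab = sum(h[i] == a and h[j] == b for h in haplotypes) / n
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return d * d / denom if denom else float("nan")  # undefined if a site is monomorphic
```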

  20. High-throughput discovery of rare human nucleotide polymorphisms by Ecotilling

    PubMed Central

    Till, Bradley J.; Zerr, Troy; Bowers, Elisabeth; Greene, Elizabeth A.; Comai, Luca; Henikoff, Steven

    2006-01-01

    Human individuals differ from one another at only ∼0.1% of nucleotide positions, but these single nucleotide differences account for most heritable phenotypic variation. Large-scale efforts to discover and genotype human variation have been limited to common polymorphisms. However, these efforts overlook rare nucleotide changes that may contribute to phenotypic diversity and genetic disorders, including cancer. Thus, there is an increasing need for high-throughput methods to robustly detect rare nucleotide differences. Toward this end, we have adapted the mismatch discovery method known as Ecotilling for the discovery of human single nucleotide polymorphisms. To increase throughput and reduce costs, we developed a universal primer strategy and implemented algorithms for automated band detection. Ecotilling was validated by screening 90 human DNA samples for nucleotide changes in 5 gene targets and by comparing results to public resequencing data. To increase throughput for discovery of rare alleles, we pooled samples 8-fold and found Ecotilling to be efficient relative to resequencing, with a false negative rate of 5% and a false discovery rate of 4%. We identified 28 new rare alleles, including some that are predicted to damage protein function. The detection of rare damaging mutations has implications for models of human disease. PMID:16893952
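    The false negative and false discovery rates quoted above follow the usual definitions, FNR = FN / (TP + FN) and FDR = FP / (TP + FP), where true positives are variants found by both methods. The sketch below computes them by comparing a set of calls against a reference set (standing in here for the public resequencing data); the function name and the variant IDs in the usage line are made up.

```python
def compare_calls(called, truth):
    """Compare variant calls against a reference set and report error rates."""
    called, truth = set(called), set(truth)
    tp = len(called & truth)  # found by both
    fp = len(called - truth)  # called but absent from the reference
    fn = len(truth - called)  # in the reference but missed
    fnr = fn / (tp + fn) if tp + fn else 0.0
    fdr = fp / (tp + fp) if tp + fp else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "fnr": fnr, "fdr": fdr}
```

    For instance, `compare_calls({"v1", "v2", "v3", "x1"}, {"v1", "v2", "v3", "v4"})` reports an FNR and an FDR of 0.25 each.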

  1. Whole-Genome Resequencing of Experimental Populations Reveals Polygenic Basis of Egg-Size Variation in Drosophila melanogaster

    PubMed Central

    Jha, Aashish R.; Miles, Cecelia M.; Lippert, Nodia R.; Brown, Christopher D.; White, Kevin P.; Kreitman, Martin

    2015-01-01

    Complete genome resequencing of populations holds great promise in deconstructing complex polygenic traits to elucidate molecular and developmental mechanisms of adaptation. Egg size is a classic adaptive trait in insects, birds, and other taxa, but its highly polygenic architecture has prevented high-resolution genetic analysis. We used replicated experimental evolution in Drosophila melanogaster and whole-genome sequencing to identify consistent signatures of polygenic egg-size adaptation. A generalized linear-mixed model revealed reproducible allele frequency differences between replicated experimental populations selected for large and small egg volumes at approximately 4,000 single nucleotide polymorphisms (SNPs). Several hundred distinct genomic regions contain clusters of these SNPs and have lower heterozygosity than the genomic background, consistent with selection acting on polymorphisms in these regions. These SNPs are also enriched among genes expressed in Drosophila ovaries and many of these genes have well-defined functions in Drosophila oogenesis. Additional genes regulating egg development, growth, and cell size show evidence of directional selection as genes regulating these biological processes are enriched for highly differentiated SNPs. Genetic crosses performed with a subset of candidate genes demonstrated that these genes influence egg size, at least in the large genetic background. These findings confirm the highly polygenic architecture of this adaptive trait, and suggest the involvement of many novel candidate genes in regulating egg size. PMID:26044351
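    The abstract reports regions with lower heterozygosity than the genomic background. Assuming per-SNP allele frequencies estimated from pooled population sequencing, one simple way to express this is expected heterozygosity, H = 2p(1 - p), averaged over windows and compared with the genome-wide mean. This is a hypothetical sketch (window size and cut-off are arbitrary), not the generalized linear-mixed model the authors used.

```python
def expected_het(p):
    """Expected heterozygosity of a biallelic SNP with allele frequency p."""
    return 2 * p * (1 - p)


def low_het_windows(snps, window=10_000, fraction=0.5):
    """Return window starts whose mean H is below `fraction` x the genome-wide mean.

    snps: list of (position, allele_frequency) on one chromosome.
    """
    bins = {}
    for pos, p in snps:
        bins.setdefault(pos // window * window, []).append(expected_het(p))
    all_h = [h for hs in bins.values() for h in hs]
    genome_mean = sum(all_h) / len(all_h)
    return sorted(
        start for start, hs in bins.items()
        if sum(hs) / len(hs) < fraction * genome_mean
    )
```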

  2. High-throughput multiplex cpDNA resequencing clarifies the genetic diversity and genetic relationships among Brassica napus, Brassica rapa and Brassica oleracea.

    PubMed

    Qiao, Jiangwei; Cai, Mengxian; Yan, Guixin; Wang, Nian; Li, Feng; Chen, Binyun; Gao, Guizhen; Xu, Kun; Li, Jun; Wu, Xiaoming

    2016-01-01

    Brassica napus (rapeseed) is a recent allotetraploid plant and the second most important oilseed crop worldwide. The origin of B. napus and its genetic relationships with its diploid ancestor species remain largely unresolved. Here, chloroplast DNA (cpDNA) from 488 B. napus accessions of global origin, 139 B. rapa accessions and 49 B. oleracea accessions was resequenced at the population level using Illumina Solexa sequencing technologies. The intraspecific cpDNA variants and their allele frequencies were called genome-wide and further validated by EcoTILLING analyses of the rpo region. The cpDNA of the current global B. napus population comprises more than 400 variants (SNPs and short InDels) and maintains one predominant haplotype (Bncp1). Whole-genome resequencing of the cpDNA of the Bncp1 haplotype ruled out its direct inheritance from any accession of B. rapa or B. oleracea. The distribution of polymorphism information content (PIC) values for each variant demonstrated that B. napus has much lower cpDNA diversity than B. rapa; however, the vast majority of wild and cultivated B. oleracea specimens appeared to share a single distinct cpDNA haplotype, in contrast to its wild C-genome relatives. This finding suggests that the cpDNA of the three Brassica species is well differentiated. The predominant B. napus cpDNA haplotype may have originated from uninvestigated relatives or from interactions between cpDNA mutations and natural/artificial selection during speciation and evolution. These comprehensive data on cpDNA variation provide a foundation for research on cpDNA and chloroplasts. © 2015 Society for Experimental Biology, Association of Applied Biologists and John Wiley & Sons Ltd.
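    The abstract does not state which PIC formula was used. A common choice is that of Botstein et al. (1980), PIC = 1 - Σ p_i² - Σ_{i<j} 2 p_i² p_j², where p_i are the allele frequencies at a variant; the sketch below assumes that definition (some studies instead report 1 - Σ p_i²). The function name is illustrative.

```python
from itertools import combinations


def pic(freqs):
    """Polymorphism information content (Botstein et al. 1980) from allele frequencies."""
    total = sum(freqs)
    p = [f / total for f in freqs]  # normalize so frequencies sum to 1
    homozygosity = sum(x * x for x in p)
    correction = sum(2 * (a * a) * (b * b) for a, b in combinations(p, 2))
    return 1 - homozygosity - correction
```

    A biallelic variant reaches its maximum PIC of 0.375 at equal allele frequencies, and a monomorphic site scores 0, which is why low PIC values across variants indicate low diversity.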

  3. Whole mitochondrial genome screening in maternally inherited non-syndromic hearing impairment using a microarray resequencing mitochondrial DNA chip.

    PubMed

    Lévêque, Marianne; Marlin, Sandrine; Jonard, Laurence; Procaccio, Vincent; Reynier, Pascal; Amati-Bonneau, Patrizia; Baulande, Sylvain; Pierron, Denis; Lacombe, Didier; Duriez, Françoise; Francannet, Christine; Mom, Thierry; Journel, Hubert; Catros, Hélène; Drouin-Garraud, Valérie; Obstoy, Marie-Françoise; Dollfus, Hélène; Eliot, Marie-Madeleine; Faivre, Laurence; Duvillard, Christian; Couderc, Remy; Garabedian, Eréa-Noël; Petit, Christine; Feldmann, Delphine; Denoyelle, Françoise

    2007-11-01

    Mitochondrial DNA (mtDNA) mutations have been implicated in non-syndromic hearing loss either as primary or as predisposing factors. Because only part of the mitochondrial genome is usually explored in deafness, their prevalence is probably underestimated. Among 1350 families with non-syndromic sensorineural hearing loss collected through a French collaborative network, we selected 29 large families with a clear maternal lineage and screened them for known mtDNA mutations in the 12S rRNA, tRNASer(UCN) and tRNALeu(UUR) genes. When no mutation could be identified, whole mitochondrial genome screening was performed using a microarray resequencing chip, the MitoChip version 2.0 developed by Affymetrix Inc. Known mtDNA mutations were found in nine of the 29 families, which are described in the article: five with A1555G, two with T7511C, one with 7472insC and one with the A3243G mutation. In the remaining 20 families, the resequencing MitoChip detected 258 mitochondrial homoplasmic variants and 107 potentially heteroplasmic variants. Controls by direct sequencing of selected fragments showed a high sensitivity of the MitoChip but a low specificity, especially for heteroplasmic variants. An original analysis based on species conservation, frequency and phylogenetic investigation was performed to select the most probably pathogenic variants. The whole-genome analysis allowed us to identify five additional families with a putatively pathogenic mitochondrial variant: T669C, C1537T, G8078A, G12236A and G15077A. These results indicate that the new MitoChip platform is a rapid and valuable tool for identifying new mtDNA mutations in deafness.

  4. Evaluating information content of SNPs for sample-tagging in re-sequencing projects.

    PubMed

    Hu, Hao; Liu, Xiang; Jin, Wenfei; Hilger Ropers, H; Wienker, Thomas F

    2015-05-15

    Sample-tagging is designed to identify accidental sample mix-ups, a major issue in re-sequencing studies. In this work, we develop a model to measure the information content of SNPs, so that we can optimize a panel of SNPs that approaches the maximal information for discrimination. The analysis shows that as few as 60 optimized SNPs can differentiate the individuals in a population as large as the present world population, and that in practice only 30 optimized SNPs are sufficient to label up to 100 thousand individuals. In simulated populations of 100 thousand individuals, the average Hamming distances generated by the optimized set of 30 SNPs are larger than 18, and the duality frequency is lower than 1 in 10 thousand. This sample-discrimination strategy proved robust for large sample sizes and different datasets. The optimized SNP sets are designed for whole-exome sequencing, and a program is provided for SNP selection that allows customized SNP numbers and genes of interest. A sample-tagging plan based on this framework will improve the reliability and cost-effectiveness of re-sequencing projects.
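    The paper's information model is not spelled out in the abstract. As one plausible, hypothetical way to reason about it, the sketch below scores a SNP by the Shannon entropy of its genotype distribution under Hardy-Weinberg equilibrium (maximal at allele frequency 0.5) and checks a panel by the smallest pairwise Hamming distance between sample genotype profiles. The function names and the 0/1/2 genotype encoding are assumptions of this example.

```python
import math
from itertools import combinations


def genotype_entropy(p):
    """Shannon entropy (bits) of SNP genotypes under Hardy-Weinberg equilibrium.

    p is the alternate-allele frequency; genotypes are hom-ref, het, hom-alt.
    """
    probs = [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]
    return -sum(q * math.log2(q) for q in probs if q > 0)


def hamming(a, b):
    """Number of SNPs at which two genotype profiles differ."""
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(profiles):
    """Smallest Hamming distance among all pairs of sample profiles.

    profiles: dict of sample ID -> tuple of genotypes (0, 1 or 2 alt alleles).
    A distance of 0 means two samples cannot be told apart by the panel.
    """
    return min(hamming(a, b) for a, b in combinations(profiles.values(), 2))
```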

  5. Reduced representation approaches to interrogate genome diversity in large repetitive plant genomes.

    PubMed

    Hirsch, Cory D; Evans, Joseph; Buell, C Robin; Hirsch, Candice N

    2014-07-01

    Technology and software improvements in the last decade now provide methods to access the genome sequence of not only a single accession but multiple accessions of a plant species. This provides a means to interrogate species diversity at the genome level. Ample diversity can be found among accessions in a species collection, including single-nucleotide polymorphisms, insertions and deletions, copy number variation and presence/absence variation. For species with small genomes that are not rich in repeats, re-sequencing of query accessions is robust, highly informative, and economically feasible. However, for species with moderate to large, repeat-rich genomes, technical and economic barriers prevent en masse genome re-sequencing of accessions. Multiple approaches to access a focused subset of loci in species with larger genomes have been developed, including reduced representation sequencing, exome capture and transcriptome sequencing. Collectively, these approaches have enabled genome-scale interrogation of diversity in large plant genomes, including crop species important to worldwide food security. © The Author 2014. Published by Oxford University Press. All rights reserved. For permissions, please email: journals.permissions@oup.com.
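    One family of reduced representation methods samples the genome at restriction sites and keeps fragments in a selected size range. The hypothetical sketch below performs an in silico digest with a single enzyme and then applies a size window; EcoRI (G^AATTC, cutting after the first base) is used only as an example, and the size range is arbitrary. Exome capture and transcriptome sequencing, the other approaches named above, are not illustrated.

```python
import re


def digest(seq, site, cut_offset):
    """Fragment lengths after cutting at every occurrence of a recognition site.

    cut_offset is the cut position within the site on the given strand
    (1 for EcoRI, G^AATTC). Overlapping sites are found via a lookahead.
    """
    pattern = f"(?={re.escape(site.upper())})"
    cuts = [m.start() + cut_offset for m in re.finditer(pattern, seq.upper())]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:]) if end > start]


def size_select(fragments, min_len=300, max_len=600):
    """Keep fragments within a (hypothetical) size-selection window."""
    return [f for f in fragments if min_len <= f <= max_len]
```

    Only the size-selected fraction would be sequenced, which is how these methods concentrate reads on a reproducible subset of loci in a large genome.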

  6. Resequencing Pathogen Microarray (RPM) for prospective detection and identification of emergent pathogen strains and variants

    NASA Astrophysics Data System (ADS)

    Tibbetts, Clark; Lichanska, Agnieszka M.; Borsuk, Lisa A.; Weslowski, Brian; Morris, Leah M.; Lorence, Matthew C.; Schafer, Klaus O.; Campos, Joseph; Sene, Mohamadou; Myers, Christopher A.; Faix, Dennis; Blair, Patrick J.; Brown, Jason; Metzgar, David

    2010-04-01

    High-density resequencing microarrays support simultaneous detection and identification of multiple viral and bacterial pathogens. Because detection and identification using RPM are based on multiple specimen-specific target pathogen gene sequences generated in the individual test, the test results enable both a differential diagnostic analysis and epidemiological tracking of detected pathogen strains and variants from one specimen to the next. The RPM assay enables detection and identification of pathogen sequences that share as little as 80% sequence similarity with the prototype target gene sequences represented as detector tiles on the array. This capability enables the RPM to detect and identify previously unknown strains and variants of a detected pathogen, as in sentinel cases associated with an infectious disease outbreak. We illustrate this capability using assay results from testing influenza A virus vaccines configured with strains that were first defined years after the design of the RPM microarray. Results are also presented from RPM-Flu testing of three specimens independently confirmed to be positive for the 2009 Novel H1N1 outbreak strain of influenza virus.
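    RPM calls bases by hybridization to tiled probes, so the sketch below is not how the assay works; it only illustrates what an 80% sequence-similarity threshold means, by computing percent identity between two sequences assumed to be already aligned to equal length. The function name and the gap handling (a gap opposite a base counts as a mismatch) are assumptions of the example.

```python
def percent_identity(a, b):
    """Percent identity of two aligned, equal-length sequences.

    Columns where both sequences have a gap ('-') are ignored; a gap
    opposite a base counts as a mismatch.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    cols = [(x, y) for x, y in zip(a.upper(), b.upper()) if not (x == "-" and y == "-")]
    if not cols:
        return 0.0
    matches = sum(x == y and x != "-" for x, y in cols)
    return 100 * matches / len(cols)
```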

  7. High performance interconnection between high data rate networks

    NASA Technical Reports Server (NTRS)

    Foudriat, E. C.; Maly, K.; Overstreet, C. M.; Zhang, L.; Sun, W.

    1992-01-01

    The bridge/gateway system needed to interconnect a wide range of computer networks and to support a wide range of user quality-of-service requirements is discussed. The bridge/gateway must handle a wide range of message types, including synchronous and asynchronous traffic, large bursty messages, short self-contained messages, time-critical messages, etc. It is shown that messages can be classified into three basic classes: synchronous messages, and large and small asynchronous messages. The first two require call setup so that packet identification, buffer handling, etc. can be supported in the bridge/gateway. Identification enables resequencing across changes in packet size. The third class is for messages that do not require call setup. Resequencing hardware designed to handle two types of resequencing problems is presented. The first is for a virtual parallel circuit, which can scramble channel bytes. The second system is effective in handling both synchronous and asynchronous traffic between networks with highly differing packet sizes and data rates. The two other major needs of the bridge/gateway are congestion control and error control. A dynamic, lossless congestion control scheme that can easily support effective error correction is presented. Results indicate that the congestion control scheme provides close to optimal capacity under congested conditions. Under conditions where errors may develop because intervening networks are not lossless, intermediate error recovery and correction takes 1/3 less time than equivalent end-to-end error correction under similar conditions.
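    The paper presents resequencing in hardware. As a software analogue only (hypothetical, not the authors' design), the sketch below shows the core of a resequencing buffer: packets that arrive ahead of the next expected sequence number are held and released in order once the gap is filled. Sequence-number wraparound and buffer size limits are deliberately ignored.

```python
class Resequencer:
    """Release packets in sequence-number order, buffering early arrivals."""

    def __init__(self, first_seq=0):
        self.next_seq = first_seq  # next sequence number to deliver
        self.pending = {}          # out-of-order packets waiting for the gap to fill

    def receive(self, seq, payload):
        """Accept one packet; return the payloads now deliverable in order."""
        if seq < self.next_seq or seq in self.pending:
            return []  # duplicate or already delivered
        self.pending[seq] = payload
        delivered = []
        while self.next_seq in self.pending:
            delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return delivered
```

    For example, after packets 1 and 2 arrive first, receiving packet 0 releases all three payloads in order.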

  8. High-resolution genetic mapping of allelic variants associated with cell wall chemistry in Populus.

    PubMed

    Muchero, Wellington; Guo, Jianjun; DiFazio, Stephen P; Chen, Jin-Gui; Ranjan, Priya; Slavov, Gancho T; Gunter, Lee E; Jawdy, Sara; Bryan, Anthony C; Sykes, Robert; Ziebell, Angela; Klápště, Jaroslav; Porth, Ilga; Skyba, Oleksandr; Unda, Faride; El-Kassaby, Yousry A; Douglas, Carl J; Mansfield, Shawn D; Martin, Joel; Schackwitz, Wendy; Evans, Luke M; Czarnecki, Olaf; Tuskan, Gerald A

    2015-01-23

    QTL cloning for the discovery of genes underlying polygenic traits has historically been cumbersome in long-lived perennial plants like Populus. Linkage disequilibrium-based association mapping has been proposed as a cloning tool, and recent advances in high-throughput genotyping and whole-genome resequencing enable marker saturation to levels sufficient for association mapping with no a priori candidate gene selection. Here, multiyear and multienvironment evaluation of cell wall phenotypes was conducted in an interspecific P. trichocarpa × P. deltoides pseudo-backcross mapping pedigree and two partially overlapping populations of unrelated P. trichocarpa genotypes using pyrolysis molecular beam mass spectrometry, saccharification, and/or traditional wet chemistry. QTL mapping was conducted using a high-density genetic map with 3,568 SNP markers. As a fine-mapping approach, chromosome-wide association mapping targeting a QTL hot spot on linkage group XIV was performed in the two P. trichocarpa populations. Both populations were genotyped using the 34K Populus Infinium SNP array, and whole-genome resequencing of one of the populations facilitated marker saturation of candidate intervals for gene identification. Five QTLs ranging in size from 0.6 to 1.8 Mb were mapped on linkage group XIV for lignin content, syringyl to guaiacyl (S/G) ratio, and 5- and 6-carbon sugars using the mapping pedigree. Six candidate loci exhibiting significant associations with phenotypes were identified within QTL intervals. These associations were reproducible across multiple environments, two independent genotyping platforms, and different plant growth stages. cDNA sequencing of allelic variants at three of the six loci identified polymorphisms leading to a variable-length polyglutamine (PolyQ) stretch in a transcription factor annotated as an ANGUSTIFOLIA C-terminus Binding Protein (CtBP), and premature stop codons in a KANADI transcription factor as well as in a protein kinase. Results from protoplast transient expression assays suggested that each of the polymorphisms conferred allelic differences in the activation of cellulose, hemicellulose, and lignin pathway marker genes. This study illustrates the utility of complementary QTL and association mapping as tools for gene discovery with no a priori candidate gene selection. This proof of concept in a perennial organism opens up opportunities for the discovery of novel genetic determinants of economically important but complex traits in plants.

  9. Whole Genome Re-Sequencing Identifies a Quantitative Trait Locus Repressing Carbon Reserve Accumulation during Optimal Growth in Chlamydomonas reinhardtii

    PubMed Central

    Goold, Hugh Douglas; Nguyen, Hoa Mai; Kong, Fantao; Beyly-Adriano, Audrey; Légeret, Bertrand; Billon, Emmanuelle; Cuiné, Stéphan; Beisson, Fred; Peltier, Gilles; Li-Beisson, Yonghua

    2016-01-01

    Microalgae have emerged as a promising source for biofuel production. Massive oil and starch accumulation in microalgae is possible, but occurs mostly when biomass growth is impaired. The molecular networks underlying the negative correlation between growth and reserve formation are not known. Thus, isolating strains capable of accumulating carbon reserves during optimal growth would be highly desirable. To this end, we screened an insertional mutant library of Chlamydomonas reinhardtii for alterations in oil content. A mutant accumulating five times more oil and twice as much starch as the wild type during optimal growth was isolated and named constitutive oil accumulator 1 (coa1). Growth in photobioreactors under highly controlled conditions revealed that the increase in oil and starch content in coa1 was dependent on light intensity. Genetic analysis and DNA hybridization pointed to a single insertional event responsible for the phenotype. Whole-genome re-sequencing identified in coa1 a >200 kb deletion on chromosome 14 containing 41 genes. This study demonstrates that (1) the generation of algal strains accumulating higher reserve amounts without compromising biomass accumulation is feasible; (2) light is an important parameter in phenotypic analysis; and (3) a chromosomal region (quantitative trait locus) acts as a suppressor of carbon reserve accumulation during optimal growth. PMID:27141848

  10. Whole-genome resequencing of Bacillus cereus and expression of genes functioning in sodium chloride stress.

    PubMed

    Xu, Zhenbo; Xie, Jinhong; Liu, Junyan; Ji, Lili; Soteyome, Thanapop; Peters, Brian M; Chen, Dingqiang; Li, Bing; Li, Lin; Shirtliff, Mark E

    2017-03-01

    Bacillus cereus is one of the most common opportunistic pathogens responsible for various foodborne diseases. To investigate the regulatory mechanism of B. cereus under high osmotic pressure, two B. cereus strains, B25 and B26, were isolated from industrial soy sauce residue with a high salt concentration. Resequencing was performed on the Illumina/Solexa platform, and 13,646 SNPs and 434 InDels were identified as variants common to B25 and B26 relative to the reference genome, followed by COG, GO, and KEGG enrichment analysis. Furthermore, 49 key genes involved in Na+/H+ and K+ transport, dipeptide or tripeptide transport, and stress response were selected and classified into 27 groups. Further validation was performed by qRT-PCR, and 4 candidate genes were found to be most associated with the osmotic response. Expression of the 4 candidate genes was then analyzed, and downregulation was observed for genes BC0669 and BC0754, which are associated with the K+ transport system. In contrast, dramatic upregulation was detected for gene BC2114, which is associated with glutathione peroxidase, indicating activation of antioxidant responses by osmotic stress via genetic regulation. In conclusion, the bioinformatic analysis and gene expression profiles provide a basis for further investigation of the genetic and regulatory mechanisms of bacterial salt tolerance. Copyright © 2017 Elsevier Ltd. All rights reserved.

  11. Whole-Genome Resequencing of Experimental Populations Reveals Polygenic Basis of Egg-Size Variation in Drosophila melanogaster.

    PubMed

    Jha, Aashish R; Miles, Cecelia M; Lippert, Nodia R; Brown, Christopher D; White, Kevin P; Kreitman, Martin

    2015-10-01

    Complete genome resequencing of populations holds great promise in deconstructing complex polygenic traits to elucidate molecular and developmental mechanisms of adaptation. Egg size is a classic adaptive trait in insects, birds, and other taxa, but its highly polygenic architecture has prevented high-resolution genetic analysis. We used replicated experimental evolution in Drosophila melanogaster and whole-genome sequencing to identify consistent signatures of polygenic egg-size adaptation. A generalized linear-mixed model revealed reproducible allele frequency differences between replicated experimental populations selected for large and small egg volumes at approximately 4,000 single nucleotide polymorphisms (SNPs). Several hundred distinct genomic regions contain clusters of these SNPs and have lower heterozygosity than the genomic background, consistent with selection acting on polymorphisms in these regions. These SNPs are also enriched among genes expressed in Drosophila ovaries and many of these genes have well-defined functions in Drosophila oogenesis. Additional genes regulating egg development, growth, and cell size show evidence of directional selection as genes regulating these biological processes are enriched for highly differentiated SNPs. Genetic crosses performed with a subset of candidate genes demonstrated that these genes influence egg size, at least in the large genetic background. These findings confirm the highly polygenic architecture of this adaptive trait, and suggest the involvement of many novel candidate genes in regulating egg size. © The Author 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  12. Resequencing of the CETP gene in American whites and African blacks: Association of rare and common variants with HDL-cholesterol levels

    PubMed Central

    Pirim, Dilek; Wang, Xingbin; Niemsiri, Vipavee; Radwan, Zaheda H.; Bunker, Clareann H.; Hokanson, John E.; Hamman, Richard F.; Barmada, M. Michael; Demirci, F. Yesim; Kamboh, M. Ilyas

    2015-01-01

    Background Cholesteryl ester transfer protein (CETP) plays a crucial role in lipid metabolism. Associations of common CETP variants with variation in plasma lipid levels and/or CETP mass/activity have been extensively studied and well documented; however, the effects of uncommon/rare CETP variants on plasma lipid profile remain undefined. Hence, resequencing of the gene in extreme phenotypes and follow-up rare-variant association analyses are essential to fill this gap. Objective To identify common and uncommon/rare variants in the CETP gene by resequencing the entire gene and test the effects of both common and uncommon/rare CETP variants on plasma lipid traits in two genetically distinct populations. Methods and Results The entire CETP gene plus flanking regions were resequenced in 190 individuals comprising 95 non-Hispanic Whites (NHWs) and 95 African blacks with extreme HDL-C levels. A total of 279 sequence variants were identified, of which 25 were novel. Selected variants were genotyped in the full samples of 623 NHWs and 788 African blacks, and 184 QC-passed variants were tested in relation to plasma lipid traits by using gene-based, single-site, haplotype and rare-variant association analyses (SKAT-O). Two novel and independent associations of rs1968905 and rs289740 with HDL-C were identified in African blacks. Using SKAT-O analysis, we also found rare variants with minor allele frequency <0.01 to be associated with HDL-C in both NHWs (P=0.024) and African blacks (P=0.009). Conclusions Our results indicate that, in addition to the common CETP variants, rare genetic variants in the CETP gene also contribute to the phenotypic variation of HDL-C in the general population. PMID:26683795

  13. Molecular diversity and population structure at the Cytochrome P450 3A5 gene in Africa

    PubMed Central

    2013-01-01

    Background Cytochrome P450 3A5 (CYP3A5) is an enzyme involved in the metabolism of many therapeutic drugs. CYP3A5 expression levels vary between individuals and populations, and this contributes to adverse clinical outcomes. Variable expression is largely attributed to four alleles: CYP3A5*1 (expresser allele) and CYP3A5*3 (rs776746), CYP3A5*6 (rs10264272) and CYP3A5*7 (rs41303343) (low/non-expresser alleles). Little is known about CYP3A5 variability in Africa, a region with considerable genetic diversity. Here we used a multidisciplinary approach to characterize CYP3A5 variation in geographically and ethnically diverse populations from in and around Africa, and to infer the evolutionary processes that have shaped patterns of diversity in this gene. We genotyped 2538 individuals from 36 diverse populations in and around Africa for common low/non-expresser CYP3A5 alleles, and re-sequenced the CYP3A5 gene in five Ethiopian ethnic groups. We estimated the ages of low/non-expresser CYP3A5 alleles using a linked microsatellite and assuming a stepwise mutation model of evolution. Finally, we examined the hypothesis that CYP3A5 is important in salt-retention adaptation by performing correlations with ecological data relating to aridity for the present day, 10,000 and 50,000 years ago. Results We estimate that ~43% of individuals within our African dataset express CYP3A5, which is lower than previous independent estimates for the region. We found significant intra-African variability in CYP3A5 expression phenotypes. Within Africa, the highest frequencies of high-activity alleles were observed in equatorial and Niger-Congo-speaking populations. Ethiopian allele frequencies were intermediate between those of other sub-Saharan African and non-African groups. Re-sequencing of CYP3A5 identified few additional variants likely to affect CYP3A5 expression. We estimate the ages of CYP3A5*3 as ~76,400 years and CYP3A5*6 as ~218,400 years. Finally, we report that global CYP3A5 expression levels correlated significantly with aridity measures for 10,000 years ago [Spearman’s rho = −0.465, p = 0.004] and 50,000 years ago [Spearman’s rho = −0.379, p = 0.02]. Conclusions Significant intra-African diversity at the CYP3A5 gene is likely to contribute to multiple pharmacogenetic profiles across the continent. Significant correlations between CYP3A5 expression phenotypes and aridity data are consistent with the hypothesis that the enzyme is important in salt-retention adaptation. PMID:23641907

  14. Genome Resequencing Identifies Unique Adaptations of Tibetan Chickens to Hypoxia and High-Dose Ultraviolet Radiation in High-Altitude Environments

    PubMed Central

    Zhang, Qian; Gou, Wenyu; Wang, Xiaotong; Zhang, Yawen; Ma, Jun; Zhang, Hongliang; Zhang, Ying; Zhang, Hao

    2016-01-01

    Tibetan chickens, unlike their lowland counterparts, exhibit specific adaptations to high-altitude conditions. The genetic mechanisms of such adaptations in highland chickens were determined by resequencing the genomes of four highland (Tibetan and Lhasa White) and four lowland (White Leghorn, Lindian, and Chahua) chicken populations. Our results showed evident genetic admixture in Tibetan chickens, suggesting a history of introgression from lowland gene pools. Genes showing positive selection in highland populations were related to cardiovascular and respiratory system development, DNA repair, response to radiation, inflammation, and immune responses, indicating strong adaptation to oxygen scarcity and high-intensity solar radiation. The distribution of allele frequencies of nonsynonymous single nucleotide polymorphisms between highland and lowland populations was analyzed using a chi-square test, which showed that several differentially distributed genes with missense mutations were enriched in several functional categories, especially blood vessel development and adaptations to hypoxia and intense radiation. RNA sequencing revealed that several differentially expressed genes were enriched in gene ontology terms related to blood vessel and respiratory system development. Several candidate genes involved in the development of the cardiorespiratory system (FGFR1, CTGF, ADAM9, JPH2, SATB1, BMP4, LOX, LPR, ANGPTL4, and HYAL1), inflammation and immune responses (AIRE, MYO1F, ZAP70, DDX60, CCL19, CD47, JSC, and FAS), DNA repair, and responses to radiation (VCP, ASH2L, and FANCG) were identified as playing key roles in adaptation to high-altitude conditions. Our data provide new insights into the unique adaptations of highland animals to extreme environments. PMID:26907498

  15. Bioinformatics Pipelines for Targeted Resequencing and Whole-Exome Sequencing of Human and Mouse Genomes: A Virtual Appliance Approach for Instant Deployment

    PubMed Central

    Saeed, Isaam; Wong, Stephen Q.; Mar, Victoria; Goode, David L.; Caramia, Franco; Doig, Ken; Ryland, Georgina L.; Thompson, Ella R.; Hunter, Sally M.; Halgamuge, Saman K.; Ellul, Jason; Dobrovic, Alexander; Campbell, Ian G.; Papenfuss, Anthony T.; McArthur, Grant A.; Tothill, Richard W.

    2014-01-01

    Targeted resequencing by massively parallel sequencing has become an effective and affordable way to survey small to large portions of the genome for genetic variation. Despite the rapid development of open-source software for analysis of such data, the practical implementation of these tools through construction of sequencing analysis pipelines remains a challenging and laborious activity, and a major hurdle for many small research and clinical laboratories. We developed TREVA (Targeted REsequencing Virtual Appliance), making pre-built pipelines immediately available as a virtual appliance. Based on virtual machine technologies, TREVA is a solution for rapid and efficient deployment of complex bioinformatics pipelines to laboratories of all sizes, enabling reproducible results. The analyses supported in TREVA include somatic and germline single-nucleotide and insertion/deletion variant calling, copy number analysis, and cohort-based analyses such as pathway and significantly mutated gene analyses. TREVA is flexible and easy to use, and can be customised by Linux-based extensions if required. TREVA can also be deployed in the cloud, enabling instant access without the overhead of investing in additional hardware. TREVA is available at http://bioinformatics.petermac.org/treva/. PMID:24752294

  16. High throughput SNP discovery and genotyping in grapevine (Vitis vinifera L.) by combining a re-sequencing approach and SNPlex technology

    PubMed Central

    Lijavetzky, Diego; Cabezas, José Antonio; Ibáñez, Ana; Rodríguez, Virginia; Martínez-Zapater, José M

    2007-01-01

    Background Single-nucleotide polymorphisms (SNPs) are the most abundant type of DNA sequence polymorphism. Their higher availability and stability compared with simple sequence repeats (SSRs) provide enhanced possibilities for genetic and breeding applications such as cultivar identification, construction of genetic maps, assessment of genetic diversity, detection of genotype/phenotype associations, and marker-assisted breeding. In addition, the efficiency of these activities can be improved thanks to the ease with which SNP genotyping can be automated. Expressed sequence tag (EST) sequencing projects in grapevine are allowing the in silico detection of multiple putative sequence polymorphisms within and among a reduced number of cultivars. In parallel, the genome sequence of the grapevine cultivar Pinot Noir is also providing thousands of polymorphisms present in this highly heterozygous genome. Still, general application of these SNPs requires further validation, since their use could be restricted to those specific genotypes. Results In order to develop a large SNP set of wide application in grapevine, we followed a systematic re-sequencing approach in a group of 11 grape genotypes corresponding to ancient unrelated cultivars as well as wild plants. Using this approach, we sequenced 230 gene fragments, which represents the analysis of over 1 Mb of grape DNA sequence. This analysis allowed the discovery of 1573 SNPs, with an average of one SNP every 64 bp (one SNP every 47 bp in non-coding regions and every 69 bp in coding regions). Nucleotide diversity in grape (π = 0.0051) was found to be similar to values observed in highly polymorphic plant species such as maize. The average number of haplotypes per gene sequence was estimated as six, with three haplotypes representing over 83% of the analyzed sequences. Short-range linkage disequilibrium (LD) studies within the analyzed sequences indicate a rapid decay of LD within the selected grapevine genotypes. To validate the use of the detected polymorphisms in genetic mapping, cultivar identification and genetic diversity studies, we used the SNPlex™ genotyping technology in a sample of grapevine genotypes and segregating progenies. Conclusion These results provide accurate values for nucleotide diversity in coding sequences and a first estimate of short-range LD in grapevine. Using SNPlex™ genotyping, we have shown the application of a set of discovered SNPs as molecular markers for cultivar identification, linkage mapping and genetic diversity studies. Thus, the combination of a highly efficient re-sequencing approach and the SNPlex™ high-throughput genotyping technology provides a powerful tool for grapevine genetic analysis. PMID:18021442
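    Nucleotide diversity (π) is conventionally defined as the average number of pairwise nucleotide differences per site among sampled sequences. The sketch below computes that quantity for a toy alignment; the sequences are invented for illustration, and the function ignores gaps, missing data, and any weighting that a real analysis (including the grapevine study above) may apply.

```python
from itertools import combinations
from typing import Sequence


def nucleotide_diversity(seqs: Sequence[str]) -> float:
    """Mean pairwise differences per site across aligned, equal-length sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(seqs, 2))
    # Count mismatching sites for every unordered pair of sequences.
    total_diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total_diffs / (len(pairs) * length)


# Toy alignment (hypothetical data): 3 sequences, 10 sites, 4 pairwise differences in total.
toy = ["ACGTACGTAC", "ACGTACGTAA", "ACCTACGTAC"]
print(round(nucleotide_diversity(toy), 4))  # 0.1333
```

    Here the three pairs differ at 1, 1, and 2 sites, so π = 4 / (3 × 10) ≈ 0.1333.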

  17. Selection against recombinant hybrids maintains reproductive isolation in hybridizing Populus species despite F1 fertility and recurrent gene flow.

    PubMed

    Christe, Camille; Stölting, Kai N; Bresadola, Luisa; Fussi, Barbara; Heinze, Berthold; Wegmann, Daniel; Lexer, Christian

    2016-06-01

    Natural hybrid zones have proven to be valuable tools for understanding the origin and maintenance of reproductive isolation (RI) and therefore of species. Most available genomic studies of hybrid zones using whole- or partial-genome resequencing approaches have focused on comparisons of the parental source populations involved in genome admixture, rather than exploring fine-scale patterns of chromosomal ancestry across the full admixture gradient present between hybridizing species. We have studied three well-known European 'replicate' hybrid zones of Populus alba and P. tremula, two widespread, ecologically divergent forest trees, using up to 432,505 single-nucleotide polymorphisms (SNPs) from restriction site-associated DNA (RAD) sequencing. Estimates of fine-scale chromosomal ancestry, genomic divergence and differentiation across all 19 poplar chromosomes revealed strikingly contrasting results, including an unexpected preponderance of F1 hybrids in the centre of genomic clines on the one hand, and genomically localized, spatially variable shared variants consistent with ancient introgression between the parental species on the other. Genetic ancestry had a significant effect on survivorship of hybrid seedlings in a common garden trial, pointing to selection against early-generation recombinants. Our results indicate a role for selection against recombinant genotypes in maintaining RI in the face of apparent F1 fertility, consistent with the intragenomic 'coadaptation' model of barriers to introgression upon secondary contact. Whole-genome resequencing of hybridizing populations will clarify the roles of specific genetic pathways in RI between these model forest trees and may reveal which loci are affected most strongly by its cyclic breakdown. © 2016 John Wiley & Sons Ltd.

  18. Classification of rare missense substitutions, using risk surfaces, with genetic- and molecular-epidemiology applications.

    PubMed

    Tavtigian, Sean V; Byrnes, Graham B; Goldgar, David E; Thomas, Alun

    2008-11-01

    Many individually rare missense substitutions are encountered during deep resequencing of candidate susceptibility genes and clinical mutation screening of known susceptibility genes. BRCA1 and BRCA2 are among the most resequenced of all genes, and clinical mutation screening of these genes provides an extensive data set for analysis of rare missense substitutions. Align-GVGD is a mathematically simple missense substitution analysis algorithm, based on the Grantham difference, which has already contributed to classification of missense substitutions in BRCA1, BRCA2, and CHEK2. However, the distribution of genetic risk as a function of Align-GVGD's output variables Grantham variation (GV) and Grantham deviation (GD) has not been well characterized. Here, we used data from the Myriad Genetic Laboratories database of nearly 70,000 full-sequence tests plus two risk estimates, one approximating the odds ratio and the other reflecting strength of selection, to display the distribution of risk in the GV-GD plane as a series of surfaces. We abstracted contours from the surfaces and used the contours to define a sequence of missense substitution grades ordered from greatest risk to least risk. The grades were validated internally using a third, personal and family history-based, measure of risk. The Align-GVGD grades defined here are applicable to both the genetic epidemiology problem of classifying rare missense substitutions observed in known susceptibility genes and the molecular epidemiology problem of analyzing rare missense substitutions observed during case-control mutation screening studies of candidate susceptibility genes. © 2008 Wiley-Liss, Inc.

  19. Genome wide re-sequencing of newly developed Rice Lines from common wild rice (Oryza rufipogon Griff.) for the identification of NBS-LRR genes.

    PubMed

    Liu, Wen; Ghouri, Fozia; Yu, Hang; Li, Xiang; Yu, Shuhong; Shahid, Muhammad Qasim; Liu, Xiangdong

    2017-01-01

    Common wild rice (Oryza rufipogon Griff.) is an important germplasm resource for rice breeding and contains many resistance genes. Re-sequencing provides an unprecedented opportunity to explore abundant useful genes at the whole-genome level. Here, we identified nucleotide-binding site leucine-rich repeat (NBS-LRR) encoding genes by re-sequencing two wild rice lines (i.e. Huaye 1 and Huaye 2) that were developed from common wild rice. We obtained 128 to 147 million reads with approximately 32.5-fold coverage depth, uniquely covering more than 89.6% (≥1-fold) of the reference genomes. The two wild rice lines showed a high SNP (single-nucleotide polymorphism) variation rate across 12 chromosomes against the reference genomes of Nipponbare (japonica cultivar) and 93-11 (indica cultivar). The count-length distribution of InDels (insertion/deletion polymorphisms) followed a normal distribution in the two lines, and most InDels ranged from -5 to 5 bp. With reference to the Nipponbare genome sequence, we detected a total of 1,209,308 SNPs, 161,117 InDels and 4,192 SVs (structural variations) in Huaye 1, and 1,387,959 SNPs, 180,226 InDels and 5,305 SVs in Huaye 2. In total, 44.9% and 46.9% of genes exhibited sequence variations in the two wild rice lines compared to the Nipponbare and 93-11 reference genomes, respectively. Analysis of NBS-LRR mutant candidate genes showed that they were mainly distributed on chromosome 11, and that the NBS domain was more conserved than the LRR domain in both wild rice lines. NBS genes showed higher levels of genetic diversity in Huaye 1 than in Huaye 2. Furthermore, protein-protein interaction analysis showed that NBS genes mostly interacted with the cytochrome C protein (Os05g0420600, Os01g0885000 and BGIOSGA038922), while some NBS genes interacted with a heat shock protein, proteins with DNA-binding activity, phosphoinositide 3-kinase, and proteins containing a coiled-coil region. We explored abundant NBS-LRR encoding genes in two common wild rice lines through genome-wide re-sequencing, which proved to be a useful tool to exploit elite NBS-LRR genes in wild rice. The data here provide a foundation for future work aimed at dissecting the genetic basis of disease resistance in rice, and the two wild rice lines will be useful germplasm for the molecular improvement of cultivated rice.

  20. Genome wide re-sequencing of newly developed Rice Lines from common wild rice (Oryza rufipogon Griff.) for the identification of NBS-LRR genes

    PubMed Central

    Yu, Hang; Li, Xiang; Yu, Shuhong; Shahid, Muhammad Qasim

    2017-01-01

    Common wild rice (Oryza rufipogon Griff.) is an important germplasm resource for rice breeding and contains many resistance genes. Re-sequencing provides an unprecedented opportunity to explore abundant useful genes at the whole-genome level. Here, we identified nucleotide-binding site leucine-rich repeat (NBS-LRR) encoding genes by re-sequencing two wild rice lines (i.e. Huaye 1 and Huaye 2) that were developed from common wild rice. We obtained 128 to 147 million reads with approximately 32.5-fold coverage depth, uniquely covering more than 89.6% (≥1-fold) of the reference genomes. The two wild rice lines showed a high SNP (single-nucleotide polymorphism) variation rate across 12 chromosomes against the reference genomes of Nipponbare (japonica cultivar) and 93–11 (indica cultivar). The count-length distribution of InDels (insertion/deletion polymorphisms) followed a normal distribution in the two lines, and most InDels ranged from -5 to 5 bp. With reference to the Nipponbare genome sequence, we detected a total of 1,209,308 SNPs, 161,117 InDels and 4,192 SVs (structural variations) in Huaye 1, and 1,387,959 SNPs, 180,226 InDels and 5,305 SVs in Huaye 2. In total, 44.9% and 46.9% of genes exhibited sequence variations in the two wild rice lines compared to the Nipponbare and 93–11 reference genomes, respectively. Analysis of NBS-LRR mutant candidate genes showed that they were mainly distributed on chromosome 11, and that the NBS domain was more conserved than the LRR domain in both wild rice lines. NBS genes showed higher levels of genetic diversity in Huaye 1 than in Huaye 2. Furthermore, protein-protein interaction analysis showed that NBS genes mostly interacted with the cytochrome C protein (Os05g0420600, Os01g0885000 and BGIOSGA038922), while some NBS genes interacted with a heat shock protein, proteins with DNA-binding activity, phosphoinositide 3-kinase, and proteins containing a coiled-coil region. We explored abundant NBS-LRR encoding genes in two common wild rice lines through genome-wide re-sequencing, which proved to be a useful tool to exploit elite NBS-LRR genes in wild rice. The data here provide a foundation for future work aimed at dissecting the genetic basis of disease resistance in rice, and the two wild rice lines will be useful germplasm for the molecular improvement of cultivated rice. PMID:28700714

  1. The genome of Eucalyptus grandis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Myburg, Alexander A.; Grattapaglia, Dario; Tuskan, Gerald A.

    Eucalypts are the world's most widely planted hardwood trees. Their broad adaptability, rich species diversity, fast growth and superior multipurpose wood have made them a global renewable resource of fiber and energy that mitigates human pressures on natural forests. We sequenced and assembled >94% of the 640 Mbp genome of Eucalyptus grandis into its 11 chromosomes. A set of 36,376 protein-coding genes was predicted, revealing that 34% occur in tandem duplications, the largest proportion found thus far in any plant genome. Eucalypts also show the highest diversity of genes for plant specialized metabolism that act as chemical defence against biotic agents and provide unique pharmaceutical oils. Resequencing of a set of inbred tree genomes revealed regions of strongly conserved heterozygosity, likely hotspots of inbreeding depression. The resequenced genome of the sister species E. globulus underscored the high interspecific genome colinearity despite substantial genome size variation in the genus. The genome of E. grandis is the first reference for the early-diverging Rosid order Myrtales and is placed here basal to the Eurosids. This resource expands knowledge of the unique biology of large woody perennials and provides a powerful tool to accelerate comparative biology, breeding and biotechnology.

  2. Genomic structure analysis of a set of Oryza nivara introgression lines and identification of yield-associated QTLs using whole-genome resequencing

    PubMed Central

    Ma, Xin; Fu, Yongcai; Zhao, Xinhui; Jiang, Liyun; Zhu, Zuofeng; Gu, Ping; Xu, Wenying; Su, Zhen; Sun, Chuanqing; Tan, Lubin

    2016-01-01

    Oryza nivara, an annual wild AA-genome species of rice, is an important gene pool for broadening the genetic diversity of cultivated rice (O. sativa L.). Towards identifying and utilizing favourable alleles from O. nivara, we developed a set of introgression lines (ILs) by introducing O. nivara segments into the background of the elite indica rice variety 93-11 through advanced backcrossing and repeated selfing. Using whole-genome resequencing, a high-density genetic map containing 1,070 bin-markers was constructed for the 131 ILs, with an average length of 349 kb per bin. The 131 ILs cover 95% of the O. nivara genome, providing a relatively complete genomic library for introgressing O. nivara alleles for trait improvement. Using this high-density bin map, QTL mapping for 13 yield-related traits was performed, and a total of 65 QTLs were detected across two environments. At ~36.9% of the detected QTLs, the O. nivara alleles conferred improving effects on yield-associated traits. Six cloned genes, Sh4/SHA1, Bh4, Sd1, TE/TAD1, GS3 and FZP, colocalised in the peak intervals of 9 QTLs. In conclusion, we developed new genetic materials for the exploration and use of beneficial alleles from wild rice and provided a basis for future fine mapping and cloning of the favourable O. nivara-derived QTLs. PMID:27251022

  3. Application of Broad-Spectrum Resequencing Microarray for Genotyping Rhabdoviruses

    PubMed Central

    Dacheux, Laurent; Berthet, Nicolas; Dissard, Gabriel; Holmes, Edward C.; Delmas, Olivier; Larrous, Florence; Guigon, Ghislaine; Dickinson, Philip; Faye, Ousmane; Sall, Amadou A.; Old, Iain G.; Kong, Katherine; Kennedy, Giulia C.; Manuguerra, Jean-Claude; Cole, Stewart T.; Caro, Valérie; Gessain, Antoine; Bourhy, Hervé

    2010-01-01

    The rapid and accurate identification of pathogens is critical in the control of infectious disease. To this end, we analyzed the capacity for viral detection and identification of a newly described high-density resequencing microarray (RMA), termed PathogenID, which was designed for multiple pathogen detection using database similarity searching. We focused on one of the largest and most diverse viral families described to date, the family Rhabdoviridae. We demonstrate that this approach has the potential to identify both known and related viruses for which precise sequence information is unavailable. In particular, we demonstrate that a strategy based on consensus sequence determination for analysis of RMA output data enabled successful detection of viruses exhibiting up to 26% nucleotide divergence from the closest sequence tiled on the array. Using clinical specimens obtained from rabid patients and animals, this method also shows high species-level concordance with standard reference assays, indicating that it is amenable to the development of diagnostic assays. Finally, 12 animal rhabdoviruses that are currently unclassified, unassigned, or assigned as tentative species within the family Rhabdoviridae were successfully detected. These new data allowed an unprecedented phylogenetic analysis of 106 rhabdoviruses and further suggest that the principles and methodology developed here may be used for broad-spectrum surveillance and the broader-scale investigation of biodiversity in the viral world. PMID:20610710
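
    The abstract does not give the consensus algorithm used on PathogenID output. Purely as an illustration, a simple majority-rule consensus over aligned base calls, followed by a p-distance (fraction of differing sites) against a tiled reference, can be sketched as follows. The `majority_consensus` and `p_distance` names, the "N" no-call convention and the 50% support threshold are all assumptions, not the published method.

```python
from collections import Counter
from typing import List


def majority_consensus(calls: List[str], min_fraction: float = 0.5) -> str:
    """Majority-rule consensus of equal-length, aligned base-call strings.

    'N' marks a position with no call. A consensus base is emitted only when
    it is supported by more than `min_fraction` of the non-N calls there;
    otherwise the consensus gets 'N'.
    """
    length = len(calls[0])
    if any(len(c) != length for c in calls):
        raise ValueError("all base-call strings must have the same length")
    consensus = []
    for column in zip(*calls):
        counts = Counter(base for base in column if base != "N")
        if not counts:
            consensus.append("N")
            continue
        base, n = counts.most_common(1)[0]
        total = sum(counts.values())
        consensus.append(base if n / total > min_fraction else "N")
    return "".join(consensus)


def p_distance(a: str, b: str) -> float:
    """Fraction of differing sites, ignoring positions where either sequence is 'N'."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "N" and y != "N"]
    if not pairs:
        raise ValueError("no comparable positions")
    return sum(x != y for x, y in pairs) / len(pairs)


consensus = majority_consensus(["ACGTAC", "ACGTTC", "ACGAAC"])
print(consensus, p_distance(consensus, "ACGTTC"))
```

    Here the consensus is "ACGTAC", which differs from the hypothetical reference "ACGTTC" at one of six sites (p-distance of about 0.167). In this framing, the 26% figure in the abstract would correspond to a p-distance of 0.26.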

  4. [Identification of novel pathogenic gene mutations in pediatric acute myeloid leukemia by whole-exome resequencing].

    PubMed

    Shiba, Norio

    2015-12-01

    A new class of gene mutations identified in the pathogenesis of adult acute myeloid leukemia (AML) involves DNMT3A, IDH1/2, TET2 and EZH2. However, these mutations are rare in pediatric AML, indicating that the pathogenesis differs between adult and pediatric forms of AML. Meanwhile, the recent development of massively parallel sequencing technologies has provided a new opportunity to discover genetic changes across entire genomes or protein-coding sequences. To build a complete registry of gene mutations, we performed whole-exome resequencing of paired tumor-normal specimens from 19 pediatric AML cases using the Illumina HiSeq 2000. In total, 80 somatic mutations, or 4.2 mutations per sample, were identified. Many of the recurrent mutations identified in this study involved previously reported targets in AML, such as FLT3, CEBPA, KIT, CBL, NRAS, WT1 and EZH2. In addition, several genes were newly identified in the current study, including BCORL1 and major cohesin components such as SMC3 and RAD21. Whole-exome resequencing revealed a complex array of gene mutations in pediatric AML genomes. Our results indicate that a subset of pediatric AML represents a discrete entity that can be distinguished from its adult counterpart in terms of the spectrum of gene mutations.

  5. Novel mutations in LRP6 highlight the role of WNT signaling in tooth agenesis

    PubMed Central

    Ludwig, Kerstin U.; Sullivan, Robert; van Rooij, Iris A.L.M.; Thonissen, Michelle; Swinnen, Steven; Phan, Milien; Conte, Federica; Ishorst, Nina; Gilissen, Christian; RoaFuentes, Laury; van de Vorst, Maartje; Henkes, Arjen; Steehouwer, Marloes; van Beusekom, Ellen; Bloemen, Marjon; Vankeirsbilck, Bruno; Bergé, Stefaan; Hens, Greet; Schoenaers, Joseph; Poorten, Vincent Vander; Roosenboom, Jasmien; Verdonck, An; Devriendt, Koen; Roeleveldt, Nel; Jhangiani, Shalini N.; Vissers, Lisenka E.L.M.; Lupski, James R.; de Ligt, Joep; Von den Hoff, Johannes W.; Pfundt, Rolph; Brunner, Han G.; Zhou, Huiqing; Dixon, Jill; Mangold, Elisabeth; van Bokhoven, Hans; Dixon, Michael J.; Kleefstra, Tjitske

    2016-01-01

    Purpose: Here we aimed to identify a novel genetic cause of tooth agenesis (TA) and/or orofacial clefting (OFC) by combining whole-exome sequencing (WES) and targeted re-sequencing in a large cohort of TA and OFC patients. Methods: WES was performed in two unrelated patients, one with severe TA and OFC and another with severe TA only. After we identified deleterious mutations in a gene encoding low-density lipoprotein receptor-related protein 6 (LRP6), all of its exons were re-sequenced with molecular inversion probes in 67 patients with TA, 1,072 patients with OFC and 706 controls. Results: We identified a frameshift mutation (c.4594delG, p.Cys1532fs) in the patient with TA and OFC, and a canonical splice-site mutation (c.3398-2A>C, p.?) in the patient with severe TA only. The targeted re-sequencing showed significant enrichment of unique LRP6 variants in TA patients, but not in patients with nonsyndromic OFC. Of the 5 variants in patients with TA, 2 affect the canonical splice site and 3 were missense variants; all variants segregated with the dominant phenotype, and in 1 case the missense mutation occurred de novo. Conclusion: Mutations in LRP6 cause tooth agenesis in man. PMID:26963285

  6. Genome Resequencing Identifies Unique Adaptations of Tibetan Chickens to Hypoxia and High-Dose Ultraviolet Radiation in High-Altitude Environments.

    PubMed

    Zhang, Qian; Gou, Wenyu; Wang, Xiaotong; Zhang, Yawen; Ma, Jun; Zhang, Hongliang; Zhang, Ying; Zhang, Hao

    2016-02-23

    Tibetan chickens, unlike their lowland counterparts, exhibit specific adaptations to high-altitude conditions. The genetic mechanisms of these adaptations were investigated by resequencing the genomes of four highland (Tibetan and Lhasa White) and four lowland (White Leghorn, Lindian, and Chahua) chicken populations. Our results showed evident genetic admixture in Tibetan chickens, suggesting a history of introgression from lowland gene pools. Genes showing positive selection in highland populations were related to cardiovascular and respiratory system development, DNA repair, response to radiation, inflammation, and immune responses, indicating strong adaptation to oxygen scarcity and high-intensity solar radiation. The distribution of allele frequencies of nonsynonymous single nucleotide polymorphisms between highland and lowland populations was analyzed using a chi-square test, which showed that several differentially distributed genes with missense mutations were enriched in several functional categories, especially blood vessel development and adaptation to hypoxia and intense radiation. RNA sequencing revealed that several differentially expressed genes were enriched in gene ontology terms related to blood vessel and respiratory system development. Several candidate genes involved in the development of the cardiorespiratory system (FGFR1, CTGF, ADAM9, JPH2, SATB1, BMP4, LOX, LPR, ANGPTL4, and HYAL1), inflammation and immune responses (AIRE, MYO1F, ZAP70, DDX60, CCL19, CD47, JSC, and FAS), DNA repair, and responses to radiation (VCP, ASH2L, and FANCG) were identified as playing key roles in the adaptation to high-altitude conditions. Our data provide new insights into the unique adaptations of highland animals to extreme environments. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
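
    The abstract does not state how the chi-square test was set up. A minimal sketch, assuming a standard Pearson chi-square test (1 degree of freedom, no Yates continuity correction) on a 2×2 table of reference and alternative allele counts in highland versus lowland birds; the `allele_chi_square` name and the example counts are hypothetical:

```python
import math
from typing import Tuple


def allele_chi_square(high_ref: int, high_alt: int, low_ref: int, low_alt: int) -> Tuple[float, float]:
    """Pearson chi-square test on a 2x2 allele-count table.

    Rows are highland/lowland populations, columns are reference/alternative
    alleles. Returns (chi2, p_value) with 1 degree of freedom and no
    continuity correction.
    """
    table = [[high_ref, high_alt], [low_ref, low_alt]]
    row_totals = [sum(row) for row in table]
    col_totals = [high_ref + low_ref, high_alt + low_alt]
    n = sum(row_totals)
    if n == 0:
        raise ValueError("empty table")
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            if expected == 0:
                raise ValueError("a row or column total is zero; test undefined")
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical SNP: alternative allele common in highland, rare in lowland birds.
chi2, p = allele_chi_square(high_ref=10, high_alt=30, low_ref=30, low_alt=10)
print(chi2, p)
```

    With these made-up counts every expected cell is 20, so chi2 is exactly 20.0 and p is roughly 7.7e-6. A genome-wide scan would also need multiple-testing correction, which this sketch leaves out.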

  7. MMP21 is mutated in human heterotaxy and is required for normal left-right asymmetry in vertebrates.

    PubMed

    Guimier, Anne; Gabriel, George C; Bajolle, Fanny; Tsang, Michael; Liu, Hui; Noll, Aaron; Schwartz, Molly; El Malti, Rajae; Smith, Laurie D; Klena, Nikolai T; Jimenez, Gina; Miller, Neil A; Oufadem, Myriam; Moreau de Bellaing, Anne; Yagi, Hisato; Saunders, Carol J; Baker, Candice N; Di Filippo, Sylvie; Peterson, Kevin A; Thiffault, Isabelle; Bole-Feysot, Christine; Cooley, Linda D; Farrow, Emily G; Masson, Cécile; Schoen, Patric; Deleuze, Jean-François; Nitschké, Patrick; Lyonnet, Stanislas; de Pontual, Loic; Murray, Stephen A; Bonnet, Damien; Kingsmore, Stephen F; Amiel, Jeanne; Bouvagnet, Patrice; Lo, Cecilia W; Gordon, Christopher T

    2015-11-01

    Heterotaxy results from a failure to establish normal left-right asymmetry early in embryonic development. By whole-exome sequencing, whole-genome sequencing and high-throughput cohort resequencing, we identified recessive mutations in MMP21 (encoding matrix metallopeptidase 21) in nine index cases with heterotaxy. In addition, Mmp21-mutant mice and mmp21-morphant zebrafish displayed heterotaxy and abnormal cardiac looping, respectively, suggesting a new role for extracellular matrix remodeling in the establishment of laterality in vertebrates.

  8. MMP21 is mutated in human heterotaxy and is required for normal left-right asymmetry in vertebrates

    PubMed Central

    Guimier, Anne; Gabriel, George C.; Bajolle, Fanny; Tsang, Michael; Liu, Hui; Noll, Aaron; Schwartz, Molly; El Malti, Rajae; Smith, Laurie D.; Klena, Nikolai T.; Jimenez, Gina; Miller, Neil A.; Oufadem, Myriam; Moreau de Bellaing, Anne; Yagi, Hisato; Saunders, Carol J.; Baker, Candice N.; Di Filippo, Sylvie; Peterson, Kevin A.; Thiffault, Isabelle; Bole-Feysot, Christine; Cooley, Linda D.; Farrow, Emily G.; Masson, Cécile; Schoen, Patric; Deleuze, Jean-François; Nitschké, Patrick; Lyonnet, Stanislas; de Pontual, Loic; Murray, Stephen A.; Bonnet, Damien; Kingsmore, Stephen F.; Amiel, Jeanne; Bouvagnet, Patrice; Lo, Cecilia W.; Gordon, Christopher T.

    2017-01-01

    Heterotaxy results from a failure to establish normal left-right asymmetry early in embryonic development. By whole-exome sequencing, whole-genome sequencing and high-throughput cohort resequencing, we identified recessive mutations in matrix metallopeptidase 21 (MMP21) in nine index cases with heterotaxy. In addition, Mmp21-mutant mice and mmp21-morphant zebrafish display heterotaxy and abnormal cardiac looping, respectively, suggesting a novel role for extracellular matrix remodeling in the establishment of laterality in vertebrates. PMID:26437028

  9. Testing and Validation of High Density Resequencing Microarray for Broad Range Biothreat Agents Detection

    DTIC Science & Technology

    2009-08-11

    Competing Interests: One of the contributing authors, Clark Tibbetts, is the Executive Vice President and Chief Technology Officer of Tessarae, LLC.

  10. Extraordinary Genetic Diversity in a Wood Decay Mushroom.

    PubMed

    Baranova, Maria A; Logacheva, Maria D; Penin, Aleksey A; Seplyarskiy, Vladimir B; Safonova, Yana Y; Naumenko, Sergey A; Klepikova, Anna V; Gerasimov, Evgeny S; Bazykin, Georgii A; James, Timothy Y; Kondrashov, Alexey S

    2015-10-01

    Populations of different species vary in the amounts of genetic diversity they possess. Nucleotide diversity π, the fraction of nucleotides that are different between two randomly chosen genotypes, has been known to range in eukaryotes between 0.0001 in Lynx lynx and 0.16 in Caenorhabditis brenneri. Here, we report the results of a comparative analysis of 24 haploid genotypes (12 from the United States and 12 from European Russia) of the split-gill fungus Schizophyllum commune. The diversity at synonymous sites is 0.20 in the American population of S. commune and 0.13 in the Russian population. This exceptionally high level of nucleotide diversity also leads to extreme amino acid diversity of protein-coding genes. Using whole-genome resequencing of 2 parental and 17 offspring haploid genotypes, we estimate that the mutation rate in S. commune is high, at 2.0 × 10⁻⁸ (95% CI: 1.1 × 10⁻⁸ to 4.1 × 10⁻⁸) per nucleotide per generation. Therefore, the high diversity of S. commune is primarily determined by its elevated mutation rate, although high effective population size likely also plays a role. Small genome size, ease of cultivation and completion of the life cycle in the laboratory, free-living haploid life stages and exceptionally high variability make S. commune a promising model organism for population, quantitative, and evolutionary genetics. © The Author 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
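
    The definition of π given above (the fraction of nucleotides that differ between two randomly chosen genotypes) can be computed as the average pairwise proportion of differing sites over all pairs of aligned sequences. A minimal sketch, assuming gap-free, equal-length haploid sequences and counting all sites; the study itself reports synonymous-site diversity, which would require restricting the comparison to synonymous positions. The `nucleotide_diversity` name and toy sequences are illustrative:

```python
from itertools import combinations
from typing import List


def nucleotide_diversity(seqs: List[str]) -> float:
    """Average proportion of differing sites over all pairs of aligned sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be non-empty and of equal length")
    pairwise = [
        sum(a != b for a, b in zip(s1, s2)) / length
        for s1, s2 in combinations(seqs, 2)
    ]
    return sum(pairwise) / len(pairwise)


# Three toy haploid genotypes: pairwise differences are 1, 2 and 1 sites out of 4.
print(nucleotide_diversity(["AAAA", "AAAT", "AATT"]))
```

    The three pairwise proportions are 0.25, 0.5 and 0.25, so π here is 1/3. On that scale, the reported synonymous-site value of 0.20 means two randomly drawn American S. commune genotypes differ at roughly one synonymous site in five.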

  11. TIA: algorithms for development of identity-linked SNP islands for analysis by massively parallel DNA sequencing.

    PubMed

    Farris, M Heath; Scott, Andrew R; Texter, Pamela A; Bartlett, Marta; Coleman, Patricia; Masters, David

    2018-04-11

    Single nucleotide polymorphisms (SNPs) located within the human genome have been shown to have utility as markers of identity in the differentiation of DNA from individual contributors. Massively parallel DNA sequencing (MPS) technologies and human genome SNP databases allow for the design of suites of identity-linked target regions, amenable to sequencing in a multiplexed and massively parallel manner. Tools are therefore needed to leverage the genotypic information found within SNP databases for the discovery of genomic targets that can be evaluated on MPS platforms. The SNP island target identification algorithm (TIA) was developed as a user-tunable system to leverage SNP information within databases. Using data within the 1000 Genomes Project SNP database, human genome regions were identified that contain globally ubiquitous identity-linked SNPs and that were amenable to targeted resequencing on MPS platforms. Algorithmic filters were used to exclude target regions that did not conform to user-tunable SNP island target characteristics. To validate the accuracy of TIA for discovering these identity-linked SNP islands within the human genome, SNP island target regions were amplified from 70 contributor genomic DNA samples using the polymerase chain reaction. Multiplexed amplicons were sequenced using the Illumina MiSeq platform, and the resulting sequences were analyzed for SNP variations. In total, 166 putative identity-linked SNPs were targeted in the identified genomic regions. Of the 309 SNPs that provided discerning power across individual SNP profiles, 74 previously undefined SNPs were identified during evaluation of targets from individual genomes. Overall, the DNA samples of 70 individuals were uniquely identified using a subset of the suite of identity-linked SNP islands.
TIA offers a tunable genome search tool for the discovery of targeted genomic regions that are scalable in the population frequency and numbers of SNPs contained within the SNP island regions. It also allows the definition of sequence length and sequence variability of the target region as well as the less variable flanking regions for tailoring to MPS platforms. As shown in this study, TIA can be used to discover identity-linked SNP islands within the human genome, useful for differentiating individuals by targeted resequencing on MPS technologies.
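
    The abstract describes TIA only in terms of user-tunable filters on island length, SNP content and less variable flanks. As a hypothetical illustration of that kind of filter, and not the TIA implementation, the sketch below scans sorted SNP coordinates on one chromosome for windows no longer than `max_span` bp that contain at least `min_snps` SNPs and are bordered on each side by `flank` bp with no SNPs (where PCR primers could sit). The function name, parameters and greedy non-overlapping strategy are all assumptions:

```python
from bisect import bisect_left, bisect_right
from typing import Iterable, List, Tuple


def find_snp_islands(
    positions: Iterable[int], max_span: int, min_snps: int, flank: int
) -> List[Tuple[int, int, int]]:
    """Greedy scan for SNP-dense windows with SNP-free flanks (hypothetical sketch).

    Returns (start, end, snp_count) tuples for non-overlapping islands.
    """
    pos = sorted(set(positions))  # distinct SNP coordinates, ascending
    islands: List[Tuple[int, int, int]] = []
    i = 0
    while i < len(pos):
        # Last SNP index j with pos[j] - pos[i] <= max_span.
        j = bisect_right(pos, pos[i] + max_span) - 1
        start, end, count = pos[i], pos[j], j - i + 1
        # No SNP in [start - flank, start) and none in (end, end + flank].
        left_clear = bisect_left(pos, start - flank) == i
        right_clear = bisect_right(pos, end + flank) == j + 1
        if count >= min_snps and left_clear and right_clear:
            islands.append((start, end, count))
            i = j + 1  # continue after this island
        else:
            i += 1
    return islands


snps = [100, 150, 180, 220, 1000, 5000, 5010]
print(find_snp_islands(snps, max_span=200, min_snps=3, flank=300))
```

    With these made-up coordinates, only the four SNPs between positions 100 and 220 form an island, giving [(100, 220, 4)]. A fuller tool would also filter on population allele frequencies, which the abstract mentions as a tunable property.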

  12. Association Genetics of Populus trichocarpa or Resequencing in Populus: Towards Genome Wide Association Genetics (2011 JGI User Meeting)

    ScienceCinema

    Tuskan, Gerry

    2018-02-13

    The U.S. Department of Energy Joint Genome Institute (JGI) invited scientists interested in the application of genomics to bioenergy and environmental issues, as well as all current and prospective users and collaborators, to attend the annual DOE JGI Genomics of Energy & Environment Meeting held March 22-24, 2011 in Walnut Creek, Calif. The emphasis of this meeting was on the genomics of renewable energy strategies, carbon cycling, environmental gene discovery, and engineering of fuel-producing organisms. The meeting featured presentations by leading scientists advancing these topics. Gerry Tuskan of Oak Ridge National Laboratory speaks on "Resequencing in Populus: Towards Genome Wide Association Genetics" at the 6th annual Genomics of Energy & Environment Meeting on March 23, 2011.

  13. Different mutational function of low- and high-linear energy transfer heavy-ion irradiation demonstrated by whole-genome resequencing of Arabidopsis mutants.

    PubMed

    Kazama, Yusuke; Ishii, Kotaro; Hirano, Tomonari; Wakana, Taeko; Yamada, Mieko; Ohbu, Sumie; Abe, Tomoko

    2017-12-01

    Heavy-ion irradiation is a powerful mutagen with high linear energy transfer (LET). Several studies have indicated that the value of LET affects DNA lesion formation in several ways, including the efficiency and the density of double-stranded break induction along the particle path. We hypothesized that the mutation type can be altered by selecting an appropriate LET value. Here, we quantitatively demonstrate differences in the mutation types induced by irradiation with two representative ions, Ar ions (LET: 290 keV μm⁻¹) and C ions (LET: 30.0 keV μm⁻¹), by whole-genome resequencing of the Arabidopsis mutants produced by these irradiations. Ar ions caused chromosomal rearrangements or large deletions (≥100 bp) more frequently than C ions, with 10.2 and 2.3 per mutant genome under Ar- and C-ion irradiation, respectively. Conversely, C ions induced more single-base substitutions and small indels (<100 bp) than Ar ions, with 28.1 and 56.9 per mutant genome under Ar- and C-ion irradiation, respectively. Moreover, the rearrangements induced by Ar-ion irradiation were more complex than those induced by C-ion irradiation and tended to be accompanied by nearby single-base substitutions or small indels. In conjunction with the detection of causative genes through high-throughput sequencing, selective irradiation by beams with different effects will be a powerful tool for forward genetics as well as studies on chromosomal rearrangements. © 2017 The Authors The Plant Journal © 2017 John Wiley & Sons Ltd.

  14. Widespread Site-Dependent Buffering of Human Regulatory Polymorphism

    PubMed Central

    Kutyavin, Tanya; Stamatoyannopoulos, John A.

    2012-01-01

    The average individual is expected to harbor thousands of variants within non-coding genomic regions involved in gene regulation. However, it is currently not possible to reliably interpret the functional consequences of genetic variation within any given transcription factor recognition sequence. To address this, we comprehensively analyzed heritable genome-wide binding patterns of a major sequence-specific regulator (CTCF) in relation to genetic variability in binding site sequences across a multi-generational pedigree. We localized and quantified CTCF occupancy by ChIP-seq in 12 related and unrelated individuals spanning three generations, followed by comprehensive targeted resequencing of the entire CTCF-binding landscape across all individuals. We identified hundreds of variants with reproducible quantitative effects on CTCF occupancy (both positive and negative). While these effects paralleled protein–DNA recognition energetics when averaged, they were extensively buffered by striking local context dependencies. In the significant majority of cases, buffering was complete, resulting in silent variants spanning every position within the DNA recognition interface irrespective of the level of binding energy or evolutionary constraint. The prevalence of complex partial or complete buffering effects severely constrained the ability to reliably predict the impact of variation within any given binding site instance. Surprisingly, 40% of variants that increased CTCF occupancy occurred at positions of human–chimp divergence, challenging the expectation that the vast majority of functional regulatory variants should be deleterious. Our results suggest that, even in the presence of “perfect” genetic information afforded by resequencing and parallel studies in multiple related individuals, genomic site-specific prediction of the consequences of individual variation in regulatory DNA will require systematic coupling with empirical functional genomic measurements.
PMID:22457641

  15. Sequence polymorphism at the human apolipoprotein AII gene (APOA2): unexpected deficit of variation in an African-American sample.

    PubMed

    Fullerton, Stephanie M; Clark, Andrew G; Weiss, Kenneth M; Taylor, Scott L; Stengård, Jari H; Salomaa, Veikko; Boerwinkle, Eric; Nickerson, Deborah A

    2002-07-01

    A 3.3-kb region, encompassing the APOA2 gene and 2 kb of 5' and 3' flanking DNA, was re-sequenced in a "core" sample of 24 individuals, sampled without regard to health status, from each of three populations: African-Americans from Jackson (Miss., USA), Europeans from North Karelia (Finland), and non-Hispanic European-Americans from Rochester (Minn., USA). Fifteen variable sites were identified (14 SNPs and one multi-allelic microsatellite, all silent), and these sites segregated as 18 sequence haplotypes (or nine, if only SNPs are considered). The haplotype distribution in the core African-American sample was unusual, with a deficit of particular haplotypes compared with those found in the other two samples, and a significantly (P<0.05) low level of nucleotide diversity relative to patterns of polymorphism and divergence at other human loci. Six of the 14 SNPs, whose variation captured the haplotype structure of the core data, were then genotyped by oligonucleotide ligation assay in an additional 2,183 individuals from the same three populations (n=843, n=452, and n=888, respectively). All six sites varied in each of the larger "epidemiological" samples, and together they defined 19 SNP haplotypes, seven with relative frequencies greater than 1% in the total sample; all of these common haplotypes had been identified earlier in the core re-sequencing survey. Here also, the African-American sample showed significantly lower SNP heterozygosity and haplotype diversity than the other two samples. The deficit of polymorphism is consistent with a population-specific non-neutral increase in the relative frequency of several haplotypes in Jackson.

  16. Epileptic spasms are a feature of DEPDC5 mTORopathy

    PubMed Central

    Carvill, Gemma L.; Crompton, Douglas E.; Regan, Brigid M.; McMahon, Jacinta M.; Saykally, Julia; Zemel, Matthew; Schneider, Amy L.; Dibbens, Leanne; Howell, Katherine B.; Mandelstam, Simone; Leventer, Richard J.; Harvey, A. Simon; Mullen, Saul A.; Berkovic, Samuel F.; Sullivan, Joseph; Scheffer, Ingrid E.

    2015-01-01

    Objective: To assess the presence of DEPDC5 mutations in a cohort of patients with epileptic spasms. Methods: We performed DEPDC5 resequencing in 130 patients with spasms, segregation analysis of variants of interest, and detailed clinical assessment of patients with possibly and likely pathogenic variants. Results: We identified 3 patients with variants in DEPDC5 in the cohort of 130 patients with spasms. We also describe 3 additional patients with DEPDC5 alterations and epileptic spasms: 2 from a previously described family and a third ascertained by clinical testing. Overall, we describe 6 patients from 5 families with spasms and DEPDC5 variants; 2 arose de novo and 3 were familial. Two individuals had focal cortical dysplasia. Clinical outcome was highly variable. Conclusions: While recent molecular findings in epileptic spasms emphasize the contribution of de novo mutations, we highlight the relevance of inherited mutations in the setting of a family history of focal epilepsies. We also illustrate the utility of clinical diagnostic testing and detailed phenotypic evaluation in characterizing the constellation of phenotypes associated with DEPDC5 alterations. We expand this phenotypic spectrum to include epileptic spasms, aligning DEPDC5 epilepsies more with the recognized features of other mTORopathies. PMID:27066554

  17. Identification of ALK as the Major Familial Neuroblastoma Predisposition Gene

    PubMed Central

    Mossé, Yaël P; Laudenslager, Marci; Longo, Luca; Cole, Kristina A; Wood, Andrew; Attiyeh, Edward F; Laquaglia, Michael J; Sennett, Rachel; Lynch, Jill E; Perri, Patrizia; Laureys, Geneviève; Speleman, Frank; Hakonarson, Hakon; Torkamani, Ali; Schork, Nicholas J; Brodeur, Garrett M; Tonini, Gian Paolo; Rappaport, Eric; Devoto, Marcella; Maris, John M

    2009-01-01

    SUMMARY Survival rates for the childhood cancer neuroblastoma have not substantively improved despite dramatic escalation in chemotherapy intensity. Like most human cancers, this embryonal malignancy can be inherited, but the genetic etiology of familial and sporadically occurring neuroblastoma was largely unknown. Here we show that germline mutations in the anaplastic lymphoma kinase gene (ALK) explain the majority of hereditary neuroblastomas, and that activating mutations can also be somatically acquired. We first identified a significant linkage signal at the short arm of chromosome 2 (maximum nonparametric LOD=4.23 at rs1344063) using a whole-genome scan in neuroblastoma pedigrees. Resequencing of regional candidate genes identified three separate missense mutations in the tyrosine kinase domain of ALK (G1128A, R1192P and R1275Q) that segregated with the disease in eight separate families. Examination of 491 sporadically occurring human neuroblastoma samples showed that the ALK locus was gained in 22.8%, and highly amplified in an additional 3.3%, and that these aberrations were highly associated with death from disease (P=0.0003). Resequencing of 194 high-risk neuroblastoma samples showed somatically acquired mutations within the tyrosine kinase domain in 12.4%. Nine of the ten mutations map to critical regions of the kinase domain and were predicted to be oncogenic drivers with high probability. Mutations resulted in constitutive phosphorylation consistent with activation, and targeted knockdown of ALK mRNA resulted in profound growth inhibition of 4 of 4 cell lines harboring mutant or amplified ALK, as well as 2 of 6 wild type for ALK. Our results demonstrate that heritable mutations of ALK are the major cause of familial neuroblastoma, and that germline or acquired activation of this cell surface kinase is a tractable therapeutic target for this lethal pediatric malignancy. PMID:18724359

  18. Genome resequencing in Populus: Revealing large-scale genome variation and implications on specialized-trait genomics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Muchero, Wellington; Labbe, Jessy L; Priya, Ranjan

    2014-01-01

    To date, Populus ranks among the few plant species with a complete genome sequence and other highly developed genomic resources. With the first genome sequence among all tree species, Populus has been adopted as a suitable model organism for genomic studies in trees. However, far from being just a model species, Populus is a key renewable economic resource that plays a significant role in providing raw materials for the biofuel and pulp and paper industries. Therefore, aside from leading frontiers of basic tree molecular biology and ecological research, Populus leads frontiers in addressing global economic challenges related to fuel and fiber production. The latter fact suggests that research aimed at improving the quality and quantity of Populus as a raw material will likely drive the pursuit of more targeted and deeper research in order to unlock the economic potential tied to the molecular biology processes that drive this tree species. Advances in genome sequence-driven technologies, such as resequencing individual genotypes, which in turn facilitates large-scale SNP discovery and identification of large-scale polymorphisms, are key determinants of future success in these initiatives. In this treatise we discuss implications of genome sequence-enabled technologies for Populus genomic and genetic studies of complex and specialized traits.

  19. Genome-wide DNA polymorphisms in Kavuni, a traditional rice cultivar with nutritional and therapeutic properties.

    PubMed

    Rathinasabapathi, Pasupathi; Purushothaman, Natarajan; Parani, Madasamy

    2016-05-01

    Although the rice genome was sequenced in 2002, efforts to resequence the large number of available accessions, landraces, traditional cultivars, and improved varieties of this important food crop are limited. We have initiated resequencing of traditional cultivars from India. Kavuni is an important traditional rice cultivar from South India that commands a premium price for its nutritional and therapeutic properties. Whole-genome sequencing of Kavuni on the Illumina platform and SNP analysis against the Nipponbare reference genome identified 1,150,711 SNPs, of which 377,381 were located in genic regions. Non-synonymous SNPs (62,708) were distributed in 19,251 genes, and their number varied between 1 and 115 per gene. Large-effect DNA polymorphisms (7,769) were present in 3,475 genes. Pathway mapping of these polymorphisms revealed the involvement of genes related to carbohydrate metabolism, translation, protein folding, and cell death. Analysis of the starch biosynthesis-related genes revealed that the granule-bound starch synthase I gene had T/G SNPs at the first intron/exon junction and a two-nucleotide combination, which were reported to favour high amylose content and low glycemic index. The present study provides a valuable genomics resource for studying rice varieties with nutritional and medicinal properties.

  20. Whole Genome Sequencing of Greater Amberjack (Seriola dumerili) for SNP Identification on Aligned Scaffolds and Genome Structural Variation Analysis Using Parallel Resequencing

    PubMed Central

    Aoki, Jun-ya; Kawase, Junya; Hamada, Kazuhisa; Fujimoto, Hiroshi; Yamamoto, Ikki; Usuki, Hironori

    2018-01-01

    Greater amberjack (Seriola dumerili) is distributed in tropical and temperate waters worldwide and is an important aquaculture fish. We carried out de novo sequencing of the greater amberjack genome to construct a reference genome sequence to identify single nucleotide polymorphisms (SNPs) for breeding amberjack by marker-assisted or gene-assisted selection as well as to identify functional genes for biological traits. We obtained 200× coverage and constructed a high-quality genome assembly using next-generation sequencing technology. The assembled sequences were aligned onto a yellowtail (Seriola quinqueradiata) radiation hybrid (RH) physical map by sequence homology. A total of 215 of the longest amberjack sequences, with a total length of 622.8 Mbp (92% of the total length of the genome scaffolds), were lined up on the yellowtail RH map. We resequenced the whole genomes of 20 greater amberjacks and mapped the resulting sequences onto the reference genome sequence. About 186,000 nonredundant SNPs were successfully ordered on the reference genome. Further, we found differences in the genome structural variations between two greater amberjack populations using BreakDancer. We also analyzed the greater amberjack transcriptome and mapped the annotated sequences onto the reference genome sequence. PMID:29785397

  1. Association Genetics of Populus trichocarpa or Resequencing in Populus: Towards Genome Wide Association Genetics (2011 JGI User Meeting)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tuskan, Gerry

    The U.S. Department of Energy Joint Genome Institute (JGI) invited scientists interested in the application of genomics to bioenergy and environmental issues, as well as all current and prospective users and collaborators, to attend the annual DOE JGI Genomics of Energy & Environment Meeting held March 22-24, 2011 in Walnut Creek, Calif. The emphasis of this meeting was on the genomics of renewable energy strategies, carbon cycling, environmental gene discovery, and engineering of fuel-producing organisms. The meeting featured presentations by leading scientists advancing these topics. Gerry Tuskan of Oak Ridge National Laboratory speaks on "Resequencing in Populus: Towards Genome Wide Association Genetics" at the 6th annual Genomics of Energy & Environment Meeting on March 23, 2011.

  2. Precision spectral manipulation of optical pulses using a coherent photon echo memory.

    PubMed

    Buchler, B C; Hosseini, M; Hétet, G; Sparkes, B M; Lam, P K

    2010-04-01

    Photon echo schemes are excellent candidates for high efficiency coherent optical memory. They are capable of high-bandwidth multipulse storage, pulse resequencing and have been shown theoretically to be compatible with quantum information applications. One particular photon echo scheme is the gradient echo memory (GEM). In this system, an atomic frequency gradient is induced in the direction of light propagation leading to a Fourier decomposition of the optical spectrum along the length of the storage medium. This Fourier encoding allows precision spectral manipulation of the stored light. In this Letter, we show frequency shifting, spectral compression, spectral splitting, and fine dispersion control of optical pulses using GEM.

  3. Comprehensive profiling and quantitation of oncogenic mutations in non-small cell lung carcinoma using single-molecule amplification and re-sequencing technology.

    PubMed

    Shi, Jian; Yuan, Meng; Wang, Zhan-Dong; Xu, Xiao-Li; Hong, Lei; Sun, Shenglin

    2017-02-01

    The carcinogenesis of non-small cell lung carcinoma has been found to be associated with activating and resistance mutations in the tyrosine kinase domain of specific oncogenes. Here, we assessed the type, frequency, and abundance of epidermal growth factor receptor (EGFR), KRAS, BRAF, and ALK mutations in 154 non-small cell lung carcinoma specimens using single-molecule amplification and re-sequencing technology. We found that EGFR mutations were the most prevalent (44.2%), followed by KRAS (18.8%), ALK (7.8%), and BRAF (5.8%) mutations. The type and abundance of the mutations in tumor specimens appeared to be heterogeneous. We therefore conclude that identifying clinically significant oncogenic mutations may improve the classification of patients and provide valuable information for choosing therapeutic strategies.
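
    The prevalences above are rounded percentages of 154 specimens, so the underlying specimen counts can be recovered by a simple back-calculation. The sketch below finds, for each gene, the integer count out of 154 that reproduces the reported one-decimal percentage. The counts it returns are inferred from the rounded figures, not values stated in the abstract, and the names `REPORTED` and `implied_counts` are illustrative only.

```python
# Reported prevalence (%) per gene among 154 NSCLC specimens, copied from the abstract.
REPORTED = {"EGFR": 44.2, "KRAS": 18.8, "ALK": 7.8, "BRAF": 5.8}
N_SPECIMENS = 154


def implied_counts(percentages: dict[str, float], n: int) -> dict[str, int]:
    """Infer the integer count out of n behind each one-decimal percentage."""
    counts = {}
    for gene, pct in percentages.items():
        k = round(pct / 100 * n)
        # Check that the inferred count really rounds back to the reported value.
        if round(100 * k / n, 1) != pct:
            raise ValueError(f"no integer count reproduces {pct}% for {gene}")
        counts[gene] = k
    return counts


print(implied_counts(REPORTED, N_SPECIMENS))
# {'EGFR': 68, 'KRAS': 29, 'ALK': 12, 'BRAF': 9}
```

    Because a specimen could in principle carry mutations in more than one gene, these per-gene counts should not be summed into a count of mutated specimens.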

  4. Whole Genome Re-Sequencing and Characterization of Powdery Mildew Disease-Associated Allelic Variation in Melon.

    PubMed

    Natarajan, Sathishkumar; Kim, Hoy-Taek; Thamilarasan, Senthil Kumar; Veerappan, Karpagam; Park, Jong-In; Nou, Ill-Sup

    2016-01-01

    Powdery mildew is one of the most common fungal diseases in the world. It frequently affects melon (Cucumis melo L.) and other cucurbit crops in both open-field and greenhouse cultivation. One of the goals of genomics is to identify the polymorphic loci responsible for variation in phenotypic traits. In this study, powdery mildew disease assessment scores were calculated for four melon accessions, 'SCNU1154', 'Edisto47', 'MR-1', and 'PMR5'. To investigate the genetic variation of these accessions, whole genome re-sequencing was performed on the Illumina HiSeq 2000 platform. A total of 754,759,704 quality-filtered reads were generated, with an average of 82.64% coverage relative to the reference genome. Comparisons of the sequences of the melon accessions revealed around 7.4 million single nucleotide polymorphisms (SNPs), 1.9 million InDels, and 182,398 putative structural variations (SVs). Functional enrichment analysis classified the detected variations into biological process, cellular component and molecular function categories. Further, a disease-associated QTL map was constructed for 390 SNPs and 45 InDels identified as related to defense-response genes. Among them, 112 SNPs and 12 InDels were observed on powdery mildew responsive chromosomes. Accordingly, this whole genome re-sequencing study identified SNPs and InDels associated with defense genes that will serve as candidate polymorphisms in the search for sources of resistance against powdery mildew disease and could accelerate marker-assisted breeding in melon.
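
    The figure "82.64% coverage relative to the reference genome" most plausibly refers to breadth of coverage, the share of reference bases touched by at least one read, rather than mean read depth; that reading is an assumption here. Given per-base depths (for example from `samtools depth -a`), breadth is a short calculation. The depth values below are made up for illustration.

```python
def breadth_of_coverage(depths: list[int], min_depth: int = 1) -> float:
    """Percentage of reference positions covered by at least `min_depth` reads."""
    if not depths:
        raise ValueError("depth list is empty")
    covered = sum(1 for d in depths if d >= min_depth)
    return 100.0 * covered / len(depths)


# Hypothetical per-base read depths along a 10-bp stretch of reference.
toy_depths = [0, 3, 5, 7, 0, 2, 9, 4, 1, 6]
print(breadth_of_coverage(toy_depths))               # 80.0
print(breadth_of_coverage(toy_depths, min_depth=3))  # 60.0
```

    Raising `min_depth` shows why breadth figures depend on the depth threshold chosen, which is worth checking when comparing studies.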

  5. Pooled Resequencing of 122 Ulcerative Colitis Genes in a Large Dutch Cohort Suggests Population-Specific Associations of Rare Variants in MUC2.

    PubMed

    Visschedijk, Marijn C; Alberts, Rudi; Mucha, Soren; Deelen, Patrick; de Jong, Dirk J; Pierik, Marieke; Spekhorst, Lieke M; Imhann, Floris; van der Meulen-de Jong, Andrea E; van der Woude, C Janneke; van Bodegraven, Adriaan A; Oldenburg, Bas; Löwenberg, Mark; Dijkstra, Gerard; Ellinghaus, David; Schreiber, Stefan; Wijmenga, Cisca; Rivas, Manuel A; Franke, Andre; van Diemen, Cleo C; Weersma, Rinse K

    2016-01-01

    Genome-wide association studies have revealed several common genetic risk variants for ulcerative colitis (UC). However, little is known about the contribution of rare, large-effect genetic variants to UC susceptibility. In this study, we performed deep targeted re-sequencing of 122 genes in Dutch UC patients to investigate the contribution of rare variants to the genetic susceptibility to UC. The selected genes comprise 111 established human UC susceptibility genes and 11 genes that lead to spontaneous colitis when knocked out in mice. In addition, we sequenced the promoter regions of 45 genes where known variants exert cis-eQTL effects. Targeted pooled re-sequencing was performed on DNA of 790 Dutch UC cases. The Genome of the Netherlands project provided sequence data for 500 healthy controls. After quality control and prioritization based on allele frequency and pathogenicity probability, follow-up genotyping of 171 rare variants was performed on 1021 Dutch UC cases and 1166 Dutch controls. Single-variant association and gene-based analyses identified an association of rare variants in the MUC2 gene with UC. The associated variants in the Dutch population could not be replicated in a German replication cohort (1026 UC cases, 3532 controls). In conclusion, this study has identified a putative role for MUC2 in UC susceptibility in the Dutch population and suggests a population-specific contribution of rare variants to UC.
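
    Pooled re-sequencing yields allele frequencies for the pool as a whole rather than individual genotypes. A minimal sketch, assuming a simple read-count estimator (real pooled-variant callers also model sequencing error and unequal DNA contribution per sample), converts reference and alternate read counts into a pool frequency and then into expected allele copies among the 2 × 790 chromosomes in the case pool. The read counts are hypothetical.

```python
def pooled_allele_frequency(ref_reads: int, alt_reads: int) -> float:
    """Naive alternate-allele frequency in a DNA pool, estimated from read counts."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no reads cover this position")
    return alt_reads / total


def expected_allele_copies(freq: float, n_individuals: int) -> float:
    """Expected alternate-allele copies among the 2 * n chromosomes of n diploid samples."""
    return freq * 2 * n_individuals


# Hypothetical site: 12 alternate reads out of 1,000 in a pool of 790 diploid cases.
freq = pooled_allele_frequency(ref_reads=988, alt_reads=12)
print(freq)                               # 0.012
print(expected_allele_copies(freq, 790))  # roughly 19 allele copies
# A single heterozygous carrier would contribute a frequency of only 1 / 1580.
print(1 / (2 * 790))
```

    A singleton in a pool this size sits at a frequency close to typical sequencing error rates, which helps explain why rare candidates from pooled data are usually confirmed by individual genotyping, as in the follow-up genotyping described above.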

  6. Integrating transcriptome and genome re-sequencing data to identify key genes and mutations affecting chicken eggshell qualities.

    PubMed

    Zhang, Quan; Zhu, Feng; Liu, Long; Zheng, Chuan Wei; Wang, De He; Hou, Zhuo Cheng; Ning, Zhong Hua

    2015-01-01

    Eggshell damage leads to economic losses in the egg production industry and poses a threat to human health. We examined 49-wk-old Rhode Island White hens (Gallus gallus) that laid eggs having shells with significantly different strengths and thicknesses. We used HiSeq 2000 (Illumina) sequencing to characterize the chicken transcriptome and whole genome to identify the key genes and genetic mutations associated with eggshell calcification. We identified a total of 14,234 genes expressed in the chicken uterus, representing 89% of all annotated chicken genes. A total of 889 differentially expressed genes (DEGs) were identified by comparing low eggshell strength (LES) and normal eggshell strength (NES) genomes. Gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses revealed that the DEGs are enriched in calcification-related processes, including calcium ion transport and calcium signaling pathways. Some important matrix proteins, such as OC-116, LTF and SPP1, were also differentially expressed between the two groups. A total of 3,671,919 single-nucleotide polymorphisms (SNPs) and 508,035 Indels were detected in protein coding genes by whole-genome re-sequencing, including 1775 non-synonymous variations and 19 frame-shift Indels in DEGs. The SNPs and Indels found in this study could be further investigated for eggshell traits. This is the first report to integrate transcriptome and genome re-sequencing to target the genetic variations that decrease eggshell quality. These findings further advance our understanding of eggshell calcification in the chicken uterus.

  7. Massively multiplexed microbial identification using resequencing DNA microarrays for outbreak investigation

    NASA Astrophysics Data System (ADS)

    Leski, T. A.; Ansumana, R.; Jimmy, D. H.; Bangura, U.; Malanoski, A. P.; Lin, B.; Stenger, D. A.

    2011-06-01

    Multiplexed microbial diagnostic assays are a promising method for detection and identification of pathogens causing syndromes characterized by nonspecific symptoms in which traditional differential diagnosis is difficult. Such assays can also play an important role in outbreak investigations and environmental screening for intentional or accidental release of biothreat agents, which requires simultaneous testing for hundreds of potential pathogens. The resequencing pathogen microarray (RPM) is an emerging technological platform, relying on a combination of massively multiplex PCR and high-density DNA microarrays for rapid detection and high-resolution identification of hundreds of infectious agents simultaneously. The RPM diagnostic system was deployed in Sierra Leone, West Africa in collaboration with Njala University and Mercy Hospital Research Laboratory located in Bo. We used the RPM-Flu microarray, designed for broad-range detection of human respiratory pathogens, to investigate a suspected outbreak of avian influenza in a number of poultry farms in which significant mortality of chickens was observed. The microarray results were additionally confirmed by influenza-specific real-time PCR. The results of the study excluded the possibility that the outbreak was caused by influenza, but implicated Klebsiella pneumoniae as a possible pathogen. The outcome of this feasibility study confirms that application of broad-spectrum detection platforms for outbreak investigation in low-resource locations is possible and allows for rapid discovery of the responsible agents, even in cases when different agents are suspected. This strategy enables quick and cost-effective detection of low-probability events such as the outbreak of a rare disease or intentional release of a biothreat agent.

  8. Improved hybrid de novo genome assembly of domesticated apple (Malus x domestica).

    PubMed

    Li, Xuewei; Kui, Ling; Zhang, Jing; Xie, Yinpeng; Wang, Liping; Yan, Yan; Wang, Na; Xu, Jidi; Li, Cuiying; Wang, Wen; van Nocker, Steve; Dong, Yang; Ma, Fengwang; Guan, Qingmei

    2016-08-08

    Domesticated apple (Malus × domestica Borkh) is a popular temperate fruit with high nutrient levels and diverse flavors. In 2012, global apple production accounted for at least one tenth of all harvested fruits. A high-quality apple genome assembly is crucial for the selection and breeding of new cultivars. Currently, a single reference genome is available for apple, assembled from 16.9× genome coverage short reads via Sanger and 454 sequencing technologies. Although a useful resource, this assembly covers only ~89% of the non-repetitive portion of the genome and has a relatively short (16.7 kb) contig N50 length. These downsides make it difficult to apply this reference in transcriptomic or whole-genome re-sequencing analyses. Here we present an improved hybrid de novo genomic assembly of apple (Golden Delicious), which was obtained from 76 Gb (~102× genome coverage) Illumina HiSeq data and 21.7 Gb (~29× genome coverage) PacBio data. The final draft genome is approximately 632.4 Mb, representing ~90% of the estimated genome. The contig N50 size is 111,619 bp, representing a 7-fold improvement. Further annotation analyses predicted 53,922 protein-coding genes and 2,765 non-coding RNA genes. The new apple genome assembly will serve as a valuable resource for investigating complex apple traits at the genomic level. It is suitable not only for genome editing and gene cloning, but also for RNA-seq and whole-genome re-sequencing studies.
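
    Contig N50 is the comparison metric used above: the length L such that contigs of length L or longer together hold at least half of the assembled bases. The sketch below computes it from a list of contig lengths (the toy lengths are hypothetical) and checks the reported improvement, assuming the earlier 16.7 kb N50 is exactly 16,700 bp.

```python
def n50(lengths: list[int]) -> int:
    """Length L such that contigs of length >= L hold at least half of all assembled bases."""
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest until half the assembly is accounted for.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


print(n50([100, 80, 60, 40, 20]))   # 80
# Ratio of the new to the old contig N50 (old value taken as 16,700 bp).
print(round(111_619 / 16_700, 1))   # 6.7, i.e. roughly the 7-fold gain reported
```

    N50 rewards a few long contigs, so it is usually read alongside total assembly size and completeness, as the abstract does with its ~90% figure.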

  9. Genetic Diversity, Molecular Phylogeny, and Selection Evidence of Jinchuan Yak Revealed by Whole-Genome Resequencing

    PubMed Central

    Lan, Daoliang; Xiong, Xianrong; Mipam, Tserang-Donko; Fu, Changxiu; Li, Qiang; Ai, Yi; Hou, Dingchao; Chai, Zhixin; Zhong, Jincheng; Li, Jian

    2018-01-01

    Jinchuan yak, a newly discovered yak breed, not only possesses a large proportion of multi-ribs but also exhibits many good characteristics, such as high meat production, milk yield, and reproductive performance. However, there is limited information about its overall genetic structure, relationship with yaks in other areas, and possible origins and evolutionary processes. In this study, 7,693,689 high-quality single-nucleotide polymorphisms were identified by resequencing the genome of Jinchuan yak. Principal component and population genetic structure analyses showed that Jinchuan yak could be distinguished as an independent population among the domestic yak population. Linkage disequilibrium analysis showed that LD decayed most slowly in Jinchuan yak among the domestic yak breeds, indicating that the degree of domestication and selection intensity of Jinchuan yak were higher than those of other yak breeds. Combined with archaeological data, we speculated that the origin of domestication of Jinchuan yak was ∼6000 yr ago (4000–10,000 yr ago). The quantitative dynamics of population growth history in Jinchuan yak were similar to those of other breeds of domestic and wild yaks, but closer to those of the wild yak. There was no significant gene exchange between Jinchuan and other domestic yaks. Compared with other domestic yaks, Jinchuan yak possessed 339 significantly and positively selected genes, several of which relate to physiological rhythm, histones, and the breed’s excellent production characteristics. Our results provide a basis for the discovery of the evolution, molecular origin, and unique traits of Jinchuan yak. PMID:29339406
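
    LD decay describes how the average correlation between pairs of SNPs falls as the physical distance between them grows; a slower decay means LD persists over longer stretches. The usual pairwise measure is r² = D² / (pA(1 − pA) pB(1 − pB)), with D = pAB − pA·pB. A minimal sketch, assuming phased haplotypes with alleles coded 0/1 (the haplotype lists are hypothetical), is shown here. Decay curves are then built by averaging r² within distance bins.

```python
def ld_r2(haplotypes: list[tuple[int, int]]) -> float:
    """Pairwise LD r^2 for two biallelic SNPs from phased haplotypes coded 0/1."""
    n = len(haplotypes)
    if n == 0:
        raise ValueError("no haplotypes given")
    p_a = sum(h[0] for h in haplotypes) / n               # allele-1 frequency at SNP A
    p_b = sum(h[1] for h in haplotypes) / n               # allele-1 frequency at SNP B
    p_ab = sum(1 for h in haplotypes if h == (1, 1)) / n  # frequency of the 1-1 haplotype
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    d = p_ab - p_a * p_b                                  # linkage disequilibrium coefficient D
    return d * d / denom


coupled = [(1, 1), (1, 1), (0, 0), (0, 0)]       # alleles always travel together
independent = [(1, 1), (1, 0), (0, 1), (0, 0)]   # all combinations equally common
print(ld_r2(coupled))      # 1.0
print(ld_r2(independent))  # 0.0
```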

  10. Genetic Diversity, Molecular Phylogeny, and Selection Evidence of Jinchuan Yak Revealed by Whole-Genome Resequencing.

    PubMed

    Lan, Daoliang; Xiong, Xianrong; Mipam, Tserang-Donko; Fu, Changxiu; Li, Qiang; Ai, Yi; Hou, Dingchao; Chai, Zhixin; Zhong, Jincheng; Li, Jian

    2018-03-02

    Jinchuan yak, a newly discovered yak breed, not only possesses a large proportion of multi-ribs but also exhibits many good characteristics, such as high meat production, milk yield, and reproductive performance. However, there is limited information about its overall genetic structure, relationship with yaks in other areas, and possible origins and evolutionary processes. In this study, 7,693,689 high-quality single-nucleotide polymorphisms were identified by resequencing the genome of Jinchuan yak. Principal component and population genetic structure analyses showed that Jinchuan yak could be distinguished as an independent population among the domestic yak population. Linkage disequilibrium analysis showed that LD decayed most slowly in Jinchuan yak among the domestic yak breeds, indicating that the degree of domestication and selection intensity of Jinchuan yak were higher than those of other yak breeds. Combined with archaeological data, we speculated that the origin of domestication of Jinchuan yak was ∼6000 yr ago (4000-10,000 yr ago). The quantitative dynamics of population growth history in Jinchuan yak were similar to those of other breeds of domestic and wild yaks, but closer to those of the wild yak. There was no significant gene exchange between Jinchuan and other domestic yaks. Compared with other domestic yaks, Jinchuan yak possessed 339 significantly and positively selected genes, several of which relate to physiological rhythm, histones, and the breed's excellent production characteristics. Our results provide a basis for the discovery of the evolution, molecular origin, and unique traits of Jinchuan yak. Copyright © 2018 Lan et al.

  11. Analysis of dust samples from the Middle East using high-density resequencing micro-array RPM-TEI

    NASA Astrophysics Data System (ADS)

    Leski, T. A.; Gregory, M. J.; Malanoski, A. P.; Smith, J. P.; Glaven, R. H.; Wang, Z.; Stenger, D. A.; Lin, B.

    2010-04-01

    A previously developed resequencing microarray, "Tropical and Emerging Infections (RPM-TEI v.1.0 chip)", designed to identify and discriminate between tropical diseases and other potential biothreat agents, their near-neighbor species, and/or potential confounders, was used to characterize the microbes present in the silt/clay fraction of surface soils and airborne dust collected from the Middle East. Local populations and U.S. military personnel deployed to the Middle East are regularly exposed to high levels of airborne desert dust containing a significant fraction of inhalable particles, and some of those exposed require clinical care. Not all of the clinical symptoms can be directly attributed to the physical action of material in the human respiratory tract. To better understand the potential health effects of the airborne dust, the composition of the microbial communities associated with surface soil and/or airborne dust (air filter) samples from 19 different sites in Iraq and Kuwait was identified using RPM-TEI v.1.0. Results indicated that several microorganisms, including a group of rapidly growing Mycobacterium, Bacillus, Brucella, Clostridium and Coxiella burnetii, were present in the samples. The presence of these organisms in the surface soils and the inhalable fraction of airborne dust analyzed may pose a human health risk and warrants further investigation. Better understanding of the factors influencing the composition of these microbial communities is important to address questions related to human health and is critical to achieving Force Health Protection for the Warfighter operating in the Middle East, Afghanistan, North Africa and other arid regions.

  12. Genome-wide patterns of recombination, linkage disequilibrium and nucleotide diversity from pooled resequencing and single nucleotide polymorphism genotyping unlock the evolutionary history of Eucalyptus grandis.

    PubMed

    Silva-Junior, Orzenil B; Grattapaglia, Dario

    2015-11-01

    We used high-density single nucleotide polymorphism (SNP) data and whole-genome pooled resequencing to examine the landscape of population recombination (ρ) and nucleotide diversity (θw), assess the extent of linkage disequilibrium (r²) and build the highest density linkage maps for Eucalyptus. At the genome-wide level, linkage disequilibrium (LD) decayed within c. 4-6 kb, slower than previously reported from candidate gene studies, but showing considerable variation from absence to complete LD up to 50 kb. A sharp decrease in the estimate of ρ was seen when going from short to genome-wide inter-SNP distances, highlighting the dependence of this parameter on the scale of observation adopted. Recombination was correlated with nucleotide diversity, gene density and distance from the centromere, with hotspots of recombination enriched for genes involved in chemical reactions and pathways of the normal metabolic processes. The high nucleotide diversity (θw = 0.022) of E. grandis revealed that mutation is more important than recombination in shaping its genomic diversity (ρ/θw = 0.645). Chromosome-wide ancestral recombination graphs allowed us to date the split of E. grandis (1.7-4.8 million yr ago) and identify a scenario for the recent demographic history of the species. Our results have considerable practical importance to Genome Wide Association Studies (GWAS), while indicating bright prospects for genomic prediction of complex phenotypes in eucalypt breeding. © 2015 The Authors. New Phytologist © 2015 New Phytologist Trust.
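
    The diversity statistic θw is Watterson's estimator, θw = S / aₙ with aₙ = Σ_{i=1}^{n-1} 1/i, where S is the number of segregating sites among n sampled sequences, usually divided by the number of sites to give a per-base value. The sketch below applies that textbook formula to a toy sample of individual sequences (pooled data need extra corrections not shown) and multiplies the two reported values to recover the per-site ρ implied by ρ/θw = 0.645. That ρ value is derived arithmetic, not a figure stated in the abstract.

```python
def harmonic_number(n_sequences: int) -> float:
    """a_n = sum over i = 1 .. n-1 of 1/i, the normalising constant in Watterson's estimator."""
    return sum(1.0 / i for i in range(1, n_sequences))


def watterson_theta_per_site(segregating_sites: int, n_sequences: int, n_sites: int) -> float:
    """Watterson's theta per site: S / a_n / L."""
    if n_sequences < 2 or n_sites <= 0:
        raise ValueError("need at least two sequences and a positive number of sites")
    return segregating_sites / harmonic_number(n_sequences) / n_sites


# Toy example: 11 segregating sites among 4 sequences over a 250-bp window.
print(watterson_theta_per_site(11, 4, 250))  # approximately 0.024

# Values reported above; the implied per-site rho is a derived quantity.
theta_w = 0.022
rho_over_theta = 0.645
print(rho_over_theta * theta_w)              # approximately 0.0142
```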

  13. Medical Sequencing at the extremes of Human Body Mass

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ahituv, Nadav; Kavaslar, Nihan; Schackwitz, Wendy

    2006-09-01

    Body weight is a quantitative trait with significant heritability in humans. To identify potential genetic contributors to this phenotype, we resequenced the coding exons and splice junctions of 58 genes in 379 obese and 378 lean individuals. Our 96 Mb survey included 21 genes associated with monogenic forms of obesity in humans or mice, as well as 37 genes that function in body weight-related pathways. We found that the monogenic obesity-associated gene group was enriched for rare nonsynonymous variants unique to the obese (n=46) versus lean (n=26) populations. Computational analysis further predicted a significantly greater fraction of deleterious variants within the obese cohort. Consistent with the complex inheritance of body weight, we did not observe obvious familial segregation in the majority of the 28 available kindreds. Taken together, these data suggest that multiple rare alleles with variable penetrance contribute to obesity in the population and provide a deep medical sequencing based approach to detect them.
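
    One simple way to ask whether 46 obese-only versus 26 lean-only rare nonsynonymous variants is more lopsided than chance is to treat each cohort-unique variant as a fair coin toss between the two nearly equal-sized cohorts (379 and 378 individuals) and run an exact two-sided binomial test. This is a hypothetical illustration of the reasoning, not the statistical test the authors used, and it ignores the small difference in cohort size.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test: total probability of outcomes no more likely than k."""
    if not 0 <= k <= n:
        raise ValueError("k must lie between 0 and n")
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    cutoff = probs[k] * (1 + 1e-7)  # small tolerance so exact ties are counted
    return min(1.0, sum(q for q in probs if q <= cutoff))


obese_only, lean_only = 46, 26
p_value = binomial_two_sided_p(obese_only, obese_only + lean_only)
print(f"two-sided exact binomial p = {p_value:.4f}")
```

    If SciPy is available, `scipy.stats.binomtest` offers the same exact test with confidence intervals.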

  14. From cancer genomes to cancer models: bridging the gaps

    PubMed Central

    Baudot, Anaïs; Real, Francisco X.; Izarzugaza, José M. G.; Valencia, Alfonso

    2009-01-01

    Cancer genome projects are now being expanded in an attempt to provide complete landscapes of the mutations that exist in tumours. Although the importance of cataloguing genome variations is well recognized, there are obvious difficulties in bridging the gaps between high-throughput resequencing information and the molecular mechanisms of cancer evolution. Here, we describe the current status of the high-throughput genomic technologies, and the current limitations of the associated computational analysis and experimental validation of cancer genetic variants. We emphasize how the current cancer-evolution models will be influenced by the high-throughput approaches, in particular through efforts devoted to monitoring tumour progression, and how, in turn, the integration of data and models will be translated into mechanistic knowledge and clinical applications. PMID:19305388

  15. High-density genetic map using whole-genome resequencing for fine mapping and candidate gene discovery for disease resistance in peanut.

    PubMed

    Agarwal, Gaurav; Clevenger, Josh; Pandey, Manish K; Wang, Hui; Shasidhar, Yaduru; Chu, Ye; Fountain, Jake C; Choudhary, Divya; Culbreath, Albert K; Liu, Xin; Huang, Guodong; Wang, Xingjun; Deshmukh, Rupesh; Holbrook, C Corley; Bertioli, David J; Ozias-Akins, Peggy; Jackson, Scott A; Varshney, Rajeev K; Guo, Baozhu

    2018-04-10

    Whole-genome resequencing (WGRS) of mapping populations has facilitated development of high-density genetic maps essential for fine mapping and candidate gene discovery for traits of interest in crop species. Leaf spots, including early leaf spot (ELS) and late leaf spot (LLS), and Tomato spotted wilt virus (TSWV) are devastating diseases in peanut causing significant yield loss. We generated WGRS data on a recombinant inbred line population, developed a SNP-based high-density genetic map, and conducted fine mapping, candidate gene discovery and marker validation for ELS, LLS and TSWV. The first sequence-based high-density map was constructed with 8869 SNPs assigned to 20 linkage groups, representing 20 chromosomes, for the 'T' population (Tifrunner × GT-C20) with a map length of 3120 cM and an average distance of 1.45 cM. The quantitative trait locus (QTL) analysis using high-density genetic map and multiple season phenotyping data identified 35 main-effect QTLs with phenotypic variation explained (PVE) from 6.32% to 47.63%. Among major-effect QTLs mapped, there were two QTLs for ELS on B05 with 47.42% PVE and B03 with 47.38% PVE, two QTLs for LLS on A05 with 47.63% and B03 with 34.03% PVE and one QTL for TSWV on B09 with 40.71% PVE. The epistasis and environment interaction analyses identified significant environmental effects on these traits. The identified QTL regions had disease resistance genes including R-genes and transcription factors. KASP markers were developed for major QTLs and validated in the population and are ready for further deployment in genomics-assisted breeding in peanut. © 2018 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.

  16. Whole genome resequencing of a laboratory-adapted Drosophila melanogaster population sample

    PubMed Central

    Gilks, William P.; Pennell, Tanya M.; Flis, Ilona; Webster, Matthew T.; Morrow, Edward H.

    2016-01-01

    As part of a study into the molecular genetics of sexually dimorphic complex traits, we used high-throughput sequencing to obtain data on genomic variation in an outbred laboratory-adapted fruit fly (Drosophila melanogaster) population. We successfully resequenced the whole genome of 220 hemiclonal females that were heterozygous for the same Berkeley reference line genome (BDGP6/dm6) and a unique haplotype from the outbred base population (LHM). The use of a static and known genetic background enabled us to obtain sequences from whole-genome phased haplotypes. We used a BWA-Picard-GATK pipeline to map sequence reads to the dm6 reference genome assembly at a median depth of coverage of 31X, and have made the resulting data publicly available in the NCBI Short Read Archive (Accession number SRP058502). We used HaplotypeCaller to discover and genotype 1,726,931 small genomic variants (SNPs and indels, <200bp). Additionally, we detected and genotyped 167 large structural variants (1-100Kb in size) using GenomeStrip/2.0. Sequence and genotype data are publicly available at the corresponding NCBI databases: Short Read Archive, dbSNP and dbVar (BioProject PRJNA282591). We have also released the unfiltered genotype data, and the code and logs for data processing and summary statistics (https://zenodo.org/communities/sussex_drosophila_sequencing/). PMID:27928499

  17. Identification of the maize gravitropism gene lazy plant1 by a transposon-tagging genome resequencing strategy.

    PubMed

    Howard, Thomas P; Hayward, Andrew P; Tordillos, Anthony; Fragoso, Christopher; Moreno, Maria A; Tohme, Joe; Kausch, Albert P; Mottinger, John P; Dellaporta, Stephen L

    2014-01-01

    Since their initial discovery, transposons have been widely used as mutagens for forward and reverse genetic screens in a range of organisms. The problems of high copy number and sequence divergence among related transposons have often limited the efficiency at which tagged genes can be identified. A method was developed to identify the locations of Mutator (Mu) transposons in the Zea mays genome using a simple enrichment method combined with genome resequencing to identify transposon junction fragments. The sequencing library was prepared from genomic DNA by digesting with a restriction enzyme that cuts within a perfectly conserved motif of the Mu terminal inverted repeats (TIR). Paired-end reads containing Mu TIR sequences were computationally identified and chromosomal sequences flanking the transposon were mapped to the maize reference genome. This method has been used to identify Mu insertions in a number of alleles and to isolate the previously unidentified lazy plant1 (la1) gene. The la1 gene is required for the negatively gravitropic response of shoots and mutant plants lack the ability to sense gravity. Using bioinformatic and fluorescence microscopy approaches, we show that the la1 gene encodes a cell membrane and nuclear localized protein. Our Mu-Taq method is readily adaptable to identify the genomic locations of any insertion of a known sequence in any organism using any sequencing platform.

  18. Genomic dissection of a ‘Fuji’ apple cultivar: re-sequencing, SNP marker development, definition of haplotypes, and QTL detection

    PubMed Central

    Kunihisa, Miyuki; Moriya, Shigeki; Abe, Kazuyuki; Okada, Kazuma; Haji, Takashi; Hayashi, Takeshi; Kawahara, Yoshihiro; Itoh, Ryutaro; Itoh, Takeshi; Katayose, Yuichi; Kanamori, Hiroyuki; Matsumoto, Toshimi; Mori, Satomi; Sasaki, Harumi; Matsumoto, Takashi; Nishitani, Chikako; Terakami, Shingo; Yamamoto, Toshiya

    2016-01-01

    ‘Fuji’ is one of the most popular and highly produced apple cultivars worldwide, and has been frequently used in breeding programs. The development of genotypic markers for the preferable phenotypes of ‘Fuji’ is required. Here, we aimed to define the haplotypes of ‘Fuji’ and find associations between haplotypes and phenotypes of five traits (harvest day, fruit weight, acidity, degree of watercore, and flesh mealiness) by using 115 accessions related to ‘Fuji’. Through re-sequencing of the ‘Fuji’ genome, a total of 2,820,759 variants, including single nucleotide polymorphisms (SNPs) and insertions or deletions (indels), were detected between ‘Fuji’ and the ‘Golden Delicious’ reference genome. We selected 1,014 mapping-validated SNPs, most of which were heterozygous in ‘Fuji’ and capable of distinguishing alleles inherited from the parents of ‘Fuji’ (i.e., ‘Ralls Janet’ and ‘Delicious’). We used these SNPs to define the haplotypes of ‘Fuji’ and trace their inheritance in relatives, which were shown to carry, on average, 27% of the ‘Fuji’ genome. Analysis of variance (ANOVA) based on ‘Fuji’ haplotypes identified one quantitative trait locus (QTL) each for harvest time, acidity, degree of watercore, and mealiness. A haplotype from ‘Delicious’ chr14 was considered to dominantly cause watercore, and one from ‘Ralls Janet’ chr1 was related to low mealiness. PMID:27795675
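
    A haplotype-based ANOVA compares phenotype values among accessions grouped by which ‘Fuji’ haplotype they carry at a locus; a large F statistic (between-group variance relative to within-group variance) flags a candidate QTL. The sketch below computes the one-way F statistic from scratch. The grouping into carriers and non-carriers and the trait values are hypothetical, and the authors' exact model may differ.

```python
from statistics import fmean


def one_way_anova_f(groups: list[list[float]]) -> float:
    """One-way ANOVA F statistic: between-group mean square over within-group mean square."""
    if len(groups) < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")
    values = [x for g in groups for x in g]
    grand_mean = fmean(values)
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    if df_within <= 0 or ss_within == 0:
        raise ValueError("within-group variation is undefined or zero")
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical trait values for accessions with and without a given 'Fuji' haplotype.
carriers = [1, 2, 3]
non_carriers = [4, 5, 6]
print(one_way_anova_f([carriers, non_carriers]))  # 13.5
```

    The F value is compared against an F distribution with (groups − 1, samples − groups) degrees of freedom; `scipy.stats.f_oneway` returns both the statistic and its p-value.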

  19. Identification of the Maize Gravitropism Gene lazy plant1 by a Transposon-Tagging Genome Resequencing Strategy

    PubMed Central

    Howard, Thomas P.; Hayward, Andrew P.; Tordillos, Anthony; Fragoso, Christopher; Moreno, Maria A.; Tohme, Joe; Kausch, Albert P.; Mottinger, John P.; Dellaporta, Stephen L.

    2014-01-01

    Since their initial discovery, transposons have been widely used as mutagens for forward and reverse genetic screens in a range of organisms. The problems of high copy number and sequence divergence among related transposons have often limited the efficiency at which tagged genes can be identified. A method was developed to identify the locations of Mutator (Mu) transposons in the Zea mays genome using a simple enrichment method combined with genome resequencing to identify transposon junction fragments. The sequencing library was prepared from genomic DNA by digesting with a restriction enzyme that cuts within a perfectly conserved motif of the Mu terminal inverted repeats (TIR). Paired-end reads containing Mu TIR sequences were computationally identified and chromosomal sequences flanking the transposon were mapped to the maize reference genome. This method has been used to identify Mu insertions in a number of alleles and to isolate the previously unidentified lazy plant1 (la1) gene. The la1 gene is required for the negatively gravitropic response of shoots and mutant plants lack the ability to sense gravity. Using bioinformatic and fluorescence microscopy approaches, we show that the la1 gene encodes a cell membrane and nuclear localized protein. Our Mu-Taq method is readily adaptable to identify the genomic locations of any insertion of a known sequence in any organism using any sequencing platform. PMID:24498020

  20. HTSstation: a web application and open-access libraries for high-throughput sequencing data analysis.

    PubMed

    David, Fabrice P A; Delafontaine, Julien; Carat, Solenne; Ross, Frederick J; Lefebvre, Gregory; Jarosz, Yohan; Sinclair, Lucas; Noordermeer, Daan; Rougemont, Jacques; Leleu, Marion

    2014-01-01

    The HTSstation analysis portal is a suite of simple web forms coupled to modular analysis pipelines for various applications of High-Throughput Sequencing, including ChIP-seq, RNA-seq, 4C-seq and re-sequencing. HTSstation allows biologists to rapidly investigate their HTS data using an intuitive web application with heuristically pre-defined parameters. A number of open-source software components have been implemented and can be used to build, configure and run HTS analysis pipelines reactively. In addition, our programming framework allows developers to design their own workflows and integrate additional third-party software. The HTSstation web application is accessible at http://htsstation.epfl.ch.

  1. HTSstation: A Web Application and Open-Access Libraries for High-Throughput Sequencing Data Analysis

    PubMed Central

    David, Fabrice P. A.; Delafontaine, Julien; Carat, Solenne; Ross, Frederick J.; Lefebvre, Gregory; Jarosz, Yohan; Sinclair, Lucas; Noordermeer, Daan; Rougemont, Jacques; Leleu, Marion

    2014-01-01

    The HTSstation analysis portal is a suite of simple web forms coupled to modular analysis pipelines for various applications of High-Throughput Sequencing including ChIP-seq, RNA-seq, 4C-seq and re-sequencing. HTSstation allows biologists to rapidly investigate their HTS data using an intuitive web application with heuristically pre-defined parameters. A number of open-source software components have been implemented and can be used to build, configure and run HTS analysis pipelines reactively. In addition, our programming framework enables developers to design their own workflows and integrate additional third-party software. The HTSstation web application is accessible at http://htsstation.epfl.ch. PMID:24475057

  2. Population genomics provide insights into the evolution and adaptation of the eastern honey bee (Apis cerana).

    PubMed

    Chen, Chao; Wang, Huihua; Liu, Zhiguang; Chen, Xiao; Tang, Jiao; Meng, Fanming; Shi, Wei

    2018-06-20

    The mechanisms by which organisms adapt to variable environments are a fundamental question in evolutionary biology and are important for protecting key species in the face of a changing climate. An interesting candidate for studying this question is the honey bee Apis cerana, a keystone pollinator that is widely distributed across a large variety of climates and exhibits rapid dispersal. Here, we re-sequenced the genomes of 180 A. cerana individuals from eighteen populations throughout China. Using a population genomics approach, we observed considerable genetic variation in A. cerana. Patterns of genetic differentiation indicate high divergence at the subspecies level, and physical barriers rather than distance are the driving force for population divergence. Estimates of divergence time suggested that the main branches diverged between 300 and 500 ka. Analyses of population history revealed a substantial influence of the Earth's climate on the effective population size of A. cerana, as increased population sizes were observed during warmer periods. Further analyses identified candidate genes under natural selection that are potentially related to honey bee cognition, temperature adaptation, and olfaction. Based on our results, A. cerana may have great potential to respond to climate change. Our study provides fundamental knowledge of the evolution and adaptation of A. cerana.

  3. Genetic Variation in FABP4 and Evaluation of Its Effects on Beef Cattle Fat Content.

    PubMed

    Goszczynski, Daniel E; Papaleo-Mazzucco, Juliana; Ripoli, María V; Villarreal, Edgardo L; Rogberg-Muñoz, Andrés; Mezzadra, Carlos A; Melucci, Lilia M; Giovambattista, Guillermo

    2017-07-03

    FABP4 is a protein primarily expressed in adipocytes and macrophages that plays a key role in fatty acid trafficking and lipid hydrolysis. FABP4 gene polymorphisms have been associated with meat quality traits in cattle, mostly in Asian breeds under feedlot conditions. The objectives of this work were to characterize FABP4 genetic variation in several worldwide cattle breeds and to evaluate possible genotype effects on fat content in a pasture-fed crossbred (Angus-Hereford-Limousin) population. We re-sequenced 43 unrelated animals from nine cattle breeds (Angus, Brahman, Creole, Hereford, Holstein, Limousin, Nelore, Shorthorn, and Wagyu) and identified 22 single nucleotide polymorphisms (SNPs) over 3,164 bp, including four novel polymorphisms. Haplotype and linkage disequilibrium analyses revealed high variability. Five SNPs were selected for validation and association studies in our crossbred population. Four SNPs showed well-balanced allele frequencies (minor allele frequency > 0.159), and three showed no significant deviation from Hardy-Weinberg proportions. SNPs showed significant effects on backfat thickness and fatty acid composition (P < 0.05). The protein-structure consequences of one of the missense SNPs were analyzed to elucidate its possible effect on fat content in our study population. Our results revealed a possible blockage of the fatty acid binding site by the missense mutation.
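
    The allele-frequency filter and Hardy-Weinberg check mentioned above can be illustrated with a short sketch. It assumes biallelic SNPs summarized as genotype counts for AA, AB and BB, and uses the Pearson chi-square goodness-of-fit statistic with one degree of freedom; the abstract does not state which test the authors used, and an exact test is often preferred when genotype classes are small.

```python
def allele_frequency(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Frequency of allele A from genotype counts AA, AB and BB."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotyped individuals")
    return (2 * n_aa + n_ab) / (2 * n)


def minor_allele_frequency(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Frequency of the rarer of the two alleles."""
    p = allele_frequency(n_aa, n_ab, n_bb)
    return min(p, 1 - p)


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Pearson chi-square statistic (1 degree of freedom) comparing observed
    genotype counts with Hardy-Weinberg expectations p^2, 2pq and q^2."""
    n = n_aa + n_ab + n_bb
    p = allele_frequency(n_aa, n_ab, n_bb)
    q = 1 - p
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    if min(expected) == 0:
        raise ValueError("monomorphic locus: test undefined")
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))
```

    A statistic above 3.841 corresponds to P < 0.05 with one degree of freedom; for example, 50 AA, 0 AB and 50 BB animals give a statistic of 100, a strong heterozygote deficit.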

  4. Multiplexed resequencing analysis to identify rare variants in pooled DNA with barcode indexing using next-generation sequencer.

    PubMed

    Mitsui, Jun; Fukuda, Yoko; Azuma, Kyo; Tozaki, Hirokazu; Ishiura, Hiroyuki; Takahashi, Yuji; Goto, Jun; Tsuji, Shoji

    2010-07-01

    We have recently found that multiple rare variants of the glucocerebrosidase gene (GBA) confer a robust risk for Parkinson disease, supporting the 'common disease-multiple rare variants' hypothesis. To develop an efficient method of identifying rare variants in a large number of samples, we applied multiplexed resequencing using a next-generation sequencer to the identification of rare GBA variants. Sixteen sets of pooled DNA, each prepared from six DNA samples, were generated. Each pooled DNA set was subjected to polymerase chain reaction to amplify the 6.5-kb target gene (GBA), combined into one tube with barcode indexing, and then subjected to extensive sequence analysis using the SOLiD System. Individual samples were also subjected to direct nucleotide sequence analysis. After optimizing data processing, we were able to extract all the variants from the 96 samples with acceptable rates of false-positive single-nucleotide variants.
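
    A short calculation shows why sequencing depth matters for rare variants in pools. Assuming each pool holds six diploid individuals in equal amounts (consistent with 16 pools × 6 = 96 samples), a variant carried by a single heterozygote is expected in only 1/12 ≈ 8.3% of reads. The sketch below computes that fraction and flags a site when its alternate-read count is improbable under a sequencing-error-only binomial model. The 1% error rate and 1e-6 threshold are hypothetical defaults, not values from the study, and real pooled data also show amplification and barcode-specific biases that this model ignores.

```python
from math import exp, lgamma, log, log1p


def expected_singleton_fraction(pool_size: int) -> float:
    """Expected alternate-allele read fraction when exactly one diploid
    individual in a pool of pool_size is heterozygous (equal DNA input)."""
    return 1 / (2 * pool_size)


def binomial_tail(k: int, depth: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(depth, p), summed in log space so that
    large depths do not overflow the binomial coefficients."""
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    total = 0.0
    for i in range(max(k, 0), depth + 1):
        log_pmf = (lgamma(depth + 1) - lgamma(i + 1) - lgamma(depth - i + 1)
                   + i * log(p) + (depth - i) * log1p(-p))
        total += exp(log_pmf)
    return min(total, 1.0)


def is_candidate_variant(alt_reads: int, depth: int,
                         error_rate: float = 0.01, alpha: float = 1e-6) -> bool:
    """Flag a site whose alternate-read count is unlikely to arise from
    sequencing errors alone (error_rate and alpha are hypothetical)."""
    return binomial_tail(alt_reads, depth, error_rate) < alpha
```

    At 1,000× depth a singleton heterozygote in a six-person pool would be expected to contribute about 83 alternate reads, far above the roughly 10 expected from a 1% error rate, which is why such a site is flagged while 10 alternate reads are not.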

  5. Whole-genome resequencing of 292 pigeonpea accessions identifies genomic regions associated with domestication and agronomic traits.

    PubMed

    Varshney, Rajeev K; Saxena, Rachit K; Upadhyaya, Hari D; Khan, Aamir W; Yu, Yue; Kim, Changhoon; Rathore, Abhishek; Kim, Dongseon; Kim, Jihun; An, Shaun; Kumar, Vinay; Anuradha, Ghanta; Yamini, Kalinati Narasimhan; Zhang, Wei; Muniswamy, Sonnappa; Kim, Jong-So; Penmetsa, R Varma; von Wettberg, Eric; Datta, Swapan K

    2017-07-01

    Pigeonpea (Cajanus cajan), a tropical grain legume with low input requirements, is expected to continue to have an important role in providing food and nutritional security in developing countries in Asia, Africa and the tropical Americas. From whole-genome resequencing of 292 Cajanus accessions encompassing breeding lines, landraces and wild species, we characterize genome-wide variation. On the basis of a scan for selective sweeps, we find several genomic regions that were likely targets of domestication and breeding. Using genome-wide association analysis, we identify associations between several candidate genes and agronomically important traits. Candidate genes for these traits in pigeonpea have sequence similarity to genes functionally characterized in other plants for flowering time control, seed development and pod dehiscence. Our findings will allow acceleration of genetic gains for key traits to improve yield and sustainability in pigeonpea.

  6. Detecting Directional Selection in the Presence of Recent Admixture in African-Americans

    PubMed Central

    Lohmueller, Kirk E.; Bustamante, Carlos D.; Clark, Andrew G.

    2011-01-01

    We investigate the performance of tests of neutrality in admixed populations using plausible demographic models for African-American history as well as resequencing data from African and African-American populations. The analysis of both simulated and human resequencing data suggests that recent admixture does not result in an excess of false-positive results for neutrality tests based on the frequency spectrum after accounting for the population growth in the parental African population. Furthermore, when simulating positive selection, Tajima's D, Fu and Li's D, and haplotype homozygosity have lower power to detect population-specific selection using individuals sampled from the admixed population than from the nonadmixed population. Fay and Wu's H test, however, has more power to detect selection using individuals from the admixed population than from the nonadmixed population, especially when the selective sweep ended long ago. Our results have implications for interpreting recent genome-wide scans for positive selection in human populations. PMID:21196524
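
    Tajima's D, one of the frequency-spectrum tests named above, compares two estimators of the population mutation rate: the mean number of pairwise differences (π) and Watterson's estimator (S/a1, where S is the number of segregating sites). The sketch below computes it with the standard normalizing constants from a toy alignment. It assumes equal-length, gap-free sequences with no missing data and returns only the statistic; significance is normally judged against simulations under a demographic model, as in the study above.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs: list[str]) -> float:
    """Tajima's D for aligned, equal-length sequences without gaps."""
    n = len(seqs)
    if n < 4:
        raise ValueError("Tajima's D needs at least 4 sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    # S: number of alignment columns that are not monomorphic
    s = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if s == 0:
        raise ValueError("no segregating sites")
    # pi: average number of differences over all pairs of sequences
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))
```

    An excess of rare variants (for example a singleton in `["A", "A", "A", "T"]`) gives a negative D, as expected after population growth or a selective sweep, whereas intermediate-frequency variants (`["A", "A", "T", "T"]`) give a positive D.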

  7. Candidate adaptive genes associated with lineage divergence: identifying SNPs via next-generation targeted resequencing in mule deer (Odocoileus hemionus).

    PubMed

    Powell, John H; Amish, Stephen J; Haynes, Gwilym D; Luikart, Gordon; Latch, Emily K

    2016-09-01

    Mule deer (Odocoileus hemionus) are an excellent nonmodel species for empirically testing hypotheses in landscape and population genomics due to their large population sizes (low genetic drift), relatively continuous distribution, diversity of occupied habitats and phenotypic variation. Because few genomic resources are currently available for this species, we used exon data from a cattle (Bos taurus) reference genome to direct targeted resequencing of 5935 genes in mule deer. We sequenced approximately 3.75 Mbp at a minimum of 20X coverage in each of the seven mule deer, identifying 23,204 single nucleotide polymorphisms (SNPs) within, or adjacent to, 6886 exons in 3559 genes. We found 91 SNP loci (from 69 genes) with putatively fixed allele frequency differences between the two major lineages of mule deer (mule deer and black-tailed deer), and our estimate of mean genetic divergence (genome-wide FST = 0.123) between these lineages was consistent with previous findings using microsatellite loci. We detected an over-representation of gamete generation and amino acid transport genes among the genes with SNPs exhibiting potentially fixed allele frequency differences between lineages. This targeted resequencing approach using exon capture techniques has identified a suite of loci that can be used in future research to investigate the genomic basis of adaptation and differentiation between black-tailed deer and mule deer. This study also highlights techniques (and an exon capture array) that will facilitate population genomic research in other cervids and nonmodel organisms. © 2016 John Wiley & Sons Ltd.
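
    To make the divergence statistic quoted above concrete, the sketch below computes a simple ratio-of-averages FST, the sum over loci of (HT - HS) divided by the sum of HT, from per-population allele frequencies, and lists loci with fixed allele frequency differences. This Nei-style estimator is an assumption for illustration: the abstract does not say which estimator was used, and with only seven animals an estimator that corrects for sample size (for example Weir and Cockerham's) would normally be preferred.

```python
def fst_ratio_of_averages(p1: list[float], p2: list[float]) -> float:
    """FST = sum(H_T - H_S) / sum(H_T) across biallelic loci, where p1[i] and
    p2[i] are frequencies of the same allele in populations 1 and 2.
    No sample-size correction is applied."""
    if len(p1) != len(p2) or not p1:
        raise ValueError("need equal-length, non-empty frequency lists")
    numerator = denominator = 0.0
    for a, b in zip(p1, p2):
        p_bar = (a + b) / 2
        h_total = 2 * p_bar * (1 - p_bar)                   # pooled expected heterozygosity
        h_within = (2 * a * (1 - a) + 2 * b * (1 - b)) / 2  # mean within-population value
        numerator += h_total - h_within
        denominator += h_total
    if denominator == 0:
        raise ValueError("all loci are monomorphic in both populations")
    return numerator / denominator


def fixed_differences(p1: list[float], p2: list[float]) -> list[int]:
    """Indices of loci where one population is fixed for the allele and the other lacks it."""
    return [i for i, (a, b) in enumerate(zip(p1, p2))
            if (a == 0 and b == 1) or (a == 1 and b == 0)]
```

    A fixed difference contributes the maximum value of 1 to FST at that locus, while loci with identical frequencies contribute 0 to the numerator, so a genome-wide value such as 0.123 reflects a mixture of mostly shared and a few strongly differentiated loci.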

  8. Genomic dark matter: the reliability of short read mapping illustrated by the genome mappability score.

    PubMed

    Lee, Hayan; Schatz, Michael C

    2012-08-15

    Genome resequencing and short read mapping are two of the primary tools of genomics and are used for many important applications. The current state of the art in mapping uses base quality values and mapping quality scores to evaluate the reliability of the mapping. These attributes, however, are assigned to individual reads and do not directly measure the problematic repeats across the genome. Here, we present the Genome Mappability Score (GMS) as a novel measure of the complexity of resequencing a genome. The GMS is a weighted probability that any read could be unambiguously mapped to a given position and thus measures the overall composition of the genome itself. We have developed the Genome Mappability Analyzer to compute the GMS of every position in a genome. It leverages the parallelism of cloud computing to analyze large genomes, and enabled us to identify the 5-14% of the human, mouse, fly and yeast genomes that are difficult to analyze with short reads. We examined the accuracy of the widely used BWA/SAMtools polymorphism discovery pipeline in the context of the GMS, and found that discovery errors are dominated by false negatives, especially in regions with poor GMS. These errors are fundamental to the mapping process and cannot be overcome by increasing coverage. As such, the GMS should be considered in every resequencing project to pinpoint the 'dark matter' of the genome, including known clinically relevant variants in these regions. The source code and profiles of several model organisms are available at http://gma-bio.sourceforge.net
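
    The idea behind a per-position mappability score can be illustrated with a deliberately simplified version. The sketch below assumes error-free reads of length k taken from the forward strand only, and scores each position by the fraction of overlapping k-mers that occur exactly once in the genome. The published GMS is more general (it weights reads by mapping quality and accounts for sequencing errors), so this is an analogy, not the Genome Mappability Analyzer's algorithm.

```python
from collections import Counter


def unique_kmer_mappability(genome: str, k: int) -> list[float]:
    """For each genome position, the fraction of length-k reads covering it
    whose sequence occurs exactly once in the genome (forward strand only,
    error-free reads): a simplified, GMS-like score between 0 and 1."""
    n_kmers = len(genome) - k + 1
    if k <= 0 or n_kmers <= 0:
        raise ValueError("k must be between 1 and len(genome)")
    kmers = [genome[i:i + k] for i in range(n_kmers)]
    counts = Counter(kmers)
    unique = [counts[kmer] == 1 for kmer in kmers]
    scores = []
    for pos in range(len(genome)):
        # a read starting at i covers pos when i <= pos <= i + k - 1
        first = max(0, pos - k + 1)
        last = min(pos, n_kmers - 1)
        covering = unique[first:last + 1]
        scores.append(sum(covering) / len(covering))
    return scores
```

    Positions inside a repeat score 0 because every read covering them is ambiguous; no amount of extra coverage changes that, which mirrors the abstract's point that these errors cannot be overcome by increasing coverage.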

  9. Integrating Transcriptome and Genome Re-Sequencing Data to Identify Key Genes and Mutations Affecting Chicken Eggshell Qualities

    PubMed Central

    Liu, Long; Zheng, Chuan Wei; Wang, De He; Hou, Zhuo Cheng; Ning, Zhong Hua

    2015-01-01

    Eggshell damage leads to economic losses in the egg production industry and poses a threat to human health. We examined 49-wk-old Rhode Island White hens (Gallus gallus) that laid eggs having shells with significantly different strengths and thicknesses. We used HiSeq 2000 (Illumina) sequencing to characterize the chicken transcriptome and whole genome to identify the key genes and genetic mutations associated with eggshell calcification. We identified a total of 14,234 genes expressed in the chicken uterus, representing 89% of all annotated chicken genes. A total of 889 differentially expressed genes (DEGs) were identified by comparing hens laying eggs with low eggshell strength (LES) and normal eggshell strength (NES). The DEGs are enriched in calcification-related processes, including calcium ion transport and calcium signaling pathways, as revealed by gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis. Some important matrix proteins, such as OC-116, LTF and SPP1, were also differentially expressed between the two groups. A total of 3,671,919 single-nucleotide polymorphisms (SNPs) and 508,035 Indels were detected in protein coding genes by whole-genome re-sequencing, including 1775 non-synonymous variations and 19 frame-shift Indels in DEGs. SNPs and Indels found in this study could be further investigated for eggshell traits. This is the first report to integrate the transcriptome and genome re-sequencing to target the genetic variations that decrease eggshell quality. These findings further advance our understanding of eggshell calcification in the chicken uterus. PMID:25974068

  10. Diversity and Genome Analysis of Australian and Global Oilseed Brassica napus L. Germplasm Using Transcriptomics and Whole Genome Re-sequencing.

    PubMed

    Malmberg, M Michelle; Shi, Fan; Spangenberg, German C; Daetwyler, Hans D; Cogan, Noel O I

    2018-01-01

    Intensive breeding of Brassica napus has resulted in relatively low diversity, such that B. napus would benefit from germplasm improvement schemes that sustain diversity. As such, samples representative of global germplasm pools need to be assessed for existing population structure, diversity and linkage disequilibrium (LD). Complexity reduction genotyping-by-sequencing (GBS) methods, including GBS-transcriptomics (GBS-t), enable cost-effective screening of a large number of samples, while whole genome re-sequencing (WGR) delivers the ability to generate large numbers of unbiased genomic single nucleotide polymorphisms (SNPs), and identify structural variants (SVs). Furthermore, the development of genomic tools based on whole genomes representative of global oilseed diversity and orientated by the reference genome has substantial industry relevance and will be highly beneficial for canola breeding. As recent studies have focused on European and Chinese varieties, a global diversity panel as well as a substantial number of Australian spring types were included in this study. Focusing on industry relevance, 633 varieties were initially genotyped using GBS-t to examine population structure using 61,037 SNPs. Subsequently, 149 samples representative of global diversity were selected for WGR and both data sets used for a side-by-side evaluation of diversity and LD. The WGR data was further used to develop genomic resources consisting of a list of 4,029,750 high-confidence SNPs annotated using SnpEff, and SVs in the form of 10,976 deletions and 2,556 insertions. These resources form the basis of a reliable and repeatable system allowing greater integration between canola genomics studies, with a strong focus on breeding germplasm and industry applicability.

  11. High-throughput interpretation of gene structure changes in human and nonhuman resequencing data, using ACE

    PubMed Central

    Majoros, William H.; Campbell, Michael S.; Holt, Carson; DeNardo, Erin K.; Ware, Doreen; Allen, Andrew S.; Yandell, Mark; Reddy, Timothy E.

    2017-01-01

    Motivation: The accurate interpretation of genetic variants is critical for characterizing genotype–phenotype associations. Because the effects of genetic variants can depend strongly on their local genomic context, accurate genome annotations are essential. Furthermore, as some variants have the potential to disrupt or alter gene structure, variant interpretation efforts stand to gain from the use of individualized annotations that account for differences in gene structure between individuals or strains. Results: We describe a suite of software tools for identifying possible functional changes in gene structure that may result from sequence variants. ACE (‘Assessing Changes to Exons’) converts phased genotype calls to a collection of explicit haplotype sequences, maps transcript annotations onto them, detects gene-structure changes and their possible repercussions, and identifies several classes of possible loss of function. Novel transcripts predicted by ACE are commonly supported by spliced RNA-seq reads, and can be used to improve read alignment and transcript quantification when an individual-specific genome sequence is available. Using publicly available RNA-seq data, we show that ACE predictions confirm earlier results regarding the quantitative effects of nonsense-mediated decay, and we show that predicted loss-of-function events are highly concordant with patterns of intolerance to mutations across the human population. ACE can be readily applied to diverse species including animals and plants, making it a broadly useful tool for use in eukaryotic population-based resequencing projects, particularly for assessing the joint impact of all variants at a locus. Availability and Implementation: ACE is written in open-source C++ and Perl and is available from geneprediction.org/ACE Contact: myandell@genetics.utah.edu or tim.reddy@duke.edu Supplementary information: Supplementary information is available at Bioinformatics online. PMID:28011790
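
    The first step ACE performs, turning phased genotype calls into explicit haplotype sequences, can be sketched as follows. This is a hypothetical Python illustration, not ACE's C++/Perl code: it handles only single-nucleotide substitutions, uses 0-based positions (VCF positions are 1-based), and expects phased genotypes written as "0|1". Insertions and deletions, which shift downstream coordinates and matter greatly for gene-structure changes, are deliberately left out.

```python
def build_haplotypes(reference: str,
                     variants: list[tuple[int, str, str, str]]) -> tuple[str, str]:
    """Apply phased SNVs to a reference sequence and return both haplotypes.

    Each variant is (0-based position, ref base, alt base, phased genotype
    such as "0|1"); only single-base substitutions are handled.
    """
    haplotypes = [list(reference), list(reference)]
    for pos, ref, alt, genotype in variants:
        if len(ref) != 1 or len(alt) != 1:
            raise ValueError("only single-nucleotide variants are supported")
        if reference[pos] != ref:
            raise ValueError(f"reference mismatch at position {pos}")
        alleles = genotype.split("|")
        if len(alleles) != 2 or any(a not in ("0", "1") for a in alleles):
            raise ValueError(f"expected a phased biallelic genotype, got {genotype!r}")
        for hap, allele in zip(haplotypes, alleles):
            if allele == "1":
                hap[pos] = alt  # this haplotype carries the alternate base
    return "".join(haplotypes[0]), "".join(haplotypes[1])
```

    Once each haplotype exists as its own sequence, transcript annotations can be projected onto it and checked for changes such as premature stop codons or disrupted splice sites, which is the kind of analysis the abstract describes.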

  12. High-throughput interpretation of gene structure changes in human and nonhuman resequencing data, using ACE.

    PubMed

    Majoros, William H; Campbell, Michael S; Holt, Carson; DeNardo, Erin K; Ware, Doreen; Allen, Andrew S; Yandell, Mark; Reddy, Timothy E

    2017-05-15

    The accurate interpretation of genetic variants is critical for characterizing genotype-phenotype associations. Because the effects of genetic variants can depend strongly on their local genomic context, accurate genome annotations are essential. Furthermore, as some variants have the potential to disrupt or alter gene structure, variant interpretation efforts stand to gain from the use of individualized annotations that account for differences in gene structure between individuals or strains. We describe a suite of software tools for identifying possible functional changes in gene structure that may result from sequence variants. ACE ('Assessing Changes to Exons') converts phased genotype calls to a collection of explicit haplotype sequences, maps transcript annotations onto them, detects gene-structure changes and their possible repercussions, and identifies several classes of possible loss of function. Novel transcripts predicted by ACE are commonly supported by spliced RNA-seq reads, and can be used to improve read alignment and transcript quantification when an individual-specific genome sequence is available. Using publicly available RNA-seq data, we show that ACE predictions confirm earlier results regarding the quantitative effects of nonsense-mediated decay, and we show that predicted loss-of-function events are highly concordant with patterns of intolerance to mutations across the human population. ACE can be readily applied to diverse species including animals and plants, making it a broadly useful tool for use in eukaryotic population-based resequencing projects, particularly for assessing the joint impact of all variants at a locus. ACE is written in open-source C++ and Perl and is available from geneprediction.org/ACE. myandell@genetics.utah.edu or tim.reddy@duke.edu. Supplementary information is available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved.

  13. Application of High-Density DNA Resequencing Microarray for Detection and Characterization of Botulinum Neurotoxin-Producing Clostridia

    PubMed Central

    Vanhomwegen, Jessica; Berthet, Nicolas; Mazuet, Christelle; Guigon, Ghislaine; Vallaeys, Tatiana; Stamboliyska, Rayna; Dubois, Philippe; Kennedy, Giulia C.; Cole, Stewart T.; Caro, Valérie; Manuguerra, Jean-Claude; Popoff, Michel-Robert

    2013-01-01

    Background: Clostridium botulinum and related clostridia express extremely potent toxins known as botulinum neurotoxins (BoNTs) that cause severe, potentially lethal intoxications in humans. These BoNT-producing bacteria are categorized into seven major toxinotypes (A through G) and several subtypes. The high diversity in nucleotide sequence and genetic organization of the gene cluster encoding the BoNT components poses a great challenge for the screening and characterization of BoNT-producing strains. Methodology/Principal Findings: In the present study, we designed and evaluated the performance of a resequencing microarray (RMA), the PathogenID v2.0, combined with an automated data analysis approach for the simultaneous detection and characterization of BoNT-producing clostridia. The unique design of the PathogenID v2.0 array allows the simultaneous detection and characterization of 48 sequences targeting the BoNT gene cluster components. This approach allowed successful identification and typing of representative strains of the different toxinotypes and subtypes, as well as of the neurotoxin-producing C. botulinum strain in a naturally contaminated food sample. Moreover, the method allowed fine characterization of the different neurotoxin gene cluster components of all studied strains, including genomic regions exhibiting up to 24.65% divergence from the sequences tiled on the arrays. Conclusions/Significance: The severity of the disease demands rapid and accurate means for performing risk assessments of BoNT-producing clostridia and for tracing potential sources of contamination in outbreak situations. The RMA approach constitutes an essential higher-echelon component in a diagnostics and surveillance pipeline. In addition, it is an important asset for characterising potential outbreak-related strains as well as environmental isolates, in order to obtain a better picture of the molecular epidemiology of BoNT-producing clostridia. PMID:23818983

  14. An Efficient Approach for the Development of Locus Specific Primers in Bread Wheat (Triticum aestivum L.) and Its Application to Re-Sequencing of Genes Involved in Frost Tolerance

    PubMed Central

    Babben, Steve; Perovic, Dragan; Koch, Michael; Ordon, Frank

    2015-01-01

    Recent declines in sequencing costs have accelerated the sequencing of many species with large genomes, including hexaploid wheat (Triticum aestivum L.). Although the draft sequence of bread wheat is known, developing locus-specific primers suitable for marker-assisted selection remains one of the major challenges, due to the high homology of the three genomes. In this study we describe an efficient four-step approach for the development of locus-specific primers: (i) identification of genomic and coding sequences (CDS) of candidate genes, (ii) intron- and exon-structure reconstruction, (iii) identification of wheat A, B and D sub-genome sequences and primer development based on sequence differences between the three sub-genomes, and (iv) testing of primers for functionality, correct size and localisation. This approach was applied to single-, low- and high-copy genes involved in frost tolerance in wheat. In summary, for 27 of these genes, for which sequences were derived from Triticum aestivum, Triticum monococcum and Hordeum vulgare, a set of 119 primer pairs was developed, and after testing on nulli-tetrasomic (NT) lines, 65 primer pairs (54.6%), corresponding to 19 candidate genes, turned out to be specific. Of these, 35 fragments were selected for validation via Sanger amplicon re-sequencing. All fragments except one could be assigned to the original reference sequence. The approach presented here showed a much higher specificity in primer development than techniques used so far in bread wheat and can be applied to other polyploid species with a known draft sequence. PMID:26565976

  15. Bayesian Inference of Shared Recombination Hotspots Between Humans and Chimpanzees

    PubMed Central

    Wang, Ying; Rannala, Bruce

    2014-01-01

    Recombination generates variation and facilitates evolution. Recombination (or lack thereof) also contributes to human genetic disease. Methods for mapping genes influencing complex genetic diseases via association rely on linkage disequilibrium (LD) in human populations, which is influenced by rates of recombination across the genome. Comparative population genomic analyses of recombination using related primate species can identify factors influencing rates of recombination in humans. Such studies can indicate how variable hotspots for recombination may be both among individuals (or populations) and over evolutionary timescales. Previous studies have suggested that locations of recombination hotspots are not conserved between humans and chimpanzees. We made use of the data sets from recent resequencing projects and applied a Bayesian method for identifying hotspots and estimating recombination rates. We also reanalyzed SNP data sets for regions with known hotspots in humans using samples from the human and chimpanzee. The Bayes factors (BF) of shared recombination hotspots between human and chimpanzee across regions were obtained. Based on the analysis of the aligned regions of human chromosome 21, locations where the two species show evidence of shared recombination hotspots (with high BFs) were identified. Interestingly, previous comparative studies of human and chimpanzee that focused on the known human recombination hotspots within the β-globin and HLA regions did not find overlapping hotspots. Our results show high BFs of shared hotspots at locations within both regions, and the estimated locations of shared hotspots overlap with the locations of human recombination hotspots obtained from sperm-typing studies. PMID:25261696

  16. The Impact of Recombination Hotspots on Genome Evolution of a Fungal Plant Pathogen

    PubMed Central

    Croll, Daniel; Lendenmann, Mark H.; Stewart, Ethan; McDonald, Bruce A.

    2015-01-01

    Recombination has an impact on genome evolution by maintaining chromosomal integrity, affecting the efficacy of selection, and increasing genetic variability in populations. Recombination rates are a key determinant of the coevolutionary dynamics between hosts and their pathogens. Historic recombination events created devastating new pathogens, but the impact of ongoing recombination in sexual pathogens is poorly understood. Many fungal pathogens of plants undergo regular sexual cycles, and sex is considered to be a major factor contributing to virulence. We generated a recombination map at kilobase-scale resolution for the haploid plant pathogenic fungus Zymoseptoria tritici. To account for intraspecific variation in recombination rates, we constructed genetic maps from two independent crosses. We localized a total of 10,287 crossover events in 441 progeny and found that recombination rates were highly heterogeneous within and among chromosomes. Recombination rates on large chromosomes were inversely correlated with chromosome length. Short accessory chromosomes often lacked evidence for crossovers between parental chromosomes. Recombination was concentrated in narrow hotspots that were preferentially located close to telomeres. Hotspots were only partially conserved between the two crosses, suggesting that hotspots are short-lived and may vary according to genomic background. Hotspot regions were enriched in genes encoding secreted proteins. Population resequencing showed that chromosomal regions with high recombination rates were strongly correlated with regions of low linkage disequilibrium. Hence, genes in pathogen recombination hotspots are likely to evolve faster in natural populations and may represent a greater threat to the host. PMID:26392286
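
    The kilobase-scale recombination map described above rests on a simple conversion from crossover counts to rates. The sketch below assumes crossover positions (in bp) pooled from all progeny of one haploid cross, bins them into fixed windows, and reports cM/kb; it then labels as hotspots the windows whose rate exceeds a hypothetical fold-change threshold over the chromosome mean. The study's own hotspot criterion is not given here.

```python
def recombination_rates(crossovers: list[int], chrom_length: int,
                        n_progeny: int, window: int) -> list[float]:
    """Recombination rate (cM/kb) per fixed-size window of one chromosome.

    In a haploid cross each progeny is one meiotic product, so the fraction
    of progeny with a crossover in a window estimates the recombination
    fraction, and 100 * fraction approximates cM for short intervals.
    """
    if chrom_length <= 0 or n_progeny <= 0 or window <= 0:
        raise ValueError("lengths and counts must be positive")
    n_windows = -(-chrom_length // window)  # ceiling division
    counts = [0] * n_windows
    for pos in crossovers:
        if not 0 <= pos < chrom_length:
            raise ValueError(f"crossover position {pos} outside chromosome")
        counts[pos // window] += 1
    rates = []
    for i, count in enumerate(counts):
        size_kb = (min((i + 1) * window, chrom_length) - i * window) / 1000
        rates.append(100 * count / n_progeny / size_kb)
    return rates


def call_hotspots(rates: list[float], fold: float = 5.0) -> list[int]:
    """Indices of windows whose rate exceeds fold x the chromosome mean
    (fold is a hypothetical threshold)."""
    mean_rate = sum(rates) / len(rates)
    return [i for i, rate in enumerate(rates) if rate > fold * mean_rate]
```

    Comparing the hotspot windows called separately for each cross would then show how far hotspots are shared, the question the abstract answers with "only partially conserved".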

  17. Agricultural biodiversity in the post-genomics era

    USDA-ARS's Scientific Manuscript database

    The toolkit available for assessing and utilizing biological diversity within agricultural systems is rapidly expanding. In particular, genome and transcriptome re-sequencing as well as genome complexity reduction techniques are gaining popularity as the cost of generating short read sequence data d...

  18. Utilizing the Dog Genome in the Search for Novel Candidate Genes Involved in Glioma Development—Genome Wide Association Mapping followed by Targeted Massive Parallel Sequencing Identifies a Strongly Associated Locus

    PubMed Central

    Dickinson, Peter; Xiong, Anqi; York, Daniel; Jayashankar, Kartika; Pielberg, Gerli; Koltookian, Michele; Murén, Eva; Fuxelius, Hans-Henrik; Weishaupt, Holger; Andersson, Göran; Hedhammar, Åke; Bongcam-Rudloff, Erik; Forsberg-Nilsson, Karin

    2016-01-01

    Gliomas are the most common form of malignant primary brain tumors in humans and second most common in dogs, occurring with similar frequencies in both species. Dogs are valuable spontaneous models of human complex diseases including cancers and may provide insight into disease susceptibility and oncogenesis. Several brachycephalic breeds such as Boxer, Bulldog and Boston Terrier have an elevated risk of developing glioma, but others, including Pug and Pekingese, are not at higher risk. To identify glioma-associated genetic susceptibility factors, an across-breed genome-wide association study (GWAS) was performed on 39 dog glioma cases and 141 controls from 25 dog breeds, identifying a genome-wide significant locus on canine chromosome (CFA) 26 (p = 2.8 × 10^-8). Targeted re-sequencing of the 3.4 Mb candidate region was performed, followed by genotyping of the 56 SNVs that best fit the association pattern between the re-sequenced cases and controls. We identified three candidate genes that were highly associated with glioma susceptibility: CAMKK2, P2RX7 and DENR. CAMKK2 showed reduced expression in both canine and human brain tumors, and a non-synonymous variant in P2RX7, previously demonstrated to have a 50% decrease in receptor function, was also associated with disease. Thus, one or more of these genes appear to affect glioma susceptibility. PMID:27171399
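
    An allele-based case-control test is the simplest form of the association scan described above. In this sketch the counts are alleles rather than dogs (39 cases and 141 controls give 78 and 282 alleles); it computes a Pearson chi-square on the 2×2 table and its one-degree-of-freedom p-value. It is illustrative only: an across-breed canine GWAS would normally model breed structure and relatedness (for example with a mixed model), which this test ignores.

```python
from math import erfc, sqrt


def allelic_chi_square(case_alt: int, case_ref: int,
                       control_alt: int, control_ref: int) -> float:
    """Pearson chi-square statistic (1 df, no continuity correction) for a
    2x2 table of allele counts in cases versus controls."""
    table = [[case_alt, case_ref], [control_alt, control_ref]]
    total = sum(sum(row) for row in table)
    if total == 0:
        raise ValueError("empty table")
    row_sums = [sum(row) for row in table]
    col_sums = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            if expected == 0:
                raise ValueError("a row or column of the table is empty")
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2


def chi2_1df_p_value(chi2: float) -> float:
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom,
    using P(Z^2 > x) = erfc(sqrt(x / 2)) for a standard normal Z."""
    return erfc(sqrt(chi2 / 2))
```

    Applied across thousands of markers, each p-value would then be compared against a genome-wide significance threshold rather than 0.05.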

  19. Whole-genome re-sequencing of two Italian tomato landraces reveals sequence variations in genes associated with stress tolerance, fruit quality and long shelf-life traits.

    PubMed

    Tranchida-Lombardo, Valentina; Aiese Cigliano, Riccardo; Anzar, Irantzu; Landi, Simone; Palombieri, Samuela; Colantuono, Chiara; Bostan, Hamed; Termolino, Pasquale; Aversano, Riccardo; Batelli, Giorgia; Cammareri, Maria; Carputo, Domenico; Chiusano, Maria Luisa; Conicella, Clara; Consiglio, Federica; D'Agostino, Nunzio; De Palma, Monica; Di Matteo, Antonio; Grandillo, Silvana; Sanseverino, Walter; Tucci, Marina; Grillo, Stefania

    2017-11-14

    Tomato is a high-value crop and the primary model for fleshy fruit development and ripening. Breeding priorities include improved fruit quality, shelf life and stress tolerance. To contribute towards this goal, we re-sequenced the genomes of the Corbarino (COR) and Lucariello (LUC) landraces, which both possess the traits of plant adaptation to water deficit, prolonged fruit shelf-life and good fruit quality. Using the newly developed pipeline Reconstructor, we generated the genome sequences of COR and LUC from datasets of 65.8 M and 56.4 M paired-end reads (30-150 bp), respectively. New contigs including reads that could not be mapped to the tomato reference genome were assembled, and a total of 43,054 and 44,579 gene loci were annotated in COR and LUC. Both genomes showed novel regions with similarity to Solanum pimpinellifolium and Solanum pennellii. In addition to small deletions and insertions, 2,000 and 1,700 single nucleotide polymorphisms (SNPs) could have potentially disruptive effects on 1,371 and 1,201 genes in COR and LUC, respectively. A detailed survey of the SNPs occurring in genes related to fruit quality, shelf life and stress tolerance identified several candidates of potential relevance. Variations in ethylene response components may contribute to the distinctive phenotypes of COR and LUC. © The Author 2017. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

  20. Genome-wide SNP identification and QTL mapping for black rot resistance in cabbage.

    PubMed

    Lee, Jonghoon; Izzah, Nur Kholilatul; Jayakodi, Murukarthick; Perumal, Sampath; Joh, Ho Jun; Lee, Hyeon Ju; Lee, Sang-Choon; Park, Jee Young; Yang, Ki-Woung; Nou, Il-Sup; Seo, Joodeok; Yoo, Jaeheung; Suh, Youngdeok; Ahn, Kyounggu; Lee, Ji Hyun; Choi, Gyung Ja; Yu, Yeisoo; Kim, Heebal; Yang, Tae-Jin

    2015-02-03

    Black rot is a destructive bacterial disease causing large yield and quality losses in Brassica oleracea. To detect quantitative trait loci (QTL) for black rot resistance, we performed whole-genome resequencing of two cabbage parental lines and genome-wide SNP identification using the recently published B. oleracea genome sequences as reference. Approximately 11.5 Gb of sequencing data was produced from each parental line. Reference-guided mapping and SNP calling revealed 674,521 SNPs between the two cabbage lines, an average of one SNP per 662.5 bp. Of 167 dCAPS markers derived from candidate SNPs, 117 (70.1%) were validated as bona fide SNPs showing polymorphism between the parental lines. We then improved the resolution of a previous genetic map by adding 103 markers, including 87 SNP-based dCAPS markers. The new map is composed of 368 markers and covers 1467.3 cM, with an average interval of 3.88 cM between adjacent markers. We evaluated black rot resistance in the mapping population in three independent inoculation tests using F2:3 progenies and identified one major QTL and three minor QTLs. We report the successful use of whole-genome resequencing for large-scale SNP identification and for the development of molecular markers for genetic map construction. In addition, we identified novel QTLs for black rot resistance. The high-density genetic map will promote QTL analysis for other important agricultural traits and marker-assisted breeding of B. oleracea.
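    The SNP-density and marker-validation figures above can be cross-checked with simple arithmetic. The minimal Python sketch below multiplies the SNP count by the average spacing to get the length of reference sequence that the spacing implies (about 447 Mb; this is a derived figure, not one stated in the abstract) and recomputes the 70.1% dCAPS validation rate. The helper names are hypothetical.

```python
# Back-of-the-envelope checks on figures quoted in the cabbage abstract.
# The implied span is derived here; the abstract does not report it.


def implied_span_bp(n_snps: int, bp_per_snp: float) -> float:
    """Sequence length (bp) implied by a SNP count and an average SNP spacing."""
    return n_snps * bp_per_snp


def validation_rate(validated: int, tested: int) -> float:
    """Fraction of candidate markers confirmed as polymorphic."""
    return validated / tested


span = implied_span_bp(674_521, 662.5)
rate = validation_rate(117, 167)
print(f"implied span: {span / 1e6:.1f} Mb")  # implied span: 446.9 Mb
print(f"dCAPS validation rate: {rate:.1%}")  # dCAPS validation rate: 70.1%
```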

  1. Complete genome assemblies and methylome characterization in infectious diseases

    USDA-ARS's Scientific Manuscript database

    Understanding the genetic basis of infectious diseases is a critical component to effective treatments. Because of the rapid evolution of bacterial strains and frequent horizontal transfer of DNA between them, resequencing of new isolates against known reference strains often provides an incomplete ...

  2. Genetics Home Reference: anauxetic dysplasia

    MedlinePlus

    ... one gene that provides instructions for making a protein component of the RNase MRP enzyme complex can also cause anauxetic ... A, Donskoi M, Kenna TJ, Thomas GP, Clark GR, Duncan EL, Brown MA. Whole-exome re-sequencing in a family quartet identifies POP1 mutations as ...

  3. Becoming weeds.

    PubMed

    Stewart, C Neal

    2017-04-26

    A new resequencing analysis of weedy rice (Oryza sativa L.) biotypes illuminates distinct evolutionary paths and outcomes of de-domestication and ferality. As the largest effort to date in weedy plant genomics, it gives a better understanding of weediness while also providing a promising source of alleles for rice breeding.

  4. EDGE 2017 R&D 100 Entry with Appendix

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chain, Patrick Sam Guy; Davenport, Karen Walston; Li, Po-E

    Diabetes, infertility, cancer, and Alzheimer’s disease: the key to one day preventing or even curing such afflictions and diseases (both infectious and genetically driven) may be locked in our own genetic code and the code of the microorganisms that inhabit our bodies. The study of this code, known as genomics, has recently become much more promising as a result of two things: (1) vast improvements in high-throughput, next-generation sequencing (NGS), and (2) an exponential decrease in the cost of such sequencing. For example, it originally cost approximately $3 billion to sequence the human genome; today, this genome could be resequenced for less than $1,000.

  5. Novel recurrently mutated genes in African American colon cancers.

    PubMed

    Guda, Kishore; Veigl, Martina L; Varadan, Vinay; Nosrati, Arman; Ravi, Lakshmeswari; Lutterbaugh, James; Beard, Lydia; Willson, James K V; Sedwick, W David; Wang, Zhenghe John; Molyneaux, Neil; Miron, Alexander; Adams, Mark D; Elston, Robert C; Markowitz, Sanford D; Willis, Joseph E

    2015-01-27

    We used whole-exome and targeted sequencing to characterize somatic mutations in 103 colorectal cancers (CRCs) from African Americans, identifying 20 new genes as significantly mutated in CRC. Resequencing 129 Caucasian-derived CRCs confirmed a 15-gene set as a preferential target for mutations in African American CRCs. Two predominant genes, ephrin type A receptor 6 (EPHA6) and folliculin (FLCN), with mutations exclusive to African American CRCs, are, by genetic and biological criteria, highly likely to be African American CRC driver genes. These previously unsuspected differences in the mutational landscapes of CRCs arising among individuals of different ethnicities have the potential to affect broader disparities in cancer behavior.

  6. An Evaluation of Different Target Enrichment Methods in Pooled Sequencing Designs for Complex Disease Association Studies

    PubMed Central

    Day-Williams, Aaron G.; McLay, Kirsten; Drury, Eleanor; Edkins, Sarah; Coffey, Alison J.; Palotie, Aarno; Zeggini, Eleftheria

    2011-01-01

    Pooled sequencing can be a cost-effective approach to disease variant discovery, but its applicability in association studies remains unclear. We compare sequence enrichment methods coupled to next-generation sequencing in non-indexed pools of 1, 2, 10, 20 and 50 individuals and assess their ability to discover variants and to estimate their allele frequencies. We find that pooled resequencing is most usefully applied as a variant discovery tool due to limitations in estimating allele frequency with high enough accuracy for association studies, and that in-solution hybrid-capture performs best among the enrichment methods examined regardless of pool size. PMID:22069447
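    One reason pooled designs struggle with allele-frequency estimation is that error enters twice: once when individuals' chromosomes are sampled into the pool, and again when reads are sampled from the pool. The Python sketch below uses an idealized two-stage binomial model (an assumption, not the method of the study above) in which the variance of the read-based estimate is p(1-p)[1/(2N) + (1 - 1/(2N))/C] for N individuals and depth C. Function names are hypothetical.

```python
import math


def pooled_allele_freq(alt_reads: int, total_reads: int) -> float:
    """Point estimate of allele frequency: fraction of reads carrying the alt allele."""
    return alt_reads / total_reads


def pooled_se(p: float, n_individuals: int, depth: int) -> float:
    """Approximate standard error of a pooled allele-frequency estimate.

    Idealized two-stage binomial model (an assumption): 2N chromosomes are
    sampled from the population into the pool, then `depth` reads are drawn
    from the pool. Unequal DNA contributions, capture bias and sequencing
    errors, all of which inflate the error in practice, are ignored.
    """
    n_chrom = 2 * n_individuals
    variance = p * (1 - p) * (1 / n_chrom + (1 - 1 / n_chrom) / depth)
    return math.sqrt(variance)


# Hypothetical example: a 5% variant in a 50-person pool sequenced at 100x.
print(f"{pooled_se(0.05, 50, 100):.3f}")  # 0.031
# Even at effectively unlimited depth, the error cannot fall below sqrt(p(1-p)/2N).
print(f"{pooled_se(0.05, 50, 10**9):.3f}")  # 0.022
```

    Under these assumptions, a 5% variant is estimated with a standard error of roughly 3 percentage points, which is consistent with the abstract's conclusion that pools are better suited to variant discovery than to the precise frequency estimates that association studies need.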

  7. Haplotype Detection from Next-Generation Sequencing in High-Ploidy-Level Species: 45S rDNA Gene Copies in the Hexaploid Spartina maritima

    PubMed Central

    Boutte, Julien; Aliaga, Benoît; Lima, Oscar; Ferreira de Carvalho, Julie; Ainouche, Abdelkader; Macas, Jiri; Rousseau-Gueutin, Mathieu; Coriton, Olivier; Ainouche, Malika; Salmon, Armel

    2015-01-01

    Gene and whole-genome duplications are widespread in plant nuclear genomes, resulting in sequence heterogeneity. Identifying duplicated genes can be particularly challenging in highly redundant genomes, especially when no diploid parent is available as a reference. Here, we developed a pipeline to detect the different copies in the ribosomal RNA gene family in the hexaploid grass Spartina maritima from next-generation sequencing (Roche-454) reads. The heterogeneity of the different domains of the highly repeated 45S unit was explored by identifying single nucleotide polymorphisms (SNPs) and assembling reads based on shared polymorphisms. SNPs were validated using comparisons with Illumina sequence data sets and by cloning and Sanger (re)sequencing. Using this approach, 29 validated polymorphisms and 11 validated haplotypes were reported (out of 34 and 20, respectively, that were initially predicted by our program). The rDNA domains of S. maritima have lengths similar to those found in other Poaceae, apart from the 5′-ETS, which is approximately twice as long in S. maritima. Sequence homogeneity was observed in the coding regions and both internal transcribed spacers (ITS), whereas high intragenomic variability was detected in the intergenic spacer (IGS) and the external transcribed spacer (ETS). Molecular cytogenetic analysis by fluorescent in situ hybridization (FISH) revealed one pair of 45S rDNA signals on the chromosomes of S. maritima instead of the three pairs expected for a hexaploid genome, indicating loss of duplicated homeologous loci through the diploidization process. The procedure developed here may be used at any ploidy level and with different sequencing technologies. PMID:26530424

  8. PVRL1 as a Candidate Gene for Nonsyndromic Cleft Lip With or Without Cleft Palate: No Evidence for the Involvement of Common or Rare Variants in Southern Han Chinese Patients

    PubMed Central

    Cheng, Hong-Qiu; Huang, En-Min; Xu, Ming-Yan; Shu, Shen-You

    2012-01-01

    The poliovirus receptor related-1 (PVRL1) gene encodes nectin-1, a cell–cell adhesion molecule (OMIM #600644), and is mutated in the cleft lip with or without cleft palate/ectodermal dysplasia-1 syndrome (CLPED1, OMIM #225000). In addition, PVRL1 mutations have been associated with nonsyndromic cleft lip with or without cleft palate (NSCL/P) in studies of multiethnic samples. To investigate the possible involvement of this gene in southern Han Chinese NSCL/P patients, we performed (i) a case–control association study and (ii) a resequencing study. A total of 470 patients with NSCL/P and 693 controls were recruited, and 45 tagging single-nucleotide polymorphisms (SNPs) were genotyped by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry. In the resequencing study, the coding regions of the PVRL1 α isoform were directly sequenced in 45 trios from multiply affected families. One (rs7128327) of the 45 tested SNPs showed a trend toward statistical significance in the genotypic-level chi-square test (p = 0.009567). However, this result did not withstand correction for multiple testing. Likewise, sliding-window haplotype analyses of two, three, or four SNPs failed to detect any positive association. The resequencing analysis also failed to identify any novel rare sequence variants. In conclusion, the present study provided no support for the hypothesis that common or rare variants in PVRL1 play a significant role in NSCL/P development in the southern Han Chinese population. This is the first study to use tagging SNPs covering all the coding and noncoding regions to search for common NSCL/P-associated mutations of PVRL1. PMID:22455396
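    The abstract does not say which multiple-testing correction was applied. Assuming a Bonferroni correction over the 45 tag SNPs at α = 0.05 (an illustrative assumption), the Python sketch below shows why p = 0.009567 is not significant after correction: the per-test threshold drops to about 0.0011, and the adjusted p-value is about 0.43.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level under a Bonferroni correction."""
    return alpha / n_tests


def bonferroni_adjusted_p(p: float, n_tests: int) -> float:
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p * n_tests)


# Assumed setting: 45 tag SNPs tested, family-wise alpha of 0.05.
threshold = bonferroni_threshold(0.05, 45)
print(f"threshold: {threshold:.5f}")  # threshold: 0.00111
print(0.009567 <= threshold)  # False
print(f"adjusted p: {bonferroni_adjusted_p(0.009567, 45):.3f}")  # adjusted p: 0.431
```

    Bonferroni is the most conservative common choice. Less conservative corrections that account for correlation among tag SNPs would lower the bar somewhat, but a nominal p near 0.01 across 45 tests is unlikely to pass any of them.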

  9. Experimental evolution, genetic analysis and genome re-sequencing reveal the mutation conferring artemisinin resistance in an isogenic lineage of malaria parasites

    PubMed Central

    2010-01-01

    Background Classical and quantitative linkage analyses of genetic crosses have traditionally been used to map genes of interest, such as those conferring chloroquine or quinine resistance in malaria parasites. Next-generation sequencing technologies now present the possibility of determining genome-wide genetic variation at single base-pair resolution. Here, we combine in vivo experimental evolution, a rapid genetic strategy and whole genome re-sequencing to identify the precise genetic basis of artemisinin resistance in a lineage of the rodent malaria parasite, Plasmodium chabaudi. Such genetic markers will further the investigation of resistance and its control in natural infections of the human malaria, P. falciparum. Results A lineage of isogenic in vivo drug-selected mutant P. chabaudi parasites was investigated. By measuring the artemisinin responses of these clones, the appearance of an in vivo artemisinin resistance phenotype within the lineage was defined. The underlying genetic locus was mapped to a region of chromosome 2 by Linkage Group Selection in two different genetic crosses. Whole-genome deep coverage short-read re-sequencing (Illumina® Solexa) defined the point mutations, insertions, deletions and copy-number variations arising in the lineage. Eight point mutations arise within the mutant lineage, only one of which appears on chromosome 2. This missense mutation arises contemporaneously with artemisinin resistance and maps to a gene encoding a de-ubiquitinating enzyme. Conclusions This integrated approach facilitates the rapid identification of mutations conferring selectable phenotypes, without prior knowledge of biological and molecular mechanisms. For malaria, this model can identify candidate genes before resistant parasites are commonly observed in natural human malaria populations. PMID:20846421

  10. Comprehensive identification of mutations induced by heavy-ion beam irradiation in Arabidopsis thaliana.

    PubMed

    Hirano, Tomonari; Kazama, Yusuke; Ishii, Kotaro; Ohbu, Sumie; Shirakawa, Yuki; Abe, Tomoko

    2015-04-01

    Heavy-ion beams are widely used for mutation breeding and molecular biology. Although the mutagenic effects of heavy-ion beam irradiation have been characterized by sequence analysis of some restricted chromosomal regions or loci, there have been no evaluations at the whole-genome level or of the detailed genomic rearrangements in the mutant genomes. In this study, using array comparative genomic hybridization (array-CGH) and resequencing, we comprehensively characterized the mutations in Arabidopsis thaliana genomes irradiated with Ar or Fe ions. We subsequently used this information to investigate the mutagenic effects of the heavy-ion beams. Array-CGH demonstrated that the average number of deleted areas per genome was 1.9 and 3.7 following Ar-ion and Fe-ion irradiation, respectively, with deletion sizes ranging from 149 to 602,180 bp; 81% of the deletions were accompanied by genomic rearrangements. For a more detailed analysis, the genomes of the mutants induced by Ar-ion beam irradiation were resequenced, and all mutations, including base substitutions, duplications, in/dels, inversions, and translocations, were detected using three algorithms. All three resequenced mutants had genomic rearrangements. Of the 22 DNA fragments that contributed to the rearrangements, 19 were involved in intrachromosomal rearrangements, and multiple rearrangements were formed in localized regions of the chromosomes. Interchromosomal rearrangements were detected in the multiply rearranged regions. These results indicate that heavy-ion beams cause clustered DNA damage in the chromosome and have great potential to induce complicated intrachromosomal rearrangements. Heavy-ion beams will prove useful as unique mutagens for plant breeding and the establishment of mutant lines. © 2015 The Authors The Plant Journal © 2015 John Wiley & Sons Ltd.

  11. Identification of a novel LMF1 nonsense mutation responsible for severe hypertriglyceridemia by targeted next-generation sequencing.

    PubMed

    Cefalù, Angelo B; Spina, Rossella; Noto, Davide; Ingrassia, Valeria; Valenti, Vincenza; Giammanco, Antonina; Fayer, Francesca; Misiano, Gabriella; Cocorullo, Gianfranco; Scrimali, Chiara; Palesano, Ornella; Altieri, Grazia I; Ganci, Antonina; Barbagallo, Carlo M; Averna, Maurizio R

    Severe hypertriglyceridemia (HTG) may result from mutations in genes affecting the intravascular lipolysis of triglyceride (TG)-rich lipoproteins. The aim of this study was to develop a targeted next-generation sequencing panel for the molecular diagnosis of disorders characterized by severe HTG. We developed a customized targeted panel for the Ion Torrent Personal Genome Machine to capture the coding exons and intron/exon boundaries of 18 genes affecting the main pathways of TG synthesis and metabolism. We sequenced 11 samples of patients with severe HTG (TG > 885 mg/dL, i.e., 10 mmol/L): 4 positive controls in whom pathogenic mutations had previously been identified by Sanger sequencing and 7 patients in whom the molecular defect was still unknown. The customized panel was accurate and confirmed the genetic variants previously identified in all positive controls with primary severe HTG. Only 1 of the 7 patients with HTG was found to carry a homozygous pathogenic mutation, the third novel mutation of the LMF1 gene (c.1380C>G, p.Y460X). Clinical and molecular familial cascade screening identified 2 additional affected siblings and 7 heterozygous carriers of the mutation. Our targeted resequencing approach for the genetic diagnosis of severe HTG appears to be accurate, less time-consuming and more economical than traditional Sanger resequencing. The identification of pathogenic mutations in candidate genes remains challenging, and clinical resequencing should mainly be reserved for patients with strong clinical criteria for monogenic severe HTG. Copyright © 2017 National Lipid Association. Published by Elsevier Inc. All rights reserved.
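    The inclusion threshold above is stated in two unit systems. The small Python sketch below shows that 885 mg/dL and 10 mmol/L describe the same cut-off, up to rounding. It assumes the conventional triglyceride conversion factor of 88.57 mg/dL per mmol/L, which is not given in the abstract.

```python
# Conventional conversion factor for triglycerides (assumption: 1 mmol/L of
# triglyceride is taken as 88.57 mg/dL, based on the molar mass of triolein).
MG_DL_PER_MMOL_L = 88.57


def tg_mg_dl_to_mmol_l(mg_dl: float) -> float:
    """Convert a triglyceride concentration from mg/dL to mmol/L."""
    return mg_dl / MG_DL_PER_MMOL_L


def tg_mmol_l_to_mg_dl(mmol_l: float) -> float:
    """Convert a triglyceride concentration from mmol/L to mg/dL."""
    return mmol_l * MG_DL_PER_MMOL_L


print(f"{tg_mg_dl_to_mmol_l(885):.1f} mmol/L")  # 10.0 mmol/L
print(f"{tg_mmol_l_to_mg_dl(10):.0f} mg/dL")  # 886 mg/dL
```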

  12. Mining the LIPG Allelic Spectrum Reveals the Contribution of Rare and Common Regulatory Variants to HDL Cholesterol

    PubMed Central

    Raghavan, Avanthi; Neeli, Hemanth; Jin, Weijun; Badellino, Karen O.; Demissie, Serkalem; Manning, Alisa K.; DerOhannessian, Stephanie L.; Wolfe, Megan L.; Cupples, L. Adrienne; Li, Mingyao; Kathiresan, Sekar; Rader, Daniel J.

    2011-01-01

    Genome-wide association studies (GWAS) have successfully identified loci associated with quantitative traits, such as blood lipids. Deep resequencing studies are being used to catalogue the allelic spectrum at GWAS loci. The goal of these studies is to identify causative variants and missing heritability, including heritability due to low-frequency and rare alleles with large phenotypic impact. Whereas rare variant efforts have primarily focused on nonsynonymous coding variants, we hypothesized that noncoding variants in these loci are also functionally important. Using the HDL-C gene LIPG as an example, we explored the effect of regulatory variants identified through resequencing of subjects at HDL-C extremes on gene expression, protein levels, and phenotype. Resequencing a portion of the LIPG promoter and 5′ UTR in human subjects with extreme HDL-C, we identified several rare variants in individuals from both extremes. Luciferase reporter assays were used to measure the effect of these rare variants on LIPG expression. Variants conferring opposing effects on gene expression were enriched in opposite extremes of the phenotypic distribution. Minor alleles of a common regulatory haplotype and noncoding GWAS SNPs were associated with reduced plasma levels of the LIPG gene product endothelial lipase (EL), consistent with its role in HDL-C catabolism. Additionally, we found that a common nonfunctional coding variant associated with HDL-C (rs2000813) is in linkage disequilibrium with a 5′ UTR variant (rs34474737) that decreases LIPG promoter activity. We attribute the observed association of the coding variant with plasma EL levels and HDL-C to the gene regulatory role of rs34474737. Taken together, the findings show that both rare and common noncoding regulatory variants are important contributors to the allelic spectrum in complex trait loci. PMID:22174694

  13. CNV discovery for milk composition traits in dairy cattle using whole genome resequencing.

    PubMed

    Gao, Yahui; Jiang, Jianping; Yang, Shaohua; Hou, Yali; Liu, George E; Zhang, Shengli; Zhang, Qin; Sun, Dongxiao

    2017-03-29

    Copy number variations (CNVs) are important and widely distributed in the genome. CNV detection opens a new avenue for exploring genes associated with complex traits in humans, animals and plants. Herein, we present a genome-wide assessment of CNVs that are potentially associated with milk composition traits in dairy cattle. In this study, CNVs were detected from whole-genome re-sequencing data of eight Holstein bulls from four half- and/or full-sib families, with extremely high and low estimated breeding values (EBVs) for milk protein percentage (PP) and fat percentage (FP). Coverage depth per individual ranged from 8.2× to 11.9×. Using CNVnator, we identified a total of 14,821 CNVs, including 5025 duplications and 9796 deletions. Among them, 487 differential CNV regions (CNVRs) comprising ~8.23 Mb of the cattle genome were observed between the high and low groups. These differential CNVRs were annotated against the cattle genome reference assembly (UMD3.1), and a total of 235 functional genes were found within them. Gene Ontology and KEGG pathway analyses showed that these genes were significantly enriched for specific biological functions related to protein and lipid metabolism, the insulin/IGF pathway-protein kinase B signaling cascade, the prolactin signaling pathway and AMPK signaling pathways. These genes included INS, IGF2, FOXO3, TH, SCD5, GALNT18, GALNT16, ART3, SNCA and WNT7A, implying their potential association with milk protein and fat traits. In addition, 95 CNVRs overlapped with 75 known QTLs associated with milk protein and fat traits of dairy cattle (Cattle QTLdb). In conclusion, based on NGS of 8 Holstein bulls with extremely high and low EBVs for milk PP and FP, we identified a total of 14,821 CNVs, 487 differential CNVRs between groups, and 10 genes that were suggested as promising candidate genes for milk protein and fat traits.
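    Read-depth CNV callers rely on the idea that a region present in extra or missing copies attracts proportionally more or fewer reads. The Python sketch below is a deliberately naive illustration of that idea. It is not CNVnator's algorithm, which adds GC correction and mean-shift segmentation. Each bin's depth is normalized by the median depth and scaled to a diploid copy number. The helper names and the per-bin depths (chosen to resemble the ~10× coverage above) are hypothetical.

```python
from statistics import median

# Hypothetical helper names; a toy read-depth model, not CNVnator's algorithm.


def call_copy_number(bin_depths, ploidy=2):
    """Estimate an integer copy number for each genomic bin.

    Each bin's mean read depth is divided by the median depth across all bins
    (taken as the diploid baseline) and scaled by the ploidy. No GC-content
    correction, mappability filtering or segmentation is applied.
    """
    baseline = median(bin_depths)
    return [round(ploidy * depth / baseline) for depth in bin_depths]


def classify(copy_number, ploidy=2):
    """Label a bin as a deletion, duplication or normal relative to the ploidy."""
    if copy_number < ploidy:
        return "deletion"
    if copy_number > ploidy:
        return "duplication"
    return "normal"


# Invented per-bin depths at roughly 10x coverage, for illustration only.
depths = [10.1, 9.8, 10.3, 4.9, 5.2, 15.4, 14.8, 10.0]
copy_numbers = call_copy_number(depths)
print(copy_numbers)  # [2, 2, 2, 1, 1, 3, 3, 2]
print([classify(cn) for cn in copy_numbers])
```

    At the 8-12× coverage reported above, single-bin depth is noisy, which is why real callers merge adjacent bins into segments before calling duplications and deletions.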

  14. Performance comparison of two commercial human whole-exome capture systems on formalin-fixed paraffin-embedded lung adenocarcinoma samples.

    PubMed

    Bonfiglio, Silvia; Vanni, Irene; Rossella, Valeria; Truini, Anna; Lazarevic, Dejan; Dal Bello, Maria Giovanna; Alama, Angela; Mora, Marco; Rijavec, Erika; Genova, Carlo; Cittaro, Davide; Grossi, Francesco; Coco, Simona

    2016-08-30

    Next Generation Sequencing (NGS) has become a valuable tool for characterizing the molecular landscape of cancer genomes, leading to a better understanding of tumor onset and progression and opening new avenues in translational oncology. Formalin-fixed paraffin-embedded (FFPE) tissue is the method of choice for storing clinical samples; however, the low quality of FFPE genomic DNA (gDNA) can limit its use in downstream applications. To investigate the suitability of FFPE specimens for NGS analysis and to establish the performance of two solution-based exome capture technologies, we compared whole-exome sequencing (WES) data of gDNA extracted from 5 fresh frozen (FF) and 5 matched FFPE lung adenocarcinoma tissues using SeqCap EZ Human Exome v.3.0 (Roche NimbleGen) and SureSelect XT Human All Exon v.5 (Agilent Technologies). Sequencing metrics on Illumina HiSeq were optimal for both exome systems and comparable between FFPE and FF samples, with a slight increase in PCR duplicates in FFPE, mainly in Roche NimbleGen libraries. Single nucleotide variant (SNV) overlap between FFPE-FF pairs exceeded 90% in both systems. Both WES systems showed high concordance with targeted re-sequencing data from Ion PGM™ in 22 lung cancer genes, regardless of the sample source. Exon coverage of 623 cancer-related genes revealed high coverage efficiency for both kits, suggesting WES as a valid alternative to targeted re-sequencing. High-quality, reliable data can be obtained from WES of FFPE samples starting from a relatively low amount of input gDNA, supporting the inclusion of NGS-based tests in the clinical context. In conclusion, our analysis suggests that the WES approach could be extended to translational research as well as to the clinic (e.g., to study rare malignancies), where simultaneous analysis of the whole coding region of the genome may help in the detection of cancer-linked variants.

  15. Novel mutations and their genotype-phenotype correlations in patients with Noonan syndrome, using next-generation sequencing.

    PubMed

    Tafazoli, Alireza; Eshraghi, Peyman; Pantaleoni, Francesca; Vakili, Rahim; Moghaddassian, Morteza; Ghahraman, Martha; Muto, Valentina; Paolacci, Stefano; Golyan, Fatemeh Fardi; Abbaszadegan, Mohammad Reza

    2018-03-01

    Noonan Syndrome (NS) is an autosomal dominant disorder with many variable and heterogeneous manifestations. The genetic basis of 20-30% of cases is still unknown. This study evaluates Iranian Noonan patients both clinically and genetically for the first time. In phase one, mutational analysis of the PTPN11 gene was performed in 15 Iranian patients using PCR and Sanger sequencing. In phase two, Next Generation Sequencing (NGS) in the form of targeted resequencing was used to analyze exons of other related genes. Homology modelling was also performed for the novel mutations found. Genotype-phenotype correlation was assessed from the molecular findings and clinical features. A previously reported mutation (p.N308D) in some patients and a novel mutation (p.D155N) in one patient were identified in phase one. After applying NGS, known and new variants were found in four patients in other genes: CBL (p.V904I), KRAS (p.L53W), SOS1 (p.I1302V), and SOS1 (p.R552G). Structural studies of two deduced novel mutations in related genes revealed deficiencies in the mutated proteins. Genotype-phenotype correlation revealed a new pattern: intellectual disability was present in two patients. NS shows strongly variable expressivity along with high genetic heterogeneity, especially in distinct populations and ethnic groups, and other, as yet unknown causative genes may also exist. More comprehensive new technologies such as NGS are the best choice for detecting molecular defects in patients for genotype-phenotype correlation and disease management. Copyright © 2017 Medical University of Bialystok. Published by Elsevier B.V. All rights reserved.

  16. CYP3A variation and the evolution of salt-sensitivity variants.

    PubMed

    Thompson, E E; Kuttab-Boulos, H; Witonsky, D; Yang, L; Roe, B A; Di Rienzo, A

    2004-12-01

    Members of the cytochrome P450 3A subfamily catalyze the metabolism of endogenous substrates, environmental carcinogens, and clinically important exogenous compounds, such as prescription drugs and therapeutic agents. In particular, the CYP3A4 and CYP3A5 genes play an especially important role in pharmacogenetics, since they metabolize >50% of the drugs on the market. However, known genetic variants at these two loci are not sufficient to account for the observed phenotypic variability in drug response. We used a comparative genomics approach to identify conserved coding and noncoding regions at these genes and resequenced them in three ethnically diverse human populations. We show that remarkable interpopulation differences exist with regard to frequency spectrum and haplotype structure. The non-African samples are characterized by a marked excess of rare variants and the presence of a homogeneous group of long-range haplotypes at high frequency. The CYP3A5*1/*3 polymorphism, which is likely to influence salt and water retention and risk for salt-sensitive hypertension, was genotyped in >1,000 individuals from 52 worldwide population samples. The results reveal an unusual geographic pattern whereby the CYP3A5*3 frequency shows extreme variation across human populations and is significantly correlated with distance from the equator. Furthermore, we show that an unlinked variant, AGT M235T, previously implicated in hypertension and pre-eclampsia, exhibits a similar geographic distribution and is significantly correlated in frequency with CYP3A5*1/*3. Taken together, these results suggest that variants that influence salt homeostasis were the targets of a shared selective pressure that resulted from an environmental variable correlated with latitude.

  17. CYP3A Variation and the Evolution of Salt-Sensitivity Variants

    PubMed Central

    Thompson, E. E.; Kuttab-Boulos, H.; Witonsky, D.; Yang, L.; Roe, B. A.; Di Rienzo, A.

    2004-01-01

    Members of the cytochrome P450 3A subfamily catalyze the metabolism of endogenous substrates, environmental carcinogens, and clinically important exogenous compounds, such as prescription drugs and therapeutic agents. In particular, the CYP3A4 and CYP3A5 genes play an especially important role in pharmacogenetics, since they metabolize >50% of the drugs on the market. However, known genetic variants at these two loci are not sufficient to account for the observed phenotypic variability in drug response. We used a comparative genomics approach to identify conserved coding and noncoding regions at these genes and resequenced them in three ethnically diverse human populations. We show that remarkable interpopulation differences exist with regard to frequency spectrum and haplotype structure. The non-African samples are characterized by a marked excess of rare variants and the presence of a homogeneous group of long-range haplotypes at high frequency. The CYP3A5*1/*3 polymorphism, which is likely to influence salt and water retention and risk for salt-sensitive hypertension, was genotyped in >1,000 individuals from 52 worldwide population samples. The results reveal an unusual geographic pattern whereby the CYP3A5*3 frequency shows extreme variation across human populations and is significantly correlated with distance from the equator. Furthermore, we show that an unlinked variant, AGT M235T, previously implicated in hypertension and pre-eclampsia, exhibits a similar geographic distribution and is significantly correlated in frequency with CYP3A5*1/*3. Taken together, these results suggest that variants that influence salt homeostasis were the targets of a shared selective pressure that resulted from an environmental variable correlated with latitude. PMID:15492926

  18. Rare versus common variants in pharmacogenetics: SLCO1B1 variation and methotrexate disposition

    PubMed Central

    Ramsey, Laura B.; Bruun, Gitte H.; Yang, Wenjian; Treviño, Lisa R.; Vattathil, Selina; Scheet, Paul; Cheng, Cheng; Rosner, Gary L.; Giacomini, Kathleen M.; Fan, Yiping; Sparreboom, Alex; Mikkelsen, Torben S.; Corydon, Thomas J.; Pui, Ching-Hon; Evans, William E.; Relling, Mary V.

    2012-01-01

    Methotrexate is used to treat autoimmune diseases and malignancies, including acute lymphoblastic leukemia (ALL). Inter-individual variation in clearance of methotrexate results in heterogeneous systemic exposure, clinical efficacy, and toxicity. In a genome-wide association study of children with ALL, we identified SLCO1B1 as harboring multiple common polymorphisms associated with methotrexate clearance. The extent of influence of rare versus common variants on pharmacogenomic phenotypes remains largely unexplored. We tested the hypothesis that rare variants in SLCO1B1 could affect methotrexate clearance and compared the influence of common versus rare variants in addition to clinical covariates on clearance. From deep resequencing of SLCO1B1 exons in 699 children, we identified 93 SNPs, 15 of which were non-synonymous (NS). Three of these NS SNPs were common, with a minor allele frequency (MAF) >5%, one had low frequency (MAF 1%–5%), and 11 were rare (MAF <1%). NS SNPs (common or rare) predicted to be functionally damaging were more likely to be found among patients with the lowest methotrexate clearance than patients with high clearance. We verified lower function in vitro of four SLCO1B1 haplotypes that were associated with reduced methotrexate clearance. In a multivariate stepwise regression analysis adjusting for other genetic and non-genetic covariates, SLCO1B1 variants accounted for 10.7% of the population variability in clearance. Of that variability, common NS variants accounted for the majority, but rare damaging NS variants constituted 17.8% of SLCO1B1's effects (1.9% of total variation) and had larger effect sizes than common NS variants. Our results show that rare variants are likely to have an important effect on pharmacogenetic phenotypes. PMID:22147369
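    Two details of the SLCO1B1 abstract are easy to misread: the minor-allele-frequency bins and the nested percentages. Rare damaging variants explain 17.8% of an SLCO1B1 effect that itself explains 10.7% of clearance variability, so they account for about 1.9% of total variation, matching the figure in parentheses. The short Python sketch below encodes both. The function name is hypothetical, and the handling of MAF values exactly at 1% or 5% is an assumption, because the abstract's ranges touch at those points.

```python
def maf_class(maf: float) -> str:
    """Bin a minor allele frequency (as a fraction) using the abstract's cut-offs.

    Common: MAF > 5%; low frequency: 1%-5%; rare: < 1%. Treating exactly 1%
    and exactly 5% as low frequency is an assumption.
    """
    if maf > 0.05:
        return "common"
    if maf >= 0.01:
        return "low-frequency"
    return "rare"


# Nested share of variance: rare damaging NS variants explain 17.8% of the
# SLCO1B1 effect, which itself explains 10.7% of clearance variability.
slco1b1_share = 0.107
rare_fraction_of_slco1b1 = 0.178
print(f"{slco1b1_share * rare_fraction_of_slco1b1:.1%}")  # 1.9%
```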

  19. A tonoplast sugar transporter underlies a sugar accumulation QTL in watermelon

    USDA-ARS's Scientific Manuscript database

    The molecular mechanism controlling accumulation of soluble sugars in watermelon (Citrullus lanatus) fruit, a trait associated with sweet-dessert watermelon domestication, is still unknown. We re-sequenced 96 recombinant inbred lines, derived from a cross between sweet and unsweet watermelon accessi...

  20. [Fine mapping of complex disease susceptibility loci].

    PubMed

    Song, Qingfeng; Zhang, Hongxing; Ma, Yilong; Zhou, Gangqiao

    2014-01-01

    Genome-wide association studies (GWAS) using single nucleotide polymorphism (SNP) markers have identified more than 3800 susceptibility loci for more than 660 diseases or traits. However, the most significantly associated or causative variants in these loci, and their biological functions, remain to be clarified. Identifying these causative variants can help to elucidate the pathogenesis of complex diseases and to discover new biomarkers. One of the main goals in the post-GWAS era is to identify the causative variants and susceptibility genes and to clarify their functions by fine mapping. For common variants, imputation- or re-sequencing-based strategies have been used to increase the number of analyzed variants and to help identify the most significantly associated variants. In addition, functional element, expression quantitative trait locus (eQTL) and haplotype analyses have been performed to identify functional common variants and susceptibility genes. For rare variants, fine mapping has been carried out by re-sequencing, rare haplotype analysis, family-based analysis, burden tests and similar approaches. This review summarizes the strategies used for fine mapping and the problems they face.

  1. Detecting directional selection in the presence of recent admixture in African-Americans.

    PubMed

    Lohmueller, Kirk E; Bustamante, Carlos D; Clark, Andrew G

    2011-03-01

    We investigate the performance of tests of neutrality in admixed populations using plausible demographic models for African-American history as well as resequencing data from African and African-American populations. The analysis of both simulated and human resequencing data suggests that recent admixture does not result in an excess of false-positive results for neutrality tests based on the frequency spectrum after accounting for the population growth in the parental African population. Furthermore, when simulating positive selection, Tajima's D, Fu and Li's D, and haplotype homozygosity have lower power to detect population-specific selection using individuals sampled from the admixed population than from the nonadmixed population. Fay and Wu's H test, however, has more power to detect selection using individuals from the admixed population than from the nonadmixed population, especially when the selective sweep ended long ago. Our results have implications for interpreting recent genome-wide scans for positive selection in human populations. © 2011 by the Genetics Society of America
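
    Tajima's D, one of the frequency-spectrum statistics named above, compares two estimators of θ: the mean number of pairwise differences (π) and Watterson's estimator S/a₁, where S is the number of segregating sites. The Python sketch below implements the standard Tajima (1989) formula for aligned, gap-free sequences. It is illustrative only and is not the simulation or analysis pipeline used in the study.

```python
from itertools import combinations
from math import nan, sqrt


def tajimas_d(seqs: list[str]) -> float:
    """Tajima's D for aligned, equal-length sequences without gaps or missing data."""
    n = len(seqs)
    if n < 4:
        raise ValueError("the variance terms vanish for fewer than 4 sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to a common length")

    # S: number of segregating (polymorphic) sites.
    s = sum(1 for i in range(length) if len({seq[i] for seq in seqs}) > 1)
    if s == 0:
        return nan  # D is undefined without segregating sites

    # pi: mean number of pairwise differences over all sequence pairs.
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

    # Normalising constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    # Difference between the two theta estimators, scaled by its standard error.
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


singleton_excess = ["AAAA", "AAAA", "AAAA", "TTTT"]  # one divergent haplotype
intermediate = ["AAAA", "AAAA", "TTTT", "TTTT"]      # variants at 50% frequency
print(tajimas_d(singleton_excess) < 0, tajimas_d(intermediate) > 0)  # True True
```

    An excess of rare variants pushes D below zero, while an excess of intermediate-frequency variants pushes it above zero. This is the frequency-spectrum signal that demographic events such as growth or admixture can mimic.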

  2. Identifying disease polymorphisms from case-control genetic association data.

    PubMed

    Park, L

    2010-12-01

    In case-control association studies, it is typical to observe several associated polymorphisms in a gene region. The most significantly associated polymorphism is often assumed to be the disease polymorphism; however, it is not clear whether it truly is, or whether the region contains more than one disease polymorphism. Currently, no method addresses these problems on the basis of the linkage disequilibrium (LD) relationships between polymorphisms. To distinguish real disease polymorphisms from markers in LD with them, a method that can detect disease polymorphisms in a gene region has been developed. Relying on the LD between polymorphisms in controls, the proposed method uses model-based likelihood ratio tests to find disease polymorphisms. The method shows reliable Type I and Type II error rates when sample sizes are large enough, and works better with re-sequenced data. Applying it to fine mapping with re-sequencing or dense genotyping data would provide important information about the genetic architecture of complex traits.
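
    The method above relies on LD between polymorphisms in controls. As background only, the Python sketch below computes the common pairwise LD measure r² = D²/(p_A(1-p_A)p_B(1-p_B)) from phased two-locus haplotypes. It is not the likelihood-ratio method described in the abstract, and the input format (a list of allele pairs) is an assumption.

```python
from collections import Counter


def ld_r2(haplotypes: list[tuple[str, str]]) -> float:
    """r^2 between two biallelic loci, from phased two-locus haplotypes."""
    n = len(haplotypes)
    alleles_1 = sorted({x for x, _ in haplotypes})
    alleles_2 = sorted({y for _, y in haplotypes})
    if len(alleles_1) != 2 or len(alleles_2) != 2:
        raise ValueError("both loci must be biallelic in the sample")
    a, b = alleles_1[0], alleles_2[0]  # which allele is tracked does not affect r^2
    p_ab = Counter(haplotypes)[(a, b)] / n
    p_a = sum(1 for x, _ in haplotypes if x == a) / n
    p_b = sum(1 for _, y in haplotypes if y == b) / n
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    return d**2 / (p_a * (1 - p_a) * p_b * (1 - p_b))


# Complete LD: allele A always travels with C, and G with T.
print(ld_r2([("A", "C"), ("A", "C"), ("G", "T"), ("G", "T")]))  # 1.0
```

    A marker with r² close to 1 with the true disease polymorphism will show nearly the same association signal. This is why the most significant marker is not necessarily causal.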

  3. Resequencing a core collection of upland cotton identifies genomic variation and loci influencing fiber quality and yield.

    PubMed

    Ma, Zhiying; He, Shoupu; Wang, Xingfen; Sun, Junling; Zhang, Yan; Zhang, Guiyin; Wu, Liqiang; Li, Zhikun; Liu, Zhihao; Sun, Gaofei; Yan, Yuanyuan; Jia, Yinhua; Yang, Jun; Pan, Zhaoe; Gu, Qishen; Li, Xueyuan; Sun, Zhengwen; Dai, Panhong; Liu, Zhengwen; Gong, Wenfang; Wu, Jinhua; Wang, Mi; Liu, Hengwei; Feng, Keyun; Ke, Huifeng; Wang, Junduo; Lan, Hongyu; Wang, Guoning; Peng, Jun; Wang, Nan; Wang, Liru; Pang, Baoyin; Peng, Zhen; Li, Ruiqiang; Tian, Shilin; Du, Xiongming

    2018-05-07

    Upland cotton is the most important natural-fiber crop. The genomic variation of diverse germplasms, and the alleles underpinning fiber quality and yield, should be explored extensively. Here, we resequenced a core collection comprising 419 accessions at 6.55-fold coverage depth and identified approximately 3.66 million SNPs for evaluating genomic variation. We performed phenotyping across 12 environments and conducted a genome-wide association study of 13 fiber-related traits. A total of 7,383 unique SNPs were significantly associated with these traits and were located within or near 4,820 genes; more associated loci were detected for fiber quality than for fiber yield, and more fiber genes were detected in the D than in the A subgenome. Several previously undescribed causal genes for days to flowering, fiber length, and fiber strength were identified. Phenotypic selection for these traits increased the frequency of elite alleles during domestication and breeding. These results provide targets for molecular selection and genetic manipulation in cotton improvement.

  4. Illuminator, a desktop program for mutation detection using short-read clonal sequencing.

    PubMed

    Carr, Ian M; Morgan, Joanne E; Diggle, Christine P; Sheridan, Eamonn; Markham, Alexander F; Logan, Clare V; Inglehearn, Chris F; Taylor, Graham R; Bonthron, David T

    2011-10-01

    Current methods for sequencing clonal populations of DNA molecules yield several gigabases of data per day, typically comprising reads of < 100 nt. Such datasets permit widespread genome resequencing and transcriptome analysis or other quantitative tasks. This huge capacity can also be harnessed for the resequencing of smaller (gene-sized) target regions, through the simultaneous parallel analysis of multiple subjects using sample "tagging" or "indexing". These methods promise to have a large impact on diagnostic mutation analysis and candidate gene testing. Here we describe a software package developed for such studies, offering the ability to resolve pooled samples carrying barcode tags and to align reads to a reference sequence using a mutation-tolerant process. The program, Illuminator, can identify rare sequence variants, including insertions and deletions, and permits interactive data analysis on standard desktop computers. It facilitates the effective analysis of targeted clonal sequencer data without dedicated computational infrastructure or specialized training. Copyright © 2011 Elsevier Inc. All rights reserved.
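
    Resolving pooled samples by barcode tag can be illustrated with a simple prefix-matching scheme. The Python sketch below assumes the following: barcodes sit at the start of each read, all barcodes share one length, and one mismatch is tolerated. It is a hypothetical illustration, not Illuminator's own demultiplexing logic, which the abstract does not describe.

```python
def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(reads: list[str], barcodes: dict[str, str],
                max_mismatches: int = 1) -> dict[str, list[str]]:
    """Bin reads by a leading barcode and trim it off.

    All barcodes are assumed to have the same length. A read is assigned only
    if exactly one barcode matches within `max_mismatches`; otherwise it goes
    to the 'undetermined' bin.
    """
    size = len(next(iter(barcodes.values())))
    bins: dict[str, list[str]] = {sample: [] for sample in barcodes}
    bins["undetermined"] = []
    for read in reads:
        prefix = read[:size]
        hits = [sample for sample, code in barcodes.items()
                if len(prefix) == size and hamming(prefix, code) <= max_mismatches]
        if len(hits) == 1:
            bins[hits[0]].append(read[size:])
        else:
            bins["undetermined"].append(read)
    return bins


demo = demultiplex(["ACGTAAAA", "ACGAGGGG", "TGCACCCC", "NNNNTTTT"],
                   {"s1": "ACGT", "s2": "TGCA"})
print(demo)  # {'s1': ['AAAA', 'GGGG'], 's2': ['CCCC'], 'undetermined': ['NNNNTTTT']}
```

    Requiring a unique match keeps a read that is within tolerance of two barcodes from being assigned to either sample.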

  5. Characterization of GM events by insert knowledge adapted re-sequencing approaches

    PubMed Central

    Yang, Litao; Wang, Congmao; Holst-Jensen, Arne; Morisset, Dany; Lin, Yongjun; Zhang, Dabing

    2013-01-01

    Detection methods and data from the molecular characterization of genetically modified (GM) events are needed by stakeholders such as public risk assessors and regulators. Generally, current approaches reveal the molecular characteristics of GM events only incompletely and are biased towards detecting sequences derived from the transformation vector. GM events are classified based on available knowledge of the sequences of vectors and inserts (insert knowledge). Herein we present three insert knowledge-adapted approaches for characterizing GM events (using the rice events TT51-1 and T1c-19 as examples) based on paired-end re-sequencing, with the advantages of comprehensiveness, accuracy, and automation. The approaches revealed the comprehensive molecular characteristics of the two rice events, including additional unintended insertions compared with the results of PCR and Southern blotting. Using the developed approaches, comprehensive transgene characterization of TT51-1 and T1c-19 is shown to be independent of a priori knowledge of the insert and vector sequences. This also makes it possible to identify and characterize unknown GM events. PMID:24088728

  6. Characterization of GM events by insert knowledge adapted re-sequencing approaches.

    PubMed

    Yang, Litao; Wang, Congmao; Holst-Jensen, Arne; Morisset, Dany; Lin, Yongjun; Zhang, Dabing

    2013-10-03

    Detection methods and data from the molecular characterization of genetically modified (GM) events are needed by stakeholders such as public risk assessors and regulators. Generally, current approaches reveal the molecular characteristics of GM events only incompletely and are biased towards detecting sequences derived from the transformation vector. GM events are classified based on available knowledge of the sequences of vectors and inserts (insert knowledge). Herein we present three insert knowledge-adapted approaches for characterizing GM events (using the rice events TT51-1 and T1c-19 as examples) based on paired-end re-sequencing, with the advantages of comprehensiveness, accuracy, and automation. The approaches revealed the comprehensive molecular characteristics of the two rice events, including additional unintended insertions compared with the results of PCR and Southern blotting. Using the developed approaches, comprehensive transgene characterization of TT51-1 and T1c-19 is shown to be independent of a priori knowledge of the insert and vector sequences. This also makes it possible to identify and characterize unknown GM events.

  7. Increased CNV-Region deletions in mild cognitive impairment (MCI) and Alzheimer's disease (AD) subjects in the ADNI sample

    PubMed Central

    Guffanti, Guia; Torri, Federica; Rasmussen, Jerod; Clark, Andrew P.; Lakatos, Anita; Turner, Jessica A.; Fallon, James H.; Saykin, Andrew J.; Weiner, Michael; Vawter, Marquis P.; Knowles, James A.; Potkin, Steven G.; Macciardi, Fabio

    2014-01-01

    We investigated the genome-wide distribution of CNVs in the Alzheimer's disease (AD) Neuroimaging Initiative (ADNI) sample (146 with AD, 313 with Mild Cognitive Impairment (MCI), and 181 controls). Comparison of single CNVs between cases (MCI and AD) and controls shows overrepresentation of large heterozygous deletions in cases (p-value < 0.0001). The analysis of CNV-Regions identifies 44 copy number variable loci of heterozygous deletions, with more CNV-Regions among affected subjects than among controls (p = 0.005). Seven of the 44 CNV-Regions are nominally significant for association with cognitive impairment. We validated our main findings by genome re-sequencing of selected patients and controls. Functional pathway analysis of the genes putatively affected by deletions in CNV-Regions reveals enrichment of genes implicated in axonal guidance, cell–cell adhesion, neuronal morphogenesis and differentiation. Our findings support a role for CNVs in AD and suggest an association between large deletions and the development of cognitive impairment. PMID:23583670

  8. mQTL-seq delineates functionally relevant candidate gene harbouring a major QTL regulating pod number in chickpea

    PubMed Central

    Das, Shouvik; Singh, Mohar; Srivastava, Rishi; Bajaj, Deepak; Saxena, Maneesha S.; Rana, Jai C.; Bansal, Kailash C.; Tyagi, Akhilesh K.; Parida, Swarup K.

    2016-01-01

    The present study used a whole-genome, NGS resequencing-based mQTL-seq (multiple QTL-seq) strategy in two inter-specific mapping populations (Pusa 1103 × ILWC 46 and Pusa 256 × ILWC 46) to scan the major genomic region(s) underlying QTL(s) governing the pod number trait in chickpea. Whole-genome resequencing of parental accessions with low and high pod number, and of homozygous individuals (constituting the bulks) from each of the two mapping populations, discovered >8 million high-quality homozygous SNPs relative to the kabuli chickpea reference genome. The functional significance of the physically mapped SNPs was apparent from the 2,264 non-synonymous and 23,550 regulatory SNPs identified, with 8–10% of the SNP-carrying genes encoding transcription factors and disease resistance-related proteins. Using these SNPs in Δ(SNP-index)-based QTL-seq analysis, and correlating the results between the two mapping populations through mQTL-seq, narrowed two major genomic regions harbouring robust pod number QTLs (CaqaPN4.1: 867.8 kb and CaqaPN4.2: 1.8 Mb) down to high-resolution short QTL intervals (CaqbPN4.1: 637.5 kb and CaqbPN4.2: 1.28 Mb) on chickpea chromosome 4. Integrating one novel robust mQTL-seq-derived QTL with QTL region-specific association analysis delineated one pentatricopeptide repeat (PPR) gene, containing regulatory (C/T) and coding (C/A) SNPs, at a major QTL region regulating pod number in chickpea. This target gene showed anther-, mature pollen- and pod-specific expression, with pronounced (∼3.5-fold) up-regulation of transcript expression in the high pod number parental accessions and homozygous individuals of both mapping populations, especially during pollen and pod development. The proposed mQTL-seq-driven combinatorial strategy is effective for rapid genome-wide scanning of potential candidate gene(s) underlying trait-associated, high-resolution robust QTL(s), thereby expediting genomics-assisted breeding and genetic enhancement of crop plants, including chickpea. PMID:26685680
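
    In QTL-seq, the SNP-index at a site is the proportion of reads carrying the non-reference allele. Δ(SNP-index) is the difference between the high and low bulks, and values far from zero flag candidate QTL regions. The Python sketch below shows only this arithmetic plus a simple sliding-window average. The window and step sizes and the function names are illustrative assumptions, not the parameters used in the study.

```python
def snp_index(alt_reads: int, total_reads: int) -> float:
    """Fraction of reads at a SNP that carry the non-reference allele."""
    if total_reads <= 0:
        raise ValueError("the site has no read coverage")
    return alt_reads / total_reads


def delta_snp_index(high_bulk: tuple[int, int], low_bulk: tuple[int, int]) -> float:
    """Delta(SNP-index) = SNP-index(high bulk) - SNP-index(low bulk).

    Each bulk is an (alt_reads, total_reads) pair for the same SNP.
    """
    return snp_index(*high_bulk) - snp_index(*low_bulk)


def sliding_window_mean(positions: list[int], deltas: list[float],
                        window: int, step: int) -> list[tuple[int, float]]:
    """Mean Delta(SNP-index) per window of `window` bp, advanced by `step` bp."""
    if not positions:
        return []
    out = []
    start, last = min(positions), max(positions)
    while start <= last:
        vals = [d for p, d in zip(positions, deltas) if start <= p < start + window]
        if vals:
            out.append((start, sum(vals) / len(vals)))
        start += step
    return out


# 18 of 20 reads carry the alternative allele in the high bulk, 2 of 20 in the low bulk.
print(delta_snp_index((18, 20), (2, 20)))
```

    Averaging over windows smooths out noisy single-SNP values caused by low read depth, so that a sustained shift across a region stands out.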

  9. Standing Genetic Variation in Contingency Loci Drives the Rapid Adaptation of Campylobacter jejuni to a Novel Host

    PubMed Central

    Jerome, John P.; Bell, Julia A.; Plovanich-Jones, Anne E.; Barrick, Jeffrey E.; Brown, C. Titus; Mansfield, Linda S.

    2011-01-01

    The genome of the food-borne pathogen Campylobacter jejuni contains multiple highly mutable sites, or contingency loci. It has been suggested that standing variation at these loci is a mechanism for rapid adaptation to a novel environment, but this phenomenon has not been shown experimentally. In previous work we showed that the virulence of C. jejuni NCTC11168 increased after serial passage through a C57BL/6 IL-10-/- mouse model of campylobacteriosis. Here we sought to determine the genetic basis of this adaptation during passage. Re-sequencing the 1.64 Mb genome to 200–500× coverage allowed us to define variation in 23 contingency loci to an unprecedented depth both before and after in vivo adaptation. Mutations in the mouse-adapted C. jejuni were largely restricted to the homopolymeric tracts of thirteen contingency loci. These changes cause significant alterations in the open reading frames of genes in surface structure biosynthesis loci and of genes with only putative functions. Several loci with open reading frame changes also had altered transcript abundance. The increase in specific phases of contingency loci during in vivo passage of C. jejuni, coupled with the observed increase in virulence and the lack of other types of genetic changes, provides the first experimental evidence that these variable regions play a significant role in C. jejuni adaptation and virulence in a novel host. PMID:21283682
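
    In this study, mutations were largely restricted to homopolymeric tracts. Any change in tract length that is not a multiple of three shifts the downstream reading frame. The Python sketch below locates homopolymer runs and checks whether a tract length change alters the frame. The minimum run length of 7 is an arbitrary, hypothetical threshold, not one taken from the study.

```python
import re


def homopolymer_tracts(seq: str, min_len: int = 7) -> list[tuple[int, str, int]]:
    """Return (0-based start, base, length) for single-base runs of >= min_len."""
    return [(m.start(), m.group(1), len(m.group(0)))
            for m in re.finditer(r"([ACGT])\1*", seq.upper())
            if len(m.group(0)) >= min_len]


def shifts_frame(ref_len: int, new_len: int) -> bool:
    """True if a tract length change is not a multiple of 3, i.e. shifts the frame."""
    return (new_len - ref_len) % 3 != 0


print(homopolymer_tracts("AT" + "G" * 9 + "CA"))  # [(2, 'G', 9)]
print(shifts_frame(9, 10), shifts_frame(9, 12))   # True False
```

    A single-base slip in a G tract (9 to 10 bases) changes the frame. This is how phase variation can switch a gene between in-frame and out-of-frame states.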

  10. Translational genomics for analysis of complex traits in peanut and sorghum

    USDA-ARS's Scientific Manuscript database

    The integration of sequencing and genotype data from natural variation studies (by whole genome resequencing [wgs] or genotype by sequencing [gbs]), transcriptome (RNA-seq) and mutant analysis (also by wgs) facilitated the development of DNA markers in the form of single nucleotide polymorphic (SNP)...

  11. Integrated translational genomics for analysis of complex traits in sorghum

    USDA-ARS's Scientific Manuscript database

    We will report on the integration of sequencing and genotype data from natural variation (by whole genome resequencing [wgs] or genotype by sequencing [gbs]), transcriptome (RNA-seq) and mutant analysis (also by wgs) with the goal of identifying genes controlling important agronomic traits and tran...

  12. SNPMeta: SNP annotation and SNP metadata collection without a reference genome

    USDA-ARS's Scientific Manuscript database

    The increase in availability of resequencing data is greatly accelerating SNP discovery and has facilitated the development of SNP genotyping assays. This, in turn, is increasing interest in annotation of individual SNPs. Currently, these data are only available through curation, or comparison to a ...

  13. Comparative population genomics of maize domestication and improvement

    USDA-ARS's Scientific Manuscript database

    Domestication and modern breeding represent exemplary case studies of evolution in action. Maize is an outcrossing species with a complex genome, and an understanding of maize evolution is thus relevant for both plant and animal systems. This study is the largest plant resequencing effort to date, ...

  14. High-Throughput Sequencing, a Versatile Weapon to Support Genome-Based Diagnosis in Infectious Diseases: Applications to Clinical Bacteriology

    PubMed Central

    Caboche, Ségolène; Audebert, Christophe; Hot, David

    2014-01-01

    Recent progress in high-throughput sequencing (HTS) technologies enables easy, low-cost access to whole genome sequencing (WGS) or re-sequencing. Combined with adapted, automated and fast bioinformatics solutions, HTS promises accurate and timely identification and characterization of pathogenic agents. Many studies have demonstrated that HTS data allow genome-based diagnosis consistent with phenotypic observations. These proofs of concept are probably the first steps toward the future of clinical microbiology. From concept to routine use, many parameters need to be considered to establish HTS as a powerful tool that helps physicians and clinicians in microbiological investigations. This review highlights the milestones that must be reached toward this goal. PMID:25437800

  15. Genetic susceptibility to neuroblastoma

    PubMed Central

    Tolbert, Vanessa P.; Coggins, Grace E.; Maris, John M.

    2017-01-01

    Until recently, the genetic basis of neuroblastoma, a heterogeneous neoplasm arising from the developing sympathetic nervous system, remained undefined. The discovery of gain-of-function mutations in the ALK receptor tyrosine kinase gene as the major cause of familial neuroblastoma led to the discovery of identical somatic mutations and rapid advancement of ALK as a tractable therapeutic target. Inactivating mutations in a master regulator of neural crest development, PHOX2B, have also been identified in a subset of familial neuroblastomas. Other high penetrance susceptibility alleles likely exist, but together these heritable mutations account for less than 10% of neuroblastoma cases. A genome-wide association study of a large neuroblastoma cohort identified common and rare polymorphisms highly associated with the disease. Ongoing resequencing efforts aim to further define the genetic landscape of neuroblastoma. PMID:28458126

  16. Estimation of Airline Benefits from Avionics Upgrade under Preferential Merge Re-sequence Scheduling

    NASA Technical Reports Server (NTRS)

    Kotegawa, Tatsuya; Cayabyab, Charlene Anne; Almog, Noam

    2013-01-01

    Modernization of the airline fleet avionics is essential to fully enable future technologies and procedures for increasing national airspace system capacity. However, in the current national airspace system, the system-wide benefits of avionics upgrades do not flow fully to the aircraft and airlines that upgrade, resulting in a slow rate of fleet modernization. Preferential merge re-sequence scheduling is a best-equipped-best-served concept designed to incentivize avionics upgrades among airlines by allowing aircraft with new avionics (high-equipped) to be re-sequenced ahead of aircraft without the upgrades (low-equipped) at en route merge waypoints. The goal of this study is to investigate the potential benefits gained or lost by airlines under a high- or low-equipped fleet scenario if preferential merge re-sequence scheduling is implemented.

  17. Integrated and translational genomics for analysis of complex traits in crops

    USDA-ARS's Scientific Manuscript database

    We report here on integration of sequencing and genotype data from natural variation (by whole genome resequencing [wgs] or genotype by sequencing [gbs]), transcriptome (RNA-seq) and mutant analysis (also by wgs) with the goal of translating gems from these resources into useable DNA markers in the ...

  18. Genomic Analyses Yield Markers for Identifying Agronomically Important Genes in Potato

    USDA-ARS's Scientific Manuscript database

    This study explores the genetic architecture underlying potato evolution through a comprehensive assessment of wild and cultivated potato species based on the re-sequencing of 201 accessions of Solanum section Petota with >12× genome coverage. We identified 450 domesticated genes, which showed e...

  19. Genome sequences of Phytophthora enable translational plant disease management and accelerate research

    Treesearch

    Niklaus J. Grünwald

    2012-01-01

    Whole and partial genome sequences are becoming available at an ever-increasing pace. For many plant pathogen systems, we are moving into the era of genome resequencing. The first Phytophthora genomes, P. ramorum and P. sojae, became available in 2004, followed shortly by P. infestans...

  20. Resequencing and annotation of the Nostoc punctiforme ATTC 29133 genome: facilitating biofuel and high-value chemical production

    DOE PAGES

    Moraes, Luis E.; Blow, Matthew J.; Hawley, Erik R.; ...

    2017-02-16

    Cyanobacteria have the potential to produce bulk and fine chemicals, and members of Nostoc sp. have received particular attention due to their relatively fast growth rate and the relative ease with which they can be harvested. Nostoc punctiforme is an aerobic, motile, Gram-negative, filamentous cyanobacterium that has been studied intensively to enhance our understanding of microbial carbon and nitrogen fixation. The genome of the type strain N. punctiforme ATCC 29133 was sequenced in 2001, and the scientific community has used these genome data extensively since then. Advances in bioinformatics tools for sequence annotation and the importance of this organism prompted us to resequence and reanalyze its genome and to make both the initial and the improved annotations available to the scientific community. The new draft genome has a total size of 9.1 Mbp and consists of 65 contiguous pieces of DNA, with a GC content of 41.38% and 7664 protein-coding genes. Furthermore, the resequenced genome is slightly (5152 bp) larger and contains 987 more genes with functional predictions than the previously published version. We deposited the annotations of both genomes in the Department of Energy’s IMG database to facilitate easy genome exploration by the scientific community without the need for in-depth bioinformatics skills. We expect that easier access to the N. punctiforme ATCC 29133 genome, and the ability to search it for genes of interest, will significantly facilitate metabolic engineering and genome prospecting efforts and, ultimately, the synthesis of biofuels and natural products from this keystone organism and closely related cyanobacteria.
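
    A genome-wide GC content, such as the 41.38% reported for the draft assembly, is the fraction of G and C among all called bases pooled across contigs. The minimal Python sketch below assumes the contigs are available as plain strings. It also assumes that ambiguous bases such as N are excluded from the denominator; that is a convention choice, not something the abstract specifies.

```python
def gc_content(contigs: list[str]) -> float:
    """GC percentage over unambiguous bases (A, C, G, T) pooled across contigs."""
    gc = at = 0
    for contig in contigs:
        s = contig.upper()
        gc += s.count("G") + s.count("C")
        at += s.count("A") + s.count("T")
    if gc + at == 0:
        raise ValueError("no unambiguous bases to count")
    return 100 * gc / (gc + at)


print(round(gc_content(["GGCCAT", "ATNNGC"]), 2))  # 60.0
```

    Pooling counts across contigs weights each base equally. Averaging per-contig percentages instead would over-weight short contigs.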

  1. Resequencing and annotation of the Nostoc punctiforme ATTC 29133 genome: facilitating biofuel and high-value chemical production

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Moraes, Luis E.; Blow, Matthew J.; Hawley, Erik R.

    Cyanobacteria have the potential to produce bulk and fine chemicals, and members of Nostoc sp. have received particular attention due to their relatively fast growth rate and the relative ease with which they can be harvested. Nostoc punctiforme is an aerobic, motile, Gram-negative, filamentous cyanobacterium that has been studied intensively to enhance our understanding of microbial carbon and nitrogen fixation. The genome of the type strain N. punctiforme ATCC 29133 was sequenced in 2001, and the scientific community has used these genome data extensively since then. Advances in bioinformatics tools for sequence annotation and the importance of this organism prompted us to resequence and reanalyze its genome and to make both the initial and the improved annotations available to the scientific community. The new draft genome has a total size of 9.1 Mbp and consists of 65 contiguous pieces of DNA, with a GC content of 41.38% and 7664 protein-coding genes. Furthermore, the resequenced genome is slightly (5152 bp) larger and contains 987 more genes with functional predictions than the previously published version. We deposited the annotations of both genomes in the Department of Energy’s IMG database to facilitate easy genome exploration by the scientific community without the need for in-depth bioinformatics skills. We expect that easier access to the N. punctiforme ATCC 29133 genome, and the ability to search it for genes of interest, will significantly facilitate metabolic engineering and genome prospecting efforts and, ultimately, the synthesis of biofuels and natural products from this keystone organism and closely related cyanobacteria.

  2. Estimation of linkage disequilibrium and interspecific gene flow in Ficedula flycatchers by a newly developed 50k single-nucleotide polymorphism array

    PubMed Central

    Kawakami, Takeshi; Backström, Niclas; Burri, Reto; Husby, Arild; Olason, Pall; Rice, Amber M; Ålund, Murielle; Qvarnström, Anna; Ellegren, Hans

    2014-01-01

    With access to draft genome sequence assemblies and whole-genome resequencing data from population samples, molecular ecology studies will be able to take truly genome-wide approaches. This now applies to an avian model system in ecological and evolutionary research: Old World flycatchers of the genus Ficedula, for which we recently obtained a 1.1 Gb collared flycatcher genome assembly and identified 13 million single-nucleotide polymorphisms (SNPs) in population resequencing of this species and its sister species, the pied flycatcher. Here, we developed a custom 50K Illumina iSelect flycatcher SNP array with markers covering 30 autosomes and the Z chromosome. By applying a number of selection criteria for inclusion in the array, we achieved both a high genotyping success rate and high polymorphism information content (mean marker heterozygosity = 0.41). We used the array to assess linkage disequilibrium (LD) and hybridization in flycatchers. Linkage disequilibrium declined quickly to the background level at an average distance of 17 kb, but the extent of LD varied markedly within the genome and was more than 10-fold higher in ‘genomic islands’ of differentiation than in the rest of the genome. Genetic ancestry analysis identified 33 F1 hybrids but no later-generation hybrids from sympatric populations of collared and pied flycatchers, contradicting earlier reports of backcrosses identified using far fewer markers. With an estimated divergence time of <1 Ma, this suggests strong selection against F1 hybrids and unusually rapid evolution of reproductive incompatibility in an avian system. PMID:24784959
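
    The abstract does not say which heterozygosity estimator produced the mean of 0.41. The Python sketch below assumes expected heterozygosity, H_e = 2p(1 - p) for a biallelic SNP, averaged across markers, with genotypes coded as alternative-allele counts (0, 1, 2). Both the estimator and the coding are assumptions made for illustration.

```python
def expected_heterozygosity(genotypes: list[int]) -> float:
    """H_e = 2p(1 - p) for one biallelic SNP; genotypes are alt-allele counts 0/1/2."""
    if not genotypes:
        raise ValueError("no genotypes")
    p = sum(genotypes) / (2 * len(genotypes))  # alternative allele frequency
    return 2 * p * (1 - p)


def mean_heterozygosity(markers: list[list[int]]) -> float:
    """Average H_e across markers."""
    return sum(expected_heterozygosity(g) for g in markers) / len(markers)


print(mean_heterozygosity([[0, 1, 2, 1], [0, 0, 1, 0]]))  # 0.359375
```

    For a biallelic marker, H_e cannot exceed 0.5 (reached at p = 0.5). A mean of 0.41 therefore indicates that most selected markers have fairly balanced allele frequencies.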

  3. GWASeq: targeted re-sequencing follow up to GWAS.

    PubMed

    Salomon, Matthew P; Li, Wai Lok Sibon; Edlund, Christopher K; Morrison, John; Fortini, Barbara K; Win, Aung Ko; Conti, David V; Thomas, Duncan C; Duggan, David; Buchanan, Daniel D; Jenkins, Mark A; Hopper, John L; Gallinger, Steven; Le Marchand, Loïc; Newcomb, Polly A; Casey, Graham; Marjoram, Paul

    2016-03-03

    For the last decade, the conceptual framework of the Genome-Wide Association Study (GWAS) has dominated the investigation of human disease and other complex traits. While GWAS have been successful in identifying a large number of variants associated with various phenotypes, the overall amount of heritability explained by these variants remains small. This raises the question of how best to follow up on a GWAS, localize causal variants accounting for GWAS hits, and as a consequence explain more of the so-called "missing" heritability. Advances in high throughput sequencing technologies now allow for the efficient and cost-effective collection of vast amounts of fine-scale genomic data to complement GWAS. We investigate these issues using a colon cancer dataset. After QC, our data consisted of 1993 cases and 899 controls. Using marginal tests of association, we identify 10 variants distributed among six targeted regions that are significantly associated with colorectal cancer, eight of which are novel to this study. Additionally, we perform so-called 'SNP-set' tests of association and identify two sets of variants that implicate both common and rare variants in the etiology of colorectal cancer. Here we present a large-scale targeted re-sequencing resource focusing on genomic regions implicated in colorectal cancer susceptibility previously identified in several GWAS, which aims to 1) provide fine-scale targeted sequencing data for fine-mapping and 2) provide data resources to address methodological questions regarding the design of sequencing-based follow-up studies to GWAS. Additionally, we show that this strategy successfully identifies novel variants associated with colorectal cancer susceptibility and can implicate both common and rare variants.
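
    The abstract mentions 'SNP-set' tests that can implicate rare variants but does not specify which test was used. As a hedged illustration of one simple member of that family, the Python sketch below collapses rare variants into a carrier/non-carrier indicator per individual. It then compares carrier counts in cases and controls with a two-sided Fisher's exact test. This generic collapsing (burden) test was chosen for illustration and is not the authors' method.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table no more probable than the observed one;
    # the small relative tolerance guards against floating-point ties.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def burden_test(cases: list[list[int]], controls: list[list[int]]) -> float:
    """Collapsing test: an individual counts as a carrier if any rare allele is present.

    Each inner list holds one individual's alt-allele counts across the variant set.
    """
    case_carriers = sum(1 for g in cases if any(g))
    ctrl_carriers = sum(1 for g in controls if any(g))
    return fisher_exact_two_sided(case_carriers, len(cases) - case_carriers,
                                  ctrl_carriers, len(controls) - ctrl_carriers)


cases = [[1, 0], [0, 1], [0, 0], [0, 0]] * 5  # 10 carriers among 20 cases
controls = [[0, 0]] * 20                       # no carriers among 20 controls
print(burden_test(cases, controls) < 0.01)     # True
```

    Collapsing gains power when many rare variants in a set act in the same direction. It loses power when protective and risk variants are mixed, which motivates other SNP-set tests.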

  4. Genomics of crop wild relatives: expanding the gene pool for crop improvement.

    PubMed

    Brozynska, Marta; Furtado, Agnelo; Henry, Robert J

    2016-04-01

    Plant breeders require access to new genetic diversity to satisfy the demands of a growing human population for more food that can be produced in a variable or changing climate and to deliver the high-quality food with nutritional and health benefits demanded by consumers. The close relatives of domesticated plants, crop wild relatives (CWRs), represent a practical gene pool for use by plant breeders. Genomics of CWR generates data that support the use of CWR to expand the genetic diversity of crop plants. Advances in DNA sequencing technology are enabling the efficient sequencing of CWR and their increased use in crop improvement. As the sequencing of genomes of major crop species is completed, attention has shifted to analysis of the wider gene pool of major crops including CWR. A combination of de novo sequencing and resequencing is required to efficiently explore useful genetic variation in CWR. Analysis of the nuclear genome, transcriptome and maternal (chloroplast and mitochondrial) genome of CWR is facilitating their use in crop improvement. Genome analysis results in discovery of useful alleles in CWR and identification of regions of the genome in which diversity has been lost in domestication bottlenecks. Targeting of high priority CWR for sequencing will maximize the contribution of genome sequencing of CWR. Coordination of global efforts to apply genomics has the potential to accelerate access to and conservation of the biodiversity essential to the sustainability of agriculture and food production. © 2015 Society for Experimental Biology, Association of Applied Biologists and John Wiley & Sons Ltd.

  5. Bayesian inference of shared recombination hotspots between humans and chimpanzees.

    PubMed

    Wang, Ying; Rannala, Bruce

    2014-12-01

    Recombination generates variation and facilitates evolution. Recombination (or lack thereof) also contributes to human genetic disease. Methods for mapping genes influencing complex genetic diseases via association rely on linkage disequilibrium (LD) in human populations, which is influenced by rates of recombination across the genome. Comparative population genomic analyses of recombination using related primate species can identify factors influencing rates of recombination in humans. Such studies can indicate how variable recombination hotspots are, both among individuals (or populations) and over evolutionary timescales. Previous studies have suggested that locations of recombination hotspots are not conserved between humans and chimpanzees. We made use of data sets from recent resequencing projects and applied a Bayesian method for identifying hotspots and estimating recombination rates. We also reanalyzed SNP data sets for regions with known hotspots in humans using samples from humans and chimpanzees. The Bayes factors (BF) of shared recombination hotspots between human and chimpanzee across regions were obtained. Based on the analysis of the aligned regions of human chromosome 21, locations where the two species show evidence of shared recombination hotspots (with high BFs) were identified. Interestingly, previous comparative studies of human and chimpanzee that focused on the known human recombination hotspots within the β-globin and HLA regions did not find overlapping hotspots. Our results show high BFs of shared hotspots at locations within both regions, and the estimated locations of shared hotspots overlap with the locations of human recombination hotspots obtained from sperm-typing studies. Copyright © 2014 by the Genetics Society of America.

  6. The Impact of Recombination Hotspots on Genome Evolution of a Fungal Plant Pathogen.

    PubMed

    Croll, Daniel; Lendenmann, Mark H; Stewart, Ethan; McDonald, Bruce A

    2015-11-01

    Recombination has an impact on genome evolution by maintaining chromosomal integrity, affecting the efficacy of selection, and increasing genetic variability in populations. Recombination rates are a key determinant of the coevolutionary dynamics between hosts and their pathogens. Historic recombination events created devastating new pathogens, but the impact of ongoing recombination in sexual pathogens is poorly understood. Many fungal pathogens of plants undergo regular sexual cycles, and sex is considered to be a major factor contributing to virulence. We generated a recombination map at kilobase-scale resolution for the haploid plant pathogenic fungus Zymoseptoria tritici. To account for intraspecific variation in recombination rates, we constructed genetic maps from two independent crosses. We localized a total of 10,287 crossover events in 441 progeny and found that recombination rates were highly heterogeneous within and among chromosomes. Recombination rates on large chromosomes were inversely correlated with chromosome length. Short accessory chromosomes often lacked evidence for crossovers between parental chromosomes. Recombination was concentrated in narrow hotspots that were preferentially located close to telomeres. Hotspots were only partially conserved between the two crosses, suggesting that hotspots are short-lived and may vary according to genomic background. Hotspot regions were enriched in genes encoding secreted proteins. Population resequencing showed that chromosomal regions with high recombination rates were strongly correlated with regions of low linkage disequilibrium. Hence, genes in pathogen recombination hotspots are likely to evolve faster in natural populations and may represent a greater threat to the host. Copyright © 2015 by the Genetics Society of America.

  7. Meeting the challenges of non-referenced genome assembly from short-read sequence data

    Treesearch

    M. Parks; A. Liston; R. Cronn

    2010-01-01

    Massively parallel sequencing technologies (MPST) offer unprecedented opportunities for novel sequencing projects. MPST, while offering tremendous sequencing capacity, are typically most effective in resequencing projects (as opposed to the sequencing of novel genomes) because they return sequence in relatively short reads. Nonetheless, there is great...

  8. A new rainbow trout (Oncorhynchus mykiss) reference genome assembly

    USDA-ARS's Scientific Manuscript database

    In an effort to improve the rainbow trout reference genome assembly, we have re-sequenced the doubled-haploid Swanson line using the longest available reads from the Illumina technology. Overall we generated over 510 million 260nt paired-end shotgun reads, and 1 billion 160nt mate-pair reads from f...

  9. Genome re-sequencing reveals the history of apple and supports a two-stage model for fruit enlargement

    USDA-ARS's Scientific Manuscript database

    Human selection has reshaped crop genomes. Here we report an apple genome variation map generated through genome sequencing of 117 diverse accessions. A comprehensive model of apple speciation and domestication along the Silk Road was proposed based on evidence from diverse genomic analyses. Cultiva...

  10. Resequencing IRS2 reveals rare variants for obesity but not fasting glucose homeostasis in Hispanic children

    USDA-ARS's Scientific Manuscript database

    Our objective was to resequence insulin receptor substrate 2 (IRS2) to identify variants associated with obesity- and diabetes-related traits in Hispanic children. Exonic and intronic segments, 5' and 3' flanking regions of IRS2 (approx. 14.5 kb), were bidirectionally sequenced for single nucleotide...

  11. Forward genetics screen coupled with whole-genome resequencing identifies novel gene targets for improving heterologous enzyme production in Aspergillus niger

    DOE PAGES

    Reilly, Morgann C.; Kim, Joonhoon; Lynn, Jed; ...

    2018-01-06

    Plant biomass, once reduced to its composite sugars, can be converted to fuel substitutes. One means of overcoming the recalcitrance of lignocellulose is pretreatment followed by enzymatic hydrolysis. However, currently available commercial enzyme cocktails are inhibited in the presence of residual pretreatment chemicals. Recent studies have identified a number of cellulolytic enzymes from bacteria that are tolerant to pretreatment chemicals such as ionic liquids. The challenge now is generation of these enzymes in copious amounts, an arena where fungal organisms such as Aspergillus niger have proven efficient. Fungal host strains still need to be engineered to increase production titers of heterologous protein over native enzymes, which has been a difficult task. Here, we developed a forward genetics screen coupled with whole-genome resequencing to identify specific lesions responsible for a protein hyper-production phenotype in A. niger. As a result, this strategy successfully identified novel targets, including a low-affinity glucose transporter, MstC, whose deletion significantly improved secretion of recombinant proteins driven by a glucoamylase promoter.

  12. Whole genome re-sequencing of date palms yields insights into diversification of a fruit tree crop.

    PubMed

    Hazzouri, Khaled M; Flowers, Jonathan M; Visser, Hendrik J; Khierallah, Hussam S M; Rosas, Ulises; Pham, Gina M; Meyer, Rachel S; Johansen, Caryn K; Fresquez, Zoë A; Masmoudi, Khaled; Haider, Nadia; El Kadri, Nabila; Idaghdour, Youssef; Malek, Joel A; Thirkhill, Deborah; Markhand, Ghulam S; Krueger, Robert R; Zaid, Abdelouahhab; Purugganan, Michael D

    2015-11-09

    Date palms (Phoenix dactylifera) are the most significant perennial crop in arid regions of the Middle East and North Africa. Here, we present a comprehensive catalogue of approximately seven million single nucleotide polymorphisms in date palms based on whole genome re-sequencing of a collection of 62 cultivars. Population structure analysis indicates a major genetic divide between North Africa and the Middle East/South Asian date palms, with evidence of admixture in cultivars from Egypt and Sudan. Genome-wide scans for selection suggest at least 56 genomic regions associated with selective sweeps that may underlie geographic adaptation. We report candidate mutations for trait variation, including nonsense polymorphisms and presence/absence variation in gene content in pathways for key agronomic traits. We also identify a copia-like retrotransposon insertion polymorphism in the R2R3 myb-like orthologue of the oil palm virescens gene associated with fruit colour variation. This analysis documents patterns of post-domestication diversification and provides a genomic resource for this economically important perennial tree crop.

  13. Natural and Unanticipated Modifiers of RNAi Activity in Caenorhabditis elegans

    PubMed Central

    Asad, Nadeem; Aw, Wen Yih; Timmons, Lisa

    2012-01-01

    Organisms used as model genomics systems are maintained as isogenic strains, yet evidence of sequence differences between independently maintained wild-type stocks has been substantiated by whole-genome resequencing data and strain-specific phenotypes. Sequence differences may arise from replication errors, transposon mobilization, meiotic gene conversion, or environmental or chemical assault on the genome. Low frequency alleles or mutations with modest effects on phenotypes can contribute to natural variation, and it has proven possible for such sequences to become fixed by adapted evolutionary enrichment and identified by resequencing. Our objective was to identify and analyze single locus genetic defects leading to RNAi resistance in isogenic strains of Caenorhabditis elegans. In so doing, we uncovered a mutation that arose de novo in an existing strain, which initially frustrated our phenotypic analysis. We also report experimental, environmental, and genetic conditions that can complicate phenotypic analysis of RNAi pathway defects. These observations highlight the potential for unanticipated mutations, coupled with genetic and environmental phenomena, to enhance or suppress the effects of known mutations and cause variation between wild-type strains. PMID:23209671

  14. Forward genetics screen coupled with whole-genome resequencing identifies novel gene targets for improving heterologous enzyme production in Aspergillus niger

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Reilly, Morgann C.; Kim, Joonhoon; Lynn, Jed

    Plant biomass, once reduced to its composite sugars, can be converted to fuel substitutes. One means of overcoming the recalcitrance of lignocellulose is pretreatment followed by enzymatic hydrolysis. However, currently available commercial enzyme cocktails are inhibited in the presence of residual pretreatment chemicals. Recent studies have identified a number of cellulolytic enzymes from bacteria that are tolerant to pretreatment chemicals such as ionic liquids. The challenge now is generation of these enzymes in copious amounts, an arena where fungal organisms such as Aspergillus niger have proven efficient. Fungal host strains still need to be engineered to increase production titers of heterologous protein over native enzymes, which has been a difficult task. Here, we developed a forward genetics screen coupled with whole-genome resequencing to identify specific lesions responsible for a protein hyper-production phenotype in A. niger. This strategy successfully identified novel targets, including a low-affinity glucose transporter, MstC, whose deletion significantly improved secretion of recombinant proteins driven by a glucoamylase promoter.

  15. Forward genetics screen coupled with whole-genome resequencing identifies novel gene targets for improving heterologous enzyme production in Aspergillus niger

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Reilly, Morgann C.; Kim, Joonhoon; Lynn, Jed

    Plant biomass, once reduced to its composite sugars, can be converted to fuel substitutes. One means of overcoming the recalcitrance of lignocellulose is pretreatment followed by enzymatic hydrolysis. However, currently available commercial enzyme cocktails are inhibited in the presence of residual pretreatment chemicals. Recent studies have identified a number of cellulolytic enzymes from bacteria that are tolerant to pretreatment chemicals such as ionic liquids. The challenge now is generation of these enzymes in copious amounts, an arena where fungal organisms such as Aspergillus niger have proven efficient. Fungal host strains still need to be engineered to increase production titers of heterologous protein over native enzymes, which has been a difficult task. Here, we developed a forward genetics screen coupled with whole-genome resequencing to identify specific lesions responsible for a protein hyper-production phenotype in A. niger. As a result, this strategy successfully identified novel targets, including a low-affinity glucose transporter, MstC, whose deletion significantly improved secretion of recombinant proteins driven by a glucoamylase promoter.

  16. Whole genome re-sequencing of date palms yields insights into diversification of a fruit tree crop

    PubMed Central

    Hazzouri, Khaled M.; Flowers, Jonathan M.; Visser, Hendrik J.; Khierallah, Hussam S. M.; Rosas, Ulises; Pham, Gina M.; Meyer, Rachel S.; Johansen, Caryn K.; Fresquez, Zoë A.; Masmoudi, Khaled; Haider, Nadia; El Kadri, Nabila; Idaghdour, Youssef; Malek, Joel A.; Thirkhill, Deborah; Markhand, Ghulam S.; Krueger, Robert R.; Zaid, Abdelouahhab; Purugganan, Michael D.

    2015-01-01

    Date palms (Phoenix dactylifera) are the most significant perennial crop in arid regions of the Middle East and North Africa. Here, we present a comprehensive catalogue of approximately seven million single nucleotide polymorphisms in date palms based on whole genome re-sequencing of a collection of 62 cultivars. Population structure analysis indicates a major genetic divide between North Africa and the Middle East/South Asian date palms, with evidence of admixture in cultivars from Egypt and Sudan. Genome-wide scans for selection suggest at least 56 genomic regions associated with selective sweeps that may underlie geographic adaptation. We report candidate mutations for trait variation, including nonsense polymorphisms and presence/absence variation in gene content in pathways for key agronomic traits. We also identify a copia-like retrotransposon insertion polymorphism in the R2R3 myb-like orthologue of the oil palm virescens gene associated with fruit colour variation. This analysis documents patterns of post-domestication diversification and provides a genomic resource for this economically important perennial tree crop. PMID:26549859

  17. Whole genome re-sequencing identifies a mutation in an ABC transporter (mdr2) in a Plasmodium chabaudi clone with altered susceptibility to antifolate drugs

    PubMed Central

    Martinelli, Axel; Henriques, Gisela; Cravo, Pedro; Hunt, Paul

    2011-01-01

    In malaria parasites, mutations in two genes of folate biosynthesis encoding dihydrofolate reductase (dhfr) and dihydropteroate synthase (dhps) modify responses to antifolate therapies which target these enzymes. However, the involvement of other genes which modify the availability of exogenous folate, for example, has been proposed. Here, we used short-read whole-genome re-sequencing to determine the mutations in a clone of the rodent malaria parasite, Plasmodium chabaudi, which has altered susceptibility to both sulphadoxine and pyrimethamine. This clone bears a previously identified S106N mutation in dhfr and no mutation in dhps. Instead, three additional point mutations in genes on chromosomes 2, 13 and 14 were identified. The mutated gene on chromosome 13 (mdr2 K392Q) encodes an ABC transporter. Because Quantitative Trait Locus analysis previously indicated an association of genetic markers on chromosome 13 with responses to individual and combined antifolates, MDR2 is proposed to modulate antifolate responses, possibly mediated by the transport of folate intermediates. PMID:20858498

  18. Forward genetics screen coupled with whole-genome resequencing identifies novel gene targets for improving heterologous enzyme production in Aspergillus niger

    DOE PAGES

    Reilly, Morgann C.; Kim, Joonhoon; Lynn, Jed; ...

    2018-01-06

    Plant biomass, once reduced to its composite sugars, can be converted to fuel substitutes. One means of overcoming the recalcitrance of lignocellulose is pretreatment followed by enzymatic hydrolysis. However, currently available commercial enzyme cocktails are inhibited in the presence of residual pretreatment chemicals. Recent studies have identified a number of cellulolytic enzymes from bacteria that are tolerant to pretreatment chemicals such as ionic liquids. The challenge now is generation of these enzymes in copious amounts, an arena where fungal organisms such as Aspergillus niger have proven efficient. Fungal host strains still need to be engineered to increase production titers of heterologous protein over native enzymes, which has been a difficult task. Here, we developed a forward genetics screen coupled with whole-genome resequencing to identify specific lesions responsible for a protein hyper-production phenotype in A. niger. This strategy successfully identified novel targets, including a low-affinity glucose transporter, MstC, whose deletion significantly improved secretion of recombinant proteins driven by a glucoamylase promoter.

  19. Whole-Genome Resequencing of Holstein Bulls for Indel Discovery and Identification of Genes Associated with Milk Composition Traits in Dairy Cattle.

    PubMed

    Jiang, Jianping; Gao, Yahui; Hou, Yali; Li, Wenhui; Zhang, Shengli; Zhang, Qin; Sun, Dongxiao

    2016-01-01

    The use of whole-genome resequencing to obtain more information on genetic variation could produce a range of benefits for the dairy cattle industry, especially with regard to increasing milk production and improving milk composition. In this study, we sequenced the genomes of eight Holstein bulls from four half- or full-sib families, with high and low estimated breeding values (EBVs) of milk protein percentage and fat percentage, at an average effective depth of 10× using Illumina sequencing. Over 0.9 million nonredundant short insertions and deletions (indels) [1-49 base pairs (bp)] were obtained. Among them, 3,625 indels that were polymorphic between the high and low groups of bulls were revealed and subjected to further analysis. The vast majority (76.67%) of these indels were novel. Follow-up validation assays confirmed that most (70%) of the randomly selected indels represented true variations. The indels that were polymorphic between the two groups were annotated based on the cattle genome sequence assembly (UMD3.1.69); as a result, 1,137 of them were found to be located within 767 annotated genes, and only 5 (0.138% of the 3,625 polymorphic indels) were located in exons. Then, by integrated analysis of the 767 genes with known quantitative trait loci (QTL); significant single-nucleotide polymorphisms (SNPs) previously identified by genome-wide association studies (GWASs) to be associated with bovine milk protein and fat traits; and the well-known pathways involved in protein and fat synthesis and metabolism, we identified a total of 11 promising candidate genes potentially affecting milk composition traits: FCGR2B, CENPE, RETSAT, ACSBG2, NFKB2, TBC1D1, NLK, MAP3K1, SLC30A2, ANGPT1 and UGDH. Our findings provide a basis for further study and reveal key genes for milk composition traits in dairy cattle.

  20. Population genomic data reveal genes related to important traits of quail.

    PubMed

    Wu, Yan; Zhang, Yaolei; Hou, Zhuocheng; Fan, Guangyi; Pi, Jinsong; Sun, Shuai; Chen, Jiang; Liu, Huaqiao; Du, Xiao; Shen, Jie; Hu, Gang; Chen, Wenbin; Pan, Ailuan; Yin, Pingping; Chen, Xiaoli; Pu, Yuejin; Zhang, He; Liang, Zhenhua; Jian, Jianbo; Zhang, Hao; Wu, Bin; Sun, Jing; Chen, Jianwei; Tao, Hu; Yang, Ting; Xiao, Hongwei; Yang, Huan; Zheng, Chuanwei; Bai, Mingzhou; Fang, Xiaodong; Burt, David W; Wang, Wen; Li, Qingyi; Xu, Xun; Li, Chengfeng; Yang, Huanming; Wang, Jian; Yang, Ning; Liu, Xin; Du, Jinping

    2018-05-01

    Japanese quail (Coturnix japonica), a recently domesticated poultry species, is important not only as an agricultural product, but also as a model bird species for genetic research. However, most of the biological questions concerning the genomics, phylogenetics, and genetics of some important economic traits have not been answered. It is thus necessary to complete a high-quality genome sequence as well as a series of comparative genomics, evolution, and functional studies. Here, we present a quail genome assembly spanning 1.04 Gb with 86.63% of sequences anchored to 30 chromosomes (28 autosomes and 2 sex chromosomes, Z/W). Our genomic data have resolved the long-term debate over the phylogeny of Perdicinae (Japanese quail), Meleagridinae (turkey), and Phasianinae (chicken). Comparative and functional genomic data showed that four candidate genes involved in early maturation had experienced positive selection, and one of them encodes follicle stimulating hormone beta (FSHβ), which is correlated with different FSHβ levels in quail and chicken. We re-sequenced 31 quails (10 wild, 11 egg-type, and 10 meat-type) and identified 18 and 26 candidate selective sweep regions in the egg-type and meat-type lines, respectively. That only one of these regions is shared between the egg-type and meat-type lines suggests that the two lines were subject to independent selection. Using population resequencing and a genome-wide association study, we also detected a haplotype on chromosome Z that was closely linked with maroon/yellow plumage in quail. This haplotype block will be useful for quail breeding programs. This study provides a high-quality quail reference genome, identifies quail-specific genes, and resolves quail phylogeny. We have identified genes related to quail early maturation and a marker for plumage color, which is significant for quail breeding. These results will facilitate biological discovery in quails and help us elucidate the evolutionary processes within the Phasianidae family.

  1. Identification of QTLs for 14 Agronomically Important Traits in Setaria italica Based on SNPs Generated from High-Throughput Sequencing

    PubMed Central

    Zhang, Kai; Fan, Guangyu; Zhang, Xinxin; Zhao, Fang; Wei, Wei; Du, Guohua; Feng, Xiaolei; Wang, Xiaoming; Wang, Feng; Song, Guoliang; Zou, Hongfeng; Zhang, Xiaolei; Li, Shuangdong; Ni, Xuemei; Zhang, Gengyun; Zhao, Zhihai

    2017-01-01

    Foxtail millet (Setaria italica) is an important crop possessing C4 photosynthesis capability. The S. italica genome was de novo sequenced in 2012, but the sequence lacked high-density genetic maps with agronomic and yield trait linkages. In the present study, we resequenced a foxtail millet population of 439 recombinant inbred lines (RILs) and developed a high-resolution bin map and high-density SNP markers, which could provide an effective approach for gene identification. A total of 59 QTL for 14 agronomic traits in plants grown under long- and short-day photoperiods were identified. The phenotypic variation explained ranged from 4.9 to 43.94%. In addition, we suggest that chromosome 6 may show segregation distortion, significantly skewed toward Zhang gu. The newly identified QTL will provide a platform for sequence-based research on the S. italica genome and for molecular marker-assisted breeding. PMID:28364039

  2. Identification of QTLs for 14 Agronomically Important Traits in Setaria italica Based on SNPs Generated from High-Throughput Sequencing.

    PubMed

    Zhang, Kai; Fan, Guangyu; Zhang, Xinxin; Zhao, Fang; Wei, Wei; Du, Guohua; Feng, Xiaolei; Wang, Xiaoming; Wang, Feng; Song, Guoliang; Zou, Hongfeng; Zhang, Xiaolei; Li, Shuangdong; Ni, Xuemei; Zhang, Gengyun; Zhao, Zhihai

    2017-05-05

    Foxtail millet (Setaria italica) is an important crop possessing C4 photosynthesis capability. The S. italica genome was de novo sequenced in 2012, but the sequence lacked high-density genetic maps with agronomic and yield trait linkages. In the present study, we resequenced a foxtail millet population of 439 recombinant inbred lines (RILs) and developed a high-resolution bin map and high-density SNP markers, which could provide an effective approach for gene identification. A total of 59 QTL for 14 agronomic traits in plants grown under long- and short-day photoperiods were identified. The phenotypic variation explained ranged from 4.9 to 43.94%. In addition, we suggest that chromosome 6 may show segregation distortion, significantly skewed toward Zhang gu. The newly identified QTL will provide a platform for sequence-based research on the S. italica genome and for molecular marker-assisted breeding. Copyright © 2017 Zhang et al.

  3. Variability among Cucurbitaceae species (melon, cucumber and watermelon) in a genomic region containing a cluster of NBS-LRR genes.

    PubMed

    Morata, Jordi; Puigdomènech, Pere

    2017-02-08

    Cucurbitaceae species contain significantly fewer genes coding for proteins similar to plant resistance genes of the NBS-LRR family than other plant species of similar genome size. A large proportion of these genes are organized in clusters that appear to be hotspots of variability. The genomes of the Cucurbitaceae species sequenced to date are intermediate in size (between 350 and 450 Mb), and they apparently have not undergone any genome duplications besides those at the origin of eudicots. The cluster containing the largest number of NBS-LRR genes has previously been analyzed in melon and related species and showed a high degree of interspecific and intraspecific variability. It was therefore of interest to study whether similar behavior occurred in other clusters of the same gene family. The cluster of NBS-LRR genes located on melon chromosome 9 was analyzed and compared with the syntenic regions in other cucurbit genomes. This is the second-largest cluster in this species, and it contains nine sequences with an NBS-LRR annotation, including two genes, Fom1 and Prv, that provide resistance against Fusarium and Papaya ring-spot virus (PRSV), respectively. The variability within the melon species appears to consist essentially of single nucleotide polymorphisms. Clusters of similar genes are present in the syntenic regions of the two other sequenced Cucurbitaceae species, cucumber and watermelon. Most of the genes in the syntenic clusters can be aligned between species, and a hypothesis for the generation of the cluster is proposed. The number of genes in the watermelon cluster is similar to that in melon, while a higher number of genes (12) is present in cucumber, a species with a smaller genome than melon. Comparison of genome resequencing data from 115 cucumber varieties reveals the deletion of a group of genes in a set of varieties of Indian origin. Clusters of genes coding for NBS-LRR proteins in cucurbits appear to show specific variability in different regions of the genome and between different species. This observation supports the view that the adaptation of plant species to changing environments is based on variability that may occur at any location in the genome and that has been produced by specific mechanisms of sequence variation acting on plant genomes. This information could be useful both for understanding the evolution of species and for plant breeding.

  4. Australian wild rice reveals pre-domestication origin of polymorphism deserts in rice genome.

    PubMed

    Krishnan S, Gopala; Waters, Daniel L E; Henry, Robert J

    2014-01-01

    Rice is a major source of human food with a predominantly Asian production base. Domestication involved selection of traits that are desirable for agriculture and to human consumers. Wild relatives of crop plants are a source of useful variation which is of immense value for crop improvement. Australian wild rices have been isolated from the impacts of domestication in Asia and represent a source of novel diversity for global rice improvement. Oryza rufipogon is a perennial wild progenitor of cultivated rice. Oryza meridionalis is a related annual species in Australia. Through whole genome re-sequencing, we examined the genomes of AA-genome wild rices from Australia that are close relatives of cultivated rice. Assembly of the resequencing data against the O. sativa ssp. japonica cv. Nipponbare reference shows that Australian wild rices possess 2.5 times more single nucleotide polymorphisms than Asian wild rice and cultivated O. sativa ssp. indica. Analysis of the genome of domesticated rice reveals regions of low diversity that show very little variation (polymorphism deserts). Both the perennial and annual wild rice from Australia show a high degree of sequence conservation with cultivated rice in the same 4.58 Mbp region on chromosome 5, which suggests that some of the 'polymorphism deserts' in this and other parts of the rice genome may have originated prior to domestication due to natural selection. Analysis of genes in the 'polymorphism deserts' indicates that this selection may have been due to biotic or abiotic stress in the environment of early rice relatives. Despite having closely related sequences in these genome regions, the Australian wild populations represent an invaluable source of diversity supporting rice food security.

  5. Resequencing of Capsicum annuum parental lines (YCM334 and Taean) for the genetic analysis of bacterial wilt resistance.

    PubMed

    Kang, Yang Jae; Ahn, Yul-Kyun; Kim, Ki-Taek; Jun, Tae-Hwan

    2016-10-28

    Bacterial wilt (BW) is a widespread plant disease that affects a broad range of dicot and monocot hosts and is particularly harmful to solanaceous plants, such as pepper, tomato, and eggplant. The pathogen responsible for BW is the soil-borne bacterium Ralstonia solanacearum, which can adapt to diverse temperature conditions and is found in climates ranging from tropical to temperate. Resistance to BW has been detected in some pepper lines; however, the genomic loci and alleles that mediate it are poorly studied in this species. We resequenced the pepper cultivars YCM334 and Taean, the parental lines of a recombinant inbred line (RIL) population, which display differential resistance phenotypes against BW, with YCM334 being highly resistant to infection with this pathogen. We identified novel single nucleotide polymorphisms (SNPs) and insertions/deletions (Indels) present in both parental lines relative to the reference genome, and further determined the variations that distinguish these two cultivars from one another. We then identified potentially informative SNPs in genes related to those previously associated with disease resistance, such as the R genes and stress response genes. Moreover, via comparative analysis, we identified SNPs located in genomic regions homologous to known resistance genes in the tomato genome. From our SNP profiling of both parental lines, we could identify SNPs that are potentially responsible for BW resistance; in practice, these may be used as markers for marker-assisted breeding schemes using these populations. We predict that our analyses will be valuable both for better understanding the YCM334/Taean-derived populations and for enhancing our knowledge of critical SNPs present in the pepper genome.

  6. Re-sequencing transgenic plants revealed rearrangements at T-DNA inserts, and integration of a short T-DNA fragment, but no increase of small mutations elsewhere.

    PubMed

    Schouten, Henk J; Vande Geest, Henri; Papadimitriou, Sofia; Bemer, Marian; Schaart, Jan G; Smulders, Marinus J M; Perez, Gabino Sanchez; Schijlen, Elio

    2017-03-01

    Transformation resulted in deletions and translocations at T-DNA inserts, but not in genome-wide small mutations. A tiny T-DNA splinter was detected that would probably go undetected by conventional techniques. We investigated to what extent Agrobacterium tumefaciens-mediated transformation is mutagenic, beyond inserting T-DNA. To prevent mutations due to in vitro propagation, we applied floral dip transformation of Arabidopsis thaliana. We re-sequenced the genomes of five primary transformants and compared these to genomic sequences derived from a pool of four wild-type plants. By genome-wide comparisons, we identified ten small mutations in the genomes of the five transgenic plants, not correlated with the positions or number of T-DNA inserts. This mutation frequency is within the range of spontaneous mutations occurring during seed propagation in A. thaliana, as determined earlier. In addition, we detected small as well as large deletions specifically at the T-DNA insert sites. Furthermore, we detected partial T-DNA inserts, one of them a tiny 50-bp fragment originating from a central part of the T-DNA construct used, inserted into the plant genome without any other flanking T-DNA. Because of its small size, we named this fragment a T-DNA splinter. As far as we know, this is the first report of such a small T-DNA fragment insert in the absence of any T-DNA border sequence. Finally, we found evidence for translocations from other chromosomes flanking T-DNA inserts. In this study, we showed that next-generation sequencing (NGS) is a highly sensitive approach to detect T-DNA inserts in transgenic plants.

  7. Phylogenomics and Ecogenomics of the Mycorrhizal Symbiosis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kuo, Alan; Grigoriev, Igor V.; Kohler, Annegret

    Mycorrhizal fungi play critical roles in host plant health, soil community structure and chemistry, and carbon and nutrient cycling, all areas of intense interest to the US Dept. of Energy (DOE) Joint Genome Institute (JGI). To this end, we are building on our earlier sequencing of the Laccaria bicolor genome by partnering with INRA-Nancy and the mycorrhizal research community in the MGI to sequence and analyze two dozen mycorrhizal genomes from numerous known mycorrhizal orders and several ecological types (ectomycorrhizal [ECM], ericoid, orchid, and arbuscular). JGI has developed and deployed high-throughput pipelines for genomic, transcriptomic, and re-sequencing, and platforms for assembly, annotation, and analysis. In the last 2 years we have sequenced 21 genomes of mycorrhizal fungi and resequenced 6 additional strains of L. bicolor. Most of these data are publicly available on JGI MycoCosm's Mycorrhizal Fungi Portal (http://jgi.doe.gov/Mycorrhizal_fungi/), which provides access to both the genome data and tools with which to analyze them. These data allow us to address long-standing issues in mycorrhizal evolution and ecology. For example, a major observation of mycorrhizal evolution is that each of the major ecological types appears to have evolved independently in multiple fungal clades. Using an ecogenomic approach, we provide preliminary evidence that 2 clades (Cantharellales and Sebacinales) of a single symbiotic ecotype (orchid) utilize some common regulatory (protein tyrosine kinase) and metabolic (lipase) pathways, the latter of which may be the product of horizontal gene transfer (HGT). Using a phylogenomic approach, we provide preliminary evidence that a particular ecotype (ericoid) may have evolved more than once within a major clade (Leotiomycetes).

  8. SNP discovery in candidate adaptive genes using exon capture in a free-ranging alpine ungulate

    Treesearch

    Gretchen H. Roffler; Stephen J. Amish; Seth Smith; Ted Cosart; Marty Kardos; Michael K. Schwartz; Gordon Luikart

    2016-01-01

    Identification of genes underlying genomic signatures of natural selection is key to understanding adaptation to local conditions. We used targeted resequencing to identify SNP markers in 5321 candidate adaptive genes associated with known immunological, metabolic and growth functions in ovids and other ungulates. We selectively targeted 8161 exons in protein-coding...

  9. A New and Improved Rainbow Trout (Oncorhynchus mykiss) Reference Genome Assembly

    USDA-ARS's Scientific Manuscript database

    In an effort to improve the rainbow trout reference genome assembly, we re-sequenced the doubled-haploid Swanson line using the longest available reads from the Illumina technology; generating over 510 million paired-end shotgun reads (2x260nt), and 1 billion mate-pair reads (2x160nt) from four sequ...

  10. Association of freezing tolerance to LpCBFIIIb and LpCBFIIIc gene polymorphism in perennial ryegrass accessions

    USDA-ARS's Scientific Manuscript database

    CBF/DREB related genes are considered important genes for regulation of abiotic stress in plants. In this study, CBF/DREB genes in perennial ryegrass (Lolium perenne L.), also known as LpCBF genes, were resequenced from several cultivated and landrace plants from a worldwide collection. The same pla...

  11. Resequencing Skills and Concepts in Applied Calculus Using the Computer as a Tool.

    ERIC Educational Resources Information Center

    Heid, M. Kathleen

    1988-01-01

    During the first 12 weeks of an applied calculus course, two classes of college students studied calculus concepts using graphical and symbol-manipulation computer programs to perform routine manipulations. Three weeks were spent on skill development. Students showed better understanding of concepts and performed almost as well on routine skills.…

  12. Deep resequencing of Trichinella spiralis reveals previously un-described single nucleotide polymorphisms and intra-isolate variation within the mitochondrial genome.

    USDA-ARS's Scientific Manuscript database

    Trichinella spiralis is a parasitic roundworm that infects domestic swine, rats and humans. Ingestion of infected pork by humans can lead to the potentially fatal disease trichinellosis. The phylogeny and historical dispersal of Trichinella spp. have been studied, in part, by sequencing portions of...

  13. Universal Detection and Identification of Avian Influenza Virus by Use of Resequencing Microarrays

    DTIC Science & Technology

    2009-04-01

    For the RT step, primer LN was replaced by primer NLN (a random 9-mer with a linker sequence). One picogram each of two internal controls (NAC1...samples (data not shown). These data indicated that most of the avian H5N1 samples identified were presumably sensitive to neuraminidase inhibitors

  14. Genotyping in the cloud with Crossbow.

    PubMed

    Gurtowski, James; Schatz, Michael C; Langmead, Ben

    2012-09-01

    Crossbow is a scalable, portable, and automatic cloud computing tool for identifying SNPs from high-coverage, short-read resequencing data. It is built on Apache Hadoop, an implementation of the MapReduce software framework. Hadoop allows Crossbow to distribute read alignment and SNP calling subtasks over a cluster of commodity computers. Two robust tools, Bowtie and SOAPsnp, implement the fundamental alignment and variant calling operations respectively, and have demonstrated capabilities within Crossbow of analyzing approximately one billion short reads per hour on a commodity Hadoop cluster with 320 cores. Through protocol examples, this unit will demonstrate the use of Crossbow for identifying variations in three different operating modes: on a Hadoop cluster, on a single computer, and on the Amazon Elastic MapReduce cloud computing service.
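
    The MapReduce division of labor described above can be illustrated with a toy, single-process sketch. It is not Crossbow's code: alignment is assumed to be already done (Crossbow delegates that step to Bowtie), the reduce step uses a simple majority-vote rule in place of SOAPsnp's probabilistic model, and all function names and thresholds are hypothetical. The map phase emits one record per aligned base keyed by genomic position, the shuffle groups records by key (the step Hadoop performs across cluster nodes), and the reduce phase makes one call per position.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

Key = Tuple[str, int]                  # (chromosome, 0-based position)
Read = Tuple[str, int, str]            # (chromosome, 0-based start, aligned sequence)
Call = Tuple[str, int, str, str, int]  # (chromosome, position, ref base, alt base, depth)


def map_phase(reads: Iterable[Read]) -> Iterator[Tuple[Key, str]]:
    """Emit one (position, observed base) pair per aligned base."""
    for chrom, start, seq in reads:
        for offset, base in enumerate(seq):
            yield (chrom, start + offset), base


def shuffle(pairs: Iterable[Tuple[Key, str]]) -> Dict[Key, List[str]]:
    """Group observed bases by position (Hadoop does this across nodes)."""
    groups: Dict[Key, List[str]] = defaultdict(list)
    for key, base in pairs:
        groups[key].append(base)
    return groups


def reduce_phase(key: Key, bases: List[str], reference: Dict[str, str],
                 min_depth: int = 3, min_frac: float = 0.8) -> Optional[Call]:
    """Call a SNP if one non-reference base dominates a well-covered position."""
    chrom, pos = key
    depth = len(bases)
    if depth < min_depth:
        return None
    base, count = Counter(bases).most_common(1)[0]
    ref_base = reference[chrom][pos]
    if base != ref_base and count / depth >= min_frac:
        return chrom, pos, ref_base, base, depth
    return None


def call_snps(reads: Iterable[Read], reference: Dict[str, str], **kwargs) -> List[Call]:
    groups = shuffle(map_phase(reads))
    calls = (reduce_phase(k, v, reference, **kwargs) for k, v in sorted(groups.items()))
    return [c for c in calls if c is not None]


# Toy data: three reads all carry G where the reference has A at position 4.
reference = {"chr1": "ACGTACGTAC"}
reads = [("chr1", 2, "GTGC"), ("chr1", 3, "TGCG"), ("chr1", 4, "GCGT")]
print(call_snps(reads, reference))
```

    Because each reduce call depends only on the bases at one position, positions can be processed on different machines independently, which is what makes the workload easy to distribute.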

  15. Mosaicism of Solid Gold supports the causality of a noncoding A-to-G transition in the determinism of the callipyge phenotype.

    PubMed Central

    Smit, Maria; Segers, Karin; Carrascosa, Laura Garcia; Shay, Tracy; Baraldi, Francesca; Gyapay, Gabor; Snowder, Gary; Georges, Michel; Cockett, Noelle; Charlier, Carole

    2003-01-01

    To identify the callipyge mutation, we have resequenced 184 kb spanning the DLK1-, GTL2-, PEG11-, and MEG8-imprinted domain and have identified an A-to-G transition in a highly conserved dodecamer motif between DLK1 and GTL2. This was the only difference found between the callipyge (CLPG) allele and a phylogenetically closely related wild-type allele. We report that this SNP is in perfect association with the callipyge genotype. The demonstration that Solid Gold, the alleged founder ram of the callipyge flock, is mosaic for this SNP virtually proves the causality of this SNP in the determinism of the callipyge phenotype. PMID:12586730

  16. A New Single Nucleotide Polymorphism Database for Rainbow Trout Generated Through Whole Genome Resequencing.

    PubMed

    Gao, Guangtu; Nome, Torfinn; Pearse, Devon E; Moen, Thomas; Naish, Kerry A; Thorgaard, Gary H; Lien, Sigbjørn; Palti, Yniv

    2018-01-01

    Single-nucleotide polymorphisms (SNPs) are highly abundant markers, which are broadly distributed in animal genomes. For rainbow trout (Oncorhynchus mykiss), SNP discovery has previously been done through sequencing of restriction-site associated DNA (RAD) libraries, reduced representation libraries (RRL), and RNA sequencing. Recently we performed high-coverage whole genome resequencing of 61 unrelated samples, representing a wide range of rainbow trout and steelhead populations, with 49 new samples added to 12 aquaculture samples from AquaGen (Norway) that we previously used for SNP discovery. Of the 49 new samples, 11 were doubled-haploid lines from Washington State University (WSU) and 38 represented wild and hatchery populations from a wide geographic distribution and with divergent migratory phenotypes. We then mapped the sequences to the new rainbow trout reference genome assembly (GCA_002163495.1), which is based on the Swanson YY doubled-haploid line. Variant calling was conducted with FreeBayes and SAMtools mpileup, followed by filtering of SNPs based on quality score, sequence complexity, read depth at the locus, and number of genotyped samples. Results from the two variant calling programs were compared, and genotypes of the doubled-haploid samples were used for detecting and filtering putative paralogous sequence variants (PSVs) and multi-sequence variants (MSVs). Overall, 30,302,087 SNPs were identified on the 29 chromosomes of the rainbow trout genome and 1,139,018 on unplaced scaffolds, with 4,042,723 SNPs having a high minor allele frequency (MAF > 0.25). The average SNP density on the chromosomes was one SNP per 64 bp, or 15.6 SNPs per kb. Results from our phylogenetic analysis indicate that the SNP markers contain enough population-specific polymorphisms to recover population relationships despite the small sample size used.
    Intra-population polymorphism assessment revealed a high level of polymorphism and heterozygosity within each population. We also provide functional annotation based on the genome position of each SNP and evaluate the use of clonal lines for filtering PSVs and MSVs. These SNPs form a new database, which provides an important resource for a new high-density SNP array design and for other SNP genotyping platforms used for genetic and genomic studies of this iconic salmonid fish species.
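
    The two density figures above express the same ratio, since 1000 / 64 ≈ 15.6. The sketch below shows that conversion together with a minor allele frequency (MAF) calculation of the kind behind the MAF > 0.25 cutoff. It assumes diploid genotypes coded as hypothetical alternate-allele dosages (0, 1, 2, or None for a missing call); it is an illustration, not the authors' filtering pipeline.

```python
from typing import Optional, Sequence


def snps_per_kb(bp_per_snp: float) -> float:
    """Convert 'one SNP per N bp' into SNPs per kilobase."""
    return 1000.0 / bp_per_snp


def minor_allele_frequency(dosages: Sequence[Optional[int]]) -> float:
    """MAF from diploid alternate-allele dosages (0, 1, 2); None marks a missing call."""
    called = [d for d in dosages if d is not None]
    if not called:
        raise ValueError("no genotyped samples")
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)


def passes_maf(dosages: Sequence[Optional[int]], threshold: float = 0.25) -> bool:
    """Hypothetical filter keeping SNPs with MAF strictly above the threshold."""
    return minor_allele_frequency(dosages) > threshold


print(snps_per_kb(64))                              # 15.625, i.e. about 15.6 per kb
print(minor_allele_frequency([0, 1, 2, 2, None]))   # alt frequency 5/8, so MAF 3/8
```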

  17. DNAAlignEditor: DNA alignment editor tool

    PubMed Central

    Sanchez-Villeda, Hector; Schroeder, Steven; Flint-Garcia, Sherry; Guill, Katherine E; Yamasaki, Masanori; McMullen, Michael D

    2008-01-01

    Background With advances in DNA re-sequencing methods and Next-Generation parallel sequencing approaches, there has been a large increase in genomic efforts to define and analyze the sequence variability present among individuals within a species. For very polymorphic species such as maize, this has led to a need for intuitive, user-friendly software that aids the biologist, often with limited programming experience, in tracking, editing, displaying, and exporting multiple individual sequence alignments. To fill this need we have developed a novel DNA alignment editor. Results We have generated a nucleotide sequence alignment editor (DNAAlignEditor) that provides an intuitive, user-friendly interface for manual editing of multiple sequence alignments, with functions for input, editing, and output of sequence alignments. The color-coding of nucleotide identity and the display of associated quality scores aid the manual alignment editing process. DNAAlignEditor works as a client/server tool with two main components: a relational database that collects the processed alignments and a user interface connected to the database through universal data access connectivity drivers. DNAAlignEditor can be used either as a stand-alone application or as a network application with multiple users concurrently connected. Conclusion We anticipate that this software will be of general interest to biologists and population geneticists in editing DNA sequence alignments and analyzing natural sequence variation regardless of species, and will be particularly useful for manual alignment editing of sequences in species with high levels of polymorphism. PMID:18366684

  18. Cytochrome b5 and NADH cytochrome b5 reductase: genotype-phenotype correlations for hydroxylamine reduction.

    PubMed

    Sacco, James C; Trepanier, Lauren A

    2010-01-01

    NADH cytochrome b5 reductase (b5R) and cytochrome b5 (b5) catalyze the reduction of sulfamethoxazole hydroxylamine (SMX-HA), which can contribute to sulfonamide hypersensitivity, to the parent drug sulfamethoxazole. Variability in hydroxylamine reduction could thus play a role in adverse drug reactions. The aim of this study was to characterize variability in SMX-HA reduction in 111 human livers, and investigate its association with single nucleotide polymorphisms (SNPs) in b5 and b5R cDNA. Liver microsomes were assayed for SMX-HA reduction activity, and b5 and b5R expression was semiquantified by immunoblotting. The coding regions of the b5 (CYB5A) and b5R (CYB5R3) genes were resequenced. Hepatic SMX-HA reduction displayed a 19-fold range of individual variability (0.06-1.11 nmol/min/mg protein), and a 17-fold range in efficiency (Vmax/Km) among outliers. SMX-HA reduction was positively correlated with b5 and b5R protein content (P<0.0001, r=0.42; P=0.01, r=0.23, respectively), and expression of both proteins correlated with one another (P<0.0001; r=0.74). A novel cSNP in CYB5A (S5A) was associated with very low activity and protein expression. Two novel CYB5R3 SNPs, R59H and R297H, displayed atypical SMX-HA reduction kinetics and decreased SMX-HA reduction efficiency. These studies indicate that although novel cSNPs in CYB5A and CYB5R3 are associated with significantly altered protein expression and/or hydroxylamine reduction activities, these low-frequency cSNPs seem to only minimally impact overall observed phenotypic variability. Work is underway to characterize polymorphisms in other regions of these genes to further account for individual variability in hydroxylamine reduction.
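
    The fold range quoted above is the ratio of the highest to the lowest activity: 1.11 / 0.06 ≈ 18.5, reported as roughly 19-fold. A minimal sketch of that calculation and of catalytic efficiency as Vmax/Km; the function names are hypothetical and the intermediate value in the usage line is made up.

```python
from typing import Sequence


def fold_range(values: Sequence[float]) -> float:
    """Ratio of the largest to the smallest measured value."""
    low, high = min(values), max(values)
    if low <= 0:
        raise ValueError("fold range needs strictly positive values")
    return high / low


def catalytic_efficiency(vmax: float, km: float) -> float:
    """Efficiency as Vmax/Km; units follow from the inputs."""
    return vmax / km


# Extremes taken from the abstract (nmol/min/mg protein); 0.5 is a made-up middle value.
print(round(fold_range([0.06, 0.5, 1.11]), 1))  # 18.5
```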

  19. LPA and PLG sequence variation and kringle IV-2 copy number in two populations.

    PubMed

    Crawford, Dana C; Peng, Ze; Cheng, Jan-Fang; Boffelli, Dario; Ahearn, Magdalena; Nguyen, Dan; Shaffer, Tristan; Yi, Qian; Livingston, Robert J; Rieder, Mark J; Nickerson, Deborah A

    2008-01-01

    Lp(a) levels have long been recognized as a potential risk factor for coronary heart disease that is almost completely under genetic control. Much of the genetics impacting Lp(a) levels has been attributed to the highly polymorphic LPA kringle IV-2 copy number variant, and most of the variance in Lp(a) levels in populations of European descent is inversely correlated with kringle IV copy number. However, the same structural variation explains less of the variance in African-descent populations. African-descent populations have, on average, higher levels of Lp(a), suggesting that other genetic factors contribute to Lp(a) level variability across populations. To identify potential cis-acting factors, we resequenced the LPA gene for single nucleotide polymorphism (SNP) discovery in 23 European-Americans and 24 African-Americans. We also resequenced the neighboring plasminogen gene (PLG) and genotyped the kringle IV copy number variant in the same reference samples. These data are the most comprehensive description of sequence variation in LPA and its relationship with the kringle IV copy number variant. With these data, we demonstrate that only a fraction of LPA sequence diversity has been previously documented. We also identify several high-frequency SNPs present in the African-American sample but absent in the European-American sample. Finally, we show that SNPs within PLG are not in linkage disequilibrium with SNPs in LPA, and that kringle IV copy number variation is not in linkage disequilibrium with either LPA or PLG SNPs. Together, these data suggest that LPA SNPs could independently contribute to Lp(a) levels in the general population. Copyright (c) 2008 S. Karger AG, Basel.
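
    Linkage disequilibrium (LD) between two biallelic SNPs is commonly summarized as r² = D² / (pA(1 - pA) pB(1 - pB)), where D = pAB - pA pB and pAB is the frequency of haplotypes carrying the coded allele at both SNPs. The sketch below computes r² from phased haplotypes; phasing and the 0/1 allele coding are assumptions for illustration, and the abstract does not state which LD statistic or software the authors used.

```python
from typing import Sequence, Tuple


def ld_r2(haplotypes: Sequence[Tuple[int, int]]) -> float:
    """r^2 between two biallelic SNPs from phased haplotypes coded (0/1, 0/1)."""
    n = len(haplotypes)
    if n == 0:
        raise ValueError("no haplotypes")
    p_a = sum(h[0] for h in haplotypes) / n          # allele 1 frequency at SNP 1
    p_b = sum(h[1] for h in haplotypes) / n          # allele 1 frequency at SNP 2
    p_ab = sum(1 for h in haplotypes if h == (1, 1)) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined for a monomorphic SNP")
    d = p_ab - p_a * p_b                             # disequilibrium coefficient D
    return d * d / denom


print(ld_r2([(1, 1), (0, 0), (1, 1), (0, 0)]))  # alleles always co-occur: 1.0
print(ld_r2([(1, 1), (1, 0), (0, 1), (0, 0)]))  # all combinations equally common: 0.0
```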

  20. Resequencing of the Phytophthora ramorum genome to characterize genetic variation and population dynamics of the invasive pathogen

    Treesearch

    Jennifer Yuzon; David M. Rizzo; Mathu Malar C; Sucheta Tripathy; Takao Kasuga

    2017-01-01

    Phytophthora ramorum has spread and diversified throughout California’s northwestern coast since its introduction in the 1990s. Tracking the spread of P. ramorum and the functional response of the pathogen to the environment is of particular interest to managing the epidemic. Using genetic tools such as microsatellite...

  1. 75 FR 28032 - National Heart, Lung, and Blood Institute; Notice of Closed Meetings

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-05-19

    ... Coordinating Center. Date: June 4, 2010. Time: 1 p.m. to 2 p.m. Agenda: To review and evaluate contract..., NHLBI DNA Resequencing and Genotyping (RS&G) Service: Laboratory Center(s). Date: June 4, 2010. Time: 2... Career Enhancement Awards. Date: June 8-9, 2010. Time: 8 a.m. to 5 p.m. Agenda: To review and evaluate...

  2. Identification of a nonsense mutation in APAF1 that is likely causal for a decrease in reproductive efficiency in Holstein dairy cattle

    USDA-ARS's Scientific Manuscript database

    A haplotype on cattle chromosome 5 carrying a recessive lethal allele was found to originate in a Holstein-Friesian foundation sire. Resequencing led to the identification of a stop-gain mutation in exon 11 of APAF1, a gene known to cause embryonic lethality and neurodevelopmental abnormalities in ...

  3. Applications of microarray technology in breast cancer research

    PubMed Central

    Cooper, Colin S

    2001-01-01

    Microarrays provide a versatile platform for utilizing information from the Human Genome Project to benefit human health. This article reviews the ways in which microarray technology may be used in breast cancer research. Its diverse applications include monitoring chromosome gains and losses, tumour classification, drug discovery and development, DNA resequencing, mutation detection and investigating the mechanism of tumour development. PMID:11305951

  4. Identification of one polymorphism from the PAPP-A2 gene associated to fertility in Romosinuano beef heifers raised under a subtropical environment

    USDA-ARS's Scientific Manuscript database

    The objective of this study was to identify single nucleotide polymorphisms (SNP) associated with fertility in female cows raised under a subtropical environment. Resequencing of 9 genes associated with the GH-IGF endocrine pathway located on bovine chromosome 5 identified 75 SNP useful for associative ge...

  5. Targeted Resequencing of 29 Candidate Genes and Mouse Expression Studies Implicate ZIC3 and FOXF1 in Human VATER/VACTERL Association.

    PubMed

    Hilger, Alina C; Halbritter, Jan; Pennimpede, Tracie; van der Ven, Amelie; Sarma, Georgia; Braun, Daniela A; Porath, Jonathan D; Kohl, Stefan; Hwang, Daw-Yang; Dworschak, Gabriel C; Hermann, Bernhard G; Pavlova, Anna; El-Maarri, Osman; Nöthen, Markus M; Ludwig, Michael; Reutter, Heiko; Hildebrandt, Friedhelm

    2015-12-01

    The VATER/VACTERL association describes the combination of congenital anomalies including vertebral defects, anorectal malformations, cardiac defects, tracheoesophageal fistula with or without esophageal atresia, renal malformations, and limb defects. As mutations in ciliary genes were observed in diseases related to VATER/VACTERL, we performed targeted resequencing of 25 ciliary candidate genes as well as disease-associated genes (FOXF1, HOXD13, PTEN, ZIC3) in 123 patients with VATER/VACTERL or a VATER/VACTERL-like phenotype. We detected no biallelic mutation in any of the 25 ciliary candidate genes; however, we identified an identical, probably disease-causing ZIC3 missense mutation (p.Gly17Cys) in four patients and a FOXF1 de novo mutation (p.Gly220Cys) in a further patient. In situ hybridization analyses in mouse embryos between E9.5 and E14.5 revealed Zic3 expression in limb and prevertebral structures, and Foxf1 expression in esophageal, tracheal, vertebral, anal, and genital tubercle tissues, that is, in the organ systems affected in VATER/VACTERL. These data provide strong evidence that mutations in ZIC3 or FOXF1 contribute to VATER/VACTERL. © 2015 WILEY PERIODICALS, INC.

  6. Non-coding variants contribute to the clinical heterogeneity of TTR amyloidosis.

    PubMed

    Iorio, Andrea; De Lillo, Antonella; De Angelis, Flavio; Di Girolamo, Marco; Luigetti, Marco; Sabatelli, Mario; Pradotto, Luca; Mauro, Alessandro; Mazzeo, Anna; Stancanelli, Claudia; Perfetto, Federico; Frusconi, Sabrina; My, Filomena; Manfellotto, Dario; Fuciarelli, Maria; Polimanti, Renato

    2017-09-01

    Coding mutations in the TTR gene cause a rare hereditary form of systemic amyloidosis, which has a complex genotype-phenotype correlation. We investigated the role of non-coding variants in regulating TTR gene expression and, consequently, amyloidosis symptoms. We evaluated the genotype-phenotype correlation considering the clinical information of 129 Italian patients with TTR amyloidosis. Then, we resequenced the TTR gene to investigate how non-coding variants affect TTR expression and, consequently, phenotypic presentation in carriers of amyloidogenic mutations. Polygenic scores for genetically determined TTR expression were constructed using data from our resequencing analysis and the GTEx (Genotype-Tissue Expression) project. We confirmed a strong phenotypic heterogeneity across coding mutations causing TTR amyloidosis. Considering the effects of non-coding variants on TTR expression, we identified three patient clusters with specific expression patterns associated with certain phenotypic presentations, including late onset, autonomic neurological involvement, and gastrointestinal symptoms. This study provides novel data regarding the role of non-coding variation and gene expression profiles in patients affected by TTR amyloidosis, and puts forth an approach that could be used to investigate the mechanisms underlying the genotype-phenotype correlation of the disease.
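
    A score for genetically determined expression is typically an additive sum of allele dosages weighted by per-variant effect sizes, for example expression effects estimated from a resource such as GTEx. The sketch below shows only that generic additive form; the weights are hypothetical, and the abstract does not specify the authors' exact weighting or variant selection.

```python
from typing import Sequence


def expression_score(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Additive score: sum over variants of allele dosage times effect weight."""
    if len(dosages) != len(weights):
        raise ValueError("one weight per variant is required")
    return sum(d * w for d, w in zip(dosages, weights))


# Hypothetical patient with 0, 1, and 2 copies of three expression-associated alleles.
print(expression_score([0, 1, 2], [0.1, -0.2, 0.3]))  # about 0.4
```

    Patients can then be grouped by their scores, which is the kind of input a clustering of expression patterns would start from.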

  7. Is the child 'father of the man'? evaluating the stability of genetic influences across development.

    PubMed

    Ronald, Angelica

    2011-11-01

    This selective review considers findings in genetic research that have shed light on how genes operate across development. We will address the question of whether the child is 'father of the Man' from a genetic perspective. In other words, do the same genetic influences affect the same traits across development? Using a 'taster menu' approach and prioritizing newer findings on cognitive and behavioral traits, examples from the following genetic disciplines will be discussed: (a) developmental quantitative genetics (such as longitudinal twin studies), (b) neurodevelopmental genetic syndromes with known genetic causes (such as Williams syndrome), (c) developmental candidate gene studies (such as those that link infant and adult populations), (d) developmental genome-wide association studies (GWAS), and (e) DNA resequencing. Evidence presented here suggests that there is considerable genetic stability of cognitive and behavioral traits across development, but there is also evidence for genetic change. Quantitative genetic studies have a long history of assessing genetic continuity and change across development. It is now time for the newer, more technology-enabled fields such as GWAS and DNA resequencing also to take on board the dynamic nature of human behavior. 2011 Blackwell Publishing Ltd.

  8. Population-Based Resequencing of Experimentally Evolved Populations Reveals the Genetic Basis of Body Size Variation in Drosophila melanogaster

    PubMed Central

    Turner, Thomas L.; Stewart, Andrew D.; Fields, Andrew T.; Rice, William R.; Tarone, Aaron M.

    2011-01-01

    Body size is a classic quantitative trait with evolutionarily significant variation within many species. Locating the alleles responsible for this variation would help us understand the maintenance of variation in body size in particular, as well as quantitative traits in general. However, successful genome-wide association of genotype and phenotype may require very large sample sizes if alleles have low population frequencies or modest effects. As a complementary approach, we propose that population-based resequencing of experimentally evolved populations allows for considerable power to map functional variation. Here, we use this technique to investigate the genetic basis of natural variation in body size in Drosophila melanogaster. Significant differentiation of hundreds of loci in replicate selection populations supports the hypothesis that the genetic basis of body size variation is very polygenic in D. melanogaster. Significantly differentiated variants are limited to single genes at some loci, allowing precise hypotheses to be formed regarding causal polymorphisms, while other significant regions are large and contain many genes. By using significantly associated polymorphisms as a priori candidates in follow-up studies, these data are expected to provide considerable power to determine the genetic basis of natural variation in body size. PMID:21437274

  9. Droplet-based pyrosequencing using digital microfluidics.

    PubMed

    Boles, Deborah J; Benton, Jonathan L; Siew, Germaine J; Levy, Miriam H; Thwar, Prasanna K; Sandahl, Melissa A; Rouse, Jeremy L; Perkins, Lisa C; Sudarsan, Arjun P; Jalili, Roxana; Pamula, Vamsee K; Srinivasan, Vijay; Fair, Richard B; Griffin, Peter B; Eckhardt, Allen E; Pollack, Michael G

    2011-11-15

    The feasibility of implementing pyrosequencing chemistry within droplets using electrowetting-based digital microfluidics is reported. An array of electrodes patterned on a printed-circuit board was used to control the formation, transportation, merging, mixing, and splitting of submicroliter-sized droplets contained within an oil-filled chamber. A three-enzyme pyrosequencing protocol was implemented in which individual droplets contained enzymes, deoxyribonucleotide triphosphates (dNTPs), and DNA templates. The DNA templates were anchored to magnetic beads which enabled them to be thoroughly washed between nucleotide additions. Reagents and protocols were optimized to maximize signal over background, linearity of response, cycle efficiency, and wash efficiency. As an initial demonstration of feasibility, a portion of a 229 bp Candida parapsilosis template was sequenced using both a de novo protocol and a resequencing protocol. The resequencing protocol generated over 60 bp of sequence with 100% sequence accuracy based on raw pyrogram levels. Excellent linearity was observed for all of the homopolymers (two, three, or four nucleotides) contained in the C. parapsilosis sequence. With improvements in microfluidic design it is expected that longer reads, higher throughput, and improved process integration (i.e., "sample-to-sequence" capability) could eventually be achieved using this low-cost platform.
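
    In pyrosequencing, nucleotides are dispensed in a fixed, repeating flow order, and the light signal for each flow is roughly proportional to the number of identical bases incorporated; this is why linearity of response across homopolymers matters for accuracy. A simplified base caller rounds each normalized signal to a homopolymer length. The flow order, the normalization so that a single-base incorporation gives a signal of 1.0, and the length cap are all hypothetical, and real callers additionally correct for incomplete extension and carry-forward.

```python
from typing import Sequence


def call_bases(flow_order: str, signals: Sequence[float], max_homopolymer: int = 8) -> str:
    """Turn normalized per-flow signals into a sequence by rounding to homopolymer lengths."""
    calls = []
    for i, signal in enumerate(signals):
        base = flow_order[i % len(flow_order)]          # flows cycle through the order
        length = min(int(round(max(signal, 0.0))), max_homopolymer)
        calls.append(base * length)                     # zero-length flows add nothing
    return "".join(calls)


print(call_bases("TACG", [1.02, 0.05, 1.96, 0.98]))  # T, no A, CC, G -> "TCCG"
```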

  10. Deep sequencing detects very-low-grade somatic mosaicism in the unaffected mother of siblings with nemaline myopathy.

    PubMed

    Miyatake, Satoko; Koshimizu, Eriko; Hayashi, Yukiko K; Miya, Kazushi; Shiina, Masaaki; Nakashima, Mitsuko; Tsurusaki, Yoshinori; Miyake, Noriko; Saitsu, Hirotomo; Ogata, Kazuhiro; Nishino, Ichizo; Matsumoto, Naomichi

    2014-07-01

    When an expected mutation in a particular disease-causing gene is not identified in a suspected carrier, it is usually assumed to be due to germline mosaicism. We report here very-low-grade somatic mosaicism in ACTA1 in an unaffected mother of two siblings affected with a neonatal form of nemaline myopathy. The mosaicism was detected by deep resequencing using a next-generation sequencer. We identified a novel heterozygous mutation in ACTA1, c.448A>G (p.Thr150Ala), in the affected siblings. Three-dimensional structural modeling suggested that this mutation may affect polymerization and/or actin's interactions with other proteins. In this family, we expected autosomal dominant inheritance with either parent demonstrating germline or somatic mosaicism. Sanger sequencing identified no mutation. However, further deep resequencing of this mutation on a next-generation sequencer identified very-low-grade somatic mosaicism in the mother: 0.4%, 1.1%, and 8.3% in the saliva, blood leukocytes, and nails, respectively. Our study demonstrates the possibility of very-low-grade somatic mosaicism in suspected carriers, rather than germline mosaicism. Copyright © 2014 Elsevier B.V. All rights reserved.
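
    The mosaicism levels above are variant allele fractions (VAF): mutant reads divided by total reads at the site. At fractions this low, the key question is whether the mutant read count exceeds what sequencing error alone would produce. A minimal sketch using a binomial tail probability, assuming a hypothetical per-base error rate and read depth (neither is given in the abstract); the probabilities are computed in log space so that very high depths do not overflow.

```python
from math import exp, lgamma, log


def vaf(alt_reads: int, depth: int) -> float:
    """Variant allele fraction at one site."""
    return alt_reads / depth


def binom_log_pmf(i: int, n: int, p: float) -> float:
    """log P(X = i) for X ~ Binomial(n, p)."""
    return lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1) + i * log(p) + (n - i) * log(1.0 - p)


def tail_prob(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): chance that errors alone yield k or more alt reads."""
    if not 0.0 < p < 1.0:
        raise ValueError("error rate must lie strictly between 0 and 1")
    return sum(exp(binom_log_pmf(i, n, p)) for i in range(k, n + 1))


# Hypothetical site: 4 mutant reads out of 1000 (VAF 0.4%) with a 0.1% error rate.
print(vaf(4, 1000))
print(tail_prob(4, 1000, 0.001))  # roughly 0.02
```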

  11. Droplet-Based Pyrosequencing Using Digital Microfluidics

    PubMed Central

    Boles, Deborah J.; Benton, Jonathan L.; Siew, Germaine J.; Levy, Miriam H.; Thwar, Prasanna K.; Sandahl, Melissa A.; Rouse, Jeremy L.; Perkins, Lisa C.; Sudarsan, Arjun P.; Jalili, Roxana; Pamula, Vamsee K.; Srinivasan, Vijay; Fair, Richard B.; Griffin, Peter B.; Eckhardt, Allen E.; Pollack, Michael G.

    2013-01-01

    The feasibility of implementing pyrosequencing chemistry within droplets using electrowetting-based digital microfluidics is reported. An array of electrodes patterned on a printed-circuit board was used to control the formation, transportation, merging, mixing, and splitting of submicroliter-sized droplets contained within an oil-filled chamber. A three-enzyme pyrosequencing protocol was implemented in which individual droplets contained enzymes, deoxyribonucleotide triphosphates (dNTPs), and DNA templates. The DNA templates were anchored to magnetic beads which enabled them to be thoroughly washed between nucleotide additions. Reagents and protocols were optimized to maximize signal over background, linearity of response, cycle efficiency, and wash efficiency. As an initial demonstration of feasibility, a portion of a 229 bp Candida parapsilosis template was sequenced using both a de novo protocol and a resequencing protocol. The resequencing protocol generated over 60 bp of sequence with 100% sequence accuracy based on raw pyrogram levels. Excellent linearity was observed for all of the homopolymers (two, three, or four nucleotides) contained in the C. parapsilosis sequence. With improvements in microfluidic design it is expected that longer reads, higher throughput, and improved process integration (i.e., “sample-to-sequence” capability) could eventually be achieved using this low-cost platform. PMID:21932784

  12. Genome re-sequencing reveals the history of apple and supports a two-stage model for fruit enlargement.

    PubMed

    Duan, Naibin; Bai, Yang; Sun, Honghe; Wang, Nan; Ma, Yumin; Li, Mingjun; Wang, Xin; Jiao, Chen; Legall, Noah; Mao, Linyong; Wan, Sibao; Wang, Kun; He, Tianming; Feng, Shouqian; Zhang, Zongying; Mao, Zhiquan; Shen, Xiang; Chen, Xiaoliu; Jiang, Yuanmao; Wu, Shujing; Yin, Chengmiao; Ge, Shunfeng; Yang, Long; Jiang, Shenghui; Xu, Haifeng; Liu, Jingxuan; Wang, Deyun; Qu, Changzhi; Wang, Yicheng; Zuo, Weifang; Xiang, Li; Liu, Chang; Zhang, Daoyuan; Gao, Yuan; Xu, Yimin; Xu, Kenong; Chao, Thomas; Fazio, Gennaro; Shu, Huairui; Zhong, Gan-Yuan; Cheng, Lailiang; Fei, Zhangjun; Chen, Xuesen

    2017-08-15

    Human selection has reshaped crop genomes. Here we report an apple genome variation map generated through genome sequencing of 117 diverse accessions. A comprehensive model of apple speciation and domestication along the Silk Road is proposed based on evidence from diverse genomic analyses. Cultivated apples likely originate from Malus sieversii in Kazakhstan, followed by intensive introgressions from M. sylvestris. M. sieversii in Xinjiang, China, turns out to be an "ancient" isolated ecotype not directly contributing to apple domestication. We have identified selective sweeps underlying quantitative trait loci/genes of important fruit quality traits, including fruit texture and flavor, and provide evidence supporting a model of apple fruit size evolution comprising two major events, one occurring prior to domestication and the other during domestication. This study outlines the genetic basis of apple domestication and evolution, and provides valuable information for facilitating marker-assisted breeding and apple improvement. Apple is one of the most important fruit crops. Here, the authors perform deep genome resequencing of 117 diverse accessions and reveal comprehensive models of apple origin, speciation, domestication, and fruit size evolution, as well as candidate genes associated with important agronomic traits.

  13. Genomewide Scan Reveals Amplification of mdr1 as a Common Denominator of Resistance to Mefloquine, Lumefantrine, and Artemisinin in Plasmodium chabaudi Malaria Parasites

    PubMed Central

    Borges, Sofia; Cravo, Pedro; Creasey, Alison; Fawcett, Richard; Modrzynska, Katarzyna; Rodrigues, Louise; Martinelli, Axel; Hunt, Paul

    2011-01-01

    Multidrug-resistant Plasmodium falciparum malaria parasites pose a threat to effective drug control, even to artemisinin-based combination therapies (ACTs). Here we used linkage group selection and Solexa whole-genome resequencing to investigate the genetic basis of resistance to component drugs of ACTs. Using the rodent malaria parasite P. chabaudi, we analyzed the uncloned progeny of a genetic backcross between the mefloquine-, lumefantrine-, and artemisinin-resistant mutant AS-15MF and a genetically distinct sensitive clone, AJ, following drug treatment. Genomewide scans of selection showed that parasites surviving each drug treatment bore a duplication of a segment of chromosome 12 (translocated to chromosome 4) present in AS-15MF. Whole-genome resequencing identified the size of the duplicated segment and its position on chromosome 4. The duplicated fragment extends for ∼393 kbp and contains over 100 genes, including mdr1, encoding the multidrug resistance P-glycoprotein homologue 1. We therefore show that resistance to chemically distinct components of ACTs is mediated by the same genetic mutation, highlighting a possible limitation of these therapies. PMID:21709099
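
    The abstract does not say how the extent of the duplicated segment was read off the resequencing data. One common way to spot a duplication in short-read data is elevated read depth: windows inside a two-copy segment carry roughly twice the genome-wide median depth. The sketch below illustrates only that general idea; the per-window depth input, the 1.5x ratio threshold and the minimum run length are hypothetical choices, not the authors' method.

```python
from statistics import median


def find_amplified_segments(window_depths, window_size, min_ratio=1.5, min_windows=2):
    """Return half-open (start, end) bp coordinates of runs of windows whose
    depth relative to the genome-wide median is at least min_ratio.

    window_depths: mean read depth of consecutive, equal-sized windows
    (hypothetical input). Runs shorter than min_windows are ignored.
    """
    baseline = median(window_depths)
    segments = []
    run_start = None
    # A trailing 0.0 sentinel closes any run that reaches the last window.
    for i, depth in enumerate(list(window_depths) + [0.0]):
        elevated = baseline > 0 and depth / baseline >= min_ratio
        if elevated and run_start is None:
            run_start = i
        elif not elevated and run_start is not None:
            if i - run_start >= min_windows:
                segments.append((run_start * window_size, i * window_size))
            run_start = None
    return segments
```

    For example, `find_amplified_segments([30, 31, 29, 60, 62, 59, 30, 28], 1000)` flags the three doubled windows as one segment, `(3000, 6000)`. Requiring several consecutive windows trades resolution at the breakpoints for robustness against single noisy windows.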

  14. Population-based resequencing revealed an ancestral winter group of cultivated flax: implication for flax domestication processes

    PubMed Central

    Fu, Yong-Bi

    2012-01-01

    Cultivated flax (Linum usitatissimum L.) is the earliest oil and fiber crop and its early domestication history may involve multiple events of domestication for oil, fiber, capsular indehiscence, and winter hardiness. Genetic studies have demonstrated that winter cultivated flax is closely related to oil and fiber cultivated flax and shows little relatedness to its progenitor, pale flax (L. bienne Mill.), but winter hardiness is one major characteristic of pale flax. Here, we assessed the genetic relationships of 48 Linum samples representing pale flax and four trait-specific groups of cultivated flax (dehiscent, fiber, oil, and winter) through population-based resequencing at 24 genomic regions, and revealed a winter group of cultivated flax that displayed close relatedness to the pale flax samples. Overall, the cultivated flax showed a 27% reduction of nucleotide diversity when compared with the pale flax. Recombination frequently occurred at these sampled genomic regions, but the signal of selection and bottleneck was relatively weak. These findings provide some insight into the impact and processes of flax domestication and are significant for expanding our knowledge about early flax domestication, particularly for winter hardiness. PMID:22822439
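
    Nucleotide diversity (π) is the average number of pairwise differences per site among sampled sequences, and a percentage reduction compares a derived group (here, cultivated flax) with its progenitor (pale flax). A minimal sketch, assuming equal-length, gap-free aligned sequences; the toy sequences in the usage line are invented and are not the study's data.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average pairwise differences per site (pi) for equal-length aligned sequences."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / (len(pairs) * length)


def percent_reduction(pi_derived, pi_progenitor):
    """Percent loss of diversity in the derived group relative to the progenitor."""
    return 100.0 * (1.0 - pi_derived / pi_progenitor)
```

    With toy groups `wild = ["AAAA", "AAAT", "AATT"]` (π = 1/3) and `cultivated = ["AAAA", "AAAA", "AAAT"]` (π = 1/6), `percent_reduction` returns 50.0; the 27% reported above is the same kind of ratio computed over the 24 sampled genomic regions.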

  15. Hybrid error correction and de novo assembly of single-molecule sequencing reads

    PubMed Central

    Koren, Sergey; Schatz, Michael C.; Walenz, Brian P.; Martin, Jeffrey; Howard, Jason; Ganapathy, Ganeshkumar; Wang, Zhong; Rasko, David A.; McCombie, W. Richard; Jarvis, Erich D.; Phillippy, Adam M.

    2012-01-01

    Emerging single-molecule sequencing instruments can generate multi-kilobase sequences with the potential to dramatically improve genome and transcriptome assembly. However, the high error rate of single-molecule reads is challenging, and has limited their use to resequencing bacteria. To address this limitation, we introduce a novel correction algorithm and assembly strategy that utilizes shorter, high-identity sequences to correct the error in single-molecule sequences. We demonstrate the utility of this approach on PacBio RS reads of phage, prokaryotic, and eukaryotic whole genomes, including the novel genome of the parrot Melopsittacus undulatus, as well as for RNA-seq reads of the corn (Zea mays) transcriptome. Our approach achieves over 99.9% read correction accuracy and produces substantially better assemblies than current sequencing strategies: in the best example, quintupling the median contig size relative to high-coverage, second-generation assemblies. Greater gains are predicted if read lengths continue to increase, including the prospect of single-contig bacterial chromosome assembly. PMID:22750884
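
    The principle behind hybrid correction is that many accurate short reads covering the same stretch of a noisy long read can out-vote its errors. The sketch below reduces this to a per-position majority vote. It assumes the short reads have already been placed on the long read without gaps, given as hypothetical `(offset, sequence)` pairs. It ignores the insertion and deletion errors typical of single-molecule reads, so it illustrates the idea rather than the published algorithm.

```python
from collections import Counter


def correct_long_read(long_read, placed_short_reads, min_support=3):
    """Replace each long-read base with the majority base from overlapping short reads.

    placed_short_reads: iterable of (offset, sequence) pairs giving where each
    short read starts on the long read (gap-free placement is assumed).
    Positions covered by fewer than min_support short-read bases keep the
    original long-read base.
    """
    votes = [Counter() for _ in long_read]
    for offset, seq in placed_short_reads:
        for i, base in enumerate(seq):
            pos = offset + i
            if 0 <= pos < len(long_read):
                votes[pos][base] += 1
    corrected = []
    for original, counter in zip(long_read, votes):
        if sum(counter.values()) >= min_support:
            corrected.append(counter.most_common(1)[0][0])
        else:
            corrected.append(original)
    return "".join(corrected)
```

    For example, the long read `"ACGTTCGA"` with short reads `[(2, "GTACG"), (3, "TACGA"), (0, "ACGTAC")]` is corrected to `"ACGTACGA"`, because all three short reads agree on `A` at position 4.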

  16. Hybridisation-based resequencing of 17 X-linked intellectual disability genes in 135 patients reveals novel mutations in ATRX, SLC6A8 and PQBP1

    PubMed Central

    Jensen, Lars R; Chen, Wei; Moser, Bettina; Lipkowitz, Bettina; Schroeder, Christopher; Musante, Luciana; Tzschach, Andreas; Kalscheuer, Vera M; Meloni, Ilaria; Raynaud, Martine; van Esch, Hilde; Chelly, Jamel; de Brouwer, Arjan P M; Hackett, Anna; van der Haar, Sigrun; Henn, Wolfram; Gecz, Jozef; Riess, Olaf; Bonin, Michael; Reinhardt, Richard; Ropers, Hans-Hilger; Kuss, Andreas W

    2011-01-01

    X-linked intellectual disability (XLID), also known as X-linked mental retardation, is a highly genetically heterogeneous condition for which mutations in >90 different genes have been identified. In this study, we used a custom-made sequencing array based on the Affymetrix 50k platform for mutation screening in 17 known XLID genes in patients from 135 families and found eight single-nucleotide changes that were absent in controls. For four mutations affecting ATRX (p.1761M>T), PQBP1 (p.155R>X) and SLC6A8 (p.390P>L and p.477S>L), we provide evidence for a functional involvement of these changes in the aetiology of intellectual disability. PMID:21267006

  17. Identification of RAN1 orthologue associated with sex determination through whole genome sequencing analysis in fig (Ficus carica L.).

    PubMed

    Mori, Kazuki; Shirasawa, Kenta; Nogata, Hitoshi; Hirata, Chiharu; Tashiro, Kosuke; Habu, Tsuyoshi; Kim, Sangwan; Himeno, Shuichi; Kuhara, Satoru; Ikegami, Hidetoshi

    2017-01-25

    With the aim of identifying the sex determinants of fig, we generated the first draft genome sequence of fig and conducted subsequent analyses. Linkage analysis with a high-density genetic map established by a restriction-site associated sequencing technique, and a genome-wide association study followed by whole-genome resequencing analysis, identified two missense mutations in the orthologue of RESPONSIVE-TO-ANTAGONIST1 (RAN1), which encodes a copper-transporting ATPase; these mutations were completely associated with the sex phenotypes of the investigated figs. This result suggests that RAN1 is a candidate sex determinant in the fig genome. The genomic resources and genetic findings obtained in this study can contribute to a general understanding of Ficus species and provide insight into the sex determination systems of fig and other plants.

  18. Population-Specific Resequencing Associates the ATP-Binding Cassette Subfamily C Member 4 Gene With Gout in New Zealand Māori and Pacific Men.

    PubMed

    Tanner, Callum; Boocock, James; Stahl, Eli A; Dobbyn, Amanda; Mandal, Asim K; Cadzow, Murray; Phipps-Green, Amanda J; Topless, Ruth K; Hindmarsh, Jennie Harré; Stamp, Lisa K; Dalbeth, Nicola; Choi, Hyon K; Mount, David B; Merriman, Tony R

    2017-07-01

    There is no evidence for a genetic association of organic anion transporters 1-3 (SLC22A6, SLC22A7, and SLC22A8) or multidrug resistance protein 4 (MRP4; encoded by ABCC4) with serum urate levels or gout. The Māori and Pacific (Polynesian) population of New Zealand has the highest prevalence of gout worldwide. The aim of this study was to determine whether any Polynesian population-specific genetic variants in SLC22A6-8 and ABCC4 are associated with gout. All participants had ≥3 self-reported Māori and/or Pacific grandparents. Among the total sample set of 1,808 participants, 191 hyperuricemic and 202 normouricemic individuals were resequenced over the 4 genes, and the remaining 1,415 individuals were used for replication. Regression analyses were performed, adjusting for age, sex, and Polynesian ancestry. To study the functional effect of nonsynonymous variants of ABCC4, transport assays were performed in Xenopus laevis oocytes. A total of 39 common variants were detected, with an ABCC4 variant (rs4148500) significantly associated with hyperuricemia and gout. This variant was monomorphic for the urate-lowering allele in Europeans. There was evidence for an association of rs4148500 with gout in the resequenced samples (odds ratio [OR] 1.62 [P = 0.012]) that was replicated (OR 1.25 [P = 0.033]) and restricted to men (OR 1.43 [P = 0.001] versus OR 0.98 [P = 0.89] in women). The gout risk allele was associated with fractional excretion of uric acid in male individuals (β = -0.570 [P = 0.01]). A rare population-specific allele (P1036L) with a predicted strong functional consequence reduced the uric acid transport activity of ABCC4 by 30%. An association between ABCC4 and gout and fractional excretion of uric acid is consistent with the established role of MRP4 as a unidirectional renal uric acid efflux pump. © 2017, American College of Rheumatology.
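
    An odds ratio compares the odds of carrying the risk allele in cases with the odds in controls. The sketch below computes an unadjusted allelic odds ratio with a Wald 95% confidence interval from a 2×2 table of allele counts. The counts used below are hypothetical, and the study's own estimates came from regression models adjusted for age, sex, and Polynesian ancestry, which this simple calculation does not reproduce.

```python
import math


def allelic_odds_ratio(case_risk, case_other, control_risk, control_other, z=1.96):
    """Unadjusted odds ratio and Wald confidence interval from allele counts.

    Returns (odds_ratio, lower, upper); z=1.96 gives an approximate 95% interval.
    """
    if min(case_risk, case_other, control_risk, control_other) <= 0:
        raise ValueError("all cells must be positive for the Wald interval")
    odds_ratio = (case_risk * control_other) / (case_other * control_risk)
    # Standard error of log(OR) is the square root of the summed reciprocal counts.
    se_log_or = math.sqrt(1 / case_risk + 1 / case_other + 1 / control_risk + 1 / control_other)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper
```

    With hypothetical counts of 120 risk and 80 other alleles in cases and 100 of each in controls, the odds ratio is (120 × 100) / (80 × 100) = 1.5, and the interval only just excludes 1.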

  19. Identifying structural variation in haploid microbial genomes from short-read resequencing data using breseq.

    PubMed

    Barrick, Jeffrey E; Colburn, Geoffrey; Deatherage, Daniel E; Traverse, Charles C; Strand, Matthew D; Borges, Jordan J; Knoester, David B; Reba, Aaron; Meyer, Austin G

    2014-11-29

    Mutations that alter chromosomal structure play critical roles in evolution and disease, including in the origin of new lifestyles and pathogenic traits in microbes. Large-scale rearrangements in genomes are often mediated by recombination events involving new or existing copies of mobile genetic elements, recently duplicated genes, or other repetitive sequences. Most current software programs for predicting structural variation from short-read DNA resequencing data are intended primarily for use on human genomes. They typically disregard information in reads mapping to repeat sequences, and significant post-processing and manual examination of their output is often required to rule out false-positive predictions and precisely describe mutational events. We have implemented an algorithm for identifying structural variation from DNA resequencing data as part of the breseq computational pipeline for predicting mutations in haploid microbial genomes. Our method evaluates the support for new sequence junctions present in a clonal sample from split-read alignments to a reference genome, including matches to repeat sequences. Then, it uses a statistical model of read coverage evenness to accept or reject these predictions. Finally, breseq combines predictions of new junctions and deleted chromosomal regions to output biologically relevant descriptions of mutations and their effects on genes. We demonstrate the performance of breseq on simulated Escherichia coli genomes with deletions generating unique breakpoint sequences, new insertions of mobile genetic elements, and deletions mediated by mobile elements. Then, we reanalyze data from an E. coli K-12 mutation accumulation evolution experiment in which structural variation was not previously identified. Transposon insertions and large-scale chromosomal changes detected by breseq account for ~25% of spontaneous mutations in this strain. 
In all cases, we find that breseq is able to reliably predict structural variation with modest read-depth coverage of the reference genome (>40-fold). Using breseq to predict structural variation should be useful for studies of microbial epidemiology, experimental evolution, synthetic biology, and genetics when a reference genome for a closely related strain is available. In these cases, breseq can discover mutations that may be responsible for important or unintended changes in genomes that might otherwise go undetected.
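
    The following calculation shows why modest coverage such as 40-fold can separate true deletions from chance gaps in coverage. Assume, as a simplification, that read depth at each position is Poisson-distributed with mean λ equal to the average coverage. Real coverage is overdispersed, and this is not breseq's actual statistical model. Under this assumption, the chance that a single position receives no reads is e^(−λ), and the expected number of such positions in a genome of length G is G·e^(−λ) if positions are treated as independent.

```python
import math


def prob_zero_coverage(mean_depth):
    """P(depth == 0) at one position under an assumed Poisson(mean_depth) model."""
    return math.exp(-mean_depth)


def expected_spurious_zero_sites(genome_length, mean_depth):
    """Expected count of zero-coverage positions arising purely by chance,
    treating positions as independent (a simplifying assumption)."""
    return genome_length * prob_zero_coverage(mean_depth)
```

    For an E. coli-sized genome of about 4.6 Mb, this model predicts about 2 × 10⁻¹¹ chance zero-coverage sites at 40-fold coverage, compared with roughly 31,000 at 5-fold. Under these assumptions, an uncovered region at 40-fold coverage is far more likely to be a real deletion than sampling noise.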

  20. Resequencing three candidate genes discovers seven potentially deleterious variants susceptibility to major depressive disorder and suicide attempts in Chinese.

    PubMed

    Rao, Shitao; Leung, Cherry She Ting; Lam, Macro Hb; Wing, Yun Kwok; Waye, Mary Miu Yee; Tsui, Stephen Kwok Wing

    2017-03-01

    To date, almost 200 genes have been found to be associated with major depressive disorder (MDD) or suicide attempts (SA), but molecular mechanisms have been reported for very few of them. This study aimed to determine whether common or rare variants in three candidate genes alter the risk for MDD and SA in Chinese individuals. Three candidate genes (HOMER1, SLC6A4 and TEF) were chosen for resequencing analysis and association studies because they had been reported to be involved in the etiology of MDD and SA. Bioinformatics analyses were then applied to the variants of interest. After resequencing and alignment of the amplicons, a total of 34 common or rare variants were found in 36 randomly selected Hong Kong Chinese patients with both MDD and SA. Of these, seven variants showed potentially deleterious features. rs60029191 and a rare variant located in the regulatory region of the HOMER1 gene may affect promoter activity through interactions with predicted transcription factors. Two missense mutations in the SLC6A4 coding region were reported for the first time in Hong Kong Chinese MDD and SA patients, and both could affect the efficiency with which SLC6A4 transports serotonin. Moreover, a common variant, rs6354, located in the untranslated region of this gene may affect the expression level or exonic splicing of the serotonin transporter. In addition, both rs738499, a widely studied polymorphism, and a low-frequency variant in the promoter region of the TEF gene were found to lie in potential transcription factor binding sites, so both variants may influence the promoter activity of the gene. This study elucidated potential molecular mechanisms by which the three candidate genes may alter the risk for MDD and SA. These findings imply that both common and rare variants may contribute to genetic susceptibility to MDD and SA in Chinese individuals. Copyright © 2016 Elsevier B.V. All rights reserved.

  1. SNiPlay: a web-based tool for detection, management and analysis of SNPs. Application to grapevine diversity projects.

    PubMed

    Dereeper, Alexis; Nicolas, Stéphane; Le Cunff, Loïc; Bacilieri, Roberto; Doligez, Agnès; Peros, Jean-Pierre; Ruiz, Manuel; This, Patrice

    2011-05-05

    High-throughput re-sequencing, new genotyping technologies and the availability of reference genomes allow the extensive characterization of Single Nucleotide Polymorphisms (SNPs) and insertion/deletion events (indels) in many plant species. The rapidly increasing amount of re-sequencing and genotyping data generated by large-scale genetic diversity projects requires the development of integrated bioinformatics tools able to efficiently manage, analyze, and combine these genetic data with genome structure and external data. In this context, we developed SNiPlay, a flexible, user-friendly and integrative web-based tool dedicated to polymorphism discovery and analysis. It integrates: 1) a pipeline, freely accessible through the internet, combining existing software with new tools to detect SNPs and to compute different types of statistical indices and graphical layouts for SNP data. From standard sequence alignments, genotyping data or Sanger sequencing traces given as input, SNiPlay detects SNPs and indel events and outputs submission files for the design of Illumina's SNP chips. Subsequently, it sends sequences and genotyping data into a series of modules in charge of various processes: physical mapping to a reference genome, annotation (genomic position, intron/exon location, synonymous/non-synonymous substitutions), SNP frequency determination in user-defined groups, haplotype reconstruction and network, linkage disequilibrium evaluation, and diversity analysis (Pi, Watterson's Theta, Tajima's D). Furthermore, the pipeline allows the use of external data (such as phenotype, geographic origin, taxa, stratification) to define groups and compare statistical indices. 2) a database storing polymorphisms, genotyping data and grapevine sequences released by public and private projects.
It allows the user to retrieve SNPs using various filters (such as genomic position, missing data, polymorphism type, allele frequency), to compare SNP patterns between populations, and to export genotyping data or sequences in various formats. Our experiments on grapevine genetic projects showed that SNiPlay allows geneticists to rapidly obtain advanced results in several key research areas of plant genetic diversity. Both the management and treatment of large amounts of SNP data are rendered considerably easier for end-users through automation and integration. Current developments are taking into account new advances in high-throughput technologies. SNiPlay is available at: http://sniplay.cirad.fr/.
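
    SNiPlay reports Pi, Watterson's Theta and Tajima's D among its diversity statistics. The sketch below is an independent illustration of what these indices measure, not SNiPlay's code. It computes all three per locus with the standard formulas, assuming at least four equal-length, gap-free aligned sequences, because Tajima's D has zero variance for n ≤ 3.

```python
import math
from itertools import combinations


def diversity_stats(seqs):
    """Return (pi, theta_w, tajimas_d) for one locus.

    pi: mean pairwise differences over the whole locus (not per site).
    theta_w: Watterson's estimator S / a1, with S = number of segregating sites.
    tajimas_d: (pi - theta_w) / sqrt(e1*S + e2*S*(S-1)); NaN if S == 0.
    """
    n = len(seqs)
    if n < 4:
        raise ValueError("Tajima's D needs at least 4 sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs) / len(pairs)
    # A column is segregating if it contains more than one distinct base.
    s = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    theta_w = s / a1
    if s == 0:
        return pi, theta_w, float("nan")  # D is undefined without polymorphism
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    d = (pi - theta_w) / math.sqrt(e1 * s + e2 * s * (s - 1))
    return pi, theta_w, d
```

    For the toy alignment `["AAAA", "AAAT", "AATT", "ATTT"]`, π = 10/6 and θ_W = 3/(1 + 1/2 + 1/3) = 18/11. D is slightly positive because π exceeds θ_W.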

  2. A two-stage stochastic rule-based model to determine pre-assembly buffer content

    NASA Astrophysics Data System (ADS)

    Gunay, Elif Elcin; Kula, Ufuk

    2018-01-01

    This study considers the need of automobile manufacturers for instant decision-making when resequencing vehicles before final assembly (FA). We propose a rule-based two-stage stochastic model to determine the number of spare vehicles that should be kept in the pre-assembly buffer to restore a sequence altered by paint defects and upstream department constraints. The first stage of the model decides the spare vehicle quantities, while the second stage recovers the scrambled sequence with respect to pre-defined rules. The problem is solved by a sample average approximation (SAA) algorithm. We conduct a numerical study comparing the solutions of the heuristic model with optimal ones and provide the following insights: (i) as the mismatch between the paint entrance and scheduled sequences decreases, the rule-based heuristic model recovers the scrambled sequence as well as the optimal resequencing model; (ii) the rule-based model is more sensitive to the mismatch between the paint entrance and scheduled sequences when recovering the scrambled sequence; (iii) as the defect rate increases, the difference in recovery effectiveness between the rule-based heuristic and optimal solutions increases; (iv) as buffer capacity increases, the optimization model outperforms the heuristic model in recovery effectiveness; (v) as expected, the rule-based model holds more inventory than the optimization model.
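
    Sample average approximation (SAA) replaces an expectation over random scenarios with an average over N sampled scenarios and then optimizes that average. The sketch below applies SAA to a deliberately simplified, hypothetical version of the buffer problem. Each spare vehicle costs `holding_cost`, each defect not covered by a spare costs `shortage_cost`, and the number of paint defects per shift is binomial. These names, costs and the defect model are assumptions for illustration; they are not the authors' two-stage rule-based model.

```python
import random


def saa_spare_vehicles(n_vehicles, defect_rate, holding_cost, shortage_cost,
                       max_spares, n_scenarios=2000, seed=1):
    """Choose the number of spares (0..max_spares) minimizing sample-average cost.

    First stage: pick the number of spares. Second stage (recourse): in each
    sampled scenario, defects beyond the spares go unrecovered and incur
    shortage_cost each.
    """
    rng = random.Random(seed)
    # Each scenario is a binomial defect count drawn via Bernoulli trials.
    scenarios = [sum(rng.random() < defect_rate for _ in range(n_vehicles))
                 for _ in range(n_scenarios)]

    def sample_average_cost(spares):
        total = 0.0
        for defects in scenarios:
            unrecovered = max(defects - spares, 0)
            total += holding_cost * spares + shortage_cost * unrecovered
        return total / n_scenarios

    best = min(range(max_spares + 1), key=sample_average_cost)
    return best, sample_average_cost(best)
```

    Because the same sampled scenarios are reused for every candidate buffer size, the comparison between candidates is not distorted by fresh sampling noise. If `shortage_cost` is below `holding_cost`, the sketch returns zero spares. If shortages are very costly, it keeps enough spares to cover nearly all sampled defect counts.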

  3. Screening Mutations of MYBPC3 in 114 Unrelated Patients with Hypertrophic Cardiomyopathy by Targeted Capture and Next-generation Sequencing.

    PubMed

    Liu, Xuxia; Jiang, Tengyong; Piao, Chunmei; Li, Xiaoyan; Guo, Jun; Zheng, Shuai; Zhang, Xiaoping; Cai, Tao; Du, Jie

    2015-06-19

    Hypertrophic cardiomyopathy (HCM) is a major cause of sudden cardiac death. Mutations in the MYBPC3 gene cause HCM in ~35% of patients with HCM. However, genetic testing in the clinical setting has been limited by the cost and time required for Sanger sequencing. Here, we developed an HCM Molecular Diagnostic Kit enabling ultra-low-cost targeted gene resequencing in a large cohort and investigated the mutation spectrum of MYBPC3. In a cohort of 114 patients with HCM, a total of 20 different MYBPC3 mutations (8 novel and 12 known) were identified in 25 patients (21.9%). We demonstrated the power of targeted resequencing in a cohort of HCM patients and found that MYBPC3 is a common HCM-causing gene in Chinese patients. Phenotype-genotype analyses showed that patients with double mutations (n = 2) or premature termination codon mutations (n = 12) had more severe manifestations than patients with missense mutations (n = 11). In particular, we identified a recurrent truncation mutation (p.Y842X) in four unrelated cases (4/25, 16%) who showed severe phenotypes, suggesting that p.Y842X is a frequent mutation in Chinese HCM patients with severe phenotypes.

  4. Information Commons for Rice (IC4R)

    PubMed Central

    2016-01-01

    Rice is the most important staple food for a large part of the world's human population and also a key model organism for plant research. Here, we present Information Commons for Rice (IC4R; http://ic4r.org), a rice knowledgebase featuring an extensible and sustainable architecture that integrates multiple omics data through community-contributed modules. Each module is developed and maintained by a different committed group, deals with data collection, processing and visualization, and delivers data on demand via web services. In the current version, IC4R incorporates a variety of rice data through multiple committed modules, including genome-wide expression profiles derived entirely from RNA-Seq data, genomic variations obtained from re-sequencing data of thousands of rice varieties, plant homologous genes covering multiple diverse plant species, post-translational modifications, rice-related literature and gene annotations contributed by the rice research community. Unlike existing related databases, IC4R is designed for scalability and sustainability and thus also features collaborative integration of rice data and low costs for database update and maintenance. Future directions of IC4R include incorporation of other omics data and association of multiple omics data with agronomically important traits, with the aim of building IC4R into a valuable knowledgebase for both basic and translational research in rice. PMID:26519466

  5. Genome-wide comparison of ultraviolet and ethyl methanesulphonate mutagenesis methods for the brown alga Ectocarpus.

    PubMed

    Godfroy, Olivier; Peters, Akira F; Coelho, Susana M; Cock, J Mark

    2015-12-01

    Ectocarpus has emerged as a model organism for the brown algae and a broad range of genetic and genomic resources are being generated for this species. The aim of the work presented here was to evaluate two mutagenesis protocols based on ultraviolet irradiation and ethyl methanesulphonate treatment using genome resequencing to measure the number, type and distribution of mutations generated by the two methods. Ultraviolet irradiation generated a greater number of genetic lesions than ethyl methanesulphonate treatment, with more than 400 mutations being detected in the genome of the mutagenised individual. This study therefore confirms that the ultraviolet mutagenesis protocol is suitable for approaches that require a high density of mutations, such as saturation mutagenesis or Targeting Induced Local Lesions in Genomes (TILLING). Copyright © 2015 Elsevier B.V. All rights reserved.

  6. The high-quality draft genome of peach (Prunus persica) identifies unique patterns of genetic diversity, domestication and genome evolution.

    PubMed

    Verde, Ignazio; Abbott, Albert G; Scalabrin, Simone; Jung, Sook; Shu, Shengqiang; Marroni, Fabio; Zhebentyayeva, Tatyana; Dettori, Maria Teresa; Grimwood, Jane; Cattonaro, Federica; Zuccolo, Andrea; Rossini, Laura; Jenkins, Jerry; Vendramin, Elisa; Meisel, Lee A; Decroocq, Veronique; Sosinski, Bryon; Prochnik, Simon; Mitros, Therese; Policriti, Alberto; Cipriani, Guido; Dondini, Luca; Ficklin, Stephen; Goodstein, David M; Xuan, Pengfei; Del Fabbro, Cristian; Aramini, Valeria; Copetti, Dario; Gonzalez, Susana; Horner, David S; Falchi, Rachele; Lucas, Susan; Mica, Erica; Maldonado, Jonathan; Lazzari, Barbara; Bielenberg, Douglas; Pirona, Raul; Miculan, Mara; Barakat, Abdelali; Testolin, Raffaele; Stella, Alessandra; Tartarini, Stefano; Tonutti, Pietro; Arús, Pere; Orellana, Ariel; Wells, Christina; Main, Dorrie; Vizzotto, Giannina; Silva, Herman; Salamini, Francesco; Schmutz, Jeremy; Morgante, Michele; Rokhsar, Daniel S

    2013-05-01

    Rosaceae is the most important fruit-producing clade, and its key commercially relevant genera (Fragaria, Rosa, Rubus and Prunus) show broadly diverse growth habits, fruit types and compact diploid genomes. Peach, a diploid Prunus species, is one of the best genetically characterized deciduous trees. Here we describe the high-quality genome sequence of peach obtained from a completely homozygous genotype. We obtained a complete chromosome-scale assembly using Sanger whole-genome shotgun methods. We predicted 27,852 protein-coding genes, as well as noncoding RNAs. We investigated the path of peach domestication through whole-genome resequencing of 14 Prunus accessions. The analyses suggest major genetic bottlenecks that have substantially shaped peach genome diversity. Furthermore, comparative analyses showed that peach has not undergone recent whole-genome duplication, and even though the ancestral triplicated blocks in peach are fragmentary compared to those in grape, all seven paleosets of paralogs from the putative paleoancestor are detectable.

  7. Demographic Divergence History of Pied Flycatcher and Collared Flycatcher Inferred from Whole-Genome Re-sequencing Data

    PubMed Central

    Nadachowska-Brzyska, Krystyna; Burri, Reto; Olason, Pall I.; Kawakami, Takeshi; Smeds, Linnéa; Ellegren, Hans

    2013-01-01

    Profound knowledge of demographic history is a prerequisite for the understanding and inference of processes involved in the evolution of population differentiation and speciation. Together with new coalescent-based methods, the recent availability of genome-wide data enables investigation of differentiation and divergence processes at unprecedented depth. We combined two powerful approaches, full Approximate Bayesian Computation analysis (ABC) and pairwise sequentially Markovian coalescent modeling (PSMC), to reconstruct the demographic history of the split between two avian speciation model species, the pied flycatcher and collared flycatcher. Using whole-genome re-sequencing data from 20 individuals, we investigated 15 demographic models including different levels and patterns of gene flow, and changes in effective population size over time. ABC provided high support for recent (mode 0.3 my, range <0.7 my) species divergence, declines in effective population size of both species since their initial divergence, and unidirectional recent gene flow from pied flycatcher into collared flycatcher. The estimated divergence time and population size changes, supported by PSMC results, suggest that the ancestral species persisted through one of the glacial periods of middle Pleistocene and then split into two large populations that first increased in size before going through severe bottlenecks and expanding into their current ranges. Secondary contact appears to have been established after the last glacial maximum. The severity of the bottlenecks at the last glacial maximum is indicated by the discrepancy between current effective population sizes (20,000–80,000) and census sizes (5–50 million birds) of the two species. The recent divergence time challenges the supposition that avian speciation is a relatively slow process with extended times for intrinsic postzygotic reproductive barriers to evolve. 
Our study emphasizes the importance of using genome-wide data to unravel tangled demographic histories. Moreover, it constitutes one of the first examples of the inference of divergence history from genome-wide data in non-model species. PMID:24244198
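
    Approximate Bayesian Computation (ABC) avoids computing a likelihood. Parameter values are drawn from a prior, data are simulated under each draw, and draws whose simulated summary statistic lies close to the observed value are kept as an approximate posterior sample. The toy rejection sampler below estimates a single divergence-time-like parameter from one summary statistic. The simulator (a Poisson count with mean proportional to the parameter), the uniform prior, the tolerance and the observed value are all hypothetical and bear no relation to the flycatcher models or data.

```python
import math
import random


def poisson_draw(rng, mean):
    """Knuth's multiplication method for a Poisson variate; adequate for small means."""
    limit = math.exp(-mean)
    k, product = 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


def abc_rejection(observed, n_draws=20000, tolerance=1, prior_max=2.0, rate=10.0, seed=7):
    """Return accepted parameter values (an approximate posterior sample).

    Toy model: t ~ Uniform(0, prior_max); statistic ~ Poisson(rate * t).
    A draw is accepted when |simulated - observed| <= tolerance.
    """
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        t = rng.uniform(0.0, prior_max)
        simulated = poisson_draw(rng, rate * t)
        if abs(simulated - observed) <= tolerance:
            accepted.append(t)
    return accepted
```

    With `observed=3`, the accepted values cluster around 0.4, which is the posterior mean expected for this toy model. Shrinking `tolerance` makes the approximation closer to the exact posterior but accepts fewer draws. This accuracy-versus-acceptance trade-off is the central tuning decision in rejection ABC.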

  8. Demographic divergence history of pied flycatcher and collared flycatcher inferred from whole-genome re-sequencing data.

    PubMed

    Nadachowska-Brzyska, Krystyna; Burri, Reto; Olason, Pall I; Kawakami, Takeshi; Smeds, Linnéa; Ellegren, Hans

    2013-11-01

    Profound knowledge of demographic history is a prerequisite for the understanding and inference of processes involved in the evolution of population differentiation and speciation. Together with new coalescent-based methods, the recent availability of genome-wide data enables investigation of differentiation and divergence processes at unprecedented depth. We combined two powerful approaches, full Approximate Bayesian Computation analysis (ABC) and pairwise sequentially Markovian coalescent modeling (PSMC), to reconstruct the demographic history of the split between two avian speciation model species, the pied flycatcher and collared flycatcher. Using whole-genome re-sequencing data from 20 individuals, we investigated 15 demographic models including different levels and patterns of gene flow, and changes in effective population size over time. ABC provided high support for recent (mode 0.3 my, range <0.7 my) species divergence, declines in effective population size of both species since their initial divergence, and unidirectional recent gene flow from pied flycatcher into collared flycatcher. The estimated divergence time and population size changes, supported by PSMC results, suggest that the ancestral species persisted through one of the glacial periods of middle Pleistocene and then split into two large populations that first increased in size before going through severe bottlenecks and expanding into their current ranges. Secondary contact appears to have been established after the last glacial maximum. The severity of the bottlenecks at the last glacial maximum is indicated by the discrepancy between current effective population sizes (20,000-80,000) and census sizes (5-50 million birds) of the two species. The recent divergence time challenges the supposition that avian speciation is a relatively slow process with extended times for intrinsic postzygotic reproductive barriers to evolve. 
Our study emphasizes the importance of using genome-wide data to unravel tangled demographic histories. Moreover, it constitutes one of the first examples of the inference of divergence history from genome-wide data in non-model species.

  9. Positional bias in variant calls against draft reference assemblies.

    PubMed

    Briskine, Roman V; Shimizu, Kentaro K

    2017-03-28

    Whole genome resequencing projects may implement variant calling using draft reference genomes assembled de novo from short-read libraries. Despite the lower quality of such assemblies, they have allowed researchers to extend a wide range of population genetic and genome-wide association analyses to non-model species. As variant calling pipelines are complex and involve many software packages, it is important to understand the inherent biases and limitations at each step of the analysis. In this article, we report a positional bias present in variant calling performed against draft reference assemblies constructed from de Bruijn or string overlap graphs. We assessed how frequently variants appeared at each position counted from the ends of a contig or scaffold sequence, and discovered an unexpectedly high number of variants at positions related to the length of either the k-mers or the reads used for the assembly. We detected the bias both in publicly available draft assemblies from the Assemblathon 2 competition and in the assemblies we generated from our simulated short-read data. Simulations confirmed that the bias-causing variants are predominantly false positives induced by reads from spatially distant repeated sequences. The bias is particularly strong in contig assemblies. Scaffolding does not eliminate the bias but tends to mitigate it because of changes in the variants' relative positions and alterations in read alignments. The bias can be effectively reduced by filtering out variants that reside in repetitive elements. Draft genome sequences generated by several popular assemblers appear to be susceptible to the positional bias, potentially affecting many resequencing projects in non-model species. The bias is inherent to the assembly algorithms and arises from their particular handling of repeated sequences. Filtering is recommended to reduce the bias, especially if a higher-quality genome assembly cannot be achieved.
Our findings can help other researchers to improve the quality of their variant data sets and reduce artefactual findings in downstream analyses.
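
The positional analysis and the repeat filter described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: it assumes variants are given as (contig, position) pairs with 1-based positions, that contig lengths are known, and that repeats are supplied as hypothetical (contig, start, end) intervals, for example from a repeat-masking run. The toy data are invented.

```python
from collections import Counter

def distance_from_end(pos, contig_len):
    """Distance (1-based) of a variant from the nearer end of its contig."""
    return min(pos, contig_len - pos + 1)

def positional_profile(variants, contig_lengths):
    """Count variants at each distance from the nearest contig end."""
    counts = Counter()
    for contig, pos in variants:
        counts[distance_from_end(pos, contig_lengths[contig])] += 1
    return counts

def in_repeat(contig, pos, repeats):
    """True if pos lies inside any [start, end] repeat interval (1-based, inclusive)."""
    return any(c == contig and start <= pos <= end for c, start, end in repeats)

def filter_repeats(variants, repeats):
    """Drop variants that fall inside annotated repeats."""
    return [(c, p) for c, p in variants if not in_repeat(c, p, repeats)]

# Hypothetical toy data: two contigs, variants 31 bp from an end, one repeat per contig.
contig_lengths = {"ctg1": 1000, "ctg2": 500}
variants = [("ctg1", 31), ("ctg1", 970), ("ctg2", 31), ("ctg2", 250)]
repeats = [("ctg1", 1, 40), ("ctg2", 20, 40)]

profile = positional_profile(variants, contig_lengths)
print(profile[31])                        # 3 variants sit 31 bp from a contig end
print(filter_repeats(variants, repeats))  # [('ctg1', 970), ('ctg2', 250)]
```

In a real data set, a spike in the profile at distances equal to the k-mer size or read length would be the signature of the bias reported above; the linear scan over repeat intervals would need an interval index for genome-scale inputs.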

  10. Pharmacogenetics of human 3'-phosphoadenosine 5'-phosphosulfate synthetase 1 (PAPSS1): gene resequencing, sequence variation, and functional genomics.

    PubMed

    Xu, Zhen-Hua; Thomae, Bianca A; Eckloff, Bruce W; Wieben, Eric D; Weinshilboum, Richard M

    2003-06-01

    3'-Phosphoadenosine 5'-phosphosulfate (PAPS) is the high-energy "sulfate donor" for reactions catalyzed by sulfotransferase (SULT) enzymes. The strict requirement of SULTs for PAPS suggests that PAPS synthesis might influence the rate of sulfate conjugation. In humans, PAPS is synthesized from ATP and SO₄²⁻ by two isoforms of PAPS synthetase (PAPSS): PAPSS1 and PAPSS2. As a step toward pharmacogenetic studies, we have resequenced the entire coding sequence of the human PAPSS1 gene, including exon-intron splice junctions, using DNA samples from 60 Caucasian-American and 58 African-American subjects. Twenty-one genetic polymorphisms were observed (1 insertion-deletion event and 20 single nucleotide polymorphisms, SNPs), including two non-synonymous coding SNPs (cSNPs) that resulted in the amino acid changes Arg333Cys and Glu531Gln. Twelve pairs of these polymorphisms were tightly linked, and a total of twelve unequivocal haplotypes could be identified: two that were common to both ethnic groups and ten that were ethnic-specific. The Arg333Cys polymorphism, with an allele frequency of 2.5%, was observed only in DNA samples from Caucasian subjects. The Glu531Gln polymorphism was rare, with only a single copy of that allele in a DNA sample from an African-American subject. Transient expression in mammalian cells showed that neither of the non-synonymous cSNPs changed the basal level of enzyme activity measured under optimal assay conditions. However, the Glu531Gln polymorphism altered the substrate kinetic properties of the enzyme. The Gln531 variant allozyme had a 5-fold higher Km value for SO₄²⁻ than did the wild-type allozyme and displayed monophasic kinetics for Na₂SO₄, whereas the wild-type allozyme (Glu531) showed biphasic kinetics for that substrate. These observations represent a step toward testing the hypothesis that genetic variation in PAPS synthesis catalyzed by PAPSS1 might alter in vivo sulfate conjugation.
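
The allele frequencies quoted above follow from counting alleles over diploid chromosomes. The sketch below assumes every subject was successfully genotyped. Under that assumption, the 2.5% figure for Arg333Cys corresponds to 3 of 120 Caucasian-American chromosomes; that count is an inference for illustration, not a number stated in the abstract.

```python
def allele_frequency(allele_copies, n_subjects, ploidy=2):
    """Fraction of sampled chromosomes that carry the allele."""
    return allele_copies / (n_subjects * ploidy)

# Arg333Cys: assumed 3 copies among 60 Caucasian-American subjects (120 chromosomes).
arg333cys = allele_frequency(3, 60)
# Glu531Gln: a single copy among 58 African-American subjects (116 chromosomes).
glu531gln = allele_frequency(1, 58)

print(f"Arg333Cys: {arg333cys:.1%}")  # 2.5%
print(f"Glu531Gln: {glu531gln:.2%}")  # 0.86%
```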

  11. A novel ultra high-throughput 16S rRNA gene amplicon sequencing library preparation method for the Illumina HiSeq platform.

    PubMed

    de Muinck, Eric J; Trosvik, Pål; Gilfillan, Gregor D; Hov, Johannes R; Sundaram, Arvind Y M

    2017-07-06

    Advances in sequencing technologies and bioinformatics have made the analysis of microbial communities almost routine. Nonetheless, there remains a need to improve the techniques used for gathering such data, including increasing throughput while lowering cost, and to benchmark these techniques so that potential sources of bias can be better characterized. We present a triple-index amplicon sequencing strategy to sequence large numbers of samples at significantly lower cost and in a shorter timeframe compared to existing methods. The design employs a two-stage PCR protocol, incorporating three barcodes into each sample, with the possibility of adding a fourth index. It also includes heterogeneity spacers to overcome low-complexity issues faced when sequencing amplicons on Illumina platforms. The library preparation method was extensively benchmarked through analysis of a mock community in order to assess biases introduced by sample indexing, number of PCR cycles, and template concentration. We further evaluated the method through re-sequencing of a standardized environmental sample. Finally, we evaluated our protocol on a set of fecal samples from a small cohort of healthy adults, demonstrating good performance in a realistic experimental setting. Between-sample variation was mainly related to batch effects, such as DNA extraction, while sample indexing was also a significant source of bias. PCR cycle number strongly influenced chimera formation and affected relative abundance estimates of species with high GC content. Libraries were sequenced on the Illumina HiSeq and MiSeq platforms to demonstrate that this protocol is highly scalable, allowing thousands of samples to be sequenced at a very low cost. Here, we provide the most comprehensive study to date of the performance and bias inherent to a 16S rRNA gene amplicon sequencing method. 
Triple-indexing greatly reduces the number of long custom DNA oligos required for library preparation, while the inclusion of variable length heterogeneity spacers minimizes the need for PhiX spike-in. This design results in a significant cost reduction of highly multiplexed amplicon sequencing. The biases we characterize highlight the need for highly standardized protocols. Reassuringly, we find that the biological signal is a far stronger structuring factor than the various sources of bias.
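
The cost argument for triple indexing is combinatorial: the number of distinguishable samples grows as the product of the index set sizes, while the number of index oligos grows only as their sum. The sketch below is a simplified model, assuming each index sequence sits on its own oligo and that every three-way combination marks one sample. The set sizes are hypothetical and are not taken from the paper.

```python
def index_budget(n_first, n_second, n_third):
    """Return (distinguishable samples, index oligos) for three index sets.

    Simplifying assumption: each index sequence is carried on its own oligo,
    and every combination of one index from each set identifies one sample.
    """
    samples = n_first * n_second * n_third
    oligos = n_first + n_second + n_third
    return samples, oligos

# Hypothetical index set sizes.
samples, oligos = index_budget(8, 12, 16)
print(samples, oligos)  # 1536 36
```

Under these assumptions, 36 index oligos would label 1,536 samples. A design with one unique index oligo per sample would need 1,536 oligos for the same plex.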

  12. A variant reference data set for the Africanized honeybee, Apis mellifera

    PubMed Central

    Kadri, Samir M.; Harpur, Brock A.; Orsi, Ricardo O.; Zayed, Amro

    2016-01-01

    The Africanized honeybee (AHB) is a population of Apis mellifera found in the Americas. AHBs originated in 1956 in Rio Claro, Brazil, where imported African A. m. scutellata escaped and hybridized with local populations of European A. mellifera. Africanized populations can now be found from Northern Argentina to the Southern United States. AHBs, often referred to as ‘Killer Bees’, are a major concern to the beekeeping industry as well as a model for the evolutionary genetics of colony defence. We performed high-coverage pooled resequencing of 360 diploid workers from 30 Brazilian AHB colonies using Illumina HiSeq (150 bp PE). This yielded a high-density SNP data set with an average read depth of 20.25 reads per site. With 3,606,720 SNPs, including 155,336 SNPs within 11,365 genes, this data set is the largest genomic resource available for AHBs and will enable high-resolution studies of the population dynamics, evolution, and genetics of this successful biological invader, in addition to facilitating the development of SNP-based tools for identifying AHBs. PMID:27824336

  13. Research progress of plant population genomics based on high-throughput sequencing.

    PubMed

    Wang, Yun-sheng

    2016-08-01

    Population genomics, a new paradigm for population genetics, combines the concepts and techniques of genomics with the theoretical framework of population genetics, and improves our understanding of microevolution by identifying site-specific and genome-wide effects through genome-wide genotyping of polymorphic sites. With the advent and improvement of next-generation high-throughput sequencing technology, the number of plant species with complete genome sequences has increased rapidly, and large-scale resequencing has also been carried out in recent years. Parallel sequencing has also been performed in some plant species without complete genome sequences. These studies have greatly promoted the development of population genomics and deepened our understanding of genetic diversity, levels of linkage disequilibrium, selection effects, demographic history, and the molecular mechanisms of complex traits in the relevant plant populations at the genomic level. In this review, I briefly introduce the concept and research methods of population genomics and summarize research progress in plant population genomics based on high-throughput sequencing. I also discuss the prospects and existing problems of plant population genomics in order to provide references for related studies.
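
Linkage disequilibrium, one of the quantities listed above, is commonly summarized as r² between pairs of sites. The following is a minimal sketch that assumes phased haplotypes with alleles coded 0/1 at two biallelic loci. Unphased genotype data would first need haplotype frequency estimation, for example by EM, which is not shown. The haplotypes are hypothetical.

```python
def ld_r2(haplotypes):
    """r^2 between two biallelic loci from phased haplotypes.

    Each haplotype is an (allele_at_A, allele_at_B) pair coded 0/1.
    D = p_AB - p_A * p_B and r^2 = D^2 / (p_A (1 - p_A) p_B (1 - p_B)).
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d * d / denom

# Hypothetical haplotypes: complete association, then no association.
print(ld_r2([(0, 0), (0, 0), (1, 1), (1, 1)]))  # 1.0
print(ld_r2([(0, 0), (0, 1), (1, 0), (1, 1)]))  # 0.0
```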

  14. A variant reference data set for the Africanized honeybee, Apis mellifera.

    PubMed

    Kadri, Samir M; Harpur, Brock A; Orsi, Ricardo O; Zayed, Amro

    2016-11-08

    The Africanized honeybee (AHB) is a population of Apis mellifera found in the Americas. AHBs originated in 1956 in Rio Claro, Brazil, where imported African A. m. scutellata escaped and hybridized with local populations of European A. mellifera. Africanized populations can now be found from Northern Argentina to the Southern United States. AHBs, often referred to as 'Killer Bees', are a major concern to the beekeeping industry as well as a model for the evolutionary genetics of colony defence. We performed high-coverage pooled resequencing of 360 diploid workers from 30 Brazilian AHB colonies using Illumina HiSeq (150 bp PE). This yielded a high-density SNP data set with an average read depth of 20.25 reads per site. With 3,606,720 SNPs, including 155,336 SNPs within 11,365 genes, this data set is the largest genomic resource available for AHBs and will enable high-resolution studies of the population dynamics, evolution, and genetics of this successful biological invader, in addition to facilitating the development of SNP-based tools for identifying AHBs.

  15. Genetic responses to seasonal variation in altitudinal stress: whole-genome resequencing of great tit in eastern Himalayas

    PubMed Central

    Qu, Yanhua; Tian, Shilin; Han, Naijian; Zhao, Hongwei; Gao, Bin; Fu, Jun; Cheng, Yalin; Song, Gang; Ericson, Per G. P.; Zhang, Yong E.; Wang, Dawei; Quan, Qing; Jiang, Zhi; Li, Ruiqiang; Lei, Fumin

    2015-01-01

    Species that undertake altitudinal migrations are exposed to considerable seasonal variation in oxygen levels and temperature. How they cope with this was studied in a population of great tit (Parus major) that breeds at high elevations and winters at lower elevations in the eastern Himalayas. Comparing the population genomics of high-altitude great tits with that of lowland great tits revealed accelerated genetic selection in pathways for carbohydrate energy metabolism (amino sugar and nucleotide sugar metabolism, and insulin signaling) and hypoxia response (PI3K-Akt, mTOR and MAPK signaling) in the high-altitude population. The PI3K-Akt, mTOR and MAPK pathways modulate expression of the hypoxia-inducible factor HIF-1α and of VEGF protein, and thus indirectly regulate hypoxia-induced angiogenesis, erythropoiesis and vasodilatation. The strategies observed in high-altitude great tits differ from those described in a closely related sedentary species on the Tibetan Plateau, the ground tit (Parus humilis), which shows enhanced selection in lipid-specific metabolic pathways and the hypoxia-inducible factor (HIF-1) pathway. Comparative population genomics also revealed selection for larger body size in high-altitude great tits. PMID:26404527

  16. TE-Tracker: systematic identification of transposition events through whole-genome resequencing.

    PubMed

    Gilly, Arthur; Etcheverry, Mathilde; Madoui, Mohammed-Amin; Guy, Julie; Quadrana, Leandro; Alberti, Adriana; Martin, Antoine; Heitkam, Tony; Engelen, Stefan; Labadie, Karine; Le Pen, Jeremie; Wincker, Patrick; Colot, Vincent; Aury, Jean-Marc

    2014-11-19

    Transposable elements (TEs) are DNA sequences that are able to move from their location in the genome by cutting or copying themselves to another locus. As such, they are increasingly recognized as affecting all aspects of genome function. With the dramatic reduction in the cost of DNA sequencing, it is now possible to resequence whole genomes in order to systematically characterize novel TE mobilization in a particular individual. However, this task is made difficult by the inherently repetitive nature of TE sequences, which in some eukaryotes make up over half of the genome sequence. Currently, only a few software tools dedicated to the detection of TE mobilization using next-generation sequencing are described in the literature. They often target specific TEs for which annotation is available, and are only able to identify families of closely related TEs, rather than individual elements. We present TE-Tracker, a general and accurate computational method for the de novo detection of germline TE mobilization from re-sequenced genomes, as well as the identification of both their source and destination sequences. We compare our method with the two classes of existing software: specialized TE-detection tools and generic structural variant (SV) detection tools. We show that TE-Tracker, while working independently of any prior annotation, bridges the gap between these two approaches in terms of detection power. Indeed, its positive predictive value (PPV) is comparable to that of dedicated TE software, while its sensitivity is typical of a generic SV detection tool. TE-Tracker demonstrates the benefit of adopting an annotation-independent, de novo approach for the detection of TE mobilization events. We use TE-Tracker to provide a comprehensive view of transposition events induced by loss of DNA methylation in Arabidopsis. TE-Tracker is freely available at http://www.genoscope.cns.fr/TE-Tracker . 
We show that TE-Tracker accurately detects both the source and destination of novel transposition events in re-sequenced genomes. Moreover, TE-Tracker is able to detect all potential donor sequences for a given insertion, and can identify the correct one among them. Furthermore, TE-Tracker produces significantly fewer false positives than common SV detection programs, thus greatly facilitating the detection and analysis of TE mobilization events.
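
Benchmarks such as the PPV and sensitivity comparison above require deciding when a predicted insertion matches a known one. The sketch below is a hypothetical evaluation helper, not part of TE-Tracker. It counts a call as correct if the call falls within an assumed 100 bp window of a simulated insertion site, matches greedily, and ignores the donor (source) sequence. The positions are invented.

```python
def match_calls(predicted, truth, window=100):
    """Greedily match predicted insertion sites to true sites; return (PPV, sensitivity).

    A prediction is a true positive if it lies within `window` bp of a
    not-yet-matched true site on the same chromosome. Each true site is
    matched at most once.
    """
    unmatched = list(truth)
    tp = 0
    for chrom, pos in predicted:
        for i, (t_chrom, t_pos) in enumerate(unmatched):
            if chrom == t_chrom and abs(pos - t_pos) <= window:
                tp += 1
                del unmatched[i]  # safe: we stop iterating right after deleting
                break
    fp = len(predicted) - tp
    fn = len(unmatched)
    ppv = tp / (tp + fp) if predicted else 0.0
    sensitivity = tp / (tp + fn) if truth else 0.0
    return ppv, sensitivity

# Hypothetical simulated insertions and calls.
truth = [("chr1", 1_000), ("chr1", 50_000), ("chr2", 7_500)]
calls = [("chr1", 1_040), ("chr2", 7_480), ("chr3", 300)]
ppv, sens = match_calls(calls, truth)
print(f"PPV={ppv:.2f} sensitivity={sens:.2f}")  # PPV=0.67 sensitivity=0.67
```

Greedy matching is the simplest choice. When many calls cluster near several true sites, an optimal one-to-one assignment would be more robust.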

  17. Microarray-based Resequencing of Multiple Bacillus anthracis Isolates

    DTIC Science & Technology

    2004-12-17

    generated an Unweighted Pair Group Method with Arithmetic Mean (UPGMA) tree (see methods [56]; Figure 3). The strains group together in a manner broadly similar ... was created using DNADIST, plotted as a UPGMA tree using NEIGHBOR and the tree plotted using DRAWGRAM [56]. The B1 strain A0465 was used as an ... distance matrix was created using DNADIST, plotted as a UPGMA tree using NEIGHBOR and the tree plotted using DRAWGRAM [57]. Additional data files The
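
The snippet above builds a UPGMA tree from a DNADIST distance matrix with the PHYLIP program NEIGHBOR. The following is a minimal, unoptimized Python sketch of the UPGMA algorithm itself. It is not the PHYLIP implementation. It uses average linkage and sets each node's height to half the merge distance, and the four-taxon distance matrix is hypothetical.

```python
def upgma(names, dist):
    """Cluster taxa with UPGMA and return a Newick string with branch lengths.

    `dist` is a symmetric matrix (list of lists) of pairwise distances.
    Ties are broken by dictionary iteration order, i.e. arbitrarily.
    """
    # Active clusters: id -> (Newick label, number of leaves, node height)
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(names)}
    # Distances between active clusters, keyed by (smaller id, larger id)
    d = {(i, j): dist[i][j] for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        (i, j), dij = min(d.items(), key=lambda item: item[1])
        label_i, size_i, h_i = clusters.pop(i)
        label_j, size_j, h_j = clusters.pop(j)
        height = dij / 2  # ultrametric: both subtrees end at the same depth
        label = f"({label_i}:{height - h_i:g},{label_j}:{height - h_j:g})"
        # Average linkage: distance to the merged cluster is the size-weighted mean
        updated = {
            k: (d[(min(i, k), max(i, k))] * size_i
                + d[(min(j, k), max(j, k))] * size_j) / (size_i + size_j)
            for k in clusters
        }
        d = {pair: v for pair, v in d.items() if i not in pair and j not in pair}
        for k, v in updated.items():
            d[(k, next_id)] = v  # next_id is always the largest id
        clusters[next_id] = (label, size_i + size_j, height)
        next_id += 1
    label = next(iter(clusters.values()))[0]
    return label + ";"

# Hypothetical ultrametric distances between four strains.
names = ["A", "B", "C", "D"]
dist = [
    [0, 2, 6, 10],
    [2, 0, 6, 10],
    [6, 6, 0, 10],
    [10, 10, 10, 0],
]
print(upgma(names, dist))  # (D:5,(C:3,(A:1,B:1):2):2);
```

Because UPGMA assumes a constant rate of change, it is suited to roughly clock-like data. Neighbor-joining, the other method NEIGHBOR offers, drops that assumption.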

  18. Genomic Analyses Reveal Demographic History and Temperate Adaptation of the Newly Discovered Honey Bee Subspecies Apis mellifera sinisxinyuan n. ssp

    PubMed Central

    Chen, Chao; Liu, Zhiguang; Pan, Qi; Chen, Xiao; Wang, Huihua; Guo, Haikun; Liu, Shidong; Lu, Hongfeng; Tian, Shilin; Li, Ruiqiang; Shi, Wei

    2016-01-01

    Studying the genetic signatures of climate-driven selection can produce insights into local adaptation and the potential impacts of climate change on populations. The honey bee (Apis mellifera) is an interesting species for studying local adaptation because it originated in tropical/subtropical climatic regions and subsequently spread into temperate regions. However, little is known about the genetic basis of its adaptation to temperate climates. Here, we resequenced the whole genomes of ten individual bees from a newly discovered population in temperate China and downloaded published resequencing data for 35 individuals from other populations. We found that the new population is an undescribed subspecies in the M-lineage of A. mellifera (Apis mellifera sinisxinyuan). Analyses of population history show that long-term global temperature has strongly influenced the demographic history of A. m. sinisxinyuan and its divergence from other subspecies. Further analyses comparing temperate and tropical populations identified several candidate genes related to the fat body and the Hippo signaling pathway that are potentially involved in adaptation to temperate climates. Our results provide insights into the demographic history of the newly discovered A. m. sinisxinyuan, as well as the genetic basis of adaptation of A. mellifera to temperate climates at the genomic level. These findings will facilitate the selective breeding of A. mellifera to improve the survival of overwintering colonies. PMID:26823447

  19. Amplicon Resequencing Identified Parental Mosaicism for Approximately 10% of “de novo” SCN1A Mutations in Children with Dravet Syndrome

    PubMed Central

    Xu, Xiaojing; Yang, Xiaoxu; Wu, Qixi; Liu, Aijie; Yang, Xiaoling; Ye, Adam Yongxin; Huang, August Yue; Li, Jiarui; Wang, Meng; Yu, Zhe; Wang, Sheng; Zhang, Zhichao; Wu, Xiru

    2015-01-01

    The majority of Dravet syndrome (DS) cases in children are caused by de novo SCN1A mutations. To investigate the origin of these mutations, we developed and applied a new method that combines deep amplicon resequencing with a Bayesian model to detect and quantify allelic fractions with improved sensitivity. Of 174 SCN1A mutations in DS probands that had been considered “de novo” by Sanger sequencing, we identified 15 cases (8.6%) of parental mosaicism. We identified another five cases of parental mosaicism that were also detectable by Sanger sequencing. The fraction of mutant alleles in the 20 cases of parental mosaicism ranged from 1.1% to 32.6%. Thirteen (65% of 20) mutations originated paternally and seven (35% of 20) maternally. Twelve (60% of 20) mosaic parents did not have any epileptic symptoms; their mutant allelic fractions were significantly lower than those in mosaic parents with epileptic symptoms (P = 0.016). We identified mosaicism with varied allelic fractions in blood, saliva, urine, hair follicle, oral epithelium, and semen, demonstrating that postzygotic mutations can affect multiple somatic cell types as well as germ cells. Our results suggest that more sensitive tools for detecting low-level mosaicism in parents of families with seemingly “de novo” mutations will allow better-informed genetic counseling. PMID:26096185
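
The abstract does not specify its Bayesian model, so the following only illustrates the general idea under an assumed model. It uses a uniform Beta(1, 1) prior on the mutant allele fraction, binomial sampling of reads, and a hypothetical 0.5% background error rate as the threshold for calling mosaicism. The posterior is then Beta(1 + alt, 1 + ref), and the tail probability is integrated numerically with the standard library only. The read counts are invented.

```python
import math

def beta_pdf(x, a, b):
    """Density of the Beta(a, b) distribution at 0 < x < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log1p(-x))

def mosaic_posterior(alt_reads, depth, error_rate=0.005, prior=(1.0, 1.0), steps=20_000):
    """Posterior mean of the mutant allele fraction and P(fraction > error_rate).

    Assumes binomial read sampling and a Beta prior, so the posterior is
    Beta(prior_a + alt_reads, prior_b + depth - alt_reads).
    """
    a = prior[0] + alt_reads
    b = prior[1] + depth - alt_reads
    mean = a / (a + b)
    # Midpoint-rule integration of the posterior density over (error_rate, 1)
    width = (1.0 - error_rate) / steps
    tail = width * sum(beta_pdf(error_rate + (k + 0.5) * width, a, b) for k in range(steps))
    return mean, tail

# Hypothetical parental sample: 45 mutant reads out of 2,000 at the SCN1A site.
mean, p_above = mosaic_posterior(45, 2000)
print(f"posterior mean fraction {mean:.4f}, P(fraction > 0.5%) = {p_above:.4f}")
```

A per-site error rate estimated from control samples would be a more realistic threshold than the fixed value assumed here.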

  20. Using genic sequence capture in combination with a syntenic pseudo genome to map a deletion mutant in a wheat species.

    PubMed

    Gardiner, Laura-Jayne; Gawroński, Piotr; Olohan, Lisa; Schnurbusch, Thorsten; Hall, Neil; Hall, Anthony

    2014-12-01

    Mapping-by-sequencing analyses have largely required a complete reference sequence and employed whole-genome re-sequencing. In species such as wheat, no finished genome reference sequence is available. Additionally, because of its large genome size (17 Gb), re-sequencing at sufficient depth of coverage is not practical. Here, we extend the utility of mapping by sequencing, developing a bespoke pipeline and algorithm to map an early-flowering locus in einkorn wheat (Triticum monococcum L.), which is closely related to the A-genome progenitor of bread wheat. We have developed a genomic enrichment approach that uses the gene-rich regions of hexaploid bread wheat to design a 110-Mbp NimbleGen SeqCap EZ in-solution capture probe set, representing the majority of genes in wheat. Here, we use the capture probe set to enrich and sequence an F2 mapping population of the mutant. The mutant locus was identified in T. monococcum, which lacks a complete genome reference sequence, by mapping the enriched data set onto pseudo-chromosomes derived from the capture probe target sequence, with a long-range order of genes based on the synteny of wheat with Brachypodium distachyon. Using this approach, we were able to map the region and identify a set of deleted genes within the interval. © 2014 The Authors. The Plant Journal published by Society for Experimental Biology and John Wiley & Sons Ltd.
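
In a typical mapping-by-sequencing design, the causal locus appears as a region where reads from mutant individuals carry the mutant-parent allele at a frequency close to 1. The paper's bespoke algorithm is not described in the abstract, so what follows is a generic sliding-window sketch. It assumes pooled reads from F2 mutants and SNP sites already ordered along a syntenic pseudo-chromosome, and the counts are hypothetical. A deletion would also show up as genes with no read coverage in the mutants, which this sketch does not model.

```python
def sliding_window_af(sites, window=5):
    """Mean mutant-allele frequency over windows of consecutive SNP sites.

    `sites` is a list of (position, mutant_reads, total_reads) ordered along a
    pseudo-chromosome. Returns (first_pos, last_pos, mean_af) for each window.
    """
    freqs = [(pos, alt / total) for pos, alt, total in sites if total > 0]
    out = []
    for i in range(len(freqs) - window + 1):
        chunk = freqs[i:i + window]
        mean_af = sum(f for _, f in chunk) / window
        out.append((chunk[0][0], chunk[-1][0], mean_af))
    return out

def candidate_interval(sites, window=5):
    """Window with the highest mean mutant-allele frequency (needs >= window sites)."""
    return max(sliding_window_af(sites, window), key=lambda w: w[2])

# Hypothetical (ordered position, mutant reads, total reads) along one pseudo-chromosome.
sites = [(1, 10, 20), (2, 11, 20), (3, 9, 20), (4, 12, 20), (5, 15, 20),
         (6, 20, 20), (7, 20, 20), (8, 19, 20), (9, 14, 20), (10, 10, 20)]
print(candidate_interval(sites, window=3))  # window spanning positions 6 to 8
```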

  1. The wolf reference genome sequence (Canis lupus lupus) and its implications for Canis spp. population genomics.

    PubMed

    Gopalakrishnan, Shyam; Samaniego Castruita, Jose A; Sinding, Mikkel-Holger S; Kuderna, Lukas F K; Räikkönen, Jannikke; Petersen, Bent; Sicheritz-Ponten, Thomas; Larson, Greger; Orlando, Ludovic; Marques-Bonet, Tomas; Hansen, Anders J; Dalén, Love; Gilbert, M Thomas P

    2017-06-29

    An increasing number of studies are addressing the evolutionary genomics of dog domestication, principally through resequencing dog, wolf and related canid genomes. There is, however, only one de novo assembled canid genome currently available against which to map such data: that of a boxer dog (Canis lupus familiaris). We generated the first de novo wolf genome (Canis lupus lupus) as an additional choice of reference, and explored what implications may arise when previously published dog and wolf resequencing data are remapped to this reference. Reassuringly, we find that regardless of the choice of reference genome, most evolutionary genomic analyses yield qualitatively similar results, including those exploring the structure between wolves and dogs using admixture and principal component analysis. However, we do observe differences in the genomic coverage of re-mapped samples, the number of variants discovered, and the heterozygosity estimates of the samples. In conclusion, the choice of reference is dictated by the aims of the study being undertaken: if the study focuses on differences between dog breeds or the fine structure among dogs, then using the boxer reference genome is appropriate, but if the aim of the study is to look at the variation within wolves and their relationships to dogs, then there are clear benefits to using the de novo assembled wolf reference genome.

  2. Genetic evidence of multiple loci in dystocia - difficult labour

    PubMed Central

    2010-01-01

    Background Dystocia, difficult labour, is a common but also complex problem during childbirth. It can be attributed to weak contractions of the uterus, a large infant, reduced capacity of the pelvis, or combinations of these. Previous studies have indicated that there is a genetic component in susceptibility to dystocia. The purpose of this study was to identify susceptibility genes in dystocia. Methods A total of 104 women from 47 families in which at least two sisters had undergone caesarean section at a gestational length of 286 days or more at their first delivery were included. A review of medical records and a telephone interview were performed to identify subjects with dystocia. Whole-genome scanning using Affymetrix genotyping arrays and non-parametric linkage (NPL) analysis was performed in 39 women from 19 families who exhibited the dystocia phenotype. Candidate genes showing suggestive linkage, oxytocin (OXT) on chromosome 20 and the oxytocin receptor (OXTR) on chromosome 3, were re-sequenced in 68 women. Results We found a trend towards linkage, with a suggestive NPL score (3.15), on chromosome 12p12. Suggestive linkage peaks were also observed on chromosomes 3, 4, 6, 10 and 20. Re-sequencing of OXT and OXTR did not reveal any causal variants. Conclusions Dystocia is likely to have a genetic component, with variations in multiple genes affecting the patient outcome. We found 6 loci that could be re-evaluated in larger patient cohorts. PMID:20587075

  3. Revalidation and genetic characterization of new members of Group C (Orthobunyavirus genus, Peribunyaviridae family) isolated in the Americas.

    PubMed

    Nunes, Márcio Roberto Teixeira; de Souza, William Marciel; Acrani, Gustavo Olszanski; Cardoso, Jedson Ferreira; da Silva, Sandro Patroca; Badra, Soraya Jabur; Figueiredo, Luiz Tadeu Moraes; Vasconcelos, Pedro Fernando da Costa

    2018-01-01

    The Group C serogroup includes members of the Orthobunyavirus genus (family Peribunyaviridae) and comprises 15 arboviruses that can be associated with febrile illness in humans. Although previous studies have characterized the genomes of some Group C orthobunyaviruses, genomic information about the other viruses in this group is lacking. Therefore, in this study, complete genomes of members of the Group C serogroup were sequenced or re-sequenced and used for genetic characterization, as well as to understand their phylogenetic and evolutionary aspects. Our study reports the genomes of three new members of the Group C viruses (Apeu strain BeAn848, Itaqui strain BeAn12797 and Nepuyo strain BeAn10709), as well as re-sequencing of the original strains of five members: Caraparu (strain BeAn3994), Madrid (strain BT4075), Murucutu (strain BeAn974), Oriboca (strain BeAn17), and Marituba (strain BeAn15). These viruses showed the genomic organization typical of members of the Orthobunyavirus genus. Interestingly, all viruses of this serogroup showed an open reading frame (ORF) encoding the putative nonstructural NSs protein that precedes the nucleoprotein ORF, a feature not previously reported for Group C viruses. We also confirmed the presence of natural reassortment events. This study expands the genomic information on Group C viruses and revalidates the genomic organization of viruses that were previously reported.
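
Finding an additional ORF such as the putative NSs ORF means scanning every reading frame, not only the frame of the known nucleoprotein ORF. The following is a minimal forward-strand ORF scan, assuming the standard genetic code, ATG starts, and a toy sequence invented for the example. Real annotation would also consider the complementary strand, start-codon context, and length thresholds far larger than the toy minimum used here.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=3):
    """Find ATG-initiated ORFs in the three forward frames of a sense-strand sequence.

    Returns (frame, start, end) with 0-based start and exclusive end; the end
    includes the stop codon. ATGs nested inside an open ORF are not reported.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None
    return orfs

# Hypothetical toy sequence with overlapping ORFs in frames 0 and 1.
print(find_orfs("ATGCATGGCCTGACCTAA"))  # [(0, 0, 18), (1, 4, 13)]
```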

  4. Empirical Validation of Pooled Whole Genome Population Re-Sequencing in Drosophila melanogaster

    PubMed Central

    Zhu, Yuan; Bergland, Alan O.; González, Josefa; Petrov, Dmitri A.

    2012-01-01

    The sequencing of pooled non-barcoded individuals is an inexpensive and efficient means of assessing genome-wide population allele frequencies, yet its accuracy has not been thoroughly tested. We assessed the accuracy of this approach on whole, complex eukaryotic genomes by resequencing pools of largely isogenic, individually sequenced Drosophila melanogaster strains. We called SNPs in the pooled data and estimated false positive and false negative rates using the SNPs called in the individual strains as a reference. We also estimated the allele frequencies of the SNPs from the “pooled” data and compared them with the “true” frequencies taken from the estimates in the individual strains. We demonstrate that pooled sequencing provides a faithful estimate of population allele frequency, with the error well approximated by binomial sampling, and is a reliable means of novel SNP discovery with low false positive rates. However, a sufficient number of strains should be used in the pooling, because variation in the amount of DNA derived from individual strains is a substantial source of noise when the number of pooled strains is low. Our results and analysis confirm that pooled sequencing is a very powerful and cost-effective technique for assessing patterns of sequence variation in populations on genome-wide scales, and is applicable to any dataset where sequencing individuals or individual cells is impossible, difficult, time consuming, or expensive. PMID:22848651
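
The two error sources named above, binomial read sampling and unequal DNA contributions from a small number of strains, can be explored with a small simulation. This is an illustrative model only. It assumes each strain is effectively homozygous, because the strains are described as largely isogenic. DNA amounts are drawn from a normal distribution with a chosen coefficient of variation, and reads are sampled independently. All parameter values are arbitrary.

```python
import random
import statistics

def pooled_af_error_sd(true_af, n_strains, depth, dna_cv=0.0, reps=2000, seed=1):
    """SD of pooled allele-frequency estimates around the frequency among pooled strains.

    Each strain is assumed homozygous (reference or alternative). Its DNA
    contribution is drawn from Normal(1, dna_cv), so dna_cv=0 means perfectly
    equal pooling, and `depth` reads are then sampled from the weighted pool.
    """
    rng = random.Random(seed)
    n_alt = round(true_af * n_strains)   # strains carrying the alternative allele
    target = n_alt / n_strains           # allele frequency among the pooled strains
    errors = []
    for _ in range(reps):
        weights = [max(1e-9, rng.gauss(1.0, dna_cv)) for _ in range(n_strains)]
        pool_af = sum(weights[:n_alt]) / sum(weights)
        alt_reads = sum(rng.random() < pool_af for _ in range(depth))
        errors.append(alt_reads / depth - target)
    return statistics.pstdev(errors)

equal = pooled_af_error_sd(0.5, n_strains=10, depth=100)
uneven = pooled_af_error_sd(0.5, n_strains=10, depth=100, dna_cv=0.5)
print(f"equal pooling SD {equal:.3f}, uneven pooling SD {uneven:.3f}")
```

With equal pooling, the simulated standard deviation should sit close to the binomial value sqrt(p(1 - p)/depth) = 0.05 for p = 0.5 and depth 100. Variation in DNA amounts inflates it, and the effect shrinks as more strains are pooled.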

  5. Expression Quantitative Trait Locus Mapping across Water Availability Environments Reveals Contrasting Associations with Genomic Features in Arabidopsis

    PubMed Central

    Lowry, David B.; Logan, Tierney L.; Santuari, Luca; Hardtke, Christian S.; Richards, James H.; DeRose-Wilson, Leah J.; McKay, John K.; Sen, Saunak; Juenger, Thomas E.

    2013-01-01

    The regulation of gene expression is crucial for an organism’s development and response to stress, and an understanding of the evolution of gene expression is of fundamental importance to basic and applied biology. To improve this understanding, we conducted expression quantitative trait locus (eQTL) mapping in the Tsu-1 (Tsushima, Japan) × Kas-1 (Kashmir, India) recombinant inbred line population of Arabidopsis thaliana across soil drying treatments. We then used genome resequencing data to evaluate whether genomic features (promoter polymorphism, recombination rate, gene length, and gene density) are associated with genes responding to the environment (E) or with genes showing genetic variation (G) in gene expression in the form of eQTLs. We identified thousands of genes that responded to soil drying and hundreds of main-effect eQTLs. However, we identified very few statistically significant eQTLs that interacted with the soil drying treatment (GxE eQTLs). Analysis of genome resequencing data revealed associations of several genomic features with G and E genes. In general, E genes had lower promoter diversity and lower local recombination rates. By contrast, genes with eQTLs (G) had significantly greater promoter diversity and were located in genomic regions with higher recombination. These results suggest that genomic architecture may play an important role in the evolution of gene expression. PMID:24045022
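
The distinction above between main-effect eQTLs and GxE eQTLs can be illustrated for a single gene and a single marker in a two-allele RIL population. In a saturated two-by-two model (genotype × environment), the interaction coefficient equals the difference between the genotype effects measured in each environment. The sketch below computes only these effect sizes from cell means. The 0/1 coding and the expression values are hypothetical, and the significance testing and multiple-testing control of a real eQTL scan are omitted.

```python
from statistics import mean

def gxe_contrasts(records):
    """Genotype effect in each environment and their difference (the GxE term).

    `records` holds (genotype, environment, expression) tuples. Hypothetical
    coding: genotype 0 = Tsu-1 allele, 1 = Kas-1 allele; environment
    0 = well-watered, 1 = soil drying. All four cells must be present.
    """
    cells = {}
    for g, e, y in records:
        cells.setdefault((g, e), []).append(y)
    m = {key: mean(values) for key, values in cells.items()}
    effect_wet = m[(1, 0)] - m[(0, 0)]
    effect_dry = m[(1, 1)] - m[(0, 1)]
    return effect_wet, effect_dry, effect_dry - effect_wet

# Hypothetical expression values for one gene.
records = [(0, 0, 5.0), (0, 0, 5.2), (1, 0, 6.0), (1, 0, 6.2),
           (0, 1, 5.1), (0, 1, 4.9), (1, 1, 8.0), (1, 1, 8.2)]
wet, dry, interaction = gxe_contrasts(records)
print(f"wet {wet:.2f}, dry {dry:.2f}, GxE {interaction:.2f}")  # wet 1.00, dry 3.10, GxE 2.10
```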

  6. Deep Resequencing Unveils Genetic Architecture of ADIPOQ and Identifies a Novel Low-Frequency Variant Strongly Associated With Adiponectin Variation

    PubMed Central

    Warren, Liling L.; Li, Li; Nelson, Matthew R.; Ehm, Margaret G.; Shen, Judong; Fraser, Dana J.; Aponte, Jennifer L.; Nangle, Keith L.; Slater, Andrew J.; Woollard, Peter M.; Hall, Matt D.; Topp, Simon D.; Yuan, Xin; Cardon, Lon R.; Chissoe, Stephanie L.; Mooser, Vincent; Morris, Andrew D.; Palmer, Colin N.A.; Perry, John R.; Frayling, Timothy M.; Whittaker, John C.; Waterworth, Dawn M.

    2012-01-01

    Increased adiponectin levels have been shown to be associated with a lower risk of type 2 diabetes. To understand the relations between genetic variation at the adiponectin-encoding gene, ADIPOQ, and adiponectin levels, and subsequently its role in disease, we conducted a deep resequencing experiment of ADIPOQ in 14,002 subjects, including 12,514 Europeans, 594 African Americans, and 567 Indian Asians. We identified 296 single nucleotide polymorphisms (SNPs), including 30 amino acid changes, and carried out association analyses in a subset of 3,665 subjects from two independent studies. We confirmed multiple genome-wide association study findings and identified a novel association between a low-frequency SNP (rs17366653) and adiponectin levels (P = 2.2E–17). We show that seven SNPs exert independent effects on adiponectin levels. Together, they explained 6% of adiponectin variation in our samples. We subsequently assessed the association between these SNPs and type 2 diabetes in the Genetics of Diabetes Audit and Research in Tayside Scotland (GO-DARTS) study, comprising 5,145 case and 6,374 control subjects. No evidence of association with type 2 diabetes was found, but we were also unable to exclude the possibility of substantial effects (e.g., odds ratio 95% CI for rs17366653 [0.91–1.58]). Further investigation by large-scale and well-powered Mendelian randomization studies is warranted. PMID:22403302
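
An odds-ratio confidence interval such as the one quoted above shows whether a substantial effect can be excluded even when the association is not significant. As a minimal sketch, the Woolf (log-scale) interval for an allelic 2x2 table is computed below with hypothetical counts. The study itself may have used covariate-adjusted logistic regression, which this does not reproduce.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf (log-scale) confidence interval (95% for z=1.96).

    Allelic 2x2 table:
        a = risk alleles in cases,    b = other alleles in cases
        c = risk alleles in controls, d = other alleles in controls
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell: use a continuity correction or an exact method")
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper

# Hypothetical allele counts for a low-frequency SNP.
or_, lo, hi = odds_ratio_ci(100, 900, 80, 920)
print(f"OR {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

With these counts the interval spans 1 but also includes fairly large effects, which is the same situation the abstract describes.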

  7. Molecular footprints of domestication and improvement in soybean revealed by whole genome re-sequencing

    PubMed Central

    2013-01-01

    Background Artificial selection played an important role in the origin of modern Glycine max cultivars from the wild soybean Glycine soja. To elucidate the consequences of artificial selection accompanying the domestication and modern improvement of soybean, whole-genome re-sequencing data from 25 new and 30 published accessions, representing wild, domesticated landrace, and Chinese elite soybean populations, were analyzed. Results A total of 5,102,244 single nucleotide polymorphisms (SNPs) and 707,969 insertions/deletions were identified. Among the SNPs detected, 25.5% had not been described previously. We found that artificial selection during domestication led to a more pronounced reduction in the genetic diversity of soybean than the switch from landraces to elite cultivars. Only a small proportion (2.99%) of the genome appears to have been affected by artificial selection for preferred agricultural traits. The selected regions were not distributed randomly or uniformly throughout the genome; instead, clusters of selection hotspots were observed in certain genomic regions. Moreover, a set of candidate genes (4.38% of the total annotated genes) significantly affected by selection underlying soybean domestication and genetic improvement was identified. Conclusions Given the uniqueness of the soybean germplasm sequenced, this study draws a clear picture of human-mediated evolution of the soybean genome. The genomic resources and information provided by this study should also facilitate the discovery of genes/loci underlying agronomically important traits. PMID:23984715
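
A loss of genetic diversity during domestication is often quantified by comparing nucleotide diversity (π) between wild and domesticated groups in genomic windows. The abstract does not say which statistic the authors used, so this is a generic sketch. It assumes each accession contributes one sequence, since soybean is largely inbred and accessions are treated as homozygous, and it uses hypothetical allele counts. Sites that are monomorphic in both groups contribute zero but still count toward the window length.

```python
def site_pi(alt_count, n):
    """Per-site nucleotide diversity (mean pairwise difference) for n sequences."""
    p = alt_count / n
    return n / (n - 1) * 2 * p * (1 - p)

def window_pi(alt_counts, n, window_length):
    """Nucleotide diversity per bp: summed per-site values divided by window length."""
    return sum(site_pi(k, n) for k in alt_counts) / window_length

def diversity_ratio(wild_counts, wild_n, dom_counts, dom_n, window_length):
    """pi_wild / pi_domesticated; unusually high ratios flag candidate selection regions."""
    pi_dom = window_pi(dom_counts, dom_n, window_length)
    pi_wild = window_pi(wild_counts, wild_n, window_length)
    return float("inf") if pi_dom == 0 else pi_wild / pi_dom

# Hypothetical alternative-allele counts at four segregating sites in a 1 kb window,
# for 10 wild and 10 domesticated accessions.
ratio = diversity_ratio([5, 3, 2, 4], 10, [0, 1, 0, 0], 10, 1000)
print(f"pi_wild / pi_dom = {ratio:.2f}")  # 9.56
```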

  8. Comprehensive profiling and quantitation of oncogenic mutations in non small-cell lung carcinoma using single molecule amplification and re-sequencing technology

    PubMed Central

    Jiang, Hong; Wang, Limin; Xu, Rujun; Shi, Yanbin; Zhang, Jianguang; Xu, Mengnan; Cram, David S.; Ma, Shenglin

    2016-01-01

    Activating and resistance mutations in the tyrosine kinase domain of several oncogenes are frequently associated with non-small cell lung carcinoma (NSCLC). In this study we assessed the frequency, type and abundance of EGFR, KRAS, BRAF, TP53 and ALK mutations in tumour specimens from 184 patients with early and late stage disease using single molecule amplification and re-sequencing technology (SMART). Based on modelling of EGFR mutations, the detection limit of the SMART assay was 0.1% mutant abundance or better. When EGFR mutation detection was benchmarked against the gold-standard ARMS-PCR assay, the SMART assay had a sensitivity of 98.7% and a specificity of 99.0%. Amongst the 184 samples, EGFR mutations were the most prevalent (59.9%), followed by KRAS (16.9%), TP53 (12.7%), EML4-ALK fusions (6.3%) and BRAF (4.2%) mutations. The abundance and types of mutations in tumour specimens were extremely heterogeneous, involving either monoclonal (51.6%) or polyclonal (12.6%) mutation events. At the clinical level, although the spectrum of tumour mutation(s) was unique to each patient, the overall patterns in early or advanced stage disease were relatively similar. Based on these findings, we propose that personalized profiling and quantitation of clinically significant oncogenic mutations will allow better classification of patients according to tumour characteristics and provide clinicians with important ancillary information for treatment decision-making. PMID:27409166

  9. Comprehensive profiling and quantitation of oncogenic mutations in non small-cell lung carcinoma using single molecule amplification and re-sequencing technology.

    PubMed

    Zhang, Shirong; Xia, Bing; Jiang, Hong; Wang, Limin; Xu, Rujun; Shi, Yanbin; Zhang, Jianguang; Xu, Mengnan; Cram, David S; Ma, Shenglin

    2016-08-02

    Activating and resistance mutations in the tyrosine kinase domain of several oncogenes are frequently associated with non-small cell lung carcinoma (NSCLC). In this study we assessed the frequency, type and abundance of EGFR, KRAS, BRAF, TP53 and ALK mutations in tumour specimens from 184 patients with early and late stage disease using single molecule amplification and re-sequencing technology (SMART). Based on modelling of EGFR mutations, the detection limit of the SMART assay was 0.1% mutant abundance or better. When EGFR mutation detection was benchmarked against the gold-standard ARMS-PCR assay, the SMART assay had a sensitivity of 98.7% and a specificity of 99.0%. Amongst the 184 samples, EGFR mutations were the most prevalent (59.9%), followed by KRAS (16.9%), TP53 (12.7%), EML4-ALK fusions (6.3%) and BRAF (4.2%) mutations. The abundance and types of mutations in tumour specimens were extremely heterogeneous, involving either monoclonal (51.6%) or polyclonal (12.6%) mutation events. At the clinical level, although the spectrum of tumour mutation(s) was unique to each patient, the overall patterns in early or advanced stage disease were relatively similar. Based on these findings, we propose that personalized profiling and quantitation of clinically significant oncogenic mutations will allow better classification of patients according to tumour characteristics and provide clinicians with important ancillary information for treatment decision-making.

  10. Rabbit genome analysis reveals a polygenic basis for phenotypic change during domestication.

    PubMed

    Carneiro, Miguel; Rubin, Carl-Johan; Di Palma, Federica; Albert, Frank W; Alföldi, Jessica; Martinez Barrio, Alvaro; Pielberg, Gerli; Rafati, Nima; Sayyab, Shumaila; Turner-Maier, Jason; Younis, Shady; Afonso, Sandra; Aken, Bronwen; Alves, Joel M; Barrell, Daniel; Bolet, Gerard; Boucher, Samuel; Burbano, Hernán A; Campos, Rita; Chang, Jean L; Duranthon, Veronique; Fontanesi, Luca; Garreau, Hervé; Heiman, David; Johnson, Jeremy; Mage, Rose G; Peng, Ze; Queney, Guillaume; Rogel-Gaillard, Claire; Ruffier, Magali; Searle, Steve; Villafuerte, Rafael; Xiong, Anqi; Young, Sarah; Forsberg-Nilsson, Karin; Good, Jeffrey M; Lander, Eric S; Ferrand, Nuno; Lindblad-Toh, Kerstin; Andersson, Leif

    2014-08-29

    The genetic changes underlying the initial steps of animal domestication are still poorly understood. We generated a high-quality reference genome for the rabbit and compared it to resequencing data from populations of wild and domestic rabbits. We identified more than 100 selective sweeps specific to domestic rabbits but only a relatively small number of fixed (or nearly fixed) single-nucleotide polymorphisms (SNPs) for derived alleles. SNPs with marked allele frequency differences between wild and domestic rabbits were enriched for conserved noncoding sites. Enrichment analyses suggest that genes affecting brain and neuronal development have often been targeted during domestication. We propose that because of a truly complex genetic background, tame behavior in rabbits and other domestic animals evolved by shifts in allele frequencies at many loci, rather than by critical changes at only a few domestication loci. Copyright © 2014, American Association for the Advancement of Science.

  11. Using Informatics-, Bioinformatics- and Genomics-Based Approaches for the Molecular Surveillance and Detection of Biothreat Agents

    NASA Astrophysics Data System (ADS)

    Seto, Donald

    The convergence and wealth of informatics, bioinformatics and genomics methods and associated resources allow a comprehensive and rapid approach to the surveillance and detection of bacterial and viral organisms. Coupled with the continuing race for the fastest, most cost-efficient and highest-quality DNA sequencing technology, that is, "next generation sequencing", these methods make it possible to detect biological threat agents by "cheaper and faster" means. With improved bioinformatic tools for understanding these genomes and parsing unique pathogen genome signatures, along with "state-of-the-art" informatics that includes faster computational methods, equipment and databases, it is feasible to apply new algorithms to biothreat agent detection. Two such methods are high-throughput DNA sequencing-based and resequencing microarray-based identification. These are illustrated and validated by two examples involving human adenoviruses, both from real-world test beds.

  12. Yak whole-genome resequencing reveals domestication signatures and prehistoric population expansions.

    PubMed

    Qiu, Qiang; Wang, Lizhong; Wang, Kun; Yang, Yongzhi; Ma, Tao; Wang, Zefu; Zhang, Xiao; Ni, Zhengqiang; Hou, Fujiang; Long, Ruijun; Abbott, Richard; Lenstra, Johannes; Liu, Jianquan

    2015-12-22

    Yak domestication represents an important episode in the early human occupation of the high-altitude Qinghai-Tibet Plateau (QTP). The precise timing of domestication is debated and little is known about the underlying genetic changes that occurred during the process. Here we investigate genome variation of wild and domestic yaks. We detect signals of selection in 209 genes of domestic yaks, several of which relate to behaviour and tameness. We date yak domestication to 7,300 years before present (yr BP), most likely by nomadic people, and estimate a sixfold increase in yak population size by 3,600 yr BP. These dates coincide with two early human population expansions on the QTP during the early Neolithic and the late Holocene, respectively. Our findings add to an understanding of yak domestication and its importance in the early human occupation of the QTP.

  13. Yak whole-genome resequencing reveals domestication signatures and prehistoric population expansions

    PubMed Central

    Qiu, Qiang; Wang, Lizhong; Wang, Kun; Yang, Yongzhi; Ma, Tao; Wang, Zefu; Zhang, Xiao; Ni, Zhengqiang; Hou, Fujiang; Long, Ruijun; Abbott, Richard; Lenstra, Johannes; Liu, Jianquan

    2015-01-01

    Yak domestication represents an important episode in the early human occupation of the high-altitude Qinghai-Tibet Plateau (QTP). The precise timing of domestication is debated and little is known about the underlying genetic changes that occurred during the process. Here we investigate genome variation of wild and domestic yaks. We detect signals of selection in 209 genes of domestic yaks, several of which relate to behaviour and tameness. We date yak domestication to 7,300 years before present (yr BP), most likely by nomadic people, and estimate a sixfold increase in yak population size by 3,600 yr BP. These dates coincide with two early human population expansions on the QTP during the early Neolithic and the late Holocene, respectively. Our findings add to an understanding of yak domestication and its importance in the early human occupation of the QTP. PMID:26691338

  14. The Rosa genome provides new insights into the domestication of modern roses.

    PubMed

    Raymond, Olivier; Gouzy, Jérôme; Just, Jérémy; Badouin, Hélène; Verdenaud, Marion; Lemainque, Arnaud; Vergne, Philippe; Moja, Sandrine; Choisne, Nathalie; Pont, Caroline; Carrère, Sébastien; Caissard, Jean-Claude; Couloux, Arnaud; Cottret, Ludovic; Aury, Jean-Marc; Szécsi, Judit; Latrasse, David; Madoui, Mohammed-Amin; François, Léa; Fu, Xiaopeng; Yang, Shu-Hua; Dubois, Annick; Piola, Florence; Larrieu, Antoine; Perez, Magali; Labadie, Karine; Perrier, Lauriane; Govetto, Benjamin; Labrousse, Yoan; Villand, Priscilla; Bardoux, Claudia; Boltz, Véronique; Lopez-Roques, Céline; Heitzler, Pascal; Vernoux, Teva; Vandenbussche, Michiel; Quesneville, Hadi; Boualem, Adnane; Bendahmane, Abdelhafid; Liu, Chang; Le Bris, Manuel; Salse, Jérôme; Baudino, Sylvie; Benhamed, Moussa; Wincker, Patrick; Bendahmane, Mohammed

    2018-06-01

    Roses have high cultural and economic importance as ornamental plants and in the perfume industry. We report the whole-genome sequencing and assembly of the rose, together with the resequencing of major genotypes that contributed to rose domestication. We generated a homozygous genotype from a heterozygous diploid modern rose progenitor, Rosa chinensis 'Old Blush'. Using single-molecule real-time sequencing and a meta-assembly approach, we obtained one of the most comprehensive plant genomes to date. Diversity analyses highlighted the mosaic origin of 'La France', one of the first hybrids combining the growth vigor of European species and the recurrent blooming of Chinese species. Genomic segments of Chinese ancestry identified new candidate genes for recurrent blooming. Reconstructing regulatory and secondary metabolism pathways allowed us to propose a model of interconnected regulation of scent and flower color. This genome provides a foundation for understanding the mechanisms governing rose traits and should accelerate improvement in roses, Rosaceae and ornamentals.

  15. Chromosomal-Level Assembly of the Asian Seabass Genome Using Long Sequence Reads and Multi-layered Scaffolding

    PubMed Central

    Vij, Shubha; Kuhl, Heiner; Kuznetsova, Inna S.; Komissarov, Aleksey; Yurchenko, Andrey A.; Van Heusden, Peter; Singh, Siddharth; Thevasagayam, Natascha M.; Prakki, Sai Rama Sridatta; Purushothaman, Kathiresan; Saju, Jolly M.; Jiang, Junhui; Mbandi, Stanley Kimbung; Jonas, Mario; Hin Yan Tong, Amy; Mwangi, Sarah; Lau, Doreen; Ngoh, Si Yan; Liew, Woei Chang; Shen, Xueyan; Hon, Lawrence S.; Drake, James P.; Boitano, Matthew; Hall, Richard; Chin, Chen-Shan; Lachumanan, Ramkumar; Korlach, Jonas; Trifonov, Vladimir; Kabilov, Marsel; Tupikin, Alexey; Green, Darrell; Moxon, Simon; Garvin, Tyler; Sedlazeck, Fritz J.; Vurture, Gregory W.; Gopalapillai, Gopikrishna; Kumar Katneni, Vinaya; Noble, Tansyn H.; Scaria, Vinod; Sivasubbu, Sridhar; Jerry, Dean R.; O'Brien, Stephen J.; Schatz, Michael C.; Dalmay, Tamás; Turner, Stephen W.; Lok, Si; Christoffels, Alan; Orbán, László

    2016-01-01

    We report here the ~670 Mb genome assembly of the Asian seabass (Lates calcarifer), a tropical marine teleost. We used long-read sequencing augmented by transcriptomics, optical and genetic mapping along with shared synteny from closely related fish species to derive a chromosome-level assembly with a contig N50 size over 1 Mb and scaffold N50 size over 25 Mb that spans ~90% of the genome. The population structure of the L. calcarifer species complex was analyzed by re-sequencing 61 individuals representing various regions across the species’ native range. SNP analyses identified high levels of genetic diversity and confirmed earlier indications of a population stratification comprising three clades, with signs of admixture apparent in the South-East Asian population. The quality of the Asian seabass genome assembly far exceeds that of any other fish species, and will serve as a new standard for fish genomics. PMID:27082250

  16. Genome sequence and genetic diversity of the common carp, Cyprinus carpio.

    PubMed

    Xu, Peng; Zhang, Xiaofeng; Wang, Xumin; Li, Jiongtang; Liu, Guiming; Kuang, Youyi; Xu, Jian; Zheng, Xianhu; Ren, Lufeng; Wang, Guoliang; Zhang, Yan; Huo, Linhe; Zhao, Zixia; Cao, Dingchen; Lu, Cuiyun; Li, Chao; Zhou, Yi; Liu, Zhanjiang; Fan, Zhonghua; Shan, Guangle; Li, Xingang; Wu, Shuangxiu; Song, Lipu; Hou, Guangyuan; Jiang, Yanliang; Jeney, Zsigmond; Yu, Dan; Wang, Li; Shao, Changjun; Song, Lai; Sun, Jing; Ji, Peifeng; Wang, Jian; Li, Qiang; Xu, Liming; Sun, Fanyue; Feng, Jianxin; Wang, Chenghui; Wang, Shaolin; Wang, Baosen; Li, Yan; Zhu, Yaping; Xue, Wei; Zhao, Lan; Wang, Jintu; Gu, Ying; Lv, Weihua; Wu, Kejing; Xiao, Jingfa; Wu, Jiayan; Zhang, Zhang; Yu, Jun; Sun, Xiaowen

    2014-11-01

    The common carp, Cyprinus carpio, is one of the most important cyprinid species and globally accounts for 10% of freshwater aquaculture production. Here we present a draft genome of domesticated C. carpio (strain Songpu), whose current assembly contains 52,610 protein-coding genes and approximately 92.3% coverage of its paleotetraploidized genome (2n = 100). The latest round of whole-genome duplication has been estimated to have occurred approximately 8.2 million years ago. Genome resequencing of 33 representative individuals from worldwide populations demonstrates a single origin for C. carpio in 2 subspecies (C. carpio haematopterus and C. carpio carpio). Integrative genomic and transcriptomic analyses were used to identify loci potentially associated with traits including scaling patterns and skin color. In combination with the high-resolution genetic map, the draft genome paves the way for better molecular studies and improved genome-assisted breeding of C. carpio and other closely related species.

  17. Development and validation of an rDNA operon based primer walking strategy applicable to de novo bacterial genome finishing

    PubMed Central

    Eastman, Alexander W.; Yuan, Ze-Chun

    2015-01-01

    Advances in sequencing technology have drastically increased the depth and feasibility of bacterial genome sequencing. However, little information is available that details the specific techniques and procedures employed during genome sequencing despite the large numbers of published genomes. Shotgun approaches employed by second-generation sequencing platforms have necessitated the development of robust bioinformatics tools for in silico assembly, and complete assembly is limited by the presence of repetitive DNA sequences and multi-copy operons. Typically, re-sequencing with multiple platforms and laborious, targeted Sanger sequencing are employed to finish a draft bacterial genome. Here we describe a novel strategy based on the identification and targeted sequencing of repetitive rDNA operons to expedite bacterial genome assembly and finishing. Our strategy was validated by finishing the genome of Paenibacillus polymyxa strain CR1, a bacterium with potential in sustainable agriculture and bio-based processes. An analysis of the 38 contigs contained in the P. polymyxa strain CR1 draft genome revealed 12 repetitive rDNA operons with varied intragenic and flanking regions of variable length, all located at contig boundaries and within contig gaps. These highly similar but not identical rDNA operons were experimentally verified and sequenced simultaneously with multiple, specially designed primer sets. This approach also identified and corrected significant sequence rearrangements generated during the initial in silico assembly of sequencing reads. Our approach reduces the effort associated with blind primer walking for contig assembly, increasing both the speed and feasibility of genome finishing. Our study further reinforces the notion that repetitive DNA elements are major limiting factors for genome finishing. Moreover, we provide a step-by-step workflow for genome finishing, which may guide future bacterial genome finishing projects. PMID:25653642

  18. Association analysis of genes involved in maize (Zea mays L.) root development with seedling and agronomic traits under contrasting nitrogen levels.

    PubMed

    Abdel-Ghani, Adel H; Kumar, Bharath; Pace, Jordon; Jansen, Constantin; Gonzalez-Portilla, Pedro J; Reyes-Matamoros, Jenaro; San Martin, Juan Pablo; Lee, Michael; Lübberstedt, Thomas

    2015-05-01

    A better understanding of the genetic control of root development might make it possible to develop lines with root systems able to adapt to soils with limited nutrient availability. For this purpose, an association study (AS) panel consisting of a diverse set of 74 inbred maize lines was screened for seedling root traits and adult plant root traits under two contrasting nitrogen (N) levels (low and high N). Allele re-sequencing of the root development genes RTCL, RTH3, RUM1, and RUL1 was carried out for the AS panel lines. Association analysis was carried out between individual polymorphisms and both seedling and adult plant traits, while controlling for spurious associations due to population structure and kinship relations. Based on the SNPs identified in RTCL, RTH3, RUM1, and RUL1, lines within the AS panel were grouped into 16, 9, 22, and 7 haplotypes, respectively. Association analysis revealed several polymorphisms within root genes putatively associated with variability in seedling root and adult plant traits under contrasting N levels. The largest number of SNPs significantly associated with seedling root traits was found in RTCL (19 SNPs), followed by RUM1 (4 SNPs); for RTH3 and RUL1, two and three SNPs, respectively, were significantly associated with root traits. RTCL and RTH3 were also found to be associated with grain yield. Thus, considerable allelic diversity is present within the candidate genes studied and can be utilized to develop functional markers that allow identification of maize lines with improved root architecture and yield under N stress conditions.

  19. Identification, Replication, and Functional Fine-Mapping of Expression Quantitative Trait Loci in Primary Human Liver Tissue

    PubMed Central

    Stanaway, Ian B.; Gamazon, Eric R.; Smith, Joshua D.; Mirkov, Snezana; Ramirez, Jacqueline; Liu, Wanqing; Lin, Yvonne S.; Moloney, Cliona; Aldred, Shelly Force; Trinklein, Nathan D.; Schuetz, Erin; Nickerson, Deborah A.; Thummel, Ken E.; Rieder, Mark J.; Rettie, Allan E.; Ratain, Mark J.; Cox, Nancy J.; Brown, Christopher D.

    2011-01-01

    The discovery of expression quantitative trait loci (“eQTLs”) can help to unravel genetic contributions to complex traits. We identified genetic determinants of human liver gene expression variation using two independent collections of primary tissue profiled with Agilent (n = 206) and Illumina (n = 60) expression arrays and Illumina SNP genotyping (550K), and we also incorporated data from a published study (n = 266). We found that ∼30% of SNP-expression correlations in one study failed to replicate in either of the others, even at thresholds yielding high reproducibility in simulations, and we quantified numerous factors affecting reproducibility. Our data suggest that drug exposure, clinical descriptors, and unknown factors associated with tissue ascertainment and analysis have substantial effects on gene expression and that controlling for hidden confounding variables significantly increases replication rate. Furthermore, we found that reproducible eQTL SNPs were heavily enriched near gene starts and ends, and subsequently resequenced the promoters and 3′UTRs for 14 genes and tested the identified haplotypes using luciferase assays. For three genes, significant haplotype-specific in vitro functional differences correlated directly with expression levels, suggesting that many bona fide eQTLs result from functional variants that can be mechanistically isolated in a high-throughput fashion. Finally, given our study design, we were able to discover and validate hundreds of liver eQTLs. Many of these relate directly to complex traits for which liver-specific analyses are likely to be relevant, and we identified dozens of potential connections with disease-associated loci. These included previously characterized eQTL contributors to diabetes, drug response, and lipid levels, and they suggest novel candidates such as a role for NOD2 expression in leprosy risk and C2orf43 in prostate cancer. In general, the work presented here will be valuable for future efforts to precisely identify and functionally characterize genetic contributions to a variety of complex traits. PMID:21637794

  20. Research Results

    NASA Astrophysics Data System (ADS)

    2011-12-01

    Research on Global Carbon Emission and Sequestration; NSFC Funded Project Made Significant Progress in Quantum Dynamics; Functional Human Blood Protein Obtained from Rice; How Giant Pandas Thrive on a Bamboo Diet; New Evidence of Interpersonal Violence from 129,000 Years Ago Found in China; Aptamer-Mediated Efficient Capture and Release of T Lymphocytes on Nanostructured Surfaces; BGI Study Results on Resequencing 50 Accessions of Rice Cast New Light on Molecular Breeding; BGI Reports Study Results on Frequent Mutation of Genes Encoding UMPP Components in Kidney Cancer; Research on Habitat Shift Promoting Species Diversification

  1. JVM: Java Visual Mapping tool for next generation sequencing read.

    PubMed

    Yang, Ye; Liu, Juan

    2015-01-01

    We developed a program, JVM (Java Visual Mapping), for mapping next generation sequencing reads to a reference sequence. The program is implemented in Java and is designed to handle millions of short reads generated with Illumina sequencing technology. It employs a seed index strategy and octal encoding operations for sequence alignment. JVM is useful for DNA-Seq and RNA-Seq analyses of single-end resequencing data. JVM is a desktop application that supports read data ranging from 1 MB to 10 GB.

  2. A novel PTCH1 mutation in a patient with Gorlin syndrome

    PubMed Central

    Okamoto, Nana; Naruto, Takuya; Kohmoto, Tomohiro; Komori, Takahide; Imoto, Issei

    2014-01-01

    Gorlin syndrome is an autosomal dominant disorder characterized by a wide range of developmental abnormalities and a predisposition to various tumors, and it is linked to the alteration of several causative genes, including PTCH1. We performed targeted resequencing using a next-generation sequencer to analyze genes associated with known clinical phenotypes in an 11-year-old male with sporadic jaw keratocysts. A novel duplication mutation (c.426dup) in PTCH1, resulting in a truncated protein, was identified. PMID:27081512

  3. A novel PTCH1 mutation in a patient with Gorlin syndrome.

    PubMed

    Okamoto, Nana; Naruto, Takuya; Kohmoto, Tomohiro; Komori, Takahide; Imoto, Issei

    2014-01-01

    Gorlin syndrome is an autosomal dominant disorder characterized by a wide range of developmental abnormalities and a predisposition to various tumors, and it is linked to the alteration of several causative genes, including PTCH1. We performed targeted resequencing using a next-generation sequencer to analyze genes associated with known clinical phenotypes in an 11-year-old male with sporadic jaw keratocysts. A novel duplication mutation (c.426dup) in PTCH1, resulting in a truncated protein, was identified.

  4. A Variable Polyglutamine Repeat Affects Subcellular Localization and Regulatory Activity of a Populus ANGUSTIFOLIA Protein.

    PubMed

    Bryan, Anthony C; Zhang, Jin; Guo, Jianjun; Ranjan, Priya; Singan, Vasanth; Barry, Kerrie; Schmutz, Jeremy; Weighill, Deborah; Jacobson, Daniel; Jawdy, Sara; Tuskan, Gerald A; Chen, Jin-Gui; Muchero, Wellington

    2018-06-08

    Polyglutamine (polyQ) stretches have been reported to occur in proteins across many organisms including animals, fungi and plants. Expansion of these repeats has attracted much attention due to their associations with numerous human diseases including Huntington's and other neurological maladies. This suggests that the relative length of polyQ stretches is an important modulator of their function. Here, we report the identification of a Populus C-terminus binding protein (CtBP) ANGUSTIFOLIA (PtAN1), which contains a polyQ stretch whose functional relevance had not been established. Analysis of 917 resequenced Populus trichocarpa genotypes revealed three allelic variants at this locus encoding 11-, 13- and 15-glutamine residues. Transient expression assays using Populus leaf mesophyll protoplasts revealed that the 11Q variant exhibited strong nuclear localization whereas the 15Q variant was only found in the cytosol, with the 13Q variant exhibiting localization in both subcellular compartments. We assessed functional implications by evaluating expression changes of putative PtAN1 targets in response to overexpression of the three allelic variants and observed allele-specific differences in expression levels of putative targets. Our results provide evidence that variation in polyQ length modulates PtAN1 function by altering subcellular localization. Copyright © 2018, G3: Genes, Genomes, Genetics.

  5. Smoking Gun or Circumstantial Evidence? Comparison of Statistical Learning Methods using Functional Annotations for Prioritizing Risk Variants.

    PubMed

    Gagliano, Sarah A; Ravji, Reena; Barnes, Michael R; Weale, Michael E; Knight, Jo

    2015-08-24

    Although technology has triumphed in facilitating routine genome sequencing, new challenges have been created for the data analyst. Genome-scale surveys of human variation generate volumes of data that far exceed capabilities for laboratory characterization. By incorporating functional annotations as predictors, statistical learning has been widely investigated for prioritizing genetic variants likely to be associated with complex disease. We compared three published prioritization procedures, which use different statistical learning algorithms and different predictors with regard to the quantity, type and coding. We also explored different combinations of algorithm and annotation set. As an application, we tested which methodology performed best for prioritizing variants using data from a large schizophrenia meta-analysis by the Psychiatric Genomics Consortium. Results suggest that all methods have considerable (and similar) predictive accuracies (AUCs 0.64-0.71) in test set data, but there is more variability in the application to the schizophrenia GWAS. In conclusion, a variety of algorithms and annotations seem to have a similar potential to effectively enrich true risk variants in genome-scale datasets; however, none offers more than incremental improvement in prediction. We discuss how methods might be evolved for risk variant prediction to address the impending bottleneck of the new generation of genome re-sequencing studies.

  6. The N14 anti-afamin antibody Fab: a rare VL1 CDR glycosylation, crystallographic re-sequencing, molecular plasticity and conservative versus enthusiastic modelling.

    PubMed

    Naschberger, Andreas; Fürnrohr, Barbara G; Lenac Rovis, Tihana; Malic, Suzana; Scheffzek, Klaus; Dieplinger, Hans; Rupp, Bernhard

    2016-12-01

    The monoclonal antibody N14 is used as a detection antibody in ELISA kits for the human glycoprotein afamin, a member of the albumin family, which has recently gained interest in the capture and stabilization of Wnt signalling proteins, and for its role in metabolic syndrome and papillary thyroid carcinoma. As a rare occurrence, the N14 Fab is N-glycosylated at Asn26L at the onset of the VL1 antigen-binding loop, with the α-1-6 core fucosylated complex glycan facing out of the L1 complementarity-determining region. The crystal structures of two non-apparent (pseudo) isomorphous crystals of the N14 Fab were analyzed, which differ significantly in the elbow angles, thereby cautioning against the overinterpretation of domain movements upon antigen binding. In addition, the map quality at 1.9 Å resolution was sufficient to crystallographically re-sequence the variable VL and VH domains and to detect discrepancies in the hybridoma-derived sequence. Finally, a conservatively refined parsimonious model is presented and its statistics are compared with those from a less conservatively built model that has been modelled more enthusiastically. Improvements to the PDB validation reports affecting ligands, clashscore and buried surface calculations are suggested.

  7. Rare loss-of-function mutations in ANGPTL family members contribute to plasma triglyceride levels in humans

    PubMed Central

    Romeo, Stefano; Yin, Wu; Kozlitina, Julia; Pennacchio, Len A.; Boerwinkle, Eric; Hobbs, Helen H.; Cohen, Jonathan C.

    2008-01-01

    The relative activity of lipoprotein lipase (LPL) in different tissues controls the partitioning of lipoprotein-derived fatty acids between sites of fat storage (adipose tissue) and oxidation (heart and skeletal muscle). Here we used a reverse genetic strategy to test the hypothesis that 4 angiopoietin-like proteins (ANGPTL3, -4, -5, and -6) play key roles in triglyceride (TG) metabolism in humans. We re-sequenced the coding regions of the genes encoding these proteins and identified multiple rare nonsynonymous (NS) sequence variations that were associated with low plasma TG levels but not with other metabolic phenotypes. Functional studies revealed that all mutant alleles of ANGPTL3 and ANGPTL4 that were associated with low plasma TG levels interfered either with the synthesis or secretion of the protein or with the ability of the ANGPTL protein to inhibit LPL. A total of 1% of the Dallas Heart Study population and 4% of those participants with a plasma TG in the lowest quartile had a rare loss-of-function mutation in ANGPTL3, ANGPTL4, or ANGPTL5. Thus, ANGPTL3, ANGPTL4, and ANGPTL5, but not ANGPTL6, play nonredundant roles in TG metabolism, and multiple alleles at these loci cumulatively contribute to variability in plasma TG levels in humans. PMID:19075393

  8. Simultaneous mutation and copy number variation (CNV) detection by multiplex PCR-based GS-FLX sequencing.

    PubMed

    Goossens, Dirk; Moens, Lotte N; Nelis, Eva; Lenaerts, An-Sofie; Glassee, Wim; Kalbe, Andreas; Frey, Bruno; Kopal, Guido; De Jonghe, Peter; De Rijk, Peter; Del-Favero, Jurgen

    2009-03-01

    We evaluated multiplex PCR amplification as a front-end for high-throughput sequencing, to widen the applicability of massively parallel sequencers for the detailed analysis of complex genomes. Using multiplex PCR reactions, we sequenced the complete coding regions of seven genes implicated in peripheral neuropathies in 40 individuals on a GS-FLX genome sequencer (Roche). The resulting dataset showed highly specific and uniform amplification. Comparison of the GS-FLX sequencing data with the dataset generated by Sanger sequencing confirmed the detection of all variants present and demonstrated the sensitivity of the method for mutation detection. In addition, we showed that we could exploit the multiplexed PCR amplicons to determine individual copy number variation (CNV), extending the spectrum of detected variations to both genetic and genomic variants. We conclude that our straightforward procedure substantially expands the applicability of massively parallel sequencers for sequencing projects of a moderate number of amplicons (50-500), with typical applications in resequencing exons in positional or functional candidate regions and molecular genetic diagnostics. © 2008 Wiley-Liss, Inc.

  9. Genetic Predictors of Interindividual Variability in Hepatic CYP3A4 Expression

    PubMed Central

    Lamba, Vishal; Panetta, John C.; Strom, Stephen

    2010-01-01

    Variability in hepatic CYP3A4 cannot be explained by common CYP3A4 coding variants. We previously identified polymorphisms in pregnane X receptor (PXR) and ATP-binding cassette subfamily B member 1 (ABCB1) associated with CYP3A4 mRNA levels in small cohorts of human livers. However, the relative contributions of these genetic variations or of polymorphisms in other CYP3A4 regulators to variable CYP3A4 expression were not known. We phenotyped livers from white donors (n = 128) by quantitative real-time polymerase chain reaction for expression of CYP3A4, CYP3A5, and CYP3A7 and nine transcriptional regulators, coactivators, and corepressors. We resequenced hepatic nuclear factor-3-β (HNF3β, FoxA2), HNF4α, HNF3γ (FoxA3), nuclear receptor corepressor 2 (NCoR2), and regions of the CYP3A4 promoter and genotyped informative single-nucleotide polymorphisms in PXR and ABCB1 in the same livers. CYP3A4 mRNA was positively correlated with PXR and FoxA2 and negatively correlated with NCoR2 mRNA. A common silent polymorphism and a polymorphic trinucleotide (CCT) repeat in FoxA2 were associated with CYP3A4 expression. The transcriptional activity of the FoxA2 polymorphic CCT repeat alleles (wild-type, n = 14 and variant, n = 13, 15, and 19) when assayed by luciferase reporter transactivation assays was greatest for the wild-type repeat, with deviations from this number having decreased transcriptional activity. This corresponded with higher expression of FoxA2 mRNA and its targets PXR and CYP3A4 in human livers with (CCT) n = 14 genotypes. Multiple linear regression analysis was used to quantify the contributions of selected genetic polymorphisms to variable CYP3A4 expression. This approach identified sex and polymorphisms in FoxA2, HNF4α, FoxA3, PXR, ABCB1, and the CYP3A4 promoter that together explained as much as 24.6% of the variation in hepatic CYP3A4 expression. PMID:19934400

  10. Genomic Analyses Reveal Demographic History and Temperate Adaptation of the Newly Discovered Honey Bee Subspecies Apis mellifera sinisxinyuan n. ssp.

    PubMed

    Chen, Chao; Liu, Zhiguang; Pan, Qi; Chen, Xiao; Wang, Huihua; Guo, Haikun; Liu, Shidong; Lu, Hongfeng; Tian, Shilin; Li, Ruiqiang; Shi, Wei

    2016-05-01

    Studying the genetic signatures of climate-driven selection can produce insights into local adaptation and the potential impacts of climate change on populations. The honey bee (Apis mellifera) is an interesting species for studying local adaptation because it originated in tropical/subtropical climatic regions and subsequently spread into temperate regions. However, little is known about the genetic basis of its adaptation to temperate climates. Here, we resequenced the whole genomes of ten individual bees from a newly discovered population in temperate China and downloaded resequencing data for 35 individuals from other populations. We found that the new population is an undescribed subspecies in the M-lineage of A. mellifera (Apis mellifera sinisxinyuan). Analyses of population history show that long-term global temperature has strongly influenced the demographic history of A. m. sinisxinyuan and its divergence from other subspecies. Further analyses comparing temperate and tropical populations identified several candidate genes related to the fat body and the Hippo signaling pathway that are potentially involved in adaptation to temperate climates. Our results provide insights into the demographic history of the newly discovered A. m. sinisxinyuan, as well as the genetic basis of adaptation of A. mellifera to temperate climates at the genomic level. These findings will facilitate the selective breeding of A. mellifera to improve the survival of overwintering colonies. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  11. Quantitation of fetal DNA fraction in maternal plasma using circulating single molecule amplification and re-sequencing technology (cSMART).

    PubMed

    Song, Yijun; Zhou, Xiya; Huang, Saiqiong; Li, Xiaohong; Qi, Qingwei; Jiang, Yulin; Liu, Yiqian; Ma, Chengcheng; Li, Zhifeng; Xu, Mengnan; Cram, David S; Liu, Juntao

    2016-05-01

    Calculation of the fetal DNA fraction (FF) is important for reliable and accurate noninvasive prenatal testing (NIPT) for fetal genetic abnormalities. The aim of the study was to develop and validate a novel method for FF determination. FF was calculated using the chromosome Y (ChrY) sequence read assay and by circulating single molecule amplification and re-sequencing technology (cSMART) of 76 autosomal SNPs. By Pearson correlation for FF (4.73-22.11%) in 33 samples from pregnancies with a male fetus, the R² coefficient for the 76-SNP versus the ChrY assay was 0.9572 (p<0.001). In addition, the coefficient of variation (CV) of FF measurement by the 76-SNP assay was low (0.15-0.35). As a control, the FF measured in four non-pregnant plasma samples was virtually zero. In prospective longitudinal studies of 14 women with normal pregnancies, FF generally increased with gestational age. However, in 10 women (71%) there was a significant decrease in FF between the first trimester (11-13 weeks) and the second trimester (15-19 weeks), and this was attributable to significant maternal weight gain. The novel 76-SNP cSMART assay has the precision to accurately measure FF in all pregnancies at a detection threshold of 5%. Based on FF trends in individual pregnancies, our results suggest that the end of the first trimester may be a better window for performing NIPT. Copyright © 2016 Elsevier B.V. All rights reserved.
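    As an illustration of how SNP allele counts can yield a fetal fraction, the sketch below uses a widely used principle rather than the published cSMART calculation, which the abstract does not detail. At an SNP where the mother is homozygous and the fetus heterozygous, the fetus-specific allele makes up about FF/2 of cell-free DNA reads, so FF ≈ 2 × (fetal-allele reads / total reads). Selecting such informative SNPs from maternal genotypes is assumed to happen upstream, and the function name is hypothetical. Per-SNP estimates are summarised by their median, with their coefficient of variation (standard deviation divided by mean) reported alongside.

```python
from statistics import mean, median, stdev


def fetal_fraction(informative_snps):
    """Estimate fetal DNA fraction from informative SNPs.

    informative_snps: list of (fetal_allele_reads, total_reads) tuples at SNPs
    where the mother is assumed homozygous and the fetus heterozygous.
    Returns (median FF estimate, coefficient of variation across SNPs).
    """
    # The fetus-specific allele is ~FF/2 of reads, so double the allele fraction.
    estimates = [2 * alt / depth for alt, depth in informative_snps if depth > 0]
    if not estimates:
        raise ValueError("no informative SNPs with coverage")
    ff = median(estimates)
    avg = mean(estimates)
    cv = stdev(estimates) / avg if len(estimates) > 1 and avg > 0 else 0.0
    return ff, cv


# Hypothetical read counts at three informative SNPs.
ff, cv = fetal_fraction([(50, 1000), (60, 1000), (40, 1000)])
print(f"FF = {ff:.1%}, CV = {cv:.2f}")
```

    Using many SNPs and a robust summary such as the median limits the influence of any single SNP whose maternal genotype was misassigned.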

  12. How important are rare variants in common disease?

    PubMed

    Saint Pierre, Aude; Génin, Emmanuelle

    2014-09-01

    Genome-wide association studies have uncovered hundreds of common genetic variants involved in complex diseases. However, for most complex diseases, these common genetic variants only marginally contribute to disease susceptibility. It is now argued that rare variants located in different genes could in fact play a more important role in disease susceptibility than common variants. These rare genetic variants were not captured by genome-wide association studies using single nucleotide polymorphism-chips but with the advent of next-generation sequencing technologies, they have become detectable. It is now possible to study their contribution to common disease by resequencing samples of cases and controls or by using new genotyping exome arrays that cover rare alleles. In this review, we address the question of the contribution of rare variants in common disease by taking the examples of different diseases for which some resequencing studies have already been performed, and by summarizing the results of simulation studies conducted so far to investigate the genetic architecture of complex traits in human. So far, empirical data have not allowed the exclusion of many models except the most extreme ones involving only a small number of rare variants with large effects contributing to complex disease. To unravel the genetic architecture of complex disease, case-control data will not be sufficient, and alternative study designs need to be proposed together with methodological developments. © The Author 2014. Published by Oxford University Press. All rights reserved. For permissions, please email: journals.permissions@oup.com.

  13. Targeted Resequencing and Functional Testing Identifies Low-Frequency Missense Variants in the Gene Encoding GARP as Significant Contributors to Atopic Dermatitis Risk.

    PubMed

    Manz, Judith; Rodríguez, Elke; ElSharawy, Abdou; Oesau, Eva-Maria; Petersen, Britt-Sabina; Baurecht, Hansjörg; Mayr, Gabriele; Weber, Susanne; Harder, Jürgen; Reischl, Eva; Schwarz, Agatha; Novak, Natalija; Franke, Andre; Weidinger, Stephan

    2016-12-01

    Gene-mapping studies have consistently identified a susceptibility locus for atopic dermatitis and other inflammatory diseases on chromosome band 11q13.5, with the strongest association observed for a common variant located in an intergenic region between the two annotated genes C11orf30 and LRRC32. Using a targeted resequencing approach, we identified low-frequency and rare missense mutations within the LRRC32 gene encoding the protein GARP, a receptor on activated regulatory T cells that binds latent transforming growth factor-β. Subsequent association testing in more than 2,000 atopic dermatitis patients and 2,000 control subjects showed a significant excess of these LRRC32 variants in individuals with atopic dermatitis. Structural protein modeling and bioinformatic analysis predicted that these variants disrupt protein transport, and overexpression assays in CD4+CD25− T cells showed a significant reduction in surface expression of the mutated protein. Consistent with this, flow cytometric (FACS) analyses of different T-cell subtypes obtained from atopic dermatitis patients showed significantly reduced surface expression of GARP and reduced conversion of CD4+CD25− T cells into regulatory T cells, along with lower expression of latency-associated protein upon stimulation, in carriers of the LRRC32 A407T variant. These results link inherited disturbances of transforming growth factor-β signaling with atopic dermatitis risk. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.

  14. A new DPYD genotyping assay for improving the safety of 5-fluorouracil therapy.

    PubMed

    Sistonen, Johanna; Smith, Chingying; Fu, Yung-Kang; Largiadèr, Carlo R

    2012-12-24

    Chemotherapeutic use of 5-fluorouracil (5FU) is compromised by severe toxicity, which develops in 10-20% of patients. Recently described genetic variation in dihydropyrimidine dehydrogenase (DPYD) has been shown to be a major predictor of 5FU toxicity. Here, we describe a new genotyping assay for routine clinical use that covers all the major DPYD risk variants. Genomic regions containing DPYD risk variants (c.1129-5923C>G, c.1679T>G/A, c.1905+1G>A, c.2846A>T) and additional markers (c.234-123G>C, c.496A>G, c.775A>G) were amplified in a multiplex PCR reaction. The subsequent steps, including allele-specific primer extension, hybridization of the primers to a microarray, scanning of the array, and data analysis, were automated within the INFINITI® Analyzer (AutoGenomics). The assay was validated by analyzing 107 blood samples obtained from patients whose DPYD gene had previously been re-sequenced. The genotypes obtained with the developed assay were 100% concordant with the re-sequencing results. The procedure is suitable for routine clinical use since results are obtained within one day. For heterozygous risk variant carriers (~7% of Europeans), the treatment can be adjusted by 5FU dose reduction, whereas carriers of two risk alleles should be treated with an alternative therapy. The developed assay provides a novel tool to improve the safety of commonly used 5FU-based chemotherapies. Copyright © 2012 Elsevier B.V. All rights reserved.

  15. Natural Selection and Genetic Diversity in the Butterfly Heliconius melpomene.

    PubMed

    Martin, Simon H; Möst, Markus; Palmer, William J; Salazar, Camilo; McMillan, W Owen; Jiggins, Francis M; Jiggins, Chris D

    2016-05-01

    A combination of selective and neutral evolutionary forces shapes patterns of genetic diversity in nature. Among the insects, most previous analyses of the roles of drift and selection in shaping variation across the genome have focused on the genus Drosophila. A more complete understanding of these forces will come from analyzing other taxa that differ in population demography and other aspects of biology. We have analyzed diversity and signatures of selection in the neotropical Heliconius butterflies using resequenced genomes from 58 wild-caught individuals of Heliconius melpomene and another 21 resequenced genomes representing 11 related species. By comparing intraspecific diversity and interspecific divergence, we estimate that 31% of amino acid substitutions between Heliconius species are adaptive. Diversity at putatively neutral sites is negatively correlated with the local density of coding sites as well as nonsynonymous substitutions and positively correlated with recombination rate, indicating widespread linked selection. This process also manifests in significantly reduced diversity on longer chromosomes, consistent with lower recombination rates. Although hitchhiking around beneficial nonsynonymous mutations has significantly shaped genetic variation in H. melpomene, evidence for strong selective sweeps is limited overall. We did, however, identify two regions where distinct haplotypes have swept in different populations, leading to increased population differentiation. On the whole, our study suggests that positive selection is less pervasive in these butterflies than in fruit flies, a fact that curiously results in very similar levels of neutral diversity in these very different insects. Copyright © 2016 by the Genetics Society of America.
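    The abstract does not name the estimator behind the 31% figure. One standard way to turn within-species polymorphism and between-species divergence counts into a proportion of adaptive substitutions is the Smith and Eyre-Walker α from the McDonald-Kreitman framework, α = 1 − (Ds·Pn)/(Dn·Ps), assumed here for illustration. The counts below are hypothetical and not taken from the Heliconius data.

```python
def mk_alpha(dn, ds, pn, ps):
    """Proportion of nonsynonymous substitutions fixed by positive selection.

    dn, ds: nonsynonymous / synonymous fixed differences between species.
    pn, ps: nonsynonymous / synonymous polymorphisms within the focal species.
    alpha = 1 - (Ds * Pn) / (Dn * Ps), i.e. 1 - (Pn/Ps) / (Dn/Ds).
    """
    if dn == 0 or ps == 0:
        raise ValueError("Dn and Ps must be non-zero")
    return 1 - (ds * pn) / (dn * ps)


# Hypothetical counts, not taken from the Heliconius study.
print(mk_alpha(dn=40, ds=100, pn=30, ps=100))  # 0.25
```

    Slightly deleterious nonsynonymous polymorphisms inflate Pn and bias α downward, which is why analyses of this kind often exclude low-frequency polymorphisms before applying the estimator.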

  16. Validation of Pooled Whole-Genome Re-Sequencing in Arabidopsis lyrata.

    PubMed

    Fracassetti, Marco; Griffin, Philippa C; Willi, Yvonne

    2015-01-01

    Sequencing pooled DNA of multiple individuals from a population instead of sequencing individuals separately has become popular due to its cost-effectiveness and simple wet-lab protocol, although some criticism of this approach remains. Here we validated a protocol for pooled whole-genome re-sequencing (Pool-seq) of Arabidopsis lyrata libraries prepared with low amounts of DNA (1.6 ng per individual). The validation was based on comparing single nucleotide polymorphism (SNP) frequencies obtained by pooling with those obtained by individual-based Genotyping By Sequencing (GBS). Furthermore, we investigated the effect of sample number, sequencing depth per individual and variant caller on population SNP frequency estimates. For Pool-seq data, we compared frequency estimates from two SNP callers, VarScan and Snape; the former employs a frequentist SNP calling approach while the latter uses a Bayesian approach. Results revealed concordance correlation coefficients well above 0.8, confirming that Pool-seq is a valid method for acquiring population-level SNP frequency data. Higher accuracy was achieved by pooling more samples (25 compared to 14) and working with higher sequencing depth (4.1× per individual compared to 1.4× per individual), which increased the concordance correlation coefficient to 0.955. The Bayesian-based SNP caller produced somewhat higher concordance correlation coefficients, particularly at low sequencing depth. We recommend pooling at least 25 individuals combined with sequencing at a depth of 100× to produce satisfactory frequency estimates for common SNPs (minor allele frequency above 0.05).
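    The validation above rests on the concordance correlation coefficient, which, unlike Pearson's r, penalises systematic offsets between two measurements of the same quantity. Assuming the standard form of Lin's coefficient, ρc = 2·s_xy / (s_x² + s_y² + (x̄ − ȳ)²) with population (1/n) moments, a minimal sketch is shown below; the function name and example frequencies are hypothetical.

```python
from statistics import fmean


def concordance_ccc(x, y):
    """Lin's concordance correlation coefficient between paired measurements."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with >= 2 values")
    n = len(x)
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    denom = sxx + syy + (mx - my) ** 2
    if denom == 0:
        raise ValueError("both sequences are constant and identical")
    return 2 * sxy / denom


# Hypothetical SNP frequencies: individual-based GBS vs Pool-seq.
gbs = [0.10, 0.25, 0.40, 0.60]
pooled_offset = [f + 0.10 for f in gbs]
print(concordance_ccc(gbs, gbs))            # identical inputs give 1.0
print(concordance_ccc(gbs, pooled_offset))  # perfectly correlated but biased
```

    Shifting every pooled frequency by 0.1 leaves Pearson's r at 1 but pulls ρc below 1, which is why it suits a comparison where the two methods should agree exactly. The depth figures in the abstract are also mutually consistent: 25 individuals at about 4× each gives roughly the recommended 100× pool depth.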

  17. Missense mutations in TENM4, a regulator of axon guidance and central myelination, cause essential tremor

    PubMed Central

    Hor, Hyun; Francescatto, Ludmila; Bartesaghi, Luca; Ortega-Cubero, Sara; Kousi, Maria; Lorenzo-Betancor, Oswaldo; Jiménez-Jiménez, Felix J.; Gironell, Alexandre; Clarimón, Jordi; Drechsel, Oliver; Agúndez, José A. G.; Kenzelmann Broz, Daniela; Chiquet-Ehrismann, Ruth; Lleó, Alberto; Coria, Francisco; García-Martin, Elena; Alonso-Navarro, Hortensia; Martí, Maria J.; Kulisevsky, Jaume; Hor, Charlotte N.; Ossowski, Stephan; Chrast, Roman; Katsanis, Nicholas; Pastor, Pau; Estivill, Xavier

    2015-01-01

    Essential tremor (ET) is a common movement disorder with an estimated prevalence of 5% of the population aged over 65 years. In spite of intensive efforts, the genetic architecture of ET remains unknown. We used a combination of whole-exome sequencing and targeted resequencing in three ET families. In vitro and in vivo experiments in oligodendrocyte precursor cells and zebrafish were performed to test our findings. Whole-exome sequencing revealed a missense mutation in TENM4 segregating in an autosomal-dominant fashion in an ET family. Subsequent targeted resequencing of TENM4 led to the discovery of two novel missense mutations. Not only did these two mutations segregate with ET in two additional families, but we also observed significant overtransmission of pathogenic TENM4 alleles across the three families. Consistent with a dominant mode of inheritance, in vitro analysis in oligodendrocyte precursor cells showed that mutant proteins mislocalize. Finally, expression of human mRNA harboring any of three patient mutations in zebrafish embryos induced defects in axon guidance, confirming a dominant-negative mode of action for these mutations. Our genetic and functional data, which are corroborated by the existence of a Tenm4 knockout mouse displaying an ET phenotype, implicate TENM4 in ET. Together with previous studies of TENM4 in model organisms, our studies suggest that processes regulating myelination in the central nervous system and axon guidance might be significant contributors to the genetic burden of this disorder. PMID:26188006

  18. Identification of Polymorphisms in the 3′-Untranslated Region of the Human Pregnane X Receptor (PXR) Gene Associated with Variability in Cytochrome P450 3A (CYP3A) Metabolism

    PubMed Central

    Oleson, Lauren; von Moltke, Lisa L.; Greenblatt, David J.; Court, Michael H.

    2013-01-01

    Single nucleotide polymorphisms (SNPs) in the 3′-untranslated region (3′UTR) of the human pregnane X receptor (PXR) gene may contribute to interindividual variability in cytochrome P450 3A (CYP3A) activity. Genotype-phenotype associations involving PXR-3′UTR SNPs were investigated through in vitro (53 human livers from primarily white donors) and in vivo (26 white or African-American volunteers) studies using midazolam 1′-hydroxylation and midazolam apparent oral clearance (CL/F), respectively, as CYP3A-specific probes. PXR-3′UTR resequencing identified 12 SNPs, including 2 that were novel. Although none of the SNPs evaluated were associated with altered midazolam 1′-hydroxylation in the liver bank, both rs3732359 homozygotes and rs3732360 carriers showed 80% higher (P<0.05) CL/F compared with homozygous reference individuals. These differences in CL/F were even larger (100 and 120% higher, respectively; P<0.01) when only African-American subjects (n=14) were considered. Five major haplotypes were identified containing the PXR-3′UTR SNPs and previously identified intron SNPs. Although CL/F differences were not statistically significant within the entire study cohort, African-American carriers of Haplotype-1 (which includes both rs3732359 and rs3732360 variants) exhibited 70% higher median CL/F compared with African-American non-carriers (P=0.036). Our results identify rs3732359 and rs3732360 as PXR-3′UTR SNPs associated with higher CYP3A activity in vivo in African-Americans. PMID:20082578

  19. Genome Structural Diversity among 31 Bordetella pertussis Isolates from Two Recent U.S. Whooping Cough Statewide Epidemics.

    PubMed

    Bowden, Katherine E; Weigand, Michael R; Peng, Yanhui; Cassiday, Pamela K; Sammons, Scott; Knipe, Kristen; Rowe, Lori A; Loparev, Vladimir; Sheth, Mili; Weening, Keeley; Tondella, M Lucia; Williams, Margaret M

    2016-01-01

    During 2010 and 2012, California and Vermont, respectively, experienced statewide epidemics of pertussis with differences in the demographics affected, case clinical presentation, and molecular epidemiology of the circulating strains. To overcome limitations of the current molecular typing methods for pertussis, we used whole-genome sequencing to gain a broader understanding of how currently circulating strains are causing large epidemics. Through the use of combined next-generation sequencing technologies, this study compared de novo, single-contig genome assemblies from 31 out of 33 Bordetella pertussis isolates collected during two separate pertussis statewide epidemics and 2 resequenced vaccine strains. Final genome architecture assemblies were verified with whole-genome optical mapping. Sixteen distinct genome rearrangement profiles were observed in epidemic isolate genomes, all of which were distinct from the genome structures of the two resequenced vaccine strains. These rearrangements appear to be mediated by repetitive sequence elements, such as high-copy-number mobile genetic elements and rRNA operons. Additionally, novel and previously identified single nucleotide polymorphisms were detected in 10 virulence-related genes in the epidemic isolates. Whole-genome variation analysis identified state-specific variants, and coding regions bearing nonsynonymous mutations were classified into functional annotated orthologous groups. Comprehensive studies on whole genomes are needed to understand the resurgence of pertussis and develop novel tools to better characterize the molecular epidemiology of evolving B. pertussis populations. IMPORTANCE Pertussis, or whooping cough, is the most poorly controlled vaccine-preventable bacterial disease in the United States and has been resurging there for more than a decade. Although B. pertussis was once viewed as a monomorphic pathogen, strains circulating during epidemics exhibit diversity visible on a genome structural level, previously undetectable by traditional sequence analysis using short-read technologies. For the first time, we combine short- and long-read sequencing platforms with restriction optical mapping for single-contig, de novo assembly of 31 isolates to investigate two geographically and temporally independent U.S. pertussis epidemics. These complete genomes reshape our understanding of B. pertussis evolution and strengthen molecular epidemiology toward one day understanding the resurgence of pertussis.

  20. SeqWare Query Engine: storing and searching sequence data in the cloud.

    PubMed

    O'Connor, Brian D; Merriman, Barry; Nelson, Stanley F

    2010-12-21

    Since the introduction of next-generation DNA sequencers, the rapid increase in sequencer throughput, and associated drop in costs, has resulted in more than a dozen human genomes being resequenced over the last few years. These efforts are merely a prelude for a future in which genome resequencing will be commonplace for both biomedical research and clinical applications. The dramatic increase in sequencer output strains all facets of computational infrastructure, especially databases and query interfaces. The advent of cloud computing, and a variety of powerful tools designed to process petascale datasets, provide a compelling solution to these ever-increasing demands. In this work, we present the SeqWare Query Engine, which has been created using modern cloud computing technologies and designed to support databasing information from thousands of genomes. Our backend implementation was built using the highly scalable, NoSQL HBase database from the Hadoop project. We also created a web-based frontend that provides both a programmatic and interactive query interface and integrates with widely used genome browsers and tools. Using the query engine, users can load and query variants (SNVs, indels, translocations, etc.) with a rich level of annotations including coverage and functional consequences. As a proof of concept we loaded several whole genome datasets including the U87MG cell line. We also used a glioblastoma multiforme tumor/normal pair to both profile performance and provide an example of using the Hadoop MapReduce framework within the query engine. This software is open source and freely available from the SeqWare project (http://seqware.sourceforge.net). The SeqWare Query Engine provided an easy way to make the U87MG genome accessible to programmers and non-programmers alike. 
This enabled a faster and more open exploration of results, quicker tuning of parameters for heuristic variant calling filters, and a common data interface to simplify development of analytical tools. The range of data types supported, the ease of querying and integrating with existing tools, and the robust scalability of the underlying cloud-based technologies make SeqWare Query Engine a natural fit for storing and searching ever-growing genome sequence datasets.
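    The abstract states that the backend uses HBase. HBase keeps rows sorted by row key, so a key that encodes chromosome and zero-padded position turns a genomic region query into one contiguous range scan. The sketch below mimics that idea with an in-memory sorted list; the key layout, `row_key`, and `VariantStore` are hypothetical and are neither SeqWare's actual schema nor the HBase client API.

```python
import bisect


def row_key(chrom, pos, ref, alt):
    # Zero-pad the position so lexicographic order matches numeric order.
    return f"{chrom}:{pos:010d}:{ref}>{alt}"


class VariantStore:
    """Toy stand-in for a sorted key-value table of annotated variants."""

    def __init__(self):
        self._keys = []   # kept sorted, like HBase row keys
        self._rows = {}   # row key -> annotation dict

    def put(self, chrom, pos, ref, alt, **annotations):
        key = row_key(chrom, pos, ref, alt)
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = annotations

    def scan(self, chrom, start, end):
        """Return (key, annotations) for variants with start <= pos <= end."""
        lo = bisect.bisect_left(self._keys, f"{chrom}:{start:010d}")
        hi = bisect.bisect_left(self._keys, f"{chrom}:{end + 1:010d}")
        return [(k, self._rows[k]) for k in self._keys[lo:hi]]


store = VariantStore()
store.put("chr1", 100, "A", "G", depth=30, consequence="missense")
store.put("chr1", 150, "C", "T", depth=12)
store.put("chr1", 300, "G", "A", depth=8)
store.put("chr10", 120, "T", "C", depth=25)
for key, ann in store.scan("chr1", 100, 200):
    print(key, ann)
```

    The ":" separator after the chromosome name keeps each chromosome's keys contiguous, so a scan on chr1 never picks up chr10 rows even though both names share a prefix.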

  1. In Vitro vs In Silico Detected SNPs for the Development of a Genotyping Array: What Can We Learn from a Non-Model Species?

    PubMed Central

    Lepoittevin, Camille; Frigerio, Jean-Marc; Garnier-Géré, Pauline; Salin, Franck; Cervera, María-Teresa; Vornam, Barbara; Harvengt, Luc; Plomion, Christophe

    2010-01-01

    Background There is considerable interest in the high-throughput discovery and genotyping of single nucleotide polymorphisms (SNPs) to accelerate genetic mapping and enable association studies. This study provides an assessment of EST-derived and resequencing-derived SNP quality in maritime pine (Pinus pinaster Ait.), a conifer characterized by a huge genome size (∼23.8 Gb/C). Methodology/Principal Findings A 384-SNP GoldenGate genotyping array was built from (i) 184 SNPs originally detected in a set of 40 re-sequenced candidate genes (in vitro SNPs), chosen on the basis of functionality scores, presence of neighboring polymorphisms, minor allele frequencies and linkage disequilibrium, and (ii) 200 SNPs screened from ESTs (in silico SNPs), selected based on the number of ESTs used for SNP detection, the SNP minor allele frequency and the quality of SNP flanking sequences. The global success rate of the assay was 66.9%, and a conversion rate (considering only polymorphic SNPs) of 51% was achieved. In vitro SNPs showed significantly higher genotyping-success and conversion rates than in silico SNPs (+11.5% and +18.5%, respectively). The reproducibility was 100%, and the genotyping error rate was very low (0.54%, dropping to 0.06% when four SNPs showing elevated error rates were removed). Conclusions/Significance This study demonstrates that ESTs provide a resource for SNP identification in non-model species that requires no additional bench work and little bioinformatic analysis. However, the time and cost benefits of in silico SNPs are counterbalanced by a lower conversion rate than in vitro SNPs. This drawback is acceptable for population-based experiments, but could be severe in experiments involving samples from narrow genetic backgrounds. 
In addition, we showed that both the visual inspection of genotyping clusters and the estimation of a per-SNP error rate should help identify markers that are not suitable for the GoldenGate technology in species characterized by a large and complex genome. PMID:20543950
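    The abstract recommends estimating a per-SNP error rate to weed out unsuitable markers but does not give the procedure. A minimal sketch, assuming the rate is the fraction of discordant genotype calls between replicate typings of the same DNA, is shown below; the function names, the genotype string format, and the 5% threshold are hypothetical.

```python
def _normalise(genotype):
    # Treat "AG" and "GA" as the same unphased genotype.
    return "".join(sorted(genotype))


def per_snp_error_rates(replicate_calls):
    """Discordance rate per SNP from duplicated samples.

    replicate_calls: dict mapping SNP id -> list of (call_a, call_b) genotype
    pairs from the same DNA typed twice; None marks a failed call.
    """
    rates = {}
    for snp, pairs in replicate_calls.items():
        compared = [(a, b) for a, b in pairs if a is not None and b is not None]
        if not compared:
            continue  # no usable replicate pairs for this SNP
        mismatches = sum(_normalise(a) != _normalise(b) for a, b in compared)
        rates[snp] = mismatches / len(compared)
    return rates


def flag_unreliable(rates, max_error=0.05):
    """SNP ids whose error rate exceeds a hypothetical threshold."""
    return sorted(snp for snp, rate in rates.items() if rate > max_error)


calls = {
    "snp1": [("AA", "AA"), ("AG", "GA"), ("GG", "GG")],
    "snp2": [("AA", "AG"), ("GG", "GG"), (None, "GG")],
}
rates = per_snp_error_rates(calls)
print(rates, flag_unreliable(rates))
```

    Failed calls are skipped rather than counted as errors, so call rate and error rate stay separate quality measures; a marker can then be dropped for either reason.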

  2. Single nucleotide polymorphisms in bone turnover-related genes in Koreans: ethnic differences in linkage disequilibrium and haplotype

    PubMed Central

    Kim, Kyung-Seon; Kim, Ghi-Su; Hwang, Joo-Yeon; Lee, Hye-Ja; Park, Mi-Hyun; Kim, Kwang-joong; Jung, Jongsun; Cha, Hyo-Soung; Shin, Hyoung Doo; Kang, Jong-Ho; Park, Eui Kyun; Kim, Tae-Ho; Hong, Jung-Min; Koh, Jung-Min; Oh, Bermseok; Kimm, Kuchan; Kim, Shin-Yoon; Lee, Jong-Young

    2007-01-01

    Background Osteoporosis is defined as the loss of bone mineral density that leads to bone fragility with aging. Population-based case-control studies have identified polymorphisms in many candidate genes that have been associated with bone mass maintenance or osteoporotic fracture. To investigate single nucleotide polymorphisms (SNPs) that are associated with osteoporosis, we examined the genetic variation among Koreans by analyzing 81 genes according to their function in bone formation and resorption during bone remodeling. Methods We resequenced all the exons, splice junctions and promoter regions of candidate osteoporosis genes using 24 unrelated Korean individuals. Using the common SNPs from our study and the HapMap database, we performed a statistical analysis of deviation in heterozygosity. Results We identified 942 variants, including 888 SNPs, 43 insertion/deletion polymorphisms, and 11 microsatellite markers. Of the SNPs, 557 (63%) had been previously identified and 331 (37%) were newly discovered in the Korean population. When SNPs in the Korean population were compared with those in the HapMap database, 1% (or less) of SNPs in the Japanese and Chinese subpopulations and 20% of those in the Caucasian and African subpopulations were significantly differentiated from the Hardy-Weinberg expectations. In addition, an analysis of genetic diversity showed no significant differences among Korean, Han Chinese and Japanese populations, whereas African and Caucasian populations were significantly differentiated in selected genes. Nevertheless, a detailed analysis of genetic properties showed that LD and haplotype block patterns differed substantially among the five sub-populations. Conclusion Through the resequencing of 81 osteoporosis candidate genes, 118 unknown SNPs with a minor allele frequency (MAF) > 0.05 were discovered in the Korean population. 
In addition, using the common SNPs between our study and HapMap, an analysis of genetic diversity and deviation in heterozygosity was performed and the polymorphisms of the above genes among the five populations were substantially differentiated from one another. Further studies of osteoporosis could utilize the polymorphisms identified in our data since they may have important implications for the selection of highly informative SNPs for future association studies. PMID:18036257
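    The abstract refers to SNPs deviating from Hardy-Weinberg expectations without naming the test. One common choice, assumed here, is a chi-square goodness-of-fit test with one degree of freedom on genotype counts; for 1 df the upper-tail p-value equals erfc(√(χ²/2)), so only the standard library is needed. The function name is hypothetical, and exact tests are generally preferred when the minor allele is rare or samples are small (24 individuals here).

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square (1 df) test of Hardy-Weinberg equilibrium for one biallelic SNP.

    Returns (chi2 statistic, p-value).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    if p == 0 or q == 0:
        return 0.0, 1.0  # monomorphic site: nothing to test
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of chi-square with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


print(hwe_chi_square(25, 50, 25))  # genotype counts exactly at HWE
print(hwe_chi_square(50, 0, 50))   # complete heterozygote deficit
```

    The second call shows the pattern that a heterozygosity-deficit analysis looks for: the allele frequency is 0.5, yet no heterozygotes are observed, giving a very large statistic.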

  3. SeqWare Query Engine: storing and searching sequence data in the cloud

    PubMed Central

    2010-01-01

    Background Since the introduction of next-generation DNA sequencers, the rapid increase in sequencer throughput, and associated drop in costs, has resulted in more than a dozen human genomes being resequenced over the last few years. These efforts are merely a prelude for a future in which genome resequencing will be commonplace for both biomedical research and clinical applications. The dramatic increase in sequencer output strains all facets of computational infrastructure, especially databases and query interfaces. The advent of cloud computing, and a variety of powerful tools designed to process petascale datasets, provide a compelling solution to these ever-increasing demands. Results In this work, we present the SeqWare Query Engine, which has been created using modern cloud computing technologies and designed to support databasing information from thousands of genomes. Our backend implementation was built using the highly scalable, NoSQL HBase database from the Hadoop project. We also created a web-based frontend that provides both a programmatic and interactive query interface and integrates with widely used genome browsers and tools. Using the query engine, users can load and query variants (SNVs, indels, translocations, etc.) with a rich level of annotations including coverage and functional consequences. As a proof of concept we loaded several whole genome datasets including the U87MG cell line. We also used a glioblastoma multiforme tumor/normal pair to both profile performance and provide an example of using the Hadoop MapReduce framework within the query engine. This software is open source and freely available from the SeqWare project (http://seqware.sourceforge.net). Conclusions The SeqWare Query Engine provided an easy way to make the U87MG genome accessible to programmers and non-programmers alike. 
This enabled a faster and more open exploration of results, quicker tuning of parameters for heuristic variant calling filters, and a common data interface to simplify development of analytical tools. The range of data types supported, the ease of querying and integrating with existing tools, and the robust scalability of the underlying cloud-based technologies make SeqWare Query Engine a natural fit for storing and searching ever-growing genome sequence datasets. PMID:21210981

  4. A novel COL11A1 mutation affecting splicing in a patient with Stickler syndrome.

    PubMed

    Kohmoto, Tomohiro; Naruto, Takuya; Kobayashi, Haruka; Watanabe, Miki; Okamoto, Nana; Masuda, Kiyoshi; Imoto, Issei; Okamoto, Nobuhiko

    2015-01-01

    Stickler syndrome is a clinically and genetically heterogeneous collagenopathy characterized by ocular, auditory, skeletal and orofacial abnormalities, commonly occurring as an autosomal dominant trait. We conducted targeted resequencing to analyze candidate genes associated with known clinical phenotypes in a 4-year-old girl with Stickler syndrome. We detected a novel heterozygous intronic mutation (NM_001854.3:c.3168+5G>A) in COL11A1 that may impair splicing, as suggested by in silico prediction and a minigene assay.

  5. A novel COL11A1 mutation affecting splicing in a patient with Stickler syndrome

    PubMed Central

    Kohmoto, Tomohiro; Naruto, Takuya; Kobayashi, Haruka; Watanabe, Miki; Okamoto, Nana; Masuda, Kiyoshi; Imoto, Issei; Okamoto, Nobuhiko

    2015-01-01

    Stickler syndrome is a clinically and genetically heterogeneous collagenopathy characterized by ocular, auditory, skeletal and orofacial abnormalities, commonly occurring as an autosomal dominant trait. We conducted targeted resequencing to analyze candidate genes associated with known clinical phenotypes in a 4-year-old girl with Stickler syndrome. We detected a novel heterozygous intronic mutation (NM_001854.3:c.3168+5G>A) in COL11A1 that may impair splicing, as suggested by in silico prediction and a minigene assay. PMID:27081549

  6. ScanIndel: a hybrid framework for indel detection via gapped alignment, split reads and de novo assembly.

    PubMed

    Yang, Rendong; Nelson, Andrew C; Henzler, Christine; Thyagarajan, Bharat; Silverstein, Kevin A T

    2015-12-07

    Comprehensive identification of insertions/deletions (indels) across the full size spectrum from second generation sequencing is challenging due to the relatively short read length inherent in the technology. Different indel calling methods exist but are limited in detection to specific sizes with varying accuracy and resolution. We present ScanIndel, an integrated framework for detecting indels with multiple heuristics including gapped alignment, split reads and de novo assembly. Using simulation data, we demonstrate ScanIndel's superior sensitivity and specificity relative to several state-of-the-art indel callers across various coverage levels and indel sizes. ScanIndel yields higher predictive accuracy with lower computational cost compared with existing tools for both targeted resequencing data from tumor specimens and high coverage whole-genome sequencing data from the human NIST standard NA12878. Thus, we anticipate ScanIndel will improve indel analysis in both clinical and research settings. ScanIndel is implemented in Python, and is freely available for academic use at https://github.com/cauyrd/ScanIndel.
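
    Of the three heuristics named above, gapped alignment is the simplest to illustrate: an aligner records short indels directly in each read's CIGAR string. The sketch below is not taken from ScanIndel; it walks a CIGAR string (operation codes as defined in the SAM format) and reports insertions and deletions in reference coordinates. The function name `indels_from_cigar` and its `min_len` filter are hypothetical. Indels too long to be captured inside one read's gapped alignment are why split-read and assembly evidence are also needed.

```python
import re

# CIGAR operations as defined in the SAM format specification.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that consume reference bases.
CONSUMES_REF = set("MDN=X")


def indels_from_cigar(ref_start, cigar, min_len=1):
    """Return (kind, ref_pos, length) for each insertion or deletion in one alignment.

    ref_start is the 1-based leftmost aligned reference position (SAM POS).
    For an insertion, ref_pos is the reference base immediately before it;
    for a deletion, ref_pos is the first deleted reference base.
    """
    events = []
    ref_pos = ref_start
    for length_str, op in CIGAR_RE.findall(cigar):
        length = int(length_str)
        if op == "I" and length >= min_len:
            events.append(("INS", ref_pos - 1, length))
        elif op == "D" and length >= min_len:
            events.append(("DEL", ref_pos, length))
        if op in CONSUMES_REF:
            ref_pos += length  # soft clips (S) and insertions do not advance the reference
    return events
```

    For example, `indels_from_cigar(100, "10M2I5M3D20M")` reports a 2-bp insertion after reference base 109 and a 3-bp deletion starting at base 115.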

  7. Phenotypic diversification by enhanced genome restructuring after induction of multiple DNA double-strand breaks.

    PubMed

    Muramoto, Nobuhiko; Oda, Arisa; Tanaka, Hidenori; Nakamura, Takahiro; Kugou, Kazuto; Suda, Kazuki; Kobayashi, Aki; Yoneda, Shiori; Ikeuchi, Akinori; Sugimoto, Hiroki; Kondo, Satoshi; Ohto, Chikara; Shibata, Takehiko; Mitsukawa, Norihiro; Ohta, Kunihiro

    2018-05-18

    DNA double-strand break (DSB)-mediated genome rearrangements are assumed to provide diverse raw genetic materials enabling accelerated adaptive evolution; however, the consequences of massive simultaneous DSB formation in cells, and the resulting phenotypic impact, remain unclear. Here, we establish an artificial genome-restructuring technology by conditionally introducing multiple genomic DSBs in vivo using a temperature-dependent endonuclease, TaqI. Application in yeast and Arabidopsis thaliana generates strains with phenotypes, including improved ethanol production from xylose at higher temperature and increased plant biomass, that are stably inherited by offspring after multiple passages. High-throughput genome resequencing revealed that these strains harbor diverse rearrangements, including copy number variations, translocations in retrotransposons, and direct end-joinings at TaqI-cleavage sites. Furthermore, large-scale rearrangements occur frequently in diploid yeasts (28.1%) and tetraploid plants (46.3%), whereas haploid yeasts and diploid plants undergo minimal rearrangement. This genome-restructuring system (TAQing system) will enable rapid genome breeding and aid genome-evolution studies.
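
    TaqI recognizes the palindromic 4-bp site 5'-TCGA-3' and cuts between the T and the C, so in sequence with roughly equal base frequencies a site is expected about once every 4^4 = 256 bp; this is why activating the enzyme in vivo can introduce many DSBs at once. The sketch below only locates TaqI cut positions in a sequence string; it does not model the temperature-dependent activity used in the TAQing system, and `taqi_cut_sites` and `expected_sites` are hypothetical helper names.

```python
TAQI_SITE = "TCGA"  # palindromic: identical to its own reverse complement


def taqi_cut_sites(seq):
    """0-based index of the first base after each TaqI cut (T^CGA) on this strand.

    Because TCGA is palindromic, scanning one strand finds every site.
    """
    seq = seq.upper()
    cuts = []
    i = seq.find(TAQI_SITE)
    while i != -1:
        cuts.append(i + 1)  # the cut falls after the T, one base into the site
        i = seq.find(TAQI_SITE, i + 1)
    return cuts


def expected_sites(length, site_len=len(TAQI_SITE)):
    """Expected number of sites in random sequence with equal base frequencies."""
    return length / 4 ** site_len
```

    Real genomes deviate from equal base frequencies, and CpG-containing sites such as TCGA may be depleted in some genomes, so `expected_sites` is only a rough baseline.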

  8. Monogenic Autoinflammatory Diseases with Mendelian Inheritance: Genes, Mutations, and Genotype/Phenotype Correlations

    PubMed Central

    Martorana, Davide; Bonatti, Francesco; Mozzoni, Paola; Vaglio, Augusto; Percesepe, Antonio

    2017-01-01

    Autoinflammatory diseases (AIDs) are a genetically heterogeneous group of diseases caused by mutations of genes encoding proteins that play a pivotal role in the regulation of the inflammatory response. In the pathogenesis of AIDs, the effect of the genetic background is triggered by environmental factors through the modulation of the innate immune system. Monogenic AIDs are characterized by Mendelian inheritance and are caused by highly penetrant genetic variants in single genes. In recent years, remarkable progress has been made in the identification of disease-associated genes by using new technologies, such as next-generation sequencing, which has enabled the genetic characterization of undiagnosed patients and sporadic cases by means of targeted resequencing of gene panels and whole exome sequencing. In this review, we delineate the genetics of the monogenic AIDs, report the role of the most common gene mutations, and describe the evidence for the most robust genotype/phenotype correlations in AIDs. PMID:28421071

  9. Landscape of genomic diversity and trait discovery in soybean.

    PubMed

    Valliyodan, Babu; Dan Qiu; Patil, Gunvant; Zeng, Peng; Huang, Jiaying; Dai, Lu; Chen, Chengxuan; Li, Yanjun; Joshi, Trupti; Song, Li; Vuong, Tri D; Musket, Theresa A; Xu, Dong; Shannon, J Grover; Shifeng, Cheng; Liu, Xin; Nguyen, Henry T

    2016-03-31

    Cultivated soybean [Glycine max (L.) Merr.] is a primary source of vegetable oil and protein. We report a landscape analysis of genome-wide genetic variation and an association study of major domestication and agronomic traits in soybean. A total of 106 soybean genomes representing wild accessions, landraces, and elite lines were re-sequenced at an average depth of 17× with 97.5% coverage. Over 10 million high-quality SNPs were discovered, and 35.34% of these have not been previously reported. Additionally, 159 putative domestication sweeps were identified, encompassing 54.34 Mbp (4.9%) and 4,414 genes; 146 regions were involved in artificial selection during domestication. A genome-wide association study of major traits including oil and protein content, salinity, and domestication traits resulted in the discovery of novel alleles. Genomic information from this study provides a valuable resource for understanding soybean genome structure and evolution, and can also facilitate trait dissection leading to sequencing-based molecular breeding.

  10. Landscape of genomic diversity and trait discovery in soybean

    PubMed Central

    Valliyodan, Babu; Dan Qiu; Patil, Gunvant; Zeng, Peng; Huang, Jiaying; Dai, Lu; Chen, Chengxuan; Li, Yanjun; Joshi, Trupti; Song, Li; Vuong, Tri D.; Musket, Theresa A.; Xu, Dong; Shannon, J. Grover; Shifeng, Cheng; Liu, Xin; Nguyen, Henry T.

    2016-01-01

    Cultivated soybean [Glycine max (L.) Merr.] is a primary source of vegetable oil and protein. We report a landscape analysis of genome-wide genetic variation and an association study of major domestication and agronomic traits in soybean. A total of 106 soybean genomes representing wild accessions, landraces, and elite lines were re-sequenced at an average depth of 17× with 97.5% coverage. Over 10 million high-quality SNPs were discovered, and 35.34% of these have not been previously reported. Additionally, 159 putative domestication sweeps were identified, encompassing 54.34 Mbp (4.9%) and 4,414 genes; 146 regions were involved in artificial selection during domestication. A genome-wide association study of major traits including oil and protein content, salinity, and domestication traits resulted in the discovery of novel alleles. Genomic information from this study provides a valuable resource for understanding soybean genome structure and evolution, and can also facilitate trait dissection leading to sequencing-based molecular breeding. PMID:27029319

  11. A short insertion mutation disrupts genesis of miR-16 and causes increased body weight in domesticated chicken.

    PubMed

    Jia, Xinzheng; Lin, Huiran; Nie, Qinghua; Zhang, Xiquan; Lamont, Susan J

    2016-11-03

    Body weight is one of the most important quantitative traits with high heritability in chicken. We previously mapped a quantitative trait locus (QTL) for body weight by genome-wide association study (GWAS) in an F2 chicken resource population. To identify the causal mutations linked to this QTL, expression profiles were determined in livers of high-weight and low-weight chicken lines by microarray. Combining the expression pattern with SNP effects from GWAS, miR-16 was identified as the most likely candidate, with a 3.8-fold decrease in high-weight lines. Re-sequencing revealed that a 54-bp insertion mutation in the upstream region of miR-15a-16 displayed high allele frequencies in the high-weight commercial broiler line. This mutation resulted in lower miR-16 expression by introducing three novel splicing sites instead of the missing 5' terminal splicing of mature miR-16. Elevating miR-16 significantly inhibited DF-1 chicken embryo cell proliferation, consistent with a role in suppression of cellular growth. The 54-bp insertion was significantly associated with increased body weight, bone size and muscle mass. Also, the insertion mutation tended towards fixation in commercial broilers (Fst > 0.4). Our findings revealed a novel causative mutation for body weight regulation that aids our basic understanding of growth regulation in birds.

  12. A short insertion mutation disrupts genesis of miR-16 and causes increased body weight in domesticated chicken

    PubMed Central

    Jia, Xinzheng; Lin, Huiran; Nie, Qinghua; Zhang, Xiquan; Lamont, Susan J.

    2016-01-01

    Body weight is one of the most important quantitative traits with high heritability in chicken. We previously mapped a quantitative trait locus (QTL) for body weight by genome-wide association study (GWAS) in an F2 chicken resource population. To identify the causal mutations linked to this QTL, expression profiles were determined in livers of high-weight and low-weight chicken lines by microarray. Combining the expression pattern with SNP effects from GWAS, miR-16 was identified as the most likely candidate, with a 3.8-fold decrease in high-weight lines. Re-sequencing revealed that a 54-bp insertion mutation in the upstream region of miR-15a-16 displayed high allele frequencies in the high-weight commercial broiler line. This mutation resulted in lower miR-16 expression by introducing three novel splicing sites instead of the missing 5′ terminal splicing of mature miR-16. Elevating miR-16 significantly inhibited DF-1 chicken embryo cell proliferation, consistent with a role in suppression of cellular growth. The 54-bp insertion was significantly associated with increased body weight, bone size and muscle mass. Also, the insertion mutation tended towards fixation in commercial broilers (Fst > 0.4). Our findings revealed a novel causative mutation for body weight regulation that aids our basic understanding of growth regulation in birds. PMID:27808177
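
    The abstract reports that the insertion tends towards fixation in commercial broilers (Fst > 0.4). As a minimal sketch of what such a value measures, Nei's formulation compares the expected heterozygosity of the pooled population (H_T) with the mean heterozygosity within populations (H_S): F_ST = (H_T - H_S) / H_T. The code below assumes one biallelic locus and equally weighted populations; the estimator used in the study is not stated here, and the allele frequencies in the example are hypothetical.

```python
def fst_from_frequencies(freqs):
    """Nei-style F_ST = (H_T - H_S) / H_T for one biallelic locus.

    freqs: frequency of the same allele in each population. Populations are
    weighted equally, which is an assumption of this sketch.
    """
    if not freqs:
        raise ValueError("need at least one population")
    # Mean expected heterozygosity within populations.
    h_s = sum(2 * p * (1 - p) for p in freqs) / len(freqs)
    # Expected heterozygosity of the pooled population.
    p_bar = sum(freqs) / len(freqs)
    h_t = 2 * p_bar * (1 - p_bar)
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t
```

    With hypothetical insertion-allele frequencies of 0.9 and 0.1 in two populations, `fst_from_frequencies([0.9, 0.1])` gives 0.64, whereas identical frequencies give 0.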

  13. An ultra-high-density bin map facilitates high-throughput QTL mapping of horticultural traits in pepper (Capsicum annuum).

    PubMed

    Han, Koeun; Jeong, Hee-Jin; Yang, Hee-Bum; Kang, Sung-Min; Kwon, Jin-Kyung; Kim, Seungill; Choi, Doil; Kang, Byoung-Cheorl

    2016-04-01

    Most agricultural traits are controlled by quantitative trait loci (QTLs); however, there are few studies on QTL mapping of horticultural traits in pepper (Capsicum spp.) due to the lack of high-density molecular maps and sequence information. In this study, an ultra-high-density map and 120 recombinant inbred lines (RILs) derived from a cross between C. annuum 'Perennial' and C. annuum 'Dempsey' were used for QTL mapping of horticultural traits. Parental lines and RILs were resequenced at 18× and 1× coverage, respectively. Using a sliding window approach, an ultra-high-density bin map containing 2,578 bins was constructed. The total length of the map was 1,372 cM, and the average interval between bins was 0.53 cM. A total of 86 significant QTLs controlling 17 horticultural traits were detected. Among these, 32 QTLs controlling 13 traits were major QTLs. Our research shows that the construction of bin maps using low-coverage sequence is a powerful method for QTL mapping, and that the short intervals between bins are helpful for fine-mapping of QTLs. Furthermore, bin maps can be used to improve the quality of reference genomes by elucidating the genetic order of unordered regions and anchoring unassigned scaffolds to linkage groups. © The Author 2016. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
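
    With RILs sequenced at only 1×, most SNP sites carry one read or none, so individual genotype calls are sparse and error-prone; a sliding window pools neighboring calls before bins are defined. The sketch below is a generic illustration under stated assumptions, not the authors' pipeline: calls are coded 'A' or 'B' by parental origin (None when missing), each SNP takes the majority call within a window of neighboring SNPs, and a new bin starts wherever any RIL switches parental genotype. The function names and window size are hypothetical.

```python
from collections import Counter


def window_genotypes(calls, window=5):
    """Smooth sparse per-SNP calls ('A', 'B' or None) for one RIL.

    Each SNP takes the majority of the non-missing calls in a window of
    `window` SNPs centered on it; an empty window or a tie gives None.
    """
    half = window // 2
    smoothed = []
    for i in range(len(calls)):
        seen = [c for c in calls[max(0, i - half): i + half + 1] if c is not None]
        ranked = Counter(seen).most_common()
        if not ranked or (len(ranked) > 1 and ranked[0][1] == ranked[1][1]):
            smoothed.append(None)
        else:
            smoothed.append(ranked[0][0])
    return smoothed


def bin_boundaries(smoothed_rils):
    """SNP indices that start a new bin: 0 plus every point where any RIL switches genotype."""
    starts = {0}
    for ril in smoothed_rils:
        last = None
        for i, genotype in enumerate(ril):
            if genotype is None:
                continue  # missing calls neither start nor end a bin
            if last is not None and genotype != last:
                starts.add(i)
            last = genotype
    return sorted(starts)
```

    With a window of 3 SNPs, an isolated miscalled 'B' among 'A' calls is smoothed away, and because a boundary is placed only where at least one RIL recombines, runs of SNPs with identical genotypes across all lines collapse into a single bin marker.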

  14. Assessing pooled BAC and whole genome shotgun strategies for assembly of complex genomes.

    PubMed

    Haiminen, Niina; Feltus, F Alex; Parida, Laxmi

    2011-04-15

    We investigate whether pooling BAC clones and sequencing the pools can provide more accurate assembly of genome sequences than the "whole genome shotgun" (WGS) approach. Furthermore, we quantify this accuracy increase. We compare the pooled BAC and WGS approaches using in silico simulations. Standard measures of assembly quality focus on assembly size and fragmentation, which are desirable for large whole genome assemblies. We propose additional measures enabling easy and visual comparison of assembly quality, such as rearrangements and redundant sequence content, relative to the known target sequence. The best assembly quality scores were obtained using 454 coverage of 15× linear and 5× paired (3kb insert size) reads (15L-5P) on Arabidopsis. This regime gave similarly good results on four additional plant genomes of very different GC and repeat contents. BAC pooling improved assembly scores over WGS assembly, with coverage and redundancy scores improving the most. BAC pooling works better than WGS; however, both require a physical map to order the scaffolds. Pool sizes up to 12 Mbp work well, suggesting that this pooling density is effective in medium-scale re-sequencing applications such as targeted sequencing of QTL intervals for candidate gene discovery. Assuming the current Roche/454 Titanium sequencing limitations, a 12 Mbp region could be re-sequenced with a full plate of linear reads and a half plate of paired-end reads, yielding 15L-5P coverage after read pre-processing. Our simulation suggests that massive over-sequencing may not improve accuracy. Our scoring measures can be used generally to evaluate and compare results of simulated genome assemblies.
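
    One intuition for why extra depth eventually stops helping comes from the Lander-Waterman (Poisson) model, which estimates the fraction of bases covered by at least k reads at mean depth c as 1 - sum over i < k of e^(-c) c^i / i!. Under this idealized model, which ignores repeats, GC bias, and sequencing errors, about 63% of bases are covered at 1×, while the uncovered fraction at 20× is already on the order of 10^-9, so further reads mostly add redundancy. The sketch below is an illustration only and is not part of the study's simulations; the function name is hypothetical.

```python
import math


def fraction_covered(mean_depth, min_reads=1):
    """Lander-Waterman/Poisson estimate of the fraction of bases with >= min_reads reads.

    Idealized: reads land uniformly at random, with no repeats or coverage biases.
    """
    below = sum(math.exp(-mean_depth) * mean_depth ** i / math.factorial(i)
                for i in range(min_reads))
    return 1.0 - below


for depth in (1, 5, 15, 20):
    uncovered = 1.0 - fraction_covered(depth)
    print(f"{depth:>2}x mean depth: uncovered fraction {uncovered:.2e}")
```

    Raising `min_reads` shows the same saturation for the depth needed to call a base confidently, only shifted to higher mean depths.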

  15. Gene expression profiling and candidate gene resequencing identifies pathways and mutations important for malignant transformation caused by leukemogenic fusion genes.

    PubMed

    Novak, Rachel L; Harper, David P; Caudell, David; Slape, Christopher; Beachy, Sarah H; Aplan, Peter D

    2012-12-01

    NUP98-HOXD13 (NHD13) and CALM-AF10 (CA10) are oncogenic fusion proteins produced by recurrent chromosomal translocations in patients with acute myeloid leukemia (AML). Transgenic mice that express these fusions develop AML with a long latency and incomplete penetrance, suggesting that collaborating genetic events are required for leukemic transformation. We employed genetic techniques to identify both preleukemic abnormalities in healthy transgenic mice and collaborating events leading to leukemic transformation. Candidate gene resequencing revealed that 6 of 27 (22%) CA10 AMLs spontaneously acquired a Ras pathway mutation and 8 of 27 (30%) acquired an Flt3 mutation. Two CA10 AMLs acquired an Flt3 internal-tandem duplication, demonstrating that these mutations can be acquired in murine as well as human AML. Gene expression profiles revealed a marked upregulation of Hox genes, particularly Hoxa5, Hoxa9, and Hoxa10 in both NHD13 and CA10 mice. Furthermore, mir196b, which is embedded within the Hoxa locus, was overexpressed in both CA10 and NHD13 samples. In contrast, the Hox cofactors Meis1 and Pbx3 were differentially expressed; Meis1 was increased in CA10 AMLs but not NHD13 AMLs, whereas Pbx3 was consistently increased in NHD13 but not CA10 AMLs. Silencing of Pbx3 in NHD13 cells led to decreased proliferation, increased apoptosis, and decreased colony formation in vitro, suggesting a previously unrecognized role for Pbx3 in leukemic transformation. Published by Elsevier Inc.

  16. Single nucleotide variants and InDels identified from whole-genome re-sequencing of Guzerat, Gyr, Girolando and Holstein cattle breeds.

    PubMed

    Stafuzza, Nedenia Bonvino; Zerlotini, Adhemar; Lobo, Francisco Pereira; Yamagishi, Michel Eduardo Beleza; Chud, Tatiane Cristina Seleguim; Caetano, Alexandre Rodrigues; Munari, Danísio Prado; Garrick, Dorian J; Machado, Marco Antonio; Martins, Marta Fonseca; Carvalho, Maria Raquel; Cole, John Bruce; Barbosa da Silva, Marcos Vinicius Gualberto

    2017-01-01

    Whole-genome re-sequencing, alignment and annotation analyses were undertaken for 12 sires representing four important cattle breeds in Brazil: Guzerat (multi-purpose), Gyr, Girolando and Holstein (dairy production). A total of approximately 4.3 billion reads from an Illumina HiSeq 2000 sequencer yielded 10.7- to 16.4-fold genome coverage per animal. A total of 27,441,279 single nucleotide variations (SNVs) and 3,828,041 insertions/deletions (InDels) were detected in the samples, of which 2,557,670 SNVs and 883,219 InDels were novel. The submission of these genetic variants to the dbSNP database significantly increased the number of known variants, particularly for the indicine genome. The concordance rate between genotypes obtained using the Bovine HD BeadChip array and the same variants identified by sequencing was about 99.05%. The annotation of variants identified numerous non-synonymous SNVs and frameshift InDels that could affect phenotypic variation. Functional enrichment analysis revealed that variants in the olfactory transduction pathway were overrepresented in all four cattle breeds, while the ECM-receptor interaction pathway was overrepresented in the Girolando and Guzerat breeds, the ABC transporters pathway only in the Holstein breed, and the metabolic pathways only in the Gyr breed. The genetic variants discovered here provide a rich resource to help identify potential genomic markers and their associated molecular mechanisms that impact economically important traits for Gyr, Girolando, Guzerat and Holstein breeding programs.

  17. Single nucleotide variants and InDels identified from whole-genome re-sequencing of Guzerat, Gyr, Girolando and Holstein cattle breeds

    PubMed Central

    Lobo, Francisco Pereira; Yamagishi, Michel Eduardo Beleza; Chud, Tatiane Cristina Seleguim; Caetano, Alexandre Rodrigues; Munari, Danísio Prado; Garrick, Dorian J.; Machado, Marco Antonio; Martins, Marta Fonseca; Carvalho, Maria Raquel; Cole, John Bruce; Barbosa da Silva, Marcos Vinicius Gualberto

    2017-01-01

    Whole-genome re-sequencing, alignment and annotation analyses were undertaken for 12 sires representing four important cattle breeds in Brazil: Guzerat (multi-purpose), Gyr, Girolando and Holstein (dairy production). A total of approximately 4.3 billion reads from an Illumina HiSeq 2000 sequencer yielded 10.7- to 16.4-fold genome coverage per animal. A total of 27,441,279 single nucleotide variations (SNVs) and 3,828,041 insertions/deletions (InDels) were detected in the samples, of which 2,557,670 SNVs and 883,219 InDels were novel. The submission of these genetic variants to the dbSNP database significantly increased the number of known variants, particularly for the indicine genome. The concordance rate between genotypes obtained using the Bovine HD BeadChip array and the same variants identified by sequencing was about 99.05%. The annotation of variants identified numerous non-synonymous SNVs and frameshift InDels that could affect phenotypic variation. Functional enrichment analysis revealed that variants in the olfactory transduction pathway were overrepresented in all four cattle breeds, while the ECM-receptor interaction pathway was overrepresented in the Girolando and Guzerat breeds, the ABC transporters pathway only in the Holstein breed, and the metabolic pathways only in the Gyr breed. The genetic variants discovered here provide a rich resource to help identify potential genomic markers and their associated molecular mechanisms that impact economically important traits for Gyr, Girolando, Guzerat and Holstein breeding programs. PMID:28323836
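
    The array-versus-sequencing concordance reported above (about 99.05%) is the fraction of shared sites at which both platforms call the same genotype. A minimal sketch, assuming genotypes are stored as VCF-style strings such as "0/1" keyed by (chromosome, position) and that sites missing on either platform are excluded; the function name and data layout are hypothetical, not those of the study.

```python
def genotype_concordance(array_calls, seq_calls):
    """Fraction of sites called on both platforms where the genotypes agree.

    Both arguments map (chrom, pos) -> genotype string such as "0/1", or None
    when the site was not called. Returns None if no site is shared.
    """
    def normalize(gt):
        # Treat "0/1", "1/0" and phased "0|1" as the same unordered genotype.
        return tuple(sorted(gt.replace("|", "/").split("/")))

    shared = [site for site, gt in array_calls.items()
              if gt is not None and seq_calls.get(site) is not None]
    if not shared:
        return None
    agree = sum(normalize(array_calls[s]) == normalize(seq_calls[s]) for s in shared)
    return agree / len(shared)
```

    Ignoring phase and allele order is a deliberate simplification here: genotyping arrays generally report unphased calls, so only the unordered pair of alleles can be compared.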

  18. The 3,000 rice genomes project

    PubMed Central

    2014-01-01

    Background Rice, Oryza sativa L., is the staple food for half the world’s population. By 2030, the production of rice must increase by at least 25% in order to keep up with global population growth and demand. Accelerated genetic gains in rice improvement are needed to mitigate the effects of climate change and loss of arable land, as well as to ensure a stable global food supply. Findings We resequenced a core collection of 3,000 rice accessions from 89 countries. The 3,000 genomes had an average sequencing depth of 14×, with average genome coverages and mapping rates of 94.0% and 92.5%, respectively. From our sequencing efforts, approximately 18.9 million single nucleotide polymorphisms (SNPs) in rice were discovered by alignment to the reference genome of the temperate japonica variety, Nipponbare. Phylogenetic analyses based on SNP data confirmed differentiation of the O. sativa gene pool into 5 varietal groups: indica, aus/boro, basmati/sadri, tropical japonica and temperate japonica. Conclusions Here, we report an international resequencing effort of 3,000 rice genomes. These data serve as a foundation for large-scale discovery of novel alleles for important rice phenotypes using various bioinformatics and/or genetic approaches. They also allow the genomic diversity within O. sativa to be understood in greater detail. With the release of the sequencing data, the project calls for the global rice community to take advantage of these data as a foundation for establishing a global, public rice genetic/genomic database and information platform for advancing rice breeding technology for future rice improvement. PMID:24872877

  19. Genome-wide DNA polymorphism in the indica rice varieties RGD-7S and Taifeng B as revealed by whole genome re-sequencing.

    PubMed

    Fu, Chong-Yun; Liu, Wu-Ge; Liu, Di-Lin; Li, Ji-Hua; Zhu, Man-Shan; Liao, Yi-Long; Liu, Zhen-Rong; Zeng, Xue-Qin; Wang, Feng

    2016-03-01

    Next-generation sequencing technologies provide opportunities to further understand genetic variation, even within closely related cultivars. We performed whole genome resequencing of two elite indica rice varieties, RGD-7S and Taifeng B, whose F1 progeny showed hybrid weakness and hybrid vigor when grown in the early- and late-cropping seasons, respectively. Approximately 150 million 100-bp paired-end reads were generated, which covered ∼86% of the rice (Oryza sativa L. japonica 'Nipponbare') reference genome. A total of 2,758,740 polymorphic sites, including 2,408,845 SNPs and 349,895 InDels, were detected in RGD-7S and Taifeng B. Applying stringent parameters, we identified 961,791 SNPs and 46,640 InDels between RGD-7S and Taifeng B (RGD-7S/Taifeng B). The density of DNA polymorphisms was 256.8 SNPs and 12.5 InDels per 100 kb for RGD-7S/Taifeng B. Copy number variations (CNVs) were also investigated. In RGD-7S, 1989 of 2727 CNVs overlapped 218 genes, and in Taifeng B, 1231 of 2010 CNVs were annotated in 175 genes. In addition, we verified a subset of InDels in the interval of the hybrid weakness genes Hw3 and Hw4 and obtained some polymorphic InDel markers, which will provide a sound foundation for cloning hybrid weakness genes. Analysis of genomic variations will also contribute to understanding the genetic basis of hybrid weakness and heterosis.
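
    Polymorphism densities such as 256.8 SNPs per 100 kb are counts scaled by the length of sequence considered. The sketch below shows two hypothetical helpers: one counts variants in consecutive 100-kb windows along a chromosome, the other converts a genome-wide count into an average density. As a consistency check only, an assumed genome length of about 374.5 Mb reproduces the reported 256.8 SNPs and 12.5 InDels per 100 kb from 961,791 SNPs and 46,640 InDels; the length the authors actually used is not stated in the abstract.

```python
from collections import Counter


def density_per_window(positions, chrom_length, window=100_000):
    """Count variants in consecutive non-overlapping windows.

    positions are 1-based and assumed to lie within 1..chrom_length.
    """
    n_windows = (chrom_length + window - 1) // window  # the last window may be partial
    counts = Counter((pos - 1) // window for pos in positions)
    return [counts.get(i, 0) for i in range(n_windows)]


def mean_density(n_variants, genome_length, per=100_000):
    """Average number of variants per `per` bases across `genome_length` bases."""
    return n_variants * per / genome_length


# Consistency check against the abstract's figures, using an assumed 374.5 Mb length.
snp_density = mean_density(961_791, 374_500_000)
indel_density = mean_density(46_640, 374_500_000)
```

    Per-window counts are what reveal local hotspots or deserts of polymorphism, which a single genome-wide average hides.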

  20. Genetic diversity, molecular phylogeny and selection evidence of the silkworm mitochondria implicated by complete resequencing of 41 genomes

    PubMed Central

    2010-01-01

    Background Mitochondria are a valuable resource for studying the evolutionary process and deducing phylogeny. A few mitochondrial genomes have been sequenced, but a comprehensive picture of the domestication event for silkworm mitochondria remains to be established. In this study, we integrate the extant data, perform whole genome resequencing of the Japanese wild silkworm to characterize the silkworm mitochondrial (mt) population, and finally use these data to deduce a more comprehensive phylogeny of the Bombycidae. Results We identified 347 single nucleotide polymorphisms (SNPs) in the mt genome, but found no evidence of past recombination in the silkworm progenitor. A phylogeny inferred from these whole genome SNPs resulted in a well-classified tree, confirming that the domesticated silkworm, Bombyx mori, most recently diverged from the Chinese wild silkworm rather than from the Japanese wild silkworm. We showed that the population sizes of the domesticated and Chinese wild silkworms have experienced neither expansion nor contraction. We also discovered that one mt gene, cytochrome b, shows a strong signal of positive selection in the domesticated clade. This gene is related to energy metabolism and may have played an important role during silkworm domestication. Conclusions We present a comparative analysis of 41 mt genomes of B. mori and B. mandarina from China and Japan. With these, we obtain a much clearer picture of the evolutionary history of the silkworm. The data and analyses presented here aid our understanding of the silkworm in general and provide crucial insight into silkworm phylogeny. PMID:20334646

  1. Next-generation sequencing for identification of candidate genes for Fusarium wilt and sterility mosaic disease in pigeonpea (Cajanus cajan).

    PubMed

    Singh, Vikas K; Khan, Aamir W; Saxena, Rachit K; Kumar, Vinay; Kale, Sandip M; Sinha, Pallavi; Chitikineni, Annapurna; Pazhamala, Lekha T; Garg, Vanika; Sharma, Mamta; Sameer Kumar, Chanda Venkata; Parupalli, Swathi; Vechalapu, Suryanarayana; Patil, Suyash; Muniswamy, Sonnappa; Ghanta, Anuradha; Yamini, Kalinati Narasimhan; Dharmaraj, Pallavi Subbanna; Varshney, Rajeev K

    2016-05-01

    To map resistance genes for Fusarium wilt (FW) and sterility mosaic disease (SMD) in pigeonpea, sequencing-based bulked segregant analysis (Seq-BSA) was used. Resistant (R) and susceptible (S) bulks from the extreme recombinant inbred lines of ICPL 20096 × ICPL 332 were sequenced. Subsequently, the SNP index was calculated between R- and S-bulks with the help of the draft genome sequence and a reference-guided assembly of ICPL 20096 (resistant parent). Seq-BSA provided seven candidate SNPs for FW and SMD resistance in pigeonpea. In parallel, four additional genotypes were re-sequenced, and their combined analysis with the R- and S-bulks provided a total of 8362 nonsynonymous (ns) SNPs. Of these 8362 nsSNPs, 60 were found within the 2-Mb flanking regions of the seven candidate SNPs identified through Seq-BSA. Haplotype analysis narrowed these down to eight nsSNPs in seven genes. These eight nsSNPs were further validated by re-sequencing 11 genotypes that are resistant and susceptible to FW and SMD. This analysis revealed association of four candidate nsSNPs in four genes with FW resistance and four candidate nsSNPs in three genes with SMD resistance. Further, in silico protein analysis and expression profiling identified the two most promising candidate genes, namely C.cajan_01839 for SMD resistance and C.cajan_03203 for FW resistance. The candidate genomic regions and SNPs identified here will be useful for genomics-assisted breeding in pigeonpea. © 2015 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.
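
    In sequencing-based bulked segregant analysis, the SNP index of a site in one bulk is commonly defined as the fraction of reads carrying the non-reference allele, and the difference between bulks (ΔSNP-index), as popularized by the QTL-seq method, highlights regions linked to the trait. A minimal sketch under that assumed definition follows; the `min_depth` cutoff and function names are hypothetical, not the parameters of this study. Because the resistant parent ICPL 20096 serves as the reference, sites linked to resistance would be expected to show a low SNP index in the R bulk and a high one in the S bulk, giving a strongly negative Δ.

```python
def snp_index(alt_reads, total_reads):
    """Fraction of reads at a site that carry the non-reference allele."""
    if total_reads <= 0:
        raise ValueError("site has no read coverage")
    return alt_reads / total_reads


def delta_snp_index(r_alt, r_depth, s_alt, s_depth, min_depth=10):
    """SNP-index(R bulk) - SNP-index(S bulk); None if either bulk is too shallow.

    min_depth is a hypothetical filter against noisy low-coverage sites.
    """
    if r_depth < min_depth or s_depth < min_depth:
        return None
    return snp_index(r_alt, r_depth) - snp_index(s_alt, s_depth)
```

    For example, 2 of 20 alternate reads in the R bulk and 18 of 20 in the S bulk give a ΔSNP-index of -0.8; in practice such values are usually averaged in sliding windows along each chromosome before candidate regions are called.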

  2. Re-sequencing and genetic variation identification of a rice line with ideal plant architecture.

    PubMed

    Li, Shuangcheng; Xie, Kailong; Li, Wenbo; Zou, Ting; Ren, Yun; Wang, Shiquan; Deng, Qiming; Zheng, Aiping; Zhu, Jun; Liu, Huainian; Wang, Lingxia; Ai, Peng; Gao, Fengyan; Huang, Bin; Cao, Xuemei; Li, Ping

    2012-12-01

    The ideal plant architecture (IPA) includes several important characteristics such as low tiller numbers, few or no unproductive tillers, more grains per panicle, and thick and sturdy stems. We have developed an indica restorer line 7302R that displays the IPA phenotype in terms of tiller number, grain number, and stem strength. However, the genetic basis of this phenotype remained to be clarified. We performed re-sequencing and genome-wide variation analysis of 7302R using the Solexa sequencing technology. With the genomic sequence of the indica cultivar 9311 as reference, 307 627 SNPs, 57 372 InDels, and 3 096 SVs were identified in the 7302R genome. The 7302R-specific variations were investigated via the synteny analysis of all the SNPs of 7302R with those of the previously sequenced non-IPA-type lines IR24, MH63, and SH527. Moreover, we found 178 168 SNPs specific to 7302R across the whole genome and 30 239 SNPs in the predicted mRNA regions, of which 8 517 were non-synonymous coding SNPs. In addition, 263 large-effect SNPs that were expected to affect the integrity of encoded proteins were identified from the 7302R-specific SNPs. SNPs of several important previously cloned rice genes were also identified by aligning the 7302R sequence with those of other sequenced lines. Our results provide several candidates that may account for the IPA phenotype of 7302R. These results therefore lay the groundwork for long-term efforts to uncover important genes and alleles for rice plant architecture and also offer useful data resources for future genetic and genomic studies in rice.

  3. Genomic variants in an inbred mouse model predict mania-like behaviors.

    PubMed

    Saul, Michael C; Stevenson, Sharon A; Zhao, Changjiu; Driessen, Terri M; Eisinger, Brian E; Gammie, Stephen C

    2018-01-01

    Contemporary rodent models for bipolar disorders split the bipolar spectrum into complementary behavioral endophenotypes representing mania and depression. Widely accepted mania models typically utilize single gene transgenics or pharmacological manipulations, but inbred rodent strains show great potential as mania models. Their acceptance is often limited by the lack of genotypic data needed to establish construct validity. In this study, we used a unique strategy to inexpensively explore and confirm population allele differences in naturally occurring candidate variants in a manic rodent strain, the Madison (MSN) mouse strain. Variants were identified using whole exome resequencing in a small population of animals. Interesting candidate variants were confirmed in a larger population with genotyping. We enriched these results with observations of locomotor behavior from a previous study. Resequencing identified 447 structural variants that are mostly fixed in the MSN strain relative to control strains. After filtering and annotation, we found 11 non-synonymous MSN variants that we believe alter protein function. The allele frequencies for 6 of these variants were consistent with explanatory variants for the Madison strain's phenotype. The variants are in the Npas2, Cp, Polr3c, Smarca4, Trpv1, and Slc5a7 genes, and many of these genes' products are in pathways implicated in human bipolar disorders. Variants in Smarca4 and Polr3c together explained over 40% of the variance in locomotor behavior in the Hsd:ICR founder strain. These results enhance the MSN strain's construct validity and implicate altered nucleosome structure and transcriptional regulation as a chief molecular system underpinning behavior.

  4. Opposing Forces of A/T-Biased Mutations and G/C-Biased Gene Conversions Shape the Genome of the Nematode Pristionchus pacificus

    PubMed Central

    Weller, Andreas M.; Rödelsperger, Christian; Eberhardt, Gabi; Molnar, Ruxandra I.; Sommer, Ralf J.

    2014-01-01

    Base substitution mutations are a major source of genetic novelty, and mutation accumulation line (MAL) studies have revealed a nearly universal AT bias in de novo mutation spectra. While a comparison of de novo mutation spectra with the actual nucleotide composition of the genome suggests the existence of general counterbalancing mechanisms, little is known about the evolutionary and historical details of these opposing forces. Here, we correlate MAL-derived mutation spectra with patterns observed from population resequencing. Variation observed in natural populations has already been subject to evolutionary forces. Distinguishing between rare and common alleles, the latter of which are close to fixation and presumably older, can provide insight into mutational processes and their influence on genome evolution. We provide a genome-wide analysis of de novo mutations in 22 MALs of the nematode Pristionchus pacificus and compare the spectra with natural variants observed in resequencing of 104 natural isolates. MALs show an AT bias of 5.3, one of the highest values observed to date. In contrast, the AT bias in natural variants is much lower. Specifically, rare derived alleles show an AT bias of 2.4, whereas common derived alleles close to fixation show no AT bias at all. These results indicate the existence of a strong opposing force and suggest that the GC content of the P. pacificus genome is in equilibrium. We discuss GC-biased gene conversion as a potential mechanism acting against AT-biased mutations. This study provides insight into genome evolution by combining MAL studies with natural variation. PMID:24414549
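
    The argument that comparing mutation spectra with genome composition implies a counterbalancing force can be made concrete with a simple mutation-only model. The sketch below assumes that "AT bias" means the ratio u/v of the per-site G/C->A/T rate to the per-site A/T->G/C rate (a common convention in MAL studies, although the exact normalization used in this study may differ) and ignores selection, drift, and gene conversion. Under those assumptions the expected equilibrium GC fraction is v/(u+v) = 1/(1 + AT bias); the function name is illustrative.

```python
def equilibrium_gc(at_bias: float) -> float:
    """Equilibrium GC fraction when mutation is the only force acting.

    Assumes at_bias = u / v, where u is the per-site G/C->A/T rate and v is the
    per-site A/T->G/C rate. At equilibrium the two fluxes balance:
    u * GC = v * (1 - GC), hence GC = v / (u + v) = 1 / (1 + at_bias).
    """
    if at_bias <= 0:
        raise ValueError("AT bias must be positive")
    return 1.0 / (1.0 + at_bias)


# AT-bias values quoted in the abstract ("no AT bias" taken as 1.0).
for label, bias in [("MAL de novo", 5.3), ("rare derived", 2.4), ("common derived", 1.0)]:
    print(f"{label}: AT bias {bias} -> equilibrium GC {equilibrium_gc(bias):.3f}")
# MAL de novo: AT bias 5.3 -> equilibrium GC 0.159
# rare derived: AT bias 2.4 -> equilibrium GC 0.294
# common derived: AT bias 1.0 -> equilibrium GC 0.500
```

    Under these assumptions, an AT bias of 5.3 would drive GC content toward roughly 16% if mutation acted alone, so a genome whose GC content is stable at a higher value must be held there by some opposing process, such as GC-biased gene conversion.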

  5. Missense mutations in TENM4, a regulator of axon guidance and central myelination, cause essential tremor.

    PubMed

    Hor, Hyun; Francescatto, Ludmila; Bartesaghi, Luca; Ortega-Cubero, Sara; Kousi, Maria; Lorenzo-Betancor, Oswaldo; Jiménez-Jiménez, Felix J; Gironell, Alexandre; Clarimón, Jordi; Drechsel, Oliver; Agúndez, José A G; Kenzelmann Broz, Daniela; Chiquet-Ehrismann, Ruth; Lleó, Alberto; Coria, Francisco; García-Martin, Elena; Alonso-Navarro, Hortensia; Martí, Maria J; Kulisevsky, Jaume; Hor, Charlotte N; Ossowski, Stephan; Chrast, Roman; Katsanis, Nicholas; Pastor, Pau; Estivill, Xavier

    2015-10-15

    Essential tremor (ET) is a common movement disorder with an estimated prevalence of 5% in the population aged over 65 years. Despite intensive efforts, the genetic architecture of ET remains unknown. We used a combination of whole-exome sequencing and targeted resequencing in three ET families. In vitro and in vivo experiments in oligodendrocyte precursor cells and zebrafish were performed to test our findings. Whole-exome sequencing revealed a missense mutation in TENM4 segregating in an autosomal-dominant fashion in an ET family. Subsequent targeted resequencing of TENM4 led to the discovery of two novel missense mutations. Not only did these two mutations segregate with ET in two additional families, but we also observed significant overtransmission of pathogenic TENM4 alleles across the three families. Consistent with a dominant mode of inheritance, in vitro analysis in oligodendrocyte precursor cells showed that mutant proteins mislocalize. Finally, expression of human mRNA harboring any of the three patient mutations in zebrafish embryos induced defects in axon guidance, confirming a dominant-negative mode of action for these mutations. Our genetic and functional data, which are corroborated by the existence of a Tenm4 knockout mouse displaying an ET phenotype, implicate TENM4 in ET. Together with previous studies of TENM4 in model organisms, our studies suggest that processes regulating myelination in the central nervous system and axon guidance might be significant contributors to the genetic burden of this disorder. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  6. Chromosome-level assembly of Arabidopsis thaliana Ler reveals the extent of translocation and inversion polymorphisms.

    PubMed

    Zapata, Luis; Ding, Jia; Willing, Eva-Maria; Hartwig, Benjamin; Bezdan, Daniela; Jiao, Wen-Biao; Patel, Vipul; Velikkakam James, Geo; Koornneef, Maarten; Ossowski, Stephan; Schneeberger, Korbinian

    2016-07-12

    Resequencing or reference-based assemblies reveal large parts of the small-scale sequence variation. However, they typically fail to separate such local variation into colinear and rearranged variation, because they usually do not recover the complement of large-scale rearrangements, including transpositions and inversions. Despite the availability of hundreds of genomes of diverse Arabidopsis thaliana accessions, there is so far only one full-length assembled genome: the reference sequence. We have assembled 117 Mb of the A. thaliana Landsberg erecta (Ler) genome into five chromosome-equivalent sequences using a combination of short Illumina reads, long PacBio reads, and linkage information. Whole-genome comparison against the reference sequence revealed 564 transpositions and 47 inversions comprising ∼3.6 Mb, in addition to 4.1 Mb of nonreference sequence, mostly originating from duplications. Although rearranged regions do not differ in local divergence from colinear regions, they are drastically depleted for meiotic recombination in heterozygotes. Using a 1.2-Mb inversion as an example, we show that such rearrangement-mediated reduction of meiotic recombination can lead to genetically isolated haplotypes in the worldwide population of A. thaliana. Moreover, we found 105 single-copy genes that were present only in the reference sequence or the Ler assembly, and 334 single-copy orthologs that showed an additional copy in only one of the genomes. To our knowledge, this work gives the first insights into the degree and type of variation that will be revealed once complete assemblies replace resequencing or other reference-dependent methods.

  7. Synthesis of high-quality libraries of long (150mer) oligonucleotides by a novel depurination controlled process

    PubMed Central

    LeProust, Emily M.; Peck, Bill J.; Spirin, Konstantin; McCuen, Heather Brummel; Moore, Bridget; Namsaraev, Eugeni; Caruthers, Marvin H.

    2010-01-01

    We have achieved the ability to synthesize thousands of unique, long oligonucleotides (150mers) in fmol amounts using parallel synthesis of DNA on microarrays. The sequence accuracy of the oligonucleotides in such large-scale syntheses has been limited by the yields and side reactions of the DNA synthesis process used. While there has been significant demand for libraries of long oligos (150mer and more), the yields in conventional DNA synthesis and the associated side reactions have previously limited the availability of oligonucleotide pools to lengths <100 nt. Using novel array based depurination assays, we show that the depurination side reaction is the limiting factor for the synthesis of libraries of long oligonucleotides on Agilent Technologies’ SurePrint® DNA microarray platform. We also demonstrate how depurination can be controlled and reduced by a novel detritylation process to enable the synthesis of high quality, long (150mer) oligonucleotide libraries and we report the characterization of synthesis efficiency for such libraries. Oligonucleotide libraries prepared with this method have changed the economics and availability of several existing applications (e.g. targeted resequencing, preparation of shRNA libraries, site-directed mutagenesis), and have the potential to enable even more novel applications (e.g. high-complexity synthetic biology). PMID:20308161

  8. Genomic Analyses Reveal Potential Independent Adaptation to High Altitude in Tibetan Chickens.

    PubMed

    Wang, Ming-Shan; Li, Yan; Peng, Min-Sheng; Zhong, Li; Wang, Zong-Ji; Li, Qi-Ye; Tu, Xiao-Long; Dong, Yang; Zhu, Chun-Ling; Wang, Lu; Yang, Min-Min; Wu, Shi-Fang; Miao, Yong-Wang; Liu, Jian-Ping; Irwin, David M; Wang, Wen; Wu, Dong-Dong; Zhang, Ya-Ping

    2015-07-01

    Much like other indigenous domesticated animals, Tibetan chickens living at high altitudes (2,200-4,100 m) show specific physiological adaptations to the extreme environmental conditions of the Tibetan Plateau, but the genetic bases of these adaptations are not well characterized. Here, we assembled a de novo genome of a Tibetan chicken and resequenced whole genomes of 32 additional chickens, including Tibetan chickens, village chickens, game fowl, and Red Junglefowl, and found that the Tibetan chickens could broadly be placed into two groups. Further analyses revealed that several candidate genes in the calcium-signaling pathway are possibly involved in adaptation to the hypoxia experienced by these chickens, as these genes appear to have experienced directional selection in the two Tibetan chicken populations, suggesting a potential genetic mechanism underlying high altitude adaptation in Tibetan chickens. The candidate selected genes identified in this study, and their variants, may be useful targets for clarifying our understanding of the domestication of chickens in Tibet, and might be useful in current breeding efforts to develop improved breeds for the highlands. © The Author 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  9. Mismatch repair deficiency does not enhance ENU mutagenesis in the zebrafish germ line.

    PubMed

    Feitsma, Harma; de Bruijn, Ewart; van de Belt, Jose; Nijman, Isaac J; Cuppen, Edwin

    2008-07-01

    S(N)1-type alkylating agents such as N-ethyl-N-nitrosourea (ENU) are very potent mutagens. They act by transferring their alkyl group to DNA bases, which, upon mispairing during replication, can cause single base pair mutations in the next replication cycle. As DNA mismatch repair (MMR) proteins are involved in the recognition of alkylation damage, we hypothesized that ENU-induced mutation rates could be increased in an MMR-deficient background, which would be beneficial for mutagenesis approaches. We applied a standard ENU mutagenesis protocol to adult zebrafish deficient in the MMR gene msh6 and to heterozygous controls to study the effect of MMR on ENU-induced DNA damage. Dose-dependent lethality was similar for homozygous and heterozygous mutants, indicating that there is no difference in ENU resistance. Mutation discovery by high-throughput dideoxy resequencing of genomic targets in outcrossed progeny of the mutagenized fish also did not reveal any differences in germ line mutation frequency. These results may indicate that the maximum mutation load for zebrafish has been reached with the currently used, highly optimized ENU mutagenesis protocol. Alternatively, the MMR system in the zebrafish germ line may be saturated very rapidly, thereby having a limited effect on high-dose ENU mutagenesis.

  10. Enrichment of target sequences for next-generation sequencing applications in research and diagnostics.

    PubMed

    Altmüller, Janine; Budde, Birgit S; Nürnberg, Peter

    2014-02-01

    Targeted re-sequencing such as gene panel sequencing (GPS) has become very popular in medical genetics, both for research projects and in diagnostic settings. The technical principles of the different enrichment methods have been reviewed several times before; however, new enrichment products are constantly entering the market, and researchers are often uncertain when they must make long-term commitments to both an enrichment product and a sequencing technology. This review summarizes important considerations for experimental design and provides recommendations for choosing the best sequencing strategy for various research projects and diagnostic applications.

  11. The Yak genome database: an integrative database for studying yak biology and high-altitude adaption

    PubMed Central

    2012-01-01

    Background The yak (Bos grunniens) is a long-haired bovine that lives at high altitudes and is an important source of milk, meat, fiber and fuel. The recent sequencing, assembly and annotation of its genome are expected to further our understanding of how it has adapted to life at high altitudes and of its ecologically important traits. Description The Yak Genome Database (YGD) is an internet-based resource that provides access to genomic sequence data and predicted functional information concerning the genes and proteins of Bos grunniens. The curated data stored in the YGD include genome sequences, predicted genes and associated annotations, non-coding RNA sequences, transposable elements, single nucleotide variants, and three-way whole-genome alignments between human, cattle and yak. YGD offers useful searching and data mining tools, including the ability to search for genes by name or by function keywords, as well as GBrowse genome browsers and/or BLAST servers, which can be used to visualize genome regions and identify similar sequences. Sequence data from the YGD can also be downloaded to perform local searches. Conclusions A new yak genome database (YGD) has been developed to facilitate studies on high-altitude adaptation and bovine genomics. The database will be continuously updated to incorporate new information such as transcriptome data and population resequencing data. The YGD can be accessed at http://me.lzu.edu.cn/yak. PMID:23134687

  12. Mutation spectrum of the FZD-4, TSPAN12 AND ZNF408 genes in Indian FEVR patients.

    PubMed

    Musada, Ganeswara Rao; Syed, Hameed; Jalali, Subhadra; Chakrabarti, Subhabrata; Kaur, Inderjeet

    2016-06-17

    Mutations in candidate genes that encode a ligand (NDP) and a receptor complex (FZD4, LRP5 and TSPAN12) in the Norrin β-catenin signaling pathway are involved in the pathogenesis of familial exudative vitreoretinopathy (FEVR, MIM # 133780). Recently, a transcription factor (ZNF408) has also been implicated in FEVR. We previously characterized the variations in NDP among FEVR patients from India. The present study aimed at understanding the involvement of the remaining genes (FZD4, TSPAN12 and ZNF408) in the same cohort. The DNA of 110 unrelated FEVR patients and 115 unaffected controls was screened for variations in the entire coding and untranslated regions of these 3 genes by resequencing. Segregation of the disease-associated variants was assessed in the family members of the probands. The effects of the observed missense changes were further analyzed using SIFT and PolyPhen-2 scores. The screening of the FZD4, TSPAN12 and ZNF408 genes identified 11 different mutations in 15/110 FEVR probands. Of the 11 identified mutations, 6 were novel. The detected missense mutations were mainly located in domains that are functionally crucial for the formation of the ligand-receptor complex, and because they replaced evolutionarily highly conserved amino acids (SIFT score < 0.005), they are predicted to be pathogenic. Additionally, 2 novel and 16 reported single nucleotide polymorphisms (SNPs) were detected. Our genetic screening revealed varying mutation frequencies in the FZD4 (8.0 %), TSPAN12 (5.4 %) and ZNF408 (2.7 %) genes among the FEVR patients, indicating their potential role in the disease pathogenesis. The observed mutations segregated with the disease phenotype and exhibited variable expressivity. The mutations in FZD4 and TSPAN12 were found in both autosomal dominant and autosomal recessive families, further validating the involvement of these genes in FEVR development.

  13. Comparative mitogenomic analysis of mirid bugs (Hemiptera: Miridae) and evaluation of potential DNA barcoding markers.

    PubMed

    Wang, Juan; Zhang, Li; Zhang, Qi-Lin; Zhou, Min-Qiang; Wang, Xiao-Tong; Yang, Xing-Zhuo; Yuan, Ming-Long

    2017-01-01

    The family Miridae is one of the most species-rich families of insects. To better understand the diversity and evolution of mirids, we determined the mitogenome of Lygus pratensis and re-sequenced the mitogenomes of four mirids (i.e., Apolygus lucorum, Adelphocoris suturalis, Ade. fasciaticollis and Ade. lineolatus). We performed a comparative analysis of 15 mitogenomic sequences representing 11 species of five genera within Miridae and evaluated the potential of these mitochondrial genes as molecular markers. Our results showed that the general mitogenomic features (gene content, gene arrangement, base composition and codon usage) were well conserved among these mirids. Four protein-coding genes (PCGs) (cox1, cox3, nad1 and nad3) showed no length variability, whereas nad5 showed the largest size variation; no intraspecific length variation was found in PCGs. Two PCGs (nad4 and nad5) showed relatively high substitution rates at the nucleotide and amino acid levels, whereas cox1 had the lowest substitution rate. The Ka/Ks values for all PCGs were far lower than 1 (<0.59), but the Ka/Ks values of cox1-barcode sequences were always larger than 1 (1.34-15.20), indicating that the 658 bp cox1 barcode sequence may not be an appropriate marker, owing to positive selection or relaxed selection. Phylogenetic analyses based on two concatenated mitogenomic datasets consistently supported the relationship Nesidiocoris + (Trigonotylus + (Adelphocoris + (Apolygus + Lygus))), which was also recovered by each of nad4, nad5, rrnL and the combined 22 transfer RNA genes (tRNAs). Taking sequence length, substitution rate and phylogenetic signal together, the individual genes (nad4, nad5 and rrnL) and the combined 22 tRNAs could be used as potential molecular markers for Miridae at various taxonomic levels. Our results suggest that it is essential to evaluate and select suitable markers for different taxonomic groups when performing phylogenetic, population genetic and species identification studies.

  14. Whole exome sequencing reveals concomitant mutations of multiple FA genes in individual Fanconi anemia patients

    PubMed Central

    2014-01-01

    Background Fanconi anemia (FA) is a rare inherited genetic syndrome with highly variable clinical manifestations. Fifteen genetic subtypes of FA have been identified. Traditionally, complementation-group tests have been applied to FA patients in a stepwise manner to identify the FA type, an approach that can yield incomplete genetic information. Methods We diagnosed five pediatric patients with FA based on clinical manifestations, and we performed exome sequencing of peripheral blood specimens from these patients and their family members. The sequencing data were then analyzed bioinformatically, and the FANC gene mutations identified by exome sequencing were confirmed by PCR re-sequencing. Results Homozygous and compound heterozygous mutations of FANC genes were identified in all of the patients. The FA subtypes of the patients included FANCA, FANCM and FANCD2. Interestingly, four FA patients harbored multiple mutations in at least two FA genes, and some of these mutations have not been previously reported. These patients’ clinical manifestations were vastly different from each other, as were their treatment responses to androstanazol and prednisone. This finding suggests that heterozygous mutation(s) in FA genes could also have diverse biological and/or pathophysiological effects on FA patients or FA gene carriers. Notably, we were not able to identify de novo mutations in the genes implicated in DNA repair pathways when the sequencing data of patients were compared with those of their parents. Conclusions Our results indicate that Chinese FA patients and carriers might have higher and more complex mutation rates in FANC genes than have been conventionally recognized. Testing of the fifteen FANC genes in FA patients and their family members should be a regular clinical practice to determine the optimal care for the individual patient, to counsel the family and to obtain a better understanding of FA pathophysiology.
PMID:24885126

  15. Whole exome sequencing reveals concomitant mutations of multiple FA genes in individual Fanconi anemia patients.

    PubMed

    Chang, Lixian; Yuan, Weiping; Zeng, Huimin; Zhou, Quanquan; Wei, Wei; Zhou, Jianfeng; Li, Miaomiao; Wang, Xiaomin; Xu, Mingjiang; Yang, Fengchun; Yang, Yungui; Cheng, Tao; Zhu, Xiaofan

    2014-05-15

    Fanconi anemia (FA) is a rare inherited genetic syndrome with highly variable clinical manifestations. Fifteen genetic subtypes of FA have been identified. Traditionally, complementation-group tests have been applied to FA patients in a stepwise manner to identify the FA type, an approach that can yield incomplete genetic information. We diagnosed five pediatric patients with FA based on clinical manifestations, and we performed exome sequencing of peripheral blood specimens from these patients and their family members. The sequencing data were then analyzed bioinformatically, and the FANC gene mutations identified by exome sequencing were confirmed by PCR re-sequencing. Homozygous and compound heterozygous mutations of FANC genes were identified in all of the patients. The FA subtypes of the patients included FANCA, FANCM and FANCD2. Interestingly, four FA patients harbored multiple mutations in at least two FA genes, and some of these mutations have not been previously reported. These patients' clinical manifestations were vastly different from each other, as were their treatment responses to androstanazol and prednisone. This finding suggests that heterozygous mutation(s) in FA genes could also have diverse biological and/or pathophysiological effects on FA patients or FA gene carriers. Notably, we were not able to identify de novo mutations in the genes implicated in DNA repair pathways when the sequencing data of patients were compared with those of their parents. Our results indicate that Chinese FA patients and carriers might have higher and more complex mutation rates in FANC genes than have been conventionally recognized. Testing of the fifteen FANC genes in FA patients and their family members should be a regular clinical practice to determine the optimal care for the individual patient, to counsel the family and to obtain a better understanding of FA pathophysiology.

  16. Assessing pooled BAC and whole genome shotgun strategies for assembly of complex genomes

    PubMed Central

    2011-01-01

    Background We investigate whether pooling BAC clones and sequencing the pools can provide more accurate assembly of genome sequences than the "whole genome shotgun" (WGS) approach, and we quantify this accuracy increase. We compare the pooled BAC and WGS approaches using in silico simulations. Standard measures of assembly quality focus on assembly size and fragmentation, which are desirable for large whole genome assemblies. We propose additional measures enabling easy and visual comparison of assembly quality, such as rearrangements and redundant sequence content, relative to the known target sequence. Results The best assembly quality scores were obtained using 454 coverage of 15× linear and 5× paired (3 kb insert size) reads (15L-5P) on Arabidopsis. This regime gave similarly good results on four additional plant genomes of very different GC and repeat contents. BAC pooling improved assembly scores over WGS assembly, with coverage and redundancy scores improving the most. Conclusions BAC pooling works better than WGS; however, both require a physical map to order the scaffolds. Pool sizes up to 12 Mbp work well, suggesting that this pooling density would be effective in medium-scale re-sequencing applications such as targeted sequencing of QTL intervals for candidate gene discovery. Assuming the current Roche/454 Titanium sequencing limitations, a 12 Mbp region could be re-sequenced with a full plate of linear reads and a half plate of paired-end reads, yielding 15L-5P coverage after read pre-processing. Our simulation suggests that massive over-sequencing may not improve accuracy. Our scoring measures can be used generally to evaluate and compare results of simulated genome assemblies. PMID:21496274
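
    The 15L-5P regime can be translated into sequencing volume with the standard coverage relation C = N × L / G (N reads of length L over a region of G bases). The sketch below is an illustrative calculation, not the authors' simulation pipeline; the 400 bp read length is an assumed value, and the function names are hypothetical.

```python
import math


def required_bases(region_bp: int, fold_coverage: float) -> float:
    """Total sequenced bases N * L needed for mean coverage C over G bases (C = N * L / G)."""
    return region_bp * fold_coverage


def reads_needed(region_bp: int, fold_coverage: float, read_len_bp: int) -> int:
    """Reads of length L needed to reach coverage C, rounded up so C is not undershot."""
    return math.ceil(required_bases(region_bp, fold_coverage) / read_len_bp)


REGION_BP = 12_000_000  # the 12 Mbp pool size discussed in the abstract
linear = required_bases(REGION_BP, 15)  # 15x linear reads
paired = required_bases(REGION_BP, 5)   # 5x paired reads
print(f"linear: {linear / 1e6:.0f} Mbp, paired: {paired / 1e6:.0f} Mbp")
# linear: 180 Mbp, paired: 60 Mbp

# Hypothetical read length; actual Titanium read lengths and yields vary by run.
print(reads_needed(REGION_BP, 15, read_len_bp=400))
# 450000
```

    Under these assumptions, 15L-5P for a 12 Mbp pool corresponds to about 180 Mbp of linear-read sequence plus 60 Mbp of paired-read sequence after pre-processing; raw output would need to be somewhat higher to allow for trimmed and discarded reads.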

  17. Novel GREM1 Variations in Sub-Saharan African Patients With Cleft Lip and/or Cleft Palate.

    PubMed

    Gowans, Lord Jephthah Joojo; Oseni, Ganiyu; Mossey, Peter A; Adeyemo, Wasiu Lanre; Eshete, Mekonen A; Busch, Tamara D; Donkor, Peter; Obiri-Yeboah, Solomon; Plange-Rhule, Gyikua; Oti, Alexander A; Owais, Arwa; Olaitan, Peter B; Aregbesola, Babatunde S; Oginni, Fadekemi O; Bello, Seidu A; Audu, Rosemary; Onwuamah, Chika; Agbenorku, Pius; Ogunlewe, Mobolanle O; Abdur-Rahman, Lukman O; Marazita, Mary L; Adeyemo, A A; Murray, Jeffrey C; Butali, Azeez

    2018-05-01

    Cleft lip and/or cleft palate (CL/P) are congenital anomalies of the face with a multifactorial etiology, in which both environmental and genetic risk factors play crucial roles. Though at least 40 loci have attained genomewide significant association with nonsyndromic CL/P, these loci largely reside in noncoding regions of the human genome, and subsequent resequencing studies of neighboring candidate genes have revealed only a limited number of etiologic coding variants. The present study was conducted to identify etiologic coding variants in GREM1, a locus that has been shown to be largely associated with cleft of both lip and soft palate. We resequenced DNA from 397 sub-Saharan Africans with CL/P and 192 controls using Sanger sequencing. Following analyses of the sequence data, we observed 2 novel coding variants in GREM1. These variants were not found in the 192 African controls and have not been previously reported in any public genetic variant database that includes more than 5000 combined African and African American controls, or in the CL/P literature. The novel variants include p.Pro164Ser in an individual with soft palate cleft only and p.Gly61Asp in an individual with bilateral cleft lip and palate. The proband with the p.Gly61Asp GREM1 variant is a van der Woude syndrome (VWS) case who also has an etiologic variant in the IRF6 gene. Our study demonstrated that there is a low number of etiologic coding variants in GREM1, confirming earlier suggestions that variants in regulatory elements may largely account for the association between this locus and CL/P.

  18. Clinical significance of BRAF non-V600E mutations on the therapeutic effects of anti-EGFR monoclonal antibody treatment in patients with pretreated metastatic colorectal cancer: the Biomarker Research for anti-EGFR monoclonal Antibodies by Comprehensive Cancer genomics (BREAC) study.

    PubMed

    Shinozaki, Eiji; Yoshino, Takayuki; Yamazaki, Kentaro; Muro, Kei; Yamaguchi, Kensei; Nishina, Tomohiro; Yuki, Satoshi; Shitara, Kohei; Bando, Hideaki; Mimaki, Sachiyo; Nakai, Chikako; Matsushima, Koutatsu; Suzuki, Yutaka; Akagi, Kiwamu; Yamanaka, Takeharu; Nomura, Shogo; Fujii, Satoshi; Esumi, Hiroyasu; Sugiyama, Masaya; Nishida, Nao; Mizokami, Masashi; Koh, Yasuhiro; Abe, Yukiko; Ohtsu, Atsushi; Tsuchihara, Katsuya

    2017-11-07

    Patients with BRAF V600E-mutated metastatic colorectal cancer (mCRC) have a poorer prognosis as well as resistance to anti-EGFR antibodies. However, it is unclear whether BRAF mutations other than BRAF V600E (BRAF non-V600E mutations) contribute to anti-EGFR antibody resistance. This study comprised an exploratory cohort and an inference cohort. Candidate biomarkers identified by whole exome sequencing of super-responders and nonresponders in the exploratory cohort were validated by targeted resequencing in patients who received anti-EGFR antibody in the inference cohort. In the exploratory cohort, 31 candidate biomarkers, including KRAS/NRAS/BRAF mutations, were identified. Targeted resequencing of 150 patients in the inference cohort revealed 40 patients with RAS mutations (26.7%), 9 patients with BRAF V600E (6.0%), and 7 patients with BRAF non-V600E mutations (4.7%). The response rates in the RAS, BRAF V600E, and BRAF non-V600E groups were lower than that in the RAS/BRAF wild-type group (2.5%, 0%, and 0% vs 31.9%). The median PFS in patients with BRAF non-V600E mutations was 2.4 months, similar to that in patients with RAS or BRAF V600E mutations (2.1 and 1.6 months) but significantly worse than that in patients with wild-type RAS/BRAF (5.9 months). Although the BRAF non-V600E mutations identified represent a rare and not yet established molecular subtype, certain BRAF non-V600E mutations might contribute to a lesser benefit from anti-EGFR monoclonal antibody treatment.

  19. Genome resequencing and transcriptome profiling reveal structural diversity and expression patterns of constitutive disease resistance genes in Huanglongbing-tolerant Poncirus trifoliata and its hybrids

    PubMed Central

    Rawat, Nidhi; Kumar, Brajendra; Albrecht, Ute; Du, Dongliang; Huang, Ming; Yu, Qibin; Zhang, Yi; Duan, Yong-Ping; Bowman, Kim D; Gmitter, Fred G; Deng, Zhanao

    2017-01-01

    Huanglongbing (HLB) is the most destructive bacterial disease of citrus worldwide. While most citrus varieties are susceptible to HLB, Poncirus trifoliata, a close relative of Citrus, and some of its hybrids with Citrus are tolerant to HLB. No specific HLB tolerance genes have been identified in P. trifoliata, but recent studies have shown that constitutive disease resistance (CDR) genes were expressed at much higher levels in HLB-tolerant Poncirus hybrids and that the expression of CDR genes was modulated by Candidatus Liberibacter asiaticus (CLas), the pathogen of HLB. The current study was undertaken to mine and characterize the CDR gene family in Citrus and Poncirus and to understand its association with HLB tolerance in Poncirus. We identified 17 CDR genes in two citrus genomes, deduced their structures, and investigated their phylogenetic relationships. The expansion of the CDR family in Citrus appears to be due to segmental and tandem duplication events. Through genome resequencing and transcriptome sequencing, we identified eight CDR genes in the Poncirus genome (PtCDR1-PtCDR8). The number of SNPs was the highest in PtCDR2 and the lowest in PtCDR7. Most of the deletion and insertion events were observed in the UTRs of Citrus and Poncirus CDR genes. PtCDR2 and PtCDR8 were abundant in the leaf transcriptomes of two HLB-tolerant Poncirus genotypes and were also upregulated in HLB-tolerant Poncirus hybrids, as revealed by real-time PCR analysis. These two CDR genes appear to be good candidates for future studies of their role in citrus-CLas interactions. PMID:29152310

  20. Analysis of Genes Involved in Body Weight Regulation by Targeted Re-Sequencing.

    PubMed

    Volckmar, Anna-Lena; Han, Chung Ting; Pütter, Carolin; Haas, Stefan; Vogel, Carla I G; Knoll, Nadja; Struve, Christoph; Göbel, Maria; Haas, Katharina; Herrfurth, Nikolas; Jarick, Ivonne; Grallert, Harald; Schürmann, Annette; Al-Hasani, Hadi; Hebebrand, Johannes; Sauer, Sascha; Hinney, Anke

    2016-01-01

    Genes involved in body weight regulation that were previously investigated in genome-wide association studies (GWAS) and in animal models were target-enriched and then analyzed by massively parallel next-generation sequencing. We enriched and re-sequenced continuous genomic regions comprising FTO, MC4R, TMEM18, SDCCAG8, TNKS, MSRA and TBC1D1 in a screening sample of 196 extremely obese children and adolescents with an age- and sex-specific body mass index (BMI) ≥ 99th percentile and 176 lean adults (BMI ≤ 15th percentile). 22 variants were confirmed by Sanger sequencing. Genotyping was performed in up to 705 independent obesity trios (extremely obese child and both parents), 243 extremely obese cases and 261 lean adults. We detected 20 different non-synonymous variants, one frameshift and one nonsense mutation in the 7 continuous genomic regions in study groups of different weight extremes. For SNP Arg695Cys (rs58983546) in TBC1D1 we detected nominal association with obesity (pTDT = 0.03 in 705 trios). Eleven of the variants were rare and were thus detected only in the heterozygous state, in up to ten individuals of the complete screening sample of 372 individuals. Two of them (in FTO and MSRA) were found in lean individuals and nine in extremely obese individuals. In silico analyses of the 11 variants did not reveal functional implications for the mutations. Consistent with our hypothesis, we detected a rare variant that potentially leads to loss of FTO function in a lean individual. For TBC1D1, contrary to our hypothesis, the loss-of-function variant (Arg443Stop) was found in an obese individual. Functional in vitro studies are warranted.
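
    The nominal association for Arg695Cys is reported as a transmission disequilibrium test (TDT) p-value in trios. To illustrate how such a p-value arises, the sketch below implements the classic TDT, which compares how often heterozygous parents transmit versus do not transmit the tested allele to the affected child. The counts in the example are hypothetical, not taken from this study, and the software and TDT variant the authors used are not stated in the abstract.

```python
import math


def tdt(transmitted: int, untransmitted: int) -> tuple[float, float]:
    """Classic transmission disequilibrium test for one biallelic SNP.

    transmitted (b): heterozygous parents who passed the tested allele to the
    affected child; untransmitted (c): heterozygous parents who passed the
    other allele. Returns (chi-square statistic, p-value) with 1 degree of freedom.
    """
    informative = transmitted + untransmitted
    if informative == 0:
        raise ValueError("no heterozygous parents: TDT is undefined")
    chi2 = (transmitted - untransmitted) ** 2 / informative
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


chi2, p = tdt(transmitted=30, untransmitted=10)  # hypothetical counts
print(f"chi2 = {chi2:.1f}, p = {p:.4f}")
# chi2 = 10.0, p = 0.0016
```

    Only heterozygous parents carry information about preferential transmission, which is why the statistic depends on b and c alone; homozygous parents transmit the same allele regardless of association.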

  1. A statistical method for the detection of variants from next-generation resequencing of DNA pools.

    PubMed

    Bansal, Vikas

    2010-06-15

    Next-generation sequencing technologies have enabled the sequencing of several human genomes in their entirety. However, the routine resequencing of complete genomes remains infeasible. The massive capacity of next-generation sequencers can be harnessed for sequencing specific genomic regions in hundreds to thousands of individuals. Sequencing-based association studies are currently limited by the low level of multiplexing offered by sequencing platforms. Pooled sequencing represents a cost-effective approach for studying rare variants in large populations. To utilize the power of DNA pooling, it is important to accurately identify sequence variants from pooled sequencing data. Detection of rare variants from pooled sequencing represents a different challenge than detection of variants from individual sequencing. We describe a novel statistical approach, CRISP [Comprehensive Read analysis for Identification of Single Nucleotide Polymorphisms (SNPs) from Pooled sequencing] that is able to identify both rare and common variants by using two approaches: (i) comparing the distribution of allele counts across multiple pools using contingency tables and (ii) evaluating the probability of observing multiple non-reference base calls due to sequencing errors alone. Information about the distribution of reads between the forward and reverse strands and the size of the pools is also incorporated within this framework to filter out false variants. Validation of CRISP on two separate pooled sequencing datasets generated using the Illumina Genome Analyzer demonstrates that it can detect 80-85% of SNPs identified using individual sequencing while achieving a low false discovery rate (3-5%). Comparison with previous methods for pooled SNP detection demonstrates the significantly lower false positive and false negative rates for CRISP. Implementation of this method is available at http://polymorphism.scripps.edu/~vbansal/software/CRISP/.
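
Approach (ii) above, the probability that sequencing errors alone produce several non-reference base calls, can be illustrated with a binomial tail. The sketch below is a minimal illustration, not the CRISP implementation: it assumes independent errors at one uniform per-base error rate and ignores the cross-pool contingency tables, strand distribution and base qualities that CRISP also uses. The function names are hypothetical. It also computes the allele fraction expected for a single variant copy in a pool, which shows why rare variants in large pools sit close to the error background.

```python
import math


def log_binomial_pmf(k: int, n: int, p: float) -> float:
    """Natural log of the Binomial(n, p) mass at k (log space avoids overflow at high depth)."""
    return (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log1p(-p)
    )


def error_tail_probability(depth: int, alt_count: int, error_rate: float) -> float:
    """P(X >= alt_count) for X ~ Binomial(depth, error_rate).

    The chance that `alt_count` or more of `depth` reads at a site show a
    non-reference base through sequencing errors alone (hypothetical helper).
    """
    if not 0 <= alt_count <= depth:
        raise ValueError("alt_count must lie in [0, depth]")
    if not 0 < error_rate < 1:
        raise ValueError("error_rate must lie strictly between 0 and 1")
    return sum(
        math.exp(log_binomial_pmf(k, depth, error_rate))
        for k in range(alt_count, depth + 1)
    )


def single_copy_allele_fraction(pool_size: int, ploidy: int = 2) -> float:
    """Expected allele fraction of one variant copy in a pool of `pool_size` individuals."""
    return 1 / (pool_size * ploidy)


if __name__ == "__main__":
    # Hypothetical site: 1000x depth, 0.5% error rate, pool of 50 diploid individuals.
    alt_reads = int(1000 * single_copy_allele_fraction(50))
    p = error_tail_probability(1000, alt_reads, 0.005)
    print(f"P(>= {alt_reads} error-derived alt reads) = {p:.3g}")
```

A small tail probability suggests that the observed alternate reads are unlikely to come from errors alone. In practice, a caller would also need to correct for the many sites tested.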

  2. Noninvasive prenatal testing for Wilson disease by use of circulating single-molecule amplification and resequencing technology (cSMART).

    PubMed

    Lv, Weigang; Wei, Xianda; Guo, Ruolan; Liu, Qin; Zheng, Yu; Chang, Jiazhen; Bai, Ting; Li, Haoxian; Zhang, Jianguang; Song, Zhuo; Cram, David S; Liang, Desheng; Wu, Lingqian

    2015-01-01

    Noninvasive prenatal testing (NIPT) for monogenic diseases by use of PCR-based strategies requires precise quantification of mutant fetal alleles circulating in the maternal plasma. The study describes the development and validation of a novel assay termed circulating single-molecule amplification and resequencing technology (cSMART) for counting single allelic molecules in plasma. Here we demonstrate the suitability of cSMART for NIPT, with Wilson Disease (WD) as proof of concept. We used Sanger and whole-exome sequencing to identify familial ATP7B (ATPase, Cu(++) transporting, β polypeptide) gene mutations. For cSMART, single molecules were tagged with unique barcodes and circularized, and alleles were targeted and replicated by inverse PCR. The unique single allelic molecules were identified by sequencing and counted, and the percentage of mutant alleles in the original maternal plasma sample was used to determine fetal genotypes. Four families with WD pedigrees consented to the study. Using Sanger and whole-exome sequencing, we mapped the pathogenic ATP7B mutations in each pedigree and confirmed the proband's original diagnosis of WD. After validation of cSMART with defined plasma models mimicking fetal inheritance of paternal, maternal, or both parental mutant alleles, we retrospectively showed in second pregnancies that the fetal genotypes assigned by invasive testing and NIPT were concordant. We developed a reliable and accurate NIPT assay that correctly diagnosed the fetal genotypes in 4 pregnancies at risk for WD. This novel technology has potential as a universal strategy for NIPT of other monogenic disorders, since it requires only knowledge of the parental pathogenic mutations. © 2014 American Association for Clinical Chemistry.
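
The abstract states only that the mutant-allele percentage in plasma was used to assign fetal genotypes. The sketch below shows the underlying dosage reasoning under stated assumptions: plasma DNA is modeled as a fraction (1 - f) maternal and f fetal, f is assumed to be already known, and each fetal genotype is called by choosing the nearest expected fraction. The real assay presumably uses statistical thresholds that are not described here, and all function names are hypothetical.

```python
def expected_mutant_fractions(fetal_fraction: float) -> dict:
    """Expected mutant-allele fraction in maternal plasma when the mother is a
    heterozygous carrier, for each possible fetal genotype."""
    f = fetal_fraction
    return {
        "homozygous mutant": 0.5 * (1 - f) + f,         # = 0.5 + f/2
        "heterozygous": 0.5 * (1 - f) + 0.5 * f,        # = 0.5
        "homozygous normal": 0.5 * (1 - f),             # = 0.5 - f/2
    }


def call_fetal_genotype_maternal(mutant_molecules, total_molecules, fetal_fraction):
    """Return the fetal genotype whose expected fraction is closest to the observed one."""
    observed = mutant_molecules / total_molecules
    expected = expected_mutant_fractions(fetal_fraction)
    return min(expected, key=lambda genotype: abs(expected[genotype] - observed))


def paternal_allele_inherited(mutant_molecules, total_molecules, fetal_fraction):
    """The mother lacks the paternal mutation, so its plasma fraction is f/2 if the
    fetus inherited it and about 0 if not; call 'inherited' when closer to f/2."""
    observed = mutant_molecules / total_molecules
    return observed > fetal_fraction / 4
```

Counting individual barcoded molecules, as cSMART does, is what makes these small differences in fraction (for example 0.55 versus 0.50 at f = 0.10) measurable.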

  3. Common variants at the promoter region of the APOM confer a risk of rheumatoid arthritis

    PubMed Central

    Hu, Hae-Jin; Jin, Eun-Heui; Yim, Seon-Hee; Yang, So-Young; Jung, Seung-Hyun; Shin, Seung-Hun; Kim, Wan-Uk; Shim, Seung-Cheol; Kim, Tai-Gyu

    2011-01-01

    Although a genetic component in the etiology of rheumatoid arthritis (RA) has been consistently suggested, many novel genetic loci remain to be uncovered. To identify RA risk loci, we performed a genome-wide association study (GWAS) with 100 RA cases and 600 controls using the Affymetrix SNP array 5.0. The candidate risk locus (the APOM gene) was re-sequenced in a subset of the subjects to discover novel promoter and coding variants. Replication was performed with an independent case-control set comprising 578 RA cases and 711 controls. Through GWAS, we identified a novel SNP associated with RA at the APOM gene in the MHC class III region on 6p21.33 (rs805297, odds ratio (OR) = 2.28, P = 5.20 × 10⁻⁷). Re-sequencing identified three more polymorphisms in the promoter region of APOM. For replication, we genotyped the four SNP loci in the independent case-control set. The association of rs805297 identified by GWAS was successfully replicated (OR = 1.40, P = 6.65 × 10⁻⁵). The association became more significant in the combined analysis of the discovery and replication sets (OR = 1.56, P = 2.73 × 10⁻¹⁰). Individuals carrying the rs805297 risk allele (A) in the promoter region showed significantly lower APOM expression than homozygotes for the protective allele (C). In logistic regressions by phenotype status, the homozygous risk genotype (A/A) consistently showed higher ORs than the heterozygous genotype (A/C) for phenotype-positive RA. These results indicate that APOM promoter polymorphisms are significantly associated with susceptibility to RA. PMID:21844665
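
The odds ratios above summarize case-control tables. The sketch below computes an allelic odds ratio with a Woolf (log-scale) confidence interval from a 2x2 table of allele counts. The counts in the example are hypothetical, since the abstract reports only the resulting ORs and P values. The study's own analyses, including the genotype-based logistic regressions, may have used different models.

```python
import math


def allelic_odds_ratio(case_risk, case_other, control_risk, control_other, z=1.96):
    """Allelic odds ratio with a Woolf (log-scale) confidence interval.

    Arguments are allele counts (risk and non-risk alleles in cases and in
    controls); z = 1.96 gives an approximate 95% interval.
    """
    counts = (case_risk, case_other, control_risk, control_other)
    if min(counts) <= 0:
        raise ValueError("all cells must be positive for the Woolf interval")
    odds_ratio = (case_risk * control_other) / (case_other * control_risk)
    se_log_or = math.sqrt(sum(1 / c for c in counts))
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, (lower, upper)


if __name__ == "__main__":
    # Hypothetical allele counts, not taken from the study.
    or_, (lo, hi) = allelic_odds_ratio(30, 70, 20, 80)
    print(f"OR = {or_:.2f}, 95% CI {lo:.2f}-{hi:.2f}")
```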

  4. Estimating population genetic parameters and comparing model goodness-of-fit using DNA sequences with error

    PubMed Central

    Liu, Xiaoming; Fu, Yun-Xin; Maxwell, Taylor J.; Boerwinkle, Eric

    2010-01-01

    It is known that sequencing error can bias estimation of evolutionary or population genetic parameters. This problem is more prominent in deep resequencing studies because of their large sample size n and a higher probability of error at each nucleotide site. We propose a new method based on the composite likelihood of the observed SNP configurations to infer population mutation rate θ = 4Neμ, population exponential growth rate R, and error rate ɛ, simultaneously. Using simulation, we show the combined effects of the parameters, θ, n, ɛ, and R on the accuracy of parameter estimation. We compared our maximum composite likelihood estimator (MCLE) of θ with other θ estimators that take into account the error. The results show the MCLE performs well when the sample size is large or the error rate is high. Using parametric bootstrap, composite likelihood can also be used as a statistic for testing the model goodness-of-fit of the observed DNA sequences. The MCLE method was applied to sequence data on the ANGPTL4 gene in 1832 African American and 1045 European American individuals. PMID:19952140
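
The MCLE requires the full composite-likelihood machinery and is not reproduced here. As a point of comparison, the sketch below implements Watterson's classical estimator of θ = 4Neμ, which is not the authors' method and does not model error. It divides the number of segregating sites S by a_n = 1 + 1/2 + ... + 1/(n-1). Because a_n grows only logarithmically with n, while error-induced spurious segregating sites accumulate roughly in proportion to n, the upward bias worsens with large samples. This is the effect the abstract describes.

```python
def watterson_theta(segregating_sites: int, n_sequences: int, sequence_length: int = 1) -> float:
    """Watterson's estimator theta_W = S / a_n, with a_n = sum_{i=1}^{n-1} 1/i.

    Pass `sequence_length` to obtain a per-site value. Sequencing errors add
    spurious segregating sites, so this estimate is biased upward when errors
    are present.
    """
    if n_sequences < 2:
        raise ValueError("need at least two sequences")
    a_n = sum(1 / i for i in range(1, n_sequences))
    return segregating_sites / a_n / sequence_length
```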

  5. Clonal evolution in relapsed and refractory diffuse large B-cell lymphoma is characterized by high dynamics of subclones.

    PubMed

    Melchardt, Thomas; Hufnagl, Clemens; Weinstock, David M; Kopp, Nadja; Neureiter, Daniel; Tränkenschuh, Wolfgang; Hackl, Hubert; Weiss, Lukas; Rinnerthaler, Gabriel; Hartmann, Tanja N; Greil, Richard; Weigert, Oliver; Egle, Alexander

    2016-08-09

    Little information is available about the role of specific mutations in clonal evolution and clinical outcome during relapse of diffuse large B-cell lymphoma (DLBCL). We therefore analyzed formalin-fixed, paraffin-embedded tumor samples from first diagnosis and from relapsed or refractory disease in 28 patients, using next-generation sequencing of the exons of 104 coding genes. Non-synonymous mutations were present in 74 of the 104 genes tested. Primary tumor samples showed a median of 8 non-synonymous mutations (range: 0-24) within this gene set. Lower numbers of non-synonymous mutations in the primary tumor were associated with longer median overall survival (OS) than higher numbers (28 versus 15 months, p=0.031). We observed three patterns of clonal evolution during relapse: large global change, subclonal selection, and no or minimal change, the last possibly suggesting preprogrammed resistance. We conclude that targeted re-sequencing is a feasible and informative approach to characterize the molecular pattern of relapse and that it provides novel insights into the dynamics of individual genes.

  6. Genetic variations associated with six-white-point coat pigmentation in Diannan small-ear pigs

    PubMed Central

    Lü, Meng-Die; Han, Xu-Man; Ma, Yun-Fei; Irwin, David M.; Gao, Yun; Deng, Jia-Kun; Adeola, Adeniyi C.; Xie, Hai-Bing; Zhang, Ya-Ping

    2016-01-01

    A common phenotypic difference among domestic animals is variation in coat color. Six-white-point is a pigmentation pattern observed in various pig breeds that appears to have evolved through several different mechanistic pathways. Herein, we re-sequenced the whole genomes of 31 Diannan small-ear pigs from China and found that the six-white-point coat color in Diannan small-ear pigs is likely regulated by polygenic loci rather than by the MC1R locus. Strong associations were observed at three loci (EDNRB, CNTLN, and PINK1), which explain about 20 percent of the total coat color variance in Diannan small-ear pigs. We found a mutation that is highly differentiated between six-white-point and black Diannan small-ear pigs; it is located in a conserved noncoding sequence upstream of the EDNRB gene and is a putative binding site of the CEBPB protein. This study advances our understanding of coat color evolution in Diannan small-ear pigs and challenges the traditional view of coat color as a monogenic trait. PMID:27270507

  7. Signatures of selection in tilapia revealed by whole genome resequencing.

    PubMed

    Xia, Jun Hong; Bai, Zhiyi; Meng, Zining; Zhang, Yong; Wang, Le; Liu, Feng; Jing, Wu; Wan, Zi Yi; Li, Jiale; Lin, Haoran; Yue, Gen Hua

    2015-09-16

    Natural selection and selective breeding for genetic improvement have left detectable signatures within the genome of a species. Identification of selection signatures is important in evolutionary biology and for detecting genes that can help accelerate genetic improvement. However, genome-wide selection signatures, whether from artificial or natural selection, have been identified in only a few genetically improved fish species. Tilapia is one of the most important genetically improved fish species in the world. Using next-generation sequencing, we sequenced the genomes of 47 tilapia individuals. We identified a total of 1.43 million high-quality SNPs and found that LD block sizes in tilapia ranged from 10 to 100 kb. We detected over a hundred putative selective sweep regions in each line of tilapia. Most selection signatures were located in non-coding regions of the tilapia genome. The Wnt signaling, gonadotropin-releasing hormone receptor and integrin signaling pathways were under positive selection in all improved tilapia lines. Our study provides a genome-wide map of genetic variation and selection footprints in tilapia, which could be important for genetic studies and for accelerating genetic improvement of tilapia.

  8. The Interplay between Natural Selection and Susceptibility to Melanoma on Allele 374F of SLC45A2 Gene in a South European Population

    PubMed Central

    López, Saioa; García, Óscar; Yurrebaso, Iñaki; Flores, Carlos; Acosta-Herrera, Marialbert; Chen, Hua; Gardeazabal, Jesús; Careaga, Jesús María; Boyano, María Dolores; Sánchez, Ana; Ratón-Nieto, Juan Antonio; Sevilla, Arrate; Smith-Zubiaga, Isabel; de Galdeano, Alicia García; Martinez-Cadenas, Conrado; Izagirre, Neskuts; de la Rúa, Concepción; Alonso, Santos

    2014-01-01

    We aimed to study the selective pressures acting on SLC45A2 in order to investigate the interplay between selection and susceptibility to disease. To this end, we enrolled 500 volunteers from a geographically limited population (Basques from the North of Spain). By resequencing the whole coding region and intron 5 of the 34 most and the 34 least pigmented individuals according to the reflectance distribution, we observed that the polymorphism Leu374Phe (L374F, rs16891982) was statistically associated with skin color variability within this sample. In particular, allele 374F was significantly more frequent among the individuals with lighter skin. Further genotyping of an independent set of 558 individuals from a geographically wider population with known ancestry in the Spanish population also revealed that the frequency of L374F was significantly correlated with the incident UV radiation intensity. Selection tests suggest that allele 374F is being positively selected in South Europeans, indicating that depigmentation is an adaptive process. Interestingly, by genotyping 119 melanoma samples, we show that this variant is also associated with increased susceptibility to melanoma in our populations. The ultimate driving force for this adaptation is unknown, but it is compatible with the vitamin D hypothesis. This shows that molecular evolution analysis can be a useful tool for predicting phenotypic and biomedical consequences in humans. PMID:25093503

  9. A protocol for chemical mutagenesis in Strongyloides ratti.

    PubMed

    Guo, Li; Chang, Zisong; Dieterich, Christoph; Streit, Adrian

    2015-11-01

    Genetic analysis using experimentally induced mutations has been a highly valuable tool in the study of various organisms. However, genetic analysis of endoparasitic organisms tends to be difficult because of the limited accessibility of the sexually reproducing adults, which are normally located within the host. Nematodes of the genera Strongyloides and Parastrongyloides are an exception because they can form facultative free-living, sexually reproducing generations between parasitic generations. Here we present a protocol for the chemical mutagenesis of Strongyloides ratti. We further evaluate the feasibility of identifying the induced mutations by whole-genome re-sequencing. Copyright © 2015 Elsevier Inc. All rights reserved.

  10. High-throughput sequence alignment using Graphics Processing Units

    PubMed Central

    Schatz, Michael C; Trapnell, Cole; Delcher, Arthur L; Varshney, Amitabh

    2007-01-01

    Background The recent availability of new, less expensive high-throughput DNA sequencing technologies has yielded a dramatic increase in the volume of sequence data that must be analyzed. These data are being generated for several purposes, including genotyping, genome resequencing, metagenomics, and de novo genome assembly projects. Sequence alignment programs such as MUMmer have proven essential for analysis of these data, but researchers will need ever faster, high-throughput alignment tools running on inexpensive hardware to keep up with new sequence technologies. Results This paper describes MUMmerGPU, an open-source high-throughput parallel pairwise local sequence alignment program that runs on commodity Graphics Processing Units (GPUs) in common workstations. MUMmerGPU uses the new Compute Unified Device Architecture (CUDA) from nVidia to align multiple query sequences against a single reference sequence stored as a suffix tree. By processing the queries in parallel on the highly parallel graphics card, MUMmerGPU achieves more than a 10-fold speedup over a serial CPU version of the sequence alignment kernel, and outperforms the exact alignment component of MUMmer on a high end CPU by 3.5-fold in total application time when aligning reads from recent sequencing projects using Solexa/Illumina, 454, and Sanger sequencing technologies. Conclusion MUMmerGPU is a low cost, ultra-fast sequence alignment program designed to handle the increasing volume of data produced by new, high-throughput sequencing technologies. MUMmerGPU demonstrates that even memory-intensive applications can run significantly faster on the relatively low-cost GPU than on the CPU. PMID:18070356
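
MUMmerGPU indexes one reference and matches many queries against it in parallel. The sketch below illustrates only the "index once, query many" idea, with several substitutions stated plainly. It uses a suffix array with binary search instead of MUMmerGPU's suffix tree. It runs serially on the CPU instead of using CUDA. It reports exact whole-query occurrences rather than the maximal exact matches that MUMmer computes. The function names are hypothetical, and the naive construction suits only small references.

```python
def build_suffix_array(reference: str) -> list:
    """Start positions of all suffixes in lexicographic order.

    Naive O(n^2 log n) construction: adequate for illustration only.
    """
    return sorted(range(len(reference)), key=lambda i: reference[i:])


def find_exact_matches(reference: str, suffix_array: list, query: str) -> list:
    """All start positions where `query` occurs exactly in `reference`.

    Truncating sorted suffixes to len(query) keeps them in sorted order, so two
    binary searches find the block of suffixes that begin with the query.
    """
    m = len(query)

    def prefix(rank: int) -> str:
        start = suffix_array[rank]
        return reference[start:start + m]

    lo, hi = 0, len(suffix_array)
    while lo < hi:                      # lower bound: first prefix >= query
        mid = (lo + hi) // 2
        if prefix(mid) < query:
            lo = mid + 1
        else:
            hi = mid
    first = lo
    hi = len(suffix_array)
    while lo < hi:                      # upper bound: first prefix > query
        mid = (lo + hi) // 2
        if prefix(mid) <= query:
            lo = mid + 1
        else:
            hi = mid
    return sorted(suffix_array[first:lo])
```

Each query is searched independently, for example `[find_exact_matches(ref, sa, q) for q in reads]`. That independence is what lets a GPU implementation assign queries to separate threads.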

  11. Genome-wide patterns of differentiation and spatially varying selection between postglacial recolonization lineages of Populus alba (Salicaceae), a widespread forest tree.

    PubMed

    Stölting, Kai N; Paris, Margot; Meier, Cécile; Heinze, Berthold; Castiglione, Stefano; Bartha, Denes; Lexer, Christian

    2015-08-01

    Studying the divergence continuum in plants is relevant to fundamental and applied biology because of the potential to reveal functionally important genetic variation. In this context, whole-genome sequencing (WGS) provides the necessary rigour for uncovering footprints of selection. We resequenced populations of two divergent phylogeographic lineages of Populus alba (n = 48), thoroughly characterized by microsatellites (n = 317), and scanned their genomes for regions of unusually high allelic differentiation and reduced diversity using > 1.7 million single nucleotide polymorphisms (SNPs) from WGS. Results were confirmed by Sanger sequencing. On average, 9134 high-differentiation (≥ 4 standard deviations) outlier SNPs were uncovered between populations, 848 of which were shared by ≥ three replicate comparisons. Annotation revealed that 545 of these were located in 437 predicted genes. Twelve percent of differentiation outlier genome regions exhibited significantly reduced genetic diversity. Gene ontology (GO) searches were successful for 327 high-differentiation genes, and these were enriched for 63 GO terms. Our results provide a snapshot of the roles of 'hard selective sweeps' vs divergent selection of standing genetic variation in distinct postglacial recolonization lineages of P. alba. Thus, this study adds to our understanding of the mechanisms responsible for the origin of functionally relevant variation in temperate trees. © 2015 The Authors. New Phytologist © 2015 New Phytologist Trust.
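
The "≥ 4 standard deviations" criterion can be sketched as a one-sided z-score filter over per-SNP differentiation values. The abstract does not name the statistic, so the sketch assumes a generic per-SNP value (for example an FST estimate) standardized against the genome-wide mean and standard deviation. The function name is hypothetical.

```python
from statistics import mean, pstdev


def differentiation_outliers(values, threshold_sd=4.0):
    """Indices of SNPs whose differentiation lies >= threshold_sd standard
    deviations above the genome-wide mean (one-sided)."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        return []  # no variation, so no outliers
    return [i for i, v in enumerate(values) if (v - mu) / sd >= threshold_sd]
```

With genome-scale data (here more than 1.7 million SNPs), even a 4 SD cut-off leaves thousands of candidates. This may be why the authors further required outliers to be shared across replicate comparisons.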

  12. Genomic-assisted haplotype analysis and the development of high-throughput SNP markers for salinity tolerance in soybean

    PubMed Central

    Patil, Gunvant; Do, Tuyen; Vuong, Tri D.; Valliyodan, Babu; Lee, Jeong-Dong; Chaudhary, Juhi; Shannon, J. Grover; Nguyen, Henry T.

    2016-01-01

    Soil salinity is a limiting factor of crop yield. Soybean is sensitive to soil salinity, and a dominant gene, Glyma03g32900, is primarily responsible for salt tolerance. The identification of high-throughput and robust markers and the deployment of salt-tolerant cultivars are effective approaches to minimize yield loss under saline conditions. We performed high-quality (15x) whole-genome resequencing (WGRS) of 106 diverse soybean lines and identified three major structural variants and allelic variation in the promoter and genic regions of the GmCHX1 gene. The discovery of single nucleotide polymorphisms (SNPs) associated with structural variants facilitated the design of six KASPar assays. Additionally, haplotype analysis and pedigree tracking of 93 U.S. ancestral lines were performed using publicly available WGRS datasets. Identified SNP markers were validated, and a strong correlation was observed between the genotype and salt treatment phenotype (leaf scorch, chlorophyll content and Na+ accumulation) using a panel of 104 soybean lines and an interspecific bi-parental population (F8) from PI483463 x Hutcheson. These markers precisely identified salt-tolerant/sensitive genotypes (>91%) and the different structural variants (>98%). These SNP assays, supported by accurate phenotyping, haplotype analyses and pedigree tracking information, will accelerate marker-assisted selection programs to enhance the development of salt-tolerant soybean cultivars. PMID:26781337
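
The "15x" figure is a mean sequencing depth. Under the standard Lander-Waterman model, mean coverage is C = N·L/G for N reads of length L over a genome of size G. With reads placed uniformly at random, the expected fraction of bases with zero coverage is approximately e^(-C). The sketch below assumes that idealized uniform placement, which real libraries only approximate. The example numbers are illustrative, not taken from the study.

```python
import math


def mean_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Lander-Waterman mean depth C = N * L / G."""
    return n_reads * read_length / genome_size


def fraction_uncovered(coverage: float) -> float:
    """Poisson approximation of the fraction of bases with zero reads: exp(-C)."""
    return math.exp(-coverage)


if __name__ == "__main__":
    # Illustrative numbers only: 1 million 150-bp reads on a 10-Mb genome.
    c = mean_coverage(1_000_000, 150, 10_000_000)
    print(f"mean depth {c:.1f}x, expected uncovered fraction {fraction_uncovered(c):.2e}")
```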

  13. Comparative genome analysis to identify SNPs associated with high oleic acid and elevated protein content in soybean.

    PubMed

    Kulkarni, Krishnanand P; Patil, Gunvant; Valliyodan, Babu; Vuong, Tri D; Shannon, J Grover; Nguyen, Henry T; Lee, Jeong-Dong

    2018-03-01

    The objective of this study was to determine the genetic relationship between oleic acid and protein content. The genotypes having high oleic acid and elevated protein (HOEP) content were crossed with five elite lines having normal oleic acid and average protein (NOAP) content. The selected accessions were grown in six environments at three different locations and phenotyped for protein, oil, and fatty acid components. The mean protein content of parents, HOEP, and NOAP lines was 34.6%, 38%, and 34.9%, respectively. The oleic acid concentration of parents, HOEP, and NOAP lines was 21.7%, 80.5%, and 20.8%, respectively. The HOEP plants carried both FAD2-1A (S117N) and FAD2-1B (P137R) mutant alleles contributing to the high oleic acid phenotype. Comparative genome analysis using whole-genome resequencing data identified six genes carrying single nucleotide polymorphisms (SNPs) significantly associated with the traits analyzed. A single SNP in the putative gene Glyma.10G275800 was associated with elevated protein content and with palmitic, oleic, and linoleic acids. The genes from the marker intervals of previously identified QTL did not carry SNPs associated with protein content and fatty acid composition in the lines used in this study, indicating that all of the genes except Glyma.10G278000 may be novel genes associated with the respective traits.

  14. PrimerSuite: A High-Throughput Web-Based Primer Design Program for Multiplex Bisulfite PCR.

    PubMed

    Lu, Jennifer; Johnston, Andrew; Berichon, Philippe; Ru, Ke-Lin; Korbie, Darren; Trau, Matt

    2017-01-24

    The analysis of DNA methylation at CpG dinucleotides has become a major research focus due to its regulatory role in numerous biological processes, but the need for assays that amplify bisulfite-converted DNA is a major bottleneck because of the unique design constraints imposed on bisulfite-PCR primers. Moreover, a review of the literature found no available software that combined high-throughput primer design, support for multiplex amplification assays, and primer-dimer prediction. In response, the three-module software package PrimerSuite was developed to support bisulfite multiplex PCR applications. This software was constructed to (i) design bisulfite primers against multiple regions simultaneously (PrimerSuite), (ii) screen for primer-primer dimerizing artefacts (PrimerDimer), and (iii) support multiplex PCR assays (PrimerPlex). A major focus in the development of this software package was extensive empirical validation: over 1300 unique primer pairs have been successfully designed and screened, with over 94% of them producing amplicons of the expected size and an average mapping efficiency of 93% when screened using bisulfite multiplex resequencing. The potential use of the software in other bisulfite-based applications such as methylation-specific PCR is under consideration for future updates. This resource is freely available for use at the PrimerSuite website (www.primer-suite.com).
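
The design constraint mentioned above comes from the conversion itself. Bisulfite treatment followed by PCR turns unmethylated cytosines into thymines, while methylated cytosines, mostly at CpG sites in mammals, remain C. The sketch below produces the expected converted top strand under the simplifying assumption that every CpG is either fully methylated or fully unmethylated. It is not PrimerSuite code, and the function name is hypothetical.

```python
def bisulfite_convert(seq: str, methylated_cpg: bool = True) -> str:
    """Expected top-strand sequence after bisulfite treatment and PCR.

    Non-CpG cytosines are assumed unmethylated and become T. CpG cytosines
    stay C if `methylated_cpg` is True and become T otherwise.
    """
    seq = seq.upper()
    converted = []
    for i, base in enumerate(seq):
        if base == "C":
            in_cpg = i + 1 < len(seq) and seq[i + 1] == "G"
            converted.append("C" if in_cpg and methylated_cpg else "T")
        else:
            converted.append(base)
    return "".join(converted)
```

Comparing the two outputs for a candidate primer site shows why bisulfite primers are commonly placed away from CpGs. Only positions that do not depend on methylation state give primers that anneal regardless of methylation. The same conversion also lowers sequence complexity, which makes primer-dimer screening more important.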

  15. Larva-mediated chalkbrood resistance-associated single nucleotide polymorphism markers in the honey bee Apis mellifera.

    PubMed

    Liu, Y; Yan, L; Li, Z; Huang, W-F; Pokhrel, S; Liu, X; Su, S

    2016-06-01

    Chalkbrood is a honey bee disease that seriously impairs brood growth and the productivity of diseased colonies. Although honey bees can develop chalkbrood resistance naturally, the mechanisms of resistance are not fully understood, and no easy method is currently available for selecting and breeding resistant bees. Finding the genes involved in the development of resistance and identifying single nucleotide polymorphisms (SNPs) that can be used as molecular markers of resistance is therefore a high priority. We conducted genome resequencing to compare resistant (Res) and susceptible (Sus) larvae that were selected following in vitro chalkbrood inoculation. Twelve genomic libraries, comprising 14.4 Gb of sequence data, were analysed using SNP-finding algorithms. Unique SNPs derived from chromosomes 2 and 11 were analysed in this study. SNPs from resistant individuals were confirmed by PCR and Sanger sequencing using in vitro reared larvae and resistant colonies. We found strong support for an association between the C allele at SNP C2587245T and chalkbrood resistance. SNP C2587245T may be useful as a genetic marker for the selection of chalkbrood-resistant and high-royal-jelly-producing honey bee lines, thereby helping to minimize the negative effects of chalkbrood on managed honey bees. © 2016 The Royal Entomological Society.

  16. Population resequencing reveals candidate genes associated with salinity adaptation of the Pacific oyster Crassostrea gigas.

    PubMed

    She, Zhicai; Li, Li; Meng, Jie; Jia, Zhen; Que, Huayong; Zhang, Guofan

    2018-06-06

    The Pacific oyster Crassostrea gigas is an important cultivated shellfish. As a euryhaline species, it has evolved adaptive mechanisms responding to the complex and changeable intertidal environment that it inhabits. To investigate the genetic basis of this salinity adaptation, we conducted a genome-wide association study using phenotypically differentiated populations (hyposalinity and hypersalinity adaptation populations, and a control population) and confirmed our results using an independent population, high-resolution melting, and mRNA expression analysis. For hyposalinity adaptation, we identified 24 genes, including Cg_CLCN7 (chloride channel protein 7) and Cg_AP1 (apoptosis 1 inhibitor), involved in ion/water channel and transporter mechanisms, free amino acid and reactive oxygen species metabolism, immune responses, and chemical defence. Three SNPs located on these two genes were significantly differentiated between groups, as was Cg_CLCN7. For hypersalinity adaptation, the biological process "positive regulation of developmental process" was enriched. Enriched gene functions were focused on transcriptional regulation, signal transduction, and cell growth and differentiation, and included calmodulin (Cg_CaM) and ficolin-2 (Cg_FCN2). These genes and polymorphisms may play an important role in oyster hyposalinity and hypersalinity adaptation. They not only further our understanding of salinity adaptation mechanisms but also provide markers for highly adaptable oyster strains suitable for breeding.

  17. Surfactant Protein-C Promoter Variants Associated with Neonatal Respiratory Distress Syndrome Reduce Transcription

    PubMed Central

    Wambach, Jennifer A.; Yang, Ping; Wegner, Daniel J.; An, Ping; Hackett, Brian P.; Cole, F. S.; Hamvas, Aaron

    2010-01-01

    Dominant mutations in coding regions of the surfactant protein-C gene (SFTPC) cause respiratory distress syndrome (RDS) in infants. However, the contribution of variants in noncoding regions of SFTPC to pulmonary phenotypes is unknown. In a case-control group of infants of ≥34 weeks' gestation (n=538), we combined complete resequencing of SFTPC and its promoter, genotyping, and logistic regression, and identified 80 single nucleotide polymorphisms (SNPs). Three promoter SNPs were statistically associated with neonatal RDS among infants of European descent. To assess the transcriptional effects of these three promoter SNPs, we selectively mutated the SFTPC promoter and performed transient transfection using MLE-15 cells and a firefly luciferase reporter vector. Each promoter SNP decreased SFTPC transcription. The combination of two variants in high linkage disequilibrium also decreased SFTPC transcription. In silico evaluation of transcription factor binding showed that the rare allele at g.-1167 disrupts a SOX (SRY-related high mobility group box) consensus motif and introduces a GATA-1 site, the rare allele at g.-2385 removes an MZF-1 (myeloid zinc finger) binding site, and the rare allele at g.-1647 removes a potential methylation site. This combined statistical, in vitro, and in silico approach suggests that reduced SFTPC transcription contributes to the genetic risk for neonatal RDS in developmentally susceptible infants. PMID:20539253
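
"High linkage disequilibrium" between two biallelic SNPs is usually quantified with D, D′ or r². The abstract does not say which measure was used. The sketch below computes all three from a haplotype frequency p_AB and allele frequencies p_A and p_B, which are assumed known here. In practice they are typically estimated from unphased genotypes, for example by expectation-maximization. The function name is hypothetical.

```python
def linkage_disequilibrium(p_ab: float, p_a: float, p_b: float):
    """Return (D, D', r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B.
    p_a, p_b: frequencies of alleles A and B. D' is signed (same sign as D).
    """
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max
    r2 = d ** 2 / denom
    return d, d_prime, r2
```

When r² is close to 1, the two promoter SNPs almost always travel together on the same haplotype. This is why their combined effect on transcription was tested as a unit.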

  18. High-resolution genetic maps of Eucalyptus improve Eucalyptus grandis genome assembly.

    PubMed

    Bartholomé, Jérôme; Mandrou, Eric; Mabiala, André; Jenkins, Jerry; Nabihoudine, Ibouniyamine; Klopp, Christophe; Schmutz, Jeremy; Plomion, Christophe; Gion, Jean-Marc

    2015-06-01

    Genetic maps are key tools in genetic research because they provide the framework for many applications, such as quantitative trait locus analysis, and support the assembly of genome sequences. The resequencing of the two parents of a cross between Eucalyptus urophylla and Eucalyptus grandis was used to design a single nucleotide polymorphism (SNP) array of 6000 markers evenly distributed along the E. grandis genome. The genotyping of 1025 offspring enabled the construction of two high-resolution genetic maps containing 1832 and 1773 markers, with average marker intervals of 0.45 and 0.5 cM for E. grandis and E. urophylla, respectively. The comparison between the genetic maps and the reference genome showed that 85% of regions were collinear. A total of 43 noncollinear regions and 13 nonsyntenic regions were detected and corrected in the new genome assembly. This improved version contains 4943 scaffolds totalling 691.3 Mb, of which 88.6% were captured by the 11 chromosomes. The mapping data were also used to investigate the effect of population size and number of markers on linkage mapping accuracy. This study provides the most reliable linkage maps for Eucalyptus and version 2.0 of the E. grandis genome. © 2014 CIRAD. New Phytologist © 2014 New Phytologist Trust.

  19. Genome-wide patterns of copy number variation in the Chinese yak genome.

    PubMed

    Zhang, Xiao; Wang, Kun; Wang, Lizhong; Yang, Yongzhi; Ni, Zhengqiang; Xie, Xiuyue; Shao, Xuemin; Han, Jin; Wan, Dongshi; Qiu, Qiang

    2016-05-20

    Copy number variation (CNV) represents an important source of genetic divergence that can produce drastic phenotypic differences and may therefore be subject to selection during domestication and environmental adaptation. To investigate the evolutionary dynamics of CNV in the yak genome, we used a read depth approach to detect CNV based on genome resequencing data from 14 wild and 65 domestic yaks and determined CNV regions related to domestication and adaptations to high-altitude. We identified 2,634 CNV regions (CNVRs) comprising a total of 153 megabases (5.7 % of the yak genome) and 3,879 overlapping annotated genes. Comparison between domestic and wild yak populations identified 121 potentially selected CNVRs, harboring genes related to neuronal development, reproduction, nutrition and energy metabolism. In addition, we found 85 CNVRs that are significantly different between domestic yak living in high- and low-altitude areas, including three genes related to hypoxia response and six related to immune defense. This analysis shows that genic CNVs may play an important role in phenotypic changes during yak domestication and adaptation to life at high-altitude. We present the first refined CNV map for yak along with comprehensive genomic analysis of yak CNV. Our results provide new insights into the genetic basis of yak domestication and adaptation to living in a high-altitude environment, as well as a valuable genetic resource that will facilitate future CNV association studies of important traits in yak and other bovid species.
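
The read-depth approach rests on a simple proportionality: in a diploid genome, a window's read depth relative to the typical depth estimates its copy number (CN ≈ 2 × depth / median depth). The sketch below shows only that core step. The gain and loss cut-offs are hypothetical. Real callers also correct for GC content and mappability, segment adjacent windows, and merge calls across individuals into CNV regions. None of these steps is reproduced here.

```python
from statistics import median


def call_cnv_windows(window_depths, gain_cutoff=2.5, loss_cutoff=1.5):
    """Classify windows as gain/loss/normal from read depth (diploid baseline).

    Copy number is estimated as 2 * depth / median depth; the cut-offs are
    illustrative assumptions, not values from the study.
    """
    baseline = median(window_depths)
    if baseline == 0:
        raise ValueError("median depth is zero; cannot normalize")
    calls = []
    for depth in window_depths:
        copy_number = 2 * depth / baseline
        if copy_number >= gain_cutoff:
            state = "gain"
        elif copy_number <= loss_cutoff:
            state = "loss"
        else:
            state = "normal"
        calls.append((round(copy_number, 2), state))
    return calls
```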

  20. Genome sequencing of the high oil crop sesame provides insight into oil biosynthesis

    PubMed Central

    2014-01-01

    Background: Sesame, Sesamum indicum L., is considered the queen of oilseeds for its high oil content and quality, and is grown widely in tropical and subtropical areas as an important source of oil and protein. However, the molecular biology of sesame is largely unexplored. Results: Here, we report a high-quality genome sequence of sesame assembled de novo with a contig N50 of 52.2 kb and a scaffold N50 of 2.1 Mb, containing an estimated 27,148 genes. The results reveal a novel, independent whole-genome duplication and the absence of the Toll/interleukin-1 receptor domain in resistance genes. Candidate genes and oil biosynthetic pathways contributing to high oil content were discovered by comparative genomic and transcriptomic analyses. These revealed the expansion of type 1 lipid transfer genes by tandem duplication, the contraction of lipid degradation genes, and the differential expression of essential genes in the triacylglycerol biosynthesis pathway, particularly in the early stage of seed development. Resequencing data from 29 sesame accessions from 12 countries suggested that the high genetic diversity of lipid-related genes might be associated with the wide variation in oil content. Additionally, the results shed light on the pivotal stage of seed development, oil accumulation and potential key genes for sesamin production, an important pharmacological constituent of sesame. Conclusions: As an important species from the order Lamiales and a high-oil crop, the sesame genome will facilitate future research on the evolution of eudicots, as well as the study of lipid biosynthesis and potential genetic improvement of sesame. PMID:24576357
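    Contig and scaffold N50, quoted above as 52.2 kb and 2.1 Mb, is a standard measure of assembly contiguity: the length L such that sequences of length at least L together contain at least half of all assembled bases. A minimal Python sketch of the calculation follows; the function name n50 is illustrative, and the toy lengths are not data from the sesame assembly.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)  # longest sequences first
    if not ordered:
        raise ValueError("no sequences supplied")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Stop at the first sequence at which half the assembly is covered.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: the loop always reaches the total")


print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
```

    Comparing 2 * running with total keeps the test in integer arithmetic and avoids rounding issues from dividing the total by two.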

  1. Genome-wide re-sequencing of multidrug-resistant Mycobacterium leprae Airaku-3.

    PubMed

    Singh, P; Benjak, A; Carat, S; Kai, M; Busso, P; Avanzi, C; Paniz-Mondolfi, A; Peter, C; Harshman, K; Rougemont, J; Matsuoka, M; Cole, S T

    2014-10-01

    Genotyping and molecular characterization of drug resistance mechanisms in Mycobacterium leprae enables disease transmission and drug resistance trends to be monitored. In the present study, we performed genome-wide analysis of Airaku-3, a multidrug-resistant strain with an unknown mechanism of resistance to rifampicin. We identified 12 unique non-synonymous single-nucleotide polymorphisms (SNPs) including two in the transporter-encoding ctpC and ctpI genes. In addition, two SNPs were found that improve the resolution of SNP-based genotyping, particularly for Venezuelan and South East Asian strains of M. leprae. © 2014 The Authors Clinical Microbiology and Infection © 2014 European Society of Clinical Microbiology and Infectious Diseases.

  2. Probable Zoonotic Leprosy in the Southern United States

    PubMed Central

    Truman, Richard W.; Singh, Pushpendra; Sharma, Rahul; Busso, Philippe; Rougemont, Jacques; Paniz-Mondolfi, Alberto; Kapopoulou, Adamandia; Brisse, Sylvain; Scollard, David M.; Gillis, Thomas P.; Cole, Stewart T.

    2011-01-01

    BACKGROUND: In the southern United States, including Louisiana and Texas, there are autochthonous cases of leprosy among native-born Americans with no history of foreign exposure. In the same region, as well as in Mexico, wild armadillos are infected with Mycobacterium leprae. METHODS: Whole-genome resequencing of M. leprae from one wild armadillo and three U.S. patients with leprosy revealed that the infective strains were essentially identical. Comparative genomic analysis of these strains and M. leprae strains from Asia and Brazil identified 51 single-nucleotide polymorphisms and an 11-bp insertion–deletion. We genotyped these polymorphic sites, in combination with 10 variable-number tandem repeats, in M. leprae strains obtained from 33 wild armadillos from five southern states, 50 U.S. outpatients seen at a clinic in Louisiana, and 64 Venezuelan patients, as well as in four foreign reference strains. RESULTS: The M. leprae genotype of patients with foreign exposure generally reflected their country of origin or travel history. However, a unique M. leprae genotype (3I-2-v1) was found in 28 of the 33 wild armadillos and 25 of the 39 U.S. patients who resided in areas where exposure to armadillo-borne M. leprae was possible. This genotype has not been reported elsewhere in the world. CONCLUSIONS: Wild armadillos and many patients with leprosy in the southern United States are infected with the same strain of M. leprae. Armadillos are a large natural reservoir for M. leprae, and leprosy may be a zoonosis in the region. (Funded by the National Institute of Allergy and Infectious Diseases and others.) PMID:21524213

  3. High throughput SNP discovery and genotyping in hexaploid wheat.

    PubMed

    Rimbert, Hélène; Darrier, Benoît; Navarro, Julien; Kitt, Jonathan; Choulet, Frédéric; Leveugle, Magalie; Duarte, Jorge; Rivière, Nathalie; Eversole, Kellye; Le Gouis, Jacques; Davassi, Alessandro; Balfourier, François; Le Paslier, Marie-Christine; Berard, Aurélie; Brunel, Dominique; Feuillet, Catherine; Poncet, Charles; Sourdille, Pierre; Paux, Etienne

    2018-01-01

    Because of their abundance and their amenability to high-throughput genotyping techniques, Single Nucleotide Polymorphisms (SNPs) are powerful tools for efficient genetics and genomics studies, including characterization of genetic resources, genome-wide association studies and genomic selection. In wheat, most of the previous SNP discovery initiatives targeted the coding fraction, leaving almost 98% of the wheat genome largely unexploited. Here we report on the use of whole-genome resequencing data from eight wheat lines to mine for SNPs in the genic, the repetitive and the non-repetitive intergenic fractions of the wheat genome. In total, we identified 3.3 million SNPs, 49% located on the B-genome, 41% on the A-genome and 10% on the D-genome. We also describe the development of the TaBW280K high-throughput genotyping array containing 280,226 SNPs. Performance of this chip was examined by genotyping a set of 96 wheat accessions representing the worldwide diversity. Sixty-nine percent of the SNPs can be efficiently scored, half of them showing a diploid-like clustering. The TaBW280K proved to be a very efficient tool for diversity analyses, as well as for breeding, as it can discriminate between closely related elite varieties. Finally, the TaBW280K array was used to genotype a population derived from a cross between Chinese Spring and Renan, leading to the construction of a dense genetic map comprising 83,721 markers. The results described here will provide the wheat community with powerful tools for both basic and applied research.

  4. Genome-Wide Sequence Variation Identification and Floral-Associated Trait Comparisons Based on the Re-sequencing of the ‘Nagafu No. 2’ and ‘Qinguan’ Varieties of Apple (Malus domestica Borkh.)

    PubMed Central

    Xing, Libo; Zhang, Dong; Song, Xiaomin; Weng, Kai; Shen, Yawen; Li, Youmei; Zhao, Caiping; Ma, Juanjuan; An, Na; Han, Mingyu

    2016-01-01

    Apple (Malus domestica Borkh.) is a commercially important fruit worldwide. Detailed information on genomic DNA polymorphisms, which are important for understanding phenotypic traits, is lacking for the apple. We re-sequenced two elite apple varieties, ‘Nagafu No. 2’ and ‘Qinguan,’ which have different characteristics. We identified many genomic variations, including 2,771,129 single nucleotide polymorphisms (SNPs), 82,663 structural variations (SVs), and 1,572,803 insertion/deletions (INDELs) in ‘Nagafu No. 2’ and 2,262,888 SNPs, 63,764 SVs, and 1,294,060 INDELs in ‘Qinguan.’ The SNP, INDEL, and SV distributions were non-random, with variation-rich or -poor regions throughout the genomes. In ‘Nagafu No. 2’ and ‘Qinguan’ there were 171,520 and 147,090 non-synonymous SNPs spanning 23,111 and 21,400 genes, respectively; 3,963 and 3,196 SVs in 3,431 and 2,815 genes, respectively; and 1,834 and 1,451 INDELs in 1,681 and 1,345 genes, respectively. Genetic linkage maps of 190 flowering genes associated with multiple flowering pathways in ‘Nagafu No. 2,’ ‘Qinguan,’ and ‘Golden Delicious’ identified complex regulatory mechanisms involved in floral induction, flower bud formation, and flowering characteristics, which might reflect the genetic variation of the flowering genes. Expression profiling of key flowering genes in buds and leaves suggested that the photoperiod and autonomous flowering pathways are major contributors to the different floral-associated traits between ‘Nagafu No. 2’ and ‘Qinguan.’ The genome variation data provide a foundation for the further exploration of apple diversity and gene–phenotype relationships, and for future research on molecular breeding to improve apple and related species. PMID:27446138

  5. Sequencing degraded DNA from non-destructively sampled museum specimens for RAD-tagging and low-coverage shotgun phylogenetics.

    PubMed

    Tin, Mandy Man-Ying; Economo, Evan Philip; Mikheyev, Alexander Sergeyevich

    2014-01-01

    Ancient and archival DNA samples are valuable resources for the study of diverse historical processes. In particular, museum specimens provide access to biotas distant in time and space, and can provide insights into ecological and evolutionary changes over time. However, archival specimens are difficult to handle; they are often fragile and irreplaceable, and typically contain only short segments of denatured DNA. Here we present a set of tools for processing such samples for state-of-the-art genetic analysis. We first report a protocol for minimally destructive DNA extraction from insect museum specimens, which produced sequenceable DNA from all of the samples assayed. The 11 specimens analyzed had fragmented DNA, rarely exceeding 100 bp in length, and could not be amplified by conventional PCR targeting the mitochondrial cytochrome oxidase I gene. Our approach made these samples amenable to analysis with commonly used next-generation sequencing-based molecular analytic tools, including RAD-tagging and shotgun genome re-sequencing. First, we used museum ant specimens from three species, each with its own reference genome, for RAD-tag mapping. We were able to use the degraded DNA fragments, which were sequenced in full, to identify duplicate reads and filter them prior to base calling. Second, we re-sequenced six Hawaiian Drosophila species, with millions of years of divergence but only a single available reference genome. Despite shallow coverage of 0.37 ± 0.42 per base, we could recover a sufficient number of overlapping SNPs to fully resolve the species tree, which was consistent with earlier karyotypic and molecular studies, at least in the regions of the tree that these studies could resolve. Although developed for use with degraded DNA, all of these techniques are readily applicable to more recent tissue and are suitable for liquid handling automation.

  6. An investigation of obesity susceptibility genes in Northern Han Chinese by targeted resequencing.

    PubMed

    Wu, Yili; Wang, Weijing; Jiang, Wenjie; Yao, Jie; Zhang, Dongfeng

    2017-02-01

    Our earlier genome-wide linkage study of body mass index (BMI) showed strong signals from 7q36.3 and 8q21.13. This case-control study set out to investigate these 2 genomic regions, which may harbor variants contributing to the development of obesity. We employed targeted resequencing technology to detect single nucleotide polymorphisms (SNPs) in 7q36.3 and 8q21.13 in 16 individuals with obesity. These were compared with 504 East Asians in the 1000 Genomes Project as a reference panel. Linkage disequilibrium (LD) block analysis was performed for the significant SNPs located near the same gene. Genes involved in statistically significant loci were then subjected to gene set enrichment analysis (GSEA). The 16 individuals were aged between 30 and 60 years, with BMI = 33.25 ± 2.22 kg/m². A total of 12,131 genetic variants were found across all samples. After correcting for multiple testing, 65 SNPs from 25 nearest genes (INSIG1, FABP5, PTPRN2, VIPR2, WDR60, SHH, UBE3C, LMBR1, PAG1, IMPA1, CHMP4, SNX16, BLACE, EN2, CNPY1, LOC100506302, RBM33, LOC389602, LOC285889, LINC01006, NOM1, DNAJB6, LOC101927914, ESYT2, LINC00689) were associated with obesity at a significance level of q-value ≤ 0.05. LD block analysis showed 10 pairs of loci with D' ≥ 0.8 and r² ≥ 0.8. GSEA further identified 2 major related gene sets, involving lipid raft and lipid metabolic process, with FDR values <0.12 and <0.4, respectively. Our data are the first to document genetic variants in 7q36.3 and 8q21.13 associated with obesity using target capture sequencing and Northern Han Chinese samples. Additional replication and functional studies are warranted to validate our findings.
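    Pairwise LD in block analyses such as this one is commonly summarized by D' and r², both derived from two-locus haplotype frequencies: D = p(AB) − p(A)p(B), D' rescales |D| by its maximum attainable value given the allele frequencies, and r² = D² / [p(A)(1 − p(A))p(B)(1 − p(B))]. The following minimal Python sketch assumes phased haplotype counts for two biallelic loci (alleles A/a and B/b) are already available; the function pairwise_ld is a hypothetical illustration, not the software used in this study, which the abstract does not name.

```python
from typing import Tuple


def pairwise_ld(n_AB: int, n_Ab: int, n_aB: int, n_ab: int) -> Tuple[float, float]:
    """Return (|D'|, r^2) for two biallelic loci from phased haplotype counts."""
    total = n_AB + n_Ab + n_aB + n_ab
    if total == 0:
        raise ValueError("no haplotypes supplied")
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total  # frequency of allele A at the first locus
    p_B = (n_AB + n_aB) / total  # frequency of allele B at the second locus
    if p_A in (0.0, 1.0) or p_B in (0.0, 1.0):
        raise ValueError("both loci must be polymorphic")
    d = p_AB - p_A * p_B  # raw coefficient of linkage disequilibrium
    # D' divides |D| by the largest |D| possible for these allele frequencies.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max
    # r^2 is the squared correlation between alleles at the two loci.
    r2 = d * d / (p_A * (1 - p_A) * p_B * (1 - p_B))
    return d_prime, r2


d_prime, r2 = pairwise_ld(30, 20, 0, 50)
print(f"D' = {d_prime:.2f}, r^2 = {r2:.2f}")  # D' = 1.00, r^2 = 0.43
```

    The example pair reaches D' = 1 yet has an r² of only about 0.43, because the two loci have different allele frequencies; this is one reason studies may require thresholds on both measures before calling a pair of loci tightly linked.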

  7. Whole-genome resequencing of honeybee drones to detect genomic selection in a population managed for royal jelly.

    PubMed

    Wragg, David; Marti-Marimon, Maria; Basso, Benjamin; Bidanel, Jean-Pierre; Labarthe, Emmanuelle; Bouchez, Olivier; Le Conte, Yves; Vignal, Alain

    2016-06-03

    Four main evolutionary lineages of A. mellifera have been described including eastern Europe (C) and western and northern Europe (M). Many apiculturists prefer bees from the C lineage due to their docility and high productivity. In France, the routine importation of bees from the C lineage has resulted in the widespread admixture of bees from the M lineage. The haplodiploid nature of the honeybee Apis mellifera, and its small genome size, permits affordable and extensive genomics studies. As a pilot study of a larger project to characterise French honeybee populations, we sequenced 60 drones sampled from two commercial populations managed for the production of honey and royal jelly. Results indicate a C lineage origin, whilst mitochondrial analysis suggests two drones originated from the O lineage. Analysis of heterozygous SNPs identified potential copy number variants near to genes encoding odorant binding proteins and several cytochrome P450 genes. Signatures of selection were detected using the hapFLK haplotype-based method, revealing several regions under putative selection for royal jelly production. The framework developed during this study will be applied to a broader sampling regime, allowing the genetic diversity of French honeybees to be characterised in detail.

  8. Whole-genome resequencing of honeybee drones to detect genomic selection in a population managed for royal jelly

    PubMed Central

    Wragg, David; Marti-Marimon, Maria; Basso, Benjamin; Bidanel, Jean-Pierre; Labarthe, Emmanuelle; Bouchez, Olivier; Le Conte, Yves; Vignal, Alain

    2016-01-01

    Four main evolutionary lineages of A. mellifera have been described including eastern Europe (C) and western and northern Europe (M). Many apiculturists prefer bees from the C lineage due to their docility and high productivity. In France, the routine importation of bees from the C lineage has resulted in the widespread admixture of bees from the M lineage. The haplodiploid nature of the honeybee Apis mellifera, and its small genome size, permits affordable and extensive genomics studies. As a pilot study of a larger project to characterise French honeybee populations, we sequenced 60 drones sampled from two commercial populations managed for the production of honey and royal jelly. Results indicate a C lineage origin, whilst mitochondrial analysis suggests two drones originated from the O lineage. Analysis of heterozygous SNPs identified potential copy number variants near to genes encoding odorant binding proteins and several cytochrome P450 genes. Signatures of selection were detected using the hapFLK haplotype-based method, revealing several regions under putative selection for royal jelly production. The framework developed during this study will be applied to a broader sampling regime, allowing the genetic diversity of French honeybees to be characterised in detail. PMID:27255426

  9. WhopGenome: high-speed access to whole-genome variation and sequence data in R.

    PubMed

    Wittelsbürger, Ulrich; Pfeifer, Bastian; Lercher, Martin J

    2015-02-01

    The statistical programming language R has become a de facto standard for the analysis of many types of biological data, and is well suited for the rapid development of new algorithms. However, variant call data from population-scale resequencing projects are typically too large to be read and processed efficiently with R's built-in I/O capabilities. WhopGenome can efficiently read whole-genome variation data stored in the widely used variant call format (VCF) into several R data types. VCF files can be accessed either on local hard drives or on remote servers. WhopGenome can associate variants with annotations such as those available from the UCSC genome browser, and can accelerate the reading process by filtering loci according to user-defined criteria. WhopGenome can also read other Tabix-indexed files and create indices to allow fast selective access to FASTA-formatted sequence files. The WhopGenome R package is available on CRAN at http://cran.r-project.org/web/packages/WhopGenome/. A Bioconductor package has been submitted. © The Author 2014. Published by Oxford University Press. All rights reserved.

  10. Integrated sequence analysis pipeline provides one-stop solution for identifying disease-causing mutations.

    PubMed

    Hu, Hao; Wienker, Thomas F; Musante, Luciana; Kalscheuer, Vera M; Kahrizi, Kimia; Najmabadi, Hossein; Ropers, H Hilger

    2014-12-01

    Next-generation sequencing has greatly accelerated the search for disease-causing defects, but even for experts the data analysis can be a major challenge. To facilitate data processing in a clinical setting, we have developed a novel medical resequencing analysis pipeline (MERAP). MERAP assesses sequencing quality and is optimized for calling variants, including single-nucleotide variants, insertions and deletions, copy-number variation, and other structural variants. MERAP identifies polymorphic and known causal variants by filtering against public domain databases, and flags nonsynonymous and splice-site changes. MERAP uses a logistic model to estimate the causal likelihood of a given missense variant. MERAP considers relevant information such as phenotype and interaction with known disease-causing genes. MERAP compares favorably with GATK, one of the most widely used tools, because of its higher sensitivity for detecting indels, its easy installation, and its economical use of computational resources. Upon testing more than 1,200 individuals with mutations in known and novel disease genes, MERAP proved highly reliable, as illustrated here for five families with disease-causing variants. We believe that the clinical implementation of MERAP will expedite the diagnostic process of many disease-causing defects. © 2014 WILEY PERIODICALS, INC.

  11. Genome Re-Sequencing of Semi-Wild Soybean Reveals a Complex Soja Population Structure and Deep Introgression

    PubMed Central

    Wu, Sanling; Wang, Ying-Ying; Ye, Chu-Yu; Bai, Xuefei; Li, Zefeng; Yan, Chenghai; Wang, Weidi; Wang, Ziqiang; Shu, Qingyao; Xie, Jiahua; Lee, Suk-Ha; Fan, Longjiang

    2014-01-01

    Semi-wild soybean is a unique type of soybean that retains both wild and domesticated characteristics, which provides an important intermediate type for understanding the evolution of the subgenus Soja population in the Glycine genus. In this study, a semi-wild soybean line (Maliaodou) and a wild line (Lanxi 1) collected from the lower Yangtze regions were deeply sequenced, while nine other semi-wild lines were sequenced to 3-fold genome coverage. Sequence analysis revealed that (1) no independent phylogenetic branch covering all 10 semi-wild lines was observed in the Soja phylogenetic tree; (2) besides two distinct subpopulations of wild and cultivated soybean in the Soja population structure, all semi-wild lines were mixed with some wild lines into a subpopulation rather than forming an independent subpopulation or an intermediate transition type of soybean domestication; (3) high heterozygosity rates (0.19–0.49) were observed in several semi-wild lines; and (4) over 100 putative selective regions were identified by selective sweep analysis, including those related to the development of seed size. Our results suggest a hybridization origin for the semi-wild soybean, which gives rise to a complex Soja population structure. PMID:25265539

  12. Genome-wide variation within and between wild and domestic yak.

    PubMed

    Wang, Kun; Hu, Quanjun; Ma, Hui; Wang, Lizhong; Yang, Yongzhi; Luo, Wenchun; Qiu, Qiang

    2014-07-01

    The yak is one of the few animals that can thrive in the harsh environment of the Qinghai-Tibetan Plateau and adjacent Alpine regions. Yak provides essential resources allowing Tibetans to live at high altitudes. However, genetic variation within and between wild and domestic yak remains unknown. Here, we present a genome-wide study of the genetic variation within and between wild and domestic yak. Using next-generation sequencing technology, we resequenced three wild and three domestic yak with a mean of fivefold coverage, using our published domestic yak genome as a reference. We identified a total of 8.38 million SNPs (7.14 million novel), 383,241 InDels and 126,352 structural variants between the six yak. We observed higher linkage disequilibrium in domestic yak than in wild yak and a modest but distinct genetic divergence between these two groups. We further identified more than a thousand potential selected regions (PSRs) for the three domestic yak by scanning the whole genome. These genomic resources can be further used to study genetic diversity and select superior breeds of yak and other bovid species. © 2014 John Wiley & Sons Ltd.

  13. Whole-genome sequencing of cultivated and wild peppers provides insights into Capsicum domestication and specialization

    PubMed Central

    Qin, Cheng; Yu, Changshui; Shen, Yaou; Fang, Xiaodong; Chen, Lang; Min, Jiumeng; Cheng, Jiaowen; Zhao, Shancen; Xu, Meng; Luo, Yong; Yang, Yulan; Wu, Zhiming; Mao, Likai; Wu, Haiyang; Ling-Hu, Changying; Zhou, Huangkai; Lin, Haijian; González-Morales, Sandra; Trejo-Saavedra, Diana L.; Tian, Hao; Tang, Xin; Zhao, Maojun; Huang, Zhiyong; Zhou, Anwei; Yao, Xiaoming; Cui, Junjie; Li, Wenqi; Chen, Zhe; Feng, Yongqiang; Niu, Yongchao; Bi, Shimin; Yang, Xiuwei; Li, Weipeng; Cai, Huimin; Luo, Xirong; Montes-Hernández, Salvador; Leyva-González, Marco A.; Xiong, Zhiqiang; He, Xiujing; Bai, Lijun; Tan, Shu; Tang, Xiangqun; Liu, Dan; Liu, Jinwen; Zhang, Shangxing; Chen, Maoshan; Zhang, Lu; Zhang, Li; Zhang, Yinchao; Liao, Weiqin; Zhang, Yan; Wang, Min; Lv, Xiaodan; Wen, Bo; Liu, Hongjun; Luan, Hemi; Zhang, Yonggang; Yang, Shuang; Wang, Xiaodian; Xu, Jiaohui; Li, Xueqin; Li, Shuaicheng; Wang, Junyi; Palloix, Alain; Bosland, Paul W.; Li, Yingrui; Krogh, Anders; Rivera-Bustamante, Rafael F.; Herrera-Estrella, Luis; Yin, Ye; Yu, Jiping; Hu, Kailin; Zhang, Zhiming

    2014-01-01

    As an economically important crop, pepper is valued worldwide for its spicy taste and its medicinal uses. To gain a better understanding of Capsicum evolution, domestication, and specialization, we present here the genome sequence of the cultivated pepper Zunla-1 (C. annuum L.) and its wild progenitor Chiltepin (C. annuum var. glabriusculum). We estimate that the pepper genome expanded ∼0.3 Mya (with respect to the genomes of other Solanaceae) through a rapid amplification of retrotransposon elements, resulting in a genome composed of ∼81% repetitive sequences. Approximately 79% of the 3.48-Gb scaffold sequence, containing 34,476 protein-coding genes, was anchored to chromosomes by a high-density genetic map. Comparison of cultivated and wild pepper genomes with 20 resequencing accessions revealed molecular footprints of artificial selection, providing us with a list of candidate domestication genes. We also found that the dosage compensation effect of tandemly duplicated genes probably contributed to the diversification of pungency in pepper. The Capsicum reference genome provides crucial information for the study of the evolution not only of the pepper genome but also of the Solanaceae family, and it will facilitate the establishment of more effective pepper breeding programs. PMID:24591624

  14. Low-pH production of D-lactic acid using newly isolated acid tolerant yeast Pichia kudriavzevii NG7.

    PubMed

    Park, Hyun Joo; Bae, Jung-Hoon; Ko, Hyeok-Jin; Lee, Sun-Hee; Sung, Bong Hyun; Han, Jong-In; Sohn, Jung-Hoon

    2018-06-13

    Lactic acid is a platform chemical for the sustainable production of various materials. To develop a robust yeast platform for low-pH production of D-lactic acid, an acid-tolerant yeast strain was isolated from grape skins and named Pichia kudriavzevii NG7 by ribosomal RNA sequencing. This strain was able to grow at pH 2.0 and 50°C. For the commercial application of P. kudriavzevii NG7 as a lactic acid producer, the ethanol fermentation pathway was redirected to lactic acid by replacing pyruvate decarboxylase 1 gene (PDC1) with D-lactate dehydrogenase gene (D-LDH) derived from Lactobacillus plantarum. To enhance lactic acid tolerance, this engineered strain was adapted to high lactic acid concentrations, and a new transcriptional regulator, PAR1, responsible for acid tolerance, was identified by whole-genome resequencing. The final engineered strain produced 135 g/L and 154 g/L of D-lactic acid with productivity over 3.66 g/L/h at pH 3.6 and 4.16 g/L/h at pH 4.7, respectively. This article is protected by copyright. All rights reserved.

  15. Discovery, genotyping and characterization of structural variation and novel sequence at single nucleotide resolution from de novo genome assemblies on a population scale.

    PubMed

    Liu, Siyang; Huang, Shujia; Rao, Junhua; Ye, Weijian; Krogh, Anders; Wang, Jun

    2015-01-01

    Comprehensive recognition of genomic variation in one individual is important for understanding disease and developing personalized medication and treatment. Many tools based on DNA re-sequencing exist for identification of single nucleotide polymorphisms, small insertions and deletions (indels) as well as large deletions. However, these approaches consistently display a substantial bias against the recovery of complex structural variants and novel sequence in individual genomes, and do not provide interpretive information such as the annotation of ancestral state and formation mechanism. We present a novel approach implemented in a single software package, AsmVar, to discover, genotype and characterize different forms of structural variation and novel sequence from population-scale de novo genome assemblies at up to single-nucleotide resolution. Application of AsmVar to several human de novo genome assemblies captures a wide spectrum of structural variants and novel sequences present in the human population with high sensitivity and specificity. Our method provides a direct solution for investigating structural variants and novel sequences from de novo genome assemblies, facilitating the construction of population-scale pan-genomes. Our study also highlights the usefulness of the de novo assembly strategy for definition of genome structure.

  16. Identification of an EMS-induced causal mutation in a gene required for boron-mediated root development by low-coverage genome re-sequencing in Arabidopsis

    PubMed Central

    Tabata, Ryo; Kamiya, Takehiro; Shigenobu, Shuji; Yamaguchi, Katsushi; Yamada, Masashi; Hasebe, Mitsuyasu; Fujiwara, Toru; Sawa, Shinichiro

    2013-01-01

    Next-generation sequencing (NGS) technologies enable the rapid production of an enormous quantity of sequence data. These powerful new technologies allow the identification of mutations by whole-genome sequencing. However, most reported NGS-based mapping methods, which are based on bulked segregant analysis, are costly and laborious. To address these limitations, we designed a versatile NGS-based mapping method that combines low- to medium-coverage multiplex SOLiD (Sequencing by Oligonucleotide Ligation and Detection) with classical genetic rough mapping. Using only low to medium coverage reduces the SOLiD sequencing costs and, since just 10 to 20 mutant F2 plants are required for rough mapping, the operation is simple enough to handle in a laboratory with limited space and funding. As a proof of principle, we successfully applied this method to identify CTR1, which is involved in boron-mediated root development, from among a population of high-boron-requiring Arabidopsis thaliana mutants. Our work demonstrates that this NGS-based mapping method is moderately priced and versatile, and can readily be applied to other model organisms. PMID:23104114

  17. Signatures of selection in tilapia revealed by whole genome resequencing

    PubMed Central

    Hong Xia, Jun; Bai, Zhiyi; Meng, Zining; Zhang, Yong; Wang, Le; Liu, Feng; Jing, Wu; Yi Wan, Zi; Li, Jiale; Lin, Haoran; Hua Yue, Gen

    2015-01-01

    Natural selection and selective breeding for genetic improvement have left detectable signatures within the genome of a species. Identification of selection signatures is important in evolutionary biology and for detecting genes that can help accelerate genetic improvement. However, signatures of both artificial and natural selection have been identified at the whole-genome level in only a few genetically improved fish species. Tilapia is one of the most important genetically improved fish species in the world. Using next-generation sequencing, we sequenced the genomes of 47 tilapia individuals. We identified a total of 1.43 million high-quality SNPs and found that LD block sizes in tilapia ranged from 10–100 kb. We detected over a hundred putative selective sweep regions in each line of tilapia. Most selection signatures were located in non-coding regions of the tilapia genome. The Wnt signaling, gonadotropin-releasing hormone receptor and integrin signaling pathways were under positive selection in all improved tilapia lines. Our study provides a genome-wide map of genetic variation and selection footprints in tilapia, which could be important for genetic studies and for accelerating genetic improvement of tilapia. PMID:26373374

  18. Human Y Chromosome Haplogroup N: A Non-trivial Time-Resolved Phylogeography that Cuts across Language Families.

    PubMed

    Ilumäe, Anne-Mai; Reidla, Maere; Chukhryaeva, Marina; Järve, Mari; Post, Helen; Karmin, Monika; Saag, Lauri; Agdzhoyan, Anastasiya; Kushniarevich, Alena; Litvinov, Sergey; Ekomasova, Natalya; Tambets, Kristiina; Metspalu, Ene; Khusainova, Rita; Yunusbayev, Bayazit; Khusnutdinova, Elza K; Osipova, Ludmila P; Fedorova, Sardana; Utevska, Olga; Koshel, Sergey; Balanovska, Elena; Behar, Doron M; Balanovsky, Oleg; Kivisild, Toomas; Underhill, Peter A; Villems, Richard; Rootsi, Siiri

    2016-07-07

    The paternal haplogroup (hg) N is distributed from southeast Asia to eastern Europe. The demographic processes that have shaped the vast extent of this major Y chromosome lineage across numerous linguistically and autosomally divergent populations had not previously been resolved. On the basis of 94 high-coverage re-sequenced Y chromosomes, we establish and date a detailed hg N phylogeny. We evaluate geographic structure by using 16 distinguishing binary markers in 1,631 hg N Y chromosomes from a collection of 6,521 samples from 56 populations. The more southerly distributed sub-clade N4 emerged before N2a1 and N3, found mostly in the north, but the latter two display more elaborate branching patterns, indicative of regional contrasts in recent expansions. In particular, a number of prominent and well-defined clades with common N3a3'6 ancestry occur in regionally dissimilar northern Eurasian populations, indicating almost simultaneous regional diversification and expansion within the last 5,000 years. This patrilineal genetic affinity is decoupled from the associated higher degree of language diversity. Copyright © 2016 American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.

  19. Reduced levels of protein recoding by A-to-I RNA editing in Alzheimer's disease

    PubMed Central

    Khermesh, Khen; D'Erchia, Anna Maria; Barak, Michal; Annese, Anita; Wachtel, Chaim; Levanon, Erez Y.; Picardi, Ernesto; Eisenberg, Eli

    2016-01-01

    Adenosine-to-inosine (A-to-I) RNA editing, catalyzed by the ADAR enzyme family, acts on dsRNA structures within pre-mRNA molecules. Editing of the coding part of the mRNA may lead to recoding, that is, amino acid substitution in the resulting protein, possibly modifying its biochemical and biophysical properties. Altered RNA editing patterns have been observed in various neurological pathologies. Here, we present a comprehensive study of recoding by RNA editing in Alzheimer's disease (AD), the most common cause of irreversible dementia. We used a targeted resequencing approach, supplemented by microfluidics-based high-throughput PCR coupled with next-generation sequencing, to accurately quantify A-to-I RNA editing levels at a preselected set of target sites, mostly located within the coding sequences of synaptic genes. Overall, editing levels decreased in the brain tissues of AD patients, mainly in the hippocampus and to a lesser degree in the temporal and frontal lobes. Differential RNA editing levels were observed at 35 target sites within 22 genes. These results may shed light on a possible association between the neurodegenerative processes typical of AD and deficient RNA editing. PMID:26655226

  20. New Genes and New Insights from Old Genes: Update on Alzheimer Disease

    PubMed Central

    Ringman, John M.; Coppola, Giovanni

    2013-01-01

    Purpose of Review: This article discusses the current status of knowledge regarding the genetic basis of Alzheimer disease (AD) with a focus on clinically relevant aspects. Recent Findings: The genetic architecture of AD is complex, as it includes multiple susceptibility genes and likely nongenetic factors. Rare but highly penetrant autosomal dominant mutations explain a small minority of the cases but have allowed tremendous advances in understanding disease pathogenesis. The identification of a strong genetic risk factor, APOE, reshaped the field and introduced the notion of genetic risk for AD. More recently, large-scale genome-wide association studies are adding to the picture a number of common variants with very small effect sizes. Large-scale resequencing studies are expected to identify additional risk factors, including rare susceptibility variants and structural variation. Summary: Genetic assessment is currently of limited utility in clinical practice because of the low frequency (Mendelian mutations) or small effect size (common risk factors) of the currently known susceptibility genes. However, genetic studies are identifying with confidence a number of novel risk genes, which will further our understanding of disease biology and may aid the identification of therapeutic targets. PMID:23558482

  1. Mutations Affecting the SAND Domain of DEAF1 Cause Intellectual Disability with Severe Speech Impairment and Behavioral Problems

    PubMed Central

    Vulto-van Silfhout, Anneke T.; Rajamanickam, Shivakumar; Jensik, Philip J.; Vergult, Sarah; de Rocker, Nina; Newhall, Kathryn J.; Raghavan, Ramya; Reardon, Sara N.; Jarrett, Kelsey; McIntyre, Tara; Bulinski, Joseph; Ownby, Stacy L.; Huggenvik, Jodi I.; McKnight, G. Stanley; Rose, Gregory M.; Cai, Xiang; Willaert, Andy; Zweier, Christiane; Endele, Sabine; de Ligt, Joep; van Bon, Bregje W.M.; Lugtenberg, Dorien; de Vries, Petra F.; Veltman, Joris A.; van Bokhoven, Hans; Brunner, Han G.; Rauch, Anita; de Brouwer, Arjan P.M.; Carvill, Gemma L.; Hoischen, Alexander; Mefford, Heather C.; Eichler, Evan E.; Vissers, Lisenka E.L.M.; Menten, Björn; Collard, Michael W.; de Vries, Bert B.A.

    2014-01-01

    Recently, we identified different de novo mutations in DEAF1 in two individuals with intellectual disability (ID); DEAF1 encodes a transcription factor with an important role in embryonic development. To ascertain whether these mutations in DEAF1 are causative for the ID phenotype, we performed targeted resequencing of DEAF1 in an additional cohort of over 2,300 individuals with unexplained ID and identified two additional individuals with de novo mutations in this gene. All four individuals had severe ID with severely affected speech development, and three showed severe behavioral problems. DEAF1 is highly expressed in the CNS, especially during early embryonic development. All four mutations were missense mutations affecting the SAND domain of DEAF1. Altered DEAF1 harboring any of the four amino acid changes showed impaired transcriptional regulation of the DEAF1 promoter. Moreover, behavioral studies in mice with a conditional knockout of Deaf1 in the brain showed memory deficits and increased anxiety-like behavior. Our results demonstrate that mutations in DEAF1 cause ID and behavioral problems, most likely as a result of impaired transcriptional regulation by DEAF1. PMID:24726472

  2. Exploring the Genetic Signature of Body Size in Yucatan Miniature Pig

    PubMed Central

    Kim, Hyeongmin; Song, Ki Duk; Kim, Hyeon Jeong; Park, WonCheoul; Kim, Jaemin; Lee, Taeheon; Shin, Dong-Hyun; Kwak, Woori; Kwon, Young-jun; Sung, Samsun; Moon, Sunjin; Lee, Kyung-Tai; Kim, Namshin; Hong, Joon Ki; Eo, Kyung Yeon; Seo, Kang Seok; Kim, Girak; Park, Sungmoo; Yun, Cheol-Heui; Kim, Hyunil; Choi, Kimyung; Kim, Jiho; Lee, Woon Kyu; Kim, Duk-Kyung; Oh, Jae-Don; Kim, Eui-Soo; Cho, Seoae; Lee, Hak-Kyo; Kim, Tae-Hun; Kim, Heebal

    2015-01-01

    Since being domesticated about 10,000–12,000 years ago, domestic pigs (Sus scrofa domesticus) have been selected for traits of economic importance, in particular large body size. In contrast, Yucatan miniature pigs have been selected for small body size, both to withstand high-temperature environments and for laboratory use. This makes the Yucatan miniature pig a valuable model for understanding the evolution of body size. We investigated the genetic signature of selection for body size in the Yucatan miniature pig. The phylogenetic distance between the Yucatan miniature pig and larger swine (the Yorkshire, Landrace and Duroc breeds, and wild boar) was assessed. By estimating the XP-EHH statistic from re-sequencing data for 70 pigs, we were able to uncover signatures of selection for body size. We found evidence of selection at both the organismal and the cellular level. Selection at the organismal level involves feed intake, regulation of body weight and increase in mass, whereas selection at the molecular level involves the cell cycle and cell proliferation. Positively selected genes identified by XP-EHH may provide insight into the docile character and innate immunity, as well as the body size, of the Yucatan miniature pig. PMID:25885114

  3. Development and evaluation of high-density Axiom® CicerSNP Array for high-resolution genetic mapping and breeding applications in chickpea.

    PubMed

    Roorkiwal, Manish; Jain, Ankit; Kale, Sandip M; Doddamani, Dadakhalandar; Chitikineni, Annapurna; Thudi, Mahendar; Varshney, Rajeev K

    2018-04-01

    To accelerate genomics research and molecular breeding applications in chickpea, a high-throughput SNP genotyping platform, the 'Axiom® CicerSNP Array', has been designed, developed and validated. Screening of whole-genome resequencing data from 429 chickpea lines identified 4.9 million SNPs, from which a subset of 70 463 high-quality nonredundant SNPs was selected using several stringent filtering criteria. This set was further narrowed down to 61 174 SNPs based on a p-convert score ≥0.3, of which 50 590 SNPs could be tiled on the array. Among these tiled SNPs, 11 245 (22.23%) were from the coding regions of 3673 different genes. The Axiom® CicerSNP Array was used to genotype two recombinant inbred line populations, namely ICCRIL03 (ICC 4958 × ICC 1882) and ICCRIL04 (ICC 283 × ICC 8261). The genotyping data showed high success and polymorphism rates, with 15 140 (29.93%; ICCRIL03) and 20 018 (39.57%; ICCRIL04) polymorphic SNPs. High-density genetic maps comprising 13 679 SNPs spanning 1033.67 cM and 7769 SNPs spanning 1076.35 cM were developed for the ICCRIL03 and ICCRIL04 populations, respectively. QTL analysis using multilocation, multiseason phenotyping data on these RILs identified 70 (ICCRIL03) and 120 (ICCRIL04) main-effect QTLs on the genetic maps. The high precision and potential of this array are expected to advance chickpea genetics and breeding applications. © 2017 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.
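
    The three percentages quoted for the array all appear to use the 50 590 tiled SNPs as their denominator, including the per-population polymorphism rates. A quick Python check of that reading, using only the counts reported above (the variable and function names are illustrative, not part of the array's documentation):

```python
# Counts reported in the abstract; the assumed denominator is the number of
# SNPs that could be tiled on the array.
TILED_SNPS = 50_590

reported_counts = {
    "coding-region SNPs": 11_245,
    "polymorphic in ICCRIL03": 15_140,
    "polymorphic in ICCRIL04": 20_018,
}


def percent_of_tiled(count: int, tiled: int = TILED_SNPS) -> float:
    """Return count as a percentage of the tiled SNPs, rounded to two decimals."""
    return round(100 * count / tiled, 2)


for label, count in reported_counts.items():
    print(f"{label}: {percent_of_tiled(count):.2f}%")
```

    The printed values (22.23%, 29.93% and 39.57%) match the abstract, which is consistent with every figure being a share of the tiled set rather than of the 61 174 or 70 463 SNP subsets.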

  4. Hypoxia adaptations in the grey wolf (Canis lupus chanco) from Qinghai-Tibet Plateau.

    PubMed

    Zhang, Wenping; Fan, Zhenxin; Han, Eunjung; Hou, Rong; Zhang, Liang; Galaverni, Marco; Huang, Jie; Liu, Hong; Silva, Pedro; Li, Peng; Pollinger, John P; Du, Lianming; Zhang, XiuyYue; Yue, Bisong; Wayne, Robert K; Zhang, Zhihe

    2014-07-01

    The Tibetan grey wolf (Canis lupus chanco) occupies habitats on the Qinghai-Tibet Plateau, a high-altitude (>3000 m) environment where low oxygen tension exerts unique selection pressure on individuals to adapt to hypoxic conditions. To identify genes involved in hypoxia adaptation, we generated complete genome sequences of nine Chinese wolves from high- and low-altitude populations at an average coverage of 25×. We found that, beginning about 55,000 years ago, the highland Tibetan grey wolf suffered a more substantial population decline than lowland wolves. Positively selected hypoxia-related genes in highland wolves are enriched in the HIF signaling pathway (P = 1.57E-6), ATP binding (P = 5.62E-5), and response to an oxygen-containing compound (P≤5.30E-4). Of these positively selected hypoxia-related genes, three (EPAS1, ANGPT1, and RYR2) had at least one specific fixed non-synonymous SNP in highland wolves based on the nine genomes. Our re-sequencing studies on a large panel of individuals showed a frequency difference greater than 58% between highland and lowland wolves for these specific fixed non-synonymous SNPs, and a high degree of LD surrounding the three genes, which implies strong selection. Past studies have shown that EPAS1 and ANGPT1 are important in the response to hypoxic stress, and RYR2 is involved in heart function. These three genes also exhibited significant signals of natural selection in high-altitude human populations, suggesting similar evolutionary constraints on natural selection in wolves and humans of the Qinghai-Tibet Plateau.

  5. The History of Tree and Shrub Taxa on Bol’shoy Lyakhovsky Island (New Siberian Archipelago) since the Last Interglacial Uncovered by Sedimentary Ancient DNA and Pollen Data

    PubMed Central

    Raschke, Elena; Epp, Laura S.; Stoof-Leichsenring, Kathleen R.; Schwamborn, Georg; Herzschuh, Ulrike

    2017-01-01

    Ecosystem boundaries, such as the Arctic-Boreal treeline, are strongly coupled with climate and were spatially highly dynamic during past glacial-interglacial cycles. Only a few studies cover vegetation changes since the last interglacial, as most of the former landscapes are inundated and difficult to access. Using pollen analysis and sedimentary ancient DNA (sedaDNA) metabarcoding of permafrost sediments, we reveal vegetation changes on Bol’shoy Lyakhovsky Island since the last interglacial. Last interglacial samples depict high levels of floral diversity, with the presence of trees (Larix, Picea, Populus) and shrubs (Alnus, Betula, Ribes, Cornus, Saliceae) on the currently treeless island. After the Last Glacial Maximum, Larix re-colonised the island but disappeared along with most shrub taxa. This was probably caused by Holocene sea-level rise, which led to more oceanic conditions on the island. Additionally, we applied two newly developed larch-specific chloroplast markers to evaluate their potential for tracking past population dynamics from environmental samples. The novel markers were successfully re-sequenced, and two variants of each marker were found in last interglacial samples. SedaDNA can track both vegetation changes and genetic changes across geographic space through time, and can improve our understanding of past processes that shape modern patterns. PMID:29027988

  6. Lessons learned from implementing a national infrastructure in Sweden for storage and analysis of next-generation sequencing data

    PubMed Central

    2013-01-01

    Analyzing and storing data and results from next-generation sequencing (NGS) experiments is a challenging task, hampered by ever-increasing data volumes and frequent updates of analysis methods and tools. Storage and computation needs have grown beyond the capacity of personal computers, and suitable e-infrastructures for processing are needed. Here we describe UPPNEX, an implementation of such an infrastructure, tailored to the storage and analysis needs of NGS data in Sweden and serving various labs and multiple instruments from the major sequencing technology platforms. UPPNEX comprises resources for high-performance computing, large-scale and high-availability storage, an extensive bioinformatics software suite, up-to-date reference genomes and annotations, a support function with system and application experts, and a web portal and support ticket system. UPPNEX applications are numerous and diverse, and include whole-genome, de novo and exome sequencing, targeted resequencing, SNP discovery, RNA-Seq, and methylation analysis. More than 300 projects use UPPNEX, including large undertakings such as the sequencing of the flycatcher and Norway spruce genomes. We describe the strategic decisions made when investing in hardware, setting up maintenance and support, and allocating resources, and illustrate major challenges such as managing data growth. We conclude by summarizing our experiences and observations with UPPNEX to date, providing insights into the successful and less successful decisions made. PMID:23800020

  7. Furfural-tolerant Zymomonas mobilis derived from error-prone PCR-based whole genome shuffling and their tolerant mechanism.

    PubMed

    Huang, Suzhen; Xue, Tingli; Wang, Zhiquan; Ma, Yuanyuan; He, Xueting; Hong, Jiefang; Zou, Shaolan; Song, Hao; Zhang, Minhua

    2018-04-01

    A furfural-tolerant strain is essential for the fermentative production of biofuels or chemicals from lignocellulosic biomass. In this study, Zymomonas mobilis CP4 was subjected for the first time to error-prone PCR-based whole genome shuffling, and the resulting mutants F211 and F27, which could tolerate 3 g/L furfural, were obtained. Under various furfural stress conditions, mutant F211 grew rapidly once the furfural concentration had decreased to 1 g/L. Both mutants also showed higher tolerance to high glucose concentrations than the control strain CP4. Genome resequencing revealed 12 and 13 single-nucleotide polymorphisms in F211 and F27, respectively. Activity assays showed that NADH-dependent furfural reductase activity increased under furfural stress in both mutant F211 and CP4, and that the activity peaked earlier in the mutant than in the control. The furfural level in the F211 culture also decreased more rapidly. These results indicate that the increased furfural tolerance of the mutants may result from enhanced NADH-dependent furfural reductase activity during the early log phase, which could accelerate furfural detoxification in the mutants. In summary, we obtained Z. mobilis mutants with enhanced tolerance to furfural and to high glucose concentrations, and provided valuable clues to the mechanism of furfural tolerance and for strain development.

  8. Draft genome of the peanut A-genome progenitor (Arachis duranensis) provides insights into geocarpy, oil biosynthesis, and allergens

    PubMed Central

    Chen, Xiaoping; Li, Hongjie; Pandey, Manish K.; Yang, Qingli; Wang, Xiyin; Garg, Vanika; Li, Haifen; Chi, Xiaoyuan; Doddamani, Dadakhalandar; Hong, Yanbin; Upadhyaya, Hari; Guo, Hui; Khan, Aamir W.; Zhu, Fanghe; Zhang, Xiaoyan; Pan, Lijuan; Pierce, Gary J.; Zhou, Guiyuan; Krishnamohan, Katta A. V. S.; Chen, Mingna; Zhong, Ni; Agarwal, Gaurav; Li, Shuanzhu; Chitikineni, Annapurna; Zhang, Guo-Qiang; Sharma, Shivali; Chen, Na; Liu, Haiyan; Janila, Pasupuleti; Li, Shaoxiong; Wang, Min; Wang, Tong; Sun, Jie; Li, Xingyu; Li, Chunyan; Wang, Mian; Yu, Lina; Wen, Shijie; Singh, Sube; Yang, Zhen; Zhao, Jinming; Zhang, Chushu; Yu, Yue; Bi, Jie; Zhang, Xiaojun; Paterson, Andrew H.; Wang, Shuping; Liang, Xuanqiang; Varshney, Rajeev K.; Yu, Shanlin

    2016-01-01

    Peanut or groundnut (Arachis hypogaea L.), a legume of South American origin, has high seed oil content (45–56%) and is a staple crop in semiarid tropical and subtropical regions, partially because of drought tolerance conferred by its geocarpic reproductive strategy. We present a draft genome of the peanut A-genome progenitor, Arachis duranensis, and 50,324 protein-coding gene models. Patterns of gene duplication suggest the peanut lineage has been affected by at least three polyploidizations since the origin of eudicots. Resequencing of synthetic Arachis tetraploids reveals extensive gene conversion in only three seed-to-seed generations since their formation by human hands, indicating that this process begins virtually immediately following polyploid formation. Expansion of some specific gene families suggests roles in the unusual subterranean fructification of Arachis. For example, the S1Fa-like transcription factor family has 126 Arachis members, in contrast to no more than five members in other examined plant species, and is more highly expressed in roots and etiolated seedlings than green leaves. The A. duranensis genome provides a major source of candidate genes for fructification, oil biosynthesis, and allergens, expanding knowledge of understudied areas of plant biology and human health impacts of plants, informing peanut genetic improvement and aiding deeper sequencing of Arachis diversity. PMID:27247390

  9. Emerging Genomic Tools for Legume Breeding: Current Status and Future Prospects

    PubMed Central

    Pandey, Manish K.; Roorkiwal, Manish; Singh, Vikas K.; Ramalingam, Abirami; Kudapa, Himabindu; Thudi, Mahendar; Chitikineni, Anu; Rathore, Abhishek; Varshney, Rajeev K.

    2016-01-01

    Legumes play a vital role in ensuring global nutritional food security and in improving soil quality through nitrogen fixation. Faster and higher genetic gains are required to meet the demands of an ever-increasing global population. In recent years, legume genomics has advanced rapidly owing to progress in next-generation sequencing (NGS) and high-throughput genotyping technologies. Reference genome sequences for many legume crops have been reported in the last 5 years. The availability of draft genome sequences and the re-sequencing of elite genotypes for several important legume crops have made it possible to identify structural variations at large scale. The availability of large-scale genomic resources and of low-cost, high-throughput genotyping technologies is enhancing the efficiency and resolution of genetic mapping and marker-trait association studies. Most importantly, deployment of molecular breeding approaches has resulted in the development of improved lines in some legume crops, such as chickpea and groundnut. To support genomics-driven crop improvement at a fast pace, the deployment of breeder-friendly genomics and decision support tools appears to be critical in breeding programs in developing countries. This review provides an overview of emerging genomics and informatics tools and approaches that will be the key driving force for accelerating genomics-assisted breeding and ultimately ensuring nutritional and food security in developing countries. PMID:27199998

  10. Dissection of genetic architecture of rice plant height and heading date by multiple-strategy-based association studies

    PubMed Central

    Zhou, Liyuan; Liu, Shouye; Wu, Weixun; Chen, Daibo; Zhan, Xiaodeng; Zhu, Aike; Zhang, Yingxin; Cheng, Shihua; Cao, Liyong; Lou, Xiangyang; Xu, Haiming

    2016-01-01

    Xieyou9308 is a certified super hybrid rice cultivar with high grain yield. To investigate the genetic basis of its high yield potential, a recombinant inbred line (RIL) population derived from the cross between the maintainer line XieqingzaoB (XQZB) and the restorer line Zhonghui9308 (ZH9308) was constructed to identify quantitative trait SNPs (QTSs) associated with two important agronomic traits, plant height (PH) and heading date (HD). By re-sequencing 138 RILs, a total of ~0.7 million SNPs were identified for association studies on PH and HD. Three association mapping strategies (hypothesis-free genome-wide association and two complementary hypothesis-engaged approaches, QTL-based association and gene-based association) were adopted for data analysis. Using a saturated mixed linear model including epistasis and environmental interaction, we identified a total of 31 QTSs associated with either PH or HD. The total estimated heritability across the three analyses ranged from 37.22% to 45.63% for PH and from 37.53% to 55.96% for HD. In this study we examined the feasibility of association studies in an experimental (RIL) population and, through multiple strategies, identified several common loci that could be preferred candidates for further research. PMID:27406081

  11. The genome sequence of sweet cherry (Prunus avium) for use in genomics-assisted breeding.

    PubMed

    Shirasawa, Kenta; Isuzugawa, Kanji; Ikenaga, Mitsunobu; Saito, Yutaro; Yamamoto, Toshiya; Hirakawa, Hideki; Isobe, Sachiko

    2017-10-01

    We determined the genome sequence of sweet cherry (Prunus avium) using next-generation sequencing technology. The total length of the assembled sequences was 272.4 Mb, consisting of 10,148 scaffold sequences with an N50 length of 219.6 kb. The sequences covered 77.8% of the 352.9 Mb sweet cherry genome, as estimated by k-mer analysis, and included >96.0% of the core eukaryotic genes. We predicted 43,349 complete and partial protein-encoding genes. A high-density consensus map with 2,382 loci was constructed using double-digest restriction site-associated DNA sequencing. Comparing the genetic maps of sweet cherry and peach revealed high synteny between the two genomes; thus the scaffolds were integrated into pseudomolecules using map- and synteny-based strategies. Whole-genome resequencing of six modern cultivars found 1,016,866 SNPs and 162,402 insertions/deletions, out of which 0.7% were deleterious. The sequence variants, as well as simple sequence repeats, can be used as DNA markers. The genomic information helps us to identify agronomically important genes and will accelerate genetic studies and breeding programs for sweet cherries. Further information on the genomic sequences and DNA markers is available in DBcherry (http://cherry.kazusa.or.jp (8 May 2017, date last accessed)). © The Author 2017. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
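
    The scaffold N50 quoted above is a standard contiguity statistic: the length L such that scaffolds of length L or longer together account for at least half of the total assembled length. A minimal Python sketch of that conventional definition follows; the function name and the toy lengths are hypothetical and are not taken from the sweet cherry assembly.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a collection of sequence lengths.

    Lengths are sorted from longest to shortest and accumulated; the N50 is
    the length at which the running total first reaches half of the total.
    """
    if not lengths:
        raise ValueError("n50() requires at least one length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: the running total always reaches the total")


toy_scaffolds = [2, 3, 4, 5, 6, 7, 8, 9, 10]  # hypothetical lengths in kb
print(n50(toy_scaffolds))  # 8, because 10 + 9 + 8 = 27 kb is half of the 54 kb total
```

    Because N50 depends only on the largest pieces, it summarizes contiguity rather than completeness, which is why the abstract reports genome coverage and core-gene content separately.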

  12. Whole genome sequences in pulse crops: a global community resource to expedite translational genomics and knowledge-based crop improvement.

    PubMed

    Bohra, Abhishek; Singh, Narendra P

    2015-08-01

    Unprecedented developments in legume genomics over the last decade have resulted in the acquisition of a wide range of modern genomic resources to underpin genetic improvement of grain legumes. Genome-enabled insights guide investigators in various ways, primarily by unearthing novel structural variations, retrieving lost genetic diversity, introducing novel or exotic alleles from wider gene pools, and finely resolving complex quantitative traits. To this end, the ready availability of cost-efficient, high-density genotyping assays means that genome-wide prediction is increasingly recognized as the key selection criterion in crop breeding. Further, high-dimensional measurements of agronomically significant phenotypes obtained with new-generation screening techniques will empower reference-based resequencing, as well as allele mining and trait mapping methods, to comprehensively associate genome diversity with phenome-scale variation. Besides stimulating forward genetic systems, access to precisely delineated genomic segments reveals novel candidates for reverse genetic techniques such as targeted genome editing. The shifting paradigm in plant genomics in turn necessitates optimization of crop breeding strategies to enable the most efficient integration of advanced omics knowledge and tools. We anticipate that crop improvement schemes will be remarkably bolstered by rational deployment of these genome-guided approaches, ultimately resulting in expanded plant breeding capacities and improved crop performance.

  13. Identification of a Novel Idiopathic Epilepsy Locus in Belgian Shepherd Dogs

    PubMed Central

    Seppälä, Eija H.; Koskinen, Lotta L. E.; Gulløv, Christina H.; Jokinen, Päivi; Karlskov-Mortensen, Peter; Bergamasco, Luciana; Baranowska Körberg, Izabella; Cizinauskas, Sigitas; Oberbauer, Anita M.; Berendt, Mette; Fredholm, Merete; Lohi, Hannes

    2012-01-01

    Epilepsy is the most common neurological disorder in dogs, with an incidence ranging from 0.5% to as high as 20% in particular breeds. Canine epilepsy can be etiologically defined as idiopathic or symptomatic. Epileptic seizures may be classified as focal with or without secondary generalization, or as primary generalized. Nine genes have been identified for symptomatic epilepsy (storage diseases) and one for idiopathic epilepsy in different breeds. However, the genetic background of common canine epilepsies remains unknown. We have studied the clinical and genetic background of epilepsy in Belgian Shepherds. We collected 159 cases and 148 controls and confirmed the presence of epilepsy through epilepsy questionnaires and clinical examinations. MRI findings were normal, whereas interictal EEG revealed abnormalities and variable foci in the clinically examined affected dogs. A genome-wide association study using Affymetrix 50K SNP arrays in 40 cases and 44 controls mapped the epilepsy locus to CFA37, and the association was replicated in an independent cohort (81 cases and 88 controls; combined p = 9.70 × 10⁻¹⁰, OR = 3.3). Fine mapping defined a ∼1 Mb region containing 12 genes, none of which are known epilepsy genes or encode ion channels. Exonic sequencing was performed for two candidate genes, KLF7 and ADAM23. No variation was found in KLF7, but a highly associated non-synonymous variant, G1203A (R387H), was present in the ADAM23 gene (p = 3.7 × 10⁻⁸, OR = 3.9 for homozygosity). Homozygosity for a two-SNP haplotype within the ADAM23 gene conferred the highest risk for epilepsy (p = 6.28 × 10⁻¹¹, OR = 7.4). ADAM23 interacts with the known epilepsy proteins LGI1 and LGI2. However, our data suggest that the ADAM23 variant is a polymorphism, and we have initiated a targeted re-sequencing study across the locus to identify the causative mutation. Its identification would establish the affected breed as a novel therapeutic model, help to develop a DNA test for breeding purposes and introduce a novel candidate gene for human idiopathic epilepsies. PMID:22457775

  14. SOPRA: Scaffolding algorithm for paired reads via statistical optimization.

    PubMed

    Dayarian, Adel; Michael, Todd P; Sengupta, Anirvan M

    2010-06-24

    High-throughput sequencing (HTS) platforms produce gigabases of short-read (<100 bp) data per run. While these short reads are adequate for resequencing applications, de novo assembly of moderate-size genomes from such reads remains a significant challenge. These limitations could be partially overcome by using mate pair technology, which provides pairs of short reads separated by a known distance along the genome. We have developed SOPRA, a tool designed to exploit mate pair/paired-end information for the assembly of short reads. The main focus of the algorithm is selecting a sufficiently large subset of simultaneously satisfiable mate pair constraints to achieve a balance between the size and the quality of the output scaffolds. Scaffold assembly is presented as an optimization problem for variables associated with the vertices and edges of the contig connectivity graph. The vertices of this graph are individual contigs, with edges drawn between contigs connected by mate pairs. Similar graph problems have been invoked in the context of shotgun sequencing and scaffold building for the previous generation of sequencing projects. However, given the error-prone nature of HTS data and the fundamental limitations imposed by the shortness of the reads, the ad hoc greedy algorithms used in earlier studies are likely to give poor-quality results in the current context. SOPRA circumvents this problem by treating all constraints on an equal footing when solving the optimization problem, with the solution itself indicating the problematic constraints (chimeric/repetitive contigs, etc.) to be removed. The cycle of solving and removing constraints is iterated until a core set of consistent constraints is reached. For SOLiD sequencer data, SOPRA uses a dynamic programming approach to robustly translate the color-space assembly to base-space. To assess the quality of an assembly, we report the no-match/mismatch error rate as well as the rates of various rearrangement errors. Applying SOPRA to real data from bacterial genomes, we were able to assemble contigs into scaffolds of significant length (N50 up to 200 kb) with very few errors introduced in the process. In general, the methodology presented here will allow better scaffold assemblies from any type of mate pair sequencing data.
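
    The abstract describes SOPRA's core loop: solve an optimization over all mate pair constraints at once, let the solution flag problematic contigs, remove them, and repeat until a consistent core remains. The Python sketch below illustrates only that loop, and only for contig orientations. It is not SOPRA's implementation: exhaustive search over orientations stands in for SOPRA's optimizer, insert-size distances are ignored, and the function names and toy link data are hypothetical.

```python
from itertools import product

# A link (a, b, j) means contigs a and b should satisfy orient[a] * orient[b] == j,
# where +1 is forward and -1 is reverse-complemented (hypothetical encoding).
Link = tuple[str, str, int]


def best_orientation(contigs: list[str], links: list[Link]):
    """Exhaustively find the orientation assignment that violates the fewest links."""
    best_orient, best_bad = None, None
    for signs in product((1, -1), repeat=len(contigs)):
        orient = dict(zip(contigs, signs))
        bad = [link for link in links if orient[link[0]] * orient[link[1]] != link[2]]
        if best_bad is None or len(bad) < len(best_bad):
            best_orient, best_bad = orient, bad
    return best_orient, best_bad


def consistent_core(contigs: list[str], links: list[Link]):
    """Repeatedly solve, then drop the contig involved in the most violated links."""
    contigs = list(contigs)
    removed = []
    while True:
        orient, bad = best_orientation(contigs, links)
        if not bad:
            return orient, removed
        violations: dict[str, int] = {}
        for a, b, _ in bad:
            violations[a] = violations.get(a, 0) + 1
            violations[b] = violations.get(b, 0) + 1
        worst = max(violations, key=violations.get)
        removed.append(worst)
        contigs.remove(worst)
        links = [link for link in links if worst not in link[:2]]


links = [
    ("A", "B", 1), ("B", "C", 1), ("A", "C", 1), ("C", "E", 1), ("B", "E", 1),
    ("D", "A", 1), ("D", "B", -1), ("D", "C", 1), ("D", "E", -1),  # D is inconsistent
]
orient, removed = consistent_core(["A", "B", "C", "D", "E"], links)
print(removed)  # ['D']
print(orient)   # {'A': 1, 'B': 1, 'C': 1, 'E': 1}
```

    In this toy graph, contig D closes two contradictory cycles (with A and B, and with C and E), so no orientation satisfies every link; removing D leaves a consistent core. Exhaustive search is exponential in the number of contigs and is used here only to keep the example self-contained.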

  15. Teaching children with autism to use photographic activity schedules: maintenance and generalization of complex response chains.

    PubMed Central

    MacDuff, G S; Krantz, P J; McClannahan, L E

    1993-01-01

    We used a graduated guidance procedure to teach 4 boys with autism to follow photographic activity schedules to increase on-task and on-schedule behavior. The multiple baseline across participants design included baseline, teaching, maintenance, resequencing of photographs, and generalization to novel photographs phases. The results indicated that photographic activity schedules (albums depicting after-school activities) produced sustained engagement, and skills generalized to a new sequence of photographs and to new photographs. The acquisition of schedule-following skills enabled these children with severe developmental disabilities to display lengthy response chains, independently change activities, and change activities in different group home settings in the absence of immediate supervision and prompts from others. PMID:8473261

  16. Ancient homology underlies adaptive mimetic diversity across butterflies

    PubMed Central

    Gallant, Jason R.; Imhoff, Vance E.; Martin, Arnaud; Savage, Wesley K.; Chamberlain, Nicola L.; Pote, Ben L.; Peterson, Chelsea; Smith, Gabriella E.; Evans, Benjamin; Reed, Robert D.; Kronforst, Marcus R.; Mullen, Sean P.

    2014-01-01

    Convergent evolution provides a rare, natural experiment with which to test the predictability of adaptation at the molecular level. Little is known about the molecular basis of convergence over macro-evolutionary timescales. Here we use a combination of positional cloning, population genomic resequencing, association mapping and developmental data to demonstrate that positionally orthologous nucleotide variants in the upstream region of the same gene, WntA, are responsible for parallel mimetic variation in two butterfly lineages that diverged >65 million years ago. Furthermore, characterization of spatial patterns of WntA expression during development suggests that alternative regulatory mechanisms underlie wing pattern variation in each system. Taken together, our results reveal a strikingly predictable molecular basis for phenotypic convergence over deep evolutionary time. PMID:25198507

  17. Increased heterocyst frequency by patN disruption in Anabaena leads to enhanced photobiological hydrogen production at high light intensity and high cell density.

    PubMed

    Masukawa, Hajime; Sakurai, Hidehiro; Hausinger, Robert P; Inoue, Kazuhito

    2017-03-01

    The effects of increasing the heterocyst-to-vegetative cell ratio on nitrogenase-based photobiological hydrogen production by the filamentous heterocyst-forming cyanobacterium Anabaena sp. PCC 7120 were studied. Using the uptake hydrogenase-disrupted mutant (ΔHup) as the parent, a deletion-insertion mutant (PN1) was created in patN, a gene known to be involved in heterocyst pattern formation whose disruption leads to multiple singular heterocysts (MSH) in Nostoc punctiforme strain ATCC 29133. The PN1 strain showed heterocyst differentiation but failed to grow in medium free of combined nitrogen. However, a spontaneous mutant (PN22) was obtained on prolonged incubation of PN1 liquid cultures and grew robustly on N₂. The disruption of patN was confirmed in both PN1 and PN22 by PCR and whole-genome resequencing. Under combined-nitrogen limitation, heterocysts made up 13-15% and 16-18% of total cells in PN22 filaments under air and 1% CO₂-enriched air, respectively, whereas the parent ΔHup formed 6.5-11% and 9.7-13% heterocysts under the same conditions. The PN22 strain exhibited an MSH phenotype, normal diazotrophic growth, and higher H₂ productivity at high cell concentrations, and was less susceptible to photoinhibition by strong light than the parent ΔHup strain, resulting in greater light energy utilization efficiency in H₂ production on a per-unit-area basis under high light conditions. Increasing MSH frequency, as shown here, appears to be a viable strategy for enhancing H₂ productivity by outdoor cultures of cyanobacteria in high-light environments.

  18. Optimizing the arrival, waiting, and NPO times of children on the day of pediatric endoscopy procedures.

    PubMed

    Smallman, Bettina; Dexter, Franklin

    2010-03-01

    Research in predictive variability of operating room (OR) times has been performed using data from multidisciplinary, tertiary hospitals with mostly adult patients. In this article, we discuss case-duration prediction for children receiving general anesthesia for endoscopy. We critique which of the several types of OR management decisions that depend on prediction accuracy are relevant to series (lists) of brief pediatric anesthetics. OR information system data were obtained for all children (aged 18 years and younger) undergoing a gastroenterology procedure with an anesthesiologist over 21 months. Summaries of data were used for a qualitative, systematic review of prior studies to learn which apply to brief pediatric cases. Patient arrival times were changed to be based on the statistical method relating actual and scheduled start times (Wachtel and Dexter, Anesth Analg 2007;105:127-40). Even perfect case-duration prediction would not affect whether a brief case was performed on a certain date and/or in a certain OR. There was no evidence of usefulness in calculating the probability that one case would last longer than another or in resequencing cases to influence postanesthesia care unit staffing or patient waiting from scheduled start times. The only decision for which the accuracy of case-duration prediction mattered was estimating the shortest time that the preceding cases in the OR may take. Knowledge of the preceding procedures in the OR was not useful for that purpose because there were hundreds of combinations of preceding procedures and some cases were cancelled. Instead, patient ready times were chosen based on 5% lower prediction bounds for ratios of actual to scheduled OR times. The approach proved useful, reducing patient waiting times from scheduled start times by 30%, with corresponding expected reductions in the average and peak numbers of patients in the holding area.
For brief pediatric OR anesthetics, predictive variability of case durations matters principally to the extent that it affects appropriate patient ready times. Such times should not be chosen by having patients start fasting, arrive, and be ready fixed numbers of hours before their scheduled start times.
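The ready-time rule summarized above can be sketched in a few lines. This is a simplified, hypothetical illustration, not the statistical method of Wachtel and Dexter cited in the abstract: it assumes a list of historical actual-to-scheduled duration ratios, takes their empirical 5th percentile (nearest-rank) as a stand-in for the lower prediction bound, and applies it to the summed scheduled durations of the preceding cases. The function names and example values are invented for illustration.

```python
import math
from typing import Sequence


def lower_ratio_bound(ratios: Sequence[float], alpha: float = 0.05) -> float:
    """Empirical alpha-quantile (nearest-rank) of actual/scheduled OR time ratios.

    Hypothetical stand-in for a formal lower prediction bound.
    """
    if not ratios:
        raise ValueError("need at least one historical ratio")
    ordered = sorted(ratios)
    # Nearest-rank: smallest observed ratio with at least alpha of the data at or below it.
    rank = max(1, math.ceil(alpha * len(ordered)))
    return ordered[rank - 1]


def patient_ready_offset(preceding_scheduled_min: Sequence[float],
                         ratios: Sequence[float], alpha: float = 0.05) -> float:
    """Minutes after the list start by which the next patient should be ready.

    The preceding cases could plausibly finish as early as
    (lower ratio bound) x (their total scheduled minutes).
    """
    return lower_ratio_bound(ratios, alpha) * sum(preceding_scheduled_min)


# Hypothetical history of actual/scheduled ratios and two preceding cases (20 and 25 min)
historical = [0.9, 1.1, 0.75, 1.0, 0.85, 1.2, 0.95, 1.05, 0.8, 0.9]
print(patient_ready_offset([20, 25], historical))  # prints 33.75
```

With only ten historical ratios, the 5% nearest-rank quantile is simply the smallest ratio; a longer history yields a less extreme bound.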

  19. Refining the Results of a Classical SELEX Experiment by Expanding the Sequence Data Set of an Aptamer Pool Selected for Protein A

    PubMed Central

    2018-01-01

    New, as yet undiscovered aptamers for Protein A were identified by applying next generation sequencing (NGS) to a previously selected aptamer pool. This pool was obtained in a classical SELEX (Systematic Evolution of Ligands by EXponential enrichment) experiment using the FluMag-SELEX procedure followed by cloning and Sanger sequencing. PA#2/8 was identified as the only Protein A-binding aptamer from the Sanger sequence pool, and was shown to be able to bind intact cells of Staphylococcus aureus. In this study, we show the extension of the SELEX results by re-sequencing of the same aptamer pool using a medium throughput NGS approach and data analysis. Both data pools were compared. They confirm the selection of a highly complex and heterogeneous oligonucleotide pool and show consistently a high content of orphans as well as a similar relative frequency of certain sequence groups. But in contrast to the Sanger data pool, the NGS pool was clearly dominated by one sequence group containing the known Protein A-binding aptamer PA#2/8 as the most frequent sequence in this group. In addition, we found two new sequence groups in the NGS pool represented by PA-C10 and PA-C8, respectively, which also have high specificity for Protein A. Comparative affinity studies reveal differences between the aptamers and confirm that PA#2/8 remains the most potent sequence within the selected aptamer pool reaching affinities in the low nanomolar range of KD = 20 ± 1 nM. PMID:29495282
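The frequency-based comparison of the two data pools can be illustrated with a small sketch. It is a deliberate simplification: the study groups aptamer sequences into sequence groups, whereas this code only counts identical reads, and it treats sequences seen exactly once as a rough proxy for orphans (an assumption). The function names and toy reads are hypothetical.

```python
from collections import Counter


def _count(reads):
    """Count identical reads after trimming whitespace and upper-casing; skip empty lines."""
    return Counter(r.strip().upper() for r in reads if r.strip())


def sequence_frequencies(reads):
    """Return (sequence, relative frequency) pairs, most frequent first."""
    counts = _count(reads)
    total = sum(counts.values())
    return [(seq, n / total) for seq, n in counts.most_common()]


def singleton_fraction(reads):
    """Fraction of distinct sequences observed exactly once (rough proxy for orphans)."""
    counts = _count(reads)
    if not counts:
        return 0.0
    return sum(1 for n in counts.values() if n == 1) / len(counts)


reads = ["ACGT", "ACGT", "ACGT", "TTGA", "CCAA"]  # hypothetical toy reads
print(sequence_frequencies(reads)[0])  # ('ACGT', 0.6)
print(singleton_fraction(reads))
```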

  20. Refining the Results of a Classical SELEX Experiment by Expanding the Sequence Data Set of an Aptamer Pool Selected for Protein A.

    PubMed

    Stoltenburg, Regina; Strehlitz, Beate

    2018-02-24

    New, as yet undiscovered aptamers for Protein A were identified by applying next generation sequencing (NGS) to a previously selected aptamer pool. This pool was obtained in a classical SELEX (Systematic Evolution of Ligands by EXponential enrichment) experiment using the FluMag-SELEX procedure followed by cloning and Sanger sequencing. PA#2/8 was identified as the only Protein A-binding aptamer from the Sanger sequence pool, and was shown to be able to bind intact cells of Staphylococcus aureus. In this study, we show the extension of the SELEX results by re-sequencing of the same aptamer pool using a medium throughput NGS approach and data analysis. Both data pools were compared. They confirm the selection of a highly complex and heterogeneous oligonucleotide pool and show consistently a high content of orphans as well as a similar relative frequency of certain sequence groups. But in contrast to the Sanger data pool, the NGS pool was clearly dominated by one sequence group containing the known Protein A-binding aptamer PA#2/8 as the most frequent sequence in this group. In addition, we found two new sequence groups in the NGS pool represented by PA-C10 and PA-C8, respectively, which also have high specificity for Protein A. Comparative affinity studies reveal differences between the aptamers and confirm that PA#2/8 remains the most potent sequence within the selected aptamer pool reaching affinities in the low nanomolar range of KD = 20 ± 1 nM.

  1. Expression profiling during arabidopsis/downy mildew interaction reveals a highly-expressed effector that attenuates responses to salicylic acid.

    PubMed

    Asai, Shuta; Rallapalli, Ghanasyam; Piquerez, Sophie J M; Caillaud, Marie-Cécile; Furzer, Oliver J; Ishaque, Naveed; Wirthmueller, Lennart; Fabro, Georgina; Shirasu, Ken; Jones, Jonathan D G

    2014-10-01

    Plants have evolved strong innate immunity mechanisms, but successful pathogens evade or suppress plant immunity via effectors delivered into the plant cell. Hyaloperonospora arabidopsidis (Hpa) causes downy mildew on Arabidopsis thaliana, and a genome sequence is available for isolate Emoy2. Here, we exploit the availability of genome sequences for Hpa and Arabidopsis to measure gene-expression changes in both Hpa and Arabidopsis simultaneously during infection. Using a high-throughput cDNA tag sequencing method, we reveal expression patterns of Hpa predicted effectors and Arabidopsis genes in compatible and incompatible interactions, and promoter elements associated with Hpa genes expressed during infection. By resequencing Hpa isolate Waco9, we found it evades Arabidopsis resistance gene RPP1 through deletion of the cognate recognized effector ATR1. Arabidopsis salicylic acid (SA)-responsive genes including PR1 were activated not only at early time points in the incompatible interaction but also at late time points in the compatible interaction. By histochemical analysis, we found that Hpa suppresses SA-inducible PR1 expression, specifically in the haustoriated cells into which host-translocated effectors are delivered, but not in non-haustoriated adjacent cells. Finally, we found a highly-expressed Hpa effector candidate that suppresses responsiveness to SA. As this approach can be easily applied to host-pathogen interactions for which both host and pathogen genome sequences are available, this work opens the door towards transcriptome studies in infection biology that should help unravel pathogen infection strategies and the mechanisms by which host defense responses are overcome.

  2. The draft genome and transcriptome of Cannabis sativa

    PubMed Central

    2011-01-01

    Background: Cannabis sativa has been cultivated throughout human history as a source of fiber, oil and food, and for its medicinal and intoxicating properties. Selective breeding has produced cannabis plants for specific uses, including high-potency marijuana strains and hemp cultivars for fiber and seed production. The molecular biology underlying cannabinoid biosynthesis and other traits of interest is largely unexplored. Results: We sequenced genomic DNA and RNA from the marijuana strain Purple Kush using short-read approaches. We report a draft haploid genome sequence of 534 Mb and a transcriptome of 30,000 genes. Comparison of the transcriptome of Purple Kush with that of the hemp cultivar 'Finola' revealed that many genes encoding proteins involved in cannabinoid and precursor pathways are more highly expressed in Purple Kush than in 'Finola'. The exclusive occurrence of Δ9-tetrahydrocannabinolic acid synthase in the Purple Kush transcriptome, and its replacement by cannabidiolic acid synthase in 'Finola', may explain why the psychoactive cannabinoid Δ9-tetrahydrocannabinol (THC) is produced in marijuana but not in hemp. Resequencing the hemp cultivars 'Finola' and 'USO-31' showed little difference in gene copy numbers of cannabinoid pathway enzymes. However, single nucleotide variant analysis uncovered a relatively high level of variation among four cannabis types, and supported a separation of marijuana and hemp. Conclusions: The availability of the Cannabis sativa genome enables the study of a multifunctional plant that occupies a unique role in human culture. Its availability will aid the development of therapeutic marijuana strains with tailored cannabinoid profiles and provide a basis for the breeding of hemp with improved agronomic characteristics. PMID:22014239

  3. Whole-genome sequencing approaches for conservation biology: Advantages, limitations and practical recommendations.

    PubMed

    Fuentes-Pardo, Angela P; Ruzzante, Daniel E

    2017-10-01

    Whole-genome resequencing (WGR) is a powerful method for addressing fundamental evolutionary biology questions that have not been fully resolved using traditional methods. WGR includes four approaches: the sequencing of individuals to a high depth of coverage with either unresolved or resolved haplotypes, the sequencing of population genomes to a high depth by mixing equimolar amounts of unlabelled-individual DNA (Pool-seq), and the sequencing of multiple individuals from a population to a low depth (lcWGR). These techniques require the availability of a reference genome. This requirement, along with the still high cost of shotgun sequencing and the large demand for computing resources and storage, has limited their implementation in nonmodel species with scarce genomic resources and in fields such as conservation biology. Our goal here is to describe the various WGR methods, their pros and cons, and their potential applications in conservation biology. WGR offers an unprecedented marker density and surveys a wide diversity of genetic variation not limited to single nucleotide polymorphisms (e.g., structural variants and mutations in regulatory elements), increasing its power for the detection of signatures of selection and local adaptation, as well as for the identification of the genetic basis of phenotypic traits and diseases. Currently, though, no single WGR approach fulfils all requirements of conservation genetics, and each method has its own limitations and sources of potential bias. We discuss proposed ways to minimize such biases. We envision a not-too-distant future in which the analysis of whole genomes becomes a routine task in many nonmodel species and fields, including conservation biology. © 2017 John Wiley & Sons Ltd.
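For Pool-seq, the basic per-site quantity is the allele frequency of the pool, estimated from read counts. The sketch below is a minimal illustration under simplifying assumptions (equimolar pooling, no sequencing error). Its standard error covers only binomial sampling of reads and ignores the extra variance introduced by sampling individuals into the pool. The function name is hypothetical.

```python
import math


def pool_allele_frequency(ref_reads: int, alt_reads: int) -> tuple[float, float]:
    """Estimate the alternate-allele frequency at one site in a pooled sample.

    Returns (frequency, standard_error). The standard error reflects binomial
    sampling of reads only; it does not account for the sampling of individuals
    into the pool or for unequal DNA contributions.
    """
    if ref_reads < 0 or alt_reads < 0:
        raise ValueError("read counts must be non-negative")
    depth = ref_reads + alt_reads
    if depth == 0:
        raise ValueError("site has no coverage")
    p = alt_reads / depth
    return p, math.sqrt(p * (1 - p) / depth)


# Hypothetical site covered by 40 reads, 10 of which carry the alternate allele
freq, se = pool_allele_frequency(ref_reads=30, alt_reads=10)
print(f"{freq:.2f} +/- {se:.3f}")  # 0.25 +/- 0.068
```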

  4. The draft genome and transcriptome of Cannabis sativa.

    PubMed

    van Bakel, Harm; Stout, Jake M; Cote, Atina G; Tallon, Carling M; Sharpe, Andrew G; Hughes, Timothy R; Page, Jonathan E

    2011-10-20

    Cannabis sativa has been cultivated throughout human history as a source of fiber, oil and food, and for its medicinal and intoxicating properties. Selective breeding has produced cannabis plants for specific uses, including high-potency marijuana strains and hemp cultivars for fiber and seed production. The molecular biology underlying cannabinoid biosynthesis and other traits of interest is largely unexplored. We sequenced genomic DNA and RNA from the marijuana strain Purple Kush using short-read approaches. We report a draft haploid genome sequence of 534 Mb and a transcriptome of 30,000 genes. Comparison of the transcriptome of Purple Kush with that of the hemp cultivar 'Finola' revealed that many genes encoding proteins involved in cannabinoid and precursor pathways are more highly expressed in Purple Kush than in 'Finola'. The exclusive occurrence of Δ9-tetrahydrocannabinolic acid synthase in the Purple Kush transcriptome, and its replacement by cannabidiolic acid synthase in 'Finola', may explain why the psychoactive cannabinoid Δ9-tetrahydrocannabinol (THC) is produced in marijuana but not in hemp. Resequencing the hemp cultivars 'Finola' and 'USO-31' showed little difference in gene copy numbers of cannabinoid pathway enzymes. However, single nucleotide variant analysis uncovered a relatively high level of variation among four cannabis types, and supported a separation of marijuana and hemp. The availability of the Cannabis sativa genome enables the study of a multifunctional plant that occupies a unique role in human culture. Its availability will aid the development of therapeutic marijuana strains with tailored cannabinoid profiles and provide a basis for the breeding of hemp with improved agronomic characteristics.

  5. High throughput SNP discovery and genotyping in hexaploid wheat

    PubMed Central

    Navarro, Julien; Kitt, Jonathan; Choulet, Frédéric; Leveugle, Magalie; Duarte, Jorge; Rivière, Nathalie; Eversole, Kellye; Le Gouis, Jacques; Davassi, Alessandro; Balfourier, François; Le Paslier, Marie-Christine; Berard, Aurélie; Brunel, Dominique; Feuillet, Catherine; Poncet, Charles; Sourdille, Pierre

    2018-01-01

    Because of their abundance and their amenability to high-throughput genotyping techniques, Single Nucleotide Polymorphisms (SNPs) are powerful tools for efficient genetics and genomics studies, including characterization of genetic resources, genome-wide association studies and genomic selection. In wheat, most previous SNP discovery initiatives targeted the coding fraction, leaving almost 98% of the wheat genome largely unexploited. Here we report on the use of whole-genome resequencing data from eight wheat lines to mine for SNPs in the genic, repetitive, and non-repetitive intergenic fractions of the wheat genome. In total, we identified 3.3 million SNPs, 49% of them located on the B-genome, 41% on the A-genome and 10% on the D-genome. We also describe the development of the TaBW280K high-throughput genotyping array containing 280,226 SNPs. Performance of this chip was examined by genotyping a set of 96 wheat accessions representing worldwide diversity. Sixty-nine percent of the SNPs can be efficiently scored, half of them showing diploid-like clustering. The TaBW280K proved to be a very efficient tool for diversity analyses, as well as for breeding, as it can discriminate between closely related elite varieties. Finally, the TaBW280K array was used to genotype a population derived from a cross between Chinese Spring and Renan, leading to the construction of a dense genetic map comprising 83,721 markers. The results described here will provide the wheat community with powerful tools for both basic and applied research. PMID:29293495

  6. A candidate gene based approach validates Md-PG1 as the main responsible for a QTL impacting fruit texture in apple (Malus x domestica Borkh).

    PubMed

    Longhi, Sara; Hamblin, Martha T; Trainotti, Livio; Peace, Cameron P; Velasco, Riccardo; Costa, Fabrizio

    2013-03-04

    Apple is widely cultivated for its fruit quality and extended storability. Among the quality factors, texture is the most important and appreciated, and across apple varieties the cortex texture shows a broad range of variability. Anatomically, these variations depend on degradation events occurring in both the primary cell wall and the middle lamella of the fruit. This physiological process is regulated by an enzymatic network generally encoded by large gene families, among which polygalacturonase is responsible for the depolymerization of pectin. In apple, Md-PG1, a key gene of the polygalacturonase gene family, was mapped on chromosome 10 and co-localized within the statistical interval of a major hot-spot QTL associated with several fruit texture sub-phenotypes. In this work, a QTL corresponding to the position of Md-PG1 was validated, and new functional alleles associated with fruit texture properties were discovered in 77 apple cultivars. Thirty-eight SNPs, genotyped by full-length resequencing of the gene, and two SSR markers specifically targeted to the gene metacontig were used. Eleven of these SNPs were used to define three haplotypes significantly associated with several texture components. The impact of Md-PG1 on fruit cell wall disassembly was further confirmed by scanning electron microscopy of the cortex structure in two apple varieties with opposite texture performance, 'Golden Delicious' and 'Granny Smith'. The results presented here are a step forward in the genetic dissection of fruit texture in apple. This new set of haplotypes and microsatellite alleles can provide a valuable toolbox for more efficient parental selection, as well as for the identification of new apple accessions with superior fruit quality.

  7. A Novel Genome-Information Content-Based Statistic for Genome-Wide Association Analysis Designed for Next-Generation Sequencing Data

    PubMed Central

    Luo, Li; Zhu, Yun

    2012-01-01

    The genome-wide association studies (GWAS) designed for next-generation sequencing data involve testing association of genomic variants, including common, low-frequency, and rare variants. The current strategies for association studies are well developed for identifying association of common variants with common diseases, but may be ill-suited when large amounts of allelic heterogeneity are present in sequence data. Recently, group tests that analyze the collective frequency differences of multiple variants between cases and controls have shifted the analysis paradigm from the variant-by-variant testing used in GWAS of common variants to the collective testing of multiple variants in the association analysis of rare variants. However, group tests ignore differences in genetic effects among SNPs at different genomic locations. As an alternative to group tests, we developed a novel genome-information content-based statistic for testing association of the entire allele frequency spectrum of genomic variation with disease. To evaluate the performance of the proposed statistic, we use large-scale simulations based on whole-genome low-coverage pilot data from the 1000 Genomes Project to calculate the type 1 error rates and power of seven alternative statistics: a genome-information content-based statistic, the generalized T², the collapsing method, the multivariate and collapsing (CMC) method, the individual χ² test, the weighted-sum statistic, and the variable threshold statistic. Finally, we apply the seven statistics to a published resequencing dataset of the ANGPTL3, ANGPTL4, ANGPTL5, and ANGPTL6 genes from the Dallas Heart Study. We report that the genome-information content-based statistic has significantly improved type 1 error rates and higher power than the other six statistics in both simulated and empirical datasets. PMID:22651812
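Among the comparison statistics listed above, the collapsing method is the simplest to illustrate. The sketch below is a generic carrier-collapsing test, not the genome-information content-based statistic proposed in the paper. It assumes rare variants have already been selected upstream, marks each individual as a carrier if they have at least one minor allele at any of those sites, and compares carrier counts in cases and controls with a 2×2 Pearson chi-square test (1 degree of freedom). The function name and toy genotypes are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def collapsing_test(case_genotypes: Sequence[Sequence[int]],
                    control_genotypes: Sequence[Sequence[int]]) -> Tuple[float, float]:
    """Collapse rare variants per individual and run a 2x2 Pearson chi-square test.

    Each row lists minor-allele counts (0, 1, 2) at the rare variant sites of one
    gene; an individual is a carrier if any count is non-zero.
    """
    a = sum(1 for g in case_genotypes if any(g))      # case carriers
    b = len(case_genotypes) - a                        # case non-carriers
    c = sum(1 for g in control_genotypes if any(g))    # control carriers
    d = len(control_genotypes) - c                     # control non-carriers
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a margin of the 2x2 table is empty")
    chi2 = n * (a * d - b * c) ** 2 / denom
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p_value


# Hypothetical data: 6 of 10 cases and 2 of 10 controls carry a rare allele
cases: List[List[int]] = [[0, 1, 0], [1, 0, 0], [0, 0, 1]] * 2 + [[0, 0, 0]] * 4
controls: List[List[int]] = [[0, 1, 0], [1, 0, 0]] + [[0, 0, 0]] * 8
chi2, p = collapsing_test(cases, controls)
print(f"chi2 = {chi2:.3f}, p = {p:.3f}")
```

For this toy table the statistic is exactly 10/3. Collapsing gains power when many rare variants act in the same direction, but, as the abstract notes for group tests generally, it ignores differences in effect among the collapsed sites.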

  8. A novel genome-information content-based statistic for genome-wide association analysis designed for next-generation sequencing data.

    PubMed

    Luo, Li; Zhu, Yun; Xiong, Momiao

    2012-06-01

    The genome-wide association studies (GWAS) designed for next-generation sequencing data involve testing association of genomic variants, including common, low-frequency, and rare variants. The current strategies for association studies are well developed for identifying association of common variants with common diseases, but may be ill-suited when large amounts of allelic heterogeneity are present in sequence data. Recently, group tests that analyze the collective frequency differences of multiple variants between cases and controls have shifted the analysis paradigm from the variant-by-variant testing used in GWAS of common variants to the collective testing of multiple variants in the association analysis of rare variants. However, group tests ignore differences in genetic effects among SNPs at different genomic locations. As an alternative to group tests, we developed a novel genome-information content-based statistic for testing association of the entire allele frequency spectrum of genomic variation with disease. To evaluate the performance of the proposed statistic, we use large-scale simulations based on whole-genome low-coverage pilot data from the 1000 Genomes Project to calculate the type 1 error rates and power of seven alternative statistics: a genome-information content-based statistic, the generalized T², the collapsing method, the multivariate and collapsing (CMC) method, the individual χ² test, the weighted-sum statistic, and the variable threshold statistic. Finally, we apply the seven statistics to a published resequencing dataset of the ANGPTL3, ANGPTL4, ANGPTL5, and ANGPTL6 genes from the Dallas Heart Study. We report that the genome-information content-based statistic has significantly improved type 1 error rates and higher power than the other six statistics in both simulated and empirical datasets.

  9. Exome and Transcriptome Sequencing of Aedes aegypti Identifies a Locus That Confers Resistance to Brugia malayi and Alters the Immune Response

    PubMed Central

    Juneja, Punita; Ariani, Cristina V.; Ho, Yung Shwen; Akorli, Jewelna; Palmer, William J.; Pain, Arnab; Jiggins, Francis M.

    2015-01-01

    Many mosquito species are naturally polymorphic for their abilities to transmit parasites, a feature of great interest for controlling vector-borne disease. Aedes aegypti, the primary vector of dengue and yellow fever and a laboratory model for studying lymphatic filariasis, is genetically variable for its capacity to harbor the filarial nematode Brugia malayi. The genome of Ae. aegypti is large and repetitive, making genome resequencing difficult and expensive. We designed exome captures to target protein-coding regions of the genome, and used association mapping in a wild Kenyan population to identify a single, dominant, sex-linked locus underlying resistance. This locus falls in a region of the genome where a resistance locus was previously mapped in a line established in 1936, suggesting that this polymorphism has been maintained in the wild for at least 80 years. We then crossed resistant and susceptible mosquitoes to place both alleles of the gene into a common genetic background, and used RNA-seq to measure the effect of this locus on gene expression. We found evidence for Toll, IMD, and JAK-STAT pathway activity in response to early stages of B. malayi infection, when the parasites are beginning to die in the resistant genotype. We also found that resistant mosquitoes express anti-microbial peptides at the time of parasite killing, and that this expression is suppressed in susceptible mosquitoes. Together, these results show that a single resistance locus leads to a stronger immune response in resistant mosquitoes, and we identify genes in this region that may be responsible for this trait. PMID:25815506

  10. Gene variants in responsiveness to clopidogrel have no impact on clinical outcomes in Chinese patients undergoing percutaneous coronary intervention - A multicenter study.

    PubMed

    Li, Chenze; Zhang, Lina; Wang, Haoran; Li, Sha; Zhang, Yan; You, Ling; Sun, Yang; Wang, Dong; Yang, Jun; Cui, Yinghua; Cao, Yanyan; Shen, Xiaoqing; Wang, Yan; Cui, Wei; Yan, Jiangtao; Zeng, Hesong; Guo, Xiaomei; Li, Jianjun; Wang, Dao Wen

    2017-08-01

    Gene variants contribute to variability in individual responsiveness to clopidogrel and influence cardiovascular outcomes in Caucasian patients with acute coronary syndrome (ACS). However, limited data are available for Asian populations. We resequenced 14 genes in the metabolic and activity pathways of clopidogrel in 138 patients with ACS and prospectively assessed the modulating effects of 13 variants possibly related to clopidogrel efficacy on one-year cardiovascular event occurrence in 5820 ACS patients after percutaneous coronary intervention (PCI). In addition, platelet aggregation rate was measured in 1084 participants, and plasma levels of the active metabolite were determined in 15 patients to test whether increasing the clopidogrel maintenance dose increases active metabolite exposure. No significant associations were found between any of the tested variants and risk of cardiovascular events (P>0.05), although CYP2C19*2 carriers had a slightly higher on-treatment platelet aggregation rate (median [IQR] 51.49 [35.43-66.75] vs. 49.05 [32.36-63.38], P=0.012) and lower active metabolite exposure (mean±SD AUC, 22.84±5.00 vs. 35.05±12.34, P=0.008) than non-carriers. Switching from 75 mg to 150 mg clopidogrel daily fully overcame the low exposure to clopidogrel active metabolite in CYP2C19*2 carriers (mean±SD AUC, 32.35±8.65 vs. 35.05±12.34, P=0.314). Unlike in Caucasian populations, genetic variants have no significant influence on clinical outcomes and much milder effects on platelet inhibition and active clopidogrel metabolite levels in Chinese patients with ACS after PCI, and these effects could be overcome by escalating the dose to 150 mg daily. Copyright © 2017. Published by Elsevier B.V.
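The exposure comparisons above are expressed as AUC values for the active metabolite. The abstract does not say how AUC was computed; the linear trapezoidal rule sketched below is one common non-compartmental approach and is an assumption here, as are the sampling times and concentrations in the example.

```python
from typing import Sequence


def auc_trapezoid(times: Sequence[float], concentrations: Sequence[float]) -> float:
    """Area under a concentration-time curve by the linear trapezoidal rule."""
    if len(times) != len(concentrations) or len(times) < 2:
        raise ValueError("need matching time and concentration series of length >= 2")
    if any(t1 >= t2 for t1, t2 in zip(times, times[1:])):
        raise ValueError("sampling times must be strictly increasing")
    total = 0.0
    for (t1, c1), (t2, c2) in zip(zip(times, concentrations),
                                  zip(times[1:], concentrations[1:])):
        total += (t2 - t1) * (c1 + c2) / 2  # area of one trapezoid
    return total


# Hypothetical sampling times (h) and concentrations (ng/mL)
print(auc_trapezoid([0, 1, 2, 4], [0, 10, 8, 4]))  # 26.0
```

The linear rule is simple but tends to overestimate area on the declining, roughly exponential part of a curve; log-trapezoidal variants are often used there instead.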

  11. Whole-Genome Sequencing and Assembly with High-Throughput, Short-Read Technologies

    PubMed Central

    Sundquist, Andreas; Ronaghi, Mostafa; Tang, Haixu; Pevzner, Pavel; Batzoglou, Serafim

    2007-01-01

    While recently developed short-read sequencing technologies may dramatically reduce the sequencing cost and eventually achieve the $1000 goal for re-sequencing, their limitations prevent the de novo sequencing of eukaryotic genomes with the standard shotgun sequencing protocol. We present SHRAP (SHort Read Assembly Protocol), a sequencing protocol and assembly methodology that utilizes high-throughput short-read technologies. We describe a variation on hierarchical sequencing with two crucial differences: (1) we select a clone library from the genome randomly rather than as a tiling path and (2) we sample clones from the genome at high coverage and reads from the clones at low coverage. We assume that 200 bp read lengths with a 1% error rate and inexpensive random fragment cloning on whole mammalian genomes are feasible. Our assembly methodology is based on first ordering the clones and subsequently performing read assembly in three stages: (1) local assemblies of regions significantly smaller than a clone size, (2) clone-sized assemblies of the results of stage 1, and (3) chromosome-sized assemblies. By aggressively localizing the assembly problem during the first stage, our method succeeds in assembling short, unpaired reads sampled from repetitive genomes. We tested our assembler using simulated reads from D. melanogaster and human chromosomes 1, 11, and 21, and produced assemblies with large sets of contiguous sequence and a misassembly rate comparable to other draft assemblies. Tested on D. melanogaster and the entire human genome, our clone-ordering method produces accurate maps, thereby localizing fragment assembly and enabling the parallelization of the subsequent steps of our pipeline. Thus, we have demonstrated that truly inexpensive de novo sequencing of mammalian genomes will soon be possible with high-throughput, short-read technologies using our methodology. PMID:17534434
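The choice to sample each clone's reads at low coverage can be reasoned about with the classical Lander–Waterman model, in which coverage is c = NL/G (N reads of length L over a target of length G) and the expected fraction of the target left uncovered is about e^(-c). This is the textbook idealization (uniformly random read positions, no repeats), not the analysis used in the SHRAP paper. The 150 kb clone size below is a hypothetical value; the 200 bp read length is the one assumed in the abstract.

```python
import math


def lander_waterman(num_reads: int, read_length: int,
                    target_length: int) -> tuple[float, float]:
    """Return (coverage c, expected fraction of bases covered) under Lander-Waterman.

    c = N * L / G; a given base is missed by every read with probability about exp(-c).
    """
    if target_length <= 0:
        raise ValueError("target length must be positive")
    c = num_reads * read_length / target_length
    return c, 1.0 - math.exp(-c)


# Hypothetical 150 kb clone sampled with 200 bp reads
for reads in (375, 750, 2250):
    c, covered = lander_waterman(reads, 200, 150_000)
    print(f"{reads} reads: c = {c:.1f}, expected covered fraction = {covered:.3f}")
```

Coverages of 0.5, 1 and 3 leave roughly 61%, 37% and 5% of the clone unsequenced, which is why a low-coverage-per-clone design depends on overlapping clones (high clone coverage) to fill the gaps.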

  12. Genome-wide copy number variation (CNV) detection in Nelore cattle reveals highly frequent variants in genome regions harboring QTLs affecting production traits.

    PubMed

    da Silva, Joaquim Manoel; Giachetto, Poliana Fernanda; da Silva, Luiz Otávio; Cintra, Leandro Carrijo; Paiva, Samuel Rezende; Yamagishi, Michel Eduardo Beleza; Caetano, Alexandre Rodrigues

    2016-06-13

    Copy number variations (CNVs) have been shown to account for substantial portions of observed genomic variation and have been associated with qualitative and quantitative traits and the onset of disease in a number of species. Information from high-resolution studies to detect, characterize and estimate population-specific variant frequencies will facilitate the incorporation of CNVs in genomic studies to identify genes affecting traits of importance. Genome-wide CNVs were detected in high-density single nucleotide polymorphism (SNP) genotyping data from 1,717 Nelore (Bos indicus) cattle, and in NGS data from eight key ancestral bulls. A total of 68,007 and 12,786 distinct CNVs were observed, respectively. Cross-comparisons of results obtained for the eight resequenced animals revealed that 92% of the CNVs were observed in both datasets, while 62% of all detected CNVs overlapped with previously validated cattle copy number variant regions (CNVRs). The observed CNVs were used to obtain breed-specific CNV frequencies and to identify CNVRs, which were subsequently used for gene annotation. A total of 688 of the detected CNVRs overlapped with 286 non-redundant QTLs associated with important production traits in cattle. All 34 CNVs previously reported to be associated with milk production traits in Holsteins were also observed in Nelore cattle. Comparisons of the estimated frequencies of these CNVs in the two breeds revealed 14, 13, 6 and 14 regions with high (>20%), low (<20%) and divergent (NEL > HOL, NEL < HOL) frequencies, respectively. These results significantly enriched the bovine CNV map and enabled the identification of variants that are potentially associated with traits under selection in Nelore cattle, particularly in genome regions harboring QTLs affecting production traits.
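The 62% figure above is an interval-overlap statistic. The sketch below shows one way to compute such a fraction. The authors' exact overlap criterion is not stated in the abstract, so it assumes that any overlap of at least 1 bp counts, with half-open [start, end) coordinates. The function name and coordinates are hypothetical.

```python
import bisect
from itertools import accumulate


def overlap_fraction(cnvs, reference_regions):
    """Fraction of CNVs overlapping at least one reference region on the same chromosome.

    Both inputs are iterables of (chrom, start, end) with half-open [start, end).
    """
    by_chrom = {}
    for chrom, start, end in reference_regions:
        by_chrom.setdefault(chrom, []).append((start, end))
    prepared = {}
    for chrom, regions in by_chrom.items():
        regions.sort()
        starts = [s for s, _ in regions]
        # max_ends[i] = largest end among the first i+1 regions (sorted by start)
        max_ends = list(accumulate((e for _, e in regions), max))
        prepared[chrom] = (starts, max_ends)
    cnvs = list(cnvs)
    if not cnvs:
        return 0.0
    hits = 0
    for chrom, start, end in cnvs:
        if chrom not in prepared:
            continue
        starts, max_ends = prepared[chrom]
        i = bisect.bisect_left(starts, end)  # regions[:i] begin before this CNV ends
        if i > 0 and max_ends[i - 1] > start:  # one of them ends after the CNV begins
            hits += 1
    return hits / len(cnvs)


cnvrs = [("1", 100, 200), ("1", 500, 800), ("2", 0, 50)]
cnvs = [("1", 150, 300), ("1", 200, 250), ("1", 700, 900), ("3", 0, 10)]
print(overlap_fraction(cnvs, cnvrs))  # 0.5
```

Sorting reference regions once and keeping a running maximum of their ends lets each CNV be checked with a single binary search instead of a scan over all regions.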

  13. Development and validation of the Axiom® Apple480K SNP genotyping array.

    PubMed

    Bianco, Luca; Cestaro, Alessandro; Linsmith, Gareth; Muranty, Hélène; Denancé, Caroline; Théron, Anthony; Poncet, Charles; Micheletti, Diego; Kerschbamer, Emanuela; Di Pierro, Erica A; Larger, Simone; Pindo, Massimo; Van de Weg, Eric; Davassi, Alessandro; Laurens, François; Velasco, Riccardo; Durel, Charles-Eric; Troggio, Michela

    2016-04-01

    Cultivated apple (Malus × domestica Borkh.) is one of the most important fruit crops in temperate regions, and has great economic and cultural value. The apple genome is highly heterozygous and has undergone a recent duplication, which, combined with rapid linkage disequilibrium decay, makes it difficult to perform genome-wide association (GWA) studies. Single nucleotide polymorphism arrays offer highly multiplexed assays at a relatively low cost per data point and can be a valid tool for identifying markers associated with traits of interest. Here, we describe the development and validation of a 487K SNP Affymetrix Axiom® genotyping array for apple and discuss its potential applications. The array was built from the high-depth resequencing of 63 different cultivars covering most of the genetic diversity in cultivated apple. The SNPs were chosen by applying a focal points approach to enrich genic regions while also achieving uniform coverage of non-genic regions. A total of 1,324 apple accessions, including the 92 progeny of two mapping populations, were genotyped with the Axiom® Apple480K to assess the effectiveness of the array. A large majority of SNPs (359,994, or 74%) fell in the stringent PolyHighResolution class of polymorphisms. We also devised a filtering procedure to identify a subset of 275K very robust markers that can be safely used for germplasm surveys in apple. The Axiom® Apple480K has now been commercially released for both public and proprietary use and will likely become a reference tool for GWA studies in apple. © 2016 The Authors. The Plant Journal © 2016 John Wiley & Sons Ltd.

  14. Genome-wide association study reveals a QTL and strong candidate genes for umbilical hernia in pigs on SSC14.

    PubMed

    Grindflek, Eli; Hansen, Marianne H S; Lien, Sigbjørn; van Son, Maren

    2018-05-29

    Umbilical hernia is one of the most prevalent congenital defects in pigs, causing economic losses and substantial animal welfare problems. Identifying genomic regions controlling umbilical hernia and implementing them in breeding is of great interest for reducing the incidence of hernia in commercial pig production. The aim of this study was to identify such regions and, if possible, causative variation affecting umbilical hernia in pigs. A case/control material consisting of 739 Norwegian Landrace pigs was collected and analysed in a GWAS with a genome-wide panel of 60 K SNPs. Additionally, candidate genes were sequenced to detect further polymorphisms, which were used for single-SNP and haplotype association analyses in 453 of the pigs. The GWAS detected a highly significant region affecting umbilical hernia around 50 Mb on SSC14 (P < 0.0001), explaining up to 8.6% of the phenotypic variance of the trait. The region is rather broad and includes 62 significant SNPs in high linkage disequilibrium with each other. Targeted sequencing of candidate genes within the region revealed polymorphisms within the leukemia inhibitory factor (LIF) and oncostatin M (OSM) genes that were significantly associated with umbilical hernia (P < 0.001). In summary, a highly significant QTL for umbilical hernia in Norwegian Landrace pigs was detected around 50 Mb on SSC14, and resequencing of candidate genes within the region revealed SNPs within LIF and OSM highly associated with the trait. However, because of extended LD within the region, studies in other populations and functional studies are needed to determine whether these variants are causal. Even without this knowledge, SNPs within the region can be used as genetic markers to reduce the incidence of umbilical hernia in Norwegian Landrace pigs.
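
    Single-SNP association in a case/control design is often first screened with an allelic test on a 2 × 2 table of allele counts. The sketch below implements a plain Pearson chi-square test with 1 degree of freedom as an illustration only: it is not the model used in the study, it ignores relatedness and population structure, and the allele counts in the example are hypothetical.

```python
import math
from typing import Tuple


def allelic_chi2(case_counts: Tuple[int, int], control_counts: Tuple[int, int]) -> Tuple[float, float]:
    """Pearson chi-square test (1 df, no continuity correction) on allele counts.

    Each argument is (count of allele A, count of allele B).
    Returns (chi2 statistic, p-value).
    """
    table = [list(case_counts), list(control_counts)]
    row_tot = [sum(row) for row in table]
    col_tot = [table[0][j] + table[1][j] for j in range(2)]
    if 0 in row_tot or 0 in col_tot:
        raise ValueError("each row and column must have a non-zero total")
    total = sum(row_tot)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_tot[i] * col_tot[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


chi2, p = allelic_chi2((30, 70), (50, 50))  # hypothetical case and control allele counts
print(round(chi2, 3), p)
```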

  15. Mutations in IRX5 impair craniofacial development and germ cell migration via SDF1.

    PubMed

    Bonnard, Carine; Strobl, Anna C; Shboul, Mohammad; Lee, Hane; Merriman, Barry; Nelson, Stanley F; Ababneh, Osama H; Uz, Elif; Güran, Tülay; Kayserili, Hülya; Hamamy, Hanan; Reversade, Bruno

    2012-05-13

    Using homozygosity mapping and locus resequencing, we found that alterations in the homeodomain of the IRX5 transcription factor cause a recessive congenital disorder affecting face, brain, blood, heart, bone and gonad development. We found through in vivo modeling in Xenopus laevis embryos that Irx5 modulates the migration of progenitor cell populations in branchial arches and gonads by repressing Sdf1. We further found that transcriptional control by Irx5 is modulated by direct protein-protein interaction with two GATA zinc-finger proteins, GATA3 and TRPS1; disruptions of these proteins also cause craniofacial dysmorphisms. Our findings suggest that IRX proteins integrate combinatorial transcriptional inputs to regulate key signaling molecules involved in the ontogeny of multiple organs during embryogenesis and homeostasis.

  16. A Noncoding, Regulatory Mutation Implicates HCFC1 in Nonsyndromic Intellectual Disability

    PubMed Central

    Huang, Lingli; Jolly, Lachlan A.; Willis-Owen, Saffron; Gardner, Alison; Kumar, Raman; Douglas, Evelyn; Shoubridge, Cheryl; Wieczorek, Dagmar; Tzschach, Andreas; Cohen, Monika; Hackett, Anna; Field, Michael; Froyen, Guy; Hu, Hao; Haas, Stefan A.; Ropers, Hans-Hilger; Kalscheuer, Vera M.; Corbett, Mark A.; Gecz, Jozef

    2012-01-01

    The discovery of mutations causing human disease has so far been biased toward protein-coding regions. Having excluded all annotated coding regions, we performed targeted massively parallel resequencing of the nonrepetitive genomic linkage interval at Xq28 of family MRX3. We identified a regulatory mutation in the binding site of transcription factor YY1 that leads to overexpression of the chromatin-associated transcriptional regulator HCFC1. When tested on embryonic murine neural stem cells and embryonic hippocampal neurons, HCFC1 overexpression led to a significant increase in the production of astrocytes and a considerable reduction in neurite growth. Two other nonsynonymous, potentially deleterious changes were identified by X-exome sequencing in individuals with intellectual disability, implicating HCFC1 in normal brain function. PMID:23000143

  17. Draft genome sequence of chickpea (Cicer arietinum) provides a resource for trait improvement.

    PubMed

    Varshney, Rajeev K; Song, Chi; Saxena, Rachit K; Azam, Sarwar; Yu, Sheng; Sharpe, Andrew G; Cannon, Steven; Baek, Jongmin; Rosen, Benjamin D; Tar'an, Bunyamin; Millan, Teresa; Zhang, Xudong; Ramsay, Larissa D; Iwata, Aiko; Wang, Ying; Nelson, William; Farmer, Andrew D; Gaur, Pooran M; Soderlund, Carol; Penmetsa, R Varma; Xu, Chunyan; Bharti, Arvind K; He, Weiming; Winter, Peter; Zhao, Shancen; Hane, James K; Carrasquilla-Garcia, Noelia; Condie, Janet A; Upadhyaya, Hari D; Luo, Ming-Cheng; Thudi, Mahendar; Gowda, C L L; Singh, Narendra P; Lichtenzveig, Judith; Gali, Krishna K; Rubio, Josefa; Nadarajan, N; Dolezel, Jaroslav; Bansal, Kailash C; Xu, Xun; Edwards, David; Zhang, Gengyun; Kahl, Guenter; Gil, Juan; Singh, Karam B; Datta, Swapan K; Jackson, Scott A; Wang, Jun; Cook, Douglas R

    2013-03-01

    Chickpea (Cicer arietinum) is the second most widely grown legume crop after soybean, accounting for a substantial proportion of human dietary nitrogen intake and playing a crucial role in food security in developing countries. We report the ∼738-Mb draft whole genome shotgun sequence of CDC Frontier, a kabuli chickpea variety, which contains an estimated 28,269 genes. Resequencing and analysis of 90 cultivated and wild genotypes from ten countries identifies targets of both breeding-associated genetic sweeps and breeding-associated balancing selection. Candidate genes for disease resistance and agronomic traits are highlighted, including traits that distinguish the two main market classes of cultivated chickpea--desi and kabuli. These data comprise a resource for chickpea improvement through molecular breeding and provide insights into both genome diversity and domestication.

  18. Network-assisted crop systems genetics: network inference and integrative analysis.

    PubMed

    Lee, Tak; Kim, Hyojin; Lee, Insuk

    2015-04-01

    Although next-generation sequencing (NGS) technology has enabled the decoding of many crop species genomes, most of the underlying genetic components for economically important crop traits remain to be determined. Network approaches have proven useful for the study of the reference plant Arabidopsis thaliana, and the success of network-based crop genetics will also require the availability of genome-scale functional networks for crop species. In this review, we discuss how to construct functional networks and obtain a holistic view of a crop system. The crop gene network can then be used for gene prioritization and for the analysis of resequencing-based genome-wide association study (GWAS) data, the amount of which will grow rapidly in the field of crop science in the coming years. Copyright © 2015 Elsevier Ltd. All rights reserved.
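
    Network-based gene prioritization is often done by "guilt by association": genes strongly linked to known trait genes are ranked higher. The following is a minimal sketch of that idea, assuming an undirected weighted network given as (gene, gene, weight) edges and a hypothetical set of seed genes; it is not a method taken from the review, and all gene names are placeholders.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[str, str, float]  # (gene A, gene B, link weight), undirected


def prioritize(edges: Iterable[Edge], seeds: Iterable[str]) -> List[Tuple[str, float]]:
    """Rank non-seed genes by the summed weight of their links to seed genes."""
    seed_set = set(seeds)
    score: Dict[str, float] = defaultdict(float)
    for a, b, w in edges:
        # Credit whichever endpoint is a non-seed gene linked to a seed gene.
        if a in seed_set and b not in seed_set:
            score[b] += w
        if b in seed_set and a not in seed_set:
            score[a] += w
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(score.items(), key=lambda kv: (-kv[1], kv[0]))


network = [("g1", "s1", 0.9), ("g1", "s2", 0.5), ("g2", "s1", 0.3), ("g3", "g1", 1.0), ("s1", "s2", 0.8)]
print(prioritize(network, ["s1", "s2"]))  # g1 ranks above g2; g3 has no seed links
```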

  19. Estimating the parameters of background selection and selective sweeps in Drosophila in the presence of gene conversion

    PubMed Central

    Campos, José Luis; Charlesworth, Brian

    2017-01-01

    We used whole-genome resequencing data from a population of Drosophila melanogaster to investigate the causes of the negative correlation between the within-population synonymous nucleotide site diversity (πS) of a gene and its degree of divergence from related species at nonsynonymous nucleotide sites (KA). By using the estimated distributions of mutational effects on fitness at nonsynonymous and UTR sites, we predicted the effects of background selection at sites within a gene on πS and found that these could account for only part of the observed correlation between πS and KA. We developed a model of the effects of selective sweeps that included gene conversion as well as crossing over. We used this model to estimate the average strength of selection on positively selected mutations in coding sequences and in UTRs, as well as the proportions of new mutations that are selectively advantageous. Genes with high levels of selective constraint on nonsynonymous sites were found to have lower strengths of positive selection and lower proportions of advantageous mutations than genes with low levels of constraint. Overall, background selection and selective sweeps within a typical gene reduce its synonymous diversity to ∼75% of its value in the absence of selection, with larger reductions for genes with high KA. Gene conversion has a major effect on the estimates of the parameters of positive selection, such that the estimated strength of selection on favorable mutations is greatly reduced if it is ignored. PMID:28559322
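
    The synonymous diversity πS referred to above is a nucleotide diversity statistic: the average number of pairwise differences per site among sampled sequences. A minimal sketch computing π over all supplied aligned sites is shown below. Restricting it to synonymous sites, as in the study, would additionally require codon-aware counting of synonymous sites, which is omitted here; the sequences are toy examples.

```python
from itertools import combinations
from typing import Sequence


def nucleotide_diversity(seqs: Sequence[str]) -> float:
    """Mean number of pairwise differences per site (pi) for aligned sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be non-empty and of equal length")
    # Count mismatching sites for every unordered pair of sequences.
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in combinations(seqs, 2))
    n_pairs = len(seqs) * (len(seqs) - 1) // 2
    return diffs / (n_pairs * length)


print(nucleotide_diversity(["AAAA", "AAAT", "AATT"]))  # 4 differences / (3 pairs * 4 sites)
```

    Under this definition, the ∼75% figure in the abstract corresponds to a ratio of about 0.75 between πS observed under selection and its expected value in the absence of selection.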

  20. Genomic paradigms for food-borne enteric pathogen analysis at the USFDA: case studies highlighting method utility, integration and resolution.

    PubMed

    Elkins, C A; Kotewicz, M L; Jackson, S A; Lacher, D W; Abu-Ali, G S; Patel, I R

    2013-01-01

    Modern risk control and food safety practices involving food-borne bacterial pathogens are benefiting from new genomic technologies for rapid, yet highly specific, strain characterisations. Within the United States Food and Drug Administration (USFDA) Center for Food Safety and Applied Nutrition (CFSAN), optical genome mapping and DNA microarray genotyping have been used for several years to quickly assess genomic architecture and gene content, respectively, for outbreak strain subtyping and to enhance retrospective trace-back analyses. The application and relative utility of each method varies with outbreak scenario and the suspect pathogen, with comparative analytical power enhanced by database scale and depth. Integration of these two technologies allows high-resolution scrutiny of the genomic landscapes of enteric food-borne pathogens with notable examples including Shiga toxin-producing Escherichia coli (STEC) and Salmonella enterica serovars from a variety of food commodities. Moreover, the recent application of whole genome sequencing technologies to food-borne pathogen outbreaks and surveillance has enhanced resolution to the single nucleotide scale. This new wealth of sequence data will support more refined next-generation custom microarray designs, targeted re-sequencing and "genomic signature recognition" approaches involving a combination of genes and single nucleotide polymorphism detection to distil strain-specific fingerprinting to a minimised scale. This paper examines the utility of microarrays and optical mapping in analysing outbreaks, reviews best practices and the limits of these technologies for pathogen differentiation, and it considers future integration with whole genome sequencing efforts.

  1. Identification of 99 novel mutations in a worldwide cohort of 1,056 patients with a nephronophthisis-related ciliopathy.

    PubMed

    Halbritter, Jan; Porath, Jonathan D; Diaz, Katrina A; Braun, Daniela A; Kohl, Stefan; Chaki, Moumita; Allen, Susan J; Soliman, Neveen A; Hildebrandt, Friedhelm; Otto, Edgar A

    2013-08-01

    Nephronophthisis-related ciliopathies (NPHP-RC) are autosomal-recessive cystic kidney diseases. More than 13 genes have been implicated in their pathogenesis to date, accounting for only 40 % of all cases. High-throughput mutation screening of large patient cohorts is a powerful tool for diagnostics and for identifying novel NPHP genes. Here we used a new high-throughput mutation analysis method to study 13 established NPHP genes (NPHP1-NPHP13) in a worldwide cohort of 1,056 patients diagnosed with NPHP-RC. We first applied multiplexed PCR-based amplification using Fluidigm Access-Array™ technology, followed by barcoding and next-generation resequencing on an Illumina platform. As a result, we established the molecular diagnosis in 127/1,056 independent individuals (12.0 %) and identified a single heterozygous truncating mutation in an additional 31 individuals (2.9 %). Altogether, we detected 159 different mutations in 11 of the 13 NPHP genes, 99 of which were novel. Phenotypically, the most remarkable findings were two patients with truncating mutations in INVS/NPHP2 who did not present as infants and did not exhibit extrarenal manifestations. In addition, we present the first case of Caroli disease due to mutations in WDR19/NPHP13 and the second case ever with a recessive mutation in GLIS2/NPHP7. This study represents the most comprehensive mutation analysis in NPHP-RC patients, identifying the largest number of novel mutations in a single study worldwide.
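
    The reported percentages follow directly from the counts in the abstract; the short check below reproduces them. The share of novel mutations (99 of 159) is derived from the same counts rather than quoted from the paper.

```python
def percent(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


cohort = 1056      # patients screened
solved = 127       # molecular diagnosis established
single_het = 31    # only a single heterozygous truncating mutation found

print(percent(solved, cohort))      # 12.0
print(percent(single_het, cohort))  # 2.9
print(percent(99, 159))             # 62.3 (novel among all detected mutations)
```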

  2. Molecular mapping and genomics of soybean seed protein: a review and perspective for the future.

    PubMed

    Patil, Gunvant; Mian, Rouf; Vuong, Tri; Pantalone, Vince; Song, Qijian; Chen, Pengyin; Shannon, Grover J; Carter, Tommy C; Nguyen, Henry T

    2017-10-01

    Genetic improvement of soybean protein meal is a complex process because of its negative correlation with oil, yield, and temperature. This review describes the progress in mapping and genomics, identifies knowledge gaps, and highlights the need for integrated approaches. Meal protein derived from soybean [Glycine max (L.) Merr.] seed is the primary source of protein in poultry and livestock feed. Protein is a key factor that determines the nutritional and economic value of soybean. Genetic improvement of soybean seed protein content is highly desirable, and major quantitative trait loci (QTL) for soybean protein have been detected and repeatedly mapped on chromosomes (Chr.) 20 (LG-I) and 15 (LG-E). However, practical breeding progress is challenging because of the negative genetic correlation of seed protein content with seed yield and with other seed components such as oil and sucrose, and because of its interaction with environmental effects such as temperature during seed development. In this review, we discuss rate-limiting factors related to soybean protein content and nutritional quality, and potential control factors regulating seed storage protein. In addition, we describe advances in next-generation sequencing technologies for precise detection of natural variants and their integration with conventional and high-throughput genotyping technologies. A syntenic analysis of QTL on Chr. 15 and 20 was performed. Finally, we discuss comprehensive approaches for integrating protein and amino acid QTL, genome-wide association studies, whole-genome resequencing, and transcriptome data to accelerate identification of genomic hot spots for allele introgression and soybean meal protein improvement.

  3. The Genome of the “Great Speciator” Provides Insights into Bird Diversification

    PubMed Central

    Cornetti, Luca; Valente, Luis M.; Dunning, Luke T.; Quan, Xueping; Black, Richard A.; Hébert, Olivier; Savolainen, Vincent

    2015-01-01

    Among birds, white-eyes (genus Zosterops) have diversified so extensively that Jared Diamond and Ernst Mayr referred to them as the “great speciator.” The Zosterops lineage exhibits some of the fastest rates of species diversification among vertebrates, and its members are the most prolific passerine island colonizers. We present a high-quality genome assembly for the silvereye (Zosterops lateralis), a white-eye species consisting of several subspecies distributed across multiple islands. We investigate the genetic basis of rapid diversification in white-eyes by conducting genomic analyses at varying taxonomic levels. First, we compare the silvereye genome with those of birds from different families and search for genomic features that may be unique to Zosterops. Second, we compare the genomes of different species of white-eyes from Lifou island (South Pacific), using whole-genome resequencing and restriction site associated DNA. Third, we contrast the genomes of two subspecies of silvereye that differ in plumage color. In accordance with theory, we show that white-eyes have high rates of substitutions, gene duplication, and positive selection relative to other birds. Below the genus level, we find that genomic differentiation accumulates rapidly and reveals contrasting demographic histories between sympatric species on Lifou, indicative of past interspecific interactions. Finally, we highlight genes possibly involved in color polymorphism between the subspecies of silvereye. By providing the first whole-genome sequence resources for white-eyes and by conducting analyses at different taxonomic levels, we provide genomic evidence underpinning this extraordinary bird radiation. PMID:26338191

  4. EUPAN enables pan-genome studies of a large number of eukaryotic genomes.

    PubMed

    Hu, Zhiqiang; Sun, Chen; Lu, Kuang-Chen; Chu, Xixia; Zhao, Yue; Lu, Jinyuan; Shi, Jianxin; Wei, Chaochun

    2017-08-01

    Pan-genome analyses are routinely carried out for bacteria to interpret within-species gene presence/absence variations (PAVs). However, pan-genome analyses are rare for eukaryotes because of the larger sizes and higher complexity of their genomes. Here we present EUPAN, a eukaryotic pan-genome analysis toolkit that enables automatic large-scale eukaryotic pan-genome analyses and detection of gene PAVs at relatively low sequencing depth. In previous studies, we demonstrated the effectiveness and high accuracy of EUPAN in the pan-genome analysis of 453 rice genomes, in which we also revealed widespread gene PAVs among individual rice genomes. Moreover, EUPAN can be directly applied to current re-sequencing projects that primarily focus on single nucleotide polymorphisms. EUPAN is implemented in Perl, R and C++. It is supported under Linux and is best run on a computer cluster with an LSF or SLURM job scheduling system. EUPAN, together with its standard operating procedure (SOP), is freely available for non-commercial use (CC BY-NC 4.0) at http://cgm.sjtu.edu.cn/eupan/index.html . Contact: ccwei@sjtu.edu.cn or jianxin.shi@sjtu.edu.cn. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press.
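
    Gene presence/absence calling from resequencing reads can be illustrated as thresholding the fraction of each gene covered by mapped reads. The sketch below is a simplified, hypothetical version of that step, not EUPAN's actual implementation or command-line interface; the 0.95 coverage cut-off and the "core"/"dispensable" labels are assumptions chosen for illustration.

```python
from typing import Dict

Coverage = Dict[str, Dict[str, float]]  # sample -> gene -> fraction of gene covered by reads
PAV = Dict[str, Dict[str, int]]         # sample -> gene -> 1 (present) or 0 (absent)


def pav_matrix(coverage: Coverage, min_cov: float = 0.95) -> PAV:
    """Call a gene present in a sample if at least `min_cov` of its length is covered."""
    return {
        sample: {gene: int(frac >= min_cov) for gene, frac in genes.items()}
        for sample, genes in coverage.items()
    }


def classify_genes(pav: PAV) -> Dict[str, str]:
    """Label genes present in every sample 'core' and the rest 'dispensable'.

    Assumes every sample reports the same set of genes.
    """
    samples = list(pav)
    genes = pav[samples[0]]
    return {g: "core" if all(pav[s][g] for s in samples) else "dispensable" for g in genes}


# Hypothetical per-gene coverage fractions for two samples.
cov = {"s1": {"gA": 0.99, "gB": 0.40}, "s2": {"gA": 0.97, "gB": 0.98}}
print(classify_genes(pav_matrix(cov)))
```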

  5. Identification of Putative Transmembrane Proteins Involved in Salinity Tolerance in Chenopodium quinoa by Integrating Physiological Data, RNAseq, and SNP Analyses

    PubMed Central

    Schmöckel, Sandra M.; Lightfoot, Damien J.; Razali, Rozaimi; Tester, Mark; Jarvis, David E.

    2017-01-01

    Chenopodium quinoa (quinoa) is an emerging crop that produces nutritious grains with the potential to contribute to global food security. Quinoa can also grow on marginal lands, such as soils affected by high salinity. To identify candidate salt tolerance genes in the recently sequenced quinoa genome, we used a multifaceted approach integrating RNAseq analyses with comparative genomics and topology prediction. We identified 219 candidate genes by selecting those that were differentially expressed in response to salinity, were specific to or overrepresented in quinoa relative to other Amaranthaceae species, and had more than one predicted transmembrane domain. To determine whether these genes might underlie variation in salinity tolerance in quinoa and its close relatives, we compared the response to salinity stress in a panel of 21 Chenopodium accessions (14 C. quinoa, 5 C. berlandieri, and 2 C. hircinum). We found large variation in salinity tolerance, with one C. hircinum accession displaying the highest salinity tolerance. Using genome re-sequencing data from these accessions, we investigated single nucleotide polymorphisms and copy number variation (CNV) in the 219 candidate genes in accessions of contrasting salinity tolerance, and identified 15 genes that could contribute to the differences in salinity tolerance among these Chenopodium accessions. PMID:28680429
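
    The candidate-gene selection described above combines three filters. A minimal sketch, assuming each gene has already been annotated with a hypothetical record holding the three relevant attributes (the record type, field names and gene IDs are illustrative, not from the study):

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GeneRecord:
    gene_id: str
    salt_responsive: bool   # differentially expressed under salinity (RNAseq)
    quinoa_enriched: bool   # specific to or overrepresented in quinoa vs. other Amaranthaceae
    tm_domains: int         # number of predicted transmembrane domains


def select_candidates(genes: List[GeneRecord]) -> List[str]:
    """Keep genes passing all three filters named in the abstract."""
    return [
        g.gene_id
        for g in genes
        if g.salt_responsive and g.quinoa_enriched and g.tm_domains > 1
    ]


genes = [
    GeneRecord("q1", True, True, 4),
    GeneRecord("q2", True, False, 6),   # fails the quinoa-specificity filter
    GeneRecord("q3", True, True, 1),    # only one transmembrane domain
    GeneRecord("q4", False, True, 3),   # not salt-responsive
]
print(select_candidates(genes))
```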

  6. Evaluation of second-generation sequencing of 19 dilated cardiomyopathy genes for clinical applications.

    PubMed

    Gowrisankar, Sivakumar; Lerner-Ellis, Jordan P; Cox, Stephanie; White, Emily T; Manion, Megan; LeVan, Kevin; Liu, Jonathan; Farwell, Lisa M; Iartchouk, Oleg; Rehm, Heidi L; Funke, Birgit H

    2010-11-01

    Medical sequencing for diseases with locus and allelic heterogeneity has been limited by the high cost and low throughput of traditional sequencing technologies. "Second-generation" sequencing (SGS) technologies allow the parallel processing of a large number of genes and therefore offer great promise for medical sequencing; however, their use in clinical laboratories is still in its infancy. Our laboratory offers clinical resequencing for dilated cardiomyopathy (DCM) using an array-based platform that interrogates 19 of the more than 30 genes known to cause DCM. We explored both the feasibility and the cost-effectiveness of using PCR amplification followed by SGS technology to sequence these 19 genes in a set of five samples enriched for known sequence alterations (109 unique substitutions and 27 insertions and deletions). While the analytical sensitivity for substitutions was comparable to that of the DCM array (98%), SGS technology performed better than the DCM array for insertions and deletions (90.6% versus 58%). Overall, SGS performed substantially better than the current array-based testing platform; however, its operational cost and projected turnaround time do not meet our current standards. Therefore, efficient capture methods and/or sample pooling strategies that shorten the turnaround time and decrease reagent and labor costs are needed before this platform can be implemented in routine clinical applications.

  7. Channelopathy pathogenesis in autism spectrum disorders.

    PubMed

    Schmunk, Galina; Gargus, J Jay

    2013-11-05

    Autism spectrum disorder (ASD) is a syndrome that affects normal brain development and is characterized by impaired social interaction, impaired verbal and non-verbal communication, and repetitive, stereotypic behavior. ASD is a complex disorder arising from a combination of multiple genetic and environmental factors that are independent of racial, ethnic and socioeconomic status. The high heritability of ASD suggests a strong genetic basis for the disorder. Furthermore, a mounting body of evidence implies a role for various ion channel gene defects (channelopathies) in the pathogenesis of autism. Indeed, recent genome-wide association and whole-exome and whole-genome resequencing studies have linked polymorphisms and rare variants in calcium, sodium and potassium channels and their subunits with susceptibility to ASD, much as they do with bipolar disorder, schizophrenia and other neuropsychiatric disorders. Moreover, animal models carrying these genetic variations recapitulate endophenotypes considered to be correlates of the autistic behavior seen in patients. Ion flux across the membrane regulates a variety of cell functions, from the generation of action potentials to gene expression and cell morphology, so it is not surprising that channelopathies have profound effects on brain function. In the present work, we summarize existing evidence for the role of ion channel gene defects in the pathogenesis of autism, with a focus on calcium signaling and its downstream effects.

  8. Diversity of the TLR4 Immunity Receptor in Czech Native Cattle Breeds Revealed Using the Pacific Biosciences Sequencing Platform.

    PubMed

    Novák, Karel; Pikousová, Jitka; Czerneková, Vladimíra; Mátlová, Věra

    2017-07-03

    The allelic variants of immunity genes in historical breeds likely reflect local infection pressure and therefore represent a reservoir for breeding. Screening to determine the diversity of the Toll-like receptor gene TLR4 was conducted in two conserved cattle breeds: Czech Red and Czech Red Pied. High-throughput sequencing of pooled PCR amplicons using the PacBio platform revealed polymorphisms, which were subsequently confirmed via genotyping techniques. Eight SNPs found in coding and adjacent regions were grouped into 18 haplotypes, representing a significant portion of the known diversity in the global breed panel and presumably exceeding diversity in production populations. Notably, the ancient Czech Red breed appeared to possess greater haplotype diversity than the Czech Red Pied breed, a Simmental variant, although the haplotype frequencies might have been distorted by significant crossbreeding and bottlenecks in the history of Czech Red cattle. The differences in haplotype frequencies validated the phenotypic distinctness of the local breeds. Due to the availability of Czech Red Pied production herds, the effect of intensive breeding on TLR diversity can be evaluated in this model. The advantages of the Pacific Biosciences technology for the resequencing of long PCR fragments with subsequent direct phasing were independently validated.
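
    The abstract compares haplotype diversity between breeds without naming the index used. One common choice is Nei's unbiased haplotype diversity, H = n/(n - 1) * (1 - sum of p_i^2), where p_i is the frequency of haplotype i among n sampled chromosomes. The sketch below assumes that definition and uses placeholder haplotype labels.

```python
from collections import Counter
from typing import Sequence


def haplotype_diversity(haplotypes: Sequence[str]) -> float:
    """Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sampled haplotypes")
    freqs = [count / n for count in Counter(haplotypes).values()]
    return n / (n - 1) * (1 - sum(p * p for p in freqs))


# Hypothetical sample of four phased chromosomes carrying three distinct haplotypes.
print(haplotype_diversity(["H1", "H1", "H2", "H3"]))
```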

  9. Whole-genome resequencing reveals candidate mutations for pig prolificacy.

    PubMed

    Li, Wen-Ting; Zhang, Meng-Meng; Li, Qi-Gang; Tang, Hui; Zhang, Li-Fan; Wang, Ke-Jun; Zhu, Mu-Zhen; Lu, Yun-Feng; Bao, Hai-Gang; Zhang, Yuan-Ming; Li, Qiu-Yan; Wu, Ke-Liang; Wu, Chang-Xin

    2017-12-20

    Changes in pig fertility have occurred as a result of domestication, but are not understood at the level of genetic variation. To identify variations potentially responsible for prolificacy, we sequenced the genomes of the highly prolific Taihu pig breed and four control breeds. Genes involved in embryogenesis and morphogenesis appeared to be targets of selection in the Taihu pig, consistent with the morphological differences observed between the Taihu pig and other breeds during pregnancy. Additionally, an excess of functional non-coding mutations has become specifically fixed or nearly fixed in the Taihu pig. We focused on an oestrogen response element (ERE) within the first intron of the bone morphogenetic protein receptor type-1B gene (BMPR1B) that overlaps with a known quantitative trait locus (QTL) for pig fecundity. Using 242 pigs from 30 different breeds, we confirmed that the ERE genotype was nearly fixed in the Taihu pig. ERE function was assessed by luciferase assays, examination of histological sections, chromatin immunoprecipitation, quantitative polymerase chain reactions, and western blots. The results suggest that the ERE may control pig prolificacy via cis-regulation of BMPR1B expression. This study provides new insight into changes in reproductive performance and highlights the role of non-coding mutations in generating phenotypic diversity between breeds. © 2017 The Author(s).
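
    The canonical consensus estrogen response element is the palindromic inverted repeat GGTCAnnnTGACC. The sketch below scans a sequence for exact matches to that consensus; because the motif is its own reverse complement, scanning one strand suffices. This is a simplification (natural EREs frequently deviate from the consensus), and the abstract does not describe how the ERE was identified; the example sequences are made up.

```python
import re
from typing import List

# Consensus ERE: GGTCA half-site, three-base spacer, TGACC half-site.
# The lookahead lets overlapping occurrences be reported.
ERE = re.compile(r"(?=(GGTCA[ACGT]{3}TGACC))")


def find_eres(seq: str) -> List[int]:
    """Return 0-based start positions of exact consensus EREs in `seq`."""
    return [m.start() for m in ERE.finditer(seq.upper())]


print(find_eres("aaGGTCAgctTGACCtt"))  # [2]
```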

  10. Simultaneous achievement of high ethanol yield and titer in Clostridium thermocellum

    DOE PAGES

    Tian, Liang; Papanek, Beth; Olson, Daniel G.; ...

    2016-06-02

    Background: Biofuel production from plant cell walls offers the potential for sustainable and economically attractive alternatives to petroleum-based products. Fuels from cellulosic biomass are particularly promising, but would benefit from lower processing costs. Clostridium thermocellum can rapidly solubilize and ferment cellulosic biomass, making it a promising candidate microorganism for consolidated bioprocessing for biofuel production, but increases in product yield and titer are still needed. Results: We started with an engineered C. thermocellum strain in which the central metabolic pathways to products other than ethanol had been deleted. After two stages of adaptive evolution, an evolved strain with improved yield and titer was selected. On chemically defined medium with crystalline cellulose as substrate, the evolved strain produced 22.4 ± 1.4 g/L ethanol from 60 g/L cellulose. Moreover, the resulting yield was about 0.39 g ethanol per g glucose equivalent, which is 75 % of the maximum theoretical yield. Genome resequencing, proteomics, and biochemical analysis were used to examine differences between the original and evolved strains. Conclusions: A two-step selection method successfully improved both the ethanol yield and the titer. This evolved strain has the highest ethanol yield and titer reported to date for C. thermocellum, and is an important step in the development of this microbe for industrial applications.
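
    The "75 % of the maximum theoretical yield" figure can be checked from the fermentation stoichiometry C6H12O6 → 2 C2H5OH + 2 CO2, which caps ethanol at about 0.511 g per g of glucose. The calculation below assumes rounded standard molar masses; it gives about 76 % for a yield of exactly 0.39 g/g, which is consistent with the reported 75 % since the yield is stated as "about 0.39".

```python
# Molar masses in g/mol (standard atomic weights, rounded).
M_GLUCOSE = 180.16   # C6H12O6
M_ETHANOL = 46.07    # C2H5OH

# Glucose -> 2 ethanol + 2 CO2, so at most 2 mol ethanol per mol glucose.
theoretical = 2 * M_ETHANOL / M_GLUCOSE   # g ethanol per g glucose
reported = 0.39                           # yield stated in the abstract

print(round(theoretical, 3))              # 0.511
print(round(reported / theoretical, 2))   # 0.76
```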

  11. Rare variation facilitates inferences of fine-scale population structure in humans.

    PubMed

    O'Connor, Timothy D; Fu, Wenqing; Mychaleckyj, Josyf C; Logsdon, Benjamin; Auer, Paul; Carlson, Christopher S; Leal, Suzanne M; Smith, Joshua D; Rieder, Mark J; Bamshad, Michael J; Nickerson, Deborah A; Akey, Joshua M

    2015-03-01

    Understanding the genetic structure of human populations has important implications for the design and interpretation of disease mapping studies and for reconstructing human evolutionary history. To date, inferences of human population structure have primarily been made with common variants. However, recent large-scale resequencing studies have shown an abundance of rare variation in humans, which may be particularly useful for making inferences of fine-scale population structure. To this end, we used an information theory framework and extensive coalescent simulations to rigorously quantify the informativeness of rare and common variation for detecting signatures of fine-scale population structure. We show that rare variation affords unique insights into patterns of recent population structure. Furthermore, to empirically assess our theoretical findings, we analyzed high-coverage exome sequences in 6,515 European and African American individuals. As predicted, rare variants are more informative than common polymorphisms in revealing a distinct cluster of European-American individuals, and subsequent analyses demonstrate that these individuals are likely of Ashkenazi Jewish ancestry. Our results provide new insights into population structure revealed by rare variation, which will be an important factor to account for in rare variant association studies. © The Author 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  12. Channelopathy pathogenesis in autism spectrum disorders

    PubMed Central

    Schmunk, Galina; Gargus, J. Jay

    2013-01-01

    Autism spectrum disorder (ASD) is a syndrome that affects normal brain development and is characterized by impaired social interaction as well as verbal and non-verbal communication and by repetitive, stereotypic behavior. ASD is a complex disorder arising from a combination of multiple genetic and environmental factors that are independent of racial, ethnic and socioeconomic status. The high heritability of ASD suggests a strong genetic basis for the disorder. Furthermore, a mounting body of evidence implies a role of various ion channel gene defects (channelopathies) in the pathogenesis of autism. Indeed, recent genome-wide association studies and whole-exome and whole-genome resequencing studies have linked polymorphisms and rare variants in calcium, sodium and potassium channels and their subunits with susceptibility to ASD, as has been reported for bipolar disorder, schizophrenia and other neuropsychiatric disorders. Moreover, animal models with these genetic variations recapitulate endophenotypes considered to be correlates of autistic behavior seen in patients. Ion flux across the membrane regulates a variety of cell functions, from generation of action potentials to gene expression and cell morphology, so it is not surprising that channelopathies have profound effects on brain function. In the present work, we summarize existing evidence for the role of ion channel gene defects in the pathogenesis of autism with a focus on calcium signaling and its downstream effects. PMID:24204377

  13. A programmable method for massively parallel targeted sequencing

    PubMed Central

    Hopmans, Erik S.; Natsoulis, Georges; Bell, John M.; Grimes, Susan M.; Sieh, Weiva; Ji, Hanlee P.

    2014-01-01

    We have developed a targeted resequencing approach referred to as Oligonucleotide-Selective Sequencing. In this study, we report a series of significant improvements and novel applications of this method, whereby the surface of a sequencing flow cell is modified in situ to capture specific genomic regions of interest from a sample, which are then sequenced. These improvements include a fully automated targeted sequencing platform through the use of a standard Illumina cBot fluidics station. Targeting optimization increased the yield of total on-target sequencing data 2-fold compared to the previous iteration, while simultaneously increasing the percentage of reads that could be mapped to the human genome. The described assays cover up to 1421 genes with a total coverage of 5.5 Megabases (Mb). We demonstrate an abundance uniformity of greater than 90% within 1 log (10-fold) of the median and a targeting rate of up to 95%. We also sequenced continuous genomic loci up to 1.5 Mb while simultaneously genotyping SNPs and genes. Variants with a minor allele fraction as low as 5% were detected with high sensitivity. Finally, we determined the exact breakpoint sequence of cancer rearrangements. Overall, this approach offers high performance for selective sequencing of genome targets, configuration flexibility and variant calling accuracy. PMID:24782526
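    The coverage-uniformity figure quoted above (targets within one log, i.e. 10-fold, of the median abundance) can be computed as below. This is a minimal sketch assuming per-target coverage values are already available and that "1 log distance" means a 10-fold range on either side of the median; the function name and example coverages are hypothetical.

```python
import statistics


def fraction_within_fold_of_median(coverages, fold=10.0):
    """Fraction of targets whose coverage is within `fold`-fold of the median.

    Assumed reading of "1 log distance from the median": a target counts as
    uniform if median / fold <= coverage <= median * fold.
    """
    median = statistics.median(coverages)
    low, high = median / fold, median * fold
    within = sum(1 for c in coverages if low <= c <= high)
    return within / len(coverages)


# Hypothetical per-target mean coverages.
coverages = [120, 95, 300, 8, 150, 2000, 110]
print(fraction_within_fold_of_median(coverages))  # 5 of 7 targets
```

    Here the median is 120, so only targets between 12x and 1200x count; the 8x and 2000x targets fall outside the window.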

  14. Evaluating Variant Calling Tools for Non-Matched Next-Generation Sequencing Data

    NASA Astrophysics Data System (ADS)

    Sandmann, Sarah; de Graaf, Aniek O.; Karimi, Mohsen; van der Reijden, Bert A.; Hellström-Lindberg, Eva; Jansen, Joop H.; Dugas, Martin

    2017-02-01

    Valid variant calling results are crucial for the use of next-generation sequencing in routine clinical practice. However, there are numerous variant calling tools that usually differ in algorithms, filtering strategies and recommendations, and thus also in their output. We evaluated eight open-source tools regarding their ability to call single nucleotide variants and short indels with allelic frequencies as low as 1% in non-matched next-generation sequencing data: GATK HaplotypeCaller, Platypus, VarScan, LoFreq, FreeBayes, SNVer, SAMtools and VarDict. We analysed two real datasets from patients with myelodysplastic syndrome, covering 54 Illumina HiSeq samples and 111 Illumina NextSeq samples. Mutations were validated by re-sequencing on the same platform, by re-sequencing on a different platform, and by expert-based review. In addition, we considered two simulated datasets with varying coverage and error profiles, covering 50 samples each. In all cases, an identical target region consisting of 19 genes (42,322 bp) was analysed. Altogether, no tool succeeded in calling all mutations. High sensitivity was always accompanied by low precision. The influence of varying coverage and background noise on variant calling was generally low. Taking everything into account, VarDict performed best. However, our results indicate that there is a need to improve the reproducibility of results in the context of multithreading.
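    The sensitivity/precision trade-off described above is usually measured by comparing each tool's calls with a validated truth set. The sketch below shows that comparison under simplifying assumptions of our own: variants are exact (chrom, pos, ref, alt) tuples, with no left-normalisation of indels and no allele-frequency threshold; the function name and toy variants are hypothetical.

```python
def sensitivity_precision(called, truth):
    """Sensitivity (recall) and precision of a call set against a truth set.

    Variants are (chrom, pos, ref, alt) tuples and must match exactly, so
    differently normalised representations of the same indel count as misses.
    """
    called, truth = set(called), set(truth)
    tp = len(called & truth)  # validated variants that were called
    fp = len(called - truth)  # calls not in the truth set
    fn = len(truth - called)  # validated variants that were missed
    sensitivity = tp / (tp + fn) if truth else float("nan")
    precision = tp / (tp + fp) if called else float("nan")
    return sensitivity, precision


# Hypothetical toy data, not taken from the study.
truth = {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"), ("chr2", 40, "G", "GA")}
calls = {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"),
         ("chr1", 300, "T", "C"), ("chr2", 90, "A", "C")}
print(sensitivity_precision(calls, truth))  # sensitivity 2/3, precision 0.5
```

    Lowering a caller's allele-frequency threshold typically adds both true and false positives, which is one way the "high sensitivity, low precision" pattern reported above can arise.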

  15. Simultaneous achievement of high ethanol yield and titer in Clostridium thermocellum.

    PubMed

    Tian, Liang; Papanek, Beth; Olson, Daniel G; Rydzak, Thomas; Holwerda, Evert K; Zheng, Tianyong; Zhou, Jilai; Maloney, Marybeth; Jiang, Nannan; Giannone, Richard J; Hettich, Robert L; Guss, Adam M; Lynd, Lee R

    2016-01-01

    Biofuel production from plant cell walls offers the potential for sustainable and economically attractive alternatives to petroleum-based products. Fuels from cellulosic biomass are particularly promising, but would benefit from lower processing costs. Clostridium thermocellum can rapidly solubilize and ferment cellulosic biomass, making it a promising candidate microorganism for consolidated bioprocessing for biofuel production, but increases in product yield and titer are still needed. Here, we started with an engineered C. thermocellum strain in which the central metabolic pathways to products other than ethanol had been deleted. After two stages of adaptive evolution, an evolved strain was selected with improved yield and titer. On chemically defined medium with crystalline cellulose as substrate, the evolved strain produced 22.4 ± 1.4 g/L ethanol from 60 g/L cellulose. The resulting yield was about 0.39 gETOH/gGluc eq, which is 75% of the maximum theoretical yield. Genome resequencing, proteomics, and biochemical analysis were used to examine differences between the original and evolved strains. A two-step selection method successfully improved the ethanol yield and the titer. This evolved strain has the highest ethanol yield and titer reported to date for C. thermocellum, and is an important step in the development of this microbe for industrial applications.
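    The "percent of maximum theoretical yield" follows from fermentation stoichiometry: one glucose gives at most two ethanol, i.e. about 0.511 g ethanol per g glucose. The sketch below shows the arithmetic using standard molar masses; since the reported yield is itself rounded ("about 0.39"), it lands near 76%, consistent with the quoted 75%. The function name is our own.

```python
# Standard molar masses in g/mol.
M_GLUCOSE = 180.156
M_ETHANOL = 46.069

# Ethanol fermentation stoichiometry: C6H12O6 -> 2 C2H5OH + 2 CO2
MAX_YIELD = 2 * M_ETHANOL / M_GLUCOSE  # about 0.511 g ethanol per g glucose


def percent_of_theoretical(yield_g_per_g_glucose_eq):
    """Express a mass yield (g ethanol per g glucose equivalent) as % of theoretical."""
    return 100.0 * yield_g_per_g_glucose_eq / MAX_YIELD


print(f"{percent_of_theoretical(0.39):.1f}%")
```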

  16. Improving the annotation of the Heterorhabditis bacteriophora genome.

    PubMed

    McLean, Florence; Berger, Duncan; Laetsch, Dominik R; Schwartz, Hillel T; Blaxter, Mark

    2018-04-01

    Genome assembly and annotation remain exacting tasks. As the tools available for these tasks improve, it is useful to return to data produced with earlier techniques to assess their credibility and correctness. The entomopathogenic nematode Heterorhabditis bacteriophora is widely used to control insect pests in horticulture. The genome sequence for this species was reported to encode an unusually high proportion of unique proteins and a paucity of secreted proteins compared to other related nematodes. We revisited the H. bacteriophora genome assembly and gene predictions to determine whether these unusual characteristics were biological or methodological in origin. We mapped an independent resequencing dataset to the genome and used the blobtools pipeline to identify potential contaminants. While present (0.2% of the genome span, 0.4% of predicted proteins), assembly contamination was not significant. Re-prediction of the gene set using BRAKER1 and published transcriptome data generated a predicted proteome that was very different from the published one. The new gene set had a much reduced complement of unique proteins, better completeness values that were in line with other related species' genomes, and an increased number of proteins predicted to be secreted. It is thus likely that methodological issues drove the apparent uniqueness of the initial H. bacteriophora genome annotation and that similar contamination and misannotation issues affect other published genome assemblies.

  17. Mutations affecting the SAND domain of DEAF1 cause intellectual disability with severe speech impairment and behavioral problems.

    PubMed

    Vulto-van Silfhout, Anneke T; Rajamanickam, Shivakumar; Jensik, Philip J; Vergult, Sarah; de Rocker, Nina; Newhall, Kathryn J; Raghavan, Ramya; Reardon, Sara N; Jarrett, Kelsey; McIntyre, Tara; Bulinski, Joseph; Ownby, Stacy L; Huggenvik, Jodi I; McKnight, G Stanley; Rose, Gregory M; Cai, Xiang; Willaert, Andy; Zweier, Christiane; Endele, Sabine; de Ligt, Joep; van Bon, Bregje W M; Lugtenberg, Dorien; de Vries, Petra F; Veltman, Joris A; van Bokhoven, Hans; Brunner, Han G; Rauch, Anita; de Brouwer, Arjan P M; Carvill, Gemma L; Hoischen, Alexander; Mefford, Heather C; Eichler, Evan E; Vissers, Lisenka E L M; Menten, Björn; Collard, Michael W; de Vries, Bert B A

    2014-05-01

    Recently, we identified in two individuals with intellectual disability (ID) different de novo mutations in DEAF1, which encodes a transcription factor with an important role in embryonic development. To ascertain whether these mutations in DEAF1 are causative for the ID phenotype, we performed targeted resequencing of DEAF1 in an additional cohort of over 2,300 individuals with unexplained ID and identified two additional individuals with de novo mutations in this gene. All four individuals had severe ID with severely affected speech development, and three showed severe behavioral problems. DEAF1 is highly expressed in the CNS, especially during early embryonic development. All four mutations were missense mutations affecting the SAND domain of DEAF1. Altered DEAF1 harboring any of the four amino acid changes showed impaired transcriptional regulation of the DEAF1 promoter. Moreover, behavioral studies in mice with a conditional knockout of Deaf1 in the brain showed memory deficits and increased anxiety-like behavior. Our results demonstrate that mutations in DEAF1 cause ID and behavioral problems, most likely as a result of impaired transcriptional regulation by DEAF1. Copyright © 2014 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.

  18. The redox-sensing protein Rex modulates ethanol production in Thermoanaerobacterium saccharolyticum

    PubMed Central

    Lanahan, Anthony A.; Lynd, Lee R.

    2018-01-01

    Thermoanaerobacterium saccharolyticum is a thermophilic anaerobe that has been engineered to produce high amounts of ethanol, reaching ~90% theoretical yield at a titer of 70 g/L. Here we report the physiological changes that occur upon deleting the redox-sensing transcriptional regulator Rex in wild type T. saccharolyticum: a single deletion of rex resulted in a two-fold increase in ethanol yield (from 40% to 91% theoretical yield), but the resulting strains grew only about a third as fast as the wild type strain. Deletion of the rex gene also had the effect of increasing expression of alcohol dehydrogenase genes, adhE and adhA. After several serial transfers, the ethanol yield decreased from an average of 91% to 55%, and the growth rates had increased. We performed whole-genome resequencing to identify secondary mutations in the Δrex strains adapted for faster growth. In several cases, secondary mutations had appeared in the adhE gene. Furthermore, in these strains the NADH-linked alcohol dehydrogenase activity was greatly reduced. Complementation studies were done to reintroduce rex into the Δrex strains: reintroducing rex decreased ethanol yield to below wild type levels in the Δrex strain without adhE mutations, but did not change the ethanol yield in the Δrex strain where an adhE mutation occurred. PMID:29621294

  19. A 200K SNP chip reveals a novel Pacific salmon louse genotype linked to differential efficacy of emamectin benzoate.

    PubMed

    Messmer, Amber M; Leong, Jong S; Rondeau, Eric B; Mueller, Anita; Despins, Cody A; Minkley, David R; Kent, Matthew P; Lien, Sigbjørn; Boyce, Brad; Morrison, Diane; Fast, Mark D; Norman, Joseph D; Danzmann, Roy G; Koop, Ben F

    2018-04-16

    Antiparasitic drugs such as emamectin benzoate (EMB) are relied upon to reduce the parasite load, particularly of the sea louse Lepeophtheirus salmonis, on farmed salmon. The decline in EMB treatment efficacy for this purpose is an important issue for salmon producers around the world, and particularly for those in the Atlantic Ocean, where widespread EMB tolerance in sea lice is recognized as a significant problem. Salmon farms in the Northeast Pacific Ocean have not historically experienced the same issues with treatment efficacy, possibly due to the relatively large population of endemic salmonid hosts that serve to both redistribute surviving lice and dilute populations potentially under selection by introducing naïve lice to farms. Frequent migration of lice among farmed and wild hosts should limit the effect of farm-specific selection pressures on changes to the overall allele frequencies of sea lice in the Pacific Ocean. A previous study using microsatellites examined L. salmonis oncorhynchi from 10 Pacific locations from wild and farmed hosts and found no population structure. Recently, however, a farm population of sea lice was detected in which EMB bioassay exposure tolerance was abnormally elevated. In response, we have developed a Pacific louse draft genome that complements the previously released Atlantic louse sequence. These genomes were combined with whole-genome re-sequencing data to design a highly sensitive 201,279-marker SNP array applicable to both subspecies (90,827 validated Pacific loci; 153,569 validated Atlantic loci). Notably, k-mer spectrum analysis of the re-sequenced samples indicated that Pacific lice exhibit a high within-individual heterozygosity rate (on average 1 in every 72 bases) that is markedly higher than that of Atlantic individuals (1 in every 173 bases). The SNP chip was used to produce a high-density map for Atlantic sea louse linkage group 5, which was previously shown to be associated with EMB tolerance in Atlantic lice. 
Additionally, 478 Pacific louse samples from farmed and wild hosts obtained between 2005 and 2014 were also genotyped on the array. Clustering analysis allowed us to detect the apparent emergence of an otherwise rare genotype at a high frequency among the lice collected from two farms in 2013 that had reported elevated EMB tolerance. This genotype was not observed in louse samples collected from the same farm in 2010, nor in any lice sampled from other locations prior to 2013. However, this genotype was detected at low frequencies in louse samples from farms in two locations reporting elevated EMB tolerance in 2014. These results suggest that a rare genotype present in Pacific lice may be locally expanded in farms after EMB treatment. Supporting this hypothesis, 437 SNPs associated with this genotype were found to be in a region of linkage group 5 that overlaps the region associated with EMB resistance in Atlantic lice. Finally, five of the top diagnostic SNPs within this region were used to screen lice that had been subjected to an EMB survival assay, revealing a significant association between these SNPs and EMB treatment outcome. To our knowledge this work is the first report to identify a genetic link to altered EMB efficacy in L. salmonis in the Pacific Ocean. Copyright © 2018 The Authors. Published by Elsevier B.V. All rights reserved.

  20. Genetic investigation of sudden unexpected death in epilepsy cohort by panel target resequencing.

    PubMed

    Coll, Monica; Allegue, Catarina; Partemi, Sara; Mates, Jesus; Del Olmo, Bernat; Campuzano, Oscar; Pascali, Vincenzo; Iglesias, Anna; Striano, Pasquale; Oliva, Antonio; Brugada, Ramon

    2016-03-01

    Sudden unexpected death in epilepsy (SUDEP) is defined as the abrupt, non-traumatic, witnessed or unwitnessed death, occurring in benign circumstances, in an individual with epilepsy, with or without evidence for a seizure and excluding documented status epilepticus (seizure duration ≥ 30 min or seizures without recovery), and in which postmortem examination does not reveal a cause of death. Although the pathophysiological mechanisms that underlie SUDEP remain to be clarified, genetic background has been described as playing a role in this disorder. Pathogenic variants in genes associated with epilepsy and encoding cardiac ion channels could explain the SUDEP phenotype. To test this, we used next-generation sequencing to study a cohort of SUDEP cases, with a view to translating the findings into clinical and forensic practice. Targeted panel resequencing was used to study 14 SUDEP cases, both postmortem (2 cases) and from living patients (12 cases). Both genes already associated with SUDEP and candidate genes were investigated. Overall, 24 rare genetic variants were identified in 13 SUDEP cases. Four cases showed rare variants with complete segregation in the SCN1A, FBN1, HCN1, SCN4A, and EFHC1 genes, and one case with a rare variant in the KCNQ1 gene showed an incomplete pattern of inheritance. In four cases, rare variants were detected in the CACNA1A, SCN11A, SCN10A, and KCNQ1 genes, but familial segregation could not be assessed due to lack of DNA from relatives. Finally, in the four remaining cases, the rare variants did not segregate in the family. This study confirms the link between epilepsy, sudden death, and cardiac disease. In addition, we identified new potential candidate genes for SUDEP: FBN1, HCN1, SCN4A, EFHC1, CACNA1A, SCN11A, and SCN10A. Further confirmation in larger cohorts will be necessary, especially if genetic screening for SUDEP is applied to forensic and clinical medicine. Nevertheless, this study supports the emerging concept of a genetically determined cardiocerebral channelopathy.

  1. Re-sequencing of the APOAI promoter region and the genetic association of the -75G > A polymorphism with increased cholesterol and low density lipoprotein levels among a sample of the Kuwaiti population

    PubMed Central

    2013-01-01

    Background APOAI, a member of the APOAI/CIII/IV/V gene cluster on chromosome 11q23-24, encodes a major protein component of HDL that has been associated with serum lipid levels. The aim of this study was to determine the genetic association of polymorphisms in the APOAI promoter region with plasma lipid levels in a cohort of healthy Kuwaiti volunteers. Methods A 435 bp region of the APOAI promoter was analyzed by re-sequencing in 549 Kuwaiti samples. DNA was extracted from blood taken from 549 healthy Kuwaiti volunteers who had fasted for the previous 12 h. Univariate and multivariate analyses were used to determine allele association with serum lipid levels. Results The target sequence included a partial segment of the promoter region, the 5’UTR and exon 1, spanning nucleotides −141 to +294 of the APOAI gene on chromosome 11. No novel single nucleotide polymorphisms (SNPs) were observed. The sequences obtained were deposited in NCBI GenBank under accession number [GenBank: JX438706]. The allelic frequencies for the three SNPs were as follows: APOAI rs670G = 0.807; rs5069C = 0.964; rs1799837G = 0.997; all three SNPs were in Hardy-Weinberg equilibrium (HWE). A significant association (p < 0.05) was observed for the APOAI rs670 polymorphism with increased serum LDL-C. Multivariate analysis showed that APOAI rs670 was an independent predictive factor when controlling for age, sex and BMI for both LDL-C (OR: 1.66, p = 0.014) and TC (OR: 1.77, p = 0.006) levels. Conclusion This study is the first to report sequence analysis of the APOAI promoter in an Arab population. The unexpected positive association found between the APOAI rs670 polymorphism and increased levels of LDL-C and TC may be due to linkage disequilibrium with other polymorphisms in candidate and neighboring genes known to be associated with lipid metabolism and transport. PMID:24028463

  2. Re-sequencing of the APOAI promoter region and the genetic association of the -75G > A polymorphism with increased cholesterol and low density lipoprotein levels among a sample of the Kuwaiti population.

    PubMed

    Al-Bustan, Suzanne A; Al-Serri, Ahmad E; Annice, Babitha G; Alnaqeeb, Majed A; Ebrahim, Ghada A

    2013-09-12

    APOAI, a member of the APOAI/CIII/IV/V gene cluster on chromosome 11q23-24, encodes a major protein component of HDL that has been associated with serum lipid levels. The aim of this study was to determine the genetic association of polymorphisms in the APOAI promoter region with plasma lipid levels in a cohort of healthy Kuwaiti volunteers. A 435 bp region of the APOAI promoter was analyzed by re-sequencing in 549 Kuwaiti samples. DNA was extracted from blood taken from 549 healthy Kuwaiti volunteers who had fasted for the previous 12 h. Univariate and multivariate analyses were used to determine allele association with serum lipid levels. The target sequence included a partial segment of the promoter region, the 5'UTR and exon 1, spanning nucleotides -141 to +294 of the APOAI gene on chromosome 11. No novel single nucleotide polymorphisms (SNPs) were observed. The sequences obtained were deposited in NCBI GenBank under accession number [GenBank: JX438706]. The allelic frequencies for the three SNPs were as follows: APOAI rs670G = 0.807; rs5069C = 0.964; rs1799837G = 0.997; all three SNPs were in Hardy-Weinberg equilibrium (HWE). A significant association (p < 0.05) was observed for the APOAI rs670 polymorphism with increased serum LDL-C. Multivariate analysis showed that APOAI rs670 was an independent predictive factor when controlling for age, sex and BMI for both LDL-C (OR: 1.66, p = 0.014) and TC (OR: 1.77, p = 0.006) levels. This study is the first to report sequence analysis of the APOAI promoter in an Arab population. The unexpected positive association found between the APOAI rs670 polymorphism and increased levels of LDL-C and TC may be due to linkage disequilibrium with other polymorphisms in candidate and neighboring genes known to be associated with lipid metabolism and transport.
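    Allele frequencies and a Hardy-Weinberg equilibrium check, as reported in this record, can both be derived from genotype counts. The sketch below uses the classic 1-degree-of-freedom chi-square goodness-of-fit test; the abstract does not say which HWE test the authors used, and the genotype counts here are hypothetical.

```python
import math


def hwe_test(n_aa, n_ab, n_bb):
    """Allele frequency, 1-df chi-square statistic and P value for HWE.

    n_aa, n_ab, n_bb: counts of the three genotypes at a biallelic SNP.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return p, chi2, p_value


# Hypothetical genotype counts (not from the study).
print(hwe_test(65, 30, 5))
```

    For very rare alleles, such as a minor allele at frequency 0.003, expected genotype counts become tiny and an exact HWE test is generally preferred over this chi-square approximation.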

  3. Parallel Mapping and Simultaneous Sequencing Reveals Deletions in BCAN and FAM83H Associated with Discrete Inherited Disorders in a Domestic Dog Breed

    PubMed Central

    Forman, Oliver P.; Hayward, Louisa J.; Ricketts, Sally L.; Mellersh, Cathryn S.

    2012-01-01

    The domestic dog (Canis familiaris) segregates more naturally-occurring diseases and phenotypic variation than any other species and has become established as an unparalleled model with which to study the genetics of inherited traits. We used a genome-wide association study (GWAS) and targeted resequencing of DNA from just five dogs to simultaneously map and identify mutations for two distinct inherited disorders that both affect a single breed, the Cavalier King Charles Spaniel. We investigated episodic falling (EF), a paroxysmal exertion-induced dyskinesia, alongside the phenotypically distinct condition congenital keratoconjunctivitis sicca and ichthyosiform dermatosis (CKCSID), commonly known as dry eye curly coat syndrome. EF is characterised by episodes of exercise-induced muscular hypertonicity and abnormal posturing, usually occurring after exercise or periods of excitement. CKCSID is a congenital disorder that manifests as a rough coat present at birth, with keratoconjunctivitis sicca apparent on eyelid opening at 10–14 days, followed by hyperkeratinisation of footpads and distortion of nails that develops over the next few months. We undertook a GWAS with 31 EF cases, 23 CKCSID cases, and a common set of 38 controls and identified statistically associated signals for EF and CKCSID on chromosome 7 (P_raw = 1.9×10^−14; P_genome = 1.0×10^−5) and chromosome 13 (P_raw = 1.2×10^−17; P_genome = 1.0×10^−5), respectively. We resequenced both the EF and CKCSID disease-associated regions in just five dogs and identified a 15,724 bp deletion spanning three exons of BCAN associated with EF and a single base-pair exonic deletion in FAM83H associated with CKCSID. Neither BCAN nor FAM83H has been associated with equivalent disease phenotypes in any other species, thus demonstrating the ability to use the domestic dog to study the genetic basis of more than one disease simultaneously in a single breed and to identify multiple novel candidate genes in parallel. PMID:22253609
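    This record reports both a raw and a genome-wide P value for each signal. The abstract does not describe how the genome-wide value was obtained; one common approach is max-T permutation, in which case/control labels are shuffled and each SNP's observed statistic is compared with the largest statistic across all SNPs in each permutation. A genome-wide P of exactly 1.0×10^−5 would match the smallest value attainable from roughly 10^5 permutations, though that is our inference. The sketch below is a minimal pure-Python illustration with hypothetical function names and toy data.

```python
import random


def allelic_chi2(case_alt, case_alleles, ctrl_alt, ctrl_alleles):
    """Pearson chi-square for a 2x2 table of allele counts (cases vs controls)."""
    table = [[case_alt, case_alleles - case_alt],
             [ctrl_alt, ctrl_alleles - ctrl_alt]]
    row_totals = [case_alleles, ctrl_alleles]
    col_totals = [table[0][0] + table[1][0], table[0][1] + table[1][1]]
    total = case_alleles + ctrl_alleles
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:
                chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2


def snp_statistics(genotypes, labels):
    """Allelic chi-square per SNP; genotypes[s][i] is the alt-allele count (0/1/2)."""
    n_case = sum(labels)
    n_ctrl = len(labels) - n_case
    stats = []
    for g in genotypes:
        case_alt = sum(x for x, y in zip(g, labels) if y == 1)
        ctrl_alt = sum(x for x, y in zip(g, labels) if y == 0)
        stats.append(allelic_chi2(case_alt, 2 * n_case, ctrl_alt, 2 * n_ctrl))
    return stats


def max_t_pvalues(genotypes, labels, n_perm=1000, seed=1):
    """Genome-wide (max-T) permutation P values for each SNP."""
    observed = snp_statistics(genotypes, labels)
    rng = random.Random(seed)
    permuted = list(labels)
    exceed = [0] * len(observed)
    for _ in range(n_perm):
        rng.shuffle(permuted)  # break the genotype-phenotype link, keep LD between SNPs
        max_stat = max(snp_statistics(genotypes, permuted))
        for i, t in enumerate(observed):
            if max_stat >= t:
                exceed[i] += 1
    return [(c + 1) / (n_perm + 1) for c in exceed]


# Hypothetical toy data: 10 cases then 10 controls, two SNPs.
labels = [1] * 10 + [0] * 10
genotypes = [
    [2] * 10 + [0] * 10,   # perfectly separates cases and controls
    [0, 1, 2, 1, 0] * 4,   # identical allele counts in both groups
]
print(max_t_pvalues(genotypes, labels, n_perm=200))
```

    The +1 in numerator and denominator keeps the estimate away from zero, so with N permutations the smallest reportable P is 1/(N+1).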

  4. Resequencing and Analysis of Variation in the TCF7L2 Gene in African Americans Suggests That SNP rs7903146 Is the Causal Diabetes Susceptibility Variant

    PubMed Central

    Palmer, Nicholette D.; Hester, Jessica M.; An, S. Sandy; Adeyemo, Adebowale; Rotimi, Charles; Langefeld, Carl D.; Freedman, Barry I.; Ng, Maggie C.Y.; Bowden, Donald W.

    2011-01-01

    OBJECTIVE Variation in the transcription factor 7-like 2 (TCF7L2) locus is associated with type 2 diabetes across multiple ethnicities. The aim of this study was to elucidate which variant in TCF7L2 confers diabetes susceptibility in African Americans. RESEARCH DESIGN AND METHODS Through the evaluation of tagging single nucleotide polymorphisms (SNPs), type 2 diabetes susceptibility was limited to a 4.3-kb interval, which contains the YRI (African) linkage disequilibrium (LD) block containing rs7903146. To better define the relationship between type 2 diabetes risk and genetic variation we resequenced this 4.3-kb region in 96 African American DNAs. Thirty-three novel and 13 known SNPs were identified: 20 with minor allele frequencies (MAF) >0.05 and 12 with MAF >0.10. These polymorphisms and the previously identified DG10S478 microsatellite were evaluated in African American type 2 diabetic cases (n = 1,033) and controls (n = 1,106). RESULTS Variants identified from direct sequencing and databases were genotyped or imputed. Fifteen SNPs showed association with type 2 diabetes (P < 0.05) with rs7903146 being the most significant (P = 6.32 × 10^−6). Results of imputation, haplotype, and conditional analysis of SNPs were consistent with rs7903146 being the trait-defining SNP. Analysis of the DG10S478 microsatellite, which is outside the 4.3-kb LD block, revealed consistent association of risk allele 8 with type 2 diabetes (odds ratio [OR] = 1.33; P = 0.022) as reported in European populations; however, allele 16 (MAF = 0.016 cases and 0.032 controls) was strongly associated with reduced risk (OR = 0.39; P = 5.02 × 10^−5) in contrast with previous studies. CONCLUSIONS In African Americans, these observations suggest that rs7903146 is the trait-defining polymorphism associated with type 2 diabetes risk. Collectively, these results support ethnic differences in type 2 diabetes associations. PMID:20980453
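    Odds ratios like those quoted in this record summarise a 2x2 case/control table. The sketch below computes an unadjusted odds ratio with a Woolf (log-scale) 95% confidence interval; the study's own estimates may come from regression models with covariates, so this is illustrative only, and the function name and allele counts are hypothetical.

```python
import math


def odds_ratio_woolf(a, b, c, d, z=1.959964):
    """Unadjusted odds ratio with a Woolf (log-scale) 95% confidence interval.

    2x2 table: a = case alleles that are the risk allele, b = other case alleles,
               c = control alleles that are the risk allele, d = other control alleles.
    All cells must be non-zero.
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical allele counts, not taken from the study.
print(odds_ratio_woolf(30, 70, 20, 80))
```

    An OR below 1, such as the 0.39 reported for allele 16, indicates the allele is less common among cases than controls, i.e. associated with reduced risk.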

  5. Association of CLU and PICALM variants with Alzheimer's disease

    PubMed Central

    Kamboh, M.I.; Minster, R. L.; Demirci, F.Y.; Ganguli, M.; DeKosky, S.T.; Lopez, O.L.; Barmada, M.M.

    2010-01-01

    Two recent large genome-wide association studies have reported significant associations in the CLU (APOJ), CR1 and PICALM genes. In order to replicate these findings, we examined 7 single nucleotide polymorphisms (SNPs) most significantly implicated by these studies in a large case-control sample comprising 2,707 individuals. Principal components analysis revealed no population substructure in our sample. While no association was observed with CR1 SNPs (P=0.30–0.457), a trend toward association was seen with the PICALM (P=0.071–0.086) and CLU (P=0.148–0.258) SNPs. A meta-analysis of three studies revealed significant associations with all three genes. Our data from an independent and large case-control sample suggest that these gene regions should be followed up by comprehensive resequencing to find functional variants. PMID:20570404
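    The abstract does not say which meta-analytic model combined the three studies. A common default for per-SNP association results is the inverse-variance fixed-effect model, sketched below under that assumption; the function name and per-study estimates are hypothetical.

```python
import math


def fixed_effect_meta(log_odds_ratios, standard_errors):
    """Inverse-variance fixed-effect meta-analysis of per-study log odds ratios.

    Returns the pooled odds ratio, the standard error of the pooled log OR,
    and a two-sided normal P value.
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    pooled_log = sum(w * b for w, b in zip(weights, log_odds_ratios)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    z = pooled_log / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return math.exp(pooled_log), pooled_se, p_value


# Hypothetical per-study estimates for one SNP (three studies).
print(fixed_effect_meta([0.10, 0.12, 0.08], [0.05, 0.04, 0.06]))
```

    Pooling shrinks the standard error, which is how individually non-significant trends, like the PICALM and CLU results above, can become significant in combination.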

  6. PolyPhred analysis software for mutation detection from fluorescence-based sequence data.

    PubMed

    Montgomery, Kate T; Iartchouck, Oleg; Li, Li; Loomis, Stephanie; Obourn, Vanessa; Kucherlapati, Raju

    2008-10-01

    The ability to search for genetic variants that may be related to human disease is one of the most exciting consequences of the availability of the sequence of the human genome. Large cohorts of individuals exhibiting certain phenotypes can be studied and candidate genes resequenced. However, the challenge of analyzing sequence data from many individuals with accuracy, speed, and economy is great. This unit describes one set of software tools: Phred, Phrap, PolyPhred, and Consed. Coverage includes the advantages and disadvantages of these analysis tools, details for obtaining and using the software, and the results one may expect. The software is being continually updated to permit further automation of mutation analysis. Currently, however, at least some manual review is required if one wishes to identify 100% of the variants in a sample set.
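    Phred assigns each base call a quality value derived from its estimated error probability, q = −10·log10(p). The conversion in both directions is sketched below; how PolyPhred then uses these values to flag heterozygous sites is not described in the abstract and is not modelled here. Function names are our own.

```python
import math


def phred_from_error(p_error):
    """Phred quality value q = -10 * log10(p) for a base-call error probability p."""
    return -10.0 * math.log10(p_error)


def error_from_phred(q):
    """Inverse transform: error probability p = 10 ** (-q / 10)."""
    return 10.0 ** (-q / 10.0)


for q in (10, 20, 30, 40):
    print(q, error_from_phred(q))  # 1 error in 10, 100, 1,000 and 10,000 calls
```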

  7. Advanced Applications of Next-Generation Sequencing Technologies to Orchid Biology.

    PubMed

    Yeh, Chuan-Ming; Liu, Zhong-Jian; Tsai, Wen-Chieh

    2018-01-01

    Next-generation sequencing technologies are revolutionizing biology by permitting transcriptome sequencing, whole-genome sequencing and resequencing, and genome-wide single nucleotide polymorphism profiling. Orchid research has benefited from this breakthrough, and a few orchid genomes are now available; new biological questions can be approached and new breeding strategies can be designed. The first part of this review describes the unique features of orchid biology. The second part provides an overview of the current next-generation sequencing platforms, many of which are already used in plant laboratories. The third part summarizes the state of orchid transcriptome and genome sequencing and illustrates current achievements. The genetic sequences currently obtained will not only provide a broad scope for the study of orchid biology, but also serve as a starting point for uncovering the mystery of orchid evolution.

  8. Mutations in the GABA Transporter SLC6A1 Cause Epilepsy with Myoclonic-Atonic Seizures

    PubMed Central

    Carvill, Gemma L.; McMahon, Jacinta M.; Schneider, Amy; Zemel, Matthew; Myers, Candace T.; Saykally, Julia; Nguyen, John; Robbiano, Angela; Zara, Federico; Specchio, Nicola; Mecarelli, Oriano; Smith, Robert L.; Leventer, Richard J.; Møller, Rikke S.; Nikanorova, Marina; Dimova, Petia; Jordanova, Albena; Petrou, Steven; Helbig, Ingo; Striano, Pasquale; Weckhuysen, Sarah; Berkovic, Samuel F.; Scheffer, Ingrid E.; Mefford, Heather C.

    2015-01-01

    GAT-1, encoded by SLC6A1, is one of the major gamma-aminobutyric acid (GABA) transporters in the brain and is responsible for re-uptake of GABA from the synapse. In this study, targeted resequencing of 644 individuals with epileptic encephalopathies led to the identification of six SLC6A1 mutations in seven individuals, all of whom have epilepsy with myoclonic-atonic seizures (MAE). We describe two truncations and four missense alterations, all of which most likely lead to loss of function of GAT-1 and thus reduced GABA re-uptake from the synapse. These individuals share many of the electrophysiological properties of Gat1-deficient mice, including spontaneous spike-wave discharges. Overall, pathogenic mutations occurred in 6/160 individuals with MAE, accounting for ∼4% of unsolved MAE cases. PMID:25865495

  9. X-MATE: a flexible system for mapping short read data

    PubMed Central

    Pearson, John V.; Cloonan, Nicole; Grimmond, Sean M.

    2011-01-01

    Summary: Accurate and complete mapping of short-read sequencing data to a reference genome greatly enhances the discovery of biological results and improves statistical predictions. We recently presented RNA-MATE, a pipeline for the recursive mapping of RNA-Seq datasets. With the rapid increase in genome re-sequencing projects, progression of available mapping software and the evolution of file formats, we now present X-MATE, an updated version of RNA-MATE, capable of mapping both RNA-Seq and DNA datasets and with improved performance, output file formats, configuration files, and flexibility in core mapping software. Availability: Executables, source code, junction libraries, test data and results and the user manual are available from http://grimmond.imb.uq.edu.au/X-MATE/. Contact: n.cloonan@uq.edu.au; s.grimmond@uq.edu.au Supplementary information: Supplementary data are available at Bioinformatics Online. PMID:21216778

  10. Development and Validation of a High-Density SNP Genotyping Array for African Oil Palm.

    PubMed

    Kwong, Qi Bin; Teh, Chee Keng; Ong, Ai Ling; Heng, Huey Ying; Lee, Heng Leng; Mohamed, Mohaimi; Low, Joel Zi-Bin; Apparow, Sukganah; Chew, Fook Tim; Mayes, Sean; Kulaveerasingam, Harikrishna; Tammi, Martti; Appleton, David Ross

    2016-08-01

    High-density single nucleotide polymorphism (SNP) genotyping arrays are powerful tools that can measure the level of genetic polymorphism within a population. To develop a whole-genome SNP array for oil palms, SNP discovery was performed using deep resequencing of eight libraries derived from 132 Elaeis guineensis and Elaeis oleifera palms belonging to 59 origins, resulting in the discovery of >3 million putative SNPs. After SNP filtering, the Illumina OP200K custom array was built with 170 860 successful probes. Phenetic clustering analysis revealed that the array could distinguish between palms of different origins in a way consistent with pedigree records. Genome-wide linkage disequilibrium declined more slowly for the commercial populations (ranging from 120 kb at r² = 0.43 to 146 kb at r² = 0.50) when compared with the semi-wild populations (19.5 kb at r² = 0.22). Genetic fixation mapping comparing the semi-wild and commercial population identified 321 selective sweeps. A genome-wide association study (GWAS) detected a significant peak on chromosome 2 associated with the polygenic component of the shell thickness trait (based on the trait shell-to-fruit; S/F %) in tenera palms. Testing of a genomic selection model on the same trait resulted in good prediction accuracy (r = 0.65) with 42% of the S/F % variation explained. The first high-density SNP genotyping array for oil palm has been developed and shown to be robust for use in genetic studies and with potential for developing early trait prediction to shorten the oil palm breeding cycle. Copyright © 2016 The Author. Published by Elsevier Inc. All rights reserved.
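    The LD-decay figures above are r² values between SNP pairs, tracked against physical distance. A minimal sketch of r² for one pair of bi-allelic SNPs, assuming phased haplotypes coded 0/1 (array studies often estimate LD from unphased genotypes instead); `ld_r2` is a hypothetical helper name.

```python
from typing import List, Tuple


def ld_r2(haplotypes: List[Tuple[int, int]]) -> float:
    """r^2 = D^2 / (pA(1-pA) pB(1-pB)) from phased haplotypes (allele at SNP A, allele at SNP B)."""
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n          # frequency of allele 1 at SNP A
    p_b = sum(b for _, b in haplotypes) / n          # frequency of allele 1 at SNP B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                              # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    return d * d / denom
```

    Averaging r² over SNP pairs binned by distance gives the decay curve from which distances such as "120 kb at r² = 0.43" are read off.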

  11. Ultra-sensitive Sequencing Identifies High Prevalence of Clonal Hematopoiesis-Associated Mutations throughout Adult Life.

    PubMed

    Acuna-Hidalgo, Rocio; Sengul, Hilal; Steehouwer, Marloes; van de Vorst, Maartje; Vermeulen, Sita H; Kiemeney, Lambertus A L M; Veltman, Joris A; Gilissen, Christian; Hoischen, Alexander

    2017-07-06

    Clonal hematopoiesis results from somatic mutations in hematopoietic stem cells, which give an advantage to mutant cells, driving their clonal expansion and potentially leading to leukemia. The acquisition of clonal hematopoiesis-driver mutations (CHDMs) occurs with normal aging and these mutations have been detected in more than 10% of individuals ≥65 years. We aimed to examine the prevalence and characteristics of CHDMs throughout adult life. We developed a targeted re-sequencing assay combining high-throughput with ultra-high sensitivity based on single-molecule molecular inversion probes (smMIPs). Using smMIPs, we screened more than 100 loci for CHDMs in more than 2,000 blood DNA samples from population controls between 20 and 69 years of age. Loci screened included 40 regions known to drive clonal hematopoiesis when mutated and 64 novel candidate loci. We identified 224 somatic mutations throughout our cohort, of which 216 were coding mutations in known driver genes (DNMT3A, JAK2, GNAS, TET2, and ASXL1), including 196 point mutations and 20 indels. Our assay's improved sensitivity allowed us to detect mutations with variant allele frequencies as low as 0.001. CHDMs were identified in more than 20% of individuals 60 to 69 years of age and in 3% of individuals 20 to 29 years of age, approximately double the previously reported prevalence despite screening a limited set of loci. Our findings support the occurrence of clonal hematopoiesis-associated mutations as a widespread mechanism linked with aging, suggesting that mosaicism as a result of clonal evolution of cells harboring somatic mutations is a universal mechanism occurring at all ages in healthy humans. Copyright © 2017 American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
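    Ultra-sensitive assays of this kind rely on molecular tags to separate true low-frequency variants from sequencing errors: reads sharing a tag derive from one captured molecule and are collapsed into a consensus call. A minimal sketch, assuming per-read (tag, base) pairs at a single position; the family-size and agreement cut-offs and the name `consensus_vaf` are hypothetical, not the published smMIP pipeline. Note that detecting a variant allele frequency of 0.001 requires on the order of a thousand independent consensus molecules at that locus.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def consensus_vaf(reads: List[Tuple[str, str]], alt: str,
                  min_family: int = 3, min_agree: float = 0.9) -> float:
    """Collapse reads by molecular tag, then return the alt fraction among consensus molecules."""
    families: Dict[str, List[str]] = defaultdict(list)
    for tag, base in reads:
        families[tag].append(base)
    calls: List[str] = []
    for bases in families.values():
        if len(bases) < min_family:
            continue                      # too few reads to trust this molecule
        base, count = Counter(bases).most_common(1)[0]
        if count / len(bases) >= min_agree:
            calls.append(base)            # discordant families are discarded
    if not calls:
        return 0.0
    return sum(1 for b in calls if b == alt) / len(calls)
```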

  12. CNS germinomas are characterized by global demethylation, chromosomal instability and mutational activation of the Kit-, Ras/Raf/Erk- and Akt-pathways

    PubMed Central

    Schulte, Simone Laura; Waha, Andreas; Steiger, Barbara; Denkhaus, Dorota; Dörner, Evelyn; Calaminus, Gabriele; Leuschner, Ivo; Pietsch, Torsten

    2016-01-01

    CNS germinomas represent a unique germ cell tumor entity characterized by undifferentiated tumor cells and a high response rate to current treatment protocols. Limited information is available on their underlying genomic, epigenetic and biological alterations. We performed a genome-wide analysis of genomic copy number alterations in 49 CNS germinomas by molecular inversion profiling. In addition, CpG dinucleotide methylation was studied by immunohistochemistry for methylated cytosine residues. Mutational analysis was performed by resequencing of candidate genes including KIT and RAS family members. Ras/Erk and Akt pathway activation was analyzed by immunostaining with antibodies against phospho-Erk, phospho-Akt, phospho-mTOR and phospho-S6. All germinomas coexpressed Oct4 and Kit but showed an extensive global DNA demethylation compared to other tumors and normal tissues. Molecular inversion profiling showed predominant genomic instability in all tumors with a high frequency of regional gains and losses including high level gene amplifications. Activating mutations of KIT exons 11, 13, and 17 as well as a case with genomic KIT amplification and activating mutations or amplifications of RAS gene family members including KRAS, NRAS and RRAS2 indicated mutational activation of crucial signaling pathways. Co-activation of Ras/Erk and Akt pathways was present in 83% of germinomas. These data suggest that CNS germinoma cells display a demethylated nuclear DNA similar to primordial germ cells in early development. This finding coincides strikingly with extensive genomic instability. In addition, mutational activation of the Kit-, Ras/Raf/Erk- and Akt-pathways indicates the biological importance of these pathways and their components as potential targets for therapy. PMID:27391150

  13. Integrated genomic sequencing reveals mutational landscape of T-cell prolymphocytic leukemia

    PubMed Central

    Kiel, Mark J.; Velusamy, Thirunavukkarasu; Rolland, Delphine; Sahasrabuddhe, Anagh A.; Chung, Fuzon; Bailey, Nathanael G.; Schrader, Alexandra; Li, Bo; Li, Jun Z.; Ozel, Ayse B.; Betz, Bryan L.; Miranda, Roberto N.; Medeiros, L. Jeffrey; Zhao, Lili; Herling, Marco

    2014-01-01

    The comprehensive genetic alterations underlying the pathogenesis of T-cell prolymphocytic leukemia (T-PLL) are unknown. To address this, we performed whole-genome sequencing (WGS), whole-exome sequencing (WES), high-resolution copy-number analysis, and Sanger resequencing of a large cohort of T-PLL. WGS and WES identified novel mutations in recurrently altered genes not previously implicated in T-PLL including EZH2, FBXW10, and CHEK2. Strikingly, WGS and/or WES showed largely mutually exclusive mutations affecting IL2RG, JAK1, JAK3, or STAT5B in 38 of 50 T-PLL genomes (76.0%). Notably, gain-of-function IL2RG mutations are novel and have not been reported in any form of cancer. Further, high-frequency mutations in STAT5B have not been previously reported in T-PLL. Functionally, IL2RG-JAK1-JAK3-STAT5B mutations led to signal transducer and activator of transcription 5 (STAT5) hyperactivation, transformed Ba/F3 cells resulting in cytokine-independent growth, and/or enhanced colony formation in Jurkat T cells. Importantly, primary T-PLL cells exhibited constitutive activation of STAT5, and targeted pharmacologic inhibition of STAT5 with pimozide induced apoptosis in primary T-PLL cells. These results for the first time provide a portrait of the mutational landscape of T-PLL and implicate deregulation of DNA repair and epigenetic modulators as well as high-frequency mutational activation of the IL2RG-JAK1-JAK3-STAT5B axis in the pathogenesis of T-PLL. These findings offer opportunities for novel targeted therapies in this aggressive leukemia. PMID:24825865

  14. Integrated genomic sequencing reveals mutational landscape of T-cell prolymphocytic leukemia.

    PubMed

    Kiel, Mark J; Velusamy, Thirunavukkarasu; Rolland, Delphine; Sahasrabuddhe, Anagh A; Chung, Fuzon; Bailey, Nathanael G; Schrader, Alexandra; Li, Bo; Li, Jun Z; Ozel, Ayse B; Betz, Bryan L; Miranda, Roberto N; Medeiros, L Jeffrey; Zhao, Lili; Herling, Marco; Lim, Megan S; Elenitoba-Johnson, Kojo S J

    2014-08-28

    The comprehensive genetic alterations underlying the pathogenesis of T-cell prolymphocytic leukemia (T-PLL) are unknown. To address this, we performed whole-genome sequencing (WGS), whole-exome sequencing (WES), high-resolution copy-number analysis, and Sanger resequencing of a large cohort of T-PLL. WGS and WES identified novel mutations in recurrently altered genes not previously implicated in T-PLL including EZH2, FBXW10, and CHEK2. Strikingly, WGS and/or WES showed largely mutually exclusive mutations affecting IL2RG, JAK1, JAK3, or STAT5B in 38 of 50 T-PLL genomes (76.0%). Notably, gain-of-function IL2RG mutations are novel and have not been reported in any form of cancer. Further, high-frequency mutations in STAT5B have not been previously reported in T-PLL. Functionally, IL2RG-JAK1-JAK3-STAT5B mutations led to signal transducer and activator of transcription 5 (STAT5) hyperactivation, transformed Ba/F3 cells resulting in cytokine-independent growth, and/or enhanced colony formation in Jurkat T cells. Importantly, primary T-PLL cells exhibited constitutive activation of STAT5, and targeted pharmacologic inhibition of STAT5 with pimozide induced apoptosis in primary T-PLL cells. These results for the first time provide a portrait of the mutational landscape of T-PLL and implicate deregulation of DNA repair and epigenetic modulators as well as high-frequency mutational activation of the IL2RG-JAK1-JAK3-STAT5B axis in the pathogenesis of T-PLL. These findings offer opportunities for novel targeted therapies in this aggressive leukemia. © 2014 by The American Society of Hematology.

  15. The population genetics of Quechuas, the largest native South American group: autosomal sequences, SNPs, and microsatellites evidence high level of diversity.

    PubMed

    Scliar, Marilia O; Soares-Souza, Giordano B; Chevitarese, Juliana; Lemos, Livia; Magalhães, Wagner C S; Fagundes, Nelson J; Bonatto, Sandro L; Yeager, Meredith; Chanock, Stephen J; Tarazona-Santos, Eduardo

    2012-03-01

    Elucidating the pattern of genetic diversity for non-European populations is necessary to make the benefits of human genetics research available to individuals from these groups. In the era of large human genomic initiatives, Native American populations have been neglected, in particular, the Quechua, the largest South Amerindian group settled along the Andes. We characterized the genetic diversity of a Quechua population in a global setting, using autosomal noncoding sequences (nine unlinked loci for a total of 16 kb), 351 unlinked SNPs and 678 microsatellites and tested predictions of the model of the evolution of Native Americans proposed by (Tarazona-Santos et al.: Am J Hum Genet 68 (2001) 1485-1496). European admixture is <5% and African ancestry is barely detectable in the studied population. The largest genetic distances were between African versus Quechua or Melanesian populations, which is concordant with the African origin of modern humans and the fact that South America was the last part of the world to be peopled. The diversity in the Quechua population is comparable with that of Eurasian populations, and the allele frequency spectrum based on resequencing data does not reflect a reduction in the proportion of rare alleles. Thus, the Quechua population is a large reservoir of common and rare genetic variants of South Amerindians. These results are consistent with and complement our evolutionary model of South Amerindians (Tarazona-Santos et al.: Am J Hum Genet 68 (2001) 1485-1496), proposed based on Y-chromosome data, which predicts high genomic diversity due to the high level of gene flow between Andean populations and their long-term effective population size. Copyright © 2012 Wiley Periodicals, Inc.

  16. Whole-genome sequence-based genomic prediction in laying chickens with different genomic relationship matrices to account for genetic architecture.

    PubMed

    Ni, Guiyan; Cavero, David; Fangmann, Anna; Erbe, Malena; Simianer, Henner

    2017-01-16

    With the availability of next-generation sequencing technologies, genomic prediction based on whole-genome sequencing (WGS) data is now feasible in animal breeding schemes and was expected to lead to higher predictive ability, since such data may contain all genomic variants including causal mutations. Our objective was to compare prediction ability with high-density (HD) array data and WGS data in a commercial brown layer line with genomic best linear unbiased prediction (GBLUP) models using various approaches to weight single nucleotide polymorphisms (SNPs). A total of 892 chickens from a commercial brown layer line were genotyped with 336 K segregating SNPs (array data) that included 157 K genic SNPs (i.e. SNPs in or around a gene). For these individuals, genome-wide sequence information was imputed based on data from re-sequencing runs of 25 individuals, leading to 5.2 million (M) imputed SNPs (WGS data), including 2.6 M genic SNPs. De-regressed proofs (DRP) for eggshell strength, feed intake and laying rate were used as quasi-phenotypic data in genomic prediction analyses. Four weighting factors for building a trait-specific genomic relationship matrix were investigated: identical weights, -(log 10 P) from genome-wide association study results, squares of SNP effects from random regression BLUP, and variable selection based weights (known as BLUP|GA). Predictive ability was measured as the correlation between DRP and direct genomic breeding values in five replications of a fivefold cross-validation. Averaged over the three traits, the highest predictive ability (0.366 ± 0.075) was obtained when only genic SNPs from WGS data were used. Predictive abilities with genic SNPs and all SNPs from HD array data were 0.361 ± 0.072 and 0.353 ± 0.074, respectively. 
Prediction with -(log 10 P) or squares of SNP effects as weighting factors for building a genomic relationship matrix or BLUP|GA did not increase accuracy, compared to that with identical weights, regardless of the SNP set used. Our results show that little or no benefit was gained when using all imputed WGS data to perform genomic prediction compared to using HD array data regardless of the weighting factors tested. However, using only genic SNPs from WGS data had a positive effect on prediction ability.
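    Each weighting scheme above enters GBLUP only through the SNP weights used to build the genomic relationship matrix. A minimal sketch, assuming the form G = Z D Zᵀ / Σᵢ dᵢ·2pᵢ(1 − pᵢ), where Z holds allele counts centred by 2p and D = diag(d); with all weights equal to 1 this reduces to VanRaden's first method. The normalization and the name `weighted_grm` are illustrative assumptions, not necessarily the exact form used in the study.

```python
from typing import List, Sequence


def weighted_grm(genotypes: List[Sequence[int]],
                 weights: Sequence[float]) -> List[List[float]]:
    """genotypes: one row per animal, allele counts 0/1/2 per SNP; weights: one per SNP."""
    n, m = len(genotypes), len(weights)
    p = [sum(row[j] for row in genotypes) / (2 * n) for j in range(m)]    # allele frequencies
    z = [[row[j] - 2 * p[j] for j in range(m)] for row in genotypes]      # centred genotypes
    scale = sum(weights[j] * 2 * p[j] * (1 - p[j]) for j in range(m))
    if scale == 0:
        raise ValueError("no polymorphic SNP carries a non-zero weight")
    return [[sum(z[a][j] * weights[j] * z[b][j] for j in range(m)) / scale
             for b in range(n)] for a in range(n)]
```

    Identical weights, -(log10 P) values or squared SNP effects would all be passed in as `weights`; restricting to genic SNPs amounts to setting the remaining weights to zero.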

  17. Repeat-aware modeling and correction of short read errors.

    PubMed

    Yang, Xiao; Aluru, Srinivas; Dorman, Karin S

    2011-02-15

    High-throughput short read sequencing is revolutionizing genomics and systems biology research by enabling cost-effective deep coverage sequencing of genomes and transcriptomes. Error detection and correction are crucial to many short read sequencing applications including de novo genome sequencing, genome resequencing, and digital gene expression analysis. Short read error detection is typically carried out by counting the observed frequencies of kmers in reads and validating those with frequencies exceeding a threshold. In case of genomes with high repeat content, an erroneous kmer may be frequently observed if it has few nucleotide differences with valid kmers with multiple occurrences in the genome. Error detection and correction were mostly applied to genomes with low repeat content and this remains a challenging problem for genomes with high repeat content. We develop a statistical model and a computational method for error detection and correction in the presence of genomic repeats. We propose a method to infer genomic frequencies of kmers from their observed frequencies by analyzing the misread relationships among observed kmers. We also propose a method to estimate the threshold useful for validating kmers whose estimated genomic frequency exceeds the threshold. We demonstrate that superior error detection is achieved using these methods. Furthermore, we break away from the common assumption of uniformly distributed errors within a read, and provide a framework to model position-dependent error occurrence frequencies common to many short read platforms. Lastly, we achieve better error correction in genomes with high repeat content. The software is implemented in C++ and is freely available under GNU GPL3 license and Boost Software V1.0 license at "http://aluru-sun.ece.iastate.edu/doku.php?id=redeem". 
We introduce a statistical framework to model sequencing errors in next-generation reads, which led to promising results in detecting and correcting errors for genomes with high repeat content.
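    The baseline rule described at the start of this abstract (count k-mers, trust those observed more often than a threshold) can be sketched as follows. This is the simple threshold method the authors improve on, not their repeat-aware model, which instead infers genomic k-mer frequencies from misread relationships; the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Set


def count_kmers(reads: Iterable[str], k: int) -> Counter:
    """Observed frequency of every k-mer across all reads."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def trusted_kmers(reads: Iterable[str], k: int, threshold: int) -> Set[str]:
    """Baseline validation: keep k-mers whose observed count exceeds `threshold`."""
    return {kmer for kmer, c in count_kmers(reads, k).items() if c > threshold}
```

    The weakness the abstract points out follows directly: in a repeat-rich genome, an erroneous k-mer one mismatch away from a high-copy repeat can itself exceed the threshold and be wrongly trusted.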

  18. A 48 SNP set for grapevine cultivar identification

    PubMed Central

    2011-01-01

    Background Rapid and consistent genotyping is an important requirement for cultivar identification in many crop species. Among them grapevine cultivars have been the subject of multiple studies given the large number of synonyms and homonyms generated during many centuries of vegetative multiplication and exchange. Simple sequence repeat (SSR) markers have been preferred until now because of their high level of polymorphism, their codominant nature and their high profile repeatability. However, the rapid application of partial or complete genome sequencing approaches is identifying thousands of single nucleotide polymorphisms (SNP) that can be very useful for such purposes. Although SNP markers are bi-allelic, and therefore not as polymorphic as microsatellites, the high number of loci that can be multiplexed and the possibilities of automation as well as their highly repeatable results under any analytical procedure make them the future markers of choice for any type of genetic identification. Results We analyzed over 300 SNP in the genome of grapevine using a re-sequencing strategy in a selection of 11 genotypes. Among the identified polymorphisms, we selected 48 SNP spread across all grapevine chromosomes with allele frequencies balanced enough to provide sufficient information content for genetic identification in grapevine, while allowing a good genotyping success rate. Marker stability was tested in repeated analyses of a selected group of cultivars obtained worldwide to demonstrate their usefulness in genetic identification. Conclusions We have selected a set of 48 stable SNP markers with a high discrimination power and a uniform genome distribution (2-3 markers/chromosome), which is proposed as a standard set for grapevine (Vitis vinifera L.) genotyping. Any previous problems derived from microsatellite allele confusion between labs or the need to run reference cultivars to identify allele sizes disappear using this type of marker. 
Furthermore, because SNP markers are bi-allelic, allele identification and genotype naming are extremely simple and genotypes obtained with different equipment and by different laboratories are always fully comparable. PMID:22060012
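    Why balanced allele frequencies matter can be shown with the probability that two unrelated individuals share a genotype by chance. A minimal sketch, assuming Hardy-Weinberg proportions and independent (unlinked) loci; the function names are hypothetical and this is a standard discrimination measure, not necessarily the statistic used in the study. At p = 0.5 a bi-allelic SNP reaches its minimum match probability of 0.375, so 48 such idealized loci give a combined value on the order of 10⁻²¹.

```python
from math import prod
from typing import Sequence


def snp_match_probability(p: float) -> float:
    """Chance two random individuals share a genotype at one bi-allelic SNP (HWE assumed)."""
    q = 1 - p
    return p**4 + (2 * p * q) ** 2 + q**4   # both AA, both AB, or both BB


def combined_match_probability(freqs: Sequence[float]) -> float:
    """Multiply across loci, assuming the loci are independent."""
    return prod(snp_match_probability(p) for p in freqs)
```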

  19. Genetic and agronomic assessment of cob traits in corn under low and normal nitrogen management conditions.

    PubMed

    Jansen, Constantin; Zhang, Yongzhong; Liu, Hongjun; Gonzalez-Portilla, Pedro J; Lauter, Nick; Kumar, Bharath; Trucillo-Silva, Ignacio; Martin, Juan Pablo San; Lee, Michael; Simcox, Kevin; Schussler, Jeff; Dhugga, Kanwarpal; Lübberstedt, Thomas

    2015-07-01

    Exploring and understanding the genetic basis of cob biomass in relation to grain yield under varying nitrogen management regimes will help breeders to develop dual-purpose maize. With rising energy demands and costs for fossil fuels, alternative energy from renewable sources such as maize cobs will become competitive. Maize cobs have beneficial characteristics for utilization as feedstock including compact tissue, high cellulose content, and low ash and nitrogen content. Nitrogen is quantitatively the most important nutrient for plant growth. However, the influence of nitrogen fertilization on maize cob production is unclear. In this study, quantitative trait loci (QTL) have been analyzed for cob morphological traits such as cob weight, volume, length, diameter and cob tissue density, and grain yield under normal and low nitrogen regimes. 213 doubled-haploid lines of the intermated B73 × Mo17 (IBM) Syn10 population have been resequenced for 8575 bins, based on SNP markers. A total of 138 QTL were found for six traits across six trials using composite interval mapping with ten cofactors and empirical comparison-wise thresholds (P = 0.001). Despite moderate to high repeatabilities across trials, few QTL were consistent across trials and overall levels of explained phenotypic variance were lower than expected for some of the cob trait × trial combinations (R² = 7.3-43.1%). Variation for cob traits was less affected by nitrogen conditions than by grain yield. Thus, the economics of cob usage under low nitrogen regimes is promising.

  20. Efficient association study design via power-optimized tag SNP selection

    PubMed Central

    HAN, BUHM; KANG, HYUN MIN; SEO, MYEONG SEONG; ZAITLEN, NOAH; ESKIN, ELEAZAR

    2008-01-01

    Discovering statistical correlation between causal genetic variation and clinical traits through association studies is an important method for identifying the genetic basis of human diseases. Since fully resequencing a cohort is prohibitively costly, genetic association studies take advantage of local correlation structure (or linkage disequilibrium) between single nucleotide polymorphisms (SNPs) by selecting a subset of SNPs to be genotyped (tag SNPs). While many current association studies are performed using commercially available high-throughput genotyping products that define a set of tag SNPs, choosing tag SNPs remains an important problem for both custom follow-up studies as well as designing the high-throughput genotyping products themselves. The most widely used tag SNP selection method optimizes over the correlation between SNPs (r²). However, tag SNPs chosen based on an r² criterion do not necessarily maximize the statistical power of an association study. We propose a study design framework that chooses SNPs to maximize power and efficiently measures the power through empirical simulation. Empirical results based on the HapMap data show that our method gains considerable power over a widely used r²-based method, or equivalently reduces the number of tag SNPs required to attain the desired power of a study. Our power-optimized 100k whole genome tag set provides equivalent power to the Affymetrix 500k chip for the CEU population. For the design of custom follow-up studies, our method provides up to twice the power increase using the same number of tag SNPs as r²-based methods. Our method is publicly available via web server at http://design.cs.ucla.edu. PMID:18702637
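    For contrast with the power-optimized approach, the widely used r²-based baseline can be sketched as a greedy set cover: repeatedly pick the SNP that tags the most still-untagged SNPs at r² at or above a threshold. The 0.8 threshold, the input layout and `greedy_tag_snps` are assumptions for illustration; the paper's method replaces this coverage criterion with simulated power, which is not implemented here.

```python
from typing import Dict, FrozenSet, List, Set


def greedy_tag_snps(snps: List[str], r2: Dict[FrozenSet[str], float],
                    threshold: float = 0.8) -> List[str]:
    """r2 maps unordered SNP pairs to their r^2; missing pairs are treated as r^2 = 0."""
    def covers(tag: str) -> Set[str]:
        return {s for s in snps
                if s == tag or r2.get(frozenset((tag, s)), 0.0) >= threshold}

    untagged = set(snps)
    tags: List[str] = []
    while untagged:
        # max() returns the first SNP (in input order) with the largest new coverage
        best = max(snps, key=lambda t: len(covers(t) & untagged))
        tags.append(best)
        untagged -= covers(best)
    return tags
```

    Each round tags at least one new SNP (any untagged SNP covers itself), so the loop always terminates.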

  1. Low incidence of SNVs and indels in trio genomes of Cas9-mediated multiplex edited sheep.

    PubMed

    Wang, Xiaolong; Liu, Jing; Niu, Yiyuan; Li, Yan; Zhou, Shiwei; Li, Chao; Ma, Baohua; Kou, Qifang; Petersen, Bjoern; Sonstegard, Tad; Huang, Xingxu; Jiang, Yu; Chen, Yulin

    2018-05-25

    The simplicity of the CRISPR/Cas9 system has enabled its widespread applications in generating animal models, functional genomic screening and in treating genetic and infectious diseases. However, unintended mutations produced by off-target CRISPR/Cas9 nuclease activity may lead to negative consequences. In particular, a very recent study found that gene editing can introduce hundreds of unintended mutations into the genome, a finding that has attracted wide attention. To address these off-target concerns, a thorough characterization of CRISPR/Cas9-mediated off-target mutagenesis is urgently needed. Here we took advantage of our previously generated gene-edited sheep and performed family trio-based whole genome sequencing, which is capable of discriminating variants in the edited progenies that are inherited, naturally generated, or induced by genetic modification. Three family trios were re-sequenced at a high average depth of genomic coverage (~ 25.8×). After developing a pipeline to comprehensively analyze the sequence data for de novo single nucleotide variants, indels and structural variations from the genome, we found only a single unintended event: a 2.4 kb inversion, occurring at low incidence, induced by site-specific double-strand breaks between two sgRNA targeting sites at the MSTN locus. We provide the first report on the fidelity of CRISPR-based modification for sheep genomes targeted simultaneously for gene breaks at three coding sequence locations. The trio-based sequencing approach revealed almost negligible off-target modifications, providing timely evidence of the safe application of genome editing in vivo with CRISPR/Cas9.
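    The core of the trio design is that a variant present in the offspring but not explainable by its parents is a candidate de novo (natural or editing-induced) event. A minimal sketch of that Mendelian-inconsistency check, assuming unphased diploid genotypes given as allele pairs; the names are hypothetical, and a real pipeline would add depth, quality and parental-coverage filters before calling anything de novo.

```python
from typing import Dict, List, Tuple

Genotype = Tuple[str, str]


def is_mendelian_violation(child: Genotype, mother: Genotype, father: Genotype) -> bool:
    """True if the child's genotype cannot be formed from one maternal and one paternal allele."""
    a, b = child
    return not ((a in mother and b in father) or (b in mother and a in father))


def candidate_de_novo(sites: Dict[str, Tuple[Genotype, Genotype, Genotype]]) -> List[str]:
    """sites: site id -> (child, mother, father); returns sites that violate inheritance."""
    return [site for site, (c, m, f) in sites.items() if is_mendelian_violation(c, m, f)]
```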

  2. Genome-wide ancestry and divergence patterns from low-coverage sequencing data reveal a complex history of admixture in wild baboons

    PubMed Central

    Wall, Jeffrey D; Schlebusch, Stephen A; Alberts, Susan C; Cox, Laura A; Snyder-Mackler, Noah; Nevonen, Kimberly; Carbone, Lucia; Tung, Jenny

    2017-01-01

    Naturally occurring admixture has now been documented in every major primate lineage, suggesting its key role in primate evolutionary history. Active primate hybrid zones can provide valuable insight into this process. Here, we investigate the history of admixture in one of the best-studied natural primate hybrid zones, between yellow baboons (Papio cynocephalus) and anubis baboons (Papio anubis) in the Amboseli ecosystem of Kenya. We generated a new genome assembly for yellow baboon and low coverage genome-wide resequencing data from yellow baboons, anubis baboons, and known hybrids (n=44). Using a novel composite likelihood method for estimating local ancestry from low coverage data, we found high levels of genetic diversity and genetic differentiation between the parent taxa, and excellent agreement between genome-scale ancestry estimates and a priori pedigree, life history, and morphology-based estimates (r² = 0.899). However, even putatively unadmixed Amboseli yellow individuals carried a substantial proportion of anubis ancestry, presumably due to historical admixture. Further, the distribution of shared versus fixed differences between a putatively unadmixed Amboseli yellow baboon and an unadmixed anubis baboon, both sequenced at high coverage, is inconsistent with simple isolation-migration or equilibrium migration models. Our findings suggest a complex process of intermittent contact that has occurred multiple times in baboon evolutionary history, despite no obvious fitness costs to hybrids or major geographic or behavioral barriers. In combination with the extensive phenotypic data available for baboon hybrids, our results provide valuable context for understanding the history of admixture in primates, including in our own lineage. PMID:27145036

  3. Pedigree-based analysis of derivation of genome segments of an elite rice reveals key regions during its breeding.

    PubMed

    Zhou, Degui; Chen, Wei; Lin, Zechuan; Chen, Haodong; Wang, Chongrong; Li, Hong; Yu, Renbo; Zhang, Fengyun; Zhen, Gang; Yi, Junliang; Li, Kanghuo; Liu, Yaoguang; Terzaghi, William; Tang, Xiaoyan; He, Hang; Zhou, Shaochuan; Deng, Xing Wang

    2016-02-01

    Analyses of genome variations with high-throughput assays have improved our understanding of the genetic basis of crop domestication and identified the selected genome regions, but little is known about that of modern breeding, which has limited the usefulness of massive elite cultivars in further breeding. Here we deploy pedigree-based analysis of an elite rice, Huanghuazhan, to exploit key genome regions during its breeding. The cultivars in the pedigree were resequenced with 7.6× depth on average, and 2.1 million high-quality single nucleotide polymorphisms (SNPs) were obtained. Tracing the derivation of genome blocks with pedigree and information on SNPs revealed the chromosomal recombination during breeding, which showed that 26.22% of the Huanghuazhan genome consists of strictly conserved key regions. These major effect regions were further supported by a QTL mapping of 260 recombinant inbred lines derived from the cross of Huanghuazhan and a very dissimilar cultivar, Shuanggui 36, and by the genome profile of eight cultivars and 36 elite lines derived from Huanghuazhan. Intersecting these regions with cloned genes revealed that they include a number of key genes, which were then applied to demonstrate how Huanghuazhan was bred after 30 years of effort and to dissect the deficiency of artificial selection. We conclude that these regions are helpful for further breeding based on this pedigree and for performing breeding by design. Our study provides genetic dissection of modern rice breeding and sheds new light on how to perform genomewide breeding by design. © 2015 Society for Experimental Biology, Association of Applied Biologists and John Wiley & Sons Ltd.

  4. Identification of novel variants associated with warfarin stable dosage by use of a two-stage extreme phenotype strategy.

    PubMed

    Luo, Z; Li, X; Zhu, M; Tang, J; Li, Z; Zhou, X; Song, G; Liu, Z; Zhou, H; Zhang, W

    2017-01-01

    Essentials Required warfarin doses for mechanical heart valves vary greatly. A two-stage extreme phenotype design was used to identify novel warfarin dose-associated mutations. We identified a group of variants significantly associated with extreme warfarin dose. Four newly identified mutations account for 2.2% of warfarin dose discrepancies. Background The variation among patients in warfarin response complicates the management of warfarin therapy, and an improper therapeutic dose usually results in serious adverse events. Objective To use a two-stage extreme phenotype strategy in order to discover novel warfarin dose-associated mutations in heart valve replacement patients. Patients/method A total of 1617 stable-dose patients were enrolled and divided randomly into two cohorts. Stage I patients were genotyped into three groups on the basis of VKORC1-1639G>A and CYP2C9*3 polymorphisms; only patients with the therapeutic dose at the upper or lower 5% of each genotype group were selected as extreme-dose patients for resequencing of the targeted regions. Evaluation of the accuracy of the sequence data and the potential value of the stage I-identified significant mutations were conducted in a validation cohort of 420 subjects. Results A group of mutations were found to be significantly associated with the extreme warfarin dose. The validation work finally identified four novel mutations, i.e. DNMT3A rs2304429 (24.74%), CYP1A1 rs3826041 (47.35%), STX1B rs72800847 (7.01%), and NQO1 rs10517 (36.11%), which independently and significantly contributed to the overall variability in the warfarin dose. After addition of these four mutations, the estimated regression equation was able to account for 56.2% (R 2 Adj = 0.562) of the overall variability in the warfarin maintenance dose, with a predictive accuracy of 62.4%. Conclusion Our study provides evidence linking genetic variations in STX1B, DNMT3A and CYP1A1 to warfarin maintenance dose. 
The newly identified mutations together account for 2.2% of warfarin dose discrepancy. © 2016 The Authors. Journal of Thrombosis and Haemostasis published by Wiley Periodicals, Inc. on behalf of International Society on Thrombosis and Haemostasis.
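    The stage I selection step (upper and lower 5% of stable dose within each genotype group) can be sketched as below. The input layout, the floor rounding of the tail size and `select_extremes` are hypothetical choices for illustration; the abstract does not state how tail sizes were rounded or ties broken.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def select_extremes(patients: List[Tuple[str, str, float]],
                    fraction: float = 0.05) -> Dict[str, List[str]]:
    """patients: (id, genotype group, stable dose). Returns, per group, the ids in the
    lowest and highest `fraction` of doses within that group."""
    groups: Dict[str, List[Tuple[float, str]]] = defaultdict(list)
    for pid, group, dose in patients:
        groups[group].append((dose, pid))
    selected: Dict[str, List[str]] = {}
    for group, members in groups.items():
        members.sort()                          # ascending dose, ties broken by id
        k = int(len(members) * fraction)        # tail size, rounded down (assumption)
        selected[group] = [pid for _, pid in members[:k] + members[-k:]] if k else []
    return selected
```

    Selecting within genotype groups, rather than across the whole cohort, keeps the known VKORC1 and CYP2C9 effects from dominating the extremes, so the resequenced tails are enriched for other contributing variants.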

  5. A variant of Rubus yellow net virus with altered genomic organization.

    PubMed

    Diaz-Lara, Alfredo; Mosier, Nola J; Keller, Karen E; Martin, Robert R

    2015-02-01

    Rubus yellow net virus (RYNV) is a member of the genus Badnavirus (family: Caulimoviridae). RYNV infects Rubus species, causing chlorosis of the tissue along the leaf veins and giving an unevenly distributed netted symptom in some cultivars of red and black raspberry. Recently, a strain of RYNV (RYNV-Ca) was sequenced from a Rubus idaeus plant in Alberta, Canada, exhibiting such symptoms. The viral genome contained seven open reading frames (ORFs), five of them in the sense strand, including a large polyprotein. Here we describe a graft-transmissible strain of RYNV from Europe infecting the cultivar 'Baumforth's Seedling A' (named RYNV-BS), which was sequenced using rolling circle amplification, enzymatic digestion, cloning and primer walking, and then resequenced at 5X coverage. This sequence was compared with the RYNV-Ca genome, and significant differences were observed. Genomic analysis identified differences in the arrangement of coding regions, promoter elements, and presence of motifs. The genome of RYNV-BS comprised five ORFs (four in the sense strand and one in the antisense strand). ORFs 1, 2, and 3 showed a high degree of homology to RYNV-Ca, while ORFs 4 and 6 of RYNV-BS were quite distinct. Also, the predicted ORFs 5 and 7 of RYNV-Ca were absent from the RYNV-BS sequence. These differences may account for the lack of aphid transmissibility of RYNV-BS.

  6. Anatomy of a hash-based long read sequence mapping algorithm for next generation DNA sequencing.

    PubMed

    Misra, Sanchit; Agrawal, Ankit; Liao, Wei-keng; Choudhary, Alok

    2011-01-15

    Recently, a number of programs have been proposed for mapping short reads to a reference genome. Many of them are heavily optimized for short-read mapping and hence are very efficient for short queries, but this optimization makes them inefficient or inapplicable for reads longer than 200 bp. However, many sequencers already generate longer reads, and more are expected to follow. For long-read sequence mapping there are limited options; BLAT, SSAHA2, FANGS and BWA-SW are among the most popular. Resequencing and personalized medicine, however, need much faster software to map these long sequencing reads to a reference genome to identify SNPs or rare transcripts. We present AGILE (AliGnIng Long rEads), a hash-table-based high-throughput sequence mapping algorithm for longer 454 reads that uses diagonal multiple seed-match criteria, customized q-gram filtering and a dynamic incremental search approach, among other heuristics, to optimize every step of the mapping process. In our experiments, AGILE is more accurate than BLAT and comparable to BWA-SW and SSAHA2. For practical error rates (< 5%) and read lengths (200-1000 bp), AGILE is significantly faster than BLAT, SSAHA2 and BWA-SW. Even in other cases, AGILE is comparable to BWA-SW and several times faster than BLAT and SSAHA2. http://www.ece.northwestern.edu/~smi539/agile.html.
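
    The general idea behind hash-table mappers of this kind, indexing the reference by short words and letting seed hits vote on a diagonal (reference offset minus read offset), can be sketched in Python. This is a simplified illustration, not AGILE's implementation: it uses exact k-mer seeds and a single best diagonal, and omits the q-gram filtering and incremental search heuristics mentioned above; the function names and the `min_hits` parameter are hypothetical.

```python
from collections import Counter, defaultdict


def build_index(reference, k):
    """Hash table mapping every k-mer of the reference to its start positions."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def best_diagonal(read, index, k, min_hits=2):
    """Return the reference offset (diagonal) supported by the most seed hits.

    A seed hit at read position j and reference position i votes for
    diagonal i - j; a true alignment piles up votes on one diagonal,
    while a sequencing error only removes the few seeds that overlap it.
    """
    votes = Counter()
    for j in range(len(read) - k + 1):
        for i in index.get(read[j:j + k], ()):
            votes[i - j] += 1
    if not votes:
        return None
    diagonal, hits = votes.most_common(1)[0]
    return diagonal if hits >= min_hits else None
```

    A full mapper would then align the read around the winning diagonal; the sketch stops at choosing the candidate location.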

  7. Genomic Variants Revealed by Invariably Missing Genotypes in Nelore Cattle

    PubMed Central

    da Silva, Joaquim Manoel; Giachetto, Poliana Fernanda; da Silva, Luiz Otávio Campos; Cintra, Leandro Carrijo; Paiva, Samuel Rezende; Caetano, Alexandre Rodrigues; Yamagishi, Michel Eduardo Beleza

    2015-01-01

    High-density genotyping panels have been used in a wide range of applications. From population genetics to genome-wide association studies, this technology still offers the lowest-cost and most consistent solution for generating SNP data. However, regardless of the application, part of the generated data is always discarded from final datasets based on quality control criteria used to remove unreliable markers. Some of the discarded data consist of markers that failed to generate genotypes, labeled as missing genotypes. A subset of missing genotypes that occur in the whole population under study may be caused by technical issues, but can also be explained by genomic variants near the assayed SNP that prevent the genotyping probes from annealing. The latter case may contain relevant information, because these missing genotypes might be used to identify population-specific genomic variants. To assess which case is more prevalent, we used Illumina HD Bovine chip genotypes from 1,709 Nelore (Bos indicus) samples. We found 3,200 missing genotypes among the whole population. NGS resequencing data from 8 sires confirmed the presence of genomic variants within the flanking regions of 81.56% of these missing genotypes. Furthermore, we discovered 3,300 novel SNPs/Indels, 31% of which are located in genes that may affect traits of importance for the genetic improvement of cattle production. PMID:26305794
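
    Flagging markers that fail in the whole population, instead of simply discarding them with a call-rate filter, is a single pass over the genotype matrix. A minimal Python sketch, assuming genotypes are held as a per-marker list of calls with `--` marking a failed call (the dictionary layout and the function names are hypothetical):

```python
def call_rate(calls, missing="--"):
    """Fraction of samples with a non-missing genotype call."""
    return sum(call != missing for call in calls) / len(calls)


def invariably_missing(genotypes, missing="--"):
    """Markers whose genotype is missing in every sample.

    genotypes: dict marker_id -> list of per-sample calls (e.g. "AB"),
    with `missing` marking a failed call (an assumed convention).
    Markers with no samples at all are skipped.
    """
    return sorted(
        marker
        for marker, calls in genotypes.items()
        if calls and call_rate(calls, missing) == 0.0
    )
```

    Markers returned here are only candidates; as in the study, resequencing data are needed to tell probe-interfering variants apart from purely technical failures.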

  8. Evolved osmotolerant Escherichia coli mutants frequently exhibit defective N-acetylglucosamine catabolism and point mutations in cell shape-regulating protein MreB.

    PubMed

    Winkler, James D; Garcia, Carlos; Olson, Michelle; Callaway, Emily; Kao, Katy C

    2014-06-01

    Biocatalyst robustness toward stresses imposed during fermentation is important for efficient bio-based production. Osmotic stress, imposed by high osmolyte concentrations or dense populations, can significantly impact growth and productivity. In order to better understand the osmotic stress tolerance phenotype, we evolved sexual (capable of in situ DNA exchange) and asexual Escherichia coli strains under sodium chloride (NaCl) stress. All isolates had significantly improved growth under selection and could grow in up to 0.80 M (47 g/liter) NaCl, a concentration that completely inhibits the growth of the unevolved parental strains. Whole genome resequencing revealed frequent mutations in genes controlling N-acetylglucosamine catabolism (nagC, nagA), cell shape (mrdA, mreB), osmoprotectant uptake (proV), and motility (fimA). Possible epistatic interactions between nagC, nagA, fimA, and proV deletions were also detected when reconstructed as defined mutations. Biofilm formation under osmotic stress was found to be decreased in most mutant isolates, coupled with perturbations in indole secretion. Transcriptional analysis also revealed significant changes in ompACGL porin expression and increased transcription of sulfonate uptake systems in the evolved mutants. These findings expand our current knowledge of the osmotic stress phenotype and will be useful for the rational engineering of osmotic tolerance into industrial strains in the future. Copyright © 2014, American Society for Microbiology. All Rights Reserved.

  9. Population-specific recombination sites within the human MHC region.

    PubMed

    Lam, T H; Shen, M; Chia, J-M; Chan, S H; Ren, E C

    2013-08-01

    Genetic rearrangement by recombination is one of the major driving forces of genome evolution, and recombination is known to occur at non-random, discrete recombination sites within the genome. Mapping recombination sites has proved difficult, particularly in the human MHC region, which is complicated by both population variation and highly polymorphic HLA genes. To overcome these problems, HLA-typed individuals from three representative populations (Asian, European and African) were used to generate phased HLA haplotypes. Extended haplotype homozygosity (EHH) plots constructed from the phased haplotype data revealed discrete EHH drops corresponding to recombination events, and these signatures differed between populations. Surprisingly, the majority of recombination sites detected are unique to each population rather than common. Unique recombination sites account for 56.8% (21/37 of total sites) in the Asian cohort, 50.0% (15/30 sites) in Europeans and 63.2% (24/38 sites) in Africans. Validation carried out at a known 45 kb sperm-typing recombination site (HLA-F-telomeric) showed that EHH was an efficient method to narrow the recombination region to 826 bp, and this was further refined to 660 bp by resequencing. This approach significantly enhanced mapping of the genomic architecture within the human MHC, and will be useful in studies to identify disease risk genes.
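
    EHH, as commonly defined, is the probability that two randomly chosen chromosomes carrying a given core allele are identical over the whole interval from the core to a marker at some distance; plotted against distance it decays, and abrupt drops are read as recombination. A minimal Python sketch of that standard definition, assuming the phased haplotypes have already been restricted to carriers of the core allele (the string encoding and the function name are hypothetical):

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core, end):
    """Extended haplotype homozygosity between marker `core` and marker `end`.

    haplotypes: phased haplotypes (equal-length strings, one allele per marker)
    that all carry the core allele of interest. EHH is the fraction of
    haplotype pairs that are identical over the whole interval.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    lo, hi = min(core, end), max(core, end)
    counts = Counter(h[lo:hi + 1] for h in haplotypes)
    identical_pairs = sum(comb(c, 2) for c in counts.values())
    return identical_pairs / comb(n, 2)
```

    Evaluating `ehh(haps, core, e)` for successive markers `e` on each side of the core gives the decay curve; a sharp step between two adjacent markers brackets a candidate recombination interval, which the study then narrowed further by resequencing.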

  10. The rubber tree genome reveals new insights into rubber production and species adaptation.

    PubMed

    Tang, Chaorong; Yang, Meng; Fang, Yongjun; Luo, Yingfeng; Gao, Shenghan; Xiao, Xiaohu; An, Zewei; Zhou, Binhui; Zhang, Bing; Tan, Xinyu; Yeang, Hoong-Yeet; Qin, Yunxia; Yang, Jianghua; Lin, Qiang; Mei, Hailiang; Montoro, Pascal; Long, Xiangyu; Qi, Jiyan; Hua, Yuwei; He, Zilong; Sun, Min; Li, Wenjie; Zeng, Xia; Cheng, Han; Liu, Ying; Yang, Jin; Tian, Weimin; Zhuang, Nansheng; Zeng, Rizhong; Li, Dejun; He, Peng; Li, Zhe; Zou, Zhi; Li, Shuangli; Li, Chenji; Wang, Jixiang; Wei, Dong; Lai, Chao-Qiang; Luo, Wei; Yu, Jun; Hu, Songnian; Huang, Huasun

    2016-05-23

    The Para rubber tree (Hevea brasiliensis) is an economically important tropical tree species that produces natural rubber, an essential industrial raw material. Here we present a high-quality genome assembly of this species (1.37 Gb, scaffold N50 = 1.28 Mb) that covers 93.8% of the genome (1.47 Gb) and harbours 43,792 predicted protein-coding genes. A striking expansion of the REF/SRPP (rubber elongation factor/small rubber particle protein) gene family and its divergence into several laticifer-specific isoforms seem crucial for rubber biosynthesis. The REF/SRPP family has isoforms with sizes similar to or larger than SRPP1 (204 amino acids) in 17 other plants examined, but no isoforms with similar sizes to REF1 (138 amino acids), the predominant molecular variant. A pivotal point in Hevea evolution was the emergence of REF1, which is located on the surface of large rubber particles that account for 93% of rubber in the latex (despite constituting only 6% of total rubber particles, large and small). The stringent control of ethylene synthesis under active ethylene signalling and response in laticifers resolves a longstanding mystery of ethylene stimulation in rubber production. Our study, which includes the re-sequencing of five other Hevea cultivars and extensive RNA-seq data, provides a valuable resource for functional genomics and tools for breeding elite Hevea cultivars.

  11. Reconstruction of Haplotype-Blocks Selected during Experimental Evolution.

    PubMed

    Franssen, Susanne U; Barton, Nicholas H; Schlötterer, Christian

    2017-01-01

    The genetic analysis of experimentally evolving populations typically relies on short reads from pooled individuals (Pool-Seq). While this method provides reliable allele frequency estimates, the underlying haplotype structure remains poorly characterized. With small population sizes and adaptive variants that start from low frequencies, the interpretation of selection signatures in most Evolve and Resequencing studies remains challenging. To facilitate the characterization of selection targets, we propose a new approach that reconstructs selected haplotypes from replicated time series, using Pool-Seq data. We identify selected haplotypes through the correlated frequencies of alleles carried by them. Computer simulations indicate that selected haplotype-blocks of several Mb can be reconstructed with high confidence and low error rates, even when allele frequencies change only by 20% across three replicates. Applying this method to real data from D. melanogaster populations adapting to a hot environment, we identify a selected haplotype-block of 6.93 Mb. We confirm the presence of this haplotype-block in evolved populations by experimental haplotyping, demonstrating the power and accuracy of our haplotype reconstruction from Pool-Seq data. We propose that the combination of allele frequency estimates with haplotype information will provide the key to understanding the dynamics of adaptive alleles. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
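
    The central idea, that alleles carried on the same selected haplotype rise and fall together across time points and replicates, can be illustrated by grouping SNPs whose frequency trajectories are highly correlated. The Python sketch below is a deliberately simplified stand-in and not the authors' algorithm: the Pearson correlation threshold of 0.9, the single-linkage grouping, and the data layout are all assumptions chosen for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length frequency vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant trajectory carries no co-variation signal
    return sxy / sqrt(sxx * syy)


def cluster_snps(trajectories, threshold=0.9):
    """Group SNPs whose allele-frequency trajectories are strongly correlated.

    trajectories: dict snp_id -> frequencies over all time points and
    replicates, concatenated in the same order for every SNP.
    Groups are connected components of the "r >= threshold" graph.
    """
    parent = {snp: snp for snp in trajectories}

    def find(snp):
        while parent[snp] != snp:
            parent[snp] = parent[parent[snp]]  # path halving
            snp = parent[snp]
        return snp

    for a, b in combinations(trajectories, 2):
        if pearson(trajectories[a], trajectories[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for snp in trajectories:
        groups.setdefault(find(snp), []).append(snp)
    return sorted(sorted(group) for group in groups.values())
```

    Correlating raw frequencies also rewards SNPs that merely start at similar levels; correlating per-replicate frequency changes instead would be a natural variant of this sketch.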

  12. Next generation semiconductor based-sequencing of a nutrigenetics target gene (GPR120) and association with growth rate in Italian Large White pigs.

    PubMed

    Fontanesi, Luca; Bertolini, Francesca; Scotti, Emilio; Schiavo, Giuseppina; Colombo, Michela; Trevisi, Paolo; Ribani, Anisa; Buttazzoni, Luca; Russo, Vincenzo; Dall'Olio, Stefania

    2015-01-01

    The GPR120 gene (also known as FFAR4 or O3FAR1) encodes a functional omega-3 fatty acid receptor/sensor that mediates potent insulin-sensitizing effects by repressing macrophage-induced tissue inflammation. Because of this functional role, GPR120 could be considered a potential target gene in animal nutrigenetics. In this work we resequenced the porcine GPR120 gene by high-throughput Ion Torrent semiconductor sequencing of amplified fragments obtained from 8 DNA pools comprising, in total, 153 pigs of different breeds/populations (two Italian Large White pools, Italian Duroc, Italian Landrace, Casertana, Pietrain, Meishan, and wild boars). Three single nucleotide polymorphisms (SNPs), two synonymous substitutions and one in the putative 3'-untranslated region (g.114765469C > T), were identified, and their allele frequencies were estimated from sequencing read counts. The g.114765469C > T SNP was also genotyped by PCR-RFLP, confirming the estimated frequency in the Italian Large White pools. This SNP was then analyzed in two Italian Large White cohorts using a selective genotyping approach based on pigs with extreme and divergent estimated breeding values (EBV) for back fat thickness (BFT) and average daily gain (ADG). Significant differences in allele and genotype frequency distributions were observed between the extreme ADG-EBV groups (P < 0.001), whereas this marker was not associated with BFT-EBV.
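
    Estimating an allele frequency from a DNA pool by read counting amounts to taking the fraction of reads that carry the alternative allele. A minimal Python sketch, assuming per-SNP reference and alternative read counts are already available (the function name is hypothetical); the binomial standard error reflects read sampling only and ignores unequal DNA contributions of individuals to the pool:

```python
from math import sqrt


def pooled_allele_frequency(ref_reads, alt_reads):
    """Alternative-allele frequency in a DNA pool from read counts.

    Returns (frequency, standard_error); the standard error is the
    binomial one for sequencing depth only.
    """
    depth = ref_reads + alt_reads
    if depth == 0:
        raise ValueError("no reads cover this position")
    p = alt_reads / depth
    return p, sqrt(p * (1 - p) / depth)
```

    Because pool construction adds variance that read depth cannot capture, an independent check such as the PCR-RFLP genotyping described above remains useful.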

  13. A Genome-Wide Scan for Evidence of Selection in a Maize Population Under Long-Term Artificial Selection for Ear Number

    PubMed Central

    Beissinger, Timothy M.; Hirsch, Candice N.; Vaillancourt, Brieanne; Deshpande, Shweta; Barry, Kerrie; Buell, C. Robin; Kaeppler, Shawn M.; Gianola, Daniel; de Leon, Natalia

    2014-01-01

    A genome-wide scan to detect evidence of selection was conducted in the Golden Glow maize long-term selection population. The population had been subjected to selection for increased number of ears per plant for 30 generations, with an empirically estimated effective population size ranging from 384 to 667 individuals and an increase of more than threefold in the number of ears per plant. Allele frequencies at >1.2 million single-nucleotide polymorphism loci were estimated from pooled whole-genome resequencing data, and FST values across sliding windows were employed to assess divergence between the population preselection and the population postselection. Twenty-eight highly divergent regions were identified, with half of these regions providing gene-level resolution on potentially selected variants. Approximately 93% of the divergent regions do not demonstrate a significant decrease in heterozygosity, which suggests that they are not approaching fixation. Also, most regions display a pattern consistent with a soft-sweep model as opposed to a hard-sweep model, suggesting that selection mostly operated on standing genetic variation. For at least 25% of the regions, results suggest that selection operated on variants located outside of currently annotated coding regions. These results provide insights into the underlying genetic effects of long-term artificial selection and identification of putative genetic elements underlying number of ears per plant in maize. PMID:24381334
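
    Divergence between the preselection and postselection populations was summarized with FST over sliding windows. The abstract does not give the estimator, so the Python sketch below is a hypothetical illustration using the simple (HT - HS)/HT form for two populations, combined within a window as a ratio of sums; it takes allele frequencies as already estimated from the pools and makes no correction for pool sampling noise. Function names and parameters are assumptions.

```python
def fst_terms(p1, p2):
    """Numerator and denominator of (HT - HS) / HT for one biallelic SNP."""
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return h_t - h_s, h_t


def windowed_fst(positions, freq_before, freq_after, window, step):
    """Sliding-window FST as a ratio of summed numerators to summed denominators.

    Returns (window_start, window_end, fst) for every window that contains
    at least one SNP polymorphic in the combined sample.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    if not positions:
        return []
    results = []
    start, last = min(positions), max(positions)
    while start <= last:
        num = den = 0.0
        for pos, p1, p2 in zip(positions, freq_before, freq_after):
            if start <= pos < start + window:
                n, d = fst_terms(p1, p2)
                num += n
                den += d
        if den > 0:
            results.append((start, start + window, num / den))
        start += step
    return results
```

    Summing numerators and denominators before dividing keeps nearly monomorphic SNPs from dominating a window, a common choice for window-level FST.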

  14. An Advanced Preclinical Mouse Model for Acute Myeloid Leukemia Using Patients' Cells of Various Genetic Subgroups and In Vivo Bioluminescence Imaging

    PubMed Central

    Vick, Binje; Rothenberg, Maja; Sandhöfer, Nadine; Carlet, Michela; Finkenzeller, Cornelia; Krupka, Christina; Grunert, Michaela; Trumpp, Andreas; Corbacioglu, Selim; Ebinger, Martin; André, Maya C.; Hiddemann, Wolfgang; Schneider, Stephanie; Subklewe, Marion; Metzeler, Klaus H.; Spiekermann, Karsten; Jeremias, Irmela

    2015-01-01

    Acute myeloid leukemia (AML) is a clinically and molecularly heterogeneous disease with poor outcome. Adequate model systems are required for preclinical studies to improve understanding of AML biology and to develop novel, rational treatment approaches. Xenografts in immunodeficient mice allow functional studies to be performed on patient-derived AML cells. We have established an improved model system that integrates serial retransplantation of patient-derived xenograft (PDX) cells in mice, genetic manipulation by lentiviral transduction, and essential quality controls by immunophenotyping and targeted resequencing of driver genes. Of 29 samples, 17 showed primary engraftment; 10 of these 17 could be retransplanted, and some allowed virtually indefinite serial transplantation. Five of 6 samples were successfully transduced using lentiviruses. Neither serial transplantation nor genetic engineering markedly altered the sample characteristics analyzed. Transgene expression was stable in PDX AML cells. For example, recombinant luciferase enabled in vivo bioluminescence imaging and highly sensitive and reliable disease monitoring; imaging visualized minimal disease at 1 PDX cell in 10,000 mouse bone marrow cells and facilitated quantification of leukemia-initiating cells. We conclude that serial expansion, genetic engineering and imaging are valuable tools for improving the individualized xenograft mouse model of AML. Prospectively, these advances enable repeated, clinically relevant studies of AML biology and preclinical treatment trials on genetically defined and heterogeneous subgroups. PMID:25793878

  15. Barley whole exome capture: a tool for genomic research in the genus Hordeum and beyond

    PubMed Central

    Mascher, Martin; Richmond, Todd A; Gerhardt, Daniel J; Himmelbach, Axel; Clissold, Leah; Sampath, Dharanya; Ayling, Sarah; Steuernagel, Burkhard; Pfeifer, Matthias; D'Ascenzo, Mark; Akhunov, Eduard D; Hedley, Pete E; Gonzales, Ana M; Morrell, Peter L; Kilian, Benjamin; Blattner, Frank R; Scholz, Uwe; Mayer, Klaus FX; Flavell, Andrew J; Muehlbauer, Gary J; Waugh, Robbie; Jeddeloh, Jeffrey A; Stein, Nils

    2013-01-01

    Advanced resources for genome-assisted research in barley (Hordeum vulgare), including a whole-genome shotgun assembly and an integrated physical map, have recently become available. These have made possible studies that aim to assess genetic diversity or to isolate single genes by whole-genome resequencing and in silico variant detection. However, such an approach remains expensive given the 5 Gb size of the barley genome. Targeted sequencing of the mRNA-coding exome reduces barley genomic complexity more than 50-fold, thus dramatically reducing this heavy sequencing and analysis load. We have developed and employed an in-solution hybridization-based sequence capture platform to selectively enrich for a 61.6 megabase coding sequence target that includes predicted genes from the genome assembly of the cultivar Morex as well as publicly available full-length cDNAs and de novo assembled RNA-Seq consensus sequence contigs. The platform provides highly specific capture with substantial and reproducible enrichment of targeted exons, both for cultivated barley and for related species. We show that this exome capture platform provides a clear path towards a broader and deeper understanding of the natural variation residing in the mRNA-coding part of the barley genome, and will thus constitute a valuable resource for applications such as mapping-by-sequencing and genetic diversity analyses. PMID:23889683
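
    Using only the two sizes quoted above, the reduction in sequencing target can be checked with simple arithmetic. The abstract does not state exactly how its "more than 50-fold" figure was derived, so this is a consistency check rather than the authors' calculation:

```latex
\frac{\text{genome size}}{\text{capture target}}
  = \frac{5\ \text{Gb}}{61.6\ \text{Mb}}
  = \frac{5000\ \text{Mb}}{61.6\ \text{Mb}}
  \approx 81
```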

  16. Quantitative phenotyping via deep barcode sequencing.

    PubMed

    Smith, Andrew M; Heisler, Lawrence E; Mellor, Joseph; Kaper, Fiona; Thompson, Michael J; Chee, Mark; Roth, Frederick P; Giaever, Guri; Nislow, Corey

    2009-10-01

    Next-generation DNA sequencing technologies have revolutionized diverse genomics applications, including de novo genome sequencing, SNP detection, chromatin immunoprecipitation, and transcriptome analysis. Here we apply deep sequencing to genome-scale fitness profiling to evaluate yeast strain collections in parallel. This method, Barcode analysis by Sequencing, or "Bar-seq," outperforms the current benchmark barcode microarray assay in terms of both dynamic range and throughput. When applied to a complex chemogenomic assay, Bar-seq quantitatively identifies drug targets, with performance superior to the benchmark microarray assay. We also show that Bar-seq is well-suited for a multiplex format. We completely re-sequenced and re-annotated the yeast deletion collection using deep sequencing, found that approximately 20% of the barcodes and common priming sequences varied from expectation, and used this revised list of barcode sequences to improve data quality. Together, this new assay and analysis routine provide a deep-sequencing-based toolkit for identifying gene-environment interactions on a genome-wide scale.
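
    Bar-seq replaces microarray hybridization with counting barcode sequences in sequencing reads. A minimal Python sketch of that counting step and a simple relative-fitness score, assuming each read begins with the strain barcode, a one-mismatch tolerance, and a pseudocounted log2 ratio of read shares (all assumptions, not the published protocol); a revised barcode list like the one described above is what would be supplied as `barcodes`:

```python
from collections import Counter
from math import log2


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def count_barcodes(reads, barcodes, max_mismatch=1):
    """Count reads per strain by matching each read's prefix to known barcodes.

    barcodes: dict strain -> barcode sequence. Reads matching no barcode, or
    more than one barcode within `max_mismatch`, are discarded.
    """
    counts = Counter()
    for read in reads:
        hits = [
            strain
            for strain, bc in barcodes.items()
            if len(read) >= len(bc) and hamming(read[: len(bc)], bc) <= max_mismatch
        ]
        if len(hits) == 1:
            counts[hits[0]] += 1
    return counts


def log2_fitness(treated, control, pseudocount=1):
    """log2 ratio of each strain's read share in treated vs. control Counters."""
    treated_total = sum(treated.values())
    control_total = sum(control.values())
    strains = set(treated) | set(control)
    return {
        s: log2(((treated[s] + pseudocount) / treated_total)
                / ((control[s] + pseudocount) / control_total))
        for s in strains
    }
```

    Discarding ambiguous reads rather than splitting them keeps counts conservative; strains depleted under a drug then show negative scores.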

  17. Comprehensive Annotation of the Parastagonospora nodorum Reference Genome Using Next-Generation Genomics, Transcriptomics and Proteogenomics

    PubMed Central

    Dodhia, Kejal; Stoll, Thomas; Hastie, Marcus; Furuki, Eiko; Ellwood, Simon R.; Williams, Angela H.; Tan, Yew-Foon; Testa, Alison C.; Gorman, Jeffrey J.; Oliver, Richard P.

    2016-01-01

    Parastagonospora nodorum, the causal agent of Septoria nodorum blotch (SNB), is an economically important pathogen of wheat (Triticum spp.), and a model for the study of necrotrophic pathology and genome evolution. The reference P. nodorum strain SN15 was the first Dothideomycete with a published genome sequence, and has been used as the basis for comparison within and between species. Here we present an updated reference genome assembly in which SNP and indel errors in the underlying assembly were corrected using deep resequencing data, together with extensive manual annotation of gene models using transcriptomic and proteomic sources of evidence (https://github.com/robsyme/Parastagonospora_nodorum_SN15). The updated assembly and annotation include 8,366 genes with modified protein sequences and 866 new genes. This study shows the benefits of combining a wide variety of experimental methods with expert curation to generate a reliable set of gene models. PMID:26840125

  18. Genomic mutation consequence calculator.

    PubMed

    Major, John E

    2007-11-15

    The genomic mutation consequence calculator (GMCC) is a tool that reliably and quickly calculates the consequence of arbitrary genomic mutations. GMCC also reports supporting annotations for the specified genomic region. The particular strength of GMCC is that it works in genomic space, not simply in spliced transcript space as some similar tools do. Within gene features, GMCC can report effects on splice sites, UTRs and coding regions in all isoforms affected by the mutation. A considerable number of genomic annotations are also reported, including genomic conservation scores, known SNPs, COSMIC mutations, disease associations and others. The manual interface also offers links out to various external databases and resources. In batch mode, GMCC returns a CSV file that can easily be parsed by the end user. GMCC is intended to support the many tumor resequencing efforts, but can be useful to any study investigating genomic mutations.
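
    GMCC works in genomic space across isoforms, splice sites and UTRs; the innermost step of any such calculator is classifying a substitution inside a single coding sequence. The Python sketch below covers only that step and is not GMCC's code: it assumes a 0-based position in spliced coding-sequence coordinates and the standard genetic code, and the function name and category labels are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def coding_consequence(cds, pos, alt):
    """Classify a single-base substitution at 0-based position `pos` of a CDS."""
    cds, alt = cds.upper(), alt.upper()
    offset = pos % 3
    start = pos - offset
    ref_codon = cds[start:start + 3]
    if len(ref_codon) < 3:
        raise ValueError("position falls in an incomplete codon")
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    if ref_aa == "*":
        return "stop_lost"
    return "missense"
```

    Mapping a genomic coordinate into each isoform's CDS, and handling splice-site and UTR positions, would sit on top of this function; that genomic-space layer is what the abstract identifies as GMCC's strength.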

  19. Genome sequence of the hot pepper provides insights into the evolution of pungency in Capsicum species.

    PubMed

    Kim, Seungill; Park, Minkyu; Yeom, Seon-In; Kim, Yong-Min; Lee, Je Min; Lee, Hyun-Ah; Seo, Eunyoung; Choi, Jaeyoung; Cheong, Kyeongchae; Kim, Ki-Tae; Jung, Kyongyong; Lee, Gir-Won; Oh, Sang-Keun; Bae, Chungyun; Kim, Saet-Byul; Lee, Hye-Young; Kim, Shin-Young; Kim, Myung-Shin; Kang, Byoung-Cheorl; Jo, Yeong Deuk; Yang, Hee-Bum; Jeong, Hee-Jin; Kang, Won-Hee; Kwon, Jin-Kyung; Shin, Chanseok; Lim, Jae Yun; Park, June Hyun; Huh, Jin Hoe; Kim, June-Sik; Kim, Byung-Dong; Cohen, Oded; Paran, Ilan; Suh, Mi Chung; Lee, Saet Buyl; Kim, Yeon-Ki; Shin, Younhee; Noh, Seung-Jae; Park, Junhyung; Seo, Young Sam; Kwon, Suk-Yoon; Kim, Hyun A; Park, Jeong Mee; Kim, Hyun-Jin; Choi, Sang-Bong; Bosland, Paul W; Reeves, Gregory; Jo, Sung-Hwan; Lee, Bong-Woo; Cho, Hyung-Taeg; Choi, Hee-Seung; Lee, Min-Soo; Yu, Yeisoo; Do Choi, Yang; Park, Beom-Seok; van Deynze, Allen; Ashrafi, Hamid; Hill, Theresa; Kim, Woo Taek; Pai, Hyun-Sook; Ahn, Hee Kyung; Yeam, Inhwa; Giovannoni, James J; Rose, Jocelyn K C; Sørensen, Iben; Lee, Sang-Jik; Kim, Ryan W; Choi, Ik-Young; Choi, Beom-Soon; Lim, Jong-Sung; Lee, Yong-Hwan; Choi, Doil

    2014-03-01

    Hot pepper (Capsicum annuum), one of the oldest domesticated crops in the Americas, is the most widely grown spice crop in the world. We report whole-genome sequencing and assembly of the hot pepper (Mexican landrace of Capsicum annuum cv. CM334) at 186.6× coverage. We also report resequencing of two cultivated peppers and de novo sequencing of the wild species Capsicum chinense. The genome size of the hot pepper was approximately fourfold larger than that of its close relative tomato, and the genome showed an accumulation of Gypsy and Caulimoviridae family elements. Integrative genomic and transcriptomic analyses suggested that change in gene expression and neofunctionalization of capsaicin synthase have shaped capsaicinoid biosynthesis. We found differential molecular patterns of ripening regulators and ethylene synthesis in hot pepper and tomato. The reference genome will serve as a platform for improving the nutritional and medicinal values of Capsicum species.

  20. Butterfly genome reveals promiscuous exchange of mimicry adaptations among species

    PubMed Central

    Dasmahapatra, Kanchon K; Walters, James R.; Briscoe, Adriana D.; Davey, John W.; Whibley, Annabel; Nadeau, Nicola J.; Zimin, Aleksey V.; Hughes, Daniel S. T.; Ferguson, Laura C.; Martin, Simon H.; Salazar, Camilo; Lewis, James J.; Adler, Sebastian; Ahn, Seung-Joon; Baker, Dean A.; Baxter, Simon W.; Chamberlain, Nicola L.; Chauhan, Ritika; Counterman, Brian A.; Dalmay, Tamas; Gilbert, Lawrence E.; Gordon, Karl; Heckel, David G.; Hines, Heather M.; Hoff, Katharina J.; Holland, Peter W.H.; Jacquin-Joly, Emmanuelle; Jiggins, Francis M.; Jones, Robert T.; Kapan, Durrell D.; Kersey, Paul; Lamas, Gerardo; Lawson, Daniel; Mapleson, Daniel; Maroja, Luana S.; Martin, Arnaud; Moxon, Simon; Palmer, William J.; Papa, Riccardo; Papanicolaou, Alexie; Pauchet, Yannick; Ray, David A.; Rosser, Neil; Salzberg, Steven L.; Supple, Megan A.; Surridge, Alison; Tenger-Trolander, Ayse; Vogel, Heiko; Wilkinson, Paul A.; Wilson, Derek; Yorke, James A.; Yuan, Furong; Balmuth, Alexi L.; Eland, Cathlene; Gharbi, Karim; Thomson, Marian; Gibbs, Richard A.; Han, Yi; Jayaseelan, Joy C.; Kovar, Christie; Mathew, Tittu; Muzny, Donna M.; Ongeri, Fiona; Pu, Ling-Ling; Qu, Jiaxin; Thornton, Rebecca L.; Worley, Kim C.; Wu, Yuan-Qing; Linares, Mauricio; Blaxter, Mark L.; Constant, Richard H. ffrench; Joron, Mathieu; Kronforst, Marcus R.; Mullen, Sean P.; Reed, Robert D.; Scherer, Steven E.; Richards, Stephen; Mallet, James; McMillan, W. Owen; Jiggins, Chris D.

    2012-01-01

    The evolutionary importance of hybridization and introgression has long been debated¹. We used genomic tools to investigate introgression in Heliconius, a rapidly radiating genus of neotropical butterflies widely used in studies of ecology, behaviour, mimicry and speciation²⁻⁵. We sequenced the genome of Heliconius melpomene and compared it with other taxa to investigate chromosomal evolution in Lepidoptera and gene flow among multiple Heliconius species and races. Among 12,657 predicted genes for Heliconius, biologically important expansions of families of chemosensory and Hox genes are particularly noteworthy. Chromosomal organisation has remained broadly conserved since the Cretaceous, when butterflies split from the silkmoth lineage. Using genomic resequencing, we show hybrid exchange of genes between three co-mimics, H. melpomene, H. timareta, and H. elevatus, especially at two genomic regions that control mimicry pattern. Closely related Heliconius species clearly exchange protective colour pattern genes promiscuously, implying a major role for hybridization in adaptive radiation. PMID:22722851
