Utilization of sequence on relatives to improve analysis of individuals' low-coverage NGS data
USDA-ARS?s Scientific Manuscript database
Low-coverage sequence data is expected to have low call rates under the prevailing paradigm that genotypes are first “called” from sequence data of each individual independently and subsequent analyses (including determination of haplotypes) are dependent on those called genotypes. However, provide...
Breaking Lander-Waterman’s Coverage Bound
Nashta-ali, Damoun; Motahari, Seyed Abolfazl; Hosseinkhalaj, Babak
2016-01-01
Lander-Waterman’s coverage bound establishes the total number of reads required to cover the whole genome of size G bases. In fact, their bound is a direct consequence of the well-known solution to the coupon collector’s problem which proves that for such genome, the total number of bases to be sequenced should be O(G ln G). Although the result leads to a tight bound, it is based on a tacit assumption that the set of reads are first collected through a sequencing process and then are processed through a computation process, i.e., there are two different machines: one for sequencing and one for processing. In this paper, we present a significant improvement compared to Lander-Waterman’s result and prove that by combining the sequencing and computing processes, one can re-sequence the whole genome with as low as O(G) sequenced bases in total. Our approach also dramatically reduces the required computational power for the combined process. Simulation results are performed on real genomes with different sequencing error rates. The results support our theory predicting the log G improvement on coverage bound and corresponding reduction in the total number of bases required to be sequenced. PMID:27806058
Variant calling in low-coverage whole genome sequencing of a Native American population sample.
Bizon, Chris; Spiegel, Michael; Chasse, Scott A; Gizer, Ian R; Li, Yun; Malc, Ewa P; Mieczkowski, Piotr A; Sailsbery, Josh K; Wang, Xiaoshu; Ehlers, Cindy L; Wilhelmsen, Kirk C
2014-01-30
The reduction in the cost of sequencing a human genome has led to the use of genotype sampling strategies in order to impute and infer the presence of sequence variants that can then be tested for associations with traits of interest. Low-coverage Whole Genome Sequencing (WGS) is a sampling strategy that overcomes some of the deficiencies seen in fixed content SNP array studies. Linkage-disequilibrium (LD) aware variant callers, such as the program Thunder, may provide a calling rate and accuracy that makes a low-coverage sequencing strategy viable. We examined the performance of an LD-aware variant calling strategy in a population of 708 low-coverage whole genome sequences from a community sample of Native Americans. We assessed variant calling through a comparison of the sequencing results to genotypes measured in 641 of the same subjects using a fixed content first generation exome array. The comparison was made using the variant calling routines GATK Unified Genotyper program and the LD-aware variant caller Thunder. Thunder was found to improve concordance in a coverage dependent fashion, while correctly calling nearly all of the common variants as well as a high percentage of the rare variants present in the sample. Low-coverage WGS is a strategy that appears to collect genetic information intermediate in scope between fixed content genotyping arrays and deep-coverage WGS. Our data suggests that low-coverage WGS is a viable strategy with a greater chance of discovering novel variants and associations than fixed content arrays for large sample association analyses.
Haplotype estimation using sequencing reads.
Delaneau, Olivier; Howie, Bryan; Cox, Anthony J; Zagury, Jean-François; Marchini, Jonathan
2013-10-03
High-throughput sequencing technologies produce short sequence reads that can contain phase information if they span two or more heterozygote genotypes. This information is not routinely used by current methods that infer haplotypes from genotype data. We have extended the SHAPEIT2 method to use phase-informative sequencing reads to improve phasing accuracy. Our model incorporates the read information in a probabilistic model through base quality scores within each read. The method is primarily designed for high-coverage sequence data or data sets that already have genotypes called. One important application is phasing of single samples sequenced at high coverage for use in medical sequencing and studies of rare diseases. Our method can also use existing panels of reference haplotypes. We tested the method by using a mother-father-child trio sequenced at high-coverage by Illumina together with the low-coverage sequence data from the 1000 Genomes Project (1000GP). We found that use of phase-informative reads increases the mean distance between switch errors by 22% from 274.4 kb to 328.6 kb. We also used male chromosome X haplotypes from the 1000GP samples to simulate sequencing reads with varying insert size, read length, and base error rate. When using short 100 bp paired-end reads, we found that using mixtures of insert sizes produced the best results. When using longer reads with high error rates (5-20 kb read with 4%-15% error per base), phasing performance was substantially improved. Copyright © 2013 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
Confetti: A Multiprotease Map of the HeLa Proteome for Comprehensive Proteomics*
Guo, Xiaofeng; Trudgian, David C.; Lemoff, Andrew; Yadavalli, Sivaramakrishna; Mirzaei, Hamid
2014-01-01
Bottom-up proteomics largely relies on tryptic peptides for protein identification and quantification. Tryptic digestion often provides limited coverage of protein sequence because of issues such as peptide length, ionization efficiency, and post-translational modification colocalization. Unfortunately, a region of interest in a protein, for example, because of proximity to an active site or the presence of important post-translational modifications, may not be covered by tryptic peptides. Detection limits, quantification accuracy, and isoform differentiation can also be improved with greater sequence coverage. Selected reaction monitoring (SRM) would also greatly benefit from being able to identify additional targetable sequences. In an attempt to improve protein sequence coverage and to target regions of proteins that do not generate useful tryptic peptides, we deployed a multiprotease strategy on the HeLa proteome. First, we used seven commercially available enzymes in single, double, and triple enzyme combinations. A total of 48 digests were performed. 5223 proteins were detected by analyzing the unfractionated cell lysate digest directly; with 42% mean sequence coverage. Additional strong-anion exchange fractionation of the most complementary digests permitted identification of over 3000 more proteins, with improved mean sequence coverage. We then constructed a web application (https://proteomics.swmed.edu/confetti) that allows the community to examine a target protein or protein isoform in order to discover the enzyme or combination of enzymes that would yield peptides spanning a certain region of interest in the sequence. Finally, we examined the use of nontryptic digests for SRM. From our strong-anion exchange fractionation data, we were able to identify three or more proteotypic SRM candidates within a single digest for 6056 genes. Surprisingly, in 25% of these cases the digest producing the most observable proteotypic peptides was neither trypsin nor Lys-C. SRM analysis of Asp-N versus tryptic peptides for eight proteins determined that Asp-N yielded higher signal in five of eight cases. PMID:24696503
Fu, Yong-Bi; Peterson, Gregory W; Dong, Yibo
2016-04-07
Genotyping-by-sequencing (GBS) has emerged as a useful genomic approach for exploring genome-wide genetic variation. However, GBS commonly samples a genome unevenly and can generate a substantial amount of missing data. These technical features would limit the power of various GBS-based genetic and genomic analyses. Here we present software called IgCoverage for in silico evaluation of genomic coverage through GBS with an individual or pair of restriction enzymes on one sequenced genome, and report a new set of 21 restriction enzyme combinations that can be applied to enhance GBS applications. These enzyme combinations were developed through an application of IgCoverage on 22 plant, animal, and fungus species with sequenced genomes, and some of them were empirically evaluated with different runs of Illumina MiSeq sequencing in 12 plant species. The in silico analysis of 22 organisms revealed up to eight times more genome coverage for the new combinations consisted of pairing four- or five-cutter restriction enzymes than the commonly used enzyme combination PstI + MspI. The empirical evaluation of the new enzyme combination (HinfI + HpyCH4IV) in 12 plant species showed 1.7-6 times more genome coverage than PstI + MspI, and 2.3 times more genome coverage in dicots than monocots. Also, the SNP genotyping in 12 Arabidopsis and 12 rice plants revealed that HinfI + HpyCH4IV generated 7 and 1.3 times more SNPs (with 0-16.7% missing observations) than PstI + MspI, respectively. These findings demonstrate that these novel enzyme combinations can be utilized to increase genome sampling and improve SNP genotyping in various GBS applications. Copyright © 2016 Fu et al.
Assessing pooled BAC and whole genome shotgun strategies for assembly of complex genomes.
Haiminen, Niina; Feltus, F Alex; Parida, Laxmi
2011-04-15
We investigate if pooling BAC clones and sequencing the pools can provide for more accurate assembly of genome sequences than the "whole genome shotgun" (WGS) approach. Furthermore, we quantify this accuracy increase. We compare the pooled BAC and WGS approaches using in silico simulations. Standard measures of assembly quality focus on assembly size and fragmentation, which are desirable for large whole genome assemblies. We propose additional measures enabling easy and visual comparison of assembly quality, such as rearrangements and redundant sequence content, relative to the known target sequence. The best assembly quality scores were obtained using 454 coverage of 15× linear and 5× paired (3kb insert size) reads (15L-5P) on Arabidopsis. This regime gave similarly good results on four additional plant genomes of very different GC and repeat contents. BAC pooling improved assembly scores over WGS assembly, coverage and redundancy scores improving the most. BAC pooling works better than WGS, however, both require a physical map to order the scaffolds. Pool sizes up to 12Mbp work well, suggesting this pooling density to be effective in medium-scale re-sequencing applications such as targeted sequencing of QTL intervals for candidate gene discovery. Assuming the current Roche/454 Titanium sequencing limitations, a 12 Mbp region could be re-sequenced with a full plate of linear reads and a half plate of paired-end reads, yielding 15L-5P coverage after read pre-processing. Our simulation suggests that massively over-sequencing may not improve accuracy. Our scoring measures can be used generally to evaluate and compare results of simulated genome assemblies.
Improved multiple displacement amplification (iMDA) and ultraclean reagents.
Motley, S Timothy; Picuri, John M; Crowder, Chris D; Minich, Jeremiah J; Hofstadler, Steven A; Eshoo, Mark W
2014-06-06
Next-generation sequencing sample preparation requires nanogram to microgram quantities of DNA; however, many relevant samples are comprised of only a few cells. Genomic analysis of these samples requires a whole genome amplification method that is unbiased and free of exogenous DNA contamination. To address these challenges we have developed protocols for the production of DNA-free consumables including reagents and have improved upon multiple displacement amplification (iMDA). A specialized ethylene oxide treatment was developed that renders free DNA and DNA present within Gram positive bacterial cells undetectable by qPCR. To reduce DNA contamination in amplification reagents, a combination of ion exchange chromatography, filtration, and lot testing protocols were developed. Our multiple displacement amplification protocol employs a second strand-displacing DNA polymerase, improved buffers, improved reaction conditions and DNA free reagents. The iMDA protocol, when used in combination with DNA-free laboratory consumables and reagents, significantly improved efficiency and accuracy of amplification and sequencing of specimens with moderate to low levels of DNA. The sensitivity and specificity of sequencing of amplified DNA prepared using iMDA was compared to that of DNA obtained with two commercial whole genome amplification kits using 10 fg (~1-2 bacterial cells worth) of bacterial genomic DNA as a template. Analysis showed >99% of the iMDA reads mapped to the template organism whereas only 0.02% of the reads from the commercial kits mapped to the template. To assess the ability of iMDA to achieve balanced genomic coverage, a non-stochastic amount of bacterial genomic DNA (1 pg) was amplified and sequenced, and data obtained were compared to sequencing data obtained directly from genomic DNA. The iMDA DNA and genomic DNA sequencing had comparable coverage 99.98% of the reference genome at ≥1X coverage and 99.9% at ≥5X coverage while maintaining both balance and representation of the genome. The iMDA protocol in combination with DNA-free laboratory consumables, significantly improved the ability to sequence specimens with low levels of DNA. iMDA has broad utility in metagenomics, diagnostics, ancient DNA analysis, pre-implantation embryo screening, single-cell genomics, whole genome sequencing of unculturable organisms, and forensic applications for both human and microbial targets.
Assessing pooled BAC and whole genome shotgun strategies for assembly of complex genomes
2011-01-01
Background We investigate if pooling BAC clones and sequencing the pools can provide for more accurate assembly of genome sequences than the "whole genome shotgun" (WGS) approach. Furthermore, we quantify this accuracy increase. We compare the pooled BAC and WGS approaches using in silico simulations. Standard measures of assembly quality focus on assembly size and fragmentation, which are desirable for large whole genome assemblies. We propose additional measures enabling easy and visual comparison of assembly quality, such as rearrangements and redundant sequence content, relative to the known target sequence. Results The best assembly quality scores were obtained using 454 coverage of 15× linear and 5× paired (3kb insert size) reads (15L-5P) on Arabidopsis. This regime gave similarly good results on four additional plant genomes of very different GC and repeat contents. BAC pooling improved assembly scores over WGS assembly, coverage and redundancy scores improving the most. Conclusions BAC pooling works better than WGS, however, both require a physical map to order the scaffolds. Pool sizes up to 12Mbp work well, suggesting this pooling density to be effective in medium-scale re-sequencing applications such as targeted sequencing of QTL intervals for candidate gene discovery. Assuming the current Roche/454 Titanium sequencing limitations, a 12 Mbp region could be re-sequenced with a full plate of linear reads and a half plate of paired-end reads, yielding 15L-5P coverage after read pre-processing. Our simulation suggests that massively over-sequencing may not improve accuracy. Our scoring measures can be used generally to evaluate and compare results of simulated genome assemblies. PMID:21496274
NASA Astrophysics Data System (ADS)
Nicolardi, Simone; Giera, Martin; Kooijman, Pieter; Kraj, Agnieszka; Chervet, Jean-Pierre; Deelder, André M.; van der Burgt, Yuri E. M.
2013-12-01
Particularly in the field of middle- and top-down peptide and protein analysis, disulfide bridges can severely hinder fragmentation and thus impede sequence analysis (coverage). Here we present an on-line/electrochemistry/ESI-FTICR-MS approach, which was applied to the analysis of the primary structure of oxytocin, containing one disulfide bridge, and of hepcidin, containing four disulfide bridges. The presented workflow provided up to 80 % (on-line) conversion of disulfide bonds in both peptides. With minimal sample preparation, such reduction resulted in a higher number of peptide backbone cleavages upon CID or ETD fragmentation, and thus yielded improved sequence coverage. The cycle times, including electrode recovery, were rapid and, therefore, might very well be coupled with liquid chromatography for protein or peptide separation, which has great potential for high-throughput analysis.
Recovering complete and draft population genomes from metagenome datasets
Sangwan, Naseer; Xia, Fangfang; Gilbert, Jack A.
2016-03-08
Assembly of metagenomic sequence data into microbial genomes is of fundamental value to improving our understanding of microbial ecology and metabolism by elucidating the functional potential of hard-to-culture microorganisms. Here, we provide a synthesis of available methods to bin metagenomic contigs into species-level groups and highlight how genetic diversity, sequencing depth, and coverage influence binning success. Despite the computational cost on application to deeply sequenced complex metagenomes (e.g., soil), covarying patterns of contig coverage across multiple datasets significantly improves the binning process. We also discuss and compare current genome validation methods and reveal how these methods tackle the problem ofmore » chimeric genome bins i.e., sequences from multiple species. Finally, we explore how population genome assembly can be used to uncover biogeographic trends and to characterize the effect of in situ functional constraints on the genome-wide evolution.« less
Recovering complete and draft population genomes from metagenome datasets
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sangwan, Naseer; Xia, Fangfang; Gilbert, Jack A.
Assembly of metagenomic sequence data into microbial genomes is of fundamental value to improving our understanding of microbial ecology and metabolism by elucidating the functional potential of hard-to-culture microorganisms. Here, we provide a synthesis of available methods to bin metagenomic contigs into species-level groups and highlight how genetic diversity, sequencing depth, and coverage influence binning success. Despite the computational cost on application to deeply sequenced complex metagenomes (e.g., soil), covarying patterns of contig coverage across multiple datasets significantly improves the binning process. We also discuss and compare current genome validation methods and reveal how these methods tackle the problem ofmore » chimeric genome bins i.e., sequences from multiple species. Finally, we explore how population genome assembly can be used to uncover biogeographic trends and to characterize the effect of in situ functional constraints on the genome-wide evolution.« less
Ancestry estimation and control of population stratification for sequence-based association studies.
Wang, Chaolong; Zhan, Xiaowei; Bragg-Gresham, Jennifer; Kang, Hyun Min; Stambolian, Dwight; Chew, Emily Y; Branham, Kari E; Heckenlively, John; Fulton, Robert; Wilson, Richard K; Mardis, Elaine R; Lin, Xihong; Swaroop, Anand; Zöllner, Sebastian; Abecasis, Gonçalo R
2014-04-01
Estimating individual ancestry is important in genetic association studies where population structure leads to false positive signals, although assigning ancestry remains challenging with targeted sequence data. We propose a new method for the accurate estimation of individual genetic ancestry, based on direct analysis of off-target sequence reads, and implement our method in the publicly available LASER software. We validate the method using simulated and empirical data and show that the method can accurately infer worldwide continental ancestry when used with sequencing data sets with whole-genome shotgun coverage as low as 0.001×. For estimates of fine-scale ancestry within Europe, the method performs well with coverage of 0.1×. On an even finer scale, the method improves discrimination between exome-sequenced study participants originating from different provinces within Finland. Finally, we show that our method can be used to improve case-control matching in genetic association studies and to reduce the risk of spurious findings due to population structure.
Expanding proteome coverage with orthogonal-specificity α-Lytic proteases
DOE Office of Scientific and Technical Information (OSTI.GOV)
Meyer, Jesse G.; Kim, Sangtae; Maltby, David A.
2014-03-01
Bottom-up proteomics studies traditionally involve proteome digestion with a single protease, trypsin. However, trypsin alone does not generate peptides that encompass the entire proteome. Alternative proteases have been explored, but most have specificity for charged amino acid side chains. Therefore, additional proteases that improve proteome coverage by cleavage at sequences complimentary to trypsin may increase proteome coverage. We demonstrate the novel application of two proteases for bottom-up proteomics: wild type alpha-lytic protease (WaLP), and an active site mutant of WaLP, M190A alpha-lytic protease (MaLP). We assess several relevant factors including MS/MS fragmentation, peptide length, peptide yield, and protease specificity. Bymore » combining data from separate digestions with trypsin, LysC, WaLP, and MaLP, proteome coverage was increased 101% compared to trypsin digestion alone. To demonstrate how the gained sequence coverage can access additional PTM information, we show identification of a number of novel phosphorylation sites in the S. pombe proteome and include an illustrative example from the protein MPD2, wherein two novel sites are identified, one in a tryptic peptide too short to identify and the other in a sequence devoid of tryptic sites. The specificity of WaLP and MaLP for aliphatic amino acid side chains was particularly valuable for coverage of membrane protein sequences, which increased 350% when the data from trypsin, LysC, WaLP, and MaLP were combined.« less
ECHO: A reference-free short-read error correction algorithm
Kao, Wei-Chun; Chan, Andrew H.; Song, Yun S.
2011-01-01
Developing accurate, scalable algorithms to improve data quality is an important computational challenge associated with recent advances in high-throughput sequencing technology. In this study, a novel error-correction algorithm, called ECHO, is introduced for correcting base-call errors in short-reads, without the need of a reference genome. Unlike most previous methods, ECHO does not require the user to specify parameters of which optimal values are typically unknown a priori. ECHO automatically sets the parameters in the assumed model and estimates error characteristics specific to each sequencing run, while maintaining a running time that is within the range of practical use. ECHO is based on a probabilistic model and is able to assign a quality score to each corrected base. Furthermore, it explicitly models heterozygosity in diploid genomes and provides a reference-free method for detecting bases that originated from heterozygous sites. On both real and simulated data, ECHO is able to improve the accuracy of previous error-correction methods by several folds to an order of magnitude, depending on the sequence coverage depth and the position in the read. The improvement is most pronounced toward the end of the read, where previous methods become noticeably less effective. Using a whole-genome yeast data set, it is demonstrated here that ECHO is capable of coping with nonuniform coverage. Also, it is shown that using ECHO to perform error correction as a preprocessing step considerably facilitates de novo assembly, particularly in the case of low-to-moderate sequence coverage depth. PMID:21482625
Improved Analysis of Nanopore Sequence Data and Scanning Nanopore Techniques
NASA Astrophysics Data System (ADS)
Szalay, Tamas
The field of nanopore research has been driven by the need to inexpensively and rapidly sequence DNA. In order to help realize this goal, this thesis describes the PoreSeq algorithm that identifies and corrects errors in real-world nanopore sequencing data and improves the accuracy of de novo genome assembly with increasing coverage depth. The approach relies on modeling the possible sources of uncertainty that occur as DNA advances through the nanopore and then using this model to find the sequence that best explains multiple reads of the same region of DNA. PoreSeq increases nanopore sequencing read accuracy of M13 bacteriophage DNA from 85% to 99% at 100X coverage. We also use the algorithm to assemble E. coli with 30X coverage and the lambda genome at a range of coverages from 3X to 50X. Additionally, we classify sequence variants at an order of magnitude lower coverage than is possible with existing methods. This thesis also reports preliminary progress towards controlling the motion of DNA using two nanopores instead of one. The speed at which the DNA travels through the nanopore needs to be carefully controlled to facilitate the detection of individual bases. A second nanopore in close proximity to the first could be used to slow or stop the motion of the DNA in order to enable a more accurate readout. The fabrication process for a new pyramidal nanopore geometry was developed in order to facilitate the positioning of the nanopores. This thesis demonstrates that two of them can be placed close enough to interact with a single molecule of DNA, which is a prerequisite for being able to use the driving force of the pores to exert fine control over the motion of the DNA. Another strategy for reading the DNA is to trap it completely with one pore and to move the second nanopore instead. To that end, this thesis also shows that a single strand of immobilized DNA can be captured in a scanning nanopore and examined for a full hour, with data from many scans at many different voltages obtained in order to detect a bound protein placed partway along the molecule.
Highly multiplexed targeted DNA sequencing from single nuclei.
Leung, Marco L; Wang, Yong; Kim, Charissa; Gao, Ruli; Jiang, Jerry; Sei, Emi; Navin, Nicholas E
2016-02-01
Single-cell DNA sequencing methods are challenged by poor physical coverage, high technical error rates and low throughput. To address these issues, we developed a single-cell DNA sequencing protocol that combines flow-sorting of single nuclei, time-limited multiple-displacement amplification (MDA), low-input library preparation, DNA barcoding, targeted capture and next-generation sequencing (NGS). This approach represents a major improvement over our previous single nucleus sequencing (SNS) Nature Protocols paper in terms of generating higher-coverage data (>90%), thereby enabling the detection of genome-wide variants in single mammalian cells at base-pair resolution. Furthermore, by pooling 48-96 single-cell libraries together for targeted capture, this approach can be used to sequence many single-cell libraries in parallel in a single reaction. This protocol greatly reduces the cost of single-cell DNA sequencing, and it can be completed in 5-6 d by advanced users. This single-cell DNA sequencing protocol has broad applications for studying rare cells and complex populations in diverse fields of biological research and medicine.
Analysis of quality raw data of second generation sequencers with Quality Assessment Software.
Ramos, Rommel Tj; Carneiro, Adriana R; Baumbach, Jan; Azevedo, Vasco; Schneider, Maria Pc; Silva, Artur
2011-04-18
Second generation technologies have advantages over Sanger; however, they have resulted in new challenges for the genome construction process, especially because of the small size of the reads, despite the high degree of coverage. Independent of the program chosen for the construction process, DNA sequences are superimposed, based on identity, to extend the reads, generating contigs; mismatches indicate a lack of homology and are not included. This process improves our confidence in the sequences that are generated. We developed Quality Assessment Software, with which one can review graphs showing the distribution of quality values from the sequencing reads. This software allow us to adopt more stringent quality standards for sequence data, based on quality-graph analysis and estimated coverage after applying the quality filter, providing acceptable sequence coverage for genome construction from short reads. Quality filtering is a fundamental step in the process of constructing genomes, as it reduces the frequency of incorrect alignments that are caused by measuring errors, which can occur during the construction process due to the size of the reads, provoking misassemblies. Application of quality filters to sequence data, using the software Quality Assessment, along with graphing analyses, provided greater precision in the definition of cutoff parameters, which increased the accuracy of genome construction.
NASA Astrophysics Data System (ADS)
Weisbrod, Chad R.; Kaiser, Nathan K.; Syka, John E. P.; Early, Lee; Mullen, Christopher; Dunyach, Jean-Jacques; English, A. Michelle; Anderson, Lissa C.; Blakney, Greg T.; Shabanowitz, Jeffrey; Hendrickson, Christopher L.; Marshall, Alan G.; Hunt, Donald F.
2017-09-01
High resolution mass spectrometry is a key technology for in-depth protein characterization. High-field Fourier transform ion cyclotron resonance mass spectrometry (FT-ICR MS) enables high-level interrogation of intact proteins in the most detail to date. However, an appropriate complement of fragmentation technologies must be paired with FTMS to provide comprehensive sequence coverage, as well as characterization of sequence variants, and post-translational modifications. Here we describe the integration of front-end electron transfer dissociation (FETD) with a custom-built 21 tesla FT-ICR mass spectrometer, which yields unprecedented sequence coverage for proteins ranging from 2.8 to 29 kDa, without the need for extensive spectral averaging (e.g., 60% sequence coverage for apo-myoglobin with four averaged acquisitions). The system is equipped with a multipole storage device separate from the ETD reaction device, which allows accumulation of multiple ETD fragment ion fills. Consequently, an optimally large product ion population is accumulated prior to transfer to the ICR cell for mass analysis, which improves mass spectral signal-to-noise ratio, dynamic range, and scan rate. We find a linear relationship between protein molecular weight and minimum number of ETD reaction fills to achieve optimum sequence coverage, thereby enabling more efficient use of instrument data acquisition time. Finally, real-time scaling of the number of ETD reactions fills during method-based acquisition is shown, and the implications for LC-MS/MS top-down analysis are discussed. [Figure not available: see fulltext.
Rare Cell Detection by Single-Cell RNA Sequencing as Guided by Single-Molecule RNA FISH.
Torre, Eduardo; Dueck, Hannah; Shaffer, Sydney; Gospocic, Janko; Gupte, Rohit; Bonasio, Roberto; Kim, Junhyong; Murray, John; Raj, Arjun
2018-02-28
Although single-cell RNA sequencing can reliably detect large-scale transcriptional programs, it is unclear whether it accurately captures the behavior of individual genes, especially those that express only in rare cells. Here, we use single-molecule RNA fluorescence in situ hybridization as a gold standard to assess trade-offs in single-cell RNA-sequencing data for detecting rare cell expression variability. We quantified the gene expression distribution for 26 genes that range from ubiquitous to rarely expressed and found that the correspondence between estimates across platforms improved with both transcriptome coverage and increased number of cells analyzed. Further, by characterizing the trade-off between transcriptome coverage and number of cells analyzed, we show that when the number of genes required to answer a given biological question is small, then greater transcriptome coverage is more important than analyzing large numbers of cells. More generally, our report provides guidelines for selecting quality thresholds for single-cell RNA-sequencing experiments aimed at rare cell analyses. Copyright © 2018 Elsevier Inc. All rights reserved.
Hornshøj, Linda; Benn, Christine Stabell; Fernandes, Manuel; Rodrigues, Amabelia; Aaby, Peter; Fisker, Ane Bærent
2012-01-01
Objective The WHO aims for 90% coverage of the Expanded Program on Immunization (EPI), which in Guinea-Bissau included BCG vaccine at birth, three doses of diphtheria−tetanus−pertussis vaccine (DTP) and oral polio vaccine (OPV) at 6, 10 and 14 weeks and measles vaccine (MV) at 9 months when this study was conducted. The WHO assesses coverage by 12 months of age. The sequence of vaccines may have an effect on child mortality, but is not considered in official statistics or assessments of programme performance. We assessed vaccination coverage and frequency of out-of-sequence vaccinations by 12 and 24 months of age. Design Observational cohort study. Setting and participants The Bandim Health Project's (BHP) rural Health and Demographic Surveillance site covers 258 randomly selected villages in all regions of Guinea-Bissau. Villages are visited biannually and vaccination cards inspected to ascertain vaccination status. Between 2003 and 2009 vaccination status by 12 months of age was assessed for 5806 children aged 12–23 months; vaccination status by 24 months of age was assessed for 3792 children aged 24–35 months. Outcome measures Coverage of EPI vaccinations and frequency of out-of-sequence vaccinations. Results Half of 12-month-old children and 65% of 24-month-old children had completed all EPI vaccinations. Many children received vaccines out of sequence: by 12 months of age 54% of BCG-vaccinated children had received DTP with or before BCG and 28% of measles-vaccinated children had received DTP with or after MV. By 24 months of age the proportion of out-of-sequence vaccinations was 58% and 35%, respectively, for BCG and MV. Conclusions In rural Guinea-Bissau vaccination coverage by 12 months of age was low, but continued to increase beyond 12 months of age. More than half of all children received vaccinations out of sequence. This highlights the need to improve vaccination services. PMID:23166127
Lessons for livestock genomics from genome and transcriptome sequencing in cattle and other mammals.
Taylor, Jeremy F; Whitacre, Lynsey K; Hoff, Jesse L; Tizioto, Polyana C; Kim, JaeWoo; Decker, Jared E; Schnabel, Robert D
2016-08-17
Decreasing sequencing costs and development of new protocols for characterizing global methylation, gene expression patterns and regulatory regions have stimulated the generation of large livestock datasets. Here, we discuss experiences in the analysis of whole-genome and transcriptome sequence data. We analyzed whole-genome sequence (WGS) data from 132 individuals from five canid species (Canis familiaris, C. latrans, C. dingo, C. aureus and C. lupus) and 61 breeds, three bison (Bison bison), 64 water buffalo (Bubalus bubalis) and 297 bovines from 17 breeds. By individual, data vary in extent of reference genome depth of coverage from 4.9X to 64.0X. We have also analyzed RNA-seq data for 580 samples representing 159 Bos taurus and Rattus norvegicus animals and 98 tissues. By aligning reads to a reference assembly and calling variants, we assessed effects of average depth of coverage on the actual coverage and on the number of called variants. We examined the identity of unmapped reads by assembling them and querying produced contigs against the non-redundant nucleic acids database. By imputing high-density single nucleotide polymorphism data on 4010 US registered Angus animals to WGS using Run4 of the 1000 Bull Genomes Project and assessing the accuracy of imputation, we identified misassembled reference sequence regions. We estimate that a 24X depth of coverage is required to achieve 99.5 % coverage of the reference assembly and identify 95 % of the variants within an individual's genome. Genomes sequenced to low average coverage (e.g., <10X) may fail to cover 10 % of the reference genome and identify <75 % of variants. About 10 % of genomic DNA or transcriptome sequence reads fail to align to the reference assembly. These reads include loci missing from the reference assembly and misassembled genes and interesting symbionts, commensal and pathogenic organisms. Assembly errors and a lack of annotation of functional elements significantly limit the utility of the current draft livestock reference assemblies. The Functional Annotation of Animal Genomes initiative seeks to annotate functional elements, while a 70X Pac-Bio assembly for cow is underway and may result in a significantly improved reference assembly.
FY11 Report on Metagenome Analysis using Pathogen Marker Libraries
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gardner, Shea N.; Allen, Jonathan E.; McLoughlin, Kevin S.
2011-06-02
A method, sequence library, and software suite was invented to rapidly assess whether any member of a pre-specified list of threat organisms or their near neighbors is present in a metagenome. The system was designed to handle mega- to giga-bases of FASTA-formatted raw sequence reads from short or long read next generation sequencing platforms. The approach is to pre-calculate a viral and a bacterial "Pathogen Marker Library" (PML) containing sub-sequences specific to pathogens or their near neighbors. A list of expected matches comparing every bacterial or viral genome against the PML sequences is also pre-calculated. To analyze a metagenome, readsmore » are compared to the PML, and observed PML-metagenome matches are compared to the expected PML-genome matches, and the ratio of observed relative to expected matches is reported. In other words, a 3-way comparison among the PML, metagenome, and existing genome sequences is used to quickly assess which (if any) species included in the PML is likely to be present in the metagenome, based on available sequence data. Our tests showed that the species with the most PML matches correctly indicated the organism sequenced for empirical metagenomes consisting of a cultured, relatively pure isolate. These runs completed in 1 minute to 3 hours on 12 CPU (1 thread/CPU), depending on the metagenome and PML. Using more threads on the same number of CPU resulted in speed improvements roughly proportional to the number of threads. Simulations indicated that detection sensitivity depends on both sequencing coverage levels for a species and the size of the PML: species were correctly detected even at ~0.003x coverage by the large PMLs, and at ~0.03x coverage by the smaller PMLs. Matches to true positive species were 3-4 orders of magnitude higher than to false positives. Simulations with short reads (36 nt and ~260 nt) showed that species were usually detected for metagenome coverage above 0.005x and coverage in the PML above 0.05x, and detection probability appears to be a function of both coverages. Multiple species could be detected simultaneously in a simulated low-coverage, complex metagenome, and the largest PML gave no false negative species and no false positive genera. The presence of multiple species was predicted in a complex metagenome from a human gut microbiome with 1.9 GB of short reads (75 nt); the species predicted were reasonable gut flora and no biothreat agents were detected, showing the feasibility of PML analysis of empirical complex metagenomes.« less
Nair, Shalima S; Luu, Phuc-Loi; Qu, Wenjia; Maddugoda, Madhavi; Huschtscha, Lily; Reddel, Roger; Chenevix-Trench, Georgia; Toso, Martina; Kench, James G; Horvath, Lisa G; Hayes, Vanessa M; Stricker, Phillip D; Hughes, Timothy P; White, Deborah L; Rasko, John E J; Wong, Justin J-L; Clark, Susan J
2018-05-28
Comprehensive genome-wide DNA methylation profiling is critical to gain insights into epigenetic reprogramming during development and disease processes. Among the different genome-wide DNA methylation technologies, whole genome bisulphite sequencing (WGBS) is considered the gold standard for assaying genome-wide DNA methylation at single base resolution. However, the high sequencing cost to achieve the optimal depth of coverage limits its application in both basic and clinical research. To achieve 15× coverage of the human methylome, using WGBS, requires approximately three lanes of 100-bp-paired-end Illumina HiSeq 2500 sequencing. It is important, therefore, for advances in sequencing technologies to be developed to enable cost-effective high-coverage sequencing. In this study, we provide an optimised WGBS methodology, from library preparation to sequencing and data processing, to enable 16-20× genome-wide coverage per single lane of HiSeq X Ten, HCS 3.3.76. To process and analyse the data, we developed a WGBS pipeline (METH10X) that is fast and can call SNPs. We performed WGBS on both high-quality intact DNA and degraded DNA from formalin-fixed paraffin-embedded tissue. First, we compared different library preparation methods on the HiSeq 2500 platform to identify the best method for sequencing on the HiSeq X Ten. Second, we optimised the PhiX and genome spike-ins to achieve higher quality and coverage of WGBS data on the HiSeq X Ten. Third, we performed integrated whole genome sequencing (WGS) and WGBS of the same DNA sample in a single lane of HiSeq X Ten to improve data output. Finally, we compared methylation data from the HiSeq 2500 and HiSeq X Ten and found high concordance (Pearson r > 0.9×). Together we provide a systematic, efficient and complete approach to perform and analyse WGBS on the HiSeq X Ten. Our protocol allows for large-scale WGBS studies at reasonable processing time and cost on the HiSeq X Ten platform.
Identification of copy number variants in whole-genome data using Reference Coverage Profiles
Glusman, Gustavo; Severson, Alissa; Dhankani, Varsha; Robinson, Max; Farrah, Terry; Mauldin, Denise E.; Stittrich, Anna B.; Ament, Seth A.; Roach, Jared C.; Brunkow, Mary E.; Bodian, Dale L.; Vockley, Joseph G.; Shmulevich, Ilya; Niederhuber, John E.; Hood, Leroy
2015-01-01
The identification of DNA copy numbers from short-read sequencing data remains a challenge for both technical and algorithmic reasons. The raw data for these analyses are measured in tens to hundreds of gigabytes per genome; transmitting, storing, and analyzing such large files is cumbersome, particularly for methods that analyze several samples simultaneously. We developed a very efficient representation of depth of coverage (150–1000× compression) that enables such analyses. Current methods for analyzing variants in whole-genome sequencing (WGS) data frequently miss copy number variants (CNVs), particularly hemizygous deletions in the 1–100 kb range. To fill this gap, we developed a method to identify CNVs in individual genomes, based on comparison to joint profiles pre-computed from a large set of genomes. We analyzed depth of coverage in over 6000 high quality (>40×) genomes. The depth of coverage has strong sequence-specific fluctuations only partially explained by global parameters like %GC. To account for these fluctuations, we constructed multi-genome profiles representing the observed or inferred diploid depth of coverage at each position along the genome. These Reference Coverage Profiles (RCPs) take into account the diverse technologies and pipeline versions used. Normalization of the scaled coverage to the RCP followed by hidden Markov model (HMM) segmentation enables efficient detection of CNVs and large deletions in individual genomes. Use of pre-computed multi-genome coverage profiles improves our ability to analyze each individual genome. We make available RCPs and tools for performing these analyses on personal genomes. We expect the increased sensitivity and specificity for individual genome analysis to be critical for achieving clinical-grade genome interpretation. PMID:25741365
Multiplex Degenerate Primer Design for Targeted Whole Genome Amplification of Many Viral Genomes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gardner, Shea N.; Jaing, Crystal J.; Elsheikh, Maher M.
Background . Targeted enrichment improves coverage of highly mutable viruses at low concentration in complex samples. Degenerate primers that anneal to conserved regions can facilitate amplification of divergent, low concentration variants, even when the strain present is unknown. Results . A tool for designing multiplex sets of degenerate sequencing primers to tile overlapping amplicons across multiple whole genomes is described. The new script, run_tiled_primers, is part of the PriMux software. Primers were designed for each segment of South American hemorrhagic fever viruses, tick-borne encephalitis, Henipaviruses, Arenaviruses, Filoviruses, Crimean-Congo hemorrhagic fever virus, Rift Valley fever virus, and Japanese encephalitis virus. Eachmore » group is highly diverse with as little as 5% genome consensus. Primer sets were computationally checked for nontarget cross reactions against the NCBI nucleotide sequence database. Primers for murine hepatitis virus were demonstrated in the lab to specifically amplify selected genes from a laboratory cultured strain that had undergone extensive passage in vitro and in vivo. Conclusions . This software should help researchers design multiplex sets of primers for targeted whole genome enrichment prior to sequencing to obtain better coverage of low titer, divergent viruses. Applications include viral discovery from a complex background and improved sensitivity and coverage of rapidly evolving strains or variants in a gene family.« less
Multiplex Degenerate Primer Design for Targeted Whole Genome Amplification of Many Viral Genomes
Gardner, Shea N.; Jaing, Crystal J.; Elsheikh, Maher M.; ...
2014-01-01
Background . Targeted enrichment improves coverage of highly mutable viruses at low concentration in complex samples. Degenerate primers that anneal to conserved regions can facilitate amplification of divergent, low concentration variants, even when the strain present is unknown. Results . A tool for designing multiplex sets of degenerate sequencing primers to tile overlapping amplicons across multiple whole genomes is described. The new script, run_tiled_primers, is part of the PriMux software. Primers were designed for each segment of South American hemorrhagic fever viruses, tick-borne encephalitis, Henipaviruses, Arenaviruses, Filoviruses, Crimean-Congo hemorrhagic fever virus, Rift Valley fever virus, and Japanese encephalitis virus. Eachmore » group is highly diverse with as little as 5% genome consensus. Primer sets were computationally checked for nontarget cross reactions against the NCBI nucleotide sequence database. Primers for murine hepatitis virus were demonstrated in the lab to specifically amplify selected genes from a laboratory cultured strain that had undergone extensive passage in vitro and in vivo. Conclusions . This software should help researchers design multiplex sets of primers for targeted whole genome enrichment prior to sequencing to obtain better coverage of low titer, divergent viruses. Applications include viral discovery from a complex background and improved sensitivity and coverage of rapidly evolving strains or variants in a gene family.« less
Alternative Enzymes Lead to Improvements in Sequence Coverage and PTM Analysis
Hooper, Kyle; Rosenblatt, Michael; Urh, Marjeta; Saveliev, Sergei; Hosfield, Chris; Kobs, Gary; Ford, Michael; Jones, Richard; Amunugama, Ravi; Allen, David; Brazas, Robert
2013-01-01
The profiling of proteins using biological mass spectrometry (bottom up proteomics) most commonly requires trypsin. Trypsin is advantageous in that it produces peptides of optimal charge and size. However, for applications in which the proteins under investigation are part of a complex mixture or not isolated at high levels (i.e. low ng from an immunoprecipitation), sequence coverage is rarely complete. In addition, we have found that in several cases, like phosphorylation, acetylation, and methylation, alternative proteases are required to prepare peptides suitable for MS detection. This poster will provide specific examples which demonstrate this observation. For example, the application of a combined Trypsin/ Lys-C mixture reduces the number of missed cleavages by more than 3-fold producing samples with lower CV's (for biological replicates). The mixture is also well-suited for the complete proteolysis of hydrophobic, compact proteins. The addition of chymotrypsin and elastase has been found to be useful for identifying phosphorylation sites on proteins, especially on sequences where the site of phosphorylation inhibits trypsin (i.e. proximal to K or R). Many epigenetic applications have focused on histone modifications, like lysine acetylation and arginine methylation. Alternative proteases like Asp-N, Glu-C, and chymotrypsin have been especially useful given the fact that the modified K and R residues are resistant to c-terminal cleavage by trypsin. Finally, in the case of serum profiling, the addition of the endoglycosidase, PNGase F has been found to improve sequence coverage due to the removal of N-linked glycans.
Barker, F. Keith; Oyler-McCance, Sara; Tomback, Diana F.
2015-01-01
Next generation sequencing methods allow rapid, economical accumulation of data that have many applications, even at relatively low levels of genome coverage. However, the utility of shotgun sequencing data sets for specific goals may vary depending on the biological nature of the samples sequenced. We show that the ability to assemble mitogenomes from three avian samples of two different tissue types varies widely. In particular, data with coverage typical of microsatellite development efforts (∼1×) from DNA extracted from avian blood failed to cover even 50% of the mitogenome, relative to at least 500-fold coverage from muscle-derived data. Researchers should consider possible applications of their data and select the tissue source for their work accordingly. Practitioners analyzing low-coverage shotgun sequencing data (including for microsatellite locus development) should consider the potential benefits of mitogenome assembly, including internal barcode verification of species identity, mitochondrial primer development, and phylogenetics.
Delaneau, Olivier; Marchini, Jonathan
2014-06-13
A major use of the 1000 Genomes Project (1000 GP) data is genotype imputation in genome-wide association studies (GWAS). Here we develop a method to estimate haplotypes from low-coverage sequencing data that can take advantage of single-nucleotide polymorphism (SNP) microarray genotypes on the same samples. First the SNP array data are phased to build a backbone (or 'scaffold') of haplotypes across each chromosome. We then phase the sequence data 'onto' this haplotype scaffold. This approach can take advantage of relatedness between sequenced and non-sequenced samples to improve accuracy. We use this method to create a new 1000 GP haplotype reference set for use by the human genetic community. Using a set of validation genotypes at SNP and bi-allelic indels we show that these haplotypes have lower genotype discordance and improved imputation performance into downstream GWAS samples, especially at low-frequency variants.
Improving draft genome contiguity with reference-derived in silico mate-pair libraries.
Grau, José Horacio; Hackl, Thomas; Koepfli, Klaus-Peter; Hofreiter, Michael
2018-05-01
Contiguous genome assemblies are a highly valued biological resource because of the higher number of completely annotated genes and genomic elements that are usable compared to fragmented draft genomes. Nonetheless, contiguity is difficult to obtain if only low coverage data and/or only distantly related reference genome assemblies are available. In order to improve genome contiguity, we have developed Cross-Species Scaffolding-a new pipeline that imports long-range distance information directly into the de novo assembly process by constructing mate-pair libraries in silico. We show how genome assembly metrics and gene prediction dramatically improve with our pipeline by assembling two primate genomes solely based on ∼30x coverage of shotgun sequencing data.
Improved hybrid de novo genome assembly of domesticated apple (Malus x domestica).
Li, Xuewei; Kui, Ling; Zhang, Jing; Xie, Yinpeng; Wang, Liping; Yan, Yan; Wang, Na; Xu, Jidi; Li, Cuiying; Wang, Wen; van Nocker, Steve; Dong, Yang; Ma, Fengwang; Guan, Qingmei
2016-08-08
Domesticated apple (Malus × domestica Borkh) is a popular temperate fruit with high nutrient levels and diverse flavors. In 2012, global apple production accounted for at least one tenth of all harvested fruits. A high-quality apple genome assembly is crucial for the selection and breeding of new cultivars. Currently, a single reference genome is available for apple, assembled from 16.9 × genome coverage short reads via Sanger and 454 sequencing technologies. Although a useful resource, this assembly covers only ~89 % of the non-repetitive portion of the genome, and has a relatively short (16.7 kb) contig N50 length. These downsides make it difficult to apply this reference in transcriptive or whole-genome re-sequencing analyses. Here we present an improved hybrid de novo genomic assembly of apple (Golden Delicious), which was obtained from 76 Gb (~102 × genome coverage) Illumina HiSeq data and 21.7 Gb (~29 × genome coverage) PacBio data. The final draft genome is approximately 632.4 Mb, representing ~ 90 % of the estimated genome. The contig N50 size is 111,619 bp, representing a 7 fold improvement. Further annotation analyses predicted 53,922 protein-coding genes and 2,765 non-coding RNA genes. The new apple genome assembly will serve as a valuable resource for investigating complex apple traits at the genomic level. It is not only suitable for genome editing and gene cloning, but also for RNA-seq and whole-genome re-sequencing studies.
2014-01-01
Affinity capture of DNA methylation combined with high-throughput sequencing strikes a good balance between the high cost of whole genome bisulfite sequencing and the low coverage of methylation arrays. We present BayMeth, an empirical Bayes approach that uses a fully methylated control sample to transform observed read counts into regional methylation levels. In our model, inefficient capture can readily be distinguished from low methylation levels. BayMeth improves on existing methods, allows explicit modeling of copy number variation, and offers computationally efficient analytical mean and variance estimators. BayMeth is available in the Repitools Bioconductor package. PMID:24517713
Harris, R. Alan; Wang, Ting; Coarfa, Cristian; Nagarajan, Raman P.; Hong, Chibo; Downey, Sara L.; Johnson, Brett E.; Fouse, Shaun D.; Delaney, Allen; Zhao, Yongjun; Olshen, Adam; Ballinger, Tracy; Zhou, Xin; Forsberg, Kevin J.; Gu, Junchen; Echipare, Lorigail; O’Geen, Henriette; Lister, Ryan; Pelizzola, Mattia; Xi, Yuanxin; Epstein, Charles B.; Bernstein, Bradley E.; Hawkins, R. David; Ren, Bing; Chung, Wen-Yu; Gu, Hongcang; Bock, Christoph; Gnirke, Andreas; Zhang, Michael Q.; Haussler, David; Ecker, Joseph; Li, Wei; Farnham, Peggy J.; Waterland, Robert A.; Meissner, Alexander; Marra, Marco A.; Hirst, Martin; Milosavljevic, Aleksandar; Costello, Joseph F.
2010-01-01
Sequencing-based DNA methylation profiling methods are comprehensive and, as accuracy and affordability improve, will increasingly supplant microarrays for genome-scale analyses. Here, four sequencing-based methodologies were applied to biological replicates of human embryonic stem cells to compare their CpG coverage genome-wide and in transposons, resolution, cost, concordance and its relationship with CpG density and genomic context. The two bisulfite methods reached concordance of 82% for CpG methylation levels and 99% for non-CpG cytosine methylation levels. Using binary methylation calls, two enrichment methods were 99% concordant, while regions assessed by all four methods were 97% concordant. To achieve comprehensive methylome coverage while reducing cost, an approach integrating two complementary methods was examined. The integrative methylome profile along with histone methylation, RNA, and SNP profiles derived from the sequence reads allowed genome-wide assessment of allele-specific epigenetic states, identifying most known imprinted regions and new loci with monoallelic epigenetic marks and monoallelic expression. PMID:20852635
Exome sequencing for prenatal diagnosis of fetuses with sonographic abnormalities.
Drury, Suzanne; Williams, Hywel; Trump, Natalie; Boustred, Christopher; Lench, Nicholas; Scott, Richard H; Chitty, Lyn S
2015-10-01
In the absence of aneuploidy or other pathogenic cytogenetic abnormality, fetuses with increased nuchal translucency (NT ≥ 3.5 mm) and/or other sonographic abnormalities have a greater incidence of genetic syndromes, but defining the underlying pathology can be challenging. Here, we investigate the value of whole exome sequencing in fetuses with sonographic abnormalities but normal microarray analysis. Whole exome sequencing was performed on DNA extracted from chorionic villi or amniocytes in 24 fetuses with unexplained ultrasound findings. In the first 14 cases sequencing was initially performed on fetal DNA only. For the remaining 10, the trio of fetus, mother and father was sequenced simultaneously. In 21% (5/24) cases, exome sequencing provided definitive diagnoses (Milroy disease, hypophosphatasia, achondrogenesis type 2, Freeman-Sheldon syndrome and Baraitser-Winter Syndrome). In a further case, a plausible diagnosis of orofaciodigital syndrome type 6 was made. In two others, a single mutation in an autosomal recessive gene was identified, but incomplete sequencing coverage precluded exclusion of the presence of a second mutation. Whole exome sequencing improves prenatal diagnosis in euploid fetuses with abnormal ultrasound scans. In order to expedite interpretation of results, trio sequencing should be employed, but interpretation can still be compromised by incomplete coverage of relevant genes. © 2015 John Wiley & Sons, Ltd.
van den Bos, Indra C; Hussain, Shahid M; Krestin, Gabriel P; Wielopolski, Piotr A
2008-07-01
Institutional Review Board approval and signed informed consent were obtained by all participants for an ongoing sequence optimization project at 3.0 T. The purpose of this study was to evaluate breath-hold diffusion-induced black-blood echo-planar imaging (BBEPI) as a potential alternative for specific absorption rate (SAR)-intensive spin-echo sequences, in particular, the fast spin-echo (FSE) sequences, at 3.0 T. Fourteen healthy volunteers (seven men, seven women; mean age +/- standard deviation, 32.7 years +/- 6.8) were imaged for this purpose. Liver coverage (20 cm, z-axis) was always performed in one 25-second breath hold. Imaging parameters were varied interactively with regard to echo time, diffusion b value, and voxel size. Images were evaluated and compared with fat-suppressed T2-weighted FSE images for image quality, liver delineation, geometric distortions, fat suppression, suppression of the blood signal, contrast-to-noise ratio (CNR), and signal-to-noise ratio (SNR). An optimized short- (25 msec) and long-echo (80 msec) BBEPI provided full anatomic, single breath-hold liver coverage (100 and 50 sections, respectively), with resulting voxel sizes of 3.3 x 2.7 x 2.0 mm and 3.3 x 2.7 x 4.0 mm, respectively. Repetition time was 6300 msec, matrix size was 160 x 192, and an acceleration factor of 2.00 was used. b Values of more than 20 sec/mm(2) showed better suppression of the blood signal but b values of 10 sec/mm(2) provided improved volume coverage and signal consistency. Compared with fat-suppressed T2-weighted FSE, the optimized BBEPI sequence provided (a) comparable image quality and liver delineation, (b) acceptable geometric distortions, (c) improved suppression of fat and blood signals, and (d) high CNR and SNR. BBEPI is feasible for fast, low-SAR, thin-section morphologic imaging of the entire liver in a single breath hold at 3.0 T. (c) RSNA, 2008.
Rutvisuttinunt, Wiriya; Chinnawirotpisan, Piyawan; Simasathien, Sriluck; Shrestha, Sanjaya K; Yoon, In-Kyu; Klungthong, Chonticha; Fernandez, Stefan
2013-11-01
Active global surveillance and characterization of influenza viruses are essential for better preparation against possible pandemic events. Obtaining comprehensive information about the influenza genome can improve our understanding of the evolution of influenza viruses and emergence of new strains, and improve the accuracy when designing preventive vaccines. This study investigated the use of deep sequencing by the next-generation sequencing (NGS) Illumina MiSeq Platform to obtain complete genome sequence information from influenza virus isolates. The influenza virus isolates were cultured from 6 respiratory acute clinical specimens collected in Thailand and Nepal. DNA libraries obtained from each viral isolate were mixed and all were sequenced simultaneously. Total information of 2.6 Gbases was obtained from a 455±14 K/mm2 density with 95.76% (8,571,655/8,950,724 clusters) of the clusters passing quality control (QC) filters. Approximately 93.7% of all sequences from Read1 and 83.5% from Read2 contained high quality sequences that were ≥Q30, a base calling QC score standard. Alignments analysis identified three seasonal influenza A H3N2 strains, one 2009 pandemic influenza A H1N1 strain and two influenza B strains. The nearly entire genomes of all six virus isolates yielded equal or greater than 600-fold sequence coverage depth. MiSeq Platform identified seasonal influenza A H3N2, 2009 pandemic influenza A H1N1and influenza B in the DNA library mixtures efficiently. Copyright © 2013 The Authors. Published by Elsevier B.V. All rights reserved.
Genome sequence of Phytophthora ramorum: implications for management
Brett Tyler; Sucheta Tripathy; Nik Grunwald; Kurt Lamour; Kelly Ivors; Matteo Garbelotto; Daniel Rokhsar; Nik Putnam; Igor Grigoriev; Jeffrey Boore
2006-01-01
A draft genome sequence has been determined for Phytophthora ramorum, together with a draft sequence of the soybean pathogen Phytophthora sojae. The P. ramorum genome was sequenced to a depth of 7-fold coverage, while the P. sojae genome was sequenced to a depth of 9-fold coverage. The genome...
The genome of flax (Linum usitatissimum) assembled de novo from short shotgun sequence reads.
Wang, Zhiwen; Hobson, Neil; Galindo, Leonardo; Zhu, Shilin; Shi, Daihu; McDill, Joshua; Yang, Linfeng; Hawkins, Simon; Neutelings, Godfrey; Datla, Raju; Lambert, Georgina; Galbraith, David W; Grassa, Christopher J; Geraldes, Armando; Cronk, Quentin C; Cullis, Christopher; Dash, Prasanta K; Kumar, Polumetla A; Cloutier, Sylvie; Sharpe, Andrew G; Wong, Gane K-S; Wang, Jun; Deyholos, Michael K
2012-11-01
Flax (Linum usitatissimum) is an ancient crop that is widely cultivated as a source of fiber, oil and medicinally relevant compounds. To accelerate crop improvement, we performed whole-genome shotgun sequencing of the nuclear genome of flax. Seven paired-end libraries ranging in size from 300 bp to 10 kb were sequenced using an Illumina genome analyzer. A de novo assembly, comprised exclusively of deep-coverage (approximately 94× raw, approximately 69× filtered) short-sequence reads (44-100 bp), produced a set of scaffolds with N(50) =694 kb, including contigs with N(50)=20.1 kb. The contig assembly contained 302 Mb of non-redundant sequence representing an estimated 81% genome coverage. Up to 96% of published flax ESTs aligned to the whole-genome shotgun scaffolds. However, comparisons with independently sequenced BACs and fosmids showed some mis-assembly of regions at the genome scale. A total of 43384 protein-coding genes were predicted in the whole-genome shotgun assembly, and up to 93% of published flax ESTs, and 86% of A. thaliana genes aligned to these predicted genes, indicating excellent coverage and accuracy at the gene level. Analysis of the synonymous substitution rates (K(s) ) observed within duplicate gene pairs was consistent with a recent (5-9 MYA) whole-genome duplication in flax. Within the predicted proteome, we observed enrichment of many conserved domains (Pfam-A) that may contribute to the unique properties of this crop, including agglutinin proteins. Together these results show that de novo assembly, based solely on whole-genome shotgun short-sequence reads, is an efficient means of obtaining nearly complete genome sequence information for some plant species. © 2012 The Authors. The Plant Journal © 2012 Blackwell Publishing Ltd.
Optimizing Illumina next-generation sequencing library preparation for extremely AT-biased genomes.
Oyola, Samuel O; Otto, Thomas D; Gu, Yong; Maslen, Gareth; Manske, Magnus; Campino, Susana; Turner, Daniel J; Macinnis, Bronwyn; Kwiatkowski, Dominic P; Swerdlow, Harold P; Quail, Michael A
2012-01-03
Massively parallel sequencing technology is revolutionizing approaches to genomic and genetic research. Since its advent, the scale and efficiency of Next-Generation Sequencing (NGS) has rapidly improved. In spite of this success, sequencing genomes or genomic regions with extremely biased base composition is still a great challenge to the currently available NGS platforms. The genomes of some important pathogenic organisms like Plasmodium falciparum (high AT content) and Mycobacterium tuberculosis (high GC content) display extremes of base composition. The standard library preparation procedures that employ PCR amplification have been shown to cause uneven read coverage particularly across AT and GC rich regions, leading to problems in genome assembly and variation analyses. Alternative library-preparation approaches that omit PCR amplification require large quantities of starting material and hence are not suitable for small amounts of DNA/RNA such as those from clinical isolates. We have developed and optimized library-preparation procedures suitable for low quantity starting material and tolerant to extremely high AT content sequences. We have used our optimized conditions in parallel with standard methods to prepare Illumina sequencing libraries from a non-clinical and a clinical isolate (containing ~53% host contamination). By analyzing and comparing the quality of sequence data generated, we show that our optimized conditions that involve a PCR additive (TMAC), produces amplified libraries with improved coverage of extremely AT-rich regions and reduced bias toward GC neutral templates. We have developed a robust and optimized Next-Generation Sequencing library amplification method suitable for extremely AT-rich genomes. The new amplification conditions significantly reduce bias and retain the complexity of either extremes of base composition. This development will greatly benefit sequencing clinical samples that often require amplification due to low mass of DNA starting material.
MeCorS: Metagenome-enabled error correction of single cell sequencing reads
Bremges, Andreas; Singer, Esther; Woyke, Tanja; ...
2016-03-15
Here we present a new tool, MeCorS, to correct chimeric reads and sequencing errors in Illumina data generated from single amplified genomes (SAGs). It uses sequence information derived from accompanying metagenome sequencing to accurately correct errors in SAG reads, even from ultra-low coverage regions. In evaluations on real data, we show that MeCorS outperforms BayesHammer, the most widely used state-of-the-art approach. MeCorS performs particularly well in correcting chimeric reads, which greatly improves both accuracy and contiguity of de novo SAG assemblies.
USDA-ARS?s Scientific Manuscript database
Next-generation sequencing technology such as genotyping-by-sequencing (GBS) made low-cost, but often low-coverage, whole-genome sequencing widely available. Extensive inbreeding in crop plants provides an untapped, high quality source of phased haplotypes for imputing missing genotypes. We introduc...
Modeling genome coverage in single-cell sequencing
Daley, Timothy; Smith, Andrew D.
2014-01-01
Motivation: Single-cell DNA sequencing is necessary for examining genetic variation at the cellular level, which remains hidden in bulk sequencing experiments. But because they begin with such small amounts of starting material, the amount of information that is obtained from single-cell sequencing experiment is highly sensitive to the choice of protocol employed and variability in library preparation. In particular, the fraction of the genome represented in single-cell sequencing libraries exhibits extreme variability due to quantitative biases in amplification and loss of genetic material. Results: We propose a method to predict the genome coverage of a deep sequencing experiment using information from an initial shallow sequencing experiment mapped to a reference genome. The observed coverage statistics are used in a non-parametric empirical Bayes Poisson model to estimate the gain in coverage from deeper sequencing. This approach allows researchers to know statistical features of deep sequencing experiments without actually sequencing deeply, providing a basis for optimizing and comparing single-cell sequencing protocols or screening libraries. Availability and implementation: The method is available as part of the preseq software package. Source code is available at http://smithlabresearch.org/preseq. Contact: andrewds@usc.edu Supplementary information: Supplementary material is available at Bioinformatics online. PMID:25107873
Kammermeier, Jochen; Drury, Suzanne; James, Chela T; Dziubak, Robert; Ocaka, Louise; Elawad, Mamoun; Beales, Philip; Lench, Nicholas; Uhlig, Holm H; Bacchelli, Chiara; Shah, Neil
2014-11-01
Multiple monogenetic conditions with partially overlapping phenotypes can present with inflammatory bowel disease (IBD)-like intestinal inflammation. With novel genotype-specific therapies emerging, establishing a molecular diagnosis is becoming increasingly important. We have introduced targeted next-generation sequencing (NGS) technology as a prospective screening tool in children with very early onset IBD (VEOIBD). We evaluated the coverage of 40 VEOIBD genes in two separate cohorts undergoing targeted gene panel sequencing (TGPS) (n=25) and whole exome sequencing (WES) (n=20). TGPS revealed causative mutations in four genes (IL10RA, EPCAM, TTC37 and SKIV2L) discovered unexpected phenotypes and directly influenced clinical decision making by supporting as well as avoiding haematopoietic stem cell transplantation. TGPS resulted in significantly higher median coverage when compared with WES, fewer coverage deficiencies and improved variant detection across established VEOIBD genes. Excluding or confirming known VEOIBD genotypes should be considered early in the disease course in all cases of therapy-refractory VEOIBD, as it can have a direct impact on patient management. To combine both described NGS technologies would compensate for the limitations of WES for disease-specific application while offering the opportunity for novel gene discovery in the research setting. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.
Lucas Lledó, José Ignacio; Cáceres, Mario
2013-01-01
One of the most used techniques to study structural variation at a genome level is paired-end mapping (PEM). PEM has the advantage of being able to detect balanced events, such as inversions and translocations. However, inversions are still quite difficult to predict reliably, especially from high-throughput sequencing data. We simulated realistic PEM experiments with different combinations of read and library fragment lengths, including sequencing errors and meaningful base-qualities, to quantify and track down the origin of false positives and negatives along sequencing, mapping, and downstream analysis. We show that PEM is very appropriate to detect a wide range of inversions, even with low coverage data. However, % of inversions located between segmental duplications are expected to go undetected by the most common sequencing strategies. In general, longer DNA libraries improve the detectability of inversions far better than increments of the coverage depth or the read length. Finally, we review the performance of three algorithms to detect inversions —SVDetect, GRIAL, and VariationHunter—, identify common pitfalls, and reveal important differences in their breakpoint precisions. These results stress the importance of the sequencing strategy for the detection of structural variants, especially inversions, and offer guidelines for the design of future genome sequencing projects. PMID:23637806
Effects of the Ion PGM™ Hi-Q™ sequencing chemistry on sequence data quality.
Churchill, Jennifer D; King, Jonathan L; Chakraborty, Ranajit; Budowle, Bruce
2016-09-01
Massively parallel sequencing (MPS) offers substantial improvements over current forensic DNA typing methodologies such as increased resolution, scalability, and throughput. The Ion PGM™ is a promising MPS platform for analysis of forensic biological evidence. The system employs a sequencing-by-synthesis chemistry on a semiconductor chip that measures a pH change due to the release of hydrogen ions as nucleotides are incorporated into the growing DNA strands. However, implementation of MPS into forensic laboratories requires a robust chemistry. Ion Torrent's Hi-Q™ Sequencing Chemistry was evaluated to determine if it could improve on the quality of the generated sequence data in association with selected genetic marker targets. The whole mitochondrial genome and the HID-Ion STR 10-plex panel were sequenced on the Ion PGM™ system with the Ion PGM™ Sequencing 400 Kit and the Ion PGM™ Hi-Q™ Sequencing Kit. Concordance, coverage, strand balance, noise, and deletion ratios were assessed in evaluating the performance of the Ion PGM™ Hi-Q™ Sequencing Kit. The results indicate that reliable, accurate data are generated and that sequencing through homopolymeric regions can be improved with the use of Ion Torrent's Hi-Q™ Sequencing Chemistry. Overall, the quality of the generated sequencing data supports the potential for use of the Ion PGM™ in forensic genetic laboratories.
Improved coverage of cDNA-AFLP by sequential digestion of immobilized cDNA.
Weiberg, Arne; Pöhler, Dirk; Morgenstern, Burkhard; Karlovsky, Petr
2008-10-13
cDNA-AFLP is a transcriptomics technique which does not require prior sequence information and can therefore be used as a gene discovery tool. The method is based on selective amplification of cDNA fragments generated by restriction endonucleases, electrophoretic separation of the products and comparison of the band patterns between treated samples and controls. Unequal distribution of restriction sites used to generate cDNA fragments negatively affects the performance of cDNA-AFLP. Some transcripts are represented by more than one fragment while other escape detection, causing redundancy and reducing the coverage of the analysis, respectively. With the goal of improving the coverage of cDNA-AFLP without increasing its redundancy, we designed a modified cDNA-AFLP protocol. Immobilized cDNA is sequentially digested with several restriction endonucleases and the released DNA fragments are collected in mutually exclusive pools. To investigate the performance of the protocol, software tool MECS (Multiple Enzyme cDNA-AFLP Simulation) was written in Perl. cDNA-AFLP protocols described in the literature and the new sequential digestion protocol were simulated on sets of cDNA sequences from mouse, human and Arabidopsis thaliana. The redundancy and coverage, the total number of PCR reactions, and the average fragment length were calculated for each protocol and cDNA set. Simulation revealed that sequential digestion of immobilized cDNA followed by the partitioning of released fragments into mutually exclusive pools outperformed other cDNA-AFLP protocols in terms of coverage, redundancy, fragment length, and the total number of PCRs. Primers generating 30 to 70 amplicons per PCR provided the highest fraction of electrophoretically distinguishable fragments suitable for normalization. For A. thaliana, human and mice transcriptome, the use of two marking enzymes and three sequentially applied releasing enzymes for each of the marking enzymes is recommended.
Genotype calling from next-generation sequencing data using haplotype information of reads
Zhi, Degui; Wu, Jihua; Liu, Nianjun; Zhang, Kui
2012-01-01
Motivation: Low coverage sequencing provides an economic strategy for whole genome sequencing. When sequencing a set of individuals, genotype calling can be challenging due to low sequencing coverage. Linkage disequilibrium (LD) based refinement of genotyping calling is essential to improve the accuracy. Current LD-based methods use read counts or genotype likelihoods at individual potential polymorphic sites (PPSs). Reads that span multiple PPSs (jumping reads) can provide additional haplotype information overlooked by current methods. Results: In this article, we introduce a new Hidden Markov Model (HMM)-based method that can take into account jumping reads information across adjacent PPSs and implement it in the HapSeq program. Our method extends the HMM in Thunder and explicitly models jumping reads information as emission probabilities conditional on the states of adjacent PPSs. Our simulation results show that, compared to Thunder, HapSeq reduces the genotyping error rate by 30%, from 0.86% to 0.60%. The results from the 1000 Genomes Project show that HapSeq reduces the genotyping error rate by 12 and 9%, from 2.24% and 2.76% to 1.97% and 2.50% for individuals with European and African ancestry, respectively. We expect our program can improve genotyping qualities of the large number of ongoing and planned whole genome sequencing projects. Contact: dzhi@ms.soph.uab.edu; kzhang@ms.soph.uab.edu Availability: The software package HapSeq and its manual can be found and downloaded at www.ssg.uab.edu/hapseq/. Supplementary information: Supplementary data are available at Bioinformatics online. PMID:22285565
GFam: a platform for automatic annotation of gene families.
Sasidharan, Rajkumar; Nepusz, Tamás; Swarbreck, David; Huala, Eva; Paccanaro, Alberto
2012-10-01
We have developed GFam, a platform for automatic annotation of gene/protein families. GFam provides a framework for genome initiatives and model organism resources to build domain-based families, derive meaningful functional labels and offers a seamless approach to propagate functional annotation across periodic genome updates. GFam is a hybrid approach that uses a greedy algorithm to chain component domains from InterPro annotation provided by its 12 member resources followed by a sequence-based connected component analysis of un-annotated sequence regions to derive consensus domain architecture for each sequence and subsequently generate families based on common architectures. Our integrated approach increases sequence coverage by 7.2 percentage points and residue coverage by 14.6 percentage points higher than the coverage relative to the best single-constituent database within InterPro for the proteome of Arabidopsis. The true power of GFam lies in maximizing annotation provided by the different InterPro data sources that offer resource-specific coverage for different regions of a sequence. GFam's capability to capture higher sequence and residue coverage can be useful for genome annotation, comparative genomics and functional studies. GFam is a general-purpose software and can be used for any collection of protein sequences. The software is open source and can be obtained from http://www.paccanarolab.org/software/gfam/.
1,003 reference genomes of bacterial and archaeal isolates expand coverage of the tree of life
Mukherjee, Supratim; Seshadri, Rekha; Varghese, Neha J.; ...
2017-06-12
We present 1,003 reference genomes that were sequenced as part of the Genomic Encyclopedia of Bacteria and Archaea (GEBA) initiative, selected to maximize sequence coverage of phylogenetic space. These genomes double the number of existing type strains and expand their overall phylogenetic diversity by 25%. Comparative analyses with previously available finished and draft genomes reveal a 10.5% increase in novel protein families as a function of phylogenetic diversity. The GEBA genomes recruit 25 million previously unassigned metagenomic proteins from 4,650 samples, improving their phylogenetic and functional interpretation. We identify numerous biosynthetic clusters and experimentally validate a divergent phenazine cluster withmore » potential new chemical structure and antimicrobial activity. This Resource is the largest single release of reference genomes to date. Bacterial and archaeal isolate sequence space is still far from saturated, and future endeavors in this direction will continue to be a valuable resource for scientific discovery.« less
1,003 reference genomes of bacterial and archaeal isolates expand coverage of the tree of life
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mukherjee, Supratim; Seshadri, Rekha; Varghese, Neha J.
We present 1,003 reference genomes that were sequenced as part of the Genomic Encyclopedia of Bacteria and Archaea (GEBA) initiative, selected to maximize sequence coverage of phylogenetic space. These genomes double the number of existing type strains and expand their overall phylogenetic diversity by 25%. Comparative analyses with previously available finished and draft genomes reveal a 10.5% increase in novel protein families as a function of phylogenetic diversity. The GEBA genomes recruit 25 million previously unassigned metagenomic proteins from 4,650 samples, improving their phylogenetic and functional interpretation. We identify numerous biosynthetic clusters and experimentally validate a divergent phenazine cluster withmore » potential new chemical structure and antimicrobial activity. This Resource is the largest single release of reference genomes to date. Bacterial and archaeal isolate sequence space is still far from saturated, and future endeavors in this direction will continue to be a valuable resource for scientific discovery.« less
NASA Astrophysics Data System (ADS)
Matossian, Mark G.
1997-01-01
Much attention in recent years has focused on commercial telecommunications ventures involving constellations of spacecraft in low and medium Earth orbit. These projects often require investments on the order of billions of dollars (US$) for development and operations, but surprisingly little work has been published on constellation design optimization for coverage analysis, traffic simulation and launch sequencing for constellation build-up strategies. This paper addresses the two most critical aspects of constellation orbital design — efficient constellation candidate generation and coverage analysis. Inefficiencies and flaws in the current standard algorithm for constellation modeling are identified, and a corrected and improved algorithm is presented. In the 1970's, John Walker and G. V. Mozhaev developed innovative strategies for continuous global coverage using symmetric non-geosynchronous constellations. (These are sometimes referred to as rosette, or Walker constellations. An example is pictured above.) In 1980, the late Arthur Ballard extended and generalized the work of Walker into a detailed algorithm for the NAVSTAR/GPS program, which deployed a 24 satellite symmetric constellation. Ballard's important contribution was published in his "Rosette Constellations of Earth Satellites."
Yuan, Shuai; Johnston, H. Richard; Zhang, Guosheng; Li, Yun; Hu, Yi-Juan; Qin, Zhaohui S.
2015-01-01
With rapid decline of the sequencing cost, researchers today rush to embrace whole genome sequencing (WGS), or whole exome sequencing (WES) approach as the next powerful tool for relating genetic variants to human diseases and phenotypes. A fundamental step in analyzing WGS and WES data is mapping short sequencing reads back to the reference genome. This is an important issue because incorrectly mapped reads affect the downstream variant discovery, genotype calling and association analysis. Although many read mapping algorithms have been developed, the majority of them uses the universal reference genome and do not take sequence variants into consideration. Given that genetic variants are ubiquitous, it is highly desirable if they can be factored into the read mapping procedure. In this work, we developed a novel strategy that utilizes genotypes obtained a priori to customize the universal haploid reference genome into a personalized diploid reference genome. The new strategy is implemented in a program named RefEditor. When applying RefEditor to real data, we achieved encouraging improvements in read mapping, variant discovery and genotype calling. Compared to standard approaches, RefEditor can significantly increase genotype calling consistency (from 43% to 61% at 4X coverage; from 82% to 92% at 20X coverage) and reduce Mendelian inconsistency across various sequencing depths. Because many WGS and WES studies are conducted on cohorts that have been genotyped using array-based genotyping platforms previously or concurrently, we believe the proposed strategy will be of high value in practice, which can also be applied to the scenario where multiple NGS experiments are conducted on the same cohort. The RefEditor sources are available at https://github.com/superyuan/refeditor. PMID:26267278
Dessimoz, Christophe; Zoller, Stefan; Manousaki, Tereza; Qiu, Huan; Meyer, Axel; Kuraku, Shigehiro
2011-09-01
Recent development of deep sequencing technologies has facilitated de novo genome sequencing projects, now conducted even by individual laboratories. However, this will yield more and more genome sequences that are not well assembled, and will hinder thorough annotation when no closely related reference genome is available. One of the challenging issues is the identification of protein-coding sequences split into multiple unassembled genomic segments, which can confound orthology assignment and various laboratory experiments requiring the identification of individual genes. In this study, using the genome of a cartilaginous fish, Callorhinchus milii, as test case, we performed gene prediction using a model specifically trained for this genome. We implemented an algorithm, designated ESPRIT, to identify possible linkages between multiple protein-coding portions derived from a single genomic locus split into multiple unassembled genomic segments. We developed a validation framework based on an artificially fragmented human genome, improvements between early and recent mouse genome assemblies, comparison with experimentally validated sequences from GenBank, and phylogenetic analyses. Our strategy provided insights into practical solutions for efficient annotation of only partially sequenced (low-coverage) genomes. To our knowledge, our study is the first formulation of a method to link unassembled genomic segments based on proteomes of relatively distantly related species as references.
Zoller, Stefan; Manousaki, Tereza; Qiu, Huan; Meyer, Axel; Kuraku, Shigehiro
2011-01-01
Recent development of deep sequencing technologies has facilitated de novo genome sequencing projects, now conducted even by individual laboratories. However, this will yield more and more genome sequences that are not well assembled, and will hinder thorough annotation when no closely related reference genome is available. One of the challenging issues is the identification of protein-coding sequences split into multiple unassembled genomic segments, which can confound orthology assignment and various laboratory experiments requiring the identification of individual genes. In this study, using the genome of a cartilaginous fish, Callorhinchus milii, as test case, we performed gene prediction using a model specifically trained for this genome. We implemented an algorithm, designated ESPRIT, to identify possible linkages between multiple protein-coding portions derived from a single genomic locus split into multiple unassembled genomic segments. We developed a validation framework based on an artificially fragmented human genome, improvements between early and recent mouse genome assemblies, comparison with experimentally validated sequences from GenBank, and phylogenetic analyses. Our strategy provided insights into practical solutions for efficient annotation of only partially sequenced (low-coverage) genomes. To our knowledge, our study is the first formulation of a method to link unassembled genomic segments based on proteomes of relatively distantly related species as references. PMID:21712341
Virome Assembly and Annotation: A Surprise in the Namib Desert
Hesse, Uljana; van Heusden, Peter; Kirby, Bronwyn M.; Olonade, Israel; van Zyl, Leonardo J.; Trindade, Marla
2017-01-01
Sequencing, assembly, and annotation of environmental virome samples is challenging. Methodological biases and differences in species abundance result in fragmentary read coverage; sequence reconstruction is further complicated by the mosaic nature of viral genomes. In this paper, we focus on biocomputational aspects of virome analysis, emphasizing latent pitfalls in sequence annotation. Using simulated viromes that mimic environmental data challenges we assessed the performance of five assemblers (CLC-Workbench, IDBA-UD, SPAdes, RayMeta, ABySS). Individual analyses of relevant scaffold length fractions revealed shortcomings of some programs in reconstruction of viral genomes with excessive read coverage (IDBA-UD, RayMeta), and in accurate assembly of scaffolds ≥50 kb (SPAdes, RayMeta, ABySS). The CLC-Workbench assembler performed best in terms of genome recovery (including highly covered genomes) and correct reconstruction of large scaffolds; and was used to assemble a virome from a copper rich site in the Namib Desert. We found that scaffold network analysis and cluster-specific read reassembly improved reconstruction of sequences with excessive read coverage, and that strict data filtering for non-viral sequences prior to downstream analyses was essential. In this study we describe novel viral genomes identified in the Namib Desert copper site virome. Taxonomic affiliations of diverse proteins in the dataset and phylogenetic analyses of circovirus-like proteins indicated links to the marine habitat. Considering additional evidence from this dataset we hypothesize that viruses may have been carried from the Atlantic Ocean into the Namib Desert by fog and wind, highlighting the impact of the extended environment on an investigated niche in metagenome studies. PMID:28167933
Prakash, Celine; Haeseler, Arndt Von
2017-03-01
RNA sequencing (RNA-seq) has emerged as the method of choice for measuring the expression of RNAs in a given cell population. In most RNA-seq technologies, sequencing the full length of RNA molecules requires fragmentation into smaller pieces. Unfortunately, the issue of nonuniform sequencing coverage across a genomic feature has been a concern in RNA-seq and is attributed to biases for certain fragments in RNA-seq library preparation and sequencing. To investigate the expected coverage obtained from fragmentation, we develop a simple fragmentation model that is independent of bias from the experimental method and is not specific to the transcript sequence. Essentially, we enumerate all configurations for maximal placement of a given fragment length, F, on transcript length, T, to represent every possible fragmentation pattern, from which we compute the expected coverage profile across a transcript. We extend this model to incorporate general empirical attributes such as read length, fragment length distribution, and number of molecules of the transcript. We further introduce the fragment starting-point, fragment coverage, and read coverage profiles. We find that the expected profiles are not uniform and that factors such as fragment length to transcript length ratio, read length to fragment length ratio, fragment length distribution, and number of molecules influence the variability of coverage across a transcript. Finally, we explore a potential application of the model where, with simulations, we show that it is possible to correctly estimate the transcript copy number for any transcript in the RNA-seq experiment.
Haeseler, Arndt Von
2017-01-01
Abstract RNA sequencing (RNA-seq) has emerged as the method of choice for measuring the expression of RNAs in a given cell population. In most RNA-seq technologies, sequencing the full length of RNA molecules requires fragmentation into smaller pieces. Unfortunately, the issue of nonuniform sequencing coverage across a genomic feature has been a concern in RNA-seq and is attributed to biases for certain fragments in RNA-seq library preparation and sequencing. To investigate the expected coverage obtained from fragmentation, we develop a simple fragmentation model that is independent of bias from the experimental method and is not specific to the transcript sequence. Essentially, we enumerate all configurations for maximal placement of a given fragment length, F, on transcript length, T, to represent every possible fragmentation pattern, from which we compute the expected coverage profile across a transcript. We extend this model to incorporate general empirical attributes such as read length, fragment length distribution, and number of molecules of the transcript. We further introduce the fragment starting-point, fragment coverage, and read coverage profiles. We find that the expected profiles are not uniform and that factors such as fragment length to transcript length ratio, read length to fragment length ratio, fragment length distribution, and number of molecules influence the variability of coverage across a transcript. Finally, we explore a potential application of the model where, with simulations, we show that it is possible to correctly estimate the transcript copy number for any transcript in the RNA-seq experiment. PMID:27661099
Zhao, Shanrong; Zhang, Ying; Gamini, Ramya; Zhang, Baohong; von Schack, David
2018-03-19
To allow efficient transcript/gene detection, highly abundant ribosomal RNAs (rRNA) are generally removed from total RNA either by positive polyA+ selection or by rRNA depletion (negative selection) before sequencing. Comparisons between the two methods have been carried out by various groups, but the assessments have relied largely on non-clinical samples. In this study, we evaluated these two RNA sequencing approaches using human blood and colon tissue samples. Our analyses showed that rRNA depletion captured more unique transcriptome features, whereas polyA+ selection outperformed rRNA depletion with higher exonic coverage and better accuracy of gene quantification. For blood- and colon-derived RNAs, we found that 220% and 50% more reads, respectively, would have to be sequenced to achieve the same level of exonic coverage in the rRNA depletion method compared with the polyA+ selection method. Therefore, in most cases we strongly recommend polyA+ selection over rRNA depletion for gene quantification in clinical RNA sequencing. Our evaluation revealed that a small number of lncRNAs and small RNAs made up a large fraction of the reads in the rRNA depletion RNA sequencing data. Thus, we recommend that these RNAs are specifically depleted to improve the sequencing depth of the remaining RNAs.
Lionel, Anath C; Costain, Gregory; Monfared, Nasim; Walker, Susan; Reuter, Miriam S; Hosseini, S Mohsen; Thiruvahindrapuram, Bhooma; Merico, Daniele; Jobling, Rebekah; Nalpathamkalam, Thomas; Pellecchia, Giovanna; Sung, Wilson W L; Wang, Zhuozhi; Bikangaga, Peter; Boelman, Cyrus; Carter, Melissa T; Cordeiro, Dawn; Cytrynbaum, Cheryl; Dell, Sharon D; Dhir, Priya; Dowling, James J; Heon, Elise; Hewson, Stacy; Hiraki, Linda; Inbar-Feigenberg, Michal; Klatt, Regan; Kronick, Jonathan; Laxer, Ronald M; Licht, Christoph; MacDonald, Heather; Mercimek-Andrews, Saadet; Mendoza-Londono, Roberto; Piscione, Tino; Schneider, Rayfel; Schulze, Andreas; Silverman, Earl; Siriwardena, Komudi; Snead, O Carter; Sondheimer, Neal; Sutherland, Joanne; Vincent, Ajoy; Wasserman, Jonathan D; Weksberg, Rosanna; Shuman, Cheryl; Carew, Chris; Szego, Michael J; Hayeems, Robin Z; Basran, Raveen; Stavropoulos, Dimitri J; Ray, Peter N; Bowdin, Sarah; Meyn, M Stephen; Cohn, Ronald D; Scherer, Stephen W; Marshall, Christian R
2018-01-01
Purpose Genetic testing is an integral diagnostic component of pediatric medicine. Standard of care is often a time-consuming stepwise approach involving chromosomal microarray analysis and targeted gene sequencing panels, which can be costly and inconclusive. Whole-genome sequencing (WGS) provides a comprehensive testing platform that has the potential to streamline genetic assessments, but there are limited comparative data to guide its clinical use. Methods We prospectively recruited 103 patients from pediatric non-genetic subspecialty clinics, each with a clinical phenotype suggestive of an underlying genetic disorder, and compared the diagnostic yield and coverage of WGS with those of conventional genetic testing. Results WGS identified diagnostic variants in 41% of individuals, representing a significant increase over conventional testing results (24% P = 0.01). Genes clinically sequenced in the cohort (n = 1,226) were well covered by WGS, with a median exonic coverage of 40 × ±8 × (mean ±SD). All the molecular diagnoses made by conventional methods were captured by WGS. The 18 new diagnoses made with WGS included structural and non-exonic sequence variants not detectable with whole-exome sequencing, and confirmed recent disease associations with the genes PIGG, RNU4ATAC, TRIO, and UNC13A. Conclusion WGS as a primary clinical test provided a higher diagnostic yield than conventional genetic testing in a clinically heterogeneous cohort. PMID:28771251
Lionel, Anath C; Costain, Gregory; Monfared, Nasim; Walker, Susan; Reuter, Miriam S; Hosseini, S Mohsen; Thiruvahindrapuram, Bhooma; Merico, Daniele; Jobling, Rebekah; Nalpathamkalam, Thomas; Pellecchia, Giovanna; Sung, Wilson W L; Wang, Zhuozhi; Bikangaga, Peter; Boelman, Cyrus; Carter, Melissa T; Cordeiro, Dawn; Cytrynbaum, Cheryl; Dell, Sharon D; Dhir, Priya; Dowling, James J; Heon, Elise; Hewson, Stacy; Hiraki, Linda; Inbar-Feigenberg, Michal; Klatt, Regan; Kronick, Jonathan; Laxer, Ronald M; Licht, Christoph; MacDonald, Heather; Mercimek-Andrews, Saadet; Mendoza-Londono, Roberto; Piscione, Tino; Schneider, Rayfel; Schulze, Andreas; Silverman, Earl; Siriwardena, Komudi; Snead, O Carter; Sondheimer, Neal; Sutherland, Joanne; Vincent, Ajoy; Wasserman, Jonathan D; Weksberg, Rosanna; Shuman, Cheryl; Carew, Chris; Szego, Michael J; Hayeems, Robin Z; Basran, Raveen; Stavropoulos, Dimitri J; Ray, Peter N; Bowdin, Sarah; Meyn, M Stephen; Cohn, Ronald D; Scherer, Stephen W; Marshall, Christian R
2018-04-01
PurposeGenetic testing is an integral diagnostic component of pediatric medicine. Standard of care is often a time-consuming stepwise approach involving chromosomal microarray analysis and targeted gene sequencing panels, which can be costly and inconclusive. Whole-genome sequencing (WGS) provides a comprehensive testing platform that has the potential to streamline genetic assessments, but there are limited comparative data to guide its clinical use.MethodsWe prospectively recruited 103 patients from pediatric non-genetic subspecialty clinics, each with a clinical phenotype suggestive of an underlying genetic disorder, and compared the diagnostic yield and coverage of WGS with those of conventional genetic testing.ResultsWGS identified diagnostic variants in 41% of individuals, representing a significant increase over conventional testing results (24%; P = 0.01). Genes clinically sequenced in the cohort (n = 1,226) were well covered by WGS, with a median exonic coverage of 40 × ±8 × (mean ±SD). All the molecular diagnoses made by conventional methods were captured by WGS. The 18 new diagnoses made with WGS included structural and non-exonic sequence variants not detectable with whole-exome sequencing, and confirmed recent disease associations with the genes PIGG, RNU4ATAC, TRIO, and UNC13A.ConclusionWGS as a primary clinical test provided a higher diagnostic yield than conventional genetic testing in a clinically heterogeneous cohort.
Sato, Kengo; Kuroki, Yoko; Kumita, Wakako; Fujiyama, Asao; Toyoda, Atsushi; Kawai, Jun; Iriki, Atsushi; Sasaki, Erika; Okano, Hideyuki; Sakakibara, Yasubumi
2015-11-20
The first draft of the common marmoset (Callithrix jacchus) genome was published by the Marmoset Genome Sequencing and Analysis Consortium. The draft was based on whole-genome shotgun sequencing, and the current assembly version is Callithrix_jacches-3.2.1, but there still exist 187,214 undetermined gap regions and supercontigs and relatively short contigs that are unmapped to chromosomes in the draft genome. We performed resequencing and assembly of the genome of common marmoset by deep sequencing with high-throughput sequencing technology. Several different sequence runs using Illumina sequencing platforms were executed, and 181 Gbp of high-quality bases including mate-pairs with long insert lengths of 3, 8, 20, and 40 Kbp were obtained, that is, approximately 60× coverage. The resequencing significantly improved the MGSAC draft genome sequence. The N50 of the contigs, which is a statistical measure used to evaluate assembly quality, doubled. As a result, 51% of the contigs (total length: 299 Mbp) that were unmapped to chromosomes in the MGSAC draft were merged with chromosomal contigs, and the improved genome sequence helped to detect 5,288 new genes that are homologous to human cDNAs and the gaps in 5,187 transcripts of the Ensembl gene annotations were completely filled.
SAR and scan-time optimized 3D whole-brain double inversion recovery imaging at 7T.
Pracht, Eberhard D; Feiweier, Thorsten; Ehses, Philipp; Brenner, Daniel; Roebroeck, Alard; Weber, Bernd; Stöcker, Tony
2018-05-01
The aim of this project was to implement an ultra-high field (UHF) optimized double inversion recovery (DIR) sequence for gray matter (GM) imaging, enabling whole brain coverage in short acquisition times ( ≈5 min, image resolution 1 mm 3 ). A 3D variable flip angle DIR turbo spin echo (TSE) sequence was optimized for UHF application. We implemented an improved, fast, and specific absorption rate (SAR) efficient TSE imaging module, utilizing improved reordering. The DIR preparation was tailored to UHF application. Additionally, fat artifacts were minimized by employing water excitation instead of fat saturation. GM images, covering the whole brain, were acquired in 7 min scan time at 1 mm isotropic resolution. SAR issues were overcome by using a dedicated flip angle calculation considering SAR and SNR efficiency. Furthermore, UHF related artifacts were minimized. The suggested sequence is suitable to generate GM images with whole-brain coverage at UHF. Due to the short total acquisition times and overall robustness, this approach can potentially enable DIR application in a routine setting and enhance lesion detection in neurological diseases. Magn Reson Med 79:2620-2628, 2018. © 2017 International Society for Magnetic Resonance in Medicine. © 2017 International Society for Magnetic Resonance in Medicine.
Hart, Elizabeth A; Caccamo, Mario; Harrow, Jennifer L; Humphray, Sean J; Gilbert, James GR; Trevanion, Steve; Hubbard, Tim; Rogers, Jane; Rothschild, Max F
2007-01-01
Background We describe here the sequencing, annotation and comparative analysis of an 8 Mb region of pig chromosome 17, which provides a useful test region to assess coverage and quality for the pig genome sequencing project. We report our findings comparing the annotation of draft sequence assembled at different depths of coverage. Results Within this region we annotated 71 loci, of which 53 are orthologous to human known coding genes. When compared to the syntenic regions in human (20q13.13-q13.33) and mouse (chromosome 2, 167.5 Mb-178.3 Mb), this region was found to be highly conserved with respect to gene order. The most notable difference between the three species is the presence of a large expansion of zinc finger coding genes and pseudogenes on mouse chromosome 2 between Edn3 and Phactr3 that is absent from pig and human. All of our annotation has been made publicly available in the Vertebrate Genome Annotation browser, VEGA. We assessed the impact of coverage on sequence assembly across this region and found, as expected, that increased sequence depth resulted in fewer, longer contigs. One-third of our annotated loci could not be fully re-aligned back to the low coverage version of the sequence, principally because the transcripts are fragmented over several contigs. Conclusion We have demonstrated the considerable advantages of sequencing at increased read depths and discuss the implications that lower coverage sequence may have on subsequent comparative and functional studies, particularly those involving complex loci such as GNAS. PMID:17705864
Dynamics of domain coverage of the protein sequence universe.
Rekapalli, Bhanu; Wuichet, Kristin; Peterson, Gregory D; Zhulin, Igor B
2012-11-16
The currently known protein sequence space consists of millions of sequences in public databases and is rapidly expanding. Assigning sequences to families leads to a better understanding of protein function and the nature of the protein universe. However, a large portion of the current protein space remains unassigned and is referred to as its "dark matter". Here we suggest that true size of "dark matter" is much larger than stated by current definitions. We propose an approach to reducing the size of "dark matter" by identifying and subtracting regions in protein sequences that are not likely to contain any domain. Recent improvements in computational domain modeling result in a decrease, albeit slowly, in the relative size of "dark matter"; however, its absolute size increases substantially with the growth of sequence data.
Quail, Michael A; Smith, Miriam; Coupland, Paul; Otto, Thomas D; Harris, Simon R; Connor, Thomas R; Bertoni, Anna; Swerdlow, Harold P; Gu, Yong
2012-07-24
Next generation sequencing (NGS) technology has revolutionized genomic and genetic research. The pace of change in this area is rapid with three major new sequencing platforms having been released in 2011: Ion Torrent's PGM, Pacific Biosciences' RS and the Illumina MiSeq. Here we compare the results obtained with those platforms to the performance of the Illumina HiSeq, the current market leader. In order to compare these platforms, and get sufficient coverage depth to allow meaningful analysis, we have sequenced a set of 4 microbial genomes with mean GC content ranging from 19.3 to 67.7%. Together, these represent a comprehensive range of genome content. Here we report our analysis of that sequence data in terms of coverage distribution, bias, GC distribution, variant detection and accuracy. Sequence generated by Ion Torrent, MiSeq and Pacific Biosciences technologies displays near perfect coverage behaviour on GC-rich, neutral and moderately AT-rich genomes, but a profound bias was observed upon sequencing the extremely AT-rich genome of Plasmodium falciparum on the PGM, resulting in no coverage for approximately 30% of the genome. We analysed the ability to call variants from each platform and found that we could call slightly more variants from Ion Torrent data compared to MiSeq data, but at the expense of a higher false positive rate. Variant calling from Pacific Biosciences data was possible but higher coverage depth was required. Context specific errors were observed in both PGM and MiSeq data, but not in that from the Pacific Biosciences platform. All three fast turnaround sequencers evaluated here were able to generate usable sequence. However there are key differences between the quality of that data and the applications it will support.
Multi-Objective Optimization of Spacecraft Trajectories for Small-Body Coverage Missions
NASA Technical Reports Server (NTRS)
Hinckley, David, Jr.; Englander, Jacob; Hitt, Darren
2017-01-01
Visual coverage of surface elements of a small-body object requires multiple images to be taken that meet many requirements on their viewing angles, illumination angles, times of day, and combinations thereof. Designing trajectories capable of maximizing total possible coverage may not be useful since the image target sequence and the feasibility of said sequence given the rotation-rate limitations of the spacecraft are not taken into account. This work presents a means of optimizing, in a multi-objective manner, surface target sequences that account for such limitations.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gibson, Eli; Biomedical Engineering, University of Western Ontario, London, Ontario; Centre for Medical Image Computing, University College London, London
Purpose: Defining prostate cancer (PCa) lesion clinical target volumes (CTVs) for multiparametric magnetic resonance imaging (mpMRI) could support focal boosting or treatment to improve outcomes or lower morbidity, necessitating appropriate CTV margins for mpMRI-defined gross tumor volumes (GTVs). This study aimed to identify CTV margins yielding 95% coverage of PCa tumors for prospective cases with high likelihood. Methods and Materials: Twenty-five men with biopsy-confirmed clinical stage T1 or T2 PCa underwent pre-prostatectomy mpMRI, yielding T2-weighted, dynamic contrast-enhanced, and apparent diffusion coefficient images. Digitized whole-mount histology was contoured and registered to mpMRI scans (error ≤2 mm). Four observers contoured lesion GTVs onmore » each mpMRI scan. CTVs were defined by isotropic and anisotropic expansion from these GTVs and from multiparametric (unioned) GTVs from 2 to 3 scans. Histologic coverage (proportions of tumor area on co-registered histology inside the CTV, measured for Gleason scores [GSs] ≥6 and ≥7) and prostate sparing (proportions of prostate volume outside the CTV) were measured. Nonparametric histologic-coverage prediction intervals defined minimal margins yielding 95% coverage for prospective cases with 78% to 92% likelihood. Results: On analysis of 72 true-positive tumor detections, 95% coverage margins were 9 to 11 mm (GS ≥ 6) and 8 to 10 mm (GS ≥ 7) for single-sequence GTVs and were 8 mm (GS ≥ 6) and 6 mm (GS ≥ 7) for 3-sequence GTVs, yielding CTVs that spared 47% to 81% of prostate tissue for the majority of tumors. Inclusion of T2-weighted contours increased sparing for multiparametric CTVs with 95% coverage margins for GS ≥6, and inclusion of dynamic contrast-enhanced contours increased sparing for GS ≥7. Anisotropic 95% coverage margins increased the sparing proportions to 71% to 86%. Conclusions: Multiparametric magnetic resonance imaging–defined GTVs expanded by appropriate margins may support focal boosting or treatment of PCa; however, these margins, accounting for interobserver and intertumoral variability, may preclude highly conformal CTVs. Multiparametric GTVs and anisotropic margins may reduce the required margins and improve prostate sparing.« less
snpAD: An ancient DNA genotype caller.
Prüfer, Kay
2018-06-21
The study of ancient genomes can elucidate the evolutionary past. However, analyses are complicated by base-modifications in ancient DNA molecules that result in errors in DNA sequences. These errors are particularly common near the ends of sequences and pose a challenge for genotype calling. I describe an iterative method that estimates genotype frequencies and errors along sequences to allow for accurate genotype calling from ancient sequences. The implementation of this method, called snpAD, performs well on high-coverage ancient data, as shown by simulations and by subsampling the data of a high-coverage Neandertal genome. Although estimates for low-coverage genomes are less accurate, I am able to derive approximate estimates of heterozygosity from several low-coverage Neandertals. These estimates show that low heterozygosity, compared to modern humans, was common among Neandertals. The C ++ code of snpAD is freely available at http://bioinf.eva.mpg.de/snpAD/. Supplementary data are available at Bioinformatics online.
Indexcov: fast coverage quality control for whole-genome sequencing.
Pedersen, Brent S; Collins, Ryan L; Talkowski, Michael E; Quinlan, Aaron R
2017-11-01
The BAM and CRAM formats provide a supplementary linear index that facilitates rapid access to sequence alignments in arbitrary genomic regions. Comparing consecutive entries in a BAM or CRAM index allows one to infer the number of alignment records per genomic region for use as an effective proxy of sequence depth in each genomic region. Based on these properties, we have developed indexcov, an efficient estimator of whole-genome sequencing coverage to rapidly identify samples with aberrant coverage profiles, reveal large-scale chromosomal anomalies, recognize potential batch effects, and infer the sex of a sample. Indexcov is available at https://github.com/brentp/goleft under the MIT license. © The Authors 2017. Published by Oxford University Press.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ruggles, Kelly V.; Tang, Zuojian; Wang, Xuya
Improvements in mass spectrometry (MS)-based peptide sequencing provide a new opportunity to determine whether polymorphisms, mutations and splice variants identified in cancer cells are translated. Herein we therefore describe a proteogenomic data integration tool (QUILTS) and illustrate its application to whole genome, transcriptome and global MS peptide sequence datasets generated from a pair of luminal and basal-like breast cancer patient derived xenografts (PDX). The sensitivity of proteogenomic analysis for singe nucleotide variant (SNV) expression and novel splice junction (NSJ) detection was probed using multiple MS/MS process replicates. Despite over thirty sample replicates, only about 10% of all SNV (somatic andmore » germline) were detected by both DNA and RNA sequencing were observed as peptides. An even smaller proportion of peptides corresponding to NSJ observed by RNA sequencing were detected (<0.1%). Peptides mapping to DNA-detected SNV without a detectable mRNA transcript were also observed demonstrating the transcriptome coverage was also incomplete (~80%). In contrast to germ-line variants, somatic variants were less likely to be detected at the peptide level in the basal-like tumor than the luminal tumor raising the possibility of differential translation or protein degradation effects. In conclusion, the QUILTS program integrates DNA, RNA and peptide sequencing to assess the degree to which somatic mutations are translated and therefore biologically active. By identifying gaps in sequence coverage QUILTS benchmarks current technology and assesses progress towards whole cancer proteome and transcriptome analysis.« less
Assessing the Relationship of Ancient and Modern Populations
Schraiber, Joshua G.
2018-01-01
Genetic material sequenced from ancient samples is revolutionizing our understanding of the recent evolutionary past. However, ancient DNA is often degraded, resulting in low coverage, error-prone sequencing. Several solutions exist to this problem, ranging from simple approach, such as selecting a read at random for each site, to more complicated approaches involving genotype likelihoods. In this work, we present a novel method for assessing the relationship of an ancient sample with a modern population, while accounting for sequencing error and postmortem damage by analyzing raw reads from multiple ancient individuals simultaneously. We show that, when analyzing SNP data, it is better to sequence more ancient samples to low coverage: two samples sequenced to 0.5× coverage provide better resolution than a single sample sequenced to 2× coverage. We also examined the power to detect whether an ancient sample is directly ancestral to a modern population, finding that, with even a few high coverage individuals, even ancient samples that are very slightly diverged from the modern population can be detected with ease. When we applied our approach to European samples, we found that no ancient samples represent direct ancestors of modern Europeans. We also found that, as shown previously, the most ancient Europeans appear to have had the smallest effective population sizes, indicating a role for agriculture in modern population growth. PMID:29167200
Gene and translation initiation site prediction in metagenomic sequences
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hyatt, Philip Douglas; LoCascio, Philip F; Hauser, Loren John
2012-01-01
Gene prediction in metagenomic sequences remains a difficult problem. Current sequencing technologies do not achieve sufficient coverage to assemble the individual genomes in a typical sample; consequently, sequencing runs produce a large number of short sequences whose exact origin is unknown. Since these sequences are usually smaller than the average length of a gene, algorithms must make predictions based on very little data. We present MetaProdigal, a metagenomic version of the gene prediction program Prodigal, that can identify genes in short, anonymous coding sequences with a high degree of accuracy. The novel value of the method consists of enhanced translationmore » initiation site identification, ability to identify sequences that use alternate genetic codes and confidence values for each gene call. We compare the results of MetaProdigal with other methods and conclude with a discussion of future improvements.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Borot de Battisti, M; Maenhout, M; Lagendijk, J J W
Purpose: To develop a new method which adaptively determines the optimal needle insertion sequence for HDR prostate brachytherapy involving divergent needle-by-needle dose delivery by e.g. a robotic device. A needle insertion sequence is calculated at the beginning of the intervention and updated after each needle insertion with feedback on needle positioning errors. Methods: Needle positioning errors and anatomy changes may occur during HDR brachytherapy which can lead to errors in the delivered dose. A novel strategy was developed to calculate and update the needle sequence and the dose plan after each needle insertion with feedback on needle positioning errors. Themore » dose plan optimization was performed by numerical simulations. The proposed needle sequence determination optimizes the final dose distribution based on the dose coverage impact of each needle. This impact is predicted stochastically by needle insertion simulations. HDR procedures were simulated with varying number of needle insertions (4 to 12) using 11 patient MR data-sets with PTV, prostate, urethra, bladder and rectum delineated. Needle positioning errors were modeled by random normally distributed angulation errors (standard deviation of 3 mm at the needle’s tip). The final dose parameters were compared in the situations where the needle with the largest vs. the smallest dose coverage impact was selected at each insertion. Results: Over all scenarios, the percentage of clinically acceptable final dose distribution improved when the needle selected had the largest dose coverage impact (91%) compared to the smallest (88%). The differences were larger for few (4 to 6) needle insertions (maximum difference scenario: 79% vs. 60%). The computation time of the needle sequence optimization was below 60s. Conclusion: A new adaptive needle sequence determination for HDR prostate brachytherapy was developed. Coupled to adaptive planning, the selection of the needle with the largest dose coverage impact increases chances of reaching the clinical constraints. M. Borot de Battisti is funded by Philips Medical Systems Nederland B.V.; M. Moerland is principal investigator on a contract funded by Philips Medical Systems Nederland B.V.; G. Hautvast and D. Binnekamp are fulltime employees of Philips Medical Systems Nederland B.V.« less
Launch mission summary and sequence of events Telesat-F(anik-D1)/Delta-164
NASA Technical Reports Server (NTRS)
1982-01-01
The launch vehicle, spacecraft, and mission are summarized. Launch window information, vehicle telemetry coverage, real time data flow, telemetry coverage by station, selected trajectory information, and a brief sequence of flight events are included.
A user's guide to quantitative and comparative analysis of metagenomic datasets.
Luo, Chengwei; Rodriguez-R, Luis M; Konstantinidis, Konstantinos T
2013-01-01
Metagenomics has revolutionized microbiological studies during the past decade and provided new insights into the diversity, dynamics, and metabolic potential of natural microbial communities. However, metagenomics still represents a field in development, and standardized tools and approaches to handle and compare metagenomes have not been established yet. An important reason accounting for the latter is the continuous changes in the type of sequencing data available, for example, long versus short sequencing reads. Here, we provide a guide to bioinformatic pipelines developed to accomplish the following tasks, focusing primarily on those developed by our team: (i) assemble a metagenomic dataset; (ii) determine the level of sequence coverage obtained and the amount of sequencing required to obtain complete coverage; (iii) identify the taxonomic affiliation of a metagenomic read or assembled contig; and (iv) determine differentially abundant genes, pathways, and species between different datasets. Most of these pipelines do not depend on the type of sequences available or can be easily adjusted to fit different types of sequences, and are freely available (for instance, through our lab Web site: http://www.enve-omics.gatech.edu/). The limitations of current approaches, as well as the computational aspects that can be further improved, will also be briefly discussed. The work presented here provides practical guidelines on how to perform metagenomic analysis of microbial communities characterized by varied levels of diversity and establishes approaches to handle the resulting data, independent of the sequencing platform employed. © 2013 Elsevier Inc. All rights reserved.
Comparing viral metagenomics methods using a highly multiplexed human viral pathogens reagent
Li, Linlin; Deng, Xutao; Mee, Edward T.; Collot-Teixeira, Sophie; Anderson, Rob; Schepelmann, Silke; Minor, Philip D.; Delwart, Eric
2014-01-01
Unbiased metagenomic sequencing holds significant potential as a diagnostic tool for the simultaneous detection of any previously genetically described viral nucleic acids in clinical samples. Viral genome sequences can also inform on likely phenotypes including drug susceptibility or neutralization serotypes. In this study, different variables of the laboratory methods often used to generate viral metagenomics libraries on the efficiency of viral detection and virus genome coverage were compared. A biological reagent consisting of 25 different human RNA and DNA viral pathogens was used to estimate the effect of filtration and nuclease digestion, DNA/RNA extraction methods, pre-amplification and the use of different library preparation kits on the detection of viral nucleic acids. Filtration and nuclease treatment led to slight decreases in the percentage of viral sequence reads and number of viruses detected. For nucleic acid extractions silica spin columns improved viral sequence recovery relative to magnetic beads and Trizol extraction. Pre-amplification using random RT-PCR while generating more viral sequence reads resulted in detection of fewer viruses, more overlapping sequences, and lower genome coverage. The ScriptSeq library preparation method retrieved more viruses and a greater fraction of their genomes than the TruSeq and Nextera methods. Viral metagenomics sequencing was able to simultaneously detect up to 22 different viruses in the biological reagent analyzed including all those detected by qPCR. Further optimization will be required for the detection of viruses in biologically more complex samples such as tissues, blood, or feces. PMID:25497414
Dynamics of domain coverage of the protein sequence universe
2012-01-01
Background The currently known protein sequence space consists of millions of sequences in public databases and is rapidly expanding. Assigning sequences to families leads to a better understanding of protein function and the nature of the protein universe. However, a large portion of the current protein space remains unassigned and is referred to as its “dark matter”. Results Here we suggest that true size of “dark matter” is much larger than stated by current definitions. We propose an approach to reducing the size of “dark matter” by identifying and subtracting regions in protein sequences that are not likely to contain any domain. Conclusions Recent improvements in computational domain modeling result in a decrease, albeit slowly, in the relative size of “dark matter”; however, its absolute size increases substantially with the growth of sequence data. PMID:23157439
De novo protein sequencing by combining top-down and bottom-up tandem mass spectra.
Liu, Xiaowen; Dekker, Lennard J M; Wu, Si; Vanduijn, Martijn M; Luider, Theo M; Tolić, Nikola; Kou, Qiang; Dvorkin, Mikhail; Alexandrova, Sonya; Vyatkina, Kira; Paša-Tolić, Ljiljana; Pevzner, Pavel A
2014-07-03
There are two approaches for de novo protein sequencing: Edman degradation and mass spectrometry (MS). Existing MS-based methods characterize a novel protein by assembling tandem mass spectra of overlapping peptides generated from multiple proteolytic digestions of the protein. Because each tandem mass spectrum covers only a short peptide of the target protein, the key to high coverage protein sequencing is to find spectral pairs from overlapping peptides in order to assemble tandem mass spectra to long ones. However, overlapping regions of peptides may be too short to be confidently identified. High-resolution mass spectrometers have become accessible to many laboratories. These mass spectrometers are capable of analyzing molecules of large mass values, boosting the development of top-down MS. Top-down tandem mass spectra cover whole proteins. However, top-down tandem mass spectra, even combined, rarely provide full ion fragmentation coverage of a protein. We propose an algorithm, TBNovo, for de novo protein sequencing by combining top-down and bottom-up MS. In TBNovo, a top-down tandem mass spectrum is utilized as a scaffold, and bottom-up tandem mass spectra are aligned to the scaffold to increase sequence coverage. Experiments on data sets of two proteins showed that TBNovo achieved high sequence coverage and high sequence accuracy.
2011-01-01
Background HIV vaccine development must address the genetic diversity and plasticity of the virus that permits the presentation of diverse genetic forms to the immune system and subsequent escape from immune pressure. Assessment of potential HIV strain coverage by candidate T cell-based vaccines (whether natural sequence or computationally optimized products) is now a critical component in interpreting candidate vaccine suitability. Methods We have utilized an N-mer identity algorithm to represent T cell epitopes and explore potential coverage of the global HIV pandemic using natural sequences derived from candidate HIV vaccines. Breadth (the number of T cell epitopes generated) and depth (the variant coverage within a T cell epitope) analyses have been incorporated into the model to explore vaccine coverage requirements in terms of the number of discrete T cell epitopes generated. Results We show that when multiple epitope generation by a vaccine product is considered a far more nuanced appraisal of the potential HIV strain coverage of the vaccine product emerges. By considering epitope breadth and depth several important observations were made: (1) epitope breadth requirements to reach particular levels of vaccine coverage, even for natural sequence-based vaccine products is not necessarily an intractable problem for the immune system; (2) increasing the valency (number of T cell epitope variants present) of vaccine products dramatically decreases the epitope requirements to reach particular coverage levels for any epidemic; (3) considering multiple-hit models (more than one exact epitope match with an incoming HIV strain) places a significantly higher requirement upon epitope breadth in order to reach a given level of coverage, to the point where low valency natural sequence based products would not practically be able to generate sufficient epitopes. Conclusions When HIV vaccine sequences are compared against datasets of potential incoming viruses important metrics such as the minimum epitope count required to reach a desired level of coverage can be easily calculated. We propose that such analyses can be applied early in the planning stages and during the execution phase of a vaccine trial to explore theoretical and empirical suitability of a vaccine product to a particular epidemic setting. PMID:22152192
Enhanced sequencing coverage with digital droplet multiple displacement amplification
Sidore, Angus M.; Lan, Freeman; Lim, Shaun W.; Abate, Adam R.
2016-01-01
Sequencing small quantities of DNA is important for applications ranging from the assembly of uncultivable microbial genomes to the identification of cancer-associated mutations. To obtain sufficient quantities of DNA for sequencing, the small amount of starting material must be amplified significantly. However, existing methods often yield errors or non-uniform coverage, reducing sequencing data quality. Here, we describe digital droplet multiple displacement amplification, a method that enables massive amplification of low-input material while maintaining sequence accuracy and uniformity. The low-input material is compartmentalized as single molecules in millions of picoliter droplets. Because the molecules are isolated in compartments, they amplify to saturation without competing for resources; this yields uniform representation of all sequences in the final product and, in turn, enhances the quality of the sequence data. We demonstrate the ability to uniformly amplify the genomes of single Escherichia coli cells, comprising just 4.7 fg of starting DNA, and obtain sequencing coverage distributions that rival that of unamplified material. Digital droplet multiple displacement amplification provides a simple and effective method for amplifying minute amounts of DNA for accurate and uniform sequencing. PMID:26704978
Estimating genotype error rates from high-coverage next-generation sequence data.
Wall, Jeffrey D; Tang, Ling Fung; Zerbe, Brandon; Kvale, Mark N; Kwok, Pui-Yan; Schaefer, Catherine; Risch, Neil
2014-11-01
Exome and whole-genome sequencing studies are becoming increasingly common, but little is known about the accuracy of the genotype calls made by the commonly used platforms. Here we use replicate high-coverage sequencing of blood and saliva DNA samples from four European-American individuals to estimate lower bounds on the error rates of Complete Genomics and Illumina HiSeq whole-genome and whole-exome sequencing. Error rates for nonreference genotype calls range from 0.1% to 0.6%, depending on the platform and the depth of coverage. Additionally, we found (1) no difference in the error profiles or rates between blood and saliva samples; (2) Complete Genomics sequences had substantially higher error rates than Illumina sequences had; (3) error rates were higher (up to 6%) for rare or unique variants; (4) error rates generally declined with genotype quality (GQ) score, but in a nonlinear fashion for the Illumina data, likely due to loss of specificity of GQ scores greater than 60; and (5) error rates increased with increasing depth of coverage for the Illumina data. These findings, especially (3)-(5), suggest that caution should be taken in interpreting the results of next-generation sequencing-based association studies, and even more so in clinical application of this technology in the absence of validation by other more robust sequencing or genotyping methods. © 2014 Wall et al.; Published by Cold Spring Harbor Laboratory Press.
Pitteri, Sharon J.; Chrisman, Paul A.; McLuckey, Scott A.
2005-01-01
In this study, the electron-transfer dissociation (ETD) behavior of cations derived from 27 different peptides (22 of which are tryptic peptides) has been studied in a 3D quadrupole ion trap mass spectrometer. Ion/ion reactions between peptide cations and nitrobenzene anions have been examined at both room temperature and in an elevated temperature bath gas environment to form ETD product ions. From the peptides studied, the ETD sequence coverage tends to be inversely related to peptide size. At room temperature, very high sequence coverage (~100%) was observed for small peptides (≤7 amino acids). For medium-sized peptides composed of 8–11 amino acids, the average sequence coverage was 46%. Larger peptides with 14 or more amino acids yielded an average sequence coverage of 23%. Elevated-temperature ETD provided increased sequence coverage over room-temperature experiments for the peptides of greater than 7 residues, giving an average of 67% for medium-sized peptides and 63% for larger peptides. Percent ETD, a measure of the extent of electron transfer, has also been calculated for the peptides and also shows an inverse relation with peptide size. Bath gas temperature does not have a consistent effect on percent ETD, however. For the tryptic peptides, fragmentation is localized at the ends of the peptides suggesting that the distribution of charge within the peptide may play an important role in determining fragmentation sites. A triply protonated peptide has also been studied and shows behavior similar to the doubly charged peptides. These preliminary results suggest that for a given charge state there is a maximum size for which high sequence coverage is obtained and that increasing the bath gas temperature can increase this maximum. PMID:16131079
Comparative analyses of two Geraniaceae transcriptomes using next-generation sequencing.
Zhang, Jin; Ruhlman, Tracey A; Mower, Jeffrey P; Jansen, Robert K
2013-12-29
Organelle genomes of Geraniaceae exhibit several unusual evolutionary phenomena compared to other angiosperm families including accelerated nucleotide substitution rates, widespread gene loss, reduced RNA editing, and extensive genomic rearrangements. Since most organelle-encoded proteins function in multi-subunit complexes that also contain nuclear-encoded proteins, it is likely that the atypical organellar phenomena affect the evolution of nuclear genes encoding organellar proteins. To begin to unravel the complex co-evolutionary interplay between organellar and nuclear genomes in this family, we sequenced nuclear transcriptomes of two species, Geranium maderense and Pelargonium x hortorum. Normalized cDNA libraries of G. maderense and P. x hortorum were used for transcriptome sequencing. Five assemblers (MIRA, Newbler, SOAPdenovo, SOAPdenovo-trans [SOAPtrans], Trinity) and two next-generation technologies (454 and Illumina) were compared to determine the optimal transcriptome sequencing approach. Trinity provided the highest quality assembly of Illumina data with the deepest transcriptome coverage. An analysis to determine the amount of sequencing needed for de novo assembly revealed diminishing returns of coverage and quality with data sets larger than sixty million Illumina paired end reads for both species. The G. maderense and P. x hortorum transcriptomes contained fewer transcripts encoding the PLS subclass of PPR proteins relative to other angiosperms, consistent with reduced mitochondrial RNA editing activity in Geraniaceae. In addition, transcripts for all six plastid targeted sigma factors were identified in both transcriptomes, suggesting that one of the highly divergent rpoA-like ORFs in the P. x hortorum plastid genome is functional. The findings support the use of the Illumina platform and assemblers optimized for transcriptome assembly, such as Trinity or SOAPtrans, to generate high-quality de novo transcriptomes with broad coverage. In addition, results indicated no major improvements in breadth of coverage with data sets larger than six billion nucleotides or when sampling RNA from four tissue types rather than from a single tissue. Finally, this work demonstrates the power of cross-compartmental genomic analyses to deepen our understanding of the correlated evolution of the nuclear, plastid, and mitochondrial genomes in plants.
Comparative analyses of two Geraniaceae transcriptomes using next-generation sequencing
2013-01-01
Background Organelle genomes of Geraniaceae exhibit several unusual evolutionary phenomena compared to other angiosperm families including accelerated nucleotide substitution rates, widespread gene loss, reduced RNA editing, and extensive genomic rearrangements. Since most organelle-encoded proteins function in multi-subunit complexes that also contain nuclear-encoded proteins, it is likely that the atypical organellar phenomena affect the evolution of nuclear genes encoding organellar proteins. To begin to unravel the complex co-evolutionary interplay between organellar and nuclear genomes in this family, we sequenced nuclear transcriptomes of two species, Geranium maderense and Pelargonium x hortorum. Results Normalized cDNA libraries of G. maderense and P. x hortorum were used for transcriptome sequencing. Five assemblers (MIRA, Newbler, SOAPdenovo, SOAPdenovo-trans [SOAPtrans], Trinity) and two next-generation technologies (454 and Illumina) were compared to determine the optimal transcriptome sequencing approach. Trinity provided the highest quality assembly of Illumina data with the deepest transcriptome coverage. An analysis to determine the amount of sequencing needed for de novo assembly revealed diminishing returns of coverage and quality with data sets larger than sixty million Illumina paired end reads for both species. The G. maderense and P. x hortorum transcriptomes contained fewer transcripts encoding the PLS subclass of PPR proteins relative to other angiosperms, consistent with reduced mitochondrial RNA editing activity in Geraniaceae. In addition, transcripts for all six plastid targeted sigma factors were identified in both transcriptomes, suggesting that one of the highly divergent rpoA-like ORFs in the P. x hortorum plastid genome is functional. Conclusions The findings support the use of the Illumina platform and assemblers optimized for transcriptome assembly, such as Trinity or SOAPtrans, to generate high-quality de novo transcriptomes with broad coverage. In addition, results indicated no major improvements in breadth of coverage with data sets larger than six billion nucleotides or when sampling RNA from four tissue types rather than from a single tissue. Finally, this work demonstrates the power of cross-compartmental genomic analyses to deepen our understanding of the correlated evolution of the nuclear, plastid, and mitochondrial genomes in plants. PMID:24373163
Perez de Souza, Leonardo; Naake, Thomas; Tohge, Takayuki; Fernie, Alisdair R
2017-01-01
Abstract The grand challenge currently facing metabolomics is the expansion of the coverage of the metabolome from a minor percentage of the metabolic complement of the cell toward the level of coverage afforded by other post-genomic technologies such as transcriptomics and proteomics. In plants, this problem is exacerbated by the sheer diversity of chemicals that constitute the metabolome, with the number of metabolites in the plant kingdom generally considered to be in excess of 200 000. In this review, we focus on web resources that can be exploited in order to improve analyte and ultimately metabolite identification and quantification. There is a wide range of available software that not only aids in this but also in the related area of peak alignment; however, for the uninitiated, choosing which program to use is a daunting task. For this reason, we provide an overview of the pros and cons of the software as well as comments regarding the level of programing skills required to effectively exploit their basic functions. In addition, the torrent of available genome and transcriptome sequences that followed the advent of next-generation sequencing has opened up further valuable resources for metabolite identification. All things considered, we posit that only via a continued communal sharing of information such as that deposited in the databases described within the article are we likely to be able to make significant headway toward improving our coverage of the plant metabolome. PMID:28520864
Fan, Zhaoyang; Yang, Qi; Deng, Zixin; Li, Yuxia; Bi, Xiaoming; Song, Shlee; Li, Debiao
2017-03-01
Although three-dimensional (3D) turbo spin echo (TSE) with variable flip angles has proven to be useful for intracranial vessel wall imaging, it is associated with inadequate suppression of cerebrospinal fluid (CSF) signals and limited spatial coverage at 3 Tesla (T). This work aimed to modify the sequence and develop a protocol to achieve whole-brain, CSF-attenuated T 1 -weighted vessel wall imaging. Nonselective excitation and a flip-down radiofrequency pulse module were incorporated into a commercial 3D TSE sequence. A protocol based on the sequence was designed to achieve T 1 -weighted vessel wall imaging with whole-brain spatial coverage, enhanced CSF-signal suppression, and isotropic 0.5-mm resolution. Human volunteer and pilot patient studies were performed to qualitatively and quantitatively demonstrate the advantages of the sequence. Compared with the original sequence, the modified sequence significantly improved the T 1 -weighted image contrast score (2.07 ± 0.19 versus 3.00 ± 0.00, P = 0.011), vessel wall-to-CSF contrast ratio (0.14 ± 0.16 versus 0.52 ± 0.30, P = 0.007) and contrast-to-noise ratio (1.69 ± 2.18 versus 4.26 ± 2.30, P = 0.022). Significant improvement in vessel wall outer boundary sharpness was observed in several major arterial segments. The new 3D TSE sequence allows for high-quality T 1 -weighted intracranial vessel wall imaging at 3 T. It may potentially aid in depicting small arteries and revealing T 1 -mediated high-signal wall abnormalities. Magn Reson Med 77:1142-1150, 2017. © 2016 International Society for Magnetic Resonance in Medicine. © 2016 International Society for Magnetic Resonance in Medicine.
Nonpareil 3: Fast Estimation of Metagenomic Coverage and Sequence Diversity.
Rodriguez-R, Luis M; Gunturu, Santosh; Tiedje, James M; Cole, James R; Konstantinidis, Konstantinos T
2018-01-01
Estimations of microbial community diversity based on metagenomic data sets are affected, often to an unknown degree, by biases derived from insufficient coverage and reference database-dependent estimations of diversity. For instance, the completeness of reference databases cannot be generally estimated since it depends on the extant diversity sampled to date, which, with the exception of a few habitats such as the human gut, remains severely undersampled. Further, estimation of the degree of coverage of a microbial community by a metagenomic data set is prohibitively time-consuming for large data sets, and coverage values may not be directly comparable between data sets obtained with different sequencing technologies. Here, we extend Nonpareil, a database-independent tool for the estimation of coverage in metagenomic data sets, to a high-performance computing implementation that scales up to hundreds of cores and includes, in addition, a k -mer-based estimation as sensitive as the original alignment-based version but about three hundred times as fast. Further, we propose a metric of sequence diversity ( N d ) derived directly from Nonpareil curves that correlates well with alpha diversity assessed by traditional metrics. We use this metric in different experiments demonstrating the correlation with the Shannon index estimated on 16S rRNA gene profiles and show that N d additionally reveals seasonal patterns in marine samples that are not captured by the Shannon index and more precise rankings of the magnitude of diversity of microbial communities in different habitats. Therefore, the new version of Nonpareil, called Nonpareil 3, advances the toolbox for metagenomic analyses of microbiomes. IMPORTANCE Estimation of the coverage provided by a metagenomic data set, i.e., what fraction of the microbial community was sampled by DNA sequencing, represents an essential first step of every culture-independent genomic study that aims to robustly assess the sequence diversity present in a sample. However, estimation of coverage remains elusive because of several technical limitations associated with high computational requirements and limiting statistical approaches to quantify diversity. Here we described Nonpareil 3, a new bioinformatics algorithm that circumvents several of these limitations and thus can facilitate culture-independent studies in clinical or environmental settings, independent of the sequencing platform employed. In addition, we present a new metric of sequence diversity based on rarefied coverage and demonstrate its use in communities from diverse ecosystems.
CORNAS: coverage-dependent RNA-Seq analysis of gene expression data without biological replicates.
Low, Joel Z B; Khang, Tsung Fei; Tammi, Martti T
2017-12-28
In current statistical methods for calling differentially expressed genes in RNA-Seq experiments, the assumption is that an adjusted observed gene count represents an unknown true gene count. This adjustment usually consists of a normalization step to account for heterogeneous sample library sizes, and then the resulting normalized gene counts are used as input for parametric or non-parametric differential gene expression tests. A distribution of true gene counts, each with a different probability, can result in the same observed gene count. Importantly, sequencing coverage information is currently not explicitly incorporated into any of the statistical models used for RNA-Seq analysis. We developed a fast Bayesian method which uses the sequencing coverage information determined from the concentration of an RNA sample to estimate the posterior distribution of a true gene count. Our method has better or comparable performance compared to NOISeq and GFOLD, according to the results from simulations and experiments with real unreplicated data. We incorporated a previously unused sequencing coverage parameter into a procedure for differential gene expression analysis with RNA-Seq data. Our results suggest that our method can be used to overcome analytical bottlenecks in experiments with limited number of replicates and low sequencing coverage. The method is implemented in CORNAS (Coverage-dependent RNA-Seq), and is available at https://github.com/joel-lzb/CORNAS .
Identifying micro-inversions using high-throughput sequencing reads.
He, Feifei; Li, Yang; Tang, Yu-Hang; Ma, Jian; Zhu, Huaiqiu
2016-01-11
The identification of inversions of DNA segments shorter than read length (e.g., 100 bp), defined as micro-inversions (MIs), remains challenging for next-generation sequencing reads. It is acknowledged that MIs are important genomic variation and may play roles in causing genetic disease. However, current alignment methods are generally insensitive to detect MIs. Here we develop a novel tool, MID (Micro-Inversion Detector), to identify MIs in human genomes using next-generation sequencing reads. The algorithm of MID is designed based on a dynamic programming path-finding approach. What makes MID different from other variant detection tools is that MID can handle small MIs and multiple breakpoints within an unmapped read. Moreover, MID improves reliability in low coverage data by integrating multiple samples. Our evaluation demonstrated that MID outperforms Gustaf, which can currently detect inversions from 30 bp to 500 bp. To our knowledge, MID is the first method that can efficiently and reliably identify MIs from unmapped short next-generation sequencing reads. MID is reliable on low coverage data, which is suitable for large-scale projects such as the 1000 Genomes Project (1KGP). MID identified previously unknown MIs from the 1KGP that overlap with genes and regulatory elements in the human genome. We also identified MIs in cancer cell lines from Cancer Cell Line Encyclopedia (CCLE). Therefore our tool is expected to be useful to improve the study of MIs as a type of genetic variant in the human genome. The source code can be downloaded from: http://cqb.pku.edu.cn/ZhuLab/MID .
Evaluating Variant Calling Tools for Non-Matched Next-Generation Sequencing Data
NASA Astrophysics Data System (ADS)
Sandmann, Sarah; de Graaf, Aniek O.; Karimi, Mohsen; van der Reijden, Bert A.; Hellström-Lindberg, Eva; Jansen, Joop H.; Dugas, Martin
2017-02-01
Valid variant calling results are crucial for the use of next-generation sequencing in clinical routine. However, there are numerous variant calling tools that usually differ in algorithms, filtering strategies, recommendations and thus, also in the output. We evaluated eight open-source tools regarding their ability to call single nucleotide variants and short indels with allelic frequencies as low as 1% in non-matched next-generation sequencing data: GATK HaplotypeCaller, Platypus, VarScan, LoFreq, FreeBayes, SNVer, SAMtools and VarDict. We analysed two real datasets from patients with myelodysplastic syndrome, covering 54 Illumina HiSeq samples and 111 Illumina NextSeq samples. Mutations were validated by re-sequencing on the same platform, on a different platform and expert based review. In addition we considered two simulated datasets with varying coverage and error profiles, covering 50 samples each. In all cases an identical target region consisting of 19 genes (42,322 bp) was analysed. Altogether, no tool succeeded in calling all mutations. High sensitivity was always accompanied by low precision. Influence of varying coverages- and background noise on variant calling was generally low. Taking everything into account, VarDict performed best. However, our results indicate that there is a need to improve reproducibility of the results in the context of multithreading.
Diversity arrays technology (DArT) markers in apple for genetic linkage maps.
Schouten, Henk J; van de Weg, W Eric; Carling, Jason; Khan, Sabaz Ali; McKay, Steven J; van Kaauwen, Martijn P W; Wittenberg, Alexander H J; Koehorst-van Putten, Herma J J; Noordijk, Yolanda; Gao, Zhongshan; Rees, D Jasper G; Van Dyk, Maria M; Jaccoud, Damian; Considine, Michael J; Kilian, Andrzej
2012-03-01
Diversity Arrays Technology (DArT) provides a high-throughput whole-genome genotyping platform for the detection and scoring of hundreds of polymorphic loci without any need for prior sequence information. The work presented here details the development and performance of a DArT genotyping array for apple. This is the first paper on DArT in horticultural trees. Genetic mapping of DArT markers in two mapping populations and their integration with other marker types showed that DArT is a powerful high-throughput method for obtaining accurate and reproducible marker data, despite the low cost per data point. This method appears to be suitable for aligning the genetic maps of different segregating populations. The standard complexity reduction method, based on the methylation-sensitive PstI restriction enzyme, resulted in a high frequency of markers, although there was 52-54% redundancy due to the repeated sampling of highly similar sequences. Sequencing of the marker clones showed that they are significantly enriched for low-copy, genic regions. The genome coverage using the standard method was 55-76%. For improved genome coverage, an alternative complexity reduction method was examined, which resulted in less redundancy and additional segregating markers. The DArT markers proved to be of high quality and were very suitable for genetic mapping at low cost for the apple, providing moderate genome coverage. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1007/s11032-011-9579-5) contains supplementary material, which is available to authorized users.
High depth, whole-genome sequencing of cholera isolates from Haiti and the Dominican Republic.
Sealfon, Rachel; Gire, Stephen; Ellis, Crystal; Calderwood, Stephen; Qadri, Firdausi; Hensley, Lisa; Kellis, Manolis; Ryan, Edward T; LaRocque, Regina C; Harris, Jason B; Sabeti, Pardis C
2012-09-11
Whole-genome sequencing is an important tool for understanding microbial evolution and identifying the emergence of functionally important variants over the course of epidemics. In October 2010, a severe cholera epidemic began in Haiti, with additional cases identified in the neighboring Dominican Republic. We used whole-genome approaches to sequence four Vibrio cholerae isolates from Haiti and the Dominican Republic and three additional V. cholerae isolates to a high depth of coverage (>2000x); four of the seven isolates were previously sequenced. Using these sequence data, we examined the effect of depth of coverage and sequencing platform on genome assembly and identification of sequence variants. We found that 50x coverage is sufficient to construct a whole-genome assembly and to accurately call most variants from 100 base pair paired-end sequencing reads. Phylogenetic analysis between the newly sequenced and thirty-three previously sequenced V. cholerae isolates indicates that the Haitian and Dominican Republic isolates are closest to strains from South Asia. The Haitian and Dominican Republic isolates form a tight cluster, with only four variants unique to individual isolates. These variants are located in the CTX region, the SXT region, and the core genome. Of the 126 mutations identified that separate the Haiti-Dominican Republic cluster from the V. cholerae reference strain (N16961), 73 are non-synonymous changes, and a number of these changes cluster in specific genes and pathways. Sequence variant analyses of V. cholerae isolates, including multiple isolates from the Haitian outbreak, identify coverage-specific and technology-specific effects on variant detection, and provide insight into genomic change and functional evolution during an epidemic.
A programmable method for massively parallel targeted sequencing
Hopmans, Erik S.; Natsoulis, Georges; Bell, John M.; Grimes, Susan M.; Sieh, Weiva; Ji, Hanlee P.
2014-01-01
We have developed a targeted resequencing approach referred to as Oligonucleotide-Selective Sequencing. In this study, we report a series of significant improvements and novel applications of this method whereby the surface of a sequencing flow cell is modified in situ to capture specific genomic regions of interest from a sample and then sequenced. These improvements include a fully automated targeted sequencing platform through the use of a standard Illumina cBot fluidics station. Targeting optimization increased the yield of total on-target sequencing data 2-fold compared to the previous iteration, while simultaneously increasing the percentage of reads that could be mapped to the human genome. The described assays cover up to 1421 genes with a total coverage of 5.5 Megabases (Mb). We demonstrate a 10-fold abundance uniformity of greater than 90% in 1 log distance from the median and a targeting rate of up to 95%. We also sequenced continuous genomic loci up to 1.5 Mb while simultaneously genotyping SNPs and genes. Variants with low minor allele fraction were sensitively detected at levels of 5%. Finally, we determined the exact breakpoint sequence of cancer rearrangements. Overall, this approach has high performance for selective sequencing of genome targets, configuration flexibility and variant calling accuracy. PMID:24782526
Nowrousian, Minou; Stajich, Jason E.; Chu, Meiling; Engh, Ines; Espagne, Eric; Halliday, Karen; Kamerewerd, Jens; Kempken, Frank; Knab, Birgit; Kuo, Hsiao-Che; Osiewacz, Heinz D.; Pöggeler, Stefanie; Read, Nick D.; Seiler, Stephan; Smith, Kristina M.; Zickler, Denise; Kück, Ulrich; Freitag, Michael
2010-01-01
Filamentous fungi are of great importance in ecology, agriculture, medicine, and biotechnology. Thus, it is not surprising that genomes for more than 100 filamentous fungi have been sequenced, most of them by Sanger sequencing. While next-generation sequencing techniques have revolutionized genome resequencing, e.g. for strain comparisons, genetic mapping, or transcriptome and ChIP analyses, de novo assembly of eukaryotic genomes still presents significant hurdles, because of their large size and stretches of repetitive sequences. Filamentous fungi contain few repetitive regions in their 30–90 Mb genomes and thus are suitable candidates to test de novo genome assembly from short sequence reads. Here, we present a high-quality draft sequence of the Sordaria macrospora genome that was obtained by a combination of Illumina/Solexa and Roche/454 sequencing. Paired-end Solexa sequencing of genomic DNA to 85-fold coverage and an additional 10-fold coverage by single-end 454 sequencing resulted in ∼4 Gb of DNA sequence. Reads were assembled to a 40 Mb draft version (N50 of 117 kb) with the Velvet assembler. Comparative analysis with Neurospora genomes increased the N50 to 498 kb. The S. macrospora genome contains even fewer repeat regions than its closest sequenced relative, Neurospora crassa. Comparison with genomes of other fungi showed that S. macrospora, a model organism for morphogenesis and meiosis, harbors duplications of several genes involved in self/nonself-recognition. Furthermore, S. macrospora contains more polyketide biosynthesis genes than N. crassa. Phylogenetic analyses suggest that some of these genes may have been acquired by horizontal gene transfer from a distantly related ascomycete group. Our study shows that, for typical filamentous fungi, de novo assembly of genomes from short sequence reads alone is feasible, that a mixture of Solexa and 454 sequencing substantially improves the assembly, and that the resulting data can be used for comparative studies to address basic questions of fungal biology. PMID:20386741
Nowrousian, Minou; Stajich, Jason E; Chu, Meiling; Engh, Ines; Espagne, Eric; Halliday, Karen; Kamerewerd, Jens; Kempken, Frank; Knab, Birgit; Kuo, Hsiao-Che; Osiewacz, Heinz D; Pöggeler, Stefanie; Read, Nick D; Seiler, Stephan; Smith, Kristina M; Zickler, Denise; Kück, Ulrich; Freitag, Michael
2010-04-08
Filamentous fungi are of great importance in ecology, agriculture, medicine, and biotechnology. Thus, it is not surprising that genomes for more than 100 filamentous fungi have been sequenced, most of them by Sanger sequencing. While next-generation sequencing techniques have revolutionized genome resequencing, e.g. for strain comparisons, genetic mapping, or transcriptome and ChIP analyses, de novo assembly of eukaryotic genomes still presents significant hurdles, because of their large size and stretches of repetitive sequences. Filamentous fungi contain few repetitive regions in their 30-90 Mb genomes and thus are suitable candidates to test de novo genome assembly from short sequence reads. Here, we present a high-quality draft sequence of the Sordaria macrospora genome that was obtained by a combination of Illumina/Solexa and Roche/454 sequencing. Paired-end Solexa sequencing of genomic DNA to 85-fold coverage and an additional 10-fold coverage by single-end 454 sequencing resulted in approximately 4 Gb of DNA sequence. Reads were assembled to a 40 Mb draft version (N50 of 117 kb) with the Velvet assembler. Comparative analysis with Neurospora genomes increased the N50 to 498 kb. The S. macrospora genome contains even fewer repeat regions than its closest sequenced relative, Neurospora crassa. Comparison with genomes of other fungi showed that S. macrospora, a model organism for morphogenesis and meiosis, harbors duplications of several genes involved in self/nonself-recognition. Furthermore, S. macrospora contains more polyketide biosynthesis genes than N. crassa. Phylogenetic analyses suggest that some of these genes may have been acquired by horizontal gene transfer from a distantly related ascomycete group. Our study shows that, for typical filamentous fungi, de novo assembly of genomes from short sequence reads alone is feasible, that a mixture of Solexa and 454 sequencing substantially improves the assembly, and that the resulting data can be used for comparative studies to address basic questions of fungal biology.
Resolving the Complexity of Human Skin Metagenomes Using Single-Molecule Sequencing
Tsai, Yu-Chih; Deming, Clayton; Segre, Julia A.; Kong, Heidi H.; Korlach, Jonas
2016-01-01
ABSTRACT Deep metagenomic shotgun sequencing has emerged as a powerful tool to interrogate composition and function of complex microbial communities. Computational approaches to assemble genome fragments have been demonstrated to be an effective tool for de novo reconstruction of genomes from these communities. However, the resultant “genomes” are typically fragmented and incomplete due to the limited ability of short-read sequence data to assemble complex or low-coverage regions. Here, we use single-molecule, real-time (SMRT) sequencing to reconstruct a high-quality, closed genome of a previously uncharacterized Corynebacterium simulans and its companion bacteriophage from a skin metagenomic sample. Considerable improvement in assembly quality occurs in hybrid approaches incorporating short-read data, with even relatively small amounts of long-read data being sufficient to improve metagenome reconstruction. Using short-read data to evaluate strain variation of this C. simulans in its skin community at single-nucleotide resolution, we observed a dominant C. simulans strain with moderate allelic heterozygosity throughout the population. We demonstrate the utility of SMRT sequencing and hybrid approaches in metagenome quantitation, reconstruction, and annotation. PMID:26861018
An efficient study design to test parent-of-origin effects in family trios.
Yu, Xiaobo; Chen, Gao; Feng, Rui
2017-11-01
Increasing evidence has shown that genes may cause prenatal, neonatal, and pediatric diseases depending on their parental origins. Statistical models that incorporate parent-of-origin effects (POEs) can improve the power of detecting disease-associated genes and help explain the missing heritability of diseases. In many studies, children have been sequenced for genome-wide association testing. But it may become unaffordable to sequence their parents and evaluate POEs. Motivated by the reality, we proposed a budget-friendly study design of sequencing children and only genotyping their parents through single nucleotide polymorphism array. We developed a powerful likelihood-based method, which takes into account both sequence reads and linkage disequilibrium to infer the parental origins of children's alleles and estimate their POEs on the outcome. We evaluated the performance of our proposed method and compared it with an existing method using only genotypes, through extensive simulations. Our method showed higher power than the genotype-based method. When either the mean read depth or the pair-end length was reasonably large, our method achieved ideal power. When single parents' genotypes were unavailable or parental genotypes at the testing locus were not typed, both methods lost power compared with when complete data were available; but the power loss from our method was smaller than the genotype-based method. We also extended our method to accommodate mixed genotype, low-, and high-coverage sequence data from children and their parents. At presence of sequence errors, low-coverage parental sequence data may lead to lower power than parental genotype data. © 2017 WILEY PERIODICALS, INC.
Debladis, Emilie; Llauro, Christel; Carpentier, Marie-Christine; Mirouze, Marie; Panaud, Olivier
2017-07-17
Transposables elements (TEs) contribute to both structural and functional dynamics of most eukaryotic genomes. Because of their propensity to densely populate plant and animal genomes, the precise estimation of the impact of transposition on genomic diversity has been considered as one of the main challenges of today's genomics. The recent development of NGS (next generation sequencing) technologies has open new perspectives in population genomics by providing new methods for high throughput detection of Transposable Elements-associated Structural Variants (TEASV). However, these have relied on Illumina platform that generates short reads (up to 350 nucleotides). This limitation in size of sequence reads can cause high false discovery rate (FDR) and therefore limit the power of detection of TEASVs, especially in the case of large, complex genomes. The newest sequencing technologies, such as Oxford Nanopore Technologies (ONT) can generate kilobases-long reads thus representing a promising tool for TEASV detection in plant and animals. We present the results of a pilot experiment for TEASV detection on the model plant species Arabidopsis thaliana using ONT sequencing and show that it can be used efficiently to detect TE movements. We generated a ~0.8X genome coverage of a met1-derived epigenetic recombinant inbred line (epiRIL) using a MinIon device with R7 chemistry. We were able to detect nine new copies of the LTR-retrotransposon Evadé (EVD). We also evidenced the activity of the DNA transposon CACTA, CAC1. Even at a low sequence coverage (0.8X), ONT sequencing allowed us to reliably detect several TE insertions in Arabidopsis thaliana genome. The long read length allowed a precise and un-ambiguous mapping of the structural variations caused by the activity of TEs. This suggests that the trade-off between read length and genome coverage for TEASV detection may be in favor of the former. Should the technology be further improved both in terms of lower error rate and operation costs, it could be efficiently used in diversity studies at population level.
Yang, Rendong; Nelson, Andrew C; Henzler, Christine; Thyagarajan, Bharat; Silverstein, Kevin A T
2015-12-07
Comprehensive identification of insertions/deletions (indels) across the full size spectrum from second generation sequencing is challenging due to the relatively short read length inherent in the technology. Different indel calling methods exist but are limited in detection to specific sizes with varying accuracy and resolution. We present ScanIndel, an integrated framework for detecting indels with multiple heuristics including gapped alignment, split reads and de novo assembly. Using simulation data, we demonstrate ScanIndel's superior sensitivity and specificity relative to several state-of-the-art indel callers across various coverage levels and indel sizes. ScanIndel yields higher predictive accuracy with lower computational cost compared with existing tools for both targeted resequencing data from tumor specimens and high coverage whole-genome sequencing data from the human NIST standard NA12878. Thus, we anticipate ScanIndel will improve indel analysis in both clinical and research settings. ScanIndel is implemented in Python, and is freely available for academic use at https://github.com/cauyrd/ScanIndel.
O'Brien, Heath E; Gong, Yunchen; Fung, Pauline; Wang, Pauline W; Guttman, David S
2011-01-01
Next-generation genomic technology has both greatly accelerated the pace of genome research as well as increased our reliance on draft genome sequences. While groups such as the Genomics Standards Consortium have made strong efforts to promote genome standards there is a still a general lack of uniformity among published draft genomes, leading to challenges for downstream comparative analyses. This lack of uniformity is a particular problem when using standard draft genomes that frequently have large numbers of low-quality sequencing tracts. Here we present a proposal for an "enhanced-quality draft" genome that identifies at least 95% of the coding sequences, thereby effectively providing a full accounting of the genic component of the genome. Enhanced-quality draft genomes are easily attainable through a combination of small- and large-insert next-generation, paired-end sequencing. We illustrate the generation of an enhanced-quality draft genome by re-sequencing the plant pathogenic bacterium Pseudomonas syringae pv. phaseolicola 1448A (Pph 1448A), which has a published, closed genome sequence of 5.93 Mbp. We use a combination of Illumina paired-end and mate-pair sequencing, and surprisingly find that de novo assemblies with 100x paired-end coverage and mate-pair sequencing with as low as low as 2-5x coverage are substantially better than assemblies based on higher coverage. The rapid and low-cost generation of large numbers of enhanced-quality draft genome sequences will be of particular value for microbial diagnostics and biosecurity, which rely on precise discrimination of potentially dangerous clones from closely related benign strains.
Using Poisson mixed-effects model to quantify transcript-level gene expression in RNA-Seq.
Hu, Ming; Zhu, Yu; Taylor, Jeremy M G; Liu, Jun S; Qin, Zhaohui S
2012-01-01
RNA sequencing (RNA-Seq) is a powerful new technology for mapping and quantifying transcriptomes using ultra high-throughput next-generation sequencing technologies. Using deep sequencing, gene expression levels of all transcripts including novel ones can be quantified digitally. Although extremely promising, the massive amounts of data generated by RNA-Seq, substantial biases and uncertainty in short read alignment pose challenges for data analysis. In particular, large base-specific variation and between-base dependence make simple approaches, such as those that use averaging to normalize RNA-Seq data and quantify gene expressions, ineffective. In this study, we propose a Poisson mixed-effects (POME) model to characterize base-level read coverage within each transcript. The underlying expression level is included as a key parameter in this model. Since the proposed model is capable of incorporating base-specific variation as well as between-base dependence that affect read coverage profile throughout the transcript, it can lead to improved quantification of the true underlying expression level. POME can be freely downloaded at http://www.stat.purdue.edu/~yuzhu/pome.html. yuzhu@purdue.edu; zhaohui.qin@emory.edu Supplementary data are available at Bioinformatics online.
Shannon C.K. Straub; Mark Fishbein; Tatyana Livshult; Zachary Foster; Matthew Parks; Kevin Weitemier; Richard C. Cronn; Aaron Liston
2011-01-01
Milkweeds (Asclepias L.) have been extensively investigated in diverse areas of evolutionary biology and ecology; however, there are few genetic resources available to facilitate and compliment these studies. This study explored how low coverage genome sequencing of the common milkweed (Asclepias syriaca L.) could be useful in...
Sepúlveda, Nuno; Campino, Susana G; Assefa, Samuel A; Sutherland, Colin J; Pain, Arnab; Clark, Taane G
2013-02-26
The advent of next generation sequencing technology has accelerated efforts to map and catalogue copy number variation (CNV) in genomes of important micro-organisms for public health. A typical analysis of the sequence data involves mapping reads onto a reference genome, calculating the respective coverage, and detecting regions with too-low or too-high coverage (deletions and amplifications, respectively). Current CNV detection methods rely on statistical assumptions (e.g., a Poisson model) that may not hold in general, or require fine-tuning the underlying algorithms to detect known hits. We propose a new CNV detection methodology based on two Poisson hierarchical models, the Poisson-Gamma and Poisson-Lognormal, with the advantage of being sufficiently flexible to describe different data patterns, whilst robust against deviations from the often assumed Poisson model. Using sequence coverage data of 7 Plasmodium falciparum malaria genomes (3D7 reference strain, HB3, DD2, 7G8, GB4, OX005, and OX006), we showed that empirical coverage distributions are intrinsically asymmetric and overdispersed in relation to the Poisson model. We also demonstrated a low baseline false positive rate for the proposed methodology using 3D7 resequencing data and simulation. When applied to the non-reference isolate data, our approach detected known CNV hits, including an amplification of the PfMDR1 locus in DD2 and a large deletion in the CLAG3.2 gene in GB4, and putative novel CNV regions. When compared to the recently available FREEC and cn.MOPS approaches, our findings were more concordant with putative hits from the highest quality array data for the 7G8 and GB4 isolates. In summary, the proposed methodology brings an increase in flexibility, robustness, accuracy and statistical rigour to CNV detection using sequence coverage data.
Using nearly full-genome HIV sequence data improves phylogeny reconstruction in a simulated epidemic
Yebra, Gonzalo; Hodcroft, Emma B.; Ragonnet-Cronin, Manon L.; Pillay, Deenan; Brown, Andrew J. Leigh; Fraser, Christophe; Kellam, Paul; de Oliveira, Tulio; Dennis, Ann; Hoppe, Anne; Kityo, Cissy; Frampton, Dan; Ssemwanga, Deogratius; Tanser, Frank; Keshani, Jagoda; Lingappa, Jairam; Herbeck, Joshua; Wawer, Maria; Essex, Max; Cohen, Myron S.; Paton, Nicholas; Ratmann, Oliver; Kaleebu, Pontiano; Hayes, Richard; Fidler, Sarah; Quinn, Thomas; Novitsky, Vladimir; Haywards, Andrew; Nastouli, Eleni; Morris, Steven; Clark, Duncan; Kozlakidis, Zisis
2016-01-01
HIV molecular epidemiology studies analyse viral pol gene sequences due to their availability, but whole genome sequencing allows to use other genes. We aimed to determine what gene(s) provide(s) the best approximation to the real phylogeny by analysing a simulated epidemic (created as part of the PANGEA_HIV project) with a known transmission tree. We sub-sampled a simulated dataset of 4662 sequences into different combinations of genes (gag-pol-env, gag-pol, gag, pol, env and partial pol) and sampling depths (100%, 60%, 20% and 5%), generating 100 replicates for each case. We built maximum-likelihood trees for each combination using RAxML (GTR + Γ), and compared their topologies to the corresponding true tree’s using CompareTree. The accuracy of the trees was significantly proportional to the length of the sequences used, with the gag-pol-env datasets showing the best performance and gag and partial pol sequences showing the worst. The lowest sampling depths (20% and 5%) greatly reduced the accuracy of tree reconstruction and showed high variability among replicates, especially when using the shortest gene datasets. In conclusion, using longer sequences derived from nearly whole genomes will improve the reliability of phylogenetic reconstruction. With low sample coverage, results can be highly variable, particularly when based on short sequences. PMID:28008945
Yebra, Gonzalo; Hodcroft, Emma B; Ragonnet-Cronin, Manon L; Pillay, Deenan; Brown, Andrew J Leigh
2016-12-23
HIV molecular epidemiology studies analyse viral pol gene sequences due to their availability, but whole genome sequencing allows to use other genes. We aimed to determine what gene(s) provide(s) the best approximation to the real phylogeny by analysing a simulated epidemic (created as part of the PANGEA_HIV project) with a known transmission tree. We sub-sampled a simulated dataset of 4662 sequences into different combinations of genes (gag-pol-env, gag-pol, gag, pol, env and partial pol) and sampling depths (100%, 60%, 20% and 5%), generating 100 replicates for each case. We built maximum-likelihood trees for each combination using RAxML (GTR + Γ), and compared their topologies to the corresponding true tree's using CompareTree. The accuracy of the trees was significantly proportional to the length of the sequences used, with the gag-pol-env datasets showing the best performance and gag and partial pol sequences showing the worst. The lowest sampling depths (20% and 5%) greatly reduced the accuracy of tree reconstruction and showed high variability among replicates, especially when using the shortest gene datasets. In conclusion, using longer sequences derived from nearly whole genomes will improve the reliability of phylogenetic reconstruction. With low sample coverage, results can be highly variable, particularly when based on short sequences.
Mutua, Martin Kavao; Kimani-Murage, Elizabeth; Ngomi, Nicholas; Ravn, Henrik; Mwaniki, Peter; Echoka, Elizabeth
2016-01-01
More efforts have been put in place to increase full immunization coverage rates in the last decade. Little is known about the levels and consequences of delaying or vaccinating children in different schedules. Vaccine effectiveness depends on the timing of its administration, and it is not optimal if given early, delayed or not given as recommended. Evidence of non-specific effects of vaccines is well documented and could be linked to timing and sequencing of immunization. This paper documents the levels of coverage, timing and sequencing of routine childhood vaccines. The study was conducted between 2007 and 2014 in two informal urban settlements in Nairobi. A total of 3856 children, aged 12-23 months and having a vaccination card seen were included in analysis. Vaccination dates recorded from the cards seen were used to define full immunization coverage, timeliness and sequencing. Proportions, medians and Kaplan-Meier curves were used to assess and describe the levels of full immunization coverage, vaccination delays and sequencing. The findings indicate that 67 % of the children were fully immunized by 12 months of age. Missing measles and third doses of polio and pentavalent vaccine were the main reason for not being fully immunized. Delays were highest for third doses of polio and pentavalent and measles. About 22 % of fully immunized children had vaccines in an out-of-sequence manner with 18 % not receiving pentavalent together with polio vaccine as recommended. Results show higher levels of missed opportunities and low coverage of routine childhood vaccinations given at later ages. New strategies are needed to enable health care providers and parents/guardians to work together to increase the levels of completion of all required vaccinations. In particular, more focus is needed on vaccines given in multiple doses (polio, pentavalent and pneumococcal conjugate vaccines).
Yu, Hui; Zhang, Victor Wei; Stray-Pedersen, Asbjørg; Hanson, Imelda Celine; Forbes, Lisa R; de la Morena, M Teresa; Chinn, Ivan K; Gorman, Elizabeth; Mendelsohn, Nancy J; Pozos, Tamara; Wiszniewski, Wojciech; Nicholas, Sarah K; Yates, Anne B; Moore, Lindsey E; Berge, Knut Erik; Sorte, Hanne; Bayer, Diana K; ALZahrani, Daifulah; Geha, Raif S; Feng, Yanming; Wang, Guoli; Orange, Jordan S; Lupski, James R; Wang, Jing; Wong, Lee-Jun
2016-10-01
Primary immunodeficiency diseases (PIDDs) are inherited disorders of the immune system. The most severe form, severe combined immunodeficiency (SCID), presents with profound deficiencies of T cells, B cells, or both at birth. If not treated promptly, affected patients usually do not live beyond infancy because of infections. Genetic heterogeneity of SCID frequently delays the diagnosis; a specific diagnosis is crucial for life-saving treatment and optimal management. We developed a next-generation sequencing (NGS)-based multigene-targeted panel for SCID and other severe PIDDs requiring rapid therapeutic actions in a clinical laboratory setting. The target gene capture/NGS assay provides an average read depth of approximately 1000×. The deep coverage facilitates simultaneous detection of single nucleotide variants and exonic copy number variants in one comprehensive assessment. Exons with insufficient coverage (<20× read depth) or high sequence homology (pseudogenes) are complemented by amplicon-based sequencing with specific primers to ensure 100% coverage of all targeted regions. Analysis of 20 patient samples with low T-cell receptor excision circle numbers on newborn screening or a positive family history or clinical suspicion of SCID or other severe PIDD identified deleterious mutations in 14 of them. Identified pathogenic variants included both single nucleotide variants and exonic copy number variants, such as hemizygous nonsense, frameshift, and missense changes in IL2RG; compound heterozygous changes in ATM, RAG1, and CIITA; homozygous changes in DCLRE1C and IL7R; and a heterozygous nonsense mutation in CHD7. High-throughput deep sequencing analysis with complete clinical validation greatly increases the diagnostic yield of severe primary immunodeficiency. Establishing a molecular diagnosis enables early immune reconstitution through prompt therapeutic intervention and guides management for improved long-term quality of life. Copyright © 2016 American Academy of Allergy, Asthma & Immunology. Published by Elsevier Inc. All rights reserved.
SNV-PPILP: refined SNV calling for tumor data using perfect phylogenies and ILP.
van Rens, Karen E; Mäkinen, Veli; Tomescu, Alexandru I
2015-04-01
Recent studies sequenced tumor samples from the same progenitor at different development stages and showed that by taking into account the phylogeny of this development, single-nucleotide variant (SNV) calling can be improved. Accurate SNV calls can better reveal early-stage tumors, identify mechanisms of cancer progression or help in drug targeting. We present SNV-PPILP, a fast and easy to use tool for refining GATK's Unified Genotyper SNV calls, for multiple samples assumed to form a phylogeny. We tested SNV-PPILP on simulated data, with a varying number of samples, SNVs, read coverage and violations of the perfect phylogeny assumption. We always match or improve the accuracy of GATK, with a significant improvement on low read coverage. SNV-PPILP, available at cs.helsinki.fi/gsa/snv-ppilp/, is written in Python and requires the free ILP solver lp_solve. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Doyle, Stephen R; Griffith, Ian S; Murphy, Nick P; Strugnell, Jan M
2015-01-01
The complete mitochondrial genome of the Eastern Rock lobster, Sagmariasus verreauxi, is reported for the first time. Using low-coverage, long read MiSeq next generation sequencing, we constructed and determined the mtDNA genome organization of the 15,470 bp sequence from two isolates from Eastern Tasmania, Australia and Northern New Zealand, and identified 46 polymorphic nucleotides between the two sequences. This genome sequence and its genetic polymorphisms will likely be useful in understanding the distribution and population connectivity of the Eastern Rock Lobster, and in the fisheries management of this commercially important species.
Comparing de novo genome assembly: the long and short of it.
Narzisi, Giuseppe; Mishra, Bud
2011-04-29
Recent advances in DNA sequencing technology and their focal role in Genome Wide Association Studies (GWAS) have rekindled a growing interest in the whole-genome sequence assembly (WGSA) problem, thereby, inundating the field with a plethora of new formalizations, algorithms, heuristics and implementations. And yet, scant attention has been paid to comparative assessments of these assemblers' quality and accuracy. No commonly accepted and standardized method for comparison exists yet. Even worse, widely used metrics to compare the assembled sequences emphasize only size, poorly capturing the contig quality and accuracy. This paper addresses these concerns: it highlights common anomalies in assembly accuracy through a rigorous study of several assemblers, compared under both standard metrics (N50, coverage, contig sizes, etc.) as well as a more comprehensive metric (Feature-Response Curves, FRC) that is introduced here; FRC transparently captures the trade-offs between contigs' quality against their sizes. For this purpose, most of the publicly available major sequence assemblers--both for low-coverage long (Sanger) and high-coverage short (Illumina) reads technologies--are compared. These assemblers are applied to microbial (Escherichia coli, Brucella, Wolbachia, Staphylococcus, Helicobacter) and partial human genome sequences (Chr. Y), using sequence reads of various read-lengths, coverages, accuracies, and with and without mate-pairs. It is hoped that, based on these evaluations, computational biologists will identify innovative sequence assembly paradigms, bioinformaticists will determine promising approaches for developing "next-generation" assemblers, and biotechnologists will formulate more meaningful design desiderata for sequencing technology platforms. A new software tool for computing the FRC metric has been developed and is available through the AMOS open-source consortium.
Perez de Souza, Leonardo; Naake, Thomas; Tohge, Takayuki; Fernie, Alisdair R
2017-07-01
The grand challenge currently facing metabolomics is the expansion of the coverage of the metabolome from a minor percentage of the metabolic complement of the cell toward the level of coverage afforded by other post-genomic technologies such as transcriptomics and proteomics. In plants, this problem is exacerbated by the sheer diversity of chemicals that constitute the metabolome, with the number of metabolites in the plant kingdom generally considered to be in excess of 200 000. In this review, we focus on web resources that can be exploited in order to improve analyte and ultimately metabolite identification and quantification. There is a wide range of available software that not only aids in this but also in the related area of peak alignment; however, for the uninitiated, choosing which program to use is a daunting task. For this reason, we provide an overview of the pros and cons of the software as well as comments regarding the level of programing skills required to effectively exploit their basic functions. In addition, the torrent of available genome and transcriptome sequences that followed the advent of next-generation sequencing has opened up further valuable resources for metabolite identification. All things considered, we posit that only via a continued communal sharing of information such as that deposited in the databases described within the article are we likely to be able to make significant headway toward improving our coverage of the plant metabolome. © The Authors 2017. Published by Oxford University Press.
Madsen, James A.; Xu, Hua; Robinson, Michelle R.; Horton, Andrew P.; Shaw, Jared B.; Giles, David K.; Kaoud, Tamer S.; Dalby, Kevin N.; Trent, M. Stephen; Brodbelt, Jennifer S.
2013-01-01
The use of ultraviolet photodissociation (UVPD) for the activation and dissociation of peptide anions is evaluated for broader coverage of the proteome. To facilitate interpretation and assignment of the resulting UVPD mass spectra of peptide anions, the MassMatrix database search algorithm was modified to allow automated analysis of negative polarity MS/MS spectra. The new UVPD algorithms were developed based on the MassMatrix database search engine by adding specific fragmentation pathways for UVPD. The new UVPD fragmentation pathways in MassMatrix were rigorously and statistically optimized using two large data sets with high mass accuracy and high mass resolution for both MS1 and MS2 data acquired on an Orbitrap mass spectrometer for complex Halobacterium and HeLa proteome samples. Negative mode UVPD led to the identification of 3663 and 2350 peptides for the Halo and HeLa tryptic digests, respectively, corresponding to 655 and 645 peptides that were unique when compared with electron transfer dissociation (ETD), higher energy collision-induced dissociation, and collision-induced dissociation results for the same digests analyzed in the positive mode. In sum, 805 and 619 proteins were identified via UVPD for the Halobacterium and HeLa samples, respectively, with 49 and 50 unique proteins identified in contrast to the more conventional MS/MS methods. The algorithm also features automated charge determination for low mass accuracy data, precursor filtering (including intact charge-reduced peaks), and the ability to combine both positive and negative MS/MS spectra into a single search, and it is freely open to the public. The accuracy and specificity of the MassMatrix UVPD search algorithm was also assessed for low resolution, low mass accuracy data on a linear ion trap. Analysis of a known mixture of three mitogen-activated kinases yielded similar sequence coverage percentages for UVPD of peptide anions versus conventional collision-induced dissociation of peptide cations, and when these methods were combined into a single search, an increase of up to 13% sequence coverage was observed for the kinases. The ability to sequence peptide anions and cations in alternating scans in the same chromatographic run was also demonstrated. Because ETD has a significant bias toward identifying highly basic peptides, negative UVPD was used to improve the identification of the more acidic peptides in conjunction with positive ETD for the more basic species. In this case, tryptic peptides from the cytosolic section of HeLa cells were analyzed by polarity switching nanoLC-MS/MS utilizing ETD for cation sequencing and UVPD for anion sequencing. Relative to searching using ETD alone, positive/negative polarity switching significantly improved sequence coverages across identified proteins, resulting in a 33% increase in unique peptide identifications and more than twice the number of peptide spectral matches. PMID:23695934
Mitt, Mario; Kals, Mart; Pärn, Kalle; Gabriel, Stacey B; Lander, Eric S; Palotie, Aarno; Ripatti, Samuli; Morris, Andrew P; Metspalu, Andres; Esko, Tõnu; Mägi, Reedik; Palta, Priit
2017-06-01
Genetic imputation is a cost-efficient way to improve the power and resolution of genome-wide association (GWA) studies. Current publicly accessible imputation reference panels accurately predict genotypes for common variants with minor allele frequency (MAF)≥5% and low-frequency variants (0.5≤MAF<5%) across diverse populations, but the imputation of rare variation (MAF<0.5%) is still rather limited. In the current study, we evaluate imputation accuracy achieved with reference panels from diverse populations with a population-specific high-coverage (30 ×) whole-genome sequencing (WGS) based reference panel, comprising of 2244 Estonian individuals (0.25% of adult Estonians). Although the Estonian-specific panel contains fewer haplotypes and variants, the imputation confidence and accuracy of imputed low-frequency and rare variants was significantly higher. The results indicate the utility of population-specific reference panels for human genetic studies.
Mitt, Mario; Kals, Mart; Pärn, Kalle; Gabriel, Stacey B; Lander, Eric S; Palotie, Aarno; Ripatti, Samuli; Morris, Andrew P; Metspalu, Andres; Esko, Tõnu; Mägi, Reedik; Palta, Priit
2017-01-01
Genetic imputation is a cost-efficient way to improve the power and resolution of genome-wide association (GWA) studies. Current publicly accessible imputation reference panels accurately predict genotypes for common variants with minor allele frequency (MAF)≥5% and low-frequency variants (0.5≤MAF<5%) across diverse populations, but the imputation of rare variation (MAF<0.5%) is still rather limited. In the current study, we evaluate imputation accuracy achieved with reference panels from diverse populations with a population-specific high-coverage (30 ×) whole-genome sequencing (WGS) based reference panel, comprising of 2244 Estonian individuals (0.25% of adult Estonians). Although the Estonian-specific panel contains fewer haplotypes and variants, the imputation confidence and accuracy of imputed low-frequency and rare variants was significantly higher. The results indicate the utility of population-specific reference panels for human genetic studies. PMID:28401899
USDA-ARS?s Scientific Manuscript database
Copy number variations (CNVs) are large insertions, deletions or duplications in the genome that vary between members of a species and are known to affect a wide variety of phenotypic traits. In this study, we identified CNVs in a population of bulls using low coverage next-generation sequence data....
DOE Office of Scientific and Technical Information (OSTI.GOV)
Golbus, Jessica R.; Puckelwartz, Megan J.; Dellefave-Castillo, Lisa
Background—Cardiomyopathy is highly heritable but genetically diverse. At present, genetic testing for cardiomyopathy uses targeted sequencing to simultaneously assess the coding regions of more than 50 genes. New genes are routinely added to panels to improve the diagnostic yield. With the anticipated $1000 genome, it is expected that genetic testing will shift towards comprehensive genome sequencing accompanied by targeted gene analysis. Therefore, we assessed the reliability of whole genome sequencing and targeted analysis to identify cardiomyopathy variants in 11 subjects with cardiomyopathy. Methods and Results—Whole genome sequencing with an average of 37× coverage was combined with targeted analysis focused onmore » 204 genes linked to cardiomyopathy. Genetic variants were scored using multiple prediction algorithms combined with frequency data from public databases. This pipeline yielded 1-14 potentially pathogenic variants per individual. Variants were further analyzed using clinical criteria and/or segregation analysis. Three of three previously identified primary mutations were detected by this analysis. In six subjects for whom the primary mutation was previously unknown, we identified mutations that segregated with disease, had clinical correlates, and/or had additional pathological correlation to provide evidence for causality. For two subjects with previously known primary mutations, we identified additional variants that may act as modifiers of disease severity. In total, we identified the likely pathological mutation in 9 of 11 (82%) subjects. We conclude that these pilot data demonstrate that ~30-40× coverage whole genome sequencing combined with targeted analysis is feasible and sensitive to identify rare variants in cardiomyopathy-associated genes.« less
Protein family clustering for structural genomics.
Yan, Yongpan; Moult, John
2005-10-28
A major goal of structural genomics is the provision of a structural template for a large fraction of protein domains. The magnitude of this task depends on the number and nature of protein sequence families. With a large number of bacterial genomes now fully sequenced, it is possible to obtain improved estimates of the number and diversity of families in that kingdom. We have used an automated clustering procedure to group all sequences in a set of genomes into protein families. Bench-marking shows the clustering method is sensitive at detecting remote family members, and has a low level of false positives. This comprehensive protein family set has been used to address the following questions. (1) What is the structure coverage for currently known families? (2) How will the number of known apparent families grow as more genomes are sequenced? (3) What is a practical strategy for maximizing structure coverage in future? Our study indicates that approximately 20% of known families with three or more members currently have a representative structure. The study indicates also that the number of apparent protein families will be considerably larger than previously thought: We estimate that, by the criteria of this work, there will be about 250,000 protein families when 1000 microbial genomes have been sequenced. However, the vast majority of these families will be small, and it will be possible to obtain structural templates for 70-80% of protein domains with an achievable number of representative structures, by systematically sampling the larger families.
Chan, K. C. Allen; Jiang, Peiyong; Sun, Kun; Cheng, Yvonne K. Y.; Tong, Yu K.; Cheng, Suk Hang; Wong, Ada I. C.; Hudecova, Irena; Leung, Tak Y.; Chiu, Rossa W. K.; Lo, Yuk Ming Dennis
2016-01-01
Plasma DNA obtained from a pregnant woman was sequenced to a depth of 270× haploid genome coverage. Comparing the maternal plasma DNA sequencing data with the parental genomic DNA data and using a series of bioinformatics filters, fetal de novo mutations were detected at a sensitivity of 85% and a positive predictive value of 74%. These results represent a 169-fold improvement in the positive predictive value over previous attempts. Improvements in the interpretation of the sequence information of every base position in the genome allowed us to interrogate the maternal inheritance of the fetus for 618,271 of 656,676 (94.2%) heterozygous SNPs within the maternal genome. The fetal genotype at each of these sites was deduced individually, unlike previously, where the inheritance was determined for a collection of sites within a haplotype. These results represent a 90-fold enhancement in the resolution in determining the fetus’s maternal inheritance. Selected genomic locations were more likely to be found at the ends of plasma DNA molecules. We found that a subset of such preferred ends exhibited selectivity for fetal- or maternal-derived DNA in maternal plasma. The ratio of the number of maternal plasma DNA molecules with fetal preferred ends to those with maternal preferred ends showed a correlation with the fetal DNA fraction. Finally, this second generation approach for noninvasive fetal whole-genome analysis was validated in a pregnancy diagnosed with cardiofaciocutaneous syndrome with maternal plasma DNA sequenced to 195× coverage. The causative de novo BRAF mutation was successfully detected through the maternal plasma DNA analysis. PMID:27799561
Initial sequence and comparative analysis of the cat genome
Pontius, Joan U.; Mullikin, James C.; Smith, Douglas R.; Lindblad-Toh, Kerstin; Gnerre, Sante; Clamp, Michele; Chang, Jean; Stephens, Robert; Neelam, Beena; Volfovsky, Natalia; Schäffer, Alejandro A.; Agarwala, Richa; Narfström, Kristina; Murphy, William J.; Giger, Urs; Roca, Alfred L.; Antunes, Agostinho; Menotti-Raymond, Marilyn; Yuhki, Naoya; Pecon-Slattery, Jill; Johnson, Warren E.; Bourque, Guillaume; Tesler, Glenn; O’Brien, Stephen J.
2007-01-01
The genome sequence (1.9-fold coverage) of an inbred Abyssinian domestic cat was assembled, mapped, and annotated with a comparative approach that involved cross-reference to annotated genome assemblies of six mammals (human, chimpanzee, mouse, rat, dog, and cow). The results resolved chromosomal positions for 663,480 contigs, 20,285 putative feline gene orthologs, and 133,499 conserved sequence blocks (CSBs). Additional annotated features include repetitive elements, endogenous retroviral sequences, nuclear mitochondrial (numt) sequences, micro-RNAs, and evolutionary breakpoints that suggest historic balancing of translocation and inversion incidences in distinct mammalian lineages. Large numbers of single nucleotide polymorphisms (SNPs), deletion insertion polymorphisms (DIPs), and short tandem repeats (STRs), suitable for linkage or association studies were characterized in the context of long stretches of chromosome homozygosity. In spite of the light coverage capturing ∼65% of euchromatin sequence from the cat genome, these comparative insights shed new light on the tempo and mode of gene/genome evolution in mammals, promise several research applications for the cat, and also illustrate that a comparative approach using more deeply covered mammals provides an informative, preliminary annotation of a light (1.9-fold) coverage mammal genome sequence. PMID:17975172
Targeted Analysis of Whole Genome Sequence Data to Diagnose Genetic Cardiomyopathy
Golbus, Jessica R.; Puckelwartz, Megan J.; Dellefave-Castillo, Lisa; ...
2014-09-01
Background—Cardiomyopathy is highly heritable but genetically diverse. At present, genetic testing for cardiomyopathy uses targeted sequencing to simultaneously assess the coding regions of more than 50 genes. New genes are routinely added to panels to improve the diagnostic yield. With the anticipated $1000 genome, it is expected that genetic testing will shift towards comprehensive genome sequencing accompanied by targeted gene analysis. Therefore, we assessed the reliability of whole genome sequencing and targeted analysis to identify cardiomyopathy variants in 11 subjects with cardiomyopathy. Methods and Results—Whole genome sequencing with an average of 37× coverage was combined with targeted analysis focused onmore » 204 genes linked to cardiomyopathy. Genetic variants were scored using multiple prediction algorithms combined with frequency data from public databases. This pipeline yielded 1-14 potentially pathogenic variants per individual. Variants were further analyzed using clinical criteria and/or segregation analysis. Three of three previously identified primary mutations were detected by this analysis. In six subjects for whom the primary mutation was previously unknown, we identified mutations that segregated with disease, had clinical correlates, and/or had additional pathological correlation to provide evidence for causality. For two subjects with previously known primary mutations, we identified additional variants that may act as modifiers of disease severity. In total, we identified the likely pathological mutation in 9 of 11 (82%) subjects. We conclude that these pilot data demonstrate that ~30-40× coverage whole genome sequencing combined with targeted analysis is feasible and sensitive to identify rare variants in cardiomyopathy-associated genes.« less
Wilson, Kitchener D; Shen, Peidong; Fung, Eula; Karakikes, Ioannis; Zhang, Angela; InanlooRahatloo, Kolsoum; Odegaard, Justin; Sallam, Karim; Davis, Ronald W; Lui, George K; Ashley, Euan A; Scharfe, Curt; Wu, Joseph C
2015-09-11
Thousands of mutations across >50 genes have been implicated in inherited cardiomyopathies. However, options for sequencing this rapidly evolving gene set are limited because many sequencing services and off-the-shelf kits suffer from slow turnaround, inefficient capture of genomic DNA, and high cost. Furthermore, customization of these assays to cover emerging targets that suit individual needs is often expensive and time consuming. We sought to develop a custom high throughput, clinical-grade next-generation sequencing assay for detecting cardiac disease gene mutations with improved accuracy, flexibility, turnaround, and cost. We used double-stranded probes (complementary long padlock probes), an inexpensive and customizable capture technology, to efficiently capture and amplify the entire coding region and flanking intronic and regulatory sequences of 88 genes and 40 microRNAs associated with inherited cardiomyopathies, congenital heart disease, and cardiac development. Multiplexing 11 samples per sequencing run resulted in a mean base pair coverage of 420, of which 97% had >20× coverage and >99% were concordant with known heterozygous single nucleotide polymorphisms. The assay correctly detected germline variants in 24 individuals and revealed several polymorphic regions in miR-499. Total run time was 3 days at an approximate cost of $100 per sample. Accurate, high-throughput detection of mutations across numerous cardiac genes is achievable with complementary long padlock probe technology. Moreover, this format allows facile insertion of additional probes as more cardiomyopathy and congenital heart disease genes are discovered, giving researchers a powerful new tool for DNA mutation detection and discovery. © 2015 American Heart Association, Inc.
Dfam: a database of repetitive DNA based on profile hidden Markov models.
Wheeler, Travis J; Clements, Jody; Eddy, Sean R; Hubley, Robert; Jones, Thomas A; Jurka, Jerzy; Smit, Arian F A; Finn, Robert D
2013-01-01
We present a database of repetitive DNA elements, called Dfam (http://dfam.janelia.org). Many genomes contain a large fraction of repetitive DNA, much of which is made up of remnants of transposable elements (TEs). Accurate annotation of TEs enables research into their biology and can shed light on the evolutionary processes that shape genomes. Identification and masking of TEs can also greatly simplify many downstream genome annotation and sequence analysis tasks. The commonly used TE annotation tools RepeatMasker and Censor depend on sequence homology search tools such as cross_match and BLAST variants, as well as Repbase, a collection of known TE families each represented by a single consensus sequence. Dfam contains entries corresponding to all Repbase TE entries for which instances have been found in the human genome. Each Dfam entry is represented by a profile hidden Markov model, built from alignments generated using RepeatMasker and Repbase. When used in conjunction with the hidden Markov model search tool nhmmer, Dfam produces a 2.9% increase in coverage over consensus sequence search methods on a large human benchmark, while maintaining low false discovery rates, and coverage of the full human genome is 54.5%. The website provides a collection of tools and data views to support improved TE curation and annotation efforts. Dfam is also available for download in flat file format or in the form of MySQL table dumps.
RefCNV: Identification of Gene-Based Copy Number Variants Using Whole Exome Sequencing.
Chang, Lun-Ching; Das, Biswajit; Lih, Chih-Jian; Si, Han; Camalier, Corinne E; McGregor, Paul M; Polley, Eric
2016-01-01
With rapid advances in DNA sequencing technologies, whole exome sequencing (WES) has become a popular approach for detecting somatic mutations in oncology studies. The initial intent of WES was to characterize single nucleotide variants, but it was observed that the number of sequencing reads that mapped to a genomic region correlated with the DNA copy number variants (CNVs). We propose a method RefCNV that uses a reference set to estimate the distribution of the coverage for each exon. The construction of the reference set includes an evaluation of the sources of variability in the coverage distribution. We observed that the processing steps had an impact on the coverage distribution. For each exon, we compared the observed coverage with the expected normal coverage. Thresholds for determining CNVs were selected to control the false-positive error rate. RefCNV prediction correlated significantly (r = 0.96-0.86) with CNV measured by digital polymerase chain reaction for MET (7q31), EGFR (7p12), or ERBB2 (17q12) in 13 tumor cell lines. The genome-wide CNV analysis showed a good overall correlation (Spearman's coefficient = 0.82) between RefCNV estimation and publicly available CNV data in Cancer Cell Line Encyclopedia. RefCNV also showed better performance than three other CNV estimation methods in genome-wide CNV analysis.
Two low coverage bird genomes and a comparison of reference-guided versus de novo genome assemblies.
Card, Daren C; Schield, Drew R; Reyes-Velasco, Jacobo; Fujita, Matthew K; Andrew, Audra L; Oyler-McCance, Sara J; Fike, Jennifer A; Tomback, Diana F; Ruggiero, Robert P; Castoe, Todd A
2014-01-01
As a greater number and diversity of high-quality vertebrate reference genomes become available, it is increasingly feasible to use these references to guide new draft assemblies for related species. Reference-guided assembly approaches may substantially increase the contiguity and completeness of a new genome using only low levels of genome coverage that might otherwise be insufficient for de novo genome assembly. We used low-coverage (∼3.5-5.5x) Illumina paired-end sequencing to assemble draft genomes of two bird species (the Gunnison Sage-Grouse, Centrocercus minimus, and the Clark's Nutcracker, Nucifraga columbiana). We used these data to estimate de novo genome assemblies and reference-guided assemblies, and compared the information content and completeness of these assemblies by comparing CEGMA gene set representation, repeat element content, simple sequence repeat content, and GC isochore structure among assemblies. Our results demonstrate that even lower-coverage genome sequencing projects are capable of producing informative and useful genomic resources, particularly through the use of reference-guided assemblies.
Two low coverage bird genomes and a comparison of reference-guided versus de novo genome assemblies
Card, Daren C.; Schield, Drew R.; Reyes-Velasco, Jacobo; Fujita, Matthre K.; Andrew, Audra L.; Oyler-McCance, Sara J.; Fike, Jennifer A.; Tomback, Diana F.; Ruggiero, Robert P.; Castoe, Todd A.
2014-01-01
As a greater number and diversity of high-quality vertebrate reference genomes become available, it is increasingly feasible to use these references to guide new draft assemblies for related species. Reference-guided assembly approaches may substantially increase the contiguity and completeness of a new genome using only low levels of genome coverage that might otherwise be insufficient for de novo genome assembly. We used low-coverage (~3.5–5.5x) Illumina paired-end sequencing to assemble draft genomes of two bird species (the Gunnison Sage-Grouse, Centrocercus minimus, and the Clark's Nutcracker, Nucifraga columbiana). We used these data to estimate de novo genome assemblies and reference-guided assemblies, and compared the information content and completeness of these assemblies by comparing CEGMA gene set representation, repeat element content, simple sequence repeat content, and GC isochore structure among assemblies. Our results demonstrate that even lower-coverage genome sequencing projects are capable of producing informative and useful genomic resources, particularly through the use of reference-guided assemblies.
Calibrating genomic and allelic coverage bias in single-cell sequencing.
Zhang, Cheng-Zhong; Adalsteinsson, Viktor A; Francis, Joshua; Cornils, Hauke; Jung, Joonil; Maire, Cecile; Ligon, Keith L; Meyerson, Matthew; Love, J Christopher
2015-04-16
Artifacts introduced in whole-genome amplification (WGA) make it difficult to derive accurate genomic information from single-cell genomes and require different analytical strategies from bulk genome analysis. Here, we describe statistical methods to quantitatively assess the amplification bias resulting from whole-genome amplification of single-cell genomic DNA. Analysis of single-cell DNA libraries generated by different technologies revealed universal features of the genome coverage bias predominantly generated at the amplicon level (1-10 kb). The magnitude of coverage bias can be accurately calibrated from low-pass sequencing (∼0.1 × ) to predict the depth-of-coverage yield of single-cell DNA libraries sequenced at arbitrary depths. We further provide a benchmark comparison of single-cell libraries generated by multi-strand displacement amplification (MDA) and multiple annealing and looping-based amplification cycles (MALBAC). Finally, we develop statistical models to calibrate allelic bias in single-cell whole-genome amplification and demonstrate a census-based strategy for efficient and accurate variant detection from low-input biopsy samples.
Calibrating genomic and allelic coverage bias in single-cell sequencing
Francis, Joshua; Cornils, Hauke; Jung, Joonil; Maire, Cecile; Ligon, Keith L.; Meyerson, Matthew; Love, J. Christopher
2016-01-01
Artifacts introduced in whole-genome amplification (WGA) make it difficult to derive accurate genomic information from single-cell genomes and require different analytical strategies from bulk genome analysis. Here, we describe statistical methods to quantitatively assess the amplification bias resulting from whole-genome amplification of single-cell genomic DNA. Analysis of single-cell DNA libraries generated by different technologies revealed universal features of the genome coverage bias predominantly generated at the amplicon level (1–10 kb). The magnitude of coverage bias can be accurately calibrated from low-pass sequencing (~0.1 ×) to predict the depth-of-coverage yield of single-cell DNA libraries sequenced at arbitrary depths. We further provide a benchmark comparison of single-cell libraries generated by multi-strand displacement amplification (MDA) and multiple annealing and looping-based amplification cycles (MALBAC). Finally, we develop statistical models to calibrate allelic bias in single-cell whole-genome amplification and demonstrate a census-based strategy for efficient and accurate variant detection from low-input biopsy samples. PMID:25879913
2011-01-01
Background One of the key goals of oak genomics research is to identify genes of adaptive significance. This information may help to improve the conservation of adaptive genetic variation and the management of forests to increase their health and productivity. Deep-coverage large-insert genomic libraries are a crucial tool for attaining this objective. We report herein the construction of a BAC library for Quercus robur, its characterization and an analysis of BAC end sequences. Results The EcoRI library generated consisted of 92,160 clones, 7% of which had no insert. Levels of chloroplast and mitochondrial contamination were below 3% and 1%, respectively. Mean clone insert size was estimated at 135 kb. The library represents 12 haploid genome equivalents and, the likelihood of finding a particular oak sequence of interest is greater than 99%. Genome coverage was confirmed by PCR screening of the library with 60 unique genetic loci sampled from the genetic linkage map. In total, about 20,000 high-quality BAC end sequences (BESs) were generated by sequencing 15,000 clones. Roughly 5.88% of the combined BAC end sequence length corresponded to known retroelements while ab initio repeat detection methods identified 41 additional repeats. Collectively, characterized and novel repeats account for roughly 8.94% of the genome. Further analysis of the BESs revealed 1,823 putative genes suggesting at least 29,340 genes in the oak genome. BESs were aligned with the genome sequences of Arabidopsis thaliana, Vitis vinifera and Populus trichocarpa. One putative collinear microsyntenic region encoding an alcohol acyl transferase protein was observed between oak and chromosome 2 of V. vinifera. Conclusions This BAC library provides a new resource for genomic studies, including SSR marker development, physical mapping, comparative genomics and genome sequencing. BES analysis provided insight into the structure of the oak genome. These sequences will be used in the assembly of a future genome sequence for oak. PMID:21645357
Hybrid error correction and de novo assembly of single-molecule sequencing reads
Koren, Sergey; Schatz, Michael C.; Walenz, Brian P.; Martin, Jeffrey; Howard, Jason; Ganapathy, Ganeshkumar; Wang, Zhong; Rasko, David A.; McCombie, W. Richard; Jarvis, Erich D.; Phillippy, Adam M.
2012-01-01
Emerging single-molecule sequencing instruments can generate multi-kilobase sequences with the potential to dramatically improve genome and transcriptome assembly. However, the high error rate of single-molecule reads is challenging, and has limited their use to resequencing bacteria. To address this limitation, we introduce a novel correction algorithm and assembly strategy that utilizes shorter, high-identity sequences to correct the error in single-molecule sequences. We demonstrate the utility of this approach on Pacbio RS reads of phage, prokaryotic, and eukaryotic whole genomes, including the novel genome of the parrot Melopsittacus undulatus, as well as for RNA-seq reads of the corn (Zea mays) transcriptome. Our approach achieves over 99.9% read correction accuracy and produces substantially better assemblies than current sequencing strategies: in the best example, quintupling the median contig size relative to high-coverage, second-generation assemblies. Greater gains are predicted if read lengths continue to increase, including the prospect of single-contig bacterial chromosome assembly. PMID:22750884
NASA Astrophysics Data System (ADS)
Long, Ying; Wood, Troy D.
2015-01-01
Most enzymatic microreactors for protein digestion are based on trypsin, but proteins with hydrophobic segments may be difficult to digest because of the paucity of Arg and Lys residues. Microreactors based on pepsin, which is less specific than trypsin, can overcome this challenge. Here, an integrated immobilized pepsin microreactor (IPMR)/nanoelectrospray emitter is examined for its potential for peptide mapping. For myoglobin, equivalent sequence coverage is obtained in a thousandth the time of solution digestion with better sequence coverage. While sequence coverage of cytochrome c is lesser than solution in this short duration, more highly-charged peptic peptides are produced and a number of peaks are unidentified at low-resolution, suggesting that high-resolution mass spectrometry is needed to take full advantage of integrated IPMR/nanoelectrospray devices.
An improved assembly of the loblolly pine mega-genome using long-read single-molecule sequencing.
Zimin, Aleksey V; Stevens, Kristian A; Crepeau, Marc W; Puiu, Daniela; Wegrzyn, Jill L; Yorke, James A; Langley, Charles H; Neale, David B; Salzberg, Steven L
2017-01-01
The 22-gigabase genome of loblolly pine (Pinus taeda) is one of the largest ever sequenced. The draft assembly published in 2014 was built entirely from short Illumina reads, with lengths ranging from 100 to 250 base pairs (bp). The assembly was quite fragmented, containing over 11 million contigs whose weighted average (N50) size was 8206 bp. To improve this result, we generated approximately 12-fold coverage in long reads using the Single Molecule Real Time sequencing technology developed at Pacific Biosciences. We assembled the long and short reads together using the MaSuRCA mega-reads assembly algorithm, which produced a substantially better assembly, P. taeda version 2.0. The new assembly has an N50 contig size of 25 361, more than three times as large as achieved in the original assembly, and an N50 scaffold size of 107 821, 61% larger than the previous assembly. © The Author 2017. Published by Oxford University Press.
Zimin, Aleksey V; Stevens, Kristian A; Crepeau, Marc W; Puiu, Daniela; Wegrzyn, Jill L; Yorke, James A; Langley, Charles H; Neale, David B; Salzberg, Steven L
2017-10-01
The 22-gigabase genome of loblolly pine (Pinus taeda) is one of the largest ever sequenced. The draft assembly published in 2014 was built entirely from short Illumina reads, with lengths ranging from 100 to 250 base pairs (bp). The assembly was quite fragmented, containing over 11 million contigs whose weighted average (N50) size was 8206 bp. To improve this result, we generated approximately 12-fold coverage in long reads using the Single Molecule Real Time sequencing technology developed at Pacific Biosciences. We assembled the long and short reads together using the MaSuRCA mega-reads assembly algorithm, which produced a substantially better assembly, P. taeda version 2.0. The new assembly has an N50 contig size of 25 361, more than three times as large as achieved in the original assembly, and an N50 scaffold size of 107 821, 61% larger than the previous assembly. © The Authors 2017. Published by Oxford University Press.
Tabata, Ryo; Kamiya, Takehiro; Shigenobu, Shuji; Yamaguchi, Katsushi; Yamada, Masashi; Hasebe, Mitsuyasu; Fujiwara, Toru; Sawa, Shinichiro
2013-01-01
Next-generation sequencing (NGS) technologies enable the rapid production of an enormous quantity of sequence data. These powerful new technologies allow the identification of mutations by whole-genome sequencing. However, most reported NGS-based mapping methods, which are based on bulked segregant analysis, are costly and laborious. To address these limitations, we designed a versatile NGS-based mapping method that consists of a combination of low- to medium-coverage multiplex SOLiD (Sequencing by Oligonucleotide Ligation and Detection) and classical genetic rough mapping. Using only low to medium coverage reduces the SOLiD sequencing costs and, since just 10 to 20 mutant F2 plants are required for rough mapping, the operation is simple enough to handle in a laboratory with limited space and funding. As a proof of principle, we successfully applied this method to identify the CTR1, which is involved in boron-mediated root development, from among a population of high boron requiring Arabidopsis thaliana mutants. Our work demonstrates that this NGS-based mapping method is a moderately priced and versatile method that can readily be applied to other model organisms. PMID:23104114
2013-01-01
Background The advent of next generation sequencing technology has accelerated efforts to map and catalogue copy number variation (CNV) in genomes of important micro-organisms for public health. A typical analysis of the sequence data involves mapping reads onto a reference genome, calculating the respective coverage, and detecting regions with too-low or too-high coverage (deletions and amplifications, respectively). Current CNV detection methods rely on statistical assumptions (e.g., a Poisson model) that may not hold in general, or require fine-tuning the underlying algorithms to detect known hits. We propose a new CNV detection methodology based on two Poisson hierarchical models, the Poisson-Gamma and Poisson-Lognormal, with the advantage of being sufficiently flexible to describe different data patterns, whilst robust against deviations from the often assumed Poisson model. Results Using sequence coverage data of 7 Plasmodium falciparum malaria genomes (3D7 reference strain, HB3, DD2, 7G8, GB4, OX005, and OX006), we showed that empirical coverage distributions are intrinsically asymmetric and overdispersed in relation to the Poisson model. We also demonstrated a low baseline false positive rate for the proposed methodology using 3D7 resequencing data and simulation. When applied to the non-reference isolate data, our approach detected known CNV hits, including an amplification of the PfMDR1 locus in DD2 and a large deletion in the CLAG3.2 gene in GB4, and putative novel CNV regions. When compared to the recently available FREEC and cn.MOPS approaches, our findings were more concordant with putative hits from the highest quality array data for the 7G8 and GB4 isolates. Conclusions In summary, the proposed methodology brings an increase in flexibility, robustness, accuracy and statistical rigour to CNV detection using sequence coverage data. PMID:23442253
NASA Astrophysics Data System (ADS)
Jiang, Ning
Accurately measuring the immune repertoire sequence composition, diversity, and abundance is important in studying repertoire response in infections, vaccinations, and cancer immunology. Using molecular identifiers (MIDs) to tag mRNA molecules is an effective method in improving the accuracy of immune repertoire sequencing (IR-seq). However, it is still difficult to use IR-seq on small amount of clinical samples to achieve a high coverage of the repertoire diversities. This is especially challenging in studying infections and vaccinations where B cell subpopulations with fewer cells, such as memory B cells or plasmablasts, are often of great interest to study somatic mutation patterns and diversity changes. Here, we describe an approach of IR-seq based on the use of MIDs in combination with a clustering method that can reveal more than 80% of the antibody diversity in a sample and can be applied to as few as 1,000 B cells. We applied this to study the antibody repertoires of young children before and during an acute malaria infection. We discovered unexpectedly high levels of somatic hypermutation (SHM) in infants and revealed characteristics of antibody repertoire development in young children that would have a profound impact on immunization in children.
NASA Astrophysics Data System (ADS)
Halim, Mohammad A.; MacAleese, Luke; Lemoine, Jérôme; Antoine, Rodolphe; Dugourd, Philippe; Girod, Marion
2018-02-01
Mass spectrometry-based methods have made significant progress in characterizing post-translational modifications in peptides and proteins; however, certain aspects regarding fragmentation methods must still be improved. A good technique is expected to provide excellent sequence information, locate PTM sites, and retain the labile PTM groups. To address these issues, we investigate 10.6 μm IRMPD, 213 nm UVPD, and combined UV and IR photodissociation, known as HiLoPD (high-low photodissociation), for phospho-, sulfo-, and glyco-peptide cations. IRMPD shows excellent backbone fragmentation and produces equal numbers of N- and C-terminal ions. The results reveal that 213 nm UVPD and HiLoPD methods can provide diverse backbone fragmentation producing a/x, b/y, and c/z ions with excellent sequence coverage, locate PTM sites, and offer reasonable retention efficiency for phospho- and glyco-peptides. Excellent sequence coverage is achieved for sulfo-peptides and the position of the SO3 group can be pinpointed; however, widespread SO3 losses are detected irrespective of the methods used herein. Based on the overall performance achieved, we believe that 213 nm UVPD and HiLoPD can serve as alternative options to collision activation and electron transfer dissociations for phospho- and glyco-proteomics.
Observatorio Astrofísico de Javalambre: observation scheduler and sequencer
NASA Astrophysics Data System (ADS)
Ederoclite, A.; Cristóbal-Hornillos, D.; Moles, M.; Cenarro, A. J.; Marín-Franch, A.; Yanes Díaz, A.; Gruel, N.; Varela, J.; Chueca, S.; Rueda-Teruel, F.; Rueda-Teruel, S.; Luis-Simoes, R.; Hernández-Fuertes, J.; López-Sainz, A.; Chioare Díaz-Martín, M.
2013-05-01
Observational strategy is a critical path in any large survey. The planning of a night requires the knowledge of the fields observed, the quality of the data already secured, and the ones still to be observed to optimize scientific returns. Finally, field maximum altitude, sky distance/brightness during the night and meteorological data (cloud coverage and seeing) have to be taken into account in order to increase the chance to have a successful observation. To support the execution of the J-PAS project at the Javalambre Astrophysical Observatory, we have prepared a scheduler and a sequencer (SCH/SQ) which takes into account all the relevant mentioned parameters. The scheduler first selects the fields which can be observed during the night and orders them on the basis of their figure of merit. It takes into account the quality and spectral coverage of the existing observations as well as the possibility to get a good observation during the night. The sequencer takes into account the meteorological variables in order to prepare the observation queue for the night. During the commissioning of the telescopes at OAJ, we expect to improve our figures of merit and eventually get to a system which can function semi-automatically. This poster describes the design of this software.
The genome sequence of taurine cattle: a window to ruminant biology and evolution.
Elsik, Christine G; Tellam, Ross L; Worley, Kim C; Gibbs, Richard A; Muzny, Donna M; Weinstock, George M; Adelson, David L; Eichler, Evan E; Elnitski, Laura; Guigó, Roderic; Hamernik, Debora L; Kappes, Steve M; Lewin, Harris A; Lynn, David J; Nicholas, Frank W; Reymond, Alexandre; Rijnkels, Monique; Skow, Loren C; Zdobnov, Evgeny M; Schook, Lawrence; Womack, James; Alioto, Tyler; Antonarakis, Stylianos E; Astashyn, Alex; Chapple, Charles E; Chen, Hsiu-Chuan; Chrast, Jacqueline; Câmara, Francisco; Ermolaeva, Olga; Henrichsen, Charlotte N; Hlavina, Wratko; Kapustin, Yuri; Kiryutin, Boris; Kitts, Paul; Kokocinski, Felix; Landrum, Melissa; Maglott, Donna; Pruitt, Kim; Sapojnikov, Victor; Searle, Stephen M; Solovyev, Victor; Souvorov, Alexandre; Ucla, Catherine; Wyss, Carine; Anzola, Juan M; Gerlach, Daniel; Elhaik, Eran; Graur, Dan; Reese, Justin T; Edgar, Robert C; McEwan, John C; Payne, Gemma M; Raison, Joy M; Junier, Thomas; Kriventseva, Evgenia V; Eyras, Eduardo; Plass, Mireya; Donthu, Ravikiran; Larkin, Denis M; Reecy, James; Yang, Mary Q; Chen, Lin; Cheng, Ze; Chitko-McKown, Carol G; Liu, George E; Matukumalli, Lakshmi K; Song, Jiuzhou; Zhu, Bin; Bradley, Daniel G; Brinkman, Fiona S L; Lau, Lilian P L; Whiteside, Matthew D; Walker, Angela; Wheeler, Thomas T; Casey, Theresa; German, J Bruce; Lemay, Danielle G; Maqbool, Nauman J; Molenaar, Adrian J; Seo, Seongwon; Stothard, Paul; Baldwin, Cynthia L; Baxter, Rebecca; Brinkmeyer-Langford, Candice L; Brown, Wendy C; Childers, Christopher P; Connelley, Timothy; Ellis, Shirley A; Fritz, Krista; Glass, Elizabeth J; Herzig, Carolyn T A; Iivanainen, Antti; Lahmers, Kevin K; Bennett, Anna K; Dickens, C Michael; Gilbert, James G R; Hagen, Darren E; Salih, Hanni; Aerts, Jan; Caetano, Alexandre R; Dalrymple, Brian; Garcia, Jose Fernando; Gill, Clare A; Hiendleder, Stefan G; Memili, Erdogan; Spurlock, Diane; Williams, John L; Alexander, Lee; Brownstein, Michael J; Guan, Leluo; Holt, Robert A; Jones, Steven J M; Marra, Marco A; Moore, Richard; Moore, Stephen S; Roberts, Andy; Taniguchi, Masaaki; Waterman, Richard C; Chacko, Joseph; Chandrabose, Mimi M; Cree, Andy; Dao, Marvin Diep; Dinh, Huyen H; Gabisi, Ramatu Ayiesha; Hines, Sandra; Hume, Jennifer; Jhangiani, Shalini N; Joshi, Vandita; Kovar, Christie L; Lewis, Lora R; Liu, Yih-Shin; Lopez, John; Morgan, Margaret B; Nguyen, Ngoc Bich; Okwuonu, Geoffrey O; Ruiz, San Juana; Santibanez, Jireh; Wright, Rita A; Buhay, Christian; Ding, Yan; Dugan-Rocha, Shannon; Herdandez, Judith; Holder, Michael; Sabo, Aniko; Egan, Amy; Goodell, Jason; Wilczek-Boney, Katarzyna; Fowler, Gerald R; Hitchens, Matthew Edward; Lozado, Ryan J; Moen, Charles; Steffen, David; Warren, James T; Zhang, Jingkun; Chiu, Readman; Schein, Jacqueline E; Durbin, K James; Havlak, Paul; Jiang, Huaiyang; Liu, Yue; Qin, Xiang; Ren, Yanru; Shen, Yufeng; Song, Henry; Bell, Stephanie Nicole; Davis, Clay; Johnson, Angela Jolivet; Lee, Sandra; Nazareth, Lynne V; Patel, Bella Mayurkumar; Pu, Ling-Ling; Vattathil, Selina; Williams, Rex Lee; Curry, Stacey; Hamilton, Cerissa; Sodergren, Erica; Wheeler, David A; Barris, Wes; Bennett, Gary L; Eggen, André; Green, Ronnie D; Harhay, Gregory P; Hobbs, Matthew; Jann, Oliver; Keele, John W; Kent, Matthew P; Lien, Sigbjørn; McKay, Stephanie D; McWilliam, Sean; Ratnakumar, Abhirami; Schnabel, Robert D; Smith, Timothy; Snelling, Warren M; Sonstegard, Tad S; Stone, Roger T; Sugimoto, Yoshikazu; Takasuga, Akiko; Taylor, Jeremy F; Van Tassell, Curtis P; Macneil, Michael D; Abatepaulo, Antonio R R; Abbey, Colette A; Ahola, Virpi; Almeida, Iassudara G; Amadio, Ariel F; Anatriello, Elen; Bahadue, Suria M; Biase, Fernando H; Boldt, Clayton R; Carroll, Jeffery A; Carvalho, Wanessa A; Cervelatti, Eliane P; Chacko, Elsa; Chapin, Jennifer E; Cheng, Ye; Choi, Jungwoo; Colley, Adam J; de Campos, Tatiana A; De Donato, Marcos; Santos, Isabel K F de Miranda; de Oliveira, Carlo J F; Deobald, Heather; Devinoy, Eve; Donohue, Kaitlin E; Dovc, Peter; Eberlein, Annett; Fitzsimmons, Carolyn J; Franzin, Alessandra M; Garcia, Gustavo R; Genini, Sem; Gladney, Cody J; Grant, Jason R; Greaser, Marion L; Green, Jonathan A; Hadsell, Darryl L; Hakimov, Hatam A; Halgren, Rob; Harrow, Jennifer L; Hart, Elizabeth A; Hastings, Nicola; Hernandez, Marta; Hu, Zhi-Liang; Ingham, Aaron; Iso-Touru, Terhi; Jamis, Catherine; Jensen, Kirsty; Kapetis, Dimos; Kerr, Tovah; Khalil, Sari S; Khatib, Hasan; Kolbehdari, Davood; Kumar, Charu G; Kumar, Dinesh; Leach, Richard; Lee, Justin C-M; Li, Changxi; Logan, Krystin M; Malinverni, Roberto; Marques, Elisa; Martin, William F; Martins, Natalia F; Maruyama, Sandra R; Mazza, Raffaele; McLean, Kim L; Medrano, Juan F; Moreno, Barbara T; Moré, Daniela D; Muntean, Carl T; Nandakumar, Hari P; Nogueira, Marcelo F G; Olsaker, Ingrid; Pant, Sameer D; Panzitta, Francesca; Pastor, Rosemeire C P; Poli, Mario A; Poslusny, Nathan; Rachagani, Satyanarayana; Ranganathan, Shoba; Razpet, Andrej; Riggs, Penny K; Rincon, Gonzalo; Rodriguez-Osorio, Nelida; Rodriguez-Zas, Sandra L; Romero, Natasha E; Rosenwald, Anne; Sando, Lillian; Schmutz, Sheila M; Shen, Libing; Sherman, Laura; Southey, Bruce R; Lutzow, Ylva Strandberg; Sweedler, Jonathan V; Tammen, Imke; Telugu, Bhanu Prakash V L; Urbanski, Jennifer M; Utsunomiya, Yuri T; Verschoor, Chris P; Waardenberg, Ashley J; Wang, Zhiquan; Ward, Robert; Weikard, Rosemarie; Welsh, Thomas H; White, Stephen N; Wilming, Laurens G; Wunderlich, Kris R; Yang, Jianqi; Zhao, Feng-Qi
2009-04-24
To understand the biology and evolution of ruminants, the cattle genome was sequenced to about sevenfold coverage. The cattle genome contains a minimum of 22,000 genes, with a core set of 14,345 orthologs shared among seven mammalian species of which 1217 are absent or undetected in noneutherian (marsupial or monotreme) genomes. Cattle-specific evolutionary breakpoint regions in chromosomes have a higher density of segmental duplications, enrichment of repetitive elements, and species-specific variations in genes associated with lactation and immune responsiveness. Genes involved in metabolism are generally highly conserved, although five metabolic genes are deleted or extensively diverged from their human orthologs. The cattle genome sequence thus provides a resource for understanding mammalian evolution and accelerating livestock genetic improvement for milk and meat production.
The Bologna Annotation Resource (BAR 3.0): improving protein functional annotation
Casadio, Rita
2017-01-01
Abstract BAR 3.0 updates our server BAR (Bologna Annotation Resource) for predicting protein structural and functional features from sequence. We increase data volume, query capabilities and information conveyed to the user. The core of BAR 3.0 is a graph-based clustering procedure of UniProtKB sequences, following strict pairwise similarity criteria (sequence identity ≥40% with alignment coverage ≥90%). Each cluster contains the available annotation downloaded from UniProtKB, GO, PFAM and PDB. After statistical validation, GO terms and PFAM domains are cluster-specific and annotate new sequences entering the cluster after satisfying similarity constraints. BAR 3.0 includes 28 869 663 sequences in 1 361 773 clusters, of which 22.2% (22 241 661 sequences) and 47.4% (24 555 055 sequences) have at least one validated GO term and one PFAM domain, respectively. 1.4% of the clusters (36% of all sequences) include PDB structures and the cluster is associated to a hidden Markov model that allows building template-target alignment suitable for structural modeling. Some other 3 399 026 sequences are singletons. BAR 3.0 offers an improved search interface, allowing queries by UniProtKB-accession, Fasta sequence, GO-term, PFAM-domain, organism, PDB and ligand/s. When evaluated on the CAFA2 targets, BAR 3.0 largely outperforms our previous version and scores among state-of-the-art methods. BAR 3.0 is publicly available and accessible at http://bar.biocomp.unibo.it/bar3. PMID:28453653
Pightling, Arthur W.; Petronella, Nicholas; Pagotto, Franco
2014-01-01
The wide availability of whole-genome sequencing (WGS) and an abundance of open-source software have made detection of single-nucleotide polymorphisms (SNPs) in bacterial genomes an increasingly accessible and effective tool for comparative analyses. Thus, ensuring that real nucleotide differences between genomes (i.e., true SNPs) are detected at high rates and that the influences of errors (such as false positive SNPs, ambiguously called sites, and gaps) are mitigated is of utmost importance. The choices researchers make regarding the generation and analysis of WGS data can greatly influence the accuracy of short-read sequence alignments and, therefore, the efficacy of such experiments. We studied the effects of some of these choices, including: i) depth of sequencing coverage, ii) choice of reference-guided short-read sequence assembler, iii) choice of reference genome, and iv) whether to perform read-quality filtering and trimming, on our ability to detect true SNPs and on the frequencies of errors. We performed benchmarking experiments, during which we assembled simulated and real Listeria monocytogenes strain 08-5578 short-read sequence datasets of varying quality with four commonly used assemblers (BWA, MOSAIK, Novoalign, and SMALT), using reference genomes of varying genetic distances, and with or without read pre-processing (i.e., quality filtering and trimming). We found that assemblies of at least 50-fold coverage provided the most accurate results. In addition, MOSAIK yielded the fewest errors when reads were aligned to a nearly identical reference genome, while using SMALT to align reads against a reference sequence that is ∼0.82% distant from 08-5578 at the nucleotide level resulted in the detection of the greatest numbers of true SNPs and the fewest errors. Finally, we show that whether read pre-processing improves SNP detection depends upon the choice of reference sequence and assembler. In total, this study demonstrates that researchers should test a variety of conditions to achieve optimal results. PMID:25144537
Guan, Xiaoyan; Brownstein, Naomi C; Young, Nicolas L; Marshall, Alan G
2017-01-30
Bottom-up tandem mass spectrometry (MS/MS) is regularly used in proteomics to identify proteins from a sequence database. De novo sequencing is also available for sequencing peptides with relatively short sequence lengths. We recently showed that paired Lys-C and Lys-N proteases produce peptides of identical mass and similar retention time, but different tandem mass spectra. Such parallel experiments provide complementary information, and allow for up to 100% MS/MS sequence coverage. Here, we report digestion by paired Lys-C and Lys-N proteases of a seven-protein mixture: human hemoglobin alpha, bovine carbonic anhydrase 2, horse skeletal muscle myoglobin, hen egg white lysozyme, bovine pancreatic ribonuclease, bovine rhodanese, and bovine serum albumin, followed by reversed-phase nanoflow liquid chromatography, collision-induced dissociation, and 14.5 T Fourier transform ion cyclotron resonance mass spectrometry. Matched pairs of product peptide ions of equal precursor mass and similar retention times from each digestion are compared, leveraging single-residue transposed information with independent interferences to confidently identify fragment ion types, residues, and peptides. Selected pairs of product ion mass spectra for de novo sequenced protein segments from each member of the mixture are presented. Pairs of the transposed product ions as well as complementary information from the parallel experiments allow for both high MS/MS coverage for long peptide sequences and high confidence in the amino acid identification. Moreover, the parallel experiments in the de novo sequencing reduce false-positive matches of product ions from the single-residue transposed peptides from the same segment, and thereby further improve the confidence in protein identification. Copyright © 2016 John Wiley & Sons, Ltd. Copyright © 2016 John Wiley & Sons, Ltd.
Zhao, Panpan; Zhong, Jiayong; Liu, Wanting; Zhao, Jing; Zhang, Gong
2017-12-01
Multiple search engines based on various models have been developed to search MS/MS spectra against a reference database, providing different results for the same data set. How to integrate these results efficiently with minimal compromise on false discoveries is an open question due to the lack of an independent, reliable, and highly sensitive standard. We took the advantage of the translating mRNA sequencing (RNC-seq) result as a standard to evaluate the integration strategies of the protein identifications from various search engines. We used seven mainstream search engines (Andromeda, Mascot, OMSSA, X!Tandem, pFind, InsPecT, and ProVerB) to search the same label-free MS data sets of human cell lines Hep3B, MHCCLM3, and MHCC97H from the Chinese C-HPP Consortium for Chromosomes 1, 8, and 20. As expected, the union of seven engines resulted in a boosted false identification, whereas the intersection of seven engines remarkably decreased the identification power. We found that identifications of at least two out of seven engines resulted in maximizing the protein identification power while minimizing the ratio of suspicious/translation-supported identifications (STR), as monitored by our STR index, based on RNC-Seq. Furthermore, this strategy also significantly improves the peptides coverage of the protein amino acid sequence. In summary, we demonstrated a simple strategy to significantly improve the performance for shotgun mass spectrometry by protein-level integrating multiple search engines, maximizing the utilization of the current MS spectra without additional experimental work.
MCMC-ODPR: primer design optimization using Markov Chain Monte Carlo sampling.
Kitchen, James L; Moore, Jonathan D; Palmer, Sarah A; Allaby, Robin G
2012-11-05
Next generation sequencing technologies often require numerous primer designs that require good target coverage that can be financially costly. We aimed to develop a system that would implement primer reuse to design degenerate primers that could be designed around SNPs, thus find the fewest necessary primers and the lowest cost whilst maintaining an acceptable coverage and provide a cost effective solution. We have implemented Metropolis-Hastings Markov Chain Monte Carlo for optimizing primer reuse. We call it the Markov Chain Monte Carlo Optimized Degenerate Primer Reuse (MCMC-ODPR) algorithm. After repeating the program 1020 times to assess the variance, an average of 17.14% fewer primers were found to be necessary using MCMC-ODPR for an equivalent coverage without implementing primer reuse. The algorithm was able to reuse primers up to five times. We compared MCMC-ODPR with single sequence primer design programs Primer3 and Primer-BLAST and achieved a lower primer cost per amplicon base covered of 0.21 and 0.19 and 0.18 primer nucleotides on three separate gene sequences, respectively. With multiple sequences, MCMC-ODPR achieved a lower cost per base covered of 0.19 than programs BatchPrimer3 and PAMPS, which achieved 0.25 and 0.64 primer nucleotides, respectively. MCMC-ODPR is a useful tool for designing primers at various melting temperatures at good target coverage. By combining degeneracy with optimal primer reuse the user may increase coverage of sequences amplified by the designed primers at significantly lower costs. Our analyses showed that overall MCMC-ODPR outperformed the other primer-design programs in our study in terms of cost per covered base.
MCMC-ODPR: Primer design optimization using Markov Chain Monte Carlo sampling
2012-01-01
Background Next generation sequencing technologies often require numerous primer designs that require good target coverage that can be financially costly. We aimed to develop a system that would implement primer reuse to design degenerate primers that could be designed around SNPs, thus find the fewest necessary primers and the lowest cost whilst maintaining an acceptable coverage and provide a cost effective solution. We have implemented Metropolis-Hastings Markov Chain Monte Carlo for optimizing primer reuse. We call it the Markov Chain Monte Carlo Optimized Degenerate Primer Reuse (MCMC-ODPR) algorithm. Results After repeating the program 1020 times to assess the variance, an average of 17.14% fewer primers were found to be necessary using MCMC-ODPR for an equivalent coverage without implementing primer reuse. The algorithm was able to reuse primers up to five times. We compared MCMC-ODPR with single sequence primer design programs Primer3 and Primer-BLAST and achieved a lower primer cost per amplicon base covered of 0.21 and 0.19 and 0.18 primer nucleotides on three separate gene sequences, respectively. With multiple sequences, MCMC-ODPR achieved a lower cost per base covered of 0.19 than programs BatchPrimer3 and PAMPS, which achieved 0.25 and 0.64 primer nucleotides, respectively. Conclusions MCMC-ODPR is a useful tool for designing primers at various melting temperatures at good target coverage. By combining degeneracy with optimal primer reuse the user may increase coverage of sequences amplified by the designed primers at significantly lower costs. Our analyses showed that overall MCMC-ODPR outperformed the other primer-design programs in our study in terms of cost per covered base. PMID:23126469
2010-01-01
Background Food supply from the ocean is constrained by the shortage of domesticated and selected fish. Development of genomic models of economically important fishes should assist with the removal of this bottleneck. European sea bass Dicentrarchus labrax L. (Moronidae, Perciformes, Teleostei) is one of the most important fishes in European marine aquaculture; growing genomic resources put it on its way to serve as an economic model. Results End sequencing of a sea bass genomic BAC-library enabled the comparative mapping of the sea bass genome using the three-spined stickleback Gasterosteus aculeatus genome as a reference. BAC-end sequences (102,690) were aligned to the stickleback genome. The number of mappable BACs was improved using a two-fold coverage WGS dataset of sea bass resulting in a comparative BAC-map covering 87% of stickleback chromosomes with 588 BAC-contigs. The minimum size of 83 contigs covering 50% of the reference was 1.2 Mbp; the largest BAC-contig comprised 8.86 Mbp. More than 22,000 BAC-clones aligned with both ends to the reference genome. Intra-chromosomal rearrangements between sea bass and stickleback were identified. Size distributions of mapped BACs were used to calculate that the genome of sea bass may be only 1.3 fold larger than the 460 Mbp stickleback genome. Conclusions The BAC map is used for sequencing single BACs or BAC-pools covering defined genomic entities by second generation sequencing technologies. Together with the WGS dataset it initiates a sea bass genome sequencing project. This will allow the quantification of polymorphisms through resequencing, which is important for selecting highly performing domesticated fish. PMID:20105308
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gallaher, Sean D.; Fitz-Gibbon, Sorel T.; Strenkert, Daniela
Chlamydomonas reinhardtii is a unicellular chlorophyte alga that is widely studied as a reference organism for understanding photosynthesis, sensory and motile cilia, and for development of an algal-based platform for producing biofuels and bio-products. Its highly repetitive, ~205-kbp circular chloroplast genome and ~15.8-kbp linear mitochondrial genome were sequenced prior to the advent of high-throughput sequencing technologies. Here, high coverage shotgun sequencing was used to assemble both organellar genomes de novo. These new genomes correct dozens of errors in the prior genome sequences and annotations. Gen-ome sequencing coverage indicates that each cell contains on average 83 copies of the chloroplast genomemore » and 130 copies of the mitochondrial genome. Using protocols and analyses optimized for organellar tran-scripts, RNA-Seq was used to quantify their relative abundances across 12 different growth conditions. Forty-six percent of total cellular mRNA is attributable to high expression from a few dozen chloroplast genes. RNA-Seq data were used to guide gene annotation, to demonstrate polycistronic gene expression, and to quantify splicing of psaA and psbA introns. In contrast to a conclusion from a recent study, we found that chloroplast transcripts are not edited. Unexpectedly, cytosine-rich polynucleotide tails were observed at the 3’-end of all mitochondrial transcripts. A comparative genomics analysis of eight laboratory strains and 11 wild isolates of C. reinhardtii identified 2658 variants in the organellargenomes, which is 1/10th as much genetic diversity as is found in the nucleus.« less
Enhanced Methods for Local Ancestry Assignment in Sequenced Admixed Individuals
Brown, Robert; Pasaniuc, Bogdan
2014-01-01
Inferring the ancestry at each locus in the genome of recently admixed individuals (e.g., Latino Americans) plays a major role in medical and population genetic inferences, ranging from finding disease-risk loci, to inferring recombination rates, to mapping missing contigs in the human genome. Although many methods for local ancestry inference have been proposed, most are designed for use with genotyping arrays and fail to make use of the full spectrum of data available from sequencing. In addition, current haplotype-based approaches are very computationally demanding, requiring large computational time for moderately large sample sizes. Here we present new methods for local ancestry inference that leverage continent-specific variants (CSVs) to attain increased performance over existing approaches in sequenced admixed genomes. A key feature of our approach is that it incorporates the admixed genomes themselves jointly with public datasets, such as 1000 Genomes, to improve the accuracy of CSV calling. We use simulations to show that our approach attains accuracy similar to widely used computationally intensive haplotype-based approaches with large decreases in runtime. Most importantly, we show that our method recovers comparable local ancestries, as the 1000 Genomes consensus local ancestry calls in the real admixed individuals from the 1000 Genomes Project. We extend our approach to account for low-coverage sequencing and show that accurate local ancestry inference can be attained at low sequencing coverage. Finally, we generalize CSVs to sub-continental population-specific variants (sCSVs) and show that in some cases it is possible to determine the sub-continental ancestry for short chromosomal segments on the basis of sCSVs. PMID:24743331
Xu, Ye; Huang, Cheng; Colón-Ramos, Uriyoán
2015-01-01
Binagwaho and colleagues’ perspective piece provided a timely reflection on the experience of Rwanda in achieving the Millennium Development Goals (MDGs) and a proposal of 5 principles to carry forward in post-2015 health development. This commentary echoes their viewpoints and offers three lessons for health policy reforms consistent with these principles beyond 2015. Specifically, we argue that universal health coverage (UHC) is an integrated solution to advance the global health development agenda, and the three essential strategies drawn from Asian countries’ health reforms toward UHC are: (1) Public financing support and sequencing health insurance expansion by first extending health insurance to the extremely poor, vulnerable, and marginalized population are critical for achieving UHC; (2) Improved quality of delivered care ensures supply-side readiness and effective coverage; (3) Strategic purchasing and results-based financing creates incentives and accountability for positive changes. These strategies were discussed and illustrated with experience from China and other Asian economies. PMID:26673477
MSeq-CNV: accurate detection of Copy Number Variation from Sequencing of Multiple samples.
Malekpour, Seyed Amir; Pezeshk, Hamid; Sadeghi, Mehdi
2018-03-05
Currently a few tools are capable of detecting genome-wide Copy Number Variations (CNVs) based on sequencing of multiple samples. Although aberrations in mate pair insertion sizes provide additional hints for the CNV detection based on multiple samples, the majority of the current tools rely only on the depth of coverage. Here, we propose a new algorithm (MSeq-CNV) which allows detecting common CNVs across multiple samples. MSeq-CNV applies a mixture density for modeling aberrations in depth of coverage and abnormalities in the mate pair insertion sizes. Each component in this mixture density applies a Binomial distribution for modeling the number of mate pairs with aberration in the insertion size and also a Poisson distribution for emitting the read counts, in each genomic position. MSeq-CNV is applied on simulated data and also on real data of six HapMap individuals with high-coverage sequencing, in 1000 Genomes Project. These individuals include a CEU trio of European ancestry and a YRI trio of Nigerian ethnicity. Ancestry of these individuals is studied by clustering the identified CNVs. MSeq-CNV is also applied for detecting CNVs in two samples with low-coverage sequencing in 1000 Genomes Project and six samples form the Simons Genome Diversity Project.
Palmer, Lance E; Dejori, Mathaeus; Bolanos, Randall; Fasulo, Daniel
2010-01-15
With the rapid expansion of DNA sequencing databases, it is now feasible to identify relevant information from prior sequencing projects and completed genomes and apply it to de novo sequencing of new organisms. As an example, this paper demonstrates how such extra information can be used to improve de novo assemblies by augmenting the overlapping step. Finding all pairs of overlapping reads is a key task in many genome assemblers, and to this end, highly efficient algorithms have been developed to find alignments in large collections of sequences. It is well known that due to repeated sequences, many aligned pairs of reads nevertheless do not overlap. But no overlapping algorithm to date takes a rigorous approach to separating aligned but non-overlapping read pairs from true overlaps. We present an approach that extends the Minimus assembler by a data driven step to classify overlaps as true or false prior to contig construction. We trained several different classification models within the Weka framework using various statistics derived from overlaps of reads available from prior sequencing projects. These statistics included percent mismatch and k-mer frequencies within the overlaps as well as a comparative genomics score derived from mapping reads to multiple reference genomes. We show that in real whole-genome sequencing data from the E. coli and S. aureus genomes, by providing a curated set of overlaps to the contigging phase of the assembler, we nearly doubled the median contig length (N50) without sacrificing coverage of the genome or increasing the number of mis-assemblies. Machine learning methods that use comparative and non-comparative features to classify overlaps as true or false can be used to improve the quality of a sequence assembly.
2013-01-01
Background Genetic linkage maps are important tools in breeding programmes and quantitative trait analyses. Traditional molecular markers used for genotyping are limited in throughput and efficiency. The advent of next-generation sequencing technologies has facilitated progeny genotyping and genetic linkage map construction in the major grains. However, the applicability of the approach remains untested in the fungal system. Findings Shiitake mushroom, Lentinula edodes, is a basidiomycetous fungus that represents one of the most popular cultivated edible mushrooms. Here, we developed a rapid genotyping method based on low-coverage (~0.5 to 1.5-fold) whole-genome resequencing. We used the approach to genotype 20 single-spore isolates derived from L. edodes strain L54 and constructed the first high-density sequence-based genetic linkage map of L. edodes. The accuracy of the proposed genotyping method was verified experimentally with results from mating compatibility tests and PCR-single-strand conformation polymorphism on a few known genes. The linkage map spanned a total genetic distance of 637.1 cM and contained 13 linkage groups. Two hundred sequence-based markers were placed on the map, with an average marker spacing of 3.4 cM. The accuracy of the map was confirmed by comparing with previous maps the locations of known genes such as matA and matB. Conclusions We used the shiitake mushroom as an example to provide a proof-of-principle that low-coverage resequencing could allow rapid genotyping of basidiospore-derived progenies, which could in turn facilitate the construction of high-density genetic linkage maps of basidiomycetous fungi for quantitative trait analyses and improvement of genome assembly. PMID:23915543
PWHATSHAP: efficient haplotyping for future generation sequencing.
Bracciali, Andrea; Aldinucci, Marco; Patterson, Murray; Marschall, Tobias; Pisanti, Nadia; Merelli, Ivan; Torquati, Massimo
2016-09-22
Haplotype phasing is an important problem in the analysis of genomics information. Given a set of DNA fragments of an individual, it consists of determining which one of the possible alleles (alternative forms of a gene) each fragment comes from. Haplotype information is relevant to gene regulation, epigenetics, genome-wide association studies, evolutionary and population studies, and the study of mutations. Haplotyping is currently addressed as an optimisation problem aiming at solutions that minimise, for instance, error correction costs, where costs are a measure of the confidence in the accuracy of the information acquired from DNA sequencing. Solutions have typically an exponential computational complexity. WHATSHAP is a recent optimal approach which moves computational complexity from DNA fragment length to fragment overlap, i.e., coverage, and is hence of particular interest when considering sequencing technology's current trends that are producing longer fragments. Given the potential relevance of efficient haplotyping in several analysis pipelines, we have designed and engineered PWHATSHAP, a parallel, high-performance version of WHATSHAP. PWHATSHAP is embedded in a toolkit developed in Python and supports genomics datasets in standard file formats. Building on WHATSHAP, PWHATSHAP exhibits the same complexity exploring a number of possible solutions which is exponential in the coverage of the dataset. The parallel implementation on multi-core architectures allows for a relevant reduction of the execution time for haplotyping, while the provided results enjoy the same high accuracy as that provided by WHATSHAP, which increases with coverage. Due to its structure and management of the large datasets, the parallelisation of WHATSHAP posed demanding technical challenges, which have been addressed exploiting a high-level parallel programming framework. The result, PWHATSHAP, is a freely available toolkit that improves the efficiency of the analysis of genomics information.
Genome Wide Characterization of Simple Sequence Repeats in Cucumber
USDA-ARS?s Scientific Manuscript database
The whole genome sequence of the cucumber cultivar Gy14 was recently sequenced at 15× coverage with the Roche 454 Titanium technology. The microsatellite DNA sequences (simple sequence repeats, SSRs) in the assembled scaffolds were computationally explored and characterized. A total of 112,073 SSRs ...
Ramos, Enrique; Levinson, Benjamin T; Chasnoff, Sara; Hughes, Andrew; Young, Andrew L; Thornton, Katherine; Li, Allie; Vallania, Francesco L M; Province, Michael; Druley, Todd E
2012-12-06
Rare genetic variation in the human population is a major source of pathophysiological variability and has been implicated in a host of complex phenotypes and diseases. Finding disease-related genes harboring disparate functional rare variants requires sequencing of many individuals across many genomic regions and comparing against unaffected cohorts. However, despite persistent declines in sequencing costs, population-based rare variant detection across large genomic target regions remains cost prohibitive for most investigators. In addition, DNA samples are often precious and hybridization methods typically require large amounts of input DNA. Pooled sample DNA sequencing is a cost and time-efficient strategy for surveying populations of individuals for rare variants. We set out to 1) create a scalable, multiplexing method for custom capture with or without individual DNA indexing that was amenable to low amounts of input DNA and 2) expand the functionality of the SPLINTER algorithm for calling substitutions, insertions and deletions across either candidate genes or the entire exome by integrating the variant calling algorithm with the dynamic programming aligner, Novoalign. We report methodology for pooled hybridization capture with pre-enrichment, indexed multiplexing of up to 48 individuals or non-indexed pooled sequencing of up to 92 individuals with as little as 70 ng of DNA per person. Modified solid phase reversible immobilization bead purification strategies enable no sample transfers from sonication in 96-well plates through adapter ligation, resulting in 50% less library preparation reagent consumption. Custom Y-shaped adapters containing novel 7 base pair index sequences with a Hamming distance of ≥2 were directly ligated onto fragmented source DNA eliminating the need for PCR to incorporate indexes, and was followed by a custom blocking strategy using a single oligonucleotide regardless of index sequence. These results were obtained aligning raw reads against the entire genome using Novoalign followed by variant calling of non-indexed pools using SPLINTER or SAMtools for indexed samples. With these pipelines, we find sensitivity and specificity of 99.4% and 99.7% for pooled exome sequencing. Sensitivity, and to a lesser degree specificity, proved to be a function of coverage. For rare variants (≤2% minor allele frequency), we achieved sensitivity and specificity of ≥94.9% and ≥99.99% for custom capture of 2.5 Mb in multiplexed libraries of 22-48 individuals with only ≥5-fold coverage/chromosome, but these parameters improved to ≥98.7 and 100% with 20-fold coverage/chromosome. This highly scalable methodology enables accurate rare variant detection, with or without individual DNA sample indexing, while reducing the amount of required source DNA and total costs through less hybridization reagent consumption, multi-sample sonication in a standard PCR plate, multiplexed pre-enrichment pooling with a single hybridization and lesser sequencing coverage required to obtain high sensitivity.
Two Low Coverage Bird Genomes and a Comparison of Reference-Guided versus De Novo Genome Assemblies
Card, Daren C.; Schield, Drew R.; Reyes-Velasco, Jacobo; Fujita, Matthew K.; Andrew, Audra L.; Oyler-McCance, Sara J.; Fike, Jennifer A.; Tomback, Diana F.; Ruggiero, Robert P.; Castoe, Todd A.
2014-01-01
As a greater number and diversity of high-quality vertebrate reference genomes become available, it is increasingly feasible to use these references to guide new draft assemblies for related species. Reference-guided assembly approaches may substantially increase the contiguity and completeness of a new genome using only low levels of genome coverage that might otherwise be insufficient for de novo genome assembly. We used low-coverage (∼3.5–5.5x) Illumina paired-end sequencing to assemble draft genomes of two bird species (the Gunnison Sage-Grouse, Centrocercus minimus, and the Clark's Nutcracker, Nucifraga columbiana). We used these data to estimate de novo genome assemblies and reference-guided assemblies, and compared the information content and completeness of these assemblies by comparing CEGMA gene set representation, repeat element content, simple sequence repeat content, and GC isochore structure among assemblies. Our results demonstrate that even lower-coverage genome sequencing projects are capable of producing informative and useful genomic resources, particularly through the use of reference-guided assemblies. PMID:25192061
CNNdel: Calling Structural Variations on Low Coverage Data Based on Convolutional Neural Networks
2017-01-01
Many structural variations (SVs) detection methods have been proposed due to the popularization of next-generation sequencing (NGS). These SV calling methods use different SV-property-dependent features; however, they all suffer from poor accuracy when running on low coverage sequences. The union of results from these tools achieves fairly high sensitivity but still produces low accuracy on low coverage sequence data. That is, these methods contain many false positives. In this paper, we present CNNdel, an approach for calling deletions from paired-end reads. CNNdel gathers SV candidates reported by multiple tools and then extracts features from aligned BAM files at the positions of candidates. With labeled feature-expressed candidates as a training set, CNNdel trains convolutional neural networks (CNNs) to distinguish true unlabeled candidates from false ones. Results show that CNNdel works well with NGS reads from 26 low coverage genomes of the 1000 Genomes Project. The paper demonstrates that convolutional neural networks can automatically assign the priority of SV features and reduce the false positives efficaciously. PMID:28630866
Kempton, Colton E.; Heninger, Justin R.; Johnson, Steven M.
2014-01-01
Nucleosomes and their positions in the eukaryotic genome play an important role in regulating gene expression by influencing accessibility to DNA. Many factors influence a nucleosome's final position in the chromatin landscape including the underlying genomic sequence. One of the primary reasons for performing in vitro nucleosome reconstitution experiments is to identify how the underlying DNA sequence will influence a nucleosome's position in the absence of other compounding cellular factors. However, concerns have been raised about the reproducibility of data generated from these kinds of experiments. Here we present data for in vitro nucleosome reconstitution experiments performed on linear plasmid DNA that demonstrate that, when coverage is deep enough, these reconstitution experiments are exquisitely reproducible and highly consistent. Our data also suggests that a coverage depth of 35X be maintained for maximal confidence when assaying nucleosome positions, but lower coverage levels may be generally sufficient. These coverage depth recommendations are sufficient in the experimental system and conditions used in this study, but may vary depending on the exact parameters used in other systems. PMID:25093869
Bilton, Timothy P.; Schofield, Matthew R.; Black, Michael A.; Chagné, David; Wilcox, Phillip L.; Dodds, Ken G.
2018-01-01
Next-generation sequencing is an efficient method that allows for substantially more markers than previous technologies, providing opportunities for building high-density genetic linkage maps, which facilitate the development of nonmodel species’ genomic assemblies and the investigation of their genes. However, constructing genetic maps using data generated via high-throughput sequencing technology (e.g., genotyping-by-sequencing) is complicated by the presence of sequencing errors and genotyping errors resulting from missing parental alleles due to low sequencing depth. If unaccounted for, these errors lead to inflated genetic maps. In addition, map construction in many species is performed using full-sibling family populations derived from the outcrossing of two individuals, where unknown parental phase and varying segregation types further complicate construction. We present a new methodology for modeling low coverage sequencing data in the construction of genetic linkage maps using full-sibling populations of diploid species, implemented in a package called GUSMap. Our model is based on the Lander–Green hidden Markov model but extended to account for errors present in sequencing data. We were able to obtain accurate estimates of the recombination fractions and overall map distance using GUSMap, while most existing mapping packages produced inflated genetic maps in the presence of errors. Our results demonstrate the feasibility of using low coverage sequencing data to produce genetic maps without requiring extensive filtering of potentially erroneous genotypes, provided that the associated errors are correctly accounted for in the model. PMID:29487138
Bilton, Timothy P; Schofield, Matthew R; Black, Michael A; Chagné, David; Wilcox, Phillip L; Dodds, Ken G
2018-05-01
Next-generation sequencing is an efficient method that allows for substantially more markers than previous technologies, providing opportunities for building high-density genetic linkage maps, which facilitate the development of nonmodel species' genomic assemblies and the investigation of their genes. However, constructing genetic maps using data generated via high-throughput sequencing technology ( e.g. , genotyping-by-sequencing) is complicated by the presence of sequencing errors and genotyping errors resulting from missing parental alleles due to low sequencing depth. If unaccounted for, these errors lead to inflated genetic maps. In addition, map construction in many species is performed using full-sibling family populations derived from the outcrossing of two individuals, where unknown parental phase and varying segregation types further complicate construction. We present a new methodology for modeling low coverage sequencing data in the construction of genetic linkage maps using full-sibling populations of diploid species, implemented in a package called GUSMap. Our model is based on the Lander-Green hidden Markov model but extended to account for errors present in sequencing data. We were able to obtain accurate estimates of the recombination fractions and overall map distance using GUSMap, while most existing mapping packages produced inflated genetic maps in the presence of errors. Our results demonstrate the feasibility of using low coverage sequencing data to produce genetic maps without requiring extensive filtering of potentially erroneous genotypes, provided that the associated errors are correctly accounted for in the model. Copyright © 2018 Bilton et al.
Low-Cost, High-Throughput Sequencing of DNA Assemblies Using a Highly Multiplexed Nextera Process.
Shapland, Elaine B; Holmes, Victor; Reeves, Christopher D; Sorokin, Elena; Durot, Maxime; Platt, Darren; Allen, Christopher; Dean, Jed; Serber, Zach; Newman, Jack; Chandran, Sunil
2015-07-17
In recent years, next-generation sequencing (NGS) technology has greatly reduced the cost of sequencing whole genomes, whereas the cost of sequence verification of plasmids via Sanger sequencing has remained high. Consequently, industrial-scale strain engineers either limit the number of designs or take short cuts in quality control. Here, we show that over 4000 plasmids can be completely sequenced in one Illumina MiSeq run for less than $3 each (15× coverage), which is a 20-fold reduction over using Sanger sequencing (2× coverage). We reduced the volume of the Nextera tagmentation reaction by 100-fold and developed an automated workflow to prepare thousands of samples for sequencing. We also developed software to track the samples and associated sequence data and to rapidly identify correctly assembled constructs having the fewest defects. As DNA synthesis and assembly become a centralized commodity, this NGS quality control (QC) process will be essential to groups operating high-throughput pipelines for DNA construction.
The diploid genome sequence of an Asian individual
Wang, Jun; Wang, Wei; Li, Ruiqiang; Li, Yingrui; Tian, Geng; Goodman, Laurie; Fan, Wei; Zhang, Junqing; Li, Jun; Zhang, Juanbin; Guo, Yiran; Feng, Binxiao; Li, Heng; Lu, Yao; Fang, Xiaodong; Liang, Huiqing; Du, Zhenglin; Li, Dong; Zhao, Yiqing; Hu, Yujie; Yang, Zhenzhen; Zheng, Hancheng; Hellmann, Ines; Inouye, Michael; Pool, John; Yi, Xin; Zhao, Jing; Duan, Jinjie; Zhou, Yan; Qin, Junjie; Ma, Lijia; Li, Guoqing; Yang, Zhentao; Zhang, Guojie; Yang, Bin; Yu, Chang; Liang, Fang; Li, Wenjie; Li, Shaochuan; Li, Dawei; Ni, Peixiang; Ruan, Jue; Li, Qibin; Zhu, Hongmei; Liu, Dongyuan; Lu, Zhike; Li, Ning; Guo, Guangwu; Zhang, Jianguo; Ye, Jia; Fang, Lin; Hao, Qin; Chen, Quan; Liang, Yu; Su, Yeyang; san, A.; Ping, Cuo; Yang, Shuang; Chen, Fang; Li, Li; Zhou, Ke; Zheng, Hongkun; Ren, Yuanyuan; Yang, Ling; Gao, Yang; Yang, Guohua; Li, Zhuo; Feng, Xiaoli; Kristiansen, Karsten; Wong, Gane Ka-Shu; Nielsen, Rasmus; Durbin, Richard; Bolund, Lars; Zhang, Xiuqing; Li, Songgang; Yang, Huanming; Wang, Jian
2009-01-01
Here we present the first diploid genome sequence of an Asian individual. The genome was sequenced to 36-fold average coverage using massively parallel sequencing technology. We aligned the short reads onto the NCBI human reference genome to 99.97% coverage, and guided by the reference genome, we used uniquely mapped reads to assemble a high-quality consensus sequence for 92% of the Asian individual's genome. We identified approximately 3 million single-nucleotide polymorphisms (SNPs) inside this region, of which 13.6% were not in the dbSNP database. Genotyping analysis showed that SNP identification had high accuracy and consistency, indicating the high sequence quality of this assembly. We also carried out heterozygote phasing and haplotype prediction against HapMap CHB and JPT haplotypes (Chinese and Japanese, respectively), sequence comparison with the two available individual genomes (J. D. Watson and J. C. Venter), and structural variation identification. These variations were considered for their potential biological impact. Our sequence data and analyses demonstrate the potential usefulness of next-generation sequencing technologies for personal genomics. PMID:18987735
Software for pre-processing Illumina next-generation sequencing short read sequences
2014-01-01
Background When compared to Sanger sequencing technology, next-generation sequencing (NGS) technologies are hindered by shorter sequence read length, higher base-call error rate, non-uniform coverage, and platform-specific sequencing artifacts. These characteristics lower the quality of their downstream analyses, e.g. de novo and reference-based assembly, by introducing sequencing artifacts and errors that may contribute to incorrect interpretation of data. Although many tools have been developed for quality control and pre-processing of NGS data, none of them provide flexible and comprehensive trimming options in conjunction with parallel processing to expedite pre-processing of large NGS datasets. Methods We developed ngsShoRT (next-generation sequencing Short Reads Trimmer), a flexible and comprehensive open-source software package written in Perl that provides a set of algorithms commonly used for pre-processing NGS short read sequences. We compared the features and performance of ngsShoRT with existing tools: CutAdapt, NGS QC Toolkit and Trimmomatic. We also compared the effects of using pre-processed short read sequences generated by different algorithms on de novo and reference-based assembly for three different genomes: Caenorhabditis elegans, Saccharomyces cerevisiae S288c, and Escherichia coli O157 H7. Results Several combinations of ngsShoRT algorithms were tested on publicly available Illumina GA II, HiSeq 2000, and MiSeq eukaryotic and bacteria genomic short read sequences with the focus on removing sequencing artifacts and low-quality reads and/or bases. Our results show that across three organisms and three sequencing platforms, trimming improved the mean quality scores of trimmed sequences. Using trimmed sequences for de novo and reference-based assembly improved assembly quality as well as assembler performance. In general, ngsShoRT outperformed comparable trimming tools in terms of trimming speed and improvement of de novo and reference-based assembly as measured by assembly contiguity and correctness. Conclusions Trimming of short read sequences can improve the quality of de novo and reference-based assembly and assembler performance. The parallel processing capability of ngsShoRT reduces trimming time and improves the memory efficiency when dealing with large datasets. We recommend combining sequencing artifacts removal, and quality score based read filtering and base trimming as the most consistent method for improving sequence quality and downstream assemblies. ngsShoRT source code, user guide and tutorial are available at http://research.bioinformatics.udel.edu/genomics/ngsShoRT/. ngsShoRT can be incorporated as a pre-processing step in genome and transcriptome assembly projects. PMID:24955109
Spiliopoulou, Athina; Colombo, Marco; Orchard, Peter; Agakov, Felix; McKeigue, Paul
2017-01-01
We address the task of genotype imputation to a dense reference panel given genotype likelihoods computed from ultralow coverage sequencing as inputs. In this setting, the data have a high-level of missingness or uncertainty, and are thus more amenable to a probabilistic representation. Most existing imputation algorithms are not well suited for this situation, as they rely on prephasing for computational efficiency, and, without definite genotype calls, the prephasing task becomes computationally expensive. We describe GeneImp, a program for genotype imputation that does not require prephasing and is computationally tractable for whole-genome imputation. GeneImp does not explicitly model recombination, instead it capitalizes on the existence of large reference panels—comprising thousands of reference haplotypes—and assumes that the reference haplotypes can adequately represent the target haplotypes over short regions unaltered. We validate GeneImp based on data from ultralow coverage sequencing (0.5×), and compare its performance to the most recent version of BEAGLE that can perform this task. We show that GeneImp achieves imputation quality very close to that of BEAGLE, using one to two orders of magnitude less time, without an increase in memory complexity. Therefore, GeneImp is the first practical choice for whole-genome imputation to a dense reference panel when prephasing cannot be applied, for instance, in datasets produced via ultralow coverage sequencing. A related future application for GeneImp is whole-genome imputation based on the off-target reads from deep whole-exome sequencing. PMID:28348060
A computational method for detecting copy number variations using scale-space filtering
2013-01-01
Background As next-generation sequencing technology made rapid and cost-effective sequencing available, the importance of computational approaches in finding and analyzing copy number variations (CNVs) has been amplified. Furthermore, most genome projects need to accurately analyze sequences with fairly low-coverage read data. It is urgently needed to develop a method to detect the exact types and locations of CNVs from low coverage read data. Results Here, we propose a new CNV detection method, CNV_SS, which uses scale-space filtering. The scale-space filtering is evaluated by applying to the read coverage data the Gaussian convolution for various scales according to a given scaling parameter. Next, by differentiating twice and finding zero-crossing points, inflection points of scale-space filtered read coverage data are calculated per scale. Then, the types and the exact locations of CNVs are obtained by analyzing the finger print map, the contours of zero-crossing points for various scales. Conclusions The performance of CNV_SS showed that FNR and FPR stay in the range of 1.27% to 2.43% and 1.14% to 2.44%, respectively, even at a relatively low coverage (0.5x ≤C ≤2x). CNV_SS gave also much more effective results than the conventional methods in the evaluation of FNR, at 3.82% at least and 76.97% at most even when the coverage level of read data is low. CNV_SS source code is freely available from http://dblab.hallym.ac.kr/CNV SS/. PMID:23418726
Error and Error Mitigation in Low-Coverage Genome Assemblies
Hubisz, Melissa J.; Lin, Michael F.; Kellis, Manolis; Siepel, Adam
2011-01-01
The recent release of twenty-two new genome sequences has dramatically increased the data available for mammalian comparative genomics, but twenty of these new sequences are currently limited to ∼2× coverage. Here we examine the extent of sequencing error in these 2× assemblies, and its potential impact in downstream analyses. By comparing 2× assemblies with high-quality sequences from the ENCODE regions, we estimate the rate of sequencing error to be 1–4 errors per kilobase. While this error rate is fairly modest, sequencing error can still have surprising effects. For example, an apparent lineage-specific insertion in a coding region is more likely to reflect sequencing error than a true biological event, and the length distribution of coding indels is strongly distorted by error. We find that most errors are contributed by a small fraction of bases with low quality scores, in particular, by the ends of reads in regions of single-read coverage in the assembly. We explore several approaches for automatic sequencing error mitigation (SEM), making use of the localized nature of sequencing error, the fact that it is well predicted by quality scores, and information about errors that comes from comparisons across species. Our automatic methods for error mitigation cannot replace the need for additional sequencing, but they do allow substantial fractions of errors to be masked or eliminated at the cost of modest amounts of over-correction, and they can reduce the impact of error in downstream phylogenomic analyses. Our error-mitigated alignments are available for download. PMID:21340033
A Case Study into Microbial Genome Assembly Gap Sequences and Finishing Strategies.
Utturkar, Sagar M; Klingeman, Dawn M; Hurt, Richard A; Brown, Steven D
2017-01-01
This study characterized regions of DNA which remained unassembled by either PacBio and Illumina sequencing technologies for seven bacterial genomes. Two genomes were manually finished using bioinformatics and PCR/Sanger sequencing approaches and regions not assembled by automated software were analyzed. Gaps present within Illumina assemblies mostly correspond to repetitive DNA regions such as multiple rRNA operon sequences. PacBio gap sequences were evaluated for several properties such as GC content, read coverage, gap length, ability to form strong secondary structures, and corresponding annotations. Our hypothesis that strong secondary DNA structures blocked DNA polymerases and contributed to gap sequences was not accepted. PacBio assemblies had few limitations overall and gaps were explained as cumulative effect of lower than average sequence coverage and repetitive sequences at contig termini. An important aspect of the present study is the compilation of biological features that interfered with assembly and included active transposons, multiple plasmid sequences, phage DNA integration, and large sequence duplication. Our targeted genome finishing approach and systematic evaluation of the unassembled DNA will be useful for others looking to close, finish, and polish microbial genome sequences.
Eaton, Deren A R; Spriggs, Elizabeth L; Park, Brian; Donoghue, Michael J
2017-05-01
Restriction-site associated DNA (RAD) sequencing and related methods rely on the conservation of enzyme recognition sites to isolate homologous DNA fragments for sequencing, with the consequence that mutations disrupting these sites lead to missing information. There is thus a clear expectation for how missing data should be distributed, with fewer loci recovered between more distantly related samples. This observation has led to a related expectation: that RAD-seq data are insufficiently informative for resolving deeper scale phylogenetic relationships. Here we investigate the relationship between missing information among samples at the tips of a tree and information at edges within it. We re-analyze and review the distribution of missing data across ten RAD-seq data sets and carry out simulations to determine expected patterns of missing information. We also present new empirical results for the angiosperm clade Viburnum (Adoxaceae, with a crown age >50 Ma) for which we examine phylogenetic information at different depths in the tree and with varied sequencing effort. The total number of loci, the proportion that are shared, and phylogenetic informativeness varied dramatically across the examined RAD-seq data sets. Insufficient or uneven sequencing coverage accounted for similar proportions of missing data as dropout from mutation-disruption. Simulations reveal that mutation-disruption, which results in phylogenetically distributed missing data, can be distinguished from the more stochastic patterns of missing data caused by low sequencing coverage. In Viburnum, doubling sequencing coverage nearly doubled the number of parsimony informative sites, and increased by >10X the number of loci with data shared across >40 taxa. Our analysis leads to a set of practical recommendations for maximizing phylogenetic information in RAD-seq studies. [hierarchical redundancy; phylogenetic informativeness; quartet informativeness; Restriction-site associated DNA (RAD) sequencing; sequencing coverage; Viburnum.]. © The authors 2016. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For permissions, please e-mail: journals.permission@oup.com.
The Genome Sequence of Taurine Cattle: A window to ruminant biology and evolution
Elsik, Christine G.; Tellam, Ross L.; Worley, Kim C.
2010-01-01
To understand the biology and evolution of ruminants, the cattle genome was sequenced to ∼7× coverage. The cattle genome contains a minimum of 22,000 genes, with a core set of 14,345 orthologs shared among seven mammalian species of which 1,217 are absent or undetected in non-eutherian (marsupial or monotreme) genomes. Cattle-specific evolutionary breakpoint regions in chromosomes have a higher density of segmental duplications, enrichment of repetitive elements, and species-specific variations in genes associated with lactation and immune responsiveness. Genes involved in metabolism are generally highly conserved, although five metabolic genes are deleted or extensively diverged from their human orthologs. The cattle genome sequence thus provides an enabling resource for understanding mammalian evolution and accelerating livestock genetic improvement for milk and meat production. PMID:19390049
Trosman, Julia R; Weldon, Christine B; Kelley, R Kate; Phillips, Kathryn A
2015-03-01
Next-generation tumor sequencing (NGTS) panels, which include multiple established and novel targets across cancers, are emerging in oncology practice, but lack formal positive coverage by US payers. Lack of coverage may impact access and adoption. This study identified challenges of NGTS coverage by private payers. We conducted semi-structured interviews with 14 NGTS experts on potential NGTS benefits, and with 10 major payers, representing more than 125,000,000 enrollees, on NGTS coverage considerations. We used the framework approach of qualitative research for study design and thematic analyses and simple frequencies to further describe findings. All interviewed payers see potential NGTS benefits, but all noted challenges to formal coverage: 80% state that inherent features of NGTS do not fit the medical necessity definition required for coverage, 70% view NGTS as a bundle of targets versus comprehensive tumor characterization and may evaluate each target individually, and 70% express skepticism regarding new evidence methods proposed for NGTS. Fifty percent of payers expressed sufficient concerns about NGTS adoption and implementation that will preclude their ability to issue positive coverage policies. Payers perceive that NGTS holds significant promise but, in its current form, poses disruptive challenges to coverage policy frameworks. Proactive multidisciplinary efforts to define the direction for NGTS development, evidence generation, and incorporation into coverage policy are necessary to realize its promise and provide patient access. This study contributes to current literature, as possibly the first study to directly interview US payers on NGTS coverage and reimbursement. Copyright © 2015 by the National Comprehensive Cancer Network.
Trosman, Julia R.; Weldon, Christine B.; Kate Kelley, R.; Phillips, Kathryn A.
2015-01-01
Background Next-generation tumor sequencing (NGTS) panels, which include multiple established and novel targets across cancers, are emerging in oncology practice, but lack formal positive coverage by US payers. Lack of coverage may impact access and adoption. This study identified challenges of NGTS coverage by private payers. Methods We conducted semi-structured interviews with 14 NGTS experts on potential NGTS benefits, and with 10 major payers, representing more than 125,000,000 enrollees, on NGTS coverage considerations. We used the framework approach of qualitative research for study design and thematic analyses and simple frequencies to further describe findings. Results All interviewed payers see potential NGTS benefits, but all noted challenges to formal coverage: 80% state that inherent features of NGTS do not fit the medical necessity definition required for coverage, 70% view NGTS as a bundle of targets versus comprehensive tumor characterization and may evaluate each target individually, and 70% express skepticism regarding new evidence methods proposed for NGTS. Fifty percent of payers expressed sufficient concerns about NGTS adoption and implementation that will preclude their ability to issue positive coverage policies. Conclusions Payers perceive that NGTS holds significant promise but, in its current form, poses disruptive challenges to coverage policy frameworks. Proactive multidisciplinary efforts to define the direction for NGTS development, evidence generation, and incorporation into coverage policy are necessary to realize its promise and provide patient access. This study contributes to current literature, as possibly the first study to directly interview US payers on NGTS coverage and reimbursement. PMID:25736008
Deep whole-genome sequencing of 90 Han Chinese genomes.
Lan, Tianming; Lin, Haoxiang; Zhu, Wenjuan; Laurent, Tellier Christian Asker Melchior; Yang, Mengcheng; Liu, Xin; Wang, Jun; Wang, Jian; Yang, Huanming; Xu, Xun; Guo, Xiaosen
2017-09-01
Next-generation sequencing provides a high-resolution insight into human genetic information. However, the focus of previous studies has primarily been on low-coverage data due to the high cost of sequencing. Although the 1000 Genomes Project and the Haplotype Reference Consortium have both provided powerful reference panels for imputation, low-frequency and novel variants remain difficult to discover and call with accuracy on the basis of low-coverage data. Deep sequencing provides an optimal solution for the problem of these low-frequency and novel variants. Although whole-exome sequencing is also a viable choice for exome regions, it cannot account for noncoding regions, sometimes resulting in the absence of important, causal variants. For Han Chinese populations, the majority of variants have been discovered based upon low-coverage data from the 1000 Genomes Project. However, high-coverage, whole-genome sequencing data are limited for any population, and a large amount of low-frequency, population-specific variants remain uncharacterized. We have performed whole-genome sequencing at a high depth (∼×80) of 90 unrelated individuals of Chinese ancestry, collected from the 1000 Genomes Project samples, including 45 Northern Han Chinese and 45 Southern Han Chinese samples. Eighty-three of these 90 have been sequenced by the 1000 Genomes Project. We have identified 12 568 804 single nucleotide polymorphisms, 2 074 210 short InDels, and 26 142 structural variations from these 90 samples. Compared to the Han Chinese data from the 1000 Genomes Project, we have found 7 000 629 novel variants with low frequency (defined as minor allele frequency < 5%), including 5 813 503 single nucleotide polymorphisms, 1 169 199 InDels, and 17 927 structural variants. Using deep sequencing data, we have built a greatly expanded spectrum of genetic variation for the Han Chinese genome. Compared to the 1000 Genomes Project, these Han Chinese deep sequencing data enhance the characterization of a large number of low-frequency, novel variants. This will be a valuable resource for promoting Chinese genetics research and medical development. Additionally, it will provide a valuable supplement to the 1000 Genomes Project, as well as to other human genome projects. © The Authors 2017. Published by Oxford University Press.
Evaluating the Detection of Hydrocarbon-Degrading Bacteria in 16S rRNA Gene Sequencing Surveys
Berry, David; Gutierrez, Tony
2017-01-01
Hydrocarbonoclastic bacteria (HCB) play a key role in the biodegradation of oil hydrocarbons in marine and other environments. A small number of taxa have been identified as obligate HCB, notably the Gammaproteobacterial genera Alcanivorax, Cycloclasticus, Marinobacter, Neptumonas, Oleiphilus, Oleispira, and Thalassolituus, as well as the Alphaproteobacterial genus Thalassospira. Detection of HCB in amplicon-based sequencing surveys relies on high coverage by PCR primers and accurate taxonomic classification. In this study, we performed a phylogenetic analysis to identify 16S rRNA gene sequence regions that represent the breadth of sequence diversity within these taxa. Using validated sequences, we evaluated 449 universal 16S rRNA gene-targeted bacterial PCR primer pairs for their coverage of these taxa. The results of this analysis provide a practical framework for selection of suitable primer sets for optimal detection of HCB in sequencing surveys. PMID:28567035
Evaluating the Detection of Hydrocarbon-Degrading Bacteria in 16S rRNA Gene Sequencing Surveys.
Berry, David; Gutierrez, Tony
2017-01-01
Hydrocarbonoclastic bacteria (HCB) play a key role in the biodegradation of oil hydrocarbons in marine and other environments. A small number of taxa have been identified as obligate HCB, notably the Gammaproteobacterial genera Alcanivorax, Cycloclasticus, Marinobacter, Neptumonas, Oleiphilus, Oleispira , and Thalassolituus , as well as the Alphaproteobacterial genus Thalassospira . Detection of HCB in amplicon-based sequencing surveys relies on high coverage by PCR primers and accurate taxonomic classification. In this study, we performed a phylogenetic analysis to identify 16S rRNA gene sequence regions that represent the breadth of sequence diversity within these taxa. Using validated sequences, we evaluated 449 universal 16S rRNA gene-targeted bacterial PCR primer pairs for their coverage of these taxa. The results of this analysis provide a practical framework for selection of suitable primer sets for optimal detection of HCB in sequencing surveys.
The Bologna Annotation Resource (BAR 3.0): improving protein functional annotation.
Profiti, Giuseppe; Martelli, Pier Luigi; Casadio, Rita
2017-07-03
BAR 3.0 updates our server BAR (Bologna Annotation Resource) for predicting protein structural and functional features from sequence. We increase data volume, query capabilities and information conveyed to the user. The core of BAR 3.0 is a graph-based clustering procedure of UniProtKB sequences, following strict pairwise similarity criteria (sequence identity ≥40% with alignment coverage ≥90%). Each cluster contains the available annotation downloaded from UniProtKB, GO, PFAM and PDB. After statistical validation, GO terms and PFAM domains are cluster-specific and annotate new sequences entering the cluster after satisfying similarity constraints. BAR 3.0 includes 28 869 663 sequences in 1 361 773 clusters, of which 22.2% (22 241 661 sequences) and 47.4% (24 555 055 sequences) have at least one validated GO term and one PFAM domain, respectively. 1.4% of the clusters (36% of all sequences) include PDB structures and the cluster is associated to a hidden Markov model that allows building template-target alignment suitable for structural modeling. Some other 3 399 026 sequences are singletons. BAR 3.0 offers an improved search interface, allowing queries by UniProtKB-accession, Fasta sequence, GO-term, PFAM-domain, organism, PDB and ligand/s. When evaluated on the CAFA2 targets, BAR 3.0 largely outperforms our previous version and scores among state-of-the-art methods. BAR 3.0 is publicly available and accessible at http://bar.biocomp.unibo.it/bar3. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
USDA-ARS?s Scientific Manuscript database
The current pig reference genome sequence (Sscrofa10.2) was established using Sanger sequencing and following the clone-by-clone hierarchical shotgun sequencing approach used in the public human genome project. However, as sequence coverage was low (4-6x) the resulting assembly was only of draft qua...
Accurate and exact CNV identification from targeted high-throughput sequence data.
Nord, Alex S; Lee, Ming; King, Mary-Claire; Walsh, Tom
2011-04-12
Massively parallel sequencing of barcoded DNA samples significantly increases screening efficiency for clinically important genes. Short read aligners are well suited to single nucleotide and indel detection. However, methods for CNV detection from targeted enrichment are lacking. We present a method combining coverage with map information for the identification of deletions and duplications in targeted sequence data. Sequencing data is first scanned for gains and losses using a comparison of normalized coverage data between samples. CNV calls are confirmed by testing for a signature of sequences that span the CNV breakpoint. With our method, CNVs can be identified regardless of whether breakpoints are within regions targeted for sequencing. For CNVs where at least one breakpoint is within targeted sequence, exact CNV breakpoints can be identified. In a test data set of 96 subjects sequenced across ~1 Mb genomic sequence using multiplexing technology, our method detected mutations as small as 31 bp, predicted quantitative copy count, and had a low false-positive rate. Application of this method allows for identification of gains and losses in targeted sequence data, providing comprehensive mutation screening when combined with a short read aligner.
Thompson, Fabiano L; Bruce, Thiago; Gonzalez, Alessandra; Cardoso, Alexander; Clementino, Maysa; Costagliola, Marcela; Hozbor, Constanza; Otero, Ernesto; Piccini, Claudia; Peressutti, Silvia; Schmieder, Robert; Edwards, Robert; Smith, Mathew; Takiyama, Luis Roberto; Vieira, Ricardo; Paranhos, Rodolfo; Artigas, Luis Felipe
2011-02-01
The bacterioplankton diversity of coastal waters along a latitudinal gradient between Puerto Rico and Argentina was analyzed using a total of 134,197 high-quality sequences from the V6 hypervariable region of the small-subunit ribosomal RNA gene (16S rRNA) (mean length of 60 nt). Most of the OTUs were identified into Proteobacteria, Bacteriodetes, Cyanobacteria, and Actinobacteria, corresponding to approx. 80% of the total number of sequences. The number of OTUs corresponding to species varied between 937 and 1946 in the seven locations. Proteobacteria appeared at high frequency in the seven locations. An enrichment of Cyanobacteria was observed in Puerto Rico, whereas an enrichment of Bacteroidetes was detected in the Argentinian shelf and Uruguayan coastal lagoons. The highest number of sequences of Actinobacteria and Acidobacteria were obtained in the Amazon estuary mouth. The rarefaction curves and Good coverage estimator for species diversity suggested a significant coverage, with values ranging between 92 and 97% for Good coverage. Conserved taxa corresponded to aprox. 52% of all sequences. This study suggests that human-contaminated environments may influence bacterioplankton diversity.
Mowlaboccus, Shakeel; Perkins, Timothy T.; Smith, Helen; Sloots, Theo; Tozer, Sarah; Prempeh, Lydia-Jessica; Tay, Chin Yen; Peters, Fanny; Speers, David; Keil, Anthony D.; Kahler, Charlene M.
2016-01-01
Neisseria meningitidis is the causative agent of invasive meningococcal disease (IMD). The BEXSERO® vaccine which is used to prevent serogroup B disease is composed of four sub-capsular protein antigens supplemented with an outer membrane vesicle. Since the sub-capsular protein antigens are variably expressed and antigenically variable amongst meningococcal isolates, vaccine coverage can be estimated by the meningococcal antigen typing system (MATS) which measures the propensity of the strain to be killed by vaccinated sera. Whole genome sequencing (WGS) which identifies the alleles of the antigens that may be recognised by the antibody response could represent, in future, an alternative estimate of coverage. In this study, WGS of 278 meningococcal isolates responsible for 62% of IMD in Western Australia from 2000–2014 were analysed for association of genetic lineage (sequence type [ST], clonal complex [cc]) with BEXSERO® antigen sequence type (BAST) and MATS to predict the annual vaccine coverage. A hyper-endemic period of IMD between 2000–05 was caused by cc41/44 with the major sequence type of ST-146 which was not predicted by MATS or BAST to be covered by the vaccine. An increase in serogroup diversity was observed between 2010–14 with the emergence of cc11 serogroup W in the adolescent population and cc23 serogroup Y in the elderly. BASTs were statistically associated with clonal complex although individual antigens underwent antigenic drift from the major type. BAST and MATS predicted an annual range of 44–91% vaccine coverage. Periods of low vaccine coverage in years post-2005 were not a result of the resurgence of cc41/44:ST-146 but were characterised by increased diversity of clonal complexes expressing BASTs which were not predicted by MATS to be covered by the vaccine. The driving force behind the diversity of the clonal complex and BAST during these periods of low vaccine coverage is unknown, but could be due to immune selection and inter-strain competition with carriage of non-disease causing meningococci. PMID:27355628
Optimization of conditions to sequence long cDNAs from viruses
USDA-ARS?s Scientific Manuscript database
Fourth generation sequencing with the Minion nanopore sequencer provides opportunity to obtain deep coverage and long read for single molecules. This will benefit studies on RNA viruses. In the past, Sanger, Illumina, and Ion Torrent sequencing have been utilized to study RNA viruses. Both technique...
Maroso, F; Hillen, J E J; Pardo, B G; Gkagkavouzis, K; Coscia, I; Hermida, M; Franch, R; Hellemans, B; Van Houdt, J; Simionati, B; Taggart, J B; Nielsen, E E; Maes, G; Ciavaglia, S A; Webster, L M I; Volckaert, F A M; Martinez, P; Bargelloni, L; Ogden, R
2018-06-01
The development of Genotyping-By-Sequencing (GBS) technologies enables cost-effective analysis of large numbers of Single Nucleotide Polymorphisms (SNPs), especially in "non-model" species. Nevertheless, as such technologies enter a mature phase, biases and errors inherent to GBS are becoming evident. Here, we evaluated the performance of double digest Restriction enzyme Associated DNA (ddRAD) sequencing in SNP genotyping studies including high number of samples. Datasets of sequence data were generated from three marine teleost species (>5500 samples, >2.5 × 10 12 bases in total), using a standardized protocol. A common bioinformatics pipeline based on STACKS was established, with and without the use of a reference genome. We performed analyses throughout the production and analysis of ddRAD data in order to explore (i) the loss of information due to heterogeneous raw read number across samples; (ii) the discrepancy between expected and observed tag length and coverage; (iii) the performances of reference based vs. de novo approaches; (iv) the sources of potential genotyping errors of the library preparation/bioinformatics protocol, by comparing technical replicates. Our results showed use of a reference genome and a posteriori genotype correction improved genotyping precision. Individual read coverage was a key variable for reproducibility; variance in sequencing depth between loci in the same individual was also identified as an important factor and found to correlate to tag length. A comparison of downstream analysis carried out with ddRAD vs single SNP allele specific assay genotypes provided information about the levels of genotyping imprecision that can have a significant impact on allele frequency estimations and population assignment. The results and insights presented here will help to select and improve approaches to the analysis of large datasets based on RAD-like methodologies. Crown Copyright © 2018. Published by Elsevier B.V. All rights reserved.
Han, Lin; Wu, Hua-Jun; Zhu, Haiying; Kim, Kun-Yong; Marjani, Sadie L.; Riester, Markus; Euskirchen, Ghia; Zi, Xiaoyuan; Yang, Jennifer; Han, Jasper; Snyder, Michael; Park, In-Hyun; Irizarry, Rafael; Weissman, Sherman M.
2017-01-01
Abstract Conventional DNA bisulfite sequencing has been extended to single cell level, but the coverage consistency is insufficient for parallel comparison. Here we report a novel method for genome-wide CpG island (CGI) methylation sequencing for single cells (scCGI-seq), combining methylation-sensitive restriction enzyme digestion and multiple displacement amplification for selective detection of methylated CGIs. We applied this method to analyzing single cells from two types of hematopoietic cells, K562 and GM12878 and small populations of fibroblasts and induced pluripotent stem cells. The method detected 21 798 CGIs (76% of all CGIs) per cell, and the number of CGIs consistently detected from all 16 profiled single cells was 20 864 (72.7%), with 12 961 promoters covered. This coverage represents a substantial improvement over results obtained using single cell reduced representation bisulfite sequencing, with a 66-fold increase in the fraction of consistently profiled CGIs across individual cells. Single cells of the same type were more similar to each other than to other types, but also displayed epigenetic heterogeneity. The method was further validated by comparing the CpG methylation pattern, methylation profile of CGIs/promoters and repeat regions and 41 classes of known regulatory markers to the ENCODE data. Although not every minor methylation differences between cells are detectable, scCGI-seq provides a solid tool for unsupervised stratification of a heterogeneous cell population. PMID:28126923
Austin, Christopher M; Hammer, Michael P; Lee, Yin Peng; Gan, Han Ming
2018-01-01
Abstract Background Some of the most widely recognized coral reef fishes are clownfish or anemonefish, members of the family Pomacentridae (subfamily: Amphiprioninae). They are popular aquarium species due to their bright colours, adaptability to captivity, and fascinating behavior. Their breeding biology (sequential hermaphrodites) and symbiotic mutualism with sea anemones have attracted much scientific interest. Moreover, there are some curious geographic-based phenotypes that warrant investigation. Leveraging on the advancement in Nanopore long read technology, we report the first hybrid assembly of the clown anemonefish (Amphiprion ocellaris) genome utilizing Illumina and Nanopore reads, further demonstrating the substantial impact of modest long read sequencing data sets on improving genome assembly statistics. Results We generated 43 Gb of short Illumina reads and 9 Gb of long Nanopore reads, representing approximate genome coverage of 54× and 11×, respectively, based on the range of estimated k-mer-predicted genome sizes of between 791 and 967 Mbp. The final assembled genome is contained in 6404 scaffolds with an accumulated length of 880 Mb (96.3% BUSCO-calculated genome completeness). Compared with the Illumina-only assembly, the hybrid approach generated 94% fewer scaffolds with an 18-fold increase in N50 length (401 kb) and increased the genome completeness by an additional 16%. A total of 27 240 high-quality protein-coding genes were predicted from the clown anemonefish, 26 211 (96%) of which were annotated functionally with information from either sequence homology or protein signature searches. Conclusions We present the first genome of any anemonefish and demonstrate the value of low coverage (∼11×) long Nanopore read sequencing in improving both genome assembly contiguity and completeness. The near-complete assembly of the A. ocellaris genome will be an invaluable molecular resource for supporting a range of genetic, genomic, and phylogenetic studies specifically for clownfish and more generally for other related fish species of the family Pomacentridae. PMID:29342277
Tan, Mun Hua; Austin, Christopher M; Hammer, Michael P; Lee, Yin Peng; Croft, Laurence J; Gan, Han Ming
2018-03-01
Some of the most widely recognized coral reef fishes are clownfish or anemonefish, members of the family Pomacentridae (subfamily: Amphiprioninae). They are popular aquarium species due to their bright colours, adaptability to captivity, and fascinating behavior. Their breeding biology (sequential hermaphrodites) and symbiotic mutualism with sea anemones have attracted much scientific interest. Moreover, there are some curious geographic-based phenotypes that warrant investigation. Leveraging on the advancement in Nanopore long read technology, we report the first hybrid assembly of the clown anemonefish (Amphiprion ocellaris) genome utilizing Illumina and Nanopore reads, further demonstrating the substantial impact of modest long read sequencing data sets on improving genome assembly statistics. We generated 43 Gb of short Illumina reads and 9 Gb of long Nanopore reads, representing approximate genome coverage of 54× and 11×, respectively, based on the range of estimated k-mer-predicted genome sizes of between 791 and 967 Mbp. The final assembled genome is contained in 6404 scaffolds with an accumulated length of 880 Mb (96.3% BUSCO-calculated genome completeness). Compared with the Illumina-only assembly, the hybrid approach generated 94% fewer scaffolds with an 18-fold increase in N50 length (401 kb) and increased the genome completeness by an additional 16%. A total of 27 240 high-quality protein-coding genes were predicted from the clown anemonefish, 26 211 (96%) of which were annotated functionally with information from either sequence homology or protein signature searches. We present the first genome of any anemonefish and demonstrate the value of low coverage (∼11×) long Nanopore read sequencing in improving both genome assembly contiguity and completeness. The near-complete assembly of the A. ocellaris genome will be an invaluable molecular resource for supporting a range of genetic, genomic, and phylogenetic studies specifically for clownfish and more generally for other related fish species of the family Pomacentridae.
King, Timothy L.; Eackles, Michael S.; Reshetnikov, Andrey N.
2015-01-01
Human-mediated translocations and subsequent large-scale colonization by the invasive fish rotan (Perccottus glenii Dybowski, 1877; Perciformes, Odontobutidae), also known as Amur or Chinese sleeper, has resulted in dramatic transformations of small lentic ecosystems. However, no detailed genetic information exists on population structure, levels of effective movement, or relatedness among geographic populations of P. glenii within the European part of the range. We used massively parallel genomic DNA shotgun sequencing on the semiconductor-based Ion Torrent Personal Genome Machine (PGM) sequencing platform to identify nuclear microsatellite and mitochondrial DNA sequences in P. glenii from European Russia. Here we describe the characterization of nine nuclear microsatellite loci, ascertain levels of allelic diversity, heterozygosity, and demographic status of P. glenii collected from Ilev, Russia, one of several initial introduction points in European Russia. In addition, we mapped sequence reads to the complete P. glenii mitochondrial DNA sequence to identify polymorphic regions. Nuclear microsatellite markers developed for P. glenii yielded sufficient genetic diversity to: (1) produce unique multilocus genotypes; (2) elucidate structure among geographic populations; and (3) provide unique perspectives for analysis of population sizes and historical demographics. Among 4.9 million filtered P. glenii Ion Torrent PGM sequence reads, 11,304 mapped to the mitochondrial genome (NC_020350). This resulted in 100 % coverage of this genome to a mean coverage depth of 102X. A total of 130 variable sites were observed between the publicly available genome from China and the studied composite mitochondrial genome. Among these, 82 were diagnostic and monomorphic between the mitochondrial genomes and distributed among 15 genome regions. The polymorphic sites (N = 48) were distributed among 11 mitochondrial genome regions. Our results also indicate that sequence reads generated from two three-hour runs on the Ion Torrent PGM can generate a sufficient number of nuclear and mitochondrial markers to improve understanding of the evolutionary and ecological dynamics of non-model and in particular, invasive species.
Wang, Anqi; Wang, Zhanyu; Li, Zheng; Li, Lei M
2018-06-15
It is highly desirable to assemble genomes of high continuity and consistency at low cost. The current bottleneck of draft genome continuity using the second generation sequencing (SGS) reads is primarily caused by uncertainty among repetitive sequences. Even though the single-molecule real-time sequencing technology is very promising to overcome the uncertainty issue, its relatively high cost and error rate add burden on budget or computation. Many long-read assemblers take the overlap-layout-consensus (OLC) paradigm, which is less sensitive to sequencing errors, heterozygosity and variability of coverage. However, current assemblers of SGS data do not sufficiently take advantage of the OLC approach. Aiming at minimizing uncertainty, the proposed method BAUM, breaks the whole genome into regions by adaptive unique mapping; then the local OLC is used to assemble each region in parallel. BAUM can (i) perform reference-assisted assembly based on the genome of a close species (ii) or improve the results of existing assemblies that are obtained based on short or long sequencing reads. The tests on two eukaryote genomes, a wild rice Oryza longistaminata and a parrot Melopsittacus undulatus, show that BAUM achieved substantial improvement on genome size and continuity. Besides, BAUM reconstructed a considerable amount of repetitive regions that failed to be assembled by existing short read assemblers. We also propose statistical approaches to control the uncertainty in different steps of BAUM. http://www.zhanyuwang.xin/wordpress/index.php/2017/07/21/baum. Supplementary data are available at Bioinformatics online.
NASA Astrophysics Data System (ADS)
He, Lidong; Anderson, Lissa C.; Barnidge, David R.; Murray, David L.; Hendrickson, Christopher L.; Marshall, Alan G.
2017-05-01
With the rapid growth of therapeutic monoclonal antibodies (mAbs), stringent quality control is needed to ensure clinical safety and efficacy. Monoclonal antibody primary sequence and post-translational modifications (PTM) are conventionally analyzed with labor-intensive, bottom-up tandem mass spectrometry (MS/MS), which is limited by incomplete peptide sequence coverage and introduction of artifacts during the lengthy analysis procedure. Here, we describe top-down and middle-down approaches with the advantages of fast sample preparation with minimal artifacts, ultrahigh mass accuracy, and extensive residue cleavages by use of 21 tesla FT-ICR MS/MS. The ultrahigh mass accuracy yields an RMS error of 0.2-0.4 ppm for antibody light chain, heavy chain, heavy chain Fc/2, and Fd subunits. The corresponding sequence coverages are 81%, 38%, 72%, and 65% with MS/MS RMS error 4 ppm. Extension to a monoclonal antibody in human serum as a monoclonal gammopathy model yielded 53% sequence coverage from two nano-LC MS/MS runs. A blind analysis of five therapeutic monoclonal antibodies at clinically relevant concentrations in human serum resulted in correct identification of all five antibodies. Nano-LC 21 T FT-ICR MS/MS provides nonpareil mass resolution, mass accuracy, and sequence coverage for mAbs, and sets a benchmark for MS/MS analysis of multiple mAbs in serum. This is the first time that extensive cleavages for both variable and constant regions have been achieved for mAbs in a human serum background.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Roux, Simon; Emerson, Joanne B.; Eloe-Fadrosh, Emiley A.
BackgroundViral metagenomics (viromics) is increasingly used to obtain uncultivated viral genomes, evaluate community diversity, and assess ecological hypotheses. While viromic experimental methods are relatively mature and widely accepted by the research community, robust bioinformatics standards remain to be established. Here we usedin silicomock viral communities to evaluate the viromic sequence-to-ecological-inference pipeline, including (i) read pre-processing and metagenome assembly, (ii) thresholds applied to estimate viral relative abundances based on read mapping to assembled contigs, and (iii) normalization methods applied to the matrix of viral relative abundances for alpha and beta diversity estimates. ResultsTools specifically designed for metagenomes, specifically metaSPAdes, MEGAHIT, andmore » IDBA-UD, were the most effective at assembling viromes. Read pre-processing, such as partitioning, had virtually no impact on assembly output, but may be useful when hardware is limited. Viral populations with 2–5 × coverage typically assembled well, whereas lesser coverage led to fragmented assembly. Strain heterogeneity within populations hampered assembly, especially when strains were closely related (average nucleotide identity, or ANI ≥97%) and when the most abundant strain represented <50% of the population. Viral community composition assessments based on read recruitment were generally accurate when the following thresholds for detection were applied: (i) ≥10 kb contig lengths to define populations, (ii) coverage defined from reads mapping at ≥90% identity, and (iii) ≥75% of contig length with ≥1 × coverage. Finally, although data are limited to the most abundant viruses in a community, alpha and beta diversity patterns were robustly estimated (±10%) when comparing samples of similar sequencing depth, but more divergent (up to 80%) when sequencing depth was uneven across the dataset. In the latter cases, the use of normalization methods specifically developed for metagenomes provided the best estimates. ConclusionsThese simulations provide benchmarks for selecting analysis cut-offs and establish that an optimized sample-to-ecological-inference viromics pipeline is robust for making ecological inferences from natural viral communities. Continued development to better accessing RNA, rare, and/or diverse viral populations and improved reference viral genome availability will alleviate many of viromics remaining limitations.« less
Roux, Simon; Emerson, Joanne B.; Eloe-Fadrosh, Emiley A.; ...
2017-09-21
BackgroundViral metagenomics (viromics) is increasingly used to obtain uncultivated viral genomes, evaluate community diversity, and assess ecological hypotheses. While viromic experimental methods are relatively mature and widely accepted by the research community, robust bioinformatics standards remain to be established. Here we usedin silicomock viral communities to evaluate the viromic sequence-to-ecological-inference pipeline, including (i) read pre-processing and metagenome assembly, (ii) thresholds applied to estimate viral relative abundances based on read mapping to assembled contigs, and (iii) normalization methods applied to the matrix of viral relative abundances for alpha and beta diversity estimates. ResultsTools specifically designed for metagenomes, specifically metaSPAdes, MEGAHIT, andmore » IDBA-UD, were the most effective at assembling viromes. Read pre-processing, such as partitioning, had virtually no impact on assembly output, but may be useful when hardware is limited. Viral populations with 2–5 × coverage typically assembled well, whereas lesser coverage led to fragmented assembly. Strain heterogeneity within populations hampered assembly, especially when strains were closely related (average nucleotide identity, or ANI ≥97%) and when the most abundant strain represented <50% of the population. Viral community composition assessments based on read recruitment were generally accurate when the following thresholds for detection were applied: (i) ≥10 kb contig lengths to define populations, (ii) coverage defined from reads mapping at ≥90% identity, and (iii) ≥75% of contig length with ≥1 × coverage. Finally, although data are limited to the most abundant viruses in a community, alpha and beta diversity patterns were robustly estimated (±10%) when comparing samples of similar sequencing depth, but more divergent (up to 80%) when sequencing depth was uneven across the dataset. In the latter cases, the use of normalization methods specifically developed for metagenomes provided the best estimates. ConclusionsThese simulations provide benchmarks for selecting analysis cut-offs and establish that an optimized sample-to-ecological-inference viromics pipeline is robust for making ecological inferences from natural viral communities. Continued development to better accessing RNA, rare, and/or diverse viral populations and improved reference viral genome availability will alleviate many of viromics remaining limitations.« less
Khafizov, Kamil; Madrid-Aliste, Carlos; Almo, Steven C.; Fiser, Andras
2014-01-01
The exponential growth of protein sequence data provides an ever-expanding body of unannotated and misannotated proteins. The National Institutes of Health-supported Protein Structure Initiative and related worldwide structural genomics efforts facilitate functional annotation of proteins through structural characterization. Recently there have been profound changes in the taxonomic composition of sequence databases, which are effectively redefining the scope and contribution of these large-scale structure-based efforts. The faster-growing bacterial genomic entries have overtaken the eukaryotic entries over the last 5 y, but also have become more redundant. Despite the enormous increase in the number of sequences, the overall structural coverage of proteins—including proteins for which reliable homology models can be generated—on the residue level has increased from 30% to 40% over the last 10 y. Structural genomics efforts contributed ∼50% of this new structural coverage, despite determining only ∼10% of all new structures. Based on current trends, it is expected that ∼55% structural coverage (the level required for significant functional insight) will be achieved within 15 y, whereas without structural genomics efforts, realizing this goal will take approximately twice as long. PMID:24567391
Khafizov, Kamil; Madrid-Aliste, Carlos; Almo, Steven C; Fiser, Andras
2014-03-11
The exponential growth of protein sequence data provides an ever-expanding body of unannotated and misannotated proteins. The National Institutes of Health-supported Protein Structure Initiative and related worldwide structural genomics efforts facilitate functional annotation of proteins through structural characterization. Recently there have been profound changes in the taxonomic composition of sequence databases, which are effectively redefining the scope and contribution of these large-scale structure-based efforts. The faster-growing bacterial genomic entries have overtaken the eukaryotic entries over the last 5 y, but also have become more redundant. Despite the enormous increase in the number of sequences, the overall structural coverage of proteins--including proteins for which reliable homology models can be generated--on the residue level has increased from 30% to 40% over the last 10 y. Structural genomics efforts contributed ∼50% of this new structural coverage, despite determining only ∼10% of all new structures. Based on current trends, it is expected that ∼55% structural coverage (the level required for significant functional insight) will be achieved within 15 y, whereas without structural genomics efforts, realizing this goal will take approximately twice as long.
Viking Imaging of Phobos and Deimos: An Overview of the Primary Mission
NASA Technical Reports Server (NTRS)
Duxbury, T. C.; Veverka, J.
1977-01-01
During the Viking primary mission the cameras on the two orbiters acquired about 50 pictures of the two Martian moons. The Viking images of the satellites have a higher surface resolution than those obtained by Mariner 9. The typical surface resolution achieved was 100-200 m, although detail as small as 40 m was imaged on Phobos during a particularly close passage. Attention is given to color sequences obtained for each satellite, aspects of phase angle coverage, and pictures for ephemeris improvement.
Searching for SNPs with cloud computing
2009-01-01
As DNA sequencing outpaces improvements in computer speed, there is a critical need to accelerate tasks like alignment and SNP calling. Crossbow is a cloud-computing software tool that combines the aligner Bowtie and the SNP caller SOAPsnp. Executing in parallel using Hadoop, Crossbow analyzes data comprising 38-fold coverage of the human genome in three hours using a 320-CPU cluster rented from a cloud computing service for about $85. Crossbow is available from http://bowtie-bio.sourceforge.net/crossbow/. PMID:19930550
Pollen, Alex A; Nowakowski, Tomasz J; Shuga, Joe; Wang, Xiaohui; Leyrat, Anne A; Lui, Jan H; Li, Nianzhen; Szpankowski, Lukasz; Fowler, Brian; Chen, Peilin; Ramalingam, Naveen; Sun, Gang; Thu, Myo; Norris, Michael; Lebofsky, Ronald; Toppani, Dominique; Kemp, Darnell W; Wong, Michael; Clerkson, Barry; Jones, Brittnee N; Wu, Shiquan; Knutsson, Lawrence; Alvarado, Beatriz; Wang, Jing; Weaver, Lesley S; May, Andrew P; Jones, Robert C; Unger, Marc A; Kriegstein, Arnold R; West, Jay A A
2014-10-01
Large-scale surveys of single-cell gene expression have the potential to reveal rare cell populations and lineage relationships but require efficient methods for cell capture and mRNA sequencing. Although cellular barcoding strategies allow parallel sequencing of single cells at ultra-low depths, the limitations of shallow sequencing have not been investigated directly. By capturing 301 single cells from 11 populations using microfluidics and analyzing single-cell transcriptomes across downsampled sequencing depths, we demonstrate that shallow single-cell mRNA sequencing (~50,000 reads per cell) is sufficient for unbiased cell-type classification and biomarker identification. In the developing cortex, we identify diverse cell types, including multiple progenitor and neuronal subtypes, and we identify EGR1 and FOS as previously unreported candidate targets of Notch signaling in human but not mouse radial glia. Our strategy establishes an efficient method for unbiased analysis and comparison of cell populations from heterogeneous tissue by microfluidic single-cell capture and low-coverage sequencing of many cells.
Deep whole-genome sequencing of 100 southeast Asian Malays.
Wong, Lai-Ping; Ong, Rick Twee-Hee; Poh, Wan-Ting; Liu, Xuanyao; Chen, Peng; Li, Ruoying; Lam, Kevin Koi-Yau; Pillai, Nisha Esakimuthu; Sim, Kar-Seng; Xu, Haiyan; Sim, Ngak-Leng; Teo, Shu-Mei; Foo, Jia-Nee; Tan, Linda Wei-Lin; Lim, Yenly; Koo, Seok-Hwee; Gan, Linda Seo-Hwee; Cheng, Ching-Yu; Wee, Sharon; Yap, Eric Peng-Huat; Ng, Pauline Crystal; Lim, Wei-Yen; Soong, Richie; Wenk, Markus Rene; Aung, Tin; Wong, Tien-Yin; Khor, Chiea-Chuen; Little, Peter; Chia, Kee-Seng; Teo, Yik-Ying
2013-01-10
Whole-genome sequencing across multiple samples in a population provides an unprecedented opportunity for comprehensively characterizing the polymorphic variants in the population. Although the 1000 Genomes Project (1KGP) has offered brief insights into the value of population-level sequencing, the low coverage has compromised the ability to confidently detect rare and low-frequency variants. In addition, the composition of populations in the 1KGP is not complete, despite the fact that the study design has been extended to more than 2,500 samples from more than 20 population groups. The Malays are one of the Austronesian groups predominantly present in Southeast Asia and Oceania, and the Singapore Sequencing Malay Project (SSMP) aims to perform deep whole-genome sequencing of 100 healthy Malays. By sequencing at a minimum of 30× coverage, we have illustrated the higher sensitivity at detecting low-frequency and rare variants and the ability to investigate the presence of hotspots of functional mutations. Compared to the low-pass sequencing in the 1KGP, the deeper coverage allows more functional variants to be identified for each person. A comparison of the fidelity of genotype imputation of Malays indicated that a population-specific reference panel, such as the SSMP, outperforms a cosmopolitan panel with larger number of individuals for common SNPs. For lower-frequency (<5%) markers, a larger number of individuals might have to be whole-genome sequenced so that the accuracy currently afforded by the 1KGP can be achieved. The SSMP data are expected to be the benchmark for evaluating the value of deep population-level sequencing versus low-pass sequencing, especially in populations that are poorly represented in population-genetics studies. Copyright © 2013 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
Deep Whole-Genome Sequencing of 100 Southeast Asian Malays
Wong, Lai-Ping; Ong, Rick Twee-Hee; Poh, Wan-Ting; Liu, Xuanyao; Chen, Peng; Li, Ruoying; Lam, Kevin Koi-Yau; Pillai, Nisha Esakimuthu; Sim, Kar-Seng; Xu, Haiyan; Sim, Ngak-Leng; Teo, Shu-Mei; Foo, Jia-Nee; Tan, Linda Wei-Lin; Lim, Yenly; Koo, Seok-Hwee; Gan, Linda Seo-Hwee; Cheng, Ching-Yu; Wee, Sharon; Yap, Eric Peng-Huat; Ng, Pauline Crystal; Lim, Wei-Yen; Soong, Richie; Wenk, Markus Rene; Aung, Tin; Wong, Tien-Yin; Khor, Chiea-Chuen; Little, Peter; Chia, Kee-Seng; Teo, Yik-Ying
2013-01-01
Whole-genome sequencing across multiple samples in a population provides an unprecedented opportunity for comprehensively characterizing the polymorphic variants in the population. Although the 1000 Genomes Project (1KGP) has offered brief insights into the value of population-level sequencing, the low coverage has compromised the ability to confidently detect rare and low-frequency variants. In addition, the composition of populations in the 1KGP is not complete, despite the fact that the study design has been extended to more than 2,500 samples from more than 20 population groups. The Malays are one of the Austronesian groups predominantly present in Southeast Asia and Oceania, and the Singapore Sequencing Malay Project (SSMP) aims to perform deep whole-genome sequencing of 100 healthy Malays. By sequencing at a minimum of 30× coverage, we have illustrated the higher sensitivity at detecting low-frequency and rare variants and the ability to investigate the presence of hotspots of functional mutations. Compared to the low-pass sequencing in the 1KGP, the deeper coverage allows more functional variants to be identified for each person. A comparison of the fidelity of genotype imputation of Malays indicated that a population-specific reference panel, such as the SSMP, outperforms a cosmopolitan panel with larger number of individuals for common SNPs. For lower-frequency (<5%) markers, a larger number of individuals might have to be whole-genome sequenced so that the accuracy currently afforded by the 1KGP can be achieved. The SSMP data are expected to be the benchmark for evaluating the value of deep population-level sequencing versus low-pass sequencing, especially in populations that are poorly represented in population-genetics studies. PMID:23290073
Lindsey, Rebecca L.; Pouseele, Hannes; Chen, Jessica C.; Strockbine, Nancy A.; Carleton, Heather A.
2016-01-01
Shiga toxin-producing Escherichia coli (STEC) is an important foodborne pathogen capable of causing severe disease in humans. Rapid and accurate identification and characterization techniques are essential during outbreak investigations. Current methods for characterization of STEC are expensive and time-consuming. With the advent of rapid and cheap whole genome sequencing (WGS) benchtop sequencers, the potential exists to replace traditional workflows with WGS. The aim of this study was to validate tools to do reference identification and characterization from WGS for STEC in a single workflow within an easy to use commercially available software platform. Publically available serotype, virulence, and antimicrobial resistance databases were downloaded from the Center for Genomic Epidemiology (CGE) (www.genomicepidemiology.org) and integrated into a genotyping plug-in with in silico PCR tools to confirm some of the virulence genes detected from WGS data. Additionally, down sampling experiments on the WGS sequence data were performed to determine a threshold for sequence coverage needed to accurately predict serotype and virulence genes using the established workflow. The serotype database was tested on a total of 228 genomes and correctly predicted from WGS for 96.1% of O serogroups and 96.5% of H serogroups identified by conventional testing techniques. A total of 59 genomes were evaluated to determine the threshold of coverage to detect the different WGS targets, 40 were evaluated for serotype and virulence gene detection and 19 for the stx gene subtypes. For serotype, 95% of the O and 100% of the H serogroups were detected at > 40x and ≥ 30x coverage, respectively. For virulence targets and stx gene subtypes, nearly all genes were detected at > 40x, though some targets were 100% detectable from genomes with coverage ≥20x. The resistance detection tool was 97% concordant with phenotypic testing results. With isolates sequenced to > 40x coverage, the different databases accurately predicted serotype, virulence, and resistance from WGS data, providing a fast and cheaper alternative to conventional typing techniques. PMID:27242777
Lindsey, Rebecca L; Pouseele, Hannes; Chen, Jessica C; Strockbine, Nancy A; Carleton, Heather A
2016-01-01
Shiga toxin-producing Escherichia coli (STEC) is an important foodborne pathogen capable of causing severe disease in humans. Rapid and accurate identification and characterization techniques are essential during outbreak investigations. Current methods for characterization of STEC are expensive and time-consuming. With the advent of rapid and cheap whole genome sequencing (WGS) benchtop sequencers, the potential exists to replace traditional workflows with WGS. The aim of this study was to validate tools to do reference identification and characterization from WGS for STEC in a single workflow within an easy to use commercially available software platform. Publically available serotype, virulence, and antimicrobial resistance databases were downloaded from the Center for Genomic Epidemiology (CGE) (www.genomicepidemiology.org) and integrated into a genotyping plug-in with in silico PCR tools to confirm some of the virulence genes detected from WGS data. Additionally, down sampling experiments on the WGS sequence data were performed to determine a threshold for sequence coverage needed to accurately predict serotype and virulence genes using the established workflow. The serotype database was tested on a total of 228 genomes and correctly predicted from WGS for 96.1% of O serogroups and 96.5% of H serogroups identified by conventional testing techniques. A total of 59 genomes were evaluated to determine the threshold of coverage to detect the different WGS targets, 40 were evaluated for serotype and virulence gene detection and 19 for the stx gene subtypes. For serotype, 95% of the O and 100% of the H serogroups were detected at > 40x and ≥ 30x coverage, respectively. For virulence targets and stx gene subtypes, nearly all genes were detected at > 40x, though some targets were 100% detectable from genomes with coverage ≥20x. The resistance detection tool was 97% concordant with phenotypic testing results. With isolates sequenced to > 40x coverage, the different databases accurately predicted serotype, virulence, and resistance from WGS data, providing a fast and cheaper alternative to conventional typing techniques.
EXAMINING EVIDENCE IN U.S. PAYER COVERAGE POLICIES FOR MULTI-GENE PANELS AND SEQUENCING TESTS
Chambers, James D.; Saret, Cayla J.; Anderson, Jordan E.; Deverka, Patricia A.; Douglas, Michael P.; Phillips, Kathryn A.
2017-01-01
Objectives The aim of this study was to examine the evidence payers cited in their coverage policies for multi-gene panels and sequencing tests (panels), and to compare these findings with the evidence payers cited in their coverage policies for other types of medical interventions. Methods We used the University of California at San Francisco TRANSPERS Payer Coverage Registry to identify coverage policies for panels issued by five of the largest US private payers. We reviewed each policy and categorized the evidence cited within as: clinical studies, systematic reviews, technology assessments, cost-effectiveness analyses (CEAs), budget impact studies, and clinical guidelines. We compared the evidence cited in these coverage policies for panels with the evidence cited in policies for other intervention types (pharmaceuticals, medical devices, diagnostic tests and imaging, and surgical interventions) as reported in a previous study. Results Fifty-five coverage policies for panels were included. On average, payers cited clinical guidelines in 84 percent of their coverage policies (range, 73–100 percent), clinical studies in 69 percent (50–87 percent), technology assessments 47 percent (33–86 percent), systematic reviews or meta-analyses 31 percent (7–71 percent), and CEAs 5 percent (0–7 percent). No payers cited budget impact studies in their policies. Payers less often cited clinical studies, systematic reviews, technology assessments, and CEAs in their coverage policies for panels than in their policies for other intervention types. Payers cited clinical guidelines in a comparable proportion of policies for panels and other technology types. Conclusions Payers in our sample less often cited clinical studies and other evidence types in their coverage policies for panels than they did in their coverage policies for other types of medical interventions. PMID:29065945
EXAMINING EVIDENCE IN U.S. PAYER COVERAGE POLICIES FOR MULTI-GENE PANELS AND SEQUENCING TESTS.
Chambers, James D; Saret, Cayla J; Anderson, Jordan E; Deverka, Patricia A; Douglas, Michael P; Phillips, Kathryn A
2017-01-01
The aim of this study was to examine the evidence payers cited in their coverage policies for multi-gene panels and sequencing tests (panels), and to compare these findings with the evidence payers cited in their coverage policies for other types of medical interventions. We used the University of California at San Francisco TRANSPERS Payer Coverage Registry to identify coverage policies for panels issued by five of the largest US private payers. We reviewed each policy and categorized the evidence cited within as: clinical studies, systematic reviews, technology assessments, cost-effectiveness analyses (CEAs), budget impact studies, and clinical guidelines. We compared the evidence cited in these coverage policies for panels with the evidence cited in policies for other intervention types (pharmaceuticals, medical devices, diagnostic tests and imaging, and surgical interventions) as reported in a previous study. Fifty-five coverage policies for panels were included. On average, payers cited clinical guidelines in 84 percent of their coverage policies (range, 73-100 percent), clinical studies in 69 percent (50-87 percent), technology assessments 47 percent (33-86 percent), systematic reviews or meta-analyses 31 percent (7-71 percent), and CEAs 5 percent (0-7 percent). No payers cited budget impact studies in their policies. Payers less often cited clinical studies, systematic reviews, technology assessments, and CEAs in their coverage policies for panels than in their policies for other intervention types. Payers cited clinical guidelines in a comparable proportion of policies for panels and other technology types. Payers in our sample less often cited clinical studies and other evidence types in their coverage policies for panels than they did in their coverage policies for other types of medical interventions.
Reality of Single Circulating Tumor Cell Sequencing for Molecular Diagnostics in Pancreatic Cancer.
Court, Colin M; Ankeny, Jacob S; Sho, Shonan; Hou, Shuang; Li, Qingyu; Hsieh, Carolyn; Song, Min; Liao, Xinfang; Rochefort, Matthew M; Wainberg, Zev A; Graeber, Thomas G; Tseng, Hsian-Rong; Tomlinson, James S
2016-09-01
To understand the potential and limitations of circulating tumor cell (CTC) sequencing for molecular diagnostics, we investigated the feasibility of identifying the ubiquitous KRAS mutation in single CTCs from pancreatic cancer (PC) patients. We used the NanoVelcro/laser capture microdissection CTC platform, combined with whole genome amplification and KRAS Sanger sequencing. We assessed both KRAS codon-12 coverage and the degree that allele dropout during whole genome amplification affected the detection of KRAS mutations from single CTCs. We isolated 385 single cells, 163 from PC cell lines and 222 from the blood of 12 PC patients, and obtained KRAS sequence coverage in 218 of 385 single cells (56.6%). For PC cell lines with known KRAS mutations, single mutations were detected in 67% of homozygous cells but only 37.4% of heterozygous single cells, demonstrating that both coverage and allele dropout are important causes of mutation detection failure from single cells. We could detect KRAS mutations in CTCs from 11 of 12 patients (92%) and 33 of 119 single CTCs sequenced, resulting in a KRAS mutation detection rate of 27.7%. Importantly, KRAS mutations were never found in the 103 white blood cells sequenced. Sequencing of groups of cells containing between 1 and 100 cells determined that at least 10 CTCs are likely required to reliably assess KRAS mutation status from CTCs. Copyright © 2016 American Society for Investigative Pathology and the Association for Molecular Pathology. Published by Elsevier Inc. All rights reserved.
Lee, Donald W; Khavrutskii, Ilja V; Wallqvist, Anders; Bavari, Sina; Cooper, Christopher L; Chaudhury, Sidhartha
2016-01-01
The somatic diversity of antigen-recognizing B-cell receptors (BCRs) arises from Variable (V), Diversity (D), and Joining (J) (VDJ) recombination and somatic hypermutation (SHM) during B-cell development and affinity maturation. The VDJ junction of the BCR heavy chain forms the highly variable complementarity determining region 3 (CDR3), which plays a critical role in antigen specificity and binding affinity. Tracking the selection and mutation of the CDR3 can be useful in characterizing humoral responses to infection and vaccination. Although tens to hundreds of thousands of unique BCR genes within an expressed B-cell repertoire can now be resolved with high-throughput sequencing, tracking SHMs is still challenging because existing annotation methods are often limited by poor annotation coverage, inconsistent SHM identification across the VDJ junction, or lack of B-cell lineage data. Here, we present B-cell repertoire inductive lineage and immunosequence annotator (BRILIA), an algorithm that leverages repertoire-wide sequencing data to globally improve the VDJ annotation coverage, lineage tree assembly, and SHM identification. On benchmark tests against simulated human and mouse BCR repertoires, BRILIA correctly annotated germline and clonally expanded sequences with 94 and 70% accuracy, respectively, and it has a 90% SHM-positive prediction rate in the CDR3 of heavily mutated sequences; these are substantial improvements over existing methods. We used BRILIA to process BCR sequences obtained from splenic germinal center B cells extracted from C57BL/6 mice. BRILIA returned robust B-cell lineage trees and yielded SHM patterns that are consistent across the VDJ junction and agree with known biological mechanisms of SHM. By contrast, existing BCR annotation tools, which do not account for repertoire-wide clonal relationships, systematically underestimated both the size of clonally related B-cell clusters and yielded inconsistent SHM frequencies. We demonstrate BRILIA's utility in B-cell repertoire studies related to VDJ gene usage, mechanisms for adenosine mutations, and SHM hot spot motifs. Furthermore, we show that the complete gene usage annotation and SHM identification across the entire CDR3 are essential for studying the B-cell affinity maturation process through immunosequencing methods.
Brodie, Nicholas I; Huguet, Romain; Zhang, Terry; Viner, Rosa; Zabrouskov, Vlad; Pan, Jingxi; Petrotchenko, Evgeniy V; Borchers, Christoph H
2018-03-06
Top-down hydrogen-deuterium exchange (HDX) analysis using electron capture or transfer dissociation Fourier transform mass spectrometry (FTMS) is a powerful method for the analysis of secondary structure of proteins in solution. The resolution of the method is a function of the degree of fragmentation of backbone bonds in the proteins. While fragmentation is usually extensive near the N- and C-termini, electron capture (ECD) or electron transfer dissociation (ETD) fragmentation methods sometimes lack good coverage of certain regions of the protein, most often in the middle of the sequence. Ultraviolet photodissociation (UVPD) is a recently developed fast-fragmentation technique, which provides extensive backbone fragmentation that can be complementary in sequence coverage to the aforementioned electron-based fragmentation techniques. Here, we explore the application of electrospray ionization (ESI)-UVPD FTMS on an Orbitrap Fusion Lumos Tribrid mass spectrometer to top-down HDX analysis of proteins. We have incorporated UVPD-specific fragment-ion types and fragment-ion mixtures into our isotopic envelope fitting software (HDX Match) for the top-down HDX analysis. We have shown that UVPD data is complementary to ETD, thus improving the overall resolution when used as a combined approach.
Recent Advances in Chemical Modification of Peptide Nucleic Acids
Rozners, Eriks
2012-01-01
Peptide nucleic acid (PNA) has become an extremely powerful tool in chemistry and biology. Although PNA recognizes single-stranded nucleic acids with exceptionally high affinity and sequence selectivity, there is considerable ongoing effort to further improve properties of PNA for both fundamental science and practical applications. The present paper discusses selected recent studies that improve on cellular uptake and binding of PNA to double-stranded DNA and RNA. The focus is on chemical modifications of PNA's backbone and heterocyclic nucleobases. The paper selects representative recent studies and does not attempt to provide comprehensive coverage of the broad and vibrant field of PNA modification. PMID:22991652
Ruggles, Kelly V; Tang, Zuojian; Wang, Xuya; Grover, Himanshu; Askenazi, Manor; Teubl, Jennifer; Cao, Song; McLellan, Michael D; Clauser, Karl R; Tabb, David L; Mertins, Philipp; Slebos, Robbert; Erdmann-Gilmore, Petra; Li, Shunqiang; Gunawardena, Harsha P; Xie, Ling; Liu, Tao; Zhou, Jian-Ying; Sun, Shisheng; Hoadley, Katherine A; Perou, Charles M; Chen, Xian; Davies, Sherri R; Maher, Christopher A; Kinsinger, Christopher R; Rodland, Karen D; Zhang, Hui; Zhang, Zhen; Ding, Li; Townsend, R Reid; Rodriguez, Henry; Chan, Daniel; Smith, Richard D; Liebler, Daniel C; Carr, Steven A; Payne, Samuel; Ellis, Matthew J; Fenyő, David
2016-03-01
Improvements in mass spectrometry (MS)-based peptide sequencing provide a new opportunity to determine whether polymorphisms, mutations, and splice variants identified in cancer cells are translated. Herein, we apply a proteogenomic data integration tool (QUILTS) to illustrate protein variant discovery using whole genome, whole transcriptome, and global proteome datasets generated from a pair of luminal and basal-like breast-cancer-patient-derived xenografts (PDX). The sensitivity of proteogenomic analysis for singe nucleotide variant (SNV) expression and novel splice junction (NSJ) detection was probed using multiple MS/MS sample process replicates defined here as an independent tandem MS experiment using identical sample material. Despite analysis of over 30 sample process replicates, only about 10% of SNVs (somatic and germline) detected by both DNA and RNA sequencing were observed as peptides. An even smaller proportion of peptides corresponding to NSJ observed by RNA sequencing were detected (<0.1%). Peptides mapping to DNA-detected SNVs without a detectable mRNA transcript were also observed, suggesting that transcriptome coverage was incomplete (∼80%). In contrast to germline variants, somatic variants were less likely to be detected at the peptide level in the basal-like tumor than in the luminal tumor, raising the possibility of differential translation or protein degradation effects. In conclusion, this large-scale proteogenomic integration allowed us to determine the degree to which mutations are translated and identify gaps in sequence coverage, thereby benchmarking current technology and progress toward whole cancer proteome and transcriptome analysis. © 2016 by The American Society for Biochemistry and Molecular Biology, Inc.
Improved PCR primers for the detection and identification of arbuscular mycorrhizal fungi.
Lee, Jaikoo; Lee, Sangsun; Young, J Peter W
2008-08-01
A set of PCR primers that should amplify all subgroups of arbuscular mycorrhizal fungi (AMF, Glomeromycota), but exclude sequences from other organisms, was designed to facilitate rapid detection and identification directly from field-grown plant roots. The small subunit rRNA gene was targeted for the new primers (AML1 and AML2) because phylogenetic relationships among the Glomeromycota are well understood for this gene. Sequence comparisons indicate that the new primers should amplify all published AMF sequences except those from Archaeospora trappei. The specificity of the new primers was tested using 23 different AMF spore morphotypes from trap cultures and Miscanthus sinensis, Glycine max and Panax ginseng roots sampled from the field. Non-AMF DNA of 14 plants, 14 Basidiomycota and 18 Ascomycota was also tested as negative controls. Sequences amplified from roots using the new primers were compared with those obtained using the established NS31 and AM1 primer combination. The new primers have much better specificity and coverage of all known AMF groups.
Fujimori, Shigeo; Hirai, Naoya; Ohashi, Hiroyuki; Masuoka, Kazuyo; Nishikimi, Akihiko; Fukui, Yoshinori; Washio, Takanori; Oshikubo, Tomohiro; Yamashita, Tatsuhiro; Miyamoto-Sato, Etsuko
2012-01-01
Next-generation sequencing (NGS) has been applied to various kinds of omics studies, resulting in many biological and medical discoveries. However, high-throughput protein-protein interactome datasets derived from detection by sequencing are scarce, because protein-protein interaction analysis requires many cell manipulations to examine the interactions. The low reliability of the high-throughput data is also a problem. Here, we describe a cell-free display technology combined with NGS that can improve both the coverage and reliability of interactome datasets. The completely cell-free method gives a high-throughput and a large detection space, testing the interactions without using clones. The quantitative information provided by NGS reduces the number of false positives. The method is suitable for the in vitro detection of proteins that interact not only with the bait protein, but also with DNA, RNA and chemical compounds. Thus, it could become a universal approach for exploring the large space of protein sequences and interactome networks. PMID:23056904
A Case Study into Microbial Genome Assembly Gap Sequences and Finishing Strategies
DOE Office of Scientific and Technical Information (OSTI.GOV)
Utturkar, Sagar M.; Klingeman, Dawn M.; Hurt, Jr., Richard A.
This study characterized regions of DNA which remained unassembled by either PacBio and Illumina sequencing technologies for seven bacterial genomes. Two genomes were manually finished using bioinformatics and PCR/Sanger sequencing approaches and regions not assembled by automated software were analyzed. Gaps present within Illumina assemblies mostly correspond to repetitive DNA regions such as multiple rRNA operon sequences. PacBio gap sequences were evaluated for several properties such as GC content, read coverage, gap length, ability to form strong secondary structures, and corresponding annotations. Our hypothesis that strong secondary DNA structures blocked DNA polymerases and contributed to gap sequences was not accepted.more » PacBio assemblies had few limitations overall and gaps were explained as cumulative effect of lower than average sequence coverage and repetitive sequences at contig termini. An important aspect of the present study is the compilation of biological features that interfered with assembly and included active transposons, multiple plasmid sequences, phage DNA integration, and large sequence duplication. Furthermore, our targeted genome finishing approach and systematic evaluation of the unassembled DNA will be useful for others looking to close, finish, and polish microbial genome sequences.« less
A Case Study into Microbial Genome Assembly Gap Sequences and Finishing Strategies
Utturkar, Sagar M.; Klingeman, Dawn M.; Hurt, Jr., Richard A.; ...
2017-07-18
This study characterized regions of DNA which remained unassembled by either PacBio and Illumina sequencing technologies for seven bacterial genomes. Two genomes were manually finished using bioinformatics and PCR/Sanger sequencing approaches and regions not assembled by automated software were analyzed. Gaps present within Illumina assemblies mostly correspond to repetitive DNA regions such as multiple rRNA operon sequences. PacBio gap sequences were evaluated for several properties such as GC content, read coverage, gap length, ability to form strong secondary structures, and corresponding annotations. Our hypothesis that strong secondary DNA structures blocked DNA polymerases and contributed to gap sequences was not accepted.more » PacBio assemblies had few limitations overall and gaps were explained as cumulative effect of lower than average sequence coverage and repetitive sequences at contig termini. An important aspect of the present study is the compilation of biological features that interfered with assembly and included active transposons, multiple plasmid sequences, phage DNA integration, and large sequence duplication. Furthermore, our targeted genome finishing approach and systematic evaluation of the unassembled DNA will be useful for others looking to close, finish, and polish microbial genome sequences.« less
A Case Study into Microbial Genome Assembly Gap Sequences and Finishing Strategies
Utturkar, Sagar M.; Klingeman, Dawn M.; Hurt, Richard A.; Brown, Steven D.
2017-01-01
This study characterized regions of DNA which remained unassembled by either PacBio and Illumina sequencing technologies for seven bacterial genomes. Two genomes were manually finished using bioinformatics and PCR/Sanger sequencing approaches and regions not assembled by automated software were analyzed. Gaps present within Illumina assemblies mostly correspond to repetitive DNA regions such as multiple rRNA operon sequences. PacBio gap sequences were evaluated for several properties such as GC content, read coverage, gap length, ability to form strong secondary structures, and corresponding annotations. Our hypothesis that strong secondary DNA structures blocked DNA polymerases and contributed to gap sequences was not accepted. PacBio assemblies had few limitations overall and gaps were explained as cumulative effect of lower than average sequence coverage and repetitive sequences at contig termini. An important aspect of the present study is the compilation of biological features that interfered with assembly and included active transposons, multiple plasmid sequences, phage DNA integration, and large sequence duplication. Our targeted genome finishing approach and systematic evaluation of the unassembled DNA will be useful for others looking to close, finish, and polish microbial genome sequences. PMID:28769883
LaFond, Anne; Kanagat, Natasha; Steinglass, Robert; Fields, Rebecca; Sequeira, Jenny; Mookherji, Sangeeta
2015-01-01
There is limited understanding of why routine immunization (RI) coverage improves in some settings in Africa and not in others. Using a grounded theory approach, we conducted in-depth case studies to understand pathways to coverage improvement by comparing immunization programme experience in 12 districts in three countries (Ethiopia, Cameroon and Ghana). Drawing on positive deviance or assets model techniques we compared the experience of districts where diphtheria–tetanus–pertussis (DTP3)/pentavalent3 (Penta3) coverage improved with districts where DTP3/Penta3 coverage remained unchanged (or steady) over the same period, focusing on basic readiness to deliver immunization services and drivers of coverage improvement. The results informed a model for immunization coverage improvement that emphasizes the dynamics of immunization systems at district level. In all districts, whether improving or steady, we found that a set of basic RI system resources were in place from 2006 to 2010 and did not observe major differences in infrastructure. We found that the differences in coverage trends were due to factors other than basic RI system capacity or service readiness. We identified six common drivers of RI coverage performance improvement—four direct drivers and two enabling drivers—that were present in well-performing districts and weaker or absent in steady coverage districts, and map the pathways from driver to improved supply, demand and coverage. Findings emphasize the critical role of implementation strategies and the need for locally skilled managers that are capable of tailoring strategies to specific settings and community needs. The case studies are unique in their focus on the positive drivers of change and the identification of pathways to coverage improvement, an approach that should be considered in future studies and routine assessments of district-level immunization system performance. PMID:24615431
Comparing microarrays and next-generation sequencing technologies for microbial ecology research.
Roh, Seong Woon; Abell, Guy C J; Kim, Kyoung-Ho; Nam, Young-Do; Bae, Jin-Woo
2010-06-01
Recent advances in molecular biology have resulted in the application of DNA microarrays and next-generation sequencing (NGS) technologies to the field of microbial ecology. This review aims to examine the strengths and weaknesses of each of the methodologies, including depth and ease of analysis, throughput and cost-effectiveness. It also intends to highlight the optimal application of each of the individual technologies toward the study of a particular environment and identify potential synergies between the two main technologies, whereby both sample number and coverage can be maximized. We suggest that the efficient use of microarray and NGS technologies will allow researchers to advance the field of microbial ecology, and importantly, improve our understanding of the role of microorganisms in their various environments.
SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing
Bankevich, Anton; Nurk, Sergey; Antipov, Dmitry; Gurevich, Alexey A.; Dvorkin, Mikhail; Kulikov, Alexander S.; Lesin, Valery M.; Nikolenko, Sergey I.; Pham, Son; Prjibelski, Andrey D.; Pyshkin, Alexey V.; Sirotkin, Alexander V.; Vyahhi, Nikolay; Tesler, Glenn; Pevzner, Pavel A.
2012-01-01
Abstract The lion's share of bacteria in various environments cannot be cloned in the laboratory and thus cannot be sequenced using existing technologies. A major goal of single-cell genomics is to complement gene-centric metagenomic data with whole-genome assemblies of uncultivated organisms. Assembly of single-cell data is challenging because of highly non-uniform read coverage as well as elevated levels of sequencing errors and chimeric reads. We describe SPAdes, a new assembler for both single-cell and standard (multicell) assembly, and demonstrate that it improves on the recently released E+V−SC assembler (specialized for single-cell data) and on popular assemblers Velvet and SoapDeNovo (for multicell data). SPAdes generates single-cell assemblies, providing information about genomes of uncultivatable bacteria that vastly exceeds what may be obtained via traditional metagenomics studies. SPAdes is available online (http://bioinf.spbau.ru/spades). It is distributed as open source software. PMID:22506599
Kim, Seungill; Park, Minkyu; Yeom, Seon-In; Kim, Yong-Min; Lee, Je Min; Lee, Hyun-Ah; Seo, Eunyoung; Choi, Jaeyoung; Cheong, Kyeongchae; Kim, Ki-Tae; Jung, Kyongyong; Lee, Gir-Won; Oh, Sang-Keun; Bae, Chungyun; Kim, Saet-Byul; Lee, Hye-Young; Kim, Shin-Young; Kim, Myung-Shin; Kang, Byoung-Cheorl; Jo, Yeong Deuk; Yang, Hee-Bum; Jeong, Hee-Jin; Kang, Won-Hee; Kwon, Jin-Kyung; Shin, Chanseok; Lim, Jae Yun; Park, June Hyun; Huh, Jin Hoe; Kim, June-Sik; Kim, Byung-Dong; Cohen, Oded; Paran, Ilan; Suh, Mi Chung; Lee, Saet Buyl; Kim, Yeon-Ki; Shin, Younhee; Noh, Seung-Jae; Park, Junhyung; Seo, Young Sam; Kwon, Suk-Yoon; Kim, Hyun A; Park, Jeong Mee; Kim, Hyun-Jin; Choi, Sang-Bong; Bosland, Paul W; Reeves, Gregory; Jo, Sung-Hwan; Lee, Bong-Woo; Cho, Hyung-Taeg; Choi, Hee-Seung; Lee, Min-Soo; Yu, Yeisoo; Do Choi, Yang; Park, Beom-Seok; van Deynze, Allen; Ashrafi, Hamid; Hill, Theresa; Kim, Woo Taek; Pai, Hyun-Sook; Ahn, Hee Kyung; Yeam, Inhwa; Giovannoni, James J; Rose, Jocelyn K C; Sørensen, Iben; Lee, Sang-Jik; Kim, Ryan W; Choi, Ik-Young; Choi, Beom-Soon; Lim, Jong-Sung; Lee, Yong-Hwan; Choi, Doil
2014-03-01
Hot pepper (Capsicum annuum), one of the oldest domesticated crops in the Americas, is the most widely grown spice crop in the world. We report whole-genome sequencing and assembly of the hot pepper (Mexican landrace of Capsicum annuum cv. CM334) at 186.6× coverage. We also report resequencing of two cultivated peppers and de novo sequencing of the wild species Capsicum chinense. The genome size of the hot pepper was approximately fourfold larger than that of its close relative tomato, and the genome showed an accumulation of Gypsy and Caulimoviridae family elements. Integrative genomic and transcriptomic analyses suggested that change in gene expression and neofunctionalization of capsaicin synthase have shaped capsaicinoid biosynthesis. We found differential molecular patterns of ripening regulators and ethylene synthesis in hot pepper and tomato. The reference genome will serve as a platform for improving the nutritional and medicinal values of Capsicum species.
2011-01-01
Background Milkweeds (Asclepias L.) have been extensively investigated in diverse areas of evolutionary biology and ecology; however, there are few genetic resources available to facilitate and compliment these studies. This study explored how low coverage genome sequencing of the common milkweed (Asclepias syriaca L.) could be useful in characterizing the genome of a plant without prior genomic information and for development of genomic resources as a step toward further developing A. syriaca as a model in ecology and evolution. Results A 0.5× genome of A. syriaca was produced using Illumina sequencing. A virtually complete chloroplast genome of 158,598 bp was assembled, revealing few repeats and loss of three genes: accD, clpP, and ycf1. A nearly complete rDNA cistron (18S-5.8S-26S; 7,541 bp) and 5S rDNA (120 bp) sequence were obtained. Assessment of polymorphism revealed that the rDNA cistron and 5S rDNA had 0.3% and 26.7% polymorphic sites, respectively. A partial mitochondrial genome sequence (130,764 bp), with identical gene content to tobacco, was also assembled. An initial characterization of repeat content indicated that Ty1/copia-like retroelements are the most common repeat type in the milkweed genome. At least one A. syriaca microread hit 88% of Catharanthus roseus (Apocynaceae) unigenes (median coverage of 0.29×) and 66% of single copy orthologs (COSII) in asterids (median coverage of 0.14×). From this partial characterization of the A. syriaca genome, markers for population genetics (microsatellites) and phylogenetics (low-copy nuclear genes) studies were developed. Conclusions The results highlight the promise of next generation sequencing for development of genomic resources for any organism. Low coverage genome sequencing allows characterization of the high copy fraction of the genome and exploration of the low copy fraction of the genome, which facilitate the development of molecular tools for further study of a target species and its relatives. This study represents a first step in the development of a community resource for further study of plant-insect co-evolution, anti-herbivore defense, floral developmental genetics, reproductive biology, chemical evolution, population genetics, and comparative genomics using milkweeds, and A. syriaca in particular, as ecological and evolutionary models. PMID:21542930
Straub, Shannon C K; Fishbein, Mark; Livshultz, Tatyana; Foster, Zachary; Parks, Matthew; Weitemier, Kevin; Cronn, Richard C; Liston, Aaron
2011-05-04
Milkweeds (Asclepias L.) have been extensively investigated in diverse areas of evolutionary biology and ecology; however, there are few genetic resources available to facilitate and compliment these studies. This study explored how low coverage genome sequencing of the common milkweed (Asclepias syriaca L.) could be useful in characterizing the genome of a plant without prior genomic information and for development of genomic resources as a step toward further developing A. syriaca as a model in ecology and evolution. A 0.5× genome of A. syriaca was produced using Illumina sequencing. A virtually complete chloroplast genome of 158,598 bp was assembled, revealing few repeats and loss of three genes: accD, clpP, and ycf1. A nearly complete rDNA cistron (18S-5.8S-26S; 7,541 bp) and 5S rDNA (120 bp) sequence were obtained. Assessment of polymorphism revealed that the rDNA cistron and 5S rDNA had 0.3% and 26.7% polymorphic sites, respectively. A partial mitochondrial genome sequence (130,764 bp), with identical gene content to tobacco, was also assembled. An initial characterization of repeat content indicated that Ty1/copia-like retroelements are the most common repeat type in the milkweed genome. At least one A. syriaca microread hit 88% of Catharanthus roseus (Apocynaceae) unigenes (median coverage of 0.29×) and 66% of single copy orthologs (COSII) in asterids (median coverage of 0.14×). From this partial characterization of the A. syriaca genome, markers for population genetics (microsatellites) and phylogenetics (low-copy nuclear genes) studies were developed. The results highlight the promise of next generation sequencing for development of genomic resources for any organism. Low coverage genome sequencing allows characterization of the high copy fraction of the genome and exploration of the low copy fraction of the genome, which facilitate the development of molecular tools for further study of a target species and its relatives. This study represents a first step in the development of a community resource for further study of plant-insect co-evolution, anti-herbivore defense, floral developmental genetics, reproductive biology, chemical evolution, population genetics, and comparative genomics using milkweeds, and A. syriaca in particular, as ecological and evolutionary models.
Genomic Sequencing: Assessing The Health Care System, Policy, And Big-Data Implications
Phillips, Kathryn A.; Trosman, Julia; Kelley, Robin K.; Pletcher, Mark J.; Douglas, Michael P.; Weldon, Christine B.
2014-01-01
New genomic sequencing technologies enable the high-speed analysis of multiple genes simultaneously, including all of those in a person's genome. Sequencing is a prominent example of a “big data” technology because of the massive amount of information it produces and its complexity, diversity, and timeliness. Our objective in this article is to provide a policy primer on sequencing and illustrate how it can affect health care system and policy issues. Toward this end, we developed an easily applied classification of sequencing based on inputs, methods, and outputs. We used it to examine the implications of sequencing for three health care system and policy issues: making care more patient-centered, developing coverage and reimbursement policies, and assessing economic value. We conclude that sequencing has great promise but that policy challenges include how to optimize patient engagement as well as privacy, develop coverage policies that distinguish research from clinical uses and account for bioinformatics costs, and determine the economic value of sequencing through complex economic models that take into account multiple findings and downstream costs. PMID:25006153
Genomic sequencing: assessing the health care system, policy, and big-data implications.
Phillips, Kathryn A; Trosman, Julia R; Kelley, Robin K; Pletcher, Mark J; Douglas, Michael P; Weldon, Christine B
2014-07-01
New genomic sequencing technologies enable the high-speed analysis of multiple genes simultaneously, including all of those in a person's genome. Sequencing is a prominent example of a "big data" technology because of the massive amount of information it produces and its complexity, diversity, and timeliness. Our objective in this article is to provide a policy primer on sequencing and illustrate how it can affect health care system and policy issues. Toward this end, we developed an easily applied classification of sequencing based on inputs, methods, and outputs. We used it to examine the implications of sequencing for three health care system and policy issues: making care more patient-centered, developing coverage and reimbursement policies, and assessing economic value. We conclude that sequencing has great promise but that policy challenges include how to optimize patient engagement as well as privacy, develop coverage policies that distinguish research from clinical uses and account for bioinformatics costs, and determine the economic value of sequencing through complex economic models that take into account multiple findings and downstream costs. Project HOPE—The People-to-People Health Foundation, Inc.
High-resolution characterization of a hepatocellular carcinoma genome.
Totoki, Yasushi; Tatsuno, Kenji; Yamamoto, Shogo; Arai, Yasuhito; Hosoda, Fumie; Ishikawa, Shumpei; Tsutsumi, Shuichi; Sonoda, Kohtaro; Totsuka, Hirohiko; Shirakihara, Takuya; Sakamoto, Hiromi; Wang, Linghua; Ojima, Hidenori; Shimada, Kazuaki; Kosuge, Tomoo; Okusaka, Takuji; Kato, Kazuto; Kusuda, Jun; Yoshida, Teruhiko; Aburatani, Hiroyuki; Shibata, Tatsuhiro
2011-05-01
Hepatocellular carcinoma, one of the most common virus-associated cancers, is the third most frequent cause of cancer-related death worldwide. By massively parallel sequencing of a primary hepatitis C virus-positive hepatocellular carcinoma (36× coverage) and matched lymphocytes (>28× coverage) from the same individual, we identified more than 11,000 somatic substitutions of the tumor genome that showed predominance of T>C/A>G transition and a decrease of the T>C substitution on the transcribed strand, suggesting preferential DNA repair. Gene annotation enrichment analysis of 63 validated non-synonymous substitutions revealed enrichment of phosphoproteins. We further validated 22 chromosomal rearrangements, generating four fusion transcripts that had altered transcriptional regulation (BCORL1-ELF4) or promoter activity. Whole-exome sequencing at a higher sequence depth (>76× coverage) revealed a TSC1 nonsense substitution in a subpopulation of the tumor cells. This first high-resolution characterization of a virus-associated cancer genome identified previously uncharacterized mutation patterns, intra-chromosomal rearrangements and fusion genes, as well as genetic heterogeneity within the tumor.
Licastro, Danilo; Mutarelli, Margherita; Peluso, Ivana; Neveling, Kornelia; Wieskamp, Nienke; Rispoli, Rossella; Vozzi, Diego; Athanasakis, Emmanouil; D'Eustacchio, Angela; Pizzo, Mariateresa; D'Amico, Francesca; Ziviello, Carmela; Simonelli, Francesca; Fabretto, Antonella; Scheffer, Hans; Gasparini, Paolo; Banfi, Sandro; Nigro, Vincenzo
2012-01-01
Usher syndrome (USH) is a clinically and genetically heterogeneous disorder characterized by visual and hearing impairments. Clinically, it is subdivided into three subclasses with nine genes identified so far. In the present study, we investigated whether the currently available Next Generation Sequencing (NGS) technologies are already suitable for molecular diagnostics of USH. We analyzed a total of 12 patients, most of which were negative for previously described mutations in known USH genes upon primer extension-based microarray genotyping. We enriched the NGS template either by whole exome capture or by Long-PCR of the known USH genes. The main NGS sequencing platforms were used: SOLiD for whole exome sequencing, Illumina (Genome Analyzer II) and Roche 454 (GS FLX) for the Long-PCR sequencing. Long-PCR targeting was more efficient with up to 94% of USH gene regions displaying an overall coverage higher than 25×, whereas whole exome sequencing yielded a similar coverage for only 50% of those regions. Overall this integrated analysis led to the identification of 11 novel sequence variations in USH genes (2 homozygous and 9 heterozygous) out of 18 detected. However, at least two cases were not genetically solved. Our result highlights the current limitations in the diagnostic use of NGS for USH patients. The limit for whole exome sequencing is linked to the need of a strong coverage and to the correct interpretation of sequence variations with a non obvious, pathogenic role, whereas the targeted approach suffers from the high genetic heterogeneity of USH that may be also caused by the presence of additional causative genes yet to be identified. PMID:22952768
Ozhinsky, Eugene; Vigneron, Daniel B; Nelson, Sarah J
2011-04-01
To develop a technique for optimizing coverage of brain 3D (1) H magnetic resonance spectroscopic imaging (MRSI) by automatic placement of outer-volume suppression (OVS) saturation bands (sat bands) and to compare the performance for point-resolved spectroscopic sequence (PRESS) MRSI protocols with manual and automatic placement of sat bands. The automated OVS procedure includes the acquisition of anatomic images from the head, obtaining brain and lipid tissue maps, calculating optimal sat band placement, and then using those optimized parameters during the MRSI acquisition. The data were analyzed to quantify brain coverage volume and data quality. 3D PRESS MRSI data were acquired from three healthy volunteers and 29 patients using protocols that included either manual or automatic sat band placement. On average, the automatic sat band placement allowed the acquisition of PRESS MRSI data from 2.7 times larger brain volumes than the conventional method while maintaining data quality. The technique developed helps solve two of the most significant problems with brain PRESS MRSI acquisitions: limited brain coverage and difficulty in prescription. This new method will facilitate routine clinical brain 3D MRSI exams and will be important for performing serial evaluation of response to therapy in patients with brain tumors and other neurological diseases. Copyright © 2011 Wiley-Liss, Inc.
Single haplotype assembly of the human genome from a hydatidiform mole.
Steinberg, Karyn Meltz; Schneider, Valerie A; Graves-Lindsay, Tina A; Fulton, Robert S; Agarwala, Richa; Huddleston, John; Shiryev, Sergey A; Morgulis, Aleksandr; Surti, Urvashi; Warren, Wesley C; Church, Deanna M; Eichler, Evan E; Wilson, Richard K
2014-12-01
A complete reference assembly is essential for accurately interpreting individual genomes and associating variation with phenotypes. While the current human reference genome sequence is of very high quality, gaps and misassemblies remain due to biological and technical complexities. Large repetitive sequences and complex allelic diversity are the two main drivers of assembly error. Although increasing the length of sequence reads and library fragments can improve assembly, even the longest available reads do not resolve all regions. In order to overcome the issue of allelic diversity, we used genomic DNA from an essentially haploid hydatidiform mole, CHM1. We utilized several resources from this DNA including a set of end-sequenced and indexed BAC clones and 100× Illumina whole-genome shotgun (WGS) sequence coverage. We used the WGS sequence and the GRCh37 reference assembly to create an assembly of the CHM1 genome. We subsequently incorporated 382 finished BAC clone sequences to generate a draft assembly, CHM1_1.1 (NCBI AssemblyDB GCA_000306695.2). Analysis of gene, repetitive element, and segmental duplication content show this assembly to be of excellent quality and contiguity. However, comparison to assembly-independent resources, such as BAC clone end sequences and PacBio long reads, indicate misassembled regions. Most of these regions are enriched for structural variation and segmental duplication, and can be resolved in the future. This publicly available assembly will be integrated into the Genome Reference Consortium curation framework for further improvement, with the ultimate goal being a completely finished gap-free assembly. © 2014 Steinberg et al.; Published by Cold Spring Harbor Laboratory Press.
Single haplotype assembly of the human genome from a hydatidiform mole
Steinberg, Karyn Meltz; Schneider, Valerie A.; Graves-Lindsay, Tina A.; Fulton, Robert S.; Agarwala, Richa; Huddleston, John; Shiryev, Sergey A.; Morgulis, Aleksandr; Surti, Urvashi; Warren, Wesley C.; Church, Deanna M.; Eichler, Evan E.; Wilson, Richard K.
2014-01-01
A complete reference assembly is essential for accurately interpreting individual genomes and associating variation with phenotypes. While the current human reference genome sequence is of very high quality, gaps and misassemblies remain due to biological and technical complexities. Large repetitive sequences and complex allelic diversity are the two main drivers of assembly error. Although increasing the length of sequence reads and library fragments can improve assembly, even the longest available reads do not resolve all regions. In order to overcome the issue of allelic diversity, we used genomic DNA from an essentially haploid hydatidiform mole, CHM1. We utilized several resources from this DNA including a set of end-sequenced and indexed BAC clones and 100× Illumina whole-genome shotgun (WGS) sequence coverage. We used the WGS sequence and the GRCh37 reference assembly to create an assembly of the CHM1 genome. We subsequently incorporated 382 finished BAC clone sequences to generate a draft assembly, CHM1_1.1 (NCBI AssemblyDB GCA_000306695.2). Analysis of gene, repetitive element, and segmental duplication content show this assembly to be of excellent quality and contiguity. However, comparison to assembly-independent resources, such as BAC clone end sequences and PacBio long reads, indicate misassembled regions. Most of these regions are enriched for structural variation and segmental duplication, and can be resolved in the future. This publicly available assembly will be integrated into the Genome Reference Consortium curation framework for further improvement, with the ultimate goal being a completely finished gap-free assembly. PMID:25373144
Evaluating imputation algorithms for low-depth genotyping-by-sequencing (GBS) data
USDA-ARS?s Scientific Manuscript database
Well-powered genomic studies require genome-wide marker coverage across many individuals. For non-model species with few genomic resources, high-throughput sequencing (HTS) methods, such as Genotyping-By-Sequencing (GBS), offer an inexpensive alternative to array-based genotyping. Although affordabl...
EEG microstates during resting represent personality differences.
Schlegel, Felix; Lehmann, Dietrich; Faber, Pascal L; Milz, Patricia; Gianotti, Lorena R R
2012-01-01
We investigated the spontaneous brain electric activity of 13 skeptics and 16 believers in paranormal phenomena; they were university students assessed with a self-report scale about paranormal beliefs. 33-channel EEG recordings during no-task resting were processed as sequences of momentary potential distribution maps. Based on the maps at peak times of Global Field Power, the sequences were parsed into segments of quasi-stable potential distribution, the 'microstates'. The microstates were clustered into four classes of map topographies (A-D). Analysis of the microstate parameters time coverage, occurrence frequency and duration as well as the temporal sequence (syntax) of the microstate classes revealed significant differences: Believers had a higher coverage and occurrence of class B, tended to decreased coverage and occurrence of class C, and showed a predominant sequence of microstate concatenations from A to C to B to A that was reversed in skeptics (A to B to C to A). Microstates of different topographies, putative "atoms of thought", are hypothesized to represent different types of information processing.The study demonstrates that personality differences can be detected in resting EEG microstate parameters and microstate syntax. Microstate analysis yielded no conclusive evidence for the hypothesized relation between paranormal belief and schizophrenia.
A case of canine borreliosis in Iran caused by Borrelia persica.
Shirani, Darush; Rakhshanpoor, Alaleh; Cutler, Sally Jane; Ghazinezhad, Behnaz; Naddaf, Saied Reza
2016-04-01
Tick-borne relapsing fever is an endemic disease in Iran, with most cases attributed to infection by Borrelia persica, which is transmitted by Ornithodoros tholozani soft ticks. Here, we report spirochetemia in blood of a puppy residing in Tehran, Iran. The causative species was identified by use of highly discriminative IGS sequencing; the 489 bp IGS sequence obtained in our study showed 99% identity (100% coverage) when compared with B. persica sequences derived from clinical cases or from O. tholozani ticks. Our IGS sequence also showed 99% similarity over 414 bp (85% coverage) with a strain from a domestic dog, and 96% over 328 bp (69% coverage) with a strain from a domestic cat. Pet-keeping in cosmopolitan cities like Tehran has become increasingly popular in recent years. Animals are often transported into the city in cages or cardboard boxes that might also harbor minute tick larvae and/or early stages of the nymphs bringing them into the urban environment. This may pose a threat to household members who buy and keep these puppies and as a result may come into close contact with infected ticks. Copyright © 2016 Elsevier GmbH. All rights reserved.
ReadXplorer—visualization and analysis of mapped sequences
Hilker, Rolf; Stadermann, Kai Bernd; Doppmeier, Daniel; Kalinowski, Jörn; Stoye, Jens; Straube, Jasmin; Winnebald, Jörn; Goesmann, Alexander
2014-01-01
Motivation: Fast algorithms and well-arranged visualizations are required for the comprehensive analysis of the ever-growing size of genomic and transcriptomic next-generation sequencing data. Results: ReadXplorer is a software offering straightforward visualization and extensive analysis functions for genomic and transcriptomic DNA sequences mapped on a reference. A unique specialty of ReadXplorer is the quality classification of the read mappings. It is incorporated in all analysis functions and displayed in ReadXplorer's various synchronized data viewers for (i) the reference sequence, its base coverage as (ii) normalizable plot and (iii) histogram, (iv) read alignments and (v) read pairs. ReadXplorer's analysis capability covers RNA secondary structure prediction, single nucleotide polymorphism and deletion–insertion polymorphism detection, genomic feature and general coverage analysis. Especially for RNA-Seq data, it offers differential gene expression analysis, transcription start site and operon detection as well as RPKM value and read count calculations. Furthermore, ReadXplorer can combine or superimpose coverage of different datasets. Availability and implementation: ReadXplorer is available as open-source software at http://www.readxplorer.org along with a detailed manual. Contact: rhilker@mikrobio.med.uni-giessen.de Supplementary information: Supplementary data are available at Bioinformatics online. PMID:24790157
Phylogenomics from Whole Genome Sequences Using aTRAM.
Allen, Julie M; Boyd, Bret; Nguyen, Nam-Phuong; Vachaspati, Pranjal; Warnow, Tandy; Huang, Daisie I; Grady, Patrick G S; Bell, Kayce C; Cronk, Quentin C B; Mugisha, Lawrence; Pittendrigh, Barry R; Leonardi, M Soledad; Reed, David L; Johnson, Kevin P
2017-09-01
Novel sequencing technologies are rapidly expanding the size of data sets that can be applied to phylogenetic studies. Currently the most commonly used phylogenomic approaches involve some form of genome reduction. While these approaches make assembling phylogenomic data sets more economical for organisms with large genomes, they reduce the genomic coverage and thereby the long-term utility of the data. Currently, for organisms with moderate to small genomes ($<$1000 Mbp) it is feasible to sequence the entire genome at modest coverage ($10-30\\times$). Computational challenges for handling these large data sets can be alleviated by assembling targeted reads, rather than assembling the entire genome, to produce a phylogenomic data matrix. Here we demonstrate the use of automated Target Restricted Assembly Method (aTRAM) to assemble 1107 single-copy ortholog genes from whole genome sequencing of sucking lice (Anoplura) and out-groups. We developed a pipeline to extract exon sequences from the aTRAM assemblies by annotating them with respect to the original target protein. We aligned these protein sequences with the inferred amino acids and then performed phylogenetic analyses on both the concatenated matrix of genes and on each gene separately in a coalescent analysis. Finally, we tested the limits of successful assembly in aTRAM by assembling 100 genes from close- to distantly related taxa at high to low levels of coverage.Both the concatenated analysis and the coalescent-based analysis produced the same tree topology, which was consistent with previously published results and resolved weakly supported nodes. These results demonstrate that this approach is successful at developing phylogenomic data sets from raw genome sequencing reads. Further, we found that with coverages above $5-10\\times$, aTRAM was successful at assembling 80-90% of the contigs for both close and distantly related taxa. As sequencing costs continue to decline, we expect full genome sequencing will become more feasible for a wider array of organisms, and aTRAM will enable mining of these genomic data sets for an extensive variety of applications, including phylogenomics. [aTRAM; gene assembly; genome sequencing; phylogenomics.]. © The Author(s) 2017. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Repetitive part of the banana (Musa acuminata) genome investigated by low-depth 454 sequencing.
Hribová, Eva; Neumann, Pavel; Matsumoto, Takashi; Roux, Nicolas; Macas, Jirí; Dolezel, Jaroslav
2010-09-16
Bananas and plantains (Musa spp.) are grown in more than a hundred tropical and subtropical countries and provide staple food for hundreds of millions of people. They are seed-sterile crops propagated clonally and this makes them vulnerable to a rapid spread of devastating diseases and at the same time hampers breeding improved cultivars. Although the socio-economic importance of bananas and plantains cannot be overestimated, they remain outside the focus of major research programs. This slows down the study of nuclear genome and the development of molecular tools to facilitate banana improvement. In this work, we report on the first thorough characterization of the repeat component of the banana (M. acuminata cv. 'Calcutta 4') genome. Analysis of almost 100 Mb of sequence data (0.15× genome coverage) permitted partial sequence reconstruction and characterization of repetitive DNA, making up about 30% of the genome. The results showed that the banana repeats are predominantly made of various types of Ty1/copia and Ty3/gypsy retroelements representing 16 and 7% of the genome respectively. On the other hand, DNA transposons were found to be rare. In addition to new families of transposable elements, two new satellite repeats were discovered and found useful as cytogenetic markers. To help in banana sequence annotation, a specific Musa repeat database was created, and its utility was demonstrated by analyzing the repeat composition of 62 genomic BAC clones. A low-depth 454 sequencing of banana nuclear genome provided the largest amount of DNA sequence data available until now for Musa and permitted reconstruction of most of the major types of DNA repeats. The information obtained in this study improves the knowledge of the long-range organization of banana chromosomes, and provides sequence resources needed for repeat masking and annotation during the Musa genome sequencing project. It also provides sequence data for isolation of DNA markers to be used in genetic diversity studies and in marker-assisted selection.
Repetitive part of the banana (Musa acuminata) genome investigated by low-depth 454 sequencing
2010-01-01
Background Bananas and plantains (Musa spp.) are grown in more than a hundred tropical and subtropical countries and provide staple food for hundreds of millions of people. They are seed-sterile crops propagated clonally and this makes them vulnerable to a rapid spread of devastating diseases and at the same time hampers breeding improved cultivars. Although the socio-economic importance of bananas and plantains cannot be overestimated, they remain outside the focus of major research programs. This slows down the study of nuclear genome and the development of molecular tools to facilitate banana improvement. Results In this work, we report on the first thorough characterization of the repeat component of the banana (M. acuminata cv. 'Calcutta 4') genome. Analysis of almost 100 Mb of sequence data (0.15× genome coverage) permitted partial sequence reconstruction and characterization of repetitive DNA, making up about 30% of the genome. The results showed that the banana repeats are predominantly made of various types of Ty1/copia and Ty3/gypsy retroelements representing 16 and 7% of the genome respectively. On the other hand, DNA transposons were found to be rare. In addition to new families of transposable elements, two new satellite repeats were discovered and found useful as cytogenetic markers. To help in banana sequence annotation, a specific Musa repeat database was created, and its utility was demonstrated by analyzing the repeat composition of 62 genomic BAC clones. Conclusion A low-depth 454 sequencing of banana nuclear genome provided the largest amount of DNA sequence data available until now for Musa and permitted reconstruction of most of the major types of DNA repeats. The information obtained in this study improves the knowledge of the long-range organization of banana chromosomes, and provides sequence resources needed for repeat masking and annotation during the Musa genome sequencing project. It also provides sequence data for isolation of DNA markers to be used in genetic diversity studies and in marker-assisted selection. PMID:20846365
Li, Ying Hong; Xu, Jing Yu; Tao, Lin; Li, Xiao Feng; Li, Shuang; Zeng, Xian; Chen, Shang Ying; Zhang, Peng; Qin, Chu; Zhang, Cheng; Chen, Zhe; Zhu, Feng; Chen, Yu Zong
2016-01-01
Knowledge of protein function is important for biological, medical and therapeutic studies, but many proteins are still unknown in function. There is a need for more improved functional prediction methods. Our SVM-Prot web-server employed a machine learning method for predicting protein functional families from protein sequences irrespective of similarity, which complemented those similarity-based and other methods in predicting diverse classes of proteins including the distantly-related proteins and homologous proteins of different functions. Since its publication in 2003, we made major improvements to SVM-Prot with (1) expanded coverage from 54 to 192 functional families, (2) more diverse protein descriptors protein representation, (3) improved predictive performances due to the use of more enriched training datasets and more variety of protein descriptors, (4) newly integrated BLAST analysis option for assessing proteins in the SVM-Prot predicted functional families that were similar in sequence to a query protein, and (5) newly added batch submission option for supporting the classification of multiple proteins. Moreover, 2 more machine learning approaches, K nearest neighbor and probabilistic neural networks, were added for facilitating collective assessment of protein functions by multiple methods. SVM-Prot can be accessed at http://bidd2.nus.edu.sg/cgi-bin/svmprot/svmprot.cgi.
Everts-van der Wind, Annelie; Kata, Srinivas R.; Band, Mark R.; Rebeiz, Mark; Larkin, Denis M.; Everts, Robin E.; Green, Cheryl A.; Liu, Lei; Natarajan, Shreedhar; Goldammer, Tom; Lee, Jun Heon; McKay, Stephanie; Womack, James E.; Lewin, Harris A.
2004-01-01
A second-generation 5000 rad radiation hybrid (RH) map of the cattle genome was constructed primarily using cattle ESTs that were targeted to gaps in the existing cattle–human comparative map, as well as to sparsely populated map intervals. A total of 870 targeted markers were added, bringing the number of markers mapped on the RH5000 panel to 1913. Of these, 1463 have significant BLASTN hits (E < e–5) against the human genome sequence. A cattle–human comparative map was created using human genome sequence coordinates of the paired orthologs. One-hundred and ninety-five conserved segments (defined by two or more genes) were identified between the cattle and human genomes, of which 31 are newly discovered and 34 were extended singletons on the first-generation map. The new map represents an improvement of 20% genome-wide comparative coverage compared with the first-generation map. Analysis of gene content within human genome regions where there are gaps in the comparative map revealed gaps with both significantly greater and significantly lower gene content. The new, more detailed cattle–human comparative map provides an improved resource for the analysis of mammalian chromosome evolution, the identification of candidate genes for economically important traits, and for proper alignment of sequence contigs on cattle chromosomes. PMID:15231756
HetMappsS: Heterozygous mapping strategy for high resolution Genotyping-by-Sequencing Markers
USDA-ARS?s Scientific Manuscript database
Reduced representation genotyping approaches, such as genotyping-by-sequencing (GBS), provide opportunities to generate high-resolution genetic maps at a low per-sample cost. However, missing data and non-uniform sequence coverage can complicate map creation in highly heterozygous species. To facili...
Laskin, Julia [Richland, WA; Futrell, Jean H [Richland, WA
2008-04-29
The invention relates to a method and apparatus for enhanced sequencing of complex molecules using surface-induced dissociation (SID) in conjunction with mass spectrometric analysis. Results demonstrate formation of a wide distribution of structure-specific fragments having wide sequence coverage useful for sequencing and identifying the complex molecules.
Chung, Jongsuk; Son, Dae-Soon; Jeon, Hyo-Jeong; Kim, Kyoung-Mee; Park, Gahee; Ryu, Gyu Ha; Park, Woong-Yang; Park, Donghyun
2016-01-01
Targeted capture massively parallel sequencing is increasingly being used in clinical settings, and as costs continue to decline, use of this technology may become routine in health care. However, a limited amount of tissue has often been a challenge in meeting quality requirements. To offer a practical guideline for the minimum amount of input DNA for targeted sequencing, we optimized and evaluated the performance of targeted sequencing depending on the input DNA amount. First, using various amounts of input DNA, we compared commercially available library construction kits and selected Agilent’s SureSelect-XT and KAPA Biosystems’ Hyper Prep kits as the kits most compatible with targeted deep sequencing using Agilent’s SureSelect custom capture. Then, we optimized the adapter ligation conditions of the Hyper Prep kit to improve library construction efficiency and adapted multiplexed hybrid selection to reduce the cost of sequencing. In this study, we systematically evaluated the performance of the optimized protocol depending on the amount of input DNA, ranging from 6.25 to 200 ng, suggesting the minimal input DNA amounts based on coverage depths required for specific applications. PMID:27220682
Carss, Keren J; Arno, Gavin; Erwood, Marie; Stephens, Jonathan; Sanchis-Juan, Alba; Hull, Sarah; Megy, Karyn; Grozeva, Detelina; Dewhurst, Eleanor; Malka, Samantha; Plagnol, Vincent; Penkett, Christopher; Stirrups, Kathleen; Rizzo, Roberta; Wright, Genevieve; Josifova, Dragana; Bitner-Glindzicz, Maria; Scott, Richard H; Clement, Emma; Allen, Louise; Armstrong, Ruth; Brady, Angela F; Carmichael, Jenny; Chitre, Manali; Henderson, Robert H H; Hurst, Jane; MacLaren, Robert E; Murphy, Elaine; Paterson, Joan; Rosser, Elisabeth; Thompson, Dorothy A; Wakeling, Emma; Ouwehand, Willem H; Michaelides, Michel; Moore, Anthony T; Webster, Andrew R; Raymond, F Lucy
2017-01-05
Inherited retinal disease is a common cause of visual impairment and represents a highly heterogeneous group of conditions. Here, we present findings from a cohort of 722 individuals with inherited retinal disease, who have had whole-genome sequencing (n = 605), whole-exome sequencing (n = 72), or both (n = 45) performed, as part of the NIHR-BioResource Rare Diseases research study. We identified pathogenic variants (single-nucleotide variants, indels, or structural variants) for 404/722 (56%) individuals. Whole-genome sequencing gives unprecedented power to detect three categories of pathogenic variants in particular: structural variants, variants in GC-rich regions, which have significantly improved coverage compared to whole-exome sequencing, and variants in non-coding regulatory regions. In addition to previously reported pathogenic regulatory variants, we have identified a previously unreported pathogenic intronic variant in CHM in two males with choroideremia. We have also identified 19 genes not previously known to be associated with inherited retinal disease, which harbor biallelic predicted protein-truncating variants in unsolved cases. Whole-genome sequencing is an increasingly important comprehensive method with which to investigate the genetic causes of inherited retinal disease. Copyright © 2017. Published by Elsevier Inc.
High-resolution community profiling of arbuscular mycorrhizal fungi.
Schlaeppi, Klaus; Bender, S Franz; Mascher, Fabio; Russo, Giancarlo; Patrignani, Andrea; Camenzind, Tessa; Hempel, Stefan; Rillig, Matthias C; van der Heijden, Marcel G A
2016-11-01
Community analyses of arbuscular mycorrhizal fungi (AMF) using ribosomal small subunit (SSU) or internal transcribed spacer (ITS) DNA sequences often suffer from low resolution or coverage. We developed a novel sequencing based approach for a highly resolving and specific profiling of AMF communities. We took advantage of previously established AMF-specific PCR primers that amplify a c. 1.5-kb long fragment covering parts of SSU, ITS and parts of the large ribosomal subunit (LSU), and we sequenced the resulting amplicons with single molecule real-time (SMRT) sequencing. The method was applicable to soil and root samples, detected all major AMF families and successfully discriminated closely related AMF species, which would not be discernible using SSU sequences. In inoculation tests we could trace the introduced AMF inoculum at the molecular level. One of the introduced strains almost replaced the local strain(s), revealing that AMF inoculation can have a profound impact on the native community. The methodology presented offers researchers a powerful new tool for AMF community analysis because it unifies improved specificity and enhanced resolution, whereas the drawback of medium sequencing throughput appears of lesser importance for low-diversity groups such as AMF. © 2016 The Authors. New Phytologist © 2016 New Phytologist Trust.
Senerchia, Natacha; Wicker, Thomas; Felber, François; Parisod, Christian
2013-01-01
Transposable elements (TEs) represent a major fraction of plant genomes and drive their evolution. An improved understanding of genome evolution requires the dynamics of a large number of TE families to be considered. We put forward an approach bypassing the required step of a complete reference genome to assess the evolutionary trajectories of high copy number TE families from genome snapshot with high-throughput sequencing. Low coverage sequencing of the complex genomes of Aegilops cylindrica and Ae. geniculata using 454 identified more than 70% of the sequences as known TEs, mainly long terminal repeat (LTR) retrotransposons. Comparing the abundance of reads as well as patterns of sequence diversity and divergence within and among genomes assessed the dynamics of 44 major LTR retrotransposon families of the 165 identified. In particular, molecular population genetics on individual TE copies distinguished recently active from quiescent families and highlighted different evolutionary trajectories of retrotransposons among related species. This work presents a suite of tools suitable for current sequencing data, allowing to address the genome-wide evolutionary dynamics of TEs at the family level and advancing our understanding of the evolution of nonmodel genomes.
COACH: profile-profile alignment of protein families using hidden Markov models.
Edgar, Robert C; Sjölander, Kimmen
2004-05-22
Alignments of two multiple-sequence alignments, or statistical models of such alignments (profiles), have important applications in computational biology. The increased amount of information in a profile versus a single sequence can lead to more accurate alignments and more sensitive homolog detection in database searches. Several profile-profile alignment methods have been proposed and have been shown to improve sensitivity and alignment quality compared with sequence-sequence methods (such as BLAST) and profile-sequence methods (e.g. PSI-BLAST). Here we present a new approach to profile-profile alignment we call Comparison of Alignments by Constructing Hidden Markov Models (HMMs) (COACH). COACH aligns two multiple sequence alignments by constructing a profile HMM from one alignment and aligning the other to that HMM. We compare the alignment accuracy of COACH with two recently published methods: Yona and Levitt's prof_sim and Sadreyev and Grishin's COMPASS. On two sets of reference alignments selected from the FSSP database, we find that COACH is able, on average, to produce alignments giving the best coverage or the fewest errors, depending on the chosen parameter settings. COACH is freely available from www.drive5.com/lobster
Ciotlos, Serban; Mao, Qing; Zhang, Rebecca Yu; Li, Zhenyu; Chin, Robert; Gulbahce, Natali; Liu, Sophie Jia; Drmanac, Radoje; Peters, Brock A
2016-01-01
The cell line BT-474 is a popular cell line for studying the biology of cancer and developing novel drugs. However, there is no complete, published genome sequence for this highly utilized scientific resource. In this study we sought to provide a comprehensive and useful data set for the scientific community by generating a whole genome sequence for BT-474. Five μg of genomic DNA, isolated from an early passage of the BT-474 cell line, was used to generate a whole genome sequence (114X coverage) using Complete Genomics' standard sequencing process. To provide additional variant phasing and structural variation data we also processed and analyzed two separate libraries of 5 and 6 individual cells to depths of 99X and 87X, respectively, using Complete Genomics' Long Fragment Read (LFR) technology. BT-474 is a highly aneuploid cell line with an extremely complex genome sequence. This ~300X total coverage genome sequence provides a more complete understanding of this highly utilized cell line at the genomic level.
Improved imputation of low-frequency and rare variants using the UK10K haplotype reference panel.
Huang, Jie; Howie, Bryan; McCarthy, Shane; Memari, Yasin; Walter, Klaudia; Min, Josine L; Danecek, Petr; Malerba, Giovanni; Trabetti, Elisabetta; Zheng, Hou-Feng; Gambaro, Giovanni; Richards, J Brent; Durbin, Richard; Timpson, Nicholas J; Marchini, Jonathan; Soranzo, Nicole
2015-09-14
Imputing genotypes from reference panels created by whole-genome sequencing (WGS) provides a cost-effective strategy for augmenting the single-nucleotide polymorphism (SNP) content of genome-wide arrays. The UK10K Cohorts project has generated a data set of 3,781 whole genomes sequenced at low depth (average 7x), aiming to exhaustively characterize genetic variation down to 0.1% minor allele frequency in the British population. Here we demonstrate the value of this resource for improving imputation accuracy at rare and low-frequency variants in both a UK and an Italian population. We show that large increases in imputation accuracy can be achieved by re-phasing WGS reference panels after initial genotype calling. We also present a method for combining WGS panels to improve variant coverage and downstream imputation accuracy, which we illustrate by integrating 7,562 WGS haplotypes from the UK10K project with 2,184 haplotypes from the 1000 Genomes Project. Finally, we introduce a novel approximation that maintains speed without sacrificing imputation accuracy for rare variants.
Cuff, Alison L.; Sillitoe, Ian; Lewis, Tony; Clegg, Andrew B.; Rentzsch, Robert; Furnham, Nicholas; Pellegrini-Calace, Marialuisa; Jones, David; Thornton, Janet; Orengo, Christine A.
2011-01-01
CATH version 3.3 (class, architecture, topology, homology) contains 128 688 domains, 2386 homologous superfamilies and 1233 fold groups, and reflects a major focus on classifying structural genomics (SG) structures and transmembrane proteins, both of which are likely to add structural novelty to the database and therefore increase the coverage of protein fold space within CATH. For CATH version 3.4 we have significantly improved the presentation of sequence information and associated functional information for CATH superfamilies. The CATH superfamily pages now reflect both the functional and structural diversity within the superfamily and include structural alignments of close and distant relatives within the superfamily, annotated with functional information and details of conserved residues. A significantly more efficient search function for CATH has been established by implementing the search server Solr (http://lucene.apache.org/solr/). The CATH v3.4 webpages have been built using the Catalyst web framework. PMID:21097779
Han, Lin; Wu, Hua-Jun; Zhu, Haiying; Kim, Kun-Yong; Marjani, Sadie L; Riester, Markus; Euskirchen, Ghia; Zi, Xiaoyuan; Yang, Jennifer; Han, Jasper; Snyder, Michael; Park, In-Hyun; Irizarry, Rafael; Weissman, Sherman M; Michor, Franziska; Fan, Rong; Pan, Xinghua
2017-06-02
Conventional DNA bisulfite sequencing has been extended to single cell level, but the coverage consistency is insufficient for parallel comparison. Here we report a novel method for genome-wide CpG island (CGI) methylation sequencing for single cells (scCGI-seq), combining methylation-sensitive restriction enzyme digestion and multiple displacement amplification for selective detection of methylated CGIs. We applied this method to analyzing single cells from two types of hematopoietic cells, K562 and GM12878 and small populations of fibroblasts and induced pluripotent stem cells. The method detected 21 798 CGIs (76% of all CGIs) per cell, and the number of CGIs consistently detected from all 16 profiled single cells was 20 864 (72.7%), with 12 961 promoters covered. This coverage represents a substantial improvement over results obtained using single cell reduced representation bisulfite sequencing, with a 66-fold increase in the fraction of consistently profiled CGIs across individual cells. Single cells of the same type were more similar to each other than to other types, but also displayed epigenetic heterogeneity. The method was further validated by comparing the CpG methylation pattern, methylation profile of CGIs/promoters and repeat regions and 41 classes of known regulatory markers to the ENCODE data. Although not every minor methylation differences between cells are detectable, scCGI-seq provides a solid tool for unsupervised stratification of a heterogeneous cell population. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Korshoj, Lee E; Afsari, Sepideh; Chatterjee, Anushree; Nagpal, Prashant
2017-11-01
Electronic conduction or charge transport through single molecules depends primarily on molecular structure and anchoring groups and forms the basis for a wide range of studies from molecular electronics to DNA sequencing. Several high-throughput nanoelectronic methods such as mechanical break junctions, nanopores, conductive atomic force microscopy, scanning tunneling break junctions, and static nanoscale electrodes are often used for measuring single-molecule conductance. In these measurements, "smearing" due to conformational changes and other entropic factors leads to large variances in the observed molecular conductance, especially in individual measurements. Here, we show a method for characterizing smear in single-molecule conductance measurements and demonstrate how binning measurements according to smear can significantly enhance the use of individual conductance measurements for molecular recognition. Using quantum point contact measurements on single nucleotides within DNA macromolecules, we demonstrate that the distance over which molecular junctions are maintained is a measure of smear, and the resulting variance in unbiased single measurements depends on this smear parameter. Our ability to identify individual DNA nucleotides at 20× coverage increases from 81.3% accuracy without smear analysis to 93.9% with smear characterization and binning (SCRIB). Furthermore, merely 7 conductance measurements (7× coverage) are needed to achieve 97.8% accuracy for DNA nucleotide recognition when only low molecular smear measurements are used, which represents a significant improvement over contemporary sequencing methods. These results have important implications in a broad range of molecular electronics applications from designing robust molecular switches to nanoelectronic DNA sequencing.
Marine, Rachel; McCarren, Coleen; Vorrasane, Vansay; Nasko, Dan; Crowgey, Erin; Polson, Shawn W; Wommack, K Eric
2014-01-30
Shotgun metagenomics has become an important tool for investigating the ecology of microorganisms. Underlying these investigations is the assumption that metagenome sequence data accurately estimates the census of microbial populations. Multiple displacement amplification (MDA) of microbial community DNA is often used in cases where it is difficult to obtain enough DNA for sequencing; however, MDA can result in amplification biases that may impact subsequent estimates of population census from metagenome data. Some have posited that pooling replicate MDA reactions negates these biases and restores the accuracy of population analyses. This assumption has not been empirically tested. Using mock viral communities, we examined the influence of pooling on population-scale analyses. In pooled and single reaction MDA treatments, sequence coverage of viral populations was highly variable and coverage patterns across viral genomes were nearly identical, indicating that initial priming biases were reproducible and that pooling did not alleviate biases. In contrast, control unamplified sequence libraries showed relatively even coverage across phage genomes. MDA should be avoided for metagenomic investigations that require quantitative estimates of microbial taxa and gene functional groups. While MDA is an indispensable technique in applications such as single-cell genomics, amplification biases cannot be overcome by combining replicate MDA reactions. Alternative library preparation techniques should be utilized for quantitative microbial ecology studies utilizing metagenomic sequencing approaches.
Sequenced sorghum mutant library- an efficient platform for discovery of causal gene mutations
USDA-ARS?s Scientific Manuscript database
Ethyl methanesulfonate (EMS) efficiently generates high-density mutations in genomes. We applied whole-genome sequencing to 256 phenotyped mutant lines of sorghum (Sorghum bicolor L. Moench) to 16x coverage. Comparisons with the reference sequence revealed >1.8 million canonical EMS-induced G/C to A...
Xu, Chang; Nezami Ranjbar, Mohammad R; Wu, Zhong; DiCarlo, John; Wang, Yexun
2017-01-03
Detection of DNA mutations at very low allele fractions with high accuracy will significantly improve the effectiveness of precision medicine for cancer patients. To achieve this goal through next generation sequencing, researchers need a detection method that 1) captures rare mutation-containing DNA fragments efficiently in the mix of abundant wild-type DNA; 2) sequences the DNA library extensively to deep coverage; and 3) distinguishes low level true variants from amplification and sequencing errors with high accuracy. Targeted enrichment using PCR primers provides researchers with a convenient way to achieve deep sequencing for a small, yet most relevant region using benchtop sequencers. Molecular barcoding (or indexing) provides a unique solution for reducing sequencing artifacts analytically. Although different molecular barcoding schemes have been reported in recent literature, most variant calling has been done on limited targets, using simple custom scripts. The analytical performance of barcode-aware variant calling can be significantly improved by incorporating advanced statistical models. We present here a highly efficient, simple and scalable enrichment protocol that integrates molecular barcodes in multiplex PCR amplification. In addition, we developed smCounter, an open source, generic, barcode-aware variant caller based on a Bayesian probabilistic model. smCounter was optimized and benchmarked on two independent read sets with SNVs and indels at 5 and 1% allele fractions. Variants were called with very good sensitivity and specificity within coding regions. We demonstrated that we can accurately detect somatic mutations with allele fractions as low as 1% in coding regions using our enrichment protocol and variant caller.
NASA Technical Reports Server (NTRS)
1980-01-01
A brief summary of the launch vehicle, spacecraft, and mission is contained. Information relative to launch windows, vehicle telemetry coverage, realtime data flow, telemetry coverage by station, selected trajectory information, and a brief sequence of flight events is also included.
Zhang, Suhua; Bian, Yingnan; Chen, Anqi; Zheng, Hancheng; Gao, Yuzhen; Hou, Yiping; Li, Chengtao
2017-03-01
Utilizing massively parallel sequencing (MPS) technology for SNP testing in forensic genetics is becoming attractive because of the shortcomings of STR markers, such as their high mutation rates and disadvantages associated with the current PCR-CE method as well as its limitations regarding multiplex capabilities. MPS offers the potential to genotype hundreds to thousands of SNPs from multiple samples in a single experimental run. In this study, we designed a customized SNP panel that includes 273 forensically relevant identity SNPs chosen from SNPforID, IISNP, and the HapMap database as well as previously related studies and evaluated the levels of genotyping precision, sequence coverage, sensitivity and SNP performance using the Ion Torrent PGM. In a concordant study of the custom MPS-SNP panel, only four MPS callings were missing due to coverage reads that were too low (<20), whereas the others were fully concordant with Sanger's sequencing results across the two control samples, that is, 9947A and 9948. The analyses indicated a balanced coverage among the included loci, with the exception of the 16 SNPs that were used to detect an inconsistent allele balance and/or lower coverage reads among 50 tested individuals from the Chinese HAN population and the above controls. With the exception of the 16 poorly performing SNPs, the sequence coverage obtained was extensive for the bulk of the SNPs, and only three Y-SNPs (rs16980601, rs11096432, rs3900) showed a mean coverage below 1000. Analyses of the dilution series of control DNA 9948 yielded reproducible results down to 1ng of DNA input. In addition, we provide an analysis tool for automated data quality control and genotyping checks, and we conclude that the SNP targets are polymorphic and independent in the Chinese HAN population. In summary, the evaluation of the sensitivity, accuracy and genotyping performance provides strong support for the application of MPS technology in forensic SNP analysis, and the assay offers a straightforward sample-to-genotype workflow that could be beneficial in forensic casework with respect to both individual identification and complex kinship issues. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Caititu: a tool to graphically represent peptide sequence coverage and domain distribution.
Carvalho, Paulo C; Junqueira, Magno; Valente, Richard H; Domont, Gilberto B
2008-10-07
Here we present Caititu, an easy-to-use proteomics software to graphically represent peptide sequence coverage and domain distribution for different correlated samples (e.g. originated from 2D gel spots) relatively to the full-sequence of the known protein they are related to. Although Caititu has a broad applicability, we exemplify its usefulness in Toxinology using snake venom as a model. For example, proteolytic processing may lead to inactivation or loss of domains. Therefore, our proposed graphic representation for peptides identified by two dimensional electrophoresis followed by mass spectrometric identification of excised spots can aid in inferring what kind of processing happened to the toxins, if any. Caititu is freely available to download at: http://pcarvalho.com/things/caititu.
Assembly of 913 microbial genomes from metagenomic sequencing of the cow rumen.
Stewart, Robert D; Auffret, Marc D; Warr, Amanda; Wiser, Andrew H; Press, Maximilian O; Langford, Kyle W; Liachko, Ivan; Snelling, Timothy J; Dewhurst, Richard J; Walker, Alan W; Roehe, Rainer; Watson, Mick
2018-02-28
The cow rumen is adapted for the breakdown of plant material into energy and nutrients, a task largely performed by enzymes encoded by the rumen microbiome. Here we present 913 draft bacterial and archaeal genomes assembled from over 800 Gb of rumen metagenomic sequence data derived from 43 Scottish cattle, using both metagenomic binning and Hi-C-based proximity-guided assembly. Most of these genomes represent previously unsequenced strains and species. The draft genomes contain over 69,000 proteins predicted to be involved in carbohydrate metabolism, over 90% of which do not have a good match in public databases. Inclusion of the 913 genomes presented here improves metagenomic read classification by sevenfold against our own data, and by fivefold against other publicly available rumen datasets. Thus, our dataset substantially improves the coverage of rumen microbial genomes in the public databases and represents a valuable resource for biomass-degrading enzyme discovery and studies of the rumen microbiome.
QSRA: a quality-value guided de novo short read assembler.
Bryant, Douglas W; Wong, Weng-Keen; Mockler, Todd C
2009-02-24
New rapid high-throughput sequencing technologies have sparked the creation of a new class of assembler. Since all high-throughput sequencing platforms incorporate errors in their output, short-read assemblers must be designed to account for this error while utilizing all available data. We have designed and implemented an assembler, Quality-value guided Short Read Assembler, created to take advantage of quality-value scores as a further method of dealing with error. Compared to previous published algorithms, our assembler shows significant improvements not only in speed but also in output quality. QSRA generally produced the highest genomic coverage, while being faster than VCAKE. QSRA is extremely competitive in its longest contig and N50/N80 contig lengths, producing results of similar quality to those of EDENA and VELVET. QSRA provides a step closer to the goal of de novo assembly of complex genomes, improving upon the original VCAKE algorithm by not only drastically reducing runtimes but also increasing the viability of the assembly algorithm through further error handling capabilities.
Lu, Yang Young; Chen, Ting; Fuhrman, Jed A; Sun, Fengzhu
2017-03-15
The advent of next-generation sequencing technologies enables researchers to sequence complex microbial communities directly from the environment. Because assembly typically produces only genome fragments, also known as contigs, instead of an entire genome, it is crucial to group them into operational taxonomic units (OTUs) for further taxonomic profiling and down-streaming functional analysis. OTU clustering is also referred to as binning. We present COCACOLA, a general framework automatically bin contigs into OTUs based on sequence composition and coverage across multiple samples. The effectiveness of COCACOLA is demonstrated in both simulated and real datasets in comparison with state-of-art binning approaches such as CONCOCT, GroopM, MaxBin and MetaBAT. The superior performance of COCACOLA relies on two aspects. One is using L 1 distance instead of Euclidean distance for better taxonomic identification during initialization. More importantly, COCACOLA takes advantage of both hard clustering and soft clustering by sparsity regularization. In addition, the COCACOLA framework seamlessly embraces customized knowledge to facilitate binning accuracy. In our study, we have investigated two types of additional knowledge, the co-alignment to reference genomes and linkage of contigs provided by paired-end reads, as well as the ensemble of both. We find that both co-alignment and linkage information further improve binning in the majority of cases. COCACOLA is scalable and faster than CONCOCT, GroopM, MaxBin and MetaBAT. The software is available at https://github.com/younglululu/COCACOLA . fsun@usc.edu. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
Genomic Rearrangements in Arabidopsis Considered as Quantitative Traits.
Imprialou, Martha; Kahles, André; Steffen, Joshua G; Osborne, Edward J; Gan, Xiangchao; Lempe, Janne; Bhomra, Amarjit; Belfield, Eric; Visscher, Anne; Greenhalgh, Robert; Harberd, Nicholas P; Goram, Richard; Hein, Jotun; Robert-Seilaniantz, Alexandre; Jones, Jonathan; Stegle, Oliver; Kover, Paula; Tsiantis, Miltos; Nordborg, Magnus; Rätsch, Gunnar; Clark, Richard M; Mott, Richard
2017-04-01
To understand the population genetics of structural variants and their effects on phenotypes, we developed an approach to mapping structural variants that segregate in a population sequenced at low coverage. We avoid calling structural variants directly. Instead, the evidence for a potential structural variant at a locus is indicated by variation in the counts of short-reads that map anomalously to that locus. These structural variant traits are treated as quantitative traits and mapped genetically, analogously to a gene expression study. Association between a structural variant trait at one locus, and genotypes at a distant locus indicate the origin and target of a transposition. Using ultra-low-coverage (0.3×) population sequence data from 488 recombinant inbred Arabidopsis thaliana genomes, we identified 6502 segregating structural variants. Remarkably, 25% of these were transpositions. While many structural variants cannot be delineated precisely, we validated 83% of 44 predicted transposition breakpoints by polymerase chain reaction. We show that specific structural variants may be causative for quantitative trait loci for germination and resistance to infection by the fungus Albugo laibachii , isolate Nc14. Further we show that the phenotypic heritability attributable to read-mapping anomalies differs from, and, in the case of time to germination and bolting, exceeds that due to standard genetic variation. Genes within structural variants are also more likely to be silenced or dysregulated. This approach complements the prevalent strategy of structural variant discovery in fewer individuals sequenced at high coverage. It is generally applicable to large populations sequenced at low-coverage, and is particularly suited to mapping transpositions. Copyright © 2017 by the Genetics Society of America.
Draft versus finished sequence data for DNA and protein diagnostic signature development
Gardner, Shea N.; Lam, Marisa W.; Smith, Jason R.; Torres, Clinton L.; Slezak, Tom R.
2005-01-01
Sequencing pathogen genomes is costly, demanding careful allocation of limited sequencing resources. We built a computational Sequencing Analysis Pipeline (SAP) to guide decisions regarding the amount of genomic sequencing necessary to develop high-quality diagnostic DNA and protein signatures. SAP uses simulations to estimate the number of target genomes and close phylogenetic relatives (near neighbors or NNs) to sequence. We use SAP to assess whether draft data are sufficient or finished sequencing is required using Marburg and variola virus sequences. Simulations indicate that intermediate to high-quality draft with error rates of 10−3–10−5 (∼8× coverage) of target organisms is suitable for DNA signature prediction. Low-quality draft with error rates of ∼1% (3× to 6× coverage) of target isolates is inadequate for DNA signature prediction, although low-quality draft of NNs is sufficient, as long as the target genomes are of high quality. For protein signature prediction, sequencing errors in target genomes substantially reduce the detection of amino acid sequence conservation, even if the draft is of high quality. In summary, high-quality draft of target and low-quality draft of NNs appears to be a cost-effective investment for DNA signature prediction, but may lead to underestimation of predicted protein signatures. PMID:16243783
Testing the molecular clock using mechanistic models of fossil preservation and molecular evolution
2017-01-01
Molecular sequence data provide information about relative times only, and fossil-based age constraints are the ultimate source of information about absolute times in molecular clock dating analyses. Thus, fossil calibrations are critical to molecular clock dating, but competing methods are difficult to evaluate empirically because the true evolutionary time scale is never known. Here, we combine mechanistic models of fossil preservation and sequence evolution in simulations to evaluate different approaches to constructing fossil calibrations and their impact on Bayesian molecular clock dating, and the relative impact of fossil versus molecular sampling. We show that divergence time estimation is impacted by the model of fossil preservation, sampling intensity and tree shape. The addition of sequence data may improve molecular clock estimates, but accuracy and precision is dominated by the quality of the fossil calibrations. Posterior means and medians are poor representatives of true divergence times; posterior intervals provide a much more accurate estimate of divergence times, though they may be wide and often do not have high coverage probability. Our results highlight the importance of increased fossil sampling and improved statistical approaches to generating calibrations, which should incorporate the non-uniform nature of ecological and temporal fossil species distributions. PMID:28637852
Lu, Fu-Hao; McKenzie, Neil; Kettleborough, George; Heavens, Darren; Clark, Matthew D; Bevan, Michael W
2018-05-01
The accurate sequencing and assembly of very large, often polyploid, genomes remains a challenging task, limiting long-range sequence information and phased sequence variation for applications such as plant breeding. The 15-Gb hexaploid bread wheat (Triticum aestivum) genome has been particularly challenging to sequence, and several different approaches have recently generated long-range assemblies. Mapping and understanding the types of assembly errors are important for optimising future sequencing and assembly approaches and for comparative genomics. Here we use a Fosill 38-kb jumping library to assess medium and longer-range order of different publicly available wheat genome assemblies. Modifications to the Fosill protocol generated longer Illumina sequences and enabled comprehensive genome coverage. Analyses of two independent Bacterial Artificial Chromosome (BAC)-based chromosome-scale assemblies, two independent Illumina whole genome shotgun assemblies, and a hybrid Single Molecule Real Time (SMRT-PacBio) and short read (Illumina) assembly were carried out. We revealed a surprising scale and variety of discrepancies using Fosill mate-pair mapping and validated several of each class. In addition, Fosill mate-pairs were used to scaffold a whole genome Illumina assembly, leading to a 3-fold increase in N50 values. Our analyses, using an independent means to validate different wheat genome assemblies, show that whole genome shotgun assemblies based solely on Illumina sequences are significantly more accurate by all measures compared to BAC-based chromosome-scale assemblies and hybrid SMRT-Illumina approaches. Although current whole genome assemblies are reasonably accurate and useful, additional improvements will be needed to generate complete assemblies of wheat genomes using open-source, computationally efficient, and cost-effective methods.
Artieri, Carlo G; Fraser, Hunter B
2014-12-01
The recent advent of ribosome profiling-sequencing of short ribosome-bound fragments of mRNA-has offered an unprecedented opportunity to interrogate the sequence features responsible for modulating translational rates. Nevertheless, numerous analyses of the first riboprofiling data set have produced equivocal and often incompatible results. Here we analyze three independent yeast riboprofiling data sets, including two with much higher coverage than previously available, and find that all three show substantial technical sequence biases that confound interpretations of ribosomal occupancy. After accounting for these biases, we find no effect of previously implicated factors on ribosomal pausing. Rather, we find that incorporation of proline, whose unique side-chain stalls peptide synthesis in vitro, also slows the ribosome in vivo. We also reanalyze a method that implicated positively charged amino acids as the major determinant of ribosomal stalling and demonstrate that it produces false signals of stalling in low-coverage data. Our results suggest that any analysis of riboprofiling data should account for sequencing biases and sparse coverage. To this end, we establish a robust methodology that enables analysis of ribosome profiling data without prior assumptions regarding which positions spanned by the ribosome cause stalling. © 2014 Artieri and Fraser; Published by Cold Spring Harbor Laboratory Press.
Arnold, Roland; Goldenberg, Florian; Mewes, Hans-Werner; Rattei, Thomas
2014-01-01
The Similarity Matrix of Proteins (SIMAP, http://mips.gsf.de/simap/) database has been designed to massively accelerate computationally expensive protein sequence analysis tasks in bioinformatics. It provides pre-calculated sequence similarities interconnecting the entire known protein sequence universe, complemented by pre-calculated protein features and domains, similarity clusters and functional annotations. SIMAP covers all major public protein databases as well as many consistently re-annotated metagenomes from different repositories. As of September 2013, SIMAP contains >163 million proteins corresponding to ∼70 million non-redundant sequences. SIMAP uses the sensitive FASTA search heuristics, the Smith–Waterman alignment algorithm, the InterPro database of protein domain models and the BLAST2GO functional annotation algorithm. SIMAP assists biologists by facilitating the interactive exploration of the protein sequence universe. Web-Service and DAS interfaces allow connecting SIMAP with any other bioinformatic tool and resource. All-against-all protein sequence similarity matrices of project-specific protein collections are generated on request. Recent improvements allow SIMAP to cover the rapidly growing sequenced protein sequence universe. New Web-Service interfaces enhance the connectivity of SIMAP. Novel tools for interactive extraction of protein similarity networks have been added. Open access to SIMAP is provided through the web portal; the portal also contains instructions and links for software access and flat file downloads. PMID:24165881
New in-depth rainbow trout transcriptome reference and digital atlas of gene expression
USDA-ARS?s Scientific Manuscript database
Sequencing the rainbow trout genome is underway and a transcriptome reference sequence is required to help in genome assembly and gene discovery. Previously, we reported a transcriptome reference sequence using a 19X coverage of 454-pyrosequencing data. Although this work added a great wealth of ann...
USDA-ARS?s Scientific Manuscript database
Genomic structural variations are an important source of genetic diversity. Copy number variations (CNVs), gains and losses of large regions of genomic sequence between individuals of a species, are known to be associated with both diseases and phenotypic traits. Deeply sequenced genomes are often u...
Scalable whole-exome sequencing of cell-free DNA reveals high concordance with metastatic tumors.
Adalsteinsson, Viktor A; Ha, Gavin; Freeman, Samuel S; Choudhury, Atish D; Stover, Daniel G; Parsons, Heather A; Gydush, Gregory; Reed, Sarah C; Rotem, Denisse; Rhoades, Justin; Loginov, Denis; Livitz, Dimitri; Rosebrock, Daniel; Leshchiner, Ignaty; Kim, Jaegil; Stewart, Chip; Rosenberg, Mara; Francis, Joshua M; Zhang, Cheng-Zhong; Cohen, Ofir; Oh, Coyin; Ding, Huiming; Polak, Paz; Lloyd, Max; Mahmud, Sairah; Helvie, Karla; Merrill, Margaret S; Santiago, Rebecca A; O'Connor, Edward P; Jeong, Seong H; Leeson, Rachel; Barry, Rachel M; Kramkowski, Joseph F; Zhang, Zhenwei; Polacek, Laura; Lohr, Jens G; Schleicher, Molly; Lipscomb, Emily; Saltzman, Andrea; Oliver, Nelly M; Marini, Lori; Waks, Adrienne G; Harshman, Lauren C; Tolaney, Sara M; Van Allen, Eliezer M; Winer, Eric P; Lin, Nancy U; Nakabayashi, Mari; Taplin, Mary-Ellen; Johannessen, Cory M; Garraway, Levi A; Golub, Todd R; Boehm, Jesse S; Wagle, Nikhil; Getz, Gad; Love, J Christopher; Meyerson, Matthew
2017-11-06
Whole-exome sequencing of cell-free DNA (cfDNA) could enable comprehensive profiling of tumors from blood but the genome-wide concordance between cfDNA and tumor biopsies is uncertain. Here we report ichorCNA, software that quantifies tumor content in cfDNA from 0.1× coverage whole-genome sequencing data without prior knowledge of tumor mutations. We apply ichorCNA to 1439 blood samples from 520 patients with metastatic prostate or breast cancers. In the earliest tested sample for each patient, 34% of patients have ≥10% tumor-derived cfDNA, sufficient for standard coverage whole-exome sequencing. Using whole-exome sequencing, we validate the concordance of clonal somatic mutations (88%), copy number alterations (80%), mutational signatures, and neoantigens between cfDNA and matched tumor biopsies from 41 patients with ≥10% cfDNA tumor content. In summary, we provide methods to identify patients eligible for comprehensive cfDNA profiling, revealing its applicability to many patients, and demonstrate high concordance of cfDNA and metastatic tumor whole-exome sequencing.
The study of human Y chromosome variation through ancient DNA.
Kivisild, Toomas
2017-05-01
High throughput sequencing methods have completely transformed the study of human Y chromosome variation by offering a genome-scale view on genetic variation retrieved from ancient human remains in context of a growing number of high coverage whole Y chromosome sequence data from living populations from across the world. The ancient Y chromosome sequences are providing us the first exciting glimpses into the past variation of male-specific compartment of the genome and the opportunity to evaluate models based on previously made inferences from patterns of genetic variation in living populations. Analyses of the ancient Y chromosome sequences are challenging not only because of issues generally related to ancient DNA work, such as DNA damage-induced mutations and low content of endogenous DNA in most human remains, but also because of specific properties of the Y chromosome, such as its highly repetitive nature and high homology with the X chromosome. Shotgun sequencing of uniquely mapping regions of the Y chromosomes to sufficiently high coverage is still challenging and costly in poorly preserved samples. To increase the coverage of specific target SNPs capture-based methods have been developed and used in recent years to generate Y chromosome sequence data from hundreds of prehistoric skeletal remains. Besides the prospects of testing directly as how much genetic change in a given time period has accompanied changes in material culture the sequencing of ancient Y chromosomes allows us also to better understand the rate at which mutations accumulate and get fixed over time. This review considers genome-scale evidence on ancient Y chromosome diversity that has recently started to accumulate in geographic areas favourable to DNA preservation. More specifically the review focuses on examples of regional continuity and change of the Y chromosome haplogroups in North Eurasia and in the New World.
Wahjuhono, Gendro; Revolusiana; Widhiastuti, Dyah; Sundoro, Julitasari; Mardani, Tri; Ratih, Woro Umi; Sutomo, Retno; Safitri, Ida; Sampurno, Ondri Dwi; Rana, Bardan; Roivainen, Merja; Kahn, Anna-Lea; Mach, Ondrej; Pallansch, Mark A; Sutter, Roland W
2014-11-01
Inactivated poliovirus vaccine (IPV) is rarely used in tropical developing countries. To generate additional scientific information, especially on the possible emergence of vaccine-derived polioviruses (VDPVs) in an IPV-only environment, we initiated an IPV introduction project in Yogyakarta, an Indonesian province. In this report, we present the coverage, immunity, and VDPV surveillance results. In Yogyakarta, we established environmental surveillance starting in 2004; and conducted routine immunization coverage and seroprevalence surveys before and after a September 2007 switch from oral poliovirus vaccine (OPV) to IPV, using standard coverage and serosurvey methods. Rates and types of polioviruses found in sewage samples were analyzed, and all poliovirus isolates after the switch were sequenced. Vaccination coverage (>95%) and immunity (approximately 100%) did not change substantially before and after the IPV switch. No VDPVs were detected. Before the switch, 58% of environmental samples contained Sabin poliovirus; starting 6 weeks after the switch, Sabin polioviruses were rarely isolated, and if they were, genetic sequencing suggested recent introductions. This project demonstrated that under almost ideal conditions (good hygiene, maintenance of universally high IPV coverage, and corresponding high immunity against polioviruses), no emergence and circulation of VDPV could be detected in a tropical developing country setting. © The Author 2014. Published by Oxford University Press on behalf of the Infectious Diseases Society of America. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Emerman, Amy B; Bowman, Sarah K; Barry, Andrew; Henig, Noa; Patel, Kruti M; Gardner, Andrew F; Hendrickson, Cynthia L
2017-07-05
Next-generation sequencing (NGS) is a powerful tool for genomic studies, translational research, and clinical diagnostics that enables the detection of single nucleotide polymorphisms, insertions and deletions, copy number variations, and other genetic variations. Target enrichment technologies improve the efficiency of NGS by only sequencing regions of interest, which reduces sequencing costs while increasing coverage of the selected targets. Here we present NEBNext Direct ® , a hybridization-based, target-enrichment approach that addresses many of the shortcomings of traditional target-enrichment methods. This approach features a simple, 7-hr workflow that uses enzymatic removal of off-target sequences to achieve a high specificity for regions of interest. Additionally, unique molecular identifiers are incorporated for the identification and filtering of PCR duplicates. The same protocol can be used across a wide range of input amounts, input types, and panel sizes, enabling NEBNext Direct to be broadly applicable across a wide variety of research and diagnostic needs. © 2017 by John Wiley & Sons, Inc. Copyright © 2017 John Wiley & Sons, Inc.
PaPrBaG: A machine learning approach for the detection of novel pathogens from NGS data
NASA Astrophysics Data System (ADS)
Deneke, Carlus; Rentzsch, Robert; Renard, Bernhard Y.
2017-01-01
The reliable detection of novel bacterial pathogens from next-generation sequencing data is a key challenge for microbial diagnostics. Current computational tools usually rely on sequence similarity and often fail to detect novel species when closely related genomes are unavailable or missing from the reference database. Here we present the machine learning based approach PaPrBaG (Pathogenicity Prediction for Bacterial Genomes). PaPrBaG overcomes genetic divergence by training on a wide range of species with known pathogenicity phenotype. To that end we compiled a comprehensive list of pathogenic and non-pathogenic bacteria with human host, using various genome metadata in conjunction with a rule-based protocol. A detailed comparative study reveals that PaPrBaG has several advantages over sequence similarity approaches. Most importantly, it always provides a prediction whereas other approaches discard a large number of sequencing reads with low similarity to currently known reference genomes. Furthermore, PaPrBaG remains reliable even at very low genomic coverages. CombiningPaPrBaG with existing approaches further improves prediction results.
Leese, Florian; Mayer, Christoph; Agrawal, Shobhit; Dambach, Johannes; Dietz, Lars; Doemel, Jana S.; Goodall-Copstake, William P.; Held, Christoph; Jackson, Jennifer A.; Lampert, Kathrin P.; Linse, Katrin; Macher, Jan N.; Nolzen, Jennifer; Raupach, Michael J.; Rivera, Nicole T.; Schubart, Christoph D.; Striewski, Sebastian; Tollrian, Ralph; Sands, Chester J.
2012-01-01
High throughput sequencing technologies are revolutionizing genetic research. With this “rise of the machines”, genomic sequences can be obtained even for unknown genomes within a short time and for reasonable costs. This has enabled evolutionary biologists studying genetically unexplored species to identify molecular markers or genomic regions of interest (e.g. micro- and minisatellites, mitochondrial and nuclear genes) by sequencing only a fraction of the genome. However, when using such datasets from non-model species, it is possible that DNA from non-target contaminant species such as bacteria, viruses, fungi, or other eukaryotic organisms may complicate the interpretation of the results. In this study we analysed 14 genomic pyrosequencing libraries of aquatic non-model taxa from four major evolutionary lineages. We quantified the amount of suitable micro- and minisatellites, mitochondrial genomes, known nuclear genes and transposable elements and searched for contamination from various sources using bioinformatic approaches. Our results show that in all sequence libraries with estimated coverage of about 0.02–25%, many appropriate micro- and minisatellites, mitochondrial gene sequences and nuclear genes from different KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways could be identified and characterized. These can serve as markers for phylogenetic and population genetic analyses. A central finding of our study is that several genomic libraries suffered from different biases owing to non-target DNA or mobile elements. In particular, viruses, bacteria or eukaryote endosymbionts contributed significantly (up to 10%) to some of the libraries analysed. If not identified as such, genetic markers developed from high-throughput sequencing data for non-model organisms may bias evolutionary studies or fail completely in experimental tests. In conclusion, our study demonstrates the enormous potential of low-coverage genome survey sequences and suggests bioinformatic analysis workflows. The results also advise a more sophisticated filtering for problematic sequences and non-target genome sequences prior to developing markers. PMID:23185309
Verifying Digital Components of Physical Systems: Experimental Evaluation of Test Quality
NASA Astrophysics Data System (ADS)
Laputenko, A. V.; López, J. E.; Yevtushenko, N. V.
2018-03-01
This paper continues the study of high quality test derivation for verifying digital components which are used in various physical systems; those are sensors, data transfer components, etc. We have used logic circuits b01-b010 of the package of ITC'99 benchmarks (Second Release) for experimental evaluation which as stated before, describe digital components of physical systems designed for various applications. Test sequences are derived for detecting the most known faults of the reference logic circuit using three different approaches to test derivation. Three widely used fault types such as stuck-at-faults, bridges, and faults which slightly modify the behavior of one gate are considered as possible faults of the reference behavior. The most interesting test sequences are short test sequences that can provide appropriate guarantees after testing, and thus, we experimentally study various approaches to the derivation of the so-called complete test suites which detect all fault types. In the first series of experiments, we compare two approaches for deriving complete test suites. In the first approach, a shortest test sequence is derived for testing each fault. In the second approach, a test sequence is pseudo-randomly generated by the use of an appropriate software for logic synthesis and verification (ABC system in our study) and thus, can be longer. However, after deleting sequences detecting the same set of faults, a test suite returned by the second approach is shorter. The latter underlines the fact that in many cases it is useless to spend `time and efforts' for deriving a shortest distinguishing sequence; it is better to use the test minimization afterwards. The performed experiments also show that the use of only randomly generated test sequences is not very efficient since such sequences do not detect all the faults of any type. After reaching the fault coverage around 70%, saturation is observed, and the fault coverage cannot be increased anymore. For deriving high quality short test suites, the approach that is the combination of randomly generated sequences together with sequences which are aimed to detect faults not detected by random tests, allows to reach the good fault coverage using shortest test sequences.
Detection of Bacterial Pathogens from Broncho-Alveolar Lavage by Next-Generation Sequencing.
Leo, Stefano; Gaïa, Nadia; Ruppé, Etienne; Emonet, Stephane; Girard, Myriam; Lazarevic, Vladimir; Schrenzel, Jacques
2017-09-20
The applications of whole-metagenome shotgun sequencing (WMGS) in routine clinical analysis are still limited. A combination of a DNA extraction procedure, sequencing, and bioinformatics tools is essential for the removal of human DNA and for improving bacterial species identification in a timely manner. We tackled these issues with a broncho-alveolar lavage (BAL) sample from an immunocompromised patient who had developed severe chronic pneumonia. We extracted DNA from the BAL sample with protocols based either on sequential lysis of human and bacterial cells or on the mechanical disruption of all cells. Metagenomic libraries were sequenced on Illumina HiSeq platforms. Microbial community composition was determined by k-mer analysis or by mapping to taxonomic markers. Results were compared to those obtained by conventional clinical culture and molecular methods. Compared to mechanical cell disruption, a sequential lysis protocol resulted in a significantly increased proportion of bacterial DNA over human DNA and higher sequence coverage of Mycobacterium abscessus , Corynebacterium jeikeium and Rothia dentocariosa , the bacteria reported by clinical microbiology tests. In addition, we identified anaerobic bacteria not searched for by the clinical laboratory. Our results further support the implementation of WMGS in clinical routine diagnosis for bacterial identification.
Genome and transcriptome of the regeneration-competent flatworm, Macrostomum lignano.
Wasik, Kaja; Gurtowski, James; Zhou, Xin; Ramos, Olivia Mendivil; Delás, M Joaquina; Battistoni, Giorgia; El Demerdash, Osama; Falciatori, Ilaria; Vizoso, Dita B; Smith, Andrew D; Ladurner, Peter; Schärer, Lukas; McCombie, W Richard; Hannon, Gregory J; Schatz, Michael
2015-10-06
The free-living flatworm, Macrostomum lignano has an impressive regenerative capacity. Following injury, it can regenerate almost an entirely new organism because of the presence of an abundant somatic stem cell population, the neoblasts. This set of unique properties makes many flatworms attractive organisms for studying the evolution of pathways involved in tissue self-renewal, cell-fate specification, and regeneration. The use of these organisms as models, however, is hampered by the lack of a well-assembled and annotated genome sequences, fundamental to modern genetic and molecular studies. Here we report the genomic sequence of M. lignano and an accompanying characterization of its transcriptome. The genome structure of M. lignano is remarkably complex, with ∼75% of its sequence being comprised of simple repeats and transposon sequences. This has made high-quality assembly from Illumina reads alone impossible (N50=222 bp). We therefore generated 130× coverage by long sequencing reads from the Pacific Biosciences platform to create a substantially improved assembly with an N50 of 64 Kbp. We complemented the reference genome with an assembled and annotated transcriptome, and used both of these datasets in combination to probe gene-expression patterns during regeneration, examining pathways important to stem cell function.
Shirts, Brian H; Salipante, Stephen J; Casadei, Silvia; Ryan, Shawnia; Martin, Judith; Jacobson, Angela; Vlaskin, Tatyana; Koehler, Karen; Livingston, Robert J; King, Mary-Claire; Walsh, Tom; Pritchard, Colin C
2014-10-01
Single-exon inversions have rarely been described in clinical syndromes and are challenging to detect using Sanger sequencing. We report the case of a 40-year-old woman with adenomatous colon polyps too numerous to count and who had a complex inversion spanning the entire exon 10 in APC (the gene encoding for adenomatous polyposis coli), causing exon skipping and resulting in a frameshift and premature protein truncation. In this study, we employed complete APC gene sequencing using high-coverage next-generation sequencing by ColoSeq, analysis with BreakDancer and SLOPE software, and confirmatory transcript analysis. ColoSeq identified a complex small genomic rearrangement consisting of an inversion that results in translational skipping of exon 10 in the APC gene. This mutation would not have been detected by traditional sequencing or gene-dosage methods. We report a case of adenomatous polyposis resulting from a complex single-exon inversion. Our report highlights the benefits of large-scale sequencing methods that capture intronic sequences with high enough depth of coverage-as well as the use of informatics tools-to enable detection of small pathogenic structural rearrangements.
Model-based reconstruction of synthetic promoter library in Corynebacterium glutamicum.
Zhang, Shuanghong; Liu, Dingyu; Mao, Zhitao; Mao, Yufeng; Ma, Hongwu; Chen, Tao; Zhao, Xueming; Wang, Zhiwen
2018-05-01
To develop an efficient synthetic promoter library for fine-tuned expression of target genes in Corynebacterium glutamicum. A synthetic promoter library for C. glutamicum was developed based on conserved sequences of the - 10 and - 35 regions. The synthetic promoter library covered a wide range of strengths, ranging from 1 to 193% of the tac promoter. 68 promoters were selected and sequenced for correlation analysis between promoter sequence and strength with a statistical model. A new promoter library was further reconstructed with improved promoter strength and coverage based on the results of correlation analysis. Tandem promoter P70 was finally constructed with increased strength by 121% over the tac promoter. The promoter library developed in this study showed a great potential for applications in metabolic engineering and synthetic biology for the optimization of metabolic networks. To the best of our knowledge, this is the first reconstruction of synthetic promoter library based on statistical analysis of C. glutamicum.
Kozyreva, Varvara K.; Truong, Chau-Linda; Greninger, Alexander L.; Crandall, John; Mukhopadhyay, Rituparna
2017-01-01
ABSTRACT Public health microbiology laboratories (PHLs) are on the cusp of unprecedented improvements in pathogen identification, antibiotic resistance detection, and outbreak investigation by using whole-genome sequencing (WGS). However, considerable challenges remain due to the lack of common standards. Here, we describe the validation of WGS on the Illumina platform for routine use in PHLs according to Clinical Laboratory Improvements Act (CLIA) guidelines for laboratory-developed tests (LDTs). We developed a validation panel comprising 10 Enterobacteriaceae isolates, 5 Gram-positive cocci, 5 Gram-negative nonfermenting species, 9 Mycobacterium tuberculosis isolates, and 5 miscellaneous bacteria. The genome coverage range was 15.71× to 216.4× (average, 79.72×; median, 71.55×); the limit of detection (LOD) for single nucleotide polymorphisms (SNPs) was 60×. The accuracy, reproducibility, and repeatability of base calling were >99.9%. The accuracy of phylogenetic analysis was 100%. The specificity and sensitivity inferred from multilocus sequence typing (MLST) and genome-wide SNP-based phylogenetic assays were 100%. The following objectives were accomplished: (i) the establishment of the performance specifications for WGS applications in PHLs according to CLIA guidelines, (ii) the development of quality assurance and quality control measures, (iii) the development of a reporting format for end users with or without WGS expertise, (iv) the availability of a validation set of microorganisms, and (v) the creation of a modular template for the validation of WGS processes in PHLs. The validation panel, sequencing analytics, and raw sequences could facilitate multilaboratory comparisons of WGS data. Additionally, the WGS performance specifications and modular template are adaptable for the validation of other platforms and reagent kits. PMID:28592550
Kozyreva, Varvara K; Truong, Chau-Linda; Greninger, Alexander L; Crandall, John; Mukhopadhyay, Rituparna; Chaturvedi, Vishnu
2017-08-01
Public health microbiology laboratories (PHLs) are on the cusp of unprecedented improvements in pathogen identification, antibiotic resistance detection, and outbreak investigation by using whole-genome sequencing (WGS). However, considerable challenges remain due to the lack of common standards. Here, we describe the validation of WGS on the Illumina platform for routine use in PHLs according to Clinical Laboratory Improvements Act (CLIA) guidelines for laboratory-developed tests (LDTs). We developed a validation panel comprising 10 Enterobacteriaceae isolates, 5 Gram-positive cocci, 5 Gram-negative nonfermenting species, 9 Mycobacterium tuberculosis isolates, and 5 miscellaneous bacteria. The genome coverage range was 15.71× to 216.4× (average, 79.72×; median, 71.55×); the limit of detection (LOD) for single nucleotide polymorphisms (SNPs) was 60×. The accuracy, reproducibility, and repeatability of base calling were >99.9%. The accuracy of phylogenetic analysis was 100%. The specificity and sensitivity inferred from multilocus sequence typing (MLST) and genome-wide SNP-based phylogenetic assays were 100%. The following objectives were accomplished: (i) the establishment of the performance specifications for WGS applications in PHLs according to CLIA guidelines, (ii) the development of quality assurance and quality control measures, (iii) the development of a reporting format for end users with or without WGS expertise, (iv) the availability of a validation set of microorganisms, and (v) the creation of a modular template for the validation of WGS processes in PHLs. The validation panel, sequencing analytics, and raw sequences could facilitate multilaboratory comparisons of WGS data. Additionally, the WGS performance specifications and modular template are adaptable for the validation of other platforms and reagent kits. Copyright © 2017 Kozyreva et al.
O'Flaherty, Brigid M; Li, Yan; Tao, Ying; Paden, Clinton R; Queen, Krista; Zhang, Jing; Dinwiddie, Darrell L; Gross, Stephen M; Schroth, Gary P; Tong, Suxiang
2018-06-01
Next generation sequencing (NGS) technologies have revolutionized the genomics field and are becoming more commonplace for identification of human infectious diseases. However, due to the low abundance of viral nucleic acids (NAs) in relation to host, viral identification using direct NGS technologies often lacks sufficient sensitivity. Here, we describe an approach based on two complementary enrichment strategies that significantly improves the sensitivity of NGS-based virus identification. To start, we developed two sets of DNA probes to enrich virus NAs associated with respiratory diseases. The first set of probes spans the genomes, allowing for identification of known viruses and full genome sequencing, while the second set targets regions conserved among viral families or genera, providing the ability to detect both known and potentially novel members of those virus groups. Efficiency of enrichment was assessed by NGS testing reference virus and clinical samples with known infection. We show significant improvement in viral identification using enriched NGS compared to unenriched NGS. Without enrichment, we observed an average of 0.3% targeted viral reads per sample. However, after enrichment, 50%-99% of the reads per sample were the targeted viral reads for both the reference isolates and clinical specimens using both probe sets. Importantly, dramatic improvements on genome coverage were also observed following virus-specific probe enrichment. The methods described here provide improved sensitivity for virus identification by NGS, allowing for a more comprehensive analysis of disease etiology. © 2018 O'Flaherty et al.; Published by Cold Spring Harbor Laboratory Press.
NASA Astrophysics Data System (ADS)
Koelmel, Jeremy P.; Kroeger, Nicholas M.; Gill, Emily L.; Ulmer, Candice Z.; Bowden, John A.; Patterson, Rainey E.; Yost, Richard A.; Garrett, Timothy J.
2017-05-01
Untargeted omics analyses aim to comprehensively characterize biomolecules within a biological system. Changes in the presence or quantity of these biomolecules can indicate important biological perturbations, such as those caused by disease. With current technological advancements, the entire genome can now be sequenced; however, in the burgeoning fields of lipidomics, only a subset of lipids can be identified. The recent emergence of high resolution tandem mass spectrometry (HR-MS/MS), in combination with ultra-high performance liquid chromatography, has resulted in an increased coverage of the lipidome. Nevertheless, identifications from MS/MS are generally limited by the number of precursors that can be selected for fragmentation during chromatographic elution. Therefore, we developed the software IE-Omics to automate iterative exclusion (IE), where selected precursors using data-dependent topN analyses are excluded in sequential injections. In each sequential injection, unique precursors are fragmented until HR-MS/MS spectra of all ions above a user-defined intensity threshold are acquired. IE-Omics was applied to lipidomic analyses in Red Cross plasma and substantia nigra tissue. Coverage of the lipidome was drastically improved using IE. When applying IE-Omics to Red Cross plasma and substantia nigra lipid extracts in positive ion mode, 69% and 40% more molecular identifications were obtained, respectively. In addition, applying IE-Omics to a lipidomics workflow increased the coverage of trace species, including odd-chained and short-chained diacylglycerides and oxidized lipid species. By increasing the coverage of the lipidome, applying IE to a lipidomics workflow increases the probability of finding biomarkers and provides additional information for determining etiology of disease.
Simultaneous Multi-Slice fMRI using Spiral Trajectories
Zahneisen, Benjamin; Poser, Benedikt A.; Ernst, Thomas; Stenger, V. Andrew
2014-01-01
Parallel imaging methods using multi-coil receiver arrays have been shown to be effective for increasing MRI acquisition speed. However parallel imaging methods for fMRI with 2D sequences show only limited improvements in temporal resolution because of the long echo times needed for BOLD contrast. Recently, Simultaneous Multi-Slice (SMS) imaging techniques have been shown to increase fMRI temporal resolution by factors of four and higher. In SMS fMRI multiple slices can be acquired simultaneously using Echo Planar Imaging (EPI) and the overlapping slices are un-aliased using a parallel imaging reconstruction with multiple receivers. The slice separation can be further improved using the “blipped-CAIPI” EPI sequence that provides a more efficient sampling of the SMS 3D k-space. In this paper a blipped-spiral SMS sequence for ultra-fast fMRI is presented. The blipped-spiral sequence combines the sampling efficiency of spiral trajectories with the SMS encoding concept used in blipped-CAIPI EPI. We show that blipped spiral acquisition can achieve almost whole brain coverage at 3 mm isotropic resolution in 168 ms. It is also demonstrated that the high temporal resolution allows for dynamic BOLD lag time measurement using visual/motor and retinotopic mapping paradigms. The local BOLD lag time within the visual cortex following the retinotopic mapping stimulation of expanding flickering rings is directly measured and easily translated into an eccentricity map of the cortex. PMID:24518259
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gordon, Sean; Reguera, Maria; Sade, Nir
2015-03-20
We are using the perennial model grass Brachypodium sylvaticum to identify combinations of transgenes that enhance tolerance to multiple, simultaneous abiotic stresses. The most successful transgene combinations will ultimately be used to create improved switchgrass (Panicum virgatum L.) cultivars. To further develop B. sylvaticum as a perennial model grass, and facilitate our planned transcriptional profiling, we are sequencing and annotating the genome. We have generated ~40x genome coverage using PacBio sequencing of the largest possible size selected libraries (18, 22, 25 kb). Our initial assembly using only long-read sequence contained 320 Mb of sequence with an N50 contig length ofmore » 315 kb and an N95 contig length of 40 kb. This assembly consists of 2,430 contigs, the largest of which was 1.6 Mb. The estimated genome size based on c-values is 340 Mb indicating that about 20 Mb of presumably repetitive DNA remains yet unassembled. Significantly, this assembly is far superior to an assembly created from paired-end short-read sequence, ~100x genome coverage. The short-read-only assembly contained only 226 Mb of sequence in 19k contigs. To aid the assembly of the scaffolds into chromosome-scale assemblies we produced an F2 mapping population and have genotyped 480 individuals using a genotype by sequence approach. One of the reasons for using B. sylvaticum as a model system is to determine if the transgenes adversely affect perenniality and winter hardiness. Toward this goal, we examined the freezing tolerance of wild type B. sylvaticum lines to determine the optimal conditions for testing the freezing tolerance of the transgenics. A survey of seven accessions noted significant natural variation in freezing tolerance. Seedling or adult Ain-1 plants, the line used for transformation, survived an 8 hour challenge down to -6 oC and 50% survived a challenge down to -9 oC. Thus, we will be able to easily determine if the transgenes compromise freezing tolerance. In the effort to develop biotechnological tools for perennial grass improvement, we have completed the transformation of B. sylvaticum with constructs containing 20 genes shown to be associated with enhanced abiotic stress tolerance in monocots. In addition, we have transformed plants with constructs containing a combination of genes (i.e. SARK::IPT- Ubi::HSR1::Ubi::NHX1) in order to simultaneously overexpress genes associated with drought + heat tolerance + salt tolerance. We generated single copy insert T1 lines for all constructs and the generation and bulking of homozygous T2 lines is well underway. In addition to our B. sylvaticum transgenics, we transformed B. distachyon with many of the same genes. Some of the transgenic B. distachyon plants subjected to a combined stress of both drought and salinity were able to produced higher yields than wild type plants. Our results indicate a great potential for the development of grasses with improved performance and yield in water-limited areas.« less
Mikkelsen, Martin; Frank-Hansen, Rune; Hansen, Anders J; Morling, Niels
2014-09-01
of sequencing of whole mitochondrial genome, HV1 and HV2 DNA with the second generation system (SGS) Roche 454 GS Junior were compared with results of Sanger sequencing and SNP typing with SNaPshot single base extension detected with MALDI-TOF and capillary electrophoresis. We investigated the performance of the software analysis of the data, reproducibility, ability to sequence homopolymeric regions, detection of mixtures and heteroplasmy as well as the implications of the depth of coverage. We found full reproducibility between samples sequenced twice with SGS. We found close to full concordance between the mtDNA sequences of 26 samples obtained with (1) the 454 SGS method using a depth of coverage above 100 and (2) Sanger sequencing and SNP typing. The discrepancies were primarily observed in homopolymeric regions. The 454 SGS method was able to sequence 95% of the reads correctly in homopolymers up to 4 bases, and up to 6 bases could be sequenced with similar success if the results were carefully, visually inspected. The 454 technology was able to detect mixtures or heteroplasmy of approximately 10%. We detected previously unreported heteroplasmy in the GM9947A component of the NIST human mitochondrial DNA SRM-2392 standard reference material. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
Phased Array 3D MR Spectroscopic Imaging of the Brain at 7 Tesla
Xu, Duan; Cunningham, Charles H; Chen, Albert P.; Li, Yan; Kelley, Douglas AC; Mukherjee, Pratik; Pauly, John M.; Nelson, Sarah J.; Vigneron, Daniel B.
2008-01-01
Ultrahigh field 7T MR scanners offer the potential for greatly improved MR spectroscopic imaging due to increased sensitivity and spectral resolution. Prior 7T human single-voxel MRS studies have shown significant increases in SNR and spectral resolution as compared to lower magnetic fields, but have not demonstrated the increase in spatial resolution and multivoxel coverage possible with 7T MR spectroscopic imaging. The goal of this study was to develop specialized rf pulses and sequences for 3D MRSI at 7T to address the challenges of increased chemical shift misregistration, B1 power limitations, and increased spectral bandwidth. The new 7T MRSI sequence was tested in volunteer studies and demonstrated the feasibility of obtaining high SNR phased-array 3D MRSI from the human brain. PMID:18486386
Biedrzycka, Aleksandra; Sebastian, Alvaro; Migalska, Magdalena; Westerdahl, Helena; Radwan, Jacek
2017-07-01
Characterization of highly duplicated genes, such as genes of the major histocompatibility complex (MHC), where multiple loci often co-amplify, has until recently been hindered by insufficient read depths per amplicon. Here, we used ultra-deep Illumina sequencing to resolve genotypes at exon 3 of MHC class I genes in the sedge warbler (Acrocephalus schoenobaenus). We sequenced 24 individuals in two replicates and used this data, as well as a simulated data set, to test the effect of amplicon coverage (range: 500-20 000 reads per amplicon) on the repeatability of genotyping using four different genotyping approaches. A third replicate employed unique barcoding to assess the extent of tag jumping, that is swapping of individual tag identifiers, which may confound genotyping. The reliability of MHC genotyping increased with coverage and approached or exceeded 90% within-method repeatability of allele calling at coverages of >5000 reads per amplicon. We found generally high agreement between genotyping methods, especially at high coverages. High reliability of the tested genotyping approaches was further supported by our analysis of the simulated data set, although the genotyping approach relying primarily on replication of variants in independent amplicons proved sensitive to repeatable errors. According to the most repeatable genotyping method, the number of co-amplifying variants per individual ranged from 19 to 42. Tag jumping was detectable, but at such low frequencies that it did not affect the reliability of genotyping. We thus demonstrate that gene families with many co-amplifying genes can be reliably genotyped using HTS, provided that there is sufficient per amplicon coverage. © 2016 John Wiley & Sons Ltd.
AD-LIBS: inferring ancestry across hybrid genomes using low-coverage sequence data.
Schaefer, Nathan K; Shapiro, Beth; Green, Richard E
2017-04-04
Inferring the ancestry of each region of admixed individuals' genomes is useful in studies ranging from disease gene mapping to speciation genetics. Current methods require high-coverage genotype data and phased reference panels, and are therefore inappropriate for many data sets. We present a software application, AD-LIBS, that uses a hidden Markov model to infer ancestry across hybrid genomes without requiring variant calling or phasing. This approach is useful for non-model organisms and in cases of low-coverage data, such as ancient DNA. We demonstrate the utility of AD-LIBS with synthetic data. We then use AD-LIBS to infer ancestry in two published data sets: European human genomes with Neanderthal ancestry and brown bear genomes with polar bear ancestry. AD-LIBS correctly infers 87-91% of ancestry in simulations and produces ancestry maps that agree with published results and global ancestry estimates in humans. In brown bears, we find more polar bear ancestry than has been published previously, using both AD-LIBS and an existing software application for local ancestry inference, HAPMIX. We validate AD-LIBS polar bear ancestry maps by recovering a geographic signal within bears that mirrors what is seen in SNP data. Finally, we demonstrate that AD-LIBS is more effective than HAPMIX at inferring ancestry when preexisting phased reference data are unavailable and genomes are sequenced to low coverage. AD-LIBS is an effective tool for ancestry inference that can be used even when few individuals are available for comparison or when genomes are sequenced to low coverage. AD-LIBS is therefore likely to be useful in studies of non-model or ancient organisms that lack large amounts of genomic DNA. AD-LIBS can therefore expand the range of studies in which admixture mapping is a viable tool.
Evaluation of the New MALDI Matrix 4-Chloro-α-Cyanocinnamic Acid
Leszyk, John D.
2010-01-01
MALDI-TOF continues to be an important tool for many proteomic studies. Recently, a new rationally designed matrix 4-chloro-α-cyanocinnamic acid was introduced, which is reported to have superior performance as compared with the “gold standard” α-cyano-4-hydroxycinnamic acid (CHCA).1 In this study, the performance of this new matrix, using the Shimadzu Biotech Axima TOF2 (Shimadzu Biotech, Manchester, UK), was investigated. The overall sequence coverage as well as sensitivity of this matrix were compared with CHCA using standard protein tryptic digests. The performance of this matrix with labile peptides, such as phosphopeptides and 4-sulfophenyl isothiocynate-derivatized peptides, to facilitate de novo sequencing was also explored. This matrix was found to be better performing than CHCA in overall sensitivity and showed better sequence coverage at low-digest levels, partly as a result of less of a bias for arginine-containing peptides. It also showed as much as a tenfold improvement in sensitivity with labile peptides on standard stainless-steel targets. In addition, as a result of the much cooler nature of this matrix, labile peptides are readily seen intact with much less fragmentation in mass spectrometry (MS) mode. This matrix was also evaluated in the MS/MS fragmentation modes of post-source decay (PSD) and collisional-induced dissociation (CID). It was found that fragmentation occurs readily in CID, however as a result of the very cool nature of this new matrix, the PSD fragments were quite weak. This matrix promises to be an important addition to the already extensive array of MALDI matrices. PMID:20592871
An improved filtering algorithm for big read datasets and its application to single-cell assembly.
Wedemeyer, Axel; Kliemann, Lasse; Srivastav, Anand; Schielke, Christian; Reusch, Thorsten B; Rosenstiel, Philip
2017-07-03
For single-cell or metagenomic sequencing projects, it is necessary to sequence with a very high mean coverage in order to make sure that all parts of the sample DNA get covered by the reads produced. This leads to huge datasets with lots of redundant data. A filtering of this data prior to assembly is advisable. Brown et al. (2012) presented the algorithm Diginorm for this purpose, which filters reads based on the abundance of their k-mers. We present Bignorm, a faster and quality-conscious read filtering algorithm. An important new algorithmic feature is the use of phred quality scores together with a detailed analysis of the k-mer counts to decide which reads to keep. We qualify and recommend parameters for our new read filtering algorithm. Guided by these parameters, we remove in terms of median 97.15% of the reads while keeping the mean phred score of the filtered dataset high. Using the SDAdes assembler, we produce assemblies of high quality from these filtered datasets in a fraction of the time needed for an assembly from the datasets filtered with Diginorm. We conclude that read filtering is a practical and efficient method for reducing read data and for speeding up the assembly process. This applies not only for single cell assembly, as shown in this paper, but also to other projects with high mean coverage datasets like metagenomic sequencing projects. Our Bignorm algorithm allows assemblies of competitive quality in comparison to Diginorm, while being much faster. Bignorm is available for download at https://git.informatik.uni-kiel.de/axw/Bignorm .
NASA Technical Reports Server (NTRS)
Velden, Christopher S.
1994-01-01
The thrust of the proposed effort under this contract is aimed at improving techniques to track water vapor data in sequences of imagery from geostationary satellites. In regards to this task, significant testing, evaluation, and progress was accomplished during this period. Sets of winds derived from Meteosat data were routinely produced during Atlantic hurricane events in the 1993 season. These wind sets were delivered via Internet in real time to the Hurricane Research Division in Miami for their evaluation in a track forecast model. For eighteen cases in which 72-hour forecasts were produced, thirteen resulted in track forecast improvements (some quite significant). In addition, quality-controlled Meteosat water vapor winds produced by NESDIS were validated against rawinsondes, yielding an 8 m/s RMS. This figure is comparable to upper-level cloud drift wind accuracies. Given the complementary horizontal coverage in cloud-free areas, we believe that water vapor vectors can supplement cloud-drift wind information to provide good full-disk coverage of the upper tropospheric flow. The impact of these winds on numerical analysis and forecasts will be tested in the next reporting period.
A HIGH COVERAGE GENOME SEQUENCE FROM AN ARCHAIC DENISOVAN INDIVIDUAL
Meyer, Matthias; Kircher, Martin; Gansauge, Marie-Theres; Li, Heng; Racimo, Fernando; Mallick, Swapan; Schraiber, Joshua G.; Jay, Flora; Prüfer, Kay; de Filippo, Cesare; Sudmant, Peter H.; Alkan, Can; Fu, Qiaomei; Do, Ron; Rohland, Nadin; Tandon, Arti; Siebauer, Michael; Green, Richard E.; Bryc, Katarzyna; Briggs, Adrian W.; Stenzel, Udo; Dabney, Jesse; Shendure, Jay; Kitzman, Jacob; Hammer, Michael F.; Shunkov, Michael V.; Derevianko, Anatoli P.; Patterson, Nick; Andrés, Aida M.; Eichler, Evan E.; Slatkin, Montgomery; Reich, David; Kelso, Janet; Pääbo, Svante
2013-01-01
We present a DNA library preparation method that has allowed us to reconstruct a high coverage (30X) genome sequence of a Denisovan, an extinct relative of Neandertals. The quality of this genome allows a direct estimation of Denisovan heterozygosity indicating that genetic diversity in these archaic hominins was extremely low. It also allows tentative dating of the specimen on the basis of “missing evolution” in its genome, detailed measurements of Denisovan and Neandertal admixture into present-day human populations, and the generation of a near-complete catalog of genetic changes that swept to high frequency in modern humans since their divergence from Denisovans. PMID:22936568
Mosaic HIV-1 vaccines expand the breadth and depth of cellular immune responses in rhesus monkeys.
Barouch, Dan H; O'Brien, Kara L; Simmons, Nathaniel L; King, Sharon L; Abbink, Peter; Maxfield, Lori F; Sun, Ying-Hua; La Porte, Annalena; Riggs, Ambryice M; Lynch, Diana M; Clark, Sarah L; Backus, Katherine; Perry, James R; Seaman, Michael S; Carville, Angela; Mansfield, Keith G; Szinger, James J; Fischer, Will; Muldoon, Mark; Korber, Bette
2010-03-01
The worldwide diversity of HIV-1 presents an unprecedented challenge for vaccine development. Antigens derived from natural HIV-1 sequences have elicited only a limited breadth of cellular immune responses in nonhuman primate studies and clinical trials to date. Polyvalent 'mosaic' antigens, in contrast, are designed to optimize cellular immunologic coverage of global HIV-1 sequence diversity. Here we show that mosaic HIV-1 Gag, Pol and Env antigens expressed by recombinant, replication-incompetent adenovirus serotype 26 vectors markedly augmented both the breadth and depth without compromising the magnitude of antigen-specific T lymphocyte responses as compared with consensus or natural sequence HIV-1 antigens in rhesus monkeys. Polyvalent mosaic antigens therefore represent a promising strategy to expand cellular immunologic vaccine coverage for genetically diverse pathogens such as HIV-1.
Miura, Naoki; Kucho, Ken-Ichi; Noguchi, Michiko; Miyoshi, Noriaki; Uchiumi, Toshiki; Kawaguchi, Hiroaki; Tanimoto, Akihide
2014-01-01
The microminipig, which weighs less than 10 kg at an early stage of maturity, has been reported as a potential experimental model animal. Its extremely small size and other distinct characteristics suggest the possibility of a number of differences between the genome of the microminipig and that of conventional pigs. In this study, we analyzed the genomes of two healthy microminipigs using a next-generation sequencer SOLiD™ system. We then compared the obtained genomic sequences with a genomic database for the domestic pig (Sus scrofa). The mapping coverage of sequenced tag from the microminipig to conventional pig genomic sequences was greater than 96% and we detected no clear, substantial genomic variance from these data. The results may indicate that the distinct characteristics of the microminipig derive from small-scale alterations in the genome, such as Single Nucleotide Polymorphisms or translational modifications, rather than large-scale deletion or insertion polymorphisms. Further investigation of the entire genomic sequence of the microminipig with methods enabling deeper coverage is required to elucidate the genetic basis of its distinct phenotypic traits. Copyright © 2014 International Institute of Anticancer Research (Dr. John G. Delinassios), All rights reserved.
Comparison and quantitative verification of mapping algorithms for whole genome bisulfite sequencing
USDA-ARS?s Scientific Manuscript database
Coupling bisulfite conversion with next-generation sequencing (Bisulfite-seq) enables genome-wide measurement of DNA methylation, but poses unique challenges for mapping. However, despite a proliferation of Bisulfite-seq mapping tools, no systematic comparison of their genomic coverage and quantitat...
Fast imputation using medium or low-coverage sequence data
USDA-ARS?s Scientific Manuscript database
Accurate genotype imputation can greatly reduce costs and increase benefits by combining whole-genome sequence data of varying read depth and microarray genotypes of varying densities. For large populations, an efficient strategy chooses the two haplotypes most likely to form each genotype and updat...
Role of Mitochondrial Inheritance on Prostate Cancer Outcome in African American Men. Addendum
2016-11-01
DNA sequencing technique developed by our collaborator using single amplicon long-range PCR that permits deep coverage (10,000-20,000X on average) of...the mitochondrial genome. We have sequenced 652 samples derived from frozen fully using this technology. The additional DNA samples derived from...paraffin embedded (FFPE) tissue were more challenging, but have now been sequenced . Mapping of DNA variants in our sequenced genomes to mitochondrial
Aziz, Ramy K.; Dwivedi, Bhakti; Akhter, Sajia; Breitbart, Mya; Edwards, Robert A.
2015-01-01
Phages are the most abundant biological entities on Earth and play major ecological roles, yet the current sequenced phage genomes do not adequately represent their diversity, and little is known about the abundance and distribution of these sequenced genomes in nature. Although the study of phage ecology has benefited tremendously from the emergence of metagenomic sequencing, a systematic survey of phage genes and genomes in various ecosystems is still lacking, and fundamental questions about phage biology, lifestyle, and ecology remain unanswered. To address these questions and improve comparative analysis of phages in different metagenomes, we screened a core set of publicly available metagenomic samples for sequences related to completely sequenced phages using the web tool, Phage Eco-Locator. We then adopted and deployed an array of mathematical and statistical metrics for a multidimensional estimation of the abundance and distribution of phage genes and genomes in various ecosystems. Experiments using those metrics individually showed their usefulness in emphasizing the pervasive, yet uneven, distribution of known phage sequences in environmental metagenomes. Using these metrics in combination allowed us to resolve phage genomes into clusters that correlated with their genotypes and taxonomic classes as well as their ecological properties. We propose adding this set of metrics to current metaviromic analysis pipelines, where they can provide insight regarding phage mosaicism, habitat specificity, and evolution. PMID:26005436
Aziz, Ramy K.; Dwivedi, Bhakti; Akhter, Sajia; ...
2015-05-08
Phages are the most abundant biological entities on Earth and play major ecological roles, yet the current sequenced phage genomes do not adequately represent their diversity, and little is known about the abundance and distribution of these sequenced genomes in nature. Although the study of phage ecology has benefited tremendously from the emergence of metagenomic sequencing, a systematic survey of phage genes and genomes in various ecosystems is still lacking, and fundamental questions about phage biology, lifestyle, and ecology remain unanswered. To address these questions and improve comparative analysis of phages in different metagenomes, we screened a core set ofmore » publicly available metagenomic samples for sequences related to completely sequenced phages using the web tool, Phage Eco-Locator. We then adopted and deployed an array of mathematical and statistical metrics for a multidimensional estimation of the abundance and distribution of phage genes and genomes in various ecosystems. Experiments using those metrics individually showed their usefulness in emphasizing the pervasive, yet uneven, distribution of known phage sequences in environmental metagenomes. Using these metrics in combination allowed us to resolve phage genomes into clusters that correlated with their genotypes and taxonomic classes as well as their ecological properties. By adding this set of metrics to current metaviromic analysis pipelines, where they can provide insight regarding phage mosaicism, habitat specificity, and evolution.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Aziz, Ramy K.; Dwivedi, Bhakti; Akhter, Sajia
Phages are the most abundant biological entities on Earth and play major ecological roles, yet the current sequenced phage genomes do not adequately represent their diversity, and little is known about the abundance and distribution of these sequenced genomes in nature. Although the study of phage ecology has benefited tremendously from the emergence of metagenomic sequencing, a systematic survey of phage genes and genomes in various ecosystems is still lacking, and fundamental questions about phage biology, lifestyle, and ecology remain unanswered. To address these questions and improve comparative analysis of phages in different metagenomes, we screened a core set ofmore » publicly available metagenomic samples for sequences related to completely sequenced phages using the web tool, Phage Eco-Locator. We then adopted and deployed an array of mathematical and statistical metrics for a multidimensional estimation of the abundance and distribution of phage genes and genomes in various ecosystems. Experiments using those metrics individually showed their usefulness in emphasizing the pervasive, yet uneven, distribution of known phage sequences in environmental metagenomes. Using these metrics in combination allowed us to resolve phage genomes into clusters that correlated with their genotypes and taxonomic classes as well as their ecological properties. By adding this set of metrics to current metaviromic analysis pipelines, where they can provide insight regarding phage mosaicism, habitat specificity, and evolution.« less
Candidate mosaic proteins for a pan-filoviral cytotoxic T-Cell lymphocyte vaccine
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fenimore, Paul W; Fischer, William M; Kuiken, Carla
The extremely high fatality rates of many filovirus (FILV) strains the recurrent but rarely identified origin of human epidemics, the only partly identified viral reservoirs and the continuing non-human primate epizootics in Africa make a broadly-protective filovirus vaccine highly desirable. Cytotoxic T-cells (CTL) have been shown to be protective in mice, guinea pigs and non-human primates. In murine models the cytotoxic T-cell epitopes that are protective against Ebola virus have been mapped and in non-human primates CTL-mediated protection between viral strains (John Dye: specify) has been demonstrated using two filoviral proteins, nucleoprotein (NP) and glycoprotein (GP). These immunological results suggestmore » that the CTL avenue of immunity deserves consideration for a vaccine. The poorly-understood viral reservoirs means that it is difficult to predict what strains are likely to cause epidemics. Thus, there is a premium on developing a pan-filoviral vaccine. The genetic diversity of FILV is large, roughly the same scale as human immunodeficiency virus (HIV). This presents a serious challenge for the vaccine designer because a traditional vaccine aspiring to pan-filoviral coverage is likely to require the inclusion of many antigenic reagents. A recent method for optimizing cytotoxic T-cell lymphocyte epitope coverage with mosaic antigens was successful in improving potential CTL epitope coverage against HIV and may be useful in the context of very different viruses, such as the filoviruses discussed here. Mosaic proteins are recombinants composed of fragments of wild-type proteins joined at locations resulting in exclusively natural k-mers, 9 {le} k {le} 15, and having approximately the same length as the wild-type proteins. The use of mosaic antigens is motivated by three conjectures: (1) optimizing a mosaic protein to maximize coverage of k-mers found in a set of reference proteins will give better odds of including broadly-protective CTL epitopes in a vaccine than is possible with a wild-type protein, (2) reducing the number of low-prevalence k-mers minimizes the likelihood of undesirable immunodominance, and (3) excluding exogenous k-mers will result in mosaic proteins whose processing for presentation is close to what occurs with wild-type proteins. The first and second applications of the mosaic method were to HIV and Hepatitis C Virus (HCV). HIV is the virus with the largest number of known sequences, and consequently a plethora of information for the CTL vaccine designer to incorporate into their mosaics. Experience with HIV and HCV mosaics supports the validity of the three conjectures above. The available FILV sequences are probably closer to the minimum amount of information needed to make a meaningful mosaic vaccine candidate. There were 532 protein sequences in the National Institutes of Health GenPept database in November 2007 when our reference set was downloaded. These sequences come from both Ebola and Marburg viruses (EBOV and MARV), representing transcripts of all 7 genes. The coverage of viral diversity by the 7 genes is variable, with genes 1 (nucleoprotein, NP), 4 (glycoprotein, GP; soluble glycoprotein, sGP) and 7 (polymerase, L) giving the best coverage. Broadly-protective vaccine candidates for diverse viruses, such as HIV or Hepatitis C virus (HCV) have required pools of antigens. FILV is similar in this regard. While we have designed CTL mosaic proteins using all 7 types of filoviral proteins, only NP, GP and L proteins are reported here. If it were important to include other proteins in a mosaic CTL vaccine, additional sequences would be required to cover the space of known viral diversity.« less
Testing the molecular clock using mechanistic models of fossil preservation and molecular evolution.
Warnock, Rachel C M; Yang, Ziheng; Donoghue, Philip C J
2017-06-28
Molecular sequence data provide information about relative times only, and fossil-based age constraints are the ultimate source of information about absolute times in molecular clock dating analyses. Thus, fossil calibrations are critical to molecular clock dating, but competing methods are difficult to evaluate empirically because the true evolutionary time scale is never known. Here, we combine mechanistic models of fossil preservation and sequence evolution in simulations to evaluate different approaches to constructing fossil calibrations and their impact on Bayesian molecular clock dating, and the relative impact of fossil versus molecular sampling. We show that divergence time estimation is impacted by the model of fossil preservation, sampling intensity and tree shape. The addition of sequence data may improve molecular clock estimates, but accuracy and precision is dominated by the quality of the fossil calibrations. Posterior means and medians are poor representatives of true divergence times; posterior intervals provide a much more accurate estimate of divergence times, though they may be wide and often do not have high coverage probability. Our results highlight the importance of increased fossil sampling and improved statistical approaches to generating calibrations, which should incorporate the non-uniform nature of ecological and temporal fossil species distributions. © 2017 The Authors.
Coprolites as a source of information on the genome and diet of the cave hyena
Bon, Céline; Berthonaud, Véronique; Maksud, Frédéric; Labadie, Karine; Poulain, Julie; Artiguenave, François; Wincker, Patrick; Aury, Jean-Marc; Elalouf, Jean-Marc
2012-01-01
We performed high-throughput sequencing of DNA from fossilized faeces to evaluate this material as a source of information on the genome and diet of Pleistocene carnivores. We analysed coprolites derived from the extinct cave hyena (Crocuta crocuta spelaea), and sequenced 90 million DNA fragments from two specimens. The DNA reads enabled a reconstruction of the cave hyena mitochondrial genome with up to a 158-fold coverage. This genome, and those sequenced from extant spotted (Crocuta crocuta) and striped (Hyaena hyaena) hyena specimens, allows for the establishment of a robust phylogeny that supports a close relationship between the cave and the spotted hyena. We also demonstrate that high-throughput sequencing yields data for cave hyena multi-copy and single-copy nuclear genes, and that about 50 per cent of the coprolite DNA can be ascribed to this species. Analysing the data for additional species to indicate the cave hyena diet, we retrieved abundant sequences for the red deer (Cervus elaphus), and characterized its mitochondrial genome with up to a 3.8-fold coverage. In conclusion, we have demonstrated the presence of abundant ancient DNA in the coprolites surveyed. Shotgun sequencing of this material yielded a wealth of DNA sequences for a Pleistocene carnivore and allowed unbiased identification of diet. PMID:22456883
Sequence Analysis of the Genome of Carnation (Dianthus caryophyllus L.)
Yagi, Masafumi; Kosugi, Shunichi; Hirakawa, Hideki; Ohmiya, Akemi; Tanase, Koji; Harada, Taro; Kishimoto, Kyutaro; Nakayama, Masayoshi; Ichimura, Kazuo; Onozaki, Takashi; Yamaguchi, Hiroyasu; Sasaki, Nobuhiro; Miyahara, Taira; Nishizaki, Yuzo; Ozeki, Yoshihiro; Nakamura, Noriko; Suzuki, Takamasa; Tanaka, Yoshikazu; Sato, Shusei; Shirasawa, Kenta; Isobe, Sachiko; Miyamura, Yoshinori; Watanabe, Akiko; Nakayama, Shinobu; Kishida, Yoshie; Kohara, Mitsuyo; Tabata, Satoshi
2014-01-01
The whole-genome sequence of carnation (Dianthus caryophyllus L.) cv. ‘Francesco’ was determined using a combination of different new-generation multiplex sequencing platforms. The total length of the non-redundant sequences was 568 887 315 bp, consisting of 45 088 scaffolds, which covered 91% of the 622 Mb carnation genome estimated by k-mer analysis. The N50 values of contigs and scaffolds were 16 644 bp and 60 737 bp, respectively, and the longest scaffold was 1 287 144 bp. The average GC content of the contig sequences was 36%. A total of 1050, 13, 92 and 143 genes for tRNAs, rRNAs, snoRNA and miRNA, respectively, were identified in the assembled genomic sequences. For protein-encoding genes, 43 266 complete and partial gene structures excluding those in transposable elements were deduced. Gene coverage was ∼98%, as deduced from the coverage of the core eukaryotic genes. Intensive characterization of the assigned carnation genes and comparison with those of other plant species revealed characteristic features of the carnation genome. The results of this study will serve as a valuable resource for fundamental and applied research of carnation, especially for breeding new carnation varieties. Further information on the genomic sequences is available at http://carnation.kazusa.or.jp. PMID:24344172
Scaglione, Davide; Reyes-Chin-Wo, Sebastian; Acquadro, Alberto; Froenicke, Lutz; Portis, Ezio; Beitel, Christopher; Tirone, Matteo; Mauro, Rosario; Lo Monaco, Antonino; Mauromicale, Giovanni; Faccioli, Primetta; Cattivelli, Luigi; Rieseberg, Loren; Michelmore, Richard; Lanteri, Sergio
2016-01-01
Globe artichoke (Cynara cardunculus var. scolymus) is an out-crossing, perennial, multi-use crop species that is grown worldwide and belongs to the Compositae, one of the most successful Angiosperm families. We describe the first genome sequence of globe artichoke. The assembly, comprising of 13,588 scaffolds covering 725 of the 1,084 Mb genome, was generated using ~133-fold Illumina sequencing data and encodes 26,889 predicted genes. Re-sequencing (30×) of globe artichoke and cultivated cardoon (C. cardunculus var. altilis) parental genotypes and low-coverage (0.5 to 1×) genotyping-by-sequencing of 163 F1 individuals resulted in 73% of the assembled genome being anchored in 2,178 genetic bins ordered along 17 chromosomal pseudomolecules. This was achieved using a novel pipeline, SOILoCo (Scaffold Ordering by Imputation with Low Coverage), to detect heterozygous regions and assign parental haplotypes with low sequencing read depth and of unknown phase. SOILoCo provides a powerful tool for de novo genome analysis of outcrossing species. Our data will enable genome-scale analyses of evolutionary processes among crops, weeds, and wild species within and beyond the Compositae, and will facilitate the identification of economically important genes from related species. PMID:26786968
A Pan-HIV Strategy for Complete Genome Sequencing
Yamaguchi, Julie; Alessandri-Gradt, Elodie; Tell, Robert W.; Brennan, Catherine A.
2015-01-01
Molecular surveillance is essential to monitor HIV diversity and track emerging strains. We have developed a universal library preparation method (HIV-SMART [i.e., switching mechanism at 5′ end of RNA transcript]) for next-generation sequencing that harnesses the specificity of HIV-directed priming to enable full genome characterization of all HIV-1 groups (M, N, O, and P) and HIV-2. Broad application of the HIV-SMART approach was demonstrated using a panel of diverse cell-cultured virus isolates. HIV-1 non-subtype B-infected clinical specimens from Cameroon were then used to optimize the protocol to sequence directly from plasma. When multiplexing 8 or more libraries per MiSeq run, full genome coverage at a median ∼2,000× depth was routinely obtained for either sample type. The method reproducibly generated the same consensus sequence, consistently identified viral sequence heterogeneity present in specimens, and at viral loads of ≤4.5 log copies/ml yielded sufficient coverage to permit strain classification. HIV-SMART provides an unparalleled opportunity to identify diverse HIV strains in patient specimens and to determine phylogenetic classification based on the entire viral genome. Easily adapted to sequence any RNA virus, this technology illustrates the utility of next-generation sequencing (NGS) for viral characterization and surveillance. PMID:26699702
Clinical next-generation sequencing in patients with non-small cell lung cancer.
Hagemann, Ian S; Devarakonda, Siddhartha; Lockwood, Christina M; Spencer, David H; Guebert, Kalin; Bredemeyer, Andrew J; Al-Kateb, Hussam; Nguyen, TuDung T; Duncavage, Eric J; Cottrell, Catherine E; Kulkarni, Shashikant; Nagarajan, Rakesh; Seibert, Karen; Baggstrom, Maria; Waqar, Saiama N; Pfeifer, John D; Morgensztern, Daniel; Govindan, Ramaswamy
2015-02-15
A clinical assay was implemented to perform next-generation sequencing (NGS) of genes commonly mutated in multiple cancer types. This report describes the feasibility and diagnostic yield of this assay in 381 consecutive patients with non-small cell lung cancer (NSCLC). Clinical targeted sequencing of 23 genes was performed with DNA from formalin-fixed, paraffin-embedded (FFPE) tumor tissue. The assay used Agilent SureSelect hybrid capture followed by Illumina HiSeq 2000, MiSeq, or HiSeq 2500 sequencing in a College of American Pathologists-accredited, Clinical Laboratory Improvement Amendments-certified laboratory. Single-nucleotide variants and insertion/deletion events were reported. This assay was performed before methods were developed to detect rearrangements by NGS. Two hundred nine of all requisitioned samples (55%) were successfully sequenced. The most common reason for not performing the sequencing was an insufficient quantity of tissue available in the blocks (29%). Excisional, endoscopic, and core biopsy specimens were sufficient for testing in 95%, 66%, and 40% of the cases, respectively. The median turnaround time (TAT) in the pathology laboratory was 21 days, and there was a trend of an improved TAT with more rapid sequencing platforms. Sequencing yielded a mean coverage of 1318×. Potentially actionable mutations (ie, predictive or prognostic) were identified in 46% of 209 samples and were most commonly found in KRAS (28%), epidermal growth factor receptor (14%), phosphatidylinositol-4,5-bisphosphate 3-kinase catalytic subunit alpha (4%), phosphatase and tensin homolog (1%), and BRAF (1%). Five percent of the samples had multiple actionable mutations. A targeted therapy was instituted on the basis of NGS in 11% of the sequenced patients or in 6% of all patients. NGS-based diagnostics are feasible in NSCLC and provide clinically relevant information from readily available FFPE tissue. The sample type is associated with the probability of successful testing. © 2014 American Cancer Society.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhu, Ying; Zhao, Rui; Piehowski, Paul D.
One of the greatest challenges for mass spectrometry (MS)-based proteomics is the limited ability to analyze small samples. Here we investigate the relative contributions of liquid chromatography (LC), MS instrumentation and data analysis methods with the aim of improving proteome coverage for sample sizes ranging from 0.5 ng to 50 ng. We show that the LC separations utilizing 30-µm-i.d. columns increase signal intensity by >3-fold relative to those using 75-µm-i.d. columns, leading to 32% increase in peptide identifications. The Orbitrap Fusion Lumos mass spectrometer significantly boosted both sensitivity and sequencing speed relative to earlier generation Orbitraps (e.g., LTQ-Orbitrap), leading tomore » a ~3× increase in peptide identifications and 1.7× increase in identified protein groups for 2 ng tryptic digests of bacterial lysate. The Match Between Runs algorithm of open-source MaxQuant software further increased proteome coverage by ~ 95% for 0.5 ng samples and by ~42% for 2 ng samples. The present platform is capable of identifying >3000 protein groups from tryptic digestion of cell lysates equivalent to 50 HeLa cells and 100 THP-1 cells (~10 ng total proteins), respectively, and >950 proteins from subnanogram bacterial and archaeal cell lysates. The present ultrasensitive LC-MS platform is expected to enable deep proteome coverage for subnanogram samples, including single mammalian cells.« less
Enhancing genome assemblies by integrating non-sequence based data
2011-01-01
Introduction Many genome projects were underway before the advent of high-throughput sequencing and have thus been supported by a wealth of genome information from other technologies. Such information frequently takes the form of linkage and physical maps, both of which can provide a substantial amount of data useful in de novo sequencing projects. Furthermore, the recent abundance of genome resources enables the use of conserved synteny maps identified in related species to further enhance genome assemblies. Methods The tammar wallaby (Macropus eugenii) is a model marsupial mammal with a low coverage genome. However, we have access to extensive comparative maps containing over 14,000 markers constructed through the physical mapping of conserved loci, chromosome painting and comprehensive linkage maps. Using a custom Bioperl pipeline, information from the maps was aligned to assembled tammar wallaby contigs using BLAT. This data was used to construct pseudo paired-end libraries with intervals ranging from 5-10 MB. We then used Bambus (a program designed to scaffold eukaryotic genomes by ordering and orienting contigs through the use of paired-end data) to scaffold our libraries. To determine how map data compares to sequence based approaches to enhance assemblies, we repeated the experiment using a 0.5× coverage of unique reads from 4 KB and 8 KB Illumina paired-end libraries. Finally, we combined both the sequence and non-sequence-based data to determine how a combined approach could further enhance the quality of the low coverage de novo reconstruction of the tammar wallaby genome. Results Using the map data alone, we were able order 2.2% of the initial contigs into scaffolds, and increase the N50 scaffold size to 39 KB (36 KB in the original assembly). Using only the 0.5× paired-end sequence based data, 53% of the initial contigs were assigned to scaffolds. Combining both data sets resulted in a further 2% increase in the number of initial contigs integrated into a scaffold (55% total) but a 35% increase in N50 scaffold size over the use of sequence-based data alone. Conclusions We provide a relatively simple pipeline utilizing existing bioinformatics tools to integrate map data into a genome assembly which is available at http://www.mcb.uconn.edu/fac.php?name=paska. While the map data only contributed minimally to assigning the initial contigs to scaffolds in the new assembly, it greatly increased the N50 size. This process added structure to our low coverage assembly, greatly increasing its utility in further analyses. PMID:21554765
Enhancing genome assemblies by integrating non-sequence based data.
Heider, Thomas N; Lindsay, James; Wang, Chenwei; O'Neill, Rachel J; Pask, Andrew J
2011-05-28
Many genome projects were underway before the advent of high-throughput sequencing and have thus been supported by a wealth of genome information from other technologies. Such information frequently takes the form of linkage and physical maps, both of which can provide a substantial amount of data useful in de novo sequencing projects. Furthermore, the recent abundance of genome resources enables the use of conserved synteny maps identified in related species to further enhance genome assemblies. The tammar wallaby (Macropus eugenii) is a model marsupial mammal with a low coverage genome. However, we have access to extensive comparative maps containing over 14,000 markers constructed through the physical mapping of conserved loci, chromosome painting and comprehensive linkage maps. Using a custom Bioperl pipeline, information from the maps was aligned to assembled tammar wallaby contigs using BLAT. This data was used to construct pseudo paired-end libraries with intervals ranging from 5-10 MB. We then used Bambus (a program designed to scaffold eukaryotic genomes by ordering and orienting contigs through the use of paired-end data) to scaffold our libraries. To determine how map data compares to sequence based approaches to enhance assemblies, we repeated the experiment using a 0.5× coverage of unique reads from 4 KB and 8 KB Illumina paired-end libraries. Finally, we combined both the sequence and non-sequence-based data to determine how a combined approach could further enhance the quality of the low coverage de novo reconstruction of the tammar wallaby genome. Using the map data alone, we were able order 2.2% of the initial contigs into scaffolds, and increase the N50 scaffold size to 39 KB (36 KB in the original assembly). Using only the 0.5× paired-end sequence based data, 53% of the initial contigs were assigned to scaffolds. Combining both data sets resulted in a further 2% increase in the number of initial contigs integrated into a scaffold (55% total) but a 35% increase in N50 scaffold size over the use of sequence-based data alone. We provide a relatively simple pipeline utilizing existing bioinformatics tools to integrate map data into a genome assembly which is available at http://www.mcb.uconn.edu/fac.php?name=paska. While the map data only contributed minimally to assigning the initial contigs to scaffolds in the new assembly, it greatly increased the N50 size. This process added structure to our low coverage assembly, greatly increasing its utility in further analyses.
Report on noninvasive prenatal testing: classical and alternative approaches.
Pantiukh, Kateryna S; Chekanov, Nikolay N; Zaigrin, Igor V; Zotov, Alexei M; Mazur, Alexander M; Prokhortchouk, Egor B
2016-01-01
Concerns of traditional prenatal aneuploidy testing methods, such as low accuracy of noninvasive and health risks associated with invasive procedures, were overcome with the introduction of novel noninvasive methods based on genetics (NIPT). These were rapidly adopted into clinical practice in many countries after a series of successful trials of various independent submethods. Here we present results of own NIPT trial carried out in Moscow, Russia. 1012 samples were subjected to the method aimed at measuring chromosome coverage by massive parallel sequencing. Two alternative approaches are ascertained: one based on maternal/fetal differential methylation and another based on allelic difference. While the former failed to provide stable results, the latter was found to be promising and worthy of conducting a large-scale trial. One critical point in any NIPT approach is the determination of fetal cell-free DNA fraction, which dictates the reliability of obtained results for a given sample. We show that two different chromosome Y representation measures-by real-time PCR and by whole-genome massive parallel sequencing-are practically interchangeable (r=0.94). We also propose a novel method based on maternal/fetal allelic difference which is applicable in pregnancies with fetuses of either sex. Even in its pilot form it correlates well with chromosome Y coverage estimates (r=0.74) and can be further improved by increasing the number of polymorphisms.
Han, Koeun; Jeong, Hee-Jin; Yang, Hee-Bum; Kang, Sung-Min; Kwon, Jin-Kyung; Kim, Seungill; Choi, Doil; Kang, Byoung-Cheorl
2016-04-01
Most agricultural traits are controlled by quantitative trait loci (QTLs); however, there are few studies on QTL mapping of horticultural traits in pepper (Capsicum spp.) due to the lack of high-density molecular maps and the sequence information. In this study, an ultra-high-density map and 120 recombinant inbred lines (RILs) derived from a cross between C. annuum'Perennial' and C. annuum'Dempsey' were used for QTL mapping of horticultural traits. Parental lines and RILs were resequenced at 18× and 1× coverage, respectively. Using a sliding window approach, an ultra-high-density bin map containing 2,578 bins was constructed. The total map length of the map was 1,372 cM, and the average interval between bins was 0.53 cM. A total of 86 significant QTLs controlling 17 horticultural traits were detected. Among these, 32 QTLs controlling 13 traits were major QTLs. Our research shows that the construction of bin maps using low-coverage sequence is a powerful method for QTL mapping, and that the short intervals between bins are helpful for fine-mapping of QTLs. Furthermore, bin maps can be used to improve the quality of reference genomes by elucidating the genetic order of unordered regions and anchoring unassigned scaffolds to linkage groups. © The Author 2016. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
Iehisa, Julio Cesar Masaru; Ohno, Ryoko; Kimura, Tatsuro; Enoki, Hiroyuki; Nishimura, Satoru; Okamoto, Yuki; Nasuda, Shuhei; Takumi, Shigeo
2014-01-01
The large genome and allohexaploidy of common wheat have complicated construction of a high-density genetic map. Although improvements in the throughput of next-generation sequencing (NGS) technologies have made it possible to obtain a large amount of genotyping data for an entire mapping population by direct sequencing, including hexaploid wheat, a significant number of missing data points are often apparent due to the low coverage of sequencing. In the present study, a microarray-based polymorphism detection system was developed using NGS data obtained from complexity-reduced genomic DNA of two common wheat cultivars, Chinese Spring (CS) and Mironovskaya 808. After design and selection of polymorphic probes, 13,056 new markers were added to the linkage map of a recombinant inbred mapping population between CS and Mironovskaya 808. On average, 2.49 missing data points per marker were observed in the 201 recombinant inbred lines, with a maximum of 42. Around 40% of the new markers were derived from genic regions and 11% from repetitive regions. The low number of retroelements indicated that the new polymorphic markers were mainly derived from the less repetitive region of the wheat genome. Around 25% of the mapped sequences were useful for alignment with the physical map of barley. Quantitative trait locus (QTL) analyses of 14 agronomically important traits related to flowering, spikes, and seeds demonstrated that the new high-density map showed improved QTL detection, resolution, and accuracy over the original simple sequence repeat map. PMID:24972598
Iehisa, Julio Cesar Masaru; Ohno, Ryoko; Kimura, Tatsuro; Enoki, Hiroyuki; Nishimura, Satoru; Okamoto, Yuki; Nasuda, Shuhei; Takumi, Shigeo
2014-10-01
The large genome and allohexaploidy of common wheat have complicated construction of a high-density genetic map. Although improvements in the throughput of next-generation sequencing (NGS) technologies have made it possible to obtain a large amount of genotyping data for an entire mapping population by direct sequencing, including hexaploid wheat, a significant number of missing data points are often apparent due to the low coverage of sequencing. In the present study, a microarray-based polymorphism detection system was developed using NGS data obtained from complexity-reduced genomic DNA of two common wheat cultivars, Chinese Spring (CS) and Mironovskaya 808. After design and selection of polymorphic probes, 13,056 new markers were added to the linkage map of a recombinant inbred mapping population between CS and Mironovskaya 808. On average, 2.49 missing data points per marker were observed in the 201 recombinant inbred lines, with a maximum of 42. Around 40% of the new markers were derived from genic regions and 11% from repetitive regions. The low number of retroelements indicated that the new polymorphic markers were mainly derived from the less repetitive region of the wheat genome. Around 25% of the mapped sequences were useful for alignment with the physical map of barley. Quantitative trait locus (QTL) analyses of 14 agronomically important traits related to flowering, spikes, and seeds demonstrated that the new high-density map showed improved QTL detection, resolution, and accuracy over the original simple sequence repeat map. © The Author 2014. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
NASA Astrophysics Data System (ADS)
Abate, A.; Pressello, M. C.; Benassi, M.; Strigari, L.
2009-12-01
The aim of this study was to evaluate the effectiveness and efficiency in inverse IMRT planning of one-step optimization with the step-and-shoot (SS) technique as compared to traditional two-step optimization using the sliding windows (SW) technique. The Pinnacle IMRT TPS allows both one-step and two-step approaches. The same beam setup for five head-and-neck tumor patients and dose-volume constraints were applied for all optimization methods. Two-step plans were produced converting the ideal fluence with or without a smoothing filter into the SW sequence. One-step plans, based on direct machine parameter optimization (DMPO), had the maximum number of segments per beam set at 8, 10, 12, producing a directly deliverable sequence. Moreover, the plans were generated whether a split-beam was used or not. Total monitor units (MUs), overall treatment time, cost function and dose-volume histograms (DVHs) were estimated for each plan. PTV conformality and homogeneity indexes and normal tissue complication probability (NTCP) that are the basis for improving therapeutic gain, as well as non-tumor integral dose (NTID), were evaluated. A two-sided t-test was used to compare quantitative variables. All plans showed similar target coverage. Compared to two-step SW optimization, the DMPO-SS plans resulted in lower MUs (20%), NTID (4%) as well as NTCP values. Differences of about 15-20% in the treatment delivery time were registered. DMPO generates less complex plans with identical PTV coverage, providing lower NTCP and NTID, which is expected to reduce the risk of secondary cancer. It is an effective and efficient method and, if available, it should be favored over the two-step IMRT planning.
Fan, Baoli; Zhang, Aiping; Yang, Yi; Ma, Quanlin; Li, Xuemin; Zhao, Changming
2016-01-01
The xerophytic desert shrub Haloxylon ammodendron (C. A. Mey.) Bunge. is distributed naturally in Asian and African deserts, and is widely used for vegetation restoration in the desert regions of Northern China. However, there are limited long-term chrono-sequence studies on the impact of changed soil properties and vegetation dynamics following establishment of this shrub on mobile sand dunes. In Minqin County, Gansu Province, we investigated soil properties and herbaceous vegetation development of 10, 20, 30, 40, 50-year-old H. ammodendron plantations on mobile sand dunes. Soil sampling at two depths (0–5 and 5–20 cm) under the shrubs determined SOC, nutrition and soil physical characteristics. The results showed that: establishment of H. ammodendron had improved soil physio-chemical properties, increased thickness of soil crusts and coverage of biological soil crusts (BSCs), and promoted development of topsoil over an extended period of 5 decades. Soil texture and soil nutrition improved along the chrono-sequence according to three distinct phases: i) an initial fast development from 0 to 10 years, ii) a stabilizing phase from 10 to 30 years followed by iii) a relatively marked restoration development in 40 and 50-year-old plantations. Meanwhile, herbaceous community coverage also markedly increased in 30-year-old plantations. However, both soil and vegetation restoration were very slow due to low annual precipitation in Minqin county compared to other Northern China sand afforestation sites. Canonical Correspondence Analysis results demonstrated that herbaceous plant development was closely associated with changes in soil texture (increased clay and silt percentage) and availability of soil nutrients. Thus our results indicated that selection of the long-lived shrub H. ammodendron is an essential and effective tool in arid desert re-vegetation. PMID:27992458
Martino, Amanda J.; Rhodes, Matthew E.; Biddle, Jennifer F.; Brandt, Leah D.; Tomsho, Lynn P.; House, Christopher H.
2011-01-01
A degenerate polymerase chain reaction (PCR)-based method of whole-genome amplification, designed to work fluidly with 454 sequencing technology, was developed and tested for use on deep marine subsurface DNA samples. While optimized here for use with Roche 454 technology, the general framework presented may be applicable to other next generation sequencing systems as well (e.g., Illumina, Ion Torrent). The method, which we have called random amplification metagenomic PCR (RAMP), involves the use of specific primers from Roche 454 amplicon sequencing, modified by the addition of a degenerate region at the 3′ end. It utilizes a PCR reaction, which resulted in no amplification from blanks, even after 50 cycles of PCR. After efforts to optimize experimental conditions, the method was tested with DNA extracted from cultured E. coli cells, and genome coverage was estimated after sequencing on three different occasions. Coverage did not vary greatly with the different experimental conditions tested, and was around 62% with a sequencing effort equivalent to a theoretical genome coverage of 14.10×. The GC content of the sequenced amplification product was within 2% of the predicted values for this strain of E. coli. The method was also applied to DNA extracted from marine subsurface samples from ODP Leg 201 site 1229 (Peru Margin), and results of a taxonomic analysis revealed microbial communities dominated by Proteobacteria, Chloroflexi, Firmicutes, Euryarchaeota, and Crenarchaeota, among others. These results were similar to those obtained previously for those samples; however, variations in the proportions of taxa identified illustrates well the generally accepted view that community analysis is sensitive to both the amplification technique used and the method of assigning sequences to taxonomic groups. Overall, we find that RAMP represents a valid methodology for amplifying metagenomes from low-biomass samples. PMID:22319519
Lin, Jing; Pramono, Zacharias Aloysius Dwi; Maurer-Stroh, Sebastian
2016-01-01
The multiple circulating human influenza A virus subtypes coupled with the perpetual genomic mutations and segment reassortment events challenge the development of effective therapeutics. The capacity to drug most RNAs motivates the investigation on viral RNA targets. 123,060 segment sequences from 35,938 strains of the most prevalent subtypes also infecting humans–H1N1, 2009 pandemic H1N1, H3N2, H5N1 and H7N9, were used to identify 1,183 conserved RNA target sequences (≥15-mer) in the internal segments. 100% theoretical coverage in simultaneous heterosubtypic targeting is achieved by pairing specific sequences from the same segment (“Duals”) or from two segments (“Doubles”); 1,662 Duals and 28,463 Doubles identified. By combining specific Duals and/or Doubles to form a target graph wherein an edge connecting two vertices (target sequences) represents a Dual or Double, it is possible to hedge against antiviral resistance besides maintaining 100% heterosubtypic coverage. To evaluate the hedging potential, we define the hedge-factor as the minimum number of resistant target sequences that will render the graph to become resistant i.e. eliminate all the edges therein; a target sequence or a graph is considered resistant when it cannot achieve 100% heterosubtypic coverage. In an n-vertices graph (n ≥ 3), the hedge-factor is maximal (= n– 1) when it is a complete graph i.e. every distinct pair in a graph is either a Dual or Double. Computational analyses uncover an extensive number of complete graphs of different sizes. Monte Carlo simulations show that the mutation counts and time elapsed for a target graph to become resistant increase with the hedge-factor. Incidentally, target sequences which were reported to reduce virus titre in experiments are included in our target graphs. The identity of target sequence pairs for heterosubtypic targeting and their combinations for hedging antiviral resistance are useful toolkits to construct target graphs for different therapeutic objectives. PMID:26771381
Nguyen, Nam-Ninh; Srihari, Sriganesh; Leong, Hon Wai; Chong, Ket-Fah
2015-10-01
Determining the entire complement of enzymes and their enzymatic functions is a fundamental step for reconstructing the metabolic network of cells. High quality enzyme annotation helps in enhancing metabolic networks reconstructed from the genome, especially by reducing gaps and increasing the enzyme coverage. Currently, structure-based and network-based approaches can only cover a limited number of enzyme families, and the accuracy of homology-based approaches can be further improved. Bottom-up homology-based approach improves the coverage by rebuilding Hidden Markov Model (HMM) profiles for all known enzymes. However, its clustering procedure relies firmly on BLAST similarity score, ignoring protein domains/patterns, and is sensitive to changes in cut-off thresholds. Here, we use functional domain architecture to score the association between domain families and enzyme families (Domain-Enzyme Association Scoring, DEAS). The DEAS score is used to calculate the similarity between proteins, which is then used in clustering procedure, instead of using sequence similarity score. We improve the enzyme annotation protocol using a stringent classification procedure, and by choosing optimal threshold settings and checking for active sites. Our analysis shows that our stringent protocol EnzDP can cover up to 90% of enzyme families available in Swiss-Prot. It achieves a high accuracy of 94.5% based on five-fold cross-validation. EnzDP outperforms existing methods across several testing scenarios. Thus, EnzDP serves as a reliable automated tool for enzyme annotation and metabolic network reconstruction. Available at: www.comp.nus.edu.sg/~nguyennn/EnzDP .
Ahmed, Shakil; Annear, Peter Leslie; Phonvisay, Bouaphat; Phommavong, Chansaly; Cruz, Valeria de Oliveira; Hammerich, Asmus; Jacobs, Bart
2013-11-01
There is now widespread acceptance of the universal coverage approach, presented in the 2010 World Health Report. There are more and more voices for the benefit of creating a single national risk pool. Now, a body of literature is emerging on institutional design and organizational practice for universal coverage, related to management of the three health-financing functions: collection, pooling and purchasing. While all countries can move towards universal coverage, lower-income countries face particular challenges, including scarce resources and limited capacity. Recently, the Lao PDR has been preparing options for moving to a single national health insurance scheme. The aim is to combine four different social health protection schemes into a national health insurance authority (NHIA) with a single national fund- and risk-pool. This paper investigates the main institutional and organizational challenges related to the creation of the NHIA. The paper uses a qualitative approach, drawing on the World Health Organization's institutional and Organizational Assessment for Improving and Strengthening health financing (OASIS) conceptual framework for data analysis. Data were collected from a review of key health financing policy documents and from 17 semi-structured key informant interviews. Policy makers and advisors are confronting issues related to institutional arrangements, funding sources for the authority and government support for subsidies to the demand-side health financing schemes. Compulsory membership is proposed, but the means for covering the informal sector have not been resolved. While unification of existing schemes may be the basis for creating a single risk pool, challenges related to administrative capacity and cross-subsidies remain. The example of Lao PDR illustrates the need to include consideration of national context, the sequencing of reforms and the time-scale appropriate for achieving universal coverage. Copyright © 2013 Elsevier Ltd. All rights reserved.
Bokulich, Nicholas A.
2013-01-01
Ultra-high-throughput sequencing (HTS) of fungal communities has been restricted by short read lengths and primer amplification bias, slowing the adoption of newer sequencing technologies to fungal community profiling. To address these issues, we evaluated the performance of several common internal transcribed spacer (ITS) primers and designed a novel primer set and work flow for simultaneous quantification and species-level interrogation of fungal consortia. Primer comparison and validation were predicted in silico and by sequencing a “mock community” of mixed yeast species to explore the challenges of amplicon length and amplification bias for reconstructing defined yeast community structures. The amplicon size and distribution of this primer set are smaller than for all preexisting ITS primer sets, maximizing sequencing coverage of hypervariable ITS domains by very-short-amplicon, high-throughput sequencing platforms. This feature also enables the optional integration of quantitative PCR (qPCR) directly into the HTS preparatory work flow by substituting qPCR with these primers for standard PCR, yielding quantification of individual community members. The complete work flow described here, utilizing any of the qualified primer sets evaluated, can rapidly profile mixed fungal communities and capably reconstructed well-characterized beer and wine fermentation fungal communities. PMID:23377949
Construction of Red Fox Chromosomal Fragments from the Short-Read Genome Assembly.
Rando, Halie M; Farré, Marta; Robson, Michael P; Won, Naomi B; Johnson, Jennifer L; Buch, Ronak; Bastounes, Estelle R; Xiang, Xueyan; Feng, Shaohong; Liu, Shiping; Xiong, Zijun; Kim, Jaebum; Zhang, Guojie; Trut, Lyudmila N; Larkin, Denis M; Kukekova, Anna V
2018-06-20
The genome of a red fox ( Vulpes vulpes ) was recently sequenced and assembled using next-generation sequencing (NGS). The assembly is of high quality, with 94X coverage and a scaffold N50 of 11.8 Mbp, but is split into 676,878 scaffolds, some of which are likely to contain assembly errors. Fragmentation and misassembly hinder accurate gene prediction and downstream analysis such as the identification of loci under selection. Therefore, assembly of the genome into chromosome-scale fragments was an important step towards developing this genomic model. Scaffolds from the assembly were aligned to the dog reference genome and compared to the alignment of an outgroup genome (cat) against the dog to identify syntenic sequences among species. The program Reference-Assisted Chromosome Assembly (RACA) then integrated the comparative alignment with the mapping of the raw sequencing reads generated during assembly against the fox scaffolds. The 128 sequence fragments RACA assembled were compared to the fox meiotic linkage map to guide the construction of 40 chromosomal fragments. This computational approach to assembly was facilitated by prior research in comparative mammalian genomics, and the continued improvement of the red fox genome can in turn offer insight into canid and carnivore chromosome evolution. This assembly is also necessary for advancing genetic research in foxes and other canids.
The need for an assembly pilot project
USDA-ARS?s Scientific Manuscript database
Progress has been rapid since the June 2008 start of the cacao genome sequencing project with the completion of the physical map and the accumulation of approximately 10x coverage of the genome with Titanium 454 sequence data of Matina1-6, the highly homozygous Amelonado tree chosen for the project....
Fast imputation using medium- or low-coverage sequence data
USDA-ARS?s Scientific Manuscript database
Direct imputation from raw sequence reads can be more accurate than calling genotypes first and then imputing, especially if read depth is low or error rates high, but different imputation strategies are required than those used for data from genotyping chips. A fast algorithm to impute from lower t...
Transcriptome assembly, gene annotation and tissue gene expression atlas of the rainbow trout
USDA-ARS?s Scientific Manuscript database
Efforts to obtain a comprehensive genome sequence for rainbow trout are ongoing and will be complimented by transcriptome information that will enhance genome assembly and annotation. Previously, we reported a transcriptome reference sequence using a 19X coverage of Sanger and 454-pyrosequencing dat...
Cohen, Paul A; Flowers, Nicola; Tong, Stephen; Hannan, Natalie; Pertile, Mark D; Hui, Lisa
2016-08-24
Non-invasive prenatal testing (NIPT) identifies fetal aneuploidy by sequencing cell-free DNA in the maternal plasma. Pre-symptomatic maternal malignancies have been incidentally detected during NIPT based on abnormal genomic profiles. This low coverage sequencing approach could have potential for ovarian cancer screening in the non-pregnant population. Our objective was to investigate whether plasma DNA sequencing with a clinical whole genome NIPT platform can detect early- and late-stage high-grade serous ovarian carcinomas (HGSOC). This is a case control study of prospectively-collected biobank samples comprising preoperative plasma from 32 women with HGSOC (16 'early cancer' (FIGO I-II) and 16 'advanced cancer' (FIGO III-IV)) and 32 benign controls. Plasma DNA from cases and controls were sequenced using a commercial NIPT platform and chromosome dosage measured. Sequencing data were blindly analyzed with two methods: (1) Subchromosomal changes were called using an open source algorithm WISECONDOR (WIthin-SamplE COpy Number aberration DetectOR). Genomic gains or losses ≥ 15 Mb were prespecified as "screen positive" calls, and mapped to recurrent copy number variations reported in an ovarian cancer genome atlas. (2) Selected whole chromosome gains or losses were reported using the routine NIPT pipeline for fetal aneuploidy. We detected 13/32 cancer cases using the subchromosomal analysis (sensitivity 40.6 %, 95 % CI, 23.7-59.4 %), including 6/16 early and 7/16 advanced HGSOC cases. Two of 32 benign controls had subchromosomal gains ≥ 15 Mb (specificity 93.8 %, 95 % CI, 79.2-99.2 %). Twelve of the 13 true positive cancer cases exhibited specific recurrent changes reported in HGSOC tumors. The NIPT pipeline resulted in one "monosomy 18" call from the cancer group, and two "monosomy X" calls in the controls. Low coverage plasma DNA sequencing used for prenatal testing detected 40.6 % of all HGSOC, including 38 % of early stage cases. Our findings demonstrate the potential of a high throughput sequencing platform to screen for early HGSOC in plasma based on characteristic multiple segmental chromosome gains and losses. The performance of this approach may be further improved by refining bioinformatics algorithms and targeting selected cancer copy number variations.
Gardner, Elliot M.; Johnson, Matthew G.; Ragone, Diane; Wickett, Norman J.; Zerega, Nyree J. C.
2016-01-01
Premise of the study: We used moderately low-coverage (17×) whole-genome sequencing of Artocarpus camansi (Moraceae) to develop genomic resources for Artocarpus and Moraceae. Methods and Results: A de novo assembly of Illumina short reads (251,378,536 pairs, 2 × 100 bp) accounted for 93% of the predicted genome size. Predicted coding regions were used in a three-way orthology search with published genomes of Morus notabilis and Cannabis sativa. Phylogenetic markers for Moraceae were developed from 333 inferred single-copy exons. Ninety-eight putative MADS-box genes were identified. Analysis of all predicted coding regions resulted in preliminary annotation of 49,089 genes. An analysis of synonymous substitutions for pairs of orthologs (Ks analysis) in M. notabilis and A. camansi strongly suggested a lineage-specific whole-genome duplication in Artocarpus. Conclusions: This study substantially increases the genomic resources available for Artocarpus and Moraceae and demonstrates the value of low-coverage de novo assemblies for nonmodel organisms with moderately large genomes. PMID:27437173
Moniruzzaman; Saha, Palash Chandra; Habib, Md Monjurul
2015-01-01
The Community Based Rehabilitation (CBR) is a common approach to work with disable people to improve their quality of life by improving the level of productivity and integrating them into society. But the effectiveness of CBR varies by country to country. The aim of the study was to find out whether CBR programs really improved the level of productivity among persons with physical disabilities. A cross-sectional study was conducted among equal number of respondents (n=51) from each CBR coverage and non-coverage areas from two different upazilla (sub-districts) located 40 km away from the capital city of Bangladesh. Respondents were selected purposively and data were collected by face to face interviews. Willer's (1994) version of the Community Integration Questionnaire (CIQ) was used to measure the level of productivity among adult with physical disabilities. The mean score of total productivity integration in CBR coverage and non-coverage areas were 4.3 ± 2.4 and 4.5 ± 2.2 respectively. This difference was statistically non-significant (p=0.602).The levels of productivity integration between CBR coverage and non-coverage areas varied only 2-4% (p=0.793). The mean score of productivity integration and levels of productivity were not different significantly in CBR coverage and non-coverage areas.
Nordenstedt, Noora; Marcenaro, Delfia; Chilagane, Daudi; Mwaipopo, Beatrice; Rajamäki, Minna-Liisa; Nchimbi-Msolla, Susan; Njau, Paul J R; Mbanzibwa, Deusdedith R; Valkonen, Jari P T
2017-01-01
Common bean (Phaseolus vulgaris) is an annual grain legume that was domesticated in Mesoamerica (Central America) and the Andes. It is currently grown widely also on other continents including Africa. We surveyed seedborne viruses in new common bean varieties introduced to Nicaragua (Central America) and in landraces and improved varieties grown in Tanzania (eastern Africa). Bean seeds, harvested from Nicaragua and Tanzania, were grown in insect-controlled greenhouse or screenhouse, respectively, to obtain leaf material for virus testing. Equal amounts of total RNA from different samples were pooled (30-36 samples per pool), and small RNAs were deep-sequenced (Illumina). Assembly of the reads (21-24 nt) to contiguous sequences and searches for homologous viral sequences in databases revealed Phaseolus vulgaris endornavirus 1 (PvEV-1) and PvEV-2 in the bean varieties in Nicaragua and Tanzania. These viruses are not known to cause symptoms in common bean and are considered non-pathogenic. The small-RNA reads from each pool of samples were mapped to the previously characterized complete PvEV-1 and PvEV-2 sequences (genome lengths ca. 14 kb and 15 kb, respectively). Coverage of the viral genomes was 87.9-99.9%, depending on the pool. Coverage per nucleotide ranged from 5 to 471, confirming virus identification. PvEV-1 and PvEV-2 are known to occur in Phaseolus spp. in Central America, but there is little previous information about their occurrence in Nicaragua, and no information about occurrence in Africa. Aside from Cowpea mild mosaic virus detected in bean plants grown from been seeds harvested from one region in Tanzania, no other pathogenic seedborne viruses were detected. The low incidence of infections caused by pathogenic viruses transmitted via bean seeds may be attributable to new, virus-resistant CB varieties released by breeding programs in Nicaragua and Tanzania.
Mahato, Ajay Kumar; Sharma, Nimisha; Singh, Akshay; Srivastav, Manish; Jaiprakash; Singh, Sanjay Kumar; Singh, Anand Kumar; Sharma, Tilak Raj; Singh, Nagendra Kumar
2016-01-01
Mango (Mangifera indica L.) is called "king of fruits" due to its sweetness, richness of taste, diversity, large production volume and a variety of end usage. Despite its huge economic importance genomic resources in mango are scarce and genetics of useful horticultural traits are poorly understood. Here we generated deep coverage leaf RNA sequence data for mango parental varieties 'Neelam', 'Dashehari' and their hybrid 'Amrapali' using next generation sequencing technologies. De-novo sequence assembly generated 27,528, 20,771 and 35,182 transcripts for the three genotypes, respectively. The transcripts were further assembled into a non-redundant set of 70,057 unigenes that were used for SSR and SNP identification and annotation. Total 5,465 SSR loci were identified in 4,912 unigenes with 288 type I SSR (n ≥ 20 bp). One hundred type I SSR markers were randomly selected of which 43 yielded PCR amplicons of expected size in the first round of validation and were designated as validated genic-SSR markers. Further, 22,306 SNPs were identified by aligning high quality sequence reads of the three mango varieties to the reference unigene set, revealing significantly enhanced SNP heterozygosity in the hybrid Amrapali. The present study on leaf RNA sequencing of mango varieties and their hybrid provides useful genomic resource for genetic improvement of mango.
Mahato, Ajay Kumar; Sharma, Nimisha; Singh, Akshay; Srivastav, Manish; Jaiprakash; Singh, Sanjay Kumar; Singh, Anand Kumar; Sharma, Tilak Raj; Singh, Nagendra Kumar
2016-01-01
Mango (Mangifera indica L.) is called “king of fruits” due to its sweetness, richness of taste, diversity, large production volume and a variety of end usage. Despite its huge economic importance genomic resources in mango are scarce and genetics of useful horticultural traits are poorly understood. Here we generated deep coverage leaf RNA sequence data for mango parental varieties ‘Neelam’, ‘Dashehari’ and their hybrid ‘Amrapali’ using next generation sequencing technologies. De-novo sequence assembly generated 27,528, 20,771 and 35,182 transcripts for the three genotypes, respectively. The transcripts were further assembled into a non-redundant set of 70,057 unigenes that were used for SSR and SNP identification and annotation. Total 5,465 SSR loci were identified in 4,912 unigenes with 288 type I SSR (n ≥ 20 bp). One hundred type I SSR markers were randomly selected of which 43 yielded PCR amplicons of expected size in the first round of validation and were designated as validated genic-SSR markers. Further, 22,306 SNPs were identified by aligning high quality sequence reads of the three mango varieties to the reference unigene set, revealing significantly enhanced SNP heterozygosity in the hybrid Amrapali. The present study on leaf RNA sequencing of mango varieties and their hybrid provides useful genomic resource for genetic improvement of mango. PMID:27736892
Reiz, Bela; Li, Liang
2010-09-01
Controlled hydrolysis of proteins to generate peptide ladders combined with mass spectrometric analysis of the resultant peptides can be used for protein sequencing. In this paper, two methods of improving the microwave-assisted protein hydrolysis process are described to enable rapid sequencing of proteins containing disulfide bonds and increase sequence coverage, respectively. It was demonstrated that proteins containing disulfide bonds could be sequenced by MS analysis by first performing hydrolysis for less than 2 min, followed by 1 h of reduction to release the peptides originally linked by disulfide bonds. It was shown that a strong base could be used as a catalyst for microwave-assisted protein hydrolysis, producing complementary sequence information to that generated by microwave-assisted acid hydrolysis. However, using either acid or base hydrolysis, amide bond breakages in small regions of the polypeptide chains of the model proteins (e.g., cytochrome c and lysozyme) were not detected. Dynamic light scattering measurement of the proteins solubilized in an acid or base indicated that protein-protein interaction or aggregation was not the cause of the failure to hydrolyze certain amide bonds. It was speculated that there were some unknown local structures that might play a role in preventing an acid or base from reacting with the peptide bonds therein. 2010 American Society for Mass Spectrometry. Published by Elsevier Inc. All rights reserved.
A Two-Phase Coverage-Enhancing Algorithm for Hybrid Wireless Sensor Networks.
Zhang, Qingguo; Fok, Mable P
2017-01-09
Providing field coverage is a key task in many sensor network applications. In certain scenarios, the sensor field may have coverage holes due to random initial deployment of sensors; thus, the desired level of coverage cannot be achieved. A hybrid wireless sensor network is a cost-effective solution to this problem, which is achieved by repositioning a portion of the mobile sensors in the network to meet the network coverage requirement. This paper investigates how to redeploy mobile sensor nodes to improve network coverage in hybrid wireless sensor networks. We propose a two-phase coverage-enhancing algorithm for hybrid wireless sensor networks. In phase one, we use a differential evolution algorithm to compute the candidate's target positions in the mobile sensor nodes that could potentially improve coverage. In the second phase, we use an optimization scheme on the candidate's target positions calculated from phase one to reduce the accumulated potential moving distance of mobile sensors, such that the exact mobile sensor nodes that need to be moved as well as their final target positions can be determined. Experimental results show that the proposed algorithm provided significant improvement in terms of area coverage rate, average moving distance, area coverage-distance rate and the number of moved mobile sensors, when compare with other approaches.
2012-01-01
Background Amazona vittata is a critically endangered Puerto Rican endemic bird, the only surviving native parrot species in the United States territory, and the first parrot in the large Neotropical genus Amazona, to be studied on a genomic scale. Findings In a unique community-based funded project, DNA from an A. vittata female was sequenced using a HiSeq Illumina platform, resulting in a total of ~42.5 billion nucleotide bases. This provided approximately 26.89x average coverage depth at the completion of this funding phase. Filtering followed by assembly resulted in 259,423 contigs (N50 = 6,983 bp, longest = 75,003 bp), which was further scaffolded into 148,255 fragments (N50 = 19,470, longest = 206,462 bp). This provided ~76% coverage of the genome based on an estimated size of 1.58 Gb. The assembled scaffolds allowed basic genomic annotation and comparative analyses with other available avian whole-genome sequences. Conclusions The current data represents the first genomic information from and work carried out with a unique source of funding. This analysis further provides a means for directed training of young researchers in genetic and bioinformatics analyses and will facilitate progress towards a full assembly and annotation of the Puerto Rican parrot genome. It also adds extensive genomic data to a new branch of the avian tree, making it useful for comparative analyses with other avian species. Ultimately, the knowledge acquired from these data will contribute to an improved understanding of the overall population health of this species and aid in ongoing and future conservation efforts. PMID:23587420
Demographic history and rare allele sharing among human populations.
Gravel, Simon; Henn, Brenna M; Gutenkunst, Ryan N; Indap, Amit R; Marth, Gabor T; Clark, Andrew G; Yu, Fuli; Gibbs, Richard A; Bustamante, Carlos D
2011-07-19
High-throughput sequencing technology enables population-level surveys of human genomic variation. Here, we examine the joint allele frequency distributions across continental human populations and present an approach for combining complementary aspects of whole-genome, low-coverage data and targeted high-coverage data. We apply this approach to data generated by the pilot phase of the Thousand Genomes Project, including whole-genome 2-4× coverage data for 179 samples from HapMap European, Asian, and African panels as well as high-coverage target sequencing of the exons of 800 genes from 697 individuals in seven populations. We use the site frequency spectra obtained from these data to infer demographic parameters for an Out-of-Africa model for populations of African, European, and Asian descent and to predict, by a jackknife-based approach, the amount of genetic diversity that will be discovered as sample sizes are increased. We predict that the number of discovered nonsynonymous coding variants will reach 100,000 in each population after ∼1,000 sequenced chromosomes per population, whereas ∼2,500 chromosomes will be needed for the same number of synonymous variants. Beyond this point, the number of segregating sites in the European and Asian panel populations is expected to overcome that of the African panel because of faster recent population growth. Overall, we find that the majority of human genomic variable sites are rare and exhibit little sharing among diverged populations. Our results emphasize that replication of disease association for specific rare genetic variants across diverged populations must overcome both reduced statistical power because of rarity and higher population divergence.
Demographic history and rare allele sharing among human populations
Gravel, Simon; Henn, Brenna M.; Gutenkunst, Ryan N.; Indap, Amit R.; Marth, Gabor T.; Clark, Andrew G.; Yu, Fuli; Gibbs, Richard A.; Bustamante, Carlos D.; Altshuler, David L.; Durbin, Richard M.; Abecasis, Gonçalo R.; Bentley, David R.; Chakravarti, Aravinda; Clark, Andrew G.; Collins, Francis S.; De La Vega, Francisco M.; Donnelly, Peter; Egholm, Michael; Flicek, Paul; Gabriel, Stacey B.; Gibbs, Richard A.; Knoppers, Bartha M.; Lander, Eric S.; Lehrach, Hans; Mardis, Elaine R.; McVean, Gil A.; Nickerson, Debbie A.; Peltonen, Leena; Schafer, Alan J.; Sherry, Stephen T.; Wang, Jun; Wilson, Richard K.; Gibbs, Richard A.; Deiros, David; Metzker, Mike; Muzny, Donna; Reid, Jeff; Wheeler, David; Wang, Jun; Li, Jingxiang; Jian, Min; Li, Guoqing; Li, Ruiqiang; Liang, Huiqing; Tian, Geng; Wang, Bo; Wang, Jian; Wang, Wei; Yang, Huanming; Zhang, Xiuqing; Zheng, Huisong; Lander, Eric S.; Altshuler, David L.; Ambrogio, Lauren; Bloom, Toby; Cibulskis, Kristian; Fennell, Tim J.; Gabriel, Stacey B.; Jaffe, David B.; Shefler, Erica; Sougnez, Carrie L.; Bentley, David R.; Gormley, Niall; Humphray, Sean; Kingsbury, Zoya; Koko-Gonzales, Paula; Stone, Jennifer; McKernan, Kevin J.; Costa, Gina L.; Ichikawa, Jeffry K.; Lee, Clarence C.; Sudbrak, Ralf; Lehrach, Hans; Borodina, Tatiana A.; Dahl, Andreas; Davydov, Alexey N.; Marquardt, Peter; Mertes, Florian; Nietfeld, Wilfiried; Rosenstiel, Philip; Schreiber, Stefan; Soldatov, Aleksey V.; Timmermann, Bernd; Tolzmann, Marius; Egholm, Michael; Affourtit, Jason; Ashworth, Dana; Attiya, Said; Bachorski, Melissa; Buglione, Eli; Burke, Adam; Caprio, Amanda; Celone, Christopher; Clark, Shauna; Conners, David; Desany, Brian; Gu, Lisa; Guccione, Lorri; Kao, Kalvin; Kebbel, Andrew; Knowlton, Jennifer; Labrecque, Matthew; McDade, Louise; Mealmaker, Craig; Minderman, Melissa; Nawrocki, Anne; Niazi, Faheem; Pareja, Kristen; Ramenani, Ravi; Riches, David; Song, Wanmin; Turcotte, Cynthia; Wang, Shally; Mardis, Elaine R.; Wilson, Richard K.; Dooling, David; Fulton, Lucinda; Fulton, Robert; Weinstock, George; Durbin, Richard M.; Burton, John; Carter, David M.; Churcher, Carol; Coffey, Alison; Cox, Anthony; Palotie, Aarno; Quail, Michael; Skelly, Tom; Stalker, James; Swerdlow, Harold P.; Turner, Daniel; De Witte, Anniek; Giles, Shane; Gibbs, Richard A.; Wheeler, David; Bainbridge, Matthew; Challis, Danny; Sabo, Aniko; Yu, Fuli; Yu, Jin; Wang, Jun; Fang, Xiaodong; Guo, Xiaosen; Li, Ruiqiang; Li, Yingrui; Luo, Ruibang; Tai, Shuaishuai; Wu, Honglong; Zheng, Hancheng; Zheng, Xiaole; Zhou, Yan; Li, Guoqing; Wang, Jian; Yang, Huanming; Marth, Gabor T.; Garrison, Erik P.; Huang, Weichun; Indap, Amit; Kural, Deniz; Lee, Wan-Ping; Leong, Wen Fung; Quinlan, Aaron R.; Stewart, Chip; Stromberg, Michael P.; Ward, Alistair N.; Wu, Jiantao; Lee, Charles; Mills, Ryan E.; Shi, Xinghua; Daly, Mark J.; DePristo, Mark A.; Altshuler, David L.; Ball, Aaron D.; Banks, Eric; Bloom, Toby; Browning, Brian L.; Cibulskis, Kristian; Fennell, Tim J.; Garimella, Kiran V.; Grossman, Sharon R.; Handsaker, Robert E.; Hanna, Matt; Hartl, Chris; Jaffe, David B.; Kernytsky, Andrew M.; Korn, Joshua M.; Li, Heng; Maguire, Jared R.; McCarroll, Steven A.; McKenna, Aaron; Nemesh, James C.; Philippakis, Anthony A.; Poplin, Ryan E.; Price, Alkes; Rivas, Manuel A.; Sabeti, Pardis C.; Schaffner, Stephen F.; Shefler, Erica; Shlyakhter, Ilya A.; Cooper, David N.; Ball, Edward V.; Mort, Matthew; Phillips, Andrew D.; Stenson, Peter D.; Sebat, Jonathan; Makarov, Vladimir; Ye, Kenny; Yoon, Seungtai C.; Bustamante, Carlos D.; Clark, Andrew G.; Boyko, Adam; Degenhardt, Jeremiah; Gravel, Simon; Gutenkunst, Ryan N.; Kaganovich, Mark; Keinan, Alon; Lacroute, Phil; Ma, Xin; Reynolds, Andy; Clarke, Laura; Flicek, Paul; Cunningham, Fiona; Herrero, Javier; Keenen, Stephen; Kulesha, Eugene; Leinonen, Rasko; McLaren, William M.; Radhakrishnan, Rajesh; Smith, Richard E.; Zalunin, Vadim; Zheng-Bradley, Xiangqun; Korbel, Jan O.; Stütz, Adrian M.; Humphray, Sean; Bauer, Markus; Cheetham, R. Keira; Cox, Tony; Eberle, Michael; James, Terena; Kahn, Scott; Murray, Lisa; Chakravarti, Aravinda; Ye, Kai; De La Vega, Francisco M.; Fu, Yutao; Hyland, Fiona C. L.; Manning, Jonathan M.; McLaughlin, Stephen F.; Peckham, Heather E.; Sakarya, Onur; Sun, Yongming A.; Tsung, Eric F.; Batzer, Mark A.; Konkel, Miriam K.; Walker, Jerilyn A.; Sudbrak, Ralf; Albrecht, Marcus W.; Amstislavskiy, Vyacheslav S.; Herwig, Ralf; Parkhomchuk, Dimitri V.; Sherry, Stephen T.; Agarwala, Richa; Khouri, Hoda M.; Morgulis, Aleksandr O.; Paschall, Justin E.; Phan, Lon D.; Rotmistrovsky, Kirill E.; Sanders, Robert D.; Shumway, Martin F.; Xiao, Chunlin; McVean, Gil A.; Auton, Adam; Iqbal, Zamin; Lunter, Gerton; Marchini, Jonathan L.; Moutsianas, Loukas; Myers, Simon; Tumian, Afidalina; Desany, Brian; Knight, James; Winer, Roger; Craig, David W.; Beckstrom-Sternberg, Steve M.; Christoforides, Alexis; Kurdoglu, Ahmet A.; Pearson, John V.; Sinari, Shripad A.; Tembe, Waibhav D.; Haussler, David; Hinrichs, Angie S.; Katzman, Sol J.; Kern, Andrew; Kuhn, Robert M.; Przeworski, Molly; Hernandez, Ryan D.; Howie, Bryan; Kelley, Joanna L.; Melton, S. Cord; Abecasis, Gonçalo R.; Li, Yun; Anderson, Paul; Blackwell, Tom; Chen, Wei; Cookson, William O.; Ding, Jun; Kang, Hyun Min; Lathrop, Mark; Liang, Liming; Moffatt, Miriam F.; Scheet, Paul; Sidore, Carlo; Snyder, Matthew; Zhan, Xiaowei; Zöllner, Sebastian; Awadalla, Philip; Casals, Ferran; Idaghdour, Youssef; Keebler, John; Stone, Eric A.; Zilversmit, Martine; Jorde, Lynn; Xing, Jinchuan; Eichler, Evan E.; Aksay, Gozde; Alkan, Can; Hajirasouliha, Iman; Hormozdiari, Fereydoun; Kidd, Jeffrey M.; Sahinalp, S. Cenk; Sudmant, Peter H.; Mardis, Elaine R.; Chen, Ken; Chinwalla, Asif; Ding, Li; Koboldt, Daniel C.; McLellan, Mike D.; Dooling, David; Weinstock, George; Wallis, John W.; Wendl, Michael C.; Zhang, Qunyuan; Durbin, Richard M.; Albers, Cornelis A.; Ayub, Qasim; Balasubramaniam, Senduran; Barrett, Jeffrey C.; Carter, David M.; Chen, Yuan; Conrad, Donald F.; Danecek, Petr; Dermitzakis, Emmanouil T.; Hu, Min; Huang, Ni; Hurles, Matt E.; Jin, Hanjun; Jostins, Luke; Keane, Thomas M.; Le, Si Quang; Lindsay, Sarah; Long, Quan; MacArthur, Daniel G.; Montgomery, Stephen B.; Parts, Leopold; Stalker, James; Tyler-Smith, Chris; Walter, Klaudia; Zhang, Yujun; Gerstein, Mark B.; Snyder, Michael; Abyzov, Alexej; Balasubramanian, Suganthi; Bjornson, Robert; Du, Jiang; Grubert, Fabian; Habegger, Lukas; Haraksingh, Rajini; Jee, Justin; Khurana, Ekta; Lam, Hugo Y. K.; Leng, Jing; Mu, Xinmeng Jasmine; Urban, Alexander E.; Zhang, Zhengdong; Li, Yingrui; Luo, Ruibang; Marth, Gabor T.; Garrison, Erik P.; Kural, Deniz; Quinlan, Aaron R.; Stewart, Chip; Stromberg, Michael P.; Ward, Alistair N.; Wu, Jiantao; Lee, Charles; Mills, Ryan E.; Shi, Xinghua; McCarroll, Steven A.; Banks, Eric; DePristo, Mark A.; Handsaker, Robert E.; Hartl, Chris; Korn, Joshua M.; Li, Heng; Nemesh, James C.; Sebat, Jonathan; Makarov, Vladimir; Ye, Kenny; Yoon, Seungtai C.; Degenhardt, Jeremiah; Kaganovich, Mark; Clarke, Laura; Smith, Richard E.; Zheng-Bradley, Xiangqun; Korbel, Jan O.; Humphray, Sean; Cheetham, R. Keira; Eberle, Michael; Kahn, Scott; Murray, Lisa; Ye, Kai; De La Vega, Francisco M.; Fu, Yutao; Peckham, Heather E.; Sun, Yongming A.; Batzer, Mark A.; Konkel, Miriam K.; Walker, Jerilyn A.; Xiao, Chunlin; Iqbal, Zamin; Desany, Brian; Blackwell, Tom; Snyder, Matthew; Xing, Jinchuan; Eichler, Evan E.; Aksay, Gozde; Alkan, Can; Hajirasouliha, Iman; Hormozdiari, Fereydoun; Kidd, Jeffrey M.; Chen, Ken; Chinwalla, Asif; Ding, Li; McLellan, Mike D.; Wallis, John W.; Hurles, Matt E.; Conrad, Donald F.; Walter, Klaudia; Zhang, Yujun; Gerstein, Mark B.; Snyder, Michael; Abyzov, Alexej; Du, Jiang; Grubert, Fabian; Haraksingh, Rajini; Jee, Justin; Khurana, Ekta; Lam, Hugo Y. K.; Leng, Jing; Mu, Xinmeng Jasmine; Urban, Alexander E.; Zhang, Zhengdong; Gibbs, Richard A.; Bainbridge, Matthew; Challis, Danny; Coafra, Cristian; Dinh, Huyen; Kovar, Christie; Lee, Sandy; Muzny, Donna; Nazareth, Lynne; Reid, Jeff; Sabo, Aniko; Yu, Fuli; Yu, Jin; Marth, Gabor T.; Garrison, Erik P.; Indap, Amit; Leong, Wen Fung; Quinlan, Aaron R.; Stewart, Chip; Ward, Alistair N.; Wu, Jiantao; Cibulskis, Kristian; Fennell, Tim J.; Gabriel, Stacey B.; Garimella, Kiran V.; Hartl, Chris; Shefler, Erica; Sougnez, Carrie L.; Wilkinson, Jane; Clark, Andrew G.; Gravel, Simon; Grubert, Fabian; Clarke, Laura; Flicek, Paul; Smith, Richard E.; Zheng-Bradley, Xiangqun; Sherry, Stephen T.; Khouri, Hoda M.; Paschall, Justin E.; Shumway, Martin F.; Xiao, Chunlin; McVean, Gil A.; Katzman, Sol J.; Abecasis, Gonçalo R.; Blackwell, Tom; Mardis, Elaine R.; Dooling, David; Fulton, Lucinda; Fulton, Robert; Koboldt, Daniel C.; Durbin, Richard M.; Balasubramaniam, Senduran; Coffey, Allison; Keane, Thomas M.; MacArthur, Daniel G.; Palotie, Aarno; Scott, Carol; Stalker, James; Tyler-Smith, Chris; Gerstein, Mark B.; Balasubramanian, Suganthi; Chakravarti, Aravinda; Knoppers, Bartha M.; Abecasis, Gonçalo R.; Bustamante, Carlos D.; Gharani, Neda; Gibbs, Richard A.; Jorde, Lynn; Kaye, Jane S.; Kent, Alastair; Li, Taosha; McGuire, Amy L.; McVean, Gil A.; Ossorio, Pilar N.; Rotimi, Charles N.; Su, Yeyang; Toji, Lorraine H.; TylerSmith, Chris; Brooks, Lisa D.; Felsenfeld, Adam L.; McEwen, Jean E.; Abdallah, Assya; Juenger, Christopher R.; Clemm, Nicholas C.; Collins, Francis S.; Duncanson, Audrey; Green, Eric D.; Guyer, Mark S.; Peterson, Jane L.; Schafer, Alan J.; Abecasis, Gonçalo R.; Altshuler, David L.; Auton, Adam; Brooks, Lisa D.; Durbin, Richard M.; Gibbs, Richard A.; Hurles, Matt E.; McVean, Gil A.
2011-01-01
High-throughput sequencing technology enables population-level surveys of human genomic variation. Here, we examine the joint allele frequency distributions across continental human populations and present an approach for combining complementary aspects of whole-genome, low-coverage data and targeted high-coverage data. We apply this approach to data generated by the pilot phase of the Thousand Genomes Project, including whole-genome 2–4× coverage data for 179 samples from HapMap European, Asian, and African panels as well as high-coverage target sequencing of the exons of 800 genes from 697 individuals in seven populations. We use the site frequency spectra obtained from these data to infer demographic parameters for an Out-of-Africa model for populations of African, European, and Asian descent and to predict, by a jackknife-based approach, the amount of genetic diversity that will be discovered as sample sizes are increased. We predict that the number of discovered nonsynonymous coding variants will reach 100,000 in each population after ∼1,000 sequenced chromosomes per population, whereas ∼2,500 chromosomes will be needed for the same number of synonymous variants. Beyond this point, the number of segregating sites in the European and Asian panel populations is expected to overcome that of the African panel because of faster recent population growth. Overall, we find that the majority of human genomic variable sites are rare and exhibit little sharing among diverged populations. Our results emphasize that replication of disease association for specific rare genetic variants across diverged populations must overcome both reduced statistical power because of rarity and higher population divergence. PMID:21730125
Park, Jae-Min; Jang, Se Jin; Lee, Sang-Ick; Lee, Won-Jun
2018-03-14
We designed cyclosilazane-type silicon precursors and proposed a three-step plasma-enhanced atomic layer deposition (PEALD) process to prepare silicon nitride films with high quality and excellent step coverage. The cyclosilazane-type precursor, 1,3-di-isopropylamino-2,4-dimethylcyclosilazane (CSN-2), has a closed ring structure for good thermal stability and high reactivity. CSN-2 showed thermal stability up to 450 °C and a sufficient vapor pressure of 4 Torr at 60 °C. The energy for the chemisorption of CSN-2 on the undercoordinated silicon nitride surface as calculated by density functional theory method was -7.38 eV. The PEALD process window was between 200 and 500 °C, with a growth rate of 0.43 Å/cycle. The best film quality was obtained at 500 °C, with hydrogen impurity of ∼7 atom %, oxygen impurity less than 2 atom %, low wet etching rate, and excellent step coverage of ∼95%. At 300 °C and lower temperatures, the wet etching rate was high especially at the lower sidewall of the trench pattern. We introduced the three-step PEALD process to improve the film quality and the step coverage on the lower sidewall. The sequence of the three-step PEALD process consists of the CSN-2 feeding step, the NH 3 /N 2 plasma step, and the N 2 plasma step. The H radicals in NH 3 /N 2 plasma efficiently remove the ligands from the precursor, and the N 2 plasma after the NH 3 plasma removes the surface hydrogen atoms to activate the adsorption of the precursor. The films deposited at 300 °C using the novel precursor and the three-step PEALD process showed a significantly improved step coverage of ∼95% and an excellent wet etching resistance at the lower sidewall, which is only twice as high as that of the blanket film prepared by low-pressure chemical vapor deposition.
Li, Jonathan Z; Chapman, Brad; Charlebois, Patrick; Hofmann, Oliver; Weiner, Brian; Porter, Alyssa J; Samuel, Reshmi; Vardhanabhuti, Saran; Zheng, Lu; Eron, Joseph; Taiwo, Babafemi; Zody, Michael C; Henn, Matthew R; Kuritzkes, Daniel R; Hide, Winston; Wilson, Cara C; Berzins, Baiba I; Acosta, Edward P; Bastow, Barbara; Kim, Peter S; Read, Sarah W; Janik, Jennifer; Meres, Debra S; Lederman, Michael M; Mong-Kryspin, Lori; Shaw, Karl E; Zimmerman, Louis G; Leavitt, Randi; De La Rosa, Guy; Jennings, Amy
2014-01-01
The impact of raltegravir-resistant HIV-1 minority variants (MVs) on raltegravir treatment failure is unknown. Illumina sequencing offers greater throughput than 454, but sequence analysis tools for viral sequencing are needed. We evaluated Illumina and 454 for the detection of HIV-1 raltegravir-resistant MVs. A5262 was a single-arm study of raltegravir and darunavir/ritonavir in treatment-naïve patients. Pre-treatment plasma was obtained from 5 participants with raltegravir resistance at the time of virologic failure. A control library was created by pooling integrase clones at predefined proportions. Multiplexed sequencing was performed with Illumina and 454 platforms at comparable costs. Illumina sequence analysis was performed with the novel snp-assess tool and 454 sequencing was analyzed with V-Phaser. Illumina sequencing resulted in significantly higher sequence coverage and a 0.095% limit of detection. Illumina accurately detected all MVs in the control library at ≥0.5% and 7/10 MVs expected at 0.1%. 454 sequencing failed to detect any MVs at 0.1% with 5 false positive calls. For MVs detected in the patient samples by both 454 and Illumina, the correlation in the detected variant frequencies was high (R2 = 0.92, P<0.001). Illumina sequencing detected 2.4-fold greater nucleotide MVs and 2.9-fold greater amino acid MVs compared to 454. The only raltegravir-resistant MV detected was an E138K mutation in one participant by Illumina sequencing, but not by 454. In participants of A5262 with raltegravir resistance at virologic failure, baseline raltegravir-resistant MVs were rarely detected. At comparable costs to 454 sequencing, Illumina demonstrated greater depth of coverage, increased sensitivity for detecting HIV MVs, and fewer false positive variant calls.
LoRTE: Detecting transposon-induced genomic variants using low coverage PacBio long read sequences.
Disdero, Eric; Filée, Jonathan
2017-01-01
Population genomic analysis of transposable elements has greatly benefited from recent advances of sequencing technologies. However, the short size of the reads and the propensity of transposable elements to nest in highly repeated regions of genomes limits the efficiency of bioinformatic tools when Illumina or 454 technologies are used. Fortunately, long read sequencing technologies generating read length that may span the entire length of full transposons are now available. However, existing TE population genomic softwares were not designed to handle long reads and the development of new dedicated tools is needed. LoRTE is the first tool able to use PacBio long read sequences to identify transposon deletions and insertions between a reference genome and genomes of different strains or populations. Tested against simulated and genuine Drosophila melanogaster PacBio datasets, LoRTE appears to be a reliable and broadly applicable tool to study the dynamic and evolutionary impact of transposable elements using low coverage, long read sequences. LoRTE is an efficient and accurate tool to identify structural genomic variants caused by TE insertion or deletion. LoRTE is available for download at http://www.egce.cnrs-gif.fr/?p=6422.
Bazak, Lily; Haviv, Ami; Barak, Michal; Jacob-Hirsch, Jasmine; Deng, Patricia; Zhang, Rui; Isaacs, Farren J; Rechavi, Gideon; Li, Jin Billy; Eisenberg, Eli; Levanon, Erez Y
2014-03-01
RNA molecules transmit the information encoded in the genome and generally reflect its content. Adenosine-to-inosine (A-to-I) RNA editing by ADAR proteins converts a genomically encoded adenosine into inosine. It is known that most RNA editing in human takes place in the primate-specific Alu sequences, but the extent of this phenomenon and its effect on transcriptome diversity are not yet clear. Here, we analyzed large-scale RNA-seq data and detected ∼1.6 million editing sites. As detection sensitivity increases with sequencing coverage, we performed ultradeep sequencing of selected Alu sequences and showed that the scope of editing is much larger than anticipated. We found that virtually all adenosines within Alu repeats that form double-stranded RNA undergo A-to-I editing, although most sites exhibit editing at only low levels (<1%). Moreover, using high coverage sequencing, we observed editing of transcripts resulting from residual antisense expression, doubling the number of edited sites in the human genome. Based on bioinformatic analyses and deep targeted sequencing, we estimate that there are over 100 million human Alu RNA editing sites, located in the majority of human genes. These findings set the stage for exploring how this primate-specific massive diversification of the transcriptome is utilized.
A massive parallel sequencing workflow for diagnostic genetic testing of mismatch repair genes
Hansen, Maren F; Neckmann, Ulrike; Lavik, Liss A S; Vold, Trine; Gilde, Bodil; Toft, Ragnhild K; Sjursen, Wenche
2014-01-01
The purpose of this study was to develop a massive parallel sequencing (MPS) workflow for diagnostic analysis of mismatch repair (MMR) genes using the GS Junior system (Roche). A pathogenic variant in one of four MMR genes, (MLH1, PMS2, MSH6, and MSH2), is the cause of Lynch Syndrome (LS), which mainly predispose to colorectal cancer. We used an amplicon-based sequencing method allowing specific and preferential amplification of the MMR genes including PMS2, of which several pseudogenes exist. The amplicons were pooled at different ratios to obtain coverage uniformity and maximize the throughput of a single-GS Junior run. In total, 60 previously identified and distinct variants (substitutions and indels), were sequenced by MPS and successfully detected. The heterozygote detection range was from 19% to 63% and dependent on sequence context and coverage. We were able to distinguish between false-positive and true-positive calls in homopolymeric regions by cross-sample comparison and evaluation of flow signal distributions. In addition, we filtered variants according to a predefined status, which facilitated variant annotation. Our study shows that implementation of MPS in routine diagnostics of LS can accelerate sample throughput and reduce costs without compromising sensitivity, compared to Sanger sequencing. PMID:24689082
Herath, Damayanthi; Tang, Sen-Lin; Tandon, Kshitij; Ackland, David; Halgamuge, Saman Kumara
2017-12-28
In metagenomics, the separation of nucleotide sequences belonging to an individual or closely matched populations is termed binning. Binning helps the evaluation of underlying microbial population structure as well as the recovery of individual genomes from a sample of uncultivable microbial organisms. Both supervised and unsupervised learning methods have been employed in binning; however, characterizing a metagenomic sample containing multiple strains remains a significant challenge. In this study, we designed and implemented a new workflow, Coverage and composition based binning of Metagenomes (CoMet), for binning contigs in a single metagenomic sample. CoMet utilizes coverage values and the compositional features of metagenomic contigs. The binning strategy in CoMet includes the initial grouping of contigs in guanine-cytosine (GC) content-coverage space and refinement of bins in tetranucleotide frequencies space in a purely unsupervised manner. With CoMet, the clustering algorithm DBSCAN is employed for binning contigs. The performances of CoMet were compared against four existing approaches for binning a single metagenomic sample, including MaxBin, Metawatt, MyCC (default) and MyCC (coverage) using multiple datasets including a sample comprised of multiple strains. Binning methods based on both compositional features and coverages of contigs had higher performances than the method which is based only on compositional features of contigs. CoMet yielded higher or comparable precision in comparison to the existing binning methods on benchmark datasets of varying complexities. MyCC (coverage) had the highest ranking score in F1-score. However, the performances of CoMet were higher than MyCC (coverage) on the dataset containing multiple strains. Furthermore, CoMet recovered contigs of more species and was 18 - 39% higher in precision than the compared existing methods in discriminating species from the sample of multiple strains. CoMet resulted in higher precision than MyCC (default) and MyCC (coverage) on a real metagenome. The approach proposed with CoMet for binning contigs, improves the precision of binning while characterizing more species in a single metagenomic sample and in a sample containing multiple strains. The F1-scores obtained from different binning strategies vary with different datasets; however, CoMet yields the highest F1-score with a sample comprised of multiple strains.
Termite hindguts and the ecology of microbial communities in the sequencing age.
Tai, Vera; Keeling, Patrick J
2013-01-01
Advances in high-throughput nucleic acid sequencing have improved our understanding of microbial communities in a number of ways. Deeper sequence coverage provides the means to assess diversity at the resolution necessary to recover ecological and biogeographic patterns, and at the same time single-cell genomics provides detailed information about the interactions between members of a microbial community. Given the vastness and complexity of microbial ecosystems, such analyses remain challenging for most environments, so greater insight can also be drawn from analysing less dynamic ecosystems. Here, we outline the advantages of one such environment, the wood-digesting hindgut communities of termites and cockroaches, and how it is a model to examine and compare both protist and bacterial communities. Beyond the analysis of diversity, our understanding of protist community ecology will depend on using statistically sound sampling regimes at biologically relevant scales, transitioning from discovery-based to experimental ecology, incorporating single-cell microbiology and other data sources, and continued development of analytical tools. © 2013 The Author(s) Journal of Eukaryotic Microbiology © 2013 International Society of Protistologists.
2013-01-01
Background Deep sequencing of viruses isolated from infected hosts is an efficient way to measure population-genetic variation and can reveal patterns of dispersal and natural selection. In this study, we mined existing Illumina sequence reads to investigate single-nucleotide polymorphisms (SNPs) within two RNA viruses of the Western honey bee (Apis mellifera), deformed wing virus (DWV) and Israel acute paralysis virus (IAPV). All viral RNA was extracted from North American samples of honey bees or, in one case, the ectoparasitic mite Varroa destructor. Results Coverage depth was generally lower for IAPV than DWV, and marked gaps in coverage occurred in several narrow regions (< 50 bp) of IAPV. These coverage gaps occurred across sequencing runs and were virtually unchanged when reads were re-mapped with greater permissiveness (up to 8% divergence), suggesting a recurrent sequencing artifact rather than strain divergence. Consensus sequences of DWV for each sample showed little phylogenetic divergence, low nucleotide diversity, and strongly negative values of Fu and Li’s D statistic, suggesting a recent population bottleneck and/or purifying selection. The Kakugo strain of DWV fell outside of all other DWV sequences at 100% bootstrap support. IAPV consensus sequences supported the existence of multiple clades as had been previously reported, and Fu and Li’s D was closer to neutral expectation overall, although a sliding-window analysis identified a significantly positive D within the protease region, suggesting selection maintains diversity in that region. Within-sample mean diversity was comparable between the two viruses on average, although for both viruses there was substantial variation among samples in mean diversity at third codon positions and in the number of high-diversity sites. FST values were bimodal for DWV, likely reflecting neutral divergence in two low-diversity populations, whereas IAPV had several sites that were strong outliers with very low FST. Conclusions This initial survey of genetic variation within honey bee RNA viruses suggests future directions for studies examining the underlying causes of population-genetic structure in these economically important pathogens. PMID:23497218
Cornman, Robert Scott; Boncristiani, Humberto; Dainat, Benjamin; Chen, Yanping; vanEngelsdorp, Dennis; Weaver, Daniel; Evans, Jay D
2013-03-07
Deep sequencing of viruses isolated from infected hosts is an efficient way to measure population-genetic variation and can reveal patterns of dispersal and natural selection. In this study, we mined existing Illumina sequence reads to investigate single-nucleotide polymorphisms (SNPs) within two RNA viruses of the Western honey bee (Apis mellifera), deformed wing virus (DWV) and Israel acute paralysis virus (IAPV). All viral RNA was extracted from North American samples of honey bees or, in one case, the ectoparasitic mite Varroa destructor. Coverage depth was generally lower for IAPV than DWV, and marked gaps in coverage occurred in several narrow regions (< 50 bp) of IAPV. These coverage gaps occurred across sequencing runs and were virtually unchanged when reads were re-mapped with greater permissiveness (up to 8% divergence), suggesting a recurrent sequencing artifact rather than strain divergence. Consensus sequences of DWV for each sample showed little phylogenetic divergence, low nucleotide diversity, and strongly negative values of Fu and Li's D statistic, suggesting a recent population bottleneck and/or purifying selection. The Kakugo strain of DWV fell outside of all other DWV sequences at 100% bootstrap support. IAPV consensus sequences supported the existence of multiple clades as had been previously reported, and Fu and Li's D was closer to neutral expectation overall, although a sliding-window analysis identified a significantly positive D within the protease region, suggesting selection maintains diversity in that region. Within-sample mean diversity was comparable between the two viruses on average, although for both viruses there was substantial variation among samples in mean diversity at third codon positions and in the number of high-diversity sites. FST values were bimodal for DWV, likely reflecting neutral divergence in two low-diversity populations, whereas IAPV had several sites that were strong outliers with very low FST. This initial survey of genetic variation within honey bee RNA viruses suggests future directions for studies examining the underlying causes of population-genetic structure in these economically important pathogens.
Sequence analysis of the genome of carnation (Dianthus caryophyllus L.).
Yagi, Masafumi; Kosugi, Shunichi; Hirakawa, Hideki; Ohmiya, Akemi; Tanase, Koji; Harada, Taro; Kishimoto, Kyutaro; Nakayama, Masayoshi; Ichimura, Kazuo; Onozaki, Takashi; Yamaguchi, Hiroyasu; Sasaki, Nobuhiro; Miyahara, Taira; Nishizaki, Yuzo; Ozeki, Yoshihiro; Nakamura, Noriko; Suzuki, Takamasa; Tanaka, Yoshikazu; Sato, Shusei; Shirasawa, Kenta; Isobe, Sachiko; Miyamura, Yoshinori; Watanabe, Akiko; Nakayama, Shinobu; Kishida, Yoshie; Kohara, Mitsuyo; Tabata, Satoshi
2014-06-01
The whole-genome sequence of carnation (Dianthus caryophyllus L.) cv. 'Francesco' was determined using a combination of different new-generation multiplex sequencing platforms. The total length of the non-redundant sequences was 568,887,315 bp, consisting of 45,088 scaffolds, which covered 91% of the 622 Mb carnation genome estimated by k-mer analysis. The N50 values of contigs and scaffolds were 16,644 bp and 60,737 bp, respectively, and the longest scaffold was 1,287,144 bp. The average GC content of the contig sequences was 36%. A total of 1050, 13, 92 and 143 genes for tRNAs, rRNAs, snoRNA and miRNA, respectively, were identified in the assembled genomic sequences. For protein-encoding genes, 43 266 complete and partial gene structures excluding those in transposable elements were deduced. Gene coverage was ∼ 98%, as deduced from the coverage of the core eukaryotic genes. Intensive characterization of the assigned carnation genes and comparison with those of other plant species revealed characteristic features of the carnation genome. The results of this study will serve as a valuable resource for fundamental and applied research of carnation, especially for breeding new carnation varieties. Further information on the genomic sequences is available at http://carnation.kazusa.or.jp. © The Author 2013. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
Chen, DaYang; Zhen, HeFu; Qiu, Yong; Liu, Ping; Zeng, Peng; Xia, Jun; Shi, QianYu; Xie, Lin; Zhu, Zhu; Gao, Ya; Huang, GuoDong; Wang, Jian; Yang, HuanMing; Chen, Fang
2018-03-21
Research based on a strategy of single-cell low-coverage whole genome sequencing (SLWGS) has enabled better reproducibility and accuracy for detection of copy number variations (CNVs). The whole genome amplification (WGA) method and sequencing platform are critical factors for successful SLWGS (<0.1 × coverage). In this study, we compared single cell and multiple cells sequencing data produced by the HiSeq2000 and Ion Proton platforms using two WGA kits and then comprehensively evaluated the GC-bias, reproducibility, uniformity and CNV detection among different experimental combinations. Our analysis demonstrated that the PicoPLEX WGA Kit resulted in higher reproducibility, lower sequencing error frequency but more GC-bias than the GenomePlex Single Cell WGA Kit (WGA4 kit) independent of the cell number on the HiSeq2000 platform. While on the Ion Proton platform, the WGA4 kit (both single cell and multiple cells) had higher uniformity and less GC-bias but lower reproducibility than those of the PicoPLEX WGA Kit. Moreover, on these two sequencing platforms, depending on cell number, the performance of the two WGA kits was different for both sensitivity and specificity on CNV detection. The results can help researchers who plan to use SLWGS on single or multiple cells to select appropriate experimental conditions for their applications.
Comparison of the Equine Reference Sequence with Its Sanger Source Data and New Illumina Reads
Rebolledo-Mendez, Jovan; Hestand, Matthew S.; Coleman, Stephen J.; Zeng, Zheng; Orlando, Ludovic; MacLeod, James N.; Kalbfleisch, Ted
2015-01-01
The reference assembly for the domestic horse, EquCab2, published in 2009, was built using approximately 30 million Sanger reads from a Thoroughbred mare named Twilight. Contiguity in the assembly was facilitated using nearly 315 thousand BAC end sequences from Twilight’s half brother Bravo. Since then, it has served as the foundation for many genome-wide analyses that include not only the modern horse, but ancient horses and other equid species as well. As data mapped to this reference has accumulated, consistent variation between mapped datasets and the reference, in terms of regions with no read coverage, single nucleotide variants, and small insertions/deletions have become apparent. In many cases, it is not clear whether these differences are the result of true sequence variation between the research subjects’ and Twilight’s genome or due to errors in the reference. EquCab2 is regarded as “The Twilight Assembly.” The objective of this study was to identify inconsistencies between the EquCab2 assembly and the source Twilight Sanger data used to build it. To that end, the original Sanger and BAC end reads have been mapped back to this equine reference and assessed with the addition of approximately 40X coverage of new Illumina Paired-End sequence data. The resulting mapped datasets identify those regions with low Sanger read coverage, as well as variation in genomic content that is not consistent with either the original Twilight Sanger data or the new genomic sequence data generated from Twilight on the Illumina platform. As the haploid EquCab2 reference assembly was created using Sanger reads derived largely from a single individual, the vast majority of variation detected in a mapped dataset comprised of those same Sanger reads should be heterozygous. In contrast, homozygous variations would represent either errors in the reference or contributions from Bravo's BAC end sequences. Our analysis identifies 720,843 homozygous discrepancies between new, high throughput genomic sequence data generated for Twilight and the EquCab2 reference assembly. Most of these represent errors in the assembly, while approximately 10,000 are demonstrated to be contributions from another horse. Other results are presented that include the binary alignment map file of the mapped Sanger reads, a list of variants identified as discrepancies between the source data and resulting reference, and a BED annotation file that lists the regions of the genome whose consensus was likely derived from low coverage alignments. PMID:26107638
Disk-based compression of data from genome sequencing.
Grabowski, Szymon; Deorowicz, Sebastian; Roguski, Łukasz
2015-05-01
High-coverage sequencing data have significant, yet hard to exploit, redundancy. Most FASTQ compressors cannot efficiently compress the DNA stream of large datasets, since the redundancy between overlapping reads cannot be easily captured in the (relatively small) main memory. More interesting solutions for this problem are disk based, where the better of these two, from Cox et al. (2012), is based on the Burrows-Wheeler transform (BWT) and achieves 0.518 bits per base for a 134.0 Gbp human genome sequencing collection with almost 45-fold coverage. We propose overlapping reads compression with minimizers, a compression algorithm dedicated to sequencing reads (DNA only). Our method makes use of a conceptually simple and easily parallelizable idea of minimizers, to obtain 0.317 bits per base as the compression ratio, allowing to fit the 134.0 Gbp dataset into only 5.31 GB of space. http://sun.aei.polsl.pl/orcom under a free license. sebastian.deorowicz@polsl.pl Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Long, Kyle A; Nossa, Carlos W; Sewell, Mary A; Putnam, Nicholas H; Ryan, Joseph F
2016-01-01
There are five major extant groups of Echinodermata: Crinoidea (feather stars and sea lillies), Ophiuroidea (brittle stars and basket stars), Asteroidea (sea stars), Echinoidea (sea urchins, sea biscuits, and sand dollars), and Holothuroidea (sea cucumbers). These animals are known for their pentaradial symmetry as adults, unique water vascular system, mutable collagenous tissues, and endoskeletons of high magnesium calcite. To our knowledge, the only echinoderm species with a genome sequence available to date is Strongylocentrotus pupuratus (Echinoidea). The availability of additional echinoderm genome sequences is crucial for understanding the biology of these animals. Here we present assembled draft genomes of the brittle star Ophionereis fasciata, the sea star Patiriella regularis, and the sea cucumber Australostichopus mollis from Illumina sequence data with coverages of 12.5x, 22.5x, and 21.4x, respectively. These data provide a resource for mining gene superfamilies, identifying non-coding RNAs, confirming gene losses, and designing experimental constructs. They will be important comparative resources for future genomic studies in echinoderms.
A Nested Phosphorus and Proton Coil Array for Brain Magnetic Resonance Imaging and Spectroscopy
Brown, Ryan; Lakshmanan, Karthik; Madelin, Guillaume; Parasoglou, Prodromos
2015-01-01
A dual-nuclei radiofrequency coil array was constructed for phosphorus and proton magnetic resonance imaging and spectroscopy of the human brain at 7 Tesla. An eight-channel transceive degenerate birdcage phosphorus module was implemented to provide whole-brain coverage and significant sensitivity improvement over a standard dual-tuned loop coil. A nested eight-channel proton module provided adequate sensitivity for anatomical localization without substantially sacrificing performance on the phosphorus module. The developed array enabled phosphorus spectroscopy, a saturation transfer technique to calculate the global creatine kinase forward reaction rate, and single-metabolite whole-brain imaging with 1.4 cm nominal isotropic resolution in 15 min (2.3 cm actual resolution), while additionally enabling 1 mm isotropic proton imaging. This study demonstrates that a multi-channel array can be utilized for phosphorus and proton applications with improved coverage and/or sensitivity over traditional single-channel coils. The efficient multi-channel coil array, time-efficient pulse sequences, and the enhanced signal strength available at ultra-high fields can be combined to allow volumetric assessment of the brain and could provide new insights into the underlying energy metabolism impairment in several neurodegenerative conditions, such as Alzheimer’s and Parkinson’s diseases, as well as mental disorders such as schizophrenia. PMID:26375209
A nested phosphorus and proton coil array for brain magnetic resonance imaging and spectroscopy.
Brown, Ryan; Lakshmanan, Karthik; Madelin, Guillaume; Parasoglou, Prodromos
2016-01-01
A dual-nuclei radiofrequency coil array was constructed for phosphorus and proton magnetic resonance imaging and spectroscopy of the human brain at 7T. An eight-channel transceive degenerate birdcage phosphorus module was implemented to provide whole-brain coverage and significant sensitivity improvement over a standard dual-tuned loop coil. A nested eight-channel proton module provided adequate sensitivity for anatomical localization without substantially sacrificing performance on the phosphorus module. The developed array enabled phosphorus spectroscopy, a saturation transfer technique to calculate the global creatine kinase forward reaction rate, and single-metabolite whole-brain imaging with 1.4cm nominal isotropic resolution in 15min (2.3cm actual resolution), while additionally enabling 1mm isotropic proton imaging. This study demonstrates that a multi-channel array can be utilized for phosphorus and proton applications with improved coverage and/or sensitivity over traditional single-channel coils. The efficient multi-channel coil array, time-efficient pulse sequences, and the enhanced signal strength available at ultra-high fields can be combined to allow volumetric assessment of the brain and could provide new insights into the underlying energy metabolism impairment in several neurodegenerative conditions, such as Alzheimer's and Parkinson's diseases, as well as mental disorders such as schizophrenia. Copyright © 2015 Elsevier Inc. All rights reserved.
NSW annual immunisation coverage report, 2011.
Hull, Brynley; Dey, Aditi; Campbell-Lloyd, Sue; Menzies, Robert I; McIntyre, Peter B
2012-12-01
This annual report, the third in the series, documents trends in immunisation coverage in NSW for children, adolescents and the elderly, to the end of 2011. Data from the Australian Childhood Immunisation Register, the NSW School Immunisation Program and the NSW Population Health Survey were used to calculate various measures of population coverage. During 2011, greater than 90% coverage was maintained for children at 12 and 24 months of age. For children at 5 years of age the improvement seen in 2010 was sustained, with coverage at or near 90%. For adolescents, there was improved coverage for all doses of human papillomavirus vaccine, both doses of hepatitis B vaccine, varicella vaccine and the dose of diphtheria, tetanus and acellular pertussis given to school attendees in Years 7 and 10. Pneumococcal vaccination coverage in the elderly has been steadily rising, although it has remained lower than the influenza coverage estimates. This report provides trends in immunisation coverage in NSW across the age spectrum. The inclusion of coverage estimates for the pneumococcal conjugate, varicella and meningococcal C vaccines in the official coverage assessments for 'fully immunised' in 2013 is a welcome initiative.
Automated typing of red blood cell and platelet antigens: a whole-genome sequencing study.
Lane, William J; Westhoff, Connie M; Gleadall, Nicholas S; Aguad, Maria; Smeland-Wagman, Robin; Vege, Sunitha; Simmons, Daimon P; Mah, Helen H; Lebo, Matthew S; Walter, Klaudia; Soranzo, Nicole; Di Angelantonio, Emanuele; Danesh, John; Roberts, David J; Watkins, Nick A; Ouwehand, Willem H; Butterworth, Adam S; Kaufman, Richard M; Rehm, Heidi L; Silberstein, Leslie E; Green, Robert C
2018-06-01
There are more than 300 known red blood cell (RBC) antigens and 33 platelet antigens that differ between individuals. Sensitisation to antigens is a serious complication that can occur in prenatal medicine and after blood transfusion, particularly for patients who require multiple transfusions. Although pre-transfusion compatibility testing largely relies on serological methods, reagents are not available for many antigens. Methods based on single-nucleotide polymorphism (SNP) arrays have been used, but typing for ABO and Rh-the most important blood groups-cannot be done with SNP typing alone. We aimed to develop a novel method based on whole-genome sequencing to identify RBC and platelet antigens. This whole-genome sequencing study is a subanalysis of data from patients in the whole-genome sequencing arm of the MedSeq Project randomised controlled trial (NCT01736566) with no measured patient outcomes. We created a database of molecular changes in RBC and platelet antigens and developed an automated antigen-typing algorithm based on whole-genome sequencing (bloodTyper). This algorithm was iteratively improved to address cis-trans haplotype ambiguities and homologous gene alignments. Whole-genome sequencing data from 110 MedSeq participants (30 × depth) were used to initially validate bloodTyper through comparison with conventional serology and SNP methods for typing of 38 RBC antigens in 12 blood-group systems and 22 human platelet antigens. bloodTyper was further validated with whole-genome sequencing data from 200 INTERVAL trial participants (15 × depth) with serological comparisons. We iteratively improved bloodTyper by comparing its typing results with conventional serological and SNP typing in three rounds of testing. The initial whole-genome sequencing typing algorithm was 99·5% concordant across the first 20 MedSeq genomes. Addressing discordances led to development of an improved algorithm that was 99·8% concordant for the remaining 90 MedSeq genomes. Additional modifications led to the final algorithm, which was 99·2% concordant across 200 INTERVAL genomes (or 99·9% after adjustment for the lower depth of coverage). By enabling more precise antigen-matching of patients with blood donors, antigen typing based on whole-genome sequencing provides a novel approach to improve transfusion outcomes with the potential to transform the practice of transfusion medicine. National Human Genome Research Institute, Doris Duke Charitable Foundation, National Health Service Blood and Transplant, National Institute for Health Research, and Wellcome Trust. Copyright © 2018 Elsevier Ltd. All rights reserved.
Kimura, Akiko C; Nguyen, Christine N; Higa, Jeffrey I; Hurwitz, Eric L; Vugia, Duc J
2007-04-01
We examined barriers to influenza vaccination among long-term care facility (LTCF) health care workers in Southern California and developed simple, effective interventions to improve influenza vaccine coverage of these workers. In 2002, health care workers at LTCFs were surveyed regarding their knowledge and attitudes about influenza and the influenza vaccine. Results were used to develop 2 interventions, an educational campaign and Vaccine Day (a well-publicized day for free influenza vaccination of all employees at the worksite). Seventy facilities were recruited to participate in an intervention trial and randomly assigned to 4 study groups. The combination of Vaccine Day and an educational campaign was most effective in increasing vaccine coverage (53% coverage; prevalence ratio [PR]=1.45; 95% confidence interval [CI]=1.24, 1.71, compared with 27% coverage in the control group). Vaccine Day alone was also effective (46% coverage; PR= 1.41; 95% CI=1.17, 1.71). The educational campaign alone was not effective in improving coverage levels (34% coverage; PR=1.18; 95% CI=0.93, 1.50). Influenza vaccine coverage of LTCF health care workers can be improved by providing free vaccinations at the worksite with a well-publicized Vaccine Day.
A Two-Phase Coverage-Enhancing Algorithm for Hybrid Wireless Sensor Networks
Zhang, Qingguo; Fok, Mable P.
2017-01-01
Providing field coverage is a key task in many sensor network applications. In certain scenarios, the sensor field may have coverage holes due to random initial deployment of sensors; thus, the desired level of coverage cannot be achieved. A hybrid wireless sensor network is a cost-effective solution to this problem, which is achieved by repositioning a portion of the mobile sensors in the network to meet the network coverage requirement. This paper investigates how to redeploy mobile sensor nodes to improve network coverage in hybrid wireless sensor networks. We propose a two-phase coverage-enhancing algorithm for hybrid wireless sensor networks. In phase one, we use a differential evolution algorithm to compute the candidate’s target positions in the mobile sensor nodes that could potentially improve coverage. In the second phase, we use an optimization scheme on the candidate’s target positions calculated from phase one to reduce the accumulated potential moving distance of mobile sensors, such that the exact mobile sensor nodes that need to be moved as well as their final target positions can be determined. Experimental results show that the proposed algorithm provided significant improvement in terms of area coverage rate, average moving distance, area coverage–distance rate and the number of moved mobile sensors, when compare with other approaches. PMID:28075365
Yang, Yang; Kramer, Christopher M.; Shaw, Peter W.; Meyer, Craig H.; Salerno, Michael
2015-01-01
Purpose To design and evaluate 2D L1-SPIRiT accelerated spiral pulse sequences for first-pass myocardial perfusion imaging with whole heart coverage capable of measuring 8 slices at 2 mm in-plane resolution at heart rates up to 125 beats per minute (BPM). Methods Combinations of 5 different spiral trajectories and 4 k-t sampling patterns were retrospectively simulated in 25 fully sampled datasets and reconstructed with L1-SPIRiT to determine the best combination of parameters. Two candidate sequences were prospectively evaluated in 34 human subjects to assess in-vivo performance. Results A dual density broad transition spiral trajectory with either angularly uniform or golden angle in time k-t sampling pattern had the largest structural similarity (SSIM) and smallest root mean square error (RMSE) from the retrospective simulation, and the L1-SPIRiT reconstruction had well-preserved temporal dynamics. In vivo data demonstrated that both of the sampling patterns could produce high quality perfusion images with whole-heart coverage. Conclusion First-pass myocardial perfusion imaging using accelerated spirals with optimized trajectory and k-t sampling pattern can produce high quality 2D-perfusion images with wholeheart coverage at the heart rates up to 125 BPM. PMID:26538511
Varela, Miguel A; Curtis, Helen J; Douglas, Andrew GL; Hammond, Suzan M; O'Loughlin, Aisling J; Sobrido, Maria J; Scholefield, Janine; Wood, Matthew JA
2016-01-01
Allele-specific gene therapy aims to silence expression of mutant alleles through targeting of disease-linked single-nucleotide polymorphisms (SNPs). However, SNP linkage to disease varies between populations, making such molecular therapies applicable only to a subset of patients. Moreover, not all SNPs have the molecular features necessary for potent gene silencing. Here we provide knowledge to allow the maximisation of patient coverage by building a comprehensive understanding of SNPs ranked according to their predicted suitability toward allele-specific silencing in 14 repeat expansion diseases: amyotrophic lateral sclerosis and frontotemporal dementia, dentatorubral-pallidoluysian atrophy, myotonic dystrophy 1, myotonic dystrophy 2, Huntington's disease and several spinocerebellar ataxias. Our systematic analysis of DNA sequence variation shows that most annotated SNPs are not suitable for potent allele-specific silencing across populations because of suboptimal sequence features and low variability (>97% in HD). We suggest maximising patient coverage by selecting SNPs with high heterozygosity across populations, and preferentially targeting SNPs that lead to purine:purine mismatches in wild-type alleles to obtain potent allele-specific silencing. We therefore provide fundamental knowledge on strategies for optimising patient coverage of therapeutics for microsatellite expansion disorders by linking analysis of population genetic variation to the selection of molecular targets. PMID:25990798
Varela, Miguel A; Curtis, Helen J; Douglas, Andrew G L; Hammond, Suzan M; O'Loughlin, Aisling J; Sobrido, Maria J; Scholefield, Janine; Wood, Matthew J A
2016-02-01
Allele-specific gene therapy aims to silence expression of mutant alleles through targeting of disease-linked single-nucleotide polymorphisms (SNPs). However, SNP linkage to disease varies between populations, making such molecular therapies applicable only to a subset of patients. Moreover, not all SNPs have the molecular features necessary for potent gene silencing. Here we provide knowledge to allow the maximisation of patient coverage by building a comprehensive understanding of SNPs ranked according to their predicted suitability toward allele-specific silencing in 14 repeat expansion diseases: amyotrophic lateral sclerosis and frontotemporal dementia, dentatorubral-pallidoluysian atrophy, myotonic dystrophy 1, myotonic dystrophy 2, Huntington's disease and several spinocerebellar ataxias. Our systematic analysis of DNA sequence variation shows that most annotated SNPs are not suitable for potent allele-specific silencing across populations because of suboptimal sequence features and low variability (>97% in HD). We suggest maximising patient coverage by selecting SNPs with high heterozygosity across populations, and preferentially targeting SNPs that lead to purine:purine mismatches in wild-type alleles to obtain potent allele-specific silencing. We therefore provide fundamental knowledge on strategies for optimising patient coverage of therapeutics for microsatellite expansion disorders by linking analysis of population genetic variation to the selection of molecular targets.
NASA Astrophysics Data System (ADS)
Lindner, Robert; Lou, Xinghua; Reinstein, Jochen; Shoeman, Robert L.; Hamprecht, Fred A.; Winkler, Andreas
2014-06-01
Hydrogen-deuterium exchange (HDX) experiments analyzed by mass spectrometry (MS) provide information about the dynamics and the solvent accessibility of protein backbone amide hydrogen atoms. Continuous improvement of MS instrumentation has contributed to the increasing popularity of this method; however, comprehensive automated data analysis is only beginning to mature. We present Hexicon 2, an automated pipeline for data analysis and visualization based on the previously published program Hexicon (Lou et al. 2010). Hexicon 2 employs the sensitive NITPICK peak detection algorithm of its predecessor in a divide-and-conquer strategy and adds new features, such as chromatogram alignment and improved peptide sequence assignment. The unique feature of deuteration distribution estimation was retained in Hexicon 2 and improved using an iterative deconvolution algorithm that is robust even to noisy data. In addition, Hexicon 2 provides a data browser that facilitates quality control and provides convenient access to common data visualization tasks. Analysis of a benchmark dataset demonstrates superior performance of Hexicon 2 compared with its predecessor in terms of deuteration centroid recovery and deuteration distribution estimation. Hexicon 2 greatly reduces data analysis time compared with manual analysis, whereas the increased number of peptides provides redundant coverage of the entire protein sequence. Hexicon 2 is a standalone application available free of charge under http://hx2.mpimf-heidelberg.mpg.de.
Lindner, Robert; Lou, Xinghua; Reinstein, Jochen; Shoeman, Robert L; Hamprecht, Fred A; Winkler, Andreas
2014-06-01
Hydrogen-deuterium exchange (HDX) experiments analyzed by mass spectrometry (MS) provide information about the dynamics and the solvent accessibility of protein backbone amide hydrogen atoms. Continuous improvement of MS instrumentation has contributed to the increasing popularity of this method; however, comprehensive automated data analysis is only beginning to mature. We present Hexicon 2, an automated pipeline for data analysis and visualization based on the previously published program Hexicon (Lou et al. 2010). Hexicon 2 employs the sensitive NITPICK peak detection algorithm of its predecessor in a divide-and-conquer strategy and adds new features, such as chromatogram alignment and improved peptide sequence assignment. The unique feature of deuteration distribution estimation was retained in Hexicon 2 and improved using an iterative deconvolution algorithm that is robust even to noisy data. In addition, Hexicon 2 provides a data browser that facilitates quality control and provides convenient access to common data visualization tasks. Analysis of a benchmark dataset demonstrates superior performance of Hexicon 2 compared with its predecessor in terms of deuteration centroid recovery and deuteration distribution estimation. Hexicon 2 greatly reduces data analysis time compared with manual analysis, whereas the increased number of peptides provides redundant coverage of the entire protein sequence. Hexicon 2 is a standalone application available free of charge under http://hx2.mpimf-heidelberg.mpg.de.
Reference-guided de novo assembly approach improves genome reconstruction for related species.
Lischer, Heidi E L; Shimizu, Kentaro K
2017-11-10
The development of next-generation sequencing has made it possible to sequence whole genomes at a relatively low cost. However, de novo genome assemblies remain challenging due to short read length, missing data, repetitive regions, polymorphisms and sequencing errors. As more and more genomes are sequenced, reference-guided assembly approaches can be used to assist the assembly process. However, previous methods mostly focused on the assembly of other genotypes within the same species. We adapted and extended a reference-guided de novo assembly approach, which enables the usage of a related reference sequence to guide the genome assembly. In order to compare and evaluate de novo and our reference-guided de novo assembly approaches, we used a simulated data set of a repetitive and heterozygotic plant genome. The extended reference-guided de novo assembly approach almost always outperforms the corresponding de novo assembly program even when a reference of a different species is used. Similar improvements can be observed in high and low coverage situations. In addition, we show that a single evaluation metric, like the widely used N50 length, is not enough to properly rate assemblies as it not always points to the best assembly evaluated with other criteria. Therefore, we used the summed z-scores of 36 different statistics to evaluate the assemblies. The combination of reference mapping and de novo assembly provides a powerful tool to improve genome reconstruction by integrating information of a related genome. Our extension of the reference-guided de novo assembly approach enables the application of this strategy not only within but also between related species. Finally, the evaluation of genome assemblies is often not straight forward, as the truth is not known. Thus one should always use a combination of evaluation metrics, which not only try to assess the continuity but also the accuracy of an assembly.
Zhu, Ying; Zhao, Rui; Piehowski, Paul D.; ...
2017-09-01
One of the greatest challenges for mass spectrometry (MS)-based proteomics is the limited ability to analyze small samples. Here in this study, we investigate the relative contributions of liquid chromatography (LC), MS instrumentation and data analysis methods with the aim of improving proteome coverage for sample sizes ranging from 0.5 ng to 50 ng. We show that the LC separations utilizing 30-μm-i.d. columns increase signal intensity by >3-fold relative to those using 75-μm-i.d. columns, leading to 32% increase in peptide identifications. The Orbitrap Fusion Lumos MS significantly boosted both sensitivity and sequencing speed relative to earlier generation Orbitraps (e.g., LTQ-Orbitrap),more » leading to a ~3-fold increase in peptide identifications and 1.7-fold increase in identified protein groups for 2 ng tryptic digests of the bacterium S. oneidensis. The Match Between Runs algorithm of open-source MaxQuant software further increased proteome coverage by ~95% for 0.5 ng samples and by ~42% for 2 ng samples. Using the best combination of the above variables, we were able to identify >3,000 proteins from 10 ng tryptic digests from both HeLa and THP-1 mammalian cell lines. We also identified >950 proteins from subnanogram archaeal/bacterial cocultures. Finally, the present ultrasensitive LC-MS platform achieves a level of proteome coverage not previously realized for ultra-small sample loadings, and is expected to facilitate the analysis of subnanogram samples, including single mammalian cells.« less
Memetic Algorithm-Based Multi-Objective Coverage Optimization for Wireless Sensor Networks
Chen, Zhi; Li, Shuai; Yue, Wenjing
2014-01-01
Maintaining effective coverage and extending the network lifetime as much as possible has become one of the most critical issues in the coverage of WSNs. In this paper, we propose a multi-objective coverage optimization algorithm for WSNs, namely MOCADMA, which models the coverage control of WSNs as the multi-objective optimization problem. MOCADMA uses a memetic algorithm with a dynamic local search strategy to optimize the coverage of WSNs and achieve the objectives such as high network coverage, effective node utilization and more residual energy. In MOCADMA, the alternative solutions are represented as the chromosomes in matrix form, and the optimal solutions are selected through numerous iterations of the evolution process, including selection, crossover, mutation, local enhancement, and fitness evaluation. The experiment and evaluation results show MOCADMA can have good capabilities in maintaining the sensing coverage, achieve higher network coverage while improving the energy efficiency and effectively prolonging the network lifetime, and have a significant improvement over some existing algorithms. PMID:25360579
Memetic algorithm-based multi-objective coverage optimization for wireless sensor networks.
Chen, Zhi; Li, Shuai; Yue, Wenjing
2014-10-30
Maintaining effective coverage and extending the network lifetime as much as possible has become one of the most critical issues in the coverage of WSNs. In this paper, we propose a multi-objective coverage optimization algorithm for WSNs, namely MOCADMA, which models the coverage control of WSNs as the multi-objective optimization problem. MOCADMA uses a memetic algorithm with a dynamic local search strategy to optimize the coverage of WSNs and achieve the objectives such as high network coverage, effective node utilization and more residual energy. In MOCADMA, the alternative solutions are represented as the chromosomes in matrix form, and the optimal solutions are selected through numerous iterations of the evolution process, including selection, crossover, mutation, local enhancement, and fitness evaluation. The experiment and evaluation results show MOCADMA can have good capabilities in maintaining the sensing coverage, achieve higher network coverage while improving the energy efficiency and effectively prolonging the network lifetime, and have a significant improvement over some existing algorithms.
Friesen, Valerie M; Aaron, Grant J; Myatt, Mark; Neufeld, Lynnette M
2017-05-01
Food fortification is a widely used approach to increase micronutrient intake in the diet. High coverage is essential for achieving impact. Data on coverage is limited in many countries, and tools to assess coverage of fortification programs have not been standardized. In 2013, the Global Alliance for Improved Nutrition developed the Fortification Assessment Coverage Toolkit (FACT) to carry out coverage assessments in both population-based (i.e., staple foods and/or condiments) and targeted (e.g., infant and young child) fortification programs. The toolkit was designed to generate evidence on program coverage and the use of fortified foods to provide timely and programmatically relevant information for decision making. This supplement presents results from FACT surveys that assessed the coverage of population-based and targeted food fortification programs across 14 countries. It then discusses the policy and program implications of the findings for the potential for impact and program improvement.
Conte, Matthew A; Gammerdinger, William J; Bartie, Kerry L; Penman, David J; Kocher, Thomas D
2017-05-02
Tilapias are the second most farmed fishes in the world and a sustainable source of food. Like many other fish, tilapias are sexually dimorphic and sex is a commercially important trait in these fish. In this study, we developed a significantly improved assembly of the tilapia genome using the latest genome sequencing methods and show how it improves the characterization of two sex determination regions in two tilapia species. A homozygous clonal XX female Nile tilapia (Oreochromis niloticus) was sequenced to 44X coverage using Pacific Biosciences (PacBio) SMRT sequencing. Dozens of candidate de novo assemblies were generated and an optimal assembly (contig NG50 of 3.3Mbp) was selected using principal component analysis of likelihood scores calculated from several paired-end sequencing libraries. Comparison of the new assembly to the previous O. niloticus genome assembly reveals that recently duplicated portions of the genome are now well represented. The overall number of genes in the new assembly increased by 27.3%, including a 67% increase in pseudogenes. The new tilapia genome assembly correctly represents two recent vasa gene duplication events that have been verified with BAC sequencing. At total of 146Mbp of additional transposable element sequence are now assembled, a large proportion of which are recent insertions. Large centromeric satellite repeats are assembled and annotated in cichlid fish for the first time. Finally, the new assembly identifies the long-range structure of both a ~9Mbp XY sex determination region on LG1 in O. niloticus, and a ~50Mbp WZ sex determination region on LG3 in the related species O. aureus. This study highlights the use of long read sequencing to correctly assemble recent duplications and to characterize repeat-filled regions of the genome. The study serves as an example of the need for high quality genome assemblies and provides a framework for identifying sex determining genes in tilapia and related fish species.
The role of the RAS pathway in iAMP21-ALL
Ryan, S L; Matheson, E; Grossmann, V; Sinclair, P; Bashton, M; Schwab, C; Towers, W; Partington, M; Elliott, A; Minto, L; Richardson, S; Rahman, T; Keavney, B; Skinner, R; Bown, N; Haferlach, T; Vandenberghe, P; Haferlach, C; Santibanez-Koref, M; Moorman, A V; Kohlmann, A; Irving, J A E; Harrison, C J
2016-01-01
Intrachromosomal amplification of chromosome 21 (iAMP21) identifies a high-risk subtype of acute lymphoblastic leukaemia (ALL), requiring intensive treatment to reduce their relapse risk. Improved understanding of the genomic landscape of iAMP21-ALL will ascertain whether these patients may benefit from targeted therapy. We performed whole-exome sequencing of eight iAMP21-ALL samples. The mutation rate was dramatically disparate between cases (average 24.9, range 5–51) and a large number of novel variants were identified, including frequent mutation of the RAS/MEK/ERK pathway. Targeted sequencing of a larger cohort revealed that 60% (25/42) of diagnostic iAMP21-ALL samples harboured 42 distinct RAS pathway mutations. High sequencing coverage demonstrated heterogeneity in the form of multiple RAS pathway mutations within the same sample and diverse variant allele frequencies (VAFs) (2–52%), similar to other subtypes of ALL. Constitutive RAS pathway activation was observed in iAMP21 samples that harboured mutations in the predominant clone (⩾35% VAF). Viable iAMP21 cells from primary xenografts showed reduced viability in response to the MEK1/2 inhibitor, selumetinib, in vitro. As clonal (⩾35% VAF) mutations were detected in 26% (11/42) of iAMP21-ALL, this evidence of response to RAS pathway inhibitors may offer the possibility to introduce targeted therapy to improve therapeutic efficacy in these high-risk patients. PMID:27168466
Gene expression distribution deconvolution in single-cell RNA sequencing.
Wang, Jingshu; Huang, Mo; Torre, Eduardo; Dueck, Hannah; Shaffer, Sydney; Murray, John; Raj, Arjun; Li, Mingyao; Zhang, Nancy R
2018-06-26
Single-cell RNA sequencing (scRNA-seq) enables the quantification of each gene's expression distribution across cells, thus allowing the assessment of the dispersion, nonzero fraction, and other aspects of its distribution beyond the mean. These statistical characterizations of the gene expression distribution are critical for understanding expression variation and for selecting marker genes for population heterogeneity. However, scRNA-seq data are noisy, with each cell typically sequenced at low coverage, thus making it difficult to infer properties of the gene expression distribution from raw counts. Based on a reexamination of nine public datasets, we propose a simple technical noise model for scRNA-seq data with unique molecular identifiers (UMI). We develop deconvolution of single-cell expression distribution (DESCEND), a method that deconvolves the true cross-cell gene expression distribution from observed scRNA-seq counts, leading to improved estimates of properties of the distribution such as dispersion and nonzero fraction. DESCEND can adjust for cell-level covariates such as cell size, cell cycle, and batch effects. DESCEND's noise model and estimation accuracy are further evaluated through comparisons to RNA FISH data, through data splitting and simulations and through its effectiveness in removing known batch effects. We demonstrate how DESCEND can clarify and improve downstream analyses such as finding differentially expressed genes, identifying cell types, and selecting differentiation markers. Copyright © 2018 the Author(s). Published by PNAS.
Feliubadaló, Lídia; Lopez-Doriga, Adriana; Castellsagué, Ester; del Valle, Jesús; Menéndez, Mireia; Tornero, Eva; Montes, Eva; Cuesta, Raquel; Gómez, Carolina; Campos, Olga; Pineda, Marta; González, Sara; Moreno, Victor; Brunet, Joan; Blanco, Ignacio; Serra, Eduard; Capellá, Gabriel; Lázaro, Conxi
2013-01-01
Next-generation sequencing (NGS) is changing genetic diagnosis due to its huge sequencing capacity and cost-effectiveness. The aim of this study was to develop an NGS-based workflow for routine diagnostics for hereditary breast and ovarian cancer syndrome (HBOCS), to improve genetic testing for BRCA1 and BRCA2. A NGS-based workflow was designed using BRCA MASTR kit amplicon libraries followed by GS Junior pyrosequencing. Data analysis combined Variant Identification Pipeline freely available software and ad hoc R scripts, including a cascade of filters to generate coverage and variant calling reports. A BRCA homopolymer assay was performed in parallel. A research scheme was designed in two parts. A Training Set of 28 DNA samples containing 23 unique pathogenic mutations and 213 other variants (33 unique) was used. The workflow was validated in a set of 14 samples from HBOCS families in parallel with the current diagnostic workflow (Validation Set). The NGS-based workflow developed permitted the identification of all pathogenic mutations and genetic variants, including those located in or close to homopolymers. The use of NGS for detecting copy-number alterations was also investigated. The workflow meets the sensitivity and specificity requirements for the genetic diagnosis of HBOCS and improves on the cost-effectiveness of current approaches. PMID:23249957
Non-ECG-gated unenhanced MRA of the carotids: optimization and clinical feasibility.
Raoult, H; Gauvrit, J Y; Schmitt, P; Le Couls, V; Bannier, E
2013-11-01
To optimise and assess the clinical feasibility of a carotid non-ECG-gated unenhanced MRA sequence. Sixteen healthy volunteers and 11 patients presenting with internal carotid artery (ICA) disease underwent large field-of-view balanced steady-state free precession (bSSFP) unenhanced MRA at 3T. Sampling schemes acquiring the k-space centre either early (kCE) or late (kCL) in the acquisition window were evaluated. Signal and image quality was scored in comparison to ECG-gated kCE unenhanced MRA and TOF. For patients, computed tomography angiography was used as the reference. In volunteers, kCE sampling yielded higher image quality than kCL and TOF, with fewer flow artefacts and improved signal homogeneity. kCE unenhanced MRA image quality was higher without ECG-gating. Arterial signal and artery/vein contrast were higher with both bSSFP sampling schemes than with TOF. The kCE sequence allowed correct quantification of ten significant stenoses, and it facilitated the identification of an infrapetrous dysplasia, which was outside of the TOF imaging coverage. Non-ECG-gated bSSFP carotid imaging offers high-quality images and is a promising sequence for carotid disease diagnosis in a short acquisition time with high spatial resolution and a large field of view. • Non-ECG-gated unenhanced bSSFP MRA offers high-quality imaging of the carotid arteries. • Sequences using early acquisition of the k-space centre achieve higher image quality. • Non-ECG-gated unenhanced bSSFP MRA allows quantification of significant carotid stenosis. • Short MR acquisition times and ungated sequences are helpful in clinical practice. • High 3D spatial resolution and a large field of view improve diagnostic performance.
Correlation between measles vaccine doses: implications for the maintenance of elimination.
McKee, A; Ferrari, M J; Shea, K
2018-03-01
Measles eradication efforts have been successful at achieving elimination in many countries worldwide. Such countries actively work to maintain this elimination by continuing to improve coverage of two routine doses of measles vaccine following measles elimination. While improving measles vaccine coverage is always beneficial, we show, using a steady-state analysis of a dynamical model, that the correlation between populations receiving the first and second routine dose also has a significant impact on the population immunity achieved by a specified combination of first and second dose coverage. If the second dose is administered to people independently of whether they had the first dose, high second-dose coverage improves the proportion of the population receiving at least one dose, and will have a large effect on population immunity. If the second dose is administered only to people who have had the first dose, high second-dose coverage reduces the rate of primary vaccine failure, but does not reach people who missed the first dose; this will therefore have a relatively small effect on population immunity. When doses are administered dependently, and assuming the first dose has higher coverage, increasing the coverage of the first dose has a larger impact on population immunity than does increasing the coverage of the second. Correlation between vaccine doses has a significant impact on the level of population immunity maintained by current vaccination coverage, potentially outweighing the effects of age structure and, in some cases, recent improvements in vaccine coverage. It is therefore important to understand the correlation between vaccine doses as such correlation may have a large impact on the effectiveness of measles vaccination strategies.
Evaluation of a vaccine passport to improve vaccine coverage in people living with HIV.
Chadwick, D R; Corbett, K; Mann, S; Teruzzi, B; Horner, S
2018-01-01
An increased risk of vaccine-preventable infections (VPIs) is seen in people living with HIV (PLWH), and current vaccine coverage and immunity is variable. Vaccine passports have the potential to improve vaccine coverage. The objective was to assess how successful a vaccine passport was in improving vaccine coverage in PLWH. Baseline immunity to VPIs was established in PLWH attending a single HIV clinic and vaccinations required were determined based on the BHIVA Vaccination Guidelines (2015). The passport was completed and the PLWH informed about additional vaccines they should obtain from primary care. After 6-9 months the passport was reviewed including confirmation if vaccines were given. PLWH satisfaction with the system was evaluated by a survey. Seventy-three PLWH provided sufficient data for analysis. At baseline significant proportions of PLWH were not immune/unvaccinated to the main VPIs, especially human papillomavirus, pneumococcus and measles. After the passport was applied immunity improved significantly (56% overall, p < 0.01) for most VPIs; however, full coverage was not achieved. The system was popular with PLWH. The passport was successful in increasing vaccination coverage although full or near-full coverage was not achieved. A more successful service would probably be achieved by commissioning English HIV clinics to provide all vaccines.
Targeting a Complex Transcriptome: The Construction of the Mouse Full-Length cDNA Encyclopedia
Carninci, Piero; Waki, Kazunori; Shiraki, Toshiyuki; Konno, Hideaki; Shibata, Kazuhiro; Itoh, Masayoshi; Aizawa, Katsunori; Arakawa, Takahiro; Ishii, Yoshiyuki; Sasaki, Daisuke; Bono, Hidemasa; Kondo, Shinji; Sugahara, Yuichi; Saito, Rintaro; Osato, Naoki; Fukuda, Shiro; Sato, Kenjiro; Watahiki, Akira; Hirozane-Kishikawa, Tomoko; Nakamura, Mari; Shibata, Yuko; Yasunishi, Ayako; Kikuchi, Noriko; Yoshiki, Atsushi; Kusakabe, Moriaki; Gustincich, Stefano; Beisel, Kirk; Pavan, William; Aidinis, Vassilis; Nakagawara, Akira; Held, William A.; Iwata, Hiroo; Kono, Tomohiro; Nakauchi, Hiromitsu; Lyons, Paul; Wells, Christine; Hume, David A.; Fagiolini, Michela; Hensch, Takao K.; Brinkmeier, Michelle; Camper, Sally; Hirota, Junji; Mombaerts, Peter; Muramatsu, Masami; Okazaki, Yasushi; Kawai, Jun; Hayashizaki, Yoshihide
2003-01-01
We report the construction of the mouse full-length cDNA encyclopedia,the most extensive view of a complex transcriptome,on the basis of preparing and sequencing 246 libraries. Before cloning,cDNAs were enriched in full-length by Cap-Trapper,and in most cases,aggressively subtracted/normalized. We have produced 1,442,236 successful 3′-end sequences clustered into 171,144 groups, from which 60,770 clones were fully sequenced cDNAs annotated in the FANTOM-2 annotation. We have also produced 547,149 5′ end reads,which clustered into 124,258 groups. Altogether, these cDNAs were further grouped in 70,000 transcriptional units (TU),which represent the best coverage of a transcriptome so far. By monitoring the extent of normalization/subtraction, we define the tentative equivalent coverage (TEC),which was estimated to be equivalent to >12,000,000 ESTs derived from standard libraries. High coverage explains discrepancies between the very large numbers of clusters (and TUs) of this project,which also include non-protein-coding RNAs,and the lower gene number estimation of genome annotations. Altogether,5′-end clusters identify regions that are potential promoters for 8637 known genes and 5′-end clusters suggest the presence of almost 63,000 transcriptional starting points. An estimate of the frequency of polyadenylation signals suggests that at least half of the singletons in the EST set represent real mRNAs. Clones accounting for about half of the predicted TUs await further sequencing. The continued high-discovery rate suggests that the task of transcriptome discovery is not yet complete. PMID:12819125
High-Throughput Next-Generation Sequencing of Polioviruses
Montmayeur, Anna M.; Schmidt, Alexander; Zhao, Kun; Magaña, Laura; Iber, Jane; Castro, Christina J.; Chen, Qi; Henderson, Elizabeth; Ramos, Edward; Shaw, Jing; Tatusov, Roman L.; Dybdahl-Sissoko, Naomi; Endegue-Zanga, Marie Claire; Adeniji, Johnson A.; Oberste, M. Steven; Burns, Cara C.
2016-01-01
ABSTRACT The poliovirus (PV) is currently targeted for worldwide eradication and containment. Sanger-based sequencing of the viral protein 1 (VP1) capsid region is currently the standard method for PV surveillance. However, the whole-genome sequence is sometimes needed for higher resolution global surveillance. In this study, we optimized whole-genome sequencing protocols for poliovirus isolates and FTA cards using next-generation sequencing (NGS), aiming for high sequence coverage, efficiency, and throughput. We found that DNase treatment of poliovirus RNA followed by random reverse transcription (RT), amplification, and the use of the Nextera XT DNA library preparation kit produced significantly better results than other preparations. The average viral reads per total reads, a measurement of efficiency, was as high as 84.2% ± 15.6%. PV genomes covering >99 to 100% of the reference length were obtained and validated with Sanger sequencing. A total of 52 PV genomes were generated, multiplexing as many as 64 samples in a single Illumina MiSeq run. This high-throughput, sequence-independent NGS approach facilitated the detection of a diverse range of PVs, especially for those in vaccine-derived polioviruses (VDPV), circulating VDPV, or immunodeficiency-related VDPV. In contrast to results from previous studies on other viruses, our results showed that filtration and nuclease treatment did not discernibly increase the sequencing efficiency of PV isolates. However, DNase treatment after nucleic acid extraction to remove host DNA significantly improved the sequencing results. This NGS method has been successfully implemented to generate PV genomes for molecular epidemiology of the most recent PV isolates. Additionally, the ability to obtain full PV genomes from FTA cards will aid in facilitating global poliovirus surveillance. PMID:27927929
The Genome Sequence of Taurine Cattle: A Window to Ruminant Biology and Evolution
USDA-ARS?s Scientific Manuscript database
As a major step toward understanding the biology and evolution of ruminants, the cattle genome was sequenced to ~7x coverage using a combined whole genome shotgun and BAC skim approach. The cattle genome contains a minimum of 22,000 genes, with a core set of 14,345 orthologs found in seven mammalian...
Len Gen: The international lentil genome sequencing project
USDA-ARS?s Scientific Manuscript database
We have been sequencing CDC Redberry using NGS of paired-end and mate-pair libraries over a wide range of sizes and technologies. The most recent draft (v0.7) of approximately 150x coverage produced scaffolds covering over half the genome (2.7 Gb of the expected 4.3 Gb). Long reads from PacBio sequ...
USDA-ARS?s Scientific Manuscript database
The mitochondrial genome of the bollworm, Helicoverpa zea, was assembled using paired-end nucleotide sequence reads generated with a next-generation sequencing platform. Assembly resulted in a mitogenome of 15,348 bp with greater than 17,000-fold average coverage. Organization of the H. zea mitogen...
USDA-ARS?s Scientific Manuscript database
Watermelon (Citrullus lanatus var. lanatus) is an important vegetable fruit throughout the world. A high number of single nucleotide polymorphism (SNP) and simple sequence repeat (SSR) markers should provide large coverage of the watermelon genome and high phylogenetic resolution of germplasm acces...
GapFiller: a de novo assembly approach to fill the gap within paired reads
2012-01-01
Background Next Generation Sequencing technologies are able to provide high genome coverages at a relatively low cost. However, due to limited reads' length (from 30 bp up to 200 bp), specific bioinformatics problems have become even more difficult to solve. De novo assembly with short reads, for example, is more complicated at least for two reasons: first, the overall amount of "noisy" data to cope with increased and, second, as the reads' length decreases the number of unsolvable repeats grows. Our work's aim is to go at the root of the problem by providing a pre-processing tool capable to produce (in-silico) longer and highly accurate sequences from a collection of Next Generation Sequencing reads. Results In this paper a seed-and-extend local assembler is presented. The kernel algorithm is a loop that, starting from a read used as seed, keeps extending it using heuristics whose main goal is to produce a collection of error-free and longer sequences. In particular, GapFiller carefully detects reliable overlaps and operates clustering similar reads in order to reconstruct the missing part between the two ends of the same insert. Our tool's output has been validated on 24 experiments using both simulated and real paired reads datasets. The output sequences are declared correct when the seed-mate is found. In the experiments performed, GapFiller was able to extend high percentages of the processed seeds and find their mates, with a false positives rate that turned out to be nearly negligible. Conclusions GapFiller, starting from a sufficiently high short reads coverage, is able to produce high coverages of accurate longer sequences (from 300 bp up to 3500 bp). The procedure to perform safe extensions, together with the mate-found check, turned out to be a powerful criterion to guarantee contigs' correctness. GapFiller has further potential, as it could be applied in a number of different scenarios, including the post-processing validation of insertions/deletions detection pipelines, pre-processing routines on datasets for de novo assembly pipelines, or in any hierarchical approach designed to assemble, analyse or validate pools of sequences. PMID:23095524
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sherertz, T; Ellis, R; Colussi, V
2014-06-15
Purpose: To evaluate volumetric coverage of a Mick Radionuclear titanium Split-Ring applicator (SRA) with/without interstitial needle compared to an intracavitary Vienna applicator (VA), interstitial-intracavitary VA, and intracavitary ring and tandem applicator (RTA). Methods: A 57 year-old female with FIGO stage IIB cervical carcinoma was treated following chemoradiotherapy (45Gy pelvic and 5.4Gy parametrial boost) with highdose- rate (HDR) brachytherapy to 30Gy in 5 fractions using a SRA. A single interstitial needle was placed using the Ellis Interstitial Cap for the final three fractions to increase coverage of left-sided gross residual disease identified on 3T-MRI. High-risk (HR) clinical target volume (CTV) andmore » intermediate-risk (IR) CTV were defined using axial T2-weighted 2D and 3D MRI sequences (Philips PET/MRI unit). Organs-at-risks (OARs) were delineated on CT. Oncentra planning system was used for treatment optimization satisfying GEC-ESTRO guidelines for target coverage and OAR constraints. Retrospectively, treatment plans (additional 20 plans) were simulated using intracavitary SRA (without needle), intracavitary VA (without needle), interstitial-intracavitary VA, and intracavitary RTA with this same patient case. Plans were optimized for each fraction to maintain coverage to HR-CTV. Results: Interstitial-intracavitary SRA achieved the following combined coverage for external radiation and brachytherapy (EQD2): D90 HR-CTV =94.6Gy; Bladder-2cc =88.9Gy; Rectum-2cc =65.1Gy; Sigmoid-2cc =48.9Gy; Left vaginal wall (VW) =103Gy, Right VW =99.2Gy. Interstitial-intracavitary VA was able to achieve identical D90 HR-CTV =94.6Gy, yet Bladder-2cc =91.9Gy (exceeding GEC-ESTRO recommendations of 2cc<90Gy) and Left VW =120.8Gy and Right VW =115.5Gy. Neither the SRA nor VA without interstitial needle could cover HR-CTV adequately without exceeding dose to Bladder-2cc. Conventional RTA was unable to achieve target coverage for the HR-CTV >80Gy without severely overdosing OARs. Conclusion: The Ellis Interstitial Cap for the SRA offered superior dosimetric coverage as compared to the interstitialintracavitary VA. This represents the first reported use for this devise, and further investigation is warranted.« less
Improving care for patients on antiretroviral therapy through a gap analysis framework.
Massoud, M Rashad; Shakir, Fazila; Livesley, Nigel; Muhire, Martin; Nabwire, Juliana; Ottosson, Amanda; Jean-Baptiste, Rachel; Megere, Humphrey; Karamagi-Nkolo, Esther; Gaudreault, Suzanne; Marks, Pamela; Jennings, Larissa
2015-07-01
To improve quality of care through decreasing existing gaps in the areas of coverage, retention, and wellness of patients receiving HIV care and treatment. The antiretroviral therapy (ART) Framework utilizes improvement methods and the Chronic Care Model to address the coverage, retention, and wellness gaps in HIV care and treatment. This is a time-series study. The ART Framework was applied in five health centers in Buikwe District, Uganda. Quality improvement teams, consisting of healthcare workers and expert patients, were established in each of the five healthcare facilities. The intervention period was October 2010 to September 2012. It consisted of quality improvement teams analyzing their facility and systems of care from the perspective of the Chronic Care Model to identify areas of improvement. They implemented the ART Framework, collected data and assessed outcomes, focused on self-management support for patients, to improve coverage, retention, and wellness gaps in HIV care and treatment. Coverage was defined as every patient who needs ART in the catchment area, receives it. Retention was defined as every patient who receives ART stays on ART, and wellness defined as having a positive clinical, immunological, and/or virological response to treatment without intolerable or unmanageable side-effects. Results from Buikwe show the gaps in coverage, retention, and wellness greatly decreased a gap in coverage of 44-19%, gap in retention of 49-24%, and gap in wellness of 53-14% during a 2-year intervention period. The ART Framework is an innovative and practical tool for HIV program managers to improve HIV care and treatment.
Combining multiple ChIP-seq peak detection systems using combinatorial fusion.
Schweikert, Christina; Brown, Stuart; Tang, Zuojian; Smith, Phillip R; Hsu, D Frank
2012-01-01
Due to the recent rapid development in ChIP-seq technologies, which uses high-throughput next-generation DNA sequencing to identify the targets of Chromatin Immunoprecipitation, there is an increasing amount of sequencing data being generated that provides us with greater opportunity to analyze genome-wide protein-DNA interactions. In particular, we are interested in evaluating and enhancing computational and statistical techniques for locating protein binding sites. Many peak detection systems have been developed; in this study, we utilize the following six: CisGenome, MACS, PeakSeq, QuEST, SISSRs, and TRLocator. We define two methods to merge and rescore the regions of two peak detection systems and analyze the performance based on average precision and coverage of transcription start sites. The results indicate that ChIP-seq peak detection can be improved by fusion using score or rank combination. Our method of combination and fusion analysis would provide a means for generic assessment of available technologies and systems and assist researchers in choosing an appropriate system (or fusion method) for analyzing ChIP-seq data. This analysis offers an alternate approach for increasing true positive rates, while decreasing false positive rates and hence improving the ChIP-seq peak identification process.
Whole-Genome Sequencing and Assembly with High-Throughput, Short-Read Technologies
Sundquist, Andreas; Ronaghi, Mostafa; Tang, Haixu; Pevzner, Pavel; Batzoglou, Serafim
2007-01-01
While recently developed short-read sequencing technologies may dramatically reduce the sequencing cost and eventually achieve the $1000 goal for re-sequencing, their limitations prevent the de novo sequencing of eukaryotic genomes with the standard shotgun sequencing protocol. We present SHRAP (SHort Read Assembly Protocol), a sequencing protocol and assembly methodology that utilizes high-throughput short-read technologies. We describe a variation on hierarchical sequencing with two crucial differences: (1) we select a clone library from the genome randomly rather than as a tiling path and (2) we sample clones from the genome at high coverage and reads from the clones at low coverage. We assume that 200 bp read lengths with a 1% error rate and inexpensive random fragment cloning on whole mammalian genomes is feasible. Our assembly methodology is based on first ordering the clones and subsequently performing read assembly in three stages: (1) local assemblies of regions significantly smaller than a clone size, (2) clone-sized assemblies of the results of stage 1, and (3) chromosome-sized assemblies. By aggressively localizing the assembly problem during the first stage, our method succeeds in assembling short, unpaired reads sampled from repetitive genomes. We tested our assembler using simulated reads from D. melanogaster and human chromosomes 1, 11, and 21, and produced assemblies with large sets of contiguous sequence and a misassembly rate comparable to other draft assemblies. Tested on D. melanogaster and the entire human genome, our clone-ordering method produces accurate maps, thereby localizing fragment assembly and enabling the parallelization of the subsequent steps of our pipeline. Thus, we have demonstrated that truly inexpensive de novo sequencing of mammalian genomes will soon be possible with high-throughput, short-read technologies using our methodology. PMID:17534434
2011-01-01
Background BAC-based physical maps provide for sequencing across an entire genome or a selected sub-genomic region of biological interest. Such a region can be approached with next-generation whole-genome sequencing and assembly as if it were an independent small genome. Using the minimum tiling path as a guide, specific BAC clones representing the prioritized genomic interval are selected, pooled, and used to prepare a sequencing library. Results This pooled BAC approach was taken to sequence and assemble a QTL-rich region, of ~3 Mbp and represented by twenty-seven BACs, on linkage group 5 of the Theobroma cacao cv. Matina 1-6 genome. Using various mixtures of read coverages from paired-end and linear 454 libraries, multiple assemblies of varied quality were generated. Quality was assessed by comparing the assembly of 454 reads with a subset of ten BACs individually sequenced and assembled using Sanger reads. A mixture of reads optimal for assembly was identified. We found, furthermore, that a quality assembly suitable for serving as a reference genome template could be obtained even with a reduced depth of sequencing coverage. Annotation of the resulting assembly revealed several genes potentially responsible for three T. cacao traits: black pod disease resistance, bean shape index, and pod weight. Conclusions Our results, as with other pooled BAC sequencing reports, suggest that pooling portions of a minimum tiling path derived from a BAC-based physical map is an effective method to target sub-genomic regions for sequencing. While we focused on a single QTL region, other QTL regions of importance could be similarly sequenced allowing for biological discovery to take place before a high quality whole-genome assembly is completed. PMID:21794110
Feltus, Frank A; Saski, Christopher A; Mockaitis, Keithanne; Haiminen, Niina; Parida, Laxmi; Smith, Zachary; Ford, James; Staton, Margaret E; Ficklin, Stephen P; Blackmon, Barbara P; Cheng, Chun-Huai; Schnell, Raymond J; Kuhn, David N; Motamayor, Juan-Carlos
2011-07-27
BAC-based physical maps provide for sequencing across an entire genome or a selected sub-genomic region of biological interest. Such a region can be approached with next-generation whole-genome sequencing and assembly as if it were an independent small genome. Using the minimum tiling path as a guide, specific BAC clones representing the prioritized genomic interval are selected, pooled, and used to prepare a sequencing library. This pooled BAC approach was taken to sequence and assemble a QTL-rich region, of ~3 Mbp and represented by twenty-seven BACs, on linkage group 5 of the Theobroma cacao cv. Matina 1-6 genome. Using various mixtures of read coverages from paired-end and linear 454 libraries, multiple assemblies of varied quality were generated. Quality was assessed by comparing the assembly of 454 reads with a subset of ten BACs individually sequenced and assembled using Sanger reads. A mixture of reads optimal for assembly was identified. We found, furthermore, that a quality assembly suitable for serving as a reference genome template could be obtained even with a reduced depth of sequencing coverage. Annotation of the resulting assembly revealed several genes potentially responsible for three T. cacao traits: black pod disease resistance, bean shape index, and pod weight. Our results, as with other pooled BAC sequencing reports, suggest that pooling portions of a minimum tiling path derived from a BAC-based physical map is an effective method to target sub-genomic regions for sequencing. While we focused on a single QTL region, other QTL regions of importance could be similarly sequenced allowing for biological discovery to take place before a high quality whole-genome assembly is completed.
Doring, Martin
2005-12-01
This article deals with the cultural framing of the near sequencing of the human genome and its impact on the media coverage in Germany. It investigates in particular the way in which the weekly journal Die Zeit and the daily newspaper Frankfurter Rundschau reported this media event and its aftermath between June 2000 and June 2001. Both newspapers are quality papers that played an essential role in framing the human genome debate--alongside the Frankfurter Allgemeine Zeitung--which became the most prominent genomic forum. The decoding of the human genome prompted a huge controversy concerning the ethics of human engineering, research on stem cells and Preimplantation Genetic Diagnosis. The main aim of this article is to show how this controversy was structured by metaphor. The media coverage of the genome generated DNA-factishes--a neologism designating the ambivalence of something as fact (fait) and as a fetish (fetiche)--that mostly propagated images of a new DNA-scienticism or biological determinism. Mediated by cultural experiences, the human genome became a highly artificial and social construct of a 'NatureCulture'.
Snyder-Mackler, Noah; Majoros, William H.; Yuan, Michael L.; Shaver, Amanda O.; Gordon, Jacob B.; Kopp, Gisela H.; Schlebusch, Stephen A.; Wall, Jeffrey D.; Alberts, Susan C.; Mukherjee, Sayan; Zhou, Xiang; Tung, Jenny
2016-01-01
Research on the genetics of natural populations was revolutionized in the 1990s by methods for genotyping noninvasively collected samples. However, these methods have remained largely unchanged for the past 20 years and lag far behind the genomics era. To close this gap, here we report an optimized laboratory protocol for genome-wide capture of endogenous DNA from noninvasively collected samples, coupled with a novel computational approach to reconstruct pedigree links from the resulting low-coverage data. We validated both methods using fecal samples from 62 wild baboons, including 48 from an independently constructed extended pedigree. We enriched fecal-derived DNA samples up to 40-fold for endogenous baboon DNA and reconstructed near-perfect pedigree relationships even with extremely low-coverage sequencing. We anticipate that these methods will be broadly applicable to the many research systems for which only noninvasive samples are available. The lab protocol and software (“WHODAD”) are freely available at www.tung-lab.org/protocols-and-software.html and www.xzlab.org/software.html, respectively. PMID:27098910
Shifted Transversal Design smart-pooling for high coverage interactome mapping
Xin, Xiaofeng; Rual, Jean-François; Hirozane-Kishikawa, Tomoko; Hill, David E.; Vidal, Marc; Boone, Charles; Thierry-Mieg, Nicolas
2009-01-01
“Smart-pooling,” in which test reagents are multiplexed in a highly redundant manner, is a promising strategy for achieving high efficiency, sensitivity, and specificity in systems-level projects. However, previous applications relied on low redundancy designs that do not leverage the full potential of smart-pooling, and more powerful theoretical constructions, such as the Shifted Transversal Design (STD), lack experimental validation. Here we evaluate STD smart-pooling in yeast two-hybrid (Y2H) interactome mapping. We employed two STD designs and two established methods to perform ORFeome-wide Y2H screens with 12 baits. We found that STD pooling achieves similar levels of sensitivity and specificity as one-on-one array-based Y2H, while the costs and workloads are divided by three. The screening-sequencing approach is the most cost- and labor-efficient, yet STD identifies about twofold more interactions. Screening-sequencing remains an appropriate method for quickly producing low-coverage interactomes, while STD pooling appears as the method of choice for obtaining maps with higher coverage. PMID:19447967
Supply, Philip; Marceau, Michael; Mangenot, Sophie; Roche, David; Rouanet, Carine; Khanna, Varun; Majlessi, Laleh; Criscuolo, Alexis; Tap, Julien; Pawlik, Alexandre; Fiette, Laurence; Orgeur, Mickael; Fabre, Michel; Parmentier, Cécile; Frigui, Wafa; Simeone, Roxane; Boritsch, Eva C.; Debrie, Anne-Sophie; Willery, Eve; Walker, Danielle; Quail, Michael A.; Ma, Laurence; Bouchier, Christiane; Salvignol, Grégory; Sayes, Fadel; Cascioferro, Alessandro; Seemann, Torsten; Barbe, Valérie; Locht, Camille; Gutierrez, Maria-Cristina; Leclerc, Claude; Bentley, Stephen; Stinear, Timothy P.; Brisse, Sylvain; Médigue, Claudine; Parkhill, Julian; Cruveiller, Stéphane; Brosch, Roland
2013-01-01
Global spread and genetic monomorphism are hallmarks of Mycobacterium tuberculosis, the agent of human tuberculosis. In contrast, Mycobacterium canettii, and related tubercle bacilli that also cause human tuberculosis and exhibit unusual smooth colony morphology, are restricted to East-Africa. Here, we sequenced and analyzed the genomes of five representative strains of smooth tubercle bacilli (STB) using Sanger (4-5x coverage), 454/Roche (13-18x coverage) and/or Illumina DNA sequencing (45-105x coverage). We show that STB are highly recombinogenic and evolutionary early-branching, with larger genome sizes, 25-fold more SNPs, fewer molecular scars and distinct CRISPR-Cas systems relative to M. tuberculosis. Despite the differences, all tuberculosis-causing mycobacteria share a highly conserved core genome. Mouse-infection experiments revealed that STB are less persistent and virulent than M. tuberculosis. We conclude that M. tuberculosis emerged from an ancestral, STB-like pool of mycobacteria by gain of persistence and virulence mechanisms and we provide genome-wide insights into the molecular events involved. PMID:23291586
Staton, Margaret; Best, Teodora; Khodwekar, Sudhir; Owusu, Sandra; Xu, Tao; Xu, Yi; Jennings, Tara; Cronn, Richard; Arumuganathan, A. Kathiravetpilla; Coggeshall, Mark; Gailing, Oliver; Liang, Haiying; Romero-Severson, Jeanne; Schlarbaum, Scott; Carlson, John E.
2015-01-01
Forest health issues are on the rise in the United States, resulting from introduction of alien pests and diseases, coupled with abiotic stresses related to climate change. Increasingly, forest scientists are finding genetic/genomic resources valuable in addressing forest health issues. For a set of ten ecologically and economically important native hardwood tree species representing a broad phylogenetic spectrum, we used low coverage whole genome sequencing from multiplex Illumina paired ends to economically profile their genomic content. For six species, the genome content was further analyzed by flow cytometry in order to determine the nuclear genome size. Sequencing yielded a depth of 0.8X to 7.5X, from which in silico analysis yielded preliminary estimates of gene and repetitive sequence content in the genome for each species. Thousands of genomic SSRs were identified, with a clear predisposition toward dinucleotide repeats and AT-rich repeat motifs. Flanking primers were designed for SSR loci for all ten species, ranging from 891 loci in sugar maple to 18,167 in redbay. In summary, we have demonstrated that useful preliminary genome information including repeat content, gene content and useful SSR markers can be obtained at low cost and time input from a single lane of Illumina multiplex sequence. PMID:26698853
Jiang, Yue; Turinsky, Andrei L.; Brudno, Michael
2015-01-01
With the development of High-Throughput Sequencing (HTS) thousands of human genomes have now been sequenced. Whenever different studies analyze the same genome they usually agree on the amount of single-nucleotide polymorphisms, but differ dramatically on the number of insertion and deletion variants (indels). Furthermore, there is evidence that indels are often severely under-reported. In this manuscript we derive the total number of indel variants in a human genome by combining data from different sequencing technologies, while assessing the indel detection accuracy. Our estimate of approximately 1 million indels in a Yoruban genome is much higher than the results reported in several recent HTS studies. We identify two key sources of difficulties in indel detection: the insufficient coverage, read length or alignment quality; and the presence of repeats, including short interspersed elements and homopolymers/dimers. We quantify the effect of these factors on indel detection. The quality of sequencing data plays a major role in improving indel detection by HTS methods. However, many indels exist in long homopolymers and repeats, where their detection is severely impeded. The true number of indel events is likely even higher than our current estimates, and new techniques and technologies will be required to detect them. PMID:26130710
Expanding probe repertoire and improving reproducibility in human genomic hybridization
Dorman, Stephanie N.; Shirley, Ben C.; Knoll, Joan H. M.; Rogan, Peter K.
2013-01-01
Diagnostic DNA hybridization relies on probes composed of single copy (sc) genomic sequences. Sc sequences in probe design ensure high specificity and avoid cross-hybridization to other regions of the genome, which could lead to ambiguous results that are difficult to interpret. We examine how the distribution and composition of repetitive sequences in the genome affects sc probe performance. A divide and conquer algorithm was implemented to design sc probes. With this approach, sc probes can include divergent repetitive elements, which hybridize to unique genomic targets under higher stringency experimental conditions. Genome-wide custom probe sets were created for fluorescent in situ hybridization (FISH) and microarray genomic hybridization. The scFISH probes were developed for detection of copy number changes within small tumour suppressor genes and oncogenes. The microarrays demonstrated increased reproducibility by eliminating cross-hybridization to repetitive sequences adjacent to probe targets. The genome-wide microarrays exhibited lower median coefficients of variation (17.8%) for two HapMap family trios. The coefficients of variations of commercial probes within 300 nt of a repetitive element were 48.3% higher than the nearest custom probe. Furthermore, the custom microarray called a chromosome 15q11.2q13 deletion more consistently. This method for sc probe design increases probe coverage for FISH and lowers variability in genomic microarrays. PMID:23376933
UV Decontamination of MDA Reagents for Single Cell Genomics
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lee, Janey; Tighe, Damon; Sczyrba, Alexander
2011-03-18
Single cell genomics, the amplification and sequencing of genomes from single cells, can provide a glimpse into the genetic make-up and thus life style of the vast majority of uncultured microbial cells, making it an immensely powerful and increasingly popular tool. This is accomplished by use of multiple displacement amplification (MDA), which can generate billions of copies of a single bacterial genome producing microgram-range DNA required for shotgun sequencing. Here, we address a key challenge inherent to this approach and propose a solution for the improved recovery of single cell genomes. While DNA-free reagents for the amplification of a singlemore » cell genome are a prerequisite for successful single cell sequencing and analysis, DNA contamination has been detected in various reagents, which poses a considerable challenge. Our study demonstrates the effect of UV irradiation in efficient elimination of exogenous contaminant DNA found in MDA reagents, while maintaining Phi29 activity. Consequently, we also find that increased UV exposure to Phi29 does not adversely affect genome coverage of MDA amplified single cells. While additional challenges in single cell genomics remain to be resolved, the proposed methodology is relatively quick and simple and we believe that its application will be of high value for future single cell sequencing projects.« less
High-Resolution Spatial Distribution and Estimation of Access to Improved Sanitation in Kenya.
Jia, Peng; Anderson, John D; Leitner, Michael; Rheingans, Richard
2016-01-01
Access to sanitation facilities is imperative in reducing the risk of multiple adverse health outcomes. A distinct disparity in sanitation exists among different wealth levels in many low-income countries, which may hinder the progress across each of the Millennium Development Goals. The surveyed households in 397 clusters from 2008-2009 Kenya Demographic and Health Surveys were divided into five wealth quintiles based on their national asset scores. A series of spatial analysis methods including excess risk, local spatial autocorrelation, and spatial interpolation were applied to observe disparities in coverage of improved sanitation among different wealth categories. The total number of the population with improved sanitation was estimated by interpolating, time-adjusting, and multiplying the surveyed coverage rates by high-resolution population grids. A comparison was then made with the annual estimates from United Nations Population Division and World Health Organization /United Nations Children's Fund Joint Monitoring Program for Water Supply and Sanitation. The Empirical Bayesian Kriging interpolation produced minimal root mean squared error for all clusters and five quintiles while predicting the raw and spatial coverage rates of improved sanitation. The coverage in southern regions was generally higher than in the north and east, and the coverage in the south decreased from Nairobi in all directions, while Nyanza and North Eastern Province had relatively poor coverage. The general clustering trend of high and low sanitation improvement among surveyed clusters was confirmed after spatial smoothing. There exists an apparent disparity in sanitation among different wealth categories across Kenya and spatially smoothed coverage rates resulted in a closer estimation of the available statistics than raw coverage rates. Future intervention activities need to be tailored for both different wealth categories and nationally where there are areas of greater needs when resources are limited.
Chopra, Mickey; Sharkey, Alyssa; Dalmiya, Nita; Anthony, David; Binkin, Nancy
2012-10-13
Implementation of innovative strategies to improve coverage of evidence-based interventions, especially in the most marginalised populations, is a key focus of policy makers and planners aiming to improve child survival, health, and nutrition. We present a three-step approach to improvement of the effective coverage of essential interventions. First, we identify four different intervention delivery channels--ie, clinical or curative, outreach, community-based preventive or promotional, and legislative or mass media. Second, we classify which interventions' deliveries can be improved or changed within their channel or by switching to another channel. Finally, we do a meta-review of both published and unpublished reviews to examine the evidence for a range of strategies designed to overcome supply and demand bottlenecks to effective coverage of interventions that improve child survival, health, and nutrition. Although knowledge gaps exist, several strategies show promise for improving coverage of effective interventions-and, in some cases, health outcomes in children-including expanded roles for lay health workers, task shifting, reduction of financial barriers, increases in human-resource availability and geographical access, and use of the private sector. Policy makers and planners should be informed of this evidence as they choose strategies in which to invest their scarce resources. Copyright © 2012 Elsevier Ltd. All rights reserved.
2011-01-01
Background Lupinus angustifolius L, also known as narrow-leafed lupin (NLL), is becoming an important grain legume crop that is valuable for sustainable farming and is becoming recognised as a potential human health food. Recent interest is being directed at NLL to improve grain production, disease and pest management and health benefits of the grain. However, studies have been hindered by a lack of extensive genomic resources for the species. Results A NLL BAC library was constructed consisting of 111,360 clones with an average insert size of 99.7 Kbp from cv Tanjil. The library has approximately 12 × genome coverage. Both ends of 9600 randomly selected BAC clones were sequenced to generate 13985 BAC end-sequences (BESs), covering approximately 1% of the NLL genome. These BESs permitted a preliminary characterisation of the NLL genome such as organisation and composition, with the BESs having approximately 39% G:C content, 16.6% repetitive DNA and 5.4% putative gene-encoding regions. From the BESs 9966 simple sequence repeat (SSR) motifs were identified and some of these are shown to be potential markers. Conclusions The NLL BAC library and BAC-end sequences are powerful resources for genetic and genomic research on lupin. These resources will provide a robust platform for future high-resolution mapping, map-based cloning, comparative genomics and assembly of whole-genome sequencing data for the species. PMID:22014081
Improve homology search sensitivity of PacBio data by correcting frameshifts.
Du, Nan; Sun, Yanni
2016-09-01
Single-molecule, real-time sequencing (SMRT) developed by Pacific BioSciences produces longer reads than secondary generation sequencing technologies such as Illumina. The long read length enables PacBio sequencing to close gaps in genome assembly, reveal structural variations, and identify gene isoforms with higher accuracy in transcriptomic sequencing. However, PacBio data has high sequencing error rate and most of the errors are insertion or deletion errors. During alignment-based homology search, insertion or deletion errors in genes will cause frameshifts and may only lead to marginal alignment scores and short alignments. As a result, it is hard to distinguish true alignments from random alignments and the ambiguity will incur errors in structural and functional annotation. Existing frameshift correction tools are designed for data with much lower error rate and are not optimized for PacBio data. As an increasing number of groups are using SMRT, there is an urgent need for dedicated homology search tools for PacBio data. In this work, we introduce Frame-Pro, a profile homology search tool for PacBio reads. Our tool corrects sequencing errors and also outputs the profile alignments of the corrected sequences against characterized protein families. We applied our tool to both simulated and real PacBio data. The results showed that our method enables more sensitive homology search, especially for PacBio data sets of low sequencing coverage. In addition, we can correct more errors when comparing with a popular error correction tool that does not rely on hybrid sequencing. The source code is freely available at https://sourceforge.net/projects/frame-pro/ yannisun@msu.edu. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Leung, Ross Ka-Kit; Dong, Zhi Qiang; Sa, Fei; Chong, Cheong Meng; Lei, Si Wan; Tsui, Stephen Kwok-Wing; Lee, Simon Ming-Yuen
2014-02-01
Minor variants have significant implications in quasispecies evolution, early cancer detection and non-invasive fetal genotyping but their accurate detection by next-generation sequencing (NGS) is hampered by sequencing errors. We generated sequencing data from mixtures at predetermined ratios in order to provide insight into sequencing errors and variations that can arise for which simulation cannot be performed. The information also enables better parameterization in depth of coverage, read quality and heterogeneity, library preparation techniques, technical repeatability for mathematical modeling, theory development and simulation experimental design. We devised minor variant authentication rules that achieved 100% accuracy in both testing and validation experiments. The rules are free from tedious inspection of alignment accuracy, sequencing read quality or errors introduced by homopolymers. The authentication processes only require minor variants to: (1) have minimum depth of coverage larger than 30; (2) be reported by (a) four or more variant callers, or (b) DiBayes or LoFreq, plus SNVer (or BWA when no results are returned by SNVer), and with the interassay coefficient of variation (CV) no larger than 0.1. Quantification accuracy undermined by sequencing errors could neither be overcome by ultra-deep sequencing, nor recruiting more variant callers to reach a consensus, such that consistent underestimation and overestimation (i.e. low CV) were observed. To accommodate stochastic error and adjust the observed ratio within a specified accuracy, we presented a proof of concept for the use of a double calibration curve for quantification, which provides an important reference towards potential industrial-scale fabrication of calibrants for NGS.
Luo, Chengwei; Tsementzi, Despina; Kyrpides, Nikos; Read, Timothy; Konstantinidis, Konstantinos T
2012-01-01
Next-generation sequencing (NGS) is commonly used in metagenomic studies of complex microbial communities but whether or not different NGS platforms recover the same diversity from a sample and their assembled sequences are of comparable quality remain unclear. We compared the two most frequently used platforms, the Roche 454 FLX Titanium and the Illumina Genome Analyzer (GA) II, on the same DNA sample obtained from a complex freshwater planktonic community. Despite the substantial differences in read length and sequencing protocols, the platforms provided a comparable view of the community sampled. For instance, derived assemblies overlapped in ~90% of their total sequences and in situ abundances of genes and genotypes (estimated based on sequence coverage) correlated highly between the two platforms (R(2)>0.9). Evaluation of base-call error, frameshift frequency, and contig length suggested that Illumina offered equivalent, if not better, assemblies than Roche 454. The results from metagenomic samples were further validated against DNA samples of eighteen isolate genomes, which showed a range of genome sizes and G+C% content. We also provide quantitative estimates of the errors in gene and contig sequences assembled from datasets characterized by different levels of complexity and G+C% content. For instance, we noted that homopolymer-associated, single-base errors affected ~1% of the protein sequences recovered in Illumina contigs of 10× coverage and 50% G+C; this frequency increased to ~3% when non-homopolymer errors were also considered. Collectively, our results should serve as a useful practical guide for choosing proper sampling strategies and data possessing protocols for future metagenomic studies.
Madi, Nada; Al-Nakib, Widad; Mustafa, Abu Salim; Habibi, Nazima
2018-03-01
A metagenomic approach based on target independent next-generation sequencing has become a known method for the detection of both known and novel viruses in clinical samples. This study aimed to use the metagenomic sequencing approach to characterize the viral diversity in respiratory samples from patients with respiratory tract infections. We have investigated 86 respiratory samples received from various hospitals in Kuwait between 2015 and 2016 for the diagnosis of respiratory tract infections. A metagenomic approach using the next-generation sequencer to characterize viruses was used. According to the metagenomic analysis, an average of 145, 019 reads were identified, and 2% of these reads were of viral origin. Also, metagenomic analysis of the viral sequences revealed many known respiratory viruses, which were detected in 30.2% of the clinical samples. Also, sequences of non-respiratory viruses were detected in 14% of the clinical samples, while sequences of non-human viruses were detected in 55.8% of the clinical samples. The average genome coverage of the viruses was 12% with the highest genome coverage of 99.2% for respiratory syncytial virus, and the lowest was 1% for torque teno midi virus 2. Our results showed 47.7% agreement between multiplex Real-Time PCR and metagenomics sequencing in the detection of respiratory viruses in the clinical samples. Though there are some difficulties in using this method to clinical samples such as specimen quality, these observations are indicative of the promising utility of the metagenomic sequencing approach for the identification of respiratory viruses in patients with respiratory tract infections. © 2017 Wiley Periodicals, Inc.
McMorrow, Stacey; Kenney, Genevieve M; Long, Sharon K; Goin, Dana E
2016-08-01
To assess the effects of past Medicaid eligibility expansions to parents on coverage, access to care, out-of-pocket (OOP) spending, and mental health outcomes, and consider implications for the Affordable Care Act (ACA) Medicaid expansion. Person-level data from the National Health Interview Survey (1998-2010) is used to measure insurance coverage and related outcomes for low-income parents. Using state identifiers available at the National Center for Health Statistics Research Data Center, we attach state Medicaid eligibility thresholds for parents collected from a variety of sources to NHIS observations. We use changes in the Medicaid eligibility threshold for parents within states over time to identify the effects of changes in eligibility on low-income parents. We find that expanding Medicaid eligibility increases insurance coverage, reduces unmet needs due to cost and OOP spending, and improves mental health status among low-income parents. Moreover, our findings suggest that uninsured populations in states not currently participating in the ACA Medicaid expansion would experience even larger improvements in coverage and related outcomes than those in participating states if they chose to expand eligibility. The ACA Medicaid expansion has the potential to improve a wide variety of coverage, access, financial, and health outcomes for uninsured parents in states that choose to expand coverage. © Health Research and Educational Trust.
Exposure-response relationship of neighbourhood sanitation and children's diarrhoea.
Jung, Youngmee Tiffany; Lou, Wendy; Cheng, Yu-Ling
2017-07-01
To assess the association of neighbourhood sanitation coverage with under-five children's diarrhoeal morbidity and to evaluate its exposure-response relationship. We used the Demographic and Health Surveys (DHS) of 29 developing countries in sub-Saharan Africa and South Asia, conducted between 2010 and 2014. The primary outcome was two-week incidence of diarrhoea in children under 5 years of age (N = 269014). We conducted three-level logistic regression analyses and applied cubic splines to assess the trend between neighbourhood-level coverage of improved household sanitation and diarrhoeal morbidity. A significant association between neighbourhood-level coverage of improved household sanitation and diarrhoeal morbidity (OR [95% CI] = 0.68 [0.62-0.76]) was found. Exposure-relationship analyses results showed improved sanitation coverage threshold at 0.6. We found marginal degree of association (OR [95% CI] = 0.82 [0.77-0.87]) below the threshold, which, beyond the threshold, sharply increased to OR of 0.44 (95% CI: 0.29-0.67) at sanitation coverage of 1 (i.e. neighbourhood-wide use of improved household sanitation). Similar exposure-response trends were identified for urban and rural subgroups. Our findings suggest that neighbourhood sanitation plays a key role in reducing diarrhoeal diseases and that increase in sanitation coverage may only have minimal impact on diarrhoeal illness, unless sufficiently high coverage is achieved. © 2017 John Wiley & Sons Ltd.
Measuring populations to improve vaccination coverage
NASA Astrophysics Data System (ADS)
Bharti, Nita; Djibo, Ali; Tatem, Andrew J.; Grenfell, Bryan T.; Ferrari, Matthew J.
2016-10-01
In low-income settings, vaccination campaigns supplement routine immunization but often fail to achieve coverage goals due to uncertainty about target population size and distribution. Accurate, updated estimates of target populations are rare but critical; short-term fluctuations can greatly impact population size and susceptibility. We use satellite imagery to quantify population fluctuations and the coverage achieved by a measles outbreak response vaccination campaign in urban Niger and compare campaign estimates to measurements from a post-campaign survey. Vaccine coverage was overestimated because the campaign underestimated resident numbers and seasonal migration further increased the target population. We combine satellite-derived measurements of fluctuations in population distribution with high-resolution measles case reports to develop a dynamic model that illustrates the potential improvement in vaccination campaign coverage if planners account for predictable population fluctuations. Satellite imagery can improve retrospective estimates of vaccination campaign impact and future campaign planning by synchronizing interventions with predictable population fluxes.
Measuring populations to improve vaccination coverage
Bharti, Nita; Djibo, Ali; Tatem, Andrew J.; Grenfell, Bryan T.; Ferrari, Matthew J.
2016-01-01
In low-income settings, vaccination campaigns supplement routine immunization but often fail to achieve coverage goals due to uncertainty about target population size and distribution. Accurate, updated estimates of target populations are rare but critical; short-term fluctuations can greatly impact population size and susceptibility. We use satellite imagery to quantify population fluctuations and the coverage achieved by a measles outbreak response vaccination campaign in urban Niger and compare campaign estimates to measurements from a post-campaign survey. Vaccine coverage was overestimated because the campaign underestimated resident numbers and seasonal migration further increased the target population. We combine satellite-derived measurements of fluctuations in population distribution with high-resolution measles case reports to develop a dynamic model that illustrates the potential improvement in vaccination campaign coverage if planners account for predictable population fluctuations. Satellite imagery can improve retrospective estimates of vaccination campaign impact and future campaign planning by synchronizing interventions with predictable population fluxes. PMID:27703191
PDBe: Protein Data Bank in Europe
Velankar, S.; Alhroub, Y.; Best, C.; Caboche, S.; Conroy, M. J.; Dana, J. M.; Fernandez Montecelo, M. A.; van Ginkel, G.; Golovin, A.; Gore, S. P.; Gutmanas, A.; Haslam, P.; Hendrickx, P. M. S.; Heuson, E.; Hirshberg, M.; John, M.; Lagerstedt, I.; Mir, S.; Newman, L. E.; Oldfield, T. J.; Patwardhan, A.; Rinaldi, L.; Sahni, G.; Sanz-García, E.; Sen, S.; Slowley, R.; Suarez-Uruena, A.; Swaminathan, G. J.; Symmons, M. F.; Vranken, W. F.; Wainwright, M.; Kleywegt, G. J.
2012-01-01
The Protein Data Bank in Europe (PDBe; pdbe.org) is a partner in the Worldwide PDB organization (wwPDB; wwpdb.org) and as such actively involved in managing the single global archive of biomacromolecular structure data, the PDB. In addition, PDBe develops tools, services and resources to make structure-related data more accessible to the biomedical community. Here we describe recently developed, extended or improved services, including an animated structure-presentation widget (PDBportfolio), a widget to graphically display the coverage of any UniProt sequence in the PDB (UniPDB), chemistry- and taxonomy-based PDB-archive browsers (PDBeXplore), and a tool for interactive visualization of NMR structures, corresponding experimental data as well as validation and analysis results (Vivaldi). PMID:22110033
Novel ITS1 Fungal Primers for Characterization of the Mycobiome
Usyk, Mykhaylo; Zolnik, Christine P.; Patel, Hitesh; Levi, Michael H.
2017-01-01
ABSTRACT Studies of the human microbiome frequently omit characterization of fungal communities (the mycobiome), which limits our ability to investigate how fungal communities influence human health. The internal transcribed spacer 1 (ITS1) region of the eukaryotic ribosomal cluster has features allowing for wide taxonomic coverage and has been recognized as a suitable barcode region for species-level identification of fungal organisms. We developed custom ITS1 primer sets using iterative alignment refinement. Primer performance was evaluated using in silico testing and experimental testing of fungal cultures and human samples. Using an expanded novel reference database, SIS (18S-ITS1-5.8S), the newly designed primers showed an average in silico taxonomic coverage of 79.9% ± 7.1% compared to a coverage of 44.6% ± 13.2% using previously published primers (P = 0.05). The newly described primer sets recovered an average of 21,830 ± 225 fungal reads from fungal isolate culture samples, whereas the previously published primers had an average of 3,305 ± 1,621 reads (P = 0.03). Of note was an increase in the taxonomic coverage of the Candida genus, which went from a mean coverage of 59.5% ± 13% to 100.0% ± 0.0% (P = 0.0015) comparing the previously described primers to the new primers, respectively. The newly developed ITS1 primer sets significantly improve general taxonomic coverage of fungal communities infecting humans and increased read depth by an order of magnitude over the best-performing published primer set tested. The overall best-performing primer pair in terms of taxonomic coverage and read recovery, ITS1-30F/ITS1-217R, will aid in advancing research in the area of the human mycobiome. IMPORTANCE The mycobiome constitutes all the fungal organisms within an environment or biological niche. The fungi are eukaryotes, are extremely heterogeneous, and include yeasts and molds that colonize humans as part of the microbiome. In addition, fungi can also infect humans and cause disease. Characterization of the bacterial component of the microbiome was revolutionized by 16S rRNA gene fragment amplification, next-generation sequencing technologies, and bioinformatics pipelines. Characterization of the mycobiome has often not been included in microbiome studies because of limitations in amplification systems. This report revisited the selection of PCR primers that amplify the fungal ITS1 region. We have identified primers with superior identification of fungi present in the database. We have compared the new primer sets against those previously used in the literature and show a significant improvement in read count and taxon identification. These primers should facilitate the study of fungi in human physiology and disease states. PMID:29242834
Phoummalaysith, Bounfeng; Yamamoto, Eiko; Xeuatvongsa, Anonh; Louangpradith, Viengsakhone; Keohavong, Bounxou; Saw, Yu Mon; Hamajima, Nobuyuki
2018-05-03
Routine vaccination is administered free of charge to all children under one year old in Lao People's Democratic Republic (Lao PDR) and the national goal is to achieve at least 95% coverage with all vaccines included in the national immunization program by 2025. In this study, factors related to the immunization system and characteristics of provinces and districts in Lao PDR were examined to evaluate the association with routine immunization coverage. Coverage rates for Bacillus Calmette-Guerin (BCG), Diphtheria-Tetanus-Pertussis-Hepatitis B (DTP-HepB), DTP-HepB-Hib (Haemophilus influenzae type B), polio (OPV), and measles (MCV1) vaccines from 2002 to 2014 collected through regular reporting system, were used to identify the immunization coverage trends in Lao PDR. Correlation analysis was performed using immunization coverage, characteristics of provinces or districts (population, population density, and proportion of poor villages and high-risk villages), and factors related to immunization service (including the proportions of the following: villages served by health facility levels, vaccine session types, and presence of well-functioning cold chain equipment). To determine factors associated with low coverage, provinces were categorized based on 80% of DTP-HepB-Hib3 coverage (<80% = low group; ≥80% = high group). Coverages of BCG, DTP-HepB3, OPV3 and MCV1 increased gradually from 2007 to 2014 (82.2-88.3% in 2014). However, BCG coverage showed the least improvement from 2002 to 2014. The coverage of each vaccine correlated with the coverage of the other vaccines and DTP-HepB-Hib dropout rate in provinces as well as districts. The provinces with low immunization coverage were correlated with higher proportions of poor villages. Routine immunization coverage has been improving in the last 13 years, but the national goal is not yet reached in Lao PDR. The results of this study suggest that BCG coverage and poor villages should be targeted to improve nationwide coverage. Copyright © 2018 Elsevier Ltd. All rights reserved.
Medication coverage for lawmakers may worsen access for everyone else.
Taglione, Michael S; Boozary, Andrew; Persaud, Nav
2018-03-01
Despite numerous recommendations for universal public coverage of prescription drugs in Canada based on evidence that millions of Canadians cannot afford medications, no province or territory has adopted first dollar coverage for all residents. However, one group unaffected by the lack of public coverage are lawmakers. Lawmakers receive excellent drug coverage plans for themselves and their immediate families. Evidence suggests that lawmakers' decisions are influenced by their personal circumstances; in this case, they are insulated from the effects of poor access to medications by their drug coverage plans. In contrast, a patchwork system of 46 programs across Canada provides some drug coverage to vulnerable populations. Reducing the disparity in prescription drug access between Canadian lawmakers and the public may promote progress towards better medication access for everyone. This could be achieved either by reducing lawmaker coverage or improving upon the public patchwork system. Since the goal should be to improve the overall access of medications for all Canadians, lawmakers included, the latter method is preferred. A universal drug plan with first dollar coverage could replace the current patchwork system and expand coverage to all Canadians. Copyright © 2017 Elsevier Inc. All rights reserved.
Iterative Correction of Reference Nucleotides (iCORN) using second generation sequencing technology.
Otto, Thomas D; Sanders, Mandy; Berriman, Matthew; Newbold, Chris
2010-07-15
The accuracy of reference genomes is important for downstream analysis but a low error rate requires expensive manual interrogation of the sequence. Here, we describe a novel algorithm (Iterative Correction of Reference Nucleotides) that iteratively aligns deep coverage of short sequencing reads to correct errors in reference genome sequences and evaluate their accuracy. Using Plasmodium falciparum (81% A + T content) as an extreme example, we show that the algorithm is highly accurate and corrects over 2000 errors in the reference sequence. We give examples of its application to numerous other eukaryotic and prokaryotic genomes and suggest additional applications. The software is available at http://icorn.sourceforge.net
Whole-exome sequencing identified a variant in EFTUD2 gene in establishing a genetic diagnosis.
Rengasamy Venugopalan, S; Farrow, E G; Lypka, M
2017-06-01
Craniofacial anomalies are complex and have an overlapping phenotype. Mandibulofacial Dysostosis and Oculo-Auriculo-Vertebral Spectrum are conditions that share common craniofacial phenotype and present a challenge in arriving at a diagnosis. In this report, we present a case of female proband who was given a differential diagnosis of Treacher Collins syndrome or Hemifacial Microsomia without certainty. Prior genetic testing reported negative for 22q deletion and FGFR screenings. The objective of this study was to demonstrate the critical role of whole-exome sequencing in establishing a genetic diagnosis of the proband. The participants were 14½-year-old affected female proband/parent trio. Proband/parent trio were enrolled in the study. Surgical tissue sample from the proband and parental blood samples were collected and prepared for whole-exome sequencing. Illumina HiSeq 2500 instrument was used for sequencing (125 nucleotide reads/84X coverage). Analyses of variants were performed using custom-developed software, RUNES and VIKING. Variant analyses following whole-exome sequencing identified a heterozygous de novo pathogenic variant, c.259C>T (p.Gln87*), in EFTUD2 (NM_004247.3) gene in the proband. Previous studies have reported that the variants in EFTUD2 gene were associated with Mandibulofacial Dysostosis with Microcephaly. Patients with facial asymmetry, micrognathia, choanal atresia and microcephaly should be analyzed for variants in EFTUD2 gene. Next-generation sequencing techniques, such as whole-exome sequencing offer great promise to improve the understanding of etiologies of sporadic genetic diseases. © 2017 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Yang, Yang; Kramer, Christopher M; Shaw, Peter W; Meyer, Craig H; Salerno, Michael
2016-11-01
To design and evaluate two-dimensional (2D) L1-SPIRiT accelerated spiral pulse sequences for first-pass myocardial perfusion imaging with whole heart coverage capable of measuring eight slices at 2 mm in-plane resolution at heart rates up to 125 beats per minute (BPM). Combinations of five different spiral trajectories and four k-t sampling patterns were retrospectively simulated in 25 fully sampled datasets and reconstructed with L1-SPIRiT to determine the best combination of parameters. Two candidate sequences were prospectively evaluated in 34 human subjects to assess in vivo performance. A dual density broad transition spiral trajectory with either angularly uniform or golden angle in time k-t sampling pattern had the largest structural similarity and smallest root mean square error from the retrospective simulation, and the L1-SPIRiT reconstruction had well-preserved temporal dynamics. In vivo data demonstrated that both of the sampling patterns could produce high quality perfusion images with whole-heart coverage. First-pass myocardial perfusion imaging using accelerated spirals with optimized trajectory and k-t sampling pattern can produce high quality 2D perfusion images with whole-heart coverage at the heart rates up to 125 BPM. Magn Reson Med 76:1375-1387, 2016. © 2015 International Society for Magnetic Resonance in Medicine. © 2015 International Society for Magnetic Resonance in Medicine.
Identification and correction of systematic error in high-throughput sequence data
2011-01-01
Background A feature common to all DNA sequencing technologies is the presence of base-call errors in the sequenced reads. The implications of such errors are application specific, ranging from minor informatics nuisances to major problems affecting biological inferences. Recently developed "next-gen" sequencing technologies have greatly reduced the cost of sequencing, but have been shown to be more error prone than previous technologies. Both position specific (depending on the location in the read) and sequence specific (depending on the sequence in the read) errors have been identified in Illumina and Life Technology sequencing platforms. We describe a new type of systematic error that manifests as statistically unlikely accumulations of errors at specific genome (or transcriptome) locations. Results We characterize and describe systematic errors using overlapping paired reads from high-coverage data. We show that such errors occur in approximately 1 in 1000 base pairs, and that they are highly replicable across experiments. We identify motifs that are frequent at systematic error sites, and describe a classifier that distinguishes heterozygous sites from systematic error. Our classifier is designed to accommodate data from experiments in which the allele frequencies at heterozygous sites are not necessarily 0.5 (such as in the case of RNA-Seq), and can be used with single-end datasets. Conclusions Systematic errors can easily be mistaken for heterozygous sites in individuals, or for SNPs in population analyses. Systematic errors are particularly problematic in low coverage experiments, or in estimates of allele-specific expression from RNA-Seq data. Our characterization of systematic error has allowed us to develop a program, called SysCall, for identifying and correcting such errors. We conclude that correction of systematic errors is important to consider in the design and interpretation of high-throughput sequencing experiments. PMID:22099972
Leyvraz, Magali; Rohner, Fabian; Konan, Amoin G; Esso, Lasme J C E; Woodruff, Bradley A; Norte, Augusto; Adiko, Adiko F; Bonfoh, Bassirou; Aaron, Grant J
2016-01-01
Poor complementary feeding practices among infants and young children in Côte d'Ivoire are major contributing factors to the country's high burden of malnutrition. As part of a broad effort to address this issue, an affordable, nutritious, and locally produced fortified complementary food product was launched in the Côte d'Ivoire in 2011. The objective of the current research was to assess various levels of coverage of the program and to identify coverage barriers. A cross-sectional household survey was conducted among caregivers of children less than 2-years of age living in Abidjan, Côte d'Ivoire. Four measures of coverage were assessed: "message coverage" (i.e., has the caregiver ever heard of the product?), "contact coverage" (i.e., has the caregiver ever fed the child the product?), "partial coverage" (i.e., has the caregiver fed the child the product in the previous month?), and "effective coverage" (i.e., has the caregiver fed the child the product in the previous 7 days?). A total of 1,113 caregivers with children between 0 and 23 months of age were interviewed. Results showed high message coverage (85.0%), moderate contact coverage (37.8%), and poor partial and effective coverages (8.8% and 4.6%, respectively). Product awareness was lower among caregivers from poorer households, but partial and effective coverages were comparable in both poor and non-poor groups. Infant and young child feeding (IYCF) practices were generally poor and did not appear to have improved since previous assessments. In conclusion, the results from the present study indicate that availability on the market and high awareness among the target population is not sufficient to achieve high and effective coverage. With market-based delivery models, significant efforts are needed to improve demand. Moreover, given the high prevalence of malnutrition and poor IYCF practices, additional modes of delivering IYCF interventions and improving IYCF practices should be considered.
... quality care for older women, and ends the gender discrimination that requires women to pay more for the ... Coverage Preventive Health Services Improved Medicare Coverage Ending Gender Discrimination in Premiums Expanded Insurance Coverage Endnotes Download "rb. ...
Anderson, Tavis K; Laegreid, William W; Cerutti, Francesco; Osorio, Fernando A; Nelson, Eric A; Christopher-Hennings, Jane; Goldberg, Tony L
2012-06-15
The extraordinary genetic and antigenic variability of RNA viruses is arguably the greatest challenge to the development of broadly effective vaccines. No single viral variant can induce sufficiently broad immunity, and incorporating all known naturally circulating variants into one multivalent vaccine is not feasible. Furthermore, no objective strategies currently exist to select actual viral variants that should be included or excluded in polyvalent vaccines. To address this problem, we demonstrate a method based on graph theory that quantifies the relative importance of viral variants. We demonstrate our method through application to the envelope glycoprotein gene of a particularly diverse RNA virus of pigs: porcine reproductive and respiratory syndrome virus (PRRSV). Using distance matrices derived from sequence nucleotide difference, amino acid difference and evolutionary distance, we constructed viral networks and used common network statistics to assign each sequence an objective ranking of relative 'importance'. To validate our approach, we use an independent published algorithm to score our top-ranked wild-type variants for coverage of putative T-cell epitopes across the 9383 sequences in our dataset. Top-ranked viruses achieve significantly higher coverage than low-ranked viruses, and top-ranked viruses achieve nearly equal coverage as a synthetic mosaic protein constructed in silico from the same set of 9383 sequences. Our approach relies on the network structure of PRRSV but applies to any diverse RNA virus because it identifies subsets of viral variants that are most important to overall viral diversity. We suggest that this method, through the objective quantification of variant importance, provides criteria for choosing viral variants for further characterization, diagnostics, surveillance and ultimately polyvalent vaccine development.
Goh, Falicia; Allen, Michelle A; Leuko, Stefan; Kawaguchi, Tomohiro; Decho, Alan W; Burns, Brendan P; Neilan, Brett A
2009-04-01
The stromatolites at Shark Bay, Western Australia, are analogues of some of the oldest evidence of life on Earth. The aim of this study was to identify and spatially characterize the specific microbial communities associated with Shark Bay intertidal columnar stromatolites. Conventional culturing methods and construction of 16S rDNA clone libraries from community genomic DNA with both universal and specific PCR primers were employed. The estimated coverage, richness and diversity of stromatolite microbial populations were compared with earlier studies on these ecosystems. The estimated coverage for all clone libraries indicated that population coverage was comprehensive. Phylogenetic analyses of stromatolite and surrounding seawater sequences were performed in ARB with the Greengenes database of full-length non-chimaeric 16S rRNA genes. The communities identified exhibited extensive diversity. The most abundant sequences from the stromatolites were alpha- and gamma-proteobacteria (58%), whereas the cyanobacterial community was characterized by sequences related to the genera Euhalothece, Gloeocapsa, Gloeothece, Chroococcidiopsis, Dermocarpella, Acaryochloris, Geitlerinema and Schizothrix. All clones from the archaeal-specific clone libraries were related to the halophilic archaea; however, no archaeal sequence was identified from the surrounding seawater. Fluorescence in situ hybridization also revealed stromatolite surfaces to be dominated by unicellular cyanobacteria, in contrast to the sub-surface archaea and sulphate-reducing bacteria. This study is the first to compare the microbial composition of morphologically similar stromatolites over time and examine the spatial distribution of specific microorganismic groups in these intertidal structures and the surrounding seawater at Shark Bay. The results provide a platform for identifying the key microbial physiology groups and their potential roles in modern stromatolite morphogenesis and ecology.
Shape-based retrieval of CNV regions in read coverage data.
Hong, Sangkyun; Yoon, Jeehee; Hong, Dongwan; Lee, Unjoo; Kim, Baeksop; Park, Sanghyun
2014-01-01
This study proposes a novel copy number variation (CNV) detection method, CNV_shape, based on variations in the shape of the read coverage data which are obtained from millions of short reads aligned to a reference sequence. The proposed method carries out two transforms, mean shift transform and mean slope transform, to extract the shape of a CNV more precisely from real human data, which are vulnerable to experimental and biological noises. The mean shift transform is a procedure for gaining a preliminary estimation of the CNVs by statistically evaluating moving averages of given read coverage data. The mean slope transform extracts candidate CNVs by filtering out non-stationary sub-regions from each of the primary CNVs pre-estimated in the mean shift procedure. Each of the candidate CNVs is merged with neighbours depending on the merging score to be finally identified as a putative CNV, where the merging score is estimated by the ratio of the positions with non-zero values of the mean shift transform to the total length of the region including two neighbouring candidate CNVs and the interval between them. The proposed CNV detection method was validated experimentally with simulated data and real human data. The simulated data with coverage in the range of 1x to 10x were generated for various sampling sizes and p-values. Five individual human genomes were used as real human data. The results show that relatively small CNVs (> 1 kbp) can be detected from low coverage (> 1.7x) data. The results also reveal that, in contrast to conventional methods, performance improvement from 8.18 to 87.90% was achieved in CNV_shape. The outcomes suggest that the proposed method is very effective in reducing noises inherent in real data as well as in detecting CNVs of various sizes and types.
Lau, T K; Cheung, S W; Lo, P S S; Pursley, A N; Chan, M K; Jiang, F; Zhang, H; Wang, W; Jong, L F J; Yuen, O K C; Chan, H Y C; Chan, W S K; Choy, K W
2014-03-01
To review the performance of non-invasive prenatal testing (NIPT) by low-coverage whole-genome sequencing of maternal plasma DNA at a single center. The NIPT result and pregnancy outcome of 1982 consecutive cases were reviewed. NIPT was based on low coverage (0.1×) whole-genome sequencing of maternal plasma DNA. All subjects were contacted for pregnancy and fetal outcome. Of the 1982 NIPT tests, a repeat blood sample was required in 23 (1.16%). In one case, a conclusive report could not be issued, probably because of an abnormal vanished twin fetus. NIPT was positive for common trisomies in 29 cases (23 were trisomy 21, four were trisomy 18 and two were trisomy 13); all were confirmed by prenatal karyotyping (specificity=100%). In addition, 11 cases were positive for sex-chromosomal abnormalities (SCA), and nine cases were positive for other aneuploidies or deletion/duplication. Fourteen of these 20 subjects agreed to undergo further investigations, and the abnormality was found to be of fetal origin in seven, confined placental mosaicism (CPM) in four, of maternal origin in two and not confirmed in one. Overall, 85.7% of the NIPT-suspected SCA were of fetal origin, and 66.7% of the other abnormalities were caused by CPM. Two of the six cases suspected or confirmed to have CPM were complicated by early-onset growth restriction requiring delivery before 34 weeks. Fetal outcome of the NIPT-negative cases was ascertained in 1645 (85.15%). Three chromosomal abnormalities were not detected by NIPT, including one case each of a balanced translocation, unbalanced translocation and triploidy. There were no known false negatives involving the common trisomies (sensitivity=100%). Low-coverage whole-genome sequencing of maternal plasma DNA was highly accurate in detecting common trisomies. It also enabled the detection of other aneuploidies and structural chromosomal abnormalities with high positive predictive value. Copyright © 2013 ISUOG. Published by John Wiley & Sons Ltd.
Biocrusts role on nitrogen cycle and microbial communities from underlying soils in drylands
NASA Astrophysics Data System (ADS)
Anguita-Maeso, Manuel; Miralles*, Isabel; van Wesemael, Bas; Lázaro, Roberto; Ortega, Raúl; Garcia-Salcedo, José Antonio; Soriano**, Miguel
2017-04-01
Biocrusts are distributed in arid areas widely covering most of the soil surface and playing an essential role in the functioning of nitrogen cycle. The absence of biocrust coverage might affect the soil nitrogen content and the quantity and diversity of microbial communities in underlying biocrust soils. To analyse this mater, we have collected three underlying soils biocrusts samples dominated by the lichen Diploschistes diacapsis and Squamarina lentigera from Tabernas desert (southeast of Spain) at two extremes of its spatial distribution range: one with a high percentage of biocrust coverage and other with a huge degradation and low percentage of biocrust coverage in order to determine differences on the total nitrogen content and microbial communities from these underlying soils. DNA from these samples was isolated though a commercial kit and it was used as template for metagenomic analysis. We accomplished a sequencing of the amplicons V4-V5 of the 16S rRNA gene with Next-Generation Sequencing (NGS) Illumina MiSeq platform and a relative quantity of bacteria (rRNA 16S) and fungi (ITS1-5.8S) were conducted by quantitative qPCR. Total nitrogen was measured by the Kjeldahl method. Statistical analyses were based on ANOVAs, heatmap and Generalized Linear Models (GLM). The results showed 1.89E+09 bacteria per gram of soil in the high biocrust coverage position while 6.98E+08 microorganisms per gram of soil were found in the less favourable position according to the lower percentage of biocrust coverage. Similarly, 1.19E+12 was the amount of fungi per gram of soil located in the favourable position with higher biocrust coverage and 7.62E+11 was found in the unfavourable position. Furthermore, the soil under high percentage of biocrust coverage showed the greatest total nitrogen content (1.1 g kg-1) whereas the soil sampled under depressed percentage of biocrust coverage displayed the fewest quantity of total nitrogen content (0.9 g kg-1). Metagenomic and statistical analysis exhibited different bacteria communities according to underlying soils with unlike percentage of biocrust coverage. Opitutus and Adhaeribacter predominated in soil under high biocrust coverage percentage whereas Chelatococcus was found as prevalent bacteria community in soils under low biocrust coverage percentage. Our data illustrate that the percentage of biocrust coverage influence the total nitrogen content in underlying biocrust soils and also affects the amount and the variety of bacteria communities in these underlying soils. (*) Financial support by Marie Curie Intra-European Fellowship (FP7-577 PEOPLE-2013-IEF, Proposal n° 623393) and (**) by the Ministerio de Economía y Competitividad (MINECO) cofinanced with FEDER funds (project CGL2015-71709-R) is acknowledged.
Ruan, Guihua; Wei, Meiping; Chen, Zhengyi; Su, Rihui; Du, Fuyou; Zheng, Yanjie
2014-09-15
A novel large-volume immobilized enzyme reactor (IMER) on small column was prepared with organic-inorganic hybrid silica particles and applied for fast (10 min) and oriented digestion of protein. At first, a thin enzyme support layer was formed in the bottom of the small column by polymerization with α-methacrylic acid and dimethacrylate. After that, amino SiO2 particles was prepared by the sol-gel method with tetraethoxysilane and 3-aminopropyltriethoxysilane. Subsequently, the amino SiO2 particles were activated by glutaraldehyde for covalent immobilization of trypsin. Digestive capability of large-volume IMER for proteins was investigated by using bovine serum albumin (BSA), cytochrome c (Cyt-c) as model proteins. Results showed that although the sequence coverage of the BSA (20%) and Cyt-c (19%) was low, the large-volume IMER could produce peptides with stable specific sequence at 101-105, 156-160, 205-209, 212-218, 229-232, 257-263 and 473-451 of the amino sequence of BSA when digesting 1mg/mL BSA. Eight of common peptides were observed during each of the ten runs of large-volume IMER. Besides, the IMER could be easily regenerated by reactivating with GA and cross-linking with trypsin after breaking the -C=N- bond by 0.01 M HCl. The sequence coverage of BSA from regenerated IMER increased to 25% comparing the non-regenerated IMER (17%). 14 common peptides. accounting for 87.5% of first use of IMER, were produced both with IMER and regenerated IMER. When the IMER was applied for ginkgo albumin digestion, the sequence coverage of two main proteins of ginkgo, ginnacin and legumin, was 56% and 55%, respectively. (Reviewer 2) Above all, the fast and selective digestion property of the large-volume IMER indicated that the regenerative IMER could be tentatively used for the production of potential bioactive peptides and the study of oriented protein digestion. Copyright © 2014 Elsevier B.V. All rights reserved.
Astale, Tigist; Sata, Eshetu; Zerihun, Mulat; Nute, Andrew W; Stewart, Aisha E P; Gessese, Demelash; Ayenew, Gedefaw; Melak, Berhanu; Chanyalew, Melsew; Tadesse, Zerihun; Callahan, E Kelly; Nash, Scott D
2018-02-01
Trachoma is the leading infectious cause of blindness worldwide. In communities where the district level prevalence of trachomatous inflammation-follicular among children ages 1-9 years is ≥5%, WHO recommends annual mass drug administration (MDA) of antibiotics with the aim of at least 80% coverage. Population-based post-MDA coverage surveys are essential to understand the effectiveness of MDA programs, yet published reports from trachoma programs are rare. In the Amhara region of Ethiopia, a population-based MDA coverage survey was conducted 3 weeks following the 2016 MDA to estimate the zonal prevalence of self-reported drug coverage in all 10 administrative zones. Survey households were selected using a multi-stage cluster random sampling design and all individuals in selected households were presented with a drug sample and asked about taking the drug during the campaign. Zonal estimates were weighted and confidence intervals were calculated using survey procedures. Self-reported drug coverage was then compared with regional reported administrative coverage. Region-wide, 24,248 individuals were enumerated, of which, 20,942 (86.4%) individuals were present. The regional self-reported antibiotic coverage was 76.8% (95%Confidence Interval (CI):69.3-82.9%) in the population overall and 77.4% (95%CI = 65.7-85.9%) among children ages 1-9 years old. Zonal coverage ranged from 67.8% to 90.2%. Five out of 10 zones achieved a coverage >80%. In all zones, the reported administrative coverage was greater than 90% and was considerably higher than self-reported MDA coverage. Main reasons reported for MDA campaign non-attendance included being physically unable to get to MDA site (22.5%), traveling (20.6%), and not knowing about the campaign (21.0%). MDA refusal was low (2.8%) in this population. Although self-reported MDA coverage in Amhara was greater than 80% in some zones, programmatic improvements are warranted throughout Amhara to achieve higher coverage. These results will be used to enhance community mobilization and improve training for MDA distributors and supervisors to improve coverage in future MDAs.
Watchdog activity monitor (WAM) for use wth high coverage processor self-test
NASA Technical Reports Server (NTRS)
Tulpule, Bhalchandra R. (Inventor); Crosset, III, Richard W. (Inventor); Versailles, Richard E. (Inventor)
1988-01-01
A high fault coverage, instruction modeled self-test for a signal processor in a user environment is disclosed. The self-test executes a sequence of sub-tests and issues a state transition signal upon the execution of each sub-test. The self-test may be combined with a watchdog activity monitor (WAM) which provides a test-failure signal in the presence of a counted number of state transitions not agreeing with an expected number. An independent measure of time may be provided in the WAM to increase fault coverage by checking the processor's clock. Additionally, redundant processor systems are protected from inadvertent unsevering of a severed processor using a unique unsever arming technique and apparatus.
Desai, Aarti; Marwah, Veer Singh; Yadav, Akshay; Jha, Vineet; Dhaygude, Kishor; Bangar, Ujwala; Kulkarni, Vivek; Jere, Abhay
2013-01-01
Next Generation Sequencing (NGS) is a disruptive technology that has found widespread acceptance in the life sciences research community. The high throughput and low cost of sequencing has encouraged researchers to undertake ambitious genomic projects, especially in de novo genome sequencing. Currently, NGS systems generate sequence data as short reads and de novo genome assembly using these short reads is computationally very intensive. Due to lower cost of sequencing and higher throughput, NGS systems now provide the ability to sequence genomes at high depth. However, currently no report is available highlighting the impact of high sequence depth on genome assembly using real data sets and multiple assembly algorithms. Recently, some studies have evaluated the impact of sequence coverage, error rate and average read length on genome assembly using multiple assembly algorithms, however, these evaluations were performed using simulated datasets. One limitation of using simulated datasets is that variables such as error rates, read length and coverage which are known to impact genome assembly are carefully controlled. Hence, this study was undertaken to identify the minimum depth of sequencing required for de novo assembly for different sized genomes using graph based assembly algorithms and real datasets. Illumina reads for E.coli (4.6 MB) S.kudriavzevii (11.18 MB) and C.elegans (100 MB) were assembled using SOAPdenovo, Velvet, ABySS, Meraculous and IDBA-UD. Our analysis shows that 50X is the optimum read depth for assembling these genomes using all assemblers except Meraculous which requires 100X read depth. Moreover, our analysis shows that de novo assembly from 50X read data requires only 6-40 GB RAM depending on the genome size and assembly algorithm used. We believe that this information can be extremely valuable for researchers in designing experiments and multiplexing which will enable optimum utilization of sequencing as well as analysis resources.
Mutation Scanning in Wheat by Exon Capture and Next-Generation Sequencing.
King, Robert; Bird, Nicholas; Ramirez-Gonzalez, Ricardo; Coghill, Jane A; Patil, Archana; Hassani-Pak, Keywan; Uauy, Cristobal; Phillips, Andrew L
2015-01-01
Targeted Induced Local Lesions in Genomes (TILLING) is a reverse genetics approach to identify novel sequence variation in genomes, with the aims of investigating gene function and/or developing useful alleles for breeding. Despite recent advances in wheat genomics, most current TILLING methods are low to medium in throughput, being based on PCR amplification of the target genes. We performed a pilot-scale evaluation of TILLING in wheat by next-generation sequencing through exon capture. An oligonucleotide-based enrichment array covering ~2 Mbp of wheat coding sequence was used to carry out exon capture and sequencing on three mutagenised lines of wheat containing previously-identified mutations in the TaGA20ox1 homoeologous genes. After testing different mapping algorithms and settings, candidate SNPs were identified by mapping to the IWGSC wheat Chromosome Survey Sequences. Where sequence data for all three homoeologues were found in the reference, mutant calls were unambiguous; however, where the reference lacked one or two of the homoeologues, captured reads from these genes were mis-mapped to other homoeologues, resulting either in dilution of the variant allele frequency or assignment of mutations to the wrong homoeologue. Competitive PCR assays were used to validate the putative SNPs and estimate cut-off levels for SNP filtering. At least 464 high-confidence SNPs were detected across the three mutagenized lines, including the three known alleles in TaGA20ox1, indicating a mutation rate of ~35 SNPs per Mb, similar to that estimated by PCR-based TILLING. This demonstrates the feasibility of using exon capture for genome re-sequencing as a method of mutation detection in polyploid wheat, but accurate mutation calling will require an improved genomic reference with more comprehensive coverage of homoeologues.
Ramu, P; Kassahun, B; Senthilvel, S; Ashok Kumar, C; Jayashree, B; Folkertsma, R T; Reddy, L Ananda; Kuruvinashetti, M S; Haussmann, B I G; Hash, C T
2009-11-01
The sequencing and detailed comparative functional analysis of genomes of a number of select botanical models open new doors into comparative genomics among the angiosperms, with potential benefits for improvement of many orphan crops that feed large populations. In this study, a set of simple sequence repeat (SSR) markers was developed by mining the expressed sequence tag (EST) database of sorghum. Among the SSR-containing sequences, only those sharing considerable homology with rice genomic sequences across the lengths of the 12 rice chromosomes were selected. Thus, 600 SSR-containing sorghum EST sequences (50 homologous sequences on each of the 12 rice chromosomes) were selected, with the intention of providing coverage for corresponding homologous regions of the sorghum genome. Primer pairs were designed and polymorphism detection ability was assessed using parental pairs of two existing sorghum mapping populations. About 28% of these new markers detected polymorphism in this 4-entry panel. A subset of 55 polymorphic EST-derived SSR markers were mapped onto the existing skeleton map of a recombinant inbred population derived from cross N13 x E 36-1, which is segregating for Striga resistance and the stay-green component of terminal drought tolerance. These new EST-derived SSR markers mapped across all 10 sorghum linkage groups, mostly to regions expected based on prior knowledge of rice-sorghum synteny. The ESTs from which these markers were derived were then mapped in silico onto the aligned sorghum genome sequence, and 88% of the best hits corresponded to linkage-based positions. This study demonstrates the utility of comparative genomic information in targeted development of markers to fill gaps in linkage maps of related crop species for which sufficient genomic tools are not available.
Launch mission summary: INTELSAT 5 (F4) ATLAS/CENTAUR-58
NASA Technical Reports Server (NTRS)
1982-01-01
The launch vehicle, spacecraft, and mission are described. Information relative to launch windows, flight plan, trajectory, and radar and telemetry coverage are included with brief sequence of flight events.
Cho, Jin-Young; Lee, Hyoung-Joo; Jeong, Seul-Ki; Kim, Kwang-Youl; Kwon, Kyung-Hoon; Yoo, Jong Shin; Omenn, Gilbert S; Baker, Mark S; Hancock, William S; Paik, Young-Ki
2015-12-04
Approximately 2.9 billion long base-pair human reference genome sequences are known to encode some 20 000 representative proteins. However, 3000 proteins, that is, ~15% of all proteins, have no or very weak proteomic evidence and are still missing. Missing proteins may be present in rare samples in very low abundance or be only temporarily expressed, causing problems in their detection and protein profiling. In particular, some technical limitations cause missing proteins to remain unassigned. For example, current mass spectrometry techniques have high limits and error rates for the detection of complex biological samples. An insufficient proteome coverage in a reference sequence database and spectral library also raises major issues. Thus, the development of a better strategy that results in greater sensitivity and accuracy in the search for missing proteins is necessary. To this end, we used a new strategy, which combines a reference spectral library search and a simulated spectral library search, to identify missing proteins. We built the human iRefSPL, which contains the original human reference spectral library and additional peptide sequence-spectrum match entries from other species. We also constructed the human simSPL, which contains the simulated spectra of 173 907 human tryptic peptides determined by MassAnalyzer (version 2.3.1). To prove the enhanced analytical performance of the combination of the human iRefSPL and simSPL methods for the identification of missing proteins, we attempted to reanalyze the placental tissue data set (PXD000754). The data from each experiment were analyzed using PeptideProphet, and the results were combined using iProphet. For the quality control, we applied the class-specific false-discovery rate filtering method. All of the results were filtered at a false-discovery rate of <1% at the peptide and protein levels. The quality-controlled results were then cross-checked with the neXtProt DB (2014-09-19 release). The two spectral libraries, iRefSPL and simSPL, were designed to ensure no overlap of the proteome coverage. They were shown to be complementary to spectral library searching and significantly increased the number of matches. From this trial, 12 new missing proteins were identified that passed the following criterion: at least 2 peptides of 7 or more amino acids in length or one of 9 or more amino acids in length with one or more unique sequences. Thus, the iRefSPL and simSPL combination can be used to help identify peptides that have not been detected by conventional sequence database searches with improved sensitivity and a low error rate.
Single molecule sequencing of the M13 virus genome without amplification
Zhao, Luyang; Deng, Liwei; Li, Gailing; Jin, Huan; Cai, Jinsen; Shang, Huan; Li, Yan; Wu, Haomin; Xu, Weibin; Zeng, Lidong; Zhang, Renli; Zhao, Huan; Wu, Ping; Zhou, Zhiliang; Zheng, Jiao; Ezanno, Pierre; Yang, Andrew X.; Yan, Qin; Deem, Michael W.; He, Jiankui
2017-01-01
Next generation sequencing (NGS) has revolutionized life sciences research. However, GC bias and costly, time-intensive library preparation make NGS an ill fit for increasing sequencing demands in the clinic. A new class of third-generation sequencing platforms has arrived to meet this need, capable of directly measuring DNA and RNA sequences at the single-molecule level without amplification. Here, we use the new GenoCare single-molecule sequencing platform from Direct Genomics to sequence the genome of the M13 virus. Our platform detects single-molecule fluorescence by total internal reflection microscopy, with sequencing-by-synthesis chemistry. We sequenced the genome of M13 to a depth of 316x, with 100% coverage. We determined a consensus sequence accuracy of 100%. In contrast to GC bias inherent to NGS results, we demonstrated that our single-molecule sequencing method yields minimal GC bias. PMID:29253901
Single molecule sequencing of the M13 virus genome without amplification.
Zhao, Luyang; Deng, Liwei; Li, Gailing; Jin, Huan; Cai, Jinsen; Shang, Huan; Li, Yan; Wu, Haomin; Xu, Weibin; Zeng, Lidong; Zhang, Renli; Zhao, Huan; Wu, Ping; Zhou, Zhiliang; Zheng, Jiao; Ezanno, Pierre; Yang, Andrew X; Yan, Qin; Deem, Michael W; He, Jiankui
2017-01-01
Next generation sequencing (NGS) has revolutionized life sciences research. However, GC bias and costly, time-intensive library preparation make NGS an ill fit for increasing sequencing demands in the clinic. A new class of third-generation sequencing platforms has arrived to meet this need, capable of directly measuring DNA and RNA sequences at the single-molecule level without amplification. Here, we use the new GenoCare single-molecule sequencing platform from Direct Genomics to sequence the genome of the M13 virus. Our platform detects single-molecule fluorescence by total internal reflection microscopy, with sequencing-by-synthesis chemistry. We sequenced the genome of M13 to a depth of 316x, with 100% coverage. We determined a consensus sequence accuracy of 100%. In contrast to GC bias inherent to NGS results, we demonstrated that our single-molecule sequencing method yields minimal GC bias.
Comparison of next generation sequencing technologies for transcriptome characterization
2009-01-01
Background We have developed a simulation approach to help determine the optimal mixture of sequencing methods for most complete and cost effective transcriptome sequencing. We compared simulation results for traditional capillary sequencing with "Next Generation" (NG) ultra high-throughput technologies. The simulation model was parameterized using mappings of 130,000 cDNA sequence reads to the Arabidopsis genome (NCBI Accession SRA008180.19). We also generated 454-GS20 sequences and de novo assemblies for the basal eudicot California poppy (Eschscholzia californica) and the magnoliid avocado (Persea americana) using a variety of methods for cDNA synthesis. Results The Arabidopsis reads tagged more than 15,000 genes, including new splice variants and extended UTR regions. Of the total 134,791 reads (13.8 MB), 119,518 (88.7%) mapped exactly to known exons, while 1,117 (0.8%) mapped to introns, 11,524 (8.6%) spanned annotated intron/exon boundaries, and 3,066 (2.3%) extended beyond the end of annotated UTRs. Sequence-based inference of relative gene expression levels correlated significantly with microarray data. As expected, NG sequencing of normalized libraries tagged more genes than non-normalized libraries, although non-normalized libraries yielded more full-length cDNA sequences. The Arabidopsis data were used to simulate additional rounds of NG and traditional EST sequencing, and various combinations of each. Our simulations suggest a combination of FLX and Solexa sequencing for optimal transcriptome coverage at modest cost. We have also developed ESTcalc http://fgp.huck.psu.edu/NG_Sims/ngsim.pl, an online webtool, which allows users to explore the results of this study by specifying individualized costs and sequencing characteristics. Conclusion NG sequencing technologies are a highly flexible set of platforms that can be scaled to suit different project goals. In terms of sequence coverage alone, the NG sequencing is a dramatic advance over capillary-based sequencing, but NG sequencing also presents significant challenges in assembly and sequence accuracy due to short read lengths, method-specific sequencing errors, and the absence of physical clones. These problems may be overcome by hybrid sequencing strategies using a mixture of sequencing methodologies, by new assemblers, and by sequencing more deeply. Sequencing and microarray outcomes from multiple experiments suggest that our simulator will be useful for guiding NG transcriptome sequencing projects in a wide range of organisms. PMID:19646272
Rhodes, Johanna; Beale, Mathew A; Fisher, Matthew C
2014-01-01
The industry of next-generation sequencing is constantly evolving, with novel library preparation methods and new sequencing machines being released by the major sequencing technology companies annually. The Illumina TruSeq v2 library preparation method was the most widely used kit and the market leader; however, it has now been discontinued, and in 2013 was replaced by the TruSeq Nano and TruSeq PCR-free methods, leaving a gap in knowledge regarding which is the most appropriate library preparation method to use. Here, we used isolates from the pathogenic fungi Cryptococcus neoformans var. grubii and sequenced them using the existing TruSeq DNA v2 kit (Illumina), along with two new kits: the TruSeq Nano DNA kit (Illumina) and the NEBNext Ultra DNA kit (New England Biolabs) to provide a comparison. Compared to the original TruSeq DNA v2 kit, both newer kits gave equivalent or better sequencing data, with increased coverage. When comparing the two newer kits, we found little difference in cost and workflow, with the NEBNext Ultra both slightly cheaper and faster than the TruSeq Nano. However, the quality of data generated using the TruSeq Nano DNA kit was superior due to higher coverage at regions of low GC content, and more SNPs identified. Researchers should therefore evaluate their resources and the type of application (and hence data quality) being considered when ultimately deciding on which library prep method to use.
Steele, Katherine A; Quinton-Tulloch, Mark J; Amgai, Resham B; Dhakal, Rajeev; Khatiwada, Shambhu P; Vyas, Darshna; Heine, Martin; Witcombe, John R
2018-01-01
Few public sector rice breeders have the capacity to use NGS-derived markers in their breeding programmes despite rapidly expanding repositories of rice genome sequence data. They rely on > 18,000 mapped microsatellites (SSRs) for marker-assisted selection (MAS) using gel analysis. Lack of knowledge about target SNP and InDel variant loci has hampered the uptake by many breeders of Kompetitive allele-specific PCR (KASP), a proprietary technology of LGC genomics that can distinguish alleles at variant loci. KASP is a cost-effective single-step genotyping technology, cheaper than SSRs and more flexible than genotyping by sequencing (GBS) or array-based genotyping when used in selection programmes. Before this study, there were 2015 rice KASP marker loci in the public domain, mainly identified by array-based screening, leaving large proportions of the rice genome with no KASP coverage. Here we have addressed the urgent need for a wide choice of appropriate rice KASP assays and demonstrated that NGS can detect many more KASP to give full genome coverage. Through re-sequencing of nine indica rice breeding lines or released varieties, this study has identified 2.5 million variant sites. Stringent filtering of variants generated 1.3 million potential KASP assay designs, including 92,500 potential functional markers. This strategy delivers a 650-fold increase in potential selectable KASP markers at a density of 3.1 per 1 kb in the indica crosses analysed and 377,178 polymorphic KASP design sites on average per cross. This knowledge is available to breeders and has been utilised to improve the efficiency of public sector breeding in Nepal, enabling identification of polymorphic KASP at any region or quantitative trait loci in relevant crosses. Validation of 39 new KASP was carried out by genotyping progeny from a range of crosses to show that they detected segregating alleles. The new KASP have replaced SSRs to aid trait selection during marker-assisted backcrossing in these crosses, where target traits include rice blast and BLB resistance loci. Furthermore, we provide the software for plant breeders to generate KASP designs from their own datasets.
Rapid and accurate pyrosequencing of angiosperm plastid genomes
Moore, Michael J; Dhingra, Amit; Soltis, Pamela S; Shaw, Regina; Farmerie, William G; Folta, Kevin M; Soltis, Douglas E
2006-01-01
Background Plastid genome sequence information is vital to several disciplines in plant biology, including phylogenetics and molecular biology. The past five years have witnessed a dramatic increase in the number of completely sequenced plastid genomes, fuelled largely by advances in conventional Sanger sequencing technology. Here we report a further significant reduction in time and cost for plastid genome sequencing through the successful use of a newly available pyrosequencing platform, the Genome Sequencer 20 (GS 20) System (454 Life Sciences Corporation), to rapidly and accurately sequence the whole plastid genomes of the basal eudicot angiosperms Nandina domestica (Berberidaceae) and Platanus occidentalis (Platanaceae). Results More than 99.75% of each plastid genome was simultaneously obtained during two GS 20 sequence runs, to an average depth of coverage of 24.6× in Nandina and 17.3× in Platanus. The Nandina and Platanus plastid genomes shared essentially identical gene complements and possessed the typical angiosperm plastid structure and gene arrangement. To assess the accuracy of the GS 20 sequence, over 45 kilobases of sequence were generated for each genome using conventional sequencing. Overall error rates of 0.043% and 0.031% were observed in GS 20 sequence for Nandina and Platanus, respectively. More than 97% of all observed errors were associated with homopolymer runs, with ~60% of all errors associated with homopolymer runs of 5 or more nucleotides and ~50% of all errors associated with regions of extensive homopolymer runs. No substitution errors were present in either genome. Error rates were generally higher in the single-copy and noncoding regions of both plastid genomes relative to the inverted repeat and coding regions. Conclusion Highly accurate and essentially complete sequence information was obtained for the Nandina and Platanus plastid genomes using the GS 20 System. More importantly, the high accuracy observed in the GS 20 plastid genome sequence was generated for a significant reduction in time and cost over traditional shotgun-based genome sequencing techniques, although with approximately half the coverage of previously reported GS 20 de novo genome sequence. The GS 20 should be broadly applicable to angiosperm plastid genome sequencing, and therefore promises to expand the scale of plant genetic and phylogenetic research dramatically. PMID:16934154
Analysis on the Change of Vegetation Coverage in Qinghai Province from 2000 TO 2012
NASA Astrophysics Data System (ADS)
Wang, J.; Yan, Q.; Liu, Z.; Luo, C.
2013-07-01
Qinghai Province is one of the important provinces on the Qinghai-Tibet Plateau in China. Its unique alpine meadow ecosystem makes it become the most concentrated areas of biodiversity in high altitudes in the world. Researching the vegetation coverage and changes of Qinghai province can reflect effectively and timely processing of changes and problems of ecological quality in the region. This research will give a long time series monitoring of the vegetation coverage of Qinghai province based on maximum value composite (MVC) and S-G filtering algorithm using MODIS data of the year of 2000-2012, then analyze the change using coefficient of variability(CV) and trend line analysis. According to research, during the past 13 years, more than half of Qinghai Province's vegetation coverage is well, both the east and south have a high coverage, while the northwest is lower. The changing of vegetation coverage also has showed a steady and improving trend in 13 years. The largest area is slight improved area is about 29.08% of the total area, and the second largest area is significant improved area is about 21.09% of the total area. In this research can learn directly the vegetation coverage and changes of Qinghai province and provide reference and scientific basis for the protection and governance of ecological environment.
Luck, Ashley N; Slatko, Barton E; Foster, Jeremy M
2017-01-01
Efficient transcriptomic sequencing of microbial mRNA derived from host-microbe associations is often compromised by the much lower relative abundance of microbial RNA in the mixed total RNA sample. One solution to this problem is to perform extensive sequencing until an acceptable level of transcriptome coverage is obtained. More cost-effective methods include use of prokaryotic and/or eukaryotic rRNA depletion strategies, sometimes in conjunction with depletion of polyadenylated eukaryotic mRNA. Here, we report use of Cappable-seq™ to specifically enrich, in a single step, Wolbachia endobacterial mRNA transcripts from total RNA prepared from the parasitic filarial nematode, Brugia malayi. The obligate Wolbachia endosymbiont is a proven drug target for many human filarial infections, yet the precise nature of its symbiosis with the nematode host is poorly understood. Insightful analysis of the expression levels of Wolbachia genes predicted to underpin the mutualistic association and of known drug target genes at different life cycle stages or in response to drug treatments is typically challenged by low transcriptomic coverage. Cappable-seq resulted in up to ~ 5-fold increase in the number of reads mapping to Wolbachia. On average, coverage of Wolbachia transcripts from B. malayi microfilariae was enriched ~40-fold by Cappable-seq. Additionally, this method has an additional benefit of selectively removing abundant prokaryotic ribosomal RNAs.The deeper microbial transcriptome sequencing afforded by Cappable-seq facilitates more detailed characterization of gene expression levels of pathogens and symbionts present in animal tissues.
Evaluation of the three-dimensional bony coverage before and after rotational acetabular osteotomy.
Tanaka, Takeyuki; Moro, Toru; Takatori, Yoshio; Oshima, Hirofumi; Ito, Hideya; Sugita, Naohiko; Mitsuishi, Mamoru; Tanaka, Sakae
2018-02-26
Rotational acetabular osteotomy is a type of pelvic osteotomy that involves rotation of the acetabular bone to improve the bony coverage of the femoral head for patients with acetabular dysplasia. Favourable post-operative long-term outcomes have been reported in previous studies. However, there is a paucity of published data regarding three-dimensional bony coverage. The present study investigated the three-dimensional bony coverage of the acetabulum covering the femoral head in hips before and after rotational acetabular osteotomy and in normal hips. The computed tomography data of 40 hip joints (12 joints before and after rotational acetabular osteotomy; 16 normal joints) were analyzed. The three-dimensional bony coverage of each joint was evaluated using original software. The post-operative bony coverage improved significantly compared with pre-operative values. In particular, the anterolateral aspect of the acetabulum tended to be dysplastic in patients with acetabular dysplasia compared to those with normal hip joints. However, greater bony coverage at the anterolateral aspect was obtained after rotational acetabular osteotomy. Meanwhile, the results of the present study may indicate that the bony coverage in the anterior aspect may be excessive. Three-dimensional analysis indicated that rotational acetabular osteotomy achieved favorable bony coverage. Further investigations are necessary to determine the ideal bony coverage after rotational acetabular osteotomy.
Zhang, Qi; Zeng, Xin; Younkin, Sam; Kawli, Trupti; Snyder, Michael P; Keleş, Sündüz
2016-02-24
Chromatin immunoprecipitation followed by sequencing (ChIP-seq) experiments revolutionized genome-wide profiling of transcription factors and histone modifications. Although maturing sequencing technologies allow these experiments to be carried out with short (36-50 bps), long (75-100 bps), single-end, or paired-end reads, the impact of these read parameters on the downstream data analysis are not well understood. In this paper, we evaluate the effects of different read parameters on genome sequence alignment, coverage of different classes of genomic features, peak identification, and allele-specific binding detection. We generated 101 bps paired-end ChIP-seq data for many transcription factors from human GM12878 and MCF7 cell lines. Systematic evaluations using in silico variations of these data as well as fully simulated data, revealed complex interplay between the sequencing parameters and analysis tools, and indicated clear advantages of paired-end designs in several aspects such as alignment accuracy, peak resolution, and most notably, allele-specific binding detection. Our work elucidates the effect of design on the downstream analysis and provides insights to investigators in deciding sequencing parameters in ChIP-seq experiments. We present the first systematic evaluation of the impact of ChIP-seq designs on allele-specific binding detection and highlights the power of pair-end designs in such studies.
Do modern total knee replacements improve tibial coverage?
Meier, Malin; Webb, Jonathan; Collins, Jamie E; Beckmann, Johannes; Fitz, Wolfgang
2018-01-25
The purpose of the present study is to compare newer designs of various symmetric and asymmetric tibial components and measure tibial bone coverage using the rotational safe zone defined by two commonly utilized anatomic rotational landmarks. Computed tomography scans (CT scans) of one hundred consecutive patients scheduled for total knee arthroplasty were obtained pre-operatively. A virtual proximal tibial cut was performed and two commonly used rotational axes were added for each image: the medio-lateral axis (ML-axis) and the medial 1/3 tibial tubercle axis (med-1/3-axis). Different symmetric and asymmetric implant designs were then superimposed in various rotational positions for best cancellous and cortical coverage. The images were imported to a public domain imaging software, and cancellous and cortical bone coverage was computed for each image, with each implant design in various rotational positions. One single implant type could not be identified that provided the best cortical and cancellous coverage of the tibia, irrespective of using the med-1/3-axis or the ML-axis for rotational alignment. However, it could be confirmed that the best bone coverage was dependent on the selected rotational landmark. Furthermore, improved bone coverage was observed when tibial implant positions were optimized between the two rotational axes. Tibial coverage is similar for symmetric and asymmetric designs, but depends on the rotational landmark for which the implant is designed. The surgeon has the option to improve tibial coverage by optimizing placement between the two anatomic rotational alignment landmarks, the medial 1/3 and the ML-axis. Surgeons should be careful assessing intraoperative rotational tibial placement using the described anatomic rotational landmarks to optimize tibial bony coverage without compromising patella tracking. III.
Tolson, D A; Nicholson, N H
1998-01-01
The determination of DNA sequences by partial exonuclease digestion followed by Matrix-Assisted Laser Desorption Time of Flight Mass Spectrometry (MALDI-TOF) is a well established method. When the same procedure is applied to RNA, difficulties arise due to the small (1 Da) mass difference between the nucleotides U and C, which makes unambiguous assignment difficult using a MALDI-TOF instrument. Here we report our experiences with sequence specific endonucleases and chemical methods followed by MALDI-TOF to resolve these sequence ambiguities. We have found chemical methods superior to endonucleases both in terms of correct specificity and extent of sequence coverage. This methodology can be used in combination with exonuclease digestion to rapidly assign RNA sequences. PMID:9421498
Cost analysis of whole genome sequencing in German clinical practice.
Plöthner, Marika; Frank, Martin; von der Schulenburg, J-Matthias Graf
2017-06-01
Whole genome sequencing (WGS) is an emerging tool in clinical diagnostics. However, little has been said about its procedure costs, owing to a dearth of related cost studies. This study helps fill this research gap by analyzing the execution costs of WGS within the setting of German clinical practice. First, to estimate costs, a sequencing process related to clinical practice was undertaken. Once relevant resources were identified, a quantification and monetary evaluation was conducted using data and information from expert interviews with clinical geneticists, and personnel at private enterprises and hospitals. This study focuses on identifying the costs associated with the standard sequencing process, and the procedure costs for a single WGS were analyzed on the basis of two sequencing platforms-namely, HiSeq 2500 and HiSeq Xten, both by Illumina, Inc. In addition, sensitivity analyses were performed to assess the influence of various uses of sequencing platforms and various coverage values on a fixed-cost degression. In the base case scenario-which features 80 % utilization and 30-times coverage-the cost of a single WGS analysis with the HiSeq 2500 was estimated at €3858.06. The cost of sequencing materials was estimated at €2848.08; related personnel costs of €396.94 and acquisition/maintenance costs (€607.39) were also found. In comparison, the cost of sequencing that uses the latest technology (i.e., HiSeq Xten) was approximately 63 % cheaper, at €1411.20. The estimated costs of WGS currently exceed the prediction of a 'US$1000 per genome', by more than a factor of 3.8. In particular, the material costs in themselves exceed this predicted cost.
The American cranberry: first insights into the whole genome of a species adapted to bog habitat.
Polashock, James; Zelzion, Ehud; Fajardo, Diego; Zalapa, Juan; Georgi, Laura; Bhattacharya, Debashish; Vorsa, Nicholi
2014-06-13
The American cranberry (Vaccinium macrocarpon Ait.) is one of only three widely-cultivated fruit crops native to North America- the other two are blueberry (Vaccinium spp.) and native grape (Vitis spp.). In terms of taxonomy, cranberries are in the core Ericales, an order for which genome sequence data are currently lacking. In addition, cranberries produce a host of important polyphenolic secondary compounds, some of which are beneficial to human health. Whereas next-generation sequencing technology is allowing the advancement of whole-genome sequencing, one major obstacle to the successful assembly from short-read sequence data of complex diploid (and higher ploidy) organisms is heterozygosity. Cranberry has the advantage of being diploid (2n = 2x = 24) and self-fertile. To minimize the issue of heterozygosity, we sequenced the genome of a fifth-generation inbred genotype (F ≥ 0.97) derived from five generations of selfing originating from the cultivar Ben Lear. The genome size of V. macrocarpon has been estimated to be about 470 Mb. Genomic sequences were assembled into 229,745 scaffolds representing 420 Mbp (N50 = 4,237 bp) with 20X average coverage. The number of predicted genes was 36,364 and represents 17.7% of the assembled genome. Of the predicted genes, 30,090 were assigned to candidate genes based on homology. Genes supported by transcriptome data totaled 13,170 (36%). Shotgun sequencing of the cranberry genome, with an average sequencing coverage of 20X, allowed efficient assembly and gene calling. The candidate genes identified represent a useful collection to further study important biochemical pathways and cellular processes and to use for marker development for breeding and the study of horticultural characteristics, such as disease resistance.
The American cranberry: first insights into the whole genome of a species adapted to bog habitat
2014-01-01
Background The American cranberry (Vaccinium macrocarpon Ait.) is one of only three widely-cultivated fruit crops native to North America- the other two are blueberry (Vaccinium spp.) and native grape (Vitis spp.). In terms of taxonomy, cranberries are in the core Ericales, an order for which genome sequence data are currently lacking. In addition, cranberries produce a host of important polyphenolic secondary compounds, some of which are beneficial to human health. Whereas next-generation sequencing technology is allowing the advancement of whole-genome sequencing, one major obstacle to the successful assembly from short-read sequence data of complex diploid (and higher ploidy) organisms is heterozygosity. Cranberry has the advantage of being diploid (2n = 2x = 24) and self-fertile. To minimize the issue of heterozygosity, we sequenced the genome of a fifth-generation inbred genotype (F ≥ 0.97) derived from five generations of selfing originating from the cultivar Ben Lear. Results The genome size of V. macrocarpon has been estimated to be about 470 Mb. Genomic sequences were assembled into 229,745 scaffolds representing 420 Mbp (N50 = 4,237 bp) with 20X average coverage. The number of predicted genes was 36,364 and represents 17.7% of the assembled genome. Of the predicted genes, 30,090 were assigned to candidate genes based on homology. Genes supported by transcriptome data totaled 13,170 (36%). Conclusions Shotgun sequencing of the cranberry genome, with an average sequencing coverage of 20X, allowed efficient assembly and gene calling. The candidate genes identified represent a useful collection to further study important biochemical pathways and cellular processes and to use for marker development for breeding and the study of horticultural characteristics, such as disease resistance. PMID:24927653
Salmaso, S.; Rota, M. C.; Ciofi Degli Atti, M. L.; Tozzi, A. E.; Kreidl, P.
1999-01-01
In 1998, a series of regional cluster surveys (the ICONA Study) was conducted simultaneously in 19 out of the 20 regions in Italy to estimate the mandatory immunization coverage of children aged 12-24 months with oral poliovirus (OPV), diphtheria-tetanus (DT) and viral hepatitis B (HBV) vaccines, as well as optional immunization coverage with pertussis, measles and Haemophilus influenzae b (Hib) vaccines. The study children were born in 1996 and selected from birth registries using the Expanded Programme of Immunization (EPI) cluster sampling technique. Interviews with parents were conducted to determine each child's immunization status and the reasons for any missed or delayed vaccinations. The study population comprised 4310 children aged 12-24 months. Coverage for both mandatory and optional vaccinations differed by region. The overall coverage for mandatory vaccines (OPV, DT and HBV) exceeded 94%, but only 79% had been vaccinated in accord with the recommended schedule (i.e. during the first year of life). Immunization coverage for pertussis increased from 40% (1993 survey) to 88%, but measles coverage (56%) remained inadequate for controlling the disease; Hib coverage was 20%. These results confirm that in Italy the coverage of only mandatory immunizations is satisfactory. Pertussis immunization coverage has improved dramatically since the introduction of acellular vaccines. A greater effort to educate parents and physicians is still needed to improve the coverage of optional vaccinations in all regions. PMID:10593033
Salmaso, S; Rota, M C; Ciofi Degli Atti, M L; Tozzi, A E; Kreidl, P
1999-01-01
In 1998, a series of regional cluster surveys (the ICONA Study) was conducted simultaneously in 19 out of the 20 regions in Italy to estimate the mandatory immunization coverage of children aged 12-24 months with oral poliovirus (OPV), diphtheria-tetanus (DT) and viral hepatitis B (HBV) vaccines, as well as optional immunization coverage with pertussis, measles and Haemophilus influenzae b (Hib) vaccines. The study children were born in 1996 and selected from birth registries using the Expanded Programme of Immunization (EPI) cluster sampling technique. Interviews with parents were conducted to determine each child's immunization status and the reasons for any missed or delayed vaccinations. The study population comprised 4310 children aged 12-24 months. Coverage for both mandatory and optional vaccinations differed by region. The overall coverage for mandatory vaccines (OPV, DT and HBV) exceeded 94%, but only 79% had been vaccinated in accord with the recommended schedule (i.e. during the first year of life). Immunization coverage for pertussis increased from 40% (1993 survey) to 88%, but measles coverage (56%) remained inadequate for controlling the disease; Hib coverage was 20%. These results confirm that in Italy the coverage of only mandatory immunizations is satisfactory. Pertussis immunization coverage has improved dramatically since the introduction of acellular vaccines. A greater effort to educate parents and physicians is still needed to improve the coverage of optional vaccinations in all regions.
Salisbury, Joseph P; Sîrbulescu, Ruxandra F; Moran, Benjamin M; Auclair, Jared R; Zupanc, Günther K H; Agar, Jeffrey N
2015-03-11
The brown ghost knifefish (Apteronotus leptorhynchus) is a weakly electric teleost fish of particular interest as a versatile model system for a variety of research areas in neuroscience and biology. The comprehensive information available on the neurophysiology and neuroanatomy of this organism has enabled significant advances in such areas as the study of the neural basis of behavior, the development of adult-born neurons in the central nervous system and their involvement in the regeneration of nervous tissue, as well as brain aging and senescence. Despite substantial scientific interest in this species, no genomic resources are currently available. Here, we report the de novo assembly and annotation of the A. leptorhynchus transcriptome. After evaluating several trimming and transcript reconstruction strategies, de novo assembly using Trinity uncovered 42,459 unique contigs containing at least a partial protein-coding sequence based on alignment to a reference set of known Actinopterygii sequences. As many as 11,847 of these contigs contained full or near-full length protein sequences, providing broad coverage of the proteome. A variety of non-coding RNA sequences were also identified and annotated, including conserved long intergenic non-coding RNA and other long non-coding RNA observed previously to be expressed in adult zebrafish (Danio rerio) brain, as well as a variety of miRNA, snRNA, and snoRNA. Shotgun proteomics confirmed translation of open reading frames from over 2,000 transcripts, including alternative splice variants. Assignment of tandem mass spectra was greatly improved by use of the assembly compared to databases of sequences from closely related organisms. The assembly and raw reads have been deposited at DDBJ/EMBL/GenBank under the accession number GBKR00000000. Tandem mass spectrometry data is available via ProteomeXchange with identifier PXD001285. Presented here is the first release of an annotated de novo transcriptome assembly from Apteronotus leptorhynchus, providing a broad overview of RNA expressed in central nervous system tissue. The assembly, which includes substantial coverage of a wide variety of both protein coding and non-coding transcripts, will allow the development of better tools to understand the mechanisms underlying unique characteristics of the knifefish model system, such as their tremendous regenerative capacity and negligible brain senescence.
Neo-sex Chromosomes in the Monarch Butterfly, Danaus plexippus
Mongue, Andrew J.; Nguyen, Petr; Voleníková, Anna; Walters, James R.
2017-01-01
We report the discovery of a neo-sex chromosome in the monarch butterfly, Danaus plexippus, and several of its close relatives. Z-linked scaffolds in the D. plexippus genome assembly were identified via sex-specific differences in Illumina sequencing coverage. Additionally, a majority of the D. plexippus genome assembly was assigned to chromosomes based on counts of one-to-one orthologs relative to the butterfly Melitaea cinxia (with replication using two other lepidopteran species), in which genome scaffolds have been mapped to linkage groups. Sequencing coverage-based assessments of Z linkage combined with homology-based chromosomal assignments provided strong evidence for a Z-autosome fusion in the Danaus lineage, involving the autosome homologous to chromosome 21 in M. cinxia. Coverage analysis also identified three notable assembly errors resulting in chimeric Z-autosome scaffolds. Cytogenetic analysis further revealed a large W chromosome that is partially euchromatic, consistent with being a neo-W chromosome. The discovery of a neo-Z and the provisional assignment of chromosome linkage for >90% of D. plexippus genes lays the foundation for novel insights concerning sex chromosome evolution in this female-heterogametic model species for functional and evolutionary genomics. PMID:28839116
Application of Tandem Two-Dimensional Mass Spectrometry for Top-Down Deep Sequencing of Calmodulin.
Floris, Federico; Chiron, Lionel; Lynch, Alice M; Barrow, Mark P; Delsuc, Marc-André; O'Connor, Peter B
2018-06-04
Two-dimensional mass spectrometry (2DMS) involves simultaneous acquisition of the fragmentation patterns of all the analytes in a mixture by correlating their precursor and fragment ions by modulating precursor ions systematically through a fragmentation zone. Tandem two-dimensional mass spectrometry (MS/2DMS) unites the ultra-high accuracy of Fourier transform ion cyclotron resonance (FT-ICR) MS/MS and the simultaneous data-independent fragmentation of 2DMS to achieve extensive inter-residue fragmentation of entire proteins. 2DMS was recently developed for top-down proteomics (TDP), and applied to the analysis of calmodulin (CaM), reporting a cleavage coverage of about ~23% using infrared multiphoton dissociation (IRMPD) as fragmentation technique. The goal of this work is to expand the utility of top-down protein analysis using MS/2DMS in order to extend the cleavage coverage in top-down proteomics further into the interior regions of the protein. In this case, using MS/2DMS, the cleavage coverage of CaM increased from ~23% to ~42%. Graphical Abstract Two-dimensional mass spectrometry, when applied to primary fragment ions from the source, allows deep-sequencing of the protein calmodulin.
Camps, Carme; Petousi, Nayia; Bento, Celeste; Cario, Holger; Copley, Richard R.; McMullin, Mary Frances; van Wijk, Richard; Ratcliffe, Peter J.; Robbins, Peter A.; Taylor, Jenny C.
2016-01-01
Erythrocytosis is a rare disorder characterized by increased red cell mass and elevated hemoglobin concentration and hematocrit. Several genetic variants have been identified as causes for erythrocytosis in genes belonging to different pathways including oxygen sensing, erythropoiesis and oxygen transport. However, despite clinical investigation and screening for these mutations, the cause of disease cannot be found in a considerable number of patients, who are classified as having idiopathic erythrocytosis. In this study, we developed a targeted next-generation sequencing panel encompassing the exonic regions of 21 genes from relevant pathways (~79 Kb) and sequenced 125 patients with idiopathic erythrocytosis. The panel effectively screened 97% of coding regions of these genes, with an average coverage of 450×. It identified 51 different rare variants, all leading to alterations of protein sequence, with 57 out of 125 cases (45.6%) having at least one of these variants. Ten of these were known erythrocytosis-causing variants, which had been missed following existing diagnostic algorithms. Twenty-two were novel variants in erythrocytosis-associated genes (EGLN1, EPAS1, VHL, BPGM, JAK2, SH2B3) and in novel genes included in the panel (e.g. EPO, EGLN2, HIF3A, OS9), some with a high likelihood of functionality, for which future segregation, functional and replication studies will be useful to provide further evidence for causality. The rest were classified as polymorphisms. Overall, these results demonstrate the benefits of using a gene panel rather than existing methods in which focused genetic screening is performed depending on biochemical measurements: the gene panel improves diagnostic accuracy and provides the opportunity for discovery of novel variants. PMID:27651169
Camps, Carme; Petousi, Nayia; Bento, Celeste; Cario, Holger; Copley, Richard R; McMullin, Mary Frances; van Wijk, Richard; Ratcliffe, Peter J; Robbins, Peter A; Taylor, Jenny C
2016-11-01
Erythrocytosis is a rare disorder characterized by increased red cell mass and elevated hemoglobin concentration and hematocrit. Several genetic variants have been identified as causes for erythrocytosis in genes belonging to different pathways including oxygen sensing, erythropoiesis and oxygen transport. However, despite clinical investigation and screening for these mutations, the cause of disease cannot be found in a considerable number of patients, who are classified as having idiopathic erythrocytosis. In this study, we developed a targeted next-generation sequencing panel encompassing the exonic regions of 21 genes from relevant pathways (~79 Kb) and sequenced 125 patients with idiopathic erythrocytosis. The panel effectively screened 97% of coding regions of these genes, with an average coverage of 450×. It identified 51 different rare variants, all leading to alterations of protein sequence, with 57 out of 125 cases (45.6%) having at least one of these variants. Ten of these were known erythrocytosis-causing variants, which had been missed following existing diagnostic algorithms. Twenty-two were novel variants in erythrocytosis-associated genes (EGLN1, EPAS1, VHL, BPGM, JAK2, SH2B3) and in novel genes included in the panel (e.g. EPO, EGLN2, HIF3A, OS9), some with a high likelihood of functionality, for which future segregation, functional and replication studies will be useful to provide further evidence for causality. The rest were classified as polymorphisms. Overall, these results demonstrate the benefits of using a gene panel rather than existing methods in which focused genetic screening is performed depending on biochemical measurements: the gene panel improves diagnostic accuracy and provides the opportunity for discovery of novel variants. Copyright© Ferrata Storti Foundation.
Wongkongkathep, Piriya; Li, Huilin; Zhang, Xing; Loo, Rachel R Ogorzalek; Julian, Ryan R; Loo, Joseph A
2015-11-15
The application of ion pre-activation with 266 nm ultraviolet (UV) laser irradiation combined with electron capture dissociation (ECD) is demonstrated to enhance top-down mass spectrometry sequence coverage of disulfide bond containing proteins. UV-based activation can homolytically cleave a disulfide bond to yield two separated thiol radicals. Activated ECD experiments of insulin and ribonuclease A containing three and four disulfide bonds, respectively, were performed. UV-activation in combination with ECD allowed the three disulfide bonds of insulin to be cleaved and the overall sequence coverage to be increased. For the larger sized ribonuclease A with four disulfide bonds, irradiation from an infrared laser (10.6 µm) to disrupt non-covalent interactions was combined with UV-activation to facilitate the cleavage of up to three disulfide bonds. Preferences for disulfide bond cleavage are dependent on protein structure and sequence. Disulfide bonds can reform if the generated radicals remain in close proximity. By varying the time delay between the UV-activation and the ECD events, it was determined that disulfide bonds reform within 10-100 msec after their UV-homolytic cleavage.
García-Chequer, A.J.; Méndez-Tenorio, A.; Olguín-Ruiz, G.; Sánchez-Vallejo, C.; Isa, P.; Arias, C.F.; Torres, J.; Hernández-Angeles, A.; Ramírez-Ortiz, M.A.; Lara, C.; Cabrera-Muñoz, M.L.; Sadowinski-Pine, S.; Bravo-Ortiz, J.C.; Ramón-García, G.; Diegopérez-Ramírez, J.; Ramírez-Reyes, G.; Casarrubias-Islas, R.; Ramírez, J.; Orjuela, M.A.; Ponce-Castañeda, M.V.
2016-01-01
Genes are frequently lost or gained in malignant tumors and the analysis of these changes can be informative about the underlying tumor biology. Retinoblastoma is a pediatric intraocular malignancy, and since deletions in chromosome 13 have been described in this tumor, we performed genome wide sequencing with the Illumina platform to test whether recurrent losses could be detected in low coverage data from DNA pools of Rb cases. An in silico reference profile for each pool was created from the human genome sequence GRCh37p5; a chromosome integrity score and a graphics 40 Kb window analysis approach, allowed us to identify with high resolution previously reported non random recurrent losses in all chromosomes of these tumors. We also found a pattern of gains and losses associated to clear and dark cytogenetic bands respectively. We further analyze a pool of medulloblastoma and found a more stable genomic profile and previously reported losses in this tumor. This approach facilitates identification of recurrent deletions from many patients that may be biological relevant for tumor development. PMID:26883451
The Protein Information Resource: an integrated public resource of functional annotation of proteins
Wu, Cathy H.; Huang, Hongzhan; Arminski, Leslie; Castro-Alvear, Jorge; Chen, Yongxing; Hu, Zhang-Zhi; Ledley, Robert S.; Lewis, Kali C.; Mewes, Hans-Werner; Orcutt, Bruce C.; Suzek, Baris E.; Tsugita, Akira; Vinayaka, C. R.; Yeh, Lai-Su L.; Zhang, Jian; Barker, Winona C.
2002-01-01
The Protein Information Resource (PIR) serves as an integrated public resource of functional annotation of protein data to support genomic/proteomic research and scientific discovery. The PIR, in collaboration with the Munich Information Center for Protein Sequences (MIPS) and the Japan International Protein Information Database (JIPID), produces the PIR-International Protein Sequence Database (PSD), the major annotated protein sequence database in the public domain, containing about 250 000 proteins. To improve protein annotation and the coverage of experimentally validated data, a bibliography submission system is developed for scientists to submit, categorize and retrieve literature information. Comprehensive protein information is available from iProClass, which includes family classification at the superfamily, domain and motif levels, structural and functional features of proteins, as well as cross-references to over 40 biological databases. To provide timely and comprehensive protein data with source attribution, we have introduced a non-redundant reference protein database, PIR-NREF. The database consists of about 800 000 proteins collected from PIR-PSD, SWISS-PROT, TrEMBL, GenPept, RefSeq and PDB, with composite protein names and literature data. To promote database interoperability, we provide XML data distribution and open database schema, and adopt common ontologies. The PIR web site (http://pir.georgetown.edu/) features data mining and sequence analysis tools for information retrieval and functional identification of proteins based on both sequence and annotation information. The PIR databases and other files are also available by FTP (ftp://nbrfa.georgetown.edu/pir_databases). PMID:11752247
Expanded microbial genome coverage and improved protein family annotation in the COG database
Galperin, Michael Y.; Makarova, Kira S.; Wolf, Yuri I.; Koonin, Eugene V.
2015-01-01
Microbial genome sequencing projects produce numerous sequences of deduced proteins, only a small fraction of which have been or will ever be studied experimentally. This leaves sequence analysis as the only feasible way to annotate these proteins and assign to them tentative functions. The Clusters of Orthologous Groups of proteins (COGs) database (http://www.ncbi.nlm.nih.gov/COG/), first created in 1997, has been a popular tool for functional annotation. Its success was largely based on (i) its reliance on complete microbial genomes, which allowed reliable assignment of orthologs and paralogs for most genes; (ii) orthology-based approach, which used the function(s) of the characterized member(s) of the protein family (COG) to assign function(s) to the entire set of carefully identified orthologs and describe the range of potential functions when there were more than one; and (iii) careful manual curation of the annotation of the COGs, aimed at detailed prediction of the biological function(s) for each COG while avoiding annotation errors and overprediction. Here we present an update of the COGs, the first since 2003, and a comprehensive revision of the COG annotations and expansion of the genome coverage to include representative complete genomes from all bacterial and archaeal lineages down to the genus level. This re-analysis of the COGs shows that the original COG assignments had an error rate below 0.5% and allows an assessment of the progress in functional genomics in the past 12 years. During this time, functions of many previously uncharacterized COGs have been elucidated and tentative functional assignments of many COGs have been validated, either by targeted experiments or through the use of high-throughput methods. A particularly important development is the assignment of functions to several widespread, conserved proteins many of which turned out to participate in translation, in particular rRNA maturation and tRNA modification. The new version of the COGs is expected to become an important tool for microbial genomics. PMID:25428365
Fernandez-Valverde, Selene L; Calcino, Andrew D; Degnan, Bernard M
2015-05-15
The demosponge Amphimedon queenslandica is amongst the few early-branching metazoans with an assembled and annotated draft genome, making it an important species in the study of the origin and early evolution of animals. Current gene models in this species are largely based on in silico predictions and low coverage expressed sequence tag (EST) evidence. Amphimedon queenslandica protein-coding gene models are improved using deep RNA-Seq data from four developmental stages and CEL-Seq data from 82 developmental samples. Over 86% of previously predicted genes are retained in the new gene models, although 24% have additional exons; there is also a marked increase in the total number of annotated 3' and 5' untranslated regions (UTRs). Importantly, these new developmental transcriptome data reveal numerous previously unannotated protein-coding genes in the Amphimedon genome, increasing the total gene number by 25%, from 30,060 to 40,122. In general, Amphimedon genes have introns that are markedly smaller than those in other animals and most of the alternatively spliced genes in Amphimedon undergo intron-retention; exon-skipping is the least common mode of alternative splicing. Finally, in addition to canonical polyadenylation signal sequences, Amphimedon genes are enriched in a number of unique AT-rich motifs in their 3' UTRs. The inclusion of developmental transcriptome data has substantially improved the structure and composition of protein-coding gene models in Amphimedon queenslandica, providing a more accurate and comprehensive set of genes for functional and comparative studies. These improvements reveal the Amphimedon genome is comprised of a remarkably high number of tightly packed genes. These genes have small introns and there is pervasive intron retention amongst alternatively spliced transcripts. These aspects of the sponge genome are more similar unicellular opisthokont genomes than to other animal genomes.
Minneci, Federico; Piovesan, Damiano; Cozzetto, Domenico; Jones, David T.
2013-01-01
To understand fully cell behaviour, biologists are making progress towards cataloguing the functional elements in the human genome and characterising their roles across a variety of tissues and conditions. Yet, functional information – either experimentally validated or computationally inferred by similarity – remains completely missing for approximately 30% of human proteins. FFPred was initially developed to bridge this gap by targeting sequences with distant or no homologues of known function and by exploiting clear patterns of intrinsic disorder associated with particular molecular activities and biological processes. Here, we present an updated and improved version, which builds on larger datasets of protein sequences and annotations, and uses updated component feature predictors as well as revised training procedures. FFPred 2.0 includes support vector regression models for the prediction of 442 Gene Ontology (GO) terms, which largely expand the coverage of the ontology and of the biological process category in particular. The GO term list mainly revolves around macromolecular interactions and their role in regulatory, signalling, developmental and metabolic processes. Benchmarking experiments on newly annotated proteins show that FFPred 2.0 provides more accurate functional assignments than its predecessor and the ProtFun server do; also, its assignments can complement information obtained using BLAST-based transfer of annotations, improving especially prediction in the biological process category. Furthermore, FFPred 2.0 can be used to annotate proteins belonging to several eukaryotic organisms with a limited decrease in prediction quality. We illustrate all these points through the use of both precision-recall plots and of the COGIC scores, which we recently proposed as an alternative numerical evaluation measure of function prediction accuracy. PMID:23717476
Ahanhanzo, Y Glèlè; Palenfo, D; Saussier, C; Gbèdonou, P; Tonda, A; Da Silva, A; Aplogan, A
2016-08-01
Within the framework of its strategic goal of vaccine coverage (VC) improvement, GAVI, The Vaccine Alliance has entrusted the Agence de médecine préventive (agency for preventive medicine, AMP) with technical assistance services to Cameroon, Cote d'Ivoire (Ivory Coast), and Mauritania. This support was provided to selected priority districts (PDs) with the worst Penta3 coverage performances. In 2014, PDs benefited from technical and management capacities in vaccinology strengthening for district medical officers, supportive supervisions and technical assistance in health logistics, data management and quality. We analyzed the effects of the AMP technical assistance on the improvement of the cumulative Penta3 coverage, which is the key performance indicator of the expanded programme on immunization (EPI) performance. We compared Penta3 coverage between PDs and other non-priority districts (NPDs), Penta3 coverage evolution within each PD, and the distribution of PDs and NPDs according to Penta3 coverage category between January and December 2014. Technical assistance had a positive effect on the EPI performance. Indeed Penta3 coverage progression was higher in PDs than in NPDs throughout the period. Besides, between January and December 2014, the Penta3 VC increased in 70%, 100% and 86% of DPs in Cameroon, Côte d'Ivoire and Mauritania, respectively. Furthermore, the increase in the number of PDs with a Penta3 coverage over 80% was higher in DPs than in NPDs: 20% versus 8% for Cameroon, 58% versus 29% for Côte d'Ivoire and 17% versus 8% for Mauritania. Despite positive and encouraging results, this technical assistance service can be improved and efforts are needed to ensure that all health districts have a VC above 80% for all EPI vaccines. The current challenge is for African countries to mobilize resources for maintaining the knowledge and benefits and scaling such interventions in the public health area.
Wu, Chengcang; Proestou, Dina; Carter, Dorothy; Nicholson, Erica; Santos, Filippe; Zhao, Shaying; Zhang, Hong-Bin; Goldsmith, Marian R
2009-01-01
Background Manduca sexta, Heliothis virescens, and Heliconius erato represent three widely-used insect model species for genomic and fundamental studies in Lepidoptera. Large-insert BAC libraries of these insects are critical resources for many molecular studies, including physical mapping and genome sequencing, but not available to date. Results We report the construction and characterization of six large-insert BAC libraries for the three species and sampling sequence analysis of the genomes. The six BAC libraries were constructed with two restriction enzymes, two libraries for each species, and each has an average clone insert size ranging from 152–175 kb. We estimated that the genome coverage of each library ranged from 6–9 ×, with the two combined libraries of each species being equivalent to 13.0–16.3 × haploid genomes. The genome coverage, quality and utility of the libraries were further confirmed by library screening using 6~8 putative single-copy probes. To provide a first glimpse into these genomes, we sequenced and analyzed the BAC ends of ~200 clones randomly selected from the libraries of each species. The data revealed that the genomes are AT-rich, contain relatively small fractions of repeat elements with a majority belonging to the category of low complexity repeats, and are more abundant in retro-elements than DNA transposons. Among the species, the H. erato genome is somewhat more abundant in repeat elements and simple repeats than those of M. sexta and H. virescens. The BLAST analysis of the BAC end sequences suggested that the evolution of the three genomes is widely varied, with the genome of H. virescens being the most conserved as a typical lepidopteran, whereas both genomes of H. erato and M. sexta appear to have evolved significantly, resulting in a higher level of species- or evolutionary lineage-specific sequences. Conclusion The high-quality and large-insert BAC libraries of the insects, together with the identified BACs containing genes of interest, provide valuable information, resources and tools for comprehensive understanding and studies of the insect genomes and for addressing many fundamental questions in Lepidoptera. The sample of the genomic sequences provides the first insight into the constitution and evolution of the insect genomes. PMID:19558662
Ammonium sulfate and MALDI in-source decay: a winning combination for sequencing peptides
Delvolve, Alice; Woods, Amina S.
2009-01-01
In previous papers we highlighted the role of ammonium sulfate in increasing peptide fragmentation by in source decay (ISD). The current work systematically investigated effects of MALDI extraction delay, peptide amino acid composition, matrix and ammonium sulfate concentration on peptides ISD fragmentation. The data confirmed that ammonium sulfate increased peptides signal to noise ratio as well as their in source fragmentation resulting in complete sequence coverage regardless of the amino acid composition. This method is easy, inexpensive and generates the peptides sequence instantly. PMID:19877641
Metagenomic Taxonomy-Guided Database-Searching Strategy for Improving Metaproteomic Analysis.
Xiao, Jinqiu; Tanca, Alessandro; Jia, Ben; Yang, Runqing; Wang, Bo; Zhang, Yu; Li, Jing
2018-04-06
Metaproteomics provides a direct measure of the functional information by investigating all proteins expressed by a microbiota. However, due to the complexity and heterogeneity of microbial communities, it is very hard to construct a sequence database suitable for a metaproteomic study. Using a public database, researchers might not be able to identify proteins from poorly characterized microbial species, while a sequencing-based metagenomic database may not provide adequate coverage for all potentially expressed protein sequences. To address this challenge, we propose a metagenomic taxonomy-guided database-search strategy (MT), in which a merged database is employed, consisting of both taxonomy-guided reference protein sequences from public databases and proteins from metagenome assembly. By applying our MT strategy to a mock microbial mixture, about two times as many peptides were detected as with the metagenomic database only. According to the evaluation of the reliability of taxonomic attribution, the rate of misassignments was comparable to that obtained using an a priori matched database. We also evaluated the MT strategy with a human gut microbial sample, and we found 1.7 times as many peptides as using a standard metagenomic database. In conclusion, our MT strategy allows the construction of databases able to provide high sensitivity and precision in peptide identification in metaproteomic studies, enabling the detection of proteins from poorly characterized species within the microbiota.
NASA Astrophysics Data System (ADS)
Hersch, Roger David; Crété, Frédérique
2004-12-01
Dot gain is different when dots are printed alone, printed in superposition with one ink or printed in superposition with two inks. In addition, the dot gain may also differ depending on which solid ink the considered halftone layer is superposed. In a previous research project, we developed a model for computing the effective surface coverage of a dot according to its superposition conditions. In the present contribution, we improve the Yule-Nielsen modified Neugebauer model by integrating into it our effective dot surface coverage computation model. Calibration of the reproduction curves mapping nominal to effective surface coverages in every superposition condition is carried out by fitting effective dot surfaces which minimize the sum of square differences between the measured reflection density spectra and reflection density spectra predicted according to the Yule-Nielsen modified Neugebauer model. In order to predict the reflection spectrum of a patch, its known nominal surface coverage values are converted into effective coverage values by weighting the contributions from different reproduction curves according to the weights of the contributing superposition conditions. We analyze the colorimetric prediction improvement brought by our extended dot surface coverage model for clustered-dot offset prints, thermal transfer prints and ink-jet prints. The color differences induced by the differences between measured reflection spectra and reflection spectra predicted according to the new dot surface estimation model are quantified on 729 different cyan, magenta, yellow patches covering the full color gamut. As a reference, these differences are also computed for the classical Yule-Nielsen modified spectral Neugebauer model incorporating a single halftone reproduction curve for each ink. Taking into account dot surface coverages according to different superposition conditions considerably improves the predictions of the Yule-Nielsen modified Neugebauer model. In the case of offset prints, the mean difference between predictions and measurements expressed in CIE-LAB CIE-94 ΔE94 values is reduced at 100 lpi from 1.54 to 0.90 (accuracy improvement factor: 1.7) and at 150 lpi it is reduced from 1.87 to 1.00 (accuracy improvement factor: 1.8). Similar improvements have been observed for a thermal transfer printer at 600 dpi, at lineatures of 50 and 75 lpi. In the case of an ink-jet printer at 600 dpi, the mean ΔE94 value is reduced at 75 lpi from 3.03 to 0.90 (accuracy improvement factor: 3.4) and at 100 lpi from 3.08 to 0.91 (accuracy improvement factor: 3.4).
NASA Astrophysics Data System (ADS)
Hersch, Roger David; Crete, Frederique
2005-01-01
Dot gain is different when dots are printed alone, printed in superposition with one ink or printed in superposition with two inks. In addition, the dot gain may also differ depending on which solid ink the considered halftone layer is superposed. In a previous research project, we developed a model for computing the effective surface coverage of a dot according to its superposition conditions. In the present contribution, we improve the Yule-Nielsen modified Neugebauer model by integrating into it our effective dot surface coverage computation model. Calibration of the reproduction curves mapping nominal to effective surface coverages in every superposition condition is carried out by fitting effective dot surfaces which minimize the sum of square differences between the measured reflection density spectra and reflection density spectra predicted according to the Yule-Nielsen modified Neugebauer model. In order to predict the reflection spectrum of a patch, its known nominal surface coverage values are converted into effective coverage values by weighting the contributions from different reproduction curves according to the weights of the contributing superposition conditions. We analyze the colorimetric prediction improvement brought by our extended dot surface coverage model for clustered-dot offset prints, thermal transfer prints and ink-jet prints. The color differences induced by the differences between measured reflection spectra and reflection spectra predicted according to the new dot surface estimation model are quantified on 729 different cyan, magenta, yellow patches covering the full color gamut. As a reference, these differences are also computed for the classical Yule-Nielsen modified spectral Neugebauer model incorporating a single halftone reproduction curve for each ink. Taking into account dot surface coverages according to different superposition conditions considerably improves the predictions of the Yule-Nielsen modified Neugebauer model. In the case of offset prints, the mean difference between predictions and measurements expressed in CIE-LAB CIE-94 ΔE94 values is reduced at 100 lpi from 1.54 to 0.90 (accuracy improvement factor: 1.7) and at 150 lpi it is reduced from 1.87 to 1.00 (accuracy improvement factor: 1.8). Similar improvements have been observed for a thermal transfer printer at 600 dpi, at lineatures of 50 and 75 lpi. In the case of an ink-jet printer at 600 dpi, the mean ΔE94 value is reduced at 75 lpi from 3.03 to 0.90 (accuracy improvement factor: 3.4) and at 100 lpi from 3.08 to 0.91 (accuracy improvement factor: 3.4).
Nacheva, Elizabeth; Mokretar, Katya; Soenmez, Aynur; Pittman, Alan M; Grace, Colin; Valli, Roberto; Ejaz, Ayesha; Vattathil, Selina; Maserati, Emanuela; Houlden, Henry; Taanman, Jan-Willem; Schapira, Anthony H; Proukakis, Christos
2017-01-01
Potential bias introduced during DNA isolation is inadequately explored, although it could have significant impact on downstream analysis. To investigate this in human brain, we isolated DNA from cerebellum and frontal cortex using spin columns under different conditions, and salting-out. We first analysed DNA using array CGH, which revealed a striking wave pattern suggesting primarily GC-rich cerebellar losses, even against matched frontal cortex DNA, with a similar pattern on a SNP array. The aCGH changes varied with the isolation protocol. Droplet digital PCR of two genes also showed protocol-dependent losses. Whole genome sequencing showed GC-dependent variation in coverage with spin column isolation from cerebellum. We also extracted and sequenced DNA from substantia nigra using salting-out and phenol / chloroform. The mtDNA copy number, assessed by reads mapping to the mitochondrial genome, was higher in substantia nigra when using phenol / chloroform. We thus provide evidence for significant method-dependent bias in DNA isolation from human brain, as reported in rat tissues. This may contribute to array "waves", and could affect copy number determination, particularly if mosaicism is being sought, and sequencing coverage. Variations in isolation protocol may also affect apparent mtDNA abundance.
Nacheva, Elizabeth; Mokretar, Katya; Soenmez, Aynur; Pittman, Alan M.; Grace, Colin; Valli, Roberto; Ejaz, Ayesha; Vattathil, Selina; Maserati, Emanuela; Houlden, Henry; Taanman, Jan-Willem; Schapira, Anthony H.
2017-01-01
Potential bias introduced during DNA isolation is inadequately explored, although it could have significant impact on downstream analysis. To investigate this in human brain, we isolated DNA from cerebellum and frontal cortex using spin columns under different conditions, and salting-out. We first analysed DNA using array CGH, which revealed a striking wave pattern suggesting primarily GC-rich cerebellar losses, even against matched frontal cortex DNA, with a similar pattern on a SNP array. The aCGH changes varied with the isolation protocol. Droplet digital PCR of two genes also showed protocol-dependent losses. Whole genome sequencing showed GC-dependent variation in coverage with spin column isolation from cerebellum. We also extracted and sequenced DNA from substantia nigra using salting-out and phenol / chloroform. The mtDNA copy number, assessed by reads mapping to the mitochondrial genome, was higher in substantia nigra when using phenol / chloroform. We thus provide evidence for significant method-dependent bias in DNA isolation from human brain, as reported in rat tissues. This may contribute to array “waves”, and could affect copy number determination, particularly if mosaicism is being sought, and sequencing coverage. Variations in isolation protocol may also affect apparent mtDNA abundance. PMID:28683077
NASA Astrophysics Data System (ADS)
Basyuni, M.; Sulistiyono, N.; Wati, R.; Sumardi; Oku, H.; Baba, S.; Sagami, H.
2018-03-01
Cloning of Kandelia obovata KcCAS gene (previously known as Kandelia candel) and Rhizophora stylosa RsCAS have already have been reported and encoded cycloartenol synthases. In this study, the predicted KcCAS and RsCAS protein were analyzed using online software of Phyre2 and Swiss-model. The protein modelling for KcCAS and RsCAS cycloartenol synthases was determined using Pyre2 had similar results with slightly different in sequence identity. By contrast, the Swiss-model for KcCAS slightly had higher sequence identity (47.31%) and Qmean (0.70) compared to RsCAS. No difference of ligands binding site which is considered as modulators for both cycloartenol synthases. The range of predicted protein derived from 91-757 amino acid residues with coverage sequence similarities 0.86, respectively from template model of lanosterol synthase from the human. Homology modelling revealed that 706 residues (93% of the amino acid sequence) had been modelled with 100.0% confidence by the single highest scoring template for both KcCAS and RsCAS using Phyre2. This coverage was more elevated than swiss-model predicted (86%). The present study suggested that both genes are responsible for the genesis of cycloartenol in these mangrove plants.
Paliwoda, Rebecca E; Li, Feng; Reid, Michael S; Lin, Yanwen; Le, X Chris
2014-06-17
Functionalizing nanomaterials for diverse analytical, biomedical, and therapeutic applications requires determination of surface coverage (or density) of DNA on nanomaterials. We describe a sequential strand displacement beacon assay that is able to quantify specific DNA sequences conjugated or coconjugated onto gold nanoparticles (AuNPs). Unlike the conventional fluorescence assay that requires the target DNA to be fluorescently labeled, the sequential strand displacement beacon method is able to quantify multiple unlabeled DNA oligonucleotides using a single (universal) strand displacement beacon. This unique feature is achieved by introducing two short unlabeled DNA probes for each specific DNA sequence and by performing sequential DNA strand displacement reactions. Varying the relative amounts of the specific DNA sequences and spacing DNA sequences during their coconjugation onto AuNPs results in different densities of the specific DNA on AuNP, ranging from 90 to 230 DNA molecules per AuNP. Results obtained from our sequential strand displacement beacon assay are consistent with those obtained from the conventional fluorescence assays. However, labeling of DNA with some fluorescent dyes, e.g., tetramethylrhodamine, alters DNA density on AuNP. The strand displacement strategy overcomes this problem by obviating direct labeling of the target DNA. This method has broad potential to facilitate more efficient design and characterization of novel multifunctional materials for diverse applications.
Salis, R. K.; Bruder, A.; Piggott, J. J.; Summerfield, T. C.; Matthaei, C. D.
2017-01-01
Disentangling the individual and interactive effects of multiple stressors on microbial communities is a key challenge to our understanding and management of ecosystems. Advances in molecular techniques allow studying microbial communities in situ and with high taxonomic resolution. However, the taxonomic level which provides the best trade-off between our ability to detect multiple-stressor effects versus the goal of studying entire communities remains unknown. We used outdoor mesocosms simulating small streams to investigate the effects of four agricultural stressors (nutrient enrichment, the nitrification inhibitor dicyandiamide (DCD), fine sediment and flow velocity reduction) on stream bacteria (phyla, orders, genera, and species represented by Operational Taxonomic Units with 97% sequence similarity). Community composition was assessed using amplicon sequencing (16S rRNA gene, V3-V4 region). DCD was the most pervasive stressor, affecting evenness and most abundant taxa, followed by sediment and flow velocity. Stressor pervasiveness was similar across taxonomic levels and lower levels did not perform better in detecting stressor effects. Community coverage decreased from 96% of all sequences for abundant phyla to 28% for species. Order-level responses were generally representative of responses of corresponding genera and species, suggesting that this level may represent the best compromise between stressor sensitivity and coverage of bacterial communities. PMID:28327636
A high-coverage draft genome of the mycalesine butterfly Bicyclus anynana.
Nowell, Reuben W; Elsworth, Ben; Oostra, Vicencio; Zwaan, Bas J; Wheat, Christopher W; Saastamoinen, Marjo; Saccheri, Ilik J; Van't Hof, Arjen E; Wasik, Bethany R; Connahs, Heidi; Aslam, Muhammad L; Kumar, Sujai; Challis, Richard J; Monteiro, Antónia; Brakefield, Paul M; Blaxter, Mark
2017-07-01
The mycalesine butterfly Bicyclus anynana, the "Squinting bush brown," is a model organism in the study of lepidopteran ecology, development, and evolution. Here, we present a draft genome sequence for B. anynana to serve as a genomics resource for current and future studies of this important model species. Seven libraries with insert sizes ranging from 350 bp to 20 kb were constructed using DNA from an inbred female and sequenced using both Illumina and PacBio technology; 128 Gb of raw Illumina data was filtered to 124 Gb and assembled to a final size of 475 Mb (∼×260 assembly coverage). Contigs were scaffolded using mate-pair, transcriptome, and PacBio data into 10 800 sequences with an N50 of 638 kb (longest scaffold 5 Mb). The genome is comprised of 26% repetitive elements and encodes a total of 22 642 predicted protein-coding genes. Recovery of a BUSCO set of core metazoan genes was almost complete (98%). Overall, these metrics compare well with other recently published lepidopteran genomes. We report a high-quality draft genome sequence for Bicyclus anynana. The genome assembly and annotated gene models are available at LepBase (http://ensembl.lepbase.org/index.html). © The Authors 2017. Published by Oxford University Press.
A high-coverage draft genome of the mycalesine butterfly Bicyclus anynana
Elsworth, Ben; Oostra, Vicencio; Zwaan, Bas J.; Wheat, Christopher W.; Saastamoinen, Marjo; Saccheri, Ilik J.; van’t Hof, Arjen E.; Wasik, Bethany R.; Connahs, Heidi; Aslam, Muhammad L.; Kumar, Sujai; Challis, Richard J.; Monteiro, Antónia; Brakefield, Paul M.
2017-01-01
Abstract The mycalesine butterfly Bicyclus anynana, the “Squinting bush brown,” is a model organism in the study of lepidopteran ecology, development, and evolution. Here, we present a draft genome sequence for B. anynana to serve as a genomics resource for current and future studies of this important model species. Seven libraries with insert sizes ranging from 350 bp to 20 kb were constructed using DNA from an inbred female and sequenced using both Illumina and PacBio technology; 128 Gb of raw Illumina data was filtered to 124 Gb and assembled to a final size of 475 Mb (∼×260 assembly coverage). Contigs were scaffolded using mate-pair, transcriptome, and PacBio data into 10 800 sequences with an N50 of 638 kb (longest scaffold 5 Mb). The genome is comprised of 26% repetitive elements and encodes a total of 22 642 predicted protein-coding genes. Recovery of a BUSCO set of core metazoan genes was almost complete (98%). Overall, these metrics compare well with other recently published lepidopteran genomes. We report a high-quality draft genome sequence for Bicyclus anynana. The genome assembly and annotated gene models are available at LepBase (http://ensembl.lepbase.org/index.html). PMID:28486658
Coverage Bias and Sensitivity of Variant Calling for Four Whole-genome Sequencing Technologies
Lasitschka, Bärbel; Jones, David; Northcott, Paul; Hutter, Barbara; Jäger, Natalie; Kool, Marcel; Taylor, Michael; Lichter, Peter; Pfister, Stefan; Wolf, Stephan; Brors, Benedikt; Eils, Roland
2013-01-01
The emergence of high-throughput, next-generation sequencing technologies has dramatically altered the way we assess genomes in population genetics and in cancer genomics. Currently, there are four commonly used whole-genome sequencing platforms on the market: Illumina’s HiSeq2000, Life Technologies’ SOLiD 4 and its completely redesigned 5500xl SOLiD, and Complete Genomics’ technology. A number of earlier studies have compared a subset of those sequencing platforms or compared those platforms with Sanger sequencing, which is prohibitively expensive for whole genome studies. Here we present a detailed comparison of the performance of all currently available whole genome sequencing platforms, especially regarding their ability to call SNVs and to evenly cover the genome and specific genomic regions. Unlike earlier studies, we base our comparison on four different samples, allowing us to assess the between-sample variation of the platforms. We find a pronounced GC bias in GC-rich regions for Life Technologies’ platforms, with Complete Genomics performing best here, while we see the least bias in GC-poor regions for HiSeq2000 and 5500xl. HiSeq2000 gives the most uniform coverage and displays the least sample-to-sample variation. In contrast, Complete Genomics exhibits by far the smallest fraction of bases not covered, while the SOLiD platforms reveal remarkable shortcomings, especially in covering CpG islands. When comparing the performance of the four platforms for calling SNPs, HiSeq2000 and Complete Genomics achieve the highest sensitivity, while the SOLiD platforms show the lowest false positive rate. Finally, we find that integrating sequencing data from different platforms offers the potential to combine the strengths of different technologies. In summary, our results detail the strengths and weaknesses of all four whole-genome sequencing platforms. It indicates application areas that call for a specific sequencing platform and disallow other platforms. This helps to identify the proper sequencing platform for whole genome studies with different application scopes. PMID:23776689
Genomic treasure troves: complete genome sequencing of herbarium and insect museum specimens.
Staats, Martijn; Erkens, Roy H J; van de Vossenberg, Bart; Wieringa, Jan J; Kraaijeveld, Ken; Stielow, Benjamin; Geml, József; Richardson, James E; Bakker, Freek T
2013-01-01
Unlocking the vast genomic diversity stored in natural history collections would create unprecedented opportunities for genome-scale evolutionary, phylogenetic, domestication and population genomic studies. Many researchers have been discouraged from using historical specimens in molecular studies because of both generally limited success of DNA extraction and the challenges associated with PCR-amplifying highly degraded DNA. In today's next-generation sequencing (NGS) world, opportunities and prospects for historical DNA have changed dramatically, as most NGS methods are actually designed for taking short fragmented DNA molecules as templates. Here we show that using a standard multiplex and paired-end Illumina sequencing approach, genome-scale sequence data can be generated reliably from dry-preserved plant, fungal and insect specimens collected up to 115 years ago, and with minimal destructive sampling. Using a reference-based assembly approach, we were able to produce the entire nuclear genome of a 43-year-old Arabidopsis thaliana (Brassicaceae) herbarium specimen with high and uniform sequence coverage. Nuclear genome sequences of three fungal specimens of 22-82 years of age (Agaricus bisporus, Laccaria bicolor, Pleurotus ostreatus) were generated with 81.4-97.9% exome coverage. Complete organellar genome sequences were assembled for all specimens. Using de novo assembly we retrieved between 16.2-71.0% of coding sequence regions, and hence remain somewhat cautious about prospects for de novo genome assembly from historical specimens. Non-target sequence contaminations were observed in 2 of our insect museum specimens. We anticipate that future museum genomics projects will perhaps not generate entire genome sequences in all cases (our specimens contained relatively small and low-complexity genomes), but at least generating vital comparative genomic data for testing (phylo)genetic, demographic and genetic hypotheses, that become increasingly more horizontal. Furthermore, NGS of historical DNA enables recovering crucial genetic information from old type specimens that to date have remained mostly unutilized and, thus, opens up a new frontier for taxonomic research as well.
Zhou, Yuqing; Xing, Yi; Liang, Xiaofeng; Yue, Chenyan; Zhu, Xu; Hipgrave, David
2016-01-01
Objective To evaluate interventions to improve routine vaccination coverage and caregiver knowledge in China's remote west, where routine immunisation is relatively weak. Design Prospective pre–post (2006–2010) evaluation in project counties; retrospective comparison based on 2004 administrative data at baseline and surveyed post-intervention (2010) data in selected non-project counties. Setting Four project counties and one non-project county in each of four provinces. Participants 3390 children in project counties at baseline, and 3299 in project and 830 in non-project counties post-intervention; and 3279 caregivers at baseline, and 3389 in project and 830 in non-project counties post-intervention. Intervention Multicomponent inexpensive knowledge-strengthening and service-strengthening and innovative, multisectoral engagement. Data collection Standard 30-cluster household surveys of vaccine coverage and caregiver interviews pre-intervention and post-intervention in each project county. Similar surveys in one non-project county selected by local authorities in each province post-intervention. Administrative data on vaccination coverage in non-project counties at baseline. Primary outcome measures Changes in vaccine coverage between baseline and project completion (2010); comparative caregiver knowledge in all counties in 2010. Analysis Crude (χ2) analysis of changes and differences in vaccination coverage and related knowledge. Multiple logistic regression to assess associations with timely coverage. Results Timely coverage of four routine vaccines increased by 21% (p<0.001) and hepatitis B (HepB) birth dose by 35% (p<0.001) over baseline in project counties. Comparison with non-project counties revealed secular improvement in most provinces, except new vaccine coverage was mostly higher in project counties. Ethnicity, province, birthplace, vaccination site, dual-parental out-migration and parental knowledge had significant associations with coverage. Knowledge increased for all variables but one in project counties (highest p<0.05) and was substantially higher than in non-project counties (p<0.01). Conclusions Comprehensive but inexpensive strategies improved vaccination coverage and caretaker knowledge in western China. Establishing multisectoral leadership, involving the education sector and including immunisation in public-sector performance standards, are affordable and effective interventions. PMID:26966053
ERIC Educational Resources Information Center
Wright, Richard A.
1987-01-01
Compares coverage of women and crime in sociology journals published from 1956-1960 and from 1976-1980 with coverage in criminology texts published from 1956-1965 and from 1976-1985. Data shows despite recent increases in publication of research in journals, criminology texts show little improvement in their coverage of women and crime. Concludes…
Using lot quality assurance sampling to improve immunization coverage in Bangladesh.
Tawfik, Y.; Hoque, S.; Siddiqi, M.
2001-01-01
OBJECTIVE: To determine areas of low vaccination coverage in five cities in Bangladesh (Chittagong, Dhaka, Khulna, Rajshahi, and Syedpur). METHODS: Six studies using lot quality assurance sampling were conducted between 1995 and 1997 by Basic Support for Institutionalizing Child Survival and the Bangladesh National Expanded Programme on Immunization. FINDINGS: BCG vaccination coverage was acceptable in all lots studied; however, the proportion of lots rejected because coverage of measles vaccination was low ranged from 0% of lots in Syedpur to 12% in Chittagong and 20% in Dhaka's zones 7 and 8. The proportion of lots rejected because an inadequate number of children in the sample had been fully vaccinated varied from 11% in Syedpur to 30% in Dhaka. Additionally, analysis of aggregated, weighted immunization coverage showed that there was a high BCG vaccination coverage (the first administered vaccine) and a low measles vaccination coverage (the last administered vaccine) indicating a high drop-out rate, ranging from 14% in Syedpur to 36% in Dhaka's zone 8. CONCLUSION: In Bangladesh, where resources are limited, results from surveys using lot quality assurance sampling enabled managers of the National Expanded Programme on Immunization to identify areas with poor vaccination coverage. Those areas were targeted to receive focused interventions to improve coverage. Since this sampling method requires only a small sample size and was easy for staff to use, it is feasible for routine monitoring of vaccination coverage. PMID:11436470
Identification of mutated driver pathways in cancer using a multi-objective optimization model.
Zheng, Chun-Hou; Yang, Wu; Chong, Yan-Wen; Xia, Jun-Feng
2016-05-01
New-generation high-throughput technologies, including next-generation sequencing technology, have been extensively applied to solve biological problems. As a result, large cancer genomics projects such as the Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium are producing large amount of rich and diverse data in multiple cancer types. The identification of mutated driver genes and driver pathways from these data is a significant challenge. Genome aberrations in cancer cells can be divided into two types: random 'passenger mutation' and functional 'driver mutation'. In this paper, we introduced a Multi-objective Optimization model based on a Genetic Algorithm (MOGA) to solve the maximum weight submatrix problem, which can be employed to identify driver genes and driver pathways promoting cancer proliferation. The maximum weight submatrix problem defined to find mutated driver pathways is based on two specific properties, i.e., high coverage and high exclusivity. The multi-objective optimization model can adjust the trade-off between high coverage and high exclusivity. We proposed an integrative model by combining gene expression data and mutation data to improve the performance of the MOGA algorithm in a biological context. Copyright © 2016 Elsevier Ltd. All rights reserved.
Association mapping identifies loci for canopy coverage in diverse soybean genotypes
USDA-ARS?s Scientific Manuscript database
Rapid establishment of canopy coverage decreases soil evaporation relative to transpiration (T), improves water use efficiency (WUE) and light interception, and increases soybean competitiveness against weeds. The objective of this study was to identify genomic loci associated with canopy coverage (...
Code of Federal Regulations, 2011 CFR
2011-07-01
... the Family Caregiver in transitioning to alternative health care coverage and with mental health... individual with transitioning to alternative health care coverage and with mental health services, unless one... health care coverage and with mental health services. If revocation is due to improvement in the eligible...
Code of Federal Regulations, 2012 CFR
2012-07-01
... the Family Caregiver in transitioning to alternative health care coverage and with mental health... individual with transitioning to alternative health care coverage and with mental health services, unless one... health care coverage and with mental health services. If revocation is due to improvement in the eligible...
Graves, John A; Nikpay, Sayeh S
2017-02-01
The introduction of Medicaid expansions and state Marketplaces under the Affordable Care Act (ACA) have reduced the uninsurance rate to historic lows, changing the choices Americans make about coverage. In this article we shed light on these changing dynamics. We drew upon multistate transition models fit to nationally representative longitudinal data to estimate coverage transition probabilities between major insurance types in the years leading up to and including 2014. We found that the ACA's unprecedented coverage changes increased transitions to Medicaid and nongroup coverage among the uninsured, while strengthening the existing employer-sponsored insurance system and improving retention of public coverage. However, our results suggest possible weakness of state Marketplaces, since people gaining nongroup coverage were disproportionately older than other potential enrollees. We identified key opportunities for policy makers and insurers to improve underlying Marketplace risk pools by focusing on people transitioning from employer-sponsored coverage; these people are disproportionately younger and saw almost no change in their likelihood of becoming uninsured in 2014 compared to earlier years. Project HOPE—The People-to-People Health Foundation, Inc.
Launch mission summary: Intelsat 5 (F3) Atlas/Centaur-55
NASA Technical Reports Server (NTRS)
1981-01-01
Intelsat 5 (F3) spacecraft, launch vehicle, and mission are described. Information relative to launch windows, flight plan, radar and telemetry coverage, selected trajectory information, and a brief sequence of flight events is provided.
Recovery of a Medieval Brucella melitensis Genome Using Shotgun Metagenomics
Kay, Gemma L.; Sergeant, Martin J.; Giuffra, Valentina; Bandiera, Pasquale; Milanese, Marco; Bramanti, Barbara
2014-01-01
ABSTRACT Shotgun metagenomics provides a powerful assumption-free approach to the recovery of pathogen genomes from contemporary and historical material. We sequenced the metagenome of a calcified nodule from the skeleton of a 14th-century middle-aged male excavated from the medieval Sardinian settlement of Geridu. We obtained 6.5-fold coverage of a Brucella melitensis genome. Sequence reads from this genome showed signatures typical of ancient or aged DNA. Despite the relatively low coverage, we were able to use information from single-nucleotide polymorphisms to place the medieval pathogen genome within a clade of B. melitensis strains that included the well-studied Ether strain and two other recent Italian isolates. We confirmed this placement using information from deletions and IS711 insertions. We conclude that metagenomics stands ready to document past and present infections, shedding light on the emergence, evolution, and spread of microbial pathogens. PMID:25028426
Goh, V; Halligan, S; Gartner, L; Bassett, P; Bartram, C I
2006-07-01
The purpose of this study was to determine if greater z-axis tumour coverage improves the reproducibility of quantitative colorectal cancer perfusion measurements using CT. A 65 s perfusion study was acquired following intravenous contrast administration in 10 patients with proven colorectal cancer using a four-detector row scanner. This was repeated within 48 h using identical technical parameters to allow reproducibility assessment. Quantitative tumour blood volume, blood flow, mean transit time and permeability measurements were determined using commercially available software (Perfusion 3.0; GE Healthcare, Waukesha, WI) for data obtained from a 5 mm z-axis tumour coverage, and from a 20 mm z-axis tumour coverage. Measurement reproducibility was assessed using Bland-Altman statistics, for a 5 mm z-axis tumour coverage, and 20 mm z-axis tumour coverage, respectively. The mean difference (95% limits of agreement) for blood volume, blood flow, mean transit time and permeability were 0.04 (-2.50 to +2.43) ml/100 g tissue; +8.80 (-50.5 to +68.0) ml/100 g tissue/min; -0.99 (-8.19 to +6.20) seconds; and +1.20 (-5.42 to +7.83) ml/100 g tissue/min, respectively, for a 5 mm coverage, and -0.04 (-2.61 to +2.53) ml/100 g tissue; +7.40 (-50.3 to +65.0) ml/100 g tissue/min; -2.46 (-12.61 to +7.69) seconds; and -0.23 (-8.31 to +7.85) ml/100 g tissue/min, respectively, for a 20 mm coverage, indicating similar levels of agreement. In conclusion, increasing z-axis coverage does not improve reproducibility of quantitative colorectal cancer perfusion measurements.
Penmetsa, R. V.; Dutta, S.; Kulwal, P. L.; Saxena, R. K.; Datta, S.; Sharma, T. R.; Rosen, B.; Carrasquilla-Garcia, N.; Farmer, A. D.; Dubey, A.; Saxena, K. B.; Gao, J.; Fakrudin, B.; Singh, M. N.; Singh, B. P.; Wanjari, K. B.; Yuan, M.; Srivastava, R. K.; Kilian, A.; Upadhyaya, H. D.; Mallikarjuna, N.; Town, C. D.; Bruening, G. E.; He, G.; May, G. D.; McCombie, R.; Jackson, S. A.; Singh, N. K.; Cook, D. R.
2009-01-01
Pigeonpea (Cajanus cajan), an important food legume crop in the semi-arid regions of the world and the second most important pulse crop in India, has an average crop productivity of 780 kg/ha. The relatively low crop yields may be attributed to non-availability of improved cultivars, poor crop husbandry and exposure to a number of biotic and abiotic stresses in pigeonpea growing regions. Narrow genetic diversity in cultivated germplasm has further hampered the effective utilization of conventional breeding as well as development and utilization of genomic tools, resulting in pigeonpea being often referred to as an ‘orphan crop legume’. To enable genomics-assisted breeding in this crop, the pigeonpea genomics initiative (PGI) was initiated in late 2006 with funding from Indian Council of Agricultural Research under the umbrella of Indo-US agricultural knowledge initiative, which was further expanded with financial support from the US National Science Foundation’s Plant Genome Research Program and the Generation Challenge Program. As a result of the PGI, the last 3 years have witnessed significant progress in development of both genetic as well as genomic resources in this crop through effective collaborations and coordination of genomics activities across several institutes and countries. For instance, 25 mapping populations segregating for a number of biotic and abiotic stresses have been developed or are under development. An 11X-genome coverage bacterial artificial chromosome (BAC) library comprising of 69,120 clones have been developed of which 50,000 clones were end sequenced to generate 87,590 BAC-end sequences (BESs). About 10,000 expressed sequence tags (ESTs) from Sanger sequencing and ca. 2 million short ESTs by 454/FLX sequencing have been generated. A variety of molecular markers have been developed from BESs, microsatellite or simple sequence repeat (SSR)-enriched libraries and mining of ESTs and genomic amplicon sequencing. Of about 21,000 SSRs identified, 6,698 SSRs are under analysis along with 670 orthologous genes using a GoldenGate SNP (single nucleotide polymorphism) genotyping platform, with large scale SNP discovery using Solexa, a next generation sequencing technology, is in progress. Similarly a diversity array technology array comprising of ca. 15,000 features has been developed. In addition, >600 unique nucleotide binding site (NBS) domain containing members of the NBS-leucine rich repeat disease resistance homologs were cloned in pigeonpea; 960 BACs containing these sequences were identified by filter hybridization, BES physical maps developed using high information content fingerprinting. To enrich the genomic resources further, sequenced soybean genome is being analyzed to establish the anchor points between pigeonpea and soybean genomes. In addition, Solexa sequencing is being used to explore the feasibility of generating whole genome sequence. In summary, the collaborative efforts of several research groups under the umbrella of PGI are making significant progress in improving molecular tools in pigeonpea and should significantly benefit pigeonpea genetics and breeding. As these efforts come to fruition, and expanded (depending on funding), pigeonpea would move from an ‘orphan legume crop’ to one where genomics-assisted breeding approaches for a sustainable crop improvement are routine. PMID:20976284
Detecting false positive sequence homology: a machine learning approach.
Fujimoto, M Stanley; Suvorov, Anton; Jensen, Nicholas O; Clement, Mark J; Bybee, Seth M
2016-02-24
Accurate detection of homologous relationships of biological sequences (DNA or amino acid) amongst organisms is an important and often difficult task that is essential to various evolutionary studies, ranging from building phylogenies to predicting functional gene annotations. There are many existing heuristic tools, most commonly based on bidirectional BLAST searches that are used to identify homologous genes and combine them into two fundamentally distinct classes: orthologs and paralogs. Due to only using heuristic filtering based on significance score cutoffs and having no cluster post-processing tools available, these methods can often produce multiple clusters constituting unrelated (non-homologous) sequences. Therefore sequencing data extracted from incomplete genome/transcriptome assemblies originated from low coverage sequencing or produced by de novo processes without a reference genome are susceptible to high false positive rates of homology detection. In this paper we develop biologically informative features that can be extracted from multiple sequence alignments of putative homologous genes (orthologs and paralogs) and further utilized in context of guided experimentation to verify false positive outcomes. We demonstrate that our machine learning method trained on both known homology clusters obtained from OrthoDB and randomly generated sequence alignments (non-homologs), successfully determines apparent false positives inferred by heuristic algorithms especially among proteomes recovered from low-coverage RNA-seq data. Almost ~42 % and ~25 % of predicted putative homologies by InParanoid and HaMStR respectively were classified as false positives on experimental data set. Our process increases the quality of output from other clustering algorithms by providing a novel post-processing method that is both fast and efficient at removing low quality clusters of putative homologous genes recovered by heuristic-based approaches.
The whole genome sequences and experimentally phased haplotypes of over 100 personal genomes.
Mao, Qing; Ciotlos, Serban; Zhang, Rebecca Yu; Ball, Madeleine P; Chin, Robert; Carnevali, Paolo; Barua, Nina; Nguyen, Staci; Agarwal, Misha R; Clegg, Tom; Connelly, Abram; Vandewege, Ward; Zaranek, Alexander Wait; Estep, Preston W; Church, George M; Drmanac, Radoje; Peters, Brock A
2016-10-11
Since the completion of the Human Genome Project in 2003, it is estimated that more than 200,000 individual whole human genomes have been sequenced. A stunning accomplishment in such a short period of time. However, most of these were sequenced without experimental haplotype data and are therefore missing an important aspect of genome biology. In addition, much of the genomic data is not available to the public and lacks phenotypic information. As part of the Personal Genome Project, blood samples from 184 participants were collected and processed using Complete Genomics' Long Fragment Read technology. Here, we present the experimental whole genome haplotyping and sequencing of these samples to an average read coverage depth of 100X. This is approximately three-fold higher than the read coverage applied to most whole human genome assemblies and ensures the highest quality results. Currently, 114 genomes from this dataset are freely available in the GigaDB repository and are associated with rich phenotypic data; the remaining 70 should be added in the near future as they are approved through the PGP data release process. For reproducibility analyses, 20 genomes were sequenced at least twice using independent LFR barcoded libraries. Seven genomes were also sequenced using Complete Genomics' standard non-barcoded library process. In addition, we report 2.6 million high-quality, rare variants not previously identified in the Single Nucleotide Polymorphisms database or the 1000 Genomes Project Phase 3 data. These genomes represent a unique source of haplotype and phenotype data for the scientific community and should help to expand our understanding of human genome evolution and function.
A map of human genome variation from population-scale sequencing.
Abecasis, Gonçalo R; Altshuler, David; Auton, Adam; Brooks, Lisa D; Durbin, Richard M; Gibbs, Richard A; Hurles, Matt E; McVean, Gil A
2010-10-28
The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation as a foundation for investigating the relationship between genotype and phenotype. Here we present results of the pilot phase of the project, designed to develop and compare different strategies for genome-wide sequencing with high-throughput platforms. We undertook three projects: low-coverage whole-genome sequencing of 179 individuals from four populations; high-coverage sequencing of two mother-father-child trios; and exon-targeted sequencing of 697 individuals from seven populations. We describe the location, allele frequency and local haplotype structure of approximately 15 million single nucleotide polymorphisms, 1 million short insertions and deletions, and 20,000 structural variants, most of which were previously undescribed. We show that, because we have catalogued the vast majority of common variation, over 95% of the currently accessible variants found in any individual are present in this data set. On average, each person is found to carry approximately 250 to 300 loss-of-function variants in annotated genes and 50 to 100 variants previously implicated in inherited disorders. We demonstrate how these results can be used to inform association and functional studies. From the two trios, we directly estimate the rate of de novo germline base substitution mutations to be approximately 10(-8) per base pair per generation. We explore the data with regard to signatures of natural selection, and identify a marked reduction of genetic variation in the neighbourhood of genes, due to selection at linked sites. These methods and public data will support the next phase of human genetic research.
Policy Changes and Improvements in Health Insurance Coverage Among MSM: 20 U.S. Cities, 2008-2014.
Cooley, Laura A; Hoots, Brooke; Wejnert, Cyprian; Lewis, Rashunda; Paz-Bailey, Gabriela
2017-03-01
Recent policy changes have improved the ability of gay, bisexual, and other men who have sex with men (MSM) to secure health insurance. We wanted to assess changes over time in self-reported health insurance status among MSM participating in CDC's National HIV Behavioral Surveillance (NHBS) in 2008, 2011, and 2014. We analyzed NHBS data from sexually active MSM interviewed at venues in 20 U.S. cities. To determine if interview year was associated with health insurance status, we used a Poisson model with robust standard errors. Among included MSM, the overall percentage of MSM with health insurance rose 16 % from 2008 (68 %) to 2014 (79 %) (p value for trend < 0.001). The change in coverage over time was greatest in key demographic segments with lower health insurance coverage all three interview years, by age, education, and income. Corresponding with recent policy changes, health insurance improved among MSM participating in NHBS, with greater improvements in historically underinsured demographic segments. Despite these increases, improved coverage is still needed. Improved access to health insurance could lead to a reduction in health disparities among MSM over time.
Leyvraz, Magali; Aaron, Grant J; Poonawala, Alia; van Liere, Marti J; Schofield, Dominic; Myatt, Mark; Neufeld, Lynnette M
2017-05-01
Background: The efficacy of a number of interventions that include fortified complementary foods (FCFs) or other products to improve infant and young child feeding (IYCF) is well established. Programs that provide such products free or at a subsidized price are implemented in many countries around the world. Demonstrating the impact at scale of these programs has been challenging, and rigorous information on coverage and utilization is lacking. Objective: The objective of this article is to review key findings from 11 coverage surveys of IYCF programs distributing or selling FCFs or micronutrient powders in 5 countries. Methods: Programs were implemented in Ghana, Cote d'Ivoire, India, Bangladesh, and Vietnam. Surveys were implemented at different stages of program implementation between 2013 and 2015. The Fortification Assessment Coverage Toolkit (FACT) was developed to assess 3 levels of coverage (message: awareness of the product; contact: use of the product ≥1 time; and effective: regular use aligned with program-specific goals), as well as barriers and factors that facilitate coverage. Analyses included the coverage estimates, as well as an assessment of equity of coverage between the poor and nonpoor, and between those with poor and adequate child feeding practices. Results: Coverage varied greatly between countries and program models. Message coverage ranged from 29.0% to 99.7%, contact coverage from 22.6% to 94.4%, and effective coverage from 0.8% to 88.3%. Beyond creating awareness, programs that achieved high coverage were those with effective mechanisms in place to overcome barriers for both supply and demand. Conclusions: Variability in coverage was likely due to the program design, delivery model, quality of implementation, and product type. Measuring program coverage and understanding its determinants is essential for program improvement and to estimate the potential for impact of programs at scale. Use of the FACT can help overcome this evidence gap.
Aaron, Grant J; Poonawala, Alia; van Liere, Marti J; Schofield, Dominic; Myatt, Mark
2017-01-01
Background: The efficacy of a number of interventions that include fortified complementary foods (FCFs) or other products to improve infant and young child feeding (IYCF) is well established. Programs that provide such products free or at a subsidized price are implemented in many countries around the world. Demonstrating the impact at scale of these programs has been challenging, and rigorous information on coverage and utilization is lacking. Objective: The objective of this article is to review key findings from 11 coverage surveys of IYCF programs distributing or selling FCFs or micronutrient powders in 5 countries. Methods: Programs were implemented in Ghana, Cote d’Ivoire, India, Bangladesh, and Vietnam. Surveys were implemented at different stages of program implementation between 2013 and 2015. The Fortification Assessment Coverage Toolkit (FACT) was developed to assess 3 levels of coverage (message: awareness of the product; contact: use of the product ≥1 time; and effective: regular use aligned with program-specific goals), as well as barriers and factors that facilitate coverage. Analyses included the coverage estimates, as well as an assessment of equity of coverage between the poor and nonpoor, and between those with poor and adequate child feeding practices. Results: Coverage varied greatly between countries and program models. Message coverage ranged from 29.0% to 99.7%, contact coverage from 22.6% to 94.4%, and effective coverage from 0.8% to 88.3%. Beyond creating awareness, programs that achieved high coverage were those with effective mechanisms in place to overcome barriers for both supply and demand. Conclusions: Variability in coverage was likely due to the program design, delivery model, quality of implementation, and product type. Measuring program coverage and understanding its determinants is essential for program improvement and to estimate the potential for impact of programs at scale. Use of the FACT can help overcome this evidence gap. PMID:28404839
Beattie, Allison; Yates, Robert; Noble, Douglas J
2016-01-01
Universal health coverage generates significant health and economic benefits and enables governments to reduce inequity. Where universal health coverage has been implemented well, it can contribute to nation-building. This analysis reviews evidence from Asia and Pacific drawing out determinants of successful systems and barriers to progress with a focus on women and children. Access to healthcare is important for women and children and contributes to early childhood development. Universal health coverage is a political process from the start, and public financing is critical and directly related to more equitable health systems. Closing primary healthcare gaps should be the foundation of universal health coverage reforms. Recommendations for policy for national governments to improve universal health coverage are identified, including countries spending < 3% of gross domestic product in public expenditure on health committing to increasing funding by at least 0.3%/year to reach a minimum expenditure threshold of 3%. PMID:28588989
Time-Distance Helioseismology with the HMI Instrument
NASA Technical Reports Server (NTRS)
Duvall, Thomas L., Jr.
2010-01-01
We expect considerable improvement of time-distance results from the Helioseismic and Magnetic Imager (HMI) instrument as opposed to the earlier MDI and GONG data. The higher data rate makes possible several improvements, including faster temporal sampling (45 sec), smaller spatial pixels (0.5 arc sec), better wavelength coverage (6 samples across the line all transmitted to the ground), and year-round coverage of the full disk. The higher spatial resolution makes possible better longitude coverage of active regions and supergranulation and also better latitude coverage. Doppler, continuum, and line depth images have a strong granulation signal. Line core images show little granulation. Analyses to test the limits of these new capabilities will be presented.
Universal health coverage in Latin American countries: how to improve solidarity-based schemes.
Titelman, Daniel; Cetrángolo, Oscar; Acosta, Olga Lucía
2015-04-04
In this Health Policy we examine the association between the financing structure of health systems and universal health coverage. Latin American health systems encompass a wide range of financial sources, which translate into different solidarity-based schemes that combine contributory (payroll taxes) and non-contributory (general taxes) sources of financing. To move towards universal health coverage, solidarity-based schemes must heavily rely on countries' capacity to increase public expenditure in health. Improvement of solidarity-based schemes will need the expansion of mandatory universal insurance systems and strengthening of the public sector including increased fiscal expenditure. These actions demand a new model to integrate different sources of health-sector financing, including general tax revenue, social security contributions, and private expenditure. The extent of integration achieved among these sources will be the main determinant of solidarity and universal health coverage. The basic challenges for improvement of universal health coverage are not only to spend more on health, but also to reduce the proportion of out-of-pocket spending, which will need increased fiscal resources. Copyright © 2015 Elsevier Ltd. All rights reserved.
Walter, Vonn; Patel, Nirali M.; Eberhard, David A.; Hayward, Michele C.; Salazar, Ashley H.; Jo, Heejoon; Soloway, Matthew G.; Wilkerson, Matthew D.; Parker, Joel S.; Yin, Xiaoying; Zhang, Guosheng; Siegel, Marni B.; Rosson, Gary B.; Earp, H. Shelton; Sharpless, Norman E.; Gulley, Margaret L.; Weck, Karen E.
2015-01-01
The recent FDA approval of the MiSeqDx platform provides a unique opportunity to develop targeted next generation sequencing (NGS) panels for human disease, including cancer. We have developed a scalable, targeted panel-based assay termed UNCseq, which involves a NGS panel of over 200 cancer-associated genes and a standardized downstream bioinformatics pipeline for detection of single nucleotide variations (SNV) as well as small insertions and deletions (indel). In addition, we developed a novel algorithm, NGScopy, designed for samples with sparse sequencing coverage to detect large-scale copy number variations (CNV), similar to human SNP Array 6.0 as well as small-scale intragenic CNV. Overall, we applied this assay to 100 snap-frozen lung cancer specimens lacking same-patient germline DNA (07–0120 tissue cohort) and validated our results against Sanger sequencing, SNP Array, and our recently published integrated DNA-seq/RNA-seq assay, UNCqeR, where RNA-seq of same-patient tumor specimens confirmed SNV detected by DNA-seq, if RNA-seq coverage depth was adequate. In addition, we applied the UNCseq assay on an independent lung cancer tumor tissue collection with available same-patient germline DNA (11–1115 tissue cohort) and confirmed mutations using assays performed in a CLIA-certified laboratory. We conclude that UNCseq can identify SNV, indel, and CNV in tumor specimens lacking germline DNA in a cost-efficient fashion. PMID:26076459
Harder, Valerie S; Barry, Sara E; Ahrens, Bridget; Davis, Wendy S; Shaw, Judith S
Despite the proven benefits of immunizations, coverage remains low in many states, including Vermont. This study measured the impact of a quality improvement (QI) project on immunization coverage in childhood, school-age, and adolescent groups. In 2013, a total of 20 primary care practices completed a 7-month QI project aimed to increase immunization coverage among early childhood (29-33 months), school-age (6 years), and adolescent (13 years) age groups. For this study, we examined random cross-sectional medical record reviews from 12 of the 20 practices within each age group in 2012, 2013, and 2014 to measure improvement in immunization coverage over time using chi-squared tests. We repeated these analyses on population-level data from Vermont's immunization registry for the 12 practices in each age group each year. We used difference-in-differences regressions in the immunization registry data to compare improvements over time between the 12 practices and those not participating in QI. Immunization coverage increased over 3 years for all ages and all immunization series (P ≤ .009) except one, as measured by medical record review. Registry results aligned partially with medical record review with increases in early childhood and adolescent series over time (P ≤ .012). Notably, the adolescent immunization series completion, including human papillomavirus, increased more than in the comparison practices (P = .037). Medical record review indicated that QI efforts led to increases in immunization coverage in pediatric primary care. Results were partially validated in the immunization registry particularly among early childhood and adolescent groups, with a population-level impact of the intervention among adolescents. Copyright © 2018 Academic Pediatric Association. Published by Elsevier Inc. All rights reserved.
76 FR 37049 - Improving Wireless Coverage Through the Use of Signal Boosters
Federal Register 2010, 2011, 2012, 2013, 2014
2011-06-24
... FEDERAL COMMUNICATIONS COMMISSION 47 CFR Parts 1, 2, 22, 24, 27, 90 and 95 [WT Docket No. 10-4; DA 11-1078] Improving Wireless Coverage Through the Use of Signal Boosters AGENCY: Federal Communications Commission. ACTION: Proposed rule; extension of comment period. SUMMARY: The Federal...
Potential Improvements to VLBA UV-Coverages by the Addition of a 32-m Peruvian Antenna
NASA Astrophysics Data System (ADS)
Horiuchi, S.; Murphy, D. W.; Ishitsuka, J. K.; Ishitsuka, M.
2005-12-01
A plan is being currently developed to convert a 32-m telecomunications antenna in the Peruvian Andes into a radio astronomy facility. Significant improvements to stand-alone VLBA UV-coverages can be obtained with the addition of this southern hemisphere telescope to VLBA observations.
Assessing levels and trends of child health inequality in 88 developing countries: from 2000 to 2014
Li, Zhihui; Li, Mingqiang; Subramanian, S. V.; Lu, Chunling
2017-01-01
ABSTRACT Background: Reducing child mortality was one of the Millennium Development Goals. In the current Sustainable Development Goals era, achieving equity is prioritized as a major aim. Objective: This study aims to provide a comprehensive and updated picture of inequalities in child health intervention coverage and child health outcomes by wealth status, as well as their trends between 2000 and 2014. Methods: Using data from Demographic Health Surveys and Multiple Indicator Cluster Surveys, we adopted three measures of inequality, including one absolute inequality indicator and two relative inequality indicators, to estimate the level and trends of inequalities in three child health outcome variables and 17 intervention coverages in 88 developing countries. Results: While improvements in child health outcomes and coverage of interventions have been observed between 2000 and 2014, large inequalities remain. There was a high level of variation between countries’ progress toward reducing child health inequalities, with some countries significantly improving, some deteriorating, and some remaining statistically unchanged. Among child health interventions, the least equitable one was access to improved sanitation (The absolute difference in coverages between the richest quintile and the poorest quintile reached 49.5% [42.7, 56.2]), followed by access to improved water (34.1% [29.5, 38.6]), and skilled birth attendant (SBA) (34.1% [28.8, 39.4]). The most equitable intervention coverage was insecticide-treated bed net for children (1.0% [−3.9, 5.9]), followed by oral rehydration therapy for diarrhea ((8.0% [5.2, 10.8]), and vitamin A supplement (8.4% [5.1, 11.7]). These findings were robust to various inequality measurements. Conclusions: Although child health outcomes and coverage of interventions have improved largely over the study period for almost all wealth quintiles, insufficient progress was made in reducing child health inequalities between the poorest and richest wealth quintiles. Future efforts should focus on reaching the poorest children by increasing investments toward expanding the coverage of interventions in resource-limited settings. PMID:29228888
NASA Astrophysics Data System (ADS)
Kim, Kyoung-Nam; Heo, Phil; Kim, Young-Bo; Han, Gyu-Cheol
2015-02-01
An ultra-high-field magnetic resonance (MR) scanner and a specially-optimized radiofrequency (RF) coil and sequence protocol are required to obtain high-resolution images of the inner ear that can noninvasively confirm pathologic diagnoses. In phantom studies, the MR signal distribution of the gradient echo MR images generated by using a customized RF coil was compared with that of a commercial volume coil. The MR signal intensity of the customized RF coil decreases rapidly from near the RF coil plane toward the exterior of the phantom. However, the signal sensitivity of this coil is superior on both sides of the phantom, corresponding to the petrous pyramid. In in-vivo 7-T MR imaging, a customized RF coil and a volumetric-interpolated breath-hold examination imaging sequence are employed for visualization of the inner ear's structure. The entire membranous portion of the cochlear and the three semicircular canals, including the ductus reunions, oval window, and round window with associated nervous tissue, were clearly depicted with sufficient spatial coverage for adequate inspection of the surrounding anatomy. Developments from a new perspective to inner ear imaging using the 7-T modality could lead to further improved image sensitivity and, thus, enable ultra-structural MR imaging.
Universal health coverage in Turkey: enhancement of equity.
Atun, Rifat; Aydın, Sabahattin; Chakraborty, Sarbani; Sümer, Safir; Aran, Meltem; Gürol, Ipek; Nazlıoğlu, Serpil; Ozgülcü, Senay; Aydoğan, Ulger; Ayar, Banu; Dilmen, Uğur; Akdağ, Recep
2013-07-06
Turkey has successfully introduced health system changes and provided its citizens with the right to health to achieve universal health coverage, which helped to address inequities in financing, health service access, and health outcomes. We trace the trajectory of health system reforms in Turkey, with a particular emphasis on 2003-13, which coincides with the Health Transformation Program (HTP). The HTP rapidly expanded health insurance coverage and access to health-care services for all citizens, especially the poorest population groups, to achieve universal health coverage. We analyse the contextual drivers that shaped the transformations in the health system, explore the design and implementation of the HTP, identify the factors that enabled its success, and investigate its effects. Our findings suggest that the HTP was instrumental in achieving universal health coverage to enhance equity substantially, and led to quantifiable and beneficial effects on all health system goals, with an improved level and distribution of health, greater fairness in financing with better financial protection, and notably increased user satisfaction. After the HTP, five health insurance schemes were consolidated to create a unified General Health Insurance scheme with harmonised and expanded benefits. Insurance coverage for the poorest population groups in Turkey increased from 2·4 million people in 2003, to 10·2 million in 2011. Health service access increased across the country-in particular, access and use of key maternal and child health services improved to help to greatly reduce the maternal mortality ratio, and under-5, infant, and neonatal mortality, especially in socioeconomically disadvantaged groups. Several factors helped to achieve universal health coverage and improve outcomes. These factors include economic growth, political stability, a comprehensive transformation strategy led by a transformation team, rapid policy translation, flexible implementation with continuous learning, and simultaneous improvements in the health system, on both the demand side (increased health insurance coverage, expanded benefits, and reduced cost-sharing) and the supply side (expansion of infrastructure, health human resources, and health services). Copyright © 2013 Elsevier Ltd. All rights reserved.
Massa, K; Olsen, A; Sheshe, A; Ntakamulenga, R; Ndawi, B; Magnussen, P
2009-11-01
Control programmes generally use a school-based strategy of mass drug administration to reduce morbidity of schistosomiasis and soil-transmitted helminthiasis (STH) in school-aged populations. The success of school-based programmes depends on treatment coverage. The community-directed treatment (ComDT) approach has been implemented in the control of onchocerciasis and lymphatic filariasis in Africa and improves treatment coverage. This study compared the treatment coverage between the ComDT approach and the school-based treatment approach, where non-enrolled school-aged children were invited for treatment, in the control of schistosomiasis and STH among enrolled and non-enrolled school-aged children. Coverage during the first treatment round among enrolled children was similar for the two approaches (ComDT: 80.3% versus school: 82.1%, P=0.072). However, for the non-enrolled children the ComDT approach achieved a significantly higher coverage than the school-based approach (80.0 versus 59.2%, P<0.001). Similar treatment coverage levels were attained at the second treatment round. Again, equal levels of treatment coverage were found between the two approaches for the enrolled school-aged children, while the ComDT approach achieved a significantly higher coverage in the non-enrolled children. The results of this study showed that the ComDT approach can obtain significantly higher treatment coverage among the non-enrolled school-aged children compared to the school-based treatment approach for the control of schistosomiasis and STH.
Munos, Melinda K; Stanton, Cynthia K; Bryce, Jennifer
2017-06-01
Regular monitoring of coverage for reproductive, maternal, neonatal, and child health (RMNCH) is central to assessing progress toward health goals. The objectives of this review were to describe the current state of coverage measurement for RMNCH, assess the extent to which current approaches to coverage measurement cover the spectrum of RMNCH interventions, and prioritize interventions for a novel approach to coverage measurement linking household surveys with provider assessments. We included 58 interventions along the RMNCH continuum of care for which there is evidence of effectiveness against cause-specific mortality and stillbirth. We reviewed household surveys and provider assessments used in low- and middle-income countries (LMICs) to determine whether these tools generate measures of intervention coverage, readiness, or quality. For facility-based interventions, we assessed the feasibility of linking provider assessments to household surveys to provide estimates of intervention coverage. Fewer than half (24 of 58) of included RMNCH interventions are measured in standard household surveys. The periconceptional, antenatal, and intrapartum periods were poorly represented. All but one of the interventions not measured in household surveys are facility-based, and 13 of these would be highly feasible to measure by linking provider assessments to household surveys. We found important gaps in coverage measurement for proven RMNCH interventions, particularly around the time of birth. Based on our findings, we propose three sets of actions to improve coverage measurement for RMNCH, focused on validation of coverage measures and development of new measurement approaches feasible for use at scale in LMICs.
Kotoula, Vassiliki; Lyberopoulou, Aggeliki; Papadopoulou, Kyriaki; Charalambous, Elpida; Alexopoulou, Zoi; Gakou, Chryssa; Lakis, Sotiris; Tsolaki, Eleftheria; Lilakos, Konstantinos; Fountzilas, George
2015-01-01
Background—Aim Massively parallel sequencing (MPS) holds promise for expanding cancer translational research and diagnostics. As yet, it has been applied on paraffin DNA (FFPE) with commercially available highly multiplexed gene panels (100s of DNA targets), while custom panels of low multiplexing are used for re-sequencing. Here, we evaluated the performance of two highly multiplexed custom panels on FFPE DNA. Methods Two custom multiplex amplification panels (B, 373 amplicons; T, 286 amplicons) were coupled with semiconductor sequencing on DNA samples from FFPE breast tumors and matched peripheral blood samples (n samples: 316; n libraries: 332). The two panels shared 37% DNA targets (common or shifted amplicons). Panel performance was evaluated in paired sample groups and quartets of libraries, where possible. Results Amplicon read ratios yielded similar patterns per gene with the same panel in FFPE and blood samples; however, performance of common amplicons differed between panels (p<0.001). FFPE genotypes were compared for 1267 coding and non-coding variant replicates, 999 out of which (78.8%) were concordant in different paired sample combinations. Variant frequency was highly reproducible (Spearman’s rho 0.959). Repeatedly discordant variants were of high coverage / low frequency (p<0.001). Genotype concordance was (a) high, for intra-run duplicates with the same panel (mean±SD: 97.2±4.7, 95%CI: 94.8–99.7, p<0.001); (b) modest, when the same DNA was analyzed with different panels (mean±SD: 81.1±20.3, 95%CI: 66.1–95.1, p = 0.004); and (c) low, when different DNA samples from the same tumor were compared with the same panel (mean±SD: 59.9±24.0; 95%CI: 43.3–76.5; p = 0.282). Low coverage / low frequency variants were validated with Sanger sequencing even in samples with unfavourable DNA quality. Conclusions Custom MPS may yield novel information on genomic alterations, provided that data evaluation is adjusted to tumor tissue FFPE DNA. To this scope, eligibility of all amplicons along with variant coverage and frequency need to be assessed. PMID:26039550
Data mining of enzymes using specific peptides
2009-01-01
Background Predicting the function of a protein from its sequence is a long-standing challenge of bioinformatic research, typically addressed using either sequence-similarity or sequence-motifs. We employ the novel motif method that consists of Specific Peptides (SPs) that are unique to specific branches of the Enzyme Commission (EC) functional classification. We devise the Data Mining of Enzymes (DME) methodology that allows for searching SPs on arbitrary proteins, determining from its sequence whether a protein is an enzyme and what the enzyme's EC classification is. Results We extract novel SP sets from Swiss-Prot enzyme data. Using a training set of July 2006, and test sets of July 2008, we find that the predictive power of SPs, both for true-positives (enzymes) and true-negatives (non-enzymes), depends on the coverage length of all SP matches (the number of amino-acids matched on the protein sequence). DME is quite different from BLAST. Comparing the two on an enzyme test set of July 2008, we find that DME has lower recall. On the other hand, DME can provide predictions for proteins regarded by BLAST as having low homologies with known enzymes, thus supplying complementary information. We test our method on a set of proteins belonging to 10 bacteria, dated July 2008, establishing the usefulness of the coverage-length cutoff to determine true-negatives. Moreover, sifting through our predictions we find that some of them have been substantiated by Swiss-Prot annotations by July 2009. Finally we extract, for production purposes, a novel SP set trained on all Swiss-Prot enzymes as of July 2009. This new set increases considerably the recall of DME. The new SP set is being applied to three metagenomes: Sargasso Sea with over 1,000,000 proteins, producing predictions of over 220,000 enzymes, and two human gut metagenomes. The outcome of these analyses can be characterized by the enzymatic profile of the metagenomes, describing the relative numbers of enzymes observed for different EC categories. Conclusions Employing SPs for predicting enzymatic activity of proteins works well once one utilizes coverage-length criteria. In our analysis, L ≥ 7 has led to highly accurate results. PMID:20034383
A second-generation anchored genetic linkage map of the tammar wallaby (Macropus eugenii)
2011-01-01
Background The tammar wallaby, Macropus eugenii, a small kangaroo used for decades for studies of reproduction and metabolism, is the model Australian marsupial for genome sequencing and genetic investigations. The production of a more comprehensive cytogenetically-anchored genetic linkage map will significantly contribute to the deciphering of the tammar wallaby genome. It has great value as a resource to identify novel genes and for comparative studies, and is vital for the ongoing genome sequence assembly and gene ordering in this species. Results A second-generation anchored tammar wallaby genetic linkage map has been constructed based on a total of 148 loci. The linkage map contains the original 64 loci included in the first-generation map, plus an additional 84 microsatellite loci that were chosen specifically to increase coverage and assist with the anchoring and orientation of linkage groups to chromosomes. These additional loci were derived from (a) sequenced BAC clones that had been previously mapped to tammar wallaby chromosomes by fluorescence in situ hybridization (FISH), (b) End sequence from BACs subsequently FISH-mapped to tammar wallaby chromosomes, and (c) tammar wallaby genes orthologous to opossum genes predicted to fill gaps in the tammar wallaby linkage map as well as three X-linked markers from a published study. Based on these 148 loci, eight linkage groups were formed. These linkage groups were assigned (via FISH-mapped markers) to all seven autosomes and the X chromosome. The sex-pooled map size is 1402.4 cM, which is estimated to provide 82.6% total coverage of the genome, with an average interval distance of 10.9 cM between adjacent markers. The overall ratio of female/male map length is 0.84, which is comparable to the ratio of 0.78 obtained for the first-generation map. Conclusions Construction of this second-generation genetic linkage map is a significant step towards complete coverage of the tammar wallaby genome and considerably extends that of the first-generation map. It will be a valuable resource for ongoing tammar wallaby genetic research and assembling the genome sequence. The sex-pooled map is available online at http://compldb.angis.org.au/. PMID:21854616
A second-generation anchored genetic linkage map of the tammar wallaby (Macropus eugenii).
Wang, Chenwei; Webley, Lee; Wei, Ke-jun; Wakefield, Matthew J; Patel, Hardip R; Deakin, Janine E; Alsop, Amber; Marshall Graves, Jennifer A; Cooper, Desmond W; Nicholas, Frank W; Zenger, Kyall R
2011-08-19
The tammar wallaby, Macropus eugenii, a small kangaroo used for decades for studies of reproduction and metabolism, is the model Australian marsupial for genome sequencing and genetic investigations. The production of a more comprehensive cytogenetically-anchored genetic linkage map will significantly contribute to the deciphering of the tammar wallaby genome. It has great value as a resource to identify novel genes and for comparative studies, and is vital for the ongoing genome sequence assembly and gene ordering in this species. A second-generation anchored tammar wallaby genetic linkage map has been constructed based on a total of 148 loci. The linkage map contains the original 64 loci included in the first-generation map, plus an additional 84 microsatellite loci that were chosen specifically to increase coverage and assist with the anchoring and orientation of linkage groups to chromosomes. These additional loci were derived from (a) sequenced BAC clones that had been previously mapped to tammar wallaby chromosomes by fluorescence in situ hybridization (FISH), (b) End sequence from BACs subsequently FISH-mapped to tammar wallaby chromosomes, and (c) tammar wallaby genes orthologous to opossum genes predicted to fill gaps in the tammar wallaby linkage map as well as three X-linked markers from a published study. Based on these 148 loci, eight linkage groups were formed. These linkage groups were assigned (via FISH-mapped markers) to all seven autosomes and the X chromosome. The sex-pooled map size is 1402.4 cM, which is estimated to provide 82.6% total coverage of the genome, with an average interval distance of 10.9 cM between adjacent markers. The overall ratio of female/male map length is 0.84, which is comparable to the ratio of 0.78 obtained for the first-generation map. Construction of this second-generation genetic linkage map is a significant step towards complete coverage of the tammar wallaby genome and considerably extends that of the first-generation map. It will be a valuable resource for ongoing tammar wallaby genetic research and assembling the genome sequence. The sex-pooled map is available online at http://compldb.angis.org.au/.
Ning, Jia; Sun, Yongliang; Xie, Sheng; Zhang, Bida; Huang, Feng; Koken, Peter; Smink, Jouke; Yuan, Chun; Chen, Huijun
2018-05-01
To propose a simultaneous acquisition sequence for improved hepatic pharmacokinetics quantification accuracy (SAHA) method for liver dynamic contrast-enhanced MRI. The proposed SAHA simultaneously acquired high temporal-resolution 2D images for vascular input function extraction using Cartesian sampling and 3D large-coverage high spatial-resolution liver dynamic contrast-enhanced images using golden angle stack-of-stars acquisition in an interleaved way. Simulations were conducted to investigate the accuracy of SAHA in pharmacokinetic analysis. A healthy volunteer and three patients with cirrhosis or hepatocellular carcinoma were included in the study to investigate the feasibility of SAHA in vivo. Simulation studies showed that SAHA can provide closer results to the true values and lower root mean square error of estimated pharmacokinetic parameters in all of the tested scenarios. The in vivo scans of subjects provided fair image quality of both 2D images for arterial input function and portal venous input function and 3D whole liver images. The in vivo fitting results showed that the perfusion parameters of healthy liver were significantly different from those of cirrhotic liver and HCC. The proposed SAHA can provide improved accuracy in pharmacokinetic modeling and is feasible in human liver dynamic contrast-enhanced MRI, suggesting that SAHA is a potential tool for liver dynamic contrast-enhanced MRI. Magn Reson Med 79:2629-2641, 2018. © 2017 International Society for Magnetic Resonance in Medicine. © 2017 International Society for Magnetic Resonance in Medicine.