Science.gov

Sample records for deep short-read sequencing

  1. Unlocking Short Read Sequencing for Metagenomics

    SciTech Connect

    Rodrigue, Sébastien; Materna, Arne C.; Timberlake, Sonia C.; Blackburn, Matthew C.; Malmstrom, Rex R.; Alm, Eric J.; Chisholm, Sallie W.; Gilbert, Jack Anthony

    2010-07-28

    We describe an experimental and computational pipeline yielding millions of reads that can exceed 200 bp with quality scores approaching that of traditional Sanger sequencing. The method combines an automatable gel-less library construction step with paired-end sequencing on a short-read instrument. With appropriately sized library inserts, mate-pair sequences can overlap, and we describe the SHERA software package that joins them to form a longer composite read.
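
    The overlap-joining idea in this record can be illustrated with a short Python sketch. It is not the SHERA algorithm, which the abstract does not detail; the function names `reverse_complement` and `join_pair`, the minimum overlap, and the mismatch budget are all assumptions made for illustration, and quality scores are ignored.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T, N)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq))


def join_pair(read1, read2, min_overlap=10, max_mismatch_rate=0.1):
    """Join a forward read and its mate into one composite read.

    Hypothetical helper: read2 is assumed to come from the opposite strand,
    so it is reverse-complemented first. The longest suffix of read1 that
    matches a prefix of the mate within the mismatch budget is taken as the
    overlap. Returns None when no acceptable overlap exists.
    """
    mate = reverse_complement(read2)
    longest = min(len(read1), len(mate))
    for overlap in range(longest, min_overlap - 1, -1):
        suffix = read1[-overlap:]
        prefix = mate[:overlap]
        mismatches = sum(a != b for a, b in zip(suffix, prefix))
        if mismatches <= max_mismatch_rate * overlap:
            # Keep read1 in full, then append the non-overlapping mate tail.
            return read1 + mate[overlap:]
    return None
```

    Trying the longest overlap first favours the shortest composite read that is consistent with the pair; a production tool would presumably weigh base qualities when the two reads disagree.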

  2. Fast search of thousands of short-read sequencing experiments.

    PubMed

    Solomon, Brad; Kingsford, Carl

    2016-03-01

    The amount of sequence information in public repositories is growing at a rapid rate. Although these data are likely to contain clinically important information that has not yet been uncovered, our ability to effectively mine these repositories is limited. Here we introduce Sequence Bloom Trees (SBTs), a method for querying thousands of short-read sequencing experiments by sequence, 162 times faster than existing approaches. The approach searches large data archives for all experiments that involve a given sequence. We use SBTs to search 2,652 human blood, breast and brain RNA-seq experiments for all 214,293 known transcripts in under 4 days using less than 239 MB of RAM and a single CPU. Searching sequence archives at this scale and in this time frame is currently not possible using existing tools.
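
    A simplified Python sketch of the general idea follows: a tree whose nodes hold Bloom filters of k-mers, queried top-down with pruning. This is not the authors' implementation. The class names, the filter size, the hash scheme, the pairing order used to build the tree, and the threshold parameter `theta` are all assumptions chosen for illustration, and the input is assumed to contain at least one experiment.

```python
import hashlib


class Bloom:
    """Tiny Bloom filter over strings, stored as bits of a Python int."""

    def __init__(self, size=4096, hashes=3, bits=0):
        self.size, self.hashes, self.bits = size, hashes, bits

    def _positions(self, item):
        for seed in range(self.hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

    def union(self, other):
        return Bloom(self.size, self.hashes, self.bits | other.bits)


def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


class Node:
    def __init__(self, bloom, name=None, children=()):
        self.bloom, self.name, self.children = bloom, name, children


def build_tree(experiments, k):
    """Leaves hold one filter per experiment; parents hold the union."""
    nodes = []
    for name, reads in experiments.items():
        bloom = Bloom()
        for read in reads:
            for kmer in kmers(read, k):
                bloom.add(kmer)
        nodes.append(Node(bloom, name))
    while len(nodes) > 1:
        paired = []
        for i in range(0, len(nodes), 2):
            group = nodes[i:i + 2]
            if len(group) == 2:
                merged = group[0].bloom.union(group[1].bloom)
                paired.append(Node(merged, children=tuple(group)))
            else:
                paired.append(group[0])  # odd node is carried up unchanged
        nodes = paired
    return nodes[0]


def query(node, seq, k, theta=0.8):
    """Names of experiments whose filter holds >= theta of the query k-mers."""
    wanted = kmers(seq, k)
    found = sum(kmer in node.bloom for kmer in wanted)
    if not wanted or found < theta * len(wanted):
        return []  # prune: no leaf below this node can pass the threshold
    if not node.children:
        return [node.name]
    hits = []
    for child in node.children:
        hits.extend(query(child, seq, k, theta))
    return hits
```

    Because a parent filter is the union of its children, a failed test at a parent rules out every experiment beneath it, which is where the saving over scanning each experiment comes from. Bloom filters can return false positives, so hits are candidates rather than confirmed matches.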

  3. Viral population analysis and minority-variant detection using short read next-generation sequencing

    PubMed Central

    Watson, Simon J.; Welkers, Matthijs R. A.; Depledge, Daniel P.; Coulter, Eve; Breuer, Judith M.; de Jong, Menno D.; Kellam, Paul

    2013-01-01

    RNA viruses within infected individuals exist as a population of evolutionary-related variants. Owing to evolutionary change affecting the constitution of this population, the frequency and/or occurrence of individual viral variants can show marked or subtle fluctuations. Since the development of massively parallel sequencing platforms, such viral populations can now be investigated to unprecedented resolution. A critical problem with such analyses is the presence of sequencing-related errors that obscure the identification of true biological variants present at low frequency. Here, we report the development and assessment of the Quality Assessment of Short Read (QUASR) Pipeline (http://sourceforge.net/projects/quasr) specific for virus genome short read analysis that minimizes sequencing errors from multiple deep-sequencing platforms, and enables post-mapping analysis of the minority variants within the viral population. QUASR significantly reduces the error-related noise in deep-sequencing datasets, resulting in increased mapping accuracy and reduction of erroneous mutations. Using QUASR, we have determined influenza virus genome dynamics in sequential samples from an in vitro evolution of 2009 pandemic H1N1 (A/H1N1/09) influenza from samples sequenced on both the Roche 454 GSFLX and Illumina GAIIx platforms. Importantly, concordance between the 454 and Illumina sequencing allowed unambiguous minority-variant detection and accurate determination of virus population turnover in vitro. PMID:23382427

  4. Viral population analysis and minority-variant detection using short read next-generation sequencing.

    PubMed

    Watson, Simon J; Welkers, Matthijs R A; Depledge, Daniel P; Coulter, Eve; Breuer, Judith M; de Jong, Menno D; Kellam, Paul

    2013-03-19

    RNA viruses within infected individuals exist as a population of evolutionary-related variants. Owing to evolutionary change affecting the constitution of this population, the frequency and/or occurrence of individual viral variants can show marked or subtle fluctuations. Since the development of massively parallel sequencing platforms, such viral populations can now be investigated to unprecedented resolution. A critical problem with such analyses is the presence of sequencing-related errors that obscure the identification of true biological variants present at low frequency. Here, we report the development and assessment of the Quality Assessment of Short Read (QUASR) Pipeline (http://sourceforge.net/projects/quasr) specific for virus genome short read analysis that minimizes sequencing errors from multiple deep-sequencing platforms, and enables post-mapping analysis of the minority variants within the viral population. QUASR significantly reduces the error-related noise in deep-sequencing datasets, resulting in increased mapping accuracy and reduction of erroneous mutations. Using QUASR, we have determined influenza virus genome dynamics in sequential samples from an in vitro evolution of 2009 pandemic H1N1 (A/H1N1/09) influenza from samples sequenced on both the Roche 454 GSFLX and Illumina GAIIx platforms. Importantly, concordance between the 454 and Illumina sequencing allowed unambiguous minority-variant detection and accurate determination of virus population turnover in vitro.

  5. Development and transferability of black and red raspberry microsatellite markers from short-read sequences

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The advent of next-generation sequencing technologies has been a boon to the cost-effective development of molecular markers, particularly in non-model species. Here, we demonstrate the efficiency of microsatellite or simple sequence repeat (SSR) marker development from short-read sequences using th...

  6. An analysis of the feasibility of short read sequencing

    PubMed Central

    Whiteford, Nava; Haslam, Niall; Weber, Gerald; Prügel-Bennett, Adam; Essex, Jonathan W.; Roach, Peter L.; Bradley, Mark; Neylon, Cameron

    2005-01-01

    Several methods for ultra high-throughput DNA sequencing are currently under investigation. Many of these methods yield very short blocks of sequence information (reads). Here we report on an analysis showing the level of genome sequencing possible as a function of read length. It is shown that re-sequencing and de novo sequencing of the majority of a bacterial genome is possible with read lengths of 20–30 nt, and that reads of 50 nt can provide reconstructed contigs (a contiguous fragment of sequence data) of 1000 nt and greater that cover 80% of human chromosome 1. PMID:16275781

  7. Short read sequencing for Genomic Analysis of the brown rot fungus Fibroporia radiculosa

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The practical capability of short read sequencing for whole genome gene prediction was investigated for Fibroporia radiculosa, a copper-tolerant basidiomycete fungus that causes brown rot decay of wood. Illumina GAIIX reads from a single run of a paired-end library (75 nt read length, 300 bp insert...

  8. Whole-Genome Sequencing and Assembly with High-Throughput, Short-Read Technologies

    PubMed Central

    Sundquist, Andreas; Ronaghi, Mostafa; Tang, Haixu; Pevzner, Pavel; Batzoglou, Serafim

    2007-01-01

    While recently developed short-read sequencing technologies may dramatically reduce the sequencing cost and eventually achieve the $1000 goal for re-sequencing, their limitations prevent the de novo sequencing of eukaryotic genomes with the standard shotgun sequencing protocol. We present SHRAP (SHort Read Assembly Protocol), a sequencing protocol and assembly methodology that utilizes high-throughput short-read technologies. We describe a variation on hierarchical sequencing with two crucial differences: (1) we select a clone library from the genome randomly rather than as a tiling path and (2) we sample clones from the genome at high coverage and reads from the clones at low coverage. We assume that 200 bp read lengths with a 1% error rate and inexpensive random fragment cloning on whole mammalian genomes is feasible. Our assembly methodology is based on first ordering the clones and subsequently performing read assembly in three stages: (1) local assemblies of regions significantly smaller than a clone size, (2) clone-sized assemblies of the results of stage 1, and (3) chromosome-sized assemblies. By aggressively localizing the assembly problem during the first stage, our method succeeds in assembling short, unpaired reads sampled from repetitive genomes. We tested our assembler using simulated reads from D. melanogaster and human chromosomes 1, 11, and 21, and produced assemblies with large sets of contiguous sequence and a misassembly rate comparable to other draft assemblies. Tested on D. melanogaster and the entire human genome, our clone-ordering method produces accurate maps, thereby localizing fragment assembly and enabling the parallelization of the subsequent steps of our pipeline. Thus, we have demonstrated that truly inexpensive de novo sequencing of mammalian genomes will soon be possible with high-throughput, short-read technologies using our methodology. PMID:17534434

  9. Assembled sequence contigs by SOAPdenovo and Velvet algorithms from metagenomic short reads of a new bacterial isolate of gut origin

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Assembled sequence contigs by SOAPdenovo and Velvet algorithms from metagenomic short reads of a new bacterial isolate of gut origin. This study included 2 submissions with a total of 9.8 million bp of assembled contigs....

  10. Haplotype-Phased Synthetic Long Reads from Short-Read Sequencing

    PubMed Central

    Stapleton, James A.; Kim, Jeongwoon; Hamilton, John P.; Wu, Ming; Irber, Luiz C.; Maddamsetti, Rohan; Briney, Bryan; Newton, Linsey; Burton, Dennis R.; Brown, C. Titus; Chan, Christina; Buell, C. Robin; Whitehead, Timothy A.

    2016-01-01

    Next-generation DNA sequencing has revolutionized the study of biology. However, the short read lengths of the dominant instruments complicate assembly of complex genomes and haplotype phasing of mixtures of similar sequences. Here we demonstrate a method to reconstruct the sequences of individual nucleic acid molecules up to 11.6 kilobases in length from short (150-bp) reads. We show that our method can construct 99.97%-accurate synthetic reads from bacterial, plant, and animal genomic samples, full-length mRNA sequences from human cancer cell lines, and individual HIV env gene variants from a mixture. The preparation of multiple samples can be multiplexed into a single tube, further reducing effort and cost relative to competing approaches. Our approach generates sequencing libraries in three days from less than one microgram of DNA in a single-tube format without custom equipment or specialized expertise. PMID:26789840

  11. AREM: Aligning Short Reads from ChIP-Sequencing by Expectation Maximization

    NASA Astrophysics Data System (ADS)

    Newkirk, Daniel; Biesinger, Jacob; Chon, Alvin; Yokomori, Kyoko; Xie, Xiaohui

    High-throughput sequencing coupled to chromatin immunoprecipitation (ChIP-Seq) is widely used in characterizing genome-wide binding patterns of transcription factors, cofactors, chromatin modifiers, and other DNA binding proteins. A key step in ChIP-Seq data analysis is to map short reads from high-throughput sequencing to a reference genome and identify peak regions enriched with short reads. Although several methods have been proposed for ChIP-Seq analysis, most existing methods only consider reads that can be uniquely placed in the reference genome, and therefore have low power for detecting peaks located within repeat sequences. Here we introduce a probabilistic approach for ChIP-Seq data analysis which utilizes all reads, providing a truly genome-wide view of binding patterns. Reads are modeled using a mixture model corresponding to K enriched regions and a null genomic background. We use maximum likelihood to estimate the locations of the enriched regions, and implement an expectation-maximization (E-M) algorithm, called AREM (aligning reads by expectation maximization), to update the alignment probabilities of each read to different genomic locations. We apply the algorithm to identify genome-wide binding events of two proteins: Rad21, a component of cohesin and a key factor involved in chromatid cohesion, and Srebp-1, a transcription factor important for lipid/cholesterol homeostasis. Using AREM, we were able to identify 19,935 Rad21 peaks and 1,748 Srebp-1 peaks in the mouse genome with high confidence, including 1,517 (7.6%) Rad21 peaks and 227 (13%) Srebp-1 peaks that were missed using only uniquely mapped reads. The open source implementation of our algorithm is available at http://sourceforge.net/projects/arem

  12. Reference-based compression of short-read sequences using path encoding

    PubMed Central

    Kingsford, Carl; Patro, Rob

    2015-01-01

    Motivation: Storing, transmitting and archiving data produced by next-generation sequencing is a significant computational burden. New compression techniques tailored to short-read sequence data are needed. Results: We present here an approach to compression that reduces the difficulty of managing large-scale sequencing data. Our novel approach sits between pure reference-based compression and reference-free compression and combines much of the benefit of reference-based approaches with the flexibility of de novo encoding. Our method, called path encoding, draws a connection between storing paths in de Bruijn graphs and context-dependent arithmetic coding. Supporting this method is a system to compactly store sets of kmers that is of independent interest. We are able to encode RNA-seq reads using 3–11% of the space of the sequence in raw FASTA files, which is on average more than 34% smaller than competing approaches. We also show that even if the reference is very poorly matched to the reads that are being encoded, good compression can still be achieved. Availability and implementation: Source code and binaries freely available for download at http://www.cs.cmu.edu/~ckingsf/software/pathenc/, implemented in Go and supported on Linux and Mac OS X. Contact: carlk@cs.cmu.edu. Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25649622

  13. MOST: a modified MLST typing tool based on short read sequencing

    PubMed Central

    Dallman, Timothy; Schaefer, Ulf; Sheppard, Carmen L.; Ashton, Philip; Pichon, Bruno; Ellington, Matthew; Swift, Craig; Green, Jonathan; Underwood, Anthony

    2016-01-01

    Multilocus sequence typing (MLST) is an effective method to describe bacterial populations. Conventionally, MLST involves Polymerase Chain Reaction (PCR) amplification of housekeeping genes followed by Sanger DNA sequencing. Public Health England (PHE) is in the process of replacing the conventional MLST methodology with a method based on short read sequence data derived from Whole Genome Sequencing (WGS). This paper reports the comparison of the reliability of MLST results derived from WGS data, comparing mapping and assembly-based approaches to conventional methods using 323 bacterial genomes of diverse species. The sensitivity of the two WGS based methods was further investigated with 26 mixed and 29 low coverage genomic data sets from Salmonella enteridis and Streptococcus pneumoniae. Of the 323 samples, 92.9% (n = 300), 97.5% (n = 315) and 99.7% (n = 322) full MLST profiles were derived by the conventional method, assembly- and mapping-based approaches, respectively. The concordance between samples that were typed by conventional (92.9%) and both WGS methods was 100%. From the 55 mixed and low coverage genomes, 89.1% (n = 49) and 67.3% (n = 37) full MLST profiles were derived from the mapping and assembly based approaches, respectively. In conclusion, deriving MLST from WGS data is more sensitive than the conventional method. When comparing WGS based methods, the mapping based approach was the most sensitive. In addition, the mapping based approach described here derives quality metrics, which are difficult to determine quantitatively using conventional and WGS-assembly based approaches. PMID:27602279
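
    The percentages quoted in this abstract can be checked directly against the reported counts. The short Python sketch below does only that arithmetic; the helper name `percent` and the dictionary labels are illustrative, and the counts and totals are the ones stated in the abstract.

```python
def percent(count, total):
    """Share of `total` represented by `count`, rounded to one decimal."""
    return round(100 * count / total, 1)


# (count, total) pairs as reported: 323 genomes in the main comparison,
# 55 mixed or low-coverage data sets in the sensitivity test.
reported = {
    "conventional, full set": (300, 323),
    "assembly-based, full set": (315, 323),
    "mapping-based, full set": (322, 323),
    "mapping-based, mixed/low coverage": (49, 55),
    "assembly-based, mixed/low coverage": (37, 55),
}

for label, (count, total) in reported.items():
    print(f"{label}: {percent(count, total)}% ({count}/{total})")
```

    Each computed value agrees with the figure given in the abstract (92.9, 97.5, 99.7, 89.1 and 67.3 percent respectively).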

  14. Efficient yeast ChIP-Seq using multiplex short-read DNA sequencing

    PubMed Central

    Lefrançois, Philippe; Euskirchen, Ghia M; Auerbach, Raymond K; Rozowsky, Joel; Gibson, Theodore; Yellman, Christopher M; Gerstein, Mark; Snyder, Michael

    2009-01-01

    Background: Short-read high-throughput DNA sequencing technologies provide new tools to answer biological questions. However, high cost and low throughput limit their widespread use, particularly in organisms with smaller genomes such as S. cerevisiae. Although ChIP-Seq in mammalian cell lines is replacing array-based ChIP-chip as the standard for transcription factor binding studies, ChIP-Seq in yeast is still underutilized compared to ChIP-chip. We developed a multiplex barcoding system that allows simultaneous sequencing and analysis of multiple samples using Illumina's platform. We applied this method to analyze the chromosomal distributions of three yeast DNA binding proteins (Ste12, Cse4 and RNA PolII) and a reference sample (input DNA) in a single experiment and demonstrate its utility for rapid and accurate results at reduced costs. Results: We developed a barcoding ChIP-Seq method for the concurrent analysis of transcription factor binding sites in yeast. Our multiplex strategy generated high quality data that was indistinguishable from data obtained with non-barcoded libraries. None of the barcoded adapters induced differences relative to a non-barcoded adapter when applied to the same DNA sample. We used this method to map the binding sites for Cse4, Ste12 and Pol II throughout the yeast genome and we found 148 binding targets for Cse4, 823 targets for Ste12 and 2508 targets for PolII. Cse4 was strongly bound to all yeast centromeres as expected and the remaining non-centromeric targets correspond to highly expressed genes in rich media. The presence of Cse4 non-centromeric binding sites was not reported previously. Conclusion: We designed a multiplex short-read DNA sequencing method to perform efficient ChIP-Seq in yeast and other small genome model organisms. This method produces accurate results with higher throughput and reduced cost. Given constant improvements in high-throughput sequencing technologies, increasing multiplexing will be possible to

  15. Short-read, high-throughput sequencing technology for STR genotyping.

    PubMed

    Bornman, Daniel M; Hester, Mark E; Schuetter, Jared M; Kasoji, Manjula D; Minard-Smith, Angela; Barden, Curt A; Nelson, Scott C; Godbold, Gene D; Baker, Christine H; Yang, Boyu; Walther, Jacquelyn E; Tornes, Ivan E; Yan, Pearlly S; Rodriguez, Benjamin; Bundschuh, Ralf; Dickens, Michael L; Young, Brian A; Faith, Seth A

    2012-04-01

    DNA-based methods for human identification principally rely upon genotyping of short tandem repeat (STR) loci. Electrophoretic-based techniques for variable-length classification of STRs are universally utilized, but are limited in that they have relatively low throughput and do not yield nucleotide sequence information. High-throughput sequencing technology may provide a more powerful instrument for human identification, but is not currently validated for forensic casework. Here, we present a systematic method to perform high-throughput genotyping analysis of the Combined DNA Index System (CODIS) STR loci using short-read (150 bp) massively parallel sequencing technology. Open source reference alignment tools were optimized to evaluate PCR-amplified STR loci using a custom designed STR genome reference. Evaluation of this approach demonstrated that the 13 CODIS STR loci and amelogenin (AMEL) locus could be accurately called from individual and mixture samples. Sensitivity analysis showed that as few as 18,500 reads, aligned to an in silico referenced genome, were required to genotype an individual (>99% confidence) for the CODIS loci. The power of this technology was further demonstrated by identification of variant alleles containing single nucleotide polymorphisms (SNPs) and the development of quantitative measurements (reads) for resolving mixed samples.

  16. Bioinformatics challenges in de novo transcriptome assembly using short read sequences in the absence of a reference genome sequence.

    PubMed

    Góngora-Castillo, Elsa; Buell, C Robin

    2013-04-01

    Plant natural product research can be facilitated through genome and transcriptome sequencing approaches that generate informative sequence and expression datasets that enable characterization of biochemical pathways of interest. As the overwhelming majority of plant-derived natural products are derived from species with little, if any, sequence and/or genomic resources, the ability to perform whole genome shotgun sequencing and assembly has been and will continue to be transformative as access to a genome sequence provides molecular resources and a context for discovery and characterization of biosynthetic pathways. Due to the reduced size and complexity of the transcriptome relative to the genome, transcriptome sequencing provides a rapid, inexpensive approach to access gene sequences, gene expression abundances, and gene expression patterns in any species, including those that lack a reference genome sequence. To date, successful applications of RNA sequencing in conjunction with de novo transcriptome assembly have enabled identification of new genes in an array of biochemical pathways in plants. While sequencing technologies are well developed, challenges remain in the handling and analysis of transcriptome sequences. In this Highlight article, we provide an overview of the bioinformatics challenges associated with transcriptome analyses using short read sequences and how to address these issues in plant species that lack a reference genome.

  17. Assembly-based inference of B-cell receptor repertoires from short read RNA sequencing data with V’DJer

    PubMed Central

    Mose, Lisle E.; Selitsky, Sara R.; Bixby, Lisa M.; Marron, David L.; Iglesia, Michael D.; Serody, Jonathan S.; Perou, Charles M.; Vincent, Benjamin G.; Parker, Joel S.

    2016-01-01

    Motivation: B-cell receptor (BCR) repertoire profiling is an important tool for understanding the biology of diverse immunologic processes. Current methods for analyzing adaptive immune receptor repertoires depend upon PCR amplification of VDJ rearrangements followed by long read amplicon sequencing spanning the VDJ junctions. While this approach has proven to be effective, it is frequently not feasible due to cost or limited sample material. Additionally, there are many existing datasets where short-read RNA sequencing data are available but PCR amplified BCR data are not. Results: We present here V’DJer, an assembly-based method that reconstructs adaptive immune receptor repertoires from short-read RNA sequencing data. This method captures expressed BCR loci from a standard RNA-seq assay. We applied this method to 473 Melanoma samples from The Cancer Genome Atlas and demonstrate V’DJer’s ability to accurately reconstruct BCR repertoires from short read mRNA-seq data. Availability and Implementation: V’DJer is implemented in C/C++, freely available for academic use and can be downloaded from Github: https://github.com/mozack/vdjer Contact: benjamin_vincent@med.unc.edu or parkerjs@email.unc.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27559159

  18. Characterization of a biogas-producing microbial community by short-read next generation DNA sequencing

    PubMed Central

    2012-01-01

    Background: Renewable energy production is currently a major issue worldwide. Biogas is a promising renewable energy carrier as the technology of its production combines the elimination of organic waste with the formation of a versatile energy carrier, methane. In consequence of the complexity of the microbial communities and metabolic pathways involved, the biotechnology of the microbiological process leading to biogas production is poorly understood. Metagenomic approaches are suitable means of addressing related questions. In the present work a novel high-throughput technique was tested for its benefits in resolving the functional and taxonomical complexity of such microbial consortia. Results: It was demonstrated that the extremely parallel SOLiD™ short-read DNA sequencing platform is capable of providing sufficient useful information to decipher the systematic and functional contexts within a biogas-producing community. Although this technology has not been employed to address such problems previously, the data obtained compare well with those from similar high-throughput approaches such as 454-pyrosequencing GS FLX or Titanium. The predominant microbes contributing to the decomposition of organic matter include members of the Eubacteria, class Clostridia, order Clostridiales, family Clostridiaceae. Bacteria belonging in other systematic groups contribute to the diversity of the microbial consortium. Archaea comprise a remarkably small minority in this community, given their crucial role in biogas production. Among the Archaea, the predominant order is the Methanomicrobiales and the most abundant species is Methanoculleus marisnigri. The Methanomicrobiales are hydrogenotrophic methanogens. Besides corroborating earlier findings on the significance of the contribution of the Clostridia to organic substrate decomposition, the results demonstrate the importance of the metabolism of hydrogen within the biogas producing microbial community. Conclusions: Both

  19. Efficient Graph Based Assembly of Short-Read Sequences on Hybrid Core Architecture

    SciTech Connect

    Sczyrba, Alex; Pratap, Abhishek; Canon, Shane; Han, James; Copeland, Alex; Wang, Zhong; Brewer, Tony; Soper, David; D'Jamoos, Mike; Collins, Kirby; Vacek, George

    2011-03-22

    Advanced architectures can deliver dramatically increased throughput for genomics and proteomics applications, reducing time-to-completion in some cases from days to minutes. One such architecture, hybrid-core computing, marries a traditional x86 environment with a reconfigurable coprocessor, based on field programmable gate array (FPGA) technology. In addition to higher throughput, increased performance can fundamentally improve research quality by allowing more accurate, previously impractical approaches. We will discuss the approach used by Convey's de Bruijn graph constructor for short-read, de-novo assembly. Bioinformatics applications that have random access patterns to large memory spaces, such as graph-based algorithms, experience memory performance limitations on cache-based x86 servers. Convey's highly parallel memory subsystem allows application-specific logic to simultaneously access 8192 individual words in memory, significantly increasing effective memory bandwidth over cache-based memory systems. Many algorithms, such as Velvet and other de Bruijn graph based, short-read, de-novo assemblers, can greatly benefit from this type of memory architecture. Furthermore, small data type operations (four nucleotides can be represented in two bits) make more efficient use of logic gates than the data types dictated by conventional programming models. JGI is comparing the performance of Convey's graph constructor and Velvet on both synthetic and real data. We will present preliminary results on memory usage and run time metrics for various data sets with different sizes, from small microbial and fungal genomes to a very large cow rumen metagenome. For genomes with references we will also present assembly quality comparisons between the two assemblers.
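
    The remark that each of the four nucleotides fits in two bits can be made concrete with a small Python sketch. The particular bit assignment (A=00, C=01, G=10, T=11) and the helper names `pack` and `unpack` are assumptions for illustration; the abstract does not say which encoding the hardware uses.

```python
# Assumed 2-bit code for each nucleotide; any fixed assignment would work.
ENCODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
DECODE = "ACGT"  # index i holds the base whose code is i


def pack(seq):
    """Pack a DNA string into an integer, two bits per base."""
    value = 0
    for base in seq:
        value = (value << 2) | ENCODE[base]
    return value


def unpack(value, length):
    """Recover a DNA string of known length from its packed integer."""
    bases = []
    for shift in range(2 * (length - 1), -2, -2):
        bases.append(DECODE[(value >> shift) & 0b11])
    return "".join(bases)
```

    The length has to be stored alongside the packed value because leading A bases encode as zero bits. A k-mer of length k then occupies 2k bits, so a 31-mer fits in a single 64-bit word.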

  20. Similarity thresholds used in DNA sequence assembly from short reads can reduce the comparability of population histories across species

    PubMed Central

    Judy, Caroline Duffie; Seeholzer, Glenn F.; Maley, James M.; Graves, Gary R.; Brumfield, Robb T.

    2015-01-01

    Comparing inferences among datasets generated using short read sequencing may provide insight into the concerted impacts of divergence, gene flow and selection across organisms, but comparisons are complicated by biases introduced during dataset assembly. Sequence similarity thresholds allow the de novo assembly of short reads into clusters of alleles representing different loci, but the resulting datasets are sensitive to both the similarity threshold used and to the variation naturally present in the organism under study. Thresholds that require high sequence similarity among reads for assembly (stringent thresholds) as well as highly variable species may result in datasets in which divergent alleles are lost or divided into separate loci (‘over-splitting’), whereas liberal thresholds increase the risk of paralogous loci being combined into a single locus (‘under-splitting’). Comparisons among datasets or species are therefore potentially biased if different similarity thresholds are applied or if the species differ in levels of within-lineage genetic variation. We examine the impact of a range of similarity thresholds on assembly of empirical short read datasets from populations of four different non-model bird lineages (species or species pairs) with different levels of genetic divergence. We find that, in all species, stringent similarity thresholds result in fewer alleles per locus than more liberal thresholds, which appears to be the result of high levels of over-splitting. The frequency of putative under-splitting, conversely, is low at all thresholds. Inferred genetic distances between individuals, gene tree depths, and estimates of the ancestral mutation-scaled effective population size (θ) differ depending upon the similarity threshold applied. Relative differences in inferences across species differ even when the same threshold is applied, but may be dramatically different when datasets assembled under different thresholds are compared. These

  1. BarraCUDA - a fast short read sequence aligner using graphics processing units

    PubMed Central

    2012-01-01

    Background: With the maturation of next-generation DNA sequencing (NGS) technologies, the throughput of DNA sequencing reads has soared to over 600 gigabases from a single instrument run. General purpose computing on graphics processing units (GPGPU) extracts the computing power from hundreds of parallel stream processors within graphics processing cores and provides a cost-effective and energy efficient alternative to traditional high-performance computing (HPC) clusters. In this article, we describe the implementation of BarraCUDA, a GPGPU sequence alignment software that is based on BWA, to accelerate the alignment of sequencing reads generated by these instruments to a reference DNA sequence. Findings: Using the NVIDIA Compute Unified Device Architecture (CUDA) software development environment, we ported the most computationally intensive alignment component of BWA to GPU to take advantage of the massive parallelism. As a result, BarraCUDA offers a magnitude of performance boost in alignment throughput when compared to a CPU core while delivering the same level of alignment fidelity. The software is also capable of supporting multiple CUDA devices in parallel to further accelerate the alignment throughput. Conclusions: BarraCUDA is designed to take advantage of the parallelism of GPU to accelerate the alignment of millions of sequencing reads generated by NGS instruments. By doing this, we could, at least in part, streamline the current bioinformatics pipeline such that the wider scientific community could benefit from the sequencing technology. BarraCUDA is currently available from http://seqbarracuda.sf.net PMID:22244497

  2. MOSAIK: a hash-based algorithm for accurate next-generation sequencing short-read mapping.

    PubMed

    Lee, Wan-Ping; Stromberg, Michael P; Ward, Alistair; Stewart, Chip; Garrison, Erik P; Marth, Gabor T

    2014-01-01

    MOSAIK is a stable, sensitive and open-source program for mapping second and third-generation sequencing reads to a reference genome. Uniquely among current mapping tools, MOSAIK can align reads generated by all the major sequencing technologies, including Illumina, Applied Biosystems SOLiD, Roche 454, Ion Torrent and Pacific BioSciences SMRT. Indeed, MOSAIK was the only aligner to provide consistent mappings for all the generated data (sequencing technologies, low-coverage and exome) in the 1000 Genomes Project. To provide highly accurate alignments, MOSAIK employs a hash clustering strategy coupled with the Smith-Waterman algorithm. This method is well-suited to capture mismatches as well as short insertions and deletions. To support the growing interest in larger structural variant (SV) discovery, MOSAIK provides explicit support for handling known-sequence SVs, e.g. mobile element insertions (MEIs) as well as generating outputs tailored to aid in SV discovery. All variant discovery benefits from an accurate description of the read placement confidence. To this end, MOSAIK uses a neural-network based training scheme to provide well-calibrated mapping quality scores, demonstrated by a correlation coefficient between MOSAIK assigned and actual mapping qualities greater than 0.98. In order to ensure that studies of any genome are supported, a training pipeline is provided to ensure optimal mapping quality scores for the genome under investigation. MOSAIK is multi-threaded, open source, and incorporated into our command and pipeline launcher system GKNO (http://gkno.me).
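
    The combination of hashed seeds and Smith-Waterman alignment mentioned above can be illustrated with a minimal sketch. This is not MOSAIK's implementation: the function names, the diagonal-voting step used to cluster seed hits, and the scoring values are all assumptions chosen for illustration. Reference k-mers are hashed into a dictionary, seed hits from the read vote for an offset, and a small window around the winning offset is scored with Smith-Waterman.

```python
from collections import Counter, defaultdict


def build_kmer_index(reference, k):
    """Hash every k-mer of the reference to the positions where it occurs."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def best_diagonal(read, index, k):
    """Cluster seed hits by diagonal (reference pos - read pos); return the top one."""
    votes = Counter()
    for j in range(len(read) - k + 1):
        for pos in index.get(read[j:j + k], ()):
            votes[pos - j] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]


def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score with a linear gap penalty (illustrative scores)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def map_read(read, reference, index, k=4, pad=2):
    """Return (approximate reference offset, local alignment score), or None."""
    diag = best_diagonal(read, index, k)
    if diag is None:
        return None
    start = max(0, diag - pad)
    end = min(len(reference), diag + len(read) + pad)
    return diag, smith_waterman_score(read, reference[start:end])
```

    The padding around the candidate window is what lets the local alignment absorb short insertions and deletions; a production mapper would also report the alignment itself and a mapping quality.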

  3. Choice of Reference Sequence and Assembler for Alignment of Listeria monocytogenes Short-Read Sequence Data Greatly Influences Rates of Error in SNP Analyses

    PubMed Central

    Pightling, Arthur W.; Petronella, Nicholas; Pagotto, Franco

    2014-01-01

    The wide availability of whole-genome sequencing (WGS) and an abundance of open-source software have made detection of single-nucleotide polymorphisms (SNPs) in bacterial genomes an increasingly accessible and effective tool for comparative analyses. Thus, ensuring that real nucleotide differences between genomes (i.e., true SNPs) are detected at high rates and that the influences of errors (such as false positive SNPs, ambiguously called sites, and gaps) are mitigated is of utmost importance. The choices researchers make regarding the generation and analysis of WGS data can greatly influence the accuracy of short-read sequence alignments and, therefore, the efficacy of such experiments. We studied the effects of some of these choices, including: i) depth of sequencing coverage, ii) choice of reference-guided short-read sequence assembler, iii) choice of reference genome, and iv) whether to perform read-quality filtering and trimming, on our ability to detect true SNPs and on the frequencies of errors. We performed benchmarking experiments, during which we assembled simulated and real Listeria monocytogenes strain 08-5578 short-read sequence datasets of varying quality with four commonly used assemblers (BWA, MOSAIK, Novoalign, and SMALT), using reference genomes of varying genetic distances, and with or without read pre-processing (i.e., quality filtering and trimming). We found that assemblies of at least 50-fold coverage provided the most accurate results. In addition, MOSAIK yielded the fewest errors when reads were aligned to a nearly identical reference genome, while using SMALT to align reads against a reference sequence that is ∼0.82% distant from 08-5578 at the nucleotide level resulted in the detection of the greatest numbers of true SNPs and the fewest errors. Finally, we show that whether read pre-processing improves SNP detection depends upon the choice of reference sequence and assembler. In total, this study demonstrates that researchers should

  4. Alignment of Short Reads: A Crucial Step for Application of Next-Generation Sequencing Data in Precision Medicine

    PubMed Central

    Ye, Hao; Meehan, Joe; Tong, Weida; Hong, Huixiao

    2015-01-01

    Precision medicine, or personalized medicine, has been proposed as a modernized and promising medical strategy. The genetic variants of patients are the key information for implementing precision medicine. Next-generation sequencing (NGS) is an emerging technology for deciphering genetic variants. Alignment of raw reads to a reference genome is one of the key steps in NGS data analysis. Many algorithms have been developed for alignment of short read sequences since 2008. Users have to decide which alignment algorithm to use in their studies, and that decision covers not only the algorithm itself but also the set of suitable parameters it will use. Understanding these algorithms helps in selecting the appropriate alignment algorithm for different applications in precision medicine. Here, we review currently available algorithms and their major strategies, such as seed-and-extend and q-gram filter. We also discuss the challenges facing current alignment algorithms, including alignment in multiple repeated regions, alignment of long reads, and alignment facilitated with known genetic variants. PMID:26610555
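
    As an illustration of the q-gram filter strategy named above, the sketch below applies the q-gram lemma: a read of length m that matches a region with at most e edits shares at least m + 1 - q(e + 1) q-grams with it, so windows sharing fewer q-grams can be discarded before any expensive alignment. The function names are hypothetical and the code is not taken from any of the reviewed aligners.

```python
from collections import Counter


def qgram_threshold(m, q, e):
    """Minimum shared q-grams for a read of length m with at most e edits."""
    return m + 1 - q * (e + 1)


def qgram_counts(s, q):
    """Multiset of all overlapping q-grams in s."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def shared_qgrams(read, window, q):
    """Number of q-grams common to both strings, counted with multiplicity."""
    a, b = qgram_counts(read, q), qgram_counts(window, q)
    return sum(min(count, b[gram]) for gram, count in a.items())


def passes_filter(read, window, q, e):
    """True if the window cannot be ruled out as a match within e edits."""
    t = qgram_threshold(len(read), q, e)
    # A non-positive threshold carries no information, so nothing is rejected.
    return t <= 0 or shared_qgrams(read, window, q) >= t
```

    Windows that pass the filter still need verification by a full alignment; the filter only guarantees that no true match within e edits is thrown away.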

  5. Short reads and nonmodel species: exploring the complexities of next-generation sequence assembly and SNP discovery in the absence of a reference genome.

    PubMed

    Everett, M V; Grau, E D; Seeb, J E

    2011-03-01

    How practical is gene and SNP discovery in a nonmodel species using short read sequences? Next-generation sequencing technologies are being applied to an increasing number of species with no reference genome. For nonmodel species, the cost, availability of existing genetic resources, genome complexity and the planned method of assembly must all be considered when selecting a sequencing platform. Our goal was to examine the feasibility and optimal methodology for SNP and gene discovery in the sockeye salmon (Oncorhynchus nerka) using short read sequences. SOLiD short reads (up to 50 bp) were generated from single- and pooled-tissue transcriptome libraries from ten sockeye salmon. The individuals were from five distinct populations from the Wood River Lakes and Mendeltna Creek, Alaska. As no reference genome was available for sockeye salmon, the SOLiD sequence reads were assembled to publicly available EST reference sequences from sockeye salmon and two closely related species, rainbow trout (Oncorhynchus mykiss) and Atlantic salmon (Salmo salar). Additionally, de novo assembly of the SOLiD data was carried out, and the SOLiD reads were remapped to the de novo contigs. The results from each reference assembly were compared across all references. The number and size of contigs assembled varied with the size of the reference sequences. In silico SNP discovery was carried out on contigs from all four EST references; however, discovery of valid SNPs was most successful using one of the two conspecific references.

  6. Whole genome sequencing of environmental Vibrio cholerae O1 from 10 nanograms of DNA using short reads.

    PubMed

    Pérez Chaparro, Paula Juliana; McCulloch, John Anthony; Cerdeira, Louise Teixeira; Al-Dilaimi, Arwa; Canto de Sá, Lena Lillian; de Oliveira, Rodrigo; Tauch, Andreas; de Carvalho Azevedo, Vasco Ariston; Cruz Schneider, Maria Paula; da Silva, Artur Luiz da Costa

    2011-11-01

    Multiple Displacement Amplification (MDA) of DNA using φ29 (phi29) DNA polymerase amplifies DNA several billion-fold, which has proved to be potentially very useful for evaluating genome information in a culture-independent manner. Whole genome sequencing using DNA from a single prokaryotic genome copy amplified by MDA has not yet been achieved due to the formation of chimeras and skewed amplification of genomic regions during the MDA step, which then precludes genome assembly. We addressed this issue by using 10 ng of genomic Vibrio cholerae DNA, extracted within an agarose plug to ensure circularity, as the starting point for MDA and then sequencing the amplified yield on the SOLiD platform. We assembled the entire genome of V. cholerae strain LMA3984-4 (an environmental O1 strain isolated in urban Amazonia) using a hybrid de novo assembly strategy. Using our method, only 178 of 16,713 contigs (1%) could not be placed into either chromosome scaffold, and of these 178, only 3 appeared to be chimeras. The other contigs seem to be the result of template-independent non-specific amplification during MDA, yielding spurious reads. Extraction of genomic DNA within an agarose plug to ensure circularity of the extracted genome might be key to minimizing amplification bias by MDA for WGS.

  7. Comprehensive definition of genome features in Spirodela polyrhiza by high-depth physical mapping and short-read DNA sequencing strategies.

    PubMed

    Michael, Todd P; Bryant, Douglas; Gutierrez, Ryan; Borisjuk, Nikolai; Chu, Philomena; Zhang, Hanzhong; Xia, Jing; Zhou, Junfei; Peng, Hai; El Baidouri, Moaine; Ten Hallers, Boudewijn; Hastie, Alex R; Liang, Tiffany; Acosta, Kenneth; Gilbert, Sarah; McEntee, Connor; Jackson, Scott A; Mockler, Todd C; Zhang, Weixiong; Lam, Eric

    2017-02-01

    Spirodela polyrhiza is a fast-growing aquatic monocot with highly reduced morphology, genome size and number of protein-coding genes. Considering these biological features of Spirodela and its basal position in the monocot lineage, understanding its genome architecture could shed light on plant adaptation and genome evolution. Like many draft genomes, however, the 158-Mb Spirodela genome sequence has not been resolved to chromosomes, and important genome characteristics have not been defined. Here we deployed rapid genome-wide physical maps combined with high-coverage short-read sequencing to resolve the 20 chromosomes of Spirodela and to empirically delineate its genome features. Our data revealed a dramatic reduction in the number of the rDNA repeat units in Spirodela to fewer than 100, which is even fewer than that reported for yeast. Consistent with its unique phylogenetic position, small RNA sequencing revealed 29 Spirodela-specific microRNA, with only two being shared with Elaeis guineensis (oil palm) and Musa balbisiana (banana). Combining DNA methylation data and small RNA sequencing enabled the accurate prediction of 20.5% long terminal repeats (LTRs) that doubled the previous estimate, and revealed a high Solo:Intact LTR ratio of 8.2. Interestingly, we found that Spirodela has the lowest global DNA methylation levels (9%) of any plant species tested. Taken together our results reveal a genome that has undergone reduction, likely through eliminating non-essential protein coding genes, rDNA and LTRs. In addition to delineating the genome features of this unique plant, the methodologies described and large-scale genome resources from this work will enable future evolutionary and functional studies of this basal monocot family.

  8. Deep sequencing of evolving pathogen populations: applications, errors, and bioinformatic solutions

    PubMed Central

    2014-01-01

    Deep sequencing harnesses the high throughput nature of next generation sequencing technologies to generate population samples, treating information contained in individual reads as meaningful. Here, we review applications of deep sequencing to pathogen evolution. Pioneering deep sequencing studies from the virology literature are discussed, such as whole genome Roche-454 sequencing analyses of the dynamics of the rapidly mutating pathogens hepatitis C virus and HIV. Extension of the deep sequencing approach to bacterial populations is then discussed, including the impacts of emerging sequencing technologies. While it is clear that deep sequencing has unprecedented potential for assessing the genetic structure and evolutionary history of pathogen populations, bioinformatic challenges remain. We summarise current approaches to overcoming these challenges, in particular methods for detecting low frequency variants in the context of sequencing error and reconstructing individual haplotypes from short reads. PMID:24428920

  9. Leveraging FPGAs for Accelerating Short Read Alignment.

    PubMed

    Arram, James; Kaplan, Thomas; Luk, Wayne; Jiang, Peiyong

    2016-02-29

    One of the key challenges facing genomics today is how to efficiently analyse the massive amounts of data produced by next-generation sequencing platforms. With general-purpose computing systems struggling to address this challenge, specialised processors such as the Field-Programmable Gate Array (FPGA) are receiving growing interest. The means by which to leverage this technology for accelerating genomic data analysis is, however, largely unexplored. In this paper we present a runtime reconfigurable architecture for accelerating short read alignment using FPGAs. This architecture exploits the reconfigurability of FPGAs to allow the development of fast yet flexible alignment designs. We apply this architecture to develop an alignment design which supports exact and approximate alignment with up to 2 mismatches. Our design is based on the FM-index, with optimisations to improve the alignment performance. In particular, the n-step FM-index, index oversampling, a seed-and-compare stage, and bi-directional backtracking are included. Our design is implemented and evaluated on a 1U Maxeler MPC-X2000 dataflow node with 8 Altera Stratix-V FPGAs. Measurements show that our design is 28 times faster than Bowtie2 running with 16 threads on dual Intel Xeon E5-2640 CPUs, and 9 times faster than Soap3-dp running on an NVIDIA Tesla C2070 GPU.
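
    For readers unfamiliar with the FM-index on which the design above is based, the following software sketch shows exact-match backward search. It is a plain, unoptimised illustration and not the FPGA design: the index is built naively by sorting suffixes, the function names are hypothetical, and the n-step index, oversampling and mismatch backtracking described in the abstract are not included. The code assumes the text does not contain the sentinel character `$`.

```python
def build_fm_index(text):
    """Build a suffix array, the C table and occurrence counts for text + '$'."""
    text += "$"  # sentinel, assumed absent from the input and smaller than A/C/G/T
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)  # text[-1] is '$' when i == 0
    alphabet = sorted(set(text))
    # C[c]: number of characters in the text that sort strictly before c
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += text.count(c)
    # occ[c][i]: number of occurrences of c in bwt[:i]
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return sa, C, occ


def backward_search(pattern, sa, C, occ):
    """Return sorted start positions of exact occurrences of pattern."""
    lo, hi = 0, len(sa)  # half-open suffix-array interval
    for ch in reversed(pattern):
        if ch not in C:
            return []
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])
```

    Each pattern character costs two table lookups regardless of the reference length, which is the property that makes the structure attractive for hardware pipelines.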

  10. Droplet barcoding for massively parallel single-molecule deep sequencing

    PubMed Central

    Lan, Freeman; Haliburton, John R.; Yuan, Aaron; Abate, Adam R.

    2016-01-01

    The ability to accurately sequence long DNA molecules is important across biology, but existing sequencers are limited in read length and accuracy. Here, we demonstrate a method to leverage short-read sequencing to obtain long and accurate reads. Using droplet microfluidics, we isolate, amplify, fragment and barcode single DNA molecules in aqueous picolitre droplets, allowing the full-length molecules to be sequenced with multi-fold coverage using short-read sequencing. We show that this approach can provide accurate sequences of up to 10 kb, allowing us to identify rare mutations below the detection limit of conventional sequencing and directly link them into haplotypes. This barcoding methodology can be a powerful tool in sequencing heterogeneous populations such as viruses. PMID:27353563

  11. Qualitative de novo analysis of full length cDNA and quantitative analysis of gene expression for common marmoset (Callithrix jacchus) transcriptomes using parallel long-read technology and short-read sequencing.

    PubMed

    Shimizu, Makiko; Iwano, Shunsuke; Uno, Yasuhiro; Uehara, Shotaro; Inoue, Takashi; Murayama, Norie; Onodera, Jun; Sasaki, Erika; Yamazaki, Hiroshi

    2014-01-01

    The common marmoset (Callithrix jacchus) is a non-human primate that could prove useful as human pharmacokinetic and biomedical research models. The cytochromes P450 (P450s) are a superfamily of enzymes that have critical roles in drug metabolism and disposition via monooxygenation of a broad range of xenobiotics; however, information on some marmoset P450s is currently limited. Therefore, identification and quantitative analysis of tissue-specific mRNA transcripts, including those of P450s and flavin-containing monooxygenases (FMO, another monooxygenase family), need to be carried out in detail before the marmoset can be used as an animal model in drug development. De novo assembly and expression analysis of marmoset transcripts were conducted with pooled liver, intestine, kidney, and brain samples from three male and three female marmosets. After unique sequences were automatically aligned by assembling software, the mean contig length was 718 bp (with a standard deviation of 457 bp) among a total of 47,883 transcripts. Approximately 30% of the total transcripts were matched to known marmoset sequences. Gene expression in 18 marmoset P450- and 4 FMO-like genes displayed some tissue-specific patterns. Of these, the three most highly expressed in marmoset liver were P450 2D-, 2E-, and 3A-like genes. In extrahepatic tissues, including brain, gene expressions of these monooxygenases were lower than those in liver, although P450 3A4 (previously P450 3A21) in intestine and P450 4A11- and FMO1-like genes in kidney were relatively highly expressed. By means of massive parallel long-read sequencing and short-read technology applied to marmoset liver, intestine, kidney, and brain, the combined next-generation sequencing analyses reported here were able to identify novel marmoset drug-metabolizing P450 transcripts that have until now been little reported. These results provide a foundation for mechanistic studies and pave the way for the use of marmosets as model animals

  12. Qualitative De Novo Analysis of Full Length cDNA and Quantitative Analysis of Gene Expression for Common Marmoset (Callithrix jacchus) Transcriptomes Using Parallel Long-Read Technology and Short-Read Sequencing

    PubMed Central

    Uno, Yasuhiro; Uehara, Shotaro; Inoue, Takashi; Murayama, Norie; Onodera, Jun; Sasaki, Erika; Yamazaki, Hiroshi

    2014-01-01

    The common marmoset (Callithrix jacchus) is a non-human primate that could prove useful as human pharmacokinetic and biomedical research models. The cytochromes P450 (P450s) are a superfamily of enzymes that have critical roles in drug metabolism and disposition via monooxygenation of a broad range of xenobiotics; however, information on some marmoset P450s is currently limited. Therefore, identification and quantitative analysis of tissue-specific mRNA transcripts, including those of P450s and flavin-containing monooxygenases (FMO, another monooxygenase family), need to be carried out in detail before the marmoset can be used as an animal model in drug development. De novo assembly and expression analysis of marmoset transcripts were conducted with pooled liver, intestine, kidney, and brain samples from three male and three female marmosets. After unique sequences were automatically aligned by assembling software, the mean contig length was 718 bp (with a standard deviation of 457 bp) among a total of 47,883 transcripts. Approximately 30% of the total transcripts were matched to known marmoset sequences. Gene expression in 18 marmoset P450- and 4 FMO-like genes displayed some tissue-specific patterns. Of these, the three most highly expressed in marmoset liver were P450 2D-, 2E-, and 3A-like genes. In extrahepatic tissues, including brain, gene expressions of these monooxygenases were lower than those in liver, although P450 3A4 (previously P450 3A21) in intestine and P450 4A11- and FMO1-like genes in kidney were relatively highly expressed. By means of massive parallel long-read sequencing and short-read technology applied to marmoset liver, intestine, kidney, and brain, the combined next-generation sequencing analyses reported here were able to identify novel marmoset drug-metabolizing P450 transcripts that have until now been little reported. These results provide a foundation for mechanistic studies and pave the way for the use of marmosets as model animals

  13. SlideSort: all pairs similarity search for short reads

    PubMed Central

    Shimizu, Kana; Tsuda, Koji

    2011-01-01

    Motivation: Recent progress in DNA sequencing technologies calls for fast and accurate algorithms that can evaluate sequence similarity for a huge number of short reads. Searching for similar pairs in a string pool is a fundamental process of de novo genome assembly, genome-wide alignment and other important analyses. Results: In this study, we designed and implemented an exact algorithm, SlideSort, that finds all similar pairs in a string pool in terms of edit distance. Using an efficient pattern growth algorithm, SlideSort discovers chains of common k-mers to narrow down the search. Compared to existing methods based on single k-mers, our method is more effective in reducing the number of edit distance calculations. In comparison to backtracking methods such as BWA, our method is much faster in finding remote matches, scaling easily to tens of millions of sequences. Our software has an additional function of single link clustering, which is useful in summarizing short reads for further processing. Availability: Executable binary files and C++ libraries are available at http://www.cbrc.jp/~shimizu/slidesort/ for Linux and Windows. Contact: slidesort@m.aist.go.jp; shimizu-kana@aist.go.jp Supplementary information: Supplementary data are available at Bioinformatics online. PMID:21148542
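
    The all-pairs problem described above can be made concrete with a small baseline. The sketch below is not SlideSort: it uses the simpler single k-mer filter that the abstract compares against, with hypothetical function names. It relies on a pigeonhole argument: if two reads are within edit distance d, then splitting one of them into d + 1 blocks leaves at least one block untouched, so the pair must share a k-mer of length floor(min_len / (d + 1)). Only pairs sharing such a k-mer are verified with a full edit-distance computation.

```python
from collections import defaultdict
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance by row-wise dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution or match
        prev = curr
    return prev[-1]


def similar_pairs(reads, d):
    """Return all index pairs (i, j), i < j, with edit distance <= d."""
    k = min(len(r) for r in reads) // (d + 1)
    if k == 0:
        raise ValueError("reads are too short for this distance threshold")
    # Bucket reads by every k-mer they contain, at any position.
    buckets = defaultdict(set)
    for idx, read in enumerate(reads):
        for i in range(len(read) - k + 1):
            buckets[read[i:i + k]].add(idx)
    # Any pair within distance d shares at least one bucket.
    candidates = set()
    for ids in buckets.values():
        candidates.update(combinations(sorted(ids), 2))
    return sorted((i, j) for i, j in candidates
                  if edit_distance(reads[i], reads[j]) <= d)
```

    The filter is exact (it never drops a true pair), but short k-mers produce many candidate pairs, which is the cost that chaining common k-mers is meant to reduce.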

  14. Quantitative phenotyping via deep barcode sequencing.

    PubMed

    Smith, Andrew M; Heisler, Lawrence E; Mellor, Joseph; Kaper, Fiona; Thompson, Michael J; Chee, Mark; Roth, Frederick P; Giaever, Guri; Nislow, Corey

    2009-10-01

    Next-generation DNA sequencing technologies have revolutionized diverse genomics applications, including de novo genome sequencing, SNP detection, chromatin immunoprecipitation, and transcriptome analysis. Here we apply deep sequencing to genome-scale fitness profiling to evaluate yeast strain collections in parallel. This method, Barcode analysis by Sequencing, or "Bar-seq," outperforms the current benchmark barcode microarray assay in terms of both dynamic range and throughput. When applied to a complex chemogenomic assay, Bar-seq quantitatively identifies drug targets, with performance superior to the benchmark microarray assay. We also show that Bar-seq is well-suited for a multiplex format. We completely re-sequenced and re-annotated the yeast deletion collection using deep sequencing, found that approximately 20% of the barcodes and common priming sequences varied from expectation, and used this revised list of barcode sequences to improve data quality. Together, this new assay and analysis routine provide a deep-sequencing-based toolkit for identifying gene-environment interactions on a genome-wide scale.

  15. From deep sequencing to actual clones.

    PubMed

    D'Angelo, Sara; Kumar, Sandeep; Naranjo, Leslie; Ferrara, Fortunato; Kiss, Csaba; Bradbury, Andrew R M

    2014-10-01

    The application of deep sequencing to in vitro display technologies has been invaluable for the straightforward analysis of enriched clones. After sequencing in vitro selected populations, clones are binned into identical or similar groups and ordered by abundance, allowing identification of those that are most enriched. However, the greatest strength of deep sequencing is also its greatest weakness: clones are easily identified by their DNA sequences, but are not physically available for testing without a laborious multistep process involving several rounds of polymerase chain reaction (PCR), assembly and cloning. Here, using the isolation of antibody genes from a phage and yeast display selection as an example, we show the power of a rapid and simple inverse PCR-based method to easily isolate clones identified by deep sequencing. Once primers have been received, clone isolation can be carried out in a single day, rather than two days. Furthermore, the reduced number of PCRs required will reduce PCR mutations correspondingly. We have observed a 100% success rate in amplifying clones with an abundance as low as 0.5% in a polyclonal population. This approach allows us to obtain full-length clones even when an incomplete sequence is available, and greatly simplifies the subcloning process. Moreover, rarer but functional clones missed by traditional screening can be easily isolated using this method, and the approach can be extended to any selected library (scFv, cDNA, libraries based on scaffold proteins) where a unique sequence signature for the desired clones of interest is available.

  16. CRISPR Detection From Short Reads Using Partial Overlap Graphs.

    PubMed

    Ben-Bassat, Ilan; Chor, Benny

    2016-06-01

    Clustered regularly interspaced short palindromic repeats (CRISPR) are structured regions in bacterial and archaeal genomes, which are part of an adaptive immune system against phages. CRISPRs are important for many microbial studies and are playing an essential role in current gene editing techniques. As such, they attract substantial research interest. The exponential growth in the amount of bacterial sequence data in recent years enables the exploration of CRISPR loci in more and more species. Most of the automated tools that detect CRISPR loci rely on fully assembled genomes. However, many assemblers do not handle repetitive regions successfully. The first tool to work directly on raw sequence data is Crass, which requires reads that are long enough to contain two copies of the same repeat. We present a method to identify CRISPR repeats from raw sequence data of short reads. The algorithm is based on an observation differentiating CRISPR repeats from other types of repeats, and it involves a series of partial constructions of the overlap graph. This enables us to avoid many of the difficulties that assemblers face, as we merely aim to identify the repeats that belong to CRISPR loci. A preliminary implementation of the algorithm shows good results and detects CRISPR repeats in cases where other existing tools fail to do so.

  17. Optimization of de novo short read assembly of seabuckthorn (Hippophae rhamnoides L.) transcriptome.

    PubMed

    Ghangal, Rajesh; Chaudhary, Saurabh; Jain, Mukesh; Purty, Ram Singh; Chand Sharma, Prakash

    2013-01-01

    Seabuckthorn (Hippophae rhamnoides L.) has been known for its medicinal, nutritional and environmental importance since ancient times. However, very limited efforts have been made to characterize the genome and transcriptome of this wonder plant. Here, we report the use of next generation massive parallel sequencing technology (Illumina platform) and de novo assembly to gain a comprehensive view of the seabuckthorn transcriptome. We assembled 86,253,874 high quality short reads using six assembly tools. In our hands, assembly of non-redundant short reads following a two-step procedure was found to be the best considering various assembly quality parameters. Initially, the ABySS tool was used following an additive k-mer approach. The assembled transcripts were subsequently subjected to the TGICL suite. Finally, de novo short read assembly yielded 88,297 transcripts (> 100 bp), representing about 53 Mb of seabuckthorn transcriptome. The average length of transcripts was 610 bp, the N50 length was 1198 bp, and 91% of the short reads uniquely mapped back to the seabuckthorn transcriptome. A total of 41,340 (46.8%) transcripts showed significant similarity with sequences present in nr protein databases of NCBI (E-value < 1E-06). We also screened the assembled transcripts for the presence of transcription factors and simple sequence repeats. Our strategy involving the use of a short read assembler (ABySS) followed by TGICL will be useful for researchers working with a non-model organism's transcriptome in terms of saving time and reducing complexity in data management. The seabuckthorn transcriptome data generated here provide a valuable resource for gene discovery and development of functional molecular markers.
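
    Assembly summaries such as the mean transcript length and N50 quoted above are straightforward to compute from a list of sequence lengths. The sketch below uses the common definition of N50 (the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembled bases); the function names are hypothetical and the example lengths are made up, not taken from the seabuckthorn data.

```python
def n50(lengths):
    """Length L such that sequences of length >= L hold at least half the bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no lengths given")


def mean_length(lengths):
    """Arithmetic mean of the sequence lengths."""
    return sum(lengths) / len(lengths)


# Made-up transcript lengths in bp, for illustration only.
example = [200, 300, 400, 500, 600]
print(n50(example), mean_length(example))
```

    Because N50 weights long sequences more heavily than the mean does, an N50 well above the mean, as reported above, indicates a length distribution with many short transcripts and a smaller number of long ones.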

  18. Deep Whole-Genome Sequencing to Detect Mixed Infection of Mycobacterium tuberculosis

    PubMed Central

    Gan, Mingyu; Liu, Qingyun; Yang, Chongguang; Gao, Qian; Luo, Tao

    2016-01-01

    Mixed infection by multiple Mycobacterium tuberculosis (MTB) strains is associated with poor treatment outcome of tuberculosis (TB). Traditional genotyping methods have been used to detect mixed infections of MTB; however, their sensitivity and resolution are limited. Deep whole-genome sequencing (WGS) has proved highly sensitive and discriminative for studying population heterogeneity of MTB. Here, we developed a phylogenetic-based method to detect MTB mixed infections using WGS data. We collected published WGS data of 782 global MTB strains from public databases. We called homogeneous and heterogeneous single nucleotide variations (SNVs) of individual strains by mapping short reads to the ancestral MTB reference genome. We constructed a phylogenomic database based on 68,639 homogeneous SNVs of 652 MTB strains. Mixed infections were determined if multiple evolutionary paths were identified by mapping the SNVs of individual samples to the phylogenomic database. By simulation, our method could specifically detect mixed infections when the sequencing depth of minor strains was as low as 1× coverage, and when the genomic distance of two mixed strains was as small as 16 SNVs. By applying our methods to all 782 samples, we detected 47 mixed infections, 45 of which were caused by locally endemic strains. The results indicate that our method is highly sensitive and discriminative for identifying mixed infections from deep WGS data of MTB isolates. PMID:27391214

  19. Assembly complexity of prokaryotic genomes using short reads

    PubMed Central

    2010-01-01

    Background: De Bruijn graphs are a theoretical framework underlying several modern genome assembly programs, especially those that deal with very short reads. We describe an application of de Bruijn graphs to analyze the global repeat structure of prokaryotic genomes. Results: We provide the first survey of the repeat structure of a large number of genomes. The analysis gives an upper bound on the performance of genome assemblers for de novo reconstruction of genomes across a wide range of read lengths. Further, we demonstrate that the majority of genes in prokaryotic genomes can be reconstructed uniquely using very short reads even if the genomes themselves cannot. The non-reconstructible genes are overwhelmingly related to mobile elements (transposons, IS elements, and prophages). Conclusions: Our results improve upon previous studies on the feasibility of assembly with short reads and provide a comprehensive benchmark against which to compare the performance of the short-read assemblers currently being developed. PMID:20064276
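
    A toy example may help show how a de Bruijn graph exposes repeat structure. The sketch below is not the authors' analysis pipeline; the function names and the example sequence are assumptions. Nodes are (k-1)-mers, each k-mer contributes an edge, and nodes with more than one incoming or outgoing edge mark places where a repeat makes the reconstruction ambiguous at that value of k.

```python
from collections import defaultdict


def de_bruijn(sequences, k):
    """Map each (k-1)-mer prefix to the set of (k-1)-mer suffixes that follow it."""
    graph = defaultdict(set)
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def branching_nodes(graph):
    """Nodes with in-degree or out-degree above one, i.e. repeat boundaries."""
    indegree = defaultdict(int)
    for successors in graph.values():
        for node in successors:
            indegree[node] += 1
    nodes = set(graph) | set(indegree)
    return sorted(n for n in nodes
                  if len(graph.get(n, ())) > 1 or indegree[n] > 1)


# Toy sequence in which 'ACG' occurs twice in different contexts.
toy = "TACGAACGT"
print(branching_nodes(de_bruijn([toy], 3)))  # the repeat creates branches at k = 3
print(branching_nodes(de_bruijn([toy], 5)))  # a longer k spans the repeat
```

    In the toy sequence the branches disappear once k exceeds the repeat length, which mirrors the dependence on read length discussed in the abstract.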

  20. Deep Ion Torrent sequencing identifies soil fungal community shifts after frequent prescribed fires in a southeastern US forest ecosystem.

    PubMed

    Brown, Shawn P; Callaham, Mac A; Oliver, Alena K; Jumpponen, Ari

    2013-12-01

    Prescribed burning is a common management tool to control fuel loads and ground vegetation and to facilitate desirable game species. We evaluated soil fungal community responses to long-term prescribed fire treatments in a loblolly pine forest on the Piedmont of Georgia and utilized deep Internal Transcribed Spacer Region 1 (ITS1) amplicon sequencing afforded by the recent Ion Torrent Personal Genome Machine (PGM). These deep sequence data (19,000+ reads per sample after subsampling) indicate that frequent fires (3-year fire interval) shift soil fungus communities, whereas infrequent fires (6-year fire interval) permit system resetting to a state similar to that without prescribed fire. Furthermore, in nonmetric multidimensional scaling analyses, primarily ectomycorrhizal taxa were correlated with axes associated with long fire intervals, whereas soil saprobes tended to be correlated with the frequent fire recurrence. We conclude that (1) multiplexed Ion Torrent PGM analyses allow deep, cost-effective sequencing of fungal communities but may suffer from short read lengths and inconsistent sequence quality adjacent to the sequencing adaptor; (2) frequent prescribed fires elicit a shift in soil fungal communities; and (3) such shifts do not occur when fire intervals are longer. Our results emphasize the general responsiveness of these forests to management, and the importance of fire return intervals in meeting management objectives.

  1. Effects of Short Read Quality and Quantity on a de novo Vertebrate Transcriptome Assembly

    PubMed Central

    Garcia, T.I.; Shen, Y.; Catchen, J.; Amores, A.; Schartl, M.; Postlethwait, J.; Walter, R. B.

    2011-01-01

    For many researchers, next generation sequencing data holds the key to answering a category of questions previously unassailable. One of the important and challenging steps in achieving these goals is accurately assembling the massive quantity of short sequencing reads into full nucleic acid sequences. For research groups working with non-model or wild systems, short read assembly can pose a significant challenge due to the lack of pre-existing EST or genome reference libraries. While many publications describe the overall process of sequencing and assembly, few address the topic of how many and what types of reads are best for assembly. The goal of this project was to use real-world data to explore the effects of read quantity and short read quality scores on the resulting de novo assemblies. Using several samples of short reads of various sizes and qualities, we produced many assemblies in an automated manner. We observe how the properties of read length, read quality, and read quantity affect the resulting assemblies and provide some general recommendations based on our real-world data set. PMID:21651990

  2. DistMap: a toolkit for distributed short read mapping on a Hadoop cluster.

    PubMed

    Pandey, Ram Vinay; Schlötterer, Christian

    2013-01-01

    With the rapid and steady increase of next generation sequencing data output, the mapping of short reads has become a major data analysis bottleneck. On a single computer, it can take several days to map the vast quantity of reads produced from a single Illumina HiSeq lane. In an attempt to ameliorate this bottleneck we present a new tool, DistMap - a modular, scalable and integrated workflow to map reads in the Hadoop distributed computing framework. DistMap is easy to use, currently supports nine different short read mapping tools and can be run on all Unix-based operating systems. It accepts reads in FASTQ format as input and provides mapped reads in a SAM/BAM format. DistMap supports both paired-end and single-end reads thereby allowing the mapping of read data produced by different sequencing platforms. DistMap is available from http://code.google.com/p/distmap/
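
    The abstract does not say how DistMap divides its input, so the following is only a generic sketch of the idea behind distributed mapping: read FASTQ records and group them into fixed-size batches that separate workers could map independently. The function names and the batch size are hypothetical, and the parser assumes plain four-line (unwrapped) FASTQ records.

```python
from itertools import islice


def fastq_records(lines):
    """Yield (header, sequence, quality) tuples from 4-line FASTQ records."""
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if len(record) < 4:
            return  # end of input (an incomplete trailing record is dropped)
        header, seq, _plus, qual = (line.rstrip("\n") for line in record)
        yield header, seq, qual


def chunk(records, size):
    """Group records into lists of at most `size` reads, one list per task."""
    it = iter(records)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


# Three made-up reads, split into batches of two.
example = [
    "@r1\n", "ACGT\n", "+\n", "IIII\n",
    "@r2\n", "TTGA\n", "+\n", "IIII\n",
    "@r3\n", "GGCA\n", "+\n", "IIII\n",
]
batches = list(chunk(fastq_records(example), size=2))
print([len(b) for b in batches])
```

    Because each read is mapped independently of the others, batches like these can be processed in any order and the per-batch results merged afterwards.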

  3. Approaching marine bioprospecting in hexacorals by RNA deep sequencing.

    PubMed

    Johansen, Steinar D; Emblem, Ase; Karlsen, Bård Ove; Okkenhaug, Siri; Hansen, Hilde; Moum, Truls; Coucheron, Dag H; Seternes, Ole Morten

    2010-07-31

    RNA deep sequencing represents a new complementary approach in marine bioprospecting. Next-generation sequencing platforms have recently been developed for de novo whole transcriptome analysis, small RNA discovery and gene expression profiling. Deep sequencing transcriptomics (sequencing the complete set of cellular transcripts at a specific stage or condition) leads to sequential identification of all expressed genes in a sample. When combined with high-throughput bioinformatics and protein synthesis, RNA deep sequencing represents a powerful new approach in gene product discovery and bioprospecting. Here we summarize recent progress in the analyses of hexacoral transcriptomes with the focus on cold-water sea anemones and related organisms.

  4. Whole genome complete resequencing of Bacillus subtilis natto by combining long reads with high-quality short reads.

    PubMed

    Kamada, Mayumi; Hase, Sumitaka; Sato, Kengo; Toyoda, Atsushi; Fujiyama, Asao; Sakakibara, Yasubumi

    2014-01-01

    De novo microbial genome sequencing reached a turning point with third-generation sequencing (TGS) platforms, and several microbial genomes have been improved by TGS long reads. Bacillus subtilis natto is closely related to the laboratory standard strain B. subtilis Marburg 168, and it has a function in the production of the traditional Japanese fermented food "natto." The B. subtilis natto BEST195 genome was previously sequenced with short reads, but it included some incomplete regions. We resequenced the BEST195 genome using a PacBio RS sequencer, and we successfully obtained a complete genome sequence from one scaffold without any gaps, and we also applied Illumina MiSeq short reads to enhance quality. Compared with the previous BEST195 draft genome and Marburg 168 genome, we found that incomplete regions in the previous genome sequence were attributed to GC-bias and repetitive sequences, and we also identified some novel genes that are found only in the new genome.

  5. Short-read DNA sequencing yields microsatellite markers for Rheum

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Identifying culinary rhubarb (Rheum ×hybridum Murray) cultivars using morphological characteristics is problematic due to variability within individual genotypes, variation caused by environmental factors, plant and leaf age, similarity between genetically diverse genotypes, multiple cultivar names ...

  6. Deep sequencing: becoming a critical tool in clinical virology.

    PubMed

    Quiñones-Mateu, Miguel E; Avila, Santiago; Reyes-Teran, Gustavo; Martinez, Miguel A

    2014-09-01

    Population (Sanger) sequencing has been the standard method in basic and clinical DNA sequencing for almost 40 years; however, next-generation (deep) sequencing methodologies are now revolutionizing the field of genomics, and clinical virology is no exception. Deep sequencing is highly efficient, producing an enormous amount of information at low cost in a relatively short period of time. High-throughput sequencing techniques have enabled significant contributions to multiple areas in virology, including virus discovery and metagenomics (viromes), molecular epidemiology, pathogenesis, and studies of how viruses escape the host immune system and antiviral pressures. In addition, new and more affordable deep sequencing-based assays are now being implemented in clinical laboratories. Here, we review the use of the current deep sequencing platforms in virology, focusing on three of the most studied viruses: human immunodeficiency virus (HIV), hepatitis C virus (HCV), and influenza virus.

  7. Deep Sequencing: Becoming a Critical Tool in Clinical Virology

    PubMed Central

    QUIÑONES-MATEU, Miguel E.; AVILA, Santiago; REYES-TERAN, Gustavo; MARTINEZ, Miguel A.

    2014-01-01

    Population (Sanger) sequencing has been the standard method in basic and clinical DNA sequencing for almost 40 years; however, next-generation (deep) sequencing methodologies are now revolutionizing the field of genomics, and clinical virology is no exception. Deep sequencing is highly efficient, producing an enormous amount of information at low cost in a relatively short period of time. High-throughput sequencing techniques have enabled significant contributions to multiple areas in virology, including virus discovery and metagenomics (viromes), molecular epidemiology, pathogenesis, and studies of how viruses escape the host immune system and antiviral pressures. In addition, new and more affordable deep sequencing-based assays are now being implemented in clinical laboratories. Here we review the use of the current deep sequencing platforms in virology, focusing on three of the most studied viruses: human immunodeficiency virus (HIV), hepatitis C virus (HCV), and influenza virus. PMID:24998424

  8. Error tolerant indexing and alignment of short reads with covering template families.

    PubMed

    Giladi, Eldar; Healy, John; Myers, Gene; Hart, Chris; Kapranov, Philipp; Lipson, Doron; Roels, Steve; Thayer, Edward; Letovsky, Stan

    2010-10-01

    The rapid adoption of high-throughput next generation sequence data in biological research is presenting a major challenge for sequence alignment tools—specifically, the efficient alignment of vast amounts of short reads to large references in the presence of differences arising from sequencing errors and biological sequence variations. To address this challenge, we developed a short read aligner for high-throughput sequencer data that is tolerant of errors or mutations of all types—namely, substitutions, deletions, and insertions. The aligner utilizes a multi-stage approach in which template-based indexing is used to identify candidate regions for alignment with dynamic programming. A template is a pair of gapped seeds, with one used with the read and one used with the reference. In this article, we focus on the development of template families that yield error-tolerant indexing up to a given error-budget. A general algorithm for finding those families is presented, and a recursive construction that creates families with higher error tolerance from ones with a lower error tolerance is developed.
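
    To make the notion of a gapped seed concrete, here is a deliberately simplified sketch: a single mask of '1' (compare) and '0' (ignore) positions is applied to both the reference and the read, so a read with a substitution at an ignored position still finds its candidate location. The mask, the sequences, and the function names are assumptions for illustration; the article's templates are pairs of seeds organized into families, which this sketch does not attempt to reproduce.

```python
from collections import defaultdict


def apply_mask(seq, start, mask):
    """Keep bases where mask is '1'; skip bases where mask is '0'."""
    window = seq[start:start + len(mask)]
    return "".join(base for base, m in zip(window, mask) if m == "1")


def build_index(reference, mask):
    """Map every masked reference window to the positions where it occurs."""
    index = defaultdict(list)
    for pos in range(len(reference) - len(mask) + 1):
        index[apply_mask(reference, pos, mask)].append(pos)
    return index


def candidates(read, index, mask):
    """Reference positions whose masked key equals the masked read prefix."""
    return index.get(apply_mask(read, 0, mask), [])


reference = "ACGTTGCA"   # toy reference (hypothetical)
mask = "1101"            # third position is a don't-care
index = build_index(reference, mask)

# "GTAG" differs from the reference window "GTTG" only at the ignored position.
print(candidates("GTAG", index, mask))
```

    Candidate positions found this way would then be passed to a full dynamic-programming alignment, which is the second stage the abstract describes.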

  9. QSRA – a quality-value guided de novo short read assembler

    PubMed Central

    Bryant, Douglas W; Wong, Weng-Keen; Mockler, Todd C

    2009-01-01

    Background New rapid high-throughput sequencing technologies have sparked the creation of a new class of assembler. Since all high-throughput sequencing platforms incorporate errors in their output, short-read assemblers must be designed to account for this error while utilizing all available data. Results We have designed and implemented an assembler, Quality-value guided Short Read Assembler, created to take advantage of quality-value scores as a further method of dealing with error. Compared to previously published algorithms, our assembler shows significant improvements not only in speed but also in output quality. Conclusion QSRA generally produced the highest genomic coverage, while being faster than VCAKE. QSRA is extremely competitive in its longest contig and N50/N80 contig lengths, producing results of similar quality to those of EDENA and VELVET. QSRA provides a step closer to the goal of de novo assembly of complex genomes, improving upon the original VCAKE algorithm by not only drastically reducing runtimes but also increasing the viability of the assembly algorithm through further error handling capabilities. PMID:19239711
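
    The N50 and N80 figures mentioned in this abstract follow the usual definition: sort contigs from longest to shortest and report the length of the contig at which the running total first reaches 50% (or 80%) of the summed contig length. The sketch below implements that definition; the contig lengths are invented for the example and are not results from the paper.

```python
def n_statistic(contig_lengths, percent=50):
    """Length of the contig at which the cumulative sum, taken from the
    longest contig downwards, first reaches `percent` of the total length."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point rounding at the boundary.
        if running * 100 >= percent * total:
            return length
    return 0  # only reached for an empty list


contigs = [100, 80, 60, 40, 20]  # hypothetical contig lengths in bp
print(n_statistic(contigs, 50), n_statistic(contigs, 80))
```

    N80 is never larger than N50 for the same assembly, since more of the shorter contigs are needed to reach the higher threshold.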

  10. Fitness Inference from Short-Read Data: Within-Host Evolution of a Reassortant H5N1 Influenza Virus

    PubMed Central

    Illingworth, Christopher J.R.

    2015-01-01

    We present a method to infer the role of selection acting during the within-host evolution of the influenza virus from short-read genome sequence data. Linkage disequilibrium between loci is accounted for by treating short-read sequences as noisy multilocus emissions from an underlying model of haplotype evolution. A hierarchical model-selection procedure is used to infer the underlying fitness landscape of the virus insofar as that landscape is explored by the viral population. In a first application of our method, we analyze data from an evolutionary experiment describing the growth of a reassortant H5N1 virus in ferrets. Across two sets of replica experiments we infer multiple alleles to be under selection, including variants associated with receptor binding specificity, glycosylation, and with the increased transmissibility of the virus. We identify epistasis as an important component of the within-host fitness landscape, and show that adaptation can proceed through multiple genetic pathways. PMID:26243288

  11. Deep sequencing in the management of hepatitis virus infections.

    PubMed

    Quer, Josep; Rodríguez-Frias, Francisco; Gregori, Josep; Tabernero, David; Soria, Maria Eugenia; García-Cehic, Damir; Homs, Maria; Bosch, Albert; Pintó, Rosa María; Esteban, Juan Ignacio; Domingo, Esteban; Perales, Celia

    2016-12-28

    The hepatitis viruses represent a major public health problem worldwide. Procedures for characterization of the genomic composition of their populations, accurate diagnosis, identification of multiple infections, and information on inhibitor-escape mutants for treatment decisions are needed. Deep sequencing methodologies are extremely useful for these viruses since they replicate as complex and dynamic quasispecies swarms whose complexity and mutant composition are biologically relevant traits. Population complexity is a major challenge for disease prevention and control, but also an opportunity to distinguish among related but phenotypically distinct variants that might anticipate disease progression and treatment outcome. Detailed characterization of mutant spectra should permit choosing better treatment options, given the increasing number of new antiviral inhibitors available. In the present review we briefly summarize our experience on the use of deep sequencing for the management of hepatitis virus infections, particularly for hepatitis B and C viruses, and outline some possible new applications of deep sequencing for these important human pathogens.

  12. Deep sequencing increases hepatitis C virus phylogenetic cluster detection compared to Sanger sequencing.

    PubMed

    Montoya, Vincent; Olmstead, Andrea; Tang, Patrick; Cook, Darrel; Janjua, Naveed; Grebely, Jason; Jacka, Brendan; Poon, Art F Y; Krajden, Mel

    2016-09-01

    Effective surveillance and treatment strategies are required to control the hepatitis C virus (HCV) epidemic. Phylogenetic analyses are powerful tools for reconstructing the evolutionary history of viral outbreaks and identifying transmission clusters. These studies often rely on Sanger sequencing which typically generates a single consensus sequence for each infected individual. For rapidly mutating viruses such as HCV, consensus sequencing underestimates the complexity of the viral quasispecies population and could therefore generate different phylogenetic tree topologies. Although deep sequencing provides a more detailed quasispecies characterization, in-depth phylogenetic analyses are challenging due to dataset complexity and computational limitations. Here, we apply deep sequencing to a characterized population to assess its ability to identify phylogenetic clusters compared with consensus Sanger sequencing. For deep sequencing, a sample specific threshold determined by the 50th percentile of the patristic distance distribution for all variants within each individual was used to identify clusters. Among seven patristic distance thresholds tested for the Sanger sequence phylogeny ranging from 0.005-0.06, a threshold of 0.03 was found to provide the maximum balance between positive agreement (samples in a cluster) and negative agreement (samples not in a cluster) relative to the deep sequencing dataset. From 77 HCV seroconverters, 10 individuals were identified in phylogenetic clusters using both methods. Deep sequencing analysis identified an additional 4 individuals and excluded 8 other individuals relative to Sanger sequencing. The application of this deep sequencing approach could be a more effective tool to understand onward HCV transmission dynamics compared with Sanger sequencing, since the incorporation of minority sequence variants improves the discrimination of phylogenetically linked clusters.

  13. Preparing DNA libraries for multiplexed paired-end deep sequencing for Illumina GA sequencers.

    PubMed

    Son, Mike S; Taylor, Ronald K

    2011-02-01

    Whole-genome sequencing, also known as deep sequencing, is becoming a more affordable and efficient way to identify SNP mutations, deletions, and insertions in DNA sequences across several different strains. Two major obstacles preventing the widespread use of deep sequencers are the costs involved in services used to prepare DNA libraries for sequencing and the overall accuracy of the sequencing data. This unit describes the preparation of DNA libraries for multiplexed paired-end sequencing using the Illumina GA series sequencer. Self-preparation of DNA libraries can help reduce overall expenses, especially if optimization is required for the different samples, and use of the Illumina GA Sequencer can improve the quality of the data.

  14. Deep sequencing and human antibody repertoire analysis

    PubMed Central

    Boyd, Scott D; Crowe, James E

    2016-01-01

    In the past decade, high-throughput DNA sequencing (HTS) methods and improved approaches for isolating antigen-specific B cells and their antibody genes have been applied in many areas of human immunology. This work has greatly increased our understanding of human antibody repertoires and the specific clones responsible for protective immunity or immune-mediated pathogenesis. Although the principles underlying selection of individual B cell clones in the intact immune system are still under investigation, the combination of more powerful genetic tracking of antibody lineage development and functional testing of the encoded proteins promises to transform therapeutic antibody discovery and optimization. Here, we highlight recent advances in this fast-moving field. PMID:27065089

  15. RNA-CODE: a noncoding RNA classification tool for short reads in NGS data lacking reference genomes.

    PubMed

    Yuan, Cheng; Sun, Yanni

    2013-01-01

    The number of transcriptomic sequencing projects of various non-model organisms is still accumulating rapidly. As non-coding RNAs (ncRNAs) are highly abundant in living organisms and play important roles in many biological processes, identifying fragmentary members of ncRNAs in small RNA-seq data is an important step in post-NGS analysis. However, the state-of-the-art ncRNA search tools are not optimized for next-generation sequencing (NGS) data, especially for very short reads. In this work, we propose and implement a comprehensive ncRNA classification tool (RNA-CODE) for very short reads. RNA-CODE is specifically designed for ncRNA identification in NGS data that lack quality reference genomes. Given a set of short reads, our tool classifies the reads into different types of ncRNA families. The classification results can be used to quantify the expression levels of different types of ncRNAs in RNA-seq data and ncRNA composition profiles in metagenomic data, respectively. The experimental results of applying RNA-CODE to RNA-seq of Arabidopsis and a metagenomic data set sampled from human guts demonstrate that RNA-CODE competes favorably in both sensitivity and specificity with other tools. The source code of RNA-CODE can be downloaded at http://www.cse.msu.edu/~chengy/RNA_CODE.

  16. GAViT: Genome Assembly Visualization Tool for Short Read Data

    SciTech Connect

    Syed, Aijazuddin; Shapiro, Harris; Tu, Hank; Pangilinan, Jasmyn; Trong, Stephan

    2008-03-14

    It is a challenging job for genome analysts to accurately debug, troubleshoot, and validate genome assembly results. Genome analysts rely on visualization tools to help validate and troubleshoot assembly results, including such problems as mis-assemblies, low-quality regions, and repeats. Short read data adds further complexity and makes it extremely challenging for the visualization tools to scale and to view all needed assembly information. As a result, there is a need for a visualization tool that can scale to display assembly data from the new sequencing technologies. We present Genome Assembly Visualization Tool (GAViT), a highly scalable and interactive assembly visualization tool developed at the DOE Joint Genome Institute (JGI).

  17. Deep sequencing analysis of phage libraries using Illumina platform.

    PubMed

    Matochko, Wadim L; Chu, Kiki; Jin, Bingjie; Lee, Sam W; Whitesides, George M; Derda, Ratmir

    2012-09-01

    This paper presents an analysis of phage-displayed libraries of peptides using Illumina. We describe steps for the preparation of short DNA fragments for deep sequencing and MATLAB software for the analysis of the results. Screening of peptide libraries displayed on the surface of bacteriophage (phage display) can be used to discover peptides that bind to any target. The key step in this discovery is the analysis of peptide sequences present in the library. This analysis is usually performed by Sanger sequencing, which is labor intensive and limited to examination of a few hundred phage clones. On the other hand, Illumina deep-sequencing technology can characterize over 10^7 reads in a single run. We applied Illumina sequencing to analyze phage libraries. Using PCR, we isolated the variable regions from M13KE phage vectors from a phage display library. The PCR primers contained (i) sequences flanking the variable region, (ii) barcodes, and (iii) variable 5'-terminal region. We used this approach to examine how diversity of peptides in phage display libraries changes as a result of amplification of libraries in bacteria. Using HiSeq single-end Illumina sequencing of these fragments, we acquired over 2×10^7 reads, 57 base pairs (bp) in length. Each read contained information about the barcode (6 bp), one complementary region (12 bp) and a variable region (36 bp). We applied this sequencing to a model library of 10^6 unique clones and observed that amplification enriches ∼150 clones, which dominate ∼20% of the library. Deep sequencing, for the first time, characterized the collapse of diversity in phage libraries. The results suggest that screens based on repeated amplification and small-scale sequencing identify a few binding clones and miss thousands of useful clones. The deep sequencing approach described here could identify under-represented clones in phage screens. It could also be instrumental in developing new screening strategies, which can preserve

  18. deepTools: a flexible platform for exploring deep-sequencing data.

    PubMed

    Ramírez, Fidel; Dündar, Friederike; Diehl, Sarah; Grüning, Björn A; Manke, Thomas

    2014-07-01

    We present a Galaxy based web server for processing and visualizing deeply sequenced data. The web server's core functionality consists of a suite of newly developed tools, called deepTools, that enable users with little bioinformatic background to explore the results of their sequencing experiments in a standardized setting. Users can upload pre-processed files with continuous data in standard formats and generate heatmaps and summary plots in a straight-forward, yet highly customizable manner. In addition, we offer several tools for the analysis of files containing aligned reads and enable efficient and reproducible generation of normalized coverage files. As a modular and open-source platform, deepTools can easily be expanded and customized to future demands and developments. The deepTools webserver is freely available at http://deeptools.ie-freiburg.mpg.de and is accompanied by extensive documentation and tutorials aimed at conveying the principles of deep-sequencing data analysis. The web server can be used without registration. deepTools can be installed locally either stand-alone or as part of Galaxy.

  19. DSAP: deep-sequencing small RNA analysis pipeline.

    PubMed

    Huang, Po-Jung; Liu, Yi-Chung; Lee, Chi-Ching; Lin, Wei-Chen; Gan, Richie Ruei-Chi; Lyu, Ping-Chiang; Tang, Petrus

    2010-07-01

    DSAP is an automated multiple-task web service designed to provide a total solution to analyzing deep-sequencing small RNA datasets generated by next-generation sequencing technology. DSAP uses a tab-delimited file as an input format, which holds the unique sequence reads (tags) and their corresponding number of copies generated by the Solexa sequencing platform. The input data will go through four analysis steps in DSAP: (i) cleanup: removal of adaptors and poly-A/T/C/G/N nucleotides; (ii) clustering: grouping of cleaned sequence tags into unique sequence clusters; (iii) non-coding RNA (ncRNA) matching: sequence homology mapping against a transcribed sequence library from the ncRNA database Rfam (http://rfam.sanger.ac.uk/); and (iv) known miRNA matching: detection of known miRNAs in miRBase (http://www.mirbase.org/) based on sequence homology. The expression levels corresponding to matched ncRNAs and miRNAs are summarized in multi-color clickable bar charts linked to external databases. DSAP is also capable of displaying miRNA expression levels from different jobs using a log2-scaled color matrix. Furthermore, a cross-species comparative function is also provided to show the distribution of identified miRNAs in different species as deposited in miRBase. DSAP is available at http://dsap.cgu.edu.tw.
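
    The first two steps listed above (cleanup and clustering) can be sketched in a few lines. This is a simplified stand-in, not DSAP's implementation: the adaptor sequence is a hypothetical placeholder, the poly-nucleotide filter only drops tags made of a single repeated base, and the input is assumed to be two tab-separated columns (tag, copy number) as the abstract describes.

```python
import csv
import io
from collections import Counter

ADAPTOR = "TCGTATGC"  # hypothetical 3' adaptor, for illustration only


def clean(tag, adaptor=ADAPTOR):
    """Cut the tag at the first occurrence of the adaptor, if present."""
    cut = tag.find(adaptor)
    return tag[:cut] if cut != -1 else tag


def is_homopolymer(tag):
    """True for empty tags or tags made of one repeated nucleotide."""
    return len(set(tag)) <= 1


def cluster(tsv_text):
    """Sum copy numbers of tags that become identical after cleanup."""
    counts = Counter()
    for tag, copies in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        cleaned = clean(tag)
        if not is_homopolymer(cleaned):
            counts[cleaned] += int(copies)
    return counts


# Made-up input: two tags collapse to the same sequence, one is poly-A.
tsv = "TGAGGTAGTCGTATGC\t10\nTGAGGTAG\t5\nAAAAAAAA\t3\n"
clusters = cluster(tsv)
print(dict(clusters))
```

    The resulting cluster counts are what the later matching steps would compare against reference libraries to report expression levels.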

  20. Deep Whole-Genome Sequencing of 100 Southeast Asian Malays

    PubMed Central

    Wong, Lai-Ping; Ong, Rick Twee-Hee; Poh, Wan-Ting; Liu, Xuanyao; Chen, Peng; Li, Ruoying; Lam, Kevin Koi-Yau; Pillai, Nisha Esakimuthu; Sim, Kar-Seng; Xu, Haiyan; Sim, Ngak-Leng; Teo, Shu-Mei; Foo, Jia-Nee; Tan, Linda Wei-Lin; Lim, Yenly; Koo, Seok-Hwee; Gan, Linda Seo-Hwee; Cheng, Ching-Yu; Wee, Sharon; Yap, Eric Peng-Huat; Ng, Pauline Crystal; Lim, Wei-Yen; Soong, Richie; Wenk, Markus Rene; Aung, Tin; Wong, Tien-Yin; Khor, Chiea-Chuen; Little, Peter; Chia, Kee-Seng; Teo, Yik-Ying

    2013-01-01

    Whole-genome sequencing across multiple samples in a population provides an unprecedented opportunity for comprehensively characterizing the polymorphic variants in the population. Although the 1000 Genomes Project (1KGP) has offered brief insights into the value of population-level sequencing, the low coverage has compromised the ability to confidently detect rare and low-frequency variants. In addition, the composition of populations in the 1KGP is not complete, despite the fact that the study design has been extended to more than 2,500 samples from more than 20 population groups. The Malays are one of the Austronesian groups predominantly present in Southeast Asia and Oceania, and the Singapore Sequencing Malay Project (SSMP) aims to perform deep whole-genome sequencing of 100 healthy Malays. By sequencing at a minimum of 30× coverage, we have illustrated the higher sensitivity at detecting low-frequency and rare variants and the ability to investigate the presence of hotspots of functional mutations. Compared to the low-pass sequencing in the 1KGP, the deeper coverage allows more functional variants to be identified for each person. A comparison of the fidelity of genotype imputation of Malays indicated that a population-specific reference panel, such as the SSMP, outperforms a cosmopolitan panel with a larger number of individuals for common SNPs. For lower-frequency (<5%) markers, a larger number of individuals might have to be whole-genome sequenced so that the accuracy currently afforded by the 1KGP can be achieved. The SSMP data are expected to be the benchmark for evaluating the value of deep population-level sequencing versus low-pass sequencing, especially in populations that are poorly represented in population-genetics studies. PMID:23290073
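
    For readers unfamiliar with the "30×" notation, mean coverage is commonly estimated as (number of reads × read length) / genome size. The short sketch below applies that formula; the 100 bp read length and the round 3 Gb genome size are assumptions chosen for easy arithmetic and are not taken from the study.

```python
import math


def mean_coverage(num_reads, read_length, genome_size):
    """Average number of times each base is sequenced."""
    return num_reads * read_length / genome_size


def reads_needed(target_coverage, read_length, genome_size):
    """Minimum read count to reach the target mean coverage."""
    return math.ceil(target_coverage * genome_size / read_length)


GENOME_SIZE = 3_000_000_000  # assumed round figure for a human genome, in bp
READ_LENGTH = 100            # assumed read length in bp

n = reads_needed(30, READ_LENGTH, GENOME_SIZE)
print(n, mean_coverage(n, READ_LENGTH, GENOME_SIZE))
```

    This is a mean over the whole genome; actual per-base depth varies, which is one reason studies quote a minimum coverage rather than an average.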

  1. Deep whole-genome sequencing of 100 southeast Asian Malays.

    PubMed

    Wong, Lai-Ping; Ong, Rick Twee-Hee; Poh, Wan-Ting; Liu, Xuanyao; Chen, Peng; Li, Ruoying; Lam, Kevin Koi-Yau; Pillai, Nisha Esakimuthu; Sim, Kar-Seng; Xu, Haiyan; Sim, Ngak-Leng; Teo, Shu-Mei; Foo, Jia-Nee; Tan, Linda Wei-Lin; Lim, Yenly; Koo, Seok-Hwee; Gan, Linda Seo-Hwee; Cheng, Ching-Yu; Wee, Sharon; Yap, Eric Peng-Huat; Ng, Pauline Crystal; Lim, Wei-Yen; Soong, Richie; Wenk, Markus Rene; Aung, Tin; Wong, Tien-Yin; Khor, Chiea-Chuen; Little, Peter; Chia, Kee-Seng; Teo, Yik-Ying

    2013-01-10

    Whole-genome sequencing across multiple samples in a population provides an unprecedented opportunity for comprehensively characterizing the polymorphic variants in the population. Although the 1000 Genomes Project (1KGP) has offered brief insights into the value of population-level sequencing, the low coverage has compromised the ability to confidently detect rare and low-frequency variants. In addition, the composition of populations in the 1KGP is not complete, despite the fact that the study design has been extended to more than 2,500 samples from more than 20 population groups. The Malays are one of the Austronesian groups predominantly present in Southeast Asia and Oceania, and the Singapore Sequencing Malay Project (SSMP) aims to perform deep whole-genome sequencing of 100 healthy Malays. By sequencing at a minimum of 30× coverage, we have illustrated the higher sensitivity at detecting low-frequency and rare variants and the ability to investigate the presence of hotspots of functional mutations. Compared to the low-pass sequencing in the 1KGP, the deeper coverage allows more functional variants to be identified for each person. A comparison of the fidelity of genotype imputation of Malays indicated that a population-specific reference panel, such as the SSMP, outperforms a cosmopolitan panel with a larger number of individuals for common SNPs. For lower-frequency (<5%) markers, a larger number of individuals might have to be whole-genome sequenced so that the accuracy currently afforded by the 1KGP can be achieved. The SSMP data are expected to be the benchmark for evaluating the value of deep population-level sequencing versus low-pass sequencing, especially in populations that are poorly represented in population-genetics studies.

  2. Unbiased Deep Sequencing of RNA Viruses from Clinical Samples

    PubMed Central

    Matranga, Christian B.; Gladden-Young, Adrianne; Qu, James; Winnicki, Sarah; Nosamiefan, Dolo; Levin, Joshua Z.; Sabeti, Pardis C.

    2016-01-01

    Here we outline a next-generation RNA sequencing protocol that enables de novo assemblies and intra-host variant calls of viral genomes collected from clinical and biological sources. The method is unbiased and universal; it uses random primers for cDNA synthesis and requires no prior knowledge of the viral sequence content. Before library construction, selective RNase H-based digestion is used to deplete unwanted RNA, including poly(rA) carrier and ribosomal RNA, from the viral RNA sample. Selective depletion improves both the data quality and the number of unique reads in viral RNA sequencing libraries. Moreover, a transposase-based 'tagmentation' step is used in the protocol as it reduces overall library construction time. The protocol has enabled rapid deep sequencing of over 600 Lassa and Ebola virus samples, including collections from both blood and tissue isolates, and is broadly applicable to other microbial genomics studies. PMID:27403729

  3. Deep Sequencing Analysis of Apple Infecting Viruses in Korea

    PubMed Central

    Cho, In-Sook; Igori, Davaajargal; Lim, Seungmo; Choi, Gug-Seoun; Hammond, John; Lim, Hyoun-Sub; Moon, Jae Sun

    2016-01-01

    Deep sequencing has generated 52 contigs derived from five viruses; Apple chlorotic leaf spot virus (ACLSV), Apple stem grooving virus (ASGV), Apple stem pitting virus (ASPV), Apple green crinkle associated virus (AGCaV), and Apricot latent virus (ApLV) were identified from eight apple samples showing small leaves and/or growth retardation. Nucleotide (nt) sequence identity of the assembled contigs was from 68% to 99% compared to the reference sequences of the five respective viral genomes. Sequences of ASPV and ASGV were the most abundantly represented by the 52 contigs assembled. The presence of the five viruses in the samples was confirmed by RT-PCR using specific primers based on the sequences of each assembled contig. All five viruses were detected in three of the samples, whereas all samples had mixed infections with at least two viruses. The most frequently detected virus was ASPV, followed by ASGV, ApLV, ACLSV, and AGCaV, all of which were found in mixed infections in the tested samples. AGCaV was identified in assembled contigs ID 1012480 and 93549, which showed 82% and 78% nt sequence identity with ORF1 of AGCaV isolate Aurora-1. ApLV was identified in three assembled contigs, ID 65587, 1802365, and 116777, which showed 77%, 78%, and 76% nt sequence identity respectively with ORF1 of ApLV isolate LA2. Deep sequencing assay was shown to be a valuable and powerful tool for detection and identification of known and unknown virome in infected apple trees, here identifying ApLV and AGCaV in commercial orchards in Korea for the first time. PMID:27721694

  4. Deep Sequencing Analysis of Apple Infecting Viruses in Korea.

    PubMed

    Cho, In-Sook; Igori, Davaajargal; Lim, Seungmo; Choi, Gug-Seoun; Hammond, John; Lim, Hyoun-Sub; Moon, Jae Sun

    2016-10-01

    Deep sequencing generated 52 contigs derived from five viruses: Apple chlorotic leaf spot virus (ACLSV), Apple stem grooving virus (ASGV), Apple stem pitting virus (ASPV), Apple green crinkle associated virus (AGCaV), and Apricot latent virus (ApLV), which were identified from eight apple samples showing small leaves and/or growth retardation. Nucleotide (nt) sequence identity of the assembled contigs ranged from 68% to 99% compared to the reference sequences of the five respective viral genomes. Sequences of ASPV and ASGV were the most abundantly represented among the 52 assembled contigs. The presence of the five viruses in the samples was confirmed by RT-PCR using specific primers based on the sequences of each assembled contig. All five viruses were detected in three of the samples, whereas all samples had mixed infections with at least two viruses. The most frequently detected virus was ASPV, followed by ASGV, ApLV, ACLSV, and AGCaV, all of which were found in mixed infections in the tested samples. AGCaV was identified in assembled contigs ID 1012480 and 93549, which showed 82% and 78% nt sequence identity with ORF1 of AGCaV isolate Aurora-1. ApLV was identified in three assembled contigs, ID 65587, 1802365, and 116777, which showed 77%, 78%, and 76% nt sequence identity respectively with ORF1 of ApLV isolate LA2. The deep sequencing assay was shown to be a valuable and powerful tool for detection and identification of the known and unknown virome in infected apple trees, here identifying ApLV and AGCaV in commercial orchards in Korea for the first time.

  5. SHARAKU: an algorithm for aligning and clustering read mapping profiles of deep sequencing in non-coding RNA processing

    PubMed Central

    Tsuchiya, Mariko; Amano, Kojiro; Abe, Masaya; Seki, Misato; Hase, Sumitaka; Sato, Kengo; Sakakibara, Yasubumi

    2016-01-01

    Motivation: Deep sequencing of the transcripts of regulatory non-coding RNA generates footprints of post-transcriptional processes. After obtaining sequence reads, the short reads are mapped to a reference genome, and specific mapping patterns can be detected called read mapping profiles, which are distinct from random non-functional degradation patterns. These patterns reflect the maturation processes that lead to the production of shorter RNA sequences. Recent next-generation sequencing studies have revealed not only the typical maturation process of miRNAs but also the various processing mechanisms of small RNAs derived from tRNAs and snoRNAs. Results: We developed an algorithm termed SHARAKU to align two read mapping profiles of next-generation sequencing outputs for non-coding RNAs. In contrast with previous work, SHARAKU incorporates the primary and secondary sequence structures into an alignment of read mapping profiles to allow for the detection of common processing patterns. Using a benchmark simulated dataset, SHARAKU exhibited superior performance to previous methods for correctly clustering the read mapping profiles with respect to 5′-end processing and 3′-end processing from degradation patterns and in detecting similar processing patterns in deriving the shorter RNAs. Further, using experimental data of small RNA sequencing for the common marmoset brain, SHARAKU succeeded in identifying the significant clusters of read mapping profiles for similar processing patterns of small derived RNA families expressed in the brain. Availability and Implementation: The source code of our program SHARAKU is available at http://www.dna.bio.keio.ac.jp/sharaku/, and the simulated dataset used in this work is available at the same link. Accession code: The sequence data from the whole RNA transcripts in the hippocampus of the left brain used in this work is available from the DNA DataBank of Japan (DDBJ) Sequence Read Archive (DRA) under the accession number DRA

  6. Deep sequencing of 10,000 human genomes

    PubMed Central

    Pierce, Levi C. T.; Biggs, William H.; di Iulio, Julia; Wong, Emily H. M.; Fabani, Martin M.; Kirkness, Ewen F.; Moustafa, Ahmed; Shah, Naisha; Xie, Chao; Brewerton, Suzanne C.; Bulsara, Nadeem; Garner, Chad; Metzker, Gary; Sandoval, Efren; Perkins, Brad A.; Och, Franz J.; Turpaz, Yaron; Venter, J. Craig

    2016-01-01

    We report on the sequencing of 10,545 human genomes at 30×–40× coverage with an emphasis on quality metrics and novel variant and sequence discovery. We find that 84% of an individual human genome can be sequenced confidently. This high-confidence region includes 91.5% of exon sequence and 95.2% of known pathogenic variant positions. We present the distribution of over 150 million single-nucleotide variants in the coding and noncoding genome. Each newly sequenced genome contributes an average of 8,579 novel variants. In addition, each genome carries on average 0.7 Mb of sequence that is not found in the main build of the hg38 reference genome. The density of this catalog of variation allowed us to construct high-resolution profiles that define genomic sites that are highly intolerant of genetic variation. These results indicate that the data generated by deep genome sequencing is of the quality necessary for clinical use. PMID:27702888

  7. Deep sequencing approach for investigating infectious agents causing fever.

    PubMed

    Susilawati, T N; Jex, A R; Cantacessi, C; Pearson, M; Navarro, S; Susianto, A; Loukas, A C; McBride, W J H

    2016-07-01

    Acute undifferentiated fever (AUF) poses a diagnostic challenge due to the variety of possible aetiologies. While the majority of AUFs resolve spontaneously, some cases become prolonged and cause significant morbidity and mortality, necessitating improved diagnostic methods. This study evaluated the utility of deep sequencing in fever investigation. DNA and RNA were isolated from plasma/sera of AUF cases being investigated at Cairns Hospital in northern Australia, including eight control samples from patients with a confirmed diagnosis. Following isolation, DNA and RNA were bulk amplified and RNA was reverse transcribed to cDNA. The resulting DNA and cDNA amplicons were subjected to deep sequencing on an Illumina HiSeq 2000 platform. Bioinformatics analysis was performed using the program Kraken and the CLC assembly-alignment pipeline. The results were compared with the outcomes of clinical tests. We generated between 4 and 20 million reads per sample. The results of Kraken and CLC analyses concurred with diagnoses obtained by other means in 87.5 % (7/8) and 25 % (2/8) of control samples, respectively. Some plausible causes of fever were identified in ten patients who remained undiagnosed following routine hospital investigations, including Escherichia coli bacteraemia and scrub typhus that eluded conventional tests. Achromobacter xylosoxidans, Alteromonas macleodii and Enterobacteria phage were prevalent in all samples. A deep sequencing approach of patient plasma/serum samples led to the identification of aetiological agents putatively implicated in AUFs and enabled the study of microbial diversity in human blood. The application of this approach in hospital practice is currently limited by sequencing input requirements and complicated data analysis.

  8. Signatures of Crested Ibis MHC Revealed by Recombination Screening and Short-Reads Assembly Strategy

    PubMed Central

    Liu, Yuanhong; Xiong, Zijun; Fu, Dongke; Li, Bo; Wei, Shuguang; Xu, Xun; Li, Shengbin; Yuan, Hui

    2016-01-01

    Whole-genome shotgun (WGS) sequencing has become a routine method in genome research over the past decade. However, the assembly of highly polymorphic regions in WGS projects remains a challenge, especially for large genomes. The traditional strategy, which employs BAC library construction, PCR screening and Sanger sequencing, is laborious and expensive, which hampers research on polymorphic genomic regions. As one of the most highly polymorphic regions, the major histocompatibility complex (MHC) plays a central role in the adaptive immunity of all jawed vertebrates. In this study, we introduced an efficient procedure based on recombination screening and short-reads assembly. With this procedure, we constructed a high quality 488-kb region of crested ibis MHC that consists of 3 superscaffolds and contains 50 genes. Our sequence showed comparable quality (97.29% identity) to traditional Sanger assembly, while the workload was reduced almost sevenfold. Comparative study revealed distinctive features of crested ibis by exhibiting the COL11A2-BLA-BLB-BRD2 cluster and presenting both ADPRH and odorant receptor (OR) gene in the MHC region. Furthermore, the conservation of the BF-TAP1-TAP2 structure in crested ibis and other vertebrate lineages is interesting in light of the hypothesis that coevolution of functionally related genes in the primordial MHC is responsible for the appearance of the antigen presentation pathways at the birth of the adaptive immune system. PMID:27997612

  9. deepTools2: a next generation web server for deep-sequencing data analysis.

    PubMed

    Ramírez, Fidel; Ryan, Devon P; Grüning, Björn; Bhardwaj, Vivek; Kilpert, Fabian; Richter, Andreas S; Heyne, Steffen; Dündar, Friederike; Manke, Thomas

    2016-07-08

    We present an update to our Galaxy-based web server for processing and visualizing deeply sequenced data. Its core tool set, deepTools, allows users to perform complete bioinformatic workflows ranging from quality controls and normalizations of aligned reads to integrative analyses, including clustering and visualization approaches. Since we first described our deepTools Galaxy server in 2014, we have implemented new solutions for many requests from the community and our users. Here, we introduce significant enhancements and new tools to further improve data visualization and interpretation. deepTools continues to be open to all users and freely available as a web service at deeptools.ie-freiburg.mpg.de. The new deepTools2 suite can be easily deployed within any Galaxy framework via the toolshed repository, and we also provide source code for command line usage under Linux and Mac OS X. A public and documented API for access to deepTools functionality is also available.

  10. Localized suffix array and its application to genome mapping problems for paired-end short reads.

    PubMed

    Kimura, Kouichi; Koike, Asako

    2009-10-01

    We introduce a new data structure, a localized suffix array, based on which occurrence information is dynamically represented as the combination of global positional information and local lexicographic order information in text search applications. For the search of a pair of words within a given distance, many candidate positions that share a coarse-grained global position can be compactly represented in terms of local lexicographic orders as in the conventional suffix array, and they can be simultaneously examined for violation of the distance constraint at the coarse-grained resolution. The trade-off between the positional and lexicographical information is progressively shifted towards finer positional resolution, and the distance constraint is reexamined accordingly. Thus the paired search can be efficiently performed even if there are a large number of occurrences for each word. The localized suffix array itself is in fact a reordering of bits inside the conventional suffix array, and their memory requirements are essentially the same. We demonstrate an application to genome mapping problems for paired-end short reads generated by new-generation DNA sequencers. When paired reads are highly repetitive, it is time-consuming to naïvely calculate, sort, and compare all of the coordinates. For human genome re-sequencing data with 36-base-pair reads, speedups of more than 10 times over the naïve method were observed in almost half of the cases where the sums of redundancies (number of individual occurrences) of paired reads were greater than 2,000.
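
    The baseline that the abstract calls the naïve method (compute the coordinates of each word, sort them, and compare them against the distance constraint) might look roughly like the Python sketch below. It is not the localized suffix array itself; the function names `find_occurrences` and `paired_hits` and the toy sequence are illustrative assumptions, not taken from the paper.

```python
from bisect import bisect_left, bisect_right


def find_occurrences(text, word):
    """Return sorted start positions of every (possibly overlapping) occurrence."""
    positions = []
    start = text.find(word)
    while start != -1:
        positions.append(start)
        start = text.find(word, start + 1)
    return positions


def paired_hits(text, left, right, max_distance):
    """Return pairs (i, j) with `left` at i, `right` at j and 0 <= j - i <= max_distance.

    Hypothetical baseline: every coordinate of both words is computed first,
    which is the cost that grows when the paired reads are highly repetitive.
    """
    left_pos = find_occurrences(text, left)
    right_pos = find_occurrences(text, right)  # already in ascending order
    pairs = []
    for i in left_pos:
        # Binary search for the window of right-word positions allowed by i.
        lo = bisect_left(right_pos, i)
        hi = bisect_right(right_pos, i + max_distance)
        pairs.extend((i, j) for j in right_pos[lo:hi])
    return pairs


genome = "ACGTTTACGAACGT"  # toy reference, for illustration only
hits = paired_hits(genome, "ACG", "CGT", 6)
```

    With many occurrences per word, the two coordinate lists dominate the running time, which is the situation the localized suffix array is described as avoiding.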

  11. ECHO: a reference-free short-read error correction algorithm.

    PubMed

    Kao, Wei-Chun; Chan, Andrew H; Song, Yun S

    2011-07-01

    Developing accurate, scalable algorithms to improve data quality is an important computational challenge associated with recent advances in high-throughput sequencing technology. In this study, a novel error-correction algorithm, called ECHO, is introduced for correcting base-call errors in short-reads, without the need of a reference genome. Unlike most previous methods, ECHO does not require the user to specify parameters of which optimal values are typically unknown a priori. ECHO automatically sets the parameters in the assumed model and estimates error characteristics specific to each sequencing run, while maintaining a running time that is within the range of practical use. ECHO is based on a probabilistic model and is able to assign a quality score to each corrected base. Furthermore, it explicitly models heterozygosity in diploid genomes and provides a reference-free method for detecting bases that originated from heterozygous sites. On both real and simulated data, ECHO is able to improve the accuracy of previous error-correction methods by several folds to an order of magnitude, depending on the sequence coverage depth and the position in the read. The improvement is most pronounced toward the end of the read, where previous methods become noticeably less effective. Using a whole-genome yeast data set, it is demonstrated here that ECHO is capable of coping with nonuniform coverage. Also, it is shown that using ECHO to perform error correction as a preprocessing step considerably facilitates de novo assembly, particularly in the case of low-to-moderate sequence coverage depth.

  12. Parallel and Scalable Short-Read Alignment on Multi-Core Clusters Using UPC++

    PubMed Central

    González-Domínguez, Jorge; Liu, Yongchao; Schmidt, Bertil

    2016-01-01

    The growth of next-generation sequencing (NGS) datasets poses a challenge to the alignment of reads to reference genomes in terms of alignment quality and execution speed. Some available aligners have been shown to obtain high quality mappings at the expense of long execution times. Finding fast yet accurate software solutions is of high importance to research, since availability and size of NGS datasets continue to increase. In this work we present an efficient parallelization approach for NGS short-read alignment on multi-core clusters. Our approach takes advantage of a distributed shared memory programming model based on the new UPC++ language. Experimental results using the CUSHAW3 aligner show that our implementation based on dynamic scheduling obtains good scalability on multi-core clusters. Through our evaluation, we are able to complete the single-end and paired-end alignments of 246 million reads of length 150 base-pairs in 11.54 and 16.64 minutes, respectively, using 32 nodes with four AMD Opteron 6272 16-core CPUs per node. In contrast, the multi-threaded original tool needs 2.77 and 5.54 hours to perform the same alignments on the 64 cores of one node. The source code of our parallel implementation is publicly available at the CUSHAW3 homepage (http://cushaw3.sourceforge.net). PMID:26731399
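
    The timings quoted above imply the following speedups; this is a small arithmetic sketch in Python. The helper name `speedup` and the efficiency figure (speedup divided by the 32 nodes, taking the one-node multi-threaded run as baseline) are illustrative assumptions rather than metrics reported in the abstract.

```python
def speedup(baseline_hours, parallel_minutes):
    """Ratio of the one-node runtime to the 32-node runtime, both in minutes."""
    return baseline_hours * 60.0 / parallel_minutes


NODES = 32  # node count stated in the abstract

single_end = speedup(2.77, 11.54)   # single-end alignment
paired_end = speedup(5.54, 16.64)   # paired-end alignment

# Assumed definition: efficiency relative to a one-node baseline.
efficiency_single = single_end / NODES
efficiency_paired = paired_end / NODES

print(f"single-end: {single_end:.1f}x speedup, efficiency {efficiency_single:.2f}")
print(f"paired-end: {paired_end:.1f}x speedup, efficiency {efficiency_paired:.2f}")
```

    Because the baseline is the original multi-threaded tool rather than the UPC++ implementation on one node, these ratios are only indicative of scaling behaviour.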

  13. Parallel and Scalable Short-Read Alignment on Multi-Core Clusters Using UPC++.

    PubMed

    González-Domínguez, Jorge; Liu, Yongchao; Schmidt, Bertil

    2016-01-01

    The growth of next-generation sequencing (NGS) datasets poses a challenge to the alignment of reads to reference genomes in terms of alignment quality and execution speed. Some available aligners have been shown to obtain high quality mappings at the expense of long execution times. Finding fast yet accurate software solutions is of high importance to research, since availability and size of NGS datasets continue to increase. In this work we present an efficient parallelization approach for NGS short-read alignment on multi-core clusters. Our approach takes advantage of a distributed shared memory programming model based on the new UPC++ language. Experimental results using the CUSHAW3 aligner show that our implementation based on dynamic scheduling obtains good scalability on multi-core clusters. Through our evaluation, we are able to complete the single-end and paired-end alignments of 246 million reads of length 150 base-pairs in 11.54 and 16.64 minutes, respectively, using 32 nodes with four AMD Opteron 6272 16-core CPUs per node. In contrast, the multi-threaded original tool needs 2.77 and 5.54 hours to perform the same alignments on the 64 cores of one node. The source code of our parallel implementation is publicly available at the CUSHAW3 homepage (http://cushaw3.sourceforge.net).

  14. Target Enrichment Improves Mapping of Complex Traits by Deep Sequencing.

    PubMed

    Guo, Jianjun; Fan, Jue; Hauser, Bernard A; Rhee, Seung Y

    2015-11-03

    Complex traits such as crop performance and human diseases are controlled by multiple genetic loci, many of which have small effects and often go undetected by traditional quantitative trait locus (QTL) mapping. Recently, bulked segregant analysis with large F2 pools and genome-level markers (named extreme-QTL or X-QTL mapping) has been used to identify many QTL. To estimate parameters impacting QTL detection for X-QTL mapping, we simulated the effects of population size, marker density, and sequencing depth of markers on QTL detectability for traits with differing heritabilities. These simulations indicate that a high (>90%) chance of detecting QTL with at least 5% effect requires 5000× sequencing depth for a trait with heritability of 0.4-0.7. For most eukaryotic organisms, whole-genome sequencing at this depth is not economically feasible. Therefore, we tested and confirmed the feasibility of applying deep sequencing of target-enriched markers for X-QTL mapping. We used two traits in Arabidopsis thaliana with different heritabilities: seed size (H(2) = 0.61) and seedling greening in response to salt (H(2) = 0.94). We used a modified G test to identify QTL regions and developed a model-based statistical framework to resolve individual peaks by incorporating recombination rates. Multiple QTL were identified for both traits, including previously undiscovered QTL. We call our method target-enriched X-QTL (TEX-QTL) mapping; this mapping approach is not limited by the genome size or the availability of recombinant inbred populations and should be applicable to many organisms and traits.
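
    As an illustration of the kind of test involved, the sketch below computes a standard (unmodified) G statistic for allele read counts in two bulked pools. The paper's modified G test and its model-based peak resolution are not described in the abstract and are not reproduced here; the function name `g_statistic` and the example counts are assumptions.

```python
import math


def g_statistic(table):
    """Standard G statistic, G = 2 * sum(O * ln(O / E)), for a contingency table.

    `table` is a list of rows, e.g. [[ref_high, alt_high], [ref_low, alt_low]],
    holding read counts for two alleles in two phenotypic pools.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    g = 0.0
    for r, row in enumerate(table):
        for c, observed in enumerate(row):
            expected = row_totals[r] * col_totals[c] / total
            if observed > 0:  # a zero cell contributes nothing to the sum
                g += observed * math.log(observed / expected)
    return 2.0 * g


# Hypothetical marker: allele counts differ between the two pools.
linked = g_statistic([[60, 40], [40, 60]])
# Hypothetical marker: identical allele counts, so no evidence of linkage.
unlinked = g_statistic([[50, 50], [50, 50]])
```

    For a 2 × 2 table the statistic is conventionally compared against a chi-square distribution with one degree of freedom (critical value about 3.84 at the 5% level); deeper sequencing gives larger counts and therefore more power to detect small allele-frequency shifts.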

  15. Clinical actionability enhanced through deep targeted sequencing of solid tumors

    PubMed Central

    Chen, Ken; Meric-Bernstam, Funda; Zhao, Hao; Zhang, Qingxiu; Ezzeddine, Nader; Tang, Lin-ya; Qi, Yuan; Mao, Yong; Chen, Tenghui; Chong, Zechen; Zhou, Wanding; Zheng, Xiaofeng; Johnson, Amber; Aldape, Kenneth D.; Routbort, Mark J.; Luthra, Rajyalakshmi; Kopetz, Scott; Davies, Michael A.; de Groot, John; Moulder, Stacy; Vinod, Ravi; Farhangfar, Carol J.; Shaw, Kenna Mills; Mendelsohn, John; Mills, Gordon B.; Eterovic, Agda Karina

    2015-01-01

    Background: Further advances of targeted cancer therapy require comprehensive in-depth profiling of somatic mutations that are present in subpopulations of tumor cells in a clinical tumor sample. However, it is unclear to what extent such intra-tumor heterogeneity is present and whether it may affect clinical decision making. To address this challenge, we established a deep targeted sequencing platform to identify potentially actionable DNA alterations in tumor samples. Methods: We assayed 515 FFPE tumor samples and matched germline (475 patients) from 11 disease sites by capturing and sequencing all the exons in 201 cancer related genes. Mutations, indels and copy number data were reported. Results: We obtained a 1000-fold average sequencing depth and identified 4794 non-synonymous mutations in the samples analyzed, of which 15.2% were present at less than 10% allele frequency. Most of these low level mutations occurred at known oncogenic hotspots and are likely functional. Identifying low level mutations improved identification of mutations in actionable genes in 118 (24.84%) patients, among which 47 (9.8%) would otherwise be unactionable. In addition, acquiring ultra-high depth also ensured a low false discovery rate (less than 2.2%) from FFPE samples. Conclusion: Our results were as accurate as a commercially available CLIA-compliant hotspot panel, but allowed the detection of a higher number of mutations in actionable genes. Our study revealed the critical importance of acquiring and utilizing high depth in profiling clinical tumor samples and presented a very useful platform for implementing routine sequencing in a cancer care institution. PMID:25626406

  16. Error analysis of deep sequencing of phage libraries: peptides censored in sequencing.

    PubMed

    Matochko, Wadim L; Derda, Ratmir

    2013-01-01

    Next-generation sequencing techniques empower selection of ligands from phage-display libraries because they can detect low abundant clones and quantify changes in the copy numbers of clones without excessive selection rounds. Identification of errors in deep sequencing data is the most critical step in this process because these techniques have error rates >1%. Mechanisms that yield errors in Illumina and other techniques have been proposed, but no reports to date describe error analysis in phage libraries. Our paper focuses on error analysis of 7-mer peptide libraries sequenced by Illumina method. Low theoretical complexity of this phage library, as compared to complexity of long genetic reads and genomes, allowed us to describe this library using convenient linear vector and operator framework. We describe a phage library as N × 1 frequency vector n = ||ni||, where ni is the copy number of the ith sequence and N is the theoretical diversity, that is, the total number of all possible sequences. Any manipulation to the library is an operator acting on n. Selection, amplification, or sequencing could be described as a product of a N × N matrix and a stochastic sampling operator (Sa). The latter is a random diagonal matrix that describes sampling of a library. In this paper, we focus on the properties of Sa and use them to define the sequencing operator (Seq). Sequencing without any bias and errors is Seq = Sa IN, where IN is a N × N unity matrix. Any bias in sequencing changes IN to a nonunity matrix. We identified a diagonal censorship matrix (CEN), which describes elimination or statistically significant downsampling, of specific reads during the sequencing process.

  17. Error Analysis of Deep Sequencing of Phage Libraries: Peptides Censored in Sequencing

    PubMed Central

    Matochko, Wadim L.; Derda, Ratmir

    2013-01-01

    Next-generation sequencing techniques empower selection of ligands from phage-display libraries because they can detect low abundant clones and quantify changes in the copy numbers of clones without excessive selection rounds. Identification of errors in deep sequencing data is the most critical step in this process because these techniques have error rates >1%. Mechanisms that yield errors in Illumina and other techniques have been proposed, but no reports to date describe error analysis in phage libraries. Our paper focuses on error analysis of 7-mer peptide libraries sequenced by Illumina method. Low theoretical complexity of this phage library, as compared to complexity of long genetic reads and genomes, allowed us to describe this library using convenient linear vector and operator framework. We describe a phage library as N × 1 frequency vector n = ||ni||, where ni is the copy number of the ith sequence and N is the theoretical diversity, that is, the total number of all possible sequences. Any manipulation to the library is an operator acting on n. Selection, amplification, or sequencing could be described as a product of a N × N matrix and a stochastic sampling operator (Sa). The latter is a random diagonal matrix that describes sampling of a library. In this paper, we focus on the properties of Sa and use them to define the sequencing operator (Seq). Sequencing without any bias and errors is Seq = Sa IN, where IN is a N × N unity matrix. Any bias in sequencing changes IN to a nonunity matrix. We identified a diagonal censorship matrix (CEN), which describes elimination or statistically significant downsampling, of specific reads during the sequencing process. PMID:24416071
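
    The vector-and-operator description above can be made concrete with a small Python sketch. Diagonal matrices are stored as lists of their diagonal entries. The binomial form of the sampling operator, the composition of Sa with the censorship diagonal, and all names (`sampling_operator`, `apply_diagonal`, the toy copy numbers) are assumptions made for illustration; the abstract does not specify them.

```python
import random


def sampling_operator(n, fraction, rng):
    """Diagonal of a hypothetical Sa: per-sequence share of copies that get sampled.

    Assumption: each copy is sampled independently with probability `fraction`.
    """
    diagonal = []
    for copies in n:
        kept = sum(1 for _ in range(copies) if rng.random() < fraction)
        diagonal.append(kept / copies if copies else 0.0)
    return diagonal


def apply_diagonal(diagonal, vector):
    """Multiply a diagonal matrix (given by its diagonal) with a vector."""
    return [d * v for d, v in zip(diagonal, vector)]


rng = random.Random(0)           # fixed seed so the sketch is repeatable
n = [1000, 200, 50, 0]           # toy frequency vector: copy number per sequence
identity = [1.0] * len(n)        # I_N, sequencing without bias
censor = [1.0, 1.0, 0.0, 1.0]    # toy CEN: the third sequence is censored

sa = sampling_operator(n, 0.1, rng)
unbiased = apply_diagonal(sa, apply_diagonal(identity, n))  # Seq = Sa I_N
censored = apply_diagonal(sa, apply_diagonal(censor, n))    # assumed Seq = Sa CEN
```

    Replacing the identity with a non-identity diagonal is what removes the third sequence from the observed counts in this toy case, regardless of how the sampling step turns out.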

  18. DeepBase: annotation and discovery of microRNAs and other noncoding RNAs from deep-sequencing data.

    PubMed

    Yang, Jian-Hua; Qu, Liang-Hu

    2012-01-01

    Recent advances in high-throughput deep-sequencing technology have produced large numbers of short and long RNA sequences and enabled the detection and profiling of known and novel microRNAs (miRNAs) and other noncoding RNAs (ncRNAs) at unprecedented sensitivity and depth. In this chapter, we describe the use of deepBase, a database that we have developed to integrate all public deep-sequencing data and to facilitate the comprehensive annotation and discovery of miRNAs and other ncRNAs from these data. deepBase provides an integrative, interactive, and versatile web graphical interface to evaluate miRBase-annotated miRNA genes and other known ncRNAs, explores the expression patterns of miRNAs and other ncRNAs, and discovers novel miRNAs and other ncRNAs from deep-sequencing data. deepBase also provides a deepView genome browser to comparatively analyze these data at multiple levels. deepBase is available at http://deepbase.sysu.edu.cn/.

  19. Diagnosing Balamuthia mandrillaris Encephalitis With Metagenomic Deep Sequencing

    PubMed Central

    Shanbhag, Niraj M.; Reid, Michael J.; Singhal, Neel S.; Gelfand, Jeffrey M.; Sample, Hannah A.; Benkli, Barlas; O'Donovan, Brian D.; Ali, Ibne K.M.; Keating, M. Kelly; Dunnebacke, Thelma H.; Wood, Matthew D.; Bollen, Andrew; DeRisi, Joseph L.

    2015-01-01

    Objective: Identification of a particular cause of meningoencephalitis can be challenging owing to the myriad bacteria, viruses, fungi, and parasites that can produce overlapping clinical phenotypes, frequently delaying diagnosis and therapy. Metagenomic deep sequencing (MDS) approaches to infectious disease diagnostics are known for their ability to identify unusual or novel viruses and thus are well suited for investigating possible etiologies of meningoencephalitis. Methods: We present the case of a 74-year-old woman with endophthalmitis followed by meningoencephalitis. MDS of her cerebrospinal fluid (CSF) was performed to identify an infectious agent. Results: Sequences aligning to Balamuthia mandrillaris ribosomal RNA genes were identified in the CSF by MDS. Polymerase chain reaction subsequently confirmed the presence of B. mandrillaris in CSF, brain tissue, and vitreous fluid from the patient's infected eye. B. mandrillaris serology and immunohistochemistry for free-living amoebas on the brain biopsy tissue were positive. Interpretation: The diagnosis was made using MDS after the patient had been hospitalized for several weeks and subjected to costly and invasive testing. MDS is a powerful diagnostic tool with the potential for rapid and unbiased pathogen identification leading to early therapeutic targeting. Ann Neurol 2015;78:679–696 PMID:26290222

  20. Measuring Cation Dependent DNA Polymerase Fidelity Landscapes by Deep Sequencing

    PubMed Central

    Kording, Konrad; Schmidt, Daniel; Martin-Alarcon, Daniel; Tyo, Keith; Boyden, Edward S.; Church, George

    2012-01-01

    High-throughput recording of signals embedded within inaccessible micro-environments is a technological challenge. The ideal recording device would be a nanoscale machine capable of quantitatively transducing a wide range of variables into a molecular recording medium suitable for long-term storage and facile readout in the form of digital data. We have recently proposed such a device, in which cation concentrations modulate the misincorporation rate of a DNA polymerase (DNAP) on a known template, allowing DNA sequences to encode information about the local cation concentration. In this work we quantify the cation sensitivity of DNAP misincorporation rates, making possible the indirect readout of cation concentration by DNA sequencing. Using multiplexed deep sequencing, we quantify the misincorporation properties of two DNA polymerases – Dpo4 and Klenow exo− – obtaining the probability and base selectivity of misincorporation at all positions within the template. We find that Dpo4 acts as a DNA recording device for Mn2+ with a misincorporation rate gain of ∼2%/mM. This modulation of misincorporation rate is selective to the template base: the probability of misincorporation on template T by Dpo4 increases >50-fold over the range tested, while the other template bases are affected less strongly. Furthermore, cation concentrations act as scaling factors for misincorporation: on a given template base, Mn2+ and Mg2+ change the overall misincorporation rate but do not alter the relative frequencies of incoming misincorporated nucleotides. Characterization of the ion dependence of DNAP misincorporation serves as the first step towards repurposing it as a molecular recording device. PMID:22928047

  1. A parallel algorithm for error correction in high-throughput short-read data on CUDA-enabled graphics hardware.

    PubMed

    Shi, Haixiang; Schmidt, Bertil; Liu, Weiguo; Müller-Wittig, Wolfgang

    2010-04-01

    Emerging DNA sequencing technologies open up exciting new opportunities for genome sequencing by generating read data with a massive throughput. However, produced reads are significantly shorter and more error-prone compared to the traditional Sanger shotgun sequencing method. This poses challenges for de novo DNA fragment assembly algorithms in terms of both accuracy (to deal with short, error-prone reads) and scalability (to deal with very large input data sets). In this article, we present a scalable parallel algorithm for correcting sequencing errors in high-throughput short-read data so that error-free reads can be available before DNA fragment assembly, which is of high importance to many graph-based short-read assembly tools. The algorithm is based on spectral alignment and uses the Compute Unified Device Architecture (CUDA) programming model. To gain efficiency we are taking advantage of the CUDA texture memory using a space-efficient Bloom filter data structure for spectrum membership queries. We have tested the runtime and accuracy of our algorithm using real and simulated Illumina data for different read lengths, error rates, input sizes, and algorithmic parameters. Using a CUDA-enabled mass-produced GPU (available for less than US$400 at any local computer outlet), this results in speedups of 12-84 times for the parallelized error correction, and speedups of 3-63 times for both sequential preprocessing and parallelized error correction compared to the publicly available Euler-SR program. Our implementation is freely available for download from http://cuda-ec.sourceforge.net .
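
    The role of the Bloom filter in spectrum membership queries can be sketched on the CPU in plain Python, as below. This is not the CUDA implementation described in the abstract: the hashing scheme, the filter size, the solidity threshold `min_count`, and all function names are assumptions chosen for illustration.

```python
import hashlib
from collections import Counter


class BloomFilter:
    """Space-efficient set with possible false positives but no false negatives."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _indexes(self, item):
        # One salted BLAKE2b digest per hash function (an illustrative choice).
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, item):
        for idx in self._indexes(item):
            self.bits[idx // 8] |= 1 << (idx % 8)

    def __contains__(self, item):
        return all(self.bits[idx // 8] & (1 << (idx % 8)) for idx in self._indexes(item))


def kmers(read, k):
    """All overlapping substrings of length k, in read order."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def build_spectrum(reads, k, min_count, num_bits=1024, num_hashes=3):
    """Store every k-mer seen at least `min_count` times (the 'solid' k-mers)."""
    counts = Counter(km for read in reads for km in kmers(read, k))
    spectrum = BloomFilter(num_bits, num_hashes)
    for km, count in counts.items():
        if count >= min_count:
            spectrum.add(km)
    return spectrum


def weak_kmers(read, spectrum, k):
    """k-mers of a read that are absent from the spectrum: candidate error sites."""
    return [km for km in kmers(read, k) if km not in spectrum]


reads = ["ACGTACGT"] * 3 + ["ACGTTCGT"]  # toy data; the last read has one substitution
spectrum = build_spectrum(reads, k=4, min_count=2)
suspect = weak_kmers("ACGTTCGT", spectrum, k=4)
```

    A corrector based on spectral alignment would then look for the smallest set of base changes that turns every weak k-mer of a read into a solid one; that search is omitted here.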

  2. Ultra-deep sequencing of foraminiferal microbarcodes unveils hidden richness of early monothalamous lineages in deep-sea sediments.

    PubMed

    Lecroq, Béatrice; Lejzerowicz, Franck; Bachar, Dipankar; Christen, Richard; Esling, Philippe; Baerlocher, Loïc; Østerås, Magne; Farinelli, Laurent; Pawlowski, Jan

    2011-08-09

    Deep-sea floors represent one of the largest and most complex ecosystems on Earth but remain essentially unexplored. The vastness and remoteness of this ecosystem make deep-sea sampling difficult, hampering traditional taxonomic observations and diversity assessment. This problem is particularly true in the case of the deep-sea meiofauna, which largely comprises small-sized, fragile, and difficult-to-identify metazoans and protists. Here, we introduce an ultra-deep sequencing-based metagenetic approach to examine the richness of benthic foraminifera, a principal component of deep-sea meiofauna. We used Illumina sequencing technology to assess foraminiferal richness in 31 unsieved deep-sea sediment samples from five distinct oceanic regions. We sequenced an extremely short fragment (36 bases) of the small subunit ribosomal DNA hypervariable region 37f, which has been shown to accurately distinguish foraminiferal species. In total, we obtained 495,978 unique sequences that were grouped into 1,643 operational taxonomic units, of which about half (841) could be reliably assigned to foraminifera. The vast majority of the operational taxonomic units (nearly 90%) were either assigned to early (ancient) lineages of soft-walled, single-chambered (monothalamous) foraminifera or remained undetermined and yet possibly belong to unknown early lineages. Contrasting with the classical view of multichambered taxa dominating foraminiferal assemblages, our work reflects an unexpected diversity of monothalamous lineages that are as yet unknown using conventional micropaleontological observations. Although we can only speculate about their morphology, the immense richness of deep-sea phylotypes revealed by this study suggests that ultra-deep sequencing can improve understanding of deep-sea benthic diversity considered until now as unknowable based on a traditional taxonomic approach.

  3. Whole plastome sequencing reveals deep plastid divergence and cytonuclear discordance between closely related balsam poplars, Populus balsamifera and P. trichocarpa (Salicaceae).

    PubMed

    Huang, Daisie I; Hefer, Charles A; Kolosova, Natalia; Douglas, Carl J; Cronk, Quentin C B

    2014-11-01

    As molecular phylogenetic analyses incorporate ever-greater numbers of loci, cases of cytonuclear discordance - the phenomenon in which nuclear gene trees deviate significantly from organellar gene trees - are being reported more frequently. Plant examples of topological discordance, caused by recent hybridization between extant species, are well known. However, examples of branch-length discordance are less reported in plants relative to animals. We use a combination of de novo assembly and reference-based mapping using short-read shotgun sequences to construct a robust phylogeny of the plastome for multiple individuals of all the common Populus species in North America. We demonstrate a case of strikingly high plastome divergence, in contrast to little nuclear genome divergence, in two closely related balsam poplars, Populus balsamifera and Populus trichocarpa (Populus balsamifera ssp. trichocarpa). Previous studies with nuclear loci indicate that the two species (or subspecies) diverged since the late Pleistocene, whereas their plastomes indicate deep divergence, dating to at least the Pliocene (6-7 Myr ago). Our finding is in marked contrast to the estimated Pleistocene divergence of the nuclear genomes, previously calculated at 75 000 yr ago, suggesting plastid capture from a 'ghost lineage' of a now-extinct North American poplar.

  4. Key roles for freshwater Actinobacteria revealed by deep metagenomic sequencing.

    PubMed

    Ghai, Rohit; Mizuno, Carolina Megumi; Picazo, Antonio; Camacho, Antonio; Rodriguez-Valera, Francisco

    2014-12-01

    Freshwater ecosystems are critical but fragile environments directly affecting society and its welfare. However, our understanding of genuinely freshwater microbial communities, constrained by our capacity to manipulate its prokaryotic participants in axenic cultures, remains very rudimentary. Even the most abundant components, freshwater Actinobacteria, remain largely unknown. Here, applying deep metagenomic sequencing to the microbial community of a freshwater reservoir, we were able to circumvent this traditional bottleneck and reconstruct de novo seven distinct streamlined actinobacterial genomes. These genomes represent three new groups of photoheterotrophic, planktonic Actinobacteria. We describe for the first time genomes of two novel clades, acMicro (Micrococcineae, related to Luna2) and acAMD (Actinomycetales, related to acTH1). In addition, an aggregate of contigs belonged to a new branch of the Acidimicrobiales. All are estimated to have small genomes (approximately 1.2 Mb), and their GC content varied from 40 to 61%. One of the Micrococcineae genomes encodes a proteorhodopsin, a rhodopsin type reported for the first time in Actinobacteria. The remarkable potential capacity of some of these genomes to transform recalcitrant plant detrital material, particularly lignin-derived compounds, suggests close linkages between the terrestrial and aquatic realms. Moreover, abundances of Actinobacteria correlate inversely to those of Cyanobacteria that are responsible for prolonged and frequently irretrievable damage to freshwater ecosystems. This suggests that they might serve as sentinels of impending ecological catastrophes.

  5. Concurrent and Accurate Short Read Mapping on Multicore Processors.

    PubMed

    Martínez, Héctor; Tárraga, Joaquín; Medina, Ignacio; Barrachina, Sergio; Castillo, Maribel; Dopazo, Joaquín; Quintana-Ortí, Enrique S

    2015-01-01

    We introduce a parallel aligner with a work-flow organization for fast and accurate mapping of RNA sequences on servers equipped with multicore processors. Our software, HPG Aligner SA (HPG Aligner SA is an open-source application; the software is available at http://www.opencb.org), exploits a suffix array to rapidly map a large fraction of the RNA fragments (reads), and leverages the accuracy of the Smith-Waterman algorithm to deal with conflictive reads. The aligner is enhanced with a careful strategy to detect splice junctions based on an adaptive division of RNA reads into small segments (or seeds), which are then mapped onto a number of candidate alignment locations, providing crucial information for the successful alignment of the complete reads. The experimental results on a platform with Intel multicore technology report the parallel performance of HPG Aligner SA, on RNA reads of 100-400 nucleotides, which excels in execution time/sensitivity relative to state-of-the-art aligners such as TopHat 2+Bowtie 2, MapSplice, and STAR.
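
    The Smith-Waterman step named in this abstract can be illustrated with a minimal local-alignment sketch. It is not the HPG Aligner SA implementation: the function name, the linear gap penalty, and the scoring values (match=2, mismatch=-1, gap=-2) are assumptions chosen only for illustration.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for one best local alignment.

    Scoring values are illustrative assumptions, not those of any real aligner.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if H[i][j] == diag:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))
```

    The dynamic-programming table costs time proportional to the product of the two sequence lengths, which is consistent with reserving this step for the reads that a faster index-based search cannot place.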

  6. Unified View of Backward Backtracking in Short Read Mapping

    NASA Astrophysics Data System (ADS)

    Mäkinen, Veli; Välimäki, Niko; Laaksonen, Antti; Katainen, Riku

    Mapping short DNA reads to the reference genome is the core task in the recent high-throughput technologies to study e.g. protein-DNA interactions (ChIP-seq) and alternative splicing (RNA-seq). Several tools for the task (bowtie, bwa, SOAP2, TopHat) have been developed that exploit the Burrows-Wheeler transform and the backward backtracking technique on it, to map the reads to their best approximate occurrences in the genome. These tools use different tailored mechanisms for small error-levels to prune the search phase significantly. We propose a new pruning mechanism that can be seen as a generalization of the tailored mechanisms used so far. It uses a novel idea of storing all cyclic rotations of fixed length substrings of the reference sequence with a compressed index that is able to exploit the repetitions created to level out the growth of the input set. For RNA-seq we propose a new method that combines dynamic programming with backtracking to map efficiently and correctly all reads that span two exons. The same mechanism can also be used for mapping mate-pair reads.
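
    As a point of reference for the technique this abstract generalizes, here is a minimal sketch of backward search with backtracking over a Burrows-Wheeler index. It deliberately uses plain, unpruned backtracking with substitutions only; it is not the pruning mechanism proposed by the authors, and the function names and the naive suffix-array construction are assumptions made for brevity.

```python
def build_fm_index(text):
    """Build a toy FM-index; the naive suffix sort is for illustration only."""
    text += "$"  # sentinel, assumed to sort before every other symbol
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # C[c] = number of symbols in the text strictly smaller than c
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += text.count(c)
    # occ[c][i] = number of occurrences of c in bwt[:i]
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return sa, C, occ


def search(pattern, index, max_mismatches=0):
    """Return sorted text positions matching pattern with <= max_mismatches substitutions."""
    sa, C, occ = index
    hits = set()

    def extend(pos, lo, hi, budget):
        # [lo, hi) is the suffix-array interval of suffixes that start with
        # some string within the mismatch budget of pattern[pos + 1:].
        if pos < 0:
            hits.update(sa[lo:hi])
            return
        for c in C:
            if c == "$":
                continue
            cost = 0 if c == pattern[pos] else 1
            if cost > budget:
                continue
            new_lo = C[c] + occ[c][lo]
            new_hi = C[c] + occ[c][hi]
            if new_lo < new_hi:  # interval is non-empty, keep extending leftwards
                extend(pos - 1, new_lo, new_hi, budget - cost)

    extend(len(pattern) - 1, 0, len(sa), max_mismatches)
    return sorted(hits)
```

    Because every alternative symbol is tried at every position while the budget lasts, the search space grows quickly with the number of allowed mismatches, which is the cost that pruning mechanisms aim to reduce.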

  7. Deep-Sea, Deep-Sequencing: Metabarcoding Extracellular DNA from Sediments of Marine Canyons

    PubMed Central

    Guardiola, Magdalena; Uriz, María Jesús; Taberlet, Pierre; Coissac, Eric; Wangensteen, Owen Simon; Turon, Xavier

    2015-01-01

    Marine sediments are home to one of the richest species pools on Earth, but logistics and a dearth of taxonomic work-force hinder the knowledge of their biodiversity. We characterized α- and β-diversity of deep-sea assemblages from submarine canyons in the western Mediterranean using environmental DNA metabarcoding. We used a new primer set targeting a short eukaryotic 18S sequence (ca. 110 bp). We applied a protocol designed to obtain extractions enriched in extracellular DNA from replicated sediment corers. With this strategy we captured information from DNA (local or deposited from the water column) that persists adsorbed to inorganic particles and buffered short-term spatial and temporal heterogeneity. We analysed replicated samples from 20 localities including 2 deep-sea canyons, 1 shallower canal, and two open slopes (depth range 100–2,250 m). We identified 1,629 MOTUs, among which the dominant groups were Metazoa (with representatives of 19 phyla), Alveolata, Stramenopiles, and Rhizaria. There was a marked small-scale heterogeneity as shown by differences in replicates within corers and within localities. The spatial variability between canyons was significant, as was the depth component in one of the canyons where it was tested. Likewise, the composition of the first layer (1 cm) of sediment was significantly different from deeper layers. We found that qualitative (presence-absence) and quantitative (relative number of reads) data showed consistent trends of differentiation between samples and geographic areas. The subset of exclusively benthic MOTUs showed similar patterns of β-diversity and community structure as the whole dataset. Separate analyses of the main metazoan phyla (in number of MOTUs) showed some differences in distribution attributable to different lifestyles. Our results highlight the differentiation that can be found even between geographically close assemblages, and set the ground for future monitoring and conservation efforts on

  8. Deep-Sea, Deep-Sequencing: Metabarcoding Extracellular DNA from Sediments of Marine Canyons.

    PubMed

    Guardiola, Magdalena; Uriz, María Jesús; Taberlet, Pierre; Coissac, Eric; Wangensteen, Owen Simon; Turon, Xavier

    2015-01-01

    Marine sediments are home to one of the richest species pools on Earth, but logistics and a dearth of taxonomic work-force hinder the knowledge of their biodiversity. We characterized α- and β-diversity of deep-sea assemblages from submarine canyons in the western Mediterranean using environmental DNA metabarcoding. We used a new primer set targeting a short eukaryotic 18S sequence (ca. 110 bp). We applied a protocol designed to obtain extractions enriched in extracellular DNA from replicated sediment corers. With this strategy we captured information from DNA (local or deposited from the water column) that persists adsorbed to inorganic particles and buffered short-term spatial and temporal heterogeneity. We analysed replicated samples from 20 localities including 2 deep-sea canyons, 1 shallower canal, and two open slopes (depth range 100-2,250 m). We identified 1,629 MOTUs, among which the dominant groups were Metazoa (with representatives of 19 phyla), Alveolata, Stramenopiles, and Rhizaria. There was a marked small-scale heterogeneity as shown by differences in replicates within corers and within localities. The spatial variability between canyons was significant, as was the depth component in one of the canyons where it was tested. Likewise, the composition of the first layer (1 cm) of sediment was significantly different from deeper layers. We found that qualitative (presence-absence) and quantitative (relative number of reads) data showed consistent trends of differentiation between samples and geographic areas. The subset of exclusively benthic MOTUs showed similar patterns of β-diversity and community structure as the whole dataset. Separate analyses of the main metazoan phyla (in number of MOTUs) showed some differences in distribution attributable to different lifestyles. Our results highlight the differentiation that can be found even between geographically close assemblages, and set the ground for future monitoring and conservation efforts on

  9. Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning.

    PubMed

    Alipanahi, Babak; Delong, Andrew; Weirauch, Matthew T; Frey, Brendan J

    2015-08-01

    Knowing the sequence specificities of DNA- and RNA-binding proteins is essential for developing models of the regulatory processes in biological systems and for identifying causal disease variants. Here we show that sequence specificities can be ascertained from experimental data with 'deep learning' techniques, which offer a scalable, flexible and unified computational approach for pattern discovery. Using a diverse array of experimental data and evaluation metrics, we find that deep learning outperforms other state-of-the-art methods, even when training on in vitro data and testing on in vivo data. We call this approach DeepBind and have built a stand-alone software tool that is fully automatic and handles millions of sequences per experiment. Specificities determined by DeepBind are readily visualized as a weighted ensemble of position weight matrices or as a 'mutation map' that indicates how variations affect binding within a specific sequence.

  10. Complete Genome Sequence of Bacteriophage Deep-Blue Infecting Emetic Bacillus cereus.

    PubMed

    Hock, Louise; Gillis, Annika; Mahillon, Jacques

    2016-06-16

    The Bacillus cereus emetic pathotype is responsible for important food-borne intoxications. Here, we describe the complete genome sequence of bacteriophage Deep-Blue, which is able to infect emetic strains of B. cereus. Deep-Blue is a 159-kb myophage of the Bastille-like group within the Spounavirinae.

  11. Complete Genome Sequence of Bacteriophage Deep-Blue Infecting Emetic Bacillus cereus

    PubMed Central

    Hock, Louise; Gillis, Annika

    2016-01-01

    The Bacillus cereus emetic pathotype is responsible for important food-borne intoxications. Here, we describe the complete genome sequence of bacteriophage Deep-Blue, which is able to infect emetic strains of B. cereus. Deep-Blue is a 159-kb myophage of the Bastille-like group within the Spounavirinae. PMID:27313285

  12. Virus identification in unknown tropical febrile illness cases using deep sequencing.

    PubMed

    Yozwiak, Nathan L; Skewes-Cox, Peter; Stenglein, Mark D; Balmaseda, Angel; Harris, Eva; DeRisi, Joseph L

    2012-01-01

    Dengue virus is an emerging infectious agent that infects an estimated 50-100 million people annually worldwide, yet current diagnostic practices cannot detect an etiologic pathogen in ∼40% of dengue-like illnesses. Metagenomic approaches to pathogen detection, such as viral microarrays and deep sequencing, are promising tools to address emerging and non-diagnosable disease challenges. In this study, we used the Virochip microarray and deep sequencing to characterize the spectrum of viruses present in human sera from 123 Nicaraguan patients presenting with dengue-like symptoms but testing negative for dengue virus. We utilized a barcoding strategy to simultaneously deep sequence multiple serum specimens, generating on average over 1 million reads per sample. We then implemented a stepwise bioinformatic filtering pipeline to remove the majority of human and low-quality sequences to improve the speed and accuracy of subsequent unbiased database searches. By deep sequencing, we were able to detect virus sequence in 37% (45/123) of previously negative cases. These included 13 cases with Human Herpesvirus 6 sequences. Other samples contained sequences with similarity to sequences from viruses in the Herpesviridae, Flaviviridae, Circoviridae, Anelloviridae, Asfarviridae, and Parvoviridae families. In some cases, the putative viral sequences were virtually identical to known viruses, and in others they diverged, suggesting that they may derive from novel viruses. These results demonstrate the utility of unbiased metagenomic approaches in the detection of known and divergent viruses in the study of tropical febrile illness.
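
    The barcoding and filtering steps described in this abstract can be sketched roughly as follows. This is a simplified illustration, not the authors' pipeline: it assumes a fixed-length barcode at the start of each read, exact barcode matching, and a mean-quality cutoff of 20, and the function and parameter names are hypothetical.

```python
def demultiplex(reads, barcode_to_sample, min_mean_quality=20):
    """Assign reads to samples by leading barcode and drop low-quality reads.

    reads: iterable of (sequence, list_of_per_base_quality_scores)
    barcode_to_sample: dict mapping equal-length barcodes to sample names
    The quality cutoff and exact-match rule are illustrative assumptions.
    """
    barcode_length = len(next(iter(barcode_to_sample)))
    by_sample = {sample: [] for sample in barcode_to_sample.values()}
    for sequence, qualities in reads:
        # Discard reads whose mean base quality falls below the cutoff.
        if not qualities or sum(qualities) / len(qualities) < min_mean_quality:
            continue
        sample = barcode_to_sample.get(sequence[:barcode_length])
        if sample is not None:
            # Store the read with its barcode trimmed off.
            by_sample[sample].append(sequence[barcode_length:])
    return by_sample
```

    Removal of host (human) sequences, which the abstract also mentions, would be a separate alignment-based step and is not shown here.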

  13. Error rates, PCR recombination, and sampling depth in HIV-1 whole genome deep sequencing.

    PubMed

    Zanini, Fabio; Brodin, Johanna; Albert, Jan; Neher, Richard A

    2016-12-27

    Deep sequencing is a powerful and cost-effective tool to characterize the genetic diversity and evolution of virus populations. While modern sequencing instruments readily cover viral genomes many thousand fold and very rare variants can in principle be detected, sequencing errors, amplification biases, and other artifacts can limit sensitivity and complicate data interpretation. For this reason, the number of studies using whole genome deep sequencing to characterize viral quasi-species in clinical samples is still limited. We have previously undertaken a large scale whole genome deep sequencing study of HIV-1 populations. Here we discuss the challenges, error profiles, control experiments, and computational tests we developed to quantify the accuracy of variant frequency estimation.

  14. KRAS, BRAF, and TP53 deep sequencing for colorectal carcinoma patient diagnostics.

    PubMed

    Rechsteiner, Markus; von Teichman, Adriana; Rüschoff, Jan H; Fankhauser, Niklaus; Pestalozzi, Bernhard; Schraml, Peter; Weber, Achim; Wild, Peter; Zimmermann, Dieter; Moch, Holger

    2013-05-01

    In colorectal carcinoma, KRAS (alias Ki-ras) and BRAF mutations have emerged as predictors of resistance to anti-epidermal growth factor receptor antibody treatment and worse patient outcome, respectively. In this study, we aimed to establish a high-throughput deep sequencing workflow according to 454 pyrosequencing technology to cope with the increasing demand for sequence information at medical institutions. A cohort of 81 patients with known KRAS mutation status detected by Sanger sequencing was chosen for deep sequencing. The workflow allowed us to analyze seven amplicons (one BRAF, two KRAS, and four TP53 exons) of nine patients in parallel in one deep sequencing run. Target amplification and variant calling showed reproducible results with input DNA derived from FFPE tissue that ranged from 0.4 to 50 ng with the use of different targets and multiplex identifiers. Equimolar pooling of each amplicon in a deep sequencing run was necessary to counterbalance differences in patient tissue quality. Five BRAF and 49 TP53 mutations with functional consequences were detected. The lowest mutation frequency detected in a patient tumor population was 5% in TP53 exon 5. This low-frequency mutation was successfully verified in a second PCR and deep sequencing run. In summary, our workflow allows us to process 315 targets a week and provides the quality, flexibility, and speed needed to be integrated as standard procedure for mutational analysis in diagnostics.

  15. DNA Methyltransferase Accessibility Protocol for Individual Templates by Deep Sequencing

    PubMed Central

    Darst, Russell P.; Nabilsi, Nancy H.; Pardo, Carolina E.; Riva, Alberto; Kladde, Michael P.

    2013-01-01

    A single-molecule probe of chromatin structure can uncover dynamic chromatin states and rare epigenetic variants of biological importance that bulk measures of chromatin structure miss. In bisulfite genomic sequencing, each sequenced clone records the methylation status of multiple sites on an individual molecule of DNA. An exogenous DNA methyltransferase can thus be used to image nucleosomes and other protein–DNA complexes. In this chapter, we describe the adaptation of this technique, termed Methylation Accessibility Protocol for individual templates, to modern high-throughput sequencing, which both simplifies the workflow and extends its utility. PMID:22929770

  16. Sniper: improved SNP discovery by multiply mapping deep sequenced reads.

    PubMed

    Simola, Daniel F; Kim, Junhyong

    2011-06-20

    SNP (single nucleotide polymorphism) discovery using next-generation sequencing data remains difficult primarily because of redundant genomic regions, such as interspersed repetitive elements and paralogous genes, present in all eukaryotic genomes. To address this problem, we developed Sniper, a novel multi-locus Bayesian probabilistic model and a computationally efficient algorithm that explicitly incorporates sequence reads that map to multiple genomic loci. Our model fully accounts for sequencing error, template bias, and multi-locus SNP combinations, maintaining high sensitivity and specificity under a broad range of conditions. An implementation of Sniper is freely available at http://kim.bio.upenn.edu/software/sniper.shtml.

  17. HIV-1 quasispecies delineation by tag linkage deep sequencing.

    PubMed

    Wu, Nicholas C; De La Cruz, Justin; Al-Mawsawi, Laith Q; Olson, C Anders; Qi, Hangfei; Luan, Harding H; Nguyen, Nguyen; Du, Yushen; Le, Shuai; Wu, Ting-Ting; Li, Xinmin; Lewis, Martha J; Yang, Otto O; Sun, Ren

    2014-01-01

    Trade-offs between throughput, read length, and error rates in high-throughput sequencing limit certain applications such as monitoring viral quasispecies. Here, we describe a molecular-based tag linkage method that allows assemblage of short sequence reads into long DNA fragments. It enables haplotype phasing with high accuracy and sensitivity to interrogate individual viral sequences in a quasispecies. This approach is demonstrated to deduce ∼ 2000 unique 1.3 kb viral sequences from HIV-1 quasispecies in vivo and after passaging ex vivo with a detection limit of ∼ 0.005% to ∼ 0.001%. Reproducibility of the method is validated quantitatively and qualitatively by a technical replicate. This approach can improve monitoring of the genetic architecture and evolution dynamics in any quasispecies population.

  18. Deep sequencing of small RNAs in plants: applied bioinformatics.

    PubMed

    Studholme, David J

    2012-01-01

    Small RNAs, including microRNA and short-interfering RNAs, play important roles in plants. In recent years, developments in sequencing technology have enabled the large-scale discovery of sRNAs in various cells, tissues and developmental stages and in response to various stresses. This review describes the bioinformatics challenges to analysing these large datasets of short-RNA sequences and some of the solutions to those challenges.

  19. Deep sequencing of phage display libraries to support antibody discovery.

    PubMed

    Ravn, Ulla; Didelot, Gérard; Venet, Sophie; Ng, Kwok-Ting; Gueneau, Franck; Rousseau, François; Calloud, Sébastien; Kosco-Vilbois, Marie; Fischer, Nicolas

    2013-03-15

    The use of next generation sequencing (NGS) for the analysis of antibody sequences both in phage display libraries and during in vitro selection processes has become increasingly popular in the last few years. Here, our methods developed for DNA preparation, sequencing and data analysis are presented. A key parameter has also been to develop new software designed for high throughput antibody sequence analysis that is used in combination with publicly available tools. As an example of our methods, we provide data from the extensive analysis of five scFv libraries generated using different heavy chain CDR3 diversification strategies. The results not only confirm that the library designs were correct but also reveal differences in quality not easily identified by standard DNA sequencing approaches. The very large number of reads permits extensive sequence coverage after the selection process. Furthermore, as samples can be multiplexed, costs decrease and more information is gained per NGS run. Using examples of results obtained post phage display selections against two antigens, frequency and clustering analysis identified novel antibody fragments that were then shown to be specific for the target antigen. In summary, the methods described here demonstrate how NGS analysis enhances quality control of complex antibody libraries as well as facilitates the antibody discovery process.

  20. Molecular Diagnosis of Actinomadura madurae Infection by 16S rRNA Deep Sequencing

    PubMed Central

    SenGupta, Dhruba J.; Hoogestraat, Daniel R.; Cummings, Lisa A.; Bryant, Bronwyn H.; Natividad, Catherine; Thielges, Stephanie; Monsaas, Peter W.; Chau, Mimosa; Barbee, Lindley A.; Rosenthal, Christopher; Cookson, Brad T.; Hoffman, Noah G.

    2013-01-01

    Next-generation DNA sequencing can be used to catalog individual organisms within complex, polymicrobial specimens. Here, we utilized deep sequencing of 16S rRNA to implicate Actinomadura madurae as the cause of mycetoma in a diabetic patient when culture and conventional molecular methods were overwhelmed by overgrowth of other organisms. PMID:24108607

  1. Molecular diagnosis of Actinomadura madurae infection by 16S rRNA deep sequencing.

    PubMed

    Salipante, Stephen J; Sengupta, Dhruba J; Hoogestraat, Daniel R; Cummings, Lisa A; Bryant, Bronwyn H; Natividad, Catherine; Thielges, Stephanie; Monsaas, Peter W; Chau, Mimosa; Barbee, Lindley A; Rosenthal, Christopher; Cookson, Brad T; Hoffman, Noah G

    2013-12-01

    Next-generation DNA sequencing can be used to catalog individual organisms within complex, polymicrobial specimens. Here, we utilized deep sequencing of 16S rRNA to implicate Actinomadura madurae as the cause of mycetoma in a diabetic patient when culture and conventional molecular methods were overwhelmed by overgrowth of other organisms.

  2. Deep sequencing approach for genetic stability evaluation of influenza A viruses.

    PubMed

    Bidzhieva, Bella; Zagorodnyaya, Tatiana; Karagiannis, Konstantinos; Simonyan, Vahan; Laassri, Majid; Chumakov, Konstantin

    2014-04-01

    Assessment of genetic stability of viruses could be used to monitor the manufacturing process of both live and inactivated viral vaccines. Until recently such studies were limited by the difficulty of detecting and quantifying mutations in heterogeneous viral populations. High-throughput sequencing technologies (deep sequencing) can generate massive amounts of genetic information and could be used to reveal and quantify mutations. Comparison of different approaches for deep sequencing of the complete influenza A genome was performed to determine the best way to detect and quantify mutants in attenuated influenza reassortant strain A/Brisbane/59/2007 (H1N1) and its passages in different cell substrates. Full-length amplicons of influenza A virus segments as well as multiple overlapping amplicons covering the entire viral genome were subjected to several ways of DNA library preparation followed by deep sequencing using Solexa (Illumina) and pyrosequencing (454 Life Sciences) technologies. Sequencing coverage (the number of times each nucleotide was determined) of mutational profiles generated after 454-pyrosequencing of individually synthesized overlapping amplicons was relatively low and insufficiently uniform. Amplification of the entire genome of influenza virus followed by its enzymatic fragmentation, library construction, and Illumina sequencing resulted in high and uniform sequencing coverage enabling sensitive quantitation of mutations. A new bioinformatic procedure was developed to improve the post-alignment quality control for deep-sequencing data analysis.
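
    Sequencing coverage as defined in this abstract (the number of times each nucleotide was determined) and its uniformity can be computed with a short sketch like the one below. The interval representation of alignments (0-based, half-open) and the use of the coefficient of variation as a uniformity measure are assumptions for illustration, not the authors' procedure.

```python
def per_base_coverage(genome_length, alignments):
    """Return a list with the number of reads covering each position.

    alignments: iterable of (start, end) pairs, 0-based and half-open
    (an assumed representation; real aligners report richer records).
    """
    diff = [0] * (genome_length + 1)
    for start, end in alignments:
        start, end = max(0, start), min(genome_length, end)
        if start < end:
            diff[start] += 1  # coverage rises where the read begins
            diff[end] -= 1    # and falls just past where it ends
    coverage, running = [], 0
    for delta in diff[:genome_length]:
        running += delta
        coverage.append(running)
    return coverage


def coverage_summary(coverage):
    """Mean, minimum, and coefficient of variation (lower means more uniform)."""
    mean = sum(coverage) / len(coverage)
    variance = sum((c - mean) ** 2 for c in coverage) / len(coverage)
    cv = (variance ** 0.5) / mean if mean else float("inf")
    return {"mean": mean, "min": min(coverage), "cv": cv}
```

    The difference-array approach touches each read once and each position once, so it stays practical at the very high depths typical of viral deep sequencing.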

  3. Protein sequences bound to mineral surfaces persist into deep time

    PubMed Central

    Demarchi, Beatrice; Hall, Shaun; Roncal-Herrero, Teresa; Freeman, Colin L; Woolley, Jos; Crisp, Molly K; Wilson, Julie; Fotakis, Anna; Fischer, Roman; Kessler, Benedikt M; Rakownikow Jersie-Christensen, Rosa; Olsen, Jesper V; Haile, James; Thomas, Jessica; Marean, Curtis W; Parkington, John; Presslee, Samantha; Lee-Thorp, Julia; Ditchfield, Peter; Hamilton, Jacqueline F; Ward, Martyn W; Wang, Chunting Michelle; Shaw, Marvin D; Harrison, Terry; Domínguez-Rodrigo, Manuel; MacPhee, Ross DE; Kwekason, Amandus; Ecker, Michaela; Kolska Horwitz, Liora; Chazan, Michael; Kröger, Roland; Thomas-Oates, Jane; Harding, John H; Cappellini, Enrico; Penkman, Kirsty; Collins, Matthew J

    2016-01-01

    Proteins persist longer in the fossil record than DNA, but the longevity, survival mechanisms and substrates remain contested. Here, we demonstrate the role of mineral binding in preserving the protein sequence in ostrich (Struthionidae) eggshell, including from the palaeontological sites of Laetoli (3.8 Ma) and Olduvai Gorge (1.3 Ma) in Tanzania. By tracking protein diagenesis back in time we find consistent patterns of preservation, demonstrating authenticity of the surviving sequences. Molecular dynamics simulations of struthiocalcin-1 and -2, the dominant proteins within the eggshell, reveal that distinct domains bind to the mineral surface. It is the domain with the strongest calculated binding energy to the calcite surface that is selectively preserved. Thermal age calculations demonstrate that the Laetoli and Olduvai peptides are 50 times older than any previously authenticated sequence (equivalent to ~16 Ma at a constant 10°C). DOI: http://dx.doi.org/10.7554/eLife.17092.001 PMID:27668515

  4. Short-read assembly of full-length 16S amplicons reveals bacterial diversity in subsurface sediments.

    PubMed

    Miller, Christopher S; Handley, Kim M; Wrighton, Kelly C; Frischkorn, Kyle R; Thomas, Brian C; Banfield, Jillian F

    2013-01-01

    In microbial ecology, a fundamental question relates to how community diversity and composition change in response to perturbation. Most studies have had limited ability to deeply sample community structure (e.g. Sanger-sequenced 16S rRNA libraries), or have had limited taxonomic resolution (e.g. studies based on 16S rRNA hypervariable region sequencing). Here, we combine the higher taxonomic resolution of near-full-length 16S rRNA gene amplicons with the economics and sensitivity of short-read sequencing to assay the abundance and identity of organisms that represent as little as 0.01% of sediment bacterial communities. We used a new version of EMIRGE optimized for large data size to reconstruct near-full-length 16S rRNA genes from amplicons sheared and sequenced with Illumina technology. The approach allowed us to differentiate the community composition among samples acquired before perturbation, after acetate amendment shifted the predominant metabolism to iron reduction, and once sulfate reduction began. Results were highly reproducible across technical replicates, and identified specific taxa that responded to the perturbation. All samples contain very high alpha diversity and abundant organisms from phyla without cultivated representatives. Surprisingly, at the time points measured, there was no strong loss of evenness, despite the selective pressure of acetate amendment and change in the terminal electron accepting process. However, community membership was altered significantly. The method allows for sensitive, accurate profiling of the "long tail" of low abundance organisms that exist in many microbial communities, and can resolve population dynamics in response to environmental change.

  5. Deep sequencing as a probe of normal stem cell fate and preneoplasia in human epidermis

    PubMed Central

    Simons, Benjamin D.

    2016-01-01

    Using deep sequencing technology, methods based on the sporadic acquisition of somatic DNA mutations in human tissues have been used to trace the clonal evolution of progenitor cells in diseased states. However, the potential of these approaches to explore cell fate behavior of normal tissues and the initiation of preneoplasia remain underexploited. Focusing on the results of a recent deep sequencing study of eyelid epidermis, we show that the quantitative analysis of mutant clone size provides a general method to resolve the pattern of normal stem cell fate and to detect and characterize the mutational signature of rare field transformations in human tissues, with implications for the early detection of preneoplasia. PMID:26699486

  6. Determining mutant spectra of three RNA viral samples using ultra-deep sequencing

    SciTech Connect

    Chen, H

    2012-06-06

    RNA viruses have extremely high mutation rates that enable the virus to adapt to new host environments and even jump from one species to another. As part of a viral transmission study, three viral samples collected from naturally infected animals were sequenced using Illumina paired-end technology at ultra-deep coverage. In order to determine the mutant spectra within the viral quasispecies, it is critical to understand the sequencing error rates and control for false positive calls of viral variants (point mutations). I will estimate the sequencing error rate from two control sequences and characterize the mutant spectra in the natural samples with this error rate.
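
    One common way to use a control-derived error rate to limit false positive variant calls is a binomial test at each position. The sketch below illustrates that idea only; the abstract does not state which statistical model the author used, so the pooled error-rate estimate, the binomial model, the significance threshold, and all function names are assumptions.

```python
from math import exp, lgamma, log, log1p


def estimate_error_rate(mismatches, total_bases):
    """Pooled per-base error rate from control reads of known sequence."""
    return mismatches / total_bases


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), with 0 < p < 1, summed in log space."""
    if k <= 0:
        return 1.0
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                   + i * log(p) + (n - i) * log1p(-p))
        total += exp(log_pmf)
    return min(total, 1.0)


def is_variant(alt_count, depth, error_rate, alpha=1e-6):
    """Call a variant if the alternate-base count is unlikely under errors alone.

    alpha is an assumed threshold; a real analysis would also correct for
    the number of positions tested.
    """
    return binomial_tail(alt_count, depth, error_rate) < alpha
```

    For example, with an error rate of 0.0005 and a depth of 10,000, about five erroneous bases are expected at a position by chance, so a handful of alternate bases is not evidence of a variant while a hundred is.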

  7. Expression Profile of Ectopic Olfactory Receptors Determined by Deep Sequencing

    PubMed Central

    Flegel, Caroline; Manteniotis, Stavros; Osthold, Sandra; Hatt, Hanns; Gisselmann, Günter

    2013-01-01

    Olfactory receptors (ORs) provide the molecular basis for the detection of volatile odorant molecules by olfactory sensory neurons. The OR supergene family encodes G-protein coupled proteins that belong to the seven-transmembrane-domain receptor family. It was initially postulated that ORs are exclusively expressed in the olfactory epithelium. However, recent studies have demonstrated ectopic expression of some ORs in a variety of other tissues. In the present study, we conducted a comprehensive expression analysis of ORs using an extended panel of human tissues. This analysis made use of recent dramatic technical developments of the so-called Next Generation Sequencing (NGS) technique, which encouraged us to use open access data for the first comprehensive RNA-Seq expression analysis of ectopically expressed ORs in multiple human tissues. We analyzed mRNA-Seq data obtained by Illumina sequencing of 16 human tissues available from Illumina Body Map project 2.0 and from an additional study of OR expression in testis. At least some ORs were expressed in all the tissues analyzed. In several tissues, we could detect broadly expressed ORs such as OR2W3 and OR51E1. We also identified ORs that showed exclusive expression in one investigated tissue, such as OR4N4 in testis. For some ORs, the coding exon was found to be part of a transcript of upstream genes. In total, 111 of 400 OR genes were expressed with an FPKM (fragments per kilobase of exon per million fragments mapped) higher than 0.1 in at least one tissue. For several ORs, mRNA expression was verified by RT-PCR. Our results support the idea that ORs are broadly expressed in a variety of tissues and provide the basis for further functional studies. PMID:23405139
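
    The FPKM measure defined in this abstract (fragments per kilobase of exon per million fragments mapped) and the 0.1 expression cutoff can be written out as a short sketch. The function names, gene labels, and input numbers below are hypothetical; only the FPKM definition and the threshold come from the abstract.

```python
def fpkm(fragment_count, exon_length_bp, total_mapped_fragments):
    """Fragments per kilobase of exon per million fragments mapped."""
    # Dividing by (length / 1e3) and by (total / 1e6) gives the factor 1e9.
    return fragment_count * 1e9 / (exon_length_bp * total_mapped_fragments)


def expressed_genes(counts, lengths, total_mapped_fragments, threshold=0.1):
    """Return gene names whose FPKM exceeds the threshold (0.1 in the abstract)."""
    return sorted(
        gene for gene, count in counts.items()
        if fpkm(count, lengths[gene], total_mapped_fragments) > threshold
    )


# Hypothetical counts and exon lengths, for illustration only.
example_counts = {"geneA": 100, "geneB": 1}
example_lengths = {"geneA": 1000, "geneB": 1000}
example_expressed = expressed_genes(example_counts, example_lengths, 20_000_000)
```

    Normalizing by both transcript length and library size is what makes values comparable across genes and across the different tissue libraries.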

  8. Complete genome of Hainan papaya ringspot virus using small RNA deep sequencing.

    PubMed

    Zhang, Yuliang; Yu, Naitong; Huang, Qixing; Yin, Guohua; Guo, Anping; Wang, Xiangfeng; Xiong, Zhongguo; Liu, Zhixin

    2014-06-01

    Small RNA deep sequencing allows for virus identification, virus genome assembly, and strain differentiation. In this study, papaya plants with virus-like symptoms collected in Hainan province were used for deep sequencing and small RNA library construction. After in silico subtraction of the papaya sRNAs, small RNA reads were used in the viral genome assembly using a reference-guided, iterative assembly approach. A nearly complete genome was assembled for a Hainan isolate of papaya ringspot virus (PRSV-HN-2). The complete PRSV-HN-2 genome (accession no.: KF734962) was obtained after a 15-nucleotide gap was filled by direct sequencing of the amplified genomic region. Direct sequencing of several random genomic regions of the PRSV isolate did not find any sequence discrepancy with the sRNA-assembled genome. The newly sequenced PRSV-HN-2 genome shared a nucleotide identity of 96% and 94% with the PRSV-HN (EF183499) and PRSV-HN-1 (HQ424465) isolates, respectively, and together with these two isolates formed a new PRSV clade. These data demonstrate that the small RNA deep sequencing technology provides a viable and rapid means to assemble complete viral genomes in plants.

  9. Using Small RNA Deep Sequencing Data to Detect Human Viruses

    PubMed Central

    Wang, Fang; Sun, Yu; Ruan, Jishou; Chen, Rui; Chen, Xin; Chen, Chengjie; Kreuze, Jan F.; Fei, ZhangJun; Zhu, Xiao

    2016-01-01

    Small RNA sequencing (sRNA-seq) can be used to detect viruses in infected hosts without the necessity to have any prior knowledge or specialized sample preparation. The sRNA-seq method was initially used for viral detection and identification in plants and then in invertebrates and fungi. However, it is still controversial to use sRNA-seq in the detection of mammalian or human viruses. In this study, we used 931 sRNA-seq runs of data from the NCBI SRA database to detect and identify viruses in human cells or tissues, particularly from some clinical samples. Six viruses including HPV-18, HBV, HCV, HIV-1, SMRV, and EBV were detected from 36 runs of data. Four viruses were consistent with the annotations from the previous studies. HIV-1 was found in clinical samples without the HIV-positive reports, and SMRV was found in Diffuse Large B-Cell Lymphoma cells for the first time. In conclusion, these results suggest the sRNA-seq can be used to detect viruses in mammals and humans. PMID:27066498

  10. Using Small RNA Deep Sequencing Data to Detect Human Viruses.

    PubMed

    Wang, Fang; Sun, Yu; Ruan, Jishou; Chen, Rui; Chen, Xin; Chen, Chengjie; Kreuze, Jan F; Fei, ZhangJun; Zhu, Xiao; Gao, Shan

    2016-01-01

    Small RNA sequencing (sRNA-seq) can be used to detect viruses in infected hosts without the necessity to have any prior knowledge or specialized sample preparation. The sRNA-seq method was initially used for viral detection and identification in plants and then in invertebrates and fungi. However, it is still controversial to use sRNA-seq in the detection of mammalian or human viruses. In this study, we used 931 sRNA-seq runs of data from the NCBI SRA database to detect and identify viruses in human cells or tissues, particularly from some clinical samples. Six viruses including HPV-18, HBV, HCV, HIV-1, SMRV, and EBV were detected from 36 runs of data. Four viruses were consistent with the annotations from the previous studies. HIV-1 was found in clinical samples without the HIV-positive reports, and SMRV was found in Diffuse Large B-Cell Lymphoma cells for the first time. In conclusion, these results suggest the sRNA-seq can be used to detect viruses in mammals and humans.

  11. Deep sequencing reveals 50 novel genes for recessive cognitive disorders.

    PubMed

    Najmabadi, Hossein; Hu, Hao; Garshasbi, Masoud; Zemojtel, Tomasz; Abedini, Seyedeh Sedigheh; Chen, Wei; Hosseini, Masoumeh; Behjati, Farkhondeh; Haas, Stefan; Jamali, Payman; Zecha, Agnes; Mohseni, Marzieh; Püttmann, Lucia; Vahid, Leyla Nouri; Jensen, Corinna; Moheb, Lia Abbasi; Bienek, Melanie; Larti, Farzaneh; Mueller, Ines; Weissmann, Robert; Darvish, Hossein; Wrogemann, Klaus; Hadavi, Valeh; Lipkowitz, Bettina; Esmaeeli-Nieh, Sahar; Wieczorek, Dagmar; Kariminejad, Roxana; Firouzabadi, Saghar Ghasemi; Cohen, Monika; Fattahi, Zohreh; Rost, Imma; Mojahedi, Faezeh; Hertzberg, Christoph; Dehghan, Atefeh; Rajab, Anna; Banavandi, Mohammad Javad Soltani; Hoffer, Julia; Falah, Masoumeh; Musante, Luciana; Kalscheuer, Vera; Ullmann, Reinhard; Kuss, Andreas Walter; Tzschach, Andreas; Kahrizi, Kimia; Ropers, H Hilger

    2011-09-21

    Common diseases are often complex because they are genetically heterogeneous, with many different genetic defects giving rise to clinically indistinguishable phenotypes. This has been amply documented for early-onset cognitive impairment, or intellectual disability, one of the most complex disorders known and a very important health care problem worldwide. More than 90 different gene defects have been identified for X-chromosome-linked intellectual disability alone, but research into the more frequent autosomal forms of intellectual disability is still in its infancy. To expedite the molecular elucidation of autosomal-recessive intellectual disability, we have now performed homozygosity mapping, exon enrichment and next-generation sequencing in 136 consanguineous families with autosomal-recessive intellectual disability from Iran and elsewhere. This study, the largest published so far, has revealed additional mutations in 23 genes previously implicated in intellectual disability or related neurological disorders, as well as single, probably disease-causing variants in 50 novel candidate genes. Proteins encoded by several of these genes interact directly with products of known intellectual disability genes, and many are involved in fundamental cellular processes such as transcription and translation, cell-cycle control, energy metabolism and fatty-acid synthesis, which seem to be pivotal for normal brain development and function.

  12. Deep Sequencing of the Murine Olfactory Receptor Neuron Transcriptome

    PubMed Central

    Kanageswaran, Ninthujah; Demond, Marilen; Nagel, Maximilian; Schreiner, Benjamin S. P.; Baumgart, Sabrina; Scholz, Paul; Altmüller, Janine; Becker, Christian; Doerner, Julia F.; Conrad, Heike; Oberland, Sonja; Wetzel, Christian H.; Neuhaus, Eva M.; Hatt, Hanns; Gisselmann, Günter

    2015-01-01

    The ability of animals to sense and differentiate among thousands of odorants relies on a large set of olfactory receptors (OR) and a multitude of accessory proteins within the olfactory epithelium (OE). ORs and related signaling mechanisms have been the subject of intensive studies over the past years, but our knowledge regarding olfactory processing remains limited. The recent development of next generation sequencing (NGS) techniques encouraged us to assess the transcriptome of the murine OE. We analyzed RNA from OEs of female and male adult mice and from fluorescence-activated cell sorting (FACS)-sorted olfactory receptor neurons (ORNs) obtained from transgenic OMP-GFP mice. The Illumina RNA-Seq protocol was utilized to generate up to 86 million reads per transcriptome. In OE samples, nearly all OR and trace amine-associated receptor (TAAR) genes involved in the perception of volatile amines were detectably expressed. Other genes known to participate in olfactory signaling pathways were among the 200 genes with the highest expression levels in the OE. To identify OE-specific genes, we compared olfactory neuron expression profiles with RNA-Seq transcriptome data from different murine tissues. By analyzing different transcript classes, we detected the expression of non-olfactory GPCRs in ORNs and established an expression ranking for GPCRs detected in the OE. We also identified other previously undescribed membrane proteins as potential new players in olfaction. The quantitative and comprehensive transcriptome data provide a virtually complete catalogue of genes expressed in the OE and present a useful tool to uncover candidate genes involved in, for example, olfactory signaling, OR trafficking and recycling, and proliferation. PMID:25590618

  13. Denoising DNA deep sequencing data-high-throughput sequencing errors and their correction.

    PubMed

    Laehnemann, David; Borkhardt, Arndt; McHardy, Alice Carolyn

    2016-01-01

    Characterizing the errors generated by common high-throughput sequencing platforms and telling true genetic variation from technical artefacts are two interdependent steps, essential to many analyses such as single nucleotide variant calling, haplotype inference, sequence assembly and evolutionary studies. Both random and systematic errors can show a specific occurrence profile for each of the six prominent sequencing platforms surveyed here: 454 pyrosequencing, Complete Genomics DNA nanoball sequencing, Illumina sequencing by synthesis, Ion Torrent semiconductor sequencing, Pacific Biosciences single-molecule real-time sequencing and Oxford Nanopore sequencing. There is a large variety of programs available for error removal in sequencing read data, which differ in the error models and statistical techniques they use, the features of the data they analyse, the parameters they determine from them and the data structures and algorithms they use. We highlight the assumptions they make and for which data types these hold, providing guidance which tools to consider for benchmarking with regard to the data properties. While no benchmarking results are included here, such specific benchmarks would greatly inform tool choices and future software development. The development of stand-alone error correctors, as well as single nucleotide variant and haplotype callers, could also benefit from using more of the knowledge about error profiles and from (re)combining ideas from the existing approaches presented here.

  14. VirusDetect: An automated pipeline for efficient virus discovery using deep sequencing of small RNAs

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Accurate detection of viruses in plants and animals is critical for agriculture production and human health. Deep sequencing and assembly of virus-derived siRNAs has proven to be a highly efficient approach for virus discovery. However, to date no computational tools specifically designed for both k...

  15. Draft Genome Sequence of the Deep-Subsurface Actinobacterium Tessaracoccus lapidicaptus IPBSL-7T

    PubMed Central

    Pieper, Dietmar H.; Arce-Rodríguez, Alejandro

    2016-01-01

    The type strain of Tessaracoccus lapidicaptus was isolated from the deep subsurface of the Iberian Pyrite Belt (southwest Spain). Here, we report its draft genome, consisting of 27 contigs with a ~3.1-Mb genome size. The annotation revealed 2,905 coding DNA sequences, 45 tRNA genes, and three rRNA genes. PMID:27688325

  16. Studies of a Biochemical Factory: Tomato Trichome Deep Expressed Sequence Tag Sequencing and Proteomics

    PubMed Central

    Schilmiller, Anthony L.; Miner, Dennis P.; Larson, Matthew; McDowell, Eric; Gang, David R.; Wilkerson, Curtis; Last, Robert L.

    2010-01-01

    Shotgun proteomics analysis allows hundreds of proteins to be identified and quantified from a single sample at relatively low cost. Extensive DNA sequence information is a prerequisite for shotgun proteomics, and it is ideal to have sequence for the organism being studied rather than from related species or accessions. While this requirement has limited the set of organisms that are candidates for this approach, next generation sequencing technologies make it feasible to obtain deep DNA sequence coverage from any organism. As part of our studies of specialized (secondary) metabolism in tomato (Solanum lycopersicum) trichomes, 454 sequencing of cDNA was combined with shotgun proteomics analyses to obtain in-depth profiles of genes and proteins expressed in leaf and stem glandular trichomes of 3-week-old plants. The expressed sequence tag and proteomics data sets combined with metabolite analysis led to the discovery and characterization of a sesquiterpene synthase that produces β-caryophyllene and α-humulene from E,E-farnesyl diphosphate in trichomes of leaf but not of stem. This analysis demonstrates the utility of combining high-throughput cDNA sequencing with proteomics experiments in a target tissue. These data can be used for dissection of other biochemical processes in these specialized epidermal cells. PMID:20431087

  17. Resolving the Complexity of Human Skin Metagenomes Using Single-Molecule Sequencing

    PubMed Central

    Tsai, Yu-Chih; Deming, Clayton; Segre, Julia A.; Kong, Heidi H.; Korlach, Jonas

    2016-01-01

    Deep metagenomic shotgun sequencing has emerged as a powerful tool to interrogate composition and function of complex microbial communities. Computational approaches to assemble genome fragments have been demonstrated to be an effective tool for de novo reconstruction of genomes from these communities. However, the resultant “genomes” are typically fragmented and incomplete due to the limited ability of short-read sequence data to assemble complex or low-coverage regions. Here, we use single-molecule, real-time (SMRT) sequencing to reconstruct a high-quality, closed genome of a previously uncharacterized Corynebacterium simulans and its companion bacteriophage from a skin metagenomic sample. Considerable improvement in assembly quality occurs in hybrid approaches incorporating short-read data, with even relatively small amounts of long-read data being sufficient to improve metagenome reconstruction. Using short-read data to evaluate strain variation of this C. simulans in its skin community at single-nucleotide resolution, we observed a dominant C. simulans strain with moderate allelic heterozygosity throughout the population. We demonstrate the utility of SMRT sequencing and hybrid approaches in metagenome quantitation, reconstruction, and annotation. PMID:26861018

  18. SOAP3: ultra-fast GPU-based parallel alignment tool for short reads.

    PubMed

    Liu, Chi-Man; Wong, Thomas; Wu, Edward; Luo, Ruibang; Yiu, Siu-Ming; Li, Yingrui; Wang, Bingqiang; Yu, Chang; Chu, Xiaowen; Zhao, Kaiyong; Li, Ruiqiang; Lam, Tak-Wah

    2012-03-15

    SOAP3 is the first short read alignment tool that leverages the multi-processors in a graphic processing unit (GPU) to achieve a drastic improvement in speed. We adapted the compressed full-text index (BWT) used by SOAP2 in view of the advantages and disadvantages of GPU. When tested with millions of Illumina Hiseq 2000 length-100 bp reads, SOAP3 takes < 30 s to align a million read pairs onto the human reference genome and is at least 7.5 and 20 times faster than BWA and Bowtie, respectively. For aligning reads with up to four mismatches, SOAP3 aligns slightly more reads than BWA and Bowtie; this is because SOAP3, unlike BWA and Bowtie, is not heuristic-based and always reports all answers.
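    The compressed full-text index (BWT) mentioned in this abstract can be illustrated with a toy example. The sketch below is not SOAP3's implementation: it runs on the CPU, builds the suffix array naively, stores uncompressed occurrence tables and supports exact matching only, and all function names are hypothetical.

```python
from collections import Counter


def build_index(text: str):
    """Build a toy BWT-based index: suffix array, C table and occurrence counts."""
    text += "$"  # sentinel, assumed absent from the text and smaller than A/C/G/T
    sa = sorted(range(len(text)), key=lambda i: text[i:])  # naive construction
    bwt = "".join(text[i - 1] for i in sa)  # character preceding each sorted suffix
    counts = Counter(text)
    c_table, total = {}, 0
    for ch in sorted(counts):  # c_table[ch] = number of characters smaller than ch
        c_table[ch] = total
        total += counts[ch]
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}  # occ[ch][i] = count in bwt[:i]
    for i, b in enumerate(bwt):
        for ch in occ:
            occ[ch][i + 1] = occ[ch][i] + (1 if ch == b else 0)
    return sa, c_table, occ


def find(pattern: str, index) -> list:
    """Backward search: return sorted start positions of exact matches."""
    sa, c_table, occ = index
    lo, hi = 0, len(sa)
    for ch in reversed(pattern):
        if ch not in c_table:
            return []
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[i] for i in range(lo, hi))


toy_index = build_index("ACGTACGT")
```

    For the toy text "ACGTACGT", searching for "ACG" returns positions 0 and 4. Each pattern character costs a constant number of table lookups regardless of text length, which is the property that makes this family of indexes attractive for read alignment; handling mismatches, as the abstract describes, would need additional search logic not shown here.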

  19. Genomic dark matter: the reliability of short read mapping illustrated by the genome mappability score

    PubMed Central

    Lee, Hayan; Schatz, Michael C.

    2012-01-01

    Motivation: Genome resequencing and short read mapping are two of the primary tools of genomics and are used for many important applications. The current state-of-the-art in mapping uses the quality values and mapping quality scores to evaluate the reliability of the mapping. These attributes, however, are assigned to individual reads and do not directly measure the problematic repeats across the genome. Here, we present the Genome Mappability Score (GMS) as a novel measure of the complexity of resequencing a genome. The GMS is a weighted probability that any read could be unambiguously mapped to a given position and thus measures the overall composition of the genome itself. Results: We have developed the Genome Mappability Analyzer to compute the GMS of every position in a genome. It leverages the parallelism of cloud computing to analyze large genomes, and enabled us to identify the 5–14% of the human, mouse, fly and yeast genomes that are difficult to analyze with short reads. We examined the accuracy of the widely used BWA/SAMtools polymorphism discovery pipeline in the context of the GMS, and found discovery errors are dominated by false negatives, especially in regions with poor GMS. These errors are fundamental to the mapping process and cannot be overcome by increasing coverage. As such, the GMS should be considered in every resequencing project to pinpoint the ‘dark matter’ of the genome, including of known clinically relevant variations in these regions. Availability: The source code and profiles of several model organisms are available at http://gma-bio.sourceforge.net Contact: hlee@cshl.edu Supplementary Information: Supplementary data are available at Bioinformatics online. PMID:22668792

  20. Pooled Amplicon Deep Sequencing of Candidate Plasmodium falciparum Transmission-Blocking Vaccine Antigens

    PubMed Central

    Juliano, Jonathan J.; Parobek, Christian M.; Brazeau, Nicholas F.; Ngasala, Billy; Randrianarivelojosia, Milijaona; Lon, Chanthap; Mwandagalirwa, Kashamuka; Tshefu, Antoinette; Dhar, Ravi; Das, Bidyut K.; Hoffman, Irving; Martinson, Francis; Mårtensson, Andreas; Saunders, David L.; Kumar, Nirbhay; Meshnick, Steven R.

    2016-01-01

    Polymorphisms within Plasmodium falciparum vaccine candidate antigens have the potential to compromise vaccine efficacy. Understanding the allele frequencies of polymorphisms in critical binding regions of antigens can help in the designing of strain-transcendent vaccines. Here, we adopt a pooled deep-sequencing approach, originally designed to study P. falciparum drug resistance mutations, to study the diversity of two leading transmission-blocking vaccine candidates, Pfs25 and Pfs48/45. We sequenced 329 P. falciparum field isolates from six different geographic regions. Pfs25 showed little diversity, with only one known polymorphism identified in the region associated with binding of transmission-blocking antibodies among our isolates. However, we identified four new mutations among eight non-synonymous mutations within the presumed antibody-binding region of Pfs48/45. Pooled deep sequencing provides a scalable and cost-effective approach for the targeted study of allele frequencies of P. falciparum candidate vaccine antigens. PMID:26503281

  1. Genome-Wide Probing of RNA Structures In Vitro Using Nucleases and Deep Sequencing.

    PubMed

    Wan, Yue; Qu, Kun; Ouyang, Zhengqing; Chang, Howard Y

    2016-01-01

    RNA structure probing is an important technique that studies the secondary and tertiary conformations of an RNA. While it was traditionally performed on one RNA at a time, recent advances in deep sequencing have enabled the secondary structure mapping of thousands of RNAs simultaneously. Here, we describe the method Parallel Analysis for RNA Structures (PARS), which couples double and single strand specific nuclease probing to high throughput sequencing. Upon cloning of the cleavage sites into a cDNA library, deep sequencing and mapping of reads to the transcriptome, the position of paired and unpaired bases along cellular RNAs can be identified. PARS can be performed under diverse solution conditions and on different organismal RNAs to provide genome-wide RNA structural information. This information can also be further used to constrain computational predictions to provide better RNA structure models under different conditions.

  2. Deep-Sequencing Technologies and Potential Applications in Forensic DNA Testing.

    PubMed

    Zascavage, R R; Shewale, S J; Planz, J V

    2013-03-01

    Development of second- and third-generation DNA sequencing technologies has enabled an increasing number of applications in different areas such as molecular diagnostics, gene therapy, monitoring food and pharmaceutical products, biosecurity, and forensics. These technologies are based on different biochemical principles such as monitoring released pyrophosphate upon incorporation of a base (pyrosequencing), fluorescence detection subsequent to reversible incorporation of a fluorescently labeled terminator base, a ligation-based approach wherein fluorescence of cleaved nucleotide after ligation is measured, measuring the proton released after incorporation of a base (semiconductor-based sequencing), monitoring incorporation of a nucleotide by measuring the fluorescence of the fluorophore attached to the phosphate chain of the nucleotide, and detecting the altered charge in a protein nanopore due to released nucleotide by exonuclease cleavage of a DNA strand. Analysis of multiple DNA fragments in parallel increases the depth of coverage while decreasing labor, cost, and time, highlighting some major advantages of deep-sequencing technologies. DNA sequencing has been routinely used in forensic laboratories for mitochondrial DNA analysis. Fragment analysis, however, is the preferred method for Short Tandem Repeat genotyping due to the cumbersome and costly nature of first-generation DNA sequencing methodologies. Deep-sequencing technologies have brought a new perspective to forensic DNA analysis. Studies include STR analysis to reveal hidden variation in the repeat regions, mtDNA sequencing, Single Nucleotide Polymorphism analysis, mixture resolution, and body fluid identification. Recent publications reveal that attempts are being made to expand the capability.

  3. Enhanced arbovirus surveillance with deep sequencing: Identification of novel rhabdoviruses and bunyaviruses in Australian mosquitoes.

    PubMed

    Coffey, Lark L; Page, Brady L; Greninger, Alexander L; Herring, Belinda L; Russell, Richard C; Doggett, Stephen L; Haniotis, John; Wang, Chunlin; Deng, Xutao; Delwart, Eric L

    2014-01-05

    Viral metagenomics characterizes known and identifies unknown viruses based on sequence similarities to any previously sequenced viral genomes. A metagenomics approach was used to identify virus sequences in Australian mosquitoes causing cytopathic effects in inoculated mammalian cell cultures. Sequence comparisons revealed strains of Liao Ning virus (Reovirus, Seadornavirus), previously detected only in China, livestock-infecting Stretch Lagoon virus (Reovirus, Orbivirus), two novel dimarhabdoviruses, named Beaumont and North Creek viruses, and two novel orthobunyaviruses, named Murrumbidgee and Salt Ash viruses. The novel virus proteomes diverged by ≥ 50% relative to their closest previously genetically characterized viral relatives. Deep sequencing also generated genomes of Warrego and Wallal viruses, orbiviruses linked to kangaroo blindness, whose genomes had not been fully characterized. This study highlights viral metagenomics in concert with traditional arbovirus surveillance to characterize known and new arboviruses in field-collected mosquitoes. Follow-up epidemiological studies are required to determine whether the novel viruses infect humans.

  4. Enhanced arbovirus surveillance with deep sequencing: identification of novel rhabdoviruses and bunyaviruses in Australian mosquitoes

    PubMed Central

    Coffey, Lark L.; Page, Brady L.; Greninger, Alexander L.; Herring, Belinda L.; Russell, Richard C.; Doggett, Stephen L.; Haniotis, John; Wang, Chunlin; Deng, Xutao; Delwart, Eric L.

    2013-01-01

    Viral metagenomics characterizes known and identifies unknown viruses based on sequence similarities to any previously sequenced viral genomes. A metagenomics approach was used to identify virus sequences in Australian mosquitoes causing cytopathic effects in inoculated mammalian cell cultures. Sequence comparisons revealed strains of Liao Ning virus (Reovirus, Seadornavirus), previously detected only in China, livestock-infecting Stretch Lagoon virus (Reovirus, Orbivirus), two novel dimarhabdoviruses, named Beaumont and North Creek viruses, and two novel orthobunyaviruses, named Murrumbidgee and Salt Ash viruses. The novel virus proteomes diverged by ≥50% relative to their closest previously genetically characterized viral relatives. Deep sequencing also generated genomes of Warrego and Wallal viruses, orbiviruses linked to kangaroo blindness, whose genomes had not been fully characterized. This study highlights viral metagenomics in concert with traditional arbovirus surveillance to characterize known and new arboviruses in field-collected mosquitoes. Follow-up epidemiological studies are required to determine whether the novel viruses infect humans. PMID:24314645

  5. Deep sequencing reveals global patterns of mRNA recruitment during translation initiation

    PubMed Central

    Gao, Rong; Yu, Kai; Nie, Jukui; Lian, Tengfei; Jin, Jianshi; Liljas, Anders; Su, Xiao-Dong

    2016-01-01

    In this work, we developed a method to systematically study the sequence preference of mRNAs during translation initiation. Traditionally, the dynamic process of translation initiation has been studied at the single molecule level with limited sequencing possibility. Using deep sequencing techniques, we identified the sequence preference at different stages of the initiation complexes. Our results provide a comprehensive and dynamic view of the initiation elements in the translation initiation region (TIR), including the S1 binding sequence, the Shine-Dalgarno (SD)/anti-SD interaction and the second codon, at the equilibrium of different initiation complexes. Moreover, our experiments reveal the conformational changes and regional dynamics throughout the dynamic process of mRNA recruitment. PMID:27460773

  6. Exploring fungal diversity in deep-sea sediments from Okinawa Trough using high-throughput Illumina sequencing

    NASA Astrophysics Data System (ADS)

    Zhang, Xiao-Yong; Wang, Guang-Hua; Xu, Xin-Ya; Nong, Xu-Hua; Wang, Jie; Amin, Muhammad; Qi, Shu-Hua

    2016-10-01

    The present study investigated the fungal diversity in four different deep-sea sediments from Okinawa Trough using high-throughput Illumina sequencing of the nuclear ribosomal internal transcribed spacer-1 (ITS1). A total of 40,297 fungal ITS1 sequences clustered into 420 operational taxonomic units (OTUs) with 97% sequence similarity and 170 taxa were recovered from these sediments. Most ITS1 sequences (78%) belonged to the phylum Ascomycota, followed by Basidiomycota (17.3%), Zygomycota (1.5%) and Chytridiomycota (0.8%), and a small proportion (2.4%) belonged to unassigned fungal phyla. Compared with previous studies on fungal diversity of sediments from deep-sea environments by culture-dependent approach and clone library analysis, the present results suggest that Illumina sequencing has dramatically accelerated the discovery of fungal communities in deep-sea sediments. Furthermore, our results revealed that Sordariomycetes was the most diverse and abundant fungal class in this study, challenging the traditional view that the diversity of Sordariomycetes phylotypes is low in deep-sea environments. In addition, more than 12 taxa, accounting for 21.5% of sequences, have rarely been reported as deep-sea fungi, suggesting that the deep-sea sediments from Okinawa Trough harbor a plethora of fungal communities that differ from those of other deep-sea environments. To our knowledge, this study is the first exploration of the fungal diversity in deep-sea sediments from Okinawa Trough using high-throughput Illumina sequencing.

  7. Deep sequencing of Lotus corniculatus L. reveals key enzymes and potential transcription factors related to the flavonoid biosynthesis pathway.

    PubMed

    Wang, Ying; Hua, Wenping; Wang, Jian; Hannoufa, Abdelali; Xu, Ziqin; Wang, Zhezhi

    2013-04-01

    Lotus corniculatus L. is used worldwide as a forage crop due to its abundance of secondary metabolites and its ability to grow in severe environments. Although the entire genome of L. corniculatus var. japonicus R. is being sequenced, the differences in morphology and production of secondary metabolites between these two related species have led us to investigate this variability at the genetic level, in particular the differences in flavonoid biosynthesis. Our goal is to use the resulting information to develop more valuable forage crops and medicinal materials. Here, we conducted Illumina/Solexa sequencing to profile the transcriptome of L. corniculatus. We produced 26,492,952 short reads that corresponded to 2.38 gigabytes of total nucleotides. These reads were then assembled into 45,698 unigenes, of which a large number associated with secondary metabolism were annotated. In addition, we identified 2,998 unigenes based on homology with L. japonicus transcription factors (TFs) and grouped them into 55 families. Meanwhile, a comparison of four tag-based digital gene expression libraries, built from the flowers, pods, leaves, and roots, revealed distinct patterns of spatial expression of candidate unigenes in flavonoid biosynthesis. Based on these results, we identified many key enzymes from L. corniculatus which were different from reference genes of L. japonicus, and five TFs that are potential enhancers in flavonoid biosynthesis. Our results provide initial genetics resources that will be valuable in efforts to manipulate the flavonoid metabolic pathway in plants.

  8. Ultra-deep sequencing of intra-host rabies virus populations during cross-species transmission.

    PubMed

    Borucki, Monica K; Chen-Harris, Haiyin; Lao, Victoria; Vanier, Gilda; Wadford, Debra A; Messenger, Sharon; Allen, Jonathan E

    2013-11-01

    One of the hurdles to understanding the role of viral quasispecies in RNA virus cross-species transmission (CST) events is the need to analyze a densely sampled outbreak using deep sequencing in order to measure the amount of mutation occurring on a small time scale. In 2009, the California Department of Public Health reported a dramatic increase (350) in the number of gray foxes infected with a rabies virus variant for which striped skunks serve as a reservoir host in Humboldt County. To better understand the evolution of rabies, deep-sequencing was applied to 40 unpassaged rabies virus samples from the Humboldt outbreak. For each sample, approximately 11 kb of the 12 kb genome was amplified and sequenced using the Illumina platform. Average coverage was 17,448 and this allowed characterization of the rabies virus population present in each sample at unprecedented depths. Phylogenetic analysis of the consensus sequence data demonstrated that samples clustered according to date (1995 vs. 2009) and geographic location (northern vs. southern). A single amino acid change in the G protein distinguished a subset of northern foxes from a haplotype present in both foxes and skunks, suggesting this mutation may have played a role in the observed increased transmission among foxes in this region. Deep-sequencing data indicated that many genetic changes associated with the CST event occurred prior to 2009 since several nonsynonymous mutations that were present in the consensus sequences of skunk and fox rabies samples obtained from 2003–2010 were present at the sub-consensus level (as rare variants in the viral population) in skunk and fox samples from 1995. These results suggest that analysis of rare variants within a viral population may yield clues to ancestral genomes and identify rare variants that have the potential to be selected for if environmental conditions change.

  9. Deep Impact Sequence Planning Using Multi-Mission Adaptable Planning Tools With Integrated Spacecraft Models

    NASA Technical Reports Server (NTRS)

    Wissler, Steven S.; Maldague, Pierre; Rocca, Jennifer; Seybold, Calina

    2006-01-01

    The Deep Impact mission was ambitious and challenging. JPL's well proven, easily adaptable multi-mission sequence planning tools combined with integrated spacecraft subsystem models enabled a small operations team to develop, validate, and execute extremely complex sequence-based activities within very short development times. This paper focuses on the core planning tool used in the mission, APGEN. It shows how the multi-mission design and adaptability of APGEN made it possible to model spacecraft subsystems as well as ground assets throughout the lifecycle of the Deep Impact project, starting with models of initial, high-level mission objectives, and culminating in detailed predictions of spacecraft behavior during mission-critical activities.

  10. The metagenome of shallow estuary sediment: A reflection of the deep biosphere

    NASA Astrophysics Data System (ADS)

    Biddle, J.; Crowgey, E.; Christman, G.; Russell, J.; Polson, S.

    2012-12-01

    Shallow sediments have proven to be valuable proxies for the deep biosphere as they contain many of the same microbial groups in a much more readily accessible habitat. One area under recent study is the White Oak River estuary in North Carolina, where a sulfate:methane transition zone is present year-round and relatives of deep subsurface Archaea such as ANME and MCG have been found. A previously studied sample was prepared for metagenomic sequencing through DNA extraction and whole genome amplification. An amplicon library was prepared from this using universal primers, showing that the community was roughly 28% MCG archaea, 27% Chloroflexi bacteria, 17% Proteobacteria, 3% ANME archaea and numerous rare taxa. The metagenome was sequenced via Illumina sequencing, yielding reads that were 152 bp. Assembly of these short reads was initially performed via the JGI pipeline and contigs over 800 bp had taxonomic assignments via MEGAN. In this analysis, the majority of reads had no hits. The next major taxon was the unassigned category, followed by a minority of hits to Thaumarchaeota, Dehalococcoides and DeltaProteobacteria. In order to reduce potential errors caused by short reads in assembly, we developed a pipeline utilizing FR-HIT to bin taxonomically relevant reads prior to assembly. Using this approach, new contigs were discovered from rare groups such as the ANME that were not seen in the general assembly. Overall, the data suggest that shallow populations are relatively similar to deep ones on a metagenomic level and that bulk assembly of short reads should be critiqued on an individual-study basis.

  11. miRBase: integrating microRNA annotation and deep-sequencing data.

    PubMed

    Kozomara, Ana; Griffiths-Jones, Sam

    2011-01-01

    miRBase is the primary online repository for all microRNA sequences and annotation. The current release (miRBase 16) contains over 15,000 microRNA gene loci in over 140 species, and over 17,000 distinct mature microRNA sequences. Deep-sequencing technologies have delivered a sharp rise in the rate of novel microRNA discovery. We have mapped reads from short RNA deep-sequencing experiments to microRNAs in miRBase and developed web interfaces to view these mappings. The user can view all read data associated with a given microRNA annotation, filter reads by experiment and count, and search for microRNAs by tissue- and stage-specific expression. These data can be used as a proxy for relative expression levels of microRNA sequences, provide detailed evidence for microRNA annotations and alternative isoforms of mature microRNAs, and allow us to revisit previous annotations. miRBase is available online at: http://www.mirbase.org/.

  12. Prognostic value of deep sequencing method for minimal residual disease detection in multiple myeloma.

    PubMed

    Martinez-Lopez, Joaquin; Lahuerta, Juan J; Pepin, François; González, Marcos; Barrio, Santiago; Ayala, Rosa; Puig, Noemí; Montalban, María A; Paiva, Bruno; Weng, Li; Jiménez, Cristina; Sopena, María; Moorhead, Martin; Cedena, Teresa; Rapado, Immaculada; Mateos, María Victoria; Rosiñol, Laura; Oriol, Albert; Blanchard, María J; Martínez, Rafael; Bladé, Joan; San Miguel, Jesús; Faham, Malek; García-Sanz, Ramón

    2014-05-15

    We assessed the prognostic value of minimal residual disease (MRD) detection in multiple myeloma (MM) patients using a sequencing-based platform in bone marrow samples from 133 MM patients in at least very good partial response (VGPR) after front-line therapy. Deep sequencing was carried out in patients in whom a high-frequency myeloma clone was identified and MRD was assessed using the IGH-VDJH, IGH-DJH, and IGK assays. The results were contrasted with those of multiparametric flow cytometry (MFC) and allele-specific oligonucleotide polymerase chain reaction (ASO-PCR). The applicability of deep sequencing was 91%. Concordance between sequencing and MFC and ASO-PCR was 83% and 85%, respectively. Patients who were MRD(-) by sequencing had a significantly longer time to tumor progression (TTP) (median 80 vs 31 months; P < .0001) and overall survival (median not reached vs 81 months; P = .02), compared with patients who were MRD(+). When stratifying patients by different levels of MRD, the respective TTP medians were: MRD ≥10(-3) 27 months, MRD 10(-3) to 10(-5) 48 months, and MRD <10(-5) 80 months (P = .003 to .0001). Ninety-two percent of VGPR patients were MRD(+). In complete response patients, the TTP remained significantly longer for MRD(-) compared with MRD(+) patients (131 vs 35 months; P = .0009).
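
    The three MRD strata and their reported median TTP values can be written down as a small lookup, sketched below in Python. The function name, the stratum labels, and the handling of the boundary values (10(-3) assigned to the highest stratum, 10(-5) to the middle one) are assumptions; the abstract gives only the ranges and the medians.

```python
def mrd_stratum(mrd_level: float) -> str:
    """Assign an MRD level (fraction of clonal sequences) to one of the
    three strata named in the abstract. Boundary handling is an assumption."""
    if mrd_level < 0:
        raise ValueError("MRD level cannot be negative")
    if mrd_level >= 1e-3:
        return "high"          # MRD >= 10(-3)
    if mrd_level >= 1e-5:
        return "intermediate"  # MRD 10(-3) to 10(-5)
    return "low"               # MRD < 10(-5)


# Median time to tumor progression (months) as reported in the abstract.
REPORTED_MEDIAN_TTP_MONTHS = {"high": 27, "intermediate": 48, "low": 80}

for level in (5e-3, 2e-4, 3e-6):
    stratum = mrd_stratum(level)
    print(level, stratum, REPORTED_MEDIAN_TTP_MONTHS[stratum])
```

    The medians are group-level summaries from the study, so the lookup describes the strata and should not be read as a prediction for an individual patient.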

  13. Prognostic value of deep sequencing method for minimal residual disease detection in multiple myeloma

    PubMed Central

    Martinez-Lopez, Joaquin; Lahuerta, Juan J.; Pepin, François; González, Marcos; Barrio, Santiago; Ayala, Rosa; Puig, Noemí; Montalban, María A.; Paiva, Bruno; Weng, Li; Jiménez, Cristina; Sopena, María; Moorhead, Martin; Cedena, Teresa; Rapado, Immaculada; Mateos, María Victoria; Rosiñol, Laura; Oriol, Albert; Blanchard, María J.; Martínez, Rafael; Bladé, Joan; San Miguel, Jesús; Faham, Malek; García-Sanz, Ramón

    2014-01-01

    We assessed the prognostic value of minimal residual disease (MRD) detection in multiple myeloma (MM) patients using a sequencing-based platform in bone marrow samples from 133 MM patients in at least very good partial response (VGPR) after front-line therapy. Deep sequencing was carried out in patients in whom a high-frequency myeloma clone was identified and MRD was assessed using the IGH-VDJH, IGH-DJH, and IGK assays. The results were contrasted with those of multiparametric flow cytometry (MFC) and allele-specific oligonucleotide polymerase chain reaction (ASO-PCR). The applicability of deep sequencing was 91%. Concordance between sequencing and MFC and ASO-PCR was 83% and 85%, respectively. Patients who were MRD– by sequencing had a significantly longer time to tumor progression (TTP) (median 80 vs 31 months; P < .0001) and overall survival (median not reached vs 81 months; P = .02), compared with patients who were MRD+. When stratifying patients by different levels of MRD, the respective TTP medians were: MRD ≥10(-3) 27 months, MRD 10(-3) to 10(-5) 48 months, and MRD <10(-5) 80 months (P = .003 to .0001). Ninety-two percent of VGPR patients were MRD+. In complete response patients, the TTP remained significantly longer for MRD– compared with MRD+ patients (131 vs 35 months; P = .0009). PMID:24646471

  14. Continuous Distributed Representation of Biological Sequences for Deep Proteomics and Genomics.

    PubMed

    Asgari, Ehsaneddin; Mofrad, Mohammad R K

    2015-01-01

    We introduce a new representation and feature extraction method for biological sequences. Named bio-vectors (BioVec) to refer to biological sequences in general with protein-vectors (ProtVec) for proteins (amino-acid sequences) and gene-vectors (GeneVec) for gene sequences, this representation can be widely used in applications of deep learning in proteomics and genomics. In the present paper, we focus on protein-vectors that can be utilized in a wide array of bioinformatics investigations such as family classification, protein visualization, structure prediction, disordered protein identification, and protein-protein interaction prediction. In this method, we adopt artificial neural network approaches and represent a protein sequence with a single dense n-dimensional vector. To evaluate this method, we apply it in classification of 324,018 protein sequences obtained from Swiss-Prot belonging to 7,027 protein families, where an average family classification accuracy of 93%±0.06% is obtained, outperforming existing family classification methods. In addition, we use ProtVec representation to predict disordered proteins from structured proteins. Two databases of disordered sequences are used: the DisProt database as well as a database featuring the disordered regions of nucleoporins rich with phenylalanine-glycine repeats (FG-Nups). Using support vector machine classifiers, FG-Nup sequences are distinguished from structured protein sequences found in Protein Data Bank (PDB) with a 99.8% accuracy, and unstructured DisProt sequences are differentiated from structured DisProt sequences with 100.0% accuracy. These results indicate that by only providing sequence data for various proteins into this model, accurate information about protein structure can be determined. Importantly, this model needs to be trained only once and can then be applied to extract a comprehensive set of information regarding proteins of interest. 
    Moreover, this representation can be considered as...

  15. Continuous Distributed Representation of Biological Sequences for Deep Proteomics and Genomics

    PubMed Central

    Asgari, Ehsaneddin; Mofrad, Mohammad R. K.

    2015-01-01

    We introduce a new representation and feature extraction method for biological sequences. Named bio-vectors (BioVec) to refer to biological sequences in general with protein-vectors (ProtVec) for proteins (amino-acid sequences) and gene-vectors (GeneVec) for gene sequences, this representation can be widely used in applications of deep learning in proteomics and genomics. In the present paper, we focus on protein-vectors that can be utilized in a wide array of bioinformatics investigations such as family classification, protein visualization, structure prediction, disordered protein identification, and protein-protein interaction prediction. In this method, we adopt artificial neural network approaches and represent a protein sequence with a single dense n-dimensional vector. To evaluate this method, we apply it in classification of 324,018 protein sequences obtained from Swiss-Prot belonging to 7,027 protein families, where an average family classification accuracy of 93%±0.06% is obtained, outperforming existing family classification methods. In addition, we use ProtVec representation to predict disordered proteins from structured proteins. Two databases of disordered sequences are used: the DisProt database as well as a database featuring the disordered regions of nucleoporins rich with phenylalanine-glycine repeats (FG-Nups). Using support vector machine classifiers, FG-Nup sequences are distinguished from structured protein sequences found in Protein Data Bank (PDB) with a 99.8% accuracy, and unstructured DisProt sequences are differentiated from structured DisProt sequences with 100.0% accuracy. These results indicate that by only providing sequence data for various proteins into this model, accurate information about protein structure can be determined. Importantly, this model needs to be trained only once and can then be applied to extract a comprehensive set of information regarding proteins of interest. 
    Moreover, this representation can be considered as...

  16. Analysis of insertion-deletion from deep-sequencing data: software evaluation for optimal detection.

    PubMed

    Neuman, Joseph A; Isakov, Ofer; Shomron, Noam

    2013-01-01

    Insertion and deletion (indel) mutations, the most common type of structural variance in the human genome, affect a multitude of human traits and diseases. New sequencing technologies, such as deep sequencing, allow massive throughput of sequence data and greatly contribute to the field of disease causing mutation detection, in general, and indel detection, specifically. In order to infer indel presence (indel calling), the deep-sequencing data have to undergo comprehensive computational analysis. Selecting which indel calling software to use can often skew the results and inherent tool limitations may affect downstream analysis. In order to better understand these inter-software differences, we evaluated the performance of several indel calling software for short indel (1-10 nt) detection. We compared the software's sensitivity and predictive values in the presence of varying parameters such as read depth (coverage), read length, indel size and frequency. We pinpoint several key features that assist successful experimental design and appropriate tool selection. Our study may also serve as a basis for future evaluation of additional indel calling methods.
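
    The two metrics named in the abstract, sensitivity and predictive value, can be computed by comparing a caller's output with a truth set. The Python sketch below assumes exact matching of indels keyed as (chromosome, position, reference allele, alternate allele); the key format, function name, and example calls are hypothetical and are not taken from the study.

```python
from typing import Set, Tuple

# Illustrative indel key: (chromosome, position, ref allele, alt allele).
Indel = Tuple[str, int, str, str]


def sensitivity_and_ppv(truth: Set[Indel], called: Set[Indel]) -> Tuple[float, float]:
    """Return (sensitivity, positive predictive value) for one call set."""
    true_positives = len(truth & called)
    # Sensitivity: fraction of real indels that the caller found.
    sensitivity = true_positives / len(truth) if truth else 0.0
    # PPV: fraction of the caller's indels that are real.
    ppv = true_positives / len(called) if called else 0.0
    return sensitivity, ppv


truth = {
    ("chr1", 100, "A", "AT"),
    ("chr1", 250, "GCC", "G"),
    ("chr2", 40, "T", "TAA"),
}
called = {
    ("chr1", 100, "A", "AT"),
    ("chr2", 40, "T", "TAA"),
    ("chr2", 90, "C", "CG"),
    ("chr3", 5, "AG", "A"),
}
sens, ppv = sensitivity_and_ppv(truth, called)
print(f"sensitivity={sens:.2f} ppv={ppv:.2f}")
```

    Exact matching is the simplest choice; because the same indel can be represented at several positions within a repeat, a real comparison would normally left-align or otherwise normalize calls first.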

  17. De Novo Assembly of the Complete Genome of an Enhanced Electricity-Producing Variant of Geobacter sulfurreducens Using Only Short Reads

    PubMed Central

    Nagarajan, Harish; Butler, Jessica E.; Klimes, Anna; Qiu, Yu; Zengler, Karsten; Ward, Joy; Young, Nelson D.; Methé, Barbara A.; Palsson, Bernhard Ø.; Lovley, Derek R.; Barrett, Christian L.

    2010-01-01

    State-of-the-art DNA sequencing technologies are transforming the life sciences due to their ability to generate nucleotide sequence information with a speed and quantity that is unapproachable with traditional Sanger sequencing. Genome sequencing is a principal application of this technology, where the ultimate goal is the full and complete sequence of the organism of interest. Due to the nature of the raw data produced by these technologies, a full genomic sequence attained without the aid of Sanger sequencing has yet to be demonstrated. We have successfully developed a four-phase strategy for using only next-generation sequencing technologies (Illumina and 454) to assemble a complete microbial genome de novo. We applied this approach to completely assemble the 3.7 Mb genome of a rare Geobacter variant (KN400) that is capable of unprecedented current production at an electrode. Two key components of our strategy enabled us to achieve this result. First, we integrated the two data types early in the process to maximally leverage their complementary characteristics. And second, we used the output of different short read assembly programs in such a way so as to leverage the complementary nature of their different underlying algorithms or of their different implementations of the same underlying algorithm. The significance of our result is that it demonstrates a general approach for maximizing the efficiency and success of genome assembly projects as new sequencing technologies and new assembly algorithms are introduced. The general approach is a meta strategy, wherein sequencing data are integrated as early as possible and in particular ways and wherein multiple assembly algorithms are judiciously applied such that the deficiencies in one are complemented by another. PMID:20544019

  18. Genome Sequence of the Deep-Sea Denitrifier Pseudomonas sp. Strain MT-1, Isolated from the Mariana Trench.

    PubMed

    Fujinami, Shun; Oikawa, Yuji; Araki, Takuma; Shinmura, Yui; Midorikawa, Ryota; Ishizaka, Hikari; Kato, Chiaki; Horikoshi, Koki; Ito, Masahiro; Tamegai, Hideyuki

    2014-12-18

    Pseudomonas sp. strain MT-1 was the first deep-sea denitrifier isolated and characterized from mud recovered from a depth of 11,000 m in the Mariana Trench. We report here the genome sequence of this bacterium, which contributes to our understanding of denitrification and bioenergetics in the deep sea.

  19. Ultra-deep sequencing of VHSV isolates contributes to understanding the role of viral quasispecies.

    PubMed

    Schönherz, Anna A; Lorenzen, Niels; Guldbrandtsen, Bernt; Buitenhuis, Bart; Einer-Jensen, Katja

    2016-01-08

    The high mutation rate of RNA viruses enables the generation of a genetically diverse viral population, termed a quasispecies, within a single infected host. This high in-host genetic diversity enables an RNA virus to adapt to a diverse array of selective pressures such as host immune response and switching between host species. The negative-sense, single-stranded RNA virus, viral haemorrhagic septicaemia virus (VHSV), was originally considered an epidemic virus of cultured rainbow trout in Europe, but was later proved to be endemic among a range of marine fish species in the Northern hemisphere. To better understand the nature of a virus quasispecies related to the evolutionary potential of VHSV, a deep-sequencing protocol specific to VHSV was established and applied to 4 VHSV isolates, 2 originating from rainbow trout and 2 from Atlantic herring. Each isolate was subjected to Illumina paired end shotgun sequencing after PCR amplification and the 11.1 kb genome was successfully sequenced with an average coverage of 0.5-1.9 × 10(6) sequenced copies. Differences in single nucleotide polymorphism (SNP) frequency were detected both within and between isolates, possibly related to their stage of adaptation to host species and host immune reactions. The N, M, P and Nv genes appeared nearly fixed, while genetic variation in the G and L genes demonstrated presence of diverse genetic populations particularly in two isolates. The results demonstrate that deep sequencing and analysis methodologies can be useful for future in vivo host adaption studies of VHSV.
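
    To give a sense of scale for the coverage figures quoted above, the Python sketch below applies the usual mean-depth relation (reads × read length ÷ genome length) to the 11.1 kb genome. The 150-nt read length is an assumption chosen for illustration; the abstract does not state the read length, and the calculation ignores unmapped reads and uneven amplicon coverage.

```python
import math


def mean_coverage(n_reads: int, read_length: int, genome_length: int) -> float:
    """Mean depth if every read maps and coverage is uniform."""
    return n_reads * read_length / genome_length


def reads_needed(target_coverage: float, read_length: int, genome_length: int) -> int:
    """Smallest number of reads giving at least target_coverage on average."""
    return math.ceil(target_coverage * genome_length / read_length)


GENOME_LENGTH = 11_100   # 11.1 kb VHSV genome, from the abstract
READ_LENGTH = 150        # assumed read length, not stated in the abstract

for target in (0.5e6, 1.9e6):
    n = reads_needed(target, READ_LENGTH, GENOME_LENGTH)
    print(f"{target:.1e}x coverage needs about {n:,} reads")
```

    Under these assumptions, the lower end of the reported range corresponds to tens of millions of mapped reads per isolate.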

  20. 3′ terminal diversity of MRP RNA and other human noncoding RNAs revealed by deep sequencing

    PubMed Central

    2013-01-01

    Background: Post-transcriptional 3′ end processing is a key component of RNA regulation. The abundant and essential RNA subunit of RNase MRP has been proposed to function in three distinct cellular compartments and therefore may utilize this mode of regulation. Here we employ 3′ RACE coupled with high-throughput sequencing to characterize the 3′ terminal sequences of human MRP RNA and other noncoding RNAs that form RNP complexes. Results: The 3′ terminal sequence of MRP RNA from HEK293T cells has a distinctive distribution of genomically encoded termini (including an assortment of U residues) with a portion of these selectively tagged by oligo(A) tails. This profile contrasts with the relatively homogenous 3′ terminus of an in vitro transcribed MRP RNA control and the differing 3′ terminal profiles of U3 snoRNA, RNase P RNA, and telomerase RNA (hTR). Conclusions: 3′ RACE coupled with deep sequencing provides a valuable framework for the functional characterization of 3′ terminal sequences of noncoding RNAs. PMID:24053768

  1. Metatranscriptomic analysis of small RNAs present in soybean deep sequencing libraries

    PubMed Central

    Molina, Lorrayne Gomes; da Fonseca, Guilherme Cordenonsi; de Morais, Guilherme Loss; de Oliveira, Luiz Felipe Valter; de Carvalho, Joseane Biso; Kulcheski, Franceli Rodrigues; Margis, Rogerio

    2012-01-01

    A large number of small RNAs unrelated to the soybean genome were identified after deep sequencing of soybean small RNA libraries. A metatranscriptomic analysis was carried out to identify the origin of these sequences. Comparative analyses of small interference RNAs (siRNAs) present in samples collected in open areas corresponding to soybean field plantations and samples from soybean cultivated in greenhouses under a controlled environment were made. Different pathogenic, symbiotic and free-living organisms were identified from samples of both growth systems. They included viruses, bacteria and different groups of fungi. This approach can be useful not only to identify potentially unknown pathogens and pests, but also to understand the relations that soybean plants establish with microorganisms that may affect, directly or indirectly, plant health and crop production. PMID:22802714

  2. Metatranscriptomic analysis of small RNAs present in soybean deep sequencing libraries.

    PubMed

    Molina, Lorrayne Gomes; da Fonseca, Guilherme Cordenonsi; de Morais, Guilherme Loss; de Oliveira, Luiz Felipe Valter; de Carvalho, Joseane Biso; Kulcheski, Franceli Rodrigues; Margis, Rogerio

    2012-06-01

    A large number of small RNAs unrelated to the soybean genome were identified after deep sequencing of soybean small RNA libraries. A metatranscriptomic analysis was carried out to identify the origin of these sequences. Comparative analyses of small interference RNAs (siRNAs) present in samples collected in open areas corresponding to soybean field plantations and samples from soybean cultivated in greenhouses under a controlled environment were made. Different pathogenic, symbiotic and free-living organisms were identified from samples of both growth systems. They included viruses, bacteria and different groups of fungi. This approach can be useful not only to identify potentially unknown pathogens and pests, but also to understand the relations that soybean plants establish with microorganisms that may affect, directly or indirectly, plant health and crop production.

  3. Deep Sequencing Analysis of Aptazyme Variants Based on a Pistol Ribozyme.

    PubMed

    Kobori, Shungo; Takahashi, Kei; Yokobayashi, Yohei

    2017-04-14

    Chemically regulated self-cleaving ribozymes, or aptazymes, are emerging as a promising class of genetic devices that allow dynamic control of gene expression in synthetic biology. However, further expansion of the limited repertoire of ribozymes and aptamers, and development of new strategies to couple the RNA elements to engineer functional aptazymes are highly desirable for synthetic biology applications. Here, we report aptazymes based on the recently identified self-cleaving pistol ribozyme class using a guanine aptamer as the molecular sensing element. Two aptazyme architectures were studied by constructing and assaying 17 728 mutants by deep sequencing. Although one of the architectures did not yield functional aptazymes, a novel aptazyme design in which the aptamer and the ribozyme were placed in tandem yielded a number of guanine-inhibited ribozymes. Detailed analysis of the extensive sequence-function data suggests a mechanism that involves a competition between two mutually exclusive RNA structures reminiscent of natural bacterial riboswitches.

  4. Optimization of affinity, specificity and function of designed influenza inhibitors using deep sequencing

    SciTech Connect

    Whitehead, Timothy A.; Chevalier, Aaron; Song, Yifan; Dreyfus, Cyrille; Fleishman, Sarel J.; De Mattos, Cecilia; Myers, Chris A.; Kamisetty, Hetunandan; Blair, Patrick; Wilson, Ian A.; Baker, David

    2012-06-19

    We show that comprehensive sequence-function maps obtained by deep sequencing can be used to reprogram interaction specificity and to leapfrog over bottlenecks in affinity maturation by combining many individually small contributions not detectable in conventional approaches. We use this approach to optimize two computationally designed inhibitors against H1N1 influenza hemagglutinin and, in both cases, obtain variants with subnanomolar binding affinity. The most potent of these, a 51-residue protein, is broadly cross-reactive against all influenza group 1 hemagglutinins, including human H2, and neutralizes H1N1 viruses with a potency that rivals that of several human monoclonal antibodies, demonstrating that computational design followed by comprehensive energy landscape mapping can generate proteins with potential therapeutic utility.

  5. Multiplexed Metagenomic Deep Sequencing To Analyze the Composition of High-Priority Pathogen Reagents

    PubMed Central

    Wilson, Michael R.; Stenglein, Mark D.; Olejnik, Judith; Rennick, Linda J.; Nambulli, Sham; Feldmann, Friederike; Duprex, W. Paul

    2016-01-01

    ABSTRACT Laboratories studying high-priority pathogens need comprehensive methods to confirm microbial species and strains while also detecting contamination. Metagenomic deep sequencing (MDS) inventories nucleic acids present in laboratory stocks, providing an unbiased assessment of pathogen identity, the extent of genomic variation, and the presence of contaminants. Double-stranded cDNA MDS libraries were constructed from RNA extracted from in vitro-passaged stocks of six viruses (La Crosse virus, Ebola virus, canine distemper virus, measles virus, human respiratory syncytial virus, and vesicular stomatitis virus). Each library was dual indexed and pooled for sequencing. A custom bioinformatics pipeline determined the organisms present in each sample in a blinded fashion. Single nucleotide variant (SNV) analysis identified viral isolates. We confirmed that (i) each sample contained the expected microbe, (ii) dual indexing of the samples minimized false assignments of individual sequences, (iii) multiple viral and bacterial contaminants were present, and (iv) SNV analysis of the viral genomes allowed precise identification of the viral isolates. MDS can be multiplexed to allow simultaneous and unbiased interrogation of mixed microbial cultures and (i) confirm pathogen identity, (ii) characterize the extent of genomic variation, (iii) confirm the cell line used for virus propagation, and (iv) assess for contaminating microbes. These assessments ensure the true composition of these high-priority reagents and generate a comprehensive database of microbial genomes studied in each facility. MDS can serve as an integral part of a pathogen-tracking program which in turn will enhance sample security and increase experimental rigor and precision. IMPORTANCE Both the integrity and reproducibility of experiments using select agents depend in large part on unbiased validation to ensure the correct identity and purity of the species in question. Metagenomic deep sequencing

  6. Pathogen-specific deep sequence-coupled biopanning: A method for surveying human antibody responses

    PubMed Central

    Frietze, Kathryn M.; Pascale, Juan M.; Moreno, Brechla; Chackerian, Bryce; Peabody, David S.

    2017-01-01

    Identifying the targets of antibody responses during infection is important for designing vaccines, developing diagnostic and prognostic tools, and understanding pathogenesis. We developed a novel deep sequence-coupled biopanning approach capable of identifying the protein epitopes of antibodies present in human polyclonal serum. Here, we report the adaptation of this approach for the identification of pathogen-specific epitopes recognized by antibodies elicited during acute infection. As a proof-of-principle, we applied this approach to assessing antibodies to Dengue virus (DENV). Using a panel of sera from patients with acute secondary DENV infection, we panned a DENV antigen fragment library displayed on the surface of bacteriophage MS2 virus-like particles and characterized the population of affinity-selected peptide epitopes by deep sequence analysis. Although there was considerable variation in the responses of individuals, we found several epitopes within the Envelope glycoprotein and Non-Structural Protein 1 that were commonly enriched. This report establishes a novel approach for characterizing pathogen-specific antibody responses in human sera, and has future utility in identifying novel diagnostic and vaccine targets. PMID:28152075

  7. Deep Sequencing Identification of Novel Glucocorticoid-Responsive miRNAs in Apoptotic Primary Lymphocytes

    PubMed Central

    Mav, Deepak; Scoltock, Alyson B.; Cidlowski, John A.

    2013-01-01

    Apoptosis of lymphocytes governs the response of the immune system to environmental stress and toxic insult. Signaling through the ubiquitously expressed glucocorticoid receptor, stress-induced glucocorticoid hormones induce apoptosis via mechanisms requiring altered gene expression. Several reports have detailed the changes in gene expression mediating glucocorticoid-induced apoptosis of lymphocytes. However, few studies have examined the role of non-coding miRNAs in this essential physiological process. Previously, using hybridization-based gene expression analysis and deep sequencing of small RNAs, we described the prevalent post-transcriptional repression of annotated miRNAs during glucocorticoid-induced apoptosis of lymphocytes. Here, we describe the development of a customized bioinformatics pipeline that facilitates the deep sequencing-mediated discovery of novel glucocorticoid-responsive miRNAs in apoptotic primary lymphocytes. This analysis identifies the potential presence of over 200 novel glucocorticoid-responsive miRNAs. We have validated the expression of two novel glucocorticoid-responsive miRNAs using small RNA-specific qPCR. Furthermore, through the use of Ingenuity Pathways Analysis (IPA) we determined that the putative targets of these novel validated miRNAs are predicted to regulate cell death processes. These findings identify two and predict the presence of additional novel glucocorticoid-responsive miRNAs in the rat transcriptome, suggesting a potential role for both annotated and novel miRNAs in glucocorticoid-induced apoptosis of lymphocytes. PMID:24250753

  8. Pathogen-specific deep sequence-coupled biopanning: A method for surveying human antibody responses.

    PubMed

    Frietze, Kathryn M; Pascale, Juan M; Moreno, Brechla; Chackerian, Bryce; Peabody, David S

    2017-01-01

    Identifying the targets of antibody responses during infection is important for designing vaccines, developing diagnostic and prognostic tools, and understanding pathogenesis. We developed a novel deep sequence-coupled biopanning approach capable of identifying the protein epitopes of antibodies present in human polyclonal serum. Here, we report the adaptation of this approach for the identification of pathogen-specific epitopes recognized by antibodies elicited during acute infection. As a proof-of-principle, we applied this approach to assessing antibodies to Dengue virus (DENV). Using a panel of sera from patients with acute secondary DENV infection, we panned a DENV antigen fragment library displayed on the surface of bacteriophage MS2 virus-like particles and characterized the population of affinity-selected peptide epitopes by deep sequence analysis. Although there was considerable variation in the responses of individuals, we found several epitopes within the Envelope glycoprotein and Non-Structural Protein 1 that were commonly enriched. This report establishes a novel approach for characterizing pathogen-specific antibody responses in human sera, and has future utility in identifying novel diagnostic and vaccine targets.

  9. MinVar: A rapid and versatile tool for HIV-1 drug resistance genotyping by deep sequencing.

    PubMed

    Huber, Michael; Metzner, Karin J; Geissberger, Fabienne D; Shah, Cyril; Leemann, Christine; Klimkait, Thomas; Böni, Jürg; Trkola, Alexandra; Zagordi, Osvaldo

    2017-02-01

    Genotypic monitoring of drug-resistance mutations (DRMs) in HIV-1 infected individuals is strongly recommended to guide selection of the initial antiretroviral therapy (ART) and changes of drug regimens. Traditionally, mutations conferring drug resistance are detected by population sequencing of the reverse transcribed viral RNA encoding the HIV-1 enzymes targeted by ART, followed by manual analysis and interpretation of Sanger sequencing traces. This process is labor intensive, relies on subjective interpretation from the operator, and offers limited sensitivity as only mutations above 20% frequency can be reliably detected. Here we present MinVar, a pipeline for the analysis of deep sequencing data, which allows reliable and automated detection of DRMs down to 5%. We evaluated MinVar with data from amplicon sequencing of defined mixtures of molecular virus clones with known DRMs and plasma samples of viremic HIV-1 infected individuals, and we compared it to VirVarSeq, another virus variant detection tool that works exclusively on Illumina deep sequencing data. MinVar was designed to be compatible with a diverse range of sequencing platforms and allows the detection of DRMs and insertions/deletions from deep sequencing data without the need to perform additional bioinformatics analysis, a prerequisite to a widespread implementation of HIV-1 genotyping using deep sequencing in routine diagnostic settings.
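
    The practical difference between the two detection limits quoted above (20% for Sanger-based population sequencing, 5% for MinVar) can be shown with a frequency cutoff on read counts. The Python sketch below is only an illustration of that comparison and is not MinVar's algorithm or interface; the function names and the example counts are hypothetical.

```python
def variant_frequency(alt_reads: int, depth: int) -> float:
    """Fraction of reads at a position that carry the mutation."""
    return alt_reads / depth if depth > 0 else 0.0


def is_reportable(alt_reads: int, depth: int, min_frequency: float) -> bool:
    """True if the mutation frequency reaches the chosen detection limit."""
    return variant_frequency(alt_reads, depth) >= min_frequency


SANGER_LIMIT = 0.20         # from the abstract: reliable above 20%
DEEP_SEQUENCING_LIMIT = 0.05  # from the abstract: detection down to 5%

# Hypothetical position: 80 of 1,000 reads carry a resistance mutation (8%).
alt_reads, depth = 80, 1000
print("frequency:", variant_frequency(alt_reads, depth))
print("reportable at 20%:", is_reportable(alt_reads, depth, SANGER_LIMIT))
print("reportable at 5%:", is_reportable(alt_reads, depth, DEEP_SEQUENCING_LIMIT))
```

    A mutation at 8% frequency passes the 5% cutoff but not the 20% one, which is the sensitivity gap the abstract describes.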

  10. Population-genomic variation within RNA viruses of the Western honey bee, Apis mellifera, inferred from deep sequencing

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Deep sequencing of viruses isolated from infected hosts is an efficient way to measure population-genetic variation and can reveal patterns of dispersal and natural selection. In this study, we mined existing Illumina sequence reads to investigate single-nucleotide polymorphisms (SNPs) within two RN...

  11. Draft Genome Sequence of Caloranaerobacter sp. TR13, an Anaerobic Thermophilic Bacterium Isolated from a Deep-Sea Hydrothermal Vent

    PubMed Central

    Xie, Yunbiao; Dong, Binbin; Liu, Qing; Chen, Xiaoyao

    2015-01-01

    Here, we report the draft 2,261,881-bp genome sequence of Caloranaerobacter sp. TR13, isolated from a deep-sea hydrothermal vent on the East Pacific Rise. The sequence will be helpful for understanding the genetic and metabolic features, as well as potential biotechnological application in the genus Caloranaerobacter. PMID:26679595

  12. Deep sequencing analysis of the developing mouse brain reveals a novel microRNA

    PubMed Central

    2011-01-01

    Background MicroRNAs (miRNAs) are small non-coding RNAs that can exert multilevel inhibition/repression at a post-transcriptional or protein synthesis level during disease or development. Characterisation of miRNAs in adult mammalian brains by deep sequencing has been reported previously. However, to date, no small RNA profiling of the developing brain has been undertaken using this method. We have performed deep sequencing and small RNA analysis of a developing (E15.5) mouse brain. Results We identified the expression of 294 known miRNAs in the E15.5 developing mouse brain, which were mostly represented by let-7 family and other brain-specific miRNAs such as miR-9 and miR-124. We also discovered 4 putative 22-23 nt miRNAs: mm_br_e15_1181, mm_br_e15_279920, mm_br_e15_96719 and mm_br_e15_294354 each with a 70-76 nt predicted pre-miRNA. We validated the 4 putative miRNAs and further characterised one of them, mm_br_e15_1181, throughout embryogenesis. Mm_br_e15_1181 biogenesis was Dicer1-dependent and was expressed in E3.5 blastocysts and E7 whole embryos. Embryo-wide expression patterns were observed at E9.5 and E11.5 followed by a near complete loss of expression by E13.5, with expression restricted to a specialised layer of cells within the developing and early postnatal brain. Mm_br_e15_1181 was upregulated during neurodifferentiation of P19 teratocarcinoma cells. This novel miRNA has been identified as miR-3099. Conclusions We have generated and analysed the first deep sequencing dataset of small RNA sequences of the developing mouse brain. The analysis revealed a novel miRNA, miR-3099, with potential regulatory effects on early embryogenesis, and involvement in neuronal cell differentiation/function in the brain during late embryonic and early neonatal development. PMID:21466694

  13. Identification of torque teno virus in culture-negative endophthalmitis by representational deep-DNA sequencing

    PubMed Central

    Lee, Aaron Y.; Akileswaran, Lakshmi; Tibbetts, Michael D.; Garg, Sunir J.; Van Gelder, Russell N.

    2014-01-01

    Purpose To test the hypothesis that uncultured organisms may be present in cases of culture-negative endophthalmitis, by use of deep DNA sequencing of vitreous biopsies. Design Single center consecutive prospective observational study. Participants and Controls Aqueous or vitreous biopsies from 21 consecutive patients presenting with presumed infectious endophthalmitis, and seven vitreous samples from patients undergoing surgery for non-infectious retinal disorders. Methods Traditional bacterial and fungal culture, 16S quantitative polymerase chain reaction (qPCR) and a representational deep-sequencing method (Biome Representational in Silico Karyotyping [BRiSK]) were applied in parallel to samples to identify DNA sequences corresponding to potential pathogens. Main Outcome Measures Presence of potential pathogen DNA in ocular samples. Results None of 7 control eyes undergoing routine vitreous surgery yielded positive results for bacteria or virus by culture or 16S PCR. Fourteen of the 21 samples (66.7%) from eyes harboring suspected infectious endophthalmitis were culture-positive, the most common being Staphylococcal and Streptococcal species. There was good agreement among culture, 16S bacterial PCR, and BRiSK methodologies for culture-positive cases (Fleiss’ kappa of 0.621). 16S PCR did not yield a recognizable pathogen sequence in any culture-negative sample, while BRiSK suggested presence of Streptococcus in one culture-negative sample. Surprisingly, using BRiSK, 57.1% of culture-positive and 100% of culture-negative samples demonstrated presence of Torque Teno Virus (TTV) sequences, compared to none in the controls (Fisher exact, p = 0.0005). Presence of TTV viral DNA was confirmed in seven cases by qPCR. No other known viruses or potential pathogens were identified in these samples. Conclusion Culture, 16S qPCR, and BRiSK provide complementary information in presumed infectious endophthalmitis. The majority of culture-negative endophthalmitis samples did

  14. Fungal communities from the calcareous deep-sea sediments in the Southwest India Ridge revealed by Illumina sequencing technology.

    PubMed

    Zhang, Likui; Kang, Manyu; Huang, Yangchao; Yang, Lixiang

    2016-05-01

    The diversity and ecological significance of bacteria and archaea in deep-sea environments have been thoroughly investigated, but eukaryotic microorganisms in these areas, such as fungi, are poorly understood. To elucidate fungal diversity in calcareous deep-sea sediments in the Southwest India Ridge (SWIR), the internal transcribed spacer (ITS) regions of rRNA genes from two sediment metagenomic DNA samples were amplified and sequenced using the Illumina sequencing platform. The results revealed that 58-63 % and 36-42 % of the ITS sequences (97 % similarity) belonged to Basidiomycota and Ascomycota, respectively. These findings suggest that Basidiomycota and Ascomycota are the predominant fungal phyla in the two samples. We also found that Agaricomycetes, Leotiomycetes, and Pezizomycetes were the major fungal classes in the two samples. At the species level, Thelephoraceae sp. and Phialocephala fortinii were major fungal species in the two samples. Despite the low relative abundance, unidentified fungal sequences were also observed in the two samples. Furthermore, we found that there were slight differences in fungal diversity between the two sediment samples, although both were collected from the SWIR. Thus, our results demonstrate that calcareous deep-sea sediments in the SWIR harbor diverse fungi, which augment the fungal groups in deep-sea sediments. This is the first report of fungal communities in calcareous deep-sea sediments in the SWIR revealed by Illumina sequencing.

  15. Extracting features from protein sequences to improve deep extreme learning machine for protein fold recognition.

    PubMed

    Ibrahim, Wisam; Abadeh, Mohammad Saniee

    2017-03-27

    Protein fold recognition is an important problem in bioinformatics for predicting the three-dimensional structure of a protein. One of the most challenging tasks in the protein fold recognition problem is the extraction of efficient features from the amino-acid sequences to obtain better classifiers. In this paper, we have proposed six descriptors to extract features from protein sequences. These descriptors are applied in the first stage of a three-stage framework, PCA-DELM-LDA, to extract feature vectors from the amino-acid sequences. Principal Component Analysis (PCA) has been implemented to reduce the number of extracted features. The extracted feature vectors have been used with the original features to improve the performance of the Deep Extreme Learning Machine (DELM) in the second stage. Four new features have been extracted from the second stage and used in the third stage by Linear Discriminant Analysis (LDA) to classify the instances into 27 folds. The proposed framework is implemented on the independent and combined feature sets in SCOP datasets. The experimental results show that the feature vectors extracted in the first stage could improve the performance of DELM in extracting new useful features in the second stage.

  16. Identification of Dirofilaria immitis miRNA using Illumina deep sequencing

    PubMed Central

    2013-01-01

    The heartworm Dirofilaria immitis is the causative agent of cardiopulmonary dirofilariosis in dogs and cats, which also infects a wide range of wild mammals and humans. The complex life cycle of D. immitis with several developmental stages in its invertebrate mosquito vectors and its vertebrate hosts indicates the importance of miRNA in growth and development, and their ability to regulate infection of mammalian hosts. This study identified the miRNA profiles of D. immitis of zoonotic significance by deep sequencing. A total of 1063 conserved miRNA candidates, including 68 anti-sense miRNA (miRNA*) sequences, were predicted by computational methods and could be grouped into 808 miRNA families. A significant bias towards family members, family abundance and sequence nucleotides was observed. Thirteen novel miRNA candidates were predicted by alignment with the Brugia malayi genome. Eleven out of 13 predicted miRNA candidates were verified by using a PCR-based method. Target genes of the novel miRNA candidates were predicted by using the heartworm transcriptome dataset. To our knowledge, this is the first report of miRNA profiles in D. immitis, which will contribute to a better understanding of the complex biology of this zoonotic filarial nematode and the molecular regulation roles of miRNA involved. Our findings may also become a useful resource for small RNA studies in other filarial parasitic nematodes. PMID:23331513

  17. Mapping vaccinia virus DNA replication origins at nucleotide level by deep sequencing.

    PubMed

    Senkevich, Tatiana G; Bruno, Daniel; Martens, Craig; Porcella, Stephen F; Wolf, Yuri I; Moss, Bernard

    2015-09-01

    Poxviruses reproduce in the host cytoplasm and encode most or all of the enzymes and factors needed for expression and synthesis of their double-stranded DNA genomes. Nevertheless, the mode of poxvirus DNA replication and the nature and location of the replication origins remain unknown. A current but unsubstantiated model posits only leading strand synthesis starting at a nick near one covalently closed end of the genome and continuing around the other end to generate a concatemer that is subsequently resolved into unit genomes. The existence of specific origins has been questioned because any plasmid can replicate in cells infected by vaccinia virus (VACV), the prototype poxvirus. We applied directional deep sequencing of short single-stranded DNA fragments enriched for RNA-primed nascent strands isolated from the cytoplasm of VACV-infected cells to pinpoint replication origins. The origins were identified as the switching points of the fragment directions, which correspond to the transition from continuous to discontinuous DNA synthesis. Origins containing a prominent initiation point mapped to a sequence within the hairpin loop at one end of the VACV genome and to the same sequence within the concatemeric junction of replication intermediates. These findings support a model for poxvirus genome replication that involves leading and lagging strand synthesis and is consistent with the requirements for primase and ligase activities as well as earlier electron microscopic and biochemical studies implicating a replication origin at the end of the VACV genome.

  18. Genome diversity in Brachypodium distachyon: deep sequencing of highly diverse inbred lines.

    PubMed

    Gordon, Sean P; Priest, Henry; Des Marais, David L; Schackwitz, Wendy; Figueroa, Melania; Martin, Joel; Bragg, Jennifer N; Tyler, Ludmila; Lee, Cheng-Ruei; Bryant, Doug; Wang, Wenqin; Messing, Joachim; Manzaneda, Antonio J; Barry, Kerrie; Garvin, David F; Budak, Hikmet; Tuna, Metin; Mitchell-Olds, Thomas; Pfender, William F; Juenger, Thomas E; Mockler, Todd C; Vogel, John P

    2014-08-01

    Brachypodium distachyon is a small annual grass that has been adopted as a model for the grasses. Its small genome, high-quality reference genome, large germplasm collection, and selfing nature make it an excellent subject for studies of natural variation. We sequenced six divergent lines to identify a comprehensive set of polymorphisms and analyze their distribution and concordance with gene expression. Multiple methods and controls were utilized to identify polymorphisms and validate their quality. mRNA-Seq experiments under control and simulated drought-stress conditions identified 300 genes with a genotype-dependent treatment response. We showed that large-scale sequence variants had extremely high concordance with altered expression of hundreds of genes, including many with genotype-dependent treatment responses. We generated a deep mRNA-Seq dataset for the most divergent line and created a de novo transcriptome assembly. This led to the discovery of >2400 previously unannotated transcripts and hundreds of genes not present in the reference genome. We built a public database for visualization and investigation of sequence variants among these widely used inbred lines.

  19. Mitogenome polymorphism in a single branch sample revealed by SOLiD deep sequencing of the Lophelia pertusa coral genome.

    PubMed

    Emblem, Ase; Karlsen, Bård Ove; Evertsen, Jussi; Miller, David J; Moum, Truls; Johansen, Steinar D

    2012-09-15

    We present an initial genomic analysis of the non-symbiotic scleractinian coral Lophelia pertusa, the dominant cold-water reef-building coral species in the North Atlantic Ocean. A significant fraction of the deep sequencing reads was of mitochondrial and microbial origins. SOLiD deep sequencing reads from fragment library experiments of total DNA and PCR amplified mitogenome generated about 21,000 times and 136,000 times coverage, respectively, of the 16,150 bp mitogenome. Five polymorphic sites that include two non-synonymous sites in the NADH dehydrogenase subunit 5 genes were detected in both experiments. This observation is surprising since anthozoans in general exhibit very low mtDNA sequence variation at intraspecific level compared to nuclear sequences. More than fifty bacterial species associated with the coral isolate were also sequence detected, representing at least ten complete genomes. Most reads, however, were predicted to originate from the Lophelia nuclear genome.
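
    The coverage figures above follow from simple arithmetic: mean depth of coverage is the number of sequenced bases mapped to a target divided by the target length. The sketch below assumes that definition (the abstract does not state how coverage was computed) and uses the helper names `mean_coverage` and `bases_for_coverage`, which are hypothetical.

```python
MITOGENOME_BP = 16_150  # mitogenome length reported in the abstract


def mean_coverage(mapped_bases, target_length):
    """Mean depth of coverage: mapped bases divided by target length."""
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    return mapped_bases / target_length


def bases_for_coverage(coverage, target_length):
    """Number of mapped bases implied by a given mean coverage."""
    return coverage * target_length


# Mapped bases implied by the approximate coverages quoted in the abstract.
total_dna_bases = bases_for_coverage(21_000, MITOGENOME_BP)
pcr_amplicon_bases = bases_for_coverage(136_000, MITOGENOME_BP)
```

    Under this definition, roughly 21,000 times coverage of a 16,150 bp target corresponds to about 3.4 x 10^8 mapped bases, and 136,000 times coverage to about 2.2 x 10^9.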

  20. Polymorphism identification and improved genome annotation of Brassica rapa through Deep RNA sequencing.

    PubMed

    Devisetty, Upendra Kumar; Covington, Michael F; Tat, An V; Lekkala, Saradadevi; Maloof, Julin N

    2014-08-12

    The mapping and functional analysis of quantitative traits in Brassica rapa can be greatly improved with the availability of physically positioned, gene-based genetic markers and accurate genome annotation. In this study, deep transcriptome RNA sequencing (RNA-Seq) of Brassica rapa was undertaken with two objectives: SNP detection and improved transcriptome annotation. We performed SNP detection on two varieties that are parents of a mapping population to aid in development of a marker system for this population and subsequent development of a high-resolution genetic map. An improved Brassica rapa transcriptome was constructed to detect novel transcripts and to improve the current genome annotation. This is useful for accurate mRNA abundance and detection of expression QTL (eQTLs) in mapping populations. Deep RNA-Seq of two Brassica rapa genotypes, R500 (var. trilocularis, Yellow Sarson) and IMB211 (a rapid cycling variety), using eight different tissues (root, internode, leaf, petiole, apical meristem, floral meristem, silique, and seedling) grown across three different environments (growth chamber, greenhouse and field) and under two different treatments (simulated sun and simulated shade) generated 2.3 billion high-quality Illumina reads. A total of 330,995 SNPs were identified in transcribed regions between the two genotypes with an average frequency of one SNP in every 200 bases. The deep RNA-Seq reassembled Brassica rapa transcriptome identified 44,239 protein-coding genes. Compared with current gene models of B. rapa, we detected 3537 novel transcripts, 23,754 gene models had structural modifications, and 3655 annotated proteins changed. Gaps in the current genome assembly of B. rapa are highlighted by our identification of 780 unmapped transcripts. All the SNPs, annotations, and predicted transcripts can be viewed at http://phytonetworks.ucdavis.edu/.

  1. Draft Genome Sequence of Deep-Sea Alteromonas sp. Strain V450 Isolated from the Marine Sponge Leiodermatium sp.

    PubMed Central

    Barrett, Nolan H.; McCarthy, Peter J.

    2017-01-01

    ABSTRACT The proteobacterium Alteromonas sp. strain V450 was isolated from the Atlantic deep-sea sponge Leiodermatium sp. Here, we report the draft genome sequence of this strain, with a genome size of approx. 4.39 Mb and a G+C content of 44.01%. The results will aid deep-sea microbial ecology, evolution, and sponge-microbe association studies. PMID:28153886

  2. Advancing Eucalyptus genomics: identification and sequencing of lignin biosynthesis genes from deep-coverage BAC libraries

    PubMed Central

    2011-01-01

    Background Eucalyptus species are among the most planted hardwoods in the world because of their rapid growth, adaptability and valuable wood properties. The development and integration of genomic resources into breeding practice will be increasingly important in the decades to come. Bacterial artificial chromosome (BAC) libraries are key genomic tools that enable positional cloning of important traits, synteny evaluation, and the development of genome framework physical maps for genetic linkage and genome sequencing. Results We describe the construction and characterization of two deep-coverage BAC libraries EG_Ba and EG_Bb obtained from nuclear DNA fragments of E. grandis (clone BRASUZ1) digested with HindIII and BstYI, respectively. Genome coverages of 17 and 15 haploid genome equivalents were estimated for EG_Ba and EG_Bb, respectively. Both libraries contained large inserts, with average sizes ranging from 135 Kb (Eg_Bb) to 157 Kb (Eg_Ba), very low extra-nuclear genome contamination providing a probability of finding a single copy gene ≥ 99.99%. Libraries were screened for the presence of several genes of interest via hybridizations to high-density BAC filters followed by PCR validation. Five selected BAC clones were sequenced and assembled using the Roche GS FLX technology providing the whole sequence of the E. grandis chloroplast genome, and complete genomic sequences of important lignin biosynthesis genes. Conclusions The two E. grandis BAC libraries described in this study represent an important milestone for the advancement of Eucalyptus genomics and forest tree research. These BAC resources have a highly redundant genome coverage (> 15×), contain large average inserts and have a very low percentage of clones with organellar DNA or empty vectors. These publicly available BAC libraries are thus suitable for a broad range of applications in genetic and genomic research in Eucalyptus and possibly in related species of Myrtaceae, including genome

  3. Complex Genotype Mixtures Analyzed by Deep Sequencing in Two Different Regions of Hepatitis B Virus

    PubMed Central

    Homs, Maria; Tabernero, David; Gonzalez, Carolina; Quer, Josep; Blasi, Maria; Casillas, Rosario; Nieto, Leonardo; Riveiro-Barciela, Mar; Esteban, Rafael; Buti, Maria; Rodriguez-Frias, Francisco

    2015-01-01

    This study assesses the presence and outcome of genotype mixtures in the polymerase/surface and X/preCore regions of the HBV genome in patients with chronic hepatitis B virus (HBV) infection. Thirty samples from ten chronic hepatitis B patients were included. The polymerase/surface and X/preCore regions were analyzed by deep sequencing (UDPS) in the first available sample at diagnosis, a pre-treatment sample, and a sample while under treatment. HBV genotype was determined by phylogenesis. Quasispecies complexity was evaluated by mutation frequency and nucleotide diversity. The polymerase/surface and X/preCore regions were validated for genotyping from 113 GenBank reference sequences. UDPS yielded a median of 10,960 sequences per sample (IQR 16,645) in the polymerase/surface region and 11,595 sequences per sample (IQR 14,682) in X/preCore. Genotype mixtures were more common in X/preCore (90%) than in polymerase/surface (30%) (p<0.001). On X/preCore genotyping, all samples were genotype A, whereas polymerase/surface yielded genotypes A (80%), D (16.7%), and F (3.3%) (p = 0.036). Genotype changes in polymerase/surface were observed in four patients during natural quasispecies dynamics and in two patients during treatment. There were no genotype changes in X/preCore. Quasispecies complexity was higher in X/preCore than in polymerase/surface (p = 0.004). The results provide evidence of genotype mixtures and differential genotype proportions in the polymerase/surface and X/preCore regions. The genotype dynamics in HBV infection and the different patterns of quasispecies complexity in the HBV genome suggest a new paradigm for HBV genotype classification. PMID:26714168
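
    The two quasispecies complexity measures named above can be sketched as follows. The abstract gives no formulas, so this assumes commonly used definitions: mutation frequency as the fraction of nucleotides that differ from the dominant (most abundant) haplotype, and nucleotide diversity as the mean number of differences per site over all pairs of reads. The function names and the example haplotypes are hypothetical.

```python
from itertools import combinations


def hamming(a, b):
    """Number of mismatching positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def mutation_frequency(haplotypes):
    """Fraction of sequenced nucleotides differing from the dominant haplotype.

    haplotypes maps an aligned sequence to its read count.
    """
    total_reads = sum(haplotypes.values())
    length = len(next(iter(haplotypes)))
    dominant = max(haplotypes, key=haplotypes.get)
    mutated = sum(hamming(seq, dominant) * n for seq, n in haplotypes.items())
    return mutated / (total_reads * length)


def nucleotide_diversity(haplotypes):
    """Mean pairwise differences per site over all pairs of reads."""
    total_reads = sum(haplotypes.values())
    if total_reads < 2:
        return 0.0
    length = len(next(iter(haplotypes)))
    # Reads sharing a haplotype differ at 0 sites, so only pairs drawn
    # from two different haplotypes contribute to the sum.
    differences = sum(
        hamming(a, b) * na * nb
        for (a, na), (b, nb) in combinations(haplotypes.items(), 2)
    )
    read_pairs = total_reads * (total_reads - 1) / 2
    return differences / (read_pairs * length)


# Hypothetical toy population: 8 reads of one haplotype, 2 of a one-base variant.
toy = {"ACGT": 8, "ACGA": 2}
toy_mf = mutation_frequency(toy)
toy_pi = nucleotide_diversity(toy)
```

    Both measures rise as minority haplotypes become more numerous or more divergent, which is the sense in which one region can be called more complex than another.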

  4. Deep RNA Sequencing of the Skeletal Muscle Transcriptome in Swimming Fish

    PubMed Central

    Palstra, Arjan P.; Beltran, Sergi; Burgerhout, Erik; Brittijn, Sebastiaan A.; Magnoni, Leonardo J.; Henkel, Christiaan V.; Jansen, Hans J.; van den Thillart, Guido E. E. J. M.; Spaink, Herman P.; Planas, Josep V.

    2013-01-01

    Deep RNA sequencing (RNA-seq) was performed to provide an in-depth view of the transcriptome of red and white skeletal muscle of exercised and non-exercised rainbow trout (Oncorhynchus mykiss) with the specific objective to identify expressed genes and quantify the transcriptomic effects of swimming-induced exercise. Pubertal autumn-spawning seawater-raised female rainbow trout were rested (n = 10) or swum (n = 10) for 1176 km at 0.75 body-lengths per second in a 6,000-L swim-flume under reproductive conditions for 40 days. Red and white muscle RNA of exercised and non-exercised fish (4 lanes) was sequenced and resulted in 15–17 million reads per lane that, after de novo assembly, yielded 149,159 red and 118,572 white muscle contigs. Most contigs were annotated using an iterative homology search strategy against salmonid ESTs, the zebrafish Danio rerio genome and general Metazoan genes. When selecting for large contigs (>500 nucleotides), a number of novel rainbow trout gene sequences were identified in this study: 1,085 and 1,228 novel gene sequences for red and white muscle, respectively, which included a number of important molecules for skeletal muscle function. Transcriptomic analysis revealed that sustained swimming increased transcriptional activity in skeletal muscle and specifically an up-regulation of genes involved in muscle growth and developmental processes in white muscle. The unique collection of transcripts will contribute to our understanding of red and white muscle physiology, specifically during the long-term reproductive migration of salmonids. PMID:23308156

  5. Dysregulation of B Cell Repertoire Formation in Myasthenia Gravis Patients Revealed through Deep Sequencing.

    PubMed

    Vander Heiden, Jason A; Stathopoulos, Panos; Zhou, Julian Q; Chen, Luan; Gilbert, Tamara J; Bolen, Christopher R; Barohn, Richard J; Dimachkie, Mazen M; Ciafaloni, Emma; Broering, Teresa J; Vigneault, Francois; Nowak, Richard J; Kleinstein, Steven H; O'Connor, Kevin C

    2017-02-15

    Myasthenia gravis (MG) is a prototypical B cell-mediated autoimmune disease affecting 20-50 people per 100,000. The majority of patients fall into two clinically distinguishable types based on whether they produce autoantibodies targeting the acetylcholine receptor (AChR-MG) or muscle specific kinase (MuSK-MG). The autoantibodies are pathogenic, but whether their generation is associated with broader defects in the B cell repertoire is unknown. To address this question, we performed deep sequencing of the BCR repertoire of AChR-MG, MuSK-MG, and healthy subjects to generate ∼518,000 unique VH and VL sequences from sorted naive and memory B cell populations. AChR-MG and MuSK-MG subjects displayed distinct gene segment usage biases in both VH and VL sequences within the naive and memory compartments. The memory compartment of AChR-MG was further characterized by reduced positive selection of somatic mutations in the VH CDR and altered VH CDR3 physicochemical properties. The VL repertoire of MuSK-MG was specifically characterized by reduced V-J segment distance in recombined sequences, suggesting diminished VL receptor editing during B cell development. Our results identify large-scale abnormalities in both the naive and memory B cell repertoires. Particular abnormalities were unique to either AChR-MG or MuSK-MG, indicating that the repertoires reflect the distinct properties of the subtypes. These repertoire abnormalities are consistent with previously observed defects in B cell tolerance checkpoints in MG, thereby offering additional insight regarding the impact of tolerance defects on peripheral autoimmune repertoires. These collective findings point toward a deformed B cell repertoire as a fundamental component of MG.

  6. Complex Genotype Mixtures Analyzed by Deep Sequencing in Two Different Regions of Hepatitis B Virus.

    PubMed

    Caballero, Andrea; Gregori, Josep; Homs, Maria; Tabernero, David; Gonzalez, Carolina; Quer, Josep; Blasi, Maria; Casillas, Rosario; Nieto, Leonardo; Riveiro-Barciela, Mar; Esteban, Rafael; Buti, Maria; Rodriguez-Frias, Francisco

    2015-01-01

    This study assesses the presence and outcome of genotype mixtures in the polymerase/surface and X/preCore regions of the HBV genome in patients with chronic hepatitis B virus (HBV) infection. Thirty samples from ten chronic hepatitis B patients were included. The polymerase/surface and X/preCore regions were analyzed by deep sequencing (UDPS) in the first available sample at diagnosis, a pre-treatment sample, and a sample while under treatment. HBV genotype was determined by phylogenesis. Quasispecies complexity was evaluated by mutation frequency and nucleotide diversity. The polymerase/surface and X/preCore regions were validated for genotyping from 113 GenBank reference sequences. UDPS yielded a median of 10,960 sequences per sample (IQR 16,645) in the polymerase/surface region and 11,595 sequences per sample (IQR 14,682) in X/preCore. Genotype mixtures were more common in X/preCore (90%) than in polymerase/surface (30%) (p<0.001). On X/preCore genotyping, all samples were genotype A, whereas polymerase/surface yielded genotypes A (80%), D (16.7%), and F (3.3%) (p = 0.036). Genotype changes in polymerase/surface were observed in four patients during natural quasispecies dynamics and in two patients during treatment. There were no genotype changes in X/preCore. Quasispecies complexity was higher in X/preCore than in polymerase/surface (p = 0.004). The results provide evidence of genotype mixtures and differential genotype proportions in the polymerase/surface and X/preCore regions. The genotype dynamics in HBV infection and the different patterns of quasispecies complexity in the HBV genome suggest a new paradigm for HBV genotype classification.

  7. Implementation of a custom hardware-accelerator for short-read mapping using Burrows-Wheeler alignment.

    PubMed

    Waidyasooriya, Hasitha Muthumala; Hariyama, Masanori; Kameyama, Michitaka

    2013-01-01

    The mapping of millions of short DNA fragments to a large genome is a great challenge in modern computational biology. Usually, it takes many hours or days to map a large genome using software. However, the recent progress of programmable hardware such as field programmable gate arrays (FPGAs) provides a cost effective solution to this challenge. FPGAs contain millions of programmable logic gates to design massively parallel accelerators. This paper proposes a hardware architecture to accelerate the short-read mapping using Burrows-Wheeler alignment. The speed-up of the proposed architecture is estimated to be at least 10 times compared to its equivalent software application.
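
    As background to the technique named above, the core of Burrows-Wheeler based mapping is a backward search over an index of the reference. The sketch below is a plain software illustration of exact matching only; it is not the hardware architecture proposed in the paper and omits the mismatch handling of practical aligners. The function names are hypothetical, and the naive suffix-array construction is suitable only for short example strings.

```python
def build_fm_index(reference):
    """Build a small FM-index (suffix array, C table, Occ table)."""
    text = reference + "$"  # "$" sorts before A, C, G, T
    suffix_array = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character preceding each sorted suffix (wraps to "$" at i == 0).
    bwt = "".join(text[i - 1] for i in suffix_array)
    alphabet = sorted(set(text))
    # c_table[c]: number of characters in text strictly smaller than c.
    c_table, running = {}, 0
    for c in alphabet:
        c_table[c] = running
        running += text.count(c)
    # occ[c][i]: number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return suffix_array, c_table, occ


def find_exact(read, index):
    """Return sorted 0-based positions where read occurs in the reference."""
    suffix_array, c_table, occ = index
    if not read:
        return []
    lo, hi = 0, len(suffix_array)  # half-open interval of matching suffixes
    for ch in reversed(read):      # extend the match one base to the left
        if ch not in c_table:
            return []
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(suffix_array[lo:hi])


index = build_fm_index("GATTACAGATTACA")
hits = find_exact("TTAC", index)
```

    Each base of the read costs two table lookups regardless of reference length, which is part of why this search lends itself to the massively parallel designs the abstract describes.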

  8. Evaluation of Nine Somatic Variant Callers for Detection of Somatic Mutations in Exome and Targeted Deep Sequencing Data.

    PubMed

    Krøigård, Anne Bruun; Thomassen, Mads; Lænkholm, Anne-Vibeke; Kruse, Torben A; Larsen, Martin Jakob

    2016-01-01

    Next generation sequencing is extensively applied to catalogue somatic mutations in cancer, in research settings and increasingly in clinical settings for molecular diagnostics, guiding therapy decisions. Somatic variant callers perform paired comparisons of sequencing data from cancer tissue and matched normal tissue in order to detect somatic mutations. The advent of many new somatic variant callers creates a need for comparison and validation of the tools, as no de facto standard for detection of somatic mutations exists and only limited comparisons have been reported. We have performed a comprehensive evaluation using exome sequencing and targeted deep sequencing data of paired tumor-normal samples from five breast cancer patients to evaluate the performance of nine publicly available somatic variant callers: EBCall, Mutect, Seurat, Shimmer, Indelocator, Somatic Sniper, Strelka, VarScan 2 and Virmid for the detection of single nucleotide mutations and small deletions and insertions. We report a large variation in the number of calls from the nine somatic variant callers on the same sequencing data and highly variable agreement. Sequencing depth had markedly diverse impact on individual callers, as for some callers, increased sequencing depth highly improved sensitivity. For SNV calling, we report EBCall, Mutect, Virmid and Strelka to be the most reliable somatic variant callers for both exome sequencing and targeted deep sequencing. For indel calling, EBCall is superior due to high sensitivity and robustness to changes in sequencing depths.

  9. Evaluation of Nine Somatic Variant Callers for Detection of Somatic Mutations in Exome and Targeted Deep Sequencing Data

    PubMed Central

    Krøigård, Anne Bruun; Thomassen, Mads; Lænkholm, Anne-Vibeke; Kruse, Torben A.; Larsen, Martin Jakob

    2016-01-01

    Next generation sequencing is extensively applied to catalogue somatic mutations in cancer, in research settings and increasingly in clinical settings for molecular diagnostics, guiding therapy decisions. Somatic variant callers perform paired comparisons of sequencing data from cancer tissue and matched normal tissue in order to detect somatic mutations. The advent of many new somatic variant callers creates a need for comparison and validation of the tools, as no de facto standard for detection of somatic mutations exists and only limited comparisons have been reported. We have performed a comprehensive evaluation using exome sequencing and targeted deep sequencing data of paired tumor-normal samples from five breast cancer patients to evaluate the performance of nine publicly available somatic variant callers: EBCall, Mutect, Seurat, Shimmer, Indelocator, Somatic Sniper, Strelka, VarScan 2 and Virmid for the detection of single nucleotide mutations and small deletions and insertions. We report a large variation in the number of calls from the nine somatic variant callers on the same sequencing data and highly variable agreement. Sequencing depth had markedly diverse impact on individual callers, as for some callers, increased sequencing depth highly improved sensitivity. For SNV calling, we report EBCall, Mutect, Virmid and Strelka to be the most reliable somatic variant callers for both exome sequencing and targeted deep sequencing. For indel calling, EBCall is superior due to high sensitivity and robustness to changes in sequencing depths. PMID:27002637

  10. Small RNA Library Cloning Procedure for Deep Sequencing of Specific Endogenous siRNA Classes in Caenorhabditis elegans

    PubMed Central

    Ow, Maria C.; Lau, Nelson C.; Hall, Sarah E.

    2017-01-01

    In recent years, distinct classes of small RNAs ranging in size from ~21 to 26 nucleotides have been discovered and shown to play important roles in a wide array of cellular functions. Because of the abundance of these small RNAs, library preparation from an RNA sample followed by deep sequencing provides the identity and quantity of a particular class of small RNAs. In this chapter we describe a detailed protocol for preparing small RNA libraries for deep sequencing on the Illumina platform from the nematode C. elegans. PMID:24920360

  11. Enhanced methods for unbiased deep sequencing of Lassa and Ebola RNA viruses from clinical and biological samples.

    PubMed

    Matranga, Christian B; Andersen, Kristian G; Winnicki, Sarah; Busby, Michele; Gladden, Adrianne D; Tewhey, Ryan; Stremlau, Matthew; Berlin, Aaron; Gire, Stephen K; England, Eleina; Moses, Lina M; Mikkelsen, Tarjei S; Odia, Ikponmwonsa; Ehiane, Philomena E; Folarin, Onikepe; Goba, Augustine; Kahn, S Humarr; Grant, Donald S; Honko, Anna; Hensley, Lisa; Happi, Christian; Garry, Robert F; Malboeuf, Christine M; Birren, Bruce W; Gnirke, Andreas; Levin, Joshua Z; Sabeti, Pardis C

    2014-01-01

    We have developed a robust RNA sequencing method for generating complete de novo assemblies with intra-host variant calls of Lassa and Ebola virus genomes in clinical and biological samples. Our method uses targeted RNase H-based digestion to remove contaminating poly(rA) carrier and ribosomal RNA. This depletion step improves both the quality of data and quantity of informative reads in unbiased total RNA sequencing libraries. We have also developed a hybrid-selection protocol to further enrich the viral content of sequencing libraries. These protocols have enabled rapid deep sequencing of both Lassa and Ebola virus and are broadly applicable to other viral genomics studies.

  12. Deep sequencing analysis of defective genomes of parainfluenza virus 5 and their role in interferon induction.

    PubMed

    Killip, M J; Young, D F; Gatherer, D; Ross, C S; Short, J A L; Davison, A J; Goodbourn, S; Randall, R E

    2013-05-01

    Preparations of parainfluenza virus 5 (PIV5) that are potent activators of the interferon (IFN) induction cascade were generated by high-multiplicity passage in order to accumulate defective interfering virus genomes (DIs). Nucleocapsid RNA from these virus preparations was extracted and subjected to deep sequencing. Sequencing data were analyzed using methods designed to detect internal deletion and "copyback" DIs in order to identify and characterize the different DIs present and to approximately quantify the ratio of defective to nondefective genomes. Trailer copybacks dominated the DI populations in IFN-inducing preparations of both the PIV5 wild type (wt) and PIV5-VΔC (a recombinant virus that does not encode a functional V protein). Although the PIV5 V protein is an efficient inhibitor of the IFN induction cascade, we show that nondefective PIV5 wt is unable to prevent activation of the IFN response by coinfecting copyback DIs due to the interfering effects of copyback DIs on nondefective virus protein expression. As a result, copyback DIs are able to very rapidly activate the IFN induction cascade prior to the expression of detectable levels of V protein by coinfecting nondefective virus.

  13. Ultra Deep Sequencing of a Baculovirus Population Reveals Widespread Genomic Variations

    PubMed Central

    Chateigner, Aurélien; Bézier, Annie; Labrousse, Carole; Jiolle, Davy; Barbe, Valérie; Herniou, Elisabeth A.

    2015-01-01

    Viruses rely on widespread genetic variation and large population size for adaptation. Large DNA virus populations are thought to harbor little variation though natural populations may be polymorphic. To measure the genetic variation present in a dsDNA virus population, we deep sequenced a natural strain of the baculovirus Autographa californica multiple nucleopolyhedrovirus. With 124,221X average genome coverage of our 133,926 bp long consensus, we could detect low frequency mutations (0.025%). K-means clustering was used to classify the mutations in four categories according to their frequency in the population. We found 60 high frequency non-synonymous mutations under balancing selection distributed in all functional classes. These mutants could alter viral adaptation dynamics, either through competitive or synergistic processes. Lastly, we developed a technique for the delimitation of large deletions in next generation sequencing data. We found that large deletions occur along the entire viral genome, with hotspots located in homologous repeat regions (hrs). Present in 25.4% of the genomes, these deletion mutants presumably require functional complementation to complete their infection cycle. They might thus have a large impact on the fitness of the baculovirus population. Altogether, we found a wide breadth of genomic variation in the baculovirus population, suggesting it has high adaptive potential. PMID:26198241
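
    The abstract above states that K-means clustering was used to classify mutations into four categories by population frequency. A minimal one-dimensional sketch of that idea follows. The frequencies are invented, and the initialization and stopping rule are assumptions rather than the authors' settings.

```python
def kmeans_1d(values, k, iterations=100):
    """Plain Lloyd's algorithm on scalar values; returns (centers, labels)."""
    ordered = sorted(values)
    # Deterministic start (an assumption): evenly spaced order statistics.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Keep the old center if a cluster happens to be empty.
        new_centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    labels = [min(range(k), key=lambda i: abs(v - centers[i])) for v in values]
    return centers, labels


# Hypothetical mutation frequencies (fraction of genomes carrying each mutation).
frequencies = [0.0003, 0.0005, 0.0004, 0.05, 0.06, 0.055,
               0.30, 0.32, 0.31, 0.90, 0.92, 0.95]

centers, labels = kmeans_1d(frequencies, k=4)
print(centers)
print(labels)
```

    Because the starting centers are taken in sorted order, label 0 corresponds to the lowest-frequency category and label 3 to the highest in this example.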

  14. Characterization of Small Interfering RNAs Derived from the Geminivirus/Betasatellite Complex Using Deep Sequencing

    PubMed Central

    Yang, Xiuling; Wang, Yu; Guo, Wei; Xie, Yan; Xie, Qi; Fan, Longjiang; Zhou, Xueping

    2011-01-01

    Background: Small RNA (sRNA)-guided RNA silencing is a critical antiviral defense mechanism employed by a variety of eukaryotic organisms. Although the induction of RNA silencing by bipartite and monopartite begomoviruses has been described in plants, the nature of begomovirus/betasatellite complexes remains undefined. Methodology/Principal Findings: Solanum lycopersicum plant leaves systemically infected with Tomato yellow leaf curl China virus (TYLCCNV) alone or together with its associated betasatellite (TYLCCNB), and Nicotiana benthamiana plant leaves systemically infected with TYLCCNV alone, or together with TYLCCNB or with mutant TYLCCNB were harvested for RNA extraction; sRNA cDNA libraries were then constructed and submitted to Solexa-based deep sequencing. Both sense and anti-sense TYLCCNV and TYLCCNB-derived sRNAs (V-sRNAs and S-sRNAs) accumulated preferentially as 22 nucleotide species in infected S. lycopersicum and N. benthamiana plants. High resolution mapping of V-sRNAs and S-sRNAs revealed heterogeneous distribution of V-sRNA and S-sRNA sequences across the TYLCCNV and TYLCCNB genomes. In TYLCCNV-infected S. lycopersicum or N. benthamiana and TYLCCNV and βC1-mutant TYLCCNB co-infected N. benthamiana plants, the primary TYLCCNV targets were AV2 and the 5′ terminus of AV1. In TYLCCNV and betasatellite-infected plants, the number of V-sRNAs targeting this region decreased and the production of V-sRNAs increased corresponding to the overlapping regions of AC2 and AC3, as well as the 3′ terminus of AC1. βC1 is the primary determinant mediating symptom induction and also the primary silencing target of the TYLCCNB genome even in its mutated form. Conclusions/Significance: We report the first high-resolution sRNA map for a monopartite begomovirus and its associated betasatellite using Solexa-based deep sequencing. Our results suggest that viral transcripts might act as RDR substrates resulting in dsRNA and secondary siRNA production. In addition, the

  15. Deep RNA sequencing analysis of readthrough gene fusions in human prostate adenocarcinoma and reference samples

    PubMed Central

    2011-01-01

    Background: Readthrough fusions across adjacent genes in the genome, or transcription-induced chimeras (TICs), have been estimated using expressed sequence tag (EST) libraries to involve 4-6% of all genes. Deep transcriptional sequencing (RNA-Seq) now makes it possible to study the occurrence and expression levels of TICs in individual samples across the genome. Methods: We performed single-end RNA-Seq on three human prostate adenocarcinoma samples and their corresponding normal tissues, as well as brain and universal reference samples. We developed two bioinformatics methods to specifically identify TIC events: a targeted alignment method using artificial exon-exon junctions within 200,000 bp from adjacent genes, and genomic alignment allowing splicing within individual reads. We performed further experimental verification and characterization of selected TIC and fusion events using quantitative RT-PCR and comparative genomic hybridization microarrays. Results: Targeted alignment against artificial exon-exon junctions yielded 339 distinct TIC events, including 32 gene pairs with multiple isoforms. The false discovery rate was estimated to be 1.5%. Spliced alignment to the genome was less sensitive, finding only 18% of those found by targeted alignment in 33-nt reads and 59% of those in 50-nt reads. However, spliced alignment revealed 30 cases of TICs with intervening exons, in addition to distant inversions, scrambled genes, and translocations. Our findings increase the catalog of observed TIC gene pairs by 66%. We verified 6 of 6 predicted TICs in all prostate samples, and 2 of 5 predicted novel distant gene fusions, both private events among 54 prostate tumor samples tested. Expression of TICs correlates with that of the upstream gene, which can explain the prostate-specific pattern of some TIC events and the restriction of the SLC45A3-ELK4 e4-e2 TIC to ERG-negative prostate samples, as confirmed in 20 matched prostate tumor and normal samples and 9 lung cancer

  16. An optimized kit-free method for making strand-specific deep sequencing libraries from RNA fragments

    PubMed Central

    Heyer, Erin E.; Ozadam, Hakan; Ricci, Emiliano P.; Cenik, Can; Moore, Melissa J.

    2015-01-01

    Deep sequencing of strand-specific cDNA libraries is now a ubiquitous tool for identifying and quantifying RNAs in diverse sample types. The accuracy of conclusions drawn from these analyses depends on precise and quantitative conversion of the RNA sample into a DNA library suitable for sequencing. Here, we describe an optimized method of preparing strand-specific RNA deep sequencing libraries from small RNAs and variably sized RNA fragments obtained from ribonucleoprotein particle footprinting experiments or fragmentation of long RNAs. Our approach works across a wide range of input amounts (400 pg to 200 ng), is easy to follow and produces a library in 2–3 days at relatively low reagent cost, all while giving the user complete control over every step. Because all enzymatic reactions were optimized and driven to apparent completion, sequence diversity and species abundance in the input sample are well preserved. PMID:25505164

  17. High resolution sequence stratigraphy of Miocene deep-water clastic outcrops, Taranaki coast, New Zealand

    SciTech Connect

    King, P.R.; Browne, G.H.; Slatt, R.M.

    1995-08-01

    Approximately 700 m of deep-water clastic deposits of the Mt. Messenger Formation are superbly exposed along the Taranaki coast of North Island, New Zealand. Biostratigraphy indicates the interval was deposited during the time span 10.5-9.2 m.y. in water depths grading upward from lower bathyal to middle-upper bathyal. This interval is considered part of a 3rd order depositional sequence deposited under conditions of fluctuating relative sea-level, concomitant with high sedimentation rates. Several 4th order depositional sequences, reflecting successive sea-level falls, are recognized within the interval. Sequence boundaries display a range of erosive morphologies from metre-wide canyons to scours several hundred metres across. All components of a generic lowstand systems tract (basin floor fan, channel-levee complex and prograding complex) are present in logical and temporal order. They are repetitive through the interval, with the relatively shallower-water components becoming more prevalent upward. Basin floor fan lithologies are mainly m-thick, massive and convolute-bedded sandstones that alternate with cm- and dm-thick massive, horizontally-stratified and ripple-laminated sandstones and bioturbated mudstones. Channel-levee deposits consist of interleaving packages of thin-bedded, climbing-rippled and parallel-laminated sandstones and siltstones; infrequent channels are filled with sandstones and mudstones, and sometimes lined with conglomerate. Thin beds of parallel to convoluted mudstone comprise prograding complex deposits. Similar lowstand systems tracts can be recognized and correlated on subsurface seismic reflection profiles and wireline logs. Such correlation has been aided by a continuous outcrop gamma-ray log obtained over most of the measured interval. In the adjacent Taranaki peninsula, basin floor fan and channel-levee deposits comprise hydrocarbon reservoir intervals. Outcrop and subsurface reservoir sandstones exhibit similar permeabilities.

  18. Deep Sequencing Analysis Reveals Temporal Microbiota Changes Associated with Development of Bovine Digital Dermatitis

    PubMed Central

    Krull, Adam C.; Shearer, Jan K.; Gorden, Patrick J.; Cooper, Vickie L.; Phillips, Gregory J.

    2014-01-01

    Bovine digital dermatitis (DD) is a leading cause of lameness in dairy cattle throughout the world. Despite 35 years of research, the definitive etiologic agent associated with the disease process is still unknown. Previous studies have demonstrated that multiple bacterial species are associated with lesions, with spirochetes being the most reliably identified organism. This study details the deep sequencing-based metagenomic evaluation of 48 staged DD biopsy specimens collected during a 3-year longitudinal study of disease progression. Over 175 million sequences were evaluated by utilizing both shotgun and 16S metagenomic techniques. Based on the shotgun sequencing results, there was no evidence of a fungal or DNA viral etiology. The bacterial microbiota of biopsy specimens progresses through a systematic series of changes that correlate with the novel morphological lesion scoring system developed as part of this project. This scoring system was validated, as the microbiota of each stage was statistically significantly different from those of other stages (P < 0.001). The microbiota of control biopsy specimens were the most diverse and became less diverse as lesions developed. Although Treponema spp. predominated in the advanced lesions, they were in relatively low abundance in the newly described early lesions that are associated with the initiation of the disease process. The consortium of Treponema spp. identified at the onset of disease changes considerably as the lesions progress through the morphological stages identified. The results of this study support the hypothesis that DD is a polybacterial disease process and provide unique insights into the temporal changes in bacterial populations throughout lesion development. PMID:24866801

  19. Ultra-Deep Sequencing of Mouse Mitochondrial DNA: Mutational Patterns and Their Origins

    PubMed Central

    Freyer, Christoph; Hagström, Erik; Ingman, Max; Larsson, Nils-Göran; Gyllensten, Ulf

    2011-01-01

    Somatic mutations of mtDNA are implicated in the aging process, but there is no universally accepted method for their accurate quantification. We have used ultra-deep sequencing to study genome-wide mtDNA mutation load in the liver of normally- and prematurely-aging mice. Mice that are homozygous for an allele expressing a proof-reading–deficient mtDNA polymerase (mtDNA mutator mice) have 10-times-higher point mutation loads than their wildtype siblings. In addition, the mtDNA mutator mice have increased levels of a truncated linear mtDNA molecule, resulting in decreased sequence coverage in the deleted region. In contrast, circular mtDNA molecules with large deletions occur at extremely low frequencies in mtDNA mutator mice and can therefore not drive the premature aging phenotype. Sequence analysis shows that the main proportion of the mutation load in heterozygous mtDNA mutator mice and their wildtype siblings is inherited from their heterozygous mothers consistent with germline transmission. We found no increase in levels of point mutations or deletions in wildtype C57Bl/6N mice with increasing age, thus questioning the causative role of these changes in aging. In addition, there was no increased frequency of transversion mutations with time in any of the studied genotypes, arguing against oxidative damage as a major cause of mtDNA mutations. Our results from studies of mice thus indicate that most somatic mtDNA mutations occur as replication errors during development and do not result from damage accumulation in adult life. PMID:21455489

  20. Reconstructing the Dynamics of HIV Evolution within Hosts from Serial Deep Sequence Data

    PubMed Central

    Poon, Art F. Y.; Swenson, Luke C.; Bunnik, Evelien M.; Edo-Matas, Diana; Schuitemaker, Hanneke; van 't Wout, Angélique B.; Harrigan, P. Richard

    2012-01-01

    At the early stage of infection, human immunodeficiency virus (HIV)-1 predominantly uses the CCR5 coreceptor for host cell entry. The subsequent emergence of HIV variants that use the CXCR4 coreceptor in roughly half of all infections is associated with an accelerated decline of CD4+ T-cells and rate of progression to AIDS. The presence of a ‘fitness valley’ separating CCR5- and CXCR4-using genotypes is postulated to be a biological determinant of whether the HIV coreceptor switch occurs. Using phylogenetic methods to reconstruct the evolutionary dynamics of HIV within hosts enables us to discriminate between competing models of this process. We have developed a phylogenetic pipeline for the molecular clock analysis, ancestral reconstruction, and visualization of deep sequence data. These data were generated by next-generation sequencing of HIV RNA extracted from longitudinal serum samples (median 7 time points) from 8 untreated subjects with chronic HIV infections (Amsterdam Cohort Studies on HIV-1 infection and AIDS). We used the known dates of sampling to directly estimate rates of evolution and to map ancestral mutations to a reconstructed timeline in units of days. HIV coreceptor usage was predicted from reconstructed ancestral sequences using the geno2pheno algorithm. We determined that the first mutations contributing to CXCR4 use emerged about 16 (per subject range 4 to 30) months before the earliest predicted CXCR4-using ancestor, which preceded the first positive cell-based assay of CXCR4 usage by 10 (range 5 to 25) months. CXCR4 usage arose in multiple lineages within 5 of 8 subjects, and ancestral lineages following alternate mutational pathways before going extinct were common. We observed highly patient-specific distributions and time-scales of mutation accumulation, implying that the role of a fitness valley is contingent on the genotype of the transmitted variant. PMID:23133358
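
    The abstract above describes using known sampling dates to estimate rates of evolution in units of days. A much simpler stand-in for a full molecular clock analysis is root-to-tip regression: fit a line to genetic divergence against sampling time and read the slope as the rate. The sketch below uses that substitute technique, not the authors' pipeline, and all numbers are hypothetical.

```python
def root_to_tip_rate(days, divergence):
    """Ordinary least-squares fit of divergence against sampling day.

    Returns (slope, intercept): slope is substitutions per site per day,
    intercept is the fitted divergence at day 0.
    """
    n = len(days)
    mean_t = sum(days) / n
    mean_d = sum(divergence) / n
    sxx = sum((t - mean_t) ** 2 for t in days)
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in zip(days, divergence))
    slope = sxy / sxx
    intercept = mean_d - slope * mean_t
    return slope, intercept


# Hypothetical longitudinal samples: day of sampling and mean root-to-tip
# divergence (substitutions per site) of the sequences from that day.
sample_days = [0, 100, 200, 300]
mean_divergence = [0.010, 0.012, 0.014, 0.016]

rate, intercept = root_to_tip_rate(sample_days, mean_divergence)
# Day at which the fitted line reaches zero divergence (estimated root date).
root_day = -intercept / rate
print(rate, intercept, root_day)
```

    A negative root day simply means the estimated common ancestor predates the first sample; dedicated molecular clock methods additionally account for the shared ancestry among sequences, which this regression ignores.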

  1. Multiregion ultra-deep sequencing reveals early intermixing and variable levels of intratumoral heterogeneity in colorectal cancer.

    PubMed

    Suzuki, Yuka; Ng, Sarah Boonhsi; Chua, Clarinda; Leow, Wei Qiang; Chng, Jermain; Liu, Shi Yang; Ramnarayanan, Kalpana; Gan, Anna; Ho, Dan Liang; Ten, Rachel; Su, Yan; Lezhava, Alexandar; Lai, Jiunn Herng; Koh, Dennis; Lim, Kiat Hon; Tan, Patrick; Rozen, Steven G; Tan, Iain Beehuat

    2017-02-01

    Intratumor heterogeneity (ITH) contributes to cancer progression and chemoresistance. We sought to comprehensively describe ITH of somatic mutations, copy number, and transcriptomic alterations involving clinically and biologically relevant gene pathways in colorectal cancer (CRC). We performed multiregion, high-depth (384× on average) sequencing of 799 cancer-associated genes in 24 spatially separated primary tumor and nonmalignant tissues from four treatment-naïve CRC patients. We then used ultra-deep sequencing (17 075× on average) to accurately verify the presence or absence of identified somatic mutations in each sector. We also digitally measured gene expression and copy number alterations using NanoString assays. We identified the subclonal point mutations and determined the mutational timing and phylogenetic relationships among spatially separated sectors of each tumor. Truncal mutations, those shared by all sectors in the tumor, affected the well-described driver genes such as APC, TP53, and KRAS. With sequencing at 17 075×, we found that mutations first detected at a sequencing depth of 384× were in fact more widely shared among sectors than originally assessed. Interestingly, ultra-deep sequencing also revealed some mutations that were present in all spatially dispersed sectors, but at subclonal levels. Ultra-high-depth validation sequencing, copy number analysis, and gene expression profiling provided a comprehensive and accurate genomic landscape of spatial heterogeneity in CRC. Ultra-deep sequencing allowed more sensitive detection of somatic mutations and a more accurate assessment of ITH. By detecting the subclonal mutations with ultra-deep sequencing, we traced the genomic histories of each tumor and the relative timing of mutational events. We found evidence of early mixing, in which the subclonal ancestral mutations intermixed across the sectors before the acquisition of subsequent nontruncal mutations. Our findings also indicate that

  2. Draft Genome Sequence of Hydrogenibacillus schlegelii MA48, a Deep-Branching Member of the Bacilli Class of Firmicutes

    PubMed Central

    Maker, Allison; Pace, Laura A.; Ward, Lewis M.; Fischer, Woodward W.

    2017-01-01

    We report here the draft genome sequence of Hydrogenibacillus schlegelii MA48, a thermophilic facultative anaerobe that can oxidize hydrogen aerobically. H. schlegelii MA48 belongs to a deep-branching clade of the Bacilli class and provides important insight into the acquisition of aerobic respiration within the Firmicutes phylum. PMID:28104644

  3. Draft Genome Sequence of Alcanivorax sp. Strain KX64203 Isolated from Deep-Sea Sediments of Iheya North, Okinawa Trough

    PubMed Central

    Liu, Rui; Wang, Mengqiang; Wang, Hao; Gao, Qiang; Hou, Zhanhui; Gao, Dahai

    2016-01-01

    This report describes the draft genome sequence of Alcanivorax sp. strain KX64203, isolated from deep-sea sediment samples. The reads generated by an Ion Torrent PGM were assembled into contigs, with a total size of 4.76 Mb. The data will improve our understanding of the strain’s function in alkane degradation. PMID:27563046

  4. High-Resolution Hepatitis C Virus Subtyping Using NS5B Deep Sequencing and Phylogeny, an Alternative to Current Methods

    PubMed Central

    Gregori, Josep; Rodríguez-Frias, Francisco; Buti, Maria; Madejon, Antonio; Perez-del-Pulgar, Sofia; Garcia-Cehic, Damir; Casillas, Rosario; Blasi, Maria; Homs, Maria; Tabernero, David; Alvarez-Tejado, Miguel; Muñoz, Jose Manuel; Cubero, Maria; Caballero, Andrea; delCampo, Jose Antonio; Domingo, Esteban; Belmonte, Irene; Nieto, Leonardo; Lens, Sabela; Muñoz-de-Rueda, Paloma; Sanz-Cameno, Paloma; Sauleda, Silvia; Bes, Marta; Gomez, Jordi; Briones, Carlos; Perales, Celia; Sheldon, Julie; Castells, Lluis; Viladomiu, Lluis; Salmeron, Javier; Ruiz-Extremera, Angela; Quiles-Pérez, Rosa; Moreno-Otero, Ricardo; López-Rodríguez, Rosario; Allende, Helena; Romero-Gómez, Manuel; Guardia, Jaume; Esteban, Rafael; Garcia-Samaniego, Javier; Forns, Xavier

    2014-01-01

    Hepatitis C virus (HCV) is classified into seven major genotypes and 67 subtypes. Recent studies have shown that in HCV genotype 1-infected patients, response rates to regimens containing direct-acting antivirals (DAAs) are subtype dependent. Currently available genotyping methods have limited subtyping accuracy. We have evaluated the performance of a deep-sequencing-based HCV subtyping assay, developed for the 454/GS-Junior platform, in comparison with those of two commercial assays (Versant HCV genotype 2.0 and Abbott Real-time HCV Genotype II) and using direct NS5B sequencing as a gold standard (direct sequencing), in 114 clinical specimens previously tested by first-generation hybridization assay (82 genotype 1 and 32 with uninterpretable results). Phylogenetic analysis of deep-sequencing reads matched subtype 1 calling by population Sanger sequencing (69% 1b, 31% 1a) in 81 specimens and identified a mixed-subtype infection (1b/3a/1a) in one sample. Similarly, among the 32 previously indeterminate specimens, identical genotype and subtype results were obtained by direct and deep sequencing in all but four samples with dual infection. In contrast, both Versant HCV Genotype 2.0 and Abbott Real-time HCV Genotype II failed subtype 1 calling in 13 (16%) samples each and were unable to identify the HCV genotype and/or subtype in more than half of the non-genotype 1 samples. We concluded that deep sequencing is more efficient for HCV subtyping than currently available methods and allows qualitative identification of mixed infections and may be more helpful with respect to informing treatment strategies with new DAA-containing regimens across all HCV subtypes. PMID:25378574

  5. Hybridization Capture-Based Next-Generation Sequencing to Evaluate Coding Sequence and Deep Intronic Mutations in the NF1 Gene

    PubMed Central

    Cunha, Karin Soares; Oliveira, Nathalia Silva; Fausto, Anna Karoline; de Souza, Carolina Cruz; Gros, Audrey; Bandres, Thomas; Idrissi, Yamina; Merlio, Jean-Philippe; de Moura Neto, Rodrigo Soares; Silva, Rosane; Geller, Mauro; Cappellen, David

    2016-01-01

    Neurofibromatosis 1 (NF1) is one of the most common genetic disorders and is caused by mutations in the NF1 gene. NF1 gene mutational analysis presents a considerable challenge because of its large size, existence of highly homologous pseudogenes located throughout the human genome, absence of mutational hotspots, and diversity of mutation types, including deep intronic splicing mutations. We aimed to evaluate the use of hybridization capture-based next-generation sequencing to screen coding and noncoding NF1 regions. Hybridization capture-based next-generation sequencing, with genomic DNA as starting material, was used to sequence the whole NF1 gene (exons and introns) from 11 unrelated individuals and 1 relative, who all had NF1. All of them met the NF1 clinical diagnostic criteria. We showed a mutation detection rate of 91% (10 out of 11). We identified eight recurrent and two novel mutations, which were all confirmed by Sanger methodology. In the Sanger sequencing confirmation, we also included another three relatives with NF1. Splicing alterations accounted for 50% of the mutations. One of them was caused by a deep intronic mutation (c.1260+1604A>G). Frameshift truncation and missense mutations corresponded to 30% and 20% of the pathogenic variants, respectively. In conclusion, we show the use of a simple and fast approach to screen, at once, the entire NF1 gene (exons and introns) for different types of pathogenic variations, including the deep intronic splicing mutations. PMID:27999334

  6. Hybridization Capture-Based Next-Generation Sequencing to Evaluate Coding Sequence and Deep Intronic Mutations in the NF1 Gene.

    PubMed

    Cunha, Karin Soares; Oliveira, Nathalia Silva; Fausto, Anna Karoline; de Souza, Carolina Cruz; Gros, Audrey; Bandres, Thomas; Idrissi, Yamina; Merlio, Jean-Philippe; de Moura Neto, Rodrigo Soares; Silva, Rosane; Geller, Mauro; Cappellen, David

    2016-12-17

    Neurofibromatosis 1 (NF1) is one of the most common genetic disorders and is caused by mutations in the NF1 gene. NF1 gene mutational analysis presents a considerable challenge because of its large size, existence of highly homologous pseudogenes located throughout the human genome, absence of mutational hotspots, and diversity of mutation types, including deep intronic splicing mutations. We aimed to evaluate the use of hybridization capture-based next-generation sequencing to screen coding and noncoding NF1 regions. Hybridization capture-based next-generation sequencing, with genomic DNA as starting material, was used to sequence the whole NF1 gene (exons and introns) from 11 unrelated individuals and 1 relative, who all had NF1. All of them met the NF1 clinical diagnostic criteria. We showed a mutation detection rate of 91% (10 out of 11). We identified eight recurrent and two novel mutations, which were all confirmed by Sanger methodology. In the Sanger sequencing confirmation, we also included another three relatives with NF1. Splicing alterations accounted for 50% of the mutations. One of them was caused by a deep intronic mutation (c.1260+1604A>G). Frameshift truncation and missense mutations corresponded to 30% and 20% of the pathogenic variants, respectively. In conclusion, we show the use of a simple and fast approach to screen, at once, the entire NF1 gene (exons and introns) for different types of pathogenic variations, including the deep intronic splicing mutations.

  7. DeepSNVMiner: a sequence analysis tool to detect emergent, rare mutations in subsets of cell populations.

    PubMed

    Andrews, T Daniel; Jeelall, Yogesh; Talaulikar, Dipti; Goodnow, Christopher C; Field, Matthew A

    2016-01-01

    Background. Massively parallel sequencing technology is being used to sequence highly diverse populations of DNA such as that derived from heterogeneous cell mixtures containing both wild-type and disease-related states. At the core of such molecule tagging techniques is the tagging and identification of sequence reads derived from individual input DNA molecules, which must be first computationally disambiguated to generate read groups sharing common sequence tags, with each read group representing a single input DNA molecule. This disambiguation typically generates huge numbers of read groups, each of which requires additional variant detection analysis steps to be run specific to each read group, thus representing a significant computational challenge. While sequencing technologies for producing these data are approaching maturity, the lack of available computational tools for analysing such heterogeneous sequence data represents an obstacle to the widespread adoption of this technology. Results. Using synthetic data we successfully detect unique variants at dilution levels of 1 in 1,000,000 molecules, and find DeepSNVMiner obtains significantly lower false positive and false negative rates compared to popular variant callers GATK, SAMTools, FreeBayes and LoFreq, particularly as the variant concentration levels decrease. In a dilution series with genomic DNA from two cell lines, we find DeepSNVMiner identifies a known somatic variant when present at concentrations of only 1 in 1,000 molecules in the input material, the lowest concentration amongst all variant callers tested. Conclusions. Here we present DeepSNVMiner, a tool to disambiguate tagged sequence groups and robustly identify sequence variants specific to subsets of starting DNA molecules that may indicate the presence of a disease. DeepSNVMiner is an automated workflow of custom sequence analysis utilities and open source tools able to differentiate somatic DNA variants from artefactual sequence

  8. Improved sequence learning with subthalamic nucleus deep brain stimulation: evidence for treatment-specific network modulation.

    PubMed

    Mure, Hideo; Tang, Chris C; Argyelan, Miklos; Ghilardi, Maria-Felice; Kaplitt, Michael G; Dhawan, Vijay; Eidelberg, David

    2012-02-22

    We used a network approach to study the effects of anti-parkinsonian treatment on motor sequence learning in humans. Eight Parkinson's disease (PD) patients with bilateral subthalamic nucleus (STN) deep brain stimulation underwent H₂¹⁵O positron emission tomography (PET) imaging to measure regional cerebral blood flow (rCBF) while they performed kinematically matched sequence learning and movement tasks at baseline and during stimulation. Network analysis revealed a significant learning-related spatial covariance pattern characterized by consistent increases in subject expression during stimulation (p = 0.008, permutation test). The network was associated with increased activity in the lateral cerebellum, dorsal premotor cortex, and parahippocampal gyrus, with covarying reductions in the supplementary motor area (SMA) and orbitofrontal cortex. Stimulation-mediated increases in network activity correlated with concurrent improvement in learning performance (p < 0.02). To determine whether similar changes occurred during dopaminergic pharmacotherapy, we studied the subjects during an intravenous levodopa infusion titrated to achieve a motor response equivalent to stimulation. Despite consistent improvement in motor ratings during infusion, levodopa did not alter learning performance or network activity. Analysis of learning-related rCBF in network regions revealed improvement in baseline abnormalities with STN stimulation but not levodopa. These effects were most pronounced in the SMA. In this region, a consistent rCBF response to stimulation was observed across subjects and trials (p = 0.01), although the levodopa response was not significant. These findings link the cognitive treatment response in PD to changes in the activity of a specific cerebello-premotor cortical network. Selective modulation of overactive SMA-STN projection pathways may underlie the improvement in learning found with stimulation.

  9. mRNA deep sequencing reveals 75 new genes and a complex transcriptional landscape in Mimivirus.

    PubMed

    Legendre, Matthieu; Audic, Stéphane; Poirot, Olivier; Hingamp, Pascal; Seltzer, Virginie; Byrne, Deborah; Lartigue, Audrey; Lescot, Magali; Bernadac, Alain; Poulain, Julie; Abergel, Chantal; Claverie, Jean-Michel

    2010-05-01

    Mimivirus, a virus infecting Acanthamoeba, is the prototype of the Mimiviridae, the latest addition to the nucleocytoplasmic large DNA viruses. The Mimivirus genome encodes close to 1000 proteins, many of them never before encountered in a virus, such as four amino-acyl tRNA synthetases. To explore the physiology of this exceptional virus and identify the genes involved in the building of its characteristic intracytoplasmic "virion factory," we coupled electron microscopy observations with the massively parallel pyrosequencing of the polyadenylated RNA fractions of Acanthamoeba castellanii cells at various times post-infection. We generated 633,346 reads, of which 322,904 correspond to Mimivirus transcripts. This first application of deep mRNA sequencing (454 Life Sciences [Roche] FLX) to a large DNA virus allowed the precise delineation of the 5' and 3' extremities of Mimivirus mRNAs and revealed 75 new transcripts including several noncoding RNAs. Mimivirus genes are expressed across a wide dynamic range, in a finely regulated manner broadly described by three main temporal classes: early, intermediate, and late. This RNA-seq study confirmed the AAAATTGA sequence as an early promoter element, as well as the presence of palindromes at most of the polyadenylation sites. It also revealed a new promoter element correlating with late gene expression, which is also prominent in Sputnik, the recently described Mimivirus "virophage." These results, validated genome-wide by the hybridization of total RNA extracted from infected Acanthamoeba cells on a tiling array (Agilent), will constitute the foundation on which to build subsequent functional studies of the Mimivirus/Acanthamoeba system.

  10. Targeted deep sequencing improves outcome stratification in chronic myelomonocytic leukemia with low risk cytogenetic features

    PubMed Central

    Palomo, Laura; Garcia, Olga; Arnan, Montse; Xicoy, Blanca; Fuster, Francisco; Cabezón, Marta; Coll, Rosa; Ademà, Vera; Grau, Javier; Jiménez, Maria-José; Pomares, Helena; Marcé, Sílvia; Mallo, Mar; Millá, Fuensanta; Alonso, Esther; Sureda, Anna; Gallardo, David; Feliu, Evarist; Ribera, Josep-Maria; Solé, Francesc; Zamora, Lurdes

    2016-01-01

    Clonal cytogenetic abnormalities are found in 20-30% of patients with chronic myelomonocytic leukemia (CMML), while gene mutations are present in >90% of cases. Patients with low risk cytogenetic features account for 80% of CMML cases and often fall into the low risk categories of CMML prognostic scoring systems, but the outcome differs considerably among them. We performed targeted deep sequencing of 83 myeloid-related genes in 56 CMML patients with low risk cytogenetic features or uninformative conventional cytogenetics (CC) at diagnosis, with the aim to identify the genetic characteristics of patients with a more aggressive disease. Targeted sequencing was also performed in a subset of these patients at time of acute myeloid leukemia (AML) transformation. Overall, 98% of patients harbored at least one mutation. Mutations in cell signaling genes were acquired at time of AML progression. Mutations in ASXL1, EZH2 and NRAS correlated with higher risk features and shorter overall survival (OS) and progression free survival (PFS). Patients with SRSF2 mutations associated with poorer OS, while absence of TET2 mutations (TET2wt) was predictive of shorter PFS. A decrease in OS and PFS was observed as the number of adverse risk gene mutations (ASXL1, EZH2, NRAS and SRSF2) increased. On multivariate analyses, CMML-specific scoring system (CPSS) and presence of adverse risk gene mutations remained significant for OS, while CPSS and TET2wt were predictive of PFS. These results confirm that mutation analysis can add prognostic value to patients with CMML and low risk cytogenetic features or uninformative CC. PMID:27486981

  11. Deep sequencing-based analysis of the anaerobic stimulon in Neisseria gonorrhoeae

    PubMed Central

    2011-01-01

    Background Maintenance of an anaerobic denitrification system in the obligate human pathogen, Neisseria gonorrhoeae, suggests that an anaerobic lifestyle may be important during the course of infection. Furthermore, mounting evidence suggests that reduction of host-produced nitric oxide has several immunomodulatory effects on the host. However, at this point there have been no studies analyzing the complete gonococcal transcriptome response to anaerobiosis. Here we performed deep sequencing to compare the gonococcal transcriptomes of aerobically and anaerobically grown cells. Using the information derived from this sequencing, we discuss the implications of the robust transcriptional response to anaerobic growth. Results We determined that 198 chromosomal genes were differentially expressed (~10% of the genome) in response to anaerobic conditions. We also observed a large induction of genes encoded within the cryptic plasmid, pJD1. Validation of RNA-seq data using translational-lacZ fusions or RT-PCR demonstrated the RNA-seq results to be very reproducible. Surprisingly, many genes of prophage origin were induced anaerobically, as well as several transcriptional regulators previously unknown to be involved in anaerobic growth. We also confirmed expression and regulation of a small RNA, likely a functional equivalent of fnrS in the Enterobacteriaceae family. We also determined that many genes found to be responsive to anaerobiosis have also been shown to be responsive to iron and/or oxidative stress. Conclusions Gonococci will be subject to many forms of environmental stress, including oxygen-limitation, during the course of infection. Here we determined that the anaerobic stimulon in gonococci was larger than previous studies would suggest. Many new targets for future research have been uncovered, and the results derived from this study may have helped to elucidate factors or mechanisms of virulence that may have otherwise been overlooked. PMID:21251255

  12. Sequence stratigraphy of Cenozoic deepwater deposits in the Perdido fold belt, Northwestern Deep Gulf of Mexico

    SciTech Connect

    Fiduk, J.C.; Weimer, P.; Trudgill, B.D.

    1996-12-31

    Analysis of 12,000 km of 2-D multifold seismic data shows three large Cenozoic wedges of deepwater deposits in the Perdido fold belt that differ in seismic facies, areal distribution, and potential reservoir geometries. Together, these three wedges reflect the changing positions of Cenozoic depocenters and record the evolution of the Perdido structural province. Lithologic interpretation is based upon seismic facies and analogous facies in other drilled areas in the Gulf of Mexico. (1) The Paleocene to middle Oligocene interval, which is strongly folded, reflects pre-growth deposition. Paleocene and Oligocene strata thicken westward and consist of medium to high amplitude, subparallel reflections of varying continuity. Broad channels and channel-levee systems are interpreted, suggesting turbidite deposition. These strata are interpreted as the down-dip equivalent of the Wilcox and Frio shallow-water depo-centers and are potentially sand-prone. Eocene strata are low amplitude, discontinuous, subparallel reflections interpreted to be shale-prone. (2) The upper Oligocene to upper Miocene interval consists of multiple well-developed sequences with variable amplitude, divergent reflections, many of which onlap against the fold crests. Sequences within this interval are often modified by erosion, faulting, and/or slumping against the folds. (3) The upper Miocene to Recent interval, which overlies most folds, consists of channel-levee, overbank, slump, and layered or amalgamated turbidite sheet deposits. These are similar to other coeval submarine fan sediments in the northern deep Gulf. Thus, the Cenozoic section in the Perdido fold belt is interpreted as mostly shale-prone, with some sand-prone intervals, based upon seismic facies, isopach thickening to the west, and similar producing facies elsewhere in the Gulf of Mexico.

  13. Sequence stratigraphy of Cenozoic deepwater deposits in the Perdido fold belt, Northwestern Deep Gulf of Mexico

    SciTech Connect

    Fiduk, J.C.; Weimer, P.; Trudgill, B.D.

    1996-01-01

    Analysis of 12,000 km of 2-D multifold seismic data shows three large Cenozoic wedges of deepwater deposits in the Perdido fold belt that differ in seismic facies, areal distribution, and potential reservoir geometries. Together, these three wedges reflect the changing positions of Cenozoic depocenters and record the evolution of the Perdido structural province. Lithologic interpretation is based upon seismic facies and analogous facies in other drilled areas in the Gulf of Mexico. (1) The Paleocene to middle Oligocene interval, which is strongly folded, reflects pre-growth deposition. Paleocene and Oligocene strata thicken westward and consist of medium to high amplitude, subparallel reflections of varying continuity. Broad channels and channel-levee systems are interpreted, suggesting turbidite deposition. These strata are interpreted as the down-dip equivalent of the Wilcox and Frio shallow-water depo-centers and are potentially sand-prone. Eocene strata are low amplitude, discontinuous, subparallel reflections interpreted to be shale-prone. (2) The upper Oligocene to upper Miocene interval consists of multiple well-developed sequences with variable amplitude, divergent reflections, many of which onlap against the fold crests. Sequences within this interval are often modified by erosion, faulting, and/or slumping against the folds. (3) The upper Miocene to Recent interval, which overlies most folds, consists of channel-levee, overbank, slump, and layered or amalgamated turbidite sheet deposits. These are similar to other coeval submarine fan sediments in the northern deep Gulf. Thus, the Cenozoic section in the Perdido fold belt is interpreted as mostly shale-prone, with some sand-prone intervals, based upon seismic facies, isopach thickening to the west, and similar producing facies elsewhere in the Gulf of Mexico.

  14. Deep sequencing of mycovirus-derived small RNAs from Botrytis species.

    PubMed

    Donaire, Livia; Ayllón, María A

    2016-08-31

    RNA silencing is an ancient regulatory mechanism operating in all eukaryotic cells. In fungi, it was first discovered in Neurospora crassa, although its potential as a defence mechanism against mycoviruses was first reported in Cryphonectria parasitica and, later, in several fungal species. There is little evidence of the antiviral potential of RNA silencing in the phytopathogenic species of the fungal genus Botrytis. Moreover, little is known about the RNA silencing components in these fungi, although the analysis of public genome databases identified two Dicer-like genes in B. cinerea, as in most of the ascomycetes sequenced to date. In this work, we used deep sequencing to study the virus-derived small RNA (vsiRNA) populations from different mycoviruses infecting field isolates of Botrytis spp. The mycoviruses under study belong to different genera and species, and have different types of genome [double-stranded RNA (dsRNA), (+)single-stranded RNA (ssRNA) and (-)ssRNA]. In general, vsiRNAs derived from mycoviruses are mostly of 21, 20 and 22 nucleotides in length, possess sense or antisense orientation, either in a similar ratio or with a predominance of sense polarity depending on the virus species, have predominantly U at their 5' end, and are unevenly distributed along the viral genome, showing conspicuous hotspots of vsiRNA accumulation. These characteristics reveal striking similarities with vsiRNAs produced by plant viruses, suggesting similar pathways of viral targeting in plants and fungi. We have shown that the fungal RNA silencing machinery acts against the mycoviruses used in this work in a similar manner independent of their viral or fungal origin.

  15. Discovering the Unknown: Improving Detection of Novel Species and Genera from Short Reads

    DOE PAGES

    Rosen, Gail L.; Polikar, Robi; Caseiro, Diamantino A.; ...

    2011-01-01

    High-throughput sequencing technologies enable metagenome profiling, simultaneous sequencing of multiple microbial species present within an environmental sample. Since metagenomic data includes sequence fragments (“reads”) from organisms that are absent from any database, new algorithms must be developed for the identification and annotation of novel sequence fragments. Homology-based techniques have been modified to detect novel species and genera, but, composition-based methods, have not been adapted. We develop a detection technique that can discriminate between “known” and “unknown” taxa, which can be used with composition-based methods, as well as a hybrid method. Unlike previous studies, we rigorously evaluate all algorithms for their ability to detect novel taxa. First, we show that the integration of a detector with a composition-based method performs significantly better than homology-based methods for the detection of novel species and genera, with best performance at finer taxonomic resolutions. Most importantly, we evaluate all the algorithms by introducing an “unknown” class and show that the modified version of PhymmBL has similar or better overall classification performance than the other modified algorithms, especially for the species-level and ultrashort reads. Finally, we evaluate the performance of several algorithms on a real acid mine drainage dataset.

  16. Discovering the unknown: improving detection of novel species and genera from short reads.

    PubMed

    Rosen, Gail L; Polikar, Robi; Caseiro, Diamantino A; Essinger, Steven D; Sokhansanj, Bahrad A

    2011-01-01

    High-throughput sequencing technologies enable metagenome profiling, simultaneous sequencing of multiple microbial species present within an environmental sample. Since metagenomic data includes sequence fragments ("reads") from organisms that are absent from any database, new algorithms must be developed for the identification and annotation of novel sequence fragments. Homology-based techniques have been modified to detect novel species and genera, but, composition-based methods, have not been adapted. We develop a detection technique that can discriminate between "known" and "unknown" taxa, which can be used with composition-based methods, as well as a hybrid method. Unlike previous studies, we rigorously evaluate all algorithms for their ability to detect novel taxa. First, we show that the integration of a detector with a composition-based method performs significantly better than homology-based methods for the detection of novel species and genera, with best performance at finer taxonomic resolutions. Most importantly, we evaluate all the algorithms by introducing an "unknown" class and show that the modified version of PhymmBL has similar or better overall classification performance than the other modified algorithms, especially for the species-level and ultrashort reads. Finally, we evaluate the performance of several algorithms on a real acid mine drainage dataset.

  17. Discovering the Unknown: Improving Detection of Novel Species and Genera from Short Reads

    PubMed Central

    Rosen, Gail L.; Polikar, Robi; Caseiro, Diamantino A.; Essinger, Steven D.; Sokhansanj, Bahrad A.

    2011-01-01

    High-throughput sequencing technologies enable metagenome profiling, simultaneous sequencing of multiple microbial species present within an environmental sample. Since metagenomic data includes sequence fragments (“reads”) from organisms that are absent from any database, new algorithms must be developed for the identification and annotation of novel sequence fragments. Homology-based techniques have been modified to detect novel species and genera, but, composition-based methods, have not been adapted. We develop a detection technique that can discriminate between “known” and “unknown” taxa, which can be used with composition-based methods, as well as a hybrid method. Unlike previous studies, we rigorously evaluate all algorithms for their ability to detect novel taxa. First, we show that the integration of a detector with a composition-based method performs significantly better than homology-based methods for the detection of novel species and genera, with best performance at finer taxonomic resolutions. Most importantly, we evaluate all the algorithms by introducing an “unknown” class and show that the modified version of PhymmBL has similar or better overall classification performance than the other modified algorithms, especially for the species-level and ultrashort reads. Finally, we evaluate the performance of several algorithms on a real acid mine drainage dataset. PMID:21541181

  18. The ribosome profiling strategy for monitoring translation in vivo by deep sequencing of ribosome-protected mRNA fragments

    PubMed Central

    Ingolia, Nicholas T.; Brar, Gloria A.; Rouskin, Silvia; McGeachy, Anna M.; Weissman, Jonathan S.

    2012-01-01

    Recent studies highlight the importance of translational control in determining protein abundance, underscoring the value of measuring gene expression at the level of translation. We present a protocol for genome-wide, quantitative analysis of in vivo translation by deep sequencing. This ribosome profiling approach maps the exact positions of ribosomes on transcripts by nuclease footprinting. The nuclease-protected mRNA fragments are converted into a DNA library suitable for deep sequencing using a strategy that minimizes bias. The abundance of different footprint fragments in deep sequencing data reports on the amount of translation of a gene. Additionally, footprints reveal the exact regions of the transcriptome that are translated. To better define translated reading frames, we describe an adaptation that reveals the sites of translation initiation by pre-treating cells with harringtonine to immobilize initiating ribosomes. The protocol we describe requires 5–7 days to generate a completed ribosome profiling sequencing library. Sequencing and data analysis require a further 4–5 days. PMID:22836135

  19. Filtering of deep sequencing data reveals the existence of abundant Dicer-dependent small RNAs derived from tRNAs

    PubMed Central

    Cole, Christian; Sobala, Andrew; Lu, Cheng; Thatcher, Shawn R.; Bowman, Andrew; Brown, John W.S.; Green, Pamela J.; Barton, Geoffrey J.; Hutvagner, Gyorgy

    2009-01-01

    Deep sequencing technologies such as Illumina, SOLiD, and 454 platforms have become very powerful tools in discovering and quantifying small RNAs in diverse organisms. Sequencing small RNA fractions always identifies RNAs derived from abundant RNA species such as rRNAs, tRNAs, snRNA, and snoRNA, and they are widely considered to be random degradation products. We carried out bioinformatic analysis of deep sequenced HeLa RNA and after quality filtering, identified highly abundant small RNA fragments, derived from mature tRNAs that are likely produced by specific processing rather than from random degradation. Moreover, we showed that the processing of small RNAs derived from tRNAGln is dependent on Dicer in vivo and that Dicer cleaves the tRNA in vitro. PMID:19850906

  20. An effective differential expression analysis of deep-sequencing data based on the Poisson log-normal model.

    PubMed

    Wu, Jun; Zhao, Xiaodong; Lin, Zongli; Shao, Zhifeng

    2015-04-01

    The tremendous amount of deep-sequencing data, in the form of digital sequence reads, has greatly improved our understanding of biomedical science. To mine useful information from such data, a proper distribution for modeling the full range of the count data and accurate parameter estimation are required. In this paper, we propose a method, called "DEPln," for differential expression analysis based on the Poisson log-normal (PLN) distribution with an accurate parameter estimation strategy, which aims to overcome the inconvenience in the mathematical analysis of the traditional PLN distribution. The performance of our proposed method is validated by both synthetic and real data. Experimental results indicate that our method outperforms the traditional methods in terms of discrimination ability and achieves a good tradeoff between recall and precision. Thus, our work provides a new approach for gene expression analysis and has strong potential in deep-sequencing based research.
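
    For orientation, the Poisson log-normal distribution named in this abstract can be written as a Poisson count whose rate is itself log-normally distributed. The symbols y, \lambda, \mu and \sigma below are standard textbook notation assumed here, not taken from the abstract, and this is the generic PLN form rather than the DEPln estimation strategy itself.

```latex
\[
P(Y = y) \;=\; \int_{0}^{\infty}
  \frac{e^{-\lambda}\,\lambda^{y}}{y!}\;
  \frac{1}{\lambda\,\sigma\sqrt{2\pi}}
  \exp\!\left(-\frac{(\ln\lambda-\mu)^{2}}{2\sigma^{2}}\right)
  d\lambda,
\qquad y = 0, 1, 2, \dots
\]
```

    The integral has no closed form and is normally evaluated numerically, which is presumably the analytical inconvenience the abstract refers to.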

  1. Complete genome sequence of Southern tomato virus naturally infecting tomatoes in Bangladesh using small RNA deep sequencing

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The complete genome sequence of a Southern tomato virus (STV) isolate on tomato plants in a seed production field in Bangladesh was obtained for the first time using next generation sequencing. The identified isolate STV_BD-13 shares high degree of sequence identity (99%) with several known STV isol...

  2. Age-related decrease in TCR repertoire diversity measured with deep and normalized sequence profiling.

    PubMed

    Britanova, Olga V; Putintseva, Ekaterina V; Shugay, Mikhail; Merzlyak, Ekaterina M; Turchaninova, Maria A; Staroverov, Dmitriy B; Bolotin, Dmitriy A; Lukyanov, Sergey; Bogdanova, Ekaterina A; Mamedov, Ilgar Z; Lebedev, Yuriy B; Chudakov, Dmitriy M

    2014-03-15

    The decrease of TCR diversity with aging has never been studied by direct methods. In this study, we combined high-throughput Illumina sequencing with unique cDNA molecular identifier technology to achieve deep and precisely normalized profiling of TCR β repertoires in 39 healthy donors aged 6-90 y. We demonstrate that TCR β diversity per 10(6) T cells decreases roughly linearly with age, with significant reduction already apparent by age 40. The percentage of naive T cells showed a strong correlation with measured TCR diversity and decreased linearly up to age 70. Remarkably, the oldest group (average age 82 y) was characterized by a higher percentage of naive CD4(+) T cells, lower abundance of expanded clones, and increased TCR diversity compared with the previous age group (average age 62 y), suggesting the influence of age selection and association of these three related parameters with longevity. Interestingly, cross-analysis of individual TCR β repertoires revealed a set >10,000 of the most representative public TCR β clonotypes, whose abundance among the top 100,000 clones correlated with TCR diversity and decreased with aging.

  3. Analysis of Plasmodium falciparum diversity in natural infections by deep sequencing.

    PubMed

    Manske, Magnus; Miotto, Olivo; Campino, Susana; Auburn, Sarah; Almagro-Garcia, Jacob; Maslen, Gareth; O'Brien, Jack; Djimde, Abdoulaye; Doumbo, Ogobara; Zongo, Issaka; Ouedraogo, Jean-Bosco; Michon, Pascal; Mueller, Ivo; Siba, Peter; Nzila, Alexis; Borrmann, Steffen; Kiara, Steven M; Marsh, Kevin; Jiang, Hongying; Su, Xin-Zhuan; Amaratunga, Chanaki; Fairhurst, Rick; Socheat, Duong; Nosten, Francois; Imwong, Mallika; White, Nicholas J; Sanders, Mandy; Anastasi, Elisa; Alcock, Dan; Drury, Eleanor; Oyola, Samuel; Quail, Michael A; Turner, Daniel J; Ruano-Rubio, Valentin; Jyothi, Dushyanth; Amenga-Etego, Lucas; Hubbart, Christina; Jeffreys, Anna; Rowlands, Kate; Sutherland, Colin; Roper, Cally; Mangano, Valentina; Modiano, David; Tan, John C; Ferdig, Michael T; Amambua-Ngwa, Alfred; Conway, David J; Takala-Harrison, Shannon; Plowe, Christopher V; Rayner, Julian C; Rockett, Kirk A; Clark, Taane G; Newbold, Chris I; Berriman, Matthew; MacInnis, Bronwyn; Kwiatkowski, Dominic P

    2012-07-19

    Malaria elimination strategies require surveillance of the parasite population for genetic changes that demand a public health response, such as new forms of drug resistance. Here we describe methods for the large-scale analysis of genetic variation in Plasmodium falciparum by deep sequencing of parasite DNA obtained from the blood of patients with malaria, either directly or after short-term culture. Analysis of 86,158 exonic single nucleotide polymorphisms that passed genotyping quality control in 227 samples from Africa, Asia and Oceania provides genome-wide estimates of allele frequency distribution, population structure and linkage disequilibrium. By comparing the genetic diversity of individual infections with that of the local parasite population, we derive a metric of within-host diversity that is related to the level of inbreeding in the population. An open-access web application has been established for the exploration of regional differences in allele frequency and of highly differentiated loci in the P. falciparum genome.

  4. Metagenomes obtained by 'deep sequencing' - what do they tell about the enhanced biological phosphorus removal communities?

    PubMed

    Albertsen, Mads; Saunders, Aaron M; Nielsen, Kåre L; Nielsen, Per H

    2013-01-01

    Metagenomics enables studies of the genomic potential of complex microbial communities by sequencing bulk genomic DNA directly from the environment. Knowledge of the genetic potential of a community can be used to formulate and test ecological hypotheses about stability and performance. In this study deep metagenomics and fluorescence in situ hybridization (FISH) were used to study a full-scale wastewater treatment plant with enhanced biological phosphorus removal (EBPR), and the results were compared to an existing EBPR metagenome. EBPR is a widely used process that relies on a complex community of microorganisms to function properly. Insight into community and species level stability and dynamics is valuable for knowledge-driven optimization of the EBPR process. The metagenomes of the EBPR communities were distinct compared to metagenomes of communities from a wide range of other environments, which could be attributed to selection pressures of the EBPR process. The metabolic potential of one of the key microorganisms in the EBPR process, Accumulibacter, was investigated in more detail in the two plants, revealing a potential importance of phage predation on the dynamics of Accumulibacter populations. The results demonstrate that metagenomics can be used as a powerful tool for system wide characterization of the EBPR community as well as for a deeper understanding of the function of specific community members. Furthermore, we discuss and illustrate some of the general pitfalls in metagenomics and stress the need of additional DNA extraction independent information in metagenome studies.

  5. New developments of alignment-free sequence comparison: measures, statistics and next-generation sequencing

    PubMed Central

    Song, Kai; Ren, Jie; Reinert, Gesine; Deng, Minghua

    2014-01-01

    With the development of next-generation sequencing (NGS) technologies, a large amount of short read data has been generated. Assembly of these short reads can be challenging for genomes and metagenomes without template sequences, making alignment-based genome sequence comparison difficult. In addition, sequence reads from NGS can come from different regions of various genomes and they may not be alignable. Sequence signature-based methods for genome comparison based on the frequencies of word patterns in genomes and metagenomes can potentially be useful for the analysis of short reads data from NGS. Here we review the recent development of alignment-free genome and metagenome comparison based on the frequencies of word patterns with emphasis on the dissimilarity measures between sequences, the statistical power of these measures when two sequences are related and the applications of these measures to NGS data. PMID:24064230
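
    As a rough illustration of comparing sequences by word-pattern frequencies without alignment, the sketch below counts overlapping k-mers in two sequences and takes the Euclidean distance between their relative-frequency vectors. The function names, the default k = 3, and the choice of Euclidean distance are assumptions made for this example; it is a deliberately simple stand-in and not necessarily one of the dissimilarity measures covered by the review.

```python
from collections import Counter
from math import sqrt


def kmer_frequencies(seq, k):
    """Return relative frequencies of all overlapping k-mers (words) in seq."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    if total == 0:
        # Sequence shorter than k: no words to count.
        return {}
    return {word: n / total for word, n in counts.items()}


def euclidean_dissimilarity(seq_a, seq_b, k=3):
    """Euclidean distance between the k-mer frequency vectors of two sequences.

    Hypothetical helper for illustration only; no alignment is performed.
    """
    freq_a = kmer_frequencies(seq_a, k)
    freq_b = kmer_frequencies(seq_b, k)
    words = set(freq_a) | set(freq_b)
    return sqrt(sum((freq_a.get(w, 0.0) - freq_b.get(w, 0.0)) ** 2
                    for w in words))


if __name__ == "__main__":
    read_1 = "ACGTACGTACGT"
    read_2 = "ACGTTCGTACGA"
    print(euclidean_dissimilarity(read_1, read_2, k=3))
```

    Because only word counts are used, the two inputs can have different lengths and need not come from alignable regions, which is the situation the abstract describes for short reads.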

  6. Patchiness of deep-sea benthic Foraminifera across the Southern Ocean: Insights from high-throughput DNA sequencing

    NASA Astrophysics Data System (ADS)

    Lejzerowicz, Franck; Esling, Philippe; Pawlowski, Jan

    2014-10-01

    Spatial patchiness is a natural feature that strongly influences the level of species richness we perceive in surface sediments sampled in the deep-sea. Recent environmental DNA (eDNA) surveys of benthic micro- and meiofauna confirmed this exceptional richness. However, it is unknown to which extent the results of these studies, based usually on few grams of sediment, are affected by spatial patchiness of deep-sea benthos. Here, we analyse the eDNA diversity of Foraminifera in 42 deep-sea sediment samples collected across different scales in the Southern Ocean. At three stations, we deployed at least twice the multicorer and from each multicorer cast, we subsampled 3 sediment replicates per core for 2 cores. Using high-throughput sequencing (HTS), we generated over 2.35 million high-quality sequences that we clustered into 451 operational taxonomic units (OTUs). The majority of OTUs were assigned to the monothalamous (single-chambered) taxa and environmental clades. On average, a one-gram sediment sample captures 57.9% of the overall OTU diversity found in a single core, while three replicates cover at most 61.9% of the diversity found in a station. The OTUs found in all the replicates of each core gather up to 87.9% of the total sequenced reads, but only represent from 12.2% to 30% of the OTUs found in one core. These OTUs represent the most abundant species, among which dominate environmental lineages. The majority of the OTUs are represented by few sequences comprising several well-known deep-sea morphospecies or remaining unassigned. It is crucial to study wider arrays of sample and PCR replicates as well as RNA together with DNA in order to overcome biases stemming from deep-sea patchiness and molecular methods.

  7. Development of high-throughput SNP-based genotyping in Acacia auriculiformis x A. mangium hybrids using short-read transcriptome data

    PubMed Central

    2012-01-01

    Background Next Generation Sequencing has provided comprehensive, affordable and high-throughput DNA sequences for Single Nucleotide Polymorphism (SNP) discovery in Acacia auriculiformis and Acacia mangium. Like other non-model species, SNP detection and genotyping in Acacia are challenging due to lack of genome sequences. The main objective of this study is to develop the first high-throughput SNP genotyping assay for linkage map construction of A. auriculiformis x A. mangium hybrids. Results We identified a total of 37,786 putative SNPs by aligning short read transcriptome data from four parents of two Acacia hybrid mapping populations using Bowtie against 7,839 de novo transcriptome contigs. Given a set of 10 validated SNPs from two lignin genes, our in silico SNP detection approach is highly accurate (100%) compared to the traditional in vitro approach (44%). Further validation of 96 SNPs using Illumina GoldenGate Assay gave an overall assay success rate of 89.6% and conversion rate of 37.5%. We explored possible factors lowering assay success rate by predicting exon-intron boundaries and paralogous genes of Acacia contigs using Medicago truncatula genome as reference. This assessment revealed that presence of exon-intron boundary is the main cause (50%) of assay failure. Subsequent SNPs filtering and improved assay design resulted in assay success and conversion rate of 92.4% and 57.4%, respectively based on 768 SNPs genotyping. Analysis of clustering patterns revealed that 27.6% of the assays were not reproducible and flanking sequence might play a role in determining cluster compression. In addition, we identified a total of 258 and 319 polymorphic SNPs in A. auriculiformis and A. mangium natural germplasms, respectively. Conclusion We have successfully discovered a large number of SNP markers in A. auriculiformis x A. mangium hybrids using next generation transcriptome sequencing. By using a reference genome from the most closely related species, we

  8. MicroRNA Discovery and Analysis of Pinewood Nematode Bursaphelenchus xylophilus by Deep Sequencing

    PubMed Central

    Huang, Qi-Xing; Cheng, Xin-Yue; Mao, Zhen-Chuan; Wang, Yun-Sheng; Zhao, Li-Lin; Yan, Xia; Ferris, Virginia R.; Xu, Ru-Mei; Xie, Bing-Yan

    2010-01-01

    Background MicroRNAs (miRNAs) are considered to be very important in regulating the growth, development, behavior and stress response in animals and plants in post-transcriptional gene regulation. Pinewood nematode, Bursaphelenchus xylophilus, is an important invasive plant parasitic nematode in Asia. To have a comprehensive knowledge about miRNAs of the nematode is necessary for further in-depth study on roles of miRNAs in the ecological adaptation of the invasive species. Methods and Findings Five small RNA libraries were constructed and sequenced by Illumina/Solexa deep-sequencing technology. A total of 810 miRNA candidates (49 conserved and 761 novel) were predicted by a computational pipeline, of which 57 miRNAs (20 conserved and 37 novel) encoded by 53 miRNA precursors were identified by experimental methods. Ten novel miRNAs were considered to be species-specific miRNAs of B. xylophilus. Comparison of expression profiles of miRNAs in the five small RNA libraries showed that many miRNAs exhibited obviously different expression levels in the third-stage dispersal juvenile and at a cold-stressed status. Most of the miRNAs exhibited obviously down-regulated expression in the dispersal stage. But differences among the three geographic libraries were not prominent. A total of 979 genes were predicted to be targets of these authentic miRNAs. Among them, seven heat shock protein genes were targeted by 14 miRNAs, and six FMRFamide-like neuropeptides genes were targeted by 17 miRNAs. A real-time quantitative polymerase chain reaction was used to quantify the mRNA expression levels of target genes. Conclusions Basing on the fact that a negative correlation existed between the expression profiles of miRNAs and the mRNA expression profiles of their target genes (hsp, flp) by comparing those of the nematodes at a cold stressed status and a normal status, we suggested that miRNAs might participate in ecological adaptation and behavior regulation of the nematode. This is

  9. Deep Sequencing Reveals Potential Antigenic Variants at Low Frequencies in Influenza A Virus-Infected Humans

    PubMed Central

    Dinis, Jorge M.; Florek, Nicholas W.; Fatola, Omolayo O.; Moncla, Louise H.; Mutschler, James P.; Charlier, Olivia K.; Meece, Jennifer K.; Belongia, Edward A.

    2016-01-01

    ABSTRACT Influenza vaccines must be frequently reformulated to account for antigenic changes in the viral envelope protein, hemagglutinin (HA). The rapid evolution of influenza virus under immune pressure is likely enhanced by the virus's genetic diversity within a host, although antigenic change has rarely been investigated on the level of individual infected humans. We used deep sequencing to characterize the between- and within-host genetic diversity of influenza viruses in a cohort of patients that included individuals who were vaccinated and then infected in the same season. We characterized influenza HA segments from the predominant circulating influenza A subtypes during the 2012-2013 (H3N2) and 2013-2014 (pandemic H1N1; H1N1pdm) flu seasons. We found that HA consensus sequences were similar in nonvaccinated and vaccinated subjects. In both groups, purifying selection was the dominant force shaping HA genetic diversity. Interestingly, viruses from multiple individuals harbored low-frequency mutations encoding amino acid substitutions in HA antigenic sites at or near the receptor-binding domain. These mutations included two substitutions in H1N1pdm viruses, G158K and N159K, which were recently found to confer escape from virus-specific antibodies. These findings raise the possibility that influenza antigenic diversity can be generated within individual human hosts but may not become fixed in the viral population even when they would be expected to have a strong fitness advantage. Understanding constraints on influenza antigenic evolution within individual hosts may elucidate potential future pathways of antigenic evolution at the population level. IMPORTANCE Influenza vaccines must be frequently reformulated due to the virus's rapid evolution rate. We know that influenza viruses exist within each infected host as a “swarm” of genetically distinct viruses, but the role of this within-host diversity in the antigenic evolution of influenza has been unclear

  10. MicroRNA deep-sequencing reveals master regulators of follicular and papillary thyroid tumors.

    PubMed

    Mancikova, Veronika; Castelblanco, Esmeralda; Pineiro-Yanez, Elena; Perales-Paton, Javier; de Cubas, Aguirre A; Inglada-Perez, Lucia; Matias-Guiu, Xavier; Capel, Ismael; Bella, Maria; Lerma, Enrique; Riesco-Eizaguirre, Garcilaso; Santisteban, Pilar; Maravall, Francisco; Mauricio, Didac; Al-Shahrour, Fatima; Robledo, Mercedes

    2015-06-01

    MicroRNA deregulation could be a crucial event in thyroid carcinogenesis. However, current knowledge is based on studies that have used inherently biased methods. Thus, we aimed to define in an unbiased way a list of deregulated microRNAs in well-differentiated thyroid cancer in order to identify diagnostic and prognostic markers. We performed a microRNA deep-sequencing study using the largest well-differentiated thyroid tumor collection reported to date, comprising 127 molecularly characterized tumors with follicular or papillary patterns of growth and available clinical follow-up data, and 17 normal tissue samples. Furthermore, we integrated microRNA and gene expression data for the same tumors to propose targets for the novel molecules identified. Two main microRNA expression profiles were identified: one common for follicular-pattern tumors, and a second for papillary tumors. Follicular tumors showed a notable overexpression of several members of miR-515 family, and downregulation of the novel microRNA miR-1247. Among papillary tumors, top upregulated microRNAs were miR-146b and the miR-221~222 cluster, while miR-1179 was downregulated. BRAF-positive samples displayed extreme downregulation of miR-7 and -204. The identification of the predicted targets for the novel molecules gave insights into the proliferative potential of the transformed follicular cell. Finally, by integrating clinical follow-up information with microRNA expression, we propose a prediction model for disease relapse based on expression of two miRNAs (miR-192 and let-7a) and several other clinicopathological features. This comprehensive study complements the existing knowledge about deregulated microRNAs in the development of well-differentiated thyroid cancer and identifies novel markers associated with recurrence-free survival.

  11. A deep sequencing approach to uncover the miRNOME in the human heart.

    PubMed

    Leptidis, Stefanos; El Azzouzi, Hamid; Lok, Sjoukje I; de Weger, Roel; Olieslagers, Servé; Kisters, Natasja; Silva, Gustavo J; Heymans, Stephane; Cuppen, Edwin; Berezikov, Eugene; De Windt, Leon J; da Costa Martins, Paula

    2013-01-01

    MicroRNAs (miRNAs) are a class of non-coding RNAs of ∼22 nucleotides in length, and constitute a novel class of gene regulators by imperfect base-pairing to the 3'UTR of protein encoding messenger RNAs. Growing evidence indicates that miRNAs are implicated in several pathological processes in myocardial disease. In recent years, we have witnessed several profiling attempts using high-density oligonucleotide array-based approaches to identify the complete miRNA content (miRNOME) in the healthy and diseased mammalian heart. These efforts have demonstrated that the failing heart displays differential expression of several dozens of miRNAs. While the total number of experimentally validated human miRNAs is roughly two thousand, the number of expressed miRNAs in the human myocardium remains elusive. Our objective was to perform an unbiased assay to identify the miRNOME of the human heart, both under physiological and pathophysiological conditions. We used deep sequencing and bioinformatics to annotate and quantify microRNA expression in healthy and diseased human heart (heart failure secondary to hypertrophic or dilated cardiomyopathy). Our results indicate that the human heart expresses >800 miRNAs, the majority of which have not been annotated or described so far and some of which are unique to primate species. Furthermore, >250 miRNAs show differential and etiology-dependent expression in human dilated cardiomyopathy (DCM) or hypertrophic cardiomyopathy (HCM). The human cardiac miRNOME still possesses a large number of miRNAs that remain virtually unexplored. The current study provides a starting point for a more comprehensive understanding of the role of miRNAs in regulating human heart disease.

  12. Deep sequencing reveals microRNAs predictive of antiangiogenic drug response

    PubMed Central

    García-Donas, Jesús; Beuselinck, Benoit; Inglada-Pérez, Lucía; Graña, Osvaldo; Schöffski, Patrick; Wozniak, Agnieszka; Bechter, Oliver; Apellániz-Ruiz, Maria; Leandro-García, Luis Javier; Esteban, Emilio; Castellano, Daniel E.; González del Alba, Aranzazu; Climent, Miguel Angel; Hernando, Susana; Arranz, José Angel; Morente, Manuel; Pisano, David G.; Robledo, Mercedes

    2016-01-01

    The majority of metastatic renal cell carcinoma (RCC) patients are treated with tyrosine kinase inhibitors (TKI) in first-line treatment; however, a fraction are refractory to these antiangiogenic drugs. MicroRNAs (miRNAs) are regulatory molecules proven to be accurate biomarkers in cancer. Here, we identified miRNAs predictive of progressive disease under TKI treatment through deep sequencing of 74 metastatic clear cell RCC cases uniformly treated with these drugs. Twenty-nine miRNAs were differentially expressed in the tumors of patients who progressed under TKI therapy (P values from 6 × 10^-9 to 3 × 10^-3). Among 6 miRNAs selected for validation in an independent series, the most relevant associations corresponded to miR-1307-3p, miR-155-5p, and miR-221-3p (P = 4.6 × 10^-3, 6.5 × 10^-3, and 3.4 × 10^-2, respectively). Furthermore, a 2 miRNA–based classifier discriminated individuals with progressive disease upon TKI treatment (AUC = 0.75, 95% CI, 0.64–0.85; P = 1.3 × 10^-4) with better predictive value than clinicopathological risk factors commonly used. We also identified miRNAs significantly associated with progression-free survival and overall survival (P = 6.8 × 10^-8 and 7.8 × 10^-7 for top hits, respectively), and 7 overlapped with early progressive disease. In conclusion, this is the first miRNome comprehensive study, to our knowledge, that demonstrates a predictive value of miRNAs for TKI response and provides a new set of relevant markers that can help rationalize metastatic RCC treatment. PMID:27699216

  13. Deep sequencing reveals a novel closterovirus associated with wild rose leaf rosette disease.

    PubMed

    He, Yan; Yang, Zuokun; Hong, Ni; Wang, Guoping; Ning, Guogui; Xu, Wenxing

    2015-06-01

    A bizarre virus-like symptom of a leaf rosette formed by dense small leaves on branches of wild roses (Rosa multiflora Thunb.), designated as 'wild rose leaf rosette disease' (WRLRD), was observed in China. To investigate the presumed causal virus, a wild rose sample affected by WRLRD was subjected to deep sequencing of small interfering RNAs (siRNAs) for a complete survey of the infecting viruses and viroids. The assembly of siRNAs led to the reconstruction of the complete genomes of three known viruses, namely Apple stem grooving virus (ASGV), Blackberry chlorotic ringspot virus (BCRV) and Prunus necrotic ringspot virus (PNRSV), and of a novel virus provisionally named 'rose leaf rosette-associated virus' (RLRaV). Phylogenetic analysis clearly placed RLRaV alongside members of the genus Closterovirus, family Closteroviridae. Genome organization of RLRaV RNA (17,653 nucleotides) showed 13 open reading frames (ORFs), except ORF1 and the quintuple gene block, most of which showed no significant similarities with known viral proteins, but, instead, had detectable identities to fungal or bacterial proteins. Additional novel molecular features indicated that RLRaV seems to be the most complex virus among the known genus members. To our knowledge, this is the first report of WRLRD and its associated closterovirus, as well as two ilarviruses and one capilovirus, infecting wild roses. Our findings present novel information about the closterovirus and the aetiology of this rose disease which should facilitate its control. More importantly, the novel features of RLRaV help to clarify the molecular and evolutionary features of the closterovirus.

  14. Acyclic Identification of Aptamers for Human alpha-Thrombin Using Over-Represented Libraries and Deep Sequencing

    PubMed Central

    Kupakuwana, Gillian V.; Crill, James E.; McPike, Mark P.; Borer, Philip N.

    2011-01-01

    Background Aptamers are oligonucleotides that bind proteins and other targets with high affinity and selectivity. Twenty years ago elements of natural selection were adapted to in vitro selection in order to distinguish aptamers among randomized sequence libraries. The primary bottleneck in traditional aptamer discovery is multiple cycles of in vitro evolution. Methodology/Principal Findings We show that over-representation of sequences in aptamer libraries and deep sequencing enables acyclic identification of aptamers. We demonstrated this by isolating a known family of aptamers for human α-thrombin. Aptamers were found within a library containing an average of 56,000 copies of each possible randomized 15mer segment. The high affinity sequences were counted many times above the background in 2–6 million reads. Clustering analysis of sequences with more than 10 counts distinguished two sequence motifs with candidates at high abundance. Motif I contained the previously observed consensus 15mer, Thb1 (46,000 counts), and related variants with mostly G/T substitutions; secondary analysis showed that affinity for thrombin correlated with abundance (Kd = 12 nM for Thb1). The signal-to-noise ratio for this experiment was roughly 10,000∶1 for Thb1. Motif II was unrelated to Thb1 with the leading candidate (29,000 counts) being a novel aptamer against hexose sugars in the storage and elution buffers for Concanavalin A (Kd = 0.5 µM for α-methyl-mannoside); ConA was used to immobilize α-thrombin. Conclusions/Significance Over-representation together with deep sequencing can dramatically shorten the discovery process, distinguish aptamers having a wide range of affinity for the target, allow an exhaustive search of the sequence space within a simplified library, reduce the quantity of the target required, eliminate cycling artifacts, and should allow multiplexing of sequencing experiments and targets. PMID:21625587
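
    The counting step described in this abstract (tally each sequenced 15mer, then keep those seen more than 10 times) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the toy reads, and the placeholder sequences are all assumptions made for the example, and the clustering into motifs is not shown.

```python
from collections import Counter


def library_size(length=15, alphabet=4):
    """Number of distinct sequences in a fully randomized segment."""
    return alphabet ** length


def count_candidates(reads, min_count=10):
    """Return (sequence, count) pairs seen more than min_count times,
    most abundant first."""
    counts = Counter(reads)
    return [(seq, n) for seq, n in counts.most_common() if n > min_count]


# Toy reads with made-up placeholder sequences (not real aptamers).
reads = (
    ["AAAAACCCCCGGGGG"] * 12
    + ["ACGTACGTACGTACG"] * 3
    + ["TTTTTGGGGGCCCCC"] * 11
)

candidates = count_candidates(reads)
for seq, n in candidates:
    print(seq, n)
```

    A randomized 15mer has 4^15 (about 1.07 billion) possible sequences, which is why a threshold on raw counts is only meaningful when each sequence is heavily over-represented in the starting library.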

  15. Microbial Dark Matter: Unusual intervening sequences in 16S rRNA genes of candidate phyla from the deep subsurface

    SciTech Connect

    Jarett, Jessica; Stepanauskas, Ramunas; Kieft, Thomas; Onstott, Tullis; Woyke, Tanja

    2014-03-17

    The Microbial Dark Matter project has sequenced genomes from over 200 single cells from candidate phyla, greatly expanding our knowledge of the ecology, inferred metabolism, and evolution of these widely distributed, yet poorly understood lineages. The second phase of this project aims to sequence an additional 800 single cells from known as well as potentially novel candidate phyla derived from a variety of environments. In order to identify whole genome amplified single cells, screening based on phylogenetic placement of 16S rRNA gene sequences is being conducted. Briefly, derived 16S rRNA gene sequences are aligned to a custom version of the Greengenes reference database and added to a reference tree in ARB using parsimony. In multiple samples from deep subsurface habitats but not from other habitats, a large number of sequences proved difficult to align and therefore to place in the tree. Based on comparisons to reference sequences and structural alignments using SSU-ALIGN, many of these "difficult" sequences appear to originate from candidate phyla, and contain intervening sequences (IVSs) within the 16S rRNA genes. These IVSs are short (39 - 79 nt) and do not appear to be self-splicing or to contain open reading frames. IVSs were found in the loop regions of stem-loop structures in several different taxonomic groups. Phylogenetic placement of sequences is strongly affected by IVSs; two out of three groups investigated were classified as different phyla after their removal. Based on data from samples screened in this project, IVSs appear to be more common in microbes occurring in deep subsurface habitats, although the reasons for this remain elusive.

  16. Deep sequencing reveals the complete genome and evidence for transcriptional activity of the first virus-like sequences identified in Aristotelia chilensis (Maqui Berry).

    PubMed

    Villacreses, Javier; Rojas-Herrera, Marcelo; Sánchez, Carolina; Hewstone, Nicole; Undurraga, Soledad F; Alzate, Juan F; Manque, Patricio; Maracaja-Coutinho, Vinicius; Polanco, Victor

    2015-04-03

    Here, we report the genome sequence and evidence for transcriptional activity of a virus-like element in the native Chilean berry tree Aristotelia chilensis. We propose to name the endogenous sequence as Aristotelia chilensis Virus 1 (AcV1). High-throughput sequencing of the genome of this tree uncovered an endogenous viral element, with a size of 7122 bp, corresponding to the complete genome of AcV1. Its sequence contains three open reading frames (ORFs): ORFs 1 and 2 share 66%-73% amino acid similarity with members of the Caulimoviridae virus family, especially the Petunia vein clearing virus (PVCV), Petuvirus genus. ORF1 encodes a movement protein (MP); ORF2 a Reverse Transcriptase (RT) and a Ribonuclease H (RNase H) domain; and ORF3 showed no amino acid sequence similarity with any other known virus proteins. Analogous to other known endogenous pararetrovirus sequences (EPRVs), AcV1 is integrated in the genome of Maqui Berry and showed low viral transcriptional activity, which was detected by deep sequencing technology (DNA and RNA-seq). Phylogenetic analysis of AcV1 and other pararetroviruses revealed a closer resemblance with Petuvirus. Overall, our data suggest that AcV1 could be a new member of Caulimoviridae family, genus Petuvirus, and the first evidence of this kind of virus in a fruit plant.

  17. Deep Sequencing Reveals the Complete Genome and Evidence for Transcriptional Activity of the First Virus-Like Sequences Identified in Aristotelia chilensis (Maqui Berry)

    PubMed Central

    Villacreses, Javier; Rojas-Herrera, Marcelo; Sánchez, Carolina; Hewstone, Nicole; Undurraga, Soledad F.; Alzate, Juan F.; Manque, Patricio; Maracaja-Coutinho, Vinicius; Polanco, Victor

    2015-01-01

    Here, we report the genome sequence and evidence for transcriptional activity of a virus-like element in the native Chilean berry tree Aristotelia chilensis. We propose to name the endogenous sequence as Aristotelia chilensis Virus 1 (AcV1). High-throughput sequencing of the genome of this tree uncovered an endogenous viral element, with a size of 7122 bp, corresponding to the complete genome of AcV1. Its sequence contains three open reading frames (ORFs): ORFs 1 and 2 share 66%–73% amino acid similarity with members of the Caulimoviridae virus family, especially the Petunia vein clearing virus (PVCV), Petuvirus genus. ORF1 encodes a movement protein (MP); ORF2 a Reverse Transcriptase (RT) and a Ribonuclease H (RNase H) domain; and ORF3 showed no amino acid sequence similarity with any other known virus proteins. Analogous to other known endogenous pararetrovirus sequences (EPRVs), AcV1 is integrated in the genome of Maqui Berry and showed low viral transcriptional activity, which was detected by deep sequencing technology (DNA and RNA-seq). Phylogenetic analysis of AcV1 and other pararetroviruses revealed a closer resemblance with Petuvirus. Overall, our data suggest that AcV1 could be a new member of Caulimoviridae family, genus Petuvirus, and the first evidence of this kind of virus in a fruit plant. PMID:25855242

  18. Deep Transcriptome Sequencing of Two Green Algae, Chara vulgaris and Chlamydomonas reinhardtii, Provides No Evidence of Organellar RNA Editing

    PubMed Central

    Cahoon, A. Bruce; Nauss, John A.; Stanley, Conner D.; Qureshi, Ali

    2017-01-01

    Nearly all land plants post-transcriptionally modify specific nucleotides within RNAs, a process known as RNA editing. This adaptation allows the correction of deleterious mutations within the asexually reproducing and presumably non-recombinant chloroplast and mitochondrial genomes. There are no reports of RNA editing in any of the green algae so this phenomenon is presumed to have originated in embryophytes either after the invasion of land or in the now extinct algal ancestor of all land plants. This was challenged when a recent in silico screen for RNA edit sites based on genomic sequence homology predicted edit sites in the green alga Chara vulgaris, a multicellular alga found within the Streptophyta clade and one of the closest extant algal relatives of land plants. In this study, the organelle transcriptomes of C. vulgaris and Chlamydomonas reinhardtii were deep sequenced for a comprehensive assessment of RNA editing. Initial analyses based solely on sequence comparisons suggested potential edit sites in both species, but subsequent high-resolution melt analysis, RNase H-dependent PCR (rhPCR), and Sanger sequencing of DNA and complementary DNAs (cDNAs) from each of the putative edit sites revealed them to be either single-nucleotide polymorphisms (SNPs) or spurious deep sequencing results. The lack of RNA editing in these two lineages is consistent with the current hypothesis that RNA editing evolved after embryophytes split from its ancestral algal lineage. PMID:28230734

  19. A kinetic model-based algorithm to classify NGS short reads by their allele origin.

    PubMed

    Marinoni, Andrea; Rizzo, Ettore; Limongelli, Ivan; Gamba, Paolo; Bellazzi, Riccardo

    2015-02-01

    Genotyping Next Generation Sequencing (NGS) data of a diploid genome aims to assign the zygosity of identified variants through comparison with a reference genome. Current methods typically employ probabilistic models that rely on the pileup of bases at each locus and on a priori knowledge. We present a new algorithm, called Kimimila (KInetic Modeling based on InforMation theory to Infer Labels of Alleles), which is able to assign reads to alleles by using a distance geometry approach and to infer the variant genotypes accurately, without any kind of assumption. The performance of the model has been assessed on simulated and real data of the 1000 Genomes Project and the results have been compared with several commonly used genotyping methods, i.e., GATK, Samtools, VarScan, FreeBayes and Atlas2. Although our algorithm does not make use of a priori knowledge, the percentage of correctly genotyped variants is comparable to these algorithms. Furthermore, our method allows the user to split the reads pool depending on the inferred allele origin.
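
    For readers unfamiliar with the pileup-based probabilistic models that this abstract contrasts itself with, the sketch below shows the simplest possible version: a binomial likelihood for each diploid genotype given reference and alternate read counts at one locus. It is a textbook-style illustration only. It is not Kimimila, and it is not the model used by GATK, Samtools, VarScan, FreeBayes or Atlas2; the function names and the fixed per-base error rate of 0.01 are assumptions made for the example.

```python
from math import log


def genotype_log_likelihoods(ref_count, alt_count, error=0.01):
    """Log-likelihood of the observed counts under each diploid genotype.

    The binomial coefficient is omitted because it is identical for all
    three genotypes and does not affect which one scores highest.
    """
    # Probability that a single read shows the alternate allele.
    p_alt = {"0/0": error, "0/1": 0.5, "1/1": 1.0 - error}
    return {
        genotype: alt_count * log(p) + ref_count * log(1.0 - p)
        for genotype, p in p_alt.items()
    }


def call_genotype(ref_count, alt_count, error=0.01):
    """Return the maximum-likelihood genotype label."""
    scores = genotype_log_likelihoods(ref_count, alt_count, error)
    return max(scores, key=scores.get)


print(call_genotype(20, 0))   # all reads match the reference
print(call_genotype(10, 10))  # balanced reference and alternate reads
print(call_genotype(0, 20))   # all reads carry the alternate allele
```

    Production genotypers add per-base qualities, mapping qualities and genotype priors on top of this idea; the prior is the "a priori knowledge" the abstract refers to.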

  20. When Is a Microbial Culture “Pure”? Persistent Cryptic Contaminant Escapes Detection Even with Deep Genome Sequencing

    PubMed Central

    Shrestha, Pravin Malla; Nevin, Kelly P.; Shrestha, Minita; Lovley, Derek R.

    2013-01-01

    ABSTRACT Geobacter sulfurreducens strain KN400 was recovered in previous studies in which a culture of the DL1 strain of G. sulfurreducens served as the inoculum in investigations of microbial current production at low anode potentials (−400 mV versus Ag/AgCl). Differences in the genome sequences of KN400 and DL1 were too great to have arisen from adaptive evolution during growth on the anode. Previous deep sequencing (80-fold coverage) of the DL1 culture failed to detect sequences specific to KN400, suggesting that KN400 was an external contaminant inadvertently introduced into the anode culturing system. In order to evaluate this further, a portion of the gene for OmcS, a c-type cytochrome that both KN400 and DL1 possess, was amplified from the DL1 culture. HiSeq-2000 Illumina sequencing of the PCR product detected the KN400 sequence, which differs from the DL1 sequence at 14 bp, at a frequency of ca. 1 in 10^5 copies of the DL1 sequence. A similar low frequency of KN400 was detected with quantitative PCR of a KN400-specific gene. KN400 persisted at this frequency after intensive restreaking of isolated colonies from the DL1 culture. However, a culture in which KN400 could no longer be detected was obtained by serial dilution to extinction in liquid medium. The KN400-free culture could not grow on an anode poised at −400 mV. Thus, KN400 cryptically persisted in the culture dominated by DL1 for more than a decade, undetected by even deep whole-genome sequencing, and was only fortuitously uncovered by the unnatural selection pressure of growth on a low-potential electrode. PMID:23481604
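
    A back-of-the-envelope calculation illustrates why 80-fold whole-genome coverage would be expected to miss a contaminant this rare. The sketch assumes the contaminant frequency is 1 in 10^5 (that is, 1e-5), that every read covering a site is an independent draw from the culture, and that sequencing errors can be ignored; none of these simplifications come from the abstract, and the function names are hypothetical.

```python
import math


def p_at_least_one(freq, depth):
    """Probability that at least one of `depth` reads comes from a
    variant present at fraction `freq`."""
    return 1.0 - (1.0 - freq) ** depth


def depth_for_probability(freq, target=0.95):
    """Smallest depth at which the variant is sampled at least once
    with probability >= target."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - freq))


freq = 1e-5
print(f"P(seen at 80x)    = {p_at_least_one(freq, 80):.5f}")
print(f"depth for 95% hit = {depth_for_probability(freq)}")
```

    Under these assumptions the chance of sampling even a single contaminant read at 80-fold depth is below 0.1%, whereas amplicon sequencing of one gene region can reach the several hundred thousand reads per site needed for reliable detection.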

  1. A Systematic Assessment of Accuracy in Detecting Somatic Mosaic Variants by Deep Amplicon Sequencing: Application to NF2 Gene

    PubMed Central

    Sestini, Roberta; Candita, Luisa; Capone, Gabriele Lorenzo; Barbetti, Lorenzo; Falconi, Serena; Frusconi, Sabrina; Giotti, Irene; Giuliani, Costanza; Torricelli, Francesca; Benelli, Matteo; Papi, Laura

    2015-01-01

    The accurate detection of low-allelic variants is still challenging, particularly for the identification of somatic mosaicism, where a matched control sample is not available. High throughput sequencing, by the simultaneous and independent analysis of thousands of different DNA fragments, might overcome many of the limits of traditional methods, greatly increasing the sensitivity. However, it is necessary to take into account the high number of false positives that may arise due to the lack of matched control samples. Here, we applied deep amplicon sequencing to the analysis of samples with known genotype and variant allele fraction (VAF) followed by a tailored statistical analysis. This method allowed us to define a minimum value of VAF for detecting mosaic variants with high accuracy. Then, we exploited the estimated VAF to select candidate alterations in the NF2 gene in 34 samples with unknown genotype (30 blood and 4 tumor DNAs), demonstrating the suitability of our method. The strategy we propose optimizes the use of deep amplicon sequencing for the identification of low abundance variants. Moreover, our method can be applied to different high throughput sequencing approaches to estimate the background noise and define the accuracy of the experimental design. PMID:26066488
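
    The idea of a minimum detectable VAF can be illustrated with a generic binomial test against background noise. This is a sketch under stated assumptions and not the tailored statistical analysis used in the study: it assumes a single known per-base background error rate, independent reads, and a one-sided significance level chosen by the user. All function names and example numbers are hypothetical.

```python
from math import exp, lgamma, log, log1p


def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space for stability."""
    if k <= 0:
        return 1.0
    total = 0.0
    for i in range(k, n + 1):
        log_term = (
            lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
            + i * log(p) + (n - i) * log1p(-p)
        )
        total += exp(log_term)
    return min(total, 1.0)


def min_alt_reads(depth, error_rate, alpha):
    """Smallest alternate-read count unlikely (P < alpha) to arise from
    background error alone at the given depth."""
    for k in range(depth + 1):
        if binom_sf(k, depth, error_rate) < alpha:
            return k
    return None


depth, error_rate, alpha = 1000, 0.001, 0.001  # illustrative values only
k = min_alt_reads(depth, error_rate, alpha)
print(f"min alt reads = {k}, min VAF = {k / depth:.4f}")
```

    In practice the background error rate varies by position and substitution type, which is why calibrating against samples of known genotype and VAF, as the abstract describes, is preferable to assuming a single rate.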

  2. Deep sequencing of immune repertoires during bovine development and in response to respiratory pathogen challenge

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Vertebrate immune systems generate diverse repertoires of antibodies capable of mediating response to a variety of antigens. Single-molecule circular consensus sequencing permits the sequencing of expressed antibody repertoires at previously unattainable depths of coverage and accuracy. We examined...

  3. Metavisitor, a Suite of Galaxy Tools for Simple and Rapid Detection and Discovery of Viruses in Deep Sequence Data

    PubMed Central

    Vernick, Kenneth D.

    2017-01-01

    Metavisitor is a software package that allows biologists and clinicians without specialized bioinformatics expertise to detect and assemble viral genomes from deep sequence datasets. The package is composed of a set of modular bioinformatic tools and workflows that are implemented in the Galaxy framework. Using the graphical Galaxy workflow editor, users with minimal computational skills can use existing Metavisitor workflows or adapt them to suit specific needs by adding or modifying analysis modules. Metavisitor works with DNA, RNA or small RNA sequencing data over a range of read lengths and can use a combination of de novo and guided approaches to assemble genomes from sequencing reads. We show that the software has the potential for quick diagnosis as well as discovery of viruses from a vast array of organisms. Importantly, we provide here executable Metavisitor use cases, which increase the accessibility and transparency of the software, ultimately enabling biologists or clinicians to focus on biological or medical questions. PMID:28045932

  4. Deep sequencing unearths nuclear mitochondrial sequences under Leber's hereditary optic neuropathy-associated false heteroplasmic mitochondrial DNA variants.

    PubMed

    Petruzzella, Vittoria; Carrozzo, Rosalba; Calabrese, Claudia; Dell'Aglio, Rosa; Trentadue, Raffaella; Piredda, Roberta; Artuso, Lucia; Rizza, Teresa; Bianchi, Marzia; Porcelli, Anna Maria; Guerriero, Silvana; Gasparre, Giuseppe; Attimonelli, Marcella

    2012-09-01

    Leber's hereditary optic neuropathy (LHON) is associated with mitochondrial DNA (mtDNA) ND mutations that are mostly homoplasmic. However, these mutations are not sufficient to explain the peculiar features of penetrance and the tissue-specific expression of the disease and are believed to be causative in association with unknown environmental or other genetic factors. Discerning between clear-cut pathogenetic variants, such as those that appear to be heteroplasmic, and less penetrant variants, such as the homoplasmic, remains a challenging issue that we have addressed here using next-generation sequencing approach. We set up a protocol to quantify MTND5 heteroplasmy levels in a family in which the proband manifests a LHON phenotype. Furthermore, to study this mtDNA haplotype, we applied the cybridization protocol. The results demonstrate that the mutations are mostly homoplasmic, whereas the suspected heteroplasmic feature of the observed mutations is due to the co-amplification of Nuclear mitochondrial Sequences.

  5. Natural variation in Brachypodium distachyon: Deep Sequencing of Highly Diverse Natural Accessions (2013 DOE JGI Genomics of Energy and Environment 8th Annual User Meeting)

    SciTech Connect

    Gordon, Sean

    2013-03-01

    Sean Gordon of the USDA on "Natural variation in Brachypodium distachyon: Deep Sequencing of Highly Diverse Natural Accessions" at the 8th Annual Genomics of Energy & Environment Meeting on March 27, 2013 in Walnut Creek, Calif.

  6. Evolution of the ISM in main-sequence versus starburst galaxies: A motivation for molecular deep fields

    NASA Astrophysics Data System (ADS)

    Aravena, Manuel

    In the last decade, significant progress has been made to understand the evolution with redshift of star formation processes in galaxies. It is now clear that the majority of galaxies at z<3 form a nearly linear correlation between their stellar mass and star formation rates and appear to create most of their stars in timescales of ~1 Gyr. At the highest luminosities, a significant fraction of galaxies deviate from this main-sequence, showing short duty cycles and thus producing most of their stars in a single burst of star formation within ~100 Myr, being likely driven by major merger activity. Despite the large luminosities of starbursts, main-sequence galaxies appear to dominate the star formation density of the Universe at its peak. While progress has been impressive, a number of questions are still unanswered. In this paper, I briefly review our current observational understanding of this main-sequence vs starburst galaxy paradigm, and address how future observations will help us to have better insights into the fundamental properties of the interstellar medium of these galaxies. Finally, I show recent attempts to conduct molecular deep field observations and the motivation to perform molecular deep field spectroscopy with the Atacama Large Millimeter/submillimeter Array.

  7. Genomic Analysis by Deep Sequencing of the Probiotic Lactobacillus brevis KB290 Harboring Nine Plasmids Reveals Genomic Stability

    PubMed Central

    Fukao, Masanori; Oshima, Kenshiro; Morita, Hidetoshi; Toh, Hidehiro; Suda, Wataru; Kim, Seok-Won; Suzuki, Shigenori; Yakabe, Takafumi; Hattori, Masahira; Yajima, Nobuhiro

    2013-01-01

    We determined the complete genome sequence of Lactobacillus brevis KB290, a probiotic lactic acid bacterium isolated from a traditional Japanese fermented vegetable. The genome contained a 2,395,134-bp chromosome that housed 2,391 protein-coding genes and nine plasmids that together accounted for 191 protein-coding genes. KB290 contained no virulence factor genes, and several genes related to presumptive cell wall-associated polysaccharide biosynthesis and the stress response were present in L. brevis KB290 but not in the closely related L. brevis ATCC 367. Plasmid-curing experiments revealed that the presence of plasmid pKB290-1 was essential for the strain's gastrointestinal tract tolerance and tendency to aggregate. Using next-generation deep sequencing of current and 18-year-old stock strains to detect low frequency variants, we evaluated genome stability. Deep sequencing of four periodic KB290 culture stocks with more than 1,000-fold coverage revealed 3 mutation sites and 37 minority variation sites, indicating long-term stability and providing a useful method for assessing the stability of industrial bacteria at the nucleotide level. PMID:23544154

  8. Heavy-light chain interrelations of MS-associated immunoglobulins probed by deep sequencing and rational variation.

    PubMed

    Lomakin, Yakov A; Zakharova, Maria Yu; Stepanov, Alexey V; Dronina, Maria A; Smirnov, Ivan V; Bobik, Tatyana V; Pyrkov, Andrey Yu; Tikunova, Nina V; Sharanova, Svetlana N; Boitsov, Vitali M; Vyazmin, Sergey Yu; Kabilov, Marsel R; Tupikin, Alexey E; Krasnov, Alexey N; Bykova, Nadezda A; Medvedeva, Yulia A; Fridman, Marina V; Favorov, Alexander V; Ponomarenko, Natalia A; Dubina, Michael V; Boyko, Alexey N; Vlassov, Valentin V; Belogurov, Alexey A; Gabibov, Alexander G

    2014-12-01

    The mechanisms triggering most autoimmune diseases are still obscure. Autoreactive B cells play a crucial role in the development of such pathologies and, in particular, production of autoantibodies of different specificities. The combination of deep-sequencing technology with functional studies of antibodies selected from highly representative immunoglobulin combinatorial libraries may provide unique information on specific features in the repertoires of autoreactive B cells. Here, we have analyzed cross-combinations of the variable regions of human immunoglobulins against the myelin basic protein (MBP) previously selected from a multiple sclerosis (MS)-related scFv phage-display library. On the other hand, we have performed deep sequencing of the sublibraries of scFvs against MBP, Epstein-Barr virus (EBV) latent membrane protein 1 (LMP1), and myelin oligodendrocyte glycoprotein (MOG). Bioinformatics analysis of sequencing data and surface plasmon resonance (SPR) studies have shown that it is the variable fragments of antibody heavy chains that mainly determine both the affinity of antibodies to the parent autoantigen and their cross-reactivity. It is suggested that LMP1-cross-reactive anti-myelin autoantibodies contain heavy chains encoded by certain germline gene segments, which may be a hallmark of the EBV-specific B cell subpopulation involved in MS triggering.

  9. HPV Population Profiling in Healthy Men by Next-Generation Deep Sequencing Coupled with HPV-QUEST

    PubMed Central

    Yin, Li; Yao, Jin; Chang, Kaifen; Gardner, Brent P.; Yu, Fahong; Giuliano, Anna R.; Goodenow, Maureen M.

    2016-01-01

    Multiple-type human papillomavirus (HPV) infection presents a greater risk for persistence in asymptomatic individuals and may accelerate cancer development. To extend the scope of HPV types defined by probe-based assays, multiplexing deep sequencing of HPV L1, coupled with an HPV-QUEST genotyping server and a bioinformatic pipeline, was established and applied to survey the diversity of HPV genotypes among a subset of healthy men from the HPV in Men (HIM) Multinational Study. Twenty-one HPV genotypes (12 high-risk and 9 low-risk) were detected in the genital area from 18 asymptomatic individuals. A single HPV type, either HPV16, HPV6b or HPV83, was detected in 7 individuals, while coinfection by 2 to 5 high-risk and/or low-risk genotypes was identified in the other 11 participants. In two individuals studied for over one year, HPV16 persisted, while fluctuations of coinfecting genotypes occurred. HPV L1 regions were generally identical between query and reference sequences, although nonsynonymous and synonymous nucleotide polymorphisms of HPV16, 18, 31, 35h, 59, 70, 73, cand85, 6b, 62, 81, 83, cand89 or JEB2 L1 genotypes, mostly unidentified by linear array, were evident. Deep sequencing coupled with HPV-QUEST provides efficient and unambiguous classification of HPV genotypes in multiple-type HPV infection in host ecosystems. PMID:26821041

  10. Sequence boundaries in uppermost Proterozoic mixed siliciclastic-carbonate rocks: Deep Spring Formation, southern Basin and Range

    SciTech Connect

    Parsons, S.M.; Rees, M.N. (Geosciences Dept.)

    1993-04-01

    The authors propose that a sequence boundary lies at the top of the Reed Dolomite and another at the top of the lower member of the overlying Deep Spring Formation. These boundaries should be useful in correlating critical pre-trilobite Neoproterozoic rocks across the southern Basin and Range Province. Furthermore, the mixed siliciclastic-carbonate rocks between these boundaries reflect an intimate interplay between subsidence, sea-level change and the different rates at which siliciclastic and carbonate sediments accumulate. The Type 2 sequence boundary at the top of the Reed Dolomite is marked in outcrop near Bishop, California by minor channelization and dissolution surfaces that resulted from subaerial exposure of the carbonate platform. This sea-level lowstand is recorded in the lower Deep Spring Formation, 150 km northwest, by carbonate sediment-gravity-flow deposits. With initiation of transgression, siliciclastics buried the eroded platform and carbonate sedimentation continued in the northwest. As sea level continued to rise, carbonate deposition occurred across the region. Time of maximum flooding is represented by lagoonal deposits in the southeast and a condensed section to the northwest. The condensed section is characterized by dolomitized limestones containing glauconite and small shelly fossils that are overlain by thinly interbedded shales and siltstones with rare trace fossils. The slower rate of siliciclastic deposition on the rapidly subsiding shelf produced an increase in accommodation space resulting in development of an ooid shoal to the southeast. To the northwest, however, continued submarine deposition produced thinly interbedded limestone turbidites and shales. Ooid accumulation outpaced subsidence and together with sea-level fall resulted in extensive subaerial exposure of the oolite. Thus, the top of the lower member of the Deep Spring Formation represents the second Type 2 sequence boundary.

  11. Characterization of the Genomic Diversity of Norovirus in Linked Patients Using a Metagenomic Deep Sequencing Approach

    PubMed Central

    Nasheri, Neda; Petronella, Nicholas; Ronholm, Jennifer; Bidawid, Sabah; Corneau, Nathalie

    2017-01-01

    Norovirus (NoV) is the leading cause of gastroenteritis worldwide. A robust cell culture system does not exist for NoV and therefore detailed characterization of outbreak and sporadic strains relies on molecular techniques. In this study, we employed a metagenomic approach that uses non-specific amplification followed by next-generation sequencing to whole genome sequence NoV genomes directly from clinical samples obtained from 8 linked patients. Enough sequencing depth was obtained for each sample to use a de novo assembly of near-complete genome sequences. The resultant consensus sequences were then used to identify inter-host nucleotide variations that occur after direct transmission, analyze amino acid variations in the major capsid protein, and provide evidence of recombination events. The analysis of intra-host quasispecies diversity was possible due to high coverage-depth. We also observed a linear relationship between NoV viral load in the clinical sample and the number of sequence reads that could be attributed to NoV. The method demonstrated here has the potential for future use in whole genome sequence analyses of other RNA viruses isolated from clinical, environmental, and food specimens. PMID:28197136

  12. Ultra-deep sequencing leads to earlier and more sensitive detection of the tyrosine kinase inhibitor resistance mutation T315I in chronic myeloid leukemia

    PubMed Central

    Baer, Constance; Kern, Wolfgang; Koch, Sarah; Nadarajah, Niroshan; Schindela, Sonja; Meggendorfer, Manja; Haferlach, Claudia; Haferlach, Torsten

    2016-01-01

    Chronic myeloid leukemia cells acquire resistance to tyrosine kinase inhibitors through mutations in the ABL1 kinase domain. The T315I mutation mediates resistance to imatinib, dasatinib, nilotinib and bosutinib, whereas sensitivity to ponatinib remains. Mutation detection by conventional Sanger sequencing requires 10%–20% expansion of the mutated subclone. We studied the T315I mutation development by ultra-deep sequencing on the 454 XL+ platform (Roche) in comparison to Sanger sequencing. By ultra-deep sequencing, mutations were detected at loads of 1%–2%. We selected 40 patients who had failed first-line to third-line treatment (imatinib, dasatinib, nilotinib) and had high loads of the T315I mutation detected by Sanger sequencing. We confirmed T315I mutations by ultra-deep sequencing and investigated the mutation dynamics by backtracking earlier samples. In 20 of 40 patients, we identified the T315I three months (median) before Sanger sequencing detection limits were reached. To exclude sporadic low percentage mutation development without subsequent mutation outgrowth, we selected 42 patients without resistance mutations detected by Sanger sequencing but loss of major molecular response. Here, no mutation was detected by ultradeep sequencing. Additional non-T315I resistance mutations were found in 20 of 40 patients. Only 15% had two mutations per cell; the other cases showed multiple independently mutated clones and the T315I clone demonstrated a rapid outgrowth. In conclusion, T315I mutations could be detected earlier by ultra-deep sequencing compared to Sanger sequencing in a selected group of cases. Earlier mutation detection by ultra-deep sequencing might allow treatment to be changed before clonal increase of cells with the T315I mutation. PMID:27102501

  13. Ultra-deep sequencing leads to earlier and more sensitive detection of the tyrosine kinase inhibitor resistance mutation T315I in chronic myeloid leukemia.

    PubMed

    Baer, Constance; Kern, Wolfgang; Koch, Sarah; Nadarajah, Niroshan; Schindela, Sonja; Meggendorfer, Manja; Haferlach, Claudia; Haferlach, Torsten

    2016-07-01

    Chronic myeloid leukemia cells acquire resistance to tyrosine kinase inhibitors through mutations in the ABL1 kinase domain. The T315I mutation mediates resistance to imatinib, dasatinib, nilotinib and bosutinib, whereas sensitivity to ponatinib remains. Mutation detection by conventional Sanger sequencing requires 10%-20% expansion of the mutated subclone. We studied the T315I mutation development by ultra-deep sequencing on the 454 XL+ platform (Roche) in comparison to Sanger sequencing. By ultra-deep sequencing, mutations were detected at loads of 1%-2%. We selected 40 patients who had failed first-line to third-line treatment (imatinib, dasatinib, nilotinib) and had high loads of the T315I mutation detected by Sanger sequencing. We confirmed T315I mutations by ultra-deep sequencing and investigated the mutation dynamics by backtracking earlier samples. In 20 of 40 patients, we identified the T315I three months (median) before Sanger sequencing detection limits were reached. To exclude sporadic low percentage mutation development without subsequent mutation outgrowth, we selected 42 patients without resistance mutations detected by Sanger sequencing but loss of major molecular response. Here, no mutation was detected by ultradeep sequencing. Additional non-T315I resistance mutations were found in 20 of 40 patients. Only 15% had two mutations per cell; the other cases showed multiple independently mutated clones and the T315I clone demonstrated a rapid outgrowth. In conclusion, T315I mutations could be detected earlier by ultra-deep sequencing compared to Sanger sequencing in a selected group of cases. Earlier mutation detection by ultra-deep sequencing might allow treatment to be changed before clonal increase of cells with the T315I mutation.

  14. Using deep RNA sequencing for the structural annotation of the Laccaria bicolor mycorrhizal transcriptome.

    SciTech Connect

    Larsen, P. E.; Trivedi, G.; Sreedasyam, A.; Lu, V.; Podila, G. K.; Collart, F. R.; Biosciences Division; Univ. of Alabama

    2010-07-06

    Accurate structural annotation is important for prediction of function and required for in vitro approaches to characterize or validate the gene expression products. Despite significant efforts in the field, determination of the gene structure from genomic data alone is a challenging and inaccurate process. The ease of acquisition of transcriptomic sequence provides a direct route to identify expressed sequences and determine the correct gene structure. We developed methods to utilize RNA-seq data to correct errors in the structural annotation and extend the boundaries of current gene models using assembly approaches. The methods were validated with a transcriptomic data set derived from the fungus Laccaria bicolor, which develops a mycorrhizal symbiotic association with the roots of many tree species. Our analysis focused on the subset of 1501 gene models that are differentially expressed in the free living vs. mycorrhizal transcriptome and are expected to be important elements related to carbon metabolism, membrane permeability and transport, and intracellular signaling. Of the set of 1501 gene models, 1439 (96%) successfully generated modified gene models in which all error flags were successfully resolved and the sequences aligned to the genomic sequence. The remaining 4% (62 gene models) either had deviations from transcriptomic data that could not be spanned or generated sequence that did not align to genomic sequence. The outcome of this process is a set of high confidence gene models that can be reliably used for experimental characterization of protein function. 69% of expressed mycorrhizal JGI 'best' gene models deviated from the transcript sequence derived by this method. The transcriptomic sequence enabled correction of a majority of the structural inconsistencies and resulted in a set of validated models for 96% of the mycorrhizal genes. 
The method described here can be applied to improve gene structural annotation in other species, provided that there

  15. Deep HST Imaging in 47 Tuc and NGC 6397: Main Sequence Turnoff Ages

    NASA Astrophysics Data System (ADS)

    Dotter, Aaron L.; Anderson, J.; Fahlman, G.; Hansen, B.; Hurley, J.; Kalirai, J.; King, I.; Reitzel, D.; Rich, R. M.; Richer, H.; Shara, M.; Stetson, P.; Woodley, K.; Zurek, D.

    2011-01-01

    The ages of Galactic globular clusters provide insight into the formation history of the Milky Way. Utilizing HST photometry of unprecedented depth and wavelength coverage, we determine the main sequence turnoff ages of the nearby globular clusters NGC 6397 and 47 Tuc. The ages are determined by comparing stellar evolution models to the main sequences with a chi-squared minimization technique. Our analysis of 47 Tuc leverages the pronounced 'kink' or 'knee' feature that appears in the lower main sequence in the near-IR. We present our age estimates as probability distributions and construct confidence intervals over input parameters such as metallicity, distance, and reddening.

  16. Draft Genome Sequence of Pseudomonas pachastrellae Strain CCUG 46540T, a Deep-Sea Bacterium

    PubMed Central

    2017-01-01

    ABSTRACT Pseudomonas pachastrellae strain CCUG 46540T (KMM 330T) was isolated from a deep-sea sponge specimen collected in the Philippine Sea at a depth of 750 m. The draft genome has an estimated size of 4.0 Mb, exhibits a G+C content of 61.2 mol%, and is predicted to encode 3,592 proteins, including pathways for the degradation of aromatic compounds. PMID:28385850

  17. Increasing Clinical Severity during a Dengue Virus Type 3 Cuban Epidemic: Deep Sequencing of Evolving Viral Populations

    PubMed Central

    Blanc, Hervé; Bordería, Antonio V.; Díaz, Gisell; Henningsson, Rasmus; Gonzalez, Daniel; Santana, Emidalys; Alvarez, Mayling; Castro, Osvaldo; Fontes, Magnus; Vignuzzi, Marco; Guzman, Maria G.

    2016-01-01

    ABSTRACT During the dengue virus type 3 (DENV-3) epidemic that occurred in Havana in 2001 to 2002, severe disease was associated with the infection sequence DENV-1 followed by DENV-3 (DENV-1/DENV-3), while the sequence DENV-2/DENV-3 was associated with mild/asymptomatic infections. To determine the role of the virus in the increasing severity demonstrated during the epidemic, serum samples collected at different time points were studied. A total of 22 full-length sequences were obtained using a deep-sequencing approach. Bayesian phylogenetic analysis of consensus sequences revealed that two DENV-3 lineages were circulating in Havana at that time, both grouped within genotype III. The predominant lineage is closely related to Peruvian and Ecuadorian strains, while the minor lineage is related to Venezuelan strains. According to consensus sequences, relatively few nonsynonymous mutations were observed; only one was fixed during the epidemic at position 4380 in the NS2B gene. Intrahost genetic analysis indicated that a significant minor population was selected and became predominant toward the end of the epidemic. In conclusion, greater variability was detected during the epidemic's progression in terms of significant minority variants, particularly in the nonstructural genes. An increasing trend of genetic diversity toward the end of the epidemic was observed only for synonymous variant allele rates, with higher variability in secondary cases. Remarkably, significant intrahost genetic variation was demonstrated within the same patient during the course of secondary infection with DENV-1/DENV-3, including changes in the structural proteins premembrane (PrM) and envelope (E). Therefore, the dynamic of evolving viral populations in the context of heterotypic antibodies could be related to the increasing clinical severity observed during the epidemic. 
IMPORTANCE Based on the evidence that DENV fitness is context dependent, our research has focused on the study of viral

  18. A generic assay for whole-genome amplification and deep sequencing of enterovirus A71.

    PubMed

    Tan, Le Van; Tuyen, Nguyen Thi Kim; Thanh, Tran Tan; Ngan, Tran Thuy; Van, Hoang Minh Tu; Sabanathan, Saraswathy; Van, Tran Thi My; Thanh, Le Thi My; Nguyet, Lam Anh; Geoghegan, Jemma L; Ong, Kien Chai; Perera, David; Hang, Vu Thi Ty; Ny, Nguyen Thi Han; Anh, Nguyen To; Ha, Do Quang; Qui, Phan Tu; Viet, Do Chau; Tuan, Ha Manh; Wong, Kum Thong; Holmes, Edward C; Chau, Nguyen Van Vinh; Thwaites, Guy; van Doorn, H Rogier

    2015-04-01

    Enterovirus A71 (EV-A71) has emerged as the most important cause of large outbreaks of severe and sometimes fatal hand, foot and mouth disease (HFMD) across the Asia-Pacific region. EV-A71 outbreaks have been associated with (sub)genogroup switches, sometimes accompanied by recombination events. Understanding EV-A71 population dynamics is therefore essential for understanding this emerging infection, and may provide pivotal information for vaccine development. Despite the public health burden of EV-A71, relatively few EV-A71 complete-genome sequences are available for analysis and from limited geographical localities. The availability of an efficient procedure for whole-genome sequencing would stimulate effort to generate more viral sequence data. Herein, we report for the first time the development of a next-generation sequencing based protocol for whole-genome sequencing of EV-A71 directly from clinical specimens. We were able to sequence viruses of subgenogroup C4 and B5, while RNA from culture materials of diverse EV-A71 subgenogroups belonging to both genogroup B and C was successfully amplified. The nature of intra-host genetic diversity was explored in 22 clinical samples, revealing 107 positions carrying minor variants (ranging from 0 to 15 variants per sample). Our analysis of EV-A71 strains sampled in 2013 showed that they all belonged to subgenogroup B5, representing the first report of this subgenogroup in Vietnam. In conclusion, we have successfully developed a high-throughput next-generation sequencing-based assay for whole-genome sequencing of EV-A71 from clinical samples.

  19. Dissection of the Octoploid Strawberry Genome by Deep Sequencing of the Genomes of Fragaria Species

    PubMed Central

    Hirakawa, Hideki; Shirasawa, Kenta; Kosugi, Shunichi; Tashiro, Kosuke; Nakayama, Shinobu; Yamada, Manabu; Kohara, Mistuyo; Watanabe, Akiko; Kishida, Yoshie; Fujishiro, Tsunakazu; Tsuruoka, Hisano; Minami, Chiharu; Sasamoto, Shigemi; Kato, Midori; Nanri, Keiko; Komaki, Akiko; Yanagi, Tomohiro; Guoxin, Qin; Maeda, Fumi; Ishikawa, Masami; Kuhara, Satoru; Sato, Shusei; Tabata, Satoshi; Isobe, Sachiko N.

    2014-01-01

    Cultivated strawberry (Fragaria x ananassa) is octoploid and shows allogamous behaviour. The present study aims at dissecting this octoploid genome through comparison with its wild relatives, F. iinumae, F. nipponica, F. nubicola, and F. orientalis by de novo whole-genome sequencing on Illumina and Roche 454 platforms. The total length of the assembled Illumina genome sequences obtained was 698 Mb for F. x ananassa, and ∼200 Mb each for the four wild species. Subsequently, a virtual reference genome termed FANhybrid_r1.2 was constructed by integrating the sequences of the four homoeologous subgenomes of F. x ananassa, from which heterozygous regions in the Roche 454 and Illumina genome sequences were eliminated. The total length of FANhybrid_r1.2 thus created was 173.2 Mb with an N50 length of 5137 bp. The Illumina-assembled genome sequences of F. x ananassa and the four wild species were then mapped onto the reference genome, along with the previously published F. vesca genome sequence to establish the subgenomic structure of F. x ananassa. The strategy adopted in this study has turned out to be successful in dissecting the genome of octoploid F. x ananassa and appears promising when applied to the analysis of other polyploid plant species. PMID:24282021
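
    The N50 length quoted in the abstract above is a standard assembly-contiguity statistic: the length of the contig at which, counting from the longest contig downward, the running total first reaches half of the assembly length. The following is a minimal sketch of that calculation, not the authors' pipeline; the function name `n50` and the toy contig lengths are illustrative assumptions.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp).

    Hypothetical helper for illustration: sort contigs from longest to
    shortest and return the length at which the cumulative sum first
    reaches half of the total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy data (made up): total length is 54, so half is 27.
toy_contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_contigs))  # 10 + 9 + 8 = 27, so this prints 8
```

    A small N50 relative to the total length, as reported for FANhybrid_r1.2, indicates an assembly made up of many short sequences.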

  20. Dissection of the octoploid strawberry genome by deep sequencing of the genomes of Fragaria species.

    PubMed

    Hirakawa, Hideki; Shirasawa, Kenta; Kosugi, Shunichi; Tashiro, Kosuke; Nakayama, Shinobu; Yamada, Manabu; Kohara, Mistuyo; Watanabe, Akiko; Kishida, Yoshie; Fujishiro, Tsunakazu; Tsuruoka, Hisano; Minami, Chiharu; Sasamoto, Shigemi; Kato, Midori; Nanri, Keiko; Komaki, Akiko; Yanagi, Tomohiro; Guoxin, Qin; Maeda, Fumi; Ishikawa, Masami; Kuhara, Satoru; Sato, Shusei; Tabata, Satoshi; Isobe, Sachiko N

    2014-01-01

    Cultivated strawberry (Fragaria x ananassa) is octoploid and shows allogamous behaviour. The present study aims at dissecting this octoploid genome through comparison with its wild relatives, F. iinumae, F. nipponica, F. nubicola, and F. orientalis by de novo whole-genome sequencing on Illumina and Roche 454 platforms. The total length of the assembled Illumina genome sequences obtained was 698 Mb for F. x ananassa, and ∼200 Mb each for the four wild species. Subsequently, a virtual reference genome termed FANhybrid_r1.2 was constructed by integrating the sequences of the four homoeologous subgenomes of F. x ananassa, from which heterozygous regions in the Roche 454 and Illumina genome sequences were eliminated. The total length of FANhybrid_r1.2 thus created was 173.2 Mb with an N50 length of 5137 bp. The Illumina-assembled genome sequences of F. x ananassa and the four wild species were then mapped onto the reference genome, along with the previously published F. vesca genome sequence to establish the subgenomic structure of F. x ananassa. The strategy adopted in this study has turned out to be successful in dissecting the genome of octoploid F. x ananassa and appears promising when applied to the analysis of other polyploid plant species.

  1. Insights into Deep-Sea Sediment Fungal Communities from the East Indian Ocean Using Targeted Environmental Sequencing Combined with Traditional Cultivation

    PubMed Central

    Zhang, Xiao-yong; Tang, Gui-ling; Xu, Xin-ya; Nong, Xu-hua; Qi, Shu-Hua

    2014-01-01

    The fungal diversity in deep-sea environments has recently gained an increasing amount of attention. Our knowledge and understanding of the true fungal diversity and the role it plays in deep-sea environments, however, is still limited. We investigated the fungal community structure in five sediments from a depth of ∼4000 m in the East Indian Ocean using a combination of targeted environmental sequencing and traditional cultivation. This approach resulted in the recovery of a total of 45 fungal operational taxonomic units (OTUs) and 20 culturable fungal phylotypes. This finding indicates that there is a great amount of fungal diversity in the deep-sea sediments collected in the East Indian Ocean. Three fungal OTUs and one culturable phylotype demonstrated high divergence (89%–97%) from the existing sequences in GenBank. Moreover, 44.4% of fungal OTUs and 30% of culturable fungal phylotypes are new reports for deep-sea sediments. These results suggest that the deep-sea sediments from the East Indian Ocean can serve as habitats for new fungal communities compared with other deep-sea environments. In addition, different fungal communities could be detected when using targeted environmental sequencing compared with traditional cultivation in this study, which suggests that a combination of targeted environmental sequencing and traditional cultivation will generate a more diverse fungal community in deep-sea environments than using either targeted environmental sequencing or traditional cultivation alone. This study is the first to report new insights into the fungal communities in deep-sea sediments from the East Indian Ocean, which increases our knowledge and understanding of the fungal diversity in deep-sea environments. PMID:25272044

  2. Insights into deep-sea sediment fungal communities from the East Indian Ocean using targeted environmental sequencing combined with traditional cultivation.

    PubMed

    Zhang, Xiao-yong; Tang, Gui-ling; Xu, Xin-ya; Nong, Xu-hua; Qi, Shu-hua

    2014-01-01

    The fungal diversity in deep-sea environments has recently gained an increasing amount of attention. Our knowledge and understanding of the true fungal diversity and the role it plays in deep-sea environments, however, is still limited. We investigated the fungal community structure in five sediments from a depth of ∼4000 m in the East Indian Ocean using a combination of targeted environmental sequencing and traditional cultivation. This approach resulted in the recovery of a total of 45 fungal operational taxonomic units (OTUs) and 20 culturable fungal phylotypes. This finding indicates that there is a great amount of fungal diversity in the deep-sea sediments collected in the East Indian Ocean. Three fungal OTUs and one culturable phylotype demonstrated high divergence (89%-97%) from the existing sequences in GenBank. Moreover, 44.4% of fungal OTUs and 30% of culturable fungal phylotypes are new reports for deep-sea sediments. These results suggest that the deep-sea sediments from the East Indian Ocean can serve as habitats for new fungal communities compared with other deep-sea environments. In addition, different fungal communities could be detected when using targeted environmental sequencing compared with traditional cultivation in this study, which suggests that a combination of targeted environmental sequencing and traditional cultivation will generate a more diverse fungal community in deep-sea environments than using either targeted environmental sequencing or traditional cultivation alone. This study is the first to report new insights into the fungal communities in deep-sea sediments from the East Indian Ocean, which increases our knowledge and understanding of the fungal diversity in deep-sea environments.

  3. Acute West Nile Virus Meningoencephalitis Diagnosed Via Metagenomic Deep Sequencing of Cerebrospinal Fluid in a Renal Transplant Patient.

    PubMed

    Wilson, M R; Zimmermann, L L; Crawford, E D; Sample, H A; Soni, P R; Baker, A N; Khan, L M; DeRisi, J L

    2017-03-01

    Solid organ transplant patients are vulnerable to suffering neurologic complications from a wide array of viral infections and can be sentinels in the population who are first to get serious complications from emerging infections like the recent waves of arboviruses, including West Nile virus, Chikungunya virus, Zika virus, and Dengue virus. The diverse and rapidly changing landscape of possible causes of viral encephalitis poses great challenges for traditional candidate-based infectious disease diagnostics that already fail to identify a causative pathogen in approximately 50% of encephalitis cases. We present the case of a 14-year-old girl on immunosuppression for a renal transplant who presented with acute meningoencephalitis. Traditional diagnostics failed to identify an etiology. RNA extracted from her cerebrospinal fluid was subjected to unbiased metagenomic deep sequencing, enhanced with the use of a Cas9-based technique for host depletion. This analysis identified West Nile virus (WNV). Convalescent serum serologies subsequently confirmed WNV seroconversion. These results support a clear clinical role for metagenomic deep sequencing in the setting of suspected viral encephalitis, especially in the context of the high-risk transplant patient population.

  4. Mosaic KCNJ2 mutation in Andersen-Tawil syndrome: targeted deep sequencing is useful for the detection of mosaicism.

    PubMed

    Hasegawa, K; Ohno, S; Kimura, H; Itoh, H; Makiyama, T; Yoshida, Y; Horie, M

    2015-03-01

    Andersen-Tawil syndrome (ATS) is an inherited disease characterized by ventricular arrhythmias, periodic paralysis, and dysmorphic features. It results from a heterozygous mutation of KCNJ2, but little is known about mosaicism in ATS. We performed genetic analysis of KCNJ2 in 32 ATS probands and their family members and identified KCNJ2 mutations in 25 probands, 20 of whose families underwent extensive genetic testing. These tests revealed that seven probands carried de novo mutations while 13 carried mutations inherited from their parents. We then specifically assessed a single proband and the respective family. The proband was a 9-year-old girl who fulfilled the ATS triad and carried an insertion mutation (p.75_76insThr). We determined that the proband's mother carried a somatic mosaicism and that the proband's younger brother also carried the ATS phenotype with the same insertion mutation. The mother, who exhibited mosaicism, was asymptomatic, although she exhibited Q(T)U prolongation. Mutant allele frequency was 11% as per TA cloning and 17.3% as per targeted deep sequencing. Our observations suggest that targeted deep sequencing is useful for the detection of mosaicism and that the detection of mosaic mutations in parents of apparently sporadic ATS patients can help in the process of genetic counseling.

  5. Characterization of microRNAs and their targets in wild barley (Hordeum vulgare subsp. spontaneum) using deep sequencing.

    PubMed

    Deng, Pingchuan; Bian, Jianxin; Yue, Hong; Feng, Kewei; Wang, Mengxing; Du, Xianghong; Weining, Song; Nie, Xiaojun

    2016-05-01

    MicroRNAs (miRNA) are a class of small, endogenous RNAs that play a negative regulatory role in various developmental and metabolic processes of plants. Wild barley (Hordeum vulgare subsp. spontaneum), as the progenitor of cultivated barley (Hordeum vulgare subsp. vulgare), has served as a valuable germplasm resource for barley genetic improvement. To survey miRNAs in wild barley, we sequenced the small RNA library prepared from wild barley using the Illumina deep sequencing technology. A total of 70 known miRNAs and 18 putative novel miRNAs were identified. Sequence analysis revealed that all of the miRNAs identified in wild barley contained the highly conserved hairpin sequences found in barley cultivars. MiRNA target predictions showed that 12 out of 52 miRNA families were predicted to target transcription factors, including 8 highly conserved miRNA families in plants and 4 wheat-barley conserved miRNA families. In addition to transcription factors, other predicted target genes were involved in diverse physiological and metabolic processes and stress defense. Our study for the first time reported the large-scale investigation of small RNAs in wild barley, which will provide essential information for understanding the regulatory role of miRNAs in wild barley and also shed light on future practical utilization of miRNAs for barley improvement.

  6. Characterization and Development of EST-SSRs by Deep Transcriptome Sequencing in Chinese Cabbage (Brassica rapa L. ssp. pekinensis)

    PubMed Central

    Ding, Qian; Li, Jingjuan; Wang, Fengde; Zhang, Yihui; Li, Huayin; Zhang, Jiannong; Gao, Jianwei

    2015-01-01

    Simple sequence repeats (SSRs) are among the most important markers for population analysis and have been widely used in plant genetic mapping and molecular breeding. Expressed sequence tag-SSR (EST-SSR) markers, located in the coding regions, are potentially more efficient for QTL mapping, gene targeting, and marker-assisted breeding. In this study, we investigated 51,694 nonredundant unigenes, assembled from clean reads from deep transcriptome sequencing with a Solexa/Illumina platform, for identification and development of EST-SSRs in Chinese cabbage. In total, 10,420 EST-SSRs with over 12 bp were identified and characterized, among which 2744 EST-SSRs are new and 2317 are known ones showing polymorphism with previously reported SSRs. A total of 7877 PCR primer pairs for 1561 EST-SSR loci were designed, and primer pairs for twenty-four EST-SSRs were selected for primer evaluation. In nineteen EST-SSR loci (79.2%), amplicons were successfully generated with high quality. Seventeen (89.5%) showed polymorphism in twenty-four cultivars of Chinese cabbage. The polymorphic alleles of each polymorphic locus were sequenced, and the results showed that most polymorphisms were due to variations of SSR repeat motifs. The EST-SSRs identified and characterized in this study have important implications for developing new tools for genetics and molecular breeding in Chinese cabbage. PMID:26504770
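
    As a rough illustration of how SSRs of at least 12 bp can be located in unigene sequences, the sketch below scans a sequence with regular expressions. It is not the tool used in the study (the abstract does not name one); the motif range of 1 to 6 bp, the function names, and the example sequence are all assumptions made for illustration. The last lines simply recompute the two success rates quoted in the abstract from its own counts.

```python
import re


def is_primitive(motif: str) -> bool:
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'AGAG')."""
    k = len(motif)
    return not any(
        k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k)
    )


def find_ssrs(seq: str, min_len: int = 12, max_motif: int = 6):
    """Return (start, motif, copies) for perfect tandem repeats >= min_len bp.

    A sketch only: overlapping repeats of different motif lengths are all
    reported, and imperfect or compound repeats are not handled.
    """
    seq = seq.upper()
    hits = []
    for k in range(1, max_motif + 1):
        copies_needed = -(-min_len // k)      # ceiling division
        extra = max(copies_needed - 1, 1)     # copies after the first one
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, extra))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_primitive(motif):
                hits.append((match.start(), motif, len(match.group(0)) // k))
    return sorted(hits)


def percent(part: int, whole: int) -> float:
    return round(100 * part / whole, 1)


# Hypothetical sequence containing four copies of the trinucleotide AAG.
print(find_ssrs("TTAAGAAGAAGAAGCC"))

# Rates recomputed from the counts in the abstract.
print(percent(19, 24))  # loci amplified out of primer pairs evaluated
print(percent(17, 19))  # polymorphic loci out of amplified loci
```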

  7. Identification of Hepatotropic Viruses from Plasma Using Deep Sequencing: A Next Generation Diagnostic Tool

    PubMed Central

    Patterson, Jordan; Ford, Glenn; O’keefe, Sandra; Wang, Weiwei; Meng, Bo; Song, Deyong; Zhang, Yong; Tian, Zhijian; Wasilenko, Shawn T.; Rahbari, Mandana; Mitchell, Troy; Jordan, Tracy; Carpenter, Eric; Mason, Andrew L.; Wong, Gane Ka-Shu

    2013-01-01

    We conducted an unbiased metagenomics survey using plasma from patients with chronic hepatitis B, chronic hepatitis C, autoimmune hepatitis (AIH), non-alcoholic steatohepatitis (NASH), and patients without liver disease (control). RNA and DNA libraries were sequenced from plasma filtrates enriched in viral particles to catalog virus populations. Hepatitis viruses were readily detected at high coverage in patients with chronic viral hepatitis B and C, but only a limited number of sequences resembling other viruses were found. The exception was a library from a patient diagnosed with hepatitis C virus (HCV) infection that contained multiple sequences matching GB virus C (GBV-C). Abundant GBV-C reads were also found in plasma from patients with AIH, whereas Torque teno virus (TTV) was found at high frequency in samples from patients with AIH and NASH. After taxonomic classification of sequences by BLASTn, a substantial fraction in each library, ranging from 35% to 76%, remained unclassified. These unknown sequences were assembled into scaffolds along with virus, phage and endogenous retrovirus sequences and then analyzed by BLASTx against the non-redundant protein database. Nearly the full genome of a heretofore-unknown circovirus was assembled, along with many scaffolds that encoded proteins with similarity to plant, insect and mammalian viruses. The presence of this novel circovirus was confirmed by PCR. BLASTx also identified many polypeptides resembling nucleo-cytoplasmic large DNA virus (NCLDV) proteins. We re-evaluated these alignments with a profile hidden Markov method, HHblits, and observed inconsistencies in the target proteins reported by the different algorithms. This suggests that sequence alignments are insufficient to identify NCLDV proteins, especially when these alignments are only to small portions of the target protein.
Nevertheless, we have now established a reliable protocol for the identification of viruses in plasma that can also be adapted to other

  8. Exome and deep sequencing of clinically aggressive neuroblastoma reveal somatic mutations that affect key pathways involved in cancer progression

    PubMed Central

    Lasorsa, Vito Alessandro; Formicola, Daniela; Pignataro, Piero; Cimmino, Flora; Calabrese, Francesco Maria; Mora, Jaume; Esposito, Maria Rosaria; Pantile, Marcella; Zanon, Carlo; De Mariano, Marilena; Longo, Luca; Hogarty, Michael D.; de Torres, Carmen; Tonini, Gian Paolo; Iolascon, Achille; Capasso, Mario

    2016-01-01

    The spectrum of somatic mutation of the most aggressive forms of neuroblastoma is not completely determined. We sought to identify potential cancer drivers in clinically aggressive neuroblastoma. Whole exome sequencing was conducted on 17 germline and tumor DNA samples from high-risk patients with adverse events within 36 months from diagnosis (HR-Event3) to identify somatic mutations, and deep targeted sequencing of 134 genes selected from the initial screening was performed in an additional 48 germline and tumor pairs (62.5% HR-Event3 and high-risk patients), 17 HR-Event3 tumors and 17 human-derived neuroblastoma cell lines. We revealed 22 significantly mutated genes, many of which are implicated in cancer progression. Fifteen genes (68.2%) were highly expressed in neuroblastoma, supporting their involvement in the disease. CHD9, a cancer driver gene, was the most significantly altered (4.0% of cases) after ALK. Other genes (PTK2, NAV3, NAV1, FZD1 and ATRX), expressed in neuroblastoma and involved in cell invasion and migration, were mutated at frequencies ranging from 2% to 4%. Focal adhesion and regulation of actin cytoskeleton pathways were frequently disrupted (14.1% of cases), thus suggesting potential novel therapeutic strategies to prevent disease progression. Notably, BARD1, CHEK2 and AXIN2 were enriched in rare, potentially pathogenic, germline variants. In summary, whole exome and deep targeted sequencing identified novel cancer genes of clinically aggressive neuroblastoma. Our analyses show pathway-level implications of infrequently mutated genes in leading neuroblastoma progression. PMID:27009842

  9. Whole Genome Deep Sequencing of HIV-1 Reveals the Impact of Early Minor Variants Upon Immune Recognition During Acute Infection

    PubMed Central

    Henn, Matthew R.; Lennon, Niall J.; Power, Karen A.; Macalalad, Alexander R.; Berlin, Aaron M.; Malboeuf, Christine M.; Ryan, Elizabeth M.; Gnerre, Sante; Zody, Michael C.; Erlich, Rachel L.; Green, Lisa M.; Berical, Andrew; Wang, Yaoyu; Casali, Monica; Streeck, Hendrik; Bloom, Allyson K.; Dudek, Tim; Tully, Damien; Newman, Ruchi; Axten, Karen L.; Gladden, Adrianne D.; Battis, Laura; Kemper, Michael; Zeng, Qiandong; Shea, Terrance P.; Gujja, Sharvari; Zedlack, Carmen; Gasser, Olivier; Brander, Christian; Hess, Christoph; Günthard, Huldrych F.; Brumme, Zabrina L.; Brumme, Chanson J.; Bazner, Suzane; Rychert, Jenna; Tinsley, Jake P.; Mayer, Ken H.; Rosenberg, Eric; Pereyra, Florencia; Levin, Joshua Z.; Young, Sarah K.; Jessen, Heiko; Altfeld, Marcus; Birren, Bruce W.; Walker, Bruce D.; Allen, Todd M.

    2012-01-01

    Deep sequencing technologies have the potential to transform the study of highly variable viral pathogens by providing a rapid and cost-effective approach to sensitively characterize rapidly evolving viral quasispecies. Here, we report on a high-throughput whole HIV-1 genome deep sequencing platform that combines 454 pyrosequencing with novel assembly and variant detection algorithms. In one subject we combined these genetic data with detailed immunological analyses to comprehensively evaluate viral evolution and immune escape during the acute phase of HIV-1 infection. The majority of early, low frequency mutations represented viral adaptation to host CD8+ T cell responses, evidence of strong immune selection pressure occurring during the early decline from peak viremia. CD8+ T cell responses capable of recognizing these low frequency escape variants coincided with the selection and evolution of more effective secondary HLA-anchor escape mutations. Frequent, and in some cases rapid, reversion of transmitted mutations was also observed across the viral genome. When located within restricted CD8 epitopes these low frequency reverting mutations were sufficient to prime de novo responses to these epitopes, again illustrating the capacity of the immune response to recognize and respond to low frequency variants. More importantly, rapid viral escape from the most immunodominant CD8+ T cell responses coincided with plateauing of the initial viral load decline in this subject, suggestive of a potential link between maintenance of effective, dominant CD8 responses and the degree of early viremia reduction. We conclude that the early control of HIV-1 replication by immunodominant CD8+ T cell responses may be substantially influenced by rapid, low frequency viral adaptations not detected by conventional sequencing approaches, which warrants further investigation. 
These data support the critical need for vaccine-induced CD8+ T cell responses to target more highly constrained

  10. Whole genome deep sequencing of HIV-1 reveals the impact of early minor variants upon immune recognition during acute infection.

    PubMed

    Henn, Matthew R; Boutwell, Christian L; Charlebois, Patrick; Lennon, Niall J; Power, Karen A; Macalalad, Alexander R; Berlin, Aaron M; Malboeuf, Christine M; Ryan, Elizabeth M; Gnerre, Sante; Zody, Michael C; Erlich, Rachel L; Green, Lisa M; Berical, Andrew; Wang, Yaoyu; Casali, Monica; Streeck, Hendrik; Bloom, Allyson K; Dudek, Tim; Tully, Damien; Newman, Ruchi; Axten, Karen L; Gladden, Adrianne D; Battis, Laura; Kemper, Michael; Zeng, Qiandong; Shea, Terrance P; Gujja, Sharvari; Zedlack, Carmen; Gasser, Olivier; Brander, Christian; Hess, Christoph; Günthard, Huldrych F; Brumme, Zabrina L; Brumme, Chanson J; Bazner, Suzane; Rychert, Jenna; Tinsley, Jake P; Mayer, Ken H; Rosenberg, Eric; Pereyra, Florencia; Levin, Joshua Z; Young, Sarah K; Jessen, Heiko; Altfeld, Marcus; Birren, Bruce W; Walker, Bruce D; Allen, Todd M

    2012-01-01

    Deep sequencing technologies have the potential to transform the study of highly variable viral pathogens by providing a rapid and cost-effective approach to sensitively characterize rapidly evolving viral quasispecies. Here, we report on a high-throughput whole HIV-1 genome deep sequencing platform that combines 454 pyrosequencing with novel assembly and variant detection algorithms. In one subject we combined these genetic data with detailed immunological analyses to comprehensively evaluate viral evolution and immune escape during the acute phase of HIV-1 infection. The majority of early, low frequency mutations represented viral adaptation to host CD8+ T cell responses, evidence of strong immune selection pressure occurring during the early decline from peak viremia. CD8+ T cell responses capable of recognizing these low frequency escape variants coincided with the selection and evolution of more effective secondary HLA-anchor escape mutations. Frequent, and in some cases rapid, reversion of transmitted mutations was also observed across the viral genome. When located within restricted CD8 epitopes these low frequency reverting mutations were sufficient to prime de novo responses to these epitopes, again illustrating the capacity of the immune response to recognize and respond to low frequency variants. More importantly, rapid viral escape from the most immunodominant CD8+ T cell responses coincided with plateauing of the initial viral load decline in this subject, suggestive of a potential link between maintenance of effective, dominant CD8 responses and the degree of early viremia reduction. We conclude that the early control of HIV-1 replication by immunodominant CD8+ T cell responses may be substantially influenced by rapid, low frequency viral adaptations not detected by conventional sequencing approaches, which warrants further investigation. 
These data support the critical need for vaccine-induced CD8+ T cell responses to target more highly constrained

  11. Testing deep reticulate evolution in Amaryllidaceae Tribe Hippeastreae (Asparagales) with ITS and chloroplast sequence data

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The phylogeny of Amaryllidaceae tribe Hippeastreae was inferred using chloroplast (3’ycf1, ndhF, trnL-F) and nuclear (ITS rDNA) sequence data under maximum parsimony and maximum likelihood frameworks. Network analyses were applied to resolve conflicting signals among data sets and putative scenarios...

  12. Ultra Deep Sequencing of Listeria monocytogenes sRNA Transcriptome Revealed New Antisense RNAs

    PubMed Central

    Behrens, Sebastian; Widder, Stefanie; Mannala, Gopala Krishna; Qing, Xiaoxing; Madhugiri, Ramakanth; Kefer, Nathalie; Mraheil, Mobarak Abu; Rattei, Thomas; Hain, Torsten

    2014-01-01

    Listeria monocytogenes, a gram-positive pathogen and causative agent of listeriosis, has become a widely used model organism for intracellular infections. Recent studies have identified small non-coding RNAs (sRNAs) as important factors for regulating gene expression and pathogenicity of L. monocytogenes. Increased speed and reduced costs of high throughput sequencing (HTS) techniques have made RNA sequencing (RNA-Seq) the state-of-the-art method to study bacterial transcriptomes. We created a large transcriptome dataset of L. monocytogenes containing a total of 21 million reads, using the SOLiD sequencing technology. The dataset contained cDNA sequences generated from L. monocytogenes RNA collected under intracellular and extracellular conditions and was additionally size-fractionated into three size ranges: <40 nt, 40–150 nt and >150 nt. We report here the identification of nine new sRNA candidates of L. monocytogenes and a reevaluation of known sRNAs of L. monocytogenes EGD-e. Automatic comparison to known sRNAs revealed a high recovery rate of 55%, which was increased to 90% by manual revision of the data. Moreover, thorough classification of known sRNAs shed further light on their possible biological functions. Interestingly, among the newly identified sRNA candidates are antisense RNAs (asRNAs) associated with the housekeeping genes purA, fumC and pgi and potentially their regulation, emphasizing the significance of sRNAs for metabolic adaptation in L. monocytogenes. PMID:24498259

  13. Deep sequencing and variant analysis of an Italian pathogenic field strain of equine infectious anaemia virus.

    PubMed

    Cappelli, K; Cook, R F; Stefanetti, V; Passamonti, F; Autorino, G L; Scicluna, M T; Coletti, M; Verini Supplizi, A; Capomaccio, S

    2017-03-15

    Equine infectious anaemia virus (EIAV) is a lentivirus with an almost worldwide distribution that causes persistent infections in equids. Technical limitations have restricted genetic analysis of EIAV field isolates predominantly to gag sequences, resulting in very little published information concerning the extent of inter-strain variation in pol, env and the three ancillary open reading frames (ORFs). Here, we describe the use of long-range PCR in conjunction with next-generation sequencing (NGS) for rapid molecular characterization of all viral ORFs and known transcription factor binding motifs within the long terminal repeat of two EIAV isolates from the 2006 Italian outbreak. These isolates were from foals believed to have been exposed to the same source material but with different clinical histories: one died 53 days post-infection (SA) while the other (DE) survived 5 months despite experiencing multiple febrile episodes. Nucleotide sequence identity between the isolates was 99.358%, confirming infection with the same EIAV strain, with most differences comprising single nucleotide polymorphisms in env and the second exon of rev. Although the synonymous:non-synonymous nucleotide substitution ratio was approximately 2:1 in gag and pol, the situation was reversed in env and ORF3, suggesting these sequences are subjected to host-mediated selective pressure. EIAV proviral quasispecies complexity in vivo has not been extensively investigated; however, analysis suggests it was relatively low in SA at the time of death. These results highlight advantages of NGS for molecular characterization of EIAV: it avoids potential artefacts generated by traditional composite sequencing strategies and can provide information about viral quasispecies complexity.
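
    A figure such as the nucleotide sequence identity between two isolates is typically computed over the aligned columns of a pairwise alignment. The sketch below assumes the two sequences are already aligned to equal length with "-" marking gaps, and that gap columns are excluded from the denominator; the abstract does not state which convention the authors used, and the sequences shown are made up.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # assumption: gap columns are ignored
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


# Hypothetical aligned fragments, not EIAV data.
print(percent_identity("ACGTACGT", "ACGTACGA"))
print(percent_identity("AC-TAC", "ACGTAC"))
```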

  14. Contribution of Ultra Deep Sequencing in the Clinical Diagnosis of a New Fungal Pathogen Species: Basidiobolus meristosporus

    PubMed Central

    Sitterlé, Emilie; Rodriguez, Christophe; Mounier, Roman; Calderaro, Julien; Foulet, Françoise; Develoux, Michel; Pawlotsky, Jean-Michel; Botterel, Françoise

    2017-01-01

    Some cases of fungal infection remain undiagnosed, especially when the pathogens are uncommon, require specific conditions for in vitro growth, or when several microbial species are present in the specimen. Ultra-Deep Sequencing (UDS) could be considered a precise tool for identifying the pathogens involved in order to improve patient treatment. In this study, we report the implementation of UDS technology in a medical laboratory during the follow-up of an atypical fungal infection case. Thanks to UDS technology, we document the first case of gastro-intestinal basidiobolomycosis (GIB) due to Basidiobolus meristosporus. The diagnosis was suspected after histopathological examination but conventional microbiological methods failed to supply proof. The final diagnosis was made by means of an original approach based on UDS. DNA was extracted from the embedded colon biopsy obtained after hemicolectomy, and a fragment encompassing the internal transcribed spacer (ITS) rDNA region was PCR-amplified. An amplicon library was then prepared using Genome Sequencer Junior Titanium Kits (Roche/454 Life Sciences) and the library was pyrosequenced on a GS Junior (Roche/454 Life Sciences). Using this method, 2,247 sequences with more than 100 bases were generated and used for UDS analysis. B. meristosporus represented 80% of the sequences, with an average homology of 98.8%. A phylogenetic tree with Basidiobolus reference sequences confirmed the presence of B. meristosporus (bootstrap value of 99%). Conclusion: UDS-based diagnostic approaches are ready to be integrated into conventional diagnostic testing to improve documentation of infectious disease and the therapeutic management of patients. PMID:28326064

  15. Contribution of Ultra Deep Sequencing in the Clinical Diagnosis of a New Fungal Pathogen Species: Basidiobolus meristosporus.

    PubMed

    Sitterlé, Emilie; Rodriguez, Christophe; Mounier, Roman; Calderaro, Julien; Foulet, Françoise; Develoux, Michel; Pawlotsky, Jean-Michel; Botterel, Françoise

    2017-01-01

    Some cases of fungal infection remain undiagnosed, especially when the pathogens are uncommon, require specific conditions for in vitro growth, or when several microbial species are present in the specimen. Ultra-Deep Sequencing (UDS) could be considered a precise tool for identifying the pathogens involved in order to improve patient treatment. In this study, we report the implementation of UDS technology in a medical laboratory during the follow-up of an atypical fungal infection case. Thanks to UDS technology, we document the first case of gastro-intestinal basidiobolomycosis (GIB) due to Basidiobolus meristosporus. The diagnosis was suspected after histopathological examination but conventional microbiological methods failed to supply proof. The final diagnosis was made by means of an original approach based on UDS. DNA was extracted from the embedded colon biopsy obtained after hemicolectomy, and a fragment encompassing the internal transcribed spacer (ITS) rDNA region was PCR-amplified. An amplicon library was then prepared using Genome Sequencer Junior Titanium Kits (Roche/454 Life Sciences) and the library was pyrosequenced on a GS Junior (Roche/454 Life Sciences). Using this method, 2,247 sequences with more than 100 bases were generated and used for UDS analysis. B. meristosporus represented 80% of the sequences, with an average homology of 98.8%. A phylogenetic tree with Basidiobolus reference sequences confirmed the presence of B. meristosporus (bootstrap value of 99%). Conclusion: UDS-based diagnostic approaches are ready to be integrated into conventional diagnostic testing to improve documentation of infectious disease and the therapeutic management of patients.

  16. Identification and profiling of novel microRNAs in the Brassica rapa genome based on small RNA deep sequencing

    PubMed Central

    2012-01-01

    Background: MicroRNAs (miRNAs) are one of the functional non-coding small RNAs involved in the epigenetic control of the plant genome. Although plants contain both evolutionary conserved miRNAs and species-specific miRNAs within their genomes, computational methods often only identify evolutionary conserved miRNAs. The recent sequencing of the Brassica rapa genome enables us to identify miRNAs and their putative target genes. In this study, we sought to provide a more comprehensive prediction of B. rapa miRNAs based on high throughput small RNA deep sequencing. Results: We sequenced small RNAs from five types of tissue: seedlings, roots, petioles, leaves, and flowers. By analyzing 2.75 million unique reads that mapped to the B. rapa genome, we identified 216 novel and 196 conserved miRNAs that were predicted to target approximately 20% of the genome’s protein coding genes. Quantitative analysis of miRNAs from the five types of tissue revealed that novel miRNAs were expressed in diverse tissues but their expression levels were lower than those of the conserved miRNAs. Comparative analysis of the miRNAs between the B. rapa and Arabidopsis thaliana genomes demonstrated that redundant copies of conserved miRNAs in the B. rapa genome may have been deleted after whole genome triplication. Novel miRNA members seemed to have spontaneously arisen from the B. rapa and A. thaliana genomes, suggesting the species-specific expansion of miRNAs. We have made this data publicly available in a miRNA database of B. rapa called BraMRs. The database allows the user to retrieve miRNA sequences, their expression profiles, and a description of their target genes from the five tissue types investigated here. Conclusions: This is the first report to identify novel miRNAs from Brassica crops using genome-wide high throughput techniques. The combination of computational methods and small RNA deep sequencing provides robust predictions of miRNAs in the genome. The finding of numerous novel mi

  17. Identification of representative genes of the central nervous system of the locust, Locusta migratoria manilensis by deep sequencing.

    PubMed

    Zhang, Zhengyi; Peng, Zhi-Yu; Yi, Kang; Cheng, Yanbing; Xia, Yuxian

    2012-01-01

    The shortage of available genomic and transcriptomic data hampers molecular study of the central nervous system (CNS) of the migratory locust, Locusta migratoria manilensis (L.) (Orthoptera: Acrididae). In this study, locust CNS RNA was sequenced by deep sequencing. 41,179 unigenes were obtained with an average length of 570 bp, and 5,519 unigenes were longer than 1,000 bp. Compared with an EST database of another locust species, Schistocerca gregaria Forsskål, 9,069 unigenes were found conserved, while 32,110 unigenes were differentially expressed. A total of 15,895 unigenes were identified, including 644 nervous system relevant unigenes. Among the 25,284 unknown unigenes, 9,482 were found to be specific to the CNS by filtering out previously acquired ESTs from locust organs other than the CNS. The locust CNS showed the most matches (18%) with Tribolium castaneum (Herbst) (Coleoptera: Tenebrionidae) sequences. Comprehensive assessment reveals that the database generated in this study is broadly representative of the CNS of adult locust, providing comprehensive gene information at the transcriptional level that could facilitate research of the locust CNS, including various physiological aspects and pesticide target finding.

  18. Deep sequencing identifies viral and wasp genes with potential roles in replication of Microplitis demolitor Bracovirus.

    PubMed

    Burke, Gaelen R; Strand, Michael R

    2012-03-01

    Viruses in the genus Bracovirus (BV) (Polydnaviridae) are symbionts of parasitoid wasps that specifically replicate in the ovaries of females. Recent analysis of expressed sequence tags from two wasp species, Cotesia congregata and Chelonus inanitus, identified transcripts related to 24 different nudivirus genes. These results together with other data strongly indicate that BVs evolved from a nudivirus ancestor. However, it remains unclear whether BV-carrying wasps contain other nudivirus-like genes and what types of wasp genes may also be required for BV replication. Microplitis demolitor carries Microplitis demolitor bracovirus (MdBV). Here we characterized MdBV replication and performed massively parallel sequencing of M. demolitor ovary transcripts. Our results indicated that MdBV replication begins in stage 2 pupae and continues in adults. Analysis of prereplication- and active-replication-stage ovary RNAs yielded 22 Gb of sequence that assembled into 66,425 transcripts. This breadth of sampling indicated that a large percentage of genes in the M. demolitor genome were sequenced. A total of 41 nudivirus-like transcripts were identified, of which a majority were highly expressed during MdBV replication. Our results also identified a suite of wasp genes that were highly expressed during MdBV replication. Among these products were several transcripts with conserved roles in regulating locus-specific DNA amplification by eukaryotes. Overall, our data set together with prior results likely identify the majority of nudivirus-related genes that are transcriptionally functional during BV replication. Our results also suggest that amplification of proviral DNAs for packaging into BV virions may depend upon the replication machinery of wasps.

  19. High diversity of picornaviruses in rats from different continents revealed by deep sequencing

    PubMed Central

    Hansen, Thomas Arn; Mollerup, Sarah; Nguyen, Nam-phuong; White, Nicole E; Coghlan, Megan; Alquezar-Planas, David E; Joshi, Tejal; Jensen, Randi Holm; Fridholm, Helena; Kjartansdóttir, Kristín Rós; Mourier, Tobias; Warnow, Tandy; Belsham, Graham J; Bunce, Michael; Willerslev, Eske; Nielsen, Lars Peter; Vinner, Lasse; Hansen, Anders Johannes

    2016-01-01

    Outbreaks of zoonotic diseases in humans and livestock are not uncommon, and an important component in containment of such emerging viral diseases is rapid and reliable diagnostics. Such methods are often PCR-based and hence require the availability of sequence data from the pathogen. Rattus norvegicus (R. norvegicus) is a known reservoir for important zoonotic pathogens. Transmission may be direct via contact with the animal, for example, through exposure to its faecal matter, or indirectly mediated by arthropod vectors. Here we investigated the viral content in rat faecal matter (n=29) collected from two continents by analyzing 2.2 billion next-generation sequencing reads derived from both DNA and RNA. Among other virus families, we found sequences from members of the Picornaviridae to be abundant in the microbiome of all the samples. Here we describe the diversity of the picornavirus-like contigs including near-full-length genomes closely related to the Boone cardiovirus and Theiler's encephalomyelitis virus. From this study, we conclude that picornaviruses within R. norvegicus are more diverse than previously recognized. The virome of R. norvegicus should be investigated further to assess the full potential for zoonotic virus transmission. PMID:27530749

  20. High diversity of picornaviruses in rats from different continents revealed by deep sequencing.

    PubMed

    Hansen, Thomas Arn; Mollerup, Sarah; Nguyen, Nam-Phuong; White, Nicole E; Coghlan, Megan; Alquezar-Planas, David E; Joshi, Tejal; Jensen, Randi Holm; Fridholm, Helena; Kjartansdóttir, Kristín Rós; Mourier, Tobias; Warnow, Tandy; Belsham, Graham J; Bunce, Michael; Willerslev, Eske; Nielsen, Lars Peter; Vinner, Lasse; Hansen, Anders Johannes

    2016-08-17

    Outbreaks of zoonotic diseases in humans and livestock are not uncommon, and an important component in containment of such emerging viral diseases is rapid and reliable diagnostics. Such methods are often PCR-based and hence require the availability of sequence data from the pathogen. Rattus norvegicus (R. norvegicus) is a known reservoir for important zoonotic pathogens. Transmission may be direct via contact with the animal, for example, through exposure to its faecal matter, or indirectly mediated by arthropod vectors. Here we investigated the viral content in rat faecal matter (n=29) collected from two continents by analyzing 2.2 billion next-generation sequencing reads derived from both DNA and RNA. Among other virus families, we found sequences from members of the Picornaviridae to be abundant in the microbiome of all the samples. Here we describe the diversity of the picornavirus-like contigs including near-full-length genomes closely related to the Boone cardiovirus and Theiler's encephalomyelitis virus. From this study, we conclude that picornaviruses within R. norvegicus are more diverse than previously recognized. The virome of R. norvegicus should be investigated further to assess the full potential for zoonotic virus transmission.

  1. Deep sequencing uncovers protistan plankton diversity in the Portuguese Ria Formosa solar saltern ponds.

    PubMed

    Filker, Sabine; Gimmler, Anna; Dunthorn, Micah; Mahé, Frédéric; Stoeck, Thorsten

    2015-03-01

    We used high-throughput sequencing to unravel the genetic diversity of protistan (including fungal) plankton in hypersaline ponds of the Ria Formosa solar saltern works in Portugal. From three ponds of different salinity (4, 12 and 38 %), we obtained ca. 105,000 amplicons (V4 region of the SSU rDNA). The genetic diversity we found was higher than what has been described from solar saltern ponds thus far by microscopy or molecular studies. The obtained operational taxonomic units (OTUs) could be assigned to 14 high-rank taxonomic groups and blasted to 120 eukaryotic families. The novelty of this genetic diversity was extremely high, with 27 % of all OTUs having a sequence divergence of more than 10 % to deposited sequences of described taxa. The highest degree of novelty was found at intermediate salinity of 12 % within the ciliates, which traditionally are considered as the best known and described taxon group within the kingdom Protista. Further substantial novelty was detected within the stramenopiles and the chlorophytes. Analyses of community structures suggest a transition boundary for protistan plankton between 4 and 12 % salinity, suggesting different haloadaptation strategies in individual evolutionary lineages as a result of environmental filtering. Our study makes evident the gaps in our knowledge not only of protistan and fungal plankton diversity in hypersaline environments, but also in their ecology and their strategies to cope with these environmental conditions. It substantiates that specific future research needs to fill these gaps.

  2. Predicting backbone Cα angles and dihedrals from protein sequences by stacked sparse auto-encoder deep neural network.

    PubMed

    Lyons, James; Dehzangi, Abdollah; Heffernan, Rhys; Sharma, Alok; Paliwal, Kuldip; Sattar, Abdul; Zhou, Yaoqi; Yang, Yuedong

    2014-10-30

    Because of the nearly constant distance between two neighbouring Cα atoms, the local backbone structure of proteins can be represented accurately by the angle between C(αi-1)-C(αi)-C(αi+1) (θ) and a dihedral angle rotated about the C(αi)-C(αi+1) bond (τ). θ and τ angles, as the representative of structural properties of three to four amino-acid residues, offer a description of backbone conformations that is complementary to φ and ψ angles (single residue) and secondary structures (>3 residues). Here, we report the first machine-learning technique for sequence-based prediction of θ and τ angles. Predicted angles based on an independent test have a mean absolute error of 9° for θ and 34° for τ with a distribution on the θ-τ plane close to that of native values. The average root-mean-square distance of 10-residue fragment structures constructed from predicted θ and τ angles is only 1.9Å from their corresponding native structures. Predicted θ and τ angles are expected to be complementary to predicted φ and ψ angles and secondary structures for use in model validation and template-based as well as template-free structure prediction. The deep neural network learning technique is available as an on-line server called Structural Property prediction with Integrated DEep neuRal network (SPIDER) at http://sparks-lab.org.
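
    The geometric definitions of θ and τ given in the abstract can be illustrated with a minimal sketch. It assumes Cα positions are supplied as (x, y, z) tuples; the function names `theta` and `tau` are hypothetical and are not taken from SPIDER. It only computes the angles from known coordinates and does not attempt the sequence-based prediction the record describes.

```python
import math

def _sub(a, b):
    # Vector from b to a.
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def theta(ca_prev, ca, ca_next):
    """Angle in degrees at Ca(i) formed by Ca(i-1), Ca(i), Ca(i+1)."""
    u, v = _sub(ca_prev, ca), _sub(ca_next, ca)
    cos_t = _dot(u, v) / math.sqrt(_dot(u, u) * _dot(v, v))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))

def tau(ca0, ca1, ca2, ca3):
    """Dihedral in degrees about the ca1-ca2 bond, between -180 and 180."""
    b0, b1, b2 = _sub(ca1, ca0), _sub(ca2, ca1), _sub(ca3, ca2)
    n1, n2 = _cross(b0, b1), _cross(b1, b2)
    y = math.sqrt(_dot(b1, b1)) * _dot(b0, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))
```

    For a chain of Cα coordinates `ca`, θ at residue i would be `theta(ca[i-1], ca[i], ca[i+1])` and τ about the i to i+1 bond would be `tau(ca[i-1], ca[i], ca[i+1], ca[i+2])`, so the two angles span three and four residues respectively, as the abstract notes.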

  3. Exploring the Gastrointestinal “Nemabiome”: Deep Amplicon Sequencing to Quantify the Species Composition of Parasitic Nematode Communities

    PubMed Central

    Avramenko, Russell W.; Redman, Elizabeth M.; Lewis, Roy; Yazwinski, Thomas A.; Wasmuth, James D.; Gilleard, John S.

    2015-01-01

    Parasitic helminth infections have a considerable impact on global human health as well as animal welfare and production. Although co-infection with multiple parasite species within a host is common, there is a dearth of tools with which to study the composition of these complex parasite communities. Helminth species vary in their pathogenicity, epidemiology and drug sensitivity and the interactions that occur between co-infecting species and their hosts are poorly understood. We describe the first application of deep amplicon sequencing to study parasitic nematode communities as well as introduce the concept of the gastro-intestinal “nemabiome”. The approach is analogous to 16S rDNA deep sequencing used to explore microbial communities, but utilizes the nematode ITS-2 rDNA locus instead. Gastro-intestinal parasites of cattle were used to develop the concept, as this host has many well-defined gastro-intestinal nematode species that commonly occur as complex co-infections. Further, the availability of pure mono-parasite populations from experimentally infected cattle allowed us to prepare mock parasite communities to determine, and correct for, species representation biases in the sequence data. We demonstrate that, once these biases have been corrected, accurate relative quantitation of gastro-intestinal parasitic nematode communities in cattle fecal samples can be achieved. We have validated the accuracy of the method applied to field-samples by comparing the results of detailed morphological examination of L3 larvae populations with those of the sequencing assay. The results illustrate the insights that can be gained into the species composition of parasite communities, using grazing cattle in the mid-west USA as an example. However, both the technical approach and the concept of the ‘nemabiome’ have a wide range of potential applications in human and veterinary medicine. These include investigations of host-parasite and parasite-parasite interactions
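
    The idea of using mock communities to determine, and correct for, species representation biases can be sketched as follows. This is an assumed, simplified scheme (one multiplicative factor per species, derived from a single mock community of known composition) and is not necessarily the correction the authors applied; the function names and the species labels "A" and "B" are hypothetical.

```python
def correction_factors(mock_known, mock_reads):
    """Per-species factor = known proportion / observed read proportion.

    mock_known: {species: true proportion in the mock community}
    mock_reads: {species: read count observed for the mock community}
    Assumes every species in mock_known has a non-zero read count.
    """
    total = sum(mock_reads.values())
    return {sp: mock_known[sp] / (mock_reads[sp] / total) for sp in mock_known}

def corrected_proportions(sample_reads, factors):
    """Weight sample read counts by the factors, then renormalise to sum to 1."""
    weighted = {sp: n * factors[sp] for sp, n in sample_reads.items()}
    total = sum(weighted.values())
    return {sp: w / total for sp, w in weighted.items()}

# Illustrative numbers only: a 50:50 mock community that sequences as 75:25.
factors = correction_factors({"A": 0.5, "B": 0.5}, {"A": 750, "B": 250})
field = corrected_proportions({"A": 750, "B": 250}, factors)
```

    In this toy case a field sample with the same 75:25 read split is corrected back to equal proportions, which is the behaviour one would expect if the read bias were purely multiplicative.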

  4. Identification of an NAC Transcription Factor Family by Deep Transcriptome Sequencing in Onion (Allium cepa L.)

    PubMed Central

    Zhu, Siyuan; Dai, Qiuzhong; Liu, Touming

    2016-01-01

    Although onion has been used extensively in the past for cytogenetic studies, molecular analysis has been lacking because the availability of genetic resources is limited. NAM, ATAF, and CUC (NAC) transcription factors (TFs) are plant-specific proteins, and they play key roles in plant growth, development, and stress tolerance. However, none of the onion NAC (CepNAC) genes had been identified thus far. In this study, the transcriptome of onion leaves was analyzed by Illumina paired-end sequencing. Approximately 102.9 million clean sequence reads were produced and used for de novo assembly, which generated 117,189 non-redundant transcripts. Of these transcripts, 39,472 were annotated for their function. In order to mine the CepNAC TFs, CepNAC genes were searched from the transcripts assembled, resulting in the identification of all 39 CepNAC genes. These 39 CepNAC proteins were subjected to phylogenetic analysis together with 47 NAC proteins of known function that were previously identified in other species. The results showed that they can be divided into five groups (NAC-I–V). Interestingly, the NAC-IV and -V groups were found to be likely related to the processes of secondary wall synthesis and stress response, respectively. The transcriptome analysis generated a substantial amount of transcripts, which will aid immensely in identifying important genes and accelerating our understanding of onion growth and development. Moreover, the discovery of 39 CepNAC TFs and the identification of the sequence conservation between them and NAC proteins published will provide a basis for further characterization and validation of their functions in the future. PMID:27331904

  5. Deep Sequencing-Based Analysis of the Cymbidium ensifolium Floral Transcriptome

    PubMed Central

    Li, Xiaobai; Luo, Jie; Yan, Tianlian; Xiang, Lin; Jin, Feng; Qin, Dehui; Sun, Chongbo; Xie, Ming

    2013-01-01

    Cymbidium ensifolium is a Chinese Cymbidium with an elegant shape, beautiful appearance, and a fragrant aroma. C. ensifolium has a long history of cultivation in China and it has excellent commercial value as a potted plant and cut flower. The development of C. ensifolium genomic resources has been delayed because of its large genome size. Taking advantage of technical and cost improvement of RNA-Seq, we extracted total mRNA from flower buds and mature flowers and obtained a total of 9.52 Gb of filtered nucleotides comprising 98,819,349 filtered reads. The filtered reads were assembled into 101,423 isotigs, representing 51,696 genes. Of the 101,423 isotigs, 41,873 were putative homologs of annotated sequences in the public databases, of which 158 were associated with floral development and 119 were associated with flowering. The isotigs were categorized according to their putative functions. In total, 10,212 of the isotigs were assigned into 25 eukaryotic orthologous groups (KOGs), 41,690 into 58 gene ontology (GO) terms, and 9,830 into 126 Arabidopsis Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, and 9,539 isotigs into 123 rice pathways. Comparison of the isotigs with those of the two related orchid species P. equestris and C. sinense showed that 17,906 isotigs are unique to C. ensifolium. In addition, a total of 7,936 SSRs and 16,676 putative SNPs were identified. To our knowledge, this transcriptome database is the first major genomic resource for C. ensifolium and the most comprehensive transcriptomic resource for genus Cymbidium. These sequences provide valuable information for understanding the molecular mechanisms of floral development and flowering. Sequences predicted to be unique to C. ensifolium would provide more insights into C. ensifolium gene diversity. The numerous SNPs and SSRs identified in the present study will contribute to marker development for C. ensifolium. PMID:24392013

  6. Sequence-of-events-driven automation of the deep space network

    NASA Technical Reports Server (NTRS)

    Hill, R., Jr.; Fayyad, K.; Smyth, C.; Santos, T.; Chen, R.; Chien, S.; Bevan, R.

    1996-01-01

    In February 1995, sequence-of-events (SOE)-driven automation technology was demonstrated for a Voyager telemetry downlink track at DSS 13. This demonstration entailed automated generation of an operations procedure (in the form of a temporal dependency network) from project SOE information using artificial intelligence planning technology and automated execution of the temporal dependency network using the link monitor and control operator assistant system. This article describes the overall approach to SOE-driven automation that was demonstrated, identifies gaps in SOE definitions and project profiles that hamper automation, and provides detailed measurements of the knowledge engineering effort required for automation.

  7. Identification of MicroRNAs in Helicoverpa armigera and Spodoptera litura Based on Deep Sequencing and Homology Analysis

    PubMed Central

    Ge, Xie; Zhang, Yong; Jiang, Jianhao; Zhong, Yi; Yang, Xiaonan; Li, Zhiqian; Huang, Yongping; Tan, Anjiang

    2013-01-01

    The current identification of microRNAs (miRNAs) in insects is largely dependent on genome sequences. However, the lack of available genome sequences inhibits the identification of miRNAs in various insect species. In this study, we used a miRNA database of the silkworm Bombyx mori as a reference to identify miRNAs in Helicoverpa armigera and Spodoptera litura using deep sequencing and homology analysis. Because all three species belong to the Lepidoptera, the experiment produced reliable results. Our study identified 97 and 91 conserved miRNAs in H. armigera and S. litura, respectively. Using the genome of B. mori and BAC sequences of H. armigera as references, 1 novel miRNA and 8 novel miRNA candidates were identified in H. armigera, and 4 novel miRNA candidates were identified in S. litura. An evolutionary analysis revealed that most of the identified miRNAs were insect-specific, and more than 20 miRNAs were Lepidoptera-specific. The investigation of the expression patterns of miR-2a, miR-34, miR-2796-3p and miR-11 revealed their potential roles in insect development. miRNA target prediction revealed that conserved miRNA target sites exist in various genes in the 3 species. Conserved miRNA target sites for the Hsp90 gene among the 3 species were validated in the mammalian 293T cell line using a dual-luciferase reporter assay. Our study provides a new approach with which to identify miRNAs in insects lacking genome information and contributes to the functional analysis of insect miRNAs. PMID:23289012

  8. Deep Sequencing of Plant and Animal DNA Contained within Traditional Chinese Medicines Reveals Legality Issues and Health Safety Concerns

    PubMed Central

    Coghlan, Megan L.; Haile, James; Houston, Jayne; Murray, Dáithí C.; White, Nicole E.; Moolhuijzen, Paula; Bellgard, Matthew I.; Bunce, Michael

    2012-01-01

    Traditional Chinese medicine (TCM) has been practiced for thousands of years, but only within the last few decades has its use become more widespread outside of Asia. Concerns continue to be raised about the efficacy, legality, and safety of many popular complementary alternative medicines, including TCMs. Ingredients of some TCMs are known to include derivatives of endangered, trade-restricted species of plants and animals, and therefore contravene the Convention on International Trade in Endangered Species (CITES) legislation. Chromatographic studies have detected the presence of heavy metals and plant toxins within some TCMs, and there are numerous cases of adverse reactions. It is in the interests of both biodiversity conservation and public safety that techniques are developed to screen medicinals like TCMs. Targeting both the p-loop region of the plastid trnL gene and the mitochondrial 16S ribosomal RNA gene, over 49,000 amplicon sequence reads were generated from 15 TCM samples presented in the form of powders, tablets, capsules, bile flakes, and herbal teas. Here we show that second-generation, high-throughput sequencing (HTS) of DNA represents an effective means to genetically audit organic ingredients within complex TCMs. Comparison of DNA sequence data to reference databases revealed the presence of 68 different plant families and included genera, such as Ephedra and Asarum, that are potentially toxic. Similarly, animal families were identified that include genera that are classified as vulnerable, endangered, or critically endangered, including Asiatic black bear (Ursus thibetanus) and Saiga antelope (Saiga tatarica). Bovidae, Cervidae, and Bufonidae DNA were also detected in many of the TCM samples and were rarely declared on the product packaging. This study demonstrates that deep sequencing via HTS is an efficient and cost-effective way to audit highly processed TCM products and will assist in monitoring their legality and safety especially when

  9. Deep sequencing of plant and animal DNA contained within traditional Chinese medicines reveals legality issues and health safety concerns.

    PubMed

    Coghlan, Megan L; Haile, James; Houston, Jayne; Murray, Dáithí C; White, Nicole E; Moolhuijzen, Paula; Bellgard, Matthew I; Bunce, Michael

    2012-01-01

    Traditional Chinese medicine (TCM) has been practiced for thousands of years, but only within the last few decades has its use become more widespread outside of Asia. Concerns continue to be raised about the efficacy, legality, and safety of many popular complementary alternative medicines, including TCMs. Ingredients of some TCMs are known to include derivatives of endangered, trade-restricted species of plants and animals, and therefore contravene the Convention on International Trade in Endangered Species (CITES) legislation. Chromatographic studies have detected the presence of heavy metals and plant toxins within some TCMs, and there are numerous cases of adverse reactions. It is in the interests of both biodiversity conservation and public safety that techniques are developed to screen medicinals like TCMs. Targeting both the p-loop region of the plastid trnL gene and the mitochondrial 16S ribosomal RNA gene, over 49,000 amplicon sequence reads were generated from 15 TCM samples presented in the form of powders, tablets, capsules, bile flakes, and herbal teas. Here we show that second-generation, high-throughput sequencing (HTS) of DNA represents an effective means to genetically audit organic ingredients within complex TCMs. Comparison of DNA sequence data to reference databases revealed the presence of 68 different plant families and included genera, such as Ephedra and Asarum, that are potentially toxic. Similarly, animal families were identified that include genera that are classified as vulnerable, endangered, or critically endangered, including Asiatic black bear (Ursus thibetanus) and Saiga antelope (Saiga tatarica). Bovidae, Cervidae, and Bufonidae DNA were also detected in many of the TCM samples and were rarely declared on the product packaging. This study demonstrates that deep sequencing via HTS is an efficient and cost-effective way to audit highly processed TCM products and will assist in monitoring their legality and safety especially when

  10. Deep Sequencing of Subseafloor Eukaryotic rRNA Reveals Active Fungi across Marine Subsurface Provinces

    PubMed Central

    Orsi, William; Biddle, Jennifer F.; Edgcomb, Virginia

    2013-01-01

    The deep marine subsurface is a vast habitat for microbial life where cells may live on geologic timescales. Because DNA in sediments may be preserved on long timescales, ribosomal RNA (rRNA) is suggested to be a proxy for the active fraction of a microbial community in the subsurface. During an investigation of eukaryotic 18S rRNA by amplicon pyrosequencing, unique profiles of Fungi were found across a range of marine subsurface provinces including ridge flanks, continental margins, and abyssal plains. Subseafloor fungal populations exhibit statistically significant correlations with total organic carbon (TOC), nitrate, sulfide, and dissolved inorganic carbon (DIC). These correlations are supported by terminal restriction fragment length polymorphism (TRFLP) analyses of fungal rRNA. Geochemical correlations with fungal pyrosequencing and TRFLP data from this geographically broad sample set suggest environmental selection of active Fungi in the marine subsurface. Within the same dataset, ancient rRNA signatures were recovered from plants and diatoms in marine sediments ranging from 0.03 to 2.7 million years old, suggesting that rRNA from some eukaryotic taxa may be much more stable than previously considered in the marine subsurface. PMID:23418556

  11. Deep sequencing of subseafloor eukaryotic rRNA reveals active Fungi across marine subsurface provinces.

    PubMed

    Orsi, William; Biddle, Jennifer F; Edgcomb, Virginia

    2013-01-01

    The deep marine subsurface is a vast habitat for microbial life where cells may live on geologic timescales. Because DNA in sediments may be preserved on long timescales, ribosomal RNA (rRNA) is suggested to be a proxy for the active fraction of a microbial community in the subsurface. During an investigation of eukaryotic 18S rRNA by amplicon pyrosequencing, unique profiles of Fungi were found across a range of marine subsurface provinces including ridge flanks, continental margins, and abyssal plains. Subseafloor fungal populations exhibit statistically significant correlations with total organic carbon (TOC), nitrate, sulfide, and dissolved inorganic carbon (DIC). These correlations are supported by terminal restriction fragment length polymorphism (TRFLP) analyses of fungal rRNA. Geochemical correlations with fungal pyrosequencing and TRFLP data from this geographically broad sample set suggest environmental selection of active Fungi in the marine subsurface. Within the same dataset, ancient rRNA signatures were recovered from plants and diatoms in marine sediments ranging from 0.03 to 2.7 million years old, suggesting that rRNA from some eukaryotic taxa may be much more stable than previously considered in the marine subsurface.

  12. Power of deep sequencing and agilent microarray for gene expression profiling study.

    PubMed

    Feng, Lin; Liu, Hang; Liu, Yu; Lu, Zhike; Guo, Guangwu; Guo, Suping; Zheng, Hongwei; Gao, Yanning; Cheng, Shujun; Wang, Jian; Zhang, Kaitai; Zhang, Yong

    2010-06-01

    Next-generation sequencing-based Digital Gene Expression tag profiling (DGE) has been used to study the changes in gene expression profiling. To compare the quality of the data generated by microarray and DGE, we examined the gene expression profiles of an in vitro cell model with these platforms. In this study, 17,362 and 15,938 genes were detected by microarray and DGE, respectively, with 13,221 overlapping genes. The correlation coefficients between the technical replicates were >0.99 and the detection variance was <9% for both platforms. The dynamic range of microarray was fixed with four orders of magnitude, whereas that of DGE was extendable. The consistency of the two platforms was high, especially for those abundant genes. It was more difficult for the microarray to distinguish the expression variation of less abundant genes. Although microarrays might be eventually replaced by DGE or transcriptome sequencing (RNA-seq) in the near future, microarrays are still stable, practical, and feasible, which may be useful for most biological researchers.

  13. Targeted deep sequencing of flowering regulators in Brassica napus reveals extensive copy number variation

    PubMed Central

    Schiessl, Sarah; Huettel, Bruno; Kuehn, Diana; Reinhardt, Richard; Snowdon, Rod J.

    2017-01-01

    Gene copy number variation (CNV) is increasingly implicated in control of complex trait networks, particularly in polyploid plants like rapeseed (Brassica napus L.) with an evolutionary history of genome restructuring. Here we performed sequence capture to assay nucleotide variation and CNV in a panel of central flowering time regulatory genes across a species-wide diversity set of 280 B. napus accessions. The genes were chosen based on prior knowledge from Arabidopsis thaliana and related Brassica species. Target enrichment was performed using the Agilent SureSelect technology, followed by Illumina sequencing. A bait (probe) pool was developed based on results of a preliminary experiment with representatives from different B. napus morphotypes. A very high mean target coverage of ~670x allowed reliable calling of CNV, single nucleotide polymorphisms (SNPs) and insertion-deletion (InDel) polymorphisms. Every accession exhibited CNV, and at least one homolog of every gene we investigated showed CNV in some accessions. Some CNV appear more often in specific morphotypes, indicating a role in diversification. PMID:28291231
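
    How deep target coverage can support CNV calling may be illustrated with a generic read-depth sketch. The record does not state the authors' calling procedure, so this is an assumption throughout: depth per gene is normalised by the sample's median depth, compared with the same quantity in a reference panel, and reported as a log2 ratio. The function name, gene labels and depths are hypothetical.

```python
import math
from statistics import median

def log2_copy_ratios(sample_depth, panel_depths):
    """Return {gene: log2 copy ratio} for one sample against a reference panel.

    sample_depth: {gene: mean read depth} for the sample of interest
    panel_depths: list of {gene: mean read depth} dicts for reference samples
    Assumes all depths are positive and all dicts share the same genes.
    """
    sample_median = median(sample_depth.values())
    ratios = {}
    for gene, depth in sample_depth.items():
        # Typical normalised depth of this gene across the reference panel.
        ref = median(p[gene] / median(p.values()) for p in panel_depths)
        ratios[gene] = math.log2((depth / sample_median) / ref)
    return ratios

# Illustrative numbers only: gene "g3" has twice the normalised depth.
example = log2_copy_ratios(
    {"g1": 600, "g2": 600, "g3": 1200},
    [{"g1": 700, "g2": 700, "g3": 700}],
)
```

    A log2 ratio near 0 suggests no change relative to the panel, while values near +1 or -1 are consistent with a doubling or halving of copy number. Thresholds for calling a CNV would need to be chosen with the noise at a given coverage in mind.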

  14. Evolutionary Relations of Hexanchiformes Deep-Sea Sharks Elucidated by Whole Mitochondrial Genome Sequences

    PubMed Central

    Tanaka, Keiko; Tomita, Taketeru; Suzuki, Shingo; Hosomichi, Kazuyoshi; Sano, Kazumi; Doi, Hiroyuki; Kono, Azumi; Inoko, Hidetoshi; Kulski, Jerzy K.; Tanaka, Sho

    2013-01-01

    Hexanchiformes is regarded as a monophyletic taxon, but the morphological and genetic relationships between the five extant species within the order are still uncertain. In this study, we determined the whole mitochondrial DNA (mtDNA) sequences of seven sharks including representatives of the five Hexanchiformes, one squaliform, and one carcharhiniform and inferred the phylogenetic relationships among those species and 12 other Chondrichthyes (cartilaginous fishes) species for which the complete mitogenome is available. The monophyly of Hexanchiformes and its close relation with all other Squaliformes sharks were strongly supported by likelihood and Bayesian phylogenetic analysis of 13,749 aligned nucleotides of 13 protein coding genes and two rRNA genes that were derived from the whole mtDNA sequences of the 19 species. The phylogeny suggested that Hexanchiformes is in the superorder Squalomorphi, Chlamydoselachus anguineus (frilled shark) is the sister species to all other Hexanchiformes, and the relations within Hexanchiformes are well resolved as Chlamydoselachus, (Notorynchus, (Heptranchias, (Hexanchus griseus, H. nakamurai))). Based on our phylogeny, we discussed evolutionary scenarios of the jaw suspension mechanism and gill slit numbers that are significant features in the sharks. PMID:24089661

  15. Deep Sequencing of the Human Retinae Reveals the Expression of Odorant Receptors

    PubMed Central

    Jovancevic, Nikolina; Wunderlich, Kirsten A.; Haering, Claudia; Flegel, Caroline; Maßberg, Désirée; Weinrich, Markus; Weber, Lea; Tebbe, Lars; Kampik, Anselm; Gisselmann, Günter; Wolfrum, Uwe; Hatt, Hanns; Gelis, Lian

    2017-01-01

    Several studies have demonstrated that the expression of odorant receptors (ORs) occurs in various tissues. These findings have served as a basis for functional studies that demonstrate the potential of ORs as drug targets for a clinical application. To the best of our knowledge, this report describes the first evaluation of the mRNA expression of ORs and the localization of OR proteins in the human retina that set a stage for subsequent functional analyses. RNA-Sequencing datasets of three individual neural retinae were generated using Next-generation sequencing and were compared to previously published but reanalyzed datasets of the peripheral and the macular human retina and to reference tissues. The protein localization of several ORs was investigated by immunohistochemistry. The transcriptome analyses detected an average of 14 OR transcripts in the neural retina, of which OR6B3 is one of the most highly expressed ORs. Immunohistochemical stainings of retina sections localized OR2W3 to the photosensitive outer segment membranes of cones, whereas OR6B3 was found in various cell types. OR5P3 and OR10AD1 were detected at the base of the photoreceptor connecting cilium, and OR10AD1 was also localized to the nuclear envelope of all of the nuclei of the retina. The cell type-specific expression of the ORs in the retina suggests that there are unique biological functions for those receptors. PMID:28174521

  16. Focused Evolution of HIV-1 Neutralizing Antibodies Revealed by Structures and Deep Sequencing

    SciTech Connect

    Wu, Xueling; Zhou, Tongqing; Zhu, Jiang; Zhang, Baoshan; Georgiev, Ivelin; Wang, Charlene; Chen, Xuejun; Longo, Nancy S.; Louder, Mark; McKee, Krisha; O’Dell, Sijy; Perfetto, Stephen; Schmidt, Stephen D.; Shi, Wei; Wu, Lan; Yang, Yongping; Yang, Zhi-Yong; Yang, Zhongjia; Zhang, Zhenhai; Bonsignori, Mattia; Crump, John A.; Kapiga, Saidi H.; Sam, Noel E.; Haynes, Barton F.; Simek, Melissa; Burton, Dennis R.; Koff, Wayne C.; Doria-Rose, Nicole A.; Connors, Mark; Mullikin, James C.; Nabel, Gary J.; Roederer, Mario; Shapiro, Lawrence; Kwong, Peter D.; Mascola, John R.

    2013-03-04

    Antibody VRC01 is a human immunoglobulin that neutralizes about 90% of HIV-1 isolates. To understand how such broadly neutralizing antibodies develop, we used x-ray crystallography and 454 pyrosequencing to characterize additional VRC01-like antibodies from HIV-1-infected individuals. Crystal structures revealed a convergent mode of binding for diverse antibodies to the same CD4-binding-site epitope. A functional genomics analysis of expressed heavy and light chains revealed common pathways of antibody-heavy chain maturation, confined to the IGHV1-2*02 lineage, involving dozens of somatic changes, and capable of pairing with different light chains. Broadly neutralizing HIV-1 immunity associated with VRC01-like antibodies thus involves the evolution of antibodies to a highly affinity-matured state required to recognize an invariant viral structure, with lineages defined from thousands of sequences providing a genetic roadmap of their development.

  17. Genomic DNA sequences from mastodon and woolly mammoth reveal deep speciation of forest and savanna elephants.

    PubMed

    Rohland, Nadin; Reich, David; Mallick, Swapan; Meyer, Matthias; Green, Richard E; Georgiadis, Nicholas J; Roca, Alfred L; Hofreiter, Michael

    2010-12-21

    To elucidate the history of living and extinct elephantids, we generated 39,763 bp of aligned nuclear DNA sequence across 375 loci for African savanna elephant, African forest elephant, Asian elephant, the extinct American mastodon, and the woolly mammoth. Our data establish that the Asian elephant is the closest living relative of the extinct mammoth in the nuclear genome, extending previous findings from mitochondrial DNA analyses. We also find that savanna and forest elephants, which some have argued are the same species, are as or more divergent in the nuclear genome as mammoths and Asian elephants, which are considered to be distinct genera, thus resolving a long-standing debate about the appropriate taxonomic classification of the African elephants. Finally, we document a much larger effective population size in forest elephants compared with the other elephantid taxa, likely reflecting species differences in ancient geographic structure and range and differences in life history traits such as variance in male reproductive success.

  18. The transcriptome of Verticillium dahliae-infected Nicotiana benthamiana determined by deep RNA sequencing.

    PubMed

    Faino, Luigi; de Jonge, Ronnie; Thomma, Bart P H J

    2012-09-01

    Verticillium wilt disease is caused by fungi of the Verticillium genus that occur on a wide range of host plants, including Solanaceous species such as tomato and tobacco. Currently, the well characterized Ve1 gene of tomato is the only Verticillium wilt resistance gene cloned. During experiments to identify the Verticillium molecule that activates Ve1 resistance in tomato, RNA sequencing (RNA-Seq) of Verticillium-infected Nicotiana benthamiana was performed. In total, over 99% of the obtained reads were derived from N. benthamiana. Here, we report the assembly and annotation of the N. benthamiana transcriptome. In total, 142,738 transcripts > 100 bp were obtained, amounting to a total transcriptome size of 38.7 Mbp, which is comparable to the Arabidopsis transcriptome. About 30,282 transcripts could be annotated based on homology to Arabidopsis genes. By assembly of the N. benthamiana transcriptome, we provide a catalogue of transcripts of a Solanaceous model plant under pathogen stress.

  19. Deep Sequencing of MYC DNA-Binding Sites in Burkitt Lymphoma

    PubMed Central

    Seitz, Volkhard; Butzhammer, Peter; Hirsch, Burkhard; Hecht, Jochen; Gütgemann, Ines; Ehlers, Anke; Lenze, Dido; Oker, Elisabeth; Sommerfeld, Anke; von der Wall, Edda; König, Christoph; Zinser, Christian; Spang, Rainer; Hummel, Michael

    2011-01-01

    Background: MYC is a key transcription factor involved in central cellular processes such as regulation of the cell cycle, histone acetylation and ribosomal biogenesis. It is overexpressed in the majority of human tumors including aggressive B-cell lymphoma. Burkitt lymphoma (BL) in particular is a prime example of MYC overexpression due to a chromosomal translocation involving the c-MYC gene. However, no genome-wide analysis of MYC-binding sites by chromatin immunoprecipitation (ChIP) followed by next generation sequencing (ChIP-Seq) has been conducted in BL so far. Methodology/Principal Findings: ChIP-Seq was performed on 5 BL cell lines with a MYC-specific antibody giving rise to 7,054 MYC-binding sites after bioinformatics analysis of a total of approx. 19 million sequence reads. In line with previous findings, binding sites accumulate in gene sets known to be involved in the cell cycle, ribosomal biogenesis, histone acetyltransferase and methyltransferase complexes demonstrating a regulatory role of MYC in these processes. Unexpectedly, MYC-binding sites also accumulate in many B-cell relevant genes. To assess the functional consequences of MYC binding, the ChIP-Seq data were supplemented with siRNA-mediated knock-downs of MYC in BL cell lines followed by gene expression profiling. Interestingly, amongst others, genes involved in the B-cell function were up-regulated in response to MYC silencing. Conclusion/Significance: The 7,054 MYC-binding sites identified by our ChIP-Seq approach greatly extend the knowledge regarding MYC binding in BL and shed further light on the enormous complexity of the MYC regulatory network. In particular, our observations that (i) many B-cell relevant genes are targeted by MYC and (ii) MYC down-regulation leads to an up-regulation of B-cell genes highlight an interesting aspect of BL biology. PMID:22102868

  20. Ultra-Deep Sequencing Reveals the microRNA Expression Pattern of the Human Stomach

    PubMed Central

    Ribeiro-dos-Santos, Ândrea; Khayat, André S.; Silva, Artur; Alencar, Dayse O.; Lobato, Jessé; Luz, Larissa; Pinheiro, Daniel G.; Varuzza, Leonardo; Assumpção, Monica; Assumpção, Paulo; Santos, Sidney; Zanette, Dalila L.; Silva, Wilson A.; Burbano, Rommel; Darnet, Sylvain

    2010-01-01

    Background: While microRNAs (miRNAs) play important roles in tissue differentiation and in maintaining basal physiology, little is known about the miRNA expression levels in stomach tissue. Alterations in the miRNA profile can lead to cell deregulation, which can induce neoplasia. Methodology/Principal Findings: A small RNA library of stomach tissue was sequenced using high-throughput SOLiD sequencing technology. We obtained 261,274 quality reads with perfect matches to the human miRnome, and 42% of known miRNAs were identified. Digital Gene Expression profiling (DGE) was performed based on read abundance and showed that fifteen miRNAs were highly expressed in gastric tissue. Subsequently, the expression of these miRNAs was validated in 10 healthy individuals by RT-PCR, which showed a significant correlation of 83.97% (P<0.05). Six miRNAs showed a low variable pattern of expression (miR-29b, miR-29c, miR-19b, miR-31, miR-148a, miR-451) and could be considered part of the expression pattern of the healthy gastric tissue. Conclusions/Significance: This study aimed to validate normal miRNA profiles of human gastric tissue to establish a reference profile for healthy individuals. Determining the regulatory processes acting in the stomach will be important in the fight against gastric cancer, which is the second-leading cause of cancer mortality worldwide. PMID:20949028

  1. Deciphering KRAS and NRAS mutated clone dynamics in MLL-AF4 paediatric leukaemia by ultra deep sequencing analysis.

    PubMed

    Trentin, Luca; Bresolin, Silvia; Giarin, Emanuela; Bardini, Michela; Serafin, Valentina; Accordi, Benedetta; Fais, Franco; Tenca, Claudya; De Lorenzo, Paola; Valsecchi, Maria Grazia; Cazzaniga, Giovanni; Kronnie, Geertruy Te; Basso, Giuseppe

    2016-10-04

    To induce and sustain the leukaemogenic process, MLL-AF4+ leukaemia seems to require very few genetic alterations in addition to the fusion gene itself. Studies of infant and paediatric patients with MLL-AF4+ B cell precursor acute lymphoblastic leukaemia (BCP-ALL) have reported mutations in KRAS and NRAS with incidences ranging from 25 to 50%. Whereas previous studies employed Sanger sequencing, here we used next generation amplicon deep sequencing for in-depth evaluation of RAS mutations in 36 paediatric patients at diagnosis of MLL-AF4+ leukaemia. RAS mutations, including those in small sub-clones, were detected in 63.9% of patients. Furthermore, the mutational analysis of 17 paired samples at diagnosis and relapse revealed complex RAS clone dynamics and showed that the mutated clones present at relapse almost all originated from clones that were already detectable at diagnosis and survived the initial therapy. Finally, we showed that mutated patients were indeed characterized by a RAS-related signature at both transcriptional and protein levels and that targeting the RAS pathway could be beneficial for treatment of MLL-AF4+ BCP-ALL clones carrying somatic RAS mutations.

  2. Somatic copy number alterations detected by ultra-deep targeted sequencing predict prognosis in oral cavity squamous cell carcinoma

    PubMed Central

    Ng, Ka-Pou; Tai, An-Shun; Peng, Shih-Chi; Yeh, Jen-Pao; Chen, Shu-Jen; Tsao, Kuo-Chien; Yen, Tzu-Chen; Hsieh, Wen-Ping

    2015-01-01

    Background Ultra-deep targeted sequencing (UDT-Seq) has advanced our knowledge on the incidence and functional significance of somatic mutations. However, the utility of UDT-Seq in detecting copy number alterations (CNAs) remains unclear. With the goal of improving molecular prognostication and identifying new therapeutic targets, we designed this study to assess whether UDT-Seq may be useful for detecting CNA in oral cavity squamous cell carcinoma (OSCC). Methods We sequenced a panel of clinically actionable cancer mutations in 310 formalin-fixed paraffin-embedded OSCC specimens. A linear model was developed to overcome uneven coverage across target regions and multiple samples. The 5-year rates of secondary primary tumors, local recurrence, neck recurrence, distant metastases, and survival served as the outcome measures. We confirmed the prognostic significance of the CNA signatures in an independent sample of 105 primary OSCC specimens. Results The CNA burden across 10 targeted genes was found to predict prognosis in two independent cohorts. FGFR1 and PIK3CA amplifications were associated with prognosis independent of clinical risk factors. Genes exhibiting CNA were clustered in the proteoglycan metabolism, the FOXO signaling, and the PI3K-AKT signaling pathways, for which targeted drugs are already available or currently under development. Conclusions UDT-Seq is clinically useful to identify CNA, which significantly improve the prognostic information provided by traditional clinicopathological risk factors in OSCC patients. PMID:26087196

  3. Deep sequencing and in silico analyses identify MYB-regulated gene networks and signaling pathways in pancreatic cancer

    PubMed Central

    Azim, Shafquat; Zubair, Haseeb; Srivastava, Sanjeev K.; Bhardwaj, Arun; Zubair, Asif; Ahmad, Aamir; Singh, Seema; Khushman, Moh’d.; Singh, Ajay P.

    2016-01-01

    We have recently demonstrated that the transcription factor MYB can modulate several cancer-associated phenotypes in pancreatic cancer. In order to understand the molecular basis of these MYB-associated changes, we conducted deep-sequencing of transcriptome of MYB-overexpressing and -silenced pancreatic cancer cells, followed by in silico pathway analysis. We identified significant modulation of 774 genes upon MYB-silencing (p < 0.05) that were assigned to 25 gene networks by in silico analysis. Further analyses placed genes in our RNA sequencing-generated dataset to several canonical signalling pathways, such as cell-cycle control, DNA-damage and -repair responses, p53 and HIF1α. Importantly, we observed downregulation of the pancreatic adenocarcinoma signaling pathway in MYB-silenced pancreatic cancer cells exhibiting suppression of EGFR and NF-κB. Decreased expression of EGFR and RELA was validated by both qPCR and immunoblotting and they were both shown to be under direct transcriptional control of MYB. These observations were further confirmed in a converse approach wherein MYB was overexpressed ectopically in a MYB-null pancreatic cancer cell line. Our findings thus suggest that MYB potentially regulates growth and genomic stability of pancreatic cancer cells via targeting complex gene networks and signaling pathways. Further in-depth functional studies are warranted to fully understand MYB signaling in pancreatic cancer. PMID:27354262

  4. Deciphering KRAS and NRAS mutated clone dynamics in MLL-AF4 paediatric leukaemia by ultra deep sequencing analysis

    PubMed Central

    Trentin, Luca; Bresolin, Silvia; Giarin, Emanuela; Bardini, Michela; Serafin, Valentina; Accordi, Benedetta; Fais, Franco; Tenca, Claudya; De Lorenzo, Paola; Valsecchi, Maria Grazia; Cazzaniga, Giovanni; Kronnie, Geertruy te; Basso, Giuseppe

    2016-01-01

    To induce and sustain the leukaemogenic process, MLL-AF4+ leukaemia seems to require very few genetic alterations in addition to the fusion gene itself. Studies of infant and paediatric patients with MLL-AF4+ B cell precursor acute lymphoblastic leukaemia (BCP-ALL) have reported mutations in KRAS and NRAS with incidences ranging from 25 to 50%. Whereas previous studies employed Sanger sequencing, here we used next generation amplicon deep sequencing for in-depth evaluation of RAS mutations in 36 paediatric patients at diagnosis of MLL-AF4+ leukaemia. RAS mutations, including those in small sub-clones, were detected in 63.9% of patients. Furthermore, the mutational analysis of 17 paired samples at diagnosis and relapse revealed complex RAS clone dynamics and showed that the mutated clones present at relapse almost all originated from clones that were already detectable at diagnosis and survived the initial therapy. Finally, we showed that mutated patients were indeed characterized by a RAS-related signature at both transcriptional and protein levels and that targeting the RAS pathway could be beneficial for treatment of MLL-AF4+ BCP-ALL clones carrying somatic RAS mutations. PMID:27698462

  5. Identification and characterisation of microRNAs in young adults of Angiostrongylus cantonensis via a deep-sequencing approach

    PubMed Central

    Chang, Shih-Hsin; Tang, Petrus; Lai, Cheng-Hung; Kuo, Ming-Ling; Wang, Lian-Chen

    2013-01-01

    Angiostrongylus cantonensis is an important causative agent of eosinophilic meningitis and eosinophilic meningoencephalitis in humans. MicroRNAs (miRNAs) are small non-coding RNAs that participate in a wide range of biological processes. This study employed a deep-sequencing approach to study miRNAs from young adults of A. cantonensis. Based on 16,880,456 high-quality reads, 252 conserved mature miRNAs belonging to 90 families, together with 10 antisense miRNAs, were identified and characterised. Among these sequences, 53 miRNAs from 25 families displayed 50 or more reads. The conserved miRNA families were divided into four groups according to their phylogenetic distribution and a total of nine families without any members showing homology to other nematodes or adult worms were identified. Stem-loop real-time polymerase chain reaction analysis of aca-miR-1-1 and aca-miR-71-1 demonstrated that their level of expression increased dramatically from infective larvae to young adults and then decreased in adult worms, with the male worms exhibiting significantly higher levels of expression than female worms. These findings provide information related to the regulation of gene expression during the growth, development and pathogenesis of young adults of A. cantonensis. PMID:24037191

  6. Identifying genomic changes associated with insecticide resistance in the dengue mosquito Aedes aegypti by deep targeted sequencing.

    PubMed

    Faucon, Frederic; Dusfour, Isabelle; Gaude, Thierry; Navratil, Vincent; Boyer, Frederic; Chandre, Fabrice; Sirisopa, Patcharawan; Thanispong, Kanutcharee; Juntarajumnong, Waraporn; Poupardin, Rodolphe; Chareonviriyaphap, Theeraphap; Girod, Romain; Corbel, Vincent; Reynaud, Stephane; David, Jean-Philippe

    2015-09-01

    The capacity of mosquitoes to resist insecticides threatens the control of diseases such as dengue and malaria. Until alternative control tools are implemented, characterizing resistance mechanisms is crucial for managing resistance in natural populations. Insecticide biodegradation by detoxification enzymes is a common resistance mechanism; however, the genomic changes underlying this mechanism have rarely been identified, precluding individual resistance genotyping. In particular, the role of copy number variations (CNVs) and polymorphisms of detoxification enzymes has never been investigated at the genome level, although they can represent robust markers of metabolic resistance. In this context, we combined target enrichment with high-throughput sequencing to conduct the first comprehensive screening of gene amplifications and polymorphisms associated with insecticide resistance in mosquitoes. More than 760 candidate genes were captured and deep sequenced in several populations of the dengue mosquito Ae. aegypti displaying distinct genetic backgrounds and contrasting levels of resistance to the insecticide deltamethrin. CNV analysis identified 41 gene amplifications associated with resistance, most affecting cytochrome P450s overtranscribed in resistant populations. Polymorphism analysis detected more than 30,000 variants and strong selection footprints in specific genomic regions. Combining Bayesian and allele frequency filtering approaches identified 55 nonsynonymous variants strongly associated with resistance. Both CNVs and polymorphisms were conserved within regions but differed across continents, confirming that genomic changes underlying metabolic resistance to insecticides are not universal. By identifying novel DNA markers of insecticide resistance, this study opens the way for tracking down metabolic changes developed by mosquitoes to resist insecticides within and among populations.

  7. Identifying genomic changes associated with insecticide resistance in the dengue mosquito Aedes aegypti by deep targeted sequencing

    PubMed Central

    Faucon, Frederic; Dusfour, Isabelle; Gaude, Thierry; Navratil, Vincent; Boyer, Frederic; Chandre, Fabrice; Sirisopa, Patcharawan; Thanispong, Kanutcharee; Juntarajumnong, Waraporn; Poupardin, Rodolphe; Chareonviriyaphap, Theeraphap; Girod, Romain; Corbel, Vincent; Reynaud, Stephane; David, Jean-Philippe

    2015-01-01

    The capacity of mosquitoes to resist insecticides threatens the control of diseases such as dengue and malaria. Until alternative control tools are implemented, characterizing resistance mechanisms is crucial for managing resistance in natural populations. Insecticide biodegradation by detoxification enzymes is a common resistance mechanism; however, the genomic changes underlying this mechanism have rarely been identified, precluding individual resistance genotyping. In particular, the role of copy number variations (CNVs) and polymorphisms of detoxification enzymes has never been investigated at the genome level, although they can represent robust markers of metabolic resistance. In this context, we combined target enrichment with high-throughput sequencing to conduct the first comprehensive screening of gene amplifications and polymorphisms associated with insecticide resistance in mosquitoes. More than 760 candidate genes were captured and deep sequenced in several populations of the dengue mosquito Ae. aegypti displaying distinct genetic backgrounds and contrasting levels of resistance to the insecticide deltamethrin. CNV analysis identified 41 gene amplifications associated with resistance, most affecting cytochrome P450s overtranscribed in resistant populations. Polymorphism analysis detected more than 30,000 variants and strong selection footprints in specific genomic regions. Combining Bayesian and allele frequency filtering approaches identified 55 nonsynonymous variants strongly associated with resistance. Both CNVs and polymorphisms were conserved within regions but differed across continents, confirming that genomic changes underlying metabolic resistance to insecticides are not universal. By identifying novel DNA markers of insecticide resistance, this study opens the way for tracking down metabolic changes developed by mosquitoes to resist insecticides within and among populations. PMID:26206155

  8. Small RNA and transcriptome deep sequencing proffers insight into floral gene regulation in Rosa cultivars

    PubMed Central

    2012-01-01

    Background Roses (Rosa sp.), which belong to the family Rosaceae, are the most economically important ornamental plants, making up 30% of the floriculture market. However, despite high demand for roses, rose breeding programs have limited molecular resources, which could greatly enhance and speed breeding efforts. A better understanding of the genes that contribute to floral development and desired phenotypes will lead to improved rose cultivars. For this study, we analyzed rose miRNAs and the rose flower transcriptome in order to generate a database to expand current knowledge regarding regulation of important floral characteristics. A rose genetic database will enable comprehensive analysis of gene expression and regulation via miRNA among different Rosa cultivars. Results We produced more than 0.5 million reads from expressed sequences, totalling more than 110 million bp. From these, we generated 35,657, 31,434, 34,725, and 39,722 flower unigenes from Rosa hybrid: ‘Vital’, ‘Maroussia’, and ‘Sympathy’ and Rosa rugosa Thunb., respectively. The unigenes were assigned functional annotations, domains, metabolic pathways, Gene Ontology (GO) terms, Plant Ontology (PO) terms, and MIPS Functional Catalogue (FunCat) terms. Rose flower transcripts were compared with genes from whole genome sequences of Rosaceae members (apple, strawberry, and peach) and grape. We also produced approximately 40 million small RNA reads from flower tissue for Rosa, representing 267 unique miRNA tags. Among the identified miRNAs, 25 were novel and 242 were conserved. Statistical analyses of miRNA profiles revealed both shared and species-specific miRNAs, which presumably affect flower development and phenotypes. Conclusions In this study, we constructed a Rose miRNA and transcriptome database, and we analyzed the miRNAs and transcriptome generated from the flower tissues of four Rosa cultivars. The database provides a comprehensive genetic

  9. The 2007 Nazko, British Columbia, earthquake sequence: Injection of magma deep in the crust beneath the Anahim volcanic belt

    USGS Publications Warehouse

    Cassidy, J.F.; Balfour, N.; Hickson, C.; Kao, H.; White, Rickie; Caplan-Auerbach, J.; Mazzotti, S.; Rogers, Gary C.; Al-Khoubbi, I.; Bird, A.L.; Esteban, L.; Kelman, M.; Hutchinson, J.; McCormack, D.

    2011-01-01

    On 9 October 2007, an unusual sequence of earthquakes began in central British Columbia about 20 km west of the Nazko cone, the most recent (circa 7200 yr) volcanic center in the Anahim volcanic belt. Within 25 hr, eight earthquakes of magnitude 2.3-2.9 occurred in a region where no earthquakes had previously been recorded. During the next three weeks, more than 800 microearthquakes were located (and many more detected), most at a depth of 25-31 km and within a radius of about 5 km. After about two months, almost all activity ceased. The clear P- and S-wave arrivals indicated that these were high-frequency (volcanic-tectonic) earthquakes and the b value of 1.9 that we calculated is anomalous for crustal earthquakes but consistent with volcanic-related events. Analysis of receiver functions at a station immediately above the seismicity indicated a Moho near 30 km depth. Precise relocation of the seismicity using a double-difference method suggested a horizontal migration at the rate of about 0.5 km/d, with almost all events within the lowermost crust. Neither harmonic tremor nor long-period events were observed; however, some spasmodic bursts were recorded and determined to be colocated with the earthquake hypocenters. These observations are all very similar to a deep earthquake sequence recorded beneath Lake Tahoe, California, in 2003-2004. Based on these remarkable similarities, we interpret the Nazko sequence as an indication of an injection of magma into the lower crust beneath the Anahim volcanic belt. This magma injection fractures rock, producing high-frequency, volcanic-tectonic earthquakes and spasmodic bursts.

  10. Deep sequencing reveals microbiota dysbiosis of tongue coat in patients with liver carcinoma

    NASA Astrophysics Data System (ADS)

    Lu, Haifeng; Ren, Zhigang; Li, Ang; Zhang, Hua; Jiang, Jianwen; Xu, Shaoyan; Luo, Qixia; Zhou, Kai; Sun, Xiaoli; Zheng, Shusen; Li, Lanjuan

    2016-09-01

    Liver carcinoma (LC) is a common malignancy worldwide, associated with high morbidity and mortality. Characterizing microbiome profiles of tongue coat may provide useful insights and a potential diagnostic marker for LC patients. Herein, we investigate for the first time the tongue coat microbiome of LC patients with cirrhosis based on 16S ribosomal RNA (rRNA) gene sequencing. After applying strict inclusion and exclusion criteria, 35 early LC patients with cirrhosis and 25 matched healthy subjects were enrolled. Microbiome diversity of tongue coat in LC patients was significantly increased, as shown by the Shannon, Simpson and Chao 1 indexes. The tongue coat microbiome significantly distinguished LC patients from healthy subjects by principal component analysis. Tongue coat microbial profiles represented 38 operational taxonomic units assigned to 23 different genera, distinguishing LC patients. Linear discriminant analysis (LDA) effect size (LEfSe) reveals significant microbial dysbiosis of tongue coats in LC patients. Strikingly, Oribacterium and Fusobacterium could distinguish LC patients from healthy subjects. LEfSe outputs show that microbial gene functions related to the categories nickel/iron_transport, amino_acid_transport, energy produced system and metabolism differ between LC patients and healthy subjects. These findings identify for the first time microbiota dysbiosis of tongue coat in LC patients and may provide a novel, non-invasive potential diagnostic biomarker of LC.

  11. A global view of transcriptome dynamics during flower development in chickpea by deep sequencing.

    PubMed

    Singh, Vikash K; Garg, Rohini; Jain, Mukesh

    2013-08-01

    Measurement of gene expression can provide important clues about gene function and the molecular basis of developmental processes. Here, we have analysed the chickpea transcriptome in vegetative and flower tissues by exploiting the potential of high-throughput sequencing to measure gene expression. We mapped more than 295 million reads to quantify the transcript abundance during flower development. We detected the expression of more than 90% of genes in at least one tissue analysed. We found that a large number of genes were differentially expressed during flower development as compared to vegetative tissues. Further, we identified several genes expressed in a stage-specific manner. Various transcription factor families and metabolic pathways involved in flower development were elucidated. The members of the MADS-box family were most represented among the transcription factor genes up-regulated during various stages of flower development. The abundant expression of several well-known genes implicated in flower development in chickpea flower development stages confirmed our results. In addition, we detected the expression specificities of lineage-specific genes during flower development. The expression data presented in this study is the most comprehensive dataset available for chickpea as of now and will serve as a resource for unraveling the functions of many specific genes involved in flower development in chickpea and other legumes.

  12. Deep sequencing of transcriptome profiling of GSTM2 knock-down in swine testis cells.

    PubMed

    Lv, Yuqi; Jin, Yi; Zhou, Yongqiang; Jin, Jianjun; Ma, Zhenfa; Ren, Zhuqing

    2016-12-01

    Glutathione-S-transferases mu 2 (GSTM2), an important Phase II antioxidant enzyme of eukaryotes, is degraded by nonsense-mediated mRNA decay in pigs due to a C27T substitution in the fifth exon. As a reproductive performance-related gene, GSTM2 is involved in embryo implantation, whereas functional deficiency of GSTM2 potentially induces pre- or post-natal death in piglets. To gain insight into the role of GSTM2 in embryo development, high-throughput RNA sequencing was performed using swine testis (ST) cells with the deletion of GSTM2. Some embryo development-related genes are observed among a total of 242 differentially expressed genes, including STAT1, SRC, IL-8, DUSP family, CCL family and integrin family. GSTM2 affects expression of SRC, OPN, and SLCs. GSTM2 suppresses phosphorylation of STAT1 by binding to STAT1. In addition, as an important transcription factor, STAT1 regulates expression of uterus receptive-related genes including CCLs, IRF9, IFITs, MXs, and OAS. The present study provides evidence for the molecular mechanism by which GSTM2 modulates embryo development.

  13. Deep sequencing of transcriptome profiling of GSTM2 knock-down in swine testis cells

    PubMed Central

    Lv, Yuqi; Jin, Yi; Zhou, Yongqiang; Jin, Jianjun; Ma, Zhenfa; Ren, Zhuqing

    2016-01-01

    Glutathione-S-transferases mu 2 (GSTM2), an important Phase II antioxidant enzyme of eukaryotes, is degraded by nonsense-mediated mRNA decay in pigs due to a C27T substitution in the fifth exon. As a reproductive performance-related gene, GSTM2 is involved in embryo implantation, whereas functional deficiency of GSTM2 potentially induces pre- or post-natal death in piglets. To gain insight into the role of GSTM2 in embryo development, high-throughput RNA sequencing was performed using swine testis (ST) cells with the deletion of GSTM2. Some embryo development-related genes are observed among a total of 242 differentially expressed genes, including STAT1, SRC, IL-8, DUSP family, CCL family and integrin family. GSTM2 affects expression of SRC, OPN, and SLCs. GSTM2 suppresses phosphorylation of STAT1 by binding to STAT1. In addition, as an important transcription factor, STAT1 regulates expression of uterus receptive-related genes including CCLs, IRF9, IFITs, MXs, and OAS. The present study provides evidence for the molecular mechanism by which GSTM2 modulates embryo development. PMID:27905550

  14. Deep sequencing reveals microbiota dysbiosis of tongue coat in patients with liver carcinoma

    PubMed Central

    Lu, Haifeng; Ren, Zhigang; Li, Ang; Zhang, Hua; Jiang, Jianwen; Xu, Shaoyan; Luo, Qixia; Zhou, Kai; Sun, Xiaoli; Zheng, Shusen; Li, Lanjuan

    2016-01-01

    Liver carcinoma (LC) is a common malignancy worldwide, associated with high morbidity and mortality. Characterizing microbiome profiles of tongue coat may provide useful insights and a potential diagnostic marker for LC patients. Herein, we investigate for the first time the tongue coat microbiome of LC patients with cirrhosis based on 16S ribosomal RNA (rRNA) gene sequencing. After applying strict inclusion and exclusion criteria, 35 early LC patients with cirrhosis and 25 matched healthy subjects were enrolled. Microbiome diversity of tongue coat in LC patients was significantly increased, as shown by the Shannon, Simpson and Chao 1 indexes. The tongue coat microbiome significantly distinguished LC patients from healthy subjects by principal component analysis. Tongue coat microbial profiles represented 38 operational taxonomic units assigned to 23 different genera, distinguishing LC patients. Linear discriminant analysis (LDA) effect size (LEfSe) reveals significant microbial dysbiosis of tongue coats in LC patients. Strikingly, Oribacterium and Fusobacterium could distinguish LC patients from healthy subjects. LEfSe outputs show that microbial gene functions related to the categories nickel/iron_transport, amino_acid_transport, energy produced system and metabolism differ between LC patients and healthy subjects. These findings identify for the first time microbiota dysbiosis of tongue coat in LC patients and may provide a novel, non-invasive potential diagnostic biomarker of LC. PMID:27605161

  15. M4 - A Globular Cluster Hubble Deep Field: The Main Sequence

    NASA Astrophysics Data System (ADS)

    Richer, H. B.; Brewer, J.; Fahlman, G. G.; Gibson, B.; Hansen, B.; Ibata, R.; Limongi, M.; Rich, M. R.; Stetson, P. B.; Shara, M.

    2001-12-01

    A 123 orbit exposure with HST in 2 colors (GO-8679) in a single field of M4 has yielded photometry to V ~ 30, I ~ 28.5. In the proper motion selected cluster color-magnitude diagram constructed from these and earlier epoch data, there is an abrupt termination of the cluster main sequence at about V = 28 (MV = 15.5), fully 2 magnitudes brighter than our limiting magnitude. This is similar to what is seen among field Population II subdwarfs of the same metallicity ([m/H] = -1.3) and suggests that the lowest mass stars capable of burning hydrogen in their cores at this metallicity have been detected. Comparison with the current generation of models suggests that the masses here are near 0.09 Msun. Of particular interest are 6 objects which have proper motions appropriate for cluster membership, are moderately bright, but are undetected in V. The brightest of these will have (V - I) > 4, redder than any known metal poor subdwarf. The nature of these objects is currently unknown but one possibility is that they are recycled brown dwarfs.

  16. Deep sequencing of the Camellia chekiangoleosa transcriptome revealed candidate genes for anthocyanin biosynthesis.

    PubMed

    Wang, Zhong-Wei; Jiang, Cong; Wen, Qiang; Wang, Na; Tao, Yuan-Yuan; Xu, Li-An

    2014-03-15

    Camellia chekiangoleosa is an important species of genus Camellia. It provides high-quality edible oil and has great ornamental value. The flowers are big and red which bloom between February and March. Flower pigmentation is closely related to the accumulation of anthocyanin. Although anthocyanin biosynthesis has been studied extensively in herbaceous plants, little molecular information on the anthocyanin biosynthesis pathway of C. chekiangoleosa is yet known. In the present study, a cDNA library was constructed to obtain detailed and general data from the flowers of C. chekiangoleosa. To explore the transcriptome of C. chekiangoleosa and investigate genes involved in anthocyanin biosynthesis, a 454 GS FLX Titanium platform was used to generate an EST dataset. About 46,279 sequences were obtained, and 24,593 (53.1%) were annotated. Using Blast search against the AGRIS, 1740 unigenes were found homologous to 599 Arabidopsis transcription factor genes. Based on the transcriptome dataset, nine anthocyanin biosynthesis pathway genes (PAL, CHS1, CHS2, CHS3, CHI, F3H, DFR, ANS, and UFGT) were identified and cloned. The spatio-temporal expression patterns of these genes were also analyzed using quantitative real-time polymerase chain reaction. The study results not only enrich the gene resource but also provide valuable information for further studies concerning anthocyanin biosynthesis.

  17. Deep Sequencing and Ecological Characterization of Gut Microbial Communities of Diverse Bumble Bee Species

    PubMed Central

    Lim, Haw Chuan; Chu, Chia-Ching; Seufferheld, Manfredo J.; Cameron, Sydney A.

    2015-01-01

    Gut bacterial communities of bumble bees are correlated with defense against pathogens. Further understanding this host-microbe association is vitally important as bumble bees are currently experiencing global population declines, potentially due in part to emergent diseases. In this study, we used pyrosequencing and community fingerprinting (ARISA) to characterize the gut microbial communities of nine bumble bee species from across the Bombus phylogeny. Overall, we delimited 74 bacterial taxa (operational taxonomic units or OTUs) belonging to Betaproteobacteria, Gammaproteobacteria, Bacilli, Actinobacteria, Flavobacteria and Alphaproteobacteria. Each bacterial community was taxonomically simple, containing an average of 1.9 common (relative abundance per sample > 5%) bacterial OTUs. The most abundant and prevalent (occurring in 92% of the samples) bacterial OTU, based on 16S rRNA sequences, closely matched that of the previously described Betaproteobacteria species Snodgrassella alvi. Bacteria that were first described in bee-related external environments dominated a number of gut bacterial communities, suggesting that they are not strictly dependent on the internal gut environment. The ARISA data showed a correlation between bacterial community structures and the geographic locations where the bees were sampled, suggesting that at least a subset of the bacterial species may be transmitted environmentally. Using light and fluorescent microscopy, we demonstrated that the gut bacteria form a biofilm on the internal epithelial surface of the ileum, corroborating results obtained from Apis mellifera. PMID:25768110

  18. Multiple Layers of Chimerism in a Single-Stranded DNA Virus Discovered by Deep Sequencing

    PubMed Central

    Krupovic, Mart; Zhi, Ning; Li, Jungang; Hu, Gangqing; Koonin, Eugene V.; Wong, Susan; Shevchenko, Sofiya; Zhao, Keji; Young, Neal S.

    2015-01-01

    Viruses with single-stranded (ss) DNA genomes infect hosts in all three domains of life and include many medically, ecologically, and economically important pathogens. Recently, a new group of ssDNA viruses with chimeric genomes has been discovered through viral metagenomics. These chimeric viruses combine capsid protein genes and replicative protein genes that, respectively, appear to have been inherited from viruses with positive-strand RNA genomes, such as tombusviruses, and ssDNA genomes, such as circoviruses, nanoviruses or geminiviruses. Here, we describe the genome sequence of a new representative of this virus group and reveal an additional layer of chimerism among ssDNA viruses. We show that not only do these viruses encompass genes for capsid proteins and replicative proteins that have distinct evolutionary histories, but also the replicative genes themselves are chimeras of functional domains inherited from viruses of different families. Our results underscore the importance of horizontal gene transfer in the evolution of ssDNA viruses and the role of genetic recombination in the emergence of novel virus groups. PMID:25840414

  19. Deep sequencing of the Camellia sinensis transcriptome revealed candidate genes for major metabolic pathways of tea-specific compounds

    SciTech Connect

    Shi, CY; Yang, H; Wei, CL; Yu, O; Zhang, ZZ; Sun, J; Wan, XC

    2011-01-01

    time PCR (qRT-PCR). An extensive transcriptome dataset has been obtained from the deep sequencing of tea plant. The coverage of the transcriptome is comprehensive enough to discover all known genes of several major metabolic pathways. This transcriptome dataset can serve as an important public information platform for gene expression, genomics, and functional genomic studies in C. sinensis.

  20. Deep sequencing reveals small RNA characterization of invasive micropapillary carcinomas of the breast.

    PubMed

    Li, Shuai; Yang, Cuicui; Zhai, Lili; Zhang, Wenwei; Yu, Jing; Gu, Feng; Lang, Ronggang; Fan, Yu; Gong, Meihua; Zhang, Xiuqing; Fu, Li

    2012-11-01

    Invasive micropapillary carcinoma (IMPC) is an uncommon histological type of breast cancer. IMPC has a special growth pattern and a more aggressive behavior than invasive ductal carcinomas of no special types (IDC-NSTs). microRNAs are a large class of non-coding RNAs involved in the regulation of various biological processes. Here, we analyzed the small RNA transcriptomes of five formalin-fixed paraffin-embedded (FFPE) pure IMPC samples and five FFPE IDC-NSTs samples by means of next-generation sequencing, generating a total of >170,000,000 clean reads. In an unsupervised cluster analysis, differently expressed miRNAs generated a tree with clear distinction between IMPC and IDC-NSTs classes. Paired fresh-frozen and FFPE specimens showed very similar miRNA expression profiles. By means of RT-qPCR, we further investigated miRNA expression in more IMPC (n = 22) and IDC-NSTs (n = 24) FFPE samples and found let-7b, miR-30c, miR-148a, miR-181a, miR-181a*, and miR-181b were significantly differently expressed between the two groups. We also elucidated several features of miRNA in these breast cancer tissues including 5' variability, miRNA editing, and 3' untemplated addition. Our findings will lead to further understanding of the invasive potency of IMPC and gain an insight into the diversity and complexity of small RNA molecules in breast cancer tissues.

  1. Deep small RNA sequencing from the nematode Ascaris reveals conservation, functional diversification, and novel developmental profiles.

    PubMed

    Wang, Jianbin; Czech, Benjamin; Crunk, Amanda; Wallace, Adam; Mitreva, Makedonka; Hannon, Gregory J; Davis, Richard E

    2011-09-01

    Eukaryotic cells express several classes of small RNAs that regulate gene expression and ensure genome maintenance. Endogenous siRNAs (endo-siRNAs) and Piwi-interacting RNAs (piRNAs) mainly control gene and transposon expression in the germline, while microRNAs (miRNAs) generally function in post-transcriptional gene silencing in both somatic and germline cells. To provide an evolutionary and developmental perspective on small RNA pathways in nematodes, we identified and characterized known and novel small RNA classes through gametogenesis and embryo development in the parasitic nematode Ascaris suum and compared them with known small RNAs of Caenorhabditis elegans. piRNAs, Piwi-clade Argonautes, and other proteins associated with the piRNA pathway have been lost in Ascaris. miRNAs are synthesized immediately after fertilization in utero, before pronuclear fusion, and before the first cleavage of the zygote. This is the earliest expression of small RNAs ever described at a developmental stage long thought to be transcriptionally quiescent. A comparison of the two classes of Ascaris endo-siRNAs, 22G-RNAs and 26G-RNAs, to those in C. elegans, suggests great diversification and plasticity in the use of small RNA pathways during spermatogenesis in different nematodes. Our data reveal conserved characteristics of nematode small RNAs as well as features unique to Ascaris that illustrate significant flexibility in the use of small RNA pathways, some of which are likely an adaptation to Ascaris' life cycle and parasitism. The transcriptome assembly has been submitted to NCBI Transcriptome Shotgun Assembly Sequence Database (http://www.ncbi.nlm.nih.gov/genbank/TSA.html) under accession numbers JI163767–JI182837 and JI210738–JI257410.

  2. Makah Formation; a deep-marginal-basin sequence of late Eocene and Oligocene age in the northwestern Olympic Peninsula, Washington

    USGS Publications Warehouse

    Snavely, P. D.; Niem, A.R.; MacLeod, N.S.; Pearl, J.E.; Rau, W.W.

    1980-01-01

    The Makah Formation of the Twin River Group crops out in a northwest-trending linear belt in the northwesternmost part of the Olympic Peninsula, Wash. This marine sequence consists of 2800 meters of predominantly thin-bedded siltstone and sandstone that encloses six distinctive newly named members--four thick-bedded amalgamated turbidite sandstone members, an olistostromal shallow-water marine sandstone and conglomerate member, and a thin-bedded water-laid tuff member. A local unconformity of submarine origin occurs within the lower part of the Makah Formation except in the central part of the study area, where it forms the contact between the older Hoko River Formation and the Makah. Foraminiferal faunas indicate that the Makah Formation ranges in age from late Eocene (late Narizian) to late Oligocene (Zemorrian) and was deposited in a predominantly lower to middle bathyal environment. The Makah Formation is part of a deep-marginal-basin facies that crops out in the western part of the Olympic Peninsula, in southwesternmost Washington and coastal embayments in northwestern Oregon, and along the central part of the coast of western Vancouver Island. On the basis of limited subsurface data from exploratory wells, correlative deep-marginal-basin deposits underlie the inner continental shelf of Oregon and the continental shelf (Tofino basin) along the southwestern side of Vancouver Island. Directional structures in the Makah Formation indicate that the predominantly lithic arkosic sandstone that forms the turbidite packets was derived from the northwest. A possible source of the clastic material is the dioritic, granitic, and volcanic terranes in the vicinity of the Hesquiat Peninsula and Barkley Sound on the west coast of Vancouver Island. Vertical and lateral variations of turbidite facies suggest that the four packets of sandstone were formed as depositional lobes on an outer submarine fan. The thin-bedded strata between the turbidite packets have characteristics of

  3. In silico serotyping of E. coli from short read data identifies limited novel O-loci but extensive diversity of O:H serotype combinations within and between pathogenic lineages

    PubMed Central

    Valcanis, Mary; Kuzevski, Alex; Tauschek, Marija; Inouye, Michael; Stinear, Tim; Levine, Myron M.; Robins-Browne, Roy M.; Holt, Kathryn E.

    2016-01-01

    The lipopolysaccharide (O) and flagellar (H) surface antigens of Escherichia coli are targets for serotyping that have traditionally been used to identify pathogenic lineages. These surface antigens are important for the survival of E. coli within mammalian hosts. However, traditional serotyping has several limitations, and public health reference laboratories are increasingly moving towards whole genome sequencing (WGS) to characterize bacterial isolates. Here we present a method to rapidly and accurately serotype E. coli isolates from raw, short read WGS data. Our approach bypasses the need for de novo genome assembly by directly screening WGS reads against a curated database of alleles linked to known and novel E. coli O-groups and H-types (the EcOH database) using the software package srst2. We validated the approach by comparing in silico results for 197 enteropathogenic E. coli isolates with those obtained by serological phenotyping in an independent laboratory. We then demonstrated the utility of our method to characterize isolates in public health and clinical settings, and to explore the genetic diversity of >1500 E. coli genomes from multiple sources. Importantly, we showed that transfer of O- and H-antigen loci between E. coli chromosomal backbones is common, with little evidence of constraints by host or pathotype, suggesting that E. coli ‘strain space’ may be virtually unlimited, even within specific pathotypes. Our findings show that serotyping is most useful when used in combination with strain genotyping to characterize microevolution events within an inferred population structure. PMID:28348859

  4. Isolation and Characterization of Microsatellite DNA Markers in the Deep-Sea Amphipod Paralicella tenuipes by Illumina MiSeq Sequencing.

    PubMed

    Ritchie, Heather; Jamieson, Alan J; Piertney, Stuart B

    2016-07-01

    Here, we describe the development of 16 polymorphic microsatellite markers using an Illumina MiSeq sequencing approach in the deep-sea amphipod Paralicella tenuipes. A total of 25 577 844 DNA sequences were filtered for microsatellite motifs of which 197 873 sequences were identified. From these sequences, 64 had sufficient flanking regions for primer design and 16 of these loci were polymorphic. Between 5 and 30 alleles were detected per locus, with an average of 13.63 alleles per locus, across a total of 120 individuals from 5 separate deep sea trenches from the Pacific Ocean. For the 16 loci, observed and expected heterozygosity values ranged from 0.116 to 0.414 and 0.422 to 0.820, respectively, with one locus displaying significant deviation from Hardy-Weinberg equilibrium. The microsatellite loci that have been isolated and described here are the first molecular markers developed for deep sea amphipods and will be invaluable for elucidating the genetic population structure and the extent of connectivity between deep ocean trenches.
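
The observed and expected heterozygosity values quoted above can be illustrated with a minimal sketch. It assumes the plain textbook definitions (observed: fraction of heterozygous individuals; expected: one minus the sum of squared allele frequencies); the abstract does not state which estimator the authors used. The function names and the genotype encoding (one pair of allele labels per individual) are hypothetical.

```python
from collections import Counter

def observed_heterozygosity(genotypes):
    """Fraction of individuals carrying two different alleles at the locus."""
    heterozygotes = sum(1 for a, b in genotypes if a != b)
    return heterozygotes / len(genotypes)

def expected_heterozygosity(genotypes):
    """1 - sum(p_i^2) over allele frequencies p_i (no small-sample correction)."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return 1.0 - sum((c / total) ** 2 for c in counts.values())

# Illustrative genotypes only (allele sizes in bp); not data from the study.
example = [(150, 150), (150, 154), (154, 154), (150, 154)]
ho = observed_heterozygosity(example)
he = expected_heterozygosity(example)
```

An observed value well below the expected one at a locus, as in the ranges reported above, is the kind of gap that a Hardy-Weinberg test examines.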

  5. InFusion: Advancing Discovery of Fusion Genes and Chimeric Transcripts from Deep RNA-Sequencing Data

    PubMed Central

    Okonechnikov, Konstantin; Imai-Matsushima, Aki; Seitz, Alexander; Meyer, Thomas F.; Garcia-Alcalde, Fernando

    2016-01-01

    Analysis of fusion transcripts has become increasingly important due to their link with cancer development. Since high-throughput sequencing approaches survey fusion events exhaustively, several computational methods for the detection of gene fusions from RNA-seq data have been developed. This kind of analysis, however, is complicated by native trans-splicing events, the splicing-induced complexity of the transcriptome and biases and artefacts introduced in experiments and data analysis. There are a number of tools available for the detection of fusions from RNA-seq data; however, certain differences in specificity and sensitivity between commonly used approaches have been found. The ability to detect gene fusions of different types, including isoform fusions and fusions involving non-coding regions, has not been thoroughly studied yet. Here, we propose a novel computational toolkit called InFusion for fusion gene detection from RNA-seq data. InFusion introduces several unique features, such as discovery of fusions involving intergenic regions, and detection of anti-sense transcription in chimeric RNAs based on strand-specificity. Our approach demonstrates superior detection accuracy on simulated data and several public RNA-seq datasets. This improved performance was also evident when evaluating data from RNA deep-sequencing of two well-established prostate cancer cell lines. InFusion identified 26 novel fusion events that were validated in vitro, including alternatively spliced gene fusion isoforms and chimeric transcripts that include intergenic regions. The toolkit is freely available to download from http://bitbucket.org/kokonech/infusion. PMID:27907167

  6. Identification of MicroRNAs and transcript targets in Camelina sativa by deep sequencing and computational methods

    DOE PAGES

    Poudel, Saroj; Aryal, Niranjan; Lu, Chaofu; ...

    2015-03-31

    Camelina sativa is an annual oilseed crop that is under intensive development for renewable resources of biofuels and industrial oils. MicroRNAs, or miRNAs, are endogenously encoded small RNAs that play key roles in diverse plant biological processes. Here, we conducted deep sequencing on small RNA libraries prepared from camelina leaves, flower buds and two stages of developing seeds corresponding to initial and peak storage products accumulation. Computational analyses identified 207 known miRNAs belonging to 63 families, as well as 5 novel miRNAs. These miRNAs, especially members of the miRNA families, varied greatly in different tissues and developmental stages. The predicted miRNA target genes are involved in a broad range of physiological functions including lipid metabolism. This report is the first step toward elucidating roles of miRNAs in C. sativa and will provide additional tools to improve this oilseed crop for biofuels and biomaterials.

  7. Molecular Epidemiology of Plasmodium falciparum kelch13 Mutations in Senegal Determined by Using Targeted Amplicon Deep Sequencing.

    PubMed

    Talundzic, Eldin; Ndiaye, Yaye D; Deme, Awa B; Olsen, Christian; Patel, Dhruviben S; Biliya, Shweta; Daniels, Rachel; Vannberg, Fredrik O; Volkman, Sarah K; Udhayakumar, Venkatachalam; Ndiaye, Daouda

    2017-03-01

    The emergence of Plasmodium falciparum resistance to artemisinin in Southeast Asia threatens malaria control and elimination activities worldwide. Multiple polymorphisms in the P. falciparum kelch gene found in chromosome 13 (Pfk13) have been associated with artemisinin resistance. Surveillance of potential drug resistance loci within a population that may emerge under increasing drug pressure is an important public health activity. In this context, P. falciparum infections from an observational surveillance study in Senegal were genotyped using targeted amplicon deep sequencing (TADS) for Pfk13 polymorphisms. The results were compared to previously reported Pfk13 polymorphisms from around the world. A total of 22 Pfk13 propeller domain polymorphisms were identified in this study, of which 12 have previously not been reported. Interestingly, of the 10 polymorphisms identified in the present study that were also previously reported, all had a different amino acid substitution at these codon positions. Most of the polymorphisms were present at low frequencies and were confined to single isolates, suggesting they are likely transient polymorphisms that are part of naturally evolving parasite populations. The results of this study underscore the need to identify potential drug resistance loci existing within a population, which may emerge under increasing drug pressure.

  8. Identification of microRNAs by small RNA deep sequencing for synthetic microRNA mimics to control Spodoptera exigua.

    PubMed

    Zhang, Yu Liang; Huang, Qi Xing; Yin, Guo Hua; Lee, Samantha; Jia, Rui Zong; Liu, Zhi Xin; Yu, Nai Tong; Pennerman, Kayla K; Chen, Xin; Guo, An Ping

    2015-02-25

    Beet armyworm, Spodoptera exigua, is a major pest of cotton around the world. With the increase of resistance to Bacillus thuringiensis (Bt) toxin in transgenic cotton plants, there is a need to develop an alternative control approach that can be used in combination with Bt transgenic crops as part of resistance management strategies. MicroRNAs (miRNAs), a non-coding small RNA family (18-25 nt), play crucial roles in various biological processes and over-expression of miRNAs has been shown to interfere with the normal development of insects. In this study, we identified 127 conserved miRNAs in S. exigua by using small RNA deep sequencing technology. From this, we tested the effects of 11 miRNAs on larval development. We found three miRNAs, Sex-miR-10-1a, Sex-miR-4924, and Sex-miR-9, to be differentially expressed during larval stages of S. exigua. Oral feeding experiments using synthetic miRNA mimics of Sex-miR-10-1a, Sex-miR-4924, and Sex-miR-9 resulted in suppressed growth of S. exigua and mortality. Over-expression of Sex-miR-4924 caused a significant reduction in the expression level of chitinase 1 and caused abortive molting in the insects. Therefore, we demonstrated a novel approach of using miRNA mimics to control S. exigua development.

  9. Chronic toxicological effects of β-diketone antibiotics on Zebrafish (Danio rerio) using transcriptome profiling of deep sequencing.

    PubMed

    Wang, Huili; Yin, Xiaohan; Li, Fanghui; Dahlgren, Randy A; Zhang, Yuna; Zhang, Hongqin; Wang, Xuedong

    2016-11-01

    Transcriptome analysis is important for interpreting the functional elements of the genome and revealing the molecular constituents of cells and tissues. Herein, differentially transcribed genes were identified by deep sequencing after zebrafish (Danio rerio) were exposed to β-diketone antibiotics (DKAs); 23,129 and 23,550 mapped genes were detected in the control and treatment groups, respectively, and a total of 3238 genes were differentially expressed between the two groups. Of these genes, 328 genes (213 up- and 115 down-regulated) had significant differential expression (p < 0.05) and an expression ratio (control/treatment) of >2 or <0.5. Additionally, we performed Gene Ontology (GO) category and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses, and found 266 genes in the treatment group with annotation terms linked to the GO category. A total of 77 differentially expressed transcriptional genes were associated with 132 predicted KEGG metabolic pathways. Serious liver tissue damage was reflected and consistent with the differences in genetic classification and function from the transcriptome analysis. These results enhance our understanding of zebrafish developmental processes under exposure to DKA stress. © 2015 Wiley Periodicals, Inc. Environ Toxicol 31: 1357-1371, 2016.
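
The significance criterion stated in this abstract (p < 0.05 together with a control/treatment expression ratio above 2 or below 0.5) can be written as a small filter. This is only a sketch of that stated rule: the record type, field names, and example values are hypothetical, and the authors' actual pipeline, normalization, and statistical test are not described in the abstract.

```python
from dataclasses import dataclass

@dataclass
class GeneResult:
    gene_id: str           # hypothetical identifier
    control_expr: float    # normalized expression in the control group
    treatment_expr: float  # normalized expression in the treatment group
    p_value: float

def is_significant(gene, p_cutoff=0.05, fold=2.0):
    """Apply the stated rule: p < cutoff and control/treatment ratio > fold or < 1/fold."""
    if gene.treatment_expr <= 0:
        return False  # ratio undefined; how such genes are handled is an assumption
    ratio = gene.control_expr / gene.treatment_expr
    return gene.p_value < p_cutoff and (ratio > fold or ratio < 1.0 / fold)

# Illustrative values only; not data from the study.
genes = [
    GeneResult("geneA", 10.0, 40.0, 0.01),  # ratio 0.25, passes
    GeneResult("geneB", 30.0, 10.0, 0.20),  # ratio 3.0, fails on p-value
    GeneResult("geneC", 12.0, 10.0, 0.01),  # ratio 1.2, fails on ratio
]
significant_ids = [g.gene_id for g in genes if is_significant(g)]
```

Under this rule a ratio below 0.5 means higher expression in the treatment group, and a ratio above 2 means lower expression there.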

  10. Investigating the molecular genetic basis of heterosis for internode expansion in maize by microRNA transcriptomic deep sequencing.

    PubMed

    Zhao, Peng; Ding, Dong; Zhang, Fangfang; Zhao, Xiaofeng; Xue, Yadong; Li, Weihua; Fu, Zhiyuan; Li, Haochuan; Tang, Jihua

    2015-05-01

    Heterosis has been used widely in the breeding of maize and other crops and plays an important role in increasing yield, improving quality, and enhancing stress resistance, but its molecular mechanism is far from clear. To determine whether microRNA (miRNA)-dependent gene regulation is responsible for heterosis of elongating internodes below the ear and ear height in maize, a deep-sequencing strategy was applied to the elite hybrid Xundan20, which is currently cultivated widely in China, and its two parents. RNA was extracted from the eighth internode because it shows clear internode length heterosis. A total of 99 conserved maize miRNAs were detected in both the hybrid and parental lines. Most of these miRNAs were expressed nonadditively in the hybrid compared with its parental lines. These results indicated that miRNAs might participate in heterosis during internode expansion in maize and exert an influence on ear and plant height via the repression of their target genes. In total, eight novel miRNAs belonging to four miRNA families were predicted in the expanding internode. Global repression of miRNAs in the hybrid, which might result in enhanced gene expression, might be one reason why the hybrid shows longer internodes and taller seedlings compared with its parental lines.

  11. Deep sequencing of pyrethroid-resistant bed bugs reveals multiple mechanisms of resistance within a single population.

    PubMed

    Adelman, Zach N; Kilcullen, Kathleen A; Koganemaru, Reina; Anderson, Michelle A E; Anderson, Troy D; Miller, Dini M

    2011-01-01

    A frightening resurgence of bed bug infestations has occurred over the last 10 years in the U.S. and current chemical methods have been inadequate for controlling this pest due to widespread insecticide resistance. Little is known about the mechanisms of resistance present in U.S. bed bug populations, making it extremely difficult to develop intelligent strategies for their control. We have identified bed bugs collected in Richmond, VA which exhibit both kdr-type (L925I) and metabolic resistance to pyrethroid insecticides. Using LD(50) bioassays, we determined that resistance ratios for Richmond strain bed bugs were ∼5200-fold to the insecticide deltamethrin. To identify metabolic genes potentially involved in the detoxification of pyrethroids, we performed deep-sequencing of the adult bed bug transcriptome, obtaining more than 2.5 million reads on the 454 titanium platform. Following assembly, analysis of newly identified gene transcripts in both Harlan (susceptible) and Richmond (resistant) bed bugs revealed several candidate cytochrome P450 and carboxylesterase genes which were significantly over-expressed in the resistant strain, consistent with the idea of increased metabolic resistance. These data will accelerate efforts to understand the biochemical basis for insecticide resistance in bed bugs, and provide molecular markers to assist in the surveillance of metabolic resistance.

  12. Deep Sequencing of Pyrethroid-Resistant Bed Bugs Reveals Multiple Mechanisms of Resistance within a Single Population

    PubMed Central

    Adelman, Zach N.; Kilcullen, Kathleen A.; Koganemaru, Reina; Anderson, Michelle A. E.; Anderson, Troy D.; Miller, Dini M.

    2011-01-01

    A frightening resurgence of bed bug infestations has occurred over the last 10 years in the U.S. and current chemical methods have been inadequate for controlling this pest due to widespread insecticide resistance. Little is known about the mechanisms of resistance present in U.S. bed bug populations, making it extremely difficult to develop intelligent strategies for their control. We have identified bed bugs collected in Richmond, VA which exhibit both kdr-type (L925I) and metabolic resistance to pyrethroid insecticides. Using LD50 bioassays, we determined that resistance ratios for Richmond strain bed bugs were ∼5200-fold to the insecticide deltamethrin. To identify metabolic genes potentially involved in the detoxification of pyrethroids, we performed deep-sequencing of the adult bed bug transcriptome, obtaining more than 2.5 million reads on the 454 titanium platform. Following assembly, analysis of newly identified gene transcripts in both Harlan (susceptible) and Richmond (resistant) bed bugs revealed several candidate cytochrome P450 and carboxylesterase genes which were significantly over-expressed in the resistant strain, consistent with the idea of increased metabolic resistance. These data will accelerate efforts to understand the biochemical basis for insecticide resistance in bed bugs, and provide molecular markers to assist in the surveillance of metabolic resistance. PMID:22039447

  13. Proteome sequencing goes deep

    PubMed Central

    Richards, Alicia L.; Merrill, Anna E.; Coon, Joshua J.

    2014-01-01

    Advances in mass spectrometry have transformed the scope and impact of protein characterization efforts. Identifying hundreds of proteins from rather simple biological matrices, such as yeast, was a daunting task just a few decades ago. Now, expression of more than half of the estimated ~20,000 human protein coding genes can be confirmed in record time and from minute sample quantities. Access to proteomic information at such unprecedented depths has been fueled by strides in every stage of the shotgun proteomics workflow – from sample processing to data analysis – and promises to revolutionize our understanding of the causes and consequences of proteome variation. PMID:25461719

  14. Identification of conserved and novel microRNAs in the Pacific oyster Crassostrea gigas by deep sequencing.

    PubMed

    Xu, Fei; Wang, Xiaotong; Feng, Yue; Huang, Wen; Wang, Wei; Li, Li; Fang, Xiaodong; Que, Huayong; Zhang, Guofan

    2014-01-01

    MicroRNAs (miRNAs) play important roles in regulatory processes in various organisms. To date many studies have been performed in the investigation of miRNAs of numerous bilaterians, but limited numbers of miRNAs have been identified in the few species belonging to the clade Lophotrochozoa. In the current study, deep sequencing was conducted to identify the miRNAs of Crassostrea gigas (Lophotrochozoa) at a genomic scale, using 21 libraries that included different developmental stages and adult organs. A total of 100 hairpin precursor loci were predicted to encode miRNAs. Of these, 19 precursors (pre-miRNA) were novel in the oyster. As many as 53 (53%) miRNAs were distributed in clusters and 49 (49%) precursors were intragenic, which suggests two important biogenetic sources of miRNAs. Different developmental stages were characterized with specific miRNA expression patterns that highlighted regulatory variation along a temporal axis. Conserved miRNAs were expressed universally throughout different stages and organs, whereas novel miRNAs tended to be more specific and may be related to the determination of the novel body plan. Furthermore, we developed an index named the miRNA profile age index (miRPAI) to integrate the evolutionary age and expression levels of miRNAs during a particular developmental stage. We found that the swimming stages were characterized by the youngest miRPAIs. Indeed, the large-scale expression of novel miRNAs indicated the importance of these stages during development, particularly from organogenetic and evolutionary perspectives. Some potentially important miRNAs were identified for further study through significant changes between expression patterns in different developmental events, such as metamorphosis. This study broadened the knowledge of miRNAs in animals and indicated the presence of sophisticated miRNA regulatory networks related to the biological processes in lophotrochozoans.

  15. Deep Sequencing of Suppression Subtractive Hybridisation Drought and Recovery Libraries of the Non-model Crop Trifolium repens L.

    PubMed

    Bisaga, Maciej; Lowe, Matthew; Hegarty, Matthew; Abberton, Michael; Ravagnani, Adriana

    2017-01-01

    White clover is a short-lived perennial whose persistence is greatly affected by abiotic stresses, particularly drought. The aim of this work was to characterize its molecular response to water deficit and recovery following re-hydration to identify targets for the breeding of tolerant varieties. We created a white clover reference transcriptome of 16,193 contigs by deep sequencing (mean base coverage 387x) four Suppression Subtractive Hybridization (SSH) libraries (a forward and a reverse library for each treatment) constructed from young leaf tissue of white clover at the onset of the response to drought and recovery. Reads from individual libraries were then mapped to the reference transcriptome and processed comparing expression level data. The pipeline generated four robust sets of transcripts induced and repressed in the leaves of plants subjected to water deficit stress (6,937 and 3,142, respectively) and following re-hydration (6,695 and 4,897, respectively). Semi-quantitative polymerase chain reaction was used to verify the expression pattern of 16 genes. The differentially expressed transcripts were functionally annotated and mapped to biological processes and pathways. In agreement with similar studies in other crops, the majority of transcripts up-regulated in response to drought belonged to metabolic processes, such as amino acid, carbohydrate, and lipid metabolism, while transcripts involved in photosynthesis, such as components of the photosystem and the biosynthesis of photosynthetic pigments, were up-regulated during recovery. The data also highlighted the role of raffinose family oligosaccharides (RFOs) and the possible delayed response of the flavonoid pathways in the initial response of white clover to water withdrawal. The work presented in this paper is to our knowledge the first large scale molecular analysis of the white clover response to drought stress and re-hydration. The data generated provide a valuable genomic resource for marker

  16. Ultra-Deep Sequencing Characterization of HCV Samples with Equivocal Typing Results Determined with a Commercial Assay

    PubMed Central

    Minosse, Claudia; Giombini, Emanuela; Bartolini, Barbara; Capobianchi, Maria R.; Garbuglia, Anna R.

    2016-01-01

    Hepatitis C virus (HCV) is classified into seven phylogenetically distinct genotypes, which are further subdivided into related subtypes. Accurate assignment of genotype/subtype is mandatory in the era of directly acting antivirals. Several molecular methods are available for HCV genotyping; however, a relevant number of samples with indeterminate, mixed, or unspecified subtype results, or even with misclassified genotypes, may occur. Using NS5B direct (DS) and ultra-deep pyrosequencing (UDPS), we have tested 43 samples, which resulted in genotype 1 unsubtyped (n = 17), mixed infection (n = 17), or indeterminate (n = 9) with the Abbott RealTime HCV Genotype II assay. Genotype 1 was confirmed in 14/17 samples (82%): eight resulted in subtype 1b, and five resulted in subtype 1a with both DS and UDPS, while one was classified as subtype 1e by DS and mixed infection (1e + 1a) by UDPS. Three of seventeen genotype 1 samples resulted in genotype 3h with both sequencing approaches. Only one mixed infection was confirmed by UDPS (4d + 1a), while in 88% of cases a single component of the mixture was detected (five genotype 1a, four genotype 1b, two genotype 3a, two genotype 4m, and two genotype 4d); 44% of indeterminate samples resulted in genotype 2c by both DS and UDPS, 22% resulted in genotype 3a; one indeterminate sample by Abbott resulted in genotype 4d, one resulted in genotype 6n, and one was classified as subtype 3a by DS, and resulted in mixed infection (3a + 3h) by UDPS. The concordance between DS and UDPS was 94%, 88%, and 89% for genotype 1, co-infection, and indeterminate results, respectively. UDPS should be considered very useful to resolve ambiguous HCV genotyping results. PMID:27739414

  17. Identification of Conserved and Novel MicroRNAs in the Pacific Oyster Crassostrea gigas by Deep Sequencing

    PubMed Central

    Xu, Fei; Wang, Xiaotong; Feng, Yue; Huang, Wen; Wang, Wei; Li, Li; Fang, Xiaodong; Que, Huayong; Zhang, Guofan

    2014-01-01

    MicroRNAs (miRNAs) play important roles in regulatory processes in various organisms. To date many studies have been performed in the investigation of miRNAs of numerous bilaterians, but limited numbers of miRNAs have been identified in the few species belonging to the clade Lophotrochozoa. In the current study, deep sequencing was conducted to identify the miRNAs of Crassostrea gigas (Lophotrochozoa) at a genomic scale, using 21 libraries that included different developmental stages and adult organs. A total of 100 hairpin precursor loci were predicted to encode miRNAs. Of these, 19 precursors (pre-miRNA) were novel in the oyster. As many as 53 (53%) miRNAs were distributed in clusters and 49 (49%) precursors were intragenic, which suggests two important biogenetic sources of miRNAs. Different developmental stages were characterized with specific miRNA expression patterns that highlighted regulatory variation along a temporal axis. Conserved miRNAs were expressed universally throughout different stages and organs, whereas novel miRNAs tended to be more specific and may be related to the determination of the novel body plan. Furthermore, we developed an index named the miRNA profile age index (miRPAI) to integrate the evolutionary age and expression levels of miRNAs during a particular developmental stage. We found that the swimming stages were characterized by the youngest miRPAIs. Indeed, the large-scale expression of novel miRNAs indicated the importance of these stages during development, particularly from organogenetic and evolutionary perspectives. Some potentially important miRNAs were identified for further study through significant changes between expression patterns in different developmental events, such as metamorphosis. This study broadened the knowledge of miRNAs in animals and indicated the presence of sophisticated miRNA regulatory networks related to the biological processes in lophotrochozoans. PMID:25137038

  18. Deep Sequencing of Suppression Subtractive Hybridisation Drought and Recovery Libraries of the Non-model Crop Trifolium repens L.

    PubMed Central

    Bisaga, Maciej; Lowe, Matthew; Hegarty, Matthew; Abberton, Michael; Ravagnani, Adriana

    2017-01-01

    White clover is a short-lived perennial whose persistence is greatly affected by abiotic stresses, particularly drought. The aim of this work was to characterize its molecular response to water deficit and recovery following re-hydration to identify targets for the breeding of tolerant varieties. We created a white clover reference transcriptome of 16,193 contigs by deep sequencing (mean base coverage 387x) four Suppression Subtractive Hybridization (SSH) libraries (a forward and a reverse library for each treatment) constructed from young leaf tissue of white clover at the onset of the response to drought and recovery. Reads from individual libraries were then mapped to the reference transcriptome and processed comparing expression level data. The pipeline generated four robust sets of transcripts induced and repressed in the leaves of plants subjected to water deficit stress (6,937 and 3,142, respectively) and following re-hydration (6,695 and 4,897, respectively). Semi-quantitative polymerase chain reaction was used to verify the expression pattern of 16 genes. The differentially expressed transcripts were functionally annotated and mapped to biological processes and pathways. In agreement with similar studies in other crops, the majority of transcripts up-regulated in response to drought belonged to metabolic processes, such as amino acid, carbohydrate, and lipid metabolism, while transcripts involved in photosynthesis, such as components of the photosystem and the biosynthesis of photosynthetic pigments, were up-regulated during recovery. The data also highlighted the role of raffinose family oligosaccharides (RFOs) and the possible delayed response of the flavonoid pathways in the initial response of white clover to water withdrawal. The work presented in this paper is to our knowledge the first large scale molecular analysis of the white clover response to drought stress and re-hydration. The data generated provide a valuable genomic resource for marker

  19. A Deep Sequencing Approach to Comparatively Analyze the Transcriptome of Lifecycle Stages of the Filarial Worm, Brugia malayi

    PubMed Central

    Choi, Young-Jun; Ghedin, Elodie; Berriman, Matthew; McQuillan, Jacqueline; Holroyd, Nancy; Mayhew, George F.; Christensen, Bruce M.; Michalski, Michelle L.

    2011-01-01

    Background Developing intervention strategies for the control of parasitic nematodes continues to be a significant challenge. Genomic and post-genomic approaches play an increasingly important role for providing fundamental molecular information about these parasites, thus enhancing basic as well as translational research. Here we report a comprehensive genome-wide survey of the developmental transcriptome of the human filarial parasite Brugia malayi. Methodology/Principal Findings Using deep sequencing, we profiled the transcriptome of eggs and embryos, immature (≤3 days of age) and mature microfilariae (MF), third- and fourth-stage larvae (L3 and L4), and adult male and female worms. Comparative analysis across these stages provided a detailed overview of the molecular repertoires that define and differentiate distinct lifecycle stages of the parasite. Genome-wide assessment of the overall transcriptional variability indicated that the cuticle collagen family and those implicated in molting exhibit noticeably dynamic stage-dependent patterns. Of particular interest was the identification of genes displaying sex-biased or germline-enriched profiles due to their potential involvement in reproductive processes. The study also revealed discrete transcriptional changes during larval development, namely those accompanying the maturation of MF and the L3 to L4 transition that are vital in establishing successful infection in mosquito vectors and vertebrate hosts, respectively. Conclusions/Significance Characterization of the transcriptional program of the parasite's lifecycle is an important step toward understanding the developmental processes required for the infectious cycle. We find that the transcriptional program has a number of stage-specific pathways activated during worm development. In addition to advancing our understanding of transcriptome dynamics, these data will aid in the study of genome structure and organization by facilitating the identification of

  20. Ultra-deep T cell receptor sequencing reveals the complexity and intratumour heterogeneity of T cell clones in renal cell carcinomas.

    PubMed

    Gerlinger, Marco; Quezada, Sergio A; Peggs, Karl S; Furness, Andrew J S; Fisher, Rosalie; Marafioti, Teresa; Shende, Vishvesh H; McGranahan, Nicholas; Rowan, Andrew J; Hazell, Steven; Hamm, David; Robins, Harlan S; Pickering, Lisa; Gore, Martin; Nicol, David L; Larkin, James; Swanton, Charles

    2013-12-01

    The recognition of cancer cells by T cells can impact upon prognosis and be exploited for immunotherapeutic approaches. This recognition depends on the specific interaction between antigens displayed on the surface of cancer cells and the T cell receptor (TCR), which is generated by somatic rearrangements of TCR α- and β-chains (TCRb). Our aim was to assess whether ultra-deep sequencing of the rearranged TCRb in DNA extracted from unfractionated clear cell renal cell carcinoma (ccRCC) samples can provide insights into the clonality and heterogeneity of intratumoural T cells in ccRCCs, a tumour type that can display extensive genetic intratumour heterogeneity (ITH). For this purpose, DNA was extracted from two to four tumour regions from each of four primary ccRCCs and was analysed by ultra-deep TCR sequencing. In parallel, tumour infiltration by CD4, CD8 and Foxp3 regulatory T cells was evaluated by immunohistochemistry and correlated with TCR-sequencing data. A polyclonal T cell repertoire with 367-16 289 (median 2394) unique TCRb sequences was identified per tumour region. The frequencies of the 100 most abundant T cell clones/tumour were poorly correlated between most regions (Pearson correlation coefficient, -0.218 to 0.465). 3-93% of these T cell clones were not detectable across all regions. Thus, the clonal composition of T cell populations can be heterogeneous across different regions of the same ccRCC. T cell ITH was higher in tumours pretreated with an mTOR inhibitor, which could suggest that therapy can influence adaptive tumour immunity. These data show that ultra-deep TCR-sequencing technology can be applied directly to DNA extracted from unfractionated tumour samples, allowing novel insights into the clonality of T cell populations in cancers. These were polyclonal and displayed ITH in ccRCC. TCRb sequencing may shed light on mechanisms of cancer immunity and the efficacy of immunotherapy approaches.
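
    The region-to-region comparison in this abstract rests on a Pearson correlation of clone frequencies. A minimal sketch of that calculation follows; the function names and any input frequencies are hypothetical, and treating a clone that is absent from a region as frequency 0 is an assumption, not a detail taken from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def region_correlation(freq_a, freq_b):
    """Correlate clone frequencies between two tumour regions.

    freq_a, freq_b: dicts mapping a clone identifier to its frequency.
    Assumption: a clone missing from one region has frequency 0 there.
    """
    clones = sorted(set(freq_a) | set(freq_b))
    return pearson(
        [freq_a.get(c, 0.0) for c in clones],
        [freq_b.get(c, 0.0) for c in clones],
    )
```

    A coefficient near 1 would indicate that the same clones dominate both regions; the reported range of -0.218 to 0.465 indicates that they mostly do not.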

  1. Complete Genome Sequence of the Hyperthermophilic Archaeon Pyrococcus sp. Strain ST04, Isolated from a Deep-Sea Hydrothermal Sulfide Chimney on the Juan de Fuca Ridge

    PubMed Central

    Jung, Jong-Hyun; Lee, Ju-Hoon; Holden, James F.; Seo, Dong-Ho; Shin, Hakdong; Kim, Hae-Yeong; Kim, Wooki; Ryu, Sangryeol

    2012-01-01

    Pyrococcus sp. strain ST04 is a hyperthermophilic, anaerobic, and heterotrophic archaeon isolated from a deep-sea hydrothermal sulfide chimney on the Endeavour Segment of the Juan de Fuca Ridge in the northeastern Pacific Ocean. To further understand the distinct characteristics of this archaeon at the genome level (polysaccharide utilization at high temperature and ATP generation by a Na+ gradient), the genome of strain ST04 was completely sequenced and analyzed. Here, we present the complete genome sequence analysis results of Pyrococcus sp. ST04 and report the major findings from the genome annotation, with a focus on its saccharolytic and metabolite production potential. PMID:22843576

  2. Complete genome sequence of the hyperthermophilic archaeon Pyrococcus sp. strain ST04, isolated from a deep-sea hydrothermal sulfide chimney on the Juan de Fuca Ridge.

    PubMed

    Jung, Jong-Hyun; Lee, Ju-Hoon; Holden, James F; Seo, Dong-Ho; Shin, Hakdong; Kim, Hae-Yeong; Kim, Wooki; Ryu, Sangryeol; Park, Cheon-Seok

    2012-08-01

    Pyrococcus sp. strain ST04 is a hyperthermophilic, anaerobic, and heterotrophic archaeon isolated from a deep-sea hydrothermal sulfide chimney on the Endeavour Segment of the Juan de Fuca Ridge in the northeastern Pacific Ocean. To further understand the distinct characteristics of this archaeon at the genome level (polysaccharide utilization at high temperature and ATP generation by a Na(+) gradient), the genome of strain ST04 was completely sequenced and analyzed. Here, we present the complete genome sequence analysis results of Pyrococcus sp. ST04 and report the major findings from the genome annotation, with a focus on its saccharolytic and metabolite production potential.

  3. Coval: Improving Alignment Quality and Variant Calling Accuracy for Next-Generation Sequencing Data

    PubMed Central

    Kosugi, Shunichi; Natsume, Satoshi; Yoshida, Kentaro; MacLean, Daniel; Cano, Liliana; Kamoun, Sophien; Terauchi, Ryohei

    2013-01-01

    Accurate identification of DNA polymorphisms using next-generation sequencing technology is challenging because of a high rate of sequencing error and incorrect mapping of reads to reference genomes. Currently available short read aligners and DNA variant callers suffer from these problems. We developed the Coval software to improve the quality of short read alignments. Coval is designed to minimize the incidence of spurious alignment of short reads, by filtering mismatched reads that remained in alignments after local realignment and error correction of mismatched reads. The error correction is executed based on the base quality and allele frequency at the non-reference positions for an individual or pooled sample. We demonstrated the utility of Coval by applying it to simulated genomes and experimentally obtained short-read data of rice, nematode, and mouse. Moreover, we found an unexpectedly large number of incorrectly mapped reads in ‘targeted’ alignments, where the whole genome sequencing reads had been aligned to a local genomic segment, and showed that Coval effectively eliminated such spurious alignments. We conclude that Coval significantly improves the quality of short-read sequence alignments, thereby increasing the calling accuracy of currently available tools for SNP and indel identification. Coval is available at http://sourceforge.net/projects/coval105/. PMID:24116042
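
    The general idea of discarding alignments with too many mismatches can be illustrated as below. This is not Coval's implementation: it omits local realignment and quality-based error correction, and the function names, the ungapped alignment model, and the `max_mismatches` threshold are all assumptions made for the example.

```python
def count_mismatches(read, ref_segment):
    """Count positions where an ungapped read differs from the reference.

    Assumption: 'N' bases in the read are treated as uninformative.
    """
    if len(read) != len(ref_segment):
        raise ValueError("read and reference segment must have equal length")
    return sum(
        1
        for r, g in zip(read.upper(), ref_segment.upper())
        if r != g and r != "N"
    )


def filter_alignments(alignments, reference, max_mismatches=2):
    """Keep (position, read) pairs with at most max_mismatches mismatches.

    alignments: iterable of (0-based start position, read sequence).
    Reads that run past the end of the reference are dropped.
    """
    kept = []
    for pos, read in alignments:
        segment = reference[pos:pos + len(read)]
        if len(segment) != len(read):
            continue
        if count_mismatches(read, segment) <= max_mismatches:
            kept.append((pos, read))
    return kept
```

    In practice such a threshold would be tuned to read length and expected divergence; the value 2 here is only a placeholder.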

  4. De novo assembly of the common bean transcriptome using short reads for the discovery of drought-responsive genes.

    PubMed

    Wu, Jing; Wang, Lanfen; Li, Long; Wang, Shumin

    2014-01-01

    The common bean (Phaseolus vulgaris L.) is one of the most important food legumes, far ahead of other legumes. The average grain yield of the common bean worldwide is much lower than its potential yields, primarily due to drought in the field. However, the gene network that mediates plant responses to drought stress remains largely unknown in this species. The major goals of our study are to identify a large scale of genes involved in drought stress using RNA-seq. First, we assembled 270 million high-quality trimmed reads into a non-redundant set of 62,828 unigenes, representing approximately 49 Mb of unique transcriptome sequences. Of these unigenes, 26,501 (42.2%) common bean unigenes had significant similarity with unigenes/predicted proteins from other legumes or sequenced plants. All unigenes were functionally annotated within the GO, COG and KEGG pathways. The strategy for de novo assembly of transcriptome data generated here will be useful in other legume plant transcriptome studies. Second, we identified 10,482 SSRs and 4,099 SNPs in transcripts. The large number of genetic markers provides a resource for gene discovery and development of functional molecular markers. Finally, we found differential expression genes (DEGs) between terminal drought and optimal irrigation treatments and between the two different genotypes Long 22-0579 (drought tolerant) and Naihua (drought sensitive). DEGs were confirmed by quantitative real-time PCR assays, which indicated that these genes are functionally associated with the drought-stress response. These resources will be helpful for basic and applied research for genome analysis and crop drought resistance improvement in the common bean.

  5. Identification and characterization of microRNAs by deep-sequencing in Hyalomma anatolicum anatolicum (Acari: Ixodidae) ticks.

    PubMed

    Luo, Jin; Liu, Guang-Yuan; Chen, Ze; Ren, Qiao-Yun; Yin, Hong; Luo, Jian-Xun; Wang, Hui

    2015-06-15

    Hyalomma anatolicum anatolicum (H.a. anatolicum) (Acari: Ixodidae) ticks are globally distributed ectoparasites with veterinary and medical importance. These ticks not only weaken animals by sucking their blood but also transmit different species of parasitic protozoans. Multiple factors influence these parasitic infections including miRNAs, which are non-coding, small regulatory RNA molecules essential for the complex life cycle of parasites. To identify and characterize miRNAs in H.a. anatolicum, we developed an integrative approach combining deep sequencing, bioinformatics and real-time PCR analysis. Here we report the use of this approach to identify miRNA expression, family distribution, and nucleotide characteristics, and discovered novel miRNAs in H.a. anatolicum. The result showed that miR-1-3p, miR-275-3p, and miR-92a were expressed abundantly. There was a strong bias on miRNA, family members, and nucleotide compositions at certain positions in H.a. anatolicum miRNA. Uracil was the dominant nucleotide, particularly at positions 1, 6, 16, and 18, which were located approximately at the beginning, middle, and end of conserved miRNAs. Analysis of the conserved miRNAs indicated that miRNAs in H.a. anatolicum were concentrated along three diverse phylogenetic branches of bilaterians, insects and coelomates. Two possible roles for the use of miRNA in H.a. anatolicum could be presumed based on its parasitic life cycle: to maintain a large category of miRNA families of different animals, and/or to preserve stringent conserved seed regions with active changes in other places of miRNAs mainly in the middle and the end regions. These might help the parasite to undergo its complex life style in different hosts and adapt more readily to the host changes. The present study represents the first large scale characterization of H.a. anatolicum miRNAs, which could further the understanding of the complex biology of this zoonotic parasite, as well as initiate miRNA studies

  6. Transposon Mutagenesis Paired with Deep Sequencing of Caulobacter crescentus under Uranium Stress Reveals Genes Essential for Detoxification and Stress Tolerance

    PubMed Central

    Yung, Mimi C.; Park, Dan M.; Overton, K. Wesley; Blow, Matthew J.; Hoover, Cindi A.; Smit, John; Murray, Sean R.; Ricci, Dante P.; Christen, Beat; Bowman, Grant R.

    2015-01-01

    ABSTRACT The ubiquitous aquatic bacterium Caulobacter crescentus is highly resistant to uranium (U) and facilitates U biomineralization and thus holds promise as an agent of U bioremediation. To gain an understanding of how C. crescentus tolerates U, we employed transposon (Tn) mutagenesis paired with deep sequencing (Tn-seq) in a global screen for genomic elements required for U resistance. Of the 3,879 annotated genes in the C. crescentus genome, 37 were found to be specifically associated with fitness under U stress, 15 of which were subsequently tested through mutational analysis. Systematic deletion analysis revealed that mutants lacking outer membrane transporters (rsaFa and rsaFb), a stress-responsive transcription factor (cztR), or a ppGpp synthetase/hydrolase (spoT) exhibited a significantly lower survival rate under U stress. RsaFa and RsaFb, which are homologues of TolC in Escherichia coli, have previously been shown to mediate S-layer export. Transcriptional analysis revealed upregulation of rsaFa and rsaFb by 4- and 10-fold, respectively, in the presence of U. We additionally show that rsaFa mutants accumulated higher levels of U than the wild type, with no significant increase in oxidative stress levels. Our results suggest a function for RsaFa and RsaFb in U efflux and/or maintenance of membrane integrity during U stress. In addition, we present data implicating CztR and SpoT in resistance to U stress. Together, our findings reveal novel gene targets that are key to understanding the molecular mechanisms of U resistance in C. crescentus. IMPORTANCE Caulobacter crescentus is an aerobic bacterium that is highly resistant to uranium (U) and has great potential to be used in U bioremediation, but its mechanisms of U resistance are poorly understood. We conducted a Tn-seq screen to identify genes specifically required for U resistance in C. crescentus. The genes that we identified have previously remained elusive using other omics approaches and thus

  7. Deep Sequencing Analysis of miRNA Expression in Breast Muscle of Fast-Growing and Slow-Growing Broilers

    PubMed Central

    Ouyang, Hongjia; He, Xiaomei; Li, Guihuan; Xu, Haiping; Jia, Xinzheng; Nie, Qinghua; Zhang, Xiquan

    2015-01-01

    Growth performance is an important economic trait in chicken. MicroRNAs (miRNAs) have been shown to play important roles in various biological processes, but their functions in chicken growth are not yet clear. To investigate the function of miRNAs in chicken growth, breast muscle tissues of the two-tail samples (highest and lowest body weight) from Recessive White Rock (WRR) and Xinghua Chickens (XH) were performed on high throughput small RNA deep sequencing. In this study, a total of 921 miRNAs were identified, including 733 known mature miRNAs and 188 novel miRNAs. There were 200, 279, 257 and 297 differentially expressed miRNAs in the comparisons of WRRh vs. WRRl, WRRh vs. XHh, WRRl vs. XHl, and XHh vs. XHl group, respectively. A total of 22 highly differentially expressed miRNAs (fold change > 2 or < 0.5; p-value < 0.05; q-value < 0.01), which also have abundant expression (read counts > 1000) were found in our comparisons. As far as two analyses (WRRh vs. WRRl, and XHh vs. XHl) are concerned, we found 80 common differentially expressed miRNAs, while 110 miRNAs were found in WRRh vs. XHh and WRRl vs. XHl. Furthermore, 26 common miRNAs were identified among all four comparisons. Four differentially expressed miRNAs (miR-223, miR-16, miR-205a and miR-222b-5p) were validated by quantitative real-time RT-PCR (qRT-PCR). Regulatory networks of interactions among miRNAs and their targets were constructed using integrative miRNA target-prediction and network-analysis. Growth hormone receptor (GHR) was confirmed as a target of miR-146b-3p by dual-luciferase assay and qPCR, indicating that miR-34c, miR-223, miR-146b-3p, miR-21 and miR-205a are key growth-related target genes in the network. These miRNAs are proposed as candidate miRNAs for future studies concerning miRNA-target function on regulation of chicken growth. PMID:26193261
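
    The selection criteria quoted in this abstract (fold change > 2 or < 0.5, p-value < 0.05, q-value < 0.01, read counts > 1000) can be written as a simple filter. The sketch below assumes a hypothetical record layout (a dict per miRNA with the keys shown); it is not the authors' pipeline.

```python
def is_highly_differential(fold_change, p_value, q_value, read_count):
    """Apply the thresholds stated in the abstract to one miRNA."""
    changed = fold_change > 2 or fold_change < 0.5
    significant = p_value < 0.05 and q_value < 0.01
    abundant = read_count > 1000
    return changed and significant and abundant


def select_mirnas(records):
    """Return the names of records that pass all thresholds.

    records: iterable of dicts with the hypothetical keys
    'name', 'fold_change', 'p_value', 'q_value' and 'read_count'.
    """
    return [
        rec["name"]
        for rec in records
        if is_highly_differential(
            rec["fold_change"], rec["p_value"], rec["q_value"], rec["read_count"]
        )
    ]
```

    Note that the fold-change test is two-sided: a miRNA passes whether it is more than doubled or more than halved between groups.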

  8. Next-Generation Sequencing and In Vitro Expression Study of ADAMTS13 Single Nucleotide Variants in Deep Vein Thrombosis

    PubMed Central

    Pagliari, Maria Teresa; Lotta, Luca A.; de Haan, Hugoline G.; Valsecchi, Carla; Casoli, Gloria; Pontiggia, Silvia; Martinelli, Ida; Passamonti, Serena M.; Rosendaal, Frits R.

    2016-01-01

    Background Deep vein thrombosis (DVT) genetic predisposition is partially known. Objectives This study aimed at assessing the functional impact of nine ADAMTS13 single nucleotide variants (SNVs) previously reported to be associated as a group with DVT in a burden test and the individual association of selected variants with DVT risk in two replication studies. Methods Wild-type and mutant recombinant ADAMTS13 were transiently expressed in HEK293 cells. Antigen and activity of recombinant ADAMTS13 were measured by ELISA and FRETS-VWF73 assays, respectively. The replication studies were performed in an Italian case-control study (Milan study; 298/298 patients/controls) using a next-generation sequencing approach and in a Dutch case-control study (MEGA study; 4306/4887 patients/controls) by TaqMan assays. Results In vitro results showed reduced ADAMTS13 activity for three SNVs (p.Val154Ile [15%; 95% confidence interval [CI] 14–16], p.Asp187His [19%; 95%[CI] 17–21], p.Arg421Cys [24%; 95%[CI] 22–26]) similar to reduced plasma ADAMTS13 levels of patients carriers for these SNVs. Therefore these three SNVs were interrogated for risk association. The first replication study identified 3 heterozygous carriers (2 cases, 1 control) of p.Arg421Cys (odds ratio [OR] 2, 95%[CI] 0.18–22.25). The second replication study identified 2 heterozygous carriers (1 case, 1 control) of p.Asp187His ([OR] 1.14, 95%[CI] 0.07–18.15) and 10 heterozygous carriers (4 cases, 6 controls) of p.Arg421Cys ([OR] 0.76, 95%[CI] 0.21–2.68). Conclusions Three SNVs (p.Val154Ile, p.Asp187His and p.Arg421Cys) showed reduced ex vivo and in vitro ADAMTS13 levels. However, the low frequency of these variants makes it difficult to confirm their association with DVT. PMID:27802307
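
    The odds ratios quoted in this abstract can be reproduced from the carrier counts and study sizes it gives. The sketch below uses the standard 2x2 odds ratio with a Woolf (log-based) 95% confidence interval; the abstract does not name the method used, so that choice is an assumption, although the resulting figures agree with the reported ones.

```python
import math


def odds_ratio_ci(case_carriers, n_cases, control_carriers, n_controls, z=1.96):
    """Odds ratio and Woolf confidence interval for carrier status.

    Returns (odds_ratio, lower, upper). All four cells of the 2x2 table
    must be non-zero, otherwise the estimate is undefined.
    """
    a = case_carriers
    b = n_cases - case_carriers
    c = control_carriers
    d = n_controls - control_carriers
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells of the 2x2 table must be positive")
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Milan study, p.Arg421Cys: 2 carriers among 298 cases, 1 among 298 controls.
milan = odds_ratio_ci(2, 298, 1, 298)
# MEGA study, p.Arg421Cys: 4 carriers among 4306 cases, 6 among 4887 controls.
mega = odds_ratio_ci(4, 4306, 6, 4887)
```

    With only a handful of carriers, the standard error of the log odds ratio is dominated by the 1/a and 1/c terms, which is why the intervals are so wide and the association cannot be confirmed.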

  9. Fungal ITS1 Deep-Sequencing Strategies to Reconstruct the Composition of a 26-Species Community and Evaluation of the Gut Mycobiota of Healthy Japanese Individuals.

    PubMed

    Motooka, Daisuke; Fujimoto, Kosuke; Tanaka, Reiko; Yaguchi, Takashi; Gotoh, Kazuyoshi; Maeda, Yuichi; Furuta, Yoki; Kurakawa, Takashi; Goto, Naohisa; Yasunaga, Teruo; Narazaki, Masashi; Kumanogoh, Atsushi; Horii, Toshihiro; Iida, Tetsuya; Takeda, Kiyoshi; Nakamura, Shota

    2017-01-01

    The study of mycobiota remains relatively unexplored due to the lack of sufficient available reference strains and databases compared to those of bacterial microbiome studies. Deep sequencing of Internal Transcribed Spacer (ITS) regions is the de facto standard for fungal diversity analysis. However, results are often biased because of the wide variety of sequence lengths in the ITS regions and the complexity of high-throughput sequencing (HTS) technologies. In this study, a curated ITS database, ntF-ITS1, was constructed. This database can be utilized for the taxonomic assignment of fungal community members. We evaluated the efficacy of strategies for mycobiome analysis by using this database and characterizing a mock fungal community consisting of 26 species representing 15 genera using ITS1 sequencing with three HTS platforms: Illumina MiSeq (MiSeq), Ion Torrent Personal Genome Machine (IonPGM), and Pacific Biosciences (PacBio). Our evaluation demonstrated that PacBio's circular consensus sequencing with greater than 8 full-passes most accurately reconstructed the composition of the mock community. Using this strategy for deep-sequencing analysis of the gut mycobiota in healthy Japanese individuals revealed two major mycobiota types: a single-species type composed of Candida albicans or Saccharomyces cerevisiae and a multi-species type. In this study, we proposed the best possible processing strategies for the three sequencing platforms, of which, the PacBio platform allowed for the most accurate estimation of the fungal community. The database and methodology described here provide critical tools for the emerging field of mycobiome studies.

  10. Fungal ITS1 Deep-Sequencing Strategies to Reconstruct the Composition of a 26-Species Community and Evaluation of the Gut Mycobiota of Healthy Japanese Individuals

    PubMed Central

    Motooka, Daisuke; Fujimoto, Kosuke; Tanaka, Reiko; Yaguchi, Takashi; Gotoh, Kazuyoshi; Maeda, Yuichi; Furuta, Yoki; Kurakawa, Takashi; Goto, Naohisa; Yasunaga, Teruo; Narazaki, Masashi; Kumanogoh, Atsushi; Horii, Toshihiro; Iida, Tetsuya; Takeda, Kiyoshi; Nakamura, Shota

    2017-01-01

    The study of mycobiota remains relatively unexplored due to the lack of sufficient available reference strains and databases compared to those of bacterial microbiome studies. Deep sequencing of Internal Transcribed Spacer (ITS) regions is the de facto standard for fungal diversity analysis. However, results are often biased because of the wide variety of sequence lengths in the ITS regions and the complexity of high-throughput sequencing (HTS) technologies. In this study, a curated ITS database, ntF-ITS1, was constructed. This database can be utilized for the taxonomic assignment of fungal community members. We evaluated the efficacy of strategies for mycobiome analysis by using this database and characterizing a mock fungal community consisting of 26 species representing 15 genera using ITS1 sequencing with three HTS platforms: Illumina MiSeq (MiSeq), Ion Torrent Personal Genome Machine (IonPGM), and Pacific Biosciences (PacBio). Our evaluation demonstrated that PacBio’s circular consensus sequencing with greater than 8 full-passes most accurately reconstructed the composition of the mock community. Using this strategy for deep-sequencing analysis of the gut mycobiota in healthy Japanese individuals revealed two major mycobiota types: a single-species type composed of Candida albicans or Saccharomyces cerevisiae and a multi-species type. In this study, we proposed the best possible processing strategies for the three sequencing platforms, of which, the PacBio platform allowed for the most accurate estimation of the fungal community. The database and methodology described here provide critical tools for the emerging field of mycobiome studies. PMID:28261190

  11. Discovery and characterization of artifactual mutations in deep coverage targeted capture sequencing data due to oxidative DNA damage during sample preparation.

    PubMed

    Costello, Maura; Pugh, Trevor J; Fennell, Timothy J; Stewart, Chip; Lichtenstein, Lee; Meldrim, James C; Fostel, Jennifer L; Friedrich, Dennis C; Perrin, Danielle; Dionne, Danielle; Kim, Sharon; Gabriel, Stacey B; Lander, Eric S; Fisher, Sheila; Getz, Gad

    2013-04-01

    As researchers begin probing deep coverage sequencing data for increasingly rare mutations and subclonal events, the fidelity of next generation sequencing (NGS) laboratory methods will become increasingly critical. Although error rates for sequencing and polymerase chain reaction (PCR) are well documented, the effects that DNA extraction and other library preparation steps could have on downstream sequence integrity have not been thoroughly evaluated. Here, we describe the discovery of novel C > A/G > T transversion artifacts found at low allelic fractions in targeted capture data. Characteristics such as sequencer read orientation and presence in both tumor and normal samples strongly indicated a non-biological mechanism. We identified the source as oxidation of DNA during acoustic shearing in samples containing reactive contaminants from the extraction process. We show generation of 8-oxoguanine (8-oxoG) lesions during DNA shearing, present analysis tools to detect oxidation in sequencing data and suggest methods to reduce DNA oxidation through the introduction of antioxidants. Further, informatics methods are presented to confidently filter these artifacts from sequencing data sets. Though only seen in a low percentage of reads in affected samples, such artifacts could have profoundly deleterious effects on the ability to confidently call rare mutations, and eliminating other possible sources of artifacts should become a priority for the research community.

  12. Phylogenetic and Genome-Wide Deep-Sequencing Analyses of Canine Parvovirus Reveal Co-Infection with Field Variants and Emergence of a Recent Recombinant Strain

    PubMed Central

    Pérez, Ruben; Calleros, Lucía; Marandino, Ana; Sarute, Nicolás; Iraola, Gregorio; Grecco, Sofia; Blanc, Hervé; Vignuzzi, Marco; Isakov, Ofer; Shomron, Noam; Carrau, Lucía; Hernández, Martín; Francia, Lourdes; Sosa, Katia; Tomás, Gonzalo; Panzera, Yanina

    2014-01-01

    Canine parvovirus (CPV), a fast-evolving single-stranded DNA virus, comprises three antigenic variants (2a, 2b, and 2c) with different frequencies and genetic variability among countries. The contribution of co-infection and recombination to the genetic variability of CPV is far from being fully elucidated. Here we took advantage of a natural CPV population, recently formed by the convergence of divergent CPV-2c and CPV-2a strains, to study co-infection and recombination. Complete sequences of the viral coding region of CPV-2a and CPV-2c strains from 40 samples were generated and analyzed using phylogenetic tools. Two samples showed co-infection and were further analyzed by deep sequencing. The sequence profile of one of the samples revealed the presence of CPV-2c and CPV-2a strains that differed at 29 nucleotides. The other sample included a minor CPV-2a strain (13.3% of the viral population) and a major recombinant strain (86.7%). The recombinant strain arose from inter-genotypic recombination between CPV-2c and CPV-2a strains within the VP1/VP2 gene boundary. Our findings highlight the importance of deep-sequencing analysis to provide a better understanding of CPV molecular diversity. PMID:25365348

  13. Phylogenetic and genome-wide deep-sequencing analyses of canine parvovirus reveal co-infection with field variants and emergence of a recent recombinant strain.

    PubMed

    Pérez, Ruben; Calleros, Lucía; Marandino, Ana; Sarute, Nicolás; Iraola, Gregorio; Grecco, Sofia; Blanc, Hervé; Vignuzzi, Marco; Isakov, Ofer; Shomron, Noam; Carrau, Lucía; Hernández, Martín; Francia, Lourdes; Sosa, Katia; Tomás, Gonzalo; Panzera, Yanina

    2014-01-01

    Canine parvovirus (CPV), a fast-evolving single-stranded DNA virus, comprises three antigenic variants (2a, 2b, and 2c) with different frequencies and genetic variability among countries. The contribution of co-infection and recombination to the genetic variability of CPV is far from being fully elucidated. Here we took advantage of a natural CPV population, recently formed by the convergence of divergent CPV-2c and CPV-2a strains, to study co-infection and recombination. Complete sequences of the viral coding region of CPV-2a and CPV-2c strains from 40 samples were generated and analyzed using phylogenetic tools. Two samples showed co-infection and were further analyzed by deep sequencing. The sequence profile of one of the samples revealed the presence of CPV-2c and CPV-2a strains that differed at 29 nucleotides. The other sample included a minor CPV-2a strain (13.3% of the viral population) and a major recombinant strain (86.7%). The recombinant strain arose from inter-genotypic recombination between CPV-2c and CPV-2a strains within the VP1/VP2 gene boundary. Our findings highlight the importance of deep-sequencing analysis to provide a better understanding of CPV molecular diversity.

  14. Complete genome sequence of the aerobic, heterotroph Marinithermus hydrothermalis type strain (T1T) from a deep-sea hydrothermal vent chimney

    SciTech Connect

    Copeland, A; Gu, Wei; Yasawong, Montri; Lapidus, Alla L.; Lucas, Susan; Deshpande, Shweta; Pagani, Ioanna; Tapia, Roxanne; Cheng, Jan-Fang; Goodwin, Lynne A.; Pitluck, Sam; Liolios, Konstantinos; Ivanova, N; Mavromatis, K; Mikhailova, Natalia; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam L; Pan, Chongle; Brambilla, Evelyne-Marie; Rohde, Manfred; Tindall, Brian; Sikorski, Johannes; Goker, Markus; Detter, J. Chris; Bristow, James; Eisen, Jonathan; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter; Woyke, Tanja

    2012-01-01

    Marinithermus hydrothermalis Sako et al. 2003 is the type species of the monotypic genus Marinithermus. M. hydrothermalis T1T was the first isolate within the phylum Thermus-Deinococcus to exhibit optimal growth under a salinity equivalent to that of sea water and to have an absolute requirement for NaCl for growth. M. hydrothermalis T1T is of interest because it may provide a new insight into the ecological significance of the aerobic, thermophilic decomposers in the circulation of organic compounds in deep-sea hydrothermal vent ecosystems. This is the first completed genome sequence of a member of the genus Marinithermus and the seventh sequence from the family Thermaceae. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 2,269,167 bp long genome with its 2,251 protein-coding and 59 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

  15. Correlation of the Virological Response to Short-Term Maraviroc Monotherapy with Standard and Deep-Sequencing-Based Genotypic Tropism Prediction Methods

    PubMed Central

    Gonzalez-Serna, A.; McGovern, R. A.; Harrigan, P. R.; Vidal, F.; Poon, A. F. Y.; Ferrando-Martinez, S.; Abad, M. A.; Genebat, M.; Leal, M.

    2012-01-01

    Genotypic tropism testing methods are emerging as the first step before prescription of the CCR5 antagonist maraviroc (MVC) to HIV-infected patients in Europe. Studies validating genotypic tests have included other active drugs that could have potentially convoluted the effects of MVC. The maraviroc clinical test (MCT) is an in vivo drug sensitivity test based on the virological response to a short-term exposure to MVC monotherapy. Thus, our aim was to compare the results of genotypic tropism testing methods with the short-term virological response to MVC monotherapy. A virological response in the MCT was defined as a ≥1-log10 decrease in HIV RNA or undetectability after 8 days of drug exposure. Seventy-three patients undergoing the MCT were included in this study. We used both standard genotypic methods (n = 73) and deep sequencing (n = 27) on MCT samples at baseline. For the standard methods, the most widely used genotypic algorithms for analyzing the V3 loop sequence, geno2pheno and PSSM, were used. For deep sequencing, the geno2pheno algorithm was used with a false-positive rate cutoff of 3.5. The discordance rates between the standard genotypic methods and the virological response were approximately 20% (including mostly patients without a virological response). Interestingly, these discordance rates were similar to that obtained from deep sequencing (18.5%). The discordance rates between the genotypic methods (tropism assays predictive of the use of the CCR5 coreceptor) and the MCT (in vivo MVC sensitivity assay) indicate that the algorithms used by genotypic methods are still not sufficiently optimized. PMID:22143533
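    The response definition and the discordance rate described above can be sketched as follows. This is an illustration only: the function names are hypothetical, the 50 copies/mL detection limit is an assumed value not stated in the abstract, and the example data are invented so that 5 of 27 calls disagree, which happens to reproduce an 18.5% rate.

```python
import math


def virological_response(baseline_copies, day8_copies, detection_limit=50.0):
    """Return True for a >=1-log10 drop in HIV RNA or an undetectable load.

    detection_limit (copies/mL) is an assumed value for illustration.
    """
    if day8_copies < detection_limit:
        return True
    drop = math.log10(baseline_copies) - math.log10(day8_copies)
    return drop >= 1.0


def discordance_rate(predicted_r5, responded):
    """Fraction of patients whose genotypic call disagrees with the MCT result."""
    if len(predicted_r5) != len(responded) or not responded:
        raise ValueError("inputs must be non-empty and of equal length")
    mismatches = sum(p != r for p, r in zip(predicted_r5, responded))
    return mismatches / len(responded)


# Invented example: 27 patients, all predicted R5, 5 of them non-responders.
predicted = [True] * 27
observed = [True] * 22 + [False] * 5
rate = discordance_rate(predicted, observed)
```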

  16. Deep-sequencing method for quantifying background abundances of symbiodinium types: exploring the rare symbiodinium biosphere in reef-building corals.

    PubMed

    Quigley, Kate M; Davies, Sarah W; Kenkel, Carly D; Willis, Bette L; Matz, Mikhail V; Bay, Line K

    2014-01-01

    The capacity of reef-building corals to associate with environmentally-appropriate types of endosymbionts from the dinoflagellate genus Symbiodinium contributes significantly to their success at local scales. Additionally, some corals are able to acclimatize to environmental perturbations by shuffling the relative proportions of different Symbiodinium types hosted. Understanding the dynamics of these symbioses requires a sensitive and quantitative method of Symbiodinium genotyping. Electrophoresis methods, still widely utilized for this purpose, are predominantly qualitative and cannot guarantee detection of a background type below 10% of the total Symbiodinium population. Here, the relative abundances of four Symbiodinium types (A13, C1, C3, and D1) in mixed samples of known composition were quantified using deep sequencing of the internal transcribed spacer of the ribosomal RNA gene (ITS-2) by means of Next Generation Sequencing (NGS) using Roche 454. In samples dominated by each of the four Symbiodinium types tested, background levels of the other three types were detected when present at 5%, 1%, and 0.1% levels, and their relative abundances were quantified with high (A13, C1, D1) to variable (C3) accuracy. The potential of this deep sequencing method for resolving fine-scale genetic diversity within a symbiont type was further demonstrated in a natural symbiosis using ITS-1, and uncovered reef-specific differences in the composition of Symbiodinium microadriaticum in two species of acroporid corals (Acropora digitifera and A. hyacinthus) from Palau. The ability of deep sequencing of the ITS locus (1 and 2) to detect and quantify low-abundant Symbiodinium types, as well as finer-scale diversity below the type level, will enable more robust quantification of local genetic diversity in Symbiodinium populations. This method will help to elucidate the role that background types have in maximizing coral fitness across diverse environments and in response to

  17. Arthropod Phylogenetics in Light of Three Novel Millipede (Myriapoda: Diplopoda) Mitochondrial Genomes with Comments on the Appropriateness of Mitochondrial Genome Sequence Data for Inferring Deep Level Relationships

    PubMed Central

    Brewer, Michael S.; Swafford, Lynn; Spruill, Chad L.; Bond, Jason E.

    2013-01-01

    Background: Arthropods are the most diverse group of eukaryotic organisms, but their phylogenetic relationships are poorly understood. Herein, we describe three mitochondrial genomes representing orders of millipedes for which complete genomes had not been characterized. Newly sequenced genomes are combined with existing data to characterize the protein coding regions of myriapods and to attempt to reconstruct the evolutionary relationships within the Myriapoda and Arthropoda. Results: The newly sequenced genomes are similar to previously characterized millipede sequences in terms of synteny and length. Unique translocations occurred within the newly sequenced taxa, including one half of the Appalachioria falcifera genome, which is inverted with respect to other millipede genomes. Across myriapods, amino acid conservation levels are highly dependent on the gene region. Additionally, individual loci varied in the level of amino acid conservation. Overall, most gene regions showed low levels of conservation at many sites. Attempts to reconstruct the evolutionary relationships suffered from questionable relationships and low support values. Analyses of phylogenetic informativeness show the lack of signal deep in the trees (i.e., genes evolve too quickly). As a result, the myriapod tree resembles previously published results but lacks convincing support, and, within the arthropod tree, well established groups were recovered as polyphyletic. Conclusions: The novel genome sequences described herein provide useful genomic information concerning millipede groups that had not been investigated. Taken together with existing sequences, the variety of compositions and evolution of myriapod mitochondrial genomes are shown to be more complex than previously thought. Unfortunately, the use of mitochondrial protein-coding regions in deep arthropod phylogenetics appears problematic, a result consistent with previously published studies. Lack of phylogenetic signal renders the

  18. Metagenome sequencing of the microbial community of two Brazilian anthropogenic Amazon dark earth sites, Brazil.

    PubMed

    Lemos, Leandro Nascimento; de Souza, Rosineide Cardoso; de Souza Cannavan, Fabiana; Patricio, André; Pylro, Victor Satler; Hanada, Rogério Eiji; Mui, Tsai Siu

    2016-12-01

    The Anthropogenic Amazon Dark Earth soil is considered one of the world's most fertile soils. These soils differ from conventional Amazon soils because of their higher organic content. Here we describe the metagenome sequencing of microbial communities from two sites of Anthropogenic Amazon Dark Earth soils in the Amazon Rainforest, Brazil. The raw sequence data are stored under Short Read Accession number: PRJNA344917.

  19. Cross-Species, Amplifiable Microsatellite Markers for Neoverrucid Barnacles from Deep-Sea Hydrothermal Vents Developed Using Next-Generation Sequencing

    PubMed Central

    Nakajima, Yuichi; Shinzato, Chuya; Khalturina, Mariia; Watanabe, Hiromi; Inagaki, Fumio; Satoh, Nori; Mitarai, Satoshi

    2014-01-01

    Barnacles of the genus Neoverruca are abundant near deep-sea hydrothermal vents of the northwestern Pacific Ocean, and are useful for understanding processes of population formation and maintenance of deep-sea vent faunas. Using next-generation sequencing, we isolated 12 polymorphic microsatellite loci from Neoverruca sp., collected in the Okinawa Trough. These microsatellite loci revealed 2–19 alleles per locus. The expected and observed heterozygosities ranged from 0.286 to 1.000 and 0.349 to 0.935, respectively. Cross-species amplification showed that 9 of the 12 loci were successfully amplified for Neoverruca brachylepadoformis in the Mariana Trough. A pairwise FST value calculated using nine loci showed significant genetic differentiation between the two species. Consequently, the microsatellite markers we developed will be useful for further population genetic studies to elucidate genetic diversity, differentiation, classification, and evolutionary processes in the genus Neoverruca. PMID:25196437
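    For readers unfamiliar with the heterozygosity statistics quoted above, the following sketch computes both for a single locus. It assumes diploid genotypes given as allele pairs and uses the simple estimator He = 1 - sum(p_i^2) without the small-sample correction; the function name and example genotypes are hypothetical and do not come from the study.

```python
from collections import Counter


def heterozygosities(genotypes):
    """Return (observed, expected) heterozygosity for one locus.

    genotypes: list of (allele_a, allele_b) tuples, one per diploid individual.
    Expected heterozygosity uses He = 1 - sum(p_i ** 2), uncorrected.
    """
    n = len(genotypes)
    if n == 0:
        raise ValueError("at least one genotype is required")
    observed = sum(a != b for a, b in genotypes) / n
    allele_counts = Counter(allele for pair in genotypes for allele in pair)
    total_alleles = 2 * n
    expected = 1.0 - sum((c / total_alleles) ** 2 for c in allele_counts.values())
    return observed, expected


# Invented microsatellite genotypes (allele sizes in bp) for four individuals.
example = [(152, 152), (152, 156), (156, 156), (152, 156)]
h_obs, h_exp = heterozygosities(example)
```

    With two equally frequent alleles and half the individuals heterozygous, both values come out at 0.5.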

  20. Cross-species, amplifiable microsatellite markers for neoverrucid barnacles from deep-sea hydrothermal vents developed using next-generation sequencing.

    PubMed

    Nakajima, Yuichi; Shinzato, Chuya; Khalturina, Mariia; Watanabe, Hiromi; Inagaki, Fumio; Satoh, Nori; Mitarai, Satoshi

    2014-08-18

    Barnacles of the genus Neoverruca are abundant near deep-sea hydrothermal vents of the northwestern Pacific Ocean, and are useful for understanding processes of population formation and maintenance of deep-sea vent faunas. Using next-generation sequencing, we isolated 12 polymorphic microsatellite loci from Neoverruca sp., collected in the Okinawa Trough. These microsatellite loci revealed 2-19 alleles per locus. The expected and observed heterozygosities ranged from 0.286 to 1.000 and 0.349 to 0.935, respectively. Cross-species amplification showed that 9 of the 12 loci were successfully amplified for Neoverruca brachylepadoformis in the Mariana Trough. A pairwise FST value calculated using nine loci showed significant genetic differentiation between the two species. Consequently, the microsatellite markers we developed will be useful for further population genetic studies to elucidate genetic diversity, differentiation, classification, and evolutionary processes in the genus Neoverruca.

  1. Permanent draft genome sequence of Bacillus flexus strain T6186-2, a multidrug-resistant bacterium isolated from a deep-subsurface oil reservoir.

    PubMed

    Zhang, Fan; Jiang, Xiawei; Chai, Lujun; She, Yuehui; Yu, Gaoming; Shu, Fuchang; Wang, Zhengliang; Su, Sanbao; Wenqiong, Wu; Tingsheng, Xiang; Zhang, Zhongzhi; Hou, Dujie; Zheng, Beiwen

    2014-12-01

    Previous studies suggest that antibiotic resistance genes have an ancient origin, which is not always linked to the use of antibiotics but can be enhanced by human activities. Bacillus flexus strain T6186-2 was isolated from the formation water sample of a deep-subsurface oil reservoir. Interestingly, antimicrobial susceptibility testing showed that this strain is susceptible to kanamycin but resistant to ampicillin, erythromycin, gentamicin, vancomycin, fosfomycin, fosmidomycin, tetracycline and teicoplanin. To expand our knowledge of the origins of antibiotic resistance genes (ARGs) in a relatively pristine environment, we sequenced the genome of B. flexus strain T6186-2 as a permanent draft. It provides evidence for the existence of a reservoir of ARGs in nature among microbial populations from deep-subsurface oil reservoirs.

  2. Identification of genetic risk variants for deep vein thrombosis by multiplexed next-generation sequencing of 186 hemostatic/pro-inflammatory genes

    PubMed Central

    2012-01-01

    Background: Next-generation DNA sequencing is opening new avenues for genetic association studies in common diseases that, like deep vein thrombosis (DVT), have a strong genetic predisposition still largely unexplained by currently identified risk variants. In order to develop sequencing and analytical pipelines for the application of next-generation sequencing to complex diseases, we conducted a pilot study sequencing the coding area of 186 hemostatic/proinflammatory genes in 10 Italian cases of idiopathic DVT and 12 healthy controls. Results: A molecular-barcoding strategy was used to multiplex DNA target capture and sequencing, while retaining individual sequence information. Genomic libraries with barcode sequence-tags were pooled (in pools of 8 or 16 samples) and enriched for target DNA sequences. Sequencing was performed on ABI SOLiD-4 platforms. We produced > 12 gigabases of raw sequence data to sequence at high coverage (average: 42X) the 700-kilobase target area in 22 individuals. A total of 1876 high-quality genetic variants were identified (1778 single nucleotide substitutions and 98 insertions/deletions). Annotation on databases of genetic variation and human disease mutations revealed several novel, potentially deleterious mutations. We tested 576 common variants in a case-control association analysis, carrying the top-5 associations over to replication in up to 719 DVT cases and 719 controls. We also conducted an analysis of the burden of nonsynonymous variants in coagulation factor and anticoagulant genes. We found an excess of rare missense mutations in anticoagulant genes in DVT cases compared to controls and an association for a missense polymorphism of FGA (rs6050; p = 1.9 × 10⁻⁵, OR 1.45; 95% CI, 1.22-1.72; after replication in > 1400 individuals). Conclusions: We implemented a barcode-based strategy to efficiently multiplex sequencing of hundreds of candidate genes in several individuals. In the relatively small dataset of our pilot study we were

  3. Genome Sequence of Aeribacillus pallidus Strain GS3372, an Endospore-Forming Bacterium Isolated in a Deep Geothermal Reservoir

    PubMed Central

    Filippidou, Sevasti; Jaussi, Marion; Junier, Thomas; Wunderlin, Tina; Jeanneret, Nicole; Regenspurg, Simona; Li, Po-E; Lo, Chien-Chi; McMurry, Kim; Gleasner, Cheryl D.; Vuyisich, Momchilo; Chain, Patrick S.

    2015-01-01

    The genome of strain GS3372 is the first publicly available genome of Aeribacillus pallidus. This endospore-forming thermophilic strain was isolated from a deep geothermal reservoir. The availability of this genome can contribute to the clarification of the taxonomy of the closely related Anoxybacillus, Geobacillus, and Aeribacillus genera. PMID:26316637

  4. Ultra-deep T cell receptor sequencing reveals the complexity and intratumour heterogeneity of T cell clones in renal cell carcinomas

    PubMed Central

    Gerlinger, Marco; Quezada, Sergio A; Peggs, Karl S; Furness, Andrew JS; Fisher, Rosalie; Marafioti, Teresa; Shende, Vishvesh H; McGranahan, Nicholas; Rowan, Andrew J; Hazell, Steven; Hamm, David; Robins, Harlan S; Pickering, Lisa; Gore, Martin; Nicol, David L; Larkin, James; Swanton, Charles

    2013-01-01

    The recognition of cancer cells by T cells can impact upon prognosis and be exploited for immunotherapeutic approaches. This recognition depends on the specific interaction between antigens displayed on the surface of cancer cells and the T cell receptor (TCR), which is generated by somatic rearrangements of TCR α- and β-chains (TCRb). Our aim was to assess whether ultra-deep sequencing of the rearranged TCRb in DNA extracted from unfractionated clear cell renal cell carcinoma (ccRCC) samples can provide insights into the clonality and heterogeneity of intratumoural T cells in ccRCCs, a tumour type that can display extensive genetic intratumour heterogeneity (ITH). For this purpose, DNA was extracted from two to four tumour regions from each of four primary ccRCCs and was analysed by ultra-deep TCR sequencing. In parallel, tumour infiltration by CD4, CD8 and Foxp3 regulatory T cells was evaluated by immunohistochemistry and correlated with TCR-sequencing data. A polyclonal T cell repertoire with 367–16 289 (median 2394) unique TCRb sequences was identified per tumour region. The frequencies of the 100 most abundant T cell clones/tumour were poorly correlated between most regions (Pearson correlation coefficient, –0.218 to 0.465). 3–93% of these T cell clones were not detectable across all regions. Thus, the clonal composition of T cell populations can be heterogeneous across different regions of the same ccRCC. T cell ITH was higher in tumours pretreated with an mTOR inhibitor, which could suggest that therapy can influence adaptive tumour immunity. These data show that ultra-deep TCR-sequencing technology can be applied directly to DNA extracted from unfractionated tumour samples, allowing novel insights into the clonality of T cell populations in cancers. These were polyclonal and displayed ITH in ccRCC. TCRb sequencing may shed light on mechanisms of cancer immunity and the efficacy of immunotherapy approaches.
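    The between-region comparison of clone frequencies can be illustrated with a plain Pearson correlation, sketched below. The function names, the dictionary layout, and the choice to treat a clone that is undetected in a region as frequency 0 are assumptions made for this example, not details reported by the authors.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def region_correlation(freqs_a, freqs_b, top_clones):
    """Correlate clone frequencies between two tumour regions.

    freqs_a, freqs_b: dicts mapping clone ID -> frequency in each region.
    Clones missing from a region are treated as frequency 0 (assumption).
    """
    xs = [freqs_a.get(clone, 0.0) for clone in top_clones]
    ys = [freqs_b.get(clone, 0.0) for clone in top_clones]
    return pearson(xs, ys)


# Invented frequencies for three clones in two regions.
region_1 = {"c1": 0.10, "c2": 0.05, "c3": 0.01}
region_2 = {"c1": 0.02, "c2": 0.04}
r = region_correlation(region_1, region_2, ["c1", "c2", "c3"])
```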

  5. Implications of spatial and temporal development of the aftershock sequence for the Mw 8.3 June 9, 1994 Deep Bolivian Earthquake

    NASA Astrophysics Data System (ADS)

    Myers, Stephen C.; Wallace, Terry C.; Beck, Susan L.; Silver, Paul G.; Zandt, George; Vandecar, John; Minaya, Estela

    On June 9, 1994 the Mw 8.3 Bolivia earthquake (636 km depth) occurred in a region which had not experienced significant, deep seismicity for at least 30 years. The mainshock and aftershocks were recorded in Bolivia on the BANJO and SEDA broadband seismic arrays and on the San Calixto Network. We used the joint hypocenter determination method to determine the relative location of the aftershocks. We have identified no foreshocks and 89 aftershocks (m > 2.2) for the 20-day period following the mainshock. The frequency of aftershock occurrence decreased rapidly, with only one or two aftershocks per day occurring after day two. The temporal decay of aftershock activity is similar to shallow aftershock sequences, but the number of aftershocks is two orders of magnitude less. Additionally, an mb ∼6, apparently triggered earthquake occurred just 10 minutes after the mainshock about 330 km east-southeast of the mainshock at a depth of 671 km. The aftershock sequence occurred north and east of the mainshock and extends to a depth of 665 km. The aftershocks define a slab striking N68°W and dipping 45°NE. The strike, dip, and location of the aftershock zone are consistent with this seismicity being confined within the downward extension of the subducted Nazca plate. The location and orientation of the aftershock sequence indicate that the subducted Nazca plate bends between the NNW striking zone of deep seismicity in western Brazil and the N-S striking zone of seismicity in central Bolivia. A tear in the deep slab is not necessitated by the data. A subset of the aftershock hypocenters cluster along a subhorizontal plane near the depth of the mainshock, favoring a horizontal fault plane. The horizontal dimensions of the mainshock [Beck et al., this issue; Silver et al., 1995] and slab defined by the aftershocks are approximately equal, indicating that the mainshock ruptured through the slab.

  6. Deep sequencing of HPV16 genomes: A new high-throughput tool for exploring the carcinogenicity and natural history of HPV16 infection

    PubMed Central

    Cullen, Michael; Boland, Joseph F.; Schiffman, Mark; Zhang, Xijun; Wentzensen, Nicolas; Yang, Qi; Chen, Zigui; Yu, Kai; Mitchell, Jason; Roberson, David; Bass, Sara; Burdette, Laurie; Machado, Moara; Ravichandran, Sarangan; Luke, Brian; Machiela, Mitchell J.; Andersen, Mark; Osentoski, Matt; Laptewicz, Michael; Wacholder, Sholom; Feldman, Ashlie; Raine-Bennett, Tina; Lorey, Thomas; Castle, Philip E.; Yeager, Meredith; Burk, Robert D.; Mirabello, Lisa

    2015-01-01

    For unknown reasons, there is huge variability in risk conferred by different HPV types and, remarkably, strong differences even between closely related variant lineages within each type. HPV16 is a uniquely powerful carcinogenic type, causing approximately half of cervical cancer and most other HPV-related cancers. To permit the large-scale study of HPV genome variability and precancer/cancer, starting with HPV16 and cervical cancer, we developed a high-throughput next-generation sequencing (NGS) whole-genome method. We designed a custom HPV16 AmpliSeq™ panel that generated 47 overlapping amplicons covering 99% of the genome sequenced on the Ion Torrent Proton platform. After validating with Sanger, the current “gold standard” of sequencing, in 89 specimens with concordance of 99.9%, we used our NGS method and custom annotation pipeline to sequence 796 HPV16-positive exfoliated cervical cell specimens. The median completion rate per sample was 98.0%. Our method enabled us to discover novel SNPs, large contiguous deletions suggestive of viral integration (OR of 27.3, 95% CI 3.3–222, P=0.002), and the sensitive detection of variant lineage coinfections. This method represents an innovative high-throughput, ultra-deep coverage technique for HPV genomic sequencing, which, in turn, enables the investigation of the role of genetic variation in HPV epidemiology and carcinogenesis. PMID:26645052

  7. RNA editing events in mitochondrial genes by ultra-deep sequencing methods: a comparison of cytoplasmic male sterile, fertile and restored genotypes in cotton.

    PubMed

    Suzuki, Hideaki; Yu, Jiwen; Ness, Scott A; O'Connell, Mary A; Zhang, Jinfa

    2013-09-01

    Cytoplasmic male sterility (CMS) is a maternally inherited trait resulting in failure to produce functional pollen and is widely used in the production of hybrid seed. Improper RNA editing is implicated as the molecular basis for some CMS systems. However, the mechanism of CMS in cotton is unknown. This study compared RNA editing events in eight mitochondrial genes (atp1, 4, 6, 8, 9, and cox1, 2, 3) among three lines (maintainer B, CMS A, and restorer R). These events were quantified by ultra-deep sequencing of mitochondrial transcripts and sequencing of cloned versions of these genes as cDNAs. A comparison of genomic PCR and RT-PCR products detected 72 editing sites in coding sequences in the eight genes and four partial editing sites in the 3'-untranslated region of atp6. The most frequent alteration (61.4 %) resulted in changes of hydrophilic amino acids to hydrophobic amino acids and the most common alteration was proline (P) to leucine (L) (26.7 %). In atp6, RNA editing created a stop codon from a glutamine in the genomic sequence. Statistical analysis of the frequencies of RNA editing events detected differences between mtDNA genes, but no differences between cotton cytoplasms that could account for the CMS phenotype or restoration. This study represents the first work to use next-generation sequencing to identify RNA editing positions and efficiency, and possible association with CMS and restoration in plants.

  8. Deep Sequencing of ESTs from Nacreous and Prismatic Layer Producing Tissues and a Screen for Novel Shell Formation-Related Genes in the Pearl Oyster

    PubMed Central

    Kinoshita, Shigeharu; Wang, Ning; Inoue, Haruka; Maeyama, Kaoru; Okamoto, Kikuhiko; Nagai, Kiyohito; Kondo, Hidehiro; Hirono, Ikuo; Asakawa, Shuichi; Watabe, Shugo

    2011-01-01

    Background: Despite its economic importance, we have a limited understanding of the molecular mechanisms underlying shell formation in pearl oysters, wherein the calcium carbonate crystals, nacre and prism, are formed in a highly controlled manner. We constructed comprehensive expressed gene profiles in the shell-forming tissues of the pearl oyster Pinctada fucata and identified novel shell formation-related gene candidates. Principal Findings: We employed the GS FLX 454 system and constructed transcriptome data sets from pallial mantle and pearl sac, which form the nacreous layer, and from the mantle edge, which forms the prismatic layer in P. fucata. We sequenced 260477 reads and obtained 29682 unique sequences. We also screened novel nacreous and prismatic gene candidates by a combined analysis of sequence and expression data sets, and identified various genes encoding lectin, protease, protease inhibitors, lysine-rich matrix protein, and secreting calcium-binding proteins. We also examined the expression of known nacreous and prismatic genes in our EST library and identified novel isoforms with tissue-specific expressions. Conclusions: We constructed EST data sets from the nacre- and prism-producing tissues in P. fucata and found 29682 unique sequences containing novel gene candidates for nacreous and prismatic layer formation. This is the first report of deep sequencing of ESTs in the shell-forming tissues of P. fucata and our data provide a powerful tool for a comprehensive understanding of the molecular mechanisms of molluscan biomineralization. PMID:21731681

  9. Detection of Short-Range DNA Interactions in Mammalian Cells Using High-Resolution Circular Chromosome Conformation Capture Coupled to Deep Sequencing.

    PubMed

    Millau, Jean-François; Gaudreau, Luc

    2015-01-01

    DNA interactions shape the genome to physically and functionally connect regulatory elements to their target genes. Studying these interactions is crucial to understanding the molecular mechanisms that regulate gene expression. In this chapter, we present a protocol for high-resolution circular chromosome conformation capture coupled to deep sequencing. This methodology allows the investigation of short-range DNA interactions (<100 kbp) and yields high-resolution DNA interaction maps of loci. It is a powerful tool to explore how regulatory elements and genes are connected.

  10. Draft genome sequence of Thermococcus sp. EP1, a novel hyperthermophilic archaeon isolated from a deep-sea hydrothermal vent on the East Pacific Rise.

    PubMed

    Zhou, Meixian; Liu, Qing; Xie, Yunbiao; Dong, Binbin; Chen, Xiaoyao

    2016-04-01

    Thermococcus sp. strain EP1 is a novel anaerobic hyperthermophilic archaeon isolated from a deep-sea hydrothermal vent on the East Pacific Rise. It grows optimally at 80 °C and can produce industrial enzymes at high temperature. We report here the draft genome of EP1, which contains 1,819,157 bp with a G+C content of 39.3%. The sequence will provide the genetic basis for better understanding of adaptation to hydrothermal environment and the development of novel thermostable enzymes for industrial application.
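    A G+C content figure like the 39.3% quoted above is the fraction of unambiguous bases that are G or C. A minimal sketch follows; the function name is hypothetical, ambiguous symbols such as N are simply ignored, and the test string is a toy sequence, not the EP1 genome.

```python
def gc_content(sequence):
    """Return the G+C fraction of a DNA sequence, ignoring non-ACGT symbols."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return gc / (gc + at)


# Toy example: 4 of the 10 unambiguous bases are G or C.
toy_fraction = gc_content("ATGCATGCATNN")
```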

  11. Characterization of rainbow trout gonad, brain and gill deep cDNA repertoires using a Roche 454-Titanium sequencing approach.

    PubMed

    Le Cam, Aurélie; Bobe, Julien; Bouchez, Olivier; Cabau, Cédric; Kah, Olivier; Klopp, Christophe; Lareyre, Jean-Jacques; Le Guen, Isabelle; Lluch, Jérôme; Montfort, Jérôme; Moreews, Francois; Nicol, Barbara; Prunet, Patrick; Rescan, Pierre-Yves; Servili, Arianna; Guiguen, Yann

    2012-05-25

    Rainbow trout, Oncorhynchus mykiss, is an important aquaculture species worldwide and, in addition to being of commercial interest, it is also a research model organism of considerable scientific importance. Because of the lack of a whole genome sequence in that species, transcriptomic analyses of this species have often been hindered. Using next-generation sequencing (NGS) technologies, we sought to fill these informational gaps. Here, using Roche 454-Titanium technology, we provide new tissue-specific cDNA repertoires from several rainbow trout tissues. Non-normalized cDNA libraries were constructed from testis, ovary, brain and gill rainbow trout tissue samples, and these different libraries were sequenced in 10 separate half-runs of 454-Titanium. Overall, we produced a total of 3 million quality sequences with an average size of 328 bp, representing more than 1 Gb of expressed sequence information. These sequences have been combined with all publicly available rainbow trout sequences, resulting in a total of 242,187 clusters of putative transcript groups and 22,373 singletons. To identify the predominantly expressed genes in different tissues of interest, we developed a Digital Differential Display (DDD) approach. This approach allowed us to characterize the genes that are predominantly expressed within each tissue of interest. Of these genes, some were already known to be tissue-specific, thereby validating our approach. Many others, however, were novel candidates, demonstrating the usefulness of our strategy and of such tissue-specific resources. This new sequence information, acquired using NGS 454-Titanium technology, deeply enriched our current knowledge of the expressed genes in rainbow trout through the identification of an increased number of tissue-specific sequences. This identification allowed a precise cDNA tissue repertoire to be characterized in several important rainbow trout tissues. The rainbow trout contig browser can be accessed at the following

  12. Scaffolding and completing genome assemblies in real-time with nanopore sequencing

    PubMed Central

    Cao, Minh Duc; Nguyen, Son Hoang; Ganesamoorthy, Devika; Elliott, Alysha G.; Cooper, Matthew A.; Coin, Lachlan J. M.

    2017-01-01

    Third generation sequencing technologies provide the opportunity to improve genome assemblies by generating long reads spanning most repeat sequences. However, current analysis methods require substantial amounts of sequence data and computational resources to overcome the high error rates. Furthermore, they can only perform analysis after sequencing has completed, resulting in either over-sequencing, or in a low quality assembly due to under-sequencing. Here we present npScarf, which can scaffold and complete short read assemblies while the long read sequencing run is in progress. It reports assembly metrics in real-time so the sequencing run can be terminated once an assembly of sufficient quality is obtained. In assembling four bacterial and one eukaryotic genomes, we show that npScarf can construct more complete and accurate assemblies while requiring less sequencing data and computational resources than existing methods. Our approach offers a time- and resource-effective strategy for completing short read assemblies. PMID:28218240

  13. Full-length novel MHC class I allele discovery by next-generation sequencing: two platforms are better than one.

    PubMed

    Dudley, Dawn M; Karl, Julie A; Creager, Hannah M; Bohn, Patrick S; Wiseman, Roger W; O'Connor, David H

    2014-01-01

    Deep sequencing has revolutionized major histocompatibility complex (MHC) class I analysis of nonhuman primates by enabling high-throughput, economical, and comprehensive genotyping. Full-length MHC class I cDNA sequences, which are required to generate reagents such as MHC-peptide tetramers, cannot be directly obtained by short read deep sequencing. We combined data from two next-generation sequencing platforms to discover novel full-length MHC class I mRNA/cDNA transcripts in Chinese rhesus macaques. We first genotyped macaques by Roche/454 pyrosequencing using a 530-bp amplicon spanning the densely polymorphic exons 2 through 4 of the MHC class I loci that encode the peptide-binding region. We then mapped short paired-end 250 bp Illumina sequence reads spanning the full-length transcript to each 530-bp amplicon at high stringency and used paired-end information to reconstruct full-length allele sequences. We characterized 65 full-length sequences from six Chinese rhesus macaques. Overall, approximately 70 % of the alleles distinguished in these six animals contained new sequence information, including 29 novel transcripts. The flexibility of this approach should make full-length MHC class I allele genotyping accessible for any nonhuman primate population of interest. We are currently optimizing this method for full-length characterization of other highly polymorphic, duplicated loci such as the MHC class II DRB and killer immunoglobulin-like receptors. We anticipate that this method will facilitate rapid expansion and near completion of sequence libraries of polymorphic loci, such as MHC class I, within a few years.

  14. Sensitive Next-Generation Sequencing Method Reveals Deep Genetic Diversity of HIV-1 in the Democratic Republic of the Congo

    PubMed Central

    Wilkinson, Eduan; Vallari, Ana; McArthur, Carole; Sthreshley, Larry; Brennan, Catherine A.; Cloherty, Gavin; de Oliveira, Tulio

    2017-01-01

    As the epidemiological epicenter of the human immunodeficiency virus (HIV) pandemic, the Democratic Republic of the Congo (DRC) is a reservoir of circulating HIV strains exhibiting high levels of diversity and recombination. In this study, we characterized HIV specimens collected in two rural areas of the DRC between 2001 and 2003 to identify rare strains of HIV. The env gp41 region was sequenced and characterized for 172 HIV-positive specimens. The env sequences were predominantly subtype A (43.02%), but 7 other subtypes (33.14%), 20 circulating recombinant forms (CRFs; 11.63%), and 20 unclassified (11.63%) sequences were also found. Of the rare and unclassified subtypes, 18 specimens were selected for next-generation sequencing (NGS) by a modified HIV-switching mechanism at the 5′ end of the RNA template (SMART) method to obtain full-genome sequences. NGS produced 14 new complete genomes, which included pure subtype C (n = 2), D (n = 1), F1 (n = 1), H (n = 3), and J (n = 1) genomes. The two subtype C genomes and one of the subtype H genomes branched basal to their respective subtype branches but had no evidence of recombination. The remaining 6 genomes were complex recombinants of 2 or more subtypes, including subtypes A1, F, G, H, J, and K and unclassified fragments, including one subtype CRF25 isolate, which branched basal to all CRF25 references. Notably, all recombinant subtype H fragments branched basal to the H clade. Spatial-geographical analysis indicated that the diverse sequences identified here did not expand globally. The full-genome and subgenomic sequences identified in our study population significantly increase the documented diversity of the strains involved in the continually evolving HIV-1 pandemic. IMPORTANCE: Very little is known about the ancestral HIV-1 strains that founded the global pandemic, and very few complete genome sequences are available from patients in the Congo Basin, where HIV-1 expanded early in the global pandemic

  15. Sensitive Next-Generation Sequencing Method Reveals Deep Genetic Diversity of HIV-1 in the Democratic Republic of the Congo.

    PubMed

    Rodgers, Mary A; Wilkinson, Eduan; Vallari, Ana; McArthur, Carole; Sthreshley, Larry; Brennan, Catherine A; Cloherty, Gavin; de Oliveira, Tulio

    2017-03-15

    As the epidemiological epicenter of the human immunodeficiency virus (HIV) pandemic, the Democratic Republic of the Congo (DRC) is a reservoir of circulating HIV strains exhibiting high levels of diversity and recombination. In this study, we characterized HIV specimens collected in two rural areas of the DRC between 2001 and 2003 to identify rare strains of HIV. The env gp41 region was sequenced and characterized for 172 HIV-positive specimens. The env sequences were predominantly subtype A (43.02%), but 7 other subtypes (33.14%), 20 circulating recombinant forms (CRFs; 11.63%), and 20 unclassified (11.63%) sequences were also found. Of the rare and unclassified subtypes, 18 specimens were selected for next-generation sequencing (NGS) by a modified HIV-switching mechanism at the 5' end of the RNA template (SMART) method to obtain full-genome sequences. NGS produced 14 new complete genomes, which included pure subtype C (n = 2), D (n = 1), F1 (n = 1), H (n = 3), and J (n = 1) genomes. The two subtype C genomes and one of the subtype H genomes branched basal to their respective subtype branches but had no evidence of recombination. The remaining 6 genomes were complex recombinants of 2 or more subtypes, including subtypes A1, F, G, H, J, and K and unclassified fragments, including one subtype CRF25 isolate, which branched basal to all CRF25 references. Notably, all recombinant subtype H fragments branched basal to the H clade. Spatial-geographical analysis indicated that the diverse sequences identified here did not expand globally. The full-genome and subgenomic sequences identified in our study population significantly increase the documented diversity of the strains involved in the continually evolving HIV-1 pandemic. IMPORTANCE: Very little is known about the ancestral HIV-1 strains that founded the global pandemic, and very few complete genome sequences are available from patients in the Congo Basin, where HIV-1 expanded early in the global pandemic. By

  16. Deep sequencing identifies circulating mouse miRNAs that are functionally implicated in manifestations of aging and responsive to calorie restriction.

    PubMed

    Dhahbi, Joseph M; Spindler, Stephen R; Atamna, Hani; Yamakawa, Amy; Guerrero, Noel; Boffelli, Dario; Mote, Patricia; Martin, David I K

    2013-02-01

    MicroRNAs (miRNAs) function to modulate gene expression, and through this property they regulate a broad spectrum of cellular processes. They can circulate in blood and thereby mediate cell-to-cell communication. Aging involves changes in many cellular processes that are potentially regulated by miRNAs, and some evidence has implicated circulating miRNAs in the aging process. In order to initiate a comprehensive assessment of the role of circulating miRNAs in aging, we have used deep sequencing to characterize circulating miRNAs in the serum of young mice, old mice, and old mice maintained on calorie restriction (CR). Deep sequencing identifies a set of novel miRNAs, and also accurately measures all known miRNAs present in serum. This analysis demonstrates that the levels of many miRNAs circulating in the mouse are increased with age, and that the increases can be antagonized by CR. The genes targeted by this set of age-modulated miRNAs are predicted to regulate biological processes directly relevant to the manifestations of aging including metabolic changes, and the miRNAs themselves have been linked to diseases associated with old age. This finding implicates circulating miRNAs in the aging process, raising questions about their tissues of origin, their cellular targets, and their functional role in metabolic changes that occur with aging.

  17. Deep sequencing of voodoo lily (Amorphophallus konjac): an approach to identify relevant genes involved in the synthesis of the hemicellulose glucomannan.

    PubMed

    Gille, Sascha; Cheng, Kun; Skinner, Mary E; Liepman, Aaron H; Wilkerson, Curtis G; Pauly, Markus

    2011-09-01

    A Roche 454 cDNA deep sequencing experiment was performed on a developing corm of Amorphophallus konjac--also known as voodoo lily. The dominant storage polymer in the corm of this plant is the polysaccharide glucomannan, a hemicellulose known to exist in the cell walls of higher plants and a major component of plant biomass derived from softwoods. A total of 246 mega base pairs of sequence data was obtained from which 4,513 distinct contigs were assembled. Within this voodoo lily expressed sequence tag collection genes representing the carbohydrate related pathway of glucomannan biosynthesis were identified, including sucrose metabolism, nucleotide sugar conversion pathways for the formation of activated precursors as well as a putative glucomannan synthase. In vivo expression of the putative glucomannan synthase and subsequent in vitro activity assays unambiguously demonstrate that the enzyme has indeed glucomannan mannosyl- and glucosyl transferase activities. Based on the expressed sequence tag analysis hitherto unknown pathways for the synthesis of GDP-glucose, a necessary precursor for glucomannan biosynthesis, could be proposed. Moreover, the results highlight transcriptional bottlenecks for the synthesis of this hemicellulose.

  18. Deep Sequencing of T-Cell Receptor DNA as a biomarker of clonally expanded TILs in breast cancer after immunotherapy

    PubMed Central

    Page, David B.; Yuan, Jianda; Redmond, David; Wen, Y Hanna; Durack, Jeremy C.; Emerson, Ryan; Solomon, Stephen; Dong, Zhiwan; Wong, Phillip; Comstock, Christopher; Diab, Adi; Sung, Janice; Maybody, Majid; Morris, Elizabeth; Brogi, Edi; Morrow, Monica; Sacchini, Virgilio; Elemento, Olivier; Robins, Harlan; Patil, Sujata; Allison, James P.; Wolchok, Jedd D.; Hudis, Clifford; Norton, Larry; McArthur, Heather

    2016-01-01

    In early-stage breast cancer, the degree of tumor-infiltrating lymphocytes (TILs) predicts response to chemotherapy and overall survival. Combination immunotherapy with immune checkpoint antibody plus tumor cryoablation can induce lymphocytic infiltrates and improve survival in mice. We used T-cell receptor (TCR) DNA sequencing to evaluate both the effect of cryo-immunotherapy in humans and the feasibility of TCR sequencing in early-stage breast cancer. In a pilot clinical trial, 18 women with early-stage breast cancer were treated preoperatively with cryoablation, single-dose anti-CTLA-4 (ipilimumab), or cryoablation + ipilimumab. TCRs within serially collected peripheral blood and tumor tissue were sequenced. In baseline tumor tissues, T-cell density as measured by TCR sequencing correlated with TIL scores obtained by hematoxylin and eosin (H&E) staining. However, tumors with little or no lymphocytes by H&E contained up to 3.6 × 10^6 TCR DNA sequences, highlighting the sensitivity of the ImmunoSEQ platform. In this dataset, ipilimumab increased intratumoral T-cell density over time, whereas cryoablation ± ipilimumab diversified and remodeled the intratumoral T-cell clonal repertoire. Compared to monotherapy, cryoablation plus ipilimumab was associated with numerically greater numbers of peripheral blood and intratumoral T-cell clones expanding robustly following therapy. In conclusion, TCR sequencing correlates with H&E lymphocyte scoring, and provides additional information on clonal diversity. These findings support further study of the use of TCR sequencing as a biomarker for T cell responses to therapy and for the study of cryo-immunotherapy in early-stage breast cancer. PMID:27587469

  19. Deep Sequencing of T-cell Receptor DNA as a Biomarker of Clonally Expanded TILs in Breast Cancer after Immunotherapy.

    PubMed

    Page, David B; Yuan, Jianda; Redmond, David; Wen, Y Hanna; Durack, Jeremy C; Emerson, Ryan; Solomon, Stephen; Dong, Zhiwan; Wong, Phillip; Comstock, Christopher; Diab, Adi; Sung, Janice; Maybody, Majid; Morris, Elizabeth; Brogi, Edi; Morrow, Monica; Sacchini, Virgilio; Elemento, Olivier; Robins, Harlan; Patil, Sujata; Allison, James P; Wolchok, Jedd D; Hudis, Clifford; Norton, Larry; McArthur, Heather L

    2016-10-01

    In early-stage breast cancer, the degree of tumor-infiltrating lymphocytes (TIL) predicts response to chemotherapy and overall survival. Combination immunotherapy with immune checkpoint antibody plus tumor cryoablation can induce lymphocytic infiltrates and improve survival in mice. We used T-cell receptor (TCR) DNA sequencing to evaluate both the effect of cryoimmunotherapy in humans and the feasibility of TCR sequencing in early-stage breast cancer. In a pilot clinical trial, 18 women with early-stage breast cancer were treated preoperatively with cryoablation, single-dose anti-CTLA-4 (ipilimumab), or cryoablation + ipilimumab. TCRs within serially collected peripheral blood and tumor tissue were sequenced. In baseline tumor tissues, T-cell density as measured by TCR sequencing correlated with TIL scores obtained by hematoxylin and eosin (H&E) staining. However, tumors with little or no lymphocytes by H&E contained up to 3.6 × 10^6 TCR DNA sequences, highlighting the sensitivity of the ImmunoSEQ platform. In this dataset, ipilimumab increased intratumoral T-cell density over time, whereas cryoablation ± ipilimumab diversified and remodeled the intratumoral T-cell clonal repertoire. Compared with monotherapy, cryoablation plus ipilimumab was associated with numerically greater numbers of peripheral blood and intratumoral T-cell clones expanding robustly following therapy. In conclusion, TCR sequencing correlates with H&E lymphocyte scoring and provides additional information on clonal diversity. These findings support further study of the use of TCR sequencing as a biomarker for T-cell responses to therapy and for the study of cryoimmunotherapy in early-stage breast cancer. Cancer Immunol Res; 4(10); 835-44. ©2016 AACR.

  20. Deep re-sequencing of a widely used maintainer line of hybrid rice for discovery of DNA polymorphisms and evaluation of genetic diversity.

    PubMed

    Hu, Yuanyi; Mao, Bigang; Peng, Yan; Sun, Yidan; Pan, Yinlin; Xia, Yumei; Sheng, Xiabing; Li, Yaokui; Tang, Li; Yuan, Longping; Zhao, Bingran

    2014-06-01

    Genetic diversity within parental lines of hybrid rice is the foundation of heterosis utilization and yield improvement. Previous studies have suggested that genetic diversity was narrow in cytoplasmic male sterile (CMS/A line) and restorer lines (R line) for Three-line hybrid rice. However, the genetic diversity within maintainer lines (B line), especially at a genome-wide scale, remains largely unknown. In the present study, we performed deep re-sequencing of the elite maintainer line V20B (Oryza sativa L. ssp. indica). We then compared the V20B sequence with the 93-11 (Oryza sativa L. ssp. indica) genome sequence. A total of 112.1 × 10^6 paired-end reads (PE reads) were generated with approximately 30-fold sequencing depth. The V20B PE reads uniquely covered 87.6% of the 93-11 genome sequence. Overall, a total of 660,778 single-nucleotide polymorphisms (SNPs) and 266,301 insertions and deletions (InDels) were identified, yielding an average of 2.1 SNPs/kb and 0.8 InDels/kb. Genome-wide distribution of the SNPs and InDels was non-random, and variation-rich and variation-poor regions were identified in all chromosomes. A total of 20,562 non-synonymous SNPs spanning 8,854 genes were annotated. Our results identified DNA polymorphisms at the genome-wide scale and uncovered the high level of genetic diversity between V20B and 93-11. Our results proved that next-generation sequencing technologies can be powerful tools to study genome-wide DNA polymorphisms, to query genetic diversity, and to enable molecular improvement efforts with Three-line hybrid rice. Further, our results also indicated that 93-11 could be used as core germplasm for the improvement of wild-abortive CMS lines and the maintainer lines.

  1. Identification of SSRs and differentially expressed genes in two cultivars of celery (Apium graveolens L.) by deep transcriptome sequencing.

    PubMed

    Li, Meng-Yao; Wang, Feng; Jiang, Qian; Ma, Jing; Xiong, Ai-Sheng

    2014-01-01

    Celery (Apium graveolens L.) is one of the most important and widely grown vegetables in the Apiaceae family. Due to the lack of comprehensive genomic resources, research on celery has mainly utilized physiological and biochemical approaches, rather than molecular biology, to study this crop. Transcriptome sequencing has become an efficient and economic technology for obtaining information on gene expression that can greatly facilitate molecular and genomic studies of species for which a sequenced genome is not available. In the present study, 15 893 516 and 19 818 161 high-quality sequences were obtained by RNA-seq from two celery varieties 'Ventura' and 'Jinnan Shiqin', respectively. The obtained reads were assembled into 39 584 and 41 740 unigenes with mean lengths of 683 bp and 690 bp, respectively. A total of 1939 simple sequence repeat (SSR) markers were identified in 'Ventura' and 2004 SSRs in 'Jinnan Shiqin'. Di-nucleotide repeats were the most common repeat motif, accounting for 55.49% and 54.84% in 'Ventura' and 'Jinnan Shiqin', respectively. A comparison of expressed genes between the two libraries identified 338 differentially expressed genes (DEGs). Three hundred and three of the DEGs were annotated based on a sequence similarity search utilizing eight public databases. Additionally, the expression profile of eight annotated DEGs was characterized in response to abiotic stresses. The collective data generated in the present research represent a valuable resource for further genetic and molecular studies in celery.

  2. Deep sequencing is an appropriate tool for the selection of unique Hepatitis C virus (HCV) variants after single genomic amplification.

    PubMed

    Guinoiseau, Thibault; Moreau, Alain; Hohnadel, Guillaume; Ngo-Giang-Huong, Nicole; Brulard, Celine; Vourc'h, Patrick; Goudeau, Alain; Gaudy-Graffin, Catherine

    2017-01-01

    Hepatitis C virus (HCV) evolves rapidly in a single host and circulates as a quasispecies, which is a complex mixture of genetically distinct but closely related viruses, namely variants. To identify intra-individual diversity and investigate their functional properties in vitro, it is necessary to define their quasispecies composition and isolate the HCV variants. This is possible using single genome amplification (SGA). This technique, based on serially diluted cDNA to amplify a single cDNA molecule (clonal amplicon), has already been used to determine individual HCV diversity. In these studies, positive PCR reactions from SGA were directly sequenced using Sanger technology. Non-clonal amplicons must be detected and excluded to facilitate further functional analysis. Here, we compared Next Generation Sequencing (NGS) with de novo assembly and Sanger sequencing for their ability to distinguish clonal and non-clonal amplicons after SGA on one plasma specimen. All amplicons (n = 42) classified as clonal by NGS were also classified as clonal by Sanger sequencing. No double peaks were seen on electropherograms for non-clonal amplicons with position-specific nucleotide variation below 15% by NGS. Altogether, NGS circumvented many of the difficulties encountered when using Sanger sequencing after SGA and is an appropriate tool to reliably select clonal amplicons for further functional studies.

  3. Ultra-deep sequencing enables high-fidelity recovery of biodiversity for bulk arthropod samples without PCR amplification

    PubMed Central

    2013-01-01

    Background: Next-generation-sequencing (NGS) technologies combined with a classic DNA barcoding approach have enabled fast and credible measurement for biodiversity of mixed environmental samples. However, the PCR amplification involved in nearly all existing NGS protocols inevitably introduces taxonomic biases. In the present study, we developed new Illumina pipelines without PCR amplifications to analyze terrestrial arthropod communities. Results: Mitochondrial enrichment directly followed by Illumina shotgun sequencing, at an ultra-high sequence volume, enabled the recovery of Cytochrome c Oxidase subunit 1 (COI) barcode sequences, which allowed for the estimation of species composition at high fidelity for a terrestrial insect community. With 15.5 Gbp Illumina data, approximately 97% and 92% were detected out of the 37 input Operational Taxonomic Units (OTUs), whether the reference barcode library was used or not, respectively, while only 1 novel OTU was found for the latter. Additionally, relatively strong correlation between the sequencing volume and the total biomass was observed for species from the bulk sample, suggesting a potential solution to reveal relative abundance. Conclusions: The ability of the new Illumina PCR-free pipeline for DNA metabarcoding to detect small arthropod specimens and its tendency to avoid most, if not all, false positives suggests its great potential in biodiversity-related surveillance, such as in biomonitoring programs. However, further improvement for mitochondrial enrichment is likely needed for the application of the new pipeline in analyzing arthropod communities at higher diversity. PMID:23587339

  4. A Method for Amplicon Deep Sequencing of Drug Resistance Genes in Plasmodium falciparum Clinical Isolates from India

    PubMed Central

    Rao, Pavitra N.; Uplekar, Swapna; Kayal, Sriti; Mallick, Prashant K.; Bandyopadhyay, Nabamita; Kale, Sonal; Singh, Om P.; Mohanty, Akshaya; Mohanty, Sanjib; Wassmer, Samuel C.

    2016-01-01

    A major challenge to global malaria control and elimination is early detection and containment of emerging drug resistance. Next-generation sequencing (NGS) methods provide the resolution, scalability, and sensitivity required for high-throughput surveillance of molecular markers of drug resistance. We have developed an amplicon sequencing method on the Ion Torrent PGM platform for targeted resequencing of a panel of six Plasmodium falciparum genes implicated in resistance to first-line antimalarial therapy, including artemisinin combination therapy, chloroquine, and sulfadoxine-pyrimethamine. The protocol was optimized using 12 geographically diverse P. falciparum reference strains and successfully applied to multiplexed sequencing of 16 clinical isolates from India. The sequencing results from the reference strains showed 100% concordance with previously reported drug resistance-associated mutations. Single-nucleotide polymorphisms (SNPs) in clinical isolates revealed a number of known resistance-associated mutations and other nonsynonymous mutations that have not been implicated in drug resistance. SNP positions containing multiple allelic variants were used to identify three clinical samples containing mixed genotypes indicative of multiclonal infections. The amplicon sequencing protocol has been designed for the benchtop Ion Torrent PGM platform and can be operated with minimal bioinformatics infrastructure, making it ideal for use in countries that are endemic for the disease to facilitate routine large-scale surveillance of the emergence of drug resistance and to ensure continued success of the malaria treatment policy. PMID:27008882

  5. Discovery and profiling of novel and conserved microRNAs during flower development in Carya cathayensis via deep sequencing.

    PubMed

    Wang, Zheng Jia; Huang, Jian Qin; Huang, You Jun; Li, Zheng; Zheng, Bing Song

    2012-08-01

    Hickory (Carya cathayensis Sarg.) is an economically important woody plant in China, but its long juvenile phase delays yield. MicroRNAs (miRNAs) are critical regulators of genes and important for normal plant development and physiology, including flower development. We used Solexa technology to sequence two small RNA libraries from two floral differentiation stages in hickory to identify miRNAs related to flower development. We identified 39 conserved miRNA sequences from 114 loci belonging to 23 families as well as two novel and ten potential novel miRNAs belonging to nine families. Moreover, 35 conserved miRNA*s and two novel miRNA*s were detected. Twenty miRNA sequences from 49 loci belonging to 11 families were differentially expressed; all were up-regulated at the later stage of flower development in hickory. Quantitative real-time PCR of 12 conserved miRNA sequences, five novel miRNA families, and two novel miRNA*s validated that all were expressed during hickory flower development, and the expression patterns were similar to those detected with Solexa sequencing. Finally, a total of 146 targets of the novel and conserved miRNAs were predicted. This study identified a diverse set of miRNAs that were closely related to hickory flower development and that could help in plant floral induction.

  6. Deep sequencing is an appropriate tool for the selection of unique Hepatitis C virus (HCV) variants after single genomic amplification

    PubMed Central

    Guinoiseau, Thibault; Moreau, Alain; Hohnadel, Guillaume; Ngo-Giang-Huong, Nicole; Brulard, Celine; Vourc’h, Patrick; Goudeau, Alain; Gaudy-Graffin, Catherine

    2017-01-01

    Hepatitis C virus (HCV) evolves rapidly in a single host and circulates as a quasispecies, which is a complex mixture of genetically distinct but closely related viruses, namely variants. To identify intra-individual diversity and investigate their functional properties in vitro, it is necessary to define their quasispecies composition and isolate the HCV variants. This is possible using single genome amplification (SGA). This technique, based on serially diluted cDNA to amplify a single cDNA molecule (clonal amplicon), has already been used to determine individual HCV diversity. In these studies, positive PCR reactions from SGA were directly sequenced using Sanger technology. Non-clonal amplicons must be detected and excluded to facilitate further functional analysis. Here, we compared Next Generation Sequencing (NGS) with de novo assembly and Sanger sequencing for their ability to distinguish clonal and non-clonal amplicons after SGA on one plasma specimen. All amplicons (n = 42) classified as clonal by NGS were also classified as clonal by Sanger sequencing. No double peaks were seen on electropherograms for non-clonal amplicons with position-specific nucleotide variation below 15% by NGS. Altogether, NGS circumvented many of the difficulties encountered when using Sanger sequencing after SGA and is an appropriate tool to reliably select clonal amplicons for further functional studies. PMID:28362878

  7. Ultra-deep sequencing detects ovarian cancer cells in peritoneal fluid and reveals somatic TP53 mutations in noncancerous tissues.

    PubMed

    Krimmel, Jeffrey D; Schmitt, Michael W; Harrell, Maria I; Agnew, Kathy J; Kennedy, Scott R; Emond, Mary J; Loeb, Lawrence A; Swisher, Elizabeth M; Risques, Rosa Ana

    2016-05-24

    Current sequencing methods are error-prone, which precludes the identification of low frequency mutations for early cancer detection. Duplex sequencing is a sequencing technology that decreases errors by scoring mutations present only in both strands of DNA. Our aim was to determine whether duplex sequencing could detect extremely rare cancer cells present in peritoneal fluid from women with high-grade serous ovarian carcinomas (HGSOCs). These aggressive cancers are typically diagnosed at a late stage and are characterized by TP53 mutations and peritoneal dissemination. We used duplex sequencing to analyze TP53 mutations in 17 peritoneal fluid samples from women with HGSOC and 20 from women without cancer. The tumor TP53 mutation was detected in 94% (16/17) of peritoneal fluid samples from women with HGSOC (frequency as low as 1 mutant per 24,736 normal genomes). Additionally, we detected extremely low frequency TP53 mutations (median mutant fraction 1/13,139) in peritoneal fluid from nearly all patients with and without cancer (35/37). These mutations were mostly deleterious, clustered in hotspots, increased with age, and were more abundant in women with cancer than in controls. The total burden of TP53 mutations in peritoneal fluid distinguished cancers from controls with 82% sensitivity (14/17) and 90% specificity (18/20). Age-associated, low frequency TP53 mutations were also found in 100% of peripheral blood samples from 15 women with and without ovarian cancer (none with hematologic disorder). Our results demonstrate the ability of duplex sequencing to detect rare cancer cells and provide evidence of widespread, low frequency, age-associated somatic TP53 mutation in noncancerous tissue.
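
    The percentages quoted in this abstract follow directly from the counts it reports. The sketch below simply redoes that arithmetic; the helper name `proportion` is hypothetical and is not taken from the paper, and only the counts stated in the abstract are used.

```python
def proportion(hits: int, total: int) -> float:
    """Return hits / total as a fraction; `proportion` is an illustrative helper."""
    if total <= 0:
        raise ValueError("total must be positive")
    return hits / total


# Counts as stated in the abstract above.
sensitivity = proportion(14, 17)     # cancers correctly flagged by TP53 mutation burden
specificity = proportion(18, 20)     # controls correctly classified as non-cancer
detection_rate = proportion(16, 17)  # samples in which the tumor mutation was found

for label, value in [
    ("sensitivity", sensitivity),
    ("specificity", specificity),
    ("tumor mutation detection", detection_rate),
]:
    print(f"{label}: {value:.1%}")
```

    Rounded to whole percentages, these ratios match the 82%, 90%, and 94% figures given in the abstract.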

  8. Global Transcriptome Analysis of the Tentacle of the Jellyfish Cyanea capillata Using Deep Sequencing and Expressed Sequence Tags: Insight into the Toxin- and Degenerative Disease-Related Transcripts

    PubMed Central

    Liu, Dan; Wang, Qianqian; Ruan, Zengliang; He, Qian; Zhang, Liming

    2015-01-01

    Background: Jellyfish contain diverse toxins and other bioactive components. However, large-scale identification of novel toxins and bioactive components from jellyfish has been hampered by the low efficiency of traditional isolation and purification methods. Results: We performed de novo transcriptome sequencing of the tentacle tissue of the jellyfish Cyanea capillata. A total of 51,304,108 reads were obtained and assembled into 50,536 unigenes. Of these, 21,357 unigenes had homologues in public databases, but the remaining unigenes had no significant matches due to the limited sequence information available and species-specific novel sequences. Functional annotation of the unigenes also revealed general gene expression profile characteristics in the tentacle of C. capillata. A primary goal of this study was to identify putative toxin transcripts. As expected, we screened many transcripts encoding proteins similar to several well-known toxin families including phospholipases, metalloproteases, serine proteases and serine protease inhibitors. In addition, some transcripts also resembled molecules with potential toxic activities, including cnidarian CfTX-like toxins with hemolytic activity, plancitoxin-1, venom toxin-like peptide-6, histamine-releasing factor, neprilysin, dipeptidyl peptidase 4, vascular endothelial growth factor A, angiotensin-converting enzyme-like and endothelin-converting enzyme 1-like proteins. Most of these molecules have not been previously reported in jellyfish. Interestingly, we also characterized a number of transcripts with similarities to proteins relevant to several degenerative diseases, including Huntington’s, Alzheimer’s and Parkinson’s diseases. This is the first description of degenerative disease-associated genes in jellyfish. Conclusion: We obtained a well-categorized and annotated transcriptome of C. capillata tentacle that will be an important and valuable resource for further understanding of jellyfish at the molecular

  9. Identification of anti-CA125 antibody responses in ovarian cancer patients by a novel deep sequence-coupled biopanning platform

    PubMed Central

    Frietze, Kathryn M.; Roden, Richard B.S.; Lee, Ji-Hyun; Shi, Yang; Peabody, David S.; Chackerian, Bryce

    2015-01-01

    High-grade epithelial ovarian cancer (OvCa) kills more women than any other gynecologic cancer and is rarely diagnosed at an early stage. We sought to identify tumor-associated antigens (TAA) as candidate diagnostic and/or immunotherapeutic targets by taking advantage of tumor autoantibody responses in individuals with OvCa. Plasma-derived IgG from a pool of five patients with advanced OvCa was subjected to iterative biopanning using a library of bacteriophage MS2 virus-like particles (MS2-VLPs) displaying diverse short random peptides. After two rounds of biopanning, we analyzed the selectant population of MS2-VLPs by Ion Torrent deep-sequencing. One of the top 25 most abundant peptides identified (DISGTNTSRA) had sequence similarity to cancer antigen 125 (CA125/MUC16), a well-known OvCa-associated antigen. Mice immunized with MS2-DISGTNTSRA generated antibodies that cross-reacted with purified soluble CA125 from OvCa cells but not membrane-bound CA125, indicating that the DISGTNTSRA peptide was a CA125/MUC16 peptide mimic of soluble CA125. Pre-operative OvCa patient plasma (n = 100) was assessed for anti-DISGTNTSRA, anti-CA125, and CA125. Patients with normal CA125 (< 35 IU/mL) at time of diagnosis had significantly more antibodies to DISGTNTSRA and to CA125 than those patients who had high CA125 (> 35 IU/mL). A statistically significant survival advantage was observed for patients who had either normal CA125 and/or higher concentrations of antibodies to CA125 at time of diagnosis. These data show the feasibility of using deep sequence-coupled biopanning to identify TAA autoantibody responses from cancer patient plasma and suggest a possible antibody-mediated mechanism for low CA125 plasma concentrations in some OvCa patients. PMID:26589767

  10. Identification of Anti-CA125 Antibody Responses in Ovarian Cancer Patients by a Novel Deep Sequence-Coupled Biopanning Platform.

    PubMed

    Frietze, Kathryn M; Roden, Richard B S; Lee, Ji-Hyun; Shi, Yang; Peabody, David S; Chackerian, Bryce

    2016-02-01

    High-grade epithelial ovarian cancer kills more women than any other gynecologic cancer and is rarely diagnosed at an early stage. We sought to identify tumor-associated antigens (TAA) as candidate diagnostic and/or immunotherapeutic targets by taking advantage of tumor autoantibody responses in individuals with ovarian cancer. Plasma-derived IgG from a pool of five patients with advanced ovarian cancer was subjected to iterative biopanning using a library of bacteriophage MS2 virus-like particles (MS2-VLPs) displaying diverse short random peptides. After two rounds of biopanning, we analyzed the selectant population of MS2-VLPs by Ion Torrent deep sequencing. One of the top 25 most abundant peptides identified (DISGTNTSRA) had sequence similarity to cancer antigen 125 (CA125/MUC16), a well-known ovarian cancer-associated antigen. Mice immunized with MS2-DISGTNTSRA generated antibodies that cross-reacted with purified soluble CA125 from ovarian cancer cells but not membrane-bound CA125, indicating that the DISGTNTSRA peptide was a CA125/MUC16 peptide mimic of soluble CA125. Preoperative ovarian cancer patient plasma (n = 100) was assessed for anti-DISGTNTSRA, anti-CA125, and CA125. Patients with normal CA125 (<35 IU/mL) at the time of diagnosis had significantly more antibodies to DISGTNTSRA and to CA125 than those patients who had high CA125 (>35 IU/mL). A statistically significant survival advantage was observed for patients who had either normal CA125 and/or higher concentrations of antibodies to CA125 at the time of diagnosis. These data show the feasibility of using deep sequence-coupled biopanning to identify TAA autoantibody responses from cancer patient plasma and suggest a possible antibody-mediated mechanism for low CA125 plasma concentrations in some ovarian cancer patients.

  11. Quantitative T cell repertoire analysis by deep cDNA sequencing of T cell receptor α and β chains using next-generation sequencing (NGS)

    PubMed Central

    Fang, Hua; Yamaguchi, Rui; Liu, Xiao; Daigo, Yataro; Yew, Poh Yin; Tanikawa, Chizu; Matsuda, Koichi; Imoto, Seiya; Miyano, Satoru; Nakamura, Yusuke

    2015-01-01

    Immune responses play a critical role in various disease conditions including cancer and autoimmune diseases. However, to date, there has not been a rapid, sensitive, comprehensive, and quantitative analysis method to examine T-cell or B-cell immune responses. Here, we report a new approach to characterize the T cell receptor (TCR) repertoire by sequencing millions of cDNAs of TCR α and β chains in combination with a newly developed algorithm. Using samples from lung cancer patients treated with cancer peptide vaccines as a model, we demonstrate that detailed information of the V-(D)-J combination along with complementarity-determining region 3 (CDR3) sequences can be determined. We identified extensive abnormal splicing of TCR transcripts in lung cancer samples, indicating dysfunction of the splicing machinery in T lymphocytes caused by prior chemotherapy. In addition, we found three potentially novel TCR exons that have not been described previously in the reference genome. This newly developed TCR NGS platform can be applied to better understand immune responses in many disease areas including immune disorders, allergies, and organ transplantations. PMID:25964866

  12. Draft Genome Sequence of the Deep-Sea Basidiomycetous Yeast Cryptococcus sp. Strain Mo29 Reveals Its Biotechnological Potential

    PubMed Central

    Rédou, Vanessa; Kumar, Abhishek; Hainaut, Matthieu; Henrissat, Bernard; Record, Eric; Barbier, Georges

    2016-01-01

    Cryptococcus sp. strain Mo29 was isolated from the Rainbow hydrothermal site on the Mid-Atlantic Ridge. Here, we present the draft genome sequence of this basidiomycetous yeast strain, which has highlighted its biotechnological potential as revealed by the presence of genes involved in the synthesis of secondary metabolites and biotechnologically important enzymes. PMID:27389259

  13. Characterization of attached bacterial populations in deep granitic groundwater from the Stripa research mine by 16S rRNA gene sequencing and scanning electron microscopy.

    PubMed

    Ekendahl, S; Arlinger, J; Ståhl, F; Pedersen, K

    1994-07-01

    This paper presents the molecular characterization of attached bacterial populations growing in slowly flowing artesian groundwater from deep crystalline bed-rock of the Stripa mine, south central Sweden. Bacteria grew on glass slides in laminar flow reactors connected to the anoxic groundwater flowing up through tubing from two levels of a borehole, 812-820 m and 970-1240 m. The glass slides were collected, the bacterial DNA was extracted and the 16S rRNA genes were amplified by PCR using primers matching universally conserved positions 519-536 and 1392-1405. The resulting PCR fragments were subsequently cloned and sequenced. The sequences were compared with each other and with 16S rRNA gene sequences in the EMBL database. Three major groups of bacteria were found. Signature bases placed the clones in the appropriate systematic groups. All belonged to the proteobacterial groups beta and gamma. One group was found only at the 812-820 m level, where it constituted 63% of the sequenced clones, whereas the second group existed almost exclusively at the 970-1240 m level, where it constituted 83% of the sequenced clones. The third group was equally distributed between the levels. A few other bacteria were also found. None of the 16S rRNA genes from the dominant bacteria showed more than 88% similarity to any of the others, and none of them resembled anything in the database by more than 96%. Temperature did not seem to have any effect on species composition at the deeper level. SEM images showed rods appearing in microcolonies. (ABSTRACT TRUNCATED AT 250 WORDS)

  14. From clinical sample to complete genome: Comparing methods for the extraction of HIV-1 RNA for high-throughput deep sequencing.

    PubMed

    Cornelissen, Marion; Gall, Astrid; Vink, Monique; Zorgdrager, Fokla; Binter, Špela; Edwards, Stephanie; Jurriaans, Suzanne; Bakker, Margreet; Ong, Swee Hoe; Gras, Luuk; van Sighem, Ard; Bezemer, Daniela; de Wolf, Frank; Reiss, Peter; Kellam, Paul; Berkhout, Ben; Fraser, Christophe; van der Kuyl, Antoinette C

    2016-08-04

    The BEEHIVE (Bridging the Evolution and Epidemiology of HIV in Europe) project aims to analyse nearly-complete viral genomes from >3000 HIV-1 infected Europeans using high-throughput deep sequencing techniques to investigate the virus genetic contribution to virulence. Following the development of a computational pipeline, including a new de novo assembler for RNA virus genomes, to generate larger contiguous sequences (contigs) from the abundance of short sequence reads that characterise the data, another area that determines genome sequencing success is the quality and quantity of the input RNA. A pilot experiment with 125 patient plasma samples was performed to investigate the optimal method for isolation of HIV-1 viral RNA for long amplicon genome sequencing. Manual isolation with the QIAamp Viral RNA Mini Kit (Qiagen) was superior to robotic extraction using either the QIAcube robotic system, the mSample Preparation Systems RNA kit with automated extraction by the m2000sp system (Abbott Molecular), or the MagNA Pure 96 System in combination with the MagNA Pure 96 Instrument (Roche Diagnostics). We scored amplification of a set of four HIV-1 amplicons of ∼1.9, 3.6, 3.0 and 3.5kb, and subsequent recovery of near-complete viral genomes. Subsequently, 616 BEEHIVE patient samples were analysed to determine factors that influence successful amplification of the genome in four overlapping amplicons using the QIAamp Viral RNA Kit for viral RNA isolation. Both low plasma viral load and high sample age (stored before 1999) negatively influenced the amplification of viral amplicons >3kb. A plasma viral load of >100,000 copies/ml resulted in successful amplification of all four amplicons for 86% of the samples; this value dropped to only 46% for samples with viral loads of <20,000 copies/ml.

  15. [Sequence of venous blood flow alterations in patients after recently endured acute thrombosis of lower-limb deep veins based on the findings of ultrasonographic duplex scanning].

    PubMed

    Tarkovskiĭ, A A; Zudin, A M; Aleksandrova, E S

    2009-01-01

    This study was undertaken to investigate the sequence of alterations in venous blood flow occurring within one year after acute thrombosis of the lower-limb deep veins, using the standard technique of ultrasonographic duplex scanning. A total of thirty-two 24-to-62-year-old patients presenting with new-onset acute phlebothrombosis were followed up. All the patients were sequentially examined at 2 days, 3 weeks, 3 months, 6 months and 12 months after the manifestation of the initial clinical signs of the disease. Among the parameters determined were the patency of the deep veins and the condition of the valvular apparatus of the deep, superficial and communicant veins. According to the findings, as early as the first stage of the phlebohaemodynamic alterations after thrombosis, i.e., during the acute period of the disease, seven (21.9%) patients were found to have developed valvular insufficiency of the communicant veins of the crus, manifesting itself in the formation of a horizontal veno-venous reflux, and 6 months later this was observed in all the patients examined (100%). The second stage of the phlebohaemodynamic alterations was characterized, simultaneously with recanalization of the thrombotic masses in the deep veins, by the formation of valvular insufficiency of the deep veins, manifesting itself as a deep vertical veno-venous reflux, which was revealed at month six after the onset of the disease in 56.3% of the examined subjects and after 12 months in 93.8% of the patients. Recanalization of thrombotic masses was noted to commence 3 months after the onset of thrombosis in twelve (37.5%) patients, and after 12 months it was seen in all the patients (100%), eventually ending in complete restoration of the patency of the affected

  16. The utility of diversity profiling using Illumina 18S rRNA gene amplicon deep sequencing to detect and discriminate Toxoplasma gondii among the cyst-forming coccidia.

    PubMed

    Cooper, Madalyn K; Phalen, David N; Donahoe, Shannon L; Rose, Karrie; Šlapeta, Jan

    2016-01-30

    Next-generation sequencing (NGS) has the capacity to screen a single DNA sample and detect pathogen DNA from thousands of host DNA sequence reads, making it a versatile and informative tool for investigation of pathogens in diseased animals. The technique is effective and labor saving in the initial identification of pathogens, and will complement conventional diagnostic tests to associate the candidate pathogen with a disease process. In this report, we investigated the utility of the diversity profiling NGS approach using Illumina small subunit ribosomal RNA (18S rRNA) gene amplicon deep sequencing to detect Toxoplasma gondii in previously confirmed cases of toxoplasmosis. We then tested the diagnostic approach with species-specific PCR genotyping, histopathology and immunohistochemistry of toxoplasmosis in a Risso's dolphin (Grampus griseus) to systematically characterise the disease and associate causality. We show that the Euk7A/Euk570R primer set targeting the V1-V3 hypervariable region of the 18S rRNA gene can be used as a species-specific assay for cyst-forming coccidia and discriminate T. gondii. Overall, the approach is cost-effective and improves diagnostic decision support by narrowing the differential diagnosis list with more certainty than was previously possible. Furthermore, it supplements the limitations of cryptic protozoan morphology and surpasses the need for species-specific PCR primer combinations.

  17. The sequence capture by hybridization: a new approach for revealing the potential of mono-aromatic hydrocarbons bioattenuation in a deep oligotrophic aquifer.

    PubMed

    Ranchou-Peyruse, Magali; Gasc, Cyrielle; Guignard, Marion; Aüllo, Thomas; Dequidt, David; Peyret, Pierre; Ranchou-Peyruse, Anthony

    2017-03-01

    The formation water of a deep aquifer (853 m of depth) used for geological storage of natural gas was sampled to assess the mono-aromatic hydrocarbons attenuation potential of the indigenous microbiota. The study of bacterial diversity suggests that Firmicutes and, in particular, sulphate-reducing bacteria (Peptococcaceae) predominate in this microbial community. The capacity of the microbial community to biodegrade toluene and m- and p-xylenes was demonstrated using a culture-based approach after several hundred days of incubation. In order to reveal the potential for biodegradation of these compounds within a shorter time frame, an innovative approach named the solution hybrid selection method, which combines sequence capture by hybridization and next-generation sequencing, was applied to the same original water sample. The bssA and bssA-like genes were investigated as they are considered good biomarkers for the potential of toluene and xylene biodegradation. Unlike a PCR approach which failed to detect these genes directly from formation water, this innovative strategy demonstrated the presence of the bssA and bssA-like genes in this oligotrophic ecosystem, probably harboured by Peptococcaceae. The sequence capture by hybridization shows significant potential to reveal the presence of genes of functional interest which have low-level representation in the biosphere.

  18. Deep sequencing of RNA from immune cell-derived vesicles uncovers the selective incorporation of small non-coding RNA biotypes with potential regulatory functions

    PubMed Central

    Nolte-’t Hoen, Esther N. M.; Buermans, Henk P. J.; Waasdorp, Maaike; Stoorvogel, Willem; Wauben, Marca H. M.; ’t Hoen, Peter A. C.

    2012-01-01

    Cells release RNA-carrying vesicles and membrane-free RNA/protein complexes into the extracellular milieu. Horizontal vesicle-mediated transfer of such shuttle RNA between cells allows dissemination of genetically encoded messages, which may modify the function of target cells. Other studies used array analysis to establish the presence of microRNAs and mRNA in cell-derived vesicles from many sources. Here, we used an unbiased approach by deep sequencing of small RNA released by immune cells. We found a large variety of small non-coding RNA species representing pervasive transcripts or RNA cleavage products overlapping with protein coding regions, repeat sequences or structural RNAs. Many of these RNAs were enriched relative to cellular RNA, indicating that cells destine specific RNAs for extracellular release. Among the most abundant small RNAs in shuttle RNA were sequences derived from vault RNA, Y-RNA and specific tRNAs. Many of the highly abundant small non-coding transcripts in shuttle RNA are evolutionarily well conserved and have previously been associated with gene regulatory functions. These findings allude to a wider range of biological effects that could be mediated by shuttle RNA than previously expected. Moreover, the data present leads for unraveling how cells modify the function of other cells via transfer of specific non-coding RNA species. PMID:22821563

  19. Identification of MiRNA from eggplant (Solanum melongena L.) by small RNA deep sequencing and their response to Verticillium dahliae infection.

    PubMed

    Yang, Liu; Jue, Dengwei; Li, Wang; Zhang, Ruijie; Chen, Min; Yang, Qing

    2013-01-01

    MiRNAs are a class of non-coding small RNAs that play important roles in the regulation of gene expression. Although plant miRNAs have been extensively studied in model systems, less is known in other plants with limited genome sequence data, including eggplant (Solanum melongena L.). To identify miRNAs in eggplant and their response to Verticillium dahliae infection, a fungal pathogen for which clear understanding of infection mechanisms and effective cure methods are currently lacking, we deep-sequenced two small RNA (sRNA) libraries prepared from mock-infected and infected seedlings of eggplants. Specifically, 30,830,792 reads produced 7,716,328 unique miRNAs representing 99 known miRNA families that have been identified in other plant species. Two novel putative miRNAs were predicted with eggplant ESTs. The potential targets of the identified known and novel miRNAs were also predicted based on sequence homology search. It was observed that the length distribution of obtained sRNAs and the expression of 6 miRNA families were obviously different between the two libraries. These results provide a framework for further analysis of miRNAs and their role in regulating plant response to fungal infection and Verticillium wilt in particular.

  20. MicroRNAs in Amoebozoa: Deep sequencing of the small RNA population in the social amoeba Dictyostelium discoideum reveals developmentally regulated microRNAs

    PubMed Central

    Avesson, Lotta; Reimegård, Johan; Wagner, E. Gerhart H.; Söderbom, Fredrik

    2012-01-01

    The RNA interference machinery has served as a guardian of eukaryotic genomes since the divergence from prokaryotes. Although the basic components have a shared origin, silencing pathways directed by small RNAs have evolved in diverse directions in different eukaryotic lineages. Micro (mi)RNAs regulate protein-coding genes and play vital roles in plants and animals, but less is known about their functions in other organisms. Here, we report, for the first time, deep sequencing of small RNAs from the social amoeba Dictyostelium discoideum. RNA from growing single-cell amoebae as well as from two multicellular developmental stages was sequenced. Computational analyses combined with experimental data reveal the expression of miRNAs, several of them exhibiting distinct expression patterns during development. To our knowledge, this is the first report of miRNAs in the Amoebozoa supergroup. We also show that overexpressed miRNA precursors generate miRNAs and, in most cases, miRNA* sequences, whose biogenesis is dependent on the Dicer-like protein DrnB, further supporting the presence of miRNAs in D. discoideum. In addition, we find miRNAs processed from hairpin structures originating from an intron as well as from a class of repetitive elements. We believe that these repetitive elements are sources for newly evolved miRNAs. PMID:22875808

  1. Variable sequence of events during the past seven terminations in two deep-sea cores from the Southern Ocean

    NASA Astrophysics Data System (ADS)

    Schneider Mor, Aya; Yam, Ruth; Bianchi, Cristina; Kunz-Pirrung, Martina; Gersonde, Rainer; Shemesh, Aldo

    2012-03-01

    The relationships among internally consistent records of summer sea-surface temperature (SSST), winter sea ice (WSI), and diatomaceous stable isotopes were studied across seven terminations over the last 660 ka in sedimentary cores from ODP sites 1093 and 1094. The sequence of events at both sites indicates that SSST and WSI changes led the carbon and nitrogen isotopic changes in three Terminations (TI, TII and TVI) and followed them in the other four Terminations (TIII, TIV, TV and TVII). In both TIII and TIV, the leads and lags between the proxies were related to weak glacial mode, while in TV and TVII they were due to the influence of the mid-Pleistocene transition. We show that the sequence of events is not unique and does not follow the same pattern across terminations, implying that the processes that initiated climate change in the Southern Ocean have varied through time.

  2. Genome-wide discovery and differential regulation of conserved and novel microRNAs in chickpea via deep sequencing.

    PubMed

    Jain, Mukesh; Chevala, V V S Narayana; Garg, Rohini

    2014-11-01

    MicroRNAs (miRNAs) are essential components of complex gene regulatory networks that orchestrate plant development. Although several genomic resources have been developed for the legume crop chickpea, miRNAs have not been discovered until now. For genome-wide discovery of miRNAs in chickpea (Cicer arietinum), we sequenced the small RNA content from seven major tissues/organs employing Illumina technology. About 154 million reads were generated, which represented more than 20 million distinct small RNA sequences. We identified a total of 440 conserved miRNAs in chickpea based on sequence similarity with known miRNAs in other plants. In addition, 178 novel miRNAs were identified using a miRDeep pipeline with plant-specific scoring. Some of the conserved and novel miRNAs with significant sequence similarity were grouped into families. The chickpea miRNAs targeted a wide range of mRNAs involved in diverse cellular processes, including transcriptional regulation (transcription factors), protein modification and turnover, signal transduction, and metabolism. Our analysis revealed several miRNAs with differential spatial expression. Many of the chickpea miRNAs were expressed in a tissue-specific manner. The conserved and differential expression of members of the same miRNA family in different tissues was also observed. Some of the same family members were predicted to target different chickpea mRNAs, which suggested the specificity and complexity of miRNA-mediated developmental regulation. This study, for the first time, reveals a comprehensive set of conserved and novel miRNAs along with their expression patterns and putative targets in chickpea, and provides a framework for understanding regulation of developmental processes in legumes.
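
    The step from raw reads to distinct small RNA sequences mentioned above can be pictured as a simple collapsing operation. The Python sketch below is only an illustration under assumed inputs (a plain list of read strings); the function name collapse_reads is hypothetical and this is not the pipeline used in the study.

```python
from collections import Counter


def collapse_reads(reads):
    """Collapse raw reads into distinct sequences with their read counts.

    Assumption for this sketch: reads are plain strings; empty entries are
    skipped and case is normalised so identical sequences are pooled.
    """
    cleaned = (read.strip().upper() for read in reads)
    return Counter(seq for seq in cleaned if seq)


# Toy input: four reads, one blank line, two distinct sequences.
example_reads = ["TGACAGAAGAGAGTGAGCAC", "tgacagaagagagtgagcac",
                 "TCGGACCAGGCTTCATTCCCC", "", "TGACAGAAGAGAGTGAGCAC"]
distinct = collapse_reads(example_reads)
total_reads = sum(distinct.values())
```

    Here len(distinct) gives the number of distinct sequences and total_reads the number of non-empty reads, the two quantities the abstract reports at a much larger scale.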

  3. The venom gland transcriptome of Latrodectus tredecimguttatus revealed by deep sequencing and cDNA library analysis.

    PubMed

    He, Quanze; Duan, Zhigui; Yu, Ying; Liu, Zhen; Liu, Zhonghua; Liang, Songping

    2013-01-01

    Latrodectus tredecimguttatus, commonly known as the black widow spider, is well known for its dangerous bite. Although its venom has been characterized extensively, some fundamental questions about its molecular composition remain unanswered. The limited transcriptome and genome data available prevent further understanding of spider venom at the molecular level. In the present study, we combined next-generation sequencing and conventional DNA sequencing to construct a venom gland transcriptome of the spider L. tredecimguttatus, which resulted in the identification of 9,666 and 480 high-confidence proteins among 34,334 de novo sequences and 1,024 cDNA sequences, respectively, by assembly, translation, filtering, quantification and annotation. Extensive functional analyses of these proteins indicated that mRNAs involved in RNA transport and spliceosome, protein translation, processing and transport were highly enriched in the venom gland, which is consistent with the specific function of venom glands, namely the production of toxins. Furthermore, we identified 146 toxin-like proteins forming 12 families, including 6 new families in this spider in which α-LTX-Lt1a family2 is identified for the first time as a subfamily of the α-LTX-Lt1a family. The toxins were classified according to their bioactivities into five categories that functioned in a coordinated way. Few ion channels were expressed in venom gland cells, suggesting a possible mechanism of protection from the attack of their own toxins. The present study provides a gland transcriptome profile and extends our understanding of the toxinome of spiders and coordination mechanism for toxin production in protein expression quantity.

  4. The Venom Gland Transcriptome of Latrodectus tredecimguttatus Revealed by Deep Sequencing and cDNA Library Analysis

    PubMed Central

    He, Quanze; Duan, Zhigui; Yu, Ying; Liu, Zhen; Liu, Zhonghua; Liang, Songping

    2013-01-01

    Latrodectus tredecimguttatus, commonly known as the black widow spider, is well known for its dangerous bite. Although its venom has been characterized extensively, some fundamental questions about its molecular composition remain unanswered. The limited transcriptome and genome data available prevent further understanding of spider venom at the molecular level. In the present study, we combined next-generation sequencing and conventional DNA sequencing to construct a venom gland transcriptome of the spider L. tredecimguttatus, which resulted in the identification of 9,666 and 480 high-confidence proteins among 34,334 de novo sequences and 1,024 cDNA sequences, respectively, by assembly, translation, filtering, quantification and annotation. Extensive functional analyses of these proteins indicated that mRNAs involved in RNA transport and spliceosome, protein translation, processing and transport were highly enriched in the venom gland, which is consistent with the specific function of venom glands, namely the production of toxins. Furthermore, we identified 146 toxin-like proteins forming 12 families, including 6 new families in this spider in which α-LTX-Lt1a family2 is identified for the first time as a subfamily of the α-LTX-Lt1a family. The toxins were classified according to their bioactivities into five categories that functioned in a coordinated way. Few ion channels were expressed in venom gland cells, suggesting a possible mechanism of protection from the attack of their own toxins. The present study provides a gland transcriptome profile and extends our understanding of the toxinome of spiders and coordination mechanism for toxin production in protein expression quantity. PMID:24312294

  5. Deep Sequencing of Mixed Total DNA without Barcodes Allows Efficient Assembly of Highly Plastic Ascidian Mitochondrial Genomes

    PubMed Central

    Rubinstein, Nimrod D.; Feldstein, Tamar; Shenkar, Noa; Botero-Castro, Fidel; Griggio, Francesca; Mastrototaro, Francesco; Delsuc, Frédéric; Douzery, Emmanuel J.P.; Gissi, Carmela; Huchon, Dorothée

    2013-01-01

    Ascidians or sea squirts form a diverse group within chordates, which includes a few thousand members of marine sessile filter-feeding animals. Their mitochondrial genomes are characterized by particularly high evolutionary rates and rampant gene rearrangements. This extreme variability complicates standard polymerase chain reaction (PCR) based techniques for molecular characterization studies, and consequently only a few complete Ascidian mitochondrial genome sequences are available. Using the standard PCR and Sanger sequencing approach, we produced the mitochondrial genome of Ascidiella aspersa only after a great effort. In contrast, we produced five additional mitogenomes (Botrylloides aff. leachii, Halocynthia spinosa, Polycarpa mytiligera, Pyura gangelion, and Rhodosoma turcicum) with a novel strategy, consisting of sequencing the pooled total DNA samples of these five species using one Illumina HiSeq 2000 flow cell lane. Each mitogenome was efficiently assembled in a single contig using de novo transcriptome assembly, as de novo genome assembly generally performed poorly for this task. Each of the six new mitogenomes presents a different and novel gene order, showing that no syntenic block has been conserved at the ordinal level (in Stolidobranchia and in Phlebobranchia). Phylogenetic analyses support the paraphyly of both Ascidiacea and Phlebobranchia, with Thaliacea nested inside Phlebobranchia, although the deepest nodes of the Phlebobranchia–Thaliacea clade are not well resolved. The strategy described here thus provides a cost-effective approach to obtain complete mitogenomes characterized by a highly plastic gene order and a fast nucleotide/amino acid substitution rate. PMID:23709623

  6. Identification of novel microRNAs in primates by using the synteny information and small RNA deep sequencing data.

    PubMed

    Yuan, Zhidong; Liu, Hongde; Nie, Yumin; Ding, Suping; Yan, Mingli; Tan, Shuhua; Jin, Yuanchang; Sun, Xiao

    2013-10-16

    Current technologies that are used for genome-wide microRNA (miRNA) prediction are mainly based on the BLAST tool. They often produce a large number of false positives. Here, we describe an effective approach for identifying orthologous pre-miRNAs in several primates based on syntenic information. Some of them have been validated by small RNA high throughput sequencing data. This approach uses the synteny information and experimentally validated miRNAs of human, and incorporates currently available algorithms and tools to identify the pre-miRNAs in five other primates. First, we identified 929 potential pre-miRNAs in the marmoset in which miRNAs have not yet been reported. Then, we predicted the miRNAs in other primates, and we successfully re-identified most of the published miRNAs and found 721, 979, 650 and 639 new potential pre-miRNAs in chimpanzee, gorilla, orangutan and rhesus macaque, respectively. Furthermore, the miRNA transcriptome in the four primates has been re-analyzed and some novel predicted miRNAs have been supported by the small RNA sequencing data. Finally, we analyzed the potential functions of those validated miRNAs and explored the regulatory elements and transcription factors of some validated miRNA genes of interest. The results show that our approach can effectively identify novel miRNAs and that some miRNAs supported by small RNA sequencing data may play roles in the nervous system.

  7. Deep COI sequencing of standardized benthic samples unveils overlooked diversity of Jordanian coral reefs in the northern Red Sea.

    PubMed

    Al-Rshaidat, Mamoon M D; Snider, Allison; Rosebraugh, Sydney; Devine, Amanda M; Devine, Thomas D; Plaisance, Laetitia; Knowlton, Nancy; Leray, Matthieu

    2016-09-01

    High-throughput sequencing (HTS) of DNA barcodes (metabarcoding), particularly when combined with standardized sampling protocols, is one of the most promising approaches for censusing overlooked cryptic invertebrate communities. We present biodiversity estimates based on sequencing of the cytochrome c oxidase subunit 1 (COI) gene for coral reefs of the Gulf of Aqaba, a semi-enclosed system in the northern Red Sea. Samples were obtained from standardized sampling devices (Autonomous Reef Monitoring Structures (ARMS)) deployed for 18 months. DNA barcoding of non-sessile specimens >2 mm revealed 83 OTUs in six phyla, of which only 25% matched a reference sequence in public databases. Metabarcoding of the 2 mm - 500 μm and sessile bulk fractions revealed 1197 OTUs in 15 animal phyla, of which only 4.9% matched reference barcodes. These results highlight the scarcity of COI data for cryptobenthic organisms of the Red Sea. Compared with data obtained using similar methods, our results suggest that Gulf of Aqaba reefs are less diverse than two Pacific coral reefs but much more diverse than an Atlantic oyster reef at a similar latitude. The standardized approaches used here show promise for establishing baseline data on biodiversity, monitoring the impacts of environmental change, and quantifying patterns of diversity at regional and global scales.

  8. Genome re-sequencing of semi-wild soybean reveals a complex Soja population structure and deep introgression.

    PubMed

    Qiu, Jie; Wang, Yu; Wu, Sanling; Wang, Ying-Ying; Ye, Chu-Yu; Bai, Xuefei; Li, Zefeng; Yan, Chenghai; Wang, Weidi; Wang, Ziqiang; Shu, Qingyao; Xie, Jiahua; Lee, Suk-Ha; Fan, Longjiang

    2014-01-01

    Semi-wild soybean is a unique type of soybean that retains both wild and domesticated characteristics, which provides an important intermediate type for understanding the evolution of the subgenus Soja population in the Glycine genus. In this study, a semi-wild soybean line (Maliaodou) and a wild line (Lanxi 1) collected from the lower Yangtze regions were deeply sequenced while nine other semi-wild lines were sequenced to a 3-fold genome coverage. Sequence analysis revealed that (1) no independent phylogenetic branch covering all 10 semi-wild lines was observed in the Soja phylogenetic tree; (2) besides two distinct subpopulations of wild and cultivated soybean in the Soja population structure, all semi-wild lines were mixed with some wild lines into a subpopulation rather than an independent one or an intermediate transition type of soybean domestication; (3) high heterozygous rates (0.19-0.49) were observed in several semi-wild lines; and (4) over 100 putative selective regions were identified by selective sweep analysis, including those related to the development of seed size. Our results suggested a hybridization origin for the semi-wild soybean, which contributes to the complexity of the Soja population structure.

  9. DASAF: An R Package for Deep Sequencing-Based Detection of Fetal Autosomal Abnormalities from Maternal Cell-Free DNA

    PubMed Central

    Tang, Xiaoyan; Qiu, Feng; Tao, Chunmei; Gao, Junhui; Ma, Mengmeng; Zhong, Tingyan; Cai, JianPing; Li, Yixue

    2016-01-01

    Background. With the development of massively parallel sequencing (MPS), noninvasive prenatal diagnosis using maternal cell-free DNA is fast becoming the preferred method of fetal chromosomal abnormality detection, due to its inherent high accuracy and low risk. Typically, MPS data is parsed to calculate a risk score, which is used to predict whether a fetal chromosome is normal or not. Although there are several highly sensitive and specific MPS data-parsing algorithms, there are currently no tools that implement these methods. Results. We developed an R package, detection of autosomal abnormalities for fetus (DASAF), that implements the three most popular trisomy detection methods—the standard Z-score (STDZ) method, the GC correction Z-score (GCCZ) method, and the internal reference Z-score (IRZ) method—together with one subchromosome abnormality identification method (SCAZ). Conclusions. With the cost of DNA sequencing declining and with advances in personalized medicine, the demand for noninvasive prenatal testing will undoubtedly increase, which will in turn trigger an increase in the tools available for subsequent analysis. DASAF is a user-friendly tool, implemented in R, that supports identification of whole-chromosome as well as subchromosome abnormalities, based on maternal cell-free DNA sequencing data after genome mapping. PMID:27437397
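
    As a rough illustration of the standard Z-score idea named above (and not DASAF's R interface), the Python sketch below compares a test sample's chromosome read fraction with the fractions seen in a set of euploid reference samples. The function names, the example fractions, and the cutoff of 3 are all assumptions made for illustration; the abstract does not specify them.

```python
from statistics import mean, stdev


def chromosome_fraction(read_counts, chrom):
    """Fraction of all mapped reads that fall on one chromosome."""
    total = sum(read_counts.values())
    return read_counts[chrom] / total


def standard_z_score(sample_fraction, reference_fractions):
    """Z-score of a sample's chromosome fraction against reference samples.

    Uses the sample standard deviation of the reference fractions, so at
    least two reference values are required.
    """
    mu = mean(reference_fractions)
    sigma = stdev(reference_fractions)
    return (sample_fraction - mu) / sigma


# Hypothetical chr21 read fractions from euploid reference pregnancies.
reference_chr21 = [0.0130, 0.0131, 0.0129, 0.0130, 0.0132, 0.0128]

# Hypothetical test sample with an elevated chr21 fraction.
z = standard_z_score(0.0140, reference_chr21)

# Assumed decision rule for this sketch: flag the sample when z exceeds 3.
Z_CUTOFF = 3.0
flagged = z > Z_CUTOFF
```

    The GC-corrected and internal-reference variants mentioned in the abstract would change how the fractions are normalised before this comparison; they are not sketched here.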

  10. Deep sequencing of mixed total DNA without barcodes allows efficient assembly of highly plastic ascidian mitochondrial genomes.

    PubMed

    Rubinstein, Nimrod D; Feldstein, Tamar; Shenkar, Noa; Botero-Castro, Fidel; Griggio, Francesca; Mastrototaro, Francesco; Delsuc, Frédéric; Douzery, Emmanuel J P; Gissi, Carmela; Huchon, Dorothée

    2013-01-01

    Ascidians or sea squirts form a diverse group within chordates, which includes a few thousand members of marine sessile filter-feeding animals. Their mitochondrial genomes are characterized by particularly high evolutionary rates and rampant gene rearrangements. This extreme variability complicates standard polymerase chain reaction (PCR) based techniques for molecular characterization studies, and consequently only a few complete Ascidian mitochondrial genome sequences are available. Using the standard PCR and Sanger sequencing approach, we produced the mitochondrial genome of Ascidiella aspersa only after a great effort. In contrast, we produced five additional mitogenomes (Botrylloides aff. leachii, Halocynthia spinosa, Polycarpa mytiligera, Pyura gangelion, and Rhodosoma turcicum) with a novel strategy, consisting in sequencing the pooled total DNA samples of these five species using one Illumina HiSeq 2000 flow cell lane. Each mitogenome was efficiently assembled in a single contig using de novo transcriptome assembly, as de novo genome assembly generally performed poorly for this task. Each of the new six mitogenomes presents a different and novel gene order, showing that no syntenic block has been conserved at the ordinal level (in Stolidobranchia and in Phlebobranchia). Phylogenetic analyses support the paraphyly of both Ascidiacea and Phlebobranchia, with Thaliacea nested inside Phlebobranchia, although the deepest nodes of the Phlebobranchia-Thaliacea clade are not well resolved. The strategy described here thus provides a cost-effective approach to obtain complete mitogenomes characterized by a highly plastic gene order and a fast nucleotide/amino acid substitution rate.

  11. Deep Sequencing of the Trypanosoma cruzi GP63 Surface Proteases Reveals Diversity and Diversifying Selection among Chronic and Congenital Chagas Disease Patients

    PubMed Central

    Llewellyn, Martin S.; Messenger, Louisa A.; Luquetti, Alejandro O.; Garcia, Lineth; Torrico, Faustino; Tavares, Suelene B. N.; Cheaib, Bachar; Derome, Nicolas; Delepine, Marc; Baulard, Céline; Deleuze, Jean-Francois; Sauer, Sascha; Miles, Michael A.

    2015-01-01

    Background Chagas disease results from infection with the diploid protozoan parasite Trypanosoma cruzi. T. cruzi is highly genetically diverse, and multiclonal infections in individual hosts are common, but little studied. In this study, we explore T. cruzi infection multiclonality in the context of age, sex and clinical profile among a cohort of chronic patients, as well as paired congenital cases from Cochabamba, Bolivia and Goias, Brazil using amplicon deep sequencing technology. Methodology/Principal Findings A 450bp fragment of the trypomastigote TcGP63I surface protease gene was amplified and sequenced across 70 chronic and 22 congenital cases on the Illumina MiSeq platform. In addition, a second, mitochondrial target—ND5—was sequenced across the same cohort of cases. Several million reads were generated, and sequencing read depths were normalized within patient cohorts (Goias chronic, n = 43; Goias congenital, n = 2; Bolivia chronic, n = 27; Bolivia congenital, n = 20). Among chronic cases, analyses of variance indicated no clear correlation between intra-host sequence diversity and age, sex or symptoms, while principal coordinate analyses showed no clustering by symptoms between patients. Between congenital pairs, we found evidence for the transmission of multiple sequence types from mother to infant, as well as widespread instances of novel genotypes in infants. Finally, non-synonymous to synonymous (dn:ds) nucleotide substitution ratios among sequences of TcGP63Ia and TcGP63Ib subfamilies within each cohort provided powerful evidence of strong diversifying selection at this locus. Conclusions/Significance Our results shed light on the diversity of parasite DTUs within each patient, as well as the extent to which parasite strains pass between mother and foetus in congenital cases. Although we were unable to find any evidence that parasite diversity accumulates with age in our study cohorts, putative diversifying selection within members of the TcGP63I

  12. Toward an Understanding of Changes in Diversity Associated with Fecal Microbiome Transplantation Based on 16S rRNA Gene Deep Sequencing

    PubMed Central

    Shahinas, Dea; Silverman, Michael; Sittler, Taylor; Chiu, Charles; Kim, Peter; Allen-Vercoe, Emma; Weese, Scott; Wong, Andrew; Low, Donald E.; Pillai, Dylan R.

    2012-01-01

    ABSTRACT Fecal microbiome transplantation by low-volume enema is an effective, safe, and inexpensive alternative to antibiotic therapy for patients with chronic relapsing Clostridium difficile infection (CDI). We explored the microbial diversity of pre- and posttransplant stool specimens from CDI patients (n = 6) using deep sequencing of the 16S rRNA gene. While interindividual variability in microbiota change occurs with fecal transplantation and vancomycin exposure, in this pilot study we note that clinical cure of CDI is associated with an increase in diversity and richness. Genus- and species-level analysis may reveal a cocktail of microorganisms or products thereof that will ultimately be used as a probiotic to treat CDI. PMID:23093385

  13. Deep Sequencing of Distinct Preparations of the Live Attenuated Varicella-Zoster Virus Vaccine Reveals a Conserved Core of Attenuating Single-Nucleotide Polymorphisms

    PubMed Central

    Yamanishi, Koichi; Gomi, Yasuyuki; Gershon, Anne A.; Breuer, Judith

    2016-01-01

    ABSTRACT The continued success of the live attenuated varicella-zoster virus vaccine in preventing varicella-zoster and herpes zoster is well documented, as are many of the mutations that contribute to the attenuation of the vOka virus for replication in skin. At least three different preparations of vOka are marketed. Here, we show using deep sequencing of seven batches of vOka vaccine (including ZostaVax, VariVax, VarilRix, and the Oka/Biken working seed) from three different manufacturers (VariVax, GSK, and Biken) that 137 single-nucleotide polymorphism (SNP) mutations are present in all vaccine batches. This includes six sites at which the vaccine allele is fixed or near fixation, which we speculate are likely to be important for attenuation. We also show that despite differences in the vaccine populations between preparations, batch-to-batch variation is minimal, as is the number and frequency of mutations unique to individual batches. This suggests that the vaccine manufacturing processes are not introducing new mutations and that, notwithstanding the mixture of variants present, VZV live vaccines are extremely stable. IMPORTANCE The continued success of vaccinations to prevent chickenpox and shingles, combined with the extremely low incidence of adverse reactions, indicates the quality of these vaccines. The vaccine itself is comprised of a heterogeneous live attenuated virus population and thus requires deep-sequencing technologies to explore the differences and similarities in the virus populations between different preparations and batches of the vaccines. Our data demonstrate minimal variation between batches, an important safety feature, and provide new insights into the extent of the mutations present in this attenuated virus. PMID:27440875

  14. Microdiversity of deep-sea Bacillales isolated from Tyrrhenian sea sediments as revealed by ARISA, 16S rRNA gene sequencing and BOX-PCR fingerprinting.

    PubMed

    Ettoumi, Besma; Guesmi, Amel; Brusetti, Lorenzo; Borin, Sara; Najjari, Afef; Boudabous, Abdellatif; Cherif, Ameur

    2013-01-01

    With respect to their terrestrial relatives, marine Bacillales have not been sufficiently investigated. In this report, the diversity of deep-sea Bacillales, isolated from seamount and non-seamount stations at 3,425 to 3,580 m depth in the Tyrrhenian Sea, was investigated using PCR fingerprinting and 16S rRNA sequence analysis. The isolate collection (n=120) was de-replicated by automated ribosomal intergenic spacer analysis (ARISA), and phylogenetic diversity was analyzed by 16S rRNA gene sequencing of representatives of each ARISA haplotype (n=37). Phylogenetic analysis of isolates showed their affiliation to six different genera of low G+C% content Gram-positive Bacillales: Bacillus, Staphylococcus, Exiguobacterium, Paenibacillus, Lysinibacillus and Terribacillus. Bacillus was the dominant genus represented by the species B. licheniformis, B. pumilus, B. subtilis, B. amyloliquefaciens and B. firmus, typically isolated from marine sediments. The most abundant species in the collection was B. licheniformis (n=85), which showed seven distinct ARISA haplotypes with haplotype H8 being the most dominant since it was identified by 63 isolates. The application of BOX-PCR fingerprinting to the B. licheniformis sub-collection allowed their separation into five distinct BOX genotypes, suggesting a high level of intraspecies diversity among marine B. licheniformis strains. This species also exhibited distinct strain distribution between seamount and non-seamount stations and was shown to be highly prevalent in non-seamount stations. This study revealed the great microdiversity of marine Bacillales and contributes to understanding the biogeographic distribution of marine bacteria in deep-sea sediments.

  15. Characterization of microRNAs by deep sequencing in red claw crayfish Cherax quadricarinatus haematopoietic tissue cells after white spot syndrome virus infection.

    PubMed

    Zhao, Meng-Ru; Meng, Chuang; Xie, Xiao-Lu; Li, Cheng-Hua; Liu, Hai-Peng

    2016-12-01

    White spot syndrome virus (WSSV) is one of the most prevalent and widespread viruses in both shrimp and crayfish aquaculture. MicroRNAs (miRNAs) are crucial post-transcriptional regulators and play critical roles in cell differentiation and proliferation, apoptosis, signal transduction and immunity. In this study, miRNA expression profiles were identified via deep sequencing in red claw crayfish Cherax quadricarinatus haematopoietic tissue (Hpt) cell cultures infected with WSSV at both early (i.e., 1 hpi) and late (i.e., 12 hpi) infection stages. The results showed that 2 known miRNAs, namely, miR-7 and miR-184 play key roles in immunity. Meanwhile, 106 novel miRNA candidates were predicted by software in these combined miRNA transcriptomes. Compared with two control groups, 36 miRNAs showed significantly different expression levels after WSSV challenge. Furthermore, 10 differentially expressed miRNAs in WSSV-exposed Hpt cells were randomly selected for expression analysis by quantitative real-time RT-PCR. Consistent with the expression profiles identified by deep sequencing, RT-PCR showed a significant increase or decrease in miRNA expression in Hpt cells after WSSV infection. Prediction of targets of miRNAs such as miR-7, cqu-miR-52, cqu-miR-126 and cqu-miR-141 revealed that their target genes have diverse biological roles, including not only immunity but also transcriptional regulation, energy metabolism, cell communication, cell differentiation, cell death, autophagy, endocytosis and apoptosis. These results provide insight into the molecular mechanism of WSSV infection and highlight the function of miRNAs in the regulation of the immune response against WSSV infection in crustaceans.

  16. Deep transcriptome-sequencing and proteome analysis of the hydrothermal vent annelid Alvinella pompejana identifies the CvP-bias as a robust measure of eukaryotic thermostability

    PubMed Central

    2013-01-01

    Background Alvinella pompejana is an annelid worm that inhabits deep-sea hydrothermal vent sites in the Pacific Ocean. Living at a depth of approximately 2500 meters, these worms experience extreme environmental conditions, including high temperature and pressure as well as high levels of sulfide and heavy metals. A. pompejana is one of the most thermotolerant metazoans, making this animal a subject of great interest for studies of eukaryotic thermoadaptation. Results In order to complement existing EST resources we performed deep sequencing of the A. pompejana transcriptome. We identified several thousand novel protein-coding transcripts, nearly doubling the sequence data for this annelid. We then performed an extensive survey of previously established prokaryotic thermoadaptation measures to search for global signals of thermoadaptation in A. pompejana in comparison with mesophilic eukaryotes. In an orthologous set of 457 proteins, we found that the best indicator of thermoadaptation was the difference in frequency of charged versus polar residues (CvP-bias), which was highest in A. pompejana. CvP-bias robustly distinguished prokaryotic thermophiles from prokaryotic mesophiles, as well as the thermophilic fungus Chaetomium thermophilum from mesophilic eukaryotes. Experimental values for thermophilic proteins supported higher CvP-bias as a measure of thermal stability when compared to their mesophilic orthologs. Proteome-wide mean CvP-bias also correlated with the body temperatures of homeothermic birds and mammals. Conclusions Our work extends the transcriptome resources for A. pompejana and identifies the CvP-bias as a robust and widely applicable measure of eukaryotic thermoadaptation. Reviewer This article was reviewed by Sándor Pongor, L. Aravind and Anthony M. Poole. PMID:23324115

  17. Microdiversity of Deep-Sea Bacillales Isolated from Tyrrhenian Sea Sediments as Revealed by ARISA, 16S rRNA Gene Sequencing and BOX-PCR Fingerprinting

    PubMed Central

    Ettoumi, Besma; Guesmi, Amel; Brusetti, Lorenzo; Borin, Sara; Najjari, Afef; Boudabous, Abdellatif; Cherif, Ameur

    2013-01-01

    With respect to their terrestrial relatives, marine Bacillales have not been sufficiently investigated. In this report, the diversity of deep-sea Bacillales, isolated from seamount and non-seamount stations at 3,425 to 3,580 m depth in the Tyrrhenian Sea, was investigated using PCR fingerprinting and 16S rRNA sequence analysis. The isolate collection (n=120) was de-replicated by automated ribosomal intergenic spacer analysis (ARISA), and phylogenetic diversity was analyzed by 16S rRNA gene sequencing of representatives of each ARISA haplotype (n=37). Phylogenetic analysis of isolates showed their affiliation to six different genera of low G+C% content Gram-positive Bacillales: Bacillus, Staphylococcus, Exiguobacterium, Paenibacillus, Lysinibacillus and Terribacillus. Bacillus was the dominant genus represented by the species B. licheniformis, B. pumilus, B. subtilis, B. amyloliquefaciens and B. firmus, typically isolated from marine sediments. The most abundant species in the collection was B. licheniformis (n=85), which showed seven distinct ARISA haplotypes with haplotype H8 being the most dominant since it was identified by 63 isolates. The application of BOX-PCR fingerprinting to the B. licheniformis sub-collection allowed their separation into five distinct BOX genotypes, suggesting a high level of intraspecies diversity among marine B. licheniformis strains. This species also exhibited distinct strain distribution between seamount and non-seamount stations and was shown to be highly prevalent in non-seamount stations. This study revealed the great microdiversity of marine Bacillales and contributes to understanding the biogeographic distribution of marine bacteria in deep-sea sediments. PMID:24005887

  18. Role of IL-17 Pathways in Immune Privilege: A RNA Deep Sequencing Analysis of the Mice Testis Exposure to Fluoride

    PubMed Central

    Huo, Meijun; Han, Haijun; Sun, Zilong; Lu, Zhaojing; Yao, Xinglei; Wang, Shaolin; Wang, Jundong

    2016-01-01

    We sequenced RNA transcripts from the testicles of healthy male mice, divided into a control group given distilled water and two experimental groups given 50 and 100 mg/L NaF in drinking water for 56 days. Bowtie/Tophat were used to align 50-bp paired-end reads into transcripts, Cufflinks to measure the relative abundance of each transcript and IPA to analyze RNA-Sequencing data. In the 100 mg/L NaF-treated group, four pathways related to IL-17, TGF-β and other cellular growth factor pathways were overexpressed. The mRNA expression of IL-17RA, IL-17RC, MAP2K1, MAP2K2, MAP2K3 and MAPKAPK2, monitored by qRT-PCR, increased remarkably in the 100 mg/L NaF group and coincided with the result of RNA-Sequencing. Fluoride exposure could disrupt spermatogenesis and testicles in male mice by influencing many signaling pathways and genes that act on immune signal transduction and cellular metabolism. The high expression of the IL-17 signal pathway was a response to the invasion of the testicular immune system due to extracellular fluoride. The PI3-kinase/AKT and MAPK pathways and the cytokines of the TGF-β family contributed to controlling IL-17 pathway activation and maintaining immune privilege and spermatogenesis. These findings provide new ideas for further molecular research on the effects of fluorosis on reproduction and immune response mechanisms. PMID:27572304

  19. Deep Sequencing Reveals Novel Genetic Variants in Children with Acute Liver Failure and Tissue Evidence of Impaired Energy Metabolism

    PubMed Central

    Valencia, C. Alexander; Wang, Xinjian; Wang, Jin; Peters, Anna; Simmons, Julia R.; Moran, Molly C.; Mathur, Abhinav; Husami, Ammar; Qian, Yaping; Sheridan, Rachel; Bove, Kevin E.; Witte, David; Huang, Taosheng; Miethke, Alexander G.

    2016-01-01

    Background & Aims The etiology of acute liver failure (ALF) remains elusive in almost half of affected children. We hypothesized that inherited mitochondrial and fatty acid oxidation disorders were occult etiological factors in patients with idiopathic ALF and impaired energy metabolism. Methods Twelve patients with elevated blood molar lactate/pyruvate ratio and indeterminate etiology were selected from a retrospective cohort of 74 subjects with ALF because their fixed and frozen liver samples were available for histological, ultrastructural, molecular and biochemical analysis. Results A customized next-generation sequencing panel for 26 genes associated with mitochondrial and fatty acid oxidation defects revealed mutations and sequence variants in five subjects. Variants involved the genes ACAD9, POLG, POLG2, DGUOK, and RRM2B; the latter not previously reported in subjects with ALF. The explanted livers of the patients with heterozygous, truncating insertion mutations in RRM2B showed patchy micro- and macrovesicular steatosis, decreased mitochondrial DNA (mtDNA) content <30% of controls, and reduced respiratory chain complex activity; both patients had good post-transplant outcome. One infant with severe lactic acidosis was found to carry two heterozygous variants in ACAD9, which was associated with isolated complex I deficiency and diffuse hypergranular hepatocytes. The two subjects with heterozygous variants of unknown clinical significance in POLG and DGUOK developed ALF following drug exposure. Their hepatocytes displayed abnormal mitochondria by electron microscopy. Conclusion Targeted next generation sequencing and correlation with histological, ultrastructural and functional studies on liver tissue in children with elevated lactate/pyruvate ratio expand the spectrum of genes associated with pediatric ALF. PMID:27483465

  20. Mining tissue-specific contigs from peanut (Arachis hypogaea L.) for promoter cloning by deep transcriptome sequencing.

    PubMed

    Geng, Lili; Duan, Xiaohong; Liang, Chun; Shu, Changlong; Song, Fuping; Zhang, Jie

    2014-10-01

    Peanut (Arachis hypogaea L.), one of the most important oil legumes in the world, is heavily damaged by white grubs. Tissue-specific promoters are needed to incorporate insect resistance genes into peanut by genetic transformation to control the subterranean pests. Transcriptome sequencing is the most effective way to analyze differential gene expression in this non-model species and contribute to promoter cloning. The transcriptomes of the roots, seeds and leaves of peanut were sequenced using Illumina technology. A simple digital expression profile was established based on number of transcripts per million clean tags (TPM) from different tissues. Subsequently, 584 root-specific candidate transcript assembly contigs (TACs) and 316 seed-specific candidate TACs were identified. Among these candidate TACs, 55.3% were root-specific and 64.6% were seed-specific by semi-quantitative RT-PCR analysis. Moreover, the consistency of semi-quantitative RT-PCR with the simple digital expression profile was correlated with the length and TPM value of TACs. The results of gene ontology showed that some root-specific TACs are involved in stress resistance and respond to auxin stimulus, whereas, seed-specific candidate TACs are involved in embryo development, lipid storage and long-chain fatty acid biosynthesis. One root-specific promoter was cloned and characterized. We developed a high-yield screening system in peanut by establishing a simple digital expression profile based on Illumina sequencing. The feasible and rapid method presented by this study can be used for other non-model crops to explore tissue-specific or spatially specific promoters.
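
    The digital expression profile described above, tag counts scaled to transcripts per million clean tags and then compared across tissues, can be sketched roughly as follows. The contig names, the counts, and the `min_tpm` and `min_fold` thresholds are hypothetical; the abstract does not state the criteria the authors used to call a contig tissue-specific.

```python
def tags_per_million(tag_counts):
    """Scale raw tag counts so that each tissue library sums to one million."""
    total = sum(tag_counts.values())
    return {contig: count * 1_000_000 / total for contig, count in tag_counts.items()}


def tissue_specific_candidates(tpm_by_tissue, target, min_tpm=10.0, min_fold=10.0):
    """Contigs expressed in `target` at least `min_fold` times above every other tissue.

    Both thresholds are assumptions for illustration, not values from the study.
    """
    candidates = []
    for contig, target_tpm in tpm_by_tissue[target].items():
        others = [
            tpm.get(contig, 0.0)
            for tissue, tpm in tpm_by_tissue.items()
            if tissue != target
        ]
        highest_other = max(others, default=0.0)
        if target_tpm >= min_tpm and target_tpm >= min_fold * highest_other:
            candidates.append(contig)
    return sorted(candidates)


# Hypothetical clean-tag counts per transcript assembly contig (TAC).
counts = {
    "root": {"TAC_1": 500, "TAC_2": 100, "TAC_3": 400},
    "seed": {"TAC_1": 10, "TAC_2": 600, "TAC_3": 390},
    "leaf": {"TAC_1": 0, "TAC_2": 50, "TAC_3": 950},
}

tpm = {tissue: tags_per_million(c) for tissue, c in counts.items()}
root_specific = tissue_specific_candidates(tpm, "root")
```

    Candidates selected this way are only predictions; in the study they were checked by semi-quantitative RT-PCR.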

  1. Appearances Can Be Deceptive: Revealing a Hidden Viral Infection with Deep Sequencing in a Plant Quarantine Context

    PubMed Central

    Candresse, Thierry; Filloux, Denis; Muhire, Brejnev; Julian, Charlotte; Galzi, Serge; Fort, Guillaume; Bernardo, Pauline; Daugrois, Jean-Heindrich; Fernandez, Emmanuel; Martin, Darren P.; Varsani, Arvind; Roumagnac, Philippe

    2014-01-01

    Comprehensive inventories of plant viral diversity are essential for effective quarantine and sanitation efforts. The safety of regulated plant material exchanges presently relies heavily on techniques such as PCR or nucleic acid hybridisation, which are only suited to the detection and characterisation of specific, well characterised pathogens. Here, we demonstrate the utility of sequence-independent next generation sequencing (NGS) of both virus-derived small interfering RNAs (siRNAs) and virion-associated nucleic acids (VANA) for the detailed identification and characterisation of viruses infecting two quarantined sugarcane plants. Both plants originated from Egypt and were known to be infected with Sugarcane streak Egypt Virus (SSEV; Genus Mastrevirus, Family Geminiviridae), but were revealed by the NGS approaches to also be infected by a second highly divergent mastrevirus, here named Sugarcane white streak Virus (SWSV). This novel virus had escaped detection by all routine quarantine detection assays and was found to also be present in sugarcane plants originating from Sudan. Complete SWSV genomes were cloned and sequenced from six plants and all were found to share >91% genome-wide identity. With the exception of two SWSV variants, which potentially express unusually large RepA proteins, the SWSV isolates display genome characteristics very typical to those of all other previously described mastreviruses. An analysis of virus-derived siRNAs for SWSV and SSEV showed them to be strongly influenced by secondary structures within both genomic single stranded DNA and mRNA transcripts. In addition, the distribution of siRNA size frequencies indicates that these mastreviruses are likely subject to both transcriptional and post-transcriptional gene silencing. Our study stresses the potential advantages of NGS-based virus metagenomic screening in a plant quarantine setting and indicates that such techniques could dramatically reduce the numbers of non

  2. Deep sequencing analysis of tick-borne encephalitis virus from questing ticks at natural foci reveals similarities between quasispecies pools of the virus.

    PubMed

    Asghar, Naveed; Pettersson, John H-O; Dinnetz, Patrik; Andreassen, Åshild; Johansson, Magnus

    2017-03-01

    Every year, tick-borne encephalitis virus (TBEV) causes severe central nervous system infection in 10 000 to 15 000 people in Europe and Asia. TBEV is maintained in the environment by an enzootic cycle that requires a tick vector and a vertebrate host, and the adaptation of TBEV to vertebrate and invertebrate environments is essential for TBEV persistence in nature. This adaptation is facilitated by the error-prone nature of the virus's RNA-dependent RNA polymerase, which generates genetically distinct virus variants called quasispecies. TBEV shows a focal geographical distribution pattern where each focus represents a TBEV hotspot. Here, we sequenced and characterized two TBEV genomes, JP-296 and JP-554, from questing Ixodes ricinus ticks at a TBEV focus in central Sweden. Phylogenetic analysis showed geographical clustering among the newly sequenced strains and three previously sequenced Scandinavian strains, Toro-2003, Saringe-2009 and Mandal-2009, which originated from the same ancestor. Among these five Scandinavian TBEV strains, only Mandal-2009 showed a large deletion within the 3' non-coding region (NCR), similar to the highly virulent TBEV strain Hypr. Deep sequencing of JP-296, JP-554 and Mandal-2009 revealed significantly high quasispecies diversity for JP-296 and JP-554, with intact 3'NCRs, compared to the low diversity in Mandal-2009, with a truncated 3'NCR. Single-nucleotide polymorphism analysis showed that 40 % of the single-nucleotide polymorphisms were common between quasispecies populations of JP-296 and JP-554, indicating a putative mechanism for how TBEV persists and is maintained within its natural foci.

  3. Deep sequencing of the ancestral tobacco species Nicotiana tomentosiformis reveals multiple T-DNA inserts and a complex evolutionary history of natural transformation in the genus Nicotiana.

    PubMed

    Chen, Ke; Dorlhac de Borne, François; Szegedi, Ernö; Otten, Léon

    2014-11-01

    Nicotiana species carry cellular T-DNA sequences (cT-DNAs), acquired by Agrobacterium-mediated transformation. We characterized the cT-DNA sequences of the ancestral Nicotiana tabacum species Nicotiana tomentosiformis by deep sequencing. N. tomentosiformis contains four cT-DNA inserts derived from different Agrobacterium strains. Each has an incomplete inverted-repeat structure. TA is similar to part of the Agrobacterium rhizogenes 1724 mikimopine-type T-DNA, but has unusual orf14 and mis genes. TB carries a 1724 mikimopine-type orf14-mis fragment and a mannopine-agropine synthesis region (mas2-mas1-ags). The mas2' gene codes for an active enzyme. TC is similar to the left part of the A. rhizogenes A4 T-DNA, but also carries octopine synthase-like (ocl) and c-like genes normally found in A. tumefaciens. TD shows a complex rearrangement of T-DNA fragments similar to the right end of the A4 TL-DNA, and including an orf14-like gene and a gene with unknown function, orf511. The TA, TB, TC and TD insertion sites were identified by alignment with N. tabacum and Nicotiana sylvestris sequences. The divergence values for the TA, TB, TC and TD repeats provide an estimate for their relative introduction times. A large deletion has occurred in the central part of the N. tabacum cv. Basma/Xanthi TA region, and another deletion removed the complete TC region in N. tabacum. Nicotiana otophora lacks TA, TB and TD, but contains TC and another cT-DNA, TE. This analysis, together with that of Nicotiana glauca and other Nicotiana species, indicates multiple sequential insertions of cT-DNAs during the evolution of the genus Nicotiana.

  4. The Holocene valley fill sequence in south Louisiana: A geological and geotechnical interpretation based on results of two deep cores

    SciTech Connect

    Kuecher, G.J.; Roberts, H.H.; Suhayda, J.H.; McGinnis, L.D.

    1992-01-01

    The Louisiana Geological Survey--US Geological Survey cooperative research program concerning wetland subsidence in Terrebonne and Lafourche Parishes, Louisiana, funded two deep research borings, each of which recovered core of the entire Holocene valley fill. These boreholes, 22 km apart in dip direction, were logged by porosity and resistivity tools and calibrated to cone penetrometer and seismic profiles immediately offsetting them. Major deltaic cycles and ravinement surfaces were recognized in each core by ROCKEVAL pyrolysis, ¹³C isotope signatures, shifts of increasing radiocarbon age with depth, shifts of increasing resistivity and density with depth, microfossil analysis, and the presence and type of shell material. Data collected in this project suggest the top of the Pleistocene in this onshore, fluvially-dominated section may not be the top of Substratum sands, but significantly higher in the section, as determined by the strongest positive reflection coefficient below the 10,000 year radiocarbon datum. This satisfies the criterion that this operational boundary be mappable and chronostratigraphic. Additionally, the presence of two growth faults in the northern part of the study area may have acted as sites for preferential thickening of the Holocene. Both reasons stated above profoundly influence the modeling of Holocene thickness and consolidation settlement potential, critical for understanding subsurface controls on wetland loss.

  5. Detection of low frequency FGFR3 mutations in the urine of bladder cancer patients using next-generation deep sequencing

    PubMed Central

    Millholland, John M; Li, Shuqiang; Fernandez, Cecilia A; Shuber, Anthony P

    2012-01-01

    Biological fluid-based noninvasive biomarker assays for monitoring and diagnosing disease are clinically powerful. A major technical hurdle for developing these assays is the requirement of high analytical sensitivity so that biomarkers present at very low levels can be consistently detected. In the case of biological fluid-based cancer diagnostic assays, sensitivities similar to those of tissue-based assays are difficult to achieve with DNA markers due to the high abundance of normal DNA background present in the sample. Here we describe a new urine-based assay that uses ultradeep sequencing technology to detect single mutant molecules of fibroblast growth factor receptor 3 (FGFR3) DNA that are indicative of bladder cancer. Detection of FGFR3 mutations in urine would provide clinicians with a noninvasive means of diagnosing early-stage bladder cancer. The single-molecule assay detects FGFR3 mutant DNA when present at as low as 0.02% of total urine DNA and results in 91% concordance with the frequency that FGFR3 mutations are detected in bladder cancer tumors, significantly improving diagnostic performance. To our knowledge, this is the first practical application of next-generation sequencing technology for noninvasive cancer diagnostics. PMID:24199178
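
    A back-of-the-envelope calculation shows why a mutant fraction as low as 0.02% calls for very deep sequencing. The sketch below assumes each sampled DNA molecule is an independent draw and ignores sequencing error, which a real assay cannot do; the function names and the 95% target are illustrative choices, not part of the published method.

```python
import math


def prob_at_least_one_mutant(depth, mutant_fraction):
    """Chance of sampling one or more mutant molecules among `depth` molecules."""
    return 1.0 - (1.0 - mutant_fraction) ** depth


def min_depth_for_detection(mutant_fraction, target_probability=0.95):
    """Smallest depth at which a mutant molecule is sampled with the target probability."""
    return math.ceil(
        math.log(1.0 - target_probability) / math.log(1.0 - mutant_fraction)
    )


# 0.02% of total urine DNA, the detection limit quoted in the abstract.
fraction = 0.0002
depth_needed = min_depth_for_detection(fraction)
chance_at_1000x = prob_at_least_one_mutant(1_000, fraction)
```

    Under these assumptions, roughly fifteen thousand molecules must be sampled at the locus to have a 95% chance of seeing a single mutant, while at 1,000-fold depth the chance is under 20%.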

  6. Small RNA Deep Sequencing and the Effects of microRNA408 on Root Gravitropic Bending in Arabidopsis

    NASA Astrophysics Data System (ADS)

    Li, Huasheng; Lu, Jinying; Sun, Qiao; Chen, Yu; He, Dacheng; Liu, Min

    2015-11-01

    MicroRNA (miRNA) is a non-coding small RNA composed of 20 to 24 nucleotides that influences plant root development. This study analyzed the miRNA expression in Arabidopsis root tip cells using Illumina sequencing and real-time PCR before (sample 0) and 15 min after (sample 15) a 3-D clinostat rotational treatment was administered. After stimulation was performed, the expression levels of seven miRNA genes, including Arabidopsis miR160, miR161, miR394, miR402, miR403, miR408, and miR823, were significantly upregulated. Illumina sequencing results also revealed two novel miRNAs that have not been previously reported. The target genes of these miRNAs included pentatricopeptide repeat-containing protein and diadenosine tetraphosphate hydrolase. An overexpression vector of Arabidopsis miR408 was constructed and transferred to Arabidopsis plants. The roots of plants overexpressing miR408 exhibited a slower reorientation upon gravistimulation in comparison with those of wild-type. This result indicates that miR408 could play a role in root gravitropic response.

  7. Deep HST Imaging in 47 Tuc and NGC 6397: The White Dwarf Cooling Sequence of 47 Tuc

    NASA Astrophysics Data System (ADS)

    Richer, Harvey B.; Anderson, J.; Dotter, A.; Fahlman, G.; Goldsbury, R.; Hansen, B.; Hurley, J.; Kalirai, J.; King, I.; Reitzel, D.; Rich, R.; Shara, M.; Stetson, P.; Woodley, K.; Zurek, D.

    2011-01-01

    In Cycle 17 we were awarded 121 orbits with HST to search for the faintest stellar populations (the coolest white dwarfs, the lowest mass main sequence stars and possibly the brown dwarfs) in 47 Tucanae. It took 10 months to secure all the data with exquisite care taken to minimize the effects of charge transfer and saturation spikes. The ACS stared at a single field for all 121 orbits but the roll angle of the telescope was varied through 180 degrees for the associated parallel fields observed with WFC3. Archival data were employed to proper motion clean the images allowing virtually complete separation of field stars and those in the background Small Magellanic Cloud from those in the cluster. In this poster, we present the resultant color-magnitude diagram for this important cluster which is a proxy for the Galactic bulge. A rich white dwarf cooling sequence is revealed which will be used to determine a cooling age for the cluster for comparison with the turnoff age (see associated poster by A. Dotter et al.). Multicolor data in other ACS filters as well as four filters with WFC3 are used to examine the spectral energy distributions of the cluster white dwarfs.

  8. DanQ: a hybrid convolutional and recurrent deep neural network for quantifying the function of DNA sequences.

    PubMed

    Quang, Daniel; Xie, Xiaohui

    2016-06-20

    Modeling the properties and functions of DNA sequences is an important, but challenging task in the broad field of genomics. This task is particularly difficult for non-coding DNA, the vast majority of which is still poorly understood in terms of function. A powerful predictive model for the function of non-coding DNA can have enormous benefit for both basic science and translational research because over 98% of the human genome is non-coding and 93% of disease-associated variants lie in these regions. To address this need, we propose DanQ, a novel hybrid convolutional and bi-directional long short-term memory recurrent neural network framework for predicting non-coding function de novo from sequence. In the DanQ model, the convolution layer captures regulatory motifs, while the recurrent layer captures long-term dependencies between the motifs in order to learn a regulatory 'grammar' to improve predictions. DanQ improves considerably upon other models across several metrics. For some regulatory markers, DanQ can achieve over a 50% relative improvement in the area under the precision-recall curve metric compared to related models. We have made the source code available at the github repository http://github.com/uci-cbcl/DanQ.
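
    The role the abstract assigns to the convolution layer, scanning a sequence for motifs, can be illustrated without a deep-learning framework. The sketch below is not DanQ's code: it one-hot encodes a DNA string and slides a single hand-made filter across it, followed by a ReLU. The filter (a hypothetical "TATA" detector) and the example sequence are assumptions chosen for illustration; in a trained network the filter weights are learned, and the recurrent layer would then consume the resulting score track.

```python
ALPHABET = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a list of 4-element rows (A, C, G, T)."""
    return [[1.0 if base == letter else 0.0 for letter in ALPHABET]
            for base in seq.upper()]


def conv1d_scan(encoded, kernel):
    """Slide one filter over the encoded sequence and apply a ReLU."""
    width = len(kernel)
    scores = []
    for start in range(len(encoded) - width + 1):
        total = 0.0
        for offset in range(width):
            row = encoded[start + offset]
            total += sum(x * w for x, w in zip(row, kernel[offset]))
        scores.append(max(total, 0.0))  # ReLU activation
    return scores


# Hypothetical filter that rewards exact matches to the motif "TATA"
TATA_FILTER = one_hot("TATA")
scores = conv1d_scan(one_hot("GGTATAGG"), TATA_FILTER)
best = scores.index(max(scores))
print(scores, "best match starts at position", best)
```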

  9. Deep sequencing and transcriptome analyses to identify genes involved in secoiridoid biosynthesis in the Tibetan medicinal plant Swertia mussotii.

    PubMed

    Liu, Yue; Wang, Yi; Guo, Fengxian; Zhan, Lin; Mohr, Toni; Cheng, Prisca; Huo, Naxin; Gu, Ronghui; Pei, Danning; Sun, Jiaqing; Tang, Li; Long, Chunlin; Huang, Luqi; Gu, Yong Q

    2017-02-22

    Swertia mussotii Franch. is an important traditional Tibetan medicinal plant with pharmacological properties effective in the treatment of various ailments including hepatitis. Secoiridoids are the major bioactive compounds in S. mussotii. To better understand the secoiridoid biosynthesis pathway, we generated transcriptome sequences from the root, leaf, stem, and flower tissues, and performed de novo sequence assembly, yielding 98,613 unique transcripts with an N50 of 1,085 bp. Putative functions could be assigned to 35,029 transcripts (35.52%) based on BLAST searches against annotation databases including GO and KEGG. The expression profiles of 39 candidate transcripts encoding the key enzymes for secoiridoid biosynthesis were examined in different S. mussotii tissues, validated by qRT-PCR, and compared with the homologous genes from S. japonica, a species in the same family, unveiling the gene expression, regulation, and conservation of the pathway. The examination of the accumulated levels of three bioactive compounds, sweroside, swertiamarin, and gentiopicroside, revealed their considerable variations in different tissues, with no significant correlation with the expression profiles of key genes in the pathway, suggesting complex biological behaviours in the coordination of metabolite biosynthesis and accumulation. The genomic dataset and analyses presented here lay the foundation for further research on this important medicinal plant.
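
    For readers unfamiliar with the N50 statistic reported above: it is the length L such that sequences of length L or longer contain at least half of all assembled bases. A minimal sketch follows; the toy lengths are invented for illustration, and only the two transcript counts come from the abstract.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("n50() needs at least one length")


# Toy example (invented lengths, in bp)
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))

# Annotated fraction from the counts given in the abstract
annotated, total = 35_029, 98_613
print(f"{annotated / total:.2%}")
```

    The second print reproduces the 35.52% annotation rate quoted in the abstract from its own counts.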

  10. NGS-QC Generator: A Quality Control System for ChIP-Seq and Related Deep Sequencing-Generated Datasets.

    PubMed

    Mendoza-Parra, Marco Antonio; Saleem, Mohamed-Ashick M; Blum, Matthias; Cholley, Pierre-Etienne; Gronemeyer, Hinrich

    2016-01-01

    The combination of massive parallel sequencing with a variety of modern DNA/RNA enrichment technologies provides means for interrogating functional protein-genome interactions (ChIP-seq), genome-wide transcriptional activity (RNA-seq; GRO-seq), chromatin accessibility (DNase-seq, FAIRE-seq, MNase-seq), and more recently the three-dimensional organization of chromatin (Hi-C, ChIA-PET). In systems biology-based approaches several of these readouts are generally cumulated with the aim of describing living systems through a reconstitution of the genome-regulatory functions. However, an issue that is often underestimated is that conclusions drawn from such multidimensional analyses of NGS-derived datasets critically depend on the quality of the compared datasets. To address this problem, we have developed the NGS-QC Generator, a quality control system that infers quality descriptors for any kind of ChIP-sequencing and related datasets. In this chapter we provide a detailed protocol for (1) assessing quality descriptors with the NGS-QC Generator; (2) interpreting the generated reports; and (3) exploring the database of QC indicators (www.ngs-qc.org) for >21,000 publicly available datasets.

  11. Deep Sequencing and High-Resolution Imaging Reveal Compartment-Specific Localization of Bdnf mRNA in Hippocampal Neurons

    PubMed Central

    Will, Tristan J.; Tushev, Georgi; Kochen, Lisa; Nassim-Assir, Belquis; Cajigas, Ivan J.; tom Dieck, Susanne; Schuman, Erin M.

    2016-01-01

    Brain-derived neurotrophic factor (BDNF) is a small protein of the neurotrophin family that regulates various brain functions. Although much is known about how its transcription is regulated, the abundance of endogenous BDNF mRNA and its subcellular localization pattern are matters of debate. We used next-generation sequencing and high-resolution in situ hybridization in the rat hippocampus to reexamine this question. We performed 3′ end sequencing on rat hippocampal slices and detected two isoforms of Bdnf containing either a short or a long 3′ untranslated region (3′UTR). Most of the Bdnf transcripts contained the short 3′UTR isoform and were present in low amounts relative to other neuronal transcripts. Bdnf mRNA was present in the somatic compartment of rat hippocampal slices or the somata of cultured rat hippocampal neurons but was rarely detected in the dendritic processes. Pharmacological stimulation of hippocampal neurons induced Bdnf expression but did not change the ratio of Bdnf isoform abundance. The findings indicate that endogenous Bdnf mRNA, although weakly abundant, is primarily localized to the somatic compartment of hippocampal neurons. Both Bdnf mRNA isoforms have shorter half-lives compared with other neuronal mRNAs. Furthermore, the findings show that using complementary high-resolution techniques can provide sensitive measures of endogenous transcript abundance. PMID:24345682

  12. Deep sequencing and transcriptome analyses to identify genes involved in secoiridoid biosynthesis in the Tibetan medicinal plant Swertia mussotii

    PubMed Central

    Liu, Yue; Wang, Yi; Guo, Fengxian; Zhan, Lin; Mohr, Toni; Cheng, Prisca; Huo, Naxin; Gu, Ronghui; Pei, Danning; Sun, Jiaqing; Tang, Li; Long, Chunlin; Huang, Luqi; Gu, Yong Q.

    2017-01-01

    Swertia mussotii Franch. is an important traditional Tibetan medicinal plant with pharmacological properties effective in the treatment of various ailments including hepatitis. Secoiridoids are the major bioactive compounds in S. mussotii. To better understand the secoiridoid biosynthesis pathway, we generated transcriptome sequences from the root, leaf, stem, and flower tissues, and performed de novo sequence assembly, yielding 98,613 unique transcripts with an N50 of 1,085 bp. Putative functions could be assigned to 35,029 transcripts (35.52%) based on BLAST searches against annotation databases including GO and KEGG. The expression profiles of 39 candidate transcripts encoding the key enzymes for secoiridoid biosynthesis were examined in different S. mussotii tissues, validated by qRT-PCR, and compared with the homologous genes from S. japonica, a species in the same family, unveiling the gene expression, regulation, and conservation of the pathway. The examination of the accumulated levels of three bioactive compounds, sweroside, swertiamarin, and gentiopicroside, revealed their considerable variations in different tissues, with no significant correlation with the expression profiles of key genes in the pathway, suggesting complex biological behaviours in the coordination of metabolite biosynthesis and accumulation. The genomic dataset and analyses presented here lay the foundation for further research on this important medicinal plant. PMID:28225035

  13. Identification and expression profiling of Vigna mungo microRNAs from leaf small RNA transcriptome by deep sequencing.

    PubMed

    Paul, Sujay; Kundu, Anirban; Pal, Amita

    2014-01-01

    MicroRNAs (miRNAs) represent a class of small non-coding RNA molecules that play a crucial role in post-transcriptional gene regulation. Several conserved and species-specific miRNAs have been characterized to date, predominantly from the plant species whose genome is well characterized. However, information on the variability of these regulatory RNAs in economically important but genetically less characterized crop species is limited. Vigna mungo is an important grain legume, which is grown primarily for its protein-rich edible seeds. miRNAs from this species have not been identified to date due to lack of genome sequence information. To identify miRNAs from V. mungo, a small RNA library was constructed from young leaves. High-throughput Illumina sequencing technology and bioinformatic analysis of the small RNA reads led to the identification of 66 miRNA loci represented by 45 conserved miRNAs belonging to 19 families and eight non-conserved miRNAs belonging to seven families. Besides, 13 novel miRNA candidates in V. mungo were also identified. Expression patterns of selected conserved, non-conserved, and novel miRNA candidates have been demonstrated in leaf, stem, and root tissues by quantitative polymerase chain reaction, and potential target genes were predicted for most of the conserved miRNAs. This information offers genomic resources for better understanding of miRNA mediated post-transcriptional gene regulation.

  14. Detection of Inter-Lineage Natural Recombination in Avian Paramyxovirus Serotype 1 Using Simplified Deep Sequencing Platform

    PubMed Central

    Satharasinghe, Dilan A.; Murulitharan, Kavitha; Tan, Sheau W.; Yeap, Swee K.; Munir, Muhammad; Ideris, Aini; Omar, Abdul R.

    2016-01-01

    Newcastle disease virus (NDV) is a prototype member of avian paramyxovirus serotype 1 (APMV-1), which causes severe and contagious disease in commercial poultry and wild birds. Despite extensive vaccination programs and other control measures, the disease remains endemic around the globe, especially in Asia, Africa, and the Middle East. Because APMV-1 is a single serotype, genotype II-based vaccines have remained the most accepted means of immunization. However, evidence is emerging of vaccine failures, mainly due to the evolving nature of the virus and larger genetic gaps between vaccine and field strains of APMV-1. Most of the epidemiological and genetic characterizations of APMVs are based on conventional methods, which are prone to mask the diverse population of viruses in complex samples. In this study, we report the application of a simple, robust, and less resource-demanding methodology for the whole genome sequencing of NDV, using next-generation sequencing (NGS) on the Illumina MiSeq platform. Using this platform, we sequenced full genomes of five virulent Malaysian NDV strains collected during 2004–2013. All isolates clustered within highly prevalent lineage 5 (specifically in lineage 5a); however, a significantly greater genetic divergence was observed in isolates collected from 2004 to 2011. Interestingly, genetic characterization of one isolate collected in 2013 (IBS025/13) showed natural recombination between lineage 2 and lineage 5. In this recombinant, the isolate (IBS025/13) carried the nucleocapsid protein (nucleotides [nts] 55–1801) and near-complete phosphoprotein (1804–3254 nts) genes of lineage 2, whereas the surface glycoprotein (fusion, hemagglutinin-neuraminidase) and large polymerase genes were of lineage 5. Additionally, the recombinant virus has a genome size of 15,186 nts, which is characteristic of the old genotypes I–IV isolated from 1930 to 1960. Taken together, we report the occurrence of a natural recombination in circulating strains of

  15. Delayed gratification habitable zones: when deep outer solar system regions become balmy during post-main sequence stellar evolution.

    PubMed

    Stern, S Alan

    2003-01-01

    Like all low- and moderate-mass stars, the Sun will burn as a red giant during its later evolution, generating of solar luminosities for some tens of millions of years. During this post-main sequence phase, the habitable (i.e., liquid water) thermal zone of our Solar System will lie in the region where Triton, Pluto-Charon, and Kuiper Belt objects orbit. Compared with the 1 AU habitable zone where Earth resides, this "delayed gratification habitable zone" (DGHZ) will enjoy a far less biologically hazardous environment - with lower harmful radiation levels from the Sun, and a far less destructive collisional environment. Objects like Triton, Pluto-Charon, and Kuiper Belt objects, which are known to be rich in both water and organics, will then become possible sites for biochemical and perhaps even biological evolution. The Kuiper Belt, with >10^5 objects ≥50 km in radius and more than three times the combined surface area of the four terrestrial planets, provides numerous sites for possible evolution once the Sun's DGHZ reaches it. The Sun's DGHZ might be thought to only be of academic interest owing to its great separation from us in time. However, approximately 10^9 Milky Way stars burn as luminous red giants today. Thus, if icy-organic objects are common in the 20-50 AU zones of these stars, as they are in our Solar System (and as inferred in numerous main sequence stellar disk systems), then DGHZs may form a niche type of habitable zone that is likely to be numerically common in the Galaxy.
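
    The outward shift of the habitable zone follows from the inverse-square law: the distance at which a body receives the same stellar flux as Earth does today scales as the square root of the star's luminosity. The sketch below uses only that flux-matching assumption and ignores albedo, atmospheres, and changes in the stellar spectrum; the function names are illustrative and the luminosities are derived from the 20-50 AU range in the abstract, not quoted from the paper.

```python
from math import sqrt


def flux_matched_distance_au(luminosity_solar):
    """Distance (AU) receiving the same flux as Earth gets at 1 AU today."""
    return sqrt(luminosity_solar)


def luminosity_for_distance(distance_au):
    """Luminosity (solar units) that puts Earth-like flux at distance_au."""
    return distance_au ** 2


# Luminosities needed to warm the 20-50 AU zone to present-day Earth flux
for d in (20, 30, 50):
    print(d, "AU ->", luminosity_for_distance(d), "L_sun")
```

    Under this simple scaling, the 20-50 AU zone corresponds to luminosities of roughly 400 to 2,500 times the present solar value.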

  16. Bacterial communities associated with host-adapted populations of pea aphids revealed by deep sequencing of 16S ribosomal DNA.

    PubMed

    Gauthier, Jean-Pierre; Outreman, Yannick; Mieuzet, Lucie; Simon, Jean-Christophe

    2015-01-01

    Associations between microbes and animals are ubiquitous and hosts may benefit from harbouring microbial communities through improved resource exploitation or resistance to environmental stress. The pea aphid, Acyrthosiphon pisum, is the host of heritable bacterial symbionts, including the obligate endosymbiont Buchnera aphidicola and several facultative symbionts. While obligate symbionts supply aphids with key nutrients, facultative symbionts influence their hosts in many ways such as protection against natural enemies, heat tolerance, color change and reproduction alteration. The pea aphid also encompasses multiple plant-specialized biotypes, each adapted to one or a few legume species. Facultative symbiont communities differ strongly between biotypes, although bacterial involvement in plant specialization is uncertain. Here, we analyse the diversity of bacterial communities associated with nine biotypes of the pea aphid complex using amplicon pyrosequencing of 16S rRNA genes. Combined clustering and phylogenetic analyses of 16S sequences allowed identifying 21 bacterial OTUs (Operational Taxonomic Unit). More than 98% of the sequencing reads were assigned to known pea aphid symbionts. The presence of Wolbachia was confirmed in A. pisum while Erwinia and Pantoea, two gut associates, were detected in multiple samples. The diversity of bacterial communities harboured by pea aphid biotypes was very low, ranging from 3 to 11 OTUs across samples. Bacterial communities differed more between than within biotypes but this difference did not correlate with the genetic divergence between biotypes. Altogether, these results confirm that the aphid microbiota is dominated by a few heritable symbionts and that plant specialization is an important structuring factor of bacterial communities associated with the pea aphid complex. However, since we examined the microbiota of aphid samples kept a few generations in controlled conditions, it may be that bacterial diversity was

  17. Revisiting bovine pyometra--new insights into the disease using a culture-independent deep sequencing approach.

    PubMed

    Knudsen, Lif Rødtness Vesterby; Karstrup, Cecilia Christensen; Pedersen, Hanne Gervi; Agerholm, Jørgen Steen; Jensen, Tim Kåre; Klitgaard, Kirstine

    2015-02-25

    The bacteria present in the uterus during pyometra have previously been studied using bacteriological culturing. These studies identified Fusobacterium necrophorum and Trueperella pyogenes as the major contributors to the pathogenesis of pyometra. However, an increasing number of culture-independent studies have demonstrated that the bacterial diversity in most environments is underestimated in culture-based studies. Consequently, fastidious pyometra-associated pathogens may have been overlooked. Therefore, the primary purpose of this study was to investigate the diversity of bacteria in the uterus of cows with pyometra by using culture-independent 16S rRNA PCR combined with next generation sequencing. We investigated the microbial composition in the uterus of 21 cows with pyometra, which were obtained from a Danish slaughterhouse. Similar to the observations from the culture studies, Fusobacteriaceae, the family that F. necrophorum belongs to, was the operational taxonomic unit (OTU) observed in the largest quantities. By contrast, the Actinomycetaceae family, which includes T. pyogenes, constituted only 1% of the total number of reads. Thus we cannot confirm the previously reported role of species from this family in the pathogenesis of pyometra. Finally, we identified a large number of sequences representing three families of Gram-negative bacteria in the pyometra samples: Porphyromonadaceae, Mycoplasmataceae, and Pasteurellaceae. It is likely that these families comprise potential pathogenic species of a fastidious nature, which have been overlooked in previous studies. Our results increase the knowledge of the complexity of the pyometra microbiota and suggest that pathogens in addition to F. necrophorum may be involved in the pathogenesis of pyometra.

  18. Bacterial Communities Associated with Host-Adapted Populations of Pea Aphids Revealed by Deep Sequencing of 16S Ribosomal DNA

    PubMed Central

    Gauthier, Jean-Pierre; Outreman, Yannick; Mieuzet, Lucie; Simon, Jean-Christophe

    2015-01-01

    Associations between microbes and animals are ubiquitous and hosts may benefit from harbouring microbial communities through improved resource exploitation or resistance to environmental stress. The pea aphid, Acyrthosiphon pisum, is the host of heritable bacterial symbionts, including the obligate endosymbiont Buchnera aphidicola and several facultative symbionts. While obligate symbionts supply aphids with key nutrients, facultative symbionts influence their hosts in many ways such as protection against natural enemies, heat tolerance, color change and reproduction alteration. The pea aphid also encompasses multiple plant-specialized biotypes, each adapted to one or a few legume species. Facultative symbiont communities differ strongly between biotypes, although bacterial involvement in plant specialization is uncertain. Here, we analyse the diversity of bacterial communities associated with nine biotypes of the pea aphid complex using amplicon pyrosequencing of 16S rRNA genes. Combined clustering and phylogenetic analyses of 16S sequences allowed identifying 21 bacterial OTUs (Operational Taxonomic Unit). More than 98% of the sequencing reads were assigned to known pea aphid symbionts. The presence of Wolbachia was confirmed in A. pisum while Erwinia and Pantoea, two gut associates, were detected in multiple samples. The diversity of bacterial communities harboured by pea aphid biotypes was very low, ranging from 3 to 11 OTUs across samples. Bacterial communities differed more between than within biotypes but this difference did not correlate with the genetic divergence between biotypes. Altogether, these results confirm that the aphid microbiota is dominated by a few heritable symbionts and that plant specialization is an important structuring factor of bacterial communities associated with the pea aphid complex. However, since we examined the microbiota of aphid samples kept a few generations in controlled conditions, it may be that bacterial diversity was

  19. An ORFome assembly approach to metagenomics sequences analysis.

    PubMed

    Ye, Yuzhen; Tang, Haixu

    2009-06-01

    Metagenomics is an emerging methodology for the direct genomic analysis of a mixed community of uncultured microorganisms. The current analyses of metagenomics data largely rely on the computational tools originally designed for microbial genomics projects. The challenge of assembling metagenomic sequences arises mainly from the short reads and the high species complexity of the community. Alternatively, individual (short) reads will be searched directly against databases of known genes (or proteins) to identify homologous sequences. The latter approach may have low sensitivity and specificity in identifying homologous sequences, which may further bias the subsequent diversity analysis. In this paper, we present a novel approach to metagenomic data analysis, called Metagenomic ORFome Assembly (MetaORFA). The whole computational framework consists of three steps. Each read from a metagenomics project will first be annotated with putative open reading frames (ORFs) that likely encode proteins. Next, the predicted ORFs are assembled into a collection of peptides using an EULER assembly method. Finally, the assembled peptides (i.e. ORFome) are used for database searching of homologs and subsequent diversity analysis. We applied MetaORFA approach to several metagenomics datasets with low coverage short reads. The results show that MetaORFA can produce long peptides even when the sequence coverage of reads is extremely low. Hence, the ORFome assembly significantly increases the sensitivity of homology searching, and may potentially improve the diversity analysis of the metagenomic data. This improvement is especially useful for metagenomic projects when the genome assembly does not work because of the low sequence coverage.
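
    The first step described above, annotating each read with putative ORFs, can be sketched as a six-frame translation that keeps every stop-free peptide fragment above a length cutoff. This is an illustrative stand-in and not MetaORFA's own ORF predictor: the function names, the `min_aa` cutoff, and the choice to accept fragments without a start codon (short reads often miss one) are assumptions. The codon table is the standard genetic code.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons; unknown codons (e.g. with N) become X."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def putative_orfs(read, min_aa=10):
    """Return (strand, frame, peptide) for stop-free stretches >= min_aa."""
    read = read.upper()
    found = []
    for strand, seq in (("+", read), ("-", reverse_complement(read))):
        for frame in range(3):
            for fragment in translate(seq[frame:]).split("*"):
                if len(fragment) >= min_aa:
                    found.append((strand, frame, fragment))
    return found


print(putative_orfs("ATGGCTGCTGCTTAA", min_aa=4))
```

    The peptides returned by such a step are what an assembler would then merge into longer sequences before database searching.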

  20. Whole exome and targeted deep sequencing identify genome-wide allelic loss and frequent SETDB1 mutations in malignant pleural mesotheliomas

    PubMed Central

    Lee, Sharon; Mendez, Pedro; Kim, James Wansoo; Woodard, Gavitt; Yoon, Jun-Hee; Jen, Kuang-Yu; Fang, Li Tai; Jones, Kirk; Jablons, David M.; Kim, Il-Jin

    2016-01-01

    Malignant pleural mesothelioma (MPM), a rare malignancy with a poor prognosis, is mainly caused by exposure to asbestos or other organic fibers, but the underlying genetic mechanism is not fully understood. Genetic alterations and causes for multiple primary cancer development including MPM are unknown. We used whole exome sequencing to identify somatic mutations in a patient with MPM and two additional primary cancers who had no evidence of venous, arterial, lymphovascular, or perineural invasion indicating dissemination of a primary lung cancer to the pleura. We found that the MPM had R282W, a key TP53 mutation, and genome-wide allelic loss or loss of heterozygosity, a distinct genomic alteration not previously described in MPM. We identified frequent inactivating SETDB1 mutations in this patient and in 68 additional MPM patients (mutation frequency: 10%, 7/69) by targeted deep sequencing. Our observations suggest the possibility of a new genetic mechanism in the development of either MPM or multiple primary cancers. The frequent SETDB1 inactivating mutations suggest there could be new diagnostic or therapeutic options for MPM. PMID:26824986

  1. An ultra-deep sequencing strategy to detect sub-clonal TP53 mutations in presentation chronic lymphocytic leukaemia cases using multiple polymerases.

    PubMed

    Worrillow, L; Baskaran, P; Care, M A; Varghese, A; Munir, T; Evans, P A; O'Connor, S J; Rawstron, A; Hazelwood, L; Tooze, R M; Hillmen, P; Newton, D J

    2016-10-06

    Chronic lymphocytic leukaemia (CLL) is the most common clonal B-cell disorder characterized by clonal diversity, a relapsing and remitting course, and in its aggressive forms remains largely incurable. Current front-line regimes include agents such as fludarabine, which act primarily via the DNA damage response pathway. Key to this is the transcription factor p53. Mutations in the TP53 gene, altering p53 functionality, are associated with genetic instability, and are present in aggressive CLL. Furthermore, the emergence of clonal TP53 mutations in relapsed CLL, refractory to DNA-damaging therapy, suggests that accurate detection of sub-clonal TP53 mutations prior to and during treatment may be indicative of early relapse. In this study, we describe a novel deep sequencing workflow using multiple polymerases to generate sequencing libraries (MuPol-Seq), facilitating accurate detection of TP53 mutations at a frequency as low as 0.3%, in presentation CLL cases tested. As these mutations were mostly clustered within the regions of TP53 encoding DNA-binding domains, essential for DNA contact and structural architecture, they are likely to be of prognostic relevance in disease progression. The workflow described here has the potential to be implemented routinely to identify rare mutations across a range of diseases.

  2. Deep sequencing reveals different compositions of mRNA transcribed from the F8 gene in a panel of FVIII-producing CHO cell lines.

    PubMed

    Kaas, Christian S; Bolt, Gert; Hansen, Jens J; Andersen, Mikael R; Kristensen, Claus

    2015-07-01

    Coagulation factor VIII (FVIII) is one of the most complex biopharmaceuticals due to the large size, poor protein stability and extensive post-translational modifications. As a consequence, efficient production of FVIII in mammalian cells poses a major challenge, with typical yields two to three orders of magnitude lower than for antibodies. In the present study we investigated CHO DXB11 cells transfected with a plasmid encoding human coagulation factor VIII. Single cell clones were isolated from the pool of transfectants and a panel of 14 clones representing a dynamic range of FVIII productivities was selected for RNA sequencing analysis. The analysis showed distinct differences in F8 RNA composition between the clones. The exogenous F8-dhfr transcript was found to make up the most abundant transcript in the present clones. No correlation was seen between F8 mRNA levels and the measured FVIII productivity. It was found that three MTX resistant, nonproducing clones had different truncations of the F8 transcripts. We find that by using deep sequencing, in contrast to microarray technology, for determining the transcriptome from CHO transfectants, we are able to accurately deduce the mature mRNA composition of the transgene and identify significant truncations that would probably otherwise have remained undetected.

  3. MinION nanopore sequencing identifies the position and structure of a bacterial antibiotic resistance island.

    PubMed

    Ashton, Philip M; Nair, Satheesh; Dallman, Tim; Rubino, Salvatore; Rabsch, Wolfgang; Mwaigwisya, Solomon; Wain, John; O'Grady, Justin

    2015-03-01

    Short-read, high-throughput sequencing technology cannot identify the chromosomal position of repetitive insertion sequences that typically flank horizontally acquired genes such as bacterial virulence genes and antibiotic resistance genes. The MinION nanopore sequencer can produce long sequencing reads on a device similar in size to a USB memory stick. Here we apply a MinION sequencer to resolve the structure and chromosomal insertion site of a composite antibiotic resistance island in Salmonella Typhi Haplotype 58. Nanopore sequencing data from a single 18-h run was used to create a scaffold for an assembly generated from short-read Illumina data. Our results demonstrate the potential of the MinION device in clinical laboratories to fully characterize the epidemic spread of bacterial pathogens.

  4. Discovery of Bovine Digital Dermatitis-Associated Treponema spp. in the Dairy Herd Environment by a Targeted Deep-Sequencing Approach

    PubMed Central

    Nielsen, Martin W.; Ingerslev, Hans-Christian; Boye, Mette; Jensen, Tim K.

    2014-01-01

    The bacteria associated with the infectious claw disease bovine digital dermatitis (DD) are spirochetes of the genus Treponema; however, their environmental reservoir remains unknown. To our knowledge, the current study is the first report of the discovery and phylogenetic characterization of rRNA gene sequences from DD-associated treponemes in the dairy herd environment. Although the spread of DD appears to be facilitated by wet floors covered with slurry, no DD-associated treponemes have been isolated from this environment previously. Consequently, there is a lack of knowledge about the spread of this disease among cows within a herd as well as between herds. To address the issue of DD infection reservoirs, we searched for evidence of DD-associated treponemes in fresh feces, in slurry, and in hoof lesions by deep sequencing of the V3 and V4 hypervariable regions of the 16S rRNA gene coupled with identification at the operational-taxonomic-unit level. Using treponeme-specific primers in this high-throughput approach, we identified small amounts of DNA (on average 0.6% of the total amount of sequence reads) from DD-associated treponemes in 43 of 64 samples from slurry and cow feces collected from six geographically dispersed dairy herds. Species belonging to the Treponema denticola/Treponema pedis-like and Treponema phagedenis-like phylogenetic clusters were among the most prevalent treponemes in both the dairy herd environment and the DD lesions. By the high-throughput approach presented here, we have demonstrated that cow feces and environmental slurry are possible reservoirs of DD-associated treponemes. This method should enable further clarification of the etiopathogenesis of DD. PMID:24814794

  5. Nucleotide and deduced amino acid sequences of a subtilisin-like serine protease from a deep-sea bacterium, Alkalimonas collagenimarina AC40(T).

    PubMed

    Kurata, Atsushi; Uchimura, Kohsuke; Shimamura, Shigeru; Kobayashi, Tohru; Horikoshi, Koki

    2007-11-01

    The acpI gene encoding an alkaline protease (AcpI) from a deep-sea bacterium, Alkalimonas collagenimarina AC40(T), was shotgun-cloned and sequenced. It had a 1,617-bp open reading frame encoding a protein of 538 amino acids. Based on analysis of the deduced amino acid sequence, AcpI is a subtilisin-like serine protease belonging to subtilase family A. It consists of a prepropeptide, a catalytic domain, and a prepeptidase C-terminal domain like other serine proteases from the genera Pseudomonas, Shewanella, Alteromonas, and Xanthomonas. Heterologous expression of the acpI gene in Escherichia coli cells yielded a 28-kDa recombinant AcpI (rAcpI), suggesting that both the prepropeptide and prepeptidase C-terminal domains were cleaved off to give the mature form. Analysis of N-terminal and C-terminal amino acid sequences of purified rAcpI showed that the mature enzyme would be composed of 273 amino acids. The optimal pH and temperature for the caseinolytic activity of the purified rAcpI were 9.0-9.5 and 45 degrees C in 100 mM glycine-NaOH buffer. Calcium ions slightly enhanced the enzyme activity and stability. The enzyme favorably hydrolyzed gelatin, collagen, and casein. AcpI from A. collagenimarina AC40(T) was also purified from culture broth, and its molecular mass was around 28 kDa, indicating that the cleavage manner of the enzyme is similar to that in E. coli cells.
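
    The sequence figures quoted in this abstract can be cross-checked with simple arithmetic. The sketch below is illustrative only: it assumes the 1,617-bp open reading frame includes one stop codon, and it uses the common rule-of-thumb average of about 110 Da per amino acid residue, which is not a value taken from the paper. The function names are hypothetical.

```python
# Rough consistency check of the figures quoted in the abstract.
# Assumption (not from the source): an average residue mass of ~110 Da.
AVG_RESIDUE_DA = 110

ORF_BP = 1617      # open reading frame length reported in the abstract
MATURE_AA = 273    # residues in the mature enzyme reported in the abstract


def orf_to_residues(orf_bp: int) -> int:
    """Number of encoded amino acids, assuming the ORF ends in one stop codon."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    return orf_bp // 3 - 1


def approx_mass_kda(residues: int) -> float:
    """Approximate protein mass in kDa from the residue count."""
    return residues * AVG_RESIDUE_DA / 1000


print(orf_to_residues(ORF_BP))     # 538, matching the reported protein length
print(approx_mass_kda(MATURE_AA))  # about 30 kDa, close to the reported 28 kDa
```

    Under these assumptions the estimate lands near the 28-kDa band reported for rAcpI, which fits the abstract's interpretation that both terminal domains are cleaved off.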

  6. Investigation of a Case of Genotype 5a Hepatitis C Virus Transmission in a French Hemodialysis Unit Using Epidemiologic Data and Deep Sequencing.

    PubMed

    Aho-Glélé, L S; Giraudon, H; Astruc, K; Soltani, Z; Lefebvre, A; Pothier, P; Bour, J B; Manoha, C

    2016-02-01

    BACKGROUND Hepatitis C virus (HCV) is a major cause of chronic liver disease worldwide. A patient was recently found to be HCV seropositive during hemodialysis follow-up. OBJECTIVE To determine whether nosocomial transmission had occurred and which viral populations were transmitted. DESIGN HCV transmission case. SETTING A dialysis unit in a French hospital. METHODS Molecular and epidemiologic investigations were conducted to determine whether 2 cases were related. Risk analysis and auditing procedures were performed to determine the transmission pathway(s). RESULTS Sequence analyses of the NS5b region revealed a 5a genotype in the newly infected patient. Epidemiologic investigations suggested that a highly viremic genotype 5a HCV-infected patient who underwent dialysis in the same unit was the source of the infection. Phylogenetic analysis of NS5b and hypervariable region-1 sequences revealed a genetically related virus (>99.9% nucleotide identity). Deep sequencing of hypervariable region-1 indicated that HCV quasispecies were found in the source whereas a single hypervariable region-1 HCV variant was found in the newly infected patient, and that this was identical to the major variant identified in the source patient. Nosocomial patient-to-patient transmission via healthcare workers' hands was the most likely explanation. In our dialysis unit, this unique incident led to the adjustment of infection control policy. CONCLUSIONS The data support transmission of a unique variant from a source with a high viral load and genetic diversity. This investigation also underlines the need to periodically evaluate prevention and control practices.

  7. Transcriptome profiling and digital gene expression by deep sequencing in early somatic embryogenesis of endangered medicinal Eleutherococcus senticosus Maxim.

    PubMed

    Tao, Lei; Zhao, Yue; Wu, Ying; Wang, Qiuyu; Yuan, Hongmei; Zhao, Lijuan; Guo, Wendong; You, Xiangling

    2016-03-01

    Somatic embryogenesis (SE) has been studied as a model system to understand molecular events in physiology, biochemistry, and cytology during plant embryo development. In particular, it is exceedingly difficult to access the morphological and early regulatory events in zygotic embryos. To understand the molecular mechanisms regulating early SE in Eleutherococcus senticosus Maxim., we used high-throughput RNA-Seq technology to investigate its transcriptome. We obtained 58,327,688 reads, which were assembled into 75,803 unique unigenes. To better understand their functions, the unigenes were annotated using the Clusters of Orthologous Groups, Gene Ontology, and Kyoto Encyclopedia of Genes and Genomes databases. Digital gene expression libraries revealed differences in gene expression profiles at different developmental stages (embryogenic callus, yellow embryogenic callus, global embryo). We obtained a sequencing depth of >5.6 million tags per sample and identified many differentially expressed genes at various stages of SE. The initiation of SE affected gene expression in many KEGG pathways, but predominantly that in metabolic pathways, biosynthesis of secondary metabolites, and plant hormone signal transduction. This information on the changes in the multiple pathways related to SE induction in E. senticosus Maxim. embryogenic tissue will contribute to a more comprehensive understanding of the mechanisms involved in early SE. Additionally, the differentially expressed genes may act as molecular markers and could play very important roles in the early stage of SE. The results are a comprehensive molecular biology resource for investigating SE of E. senticosus Maxim.

  8. Integrative analyses of RNA editing, alternative splicing, and expression of young genes in human brain transcriptome by deep RNA sequencing.

    PubMed

    Wu, Dong-Dong; Ye, Ling-Qun; Li, Yan; Sun, Yan-Bo; Shao, Yi; Chen, Chunyan; Zhu, Zhu; Zhong, Li; Wang, Lu; Irwin, David M; Zhang, Yong E; Zhang, Ya-Ping

    2015-08-01

    Next-generation RNA sequencing has been successfully used for identification of transcript assembly, evaluation of gene expression levels, and detection of post-transcriptional modifications. Despite these large-scale studies, additional comprehensive RNA-seq data from different subregions of the human brain are required to fully evaluate the evolutionary patterns experienced by the human brain transcriptome. Here, we provide a total of 6.5 billion RNA-seq reads from different subregions of the human brain. A significant correlation was observed between the levels of alternative splicing and RNA editing, which might be explained by a competition between the molecular machineries responsible for the splicing and editing of RNA. Young human protein-coding genes demonstrate biased expression to the neocortical and non-neocortical regions during evolution on the lineage leading to humans. We also found that a significantly greater number of young human protein-coding genes are expressed in the putamen, a tissue that was also observed to have the highest level of RNA-editing activity. The putamen, which previously received little attention, plays an important role in cognitive ability, and our data suggest a potential contribution of the putamen to human evolution.

  9. Identification of microRNAs and their target genes in Alport syndrome using deep sequencing of iPSCs samples.

    PubMed

    Chen, Wen-biao; Huang, Jian-rong; Yu, Xiang-qi; Lin, Xiao-cong; Dai, Yong

    2015-03-01

    MicroRNAs (miRNAs) are a class of small RNA molecules that are implicated in post-transcriptional regulation of gene expression during development. The discovery and understanding of miRNAs has revolutionized the traditional view of gene expression. Alport syndrome (AS) is an inherited disorder of type IV collagen, which most commonly leads to glomerulonephritis and kidney failure. Patients with AS inevitably reach end-stage renal disease and require renal replacement therapy, starting in young adulthood. In this study, Solexa sequencing was used to identify and quantitatively profile small RNAs from an AS family. We identified 30 known miRNAs that showed a significant change in expression between two individuals. Nineteen miRNAs were up-regulated and eleven were down-regulated. Forty-nine novel miRNAs showed significantly different levels of expression between two individuals. Gene target predictions for the miRNAs revealed that high ranking target genes were implicated in cell, cell part and cellular process categories. The purine metabolism pathway and mitogen-activated protein kinase (MAPK) signaling pathway were enriched by the largest number of target genes. These results strengthen the notion that miRNAs and their target genes are involved in AS and the data advance our understanding of miRNA function in the pathogenesis of AS.

  10. Dodecyl Maltopyranoside Enabled Purification of Active Human GABA Type A Receptors for Deep and Direct Proteomic Sequencing

    PubMed Central

    Zhang, Xi; Miller, Keith W.

    2015-01-01

    The challenge in high-quality membrane proteomics is all about sample preparation prior to HPLC, and the cell-to-protein step poses a long-standing bottleneck. Traditional protein extraction methods apply ionic or poly-disperse detergents, harsh denaturation, and repeated protein/peptide precipitation/resolubilization afterward, but suffer low yield, low reproducibility, and low sequence coverage. Contrary to attempts to subdue, we resolved this challenge by providing proteins nature-and-activity-promoting conditions throughout preparation. Using 285-kDa hetero-pentameric human GABA type A receptor overexpressed in HEK293 as a model, we describe an n-dodecyl-β-d-maltopyranoside/cholesteryl hemisuccinate (DDM/CHS)-based affinity purification method that produced active receptors, supported protease activity, and allowed high performance with both in-gel and direct gel-free proteomic analyses, without detergent removal. Contrary to the conventional belief that detergents must be removed before HPLC MS, the high-purity low-dose nonionic detergent DDM did not interfere with peptides, and obviated removal or desalting. Sonication or dropwise addition of detergent robustly solubilized over 90% of membrane pellets. The purification conditions were comparable to those applied in successful crystallizations of most membrane proteins. These results enabled streamlined proteomics of human synaptic membrane proteins, and more importantly, allowed directly coupling proteomics with crystallography to characterize both static and dynamic structures of membrane proteins in crystallization pipelines. PMID:25473089

  11. Functional Impact of RNA editing and ADARs on regulation of gene expression: perspectives from deep sequencing studies.

    PubMed

    Liu, Hsuan; Ma, Chung-Pei; Chen, Yi-Tung; Schuyler, Scott C; Chang, Kai-Ping; Tan, Bertrand Chin-Ming

    2014-01-01

    Cells regulate gene expression at multiple levels leading to a balance between robustness and complexity within their proteome. One core molecular step contributing to this important balance during metazoan gene expression is RNA editing, such as the co-transcriptional recoding of RNA transcripts catalyzed by the adenosine deaminase acting on RNA (ADAR) family of enzymes. Understanding of the adenosine-to-inosine RNA editing process has been broadened considerably by next-generation sequencing (NGS) technology, which allows for in-depth demarcation of an RNA editome at nucleotide resolution. However, critical issues remain unresolved with regard to how RNA editing cooperates with other transcript-associated events to underpin regulated gene expression. Here we review the growing body of evidence, provided by recent NGS-based studies, that links RNA editing to other mechanisms of post-transcriptional RNA processing and gene expression regulation including alternative splicing, transcript stability and localization, and the biogenesis and function of microRNAs (miRNAs). We also discuss the possibility that systematic integration of NGS data may be employed to establish the rules of an "RNA editing code", which may give us new insights into the functional consequences of RNA editing.

  12. Conserved and novel heat stress-responsive microRNAs were identified by deep sequencing in Saccharina japonica (Laminariales, Phaeophyta).

    PubMed

    Liu, Fuli; Wang, Wenjun; Sun, Xiutao; Liang, Zhourui; Wang, Feijiu

    2015-07-01

    As a temperate-cold species, Saccharina japonica often suffers heat stress when it is transplanted to temperate and subtropical zones. Studying the heat stress response and resistance mechanism of Saccharina is of great significance for understanding the acclimation to heat stress under domestication as well as for breeding new cultivars with heat stress resistance. In this study, we identified a set of heat stress-responsive miRNAs and analysed their regulation during the heat stress response. Control (CO) and heat stress (HS) sRNA libraries were constructed and sequenced. Forty-nine known miRNAs and 75 novel miRNAs were identified, of which seven known and 25 novel miRNAs were expressed differentially under heat stress. Quantitative PCR of six selected miRNAs confirmed that these loci were responsive to heat stress. Thirty-nine and 712 genes were predicted to be targeted by the seven known miRNAs and 25 novel miRNAs, respectively. Gene function and pathway analyses showed that these genes probably play important roles in S. japonica heat stress tolerance. The miRNAs identified represent the first set of heat-responsive miRNAs identified from S. japonica, and their identification can help elucidate the heat stress response and resistance mechanisms in S. japonica.

  13. Deep-sequencing transcriptome analysis of field-grown Medicago sativa L. crown buds acclimated to freezing stress.

    PubMed

    Song, Lili; Jiang, Lin; Chen, Yue; Shu, Yongjun; Bai, Yan; Guo, Changhong

    2016-09-01

    Medicago sativa L. (alfalfa) 'Zhaodong' is an important forage legume that can safely survive in northern China where winter temperatures reach as low as -30 °C. Survival of alfalfa following freezing stress depends on the amount and revival ability of crown buds. In order to investigate the molecular mechanisms of frost tolerance in alfalfa, we used transcriptome sequencing technology and bioinformatics strategies to analyze crown buds of field-grown alfalfa during winter. We statistically identified a total of 5605 differentially expressed genes (DEGs) involved in freezing stress including 1900 upregulated and 3705 downregulated DEGs. We validated 36 candidate DEGs using qPCR to confirm the accuracy of the RNA-seq data. Unlike other recent studies, this study employed alfalfa plants grown in the natural environment. Our results indicate that not only the CBF orthologs but also membrane proteins, hormone signal transduction pathways, and ubiquitin-mediated proteolysis pathways indicate the presence of a special freezing adaptation mechanism in alfalfa. The antioxidant defense system may rapidly confer freezing tolerance to alfalfa. Importantly, biosynthesis of secondary metabolites and phenylalanine metabolism, which is of potential importance in coordinating freezing tolerance with growth and development, were downregulated in subzero temperatures. The adaptive mechanism for frost tolerance is a complex multigenic process that is not well understood. This systematic analysis provided an in-depth view of stress tolerance mechanisms in alfalfa.

  14. Draft Genome Sequence for the Type Strain Vulcanibacillus modesticaldus BR, a Strictly Anaerobic, Moderately Thermophilic, and Nitrate-Reducing Bacterium Isolated from Deep-Sea Hydrothermal Vents of the Mid-Atlantic Ridge

    PubMed Central

    Abin, Christopher A.

    2016-01-01

    Vulcanibacillus modesticaldus BR(T) was isolated from calcite-rich, metalliferous core samples collected at the Rainbow deep-sea hydrothermal vent field on the Mid-Atlantic Ridge. Here, we report the 2.2-Mb draft genome sequence for this strain, consisting of 100 contigs with a G+C content of 33.6% and 2,227 protein-coding sequences. PMID:27834704

  15. Draft Genome Sequence for the Type Strain Vulcanibacillus modesticaldus BR, a Strictly Anaerobic, Moderately Thermophilic, and Nitrate-Reducing Bacterium Isolated from Deep-Sea Hydrothermal Vents of the Mid-Atlantic Ridge.

    PubMed

    Abin, Christopher A; Hollibaugh, James T

    2016-11-10

    Vulcanibacillus modesticaldus BR(T) was isolated from calcite-rich, metalliferous core samples collected at the Rainbow deep-sea hydrothermal vent field on the Mid-Atlantic Ridge. Here, we report the 2.2-Mb draft genome sequence for this strain, consisting of 100 contigs with a G+C content of 33.6% and 2,227 protein-coding sequences.

  16. Intraclonal diversity in follicular lymphoma analyzed by quantitative ultra-deep sequencing of non-coding regions

    PubMed Central

    Spence, Janice M.; Abumoussa, Andrew; Spence, John P.; Burack, W. Richard

    2014-01-01

    Cancers are characterized by genomic instability and the resulting intra-clonal diversity is a prerequisite for tumor evolution. Therefore, metrics of tumor heterogeneity may prove to be clinically meaningful. Intra-clonal heterogeneity in follicular lymphoma (FL) is apparent from studies of somatic hypermutation (SHM) caused by Activation Induced Deaminase (AID) in IGH. Aberrant SHM (aSHM), defined as AID activity outside of the IG loci, predominantly targets non-coding regions causing numerous “passenger” mutations but has the potential to generate rare significant “driver” mutations. The quantitative relationship between SHM and aSHM has not been defined. To measure SHM and aSHM, ultradeep sequencing (>20,000 fold coverage) was performed on IGH (∼1650nt) and 9 other non-coding regions potentially targeted by AID (combined 9411nt), including the 5′UTR of BCL2. Single nucleotide variants (SNV) were found in 12/12 FL specimens (median 136 SHM and 53 aSHM). The aSHM SNVs were associated with AID-motifs (p<0.0001). The number of SNVs at BCL2 varied widely among specimens and correlated with the number of SNVs at 8 other potential aSHM sites. In contrast SHM at IGH was not predictive of aSHM. Tumor heterogeneity is apparent from SNVs at low variant allele frequencies (VAF); the relative number of SNVs with VAF<5% varied with clinical grade indicating that tumor heterogeneity based on aSHM reflects a clinically meaningful parameter. These data suggest that genome-wide aSHM may be estimated from aSHM of BCL2 but not SHM of IGH. The results demonstrate a practical approach to the quantification of intra-tumoral genetic heterogeneity for clinical specimens. PMID:25311808
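
    The low-frequency metric described in this abstract (the share of SNVs with VAF below 5%) can be illustrated with a minimal sketch. Everything in it is hypothetical: the `Variant` record, the function names, and the read counts are made up for illustration and are not data or code from the study. VAF is taken here in its usual sense of variant-supporting reads divided by total depth at the site.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """Hypothetical SNV call: position, variant-supporting reads, total depth."""
    position: int
    alt_reads: int
    depth: int


def vaf(v: Variant) -> float:
    """Variant allele frequency: supporting reads over total depth."""
    return v.alt_reads / v.depth


def low_frequency_fraction(variants: list[Variant], threshold: float = 0.05) -> float:
    """Fraction of SNVs whose VAF falls below the threshold (5% by default)."""
    if not variants:
        return 0.0
    low = sum(1 for v in variants if vaf(v) < threshold)
    return low / len(variants)


# Made-up calls at 20,000-fold depth, the coverage scale mentioned in the abstract.
calls = [
    Variant(101, 400, 20000),   # VAF 0.02
    Variant(215, 3000, 20000),  # VAF 0.15
    Variant(330, 900, 20000),   # VAF 0.045
    Variant(412, 9000, 20000),  # VAF 0.45
]

print(low_frequency_fraction(calls))  # 0.5: two of the four calls are below 5%
```

    A real analysis would also need to separate true low-frequency variants from sequencing error, which this sketch does not attempt.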

  17. The Lower Main Sequence of ω Centauri from Deep Hubble Space Telescope NICMOS Near-Infrared Observations

    NASA Astrophysics Data System (ADS)

    Pulone, Luigi; De Marchi, Guido; Paresce, Francesco; Allard, France

    1998-01-01

    A 20" × 20" field located ~7' from the center of the massive galactic globular cluster ω Centauri (NGC 5139) was observed by the NIC2 camera of the Near-Infrared Camera and Multiobject Spectrometer on board the Hubble Space Telescope (HST) through the F110W and F160W broadband filters centered at 1.1 and 1.6 μm for a total of 3000 and 4000 s for the two filters, respectively. Standard photometric analysis of the resulting images yields 340 stars with a signal above a 10 σ threshold in both filters, covering the range of HST m160 magnitudes between 20 and 26, the deepest probe yet of a globular cluster in this wavelength region. These objects form a well-defined sequence in the m160 versus m110-m160 plane that is consistent with the theoretical near-IR color-magnitude diagram expected from recent low-mass stellar model calculations. The resulting stellar luminosity function increases steadily with increasing magnitude up to a peak at m160 ≃ 25, where it turns over and drops slowly down to the detection limit set by the incompleteness limit of 60% at m160 ≃ 26. With the theoretical mass-luminosity relationship that provides the best fit to the IR color-magnitude diagram, we obtain an excellent fit to the observed luminosity function down to a mass of ~0.2 M⊙ with a power-law mass function having a slope of α = -1. Based on observations with the NASA/ESA Hubble Space Telescope, obtained at the Space Telescope Science Institute, which is operated by the Association of Universities for Research in Astronomy, Inc., for NASA under contract NAS5-26555.

  18. Deep sequencing of Brachypodium small RNAs at the global genome level identifies microRNAs involved in cold stress response

    PubMed Central

    Zhang, Jingyu; Xu, Yunyuan; Huan, Qing; Chong, Kang

    2009-01-01

    Background MicroRNAs (miRNAs) are endogenous small RNAs having large-scale regulatory effects on plant development and stress responses. Extensive studies of miRNAs have only been performed in a few model plants. Although miRNAs are proved to be involved in plant cold stress responses, little is known for winter-habit monocots. Brachypodium distachyon, with close evolutionary relationship to cool-season cereals, has recently emerged as a novel model plant. There are few reports of Brachypodium miRNAs. Results High-throughput sequencing and whole-genome-wide data mining led to the identification of 27 conserved miRNAs, as well as 129 predicted miRNAs in Brachypodium. For multiple-member conserved miRNA families, their sizes in Brachypodium were much smaller than those in rice and Populus. The genome organization of miR395 family in Brachypodium was quite different from that in rice. The expression of 3 conserved miRNAs and 25 predicted miRNAs showed significant changes in response to cold stress. Among these miRNAs, some were cold-induced and some were cold-suppressed, but all the conserved miRNAs were up-regulated under cold stress condition. Conclusion Our results suggest that Brachypodium miRNAs are composed of a set of conserved miRNAs and a large proportion of non-conserved miRNAs with low expression levels. Both kinds of miRNAs were involved in cold stress response, but all the conserved miRNAs were up-regulated, implying an important role for cold-induced miRNAs. The different size and genome organization of miRNA families in Brachypodium and rice suggest that the frequency of duplication events or the selection pressure on duplicated miRNAs are different between these two closely related plant species. PMID:19772667

  19. Novel MicroRNA Involved in Host Response to Avian Pathogenic Escherichia coli Identified by Deep Sequencing and Integration Analysis

    PubMed Central

    Jia, Xinzheng; Zhang, Xiquan; Nolan, Lisa K.

    2016-01-01

    Avian pathogenic Escherichia coli (APEC) causes one of the most common bacterial diseases of poultry worldwide. Effective control methods are therefore desirable and will be facilitated by a better understanding of the host response to the pathogen. Currently, microRNAs (miRNAs) involved in host resistance to APEC are unknown. Here, we applied RNA sequencing to explore the changed miRNAs and deregulated genes in the spleen of three groups of broilers: nonchallenged (NC), APEC-challenged with mild pathology (CM), and APEC-challenged with severe pathology (CS). Twenty-seven differentially expressed miRNAs (fold change >1.5; P value <0.01) were identified, including 13 miRNAs between the NC and CM, 17 between the NC and CS, and 14 between the CM and CS groups. Through functional analysis of these miRNA targets, 12 immune-related biological processes were found to be significantly enriched. Based on combined analyses of differentially expressed miRNAs and mRNAs within each of the three groups, 43 miRNA-mRNA pairs displayed significantly negative correlations (r < −0.8). Notably, gga-miR-429 was greatly increased in the CS group compared to levels in both the CM and NC groups. In vitro, gga-miR-429 directly repressed luciferase reporter gene activity via binding to 3′ untranslated regions of TMEFF2, NTRK2, and SHISA2. Overexpression of gga-miR-429 in the HD11 macrophage cell line significantly inhibited TMEFF2 and SHISA2 expression, which are involved in the lipopolysaccharide-induced platelet-derived growth factor (PDGF) and Wnt signaling pathways. In summary, we provide the first report characterizing the miRNA changes during APEC infection, which may help to shed light on the roles of these recently identified genetic elements in the mechanisms of host resistance and susceptibility to APEC. PMID:27795362

  20. RNA deep sequencing reveals differential microRNA expression during development of sea urchin and sea star.

    PubMed

    Kadri, Sabah; Hinman, Veronica F; Benos, Panayiotis V

    2011-01-01

    microRNAs (miRNAs) are small (20-23 nt), non-coding single stranded RNA molecules that act as post-transcriptional regulators of mRNA gene expression. They have been implicated in regulation of developmental processes in diverse organisms. The echinoderms, Strongylocentrotus purpuratus (sea urchin) and Patiria miniata (sea star) are excellent model organisms for studying development with well-characterized transcriptional networks. However, to date, nothing is known about the role of miRNAs during development in these organisms, except that the genes that are involved in the miRNA biogenesis pathway are expressed during their developmental stages. In this paper, we used Illumina Genome Analyzer (Illumina, Inc.) to sequence small RNA libraries in mixed stage population of embryos from one to three days after fertilization of sea urchin and sea star (total of 22,670,000 reads). Analysis of these data revealed the miRNA populations in these two species. We found that 47 and 38 known miRNAs are expressed in sea urchin and sea star, respectively, during early development (32 in common). We also found 13 potentially novel miRNAs in the sea urchin embryonic library. miRNA expression is generally conserved between the two species during development, but 7 miRNAs are highly expressed in only one species. We expect that our two datasets will be a valuable resource for everyone working in the field of developmental biology and the regulatory networks that affect it. The computational pipeline to analyze Illumina reads is available at http://www.benoslab.pitt.edu/services.html.

  1. Deep-Sequence Identification and Role in Virus Replication of a JC Virus Quasispecies in Patients with Progressive Multifocal Leukoencephalopathy.

    PubMed

    Takahashi, Kenta; Sekizuka, Tsuyoshi; Fukumoto, Hitomi; Nakamichi, Kazuo; Suzuki, Tadaki; Sato, Yuko; Hasegawa, Hideki; Kuroda, Makoto; Katano, Harutaka

    2017-01-01

    JC virus (JCV) is a DNA virus causing progressive multifocal leukoencephalopathy (PML) in immunodeficient patients. In the present study, 22 genetic quasispecies with more than 1.5% variant frequency were detected in JCV genomes from six clinical samples of PML by next-generation sequencing. A mutation from A to C at nucleotide (nt) 3495 in JCV Mad1 resulting in a V-to-G amino acid substitution at amino acid (aa) position 392 of the large T antigen (TAg) was identified in all six cases of PML at 3% to 19% variant frequencies. Transfection of JCV Mad1 DNA possessing the V392G substitution in TAg into IMR-32 and human embryonic kidney 293 (HEK293) cells resulted in dramatically decreased production of JCV-encoded proteins. The virus DNA copy number was also reduced in supernatants of the mutant virus-transfected cells. Transfection of the IMR-32 and HEK293 cells with a virus genome containing a revertant mutation recovered viral production and protein expression. Cotransfection with equal amounts of wild-type genome and mutated JCV genome did not reduce the expression of viral proteins or viral replication, suggesting that the mutation did not have any dominant-negative function. Finally, immunohistochemistry demonstrated that TAg was expressed in all six pathological samples in which the quasispecies were detected. In conclusion, the V392G amino acid substitution in TAg identified frequently in PML lesions has a function in suppressing JCV replication, but the frequency of the mutation was restricted and its role in PML lesions was limited.

  2. The SCUBA-2 Cosmology Legacy Survey: galaxies in the deep 850 μm survey, and the star-forming 'main sequence'

    NASA Astrophysics Data System (ADS)

    Koprowski, M. P.; Dunlop, J. S.; Michałowski, M. J.; Roseboom, I.; Geach, J. E.; Cirasuolo, M.; Aretxaga, I.; Bowler, R. A. A.; Banerji, M.; Bourne, N.; Coppin, K. E. K.; Chapman, S.; Hughes, D. H.; Jenness, T.; McLure, R. J.; Symeonidis, M.; Werf, P. van der

    2016-06-01

    We investigate the properties of the galaxies selected from the deepest 850-μm survey undertaken to date with the Submillimetre Common-User Bolometer Array 2 (SCUBA-2) on the James Clerk Maxwell Telescope as part of the SCUBA-2 Cosmology Legacy Survey. A total of 106 sources (>5σ) were uncovered at 850 μm from an area of ≃150 arcmin² in the centre of the COSMOS/UltraVISTA/Cosmic Assembly Near-infrared Deep Extragalactic Legacy Survey (CANDELS) field, imaged to a typical depth of σ850 ≃ 0.25 mJy. We utilize the available multifrequency data to identify galaxy counterparts for 80 of these sources (75 per cent), and to establish the complete redshift distribution for this sample, yielding z̄ = 2.38 ± 0.09. We have also been able to determine the stellar masses of the majority of the galaxy identifications, enabling us to explore their location on the star formation rate:stellar mass (SFR:M*) plane. Crucially, our new deep 850-μm-selected sample reaches flux densities equivalent to SFR ≃ 100 M⊙ yr⁻¹, enabling us to confirm that sub-mm galaxies form the high-mass end of the 'main sequence' (MS) of star-forming galaxies at z > 1.5 (with a mean specific SFR of sSFR = 2.25 ± 0.19 Gyr⁻¹ at z ≃ 2.5). Our results are consistent with no significant flattening of the MS towards high masses at these redshifts. However, our results add to the growing evidence that average sSFR rises only slowly at high redshift, resulting in log10 sSFR being an apparently simple linear function of the age of the Universe.

  3. Genome-wide identification of miRNAs responsive to drought in peach (Prunus persica) by high-throughput deep sequencing.

    PubMed

    Eldem, Vahap; Çelikkol Akçay, Ufuk; Ozhuner, Esma; Bakır, Yakup; Uranbey, Serkan; Unver, Turgay

    2012-01-01

    Peach (Prunus persica L.) is one of the most important fresh fruits worldwide. Since fruit growth largely depends on adequate water supply, drought stress is considered the most important abiotic stress limiting fleshy fruit production and quality in peach. Plant responses to drought stress are regulated at both the transcriptional and post-transcriptional levels. As post-transcriptional gene regulators, microRNAs (miRNAs) are small (19-25 nucleotides in length), endogenous, non-coding RNAs. Recent studies indicate that miRNAs are involved in plant responses to drought. Therefore, Illumina deep sequencing technology was used for genome-wide identification of miRNAs and their expression profile in response to drought in peach. In this study, four sRNA libraries were constructed from leaf control (LC), leaf stress (LS), root control (RC) and root stress (RS) samples. We identified a total of 531, 471, 535 and 487 known mature miRNAs in LC, LS, RC and RS libraries, respectively. The expression level of 262 (104 up-regulated, 158 down-regulated) of the 453 miRNAs changed significantly in leaf tissue, whereas 368 (221 up-regulated, 147 down-regulated) of the 465 miRNAs had expression levels that changed significantly in root tissue upon drought stress. Additionally, a total of 197, 221, 238 and 265 novel miRNA precursor candidates were identified from LC, LS, RC and RS libraries, respectively. Target transcripts (137 for LC, 133 for LS, 148 for RC and 153 for RS) generated significant Gene Ontology (GO) terms related to DNA binding and catalytic activities. Genome-wide miRNA expression analysis of peach by a deep sequencing approach helped to expand our understanding of miRNA function in response to drought stress in peach and Rosaceae. A set of differentially expressed miRNAs could pave the way for developing new strategies to alleviate the adverse effects of drought stress on plant growth and development.

  4. Looking For a Needle in the Haystack: Deciphering Indigenous 1.79 km Deep Subsurface Microbial Communities from Drilling Mud Contaminants Using 454 Pyrotag Sequencing

    NASA Astrophysics Data System (ADS)

    Dong, Y.; Cann, I.; Mackie, R.; Price, N.; Flynn, T. M.; Sanford, R.; Miller, P.; Chia, N.; Kumar, C. G.; Kim, P.; Sivaguru, M.; Fouke, B. W.

    2010-12-01

    Knowledge of the composition, structure and activity of microbial communities that live in deeply buried sedimentary rocks is fundamental to the future of subsurface biosphere stewardship as it relates to hydrocarbon exploration and extraction, carbon sequestration, gas storage and groundwater management. However, the study of indigenous subsurface microorganisms has been limited by the technical challenges of collecting deep formation water samples that have not been heavily contaminated by the mud used to drill the wells. To address this issue, a “clean-sampling method” deploying the newly developed Schlumberger Quicksilver MDT probe was used to collect a subsurface sample at a depth of 1.79 km (5872 ft) from an exploratory well within Cambrian-age sandstones in the Illinois Basin. This yielded a formation water sample that was determined to have less than 4% drilling mud contamination, based on tracking changes in the aqueous geochemistry of the formation water during ~3 hours of pumping at depth prior to sample collection. A suite of microscopy and culture-independent molecular analyses was completed using the DNA extracted from microbial cells in the formation water, including 454 amplicon pyrosequencing that targeted the V1-V3 hypervariable region of bacterial 16S rRNA gene sequences. Results demonstrated an extremely low-diversity microbial community living in formation water at 1.79 km depth. More than 95% of the total V1-V3 pyrosequencing reads (n=11574) obtained from the formation water were affiliated with a halophilic γ-proteobacterium most closely related to the genus Halomonas. In contrast, about 3% of the V1-V3 sequences in the drilling mud library (n=13044) were classified as genus Halomonas, but these were distinctly different from, and only distantly related to, the formation water Halomonas detected at 1.79 km depth. These results were consistent with those obtained using a suite of other molecular screens (e.g., Terminal-Restriction Fragment Length

  5. Extraction of high-molecular-weight genomic DNA for long-read sequencing of single molecules.

    PubMed

    Mayjonade, Baptiste; Gouzy, Jérôme; Donnadieu, Cécile; Pouilly, Nicolas; Marande, William; Callot, Caroline; Langlade, Nicolas; Muños, Stéphane

    2016-10-01

    De novo sequencing of complex genomes is one of the main challenges for researchers seeking high-quality reference sequences. Many de novo assemblies are based on short reads, producing fragmented genome sequences. Third-generation sequencing, with read lengths >10 kb, will improve the assembly of complex genomes, but these techniques require high-molecular-weight genomic DNA (gDNA), and gDNA extraction protocols used for obtaining smaller fragments for short-read sequencing are not suitable for this purpose. Methods of preparing gDNA for bacterial artificial chromosome (BAC) libraries could be adapted, but these approaches are time-consuming, and commercial kits for these methods are expensive. Here, we present a protocol for rapid, inexpensive extraction of high-molecular-weight gDNA from bacteria, plants, and animals. Our technique was validated using sunflower leaf samples, producing a mean read length of 12.6 kb and a maximum read length of 80 kb.

  6. Euxinic Deep Ocean Inferred from 3.2 Ga Black Shale Sequence in DXCL-DP, Pilbara, Western Australia

    NASA Astrophysics Data System (ADS)

    Sakamoto, R.; Kiyokawa, S.; Naraoka, H.; Ikehara, M.; Ito, T.; Suganuma, Y.; Yamaguchi, K. E.

    2011-12-01

    The 3.2 Ga Dixon Island - Cleaverville formations in the coastal Pilbara terrane, Western Australia, are among the best-preserved examples of Mesoarchean sedimentary sequences (Kiyokawa et al., 2006). The DXCL-DP (Dixon Island - Cleaverville Drilling Project; Yamaguchi et al., 2009) was conducted in 2007, and cores free of modern weathering, including black shale, were successfully recovered (DX, CL2 and CL1 in ascending order). These core samples contain pyrite as laminae and as tiny crystals. The tiny pyrite crystals are divided into three morphological types: spherical, hollow, and filled. Based on microscopic observation, pyrite laminae are found to be composed of an aggregate of these pyrite crystals, in which spherical-type pyrite crystals were overgrown by pore-space-filling pyrite (filled type). The sulfur content of black shale increases from 0.9 wt.% (DX core) to 1.8 wt.% (CL1 core) on average. The Corg/S ratios (by wt.%) range from 0.5 (CL1 core) to 1.7 (DX core). Although a few stratigraphic levels have Corg/S ratios > 2.0, most of the samples in these three cores have Corg/S ratios < 1.0. Although the S content of the DX core is generally lower than that of the other cores, the DX core has many thin pyrite laminae. On the other hand, the CL1 and CL2 cores have few pyrite laminae but abundant disseminated fine-grained pyrite. Sulfur isotope compositions were measured for pyrite laminae and for bulk black shale that includes fine-grained pyrite. They range from -10.1 to +26.8 ‰ (relative to CDT) and vary randomly with stratigraphic height. The highly 34S-enriched values stand out in the Archean S isotope record published to date. Based on these observations, we suggest the following scenario of sedimentary pyrite formation. Spherical-type pyrite crystallized syngenetically or during early diagenesis. Those pyrite crystals likely formed in euxinic environments like the Black Sea (e.g., Berner, 1984), as suggested by the relationship between their Corg and S contents.
Such the environment is

  7. Deep sequencing and genome-wide analysis reveals the expansion of MicroRNA genes in the gall midge Mayetiola destructor

    PubMed Central

    2013-01-01

    Background MicroRNAs (miRNAs) are small non-coding RNAs that play critical roles in regulating post transcriptional gene exp